Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache 26% more coins: Reduce CCoinsMap::value_type from 96 to 76 bytes #17060

Closed
wants to merge 3 commits into from

Conversation

@martinus
Copy link
Contributor

@martinus martinus commented Oct 5, 2019

This is an attempt to more tightly pack the data inside CCoinsMap. (replaces #16970, done after my investigations on #16957) Currently, there is quite a bit of overhead involved in the CCoinsMap::value_type data structure (total bytes to the left):

96 CCoinsMap::value_type
    36 COutPoint
        32 uint256
         4 uint32_t
     4 >>> PADDING <<<
    56 CCoinsCacheEntry
        48 Coin
            40 CTxOut
                8 nValue
                32 CScript
             4 fCoinBase & nHeight
             4 >>> PADDING <<<
         1 flags (dirty & fresh)
         7 >>> PADDING <<< 

So there is quite a bit of padding data. In my experiements I've noticed that the compiler is forced to use a padding size >=8 only because nValue's CAmount type is int64_t which has to be 8-byte aligned. When replacing nValue with a 4-byte aligned data structure, the whole CCoinsMap::value_type will be aligned to 4 bytes, reducing padding.

Another 4 bytes can be saved by refactoring prevector to only use a single byte for direct size information, and reducing CScript's size from 28 to 27 bytes. It is still able to directly cache most scripts as most are <= 25 bytes long.

The remaining 4 bytes due to the 1 byte flag + 3 bytes padding in CCoinsCacheEntry can be removed by moving the flags into Coin, stealing another 2 bits from nHeight.

Finally, the resulting data structure is 20 bytes smaller:

76 CCoinsMap::value_type
    36 COutPoint
        32 uint256
         4 uint32_t
    40 CCoinsCacheEntry
        40 Coin
            36 CTxOut
                 8 nValue
                28 CScript
            4 fCoinBase & nHeight & flags (dirty & fresh)

So we can store about 26% more data into dbcache's memory. I have evalued this on my Intel i7-8700, 32GB RAM, external SSD, for -reindex-chainstate with both -dbcache=500 and -dbcache=5000:

out2

    time max resident set size
master -dbcache=500 05:57:20 2957532
2019-09-more-compact-Coin -dbcache=500 05:33:37 2919312
Improvement   6,64% 1,29%
    time max resident set size
master -dbcache=5000 04:22:42 7072612
2019-09-more-compact-Coin -dbcache=5000 04:09:16 6533132
Improvement   5,11% 7,63%

So on my machine there is definitive an improvement, but honestly I am not sure if this ~6% improvement is worth the change. Maybe the improvement is bigger with slower disk, as ~26% more transaction can be cached.

@DrahtBot
Copy link
Contributor

@DrahtBot DrahtBot commented Oct 5, 2019

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #18113 ([coins] Don't allow a coin to spent and FRESH. Improve commenting. by jnewbery)
  • #18087 (Get rid of VARINT default argument by sipa)
  • #18000 ([WIP] Coin Statistics Index by fjahr)
  • #17708 (prevector: avoid misaligned member accesses by ajtowns)
  • #17487 (coins: allow write to disk without cache drop by jamesob)
  • #9384 (CCoinsViewCache code cleanup & deduplication by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@Sjors
Copy link
Member

@Sjors Sjors commented Oct 6, 2019

Would be nice to test this on a Raspberry-like device. They tend to have low memory and slow storage. Unfortunately they generally also require pruning, and the way pruning currently works we flush the coin cache.

For example I have an Orange Pi chipping away at IBD this week; although it has 2 GB of RAM, because I'm pruning it to 5 GB, it never uses more than 230 MB.

@MarcoFalke
Copy link
Member

@MarcoFalke MarcoFalke commented Oct 7, 2019

For example I have an Orange Pi chipping away at IBD this week; although it has 2 GB of RAM, because I'm pruning it to 5 GB, it never uses more than 230 MB.

A 512 GB micro SD card should do the job, no?

@fanquake
Copy link
Member

@fanquake fanquake commented Oct 8, 2019

@TheBlueMatt You might be interested here given you commented in #16970.

src/primitives/transaction.h Outdated Show resolved Hide resolved
src/primitives/transaction.h Outdated Show resolved Hide resolved
@laanwj
Copy link
Member

@laanwj laanwj commented Oct 9, 2019

A 512 GB micro SD card should do the job, no?

Yep, works fine.

src/primitives/transaction.h Outdated Show resolved Hide resolved
@fanquake
Copy link
Member

@fanquake fanquake commented Oct 9, 2019

Please lets keep the discussion high level before pointing out any typos.

Copy link
Member

@promag promag left a comment

Concept ACK. Didn't verified the claim but 26% sounds a good improvement to me.

AFAICT dirty and fresh are almost read and written at the same time, maybe use a 2 bit flag? Or is compiler doing that anyway?

@laanwj
Copy link
Member

@laanwj laanwj commented Oct 12, 2019

@promag that's what the bitfield syntax unsigned int dirty : 1; does, it makes the flags take up the minimum possible space.
Or maybe I'm misunderstanding what you're trying to accomplish.

@GChuf
Copy link
Contributor

@GChuf GChuf commented Oct 12, 2019

Would like to test this on my VM using a HDD.
I'm not sure how to benchmark this - I've tried https://github.com/chaincodelabs/bitcoinperf but haven't succeeded. Any info/help would be appreciated.

@MarcoFalke
Copy link
Member

@MarcoFalke MarcoFalke commented Oct 12, 2019

bitcoinperf is a bit messy to set up. I think it requires docker. Is there any specific issue you ran into? I guess it might be better to report such upstream at https://github.com/chaincodelabs/bitcoinperf to not bloat this thread

@martinus
Copy link
Contributor Author

@martinus martinus commented Oct 12, 2019

Would like to test this on my VM using a HDD.
I'm not sure how to benchmark this - I've tried https://github.com/chaincodelabs/bitcoinperf but haven't succeeded. Any info/help would be appreciated.

In my benchmarks I deleted debug.log, ran /usr/bin/time -v ./bitcoind -reindex-chainstate -stopatheight=594000 -printtoconsole=0 -dbcache=500, then used a hand written script to convert the debug.log into gnuplot-readable data, then used gnuplot to create the graphs.

It would be awesome to have some kind of logger in bitcoind that can periodically log statistics into a file in an easily parseable data.

@GChuf
Copy link
Contributor

@GChuf GChuf commented Oct 12, 2019

There were a couple issues. Could you recommend anything else to test -reindex-chainstate? Should I simply run bitcoind and have linux count time ... ?
Or maybe I'll look around things mentioned in this stackexchange answer.

p.s. thanks for the info @martinus. I'll figure something out with what you did and those tools mentioned in the link.

@promag
Copy link
Member

@promag promag commented Oct 12, 2019

@laanwj I mean stuff like this:

        auto oldDirty = coin.dirty;
        auto oldFresh = coin.fresh;

could be

        auto old_cache = coin.cache;

where

    unsigned int cache : 2;   // DIRTY = 1,  FRESH = 2

@GChuf
Copy link
Contributor

@GChuf GChuf commented Oct 15, 2019

The debug.log from first benchmark (master) got overwritten by the second benchmark, and I didn't make a script to do a graph from debug.log yet anyway, so I'm just gonna post the /usr/bin/time -v results.
The improvement seems remarkable! I tested this on Ubuntu 16 VM running on a Seagate SSHD (also called bybrid) disk. It seems however the process used more CPU - not sure what's going on with those percentages though. Must be a VM bug.

First Header Second Header time max resident set size
2019-09-more-compact-Coin dbcache=500 2:20:09 1673108
Master dbcache=500 3:25:46 1729412
Improvement 31,9% 3,26%

	Command being timed: "./bitcoindmaster -reindex-chainstate -stopatheight=325000 -printtoconsole=0 -dbcache=500"
	User time (seconds): 19760.08
	System time (seconds): 1031.02
	Percent of CPU this job got: 168%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 3:25:46
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 1729412
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 2088592
	Voluntary context switches: 38379113
	Involuntary context switches: 2185645
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

	Command being timed: "./bitcoindcoin -reindex-chainstate -stopatheight=325000 -printtoconsole=0 -dbcache=500"
	User time (seconds): 19004.87
	System time (seconds): 598.24
	Percent of CPU this job got: 233%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 2:20:09
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 1673108
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 1737560
	Voluntary context switches: 17203875
	Involuntary context switches: 2926137
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

@martinus
Copy link
Contributor Author

@martinus martinus commented Oct 15, 2019

Thanks @GChuf for testing this! This are really nice results.

It seems however the process used more CPU

I don't think so:

  • master had 19760.08 sec user + 1031.02 sec system = 20791.1 sec
  • 2019-09-more-compact-Coin had 19004.87 sec user + 598.24 sec system = 19603,11

So the branch used ~6% less CPU. Note that the percentage 233% is much higher than for master because it used that CPU in a much shorter timeframe.

The percentage is calculated 100*(user + system time) / elapsed. 233 percent means that on average 2.33 cores were running for this job.

@GChuf
Copy link
Contributor

@GChuf GChuf commented Oct 16, 2019

The percentage is calculated 100*(user + system time) / elapsed. 233 percent means that on average 2.33 cores were running for this job.

Thanks for the explaination! The math works :)
If more testing is needed I'd be glad to help, but I think HDDs will show similar results to my SSHD and it's clear that the improvements are real.

@jamesob
Copy link
Member

@jamesob jamesob commented Oct 17, 2019

Concept ACK from me, seems like some great savings here. Though it'll be crucial to test this thoroughly across platforms. Will review in depth soon.

@GChuf
Copy link
Contributor

@GChuf GChuf commented Oct 17, 2019

I can also test it on windows. Does anyone have the windows executable for this already, to make my life easier? @martinus maybe?

@sipa
Copy link
Member

@sipa sipa commented Oct 17, 2019

These are pretty impressive gains, so concept ACK.

Some overall comments:

  • Since this is making the boundary between Coin and CCoinsCacheEntry a bit less clear, maybe it would be useful to either have a Coin::AssignWithoutFlags function, or even make Coin::operator= not touch the flags. That would avoid all the oldX = ...; ...; ... = oldX code, and make review easier.
  • The coding style guidelines for variables say to use snake_case in new code.

I do want to review the prevector changes in more detail.

@martinus
Copy link
Contributor Author

@martinus martinus commented Oct 17, 2019

I can also test it on windows. Does anyone have the windows executable for this already, to make my life easier? @martinus maybe?

Sorry, I build only in Linux, never tried to build in Windows

Since this is making the boundary between Coin and CCoinsCacheEntry a bit less clear, maybe it would be useful to either have a Coin::AssignWithoutFlags function, or even make Coin::operator= not touch the flags. That would avoid all the oldX = ...; ...; ... = oldX code, and make review easier.

I'm also unsure if it was everywhere necessary to backup and then restore the previous flags. Moving the flags into Coin only saves us 4bytes. I could also remove that change and do this in a separate PR.

@MarcoFalke
Copy link
Member

@MarcoFalke MarcoFalke commented Oct 17, 2019

@DrahtBot might have a windows build ready tomorrow

@dongcarl
Copy link
Contributor

@dongcarl dongcarl commented Oct 17, 2019

I have Gitian-built windows binaries of 03cb535 here for those who are keen (@GChuf): https://send.firefox.com/download/c78ac103a4dabe86/#S70uxKCUua-baEeR0v029g

@jamesob
Copy link
Member

@jamesob jamesob commented Oct 29, 2019

Okay, I've finally gotten around to taking perf to the weird bench-strong results. I took measurements at progress=0.20 (roughly height 387,000) by doing

$ # configure with -fno-omit-frame-pointer, tune system params to enable perf, make, etc.
$ ./src/bitcoind -reindex-chainstate -stopatheight=550000 -dbcache=7000 -printtoconsole=0 -connect=0 &
$ while ! tail -n 10000 ~/.bitcoin/debug.log | grep progress=0.2; do sleep 3; done; kill -STOP `pidof bitcoind`
[... I wait ...]
$ kill -CONT `pidof bitcoind` && sleep 0.4 && perf record -g -F 101 -p `pidof bitcoind` -o perf.$THE_BRANCH.data sleep 90

After running the above for both this branch and master at 6a97e8a, here's perf diff perf.master.data perf.martinus.data:

# Event 'cycles:ppp'
#
# Baseline  Delta Abs  Shared Object              Symbol
# ........  .........  .........................  ......................................................................................
#
     2.41%     +3.06%  bitcoind                   [.] SipHashUint256Extra
    15.36%     +2.93%  bitcoind                   [.] secp256k1_fe_mul_inner
     4.09%     -2.21%  libc-2.28.so               [.] 0x000000000007b544
     0.26%     +1.52%  bitcoind                   [.] (anonymous namespace)::ripemd160::rol
    14.02%     -1.42%  bitcoind                   [.] secp256k1_fe_sqr_inner
     1.99%     -1.40%  bitcoind                   [.] sha256d64_avx2::(anonymous namespace)::Xor
     1.48%     -1.32%  bitcoind                   [.] operator new
               +1.12%  bitcoind                   [.] prevector<27u, unsigned char, unsigned int, int>::const_iterator::operator!=
               +1.10%  bitcoind                   [.] prevector<27u, unsigned char, unsigned int, int>::const_iterator::operator++
     0.00%     +1.07%  bitcoind                   [.] ser_writedata32<CSizeComputer>
               +0.99%  bitcoind                   [.] prevector<27u, unsigned char, unsigned int, int>::is_direct
               +0.96%  bitcoind                   [.] prevector<27u, unsigned char, unsigned int, int>::size
     0.03%     +0.92%  bitcoind                   [.] sha256d64_avx2::(anonymous namespace)::Or

I think the percentages are misleading here because when I run sudo perf top --comms=b-loadblk, this is what I see:

Samples: 236K of event 'cycles:ppp', 2000 Hz, Event count (approx.): 252595020173
Overhead  Shared Object              Symbol
   2.79%  bitcoind                   [.] SipHashUint256Extra
   1.00%  bitcoind                   [.] base_blob<256u>::GetUint64
   0.56%  bitcoind                   [.] std::__detail::_Hash_code_base<COutPoint, std::pair<COutPoint const, CCoinsCacheEntry>, std::__
   0.50%  bitcoind                   [.] SaltedOutpointHasher::operator()
   0.45%  bitcoind                   [.] prevector<28u, unsigned char, unsigned int, int>::const_iterator::operator*
   0.33%  bitcoind                   [.] std::_Hashtable<COutPoint, std::pair<COutPoint const, CCoinsCacheEntry>, std::allocator<std::pa
   0.26%  bitcoind                   [.] ser_writedata32<CHashWriter>
   0.23%  bitcoind                   [.] std::_Hashtable<COutPoint, std::pair<COutPoint const, CCoinsCacheEntry>, std::allocator<std::pa
   0.22%  bitcoind                   [.] std::__uninitialized_default_n_1<false>::__uninit_default_n<CTxOut*, unsigned long>
   0.18%  libc-2.28.so               [.] 0x000000000015878a
   0.17%  bitcoind                   [.] operator-

In other words, near as I can figure SipHashUint256Extra clocks in as the bottleneck on the loadblk thread, which is where the reindexing happens. Based on the (unreliable?) percentage values, it looks like this branch is spending more than twice the time hashing for map access than master is. Why this would be (or even whether or not this is the actual cause of the difference) is outside of my understanding.

I can upload the raw perf data files if anyone wants, but to my understanding they aren't portable - unfortunately perf is not very well documented, and I had trouble even generating flamegraphs on a secondary machine, so I'm not really sure what the best way to make the datafiles portable is.

@elichai
Copy link
Contributor

@elichai elichai commented Oct 29, 2019

@jamesob twice the time or twice the percentage?
Because I can confirm that my own benchmarks of master from ~2-3 weeks ago showed SipHashUint256Extra very very high on the list (spent a while re-implementing it with SSE instructions just to see it's actually slower with them lol)

(i'm usually using the following flags: perf report -g 'graph,0.5,caller' that way it sorts the percentages by caller instead of callee, though you probably know more about perf than I :) )

@martinus
Copy link
Contributor Author

@martinus martinus commented Oct 29, 2019

@jamesob did your master build already contain #16957?

@JeremyRubin
Copy link
Contributor

@JeremyRubin JeremyRubin commented Oct 29, 2019

@martinus yes it did -- are you not rebased?

Worth pointing out that a prevector<28> on master fits perfectly in a 32 byte or 64 byte cache line.

No longer after this change (which is fine, we can increase to prevector<31> where sensitive to it, like std::vector<prevector<>>.

Also, now that CAmounts are unaligned, they too can be extra slow to do something with.

I doubt that's what's going on here, but worth looking at before merge.

@martinus
Copy link
Contributor Author

@martinus martinus commented Oct 29, 2019

@martinus yes it did -- are you not rebased?

I didn't saw he linked to the master revision, so both master and my branch has #16957 so it's not that.

I suspect that there is something strange going on with the threading code. Maybe some timing issue causes lots of caching misses for some reason? I don't know much about how the checkqueue and surrounding classes work though.

@martinus
Copy link
Contributor Author

@martinus martinus commented Oct 29, 2019

I think I might have been able to reproduce this issue. I've tried ./bitcoind -datadir=/run/media/martinus/tera/bitcoin/db -reindex-chainstate -stopatheight=150000 -printtoconsole=0 -connect=0 -dbcache=15000 on master, and on an older version, before #16957. The old version used ~20 seconds of user time, master used 332 seconds. I've done git bisect and found this revision as the first bad commit: fa3a733

I don't know why, but this commit seems to have a dramatic effect on performance.

@sipa
Copy link
Member

@sipa sipa commented Oct 29, 2019

@martinus Are you aware that blocks before the assumevalid point don't get signature/script validated?

@MarcoFalke
Copy link
Member

@MarcoFalke MarcoFalke commented Oct 30, 2019

For benchmarking purposes, it might be best to set all nodes compiled from different branches to -noassumevalid or the same block hash.

@martinus
Copy link
Contributor Author

@martinus martinus commented Oct 30, 2019

@martinus Are you aware that blocks before the assumevalid point don't get signature/script validated?

No, I didn't know. Also it seems that since I didn't yet have this block, -reindex-chainstate signature/script validates everything

It might be helpful to have a list of assumevalid blocks instead of just a single block, to prevent revalidating everything when that specific marker block is not yet there

martinus added 3 commits Nov 8, 2019
This change reduces CCoinMap's value_type from 96 bytes to 80 bytes by
more tightly packing it's data. This is achieved by these changes:

* Refactored prevector so it uses a single byte to determine its size
  when in direct mode
* Reduce CScriptBase from 28 to 27 indirect bytes
* Introduced PackableCAmount to be able to align CTxOut to 4 bytes to
  prevent padding when used as a member in Coin

This tighter packing means more data can be stored in the coinsCache
before it is full and has to be flushed to disk. In my benchmark,
-reindex-chainstate was 6% faster and used 6% less memory. The cache
could fit 14% more txo's before it had to resize.
Removed CCoinsCacheEntry's 1 byte for the flags and put it directly into
Coin. That way we can get rid of unnecessary padding, which reduces the
memory requirement for the coinsCache. We steal 2 bits from Coin's
nHeight, so now there are only 29 bits left. Still, we are save until
block 2^29-1 = 536870911.
This requires the operator=(CAmount) which I've implemented here as well. Using static_cast for nicer formatting.

Note that it would be cleaner to make the user defined conversion`operator CAmount()` explicit too, but that would mean that practically everywhere it is used we need to add an explicit cast.
See "C.164: Avoid implicit conversion operators" http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#c164-avoid-implicit-conversion-operators
@martinus martinus force-pushed the 2019-09-more-compact-Coin branch from 03cb535 to 22d1d94 Nov 8, 2019
@jamesob
Copy link
Member

@jamesob jamesob commented Nov 12, 2019

Did some benches over the weekend (bench/master.1 vs. bench/compactcoin.1) with the same setup/reindex-to-550,000 as mentioned in the allocator PR (#16801 (comment)):

host         tag                      time       time% maxmem  cpu%  dbcache
bench-ssd-2  master.1                 9:13:54    1.00  5770.10MB 344%  4000MB
bench-ssd-2  master.1                 9:15:57    1.00  5773.23MB 343%  4000MB
bench-ssd-2  compactcoin.1            9:07:01    0.98  5899.80MB 347%  4000MB
bench-ssd-2  compactcoin.1            9:07:41    0.99  5923.72MB 346%  4000MB

bench-ssd-3  compactcoin.1            9:06:32    0.98  5959.14MB 347%  4000MB
bench-ssd-3  compactcoin.1            9:07:35    0.98  6013.83MB 346%  4000MB
bench-ssd-3  master.1                 9:17:00    1.00  5771.56MB 342%  4000MB
bench-ssd-3  master.1                 9:17:24    1.00  5773.17MB 342%  4000MB

bench-ssd-4  compactcoin.1            9:08:11    0.98  5936.75MB 346%  4000MB
bench-ssd-4  compactcoin.1            9:07:46    0.98  5916.50MB 346%  4000MB
bench-ssd-4  master.1                 9:13:56    0.99  5770.13MB 344%  4000MB
bench-ssd-4  master.1                 9:17:07    1.00  5760.78MB 342%  4000MB

bench-ssd-5  compactcoin.1            9:03:06    0.98  5943.29MB 349%  4000MB
bench-ssd-5  compactcoin.1            9:04:24    0.98  5975.77MB 348%  4000MB
bench-ssd-5  master.1                 9:11:54    1.00  5769.57MB 346%  4000MB
bench-ssd-5  master.1                 9:13:56    1.00  5775.04MB 344%  4000MB

bench-strong compactcoin.1            14:59:15   0.99  5710.87MB 546%  4000MB
bench-strong compactcoin.1            14:59:00   0.99  5701.12MB 548%  4000MB
bench-strong master.1                 15:06:04   1.00  5720.08MB 544%  4000MB
bench-strong master.1                 15:04:26   1.00  5745.98MB 545%  4000MB

See modest 1-2% improvement in runtime and (oddly) on the SSD machines I'm seeing significantly higher memory usage (usually ~200MB) for this branch.

@ajtowns
Copy link
Contributor

@ajtowns ajtowns commented Dec 10, 2019

If CScript needs to be 8 byte aligned (because it has a pointer), then I think this becomes:

  • 88 CCoinsMap::value_type
    • 36 COutPoint
      • 32 uint256
      • 4 uint32_t
    • 4 PADDING
    • 48 CCoinsCacheEntry
      • 48 Coin
        • 40 CTxOut
          • 8 nValue
          • 32 CScript
        • 4 nHeight
        • 1 fCoinBase & flags (dirty & fresh)
        • 3 PADDING

and there's no need for PackableCAmount, and nHeight doesn't need bit fields. 96 byte to 88 is only an 9% improvement though.

(EDIT: with 32-bit pointers, CAmount alignment becomes your blocker, but not sure that optimising that heavily for 32-bit systems makes sense?)

@DrahtBot
Copy link
Contributor

@DrahtBot DrahtBot commented Feb 11, 2020

Needs rebase

@martinus
Copy link
Contributor Author

@martinus martinus commented Feb 18, 2020

I'll close this issue, as @ajtowns noted, this PR has illegally aligned the CScript to 4 bytes to get it to 28 bytes.

@martinus martinus closed this Feb 18, 2020
martinus added a commit to martinus/bitcoin that referenced this issue Aug 14, 2021
This PR is similar to bitcoin#17060, but without the incorrect alignment, and
with safeguards. CCoinsCacheEntry makes use of Coin's padding to store
the flag.

Unfortunately great care needs to be taken with this approach that the
padding is not accidentally overwritten, e.g. when assigning/moving a coin.

This PR solves this issue by chainging CCoinsCacheEntry with getters,
and having only a single MutableCoin() method that safely allows to modify
the coin member without risking the risk over accidentally overwriting
the padding used to store the flags.

The original struct layout was like this:

    96 CCoinsMap::value_type
        36 COutPoint
            32 uint256
            4 uint32_t
        4 >>> PADDING <<<
        56 CCoinsCacheEntry
            48 Coin
                40 CTxOut
                    8 nValue
                    32 CScript
                4 fCoinBase & nHeight
                4 >>> PADDING <<<
            1 flags (dirty & fresh)
            7 >>> PADDING <<<

After that PR the struct layout looks like this:

    88 CCoinsMap::value_type
        36 COutPoint
            32 uint256
            4 uint32_t
        4 >>> PADDING <<<
        48 CCoinsCacheEntry
            48 Coin
                40 CTxOut
                    8 nValue
                    32 CScript
                4 fCoinBase & nHeight
                4 >>> PADDING <<< stores flags (dirty & fresh)

That change should translate to ~8% less memory usage of the CCoinsMap.
martinus added a commit to martinus/bitcoin that referenced this issue Aug 14, 2021
This PR is similar to bitcoin#17060, but without the incorrect alignment, and
with safeguards. CCoinsCacheEntry makes use of Coin's padding to store
the flag.

Unfortunately great care needs to be taken with this approach that the
padding is not accidentally overwritten, e.g. when assigning/moving a coin.

This PR solves this issue by chainging CCoinsCacheEntry with getters,
and having only a single MutableCoin() method that safely allows to modify
the coin member without risking the risk over accidentally overwriting
the padding used to store the flags.

The original struct layout was like this:

    96 CCoinsMap::value_type
        36 COutPoint
            32 uint256
            4 uint32_t
        4 >>> PADDING <<<
        56 CCoinsCacheEntry
            48 Coin
                40 CTxOut
                    8 nValue
                    32 CScript
                4 fCoinBase & nHeight
                4 >>> PADDING <<<
            1 flags (dirty & fresh)
            7 >>> PADDING <<<

After that PR the struct layout looks like this:

    88 CCoinsMap::value_type
        36 COutPoint
            32 uint256
            4 uint32_t
        4 >>> PADDING <<<
        48 CCoinsCacheEntry
            48 Coin
                40 CTxOut
                    8 nValue
                    32 CScript
                4 fCoinBase & nHeight
                4 padding: stores flags (dirty & fresh)

That change should translate to ~8% less memory usage of the CCoinsMap.
martinus added a commit to martinus/bitcoin that referenced this issue Aug 14, 2021
This PR is similar to bitcoin#17060, but without the incorrect alignment, and
with safeguards. CCoinsCacheEntry makes use of Coin's padding to store
the flag.

When using the padding of the member, great care needs to be taken that the
padding is not accidentally overwritten, e.g. when assigning/moving a coin.

This PR solves this potential error source by chainging CCoinsCacheEntry
into a class with private members and only beeing able to modify the
coin with a call to MutableCoin(). That method safely remembers and then
restores the flags, no matter what happens to the coin member.

The original struct layout was like this:

    96 CCoinsMap::value_type
        36 COutPoint
            32 uint256
            4 uint32_t
        4 >>> PADDING <<<
        56 CCoinsCacheEntry
            48 Coin
                40 CTxOut
                    8 nValue
                    32 CScript
                4 fCoinBase & nHeight
                4 >>> PADDING <<<
            1 flags (dirty & fresh)
            7 >>> PADDING <<<

After that PR the struct layout looks like this:

    88 CCoinsMap::value_type
        36 COutPoint
            32 uint256
            4 uint32_t
        4 >>> PADDING <<<
        48 CCoinsCacheEntry
            48 Coin
                40 CTxOut
                    8 nValue
                    32 CScript
                4 fCoinBase & nHeight
                4 padding: stores flags (dirty & fresh)

That change translates to about ~8% less memory usage of the CCoinsMap.
martinus added a commit to martinus/bitcoin that referenced this issue Aug 15, 2021
This PR is similar to bitcoin#17060, but without the incorrect alignment, and
with safeguards. CCoinsCacheEntry makes use of Coin's padding to store
the flag.

When using the padding of the member, great care needs to be taken that the
padding is not accidentally overwritten, e.g. when assigning/moving a coin.

This PR solves this potential error source by chainging CCoinsCacheEntry
into a class with private members and only beeing able to modify the
coin with a call to MutableCoin(). That method safely remembers and then
restores the flags, no matter what happens to the coin member.

The original struct layout was like this:

    96 CCoinsMap::value_type
        36 COutPoint
            32 uint256
            4 uint32_t
        4 >>> PADDING <<<
        56 CCoinsCacheEntry
            48 Coin
                40 CTxOut
                    8 nValue
                    32 CScript
                4 fCoinBase & nHeight
                4 >>> PADDING <<<
            1 flags (dirty & fresh)
            7 >>> PADDING <<<

After that PR the struct layout looks like this:

    88 CCoinsMap::value_type
        36 COutPoint
            32 uint256
            4 uint32_t
        4 >>> PADDING <<<
        48 CCoinsCacheEntry
            48 Coin
                40 CTxOut
                    8 nValue
                    32 CScript
                4 fCoinBase & nHeight
                4 padding: stores flags (dirty & fresh)

That change translates to about ~8% less memory usage of the CCoinsMap.
martinus added a commit to martinus/bitcoin that referenced this issue Aug 27, 2021
This PR is similar to bitcoin#17060, but without the incorrect alignment, and
with safeguards. CCoinsCacheEntry makes use of Coin's padding to store
the flag.

When using the padding of the member, great care needs to be taken that the
padding is not accidentally overwritten, e.g. when assigning/moving a coin.

This PR solves this potential error source by chainging CCoinsCacheEntry
into a class with private members and only beeing able to modify the
coin with a call to MutableCoin(). That method safely remembers and then
restores the flags, no matter what happens to the coin member.

The original struct layout was like this:

    96 CCoinsMap::value_type
        36 COutPoint
            32 uint256
            4 uint32_t
        4 >>> PADDING <<<
        56 CCoinsCacheEntry
            48 Coin
                40 CTxOut
                    8 nValue
                    32 CScript
                4 fCoinBase & nHeight
                4 >>> PADDING <<<
            1 flags (dirty & fresh)
            7 >>> PADDING <<<

After that PR the struct layout looks like this:

    88 CCoinsMap::value_type
        36 COutPoint
            32 uint256
            4 uint32_t
        4 >>> PADDING <<<
        48 CCoinsCacheEntry
            48 Coin
                40 CTxOut
                    8 nValue
                    32 CScript
                4 fCoinBase & nHeight
                4 padding: stores flags (dirty & fresh)

That change translates to about ~8% less memory usage of the CCoinsMap.
martinus added a commit to martinus/bitcoin that referenced this issue Aug 27, 2021
This PR is similar to bitcoin#17060, but without the incorrect alignment, and
with safeguards. CCoinsCacheEntry makes use of Coin's padding to store
the flag.

When using the padding of the member, great care needs to be taken that the
padding is not accidentally overwritten, e.g. when assigning/moving a coin.

This PR solves this potential error source by chainging CCoinsCacheEntry
into a class with private members and only beeing able to modify the
coin with a call to MutableCoin(). That method safely remembers and then
restores the flags, no matter what happens to the coin member.

The original struct layout was like this:

    96 CCoinsMap::value_type
        36 COutPoint
            32 uint256
            4 uint32_t
        4 >>> PADDING <<<
        56 CCoinsCacheEntry
            48 Coin
                40 CTxOut
                    8 nValue
                    32 CScript
                4 fCoinBase & nHeight
                4 >>> PADDING <<<
            1 flags (dirty & fresh)
            7 >>> PADDING <<<

After that PR the struct layout looks like this:

    88 CCoinsMap::value_type
        36 COutPoint
            32 uint256
            4 uint32_t
        4 >>> PADDING <<<
        48 CCoinsCacheEntry
            48 Coin
                40 CTxOut
                    8 nValue
                    32 CScript
                4 fCoinBase & nHeight
                4 padding: stores flags (dirty & fresh)

That change translates to about ~8% less memory usage of the CCoinsMap.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet