
9% less memory: make SaltedOutpointHasher noexcept #16957

Merged

Conversation

@martinus (Contributor) commented Sep 24, 2019

If the hash is not noexcept, unordered_map has to assume that it can throw an exception, so care needs to be taken when rehashing. libstdc++ solves this by simply caching the hash value, which increases the memory of each node by 8 bytes. Adding noexcept prevents this caching. In my experiments with -reindex-chainstate -stopatheight=594000, memory usage (maximum resident set size) decreased by 9.4% while runtime increased by 1.6% due to the additional hashing. Additionally, memusage::DynamicUsage() is now more accurate and no longer underestimates.

|                                       | runtime h:mm:ss | max RSS kbyte |
|---------------------------------------|-----------------|---------------|
| master                                |         4:13:59 |       7696728 |
| 2019-09-SaltedOutpointHasher-noexcept |         4:18:11 |       6971412 |
| change                                |          +1.65% |        -9.42% |

Comparison of progress, master vs. 2019-09-SaltedOutpointHasher-noexcept:
![out](https://user-images.githubusercontent.com/14386/65541887-69424e00-df0e-11e9-8644-b3a068ed8c3f.png)
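
For reference, the change itself is a one-word diff: adding noexcept to the hasher's call operator. A minimal sketch of the resulting hasher, abridged from SaltedOutpointHasher in src/coins.h (salt members and constructor shown only schematically):

```cpp
// Abridged sketch of SaltedOutpointHasher (src/coins.h) after this change.
class SaltedOutpointHasher
{
private:
    /** Salt, chosen randomly at construction and fed into SipHash. */
    const uint64_t k0, k1;

public:
    SaltedOutpointHasher();

    // Marking the call operator noexcept lets libstdc++'s unordered_map
    // recompute hashes during rehash instead of caching a size_t hash value
    // in every node (8 extra bytes per node on 64-bit platforms).
    size_t operator()(const COutPoint& id) const noexcept
    {
        return SipHashUint256Extra(k0, k1, id.hash, id.n);
    }
};
```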

@jamesob (Member) commented Sep 24, 2019

Concept ACK. Very exciting. Will start some benchmarks tonight.

@emilengler (Contributor) commented Sep 24, 2019

Concept ACK, good find

@kristapsk (Contributor) commented Sep 24, 2019

Concept ACK

@sipa (Member) commented Sep 24, 2019

Wow, awesome.

ACK

@promag (Member) commented Sep 24, 2019

Little big PR! Concept ACK.

@sipa (Member) commented Sep 24, 2019

Is it slower?

@martinus (Contributor, Author) commented Sep 24, 2019

Yes, it seems to be slower, by 1.6%, but I'm not sure how much of this is random test variation. I think the lower memory usage more than offsets this: if you increase dbcache by about 9%, memory usage will be about the same as before, but I'm pretty sure runtime will be faster because of better caching.

@DrahtBot (Contributor) commented Sep 24, 2019

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #16910 (wallet: reduce loading time by using unordered maps by achow101)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@martinus (Contributor, Author) commented Sep 25, 2019

I did another run with the change and -dbcache=5471, so I allowed more memory. The result:

|                        | runtime h:mm:ss | max RSS kbyte |
|------------------------|-----------------|---------------|
| master -dbcache=5000   |         4:13:59 |       7696728 |
| noexcept -dbcache=5000 |         4:18:11 |       6971412 |
| noexcept -dbcache=5471 |         4:01:16 |       7282044 |

This time synchronization was about 5% faster than master and still used 5% less memory. Here is a graph with all the results:

[graph: out]

@fanquake fanquake added this to the 0.20.0 milestone Sep 25, 2019
@practicalswift (Member) commented Sep 25, 2019

ACK 9538fce -- diff looks correct and SipHashUint256Extra does not throw

Nice find! I believe there are similar opportunities waiting to be found: we underuse noexcept :)

Somewhat related previous discussions about the pros/cons of noexcept:

  • #15926 (tinyformat: Add format_no_throw)
  • #13382 (util: Don't throw in GetTime{Millis,Micros}(). Mark as noexcept.)
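
As an aside on the noexcept point above: one way to keep this property from silently regressing (purely illustrative, not something this PR adds) would be a compile-time check next to the hasher, since a later change to the hasher or to SipHashUint256Extra could reintroduce the per-node caching without any warning:

```cpp
// Hypothetical guard, not part of this PR: fail the build if the hasher ever
// stops being nothrow-invocable, which would silently re-enable libstdc++'s
// per-node hash caching and grow memory usage again.
#include <utility>  // std::declval

static_assert(noexcept(std::declval<const SaltedOutpointHasher&>()(std::declval<const COutPoint&>())),
              "SaltedOutpointHasher::operator() must stay noexcept to avoid hash caching");
```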
@laanwj (Member) commented Sep 25, 2019

ACK 9538fce, this is trivially correct; I had no idea that noexcept could make so much of a concrete difference in generated code.

@jonasschnelli (Member) commented Sep 25, 2019

Nice!
utACK 9538fce.
Should we increase the dbcache default by ~9% so as not to reduce sync/reindex performance with default parameters (in this or in another PR)?

@jamesob (Member) commented Sep 25, 2019

First bench run is consistent with @martinus' findings:

-reindex-chainstate to 550,000 (dbcache=5000)

bench-s      martinus/2019-09-SaltedOutp...       2:41:35    1.00  6982.89MB
bench-s      master                               2:38:42    0.98  7469.20MB

on:

Hostname:            bench-s
Kernel:              Linux 4.19.0-5-amd64
OS:                  Debian GNU/Linux 10
RAM (GB):            31.35
Architecture:        x86_64
CPU(s):              8
Thread(s) per core:  2
Core(s) per socket:  4
Model name:          Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
@ryanofsky (Contributor) commented Sep 25, 2019

when rehashing care needs to be taken. libstdc++ solves this by simply caching the hash value

Curious if this applies to libc++ as well as libstdc++

@MarcoFalke (Member) commented Sep 25, 2019

I'd prefer if the doxygen comment was extended a bit on the effects of the noexcept keyword. Maybe throw some links in there (https://gcc.gnu.org/onlinedocs/gcc-7.2.0/libstdc++/manual/manual/unordered_associative.html#containers.unordered.cache) ?

@jamesob (Member) commented Sep 25, 2019

Curious if this applies to libc++ as well as libstdc++

@sdaftuar and I have tried reading through the libc++ implementation of hash tables to find the equivalent caching clause to what's in libstdc++, but so far we haven't been able to find it. Does anyone know where this happens in libc++ (clang)?

I'll report back soon with clang-based benches.
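
One way to answer this empirically, without digging through library internals (hypothetical probe, not part of this PR; the types and names below are made up for illustration): rebind the map's allocator so it reports the size of each node allocation, and compare a noexcept hasher against a plain one. On libstdc++ the plain-hasher map allocates nodes that are 8 bytes larger; whatever libc++ prints answers the question for that implementation.

```cpp
// Hypothetical probe: print the size of every allocation an unordered_map makes,
// so you can see whether your standard library adds a cached hash field when the
// hasher is not noexcept. Bucket-array allocations are reported as well.
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <unordered_map>
#include <utility>

template <typename T>
struct ReportingAlloc {
    using value_type = T;
    ReportingAlloc() = default;
    template <typename U> ReportingAlloc(const ReportingAlloc<U>&) {}
    T* allocate(std::size_t n) {
        std::printf("allocating %zu x %zu bytes\n", n, sizeof(T));
        return static_cast<T*>(std::malloc(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { std::free(p); }
};
template <typename T, typename U>
bool operator==(const ReportingAlloc<T>&, const ReportingAlloc<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const ReportingAlloc<T>&, const ReportingAlloc<U>&) { return false; }

struct PlainHash    { std::size_t operator()(int x) const          { return x; } };
struct NoexceptHash { std::size_t operator()(int x) const noexcept { return x; } };

int main() {
    // The container rebinds the allocator to its internal node type, so the
    // printed element size includes any cached hash the implementation stores.
    std::unordered_map<int, int, PlainHash, std::equal_to<int>,
                       ReportingAlloc<std::pair<const int, int>>> plain;
    std::unordered_map<int, int, NoexceptHash, std::equal_to<int>,
                       ReportingAlloc<std::pair<const int, int>>> nothrow;
    std::puts("plain hasher:");    plain[1] = 1;
    std::puts("noexcept hasher:"); nothrow[1] = 1;
}
```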

@martinus (Contributor, Author) commented Sep 25, 2019

I'd prefer if the doxygen comment was extended a bit on the effects of the noexcept keyword. Maybe throw some links in there (https://gcc.gnu.org/onlinedocs/gcc-7.2.0/libstdc++/manual/manual/unordered_associative.html#containers.unordered.cache) ?

Ah, I didn't know this was documented; thanks for the link. I've added a comment.

If the hash is not noexcept, unordered_map has to assume that it can throw an exception, so care needs to be taken when rehashing. libstdc++ solves this by simply caching the hash value, which increases the memory of each node by 8 bytes. Adding noexcept prevents this caching. In my experiments with -reindex-chainstate -stopatheight=594000, memory usage decreased by 9.4% while runtime increased by 1.6% due to the additional hashing. Additionally, memusage::DynamicUsage() is now more accurate and no longer underestimates.
@martinus martinus force-pushed the martinus:2019-09-SaltedOutpointHasher-noexcept branch from e4a6668 to 67d9990 Sep 25, 2019
@jamesob (Member) commented Sep 26, 2019

Tested ACK 67d9990

Clang results are in and they're consistent with what we've seen so far (benched IBD 500,000-535,000 at dbcache=4000).

ibd local range 500000 535000

2019-09-SaltedOutpointHasher-noexcept vs. master (absolute)

| bench name                                 | x | 2019-09-SaltedOutpointHasher-noexcept | master                      |
|--------------------------------------------|---|---------------------------------------|-----------------------------|
| ibd.local.range.500000.535000.total_secs   | 3 | 3447.5602 (± 14.9648)                 | 3434.1072 (± 3.6158)        |
| ibd.local.range.500000.535000.peak_rss_KiB | 3 | 5973120.0000 (± 15741.9850)           | 6472862.6667 (± 11442.6395) |

2019-09-SaltedOutpointHasher-noexcept vs. master (relative)

| bench name                                 | x | 2019-09-SaltedOutpointHasher-noexcept | master |
|--------------------------------------------|---|---------------------------------------|--------|
| ibd.local.range.500000.535000.total_secs   | 3 | 1.004                                 | 1.000  |
| ibd.local.range.500000.535000.peak_rss_KiB | 3 | 1.000                                 | 1.084  |
@practicalswift (Member) commented Sep 26, 2019

We should also bump the default dbcache to make sure IBD is not slowed down by the merge of this PR, right?

@jamesob (Member) commented Sep 26, 2019

We should also bump the default dbcache to make sure IBD is not slowed down by the merge of this PR, right?

I think that's a good idea. I'll rerun a bench comparing master:-dbcache=450 (default) and martinus:-dbcache=500 to ensure the latter undershoots the former in terms of memory usage.

@jamesob (Member) commented Sep 26, 2019

It's worth pointing out that when memory is not a limiting factor (e.g. a miner sets -dbcache=15000 for highest possible speed), this change will impose a minor performance penalty on the order of what we've seen above (<1.6%).

I still think this change is worthwhile: using noexcept here is the "correct" phrasing of this program, and the net expected performance gain is positive since most users are limited by RAM. Still, if we at some point offered a configuration setting that optimized for speed above all else, this is something we would want to tweak.

@TheBlueMatt (Contributor) commented Sep 26, 2019

I'm somewhat confused by the "increase dbcache for equivalent performance" thing - shouldn't DynamicUsage(const std::unordered_map<X, Y, Z>& m) take into account the new slightly lower memory usage and thus increase the capacity of the map? Is there some way to get the performance back with some reasonable pre-allocation of the map?

@jamesob (Member) commented Sep 26, 2019

shouldn't DynamicUsage(const std::unordered_map<X, Y, Z>& m) take into account the new slightly lower memory usage and thus increase the capacity of the map?

Nah, unfortunately I don't think so, because our memusage estimator optimistically assumes that each node in the hashmap consumes only as much memory as it takes to store the pair of key and value, plus each bucket pointer.

Is there some way to get the performance back with some reasonable pre-allocation of the map?

I've tried this with a call to cacheCoins.reserve(...) in init and didn't see much of a difference, but might be worth investigating again.
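
For context, the estimator in question is memusage::DynamicUsage for unordered_map. Paraphrased from src/memusage.h (not verbatim), it models a node as the stored pair plus a single chaining pointer, so any cached hash field added by the standard library is invisible to it:

```cpp
// Paraphrase of Bitcoin Core's unordered_map accounting in src/memusage.h
// (not verbatim). unordered_node models "stored pair + one chaining pointer";
// MallocUsage rounds a request up to the allocator's real chunk size. Nothing
// here accounts for an extra cached hash word in each node, which is why the
// pre-noexcept estimate undershot real memory usage.
template<typename X>
struct unordered_node : private X
{
private:
    void* ptr; // the node's "next" pointer in its bucket chain
};

template<typename X, typename Y, typename Z>
static inline size_t DynamicUsage(const std::unordered_map<X, Y, Z>& m)
{
    return MallocUsage(sizeof(unordered_node<std::pair<const X, Y>>)) * m.size()
         + MallocUsage(sizeof(void*) * m.bucket_count());
}
```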

@martinus (Contributor, Author) commented Sep 26, 2019

I'm somewhat confused by the "increase dbcache for equivalent performance" thing - shouldn't DynamicUsage(const std::unordered_map<X, Y, Z>& m) take into account the new slightly lower memory usage

The problem was that DynamicUsage was wrong before: it underestimated memory usage. Now it's more correct.

I think it might be possible to regain the performance loss by caching the hash ourselves. We have some padding in the map's key, and it might be possible to use these bytes to store the hash value (or at least 4 bytes of it) without having to increase the memory.
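
A purely illustrative sketch of that idea (hypothetical, not part of this PR; the key wrapper and hasher names are made up, and it assumes the node's mapped value is 8-byte aligned so the 4 bytes after a 36-byte COutPoint are currently padding):

```cpp
// Hypothetical sketch, not part of this PR: store 4 bytes of the salted hash
// next to the outpoint, in space that would otherwise be alignment padding
// before an 8-byte-aligned mapped value, so rehashing never needs to re-run
// SipHash. Assumes COutPoint is 36 bytes (32-byte txid + 4-byte index).
struct OutpointWithCachedHash {
    COutPoint outpoint;
    uint32_t hash32; // low 32 bits of SipHashUint256Extra(k0, k1, txid, n)

    bool operator==(const OutpointWithCachedHash& other) const
    {
        // Cheap rejection on the cached bits before the full 36-byte comparison.
        return hash32 == other.hash32 && outpoint == other.outpoint;
    }
};

struct CachedOutpointHasher {
    size_t operator()(const OutpointWithCachedHash& key) const noexcept
    {
        return key.hash32; // 32 salted bits are plenty for bucket selection
    }
};
```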

@jamesob (Member) commented Sep 26, 2019

Interestingly, the difference isn't pronounced near default dbcache values. This bench compares master at dbcache=450 (default) to this branch at dbcache=500. Master's runtime is 1.042x longer with this configuration.

Maybe the runtime difference isn't as pronounced since this is only for a relatively small subset of the chain (35,000 blocks). In any case, it at least shows that a new dbcache default of 500 with this change still consumes less memory than master with the current default.

[chart: ibd local range 500000 535000]

laanwj added a commit that referenced this pull request Sep 30, 2019
67d9990 make SaltedOutpointHasher noexcept (Martin Ankerl)

Pull request description:

  If the hash is not `noexcept`, `unordered_map` has to assume that it can throw an exception, so care needs to be taken when rehashing. libstdc++ solves this by simply caching the hash value, which increases the memory of each node by 8 bytes. Adding `noexcept` prevents this caching. In my experiments with `-reindex-chainstate -stopatheight=594000`, memory usage (maximum resident set size) decreased by 9.4% while runtime increased by 1.6% due to the additional hashing. Additionally, memusage::DynamicUsage() is now more accurate and no longer underestimates.

  |                                       | runtime h:mm:ss | max RSS kbyte |
  |---------------------------------------|-----------------|--------------|
  | master                                |         4:13:59 |      7696728 |
  | 2019-09-SaltedOutpointHasher-noexcept |         4:18:11 |      6971412 |
  | change                                |          +1.65% |       -9.42% |

  Comparison of progress, master vs. 2019-09-SaltedOutpointHasher-noexcept
  ![out](https://user-images.githubusercontent.com/14386/65541887-69424e00-df0e-11e9-8644-b3a068ed8c3f.png)

ACKs for top commit:
  jamesob:
    Tested ACK 67d9990

Tree-SHA512: 9c44e3cca993b5a564dd61ebd2926b9c4a238609ea4d283514c018236f977d935e35a384dd4696486fd3d78781dd2ba190bb72596e20a5e931042fa465872a0b
@laanwj laanwj merged commit 67d9990 into bitcoin:master Sep 30, 2019
2 checks passed:
continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed
@laanwj (Member) commented Sep 30, 2019

It seems safe to merge this for 0.19.

It's clear that there are still opportunities left for optimization in and around the UTXO hash data structure; let's look further at that for 0.20.

@fanquake fanquake modified the milestones: 0.20.0, 0.19.0 Sep 30, 2019
@MarcoFalke (Member) commented Sep 30, 2019

Does this need release notes with potential action items for node operators, and/or should the default be bumped?

@martinus (Contributor, Author) commented Sep 30, 2019

I'd say the default should be bumped from 450 to 500, and node operators can increase dbcache by 10% without needing more RAM than before.
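
For operators who want to act on this right away, a minimal example (illustrative value; dbcache is specified in MiB):

```
# bitcoin.conf: give the coins cache back roughly the memory this change saves.
# 450 is the current default; per the benchmarks above, 500 with this change
# still peaks below master's memory usage at the old default.
dbcache=500
```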

sidhujag added a commit to syscoin/syscoin that referenced this pull request Oct 2, 2019
67d9990 make SaltedOutpointHasher noexcept (Martin Ankerl)

@MarcoFalke MarcoFalke mentioned this pull request Nov 6, 2019