Faster sigcache nonce #13204
Conversation
utACK 9c8cba8
src/script/sigcache.cpp
Outdated
typedef CuckooCache::cache<uint256, SignatureCacheHasher> map_type;
map_type setValid;
boost::shared_mutex cs_sigcache;

public:
CSignatureCache()
{
    GetRandBytes(nonce.begin(), 32);
base_blob<64*8> nonce;
nit: `unsigned char nonce[64]` ought to work fine.
fac1223 Cache witness hash in CTransaction (MarcoFalke)
faab55f Make CMutableTransaction constructor explicit (MarcoFalke)

Pull request description:

This speeds up:
* compactblocks (v2)
* ATMP
* validation and miner (via `BlockWitnessMerkleRoot`)
* sigcache (see also unrelated #13204)
* rpc and rest (nice, but irrelevant)

This presumably slows down rescan, which uses a `CTransaction` and its `GetHash`, but never uses the `GetWitnessHash`. The slowdown is proportional to the number of witness transactions in the rescan window, i.e. early in the chain there should be no measurable slowdown. Later in the chain there should be a slowdown, but an acceptable one given the speedups in the modules mentioned above.

Tree-SHA512: 443e86acfcceb5af2163e68840c581d44159af3fd1fce266cab3504b29fcd74c50812b69a00d41582e7e1c5ea292f420ce5e892cdfab691da9c24ed1c44536c7
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
Benchmarks would be welcome.
The last travis run for this pull request was 279 days ago and is thus outdated. To trigger a fresh travis build, this pull request should be closed and re-opened.
This will be closed due to inactivity in two weeks.
Closing for now. Let me know when you want this reopened to work on it again.
This seems like an obvious (minor) win. It does need to be benchmarked, but it could just be an informal test, not some benchmark tool checked in.
The tests were failing, so at the very least this needs to be fixed:
Oddly, a pruned IBD (500_000 - 505_000) comparison indicates this PR is slightly slower than master. Maybe I'll try again after a rebase?

[chart: faster-sigcache-nonce vs. master (absolute)]
[chart: faster-sigcache-nonce vs. master (relative)]
Force-pushed from 9c8cba8 to f6b9da7
Rebased -- go ahead and retry!
Irrelevant nit: I'd recommend just zero-filling the input rather than calling getrand again. 256 bits is more than enough (and less wouldn't make getrand any faster).
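The suggestion above can be sketched as follows. The function name and the RNG callback are hypothetical stand-ins (the real code calls Bitcoin's `GetRandBytes` directly); the point is a single 32-byte draw plus zero-fill:

```cpp
#include <array>
#include <cstring>

// Sketch of the suggested nonce construction (illustrative only): one
// 32-byte draw from the RNG, remainder zero-filled. 256 bits of entropy
// already exceeds SHA256's security level, so a second RNG call would add
// cost without adding meaningful security.
std::array<unsigned char, 64> MakeSaltedNonce(void (*get_rand_bytes)(unsigned char*, int))
{
    std::array<unsigned char, 64> nonce;
    get_rand_bytes(nonce.data(), 32);        // random first half
    std::memset(nonce.data() + 32, 0, 32);   // zero-filled second half
    return nonce;
}
```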
Fixed the build issue; travis needs to be kicked @MarcoFalke
Kicked travis; @jamesob needs to be kicked
I'd prefer if the commits were squashed, so that git bisect wouldn't break on half of them.
I added a contrived microbenchmark which shows:

Pre-padding is indeed better than regular padding.
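The effect the microbenchmark measures can be reproduced with a toy block-counting hasher, a hypothetical stand-in for `CSHA256` (real SHA256 padding can occasionally cost two final blocks, which this sketch ignores):

```cpp
#include <cstddef>

// Toy hasher standing in for CSHA256 (illustration only): it counts how
// many 64-byte compression rounds a sequence of writes costs, which is
// the quantity pre-padding reduces.
struct CountingHasher {
    std::size_t buffered = 0;  // bytes in the current partial block
    std::size_t blocks = 0;    // compression rounds performed
    void Write(std::size_t len) { buffered += len; blocks += buffered / 64; buffered %= 64; }
    void Finalize() { ++blocks; }  // simplified: one padding round
};

// Regular padding: the 64-byte nonce is rehashed on every lookup.
std::size_t RegularCost(std::size_t data_len)
{
    CountingHasher h;
    h.Write(64);        // nonce, paid per lookup
    h.Write(data_len);
    h.Finalize();
    return h.blocks;
}

// Pre-padding: the nonce was absorbed once at startup; each lookup copies
// the midstate and pays only for its own data.
std::size_t PrePaddedCost(const CountingHasher& salted, std::size_t data_len)
{
    CountingHasher h = salted;         // cheap state copy
    const std::size_t before = h.blocks;
    h.Write(data_len);
    h.Finalize();
    return h.blocks - before;          // marginal per-lookup cost
}
```

For 96 bytes of per-lookup data this counts 3 rounds for regular padding versus 2 for pre-padding, which matches the direction of the benchmark numbers.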
utACK 5b87767. Only changes since last review are new comments and a benchmark. The new comments look good! And the new microbenchmark should be a sufficient sanity check for the performance of this change, along with the previous neutral IBD results. Like I wrote in #13204 (review), I think this change seems nice even as a simple code cleanup.
Thanks for adding a benchmark. Just one question about it.
Would be nice to add the benchmark in the first commit, so that performance can be compared.
src/bench/hashpadding.cpp
Outdated
while (state.KeepRunning()) {
    unsigned char out[32];
    CSHA256 h = hasher;
    hasher.Write(nonce.begin(), 32);
`h` or `hasher`? If `hasher` is intended, it looks like this benchmark will perform as well as the previous one on every other run. Might want to add a comment to explain why.

If `h` is intended, could `hasher` be made `const`?
Ah, you're right. Good catch. Let me fix that...
It can't be made const because it needs to be updated a little bit.
Well, it can be `const` for one of the tests, but not the other. I want to keep the code as similar as possible between the pre/regular cases.
src/script/sigcache.cpp
Outdated
@@ -24,21 +24,27 @@ class CSignatureCache
 {
 private:
     //! Entries are SHA256(nonce || signature hash || public key || signature):
-    uint256 nonce;
+    CSHA256 salted_hasher;
-    CSHA256 salted_hasher;
+    CSHA256 m_salted_hasher;
@MarcoFalke because of modularization, there are no real interfaces exposed for me to run in a bench, so the benchmark just has both techniques demonstrated (it doesn't really matter whether the commit comes before or after; the code changes don't affect the benchmark).
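The salted-hasher pattern under review can be sketched end-to-end with a stand-in hasher. The `ToyHasher` below is hypothetical (the real code uses `CSHA256`); all the pattern requires is that the hasher's internal state can be copied cheaply:

```cpp
#include <cstddef>
#include <cstdint>

// Minimal copyable stand-in for CSHA256 (hypothetical, FNV-1a based).
struct ToyHasher {
    std::uint64_t state = 1469598103934665603ULL;  // FNV-1a offset basis
    ToyHasher& Write(const unsigned char* p, std::size_t n)
    {
        for (std::size_t i = 0; i < n; ++i) { state ^= p[i]; state *= 1099511628211ULL; }
        return *this;
    }
    std::uint64_t Finalize() const { return state; }
};

// The pattern from the PR: absorb the 64-byte nonce once at construction,
// then copy the pre-salted midstate for each cache entry instead of
// rehashing the nonce on every lookup.
class SaltedCacheSketch {
    ToyHasher m_salted_hasher;  // pre-salted once in the constructor
public:
    explicit SaltedCacheSketch(const unsigned char (&nonce)[64])
    {
        m_salted_hasher.Write(nonce, 64);
    }
    std::uint64_t ComputeEntry(const unsigned char* data, std::size_t len) const
    {
        ToyHasher h = m_salted_hasher;  // copy, leaving the salt intact
        return h.Write(data, len).Finalize();
    }
};
```

Each `ComputeEntry` call yields the same digest as hashing nonce-then-data in one pass, which is why the change is behavior-preserving.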
Force-pushed from 5b87767 to dec5f96
@MarcoFalke nits addressed
utACK dec5f96. Only changes since last review were adding the suggested `m_` prefixes and making the RegularPadded case closer to what it's supposed to measure.
ACK dec5f96. I've tested the change locally and run the benchmark on an i7-7700HQ CPU @ 2.80GHz (Debian), and got the following results (yes, PrePadded is faster):
# Benchmark, evals, iterations, total, min, max, median
RegularPadded, 5, 10000, 0.0195961, 3.75465e-07, 4.17618e-07, 3.83324e-07
# Benchmark, evals, iterations, total, min, max, median
PrePadded, 5, 10000, 0.0114673, 1.9201e-07, 2.7354e-07, 2.1187e-07
Force-pushed from 12a8340 to 82740bd
Force-pushed from 82740bd to 152e8ba
Use salted hasher instead of nonce in sigcache

Use salted hasher instead of nonce in Script Execution Cache
Don't read more than 32 bytes from GetRand
Apply g_* naming convention to scriptExecutionCache in validation.cpp
Fully apply g_* naming convention to scriptCacheHasher
Write same uint256 nonce twice for cache hash rather than calling getrand twice
Code review ACK 152e8ba. No code changes, just rebase since last review and expanded commit message
I think this is ready to merge @MarcoFalke. All outstanding feedback has been addressed, and it has two "fresh" ACKs and a handful of stale utACKs/makes-senses.
Any agreement / disagreement?
On arm:
152e8ba Use salted hasher instead of nonce in sigcache (Jeremy Rubin)
5495fa5 Add Hash Padding Microbenchmarks (Jeremy Rubin)

Pull request description:

This PR replaces nonces in two places with pre-salted hashers.

The nonce is chosen to be 64 bytes long so that it forces the SHA256 hasher to process the chunk. This leaves the next 64 (or 56, if it is the final chunk) bytes open for data. In the case of the script execution cache, this does not make a big performance improvement because the nonce was already properly padded to fit into one buffer, but it does make the code a little simpler. In the case of the sigcache, this should reduce the hashing overhead slightly because we are less likely to need an additional processing step.

I haven't benchmarked this, but back of the envelope it should reduce the hashing by one buffer for all combinations except compressed public keys with compact signatures.

ACKs for top commit:
ryanofsky: Code review ACK 152e8ba. No code changes, just rebase since last review and expanded commit message

Tree-SHA512: b133e902fd595cfe3b54ad8814b823f4d132cb2c358c89158842ae27daee56ab5f70cde2585078deb46f77a6e7b35b4cc6bba47b65302b7befc2cff254bad93d
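The back-of-the-envelope claim in the description can be spot-checked with SHA256's block arithmetic. The helper below is illustrative; sizes assume a 32-byte signature hash, 33/65-byte public keys, and 72-byte DER or 64-byte compact signatures:

```cpp
#include <cstddef>

// SHA256 compresses 64-byte blocks; finalization appends one 0x80 byte and
// an 8-byte length field, so a message of `len` bytes costs this many
// compression rounds. Because the pre-salted 64-byte nonce ends exactly on
// a block boundary, the per-entry cost with pre-salting is
// Sha256Blocks(len) for `len` bytes of entry data, versus
// Sha256Blocks(32 + len) with the old 32-byte nonce hashed per entry.
constexpr std::size_t Sha256Blocks(std::size_t len)
{
    return (len + 1 + 8 + 63) / 64;
}
```

Spot-checking two combinations: an uncompressed key with a 72-byte DER signature (32 + 65 + 72 = 169 bytes of entry data) drops from 4 rounds per entry to 3, while a compressed key with a compact signature (32 + 33 + 64 = 129 bytes) stays at 3 either way, the exception called out in the description.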
Summary:

This PR replaces nonces in two places with pre-salted hashers.

The nonce is chosen to be 64 bytes long so that it forces the SHA256 hasher to process the chunk. This leaves the next 64 (or 56 depending if final chunk) open for data. In the case of the script execution cache, this does not make a big performance improvement because the nonce was already properly padded to fit into one buffer, but does make the code a little simpler. In the case of the sig cache, this should reduce the hashing overhead slightly because we are less likely to need an additional processing step.

Backport of core [[bitcoin/bitcoin#13204 | PR13204]].

Test Plan:
ninja all check-all
ninja bench-bitcoin

Reviewers: #bitcoin_abc, majcosta

Reviewed By: #bitcoin_abc, majcosta

Differential Revision: https://reviews.bitcoinabc.org/D9212