Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster sigcache nonce #13204

Merged
merged 2 commits into from Jun 2, 2020
Merged

Conversation

JeremyRubin
Copy link
Contributor

@JeremyRubin JeremyRubin commented May 9, 2018

This PR replaces nonces in two places with pre-salted hashers.

The nonce is chosen to be 64 bytes long so that it forces the SHA256 hasher to process the chunk. This leaves the next 64 (or 56 depending if final chunk) open for data. In the case of the script execution cache, this does not make a big performance improvement because the nonce was already properly padded to fit into one buffer, but does make the code a little simpler. In the case of the sig cache, this should reduce the hashing overhead slightly because we are less likely to need an additional processing step.

I haven't benchmarked this, but back of the envelope it should reduce the hashing by one buffer for all combinations except compressed public keys with compact signatures.

Copy link
Contributor

@jimpo jimpo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 9c8cba8

typedef CuckooCache::cache<uint256, SignatureCacheHasher> map_type;
map_type setValid;
boost::shared_mutex cs_sigcache;

public:
CSignatureCache()
{
GetRandBytes(nonce.begin(), 32);
base_blob<64*8> nonce;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: unsigned char nonce[64] ought to work fine.

laanwj added a commit that referenced this pull request May 23, 2018
fac1223 Cache witness hash in CTransaction (MarcoFalke)
faab55f Make CMutableTransaction constructor explicit (MarcoFalke)

Pull request description:

  This speeds up:
  * compactblocks (v2)
  * ATMP
  * validation and miner (via `BlockWitnessMerkleRoot`)
  * sigcache (see also unrelated #13204)
  * rpc and rest (nice, but irrelevant)

  This presumably slows down rescan, which uses a `CTransaction` and its `GetHash`, but never uses the `GetWitnessHash`. The slow down is proportional to the number of witness transactions in the rescan window. I.e. early in the chain there should be no measurable slow down. Later in the chain, there should be a slow down, but acceptable given the speedups in the modules mentioned above.

Tree-SHA512: 443e86acfcceb5af2163e68840c581d44159af3fd1fce266cab3504b29fcd74c50812b69a00d41582e7e1c5ea292f420ce5e892cdfab691da9c24ed1c44536c7
@DrahtBot DrahtBot closed this Jul 22, 2018
@DrahtBot DrahtBot reopened this Jul 22, 2018
@DrahtBot
Copy link
Contributor

DrahtBot commented Sep 7, 2018

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@sipa
Copy link
Member

sipa commented Nov 10, 2018

Benchmarks would be welcome.

@DrahtBot DrahtBot closed this Apr 28, 2019
@DrahtBot
Copy link
Contributor

The last travis run for this pull request was 279 days ago and is thus outdated. To trigger a fresh travis build, this pull request should be closed and re-opened.

@DrahtBot DrahtBot reopened this Apr 28, 2019
@maflcko
Copy link
Member

maflcko commented Apr 29, 2019

This will be closed due to inactivity in two weeks

@maflcko
Copy link
Member

maflcko commented May 7, 2019

Closing for now. Let me know when you want this reopened to work on it again.

@maflcko maflcko closed this May 7, 2019
@gmaxwell
Copy link
Contributor

gmaxwell commented May 7, 2019

This seems like an obvious (minor) win. It does need to be benchmarked, but it could just be an informal test, not some benchmark tool checked in.

@maflcko
Copy link
Member

maflcko commented May 7, 2019

The tests were failing, so at the very least this needs to be fixed:

test_bitcoin-qt: random.cpp:577: void ProcRand(unsigned char*, int, RNGLevel): Assertion `num <= 32' failed.

@maflcko maflcko reopened this May 7, 2019
@jamesob
Copy link
Member

jamesob commented May 7, 2019

Oddly, pruned IBD (500_000 - 505_000) comparison indicates this PR slightly slower than master. Maybe I'll try again after a rebase?

nonce ibd local 500000 505000 dbcache=2048

faster-sigcache-nonce vs. master (absolute)

name iterations faster-sigcache-nonce master
build.make.23.clang 1 125.9031 (± 0.0000) 135.5557 (± 0.0000)
build.make.23.clang.mem-usage 1 700992.0000 (± 0.0000) 647144.0000 (± 0.0000)
ibd.local.500000.505000.dbcache=2048 3 472.4958 (± 2.4356) 439.0951 (± 2.0998)
ibd.local.500000.505000.dbcache=2048.mem-usage 3 2434952.0000 (± 2700.1412) 2417438.6667 (± 1709.0577)

faster-sigcache-nonce vs. master (relative)

name iterations faster-sigcache-nonce master
build.make.23.clang 1 1.000 1.08
build.make.23.clang.mem-usage 1 1.083 1.00
ibd.local.500000.505000.dbcache=2048 3 1.076 1.00
ibd.local.500000.505000.dbcache=2048.mem-usage 3 1.007 1.00

@JeremyRubin
Copy link
Contributor Author

Rebased -- go ahead and retry!

src/validation.cpp Outdated Show resolved Hide resolved
@gmaxwell
Copy link
Contributor

gmaxwell commented May 7, 2019

Irrelevant nit: I'd recommend just zerofilling the input rather than calling getrand again. 256-bits is more than then enough (and less wouldn't make getrand any faster).

@jamesob
Copy link
Member

jamesob commented May 8, 2019

Current revision fails to build. Benched the rebase and didn't see much difference between master and this branch, though I don't like how much variance there was so I'm going to run again once this compiles.

ibd local 500000 505000 dbcache=2048

@JeremyRubin
Copy link
Contributor Author

fixed build issue; travis needs to be kicked @MarcoFalke

@maflcko
Copy link
Member

maflcko commented May 9, 2019

kicked travis, @jamesob needs to be kicked

@maflcko
Copy link
Member

maflcko commented May 9, 2019

I'd prefer if the commits were squashed, so that git bisect would't break on half of them

@JeremyRubin
Copy link
Contributor Author

I added a contrived microbenchmark which shows:

# Benchmark, evals, iterations, total, min, max, median
PrePadded, 5, 10000, 0.00320418, 6.18847e-08, 7.06114e-08, 6.28207e-08
RegularPadded, 5, 10000, 0.00560628, 1.09935e-07, 1.18426e-07, 1.10481e-07

Pre padding is indeed better than regular padding.

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 5b87767. Only changes since last review are new comments and benchmark. The new comments look good! And the new microbenchmark should be a sufficient sanity check for performance of this change, along with the previous neutral IBD results. Like I wrote in #13204 (review), I think this change seems nice even as a simple code cleanup.

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding a benchmark. Just one question about it.

Would be nice to add the benchmark in the first commit, so that performance can be compared.

while (state.KeepRunning()) {
unsigned char out[32];
CSHA256 h = hasher;
hasher.Write(nonce.begin(), 32);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

h or hasher? If hasher is intended, it looks like that this benchmark will perform as good as the previous one on every other run. Might want to add a comment to explain why.
If h is intended, could make hasher const?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah you're right. good catch. Let me fix that...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can't be made const because it needs to be updated a little bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well it can be for one of the tests, and not the other. But I want to keep the code as similar as possible between pre/regular.

@@ -24,21 +24,27 @@ class CSignatureCache
{
private:
//! Entries are SHA256(nonce || signature hash || public key || signature):
uint256 nonce;
CSHA256 salted_hasher;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
CSHA256 salted_hasher;
CSHA256 m_salted_hasher;

@JeremyRubin
Copy link
Contributor Author

@MarcoFalke because of modularization, there's no real interfaces exposed for me to run in a bench, so the benchmark just had both techniques demonstrated (doesn't really matter if the commit comes before or after, the code changes don't affect the benchmark).

@JeremyRubin
Copy link
Contributor Author

@MarcoFalke nits addressed

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK dec5f96. Only changes since last review were adding suggested m_ prefixes and making the RegularPadded case closer to what it's supposed to measure.

Copy link
Member

@ariard ariard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK dec5f96, I've tested change locally and run benchmark on a i7-7700HQ CPU @ 2.80GHz (Debian), got the following results (yes PrePadded is faster):

# Benchmark, evals, iterations, total, min, max, median
RegularPadded, 5, 10000, 0.0195961, 3.75465e-07, 4.17618e-07, 3.83324e-07
# Benchmark, evals, iterations, total, min, max, median
PrePadded, 5, 10000, 0.0114673, 1.9201e-07, 2.7354e-07, 2.1187e-07

@JeremyRubin JeremyRubin force-pushed the faster-sigcache-nonce branch 2 times, most recently from 12a8340 to 82740bd Compare April 29, 2020 07:31
Use salted hasher instead of nonce in Script Execution Cache

Don't read more than 32 bytes from GetRand

Apply g_* naming convention to scriptExecutionCache in validation.cpp

Fully apply g_* naming convention to scriptCacheHasher

Write same uint256 nonce twice for cache hash rather than calling getrand twice

Use salted hasher instead of nonce in sigcache

Use salted hasher instead of nonce in Script Execution Cache

Don't read more than 32 bytes from GetRand

Apply g_* naming convention to scriptExecutionCache in validation.cpp

Fully apply g_* naming convention to scriptCacheHasher

Write same uint256 nonce twice for cache hash rather than calling getrand twice
Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 152e8ba. No code changes, just rebase since last review and expanded commit message

@JeremyRubin
Copy link
Contributor Author

I think this is ready to merge @MarcoFalke.

All outstanding feedback has been addressed & has two "fresh" ACKs and a handful of stale utACKs/makes senses.

@ryanofsky
Copy link
Contributor

I think this is ready to merge @MarcoFalke.

All outstanding feedback has been addressed & has two "fresh" ACKs and a handful of stale utACKs/makes senses.

Any agreement / disagreement?

@maflcko maflcko merged commit 9e8bd21 into bitcoin:master Jun 2, 2020
@maflcko
Copy link
Member

maflcko commented Jun 2, 2020

On arm:

$ ./src/bench/bench_bitcoin --filter=.*Padded
# Benchmark, evals, iterations, total, min, max, median
PrePadded, 5, 10000, 0.0305467, 6.09309e-07, 6.13757e-07, 6.10061e-07
RegularPadded, 5, 10000, 0.057972, 1.15744e-06, 1.16534e-06, 1.15813e-06

sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jun 2, 2020
152e8ba Use salted hasher instead of nonce in sigcache (Jeremy Rubin)
5495fa5 Add Hash Padding Microbenchmarks (Jeremy Rubin)

Pull request description:

  This PR replaces nonces in two places with pre-salted hashers.

  The nonce is chosen to be 64 bytes long so that it forces the SHA256 hasher to process the chunk. This leaves the next 64 (or 56 depending if final chunk) open for data. In the case of the script execution cache, this does not make a big performance improvement because the nonce was already properly padded to fit into one buffer, but does make the code a little simpler. In the case of the sig cache, this should reduce the hashing overhead slightly because we are less likely to need an additional processing step.

  I haven't benchmarked this, but back of the envelope it should reduce the hashing by one buffer for all combinations except compressed public keys with compact signatures.

ACKs for top commit:
  ryanofsky:
    Code review ACK 152e8ba. No code changes, just rebase since last review and expanded commit message

Tree-SHA512: b133e902fd595cfe3b54ad8814b823f4d132cb2c358c89158842ae27daee56ab5f70cde2585078deb46f77a6e7b35b4cc6bba47b65302b7befc2cff254bad93d
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Feb 12, 2021
Summary:
```
This PR replaces nonces in two places with pre-salted hashers.

The nonce is chosen to be 64 bytes long so that it forces the SHA256
hasher to process the chunk. This leaves the next 64 (or 56 depending if
final chunk) open for data. In the case of the script execution cache,
this does not make a big performance improvement because the nonce was
already properly padded to fit into one buffer, but does make the code a
little simpler. In the case of the sig cache, this should reduce the
hashing overhead slightly because we are less likely to need an
additional processing step.
```

Backport of core [[bitcoin/bitcoin#13204 | PR13204]].

Test Plan:
  ninja all check-all
  ninja bench-bitcoin

Reviewers: #bitcoin_abc, majcosta

Reviewed By: #bitcoin_abc, majcosta

Differential Revision: https://reviews.bitcoinabc.org/D9212
UdjinM6 pushed a commit to UdjinM6/dash that referenced this pull request Jun 18, 2021
fac1223 Cache witness hash in CTransaction (MarcoFalke)
faab55f Make CMutableTransaction constructor explicit (MarcoFalke)

Pull request description:

  This speeds up:
  * compactblocks (v2)
  * ATMP
  * validation and miner (via `BlockWitnessMerkleRoot`)
  * sigcache (see also unrelated bitcoin#13204)
  * rpc and rest (nice, but irrelevant)

  This presumably slows down rescan, which uses a `CTransaction` and its `GetHash`, but never uses the `GetWitnessHash`. The slow down is proportional to the number of witness transactions in the rescan window. I.e. early in the chain there should be no measurable slow down. Later in the chain, there should be a slow down, but acceptable given the speedups in the modules mentioned above.

Tree-SHA512: 443e86acfcceb5af2163e68840c581d44159af3fd1fce266cab3504b29fcd74c50812b69a00d41582e7e1c5ea292f420ce5e892cdfab691da9c24ed1c44536c7
UdjinM6 pushed a commit to UdjinM6/dash that referenced this pull request Jun 24, 2021
fac1223 Cache witness hash in CTransaction (MarcoFalke)
faab55f Make CMutableTransaction constructor explicit (MarcoFalke)

Pull request description:

  This speeds up:
  * compactblocks (v2)
  * ATMP
  * validation and miner (via `BlockWitnessMerkleRoot`)
  * sigcache (see also unrelated bitcoin#13204)
  * rpc and rest (nice, but irrelevant)

  This presumably slows down rescan, which uses a `CTransaction` and its `GetHash`, but never uses the `GetWitnessHash`. The slow down is proportional to the number of witness transactions in the rescan window. I.e. early in the chain there should be no measurable slow down. Later in the chain, there should be a slow down, but acceptable given the speedups in the modules mentioned above.

Tree-SHA512: 443e86acfcceb5af2163e68840c581d44159af3fd1fce266cab3504b29fcd74c50812b69a00d41582e7e1c5ea292f420ce5e892cdfab691da9c24ed1c44536c7
UdjinM6 pushed a commit to UdjinM6/dash that referenced this pull request Jun 26, 2021
fac1223 Cache witness hash in CTransaction (MarcoFalke)
faab55f Make CMutableTransaction constructor explicit (MarcoFalke)

Pull request description:

  This speeds up:
  * compactblocks (v2)
  * ATMP
  * validation and miner (via `BlockWitnessMerkleRoot`)
  * sigcache (see also unrelated bitcoin#13204)
  * rpc and rest (nice, but irrelevant)

  This presumably slows down rescan, which uses a `CTransaction` and its `GetHash`, but never uses the `GetWitnessHash`. The slow down is proportional to the number of witness transactions in the rescan window. I.e. early in the chain there should be no measurable slow down. Later in the chain, there should be a slow down, but acceptable given the speedups in the modules mentioned above.

Tree-SHA512: 443e86acfcceb5af2163e68840c581d44159af3fd1fce266cab3504b29fcd74c50812b69a00d41582e7e1c5ea292f420ce5e892cdfab691da9c24ed1c44536c7
UdjinM6 pushed a commit to UdjinM6/dash that referenced this pull request Jun 26, 2021
fac1223 Cache witness hash in CTransaction (MarcoFalke)
faab55f Make CMutableTransaction constructor explicit (MarcoFalke)

Pull request description:

  This speeds up:
  * compactblocks (v2)
  * ATMP
  * validation and miner (via `BlockWitnessMerkleRoot`)
  * sigcache (see also unrelated bitcoin#13204)
  * rpc and rest (nice, but irrelevant)

  This presumably slows down rescan, which uses a `CTransaction` and its `GetHash`, but never uses the `GetWitnessHash`. The slow down is proportional to the number of witness transactions in the rescan window. I.e. early in the chain there should be no measurable slow down. Later in the chain, there should be a slow down, but acceptable given the speedups in the modules mentioned above.

Tree-SHA512: 443e86acfcceb5af2163e68840c581d44159af3fd1fce266cab3504b29fcd74c50812b69a00d41582e7e1c5ea292f420ce5e892cdfab691da9c24ed1c44536c7
UdjinM6 pushed a commit to UdjinM6/dash that referenced this pull request Jun 28, 2021
fac1223 Cache witness hash in CTransaction (MarcoFalke)
faab55f Make CMutableTransaction constructor explicit (MarcoFalke)

Pull request description:

  This speeds up:
  * compactblocks (v2)
  * ATMP
  * validation and miner (via `BlockWitnessMerkleRoot`)
  * sigcache (see also unrelated bitcoin#13204)
  * rpc and rest (nice, but irrelevant)

  This presumably slows down rescan, which uses a `CTransaction` and its `GetHash`, but never uses the `GetWitnessHash`. The slow down is proportional to the number of witness transactions in the rescan window. I.e. early in the chain there should be no measurable slow down. Later in the chain, there should be a slow down, but acceptable given the speedups in the modules mentioned above.

Tree-SHA512: 443e86acfcceb5af2163e68840c581d44159af3fd1fce266cab3504b29fcd74c50812b69a00d41582e7e1c5ea292f420ce5e892cdfab691da9c24ed1c44536c7
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants