Faster sigcache nonce #13204
Conversation
utACK 9c8cba8
src/script/sigcache.cpp
Outdated
typedef CuckooCache::cache<uint256, SignatureCacheHasher> map_type;
map_type setValid;
boost::shared_mutex cs_sigcache;

public:
CSignatureCache()
{
    GetRandBytes(nonce.begin(), 32);
base_blob<64*8> nonce;
nit: `unsigned char nonce[64]` ought to work fine.
fac1223 Cache witness hash in CTransaction (MarcoFalke)
faab55f Make CMutableTransaction constructor explicit (MarcoFalke)

Pull request description:

This speeds up:
* compactblocks (v2)
* ATMP
* validation and miner (via `BlockWitnessMerkleRoot`)
* sigcache (see also unrelated #13204)
* rpc and rest (nice, but irrelevant)

This presumably slows down rescan, which uses a `CTransaction` and its `GetHash`, but never uses the `GetWitnessHash`. The slowdown is proportional to the number of witness transactions in the rescan window, i.e. early in the chain there should be no measurable slowdown. Later in the chain there should be a slowdown, but an acceptable one given the speedups in the modules mentioned above.

Tree-SHA512: 443e86acfcceb5af2163e68840c581d44159af3fd1fce266cab3504b29fcd74c50812b69a00d41582e7e1c5ea292f420ce5e892cdfab691da9c24ed1c44536c7
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
Benchmarks would be welcome.
The last travis run for this pull request was 279 days ago and is thus outdated. To trigger a fresh travis build, this pull request should be closed and re-opened.
This will be closed due to inactivity in two weeks.
Closing for now. Let me know when you want this reopened to work on it again.
This seems like an obvious (minor) win. It does need to be benchmarked, but it could just be an informal test, not some benchmark tool checked in.
The tests were failing, so at the very least this needs to be fixed:
Oddly, a pruned IBD (500_000 - 505_000) comparison indicates this PR is slightly slower than master. Maybe I'll try again after a rebase?

[chart: faster-sigcache-nonce vs. master (absolute)]
[chart: faster-sigcache-nonce vs. master (relative)]
Force-pushed from 9c8cba8 to f6b9da7
Rebased -- go ahead and retry!
Irrelevant nit: I'd recommend just zero-filling the input rather than calling getrand again. 256 bits is more than enough (and less wouldn't make getrand any faster).
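The suggestion above can be sketched as follows. The function name and the RNG callback are hypothetical stand-ins (the real code calls Bitcoin's `GetRandBytes` directly); the point is a single 32-byte draw plus zero-fill:

```cpp
#include <array>
#include <cstring>

// Sketch of the suggested nonce construction (illustrative only): one
// 32-byte draw from the RNG, remainder zero-filled. 256 bits of entropy
// already exceeds SHA256's security level, so a second RNG call would add
// cost without adding meaningful security.
std::array<unsigned char, 64> MakeSaltedNonce(void (*get_rand_bytes)(unsigned char*, int))
{
    std::array<unsigned char, 64> nonce;
    get_rand_bytes(nonce.data(), 32);        // random first half
    std::memset(nonce.data() + 32, 0, 32);   // zero-filled second half
    return nonce;
}
```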
Fixed the build issue; travis needs to be kicked @MarcoFalke
Kicked travis; @jamesob needs to be kicked
I'd prefer if the commits were squashed, so that git bisect wouldn't break on half of them.
I added a contrived microbenchmark which shows:

Pre-padding is indeed better than regular padding.
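The effect the microbenchmark measures can be reproduced with a toy block-counting hasher, a hypothetical stand-in for `CSHA256` (real SHA256 padding can occasionally cost two final blocks, which this sketch ignores):

```cpp
#include <cstddef>

// Toy hasher standing in for CSHA256 (illustration only): it counts how
// many 64-byte compression rounds a sequence of writes costs, which is
// the quantity pre-padding reduces.
struct CountingHasher {
    std::size_t buffered = 0;  // bytes in the current partial block
    std::size_t blocks = 0;    // compression rounds performed
    void Write(std::size_t len) { buffered += len; blocks += buffered / 64; buffered %= 64; }
    void Finalize() { ++blocks; }  // simplified: one padding round
};

// Regular padding: the 64-byte nonce is rehashed on every lookup.
std::size_t RegularCost(std::size_t data_len)
{
    CountingHasher h;
    h.Write(64);        // nonce, paid per lookup
    h.Write(data_len);
    h.Finalize();
    return h.blocks;
}

// Pre-padding: the nonce was absorbed once at startup; each lookup copies
// the midstate and pays only for its own data.
std::size_t PrePaddedCost(const CountingHasher& salted, std::size_t data_len)
{
    CountingHasher h = salted;         // cheap state copy
    const std::size_t before = h.blocks;
    h.Write(data_len);
    h.Finalize();
    return h.blocks - before;          // marginal per-lookup cost
}
```

For 96 bytes of per-lookup data this counts 3 rounds for regular padding versus 2 for pre-padding, which matches the direction of the benchmark numbers.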
utACK 5b87767. Only changes since last review are new comments and a benchmark. The new comments look good! And the new microbenchmark should be a sufficient sanity check for the performance of this change, along with the previous neutral IBD results. Like I wrote in #13204 (review), I think this change seems nice even as a simple code cleanup.
Thanks for adding a benchmark. Just one question about it.
Would be nice to add the benchmark in the first commit, so that performance can be compared.
src/bench/hashpadding.cpp
Outdated
while (state.KeepRunning()) {
    unsigned char out[32];
    CSHA256 h = hasher;
    hasher.Write(nonce.begin(), 32);
`h` or `hasher`? If `hasher` is intended, it looks like this benchmark will perform as well as the previous one on every other run. Might want to add a comment to explain why.

If `h` is intended, could `hasher` be made `const`?
Ah, you're right. Good catch. Let me fix that...
It can't be made const because it needs to be updated a little bit.
Well, it can be `const` for one of the tests, but not the other. I want to keep the code as similar as possible between the pre/regular cases.
src/script/sigcache.cpp
Outdated
@@ -24,21 +24,27 @@ class CSignatureCache
 {
 private:
     //! Entries are SHA256(nonce || signature hash || public key || signature):
-    uint256 nonce;
+    CSHA256 salted_hasher;
-    CSHA256 salted_hasher;
+    CSHA256 m_salted_hasher;
@MarcoFalke because of modularization, there are no real interfaces exposed for me to run in a bench, so the benchmark just has both techniques demonstrated (it doesn't really matter whether the commit comes before or after; the code changes don't affect the benchmark).
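The salted-hasher pattern under review can be sketched end-to-end with a stand-in hasher. The `ToyHasher` below is hypothetical (the real code uses `CSHA256`); all the pattern requires is that the hasher's internal state can be copied cheaply:

```cpp
#include <cstddef>
#include <cstdint>

// Minimal copyable stand-in for CSHA256 (hypothetical, FNV-1a based).
struct ToyHasher {
    std::uint64_t state = 1469598103934665603ULL;  // FNV-1a offset basis
    ToyHasher& Write(const unsigned char* p, std::size_t n)
    {
        for (std::size_t i = 0; i < n; ++i) { state ^= p[i]; state *= 1099511628211ULL; }
        return *this;
    }
    std::uint64_t Finalize() const { return state; }
};

// The pattern from the PR: absorb the 64-byte nonce once at construction,
// then copy the pre-salted midstate for each cache entry instead of
// rehashing the nonce on every lookup.
class SaltedCacheSketch {
    ToyHasher m_salted_hasher;  // pre-salted once in the constructor
public:
    explicit SaltedCacheSketch(const unsigned char (&nonce)[64])
    {
        m_salted_hasher.Write(nonce, 64);
    }
    std::uint64_t ComputeEntry(const unsigned char* data, std::size_t len) const
    {
        ToyHasher h = m_salted_hasher;  // copy, leaving the salt intact
        return h.Write(data, len).Finalize();
    }
};
```

Each `ComputeEntry` call yields the same digest as hashing nonce-then-data in one pass, which is why the change is behavior-preserving.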
Force-pushed from 5b87767 to dec5f96
@MarcoFalke nits addressed
utACK dec5f96. Only changes since last review were adding the suggested `m_` prefixes and making the RegularPadded case closer to what it's supposed to measure.
ACK dec5f96. I've tested the change locally and run the benchmark on an i7-7700HQ CPU @ 2.80GHz (Debian), and got the following results (yes, PrePadded is faster):
# Benchmark, evals, iterations, total, min, max, median
RegularPadded, 5, 10000, 0.0195961, 3.75465e-07, 4.17618e-07, 3.83324e-07
# Benchmark, evals, iterations, total, min, max, median
PrePadded, 5, 10000, 0.0114673, 1.9201e-07, 2.7354e-07, 2.1187e-07
Force-pushed from 12a8340 to 82740bd
Force-pushed from 82740bd to 152e8ba
Use salted hasher instead of nonce in sigcache

Use salted hasher instead of nonce in Script Execution Cache
Don't read more than 32 bytes from GetRand
Apply g_* naming convention to scriptExecutionCache in validation.cpp
Fully apply g_* naming convention to scriptCacheHasher
Write same uint256 nonce twice for cache hash rather than calling getrand twice
Code review ACK 152e8ba. No code changes, just rebase since last review and expanded commit message
I think this is ready to merge @MarcoFalke. All outstanding feedback has been addressed, and it has two "fresh" ACKs and a handful of stale utACKs/makes-senses.
Any agreement / disagreement?
On arm:
152e8ba Use salted hasher instead of nonce in sigcache (Jeremy Rubin)
5495fa5 Add Hash Padding Microbenchmarks (Jeremy Rubin)

Pull request description:

This PR replaces nonces in two places with pre-salted hashers.

The nonce is chosen to be 64 bytes long so that it forces the SHA256 hasher to process the chunk. This leaves the next 64 (or 56, if it is the final chunk) bytes open for data. In the case of the script execution cache, this does not make a big performance improvement because the nonce was already properly padded to fit into one buffer, but it does make the code a little simpler. In the case of the sigcache, this should reduce the hashing overhead slightly because we are less likely to need an additional processing step.

I haven't benchmarked this, but back of the envelope it should reduce the hashing by one buffer for all combinations except compressed public keys with compact signatures.

ACKs for top commit:
ryanofsky: Code review ACK 152e8ba. No code changes, just rebase since last review and expanded commit message

Tree-SHA512: b133e902fd595cfe3b54ad8814b823f4d132cb2c358c89158842ae27daee56ab5f70cde2585078deb46f77a6e7b35b4cc6bba47b65302b7befc2cff254bad93d
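The back-of-the-envelope claim in the description can be spot-checked with SHA256's block arithmetic. The helper below is illustrative; sizes assume a 32-byte signature hash, 33/65-byte public keys, and 72-byte DER or 64-byte compact signatures:

```cpp
#include <cstddef>

// SHA256 compresses 64-byte blocks; finalization appends one 0x80 byte and
// an 8-byte length field, so a message of `len` bytes costs this many
// compression rounds. Because the pre-salted 64-byte nonce ends exactly on
// a block boundary, the per-entry cost with pre-salting is
// Sha256Blocks(len) for `len` bytes of entry data, versus
// Sha256Blocks(32 + len) with the old 32-byte nonce hashed per entry.
constexpr std::size_t Sha256Blocks(std::size_t len)
{
    return (len + 1 + 8 + 63) / 64;
}
```

Spot-checking two combinations: an uncompressed key with a 72-byte DER signature (32 + 65 + 72 = 169 bytes of entry data) drops from 4 rounds per entry to 3, while a compressed key with a compact signature (32 + 33 + 64 = 129 bytes) stays at 3 either way, the exception called out in the description.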
Summary:

This PR replaces nonces in two places with pre-salted hashers.

The nonce is chosen to be 64 bytes long so that it forces the SHA256 hasher to process the chunk. This leaves the next 64 (or 56 depending if final chunk) open for data. In the case of the script execution cache, this does not make a big performance improvement because the nonce was already properly padded to fit into one buffer, but does make the code a little simpler. In the case of the sig cache, this should reduce the hashing overhead slightly because we are less likely to need an additional processing step.

Backport of core [[bitcoin/bitcoin#13204 | PR13204]].

Test Plan:
ninja all check-all
ninja bench-bitcoin

Reviewers: #bitcoin_abc, majcosta

Reviewed By: #bitcoin_abc, majcosta

Differential Revision: https://reviews.bitcoinabc.org/D9212