Cache full script execution results in addition to signatures #10192

Merged
laanwj merged 7 commits into bitcoin:master on Jun 29, 2017

@TheBlueMatt
Contributor

TheBlueMatt commented Apr 11, 2017

This adds a new CuckooCache in validation, caching whether all of a
transaction's scripts were valid with a given set of script flags.

Unlike previous attempts at caching an entire transaction's
validity, which have nearly universally introduced consensus
failures, this only caches the validity of a transaction's
scriptSigs. As these are pure functions of the transaction and
data it commits to, this should be much safer.

This is somewhat duplicative of the sigcache, as entries in the
new cache will also have several entries in the sigcache. However,
the sigcache is kept, both because ATMP (AcceptToMemoryPool) relies
on it and because it prevents malleability-based DoS attacks on the
new higher-level cache. Rather than adding a new option, the
-maxsigcachesize option is reused: the sigcache size is cut in half
and the newly freed memory is used for the script execution cache.

Transactions which match the script execution cache never even have
entries in the script check thread's workqueue created.

Note that the cache is indexed only on the script execution flags
and the transaction's witness hash. While this is sufficient to
make the CScriptCheck() calls pure functions, this introduces
dependencies on the mempool calculating things such as the
PrecomputedTransactionData object, filling the CCoinsViewCache, etc.,
in the exact same way as ConnectBlock. I believe this is a reasonable
assumption, but it should be noted carefully.
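
A rough illustration of the indexing described above: the sketch below models a cache whose key is the pair (witness hash, script flags), so a result stored under one flag set is never returned for another. This is not the PR's code. `WitnessHash`, `ScriptCacheKey`, `ScriptExecutionCacheModel`, and the FNV-1a hasher are hypothetical names chosen for illustration, and `std::unordered_set` stands in for the CuckooCache.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Hypothetical stand-in for a 256-bit witness hash (wtxid).
using WitnessHash = std::array<uint8_t, 32>;

// Hypothetical cache key: the witness hash plus the script verification
// flags the transaction was validated under.
struct ScriptCacheKey {
    WitnessHash wtxid;
    uint32_t flags;
    bool operator==(const ScriptCacheKey& other) const
    {
        return flags == other.flags && wtxid == other.wtxid;
    }
};

// FNV-1a over the key bytes. Illustration only: a real cache would use a
// salted cryptographic hash so that peers cannot force collisions.
struct ScriptCacheKeyHasher {
    std::size_t operator()(const ScriptCacheKey& key) const
    {
        uint64_t h = 1469598103934665603ULL;
        for (uint8_t byte : key.wtxid) {
            h = (h ^ byte) * 1099511628211ULL;
        }
        for (int i = 0; i < 4; ++i) {
            h = (h ^ ((key.flags >> (8 * i)) & 0xff)) * 1099511628211ULL;
        }
        return static_cast<std::size_t>(h);
    }
};

class ScriptExecutionCacheModel
{
public:
    void Insert(const WitnessHash& wtxid, uint32_t flags)
    {
        m_entries.insert(ScriptCacheKey{wtxid, flags});
    }
    bool Contains(const WitnessHash& wtxid, uint32_t flags) const
    {
        return m_entries.count(ScriptCacheKey{wtxid, flags}) > 0;
    }

private:
    std::unordered_set<ScriptCacheKey, ScriptCacheKeyHasher> m_entries;
};

// Usage: a result cached under one flag set is not a hit for another.
inline void DemoFlagsArePartOfKey()
{
    WitnessHash wtxid{};
    wtxid[0] = 0xab;
    ScriptExecutionCacheModel cache;
    cache.Insert(wtxid, /*flags=*/0x1);
    assert(cache.Contains(wtxid, 0x1));
    assert(!cache.Contains(wtxid, 0x3));
}
```

Because the key covers nothing but the witness hash and the flags, everything else that feeds script execution (the spent outputs, the precomputed transaction data) has to be derived identically on both code paths, which is the assumption flagged above.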

In a rather naive benchmark (reindex-chainstate up to block 284k
with cuckoocache always returning true for contains(),
-assumevalid=0 and a very large dbcache), this connected blocks
~1.7x faster.

void InitScriptExecutionCache() {
// nMaxCacheSize is unsigned. If -maxsigcachesize is set to zero,
// setup_bytes creates the minimum possible cache (2 elements).
size_t nMaxCacheSize = std::min(std::max((int64_t)0, GetArg("-maxsigcachesize", DEFAULT_MAX_SIG_CACHE_SIZE) / 2), MAX_MAX_SIG_CACHE_SIZE) * ((size_t) 1 << 20);

@sipa

sipa Apr 11, 2017

Member

I think the division should be outside of GetArg. Otherwise, if you specify -maxsigcachesize=32, you end up with a total of 64MiB worth of caches.

@gmaxwell

gmaxwell Apr 12, 2017

Member

Good, because the division is outside of the GetArg. :)

@sipa

sipa Apr 13, 2017

Member

It seems I am blind.
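
To make the arithmetic in this exchange concrete, here is a sketch of the sizing. It assumes an upper bound of 16384 MiB (the value of `MAX_MAX_SIG_CACHE_SIZE` is not shown in this thread), and `PerCacheBytes` and `kMaxMaxSigCacheSizeMiB` are hypothetical names; the option value is passed in directly rather than read with `GetArg`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed upper bound in MiB; an illustration value, not taken from the thread.
constexpr int64_t kMaxMaxSigCacheSizeMiB = 16384;

// Bytes given to ONE of the two caches for a given -maxsigcachesize (MiB).
// The option value is halved after it has been read, so the sigcache and
// the script execution cache together stay within the configured total.
inline std::size_t PerCacheBytes(int64_t max_sig_cache_size_mib)
{
    const int64_t half = max_sig_cache_size_mib / 2;
    const int64_t clamped = std::min(std::max<int64_t>(0, half), kMaxMaxSigCacheSizeMiB);
    return static_cast<std::size_t>(clamped) * (static_cast<std::size_t>(1) << 20);
}
```

With -maxsigcachesize=32, each cache gets 16 MiB in this model, so the two together use the configured 32 MiB rather than 64 MiB.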

@TheBlueMatt

Contributor

TheBlueMatt commented Apr 11, 2017

Fixed test_bitcoin segfaulting: it didn't init the script cache the way it does the sigcache.

@fanquake fanquake added the Validation label Apr 12, 2017

@gmaxwell

Member

gmaxwell commented Apr 12, 2017

Concept ACK.

@JeremyRubin

Concept Ack!

@instagibbs

Member

instagibbs commented Apr 12, 2017

concept ACK

@TheBlueMatt

Contributor

TheBlueMatt commented Apr 12, 2017

Addressed Jeremy's comments aside from the request for a wrapper class; I think we need fewer dummy classes, not more :/. Also rebased on #9480.

@sdaftuar

Member

sdaftuar commented Apr 19, 2017

Note that the cache is indexed only on the script execution flags
and the transaction's witness hash. While this is sufficient to
make the CScriptCheck() calls pure functions, this introduces
dependencies on the mempool calculating things such as the
PrecomputedTransactionData object, filling the CCoinsViewCache, etc
in the exact same way as ConnectBlock. I believe this is a reasonable
assumption, but should be noted carefully.

So if I understand correctly, this makes CCoinsViewMemPool, and therefore the mempool itself, consensus critical, which I would prefer to avoid. Would we still get a performance benefit if we were to include the coins being spent in the hash for the index lookup?

Edited: as per offline discussion, perhaps just move CCoinsViewMemPool into validation.cpp to make it clear it's part of consensus, and sanity check the results from the mempool.

@TheBlueMatt

Contributor

TheBlueMatt commented Apr 19, 2017

@sdaftuar I'm not convinced it's a massive concern, but I went ahead and added a wrapper which checks that each scriptPubKey returned by the CCoinsViewCache is the one committed to by the input's prevout hash, which I believe removes that dependency entirely.
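
The kind of sanity check described here might look roughly like the following. It is a simplified model, not the wrapper added in the PR: `TxOut`, `OutPoint`, `Tx`, and `InputsMatchCommittedOutputs` are hypothetical, txids are plain strings rather than hashes, and the lookup tables are ordinary maps.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified transaction types.
struct TxOut {
    int64_t value;
    std::string script_pub_key;
};
inline bool operator==(const TxOut& a, const TxOut& b)
{
    return a.value == b.value && a.script_pub_key == b.script_pub_key;
}

using OutPoint = std::pair<std::string, uint32_t>; // (txid, output index)

struct Tx {
    std::string txid;
    std::vector<OutPoint> inputs;
    std::vector<TxOut> outputs;
};

// `view` holds what the (untrusted) coins view returned for each outpoint.
// `known_txs` holds the transactions that can be looked up by txid.
// Returns true only if every coin handed back by the view equals the output
// that the input's prevout actually refers to.
inline bool InputsMatchCommittedOutputs(const Tx& tx,
                                        const std::map<OutPoint, TxOut>& view,
                                        const std::map<std::string, Tx>& known_txs)
{
    for (const OutPoint& prevout : tx.inputs) {
        const auto coin = view.find(prevout);
        if (coin == view.end()) return false;
        const auto prev_tx = known_txs.find(prevout.first);
        if (prev_tx == known_txs.end()) return false;
        if (prevout.second >= prev_tx->second.outputs.size()) return false;
        if (!(coin->second == prev_tx->second.outputs[prevout.second])) return false;
    }
    return true;
}
```

In this model a coins view that returns the wrong output can at worst cause a rejection; it cannot lead to a script result being cached against data the transaction does not commit to.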

@morcos

Member

morcos commented Apr 25, 2017

Concept ACK
But I'd prefer if there were more safeguards in place against future changes that might cause consensus failure. For instance, I think anything that is input to CScriptCheck should be committed to by the hash. Right now it is the case that anything USED by CScriptCheck is committed to, but there is nothing stopping a future change to CScriptCheck that used the height from the Coins or something and caused problems.

@TheBlueMatt

Contributor

TheBlueMatt commented Apr 27, 2017

@morcos I added an additional commit to only pass the scriptPubKey and nValue from the prevout into CScriptCheck, so hopefully any such future changes would be super clear to reviewers as consensus bugs. Sadly I don't really want to just include the height in the hash, as there are many heights, but I think this is a sufficient change.

@sdaftuar

Member

sdaftuar commented May 19, 2017

Needs rebase

@gmaxwell

Member

gmaxwell commented May 20, 2017

@TheBlueMatt REBASE ME

@TheBlueMatt

Contributor

TheBlueMatt commented May 22, 2017

Rebased :).

@sipa

Member

sipa commented May 24, 2017

Needs moar rebase.

@sdaftuar

Needs (simple) rebase, but code review ACK apart from a couple nits. Will test and profile.

@jtimon

Member

jtimon commented May 25, 2017

#10427 introduces GetScriptFlags like here but with some of my nits solved. If merged first, should make this a little bit smaller and simpler to review.
This needs rebase again.

@laanwj laanwj added this to Blockers in High-priority for review Jun 1, 2017

@luke-jr

Member

luke-jr commented Jun 1, 2017

This sounds like it would break (or at least complicate) CHECKBLOCKVERSION and possibly even CHECKBLOCKATHEIGHT...?

@TheBlueMatt

Contributor

TheBlueMatt commented Jun 21, 2017

@JeremyRubin I have not, that would likely be interesting in the future (as well as possibly not making it an even 1/2-1/2 split in memory usage).

@@ -1666,7 +1779,7 @@ static bool ConnectBlock(const CBlock& block, CValidationState& state, CBlockInd
 std::vector<CScriptCheck> vChecks;
 bool fCacheResults = fJustCheck; /* Don't cache results if we're actually connecting blocks (still consult the cache, though) */
-if (!CheckInputs(tx, state, view, fScriptChecks, flags, fCacheResults, txdata[i], nScriptCheckThreads ? &vChecks : NULL))
+if (!CheckInputs(tx, state, view, fScriptChecks, flags, fCacheResults, fCacheResults, txdata[i], nScriptCheckThreads ? &vChecks : NULL))

@sipa

sipa Jun 22, 2017

Member

I think the second fCacheResults here can be false; as we normally pass a non-NULL pvChecks here, fCacheFullScriptStore = true has no effects anyway. This only affects TestBlockValidity.

@TheBlueMatt

TheBlueMatt Jun 22, 2017

Contributor

fCacheFullScriptStore has a second meaning - it also deletes the element from the cache if a match is found, so we really should pass it through here to avoid deleting (or marking available) cache entries for TBV.

@sipa

sipa Jun 23, 2017

Member

Ah thanks, I missed that.
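
The double role of the flag discussed in this exchange can be modelled as below. The sketch assumes, based on the explanation above, that a hit is erased when the flag is false and kept when it is true; `ExecCacheModel` and `CheckWithCache` are hypothetical names, and the script checks themselves are elided.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Hypothetical model of a cache whose lookup can optionally erase the hit.
class ExecCacheModel
{
public:
    void Insert(uint64_t key) { m_entries.insert(key); }

    // Returns true on a hit; when `erase` is true the hit entry is removed.
    bool Contains(uint64_t key, bool erase)
    {
        const auto it = m_entries.find(key);
        if (it == m_entries.end()) return false;
        if (erase) m_entries.erase(it);
        return true;
    }

private:
    std::unordered_set<uint64_t> m_entries;
};

// `cache_full_script_store` plays the role of fCacheFullScriptStore:
//  - true:  keep an existing entry on a hit, and store the result on a miss;
//  - false: erase the entry on a hit, and store nothing on a miss.
// Script execution itself is elided and assumed to succeed.
inline bool CheckWithCache(ExecCacheModel& cache, uint64_t key, bool cache_full_script_store)
{
    if (cache.Contains(key, /*erase=*/!cache_full_script_store)) {
        return true; // cached: no script checks need to be queued
    }
    // ... script checks would run (or be queued) here ...
    if (cache_full_script_store) cache.Insert(key);
    return true;
}
```

Under this model, passing `true` for a TestBlockValidity-style dry run leaves the entry in place, so the later real block connection can still hit it.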

@sipa

Member

sipa commented Jun 22, 2017

utACK c435d9f apart from nits

TheBlueMatt added some commits Apr 21, 2017

Add CheckInputs wrapper CCoinsViewMemPool -> non-consensus-critical
This wraps CheckInputs in ATMP's cache-inputs call to check that
each scriptPubKey the CCoinsViewCache provides is the one which
was committed to by the input's transaction hash.
@TheBlueMatt

Contributor

TheBlueMatt commented Jun 23, 2017

@sdaftuar pointed out that we could directly test CheckInputs' use of its own cache in unit tests, so I added a rather simple one that did so.

@laanwj laanwj self-assigned this Jun 26, 2017

@sipa

Member

sipa commented Jun 26, 2017

utACK 316d328

@gmaxwell

ACK. It's worth pointing out that we have no system test that fails when the flags are mishandled here-- though the unit test correctly fails.

@TheBlueMatt

Contributor

TheBlueMatt commented Jun 27, 2017

I believe @sdaftuar indicated that he had a better test written for this, not sure what format it takes, however.

@sipa

Member

sipa commented Jun 27, 2017

Verified empirically that this actually gives a performance improvement:

Last 10 block verifications on my server (benchmarked using -debug=bench):

On master as of a few weeks ago:

2017-06-26 22:16:21.499109     - Verify 5160 txins: 118.87ms (0.023ms/txin) [96.86s]
2017-06-26 22:22:34.710518     - Verify 3957 txins: 84.70ms (0.021ms/txin) [96.95s]
2017-06-26 22:32:43.025738     - Verify 4333 txins: 85.08ms (0.020ms/txin) [97.03s]
2017-06-26 22:41:47.084320     - Verify 4300 txins: 83.29ms (0.019ms/txin) [97.12s]
2017-06-26 22:46:33.711967     - Verify 4650 txins: 60.67ms (0.013ms/txin) [97.18s]
2017-06-26 22:53:00.210314     - Verify 4656 txins: 61.19ms (0.013ms/txin) [97.24s]
2017-06-26 22:54:35.550521     - Verify 5128 txins: 100.05ms (0.020ms/txin) [97.34s]
2017-06-26 23:16:50.156286     - Verify 4889 txins: 118.09ms (0.024ms/txin) [97.46s]
2017-06-26 23:18:17.993009     - Verify 4973 txins: 104.94ms (0.021ms/txin) [97.56s]
2017-06-26 23:27:55.039898     - Verify 4712 txins: 92.86ms (0.020ms/txin) [97.65s]

After restarting with this PR:

2017-06-26 23:38:16.352833     - Verify 6016 txins: 39.89ms (0.007ms/txin) [0.26s]
2017-06-26 23:38:37.499180     - Verify 6650 txins: 33.67ms (0.005ms/txin) [0.29s]
2017-06-26 23:52:52.539748     - Verify 4686 txins: 60.94ms (0.013ms/txin) [0.35s]
2017-06-27 00:34:42.788390     - Verify 4417 txins: 61.53ms (0.014ms/txin) [0.42s]
2017-06-27 00:38:47.022972     - Verify 3023 txins: 33.88ms (0.011ms/txin) [0.45s]
2017-06-27 00:42:59.372504     - Verify 4289 txins: 38.45ms (0.009ms/txin) [0.49s]
2017-06-27 00:44:06.469073     - Verify 5065 txins: 34.92ms (0.007ms/txin) [0.52s]
2017-06-27 00:53:41.905643     - Verify 4378 txins: 72.81ms (0.017ms/txin) [0.60s]
2017-06-27 01:04:23.953806     - Verify 4917 txins: 52.77ms (0.011ms/txin) [0.65s]
2017-06-27 01:08:59.598035     - Verify 5002 txins: 48.47ms (0.010ms/txin) [0.70s]
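
For a single summary figure, the ten samples in each group can be totalled. The sketch below copies the numbers from the log lines above and divides total verification time by total txins; `VerifySample` and the function names are made up for this example. The two groups cover different blocks, so the resulting ratio is only indicative.

```cpp
#include <cassert>
#include <vector>

// One "Verify N txins: X ms" log line.
struct VerifySample {
    int txins;
    double ms;
};

// Total verification time divided by total inputs verified.
inline double MsPerTxin(const std::vector<VerifySample>& samples)
{
    assert(!samples.empty());
    long total_txins = 0;
    double total_ms = 0.0;
    for (const VerifySample& s : samples) {
        total_txins += s.txins;
        total_ms += s.ms;
    }
    return total_ms / static_cast<double>(total_txins);
}

// Samples logged on master.
inline std::vector<VerifySample> MasterSamples()
{
    return {{5160, 118.87}, {3957, 84.70}, {4333, 85.08}, {4300, 83.29}, {4650, 60.67},
            {4656, 61.19}, {5128, 100.05}, {4889, 118.09}, {4973, 104.94}, {4712, 92.86}};
}

// Samples logged after restarting with the PR.
inline std::vector<VerifySample> PrSamples()
{
    return {{6016, 39.89}, {6650, 33.67}, {4686, 60.94}, {4417, 61.53}, {3023, 33.88},
            {4289, 38.45}, {5065, 34.92}, {4378, 72.81}, {4917, 52.77}, {5002, 48.47}};
}
```

On these samples the averages come out at roughly 0.019 ms/txin on master and roughly 0.010 ms/txin with the PR, about a 2x improvement.
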
Add CheckInputs() unit tests
Check that cached script execution results are only valid for the same
script flags; that script execution checks are returned for non-cached
transactions; and that cached results are only valid for transactions
with the same witness hash.
@TheBlueMatt

Contributor

TheBlueMatt commented Jun 27, 2017

Replaced test with one by @sdaftuar which is much better. Should be good to go now.

@sdaftuar

Member

sdaftuar commented Jun 27, 2017

ACK e3f9c05

// CheckInputs should succeed iff test_flags doesn't intersect with
// failing_flags
bool expected_return_value = !(test_flags & failing_flags);
if (expected_return_value && upgraded_nop) {

@sipa

sipa Jun 28, 2017

Member

This logic may become unnecessary with #10699.

@laanwj laanwj merged commit e3f9c05 into bitcoin:master Jun 29, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

laanwj added a commit that referenced this pull request Jun 29, 2017

Merge #10192: Cache full script execution results in addition to signatures

e3f9c05 Add CheckInputs() unit tests (Suhas Daftuar)
a3543af Better document CheckInputs parameter meanings (Matt Corallo)
309ee1a Update -maxsigcachesize doc clarify init logprints for it (Matt Corallo)
b014668 Add CheckInputs wrapper CCoinsViewMemPool -> non-consensus-critical (Matt Corallo)
eada04e Do not print soft-fork-script warning with -promiscuousmempool (Matt Corallo)
b5fea8d Cache full script execution results in addition to signatures (Matt Corallo)
6d22b2b Pull script verify flags calculation out of ConnectBlock (Matt Corallo)

Tree-SHA512: 0c6c3c79c64fcb21e17ab60290c5c96d4fac11624c49f841a4201eec21cb480314c52a07d1e3abd4f9c764785cc57bfd178511f495aa0469addb204e96214fe4

@laanwj laanwj removed this from Blockers in High-priority for review Jun 29, 2017

@gmaxwell

reACK w/ new tests

@morcos

Member

morcos commented Jul 5, 2017

posthumous utACK

Thanks for doing this.

@JeremyRubin

Contributor

JeremyRubin commented Jul 6, 2017

probably wrong place to have this conversation, but for the sake of continuity...

@JeremyRubin I have not, that would likely be interesting in the future (as well as possibly not making it an even 1/2-1/2 split in memory usage).

Yeah I've (asynchronously) been thinking about this one a little bit in spare cycles. There are a couple of tricks one could do to easily increase frequency, e.g. add an insert_n_times to the cuckoocache where you can specify that it should be put onto 1-8 of its hash locations. This makes the hash "stickier" in memory compared to other entries and is pretty easy to compute.

Generally, this also provides a really interesting mechanism for further tuning the hit rate based on other priors. E.g., if a txn comes in and we estimate that it has a 0.5 chance of entering a block, we could set it to have, say, half of its hash locations filled. If a txn comes in and we estimate that it has a 0.25 chance, we could fill a quarter of them.

This can also be simulated by inserting the hash N times with different salts or something without modifying the cuckoocache internally.
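
The salted-insert variant mentioned last could be sketched as follows. Everything here is hypothetical: `StickyCacheModel`, `SaltedKey`, and `EvictCopy` are illustration names, a plain `std::unordered_set` stands in for the cuckoocache, and the mixing function is a generic 64-bit finalizer rather than anything the cache actually uses.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Derives a distinct key per (key, salt) pair using a splitmix64-style mix.
inline uint64_t SaltedKey(uint64_t key, uint32_t salt)
{
    uint64_t x = key + 0x9e3779b97f4a7c15ULL * (static_cast<uint64_t>(salt) + 1);
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

class StickyCacheModel
{
public:
    static constexpr uint32_t kMaxCopies = 8;

    // Inserts `n` salted copies of `key`; more copies make the entry more
    // likely to survive evictions.
    void InsertNTimes(uint64_t key, uint32_t n)
    {
        assert(n >= 1 && n <= kMaxCopies);
        for (uint32_t salt = 0; salt < n; ++salt) {
            m_entries.insert(SaltedKey(key, salt));
        }
    }

    // A key is present if any of its salted copies is still in the set.
    bool Contains(uint64_t key) const
    {
        for (uint32_t salt = 0; salt < kMaxCopies; ++salt) {
            if (m_entries.count(SaltedKey(key, salt)) > 0) return true;
        }
        return false;
    }

    // Simulates the underlying cache evicting one particular copy.
    void EvictCopy(uint64_t key, uint32_t salt) { m_entries.erase(SaltedKey(key, salt)); }

private:
    std::unordered_set<uint64_t> m_entries;
};
```

The cost of this approach in the model is up to eight lookups per query, in exchange for leaving the cache internals untouched.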

MarcoFalke added a commit that referenced this pull request Jul 16, 2017

Merge #10739: test: Move variable `state` down where it is used
5618b7d Do not shadow upper local variable `state`. (Pavel Janík)

Pull request description:

  Tests added in #10192 emit a few shadowing warnings:

  ```
  test/txvalidationcache_tests.cpp:268:26: warning: declaration shadows a local variable [-Wshadow]
  test/txvalidationcache_tests.cpp:296:26: warning: declaration shadows a local variable [-Wshadow]
  test/txvalidationcache_tests.cpp:357:26: warning: declaration shadows a local variable [-Wshadow]
  ```

  Remove shadowing declarations and reuse the upper local declaration as in other already present test cases.

Tree-SHA512: 1e3c52cf963f8f33e729900c8ecdcd5cc6fe28caa441ba53c4636df9cc3d1a351ca231966d36384589f1340ae8ddd447424c2ee3e8527d334d0412f0d1a10c8f

@jnewbery jnewbery referenced this pull request Jul 31, 2017

Closed

TODO for release notes 0.15.0 #9889

12 of 12 tasks complete

laanwj added a commit that referenced this pull request Mar 6, 2018

Merge #10271: Use std::thread::hardware_concurrency, instead of Boost, to determine available cores

937bf43 Use std::thread::hardware_concurrency, instead of Boost, to determine available cores (fanquake)

Pull request description:

  Following discussion on IRC about replacing Boost usage for detecting available system cores, I've opened this to collect some benchmarks + further discussion.

  The current method for detecting available cores was introduced in #6361.

  Recap of the IRC chat:
  ```
  21:14:08 fanquake: Since we seem to be giving Boost removal a good shot for 0.15, does anyone have suggestions for replacing GetNumCores?
  21:14:26 fanquake: There is std::thread::hardware_concurrency(), but that seems to count virtual cores, which I don't think we want.
  21:14:51 BlueMatt: fanquake: I doubt we'll do boost removal for 0.15
  21:14:58 BlueMatt: shit like BOOST_FOREACH, sure
  21:15:07 BlueMatt: but all of boost? doubtful, there are still things we need
  21:16:36 fanquake: Yea sorry, not the whole lot, but we can remove a decent chunk. Just looking into what else needs to be done to replace some of the less involved Boost usage.
  21:16:43 BlueMatt: fair
  21:17:14 wumpus: yes, it makes sense to plan ahead a bit, without immediately doing it
  21:18:12 wumpus: right, don't count virtual cores, that used to be the case but it makes no sense for our usage
  21:19:15 wumpus: it'd create a swarm of threads overwhelming any machine with hyperthreading (+accompanying thread stack overhead), for script validation, and there was no gain at all for that
  21:20:03 sipa: BlueMatt: don't worry, there is no hurry
  21:59:10 morcos: wumpus: i don't think that is correct
  21:59:24 morcos: suppose you have 4 cores (8 virtual cores)
  21:59:24 wumpus: fanquake: indeed seems that std has no equivalent to physical_concurrency, on any standard. That's annoying as it is non-trivial to implement
  21:59:35 morcos: i think running par=8 (if it let you) would be notably faster
  21:59:59 morcos: jeremyrubin and i discussed this at length a while back... i think i commented about it on irc at the time
  22:00:21 wumpus: morcos: I think the conclusion at the time was that it made no difference, but sure would make sense to benchmark
  22:00:39 morcos: perhaps historical testing on the virtual vs actual cores was polluted by concurrency issues that have now improved
  22:00:47 wumpus: I think there are not more ALUs, so there is not really a point in having more threads
  22:01:40 wumpus: hyperthreads are basically just a stored register state right?
  22:02:23 sipa: wumpus: yes but it helps the scheduler
  22:02:27 wumpus: in which case the only speedup using "number of cores" threads would give you is, possibly, excluding other software from running on the cores on the same time
  22:02:37 morcos: well this is where i get out of my depth
  22:02:50 sipa: if one of the threads is waiting on a read from ram, the other can use the arithmetic unit for example
  22:02:54 morcos: wumpus: i'm pretty sure though that the speed up is considerably more than what you might expect from that
  22:02:59 wumpus: sipa: ok, I back down, I didn't want to argue this at all
  22:03:35 morcos: the reason i haven't tested it myself, is the machine i usually use has 16 cores... so not easy due to remaining concurrency issues to get much more speedup
  22:03:36 wumpus: I'm fine with restoring it to number of virtual threads if that's faster
  22:03:54 morcos: we should have somene with 4 cores (and  actually test it though, i agree
  22:03:58 sipa: i would expect (but we should benchmark...) that if 8 scriot validation threads instead of 4 on a quadcore hyperthreading is not faster, it's due to lock contention
  22:04:20 morcos: sipa: yeah thats my point, i think lock contention isn't that bad with 8 now
  22:04:22 wumpus: on 64-bit systems the additional thread overhead wouldn't be important at least
  22:04:23 gmaxwell: I previously benchmarked, a long time ago, it was faster.
  22:04:33 gmaxwell: (to use the HT core count)
  22:04:44 wumpus: why was this changed at all then?
  22:04:47 wumpus: I'm confused
  22:05:04 sipa: good question!
  22:05:06 gmaxwell: I had no idea we changed it.
  22:05:25 wumpus: sigh 
  22:05:54 gmaxwell: What PR changed it?
  22:06:51 gmaxwell: In any case, on 32-bit it's probably a good tradeoff... the extra ram overhead is worth avoiding.
  22:07:22 wumpus: #6361
  22:07:28 gmaxwell: PR 6461 btw.
  22:07:37 gmaxwell: er lol at least you got it right.
  22:07:45 wumpus: the complaint was that systems became unsuably slow when using that many thread
  22:07:51 wumpus: so at least I got one thing right, woohoo
  22:07:55 sipa: seems i even acked it!
  22:07:57 BlueMatt: wumpus: there are more alus
  22:08:38 BlueMatt: but we need to improve lock contention first
  22:08:40 morcos: anywya, i think in the past the lock contention made 8 threads regardless of cores a bit dicey.. now that is much better (although more still to be done)
  22:09:01 BlueMatt: or we can just merge #10192, thats fee
  22:09:04 gribble: #10192 | Cache full script execution results in addition to signatures by TheBlueMatt · Pull Request #10192 · bitcoin/bitcoin · GitHub
  22:09:11 BlueMatt: s/fee/free/
  22:09:21 morcos: no, we do not need to improve lock contention first.   but we should probably do that before we increase the max beyond 16
  22:09:26 BlueMatt: then we can toss concurrency issues out the window and get more speedup anyway
  22:09:35 gmaxwell: wumpus: yea, well in QT I thought we also diminished the count by 1 or something?  but yes, if the motivation was to reduce how heavily the machine was used, thats fair.
  22:09:56 sipa: the benefit of using HT cores is certainly not a factor 2
  22:09:58 wumpus: gmaxwell: for the default I think this makes a lot of sense, yes
  22:10:10 gmaxwell: morcos: right now on my 24/28 physical core hosts going beyond 16 still reduces performance.
  22:10:11 wumpus: gmaxwell: do we also restrict the maximum par using this? that'd make less sense
  22:10:51 wumpus: if someone *wants* to use the virtual cores they should be able to by setting -par=
  22:10:51 sipa: *flies to US*
  22:10:52 BlueMatt: sipa: sure, but the shared cache helps us get more out of it than some others, as morcos points out
  22:11:30 BlueMatt: (because it means our thread contention issues are less)
  22:12:05 morcos: gmaxwell: yeah i've been bogged down in fee estimation as well (and the rest of life) for a while now.. otherwise i would have put more effort into jeremy's checkqueue
  22:12:36 BlueMatt: morcos: heh, well now you can do other stuff while the rest of us get bogged down in understanding fee estimation enough to review it 
  22:12:37 wumpus: [to answer my own question: no, the limit for par is MAX_SCRIPTCHECK_THREADS, or 16]
  22:12:54 morcos: but to me optimizing for more than 16 cores is pretty valuable as miners could use beefy machines and be less concerned by block validation time
  22:14:38 BlueMatt: morcos: i think you may be surprised by the number of mining pools that are on VPSes that do not have 16 cores 
  22:15:34 gmaxwell: I assume right now most of the time block validation is bogged in the parts that are not as concurrent. simple because caching makes the concurrent parts so fast. (and soon to hopefully increase with bluematt's patch)
  22:17:55 gmaxwell: improving sha2 speed, or transaction malloc overhead are probably bigger wins now for connection at the tip than parallelism beyond 16 (though I'd like that too).
  22:18:21 BlueMatt: sha2 speed is big
  22:18:27 morcos: yeah lots of things to do actually...
  22:18:57 gmaxwell: BlueMatt: might be a tiny bit less big if we didn't hash the block header 8 times for every block. 
  22:21:27 BlueMatt: ehh, probably, but I'm less rushed there
  22:21:43 BlueMatt: my new cache thing is about to add a bunch of hashing
  22:21:50 BlueMatt: 1 sha round per tx
  22:22:25 BlueMatt: and sigcache is obviously a ton
  ```

Tree-SHA512: a594430e2a77d8cc741ea8c664a2867b1e1693e5050a4bbc8511e8d66a2bffe241a9965f6dff1e7fbb99f21dd1fdeb95b826365da8bd8f9fab2d0ffd80d5059c