coins: use jumboblock SipHash-1-3 for hashing CCoinsMap keys #35215
l0rinc wants to merge 4 commits into bitcoin:master
Rename the existing 32-byte benchmark to `SipHash24_32b`. Add a 36-byte variant for `uint256` plus a 32-bit outpoint index. This records the current baseline shape before adding outpoint-specific hashers.
Add `PresaltedSipHasher13Jumbo` for hashing a `uint256` plus a `uint32_t` outpoint index. For this fixed 36-byte input, the implementation processes the four hash limbs as one jumboblock, keeps the existing index/length word, omits `m5`, and runs 3 finalization SipRounds. This is the `SH13+JB+UP` case from Pieter Wuille's sketch.
Switch the existing `SaltedOutpointHasher` wrapper to the new presalted hasher, so all existing `COutPoint` unordered containers keep their public hasher type while using the faster implementation.
This is a non-standard table-hashing specialization, meant for internal hash tables whose keys already contain a uniformly distributed hash.
Add a fixed test vector for the non-standard path.
Co-authored-by: Pieter Wuille <pieter@wuille.net>
Co-authored-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Add a focused benchmark for `PresaltedSipHasher13Jumbo` on the same 36-byte shape as `SipHash24_36b`. This lets reviewers compare the existing SipHash-2-4 36-byte benchmark with the new SipHash-1-3 jumboblock specialization.
Move `siphash_detail::SipRound` and `PresaltedSipHasher13Jumbo::operator()` into `siphash.h`. This lets the benchmark and `COutPoint` hash-table call sites inline the short specialized hash body through `SaltedOutpointHasher`. Inlining made this benchmark up to about 16% faster locally, while the existing SipHash implementations did not show improvement from the same treatment.
FTR I confirm my statements quoted by OP
Problem
Several internal unordered maps/sets use `COutPoint` keys (most notably `CCoinsMap` storing the in-memory dbcache), repeatedly hashing a 32-byte txid plus a 32-bit output index using SipHash-2-4.
That path is conservative but expensive for this fixed 36-byte shape: it processes the four txid limbs and the final output-index/length word in 14 SipRounds.
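To make the 14-round figure concrete, here is a minimal sketch of textbook SipHash-2-4 with a round counter attached. It is not Bitcoin Core's implementation; `SipResult`, `SipHash24`, and `CountingBytes` are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: the 64-bit hash plus the number of SipRounds executed.
struct SipResult {
    uint64_t hash;
    int rounds;
};

inline uint64_t Rotl(uint64_t x, int b) { return (x << b) | (x >> (64 - b)); }

// One standard SipRound over the four state words; also bumps the counter.
inline void SipRound(uint64_t v[4], int& rounds)
{
    v[0] += v[1]; v[1] = Rotl(v[1], 13); v[1] ^= v[0]; v[0] = Rotl(v[0], 32);
    v[2] += v[3]; v[3] = Rotl(v[3], 16); v[3] ^= v[2];
    v[0] += v[3]; v[3] = Rotl(v[3], 21); v[3] ^= v[0];
    v[2] += v[1]; v[1] = Rotl(v[1], 17); v[1] ^= v[2]; v[2] = Rotl(v[2], 32);
    ++rounds;
}

// Textbook SipHash-2-4: 2 rounds per 64-bit message word, 4 finalization rounds.
inline SipResult SipHash24(uint64_t k0, uint64_t k1, const std::vector<uint8_t>& data)
{
    uint64_t v[4] = {k0 ^ 0x736f6d6570736575ULL, k1 ^ 0x646f72616e646f6dULL,
                     k0 ^ 0x6c7967656e657261ULL, k1 ^ 0x7465646279746573ULL};
    int rounds = 0;
    const size_t full = data.size() / 8;
    for (size_t i = 0; i < full; ++i) {
        uint64_t m = 0;
        for (size_t j = 0; j < 8; ++j) m |= static_cast<uint64_t>(data[8 * i + j]) << (8 * j);
        v[3] ^= m;
        SipRound(v, rounds);
        SipRound(v, rounds);
        v[0] ^= m;
    }
    // Final word: leftover bytes (little-endian) with the length byte on top.
    uint64_t last = static_cast<uint64_t>(data.size() & 0xff) << 56;
    for (size_t j = 0; j < data.size() % 8; ++j) {
        last |= static_cast<uint64_t>(data[8 * full + j]) << (8 * j);
    }
    v[3] ^= last;
    SipRound(v, rounds);
    SipRound(v, rounds);
    v[0] ^= last;
    v[2] ^= 0xff;
    for (int i = 0; i < 4; ++i) SipRound(v, rounds);
    return {v[0] ^ v[1] ^ v[2] ^ v[3], rounds};
}

// Hypothetical helper: the bytes 0, 1, ..., n-1.
inline std::vector<uint8_t> CountingBytes(size_t n)
{
    std::vector<uint8_t> out(n);
    for (size_t i = 0; i < n; ++i) out[i] = static_cast<uint8_t>(i);
    return out;
}
```

Calling `SipHash24(k0, k1, CountingBytes(36)).rounds` yields 14 for any key: four full words at 2 rounds each, 2 rounds for the index/length word, and 4 finalization rounds.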
SipHash-1-3 & Jumbo blocks
This implementation follows Pieter Wuille's jumboblock suggestion based on the SipHash analysis paper.
The main input is already a 256-bit hash, so the hasher processes the four txid limbs together as one block instead of feeding them as four independent 64-bit SipHash message blocks.
Pieter Wuille's jumboblock and SipRound sketch
Design
For the `COutPoint` shape, this processes the txid limbs as a jumboblock and keeps Bitcoin Core's existing combined index/length word (`m4`), omitting `m5` from Pieter's generic sketch.
For symmetry the length byte is also included (even though this one doesn't need to be compatible with a variable-length SipHash implementation), and the path drops from 14 SipRounds (`SH24+UP`) to 5 (`SH13+JB+UP`).
Pieter also ran the jumboblock idea by Jean-Philippe Aumasson, one of the SipHash authors; based on a preliminary analysis, Aumasson did not think this made collisions easier to construct.
Aumasson also said SipHash-1-3 is fine for this hashmap use case and offered to comment on or review the PR.
Old: 4 separate 64-bit txid compressions + 1 index/length compression + 4 finalization rounds = 14 SipRounds.
New: 1 combined 256-bit txid compression + 1 index/length compression + 3 finalization rounds = 5 SipRounds.
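The two tallies above follow the same formula: compression steps times rounds per step, plus finalization rounds. A small compile-time sketch of that arithmetic, where `SipRoundCount` is a hypothetical helper and the jumboblock is counted as a single compression step as stated above:

```cpp
#include <cassert>

// Hypothetical helper: total SipRounds for a SipHash-c-d style construction
// that absorbs `compression_steps` blocks with `c` rounds each and then runs
// `d` finalization rounds.
constexpr int SipRoundCount(int compression_steps, int c, int d)
{
    return compression_steps * c + d;
}

// Old path: 4 txid words + 1 index/length word under SipHash-2-4.
static_assert(SipRoundCount(4 + 1, 2, 4) == 14, "SH24+UP uses 14 SipRounds");
// New path: 1 jumboblock + 1 index/length word under SipHash-1-3.
static_assert(SipRoundCount(1 + 1, 1, 3) == 5, "SH13+JB+UP uses 5 SipRounds");
```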
Fix
Add `PresaltedSipHasher13Jumbo`, a narrow SipHash-1-3 jumboblock specialization for hashing an existing `uint256` hash plus a `uint32_t` index.
Then switch the existing `SaltedOutpointHasher` wrapper to use it, so existing `COutPoint` unordered maps and sets keep their public hasher type while getting the faster implementation.
This covers the coins cache and other in-memory outpoint tables through the existing abstraction, without spreading a variant-specific type name through call sites.
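The wrapper approach can be illustrated with a toy sketch. Every name below (`ToyOutPoint`, `ToyInnerHasher`, `ToySaltedOutpointHasher`, `ToyCoinsMap`) is a hypothetical stand-in rather than a Bitcoin Core type, and the inner mixing function is a placeholder, not SipHash. The point is only that the container's hasher type stays fixed while the member it delegates to can be swapped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for an outpoint: four 64-bit txid limbs plus an index.
struct ToyOutPoint {
    uint64_t txid_limbs[4];
    uint32_t n;
    bool operator==(const ToyOutPoint& o) const
    {
        for (int i = 0; i < 4; ++i) {
            if (txid_limbs[i] != o.txid_limbs[i]) return false;
        }
        return n == o.n;
    }
};

// Placeholder inner hasher: a simple salted multiply-xor mix, NOT SipHash.
// A real hasher would draw k0/k1 randomly at startup; fixed values keep
// this sketch deterministic.
struct ToyInnerHasher {
    uint64_t k0{0x0123456789abcdefULL};
    uint64_t k1{0xfedcba9876543210ULL};
    uint64_t operator()(const ToyOutPoint& o) const noexcept
    {
        uint64_t h = k0;
        for (uint64_t limb : o.txid_limbs) h = (h ^ limb) * 0x9e3779b97f4a7c15ULL;
        return (h ^ o.n) + k1;
    }
};

// The public wrapper type that containers name. Replacing the type of
// m_hasher changes the hash function without touching any call site.
struct ToySaltedOutpointHasher {
    ToyInnerHasher m_hasher;
    size_t operator()(const ToyOutPoint& o) const noexcept
    {
        return static_cast<size_t>(m_hasher(o));
    }
};

using ToyCoinsMap = std::unordered_map<ToyOutPoint, int, ToySaltedOutpointHasher>;
```

Code that declares a `ToyCoinsMap` never mentions `ToyInnerHasher`, which mirrors how the change avoids spreading a variant-specific type name through call sites.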
The regular `PresaltedSipHasher` path remains for txid/wtxid/uint256 hashers, compact-block short IDs, and persisted/index key derivation.
The salted hash codes are only local in-memory table indexes for the current process; they already vary across normal restarts and are not serialized, persisted, sent over the network, or used for consensus.
Reproducer
A test vector documents the non-standard jumboblock output, and the benchmarks now include both the existing 36-byte SipHash-2-4 path and the new jumboblock path.
Counting the dbcache buckets indicates the new hasher satisfies the uniformity criteria we relied on before.
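One way to count bucket occupancy for any standard unordered container is sketched below. It is an assumption about how such a check could be done, not the measurement used in this PR; `BucketOccupancyHistogram` is a hypothetical helper built on the standard `bucket_count()` and `bucket_size()` members.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <unordered_map>

// Hypothetical helper: maps "number of elements in a bucket" to
// "how many buckets hold exactly that many elements".
template <typename Map>
std::map<size_t, size_t> BucketOccupancyHistogram(const Map& m)
{
    std::map<size_t, size_t> hist;
    for (size_t b = 0; b < m.bucket_count(); ++b) {
        ++hist[m.bucket_size(b)];
    }
    return hist;
}
```

Comparing the histograms produced with the old and new hashers on the same key set shows whether the new one piles noticeably more keys into the same buckets.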

Fresh isolated aarch64 microbenchmarks on a Raspberry Pi 5 show the new 36-byte path about 2x faster with GCC and Clang.
Linux reproducer command
The command below rebuilds with GCC and Clang and prints only the benchmark output after the build.
aarch64 SipHash 36-byte microbenchmarks: ~2x faster with GCC and Clang
Benchmarks
The `-reindex-chainstate` runs below were collected before this final shared-wrapper shape, while the branch still applied the same jumboblock hasher only to `CCoinsMap`.
5% faster | reindex-chainstate | 946649 blocks | dbcache 1000 | i9-ssd | x86_64 | Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz | 16 cores | 62Gi RAM | SSD
5% faster | reindex-chainstate | 946649 blocks | dbcache 30000 | i9-ssd | x86_64 | Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz | 16 cores | 62Gi RAM | SSD
2% faster | reindex-chainstate | 946649 blocks | dbcache 1000 | rpi5-16-3 | aarch64 | Cortex-A76 | 4 cores | 15Gi RAM | SSD
2% faster | reindex-chainstate | 946649 blocks | dbcache 1000 | umbrel | x86_64 | Intel(R) N150 | 4 cores | 15Gi RAM | SSD
Future Work