-
Notifications
You must be signed in to change notification settings - Fork 38.1k
refactor: optimize block index comparisons (1.4-6.8x faster) #33637
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33637. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
Approach ACK I have tested the new block index comparator but I’ll refrain from acking the added benchmarks/tests |
Profiling shows this comparator consumes a significant portion `CheckBlockIndex`: ... ChainstateManager::CheckBlockIndex() ... std::_Rb_tree<...>::find(...) ... node::CBlockIndexWorkComparator::operator()(...) ... base_uint<256u>::CompareTo(...) const This commit introduces a benchmark that performs pairwise comparisons on 1000 randomly generated block indices (with some duplicates) to establish baseline performance metrics before further optimization. | ns/cmp | cmp/s | err% | ins/cmp | cyc/cmp | IPC | bra/cmp | miss% | total | benchmark |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:---------- | 9.92 | 100,772,511.62 | 0.0% | 63.98 | 35.64 | 1.795 | 14.17 | 1.9% | 5.50 | `CBlockIndexWorkComparator` | ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:---------- | 109,996.46 | 9,091.20 | 0.2% | 1,014,421.11 | 394,979.29 | 2.568 | 313,025.11 | 0.0% | 5.50 | `CheckBlockIndex`
Add equivalence tests to verify behavioral compatibility when optimizing comparison operations. These are duplicating behavior for now, but this way the reviewers can validate that the behave the same wway before the optimizations. The `arith_uint256` test verifies that the spaceship operator produces identical results to the original `CompareTo` method for all comparison operators (<, >, <=, >=, ==, !=). The `CBlockIndexWorkComparator` test captures the current comparison logic in a lambda and verifies that optimized versions maintain identical sorting behavior for chain work, sequence ID, and pointer tiebreaking. You can run the tests with: > cmake -B build && cmake --build build && ./build/bin/test_bitcoin --run_test=arith_uint256_tests,blockchain_tests
Remove the `CompareTo` method and inline its logic directly into `operator<=>`, updating related comments. This eliminates function call overhead in the hot path during block generation and chain selection. The comparison algorithm remains unchanged, iterating from most significant to least significant word, but now returns `std::strong_ordering` directly instead of an integer that gets converted via spaceship operator. | ns/cmp | cmp/s | err% | ins/cmp | cyc/cmp | IPC | bra/cmp | miss% | total | benchmark |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:---------- | 6.48 | 154,439,491.66 | 0.0% | 31.01 | 23.25 | 1.334 | 7.16 | 3.8% | 5.50 | `CBlockIndexWorkComparator` | ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:---------- | 105,754.86 | 9,455.83 | 0.1% | 913,130.11 | 379,588.20 | 2.406 | 276,692.11 | 0.0% | 5.50 | `CheckBlockIndex`
Replace multiple comparisons with a single C++20 spaceship operator call directly. | ns/cmp | cmp/s | err% | ins/cmp | cyc/cmp | IPC | bra/cmp | miss% | total | benchmark |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:---------- | 6.37 | 156,990,488.06 | 0.0% | 28.85 | 22.87 | 1.261 | 8.61 | 3.2% | 5.50 | `CBlockIndexWorkComparator` | ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:---------- | 83,803.10 | 11,932.73 | 0.0% | 743,565.09 | 300,824.84 | 2.472 | 232,646.08 | 0.0% | 5.56 | `CheckBlockIndex`
Replace manual comparison branches with a single tuple comparison, allowing the compilers to generate more efficient comparison code. Also, inlined the code implicitly by moving it to the header for additional gains. For symmetry, `CBlockIndexHeightOnlyComparator` was also moved to the header. | ns/cmp | cmp/s | err% | ins/cmp | cyc/cmp | IPC | bra/cmp | miss% | total | benchmark |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:---------- | 1.52 | 656,674,205.98 | 0.0% | 13.18 | 5.47 | 2.410 | 3.06 | 0.1% | 5.50 | `CBlockIndexWorkComparator` | ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:---------- | 67,726.84 | 14,765.19 | 0.2% | 585,826.07 | 243,155.90 | 2.409 | 181,920.07 | 0.0% | 5.54 | `CheckBlockIndex` Co-authored-by: Raimo33 <claudio.raimondi@protonmail.com>
f74572e
to
c15d839
Compare
I attempted to run the script, not really sure what these results indicate. Just pasting what the results were
|
Thanks for the measurements @Christewart, this is how your measurements compare to mine (but most importantly how it compares to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK c15d839
i did not run the benchmarks but the code changes look good, using <=>
makes sense.
Summary
Profiling the performance regression in #33618 (comment) revealed that
CBlockIndexWorkComparator
and its underlyingbase_uint<256u>::CompareTo
are hot paths during block validation, consuming ~4% of CPU time.Context
The comparator is often called directly to compare two separate values and also defines the sorting order for
setBlockIndexCandidates
, a sorted tree set containing valid block headers where the comparator is invoked extensively.Testing
To ensure the optimized implementations are both fast and correct, the first commit adds a dedicated benchmark to measure
CBlockIndexWorkComparator
performance, and the second commit adds randomized tests comparing the new implementation with the original one.Results
CBlockIndexWorkComparator
: 100,772,511 cmp/s → 656,674,205 cmp/s = 6.51x fasterCheckBlockIndex
: 9,091 ops/s → 14,765 ops/s = 1.62x fasterCBlockIndexWorkComparator
: 100,451,893 cmp/s → 683,414,234 cmp/s = 6.8x fasterCheckBlockIndex
: 10,322 ops/s → 14,376 ops/s = 1.39x fastergcc and clang measurements
CBlockIndexWorkComparator
CheckBlockIndex
e2e0217 refactor: inline
arith_uint256
comparison operatorCBlockIndexWorkComparator
CheckBlockIndex
85b74b0 refactor: optimize
arith_uint256
comparison with spaceship operatorCBlockIndexWorkComparator
CheckBlockIndex
deb58ee refactor: optimize
CBlockIndexWorkComparator
with std::tieCBlockIndexWorkComparator
CheckBlockIndex
b60450f bench: add benchmark to measure
CBlockIndexWorkComparator
performanceCBlockIndexWorkComparator
CheckBlockIndex
e2e0217 refactor: inline
arith_uint256
comparison operatorCBlockIndexWorkComparator
CheckBlockIndex
85b74b0 refactor: optimize
arith_uint256
comparison with spaceship operatorCBlockIndexWorkComparator
CheckBlockIndex
deb58ee refactor: optimize
CBlockIndexWorkComparator
with std::tieCBlockIndexWorkComparator
CheckBlockIndex
Reproducer
Run the equivalence tests with:
Each commit shows how it changes the relevant benchmarks.
Benchmark script
Note: something similar was already started in #33334, but this is a broader optimization that doesn't use the same technique: added as coauthor regardless.