l0rinc
Contributor

@l0rinc l0rinc commented Oct 16, 2025

Summary

Profiling the performance regression in #33618 (comment) revealed that CBlockIndexWorkComparator and its underlying base_uint<256u>::CompareTo are hot paths during block validation, consuming ~4% of CPU time.

Context

The comparator is often called directly to compare two values. It also defines the sort order of setBlockIndexCandidates, a sorted tree set of valid block headers, where every lookup, insertion, and removal invokes it repeatedly.
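To illustrate why a comparator behind an ordered set becomes a hot path, here is a minimal sketch. It is not the Bitcoin Core code: `Entry`, `CountingWorkLess`, and `comparisons_for_one_find` are hypothetical names, and a plain `std::set` stands in for the candidate set. It counts how many comparator calls a single lookup needs.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for a block index entry; not the real CBlockIndex.
struct Entry {
    std::uint64_t work; // stand-in for accumulated chain work
    std::int32_t seq;   // stand-in for the sequence id
};

// Comparator that counts its own invocations through a caller-owned counter.
struct CountingWorkLess {
    std::size_t* calls;
    bool operator()(const Entry* a, const Entry* b) const
    {
        ++*calls;
        if (a->work != b->work) return a->work < b->work;
        return a->seq > b->seq; // with equal work, the lower sequence id sorts later
    }
};

// Insert `n` entries into an ordered set, then look one of them up and
// return how many comparator calls that single lookup needed.
std::size_t comparisons_for_one_find(std::size_t n)
{
    if (n == 0) return 0;
    std::vector<Entry> entries;
    entries.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        entries.push_back({static_cast<std::uint64_t>(i), 0});
    }

    std::size_t calls = 0;
    std::set<const Entry*, CountingWorkLess> candidates(CountingWorkLess{&calls});
    for (const Entry& e : entries) candidates.insert(&e);

    calls = 0; // only count the lookup itself
    const bool found = candidates.find(&entries[n / 2]) != candidates.end();
    return found ? calls : 0;
}
```

A single `find` costs a number of comparator calls that grows with the tree height, so any per-call saving is multiplied by every lookup, insertion, and erase on the set.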

Testing

To ensure the optimized implementations are both fast and correct, the first commit adds a dedicated benchmark to measure CBlockIndexWorkComparator performance, and the second commit adds randomized tests comparing the new implementation with the original one.

Results

GCC 15.0.1:

  • CBlockIndexWorkComparator: 100,772,511 cmp/s → 656,674,205 cmp/s = 6.51x faster
  • CheckBlockIndex: 9,091 ops/s → 14,765 ops/s = 1.62x faster

Clang 22.0.0:

  • CBlockIndexWorkComparator: 100,451,893 cmp/s → 683,414,234 cmp/s = 6.8x faster
  • CheckBlockIndex: 10,322 ops/s → 14,376 ops/s = 1.39x faster
gcc and clang measurements

Compiler: gcc 15.0.1
b60450f bench: add benchmark to measure CBlockIndexWorkComparator performance

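|              ns/cmp |               cmp/s |    err% |         ins/cmp |         cyc/cmp |    IPC |        bra/cmp |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|                9.92 |      100,772,511.62 |    0.0% |           63.98 |           35.64 |  1.795 |          14.17 |    1.9% |      5.50 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|          109,996.46 |            9,091.20 |    0.2% |    1,014,421.11 |      394,979.29 |  2.568 |     313,025.11 |    0.0% |      5.50 | `CheckBlockIndex`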

e2e0217 refactor: inline arith_uint256 comparison operator

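|              ns/cmp |               cmp/s |    err% |         ins/cmp |         cyc/cmp |    IPC |        bra/cmp |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|                6.48 |      154,439,491.66 |    0.0% |           31.01 |           23.25 |  1.334 |           7.16 |    3.8% |      5.50 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|          105,754.86 |            9,455.83 |    0.1% |      913,130.11 |      379,588.20 |  2.406 |     276,692.11 |    0.0% |      5.50 | `CheckBlockIndex`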

85b74b0 refactor: optimize arith_uint256 comparison with spaceship operator

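|              ns/cmp |               cmp/s |    err% |         ins/cmp |         cyc/cmp |    IPC |        bra/cmp |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|                6.37 |      156,990,488.06 |    0.0% |           28.85 |           22.87 |  1.261 |           8.61 |    3.2% |      5.50 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|           83,803.10 |           11,932.73 |    0.0% |      743,565.09 |      300,824.84 |  2.472 |     232,646.08 |    0.0% |      5.56 | `CheckBlockIndex`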

deb58ee refactor: optimize CBlockIndexWorkComparator with std::tie

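|              ns/cmp |               cmp/s |    err% |         ins/cmp |         cyc/cmp |    IPC |        bra/cmp |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|                1.52 |      656,674,205.98 |    0.0% |           13.18 |            5.47 |  2.410 |           3.06 |    0.1% |      5.50 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|           67,726.84 |           14,765.19 |    0.2% |      585,826.07 |      243,155.90 |  2.409 |     181,920.07 |    0.0% |      5.54 | `CheckBlockIndex`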

Compiler: clang 22.0.0

b60450f bench: add benchmark to measure CBlockIndexWorkComparator performance

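|              ns/cmp |               cmp/s |    err% |         ins/cmp |         cyc/cmp |    IPC |        bra/cmp |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|                9.96 |      100,451,893.46 |    0.0% |           61.28 |           35.75 |  1.714 |          13.62 |    2.1% |      5.52 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|           96,878.70 |           10,322.19 |    0.1% |      802,827.10 |      347,679.01 |  2.309 |     234,823.10 |    0.0% |      5.50 | `CheckBlockIndex`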

e2e0217 refactor: inline arith_uint256 comparison operator

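|              ns/cmp |               cmp/s |    err% |         ins/cmp |         cyc/cmp |    IPC |        bra/cmp |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|                6.13 |      163,183,693.10 |    0.0% |           25.74 |           22.01 |  1.170 |           6.10 |    4.6% |      5.50 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|           86,307.94 |           11,586.42 |    1.2% |      646,229.09 |      309,785.33 |  2.086 |     195,119.09 |    0.0% |      5.54 | `CheckBlockIndex`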

85b74b0 refactor: optimize arith_uint256 comparison with spaceship operator

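|              ns/cmp |               cmp/s |    err% |         ins/cmp |         cyc/cmp |    IPC |        bra/cmp |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|                6.36 |      157,330,900.16 |    1.0% |           26.20 |           22.61 |  1.159 |           6.55 |    4.4% |      5.53 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|           75,031.66 |           13,327.71 |    1.0% |      650,604.15 |      266,934.04 |  2.437 |     149,967.14 |    0.0% |      5.54 | `CheckBlockIndex`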

deb58ee refactor: optimize CBlockIndexWorkComparator with std::tie

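|              ns/cmp |               cmp/s |    err% |         ins/cmp |         cyc/cmp |    IPC |        bra/cmp |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|                1.46 |      683,414,234.14 |    0.0% |           16.12 |            5.25 |  3.067 |           4.02 |    0.1% |      5.50 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|           69,559.70 |           14,376.14 |    0.1% |      559,208.07 |      249,654.28 |  2.240 |     132,342.07 |    0.0% |      5.51 | `CheckBlockIndex`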

Reproducer

Run the equivalence tests with:

cmake -B build && cmake --build build && ./build/bin/test_bitcoin --run_test=arith_uint256_tests,blockchain_tests

Each commit message shows how that commit changes the relevant benchmarks.

Benchmark script
for compiler in gcc clang; do \
  if [ "$compiler" = "gcc" ]; then CC=gcc; CXX=g++; COMP_VER=$(gcc -dumpfullversion); \
  else CC=clang; CXX=clang++; COMP_VER=$(clang -dumpversion); fi && \
  echo "> Compiler: $compiler $COMP_VER" && \
  for commit in b60450fae83970daa9dc2da0706bf126a2f41515 e2e02177ba6f7fac34eda9696dad2e8ecd44e6cd 85b74b01de7e914d07630138eaa78f09a083b85b deb58eea2fdd2712a28aa6b81417087426b19f5b; do \
    git fetch origin $commit >/dev/null 2>&1 && git checkout $commit >/dev/null 2>&1 && git log -1 --pretty='%h %s' && \
    rm -rf build && \
    cmake -B build -DBUILD_BENCH=ON -DENABLE_IPC=OFF -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_C_COMPILER=$CC -DCMAKE_CXX_COMPILER=$CXX >/dev/null 2>&1 && \
    cmake --build build -j$(nproc) >/dev/null 2>&1 && \
    for i in 1; do \
      build/bin/bench_bitcoin -filter='CBlockIndexWorkComparator|CheckBlockIndex' -min-time=5000; \
    done; \
  done; \
done

Note: a similar effort was already started in #33334. This PR is a broader optimization that uses a different technique, but the author of that PR was added as co-author regardless.

@DrahtBot
Contributor

DrahtBot commented Oct 16, 2025

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33637.

Reviews

See the guideline for information on the review process.

| Type | Reviewers |
|:-----|:----------|
| ACK | laanwj |
| Approach ACK | Raimo33 |

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #33300 (fuzz: compact block harness by Crypt-iQ)
  • #29640 (Fix tiebreak when loading blocks from disk (and add tests for comparing chain ties) by sr-gi)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@Raimo33

Raimo33 commented Oct 17, 2025

Approach ACK

I have tested the new block index comparator but I’ll refrain from acking the added benchmarks/tests

l0rinc and others added 5 commits October 18, 2025 07:48
Profiling shows this comparator consumes a significant portion of `CheckBlockIndex`:
... ChainstateManager::CheckBlockIndex()
    ... std::_Rb_tree<...>::find(...)
        ... node::CBlockIndexWorkComparator::operator()(...)
            ... base_uint<256u>::CompareTo(...) const

This commit introduces a benchmark that performs pairwise comparisons on 1000 randomly generated block indices (with some duplicates) to establish baseline performance metrics before further optimization.
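The shape of such a pairwise benchmark can be sketched as follows. This is a standalone illustration using `std::chrono`, not the project's benchmark framework. `Entry`, `make_entries`, and `run_pairwise` are hypothetical names, and the small value ranges are an assumption chosen only to force some duplicate keys.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for a block index entry.
struct Entry {
    std::uint64_t work;
    std::int32_t seq;
};

// Generate `n` pseudo-random entries from a fixed seed. The value ranges are
// deliberately small so that equal keys (duplicates) occur.
std::vector<Entry> make_entries(std::size_t n, std::uint32_t seed)
{
    std::mt19937_64 rng(seed);
    std::uniform_int_distribution<std::uint64_t> work_dist(0, n / 4);
    std::uniform_int_distribution<std::int32_t> seq_dist(0, 3);
    std::vector<Entry> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        out.push_back({work_dist(rng), seq_dist(rng)});
    }
    return out;
}

struct PairwiseResult {
    std::size_t comparisons; // number of comparator calls made
    std::size_t less_count;  // how many returned true (keeps the loop observable)
    double ns_per_cmp;       // average wall-clock time per call
};

// Compare every entry against every entry and time the whole loop.
template <typename Less>
PairwiseResult run_pairwise(const std::vector<Entry>& entries, Less less)
{
    const auto start = std::chrono::steady_clock::now();
    std::size_t less_count = 0;
    for (const Entry& a : entries) {
        for (const Entry& b : entries) {
            less_count += less(&a, &b) ? 1 : 0;
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::size_t comparisons = entries.size() * entries.size();
    const double ns = std::chrono::duration<double, std::nano>(stop - start).count();
    return {comparisons, less_count, comparisons ? ns / comparisons : 0.0};
}
```

Accumulating `less_count` and returning it gives the compiler a reason to keep the comparisons; a dedicated benchmarking library handles that, plus warm-up and error estimation, more rigorously.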

|              ns/cmp |               cmp/s |    err% |         ins/cmp |         cyc/cmp |    IPC |        bra/cmp |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|                9.92 |      100,772,511.62 |    0.0% |           63.98 |           35.64 |  1.795 |          14.17 |    1.9% |      5.50 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|          109,996.46 |            9,091.20 |    0.2% |    1,014,421.11 |      394,979.29 |  2.568 |     313,025.11 |    0.0% |      5.50 | `CheckBlockIndex`

Add equivalence tests to verify behavioral compatibility when optimizing comparison operations.
They duplicate the existing behavior for now, so that reviewers can validate that both versions behave the same way before the optimizations are applied.

The `arith_uint256` test verifies that the spaceship operator produces identical results to the original `CompareTo` method for all comparison operators (<, >, <=, >=, ==, !=).

The `CBlockIndexWorkComparator` test captures the current comparison logic in a lambda and verifies that optimized versions maintain identical sorting behavior for chain work, sequence ID, and pointer tiebreaking.

You can run the tests with:
> cmake -B build && cmake --build build && ./build/bin/test_bitcoin --run_test=arith_uint256_tests,blockchain_tests

Remove the `CompareTo` method and inline its logic directly into `operator<=>`, updating related comments.
This eliminates function call overhead in the hot path during block generation and chain selection.

The comparison algorithm remains unchanged, iterating from most significant to least significant word, but now returns `std::strong_ordering` directly instead of an integer that gets converted via spaceship operator.

|              ns/cmp |               cmp/s |    err% |         ins/cmp |         cyc/cmp |    IPC |        bra/cmp |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|                6.48 |      154,439,491.66 |    0.0% |           31.01 |           23.25 |  1.334 |           7.16 |    3.8% |      5.50 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|          105,754.86 |            9,455.83 |    0.1% |      913,130.11 |      379,588.20 |  2.406 |     276,692.11 |    0.0% |      5.50 | `CheckBlockIndex`

Replace the multiple comparisons with a single, direct call to the C++20 spaceship operator.

|              ns/cmp |               cmp/s |    err% |         ins/cmp |         cyc/cmp |    IPC |        bra/cmp |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|                6.37 |      156,990,488.06 |    0.0% |           28.85 |           22.87 |  1.261 |           8.61 |    3.2% |      5.50 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|           83,803.10 |           11,932.73 |    0.0% |      743,565.09 |      300,824.84 |  2.472 |     232,646.08 |    0.0% |      5.56 | `CheckBlockIndex`

Replace manual comparison branches with a single tuple comparison, allowing the compilers to generate more efficient comparison code.
Also, implicitly inline the code by moving it to the header for additional gains.
For symmetry, `CBlockIndexHeightOnlyComparator` was also moved to the header.
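The tuple technique can be sketched as follows, assuming a simplified `Entry` with a single-word `work` field and a `seq` field. These names, and the exact key order and directions, are illustrative assumptions rather than the code in this commit. Swapping the two operands for a key inside `std::tie` reverses that key's direction.

```cpp
#include <cstdint>
#include <tuple>

// Hypothetical stand-in for a block index entry.
struct Entry {
    std::uint64_t work; // stand-in for accumulated chain work
    std::int32_t seq;   // stand-in for the sequence id
};

// Branch-by-branch version: less work sorts first; on equal work the higher
// sequence id sorts first; on a full tie the higher address sorts first.
bool less_branches(const Entry* a, const Entry* b)
{
    if (a->work > b->work) return false;
    if (a->work < b->work) return true;
    if (a->seq < b->seq) return false;
    if (a->seq > b->seq) return true;
    if (a < b) return false;
    if (a > b) return true;
    return false; // same object
}

// Tuple version: one lexicographic comparison over the same three keys.
// `a` and `b` are swapped for the last two keys to reverse their direction.
bool less_tie(const Entry* a, const Entry* b)
{
    return std::tie(a->work, b->seq, b) < std::tie(b->work, a->seq, a);
}
```

The relational operators on raw pointers only give a specified result for pointers into the same array or object, so this sketch should be exercised with entries stored in one array; `std::less` would be the choice if a guaranteed total order over unrelated pointers were needed.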

|              ns/cmp |               cmp/s |    err% |         ins/cmp |         cyc/cmp |    IPC |        bra/cmp |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|                1.52 |      656,674,205.98 |    0.0% |           13.18 |            5.47 |  2.410 |           3.06 |    0.1% |      5.50 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|           67,726.84 |           14,765.19 |    0.2% |      585,826.07 |      243,155.90 |  2.409 |     181,920.07 |    0.0% |      5.54 | `CheckBlockIndex`

Co-authored-by: Raimo33 <claudio.raimondi@protonmail.com>
@l0rinc l0rinc force-pushed the l0rinc/block_index_comparators branch from f74572e to c15d839 October 18, 2025 05:55
@l0rinc l0rinc changed the title from "refactor: optimize block index comparisons (1.4-7.7x faster)" to "refactor: optimize block index comparisons (1.4-6.8x faster)" Oct 18, 2025
@Christewart
Contributor

Christewart commented Oct 18, 2025

I ran the script but am not really sure what these results indicate, so I am just pasting them here:

Darwin Chriss-MacBook-Pro.local 24.6.0 Darwin Kernel Version 24.6.0: Mon Jul 14 11:30:55 PDT 2025; root:xnu-11417.140.69~1/RELEASE_ARM64_T6031 arm64
Apple clang version 17.0.0 (clang-1700.3.19.1)
Target: arm64-apple-darwin24.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
|              ns/cmp |               cmp/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                4.97 |      201,237,014.91 |    0.4% |      5.49 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           43,777.21 |           22,842.94 |    0.3% |      5.41 | `CheckBlockIndex`

|              ns/cmp |               cmp/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                1.89 |      529,694,702.31 |    0.1% |      5.50 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           38,303.87 |           26,107.02 |    0.1% |      5.35 | `CheckBlockIndex`

|              ns/cmp |               cmp/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                2.31 |      432,334,219.11 |    0.6% |      5.45 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           33,598.60 |           29,763.15 |    0.2% |      5.32 | `CheckBlockIndex`

|              ns/cmp |               cmp/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                0.78 |    1,287,258,159.19 |    0.1% |      5.46 | `CBlockIndexWorkComparator`

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|           31,055.32 |           32,200.61 |    0.1% |      5.30 | `CheckBlockIndex`

@l0rinc
Contributor Author

l0rinc commented Oct 19, 2025

Thanks for the measurements, @Christewart. This is how they compare to mine (and, most importantly, to master):
image

Member

@laanwj laanwj left a comment

Code review ACK c15d839
I did not run the benchmarks, but the code changes look good; using <=> makes sense.


6 participants