streams: replace `std::find` with `memchr` (5x improvement) #34044

Raimo33 · 2025-12-10T17:34:23Z

Summary

This PR optimizes the FindByte method by replacing std::find with memchr. This takes advantage of the optimizations built into memchr, primarily vectorized chunked reads. While std::find is the more standard and modern choice, it is suboptimal for searching single bytes, since they are compared one by one instead of exploiting SIMD.

One could argue that this is not a concern of Bitcoin Core but rather of the libc++ maintainers, but since it shows a 5x improvement in existing benchmarks, I think it's worth including.

Benchmarks

Details

secp256k1 configure summary
===========================
Build artifacts:
  library type ........................ Static
Optional modules:
  ECDH ................................ OFF
  ECDSA pubkey recovery ............... ON
  extrakeys ........................... ON
  schnorrsig .......................... ON
  musig ............................... ON
  ElligatorSwift ...................... ON
Parameters:
  ecmult window size .................. 15
  ecmult gen table size ............... 86 KiB
Optional features:
  assembly ............................ x86_64
  external callbacks .................. OFF
Optional binaries:
  benchmark ........................... OFF
  noverify_tests ...................... OFF
  tests ............................... OFF
  exhaustive tests .................... OFF
  ctime_tests ......................... OFF
  examples ............................ OFF

Cross compiling ....................... FALSE
API visibility attributes ............. ON
Valgrind .............................. ON
Preprocessor defined macros ........... ECMULT_WINDOW_SIZE=15 COMB_BLOCKS=43 COMB_TEETH=6 USE_ASM_X86_64=1 VALGRIND
C compiler ............................ GNU 13.3.0, /usr/bin/cc
CFLAGS ................................ 
Compile options ....................... -Wall -pedantic -Wcast-align -Wcast-align=strict -Wextra -Wnested-externs -Wno-long-long -Wno-overlength-strings -Wno-unused-function -Wshadow -Wstrict-prototypes -Wundef
Build type:
 - CMAKE_BUILD_TYPE ................... Release
 - CFLAGS ............................. -O2 -g 
 - LDFLAGS for executables ............ 
 - LDFLAGS for shared libraries ....... 



Configure summary
=================
Executables:
  bitcoin ............................. OFF
  bitcoind ............................ ON
  bitcoin-node (multiprocess) ......... ON
  bitcoin-qt (GUI) .................... OFF
  bitcoin-gui (GUI, multiprocess) ..... OFF
  bitcoin-cli ......................... OFF
  bitcoin-tx .......................... OFF
  bitcoin-util ........................ OFF
  bitcoin-wallet ...................... OFF
  bitcoin-chainstate (experimental) ... OFF
  libbitcoinkernel (experimental) ..... OFF
  kernel-test (experimental) .......... OFF
Optional features:
  wallet support ...................... OFF
  external signer ..................... OFF
  ZeroMQ .............................. OFF
  IPC ................................. ON
  USDT tracing ........................ OFF
  QR code (GUI) ....................... OFF
  DBus (GUI) .......................... OFF
Tests:
  test_bitcoin ........................ OFF
  test_bitcoin-qt ..................... OFF
  bench_bitcoin ....................... OFF
  fuzz binary ......................... OFF

Cross compiling ....................... FALSE
C++ compiler .......................... GNU 13.3.0, /usr/bin/c++
CMAKE_BUILD_TYPE ...................... Release
Preprocessor defined macros ........... 
C++ compiler flags .................... -O2 -std=c++20 -fPIC -fno-extended-identifiers -fdebug-prefix-map=/home/claudio/Desktop/bitcoinknots/src=. -fmacro-prefix-map=/home/claudio/Desktop/bitcoinknots/src=. -fstack-reuse=none -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3 -Wstack-protector -fstack-protector-all -fcf-protection=full -fstack-clash-protection -Wall -Wextra -Wformat -Wformat-security -Wvla -Wredundant-decls -Wdate-time -Wduplicated-branches -Wduplicated-cond -Wlogical-op -Woverloaded-virtual -Wsuggest-override -Wimplicit-fallthrough -Wunreachable-code -Wbidi-chars=any -Wundef -Wno-unused-parameter
Linker flags .......................... -O2 -fstack-reuse=none -fstack-protector-all -fcf-protection=full -fstack-clash-protection -Wl,-z,relro -Wl,-z,now -Wl,-z,separate-code -fPIE -pie

taskset -c 1 ./bin/bench_bitcoin -filter="(FindByte|LoadExternalBlockFile)" --min-time=10000

Before:

ns/op	op/s	err%	total	benchmark
53.20	18,796,833.40	0.0%	11.00	`FindByte`
22,499,431.11	44.45	0.2%	10.90	`LoadExternalBlockFile`

After:

ns/op	op/s	err%	total	benchmark
10.38	96,365,031.03	0.0%	10.99	`FindByte`
22,128,903.67	45.19	0.3%	10.96	`LoadExternalBlockFile`

I've also run a reindex benchmark up to block 300,000, and it shows a slight improvement of ~1.2%.

Details

CMD ["hyperfine", \
    "--runs", "3", \
    "--setup", "pyperf system tune; bitcoind -datadir=. -stopatheight=1 || true", \
    "--prepare", "rm -rf chainstate/", \
    "--cleanup", "pyperf system reset", \
    "bitcoind -datadir=. -listen=0 -dnsseed=0 -fixedseeds=0 -printtoconsole=0 -blocksonly=1 -reindex -stopatheight=300000 -dbcache=4096"]

before:

  Time (mean ± σ): 2097.363 s ± 18.306 s    [User: 5859.220 s, System: 62.772 s]
  Range (min … max): 2079.740 s … 2116.283 s    3 runs

after:

  Time (mean ± σ): 2072.158 s ± 29.275 s    [User: 5857.330 s, System: 63.515 s]
  Range (min … max): 2046.102 s … 2103.836 s    3 runs

DrahtBot · 2025-12-10T17:34:30Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/34044.

Reviews

See the guideline for information on the review process.

Type	Reviewers
Concept ACK	Ataraxia009, l0rinc

If your review is incorrectly listed, please copy-paste  into the comment that the bot should ignore.

Ataraxia009 · 2025-12-10T18:30:33Z

Concept ACK

'memchr' seems like a better alternative here

l0rinc

Concept ACK

FindByte’s only production use seems to be scanning block files during -reindex to find the first byte of the network magic in a circular buffer.
Because of documented historical bugs, the data may be jumbled a bit, which is why we need to be able to find the beginning.
It's not something we expect users to do often, but this should speed it up a tiny bit. I left a few suggestions to make it more readable as well, which should make it easier to sell :)

I would be curious whether a -reindex -stopafterblockimport speedup is measurable - since my understanding is that the magic is usually at the beginning anyway, so it's also possible this ends up slowing down the average case.

I haven't checked, but LoadExternalBlockFile bench should also exercise this change - and it puts the change into perspective.

src/streams.h

l0rinc · 2025-12-11T12:39:39Z

It doesn't help that you rebased on a Knots base - the first few commits are already merged on Core, please fix that.

src/streams.h

before:

| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 53.20 | 18,796,833.40 | 0.0% | 11.00 | `FindByte`
| 22,499,431.11 | 44.45 | 0.2% | 10.90 | `LoadExternalBlockFile`

after:

| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 10.38 | 96,365,031.03 | 0.0% | 10.99 | `FindByte`
| 22,128,903.67 | 45.19 | 0.3% | 10.96 | `LoadExternalBlockFile`

Raimo33 · 2025-12-12T18:57:18Z

> I would be curious whether a -reindex -stopafterblockimport speedup is measurable - since my understanding is that the magic is usually at the beginning anyway, so it's also possible this ends up slowing down the average case.

I've run a reindex benchmark, see the updated PR description. I'm not sure whether ~1.2% is an irrelevant gain, but I argue that a 5-6x improvement on the FindByte method is significant and should be considered for merging regardless, even for possible future uses of this method.

l0rinc · 2025-12-14T11:08:33Z

I ran a reindex (without chainstate) 3 times for the whole mainchain - there is no measurable speedup here (it's even 1-2% slower than before, though that's most likely just noise):

COMMITS="938d7aacabd0bb3784bb3e529b1ed06bb2891864 e24701fe5522ac9b0eaeacc67bd16e11555a6020"; \
DATA_DIR="$HOME/Library/Application Support/Bitcoin"; LOG_DIR="$HOME/bitcoin-reindex-logs"; \
mkdir -p "$LOG_DIR"; \
COMMA_COMMITS=${COMMITS// /,}; \
(echo ""; for c in $(echo $COMMITS); do git fetch -q origin $c && git log -1 --pretty='%h %s' $c || exit 1; done) && \
(echo "" && echo "reindex | $(hostname) | $(uname -m) | $(sysctl -n machdep.cpu.brand_string) | $(nproc) cores | $(printf '%.1fGiB' "$(( $(sysctl -n hw.memsize)/1024/1024/1024 ))") RAM | SSD | $(sw_vers -productName) $(sw_vers -productVersion) $(sw_vers -buildVersion) | $(xcrun clang --version | head -1)"; echo "") && \
hyperfine \
  --sort command \
  --runs 3 \
  --export-json "$LOG_DIR/reindex-$(echo "$COMMITS" | sed -E 's/([a-f0-9]{8})[a-f0-9]* ?/\1-/g;s/-$//')-appleclang.json" \
  --parameter-list COMMIT "$COMMA_COMMITS" \
  --prepare "killall -9 bitcoind 2>/dev/null || true; rm -f \"$DATA_DIR\"/debug.log; git checkout {COMMIT}; git clean -fxd; git reset --hard && \
    cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=RelWithDebInfo && ninja -C build bitcoind -j2" \
  --conclude "killall bitcoind 2>/dev/null || true; sleep 5; grep -q 'Stopping after block import' \"$DATA_DIR\"/debug.log || { echo 'debug.log assertions failed'; exit 1; }; \
              cp \"$DATA_DIR\"/debug.log \"$LOG_DIR\"/reindex-{COMMIT}-\$(date +%s).log 2>/dev/null || true" \
  "./build/bin/bitcoind -datadir=\"$DATA_DIR\" -reindex -stopafterblockimport -printtoconsole=0"

938d7aacab Merge bitcoin/bitcoin#33657: rest: allow reading partial block data from storage
e24701fe55 streams: replace std::find with memchr

Benchmark 1: ./build/bin/bitcoind -datadir="/Users/lorinc/Library/Application Support/Bitcoin" -reindex -stopafterblockimport -printtoconsole=0 (COMMIT = 938d7aacabd0bb3784bb3e529b1ed06bb2891864)
  Time (mean ± σ):     17116.255 s ± 287.737 s    [User: 18550.915 s, System: 3721.215 s]
  Range (min … max):   16793.539 s … 17346.051 s    3 runs

Benchmark 1: ./build/bin/bitcoind -datadir="/Users/lorinc/Library/Application Support/Bitcoin" -reindex -stopafterblockimport -printtoconsole=0 (COMMIT = e24701fe5522ac9b0eaeacc67bd16e11555a6020)
  Time (mean ± σ):     17344.195 s ± 113.196 s    [User: 18589.419 s, System: 3727.864 s]
  Range (min … max):   17247.068 s … 17468.509 s    3 runs

It can still be a useful change, but I don't think it's fair to say it speeds up anything.

sipa · 2025-12-14T13:35:37Z

I wouldn't expect any change, because unless you're starting from a pre-0.8 node block files, or badly corrupted ones, it'll always be the first byte that matches.

l0rinc · 2025-12-14T13:38:13Z

It's also what I expected; as mentioned above, the current solution is worse for that case.

Raimo33 · 2025-12-14T13:53:14Z

Understood. std::find makes more sense then. Something to keep in mind for the future though.

l0rinc · 2025-12-14T17:07:00Z

> Something to keep in mind for the future though.

~~I have pushed a change to remove the benchmark that led to the confusion: #34046 (comment)~~ reverted

maflcko · 2025-12-15T08:28:28Z

> Something to keep in mind for the future though.

> I have pushed a change to remove the benchmark that led to the confusion: #34046 (comment)

I think it could be better to remove the recovery logic. An upgrade from 0.8 does not seem like a use-case that any user will ever need in the future. Also, I don't see what kind of corruption this could possibly be able to recover, given that it can't progress past the first corrupt block anyway?

l0rinc · 2025-12-15T09:00:56Z

> I think it could be better to remove the recovery logic

I will push a PR for that, let's see what others think

l0rinc mentioned this pull request Dec 10, 2025

bench: run FindByte across block-sized buffer #34046

Closed

l0rinc reviewed Dec 10, 2025


Raimo33 force-pushed the optimize-findbyte branch from 4115c99 to 246086c Compare December 11, 2025 12:33

l0rinc reviewed Dec 11, 2025


Raimo33 force-pushed the optimize-findbyte branch from 246086c to b6f93de Compare December 11, 2025 12:43

DrahtBot added the CI failed label Dec 11, 2025

Raimo33 added 2 commits December 11, 2025 13:58

refactor: more readable increment in FindByte

06d4e79

Raimo33 force-pushed the optimize-findbyte branch from b6f93de to 830024e Compare December 11, 2025 13:20

DrahtBot removed the CI failed label Dec 11, 2025

Raimo33 closed this Dec 14, 2025
