fix: segfault during cached-block catch-up sync #765

Merged: ch4r10t33r merged 5 commits into main from fix/sync-cached-block-uaf on Apr 20, 2026

@ch4r10t33r commented Apr 18, 2026

Summary

Fixes two distinct segfaults observed during catch-up sync on x86_64:

  1. ssz.serialize corrupting live cached SignedBlocks. Stack bottoms out in
    onBlock iterating aggregated_attestations (or later in verifySignatures)
    with a dangling slice into a cached block's heap buffer. Root cause:
    ssz.serialize on a live cached SignedBlock (directly, or via sszClone)
    mutates nested List / Bitlist fields (aggregation_bits, proof_data),
    so the next read of the same cached block crashes.

  2. fetched_blocks hashmap rehash race (folded in from #741, "fix race condition leading to seg faults"). The map in the
    network layer is shared between the libxev main thread (onInterval) and the
    libp2p worker thread (onGossip / onReqRespResponse / onReqRespRequest).
    When the worker inserts and triggers a rehash, the main thread's in-flight
    lookups/iterations become invalid and panic.

Changes

ssz-corruption fix (cherry-picked from fix/libp2p-glue-c-abi-x86_64)

  • 1aecb39 chain, database: persist pre-forkchoice signed block SSZ for rocksdb
  • 1fd70ea node, types: capture block SSZ bytes at cache time, never re-serialize cached blocks

Highlights:

  • types/utils: new sszCloneAndGetBytes — one serialize pass returning both a
    deep clone and the SSZ bytes (caller owns), avoiding a second serialize on the
    same value.
  • network: new fetched_block_ssz map (root → []u8) with lifecycle tied to
    cacheFetchedBlock / removeFetchedBlock / deinit; getFetchedBlockSsz /
    storeFetchedBlockSsz helpers.
  • node.cacheBlockAndFetchParent: uses sszCloneAndGetBytes and stores the
    bytes alongside the cached block.
  • chain.CachedProcessedBlockInfo: new sszBytes: ?[]const u8 field.
  • node.processCachedDescendants: retrieves stored bytes and passes them via
    blockInfo.sszBytes.
  • chain.onBlock / updateBlockDb: uses blockInfo.sszBytes when present,
    falls back to a disposable-clone serialize only for locally produced blocks.
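The highlights above can be sketched in miniature. This is a Rust analog for illustration only, not the project's Zig code: the `SignedBlock` layout, the toy byte encoding (not real SSZ), and every function and field name below are hypothetical stand-ins. It assumes the deep clone is produced by deserializing the freshly serialized bytes, which is one way a single serialize pass could yield both outputs.

```rust
use std::collections::HashMap;

type Root = [u8; 32];

// Hypothetical stand-in for a signed block with one variable-length field.
#[derive(Clone, Debug, PartialEq)]
struct SignedBlock {
    slot: u64,
    aggregation_bits: Vec<u8>,
}

impl SignedBlock {
    // Toy encoding (NOT real SSZ): 8-byte little-endian slot, then the raw bits.
    fn serialize(&self) -> Vec<u8> {
        let mut out = self.slot.to_le_bytes().to_vec();
        out.extend_from_slice(&self.aggregation_bits);
        out
    }

    fn deserialize(bytes: &[u8]) -> Option<SignedBlock> {
        if bytes.len() < 8 {
            return None;
        }
        let (head, tail) = bytes.split_at(8);
        let mut slot_bytes = [0u8; 8];
        slot_bytes.copy_from_slice(head);
        Some(SignedBlock {
            slot: u64::from_le_bytes(slot_bytes),
            aggregation_bits: tail.to_vec(),
        })
    }
}

// Analog of sszCloneAndGetBytes: serialize once, rebuild the clone from
// those bytes, and hand both back to the caller.
fn clone_and_get_bytes(block: &SignedBlock) -> Option<(SignedBlock, Vec<u8>)> {
    let bytes = block.serialize();
    let clone = SignedBlock::deserialize(&bytes)?;
    Some((clone, bytes))
}

// Two maps keyed by block root; entries are inserted and removed together.
#[derive(Default)]
struct FetchedBlocks {
    blocks: HashMap<Root, SignedBlock>,
    ssz: HashMap<Root, Vec<u8>>,
}

impl FetchedBlocks {
    // Caches a clone of `incoming` plus the bytes captured during cloning.
    fn cache(&mut self, root: Root, incoming: &SignedBlock) -> bool {
        match clone_and_get_bytes(incoming) {
            Some((clone, bytes)) => {
                self.blocks.insert(root, clone);
                self.ssz.insert(root, bytes);
                true
            }
            None => false,
        }
    }

    fn get_ssz(&self, root: &Root) -> Option<&[u8]> {
        self.ssz.get(root).map(|v| v.as_slice())
    }

    // Removing a block also drops its stored bytes, keeping lifecycles tied.
    fn remove(&mut self, root: &Root) {
        self.blocks.remove(root);
        self.ssz.remove(root);
    }
}
```

Keeping the bytes in a sibling map rather than inside the block itself means later consumers can read the stored encoding without ever calling the serializer on the cached value.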

Hashmap rehash race fix (cherry-picked from #741)

  • 71280e3 fix race condition leading to seg faults (by @anshalshukla)

Adds a std.Thread.Mutex to BeamNode that serializes work between the
libxev main thread and the libp2p worker thread. Locks are taken around:

  • onGossip
  • onReqRespResponse
  • onReqRespRequest (.blocks_by_root branch)
  • onInterval (chain tick, pending replay, RPC sweep, processReadyCachedBlocks)
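The locking scheme can be illustrated with a small Rust analog. This is a sketch under stated assumptions, not the BeamNode code: `Node`, `on_gossip`, `on_interval`, and `run_demo` are hypothetical names, and the map holds plain byte vectors instead of blocks. One thread inserts into a shared map (possibly triggering rehashes) while another iterates it, and a single mutex serializes the two.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

type Root = [u8; 32];

// Hypothetical node state shared between the two threads.
struct Node {
    fetched_blocks: HashMap<Root, Vec<u8>>,
}

// Worker-thread handler: takes the lock before inserting, so any rehash
// happens while no other thread is reading the map.
fn on_gossip(node: &Mutex<Node>, root: Root, block: Vec<u8>) {
    let mut guard = node.lock().unwrap();
    guard.fetched_blocks.insert(root, block);
}

// Main-thread tick: holds the lock for the whole iteration.
fn on_interval(node: &Mutex<Node>) -> usize {
    let guard = node.lock().unwrap();
    let total: usize = guard.fetched_blocks.values().map(|b| b.len()).sum();
    total
}

// Runs inserts on a worker thread while the calling thread keeps ticking.
// Returns (number of cached entries, total cached bytes).
fn run_demo(inserts: u8) -> (usize, usize) {
    let node = Arc::new(Mutex::new(Node {
        fetched_blocks: HashMap::new(),
    }));
    let worker_node = Arc::clone(&node);
    let worker = thread::spawn(move || {
        for i in 0..inserts {
            on_gossip(&worker_node, [i; 32], vec![i; 4]);
        }
    });
    for _ in 0..inserts {
        let _ = on_interval(&node);
    }
    worker.join().unwrap();
    let count = node.lock().unwrap().fetched_blocks.len();
    (count, on_interval(&node))
}
```

In Rust the unlocked variant would not compile, since a `HashMap` cannot be mutated from one thread while borrowed on another. Zig has no such check, which is why the race described above only surfaced at runtime.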

Test plan

  • zig fmt --check .
  • cargo fmt --manifest-path rust/Cargo.toml --all -- --check
  • cargo clippy --manifest-path rust/Cargo.toml --workspace -- -D warnings
  • zig build test --summary all
  • zig build simtest --summary all
  • Soak under catch-up-sync scenario on x86_64 hardware to confirm both
    original crashes are resolved

ch4r10t33r and others added 3 commits April 18, 2026 10:11
…e cached blocks

ssz.serialize called on a live cached SignedBlock (directly or through sszClone)
has been observed to corrupt in-memory List/Bitlist state (aggregation_bits,
proof_data), causing segfaults on the next cached block processed from the same
stack frame — first in the fork-choice attestation loop, then in verifySignatures
after the previous sszClone fix shifted the call site earlier.

The root fix: serialize each block exactly once, at the moment it is cloned into
the cache, and carry those bytes through to DB persistence so onBlock never needs
to touch the live SignedBlock with ssz.serialize again.

- types/utils: add sszCloneAndGetBytes — one serialize pass that returns both a
  deep clone and the SSZ byte slice (caller owns), avoiding a second serialize on
  the same value
- network: add fetched_block_ssz map (root → []u8); removeFetchedBlock and deinit
  clean it up; new getFetchedBlockSsz / storeFetchedBlockSsz helpers
- node/cacheBlockAndFetchParent: use sszCloneAndGetBytes instead of sszClone and
  store the returned bytes in fetched_block_ssz
- chain/CachedProcessedBlockInfo: add sszBytes ?[]const u8 field
- node/processCachedDescendants: retrieve stored bytes and pass via blockInfo
- chain/onBlock: use blockInfo.sszBytes when present; fall back to clone-and-
  serialize only for locally produced blocks (no cached bytes available)
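The persistence-side fallback described in the last bullet might look roughly like the following Rust analog. It is a hedged sketch: `bytes_for_db`, the struct layouts, and the toy encoding are hypothetical and do not mirror the actual onBlock / updateBlockDb code. The point it shows is that cached bytes, when present, are used verbatim, and only a throwaway clone is ever serialized otherwise.

```rust
// Hypothetical stand-in for a signed block.
#[derive(Clone, Debug, PartialEq)]
struct SignedBlock {
    slot: u64,
    proof_data: Vec<u8>,
}

impl SignedBlock {
    // Toy encoding (NOT real SSZ): 8-byte little-endian slot, then proof bytes.
    fn serialize(&self) -> Vec<u8> {
        let mut out = self.slot.to_le_bytes().to_vec();
        out.extend_from_slice(&self.proof_data);
        out
    }
}

// Analog of CachedProcessedBlockInfo with its optional sszBytes field.
struct CachedProcessedBlockInfo {
    ssz_bytes: Option<Vec<u8>>,
}

// Picks the bytes to persist. Cached bytes win; a locally produced block
// (no cached bytes) is serialized through a disposable clone so the live
// value is never handed to the serializer.
fn bytes_for_db(block: &SignedBlock, info: &CachedProcessedBlockInfo) -> Vec<u8> {
    match &info.ssz_bytes {
        Some(bytes) => bytes.clone(),
        None => {
            let disposable = block.clone();
            disposable.serialize()
        }
    }
}
```

For example, calling `bytes_for_db` with `ssz_bytes: Some(..)` returns exactly those bytes even if they differ from what `serialize` would produce, which makes it easy to confirm in a test that the cached path never re-serializes.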
@ch4r10t33r changed the title from "node, types, database: fix segfault during cached-block catch-up sync" to "fix: segfault during cached-block catch-up sync" on Apr 18, 2026
@noopur23 commented

The PR resolved the segfault issue.

@ch4r10t33r requested a review from noopur23 April 20, 2026 08:47
@ch4r10t33r merged commit 19e0bc5 into main Apr 20, 2026
13 checks passed
@ch4r10t33r deleted the fix/sync-cached-block-uaf branch April 20, 2026 08:53