fix(blockchain): defer main-chain events until write batch is flushed #11640

Merged
asdacap merged 4 commits into NethermindEth:master from obchain:fix/11616-blocktree-events-after-batch-flush on May 20, 2026

@obchain (Contributor) commented May 18, 2026

Fixes #11616

Changes

  • BlockTree.UpdateMainChain opened a ChainLevelInfoRepository write batch and held it across MoveToMain for every block in the slice. BlockAddedToMain, NewHeadBlock, and OnUpdateMainChain all fired while the batch was still in scope. PersistLevel writes to _blockInfoCache and the batch simultaneously, so the bug was masked for any subscriber hitting the cache; once the cache evicts (CacheSize = 64), a fresh repository reads through to the underlying IDb, or a process restarts mid-flow, subscribers observe stale HasBlockOnMainChain markers. InMemoryWriteBatch / RocksDbWriteBatch only commit on Dispose().
  • MoveToMain now returns a buffered DeferredMainChainEvent (BlockReplacementEventArgs + optional BlockEventArgs from the head-update path) instead of raising the events directly.
  • UpdateHeadBlock(Block) is split into a pure state-mutating SetHeadBlock (writes _blockInfoDb[HeadAddressInDb], returns the event args) and a thin wrapper that fires NewHeadBlock. Existing callers (DeleteChainSlice) are untouched and continue to fire the event after their own batch closes.
  • UpdateMainChain wraps the batch in an inner using block, collects deferred events from each MoveToMain call, disposes the batch (flushing the underlying write batch), then raises the buffered events in their original relative order (per block: NewHeadBlock → BlockAddedToMain), and finally OnUpdateMainChain.
  • Added regression test UpdateMainChain_fires_main_chain_events_after_chain_level_repository_batch_flushed in BlockTreeTests. Each handler constructs a fresh ChainLevelInfoRepository over the same underlying _blocksInfosDb (cache empty) and asserts LoadLevel(...).HasBlockOnMainChain == true. Pre-fix this fails for both per-block events because InMemoryWriteBatch has not yet flushed.
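
The write-then-notify split described in the list above can be sketched in miniature. This is a simplified illustration under stated assumptions, not the actual BlockTree code: SketchWriteBatch, SketchBlockTree, and SketchDemo are hypothetical names, and the only behavior borrowed from the description is that batch writes become visible on Dispose() and that events are raised afterwards.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a write batch: writes reach the store only on Dispose().
public sealed class SketchWriteBatch : IDisposable
{
    private readonly Dictionary<long, bool> _store;
    private readonly Dictionary<long, bool> _pending = new();

    public SketchWriteBatch(Dictionary<long, bool> store) => _store = store;

    public void MarkOnMainChain(long blockNumber) => _pending[blockNumber] = true;

    public void Dispose()
    {
        foreach (KeyValuePair<long, bool> entry in _pending) _store[entry.Key] = entry.Value;
        _pending.Clear();
    }
}

// Hypothetical miniature of the write-then-notify split; not the real BlockTree.
public sealed class SketchBlockTree
{
    public Dictionary<long, bool> Store { get; } = new();
    public event Action<long>? BlockAddedToMain;

    public void UpdateMainChain(IReadOnlyList<long> blockNumbers)
    {
        List<long> pendingEvents = new(blockNumbers.Count);

        // Write phase: mutate state and buffer the notifications.
        using (SketchWriteBatch batch = new(Store))
        {
            foreach (long number in blockNumbers)
            {
                batch.MarkOnMainChain(number);
                pendingEvents.Add(number);
            }
        } // batch disposed here, so the writes are now visible in Store

        // Notify phase: subscribers read committed state.
        foreach (long number in pendingEvents) BlockAddedToMain?.Invoke(number);
    }
}

public static class SketchDemo
{
    // Returns true when every handler saw its block already committed to the store.
    public static bool AllHandlersSawCommittedState()
    {
        SketchBlockTree tree = new();
        bool allCommitted = true;
        tree.BlockAddedToMain += number =>
            allCommitted &= tree.Store.TryGetValue(number, out bool onMain) && onMain;
        tree.UpdateMainChain(new long[] { 1, 2 });
        return allCommitted;
    }
}
```

If the Invoke loop were moved inside the using block, the handlers in SketchDemo would read Store before the batch flushed and the method would return false, which mirrors the pre-fix failure the regression test is described as catching.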

Types of changes

What types of changes does your code introduce?

  • Bugfix (a non-breaking change that fixes an issue)
  • New feature (a non-breaking change that adds functionality)
  • Breaking change (a change that causes existing functionality not to work as expected)
  • Optimization
  • Refactoring
  • Documentation update
  • Build-related changes
  • Other: Description

Testing

Requires testing

  • Yes
  • No

If yes, did you write tests?

  • Yes
  • No

Notes on testing

dotnet test --filter FullyQualifiedName~BlockTreeTests — 245/245 in Nethermind.Blockchain.Test, 38/38 in Nethermind.Merge.Plugin.Test. The new regression test fails pre-fix and passes post-fix. Existing notification-ordering tests (Shall_notify_new_head_block_once_and_block_added_to_main_multiple_times..., BlockAddedToMain_should_have_updated_Head) still pass — relative event order and Head visibility from subscribers are preserved.

Documentation

Requires documentation update

  • Yes
  • No

Requires explanation in Release Notes

  • Yes
  • No

@LukaszRozmej (Member) commented:

@claude review

claude Bot (Contributor) commented May 18, 2026

Claude finished @LukaszRozmej's task in 7m 3s.


Code Review

  • Gather context and read changed files
  • Read relevant rules files
  • Review BlockTree.cs changes
  • Review BlockTreeTests.cs changes
  • Post inline comments and findings

Summary

The fix is correct and addresses a real production-impact bug. The root cause is well-identified: InMemoryWriteBatch/RocksDbWriteBatch only commit writes on Dispose(), so firing events while the batch is still open means any subscriber that bypasses _blockInfoCache (fresh ChainLevelInfoRepository instance, cache eviction at 64 entries, or process restart mid-flow) will observe HasBlockOnMainChain = false even though the in-memory cache shows the updated state.

The deferred-event approach is the right fix. Event ordering is preserved (NewHeadBlock → BlockAddedToMain per block, then OnUpdateMainChain), Head visibility from subscribers is correct (set in SetHeadBlock before the batch closes, before events fire), and the regression test correctly isolates the underlying DB path.

Undocumented improvement: BatchWrite wraps a Monitor.Enter lock on the ChainLevelInfoRepository. Previously all three events fired while holding this lock; now the lock is released before any event fires. This reduces the critical-section duration and removes potential lock-order issues for subscribers that need to read chain levels.


Findings

Critical: 0 | High: 0 | Medium: 0 | Low: 4

Low findings (inline comments posted on specific lines):

  1. TryUpdateSyncPivot() placement — logically belongs after the batch is flushed (it reads the now-canonical head to decide the pivot). Moving it after the using block and before the event loop would make the write/notify split cleaner. BlockTree.cs:1055

  2. List<DeferredMainChainEvent> allocation — new per-call allocation on the block-processing path. Negligible given the surrounding DB work, but worth a comment if allocation profiling surfaces it. BlockTree.cs:1074

  3. SetHeadBlock naming — the Set prefix conventionally implies a pure write; a method that also produces a return value (BlockEventArgs) is slightly surprising. BlockTree.cs:1352

  4. Collection expression style in test: new[] { block1, block2 } → [block1, block2] per C# 14 conventions. BlockTreeTests.cs:266

  5. Unrelated change (inline comment couldn't be placed at line level): removing ReadOnlySpan<ChainLevelInfo?> levelsSpan = levels.AsSpan() and switching levelsSpan[i] → levels[i] in UpdateOrCreateLevel is independent of the batch-flush fix. The Span<T> indexer can give the JIT a bounds-check-removal hint; removing it is a minor regression unless the JIT devirtualizes IOwnedReadOnlyList<T>. This is not harmful but would be cleaner in a separate commit to keep the bug-fix diff minimal and auditable.


Verdict: Mergeable. All Low findings are stylistic or minor. The core fix is sound, the test is well-structured and correctly validates the pre-fix failure, and the existing test suite (245/245 + 38/38) continues to pass.

pendingEvents.Add(deferred);
}

TryUpdateSyncPivot();
Contributor

Low – TryUpdateSyncPivot() placement inside the using block

In the original code, TryUpdateSyncPivot() was called while the batch was alive (using-declaration scope = entire method). Moving it inside the scoped using (...) block preserves that, so there's no behavioral change here.

However, TryUpdateSyncPivot() calls FindHeader (reads) and writes to _metadataDb, which is independent of the _chainLevelInfoRepository write batch. Logically it belongs after the batch is flushed (alongside the events), since it makes decisions based on the now-canonical head. Consider moving it after the closing brace of the using block and before the event loop — this makes the grouping more explicit: "write phase, then read/notify phase."

OnUpdateMainChain?.Invoke(this, new OnUpdateMainChainArgs(blocks, wereProcessed));
}

private readonly struct DeferredMainChainEvent(BlockReplacementEventArgs blockAdded, BlockEventArgs? newHead)
Contributor

Low – minor allocation on the block-processing hot path

new(blocks.Count) allocates a List<DeferredMainChainEvent> on every UpdateMainChain call. For the common PoS single-block case this is one small list object. For batch sync the list is proportional to blocks.Count, so it's proportional to the surrounding DB work and unlikely to matter in practice. Worth noting in case allocation profiling surfaces it later.

For the single-block path (blocks.Count == 1) you could skip the list entirely and use a stack-allocated local, but that optimization may not be worth the complexity given the surrounding allocation budget.

// Mutates Head and writes the head hash without raising NewHeadBlock.
// UpdateMainChain raises the event itself after the ChainLevelInfoRepository write batch
// has been disposed (and therefore flushed) so subscribers always observe committed state.
private BlockEventArgs SetHeadBlock(Block block)
Contributor

Low – naming: Set verb combined with a non-trivial return value is slightly surprising

SetHeadBlock both mutates Head / writes to _blockInfoDb and returns BlockEventArgs. The Set prefix usually implies a pure write with no return value. A name like PrepareHeadUpdate or MutateHeadAndCreateArgs would make the dual responsibility more obvious. Minor readability nit — the inline comment immediately above does help, so this is optional.

onUpdateDbObserved = level?.HasBlockOnMainChain == true;
};

blockTree.UpdateMainChain(new[] { block1, block2 }, wereProcessed: true);
Contributor

Low – collection expression style

Suggested change
- blockTree.UpdateMainChain(new[] { block1, block2 }, wereProcessed: true);
+ blockTree.UpdateMainChain([block1, block2], wereProcessed: true);

Per the C# 14 / repo coding style, prefer collection expressions [...] over new[] { ... } for array literals.

// observe the underlying RocksDB write batch as committed. Otherwise subscribers reading
// ChainLevelInfo through a path that bypasses _blockInfoCache (cache eviction, a separate
// repository instance, or a process restart in between) would see stale state.
List<DeferredMainChainEvent> pendingEvents = new(blocks.Count);
Member

Use ArrayPoolListRef

Comment on lines +1006 to +1009
// Events are buffered and raised after the batch is disposed so that subscribers always
// observe the underlying RocksDB write batch as committed. Otherwise subscribers reading
// ChainLevelInfo through a path that bypasses _blockInfoCache (cache eviction, a separate
// repository instance, or a process restart in between) would see stale state.
Member

comment not needed

Comment on lines +1074 to +1078
private readonly struct DeferredMainChainEvent(BlockReplacementEventArgs blockAdded, BlockEventArgs? newHead)
{
public BlockReplacementEventArgs BlockAdded { get; } = blockAdded;
public BlockEventArgs? NewHead { get; } = newHead;
}
Member

Suggested change
- private readonly struct DeferredMainChainEvent(BlockReplacementEventArgs blockAdded, BlockEventArgs? newHead)
- {
- public BlockReplacementEventArgs BlockAdded { get; } = blockAdded;
- public BlockEventArgs? NewHead { get; } = newHead;
- }
+ private readonly record struct DeferredMainChainEvent(BlockReplacementEventArgs BlockAdded, BlockEventArgs? NewHead);
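
For context on this suggestion: a positional readonly record struct lets the compiler generate the properties that the explicit struct declared by hand (with init accessors rather than get-only ones), and it additionally generates value equality, Deconstruct, and ToString. The sketch below illustrates this with hypothetical placeholder types (SketchBlockAdded, SketchNewHead, SketchDeferredEvent) standing in for the real event-args classes, which are not reproduced here.

```csharp
// Hypothetical payload types standing in for BlockReplacementEventArgs / BlockEventArgs.
public sealed class SketchBlockAdded { }
public sealed class SketchNewHead { }

// Positional form: the compiler generates the BlockAdded and NewHead properties.
public readonly record struct SketchDeferredEvent(SketchBlockAdded BlockAdded, SketchNewHead? NewHead);

public static class RecordStructDemo
{
    public static bool Run()
    {
        SketchBlockAdded added = new();
        SketchDeferredEvent a = new(added, null);
        SketchDeferredEvent b = new(added, null);

        // Generated Deconstruct splits the record back into its components.
        (SketchBlockAdded blockAdded, SketchNewHead? newHead) = a;

        return a == b                                   // generated value equality
            && object.ReferenceEquals(blockAdded, added)
            && newHead is null
            && object.ReferenceEquals(a.BlockAdded, added); // generated property
    }
}
```

For a private buffer element like this one, the extra generated members are unused, so the change is mainly about brevity.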

// Batch disposed above — write batch is now flushed to the underlying IDb.
// Raise the per-block events in their original relative order: NewHeadBlock first
// (for the block whose UpdateHeadBlock path ran), then BlockAddedToMain.
for (int i = 0; i < pendingEvents.Count; i++)
Member

iterate over span

Comment on lines +1058 to +1060
// Batch disposed above — write batch is now flushed to the underlying IDb.
// Raise the per-block events in their original relative order: NewHeadBlock first
// (for the block whose UpdateHeadBlock path ran), then BlockAddedToMain.
Member

comment not needed

obchain and others added 2 commits May 20, 2026 12:07
`UpdateMainChain` opens a `ChainLevelInfoRepository` write batch and
holds it open across `MoveToMain` for every block. With the previous
ordering, `BlockAddedToMain`, `NewHeadBlock`, and `OnUpdateMainChain`
all fired while the batch was still in scope. `PersistLevel` writes
to the in-memory cache and to the batch simultaneously, so the bug
was masked for any subscriber that happened to hit the cache. Once
the cache evicts (CacheSize=64), a fresh repository reads the level
directly from the underlying IDb, or a separate process restarts in
between, subscribers observed stale `HasBlockOnMainChain` markers.

Buffer the per-block `BlockAddedToMain` / `NewHeadBlock` event args
inside `MoveToMain`, dispose the batch (which flushes the underlying
write batch), then drain the buffered events and finally raise
`OnUpdateMainChain`. The relative per-block order
(`NewHeadBlock` then `BlockAddedToMain`) is preserved.

`UpdateHeadBlock(Block)` is split into a pure state-mutating
`SetHeadBlock` that returns the event args, plus a wrapper that
keeps existing callers (`DeleteChainSlice`) unchanged.

Regression test (`BlockTreeTests`) registers handlers that read each
level through a fresh `ChainLevelInfoRepository` instance (empty
cache) and asserts `HasBlockOnMainChain` is `true`. Pre-fix, this
read hits the underlying `InMemoryWriteBatch`-backed db before the
batch has flushed and fails for both `BlockAddedToMain` and
`NewHeadBlock`.

Fixes NethermindEth#11616

- use ArrayPoolListRef<DeferredMainChainEvent> to avoid List allocation on the hot path
- convert DeferredMainChainEvent to a record struct
- iterate the pending events over the underlying span
- move TryUpdateSyncPivot() after the batch is disposed (write phase / notify phase)
- drop redundant explanatory comments
- use collection expression in the new BlockTreeTests regression test
@obchain obchain force-pushed the fix/11616-blocktree-events-after-batch-flush branch from 577d27b to 767da5f Compare May 20, 2026 06:48
@LukaszRozmej (Member) commented:

@claude review

claude Bot (Contributor) commented May 20, 2026

Claude finished @LukaszRozmej's task in 3m 10s.


Code Review

  • Gather context and read changed files
  • Read relevant rules files
  • Review BlockTree.cs changes
  • Review BlockTreeTests.cs changes
  • Post findings and summary

Summary

This is a re-review of the refactoring commit (767da5f) that addressed @LukaszRozmej's inline feedback on the original fix. All prior review points have been resolved:

| Prior finding | Status |
| --- | --- |
| TryUpdateSyncPivot() inside the batch block | ✅ Moved after using (batch) closes |
| List<DeferredMainChainEvent> allocation | ✅ Replaced with using ArrayPoolListRef<DeferredMainChainEvent> |
| For-loop index style | ✅ Replaced with foreach over pendingEvents.AsSpan() |
| Explanation comments above pendingEvents and the loop | ✅ Removed per reviewer request |
| DeferredMainChainEvent multi-line struct | ✅ Collapsed to private readonly record struct one-liner |
| Collection expression in test | [block1, block2] used |

ArrayPoolListRef usage is correct: declared with using at method scope, filled inside the inner using (batch) block, iterated after the batch disposes, and array returned to the pool on method exit. The Span<T> from AsSpan() is safe since pendingEvents is still in scope.
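
The lifetime described above (rent at method scope, fill during the write phase, iterate a span afterwards, return on exit) can be illustrated with the standard System.Buffers.ArrayPool<T>. This is only an analogy: ArrayPoolListRef is Nethermind's own type and its API is not shown in this conversation, so the sketch assumes nothing about it and uses a hypothetical PooledBufferDemo helper instead.

```csharp
using System;
using System.Buffers;

public static class PooledBufferDemo
{
    // Buffers the inputs in a pooled array, drains them, then returns the array to the pool.
    public static int BufferThenDrain(ReadOnlySpan<int> inputs)
    {
        int[] rented = ArrayPool<int>.Shared.Rent(inputs.Length);
        int count = 0;
        try
        {
            // Fill phase: copy values into the pooled buffer.
            foreach (int value in inputs) rented[count++] = value;

            // Drain phase: iterate only the filled prefix, since Rent may return a larger array.
            int sum = 0;
            foreach (int value in rented.AsSpan(0, count)) sum += value;
            return sum;
        }
        finally
        {
            // The span above must not be used after this point.
            ArrayPool<int>.Shared.Return(rented);
        }
    }
}
```

The point the review makes carries over: the span is only valid while the pooled array is still owned by the method, so iteration has to finish before the buffer goes back to the pool.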

Event ordering and semantics are preserved: NewHeadBlock → BlockAddedToMain per block, then OnUpdateMainChain. Head is set by SetHeadBlock before any event fires (inside the batch block, but Head is an in-memory field, not behind the write batch).

DeleteChainSlice is unaffected: it calls the private UpdateHeadBlock(Block) wrapper (which fires NewHeadBlock) after its own using (_chainLevelInfoRepository.StartBatch()) block closes, so it was already correct and remains so.


Critical: 0 | High: 0 | Medium: 0 | Low: 1

Low: stray double blank line between DeferredMainChainEvent and TryUpdateSyncPivot — inline suggestion posted.

Verdict: Mergeable. The fix is sound, all prior feedback has been applied, and the only remaining item is a trivial whitespace nit.

Comment thread src/Nethermind/Nethermind.Blockchain/BlockTree.cs Outdated
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
CS1002 build error from accidental duplicate method header.
@asdacap asdacap merged commit 5106feb into NethermindEth:master May 20, 2026
542 checks passed

Development

Successfully merging this pull request may close these issues.

Block tree events fire before write batch is flushed

3 participants