Skip to content

fix AuRaBlockFinalizationManager unclean shutdown#11062

Merged
LukaszRozmej merged 5 commits into
masterfrom
fix/aura-finalization-manager-shutdown
Apr 8, 2026
Merged

fix AuRaBlockFinalizationManager unclean shutdown#11062
LukaszRozmej merged 5 commits into
masterfrom
fix/aura-finalization-manager-shutdown

Conversation

@LukaszRozmej
Copy link
Copy Markdown
Member

@LukaszRozmej LukaszRozmej commented Apr 7, 2026

Changes

  • Unsubscribe AuRaBlockFinalizationManager from block processing events at merge transition, not just at shutdown
  • AuRaMergeFinalizationManager now hooks IPoSSwitcher.TerminalBlockReached and disposes the AuRa finalization manager when the merge happens, preventing post-merge blocks from triggering pre-merge finalization logic
  • The event handler self-unsubscribes to avoid event/memory leaks
  • Defense-in-depth: GetBlockInfo returns nullable and callers handle null gracefully, preventing NRE if LoadLevel returns null during shutdown races
  • Removed unused IValidSealerStrategy dependency from AuRaBlockFinalizationManager

Root cause

AuRaBlockFinalizationManager had zero merge awareness — it stayed subscribed to BlockProcessed/BlocksProcessing unconditionally and ran AuRa finalization logic on post-merge blocks. On Gnosis post-merge (block 20544072), this caused a NullReferenceException in GetBlockInfo when LoadLevel returned null during shutdown.

Types of changes

What types of changes does your code introduce?

  • Bugfix (a non-breaking change that fixes an issue)
  • New feature (a non-breaking change that adds functionality)
  • Breaking change (a change that causes existing functionality not to work as expected)
  • Optimization
  • Refactoring
  • Documentation update
  • Build-related changes
  • Other: Description

Testing

Requires testing

  • Yes
  • No

If yes, did you write tests?

  • Yes
  • No

Notes on testing

Added AuRaMergeFinalizationManagerTests with 3 tests:

  • Disposes_aura_manager_on_merge (parameterized: startup vs runtime transition)
  • Terminal_block_handler_unsubscribes_itself (verifies no event leak)

Existing 28 AuRa finalization tests continue to pass.

Documentation

Requires documentation update

  • Yes
  • No

Requires explanation in Release Notes

  • Yes
  • No

@LukaszRozmej
Copy link
Copy Markdown
Member Author

@claude review

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 7, 2026

Claude finished @LukaszRozmej's task in 2m 39s —— View job


Code Review

  • Gather context
  • Read changed files
  • Read coding/robustness guidelines
  • Analyze changes
  • Post review feedback

Summary

The PR correctly identifies that AuRaBlockFinalizationManager continues to process blocks after the merge (post-terminal block), and fixes it by hooking into IPoSSwitcher.TerminalBlockReached to dispose the AuRa finalization manager at the right time. The null-safety improvements and removal of the unused IValidSealerStrategy dependency are welcome cleanups. One issue requires attention before merging.


Issues

Bug: OnTerminalBlock event handler is never unsubscribed — event/memory leak

In AuRaMergeFinalizationManager.cs, the constructor subscribes to poSSwitcher.TerminalBlockReached:

poSSwitcher.TerminalBlockReached += OnTerminalBlock;

But Dispose() never removes this subscription:

public override void Dispose()
{
    _auRaBlockFinalizationManager.BlocksFinalized -= OnBlockFinalized;
    _auRaBlockFinalizationManager.Dispose();
    base.Dispose();  // only unsubscribes OnSwitchHappened, not OnTerminalBlock
}

This means poSSwitcher holds a reference to OnTerminalBlock, which closes over this, preventing garbage collection of the AuRaMergeFinalizationManager instance after Dispose(). On a long-running node this is a memory leak.

Note that the base class (MergeFinalizationManager) correctly stores _poSSwitcher and unsubscribes its own handler in Dispose() — but AuRaMergeFinalizationManager's handler is separate.

Recommended fix — a self-unsubscribing handler is the cleanest approach given that _poSSwitcher is private in the base class:

private void OnTerminalBlock(object? sender, EventArgs e)
{
    if (sender is IPoSSwitcher switcher)
        switcher.TerminalBlockReached -= OnTerminalBlock;
    _auRaBlockFinalizationManager.Dispose();
}

Alternatively, AuRaMergeFinalizationManager could store poSSwitcher locally and unsubscribe in Dispose().

Fix this →


Observations & Minor Notes

_auRaBlockFinalizationManager.Dispose() called twice in post-merge path — In AuRaMergeFinalizationManager, if the terminal block is reached at runtime (not at startup), Dispose() is called once in OnTerminalBlock and then again in AuRaMergeFinalizationManager.Dispose(). Since AuRaBlockFinalizationManager.Dispose() only unsubscribes events (a safe no-op if done twice in C#), this is harmless, but worth noting.

IValidSealerStrategy removal (InitializeBlockchainAuRa.cs) — The strategy was stored but never actually used in AuRaBlockFinalizationManager (it only uses IValidatorStore for GetMinSealersForFinalization). Removing it is correct and tightens the constructor contract.

GetBlockInfo null-safety — Changing First(...)FirstOrDefault(...) with a nullable return type and propagating null through callers is a good defensive change. The else return Empty early-exit at line 220 correctly handles the case where the block isn't found in the chain level.

field keyword for LastFinalizedBlockLevel — This is the C# 14 stable field keyword (stabilized in .NET 10 / C# 14, which matches global.json's sdk.version: 10.0.100). Usage is correct.

IRlpDecoder.csin variance annotation — Adding in (contravariant) to IRlpStreamEncoder<in T> is correct since T is only consumed by Encode. However, this change is unrelated to the shutdown fix and should ideally be in a separate PR for traceability.

PR description — The Changes section and checkboxes are unfilled. Even a one-liner description ("Unsubscribe AuRa finalization from block processing events on merge transition") and ticking Bugfix would help reviewers and future changelog readers.


@LukaszRozmej
Copy link
Copy Markdown
Member Author

@claude re-review

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 7, 2026

Claude finished @LukaszRozmej's task in 2m 55s —— View job


Code Review

  • Gather context
  • Read changed files
  • Analyze changes
  • Post review feedback

Summary

The previous review's blocking bug (event/memory leak from OnTerminalBlock never unsubscribing) has been fixed — the handler now self-unsubscribes correctly and a dedicated test (Terminal_block_handler_unsubscribes_itself) verifies it. The PR is close to ready, but one test has a correctness issue worth fixing.


Issues

Medium: Post_merge_blocks_do_not_trigger_aura_finalization passes vacuously

The test calls manager.SetMainBlockBranchProcessor(_branchProcessor), which in AuRaMergeFinalizationManager delegates to _auRaFinalizationManager.SetMainBlockBranchProcessor(branchProcessor). Since _auRaFinalizationManager is an NSubstitute mock, that call is a no-op — no handler is ever subscribed to _branchProcessor.BlockProcessed. Raising the event on the next line can't trigger anything regardless of whether the fix is in place or not.

// This can't reach AuRa finalization regardless — mock SetMainBlockBranchProcessor is a no-op
_branchProcessor.BlockProcessed += Raise.EventWith(
    new BlockProcessedEventArgs(...));

A test that would actually break without the fix should raise _auRaFinalizationManager.BlocksFinalized directly and verify it does not propagate to manager.BlocksFinalized post-merge. However, note that MergeFinalizationManager.OnBlockFinalized (line 40–43 in MergeFinalizationManager.cs) has no IsPostMerge guard — so even raising the mock event would propagate. The real guard is that AuRaBlockFinalizationManager.Dispose() prevents the real implementation from firing BlocksFinalized in the first place.

The simplest honest test is to verify that _auRaFinalizationManager stops receiving new block processing calls after the merge — which Disposes_aura_manager_on_merge already does indirectly. You could either remove Post_merge_blocks_do_not_trigger_aura_finalization (since it's covered by the dispose tests), or replace the _branchProcessor event raise with:

// After merge, verify AuRa Dispose was called (preventing future block processing)
_auRaFinalizationManager.Received(1).Dispose();
// And that the subscription is gone — raising BlocksFinalized on the mock would still propagate
// (base OnBlockFinalized has no IsPostMerge guard), so the real protection is Dispose()

Fix this →


Observations & Minor Notes

Previous blocking bug is fixed (OnTerminalBlock self-unsubscribes) — the new Terminal_block_handler_unsubscribes_itself test directly verifies this. Good.

_validSealerStrategy dead code in AuRaBlockFinalizationManagerTests (line 28 + 37) — the field is declared and set up in [SetUp] but never passed to any constructor or used in assertions after the parameter was removed. Safe to delete.

Double dispose is still present_auRaBlockFinalizationManager.Dispose() is called in OnTerminalBlock and in Dispose(). Since AuRaBlockFinalizationManager.Dispose() only unsubscribes event handlers (idempotent in C#), this is safe. But in the startup path (HasEverReachedTerminalBlock() == true), it's also called in the constructor, meaning it can be called three times. Still safe, but the mutual-exclusive branching in the constructor (if/else) already prevents the startup/runtime double-invoke for OnTerminalBlock. The remaining redundancy is just in Dispose() always calling it regardless of whether it was already disposed.

IRlpDecoder.cs variance annotation — Adding in contravariance to IRlpStreamEncoder<in T> is correct (T is only consumed), but this change is unrelated to the shutdown fix and would be better tracked separately.

PR description — Still unfilled (Changes section blank, checkboxes unticked). Even a one-liner and ticking Bugfix + Yes / Yes on testing would help reviewers and the changelog.


@LukaszRozmej
Copy link
Copy Markdown
Member Author

@claude re-review

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 7, 2026

Claude finished @LukaszRozmej's task in 1m 55s —— View job


Code Review

  • Gather context
  • Read changed files
  • Analyze changes
  • Post review feedback

Summary

Good progress. Both blocking issues from prior reviews are resolved: OnTerminalBlock self-unsubscribes correctly, and the vacuous Post_merge_blocks_do_not_trigger_aura_finalization test has been removed. The PR is nearly ready. One dead-code nit remains, and one minor correctness gap in Dispose().


Issues

Minor: OnTerminalBlock is not unsubscribed in Dispose() — leak on early shutdown

In AuRaMergeFinalizationManager.cs, when HasEverReachedTerminalBlock() == false at construction time, OnTerminalBlock is subscribed:

else
{
    poSSwitcher.TerminalBlockReached += OnTerminalBlock;
}

But Dispose() only unsubscribes OnBlockFinalized and delegates to base.Dispose() (which unsubscribes OnSwitchHappened). It never removes OnTerminalBlock:

public override void Dispose()
{
    _auRaBlockFinalizationManager.BlocksFinalized -= OnBlockFinalized;
    _auRaBlockFinalizationManager.Dispose();
    base.Dispose();  // ← unsubscribes OnSwitchHappened only
}

If the node is disposed before the terminal block is reached (e.g., resync / early shutdown before merge), poSSwitcher retains a reference to OnTerminalBlock, which closes over this. The self-unsubscribe inside OnTerminalBlock only fires if the event is raised — it won't fire during a clean Dispose() path.

In practice this is unlikely (Gnosis will always reach the terminal block), but it's worth fixing. Since _poSSwitcher is private in the base, the easiest fix is to store it locally:

private readonly IPoSSwitcher _poSSwitcher;

// constructor:
_poSSwitcher = poSSwitcher;
if (!poSSwitcher.HasEverReachedTerminalBlock())
    _poSSwitcher.TerminalBlockReached += OnTerminalBlock;

// Dispose():
_poSSwitcher.TerminalBlockReached -= OnTerminalBlock;  // safe no-op if already self-unsubscribed

Fix this →

Nit: _validSealerStrategy dead code in AuRaBlockFinalizationManagerTests.cs

This was flagged in the previous review and is still present. Lines 28 and 37 declare and initialize _validSealerStrategy but it is never passed to any constructor or used in any assertion after IValidSealerStrategy was removed from AuRaBlockFinalizationManager.

// Line 28 — declared but never used
private IValidSealerStrategy _validSealerStrategy;

// Line 37 — set but never used
_validSealerStrategy = Substitute.For<IValidSealerStrategy>();

Fix this →


Observations

Previous blocking issues resolvedOnTerminalBlock self-unsubscribes ✓; Terminal_block_handler_unsubscribes_itself test verifies it ✓; vacuous Post_merge_blocks_do_not_trigger_aura_finalization test removed ✓.

Double dispose is still presentDispose() unconditionally calls _auRaBlockFinalizationManager.Dispose(), which was also called in OnTerminalBlock (runtime path) or in the constructor (startup path). AuRaBlockFinalizationManager.Dispose() only unsubscribes event handlers, so this is idempotent and safe.

field keyword — the C# 14 field keyword usage for LastFinalizedBlockLevel (lines 343–354 of AuRaBlockFinalizationManager.cs) is correct for .NET 10 / C# 14 per global.json.

IRlpDecoder.cs variance annotation — the in contravariant annotation on IRlpStreamEncoder<in T> is technically correct but unrelated to this shutdown fix. Better tracked separately for traceability.


@LukaszRozmej
Copy link
Copy Markdown
Member Author

@claude re-review

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 7, 2026

Claude finished @LukaszRozmej's task in 58s —— View job


Code Review

  • Gather context
  • Read changed files
  • Analyze changes
  • Post review feedback

Summary

Both issues flagged in the previous review have been resolved. The PR is ready to merge.


Resolved Issues

OnTerminalBlock not unsubscribed in Dispose() — Fixed. _poSSwitcher is now stored as a field (line 16) and Dispose() explicitly unsubscribes (line 57):

_poSSwitcher.TerminalBlockReached -= OnTerminalBlock;  // safe no-op if already self-unsubscribed

This correctly handles the early-shutdown path (node disposed before terminal block reached).

_validSealerStrategy dead code in AuRaBlockFinalizationManagerTests.cs — Fixed. The unused field and its [SetUp] initialization are gone.


Remaining Observations (non-blocking)

Double dispose still present_auRaBlockFinalizationManager.Dispose() is called in both OnTerminalBlock (runtime path) and Dispose(). Since AuRaBlockFinalizationManager.Dispose() only unsubscribes event handlers, this remains idempotent and safe.

IRlpDecoder.cs variance annotation — Adding in contravariance to IRlpStreamEncoder<in T> is technically correct, but it is still unrelated to this shutdown fix. No action required — just noting it would be easier to trace in its own commit/PR.


@LukaszRozmej LukaszRozmej marked this pull request as ready for review April 7, 2026 16:43
@LukaszRozmej LukaszRozmej requested a review from Marchhill as a code owner April 7, 2026 16:43
@LukaszRozmej LukaszRozmej requested review from benaadams and flcl42 April 7, 2026 16:48
@LukaszRozmej LukaszRozmej merged commit 28d309c into master Apr 8, 2026
426 checks passed
@LukaszRozmej LukaszRozmej deleted the fix/aura-finalization-manager-shutdown branch April 8, 2026 20:39
stdevMac pushed a commit that referenced this pull request Apr 22, 2026
…-rc2 backport) (#11307)

fix: skip AuRa finalization startup walk on post-merge chains

AuRaBlockFinalizationManager ran a catch-up walk from head in its constructor,
allocating a List<BlockHeader> that grew to 10M+ entries on post-merge Gnosis
after upgrading to 1.37.0. The walk is meaningless post-merge (finalization is
handled by the consensus layer) but previously still executed because the
inner manager had no way to know it was wrapped by AuRaMergeFinalizationManager.

Move Initialize out of the constructor into SetMainBlockBranchProcessor with
an idempotency guard, and add an InitializeBlockchainAuRaMerge override that
checks IPoSSwitcher and skips the wiring call after the terminal block has
been reached. Pre-merge and pure-AuRa chains are unaffected.

Backport of #11306 (master) to release/1.37.0-rc2. The release branch is
pre-#11062 so the fix was re-implemented by hand for the older code structure.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants