Skip to content

fix(node): correct init error paths (double free / double thread_pool deinit)#877

Merged
ch4r10t33r merged 5 commits into
mainfrom
fix/init-teardown-double-free
May 14, 2026
Merged

fix(node): correct init error paths (double free / double thread_pool deinit)#877
ch4r10t33r merged 5 commits into
mainfrom
fix/init-teardown-double-free

Conversation

@ch4r10t33r
Copy link
Copy Markdown
Contributor

Summary

  • BeamNode.init (pkgs/node/src/node.zig): Removed the overlapping errdefer allocator.destroy(chain) that ran after the errdefer { chain.deinit(); allocator.destroy(chain); } on any error after BeamChain.init succeeded. Unwind then called destroy on the same pointer twice → DebugAllocator double free (seen when later init steps fail, e.g. metrics bind AddressInUse / ServerStartupFailed).
  • Node.init (pkgs/cli/src/node.zig): On xmss.setupVerifier() failure, dropped the extra thread_pool.deinit()errdefer self.thread_pool.deinit() already runs, so the manual call was a double deinit.

Context

Operators hit error.AddressInUse on the metrics port; startMetricsServer returns ServerStartupFailed and Node.init unwinds. The duplicate *BeamChain teardown amplified that into allocator corruption (double free + bogus leak reports).

Testing

  • zig build (full default pipeline) — succeeded locally.

- BeamNode.init: drop duplicate errdefer on *BeamChain; on BeamChain.init
  failure destroy the allocation only; keep a single errdefer for
  deinit+destroy after success.
- cli Node.init: xmss.setupVerifier errors must not call thread_pool.deinit
  manually — errdefer already runs it (was double deinit).
@zclawz
Copy link
Copy Markdown
Contributor

zclawz commented May 13, 2026

Reviewed PR #877. No blocking issues found.

What I checked:

  • NodeRunner.init XMSS verifier setup failure now relies on the existing errdefer self.thread_pool.deinit(), so the thread pool is not deinitialized twice.
  • BeamNode.init now destroys the raw BeamChain allocation only on BeamChain.init failure, and installs the chain.deinit(); allocator.destroy(chain); errdefer only after successful initialization. This removes the double-destroy path on later init errors like startChainWorker() / validator setup.
  • Existing network_init_cleanup behavior is preserved, so Network.init still gets cleaned up on downstream init failure.

Validation:

  • zig build passed locally on the PR head 6e9a764795b55c3c865d175efeead1428965e696 (/tmp/pr877-zig-build.log, EXIT:0).

Minor non-blocking note: the new comment says the two errdefers would double-destroy “if init failed after chain.* was written”; mechanically, the dangerous case is a later error after BeamChain.init succeeds and before BeamNode.init completes. The code itself is correct, so I wouldn’t block on wording.

noopur23
noopur23 previously approved these changes May 13, 2026
GitHub macos-latest often ran zeam-stress with attn queue_full=0 while
the block queue saturated, tripping the harness FATAL. Raise
ZEAM_STRESS_SAT_ATTN_PRODUCERS for the zig build test step only, extend
the quick window slightly, and widen the stall watchdog.
anshalshukla
anshalshukla previously approved these changes May 13, 2026
Homebrew cargo can precede rustup's shim and rejects +nightly.
rustup-run-nightly broke rustc resolution; prepend ~/.cargo/bin on macOS.
@ch4r10t33r ch4r10t33r merged commit 8ec7f10 into main May 14, 2026
13 checks passed
@ch4r10t33r ch4r10t33r deleted the fix/init-teardown-double-free branch May 14, 2026 07:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants