fix: heap-allocate RPC context and detach Rust thread #757
Merged
anshalshukla merged 5 commits on Apr 20, 2026
Two ethlibp2p.zig memory-safety fixes surfaced by an adversarial review:

1. handleRPCRequestFromRustBridge was stack-allocating ServerStreamContext and handing &stream_context to ReqRespServerStream.ptr. The current request handler uses the stream synchronously so the pattern is safe today, but any future handler that retains the stream across async work would dereference a stack slot that has already been unwound. Move the context to the heap and free it via defer.

2. EthLibp2p.run spawned rustBridgeThread with Thread.spawn but deinit never joined or detached it, leaking the Thread handle and leaving the spawned thread calling into a freed struct on teardown. The Rust side has no shutdown hook, so joining would hang; detach the thread in deinit and document that a proper stop_network FFI is still needed for clean mid-process shutdown.
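The heap-allocation pattern from item 1 can be sketched in Rust terms. This is an illustration only, not the Zig code from this PR: `ServerStreamContext` and `ServerStream` here are hypothetical stand-ins with an invented `channel_id` field, and `handle_request` is an assumed synchronous handler.

```rust
use std::ffi::c_void;

// Hypothetical stand-in for the per-request context.
struct ServerStreamContext {
    channel_id: u64,
}

// Hypothetical stand-in for the stream handed to the handler:
// it only carries an opaque pointer to the context.
struct ServerStream {
    ptr: *mut c_void,
}

// Assumed handler that uses the stream synchronously and keeps nothing.
fn handle_request(stream: &ServerStream) -> u64 {
    // SAFETY: `ptr` was created from a live Box<ServerStreamContext>
    // by `handle_rpc_request` and has not been freed yet.
    let ctx = unsafe { &*(stream.ptr as *const ServerStreamContext) };
    ctx.channel_id
}

fn handle_rpc_request(channel_id: u64) -> u64 {
    // Heap-allocate the context so its address is not tied to this stack frame.
    let raw: *mut ServerStreamContext =
        Box::into_raw(Box::new(ServerStreamContext { channel_id }));
    let stream = ServerStream { ptr: raw as *mut c_void };

    let result = handle_request(&stream);

    // Release the allocation once the handler has returned
    // (the role `defer` plays in the Zig version).
    // SAFETY: `raw` came from Box::into_raw above and is freed exactly once.
    unsafe { drop(Box::from_raw(raw)) };
    result
}
```

Calling `handle_rpc_request(7)` returns `7`. In this sketch the allocation is released when the function returns, so it is only valid for handlers that are done with the stream by then.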
Rust side: add a per-network Arc<Notify> shutdown signal that's installed when run_eventloop starts and polled in a biased select arm. The new stop_network FFI posts notify_one, so a signal issued before the first .notified().await is latched as a permit and the loop still exits on its next iteration. After the loop breaks, clear the swarm, zig handler, notify and ready flag so a subsequent start_network on the same id starts from a blank slate.

Zig side: declare the stop_network extern, and replace the detach() in EthLibp2p.deinit with stop_network + thread.join. The join now completes deterministically because the Rust thread is guaranteed to unwind, which closes the earlier use-after-free window where the Rust loop was still issuing callbacks into a struct whose gossip/peer/reqresp handlers had already been deinited.
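The latched-signal-then-join behavior described above can be sketched with the standard library alone. This is a hedged analogue, not the PR's implementation: it swaps tokio's `Notify` and the biased select arm for a `Mutex<bool>` plus `Condvar`, and `ShutdownSignal`, `run_event_loop` and `start_then_stop` are hypothetical names.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical one-permit signal: `signal` stores a permit even if
/// nobody is waiting yet, and a later wait consumes it.
#[derive(Default)]
struct ShutdownSignal {
    permit: Mutex<bool>,
    cv: Condvar,
}

impl ShutdownSignal {
    fn signal(&self) {
        *self.permit.lock().unwrap() = true;
        self.cv.notify_one();
    }

    /// Returns true if a shutdown permit was observed within `timeout`.
    fn wait_timeout(&self, timeout: Duration) -> bool {
        let guard = self.permit.lock().unwrap();
        let (mut guard, _) = self
            .cv
            .wait_timeout_while(guard, timeout, |permit| !*permit)
            .unwrap();
        // Consume the permit if one was stored.
        std::mem::replace(&mut *guard, false)
    }
}

/// Stand-in event loop: checks for shutdown first on every iteration,
/// then does one unit of placeholder work.
fn run_event_loop(shutdown: Arc<ShutdownSignal>) -> u64 {
    let mut iterations = 0;
    loop {
        if shutdown.wait_timeout(Duration::from_millis(1)) {
            break;
        }
        iterations += 1; // placeholder for polling the network
    }
    iterations
}

/// Spawns the loop, signals shutdown (optionally before the loop has
/// started waiting), and joins. Returns the number of loop iterations.
fn start_then_stop(signal_before_start: bool) -> u64 {
    let shutdown = Arc::new(ShutdownSignal::default());
    if signal_before_start {
        shutdown.signal(); // latched: no waiter exists yet
    }
    let worker = {
        let shutdown = Arc::clone(&shutdown);
        thread::spawn(move || run_event_loop(shutdown))
    };
    if !signal_before_start {
        thread::sleep(Duration::from_millis(20));
        shutdown.signal();
    }
    // The join returns because the loop is guaranteed to see the permit.
    worker.join().expect("event loop thread panicked")
}
```

With `start_then_stop(true)` the permit is stored before the worker exists, so the loop exits on its first check and the function returns `0`. With `start_then_stop(false)` the loop runs for a while before the signal arrives, and the join still completes.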
…context-and-thread-lifecycle # Conflicts: # pkgs/network/src/ethlibp2p.zig
anshalshukla approved these changes on Apr 20, 2026
Summary
Two memory-safety fixes in pkgs/network/src/ethlibp2p.zig surfaced by an adversarial review of FFI ownership and thread lifecycle on the Zig ↔ Rust boundary.
1. ServerStreamContext escaped the stack (latent)

handleRPCRequestFromRustBridge stack-allocated ServerStreamContext and handed &stream_context to ReqRespServerStream.ptr. With the current in-tree handler the stream is consumed synchronously, so the pointer is still valid when dereferenced: the bug is latent, not live. But any future handler that retains the stream across async work (e.g. queues it for a later response send) would be reading the stack slot after this function has already returned.

Heap-allocate the context and free it via defer. One extra small allocation per inbound RPC; no behavior change today, and future-proof against handlers that outlive the call.

2. rustBridgeThread was never joined or detached (real)

EthLibp2p.run spawned rustBridgeThread via Thread.spawn but deinit did nothing with the handle. Two problems:

- The Thread handle (the join-state slot) leaked on every EthLibp2p teardown.
- The spawned thread kept calling into a freed EthLibp2p on mid-process shutdown: a use-after-free waiting for the right timing.

The Rust libp2p loop has no shutdown hook today, so calling thread.join() here would block forever. The minimal correct move is thread.detach() in deinit, which releases the Zig handle and makes it explicit that the OS thread is allowed to outlive the struct. The comment calls out that a proper stop_network FFI is the real fix.

Follow-ups (not in this PR)

- Add a stop_network / shutdown-signal FFI on the Rust side so deinit can actually join instead of detach, and so EthLibp2p can be safely torn down mid-process.
- Check the other export fn ...FromRustBridge entry points for the same stack-escape pattern.

Test plan

- zig fmt --check .
- zig build (release build, includes the Rust workspace)
- zig build test --summary all: 143/143 tests pass, including pkgs/network/src/lib.zig (8 passed) and pkgs/node/src/lib.zig (53 passed)
- Follow-up: add stop_network and switch detach → join