Skip to content

fix: heap-allocate RPC context and detach Rust thread#757

Merged
anshalshukla merged 5 commits into
mainfrom
fix/ethlibp2p-stream-context-and-thread-lifecycle
Apr 20, 2026
Merged

fix: heap-allocate RPC context and detach Rust thread#757
anshalshukla merged 5 commits into
mainfrom
fix/ethlibp2p-stream-context-and-thread-lifecycle

Conversation

@ch4r10t33r
Copy link
Copy Markdown
Contributor

Summary

Two memory-safety fixes in pkgs/network/src/ethlibp2p.zig surfaced by an adversarial review of FFI ownership and thread lifecycle on the Zig ↔ Rust boundary.

1. ServerStreamContext escaped the stack (latent)

handleRPCRequestFromRustBridge stack-allocated ServerStreamContext and handed &stream_context to ReqRespServerStream.ptr. With the current in-tree handler the stream is consumed synchronously, so the pointer is still valid when dereferenced — the bug is latent, not live. But any future handler that retains the stream across async work (e.g. queues it for a later response send) would be reading the stack slot after this function has already returned.

Heap-allocate the context and free it via defer. One extra small allocation per inbound RPC; no behavior change today, and future-proof against handlers that outlive the call.

2. rustBridgeThread was never joined or detached (real)

EthLibp2p.run spawned rustBridgeThread via Thread.spawn but deinit did nothing with the handle. Two problems:

  • The Thread handle (the join-state slot) leaked on every EthLibp2p teardown.
  • The OS thread continued calling back into the now-freed EthLibp2p on mid-process shutdown — a use-after-free waiting for the right timing.

The Rust libp2p loop has no shutdown hook today, so calling thread.join() here would block forever. The minimal correct move is thread.detach() in deinit, which releases the Zig handle and makes it explicit that the OS thread is allowed to outlive the struct. The comment calls out that a proper stop_network FFI is the real fix.

Follow-ups (not in this PR)

  • Add a stop_network / shutdown-signal FFI on the Rust side so deinit can actually join instead of detach, and so EthLibp2p can be safely torn down mid-process.
  • Audit other export fn ...FromRustBridge entry points for the same stack-escape pattern.

Test plan

  • zig fmt --check .
  • zig build (release build, includes the Rust workspace)
  • zig build test --summary all — 143/143 tests pass, including pkgs/network/src/lib.zig (8 passed) and pkgs/node/src/lib.zig (53 passed)
  • Manual sanity check on a running devnet that RPC request/response still round-trips
  • Follow-up PR: add Rust-side stop_network and switch detachjoin

Two ethlibp2p.zig memory-safety fixes surfaced by an adversarial review:

1. handleRPCRequestFromRustBridge was stack-allocating ServerStreamContext
   and handing &stream_context to ReqRespServerStream.ptr. The current
   request handler uses the stream synchronously so the pattern is safe
   today, but any future handler that retains the stream across
   async work would dereference a stack slot that has already been
   unwound. Move the context to the heap and free it via defer.

2. EthLibp2p.run spawned rustBridgeThread with Thread.spawn but deinit
   never joined or detached it, leaking the Thread handle and leaving
   the spawned thread calling into a freed struct on teardown. The Rust
   side has no shutdown hook, so joining would hang; detach the thread
   in deinit and document that a proper stop_network FFI is still needed
   for clean mid-process shutdown.
@ch4r10t33r ch4r10t33r changed the title libp2p: heap-allocate RPC stream context and detach rust bridge thread libp2p: heap-allocate RPC context and detach Rust thread Apr 17, 2026
@ch4r10t33r ch4r10t33r marked this pull request as ready for review April 17, 2026 19:32
@ch4r10t33r ch4r10t33r requested review from GrapeBaBa and g11tech April 17, 2026 19:47
@ch4r10t33r ch4r10t33r changed the title libp2p: heap-allocate RPC context and detach Rust thread fix: heap-allocate RPC context and detach Rust thread Apr 18, 2026
Comment thread pkgs/network/src/ethlibp2p.zig Outdated
ch4r10t33r and others added 2 commits April 20, 2026 10:28
Rust side: add a per-network Arc<Notify> shutdown signal that's installed
when run_eventloop starts and polled in a biased select arm. The new
stop_network FFI posts notify_one, so a signal issued before the first
.notified().await is latched as a permit and the loop still exits on its
next iteration. After the loop breaks, clear the swarm, zig handler,
notify and ready flag so a subsequent start_network on the same id
starts from a blank slate.

Zig side: declare the stop_network extern, and replace the detach() in
EthLibp2p.deinit with stop_network + thread.join. The join now completes
deterministically because the Rust thread is guaranteed to unwind, which
closes the earlier use-after-free window where the Rust loop was still
issuing callbacks into a struct whose gossip/peer/reqresp handlers had
already been deinited.
@ch4r10t33r ch4r10t33r requested a review from anshalshukla April 20, 2026 09:30
…context-and-thread-lifecycle

# Conflicts:
#	pkgs/network/src/ethlibp2p.zig
@anshalshukla anshalshukla merged commit 0b1835b into main Apr 20, 2026
13 checks passed
@anshalshukla anshalshukla deleted the fix/ethlibp2p-stream-context-and-thread-lifecycle branch April 20, 2026 13:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants