fix(p2p/kademlia): guard handlePing against nil/malformed Sender (panic-safety)#298
Merged
Merged
Conversation
A peer sending a Ping with nil or structurally-invalid Sender (e.g. nil *Node, or *Node with nil ID) caused gob.Encode of the response to walk a nil pointer and SIGSEGV inside encoding/gob.encUint8Array. Because the per-conn goroutine spawned at serve() L595 had no upstream recover() and handlePing — unlike handleFindNode — lacked its own defer s.handlePanic(), the entire supernode process died. Observed in production on lumera-devnet-1 val4 (goroutine 2067, repanicked). Fix is two layers, both required: 1. handlePing: nil-guard message/Sender/Sender.ID, sanitise the peer- supplied Node before reflecting it back on the wire (do not echo attacker-controlled HashedID), and wrap the function in a deferred recover() that converts any future panic in this path to an error return rather than process death. 2. serve loop: wrap the bare 'go s.handleConn(ctx, conn)' in a top- level recover() so any future request-handler that forgets its own defer s.handlePanic() cannot crash the process either. Adds handle_ping_test.go covering nil message, nil Sender, empty Sender ID, and a panic-recovery smoke test (constructs a *Network without a *DHT and asserts handlePing returns an error rather than panicking to the caller). Risk: low. Behaviour change only on inputs that previously crashed the process; any well-formed Ping is unaffected. Sanitisation drops the peer-supplied HashedID and recomputes it locally via SetHashedID, matching how dht.go does it everywhere else. Refs: ops RCA at /root/lumera/kademlia-panic-rca.md
Contributor
There was a problem hiding this comment.
Pull request overview
This PR hardens Kademlia Ping handling against malformed peer input that previously could panic and crash the process.
Changes:
- Adds validation and panic recovery to
handlePing. - Sanitizes peer
Senderdata before using it in the Ping response and routing-table update. - Wraps per-connection handling in
servewith a top-level recover. - Adds regression tests for nil/malformed Ping inputs and panic recovery.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
p2p/kademlia/network.go |
Adds Ping validation, sanitized sender construction, local recovery, and outer connection recovery. |
p2p/kademlia/handle_ping_test.go |
Adds tests for nil message, nil sender, empty sender ID, and recovered panic behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+379
to
+385
| sender := &Node{ | ||
| ID: message.Sender.ID, | ||
| IP: message.Sender.IP, | ||
| Port: message.Sender.Port, | ||
| Version: message.Sender.Version, | ||
| } | ||
| sender.SetHashedID() |
Comment on lines
+360
to
+361
| // *Node and SIGSEGVs inside encUint8Array, killing the goroutine and (since | ||
| // there is no upstream recover()) the entire process. |
j-rafique
approved these changes
May 29, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
fix(p2p/kademlia): guard
handlePingagainst nil/malformed Sender (panic-safety)What & why
A peer sending a kademlia Ping with a nil or structurally invalid
Sendercausedgob.Encodeof the outgoing response to walk a nil pointer and SIGSEGV insideencoding/gob.encUint8Array. Because the per-conn goroutine spawned byserve()(p2p/kademlia/network.go:595) had no upstreamrecover(), andhandlePing— unlike siblinghandleFindNode— had nodefer s.handlePanic(...)of its own, the entire supernode process died.Observed in production on
lumera-devnet-1val4 SN (goroutine 2067, gob's internalcatchErrorrepanicked, SIGSEGV addr=0x0). Full RCA:kademlia-panic-rca.md.Fix — two layers, both required
1.
handlePing(p2p/kademlia/network.go)Senderpointer, andSender.IDlength. Return an error instead of dereferencing.*Nodebefore reflecting it back on the wire. Specifically, do not echoHashedID(attacker-controlled) — recompute it locally viaNode.SetHashedID(), mirroring howdht.goconstructs every other outgoing Node.recover()around the whole function: any future panic in this code path becomes an error return rather than process death.2.
serveloop (p2p/kademlia/network.go)Wrap the bare
go s.handleConn(ctx, conn)in a top-level recover. Sibling handlers (handleFindNode,handleStoreData, etc.) already install their owndefer s.handlePanic(...), but this is defense-in-depth so any future handler that forgets to install its own cannot crash the process either. On recover the connection is closed cleanly.Risk
*Nodeis constructed from the same{ID, IP, Port, Version}andSetHashedID()is the same call already used everywhere else indht.go/hashtable.go.Mainnet exposure
HIGH (potentially CRITICAL pending review of
NewSecureServerConn). Any peer that can complete one Lumera P2P secure handshake can crash any other SN at will. After restart the same probe repeats — sustained DoS. The "Server secure handshake failed: EOF" WARNs that immediately preceded the crash on val4 indicate the attacker was probing the handshake state machine; the panic fired the moment one probe completed.Tests
p2p/kademlia/handle_ping_test.go:TestHandlePing_NilMessage— nil*Messagereturns error.TestHandlePing_NilSender— Message with no Sender returns error.TestHandlePing_EmptySenderID— Sender with nil/emptyIDreturns error (table-driven).TestHandlePing_PanicRecovered— constructs a*Networkwithout a*DHTsonewMessagepanics on the nil-receiver deref; asserts the deferredrecover()converts that to an error return rather than letting the panic escape to the caller (which, in production, would have crashed the process).Follow-ups (NOT in this PR)
FuzzDecodeThenDispatchoverhandleConn's switch — seed corpus with malformed Sender (empty ID, oversizedHashedID, registered-interface confusion onData).handle*innetwork.go:143–1325for the same omission pattern (handleFindNodeetc. all have defers, but worth one targeted pass).Observability
On panic, a structured
logtrace.Erroris emitted withmodule=p2pand the panic value. Datadog query:service:*supernode* @module:p2p "panic recovered".