fix(p2p/kademlia): guard handlePing against nil/malformed Sender (panic-safety)#298

Merged: mateeullahmalik merged 1 commit into master from fix/kademlia-handleping-panic-safety on May 29, 2026

@mateeullahmalik
Collaborator

What & why

A peer sending a kademlia Ping with a nil or structurally invalid Sender caused gob.Encode of the outgoing response to walk a nil pointer and SIGSEGV inside encoding/gob.encUint8Array. Because the per-conn goroutine spawned by serve() (p2p/kademlia/network.go:595) had no upstream recover(), and handlePing — unlike sibling handleFindNode — had no defer s.handlePanic(...) of its own, the entire supernode process died.

Observed in production on lumera-devnet-1 val4 SN (goroutine 2067, gob's internal catchError repanicked, SIGSEGV addr=0x0). Full RCA: kademlia-panic-rca.md.

panic: runtime error: invalid memory address or nil pointer dereference [recovered, repanicked]
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0]
goroutine 2067 [running]:
encoding/gob.(*encBuffer).Write(...)
encoding/gob.encUint8Array(...)
encoding/gob.(*Encoder).encodeStruct(...) x2
…
github.com/LumeraProtocol/supernode/v2/p2p/kademlia.(*Network).handlePing
        github.com/LumeraProtocol/supernode/v2/p2p/kademlia/network.go:362 +0xbf
github.com/LumeraProtocol/supernode/v2/p2p/kademlia.(*Network).handleConn.func4
        github.com/LumeraProtocol/supernode/v2/p2p/kademlia/network.go:473 +0x26
created by …kademlia.(*Network).serve in goroutine 95
        github.com/LumeraProtocol/supernode/v2/p2p/kademlia/network.go:595 +0x105

Fix — two layers, both required

1. handlePing (p2p/kademlia/network.go)

  • Nil-guard the message, the Sender pointer, and Sender.ID length. Return an error instead of dereferencing.
  • Sanitise the peer-supplied *Node before reflecting it back on the wire. Specifically, do not echo HashedID (attacker-controlled) — recompute it locally via Node.SetHashedID(), mirroring how dht.go constructs every other outgoing Node.
  • Deferred recover() around the whole function: any future panic in this code path becomes an error return rather than process death.
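
The three handlePing changes listed above can be sketched roughly as follows. This is a minimal illustration, not the code from network.go: the Node and Message types are reduced stand-ins, the Port type and the SHA-256 hash inside SetHashedID are assumptions, and the function returns the sanitised *Node directly instead of an encoded response.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
)

// Node is a simplified stand-in for the kademlia Node type. Only the fields
// named in this PR are modelled; the Port type is an assumption.
type Node struct {
	ID       []byte
	IP       string
	Port     uint16
	Version  string
	HashedID []byte
}

// SetHashedID recomputes HashedID from ID. SHA-256 is an assumption made for
// this sketch; the real method may hash differently.
func (n *Node) SetHashedID() {
	sum := sha256.Sum256(n.ID)
	n.HashedID = sum[:]
}

// Message is a simplified stand-in carrying only the Sender.
type Message struct {
	Sender *Node
}

// handlePing validates the request, rebuilds the sender from the fields that
// are safe to copy, and converts any panic into an error return.
func handlePing(msg *Message) (reply *Node, err error) {
	defer func() {
		if r := recover(); r != nil {
			reply, err = nil, fmt.Errorf("handlePing: panic recovered: %v", r)
		}
	}()

	if msg == nil {
		return nil, errors.New("handlePing: nil message")
	}
	if msg.Sender == nil {
		return nil, errors.New("handlePing: nil sender")
	}
	if len(msg.Sender.ID) == 0 {
		return nil, errors.New("handlePing: empty sender ID")
	}

	// Copy only {ID, IP, Port, Version}; the peer-supplied HashedID is dropped
	// and recomputed locally.
	sender := &Node{
		ID:      msg.Sender.ID,
		IP:      msg.Sender.IP,
		Port:    msg.Sender.Port,
		Version: msg.Sender.Version,
	}
	sender.SetHashedID()
	return sender, nil
}

func main() {
	_, err := handlePing(&Message{})
	fmt.Println("nil sender:", err)

	spoofed := &Node{ID: []byte("peer-1"), HashedID: []byte("attacker-chosen")}
	got, err := handlePing(&Message{Sender: spoofed})
	fmt.Println("hashed ID recomputed:", err == nil && string(got.HashedID) != "attacker-chosen")
}
```

The named return values matter here: the deferred closure can only turn a panic into an error if it is able to assign to the function's results.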

2. serve loop (p2p/kademlia/network.go)

Wrap the bare go s.handleConn(ctx, conn) in a top-level recover. Sibling handlers (handleFindNode, handleStoreData, etc.) already install their own defer s.handlePanic(...), but this is defense-in-depth so any future handler that forgets to install its own cannot crash the process either. On recover the connection is closed cleanly.

Risk

  • Behaviour change only on inputs that previously crashed the process. Well-formed Pings — both ours and any legitimate peer's — produce byte-identical responses because the sanitised *Node is constructed from the same {ID, IP, Port, Version} and SetHashedID() is the same call already used everywhere else in dht.go / hashtable.go.
  • No public API change.
  • No state-machine implication (off-chain SN code).
  • Rollback: revert this commit to restore v1.12.0 behaviour.

Mainnet exposure

HIGH (potentially CRITICAL pending review of NewSecureServerConn). Any peer that can complete one Lumera P2P secure handshake can crash any other SN at will. After restart the same probe repeats — sustained DoS. The "Server secure handshake failed: EOF" WARNs that immediately preceded the crash on val4 indicate the attacker was probing the handshake state machine; the panic fired the moment one probe completed.

Tests

p2p/kademlia/handle_ping_test.go:

  • TestHandlePing_NilMessage — nil *Message returns error.
  • TestHandlePing_NilSender — Message with no Sender returns error.
  • TestHandlePing_EmptySenderID — Sender with nil/empty ID returns error (table-driven).
  • TestHandlePing_PanicRecovered — constructs a *Network without a *DHT so newMessage panics on the nil-receiver deref; asserts the deferred recover() converts that to an error return rather than letting the panic escape to the caller (which, in production, would have crashed the process).
$ go test -count=1 -run TestHandlePing ./p2p/kademlia/
ok  	github.com/LumeraProtocol/supernode/v2/p2p/kademlia	0.045s

$ go test -count=1 -timeout 60s ./p2p/kademlia/...
ok  	github.com/LumeraProtocol/supernode/v2/p2p/kademlia	0.046s
ok  	github.com/LumeraProtocol/supernode/v2/p2p/kademlia/store/meta	0.004s
ok  	github.com/LumeraProtocol/supernode/v2/p2p/kademlia/store/sqlite	9.290s
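
The idea behind TestHandlePing_PanicRecovered can be illustrated with a reduced model. The network and dht types below are hypothetical stand-ins, not the real *Network and *DHT; the sketch only assumes, as the test description states, that building the response dereferences the DHT and therefore panics when it is missing.

```go
package main

import "fmt"

// dht and network are hypothetical, reduced stand-ins for the real types.
type dht struct {
	selfID []byte
}

type network struct {
	dht *dht
}

// newMessage reads through n.dht, so it panics with a nil pointer
// dereference when the network was built without a DHT.
func (n *network) newMessage() []byte {
	return n.dht.selfID
}

// handlePing converts a panic raised while building the response into an
// error return, so the caller never sees the panic.
func (n *network) handlePing() (resp []byte, err error) {
	defer func() {
		if r := recover(); r != nil {
			resp, err = nil, fmt.Errorf("handlePing: panic recovered: %v", r)
		}
	}()
	return n.newMessage(), nil
}

func main() {
	// No DHT attached: newMessage panics, handlePing reports an error.
	_, err := (&network{}).handlePing()
	fmt.Println("without DHT, got error:", err != nil)

	// DHT attached: the normal path is unaffected by the deferred recover.
	resp, err := (&network{dht: &dht{selfID: []byte("self")}}).handlePing()
	fmt.Println("with DHT:", string(resp), err)
}
```

In a real test file the same check would normally be expressed with t.Fatalf instead of printed output.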

Follow-ups (NOT in this PR)

  • Fuzz target FuzzDecodeThenDispatch over handleConn's switch — seed corpus with malformed Sender (empty ID, oversized HashedID, registered-interface confusion on Data).
  • Audit every other handle* in network.go:143–1325 for the same omission pattern (handleFindNode etc. all have defers, but worth one targeted pass).
  • Confirm secure-handshake authentication actually gates identity, not just ciphertext framing — if the handshake completes without binding to a registered SN identity, the attacker prerequisite is "any TCP peer", and severity escalates to CRITICAL.

Observability

On panic, a structured logtrace.Error is emitted with module=p2p and the panic value. Datadog query: service:*supernode* @module:p2p "panic recovered".

A peer sending a Ping with nil or structurally-invalid Sender (e.g.
nil *Node, or *Node with nil ID) caused gob.Encode of the response to
walk a nil pointer and SIGSEGV inside encoding/gob.encUint8Array.

Because the per-conn goroutine spawned at serve() L595 had no upstream
recover() and handlePing — unlike handleFindNode — lacked its own
defer s.handlePanic(), the entire supernode process died. Observed in
production on lumera-devnet-1 val4 (goroutine 2067, repanicked).

Fix is two layers, both required:

1. handlePing: nil-guard message/Sender/Sender.ID, sanitise the peer-
   supplied Node before reflecting it back on the wire (do not echo
   attacker-controlled HashedID), and wrap the function in a deferred
   recover() that converts any future panic in this path to an error
   return rather than process death.

2. serve loop: wrap the bare 'go s.handleConn(ctx, conn)' in a top-
   level recover() so any future request-handler that forgets its own
   defer s.handlePanic() cannot crash the process either.

Adds handle_ping_test.go covering nil message, nil Sender, empty
Sender ID, and a panic-recovery smoke test (constructs a *Network
without a *DHT and asserts handlePing returns an error rather than
panicking to the caller).

Risk: low. Behaviour change only on inputs that previously crashed
the process; any well-formed Ping is unaffected. Sanitisation drops
the peer-supplied HashedID and recomputes it locally via SetHashedID,
matching how dht.go does it everywhere else.

Refs: ops RCA at /root/lumera/kademlia-panic-rca.md
Contributor

Copilot AI left a comment

Pull request overview

This PR hardens Kademlia Ping handling against malformed peer input that previously could panic and crash the process.

Changes:

  • Adds validation and panic recovery to handlePing.
  • Sanitizes peer Sender data before using it in the Ping response and routing-table update.
  • Wraps per-connection handling in serve with a top-level recover.
  • Adds regression tests for nil/malformed Ping inputs and panic recovery.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| p2p/kademlia/network.go | Adds Ping validation, sanitized sender construction, local recovery, and outer connection recovery. |
| p2p/kademlia/handle_ping_test.go | Adds tests for nil message, nil sender, empty sender ID, and recovered panic behavior. |

Comment thread p2p/kademlia/network.go
Comment on lines +379 to +385
	sender := &Node{
		ID:      message.Sender.ID,
		IP:      message.Sender.IP,
		Port:    message.Sender.Port,
		Version: message.Sender.Version,
	}
	sender.SetHashedID()
Comment thread p2p/kademlia/network.go
Comment on lines +360 to +361
// *Node and SIGSEGVs inside encUint8Array, killing the goroutine and (since
// there is no upstream recover()) the entire process.
@mateeullahmalik mateeullahmalik merged commit b332fed into master May 29, 2026
1 check passed