fix(daemon): route auto-handshake through HandshakeSendRequest wrapper (follow-up to #208) #209
Merged
…keSendRequest

Follow-up to PR #208. That PR added per-peer in-flight dedup at the daemon entry point (HandshakeSendRequest).

Field-verified 2026-05-31: with the fix in place the *IPC-side* burst symptom is gone (0 "ephemeral ports exhausted"), but a second pathway was exposed: the proactive auto-handshake at line 3125 calls the plugin's SendRequest *directly*, bypassing the dedup wrapper.

The recursive trip:

    DialConnection → checks trustedagents.IsTrusted → fires `go SendRequest`
      → sendMessage → DialAndSend → DialConnection → same check fires
      → another `go SendRequest` → ... (self-driving fanout)

In the local repro, one user-triggered handshake to a trusted-agents peer (crossref-funders, node 41451) produced ~12,279 "direct handshake failed" log entries within ~800 ms before the dial budget timed out the inner chain.

The dedup wrapper from PR #208 catches the *first* recursive call: it short-circuits with ErrHandshakeInFlight, the goroutine returns quietly, and the fanout doesn't happen.

One-line change: replace the direct plugin call with the daemon-level wrapper. Error return is discarded (the goroutine is fire-and-forget, same as before); ErrHandshakeInFlight is the dedup success case.

Wire / IPC / persistence contracts unchanged.
Collaborator
🤖 Hank: CI status
Classification: The build/test failure is a genuine code defect. @matthew-pilot: fix or comment.
Auto-classified at 2026-06-01T10:26:00Z. Re-runs on next push or check completion.
Summary
Follow-up to #208. That PR added per-peer in-flight dedup at the daemon's IPC entry point (`HandshakeSendRequest`), and the IPC-side burst symptom is gone: 0 "ephemeral ports exhausted" errors in the local repro post-merge.
But the same repro surfaced a second pathway: the proactive auto-handshake fired from inside `DialConnection` (`daemon.go:3125`) calls `d.handshakes.SendRequest` directly, bypassing the dedup wrapper. That call re-enters `DialConnection` on port 444 for the same peer, which re-checks the same gate, fires another `go SendRequest`, and the chain self-fuels until the dial budget cuts it.
In the local repro one user-triggered handshake to a `trustedagents` peer (crossref-funders, node 41451) produced ~12,279 "direct handshake failed" log entries within ~800 ms before the dial timeout cleared it.
Fix
One-line change: route the auto-handshake through `d.HandshakeSendRequest` instead of `d.handshakes.SendRequest`. The dedup map now catches the recursive call: the second entry sees the in-flight slot and short-circuits with `ErrHandshakeInFlight`, the goroutine returns, and the fanout doesn't happen.
The error return is discarded — the goroutine is fire-and-forget, same as before. `ErrHandshakeInFlight` is the dedup success case.
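For reviewers who want the mechanism in miniature, here is a minimal sketch of the dedup-and-re-entry pattern described above. It is not the daemon's code: the `daemon` struct, its fields, the `uint32` peer ID, and the lower-case `sendRequest` / `dialConnection` helpers are hypothetical stand-ins, and the auto-handshake is called synchronously (the real code uses `go`) so that the sketch is deterministic.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrHandshakeInFlight borrows the name used in this PR; the value is illustrative.
var ErrHandshakeInFlight = errors.New("handshake already in flight")

// daemon is a hypothetical stand-in for the real daemon type.
type daemon struct {
	mu         sync.Mutex
	inFlight   map[uint32]struct{} // peers with a handshake currently in progress
	dials      int                 // number of dial attempts (illustration only)
	suppressed int                 // auto-handshakes short-circuited by the dedup
}

func newDaemon() *daemon {
	return &daemon{inFlight: make(map[uint32]struct{})}
}

// HandshakeSendRequest sketches the dedup wrapper: at most one handshake
// per peer may be in flight at any time.
func (d *daemon) HandshakeSendRequest(peer uint32) error {
	d.mu.Lock()
	if _, busy := d.inFlight[peer]; busy {
		d.mu.Unlock()
		return ErrHandshakeInFlight
	}
	d.inFlight[peer] = struct{}{}
	d.mu.Unlock() // do not hold the lock across the dial

	defer func() {
		d.mu.Lock()
		delete(d.inFlight, peer)
		d.mu.Unlock()
	}()
	return d.sendRequest(peer)
}

// sendRequest stands in for the plugin-level SendRequest, which must dial the peer.
func (d *daemon) sendRequest(peer uint32) error {
	return d.dialConnection(peer)
}

// dialConnection stands in for DialConnection, including the proactive
// auto-handshake. Routing it through the wrapper is what breaks the cycle.
func (d *daemon) dialConnection(peer uint32) error {
	d.mu.Lock()
	d.dials++
	d.mu.Unlock()

	// Fire-and-forget in the real code; ErrHandshakeInFlight is the
	// expected outcome when a handshake to this peer is already running.
	if err := d.HandshakeSendRequest(peer); errors.Is(err, ErrHandshakeInFlight) {
		d.mu.Lock()
		d.suppressed++
		d.mu.Unlock()
	}
	return nil
}

func main() {
	d := newDaemon()
	err := d.HandshakeSendRequest(41451)
	fmt.Printf("err=%v dials=%d suppressed=%d\n", err, d.dials, d.suppressed)
}
```

In this sketch, one user-triggered handshake results in a single dial and a single suppressed auto-handshake. If `dialConnection` called `d.sendRequest` directly instead, nothing would stop the re-entry, which corresponds to the fanout seen in the repro.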
Why this is safe
Test plan