fix(daemon): reap dead-peer clients + inactivity backstop (Layer 2, #692) #712
Merged
… can't leak (#692)

Layer-2 defense-in-depth follow-up to the Windows PPID watchdog fix (#711). That fix makes an orphaned proxy exit so its socket closes and the daemon reaps via the refcount + idle timer. This adds two daemon-side safety nets for the residual case where a socket close is never delivered (a Windows named-pipe hazard) and a phantom client would otherwise pin the daemon forever:

- Liveness sweep: a proxy now sends an optional client-hello carrying its pid (+ host pid) right after verifying the daemon hello; the daemon periodically drops any client whose peer process is dead, re-arming the idle timer. Fail-safe and version-pinned: a connection that never sends the hello just falls back to the socket-close lifecycle, and the daemon reads it before the transport so a non-hello first line is handed through untouched.
- Inactivity backstop: the daemon exits after a generous no-traffic window (CODEGRAPH_DAEMON_MAX_IDLE_MS, default 30 min) even with clients attached, so a phantom client that sends nothing can't keep it alive.

Pure helpers (parseClientHelloLine, peerIsDead) are unit-tested; the full handshake + sweep and the backstop are covered end-to-end in mcp-daemon.test.ts. Validated on a real Windows 11 VM: the sweep reaps a dead-pid client over a named pipe and the backstop fires with a client still connected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
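A minimal sketch of the fail-safe, version-pinned first-line handling described above. Only the name parseClientHelloLine comes from the PR; the JSON shape (type, version, pid, hostPid fields), the version constant, and the routeFirstLine helper are hypothetical, so read this as an illustration of the idea rather than CodeGraph's implementation.

```typescript
// Hypothetical wire format: the PR does not show the real message shape,
// so the "type"/"version"/"pid"/"hostPid" fields here are assumptions.
const CLIENT_HELLO_TYPE = "client-hello";
const CLIENT_HELLO_VERSION = 1; // assumed pinned protocol version

interface ClientHello {
  pid: number;
  hostPid?: number;
}

function isPid(v: unknown): v is number {
  return typeof v === "number" && Number.isInteger(v) && v > 0;
}

// Returns the hello if `line` is a compatible client-hello, otherwise null.
// null means "not ours": the caller hands the line to the transport untouched.
function parseClientHelloLine(line: string): ClientHello | null {
  let msg: unknown;
  try {
    msg = JSON.parse(line);
  } catch {
    return null; // not JSON, so it cannot be a hello
  }
  if (typeof msg !== "object" || msg === null) return null;
  const m = msg as Record<string, unknown>;
  // Version pinning: only a hello with the expected version is accepted;
  // anything else is treated as ordinary traffic.
  if (m.type !== CLIENT_HELLO_TYPE || m.version !== CLIENT_HELLO_VERSION) return null;
  const pid = m.pid;
  const hostPid = m.hostPid;
  if (!isPid(pid)) return null;
  return isPid(hostPid) ? { pid, hostPid } : { pid };
}

// Route the first line of a new connection: record the hello, or pass the
// line through so a proxy that never sends a hello still works.
function routeFirstLine(line: string): { hello: ClientHello | null; passthrough: string | null } {
  const hello = parseClientHelloLine(line);
  return hello ? { hello, passthrough: null } : { hello: null, passthrough: line };
}
```

Returning null instead of throwing keeps the hello optional: an older proxy whose first line is an ordinary request loses nothing, it simply never becomes eligible for the liveness sweep.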
colbymchenry added a commit that referenced this pull request on Jun 6, 2026
…it (#662) (#713)

When an MCP host (opencode and others) SIGTERMs the shared daemon as a new session starts, the existing session's proxy used to exit on the dropped socket, silently losing CodeGraph for that session and hanging any request in flight at the drop. The SIGTERM originates in the host's process-tree teardown, not in CodeGraph (nothing here signals another process), so the fix is proxy resilience, not chasing the signal.

The local-handshake proxy now treats a daemon disconnect as recoverable rather than terminal: it falls back to its in-process engine for the rest of the session (the same path used when no daemon is reachable at startup, and what CODEGRAPH_NO_DAEMON does) and re-serves any requests that were in flight to the dead daemon, so the host never hangs. The proxy still exits when the HOST goes away (stdin close / PPID watchdog); only daemon loss is now non-fatal.

Also replaces the over-the-wire liveness-sweep test added in #712, which was flaky under heavy parallel load (a raced raw-socket connect), with a deterministic Daemon.reapDeadClients unit test. The client-hello round-trip is still exercised by every daemon test (the real proxy now sends it).

Validated with a reproduction (proxy stays alive, in-flight request answered, post-drop request recovers) and a regression test in mcp-daemon.test.ts. Confirmed on macOS (full suite green) and a Windows 11 VM.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
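The disconnect handling can be sketched as a proxy that keeps every request in a map until it is answered and, when the daemon drops, replays the unanswered ones to a local engine. RpcRequest, Backend, ResilientProxy, and makeLocalEngine are hypothetical names, not CodeGraph's API. Host-side exit (stdin close, PPID watchdog) is deliberately not modeled, since only daemon loss becomes non-fatal.

```typescript
// Hypothetical shapes: the commit describes the behavior, not these interfaces.
type RequestId = number | string;
interface RpcRequest { id: RequestId; method: string; params?: unknown }
interface RpcResponse { id: RequestId; result?: unknown }

// Anything that can serve a request: the shared daemon or the local engine.
interface Backend {
  send(req: RpcRequest): void;
}

class ResilientProxy {
  private readonly inFlight = new Map<RequestId, RpcRequest>();
  private readonly makeLocalEngine: () => Backend;
  private backend: Backend;
  mode: "daemon" | "local" = "daemon";

  constructor(daemon: Backend, makeLocalEngine: () => Backend) {
    this.backend = daemon;
    this.makeLocalEngine = makeLocalEngine;
  }

  // Request from the host (stdin): remember it until a response arrives.
  forward(req: RpcRequest): void {
    this.inFlight.set(req.id, req);
    this.backend.send(req);
  }

  // Response from whichever backend is active: it is no longer in flight.
  // (Writing the response back to the host is omitted here.)
  complete(res: RpcResponse): void {
    this.inFlight.delete(res.id);
  }

  // Daemon socket dropped: recoverable, not terminal. Switch to the
  // in-process engine for the rest of the session and re-serve everything
  // still pending, so the host never waits on a dead daemon.
  onDaemonDisconnect(): void {
    if (this.mode === "local") return;
    this.mode = "local";
    this.backend = this.makeLocalEngine();
    for (const req of this.inFlight.values()) this.backend.send(req);
  }
}
```

Keying pending work by request id is what lets the fallback re-serve exactly the requests the dead daemon never answered, without duplicating ones that already completed.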
Context
Defense-in-depth follow-up to #711 (the Windows PPID watchdog fix for #692). #711 makes an orphaned proxy exit so its socket closes and the daemon reaps the client via the refcount + idle timer. This PR adds two daemon-side safety nets for the residual case the watchdog can't cover: a socket close that is never delivered (a Windows named-pipe hazard), where a phantom client would otherwise pin the daemon forever and the idle timer (which only arms at zero clients) never fires.
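To illustrate why a phantom client pins the daemon, here is a hypothetical lifecycle model with explicit millisecond timestamps: the idle timer arms only at zero clients, while the inactivity backstop compares against the last observed traffic regardless of client count. DaemonLifecycle, maxIdleFromEnv, the fallback parsing, and the idle-timer length are assumptions; only CODEGRAPH_DAEMON_MAX_IDLE_MS and its 30-minute default come from the PR.

```typescript
// Hypothetical model of the daemon lifecycle; names are illustrative and
// times are passed in explicitly (ms) so the logic is easy to test.
const DEFAULT_MAX_IDLE_MS = 30 * 60 * 1000; // 30 min, the default the PR states

// Reads the backstop window; falling back on a missing or invalid value is an
// assumption about how CODEGRAPH_DAEMON_MAX_IDLE_MS is parsed.
function maxIdleFromEnv(env: Record<string, string | undefined>): number {
  const n = Number(env.CODEGRAPH_DAEMON_MAX_IDLE_MS);
  return Number.isFinite(n) && n > 0 ? n : DEFAULT_MAX_IDLE_MS;
}

class DaemonLifecycle {
  private clients = 0;
  private zeroClientsSince: number | null;
  private lastActivity: number;
  private readonly idleMs: number;
  private readonly maxIdleMs: number;

  constructor(idleMs: number, maxIdleMs: number, now: number) {
    this.idleMs = idleMs;
    this.maxIdleMs = maxIdleMs;
    this.zeroClientsSince = now; // starts with no clients, so the idle timer is armed
    this.lastActivity = now;
  }

  attach(now: number): void {
    this.clients++;
    this.zeroClientsSince = null; // any client disarms the idle timer
    this.lastActivity = now;
  }

  detach(now: number): void {
    this.clients = Math.max(0, this.clients - 1);
    if (this.clients === 0) this.zeroClientsSince = now; // re-arm at zero clients
    this.lastActivity = now;
  }

  // Fed by a socket observer on any traffic; no protocol parsing needed.
  touch(now: number): void {
    this.lastActivity = now;
  }

  shouldExit(now: number): "idle" | "backstop" | null {
    if (this.zeroClientsSince !== null && now - this.zeroClientsSince >= this.idleMs) return "idle";
    // Backstop: fires even with clients attached, so a silent phantom client
    // (whose close was never delivered) cannot pin the daemon forever.
    if (now - this.lastActivity >= this.maxIdleMs) return "backstop";
    return null;
  }
}
```

If a client's close is never delivered, detach is never called, so the count never returns to zero and only the backstop can end the process.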
What changed
- Liveness sweep: the daemon periodically (every CODEGRAPH_DAEMON_CLIENT_SWEEP_MS, default 30s) drops any client whose peer process is dead, re-arming the idle timer. Reaps a dead client within one sweep instead of never.
- Inactivity backstop: the daemon exits after a generous no-traffic window (CODEGRAPH_DAEMON_MAX_IDLE_MS, default 30 min) even with clients still attached; a phantom client sends nothing, so it can't keep the daemon alive. A lightweight socket observer feeds the activity clock; no protocol parsing on this path.

Safety / risk
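The sweep is meant to err on the side of keeping clients: a connection that never sent a hello carries no pid and is never reaped by the sweep, so it stays on the socket-close lifecycle. The sketch below assumes peerIsDead probes with process.kill(pid, 0) (Node's signal-0 existence check) and treats only ESRCH as dead; the PR does not specify the probe. TrackedClient and the injectable isDead parameter are hypothetical, and this sketch checks only the proxy pid, not the host pid.

```typescript
// Hypothetical client record: the pid (and host pid) come from the optional
// client-hello; a connection that never sent one has no pid.
interface TrackedClient {
  pid?: number;
  hostPid?: number;
  close(): void;
}

// Sketch of a liveness probe via signal 0, which checks the target without
// delivering a signal. The PR does not say how peerIsDead is implemented.
function peerIsDead(pid: number | undefined): boolean {
  if (pid === undefined) return false; // no hello: leave it to socket close
  try {
    process.kill(pid, 0);
    return false; // probe succeeded: the process exists
  } catch (err) {
    // ESRCH means no such process; anything else (e.g. EPERM, the process
    // exists but we may not signal it) is treated as alive, erring safe.
    return (err as { code?: string }).code === "ESRCH";
  }
}

// One sweep: close and drop clients whose peer is dead. Returns the ids
// reaped; the caller re-arms the idle timer if no clients remain.
function reapDeadClients(
  clients: Map<string, TrackedClient>,
  isDead: (pid: number | undefined) => boolean = peerIsDead,
): string[] {
  const reaped: string[] = [];
  for (const [id, client] of clients) {
    if (isDead(client.pid)) {
      client.close();
      clients.delete(id);
      reaped.push(id);
    }
  }
  return reaped;
}
```

Injecting isDead lets a test drive the sweep without spawning or killing real processes, in the spirit of the deterministic reapDeadClients unit test described in #713.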
Test plan
- Unit tests for parseClientHelloLine + peerIsDead; two end-to-end tests in mcp-daemon.test.ts: a raw client announcing a dead pid gets reaped, and the daemon exits on the inactivity backstop with a client still connected.
- Windows 11 VM: the sweep reaps a dead-pid client over a named pipe ("Reaping client with dead peer (pid 999999)"), and the backstop fires with a client connected.

Follow-up to #711 / #692 (already closed by #711); no issue to re-close.
🤖 Generated with Claude Code