fix(daemon): keep a session alive when its daemon is restarted under it (#662) #713
Merged
…it (#662)

When an MCP host (opencode and others) SIGTERMs the shared daemon as a new session starts, the existing session's proxy used to exit on the dropped socket, silently losing CodeGraph for that session and hanging any request in flight at the drop. The SIGTERM originates in the host's process-tree teardown, not in CodeGraph (nothing here signals another process), so the fix is proxy resilience, not chasing the signal.

The local-handshake proxy now treats a daemon disconnect as recoverable rather than terminal: it falls back to its in-process engine for the rest of the session (the same path used when no daemon is reachable at startup, and what CODEGRAPH_NO_DAEMON does) and re-serves any requests that were in flight to the dead daemon, so the host never hangs. The proxy still exits when the HOST goes away (stdin close / PPID watchdog); only daemon loss is now non-fatal.

Also replaces the over-the-wire liveness-sweep test added in #712, which was flaky under heavy parallel load (a raced raw-socket connect), with a deterministic Daemon.reapDeadClients unit test. The client-hello round-trip is still exercised by every daemon test (the real proxy now sends it).

Validated with a reproduction (proxy stays alive, in-flight request answered, post-drop request recovers) and a regression test in mcp-daemon.test.ts. Confirmed on macOS (full suite green) and a Windows 11 VM.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Problem (#662)
When multiple sessions run against the same project (reported on opencode / WSL2), starting a new session causes the shared daemon to receive SIGTERM and restart. The new session reconnects, but the previously running session's proxy dies and never recovers: that session silently loses CodeGraph until it is restarted.

A reproduction confirmed the issue and surfaced a detail the report didn't: a tools/call in flight at the moment the daemon dies hangs with no response (the host waits on a reply that never comes), on top of the proxy exiting and all subsequent calls being lost.

The SIGTERM originates in the host's process-tree teardown, not in CodeGraph: nothing in the engine signals another process (verified). So the fix is proxy resilience, not chasing the signal source.
Fix
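The local-handshake proxy now treats a daemon disconnect as recoverable, not terminal:
It falls back to its in-process engine for the rest of the session, the same path used when no daemon is reachable at startup and what CODEGRAPH_NO_DAEMON=1 does (the reporter's own workaround, now automatic).
It re-serves any requests that were in flight to the dead daemon, so the host never hangs.
It still exits when the host goes away (stdin close / PPID watchdog); only daemon loss is now non-fatal.

Before → after (from the reproduction)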
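The "after" behaviour can be illustrated with a minimal sketch. This is not the code in this PR: `ResilientProxy`, `RpcRequest`, `RpcResponse` and the three callbacks are hypothetical names chosen for illustration, and the real proxy works over sockets and stdio rather than plain function calls. The sketch only shows the idea of tracking in-flight requests and replaying them on the in-process engine when the daemon drops.

```typescript
// Hypothetical message shapes; not taken from the CodeGraph source.
type RpcRequest = { id: number; method: string };
type RpcResponse = { id: number; result: string };

class ResilientProxy {
  // Requests forwarded to the daemon that have not been answered yet.
  private inFlight = new Map<number, RpcRequest>();
  private daemonUp = true;
  private sendToDaemon: (req: RpcRequest) => void;
  private localEngine: (req: RpcRequest) => RpcResponse;
  private replyToHost: (res: RpcResponse) => void;

  constructor(
    sendToDaemon: (req: RpcRequest) => void,
    localEngine: (req: RpcRequest) => RpcResponse,
    replyToHost: (res: RpcResponse) => void,
  ) {
    this.sendToDaemon = sendToDaemon;
    this.localEngine = localEngine;
    this.replyToHost = replyToHost;
  }

  // Host -> proxy: forward to the daemon while it is up, else serve locally.
  handleHostRequest(req: RpcRequest): void {
    if (this.daemonUp) {
      this.inFlight.set(req.id, req);
      this.sendToDaemon(req);
    } else {
      this.replyToHost(this.localEngine(req));
    }
  }

  // Daemon -> proxy: relay the answer; ignore ids that were already re-served.
  handleDaemonResponse(res: RpcResponse): void {
    if (this.inFlight.delete(res.id)) {
      this.replyToHost(res);
    }
  }

  // Daemon socket dropped: stay alive, switch to the in-process engine for
  // the rest of the session, and re-serve everything that was in flight.
  handleDaemonDisconnect(): void {
    this.daemonUp = false;
    const pending = Array.from(this.inFlight.values());
    this.inFlight.clear();
    for (const req of pending) {
      this.replyToHost(this.localEngine(req));
    }
  }
}
```

In this sketch, a request sent just before `handleDaemonDisconnect()` is answered by the local engine instead of hanging, and later requests skip the daemon entirely. Exiting when the host goes away (stdin close / PPID watchdog) is deliberately left out.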
Test plan
mcp-daemon.test.ts, "proxy survives the daemon dying mid-session and keeps serving (#662)": kills the daemon under a live proxy via SIGTERM, asserts the proxy is still alive and still answers a subsequent tools/call.

Also in this PR
Replaces the over-the-wire liveness-sweep test added in #712, which was flaky under heavy parallel load (a raced raw-socket connect), with a deterministic Daemon.reapDeadClients unit test. The client-hello round-trip is still exercised by every daemon test, since the real proxy now sends it.

Fixes #662.
🤖 Generated with Claude Code