fix(windows): reap orphaned MCP processes when their parent exits by colbymchenry · Pull Request #711 · colbymchenry/codegraph

colbymchenry · 2026-06-06T19:05:08Z

Problem

On Windows, codegraph's background processes pile up without bound over a long session and eventually saturate CPU — closing the editor/agent that launched CodeGraph does not terminate the associated processes, and the shared daemon's 5-minute idle timeout never fires. (#692, with #576 and #680 as the same symptom.)

Root cause

All three PPID watchdogs (proxy socket, proxy local-handshake, direct mode) detected parent death through only two signals:

- `ppidChanged`: a POSIX-only signal. The OS reparents an orphan to init, so `process.ppid` diverges. Windows never reparents, so `process.ppid` stays constant after the parent dies and this check can never fire.
- `hostGone`: needs `CODEGRAPH_HOST_PPID`, which is set by the wasm relaunch. The standalone bundle pre-bakes `--liftoff-only`, so the relaunch is skipped and `HOST_PPID` is never set.

On a Windows standalone bundle (exactly #692's environment) neither condition can fire → the orphaned proxy/server runs forever → its socket never closes → the shared daemon keeps a phantom client → clients never reaches 0 → the idle timer never arms → processes accumulate.
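
The chain above can be illustrated with a toy model of a ref-counted daemon. This is only a sketch: `DaemonModel` and its members are hypothetical names, not CodeGraph's actual daemon code; the 5-minute timeout is the only value taken from the description.

```typescript
// Toy model of a ref-counted daemon. All names here are illustrative.
class DaemonModel {
  private clients = 0;
  private readonly idleTimeoutMs: number;
  // Time (ms) at which the daemon would exit, or null while the timer is unarmed.
  idleDeadline: number | null = null;

  constructor(idleTimeoutMs: number) {
    this.idleTimeoutMs = idleTimeoutMs;
  }

  connect(): void {
    this.clients += 1;
    this.idleDeadline = null; // any attached client cancels the idle timer
  }

  disconnect(nowMs: number): void {
    this.clients = Math.max(0, this.clients - 1);
    if (this.clients === 0) {
      // The idle timer is armed only when the client count reaches zero.
      this.idleDeadline = nowMs + this.idleTimeoutMs;
    }
  }
}

const FIVE_MINUTES_MS = 5 * 60 * 1000;
const daemon = new DaemonModel(FIVE_MINUTES_MS);
daemon.connect(); // a live client
daemon.connect(); // a client whose proxy will be orphaned
daemon.disconnect(1_000); // the live client closes its socket

// The orphaned proxy never closes its socket, so disconnect() is never
// called for it: the count stays at 1 and the timer stays unarmed.
console.log(daemon.idleDeadline); // null
```

In this model a single socket that never closes is enough to keep the daemon alive indefinitely, which matches the accumulation described in #692.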

Confirmed empirically on a Windows 11 VM: a child's process.ppid stays constant across parent death (ppid_changed=false), while process.kill(originalPpid, 0) starts throwing the moment the parent exits.
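
A minimal sketch of such a probe, assuming Node's documented behavior that `process.kill(pid, 0)` delivers nothing and only tests whether the process exists. The helper name `isProcessAlive` and the guard against non-positive pids are illustrative additions, not taken from the PR.

```typescript
// Returns true when a process with this pid currently exists.
// Hypothetical helper; the PR's own code may be structured differently.
function isProcessAlive(pid: number): boolean {
  // Pid 0 and negative pids address process groups on POSIX, so reject them.
  if (!Number.isInteger(pid) || pid <= 0) return false;
  try {
    process.kill(pid, 0); // signal 0: existence check only, nothing is delivered
    return true;
  } catch (err) {
    // EPERM means the process exists but we are not allowed to signal it,
    // so it still counts as alive. Anything else (e.g. ESRCH) means gone.
    return (err as { code?: string }).code === "EPERM";
  }
}

const originalPpid = process.ppid; // captured once, at startup

// Polled later: on Windows ppidChanged stays false after the parent dies,
// while parentAlive flips to false.
console.log({
  ppidChanged: process.ppid !== originalPpid,
  parentAlive: isProcessAlive(originalPpid),
});
```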

Fix

Add a win32-only signal: poll the original parent's liveness directly (process.kill(originalPpid, 0)), since ppid is stable on Windows. Gated to win32 on purpose — on POSIX a double-forked grandparent can legitimately outlive the reparent, so a dead originalPpid is not proof of orphaning there; the ppid-change signal remains correct and sufficient. The decision is extracted into a pure helper (src/mcp/ppid-watchdog.ts) shared by all three sites, with a cross-platform unit matrix so the Windows branch is covered on any OS.
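
The decision described above could look roughly like the following pure function. This is a sketch under stated assumptions, not the contents of src/mcp/ppid-watchdog.ts: the names `WatchdogInput`, `OrphanReason` and `detectOrphan`, and the order in which the signals are checked, are hypothetical. Injecting the platform and the liveness probe is what lets the Windows branch be tested on any OS.

```typescript
// Hypothetical input shape; every field is passed in so the function stays pure.
interface WatchdogInput {
  platform: string; // e.g. process.platform
  originalPpid: number; // ppid captured at startup
  currentPpid: number; // process.ppid at poll time
  hostPpid?: number; // CODEGRAPH_HOST_PPID, when the relaunch set it
  isAlive: (pid: number) => boolean; // liveness probe, injected for testing
}

type OrphanReason = "ppid-changed" | "host-gone" | "parent-gone";

// Returns the reason the process should shut down, or null to keep running.
function detectOrphan(input: WatchdogInput): OrphanReason | null {
  // POSIX signal: an orphan is reparented, so ppid diverges.
  if (input.currentPpid !== input.originalPpid) return "ppid-changed";
  // Host liveness: only available when the host pid was recorded.
  if (input.hostPpid !== undefined && !input.isAlive(input.hostPpid)) {
    return "host-gone";
  }
  // win32-only signal: ppid is stable there, so probe the original parent.
  // Not applied on POSIX, where a dead original parent is not proof of orphaning.
  if (input.platform === "win32" && !input.isAlive(input.originalPpid)) {
    return "parent-gone";
  }
  return null;
}

// The #692 scenario: Windows, stable ppid, no host pid, parent already dead.
console.log(
  detectOrphan({
    platform: "win32",
    originalPpid: 8372,
    currentPpid: 8372,
    isAlive: () => false,
  }),
); // "parent-gone"
```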

POSIX behavior is unchanged.

Test plan

- macOS: full suite green (1228 passed, 2 skipped); new 9-case unit matrix; existing POSIX reparent integration test (mcp-ppid-watchdog.test.ts) and the daemon lifecycle suite (#411, #277, idle-timeout) all pass.
- Real Windows 11 VM: clean build; unit matrix 9/9; and a live probe reproducing #692's exact bundle scenario (direct mode, no HOST_PPID). The orphaned server exited within one watchdog poll, with stderr confirming the new path:
```
[CodeGraph MCP] Parent process exited (parent pid 8372 exited); shutting down.
```
(`parent pid … exited` is produced only by the new win32 liveness branch, not by the POSIX `ppid →` path or by `host pid … exited`.)

Fixes #692.
Related: #576 (same Windows orphan-reaping mechanism) and #680 (the same symptom; should be resolved for Windows hosts).

🤖 Generated with Claude Code

…, #576, #680)

On Windows the PPID watchdog could never fire: orphans aren't reparented, so `process.ppid` stays constant after the parent dies (defeating the ppid-change check), and the standalone bundle pre-bakes `--liftoff-only`, skipping the relaunch that sets `CODEGRAPH_HOST_PPID` (defeating the host-liveness check). With neither signal available, an orphaned proxy / direct server ran forever, the shared daemon never saw the client disconnect, and its idle timer never armed, so node processes accumulated until CPU saturated.

Add a win32-only signal: poll the original parent's liveness directly, since ppid is stable there. Gated to Windows so POSIX double-fork cases keep relying on the ppid-change signal (a dead original parent is not proof of orphaning on POSIX). The decision is extracted into a pure, unit-tested helper shared by all three watchdog sites (proxy socket, proxy local-handshake, direct mode).

Validated on a real Windows 11 VM: in the exact bundle scenario (direct mode, no HOST_PPID) an orphaned server now exits within one watchdog poll via the new path; the POSIX reparent path is unchanged and its integration test still passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… can't leak (#692) (#712)

Layer-2 defense-in-depth follow-up to the Windows PPID watchdog fix (#711). That fix makes an orphaned proxy exit so its socket closes and the daemon reaps via the refcount + idle timer. This adds two daemon-side safety nets for the residual case where a socket close is never delivered (a Windows named-pipe hazard) and a phantom client would otherwise pin the daemon forever:

- Liveness sweep: a proxy now sends an optional client-hello carrying its pid (+ host pid) right after verifying the daemon hello; the daemon periodically drops any client whose peer process is dead, re-arming the idle timer. Fail-safe and version-pinned: a connection that never sends the hello just falls back to the socket-close lifecycle, and the daemon reads it before the transport so a non-hello first line is handed through untouched.
- Inactivity backstop: the daemon exits after a generous no-traffic window (CODEGRAPH_DAEMON_MAX_IDLE_MS, default 30 min) even with clients attached, so a phantom client that sends nothing can't keep it alive.

Pure helpers (parseClientHelloLine, peerIsDead) are unit-tested; the full handshake + sweep and the backstop are covered end-to-end in mcp-daemon.test.ts.

Validated on a real Windows 11 VM: the sweep reaps a dead-pid client over a named pipe and the backstop fires with a client still connected.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
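
As a rough illustration of the two safety nets in this follow-up, the sketch below shows one way the pure helpers might be shaped. Only the names `parseClientHelloLine`, `peerIsDead` and the 30-minute default come from the commit message. Everything else is an assumption made for the example: the one-line JSON wire format, the `type` field, the function signatures, the rule that a dead host pid also counts as a dead peer, and the helpers `sweepDeadClients` and `inactivityExpired`.

```typescript
// Assumed wire format, for illustration only: one JSON line such as
// {"type":"client-hello","pid":1234,"hostPid":5678}
interface ClientHello {
  pid: number;
  hostPid?: number;
}

function isPid(value: unknown): value is number {
  return typeof value === "number" && Number.isInteger(value) && value > 0;
}

// Returns null for anything that is not a well-formed hello, so the caller
// can hand a non-hello first line through to the transport untouched.
function parseClientHelloLine(line: string): ClientHello | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(line);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const fields = parsed as Record<string, unknown>;
  const pid = fields["pid"];
  const hostPid = fields["hostPid"];
  if (fields["type"] !== "client-hello" || !isPid(pid)) return null;
  return isPid(hostPid) ? { pid, hostPid } : { pid };
}

// Assumption: a peer counts as dead if the proxy or its recorded host is gone.
function peerIsDead(hello: ClientHello, isAlive: (pid: number) => boolean): boolean {
  if (!isAlive(hello.pid)) return true;
  return hello.hostPid !== undefined && !isAlive(hello.hostPid);
}

// Returns the ids of clients to drop. Clients that never sent a hello (null)
// are skipped and keep relying on the socket-close lifecycle.
function sweepDeadClients(
  clients: Map<string, ClientHello | null>,
  isAlive: (pid: number) => boolean,
): string[] {
  const dead: string[] = [];
  clients.forEach((hello, id) => {
    if (hello !== null && peerIsDead(hello, isAlive)) dead.push(id);
  });
  return dead;
}

// Inactivity backstop: true once no traffic has been seen for maxIdleMs,
// regardless of how many clients are still attached.
const DEFAULT_MAX_IDLE_MS = 30 * 60 * 1000;
function inactivityExpired(
  lastTrafficMs: number,
  nowMs: number,
  maxIdleMs: number = DEFAULT_MAX_IDLE_MS,
): boolean {
  return nowMs - lastTrafficMs >= maxIdleMs;
}
```

Keeping the probe (`isAlive`) and the clock (`nowMs`) as parameters is what makes helpers like these unit-testable without real processes or timers.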

colbymchenry merged commit 565eb20 into main Jun 6, 2026

colbymchenry deleted the fix/win-ppid-watchdog-leak branch June 6, 2026 19:05

colbymchenry merged 1 commit into main from fix/win-ppid-watchdog-leak
