fix(server): self-terminate on stdin EOF to prevent zombie leaders #22

Merged
awdr74100 merged 1 commit into main from fix/server-stdin-shutdown on Jun 29, 2026

@awdr74100

Owner

Root cause

An MCP server is spawned over stdio by its client. When the client crashes or is force-closed, it may send no signal at all; it just closes the pipe. index.ts only listened for SIGINT/SIGTERM, and the SDK's stdio transport reacts only to stdin 'data'/'error', never to EOF (onclose fires only on an explicit close()). So an orphaned server lingers, keeps holding the relay port (:3055), and becomes a stale "zombie" leader that serves an old build to the plugin. We hit this live twice: ping reported the new version while the actual work was served by a day-and-a-half-old leader.
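The failure mode can be reproduced in isolation. The sketch below is not code from this repo: it spawns a throwaway Node child (the inline `childSource` script and the `runChildWithClosedStdin` helper are hypothetical names) and closes its stdin without sending any signal. The child only learns that its parent is gone because it listens for stdin 'end'.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical stand-in for a stdio server: it exits only if it notices
// either a signal or EOF on stdin.
const childSource = `
  process.on("SIGTERM", () => { console.log("signal"); process.exit(1); });
  process.stdin.on("data", () => {});   // consume stdin so 'end' can fire
  process.stdin.on("end", () => { console.log("stdin end"); process.exit(0); });
`;

function runChildWithClosedStdin(): { status: number | null; stdout: string } {
  // `input` is written to the child's stdin and the pipe is then closed,
  // which is what a vanished client looks like from the server's side.
  const result = spawnSync(process.execPath, ["-e", childSource], {
    input: "\n",
    encoding: "utf8",
    timeout: 8000, // safety net: kill the child if it never notices EOF
  });
  return { status: result.status, stdout: result.stdout.trim() };
}

const outcome = runChildWithClosedStdin();
console.log(outcome);
```

Without the 'end' listener the child would receive nothing actionable when the pipe closes, which matches the lingering-server behavior described above.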

Fix — self-terminate on stdin EOF

New wireShutdown (lifecycle.ts) treats stdin 'end'/'close' as a shutdown trigger alongside SIGINT/SIGTERM, running the existing graceful shutdown at most once (idempotent). When the client goes away the server now exits → election promotes a still-alive follower (which is, by construction, a current-build process) → no zombie, and version skew is structurally prevented, not just reported.
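A minimal sketch of what such a wiring helper could look like. The actual src/lifecycle.ts is not shown in this description, so the signature, the `OnceSource` and `ShutdownSources` types, and the reason strings are assumptions made for illustration only.

```typescript
// Assumed shape: anything that can register a one-shot listener,
// e.g. `process`, `process.stdin`, or a fake in a unit test.
interface OnceSource {
  once(event: string, listener: () => void): unknown;
}

interface ShutdownSources {
  signals: OnceSource; // typically `process` (SIGINT / SIGTERM)
  stdin: OnceSource;   // typically `process.stdin` ('end' / 'close')
}

type ShutdownFn = (reason: string) => void | Promise<void>;

function wireShutdown(
  sources: ShutdownSources,
  shutdown: ShutdownFn,
): (reason: string) => void {
  let fired = false;

  const report = (err: unknown): void => {
    console.error("shutdown failed:", err);
  };

  // Runs the graceful shutdown at most once, whichever trigger arrives first.
  const trigger = (reason: string): void => {
    if (fired) return;
    fired = true;
    try {
      // Handles both synchronous and promise-returning shutdown functions.
      Promise.resolve(shutdown(reason)).catch(report);
    } catch (err) {
      report(err);
    }
  };

  for (const signal of ["SIGINT", "SIGTERM"]) {
    sources.signals.once(signal, () => trigger(signal));
  }
  // stdin EOF: the client closed the pipe without sending a signal.
  for (const event of ["end", "close"]) {
    sources.stdin.once(event, () => trigger(`stdin ${event}`));
  }

  return trigger;
}
```

Under these assumptions an entry point would call something like `wireShutdown({ signals: process, stdin: process.stdin }, shutdown)`. Passing the event sources in, rather than reaching for the globals, is what makes the unit testable with plain fakes. Note that a Node readable stream emits 'end' only once it is being consumed; in the server the transport's 'data' listener presumably takes care of that.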

Why equal-or-better

  • Purely additive: SIGINT/SIGTERM behavior unchanged; adds the EOF trigger the SDK omits.
  • stdin EOF = the client is unambiguously gone, which is exactly when a stdio server should exit (standard MCP behavior). No new failure mode.
  • Lifecycle wiring extracted to a testable unit (index.ts is an entry point).

Relationship to the version-skew warning (#21)

This is the primary fix — it stops zombies being born. #21's versionSkew is the fallback gauge for the rare residual case (e.g. multiple clients legitimately sharing one leader, or a server hung so hard it never sees the stdin event).

Changed

  • New src/lifecycle.ts (wireShutdown) + test/lifecycle.test.ts (triggers / idempotency / no-fire).
  • index.ts wires it in place of the bare SIGINT/SIGTERM handlers.
  • 708 tests. Gate green: typecheck · lint · format · knip · build · test.

@awdr74100

Owner Author

Live-verified (beyond the unit tests):

Isolated stdin-EOF test — ran the built server with its stdin closed:
```
$ perl -e 'alarm shift; exec @ARGV' 8 node packages/mcp/dist/index.mjs < /dev/null; echo $?
[node] became FOLLOWER (leader @ http://127.0.0.1:3055)
[figwright] server 0.2.0 (protocol 0.1.0) ready as follower, follower → http://127.0.0.1:3055
0
```
Exit 0 = it fully booted, then self-terminated on EOF (a hang would have hit the 8s SIGALRM → 142). This also confirms the timing is safe: stdin 'end' is not missed before wireShutdown registers its listener.

No side effects — left no orphan process and didn't steal :3055 (became a follower, exited immediately).

No accumulation — after two /mcp reconnects there is exactly one live server, not a growing pile of zombies.

Closes the loop on the root cause. Safe to merge.

@awdr74100 awdr74100 merged commit 9a8bf2f into main Jun 29, 2026
2 checks passed
@awdr74100 awdr74100 deleted the fix/server-stdin-shutdown branch June 29, 2026 16:20