Bind E2E server lifetime to vitest via kernel pipe-EOF #56
Merged
antoninbas merged 3 commits into main on Apr 20, 2026
Spawn the server with detached: true so it leads its own process group, then signal -pgid on teardown.
Previously SIGTERM went to the npx wrapper, which didn't propagate to the inner node child, leaving orphaned tsx server processes on random ports after every e2e run.
Also fixes the "close timed out after 10000ms" warning vitest printed at the end of each run.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
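A minimal sketch of the process-group teardown this commit describes, assuming a POSIX host. The helper names spawnInOwnGroup and killGroup are hypothetical and are not taken from the PR's diff.

```typescript
import { spawn } from "node:child_process";
import type { ChildProcess } from "node:child_process";

// Hypothetical helper: start a command as the leader of a new process group.
// With detached: true on POSIX, the child's PID is also its process-group ID.
export function spawnInOwnGroup(command: string, args: string[]): ChildProcess {
  return spawn(command, args, { detached: true, stdio: "ignore" });
}

// Hypothetical helper: signal every process in the child's group.
// A negative PID tells kill(2) to target the whole group (POSIX only).
export function killGroup(
  child: ChildProcess,
  signal: NodeJS.Signals = "SIGTERM",
): boolean {
  if (child.pid === undefined) return false; // spawn failed, nothing to signal
  try {
    process.kill(-child.pid, signal);
    return true;
  } catch (err) {
    // ESRCH: the group no longer exists, so there is nothing left to tear down.
    if ((err as NodeJS.ErrnoException).code === "ESRCH") return false;
    throw err;
  }
}
```

Signalling the group rather than the direct child is what reaches the inner node process when wrappers such as npx sit in between.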
The first attempt (process-group SIGTERM on teardown) handled clean
shutdown but still leaked when vitest itself died hard (SIGKILL, OOM,
runner yanked) — the detached child kept running, reparented to PID 1.
Insert a tiny supervisor between vitest and the server. The supervisor:
- Spawns the server in its own process group
- Polls KNOTES_E2E_ANCHOR_PID (vitest's PID) once a second via
process.kill(pid, 0); on ESRCH, tears the server down
- Forwards SIGTERM/SIGINT/SIGHUP to the server group on clean teardown
Anchor pid is passed via env because intermediate wrappers (npx, the
tsx CLI) exit between vitest and the supervisor, making process.ppid
unreliable.
Verified manually: SIGKILL'ing the vitest process leaves no orphans
within ~5s.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
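The anchor-PID polling described in this commit could look roughly like the sketch below. isAlive and watchAnchor are hypothetical names, and the treatment of EPERM is an assumption; the commit text only mentions ESRCH.

```typescript
// Hypothetical helper: probe a PID with signal 0, which checks for existence
// and permission without delivering any signal.
export function isAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    const code = (err as NodeJS.ErrnoException).code;
    if (code === "ESRCH") return false; // no such process
    if (code === "EPERM") return true; // exists, but owned by another user (assumption)
    throw err;
  }
}

// Hypothetical helper: poll the anchor PID and run onDeath once it disappears.
export function watchAnchor(
  pid: number,
  onDeath: () => void,
  intervalMs = 1000,
): ReturnType<typeof setInterval> {
  const timer = setInterval(() => {
    if (!isAlive(pid)) {
      clearInterval(timer); // stop polling before tearing down
      onDeath();
    }
  }, intervalMs);
  return timer;
}
```

In the supervisor, the PID would come from KNOTES_E2E_ANCHOR_PID, for example watchAnchor(Number(process.env.KNOTES_E2E_ANCHOR_PID), teardown), where teardown is whatever kills the server group.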
Drop the supervisor process and put the parent-death detection in the server itself, gated on KNOTES_E2E_WATCH_STDIN=1. The harness wires its own pipe to the server's stdin and holds the write end for the server's lifetime; the kernel closes that write end the moment the harness process dies (clean exit, SIGKILL, OOM), and the server reads EOF and exits.
This is the canonical UNIX equivalent of prctl(PR_SET_PDEATHSIG): the binding is enforced by the kernel, not by a polling loop, and the leaf process detects parent death directly with no intermediary that could itself be killed and break the chain.
Verified manually: SIGKILL'ing the vitest worker tears the e2e server down within ~5s, with no orphans left behind.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
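On the server side, an opt-in stdin watchdog along these lines would implement the behavior described. This is a sketch only: armStdinWatchdog is a hypothetical name, and the actual hook in the PR is not reproduced here.

```typescript
import { Readable } from "node:stream";

// Hypothetical helper: arm an EOF watchdog on stdin when the opt-in flag is set.
// Returns true if the watchdog was armed, false if the flag was absent.
export function armStdinWatchdog(
  onParentGone: () => void,
  env: NodeJS.ProcessEnv = process.env,
  stdin: Readable = process.stdin,
): boolean {
  // Opt-in only: without the flag, stdin is left untouched.
  if (env.KNOTES_E2E_WATCH_STDIN !== "1") return false;

  let fired = false;
  const fire = (): void => {
    if (fired) return; // 'end' and 'close' can both arrive; act once
    fired = true;
    onParentGone();
  };

  stdin.once("end", fire); // EOF: every write end of the pipe has been closed
  stdin.once("close", fire);
  stdin.resume(); // switch to flowing mode so 'end' is actually emitted
  return true;
}
```

A server entry point would call something like armStdinWatchdog(() => process.exit(0)) during startup. Because the function returns early when the variable is unset, a production server that never sets it does not touch stdin at all.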
Summary
Every prior e2e run was leaving orphaned tsx src/main.ts server --port <random> processes on the host. I cleared 8 of them while deploying v0.12.0.
Two layers of fix:
1. Spawn the server in its own process group (detached: true) so we can reliably tear the whole subtree down with process.kill(-pgid, sig) on clean teardown. Previously SIGTERM went to the npx wrapper and didn't propagate to the inner node child.
2. Bind the server's lifetime to vitest via stdin-EOF, gated on KNOTES_E2E_WATCH_STDIN=1. The harness wires its own pipe to the server's stdin; the kernel closes that pipe's write end the moment the harness process dies (clean exit, SIGKILL, OOM), and the server reads EOF and exits.
This is the canonical UNIX equivalent of Linux prctl(PR_SET_PDEATHSIG): the binding is enforced by the kernel, not by a polling loop, and the watchdog lives in the leaf process itself, with no intermediary that could be killed and break the chain.
The watchdog is a ~10 line opt-in hook in src/cli/commands/server.ts; the production server never activates it.
Also fixes the "close timed out after 10000ms" warning vitest printed at the end of each run.
Why not a separate supervisor process
An earlier draft of this PR introduced a small supervisor between vitest and the server that polled the anchor PID. It worked, but moved the problem rather than solving it: if the supervisor itself was SIGKILLed, the server orphaned again. The pipe-EOF approach removes that intermediary entirely — there is no Node-side process between vitest's death and the server's exit, just a kernel-managed FD.
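For the harness side of the pipe-EOF binding, a sketch under stated assumptions: spawnBoundToParent is a hypothetical name, and the real harness wiring is not shown in this description. The essential part is that the parent owns the write end of the child's stdin pipe and never closes it while the server should live.

```typescript
import { spawn } from "node:child_process";
import type { ChildProcess } from "node:child_process";

// Hypothetical helper: spawn a server whose stdin is a pipe owned by this process.
export function spawnBoundToParent(command: string, args: string[]): ChildProcess {
  return spawn(command, args, {
    detached: true, // own process group, so clean teardown can signal -pgid
    // "pipe" on fd 0 gives this process the write end of the child's stdin.
    stdio: ["pipe", "inherit", "inherit"],
    env: { ...process.env, KNOTES_E2E_WATCH_STDIN: "1" },
  });
  // The caller deliberately does not call child.stdin.end() during the run.
  // If this process dies for any reason, the kernel closes the write end and
  // the child reads EOF on its stdin.
}
```

Calling child.stdin.end() from the harness has the same effect on the child as the harness dying, which makes it a convenient way to exercise the watchdog from a test.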
Test plan
Server run with KNOTES_E2E_WATCH_STDIN=1 and < /dev/null exits immediately on EOF