Graceful Ctrl-C shutdown for mise dev / dev-all #4855
Treat signal-driven shutdown (SIGINT/SIGTERM) as a normal exit across the dev orchestrators and per-service mise tasks so Ctrl-C no longer prints exit 143 / "task failed" / ELIFECYCLE noise. Also detach vite's stdin from the parent TTY in vite-with-traefik.js so vite's readline-based shortcut handler can't emit the "read EIO" stack trace as the terminal tears down. Real crashes still propagate: only INT/TERM are translated to exit 0, and existing EXIT-trap cleanup is preserved.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
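The stdin detachment can be checked in isolation. The following is a minimal sketch, assuming only Node's built-in `child_process` module: it spawns a throwaway Node child (standing in for vite, whose real arguments are not shown in this PR text) and reports whether that child sees a TTY on stdin. The helper name `childSeesTtyOnStdin` is hypothetical.

```javascript
const { spawnSync } = require('node:child_process');

// Hypothetical helper: run a short-lived Node child with the given stdin
// setting and return whether the child reports a TTY on its stdin.
function childSeesTtyOnStdin(stdinMode) {
  const result = spawnSync(
    process.execPath,
    ['-p', 'Boolean(process.stdin.isTTY)'],
    // stdout is piped only so this sketch can read the child's answer; the
    // wrapper described above inherits stdout and stderr instead.
    { stdio: [stdinMode, 'pipe', 'inherit'], encoding: 'utf8' },
  );
  return result.stdout.trim() === 'true';
}

// With stdin ignored, the child is attached to the null device rather than
// the parent's terminal, so a TTY-gated readline shortcut handler stays off.
console.log(childSeesTtyOnStdin('ignore'));
```

Passing `'inherit'` instead would hand the child whatever stdin the parent has, which is the situation the change avoids.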
💡 Codex Review
Reviewed commit: 91a24422bc
Preview deployments
Host Test Results: 1 files ±0, 1 suites ±0, 1h 44m 58s ⏱️ -1m 12s. Results for commit e324125. ± Comparison against earlier commit d9f833e.
Realm Server Test Results: 1 files ±0, 1 suites ±0, 8m 13s ⏱️ -33s. Results for commit e324125. ± Comparison against earlier commit d9f833e.
With `shell: true`, the spawned `child` is the intermediate `sh -c` process, and `child.kill(signal)` only signals the shell, leaving the vite grandchild orphaned and still bound to port 4200 if a parent process manager signals just this wrapper instead of sweeping the whole process group. Spawning npx directly makes `child` the npx process, which forwards signals to its vite child.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
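A minimal sketch of the direct-spawn pattern this commit describes, assuming Node's built-in `child_process.spawn`. The helper names `spawnDirect` and `forwardSignals` and the `source` parameter are hypothetical and are not taken from `vite-with-traefik.js`.

```javascript
const { spawn } = require('node:child_process');

// Spawn the command directly (no `shell: true`), so the returned handle is
// the command itself rather than an intermediate `sh -c`.
function spawnDirect(command, args) {
  return spawn(command, args, { stdio: ['ignore', 'inherit', 'inherit'] });
}

// Relay each listed signal from `source` (normally `process`) to the child.
// `source` is a parameter only so the relay can be exercised without
// delivering real signals.
function forwardSignals(child, signals = ['SIGINT', 'SIGTERM'], source = process) {
  for (const signal of signals) {
    source.on(signal, () => child.kill(signal));
  }
}

// Usage (not executed here; the vite arguments are placeholders):
//   const child = spawnDirect('npx', ['vite']);
//   forwardSignals(child);
```

Because no shell sits in between, `child.kill(signal)` reaches the process that was actually requested.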
Pull request overview
This PR improves the developer experience when stopping mise run dev / mise run dev-all (and related service tasks) by making Ctrl-C / signal-driven shutdowns clean and non-erroring, while still allowing genuine failures to surface.
Changes:
- Add `SIGINT`/`SIGTERM` traps to service-level mise tasks so signal-driven exits don’t show up as task failures (143/130) in mise/run-p.
- Update `mise-tasks/dev` and `mise-tasks/dev-all` to run cleanup on `INT`/`TERM`/`HUP` and then exit 0, avoiding propagation of signal exit codes from the underlying `wait`.
- Adjust the host’s Vite launcher to ignore stdin (avoids Vite readline `read EIO` noise) and forward shutdown signals to the Vite child, translating signal exits to 0 to prevent pnpm recursive-run failures during shutdown.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| packages/host/scripts/vite-with-traefik.js | Detaches stdin for Vite and forwards shutdown signals; maps signal exits to 0 for clean pnpm start shutdown. |
| mise-tasks/services/worker-test | Trap INT/TERM to exit 0 to avoid 143/130 shutdown noise. |
| mise-tasks/services/worker-base | Trap INT/TERM to exit 0 to avoid 143/130 shutdown noise. |
| mise-tasks/services/worker | Trap INT/TERM to exit 0 for pipefail pipeline shutdown behavior under the orchestrators. |
| mise-tasks/services/test-realms | Split EXIT cleanup from INT/TERM handling to ensure cleanup runs once while exiting 0 on shutdown signals. |
| mise-tasks/services/realm-server-base | Trap INT/TERM to exit 0 to avoid 143/130 shutdown noise. |
| mise-tasks/services/realm-server | Split EXIT cleanup from INT/TERM handling to ensure cleanup runs once while exiting 0 on shutdown signals. |
| mise-tasks/services/prerender-mgr | Trap INT/TERM to exit 0 to avoid 143/130 shutdown noise. |
| mise-tasks/services/prerender | Trap INT/TERM to exit 0 to avoid 143/130 shutdown noise. |
| mise-tasks/dev-all | Split EXIT cleanup from INT/TERM/HUP handling; cleanup + exit 0 on signal-driven shutdown. |
| mise-tasks/dev | Split EXIT cleanup from INT/TERM/HUP handling; cleanup + exit 0 on signal-driven shutdown. |
`child.on('exit')` previously fell through to `process.exit(code || 0)`
for any signal that wasn't SIGINT/SIGTERM. Since `code` is null when a
process exits via signal, that masked SIGKILL/SIGSEGV/SIGABRT crashes
as clean shutdowns. Translate those into 128+signum so the orchestrator
sees the crash instead of treating it as success.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
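The mapping described in this commit can be sketched as a pure function. This is an illustration, not the code in `vite-with-traefik.js`: the name `exitCodeFor`, the `SHUTDOWN_SIGNALS` set, and the fallback to 1 for an unrecognized signal name are assumptions. Signal numbers come from Node's `os.constants.signals`.

```javascript
const os = require('node:os');

// Signals that mean "the developer asked us to stop" (per this commit).
const SHUTDOWN_SIGNALS = new Set(['SIGINT', 'SIGTERM']);

// Map the (code, signal) pair from a child's 'exit' event to the wrapper's
// own exit code.
function exitCodeFor(code, signal) {
  if (signal === null || signal === undefined) {
    return code ?? 0; // normal exit: pass the child's code through
  }
  if (SHUTDOWN_SIGNALS.has(signal)) {
    return 0; // requested shutdown counts as success
  }
  const signum = os.constants.signals[signal];
  // Assumption: an unrecognized signal name falls back to a generic failure.
  return signum === undefined ? 1 : 128 + signum;
}

// Usage (not executed here):
//   child.on('exit', (code, signal) => process.exit(exitCodeFor(code, signal)));
```

The old `code || 0` expression collapsed every signal exit to 0 because `code` is `null` in that case; checking `signal` first keeps crashes such as SIGKILL distinguishable.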
Several reinforcing fixes so `mise run dev` / `dev-all` Ctrl-C completes
quickly with clean exit codes and without leaking file watchers:
- realm-server (`packages/realm-server/main.ts`): wire SIGTERM/SIGINT to
the same shutdown path as IPC `stop`, and iterate the mounted realms
calling `realm.unsubscribe()` so each NodeAdapter's sane watcher (and
the underlying FSWatcher handles) actually closes. Without this the
process pinned hundreds of FSWatchers until the orchestrator SIGKILL'd
it.
- mise-tasks/dev{,-all}: replace the synchronous cleanup-then-exit
shutdown handler with a fire-and-forget `fast_shutdown_kick` that
SIGTERMs every recorded pgroup and returns. The cleanup guardian
spawned at script start polls the bash PID and finishes the KILL
escalation + `sweep_orphaned_services` after we exit, so mise sees
WIFEXITED(0) instead of WIFSIGNAL'ing us mid-cleanup. Also normalize
`wait $SAT_PID`'s 128+signal return to 0 — under `set -m` Ctrl-C is
delivered to SAT's pgroup, not this bash, so the INT/TERM/HUP trap
never fires and the script would otherwise fall through `exit $?` with
a signal-induced code.
- mise-tasks/lib/dev-common.sh sweep regexes:
  - `VITE_SERVE_RE`: drop the absolute-path anchor. pnpm invokes the
    wrapper as `node scripts/vite-serve.js` (relative argv), so the old
    `${REPO_ROOT_RE}/...` pattern never matched and the sweep silently
    skipped the wrapper.
  - `VITE_BIN_RE`: drop the trailing `--port 4200`. In local-HTTPS dev
    mode the wrapper puts vite on a dynamic internal port and the
    dispatcher owns 4200, so the port-pinned pattern missed every vite
    process in that mode.
- packages/host/scripts/vite-serve.js: inline `ensure-boxel-ui` via
`execFileSync` so the `start` package script can be a single `node …`
command instead of `pnpm ensure-boxel-ui && node …`. With `&&`, pnpm
ran the script through `sh -c`, and the shell — having no SIGTERM
handler — died via signal on Ctrl-C even though Node exited 0,
surfacing as `[ERR_PNPM_RECURSIVE_RUN_FIRST_FAIL]` and `Command failed
with signal "SIGTERM"`. Removing the `&&` keeps Node as pnpm's direct
child.
- packages/host/scripts/vite-with-traefik.js: exit the wrapper with code
0 immediately on SIGTERM/SIGINT/SIGHUP rather than waiting for the
child to acknowledge. The dev orchestrator gives the process group
~2s of grace before SIGKILL'ing stragglers, so waiting longer than
that for the child gets us SIGKILL'd mid-wait and pnpm reports
`Command failed with signal "SIGTERM"`. The orchestrator's
`sweep_orphaned_services` is the safety net for the abandoned vite
grandchild.
- wtfnode handle dumps: add an opt-in `BOXEL_WTFNODE=1` helper in both
packages (`packages/realm-server/lib/wtfnode-on-signal.ts`,
`packages/host/scripts/wtfnode-on-signal.js`) and wire it into every
node entry point (realm-server main, worker-manager, worker children,
prerender-server, prerender manager-server, vite wrapper). Dumps the
active handles on SIGINT/SIGTERM and again 5s later, so future
shutdown-hang investigations have evidence without ad-hoc edits.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
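The realm-server item above routes signals and the IPC `stop` message through one shutdown path that releases each realm's watcher. A minimal sketch of that shape follows. The names `createShutdown`, `wireShutdown`, and `closeServer`, the string form of the `stop` message, and the ordering of the steps are all assumptions; only `realm.unsubscribe()` is named in the commit message.

```javascript
// Build one idempotent shutdown function. `realms` is any iterable of objects
// exposing `unsubscribe()`; `closeServer` is a hypothetical callback standing
// in for whatever else the server has to close.
function createShutdown({ realms, closeServer }) {
  let pending = null;
  return function shutdown() {
    if (!pending) {
      pending = (async () => {
        for (const realm of realms) {
          // Release the realm's file watcher so it stops pinning the process.
          await realm.unsubscribe();
        }
        await closeServer();
      })();
    }
    return pending; // repeated triggers share the same in-flight shutdown
  };
}

// Route SIGTERM, SIGINT, and the IPC message to the same function.
function wireShutdown(shutdown, proc = process) {
  const stopThenExit = () =>
    shutdown().then(() => proc.exit(0), () => proc.exit(1));
  proc.on('SIGTERM', stopThenExit);
  proc.on('SIGINT', stopThenExit);
  proc.on('message', (message) => {
    if (message === 'stop') stopThenExit(); // assumed message shape
  });
}
```

Memoizing the promise means a Ctrl-C arriving while an IPC-driven stop is already running does not unsubscribe the realms twice.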
The previous `pnpm ensure-boxel-ui && node scripts/vite-serve.js` ran through `sh -c`, and the shell layer, having no SIGTERM handler, died via signal on Ctrl-C even though Node exited 0. pnpm reported `[ERR_PNPM_RECURSIVE_RUN_FIRST_FAIL]` and `Command failed with signal "SIGTERM"`. vite-serve.js now invokes ensure-boxel-ui inline via execFileSync, so the start script collapses to a single `node …` and Node is pnpm's direct child.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
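A minimal sketch of running a prerequisite step inline with `execFileSync`, assuming Node's built-in `child_process` module. The helper name `runPrerequisite` is hypothetical, and how the real script invokes `ensure-boxel-ui` is not shown in this PR text, so the command in the usage comment is a guess.

```javascript
const { execFileSync } = require('node:child_process');

// Run a prerequisite command synchronously before the main work starts.
// execFileSync does not go through a shell, and it throws if the command
// exits non-zero, so a failed prerequisite still aborts startup the way
// `a && b` did.
function runPrerequisite(command, args) {
  execFileSync(command, args, { stdio: 'inherit' });
}

// Usage (not executed here; the exact command is an assumption):
//   runPrerequisite('pnpm', ['ensure-boxel-ui']);
//   // ...then start vite from this same Node process
```

With the sequencing moved into Node, the package script no longer needs `&&`.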
Summary
Ctrl-C through the `mise run dev` / `mise run dev-all` stack now lands as a clean WIFEXITED(0) across mise, pnpm, run-p, and every service, with no leaked file watchers, no orphaned vite/worker processes, and no spurious error lines from intermediate wrapper layers.
Orchestrators (mise tasks)
- `mise-tasks/dev` and `mise-tasks/dev-all`: signal trap does a `fast_shutdown_kick`, which SIGTERMs every recorded pgroup and returns immediately. The slow TERM→KILL escalation + `sweep_orphaned_services` runs in the background under the cleanup guardian after this bash exits, so mise records `WIFEXITED(0)` instead of `WIFSIGNALED` when its per-task grace runs out mid-cleanup.
- `wait "$SAT_PID"`'s 128+signal return is normalized to 0. With `set -m`, Ctrl-C is delivered to SAT's pgroup, not this bash, so the INT/TERM/HUP trap doesn't fire; the EXIT trap still kicks the guardian and the script exits 0.
- `mise-tasks/services/*` scripts trap INT/TERM and `exit 0` so the 143-on-Ctrl-C from the `ts-node … | dev-log-tee.sh` pipeline isn't reported by mise / run-p as a task failure. The two services with existing icon-server EXIT cleanup (`realm-server`, `test-realms`) preserve that path; INT/TERM additionally call cleanup once and exit 0.
- `mise-tasks/lib/dev-common.sh` sweep regexes:
  - `VITE_SERVE_RE` drops the `${REPO_ROOT_RE}/…` anchor. pnpm invokes the wrapper as `node scripts/vite-serve.js` (relative argv), so the absolute pattern never matched and the sweep silently skipped it.
  - `VITE_BIN_RE` drops the trailing `--port 4200`. In local-HTTPS dev mode the wrapper puts vite on a dynamic internal port, so pinning to 4200 missed every vite process.
Realm-server
- `packages/realm-server/main.ts` wires SIGTERM/SIGINT to the same `stopRealmServer` path used by the IPC `stop` message. Previously the process had no signal handler, so process-group sweeps from `mise dev` had to escalate to SIGKILL.
- `stopRealmServer` iterates the mounted realms and calls `realm.unsubscribe()` on each, closing the underlying sane → fs.watch FSWatcher handles. Without this, each realm pinned a watcher and the process couldn't exit naturally; wtfnode dumps showed hundreds of FSWatcher handles after the shutdown signal.
Host vite wrapper
- `packages/host/scripts/vite-with-traefik.js`:
  - `stdio: ['ignore', 'inherit', 'inherit']` so `process.stdin.isTTY` is false inside vite, suppressing the `bindCLIShortcuts` readline that produced the `read EIO` stack trace when the parent TTY tore down.
  - Drops `shell: true` so `child` is the npx process, not an intermediate `sh -c` that would absorb our forwarded signal alone.
  - Exits 0 on SIGTERM/SIGINT/SIGHUP, avoiding `[ERR_PNPM_RECURSIVE_RUN_FIRST_FAIL]` from pnpm.
  - Other signal exits translate to `128 + signum` instead of silently exiting 0, so SIGKILL/SIGSEGV/SIGABRT on vite aren't masked as a clean shutdown.
- `packages/host/scripts/vite-serve.js` runs `ensure-boxel-ui` inline via `execFileSync`, and `packages/host/package.json` `start` collapses to a single `node scripts/vite-serve.js`. The previous `pnpm ensure-boxel-ui && node scripts/vite-serve.js` chain forced pnpm to invoke the script through `sh -c`, and the shell, having no SIGTERM handler, died via signal on Ctrl-C even though Node exited 0, surfacing as `[ERR_PNPM_RECURSIVE_RUN_FIRST_FAIL]` and `Command failed with signal "SIGTERM"`. Removing the `&&` keeps Node as pnpm's direct child.
Observability for future hang investigations
New `BOXEL_WTFNODE=1` opt-in helper (`packages/realm-server/lib/wtfnode-on-signal.ts` and `packages/host/scripts/wtfnode-on-signal.js`) dumps the active handles on SIGINT/SIGTERM and again 5 seconds later. Wired into every node entry point (realm-server main, worker-manager, worker children, prerender-server, prerender manager-server, vite wrapper) so future shutdown-hang investigations have evidence without ad-hoc edits. Disabled by default; the runtime cost is one `process.on` listener.
Real failures still surface
Only INT/TERM/HUP are translated to exit 0. Any non-signal child failure (ts-node faulting on its own, an indexer crash, vite OOM, etc.) still flows through `pipefail` and surfaces as a real error.
Test plan
- `mise run dev`, wait for the stack to come up, hit a card preview URL to kick off indexing, then Ctrl-C. Expect none of: `exited with 143` / `ERROR task failed` / `read EIO` / `[ERR_PNPM_RECURSIVE_RUN_FIRST_FAIL]` / `Command failed with signal "SIGTERM"` / `no exit status`.
- `mise run dev-all`: same expectations.
- Confirm nothing is left listening on the dev ports (`lsof -i:4200,4201,4202,4210,4211,4221,4222`).
- `BOXEL_WTFNODE=1 mise run dev`, Ctrl-C and confirm the 5s-later dump shows no FSWatcher / unexpected timer handles for realm-server, worker-manager, etc.
- Induce a real failure (e.g. `kill -KILL` a ts-node child mid-run) and confirm the orchestrator still surfaces it as a real failure rather than swallowing it.
🤖 Generated with Claude Code
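For the `BOXEL_WTFNODE=1` step in the test plan, the opt-in dump can be sketched as below. This is a guess at the helper's shape, not the contents of `wtfnode-on-signal.js`: the name `installHandleDump` and its parameters are hypothetical, the real helper may register its listener differently, and `dump` is injected so the sketch does not depend on the wtfnode package (with wtfnode installed it would be `require('wtfnode').dump`).

```javascript
// Install opt-in handle dumps on shutdown signals. Returns whether the
// handlers were installed.
function installHandleDump({ env = process.env, proc = process, dump, delayMs = 5000 }) {
  if (env.BOXEL_WTFNODE !== '1') return false; // disabled by default

  const onSignal = () => {
    dump(); // what is holding the process open right now
    // Dump again later to show what is still open after shutdown had time to
    // run. unref() keeps this timer from holding the process open itself.
    setTimeout(dump, delayMs).unref();
  };
  proc.on('SIGINT', onSignal);
  proc.on('SIGTERM', onSignal);
  return true;
}

// Usage (not executed here; requires the wtfnode package):
//   installHandleDump({ dump: require('wtfnode').dump });
```

Registering a SIGINT or SIGTERM listener removes Node's default exit-on-signal behavior, so a helper like this only makes sense in entry points that also install their own shutdown handler.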