fix(desktop): reap orphaned agent processes on shutdown and restart #787
Agent workers (goose, sprout-agent, etc.) are spawned in their own process groups by sprout-acp for crash isolation, making them unreachable by the desktop's group-kill during shutdown. When the desktop process is killed by SIGINT/SIGTERM/SIGHUP before the Tauri RunEvent handler fires, no cleanup runs at all. Three fixes:

1. Register SIGINT/SIGTERM/SIGHUP handlers that call shutdown_managed_agents() before exit, using the existing shutdown_done guard to prevent double-execution with RunEvent.
2. Add a system-wide process sweep on launch and shutdown that enumerates all user processes via libproc (macOS) or /proc (Linux), identifies known agent binaries, and kills orphans not tracked by the current session.
3. Attempt group-kill on dead PID-file entries before removing them, catching cases where the group leader exited but members survived.
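Fix 1 relies on a shared flag so that whichever path fires first (a signal handler or the Tauri RunEvent handler) runs the cleanup and the other path skips it. A minimal std-only sketch of that guard pattern follows; `shutdown_once` is a hypothetical helper name, and the actual handler registration (via the ctrlc crate) is not shown.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Runs `cleanup` at most once across all callers sharing `done`
/// (e.g. a signal handler and an app-exit event handler).
/// Returns true only for the caller that actually ran it.
fn shutdown_once<F: FnOnce()>(done: &AtomicBool, cleanup: F) -> bool {
    // `swap` returns the previous value atomically, so exactly one caller
    // observes `false` no matter how the two paths race.
    if done.swap(true, Ordering::SeqCst) {
        return false;
    }
    cleanup();
    true
}
```

Both paths would call something like `shutdown_once(&shutdown_done, || shutdown_managed_agents())`. One design caveat: the losing caller returns immediately instead of waiting, so if that caller then exits the process, the winner's cleanup can be cut short; a real implementation may want the losing path to wait for completion.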
The BSDInfo struct had pbi_uid at the wrong byte offset (24 instead of 20), so it silently read pbi_gid instead; corrected to offset 20 with a static size assert. Collapse the signal handlers onto the ctrlc "termination" feature (covers SIGINT/SIGTERM/SIGHUP in one call), eliminating the raw libc::signal + polling thread. Log ctrlc::set_handler errors instead of discarding them. Merge the orphan and dead-group kill batches into a single sigterm_then_sigkill call to halve worst-case shutdown latency.
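The offset bug above is the kind of error a `repr(C)` mirror plus compile-time layout asserts catches. The sketch below assumes the field list of `struct proc_bsdinfo` from XNU's `sys/proc_info.h` (with MAXCOMLEN = 16); `ProcBsdInfo` is an illustrative name and the PR's actual struct definition is not shown here.

```rust
/// Illustrative Rust mirror of XNU's `struct proc_bsdinfo`, the buffer
/// filled by proc_pidinfo(pid, PROC_PIDTBSDINFO, ...).
#[repr(C)]
#[allow(dead_code)]
struct ProcBsdInfo {
    pbi_flags: u32,
    pbi_status: u32,
    pbi_xstatus: u32,
    pbi_pid: u32,
    pbi_ppid: u32,
    pbi_uid: u32, // uid_t: 5 u32 fields precede it, so offset 20
    pbi_gid: u32, // gid_t: offset 24 (what the buggy layout read as "uid")
    pbi_ruid: u32,
    pbi_rgid: u32,
    pbi_svuid: u32,
    pbi_svgid: u32,
    rfu_1: u32,
    pbi_comm: [u8; 16], // MAXCOMLEN
    pbi_name: [u8; 32], // 2 * MAXCOMLEN
    pbi_nfiles: u32,
    pbi_pgid: u32,
    pbi_pjobc: u32,
    e_tdev: u32,
    e_tpgid: u32,
    pbi_nice: i32,
    pbi_start_tvsec: u64,
    pbi_start_tvusec: u64,
}

// Compile-time layout checks: adding, dropping, or reordering a field above
// fails the build instead of silently shifting every later field.
const _: () = assert!(std::mem::size_of::<ProcBsdInfo>() == 136);
const _: () = assert!(std::mem::offset_of!(ProcBsdInfo, pbi_uid) == 20);
const _: () = assert!(std::mem::offset_of!(ProcBsdInfo, pbi_gid) == 24);
const _: () = assert!(std::mem::offset_of!(ProcBsdInfo, pbi_pgid) == 100);
```

On the reading side, a caller would pass a zeroed `ProcBsdInfo` and its size to `proc_pidinfo` and treat any return value other than the full struct size as a failed lookup.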
Old behavior: tons of agent processes left open even when I don't have any Sprout windows open, some of them orphaned from multiple days ago.

New behavior: all agent processes are reaped when Sprout exits. Running this build even reaped orphaned agent processes I had from when Sprout was launched from a different worktree.

@codex please review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d81bedbda0
…esses

The system-wide orphan sweep matched agent binary names + UID but had no Sprout-specific marker, so independently launched goose or codex-acp sessions would be killed on Sprout shutdown. Set SPROUT_MANAGED_AGENT=1 on sprout-acp at spawn time; it propagates automatically through the full tree (sprout-acp → goose → MCP servers). Verify the marker via KERN_PROCARGS2 (macOS) or /proc/<pid>/environ (Linux) before killing. Also fix sigterm_then_sigkill to check group liveness (kill(-pid, 0)) instead of leader liveness (kill(pid, 0)), so SIGKILL escalation reaches surviving children when the group leader is already dead.
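The group-liveness fix can be sketched std-only by passing `kill` in as a closure with the kill(2) contract (in production it could wrap `libc::kill`). This is an illustration under assumptions, not the PR's code: the signature, the `pgids` batch parameter, and the 50 ms poll interval are hypothetical. SIGTERM is 15 and SIGKILL is 9 on both Linux and macOS.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

const SIGKILL: i32 = 9; // same value on Linux and macOS
const SIGTERM: i32 = 15; // same value on Linux and macOS

/// SIGTERM every process group in `pgids`, wait up to `grace` for all of
/// them to empty, then SIGKILL the groups that still have live members.
/// `kill` follows kill(2): returns 0 on success, -1 on failure; a negative
/// pid targets a whole process group and signal 0 only probes for existence.
fn sigterm_then_sigkill<K>(pgids: &[i32], grace: Duration, mut kill: K)
where
    K: FnMut(i32, i32) -> i32,
{
    // kill(0, ..) targets our own group and kill(-1, ..) targets every
    // process we may signal, so never let pgid values <= 1 through.
    let groups: Vec<i32> = pgids.iter().copied().filter(|&g| g > 1).collect();
    for &g in &groups {
        kill(-g, SIGTERM);
    }
    let deadline = Instant::now() + grace;
    loop {
        // Probe the group, not the leader: kill(-g, 0) succeeds while any
        // member is alive, even after the leader (pid == g) has exited.
        let alive: Vec<i32> = groups.iter().copied().filter(|&g| kill(-g, 0) == 0).collect();
        if alive.is_empty() {
            return;
        }
        if Instant::now() >= deadline {
            for g in alive {
                kill(-g, SIGKILL);
            }
            return;
        }
        sleep(Duration::from_millis(50));
    }
}
```

Because every group in the batch shares one deadline, the worst-case wait is a single grace period regardless of how many groups are passed, which is why merging kill batches into one call bounds shutdown latency.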
…787)

Signed-off-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
Fix three gaps that leave agent worker processes (`goose`, `sprout-agent`, `claude-agent-acp`, etc.) running after Sprout closes, plus two safety improvements found during review. Affects all agent types and all shutdown paths.

Agent workers are spawned by `sprout-acp` with `process_group(0)`, putting each worker in its own process group for crash isolation. This means the desktop's `kill(-sprout_acp_pgid, sig)` never reaches them: cleanup relies on `sprout-acp` receiving `SIGTERM` and completing its own graceful shutdown. When the desktop exits before the Tauri `RunEvent::Exit` handler fires, or when `sprout-acp` is `SIGKILL`'d by the 2s escalation before finishing its 30s graceful drain, the workers are orphaned permanently.

- Register `SIGINT`, `SIGTERM`, and `SIGHUP` handlers via the `ctrlc` crate (`"termination"` feature) in `lib.rs` that call `shutdown_managed_agents()` before exiting; `shutdown_done: Arc<AtomicBool>` prevents double-execution with `RunEvent`.
- Add `sweep_system_agent_processes()` in `runtime.rs`, which enumerates all user processes via `proc_listallpids`/`proc_pidinfo` (macOS) or `/proc` (Linux) and kills any process matching `KNOWN_AGENT_BINARIES` that is not tracked by the current session; it is called on both launch and shutdown.
- Set a `SPROUT_MANAGED_AGENT=1` env var on `sprout-acp` at spawn time (it propagates through the full tree: sprout-acp -> goose -> MCP servers). The system sweep verifies the marker via `KERN_PROCARGS2` (macOS) or `/proc/<pid>/environ` (Linux) before killing, so independently launched agent processes are never touched.
- Fix `sigterm_then_sigkill` to check process-group liveness (`kill(-pid, 0)`) instead of leader liveness (`kill(pid, 0)`), so SIGKILL escalation reaches surviving children when the group leader is already dead.
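The Linux side of the sweep filter described in these bullets (known binary name, same UID, `SPROUT_MANAGED_AGENT=1` marker, not tracked by this session) can be sketched with std only. Assumptions: `find_orphan_agents`, `real_uid`, and `environ_has_marker` are hypothetical names; the real `KNOWN_AGENT_BINARIES` list is not shown, so it is passed in as `known`; the macOS libproc/`KERN_PROCARGS2` path is omitted; and the sketch only returns PIDs rather than killing them.

```rust
use std::collections::HashSet;
use std::fs;

/// Exact env entry set on sprout-acp at spawn time (inherited by descendants).
const MARKER: &[u8] = b"SPROUT_MANAGED_AGENT=1";

/// True if a NUL-separated environ blob contains the marker as a whole entry.
fn environ_has_marker(environ: &[u8]) -> bool {
    environ.split(|&b| b == 0).any(|entry| entry == MARKER)
}

/// Real UID of `pid`: first field of the "Uid:" line in /proc/<pid>/status.
fn real_uid(pid: u32) -> Option<u32> {
    let status = fs::read_to_string(format!("/proc/{pid}/status")).ok()?;
    let line = status.lines().find(|l| l.starts_with("Uid:"))?;
    line.split_whitespace().nth(1)?.parse().ok()
}

/// Linux-only: PIDs owned by `uid` whose comm is in `known`, that carry the
/// marker, and that the current session is not already tracking.
fn find_orphan_agents(known: &[&str], uid: u32, tracked: &HashSet<u32>) -> Vec<u32> {
    let mut orphans = Vec::new();
    let Ok(entries) = fs::read_dir("/proc") else {
        return orphans;
    };
    for entry in entries.flatten() {
        // Only all-digit directory names under /proc are processes.
        let name = entry.file_name();
        let Some(pid) = name.to_str().and_then(|s| s.parse::<u32>().ok()) else {
            continue;
        };
        if pid == std::process::id() || tracked.contains(&pid) {
            continue;
        }
        // Processes can exit mid-scan, so any read failure skips the PID.
        let Ok(comm) = fs::read_to_string(format!("/proc/{pid}/comm")) else {
            continue;
        };
        // The kernel caps comm at 15 bytes, so a 16-byte name such as
        // claude-agent-acp would read back as claude-agent-ac.
        let comm = comm.trim_end();
        if !known.iter().any(|k| *k == comm) || real_uid(pid) != Some(uid) {
            continue;
        }
        // environ needs ptrace-level access; if unreadable, leave it alone.
        let Ok(env) = fs::read(format!("/proc/{pid}/environ")) else {
            continue;
        };
        if environ_has_marker(&env) {
            orphans.push(pid);
        }
    }
    orphans
}
```

`/proc/<pid>/environ` shows the environment a process was exec'd with, which is why setting the marker at spawn time (and letting children inherit it) is enough for this check to see it.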