fix(desktop): reap orphaned agent processes on shutdown and restart #787

Merged: wpfleger96 merged 3 commits into main from worktree-wpfleger+fix-orphaned-agent-processes on May 29, 2026

@wpfleger96
Collaborator

@wpfleger96 wpfleger96 commented May 29, 2026

Fix three gaps that leave agent worker processes (goose, sprout-agent, claude-agent-acp, etc.) running after Sprout closes, plus two safety improvements found during review. Affects all agent types and all shutdown paths.

Agent workers are spawned by sprout-acp with process_group(0), putting each worker in its own process group for crash isolation. This means the desktop's kill(-sprout_acp_pgid, sig) never reaches them — cleanup relies on sprout-acp receiving SIGTERM and completing its own graceful shutdown. When the desktop exits before the Tauri RunEvent::Exit handler fires, or sprout-acp is SIGKILL'd by the 2s escalation before finishing its 30s graceful drain, workers are orphaned permanently.
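
A minimal sketch of the isolation described above, assuming a Unix target and using only the standard library plus a direct declaration of POSIX `getpgid`. The helper names `spawn_in_own_group` and `process_group_of` are hypothetical and are not taken from sprout-acp. It shows why a group-kill aimed at the parent's process group cannot reach such a child: the child's group ID equals its own PID.

```rust
use std::io;
use std::os::unix::process::CommandExt; // provides `process_group`
use std::process::{Child, Command, Stdio};

unsafe extern "C" {
    // POSIX getpgid(2): returns the process group ID of `pid`, or -1 on error.
    fn getpgid(pid: i32) -> i32;
}

/// Spawn `program` as the leader of a brand-new process group.
/// (Hypothetical helper; sprout-acp's real spawn code is not shown in this PR.)
pub fn spawn_in_own_group(program: &str, args: &[&str]) -> io::Result<Child> {
    Command::new(program)
        .args(args)
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .process_group(0) // 0 means "use the child's own PID as its group ID"
        .spawn()
}

/// Process group of `pid`, or `None` if the lookup fails.
pub fn process_group_of(pid: u32) -> Option<i32> {
    // SAFETY: getpgid only reads kernel state for the given PID.
    let pgid = unsafe { getpgid(pid as i32) };
    (pgid >= 0).then_some(pgid)
}
```

Because the child leads its own group, a `kill(-parent_pgid, sig)` issued by the desktop never includes it; only a signal addressed to the child's PID or group reaches it.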

  • Register SIGINT, SIGTERM, and SIGHUP handlers via ctrlc crate ("termination" feature) in lib.rs that call shutdown_managed_agents() before exiting; shutdown_done: Arc<AtomicBool> prevents double-execution with RunEvent
  • Add sweep_system_agent_processes() in runtime.rs that enumerates all user processes via proc_listallpids/proc_pidinfo (macOS) or /proc (Linux) and kills any process matching KNOWN_AGENT_BINARIES that is not tracked by the current session; called on both launch and shutdown
  • Set SPROUT_MANAGED_AGENT=1 env var on sprout-acp at spawn time (propagates through full tree: sprout-acp -> goose -> MCP servers); system sweep verifies the marker via KERN_PROCARGS2 (macOS) or /proc/<pid>/environ (Linux) before killing, so independently-launched agent processes are never touched
  • Fix sigterm_then_sigkill to check process group liveness (kill(-pid, 0)) instead of leader liveness (kill(pid, 0)) so SIGKILL escalation reaches surviving children when the group leader is already dead
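
The double-execution guard from the first bullet can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `run_shutdown_once` is a hypothetical name, and the real change wires the flag into a `ctrlc` handler and the Tauri `RunEvent` path, both of which are left out here to keep the example dependency-free.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Run `cleanup` for the first caller only; later callers get `false`.
/// (Hypothetical helper standing in for the shutdown_done check.)
pub fn run_shutdown_once<F: FnOnce()>(shutdown_done: &AtomicBool, cleanup: F) -> bool {
    // `swap` sets the flag and returns the previous value in one atomic step,
    // so exactly one caller sees `false` even if the signal handler and the
    // RunEvent handler race each other.
    if shutdown_done.swap(true, Ordering::SeqCst) {
        return false;
    }
    cleanup();
    true
}
```

In the real application the flag would live in an `Arc<AtomicBool>` cloned into each handler. One design caveat of this pattern: the losing caller returns immediately and does not wait for the winner's cleanup to finish, so a caller that exits the process right afterwards may need extra synchronization.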

Agent workers (goose, sprout-agent, etc.) are spawned in their own
process groups by sprout-acp for crash isolation, making them
unreachable by the desktop's group-kill during shutdown. When the
desktop process is killed by SIGINT/SIGTERM/SIGHUP before the Tauri
RunEvent handler fires, no cleanup runs at all.

Three fixes:

1. Register SIGINT/SIGTERM/SIGHUP handlers that call
   shutdown_managed_agents() before exit, using the existing
   shutdown_done guard to prevent double-execution with RunEvent.

2. Add a system-wide process sweep on launch and shutdown that
   enumerates all user processes via libproc (macOS) or /proc (Linux),
   identifies known agent binaries, and kills orphans not tracked by
   the current session.

3. Attempt group-kill on dead PID-file entries before removing them,
   catching cases where the group leader exited but members survived.
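
For the Linux side of the sweep in fix 2, a rough sketch might look like the following. Everything here is an assumption for illustration: the contents of `KNOWN_AGENT_BINARIES` are a guess based on the names mentioned in the description, the helper names are hypothetical, and the UID filter and the actual kill step are omitted. The sketch reads argv[0] from `/proc/<pid>/cmdline` rather than `/proc/<pid>/comm`, because `comm` is truncated to 15 characters and would cut off a name like `claude-agent-acp`.

```rust
use std::collections::HashSet;
use std::fs;
use std::path::Path;

// Assumption: illustrative subset; the real list is not shown in the PR text.
const KNOWN_AGENT_BINARIES: &[&str] = &["goose", "sprout-agent", "claude-agent-acp"];

/// Extract the executable's file name from raw `/proc/<pid>/cmdline` bytes
/// (arguments are NUL-separated and argv[0] comes first).
pub fn binary_name(cmdline: &[u8]) -> Option<String> {
    let argv0 = cmdline.split(|b| *b == 0).next()?;
    let argv0 = std::str::from_utf8(argv0).ok()?;
    Path::new(argv0).file_name()?.to_str().map(str::to_owned)
}

/// A process is an orphan candidate if it runs a known agent binary and the
/// current session is not tracking its PID.
pub fn is_orphan_candidate(name: &str, pid: u32, tracked: &HashSet<u32>) -> bool {
    KNOWN_AGENT_BINARIES.contains(&name) && !tracked.contains(&pid)
}

/// Scan numeric `/proc` entries and collect candidate PIDs.
/// Returns an empty list where `/proc` is unavailable (for example on macOS).
pub fn find_orphan_candidates(tracked: &HashSet<u32>) -> Vec<u32> {
    let Ok(entries) = fs::read_dir("/proc") else {
        return Vec::new();
    };
    entries
        .flatten()
        .filter_map(|e| e.file_name().to_str()?.parse::<u32>().ok())
        .filter(|pid| {
            fs::read(format!("/proc/{pid}/cmdline"))
                .ok()
                .and_then(|raw| binary_name(&raw))
                .is_some_and(|name| is_orphan_candidate(&name, *pid, tracked))
        })
        .collect()
}
```

Matching on the binary name alone is what made the first version of the sweep too broad; the follow-up commit adds an environment marker check before anything is killed.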

BSDInfo struct had pbi_uid at the wrong byte offset (24 vs 20), silently
reading pbi_gid instead. Correct it to offset 20 and add a static size assert.

Collapse signal handlers to use the ctrlc "termination" feature (covers
SIGINT/SIGTERM/SIGHUP in one call), eliminating raw libc::signal +
polling thread. Log ctrlc::set_handler errors instead of discarding.

Merge orphan + dead-group kill batches into a single sigterm_then_sigkill
call to halve worst-case shutdown latency.
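
The offset bug above is the kind of mistake that compile-time layout checks catch. The sketch below is an assumption-laden illustration, not the PR's struct: the field list mirrors `struct proc_bsdinfo` from Apple's `sys/proc_info.h` as best recalled, `BsdInfo` is a stand-in name, and it relies on `std::mem::offset_of!` (Rust 1.77 or newer).

```rust
use std::mem::{offset_of, size_of};

// Assumption: MAXCOMLEN is 16 on macOS.
const MAXCOMLEN: usize = 16;

/// Stand-in mirror of macOS `struct proc_bsdinfo` (layout assumed from
/// sys/proc_info.h; verify against the SDK headers before relying on it).
#[repr(C)]
pub struct BsdInfo {
    pub pbi_flags: u32,
    pub pbi_status: u32,
    pub pbi_xstatus: u32,
    pub pbi_pid: u32,
    pub pbi_ppid: u32,
    pub pbi_uid: u32, // byte offset 20
    pub pbi_gid: u32, // byte offset 24: what the buggy offset was reading
    pub pbi_ruid: u32,
    pub pbi_rgid: u32,
    pub pbi_svuid: u32,
    pub pbi_svgid: u32,
    pub rfu_1: u32,
    pub pbi_comm: [u8; MAXCOMLEN],
    pub pbi_name: [u8; 2 * MAXCOMLEN],
    pub pbi_nfiles: u32,
    pub pbi_pgid: u32,
    pub pbi_pjobc: u32,
    pub e_tdev: u32,
    pub e_tpgid: u32,
    pub pbi_nice: i32,
    pub pbi_start_tvsec: u64,
    pub pbi_start_tvusec: u64,
}

// Compile-time checks: the build fails if the layout drifts.
const _: () = assert!(offset_of!(BsdInfo, pbi_uid) == 20);
const _: () = assert!(offset_of!(BsdInfo, pbi_gid) == 24);
const _: () = assert!(size_of::<BsdInfo>() == 136);
```

Reading fields through a `#[repr(C)]` struct instead of hand-computed byte offsets removes the class of error entirely, and the asserts document the expected layout next to the definition.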
@wpfleger96
Collaborator Author

Old behavior: many agent processes are left running even when no Sprout windows are open, and some have been orphaned for several days

 ~/Development/sprout/.claude/worktrees/wpfleger+sprout-agent-hints worktree-wpfleger+sprout-agent-hints = [wt] ps aux | grep -i goose
  wpfleger         72916   0.0  0.0 435587824  18480 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72915   0.0  0.0 435587664  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72914   0.0  0.0 435587344  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72909   0.0  0.0 435587088  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72908   0.0  0.0 435586912  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72907   0.0  0.0 435586864  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72902   0.0  0.0 435587008  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72901   0.0  0.0 435587520  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72900   0.0  0.0 435587776  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72895   0.0  0.0 435587744  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72894   0.0  0.0 435586992  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72893   0.0  0.0 435587008  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72887   0.0  0.0 435587664  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72886   0.0  0.0 435587776  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72885   0.0  0.0 435586896  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72883   0.0  0.0 435586928  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72882   0.0  0.0 435587696  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72881   0.0  0.0 435587456  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72880   0.0  0.0 435587104  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72878   0.0  0.0 435587760  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72879   0.0  0.0 435587376  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72876   0.0  0.0 435587696  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72877   0.0  0.0 435586896  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72875   0.0  0.0 435587088  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72873   0.0  0.0 435587024  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72872   0.0  0.0 435587136  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72871   0.0  0.0 435587232  18496 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72870   0.0  0.0 435587200  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72868   0.0  0.0 435587024  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72869   0.0  0.0 435587568  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72867   0.0  0.0 435587248  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72866   0.0  0.0 435587392  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72865   0.0  0.0 435587488  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72864   0.0  0.0 435587552  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72863   0.0  0.0 435587552  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72862   0.0  0.0 435587376  18464 s003  S     8:42PM   0:00.03 /usr/local/bin/goose acp
  wpfleger         72861   0.0  0.0 435587840  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72859   0.0  0.0 435587728  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72860   0.0  0.0 435586832  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72858   0.0  0.0 435587024  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72857   0.0  0.0 435587216  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72856   0.0  0.0 435587120  18464 s003  S     8:42PM   0:00.03 /usr/local/bin/goose acp
  wpfleger         72852   0.0  0.0 435587312  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72850   0.0  0.0 435587040  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72851   0.0  0.0 435587632  18464 s003  S     8:42PM   0:00.03 /usr/local/bin/goose acp
  wpfleger         72846   0.0  0.0 435587056  18464 s003  S     8:42PM   0:00.03 /usr/local/bin/goose acp
  wpfleger         72848   0.0  0.0 435587008  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72847   0.0  0.0 435587824  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72845   0.0  0.0 435587552  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72844   0.0  0.0 435587056  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72843   0.0  0.0 435587328  18464 s003  S     8:42PM   0:00.03 /usr/local/bin/goose acp
  wpfleger         72842   0.0  0.0 435587568  18464 s003  S     8:42PM   0:00.03 /usr/local/bin/goose acp
  wpfleger         72841   0.0  0.0 435587616  18464 s003  S     8:42PM   0:00.03 /usr/local/bin/goose acp
  wpfleger         72840   0.0  0.0 435586848  18464 s003  S     8:42PM   0:00.03 /usr/local/bin/goose acp
  wpfleger         72838   0.0  0.0 435587056  18464 s003  S     8:42PM   0:00.03 /usr/local/bin/goose acp
  wpfleger         72836   0.0  0.0 435586832  18464 s003  S     8:42PM   0:00.03 /usr/local/bin/goose acp
  wpfleger         72837   0.0  0.0 435586928  18464 s003  S     8:42PM   0:00.03 /usr/local/bin/goose acp
  wpfleger         72830   0.0  0.0 435587840  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72828   0.0  0.0 435587600  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72829   0.0  0.0 435587728  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72824   0.0  0.0 435587072  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72823   0.0  0.0 435586848  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72822   0.0  0.0 435586864  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72815   0.0  0.0 435587008  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72814   0.0  0.0 435587472  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72813   0.0  0.0 435587136  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72803   0.0  0.0 435587264  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72801   0.0  0.0 435587664  18464 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72802   0.0  0.0 435586944  18496 s003  S     8:42PM   0:00.02 /usr/local/bin/goose acp
  wpfleger         72795   0.0  0.0 435586896  18464 s003  S     8:42PM   0:00.03 /usr/local/bin/goose acp
  wpfleger         72793   0.0  0.0 435587056  18496 s003  S     8:42PM   0:00.03 /usr/local/bin/goose acp
  wpfleger         72789   0.0  0.1 435782384  46304 s003  S     8:42PM   0:01.03 /usr/local/bin/goose acp
  wpfleger         84484   0.0  0.1 435782832  44128   ??  S    Tue05PM   0:01.60 /usr/local/bin/goose acp
  wpfleger         60477   0.0  0.0 435587376  17264   ??  S    Tue05PM   0:00.07 /usr/local/bin/goose acp
  wpfleger         60476   0.0  0.0 435587344  17264   ??  S    Tue05PM   0:00.07 /usr/local/bin/goose acp

@wpfleger96
Collaborator Author

New behavior: all agent processes are reaped when Sprout exits, even when it is stopped with Ctrl+C in "dev mode"

~/Development/sprout/.claude/worktrees/wpfleger+fix-orphaned-agent-processes worktree-wpfleger+fix-orphaned-agent-processes = [wt] ps aux | grep goose
wpfleger         22094   0.0  0.0 435300224   1472 s003  S+   12:45PM   0:00.00 grep goose

This also reaped orphaned agent processes left over from a Sprout instance launched from a different worktree.

@wpfleger96 wpfleger96 marked this pull request as ready for review May 29, 2026 16:46
@wpfleger96 wpfleger96 requested a review from a team as a code owner May 29, 2026 16:46
@wpfleger96
Collaborator Author

@codex please review

@chatgpt-codex-connector

To use Codex here, create a Codex account and connect to github.

@wesbillman
Collaborator

@codex please review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d81bedbda0

Comment thread desktop/src-tauri/src/lib.rs
Comment thread desktop/src-tauri/src/managed_agents/runtime.rs
…esses

The system-wide orphan sweep matched agent binary names + UID but had
no Sprout-specific marker, so independently-launched goose or codex-acp
sessions would be killed on Sprout shutdown. Set SPROUT_MANAGED_AGENT=1
on sprout-acp at spawn time (it propagates automatically through the full
tree: sprout-acp → goose → MCP servers) and verify the marker via
KERN_PROCARGS2 (macOS) or /proc/<pid>/environ (Linux) before killing.
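
A minimal sketch of the Linux marker check, assuming the environment block is read from `/proc/<pid>/environ` as NUL-separated `KEY=VALUE` entries. The function names are hypothetical; only the `SPROUT_MANAGED_AGENT=1` marker comes from the commit message.

```rust
use std::fs;

const MARKER: &[u8] = b"SPROUT_MANAGED_AGENT=1";

/// True if a NUL-separated environment block contains the exact marker entry.
pub fn has_sprout_marker(environ: &[u8]) -> bool {
    // Compare whole entries so `XSPROUT_MANAGED_AGENT=1` or `...=10` never match.
    environ.split(|b| *b == 0).any(|entry| entry == MARKER)
}

/// Linux-only lookup; unreadable or missing processes count as "not managed",
/// which errs on the side of leaving the process alone.
pub fn is_sprout_managed(pid: u32) -> bool {
    fs::read(format!("/proc/{pid}/environ"))
        .map(|raw| has_sprout_marker(&raw))
        .unwrap_or(false)
}
```

Treating read failures as "not managed" keeps the sweep conservative: a process whose environment cannot be inspected is never killed.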

Also fix sigterm_then_sigkill to check group liveness (kill(-pid, 0))
instead of leader liveness (kill(pid, 0)) so SIGKILL escalation reaches
surviving children when the group leader is already dead.
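
The difference between the two liveness probes can be sketched like this, assuming a Unix target and a direct declaration of POSIX `kill`. The function names are hypothetical and the real `sigterm_then_sigkill` is not reproduced here. Signal 0 delivers nothing and only checks whether the target exists; a negative target addresses a whole process group.

```rust
use std::io;

unsafe extern "C" {
    // POSIX kill(2). With signal 0 it performs the existence check only.
    fn kill(pid: i32, sig: i32) -> i32;
}

// "No such process"; the value is 3 on both Linux and macOS.
const ESRCH: i32 = 3;

fn target_exists(target: i32) -> bool {
    // SAFETY: signal 0 sends nothing; the call only inspects the target.
    let rc = unsafe { kill(target, 0) };
    // Any error other than ESRCH (such as EPERM) still means the target exists.
    rc == 0 || io::Error::last_os_error().raw_os_error() != Some(ESRCH)
}

/// Old check: is the group leader itself still alive?
pub fn leader_alive(pid: i32) -> bool {
    pid > 1 && target_exists(pid)
}

/// New check: is any member of the process group still alive?
pub fn group_alive(pgid: i32) -> bool {
    // Guard against 0 and 1: kill(0, ..) targets the caller's own group and
    // kill(-1, ..) targets every process the caller may signal.
    pgid > 1 && target_exists(-pgid)
}
```

When the leader has exited but its children survive, `leader_alive` reports false while `group_alive` still reports true, which is the case where the SIGKILL escalation must still fire.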
@wpfleger96 wpfleger96 enabled auto-merge (squash) May 29, 2026 18:15
@wpfleger96 wpfleger96 merged commit 7beb0f8 into main May 29, 2026
15 checks passed
@wpfleger96 wpfleger96 deleted the worktree-wpfleger+fix-orphaned-agent-processes branch May 29, 2026 18:18
tlongwell-block pushed a commit that referenced this pull request May 29, 2026
…787)

Signed-off-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>