bug(subagent): worktree not removed when run_agent_loop panics #4702

@bug-ops

Description

In crates/zeph-subagent/src/manager.rs:1271–1279, when a worktree is allocated for a sub-agent,
the cleanup sequence is:

let result = run_agent_loop(agent_loop_args).await;
drop(guard);   // cwd restored, mutex released
wm.remove(&handle, prune_branch_on_remove).await  // never reached if run_agent_loop panics

If run_agent_loop panics and the async task unwinds, guard is still dropped via RAII (cwd restored,
mutex released, which is correct), but wm.remove(...) is never reached. The worktree directory is
left on disk indefinitely.

Recovery happens at the next agent startup via reconcile(), but if the process never restarts
cleanly, stale worktrees accumulate and consume disk space.
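
The asymmetry described above can be reproduced without any of the real types. The following is a minimal sketch, not the code from manager.rs: Guard and simulate are hypothetical stand-ins, a synchronous closure replaces the async run_agent_loop, and std::panic::catch_unwind plays the role of the task boundary that absorbs the panic.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the cwd/mutex guard: records that its Drop ran.
struct Guard(Arc<AtomicBool>);

impl Drop for Guard {
    fn drop(&mut self) {
        self.0.store(true, Ordering::SeqCst);
    }
}

/// Mirrors the shape of the cleanup sequence: work, drop(guard), then remove.
/// Returns (guard_dropped, worktree_removed) once the body has run or panicked.
fn simulate(should_panic: bool) -> (bool, bool) {
    let guard_dropped = Arc::new(AtomicBool::new(false));
    let removed = Arc::new(AtomicBool::new(false));
    let (g, r) = (guard_dropped.clone(), removed.clone());

    // catch_unwind stands in for the task boundary that absorbs the panic.
    let _ = panic::catch_unwind(AssertUnwindSafe(move || {
        let guard = Guard(g);
        if should_panic {
            panic!("simulated panic inside run_agent_loop");
        }
        drop(guard);
        r.store(true, Ordering::SeqCst); // stand-in for wm.remove(...)
    }));

    (guard_dropped.load(Ordering::SeqCst), removed.load(Ordering::SeqCst))
}
```

Calling simulate(false) reports both the guard drop and the removal, while simulate(true) reports the guard drop only: unwinding runs destructors but skips every statement after the panic point.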

Reproduction Steps

  1. Configure worktree.enabled = true.
  2. Spawn a sub-agent with permissions_worktree = true.
  3. Trigger a panic inside run_agent_loop (e.g. via debug assert or injected panic in tests).
  4. Observe the worktree directory remains on disk after the task completes.

Expected Behavior

Worktree is removed on both normal and panic paths.

Actual Behavior

Worktree is not removed when run_agent_loop panics; cleanup only runs on the happy path.

Environment

  • Crate: zeph-subagent
  • File: crates/zeph-subagent/src/manager.rs:1271–1279

Fix

Move wm.remove into an RAII scopeguard or a dedicated cleanup struct so it runs on both normal
return and panic unwind:

let _cleanup = scopeguard::guard((), |_| {
    // Drop cannot .await: spawn a blocking task or call a synchronous remove
});
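
The "dedicated cleanup struct" variant could look roughly like the sketch below. WorktreeCleanup and dir_survives are hypothetical names, and the guard only deletes a plain directory with std::fs; the real wm.remove presumably also updates git's worktree bookkeeping and can prune the branch, which this does not attempt.

```rust
use std::fs;
use std::panic::{self, AssertUnwindSafe};
use std::path::{Path, PathBuf};

/// Hypothetical cleanup guard: removes the directory it owns when dropped.
struct WorktreeCleanup {
    path: PathBuf,
}

impl Drop for WorktreeCleanup {
    fn drop(&mut self) {
        // Drop cannot be async, so this is a blocking, best-effort removal.
        // Errors are ignored here; a real implementation would likely log them.
        let _ = fs::remove_dir_all(&self.path);
    }
}

/// Creates a directory standing in for a worktree, runs a body that may panic,
/// and reports whether the directory still exists afterwards.
fn dir_survives(path: &Path, should_panic: bool) -> bool {
    fs::create_dir_all(path).expect("create stand-in worktree dir");
    let owned = path.to_path_buf();

    let _ = panic::catch_unwind(AssertUnwindSafe(move || {
        let _cleanup = WorktreeCleanup { path: owned };
        if should_panic {
            panic!("simulated panic inside run_agent_loop");
        }
    }));

    path.exists()
}
```

Because the removal lives in Drop, it runs on normal return and on unwind alike. The trade-off is that the blocking filesystem call executes on whichever thread drops the guard, which is why the comment in the snippet above suggests handing the work to a blocking task.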

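Alternatively, wrap the body in std::panic::AssertUnwindSafe + catch_unwind, call wm.remove, then
propagate the panic with std::panic::resume_unwind. Because std::panic::catch_unwind takes a
closure and cannot span an .await, the async body needs a future-aware variant such as
futures::FutureExt::catch_unwind.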
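
The catch, clean up, re-raise shape can be sketched with the standard library alone. run_then_cleanup is a hypothetical helper, and the body here is a synchronous closure standing in for the async run_agent_loop; the cleanup closure stands in for wm.remove.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Runs `body`, always runs `cleanup`, then re-raises any panic from `body`.
fn run_then_cleanup<T>(body: impl FnOnce() -> T, cleanup: impl FnOnce()) -> T {
    // Capture a panic from the body instead of letting it unwind past cleanup.
    let outcome = panic::catch_unwind(AssertUnwindSafe(body));

    cleanup(); // stand-in for wm.remove(...)

    match outcome {
        Ok(value) => value,
        // Continue the original unwind with the original payload.
        Err(payload) => panic::resume_unwind(payload),
    }
}
```

Compared with a Drop-based guard, this keeps the cleanup at an ordinary call site, so in the async version it could be awaited directly rather than run as blocking code. It relies on the build using unwinding panics; with panic = "abort" neither approach gets a chance to run.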

Metadata

Labels

  • P3 (Research: medium-high complexity)
  • bug (Something isn't working)