fix(hub): close transport after IO loop to prevent sub-agent process leak #93

Merged
yishuiliunian merged 3 commits into main from fix/sub-agent-process-leak
Apr 9, 2026
@yishuiliunian
Contributor

Summary

  • Sub-agent processes (loopal --serve) hung indefinitely after completion due to a stdin pipe deadlock: the Hub never closed its write end of the child's stdin, so tokio::io::stdin()'s blocking read never received EOF
  • Observed: 37 zombie-like processes (32 KB RSS, 23+ hours old) under a single parent; lsof confirmed that their pipes were still open
  • Root cause: mutual deadlock — Hub waits for child exit (agent_proc.wait()), child waits for stdin EOF (blocking thread, uncancellable per tokio docs)
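
For reviewers who want to see the stdin-EOF mechanism in isolation, here is a minimal std-only sketch. It is not the Hub code: `cat` stands in for a sub-agent that reads stdin until EOF, the function name `run_child_and_close_stdin` is hypothetical, and a Unix-like environment with `cat` on the PATH is assumed. The point is the `drop(stdin)` line: without it, the `read_to_string` call that follows would block forever, which mirrors the deadlock described above.

```rust
use std::io::{Read, Write};
use std::process::{Command, Stdio};

/// Spawns `cat` (a stand-in for a child that reads stdin until EOF),
/// feeds it `input`, closes the write end, and collects the result.
/// Returns the child's stdout and whether it exited successfully.
fn run_child_and_close_stdin(input: &str) -> std::io::Result<(String, bool)> {
    let mut child = Command::new("cat")
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    // Take ownership of the write end so we decide when it closes.
    let mut stdin = child.stdin.take().expect("stdin was piped");
    stdin.write_all(input.as_bytes())?;

    // Closing the write end is what delivers EOF to the child.
    // If this handle stayed open, the child would keep waiting for input.
    drop(stdin);

    // The child exits after EOF, so reading its stdout to the end terminates.
    let mut out = String::new();
    child
        .stdout
        .take()
        .expect("stdout was piped")
        .read_to_string(&mut out)?;

    let status = child.wait()?;
    Ok((out, status.success()))
}
```

Calling `run_child_and_close_stdin("ping\n")` should return the echoed text and a successful exit, because `cat` terminates as soon as its stdin reaches EOF.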

Changes

  • finish.rs: Move conn.close() into finish_and_deliver — closes transport writer after emit/unregister/delivery, giving the child EOF on stdin
  • agent_io.rs: Pass conn to finish_and_deliver; remove standalone conn.close() calls from spawn_io_loop and start_agent_io
  • connection.rs: IPC reader task breaks on tx.send() failure instead of silently continuing when incoming channel is dropped
  • process.rs: Add AgentProcess::wait_or_kill(timeout) utility for bounded process reaping (not yet wired into default path)
  • transport_close_test.rs: 5 tests — transport disconnect, EOF propagation, result-before-close ordering, child crash path, registry cleanup
  • reader_exit_test.rs: 1 test — reader task exits and releases Mutex after channel drop
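
The connection.rs change (break on send failure) can be illustrated with a small std-only sketch. This is an assumption-laden analogue, not the actual reader task: it uses `std::sync::mpsc` rather than whatever channel type the Hub uses, and `pump_lines` is a hypothetical name. It shows the general pattern of ending the read loop once the receiving side is gone instead of continuing to read messages nobody will consume.

```rust
use std::io::BufRead;
use std::sync::mpsc::Sender;

/// Forwards each line from `reader` to `tx`.
/// Returns the number of lines that were actually delivered.
fn pump_lines<R: BufRead>(reader: R, tx: Sender<String>) -> usize {
    let mut delivered = 0;
    for line in reader.lines() {
        // Stop on a read error; there is nothing more to forward.
        let line = match line {
            Ok(l) => l,
            Err(_) => break,
        };
        // `send` fails only when the receiver has been dropped.
        // Breaking here lets the loop (and anything it holds) be released.
        if tx.send(line).is_err() {
            break;
        }
        delivered += 1;
    }
    delivered
}
```

With a live receiver every line is delivered; with a dropped receiver the loop exits on the first failed send and reports zero delivered lines.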

Test plan

  • bazel build //... compiles
  • bazel build //... --config=clippy zero warnings
  • bazel test //... — 50/50 pass
  • CI passes

…ent process leak (#91)

Sub-agent processes (`--serve`) hung indefinitely after completion due to
a stdin pipe deadlock: the Hub never closed its write end, so the child's
`tokio::io::stdin()` blocking read never received EOF.

- Close transport in `finish_and_deliver` after emit/unregister/delivery,
  giving the child EOF on stdin so the process can exit
- Break IPC reader task on `tx.send()` failure instead of silently continuing
- Add `AgentProcess::wait_or_kill(timeout)` utility for bounded reaping
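
The PR does not show the body of `wait_or_kill`, so the following is only a rough sketch of what bounded reaping can look like with the standard library. The free-function signature, the 10 ms polling interval, and the `(ExitStatus, bool)` return shape are all assumptions made for illustration; the real `AgentProcess::wait_or_kill(timeout)` may be async and structured differently.

```rust
use std::process::{Child, ExitStatus};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Waits up to `timeout` for `child` to exit; kills it if the deadline passes.
/// Returns the exit status and `true` if the child had to be killed.
fn wait_or_kill(child: &mut Child, timeout: Duration) -> std::io::Result<(ExitStatus, bool)> {
    let deadline = Instant::now() + timeout;
    loop {
        // Non-blocking check: Some(status) means the child already exited.
        if let Some(status) = child.try_wait()? {
            return Ok((status, false));
        }
        if Instant::now() >= deadline {
            // Deadline passed: terminate the child, then reap it so no
            // zombie entry is left behind.
            child.kill()?;
            let status = child.wait()?;
            return Ok((status, true));
        }
        // Hypothetical polling interval; tune to taste.
        sleep(Duration::from_millis(10));
    }
}
```

A bounded wait like this gives the parent an escape hatch when a child never sees EOF, which is why it complements, rather than replaces, closing the transport.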
@yishuiliunian yishuiliunian merged commit 4c5f562 into main Apr 9, 2026
5 of 6 checks passed
@yishuiliunian yishuiliunian deleted the fix/sub-agent-process-leak branch April 9, 2026 08:44
yishuiliunian added a commit that referenced this pull request Apr 9, 2026
test_load_settings_empty_json_file_uses_defaults asserted a specific
default model string, but LOOPAL_MODEL set by a parallel test
(test_load_settings_all_env_var_scenarios) could override it. Replace
with a non-empty check — the exact default is already verified in the
serialized env-var test.