feat: never leave child processes alive after the parent #1886
paul-nechifor merged 1 commit into dev
Greptile Summary
This PR introduces a three-layer orphan-process prevention system: every child process inherits a
Confidence Score: 4/5
Safe to merge for most deployments, but the PID-reuse infinite loop in … One P1 finding:
Sequence Diagram

```mermaid
sequenceDiagram
    participant CLI as dimos run (main)
    participant Coord as ModuleCoordinator (workers)
    participant WD as watchdog_main sidecar
    participant Reg as RunRegistry
    CLI->>CLI: os.environ[DIMOS_RUN_ID_ENV] = run_id
    CLI->>Coord: ModuleCoordinator.build(blueprint)
    CLI->>Reg: entry.save()
    CLI->>WD: spawn_watchdog(run_id) (no DIMOS_RUN_ID_ENV in env)
    activate WD
    WD->>WD: wait_for_pid_exit(main_pid)
    Note over CLI,Coord: Normal or signal-triggered shutdown
    CLI->>Coord: coordinator.stop()
    CLI->>CLI: kill_run_processes(run_id) [excludes self]
    CLI->>Reg: entry.remove()
    CLI->>CLI: sys.exit(0) / process ends
    WD->>WD: PID gone, sleep 0.5 s grace
    WD->>WD: kill_run_processes(run_id), sweeping any survivors
    deactivate WD
    Note over Reg: Next dimos run calls cleanup_stale() as third-line safety net
```
leshy left a comment
Looks good. Is it hard to give the watchdog a voice in the logs?
I'd like to get a warning when it's killing runaway children. What do you think?
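A rough sketch of what this could look like: the sweep finds processes whose environment carries the run's tag and logs a warning, with the PID and command line, before signalling each one. Everything here is hypothetical. `find_run_pids`, `kill_tagged_processes`, and the `dimos.watchdog` logger name are illustrative rather than the PR's `kill_run_processes`. The variable name `DIMOS_RUN_ID` comes from the PR description, and the lookup assumes Linux `/proc/<pid>/environ`, which reflects the environment a process was exec'd with.

```python
from __future__ import annotations

import logging
import os
import signal

log = logging.getLogger("dimos.watchdog")  # hypothetical logger name

RUN_ID_VAR = b"DIMOS_RUN_ID"  # assumed name of the tagging env var


def find_run_pids(run_id: str) -> list[int]:
    """Return PIDs (other than our own) exec'd with RUN_ID_VAR=run_id (Linux /proc)."""
    needle = RUN_ID_VAR + b"=" + run_id.encode()
    me = os.getpid()
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit() or int(entry) == me:
            continue
        try:
            with open(f"/proc/{entry}/environ", "rb") as f:
                env = f.read().split(b"\0")
        except OSError:
            continue  # exited meanwhile, or owned by another user
        if needle in env:
            pids.append(int(entry))
    return pids


def kill_tagged_processes(run_id: str, sig: int = signal.SIGTERM) -> list[int]:
    """Signal every process tagged with run_id, logging a warning for each one."""
    killed = []
    for pid in find_run_pids(run_id):
        try:
            with open(f"/proc/{pid}/cmdline", "rb") as f:
                cmd = f.read().replace(b"\0", b" ").decode(errors="replace").strip()
        except OSError:
            cmd = "?"
        log.warning("killing orphaned process %d (%s) from run %s", pid, cmd, run_id)
        try:
            os.kill(pid, sig)
            killed.append(pid)
        except ProcessLookupError:
            pass  # it exited on its own between the scan and the kill
    return killed
```

Logging before the kill, not after, means the warning still appears even if the signal fails. It also means the message is emitted only when something actually survived the normal shutdown path, so a clean run stays quiet.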
Problem
Some child processes, especially those started by modules, might not be shut down correctly when the parent exits.
Closes DIM-XXX
Solution
DIMOS_RUN_ID env var for them.
Breaking Changes
None.
How to Test