Problem
After app restart, a resumed session can get stuck with IsProcessing=true and no watchdog running. The session stalls indefinitely until the user manually aborts.
Evidence (2026-04-04)
20:31:51 [RESUME-QUIESCE] 'PolyPilot' — abort + 30s quiescence
20:32:35 [COMPLETE] session completed normally (gen=1)
20:35:11 [SEND] new prompt (gen=2)
20:35:14–20:36:26 — active tool rounds (turn_start/turn_end every ~10s)
20:36:26 [turn_start] — LAST EVENT
... 11 minutes of silence, ZERO watchdog entries ...
20:47:01 [ABORT] user manually aborted
No [WATCHDOG] entries for the PolyPilot session between 20:36 and 20:47. The watchdog was not running.
Expected
The watchdog should fire at 120s (no tools) or 180s (HasUsedToolsThisTurn) after the last event, detecting the stall and clearing IsProcessing.
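The expected behavior can be sketched as a small check. This is a minimal illustration, not PolyPilot's actual watchdog: only the 120s/180s thresholds and the `IsProcessing` / `HasUsedToolsThisTurn` flags come from this report, and `SessionState`, `watchdog_timeout`, and `watchdog_tick` are hypothetical names.

```python
from dataclasses import dataclass

# Thresholds quoted in this report; the constant names are hypothetical.
NO_TOOLS_TIMEOUT_S = 120
TOOLS_TIMEOUT_S = 180


@dataclass
class SessionState:
    """Hypothetical stand-in for the per-session processing state."""
    is_processing: bool
    has_used_tools_this_turn: bool
    last_event_at: float  # seconds on a monotonic clock


def watchdog_timeout(state: SessionState) -> int:
    """Pick the stall threshold based on whether tools ran this turn."""
    return TOOLS_TIMEOUT_S if state.has_used_tools_this_turn else NO_TOOLS_TIMEOUT_S


def watchdog_tick(state: SessionState, now: float) -> bool:
    """Clear is_processing if the session has stalled; return True if it fired."""
    if not state.is_processing:
        return False
    if now - state.last_event_at < watchdog_timeout(state):
        return False
    state.is_processing = False
    return True
```

Applied to the evidence above, with the last event at 20:36:26, a running watchdog would be expected to fire at about 20:38:26 or 20:39:26, depending on whether HasUsedToolsThisTurn was set for that turn.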
Root Cause (suspected)
The watchdog timer may not be started (or may be killed) during the RESUME-QUIESCE → COMPLETE → new SEND cycle after restart. The StartProcessingWatchdog call in SendPromptAsync may be racing with the quiescence cleanup.
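If the race is real, one common way to make this kind of cleanup safe is to tag it with the generation it belongs to, so that late cleanup for gen=1 cannot tear down state owned by gen=2. The sketch below only illustrates that idea under the assumption that cleanup and send share per-session state. The `Session` class and its methods are hypothetical and are not taken from the PolyPilot code.

```python
import threading


class Session:
    """Hypothetical session that guards watchdog state by generation."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.generation = 0
        self.is_processing = False
        # Generation the currently armed watchdog belongs to (None = not armed).
        self.watchdog_generation = None

    def send_prompt(self) -> int:
        """Start a new generation and arm the watchdog for it."""
        with self._lock:
            self.generation += 1
            self.is_processing = True
            self.watchdog_generation = self.generation
            return self.generation

    def cleanup(self, generation: int) -> bool:
        """Tear down state for `generation`; ignore calls from stale generations."""
        with self._lock:
            if generation != self.generation:
                return False  # stale cleanup: leave the newer watchdog armed
            self.is_processing = False
            self.watchdog_generation = None
            return True
```

With this guard, a quiescence cleanup that still refers to gen=1 becomes a no-op once gen=2 has started, so the gen=2 watchdog stays armed.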
Related but distinct from
Repro
- Have PolyPilot running with an active session doing tool calls
- Restart the app (relaunch.sh)
- Session resumes, starts a new turn
- If the CLI stops sending events mid-turn, the watchdog never fires