
bug: Watchdog not running after session resume — 11-minute stall with no detection #502

@PureWeen


Problem

After app restart, a resumed session can get stuck with IsProcessing=true and no watchdog running. The session stalls indefinitely until the user manually aborts.

Evidence (2026-04-04)

20:31:51 [RESUME-QUIESCE] 'PolyPilot' — abort + 30s quiescence
20:32:35 [COMPLETE] session completed normally (gen=1)
20:35:11 [SEND] new prompt (gen=2)
20:35:14–20:36:26 — active tool rounds (turn_start/turn_end every ~10s)
20:36:26 [turn_start] — LAST EVENT
  ... 11 minutes of silence, ZERO watchdog entries ...
20:47:01 [ABORT] user manually aborted

There are no [WATCHDOG] entries for the PolyPilot session between 20:36 and 20:47, which indicates that the watchdog was not running during the stall.
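The same check can be automated when reviewing logs. The sketch below is a hypothetical helper, not part of PolyPilot. It assumes a simplified "HH:MM:SS message" line format with no midnight rollover, and it treats any [WATCHDOG] entry as the watchdog having intervened. It flags every silence longer than the watchdog timeout that occurs while a turn is in progress and is not ended by a [WATCHDOG] entry.

```python
from datetime import datetime, timedelta

# Watchdog timeouts quoted in the "Expected" section of this issue.
NO_TOOLS_TIMEOUT = timedelta(seconds=120)
TOOLS_TIMEOUT = timedelta(seconds=180)

# Markers assumed to end a processing turn (simplification: any
# [WATCHDOG] entry is treated as the watchdog having intervened).
TURN_END_MARKERS = ("[COMPLETE]", "[ABORT]", "[WATCHDOG]")


def find_undetected_stalls(lines, used_tools=True):
    """Hypothetical helper: return (last_message, next_message) pairs where a
    processing session was silent longer than the watchdog timeout and the
    next entry was not a [WATCHDOG] entry."""
    timeout = TOOLS_TIMEOUT if used_tools else NO_TOOLS_TIMEOUT
    stalls = []
    processing = False
    prev_time = prev_msg = None
    for line in lines:
        # Assumed format: "HH:MM:SS message" with no midnight rollover.
        stamp, _, msg = line.partition(" ")
        t = datetime.strptime(stamp, "%H:%M:%S")
        if (processing and prev_time is not None
                and t - prev_time > timeout
                and not msg.startswith("[WATCHDOG]")):
            stalls.append((prev_msg, msg))
        if msg.startswith("[SEND]"):
            processing = True
        elif msg.startswith(TURN_END_MARKERS):
            processing = False
        prev_time, prev_msg = t, msg
    return stalls


log = [
    "20:32:35 [COMPLETE] session completed normally (gen=1)",
    "20:35:11 [SEND] new prompt (gen=2)",
    "20:36:26 [turn_start]",
    "20:47:01 [ABORT] user manually aborted",
]
print(find_undetected_stalls(log))
# [('[turn_start]', '[ABORT] user manually aborted')]
```

Applied to the timeline above, the helper reports exactly one undetected stall, between the last turn_start and the manual abort.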

Expected

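The watchdog should fire 120s after the last event when no tools have been used this turn, or 180s after it when HasUsedToolsThisTurn is true. It should then detect the stall and clear IsProcessing. In this incident tools had been used, so the watchdog should have fired at about 20:39:26, 180s after the last turn_start.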
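The expected decision rule can be sketched as a single watchdog check. This is a Python model, not PolyPilot's code. SessionState and watchdog_tick are hypothetical names standing in for IsProcessing, HasUsedToolsThisTurn, and the time of the last CLI event. The sketch assumes the check is driven periodically by a timer for the whole turn, which is exactly what appears not to have happened here.

```python
from dataclasses import dataclass

# Timeouts from the "Expected" section of this issue, in seconds.
NO_TOOLS_TIMEOUT_S = 120.0
TOOLS_TIMEOUT_S = 180.0


@dataclass
class SessionState:
    # Hypothetical stand-ins for IsProcessing, HasUsedToolsThisTurn and
    # the timestamp of the last CLI event.
    is_processing: bool = False
    has_used_tools_this_turn: bool = False
    last_event_at: float = 0.0  # seconds on a monotonic clock


def watchdog_tick(state: SessionState, now: float) -> bool:
    """One hypothetical watchdog check: clear is_processing and return True on a stall."""
    if not state.is_processing:
        return False
    timeout = TOOLS_TIMEOUT_S if state.has_used_tools_this_turn else NO_TOOLS_TIMEOUT_S
    if now - state.last_event_at >= timeout:
        state.is_processing = False
        return True
    return False


# The incident: tools were used and the last event arrived at t=0.
s = SessionState(is_processing=True, has_used_tools_this_turn=True)
print(watchdog_tick(s, 179.0), watchdog_tick(s, 180.0), s.is_processing)
# False True False
```

The rule itself is simple. The bug is that nothing was calling it after the resume, so the session stayed in the processing state indefinitely.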

Root Cause (suspected)

The watchdog timer may not be started (or may be killed) during the RESUME-QUIESCE → COMPLETE → new SEND cycle after restart. The StartProcessingWatchdog call in SendPromptAsync may be racing with the quiescence cleanup.
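The suspected race, and one possible guard against it, can be modeled as follows. This is a hypothetical Python sketch, not PolyPilot's implementation. WatchdogOwner, start_turn, and the stop methods are illustrative names. The generation counter mirrors the gen=N values in the log. The model assumes the quiescence cleanup for the resumed turn can run late, after the new SEND has started a watchdog. An unguarded stop then kills the new turn's watchdog. A stop that first compares generations leaves it running.

```python
import threading


class WatchdogOwner:
    """Hypothetical model of a session that owns at most one watchdog,
    tagged with the generation (gen=N in the log) it was started for."""

    def __init__(self):
        self._lock = threading.Lock()
        self.generation = 0
        self.watchdog_gen = None  # None means no watchdog is running

    def start_turn(self):
        # Hypothetical stand-in for SendPromptAsync calling StartProcessingWatchdog.
        with self._lock:
            self.generation += 1
            self.watchdog_gen = self.generation
            return self.generation

    def stop_watchdog_unguarded(self):
        # Racy: a late cleanup stops whichever watchdog is running now.
        with self._lock:
            self.watchdog_gen = None

    def stop_watchdog(self, expected_gen):
        # Guarded: stop only the watchdog that belongs to expected_gen.
        with self._lock:
            if self.watchdog_gen != expected_gen:
                return False
            self.watchdog_gen = None
            return True

    @property
    def watchdog_running(self):
        return self.watchdog_gen is not None


owner = WatchdogOwner()
gen1 = owner.start_turn()  # resumed turn, later COMPLETE (gen=1)
gen2 = owner.start_turn()  # new SEND (gen=2)
print(owner.stop_watchdog(gen1), owner.watchdog_running)  # late gen=1 cleanup
# False True
owner.stop_watchdog_unguarded()  # the suspected bug: gen=2 watchdog killed
print(owner.watchdog_running)
# False
```

The compare-and-clear runs under a single lock so that a concurrent start_turn cannot slip in between the generation check and the stop.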

Related but distinct from

Repro

  1. Have PolyPilot running with an active session doing tool calls
  2. Restart the app (relaunch.sh)
  3. The session resumes and starts a new turn
  4. If the CLI stops sending events mid-turn, the watchdog never fires

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests