Conversation
…on restart Two bugs fixed: 1. False quiescence timeout on resumed sessions: When the app restarts and resumes a session that was mid-turn, the 30s quiescence timeout could fire incorrectly even though the session was genuinely still active on the server. Fix: During restore, check events.jsonl freshness. If recently modified (< 120s), pre-seed HasReceivedEventsSinceResume to bypass the 30s quiescence and use the appropriate longer timeout (120s or 600s). 2. Response content lost between turn_end and session.idle: When the app restarts after assistant.turn_end but before session.idle, accumulated response text in CurrentResponse was never flushed to history/DB. Fix: Call FlushCurrentResponse on AssistantTurnEndEvent so content is persisted at the end of each sub-turn, not just on idle. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Code review findings: - Add dedup guard in FlushCurrentResponse: if last assistant message in History has identical content, skip the add to prevent duplicates when SDK replays events after session resume. - Add malformed JSON test for GetEventsFileRestoreHints - Add FlushCurrentResponse idempotency test - Add content dedup model-level test Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- SKILL.md: Add Content Persistence Safety section (turn_end flush, dedup guard, quiescence bypass, zero-idle sessions), update PR count to 8, add missing-flush to recurring mistakes list - copilot-instructions.md: Add 30s quiescence tier with bypass to watchdog section, document turn_end flush in IsProcessing invariant and SDK event flow sections Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
SKILL.md changes: - Remove checklist item 10 (internal behavior, not caller requirement) - Remove inaccurate FlushCurrentResponse call sites table - Remove quiescence bypass and zero-idle sections (moved to instructions) - Fix PR count to 7 (not 8, PR #224 is pending) - Keep turn-end flush and dedup guard sections (directly relevant) copilot-instructions.md changes: - Move FlushCurrentResponse paragraph to new Content Persistence section (was incorrectly under IsProcessing Cleanup Invariant) - Integrate SDK Event Flow sub-turn note into item 8 text - Remove GetEventsFileRestoreHints function name (too specific) - Add zero-idle session context to Processing Watchdog section - Fix PR count references Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1. Move FlushCurrentResponse inside Invoke() lambda in AssistantTurnEndEvent handler to avoid racing with UI thread on List<ChatMessage> (Finding #1) 2. Preserve isRecentlyActive across catch block in GetEventsFileRestoreHints — malformed JSON in a recently-written events.jsonl no longer loses the file-age signal that bypasses the 30s quiescence timeout (Finding #2) 3. Add Debug log when dedup guard fires in FlushCurrentResponse so legitimate content drops are observable in diagnostics (Finding #3) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PureWeen
added a commit
that referenced
this pull request
Feb 26, 2026
Three bugs fixed: 1. Sessions stuck forever when SDK sends repeated SessionUsageInfoEvent (e.g., FailedDelegation) without terminal events. Root cause: HandleSessionEvent unconditionally updated LastEventAtTicks for ALL events, including metrics-only events that don't indicate progress. Fix: Skip LastEventAtTicks update for SessionUsageInfoEvent and AssistantUsageEvent. 2. Added WatchdogMaxProcessingTimeSeconds (3600s) as absolute safety net. Even if progress events keep arriving, no turn should run for 60 minutes without user notification. Uses ProcessingStartedAt (set in SendPromptAsync) so it cannot be reset by events. 3. False 'session stuck' warnings after app restart. Root cause: GetEventsFileRestoreHints used WatchdogInactivityTimeoutSeconds (120s) as file age threshold, but tool executions can go 5-10 minutes without events.jsonl writes. Fix: Use WatchdogToolExecutionTimeoutSeconds (600s) threshold. Added ~25 regression guard tests covering every known failure mode from PRs #148, #163, #195, #211, #224, plus invariant tests for the 8 processing state safety invariants. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PureWeen
added a commit
that referenced
this pull request
Feb 26, 2026
Three bugs fixed: 1. Sessions stuck forever when SDK sends repeated SessionUsageInfoEvent (e.g., FailedDelegation) without terminal events. Root cause: HandleSessionEvent unconditionally updated LastEventAtTicks for ALL events, including metrics-only events that don't indicate progress. Fix: Skip LastEventAtTicks update for SessionUsageInfoEvent and AssistantUsageEvent. 2. Added WatchdogMaxProcessingTimeSeconds (3600s) as absolute safety net. Even if progress events keep arriving, no turn should run for 60 minutes without user notification. Uses ProcessingStartedAt (set in SendPromptAsync) so it cannot be reset by events. 3. False 'session stuck' warnings after app restart. Root cause: GetEventsFileRestoreHints used WatchdogInactivityTimeoutSeconds (120s) as file age threshold, but tool executions can go 5-10 minutes without events.jsonl writes. Fix: Use WatchdogToolExecutionTimeoutSeconds (600s) threshold. Added ~25 regression guard tests covering every known failure mode from PRs #148, #163, #195, #211, #224, plus invariant tests for the 8 processing state safety invariants. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PureWeen
added a commit
that referenced
this pull request
Feb 26, 2026
## Problem Three bugs in the processing watchdog: 1. **Session stuck at 361 minutes** — MergeNET11 session had `IsProcessing=true` for 361m. The SDK was sending repeated `SessionUsageInfoEvent` (FailedDelegation) which reset `LastEventAtTicks` without ever sending a terminal event. The watchdog timer kept resetting and never fired. 2. **No max processing time** — Even with fix #1, there was no absolute ceiling. A session could run forever as long as progress events arrived. 3. **False 'stuck' warnings** — After app restart, sessions got `⚠️ Session appears stuck — no events received for over 30 seconds` even though they were actively working. `GetEventsFileRestoreHints` used 120s threshold, but tool executions can go 5-10 minutes without events.jsonl writes. ## Fix 1. **Gate `LastEventAtTicks` update** — Skip for `SessionUsageInfoEvent` and `AssistantUsageEvent` (metrics-only events that don't indicate turn progress). 2. **Add `WatchdogMaxProcessingTimeSeconds`** (3600s = 60 min) — Absolute safety net using `ProcessingStartedAt`, which cannot be reset by events. 3. **Fix restore hints threshold** — Changed from `WatchdogInactivityTimeoutSeconds` (120s) to `WatchdogToolExecutionTimeoutSeconds` (600s). ## Testing - ~25 new regression guard tests covering every known failure mode from PRs #148, #163, #195, #211, #224 - Invariant tests for INV-1, INV-4, INV-5, INV-6 - End-to-end scenario tests - All 1447 tests pass --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Two bugs when PolyPilot restarts while sessions are actively processing:
1. False 'Session appears stuck' warning (30s quiescence timeout)
When the app restarts and resumes a session that was mid-turn, the 30s quiescence timeout fires even though the session was genuinely still active on the server. The quiescence timeout is designed for sessions that had already finished before the restart — but if the SDK takes >30s to reconnect and start streaming events, the watchdog fires incorrectly.
2. Response content lost between
turn_endandsession.idleWhen the app restarts after
assistant.turn_endbut beforesession.idle, accumulated response text inCurrentResponseis never flushed to history/DB. This caused the ReviewPRs session to lose a 6123-character PR review — the review was generated (visible in events.jsonl) but never displayed becauseCompleteResponse(which flushes content) is only triggered bySessionIdleEvent.Fix
Quiescence bypass for recently-active sessions
GetEventsFileRestoreHints()to check events.jsonl freshness during restoreHasReceivedEventsSinceResume=trueto bypass the 30s quiescenceHasUsedToolsThisTurn=truefor 600s timeoutFlush response on turn_end
FlushCurrentResponse(state)in theAssistantTurnEndEventhandlerCurrentResponseis empty (tool-only sub-turns), it's a no-opTests
RestoreHints_*tests for file freshness, tool detection, stale files, and integration with timeout selectionTurnEndFlush_*tests for content preservation, empty response handling, and deduplication