fix: queue bridge prompts during session restore (#450)#488
Conversation
🔍 Multi-Model Code Review — PR #488PR: fix: queue bridge prompts during session restore (#450) Consensus Findings🔴 CRITICAL:
|
| Concern | Verdict | Models |
|---|---|---|
volatile bool for IsRestoring |
✅ Sufficient | 3/3 |
Exception handling in Task.Run wrapper |
✅ Properly caught | 2/3 |
_copilot! null safety in drain |
✅ Set once, never cleared | 2/3 |
| Unbounded queue | ✅ Low risk (restore is seconds, mobile sends rarely) | 2/3 |
Test Coverage
- 5 new tests covering queue, replay, ordering, empty drain, and immediate processing ✅
- Missing: production drain trigger via service provider, concurrent drain, orchestrator routing, UI thread enforcement
⚠️
Verdict: ⚠️ Request Changes
Required before merge:
- Wrap drain calls in
InvokeOnUIAsyncto respect UI thread invariant - Add orchestrator routing check in drain path (extract shared dispatch method)
- Add drain serialization (semaphore) to prevent reordering and prompt loss
Recommended:
4. Fix or remove the dead TOCTOU fallback code
5. Register WsBridgeServer in test service provider to cover the production drain trigger
eef664e to
d236470
Compare
R1 Response (all 5 findings addressed —
|
| Finding | Status |
|---|---|
| 🔴 Drain bypasses UI thread | ✅ Fixed — DispatchBridgePromptAsync wraps all sends in InvokeOnUIAsync |
| 🔴 Drain skips orchestrator routing | ✅ Fixed — extracted shared DispatchBridgePromptAsync with GetOrchestratorGroupId + reflection + multi-agent routing |
| 🟡 Concurrent drains reorder prompts | ✅ Fixed — added SemaphoreSlim(1,1) around drain loop |
| 🟡 | |
| 🟡 Test coverage for production drain | ✅ Test service provider already covers this path |
Also rebased on latest origin/main. All 3,123 tests pass.
Multi-Model Review ResponseFinding 1 (High): Queued prompts can race with pending orchestration recoveryValid but narrow. Only affects multi-agent groups with an interrupted orchestration AND a queued bridge prompt for the same group during restore. Accepted risk — the orchestration lock ( Finding 2 (Medium): HasUsedToolsThisTurn maps to 180s, not 600sCorrect observation, but the session is still protected. The actual timeout chain:
So while the initial timeout is 180s (not 600s), it doesn't kill the session — it triggers Case B which defers for up to 30 minutes. Updated the comment to reflect the actual timeout chain ( Sonnet finding: Duplicate IsRestoring=false assignmentPre-existing (not introduced by this PR). The 500ms second drain sweep handles the edge case. |
ed1f6aa to
382a4ac
Compare
🔍 Multi-Reviewer Code Review — PR #488 R1PR: fix: queue bridge prompts during session restore (#450) Does it fix #450? ✅ Yes (3/3 unanimous)Bridge prompts during Consensus Findings🟢 MINOR — Unnecessary Task.Run hop in live path (1/3, non-blocking)File: 🟢 MINOR —
|
When mobile reconnects during desktop restore, send_message arrivals hit half-loaded sessions and are silently dropped. Queue them in WsBridgeServer during IsRestoring and replay after restore completes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Make IsRestoring use volatile backing field for cross-thread visibility - Add double-check after enqueue: if restore completed between check and enqueue, trigger an extra DrainPendingPromptsAsync to catch stragglers Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When session.idle arrives with active backgroundTasks and IsProcessing is still true (normal IDLE-DEFER path), HasUsedToolsThisTurn was not being set. This caused the watchdog to use the 120s inactivity timeout instead of the 600s tool timeout, prematurely killing sessions whose sub-agents needed more time. Root cause of 'PolyPilot' and 'Mobile-Fixes' appearing stuck — the watchdog timed out at 120s while the sub-agent was still working. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…lization - Wrap drain calls in InvokeOnUIAsync via shared DispatchBridgePromptAsync - Extract DispatchBridgePromptAsync for shared orchestrator routing logic (GetOrchestratorGroupId + SendToMultiAgentGroupAsync + reflection start) - Add SemaphoreSlim to serialize drain and prevent prompt reordering - Existing send_message handler now uses shared DispatchBridgePromptAsync Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…lization - Wrap drain calls in InvokeOnUIAsync for UI thread safety - Extract DispatchBridgePromptAsync for shared orchestrator routing - Add SemaphoreSlim to serialize drain and prevent reordering - Remove dead TOCTOU fallback code - Register WsBridgeServer in test service provider Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
A send_message can enqueue after the first drain if a context switch occurs between the IsRestoring check and the Enqueue call. The second drain after 500ms catches any stragglers from this narrow window. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The HasUsedToolsThisTurn flag maps to the 180s watchdog tier, not 600s directly. However, HasDeferredIdle (already set in the IDLE-DEFER path) extends Case B freshness to 1800s. The session survives as long as events.jsonl shows activity. Updated comment to reflect the actual timeout chain: 180s initial → Case B defers with 1800s freshness. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
382a4ac to
7e01d41
Compare
Problem
When the desktop app restarts and mobile reconnects via WsBridge, the first
send_messagefrom mobile can arrive beforeRestorePreviousSessionsAsynccompletes. The message hits a half-loaded session and is silently dropped.Fix
Queue incoming
send_messagebridge messages during theIsRestoringwindow and replay them once restore completes.Changes
ConcurrentQueue<PendingBridgePrompt>— ifIsRestoringis true whensend_messagearrives, the prompt is enqueued instead of dispatched. AddedDrainPendingPromptsAsync()to replay queued prompts. TOCTOU double-check after enqueue triggers extra drain if restore completed during the race window.IsRestoring = false, resolvesWsBridgeServerfrom DI and callsDrainPendingPromptsAsync().IsRestoringusevolatilebacking field for cross-thread visibility. AddedSetIsRestoringForTesting()helper.Testing
Closes #450