Conversation
When `SendPromptAsync` detects a connection error and recreates `_client`, the local `client` variable (captured from `GetClientForGroup` before the reconnection) still points to the old, disposed `CopilotClient`. Subsequent calls to `client.ResumeSessionAsync`/`client.CreateSessionAsync` therefore use the disposed client, throw 'Client not connected. Call StartAsync() first.', and prevent session recovery.

The fix adds `client = _client` after successful client recreation so the resume/create calls use the new, connected client.

This was the root cause of multi-agent orchestrator sessions failing to recover after connection drops: the reconnect logic existed but always operated on the stale, disposed client reference.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
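The stale-reference failure described in this commit message can be reproduced in miniature. The sketch below is not the project's code: `FakeClient`, `FakeService`, `Reconnect`, and the `applyFix` flag are hypothetical stand-ins, and the real methods are asynchronous. It only shows why a local captured before the field is reassigned keeps pointing at the disposed instance, and how re-reading the field fixes that.

```csharp
using System;

// Hypothetical stand-in for the SDK's CopilotClient: it only models "connected vs. disposed".
sealed class FakeClient : IDisposable
{
    private bool _disposed;

    public void Dispose() => _disposed = true;

    public string ResumeSession(string sessionId)
    {
        if (_disposed)
            throw new InvalidOperationException("Client not connected. Call StartAsync() first.");
        return $"resumed {sessionId}";
    }
}

// Hypothetical stand-in for the service that owns the _client field.
sealed class FakeService
{
    private FakeClient _client = new FakeClient();

    // Simulates the reconnect branch; applyFix toggles the one-line refresh.
    public string Reconnect(string sessionId, bool applyFix)
    {
        FakeClient client = _client;   // local captured before the reconnection
        _client.Dispose();             // old client is torn down
        _client = new FakeClient();    // field now refers to a fresh client

        if (applyFix)
            client = _client;          // the fix: refresh the local from the field

        return client.ResumeSession(sessionId);
    }
}

static class Program
{
    static void Main()
    {
        var service = new FakeService();

        // With the refresh, the resume call reaches the new client.
        Console.WriteLine(service.Reconnect("s1", applyFix: true));

        // Without it, the call goes to the disposed client and throws.
        try
        {
            service.Reconnect("s1", applyFix: false);
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
```

Reassigning the field never updates locals that copied the old reference, so any code that runs after the recreation has to read the field again.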
PR #325 Review: "fix: update stale client reference after reconnect in SendPromptAsync"
Models: claude-opus-4.6 ×2, claude-sonnet-4.6, gemini-3-pro-preview, gpt-5.3-codex

Fix Assessment
The one-line fix
Placement is correct: The assignment is after
No race condition:

Consensus Finding (5/5 models)
🟡 MODERATE: New tests do not cover the actual bug path (
All four new tests use
If someone removed
A meaningful regression test would need a stub

Verdict: ✅ Approve (with test coverage note)
The runtime fix is correct and minimal. The test gap is noted but not blocking: the reconnect path is inherently hard to unit-test without deeper mocking infrastructure.

5-model consensus review by PR Review Squad
PR Review: fix: update stale client reference after reconnect in SendPromptAsync
CI:

Summary
The one-line production fix is correct and well-placed. All 5 models independently confirmed:
The only consensus finding is in test coverage.

Findings
🟡 MODERATE
PR Review Squad correctly identified that the Demo-mode tests never reach the reconnect catch block in `SendPromptAsync` (demo short-circuits at line 2225). If the fix were reverted, those tests would still pass.

Replace with source-code structural tests (following the established pattern in `MultiAgentRegressionTests`) that:
- Verify `client = _client` exists after `_client = CreateClient()`
- Verify the refresh occurs BEFORE `client.ResumeSessionAsync`
- Verify the refresh covers the `CreateSessionAsync` fallback path
- Fail immediately if the fix line is removed (verified by experiment)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PR #325 Re-Review — Round 2 (post
| # | Finding | Status |
|---|---|---|
| 1 | 🟡 Demo-mode tests do not cover the actual bug path | ✅ FIXED — replaced with structural regression guards |
Commit 9642dee0 replaced all 4 Demo-mode tests with structural source-code tests:
1. `SendPromptAsync_ReconnectPath_RefreshesLocalClientAfterRecreation`: reads `CopilotService.cs`, finds `_client = CreateClient(connSettings)`, verifies `client = _client` appears within the next 400 chars, and confirms it comes BEFORE `client.ResumeSessionAsync`
2. `SendPromptAsync_ReconnectPath_UsesRefreshedClientForCreateSession`: verifies the `client.CreateSessionAsync` fallback is also after the refresh
3. `IsConnectionError_DetectsOrchestratorDispatchError`: tests the exact error string from the bug report
4. `IsConnectionError_DetectsConnectionLostFollowedByNotConnected`: tests both phases of the failure pattern
This is the right approach since CopilotClient is a concrete SDK class with no mockable interface. If someone removes client = _client, tests 1 and 2 will fail immediately. Tests 3 and 4 add error detection coverage.
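A structural guard of the kind described above can be sketched as a plain string search. This is an illustration rather than the tests from the commit: `StructuralGuard`, `RefreshPrecedesUse`, and the inline sample sources are hypothetical, and the real tests read `CopilotService.cs` from disk instead of using string literals. Only the anchor text, the 400-character window, and the "refresh before use" ordering are taken from the review.

```csharp
using System;

static class StructuralGuard
{
    // Returns true when `refresh` starts within `window` characters after `anchor`
    // and before the first occurrence of `use` that follows the anchor.
    public static bool RefreshPrecedesUse(
        string source, string anchor, string refresh, string use, int window = 400)
    {
        int anchorIdx = source.IndexOf(anchor, StringComparison.Ordinal);
        if (anchorIdx < 0) return false;               // anchor missing: guard fails

        int searchStart = anchorIdx + anchor.Length;

        int refreshIdx = source.IndexOf(refresh, searchStart, StringComparison.Ordinal);
        if (refreshIdx < 0 || refreshIdx - searchStart > window) return false;

        int useIdx = source.IndexOf(use, searchStart, StringComparison.Ordinal);
        return useIdx >= 0 && refreshIdx < useIdx;     // refresh must come before the call
    }
}

static class Program
{
    static void Main()
    {
        // Hypothetical excerpts standing in for the reconnect branch of the source file.
        const string withFix =
            "_client = CreateClient(connSettings);\n" +
            "client = _client;\n" +
            "await client.ResumeSessionAsync(id);";

        const string reverted =
            "_client = CreateClient(connSettings);\n" +
            "await client.ResumeSessionAsync(id);";

        const string anchor = "_client = CreateClient(connSettings)";
        const string refresh = "client = _client";
        const string use = "client.ResumeSessionAsync";

        Console.WriteLine(StructuralGuard.RefreshPrecedesUse(withFix, anchor, refresh, use));  // True
        Console.WriteLine(StructuralGuard.RefreshPrecedesUse(reverted, anchor, refresh, use)); // False
    }
}
```

A guard like this checks the shape of the source rather than runtime behavior, so it catches removal of the fix line but would need updating if the reconnect branch were refactored.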
New Issues
None. The structural guard pattern is sound and follows MultiAgentRegressionTests precedent in this codebase.
Verdict: ✅ Approve
Runtime fix is correct (confirmed Round 1, 5/5 models). Test coverage gap is now addressed with structural regression guards. Clean to merge.
5-model consensus re-review by PR Review Squad
PR #325 Re-Review (Round 2): ✅ Approve
5-model parallel review (claude-opus-4.6 ×2, claude-sonnet-4.6 pending, gemini-3-pro-preview, gpt-5.3-codex)

Previous Finding Status
Finding 1 [🟡 MODERATE, 5/5 Round 1]: Demo-mode tests don't cover the reconnect path → FIXED
The new structural tests in
If someone removes
Anchor uniqueness verified:
Call sites verified (actual branch):

Note on Model B/D/E Findings
Three models (claude-opus-4.6 B, gemini, gpt-5.3-codex) flagged a 🔴 CRITICAL "compile error:

Minor Test Design Note (2/4 models, informational)

Production Fix

Recommended Action: ✅ Approve
Round 1 finding fully addressed. Production fix is correct. Structural test approach is appropriate given
Bug
Multi-agent orchestrator sessions (e.g. "PR Review Squad-orchestrator") fail to recover after connection drops, showing:
Root Cause
In `SendPromptAsync`'s reconnect path, a local `client` variable is captured from `GetClientForGroup()` before the client is recreated. After the code disposes the old `_client` and creates a new one (lines 2408-2413), the local `client` still points to the old disposed `CopilotClient`. The subsequent `client.ResumeSessionAsync()` / `client.CreateSessionAsync()` calls use the stale reference, throwing `InvalidOperationException: Client not connected. Call StartAsync() first.`
Fix
One-line addition: `client = _client;` after successful client recreation, so the resume/create calls use the new connected client.
Tests
Added 5 regression tests in `ConnectionRecoveryTests.cs`:
- `GetClientForGroup_ReturnsCurrentClient_AfterReconnectClientIsNew`
- `IsConnectionError_DetectsOrchestratorDispatchError`
- `IsConnectionError_DetectsConnectionLostFollowedByNotConnected`
- `SendPromptAsync_DemoMode_ConnectionRecovery_SucceedsWithNewClient`

All 2,283 tests pass. Mac Catalyst build succeeds.