Skip to content

fix: skip provider sessions during SDK reconnect#569

Merged
PureWeen merged 2 commits intomainfrom
fix/provider-session-reconnect-recovery
Apr 8, 2026
Merged

fix: skip provider sessions during SDK reconnect#569
PureWeen merged 2 commits intomainfrom
fix/provider-session-reconnect-recovery

Conversation

@Redth
Copy link
Copy Markdown
Collaborator

@Redth Redth commented Apr 8, 2026

Summary

  • skip provider and other virtual sessions when recreating the shared Copilot SDK client after a JSON-RPC disconnect
  • avoid invalid session.resume calls against synthetic provider session IDs like provider:papilot:leader
  • add a regression guard covering the sibling reconnect path

Why

When the Copilot CLI transport dropped, PolyPilot tried to re-resume Papilot provider sessions using their persistence-only IDs. Those are not real CLI session IDs, so reconnect recovery could cascade into more errors and leave provider chats looking broken.

Testing

  • /opt/homebrew/bin/dotnet test PolyPilot.Tests/PolyPilot.Tests.csproj --no-restore --filter "ConnectionRecoveryTests"

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@Redth Redth requested a review from PureWeen April 8, 2026 16:15
@PureWeen
Copy link
Copy Markdown
Owner

PureWeen commented Apr 8, 2026

🔍 Multi-Model Code Review — PR #569

PR: fix: skip provider sessions during SDK reconnect
Branch: fix/provider-session-reconnect-recoverymain
Changes: +30 / -0 across 2 files (1 production, 1 test)
CI Status: ⚠️ No CI checks reported on this branch
Prior Reviews: None


🟢 MINOR — Structural test could also assert guard ordering relative to ForceCompleteProcessingAsync

File: PolyPilot.Tests/ConnectionRecoveryTests.cs:287-290
Flagged by: 1/3 reviewers (production code verified correct by another reviewer)

The test asserts the provider-skip guard appears before ResumeSessionAsync, but not before ForceCompleteProcessingAsync. The production ordering is correct today — the guard is inserted before the IsProcessing check that gates ForceCompleteProcessingAsync. But the test wouldn't catch a future refactor that moves the guard between those two calls.

Suggested improvement:

var forceCompleteIndex = source.IndexOf("ForceCompleteProcessingAsync(kvp.Key", loopIndex, StringComparison.Ordinal);
Assert.True(skipIndex < forceCompleteIndex,
    "Provider-session skip must appear before ForceCompleteProcessingAsync");

Not blocking — this is a test hardening suggestion, not a production bug.


✅ Verified Correct

  • Guard completenessSession == null catches all non-SDK session types (provider leaders/members, demo/remote/lazy-resume placeholders). IsProviderSession(kvp.Key) is defense-in-depth for any future case where a provider session gets a non-null Session.
  • Ordering — guard runs after string.IsNullOrEmpty(SessionId) (provider sessions have non-empty synthetic IDs like "provider:{id}:leader" so they pass that check), and before both the IsProcessing/ForceCompleteProcessingAsync block and ResumeSessionAsync.
  • Provider Session is always null — provider sessions are created with Session = null! (Providers.cs:129, 382) and no code path assigns a real CopilotSession afterward.
  • Debug logging — correctly logs skipped sessions for diagnostics.
  • Test quality — structural regression guard follows established ConnectionRecoveryTests patterns with clear ordering assertions.

🏁 Recommendation: ✅ Approve

Clean, minimal, well-scoped fix. The guard correctly prevents invalid session.resume calls against synthetic provider session IDs during SDK reconnect recovery.

…essingAsync

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PureWeen PureWeen merged commit ee5b271 into main Apr 8, 2026
@PureWeen PureWeen deleted the fix/provider-session-reconnect-recovery branch April 8, 2026 17:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants