workaround: avoid SDK cancellation serialization bug#320
Merged
Conversation
Pass CancellationToken.None to SDK SendAsync calls to prevent StreamJsonRpc's StandardCancellationStrategy from attempting to serialize RequestId, which is not included in the SDK's AOT-safe JSON context. ## Root Cause (SDK Bug) When a CancellationToken fires during an SDK call: 1. StreamJsonRpc's StandardCancellationStrategy.CancelOutboundRequest() tries to send a \$/cancelRequest notification 2. This requires serializing StreamJsonRpc.RequestId 3. The SDK's SystemTextJsonFormatter doesn't include RequestId in its TypeInfoResolverChain (ClientJsonContext, TypesJsonContext, etc.) 4. NotSupportedException is thrown, becomes unobserved, kills the JSON-RPC reader loop silently 5. Session gets stuck — no events arrive, watchdog eventually fires ## Why This Workaround Works PolyPilot already handles cancellation at the TCS level: - AbortSessionAsync calls ResponseCompletion.TrySetCanceled() - Watchdog timeout clears IsProcessing state - All abort/error paths properly clean up state The CancellationToken was only used by SDK to send server-side cancellation notifications — which is the broken path. ## Trade-off The server won't receive \$/cancelRequest notifications, so it may continue processing cancelled requests. However: - Local state is correctly cleaned up (good UX) - Watchdog provides safety net - Server resources are wasted but operations complete gracefully ## Investigation 4 sub-agents analyzed: SDK changelogs (no breaking changes), SDK Client.cs (confirms missing RequestId), PolyPilot usage (correct), SDK samples (don't use CancellationToken). All confirmed SDK bug. Fixes #319 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
fe42b32 to
366fbe3
Compare
- CRITICAL: Fix missed call site in PolyPilot.Console/Models/AgentSession.cs The Console app was also vulnerable to the RequestId serialization crash. - MODERATE: Add comment about watchdog limitation when SendAsync blocks If the transport write itself blocks (half-open TCP), watchdog completes the TCS but execution never reaches it. Edge-case on mobile/codespace. - MINOR: Add explanatory comment to soft-steer path (line 2702) Functionally safe (implicit CancellationToken.None) but prevents future maintainers from accidentally reintroducing the bug. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tion If SendAsync throws (transport error, connection dropped, etc.), execution exits without resetting IsProcessing. Unlike CopilotService, AgentSession has no watchdog — only event handlers reset the flag. If SendAsync throws before sending, no events arrive, and IsProcessing stays true forever → all subsequent calls throw 'already processing'. Fix: Wrap SendAsync and TCS await in try/catch that resets IsProcessing on any exception before re-throwing. Note: The TrySetCanceled() path self-heals because the server continues processing (CancellationToken.None) and eventually fires SessionIdleEvent. The SendAsync exception path does not self-heal without this fix. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Pass \CancellationToken.None\ to SDK \SendAsync\ calls to prevent \StreamJsonRpc's \StandardCancellationStrategy\ from attempting to serialize \RequestId, which is not included in the SDK's AOT-safe JSON context.
Fixes #319
Root Cause (SDK Bug)
When a \CancellationToken\ fires during an SDK call:
Why This Workaround Works
PolyPilot already handles cancellation at the TCS level:
The \CancellationToken\ was only used by SDK to send server-side cancellation notifications — which is the broken path.
Trade-off
The server won't receive $/cancelRequest\ notifications, so it may continue processing cancelled requests. However:
Investigation Summary
4 parallel sub-agents analyzed:
All agents agree: This is an SDK bug, not a PolyPilot defect.
Changes
Testing
SDK Versions Affected
Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com