Propagate user-initiated cancellation across the WithoutCancel boundary#2916
Closed
trungutt wants to merge 4 commits into
Closed
Propagate user-initiated cancellation across the WithoutCancel boundary#2916trungutt wants to merge 4 commits into
trungutt wants to merge 4 commits into
Conversation
Adds a new sub-behavior to the unmanaged OAuth flow that lets the
runtime own PKCE / state / DCR / token exchange while the connected
client acts as a thin courier (open browser, receive deeplink,
forward {code, state}).
Activated by a new server-level flag --mcp-oauth-redirect-uri (also
exposed on the runtime as WithUnmanagedOAuthRedirectURI and on the
RuntimeConfig as MCPOAuthRedirectURI). When set, the runtime:
- generates state + PKCE in-process
- runs DCR if needed (or uses per-toolset explicit credentials)
- builds the full authorize URL with the configured redirect_uri
- emits an elicitation whose Meta carries cagent/authorize_url +
cagent/state alongside the existing auth_server_metadata
- accepts {code, state} as a ResumeElicitation payload, verifies
state in constant time, and exchanges the code at the token
endpoint using the same redirect_uri (RFC 6749 §4.1.3 binding)
- stores the resulting token in the keychain with client_id /
client_secret stamped on for later silent refresh
When the flag is empty, the existing client-driven contract is
preserved verbatim: the elicitation carries only metadata and the
client is expected to reply with {access_token, refresh_token, …}.
The {access_token, …} reply shape is also accepted on the new path
so a client that prefers to do the exchange itself is still free to.
A per-toolset RemoteOAuthConfig.CallbackRedirectURL overrides the
runtime-wide flag for that toolset.
Docker-agent never learns anything about the URL it advertises: it
is opaque, and what happens at the URL (HTML bouncer, OS deeplink,
universal-link claim, …) is the host's concern.
Plumbing:
- cmd/root/flags.go: add --mcp-oauth-redirect-uri
- pkg/config/runtime.go: MCPOAuthRedirectURI field
- pkg/runtime/runtime.go: unmanagedOAuthRedirectURI field +
WithUnmanagedOAuthRedirectURI Opt
- pkg/runtime/loop.go: pass through to ConfigureHandlers
- pkg/server/session_manager.go: forward from runtime config
- pkg/tools/capabilities.go: extend OAuthCapable interface +
ConfigureHandlers
- pkg/tools/mcp/mcp.go, remote.go, session_client.go,
builtin/mcpcatalog: implement the new SetUnmanagedOAuthRedirectURI
- pkg/tools/mcp/oauth.go: rework handleUnmanagedOAuthFlow,
factor out resolveClientCredentials helper and
consumeUnmanagedElicitationReply
- docs/features/remote-mcp/index.md: new 'Unmanaged OAuth flow'
section documenting the meta keys and behavior
Tests: six new oauth_test.go tests covering the drive-flow happy
path (incl. asserting RFC 6749 §4.1.3 redirect_uri parity at /token),
state-mismatch rejection, legacy access-token reply on the new path,
legacy mode shape (no authorize_url emitted), legacy mode rejection
of {code, state}, and the per-toolset > runtime-wide precedence.
Signed-off-by: Trung Nguyen <trung.nguyen@docker.com>
Builds on the unmanaged-OAuth drive-flow added in this branch by
giving embedders a second, session-less way to deliver the deeplink
payload to the runtime.
Adds a new HTTP route:
POST /api/mcp-oauth/callback?state=<opaque>
[&code=<opaque>]
[&error=<rfc6749 code>]
[&error_description=<text>]
The parameters mirror what an authorization server sends to a
redirect URI per RFC 6749 §4.1.2. The body is empty; everything
travels in the query string.
A process-wide pending-OAuth registry maps state -> waiting
unmanaged flow. handleUnmanagedOAuthFlow registers its waiter on
entry (drive-flow only) and unregisters on exit. The new route
looks up the waiter by state and hands it the payload through a
single-shot buffered channel. The waiting goroutine then performs
the token exchange and persists the token exactly as before.
The lookup-by-state IS the authentication: only states the runtime
has just generated are present in the registry. Unknown or replayed
states return 404. State values continue to come from GenerateState
(>=128 bits, opaque, base64).
Inside the flow, the elicitation result and the direct callback
race. Whichever arrives first wins; the other is cancelled cleanly
with no further side effects on the connected client.
Strictly additive to the existing session-keyed
POST /api/sessions/:id/elicitation path: both reply shapes
({access_token, ...} legacy and {code, state} drive-flow) still
work, and an embedder may use the elicitation path, the new
callback path, or both. The --mcp-oauth-redirect-uri flag, the
cagent/authorize_url and cagent/state elicitation meta keys, and
the runtime's redirect-URI parity check at /token are unchanged.
Why it matters for embedders:
- No session context required. The embedder forwards an opaque
{state, code} (or {state, error}) and is done. It never needs
to know which session is waiting, or even that the runtime has
sessions at all.
- No state tracking required. All state -> waiter accounting
lives where it belongs: inside the runtime.
- No UI-lifecycle dependency. As long as the runtime is alive
when the deeplink fires, the token is exchanged and stored. An
embedder UI that died between authorize and callback is no
longer fatal.
- Cross-process deeplink handlers can be thin couriers: a
system-wide URL-scheme handler or an OS launcher does not need
a connection to the runtime's elicitation stream at all.
Plumbing:
- pkg/tools/mcp/pending_oauth.go: new file, process-wide
state -> chan registry, exported DeliverPendingOAuthCallback
+ ErrPendingOAuthNoWaiter for the server handler
- pkg/tools/mcp/oauth.go: handleUnmanagedOAuthFlow registers a
waiter, runs the elicitation in a goroutine scoped to a
cancellable child context, and selects between elicitation
result, direct callback, and the outer ctx
- pkg/server/server.go: new mcpOAuthCallback handler at
POST /api/mcp-oauth/callback
Signed-off-by: Trung Nguyen <trung.nguyen@docker.com>
clientConnector.Connect detaches the caller's ctx with
context.WithoutCancel before sending the MCP initialize handshake.
The rationale is correct: an MCP toolset's session must outlive any
single request -- otherwise the underlying SSE/stdio connection
tears down when the request that triggered toolset.Start returns,
and a successful OAuth flow gets killed the moment its triggering
request completes.
But that detachment also severs user-initiated cancellation, which
the OAuth flow inside Connect DOES need to observe. Concretely:
handleUnmanagedOAuthFlow blocks on a select waiting for an
elicitation reply or an out-of-band callback. If the embedder
aborts the in-progress agent run while the user is in the OAuth
dialog, the streamCtx the SessionManager derived from the request
ctx gets cancelled -- but every ctx in the chain BELOW Connect's
WithoutCancel boundary stays alive. The OAuth select then blocks
forever, the per-session streaming lock is never released, and the
next user message returns 409 Conflict from RunSession.ErrSessionBusy.
This commit preserves the connection-longevity invariant and
restores user-initiated cancellation by stashing the caller's
original ctx as a value on the detached ctx. The detached ctx
remains the primary driver of the connection's lifetime, and
operations that want to observe user-initiated cancellation can
opt in by reading the parent back out and selecting on its Done
channel alongside the local ctx.
Carrying ctx-as-value is unusual but appropriate here: the parent
is metadata about how the detached ctx was constructed, not data
the request is operating on. The helpers (withCancellableParent /
cancellableParentFromContext) live next to the conector that
creates the relationship so the rationale stays close to the code.
handleUnmanagedOAuthFlow now extends its select with a userCancelCh
case keyed on the parent's Done channel. When the parent is nil
(e.g. unit tests, CLI invocations that don't go through
clientConnector.Connect), userCancelCh stays nil and the select
silently never picks it -- preserving back-compat.
Plumbing:
- pkg/tools/mcp/cancellable_parent.go: new helper file with
withCancellableParent + cancellableParentFromContext,
extensively commented because ctx-as-value is unusual.
- pkg/tools/mcp/mcp.go: clientConnector.Connect captures the
caller's ctx before WithoutCancel and attaches it via
withCancellableParent.
- pkg/tools/mcp/oauth.go: handleUnmanagedOAuthFlow extracts the
parent and adds a fourth select case watching its Done channel;
returns parent.Err() so the agent loop sees a cancellable
error and the SessionManager goroutine's deferred Unlock fires.
Tests:
- cancellable_parent_test.go: round-trip, absent-returns-nil,
nil-parent-no-op, and the key invariant that the local ctx
stays independent of the parent's cancellation (only opt-in
callers see it).
- oauth_test.go: TestUnmanagedOAuthFlow_DriveFlow_AbortsOnParent-
CtxCancellation asserts the new branch fires under the exact
boundary clientConnector.Connect sets up, returns the parent's
ctx error, and skips the token exchange.
Signed-off-by: Trung Nguyen <trung.nguyen@docker.com>
CI flags pre-existing lint issues in code introduced by c97124d and a93ae4d (the drive-flow and callback-route commits this PR is stacked on). Fix them here so this PR's CI is green; these fixes are also present on extend-unmanaged-oauth-flow (docker#2896). When that PR merges, this commit will be dropped on rebase. - bodyclose: close response bodies in RoundTrip test goroutines - gci/gofmt: reflow var block in pending_oauth.go - testifylint error-is-as: use assert.ErrorIs instead of assert.True(errors.Is(...)) - unconvert: drop redundant tools.ElicitationAction(...) conversion - unparam: drop unused []byte return from httpDoStatus helper Signed-off-by: Trung Nguyen <trung.nguyen@docker.com>
|
❌ PR Review Failed — The review agent encountered an error and could not complete the review. View logs. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Depends on
This PR is stacked on top of #2896 (
extend-unmanaged-oauth-flow). The two predecessor commits (c97124d0f,a93ae4d3c) shown in the diff belong to #2896; review only the last commit in this PR. Will be rebased onto a clean main after #2896 (or its replacement) merges.Problem
clientConnector.Connectdetaches the caller'sctxwithcontext.WithoutCancelbefore sending the MCPinitializehandshake. The rationale is correct: an MCP toolset's session must outlive any single request — otherwise the underlying SSE/stdio connection tears down when the request that triggeredtoolset.Startreturns, and a successful OAuth flow gets killed the moment its triggering request completes.But that detachment also severs user-initiated cancellation, which the OAuth flow inside
ConnectDOES need to observe. Concretely:handleUnmanagedOAuthFlowblocks on a select waiting for an elicitation reply or an out-of-band callback. If the embedder aborts the in-progress agent run while the user is in the OAuth dialog, thestreamCtxtheSessionManagerderived from the requestctxgets cancelled — but every ctx in the chain BELOWConnect'sWithoutCancelboundary stays alive. The OAuth select then blocks forever, the per-session streaming lock is never released, and the next user message returns 409 Conflict fromRunSession.ErrSessionBusy.Fix
Preserve the connection-longevity invariant and restore user-initiated cancellation by stashing the caller's original ctx as a value on the detached ctx. The detached ctx remains the primary driver of the connection's lifetime; operations that want to observe user-initiated cancellation can opt in by reading the parent back out and selecting on its
Donechannel alongside the local ctx.Carrying ctx-as-value is unusual but appropriate here: the parent is metadata about how the detached ctx was constructed, not data the request is operating on. The helpers (
withCancellableParent/cancellableParentFromContext) live next to the connector that creates the relationship so the rationale stays close to the code.handleUnmanagedOAuthFlownow extends its select with auserCancelChcase keyed on the parent'sDonechannel. When the parent is nil (e.g. unit tests, CLI invocations that don't go throughclientConnector.Connect),userCancelChstays nil and the select silently never picks it — preserving back-compat.Plumbing
pkg/tools/mcp/cancellable_parent.go— new helper file withwithCancellableParent+cancellableParentFromContext, extensively commented because ctx-as-value is unusual.pkg/tools/mcp/mcp.go—clientConnector.Connectcaptures the caller's ctx beforeWithoutCanceland attaches it viawithCancellableParent.pkg/tools/mcp/oauth.go—handleUnmanagedOAuthFlowextracts the parent and adds a fourth select case watching itsDonechannel; returnsparent.Err()so the agent loop sees a cancellable error and theSessionManagergoroutine's deferredUnlockfires.Tests
cancellable_parent_test.go— round-trip, absent-returns-nil, nil-parent-no-op, and the key invariant that the local ctx stays independent of the parent's cancellation (only opt-in callers see it).oauth_test.go—TestUnmanagedOAuthFlow_DriveFlow_AbortsOnParentCtxCancellationasserts the new branch fires under the exact boundaryclientConnector.Connectsets up, returns the parent's ctx error, and skips the token exchange.