fix(mcp): bind OAuth PKCE verifiers to callback state #1666
🦋 Changeset detected. Latest commit: 2be50a4. The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
async codeVerifierKeys(clientId: string): Promise<string[]> {
  const keys = await this.storage.list({
    prefix: this.codeVerifierKey(clientId)
  });
  return [...keys.keys()];
}
🟡 codeVerifierKeys prefix match inadvertently includes challenge keys
codeVerifierKeys uses this.codeVerifierKey(clientId) (which resolves to .../{clientId}/code_verifier) as the prefix for storage.list(). Because code_verifier_challenge starts with code_verifier, this prefix also matches challenge keys stored under .../{clientId}/code_verifier_challenge/{hash}. This means both invalidateCredentials("verifier") at do-oauth-client-provider.ts:309 and the fallback path in deleteCodeVerifier() at do-oauth-client-provider.ts:454 will inadvertently list and delete pending challenge keys from concurrent in-flight OAuth flows — challenge keys that haven't yet been promoted to state-bound keys by redirectToAuthorization. In the deprecated connect() path (client.ts:1037), deleteCodeVerifier is called without runWithCodeVerifierState, so it always hits this fallback, meaning a completing OAuth flow can destroy a concurrent flow's challenge verifier before it is promoted.
Prompt for agents
The bug is in codeVerifierKeys() at do-oauth-client-provider.ts:337-342. It calls storage.list with prefix codeVerifierKey(clientId) which is .../{clientId}/code_verifier. This also matches .../{clientId}/code_verifier_challenge/{hash} keys since code_verifier_challenge starts with code_verifier.
To fix, codeVerifierKeys should enumerate keys more precisely. For example, check for the legacy key explicitly via storage.get, and list only the state-based prefix (stateCodeVerifierPrefix) and optionally the challenge prefix (challengeCodeVerifierPrefix) separately, then combine the results. This avoids the accidental cross-prefix match.
Affected callers:
- invalidateCredentials (scope verifier/all) at line 309 — may want to intentionally include challenge keys here, but should do so explicitly
- deleteCodeVerifier fallback at line 454 — should NOT delete challenge keys from concurrent flows
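A minimal sketch of the overlap described above and one way to narrow the enumeration. It uses an in-memory Map in place of Durable Object storage. The key layout (including the /state/ segment) and the helpers listKeys and preciseVerifierKeys are assumptions made for illustration, not the provider's actual code.

```typescript
// Hypothetical key layout, modelled on the paths quoted in the review comment.
const base = "/client1/code_verifier";
const store = new Map<string, string>([
  [base, "legacy-verifier"],
  [`${base}/state/nonce-a`, "verifier-a"],
  [`${base}_challenge/hash-b`, "verifier-b"],
]);

// Stand-in for storage.list({ prefix }): every key that starts with the prefix.
function listKeys(prefix: string): string[] {
  return Array.from(store.keys()).filter((key) => key.startsWith(prefix));
}

// Broad prefix: "code_verifier" is also a prefix of "code_verifier_challenge",
// so the pending challenge key is listed as well.
const broadKeys = listKeys(base);

// Narrower enumeration: look up the legacy key by exact match, then list only
// the (assumed) state-bound prefix. Challenge keys are left alone.
function preciseVerifierKeys(): string[] {
  const legacy = store.has(base) ? [base] : [];
  return [...legacy, ...listKeys(`${base}/state/`)];
}
const preciseKeys = preciseVerifierKeys();
```

With this layout the broad listing returns all three keys, while the narrower one returns only the legacy and state-bound keys. A caller that really wants to sweep challenge keys too can list that prefix explicitly.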
can't find any issues regarding the MCP or OAuth spec. clanker came up with these 3 things which are kinda nits but seem nice to fix:
1. Orphaned challenge-key verifiers never self-expire
The expiry sweeps don't catch these:
So orphaned challenge verifiers have no TTL and are only cleaned by an explicit
2. Stale-callback early-return consumes
The first early-return in
3. The old
Add provider-level unit coverage for the PKCE-verifier-by-callback-state
logic that the manager-driven tests exercise only indirectly:
- redirectToAuthorization ignores cross-server state (binding guard) and
leaves the verifier orphaned under the challenge key when state or
code_challenge is absent (fail-soft)
- codeVerifier() with no ALS context throws loudly on multiple pending
verifiers (regression guard for the original wrong-verifier bug) and
resolves the sole pending verifier on the deprecated reconnect path
- codeVerifier() inside a state context with no stored verifier throws a
state-specific error
- checkState() and codeVerifier() delete expired bound state verifiers
- invalidateCredentials("verifier") sweeps all pending verifiers, not a
single slot
Tests run inside TestOAuthAgent against real DurableObjectStorage.
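The context-dependent resolution rules listed above can be sketched roughly as follows. This is an illustration under assumptions, not the provider's implementation: it uses Node's AsyncLocalStorage to carry the callback state, an in-memory Map instead of Durable Object storage, and the hypothetical names runWithState and resolveCodeVerifier.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Carries the OAuth callback state for the duration of a token exchange.
const stateContext = new AsyncLocalStorage<string>();

// Hypothetical in-memory store: state nonce -> pending PKCE verifier.
const pendingVerifiers = new Map<string, string>();

function runWithState<T>(state: string, fn: () => T): T {
  return stateContext.run(state, fn);
}

function resolveCodeVerifier(): string {
  const state = stateContext.getStore();
  if (state !== undefined) {
    // Inside a state context: only that state's verifier is acceptable.
    const verifier = pendingVerifiers.get(state);
    if (verifier === undefined) {
      throw new Error(`No code verifier stored for state ${state}`);
    }
    return verifier;
  }
  // No context (for example a reconnect path without state): refuse to guess
  // when more than one verifier is pending.
  if (pendingVerifiers.size > 1) {
    throw new Error("Multiple pending code verifiers; cannot choose without state");
  }
  const sole = Array.from(pendingVerifiers.values())[0];
  if (sole === undefined) {
    throw new Error("No code verifier found");
  }
  return sole;
}
```

Throwing on ambiguity, rather than returning an arbitrary entry, is what guards against silently sending the wrong verifier to the token endpoint.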
Strengthen the provider PKCE branch suite:
- Add testRedirectPromotesMatchingServerId: a matching serverId MUST move
  the verifier from the challenge key to the state-nonce key and delete the
  challenge key. This is the positive control that anchors the negative
  binding-guard tests, which would otherwise pass even if promotion were
  broken everywhere.
- Tighten testRedirectWithoutStateOrChallengeKeepsOrphan to also assert
  that no state-nonce key is created, distinguishing a correct early-return
  from a silently broken promotion (the old assertion only checked the
  pre-existing challenge key still existed).
Mutation-verified: disabling promotion fails the positive control; skipping
the expiry deletes fails the two expiry tests.
Problem
DurableObjectOAuthClientProvider stores OAuth state per nonce, but stored the PKCE code_verifier in a single client/server slot.
When multiple OAuth attempts overlap for the same MCP server/client, a callback can arrive for one state while the provider reads the verifier from another attempt. The token exchange then fails with the upstream OAuth error:
This can happen when a user retries auth, opens multiple auth popups, completes an older popup, or hits reauth churn.
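To make the failure mode concrete, here is a small sketch of a single-slot verifier store. The class and variable names are hypothetical and the storage is an in-memory field, not the Durable Object storage the provider actually uses.

```typescript
// Single-slot storage: one verifier per client/server, shared by every attempt.
class SingleSlotVerifierStore {
  private verifier: string | undefined;

  save(verifier: string): void {
    this.verifier = verifier; // overwrites whatever an earlier attempt saved
  }

  read(): string | undefined {
    return this.verifier;
  }
}

const slot = new SingleSlotVerifierStore();

// Attempt A starts, then the user retries and attempt B starts.
slot.save("verifier-A");
slot.save("verifier-B");

// The user completes the older popup, so the callback for attempt A arrives.
// The slot now holds B's verifier, which does not match A's code_challenge.
const verifierUsedForA = slot.read();
```

In this sketch the token exchange for attempt A would be sent with verifier-B, which the authorization server rejects because it does not hash to the challenge from A's authorization request.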
Fix
Store PKCE verifier data by OAuth callback state instead of by client/server.
The MCP SDK calls saveCodeVerifier(verifier) without passing state, then later calls redirectToAuthorization(authUrl) with an authorization URL that contains both state and code_challenge. The provider now uses that code_challenge as the bridge to bind the saved verifier to the generated OAuth state without changing the MCP SDK provider interface.
Callback handling now runs token exchange in the returned state verifier context, so codeVerifier() resolves the verifier for that exact callback.
Stale/duplicate callbacks are treated idempotently once auth is already accepted or progressing. Their state is consumed to prevent replay, but verifier cleanup is left to the active completion path, direct reconnect cleanup, invalidation, or expiry so stale callbacks cannot delete the verifier currently in use.
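The challenge-as-bridge idea can be sketched as below. This is a simplified illustration, not the provider's code: it assumes the S256 challenge method from RFC 7636, uses in-memory Maps, and the class StateBoundVerifierStore, the helper authUrlFor, and the auth.example URL are all hypothetical.

```typescript
import { createHash } from "node:crypto";

// PKCE S256: code_challenge = BASE64URL(SHA-256(code_verifier)).
function s256(verifier: string): string {
  return createHash("sha256").update(verifier).digest("base64url");
}

class StateBoundVerifierStore {
  private byChallenge = new Map<string, string>();
  private byState = new Map<string, string>();

  // Called without state: park the verifier under its derived challenge.
  saveCodeVerifier(verifier: string): void {
    this.byChallenge.set(s256(verifier), verifier);
  }

  // Called with the authorization URL: promote the parked verifier to the
  // state nonce found in the URL, then drop the challenge entry.
  redirectToAuthorization(authUrl: URL): void {
    const state = authUrl.searchParams.get("state");
    const challenge = authUrl.searchParams.get("code_challenge");
    if (!state || !challenge) return; // fail-soft: entry stays under the challenge
    const verifier = this.byChallenge.get(challenge);
    if (verifier === undefined) return;
    this.byState.set(state, verifier);
    this.byChallenge.delete(challenge);
  }

  codeVerifierFor(state: string): string | undefined {
    return this.byState.get(state);
  }
}

// Builds an authorization URL the way an SDK might (illustrative only).
function authUrlFor(state: string, verifier: string): URL {
  const url = new URL("https://auth.example/authorize");
  url.searchParams.set("state", state);
  url.searchParams.set("code_challenge", s256(verifier));
  return url;
}

// Two overlapping attempts no longer share a slot.
const boundStore = new StateBoundVerifierStore();
boundStore.saveCodeVerifier("verifier-A");
boundStore.redirectToAuthorization(authUrlFor("state-A", "verifier-A"));
boundStore.saveCodeVerifier("verifier-B");
boundStore.redirectToAuthorization(authUrlFor("state-B", "verifier-B"));
```

Because each verifier ends up keyed by its own state nonce, a callback for state-A resolves verifier-A even after attempt B has started.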
Tests
Adds coverage for:
Verification:
npm --workspace packages/agents run test:workers -- --run src/tests/mcp/create-oauth-provider.test.ts src/tests/mcp/client-manager.test.ts src/tests/mcp/client-connection.test.ts
npm run check
Context
This fixes the MCP OAuth failure mode where downstream clients surface Invalid PKCE code_verifier after overlapping or stale auth windows. The manager-driven callback path is now concurrency-safe for overlapping attempts.
The deprecated direct connect(..., { reconnect: { oauthCode } }) path remains best-effort because it does not carry OAuth state, but it now performs verifier cleanup after completion.