Skip to content

fix(mcp): bind OAuth PKCE verifiers to callback state#1666

Merged
dcartertwo merged 6 commits into
mainfrom
fix/mcp-oauth-pkce-state
Jun 3, 2026
Merged

fix(mcp): bind OAuth PKCE verifiers to callback state#1666
dcartertwo merged 6 commits into
mainfrom
fix/mcp-oauth-pkce-state

Conversation

@dcartertwo
Copy link
Copy Markdown
Collaborator

@dcartertwo dcartertwo commented Jun 3, 2026

Problem

DurableObjectOAuthClientProvider stores OAuth state per nonce, but stored the PKCE code_verifier in a single client/server slot.

When multiple OAuth attempts overlap for the same MCP server/client, a callback can arrive for one state while the provider reads the verifier from another attempt. The token exchange then fails with the upstream OAuth error:

Invalid PKCE code_verifier

This can happen when a user retries auth, opens multiple auth popups, completes an older popup, or hits reauth churn.

Fix

Store PKCE verifier data by OAuth callback state instead of by client/server.

The MCP SDK calls saveCodeVerifier(verifier) without passing state, then later calls redirectToAuthorization(authUrl) with an authorization URL that contains both state and code_challenge. The provider now uses that code_challenge as the bridge to bind the saved verifier to the generated OAuth state without changing the MCP SDK provider interface.

Callback handling now runs token exchange in the returned state verifier context, so codeVerifier() resolves the verifier for that exact callback.

Stale/duplicate callbacks are treated idempotently once auth is already accepted or progressing. Their state is consumed to prevent replay, but verifier cleanup is left to the active completion path, direct reconnect cleanup, invalidation, or expiry so stale callbacks cannot delete the verifier currently in use.

Tests

Adds coverage for:

  • resolving PKCE verifiers by OAuth callback state;
  • overlapping OAuth states for the same server/client;
  • stale valid callbacks after auth is already progressing;
  • callbacks that become stale while state validation is pending;
  • stale OAuth error callbacks after auth is accepted/progressing;
  • duplicate callbacks with the same active state;
  • deprecated direct reconnect verifier cleanup;
  • connection state while OAuth token exchange is pending.

Verification:

  • npm --workspace packages/agents run test:workers -- --run src/tests/mcp/create-oauth-provider.test.ts src/tests/mcp/client-manager.test.ts src/tests/mcp/client-connection.test.ts
  • npm run check

Context

This fixes the MCP OAuth failure mode where downstream clients surface Invalid PKCE code_verifier after overlapping or stale auth windows. The manager-driven callback path is now concurrency-safe for overlapping attempts.

The deprecated direct connect(..., { reconnect: { oauthCode } }) path remains best-effort because it does not carry OAuth state, but it now performs verifier cleanup after completion.


Open in Devin Review

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented Jun 3, 2026

🦋 Changeset detected

Latest commit: 2be50a4

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
agents Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 potential issue.

View 5 additional findings in Devin Review.

Open in Devin Review

Comment on lines +337 to +342
async codeVerifierKeys(clientId: string): Promise<string[]> {
const keys = await this.storage.list({
prefix: this.codeVerifierKey(clientId)
});
return [...keys.keys()];
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 codeVerifierKeys prefix match inadvertently includes challenge keys

codeVerifierKeys uses this.codeVerifierKey(clientId) (which resolves to .../{clientId}/code_verifier) as the prefix for storage.list(). Because code_verifier_challenge starts with code_verifier, this prefix also matches challenge keys stored under .../{clientId}/code_verifier_challenge/{hash}. This means both invalidateCredentials("verifier") at do-oauth-client-provider.ts:309 and the fallback path in deleteCodeVerifier() at do-oauth-client-provider.ts:454 will inadvertently list and delete pending challenge keys from concurrent in-flight OAuth flows — challenge keys that haven't yet been promoted to state-bound keys by redirectToAuthorization. In the deprecated connect() path (client.ts:1037), deleteCodeVerifier is called without runWithCodeVerifierState, so it always hits this fallback, meaning a completing OAuth flow can destroy a concurrent flow's challenge verifier before it is promoted.

Prompt for agents
The bug is in codeVerifierKeys() at do-oauth-client-provider.ts:337-342. It calls storage.list with prefix codeVerifierKey(clientId) which is .../{clientId}/code_verifier. This also matches .../{clientId}/code_verifier_challenge/{hash} keys since code_verifier_challenge starts with code_verifier.

To fix, codeVerifierKeys should enumerate keys more precisely. For example, check for the legacy key explicitly via storage.get, and list only the state-based prefix (stateCodeVerifierPrefix) and optionally the challenge prefix (challengeCodeVerifierPrefix) separately, then combine the results. This avoids the accidental cross-prefix match.

Affected callers:
- invalidateCredentials (scope verifier/all) at line 309 — may want to intentionally include challenge keys here, but should do so explicitly
- deleteCodeVerifier fallback at line 454 — should NOT delete challenge keys from concurrent flows
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

@mattzcarey
Copy link
Copy Markdown
Contributor

cant find any issues regarding mcp or oauth spec. clanker came up with these 3 things which are kinda nits but seem nice to fix:

1. Orphaned challenge-key verifiers never self-expire

saveCodeVerifier(v) always writes …/code_verifier_challenge/<challenge>, and redirectToAuthorization only promotes it to …/code_verifier/<nonce> (and deletes the challenge key) when state + code_challenge are present and parsed.serverId === this.serverId. Any flow that starts (saveCodeVerifier) but never cleanly redirects — serverId mismatch, missing params, or redirect never happens — orphans a challenge verifier.

The expiry sweeps don't catch these:

Path Prefix listed Sweeps challenge orphans?
codeVerifier() no-context expiry sweep …/code_verifier/ (trailing slash) No — doesn't match …/code_verifier_challenge/
checkState expired specific state key No
deleteCodeVerifier() / invalidateCredentials("verifier") …/code_verifier (no slash) Yes

So orphaned challenge verifiers have no TTL and are only cleaned by an explicit deleteCodeVerifier/invalidate, which an aborted flow may never reach — they can accumulate unbounded. Low severity (an unredeemed verifier is useless), but it means the 10-min expiry only really covers state-bound verifiers, not orphans. Cheapest fix: opportunistically drop expired challenge entries in saveCodeVerifier, or include the challenge prefix in the no-context expiry sweep.

2. Stale-callback early-return consumes state without checkState

The first early-return in handleCallbackRequest (the !validation.valid branch — ?error=… or no code) on an already-accepted connection calls consumeStaleOAuthState(state) and returns authSuccess: true without ever calling checkState. Not exploitable (nonce is an unguessable nanoid, state keys are namespaced by serverId, nothing new is minted on an already-authenticated connection), but it reports success on a forged callback and deletes a state it never validated. Suggest running checkState on this path too — keep the idempotent success (don't tear down a working connection over a stale popup), but only consumeState when state validates, and console.warn on mismatch. The other two early-returns (post-checkState) are fine.

3. redirectToAuthorization can now throw via getters

The old redirectToAuthorization never touched clientId/serverId; the new one reads both through getters that throw if unset. Unreachable in the SDK flow (saveCodeVerifier dereferences both first, so it'd throw earlier), but it's a new throw-site in a method that previously couldn't fail. Belt-and-suspenders: read this._serverId_/this._clientId_ directly and return early when unset, consistent with the existing fail-soft if (!state || !codeChallenge) return.

Matt Carey added 2 commits June 3, 2026 15:39
Add provider-level unit coverage for the PKCE-verifier-by-callback-state
logic that the manager-driven tests exercise only indirectly:

- redirectToAuthorization ignores cross-server state (binding guard) and
  leaves the verifier orphaned under the challenge key when state or
  code_challenge is absent (fail-soft)
- codeVerifier() with no ALS context throws loudly on multiple pending
  verifiers (regression guard for the original wrong-verifier bug) and
  resolves the sole pending verifier on the deprecated reconnect path
- codeVerifier() inside a state context with no stored verifier throws a
  state-specific error
- checkState() and codeVerifier() delete expired bound state verifiers
- invalidateCredentials("verifier") sweeps all pending verifiers, not a
  single slot

Tests run inside TestOAuthAgent against real DurableObjectStorage.
Strengthen the provider PKCE branch suite:

- Add testRedirectPromotesMatchingServerId: a matching serverId MUST move
  the verifier from the challenge key to the state-nonce key and delete the
  challenge key. This is the positive control that anchors the negative
  binding-guard tests, which would otherwise pass even if promotion were
  broken everywhere.
- Tighten testRedirectWithoutStateOrChallengeKeepsOrphan to also assert that
  no state-nonce key is created, distinguishing a correct early-return from a
  silently broken promotion (the old assertion only checked the pre-existing
  challenge key still existed).

Mutation-verified: disabling promotion fails the positive control; skipping
the expiry deletes fails the two expiry tests.
Copy link
Copy Markdown
Contributor

@mattzcarey mattzcarey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

@pkg-pr-new
Copy link
Copy Markdown

pkg-pr-new Bot commented Jun 3, 2026

Open in StackBlitz

agents

npm i https://pkg.pr.new/agents@1666

@cloudflare/ai-chat

npm i https://pkg.pr.new/@cloudflare/ai-chat@1666

@cloudflare/codemode

npm i https://pkg.pr.new/@cloudflare/codemode@1666

hono-agents

npm i https://pkg.pr.new/hono-agents@1666

@cloudflare/shell

npm i https://pkg.pr.new/@cloudflare/shell@1666

@cloudflare/think

npm i https://pkg.pr.new/@cloudflare/think@1666

@cloudflare/voice

npm i https://pkg.pr.new/@cloudflare/voice@1666

@cloudflare/worker-bundler

npm i https://pkg.pr.new/@cloudflare/worker-bundler@1666

commit: 2be50a4

@dcartertwo dcartertwo merged commit 01a0b35 into main Jun 3, 2026
4 checks passed
@dcartertwo dcartertwo deleted the fix/mcp-oauth-pkce-state branch June 3, 2026 15:12
@github-actions github-actions Bot mentioned this pull request Jun 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants