fix(training-agent): per-route signing authenticator with isolated replay store #3346
Merged
The post-5.21.1 grader run surfaced neg/016-replayed-nonce accepting both submissions of the same (keyid, nonce) pair on /mcp-strict, a MUST-level RFC 9421 §3.3.2 violation.

Root cause: /mcp-strict was using the same lazySigningAuth() singleton as /mcp, so they shared one InMemoryReplayStore. The shared singleton was also bound to the *default* capability (required_for: []) rather than the strict one (required_for: ['create_media_buy']), a quieter conformance gap that compounded with the replay leak.

Adds buildStrictRequestSigningAuthenticator() in request-signing.ts (parallel to the existing strict-required and strict-forbidden builders from #3340), and a matching lazyStrictSigningAuth() in index.ts. /mcp-strict now binds to its own replay store and the strict capability.

Un-skips the regression test at training-agent-strict.test.ts:124 (was skipped per #3080 with a stale assertion); regex updated to match the SDK's current "Signature required for create_media_buy." text.

The triage's bug #1 ("bearer evaluated before signing") didn't reproduce against @adcp/client@5.21.1: requireSignatureWhenPresent already implements presence-first ordering. Per-route signing-auth instances eliminate any leftover bypass surface regardless.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
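The per-route isolation described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from request-signing.ts or index.ts: `ReplayStore`, `SigningAuth`, and `lazyAuth` are hypothetical stand-ins, and the real `InMemoryReplayStore` API may differ.

```typescript
// Hypothetical stand-ins for illustration only; not the @adcp/client API.
type Capability = { required_for: string[] };

class ReplayStore {
  private seen = new Set<string>();

  // Returns true the first time a (keyid, nonce) pair is seen, false on replay.
  consume(keyid: string, nonce: string): boolean {
    const key = JSON.stringify([keyid, nonce]);
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}

interface SigningAuth {
  capability: Capability;
  replayStore: ReplayStore;
}

// Lazy singleton per route: the authenticator is built on first use and then
// reused, but only by the route that owns this closure.
function lazyAuth(capability: Capability): () => SigningAuth {
  let instance: SigningAuth | undefined;
  return () => {
    if (instance === undefined) {
      instance = { capability, replayStore: new ReplayStore() };
    }
    return instance;
  };
}

// One lazy singleton per route, each with its own replay store and capability.
const defaultAuth = lazyAuth({ required_for: [] });
const strictAuth = lazyAuth({ required_for: ["create_media_buy"] });
```

With this shape, a nonce consumed through `strictAuth()` is rejected on a second submission to the same route, while `defaultAuth()` keeps an independent record.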
dbe15d5 to d4fa90c
bokelley added a commit that referenced this pull request Apr 27, 2026
…s cross-instance replays (#3351)

#3346 closed the cross-route bug but vector 016 still failed in prod because Fly runs min_machines_running = 2 web machines and the per-process InMemoryReplayStore can't see across machines. Probe 1 hits machine A and consumes the nonce; probe 2 routes to machine B, which has never seen the nonce locally and so accepts it.

Swaps to PostgresReplayStore from @adcp/client/signing/server (5.21.0+, adcp-client#1018). All instances share one adcp_replay_cache table.

Adds:
- Migration 447_adcp_replay_cache.sql: schema mirrors the SDK's getReplayStoreMigration() output, with a (keyid, scope, nonce) PK, an expires_at TTL column, and two indexes.
- startReplayCacheSweeper(), called from index.ts at boot, which runs sweepExpiredReplays every 60s. Postgres has no native TTL.

Singleton replay store across the per-route authenticators (default / strict / strict-required / strict-forbidden). The (keyid, scope, nonce) primary key partitions by route via the @target-uri-derived scope, so sharing the table is safe.

Closes the remaining piece of #3338. Grader vector 016 should pass against /mcp-strict once deployed.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
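To illustrate why one shared table keyed on (keyid, scope, nonce) is safe across routes and machines, here is a small in-memory model. It is only a sketch under assumptions: `SharedReplayTable`, `insertIfAbsent`, and `sweepExpired` are hypothetical names, not the SDK's PostgresReplayStore or sweepExpiredReplays, and timestamps are passed in explicitly to keep the example deterministic.

```typescript
// Hypothetical in-memory model of a shared replay table; the real store is
// a Postgres table and its API may look different.
interface ReplayRow {
  expiresAt: number; // epoch milliseconds, modelling the expires_at column
}

// Composite primary key: the scope component keeps routes apart.
function primaryKey(keyid: string, scope: string, nonce: string): string {
  return JSON.stringify([keyid, scope, nonce]);
}

class SharedReplayTable {
  private rows = new Map<string, ReplayRow>();

  // True on first sight of the key, false while an unexpired row exists.
  insertIfAbsent(keyid: string, scope: string, nonce: string, ttlMs: number, now: number): boolean {
    const key = primaryKey(keyid, scope, nonce);
    const existing = this.rows.get(key);
    if (existing !== undefined && existing.expiresAt > now) return false;
    this.rows.set(key, { expiresAt: now + ttlMs });
    return true;
  }

  // Periodic sweep: deletes expired rows and returns how many were removed.
  sweepExpired(now: number): number {
    let removed = 0;
    this.rows.forEach((row, key) => {
      if (row.expiresAt <= now) {
        this.rows.delete(key);
        removed++;
      }
    });
    return removed;
  }

  rowCount(): number {
    return this.rows.size;
  }
}

// Two "machines" hold a reference to the same table, as they would to one database.
const table = new SharedReplayTable();
const machineA = table;
const machineB = table;
```

Because both machines consult the same rows, a nonce consumed via `machineA` is visible to `machineB`, while the same (keyid, nonce) under a different scope is treated as a separate entry.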
Closes #3338.
Three coordinated fixes for the replay-store gap
The post-5.21.1 grader run surfaced neg/016-replayed-nonce accepting both submissions of the same (keyid, nonce) pair, a MUST-level RFC 9421 §3.3.2 violation.

- buildRequestSigningAuthenticator() now takes a VerifierCapability parameter. Previously hard-coded to getRequestSigningCapability() (the default, required_for: []). The strict route was advertising required_for: ['create_media_buy'] in get_adcp_capabilities but enforcing required_for: [] at verification time.
- /mcp and /mcp-strict now build their own signing authenticators: distinct lazy singletons, each with its own InMemoryReplayStore. The shared singleton meant a nonce consumed on one route falsely fired request_signature_replayed on the other.
- Strict route bound to getStrictRequestSigningCapability() so capability advertisement and enforcement match.

Test gate

Un-skips server/tests/integration/training-agent-strict.test.ts:124 (the test that pinned the expected behavior; was skipped per #3080 with a stale regex). Now passes.

What didn't reproduce

Triage's bug #1 ("bearer evaluated before signing") didn't reproduce against @adcp/client@5.21.1: requireSignatureWhenPresent already implements presence-first ordering. Per-route signing-auth instances eliminate any leftover bypass surface regardless.

Test plan

- tsc --noEmit clean
- server/tests/integration/training-agent-strict.test.ts 8/8 pass (was 7/8 with one skip)
- adcp grade request-signing https://agenticadvertising.org/api/training-agent/mcp-strict --transport mcp --skip-rate-abuse → expect 31/33 pass (016 + the two capability-semantics-only failures from "training-agent verifier: covers_content_digest='either' fails grader neg/007 + neg/018 — clarify grader vs. capability semantics" #3339, fixed by "feat(training-agent): add /mcp-strict-required and /mcp-strict-forbidden conformance routes" #3340)
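The capability fix in this description (passing the capability into the builder so that what a route advertises is what it enforces) can be sketched as below. The type name VerifierCapability comes from the description, but its shape here is an assumption, and `buildAuth`, `advertised`, and `check` are hypothetical helpers rather than the real buildRequestSigningAuthenticator() signature.

```typescript
// Shape assumed for illustration; only the required_for field appears in the PR text.
type VerifierCapability = { required_for: string[] };

type CheckResult = "ok" | "signature_required";

interface RouteAuth {
  // What the route would report from a capabilities call.
  advertised(): VerifierCapability;
  // What the route enforces when a request arrives.
  check(tool: string, hasSignature: boolean): CheckResult;
}

// Hypothetical builder: advertisement and enforcement read the same
// capability object, so they cannot drift apart.
function buildAuth(capability: VerifierCapability): RouteAuth {
  return {
    advertised: () => capability,
    check: (tool, hasSignature) => {
      const required = capability.required_for.indexOf(tool) !== -1;
      return required && !hasSignature ? "signature_required" : "ok";
    },
  };
}

const defaultRoute = buildAuth({ required_for: [] });
const strictRoute = buildAuth({ required_for: ["create_media_buy"] });
```

Under these assumptions, an unsigned `create_media_buy` call is rejected by `strictRoute` and allowed by `defaultRoute`, which matches what each route advertises.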