Skip to content

fix(training-agent): per-route signing authenticator with isolated replay store#3346

Merged
bokelley merged 1 commit intomainfrom
bokelley/fix-replay-store-3338
Apr 27, 2026
Merged

fix(training-agent): per-route signing authenticator with isolated replay store#3346
bokelley merged 1 commit intomainfrom
bokelley/fix-replay-store-3338

Conversation

@bokelley
Copy link
Copy Markdown
Contributor

Closes #3338.

Three coordinated fixes for the replay-store gap

The post-5.21.1 grader run surfaced neg/016-replayed-nonce accepting both submissions of the same (keyid, nonce) pair — a MUST-level RFC 9421 §3.3.2 violation.

  1. buildRequestSigningAuthenticator() now takes a VerifierCapability parameter. Previously hard-coded to getRequestSigningCapability() (default — required_for: []). The strict route was advertising required_for: ['create_media_buy'] in get_adcp_capabilities but enforcing required_for: [] at verification time.

  2. /mcp and /mcp-strict now build their own signing authenticators — distinct lazy singletons, each with its own InMemoryReplayStore. The shared singleton meant a nonce consumed on one route falsely fired request_signature_replayed on the other.

  3. Strict route bound to getStrictRequestSigningCapability() so capability advertisement and enforcement match.

Test gate

Un-skips server/tests/integration/training-agent-strict.test.ts:124 (the test that pinned the expected behavior; was skipped per #3080 with a stale regex). Now passes.

What didn't reproduce

Triage's bug #1 ("bearer evaluated before signing") didn't reproduce against @adcp/client@5.21.1requireSignatureWhenPresent already implements presence-first ordering. Per-route signing-auth instances eliminate any leftover bypass surface regardless.

Test plan

The post-5.21.1 grader run surfaced neg/016-replayed-nonce accepting
both submissions of the same (keyid, nonce) pair on /mcp-strict — a
MUST-level RFC 9421 §3.3.2 violation.

Root cause: /mcp-strict was using the same lazySigningAuth() singleton
as /mcp, so they shared one InMemoryReplayStore. The shared singleton
was also bound to the *default* capability (required_for: []) rather
than the strict one (required_for: ['create_media_buy']) — a quieter
conformance gap that compounded with the replay leak.

Adds buildStrictRequestSigningAuthenticator() in request-signing.ts
(parallel to the existing strict-required and strict-forbidden builders
from #3340), and a matching lazyStrictSigningAuth() in index.ts.
/mcp-strict now binds to its own replay store and the strict capability.

Un-skips the regression test at training-agent-strict.test.ts:124 (was
skipped per #3080 with a stale assertion); regex updated to match the
SDK's current "Signature required for create_media_buy." text.

The triage's bug #1 ("bearer evaluated before signing") didn't reproduce
against @adcp/client@5.21.1 — requireSignatureWhenPresent already
implements presence-first ordering. Per-route signing-auth instances
eliminate any leftover bypass surface regardless.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bokelley bokelley force-pushed the bokelley/fix-replay-store-3338 branch from dbe15d5 to d4fa90c Compare April 27, 2026 11:17
@bokelley bokelley merged commit c10e95d into main Apr 27, 2026
14 checks passed
@bokelley bokelley deleted the bokelley/fix-replay-store-3338 branch April 27, 2026 11:20
bokelley added a commit that referenced this pull request Apr 27, 2026
…s cross-instance replays (#3351)

#3346 closed the cross-route bug but vector 016 still failed in prod
because Fly runs min_machines_running = 2 web machines and the
per-process InMemoryReplayStore can't see across machines. Probe 1
hits machine A and consumes the nonce; probe 2 routes to machine B,
which has never seen the nonce locally, accepts.

Swaps to PostgresReplayStore from @adcp/client/signing/server (5.21.0+,
adcp-client#1018). All instances share one adcp_replay_cache table.

Adds:
- Migration 447_adcp_replay_cache.sql — schema mirrors the SDK's
  getReplayStoreMigration() output: (keyid, scope, nonce) PK +
  expires_at TTL column + two indexes.
- startReplayCacheSweeper() called from index.ts at boot — runs
  sweepExpiredReplays every 60s. Postgres has no native TTL.

Singleton replay store across the per-route authenticators (default /
strict / strict-required / strict-forbidden). The (keyid, scope, nonce)
primary key partitions by route via the @target-uri-derived scope, so
sharing the table is safe.

Closes the remaining piece of #3338. Grader vector 016 should pass
against /mcp-strict once deployed.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

training-agent verifier: replay store doesn't reject duplicate (keyid, nonce) — fails grader neg/016

1 participant