…s cross-instance replays

#3346 closed the cross-route bug but vector 016 still failed in prod because Fly runs min_machines_running = 2 web machines and the per-process InMemoryReplayStore can't see across machines. Probe 1 hits machine A and consumes the nonce; probe 2 routes to machine B, which has never seen the nonce locally and accepts it.

Swaps to PostgresReplayStore from @adcp/client/signing/server (5.21.0+, adcp-client#1018). All instances share one adcp_replay_cache table.

Adds:
- Migration 447_adcp_replay_cache.sql: schema mirrors the SDK's getReplayStoreMigration() output, with a (keyid, scope, nonce) PK + expires_at TTL column + two indexes.
- startReplayCacheSweeper() called from index.ts at boot: runs sweepExpiredReplays every 60s. Postgres has no native TTL.

Singleton replay store across the per-route authenticators (default / strict / strict-required / strict-forbidden). The (keyid, scope, nonce) primary key partitions by route via the @target-uri-derived scope, so sharing the table is safe.

Closes the remaining piece of #3338. Grader vector 016 should pass against /mcp-strict once deployed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
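The failure mode in this commit message can be reproduced in a few lines. This is a minimal sketch, not the SDK's implementation: `SetReplayStore`, the `"ok"` result, and the key, scope, and nonce strings are hypothetical names chosen for illustration. Only the `(keyid, scope, nonce)` key shape is taken from the commit message.

```typescript
type ReplayResult = "ok" | "replayed";

// Hypothetical stand-in for a replay store: the first insert of a
// (keyid, scope, nonce) triple wins, every later insert is a replay.
class SetReplayStore {
  private readonly seen = new Set<string>();

  insert(keyid: string, scope: string, nonce: string): ReplayResult {
    const key = JSON.stringify([keyid, scope, nonce]);
    if (this.seen.has(key)) return "replayed";
    this.seen.add(key);
    return "ok";
  }
}

// Per-process stores: each machine only knows the nonces it saw itself.
const machineA = new SetReplayStore();
const machineB = new SetReplayStore();
const perProcess: ReplayResult[] = [
  machineA.insert("key-1", "mcp-strict", "nonce-1"), // probe 1 lands on A
  machineB.insert("key-1", "mcp-strict", "nonce-1"), // probe 2 lands on B
];

// One shared store (the role the Postgres table plays): the replay is caught.
const shared = new SetReplayStore();
const sharedResults: ReplayResult[] = [
  shared.insert("key-1", "mcp-strict", "nonce-1"),
  shared.insert("key-1", "mcp-strict", "nonce-1"),
];

// The same nonce under a different scope is a different key, which is why
// routes can share one store without colliding.
const otherRoute = shared.insert("key-1", "mcp", "nonce-1");

console.log(perProcess, sharedResults, otherRoute);
```

With per-process stores both probes are accepted; with the shared store the second probe is rejected, while the same nonce under another scope is still accepted.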
Post-deploy verification: completed. Ran the full grader against Specifically:

The two remaining

Net: 33/33 graded vectors pass across the four strict-mode routes, closing #3338.

Verification noted: 33/33 graded vectors passing across all four strict-mode routes, #3338 closed. Thanks for the thorough post-deploy rundown. Generated by Claude Code

Glad the verification landed cleanly. 33/33 across all four routes is a solid close on #3338. Nothing further from me here. Generated by Claude Code

Acknowledged. 33/33 is a clean close. Nothing further needed here either. Generated by Claude Code

Noted. Closed cleanly, nothing queued on my end either. Generated by Claude Code
…ol (#3379)

#3351 swapped to PostgresReplayStore to close the cross-instance replay gap (#3338). Worked in production but broke the storyboard runner: CI runs the full server in-process without initializing a Postgres pool, and getReplayStore() was unconditionally calling getPool(), which throws "Database not initialized."

Symptom (#3373 storyboards CI):
signed_requests ✗ 3P / 28F / 9S
(every positive vector returned 401 because PostgresReplayStore.insert rejected on the unavailable pool and the verifier failed closed)

Fix:
- getReplayStore() falls back to InMemoryReplayStore when getPool() throws. The fallback is gated on NODE_ENV !== 'production' so a misconfigured prod still fails loudly (preserves the multi-instance protection).
- startReplayCacheSweeper() is a silent no-op when no pool is initialized.
- resetRequestSigning() now also clears the cached _replayStore so test suites that swap process state stay coherent.

Verified locally:
signed_requests ✓ 31P / 9S / 0N/A (recovered)

Production unaffected: prod always has a pool, so PostgresReplayStore is used and cross-instance replay protection holds. Confirmed via adcp grade --only 016-replayed-nonce → still PASS.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
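The fallback rule described in this fix can be sketched as follows. This is an illustration under stated assumptions, not the code in request-signing.ts: `pickReplayStore`, `resetReplayStoreChoice`, and the `StoreChoice` type are hypothetical, and the environment is passed in as a plain argument rather than read from `process.env` so the snippet stays self-contained.

```typescript
// Hypothetical marker for which store was selected; the real code would
// construct PostgresReplayStore or InMemoryReplayStore here.
interface StoreChoice {
  readonly kind: "postgres" | "memory";
}

let cachedChoice: StoreChoice | undefined;

function pickReplayStore(
  getPool: () => unknown, // assumed to throw when no pool is initialized
  nodeEnv: string | undefined,
): StoreChoice {
  if (cachedChoice !== undefined) return cachedChoice;
  let choice: StoreChoice;
  try {
    getPool();
    choice = { kind: "postgres" };
  } catch (err) {
    // A production process without a pool is a misconfiguration: fail loudly
    // instead of silently losing cross-instance replay protection.
    if (nodeEnv === "production") throw err;
    choice = { kind: "memory" };
  }
  cachedChoice = choice;
  return choice;
}

// Mirrors the idea of clearing the cached store between test suites.
function resetReplayStoreChoice(): void {
  cachedChoice = undefined;
}
```

Outside production a missing pool yields the in-memory choice, in production the original error propagates, and the reset helper keeps the cached singleton from leaking across tests that swap process state.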
…uthenticator

The comment still referenced per-authenticator InMemoryReplayStore as the isolation mechanism. Since #3351, production isolation comes from the @target-uri-derived scope column of the Postgres singleton.

https://claude.ai/code/session_015vh3dBv4iAKcoWj4a16ygU
What and why
#3346 closed the cross-route replay bug but `adcp grade request-signing` vector 016 (replayed-nonce) still failed in prod. Root cause: Fly runs `min_machines_running = 2` web machines and the per-process `InMemoryReplayStore` can't see across machines. Probe 1 hits machine A and consumes the nonce; probe 2 gets routed to machine B by Fly's LB, and machine B, which has never seen the nonce locally, accepts it.

Swaps to `PostgresReplayStore` from `@adcp/client/signing/server` (shipped in adcp-client#1018, 5.21.0+). All instances share one `adcp_replay_cache` table; the `(keyid, scope, nonce)` primary key serializes concurrent inserts, and with `ON CONFLICT DO NOTHING` the second submission of the same triple is reported as `'replayed'`.

What's in this PR
- Migration `447_adcp_replay_cache.sql`: `(keyid, scope, nonce)` PK + `expires_at` TTL column + two supporting indexes. Schema mirrors the SDK's `getReplayStoreMigration()` exactly.
- `getReplayStore()` singleton in `request-signing.ts` shared across all per-route authenticators (default / strict / strict-required / strict-forbidden). Safe to share because `(keyid, scope, nonce)` already partitions by route via the `@target-uri`-derived scope.
- `startReplayCacheSweeper()` wired in the `index.ts` boot path: runs `sweepExpiredReplays(pool)` every 60s with an unref'd interval (won't keep the loop alive on shutdown). Postgres has no native TTL; without sweeping, the table grows without bound.

Test plan
- `tsc --noEmit` clean
- `server/tests/integration/training-agent-strict.test.ts` 15/15 pass (no regression)
- `npx adcp grade request-signing https://agenticadvertising.org/api/training-agent/mcp-strict --transport mcp --skip-rate-abuse --only 016-replayed-nonce` → expect PASS

Closes
#3338 (the remaining cross-instance piece). #3346 closed the cross-route piece earlier.
Sibling work
Upstream adcp-client#1018 is the SDK side that made this trivial — three months of in-flight work landed in 5.21.0 just before we needed it.
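For readers unfamiliar with the unref'd-interval pattern mentioned under "What's in this PR", here is a rough sketch of a sweeper that also does nothing when no pool exists. It is not the project's `startReplayCacheSweeper()`: `PoolLike`, `sweepExpired`, `startSweeper`, the DELETE statement, and the choice to swallow sweep errors are all assumptions made for illustration.

```typescript
// Hypothetical minimal pool shape; the real project would pass its Postgres pool.
interface PoolLike {
  query(sql: string): Promise<unknown>;
}

// Assumed stand-in for the SDK's sweepExpiredReplays(pool); the SQL is illustrative.
async function sweepExpired(pool: PoolLike): Promise<void> {
  await pool.query("DELETE FROM adcp_replay_cache WHERE expires_at < now()");
}

// Returns a stop function, or null when there is no pool to sweep.
function startSweeper(pool: PoolLike | null, intervalMs = 60_000): (() => void) | null {
  if (pool === null) return null; // e.g. in-process CI: silent no-op

  const timer = setInterval(() => {
    // Sketch-level choice: ignore a failed sweep and retry on the next tick.
    sweepExpired(pool).catch(() => undefined);
  }, intervalMs);

  // In Node.js the timer object has unref(), which stops the interval from
  // keeping the event loop alive. The cast keeps this compiling under
  // typings where setInterval returns a number.
  (timer as unknown as { unref?: () => void }).unref?.();

  return () => clearInterval(timer);
}
```

Because the interval is unref'd, a process with no other pending work can exit without the sweeper being stopped explicitly; the returned stop function is there for tests and orderly shutdown.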