feat(addie): conformance Socket Mode — storyboard runner + Addie chat tools#4082
Merged
PR #2 of 3 for Addie Socket Mode. PR #1 added the server-side WS plumbing; this PR adds the runner that lets Addie execute a storyboard against a connected adopter session.

The adapter is small (~80 LOC) because the SDK already gives us everything we need:

- `AgentClient.fromMCPClient(mcpClient)` is the in-process injection factory the SDK has shipped since 6.7. It wraps any pre-connected MCP `Client` into an AgentClient that the storyboard runner consumes via `_client` (the same path `comply()` uses internally).
- `runStoryboard(agentUrl, storyboard, options)` is the existing runner. We pass a placeholder `adcp-conformance-socket://<orgId>` URL plus `_client: agentClient`; the runner ignores the URL and dispatches via the injected client.

Net result: zero changes to `server/src/services/storyboards.ts`, `server/src/addie/services/compliance-testing.ts`, or `server/src/addie/jobs/compliance-heartbeat.ts`. The conformance runner is a separate function picking storyboards from the same registry.

Files:

- `server/src/conformance/run-storyboard-via-ws.ts`: the adapter. Throws `ConformanceNotConnectedError` when there is no live session, `StoryboardNotFoundError` when the storyboard id is unknown.
- `server/src/conformance/index.ts`: re-export.
- `server/tests/unit/conformance-run-storyboard.test.ts`: 3 tests covering both error paths and the success path. The success test stands up a real WebSocket connection (proving the wiring is end-to-end) but mocks `runStoryboard` itself, so we can inspect the AgentClient and options the adapter passes through without needing a real sales agent + test kits to actually run a storyboard.

PR #3 adds the Addie chat tools (`issue_conformance_token`, `run_conformance_against_my_agent`) that consume this adapter, gated behind `CONFORMANCE_SOCKET_ENABLED=1`.

Stacked on `bokelley/conformance-socket-mode-server` (PR #4007).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
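The adapter flow described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in `run-storyboard-via-ws.ts`: the `*Like` interfaces, `AdapterDeps`, and the injected `wrapClient` / `runStoryboard` functions are hypothetical stand-ins for the real `@adcp/sdk` types, and the order of the two error checks is an assumption.

```typescript
// Hypothetical stand-ins for the SDK and server types (assumptions, not the real API).
interface McpClientLike { name: string }
interface AgentClientLike { inner: McpClientLike }
interface SessionLike { mcpClient: McpClientLike }
interface StoryboardLike { id: string }
interface RunOptions { _client: AgentClientLike }

class ConformanceNotConnectedError extends Error {
  constructor(orgId: string) {
    super(`No live conformance session for org ${orgId}`);
    this.name = "ConformanceNotConnectedError";
  }
}

class StoryboardNotFoundError extends Error {
  constructor(storyboardId: string) {
    super(`Unknown storyboard ${storyboardId}`);
    this.name = "StoryboardNotFoundError";
  }
}

interface AdapterDeps {
  sessions: Map<string, SessionLike>;
  storyboards: Map<string, StoryboardLike>;
  // Stands in for AgentClient.fromMCPClient(mcpClient).
  wrapClient: (client: McpClientLike) => AgentClientLike;
  // Stands in for runStoryboard(agentUrl, storyboard, options).
  runStoryboard: (
    agentUrl: string,
    storyboard: StoryboardLike,
    options: RunOptions,
  ) => Promise<unknown>;
}

async function runStoryboardViaSocketSketch(
  deps: AdapterDeps,
  orgId: string,
  storyboardId: string,
): Promise<unknown> {
  const session = deps.sessions.get(orgId);
  if (!session) throw new ConformanceNotConnectedError(orgId);

  const storyboard = deps.storyboards.get(storyboardId);
  if (!storyboard) throw new StoryboardNotFoundError(storyboardId);

  // The URL is only a placeholder; dispatch goes through the injected client.
  const placeholderUrl = `adcp-conformance-socket://${orgId}`;
  return deps.runStoryboard(placeholderUrl, storyboard, {
    _client: deps.wrapClient(session.mcpClient),
  });
}
```

Passing the runner and the wrapper in as dependencies mirrors how the unit test mocks `runStoryboard` to inspect what the adapter hands over.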
PR #3 of 3 for Addie Socket Mode. PRs #1+#2 added the server WS plumbing and the storyboard runner adapter; this PR exposes them to the user via two Addie chat tools:

- `issue_conformance_token`: mints a fresh JWT bound to the caller's WorkOS organization, returns shell exports + a copy-paste `@adcp/sdk/server` ConformanceClient snippet so the adopter can wire it into their dev environment in under a minute.
- `run_conformance_against_my_agent`: runs a storyboard against the adopter MCP server connected to the live conformance session for the caller's org. Renders phase/step pass/fail/skipped status as markdown with trimmed error text on failures.

Both tools are bound to a WorkOS organization. Anonymous chats get a not-mapped hint; orgs with no live conformance session get a connect-the-client hint with the exact snippet they need.

Wiring:

- `server/src/addie/mcp/conformance-tools.ts`: tool definitions + handler factory `createConformanceToolHandlers(memberContext)`.
- `server/src/addie/bolt-app.ts`: registration block gated on `CONFORMANCE_SOCKET_ENABLED=1`. Server-side WS plumbing remains always-wired (the chat surface is what the flag toggles).
- `server/src/addie/tool-sets.ts`: new `agent_conformance` toolset for the router.
- `server/tests/unit/conformance-addie-tools.test.ts`: 8 unit tests covering org-binding enforcement, missing-secret error, not-connected hint, missing-id message, passing+failing storyboard markdown rendering.

Stacked on `bokelley/conformance-storyboard-runner` (PR #4051), which is itself stacked on `bokelley/conformance-socket-mode-server` (PR #4007).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
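As an illustration of the markdown rendering that `run_conformance_against_my_agent` is described as doing, here is a minimal sketch. The result shapes (`PhaseResult`, `StepResult`), the 200-character trim limit, and the exact markdown layout are assumptions made for this example; the real tool's output format is not shown in this PR text.

```typescript
// Hypothetical result shapes; the real runner's result type may differ.
type StepStatus = "passed" | "failed" | "skipped";
interface StepResult { title: string; status: StepStatus; error?: string }
interface PhaseResult { title: string; steps: StepResult[] }

const STATUS_ICONS: Record<StepStatus, string> = {
  passed: "✓",
  failed: "✗",
  skipped: "⊘",
};

// Collapse whitespace and cap the length so a stack trace cannot flood the chat.
function trimError(text: string, max = 200): string {
  const oneLine = text.replace(/\s+/g, " ").trim();
  return oneLine.length > max ? `${oneLine.slice(0, max - 1)}…` : oneLine;
}

function renderStoryboardMarkdown(phases: PhaseResult[]): string {
  const lines: string[] = [];
  for (const phase of phases) {
    lines.push(`**${phase.title}**`);
    for (const step of phase.steps) {
      lines.push(`- ${STATUS_ICONS[step.status]} ${step.title}`);
      // Only failures carry error text in this sketch.
      if (step.status === "failed" && step.error) {
        lines.push(`  - \`${trimError(step.error)}\``);
      }
    }
  }
  return lines.join("\n");
}
```

Keeping the trimming in its own function makes the limit easy to unit-test separately from the layout.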
…script

Adds `docs/building/addie-socket-mode.mdx`, an adopter-facing walkthrough of the conformance Socket Mode channel from PRs #4007/#4051/#4054. Covers when to use it (vs the public-endpoint AAO heartbeat), prerequisites, the five-minute setup, what Addie can do once connected, privacy/safety posture, and troubleshooting for the common failure modes I hit while smoke-testing.

Mounted in the "Build" sidebar between validate-your-agent and grading so it lives next to the other agent-development tools rather than buried in implementation reference. Cross-linked from the existing get-test-ready and aao-verified pages where appropriate via inline references in the body.

Also adds `scripts/smoke-conformance.ts`, the end-to-end smoke I ran against the stack before writing the doc. Spins up the server-side conformance routes, connects a real adopter via `@adcp/sdk` 6.9 ConformanceClient, and exercises both Addie chat tools (issue_token + run storyboard). Stays as a runnable artifact for future regression checks and as a worked example for anyone who wants to see the full flow in ~150 LOC.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…moke

Adds a dev-only `POST /api/conformance/_debug/run-storyboard` endpoint that triggers `runStoryboardViaConformanceSocket(orgId, storyboardId)` against a live conformance session. Gated on `NODE_ENV !== 'production'` alongside the existing `/_debug` session-list endpoint, requires auth, and exists so local smoke harnesses can exercise the full PR #2 path without the Addie chat surface.

Also adds `scripts/smoke-conformance-training-agent.ts`, a "proxy adopter" that connects to the local conformance endpoint via `@adcp/sdk` 6.9 ConformanceClient and forwards every inbound MCP request to the locally running training agent's `/api/training-agent/sales/mcp` HTTP endpoint. Demonstrates the full Socket Mode → training agent path end-to-end without modifying the training agent's HTTP-bound setup.

Run output (storyboard: `media_buy_state_machine`):

- ✓ Capability discovery
- ✓ Create a media buy (2/2)
- ✗ Valid state transitions (Pause + Resume passed, Cancel returned a training-agent 500; separate bug to file)
- ⊘ Terminal state enforcement (skipped on prerequisite failure)

The failures here are real training-agent issues surfaced by the storyboard runner reaching it through Socket Mode + the proxy. The transport itself is invisible above MCP: same shape as running the storyboard over direct HTTP, plus a hop through the WebSocket.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five fixes from the security + code-review pass:

1. **Race-eviction on same-org reconnect (must-fix).** The close-listener on each WS connection removed the session by orgId only. When a second adopter from the same org connected, register()'s last-writer-wins displacement closed the prior socket, which fired its close listener and deleted the just-registered new session, leaving the org unreachable until the adopter reconnected. Fix: identity-keyed eviction, only remove if `sessions.get(orgId)?.transport` still points at THIS transport. New regression test covers it.

2. **Pre-register disconnect leak.** If the adopter closed the socket between `client.connect(transport)` and `register(...)`, we'd register a session whose underlying socket is already gone. Fix: check `ws.readyState === OPEN` before registering; bail otherwise.

3. **Liveness check before runner dispatch.** `runStoryboardViaConformanceSocket` now bails with `ConformanceNotConnectedError` if the resolved session's transport is closed (and evicts it from the store as a side effect). Avoids handing a dead AgentClient to the runner. Adds `ConformanceWSServerTransport.isClosed()` for the check.

4. **Subprotocol-sentinel tightening.** Earlier code accepted `Sec-WebSocket-Protocol: mcp, <token>` as a fallback. Drop it; require the explicit `adcp.conformance` sentinel. Reordered token extraction to prefer the header path (out of access logs) over the `?token=` query (which lands in pino/proxy logs). Documented the staging-log-as-token-equivalent caveat for the query fallback. New regression test rejects the wrong-sentinel form.

5. **Debug endpoint tenant scoping.** `_debug/run-storyboard` previously accepted arbitrary `org_id` from the request body, letting any authenticated user (on a misconfigured staging) fan storyboards into another tenant's session. Fix: normal callers must run against their own resolved org; static-admin-API-key callers retain the ability to specify `org_id` for local smoke tools (consistent with the rest of the admin-key surface). Production builds still skip the entire `_debug/*` block.

42/42 conformance tests pass (added 2 regression tests for #1 and #4).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
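The identity-keyed eviction from fix 1 can be illustrated with a small in-memory sketch. `SessionStoreSketch`, `FakeSocket`, and `connectSketch` are hypothetical names for this example only; the real session store and `ConformanceWSServerTransport` are not reproduced here, and the fake socket fires its close listener synchronously purely to keep the example deterministic.

```typescript
// Hypothetical minimal transport: close() fires the registered close listener.
class FakeSocket {
  onClose?: () => void;
  close(): void {
    this.onClose?.();
  }
}

class SessionStoreSketch {
  private sessions = new Map<string, { transport: FakeSocket }>();

  // Last-writer-wins: the new session replaces the old one, then the
  // displaced socket is closed (which fires its close listener).
  register(orgId: string, transport: FakeSocket): void {
    const prior = this.sessions.get(orgId);
    this.sessions.set(orgId, { transport });
    prior?.transport.close();
  }

  // Identity-keyed eviction: only remove the entry if it still points at
  // THIS transport. Deleting by orgId alone would also remove a newer
  // session that displaced this one.
  evictIfCurrent(orgId: string, transport: FakeSocket): boolean {
    if (this.sessions.get(orgId)?.transport !== transport) return false;
    return this.sessions.delete(orgId);
  }

  get(orgId: string): FakeSocket | undefined {
    return this.sessions.get(orgId)?.transport;
  }
}

// Wires a close listener that evicts by identity, then registers the socket.
function connectSketch(store: SessionStoreSketch, orgId: string): FakeSocket {
  const socket = new FakeSocket();
  socket.onClose = () => {
    store.evictIfCurrent(orgId, socket);
  };
  store.register(orgId, socket);
  return socket;
}
```

With this shape, a second `connectSketch` call for the same org closes the first socket, and the first socket's listener becomes a no-op because the store entry no longer points at it.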
bokelley added a commit that referenced this pull request on May 4, 2026
…4098)

* fix(training-agent): tenant router lock + storyboard-smoke --brand

Two follow-ups surfaced by the Socket Mode stack work (#4007/#4082) against the local training agent.

1. **Tenant router race (closes #4084).** The tenant router shares one MCP `Server` instance per tenant from the framework's `DecisioningAdcpServer` registry. Each request created a fresh `StreamableHTTPServerTransport`, called `server.connect(transport)`, handled the request, and `server.close()`'d. Two concurrent requests against the same tenant overlapped and the second `.connect()` threw "Already connected to a transport", surfaced as intermittent 500s under back-to-back HTTP load.

   Fix: per-tenant async lock (`withTenantLock`) serializes the `connect/handle/close` window per tenant so the shared server only ever has one transport bound at a time. Throughput is gated by the in-flight request's wallclock; the storyboard runner's sequential dispatch makes this a non-issue, and the compliance heartbeat runs one tenant at a time. A future improvement could pool servers per tenant for true parallelism; this lock is the minimum-mass correctness change.

   Verification: 10 concurrent `tools/list` POSTs against `/api/training-agent/sales/mcp` return all `"result"` (was a mix of "result" and 500s before); server log shows zero "Already connected" errors (was 7+ per run before).

2. **storyboard-smoke `--brand` flag (closes #4083).** That issue reported `update_media_buy` on a cancelled buy returning `MEDIA_BUY_NOT_FOUND` instead of `INVALID_STATE`. After diagnosis the training agent is correct: the bug is in the upstream SDK runner's `update_media_buy` enricher, which fabricates an account from `resolveBrand(options)` (defaulting to `test.example`) when `options.brand` is unset. Positive-path steps get rewritten to `test.example`; `expect_error` steps skip the enricher and keep the YAML's literal `acmeoutdoor.example`. Result: split-brain session keying and stale-state reads on the negative-path probes.

   Fix: storyboard-smoke now accepts `--brand <domain>`. With `--brand acmeoutdoor.example`, the SDK runner's `applyBrandInvariant` normalizes both step kinds to the kit's domain and the storyboard passes 9/9. Long-form rationale in the JSDoc.

Also un-blocks the pre-commit typecheck by adding a `// @ts-expect-error` on the v6-sales-platform `handoffToTask` call. PR #4080 bumped @adcp/sdk to 6.11 expecting the two-arg signature from adcp-client#1554, but the published 6.11 .d.ts still declares the single-arg shape only. Runtime accepts the second arg correctly, so this is a pure typings gap. Drop the directive when 6.12+ ships the matching typing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(training-agent): drop @ts-expect-error — local was stale on @adcp/sdk

CI surfaced "Unused '@ts-expect-error' directive" because CI's npm install resolved @adcp/sdk 6.11.0 (which has the two-arg `handoffToTask` typing from adcp-client#1554), while my local node_modules was still on 6.9.0 (no typing for the second arg). The version mismatch made the directive necessary locally but unused on CI. Refreshed node_modules to 6.11.0 and dropped the directive. Both surfaces typecheck clean now.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(training-agent): expand tenant-lock comments per code review

Three doc-only fixes from the PR #4098 code review pass:

1. `withTenantLock`: explain why the lock includes flushDirtySessions and server.close() (not just the transport window): in-memory session state mutations from request N must persist before N+1 runs against the same shared `DecisioningAdcpServer`. Narrowing the lock would race on the v5 handlers' session-context state.
2. `withTenantLock`: explain the `.catch(() => {})` chain-keepalive pattern so a future reader doesn't "fix" it by removing the catch (which would poison every subsequent same-tenant request).
3. `storyboard-smoke.ts`: add `--brand` to the usage block so the flag is discoverable from the file header.

No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
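The per-tenant lock and its `.catch(() => {})` chain-keepalive can be sketched as a promise chain keyed by tenant. This is a minimal illustration under assumptions: the real `withTenantLock` signature is not shown in this commit text, and `tenantChains` plus the `tenantId` string key are names chosen for the example.

```typescript
// One promise chain per tenant; each new task is appended to the tail.
const tenantChains = new Map<string, Promise<void>>();

function withTenantLock<T>(
  tenantId: string,
  task: () => Promise<T>,
): Promise<T> {
  const previous = tenantChains.get(tenantId) ?? Promise.resolve();

  // The task starts only after the previous task for this tenant settles.
  const run = previous.then(task);

  // Chain-keepalive: the stored tail swallows rejections. Without the
  // catch, one failed request would leave a rejected promise as the tail,
  // and every later `previous.then(task)` for this tenant would reject
  // without ever running its task.
  tenantChains.set(
    tenantId,
    run.then(() => {}).catch(() => {}),
  );

  // The caller still sees the original result or rejection.
  return run;
}
```

The caller's promise and the stored tail are deliberately separate: errors propagate to the request that caused them but never to the next request in line. This sketch never removes idle tenants from the map, which a long-running server would probably want to handle.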
Summary
Combines what was originally PR #4051 (storyboard runner adapter) and PR #4054 (Addie chat tools + docs + smoke + expert-review fixes) into a single PR against main, since both of those got auto-closed when the base of the stack (#4007) was squash-merged.
PR #4007 (server-side WS transport + token issuance) is already merged to main. This PR is what's left.
What's in this PR
1. Storyboard runner adapter (was #4051, bebef0c358)

- `server/src/conformance/run-storyboard-via-ws.ts`: `runStoryboardViaConformanceSocket(orgId, storyboardId)` resolves the live session, wraps its MCP client as an `AgentClient` via `AgentClient.fromMCPClient` (already in `@adcp/sdk` 6.7+), dispatches via `runStoryboard` with `_client` injection

2. Addie chat tools (was #4054, 537fb12d)

- `server/src/addie/mcp/conformance-tools.ts`: `issue_conformance_token` + `run_conformance_against_my_agent`
- Bound to the caller's WorkOS organization via `memberContext`
- Gated on `CONFORMANCE_SOCKET_ENABLED=1`
- `agent_conformance` toolset entry; registered in `bolt-app.ts`

3. User-facing docs (1d460e96)

- `docs/building/verification/addie-socket-mode.mdx`: "Pair-program with Addie (Socket Mode)" walkthrough: when to use it vs the AAO heartbeat path, prerequisites, five-minute setup, what Addie can do once connected, privacy/safety posture, troubleshooting
- `docs.json` entry under "Verification & trust"

4. Smoke harnesses + dev debug endpoint (db5de04e)

- `scripts/smoke-conformance.ts`: single-process end-to-end (server + adopter + Addie tool calls)
- `scripts/smoke-conformance-training-agent.ts`: proxy adopter that connects via Socket Mode and forwards to the local training agent's `/api/training-agent/sales/mcp`. Demonstrates the architecture against a real AdCP server
- `POST /api/conformance/_debug/run-storyboard`: dev-only trigger (`NODE_ENV !== 'production'`) so smoke harnesses can exercise the full path without the Addie chat surface. Tenant-scoped per the security review

5. Expert-review fixes (0bb16d41): five issues caught by security + code review:

- Race-eviction on same-org reconnect: identity-keyed eviction
- Pre-register disconnect leak: check the socket is open before registering
- Liveness check before runner dispatch: `ConformanceWSServerTransport.isClosed()` + early return in the runner
- Subprotocol-sentinel tightening: drop the `mcp` fallback, prefer header over query (out of access logs)
- Debug endpoint tenant scoping: normal callers run against their own org; admin-key callers can still pass `org_id` for smokes

Test plan

- `npx vitest run server/tests/unit/conformance-*.test.ts`: 42/42 pass
- `npx tsc --project server/tsconfig.json --noEmit`: clean
- Socket Mode smoke against the training agent: same `pause_canceled_buy` failure as direct HTTP, proving the transport is invisible above MCP

Linked

- `ConformanceClient` in `@adcp/sdk` 6.9: feat(server): ConformanceClient Socket Mode primitive adcp-client#1506

🤖 Generated with Claude Code