feat(addie): conformance Socket Mode — storyboard runner + Addie chat tools#4082

Merged
bokelley merged 5 commits into main from bokelley/conformance-addie-tools on May 4, 2026

Conversation

@bokelley
Contributor

@bokelley bokelley commented May 4, 2026

Summary

Combines what was originally PR #4051 (storyboard runner adapter) and PR #4054 (Addie chat tools + docs + smoke + expert-review fixes) into a single PR against main, since both of those got auto-closed when the base of the stack (#4007) was squash-merged.

PR #4007 (server-side WS transport + token issuance) is already merged to main. This PR is what's left.

What's in this PR

1. Storyboard runner adapter (was #4051, bebef0c358)

  • server/src/conformance/run-storyboard-via-ws.ts: runStoryboardViaConformanceSocket(orgId, storyboardId) resolves the live session, wraps its MCP client as an AgentClient via AgentClient.fromMCPClient (already in @adcp/sdk 6.7+), and dispatches via runStoryboard with _client injection
  • 3 unit tests covering both error paths (no session / unknown storyboard) and the success path

2. Addie chat tools (was #4054, 537fb12d)

  • server/src/addie/mcp/conformance-tools.ts: issue_conformance_token + run_conformance_against_my_agent
  • Bound to caller's WorkOS organization via memberContext
  • Gated on CONFORMANCE_SOCKET_ENABLED=1
  • New agent_conformance toolset entry; registered in bolt-app.ts

3. User-facing docs (1d460e96)

  • docs/building/verification/addie-socket-mode.mdx — "Pair-program with Addie (Socket Mode)" walkthrough: when to use it vs the AAO heartbeat path, prerequisites, five-minute setup, what Addie can do once connected, privacy/safety posture, troubleshooting
  • Wired into docs.json under "Verification & trust"

4. Smoke harnesses + dev debug endpoint (db5de04e)

  • scripts/smoke-conformance.ts — single-process end-to-end (server + adopter + Addie tool calls)
  • scripts/smoke-conformance-training-agent.ts — proxy adopter that connects via Socket Mode and forwards to the local training agent's /api/training-agent/sales/mcp. Demonstrates the architecture against a real AdCP server
  • POST /api/conformance/_debug/run-storyboard — dev-only trigger (NODE_ENV !== 'production') so smoke harnesses can exercise the full path without the Addie chat surface. Tenant-scoped per the security review
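
The tenant-scoping rule for the debug endpoint can be sketched as a pure function. This is an illustration only: the `Caller` shape, `DebugScopeError`, and `resolveTargetOrg` are hypothetical names, and rejecting (rather than ignoring) a mismatched `org_id` is an assumption, since the PR does not show the handler itself.

```typescript
// Hypothetical caller shape; the real auth middleware types are not shown in this PR.
interface Caller {
  orgId: string | null;      // org resolved from the caller's session, if any
  isStaticAdminKey: boolean; // true when authenticated with the static admin API key
}

class DebugScopeError extends Error {}

// Decide which org a _debug/run-storyboard request is allowed to target.
function resolveTargetOrg(
  caller: Caller,
  bodyOrgId: string | undefined,
  nodeEnv: string | undefined,
): string {
  // The _debug/* block is skipped entirely in production.
  if (nodeEnv === "production") {
    throw new DebugScopeError("debug endpoints are disabled in production");
  }
  // Admin-key callers (local smoke tools) may name any org in the request body.
  if (caller.isStaticAdminKey && bodyOrgId) {
    return bodyOrgId;
  }
  // Everyone else runs against their own resolved org.
  if (!caller.orgId) {
    throw new DebugScopeError("caller is not mapped to an organization");
  }
  // Assumption: a body org_id that differs from the caller's org is rejected.
  if (bodyOrgId && bodyOrgId !== caller.orgId) {
    throw new DebugScopeError("org_id does not match the caller's organization");
  }
  return caller.orgId;
}
```

Keeping the decision in one function makes the cross-tenant case easy to cover with a unit test, independent of the HTTP layer.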

5. Expert-review fixes (0bb16d41) — five issues caught by security + code review:

  • Race-eviction on same-org reconnect (must-fix) — close listener now identity-keyed
  • Pre-register disconnect leak — bail if WS closed during MCP initialize
  • Liveness check before runner dispatch — ConformanceWSServerTransport.isClosed() + early return in the runner
  • Subprotocol-sentinel tightening — drop the mcp fallback, prefer header over query (out of access logs)
  • Debug endpoint tenant scoping — normal callers must run against own resolved org; static-admin-key retains body-org_id for smokes
  • 2 new regression tests for the race + sentinel paths
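
A minimal sketch of the identity-keyed close listener from the first fix, using an in-memory stand-in for the WebSocket transport. `FakeTransport` and `SessionStore` are hypothetical names, and attaching the close listener inside `register` is an assumption made to keep the example short.

```typescript
// Stand-in for a WebSocket-backed transport; only the close hook matters here.
class FakeTransport {
  private listeners: Array<() => void> = [];
  private closed = false;

  onClose(listener: () => void): void {
    this.listeners.push(listener);
  }

  close(): void {
    if (this.closed) return;
    this.closed = true;
    for (const listener of this.listeners) listener();
  }

  isClosed(): boolean {
    return this.closed;
  }
}

class SessionStore {
  private sessions = new Map<string, { transport: FakeTransport }>();

  register(orgId: string, transport: FakeTransport): void {
    const prior = this.sessions.get(orgId);
    // Last writer wins: the new session replaces the old entry first.
    this.sessions.set(orgId, { transport });
    transport.onClose(() => {
      // Identity-keyed eviction: only delete if the entry still points at THIS transport.
      // Deleting by orgId alone would remove a newer session registered for the same org.
      if (this.sessions.get(orgId)?.transport === transport) {
        this.sessions.delete(orgId);
      }
    });
    // Closing the displaced socket fires its close listener, which is now a no-op.
    prior?.transport.close();
  }

  get(orgId: string): { transport: FakeTransport } | undefined {
    return this.sessions.get(orgId);
  }
}
```

With eviction keyed on `orgId` alone, the second `register` call for an org would leave the store empty; with the identity check the newer session survives the older socket's close event.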

Test plan

  • CI green
  • npx vitest run server/tests/unit/conformance-*.test.ts — 42/42 pass
  • npx tsc --project server/tsconfig.json --noEmit — clean
  • Live smoke against local training agent — runs storyboard end-to-end via Socket Mode, surfaces same pause_canceled_buy failure as direct HTTP, proving the transport is invisible above MCP
  • Security review — no must-fix items remaining; should-fix items addressed
  • Code review — duplicate-connect race fixed; should-fix items addressed

Linked

🤖 Generated with Claude Code

bokelley and others added 5 commits May 4, 2026 10:51
PR #2 of 3 for Addie Socket Mode. PR #1 added the server-side WS
plumbing; this PR adds the runner that lets Addie execute a
storyboard against a connected adopter session.

The adapter is small (~80 LOC) because the SDK already gives us
everything we need:

- `AgentClient.fromMCPClient(mcpClient)` is the in-process injection
  factory the SDK has shipped since 6.7. It wraps any pre-connected
  MCP `Client` into an AgentClient that the storyboard runner
  consumes via `_client` (the same path `comply()` uses internally).
- `runStoryboard(agentUrl, storyboard, options)` is the existing
  runner. We pass a placeholder `adcp-conformance-socket://<orgId>`
  URL plus `_client: agentClient` and the runner happily ignores
  the URL and dispatches via the injected client.

Net result: zero changes to `server/src/services/storyboards.ts`,
`server/src/addie/services/compliance-testing.ts`, or
`server/src/addie/jobs/compliance-heartbeat.ts`. The conformance
runner is a separate function picking storyboards from the same
registry.

Files:

- server/src/conformance/run-storyboard-via-ws.ts — the adapter.
  Throws `ConformanceNotConnectedError` when no live session,
  `StoryboardNotFoundError` when storyboard id is unknown.
- server/src/conformance/index.ts — re-export.
- server/tests/unit/conformance-run-storyboard.test.ts — 3 tests
  covering both error paths and the success path. The success
  test stands up a real WebSocket connection (proving the wiring
  is end-to-end) but mocks `runStoryboard` itself so we can
  inspect the AgentClient and options the adapter passes
  through, without needing a real sales agent + test kits to
  actually run a storyboard.
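
The adapter's control flow can be sketched with stand-in types so it runs without the SDK. Everything here is illustrative: `AdapterDeps`, `wrapClient` (standing in for `AgentClient.fromMCPClient`), and the injected `runStoryboard` are hypothetical seams, the order of the checks is an assumption, and the liveness check reflects the later review-fix commit in this PR.

```typescript
// Stand-in types; the real ones come from @adcp/sdk and server/src/conformance.
interface McpClientLike { readonly label: string }
interface AgentClientLike { readonly mcp: McpClientLike }
interface Storyboard { readonly id: string }
interface RunOptions { readonly _client: AgentClientLike }
interface RunResult { readonly passed: boolean }

interface LiveSession {
  readonly mcpClient: McpClientLike;
  isClosed(): boolean; // mirrors ConformanceWSServerTransport.isClosed()
}

class ConformanceNotConnectedError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ConformanceNotConnectedError";
  }
}

class StoryboardNotFoundError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "StoryboardNotFoundError";
  }
}

interface AdapterDeps {
  sessions: Map<string, LiveSession>;
  storyboards: Map<string, Storyboard>;
  wrapClient(client: McpClientLike): AgentClientLike;
  runStoryboard(agentUrl: string, storyboard: Storyboard, options: RunOptions): Promise<RunResult>;
}

async function runViaConformanceSocketSketch(
  deps: AdapterDeps,
  orgId: string,
  storyboardId: string,
): Promise<RunResult> {
  const session = deps.sessions.get(orgId);
  if (!session) {
    throw new ConformanceNotConnectedError(`no live conformance session for ${orgId}`);
  }
  if (session.isClosed()) {
    // Evict the dead session so later calls fail fast too.
    deps.sessions.delete(orgId);
    throw new ConformanceNotConnectedError(`conformance session for ${orgId} is closed`);
  }
  const storyboard = deps.storyboards.get(storyboardId);
  if (!storyboard) {
    throw new StoryboardNotFoundError(`unknown storyboard ${storyboardId}`);
  }
  // The URL is a placeholder; the runner dispatches via the injected client instead.
  const agentClient = deps.wrapClient(session.mcpClient);
  return deps.runStoryboard(`adcp-conformance-socket://${orgId}`, storyboard, { _client: agentClient });
}
```

Passing the runner and the wrapper in as dependencies is what lets a test mock `runStoryboard` and inspect the client and options it receives.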

PR #3 adds the Addie chat tools (`issue_conformance_token`,
`run_conformance_against_my_agent`) that consume this adapter,
gated behind `CONFORMANCE_SOCKET_ENABLED=1`.

Stacked on `bokelley/conformance-socket-mode-server` (PR #4007).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #3 of 3 for Addie Socket Mode. PRs #1+#2 added the server WS
plumbing and the storyboard runner adapter; this PR exposes them
to the user via two Addie chat tools:

- `issue_conformance_token` — mints a fresh JWT bound to the
  caller's WorkOS organization, returns shell exports + a copy-
  paste @adcp/sdk/server ConformanceClient snippet so the adopter
  can wire it into their dev environment in under a minute.

- `run_conformance_against_my_agent` — runs a storyboard against
  the adopter MCP server connected to the live conformance
  session for the caller's org. Renders phase/step pass/fail/
  skipped status as markdown with trimmed error text on failures.

Both tools are bound to a WorkOS organization. Anonymous chats
get a not-mapped hint; orgs with no live conformance session get
a connect-the-client hint with the exact snippet they need.

Wiring:

- server/src/addie/mcp/conformance-tools.ts — tool definitions +
  handler factory `createConformanceToolHandlers(memberContext)`.
- server/src/addie/bolt-app.ts — registration block gated on
  `CONFORMANCE_SOCKET_ENABLED=1`. Server-side WS plumbing remains
  always-wired (the chat surface is what the flag toggles).
- server/src/addie/tool-sets.ts — new `agent_conformance` toolset
  for the router.
- server/tests/unit/conformance-addie-tools.test.ts — 8 unit
  tests covering org-binding enforcement, missing-secret error,
  not-connected hint, missing-id message, passing+failing
  storyboard markdown rendering.
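
As a rough illustration of the pass/fail/skipped markdown rendering, the sketch below formats a result tree into a bullet list. The `PhaseResult` and `StepResult` shapes, the 200-character trim, and the exact markdown layout are assumptions; the real tool's output format is not shown in this PR.

```typescript
type StepStatus = "passed" | "failed" | "skipped";

// Hypothetical result shapes; the SDK's actual storyboard result type may differ.
interface StepResult { title: string; status: StepStatus; error?: string }
interface PhaseResult { title: string; steps: StepResult[] }

const STATUS_ICON: Record<StepStatus, string> = { passed: "✓", failed: "✗", skipped: "⊘" };

// Collapse whitespace and cap the length so a stack trace cannot flood the chat.
function trimError(text: string, max = 200): string {
  const oneLine = text.replace(/\s+/g, " ").trim();
  return oneLine.length > max ? oneLine.slice(0, max - 1) + "…" : oneLine;
}

function renderStoryboardMarkdown(phases: PhaseResult[]): string {
  const lines: string[] = [];
  for (const phase of phases) {
    lines.push(`**${phase.title}**`);
    for (const step of phase.steps) {
      let line = `- ${STATUS_ICON[step.status]} ${step.title}`;
      // Only failures carry error text.
      if (step.status === "failed" && step.error) {
        line += `: \`${trimError(step.error)}\``;
      }
      lines.push(line);
    }
  }
  return lines.join("\n");
}
```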

Stacked on `bokelley/conformance-storyboard-runner` (PR #4051),
which is itself stacked on `bokelley/conformance-socket-mode-server`
(PR #4007).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…script

Adds docs/building/addie-socket-mode.mdx — an adopter-facing walkthrough
of the conformance Socket Mode channel from PRs #4007/#4051/#4054. Covers
when to use it (vs the public-endpoint AAO heartbeat), prerequisites, the
five-minute setup, what Addie can do once connected, privacy/safety
posture, and troubleshooting for the common failure modes I hit while
smoke-testing.

Mounted in the "Build" sidebar between validate-your-agent and grading
so it lives next to the other agent-development tools rather than
buried in implementation reference. Cross-linked from the existing
get-test-ready and aao-verified pages where appropriate via inline
references in the body.

Also adds scripts/smoke-conformance.ts — the end-to-end smoke I ran
against the stack before writing the doc. Spins up the server-side
conformance routes, connects a real adopter via @adcp/sdk 6.9
ConformanceClient, exercises both Addie chat tools (issue_token + run
storyboard). Stays as a runnable artifact for future regression checks
and as a worked example for anyone who wants to see the full flow in
~150 LOC.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…moke

Adds a dev-only POST /api/conformance/_debug/run-storyboard endpoint
that triggers runStoryboardViaConformanceSocket(orgId, storyboardId)
against a live conformance session. Gated on NODE_ENV !== 'production'
alongside the existing /_debug session-list endpoint, requires auth,
and exists so local smoke harnesses can exercise the full PR #2 path
without the Addie chat surface.

Also adds scripts/smoke-conformance-training-agent.ts — a "proxy
adopter" that connects to the local conformance endpoint via @adcp/sdk
6.9 ConformanceClient and forwards every inbound MCP request to the
locally running training agent's /api/training-agent/sales/mcp HTTP
endpoint. Demonstrates the full Socket Mode → training agent path
end-to-end without modifying the training agent's HTTP-bound setup.
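
The forwarding step of such a proxy adopter might look roughly like the sketch below. It is heavily simplified: `forwardToTrainingAgent`, the `HttpPost` seam, and the `baseUrl` parameter are hypothetical, the reply is assumed to be a plain JSON body (no SSE stream or session headers), and mapping an upstream HTTP error to JSON-RPC code -32603 is a choice made for this example.

```typescript
interface JsonRpcRequest { jsonrpc: "2.0"; id: number | string; method: string; params?: unknown }
interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number | string;
  result?: unknown;
  error?: { code: number; message: string };
}

interface HttpReply { status: number; body: string }
// Injected HTTP seam so the sketch runs without a network; wrap fetch() here in practice.
type HttpPost = (url: string, body: string, headers: Record<string, string>) => Promise<HttpReply>;

const TRAINING_AGENT_PATH = "/api/training-agent/sales/mcp";

async function forwardToTrainingAgent(
  request: JsonRpcRequest,
  post: HttpPost,
  baseUrl: string,
): Promise<JsonRpcResponse> {
  const reply = await post(`${baseUrl}${TRAINING_AGENT_PATH}`, JSON.stringify(request), {
    "content-type": "application/json",
    accept: "application/json, text/event-stream",
  });
  if (reply.status >= 400) {
    // Surface upstream failures as a JSON-RPC internal error tied to the same request id.
    return {
      jsonrpc: "2.0",
      id: request.id,
      error: { code: -32603, message: `training agent returned HTTP ${reply.status}` },
    };
  }
  // Assumption: the agent answered with a single JSON-RPC response body.
  return JSON.parse(reply.body) as JsonRpcResponse;
}
```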

Run output (storyboard: media_buy_state_machine):
  ✓ Capability discovery
  ✓ Create a media buy (2/2)
  ✗ Valid state transitions (Pause + Resume passed, Cancel returned a
    training-agent 500 — separate bug to file)
  ⊘ Terminal state enforcement (skipped on prerequisite failure)

The failures here are real training-agent issues surfaced by the
storyboard runner reaching it through Socket Mode + the proxy. The
transport itself is invisible above MCP: same shape as running the
storyboard over direct HTTP, plus a hop through the WebSocket.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five fixes from the security + code-review pass:

1. **Race-eviction on same-org reconnect (must-fix).** The close-listener
   on each WS connection removed the session by orgId only. When a
   second adopter from the same org connected, register()'s last-writer-
   wins displacement closed the prior socket, which fired its close
   listener and deleted the just-registered new session — leaving the
   org permanently unreachable until the next disconnect. Fix: identity-
   keyed eviction, only remove if `sessions.get(orgId)?.transport`
   still points at THIS transport. New regression test covers it.

2. **Pre-register disconnect leak.** If the adopter closed the socket
   between `client.connect(transport)` and `register(...)`, we'd register
   a session whose underlying socket is already gone. Fix: check
   `ws.readyState === OPEN` before registering; bail otherwise.

3. **Liveness check before runner dispatch.** `runStoryboardViaConformanceSocket`
   now bails with `ConformanceNotConnectedError` if the resolved
   session's transport is closed (and evicts it from the store as a
   side effect). Avoids handing a dead AgentClient to the runner.
   Adds `ConformanceWSServerTransport.isClosed()` for the check.

4. **Subprotocol-sentinel tightening.** Earlier code accepted
   `Sec-WebSocket-Protocol: mcp, <token>` as a fallback. Drop it;
   require the explicit `adcp.conformance` sentinel. Reordered token
   extraction to prefer the header path (out of access logs) over
   the `?token=` query (which lands in pino/proxy logs). Documented
   the staging-log-as-token-equivalent caveat for the query fallback.
   New regression test rejects the wrong-sentinel form.

5. **Debug endpoint tenant scoping.** `_debug/run-storyboard` previously
   accepted arbitrary `org_id` from the request body, letting any
   authenticated user (on a misconfigured staging) fan storyboards
   into another tenant's session. Fix: normal callers must run against
   their own resolved org; static-admin-API-key callers retain the
   ability to specify `org_id` for local smoke tools (consistent with
   the rest of the admin-key surface). Production builds still skip
   the entire `_debug/*` block.
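
The token-extraction order from fix 4 can be sketched as follows. `extractToken` is a hypothetical helper, the header form `adcp.conformance, <token>` is inferred from the old `mcp, <token>` fallback, and refusing to fall through to the query string when a header is present but malformed is an assumption.

```typescript
const SENTINEL = "adcp.conformance";

// Returns the bearer token for a WebSocket upgrade request, or null if none is acceptable.
function extractToken(protocolHeader: string | undefined, requestUrl: string): string | null {
  if (protocolHeader !== undefined) {
    // Preferred path: "Sec-WebSocket-Protocol: adcp.conformance, <token>" stays out of access logs.
    const parts = protocolHeader.split(",").map((part) => part.trim());
    const [sentinel, token] = parts;
    if (parts.length === 2 && sentinel === SENTINEL && token) {
      return token;
    }
    // Wrong sentinel (for example the old "mcp" fallback): reject outright.
    return null;
  }
  // Fallback: ?token= query parameter, which may end up in request logs.
  const query = requestUrl.split("?")[1] ?? "";
  for (const pair of query.split("&")) {
    const [key, value] = pair.split("=");
    if (key === "token" && value) {
      return decodeURIComponent(value);
    }
  }
  return null;
}
```

A usage example: `extractToken("adcp.conformance, abc", "/ws")` yields `"abc"`, while `extractToken("mcp, abc", "/ws")` yields `null`.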

42/42 conformance tests pass (added 2 regression tests for #1 and #4).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bokelley bokelley force-pushed the bokelley/conformance-addie-tools branch from 0010b18 to eb18bca May 4, 2026 15:01
@bokelley bokelley merged commit 6133d31 into main May 4, 2026
18 checks passed
@bokelley bokelley deleted the bokelley/conformance-addie-tools branch May 4, 2026 15:05
bokelley added a commit that referenced this pull request May 4, 2026
…4098)

* fix(training-agent): tenant router lock + storyboard-smoke --brand

Two follow-ups surfaced by the Socket Mode stack work (#4007/#4082)
against the local training agent.

1. **Tenant router race (closes #4084).** The tenant router shares one
   MCP `Server` instance per tenant from the framework's
   `DecisioningAdcpServer` registry. Each request created a fresh
   `StreamableHTTPServerTransport`, called `server.connect(transport)`,
   handled the request, and `server.close()`'d. Two concurrent requests
   against the same tenant overlapped and the second `.connect()` threw
   "Already connected to a transport" — surfaced as intermittent 500s
   under back-to-back HTTP load.

   Fix: per-tenant async lock (`withTenantLock`) serializes the
   `connect/handle/close` window per tenant so the shared server only
   ever has one transport bound at a time. Throughput is gated by the
   in-flight request's wallclock; the storyboard runner's sequential
   dispatch makes this a non-issue, and the compliance heartbeat runs
   one tenant at a time. A future improvement could pool servers per
   tenant for true parallelism — this lock is the minimum-mass
   correctness change.

   Verification: 10 concurrent `tools/list` POSTs against
   `/api/training-agent/sales/mcp` return all `"result"` (was a mix of
   "result" and 500s before); server log shows zero "Already connected"
   errors (was 7+ per run before).

2. **storyboard-smoke `--brand` flag (closes #4083).** That issue
   reported `update_media_buy` on a cancelled buy returning
   `MEDIA_BUY_NOT_FOUND` instead of `INVALID_STATE`. After diagnosis
   the training agent is correct — the bug is in the upstream SDK
   runner's `update_media_buy` enricher, which fabricates an account
   from `resolveBrand(options)` (defaulting to `test.example`) when
   options.brand is unset. Positive-path steps get rewritten to
   `test.example`; `expect_error` steps skip the enricher and keep the
   YAML's literal `acmeoutdoor.example`. Result: split-brain session
   keying and stale-state reads on the negative-path probes.

   Fix: storyboard-smoke now accepts `--brand <domain>`. With
   `--brand acmeoutdoor.example`, the SDK runner's `applyBrandInvariant`
   normalizes both step kinds to the kit's domain and the storyboard
   passes 9/9. Long-form rationale in the JSDoc.
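
A per-tenant async lock of the kind described in item 1 can be built from a promise chain per tenant. This is a sketch under assumptions, not the actual `withTenantLock` from the training agent: the `tenantChains` map and the exact chaining are illustrative.

```typescript
// One promise chain per tenant; each new request queues behind the previous one.
const tenantChains = new Map<string, Promise<void>>();

function withTenantLock<T>(tenantId: string, fn: () => Promise<T>): Promise<T> {
  const previous = tenantChains.get(tenantId) ?? Promise.resolve();
  // fn starts only after the previous holder has settled.
  const run = previous.then(fn);
  // Store a tail that never rejects. Without the catch, one failed request would
  // reject the chain and every later same-tenant request would fail without running.
  const tail: Promise<void> = run.then(() => {}).catch(() => {});
  tenantChains.set(tenantId, tail);
  // The caller still sees the real result or error of its own request.
  return run;
}
```

In the scenario above, `fn` would wrap the whole connect, handle, and close window, so the shared server never has two transports bound at once. Different tenants use different chains and do not block each other.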

Also un-blocks the pre-commit typecheck by adding a `// @ts-expect-error`
on the v6-sales-platform `handoffToTask` call. PR #4080 bumped @adcp/sdk
to 6.11 expecting the two-arg signature from adcp-client#1554, but the
published 6.11 .d.ts still declares the single-arg shape only. Runtime
accepts the second arg correctly — pure typings gap. Drop the directive
when 6.12+ ships the matching typing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(training-agent): drop @ts-expect-error — local was stale on @adcp/sdk

CI surfaced "Unused '@ts-expect-error' directive" because CI's npm install
resolved @adcp/sdk 6.11.0 (which has the two-arg `handoffToTask` typing
from adcp-client#1554), while my local node_modules was still on 6.9.0
(no typing for the second arg) — the version mismatch made the directive
necessary locally but unused on CI.

Refreshed node_modules to 6.11.0 and dropped the directive. Both surfaces
typecheck clean now.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(training-agent): expand tenant-lock comments per code review

Three doc-only fixes from the PR #4098 code review pass:

1. `withTenantLock` — explain why the lock includes flushDirtySessions
   and server.close() (not just the transport window): in-memory session
   state mutations from request N must persist before N+1 runs against
   the same shared `DecisioningAdcpServer`. Narrowing the lock would race
   on the v5 handlers' session-context state.

2. `withTenantLock` — explain the `.catch(() => {})` chain-keepalive
   pattern so a future reader doesn't "fix" it by removing the catch
   (which would poison every subsequent same-tenant request).

3. `storyboard-smoke.ts` — add `--brand` to the usage block so the flag
   is discoverable from the file header.

No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>