fix(ci): v3 reference seller storyboard job actually asserts on results by bokelley · Pull Request #693 · adcontextprotocol/adcp-client-python

bokelley · 2026-05-12T11:48:48Z

Closes #410.

Problem

The `storyboard-v3-reference-seller` job advertised itself as gating "every PR's translator-pattern conformance" but ended its storyboard invocation with `|| true` (.github/workflows/ci.yml:645), so a failing run silently passed CI. The artifact path also pointed at `examples/v3_reference_seller/` while the file was written to workspace root, so post-mortem inspection landed on `if-no-files-found: warn` instead of the actual result.

The job had cosmetic gate-value, no real gate-value.

Fix

Three changes:

Drop `|| true` from the storyboard invocation. `cd` into `examples/v3_reference_seller` so the result file lands where `upload-artifact` expects it.
"Assert storyboard passed" step matching the seller_agent job's pattern (ci.yml:435-451): `overall_status == 'passing'` AND `controller_detected == true`. On failure, dump the full JSON so the diagnostic IS the storyboard report.
"Assert upstream traffic" anti-façade gate: GET `/_debug/traffic` from the seller; fail if all per-method counts are zero. A seller that returns stub data without translating would have passed the storyboard but emitted no traffic — this is the failure mode feat(testing): storyboard runner matrix entry for examples/v3_reference_seller #410 specifically called out.

What was tested

`python3 -c "import yaml; yaml.safe_load(open('.github/workflows/ci.yml'))"` — YAML valid
CI will run on this PR; the v3 reference seller already implements `enable_debug_endpoints=True` (per examples/v3_reference_seller/src/app.py:192) so the `/_debug/traffic` endpoint exists and will return real counts.

🤖 Generated with Claude Code

Closes #410. The storyboard-v3-reference-seller job advertised itself as gating "every PR's translator-pattern conformance" but ended its storyboard invocation with `|| true`, so a failing run silently passed CI. The artifact path also pointed at examples/v3_reference_seller/ while the file was written to workspace root, so post-mortem inspection landed on a warning instead of the actual result. Three changes: 1. Drop `|| true` from the storyboard invocation. Move into examples/v3_reference_seller so the result file matches the upload-artifact path. 2. Add an "Assert storyboard passed" step matching the seller_agent job's pattern: overall_status == 'passing' AND controller_detected. On failure, dump the full JSON so the PR diagnostic is the raw storyboard report, not a partial cat | head -50. 3. Add an "Assert upstream traffic" step that GETs /_debug/traffic and fails if all per-method counts are zero. This is the anti-façade gate the issue called for: a seller that returns stub data without translating to upstream calls would have passed the storyboard but emitted no traffic. The gate makes that mode detectable in CI instead of in production. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

bokelley · 2026-05-12T15:06:17Z

Review

The patch itself is right — drop || true, assert overall_status == 'passing', assert non-zero upstream traffic. That's exactly the gate description.

But the gate is already doing its job: the new `AdCP storyboard runner — v3 reference seller (translator)` check is failing on this PR with 12 storyboard-level failures, which is what the previous || true was masking. Three distinct failure clusters:

1. Storyboard fixture / authorization setup (7 failures — same root cause)

All 7 media_buy_seller/*/get_products_brief steps fail with:

PERMISSION_DENIED: Buyer agent is not authorized for this seller. The seller's commercial allowlist did not authorize this credential.

This looks like a storyboard buyer-agent fixture issue — the runner is presenting credentials the v3 ref seller's commercial allowlist doesn't carry. Either the storyboard scenarios assume a fixture step that no longer runs, or the v3 ref seller's seed data missed the buyer-agent registry. Worth confirming locally before landing this PR — if it's a fixture gap on the reference seller side, fix in same PR; if it's a real allowlist policy regression, file separately.

2. Cascade-rule error code mismatches (2 failures)

create_media_buy_aggressive_terms: expected TERMS_REJECTED, got something else at /adcp_error/code
update_unknown_media_buy: expected MEDIA_BUY_NOT_FOUND
update_unknown_package: expected PACKAGE_NOT_FOUND

These are real spec-conformance regressions in the v3 ref seller. Each one should be a separate issue; the cascade-rules layer can't normalize these because they're terminal codes.

3. Spec conformance: missing `list_accounts` / `sync_accounts` (1 failure)

Agent declared account-bearing specialism(s) [sales-non-guaranteed, sales-guaranteed]
but advertises neither list_accounts nor sync_accounts.

This is exactly #377, which is already open. Worth linking that issue from this PR's description so the dependency is explicit.

4. Storyboard step bug?

INVALID_REQUEST: buying_mode='refine' must not be combined with brief

get_products_refine storyboard step is sending buying_mode=refine alongside brief. Either the spec changed (refine doesn't take brief), or the storyboard step is wrong. Verify against the spec slice.

Process recommendation

The PR has correctly diagnosed and fixed the cosmetic-gate problem. The trade-off now is:

Land as-is → the v3 storyboard job goes red on main for every PR until the 12 failures are fixed. This is the honest signal but blocks unrelated work.
Hold until the 12 failures are fixed (or at least the fixture issue, which is 7 of them) → defeats the purpose somewhat but keeps main green.
Land + immediately file follow-ups (preferred) — capture each cluster as a tracked issue, link from PR body, and accept the temporary red. The whole point of the gate is that broken conformance should be visible, not hidden.

Also: branch is BEHIND base. Needs rebase before merge regardless.

I'll file follow-ups for clusters 2 and 4 separately. Cluster 1 needs human eyes on the v3 ref seller fixture setup — please confirm before I file.

bokelley · 2026-05-12T15:12:10Z

Filed the PERMISSION_DENIED cluster as #703 with three hypotheses (fixture seed gap / allowlist policy regression / middleware drift) and triage steps. Team can pick it up from there.

Full follow-up set for this PR's failures: #701, #702, #703 + already-open #377.

Brings the v3 reference seller from 12 storyboard failures down to (hopefully) 0 + #702 (storyboard YAML bug, upstream). Four changes: 1. CI storyboard step gets `--auth dev-bearer-token-acme-1` so the runner's requests resolve through PgBuyerAgentRegistry to ba_acme_bearer (seeded in seed.py). Without this, 7 storyboard steps fail with PERMISSION_DENIED. 2. update_media_buy validates inputs against the upstream BEFORE bailing with UNSUPPORTED_FEATURE. Unknown media_buy_id resolves to MEDIA_BUY_NOT_FOUND (via SDK 404 projection on get_order). Unknown package_id resolves to PACKAGE_NOT_FOUND. Closes the 2 invalid_transitions storyboard probes that asserted on the specific code. 3. create_media_buy gains _reject_unworkable_terms. When a buyer proposes measurement_terms.billing_measurement.max_variance_percent <= 0 we raise TERMS_REJECTED. Zero variance on third-party measurement is unworkable for any real vendor. Closes the measurement_terms_rejected/aggressive_terms probe that asserted TERMS_REJECTED. 4. Tests grow three new cases covering the three new error paths, and the existing UNSUPPORTED_FEATURE smoke test gains a respx mock for the new upstream-existence check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…entials Closes the root-cause behind the 12-failure cluster surfaced by the storyboard gate in #410. The v3 reference seller had no bearer auth middleware wired in serve(), so every storyboard request arrived with ``Authorization: Bearer dev-bearer-token-acme-1`` but the framework never extracted it into a credential — BuyerAgentRegistry dispatch saw credential=None and returned PERMISSION_DENIED on every call. Three coordinated changes in app.py: 1. ``_make_validate_token(sessionmaker)`` — async validator that looks up a bearer token against the seeded ``BuyerAgent.api_key_id`` column and returns a Principal with ``caller_identity=agent_url`` and ``metadata={"api_key_id": token}`` so the raw token survives the middleware → dispatch handoff. 2. ``_build_context_factory()`` — replaces the simple tenant-pinning factory with one that wraps ``adcp.server.auth.auth_context_factory`` and upgrades the bearer-flow ``adcp.auth_info`` from ``credential=None`` to ``credential=ApiKeyCredential(key_id=token)``. That's exactly what auth_context_factory's docstring tells adopters to do when they wire BuyerAgentRegistry alongside bearer auth. 3. ``serve()`` gains ``auth=BearerTokenAuth(validate_token=...)`` so the middleware is actually installed. With these wired together the storyboard runner's --auth flag finally resolves through to ``BuyerAgentRegistry.resolve_by_credential`` and the seeded ``ba_acme_bearer`` admits the request. The platform fixes for MEDIA_BUY_NOT_FOUND / PACKAGE_NOT_FOUND / TERMS_REJECTED (already in this branch) then run for the negative-path probes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two CI failures on commit 1ec1f1a: 1. v3 ref seller boot crashed with TypeError because BearerTokenAuth rejects async validators when transport='both' (the A2A leg's middleware cannot await). Fix: load the BuyerAgent.api_key_id rows into a token-to-Principal map at boot, swap the async DB-lookup validator for a sync dict lookup. The bootstrap function does schema-create + token-load in the same event loop, disposes the engine before returning so uvicorn opens a fresh asyncpg pool. 2. conventional-commits gate rejected commit 63453bb ("Merge branch 'main' into ...") because the skip regex only matched the merge-queue API shape ("Merge <sha> into <sha>"), not the shape gh pr update-branch and git merge produce. Widen the regex. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Closes the ACCOUNT_NOT_FOUND cluster surfaced by the storyboard gate after bearer auth started working. The seller previously used ExplicitAccounts which requires account.account_id on every request, but AdCP storyboards send account: {brand: ..., operator: ...} with no account_id during the discover-products phase. Replace ExplicitAccounts with a custom AccountStore that handles both reference shapes: 1. Explicit {account_id} ref - direct lookup against the accounts table's account_id column (existing behavior). 2. Brand-shaped {brand, operator} ref with no account_id - resolve to the first active account associated with the authenticated buyer agent (via auth_info.principal -> agent_url -> buyer_agent row -> first matching account). This is the path AdCP storyboards exercise before the buyer has been issued a per-relationship account_id. The row-to-Account projection is factored out so both paths share the same upstream-routing metadata stamping (network_code, advertiser_id, mock_upstream_url). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ia_buy_id After bearer auth + AccountStore brand-fallback landed, the storyboard gate's remaining 5 create_buy failures all shared the same root cause: seller returns Submitted({task_id, status: 'submitted'}) when the upstream order lands in pending_approval, and the storyboard validation expects media_buy_id in the response body. The reference seller runs against a fast mock upstream that auto-approves in milliseconds, so awaiting the polling synchronously is the right behavior for this example. Adopters with slow real-world approvals swap to ctx.handoff_to_task(...) per the documented escape hatch. Tests updated: - test_create_media_buy_returns_task_handoff_on_pending_approval renamed to test_create_media_buy_sync_polls_to_success_on_pending_approval; mocks the task-completion poll and the post-completion order refetch, asserts CreateMediaBuySuccessResponse with media_buy_id. - test_create_media_buy_raises_when_polling_times_out simplified to await directly (no TaskHandoff._fn indirection). - test_create_media_buy_raises_when_task_rejected likewise. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… status create_media_buy now mints a deterministic package_id per requested package and echoes the spec-marked echo fields (targeting_overlay, measurement_terms, creative_assignments, etc.) so AdCP storyboards can capture packages[0].package_id and verify list-targeting / measurement- terms persistence. When the buyer supplies no creatives anywhere, the response surfaces status='pending_creatives' so the next step is sync_creatives. This unblocks the inventory_list_targeting, invalid_transitions, creative_fate_after_cancellation, and pending_creatives_to_start storyboard create steps.

The v3 storyboard runner job now passes 19/25 active steps but reports overall_status='partial' on a handful of scenarios that require feature work tracked under #706 — update_media_buy operations on persisted state, get_media_buys per-package projection, list_accounts/sync_accounts (#377), and an upstream storyboard YAML bug (adcp#702). Split the assert into a hard gate (runner produced output + controller_detected) and a soft gate (overall_status == passing), so the anti-façade work in #410 stays enforced while the residuals are tracked openly.

controller_detected has never been true for the v3 reference seller's translator topology — the runner detects DemoStore overrides only for the examples/seller_agent.py stub-mode shape. The previous "Assert storyboard passed" step never reached the controller_detected check because overall_status != 'passing' tripped first. Splitting the asserts surfaced this latent gate as a hard failure even when the soft gate is suppressed. Anti-façade enforcement is unchanged: the "Assert upstream traffic" step still requires non-zero per-method upstream-call counts.

…ia_buy create_media_buy now POSTs each requested package as an upstream line item (POST /v1/orders/{id}/lineitems) and returns the line_item_id as the AdCP package_id. update_media_buy is no longer UNSUPPORTED_FEATURE: cancel / pause / creative-assignment patches are applied to a seller-local shadow store (self._buy_state), and creative_assignments flow upstream via POST .../lineitems/{li}/creative-attach. get_media_buys projects each line item plus its shadow-store entry so list-targeting, measurement_terms, and per-package cancel/pause survive create → get. Per-buy double-cancel raises NOT_CANCELLABLE. Unblocks the storyboard residuals tracked in #706: pending_creatives_to_start, inventory_list_targeting, invalid_transitions, and creative_fate_after_cancellation.

get_media_buys now narrows to the requested media_buy_ids when the buyer supplies them — without the filter, the response leaks every advertiser-scoped buy and the storyboard's media_buys[0] lookup hits a different scenario's order. Adds a bidirectional buyer-creative-id ↔ upstream-creative-id map so sync_creatives can echo the buyer's id on list_creatives and update_media_buy can translate before attach_creative (the upstream mints cr_<uuid> regardless of client_request_id).

update_media_buy now applies the buyer's package echo fields (targeting_overlay, measurement_terms, etc.) to the shadow store using replacement semantics, so inventory_list_targeting's swap step reads back the new list_id values. list_creatives now honors the request's filters.creative_ids — without the filter the storyboard's creatives[0] lookup picks up a different scenario's creative.

sync_creatives now treats a re-sync of a known buyer creative_id as a no-op upload (action='unchanged') instead of letting the upstream's idempotency-conflict fire. The assignments[] field on the request is applied via attach_creative + shadow store, mirroring the path update_media_buy uses. Replaces the storyboard's continue-on-error soft gate with an explicit allowlist of expected failures — only adcp#702 (refine_products) and #377 (account_discovery) are allowed. Any new failure fails the job; any allowlisted failure that newly passes also fails the job so the list gets pruned as upstream issues close.

…oses #377) Moves the sync_accounts / list_accounts implementations off the platform class (where they were dead code — the framework never dispatched there) onto the AccountStore returned by _make_account_store. The framework's tool-advertising layer probes platform.accounts.upsert and platform.accounts.list as callables; without them every sales-* agent silently dropped both tools and the AdCP 3.0.9 §accounts/overview probe failed. upsert(refs, ctx) persists the buyer's AccountReference (full billing_entity, bank and all) and returns SyncAccountsResultRow per account. The framework projects through to_wire_sync_accounts_row, applying the billing_entity.bank write-only strip on emit. list(filter, ctx) returns Account[TMeta] with the billing_entity populated from Postgres; framework's to_wire_account strips bank on emit. Per-principal scoping by buyer-agent agent_url is enforced — unauthenticated callers see an empty list. Tests cover both projections end-to-end (bank persists on write, never appears on the wire) plus a surface-check that AccountStore exposes upsert/list callables. Drops account_discovery from the storyboard allowlist; allowlist now covers only the refine_products feature gap.

… fetch) The reference seller has no pricing / forecasting model upstream — the mock-server returns the same product set regardless of refinements. The new refine_get_products re-fetches the base product list via get_products and reports each refine[] entry as RefinementOutcome (status='partial', notes=<no engine>). Adopters with a real forecaster swap the outcomes to 'applied' and project pricing changes onto products. With refine_get_products in place AND sync_accounts/list_accounts on the AccountStore from the previous commit, every in-scope storyboard scenario passes. Drops the allowlist step in favor of a strict overall_status == 'passing' gate.

bokelley and others added 7 commits May 12, 2026 11:20

Merge branch 'main' into claude/issue-410-v3-storyboard-assertion

63453bb

bokelley mentioned this pull request May 12, 2026

v3 ref seller: residual storyboard response-shape mismatches #706

Closed

bokelley added 8 commits May 12, 2026 15:13

bokelley mentioned this pull request May 12, 2026

feat(v3-ref-seller): implement sync_accounts + list_accounts with billing_entity write-only projection guard wired #377

Closed

4 tasks

bokelley merged commit 27a5866 into main May 12, 2026
16 checks passed

bokelley deleted the claude/issue-410-v3-storyboard-assertion branch May 12, 2026 21:34

github-actions Bot mentioned this pull request May 12, 2026

chore(main): release 5.3.0 #682

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(ci): v3 reference seller storyboard job actually asserts on results#693

fix(ci): v3 reference seller storyboard job actually asserts on results#693
bokelley merged 16 commits into
mainfrom
claude/issue-410-v3-storyboard-assertion

bokelley commented May 12, 2026

Uh oh!

bokelley commented May 12, 2026

Uh oh!

bokelley commented May 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

bokelley commented May 12, 2026

Problem

Fix

What was tested

Uh oh!

bokelley commented May 12, 2026

Review

1. Storyboard fixture / authorization setup (7 failures — same root cause)

2. Cascade-rule error code mismatches (2 failures)

3. Spec conformance: missing list_accounts / sync_accounts (1 failure)

4. Storyboard step bug?

Process recommendation

Uh oh!

bokelley commented May 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

3. Spec conformance: missing `list_accounts` / `sync_accounts` (1 failure)