From 534668732247567f822d9e666842f9adc6562efa Mon Sep 17 00:00:00 2001 From: Brian O'Kelley Date: Sun, 3 May 2026 19:38:32 -0400 Subject: [PATCH 1/7] docs(proposals): salesagent side-car experiment plan (full GAM) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Falsifies #502's two-platform composition against salesagent + live GAM. Side-car runtime in a salesagent worktree, same DB, untouched admin UI; binary exit criterion is the AdCP media_buy_seller storyboard passing against a sandbox GAM Network ID. In scope: real GAM upstream (wrap, don't port), HITL approval lifecycle (compose_method + ShortCircuit against salesagent's actual WorkflowStep flow), webhook delivery via SDK F12 auto-emit. The HITL mapping is grounded in salesagent's existing gate at google_ad_manager.py:571 + execute_approved_media_buy at media_buy_create.py:458 — no paused-task resumption needed; the SDK's TaskRegistry doesn't participate. Five concrete questions the experiment answers, including the resumption-marker shape (typed ctx.resumption_token vs. adopter-side sentinel) and whether dynamic_products.py factors onto ProposalManager.get_products without gutting it. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../salesagent-sidecar-experiment.md | 376 ++++++++++++++++++ 1 file changed, 376 insertions(+) create mode 100644 docs/proposals/salesagent-sidecar-experiment.md diff --git a/docs/proposals/salesagent-sidecar-experiment.md b/docs/proposals/salesagent-sidecar-experiment.md new file mode 100644 index 000000000..ae6f1fc4d --- /dev/null +++ b/docs/proposals/salesagent-sidecar-experiment.md @@ -0,0 +1,376 @@ +# Experiment: salesagent side-car runtime (full GAM) + +Status: **PROPOSED**. Falsifies [#502](https://github.com/adcontextprotocol/adcp-client-python/pull/502)'s +two-platform composition against a real adopter codebase + a live ad +server. Runs in a salesagent worktree, points at salesagent's existing +Postgres, drives real GAM. Admin UI untouched. + +## Hypothesis + +The four-layer model + two-platform composition (`ProposalManager` + +`DecisioningPlatform`) in [#502](https://github.com/adcontextprotocol/adcp-client-python/pull/502) +holds up when ported on top of salesagent's actual code and database, +with GAM as the live decisioning target. Either it ports cleanly and +we ship the Protocol shape the experiment forces, or it doesn't and +we learn what's wrong before the architecture solidifies. + +The narrative-level claim in [PR #489](https://github.com/adcontextprotocol/adcp-client-python/pull/489) +is that adopters port to `SalesPlatform` today and split along the +proposal/decisioning seam later when `ProposalManager` lands. The +experiment runs that exact split — except we force it now, with +`ProposalManager` designed inline against salesagent's complexity. + +## Why GAM, not mock-mode + +Mock-mode would tell us if the wire shape works. It wouldn't tell us +whether the SDK can host salesagent's actual upstream complexity — +which is the whole question. GAM is where ~99% of salesagent's +deployed value lives (per the migration guide §"Migration order"). +The dynamic-product assembly, the `implementation_config` shape, the +line item / ad unit / KV-targeting plumbing, the lifecycle state +graph — all of it is GAM-shaped today. If our architecture survives +contact with GAM, it survives. If it doesn't, mock conformance was +never going to tell us. + +## Slice + +**One salesagent tenant, configured for GAM, serving the AdCP +`media_buy_seller` storyboard end-to-end against a sandbox GAM Network +ID.** Other tenants stay on the existing runtime. Same process, same +DB, routed by tenant. + +In scope: + +* **Salesagent admin UI** (`src/admin/`) — UNCHANGED. Operators + configure tenants, products, principals through existing Flask + blueprints. +* **Salesagent Postgres schema** (`src/core/database/models.py`) — + UNCHANGED. The side-car runtime reads/writes the same tables. +* **A new `src/sdk_runtime/` directory** in salesagent, owning: + - `BuyerAgentRegistry` impl projecting from `Principal` rows + (`access_key_hash`, `oauth_client_id`) + - `AccountStore` impl projecting account context from `Principal` + + `Tenant` rows + - `GAMProposalManager` — reads `Product` rows, runs the + `dynamic_products.py` (`src/services/dynamic_products.py`) + assembly logic, emits wire `Product[]` with + `implementation_config` recipes + - `GAMDecisioningPlatform` — wraps salesagent's existing + `google_ad_manager.py` adapter; calls real GAM + - `adcp.serve(...)` entrypoint mounted alongside the existing A2A + + MCP servers, routing the experiment tenant only +* **Real GAM upstream** — actual orders/line items in a sandbox + network. Real auth, connection pooling, error projection. The + existing GAM adapter handles upstream calls; we wrap it. +* **HITL approval lifecycle** — the + `compose_method` + `ShortCircuit` claim from #489 §3.9 against + salesagent's actual approval flow. + + **How salesagent does HITL today.** Synchronous gate + DB-persisted + re-entry, not a paused coroutine. + + 1. Adapter's `create_media_buy` (`google_ad_manager.py:571`) + checks `_requires_manual_approval(op) and not getattr(request, + "_already_approved", False)`. If gated: writes a `WorkflowStep` + row (status `"pending_approval"`), an `ObjectWorkflowMapping` + linking step → media buy, and a `MediaBuy` row carrying + `raw_request` as JSON. Returns 200 with `workflow_step_id` + populated. **No GAM call has happened.** + 2. Operator approves through admin UI. Flask route + `approve_workflow_step` (`admin/blueprints/workflows.py:155`) + directly imports `execute_approved_media_buy` + (`media_buy_create.py:458`) — same process, same DB. + 3. `execute_approved_media_buy` reconstructs the request from + `raw_request`, sets `request._already_approved = True` + (`media_buy_create.py:529`), re-calls the adapter. The gate + skips this time; real GAM call fires. + + **What that means for the SDK runtime.** No paused-task + resumption needed. Salesagent's `WorkflowStep` table IS its task + registry; we keep using it. Mapping: + + - `before` hook on `create_media_buy` / + `update_media_buy` / `add_creative_assets` checks + `manual_approval_required`. If true: calls + salesagent's existing `workflow_manager` to write the + `WorkflowStep` + `MediaBuy(status='pending_approval', + raw_request=...)` rows, returns + `ShortCircuit(value=CreateMediaBuySuccess( + status='pending_approval', workflow_step_id=...))`. + - Admin UI unchanged. `approve_workflow_step` keeps calling + `execute_approved_media_buy(...)`. + - Rewrite the body of `execute_approved_media_buy`: reconstruct + request, attach a resumption marker (the + `_already_approved` sentinel, or a typed equivalent on `ctx`), + call `sdk_runtime.create_media_buy(req, ctx)` directly. Gate + sees the marker, falls through, real GAM call happens, SDK F12 + auto-emit fires the completion webhook. + + **The interesting design question.** Salesagent's + `_already_approved` is an untyped `setattr` sentinel — fine for + one codebase, fragile as an SDK seam. The experiment forces a + decision: does the SDK ship a typed resumption marker (e.g., + `ctx.resumption_token: ResumptionToken | None`), or does the + before-hook check on adopter-defined state? Whatever shape + works in this experiment is the recommended Protocol seam. +* **Webhook delivery via SDK F12 auto-emit.** Disable salesagent's + `protocol_webhook_service.py` for the experiment tenant; configure + `WebhookSender` on `serve(...)`. We're testing whether buyer- + registered `push_notification_config` gets sync-completion webhooks + fired automatically, signed correctly, retried on transient + failure, logged-and-swallowed on permanent failure — without + adapter code participating. The migration guide §3.14 claims + adopters delete their webhook plumbing wholesale; the experiment + validates that against a real workload. + +Out of scope (deliberate): + +* Other adapters (Kevel, Broadstreet, Triton, mock — keep on the + existing runtime). +* Other tenants. +* Refine flow / proposal lifecycle (`finalize`, `expires_at`, + draft → committed). Wire later if v1 is stable. +* Push reporting (`reporting_webhook` / `reporting_bucket`). +* Creative agent / creative builder. + +## Architecture + +``` +┌────────────────────────────────────────────────────┐ +│ Salesagent admin (Flask blueprints) — unchanged │ +└──────────────────┬─────────────────────────────────┘ + │ writes + ↓ +┌────────────────────────────────────────────────────┐ +│ Salesagent Postgres — unchanged schema │ +│ Tenant, Principal, Product, MediaBuy, ... │ +└──────────────────┬─────────────────────────────────┘ + │ reads + ↓ +┌────────────────────────────────────────────────────┐ +│ src/sdk_runtime/ — the experiment │ +│ BuyerAgentRegistry ← Principal rows │ +│ AccountStore ← Principal + Tenant rows │ +│ GAMProposalManager ← Product rows + dynamic │ +│ GAMDecisioningPlatform → wraps gam adapter │ +│ adcp.serve(transport='both') │ +└──────────────────┬─────────────────────────────────┘ + │ + ↓ + real GAM upstream +``` + +The existing `adcp_a2a_server.py` (2,276 LOC) and +`mcp_server_enhanced.py` keep serving every other tenant. The +experiment tenant routes to `adcp.serve(...)` only. + +Routing strategy: a tiny shim at the framework entry consults +`Principal.tenant_id` and dispatches to either the existing runtime +or the SDK runtime. One process, two runtimes, clean handoff at the +tenant boundary. + +## Wrap, don't port + +`media_buy_create.py` is 3,930 LOC. Porting it body-by-body into +`GAMDecisioningPlatform.create_media_buy` is the failure mode. The +discipline: **wrap the existing tool body**. The SDK runtime takes +the wire request, projects to salesagent's internal shape, calls the +existing tool function (or the GAM adapter directly), projects the +response back to wire shape. + +Same rule for `dynamic_products.py` (505 LOC): wrap, don't port. +`GAMProposalManager.get_products` calls `dynamic_products.py` with +salesagent's expected inputs, gets back its native output, projects +to wire `Product[]`. + +This isolates the experiment to **the seam**: where wire requests +become salesagent-shaped calls and back. We learn whether the SDK +abstractions can host salesagent's existing logic without re-porting +it. If a wrapper feels forced, that's the architecture telling us +something. If it ports as a thin shim, the architecture holds. + +## Exit criteria (binary) + +The AdCP `media_buy_seller` storyboard passes against the experiment +tenant, with: + +1. Real GAM line items / orders created in the sandbox network. +2. Salesagent's existing Postgres schema as the data backing. +3. Salesagent admin UI untouched and functional. +4. No double-write conflicts: every write for the experiment tenant + goes through the SDK runtime; the existing tools don't serve + that tenant. + +If the storyboard passes: the architecture holds. We have a working +`ProposalManager` shape that emerged from real adopter code, ready +to factor into the SDK Protocol. + +If it fails: we learn precisely where #502's model breaks — +proposal-side (assembly factoring), decisioning-side (recipe shape), +seam (capability overlap, lifecycle, hydration), or supporting (auth +projection, schema mismatch). + +## What we expect to learn + +Three concrete questions the experiment answers: + +1. **Does `dynamic_products.py` (`src/services/dynamic_products.py`, + 505 LOC) factor onto `ProposalManager.get_products`?** + Salesagent's signal-driven assembly is the most complex piece of + proposal logic in the wild. If it ports as a thin wrapping + (target: <300 LOC of glue), the proposal-side abstraction is + correct. If it requires gutting `dynamic_products.py` itself, the + abstraction is wrong. + +2. **Does the recipe carry enough?** GAM's `implementation_config` + (`Product.implementation_config: JSONType`, + `models.py:256`) is the most-evolved form: line item template + ids, ad unit ids, KV targeting, frequency caps, signal mappings. + If the recipe shape we settle on can carry it without escape + hatches (no `extra: dict[str, Any]` smuggled fields), the + typed-recipe model in #502 is sound. + +3. **What does framework-owned proposal lifecycle actually need?** + When `create_media_buy` runs, it needs to hydrate the recipe from + somewhere. Three plausible answers (session cache, DB row, fresh + lookup); the experiment forces a choice. The choice IS the + design for the `ProposalManager` Protocol. + +4. **What is the right shape for the resumption marker?** Salesagent's + HITL model (gate → persist → admin approves → re-call same code + path with sentinel) maps cleanly onto `compose_method` + + `ShortCircuit` without needing the SDK's `TaskRegistry` to round-trip. + The gate writes salesagent's `WorkflowStep` row; the admin UI keeps + calling `execute_approved_media_buy`; that function re-enters the + SDK runtime with a marker. The open seam is the marker itself — + salesagent uses an untyped `setattr(request, "_already_approved", + True)`, fine for one codebase, fragile as an SDK contract. Two + plausible shapes: (a) typed `ctx.resumption_token: ResumptionToken + | None` carried through framework dispatch, the before-hook checks + it; (b) keep the marker adopter-side entirely, the SDK doesn't + model it. The experiment forces a choice; the choice IS the + recommended Protocol seam for cross-cutting `compose_method` gates + in adopters with persisted-and-re-fired approval flows. Sub-question + to resolve in passing: does `compose_method`'s `before` hook need + read access to `ctx` rich enough to make this decision (it should). + +5. **Does F12 webhook auto-emit hold up under real load?** The + migration guide claims adopters delete their webhook plumbing. + The experiment runs that — `protocol_webhook_service.py` off for + the experiment tenant, `WebhookSender` on. If signing, retry, + and failure handling all work without adapter code, the claim + stands. If anything leaks through to adapter code, we revise + §3.14. + +## Risks + +* **Scope creep on the wrap.** The temptation to "just port this + small piece" compounds. Mitigation: every wrap stays a wrap until + the storyboard passes. Re-porting is a follow-up, not part of v1. +* **GAM credential handling.** Need a sandbox GAM Network ID with + real auth. Salesagent already has this in dev; experiment + inherits. Document the env vars in the experiment README. +* **HITL resumption marker shape.** Salesagent's untyped + `_already_approved` sentinel is what re-entry uses today. Lifting + that to a typed SDK contract (or deciding to leave it adopter-side) + is a real design call, not just plumbing. Wrong choice here means + every adopter with a persisted-approval flow re-invents the + marker. Mitigation: prototype both shapes in the experiment; + whichever felt natural at the gate site wins. +* **Webhook signing config.** SDK `WebhookSender` and salesagent's + existing `webhook_authenticator.py` need to agree on signing + semantics, or the buyer rejects deliveries. Validate against a + test buyer before the storyboard run. +* **Auth projection drift.** If `Principal.access_key_hash` doesn't + project cleanly onto `BuyerAgentRegistry`, the foundation in + #489 §"Foundations" is wrong. Catch this early — write the + registry shim first, before anything else. +* **Two writers, one DB — controlled.** The experiment tenant is a + test tenant we own and is not actively serving real buyers, so + duplication is operationally fine. Tenant routing still enforces + the boundary in code (experiment tenant never hits old runtime, + old tenants never hit new runtime), but the consequence of a + routing bug is "we notice and fix it," not "we corrupt a buyer's + campaign." +* **The storyboard hits a feature we deferred.** If `media_buy_seller` + exercises refine or push reporting on the happy path, scope + expands. Pin the storyboard version up front; test against it + locally before declaring v1 done. + +## Workstream + +1. **Set up the worktree.** New salesagent worktree for the + experiment. Install adcp-client-python from `main` (or a working + branch) into salesagent's venv. +2. **Auth shim first.** `BuyerAgentRegistry` + `AccountStore` reading + from `Principal` + `Tenant`. Validate against existing test + fixtures before anything else lands. +3. **`GAMDecisioningPlatform` wrapper.** Thinnest possible wrap of + `google_ad_manager.py`. `create_media_buy` and + `get_media_buy_delivery` only. +4. **`GAMProposalManager` wrapper.** Thinnest possible wrap of + `dynamic_products.py` + the `Product` catalog read. +5. **HITL gate via `compose_method`.** `before` hook on + `create_media_buy` / `add_creative_assets` / `update_media_buy` + that consults salesagent's `manual_approval_required` config, + writes the `WorkflowStep` + `MediaBuy(status='pending_approval', + raw_request=...)` rows via salesagent's existing `workflow_manager`, + and returns `ShortCircuit(value=CreateMediaBuySuccess( + status='pending_approval', workflow_step_id=...))`. Rewrite the + body of `execute_approved_media_buy` (`media_buy_create.py:458`) + to reconstruct the request, attach the resumption marker, and + call back into the SDK runtime's `create_media_buy`. Admin UI + route untouched. Prototype both marker shapes (typed + `ctx.resumption_token` vs. adopter-side sentinel); record which + reads cleanly at the gate site. +6. **`WebhookSender` configured on `serve(...)`.** Disable + `protocol_webhook_service.py` for the experiment tenant. Verify + sync-completion webhooks fire after every mutating tool call, + signed correctly, against a test buyer that validates signature. +7. **Routing shim.** Per-tenant dispatch between old and new + runtimes inside the same process. +8. **Storyboard run.** AdCP `media_buy_seller` against the + experiment tenant; sandbox GAM; admin UI verified untouched. + Approval-required path exercised at least once. +9. **Findings doc.** What ported cleanly, what didn't, what the + `ProposalManager` Protocol should look like based on what worked, + and which resumption-marker shape (typed + `ctx.resumption_token` vs. adopter-side sentinel) reads cleanly + at the `compose_method` gate site. + +## Next steps after experiment + +If exit criteria pass: + +1. Factor the `ProposalManager` Protocol from the experiment into + adcp-client-python (separate PR). +2. Land `MockProposalManager` forwarder per #502. +3. Update [#489](https://github.com/adcontextprotocol/adcp-client-python/pull/489) + §3.3 with experiment findings; remove "ProposalManager will own X" + hedging where the experiment proved it. +4. Storyboard the experiment as a worked example in + `examples/salesagent_sidecar/` (with credentials redacted). + +If exit criteria fail: + +1. Document where the model broke in + `docs/proposals/product-architecture-revision-1.md`. +2. Revise [#502](https://github.com/adcontextprotocol/adcp-client-python/pull/502) + against the real failure mode. +3. Rerun the experiment against the revised model. + +## References + +* [#502](https://github.com/adcontextprotocol/adcp-client-python/pull/502) + / `docs/proposals/product-architecture.md` — the architecture this + experiment falsifies. +* [#489](https://github.com/adcontextprotocol/adcp-client-python/pull/489) + / `examples/multi_platform_seller/MIGRATION_FROM_ADAPTER_REGISTRY.md` + — the migration narrative this experiment validates against real code. +* [`prebid/salesagent`](https://github.com/prebid/salesagent) — the + adopter codebase the experiment runs inside. Key files: + `src/core/database/models.py` (2,113 LOC), + `src/services/dynamic_products.py` (505 LOC), + `src/core/tools/media_buy_create.py` (3,930 LOC), + `src/adapters/google_ad_manager.py`, + `src/a2a_server/adcp_a2a_server.py` (2,276 LOC). From c16614fc7d85251fec6a5f1ee1bba6853f23bb3e Mon Sep 17 00:00:00 2001 From: Brian O'Kelley Date: Sun, 3 May 2026 19:55:32 -0400 Subject: [PATCH 2/7] =?UTF-8?q?docs(proposals):=20revise=20experiment=20pl?= =?UTF-8?q?an=20v2=20=E2=80=94=20address=20self-review?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Major changes addressing self-review on PR #506: 1. Two-phase structure. Phase 1 (~1 day): falsify recipe model on dynamic_products.py in isolation, no infrastructure. Phase 2 (3-5 days, gated on Phase 1): full side-car runtime with GAM live + HITL + webhooks. The cheapest available falsifier of #502 runs first; full experiment only if Phase 1 finds the recipe shape can carry signal-driven variants without escape hatches. 2. Wrap target corrected. Wrap salesagent's _impl seams (_create_media_buy_impl, _update_media_buy_impl, _add_creative_assets_impl, _get_products_impl), NOT adapter bodies. Self-review correctly flagged that wrapping google_ad_manager.py forces re-implementing principal resolution, tenant config, currency validation, signal lookup, audit logs — a port disguised as a wrap. 3. Routing strategy: separate process + nginx tenant-header routing, NOT in-process dispatch. Avoids dual MCP+A2A transport mounts, FastMCP registry collisions, ResolvedIdentity construction conflicts. Killable, independently restartable. 4. Step 0 with concrete prereqs: pin SHAs (SDK, both storyboards, GAM Network ID); identify _impl seams; enumerate AST guards that fire on src/sdk_runtime/; enumerate cross-tenant background services to disable; validate _already_approved sentinel under extra='forbid'; verify webhook signing parity against subscribed test buyer; pre-register falsification signals. 5. Multi-criteria exit (not binary): both storyboards pass, recipe carries implementation_config without extra: dict escape hatches, glue LOC under 60% ratio, zero structural-guard allowlist additions, at least one finding contradicts a #502 prior, webhook signature verified by real subscribed buyer. 6. HITL marker decision decoupled. Salesagent's pattern is N=1 and unrepresentative; experiment informs but doesn't settle the Protocol seam. Spec PR comes after as separate work. 7. Q1.5 added: does recipe model allow proposal-time assembly, not just lookup? implementation_config lives on Product row; dynamic products generate signal-driven variants at brief time. #502's session-cache model may be too restrictive. 8. Three HITL re-entry surfaces in scope (create/update/ add_creative_assets). Creative-specific re-entry through order_approval_service.py out of scope for v1. 9. Both storyboards pinned: media_buy_seller for happy path + media_buy_guaranteed_approval for HITL. Test-controller hybrid mode for delivery simulation alongside real GAM mutations. 10. Auth shim sized correctly per schema audit (~150 LOC total — Principal is bearer-token only, Account is already AdCP-shaped). Co-Authored-By: Claude Opus 4.7 (1M context) --- .../salesagent-sidecar-experiment.md | 837 +++++++++++------- 1 file changed, 538 insertions(+), 299 deletions(-) diff --git a/docs/proposals/salesagent-sidecar-experiment.md b/docs/proposals/salesagent-sidecar-experiment.md index ae6f1fc4d..46ec119b2 100644 --- a/docs/proposals/salesagent-sidecar-experiment.md +++ b/docs/proposals/salesagent-sidecar-experiment.md @@ -1,9 +1,10 @@ # Experiment: salesagent side-car runtime (full GAM) -Status: **PROPOSED**. Falsifies [#502](https://github.com/adcontextprotocol/adcp-client-python/pull/502)'s +Status: **PROPOSED v2** (revised after self-review on PR #506). +Falsifies [#502](https://github.com/adcontextprotocol/adcp-client-python/pull/502)'s two-platform composition against a real adopter codebase + a live ad -server. Runs in a salesagent worktree, points at salesagent's existing -Postgres, drives real GAM. Admin UI untouched. +server. Two phases — a cheap recipe-shape falsification first, full +runtime gated on it passing. ## Hypothesis @@ -11,8 +12,8 @@ The four-layer model + two-platform composition (`ProposalManager` + `DecisioningPlatform`) in [#502](https://github.com/adcontextprotocol/adcp-client-python/pull/502) holds up when ported on top of salesagent's actual code and database, with GAM as the live decisioning target. Either it ports cleanly and -we ship the Protocol shape the experiment forces, or it doesn't and -we learn what's wrong before the architecture solidifies. +we have a Protocol shape that emerged from real complexity, or it +doesn't and we learn what's wrong before the architecture solidifies. The narrative-level claim in [PR #489](https://github.com/adcontextprotocol/adcp-client-python/pull/489) is that adopters port to `SalesPlatform` today and split along the @@ -20,344 +21,577 @@ proposal/decisioning seam later when `ProposalManager` lands. The experiment runs that exact split — except we force it now, with `ProposalManager` designed inline against salesagent's complexity. -## Why GAM, not mock-mode +## Two phases -Mock-mode would tell us if the wire shape works. It wouldn't tell us -whether the SDK can host salesagent's actual upstream complexity — -which is the whole question. GAM is where ~99% of salesagent's -deployed value lives (per the migration guide §"Migration order"). -The dynamic-product assembly, the `implementation_config` shape, the -line item / ad unit / KV-targeting plumbing, the lifecycle state -graph — all of it is GAM-shaped today. If our architecture survives -contact with GAM, it survives. If it doesn't, mock conformance was -never going to tell us. +Self-review surfaced that the original single-phase plan front-loaded +3-5 days of expensive infrastructure work (GAM credentials, admin UI +isolation, webhook signing, runtime routing) before testing the +question that's most likely to falsify #502 — whether the recipe +model can carry salesagent's dynamic-product assembly without escape +hatches. The cheapest available falsifier doesn't need any of that +infrastructure. + +**Phase 1 — recipe falsification (~1 day, no infrastructure).** +Port `dynamic_products.py` to a `ProposalManager.get_products` +wrapper in isolation. No GAM, no admin UI, no DB rewiring, no +runtime routing. Drive it from recorded signal-agent fixtures. +Assert the output recipe carries every signal-driven variant +salesagent generates without typed escape hatches. If this dies, +the side-car experiment doesn't run; #502 needs revision first. -## Slice +**Phase 2 — full side-car runtime (3-5 days, gated on Phase 1).** +Salesagent worktree, side-car process driving real GAM, HITL +exercised through `media_buy_guaranteed_approval`, webhooks +delivered through SDK F12 auto-emit and verified by a real +subscribed buyer. -**One salesagent tenant, configured for GAM, serving the AdCP -`media_buy_seller` storyboard end-to-end against a sandbox GAM Network -ID.** Other tenants stay on the existing runtime. Same process, same -DB, routed by tenant. +Phase 1 gates Phase 2. Phase 2 only runs if Phase 1 produced findings +consistent with #502. + +## Why GAM live (Phase 2 only) + +Mock-mode would tell us if the wire shape works. It wouldn't tell us +whether the SDK can host salesagent's actual upstream complexity — +which is the whole question for Phase 2. GAM is where ~99% of +salesagent's deployed value lives. The dynamic-product assembly, +recipe shape, lifecycle state graph, HITL plumbing — all of it is +GAM-shaped today. If our architecture survives contact with GAM, it +survives. Mock conformance was never going to tell us. + +## Step 0 — Prereqs (both phases) + +These must land before any wrapping code is written. Several are +themselves unscoped today; some are concrete prereqs, some are +investigations whose output shapes the rest of the work. + +* **Pin SHAs.** `adcp-client-python@`, storyboard + `media_buy_seller@`, storyboard + `media_buy_guaranteed_approval@`, GAM sandbox Network ID + `` documented in the experiment README. Without pins, Phase 2's + "is it the storyboard or our wrap" debugging burns days. +* **Identify the `_impl` seams in salesagent.** Self-review correctly + flagged that wrapping `google_ad_manager.py` is a port disguised as + a wrap — the adapter doesn't own buyer/principal resolution, tenant + config, currency validation, signal lookup, audit logs, workflow + rows, or webhook scheduling. Salesagent's CLAUDE.md Pattern #5 + establishes `_impl` functions as the transport-agnostic seam: + `_create_media_buy_impl`, `_update_media_buy_impl`, + `_add_creative_assets_impl`, `_get_products_impl`. **The wrap target + is the `_impl`, not the adapter.** If any of these `_impl`s don't + exist or aren't transport-agnostic enough today, that's a + prerequisite refactor in salesagent before the side-car experiment + starts. +* **Enumerate AST structural guards** that fire on `src/sdk_runtime/`. + Salesagent has ~25 guards via `make quality` — no raw `select()` + outside repos, no `get_db_session()` in `_impl`, schema inheritance + from adcp library, repository pattern enforcement, single alembic + head. Decide: (a) earn allowlist entries for each (and document + each as an architectural finding), or (b) move `src/sdk_runtime/` + outside `src/` so the side-car isn't really inside salesagent. + Either is defensible; not deciding is not. +* **Enumerate cross-tenant background services to disable** for the + experiment tenant: `background_sync_service`, + `background_approval_service`, `delivery_webhook_scheduler`, + `protocol_webhook_service`. Per-tenant disable is doable but + invisible in the test tenant's behavior unless explicitly listed. + Document the disable mechanism (env var? tenant-scoped config? + process-level kill switch?) before relying on it. +* **Validate `_already_approved` survives SDK re-validation.** + Salesagent uses `setattr(request, "_already_approved", True)` on a + Pydantic model (`media_buy_create.py:529`). Salesagent runs + `extra="forbid"` (Pattern #7); the SDK runtime presumably + re-validates on inbound. Confirm the sentinel survives the + projection round-trip. If it doesn't, prototype the typed + alternative (`ctx.resumption_token`) on day 1, before the HITL gate + is wired. +* **Document a test buyer that validates HMAC signatures.** §3.14's + claim that adopters delete their webhook plumbing is only validated + if a real subscribed buyer accepts SDK-signed webhooks. Identify the + test buyer up front; verify signature parity between salesagent's + `webhook_authenticator.py` scheme and SDK F12 / `WebhookSender` + before the storyboard run, not during. + +## Phase 1 — `dynamic_products.py` recipe falsification + +**Slice.** Port `dynamic_products.py` +(`src/services/dynamic_products.py`, 505 LOC plus the AI services in +`src/services/ai/`) to a `ProposalManager.get_products` wrapper as a +self-contained unit. No GAM, no admin UI, no DB rewiring, no runtime +routing. Drive it from recorded inputs: + +* Recorded buyer briefs (synthetic or pulled from prod transcripts) +* Recorded signal-agent outputs (the + `signals_agent_registry.SignalsAgent.get_signals` returns) +* Recorded `Product` table rows (a fixture catalog) + +The wrapper takes a brief, calls into the existing +`dynamic_products.py` body, projects the resulting signal-driven +variant `Product` rows into wire `Product[]` with +`implementation_config` recipes per #502. + +**Exit criteria for Phase 1.** + +1. Recipe shape carries `implementation_config` for both static + catalog products AND signal-driven variants without + `extra: dict[str, Any]` typed escape hatches. Salesagent's actual + `implementation_config` content (line item template ids, ad unit + ids, KV targeting, signal mappings, frequency caps) is the + ground truth. +2. Glue LOC ≤ 60% of `dynamic_products.py` body. Source is 505 LOC, + so glue target is ≤ 300 LOC. **The 60% ratio is the falsifiable + threshold; the 300 absolute is a derived target. If the source + grows, the ratio holds.** +3. At least one finding that **contradicts or materially refines** a + #502 prior. The point of a falsification is to find something + wrong; an experiment that confirms every prior is uninformative. + Pre-register the candidate contradictions before running: + - Q1.5 (recipe assembly vs. lookup): if salesagent generates + signal-driven variants at brief time, that's proposal-time + *assembly*, not lookup of pre-existing recipes — and #502's + "framework session cache against `proposal_id`" model may be + too restrictive. + - Q2 (recipe escape hatches): if the wire-shape recipe can't + carry GAM's full `implementation_config` without `extra`, the + typed-recipe model is wrong. + +**If Phase 1 dies.** Document findings. Side-car experiment doesn't +run; #502 needs revision first. Cheap, fast falsification done its +job. + +**If Phase 1 passes.** Proceed to Phase 2 with the recipe model +validated. The side-car experiment focuses on the runtime, HITL, +webhook, and decisioning seams — questions Phase 1 can't answer. + +## Phase 2 — side-car runtime (gated on Phase 1) + +**One salesagent test tenant we own**, configured for GAM, serving +two storyboards end-to-end against a sandbox GAM Network ID: + +* `media_buy_seller` — the 9 core lifecycle scenarios, instant + approval path, exercises the wire shape end-to-end +* `media_buy_guaranteed_approval` — exercises HITL approval flow, + the `compose_method` + `ShortCircuit` claim against salesagent's + actual `WorkflowStep` mechanism In scope: * **Salesagent admin UI** (`src/admin/`) — UNCHANGED. Operators - configure tenants, products, principals through existing Flask - blueprints. + configure tenants, products, principals, approval queues through + existing Flask blueprints. * **Salesagent Postgres schema** (`src/core/database/models.py`) — UNCHANGED. The side-car runtime reads/writes the same tables. -* **A new `src/sdk_runtime/` directory** in salesagent, owning: - - `BuyerAgentRegistry` impl projecting from `Principal` rows - (`access_key_hash`, `oauth_client_id`) - - `AccountStore` impl projecting account context from `Principal` + - `Tenant` rows - - `GAMProposalManager` — reads `Product` rows, runs the - `dynamic_products.py` (`src/services/dynamic_products.py`) - assembly logic, emits wire `Product[]` with - `implementation_config` recipes - - `GAMDecisioningPlatform` — wraps salesagent's existing - `google_ad_manager.py` adapter; calls real GAM - - `adcp.serve(...)` entrypoint mounted alongside the existing A2A - + MCP servers, routing the experiment tenant only + Self-review correctly flagged this as a cross-runtime contract on + shared rows, not a tenant-isolated boundary: `WorkflowStep` rows + written by the SDK's `before` hook must be re-readable by admin + UI's `approve_workflow_step` and re-callable into the SDK runtime + via the rewritten `execute_approved_media_buy`. +* **A new side-car process** running adcp-client-python's + `adcp.serve(...)` with: + - `BuyerAgentRegistry` projecting from `Principal` rows + (`access_token` bearer lookup; ~80 LOC) + - `AccountStore` projecting from `Account` rows (already + AdCP-shaped — ~70 LOC) + - `GAMProposalManager` wrapping `_get_products_impl` (the + transport-agnostic seam, after Phase 1 validated the recipe + shape) + - `GAMDecisioningPlatform` wrapping `_create_media_buy_impl`, + `_update_media_buy_impl`, `_get_media_buy_delivery_impl` + - HITL `before` hook on all three mutating ops, plus + `_add_creative_assets_impl` (the **third** approval re-entry + surface — see HITL section below) + - `WebhookSender` configured with signing parity verified at + Step 0 against the test buyer * **Real GAM upstream** — actual orders/line items in a sandbox - network. Real auth, connection pooling, error projection. The - existing GAM adapter handles upstream calls; we wrap it. -* **HITL approval lifecycle** — the - `compose_method` + `ShortCircuit` claim from #489 §3.9 against - salesagent's actual approval flow. - - **How salesagent does HITL today.** Synchronous gate + DB-persisted - re-entry, not a paused coroutine. - - 1. Adapter's `create_media_buy` (`google_ad_manager.py:571`) - checks `_requires_manual_approval(op) and not getattr(request, - "_already_approved", False)`. If gated: writes a `WorkflowStep` - row (status `"pending_approval"`), an `ObjectWorkflowMapping` - linking step → media buy, and a `MediaBuy` row carrying - `raw_request` as JSON. Returns 200 with `workflow_step_id` - populated. **No GAM call has happened.** + network. Real auth (read from `AdapterConfig`), connection + pooling, error projection. + +* **HITL approval lifecycle (three surfaces).** + Self-review surfaced that `_already_approved` is one of three + re-entry surfaces, plus creative approval has its own re-entry + through `order_approval_service.py` and + `background_approval_service.py`. Either we exercise all of them + or we scope the others out explicitly. + + **Decision: in scope** — `create_media_buy`, `update_media_buy`, + `add_creative_assets`. Creative-approval-specific re-entry through + `order_approval_service.py` is **out of scope** for v1 (creative + flows are deferred regardless); revisit if `media_buy_guaranteed_approval` + storyboard happens to exercise it. + + **How salesagent does HITL today** (verified, file:line): + + 1. `_create_media_buy_impl` (or the GAM adapter at + `google_ad_manager.py:571`) checks + `gam_manual_approval_required` from `AdapterConfig` + (`models.py:1129`, **tenant-scoped, not account-scoped**) AND + `not getattr(request, "_already_approved", False)`. If gated: + writes `WorkflowStep(status="pending_approval")` + + `ObjectWorkflowMapping` linking step → media buy + `MediaBuy` + row carrying `raw_request` as JSON. Returns 200 with + `workflow_step_id`. **No GAM call has happened.** 2. Operator approves through admin UI. Flask route `approve_workflow_step` (`admin/blueprints/workflows.py:155`) directly imports `execute_approved_media_buy` (`media_buy_create.py:458`) — same process, same DB. 3. `execute_approved_media_buy` reconstructs the request from - `raw_request`, sets `request._already_approved = True` - (`media_buy_create.py:529`), re-calls the adapter. The gate - skips this time; real GAM call fires. - - **What that means for the SDK runtime.** No paused-task - resumption needed. Salesagent's `WorkflowStep` table IS its task - registry; we keep using it. Mapping: - - - `before` hook on `create_media_buy` / - `update_media_buy` / `add_creative_assets` checks - `manual_approval_required`. If true: calls - salesagent's existing `workflow_manager` to write the - `WorkflowStep` + `MediaBuy(status='pending_approval', - raw_request=...)` rows, returns - `ShortCircuit(value=CreateMediaBuySuccess( - status='pending_approval', workflow_step_id=...))`. - - Admin UI unchanged. `approve_workflow_step` keeps calling - `execute_approved_media_buy(...)`. - - Rewrite the body of `execute_approved_media_buy`: reconstruct - request, attach a resumption marker (the - `_already_approved` sentinel, or a typed equivalent on `ctx`), - call `sdk_runtime.create_media_buy(req, ctx)` directly. Gate - sees the marker, falls through, real GAM call happens, SDK F12 - auto-emit fires the completion webhook. - - **The interesting design question.** Salesagent's - `_already_approved` is an untyped `setattr` sentinel — fine for - one codebase, fragile as an SDK seam. The experiment forces a - decision: does the SDK ship a typed resumption marker (e.g., - `ctx.resumption_token: ResumptionToken | None`), or does the - before-hook check on adopter-defined state? Whatever shape - works in this experiment is the recommended Protocol seam. -* **Webhook delivery via SDK F12 auto-emit.** Disable salesagent's - `protocol_webhook_service.py` for the experiment tenant; configure - `WebhookSender` on `serve(...)`. We're testing whether buyer- - registered `push_notification_config` gets sync-completion webhooks - fired automatically, signed correctly, retried on transient - failure, logged-and-swallowed on permanent failure — without - adapter code participating. The migration guide §3.14 claims - adopters delete their webhook plumbing wholesale; the experiment - validates that against a real workload. + `MediaBuy.raw_request`, sets `request._already_approved = True` + (`media_buy_create.py:529`), re-calls. Gate skips; real GAM + call fires. + + **What that means for the SDK runtime.** No paused-task resumption + needed. Salesagent's `WorkflowStep` table IS its task registry; we + keep using it. The before-hook writes the workflow row, the admin + UI route stays untouched, and `execute_approved_media_buy` gets + rewritten to call back into the SDK runtime instead of the legacy + tool function. + +* **Webhook delivery via SDK F12 auto-emit.** Per-tenant disable for + `protocol_webhook_service.py` (verified at Step 0); `WebhookSender` + configured with signing scheme that matches salesagent's + `webhook_authenticator.py`. Verified end-to-end against the test + buyer at Step 0, not just "did something fire." Out of scope (deliberate): * Other adapters (Kevel, Broadstreet, Triton, mock — keep on the existing runtime). -* Other tenants. +* Other tenants. We control the experiment tenant; nothing real + rides on it. * Refine flow / proposal lifecycle (`finalize`, `expires_at`, draft → committed). Wire later if v1 is stable. * Push reporting (`reporting_webhook` / `reporting_bucket`). * Creative agent / creative builder. +* Creative-approval-specific re-entry through + `order_approval_service.py` / `background_approval_service.py` + unless the HITL storyboard happens to exercise it. ## Architecture +Self-review correctly flagged that "same process, two runtimes, +tenant-routed" buries the hard problem. Both runtimes own MCP+A2A +transport mounts, FastMCP tool registries, identity resolution, and +ResolvedIdentity construction. Routing must happen *before* either +runtime's transport layer claims the request. + +**Decision: separate process, nginx-level routing by tenant header.** + ``` -┌────────────────────────────────────────────────────┐ -│ Salesagent admin (Flask blueprints) — unchanged │ -└──────────────────┬─────────────────────────────────┘ - │ writes - ↓ -┌────────────────────────────────────────────────────┐ -│ Salesagent Postgres — unchanged schema │ -│ Tenant, Principal, Product, MediaBuy, ... │ -└──────────────────┬─────────────────────────────────┘ - │ reads - ↓ -┌────────────────────────────────────────────────────┐ -│ src/sdk_runtime/ — the experiment │ -│ BuyerAgentRegistry ← Principal rows │ -│ AccountStore ← Principal + Tenant rows │ -│ GAMProposalManager ← Product rows + dynamic │ -│ GAMDecisioningPlatform → wraps gam adapter │ -│ adcp.serve(transport='both') │ -└──────────────────┬─────────────────────────────────┘ - │ - ↓ - real GAM upstream +┌────────────────────────────────────────────────────────┐ +│ nginx — routes by `X-Tenant-Id` header │ +│ experiment-tenant → :3001 (side-car) │ +│ everything else → :3000 (existing runtime) │ +└────────────────┬──────────────────────┬─────────────────┘ + │ │ + ┌────────▼────────┐ ┌──────▼────────────┐ + │ existing salesagent │ │ side-car process │ + │ runtime (port 3000) │ │ (port 3001) │ + │ adcp_a2a_server.py │ │ adcp.serve(...) │ + │ mcp_server_*.py │ │ GAMProposalMgr │ + └────────┬────────────┘ │ GAMDecisioning │ + │ └──────┬────────────┘ + │ │ + ↓ ↓ + ┌────────────────────────────────────────────────┐ + │ Salesagent admin UI (Flask) — unchanged │ + │ Salesagent Postgres — shared schema │ + │ Tenant, Principal, Account, Product, │ + │ MediaBuy, WorkflowStep, ObjectWorkflowMapping│ + └────────────────────────────────────────────────┘ + │ + ↓ + real GAM upstream + (Phase 2 only) ``` -The existing `adcp_a2a_server.py` (2,276 LOC) and -`mcp_server_enhanced.py` keep serving every other tenant. The -experiment tenant routes to `adcp.serve(...)` only. - -Routing strategy: a tiny shim at the framework entry consults -`Principal.tenant_id` and dispatches to either the existing runtime -or the SDK runtime. One process, two runtimes, clean handoff at the -tenant boundary. - -## Wrap, don't port - -`media_buy_create.py` is 3,930 LOC. Porting it body-by-body into -`GAMDecisioningPlatform.create_media_buy` is the failure mode. The -discipline: **wrap the existing tool body**. The SDK runtime takes -the wire request, projects to salesagent's internal shape, calls the -existing tool function (or the GAM adapter directly), projects the -response back to wire shape. - -Same rule for `dynamic_products.py` (505 LOC): wrap, don't port. -`GAMProposalManager.get_products` calls `dynamic_products.py` with -salesagent's expected inputs, gets back its native output, projects -to wire `Product[]`. - -This isolates the experiment to **the seam**: where wire requests -become salesagent-shaped calls and back. We learn whether the SDK -abstractions can host salesagent's existing logic without re-porting -it. If a wrapper feels forced, that's the architecture telling us -something. If it ports as a thin shim, the architecture holds. - -## Exit criteria (binary) - -The AdCP `media_buy_seller` storyboard passes against the experiment -tenant, with: - -1. Real GAM line items / orders created in the sandbox network. -2. Salesagent's existing Postgres schema as the data backing. -3. Salesagent admin UI untouched and functional. -4. No double-write conflicts: every write for the experiment tenant - goes through the SDK runtime; the existing tools don't serve - that tenant. - -If the storyboard passes: the architecture holds. We have a working -`ProposalManager` shape that emerged from real adopter code, ready -to factor into the SDK Protocol. - -If it fails: we learn precisely where #502's model breaks — -proposal-side (assembly factoring), decisioning-side (recipe shape), -seam (capability overlap, lifecycle, hydration), or supporting (auth -projection, schema mismatch). - -## What we expect to learn - -Three concrete questions the experiment answers: - -1. **Does `dynamic_products.py` (`src/services/dynamic_products.py`, - 505 LOC) factor onto `ProposalManager.get_products`?** - Salesagent's signal-driven assembly is the most complex piece of - proposal logic in the wild. If it ports as a thin wrapping - (target: <300 LOC of glue), the proposal-side abstraction is - correct. If it requires gutting `dynamic_products.py` itself, the - abstraction is wrong. - -2. **Does the recipe carry enough?** GAM's `implementation_config` - (`Product.implementation_config: JSONType`, - `models.py:256`) is the most-evolved form: line item template - ids, ad unit ids, KV targeting, frequency caps, signal mappings. - If the recipe shape we settle on can carry it without escape - hatches (no `extra: dict[str, Any]` smuggled fields), the - typed-recipe model in #502 is sound. +Why separate process beats in-process dispatch: + +* No dual MCP+A2A transport mounts in one ASGI app +* No FastMCP tool registry collisions +* No double-initialization of `ResolvedIdentity` construction (a + structural guard in salesagent) +* Independent lifecycle: side-car restarts without touching the + existing runtime +* Killable: turn off the side-car port, traffic falls back to the + existing runtime, experiment is over +* No SQLAlchemy model / metadata import-time conflicts (single + alembic head stays single) + +The cost: nginx config change in dev, plus an env-var or tenant +metadata flag for the `X-Tenant-Id` header injection. Cheap. + +## Wrap, don't port (Phase 2) + +The discipline is to wrap **the `_impl` functions**, not adapters. +Salesagent's `_impl` seam (Pattern #5 in salesagent's CLAUDE.md) is +already transport-agnostic — `_create_media_buy_impl` takes a +typed request + tenant context and returns a typed response. The +SDK runtime calls `_impl` directly with projected requests; the +adapter underneath stays untouched. + +The wrong target — `google_ad_manager.py` — would force the side-car +to re-implement principal resolution, tenant config loading, +currency validation, signal lookup, audit logger wiring, workflow +row creation, webhook scheduling. That's the 3,930 LOC of +`media_buy_create.py` getting smuggled into the wrap. The right +target — `_create_media_buy_impl` — already absorbs all of that. + +For Step 0: confirm `_create_media_buy_impl`, +`_update_media_buy_impl`, `_add_creative_assets_impl`, +`_get_products_impl` exist and are transport-agnostic. If not, +introducing them is a salesagent-side prerequisite refactor. + +## Exit criteria (multi-criteria, not binary) + +A passing storyboard with wrappers that grew enormous tells us +nothing about whether the architecture works. The exit criteria +must be co-equal — all of them, not just (1). + +1. **Both storyboards pass** — `media_buy_seller` (9 core happy + path) AND `media_buy_guaranteed_approval` (HITL exercise), + against a sandbox GAM Network ID, with real GAM line items + created. +2. **Recipe carries `implementation_config` without escape + hatches** — no `extra: dict[str, Any]`, no untyped passthrough. + GAM's full implementation_config (line item template ids, ad + unit ids, KV targeting, frequency caps, signal mappings) fits + the typed recipe shape. +3. **Glue LOC under ratio thresholds** — proposal-side glue ≤ 60% + of `dynamic_products.py` body (≤ 300 LOC against 505 source); + decisioning-side glue ≤ 30% of `_create_media_buy_impl` body. + Pre-register thresholds before measuring. +4. **Zero structural-guard allowlist additions** — or every + addition is documented as an architectural finding requiring a + spec response. Salesagent's guards encode design constraints; + bypassing them silently means the side-car is violating + constraints we should be debating. +5. **At least one of the five learning questions has an answer + that contradicts a #502 prior** — pre-register the candidate + contradictions before running. An experiment that confirms + every prior is tautological; pre-registration prevents + post-hoc rationalization of "what we found." +6. **Webhook signature verified by a subscribed test buyer** — + not just "WebhookSender fired something." Buyer must validate + HMAC and accept the delivery, end-to-end. + +If any of (1)-(6) fails, the experiment is informative — it tells +us where the architecture breaks. Don't paper over a partial pass +by relaxing a criterion mid-run. + +## What we expect to learn (five questions + Q1.5) + +Pre-register which signal would falsify each prior, before running. +Self-review's "one author wearing three hats" warning applies — if I +don't commit upfront to what would tell me I'm wrong, I'll find what +I'm looking for. + +1. **Does `dynamic_products.py` factor onto + `ProposalManager.get_products`?** Phase 1 answers this in + isolation. Falsifier: the wrap exceeds 60% of source LOC, or + requires escape hatches in the recipe. + +2. **Does the recipe carry enough?** GAM's + `implementation_config` (`Product.implementation_config: JSONType`, + `models.py:256`) is the most-evolved form. Falsifier: any + `extra: dict[str, Any]` field on the recipe, or an untyped + passthrough mechanism. 3. **What does framework-owned proposal lifecycle actually need?** - When `create_media_buy` runs, it needs to hydrate the recipe from - somewhere. Three plausible answers (session cache, DB row, fresh - lookup); the experiment forces a choice. The choice IS the - design for the `ProposalManager` Protocol. - -4. **What is the right shape for the resumption marker?** Salesagent's - HITL model (gate → persist → admin approves → re-call same code - path with sentinel) maps cleanly onto `compose_method` + - `ShortCircuit` without needing the SDK's `TaskRegistry` to round-trip. - The gate writes salesagent's `WorkflowStep` row; the admin UI keeps - calling `execute_approved_media_buy`; that function re-enters the - SDK runtime with a marker. The open seam is the marker itself — - salesagent uses an untyped `setattr(request, "_already_approved", - True)`, fine for one codebase, fragile as an SDK contract. Two - plausible shapes: (a) typed `ctx.resumption_token: ResumptionToken - | None` carried through framework dispatch, the before-hook checks - it; (b) keep the marker adopter-side entirely, the SDK doesn't - model it. The experiment forces a choice; the choice IS the - recommended Protocol seam for cross-cutting `compose_method` gates - in adopters with persisted-and-re-fired approval flows. Sub-question - to resolve in passing: does `compose_method`'s `before` hook need - read access to `ctx` rich enough to make this decision (it should). - -5. **Does F12 webhook auto-emit hold up under real load?** The - migration guide claims adopters delete their webhook plumbing. - The experiment runs that — `protocol_webhook_service.py` off for - the experiment tenant, `WebhookSender` on. If signing, retry, - and failure handling all work without adapter code, the claim - stands. If anything leaks through to adapter code, we revise - §3.14. - -## Risks - -* **Scope creep on the wrap.** The temptation to "just port this - small piece" compounds. Mitigation: every wrap stays a wrap until - the storyboard passes. Re-porting is a follow-up, not part of v1. -* **GAM credential handling.** Need a sandbox GAM Network ID with - real auth. Salesagent already has this in dev; experiment - inherits. Document the env vars in the experiment README. -* **HITL resumption marker shape.** Salesagent's untyped - `_already_approved` sentinel is what re-entry uses today. Lifting - that to a typed SDK contract (or deciding to leave it adopter-side) - is a real design call, not just plumbing. Wrong choice here means - every adopter with a persisted-approval flow re-invents the - marker. Mitigation: prototype both shapes in the experiment; - whichever felt natural at the gate site wins. -* **Webhook signing config.** SDK `WebhookSender` and salesagent's - existing `webhook_authenticator.py` need to agree on signing - semantics, or the buyer rejects deliveries. Validate against a - test buyer before the storyboard run. -* **Auth projection drift.** If `Principal.access_key_hash` doesn't - project cleanly onto `BuyerAgentRegistry`, the foundation in - #489 §"Foundations" is wrong. Catch this early — write the - registry shim first, before anything else. -* **Two writers, one DB — controlled.** The experiment tenant is a - test tenant we own and is not actively serving real buyers, so - duplication is operationally fine. Tenant routing still enforces - the boundary in code (experiment tenant never hits old runtime, - old tenants never hit new runtime), but the consequence of a - routing bug is "we notice and fix it," not "we corrupt a buyer's - campaign." -* **The storyboard hits a feature we deferred.** If `media_buy_seller` - exercises refine or push reporting on the happy path, scope - expands. Pin the storyboard version up front; test against it - locally before declaring v1 done. + Three plausible answers (session cache, DB row, fresh lookup); + the experiment forces a choice. Falsifier: none of the three + work — the framework needs primitives #502 doesn't anticipate. + +4. **What is the right shape for the HITL resumption marker?** + Salesagent's HITL model (gate → persist → admin approves → + re-call same code path with sentinel) maps cleanly onto + `compose_method` + `ShortCircuit` without needing the SDK's + `TaskRegistry` to round-trip. The open seam is the marker + itself. Two candidate shapes: + - (a) typed `ctx.resumption_token: ResumptionToken | None` + carried through framework dispatch + - (b) keep the marker adopter-side entirely, the SDK doesn't + model it + + **Sub-question on the resumption substrate.** Salesagent's + `MediaBuy.raw_request` JSON is the actual resumption payload. + The SDK has no concept of it. The "typed + `ctx.resumption_token`" option only works if the SDK can carry + the equivalent through dispatch — or if the adopter is + responsible for rebuilding the request before calling back in. + The experiment forces a decision here too; it's not a free + choice. + + **Important caveat (self-review N=1).** Salesagent's pattern + is DB-persisted re-call with a sentinel — unrepresentative of + paused-coroutine resumption shapes other adopters might use. + The experiment can answer "does the SDK seam accommodate + salesagent's shape." It **cannot** answer "what's the right + Protocol seam for `ProposalManager` in general." Decoupling + matters: the experiment informs a Protocol RFC; the spec + decision lives in a separate doc after the experiment, not + inherited from what felt natural in salesagent. + +5. **Does F12 webhook auto-emit hold up under real load?** Per + the §3.14 claim. Falsifier: signing semantics don't match the + subscribed test buyer's expectations, or anything leaks + through to adapter code. + +**Q1.5 — Does the recipe model allow proposal-time *assembly*, +not just lookup?** Self-review surfaced this. PR #502 says the +recipe "lives in framework session cache against `proposal_id`" +during refine, then "framework persists it" on finalize. +Salesagent's `implementation_config` is on the *Product* row +(`models.py:256`), and dynamic products generate signal-driven +variant Product rows at brief time. Those are new Product rows +the framework doesn't know about until the brief lands. The +recipe-as-framework-managed-state model probably has to allow +proposal-time recipe *assembly*, not just lookup. + +Phase 1 is the cheapest place to falsify this. If the wire shape +can't carry signal-driven variants without escape hatches, #502's +recipe model needs revision. + +## Risks (revised) + +* **Wrap target drift.** Mitigated by Step 0 `_impl` identification + and the discipline that wraps wrap `_impl` not adapters. If any + needed `_impl` doesn't exist in transport-agnostic form, that's a + salesagent-side refactor before the side-car experiment runs. +* **Webhook signature parity is a real risk, not paperwork.** + Mitigated by Step 0 dry-run against a subscribed test buyer. +* **`_already_approved` may not survive `extra="forbid"` projection.** + Mitigated by Step 0 validation. If it doesn't survive, prototype + typed marker on day 1. +* **Cross-runtime contract on shared rows.** `WorkflowStep` and + `MediaBuy.raw_request` are written by the SDK runtime's before-hook + and read/re-fired by the admin UI's approval handler. This is a + contract on a shared schema, not a tenant-isolated boundary. The + test tenant being controlled doesn't change the contract; it just + makes a contract bug recoverable instead of catastrophic. +* **Three HITL re-entry surfaces, not one.** `create_media_buy`, + `update_media_buy`, `add_creative_assets` — all in scope. + Creative-specific re-entry through `order_approval_service.py` + out of scope for v1. Easy to invisibly skip one; risk is "looks + like it works" with one path missed. +* **HITL marker decision is N=1.** Mitigated by decoupling — the + experiment informs but doesn't settle the Protocol seam. Spec PR + comes after, with the experiment as one data point among multiple. +* **One author wearing three hats** (SDK author, salesagent adopter, + reviewer). Mitigated by pre-registering falsification signals + before running. Self-review caught this; pre-registration is the + enforcement mechanism. +* **Storyboard test-controller methods + real GAM are in tension.** + Test-controller methods (`force_*`, `simulate_delivery`) bypass + upstream determinism by design. Hybrid mode: real GAM for + `create_media_buy` / `update_media_buy` mutations; salesagent's + existing `delivery_simulator.py` wired into the side-car for + delivery simulation and state forcing. The hybrid maintains + storyboard determinism without sacrificing the GAM-live test for + mutations. +* **Two writers, one DB — controlled, but with a contract.** The + experiment tenant is a test tenant we own; duplication is + operationally fine. But salesagent's SQLAlchemy models in + `models.py` (2,113 LOC) and the side-car's schema view must agree + exactly on cascade/relationship semantics. Single alembic head + stays salesagent's. ## Workstream -1. **Set up the worktree.** New salesagent worktree for the - experiment. Install adcp-client-python from `main` (or a working - branch) into salesagent's venv. -2. **Auth shim first.** `BuyerAgentRegistry` + `AccountStore` reading - from `Principal` + `Tenant`. Validate against existing test - fixtures before anything else lands. -3. **`GAMDecisioningPlatform` wrapper.** Thinnest possible wrap of - `google_ad_manager.py`. `create_media_buy` and - `get_media_buy_delivery` only. -4. **`GAMProposalManager` wrapper.** Thinnest possible wrap of - `dynamic_products.py` + the `Product` catalog read. -5. **HITL gate via `compose_method`.** `before` hook on - `create_media_buy` / `add_creative_assets` / `update_media_buy` - that consults salesagent's `manual_approval_required` config, - writes the `WorkflowStep` + `MediaBuy(status='pending_approval', - raw_request=...)` rows via salesagent's existing `workflow_manager`, - and returns `ShortCircuit(value=CreateMediaBuySuccess( - status='pending_approval', workflow_step_id=...))`. Rewrite the - body of `execute_approved_media_buy` (`media_buy_create.py:458`) - to reconstruct the request, attach the resumption marker, and - call back into the SDK runtime's `create_media_buy`. Admin UI - route untouched. Prototype both marker shapes (typed - `ctx.resumption_token` vs. adopter-side sentinel); record which - reads cleanly at the gate site. -6. **`WebhookSender` configured on `serve(...)`.** Disable - `protocol_webhook_service.py` for the experiment tenant. Verify - sync-completion webhooks fire after every mutating tool call, - signed correctly, against a test buyer that validates signature. -7. **Routing shim.** Per-tenant dispatch between old and new - runtimes inside the same process. -8. **Storyboard run.** AdCP `media_buy_seller` against the - experiment tenant; sandbox GAM; admin UI verified untouched. - Approval-required path exercised at least once. -9. **Findings doc.** What ported cleanly, what didn't, what the - `ProposalManager` Protocol should look like based on what worked, - and which resumption-marker shape (typed - `ctx.resumption_token` vs. adopter-side sentinel) reads cleanly - at the `compose_method` gate site. +Step 0 is required for both phases. Phase 1 runs to completion before +Phase 2 starts; Phase 2 is gated on Phase 1's findings being +consistent with #502. + +**Step 0 — Prereqs (~1-2 days, mostly investigation).** + +0.1. Pin `adcp-client-python@`, both storyboard SHAs, GAM + Network ID; document in experiment README. +0.2. Identify (or refactor for) `_impl` seams: + `_create_media_buy_impl`, `_update_media_buy_impl`, + `_add_creative_assets_impl`, `_get_products_impl`. Confirm + transport-agnostic. +0.3. Run salesagent's `make quality` against a stub + `src/sdk_runtime/` directory; enumerate every guard that fires; + decide allowlist-with-justification or alternate location. +0.4. Enumerate cross-tenant background services to disable for + experiment tenant; document the disable mechanism. +0.5. Write a unit test that validates `setattr(request, + "_already_approved", True)` survives the SDK's request + projection round-trip under `extra="forbid"`. If it doesn't, + prototype the typed-marker alternative. +0.6. Identify a test buyer that validates HMAC signatures; verify + SDK F12 / `WebhookSender` signing parity with salesagent's + `webhook_authenticator.py` against the test buyer. +0.7. Pre-register the candidate contradictions for each of the five + learning questions (which finding would tell us each prior is + wrong). + +**Phase 1 — `dynamic_products.py` recipe falsification (~1 day).** + +1.1. Build `ProposalManager.get_products` wrapper around + `dynamic_products.py`. No GAM, no admin UI, no DB rewiring. + Drive from recorded fixtures. +1.2. Assert the output recipe carries every variant Product without + escape hatches. +1.3. Measure glue LOC against `dynamic_products.py` body; assert + ratio ≤ 60%. +1.4. Document findings against the pre-registered contradictions. +1.5. **Decision point: proceed to Phase 2 or stop and revise #502.** + +**Phase 2 — side-car runtime (~3-5 days, gated on Phase 1).** + +2.1. Auth shim — `BuyerAgentRegistry` reading `Principal` + (`access_token` bearer; ~80 LOC) + `AccountStore` reading + `Account` (already AdCP-shaped; ~70 LOC). + `AgentAccountAccess` cross-check for access scoping. +2.2. `GAMDecisioningPlatform` wrapping `_create_media_buy_impl` + and `_update_media_buy_impl`. `GAMProposalManager` wrapping + `_get_products_impl` (now informed by Phase 1's findings). +2.3. HITL gate via `compose_method`. `before` hook on all three + mutating ops, consulting + `AdapterConfig.gam_manual_approval_required` (tenant-scoped). + Rewrite `execute_approved_media_buy` body to reconstruct + request, attach resumption marker, call back into side-car + runtime. Admin UI route untouched. +2.4. `WebhookSender` configured on `serve(...)` with the signing + parity verified at Step 0. Per-tenant disable for + `protocol_webhook_service.py`. +2.5. Test-controller hybrid: implement `simulate_delivery`, + `force_*` methods using salesagent's existing + `delivery_simulator.py`; real GAM for `create_media_buy` / + `update_media_buy`. +2.6. Nginx routing config: `X-Tenant-Id` header → side-car port + for experiment tenant. +2.7. Storyboard runs: + - `media_buy_seller` against the experiment tenant + - `media_buy_guaranteed_approval` against the experiment + tenant (HITL exercise) +2.8. Findings doc — what ported cleanly, what didn't, which + resumption-marker shape reads cleanly at the gate site, plus + any of the five questions that produced a contradicting + finding. ## Next steps after experiment -If exit criteria pass: +If exit criteria (1)-(6) all pass: -1. Factor the `ProposalManager` Protocol from the experiment into - adcp-client-python (separate PR). -2. Land `MockProposalManager` forwarder per #502. -3. Update [#489](https://github.com/adcontextprotocol/adcp-client-python/pull/489) - §3.3 with experiment findings; remove "ProposalManager will own X" - hedging where the experiment proved it. -4. Storyboard the experiment as a worked example in +1. **`ProposalManager` Protocol RFC** — separate spec PR, informed + by but not inherited from the experiment. Open the design call + to other adopters before settling shape. +2. **Land `MockProposalManager` forwarder** per #502. +3. **Update [#489](https://github.com/adcontextprotocol/adcp-client-python/pull/489) + §3.3** with experiment findings; remove "ProposalManager will + own X" hedging where the experiment proved it. +4. **Storyboard the experiment as a worked example** in `examples/salesagent_sidecar/` (with credentials redacted). -If exit criteria fail: +If any criterion fails: 1. Document where the model broke in `docs/proposals/product-architecture-revision-1.md`. 2. Revise [#502](https://github.com/adcontextprotocol/adcp-client-python/pull/502) against the real failure mode. -3. Rerun the experiment against the revised model. +3. Decide whether to rerun the experiment against the revised + model or whether the next falsifier should come from a + different adopter (agentic-adapters social shapes). ## References @@ -372,5 +606,10 @@ If exit criteria fail: `src/core/database/models.py` (2,113 LOC), `src/services/dynamic_products.py` (505 LOC), `src/core/tools/media_buy_create.py` (3,930 LOC), - `src/adapters/google_ad_manager.py`, - `src/a2a_server/adcp_a2a_server.py` (2,276 LOC). + `src/core/tools/media_buy_create.py:458` (`execute_approved_media_buy`), + `src/core/tools/media_buy_create.py:529` (`_already_approved` sentinel), + `src/adapters/google_ad_manager.py:571` (HITL gate), + `src/admin/blueprints/workflows.py:155` (`approve_workflow_step`), + `src/core/database/models.py:1129` (`gam_manual_approval_required`), + salesagent's CLAUDE.md Pattern #5 (`_impl` seam) and Pattern #7 + (`extra="forbid"`). From 23851674e45a22f3b89783853a5102e98f92eae6 Mon Sep 17 00:00:00 2001 From: Brian O'Kelley Date: Sun, 3 May 2026 20:00:40 -0400 Subject: [PATCH 3/7] =?UTF-8?q?docs(proposals):=20reframe=20=E2=80=94=20sa?= =?UTF-8?q?lesagent=20is=20a=20GAM=20agent?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Salesagent's multi-adapter abstraction is vestigial: GAM is the only deployed backend (~99% of clients per migration guide §"Migration order"); Kevel/Broadstreet/Triton/Xandr are scaffolding with no client traffic; mock is a fixture. Treating salesagent as a GAM agent that ships dead code simplifies the experiment in three concrete ways: 1. Wrap target is unconditional GAM. The if-adapter-class switches in _impl (e.g., media_buy_create.py:2431-2464) collapse to unconditional logic; no compatibility surface to preserve. 2. Single recipe type — salesagent contributes only the GAM shape to #502's typed-recipe model. Phase 1 falsification narrows. 3. MockAdServer (~1,800 LOC) deletion joins post-experiment cleanup. Updates: - New "Reframing" section after Two phases - Out of scope clarified: other adapters slated for deletion, not preserved - Next steps adds adapter deprecation roadmap (~3,500-4,000 LOC deletion across 4 sequenced PRs) and side-car-to-runtime promotion path - Note that #489 §3.1 needs a "single-adapter adopters skip PlatformRouter" addition (tracked separately) Doesn't change experiment shape: two-platform composition seam, recipe falsification target, HITL/webhook/auth shim work all stay. Reframing simplifies the success path; doesn't shrink the questions. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../salesagent-sidecar-experiment.md | 72 ++++++++++++++++++- 1 file changed, 70 insertions(+), 2 deletions(-) diff --git a/docs/proposals/salesagent-sidecar-experiment.md b/docs/proposals/salesagent-sidecar-experiment.md index 46ec119b2..9a1a52da5 100644 --- a/docs/proposals/salesagent-sidecar-experiment.md +++ b/docs/proposals/salesagent-sidecar-experiment.md @@ -48,6 +48,57 @@ subscribed buyer. Phase 1 gates Phase 2. Phase 2 only runs if Phase 1 produced findings consistent with #502. +## Reframing: salesagent is a GAM agent + +Salesagent's multi-adapter abstraction is vestigial. GAM is the only +adapter with real deployments (~99% of clients per the migration +guide §"Migration order"); Kevel, Broadstreet, Triton, Xandr are +scaffolding from earlier iterations with no client traffic; mock is +a test fixture, not a backend. Treating salesagent as a GAM agent +that happens to ship dead code simplifies the experiment in three +concrete ways. + +1. **The wrap target is unconditional.** Salesagent today carries + `if adapter.__class__.__name__ == "GoogleAdManager"` switches in + the `_impl` layer (e.g., + `media_buy_create.py:2431-2464` for GAM-specific + `implementation_config` auto-generation + validation). Reframing + collapses these to unconditional GAM logic. The wrap doesn't + need to preserve the registry abstraction — there's no + compatibility surface to preserve. + +2. **Single recipe type.** Salesagent contributes only the GAM + recipe shape to #502's typed-recipe model. The + discriminated-union-over-multiple-recipes question (Path B in + #502) stays real for the SDK in general — adopters with + heterogeneous upstreams (Prebid-style) are the exercise — but + salesagent doesn't test that axis. Phase 1 falsification narrows + to "does the GAM recipe shape carry without escape hatches" — + sharper question, fewer variables. + +3. **`MockAdServer` migration sharpens.** The ~1,800 LOC + `mock_ad_server.py` deletion was a follow-up; reframed, it joins + the post-experiment cleanup story alongside Kevel/Broadstreet/ + Triton/Xandr deletion. v1 of the experiment uses SDK + `Account.mode='mock'` for the experiment tenant; the legacy + `MockAdServer` keeps serving everyone else until the cutover. + +What doesn't change: the two-platform composition seam (proposal-side +dynamic assembly + decisioning-side GAM execution), the recipe +falsification target (Phase 1), the HITL/webhook/auth shim work. The +reframing simplifies the success path; it doesn't shrink the +experiment's questions. + +What this implies for the migration guide. +[#489](https://github.com/adcontextprotocol/adcp-client-python/pull/489) +§3.1 maps `ADAPTER_REGISTRY` → `PlatformRouter`. For GAM-only +adopters (salesagent, anyone with a single live adapter), +`PlatformRouter` is vestigial — the migration is "delete the +registry, instantiate one `GAMPlatform`," not "translate registry +into router." The router pattern is the right primitive for +heterogeneous adopters; single-adapter adopters skip it. §3.1 +should carry a note. Tracked separately from this experiment. + ## Why GAM live (Phase 2 only) Mock-mode would tell us if the wire shape works. It wouldn't tell us @@ -253,8 +304,11 @@ In scope: Out of scope (deliberate): -* Other adapters (Kevel, Broadstreet, Triton, mock — keep on the - existing runtime). +* Other adapters (Kevel, Broadstreet, Triton, Xandr, mock) — keep + serving on the existing runtime FOR THE EXPERIMENT. Per the + reframing above, they're slated for post-experiment deletion; + the experiment doesn't preserve compatibility, just doesn't + break them mid-run. * Other tenants. We control the experiment tenant; nothing real rides on it. * Refine flow / proposal lifecycle (`finalize`, `expires_at`, @@ -582,6 +636,20 @@ If exit criteria (1)-(6) all pass: own X" hedging where the experiment proved it. 4. **Storyboard the experiment as a worked example** in `examples/salesagent_sidecar/` (with credentials redacted). +5. **Adapter deprecation roadmap.** Salesagent is a GAM agent; the + registry pattern is vestigial. Sequenced deletion: (a) delete + Kevel, Broadstreet, Triton, Xandr adapter packages; (b) collapse + `media_buy_create.py`'s GAM-specific switches into unconditional + logic; (c) delete `mock_ad_server.py` (~1,800 LOC, replaced by + SDK `Account.mode='mock'` + `bin/adcp.js mock-server`); (d) + delete the `ADAPTER_REGISTRY` itself once no callers remain. + Each step lands as its own PR with storyboard validation. Total: + ~3,500-4,000 LOC deletion. +6. **Promote the side-car to primary runtime** for salesagent. + With other adapters deleted, the existing `adcp_a2a_server.py` + (2,276 LOC) and `mcp_server_enhanced.py` paths can be retired in + favor of `adcp.serve(...)`. Cutover happens tenant by tenant; the + side-car shape stays revertible until the last tenant migrates. If any criterion fails: From 00fc4ff4a46fd33b2073990cd56abefc4d9593e1 Mon Sep 17 00:00:00 2001 From: Brian O'Kelley Date: Sun, 3 May 2026 20:28:50 -0400 Subject: [PATCH 4/7] docs(proposals): fold Step 0 investigation findings into experiment plan MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Three Step 0 investigations completed; results updated in the doc. 0.2 — _impl seams (CONFIRMED CLEAN) - All four core _impl functions exist transport-agnostic in src/core/tools/. File:line refs: _create_media_buy_impl (media_buy_create.py:1270, async) _update_media_buy_impl (media_buy_update.py:117, sync) _get_products_impl (products.py:145, async) _get_media_buy_delivery_impl (media_buy_delivery.py:67, sync) _sync_creatives_impl (creatives/_sync.py:29, sync) - _add_creative_assets_impl doesn't exist; HITL gate uses GAM- internal name mapping to sync_creatives. Before-hook needs a small operation-name table. - Zero salesagent-side refactor required before Phase 2. 0.3 — Structural guards (MUCH SMALLER THAN FEARED) - ~25 total guards via make quality + .pre-commit-hooks/ - Most are path-filtered out of src/sdk_runtime/ (tests-only, alembic-only, models.py-only, tools-only, schemas.py/main.py-only) - Guards that apply to sdk_runtime/: ~5-7 hygiene-level (sqlalchemy 2.0, no hasattr root, no .fn(), import usage, type ignore count, code duplication). Trivially satisfiable. - Zero allowlist additions likely needed; possibly one if check_parameter_alignment.py doesn't accept a third _impl caller. - Recommendation: keep src/sdk_runtime/ inside src/. 0.4 — Cross-tenant background services (TWO, NOT FOUR) - protocol_webhook_service, background_approval_service, order_approval_service, background_sync_service all fire per- request or per-order; do NOT need per-tenant disable. - Two genuinely cross-tenant schedulers from core/main.py lifespan: - media_buy_status_scheduler.py (60s cadence, lifecycle transitions cross-tenant) - delivery_webhook_scheduler.py (daily, sends reporting_webhook cross-tenant) - Both would race/duplicate against side-car runtime. - Concrete prereq added: Tenant.skip_legacy_schedulers: bool alembic migration + 6-line consults in both schedulers. Updates also flow through: - HITL section: sync_creatives is the wire surface, not add_creative_assets; operation-name mapping called out - Workstream Phase 2: wraps now include _sync_creatives_impl; webhook step notes per-tenant scheduler disable, not service disable - Risks: three HITL surfaces with sync_creatives correction Co-Authored-By: Claude Opus 4.7 (1M context) --- .../salesagent-sidecar-experiment.md | 201 +++++++++++++----- 1 file changed, 143 insertions(+), 58 deletions(-) diff --git a/docs/proposals/salesagent-sidecar-experiment.md b/docs/proposals/salesagent-sidecar-experiment.md index 9a1a52da5..2d512e066 100644 --- a/docs/proposals/salesagent-sidecar-experiment.md +++ b/docs/proposals/salesagent-sidecar-experiment.md @@ -111,42 +111,94 @@ survives. Mock conformance was never going to tell us. ## Step 0 — Prereqs (both phases) -These must land before any wrapping code is written. Several are -themselves unscoped today; some are concrete prereqs, some are -investigations whose output shapes the rest of the work. +These must land before any wrapping code is written. Several were +investigations whose output shapes the rest of the work; results +folded in below. Items still TBD are flagged. + +### Investigated (results below) + +* **`_impl` seams in salesagent — confirmed clean.** All four + required `_impl` functions exist as transport-agnostic functions + (Pydantic request + `ResolvedIdentity`, no FastMCP/Flask types in + signature): + - `_create_media_buy_impl` (`media_buy_create.py:1270`, async) + - `_update_media_buy_impl` (`media_buy_update.py:117`, sync) + - `_get_products_impl` (`products.py:145`, async) + - `_get_media_buy_delivery_impl` (`media_buy_delivery.py:67`, sync) + - `_sync_creatives_impl` (`creatives/_sync.py:29`, sync) — + **substitutes for the imagined `_add_creative_assets_impl`** + + `_add_creative_assets_impl` doesn't exist. The HITL gate at + `google_ad_manager.py:880` checks the GAM-internal operation name + `add_creative_assets`; at the wire level the equivalent is + `sync_creatives`. **Implication:** the SDK runtime's HITL + before-hook needs an operation-name mapping (GAM-internal → + AdCP-wire) since salesagent's `manual_approval_operations` config + keys on GAM-internal names. Small, but not zero. + + Bottom line: **zero salesagent-side refactor required** before + Phase 2. Wrap targets exist. The pattern is real and pervasive + (14 `_impl` functions across `src/core/tools/`). + +* **Structural guards — much smaller than feared.** `make quality` + runs ruff format/check, mypy, `check_code_duplication.py`, and + unit tests (`Makefile:8-13`). ~25 custom guards live in + `.pre-commit-hooks/` + `.pre-commit-config.yaml`. **Most are + path-filtered out** of `src/sdk_runtime/`: + - `check-tenant-context-order` — `^src/core/tools/.*\.py$` only + - `enforce-jsontype` — `models.py` only + - `mcp-contract-validation` / `mcp-schema-alignment` — + `schemas.py` / `main.py` only + - migration guards — alembic only + - test guards (`no-skip-tests`, `ast-grep-bdd-guards`) — tests only + + Guards that DO apply to `src/sdk_runtime/`: project-hygiene only + (sqlalchemy 2.0 patterns, no `hasattr(x, 'root')`, no `.fn()` + calls, import usage, type:ignore count, code duplication). All + trivially satisfiable with normal coding. + + One worth inspecting before relying on the prediction: + **`check-parameter-alignment`** verifies MCP/A2A wrappers pass + aligned params to `_impl`. The side-car is a *third* caller of + `_impl`. Whether the guard accepts a third call site or assumes + exactly two needs a quick read of + `.pre-commit-hooks/check_parameter_alignment.py`. + + Bottom line: **zero allowlist additions likely needed**, possibly + one. Keep `src/sdk_runtime/` inside `src/`. + +* **Cross-tenant background services — only two, not four.** + Investigation found that `protocol_webhook_service`, + `background_approval_service`, `order_approval_service`, + `background_sync_service` all fire per-request or per-order, NOT + cross-tenant. They don't need per-tenant disable — the side-car + routing decides when they fire. + + Two genuinely cross-tenant schedulers, both started from + `core/main.py` lifespan: + - **`media_buy_status_scheduler.py`** + (`core/main.py:95`, 60s cadence): auto-transitions `MediaBuy` + lifecycle (pending_activation → active → completed) by flight + dates, cross-tenant. **Would race with the side-car's + lifecycle handling.** + - **`delivery_webhook_scheduler.py`** + (`core/main.py:85`, daily): sends `reporting_webhook` reports + cross-tenant. **Would fire duplicate webhooks alongside SDK F12 + auto-emit.** + + **Concrete prereq:** add a per-tenant skip flag (e.g., + `Tenant.skip_legacy_schedulers: bool`, default False; alembic + migration). Both schedulers consult the flag and skip experiment + tenants. ~6 lines per scheduler + one small migration. Without + this, the experiment tenant fights its own DB on a 60s cadence. + +### Still TBD before Phase 2 starts * **Pin SHAs.** `adcp-client-python@`, storyboard `media_buy_seller@`, storyboard `media_buy_guaranteed_approval@`, GAM sandbox Network ID `` documented in the experiment README. Without pins, Phase 2's "is it the storyboard or our wrap" debugging burns days. -* **Identify the `_impl` seams in salesagent.** Self-review correctly - flagged that wrapping `google_ad_manager.py` is a port disguised as - a wrap — the adapter doesn't own buyer/principal resolution, tenant - config, currency validation, signal lookup, audit logs, workflow - rows, or webhook scheduling. Salesagent's CLAUDE.md Pattern #5 - establishes `_impl` functions as the transport-agnostic seam: - `_create_media_buy_impl`, `_update_media_buy_impl`, - `_add_creative_assets_impl`, `_get_products_impl`. **The wrap target - is the `_impl`, not the adapter.** If any of these `_impl`s don't - exist or aren't transport-agnostic enough today, that's a - prerequisite refactor in salesagent before the side-car experiment - starts. -* **Enumerate AST structural guards** that fire on `src/sdk_runtime/`. - Salesagent has ~25 guards via `make quality` — no raw `select()` - outside repos, no `get_db_session()` in `_impl`, schema inheritance - from adcp library, repository pattern enforcement, single alembic - head. Decide: (a) earn allowlist entries for each (and document - each as an architectural finding), or (b) move `src/sdk_runtime/` - outside `src/` so the side-car isn't really inside salesagent. - Either is defensible; not deciding is not. -* **Enumerate cross-tenant background services to disable** for the - experiment tenant: `background_sync_service`, - `background_approval_service`, `delivery_webhook_scheduler`, - `protocol_webhook_service`. Per-tenant disable is doable but - invisible in the test tenant's behavior unless explicitly listed. - Document the disable mechanism (env var? tenant-scoped config? - process-level kill switch?) before relying on it. * **Validate `_already_approved` survives SDK re-validation.** Salesagent uses `setattr(request, "_already_approved", True)` on a Pydantic model (`media_buy_create.py:529`). Salesagent runs @@ -161,6 +213,13 @@ investigations whose output shapes the rest of the work. test buyer up front; verify signature parity between salesagent's `webhook_authenticator.py` scheme and SDK F12 / `WebhookSender` before the storyboard run, not during. +* **Read `check_parameter_alignment.py`** to confirm whether the + guard gracefully accepts a third `_impl` caller (the side-car + alongside MCP and A2A wrappers) or needs an allowlist entry. +* **Add `Tenant.skip_legacy_schedulers: bool` migration** + the + 6-line consults in `media_buy_status_scheduler.py` and + `delivery_webhook_scheduler.py`. Land in salesagent before the + experiment tenant sees side-car traffic. ## Phase 1 — `dynamic_products.py` recipe falsification @@ -246,10 +305,13 @@ In scope: transport-agnostic seam, after Phase 1 validated the recipe shape) - `GAMDecisioningPlatform` wrapping `_create_media_buy_impl`, - `_update_media_buy_impl`, `_get_media_buy_delivery_impl` - - HITL `before` hook on all three mutating ops, plus - `_add_creative_assets_impl` (the **third** approval re-entry - surface — see HITL section below) + `_update_media_buy_impl`, `_get_media_buy_delivery_impl`, + `_sync_creatives_impl` + - HITL `before` hook on all three mutating ops (the + `_add_creative_assets_impl` referenced in earlier drafts doesn't + exist; the wire-level equivalent is `sync_creatives` and the + GAM-internal HITL operation name `add_creative_assets` maps to + it via a small operation-name table on the before-hook) - `WebhookSender` configured with signing parity verified at Step 0 against the test buyer * **Real GAM upstream** — actual orders/line items in a sandbox @@ -264,10 +326,13 @@ In scope: or we scope the others out explicitly. **Decision: in scope** — `create_media_buy`, `update_media_buy`, - `add_creative_assets`. Creative-approval-specific re-entry through + `sync_creatives` (the wire surface; salesagent's HITL config + references it as `add_creative_assets` per its GAM-internal + operation name, mapped via a small table on the before-hook). + Creative-approval-specific re-entry through `order_approval_service.py` is **out of scope** for v1 (creative - flows are deferred regardless); revisit if `media_buy_guaranteed_approval` - storyboard happens to exercise it. + flows are deferred regardless); revisit if + `media_buy_guaranteed_approval` storyboard happens to exercise it. **How salesagent does HITL today** (verified, file:line): @@ -522,7 +587,9 @@ recipe model needs revision. test tenant being controlled doesn't change the contract; it just makes a contract bug recoverable instead of catastrophic. * **Three HITL re-entry surfaces, not one.** `create_media_buy`, - `update_media_buy`, `add_creative_assets` — all in scope. + `update_media_buy`, `sync_creatives` (the wire surface; salesagent + HITL config keys it as the GAM-internal `add_creative_assets`, + mapped via a small table on the before-hook) — all in scope. Creative-specific re-entry through `order_approval_service.py` out of scope for v1. Easy to invisibly skip one; risk is "looks like it works" with one path missed. @@ -554,19 +621,28 @@ Step 0 is required for both phases. Phase 1 runs to completion before Phase 2 starts; Phase 2 is gated on Phase 1's findings being consistent with #502. -**Step 0 — Prereqs (~1-2 days, mostly investigation).** +**Step 0 — Prereqs (~1-2 days, mix of investigation already done + +remaining concrete work).** + +Investigations 0.2, 0.3, 0.4 are complete; results in the Step 0 +section above. Remaining items are concrete prereqs. 0.1. Pin `adcp-client-python@`, both storyboard SHAs, GAM Network ID; document in experiment README. -0.2. Identify (or refactor for) `_impl` seams: - `_create_media_buy_impl`, `_update_media_buy_impl`, - `_add_creative_assets_impl`, `_get_products_impl`. Confirm - transport-agnostic. -0.3. Run salesagent's `make quality` against a stub - `src/sdk_runtime/` directory; enumerate every guard that fires; - decide allowlist-with-justification or alternate location. -0.4. Enumerate cross-tenant background services to disable for - experiment tenant; document the disable mechanism. +0.2. ✅ `_impl` seams identified — all four exist transport-agnostic. + Note: HITL operation-name mapping needed (GAM-internal + `add_creative_assets` → AdCP-wire `sync_creatives`). +0.3. ✅ Structural-guard story scoped — ~5-7 hygiene guards apply, + zero allowlist additions likely. **Remaining:** read + `.pre-commit-hooks/check_parameter_alignment.py` to confirm it + accepts a third `_impl` caller (side-car alongside MCP and + A2A). +0.4. ✅ Cross-tenant scheduler audit — only two genuinely + cross-tenant (`media_buy_status_scheduler`, + `delivery_webhook_scheduler`). **Remaining:** add + `Tenant.skip_legacy_schedulers: bool` migration + the 6-line + consults in both schedulers. Land in salesagent before the + experiment tenant sees side-car traffic. 0.5. Write a unit test that validates `setattr(request, "_already_approved", True)` survives the SDK's request projection round-trip under `extra="forbid"`. If it doesn't, @@ -596,18 +672,27 @@ consistent with #502. (`access_token` bearer; ~80 LOC) + `AccountStore` reading `Account` (already AdCP-shaped; ~70 LOC). `AgentAccountAccess` cross-check for access scoping. -2.2. `GAMDecisioningPlatform` wrapping `_create_media_buy_impl` - and `_update_media_buy_impl`. `GAMProposalManager` wrapping +2.2. `GAMDecisioningPlatform` wrapping `_create_media_buy_impl`, + `_update_media_buy_impl`, `_get_media_buy_delivery_impl`, + `_sync_creatives_impl`. `GAMProposalManager` wrapping `_get_products_impl` (now informed by Phase 1's findings). -2.3. HITL gate via `compose_method`. `before` hook on all three - mutating ops, consulting - `AdapterConfig.gam_manual_approval_required` (tenant-scoped). - Rewrite `execute_approved_media_buy` body to reconstruct - request, attach resumption marker, call back into side-car - runtime. Admin UI route untouched. +2.3. HITL gate via `compose_method`. `before` hook on the three + mutating ops (`create_media_buy`, `update_media_buy`, + `sync_creatives`), consulting + `AdapterConfig.gam_manual_approval_required` (tenant-scoped) + + a small operation-name mapping table (GAM-internal + `add_creative_assets` → AdCP-wire `sync_creatives`). Rewrite + `execute_approved_media_buy` body to reconstruct request, + attach resumption marker, call back into side-car runtime. + Admin UI route untouched. 2.4. `WebhookSender` configured on `serve(...)` with the signing - parity verified at Step 0. Per-tenant disable for - `protocol_webhook_service.py`. + parity verified at Step 0. **No** per-tenant disable needed for + `protocol_webhook_service.py` (per Step 0.4 — fires per-event + in the active request, not cross-tenant). The two cross-tenant + schedulers (`media_buy_status_scheduler`, + `delivery_webhook_scheduler`) are disabled per-tenant via the + `Tenant.skip_legacy_schedulers` flag landed in salesagent at + Step 0. 2.5. Test-controller hybrid: implement `simulate_delivery`, `force_*` methods using salesagent's existing `delivery_simulator.py`; real GAM for `create_media_buy` / From 23083ed3d23e73a85fedb5310232607b87318325 Mon Sep 17 00:00:00 2001 From: Brian O'Kelley Date: Sun, 3 May 2026 20:35:46 -0400 Subject: [PATCH 5/7] docs(proposals): fold (1)/(2)/(3) findings + local-fork constraint MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Three more Step 0 investigations completed; doc updated. (1) check_parameter_alignment.py analyzed - Guard enumerates pairs of (mcp_wrapper, a2a_raw) from a hardcoded tools list (lines 36-78). Does NOT enumerate "all callers of _impl." Side-car's GAMDecisioningPlatform calling _impl directly is invisible to the guard. - Confirmed: zero allowlist additions needed. (2) _already_approved sentinel works as-is - compose_method (compose.py:173-194) passes req through unchanged from before-hook to inner; no model_validate/model_copy/model_dump on the request side. - Dispatcher only model_dumps on response side (dispatch.py:1234, 1306-1307). - Generated request models use extra='forbid' (validation-time only) without frozen=True or validate_assignment=True; setattr lands in __dict__ and persists through Python-level dispatch. - No typed marker prototype needed for this experiment. Q4 design question stays open as a Protocol RFC. (3) Webhook signing parity does NOT hold (real finding) - Salesagent's webhook_authenticator.py emits X-Webhook-Signature + X-Webhook-Timestamp with payload f"{ts}.{json.dumps(payload, separators=(',',':'), sort_keys=True)}" canonicalization. - SDK's from_adcp_legacy_hmac emits X-AdCP-Signature + X-AdCP-Timestamp + X-AdCP-Key-Id with different canonicalization. - Different headers, different canonicalization, different scheme entirely. - §3.14's "adopters delete their webhook plumbing wholesale" claim doesn't hold cleanly — production cutover requires buyer migration. - Experiment validates SDK→SDK signing with adcp.WebhookReceiver as test buyer. Does not validate the migration claim against actual subscribed buyers. Constraint added to Step 0: - Local fork edits only. No upstream PRs to salesagent. The scheduler skips become hardcoded constants or env-var consults in our experiment fork; revert with `git checkout main`. Updates flow through: - Step 0.4: prereq is local fork patch, not alembic migration - Step 0 Investigated section: three more ✅ items + one ⚠️ - Workstream 2.4: webhook test buyer is adcp.WebhookReceiver (not a real subscribed buyer; that's out of scope post-experiment) - Risks: webhook signing parity bullet rewritten with the correct finding (incompatible, not "verify before run") - Next steps adds 3a — correct §3.14 of #489 migration guide Co-Authored-By: Claude Opus 4.7 (1M context) --- .../salesagent-sidecar-experiment.md | 182 +++++++++++++----- 1 file changed, 131 insertions(+), 51 deletions(-) diff --git a/docs/proposals/salesagent-sidecar-experiment.md b/docs/proposals/salesagent-sidecar-experiment.md index 2d512e066..58d81c8aa 100644 --- a/docs/proposals/salesagent-sidecar-experiment.md +++ b/docs/proposals/salesagent-sidecar-experiment.md @@ -115,6 +115,14 @@ These must land before any wrapping code is written. Several were investigations whose output shapes the rest of the work; results folded in below. Items still TBD are flagged. +**Constraint: local fork edits only.** All salesagent-side changes +this experiment requires (scheduler skips, the `execute_approved_media_buy` +rewrite to call into the side-car runtime) live as patches in our +experiment fork. **No upstream PRs to salesagent.** This makes the +experiment fully revertible (`git checkout main` in the fork resets +everything) and lets us be more aggressive with local changes than +upstream review would allow. + ### Investigated (results below) * **`_impl` seams in salesagent — confirmed clean.** All four @@ -186,11 +194,74 @@ folded in below. Items still TBD are flagged. cross-tenant. **Would fire duplicate webhooks alongside SDK F12 auto-emit.** - **Concrete prereq:** add a per-tenant skip flag (e.g., - `Tenant.skip_legacy_schedulers: bool`, default False; alembic - migration). Both schedulers consult the flag and skip experiment - tenants. ~6 lines per scheduler + one small migration. Without - this, the experiment tenant fights its own DB on a 60s cadence. + **Concrete prereq (local fork patch, not upstream PR):** add a + per-tenant skip in our experiment fork. Two viable shapes: + - (a) hardcoded constant in each scheduler: + `EXPERIMENT_TENANT_IDS = {"tenant_acme_test"}` consulted in the + cross-tenant query + - (b) env var: `SKIP_TENANT_IDS=tenant_acme_test` parsed at + scheduler start + + Either is ~6 lines per scheduler. Option (b) is slightly cleaner; + doesn't matter since neither leaves the fork. Without this, the + experiment tenant fights its own DB on a 60s cadence. + +### Investigated this round + +* ✅ **`check_parameter_alignment.py` analyzed.** The guard + enumerates pairs of `(mcp_wrapper, a2a_raw)` from a hardcoded + `tools` list (`.pre-commit-hooks/check_parameter_alignment.py:36-78`) + and checks signature alignment for those specific named functions. + It does NOT enumerate "all callers of `_impl`." A side-car's + `GAMDecisioningPlatform.create_media_buy` calling + `_create_media_buy_impl` is invisible to the guard. **Confirmed: + zero allowlist additions needed.** + +* ✅ **`_already_approved` sentinel works as-is** for the SDK + runtime. Verified via: + - `compose_method` (`src/adcp/decisioning/compose.py:173-194`) + passes `req` through unchanged from before-hook to inner; no + re-validation, no `model_copy`, no `model_dump` on the request + side. + - The dispatcher (`src/adcp/decisioning/dispatch.py`) only + `model_dump`s on the response side (lines 1234, 1306-1307); + request objects flow through as-is. + - Generated request models use `extra='forbid'` (validation-time + only) without `frozen=True` or `validate_assignment=True`, so + `setattr(req, "_already_approved", True)` lands in `__dict__` + and persists through Python-level dispatch. + + The sentinel is stripped on JSON serialization (which is + intentional — buyers can't smuggle it). In-process resumption via + Python function call preserves it. **Salesagent's existing pattern + ports to the SDK runtime without a typed marker for this + experiment.** The Q4 design question (typed vs untyped resumption + marker) remains open as a Protocol RFC, but the experiment can run + with the untyped pattern that already works. + +* ⚠️ **Webhook signing parity does NOT hold** between salesagent's + scheme and SDK `from_adcp_legacy_hmac`. Salesagent + (`src/core/webhook_authenticator.py:14-47`) emits: + - Headers: `X-Webhook-Signature: sha256=` + `X-Webhook-Timestamp` + - Canonicalization: `f"{timestamp}.{json.dumps(payload, separators=(",",":"), sort_keys=True)}"` + + SDK `from_adcp_legacy_hmac` (`src/adcp/webhook_sender.py:404`, + `src/adcp/webhook_auth.py`) emits: + - Headers: `X-AdCP-Signature` + `X-AdCP-Timestamp` + `X-AdCP-Key-Id` + - Different canonicalization (per `adcp.signing.standard_webhooks`) + + Different headers, different canonicalization, different scheme + entirely. **§3.14's claim that "adopters delete their webhook + plumbing wholesale" doesn't hold cleanly** — production cutover + requires buyer migration to the SDK signing scheme. + + For the experiment: SDK→SDK signing works (use + `adcp.WebhookReceiver` as the test buyer with the same secret). + This validates that SDK signing is internally consistent. It does + NOT validate that buyers subscribed to salesagent today will + accept SDK signatures — they won't. **§3.14 of the migration + guide needs a correction**: webhook plumbing deletion is + contingent on buyer migration, not unconditional. ### Still TBD before Phase 2 starts @@ -199,27 +270,11 @@ folded in below. Items still TBD are flagged. `media_buy_guaranteed_approval@`, GAM sandbox Network ID `` documented in the experiment README. Without pins, Phase 2's "is it the storyboard or our wrap" debugging burns days. -* **Validate `_already_approved` survives SDK re-validation.** - Salesagent uses `setattr(request, "_already_approved", True)` on a - Pydantic model (`media_buy_create.py:529`). Salesagent runs - `extra="forbid"` (Pattern #7); the SDK runtime presumably - re-validates on inbound. Confirm the sentinel survives the - projection round-trip. If it doesn't, prototype the typed - alternative (`ctx.resumption_token`) on day 1, before the HITL gate - is wired. -* **Document a test buyer that validates HMAC signatures.** §3.14's - claim that adopters delete their webhook plumbing is only validated - if a real subscribed buyer accepts SDK-signed webhooks. Identify the - test buyer up front; verify signature parity between salesagent's - `webhook_authenticator.py` scheme and SDK F12 / `WebhookSender` - before the storyboard run, not during. -* **Read `check_parameter_alignment.py`** to confirm whether the - guard gracefully accepts a third `_impl` caller (the side-car - alongside MCP and A2A wrappers) or needs an allowlist entry. -* **Add `Tenant.skip_legacy_schedulers: bool` migration** + the - 6-line consults in `media_buy_status_scheduler.py` and - `delivery_webhook_scheduler.py`. Land in salesagent before the - experiment tenant sees side-car traffic. +* **Patch the two cross-tenant schedulers in the experiment fork** + (per Step 0.4 above): hardcoded skip or env-var skip for the + experiment tenant ID. Local fork patch only. +* **Pre-register the candidate contradictions** for each of the + five learning questions (Step 0.7 in workstream). ## Phase 1 — `dynamic_products.py` recipe falsification @@ -575,8 +630,17 @@ recipe model needs revision. and the discipline that wraps wrap `_impl` not adapters. If any needed `_impl` doesn't exist in transport-agnostic form, that's a salesagent-side refactor before the side-car experiment runs. -* **Webhook signature parity is a real risk, not paperwork.** - Mitigated by Step 0 dry-run against a subscribed test buyer. +* **Webhook signing schemes don't match — confirmed in Step 0.** + Salesagent's `X-Webhook-Signature` scheme and SDK's + `X-AdCP-Signature` scheme are incompatible. The experiment + validates SDK→SDK signing with `WebhookReceiver` as the test + buyer; it does NOT validate that buyers subscribed to salesagent + today will accept SDK signatures. **§3.14 of the migration guide + needs correction**: webhook plumbing deletion is contingent on + buyer migration, not unconditional. The cutover-path decision + (buyers migrate / SDK adds salesagent-compatible mode / side-car + passes through legacy signer) belongs in the findings doc, not + this experiment. * **`_already_approved` may not survive `extra="forbid"` projection.** Mitigated by Step 0 validation. If it doesn't survive, prototype typed marker on day 1. @@ -632,24 +696,26 @@ section above. Remaining items are concrete prereqs. 0.2. ✅ `_impl` seams identified — all four exist transport-agnostic. Note: HITL operation-name mapping needed (GAM-internal `add_creative_assets` → AdCP-wire `sync_creatives`). -0.3. ✅ Structural-guard story scoped — ~5-7 hygiene guards apply, - zero allowlist additions likely. **Remaining:** read - `.pre-commit-hooks/check_parameter_alignment.py` to confirm it - accepts a third `_impl` caller (side-car alongside MCP and - A2A). +0.3. ✅ Structural-guard story scoped + verified — ~5-7 hygiene + guards apply; `check_parameter_alignment.py` checks named + MCP/A2A pairs from a hardcoded list, not all `_impl` callers, + so the side-car is invisible to it. Zero allowlist additions + needed. 0.4. ✅ Cross-tenant scheduler audit — only two genuinely - cross-tenant (`media_buy_status_scheduler`, - `delivery_webhook_scheduler`). **Remaining:** add - `Tenant.skip_legacy_schedulers: bool` migration + the 6-line - consults in both schedulers. Land in salesagent before the - experiment tenant sees side-car traffic. -0.5. Write a unit test that validates `setattr(request, - "_already_approved", True)` survives the SDK's request - projection round-trip under `extra="forbid"`. If it doesn't, - prototype the typed-marker alternative. -0.6. Identify a test buyer that validates HMAC signatures; verify - SDK F12 / `WebhookSender` signing parity with salesagent's - `webhook_authenticator.py` against the test buyer. + cross-tenant. **Remaining (local fork patch only):** hardcoded + skip or env-var skip in `media_buy_status_scheduler.py` and + `delivery_webhook_scheduler.py` for the experiment tenant ID. +0.5. ✅ `_already_approved` sentinel works as-is. Verified + `compose_method` passes `req` through unchanged; dispatcher + only `model_dump`s on response side; setattr lands in + `__dict__` and persists through Python-level dispatch. No + typed marker prototype needed for this experiment. +0.6. ⚠️ Webhook signing parity does NOT hold between salesagent's + `webhook_authenticator.py` (X-Webhook-Signature scheme) and + SDK's `from_adcp_legacy_hmac` (X-AdCP-Signature scheme). For + the experiment, use SDK→SDK signing only (test buyer is + `adcp.WebhookReceiver` with the same secret). Production + cutover requires buyer migration as separate work. 0.7. Pre-register the candidate contradictions for each of the five learning questions (which finding would tell us each prior is wrong). @@ -685,14 +751,18 @@ section above. Remaining items are concrete prereqs. `execute_approved_media_buy` body to reconstruct request, attach resumption marker, call back into side-car runtime. Admin UI route untouched. -2.4. `WebhookSender` configured on `serve(...)` with the signing - parity verified at Step 0. **No** per-tenant disable needed for - `protocol_webhook_service.py` (per Step 0.4 — fires per-event +2.4. `WebhookSender` configured on `serve(...)` using + `from_adcp_legacy_hmac` with a controlled experiment-tenant + secret. Test buyer is `adcp.WebhookReceiver` + (`src/adcp/webhook_receiver.py`) with the same secret — + SDK→SDK signing parity, not parity against salesagent's + existing scheme (which is incompatible per Step 0 + investigation). **No** per-tenant disable needed for + `protocol_webhook_service.py` (Step 0.4 — fires per-event in the active request, not cross-tenant). The two cross-tenant schedulers (`media_buy_status_scheduler`, - `delivery_webhook_scheduler`) are disabled per-tenant via the - `Tenant.skip_legacy_schedulers` flag landed in salesagent at - Step 0. + `delivery_webhook_scheduler`) are skipped via the local-fork + patch (hardcoded tenant ID or env var) per Step 0.4. 2.5. Test-controller hybrid: implement `simulate_delivery`, `force_*` methods using salesagent's existing `delivery_simulator.py`; real GAM for `create_media_buy` / @@ -719,6 +789,16 @@ If exit criteria (1)-(6) all pass: 3. **Update [#489](https://github.com/adcontextprotocol/adcp-client-python/pull/489) §3.3** with experiment findings; remove "ProposalManager will own X" hedging where the experiment proved it. +3a. **Correct [#489](https://github.com/adcontextprotocol/adcp-client-python/pull/489) + §3.14** — webhook plumbing deletion is contingent on buyer + signing-scheme migration, not unconditional. Salesagent's existing + `X-Webhook-Signature` scheme is incompatible with SDK's + `X-AdCP-Signature` scheme; production cutover requires either + (a) buyers migrate to SDK signing, (b) SDK ships a salesagent- + compatible signing strategy, or (c) the side-car runtime + passes through salesagent's `webhook_authenticator.py` rather + than using F12 auto-emit. Findings doc names which path is + recommended based on subscribed-buyer count. 4. **Storyboard the experiment as a worked example** in `examples/salesagent_sidecar/` (with credentials redacted). 5. **Adapter deprecation roadmap.** Salesagent is a GAM agent; the From 16a56ccc9156d10e9348ae22bf703028a78079cc Mon Sep 17 00:00:00 2001 From: Brian O'Kelley Date: Sun, 3 May 2026 20:39:02 -0400 Subject: [PATCH 6/7] docs(proposals): pre-register falsification signals (Step 0.7) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit For each of the six learning questions (Q1, Q1.5, Q2, Q3, Q4, Q5), write down the specific finding that would falsify the prior — before the experiment runs. This is the enforcement mechanism for self-review's "one author wearing three hats" warning: if I don't commit upfront to what would tell me I'm wrong, I'll find what I'm looking for. Concrete, observable falsifiers per question: - Q1: glue >303 LOC, monkey-patching needed, identity impedance - Q1.5: recipe schema requires proposal_id, variant Products need forged rows, hash-dedup state crosses sessions - Q2: any extra: dict, type: ignore on recipe construction, lossy round-trip - Q3: none of the three hydration models work, OR adopter-owned hydration turns out to be the right primitive - Q4: salesagent pattern is N=1; experiment informs but doesn't settle the Protocol seam - Q5: SDK→SDK signing parity already partially falsified; remaining falsifiers are auto-emit doesn't fire, retry doesn't behave Step 0.7 marked ✅ in workstream. The experiment is now fully unblocked from a planning standpoint. Phase 1 can start; Phase 2 prereqs are mechanical (pin SHAs, local fork patch for two schedulers). Co-Authored-By: Claude Opus 4.7 (1M context) --- .../salesagent-sidecar-experiment.md | 187 +++++++++++++++++- 1 file changed, 184 insertions(+), 3 deletions(-) diff --git a/docs/proposals/salesagent-sidecar-experiment.md b/docs/proposals/salesagent-sidecar-experiment.md index 58d81c8aa..15037c528 100644 --- a/docs/proposals/salesagent-sidecar-experiment.md +++ b/docs/proposals/salesagent-sidecar-experiment.md @@ -624,6 +624,187 @@ Phase 1 is the cheapest place to falsify this. If the wire shape can't carry signal-driven variants without escape hatches, #502's recipe model needs revision. +## Pre-registered falsification signals + +Self-review's "one author wearing three hats" warning applies — if I +don't commit upfront to what would tell me each prior is wrong, I'll +find what I'm looking for. For each learning question, the specific +finding that would falsify the prior is named here, before the +experiment runs. **A finding that contradicts any of these is a +positive result — it's what the experiment is for.** + +### Q1 — Does `dynamic_products.py` factor onto `ProposalManager.get_products`? + +Prior: salesagent's signal-driven assembly fits the +`ProposalManager.get_products` shape via a thin wrapping that calls +into the existing 505-LOC body without re-implementing it. + +Falsified if any of: + +* **LOC budget exceeded.** Glue exceeds 60% of source body + (>303 LOC against 505). Hard threshold; pre-registered. +* **Wrap-as-port.** The wrapper has to re-execute logic from inside + `dynamic_products.py` rather than calling it as-is — e.g., + re-running `signals_agent_registry` lookup, rebuilding variant + products from intermediate state, or duplicating the de-dup hash + logic. +* **Monkey-patching required.** The wrapper has to inject into + `dynamic_products` module-level state, replace function references, + or modify globals to make it work in a `ProposalManager` shape. If + this happens, the abstraction is a leaky shim, not a clean factor. +* **Identity-shaped impedance.** `dynamic_products.py` requires + `ResolvedIdentity` shaped exactly the way salesagent's MCP wrapper + builds it; the SDK's projection from `BuyerAgent` + `Account` to + the equivalent loses information the assembly logic depends on. + +If any falsifier fires: #502's claim that proposal-side assembly is +a clean wrap-of-`_impl` shape is wrong. Adopters with non-trivial +proposal logic would have to choose between (a) restructuring their +assembly to fit the SDK shape, or (b) sticking with their existing +runtime. Either is a real finding that revises #502. + +### Q1.5 — Does the recipe model allow proposal-time *assembly*? + +Prior: #502's "framework session cache against `proposal_id`" model +accommodates dynamic products. Salesagent generates signal-driven +variant `Product` rows at brief time; the SDK's session-cache +abstraction can carry these. + +Falsified if any of: + +* **Recipe schema requires `proposal_id` lookup.** Signal-driven + variants generated at brief time have no committed `proposal_id` + yet; if the recipe schema requires one to validate or hydrate, + the model is too late-bound. +* **Variant Products require new schema rows.** Salesagent's + dynamic products land as new `Product` rows with TTL + (`expires_at`); the SDK's session-cache model assumes recipes + are looked up against pre-existing Products, not assembled + alongside them. If we have to forge `Product` rows the framework + doesn't know about to make this work, the abstraction is wrong + — recipes must support proposal-time *assembly*, not just lookup. +* **Hash-dedup state crosses sessions.** `dynamic_products.py` + hashes inputs to dedup variants; if the hash state can't fit + the framework's session-scoped cache (because dedup is global + cross-session), the session-scoped model is wrong. + +If any falsifier fires: #502 needs a revision adding proposal-time +recipe assembly as a first-class concern. The session-cache model +becomes one shape among multiple. + +### Q2 — Does the recipe carry enough? + +Prior: GAM's `implementation_config` (the most-evolved recipe shape +in salesagent) fits a typed Pydantic recipe without escape hatches. + +Falsified if any of: + +* **`extra: dict[str, Any]` field on the recipe.** Any typed escape + hatch — including `vendor_specific: dict`, `__pydantic_extra__` + carrying GAM data, or `Annotated[Any, Field(extra=True)]` — is + a tell that the typed recipe doesn't actually carry GAM's full + shape. +* **`# type: ignore` to make recipe construction work.** If we + have to bypass mypy to build the recipe from salesagent's + `Product.implementation_config` JSON, the typed shape isn't + capturing what's there. +* **Lossy projection.** Round-trip from + `Product.implementation_config: JSONType` (salesagent) → + `GAMRecipe` (typed) → `dict` (passed to `_create_media_buy_impl`) + loses any field. A literal dict comparison after round-trip + must be equal. + +If any falsifier fires: #502's typed-recipe model is wrong, or +incomplete, or needs an escape-hatch design (`unstructured: dict` +field with documented semantics, like Kubernetes annotations). +Worth surfacing in a Protocol RFC. + +### Q3 — What hydration model does `create_media_buy` need? + +Prior: framework hydrates the recipe at `create_media_buy` time +from one of three sources (session cache, persisted DB row, fresh +lookup); the experiment forces a choice. + +Falsified if: + +* **None of the three work.** Hydration requires re-running the + proposal-side assembly logic at `create_media_buy` time + (because assembly depends on signal-time-of-day, signal agent + state at brief moment, or other non-idempotent inputs). +* **Framework-owned hydration is the wrong primitive.** The right + answer is "framework owns no hydration; adopter handles it + inside `_create_media_buy_impl`" — meaning the SDK's framework + abstraction is incorrectly drawn. + +If any falsifier fires: #502's framework-managed-recipe-state +model is wrong. The recipe is adopter-owned data the SDK doesn't +need to mediate; the SDK's job is just to type the contract. + +### Q4 — What is the right shape for the HITL resumption marker? + +Prior: the experiment can answer "does the SDK seam accommodate +salesagent's setattr-sentinel pattern" with the SDK as it ships +today. + +**Step 0 partially answered this:** the setattr pattern works as-is +(`compose_method` passes `req` through unchanged; setattr on a +Pydantic model with `extra='forbid'` survives Python-level +dispatch). So the prior holds for this experiment. + +The deeper question — "what is the right Protocol seam for +resumption markers across multiple adopters?" — is **N=1 from +this experiment**. Falsifiers for the broader claim: + +* **Salesagent's pattern doesn't map cleanly to a paused-coroutine + shape** another adopter might use. If a future adopter with + TaskRegistry-style resumption can't reuse the experiment's + marker shape, the typed seam needs to be different. +* **The setattr survives only because no transport boundary + intervenes.** If the experiment's SDK runtime ever needs to + re-validate, re-project, or serialize the request between gate + and inner, the sentinel dies. (This isn't true today — verified + in Step 0.5 — but it's a fragile invariant.) + +If any falsifier fires: the Protocol RFC should propose a typed +`ctx.resumption_token: ResumptionToken | None` that's robust to +re-projection. **The experiment can't choose between shapes; it +just shows the untyped pattern works for one adopter.** + +### Q5 — Does F12 webhook auto-emit hold up under real load? + +Prior, original: `WebhookSender` configured on `serve(...)` fires +sync-completion webhooks automatically, signed correctly, retried +on transient failure, logged-and-swallowed on permanent failure — +without adapter code participating. §3.14's claim that adopters +delete their webhook plumbing wholesale. + +**Step 0.6 already partially falsified this.** Salesagent's +`X-Webhook-Signature` scheme and SDK's `X-AdCP-Signature` scheme +are incompatible. §3.14 needs a correction. So the prior is +already known wrong — the question now is which of three cutover +paths the experiment recommends: + +(a) Buyers migrate to SDK signing. +(b) SDK ships a salesagent-compatible signing mode alongside + `from_adcp_legacy_hmac`. +(c) Side-car preserves salesagent's `webhook_authenticator.py` + rather than using F12 auto-emit. + +Falsifiers for the SDK→SDK signing path (the only one the +experiment validates): + +* **`WebhookSender` → `WebhookReceiver` round-trip fails** with + matching secrets (extremely unlikely — well-tested in + conformance suite, but worth running once on day 1). +* **Auto-emit doesn't fire** after a successful mutating tool + call (means F12 framework wiring is broken or our `serve(...)` + config is wrong). +* **Retry / failure-swallow doesn't behave per spec** — would + require buyer-side observation of retried deliveries. + +If any falsifier fires: F12 isn't ready as the default path even +for SDK→SDK signing. + ## Risks (revised) * **Wrap target drift.** Mitigated by Step 0 `_impl` identification @@ -716,9 +897,9 @@ section above. Remaining items are concrete prereqs. the experiment, use SDK→SDK signing only (test buyer is `adcp.WebhookReceiver` with the same secret). Production cutover requires buyer migration as separate work. -0.7. Pre-register the candidate contradictions for each of the five - learning questions (which finding would tell us each prior is - wrong). +0.7. ✅ Falsification signals pre-registered for each of the five + (six, with Q1.5) learning questions. See "Pre-registered + falsification signals" section above. **Phase 1 — `dynamic_products.py` recipe falsification (~1 day).** From 599d96b34fe828fc25cb132c9a98d3da7270d90a Mon Sep 17 00:00:00 2001 From: Brian O'Kelley Date: Sun, 3 May 2026 20:42:49 -0400 Subject: [PATCH 7/7] =?UTF-8?q?docs(proposals):=20Phase=201A=20early=20fin?= =?UTF-8?q?dings=20=E2=80=94=20Q1.5=20falsifier=20fires?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Read dynamic_products.py end-to-end (505 LOC, src/services/ dynamic_products.py). Two of three pre-registered Q1.5 falsifiers fire from reading alone, no harness needed. Five structural facts: 1. Variants are persistent DB rows in the products table — not session-scoped state 2. Variants share the template's implementation_config verbatim (line 303); recipe shape doesn't differ 3. Variant identity is globally deterministic via md5 hash of activation_key — dedup crosses sessions/buyers 4. Signal-derived data lives on the Product row (signal_metadata, activation_key, parent_product_id, expires_at, is_dynamic_variant), NOT in implementation_config 5. Variants have an independent lifecycle (TTL + archival) with no relationship to proposal acceptance/finalization Falsifiers fired (Q1.5): - "Variant Products require new schema rows" — confirmed - "Hash-dedup state crosses sessions" — confirmed - "Recipe schema requires proposal_id lookup" — not strictly fired, but related: recipe is Product-scoped not proposal-scoped, so #502's framework-managed-recipe-state model is wrong shape Implication for #502: - Recipe is adopter-owned data on the Product row (or equivalent) - Framework's job at the seam is TYPING the recipe contract, not CACHING it - Proposal-time assembly (generating new Product rows that share a template's recipe) is adopter logic; framework shouldn't try to cache "proposal recipes" because proposals don't own them Exit criterion (5) — at least one finding contradicts a #502 prior — satisfied early, pre-registered, before 1B harness runs. Q1 prediction (still pre-1B): wrapper is small (~50-80 LOC) for the dynamic-products subset. Wrapper sketch included in doc. Q2 still pending — needs 1B run projecting actual implementation_config through a typed Pydantic recipe. Phase 1B harness setup documented (worktree, fixtures, signals_agent_registry mock, wrapper module, typed GAMRecipe model, test harness). ~2 hours in a salesagent worktree. Next session. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../salesagent-sidecar-experiment.md | 198 ++++++++++++++++++ 1 file changed, 198 insertions(+) diff --git a/docs/proposals/salesagent-sidecar-experiment.md b/docs/proposals/salesagent-sidecar-experiment.md index 15037c528..e49ccb42d 100644 --- a/docs/proposals/salesagent-sidecar-experiment.md +++ b/docs/proposals/salesagent-sidecar-experiment.md @@ -624,6 +624,204 @@ Phase 1 is the cheapest place to falsify this. If the wire shape can't carry signal-driven variants without escape hatches, #502's recipe model needs revision. +## Phase 1 — early findings from code reading (1A) + +Phase 1 is in two parts: **1A** is a careful read of `dynamic_products.py` +to surface findings that don't require a running harness; **1B** is the +empirical run with fixtures + projection to wire shape. + +1A complete, 1B TBD. Here's what 1A produced. + +### What `dynamic_products.py` actually does + +* `generate_variants_for_brief(tenant_id, brief, our_agent_url)` + (`src/services/dynamic_products.py:28`, async). +* Reads `Product` rows where `is_dynamic=True` from the DB + (the "templates"). +* Calls the singleton `SignalsAgentRegistry.get_signals(brief, ...)` + with all configured signal agents for the tenant. +* For each (template, signal) pair, generates a variant `Product` + via `generate_variants_from_signals` (`:133`): + - Computes deterministic `variant_id = + f"{template_id}__variant_{md5(activation_key)[:8]}"` (`:267`) + - Looks up existing variant by `variant_id`; if found, extends + `expires_at` and returns it + - If not, creates a new `Product(**variant_data)` with + `implementation_config` copied verbatim from the template + (`:303`) +* `session.add(variant)` then `session.commit()` — variants are + full DB rows. + +### Five structural facts that bear on the experiment + +1. **Variants are persistent DB rows**, not session-scoped state. + They live in the same `products` table as static products, + indexed and queryable globally. +2. **Variants share the template's `implementation_config` + verbatim** (`:303`). The recipe shape doesn't differ between + template and variant. +3. **Variant identity is globally deterministic.** Multiple briefs + from any buyer hitting the same signal segment converge on the + same `variant_id` via the md5 hash of the activation key. +4. **Variants carry signal-specific data on the Product row, NOT + in `implementation_config`.** `signal_metadata`, + `activation_key`, `parent_product_id`, `expires_at`, + `is_dynamic_variant` are top-level Product columns. +5. **Variants have an independent lifecycle.** TTL via + `expires_at`; archival via `archive_expired_variants()` + (`:475`). The lifecycle has nothing to do with proposal + acceptance, finalization, or buy creation. + +### Falsifiers fired by 1A (no harness needed) + +**Q1.5 — Does the recipe model allow proposal-time *assembly*?** +Three pre-registered falsifiers; **two confirmed**, one partial: + +* ✅ **"Variant Products require new schema rows."** + Confirmed. Salesagent variants are full `Product` rows with + signal-derived columns (`signal_metadata`, `activation_key`, + `parent_product_id`, `is_dynamic_variant`, `expires_at`) + that don't fit a "recipe in session cache" abstraction. +* ✅ **"Hash-dedup state crosses sessions."** + Confirmed. `generate_variant_id` is a deterministic hash; + variants are deduplicated globally across all briefs from all + buyers, not per-session. +* ⚪ **"Recipe schema requires `proposal_id` lookup."** + Not directly fired (#502 doesn't strictly require this in its + current draft), but related: salesagent's recipe is + *Product-scoped*, not proposal-scoped. The framework + abstraction in #502 conflates two layers — the recipe content + (Product-scoped, stable) and the proposal/session state + (per-buyer-session, transient). They don't need to share a + cache. + +**Implication for #502.** The "framework-managed recipe state +against `proposal_id`" framing in #502 is the wrong shape for +salesagent's pattern. The corrected model: + +* Recipe lives on the Product (or its equivalent in adopter + storage). Adopter-owned, not framework-owned. +* Framework's job at the seam is **typing** the recipe contract, + not **caching** it. `recipe_type: ClassVar[type[Recipe]]` on + `DecisioningPlatform` is the contract; the framework validates + the shape at adapter boundaries, doesn't manage the storage. +* Proposal-time *assembly* — generating new Product rows that + share a template's recipe — is adopter logic. The framework + shouldn't try to cache "proposal recipes" because proposals + don't own them. + +**This is a contradicting finding for #502 — exit criterion (5) +satisfied early.** The recipe-as-framework-managed-state model +in the current draft of #502 needs revision. The simpler shape +(framework types the recipe contract; storage is adopter-owned) +fits salesagent without escape hatches. + +### What Q1 looks like from reading + +Q1 asks whether `dynamic_products.py` factors onto +`ProposalManager.get_products` via a thin wrapping. Reading the +code suggests the wrapper is **small** for the dynamic-products +subset: + +```python +# Sketch — not run yet, awaiting 1B +async def get_products( + self, req: GetProductsRequest, ctx: RequestContext[Any] +) -> GetProductsResponse: + tenant_id = ctx.account.metadata["tenant_id"] + our_agent_url = self.our_agent_url + + # Static catalog (existing path) + static_products = await self._fetch_static_catalog(tenant_id, req) + + # Dynamic variants (the salesagent path) + if req.brief: + variant_products = await generate_variants_for_brief( + tenant_id=tenant_id, + brief=req.brief, + our_agent_url=our_agent_url, + ) + else: + variant_products = [] + + # Project all (static + variants) to wire shape + all_products = [ + self._project_to_wire(p) for p in static_products + variant_products + ] + return GetProductsResponse(products=all_products) +``` + +**Predicted glue: ~50-80 LOC** for the dynamic-products subset +(request projection in, ORM-to-wire projection out). Well under +the 60% / 303 LOC threshold for the whole `dynamic_products.py` +body. **Q1 unlikely to fire on this subset.** + +Caveat: the full `_get_products_impl` wrap (Phase 2) does more — +brand manifest filtering, brand-policy gates, AI ranking. That's +where Q1 might bite. 1A doesn't speak to that. + +### Q2 (recipe carries enough) — preliminary read + +Variants share template's `implementation_config` verbatim. So +the question reduces to: does the typed `GAMRecipe` shape carry +salesagent's actual `implementation_config` content? That's a +**1B** question — needs running variants and projecting their +`implementation_config` field through a typed Pydantic recipe. + +Pre-registered falsifiers stand: any `extra: dict[str, Any]`, +any `# type: ignore`, any lossy round-trip. 1B will produce the +verdict. + +### Phase 1B — harness still TBD + +The empirical run requires: + +* **Salesagent worktree** with a writable test DB (sqlite is + fine). +* **Seeded `Product` template rows** with `is_dynamic=True` and + configured `signals_agent_ids`. ~3 templates covering the + shapes we expect (key_value activation, segment_id activation, + null-activation fallback). +* **Mocked `signals_agent_registry`** — replace the singleton + with a fixture that returns deterministic signal lists. The + registry currently lives in `src/core/signals_agent_registry.py` + as a module-level singleton; mock via patching the module + attribute or via the `get_signals_agent_registry()` accessor. +* **The wrapper module** (sketched above) at + `src/sdk_runtime/proposal_manager_wrapper.py` in the salesagent + worktree. +* **The typed `GAMRecipe` Pydantic model** — needs writing, + informed by reading actual `implementation_config` JSON from + salesagent fixtures (or a dev DB). +* **The test harness** — pytest test that calls the wrapper with + recorded inputs, asserts the output, measures glue LOC, + documents any escape hatches encountered. + +Not done in this session. The setup is concrete and small (~2 +hours of work in a salesagent worktree), but it requires +salesagent fixtures and a running `SignalsAgentRegistry` mock — +both outside the scope of an adcp-client-python doc-writing +session. **Next session in a salesagent worktree completes 1B.** + +### Phase 1A net result + +* **Q1.5 contradicting finding for #502** — confirmed without + running anything. The "framework-managed recipe state" model + is wrong shape; recipe is Product-scoped and adopter-owned; + framework's job is to type the contract. +* **Q1 prediction** — wrapper is small (~50-80 LOC) for the + dynamic-products subset. 1B will measure exactly. +* **Q2 still pending** — needs 1B run with real + `implementation_config` values projected through a typed recipe. +* **Exit criterion (5) satisfied early** — at least one + contradicting finding, pre-registered, fired before 1B. + +The experiment can proceed to 1B / Phase 2 with the recipe model +revision in mind. **Or**, given the contradicting finding, we +revise #502 first and then run 1B against the revised model. The +falsifier already fired; running 1B confirms the empirical edges +but doesn't change the structural conclusion. + ## Pre-registered falsification signals Self-review's "one author wearing three hats" warning applies — if I