Skip to content

feat(testing): add SellerA2AClient for in-process A2A handler testing#694

Merged
bokelley merged 3 commits into
mainfrom
claude/issue-678-seller-a2a-client
May 12, 2026
Merged

feat(testing): add SellerA2AClient for in-process A2A handler testing#694
bokelley merged 3 commits into
mainfrom
claude/issue-678-seller-a2a-client

Conversation

@bokelley
Copy link
Copy Markdown
Contributor

Refs #678.

Why

`SellerTestClient` shipped in #666 killed the JSON-RPC envelope + `structuredContent` extraction boilerplate every MCP-adopting test file was rewriting by hand. Adopters who ship A2A in production today (salesagent, the v3 reference seller in this SDK) have been rewriting the exact same shape of boilerplate against the A2A executor — building proto Messages with DataParts, plumbing a SendMessageRequest into RequestContext, draining an EventQueueLegacy, projecting the terminal Task's first artifact back to a dict, and handling structured vs. unstructured failure modes separately.

`SellerA2AClient` collapses that into the same one-call shape as the MCP version.

What this PR ships

```python
from adcp.testing import SellerA2AClient

@pytest.fixture
def seller_a2a():
return SellerA2AClient(MySeller())

async def test_buy_not_found(seller_a2a):
result = await seller_a2a.invoke(
"update_media_buy", {"media_buy_id": "missing", ...}
)
assert not result.ok
assert result.adcp_error.code == "MEDIA_BUY_NOT_FOUND"
```

  • New `SellerA2AClient` in `adcp.testing`. Same constructor shape as `SellerTestClient` (takes a `DecisioningPlatform`, optional `validation` config).
  • Reuses `ToolInvokeResult` and `AdcpErrorPayload` — assertion grammar is identical across MCP and A2A test suites.
  • Drains the A2A event queue with a bounded loop + per-event timeout (default 5s). A buggy handler that never publishes a terminal event raises a clear `RuntimeError` instead of hanging the test runner.
  • Structured A2A errors (AdcpError raised by handler, `adcp_error` in failed-Task DataPart per transport-errors.mdx §A2A Binding) extract identically to MCP.
  • Unstructured A2A errors (unknown skill, unparseable message — FAILED Task with no `adcp_error` payload) synthesize into an `INTERNAL_ERROR`-coded `AdcpErrorPayload` so callers can assert on `result.adcp_error` uniformly across structured/unstructured failures.

What this PR does NOT ship (deferred to follow-ups on #678)

These are the genuinely A2A-shaped concerns that motivated the original issue framing:

  • Push-notification capture sink — assertion grammar for "did the seller emit the completion webhook? was it RFC 9421-signed with the right key id? was the body shape correct?"
  • Intermediate-state observer — assertion on `working` / `input_required` transitions.
  • Task cancel / resume harness — drive a task mid-flight, cancel, assert on resulting state and cleanup.

All three depend on the TaskStore + push-notification + middleware hooks that are themselves deferred per the framework's A2A roadmap. Shipping basic `invoke()` now buys adopters the boilerplate elimination they need today; the lifecycle harness arrives when the framework hooks do.

What was tested

  • `pytest tests/test_seller_a2a_client.py -v` — 9 new tests covering success path, structured error extraction (code/message/recovery/field), unstructured-failure synthesis (unknown skill), data=None on error, structured_content always populated, public import path.
  • `ruff check src/adcp/testing/` — clean.
  • `mypy src/adcp/testing/harness.py --ignore-missing-imports` — clean.

🤖 Generated with Claude Code

Refs #678.

A2A sibling to SellerTestClient. Same call shape (`await client.invoke(skill, payload)`),
same return type (ToolInvokeResult), but routes through the A2A executor +
event-queue dispatch path instead of MCP's tool call.

This eliminates the same boilerplate adopters currently rewrite in every A2A
test file: ADCPAgentExecutor construction, RequestContext + SendMessageRequest
+ DataPart proto plumbing, EventQueueLegacy drain loop, terminal-Task
projection back to a dict, plus separate handling of the structured (adcp_error
in DataPart) vs. unstructured (FAILED Task without adcp_error) error paths.

The harness drains the event queue with a bounded loop + per-event timeout so
a buggy handler that never publishes a terminal event can't hang the test
runner. Unstructured A2A failures (unknown skill, unparseable message) are
synthesized into an INTERNAL_ERROR-coded AdcpErrorPayload so callers can
assert on `result.adcp_error` uniformly across both failure modes.

Explicitly out of scope for this PR (tracked as follow-ups on #678):
- Push-notification capture sink for asserting on outbound signed webhooks
- Intermediate-state observation (working/input_required transitions)
- Task cancellation harness

These depend on the TaskStore + push-notification + middleware hooks
that are themselves deferred per the framework's A2A adoption roadmap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bokelley
Copy link
Copy Markdown
Contributor Author

Review

All 13 CI jobs green. The boilerplate elimination matches SellerTestClient so adopters with mixed MCP/A2A test suites get a single grammar — that consistency is the win.

mergeStateStatus: BLOCKED here just means review-required, not check failure.

Solid

  • Same ToolInvokeResult + AdcpErrorPayload return shape as the MCP client. An adopter porting an MCP assertion to A2A only changes the fixture, not the assertions. That's the right call.
  • _executor_lock + to_thread(self._build_executor_sync) is correct: create_adcp_server_from_platform calls asyncio.run internally via validate_capabilities_response_shape, which would explode if invoked from an already-running loop. The fix is right, but the root cause is in the framework, not this client — see follow-up below.
  • Unstructured-failure synthesis (INTERNAL_ERROR envelope for unknown skill / unparseable message) closes a real ergonomics gap. Without it, every adopter would have to handle structured vs. unstructured failures with two assertion patterns.
  • Strict envelope validation (raise RuntimeError if adcp_error.code/adcp_error.message is missing) catches non-conformant servers loudly. Good defensive posture for a test client.

Non-blocking observations

  1. for _ in range(32): is a magic number. The comment says "bounded to a small number of intermediate state events" — fine for current A2A behavior, but the cap should be a constructor kwarg. If a future intermediate-state observer (one of the deferred follow-ups in the PR body) ever lands, that 32 will start mattering. Suggest max_events: int = 32 as a constructor arg now to avoid an API break later.

  2. EventQueueLegacy import. The class name advertises its own deprecation. If a2a.server.events has a non-legacy queue, prefer it; if not, file an upstream tracking issue. Will add this to my follow-up batch.

  3. Test gap: validation round-trip. Constructor accepts a validation: ValidationHookConfig param documented as "Pass SERVER_DEFAULT_VALIDATION to match production behavior" — but no test asserts that an invalid payload is rejected when validation is wired. With validation defaulting to None (off) and not tested in either configuration, an adopter who sets validation=SERVER_DEFAULT_VALIDATION is trusting the executor to enforce. One test that flips the switch and asserts the rejection path would close that gap.

  4. role="ROLE_USER" as a string literal. Works because protobuf accepts the enum value name as a string, but if a2a.types.Role.USER (or similar) is available, prefer it — the protobuf enum migration risk is real and pinning by name dodges that.

  5. Skill-name validation deferred to server. A typo'd skill returns INTERNAL_ERROR synthesized envelope. That's correct (mirrors prod behavior — unknown skills fail at the executor), but the test that asserts this (test_a2a_invoke_unknown_skill_surfaces_failure) is the only coverage of the unstructured-failure synthesis path. Worth a second test for "no parseable skill at all" (DataPart present but no skill key) to exercise the same path with a different upstream cause.

Deferred follow-ups already in PR body

The three #678 follow-ups (push-notification capture, intermediate-state observer, cancel/resume harness) are correctly scoped out. None of them block this PR — the basic invoke path is the most-used surface and unlocks the largest swath of adopter test rewrites.

LGTM modulo the follow-ups (which I'll file). The validation round-trip test is the only one I'd ask for in this PR specifically; the rest are issues for separate work.

bokelley and others added 2 commits May 12, 2026 11:21
Per PR #694 review: prove the `validation=` parameter actually engages
the validation hook chain end-to-end, not just that __init__ accepts it.

The stub returns `creative_formats: []` while the spec requires
`formats`. With validation off (the test default) this passes through;
with SERVER_DEFAULT_VALIDATION the response-side validator rejects
and surfaces a structured VALIDATION_ERROR with `side: response` in
details. Asserting on that proves the harness threads validation
through the A2A executor's event-queue → terminal-Task path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bokelley bokelley merged commit 2d1ae2f into main May 12, 2026
15 of 16 checks passed
@bokelley bokelley deleted the claude/issue-678-seller-a2a-client branch May 12, 2026 17:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant