fix: distinguish task completion from operation success#34
Merged
Conversation
## Problem
When agents return `status: completed` with `success: false` in the response
data, the library incorrectly reported the operation as successful. This caused
silent failures where:
- Task execution completed successfully
- But the actual operation failed (validation error, rejection, etc.)
- Library returned `success: true`, masking the failure
Example: Wonderstruck agent returns:
```json
{
"success": false,
"message": "Missing required parameters: ['product_ids', ...]"
}
```
But library returned `success: true, status: 'completed'`
## Root Cause
The library conflated two different concepts:
1. **Task completion** - Did the API call complete? (A2A/MCP level)
2. **Operation success** - Did the operation succeed? (Application level)
## Solution
Check `response.data.success` to determine actual operation success:
- If `data.success === false`, set `result.success = false`
- If `data.success` is undefined/missing, assume success (backward compatible)
- Include error message from `data.message` in `result.error`
Applied to all completion paths:
- Immediate completion (ADCP_STATUS.COMPLETED)
- Polling completion (waitForWorkingCompletion)
- Task polling (pollTaskCompletion)
- Default/unknown status handling
## Impact
- Scripts now correctly report failures: "❌ FAILED to create media buy"
- Error messages surfaced: "Missing required parameters..."
- Backward compatible: existing agents that don't return `success` field unaffected
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
️✅ There are no secrets present in this pull request anymore.If these secrets were true positive and are still valid, we highly recommend you to revoke them. 🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request. |
e20a9cc to
5661c68
Compare
This was referenced Oct 8, 2025
bokelley
added a commit
that referenced
this pull request
Apr 7, 2026
* feat: add storyboard-driven testing module with CLI
Storyboards become the primary testing concept, replacing scenario-based
comply retrofitting. Each YAML step maps directly to a SingleAgentClient
method via a stateless execution engine designed for LLM consumption.
- Add src/lib/testing/storyboard/ module (types, runner, loader, validations, context, task-map)
- Add CLI: adcp storyboard {list, show, run, step} with --json output
- Bundle 12 storyboard YAMLs from adcontextprotocol/adcp spec repo
- Tag storyboards with platform_types for backwards compat with PlatformType
- Add yaml as runtime dependency for YAML parsing
Closes #423
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style: format storyboard module with prettier
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: migrate comply() to storyboard-driven testing
Storyboards are now the single testing surface for comply(). The
compliance engine runs storyboard YAMLs instead of hand-written
scenario functions, while maintaining the existing ComplianceResult
interface for backwards compatibility.
YAML format extensions:
- expect_error: inverts pass/fail for negative testing
- requires_tool: skip steps when agent lacks a tool
- context_outputs/context_inputs: explicit data flow between steps
- error_code validation: check error codes in error responses
- track/required_tools: map storyboards to compliance tracks
New compliance storyboards (10):
- governance_property_lists, governance_content_standards
- si_session, brand_rights, media_buy_state_machine
- error_compliance, schema_validation, behavioral_analysis
- audience_sync, deterministic_testing
Deprecates SCENARIO_REQUIREMENTS, DEFAULT_SCENARIOS, testAllScenarios()
in favor of storyboard execution. Old exports remain functional.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: guard setPath against prototype pollution
Block __proto__, constructor, and prototype keys in setPath() to
prevent prototype-polluting assignments from crafted YAML paths.
Resolves CodeQL alerts #33 and #34.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add request builder for storyboard steps
Build valid requests from discovered context (products, accounts,
formats) instead of sending raw YAML sample_request payloads. Each
task has a builder that mirrors how hand-written scenarios construct
requests — selecting products with pricing options, building proper
asset records, generating valid date ranges, etc.
sample_request from YAML becomes documentation/fallback only. The
runner priority is: --request override > request builder > sample_request.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: session init, schema registration, and validation paths
- Initialize MCP session in runner before executing steps (fixes
Streamable HTTP → SSE degradation after a few calls)
- Close MCP connections after standalone storyboard/step runs
- Register 20+ missing response schemas in TOOL_RESPONSE_SCHEMAS
(sync_accounts, list_accounts, governance, SI, capabilities, etc.)
- Fix get_media_buy_delivery validation path: media_buys → media_buy_deliveries
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: bump MCP SDK to 1.29.0, retry session errors, fix validation paths
- Bump @modelcontextprotocol/sdk from 1.27.1 to 1.29.0
- Retry StreamableHTTP on any StreamableHTTPError before falling back
to SSE (workaround for SDK #1708 and #1852 — session expiry)
- Fix sync_creatives validation path: results[0].action → creatives[0].action
(matches actual SyncCreativesSuccess schema)
Test agent media_buy_seller: 8/9 pass (only sync_governance fails due to
missing spec schema — adcontextprotocol/adcp#1978)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: sync schemas and register sync_governance response schema
Resync schemas from spec to pick up the new sync_governance response
schema (adcontextprotocol/adcp#1978). Register it in TOOL_RESPONSE_SCHEMAS.
Test agent media_buy_seller: 9/9 pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: regenerate types to include syncGovernance method
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: use Object.defineProperty in setPath to satisfy CodeQL
Replace bracket assignment with Object.defineProperty so CodeQL's
static analysis recognizes the prototype pollution guard. The
FORBIDDEN_KEYS check was already correct but CodeQL doesn't trace
through Set.has().
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: extract parsePath/resolvePath/setPath to shared path.ts
Deduplicate parsePath (was in both context.ts and validations.ts).
Move resolvePath and setPath to path.ts as the canonical location.
Add isPlainObject type guard before Object.defineProperty in setPath.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: assign track and required_tools to all storyboards
Every storyboard now has a track field mapping it to a ComplianceTrack,
so comply() can discover and run storyboards for all 11 tracks.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
When agents return
status: completedwithsuccess: falsein the response data, the library incorrectly reported the operation as successful. This caused silent failures:Example: Wonderstruck Media Buy
Agent response:
{ "success": false, "message": "Missing required parameters: ['product_ids', 'total_budget', 'flight_start_date', 'flight_end_date']" }Library returned:
Result: Script printed "✅ SUCCESS! Media buy created" even though the operation failed and no media buy was created.
Root Cause
The library conflated two different concepts:
When A2A/MCP returns
status: completed, it means "I successfully processed your request and returned a response." It does NOT mean "the operation succeeded."Solution
Check
response.data.successto determine actual operation success:data.success === false, setresult.success = falseand includeerrorfielddata.successis undefined/missing, assume success (backward compatible)data.messageintoresult.errorChanges
Applied to all completion code paths in
TaskExecutor.ts:ADCP_STATUS.COMPLETED)waitForWorkingCompletion)pollTaskCompletion)Impact
Before:
After:
Backward Compatibility
✅ Fully backward compatible - Agents that don't return a
successfield are unaffected. Only agents that explicitly returnsuccess: falsewill trigger the new failure handling.Testing
🤖 Generated with Claude Code