Skip to content

fix: distinguish task completion from operation success#34

Merged
bokelley merged 3 commits intomainfrom
bokelley/fix-media-buy-success-detection
Oct 8, 2025
Merged

fix: distinguish task completion from operation success#34
bokelley merged 3 commits intomainfrom
bokelley/fix-media-buy-success-detection

Conversation

@bokelley
Copy link
Copy Markdown
Contributor

@bokelley bokelley commented Oct 6, 2025

Problem

When agents return status: completed with success: false in the response data, the library incorrectly reported the operation as successful. This caused silent failures:

Example: Wonderstruck Media Buy

Agent response:

{
  "success": false,
  "message": "Missing required parameters: ['product_ids', 'total_budget', 'flight_start_date', 'flight_end_date']"
}

Library returned:

{ success: true, status: 'completed', data: { success: false, ... } }

Result: Script printed "✅ SUCCESS! Media buy created" even though the operation failed and no media buy was created.

Root Cause

The library conflated two different concepts:

  1. Task completion - Did the API call complete? (protocol level)
  2. Operation success - Did the business operation succeed? (application level)

When A2A/MCP returns status: completed, it means "I successfully processed your request and returned a response." It does NOT mean "the operation succeeded."

Solution

Check response.data.success to determine actual operation success:

  • If data.success === false, set result.success = false and include error field
  • If data.success is undefined/missing, assume success (backward compatible)
  • Extract error message from data.message into result.error

Changes

Applied to all completion code paths in TaskExecutor.ts:

  1. Immediate completion (ADCP_STATUS.COMPLETED)
  2. Polling completion (waitForWorkingCompletion)
  3. Task polling (pollTaskCompletion)
  4. Default/unknown status handling
// Before
return { success: true, status: 'completed', data: completedData };

// After
const operationSuccess = completedData?.success !== false;
return {
  success: operationSuccess,
  status: 'completed',
  data: completedData,
  error: operationSuccess ? undefined : (completedData?.message || 'Operation failed')
};

Impact

Before:

✅ SUCCESS! Media buy created
🎉 Media buy successfully created on Wonderstruck!
   Your 300x250 creative is now running.

After:

❌ FAILED to create media buy

Error Details:
   Error: Missing required parameters: ['product_ids', 'total_budget', 'flight_start_date', 'flight_end_date']
   Status: completed
   Response Time: 75ms

Backward Compatibility

Fully backward compatible - Agents that don't return a success field are unaffected. Only agents that explicitly return success: false will trigger the new failure handling.

Testing

  • Build passes
  • Tested with Wonderstruck agent - correctly reports failure
  • Error message properly surfaced to scripts
  • No changes to existing test expectations needed

🤖 Generated with Claude Code

## Problem
When agents return `status: completed` with `success: false` in the response
data, the library incorrectly reported the operation as successful. This caused
silent failures where:
- Task execution completed successfully
- But the actual operation failed (validation error, rejection, etc.)
- Library returned `success: true`, masking the failure

Example: Wonderstruck agent returns:
```json
{
  "success": false,
  "message": "Missing required parameters: ['product_ids', ...]"
}
```
But library returned `success: true, status: 'completed'`

## Root Cause
The library conflated two different concepts:
1. **Task completion** - Did the API call complete? (A2A/MCP level)
2. **Operation success** - Did the operation succeed? (Application level)

## Solution
Check `response.data.success` to determine actual operation success:
- If `data.success === false`, set `result.success = false`
- If `data.success` is undefined/missing, assume success (backward compatible)
- Include error message from `data.message` in `result.error`

Applied to all completion paths:
- Immediate completion (ADCP_STATUS.COMPLETED)
- Polling completion (waitForWorkingCompletion)
- Task polling (pollTaskCompletion)
- Default/unknown status handling

## Impact
- Scripts now correctly report failures: "❌ FAILED to create media buy"
- Error messages surfaced: "Missing required parameters..."
- Backward compatible: existing agents that don't return `success` field unaffected

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@gitguardian
Copy link
Copy Markdown

gitguardian bot commented Oct 8, 2025

️✅ There are no secrets present in this pull request anymore.

If these secrets were true positive and are still valid, we highly recommend you to revoke them.
While these secrets were previously flagged, we no longer have a reference to the
specific commits where they were detected. Once a secret has been leaked into a git
repository, you should consider it compromised, even if it was deleted immediately.
Find here more information about risks.


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

@bokelley bokelley force-pushed the bokelley/fix-media-buy-success-detection branch from e20a9cc to 5661c68 Compare October 8, 2025 02:45
@bokelley bokelley merged commit 34b8d88 into main Oct 8, 2025
6 checks passed
This was referenced Oct 8, 2025
bokelley added a commit that referenced this pull request Apr 7, 2026
Block __proto__, constructor, and prototype keys in setPath() to
prevent prototype-polluting assignments from crafted YAML paths.
Resolves CodeQL alerts #33 and #34.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
bokelley added a commit that referenced this pull request Apr 7, 2026
* feat: add storyboard-driven testing module with CLI

Storyboards become the primary testing concept, replacing scenario-based
comply retrofitting. Each YAML step maps directly to a SingleAgentClient
method via a stateless execution engine designed for LLM consumption.

- Add src/lib/testing/storyboard/ module (types, runner, loader, validations, context, task-map)
- Add CLI: adcp storyboard {list, show, run, step} with --json output
- Bundle 12 storyboard YAMLs from adcontextprotocol/adcp spec repo
- Tag storyboards with platform_types for backwards compat with PlatformType
- Add yaml as runtime dependency for YAML parsing

Closes #423

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: format storyboard module with prettier

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: migrate comply() to storyboard-driven testing

Storyboards are now the single testing surface for comply(). The
compliance engine runs storyboard YAMLs instead of hand-written
scenario functions, while maintaining the existing ComplianceResult
interface for backwards compatibility.

YAML format extensions:
- expect_error: inverts pass/fail for negative testing
- requires_tool: skip steps when agent lacks a tool
- context_outputs/context_inputs: explicit data flow between steps
- error_code validation: check error codes in error responses
- track/required_tools: map storyboards to compliance tracks

New compliance storyboards (10):
- governance_property_lists, governance_content_standards
- si_session, brand_rights, media_buy_state_machine
- error_compliance, schema_validation, behavioral_analysis
- audience_sync, deterministic_testing

Deprecates SCENARIO_REQUIREMENTS, DEFAULT_SCENARIOS, testAllScenarios()
in favor of storyboard execution. Old exports remain functional.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: guard setPath against prototype pollution

Block __proto__, constructor, and prototype keys in setPath() to
prevent prototype-polluting assignments from crafted YAML paths.
Resolves CodeQL alerts #33 and #34.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add request builder for storyboard steps

Build valid requests from discovered context (products, accounts,
formats) instead of sending raw YAML sample_request payloads. Each
task has a builder that mirrors how hand-written scenarios construct
requests — selecting products with pricing options, building proper
asset records, generating valid date ranges, etc.

sample_request from YAML becomes documentation/fallback only. The
runner priority is: --request override > request builder > sample_request.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: session init, schema registration, and validation paths

- Initialize MCP session in runner before executing steps (fixes
  Streamable HTTP → SSE degradation after a few calls)
- Close MCP connections after standalone storyboard/step runs
- Register 20+ missing response schemas in TOOL_RESPONSE_SCHEMAS
  (sync_accounts, list_accounts, governance, SI, capabilities, etc.)
- Fix get_media_buy_delivery validation path: media_buys → media_buy_deliveries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: bump MCP SDK to 1.29.0, retry session errors, fix validation paths

- Bump @modelcontextprotocol/sdk from 1.27.1 to 1.29.0
- Retry StreamableHTTP on any StreamableHTTPError before falling back
  to SSE (workaround for SDK #1708 and #1852 — session expiry)
- Fix sync_creatives validation path: results[0].action → creatives[0].action
  (matches actual SyncCreativesSuccess schema)

Test agent media_buy_seller: 8/9 pass (only sync_governance fails due to
missing spec schema — adcontextprotocol/adcp#1978)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: sync schemas and register sync_governance response schema

Resync schemas from spec to pick up the new sync_governance response
schema (adcontextprotocol/adcp#1978). Register it in TOOL_RESPONSE_SCHEMAS.

Test agent media_buy_seller: 9/9 pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: regenerate types to include syncGovernance method

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use Object.defineProperty in setPath to satisfy CodeQL

Replace bracket assignment with Object.defineProperty so CodeQL's
static analysis recognizes the prototype pollution guard. The
FORBIDDEN_KEYS check was already correct but CodeQL doesn't trace
through Set.has().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: extract parsePath/resolvePath/setPath to shared path.ts

Deduplicate parsePath (was in both context.ts and validations.ts).
Move resolvePath and setPath to path.ts as the canonical location.
Add isPlainObject type guard before Object.defineProperty in setPath.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: assign track and required_tools to all storyboards

Every storyboard now has a track field mapping it to a ComplianceTrack,
so comply() can discover and run storyboards for all 11 tracks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant