fix(x402): gate artifact delivery on settlement success (#562)#563
Conversation
x402's verify→execute→settle ordering let a verified-but-unsettled payment still deliver the artifact: when settle returned a non-success or raised, the worker carried on, marked the task completed, and pushed the artifact while only recording the failure in metadata. A payer who double-spends between verify and settle, or whose settlement times out, walks away with paid output. - Gate artifact delivery on settle_ok in _handle_terminal_state; on failure delegate to a new _handle_settlement_failure that marks the task failed with no artifact pushed. - Persist EIP-3009 nonce, full authorization block, and network in _settle_payment failure metadata so an operator can reconcile a half-settled transaction instead of seeing only str(e). - Add three tests covering: settle-fail withholds artifact, settle exception persists recovery metadata, happy path unchanged. - Document the new behavior, the recovery metadata fields, and the remaining parallel-nonce double-spend gap in docs/PAYMENT.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
📝 WalkthroughWalkthroughManifestWorker now withholds completed-task artifacts when x402 payment settlement fails. On terminal-state completion, settlement metadata status is checked; failure triggers a new handler that marks the task failed with reconciliation details, emits telemetry, and notifies lifecycle webhooks before returning early. ChangesPayment Settlement Failure Handling
Sequence DiagramsequenceDiagram
participant CompletedState as Completed State
participant SettlePayment as _settle_payment
participant FacilitatorClient as Facilitator
participant FailureHandler as _handle_settlement_failure
participant TaskStorage as Task Storage
participant LifecycleNotif as Lifecycle Webhook
CompletedState->>SettlePayment: initiate settlement
SettlePayment->>FacilitatorClient: call settle API
FacilitatorClient-->>SettlePayment: settlement metadata
SettlePayment-->>CompletedState: return metadata
alt Settlement Success
CompletedState->>TaskStorage: mark completed
else Settlement Failure
CompletedState->>FailureHandler: handle failure
FailureHandler->>TaskStorage: mark failed + metadata
FailureHandler->>LifecycleNotif: notify failure
FailureHandler-->>CompletedState: return early
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsGit: Failed to clone repository. Please run the Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@bindu/server/workers/manifest_worker.py`:
- Around line 556-563: The current code appends settlement details (error_text)
into the user-facing message, leaking internal info; change the
MessageConverter.to_protocol_messages call to use the fixed string "Payment
settlement failed; output withheld." (keep using task["id"] and
task["context_id"]) and do NOT include
settlement_metadata.get(app_settings.x402.meta_error_key) in the message body;
preserve the facilitator reason only in task.metadata/settlement_metadata (leave
settlement_metadata and app_settings.x402.meta_error_key usage for internal
tracking) so the caller receives the generic message via
MessageConverter.to_protocol_messages.
In `@tests/unit/server/workers/test_manifest_worker.py`:
- Around line 759-825: Update test_settle_failure_does_not_deliver_artifact to
assert the stored failure message is the generic string and the detailed
facilitator error lives in metadata: after calling
worker._handle_terminal_state, inspect mock_storage.update_task.call_args_list
and assert one of the update calls has kwargs.get("message") == "Payment
settlement failed; output withheld." and that its kwargs.get("metadata",
{}).get("x402.payment.error") (or the payment error key used by
_handle_terminal_state) equals or contains "transfer reverted: insufficient
balance"; use the same symbols from the diff (worker._handle_terminal_state,
mock_storage.update_task, payment_context) to locate where to add these
assertions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c8cc99cd-8078-41d1-b018-d88ae979e794
📒 Files selected for processing (3)
bindu/server/workers/manifest_worker.pydocs/PAYMENT.mdtests/unit/server/workers/test_manifest_worker.py
| error_text = settlement_metadata.get( | ||
| app_settings.x402.meta_error_key, "settlement failed" | ||
| ) | ||
| error_message = MessageConverter.to_protocol_messages( | ||
| f"Payment settlement failed; output withheld. Reason: {error_text}", | ||
| task["id"], | ||
| task["context_id"], | ||
| ) |
There was a problem hiding this comment.
Keep the settlement-failure message body generic.
The documented behavior for this flow is a fixed Payment settlement failed; output withheld. message, with the facilitator reason stored in task.metadata. Appending error_text here leaks internal settlement details to the caller and breaks that contract.
Proposed fix
- error_text = settlement_metadata.get(
- app_settings.x402.meta_error_key, "settlement failed"
- )
error_message = MessageConverter.to_protocol_messages(
- f"Payment settlement failed; output withheld. Reason: {error_text}",
+ "Payment settlement failed; output withheld.",
task["id"],
task["context_id"],
)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@bindu/server/workers/manifest_worker.py` around lines 556 - 563, The current
code appends settlement details (error_text) into the user-facing message,
leaking internal info; change the MessageConverter.to_protocol_messages call to
use the fixed string "Payment settlement failed; output withheld." (keep using
task["id"] and task["context_id"]) and do NOT include
settlement_metadata.get(app_settings.x402.meta_error_key) in the message body;
preserve the facilitator reason only in task.metadata/settlement_metadata (leave
settlement_metadata and app_settings.x402.meta_error_key usage for internal
tracking) so the caller receives the generic message via
MessageConverter.to_protocol_messages.
| async def test_settle_failure_does_not_deliver_artifact(self): | ||
| """Verify-ok + settle-fail must NOT deliver paid output. | ||
|
|
||
| Reproduces the leak in issue #562: today, if facilitator.settle fails | ||
| after verify already passed, the worker still transitions the task to | ||
| completed and pushes the artifact. The fix gates artifact delivery on | ||
| settle success — task is marked failed and no artifact is notified. | ||
| """ | ||
| mock_manifest = Mock() | ||
| mock_manifest.did_extension = Mock() | ||
| mock_manifest.did_extension.did = "did:example:123" | ||
| mock_manifest.x402_extension = Mock() | ||
| mock_scheduler = Mock() | ||
| mock_storage = AsyncMock() | ||
|
|
||
| task_id = uuid4() | ||
| context_id = uuid4() | ||
| task = cast( | ||
| Task, | ||
| { | ||
| "id": task_id, | ||
| "context_id": context_id, | ||
| "status": {"state": "working", "timestamp": "2024-01-01T00:00:00Z"}, | ||
| }, | ||
| ) | ||
|
|
||
| worker = ManifestWorker( | ||
| manifest=mock_manifest, scheduler=mock_scheduler, storage=mock_storage | ||
| ) | ||
|
|
||
| # Simulate facilitator-reported settlement failure. | ||
| worker._settle_payment = AsyncMock( # type: ignore[method-assign] # ty: ignore[invalid-assignment] | ||
| return_value={ | ||
| "x402.payment.status": "payment-failed", | ||
| "x402.payment.error": "transfer reverted: insufficient balance", | ||
| } | ||
| ) | ||
| worker._notify_artifact = AsyncMock() # type: ignore[method-assign] # ty: ignore[invalid-assignment] | ||
|
|
||
| payment_context = { | ||
| "payment_payload": { | ||
| "payload": { | ||
| "authorization": { | ||
| "nonce": "0xdeadbeef", | ||
| "from": "0xMallory", | ||
| "to": "0xAlice", | ||
| "value": "1000000", | ||
| } | ||
| }, | ||
| }, | ||
| "payment_requirements": {}, | ||
| } | ||
|
|
||
| await worker._handle_terminal_state( | ||
| task, "Paid summary contents", "completed", payment_context=payment_context | ||
| ) | ||
|
|
||
| # The task must NOT end in "completed" — that's the leak. | ||
| update_calls = mock_storage.update_task.call_args_list | ||
| terminal_states = [call.kwargs.get("state") for call in update_calls] | ||
| assert "completed" not in terminal_states, ( | ||
| f"Settle failed but task still marked completed: {terminal_states}" | ||
| ) | ||
| assert "failed" in terminal_states | ||
|
|
||
| # And the artifact must not have been pushed to the client. | ||
| worker._notify_artifact.assert_not_called() # ty: ignore[unresolved-attribute] |
There was a problem hiding this comment.
Assert the failed-task message stays generic.
This test verifies the state/artifact behavior, but it won't catch a regression where the task history exposes the facilitator error to the client. Please also assert that the stored failure message is exactly Payment settlement failed; output withheld. and that the detailed reason remains in metadata.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/unit/server/workers/test_manifest_worker.py` around lines 759 - 825,
Update test_settle_failure_does_not_deliver_artifact to assert the stored
failure message is the generic string and the detailed facilitator error lives
in metadata: after calling worker._handle_terminal_state, inspect
mock_storage.update_task.call_args_list and assert one of the update calls has
kwargs.get("message") == "Payment settlement failed; output withheld." and that
its kwargs.get("metadata", {}).get("x402.payment.error") (or the payment error
key used by _handle_terminal_state) equals or contains "transfer reverted:
insufficient balance"; use the same symbols from the diff
(worker._handle_terminal_state, mock_storage.update_task, payment_context) to
locate where to add these assertions.
…re message Per CodeRabbit review on #563: the failure message included the raw ``error_text`` from the facilitator (e.g. internal HTTP 500 body, JSON payloads). That can disclose facilitator topology or internal state to the caller — and the caller already gets a clear "task failed" signal plus the structured metadata fields they need. The full reason still lives in ``task.metadata["x402.payment.error"]`` for operator audit; it's just no longer echoed in the agent message that goes back over the wire. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #563 closed the artifact-leak half of issue #562 (failed settle no longer delivered the artifact) but agents still ate the LLM cost on every failed settle — a payer who drained their wallet after verify, or who lost a parallel-nonce settle race, still triggered an LLM call. This PR moves _settle_payment to BEFORE manifest.run, so a failed settlement costs the agent zero LLM tokens: - run_task now calls _settle_payment immediately after extracting payment_context and before transitioning the task to "working". Settle-fail → _handle_settlement_failure with full recovery metadata → return. Manifest never runs. - _handle_terminal_state takes settlement_metadata (pre-settled receipts) instead of payment_context; the redundant post-execute gate from #563 is removed. - _handle_task_failure accepts the settlement_metadata and tags the payment as "payment-orphaned" when work raises after a successful settle. x402 has no refund primitive, so the EIP-3009 fields are persisted for the operator to issue a manual transfer back. Wall-clock latency on the happy path is unchanged — settle (2-5s on Base) just moves from after the LLM call to before it. This is the same choice Google's A2A x402 extension makes for the same reason; Coinbase's x402-express middleware uses the opposite ordering, but that's tuned for sub-second API endpoints, not for agent workloads where the verify-vs-settle gap can span minutes. Three new tests cover the new contract: - settle-fail must not invoke manifest.run - settle must complete before manifest.run starts (call-order assertion) - work failure after successful settle persists payment-orphaned metadata Two tests from #563 (test_settle_failure_does_not_deliver_artifact, test_settle_success_unchanged) are removed — they exercised the post-execute gate that no longer exists. The new end-to-end settle-first tests subsume them. Stacks on #563. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…re message Per CodeRabbit review on #563: the failure message included the raw ``error_text`` from the facilitator (e.g. internal HTTP 500 body, JSON payloads). That can disclose facilitator topology or internal state to the caller — and the caller already gets a clear "task failed" signal plus the structured metadata fields they need. The full reason still lives in ``task.metadata["x402.payment.error"]`` for operator audit; it's just no longer echoed in the agent message that goes back over the wire. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(x402): adopt settle-first ordering for paid agent tasks PR #563 closed the artifact-leak half of issue #562 (failed settle no longer delivered the artifact) but agents still ate the LLM cost on every failed settle — a payer who drained their wallet after verify, or who lost a parallel-nonce settle race, still triggered an LLM call. This PR moves _settle_payment to BEFORE manifest.run, so a failed settlement costs the agent zero LLM tokens: - run_task now calls _settle_payment immediately after extracting payment_context and before transitioning the task to "working". Settle-fail → _handle_settlement_failure with full recovery metadata → return. Manifest never runs. - _handle_terminal_state takes settlement_metadata (pre-settled receipts) instead of payment_context; the redundant post-execute gate from #563 is removed. - _handle_task_failure accepts the settlement_metadata and tags the payment as "payment-orphaned" when work raises after a successful settle. x402 has no refund primitive, so the EIP-3009 fields are persisted for the operator to issue a manual transfer back. Wall-clock latency on the happy path is unchanged — settle (2-5s on Base) just moves from after the LLM call to before it. This is the same choice Google's A2A x402 extension makes for the same reason; Coinbase's x402-express middleware uses the opposite ordering, but that's tuned for sub-second API endpoints, not for agent workloads where the verify-vs-settle gap can span minutes. Three new tests cover the new contract: - settle-fail must not invoke manifest.run - settle must complete before manifest.run starts (call-order assertion) - work failure after successful settle persists payment-orphaned metadata Two tests from #563 (test_settle_failure_does_not_deliver_artifact, test_settle_success_unchanged) are removed — they exercised the post-execute gate that no longer exists. The new end-to-end settle-first tests subsume them. Stacks on #563. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(x402): end-to-end exercise of all four #562 scenarios A single integration test file that drives the real worker pipeline (ManifestWorker + InMemoryStorage + InMemoryNonceStore) against a mocked facilitator for each scenario from the #562 discussion: 1. Front-run drain — settle returns failure → manifest.run not called 2. Settle timeout — settle raises → recovery metadata persisted 3. Parallel-nonce double-spend — first task settles & runs, second fails settle and burns zero LLM tokens 4. Replay — middleware rejects identical nonce on the second request Scenarios 1-3 hit the worker directly with a realistic payment_context dict (the same shape the middleware produces). Scenario 4 goes through Starlette TestClient and the real X402Middleware to verify the middleware-level defense is intact. Each test prints a narrative trace so `pytest -s` reads like a walkthrough — useful when explaining the fix to reviewers or auditing post-merge behavior. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(x402): loguru positional placeholders in _settle_payment + live E2E demo Two changes: 1. Fix latent loguru misuse in _settle_payment. ``logger.error(f"Error settling payment: {e}", exc_info=True)`` eagerly interpolates the exception into the message string, then passes that string to loguru — which re-parses it for ``{placeholder}`` templates. When the exception text contains JSON (every x402 SDK error does), loguru raises ``KeyError`` trying to resolve ``{success}`` / ``{error}``, masking the original error and skipping the recovery-metadata return. Switched both error logs to loguru's native templating: ``logger.error("Payment settlement failed: {}", error_reason)`` ``logger.opt(exception=True).error("Error settling payment: {}", e)`` Positional values are inserted literally — braces inside them are never re-parsed. This was already in the codebase before settle-first; the E2E demo below is what surfaced it. 2. Add a runnable subprocess E2E demo for all four #562 scenarios. ``tests/e2e/x402_scenarios/`` contains: - ``mock_facilitator.py`` — programmable fake that keys failure modes off the EIP-3009 nonce prefix (0xfa11 = settle-reverted, 0xcdcd = facilitator timeout) - ``agent.py`` — minimal Bindu echo agent behind an x402 paywall, pointed at the mock facilitator via ``X402__FACILITATOR_URL`` - ``run_e2e.py`` — driver that spawns both subprocesses, fires real HTTP requests for each scenario, and prints the observed task state, metadata, and recovery fields Run with: ``uv run python tests/e2e/x402_scenarios/run_e2e.py`` Unlike the existing unit/integration tests (which mock at the worker boundary), this demo drives the whole stack — HTTP → X402Middleware → scheduler → ManifestWorker → real facilitator client — exactly as a real deployment would. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(payment): point at the live E2E demo for failure-mode walkthroughs PAYMENT.md already had a "Watching it work" section with an inline fake facilitator, but it only covered the happy path. The new tests/e2e/x402_scenarios/ harness covers all four #562 failure modes end-to-end with a single command — surface it from the docs so operators don't have to discover it from the PR history. Also adds manifest_worker.py and the E2E directory to the "Where to look in the code" pointer list. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(x402): keep facilitator error detail out of the user-facing failure message Per CodeRabbit review on #563: the failure message included the raw ``error_text`` from the facilitator (e.g. internal HTTP 500 body, JSON payloads). That can disclose facilitator topology or internal state to the caller — and the caller already gets a clear "task failed" signal plus the structured metadata fields they need. The full reason still lives in ``task.metadata["x402.payment.error"]`` for operator audit; it's just no longer echoed in the agent message that goes back over the wire. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
completedwith the artifact already pushed, so a payer who double-spent or whose settlement timed out walked away with paid output._settle_paymentreturningpayment-completed; on failure the task transitions tofailedvia a new_handle_settlement_failurehelper and no artifact is notified._settle_paymentnow persists the EIP-3009 nonce, full authorization block, and network in failure metadata so an operator can reconcile a half-settled transaction against the chain instead of seeing onlystr(e).Behavior change
completed, artifact pushed,task.metadata["x402.payment.status"] = "payment-failed"buried in metadata.failed, no artifact pushed, message body explains "Payment settlement failed; output withheld", andtask.metadatacarriesx402_nonce,x402_authorization,x402_networkfor reconciliation.What's deliberately out of scope
docs/PAYMENT.mdas a known limitation.task.metadata.Test plan
uv run pytest tests/unit/server/workers/test_manifest_worker.py— 31 passed (28 existing + 3 new).uv run pytest tests/unit/server/— 376 passed.uv run pytest tests/unit/extensions/x402/ tests/unit/server/middleware/x402/ tests/unit/server/workers/test_manifest_worker.py— 111 passed.uv run ruff checkon touched files — clean.The three new tests are the contract:
test_settle_failure_does_not_deliver_artifact— facilitator returnssuccess=False, asserts task isfailedand_notify_artifactwas not called.test_settle_exception_persists_recovery_metadata— facilitator raisesTimeoutError, assertsx402_nonce/x402_authorization/x402_networkare persisted.test_settle_success_unchanged— happy-path regression guard.🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
Documentation
Tests