feat(x402): adopt settle-first ordering + E2E scenario coverage#565
Conversation
PR #563 closed the artifact-leak half of issue #562 (failed settle no longer delivered the artifact) but agents still ate the LLM cost on every failed settle — a payer who drained their wallet after verify, or who lost a parallel-nonce settle race, still triggered an LLM call. This PR moves _settle_payment to BEFORE manifest.run, so a failed settlement costs the agent zero LLM tokens: - run_task now calls _settle_payment immediately after extracting payment_context and before transitioning the task to "working". Settle-fail → _handle_settlement_failure with full recovery metadata → return. Manifest never runs. - _handle_terminal_state takes settlement_metadata (pre-settled receipts) instead of payment_context; the redundant post-execute gate from #563 is removed. - _handle_task_failure accepts the settlement_metadata and tags the payment as "payment-orphaned" when work raises after a successful settle. x402 has no refund primitive, so the EIP-3009 fields are persisted for the operator to issue a manual transfer back. Wall-clock latency on the happy path is unchanged — settle (2-5s on Base) just moves from after the LLM call to before it. This is the same choice Google's A2A x402 extension makes for the same reason; Coinbase's x402-express middleware uses the opposite ordering, but that's tuned for sub-second API endpoints, not for agent workloads where the verify-vs-settle gap can span minutes. Three new tests cover the new contract: - settle-fail must not invoke manifest.run - settle must complete before manifest.run starts (call-order assertion) - work failure after successful settle persists payment-orphaned metadata Two tests from #563 (test_settle_failure_does_not_deliver_artifact, test_settle_success_unchanged) are removed — they exercised the post-execute gate that no longer exists. The new end-to-end settle-first tests subsume them. Stacks on #563. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A single integration test file that drives the real worker pipeline (ManifestWorker + InMemoryStorage + InMemoryNonceStore) against a mocked facilitator for each scenario from the #562 discussion: 1. Front-run drain — settle returns failure → manifest.run not called 2. Settle timeout — settle raises → recovery metadata persisted 3. Parallel-nonce double-spend — first task settles & runs, second fails settle and burns zero LLM tokens 4. Replay — middleware rejects identical nonce on the second request Scenarios 1-3 hit the worker directly with a realistic payment_context dict (the same shape the middleware produces). Scenario 4 goes through Starlette TestClient and the real X402Middleware to verify the middleware-level defense is intact. Each test prints a narrative trace so `pytest -s` reads like a walkthrough — useful when explaining the fix to reviewers or auditing post-merge behavior. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…2E demo
Two changes:
1. Fix latent loguru misuse in _settle_payment.
``logger.error(f"Error settling payment: {e}", exc_info=True)`` eagerly
interpolates the exception into the message string, then passes that
string to loguru — which re-parses it for ``{placeholder}`` templates.
When the exception text contains JSON (every x402 SDK error does),
loguru raises ``KeyError`` trying to resolve ``{success}`` / ``{error}``,
masking the original error and skipping the recovery-metadata return.
Switched both error logs to loguru's native templating:
``logger.error("Payment settlement failed: {}", error_reason)``
``logger.opt(exception=True).error("Error settling payment: {}", e)``
Positional values are inserted literally — braces inside them are
never re-parsed.
This was already in the codebase before settle-first; the E2E demo
below is what surfaced it.
2. Add a runnable subprocess E2E demo for all four #562 scenarios.
``tests/e2e/x402_scenarios/`` contains:
- ``mock_facilitator.py`` — programmable fake that keys failure modes
off the EIP-3009 nonce prefix (0xfa11 = settle-reverted,
0xcdcd = facilitator timeout)
- ``agent.py`` — minimal Bindu echo agent behind an x402 paywall,
pointed at the mock facilitator via ``X402__FACILITATOR_URL``
- ``run_e2e.py`` — driver that spawns both subprocesses, fires real
HTTP requests for each scenario, and prints the observed task
state, metadata, and recovery fields
Run with: ``uv run python tests/e2e/x402_scenarios/run_e2e.py``
Unlike the existing unit/integration tests (which mock at the worker
boundary), this demo drives the whole stack — HTTP → X402Middleware
→ scheduler → ManifestWorker → real facilitator client — exactly as
a real deployment would.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PAYMENT.md already had a "Watching it work" section with an inline fake facilitator, but it only covered the happy path. The new tests/e2e/x402_scenarios/ harness covers all four #562 failure modes end-to-end with a single command — surface it from the docs so operators don't have to discover it from the PR history. Also adds manifest_worker.py and the E2E directory to the "Where to look in the code" pointer list. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…re message Per CodeRabbit review on #563: the failure message included the raw ``error_text`` from the facilitator (e.g. internal HTTP 500 body, JSON payloads). That can disclose facilitator topology or internal state to the caller — and the caller already gets a clear "task failed" signal plus the structured metadata fields they need. The full reason still lives in ``task.metadata["x402.payment.error"]`` for operator audit; it's just no longer echoed in the agent message that goes back over the wire. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (7)
📝 WalkthroughWalkthroughManifestWorker now settles x402 payments before executing the manifest/LLM; settlement failures mark tasks failed without invoking the agent, and post-settlement failures create "orphan payment" metadata for manual recovery. Supporting documentation, unit tests, integration tests with middleware replay defense, and an end-to-end scenario driver (with mock facilitator and echo agent) validate the settle-first behavior and failure workflows. ChangesX402 Settle-First Ordering and Orphan Payment Handling
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
The #562/#565 settle-first work landed the recovery metadata (EIP-3009 nonce + authorization + network) on every failed-settle task, but two structural gaps remain that operators should know about: - x402-settle-false-negative-silent-orphans: facilitator /settle can time out while the chain ultimately confirms — payer debited but task marked failed, no `payment-orphaned` tag because the worker's only signal was settle's `success=False`. Reconciliation worker (which would query the chain for AuthorizationUsed and flip these tasks) is scoped out as a follow-up. - x402-no-auto-refund-for-orphan-payments: even when orphans ARE tagged, Bindu has no outbound-wallet path (pay_to_address is a config string; no private key, no Base RPC). Refunding is fully manual. Custody surface is large enough that we agreed to wait for real volume before building. Both entries include the metadata fields operators need for manual reconciliation today, and reference each other as the detection/remediation pair. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Stacks on top of #563. PR #563 closed the artifact leak (failed settle no longer ships the deliverable); this PR closes the LLM-cost leak by moving
_settle_paymentto beforemanifest.run.Under #563 today, a failed settle (drain attack, parallel-nonce race, validBefore expiry) cost the agent the LLM call but withheld the artifact. Under this PR, a failed settle exits before any state transition or LLM call. Zero token cost on failed payments.
Why settle-first
Researched x402 reference implementations before picking the order:
Bindu's workload is agent tasks. Following A2A's lead. Coinbase's buffer-then-settle pattern works for 200ms endpoints but leaves a verify-vs-settle window of seconds-to-minutes wide open for agent workloads — exactly the gap exploited in #562.
Latency on the happy path is unchanged: settle (2-5s on Base) just moves from after the LLM call to before it. Total wall-clock is identical.
Behavior change
payment-orphanedwith full EIP-3009 fields for manual refund.What else is in this PR
InMemoryStorage+ realX402Middleware+ mocked facilitator. Seetests/integration/x402/test_e2e_scenarios.py.tests/e2e/x402_scenarios/— a programmable fake facilitator + minimal Bindu agent + a driver that boots both and exercises each scenario via real HTTP. Run withuv run python tests/e2e/x402_scenarios/run_e2e.py._settle_payment— surfaced by the live demo. The previouslogger.error(f"...{e}", exc_info=True)pattern raisedKeyErrorwhen the exception text contained JSON, masking the original error and skipping the recovery-metadata return. Switched to native loguru positional templating.task.metadatafor audit.docs/PAYMENT.mdupdated with the settle-first model, the orphan-payment failure mode, and a pointer to the live demo.Orphan payment
Settle-first introduces a symmetric risk: settle succeeds, then
manifest.runraises (LLM provider outage, agent code bug). The payer paid, got nothing. x402 has no refund primitive, so this PR does NOT auto-refund — it persists the EIP-3009 fields and tags the paymentpayment-orphaned. Operator responsibility.The expectation is orphan payments are rare; each one is a real bug to investigate.
Test plan
uv run pytest tests/unit/server/workers/test_manifest_worker.py— 32 passed (3 new + updated existing).uv run pytest tests/unit/server/ tests/unit/extensions/x402/— 427 passed.uv run pytest tests/integration/x402/test_e2e_scenarios.py— 4 scenarios pass end-to-end.uv run python tests/e2e/x402_scenarios/run_e2e.py— live demo, all four scenarios show expected behavior with real HTTP + real Bindu subprocess.Still deferred
🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
New Features
Documentation
Tests