
feat(x402): adopt settle-first ordering + E2E scenario coverage #565

Merged
raahulrahl merged 5 commits into main from feat/x402-settle-first
May 28, 2026

@raahulrahl raahulrahl commented May 28, 2026

Note: This PR was originally #564, which got auto-closed when #563's base branch was deleted on merge. Same commits, rebased on top of current main.

Summary

Stacks on top of #563. PR #563 closed the artifact leak (failed settle no longer ships the deliverable); this PR closes the LLM-cost leak by moving _settle_payment to before manifest.run.

Under #563 today, a failed settle (drain attack, parallel-nonce race, validBefore expiry) cost the agent the LLM call but withheld the artifact. Under this PR, a failed settle exits before any state transition or LLM call. Zero token cost on failed payments.
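
The ordering described above can be sketched roughly as follows. This is a minimal illustration and not Bindu's actual worker: `run_task_settle_first`, the `settle` and `run` callables, and the returned dict shape are hypothetical stand-ins for `_settle_payment` and `manifest.run`.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional

# Hypothetical stand-ins: the real worker's signatures are not shown in this PR.
Settle = Callable[[dict[str, Any]], Awaitable[dict[str, Any]]]
Run = Callable[[str], Awaitable[str]]


async def run_task_settle_first(
    prompt: str,
    payment_context: Optional[dict[str, Any]],
    settle: Settle,
    run: Run,
) -> dict[str, Any]:
    """Settle the payment first; only call the expensive agent on success."""
    settlement = None
    if payment_context is not None:
        settlement = await settle(payment_context)
        if not settlement.get("success"):
            # Fail fast: the agent is never invoked, so no tokens are spent.
            return {"state": "failed", "artifact": None, "settlement": settlement}
    artifact = await run(prompt)
    return {"state": "completed", "artifact": artifact, "settlement": settlement}


async def demo() -> tuple[dict[str, Any], int]:
    llm_calls = 0

    async def failing_settle(ctx: dict[str, Any]) -> dict[str, Any]:
        return {"success": False, "error": "insufficient_funds"}

    async def fake_llm(prompt: str) -> str:
        nonlocal llm_calls
        llm_calls += 1
        return f"echo: {prompt}"

    outcome = await run_task_settle_first(
        "hi", {"nonce": "0x01"}, failing_settle, fake_llm
    )
    return outcome, llm_calls


result, llm_calls = asyncio.run(demo())
```

In this sketch a failed settle leaves `llm_calls` at zero, which is the property the PR is after.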

Why settle-first

Researched x402 reference implementations before picking the order:

| Reference impl | Order | Designed for |
|---|---|---|
| Coinbase x402-express | verify → run → settle | sub-second API endpoints |
| Google A2A x402 | verify → settle → run | long-running agent tasks |

Bindu's workload is agent tasks, so this PR follows A2A's lead. Coinbase's buffer-then-settle pattern works for 200ms endpoints, but for agent workloads it leaves a verify-to-settle window of seconds to minutes wide open, which is exactly the gap exploited in #562.

Latency on the happy path is unchanged: settle (2-5s on Base) just moves from after the LLM call to before it. Total wall-clock is identical.

Behavior change

| Case | Before this PR | After this PR |
|---|---|---|
| Settle fails | LLM ran, no artifact delivered, ~$0.30 LLM cost lost | No LLM call, no cost. Task fails fast with recovery metadata. |
| Settle succeeds, work fails | (couldn't happen: settle ran after) | New "orphan payment" state. Payer was debited, no artifact delivered. Metadata flags payment-orphaned with full EIP-3009 fields for manual refund. |
| Happy path | Task completed, artifact delivered | Unchanged. |

What else is in this PR

  • End-to-end integration tests for all four #562 scenarios (issue #562: "Potential settlement gating issue: completed artifacts are persisted even when settlement fails") using a real worker pipeline + real InMemoryStorage + real X402Middleware + mocked facilitator. See tests/integration/x402/test_e2e_scenarios.py.
  • Live subprocess demo under tests/e2e/x402_scenarios/ — a programmable fake facilitator + minimal Bindu agent + a driver that boots both and exercises each scenario via real HTTP. Run with uv run python tests/e2e/x402_scenarios/run_e2e.py.
  • Loguru fix in _settle_payment — surfaced by the live demo. The previous logger.error(f"...{e}", exc_info=True) pattern raised KeyError when the exception text contained JSON, masking the original error and skipping the recovery-metadata return. Switched to native loguru positional templating.
  • CodeRabbit follow-up — the user-facing failure message no longer echoes the raw facilitator error string (information-disclosure risk). The full detail stays in task.metadata for audit.
  • Docs: docs/PAYMENT.md updated with the settle-first model, the orphan-payment failure mode, and a pointer to the live demo.
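
The loguru pitfall from the list above can be reproduced without loguru. The sketch below assumes, as the PR describes, that the logger applies `str.format` to the message whenever extra keyword arguments such as `exc_info=True` are passed; `format_with_kwargs` is a hypothetical helper that mimics only that step.

```python
# Example error text with a JSON body, similar in shape to what the PR describes.
error_text = 'facilitator said {"success": false, "error": "insufficient_funds"}'

# Eager f-string interpolation bakes the JSON braces into the template itself.
template = f"Error settling payment: {error_text}"


def format_with_kwargs(message: str) -> str:
    """Mimic a logger that runs str.format on the message when kwargs exist."""
    try:
        return message.format(exc_info=True)
    except KeyError as exc:
        # The JSON key is parsed as a replacement field name and not found.
        return f"KeyError: {exc}"


broken = format_with_kwargs(template)

# Passing the value as a positional argument inserts it literally;
# braces inside the argument are never re-parsed.
safe = "Error settling payment: {}".format(error_text)
```

Here `broken` holds the `KeyError` text instead of the original error, while `safe` keeps the JSON intact.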

Orphan payment

Settle-first introduces a symmetric risk: settle succeeds, then manifest.run raises (LLM provider outage, agent code bug). The payer paid, got nothing. x402 has no refund primitive, so this PR does NOT auto-refund — it persists the EIP-3009 fields and tags the payment payment-orphaned. Operator responsibility.

The expectation is orphan payments are rare; each one is a real bug to investigate.
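
A minimal sketch of the orphan tagging follows, assuming a flat metadata dict. Only the `payment-orphaned` tag and the persistence of EIP-3009 fields come from this PR; `failure_metadata` and the `x402.payment.status`, `x402.payment.nonce`, and `x402.payment.authorization` keys are hypothetical names chosen for illustration.

```python
from typing import Any, Optional


def failure_metadata(
    error: Exception, settlement_metadata: Optional[dict[str, Any]]
) -> dict[str, Any]:
    """Build failure metadata; tag it orphaned if settle already succeeded."""
    metadata: dict[str, Any] = {"error": str(error)}
    if settlement_metadata is not None:
        # Payer was debited but received nothing: keep what an operator
        # needs to issue a manual transfer back.
        metadata["x402.payment.status"] = "payment-orphaned"
        metadata["x402.payment.nonce"] = settlement_metadata["nonce"]
        metadata["x402.payment.authorization"] = settlement_metadata["authorization"]
    return metadata


settled = {
    "nonce": "0xabc",
    "authorization": {"from": "0xPayer", "to": "0xAgent", "value": "10000"},
}
orphan = failure_metadata(RuntimeError("LLM provider outage"), settled)
plain = failure_metadata(RuntimeError("LLM provider outage"), None)
```

A failure with no prior settlement (`plain`) carries no payment fields, so operators can filter on the tag alone.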

Test plan

  • uv run pytest tests/unit/server/workers/test_manifest_worker.py — 32 passed (3 new + updated existing).
  • uv run pytest tests/unit/server/ tests/unit/extensions/x402/ — 427 passed.
  • uv run pytest tests/integration/x402/test_e2e_scenarios.py — 4 scenarios pass end-to-end.
  • uv run python tests/e2e/x402_scenarios/run_e2e.py — live demo, all four scenarios show expected behavior with real HTTP + real Bindu subprocess.
  • Pre-commit hooks (ruff, ruff-format, ty, bandit, detect-secrets, pydocstyle) — all pass.

Still deferred

  • Facilitator-timeout vs Base-confirmation race reconciliation worker. Settle-first doesn't fix this — facilitator returns failure while the on-chain tx still confirms. Persisted metadata is sufficient for a reconciliation worker to reason about, but the worker itself is not in this PR.
  • Auto-refund. x402 has no native refund primitive. Bindu doesn't try to send USDC back.

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Payment settlements now occur before agent execution, preventing unnecessary token consumption on failed settlements.
    • Added orphan payment detection and tagging for cases where agents crash after successful settlement.
  • Documentation

    • Enhanced payment processing guidance with clearer failure scenarios and refund workflows.
  • Tests

    • Added comprehensive end-to-end test scenarios for payment settlement edge cases and failure modes.
    • Expanded unit test coverage for settlement-first behavior and orphan payment handling.

raahulrahl and others added 5 commits May 28, 2026 16:21
PR #563 closed the artifact-leak half of issue #562 (failed settle no
longer delivered the artifact) but agents still ate the LLM cost on
every failed settle — a payer who drained their wallet after verify,
or who lost a parallel-nonce settle race, still triggered an LLM call.

This PR moves _settle_payment to BEFORE manifest.run, so a failed
settlement costs the agent zero LLM tokens:

- run_task now calls _settle_payment immediately after extracting
  payment_context and before transitioning the task to "working".
  Settle-fail → _handle_settlement_failure with full recovery metadata
  → return. Manifest never runs.
- _handle_terminal_state takes settlement_metadata (pre-settled
  receipts) instead of payment_context; the redundant post-execute
  gate from #563 is removed.
- _handle_task_failure accepts the settlement_metadata and tags the
  payment as "payment-orphaned" when work raises after a successful
  settle. x402 has no refund primitive, so the EIP-3009 fields are
  persisted for the operator to issue a manual transfer back.

Wall-clock latency on the happy path is unchanged — settle (2-5s on
Base) just moves from after the LLM call to before it. This is the
same choice Google's A2A x402 extension makes for the same reason;
Coinbase's x402-express middleware uses the opposite ordering, but
that's tuned for sub-second API endpoints, not for agent workloads
where the verify-vs-settle gap can span minutes.

Three new tests cover the new contract:
- settle-fail must not invoke manifest.run
- settle must complete before manifest.run starts (call-order assertion)
- work failure after successful settle persists payment-orphaned metadata
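
The first two contracts above could be checked along these lines. This is a sketch against a toy pipeline, not the actual tests in test_manifest_worker.py: `settle_then_run`, `recorder`, and `observed_order` are hypothetical names.

```python
import asyncio
from unittest.mock import AsyncMock


async def settle_then_run(settle, run):
    """Toy pipeline under test: settle must finish before run starts."""
    receipt = await settle()
    if not receipt["success"]:
        return None
    return await run()


def recorder(order, name, value):
    """Side effect that logs the call name, then returns a canned value."""

    def _side_effect(*args, **kwargs):
        order.append(name)
        return value

    return _side_effect


def observed_order(settle_ok: bool) -> list[str]:
    order: list[str] = []
    settle = AsyncMock(side_effect=recorder(order, "settle", {"success": settle_ok}))
    run = AsyncMock(side_effect=recorder(order, "run", "artifact"))
    asyncio.run(settle_then_run(settle, run))
    return order
```

Recording into a shared list keeps the order assertion independent of how the mocks track their own call history.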

Two tests from #563 (test_settle_failure_does_not_deliver_artifact,
test_settle_success_unchanged) are removed — they exercised the
post-execute gate that no longer exists. The new end-to-end settle-first
tests subsume them.

Stacks on #563.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A single integration test file that drives the real worker pipeline
(ManifestWorker + InMemoryStorage + InMemoryNonceStore) against a mocked
facilitator for each scenario from the #562 discussion:

  1. Front-run drain — settle returns failure → manifest.run not called
  2. Settle timeout — settle raises → recovery metadata persisted
  3. Parallel-nonce double-spend — first task settles & runs, second
     fails settle and burns zero LLM tokens
  4. Replay — middleware rejects identical nonce on the second request

Scenarios 1-3 hit the worker directly with a realistic payment_context
dict (the same shape the middleware produces). Scenario 4 goes through
Starlette TestClient and the real X402Middleware to verify the
middleware-level defense is intact.
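
The replay defense in scenario 4 can be illustrated with a small single-process nonce store. This is a hedged sketch, not Bindu's InMemoryNonceStore: the `InMemoryNonces` class, its `claim` method, and `status_for` are assumptions made for the example.

```python
import threading


class InMemoryNonces:
    """Hypothetical single-process nonce store for illustration only."""

    def __init__(self) -> None:
        self._seen: set[str] = set()
        self._lock = threading.Lock()

    def claim(self, nonce: str) -> bool:
        """Return True the first time a nonce is seen, False on any replay."""
        with self._lock:
            if nonce in self._seen:
                return False
            self._seen.add(nonce)
            return True


def status_for(store: InMemoryNonces, nonce: str) -> int:
    # 402 mirrors the replay rejection; 200 means "pass on to the agent".
    return 200 if store.claim(nonce) else 402


store = InMemoryNonces()
first = status_for(store, "0x01")
replay = status_for(store, "0x01")
```

The check-and-insert happens under one lock so that two concurrent requests with the same nonce cannot both be admitted.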

Each test prints a narrative trace so `pytest -s` reads like a
walkthrough — useful when explaining the fix to reviewers or auditing
post-merge behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…2E demo

Two changes:

1. Fix latent loguru misuse in _settle_payment.

   ``logger.error(f"Error settling payment: {e}", exc_info=True)`` eagerly
   interpolates the exception into the message string, then passes that
   string to loguru — which re-parses it for ``{placeholder}`` templates.
   When the exception text contains JSON (every x402 SDK error does),
   loguru raises ``KeyError`` trying to resolve ``{success}`` / ``{error}``,
   masking the original error and skipping the recovery-metadata return.

   Switched both error logs to loguru's native templating:
     ``logger.error("Payment settlement failed: {}", error_reason)``
     ``logger.opt(exception=True).error("Error settling payment: {}", e)``

   Positional values are inserted literally — braces inside them are
   never re-parsed.

   This was already in the codebase before settle-first; the E2E demo
   below is what surfaced it.

2. Add a runnable subprocess E2E demo for all four #562 scenarios.

   ``tests/e2e/x402_scenarios/`` contains:
   - ``mock_facilitator.py`` — programmable fake that keys failure modes
     off the EIP-3009 nonce prefix (0xfa11 = settle-reverted,
     0xcdcd = facilitator timeout)
   - ``agent.py`` — minimal Bindu echo agent behind an x402 paywall,
     pointed at the mock facilitator via ``X402__FACILITATOR_URL``
   - ``run_e2e.py`` — driver that spawns both subprocesses, fires real
     HTTP requests for each scenario, and prints the observed task
     state, metadata, and recovery fields

   Run with: ``uv run python tests/e2e/x402_scenarios/run_e2e.py``

   Unlike the existing unit/integration tests (which mock at the worker
   boundary), this demo drives the whole stack — HTTP → X402Middleware
   → scheduler → ManifestWorker → real facilitator client — exactly as
   a real deployment would.
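
The nonce-prefix convention used by the mock facilitator can be sketched as a pure function. The two prefixes come from the commit message above; `SettleOutcome` and `outcome_for_nonce` are hypothetical names, and the real `mock_facilitator.py` may be structured differently.

```python
from enum import Enum


class SettleOutcome(Enum):
    SUCCESS = "success"
    REVERTED = "settle-reverted"
    TIMEOUT = "facilitator-timeout"


# Prefixes taken from the commit message; everything else is illustrative.
_PREFIX_OUTCOMES = {
    "0xfa11": SettleOutcome.REVERTED,
    "0xcdcd": SettleOutcome.TIMEOUT,
}


def outcome_for_nonce(nonce: str) -> SettleOutcome:
    """Pick the scripted /settle behaviour from the EIP-3009 nonce prefix."""
    lowered = nonce.lower()
    for prefix, outcome in _PREFIX_OUTCOMES.items():
        if lowered.startswith(prefix):
            return outcome
    return SettleOutcome.SUCCESS
```

Keying behaviour off the nonce lets the driver choose a failure mode per request without any out-of-band control channel to the fake.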

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PAYMENT.md already had a "Watching it work" section with an inline
fake facilitator, but it only covered the happy path. The new
tests/e2e/x402_scenarios/ harness covers all four #562 failure modes
end-to-end with a single command — surface it from the docs so
operators don't have to discover it from the PR history.

Also adds manifest_worker.py and the E2E directory to the
"Where to look in the code" pointer list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…re message

Per CodeRabbit review on #563: the failure message included the raw
``error_text`` from the facilitator (e.g. internal HTTP 500 body, JSON
payloads). That can disclose facilitator topology or internal state to
the caller — and the caller already gets a clear "task failed" signal
plus the structured metadata fields they need.

The full reason still lives in ``task.metadata["x402.payment.error"]``
for operator audit; it's just no longer echoed in the agent message
that goes back over the wire.
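
A minimal sketch of that split, assuming a fixed caller-facing string. The `x402.payment.error` metadata key is the one named above; `GENERIC_FAILURE`, its wording, and `build_failure` are hypothetical.

```python
from typing import Any

# Hypothetical wording; the real agent message is not quoted in this PR.
GENERIC_FAILURE = "Payment settlement failed. The task was not executed."


def build_failure(error_text: str) -> tuple[str, dict[str, Any]]:
    """Return (caller-facing message, operator-facing metadata)."""
    # Raw facilitator detail is kept for audit but never echoed to the caller.
    metadata = {"x402.payment.error": error_text}
    return GENERIC_FAILURE, metadata


message, metadata = build_failure('HTTP 500 from 10.0.3.7: {"trace": "..."}')
```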

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented May 28, 2026

📥 Commits

Reviewing files that changed from the base of the PR and between c6b726c and 616b1af.

📒 Files selected for processing (7)
  • bindu/server/workers/manifest_worker.py
  • docs/PAYMENT.md
  • tests/e2e/x402_scenarios/agent.py
  • tests/e2e/x402_scenarios/mock_facilitator.py
  • tests/e2e/x402_scenarios/run_e2e.py
  • tests/integration/x402/test_e2e_scenarios.py
  • tests/unit/server/workers/test_manifest_worker.py

Walkthrough

ManifestWorker now settles x402 payments before executing the manifest/LLM; settlement failures mark tasks failed without invoking the agent, and post-settlement failures create "orphan payment" metadata for manual recovery. Supporting documentation, unit tests, integration tests with middleware replay defense, and an end-to-end scenario driver (with mock facilitator and echo agent) validate the settle-first behavior and failure workflows.

Changes

X402 Settle-First Ordering and Orphan Payment Handling

| Layer / File(s) | Summary |
|---|---|
| Settlement-first orchestration in ManifestWorker (bindu/server/workers/manifest_worker.py) | run_task performs x402 settlement before any state transition or LLM work; settlement failure invokes _handle_settlement_failure and returns early; success proceeds to agent execution. Exception paths route failures to _handle_task_failure(..., settlement_metadata=...) so orphan-payment metadata is tagged when settlement preceded the failure. _handle_terminal_state now receives pre-settled settlement_metadata and attaches receipts/status to task metadata instead of settling during terminal handling. _handle_task_failure conditionally marks failures as "orphan payments" by tagging payment-orphaned while preserving EIP-3009 nonce/authorization fields. Loguru logging adjusted to avoid f-string templating issues. |
| Production guidance and troubleshooting documentation (docs/PAYMENT.md) | Settle-first behavior documented: settlement occurs before agent execution (failed settlements cost no LLM tokens), and post-settlement agent failures create "orphan payments" that persist nonce/authorization/receipts and require manual operator refund. Troubleshooting bullets revised to clarify settlement-failure recovery (verify passed but /settle failed) and orphan-payment handling (tag, receipts, refund workflow). Added cross-reference to prebuilt e2e scenario driver and expanded "Where to look in the code" search targets. |
| Unit test coverage for settle-first behavior (tests/unit/server/workers/test_manifest_worker.py) | New tests validate: settlement failure prevents manifest.run and marks task failed; settlement runs before manifest.run (call-order assertion); post-settlement LLM failure creates distinct "orphan payment" metadata with recorded nonce. Terminal-state test confirms settlement_metadata is attached to task metadata when present. |
| Integration tests for x402 scenarios (tests/integration/x402/test_e2e_scenarios.py) | Four scenarios cover settle-first and recovery workflows: (1) settle failure prevents LLM, persists recovery metadata; (2) settle timeout persists nonce/authorization; (3) parallel double-spend race where one succeeds and one fails without LLM call; (4) real X402Middleware with InMemoryNonceStore confirms replay rejection (HTTP 402) before the request reaches the agent. Uses ManifestRunRecorder test double and submit_and_drive_task harness to mock facilitator outcomes and assert LLM call counts. |
| E2E scenario mock facilitator and echo agent (tests/e2e/x402_scenarios/agent.py, tests/e2e/x402_scenarios/mock_facilitator.py) | Deterministic fake x402 facilitator (Starlette-based) implements three endpoints: GET /supported (fixed chain support), POST /verify (always valid), and POST /settle (branches behavior by nonce prefix to trigger failure, timeout, or success). Simple echo agent returns PAID JOB DONE response echoing the last message, registered via bindufy with x402 paywall configuration. |
| E2E scenario driver and subprocess orchestration (tests/e2e/x402_scenarios/run_e2e.py) | Driver spawns mock facilitator and real agent as subprocesses, waits for HTTP readiness, and executes four JSON-RPC scenario flows: (1) settle failure after wallet drain, (2) timeout during facilitator settle, (3) parallel double-spend race, (4) replay rejection. Constructs base64 X-PAYMENT headers with injected nonces, polls tasks to terminal state, and prints structured summaries of task outcome, artifacts, and metadata. Guarantees subprocess cleanup via SIGTERM/kill. |
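
Building a base64 X-PAYMENT header with an injected nonce, as the driver summary describes, might look like the sketch below. The payload field names follow the commonly seen x402 "exact" scheme layout but are an assumption here, as are the `base-sepolia` network value, the placeholder addresses, and both function names; the signature is a dummy, so this only suits a facilitator that skips verification.

```python
import base64
import json
import secrets
from typing import Any


def build_x_payment_header(nonce_prefix: str = "0x") -> str:
    """Encode an unsigned, test-only payment payload with a chosen nonce prefix."""
    prefix = nonce_prefix[2:] if nonce_prefix.startswith("0x") else nonce_prefix
    # Pad the chosen prefix with random hex up to 32 bytes (64 hex chars).
    nonce = "0x" + (prefix + secrets.token_hex(32))[:64]
    payload: dict[str, Any] = {
        "x402Version": 1,
        "scheme": "exact",
        "network": "base-sepolia",  # assumed network name
        "payload": {
            "signature": "0x" + "00" * 65,  # placeholder, not a valid signature
            "authorization": {
                "from": "0x" + "11" * 20,  # placeholder payer address
                "to": "0x" + "22" * 20,  # placeholder payee address
                "value": "10000",
                "validAfter": "0",
                "validBefore": "9999999999",
                "nonce": nonce,
            },
        },
    }
    return base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")


def decode_x_payment_header(header: str) -> dict[str, Any]:
    """Inverse of the builder, handy for asserting on what was sent."""
    return json.loads(base64.b64decode(header))
```

A driver could then request `build_x_payment_header("0xfa11")` to steer a prefix-keyed fake facilitator into its settle-reverted branch.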

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • GetBindu/Bindu#563: Modifies ManifestWorker's x402 settlement flow and payment/failure metadata handling in manifest_worker.py—directly related to settle-first orchestration and orphan-payment recovery.
  • GetBindu/Bindu#532: Updates the x402 facilitator client and models used by _settle_payment; this PR changes how ManifestWorker integrates and propagates settlement outcomes.

Poem

🐰 A rabbit hops through payment land,
Where settle now comes first, you understand—
No LLM calls for funds that fail,
And orphan coins leave quite a tale.
With nonce and proof, the refunds flow,
A settle-first, then work — bravo!

@raahulrahl raahulrahl merged commit 1a085f5 into main May 28, 2026
5 of 6 checks passed
@raahulrahl raahulrahl deleted the feat/x402-settle-first branch May 28, 2026 14:22
raahulrahl added a commit that referenced this pull request May 28, 2026
The #562/#565 settle-first work landed the recovery metadata
(EIP-3009 nonce + authorization + network) on every failed-settle
task, but two structural gaps remain that operators should know
about:

- x402-settle-false-negative-silent-orphans: facilitator /settle can
  time out while the chain ultimately confirms — payer debited but
  task marked failed, no `payment-orphaned` tag because the worker's
  only signal was settle's `success=False`. Reconciliation worker
  (which would query the chain for AuthorizationUsed and flip these
  tasks) is scoped out as a follow-up.

- x402-no-auto-refund-for-orphan-payments: even when orphans ARE
  tagged, Bindu has no outbound-wallet path (pay_to_address is a
  config string; no private key, no Base RPC). Refunding is fully
  manual. Custody surface is large enough that we agreed to wait
  for real volume before building.

Both entries include the metadata fields operators need for manual
reconciliation today, and reference each other as the
detection/remediation pair.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>