feat(x402): adopt settle-first ordering + E2E scenario coverage by raahulrahl · Pull Request #565 · GetBindu/Bindu

raahulrahl · 2026-05-28T14:21:58Z

Note: This PR was originally #564, which got auto-closed when #563's base branch was deleted on merge. Same commits, rebased on top of current main.

Summary

Stacks on top of #563. PR #563 closed the artifact leak (failed settle no longer ships the deliverable); this PR closes the LLM-cost leak by moving _settle_payment to before manifest.run.

Under #563 today, a failed settle (drain attack, parallel-nonce race, validBefore expiry) cost the agent the LLM call but withheld the artifact. Under this PR, a failed settle exits before any state transition or LLM call. Zero token cost on failed payments.
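
The ordering described above can be sketched as follows. This is a minimal illustration, not the code in `manifest_worker.py`: `SettleResult`, `Task`, `run_task_settle_first`, and the `settle` / `run_manifest` callables are hypothetical stand-ins, and only the `x402.payment.error` metadata key is taken from this PR's commit messages.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Optional


@dataclass
class SettleResult:
    # Hypothetical result type: success flag plus facilitator error text.
    success: bool
    error: Optional[str] = None


@dataclass
class Task:
    # Hypothetical task record with a state and a metadata dict.
    state: str = "submitted"
    metadata: dict = field(default_factory=dict)


async def run_task_settle_first(
    task: Task,
    payment_context: Optional[dict],
    settle: Callable[[dict], Awaitable[SettleResult]],
    run_manifest: Callable[[], Awaitable[Any]],
) -> Task:
    """Settle first, then work. All names here are illustrative."""
    if payment_context is not None:
        result = await settle(payment_context)
        if not result.success:
            # Fail fast: the task never reaches "working" and run_manifest
            # (the LLM call) is never awaited.
            task.state = "failed"
            task.metadata["x402.payment.error"] = result.error
            return task

    task.state = "working"
    task.metadata["result"] = await run_manifest()
    task.state = "completed"
    return task
```

Passing `settle` and `run_manifest` as callables keeps the sketch testable: a test can count how often `run_manifest` was awaited and assert that the count is zero after a failed settle.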

Why settle-first

Researched x402 reference implementations before picking the order:

| Reference impl | Order | Designed for |
| --- | --- | --- |
| Coinbase x402-express | verify → run → settle | sub-second API endpoints |
| Google A2A x402 | verify → settle → run | long-running agent tasks |

Bindu's workload is agent tasks. Following A2A's lead. Coinbase's buffer-then-settle pattern works for 200ms endpoints but leaves a verify-vs-settle window of seconds-to-minutes wide open for agent workloads — exactly the gap exploited in #562.

Latency on the happy path is unchanged: settle (2-5s on Base) just moves from after the LLM call to before it. Total wall-clock is identical.

Behavior change

| Case | Before this PR | After this PR |
| --- | --- | --- |
| Settle fails | LLM ran, no artifact delivered, ~$0.30 LLM cost lost | No LLM call, no cost. Task fails fast with recovery metadata. |
| Settle succeeds, work fails | (couldn't happen; settle ran after) | New "orphan payment" state. Payer was debited, no artifact delivered. Metadata flags `payment-orphaned` with full EIP-3009 fields for manual refund. |
| Happy path | Task completed, artifact delivered | Unchanged. |

What else is in this PR

End-to-end integration tests for all four #562 scenarios using a real worker pipeline + real InMemoryStorage + real X402Middleware + mocked facilitator. See tests/integration/x402/test_e2e_scenarios.py.
Live subprocess demo under tests/e2e/x402_scenarios/ — a programmable fake facilitator + minimal Bindu agent + a driver that boots both and exercises each scenario via real HTTP. Run with uv run python tests/e2e/x402_scenarios/run_e2e.py.
Loguru fix in _settle_payment — surfaced by the live demo. The previous logger.error(f"...{e}", exc_info=True) pattern raised KeyError when the exception text contained JSON, masking the original error and skipping the recovery-metadata return. Switched to native loguru positional templating.
CodeRabbit follow-up — the user-facing failure message no longer echoes the raw facilitator error string (information-disclosure risk). The full detail stays in task.metadata for audit.
Docs: docs/PAYMENT.md updated with the settle-first model, the orphan-payment failure mode, and a pointer to the live demo.
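
The loguru item in the list above comes down to which string gets parsed for `{}` fields. The sketch below reproduces that mechanism with plain `str.format` rather than loguru itself. It assumes, as the PR description states, that the logger formats the message when extra arguments such as `exc_info=True` are passed. The function names and the sample error text are illustrative.

```python
# Sample facilitator error text containing JSON braces (illustrative value).
facilitator_error = '{"success": false, "error": "insufficient_funds"}'


def format_eagerly(err: str) -> str:
    # Old pattern: the error text is interpolated into the template first,
    # so its braces are later parsed as replacement fields.
    message = f"Error settling payment: {err}"
    return message.format(exc_info=True)


def format_positionally(err: str) -> str:
    # New pattern: the template holds only "{}"; the error text is passed as
    # a value and inserted literally, braces included.
    return "Error settling payment: {}".format(err)


if __name__ == "__main__":
    try:
        format_eagerly(facilitator_error)
    except KeyError as exc:
        print("eager formatting raised KeyError:", exc)
    print(format_positionally(facilitator_error))
```

With the eager pattern, the `KeyError` is raised from inside the error handler, which is how the original exception and the recovery-metadata return were lost.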

Orphan payment

Settle-first introduces a symmetric risk: settle succeeds, then manifest.run raises (LLM provider outage, agent code bug). The payer paid, got nothing. x402 has no refund primitive, so this PR does NOT auto-refund — it persists the EIP-3009 fields and tags the payment payment-orphaned. Operator responsibility.
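
A minimal sketch of the tagging step, assuming the failure handler receives the settlement metadata only when settle succeeded beforehand. The function name, the `x402.payment.status` and `x402.payment.authorization` keys, and the shape of `settlement_metadata` are assumptions for illustration; the `payment-orphaned` tag and the idea of persisting the EIP-3009 authorization fields come from the description above.

```python
from typing import Optional

# Parameter names of EIP-3009 transferWithAuthorization (signature omitted).
EIP3009_FIELDS = ("from", "to", "value", "validAfter", "validBefore", "nonce")


def failure_metadata(error: Exception, settlement_metadata: Optional[dict]) -> dict:
    """Build task metadata for a failed run (hypothetical helper)."""
    metadata = {"error": str(error)}
    if settlement_metadata is None:
        # No successful settle preceded the failure: nothing to refund.
        return metadata

    # Settle succeeded and the work then failed: tag the payment and keep
    # the fields an operator needs to issue a manual transfer back.
    authorization = settlement_metadata.get("authorization", {})
    metadata["x402.payment.status"] = "payment-orphaned"
    metadata["x402.payment.authorization"] = {
        key: authorization[key] for key in EIP3009_FIELDS if key in authorization
    }
    return metadata
```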

The expectation is orphan payments are rare; each one is a real bug to investigate.

Test plan

uv run pytest tests/unit/server/workers/test_manifest_worker.py — 32 passed (3 new + updated existing).
uv run pytest tests/unit/server/ tests/unit/extensions/x402/ — 427 passed.
uv run pytest tests/integration/x402/test_e2e_scenarios.py — 4 scenarios pass end-to-end.
uv run python tests/e2e/x402_scenarios/run_e2e.py — live demo, all four scenarios show expected behavior with real HTTP + real Bindu subprocess.
Pre-commit hooks (ruff, ruff-format, ty, bandit, detect-secrets, pydocstyle) — all pass.

Still deferred

Facilitator-timeout vs Base-confirmation race reconciliation worker. Settle-first doesn't fix this — facilitator returns failure while the on-chain tx still confirms. Persisted metadata is sufficient for a reconciliation worker to reason about, but the worker itself is not in this PR.
Auto-refund. x402 has no native refund primitive. Bindu doesn't try to send USDC back.

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

New Features
- Payment settlements now occur before agent execution, preventing unnecessary token consumption on failed settlements.
- Added orphan payment detection and tagging for cases where agents crash after successful settlement.
Documentation
- Enhanced payment processing guidance with clearer failure scenarios and refund workflows.
Tests
- Added comprehensive end-to-end test scenarios for payment settlement edge cases and failure modes.
- Expanded unit test coverage for settlement-first behavior and orphan payment handling.

PR #563 closed the artifact-leak half of issue #562 (failed settle no longer delivered the artifact) but agents still ate the LLM cost on every failed settle: a payer who drained their wallet after verify, or who lost a parallel-nonce settle race, still triggered an LLM call.

This PR moves _settle_payment to BEFORE manifest.run, so a failed settlement costs the agent zero LLM tokens:

- run_task now calls _settle_payment immediately after extracting payment_context and before transitioning the task to "working". Settle-fail → _handle_settlement_failure with full recovery metadata → return. Manifest never runs.
- _handle_terminal_state takes settlement_metadata (pre-settled receipts) instead of payment_context; the redundant post-execute gate from #563 is removed.
- _handle_task_failure accepts the settlement_metadata and tags the payment as "payment-orphaned" when work raises after a successful settle. x402 has no refund primitive, so the EIP-3009 fields are persisted for the operator to issue a manual transfer back.

Wall-clock latency on the happy path is unchanged: settle (2-5s on Base) just moves from after the LLM call to before it. This is the same choice Google's A2A x402 extension makes for the same reason; Coinbase's x402-express middleware uses the opposite ordering, but that's tuned for sub-second API endpoints, not for agent workloads where the verify-vs-settle gap can span minutes.

Three new tests cover the new contract:

- settle-fail must not invoke manifest.run
- settle must complete before manifest.run starts (call-order assertion)
- work failure after successful settle persists payment-orphaned metadata

Two tests from #563 (test_settle_failure_does_not_deliver_artifact, test_settle_success_unchanged) are removed; they exercised the post-execute gate that no longer exists. The new end-to-end settle-first tests subsume them.

Stacks on #563.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

A single integration test file that drives the real worker pipeline (ManifestWorker + InMemoryStorage + InMemoryNonceStore) against a mocked facilitator for each scenario from the #562 discussion:

1. Front-run drain: settle returns failure → manifest.run not called
2. Settle timeout: settle raises → recovery metadata persisted
3. Parallel-nonce double-spend: first task settles & runs, second fails settle and burns zero LLM tokens
4. Replay: middleware rejects identical nonce on the second request

Scenarios 1-3 hit the worker directly with a realistic payment_context dict (the same shape the middleware produces). Scenario 4 goes through Starlette TestClient and the real X402Middleware to verify the middleware-level defense is intact.

Each test prints a narrative trace so `pytest -s` reads like a walkthrough, useful when explaining the fix to reviewers or auditing post-merge behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
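
The replay defense exercised by scenario 4 can be pictured as an atomic "claim" on each nonce. The sketch below is not Bindu's `InMemoryNonceStore`; `SketchNonceStore`, `claim`, and `status_for` are hypothetical names, and the only behavior taken from the PR is that a repeated nonce is rejected with HTTP 402.

```python
import threading


class SketchNonceStore:
    """Illustrative in-memory nonce store (not the real Bindu class)."""

    def __init__(self) -> None:
        self._seen = set()
        self._lock = threading.Lock()

    def claim(self, nonce: str) -> bool:
        """Return True the first time a nonce is seen, False on any replay."""
        key = nonce.lower()
        # The check and the insert happen under one lock, so two concurrent
        # requests carrying the same nonce cannot both succeed.
        with self._lock:
            if key in self._seen:
                return False
            self._seen.add(key)
            return True


def status_for(store: SketchNonceStore, nonce: str) -> int:
    # 402 on replay mirrors the rejection described for scenario 4.
    return 200 if store.claim(nonce) else 402
```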

…2E demo

Two changes:

1. Fix latent loguru misuse in _settle_payment. ``logger.error(f"Error settling payment: {e}", exc_info=True)`` eagerly interpolates the exception into the message string, then passes that string to loguru, which re-parses it for ``{placeholder}`` templates. When the exception text contains JSON (every x402 SDK error does), loguru raises ``KeyError`` trying to resolve ``{success}`` / ``{error}``, masking the original error and skipping the recovery-metadata return. Switched both error logs to loguru's native templating:

   ``logger.error("Payment settlement failed: {}", error_reason)``
   ``logger.opt(exception=True).error("Error settling payment: {}", e)``

   Positional values are inserted literally; braces inside them are never re-parsed. This was already in the codebase before settle-first; the E2E demo below is what surfaced it.

2. Add a runnable subprocess E2E demo for all four #562 scenarios. ``tests/e2e/x402_scenarios/`` contains:

   - ``mock_facilitator.py``: programmable fake that keys failure modes off the EIP-3009 nonce prefix (0xfa11 = settle-reverted, 0xcdcd = facilitator timeout)
   - ``agent.py``: minimal Bindu echo agent behind an x402 paywall, pointed at the mock facilitator via ``X402__FACILITATOR_URL``
   - ``run_e2e.py``: driver that spawns both subprocesses, fires real HTTP requests for each scenario, and prints the observed task state, metadata, and recovery fields

   Run with: ``uv run python tests/e2e/x402_scenarios/run_e2e.py``

   Unlike the existing unit/integration tests (which mock at the worker boundary), this demo drives the whole stack (HTTP → X402Middleware → scheduler → ManifestWorker → real facilitator client) exactly as a real deployment would.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
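
The nonce-prefix convention of the mock facilitator can be sketched as a small pure function. The two prefixes and their meanings are taken from the commit message above; the function name and the returned labels are illustrative, and the real `mock_facilitator.py` may structure this differently.

```python
def settle_behavior(nonce: str) -> str:
    """Map an EIP-3009 nonce to the settle outcome the fake should simulate."""
    normalized = nonce.lower()
    if normalized.startswith("0xfa11"):
        return "settle-reverted"
    if normalized.startswith("0xcdcd"):
        return "facilitator-timeout"
    # Any other nonce settles normally.
    return "success"
```

Keying the behavior off the nonce lets the driver choose a failure mode per request without any out-of-band control channel to the facilitator process.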

PAYMENT.md already had a "Watching it work" section with an inline fake facilitator, but it only covered the happy path. The new tests/e2e/x402_scenarios/ harness covers all four #562 failure modes end-to-end with a single command — surface it from the docs so operators don't have to discover it from the PR history. Also adds manifest_worker.py and the E2E directory to the "Where to look in the code" pointer list. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…re message

Per CodeRabbit review on #563: the failure message included the raw ``error_text`` from the facilitator (e.g. internal HTTP 500 body, JSON payloads). That can disclose facilitator topology or internal state to the caller, and the caller already gets a clear "task failed" signal plus the structured metadata fields they need. The full reason still lives in ``task.metadata["x402.payment.error"]`` for operator audit; it's just no longer echoed in the agent message that goes back over the wire.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
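
The split between what the caller sees and what the operator can audit might look like the sketch below. The `x402.payment.error` key is quoted from the commit message; the function name and the wording of the generic message are assumptions.

```python
from typing import Tuple

# Illustrative wording; the actual user-facing text in Bindu may differ.
GENERIC_FAILURE_MESSAGE = "Payment settlement failed. The task was not executed."


def build_settlement_failure(error_text: str) -> Tuple[str, dict]:
    """Return (message for the caller, metadata kept for operator audit)."""
    # The raw facilitator error is stored in metadata only and is never
    # echoed into the message that goes back over the wire.
    metadata = {"x402.payment.error": error_text}
    return GENERIC_FAILURE_MESSAGE, metadata
```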

coderabbitai · 2026-05-28T14:22:17Z

Caution

Review failed

The pull request is closed.

📥 Commits

Reviewing files that changed from the base of the PR and between c6b726c and 616b1af.

📒 Files selected for processing (7)

bindu/server/workers/manifest_worker.py
docs/PAYMENT.md
tests/e2e/x402_scenarios/agent.py
tests/e2e/x402_scenarios/mock_facilitator.py
tests/e2e/x402_scenarios/run_e2e.py
tests/integration/x402/test_e2e_scenarios.py
tests/unit/server/workers/test_manifest_worker.py

📝 Walkthrough

ManifestWorker now settles x402 payments before executing the manifest/LLM; settlement failures mark tasks failed without invoking the agent, and post-settlement failures create "orphan payment" metadata for manual recovery. Supporting documentation, unit tests, integration tests with middleware replay defense, and an end-to-end scenario driver (with mock facilitator and echo agent) validate the settle-first behavior and failure workflows.

Changes

X402 Settle-First Ordering and Orphan Payment Handling

| Layer / File(s) | Summary |
| --- | --- |
| Settlement-first orchestration in ManifestWorker `bindu/server/workers/manifest_worker.py` | `run_task` performs x402 settlement before any state transition or LLM work; settlement failure invokes `_handle_settlement_failure` and returns early; success proceeds to agent execution. Exception paths route failures to `_handle_task_failure(..., settlement_metadata=...)` so orphan-payment metadata is tagged when settlement preceded the failure. `_handle_terminal_state` now receives pre-settled `settlement_metadata` and attaches receipts/status to task metadata instead of settling during terminal handling. `_handle_task_failure` conditionally marks failures as "orphan payments" by tagging `payment-orphaned` while preserving EIP-3009 nonce/authorization fields. Loguru logging adjusted to avoid f-string templating issues. |
| Production guidance and troubleshooting documentation `docs/PAYMENT.md` | Settle-first behavior documented: settlement occurs before agent execution (failed settlements cost no LLM tokens), and post-settlement agent failures create "orphan payments" that persist nonce/authorization/receipts and require manual operator refund. Troubleshooting bullets revised to clarify settlement-failure recovery (verify passed but `/settle` failed) and orphan-payment handling (tag, receipts, refund workflow). Added cross-reference to prebuilt e2e scenario driver and expanded "Where to look in the code" search targets. |
| Unit test coverage for settle-first behavior `tests/unit/server/workers/test_manifest_worker.py` | New tests validate: settlement failure prevents `manifest.run` and marks task `failed`; settlement runs before `manifest.run` (call-order assertion); post-settlement LLM failure creates distinct "orphan payment" metadata with recorded nonce. Terminal-state test confirms `settlement_metadata` is attached to task metadata when present. |
| Integration tests for x402 scenarios `tests/integration/x402/test_e2e_scenarios.py` | Four scenarios cover settle-first and recovery workflows: (1) settle failure prevents LLM, persists recovery metadata; (2) settle timeout persists nonce/authorization; (3) parallel double-spend race where one succeeds and one fails without LLM call; (4) real `X402Middleware` with `InMemoryNonceStore` confirms replay rejection (HTTP 402) before agent reaches request. Uses `ManifestRunRecorder` test double and `submit_and_drive_task` harness to mock facilitator outcomes and assert LLM call counts. |
| E2E scenario mock facilitator and echo agent `tests/e2e/x402_scenarios/agent.py`, `tests/e2e/x402_scenarios/mock_facilitator.py` | Deterministic fake x402 facilitator (Starlette-based) implements three endpoints: `GET /supported` (fixed chain support), `POST /verify` (always valid), and `POST /settle` (branches behavior by nonce prefix to trigger failure, timeout, or success). Simple echo agent returns `PAID JOB DONE` response echoing the last message, registered via `bindufy` with x402 paywall configuration. |
| E2E scenario driver and subprocess orchestration `tests/e2e/x402_scenarios/run_e2e.py` | Driver spawns mock facilitator and real agent as subprocesses, waits for HTTP readiness, and executes four JSON-RPC scenario flows: (1) settle failure after wallet drain, (2) timeout during facilitator settle, (3) parallel double-spend race, (4) replay rejection. Constructs base64 `X-PAYMENT` headers with injected nonces, polls tasks to terminal state, and prints structured summaries of task outcome, artifacts, and metadata. Guarantees subprocess cleanup via SIGTERM/kill. |
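
As a rough illustration of "base64 `X-PAYMENT` headers with injected nonces", the sketch below builds such a header. The payload layout follows the x402 "exact" scheme as commonly documented, but treat it as an assumption; the helper names, the network value, and the placeholder signature are illustrative and not copied from `run_e2e.py`.

```python
import base64
import json
import secrets


def make_nonce(prefix: str = "") -> str:
    """Return a 32-byte hex nonce that starts with the given hex prefix."""
    random_hex = secrets.token_hex(32)  # 64 hex characters
    return "0x" + (prefix + random_hex)[:64]


def build_x_payment_header(nonce: str, payer: str, pay_to: str, value: str) -> str:
    """Encode an assumed x402 payment payload as a base64 header value."""
    payload = {
        "x402Version": 1,
        "scheme": "exact",
        "network": "base-sepolia",  # assumed network for the demo
        "payload": {
            "signature": "0x" + "00" * 65,  # placeholder, not a real signature
            "authorization": {
                "from": payer,
                "to": pay_to,
                "value": value,
                "validAfter": "0",
                "validBefore": "9999999999",
                "nonce": nonce,
            },
        },
    }
    return base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
```

A driver could then call `make_nonce("fa11")` to steer a nonce-keyed mock facilitator toward its settle-reverted branch for that request.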

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

GetBindu/Bindu#563: Modifies ManifestWorker's x402 settlement flow and payment/failure metadata handling in manifest_worker.py—directly related to settle-first orchestration and orphan-payment recovery.
GetBindu/Bindu#532: Updates the x402 facilitator client and models used by _settle_payment; this PR changes how ManifestWorker integrates and propagates settlement outcomes.

Poem

🐰 A rabbit hops through payment land,
Where settle now comes first, you understand—
No LLM calls for funds that fail,
And orphan coins leave quite a tale.
With nonce and proof, the refunds flow,
A settle-first, then work — bravo!

The #562/#565 settle-first work landed the recovery metadata (EIP-3009 nonce + authorization + network) on every failed-settle task, but two structural gaps remain that operators should know about:

- x402-settle-false-negative-silent-orphans: facilitator /settle can time out while the chain ultimately confirms (payer debited but task marked failed, no `payment-orphaned` tag because the worker's only signal was settle's `success=False`). Reconciliation worker (which would query the chain for AuthorizationUsed and flip these tasks) is scoped out as a follow-up.
- x402-no-auto-refund-for-orphan-payments: even when orphans ARE tagged, Bindu has no outbound-wallet path (pay_to_address is a config string; no private key, no Base RPC). Refunding is fully manual. Custody surface is large enough that we agreed to wait for real volume before building.

Both entries include the metadata fields operators need for manual reconciliation today, and reference each other as the detection/remediation pair.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
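
The reconciliation worker is explicitly out of scope, so the following is only a sketch of the decision it would make, under stated assumptions. Tasks are modeled as plain dicts, the metadata keys are hypothetical, and `authorization_used` is an injected callable standing in for a chain query on the EIP-3009 `AuthorizationUsed` event; no RPC client is implied.

```python
from typing import Callable, Iterable, List


def reconcile_failed_settles(
    tasks: Iterable[dict],
    authorization_used: Callable[[str], bool],
) -> List[dict]:
    """Tag failed-settle tasks whose payment actually confirmed on-chain."""
    flipped = []
    for task in tasks:
        metadata = task.get("metadata", {})
        nonce = metadata.get("nonce")
        if task.get("state") != "failed" or nonce is None:
            continue
        if metadata.get("x402.payment.status") == "payment-orphaned":
            continue  # already tagged by the worker
        if authorization_used(nonce):
            # The facilitator reported failure but the transfer went
            # through, so the payer was debited and is owed a refund.
            metadata["x402.payment.status"] = "payment-orphaned"
            flipped.append(task)
    return flipped
```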

raahulrahl and others added 5 commits May 28, 2026 16:21

raahulrahl merged commit 1a085f5 into main May 28, 2026
5 of 6 checks passed

raahulrahl deleted the feat/x402-settle-first branch May 28, 2026 14:22

raahulrahl mentioned this pull request May 28, 2026

docs(known-issues): add settle-false-negative orphans + no-auto-refund #566

Merged

3 tasks

raahulrahl mentioned this pull request May 28, 2026

Potential settlement gating issue: completed artifacts are persisted even when settlement fails #562

Closed
