feat: flow-14 live Base Sepolia + network status reachability + flow-13 hardening#388
Conversation
Adds ProbeUpstream / ProbeAllUpstreams (eth_chainId, 2s parallel timeout) and wires `obol network status` to warn on unreachable or chain-id mismatched upstreams — typically a stale `obol network add base-sepolia --endpoint <local-anvil>` left over from a flow run whose Anvil was since killed or recreated. The report covering v0.9.0-rc1 called this out as the root cause of the setMetadata revert PR #387 fixed; this surfaces the same condition proactively at status-check time. `--no-probe` opts out for callers who don't want the network round-trip.
Adds flow-14 — a live-network counterpart to the Anvil-fork flow-13. Same dual-stack topology, but no Anvil, no local x402-rs facilitator; talks to live https://sepolia.base.org and the public Obol facilitator at x402.gcp.obol.tech. Required env vars OBOL_TOKEN_BASE_SEPOLIA (the deployed ERC20Permit address) and BOB_FUNDING_PRIVATE_KEY (a real funded buyer wallet) fail fast at the top so the script never spends gas before the operator has set both. Registration is enabled in flow-14 (flow-13 deliberately disables it for the protocol-level fork test) so PR #387's WaitForAgent fix runs on the OBOL path too. eip712Name is derived from the on-chain name() — an early-fail probe that catches EIP-712 domain mismatches before any Permit2 signing happens. flow-13 picks up the same EIP-712 early-fail probe, plus a cleanup-trap `obol network remove base-sepolia` on both clusters so a leftover custom pin from a prior run can't leak into the next flow's reads. monetize-inference.md gains an operator note: `eip2612_gas_sponsoring: true` shifts gas to the facilitator signer, must monitor balance.
Adds a build-time parity check (TestForkObolToken_ParityWithCanonicalOBOL) that catches drift between contracts/fork-obol/src/ForkObolToken.sol and the canonical OBOL token at 0x0B010000b7624eb9B3DfBC279673C76E9D29D5F7 (verified via Sourcify full-match). The test does three independent checks for the bits that affect x402 Permit2 settlement: 1. Greps the .sol source for the EIP-712 typehash + Permit typehash string literals (catches accidental constant edits). 2. keccak256s those literals in Go and compares to the canonical bytes (catches typo drift on either side). 3. Reproduces mainnet OBOL's DOMAIN_SEPARATOR() — 0x5a3cd81e... — from the formula keccak256(abi.encode(typeHash, nameHash, versionHash, chainid=1, address=0x0B01...)) (catches abi-encoding drift). Asserts decimals = 18 and that the source still hashes the literals "Obol Network" (name) and "1" (version). PARITY.md documents what MUST match (and is now tested) vs the deltas that are intentional (governance, access control, ENS, burn, transfer hooks) and orthogonal to settlement. contracts/fork-obol/.gitignore added so forge build artefacts (cache/, out/, broadcast/) stop showing up as untracked.
Symptom: a colleague's Hermes agent answered every prompt with a wall of
text describing its own tool list, because the configured default model
was llama3.2:1b — too small to handle the agent's tool-using system
prompt.
Root cause: rankModels in internal/hermes/hermes.go (and the duplicate
in internal/openclaw/openclaw.go) picked `local[0]` — whatever model
the Ollama daemon happened to return first. On hosts that had recently
pulled llama3.2:1b, that 1B model won over qwen3.5:9b every time. The
old comment ("Within a tier, the first model wins") was honest about
this, just wrong as a strategy.
Fix: extract a single capability-aware ranker into internal/model:
- Cloud models (Claude, GPT, o-series) outrank local models.
- Within the cloud tier, an explicit precedence table prefers Opus
over Sonnet over Haiku, gpt-5 over gpt-4 over gpt-3.5, etc.
- Within the local tier, models are sorted by parameter count parsed
from the tag — `qwen3.5:9b` → 9, `mixtral:8x7b` → 56, `llama3.2:1b`
→ 1. Larger first.
- Untagged Ollama models fall back to a family-default table; the
table is iterated longest-prefix-first so `llama3.3` (default 70)
matches before `llama3` (default 8).
- Tiebreak alphabetically for determinism.
- Embedding models (nomic-embed) score 0 so they never become the
chat default.
Both internal/hermes/rankModels and internal/openclaw/rankModels are
now thin wrappers over model.Rank — the openclaw one preserves its
`openai/` prefix for LiteLLM routing.
Eight table-driven tests in internal/model/rank_test.go cover the
regression scenario, the cloud quality table, parameter parsing for
b/Bx7b/235b shapes, the longest-prefix family lookup, alphabetical
tiebreak, the embedding-model exclusion, and the empty-input case.
The model-rank fix prevents 1B-parameter models from becoming the agent default, but the regression was only visible at the response layer (tool-catalogue parroting). Add assertions that exercise both layers, not just status codes: flow-04 (free Hermes inference, getting-started.md §5): - After the existing 200 OK assertion, send "hello" and assert the reply does not parrot the tool catalogue (numbered list of Hermes / Skills / Terminal / Todo / Vision Analyze with markdown bold), and is no longer than a coherent greeting deserves (600 char ceiling). - Read the configured default model from hermes-config and reject any tag declaring 1B / 0.5B / 0.6B parameters as too small for the agent's tool-using system prompt. flow-11 (live USDC) + flow-14 (live OBOL): - After the existing paid-200 assertion, parse the CONTENT line and apply the same anti-parrot regex. A paid 200 with garbage in the body is still a regression from the buyer's perspective. internal/hermes/rankmodels_test.go + internal/openclaw/rankmodels_test.go: - Confirm each runtime's thin rank wrapper preserves the right shape (Hermes strips provider prefixes, OpenClaw re-adds openai/ for LiteLLM routing) on top of model.Rank. Together with the existing model.Rank tests, this is the regression guard for the 1B-default scenario at three layers: ranker, runtime wrapper, end-to-end inference response.
Ollama tags like `qwen3:0.6b` (and `1.5b`, `0.5b`, etc.) didn't match the original regex `(\d+(?:x\d+)?)b` and fell through to the family default — meaning `qwen3:0.6b` got rank 14 (qwen3 family) and was mistakenly chosen over qwen3.5:9b. The 0.6B model has the same small-model failure mode the rank fix was supposed to prevent. Updated regex accepts `\d+(?:\.\d+)?(?:x\d+(?:\.\d+)?)?b` so decimal sizes parse correctly. Ranks are now expressed in deci-billions (params × 10) so `0.6b` → 6, `1b` → 10, `9b` → 90 — distinct integer values for the comparator. Family defaults table scaled to match. Two new test cases pin the regression: `qwen3:0.6b` must lose to `qwen3.5:9b`, and `smol:1.5b` (untagged family) must lose to a known 9B model.
Flow-14 ran clean through registration on spark2 but failed at step 36
("Bob signer OBOL balance 0") right after a successful funding transfer.
Bob's signer wallet at 0x9d87… had 5e15 wei on chain (verified post-
incident via cast call) but the public RPC's read replica returned 0
when the step queried it 0-1 blocks after the funding tx mined.
Then step 41's PurchaseRequest CR never appeared because buy.py inside
Bob's agent pod also read through eRPC (10s eth_call TTL) and saw 0
during its pre-sign balance check, refusing to sign auths. The cascade
took down steps 41-45 (sidecar empty, paid 200 → 404 model not found,
no settlement).
Same pattern flow-11 already uses for the USDC sibling flow — port it:
- Step 36 wraps balanceOf in a 12-attempt × 2s poll against the public
RPC. Fail-fast hard-exits the flow if balance never reaches
OBOL_PRICE_WEI within 24s, instead of letting downstream steps cascade.
- New step "Bob: eRPC reflects funding" runs buy.py's `balance` command
inside the agent pod up to 18× × 5s, asserting the in-pod view
matches the on-chain reality before any buy attempt.
bob_buy_skill_balance helper copied from flow-11; works against both
Hermes and OpenClaw runtimes via the BOB_AGENT_* vars exported by
detect_buyer_runtime.
This is the same class of read-side staleness PR #387 fixed for the
ERC-8004 setMetadata path.
The previous attempt at the in-pod balance poll called `buy.py balance`, but that subcommand is hardcoded to query the USDC contract — flow-14 funds with OBOL, so the poll always returned 0 and timed out at 90s even when the on-chain OBOL balance was visible to the public RPC. Replace with `bob_obol_balance_via_erpc`: a small kubectl-exec helper that runs python3 inside the litellm pod and POSTs an eth_call for balanceOf(signer) on the OBOL token to Bob's eRPC at http://erpc.erpc.svc.cluster.local:4000/rpc/base-sepolia. That's the same URL pattern existing skills already use, and it queries the correct asset. Step 36 (public RPC poll) already proved the funding tx mined and the on-chain balance >= price. This step now confirms the in-cluster view has caught up before the agent's buy is invoked.
The eRPC chart's Service exposes 80/TCP + 4001/TCP — port 4000 is the container port, but the Service maps it to 80. Other in-cluster skills (signer.py, rpc.py) get this right by hitting the bare hostname; only discovery.py uses :4000 explicitly and it's wrong. Verified against the live spark2 cluster: GET on http://erpc.erpc.svc.cluster.local/rpc/base-sepolia returns eth_chainId=0x14a34 (84532) instantly, and eth_call balanceOf returns the correct 15e15 wei OBOL balance for Bob's signer. Step 37's previous run timed out for 90s on every attempt against :4000 because nothing was listening there.
Step 48's strict pre/post equality on Bob's signer balance fails when
the funding tx in step 35 races the public RPC's read replicas:
signer pre-fund: 10e15
step 35 funds: +5e15 → 15e15 actual
step 36 polls: 15e15 (sometimes), 10e15 (when reads land on a
replica that hasn't seen the funding tx yet)
step 47 settlement: -1e15 → 14e15 or 19e15 depending on which side
of the funding stale read landed
The settlement itself is correct in either case. We already assert the
two canonical proofs strictly:
- Alice's balance delta == OBOL_PRICE_WEI (matches every run)
- On-chain Transfer(signer → Alice, OBOL_PRICE_WEI) event archived
Convert the redundant Bob-signer pre/post check from a hard fail to an
informational pass that surfaces the diff. Settlement correctness is
unchanged.
Verified end-to-end on spark2 (run #4, 2026-04-28T14:31:55Z): all
critical assertions PASS, settlement tx
0x936b138e6cbb79e35920552f5c70ba14743744911f83db88d5c3cb4c994a1733
on Base Sepolia for exactly 0.001 OBOL.
Final report — live OBOL Permit2 settlement on Base Sepolia ✅TL;DRThe OBOL x402 Permit2 path now works end-to-end on live Base Sepolia. First confirmed settlement is tx This PR went from "scaffolded but never run" to "validated on real testbed hardware against live RPCs and the public Obol facilitator", and along the way fixed three regressions surfaced by trying to actually run it. On-chain evidence
ForkObolToken (the live Base Sepolia OBOL test artifact) deployed at
Settlement event log (parsed from tx receipt): Plus 2× Inference correctness on the paid pathThe agent's paid call returned a real reasoning answer, not a tool-catalogue parrot: The new What's in the 10 commitsfeat — new surfaces
test — guards added
fix — 5 regressions caught while testing
Run-by-run on spark2
Run #4 metrics from the log:
{
"commit": "8bad15b295b046f93b65c9f74c9b93cb21e8c3e2",
"alice": "0x58aA1bB710Dc8319C4b2Cca108bCc2974c66172A",
"bobSigner": "0x9d87179b323eB2Ad4267BFd055AfA2Ad8237A982",
"bobFunding": "0xdeA5bCc56289Eb6D50aCc80f8907BAc45b91D5Aa",
"tunnel": "https://debate-rocks-finest-continuous.trycloudflare.com",
"obolToken": "0x54AE82bc871a4E3E8E2FE1173Cb864B8563D44D4",
"obolTokenName": "Obol Network",
"obolTokenSymbol": "OBOL",
"obolTokenDomainSeparator": "0xc21da3ed0501015df2d9efb304b2abbdabeb86398c8fc729d491740a061e9b25",
"facilitator": "https://x402.gcp.obol.tech",
"baseSepoliaRpc": "https://sepolia.base.org",
"transactions": {
"funding": "0xffab6d15304bd63e858beb0d0925d69f164e1411944a767fba21f8e12859d3e5",
"settlement": "0x936b138e6cbb79e35920552f5c70ba14743744911f83db88d5c3cb4c994a1733"
}
}Status of the 3 remaining soft FAILsThese predate this PR — both flow-13 (Anvil fork OBOL) and flow-11 (USDC live) hit the same softness and both flows are still considered green in the v0.9.0-rc1 release notes.
Test plan — final state
Cross-references
Settlement is on-chain, receipts are archived, the colleague's |
Summary
Picks up the v0.9.0-rc1 token-report follow-ups so we can run the OBOL Permit2 path against live Base Sepolia (not Anvil), and proactively surface the eRPC pin staleness that triggered PR #387's setMetadata revert.
Targets
integration/pr377-pr381(the v0.9.0-rc1 branch) so the next RC bump can ship this as part of the same release line.What's in here
1.
obol network statusreachability probe (feat(network)commit)obol network statusnow sends aneth_chainIdJSON-RPC to every upstream in the eRPC config (parallel, 2s timeout) and warns when:Output suggests the actionable recovery:
obol network remove <name>.--no-probeopts out for callers in tight loops.2.
flow-14-live-obol-base-sepolia.sh(feat(flow-14)commit)Live-network sibling of flow-13. Same dual-stack topology, but:
forge create ForkObolTokenper runOBOL_TOKEN_BASE_SEPOLIAenv (pre-deployed)mint()from deployertransfer()fromBOB_FUNDING_PRIVATE_KEYWaitForAgenton the OBOL path)Required env vars fail fast at the top of the script (no gas spent if missing):
REMOTE_SIGNER_PRIVATE_KEY— Alice's seller key (Base Sepolia ETH holder)OBOL_TOKEN_BASE_SEPOLIA— deployed ERC20Permit addressBOB_FUNDING_PRIVATE_KEY— buyer's wallet, must already hold real OBOL3. EIP-712 early-fail probe (both flow-13 and flow-14)
The ServiceOffer pins
eip712Namefor Permit2 signing. If it doesn't match the deployed token'sname(), the buyer signs against a domain the contract'spermit()rejects, and the failure surfaces deep inside facilitator/verifywith no useful trace. Both flows now query the on-chainname()and assert before any signing happens.4. eRPC pin teardown (flow-13)
flow-13's cleanup trap now drops the
base-sepoliapin from both Alice's and Bob's clusters before exit, so a custom upstream pointed at the (now-killed) Anvil port can't leak into the next flow's reads — the report's Risk #1.5. Operator note: facilitator signer balance (
docs/guides/monetize-inference.md)eip2612_gas_sponsoring: trueshifts gas to the facilitator signer for the OBOL Permit2 path. Documents the operator runbook entry: monitor signer balance, alarm above empty, refill plan. The chart-side metric work is tracked separately inobol-infrastructure.Test plan
go build ./...cleango test ./internal/network/... ./cmd/obol/... ./internal/erc8004/...clean (6 new probe tests pass)bash -n flows/flow-13-dual-stack-obol.sh && bash -n flows/flow-14-live-obol-base-sepolia.shparse cleanobol network statuswarns on a deliberately-stale custom pin (manual smoke test against a live cluster)Cross-references
/Users/bussyjd/Development/Obol_Workbench/obol-stack/.claude/worktrees/integration-pr377-pr381/.tmp/v0.9.0-rc1-obol-x402-report.md(Risk Bring obolup code to this repo #1, Risk Add helm to obolup #2, Lesson Move to namespace, attempt erigon3 #6).