Problem
The obol-stack eRPC config (internal/embed/infrastructure/values/erpc.yaml.gotmpl) routes each RPC request to a single upstream chosen by eRPC's selectionPolicy + score, with hedge for latency fallback. For four EVM networks (mainnet, hoodi, base, base-sepolia) we currently rely on one upstream's answer per request.
For our read paths that underpin payment/registration correctness — agent-registration document fetches, ERC-8004 registry reads, USDC balance checks, eth_call of payment requirements — a single malicious or desynced upstream can return a wrong answer that no layer above detects. Consensus validation between multiple upstreams catches this cheaply.
Proposed config
Add a consensus entry to the failsafe list for high-trust read methods on each EVM network:
failsafe:
  - matchMethod: "eth_call|eth_getLogs|eth_getTransactionReceipt|eth_getTransactionByHash|eth_getBlockByNumber|eth_getBlockByHash|eth_chainId"
    consensus:
      maxParticipants: 3        # fan out to 3 upstreams in parallel
      agreementThreshold: 2     # 2 of 3 must match → return majority answer
      punishMisbehavior:
        disputeThreshold: 3
        disputeWindow: 10s
        sitOutPenalty: 5m
  - matchMethod: "*"            # non-consensus path stays for latency-sensitive reads
    timeout:
      duration: 30s
    retry:
      maxAttempts: 2
      delay: 100ms
    hedge:
      delay: 500ms
      maxCount: 1
Apply to all four EVM network blocks (lines ~80-145 of the gotmpl). Keep the existing selectionPolicy intact for eth_sendRawTransaction routing; consensus only activates for read methods.
Why 3/2, not 2/2
2/2 means every paid request fails as soon as one upstream is flaky → negates the resilience we already have.
3/2 tolerates one upstream failure/disagreement per request, returns the majority answer, and the punishMisbehavior block auto-quarantines consistently-misbehaving upstreams for 5 min.
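The difference between the two settings can be illustrated with a small sketch of threshold voting. This is not eRPC's implementation; `consensus_result` is a hypothetical helper that only models the rule "return the most common answer if at least agreementThreshold participants gave it", with `None` standing in for an upstream that failed or timed out.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_result(
    responses: Sequence[Optional[str]],
    agreement_threshold: int,
) -> Optional[str]:
    """Return the most common answer if enough upstreams agree, else None.

    A response of None models an upstream that failed or timed out.
    This is an illustrative model, not eRPC's actual algorithm.
    """
    answers = [r for r in responses if r is not None]
    if not answers:
        return None
    value, votes = Counter(answers).most_common(1)[0]
    return value if votes >= agreement_threshold else None


# 3 participants, threshold 2: one bad or missing upstream is tolerated.
majority = consensus_result(["0xaa", "0xaa", "0xbb"], agreement_threshold=2)
one_down = consensus_result(["0xaa", "0xaa", None], agreement_threshold=2)

# 2 participants, threshold 2: a single failure means no answer at all.
flaky_pair = consensus_result(["0xaa", None], agreement_threshold=2)
```

In this model `majority` and `one_down` are both `"0xaa"`, while `flaky_pair` is `None`, which is the failure mode the 2/2 setting would introduce.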
Upstream prerequisite
Each affected chain must have ≥ 3 upstreams configured in the upstreams: array for consensus to have anyone to vote with. Current state:
chainId: 1 (mainnet) — verify count; add more public RPCs via obol network add if needed.
chainId: 560048 (hoodi) — likely only 1 today.
chainId: 8453 (base) — verify.
chainId: 84532 (base-sepolia) — 1 (base-sepolia-publicnode) + whatever is added by obol network add.
When count < 3, eRPC degrades gracefully — it queries however many exist — but the resilience goal isn't met. So this issue should include bumping the default ChainList seed count or guaranteeing a minimum.
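A minimum-count guarantee could be enforced with a check along these lines. This is a sketch under assumptions: `chains_below_minimum` is a hypothetical helper, and it assumes the rendered config has already been parsed into a list of upstream mappings that each carry an `evm.chainId` field. The layout in the real template should be verified before reusing it.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Sequence

# Chain IDs named in this issue: mainnet, hoodi, base, base-sepolia.
REQUIRED_CHAIN_IDS = (1, 560048, 8453, 84532)


def chains_below_minimum(
    upstreams: Iterable[Mapping],
    minimum: int = 3,
    chain_ids: Sequence[int] = REQUIRED_CHAIN_IDS,
) -> Dict[int, int]:
    """Map each under-provisioned chain ID to its current upstream count.

    Assumes each upstream looks like {"id": ..., "evm": {"chainId": ...}};
    upstreams without that field are ignored.
    """
    counts = Counter(
        u["evm"]["chainId"]
        for u in upstreams
        if isinstance(u.get("evm"), Mapping) and "chainId" in u["evm"]
    )
    return {cid: counts[cid] for cid in chain_ids if counts[cid] < minimum}


# Hypothetical upstream IDs, for illustration only.
example_upstreams = [
    {"id": "mainnet-a", "evm": {"chainId": 1}},
    {"id": "mainnet-b", "evm": {"chainId": 1}},
    {"id": "mainnet-c", "evm": {"chainId": 1}},
    {"id": "base-sepolia-publicnode", "evm": {"chainId": 84532}},
]
shortfalls = chains_below_minimum(example_upstreams)
```

An empty result means every listed chain has enough upstreams for 2-of-3 voting; a non-empty result could fail the render test or trigger seeding of additional RPCs.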
Explicit non-goals
- Do not apply consensus to eth_sendRawTransaction: routing stays single-upstream (already handled by selectionPolicy).
- Do not apply to eth_blockNumber/eth_syncing/latency-critical head checks: treat those as the matchMethod: "*" fallthrough.
- Do not use agreementThreshold: 3 (of 3): one slow upstream fails every call.
Optional: nonce handling
For eth_getTransactionCount, lagging replicas routinely disagree. Instead of strict consensus, use:
  - matchMethod: "eth_getTransactionCount"
    consensus:
      maxParticipants: 3
      agreementThreshold: 1
      preferHighestValueFor:
        eth_getTransactionCount:
          - result
This returns the highest observed nonce, preventing stale-nonce transaction failures.
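The intended selection rule can be sketched as follows. This models only the "highest value wins" idea and is not eRPC's code; `highest_nonce` is a hypothetical helper that takes the hex-encoded `result` strings returned by each upstream, with `None` for an upstream that did not answer.

```python
from typing import Optional, Sequence


def highest_nonce(responses: Sequence[Optional[str]]) -> Optional[str]:
    """Return the largest hex-encoded nonce among the responses, or None.

    Values are compared numerically, not as strings: "0x10" is greater
    than "0x9" even though it sorts lower lexicographically.
    """
    values = [int(r, 16) for r in responses if r is not None]
    if not values:
        return None
    return hex(max(values))


# One replica lags behind (0x9) and one is unreachable (None).
picked = highest_nonce(["0x9", "0x10", None])
```

Here `picked` is `"0x10"`, so a transaction built from it would not reuse a nonce that the most up-to-date upstream has already seen.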
Validation plan
- Unit: internal/network/erpc_test.go, template render with the new failsafe block.
- Integration: seed 3 base-sepolia RPCs (publicnode + alchemy public + drpc public). Probe eth_call against the registry contract; flip one upstream to return wrong data (mock); confirm the request still returns the majority answer.
- Observability: eRPC emits metrics on consensus participation — expose them in Grafana.
References
internal/embed/infrastructure/values/erpc.yaml.gotmpl