fix(payment): gate quote freshness on own quote only, not neighbours' by jacderida · Pull Request #127 · WithAutonomi/ant-node

jacderida · 2026-06-05T00:48:35Z

Problem

The price-staleness gate introduced in 56dd370 checks every quote in the payment bundle against the verifying node's own record count:

for (encoded_peer_id, quote) in &payment.peer_quotes {
    if quote.price < min_acceptable_price { // floor derived from MY records

Fullness across a close group is wildly heterogeneous on a real network. On ant-prod-01 (2026-06-04, PROD-UL-01 upload run) the close group for a public DataMap chunk spanned 47..=1788 records. The client collected 7 quotes and paid 3× the median on-chain (14,520,949,218,750,000 atto — more than 2× even the fullest node's current price), yet:

node holding 1788 records: floor 5,271,199,218,750,000 → rejected the honest quote from the 978-record peer (4,840,316,406,250,000)
nodes holding 1180 / 1310 records: rejected the honest quote from the 47-record peer (3,908,407,226,562,500) — the 1180-record node by a margin of just 1.05%

All quotes were 10 seconds old — no staleness was involved, only cross-node fullness spread. With a fourth close-group node disk-full, the PUT landed on 3/7 peers (majority is 4): the upload failed after the on-chain payment was burned, and the 500 already-stored body chunks became unreferenced. The same gate also rejects the replication paid-notify path (observed on two further nodes at 1180/1443 records), so the chunk cannot heal past the 3 copies.

Every close group whose fullest member's 75% floor exceeds its emptiest member's price will reject all single-node payments — guaranteed steady-state on prod where freshly joined nodes coexist with established ones.

Fix

A node can only re-derive its own price from its own record count, so its own quote is the only one it can legitimately call stale (the doc comment's stated intent — "a node storing a few replicated records between quoting and verifying" — is exactly the self-staleness case). The gate now:

resolves the node's own peer ID from the attached P2PNode (same handle the merkle closeness check uses) and skips every quote in the bundle that isn't ours;
fails open with a debug log when no identity source is attached, mirroring the existing missing-record-count posture (production always attaches);
rewords the rejection message — it previously claimed "paid X" while actually printing the quote price, not the on-chain payment.

The on-chain median payment binding (completedPayments, 3× median) is unaffected.

Tests

test_neighbour_cheap_quote_not_rejected — regression test reproducing the incident shape: own fresh quote at 1788 records alongside honest neighbour quotes at 47 and 978; must pass.
test_own_stale_quote_still_rejected_among_neighbours — the gate still bites on a genuinely stale own quote.
test_freshness_skipped_without_self_peer_id — fail-open without an identity source.
Existing freshness tests updated to mark the quote under test as the node's own (set_peer_id_for_tests, cfg-gated like set_records_stored_for_tests).

All 61 payment::verifier tests pass; clippy clean. (config::tests::test_bootstrap_peers_discover_env_var fails identically on untouched main — environment-sensitive, unrelated.)

Possible follow-up

An additional/alternative gate could compare the verified on-chain amount against the node's current price after single_payment.verify() — that would also fix the "rejected despite aggregate overpayment" semantics. Kept out of this PR to stay surgical.

Companion PR (reporting): WithAutonomi/ant-client#107 — ant --json file upload emits no JSON when the public DataMap store fails, so harnesses report 0/0 chunks for uploads like this one.

🤖 Generated with Claude Code

Copilot

Pull request overview

This PR adjusts the quote price “freshness” (anti-staleness) gate in payment::verifier so that a node only evaluates its own quote in a multi-peer payment bundle, instead of incorrectly judging neighbours’ quotes against the verifying node’s local record count. This aligns the freshness logic with what the verifying node can actually re-derive (its own price-from-fullness) and prevents heterogeneous close-group fullness from causing systematic false rejections and failed PUTs after on-chain payment.

Changes:

Add a mechanism to resolve the verifying node’s own peer ID (from attached P2PNode, with a test-only override fallback) and use it to select which quote to freshness-check.
Update validate_quote_freshness to skip all non-self quotes and improve the rejection message wording.
Add/adjust unit tests to cover neighbour-quote scenarios, self-stale rejection, and fail-open behavior when self identity is unavailable.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

grumbach

Reviewed the fix end to end (read verify_evm_payment, validate_quote_freshness, the self-ID resolution, the signing/binding path, and the merkle path). This is a clean, correct, well-scoped fix and the diagnosis matches the incident. Approving, with a couple of notes for your call.

What I verified

Root cause is real and the fix is the right shape. A close group spanning 47..=1788 records makes a full node's 75% floor exceed an empty node's honest price, so the old gate rejected perfectly fresh neighbour quotes. A node can only re-derive its own price from its own record count, so gating only its own quote is correct. The neighbour's stale-replay risk is genuinely that neighbour's gate to enforce when the PUT reaches it.
The self-ID comparison actually matches in production. self_peer_id_bytes() returns *node.peer_id().as_bytes(), and an EncodedPeerId is constructed from exactly those bytes (EncodedPeerId::new(*ant_peer_id.as_bytes()) in this file). Same 32-byte encoding, so encoded_peer_id.as_bytes() != &self_peer_id is sound and won't silently skip our own quote.
No bypass from the pre-binding ordering. validate_quote_freshness runs before validate_peer_bindings and signature verification, so the encoded_peer_id label isn't authenticated yet when we pick out "our" quote. I chased the relabeling angle: an attacker could mislabel a forged cheap quote under a neighbour's peer ID to dodge the own-quote gate, but that quote then fails the BLAKE3 peer-id binding + signature (price is signed via bytes_for_signing). Both paths reject, so there's no surviving relabeled quote. The only untrusted-data op freshness performs (derive_records_stored_from_price) is already saturating/panic-proof for exactly this pre-auth reason. Fine as-is.
Scope is right. validate_quote_freshness is only called from the single-node path; the merkle/batch path has no analogous own-fullness gate, so nothing to mirror there.

Notes (non-blocking)

Two independent self-identity notions now coexist in the verifier. validate_local_recipient identifies "us" by quote.rewards_address == local_rewards_address; the new gate identifies "us" by P2PNode peer ID. If those two ever disagree for a given node (config mismatch, or one handle attached but not the other), validate_local_recipient could still consider the node a recipient while the freshness gate matches no quote and silently no-ops. It fails open, so it's not a security hole, just a latent inconsistency. Might be worth a one-line comment noting the two are expected to refer to the same node, or a debug log when the bundle contains our rewards address but no quote under our peer ID.
Tiny doc-comment nit. The new doc on validate_quote_freshness is excellent, but it says "this node has no basis to judge it against its own record count" for neighbour quotes, which is the key insight worth keeping front and center. Consider also stating explicitly that a bundle is expected to contain exactly one quote per peer (so the loop matches at most one own quote) so a future reader doesn't wonder about multiple-own-quote handling.

Tests

The three new tests cover the regression shape (own fresh @1788 + honest neighbours @47/@978), own-still-bites, and fail-open-without-self-id. The fail-open test relies on no override being set, which is the right negative. Since these call validate_quote_freshness directly, the pub_key: [0u8; 64] fakes are fine (binding/signature aren't on this path). Coverage matches the change well.

Nice surgical fix. LGTM once you've eyeballed note 1.

The price-staleness gate (56dd370) checked every quote in the payment bundle against the verifying node's OWN record count. Fullness across a close group is wildly heterogeneous on a real network — on ant-prod-01 (2026-06-04) a close group spanned 47..=1788 records, so the three fullest nodes each found the emptiest node's honest, 10-second-old quote below their 75% floor and rejected the PUT after the client had already paid 3x the median on-chain (more than 2x even the fullest node's current price). With one more node disk-full, the DataMap store landed on 3/7 peers — below the majority of 4 — failing the upload and burning the payment. The same gate also blocked the replication paid-notify path, so the chunk could not heal past 3 copies. A node can only re-derive its own price from its own record count, so its own quote is the only one it can legitimately call stale. Resolve the node's peer ID from the attached P2PNode and skip every quote in the bundle that isn't ours; fail open (with a debug log) when no identity source is attached, mirroring the missing-record-count posture. The on-chain median payment binding is unaffected. Also reword the rejection message: it claimed "paid X" while actually printing the quote price, not the on-chain payment. Adds a regression test reproducing the incident shape (own fresh quote at 1788 records alongside honest neighbour quotes at 47 and 978), a test that an own stale quote still rejects among expensive neighbours, and a fail-open test for the missing-identity path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

jacderida · 2026-06-05T14:35:46Z

Thanks for the thorough review @grumbach — both notes addressed in the amended commit (51f7d55 → 8f69ae1, history rewritten rather than stacked):

Note 1 (two self-identity notions): added a comment in validate_quote_freshness stating that the rewards-address identity (validate_local_recipient) and the peer-ID identity (this gate) are expected to refer to the same node, plus the debug! breadcrumb you suggested: when the bundle contains our rewards address but no quote under our peer ID, the fail-open is now logged instead of silent. That divergence is actually routine in our own fleet (all our nodes share a rewards address, so a PUT reaching a node whose own quote isn't in the bundle passes validate_local_recipient on a sibling's quote) — exactly the kind of path we'd have wanted in the node logs while reconstructing this incident.

Note 2 (at-most-one own quote): added a doc sentence noting that validate_quote_structure rejects duplicate peer IDs and runs before this gate on every path, so the loop matches at most one own quote.

No behaviour change on any rejecting path; clippy (CI flags) and all 61 verifier tests still pass.

🤖 Generated with Claude Code

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

+    /// tolerance — on ant-prod-01 a close group spanning 47..=1788 records
+    /// made the three fullest nodes reject every bundle containing the
+    /// emptiest node's (perfectly fresh, 10-second-old) quote, failing the
+    /// PUT after the client had already paid on-chain. The node can only


A proof-of-payment presented during replication is a receipt for a sale that already closed, but the verifier ran the full client-PUT check set against it. Two of those checks interrogate the present and therefore guarantee false rejections for replicated records: the own-quote price-freshness gate (record counts only grow, so every receipt's quoted price eventually drops below the verifier's live floor) and the local-recipient check (close groups churn, so a post-churn member receiving a record via replication was never a payee on the original receipt). The merkle candidate-closeness check has the same shape: it validates the winner pool against the live DHT, but the pool was sampled from the DHT of the original sale. On DEV-01 (2026-06-05) this rejected nearly 100% of replication proof-of-payment transfers within an hour of launch: 4M+ "PoP verification error ... stale" rejections at ~300k/hour, records pinned below target redundancy, close-group record counts diverging 150x (75..=11,231 per service), and a permanent ~500 MB/s fleet-wide re-offer storm (~25 TB egress in 16h). The divergence the rejections caused is also what made the client-PUT freshness gate (fixed for the heterogeneous-neighbour case in #127) keep firing: the two failure modes fed each other. Introduce VerificationContext { ClientPut, Replication } and thread it through verify_payment. Under Replication the verifier skips only the storer-being-paid-now checks (own-quote freshness, local recipient, merkle candidate closeness). Every receipt-authenticity check still runs in both contexts: quote structure, content binding to the exact address, peer-ID/ pub-key bindings, ML-DSA signatures, and the on-chain settlement lookup — a record cannot be admitted via replication without an authentic, settled payment for that record. The verified-XorName cache is context-aware to match: each entry records whether its verification ran the full client-PUT check set, a Replication-verified entry satisfies later replication lookups (re-offers of the same key are routine) but never a later ClientPut fast-path, and a full ClientPut verification upgrades the entry without ever being downgraded back. Without this, a replication receipt would let a later proof-less client PUT bypass the context-gated checks via the cache. Deliberate trade-off (documented on the enum): skipping the recipient and closeness checks for replication admits receipts from self-dealing payers who settle the median payment to their own wallet on-chain. The client-PUT path still rejects such pools, replication admission still requires responsibility for the key, and the abuse costs a settled on-chain payment per chunk; closing it properly belongs in quote issuance / payment policy rather than in the replication hot path, where the equivalent defence provably destroys the network's ability to heal. Call sites: the chunk PUT handler passes ClientPut (behaviour unchanged); the fresh-offer and paid-notify replication handlers pass Replication. Test results: payment::verifier 66/66 (5 new context tests: stale own quote and non-recipient receipts pass the gated checks under Replication, failing at the later binding/signature stage; content mismatch rejected under both contexts; duplicate-candidate merkle pool rejected under ClientPut but past the closeness check under Replication; Replication-verified cache entry does not satisfy a ClientPut fast-path, upgrades on full verification, never downgrades), replication 230/230, storage 29/29. cargo clippy --all-targets clean. Note: config::tests::test_bootstrap_peers_discover_env_var fails on machines with a real ~/.config/ant/bootstrap_peers.toml — pre-existing on main, unrelated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings June 5, 2026 00:48

Copilot started reviewing on behalf of jacderida June 5, 2026 00:48 View session

jacderida mentioned this pull request Jun 5, 2026

fix(cli): emit failure JSON when public DataMap store fails WithAutonomi/ant-client#107

Merged

Copilot AI reviewed Jun 5, 2026

View reviewed changes

jacderida force-pushed the fix/quote-freshness-own-quote-only branch from cf1ba3e to 51f7d55 Compare June 5, 2026 00:55

grumbach reviewed Jun 5, 2026

View reviewed changes

jacderida force-pushed the fix/quote-freshness-own-quote-only branch from 51f7d55 to 8f69ae1 Compare June 5, 2026 14:35

Copilot AI review requested due to automatic review settings June 5, 2026 14:35

Copilot started reviewing on behalf of jacderida June 5, 2026 14:35 View session

Copilot AI reviewed Jun 5, 2026

View reviewed changes

jacderida merged commit f054fbe into main Jun 5, 2026
11 of 12 checks passed

jacderida deleted the fix/quote-freshness-own-quote-only branch June 5, 2026 15:14

jacderida mentioned this pull request Jun 5, 2026

fix: skip storer-being-paid-now checks for replication receipts #130

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(payment): gate quote freshness on own quote only, not neighbours'#127

fix(payment): gate quote freshness on own quote only, not neighbours'#127
jacderida merged 1 commit into
mainfrom
fix/quote-freshness-own-quote-only

jacderida commented Jun 5, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

grumbach left a comment

Uh oh!

jacderida commented Jun 5, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

jacderida commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Fix

Tests

Possible follow-up

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

grumbach left a comment

Choose a reason for hiding this comment

What I verified

Notes (non-blocking)

Tests

Uh oh!

jacderida commented Jun 5, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jacderida commented Jun 5, 2026 •

edited

Loading