fix(payment): verify closeness against pure-XOR view; escalate Merkle on mismatch#114
Merged
jacderida merged 3 commits intoMay 26, 2026
Conversation
… on mismatch Two upload-breaking regressions on testnets with a meaningful NAT fraction, both from storer-side closeness verification diverging from the uploader's network-walked peer selection. Single-node close-group check (introduced in #107): switch off the reachability-reranked find_closest_nodes_local_with_self onto the XOR-only find_closest_nodes_local_by_distance_with_self. The re-rank (saorsa-core #121) demoted XOR-close relay-only / NAT'd peers out of the local top-CLOSE_GROUP_SIZE, dropping 2-3 of the uploader's 7 quoted peers and breaching the >=5 threshold. This mirrors the fix already applied to the Merkle path; it remains a pure local lookup, so no added network cost. Merkle candidate-pool check (changed in #111): #111 moved the check off the authoritative network lookup onto the local routing table, with the fallback gated on local-table *size*, not match *outcome*. On a real network the local k-bucket sample legitimately diverges from the uploader's network-walked candidates (which include reachable responders from positions 17-32), so honest pools were hard-rejected with no escalation. Keep #111's local fast path (accept on a local match), but escalate to the authoritative network lookup on match *failure* too. Bound the reopened network-fallback path with a new closeness_fallback_permits semaphore (CLOSENESS_NETWORK_FALLBACK_CONCURRENCY = 16): inflight_closeness already collapses same-pool concurrency, and this caps the distinct-pool case so a forged-pool flood cannot spawn unbounded 240s Kademlia walks -- addressing the DoS rationale #111 gave for the size-only gate. Requires saorsa-core's find_closest_nodes_local_by_distance_with_self (WithAutonomi/saorsa-core#122) on rc-2026.5.4. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Picks up find_closest_nodes_local_by_distance_with_self (WithAutonomi/saorsa-core#122, now merged to rc-2026.5.4) that the single-node close-group verification change depends on. The crate is pinned to `branch = "rc-2026.5.4"`, so this only advances Cargo.lock from 1be7352 to 82bb541; no manifest change. ant-node now compiles against the published branch without a local patch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI only triggered for push/pull_request against `main`, so PRs targeting release branches (e.g. rc-2026.5.4) ran no checks. Add `rc-*` to both branch filters. Note: the pull_request branch filter is evaluated against the PR's base branch, so this only starts firing for rc-targeted PRs once it has landed on the rc-2026.5.4 branch itself (i.e. after this PR merges). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes two upload-breaking regressions seen on testnets with a meaningful NAT fraction (e.g. STG-01: ~900 nodes, 30% NAT-simulated — every upload failed). Both stem from the same root cause: storer-side closeness verification compares the uploader's network-walk-derived peer set against the storer's local routing-table view, and those views diverge on a real network.
1. Single-node payment close-group check (regressed by saorsa-core #121)
The check (introduced in #107) calls
find_closest_nodes_local_with_self. When saorsa-core #121 re-ranked that lookup(reachability_tier, xor_distance)— correct for storage selection — the verifier silently inherited it. The re-rank demotes XOR-close relay-only / NAT'd peers out of the local top-CLOSE_GROUP_SIZE, while the uploader's quoted set (get_store_quotes, pure XOR) still contains them. Result: only 3–4 of 7 quoted peers match, breaching the>=5threshold → every single-node payment rejected.Fix: switch to the XOR-only
find_closest_nodes_local_by_distance_with_self(WithAutonomi/saorsa-core#122), exactly mirroring the Merkle closeness path. Still a pure local lookup → no added network cost.2. Merkle candidate-pool closeness check (regressed by #111)
#111 moved this check off the authoritative iterative-Kademlia network lookup onto the local routing table (a real per-chunk latency win), with the network fallback gated on local-table size, not match outcome. The DoS argument was sound — an attacker can't make a victim's table sparse — but it also means that when the local k-bucket sample legitimately disagrees with the uploader's network-walked candidates (which include reachable responders from positions 17–32), honest pools are hard-rejected with no escalation. On STG-01 this was 100% of Merkle uploads (
candidate pub_keys do not match; node logs: all match-failures, zero sparse-defers, zero timeouts).Fix: keep #111's local fast path — accept immediately on a local match (the common case on a healthy, converged network) — but escalate to the authoritative network lookup on match failure as well as sparsity. This restores the pre-#111 correctness only on the (rare, on a healthy net) mismatch path, not per chunk.
DoS bound: the reopened network-fallback path is capped by a new
closeness_fallback_permitssemaphore (CLOSENESS_NETWORK_FALLBACK_CONCURRENCY = 16).inflight_closenessalready collapses same-pool concurrency; this caps the distinct-pool_hashcase so a forged-pool flood cannot spawn unboundedCLOSENESS_LOOKUP_TIMEOUT-long walks. Excess concurrent walks get a fast, retryable rejection rather than queueing — addressing the exact DoS rationale #111 gave for the size-only gate.Dependency / merge order
find_closest_nodes_local_by_distance_with_self) to land onrc-2026.5.4first. Thesaorsa-coregit pin then resolves it viacargo update -p saorsa-core; until then CI here will not compile. (Verified locally against #122 via a path patch.)Test plan
cargo check --lib— clean (against saorsa-core#122)cargo test --lib payment::verifier— 67 passed, 0 failedcargo clippy --lib --all-features -- -D clippy::panic -D clippy::unwrap_used -D clippy::expect_used— cleancargo fmt --all -- --check— cleanstore_durations_msp50/p99 vs baseline to confirm the fast path is preserved🤖 Generated with Claude Code