fix(replication): bound replication queues with per-source fairness (D1 DoS)#99
Conversation
There was a problem hiding this comment.
Pull request overview
This PR hardens the replication scheduling queues against a D1 DoS by bounding memory growth and enforcing per-source fairness so a single routing-table peer cannot monopolize verification/fetch work.
Changes:
- Add global capacity limits for
pending_verifyandfetch_queue, rejecting/dropping admissions once full. - Add per-
hint_senderaccounting (pending_per_sender) to enforce a per-peer pending cap for fairness. - Add an integration PoC/regression test suite covering global bounds, per-source fairness, and counter consistency.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
src/replication/scheduling.rs |
Introduces global/per-sender caps, per-sender accounting, and bounded enqueue behavior for replication queues. |
tests/poc_d1_bounded_queues.rs |
Adds PoC regression tests validating bounded queues and per-source fairness under flooding. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /// Get a mutable reference to a pending verification entry. | ||
| /// | ||
| /// INVARIANT: callers MUST NOT reassign `entry.hint_sender` through the | ||
| /// returned reference. The per-source quota counter | ||
| /// (`pending_per_sender`) is keyed by the `hint_sender` recorded at | ||
| /// admission; re-attributing a live entry to a different peer here would | ||
| /// orphan a count (decremented against the wrong peer on removal/eviction) | ||
| /// and silently desync the quota. Mutate only verification-progress | ||
| /// fields (e.g. `state`, `verified_sources`, `tried_sources`). | ||
| pub fn get_pending_mut(&mut self, key: &XorName) -> Option<&mut VerificationEntry> { | ||
| self.pending_verify.get_mut(key) |
| // Attacker floods far beyond any single-peer budget. | ||
| let attacker_flood: u32 = (MAX_PENDING_VERIFY_PER_PEER as u32).saturating_add(500_000); | ||
| let mut attacker_admitted = 0usize; | ||
| for i in 0..attacker_flood { | ||
| if queues.add_pending_verify(unique_xorname(i), entry_from(attacker)) { | ||
| attacker_admitted += 1; | ||
| } | ||
| } |
…irness (D1) ReplicationQueues::pending_verify (HashMap) and fetch_queue (BinaryHeap) had no capacity bound (the source carried the project's own TODO). handle_neighbor_sync_request documents "No per-request hint count limit" and the only gate is sender_in_rt, so a routing-table peer could flood NeighborSyncRequest hints (each message capped only by MAX_REPLICATION_MESSAGE_SIZE, ~320k 32-byte hints) and grow these structures unboundedly: memory exhaustion plus a self-amplifying outbound verification/fetch request storm. Fix, two layers: - Global memory backstop: MAX_PENDING_VERIFY / MAX_FETCH_QUEUE (131_072 each). add_pending_verify / enqueue_fetch reject once full (callers already treat non-admission as "not added"). - Per-source fairness (the actual D1 defence): each pending entry is accounted to its hint_sender via a pending_per_sender counter kept in lockstep with pending_verify (insert/remove/evict_stale). A single source may hold at most MAX_PENDING_VERIFY_PER_PEER (8_192) entries, so a flooding peer exhausts only its own quota and can never starve honest peers, whose quotas are independent. hint_sender is the cryptographically authenticated connection identity, not a forgeable payload field, so the accounting cannot be evaded. A global cap alone would have converted the memory DoS into a worse silent honest-replication starvation DoS; the per-source quota is what actually closes D1. Counter decrement uses saturating_sub with a debug_assert; get_pending_mut documents the no-reattribution invariant. A residual (~16 in-RT Sybil PeerIds to reach the global backstop) is documented as an accepted, tracked follow-up. tests/poc_d1_bounded_queues.rs proves the attack is defeated and that the critical starvation test fails against a global-cap-only fix. 460 lib tests pass; cfd clean.
…etter Per review-wave findings on PR WithAutonomi#99: Replace the `get_pending_mut(&XorName) -> Option<&mut VerificationEntry>` helper with `set_pending_state(&XorName, VerificationState) -> Option<HintPipeline>`. The old API exposed `&mut hint_sender` to callers and relied on a doc comment to keep them from re-assigning it; a re-attribution would have orphaned a per-source quota count and silently desynced the fairness counter — the same silent-starvation class this PR was written to prevent. The new setter writes only `state` and returns `pipeline`, making the desync impossible to commit by accident. Single production caller (run_verification_cycle in replication/mod.rs) migrated; D1f test renamed/updated to exercise the new setter; PoC flood shrunk from +500_000 to +10_000 over the cap (same security signal, faster in debug builds).
102d080 to
f3fd84f
Compare
…s capacity rejections Per codex round-2 review on PR WithAutonomi#99 (two BLOCKERs + one NIT): 1. [BLOCKER] Silent verified-key drop on full fetch_queue. Both verification paths removed the key from pending_verify BEFORE calling enqueue_fetch. When MAX_FETCH_QUEUE was full, enqueue_fetch silently dropped the key, so a verified key disappeared from every queue with no retry path. New `ReplicationQueues::promote_pending_to_fetch` checks capacity FIRST, then removes-from-pending-and-enqueues atomically; on capacity miss the pending entry is left in place so the next verification cycle retries. Both production call sites in `run_verification_cycle` migrated. enqueue_fetch now also returns bool (true = enqueued, false = dedup-or-full) so the few raw callers can observe outcomes. 2. [BLOCKER] False bootstrap completion under capacity-rejected hints. `add_pending_verify` now returns `AdmissionResult` (Admitted / AlreadyPresent / CapacityRejected) instead of bool, so callers can tell "no work needed" from "work was silently dropped". `admit_and_queue_hints` now reports a `capacity_rejected_count` alongside `discovered`; bootstrap paths (initial sync, neighbour sync, recurring sync) feed that count into a new `BootstrapState.capacity_rejected_outstanding` counter via `bootstrap::note_capacity_rejected`. `check_bootstrap_drained` refuses to mark drain complete while that counter is non-zero, so a bootstrap snapshot whose hints overflowed the queue caps stays pending until the source re-delivers them post-drain. 3. [NIT] `evict_stale_removes_old_entries` test used to bypass `add_pending_verify` via direct field access, so it no longer exercised the per-sender counter maintenance the followup added to `evict_stale`. Routed through the public API and added a per-sender counter assertion. A new `AdmissionResult::admitted()` helper preserves backward-compat for the dozen-plus call sites that just want the boolean. The few production sites that distinguish capacity vs dedup branch on the enum. 479 lib tests pass; cfd clean.
| // drained. | ||
| if outcome.capacity_rejected_count > 0 { | ||
| bootstrap::note_capacity_rejected( | ||
| &bootstrap_state, | ||
| outcome.capacity_rejected_count, | ||
| ) | ||
| .await; |
| if is_bootstrapping { | ||
| if !outcome.discovered.is_empty() { | ||
| bootstrap::track_discovered_keys(bootstrap_state, &outcome.discovered).await; | ||
| } | ||
| if outcome.capacity_rejected_count > 0 { | ||
| bootstrap::note_capacity_rejected(bootstrap_state, outcome.capacity_rejected_count) | ||
| .await; | ||
| } | ||
| } |
| } | ||
| if outcome.capacity_rejected_count > 0 { | ||
| bootstrap::note_capacity_rejected(bootstrap_state, outcome.capacity_rejected_count) | ||
| .await; |
| let distance = crate::client::xor_distance(&key, p2p_node.peer_id().as_bytes()); | ||
| q.enqueue_fetch(key, distance, sources); | ||
| // Atomic remove+enqueue: if fetch_queue is at capacity, the | ||
| // pending entry is preserved and retried next cycle (no | ||
| // silent drop of verified replica-repair work). | ||
| let _ = q.promote_pending_to_fetch(key, distance, sources); | ||
| } |
| if fetch_eligible && !sources.is_empty() { | ||
| let distance = | ||
| crate::client::xor_distance(&key, p2p_node.peer_id().as_bytes()); | ||
| q.remove_pending(&key); | ||
| q.enqueue_fetch(key, distance, sources); | ||
| // Not terminal — key moved to fetch queue. | ||
| // Atomic remove+enqueue: on fetch_queue capacity miss | ||
| // the pending entry is preserved so this verified key | ||
| // is retried on the next cycle (no silent drop). | ||
| let _ = q.promote_pending_to_fetch(key, distance, sources); | ||
| // Not terminal — either moved to fetch queue, or | ||
| // retained as pending until queue drains. |
…n drain Per round-3 review on PR WithAutonomi#99 (both Claude + codex flagged the same BLOCKER): The round-2 commit introduced `note_capacity_rejected` / `clear_capacity_rejected` for bootstrap drain accounting but only wired the first one. `clear_capacity_rejected` was defined as the retirement path in the doc comment yet had zero callers. The counter was monotonically increasing, so after the first `pending_verify` capacity rejection during bootstrap the drain check returned `false` forever — the node never transitioned out of `is_bootstrapping`. Fix: track outstanding rejections per `PeerId` (not as a single counter) so each source's contribution to "not yet drained" is retired independently. `note_capacity_rejected` adds the source to a `HashSet<PeerId>`; `clear_capacity_rejected(source)` removes it. The three production `admit_and_queue_hints` call sites now branch explicitly: a cycle with `capacity_rejected_count > 0` notes the source, while a clean cycle from the same source clears it. This matches the per-source granularity codex recommended (treating one source's clean cycle as evidence for THAT source's hints, not others'), so a node bootstrapping from multiple peers can't have one peer's clean cycle wrongly retire another peer's outstanding work. Two new tests in `bootstrap::tests` pin the behaviour: capacity_rejected_clears_on_clean_cycle — round-3 regression guard capacity_rejected_is_per_source — per-source granularity 481 lib tests pass; 211 replication tests; 6 D1 PoC; cfd clean.
dirvine
left a comment
There was a problem hiding this comment.
Review: PR #99 — D1 DoS fix
Architecture & Soundness
The two-layer defence is exactly right:
- Global memory backstop (MAX_PENDING_VERIFY / MAX_FETCH_QUEUE) prevents unbounded memory growth.
- Per-source fairness (MAX_PENDING_VERIFY_PER_PEER) prevents a single peer from starving honest peers — the key architectural insight that a global cap alone would convert OOM into worse honest-replication starvation.
The test poc_d1_flooding_peer_cannot_starve_honest_peer is critical: it verifies the actual D1 defence (per-source fairness), not just the memory bound. Well designed.
Implementation details
- AdmissionResult enum (Admitted / AlreadyPresent / CapacityRejected) is a clean improvement over the previous bool — distinguishes dedup from genuine capacity rejection, essential for bootstrap drain correctness.
- promote_pending_to_fetch (remove+enqueue under the write lock) prevents the silent-drop regression. The RwLock serialization means there is no race.
- set_pending_state replacing get_pending_mut is a good API narrowing — prevents accidental hint_sender re-attribution that would desync the per-source counter.
- release_sender_slot with saturating_sub + debug_assert is defensive and test-friendly.
- Bootstrap drain with capacity_rejected_sources per-peer is thorough — the round-3 regression test validates the clear mechanism explicitly.
Residual risk
The PR documentation notes the approx 16 Sybil residual. This is acceptable because:
- The global memory backstop still bounds worst-case memory to ~131K entries (~128 MB).
- 16 simultaneous Sybils in the routing table is a substantial operational bar.
- The degradation is to bounded memory, not to silent starvation.
- This is correctly tracked as a follow-up, not left unexamined.
Verdict: LGTM — well-structured, well-tested, merge-ready.
Summary
Security fix for D1:
ReplicationQueues::pending_verify(HashMap) andfetch_queue(BinaryHeap) had no capacity bound — the source carried the project's ownTODO.handle_neighbor_sync_requestdocuments "No per-request hint count limit" and the only gate issender_in_rt, so a single routing-table peer can floodNeighborSyncRequesthints (each message capped only byMAX_REPLICATION_MESSAGE_SIZE≈ 10 MiB → ~320k 32-byte hints) and grow these structures unboundedly:Fix (two layers)
Global memory backstop —
MAX_PENDING_VERIFY/MAX_FETCH_QUEUE(131_072 each).add_pending_verify/enqueue_fetchreject once full; callers already treat non-admission as "not added".Per-source fairness — the actual D1 defence. Each pending entry is accounted to its
hint_sendervia apending_per_sendercounter maintained in lockstep withpending_verify(insert /remove_pending/evict_stale). A single source may hold at mostMAX_PENDING_VERIFY_PER_PEER(8_192) entries, so a flooding peer exhausts only its own quota and can never starve honest peers, whose quotas are independent.hint_senderis the cryptographically authenticated connection identity (not a forgeable payload field), so the accounting cannot be evaded.A global cap alone would merely convert the memory DoS into a worse silent honest-replication starvation DoS (one ~4 MB message every <30 min permanently rejecting all honest hints). The per-source quota is what actually closes D1. This was caught by adversarial review and is the reason for layer 2.
Hardening: counter decrement uses
saturating_sub+debug_assert!;get_pending_mutdocuments the no-reattribution invariant. A residual (~16 simultaneously-in-routing-table SybilPeerIds to reach the global backstop) is documented as an accepted, tracked follow-up — it degrades only to the bounded memory backstop, never to silent starvation of non-Sybil peers.Proof
tests/poc_d1_bounded_queues.rs(6 tests) demonstrates the attack is defeated:poc_d1_flooding_peer_cannot_starve_honest_peer— the key test: a 508k-entry single-peer flood is clamped to 8 192 slots and honest hints are still admitted. Verified to FAIL against a global-cap-only fix (the weak fix), proving it validates the real defence, not just the memory bound.get_pending_mutno-desync, and legitimate-traffic / dedup non-regression.The original unbounded behaviour was also reproduced (pre-fix the queues grew 1:1 with attacker input); the flip was verified by neutering each enforcement layer and observing the corresponding test fail.
Validation
cfdclean (fmt + clippy, no warnings/errors).