feat: Kademlia DHT lookup for closest-peer chunk routing#8
feat: Kademlia DHT lookup for closest-peer chunk routing#8mickvandijke merged 3 commits intomainfrom
Conversation
Replace naive first-connected-peer selection with iterative Kademlia network queries (find_closest_nodes) so chunks are routed to the peer whose BLAKE3-derived DHT key is closest to the chunk address by XOR distance. Excludes the local node from results by comparing against its own BLAKE3 DHT key hex. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
pick_target_peer was comparing DHT results (transport-level peer IDs) against the app-level peer ID, which never matched. This caused the client to route chunks to itself and timeout after 30s. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Implements Kademlia-based closest-peer selection for chunk routing, replacing naive “first connected peer” targeting.
Changes:
- Adds iterative DHT lookup (
find_closest_nodes) for selecting a routing peer near a chunk address. - Introduces
xor_distanceandpeer_id_to_xor_namehelpers (with unit tests). - Updates public re-exports to expose the new helpers.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/lib.rs | Re-exports new client helper functions for external use. |
| src/client/quantum.rs | Switches peer selection to Kademlia closest-nodes lookup and excludes local peer. |
| src/client/mod.rs | Re-exports new data-type utilities from data_types. |
| src/client/data_types.rs | Adds XOR-distance + peer-id conversion helpers and unit tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let closest = closest_nodes | ||
| .into_iter() | ||
| .next() | ||
| .ok_or_else(|| Error::Network("No connected peers available".into())) | ||
| .find(|n| { | ||
| n.peer_id != *local_peer_id | ||
| && local_transport_id | ||
| .as_ref() | ||
| .map_or(true, |tid| n.peer_id != *tid) | ||
| }) | ||
| .ok_or_else(|| Error::Network("No remote peers found near target address".into()))?; |
There was a problem hiding this comment.
find(...) selects the first non-local entry from closest_nodes, but this assumes the returned list is already sorted by XOR distance (and in the same ordering you want). If find_closest_nodes returns an unsorted set (or a differently-sorted set), this can select a non-closest peer, contradicting the function docstring. Consider explicitly selecting the minimum-distance node from the filtered set (compute XOR distance to target and choose the smallest), or document/assert that the DHT API guarantees stable closest-first ordering.
| let local_peer_id = node.peer_id(); | ||
| let local_transport_id = node.transport_peer_id(); | ||
|
|
There was a problem hiding this comment.
The self-exclusion logic relies on n.peer_id being directly comparable to both node.peer_id() and node.transport_peer_id(). If find_closest_nodes returns records keyed by a different identifier (e.g., DHT node ID vs transport peer ID string formatting), this filter may fail to exclude the local node. Prefer comparing the same canonical identifier type/bytes on both sides (or use a dedicated ‘is local’/node-id equality if available from the DHT/node APIs).
| .find(|n| { | ||
| n.peer_id != *local_peer_id | ||
| && local_transport_id | ||
| .as_ref() | ||
| .map_or(true, |tid| n.peer_id != *tid) | ||
| }) |
There was a problem hiding this comment.
The self-exclusion logic relies on n.peer_id being directly comparable to both node.peer_id() and node.transport_peer_id(). If find_closest_nodes returns records keyed by a different identifier (e.g., DHT node ID vs transport peer ID string formatting), this filter may fail to exclude the local node. Prefer comparing the same canonical identifier type/bytes on both sides (or use a dedicated ‘is local’/node-id equality if available from the DHT/node APIs).
| /// Number of closest peers to consider for chunk routing. | ||
| const CLOSE_GROUP_SIZE: usize = 8; |
There was a problem hiding this comment.
CLOSE_GROUP_SIZE is a routing-critical constant but is hard-coded. Consider making it configurable via QuantumConfig (or a node/DHT config) so deployments can tune lookup breadth without code changes.
| /// Pick the closest peer to `target` using an iterative Kademlia network lookup. | ||
| /// | ||
| /// Queries the DHT for the `CLOSE_GROUP_SIZE` closest nodes to the target | ||
| /// address and returns the single closest remote peer (excluding ourselves). | ||
| async fn pick_target_peer(node: &P2PNode, target: &XorName) -> Result<String> { |
There was a problem hiding this comment.
The new routing behavior (closest-peer selection + local exclusion) is core logic but doesn’t appear to be covered by tests here. Adding a unit test with a mocked/stubbed DHT result set (including the local node and multiple remotes with known distances) would lock in the intended selection semantics and prevent regressions.
Greptile OverviewGreptile SummaryReplaced naive first-connected-peer selection with Kademlia DHT network queries for chunk routing. The implementation adds XOR distance calculation utilities and self-node filtering, but contains a critical bug in the peer selection logic. Key changes:
Critical issue: Confidence Score: 2/5
|
| Filename | Overview |
|---|---|
| src/client/quantum.rs | Replaced naive peer selection with Kademlia DHT lookup, but the closest-peer logic uses .find() instead of actual XOR distance sorting |
| src/client/data_types.rs | Added xor_distance and peer_id_to_xor_name helpers with comprehensive tests - implementation looks correct |
Sequence Diagram
sequenceDiagram
participant Client as QuantumClient
participant Node as P2PNode
participant DHT as Kademlia DHT
participant Peer as Target Peer
Client->>Client: compute_address(content)
Client->>Node: dht()
Node-->>Client: DHT handle
Client->>DHT: find_closest_nodes(target, 8)
DHT-->>Client: closest_nodes[]
Client->>Client: filter self from results
Client->>Client: select first non-self peer
Note over Client: BUG: Should sort by XOR distance,<br/>currently just uses first peer
Client->>Peer: ChunkPutRequest/ChunkGetRequest
Peer-->>Client: ChunkPutResponse/ChunkGetResponse
| let closest = closest_nodes | ||
| .into_iter() | ||
| .next() | ||
| .ok_or_else(|| Error::Network("No connected peers available".into())) | ||
| .find(|n| { | ||
| n.peer_id != *local_peer_id | ||
| && local_transport_id | ||
| .as_ref() | ||
| .map_or(true, |tid| n.peer_id != *tid) | ||
| }) | ||
| .ok_or_else(|| Error::Network("No remote peers found near target address".into()))?; |
There was a problem hiding this comment.
.find() returns the first non-self peer, not the closest by XOR distance
The find_closest_nodes DHT call returns nodes sorted by XOR distance, but using .find() just picks the first non-self peer instead of the actual closest. The defined xor_distance helper (line 25 in data_types.rs) is never used.
To select the true closest peer, you need to:
- Parse peer IDs to XorNames using
peer_id_to_xor_name - Compute XOR distance to target for each peer
- Sort by distance and pick the minimum
| let closest = closest_nodes | |
| .into_iter() | |
| .next() | |
| .ok_or_else(|| Error::Network("No connected peers available".into())) | |
| .find(|n| { | |
| n.peer_id != *local_peer_id | |
| && local_transport_id | |
| .as_ref() | |
| .map_or(true, |tid| n.peer_id != *tid) | |
| }) | |
| .ok_or_else(|| Error::Network("No remote peers found near target address".into()))?; | |
| let closest = closest_nodes | |
| .into_iter() | |
| .filter(|n| { | |
| n.peer_id != *local_peer_id | |
| && local_transport_id | |
| .as_ref() | |
| .map_or(true, |tid| n.peer_id != *tid) | |
| }) | |
| .filter_map(|n| { | |
| crate::client::peer_id_to_xor_name(&n.peer_id) | |
| .map(|xor| (n, crate::client::xor_distance(&xor, target))) | |
| }) | |
| .min_by(|(_, dist_a), (_, dist_b)| dist_a.cmp(dist_b)) | |
| .map(|(n, _)| n) | |
| .ok_or_else(|| Error::Network("No remote peers found near target address".into()))?; |
dirvine
left a comment
There was a problem hiding this comment.
Review Summary
Thanks @mickvandijke for this contribution! The approach of using Kademlia DHT lookup for closest-peer chunk routing is exactly right. However, there are a few issues that need to be addressed before merging:
🔴 Blocking: Compilation Error
The CI is failing because node.dht() returns Option<&Arc<DhtNetworkManager>>, not &Arc<DhtNetworkManager> directly. You need to handle the Option:
// Current (broken):
let closest_nodes = node
.dht()
.find_closest_nodes(target, CLOSE_GROUP_SIZE)
.await
.map_err(...)?;
// Fixed:
let dht = node.dht().ok_or_else(|| {
Error::Network("DHT not available".into())
})?;
let closest_nodes = dht
.find_closest_nodes(target, CLOSE_GROUP_SIZE)
.await
.map_err(|e| Error::Network(format!("Kademlia closest-nodes lookup failed: {e}")))?;🟢 Good News: XOR Sorting Concern is Invalid
Both Copilot and Greptile raised concerns about not explicitly sorting by XOR distance. I verified that find_closest_nodes in saorsa-core does return results already sorted by XOR distance (see src/dht/core_engine.rs:224: candidates.sort_by(|a, b| a.1.cmp(&b.1))). So your .find() approach is correct - the first non-self peer IS the closest.
📝 Suggestions (non-blocking)
- Consider adding a unit test with mocked DHT results to verify the peer selection logic
- Consider making
CLOSE_GROUP_SIZEconfigurable viaQuantumConfigfor deployment flexibility
Once the compilation error is fixed, I'll be happy to approve this. Great work! 🎉
dirvine
left a comment
There was a problem hiding this comment.
Update: Simpler Fix Available! 🎉
I just realized the root cause: the PR uses saorsa-core = "0.10.4" but v0.11.0 (just published) changed the dht() API from returning Option to returning &Arc<DhtNetworkManager> directly.
The Fix
Simply update Cargo.toml:
-saorsa-core = "0.10.4"
+saorsa-core = "0.11.0"This should make the code compile as-is without any changes to the pick_target_peer logic. The current implementation is correct for v0.11.0!
v0.11.0 was just published from PR #25 (feat/dht-transport-refactor) which cleaned up the DHT API.
cc @davidirvine
The DHT transport refactor in saorsa-core v0.11.0 changed the `dht()` method to return `&Arc<DhtNetworkManager>` directly instead of `Option`, which fixes the compilation error in the Kademlia chunk routing code. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dirvine
left a comment
There was a problem hiding this comment.
✅ Approved!
I've pushed a commit that bumps saorsa-core to v0.11.0, which fixes the compilation error. The new version has a non-optional dht() API that returns &Arc<DhtNetworkManager> directly.
Changes Made
- Updated
Cargo.toml:saorsa-core = "0.10.4"→"0.11.0"
Verification
- ✅
cargo checkpasses - ✅
cargo clippy -- -D warningspasses - ✅ All 138 unit tests pass
Summary
This is a great contribution @mickvandijke! 🎉
- Correct approach: Using Kademlia DHT lookup for closest-peer chunk routing
- Correct implementation:
find_closest_nodesin saorsa-core already returns results sorted by XOR distance, so.find()correctly selects the closest non-self peer - Good tests: The
xor_distanceandpeer_id_to_xor_namehelpers have solid unit test coverage
Thanks for contributing to saorsa-node! 🙌
|
Thanks so much for this contribution @mickvandijke! 🙌 The Kademlia-based closest-peer selection is exactly the right approach for chunk routing. Great work on:
Looking forward to more contributions from you! 🚀 |
Add unit and e2e tests covering the remaining Section 18 scenarios: Unit tests (32 new): - Quorum: #4 fail→abandoned, #16 timeout→inconclusive, #27 single-round dual-evidence, #28 dynamic threshold undersized, #33 batched per-key, #34 partial response unresolved, #42 quorum-derived paid-list auth - Admission: #5 unauthorized peer, #7 out-of-range rejected - Config: #18 invalid config rejected, #26 dynamic paid threshold - Scheduling: #8 dedup safety, #8 replica/paid collapse - Neighbor sync: #35 round-robin cooldown skip, #36 cycle completion, #38 snapshot stability mid-join, #39 unreachable removal + slot fill, #40 cooldown peer removed, #41 cycle termination guarantee, consecutive rounds, cycle preserves sync times - Pruning: #50 hysteresis prevents premature delete, #51 timestamp reset on heal, #52 paid/record timestamps independent, #23 entry removal - Audit: #19/#53 partial failure mixed responsibility, #54 all pass, #55 empty failure discard, #56 repair opportunity filter, response count validation, digest uses full record bytes - Types: #13 bootstrap drain, repair opportunity edge cases, terminal state variants - Bootstrap claims: #46 first-seen recorded, #49 cleared on normal E2e tests (4 new): - #2 fresh offer with empty PoP rejected - #5/#37 neighbor sync request returns response - #11 audit challenge multi-key (present + absent) - Fetch not-found for non-existent key Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
find_closest_nodes) so chunks are routed to the peer whose BLAKE3-derived DHT key is closest to the chunk address by XOR distanceThis is still a WIP — further testing and refinement needed before merging.
Test plan
cargo test)🤖 Generated with Claude Code