
@mateeullahmalik mateeullahmalik commented Nov 6, 2025



🧩 Summary

This PR completely overhauls the symbol retrieval pipeline to make it production-ready, memory-efficient, and network-optimized.
It introduces a streaming retrieval mechanism (BatchRetrieveStream) that:

  • Streams symbols directly to disk in the RaptorQ workspace — no in-memory accumulation.

  • Performs local-first retrieval using RetrieveBatchValues in 5k-key batches.

  • Introduces a deterministic primary-provider algorithm to ensure that each key is initially requested from exactly one node, maximizing unique symbol coverage.

  • Adds multi-wave fallback logic — if primaries fail or are slow, subsequent waves fetch missing keys from alternate top-K nodes (see the sketch after this list).

  • Implements strict per-node payload caps (perNodeRequestCap = 600 ⇒ ~36 MB @ 60 KB/symbol).

  • Enforces two-level concurrency limits for predictable performance (fetchSymbolsBatchConcurrency = 8, storeSameSymbolsBatchConcurrency = 4).

  • Uses atomic early-stop and cancellation to halt network fetches the moment the 17% symbol threshold is reached.

  • Maintains global de-duplication via a concurrent resSeen map to avoid duplicate writes or wasted network calls.
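
To make the primary-provider idea concrete, here is a minimal sketch of how the deterministic assignment and wave rotation could look. It is not the actual code in pkg/dht/retrieve_stream.go: the Node type, the FNV hash, and the helper names primaryProviderIndex / providerForWave are assumptions; the PR only specifies that each key maps deterministically to one of its top-K nodes and that later waves rotate the choice as (base+wave)%K.

```go
// Illustrative only: deterministic primary-provider selection with
// multi-wave rotation. Types and helper names are hypothetical.
package dht

import "hash/fnv"

// Node is a stand-in for whatever contact type the routing table returns.
type Node struct {
	ID   []byte
	Addr string
}

// primaryProviderIndex maps a key deterministically onto one of the top-K
// closest nodes, so in wave 0 every key is requested from exactly one node.
func primaryProviderIndex(key []byte, k int) int {
	h := fnv.New32a()
	h.Write(key)
	return int(h.Sum32() % uint32(k))
}

// providerForWave rotates that assignment on later waves ((base+wave)%K),
// so a key whose primary failed or was slow is retried against a different
// member of its top-K list.
func providerForWave(key []byte, topK []Node, wave int) Node {
	base := primaryProviderIndex(key, len(topK))
	return topK[(base+wave)%len(topK)]
}
```

Within a single wave each key is requested from exactly one node; running K waves covers all K candidates for that key.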


🧠 Design Goals

Goal → Implementation

✅ Reduce memory footprint → Stream symbols directly to disk instead of holding them in maps
✅ Increase unique symbol coverage → Deterministic primary-provider assignment per key
✅ Prevent duplicate network fetches → Global resSeen dedup + per-wave single-provider requests
✅ Bound payload sizes → perNodeRequestCap = 600 (~36 MB per RPC)
✅ Control concurrency safely → Outer (8 batches) × inner (4 node RPCs) = 32 concurrent RPCs total (sketched below)
✅ Stop early once threshold met → needNetwork = required - foundLocalCount propagated to all levels
✅ Preserve redundancy → Multi-wave rotation of providers per key ((base+wave)%K)
✅ Remain topology-agnostic → Works cleanly with 50+ validators and an XOR-balanced keyspace
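
The concurrency and early-stop rows above fit together; a minimal, stdlib-only sketch of how the 8 × 4 bound and the atomic cutoff could be wired up is shown below. The constant names come from the PR, while the nodeRequest type, the fetchBatches signature, and the fetchFromNode callback are assumptions, not the actual retrieve_stream.go code.

```go
// Illustrative only: the 8 × 4 concurrency bound and the atomic early stop.
// Constant names match the PR description; everything else is a stand-in.
package dht

import (
	"context"
	"sync"
	"sync/atomic"
)

const (
	fetchSymbolsBatchConcurrency     = 8 // outer: key batches in flight
	storeSameSymbolsBatchConcurrency = 4 // inner: node RPCs per batch
)

// nodeRequest is one capped per-node payload (≤ perNodeRequestCap keys).
type nodeRequest struct {
	nodeID []byte
	keys   [][]byte
}

// fetchBatches runs at most 8 batches concurrently, each issuing at most 4
// node RPCs concurrently (≤ 32 RPCs total), and cancels everything as soon
// as the number of symbols fetched from the network reaches needNetwork.
func fetchBatches(ctx context.Context, batches [][]nodeRequest, needNetwork int64,
	fetchFromNode func(ctx context.Context, req nodeRequest) int64) {

	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	var found atomic.Int64
	outer := make(chan struct{}, fetchSymbolsBatchConcurrency)
	var wg sync.WaitGroup

	for _, batch := range batches {
		if ctx.Err() != nil {
			break // early stop already triggered
		}
		outer <- struct{}{}
		wg.Add(1)
		go func(batch []nodeRequest) {
			defer wg.Done()
			defer func() { <-outer }()

			inner := make(chan struct{}, storeSameSymbolsBatchConcurrency)
			var iwg sync.WaitGroup
			for _, req := range batch {
				if ctx.Err() != nil {
					break
				}
				inner <- struct{}{}
				iwg.Add(1)
				go func(req nodeRequest) {
					defer iwg.Done()
					defer func() { <-inner }()
					n := fetchFromNode(ctx, req)
					// The moment the network target is met, stop all
					// outstanding and future fetches.
					if found.Add(n) >= needNetwork {
						cancel()
					}
				}(req)
			}
			iwg.Wait()
		}(batch)
	}
	wg.Wait()
}
```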

🧪 Testing Plan

  • Unit:

    • Mock DHT with in-memory store → verify early exit when foundLocalCount ≥ required.

    • Simulate 50 nodes with XOR spread → confirm unique provider per key per wave (see the test sketch after this list).

    • Validate per-node request count ≤ 600.

  • Integration (testnet):

    • Spin up 50-validator cluster.

    • Upload 1 GB file → measure retrieval time, memory footprint, and bandwidth.

    • Verify file reconstruction hash matches action’s dataHash.

    • Validate logs show expected early-stop (found_network ≥ needNetwork).
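
As a concrete example of the second unit-test item above, a self-contained property test of the (base+wave)%K rotation might look like this. The test scaffolding and the way the base index is derived are hypothetical; only the rotation formula comes from the PR.

```go
// Illustrative only: across K waves, a key should be routed to each of its
// top-K providers exactly once.
package dht

import (
	"crypto/sha256"
	"fmt"
	"testing"
)

func TestProviderRotationCoversTopK(t *testing.T) {
	const k = 6 // top-K candidates per key
	for i := 0; i < 1000; i++ {
		key := sha256.Sum256([]byte(fmt.Sprintf("symbol-%d", i)))
		base := int(key[0]) % k // deterministic primary index (any stable hash works)

		seen := make(map[int]bool, k)
		for wave := 0; wave < k; wave++ {
			idx := (base + wave) % k
			if seen[idx] {
				t.Fatalf("key %d: provider %d selected twice before all %d were tried", i, idx, k)
			}
			seen[idx] = true
		}
		if len(seen) != k {
			t.Fatalf("key %d: only %d of %d providers covered", i, len(seen), k)
		}
	}
}
```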


🧾 Key Files Changed

  • pkg/dht/retrieve_stream.go

    • New BatchRetrieveStream, processBatchStream, and iterateBatchGetValuesStream with streaming and waves.

  • pkg/dht/local.go

    • Added fetchAndWriteLocalKeysBatched for batched local streaming.

  • pkg/dht/constants.go

    • Added tuning constants and perNodeRequestCap (sketched below).
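
For reference, a hedged sketch of the tuning knobs and the batched local-first pass: perNodeRequestCap and its value are named in the PR, while localBatchSize (derived from the "5k-key batches" note), the callback-based signature, and the body of fetchAndWriteLocalKeysBatched are illustrative assumptions only, not the real pkg/dht/local.go code.

```go
// Illustrative only: tuning constants plus the shape of the batched
// local-first pass. Real signatures in pkg/dht/local.go may differ.
package dht

const (
	perNodeRequestCap = 600  // ~36 MB per RPC at ~60 KB per symbol
	localBatchSize    = 5000 // "5k-key batches" for the local-first pass (name assumed)
)

// fetchAndWriteLocalKeysBatched walks the key list in localBatchSize chunks,
// asks the local store for each chunk (RetrieveBatchValues in the real code),
// and streams every hit straight to the RaptorQ workspace on disk instead of
// accumulating it in memory. It returns how many symbols were found locally.
func fetchAndWriteLocalKeysBatched(
	keys []string,
	retrieveBatch func(keys []string) (map[string][]byte, error),
	writeToDisk func(key string, value []byte) error,
) (int, error) {
	found := 0
	for start := 0; start < len(keys); start += localBatchSize {
		end := start + localBatchSize
		if end > len(keys) {
			end = len(keys)
		}
		values, err := retrieveBatch(keys[start:end])
		if err != nil {
			return found, err
		}
		for k, v := range values {
			if err := writeToDisk(k, v); err != nil {
				return found, err
			}
			found++
		}
	}
	return found, nil
}
```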


🧰 Backwards Compatibility

✅ 100% backwards compatible.
The old BatchRetrieve API remains untouched; BatchRetrieveStream is a non-breaking enhancement used by the cascade restore path.


🧠 Reviewer Notes

  • Verify doBatchGetValuesCall observes ctx deadlines (it should).

  • Confirm s.ht.closestContactsWithIncludingNode returns up-to-date routing info (top-K list correctness).

  • Look out for any log spam; consider lowering some debug levels to trace if necessary.


✅ Checklist

  • Local streaming verified

  • Primary-provider waves tested

  • Concurrency limits validated (8×4 = 32 RPCs max)

  • No over-fetch or memory blow-up

  • File reconstruction verified via hash match

  • Integration test passed on 50-validator testnet


TL;DR:
This PR makes the supernode’s symbol retrieval fast, predictable, and production-grade — bounded memory, bounded concurrency, and smarter use of the network.


@mateeullahmalik mateeullahmalik changed the title optimize symbol fetch Optimize DHT Symbol Retrieval: Primary-Provider Waves, Streamed Writes, and Bounded Concurrency Nov 6, 2025
@mateeullahmalik mateeullahmalik self-assigned this Nov 6, 2025
@mateeullahmalik mateeullahmalik merged commit 8aa95cc into master Nov 7, 2025
7 checks passed