Optimize DHT Symbol Retrieval: Primary-Provider Waves, Streamed Writes, and Bounded Concurrency #227
🧩 Summary
This PR completely overhauls the symbol retrieval pipeline to make it production-ready, memory-efficient, and network-optimized.
It introduces a streaming retrieval mechanism (`BatchRetrieveStream`) that:

- Streams symbols directly to disk in the RaptorQ workspace — no in-memory accumulation.
- Performs local-first retrieval using `RetrieveBatchValues` in 5k-key batches.
- Introduces a deterministic primary-provider algorithm to ensure that each key is initially requested from exactly one node, maximizing unique symbol coverage (see the sketch after this list).
- Adds multi-wave fallback logic — if primaries fail or are slow, subsequent waves fetch missing keys from alternate top-K nodes.
- Implements strict per-node payload caps (`perNodeRequestCap = 600` ⇒ ~36 MB @ 60 KB/symbol).
- Enforces two-level concurrency limits for predictable performance (`fetchSymbolsBatchConcurrency = 8`, `storeSameSymbolsBatchConcurrency = 4`).
- Uses atomic early-stop and cancellation to halt network fetches the moment the 17 % symbol threshold is reached.
- Maintains global de-duplication via a concurrent `resSeen` map to avoid duplicate writes or wasted network calls.
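As a concrete illustration of the primary-provider idea, here is a minimal sketch, not the PR's actual code (which lives in `pkg/dht/retrieve_stream.go`). It assumes keys and node IDs are equal-length byte strings and assigns every key to its XOR-closest candidate, so wave one asks exactly one node per key while the key set spreads across the top-K candidates.

```go
// Hypothetical sketch of deterministic primary-provider assignment.
// Assumption: keys and node IDs are equal-length byte strings (e.g. 32-byte hashes).
package dht

type node struct {
	ID []byte
}

// xorCloser reports whether node ID a is XOR-closer to key than node ID b.
func xorCloser(key, a, b []byte) bool {
	for i := range key {
		da, db := key[i]^a[i], key[i]^b[i]
		if da != db {
			return da < db
		}
	}
	return false
}

// assignPrimaries maps every key to exactly one candidate (its XOR-closest node),
// so wave one requests each key from a single provider while the key set
// naturally spreads across the candidates.
func assignPrimaries(keys [][]byte, candidates []node) map[int][][]byte {
	out := make(map[int][][]byte, len(candidates))
	if len(candidates) == 0 {
		return out
	}
	for _, k := range keys {
		best := 0
		for i := 1; i < len(candidates); i++ {
			if xorCloser(k, candidates[i].ID, candidates[best].ID) {
				best = i
			}
		}
		out[best] = append(out[best], k)
	}
	return out
}
```

Because the assignment is deterministic, later waves only need the set of still-missing keys and can retry them against alternate top-K nodes, as described above.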
🧠 Design Goals
🧪 Testing Plan
Unit:

- Mock DHT with in-memory store → verify early exit when `foundLocalCount ≥ required`.
- Simulate 50 nodes with XOR spread → confirm a unique provider per key per wave (see the test sketch after this plan).
- Validate per-node request count ≤ 600.

Integration (testnet):

- Spin up a 50-validator cluster.
- Upload a 1 GB file → measure retrieval time, memory footprint, and bandwidth.
- Verify the file reconstruction hash matches the action's `dataHash`.
- Validate logs show the expected early-stop (`found_network ≥ needNetwork`).
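For the unit items above, a rough test sketch against the hypothetical `assignPrimaries` helper from the summary could look like the following; names and population sizes are illustrative, not the PR's actual test code.

```go
// Hypothetical unit-test sketch: every key must end up with exactly one
// primary provider in the wave-one assignment.
package dht

import (
	"crypto/rand"
	"testing"
)

func TestUniquePrimaryPerKey(t *testing.T) {
	keys := make([][]byte, 5000)
	for i := range keys {
		keys[i] = make([]byte, 32)
		rand.Read(keys[i])
	}
	candidates := make([]node, 50)
	for i := range candidates {
		candidates[i].ID = make([]byte, 32)
		rand.Read(candidates[i].ID)
	}

	assigned := assignPrimaries(keys, candidates)

	// Each key is assigned exactly once; per-node counts can also be checked
	// here against perNodeRequestCap once the cap logic is wired in.
	total := 0
	for _, ks := range assigned {
		total += len(ks)
	}
	if total != len(keys) {
		t.Fatalf("want %d assignments (one per key), got %d", len(keys), total)
	}
}
```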
🧾 Key Files Changed
| File | Change |
| --- | --- |
| `pkg/dht/retrieve_stream.go` | New `BatchRetrieveStream`, `processBatchStream`, and `iterateBatchGetValuesStream` with streaming and waves (a write-to-disk sketch follows this table). |
| `pkg/dht/local.go` | Added `fetchAndWriteLocalKeysBatched` for batched local streaming. |
| `pkg/dht/constants.go` | Added tuning constants and `perNodeRequestCap`. |
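For context on the streaming writes implemented in `retrieve_stream.go` and `local.go`, here is a small hypothetical sketch of writing a symbol straight into the RaptorQ workspace; the PR's real file layout and naming may differ.

```go
// Hypothetical sketch: persist one symbol directly to the workspace so the
// caller can drop the buffer immediately and memory usage stays flat.
package dht

import (
	"encoding/hex"
	"os"
	"path/filepath"
)

func writeSymbol(workspaceDir string, key, symbol []byte) error {
	name := hex.EncodeToString(key)
	tmp := filepath.Join(workspaceDir, name+".tmp")
	if err := os.WriteFile(tmp, symbol, 0o600); err != nil {
		return err
	}
	// Rename is atomic on POSIX filesystems, so readers never observe a partial symbol.
	return os.Rename(tmp, filepath.Join(workspaceDir, name))
}
```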
🧰 Backwards Compatibility
✅ 100 % backwards-compatible.
Old `BatchRetrieve` API remains untouched; `BatchRetrieveStream` is a non-breaking enhancement used by the cascade restore path.

🧠 Reviewer Notes
- Verify `doBatchGetValuesCall` observes `ctx` deadlines (it should); a cancellation sketch follows this list.
- Confirm `s.ht.closestContactsWithIncludingNode` returns up-to-date routing info (top-K list correctness).
- Look out for any log spam; consider lowering some debug levels to trace if necessary.
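On the first point, here is a small sketch of the pattern reviewers are being asked to confirm, with assumed names rather than the PR's actual types: every wave's RPCs share a cancellable context, and an atomic counter cancels it once the required symbol count is reached, so calls like `doBatchGetValuesCall` should return promptly on `ctx.Err()`.

```go
// Hypothetical sketch of atomic early-stop plus context cancellation.
package dht

import (
	"context"
	"sync/atomic"
)

type earlyStop struct {
	need   int64        // symbols required to hit the decode threshold (~17 %)
	found  atomic.Int64 // symbols written so far, across all waves
	cancel context.CancelFunc
}

func newEarlyStop(ctx context.Context, need int64) (*earlyStop, context.Context) {
	cctx, cancel := context.WithCancel(ctx)
	return &earlyStop{need: need, cancel: cancel}, cctx
}

// record adds freshly written symbols and cancels the shared context once the
// threshold is met; in-flight RPCs then fail fast instead of finishing their waves.
func (e *earlyStop) record(n int) {
	if e.found.Add(int64(n)) >= e.need {
		e.cancel()
	}
}
```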
✅ Checklist
Local streaming verified
Primary-provider waves tested
Concurrency limits validated (8×4 = 32 RPCs max)
No over-fetch or memory blow-up
File reconstruction verified via hash match
Integration test passed on 50-validator testnet
TL;DR:
This PR makes the supernode’s symbol retrieval fast, predictable, and production-grade — zero memory pressure, bounded concurrency, and smarter use of the network.