Fix incoming edges ghost memory leak (MOD-13761)#920
Conversation
Add three benchmarks to measure the performance and memory impact of the incoming edges `shrink_to_fit` fix:

1. DeleteZeroVectorsAsync - async deletion path (production default)
2. DeleteZeroVectorsInPlace - in-place deletion path (worst-case latency)
3. InsertZeroVectorsTimed - insertion path (heuristic pruning cost)

Stress scenario: 40K random + 50K zero vectors with the COSINE metric, which forces hub nodes with large incoming edge vectors. Each benchmark measures ghost memory (wasted capacity) before and after `shrink_to_fit`, with detailed stats (percentiles, top-10, mean).

Run with: `make benchmark BM_FILTER=bm-index-internals-incoming-edges`
🛡️ Jit Security Scan Results: ✅ No security findings were detected in this PR (security scan by Jit).
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@ Coverage Diff @@
##             main     #920      +/-   ##
==========================================
+ Coverage   96.98%   97.01%   +0.02%
==========================================
  Files         129      129
  Lines        7567     7572       +5
==========================================
+ Hits         7339     7346       +7
+ Misses        228      226       -2
```
Cursor Bugbot has reviewed your changes and found 3 potential issues.
There are 4 total unresolved issues (including 1 from previous review).
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
```cpp
// Sort for percentiles and top-10
std::vector<size_t> sorted_sizes(all_sizes);
std::vector<size_t> sorted_caps(all_caps);
std::sort(sorted_sizes.begin(), sorted_sizes.end());
std::sort(sorted_caps.begin(), sorted_caps.end());

// Percentile helper (nearest-rank method)
auto percentile = [](const std::vector<size_t> &sorted, double p) -> size_t {
    if (sorted.empty())
        return 0;
    size_t idx = static_cast<size_t>(p / 100.0 * sorted.size());
    if (idx >= sorted.size())
        idx = sorted.size() - 1;
    return sorted[idx];
};

size_t p50_size = percentile(sorted_sizes, 50);
size_t p90_size = percentile(sorted_sizes, 90);
size_t p99_size = percentile(sorted_sizes, 99);
size_t max_size = sorted_sizes.empty() ? 0 : sorted_sizes.back();

size_t p50_cap = percentile(sorted_caps, 50);
size_t p90_cap = percentile(sorted_caps, 90);
size_t p99_cap = percentile(sorted_caps, 99);
size_t max_cap = sorted_caps.empty() ? 0 : sorted_caps.back();

// --- Report index memory via benchmark counter ---
state.counters["index_memory"] = hnsw_->getAllocationSize();

// --- Print detailed distribution to stdout ---
std::cout << "\n=== Incoming Edges Stats"
          << (iteration >= 0 ? " (iter=" + std::to_string(iteration) + ")" : "")
          << " ===" << std::endl;
std::cout << "  Nodes: " << num_elements << "  Level entries: " << total_vectors
          << "  Non-empty: " << non_empty_count << std::endl;
std::cout << "  Wasted bytes: " << wasted_bytes << " (used=" << total_used_bytes
          << ", alloc=" << total_alloc_bytes << ")" << std::endl;
std::cout << "  Size - mean: " << mean_size << " p50: " << p50_size
          << " p90: " << p90_size << " p99: " << p99_size << " max: " << max_size
          << std::endl;
std::cout << "  Cap  - mean: " << mean_cap << " p50: " << p50_cap << " p90: " << p90_cap
          << " p99: " << p99_cap << " max: " << max_cap << std::endl;
```
Consider removing all of these stats and keeping only the most relevant stats to be output in state.counters
Example output:

```
--- Async iteration 0: After deletion (before shrink) ---
=== Incoming Edges Stats (iter=0) ===
  Nodes: 40000  Level entries: 42595  Non-empty: 14867
  Wasted bytes: 60188 (used=277196, alloc=337384)
  Size - mean: 1.62693 p50: 0 p90: 6 p99: 14 max: 43
  Cap  - mean: 1.98019 p50: 0 p90: 7 p99: 20 max: 50
  Top-10 by size: [43, 37, 34, 31, 31, 31, 31, 29, 29, 29]
  Top-10 by cap:  [50, 46, 46, 44, 44, 44, 43, 42, 42, 40]
--- Async iteration 0: After shrink (baseline) ---
=== Incoming Edges Stats (iter=0) ===
  Nodes: 40000  Level entries: 42595  Non-empty: 14867
  Wasted bytes: 0 (used=277196, alloc=277196)
  Size - mean: 1.62693 p50: 0 p90: 6 p99: 14 max: 43
  Cap  - mean: 1.62693 p50: 0 p90: 6 p99: 14 max: 43
  Top-10 by size: [43, 37, 34, 31, 31, 31, 31, 29, 29, 29]
  Top-10 by cap:  [43, 37, 34, 31, 31, 31, 31, 29, 29, 29]
```
This is a diagnostic benchmark for investigating memory issues, not a regression benchmark. When run manually, seeing the wasted bytes breakdown, node count, and top-10 by capacity directly in the console gives the full picture without needing a separate analysis script. state.counters only supports scalars and can't capture this.
I'll remove the size/cap distribution lines (mean, p50, p90, etc.) and add wasted_bytes to state.counters so the most important metric is also in the JSON output.
/backport

Git push to origin failed for 8.2 with exitcode 1

/backport

Git push to origin failed for 8.2 with exitcode 1
* Add incoming edges ghost memory benchmarks (MOD-13761)
* results before
* shrinking logic
* fix uncoditionally shrink
* add bm-index-internals-incoming-edges
* use 1 thread
* use ratio = 2, remove min
* remove results before
* rename to bm-hnsw-internals-incoming-edges

Problem
When vectors are deleted from an HNSW index, the
`incomingUnidirectionalEdges` vectors on neighboring nodes grow during insertion but never shrink after deletion. This causes "ghost memory" — allocated but unused capacity that accumulates over time. In stress scenarios (e.g. COSINE metric with hub nodes), incoming edge vectors can grow to thousands of entries and retain all that capacity after the edges are removed.
Fix

Add an amortized `shrink_to_fit()` inside `removeIncomingUnidirectionalEdgeIfExists()` in `graph_data.h`. The ratio threshold ensures amortized O(1) cost: `shrink_to_fit()` only fires when capacity exceeds 2× the actual size, avoiding reallocation churn on every removal.

Results
Measured with a stress scenario: 40K random + 50K zero vectors, COSINE metric, M=16, EF_C=200, 1 background thread.
Full benchmark details and raw results: Confluence page
Benchmarks added
This PR also adds a benchmark suite (`bm_incoming_edges`) to measure the fix: