
Fix incoming edges ghost memory leak (MOD-13761)#920

Merged
meiravgri merged 9 commits into main from meiravgri/mod-13761-incoming-edges-benchmarks
Mar 25, 2026

Conversation

@meiravgri
Collaborator

@meiravgri meiravgri commented Mar 23, 2026

Problem

When vectors are deleted from an HNSW index, the incomingUnidirectionalEdges vectors on neighboring nodes grow during insertion but never shrink after deletion. This causes "ghost memory" — allocated but unused capacity that accumulates over time.

In stress scenarios (e.g. COSINE metric with hub nodes), incoming edge vectors can grow to thousands of entries and retain all that capacity after the edges are removed.

Fix

Add amortized shrink_to_fit() inside removeIncomingUnidirectionalEdgeIfExists() in graph_data.h:

constexpr size_t INCOMING_EDGES_SHRINK_RATIO = 2;

bool removeIncomingUnidirectionalEdgeIfExists(idType node_id) {
    // Try to remove node_id from this node's incoming-edges vector.
    bool result = this->incomingUnidirectionalEdges->remove(node_id);

    if (result) {
        auto &vec = *this->incomingUnidirectionalEdges;
        // Release ghost capacity, but only when it exceeds twice the live
        // size, so reallocations are geometrically spaced (amortized O(1)).
        if (vec.capacity() > INCOMING_EDGES_SHRINK_RATIO * vec.size()) {
            vec.shrink_to_fit();
        }
    }

    return result;
}

The ratio threshold ensures amortized O(1) cost — shrink_to_fit() only fires when capacity exceeds 2× the actual size, avoiding reallocation churn on every removal.

Results

Measured with a stress scenario: 40K random + 50K zero vectors, COSINE metric, M=16, EF_C=200, 1 background thread.

Metric                      | Before Fix | After Fix (Ratio=2)
----------------------------|------------|-----------------------
Ghost memory after deletion | 5,339 KB   | 116 KB (98% reduction)
Async delete latency        | 3,031 ms   | 3,034 ms (+0.1%)
InPlace delete latency      | 5,462 ms   | 5,575 ms (+2.1%)
Insert latency              | 95 ms      | 95 ms (0%)

Full benchmark details and raw results: Confluence page

Benchmarks added

This PR also adds a benchmark suite (bm_incoming_edges) to measure the fix:

  • DeleteZeroVectorsAsync — async deletion path (production default)
  • DeleteZeroVectorsInPlace — synchronous deletion (worst-case latency)
  • InsertZeroVectorsTimed — insertion path (heuristic pruning triggers shrink too)

Run with: make benchmark BM_FILTER=bm-hnsw-internals-incoming-edges

Add three benchmarks to measure performance and memory impact of the
incoming edges shrink_to_fit fix:

1. DeleteZeroVectorsAsync - async deletion path (production default)
2. DeleteZeroVectorsInPlace - in-place deletion path (worst-case latency)
3. InsertZeroVectorsTimed - insertion path (heuristic pruning cost)

Stress scenario: 40K random + 50K zero vectors with COSINE metric,
which forces hub nodes with large incoming edge vectors. Each benchmark
measures ghost memory (wasted capacity) before and after shrink_to_fit,
with detailed stats (percentiles, top-10, mean).

Run with: make benchmark BM_FILTER=bm-index-internals-incoming-edges
@jit-ci
jit-ci bot commented Mar 23, 2026

🛡️ Jit Security Scan Results

✅ No security findings were detected in this PR

@codecov

codecov bot commented Mar 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.01%. Comparing base (3dd2dc2) to head (874e2fa).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #920      +/-   ##
==========================================
+ Coverage   96.98%   97.01%   +0.02%     
==========================================
  Files         129      129              
  Lines        7567     7572       +5     
==========================================
+ Hits         7339     7346       +7     
+ Misses        228      226       -2     


@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.

There are 4 total unresolved issues (including 1 from previous review).

@meiravgri meiravgri changed the title from "Add incoming edges ghost memory benchmarks (MOD-13761)" to "Fix incoming edges ghost memory leak (MOD-13761)" Mar 24, 2026
@meiravgri meiravgri requested a review from alonre24 March 24, 2026 11:06
Collaborator

@alonre24 alonre24 left a comment
Very Nice!!

Comment on lines +162 to +203
// Sort for percentiles and top-10
std::vector<size_t> sorted_sizes(all_sizes);
std::vector<size_t> sorted_caps(all_caps);
std::sort(sorted_sizes.begin(), sorted_sizes.end());
std::sort(sorted_caps.begin(), sorted_caps.end());

// Percentile helper (nearest-rank method)
auto percentile = [](const std::vector<size_t> &sorted, double p) -> size_t {
    if (sorted.empty())
        return 0;
    size_t idx = static_cast<size_t>(p / 100.0 * sorted.size());
    if (idx >= sorted.size())
        idx = sorted.size() - 1;
    return sorted[idx];
};

size_t p50_size = percentile(sorted_sizes, 50);
size_t p90_size = percentile(sorted_sizes, 90);
size_t p99_size = percentile(sorted_sizes, 99);
size_t max_size = sorted_sizes.empty() ? 0 : sorted_sizes.back();

size_t p50_cap = percentile(sorted_caps, 50);
size_t p90_cap = percentile(sorted_caps, 90);
size_t p99_cap = percentile(sorted_caps, 99);
size_t max_cap = sorted_caps.empty() ? 0 : sorted_caps.back();

// --- Report index memory via benchmark counter ---
state.counters["index_memory"] = hnsw_->getAllocationSize();

// --- Print detailed distribution to stdout ---
std::cout << "\n=== Incoming Edges Stats"
          << (iteration >= 0 ? " (iter=" + std::to_string(iteration) + ")" : "")
          << " ===" << std::endl;
std::cout << "  Nodes: " << num_elements << "  Level entries: " << total_vectors
          << "  Non-empty: " << non_empty_count << std::endl;
std::cout << "  Wasted bytes: " << wasted_bytes << "  (used=" << total_used_bytes
          << ", alloc=" << total_alloc_bytes << ")" << std::endl;
std::cout << "  Size  - mean: " << mean_size << "  p50: " << p50_size
          << "  p90: " << p90_size << "  p99: " << p99_size << "  max: " << max_size
          << std::endl;
std::cout << "  Cap   - mean: " << mean_cap << "  p50: " << p50_cap << "  p90: " << p90_cap
          << "  p99: " << p99_cap << "  max: " << max_cap << std::endl;
Collaborator

Consider removing all of these stats and keeping only the most relevant stats to be output in state.counters

Collaborator Author

example output:

--- Async iteration 0: After deletion (before shrink) ---
=== Incoming Edges Stats (iter=0) ===
  Nodes: 40000  Level entries: 42595  Non-empty: 14867
  Wasted bytes: 60188  (used=277196, alloc=337384)
  Size  - mean: 1.62693  p50: 0  p90: 6  p99: 14  max: 43
  Cap   - mean: 1.98019  p50: 0  p90: 7  p99: 20  max: 50
  Top-10 by size: [43, 37, 34, 31, 31, 31, 31, 29, 29, 29]
  Top-10 by cap:  [50, 46, 46, 44, 44, 44, 43, 42, 42, 40]

--- Async iteration 0: After shrink (baseline) ---
=== Incoming Edges Stats (iter=0) ===
  Nodes: 40000  Level entries: 42595  Non-empty: 14867
  Wasted bytes: 0  (used=277196, alloc=277196)
  Size  - mean: 1.62693  p50: 0  p90: 6  p99: 14  max: 43
  Cap   - mean: 1.62693  p50: 0  p90: 6  p99: 14  max: 43
  Top-10 by size: [43, 37, 34, 31, 31, 31, 31, 29, 29, 29]
  Top-10 by cap:  [43, 37, 34, 31, 31, 31, 31, 29, 29, 29]

This is a diagnostic benchmark for investigating memory issues, not a regression benchmark. When run manually, seeing the wasted bytes breakdown, node count, and top-10 by capacity directly in the console gives the full picture without needing a separate analysis script. state.counters only supports scalars and can't capture this.

I'll remove the size/cap distribution lines (mean, p50, p90, etc.) and add wasted_bytes to state.counters so the most important metric is also in the JSON output.

@meiravgri meiravgri requested a review from alonre24 March 25, 2026 06:26
@meiravgri meiravgri added this pull request to the merge queue Mar 25, 2026
Merged via the queue into main with commit 2f21358 Mar 25, 2026
30 checks passed
@meiravgri meiravgri deleted the meiravgri/mod-13761-incoming-edges-benchmarks branch March 25, 2026 11:39
@meiravgri
Collaborator Author

/backport

@github-actions

Git push to origin failed for 8.2 with exitcode 1

@meiravgri
Collaborator Author

/backport

@github-actions

Git push to origin failed for 8.2 with exitcode 1

meiravgri added a commit that referenced this pull request Mar 25, 2026
* Add incoming edges ghost memory benchmarks (MOD-13761)

* results before

* shrinking logic

* fix uncoditionally shrink

* add bm-index-internals-incoming-edges

* use 1 thread

* use ratio = 2, remove min

* remove results before

* rename to bm-hnsw-internals-incoming-edges

better output
github-merge-queue bot pushed a commit that referenced this pull request Mar 25, 2026
Fix incoming edges ghost memory leak (MOD-13761) (#920)
