Skip to content

SQ8-quantized sealed segments bypass HNSW graph navigation (O(N) brute-force per query) #40

@hollanf

Description

@hollanf

Once a vector segment is sealed and SQ8 quantization is built (automatic at ≥ 1000 live vectors), every subsequent search iterates every vector in the segment. HNSW's logarithmic advantage is discarded. Per-query latency scales linearly with segment size.

Summary

VectorCollection::search has two branches per sealed segment. When seg.sq8 is None, it calls seg.index.search(query, top_k, ef) which uses the HNSW graph correctly. When seg.sq8 is Some(_) — which is the common case after a segment crosses DEFAULT_SEAL_THRESHOLD and live_count() >= 1000 — it stops using the graph entirely and performs a full linear scan of the SQ8 bytes, then reranks.

Current code

nodedb-vector/src/collection/search.rs:22-48

let results = if let Some((codec, sq8_data)) = &seg.sq8 {
    // Quantized two-phase search.
    let rerank_k = top_k.saturating_mul(3).max(20);
    let mut candidates: Vec<(u32, f32)> = Vec::with_capacity(seg.index.len());
    let dim = seg.index.dim();
    for i in 0..seg.index.len() {                                    // ← iterates EVERY vector
        if seg.index.is_deleted(i as u32) {
            continue;
        }
        let sq8_vec = &sq8_data[i * dim..(i + 1) * dim];
        let d = match self.params.metric {
            DistanceMetric::L2 => codec.asymmetric_l2(query, sq8_vec),
            DistanceMetric::Cosine => codec.asymmetric_cosine(query, sq8_vec),
            DistanceMetric::InnerProduct => codec.asymmetric_ip(query, sq8_vec),
            _ => {
                let dequant = codec.dequantize(sq8_vec);
                distance(query, &dequant, self.params.metric)
            }
        };
        candidates.push((i as u32, d));
    }
    if candidates.len() > rerank_k {
        candidates.select_nth_unstable_by(rerank_k, |a, b| { ... });
        candidates.truncate(rerank_k);
    }
    // … phase-2 FP32 rerank on candidates …
} else {
    seg.index.search(query, top_k, ef)   // ← only HNSW path
};

Vec::with_capacity(seg.index.len()) and for i in 0..seg.index.len() confirm the intent: full linear scan. There is no graph traversal in this branch.

SQ8 auto-build trigger: nodedb-vector/src/collection/lifecycle.rs:231 — any sealed segment with live_count() >= 1000 gets SQ8 built automatically.

Why it's broken

  • HNSW's purpose is O(log N) ANN search. This branch is O(N) per sealed segment per query.
  • At 1 M vectors × 768 dim, each query reads ~768 MB of SQ8 bytes sequentially and computes 1 M distance calls before reranking. Per-query latency goes from millisecond to seconds.
  • With multiple sealed segments, the cost multiplies.
  • The two-phase design is correct in principle (approximate → rerank), but the candidate-generation step must use the HNSW graph (with SQ8 as the in-graph distance function), not a linear scan.
  • This also wastes all the memory spent maintaining the HNSW graph for sealed segments — the graph is built, stored, and then never consulted once SQ8 is on.

Reproduction

  1. Build a collection; insert ≥ ~100 k vectors so a segment seals and SQ8 is constructed (see lifecycle.rs:231).
  2. Run a single nearest-neighbour query.
  3. Profile: a flamegraph shows ~100 % of time in the for i in 0..seg.index.len() loop at collection/search.rs:27.
  4. Compare against the same collection with SQ8 disabled (or small enough not to trigger seal): HNSW path runs in milliseconds.

Notes

  • Found during a CPU/memory audit sweep of nodedb-vector/src/*.
  • Easy to verify with a criterion benchmark comparing sq8=None vs sq8=Some paths on the same dataset.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions