Skip to content

perf: gather-free in-place subset scan for large M (FFI scalability) #11

@Fieldnote-Echo

Description

@Fieldnote-Echo

Summary

RankQuant::search_asymmetric_subset gathers the M candidate docs into a contiguous sub_packed buffer (m * bytes_per_vec) before the SIMD rerank. Negligible for the paper's operating regime (M ≈ 100–500 from a bitmap probe), but for large M (5000+, reachable via the planned Python FFI) the gather adds measurable overhead — candidate bytes move through memory ~3× (scattered gather-read + sub_packed write + contiguous scan-read).

Why it's low-fuss

The RankQuant asymmetric kernels (scan_b{2,4}_asym_avx512, scan_b{2,4}_asym_avx2, scan_via_lut_scalar) are per-doc / coordinate-parallel: each iteration slices one doc (&packed[di*bpv..]) and vectorizes across its coordinates. They don't need candidate docs to be adjacent — only renumbered into 0..m, which is the sole reason sub_packed exists.

Proposed change

Thread an optional candidate-index map through the kernels:

// inside `for di in 0..n`:
let base = match candidates { Some(c) => c[di] as usize, None => di } * bytes_per_vec;
let doc = &packed[base .. base + bytes_per_vec];   // SIMD body unchanged

search_asymmetric_subset then passes &self.packed + candidates instead of materializing sub_packed. Each doc read once, zero allocation. TopK still yields the local di, remapped via candidates[di] as today. The full-scan path passes None and is unchanged.

Caveat / scope

  • Win at large M (~⅓ the candidate-data traffic); roughly neutral at small M (the in-place pass is scattered vs the current contiguous scan). So this is a scalability fix, not a paper-path speedup.
  • Optional: sort candidates by id (or have the bitmap probe return them sorted) so the scattered scan becomes sequential-with-gaps (prefetch-friendly).

Acceptance

  • New exactness test: index-mapped path matches the current gather path bit-for-bit (template: rankquant_asymmetric_matches_reference_b{1,2,4}).
  • Best landed alongside the Python FFI (roadmap step 3), where large M becomes externally reachable.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions