chore: pre-release hardening — strict finite inputs, doc/assert fixes (council review) #12
…s (council review)

Addresses the pre-upstream review: MF1-4, the perf-comment overclaim, and the test-checklist gaps.

- MF3 (strict non-finite policy): public add/search/probe entry points reject NaN/+-Inf via a shared `util::assert_all_finite` guard -- Rank, RankQuant, Bitmap, SignBitmap, plus the doc-hidden Fastscan and experimental MultiBucketBitmap. Strict rejection makes the OrderedFloat-vs-partial_cmp ordering split unreachable (non-finite never reaches a sort), so the comparator is left as-is. New tests/index/finite.rs asserts the contract per type.
- MF1: relocate the `search_asymmetric_subset` doc block off `write` (generated docs were describing `write` as subset scoring) and add the gather / large-M operating-regime note (see #11).
- MF2: `search_asymmetric_subset` early-returns on `k_eff == 0`, matching the other search paths.
- MF4: `rankquant_bytes_per_vec` asserts `d % codes_per_byte == 0` (was silently flooring for invalid standalone calls), mirroring pack_buckets / RankQuant::new.
- Perf comment: the batched-bitmap scores buffer is L3-resident (~6.6 MB), not per-core L2; per-query slices are ~828 KiB. Corrected overclaim, no code change.
- Test: rankquant_asymmetric_correct_on_simd_invalid_dims exercises 48/80/20/36 (constructor-valid, SIMD-lane-invalid) so select_simd_tier's fallback is regression-guarded. (#7/#8 checklist items were already covered.)
- fuzz/Cargo.lock: sync ordvec 0.1.0 -> 0.2.0 (stale since the #7 version bump).

No behaviour change for finite inputs.

Gate: fmt + clippy -D warnings; tests 85 (default), experimental + --no-default-features green; MSRV 1.89; docs 0 warnings; fuzz build.
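The MF3 guard described in this commit message can be pictured with a minimal sketch. The real `util::assert_all_finite` signature is not shown anywhere in this PR text, so the parameter list below is an assumption, and `ToyIndex` is a hypothetical stand-in for the real index types.

```rust
/// Hypothetical stand-in for the shared guard; the real
/// `util::assert_all_finite` signature is an assumption here.
#[track_caller]
pub fn assert_all_finite(values: &[f32], what: &str) {
    // `f32::is_finite` is false for NaN, +Inf and -Inf.
    if let Some(pos) = values.iter().position(|v| !v.is_finite()) {
        panic!("{what}: non-finite value {} at index {pos}", values[pos]);
    }
}

/// Toy index type (illustrative only) with the guard at a public boundary.
pub struct ToyIndex {
    dim: usize,
    data: Vec<f32>,
}

impl ToyIndex {
    pub fn new(dim: usize) -> Self {
        assert!(dim > 0, "dim must be positive");
        Self { dim, data: Vec::new() }
    }

    /// Rejects non-finite input before any state is mutated.
    pub fn add(&mut self, vector: &[f32]) {
        assert_eq!(vector.len(), self.dim, "dimension mismatch");
        assert_all_finite(vector, "add");
        self.data.extend_from_slice(vector);
    }

    /// Number of stored vectors.
    pub fn len(&self) -> usize {
        self.data.len() / self.dim
    }
}
```

Calling `add` with a vector containing `f32::NAN` panics before the vector is stored, which is the behaviour a `#[should_panic]` contract test would pin down.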
Review Summary by Qodo
Pre-release hardening: strict finite inputs, doc/assert fixes
Walkthroughs
Description
• Add strict finite-input validation across all public entry points
  - Reject NaN/±Inf fail-fast via shared assert_all_finite guard
  - Covers Rank, RankQuant, Bitmap, SignBitmap, RankQuantFastscan, MultiBucketBitmap
• Fix misplaced search_asymmetric_subset documentation block
  - Moved from write method to correct method location
  - Added gather/large-M operating-regime notes
• Add early-return for k_eff == 0 in search_asymmetric_subset
• Add divisibility assertion in rankquant_bytes_per_vec
• Correct L3 cache residency comment in batched-bitmap scores buffer
• Add regression test for SIMD dispatch fallback paths
• Update fuzz dependency: ordvec 0.1.0 → 0.2.0
Diagram
flowchart LR
A["Public API Entry Points<br/>add/search/probe"] -->|"assert_all_finite"| B["Finite Input Validation"]
B -->|"Pass"| C["Rank Transform &<br/>Scoring"]
B -->|"Fail"| D["Panic: Non-finite Values"]
C --> E["Deterministic Results"]
F["rankquant_bytes_per_vec"] -->|"assert divisibility"| G["Valid Dimension Check"]
H["search_asymmetric_subset"] -->|"early return"| I["k_eff == 0 Handling"]
File Changes
1. src/bitmap.rs
Code Review
This pull request introduces comprehensive input validation to ensure all vector and query data are finite, preventing non-deterministic behavior caused by NaN or infinite values. Key changes include the addition of an assert_all_finite utility used across Rank, RankQuant, and Bitmap implementations, a new dimension alignment check in rankquant_bytes_per_vec, and updated documentation regarding cache residency. New tests were added to verify the finite-input enforcement and SIMD dispatch logic. A review comment correctly identified misleading documentation for search_asymmetric_subset, which incorrectly described the return values and ID mapping process.
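The dimension alignment check mentioned in this summary can be sketched as follows. This is a sketch under assumptions: the real `rankquant_bytes_per_vec` signature does not appear in this PR text, so the function name `bytes_per_vec_sketch` and the `bits_per_code` parameter are hypothetical.

```rust
/// Illustrative sketch only: the real `rankquant_bytes_per_vec` signature
/// is not shown in this PR, so `bits_per_code` is an assumed parameter.
pub fn bytes_per_vec_sketch(d: usize, bits_per_code: usize) -> usize {
    assert!(
        matches!(bits_per_code, 1 | 2 | 4 | 8),
        "bits_per_code must be 1, 2, 4 or 8"
    );
    let codes_per_byte = 8 / bits_per_code;
    // Without this check, d = 10 at 2 bits per code would floor to
    // 10 / 4 = 2 bytes and silently drop the last two codes.
    assert!(
        d % codes_per_byte == 0,
        "d = {d} is not a multiple of codes_per_byte = {codes_per_byte}"
    );
    d / codes_per_byte
}
```

With this shape, `bytes_per_vec_sketch(48, 2)` yields 12, while `bytes_per_vec_sketch(10, 2)` panics instead of returning a floored size.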
…stop-gate)

The MF3 hardening missed four directly-callable public embedding entry points that bypass the type add/search boundaries:

- rank::rank_transform / rank_transform_into (the public rank primitives)
- search_asymmetric_byte_lut (re-exported at the crate root)
- Bitmap::build_query_bitmap_fp32

Add assert_all_finite to each. Pure additions, defense-in-depth: every public embedding entry now self-validates; internal delegation may double-check, which is negligible.

MultiBucketBitmap's `w` weight matrix is intentionally left out -- it is a trusted weight param (debug_assert on length, not a release check), not an embedding.

tests/index/finite.rs covers the three directly-testable paths.

Gate: fmt + clippy -D warnings; tests 88 (default) / experimental green; docs 0 warnings.
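To illustrate what a self-validating rank primitive might look like, here is a small sketch. It is not the crate's `rank::rank_transform`; the name `rank_transform_sketch`, the `u32` rank type, and the index tie-break are all assumptions made for the example.

```rust
/// Hypothetical rank primitive: maps each value to its position in
/// ascending order. Not the crate's `rank::rank_transform`.
pub fn rank_transform_sketch(values: &[f32]) -> Vec<u32> {
    // Self-validate at the public boundary (same policy as add/search).
    assert!(
        values.iter().all(|v| v.is_finite()),
        "rank_transform: non-finite input"
    );
    let mut order: Vec<usize> = (0..values.len()).collect();
    // With only finite values, `partial_cmp` never returns `None`, so the
    // `unwrap_or` fallback is unreachable; ties break by index.
    order.sort_by(|&a, &b| {
        values[a]
            .partial_cmp(&values[b])
            .unwrap_or(std::cmp::Ordering::Equal)
            .then(a.cmp(&b))
    });
    let mut ranks = vec![0u32; values.len()];
    for (rank, &idx) in order.iter().enumerate() {
        ranks[idx] = rank as u32;
    }
    ranks
}
```

Because the guard runs before the sort, the comparator never sees a NaN, which is the reason given elsewhere in this PR for leaving the existing comparator unchanged.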
…review)

- search_asymmetric_subset doc: it returns (scores, global doc IDs) -- local candidate positions are mapped back internally. Corrects the relocated doc block, which wrongly described candidate positions + caller-side mapping (gemini).
- RankQuantFastscan: add #[doc(hidden)] on the struct definition itself, not only the crate-root re-export (qodo).
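The corrected contract (scores plus global doc IDs, with local positions mapped back internally) together with the `k_eff == 0` early return can be sketched as below. This is an illustration, not the crate's `search_asymmetric_subset`: the function name, the row-per-document layout, and the dot-product scorer are placeholders, and inputs are assumed to have passed the finite check already.

```rust
/// Illustrative subset scoring, not the crate's `search_asymmetric_subset`.
/// `docs` holds one row per document; `candidates` are global doc IDs.
pub fn search_subset_sketch(
    docs: &[Vec<f32>],
    query: &[f32],
    candidates: &[u32],
    k: usize,
) -> (Vec<f32>, Vec<u32>) {
    let k_eff = k.min(candidates.len());
    if k_eff == 0 {
        // Nothing to rank: return empty results before doing any work.
        return (Vec::new(), Vec::new());
    }
    // Score by local candidate position (dot product as a placeholder).
    let mut scored: Vec<(f32, usize)> = candidates
        .iter()
        .enumerate()
        .map(|(local, &id)| {
            let doc = &docs[id as usize];
            let s: f32 = doc.iter().zip(query).map(|(a, b)| a * b).sum();
            (s, local)
        })
        .collect();
    // Highest score first; ties break by local position.
    scored.sort_by(|a, b| b.0.total_cmp(&a.0).then(a.1.cmp(&b.1)));
    scored.truncate(k_eff);
    // Map local positions back to global IDs before returning.
    let scores: Vec<f32> = scored.iter().map(|&(s, _)| s).collect();
    let ids: Vec<u32> = scored.iter().map(|&(_, local)| candidates[local]).collect();
    (scores, ids)
}
```

The caller never sees local positions: with candidates `[3, 1, 2]`, the returned IDs are drawn from that same set of global IDs.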
Pre-release hardening (council review)
Addresses the pre-upstream review — must-fixes MF1–4, the perf-comment overclaim, and the test-checklist gaps. No behaviour change for finite inputs.
MF3: strict non-finite input policy
Public `add` / `search` / `probe` entry points now reject `NaN` / `±Inf` fail-fast via a shared `util::assert_all_finite` guard, across `Rank`, `RankQuant`, `Bitmap`, `SignBitmap` (plus the doc-hidden `RankQuantFastscan` and experimental `MultiBucketBitmap`, for a uniform surface). Matches the crate's existing assert-on-contract-violation style; the FFI pre-validates separately.
Note: strict rejection makes the `OrderedFloat`-vs-`partial_cmp().unwrap_or(Equal)` ordering split the council flagged unreachable (non-finite never reaches a sort), so the comparator is intentionally left unchanged, with no churn.
`tests/index/finite.rs` asserts the contract per type.
MF1: misplaced doc block
The `search_asymmetric_subset` doc block was attached above `write`, so generated docs described `write` as subset scoring. Moved it onto `search_asymmetric_subset` (leaving only the `.tvrq` line on `write`) and folded in the gather / large-M operating-regime note (tracked: #11).
MF2: `k_eff == 0` early return
`search_asymmetric_subset` now early-returns on `k_eff == 0`, matching `search` / `search_asymmetric` / the standalone path.
MF4: divisibility assert
`rankquant_bytes_per_vec` asserts `d % codes_per_byte == 0` (previously silently floored for invalid standalone calls), mirroring `pack_buckets` and `RankQuant::new`.
Perf comment (fiction-free hygiene)
The batched-bitmap scores buffer (~6.6 MB at B=8/N=207k) is L3-resident, not per-core L2; per-query slices are ~828 KiB. Corrected the overclaim — comment only.
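The buffer sizes quoted above can be reproduced with a short calculation. The 4-byte score width and the exact value N = 207,000 are assumptions (the PR only says "N=207k" and does not state the score type), and `scores_buffer_bytes` is a hypothetical helper.

```rust
/// Bytes in a batched scores buffer holding one score per
/// (query, document) pair. The 4-byte score width used below is an
/// assumption, not something this PR states.
pub const fn scores_buffer_bytes(
    batch: usize,
    n_docs: usize,
    bytes_per_score: usize,
) -> usize {
    batch * n_docs * bytes_per_score
}

/// Whole batch at B = 8, N = 207_000: 8 * 207_000 * 4 bytes.
pub const FULL_BATCH_BYTES: usize = scores_buffer_bytes(8, 207_000, 4);

/// One query's slice of the same buffer: 207_000 * 4 bytes.
pub const PER_QUERY_BYTES: usize = scores_buffer_bytes(1, 207_000, 4);
```

Under these assumptions the full buffer is 6,624,000 bytes and each per-query slice is 828,000 bytes, which lines up with the quoted figures when they are read in decimal units.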
Tests
- `rankquant_asymmetric_correct_on_simd_invalid_dims`: runs 48/80/20/36 (constructor-valid but SIMD-lane-invalid) through `search_asymmetric` and checks against the scalar reference, regression-guarding the `select_simd_tier` fallback. (Checklist items #7 constant-composition and #8 tie-determinism were already covered; #7 is "refactor: OrdVec ontology rebrand" and #8 is "perf: optimize symmetric rank-cosine search (centre-drop identity)".)
- `tests/index/finite.rs`: `#[should_panic]` contract tests for the strict finite policy, one per substrate type.
Also
- `fuzz/Cargo.lock`: synced `ordvec` `0.1.0 → 0.2.0` (stale since the #7 version bump; surfaced by the fuzz-build gate).
Verification
- `cargo fmt --all --check`, `cargo clippy --all-targets --all-features -- -D warnings`: clean
- `cargo test`: 85 (default); `--features experimental` + `--no-default-features`: green
- `cargo +1.89.0 build` (MSRV); `cargo build --locked`; `cargo doc --no-deps --all-features`: 0 warnings; `cargo +nightly fuzz build`