Skip to content

chore: pre-release hardening — strict finite inputs, doc/assert fixes (council review)#12

Merged
Fieldnote-Echo merged 3 commits into
mainfrom
chore/pre-release-hardening
May 23, 2026
Merged

chore: pre-release hardening — strict finite inputs, doc/assert fixes (council review)#12
Fieldnote-Echo merged 3 commits into
mainfrom
chore/pre-release-hardening

Conversation

@Fieldnote-Echo
Copy link
Copy Markdown
Owner

Pre-release hardening (council review)

Addresses the pre-upstream review — must-fixes MF1–4, the perf-comment overclaim, and the test-checklist gaps. No behaviour change for finite inputs.

MF3 — strict non-finite input policy

Public add / search / probe entry points now reject NaN/±Inf fail-fast via a shared util::assert_all_finite guard, across Rank, RankQuant, Bitmap, SignBitmap (plus the doc-hidden RankQuantFastscan and experimental MultiBucketBitmap, for a uniform surface). Matches the crate's existing assert-on-contract-violation style; the FFI pre-validates separately.

Note: strict rejection makes the OrderedFloat-vs-partial_cmp().unwrap_or(Equal) ordering split the council flagged unreachable (non-finite never reaches a sort), so the comparator is intentionally left unchanged — no churn. tests/index/finite.rs asserts the contract per type.

MF1 — misplaced doc block

The search_asymmetric_subset doc block was attached above write, so generated docs described write as subset scoring. Moved it onto search_asymmetric_subset (leaving only the .tvrq line on write) and folded in the gather / large-M operating-regime note (tracked: #11).

MF2 — k_eff == 0 early return

search_asymmetric_subset now early-returns on k_eff == 0, matching search / search_asymmetric / the standalone path.

MF4 — divisibility assert

rankquant_bytes_per_vec asserts d % codes_per_byte == 0 (previously silently floored for invalid standalone calls), mirroring pack_buckets and RankQuant::new.

Perf comment (fiction-free hygiene)

The batched-bitmap scores buffer (~6.6 MB at B=8/N=207k) is L3-resident, not per-core L2; per-query slices are ~828 KiB. Corrected the overclaim — comment only.

Tests

Also

fuzz/Cargo.lock: synced ordvec 0.1.0 → 0.2.0 (stale since the #7 version bump; surfaced by the fuzz-build gate).

Verification

  • cargo fmt --all --check, cargo clippy --all-targets --all-features -- -D warnings — clean
  • cargo test 85 (default); --features experimental + --no-default-features — green
  • cargo +1.89.0 build (MSRV); cargo build --locked; cargo doc --no-deps --all-features0 warnings; cargo +nightly fuzz build

…s (council review)

Addresses the pre-upstream review: MF1-4, the perf-comment overclaim, and the
test-checklist gaps.

- MF3 (strict non-finite policy): public add/search/probe entry points reject
  NaN/+-Inf via a shared `util::assert_all_finite` guard -- Rank, RankQuant,
  Bitmap, SignBitmap, plus the doc-hidden Fastscan and experimental
  MultiBucketBitmap. Strict rejection makes the OrderedFloat-vs-partial_cmp
  ordering split unreachable (non-finite never reaches a sort), so the
  comparator is left as-is. New tests/index/finite.rs asserts the contract per
  type.
- MF1: relocate the `search_asymmetric_subset` doc block off `write` (generated
  docs were describing `write` as subset scoring) and add the gather / large-M
  operating-regime note (see #11).
- MF2: `search_asymmetric_subset` early-returns on `k_eff == 0`, matching the
  other search paths.
- MF4: `rankquant_bytes_per_vec` asserts `d % codes_per_byte == 0` (was silently
  flooring for invalid standalone calls), mirroring pack_buckets / RankQuant::new.
- Perf comment: the batched-bitmap scores buffer is L3-resident (~6.6 MB), not
  per-core L2; per-query slices are ~828 KiB. Corrected overclaim, no code change.
- Test: rankquant_asymmetric_correct_on_simd_invalid_dims exercises 48/80/20/36
  (constructor-valid, SIMD-lane-invalid) so select_simd_tier's fallback is
  regression-guarded. (#7/#8 checklist items were already covered.)
- fuzz/Cargo.lock: sync ordvec 0.1.0 -> 0.2.0 (stale since the #7 version bump).

No behaviour change for finite inputs. Gate: fmt + clippy -D warnings; tests 85
(default), experimental + --no-default-features green; MSRV 1.89; docs 0 warnings;
fuzz build.
@qodo-code-review
Copy link
Copy Markdown

Review Summary by Qodo

Pre-release hardening: strict finite inputs, doc/assert fixes

🐞 Bug fix ✨ Enhancement 🧪 Tests

Grey Divider

Walkthroughs

Description
• Add strict finite-input validation across all public entry points
  - Reject NaN/±Inf fail-fast via shared assert_all_finite guard
  - Covers Rank, RankQuant, Bitmap, SignBitmap, RankQuantFastscan, MultiBucketBitmap
• Fix misplaced search_asymmetric_subset documentation block
  - Moved from write method to correct method location
  - Added gather/large-M operating-regime notes
• Add early-return for k_eff == 0 in search_asymmetric_subset
• Add divisibility assertion in rankquant_bytes_per_vec
• Correct L3 cache residency comment in batched-bitmap scores buffer
• Add regression test for SIMD dispatch fallback paths
• Update fuzz dependency: ordvec 0.1.0 → 0.2.0
Diagram
flowchart LR
  A["Public API Entry Points<br/>add/search/probe"] -->|"assert_all_finite"| B["Finite Input Validation"]
  B -->|"Pass"| C["Rank Transform &<br/>Scoring"]
  B -->|"Fail"| D["Panic: Non-finite Values"]
  C --> E["Deterministic Results"]
  F["rankquant_bytes_per_vec"] -->|"assert divisibility"| G["Valid Dimension Check"]
  H["search_asymmetric_subset"] -->|"early return"| I["k_eff == 0 Handling"]

Loading

File Changes

1. src/bitmap.rs ✨ Enhancement +10/-4

Add finite-input validation to Bitmap entry points

src/bitmap.rs


2. src/fastscan.rs ✨ Enhancement +3/-1

Add finite-input validation to RankQuantFastscan methods

src/fastscan.rs


3. src/multi_bucket.rs ✨ Enhancement +2/-0

Add finite-input validation to MultiBucketBitmap methods

src/multi_bucket.rs


View more (7)
4. src/quant.rs 🐞 Bug fix +25/-11

Add finite validation, fix doc block, add k_eff early return

src/quant.rs


5. src/rank.rs 🐞 Bug fix +9/-1

Add finite validation, divisibility assert, Rank search guards

src/rank.rs


6. src/sign_bitmap.rs ✨ Enhancement +2/-0

Add finite-input validation to SignBitmap entry points

src/sign_bitmap.rs


7. src/util.rs ✨ Enhancement +16/-0

Implement shared assert_all_finite validation function

src/util.rs


8. tests/index/finite.rs 🧪 Tests +46/-0

Add contract tests for strict finite-input policy

tests/index/finite.rs


9. tests/index/main.rs 🧪 Tests +1/-0

Register new finite-input contract test module

tests/index/main.rs


10. tests/index/quant.rs 🧪 Tests +34/-0

Add SIMD dispatch fallback regression test

tests/index/quant.rs


Grey Divider

Qodo Logo

@qodo-code-review
Copy link
Copy Markdown

qodo-code-review Bot commented May 23, 2026

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (1)

Grey Divider


Action required

1. RankQuantFastscan not #[doc(hidden)] ✓ Resolved 📘 Rule violation ⚙ Maintainability
Description
The RankQuantFastscan type definition is public but is not directly annotated with
#[doc(hidden)], relying only on a doc-hidden re-export. This violates the requirement that the
item definition itself be doc-hidden to prevent it from appearing in generated documentation via
other paths.
Code

src/fastscan.rs[R43-44]

Evidence
PR Compliance ID 737894 requires RankQuantFastscan to be immediately preceded by #[doc(hidden)]
on the same item. In src/fastscan.rs, the pub struct RankQuantFastscan definition lacks that
attribute.

Rule 737894: Annotate RankQuantFastscan with #[doc(hidden)]
src/fastscan.rs[492-503]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`RankQuantFastscan` must be annotated with `#[doc(hidden)]` on the item definition itself.

## Issue Context
Even if the crate-root re-export is `#[doc(hidden)]`, the compliance requirement explicitly demands the attribute on the `RankQuantFastscan` definition (not only on a containing module or re-export).

## Fix Focus Areas
- src/fastscan.rs[492-503]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. Byte-LUT accepts NaN/Inf ✓ Resolved 🐞 Bug ≡ Correctness
Description
The public search_asymmetric_byte_lut entry point normalizes and scores queries without
assert_all_finite, so NaN/±Inf can propagate into l2_normalise/TopK and yield silently wrong
results instead of failing fast like RankQuant::search_asymmetric. This breaks the PR’s new
“strict finite inputs” contract for a re-exported API.
Code

src/quant.rs[248]

Evidence
search_asymmetric_byte_lut currently normalizes/scans queries without any finiteness guard, while
RankQuant::search_asymmetric now asserts finiteness; NaN then propagates through l2_normalise,
and TopK uses floating comparisons where NaN is never “better”, yielding incorrect results.

src/quant.rs[245-252]
src/quant.rs[714-758]
src/util.rs[28-35]
src/util.rs[101-124]
src/lib.rs[47-50]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`search_asymmetric_byte_lut()` is publicly re-exported and accepts `queries: &[f32]` but does not call `assert_all_finite(queries)`. With NaN/Inf present, `l2_normalise()` can produce NaN outputs and `TopK::maybe_insert()` comparisons won’t behave correctly, resulting in incorrect/empty top-k rather than a fail-fast panic (inconsistent with the newly-hardened `RankQuant` methods).

### Issue Context
This PR adds `assert_all_finite` guards to `RankQuant::search_asymmetric` (and other entrypoints), establishing a strict finite-input policy. The byte-LUT helper is still an externally callable API (`pub use quant::{search_asymmetric_byte_lut, RankQuant};`) and should enforce the same contract.

### Fix Focus Areas
- src/quant.rs[714-765]
- src/util.rs[28-35]
- src/util.rs[101-124]
- src/lib.rs[47-50]
- tests/index/finite.rs[1-46]

### Suggested fix
1. In `src/quant.rs::search_asymmetric_byte_lut`, add `assert_all_finite(queries);` immediately after the shape assertion (`assert_eq!(queries.len(), nq * dim);`).
2. Add a `#[should_panic(expected = "non-finite")]` test covering `search_asymmetric_byte_lut` in `tests/index/finite.rs` (similar to the existing RankQuant/Bitmap/SignBitmap checks).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. search_asymmetric_subset allocates sub_packed 📎 Requirement gap ➹ Performance
Description
RankQuant::search_asymmetric_subset still gathers candidate docs into a contiguous sub_packed
scratch buffer, allocating and copying m * bytes_per_vec bytes before scanning. This violates the
requirement for an in-place, index-mapped subset scan with zero gather/allocation on the hot path.
Code

src/quant.rs[R469-474]

Evidence
PR Compliance ID 737895 requires the subset path to avoid gathering into a contiguous scratch buffer
and instead scan in-place from the original packed storage via an optional candidates index map. The
implementation still allocates sub_packed and copies each candidate’s packed bytes into it before
dispatching the SIMD/scalar scan.

Asymmetric subset search supports index-mapped in-place scan (no gather buffer)
src/quant.rs[469-519]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`RankQuant::search_asymmetric_subset` currently gathers candidate docs into a contiguous `sub_packed` buffer (`vec![0u8; m * bpv]`) and then scans that buffer. Compliance requires a gather-free, in-place scan over the original packed storage, using an optional candidate-index map.

## Issue Context
The success criteria for this PR compliance item requires subset scanning to read each candidate doc directly from `self.packed` via `base = (candidates[di] if present else di) * bytes_per_vec`, without constructing `sub_packed`.

## Fix Focus Areas
- src/quant.rs[458-573]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


View more (1)
4. Bitmap query builder allows NaN ✓ Resolved 🐞 Bug ☼ Reliability
Description
Bitmap::build_query_bitmap_fp32 is a public query/probe entry point that does not call
assert_all_finite, allowing NaN/±Inf to reach partial_cmp(...).unwrap_or(Equal) sorting and
produce unspecified rankings. This violates the PR’s new strict finite-input contract and is
inconsistent with other Bitmap methods that now fail fast.
Code

src/bitmap.rs[104]

Evidence
build_query_bitmap_fp32 is public and performs a sort using partial_cmp().unwrap_or(Equal)
without finiteness validation; meanwhile other Bitmap entrypoints now call assert_all_finite,
creating an inconsistent public surface where NaN/Inf may still slip through.

src/bitmap.rs[76-96]
src/bitmap.rs[101-105]
src/util.rs[38-52]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`Bitmap::build_query_bitmap_fp32()` is `pub` and can be called directly by users, but it does not enforce the new finite-input contract. It sorts indices using `partial_cmp(...).unwrap_or(Ordering::Equal)`; with NaN present, comparisons return `None` and the fallback ordering can yield incorrect/unspecified top-`n_top` selection.

### Issue Context
This PR adds `assert_all_finite` to `Bitmap::add`, `Bitmap::search`, `top_m_candidates`, and `top_m_candidates_batched`, but not to `build_query_bitmap_fp32` itself, leaving a public hole in the strict-policy surface.

### Fix Focus Areas
- src/bitmap.rs[76-96]
- src/util.rs[38-52]
- tests/index/finite.rs[1-46]

### Suggested fix
1. Add `assert_all_finite(q);` inside `build_query_bitmap_fp32` (after `assert_eq!(q.len(), self.dim);`).
2. Add a `#[should_panic(expected = "non-finite")]` test for `Bitmap::build_query_bitmap_fp32` in `tests/index/finite.rs` (or extend the existing bitmap test to cover this method explicitly).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces comprehensive input validation to ensure all vector and query data are finite, preventing non-deterministic behavior caused by NaN or infinite values. Key changes include the addition of an assert_all_finite utility used across Rank, RankQuant, and Bitmap implementations, a new dimension alignment check in rankquant_bytes_per_vec, and updated documentation regarding cache residency. New tests were added to verify the finite-input enforcement and SIMD dispatch logic. A review comment correctly identified misleading documentation for search_asymmetric_subset, which incorrectly described the return values and ID mapping process.

Comment thread src/quant.rs Outdated
Comment thread src/quant.rs
Comment thread src/fastscan.rs
Comment thread src/quant.rs
Comment thread src/bitmap.rs
…stop-gate)

The MF3 hardening missed four directly-callable public embedding entry points
that bypass the type add/search boundaries:

- rank::rank_transform / rank_transform_into (the public rank primitives)
- search_asymmetric_byte_lut (re-exported at the crate root)
- Bitmap::build_query_bitmap_fp32

Add assert_all_finite to each. Pure additions, defense-in-depth: every public
embedding entry now self-validates; internal delegation may double-check, which
is negligible. MultiBucketBitmap's `w` weight matrix is intentionally left out
-- it is a trusted weight param (debug_assert on length, not a release check),
not an embedding. tests/index/finite.rs covers the three directly-testable
paths.

Gate: fmt + clippy -D warnings; tests 88 (default) / experimental green; docs 0 warnings.
…review)

- search_asymmetric_subset doc: it returns (scores, global doc IDs) -- local
  candidate positions are mapped back internally. Corrects the relocated doc
  block, which wrongly described candidate positions + caller-side mapping (gemini).
- RankQuantFastscan: add #[doc(hidden)] on the struct definition itself, not only
  the crate-root re-export (qodo).
@Fieldnote-Echo Fieldnote-Echo merged commit fead2d2 into main May 23, 2026
7 checks passed
@Fieldnote-Echo Fieldnote-Echo deleted the chore/pre-release-hardening branch May 23, 2026 04:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant