Pre-flip: remediate release-gate audit (8 findings) + full Python parity #23

Merged
Fieldnote-Echo merged 10 commits into main from fix/pre-flip-audit-remediation
May 25, 2026

@Fieldnote-Echo
Owner

Summary

Pre-public-flip hardening. This remediates an external release-gate static audit of the crate and bindings (8 findings), adds complete Python↔Rust API parity, and closes a Codex stop-review finding (a core panic reachable from the Python boundary). Every finding was triaged against the actual source and fixed at the source — no severity downgrades, no deferrals to issues.

Publish remains held: the release workflows are workflow_dispatch-only and nothing here triggers a publish.

Audit findings (all fixed)

| # | Finding | Fix |
|---|---------|-----|
| 1 / 6 | Bitmap::new accepted dim > u16::MAX → later panic (Rust + Python) | Core + Python Bitmap cap dim ≤ MAX_DIM; matches Rank/RankQuant and the .tvbm loader |
| 2 | Release workflows weaker than CI; deny.toml never run | cargo-deny job added to ci.yml; both release workflows now require the full ci/python matrices to be green for the release SHA before publish |
| 3 | Loaders validated structure but not semantic invariants | .tvr permutation, .tvrq bucket-composition, .tvbm popcount checks; .tvsb documented as legitimately structural-only |
| 4 | Stale tie-break xfail in test_bitmap.py | Flipped to a passing regression test (+ cross-check vs search()) |
| 5 | top_m_candidates_batched_chunked(batch_size=0) panicked | batch_size > 0 guard + checked_mul; Python clamps batch_size to the query count |
| 7 | Python exposed a subset of the Rust API | Full parity (below) |
| 8 | PyPI README links 404; Windows classifier missing | Absolute license URLs; Operating System :: Microsoft :: Windows added |

Codex stop-review follow-up

rank_to_bucket did d as u32, which keeps only the low 32 bits of d, so a d that is a nonzero multiple of 2³² (for example exactly 2³²) becomes zero → divide-by-zero. Newly reachable because the binding exposes rank_to_bucket/bucket_ranks with d as a free argument. Fixed in the core with u64 math (bit-identical over the realistic d ≤ u16::MAX domain) so Rust callers are protected too, plus the chunked-batch batch_size clamp above. Swept the rest of the new FFI surface: every entry point now turns a core assert!/overflow into a typed ValueError/IndexError, never a PanicException.
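
To make the truncation concrete, here is a minimal Python model of the two variants. The bucket formula (`(rank << bits) // d`) and the helper names are assumptions for illustration and are not taken from the crate; only the wrap-around behaviour of `d as u32` comes from the description above.

```python
U32_MASK = (1 << 32) - 1
U64_MASK = (1 << 64) - 1


def as_u32(x: int) -> int:
    """Model Rust's `x as u32`: keep only the low 32 bits."""
    return x & U32_MASK


def rank_to_bucket_u32(rank: int, d: int, bits: int) -> int:
    """Hypothetical pre-fix shape: the divisor is truncated to u32 first."""
    return (as_u32(rank) << bits) // as_u32(d)


def rank_to_bucket_u64(rank: int, d: int, bits: int) -> int:
    """Hypothetical post-fix shape: the arithmetic stays in 64 bits."""
    assert d > 0, "d must be positive"
    return ((rank << bits) & U64_MASK) // d
```

In this model `rank_to_bucket_u32(5, 1 << 32, 2)` raises `ZeroDivisionError` because the divisor wraps to zero, while the two functions agree for every rank when d stays within the u16 range.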

Full Python parity (finding 7)

Anything callable in Rust is now callable from Python:

  • All classes: is_empty.
  • Bitmap: build_query_bitmap_fp32, top_m_candidates_batched, top_m_candidates_batched_chunked, body_overlap_scores_subset.
  • SignBitmap: build_query_bitmap.
  • Module level: rank_transform, rank_to_bucket, bucket_ranks, pack_buckets, unpack_buckets, rankquant_bytes_per_vec, bucket_centre, rank_norm, rankquant_norm, search_asymmetric_byte_lut, and constants MAX_DIM / MAX_SIGN_BITMAP_DIM / MAX_VECTORS.

The low-level rank_io read/write functions are deliberately not duplicated — the class write()/load() methods already cover that capability, and exposing raw tuple-returning loaders would be parity of implementation detail, not API.

Test plan (local gate — mirrors CI)

  • cargo fmt --all --check, cargo clippy --all-targets --all-features -- -D warnings
  • cargo test (default / --features experimental / --no-default-features) — index harness 39 → 45 (loader-validation module + 2 should_panic guards), lib units 19 → 20
  • cargo +1.89.0 build + test (MSRV), cargo build --locked, RUSTFLAGS="-D warnings" cargo build, cargo +nightly fuzz build
  • cargo deny check → advisories / bans / licenses / sources ok
  • Binding: cargo fmt -p ordvec-python --check, cargo clippy -p ordvec-python --all-targets -- -D warnings, maturin develop + pytest → 147 passed, 0 failed, 0 xfail (108 + 1 xfail before)

Note for releasing

The new require-ci-green gate queries ci.yml/python.yml runs by head_sha, so release from a commit that went through main's CI (the normal flow). It fails loud with guidance if CI isn't green for the SHA.
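
The gate's decision, including the branch filter that a later commit in this PR adds, can be sketched roughly as follows. The function name is hypothetical, and the input is assumed to have the shape of GitHub's "list workflow runs" REST response (a `workflow_runs` array whose entries carry `head_sha`, `head_branch` and `conclusion`); the actual workflow step is not shown on this page.

```python
def ci_green_on_main(runs_response: dict, sha: str) -> bool:
    """True if some run for `sha` ran on main and concluded successfully."""
    return any(
        run.get("head_sha") == sha
        and run.get("head_branch") == "main"
        and run.get("conclusion") == "success"
        for run in runs_response.get("workflow_runs", [])
    )
```

A green run for the same SHA on another branch, or a failed run on main, does not satisfy this check.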

- Bitmap::new caps dim at MAX_DIM (u16::MAX). A larger dim is still a multiple
  of 64 but cannot be rank-transformed (u16 ranks) or query-indexed (u16 ids),
  so it previously built an index that panicked on the first add/search and
  disagreed with the .tvbm loader (which already caps dim). Now consistent with
  the Rank/RankQuant constructors.
- Bitmap::top_m_candidates_batched_chunked guards batch_size > 0 and uses
  checked_mul for batch_size * dim, so a zero or overflowing batch_size fails
  loud instead of panicking inside par_chunks.
- rank_to_bucket computes the bucket in u64: `d as u32` could truncate a large
  d to zero and divide-by-zero. Bit-identical over the realistic d <= u16::MAX
  domain; removes a panic otherwise reachable where d is a free argument.

Tests: should_panic guards for the dim cap and zero batch_size, plus a
large-d rank_to_bucket regression (64-bit).
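
A rough Python model of the chunked-batch guards described above. `chunk_queries` and `checked_mul` are hypothetical helpers, and the 64-bit `USIZE_MAX` is an assumption; the sketch only mirrors the order of the checks, not the crate's code.

```python
USIZE_MAX = (1 << 64) - 1  # assumption: model a 64-bit usize


def checked_mul(a: int, b: int, limit: int = USIZE_MAX) -> int:
    """Model Rust's `checked_mul`: fail instead of wrapping past `limit`."""
    product = a * b
    if product > limit:
        raise OverflowError(f"{a} * {b} overflows the modelled usize")
    return product


def chunk_queries(flat_queries: list, dim: int, batch_size: int) -> list:
    """Split a flat query buffer into chunks of `batch_size` queries."""
    if batch_size <= 0:
        raise ValueError("batch_size must be > 0")
    if dim <= 0 or len(flat_queries) % dim != 0:
        raise ValueError("dim must be > 0 and divide the query buffer length")
    n_queries = len(flat_queries) // dim
    # Clamp to the query count, as the Python binding is described to do,
    # so batch_size * dim cannot exceed the buffer that actually exists.
    batch_size = min(batch_size, max(n_queries, 1))
    step = checked_mul(batch_size, dim)
    return [flat_queries[i:i + step] for i in range(0, len(flat_queries), step)]
```

With these guards a zero batch_size is rejected up front, and an absurdly large one degrades to a single chunk instead of overflowing.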
Loaders validated structure (magic/version/dim/n_vectors/payload length) but
accepted well-shaped payloads that violate the type's semantic invariant,
silently corrupting scores instead of failing. Add per-row checks at the
loader boundary:

- .tvr (Rank): each row must be a permutation of [0, dim), not merely bounded
  by dim — rank_norm assumes a permutation.
- .tvrq (RankQuant): each document must have uniform bucket composition
  (dim / 2^bits per bucket), which rankquant_norm assumes.
- .tvbm (Bitmap): each document must have exactly n_top bits set.
- .tvsb (SignBitmap): documented as legitimately structural-only — any bit
  pattern is a valid sign bitmap, so there is nothing further to verify.

Tests: a new loader_validation module pairs a valid-file positive control with
a corrupted-but-well-shaped negative case per format.
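
The three per-row invariants can be sketched in plain Python as below. The function names and the list-based row representation are assumptions for illustration; the real checks live in the Rust loaders and operate on the on-disk payloads.

```python
def is_permutation(row: list, dim: int) -> bool:
    """True if `row` contains each value in [0, dim) exactly once."""
    if len(row) != dim:
        return False
    seen = [False] * dim
    for r in row:
        if not 0 <= r < dim or seen[r]:
            return False
        seen[r] = True
    return True


def has_uniform_buckets(buckets: list, dim: int, bits: int) -> bool:
    """True if each of the 2**bits buckets holds exactly dim / 2**bits codes."""
    n_buckets = 1 << bits
    if len(buckets) != dim or dim % n_buckets != 0:
        return False
    counts = [0] * n_buckets
    for b in buckets:
        if not 0 <= b < n_buckets:
            return False
        counts[b] += 1
    return all(c == dim // n_buckets for c in counts)


def has_popcount(words: list, n_top: int) -> bool:
    """True if the 64-bit words of one bitmap row have exactly n_top bits set."""
    mask = (1 << 64) - 1
    return sum(bin(w & mask).count("1") for w in words) == n_top
```

A row such as `[0, 0, 2]` is bounded by dim = 3 and so passes a purely structural check, but `is_permutation` rejects it; that is the "well-shaped but semantically invalid" case the negative tests target.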
Bring the ordvec Python bindings to parity with the Rust crate's public
surface so anything callable in Rust is callable from Python:

- All four classes gain is_empty.
- Bitmap gains build_query_bitmap_fp32, top_m_candidates_batched,
  top_m_candidates_batched_chunked, and body_overlap_scores_subset.
- SignBitmap gains build_query_bitmap.
- The rank-math primitives (rank_transform, rank_to_bucket, bucket_ranks,
  pack_buckets, unpack_buckets, rankquant_bytes_per_vec, bucket_centre,
  rank_norm, rankquant_norm), the byte-LUT helper search_asymmetric_byte_lut,
  and the limit constants MAX_DIM / MAX_SIGN_BITMAP_DIM / MAX_VECTORS are
  exposed at module level. The low-level rank_io read/write functions are
  intentionally not duplicated — the class write()/load() methods cover them.

Every new entry point validates inputs at the FFI boundary (typed ValueError /
IndexError, never a PanicException): Bitmap rejects dim > u16::MAX; the chunked
candidate path clamps batch_size to the query count to avoid batch_size * dim
overflow; body_overlap_scores_subset checks q_bitmap length, doc-id bounds, and
ascending order; the primitives validate bits and array lengths.

Also flips the stale tie-break xfail in test_bitmap.py to a passing regression
test — the composite-key (score desc, doc_id asc) determinism fix it described
is in the core and Rust-tested.
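
For reference, the composite-key ordering (score descending, doc_id ascending) can be expressed in a few lines of Python. `top_m` here is a hypothetical helper, not the binding's API; it only illustrates why the ordering is deterministic under ties.

```python
def top_m(scores: list, m: int) -> list:
    """Return up to m (doc_id, score) pairs, highest score first.

    Ties are broken by ascending doc_id, so the result is deterministic.
    """
    ranked = sorted(enumerate(scores), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:m]
```

For scores `[5, 9, 9, 1, 5]` and m = 3 this yields `[(1, 9), (2, 9), (0, 5)]`: the two 9s are ordered by doc_id, and doc 0 beats doc 4 for the last slot.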
- Add a cargo-deny job to ci.yml so deny.toml (advisories, license allow-list,
  banned/duplicate crates, source allow-list) is actually enforced rather than
  only checklisted in the PR template.
- release-crate.yml and release-python.yml now require the ci (and, for the
  wheel, python) workflow to have concluded success for the exact release SHA
  before publishing, so a manual release dispatch can't ship a commit whose
  full matrix (lint, MSRV, the test configs, deps + cargo-deny, AVX-512-SDE,
  wasm-simd128) has not gone green — without duplicating that matrix.
- README license links use absolute repository URLs: PyPI renders the README
  on the project page, where relative links (LICENSE-MIT/LICENSE-APACHE) 404.
- Add the Windows operating-system classifier — release-python.yml builds and
  tests a windows-latest wheel, but the classifier set listed only Linux/macOS.
@qodo-code-review

Review Summary by Qodo

Pre-flip: remediate release-gate audit (8 findings) + full Python parity

🐞 Bug fix ✨ Enhancement 🧪 Tests

Walkthroughs

Description
• Harden core constructors and kernels against out-of-domain inputs
  - Bitmap::new caps dim at MAX_DIM (u16::MAX); top_m_candidates_batched_chunked guards
  batch_size > 0 with checked_mul
  - rank_to_bucket uses u64 math to prevent divide-by-zero from large d values
• Validate semantic invariants on index load (permutation, composition, popcount)
  - .tvr loader verifies each row is a permutation of [0, dim)
  - .tvrq loader checks uniform bucket composition (dim / 2^bits per bucket)
  - .tvbm loader validates exactly n_top bits set per document
• Mirror full Rust public API in Python bindings for complete parity
  - Add 10 module-level rank-math primitives (rank_transform, rank_to_bucket, bucket_ranks,
  etc.)
  - Expose 3 loader limit constants (MAX_DIM, MAX_SIGN_BITMAP_DIM, MAX_VECTORS)
  - Add missing class methods (is_empty, build_query_bitmap*, body_overlap_scores_subset,
  top_m_candidates_batched*)
• Strengthen release workflows with full CI matrix gate and cargo-deny supply-chain checks
  - Add cargo-deny job to ci.yml for advisories/licenses/bans/sources validation
  - Require both ci.yml and python.yml to pass before publishing to crates.io/PyPI
Diagram
flowchart LR
  A["Core Hardening<br/>Constructors & Kernels"] --> B["Input Validation<br/>Typed Exceptions"]
  C["Semantic Validation<br/>Loaders"] --> D["Reject Corrupted<br/>Payloads"]
  E["Python API Expansion<br/>Rank Math + Constants"] --> F["Full Rust Parity<br/>10 Functions + 3 Constants"]
  G["Release Workflow<br/>Strengthening"] --> H["cargo-deny +<br/>CI Matrix Gate"]
  B --> I["Safe Public API"]
  D --> I
  F --> I
  H --> I

File Changes

1. src/bitmap.rs 🐞 Bug fix +21/-1: Cap dim at MAX_DIM and guard batch_size
2. src/rank.rs 🐞 Bug fix +18/-2: Use u64 math in rank_to_bucket to prevent divide-by-zero
3. src/rank_io.rs 🐞 Bug fix +77/-9: Validate semantic invariants on all index loaders
4. ordvec-python/src/lib.rs ✨ Enhancement +426/-0: Expose full Rust API with 10 rank-math primitives
5. ordvec-python/python/ordvec/__init__.py ✨ Enhancement +45/-2: Export module-level functions and constants for parity
6. tests/index/bitmap.rs 🧪 Tests +23/-0: Add should_panic guards for dim cap and batch_size
7. tests/index/loader_validation.rs 🧪 Tests +143/-0: New module testing semantic invariant validation on load
8. tests/index/main.rs 🧪 Tests +1/-0: Register new loader_validation test module
9. ordvec-python/tests/test_bitmap.py 🧪 Tests +133/-18: Add tests for new Bitmap methods and edge cases
10. ordvec-python/tests/test_rank.py 🧪 Tests +7/-0: Add is_empty method test for Rank class
11. ordvec-python/tests/test_rank_quant.py 🧪 Tests +7/-0: Add is_empty method test for RankQuant class
12. ordvec-python/tests/test_sign_bitmap.py 🧪 Tests +18/-0: Add is_empty and build_query_bitmap tests
13. ordvec-python/tests/test_primitives.py 🧪 Tests +145/-0: New module testing rank-math primitives and constants
14. .github/workflows/ci.yml ⚙️ Configuration changes +17/-0: Add cargo-deny job for supply-chain policy enforcement
15. .github/workflows/release-crate.yml ⚙️ Configuration changes +31/-1: Require full CI matrix green before crates.io publish
16. .github/workflows/release-python.yml ⚙️ Configuration changes +31/-1: Require core and binding CI green before PyPI publish
17. ordvec-python/README.md 📝 Documentation +4/-2: Fix PyPI README links with absolute GitHub URLs
18. ordvec-python/pyproject.toml ⚙️ Configuration changes +1/-0: Add Windows operating system classifier


@qodo-code-review

qodo-code-review Bot commented May 25, 2026

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0)

Action required

1. Empty bucket_ranks panics ✓ Resolved 🐞 Bug ☼ Reliability
Description
The Python bucket_ranks() binding forwards an empty ranks slice into
ordvec_core::rank::bucket_ranks, which computes d=0 and immediately hits the core
rank_to_bucket assert d > 0, surfacing as a PanicException across the FFI.
Code

ordvec-python/src/lib.rs[R933-946]

Evidence
The binding calls the core bucket_ranks directly without checking ranks.len() > 0. Core
bucket_ranks derives d from ranks.len() and calls rank_to_bucket, which asserts d > 0, so
an empty input panics.

ordvec-python/src/lib.rs[933-946]
src/rank.rs[86-90]
src/rank.rs[68-76]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`ordvec-python` exposes `bucket_ranks()` as a `#[pyfunction]`, but it does not handle the empty-array case. In core, `bucket_ranks()` calls `rank_to_bucket(r, d, bits)` with `d = ranks.len()`, and `rank_to_bucket` asserts `d > 0`. Passing an empty array from Python therefore triggers a Rust panic that crosses the boundary as `PanicException`.

## Issue Context
The PR’s stated goal is that FFI entry points convert core panics/asserts into typed Python exceptions; this is a remaining panic path reachable from Python.

## Fix Focus Areas
- ordvec-python/src/lib.rs[933-946]


Remediation recommended

2. Release CI gate off-branch ✓ Resolved 🐞 Bug ☼ Reliability
Description
The release workflows only query by head_sha and take the most recent run, but they do not
constrain the run to the intended ref (e.g. main), so a green run for the same SHA on another
branch/PR can satisfy the gate despite the workflow messaging requiring “Push to main”.
Code

.github/workflows/release-crate.yml[R63-70]

Evidence
Both release workflows use the GitHub API .../runs?head_sha=${SHA}&per_page=1 and only inspect
.workflow_runs[0].conclusion, with no check for head_branch/branch, despite the error message
asserting the commit should be pushed to main first.

.github/workflows/release-crate.yml[42-70]
.github/workflows/release-python.yml[91-119]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The release gating step checks workflow success using only `head_sha=${SHA}`. This does not enforce that the successful run occurred on the expected branch/ref (e.g. `main`), even though the workflow comment/error message indicates it must.

## Issue Context
`workflow_dispatch` can be run from non-main refs; without a branch/ref filter, a run from an unintended ref can satisfy the gate.

## Fix Focus Areas
- .github/workflows/release-crate.yml[61-70]
- .github/workflows/release-python.yml[108-118]


3. pack_buckets masks invalid codes ✓ Resolved 🐞 Bug ≡ Correctness
Description
The newly-exposed Python pack_buckets() does not validate that each bucket is in [0, 1<<bits),
and the core implementation silently masks with b & mask, so out-of-range inputs are truncated and
produce an incorrect packed stream without any error.
Code

ordvec-python/src/lib.rs[R951-971]

Evidence
The Python wrapper forwards arbitrary u8 bucket values. The core implementation explicitly uses
(b & mask) when packing, which will truncate any value with high bits set, contradicting the
stated in-range requirement and yielding wrong packed bytes with no failure.

ordvec-python/src/lib.rs[951-971]
src/rank.rs[92-117]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`pack_buckets()` is now part of the Python public API. It validates `bits` and length, but does not validate bucket values are within `[0, 1 << bits)`. The Rust core currently masks (`b & mask`) which silently truncates invalid values, leading to incorrect packed outputs.

## Issue Context
This is a correctness footgun for Python callers, especially since the docstring/contract states buckets are in-range.

## Fix Focus Areas
- ordvec-python/src/lib.rs[951-971]


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 923cd15f9a

Comment thread ordvec-python/src/lib.rs
Comment thread ordvec-python/src/lib.rs

Copilot AI left a comment

Pull request overview

Pre-release hardening across the Rust core, persistence loaders, Python bindings, and release/CI workflows to address the stated release-gate audit findings and bring Python up to full parity with the Rust public API surface.

Changes:

  • Strengthen loader semantic validation for Rank / RankQuant / Bitmap formats and add targeted regression tests.
  • Prevent previously reachable panics (e.g., rank_to_bucket large-d, Bitmap::new oversize dim, chunked batching with batch_size=0/overflow).
  • Expand Python bindings to include missing methods, module-level rank primitives, and exposed constants; tighten release workflows to require green CI before publishing.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 1 comment.

| File | Description |
|------|-------------|
| tests/index/main.rs | Registers new loader semantic-validation test module. |
| tests/index/loader_validation.rs | Adds round-trip + “well-shaped but semantically invalid” loader rejection/acceptance tests. |
| tests/index/bitmap.rs | Adds regression guards for oversize dim and zero batch_size panics. |
| src/rank.rs | Switches rank_to_bucket to u64 math and adds a large-d regression test. |
| src/rank_io.rs | Enforces semantic invariants at load time (Rank permutation, RankQuant constant composition, Bitmap popcount), and documents SignBitmap structural-only validation. |
| src/bitmap.rs | Adds dim <= MAX_DIM constructor guard; guards chunked batching against zero/overflow. |
| ordvec-python/tests/test_sign_bitmap.py | Adds Python-level is_empty and build_query_bitmap tests for parity. |
| ordvec-python/tests/test_rank.py | Adds Python-level is_empty parity test. |
| ordvec-python/tests/test_rank_quant.py | Adds Python-level is_empty parity test. |
| ordvec-python/tests/test_primitives.py | Adds coverage for newly exposed module-level primitives and constants. |
| ordvec-python/tests/test_bitmap.py | Converts prior xfail to regression coverage; adds parity tests for new Bitmap APIs and argument guards. |
| ordvec-python/src/lib.rs | Implements Python API parity: new methods, module-level primitives, input guards to avoid PanicException, and exports constants. |
| ordvec-python/README.md | Fixes license links to non-404 absolute URLs. |
| ordvec-python/python/ordvec/__init__.py | Re-exports new public primitives/constants and updates __all__. |
| ordvec-python/pyproject.toml | Adds Windows classifier. |
| .github/workflows/release-python.yml | Adds a “require CI green for SHA” gate before publishing wheels/sdist. |
| .github/workflows/release-crate.yml | Adds a “require CI green for SHA” gate before publishing crate. |
| .github/workflows/ci.yml | Adds cargo-deny supply-chain policy job. |

Comment thread ordvec-python/tests/test_primitives.py Outdated

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request significantly expands the Python API for the ordvec package by exposing module-level rank-math primitives, loader limit constants, and advanced Bitmap search methods such as batched and chunked candidate generation. In the Rust core, it introduces semantic validation during index loading to ensure data integrity (e.g., verifying permutations and constant-composition invariants) and improves numerical robustness by using u64 arithmetic for bucket calculations. Feedback was provided regarding potential usize overflows on 32-bit architectures when calculating the capacity for result vectors in the Python bindings.

Comment thread ordvec-python/src/lib.rs Outdated
Comment thread ordvec-python/src/lib.rs Outdated
Bot-review remediation (qodo / gemini / copilot / Codex) on PR #23:

- pack_buckets rejects bucket codes outside [0, 1<<bits) with a ValueError
  instead of letting the core silently mask them (`b & mask`) to a different
  bucket (qodo, Correctness).
- Bitmap::top_m_candidates_batched / _batched_chunked and
  SignBitmap::top_m_candidates_batched use checked_mul for the result-vector
  capacity, so an oversized batch*m can't overflow usize and panic in
  Array2::from_shape_vec (gemini).
- rank_transform test uses a stable inner argsort to match the core's
  "ties broken by index" contract (robust to a float32 tie), plus an explicit
  tie-break test (copilot).
- Regression test pinning bucket_ranks([]) -> []: the empty-input panic flagged
  by Codex/qodo is not reachable — the core maps over the empty slice and never
  calls rank_to_bucket, so its `d > 0` assert is unreached, and non-empty input
  always has d >= 1 (verified empirically).
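
The "ties broken by index" contract mentioned for rank_transform can be illustrated with a stable sort. This is a sketch only: it assumes ranks ascend with value, which this page does not state, and it is a pure-Python stand-in rather than the binding's function.

```python
def rank_transform(values: list) -> list:
    """Assign each element its position in the sorted order.

    Assumption: ranks ascend with value. Python's sort is stable, so equal
    values keep their original order, i.e. ties go to the lower index.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order):
        ranks[index] = rank
    return ranks
```

Because the sort is stable, the output is always a permutation of `[0, len(values))` even when values repeat, which is why a reference implementation built on an unstable argsort can disagree on a tie.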

pytest 147 -> 150, all green.
The require-ci-green gate queried ci.yml/python.yml runs by head_sha and took
the most recent, so a green run for the same commit SHA on an unrelated branch
could satisfy it (qodo). Filter on `branch=main` and require a run with
`head_branch == "main" and conclusion == "success"`, matching the gate's
"push to main" error message.
@Fieldnote-Echo
Owner Author

Bot review — cycle 1 triage (resolved)

Investigated every finding from copilot / gemini / qodo / Codex and remediated or resolved each. Fixes in 8155f13 (binding) + 96cf4a3 (CI). Local gate re-verified green; pytest 147 → 150.

| Finding | Source | Disposition |
|---------|--------|-------------|
| pack_buckets silently masks out-of-range codes (b & mask) | qodo (Correctness) | Fixed 8155f13: boundary now rejects codes outside [0, 1<<bits) with ValueError |
| batch * m_eff / n_queries * m_eff can overflow usize | gemini ×2 | Fixed 8155f13: checked_mul → ValueError on the two Bitmap batched paths (+ SignBitmap.top_m_candidates_batched for parity) |
| rank_transform test can go flaky on a float32 tie | copilot | Fixed 8155f13: stable inner argsort to match the "ties broken by index" contract, + an explicit tie-break test |
| Release CI-green gate satisfiable by a same-SHA run on another branch | qodo (Reliability) | Fixed 96cf4a3: gate now filters branch=main and requires head_branch == "main" && conclusion == "success" |
| Empty bucket_ranks panics via rank_to_bucket d > 0 assert | Codex (P1) + qodo (Bug) | Not an issue (verified empirically), see below; pinned with a regression test |

On the empty-bucket_ranks panic (Codex P1 + qodo)

This path is not reachable. ordvec.bucket_ranks(np.array([], dtype=np.uint16), 2) returns array([], dtype=uint8) with no exception:

>>> import numpy as np, ordvec
>>> ordvec.bucket_ranks(np.array([], dtype=np.uint16), 2)
array([], dtype=uint8)

The core bucket_ranks maps over the rank slice, so on an empty input the closure — and therefore the d > 0 assert inside rank_to_bucket — is never invoked. Non-empty input always has d = len >= 1. I added test_bucket_ranks_empty_returns_empty to pin the empty → empty contract against future regressions.
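
The short-circuit can be modelled in Python as follows. The bucket formula and the `calls` recorder are assumptions added for illustration; the point is only that mapping over an empty sequence never invokes the per-element function, so its assert is never evaluated.

```python
calls = []  # records every rank the stand-in function is invoked with


def rank_to_bucket(rank: int, d: int, bits: int) -> int:
    """Stand-in for the core function; the formula is an assumption."""
    assert d > 0, "d must be positive"
    calls.append(rank)
    # Cap an out-of-range rank at the top bucket.
    return min((rank << bits) // d, (1 << bits) - 1)


def bucket_ranks(ranks: list, bits: int) -> list:
    d = len(ranks)  # d == 0 only when there is nothing to map over
    return [rank_to_bucket(r, d, bits) for r in ranks]
```

Calling `bucket_ranks([], 2)` returns `[]` and leaves `calls` empty, while any non-empty input reaches the assert with d >= 1.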

All five inline threads are resolved. Ready for the next review cycle.

The cycle-1 checked_mul guards sized the output buffer (batch * m_eff) and ran
*after* top_m_candidates_batched, but the core allocates `scores = vec![0u32;
batch * n]` *inside* that call — a larger quantity (n >= m_eff) that would
overflow before the guard ran (Codex stop-review). Guard `batch * max(n, qpv)`
(the core's scores + query-bitmap buffers) BEFORE the call in all three batched
wrappers (Bitmap batched / chunked, SignBitmap batched), so an overflow is a
clean ValueError instead of a wrap -> out-of-bounds panic. The output-buffer
guard is kept (now provably safe).
bucket_ranks([]) already returned [] (the core maps over the empty slice and
never calls rank_to_bucket, so its `d > 0` assert is unreachable — verified
empirically), but the static analyzers (qodo Bug / Codex P1) keep flagging the
empty path because they cannot see the iterator short-circuit. Add an explicit
`slice.is_empty()` early-return so the empty -> empty contract is local to the
boundary and the d == 0 case provably never reaches the core. Behavior
unchanged; the existing test_bucket_ranks_empty_returns_empty regression still
pins empty -> empty.
… APIs

Bitmap::top_m_candidates_batched and SignBitmap::top_m_candidates_batched
allocated `vec![0u32; batch * n]` and `vec![0u64; batch * qpv]` with unchecked
multiplication. On a 32-bit target (wasm32, which this crate supports) a
moderate corpus plus a large query batch can overflow usize, silently
under-sizing those buffers and then indexing out of bounds. Compute the lengths
with checked_mul + a clear expect() so the failure is loud on every target.

The Python binding already guards these (it rejects/clamps before calling the
core), so this closes the Rust public-API / wasm32 side. Behavior unchanged for
non-overflowing inputs; gate green including the wasm32 +simd128 build.
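
A worked example of the 32-bit wrap, modelled in Python. The helper names and the sample sizes are illustrative assumptions; the arithmetic is what an unchecked usize multiply does on a 32-bit target.

```python
USIZE32_MAX = (1 << 32) - 1  # usize::MAX on a 32-bit target such as wasm32


def wrapping_len(batch: int, n: int) -> int:
    """Unchecked multiply as a 32-bit usize would compute it (wraps)."""
    return (batch * n) & USIZE32_MAX


def checked_len(batch: int, n: int) -> int:
    """Checked multiply: raise instead of returning a wrapped length."""
    product = batch * n
    if product > USIZE32_MAX:
        raise OverflowError("batch * n overflows a 32-bit usize")
    return product
```

With batch = 4096 and n = 2**20 + 1 the true product is 2**32 + 4096, but the wrapped length is only 4096 elements, so a buffer sized that way is far too small for the indices the kernel later uses. The checked variant turns the same inputs into an immediate error.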
@Fieldnote-Echo
Owner Author

Merge-review follow-up

Closed (Medium): unchecked batch * n / batch * qpv in core batched APIs — f6add35

Bitmap::top_m_candidates_batched and SignBitmap::top_m_candidates_batched now size their scores (batch * n) and q_batch (batch * qpv) buffers with checked_mul().expect(...). On a 32-bit target (wasm32) a moderate corpus + large query batch could overflow usize, under-size the buffer and then index OOB; this now fails loud on every target. The Python binding already guarded this before calling the core (cade149), so this closes the Rust public-API / wasm32 side you flagged.

Gate re-run green including the wasm32 + simd128 lib build; cargo test (all configs), MSRV 1.89, cargo deny, and pytest (150) all pass.

Minor polish (rank_to_bucket rank >= d, bucket_centre bucket >= 1<<bits): left as-is, deliberately

These are total functions in the Rust core — no panic, no silent-corruption: rank_to_bucket caps an out-of-range rank at the top bucket, and bucket_centre returns a defined (if out-of-domain) centre. Adding Python-only ValueErrors would make the binding stricter than the Rust API, which contradicts this PR's parity goal (anything callable in Rust should be callable from Python with the same result).

This is the opposite of pack_buckets, where I did add a range check: there the core silently masks (b & mask), so pack(7) → unpack → 3 is a silent round-trip corruption (a serialization footgun) rather than a defined cap. That asymmetry is intentional.
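
The masking footgun is easy to reproduce in a small model. Both functions below are hypothetical and store one code per slot instead of bit-packing, since the byte layout is not described here; they only contrast silent masking with a boundary range check.

```python
def pack_masked(buckets: list, bits: int) -> list:
    """Model of a packer that masks each code to `bits` bits, unvalidated."""
    mask = (1 << bits) - 1
    return [b & mask for b in buckets]


def pack_checked(buckets: list, bits: int) -> list:
    """Boundary-style packer: reject codes outside [0, 1 << bits)."""
    limit = 1 << bits
    for b in buckets:
        if not 0 <= b < limit:
            raise ValueError(f"bucket {b} is outside [0, {limit})")
    return list(buckets)
```

With bits = 2, `pack_masked([7], 2)` stores 3 without complaint, whereas `pack_checked([7], 2)` raises `ValueError`.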

Happy to add the two domain ValueErrors anyway if you'd prefer the stricter Python surface — it's a small, deliberate parity divergence, your call.

@Fieldnote-Echo Fieldnote-Echo merged commit 982e91f into main May 25, 2026
17 checks passed
@Fieldnote-Echo Fieldnote-Echo deleted the fix/pre-flip-audit-remediation branch May 25, 2026 03:29
Fieldnote-Echo added a commit that referenced this pull request May 25, 2026
Addresses the PR #30 bot-review findings (gemini x5, copilot x1):

- gemini (high): the add() growth paths could overflow usize when computing
  the resize length on 32-bit targets (wasm32 / armv7), where MAX_VECTORS
  (2^26) * a large dim / qpv / bytes-per-vec (up to ~2^16) exceeds usize::MAX.
  Centralized the guard in checked_new_len, which now also takes
  elems_per_vec and checked_mul's new_n * elems_per_vec (fail-loud), matching
  the crate's checked-allocation discipline (cf. #23 candidate APIs,
  result_buffer_len). Callers pass dim / packed-bytes-per-vec / qwords-per-vec.
  Bounded in practice by Rust's <= isize::MAX-bytes Vec invariant, but now
  explicit. +unit test checked_new_len_rejects_buffer_overflow.
- copilot: the write-guard test's temp filenames used only the pid (which the
  OS reuses), so a leftover file from an aborted run could spuriously fail the
  !exists() checks. Added a per-run nanosecond nonce (matching `forge`).

(copilot's other finding — pub(crate) breaking the fuzz crate — was already
fixed in 9713f0d, before that review landed.)

Gate green: fmt, clippy -D warnings, rustdoc -D warnings, test
(default/experimental/no-default), MSRV 1.89, --locked, +nightly fuzz build.