fix(core): cap add() growth at MAX_VECTORS; smoke-test the sdist install path#30
Conversation
…all path Pre-public final hardening — triage of a clean-context release-readiness audit. Two remediations; three low-value-but-real findings filed as issues. Core (closes #25; folds in the rank_io write-coherence concern): - New util::checked_new_len — a shared, fail-loud guard rejecting any add() that would grow an index past rank_io::MAX_VECTORS (the on-disk loader cap). Wired into Rank/RankQuant/Bitmap/SignBitmap add() before the resize, matching add()'s existing contract asserts (non-finite / bad-length). - Freezes the public contract as "an index holds at most MAX_VECTORS docs": candidate IDs are u32 and MAX_VECTORS << u32::MAX, so every emitted ID stays representable, and a built index always round-trips through write/load. This also makes the usize->u32 header cast in rank_io::write_* unreachable. Documents the new # Panics on each add(). CI (finding #3): - release-python.yml: install the freshly built sdist *from source* and run pytest before upload, exercising the maturin PEP 517 compile path that the prebuilt-wheel legs never cover. Validated locally — the sdist vendors the core crate, pip-installs from source, and all pytest pass. Deferred (low value, real): GIL release in the binding (#27), rank_to_bucket fail-loud consistency (#28), macos-13 release-wheel divergence (#29).
Review Summary by QodoCap index growth at MAX_VECTORS and smoke-test sdist install
WalkthroughsDescription• Cap add() growth at MAX_VECTORS to prevent index overflow - New util::checked_new_len() guard validates growth before resize - Wired into Rank, RankQuant, Bitmap, SignBitmap add() methods - Ensures candidate IDs remain representable as u32 and round-trip through write/load • Add sdist smoke test to CI pipeline - release-python.yml now installs freshly built sdist from source - Exercises maturin PEP 517 compile path not covered by prebuilt wheels - Runs full pytest suite before upload to catch packaging issues early • Document capacity contract and panic conditions - Added # Panics sections to all four add() methods - Clarifies MAX_VECTORS as hard capacity limit matching on-disk loader Diagramflowchart LR
A["add() calls<br/>Rank/RankQuant/Bitmap/SignBitmap"] -->|"checked_new_len()"| B["Validate growth<br/>against MAX_VECTORS"]
B -->|"passes"| C["Update n_vectors<br/>and resize storage"]
B -->|"exceeds limit"| D["Panic with<br/>clear error message"]
E["sdist build"] -->|"install from source"| F["Compile Rust extension<br/>via maturin PEP 517"]
F -->|"success"| G["Run pytest suite<br/>before upload"]
G -->|"all pass"| H["Upload to crates.io"]
File Changes1. src/rank.rs
|
Code Review by Qodo
1.
|
There was a problem hiding this comment.
Code Review
This pull request introduces a centralized validation utility, checked_new_len, to ensure that adding vectors to various index types (Bitmap, RankQuant, Rank, and SignBitmap) does not exceed the MAX_VECTORS capacity or overflow usize. It also includes comprehensive unit tests and updated documentation regarding panic conditions. The reviewer recommends further strengthening these checks by validating that the total memory allocation (document count multiplied by vector dimensions) does not overflow usize, particularly for 32-bit platforms where large indices could still cause issues.
…ound-trip docs Codex stop-time review: the MAX_VECTORS add() cap bounds the document *count*, not the byte payload, so it does not by itself guarantee a write/load round-trip — the loaders also enforce a separate 128 GiB MAX_PAYLOAD cap that an n_vectors cap doesn't bound (MAX_VECTORS allows ~8 TiB at dim=u16::MAX). An index with n_vectors <= MAX_VECTORS but a >128 GiB payload (e.g. ~20M x dim-4096) wrote fine but load() rejected it. - write_rank / write_rankquant / write_bitmap / write_sign_bitmap now call check_payload_bytes *before* File::create, mirroring each loader's checked-arithmetic payload computation. write() returns io::ErrorKind::InvalidData instead of emitting a file load() refuses, so write/load are symmetric (cf. check_sign_bitmap_dim, which exists for the same reason). Checking before File::create means a rejected oversized write cannot truncate an existing valid file. - Corrected the overclaim in the four add() # Panics docs and util::checked_new_len: the MAX_VECTORS cap bounds the count and keeps u32 IDs representable; the payload cap is an independent bound. The MAX_VECTORS and check_payload_bytes docs now state the two-cap contract. - Test: rank_io_writers_reject_oversized_payload_without_truncating — oversized dims (with empty payload slices, so no allocation) return InvalidData, create no file, and leave an existing valid file intact. Gate green: fmt, clippy -D warnings, test (default/experimental/no-default), MSRV 1.89, --locked.
…-fn) Codex stop-review #2: the previous commit over-claimed write/load "symmetry" for the public write_* free functions. Those are low-level trusted serializers — they do NOT re-validate dim / n_vectors / n_top / bits / divisibility / data semantics, all of which the loaders check, so a direct caller can still produce a file load_* rejects. The MAX_PAYLOAD guard closed only one of several asymmetries. The actual round-trip guarantee is type-level and was already designed in: Rank/RankQuant/Bitmap/SignBitmap::new validate dim / n_top / bits / divisibility against the loaders' bounds (Bitmap::new's comment says so explicitly), add() caps n_vectors, write() caps payload, and the types emit only loader-valid data — so anything T::write produces, T::load reloads. - rank_io module docs gain a "Round-trip contract" section: round-trip is a type-level guarantee; write_* are trusted serializers assuming loader-valid input; MAX_PAYLOAD is the one loader bound they also enforce (defense-in-depth + no truncation before File::create). - Dropped "symmetric write/load" from the four writer comments, check_payload_bytes, and the MAX_VECTORS doc. - Fixed a pre-existing module-doc inaccuracy: MAX_DIM * MAX_VECTORS is ~8 TiB, not 128 GiB — MAX_PAYLOAD is the binding byte ceiling. Doc/comment-only; no behavior change. Gate green: fmt, clippy -D warnings, rustdoc -D warnings, test (default/experimental/no-default).
…he persistence API Codex stop-review #2 follow-through (Nelson's call): the raw rank_io write_*/load_* free functions were public but are trusted serializers, not validated constructors — leaving them public invited the wrong mental model (and was the root of the retracted "write/load symmetry" claim). Close the door before crates.io locks the surface. - write_rank / load_rank / write_rankquant / load_rankquant / write_bitmap / load_bitmap / write_sign_bitmap / load_sign_bitmap -> pub(crate). The MAX_DIM / MAX_SIGN_BITMAP_DIM / MAX_VECTORS constants stay public. - The supported persistence API is now unambiguously the index types' write()/load(): Rank / RankQuant / Bitmap / SignBitmap. Module docs updated to a "Persistence API & round-trip contract" section. - Relocated the rank_io-layer tests to a src/rank_io.rs unit-test module (they need crate-internal access): the TV-DESER-004/005 loader red-team tests (from tests/redteam_delta.rs, now deleted) and the write-guard test (from tests/index/main.rs). Same coverage, co-located with the code. The Python binding only consumed the MAX_* constants (unchanged); examples and benches don't touch the free functions — no other fallout. Gate green: fmt, clippy -D warnings, rustdoc -D warnings, test (default/experimental/no-default), MSRV 1.89, --locked.
There was a problem hiding this comment.
Pull request overview
This PR hardens ordvec’s persistence and growth contracts before publication by enforcing consistent capacity limits across in-memory add() paths and on-disk writers/loaders, and by extending release CI to smoke-test the Python sdist-from-source install path.
Changes:
- Add a shared
util::checked_new_lenguard and wire it intoRank/RankQuant/Bitmap/SignBitmapadd()to cap growth atrank_io::MAX_VECTORS(and document the panic contract). - Enforce the loader’s
MAX_PAYLOADcap on allrank_io::write_*functions beforeFile::create, and make the rawwrite_*/load_*helperspub(crate)with relocated unit tests. - Update
release-python.ymlto install the built sdist and run pytest prior to upload.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
tests/redteam_delta.rs |
Removes integration tests that can no longer access crate-internal rank_io helpers (tests moved into unit tests). |
src/util.rs |
Adds checked_new_len + unit tests to cap in-memory growth at MAX_VECTORS. |
src/rank.rs |
Uses checked_new_len in Rank::add and documents panics. |
src/quant.rs |
Uses checked_new_len in RankQuant::add and documents panics. |
src/bitmap.rs |
Uses checked_new_len in Bitmap::add and documents panics. |
src/sign_bitmap.rs |
Uses checked_new_len in SignBitmap::add and documents panics. |
src/rank_io.rs |
Makes raw persistence helpers pub(crate), adds write-side MAX_PAYLOAD checks, updates docs, and relocates loader/writer tests. |
.github/workflows/release-python.yml |
Adds an sdist install-from-source + pytest smoke test to the release pipeline. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
e50a9e0 made rank_io::load_* pub(crate); the fuzz/ crate is a separate (workspace-excluded) crate that called ordvec::rank_io::load_* directly, so `cargo +nightly fuzz build` broke. (I missed fuzz/ when mapping the pub(crate) fallout — it is not in the workspace grep set.) Re-point all four loader targets at the public type methods — Rank::load / RankQuant::load / Bitmap::load / SignBitmap::load — which run the same loader code plus the type's post-load checks (the full public load path), so the no-panic fuzzing contract is preserved (arguably strengthened). Updated the target docs (the loaders are crate-internal now, not pub). Verified: cargo +nightly fuzz build green; fuzz fmt clean.
Addresses the PR #30 bot-review findings (gemini x5, copilot x1): - gemini (high): the add() growth paths could overflow usize when computing the resize length on 32-bit targets (wasm32 / armv7), where MAX_VECTORS (2^26) * a large dim / qpv / bytes-per-vec (up to ~2^16) exceeds usize::MAX. Centralized the guard in checked_new_len, which now also takes elems_per_vec and checked_mul's new_n * elems_per_vec (fail-loud), matching the crate's checked-allocation discipline (cf. #23 candidate APIs, result_buffer_len). Callers pass dim / packed-bytes-per-vec / qwords-per-vec. Bounded in practice by Rust's <= isize::MAX-bytes Vec invariant, but now explicit. +unit test checked_new_len_rejects_buffer_overflow. - copilot: the write-guard test's temp filenames used only the pid (which the OS reuses), so a leftover file from an aborted run could spuriously fail the !exists() checks. Added a per-run nanosecond nonce (matching `forge`). (copilot's other finding — pub(crate) breaking the fuzz crate — was already fixed in 9713f0d, before that review landed.) Gate green: fmt, clippy -D warnings, rustdoc -D warnings, test (default/experimental/no-default), MSRV 1.89, --locked, +nightly fuzz build.
Bot-review finding (supply-chain): the sdist smoke-test step added in this PR used dtolnay/rust-toolchain@stable, a moving ref — inconsistent with the SHA-pinned surrounding steps, and a supply-chain risk in a workflow that mints OIDC publish tokens. Pin all three uses in the publish pipeline (release-python.yml x1, release-crate.yml x2) to 29eef336d9b2848a0b548edc03f92a220660cdb8 (the stable branch HEAD, 2026-03-27). When pinned to a SHA the action cannot infer the channel from the ref, so each gains an explicit `with: toolchain: stable` — immutable action code, current stable compiler. CI workflows (ci.yml, python.yml) are out of scope (not the publish pipeline). Both release workflows are workflow_dispatch-only (publish-held); change validated by YAML parse + grep (not exercised by PR CI).
|
/agentic_review |
|
Persistent review updated to latest commit 0ce17e1 |
…buffer panic Second bot-review round on PR #30 (qodo + copilot): - (qodo + copilot) checked_new_len_accepts_up_to_max asserted checked_new_len(0, MAX_VECTORS, 4096) succeeds — but MAX_VECTORS*4096 = 2^38 overflows usize on 32-bit (wasm32/armv7), where the guard correctly panics, so the test would have failed the wasm32 CI lane. Gated that one assertion to target_pointer_width="64"; the 32-bit panic path stays covered by checked_new_len_rejects_buffer_overflow. - (copilot) the write-guard test used bm_dim=65536, which exceeds MAX_DIM (u16::MAX=65535) and is loader-invalid. Switched to 32768 (multiple of 64, within MAX_DIM, payload 256 GiB > 128 GiB) so the test triggers the payload guard on a loader-valid dim. - (copilot x4) documented the buffer-length usize-overflow panic (32-bit) in the # Panics section of all four add() methods. Gate green incl. a wasm32-unknown-unknown lib check; fmt, clippy -D warnings, rustdoc -D warnings, test (default/experimental/no-default), MSRV 1.89, --locked, +nightly fuzz build.
Summary
Pre-public final hardening from triaging a clean-context release-readiness audit. The audit's headline "blockers" were largely phantom against the real tree (the crates.io release workflow already exists and is SHA-pinned; the
u32-cap finding was already tracked as #25). This PR remediates the genuinely worth-doing-before-publish items and files the rest.No correctness bug was found. PUBLISH stays HELD — this is pre-flip cleanup.
Remediated
u32doc-ID capacity cap (closes #25)The on-disk loaders cap
n_vectorsatMAX_VECTORS(64 Mi), but the four in-memoryadd()paths did not — so an index could in principle grow past the candidate APIs'u32ID space.util::checked_new_len(current, adding, elems_per_vec)— a shared fail-loud guard; panics if growth would exceedrank_io::MAX_VECTORSor if the resultingnew_n × elems_per_vecbuffer length would overflowusizeon 32-bit targets (wasm32/armv7, whereMAX_VECTORS × MAX_DIM≈ 8 TiB ≫usize::MAX). Wired into all fouradd()before the resize, matching the existing contract asserts.# Panicsdocumented. (The buffer-overflow guard and the per-run-nonce test fix below came from the gemini/copilot review round — commit2559285.)MAX_VECTORSdocuments (MAX_VECTORS << u32::MAX, every candidate ID representable). Bounds the count; payload is bounded separately, below.Writer payload cap + corrected round-trip framing (Codex stop-reviews)
1.
MAX_PAYLOADenforced on writes. The loaders reject payloads over 128 GiB, but the writers didn't —MAX_VECTORS(a count cap) allows ~8 TiB atdim=u16::MAX, so a >128 GiB index (~20 M × dim-4096) wrote fine butload()rejected it. All four writers now callcheck_payload_bytesbeforeFile::create(a rejected oversized write can't truncate an existing file). Folds in audit finding #5.2. Round-trip is a type-level guarantee. An earlier draft over-claimed "write/load symmetry" for the raw writers; they don't re-validate
dim/n_vectors/n_top/bits/divisibility/semantics. The guarantee lives in the index types:*::newvalidate against the loaders' bounds,add()capsn_vectors,write()caps payload, and they emit only loader-valid data — so anythingT::writeproduces,T::loadreloads. (Also fixed a pre-existing module-doc line implyingMAX_DIM × MAX_VECTORSfits 128 GiB — it's ~8 TiB.)Froze the persistence API surface (
rank_iowriters/loaders →pub(crate))Since the raw
rank_iofree functions are trusted serializers, not validated constructors, leaving them public invited the wrong mental model. Pre-release is when to close that door.write_*/load_*functions are nowpub(crate);MAX_DIM/MAX_SIGN_BITMAP_DIM/MAX_VECTORSstaypub.write()/load()(Rank/RankQuant/Bitmap/SignBitmap) — stated in the module docs.MAX_*constants (unchanged). Fallout was confined to the test/fuzz harnesses: (a) the TV-DESER-004/005 loader red-team tests (tests/redteam_delta.rs, deleted) and the write-guard test (tests/index/main.rs) moved into asrc/rank_io.rsunit-test module (crate-internal access; tuple-level assertions preserved); (b) the fourcargo-fuzzloader targets, which calledrank_io::load_*directly, now drive the publicType::loadmethods — the same loader code via the type wrapper, so the no-panic fuzz contract is preserved. Same coverage throughout.sdist install smoke test (audit finding #3)
Wheels were install + pytest-tested per platform; the sdist was built and uploaded untested.
build-sdistnow installs the sdist from source (maturin PEP 517 compile of the vendored core crate) and runs pytest before upload. Validated locally: sdist vendors the core crate, compiles the abi3 extension from source, all 151 pytest pass.Filed as issues (low value, real — not blockers)
rank_to_bucketshould rejectrank >= dto match fix: pre-public public-API contract hardening (release-readiness triage) #26'sbucket_centrefail-loudmacos-13x86_64 wheel leg inrelease-python.ymldiverges frompython.ymlTest plan (full
CLAUDE.mdgate, all green)cargo fmt --all --check+cargo fmt --manifest-path fuzz/Cargo.toml --checkcargo clippy --all-targets --all-features -- -D warningsRUSTDOCFLAGS="-D warnings" cargo doc --no-deps --all-features(links resolve underpub(crate))cargo test(default),--features experimental,--no-default-features— green; new/relocated tests:checked_new_len×3,rank_io::tests::writers_reject_oversized_payload_without_truncating, TV-DESER-004/005cargo +1.89.0 build(MSRV),cargo build --locked,RUSTFLAGS="-D warnings" cargo buildcargo +nightly fuzz build(the four loader targets compile against the public API)ci.yml+python.yml) on push