docs: de-fiction extracted crate + dual MIT/Apache license #2
All 26 clippy errors fixed across src/, tests/, and examples/:

- manual_is_multiple_of (13×): x % n == 0 → x.is_multiple_of(n), stable since 1.87, safe on MSRV 1.89
- manual_range_contains (2×): negated range comparisons → !(a..=b).contains(&x)
- manual_repeat_n (1×): repeat(v).take(n) → repeat_n(v, n), stable 1.82
- too_many_arguments (7×): #[allow] with justifying comment on scan_b2_fastscan_avx512, scan_b2_fastscan_scalar, scan_via_lut_scalar (src), finalise_row, bench_two_stage, bench_two_stage_batched, bench_sign_two_stage_batched (examples)
- needless_range_loop (9×): #[allow] on all SIMD kernel loops (bitmap.rs AVX-512 kernels, sign_bitmap.rs AVX-512 kernel, fastscan.rs scalar finalize) plus two clear mechanical rewrites in tests/rank_index/quant.rs and two #[allow] in tests/rank_index/index.rs (raw index used in assertion message)

cargo fmt --all run; reformatted bitmap.rs, fastscan.rs and several test/example files. No behavior change.
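For reviewers who have not met these lints, the three mechanical rewrites listed above look roughly like the sketch below. The function names are hypothetical and are not taken from the crate; only the standard-library calls (`is_multiple_of`, `RangeInclusive::contains`, `std::iter::repeat_n`) are real.

```rust
use std::iter::repeat_n;

/// manual_is_multiple_of: `dim % lanes == 0` becomes `dim.is_multiple_of(lanes)`.
/// Unlike `%`, `is_multiple_of(0)` does not panic; it returns true only when
/// `dim` is itself 0.
pub fn is_lane_aligned(dim: usize, lanes: usize) -> bool {
    dim.is_multiple_of(lanes)
}

/// manual_range_contains: `x < lo || x > hi` becomes a negated range check.
pub fn is_out_of_range(x: u32, lo: u32, hi: u32) -> bool {
    !(lo..=hi).contains(&x)
}

/// manual_repeat_n: `repeat(v).take(n)` becomes `repeat_n(v, n)`.
pub fn filled(v: u8, n: usize) -> Vec<u8> {
    repeat_n(v, n).collect()
}
```

Each rewrite is equivalent to the old form for the non-zero divisors used in practice, which is consistent with the "no behavior change" note in the commit.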
Old lockfile was version 3 and still listed serde as a transitive dependency that no longer exists in the dependency tree. Regenerated with `cargo generate-lockfile`; new lockfile is version 4, contains no serde entries, and passes `cargo build --locked` and `cargo test --locked`.
Review Summary by Qodo
De-fiction crate and implement dual MIT/Apache-2.0 licensing
Walkthroughs
Description
- **Remove fictional performance claims** from documentation: Strip TurboQuant/Harrier performance comparisons and replace with reproducible in-repo benchmark references
- **Genericize codebase references**: Remove TurboQuant-specific and Harrier embedding family references; rename temporary files from turbovec_* to ordvec_*
- **Implement dual MIT/Apache-2.0 licensing**: Add LICENSE-APACHE file, update LICENSE-MIT with correct copyright holder (Nelson Spence), and update README with dual-license badge and documentation
- **Modernize code patterns**: Replace modulo checks with is_multiple_of() method calls, update iterator patterns to use repeat_n(), and improve range checking with idiomatic Rust patterns
- **Improve code quality**: Add clippy lint suppressions with justifications, reorganize imports to follow Rust conventions, and reformat code for consistency and readability
- **Regenerate benchmark results**: Update benchmarks/rank_modes_results.txt with new FastScan benchmark row and refreshed performance metrics
- **Rewrite documentation**: Completely rewrite docs/RANK_MODES.md to focus on synthetic in-repo benchmarks instead of external corpus claims; fix internal path references in docs/FOLLOWUP_BODY_KERNEL_TIE_BREAK.md

Diagram
flowchart LR
A["Fictional claims<br/>TurboQuant/Harrier refs"] -->|Remove| B["Reproducible<br/>in-repo benchmarks"]
C["Single MIT license<br/>Ryan Codrai copyright"] -->|Update| D["Dual MIT/Apache-2.0<br/>Nelson Spence copyright"]
E["turbovec namespace<br/>references"] -->|Genericize| F["ordvec-only<br/>references"]
G["Legacy code patterns<br/>modulo/repeat"] -->|Modernize| H["Idiomatic Rust<br/>is_multiple_of/repeat_n"]
B --> I["Publishable crate"]
D --> I
F --> I
H --> I
Code Review
This pull request transitions the project from turbovec to ordvec, updating licenses to a dual MIT/Apache-2.0 scheme and changing the copyright holder. It removes the serde dependency and introduces the RankQuantFastscanIndex for optimized b=2 scans. Extensive updates were made to documentation and benchmarks to reflect the new project name and performance characteristics. The codebase underwent significant formatting and linting improvements, including the use of is_multiple_of for clarity and clippy attribute additions to suppress specific warnings in SIMD kernels. I have no feedback to provide.
The x86 SIMD dispatch (select_simd_tier + the SimdTier match arms, the AVX kernels, BATCHED_AVX512_CHUNK) is cfg(target_arch=x86_64)-gated, but the glue it references — the SimdTier::Avx2/Avx512 variants, the batched-chunk consts, and the simd_tier / centre_drop_used bindings — was defined unconditionally. On non-x86 (aarch64 / macos-latest CI) those are dead/unused and, under RUSTFLAGS=-D warnings, fail the build with 7 dead_code/unused errors. Add cfg_attr(not(target_arch=x86_64), allow(...)) to each so the crate builds clean on aarch64 (scalar path) while x86 is untouched. Verified: aarch64 lib + tests + examples compile clean under -D warnings; x86 fmt/clippy/test 80/86.
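A minimal sketch of the `cfg_attr` pattern this commit describes. The names echo the commit message, but the enum layout, the chunk value, and the `chunk_len` helper are assumptions made for illustration and are not the crate's real code.

```rust
/// Stand-in for the crate's tier enum. On non-x86 targets the AVX variants
/// are never constructed, so without the `cfg_attr` they would trip
/// `dead_code` under `-D warnings`.
#[cfg_attr(not(target_arch = "x86_64"), allow(dead_code))]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SimdTier {
    Scalar,
    Avx2,
    Avx512,
}

/// Runtime feature detection; compiled only on x86_64.
#[cfg(target_arch = "x86_64")]
fn select_simd_tier() -> SimdTier {
    if is_x86_feature_detected!("avx512f") {
        SimdTier::Avx512
    } else if is_x86_feature_detected!("avx2") {
        SimdTier::Avx2
    } else {
        SimdTier::Scalar
    }
}

/// Every other architecture takes the scalar path.
#[cfg(not(target_arch = "x86_64"))]
fn select_simd_tier() -> SimdTier {
    SimdTier::Scalar
}

/// Read only by the AVX-512 arm below (the value 8 is hypothetical).
#[cfg_attr(not(target_arch = "x86_64"), allow(dead_code))]
const BATCHED_AVX512_CHUNK: usize = 8;

/// Rows processed per step for a given tier.
fn chunk_len(tier: SimdTier) -> usize {
    match tier {
        #[cfg(target_arch = "x86_64")]
        SimdTier::Avx512 => BATCHED_AVX512_CHUNK,
        _ => 1,
    }
}
```

Using `cfg_attr(..., allow(...))` rather than gating the items themselves keeps the type and constant definitions identical on every target, so only the lint level differs between x86 and aarch64.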
Strip TurboQuant/Harrier perf claims from doc comments: fastscan.rs now points at the reproducible in-repo bench row and the top-10 parity test instead of "1.63x faster than TurboQuant b=2 on Harrier 207k". Genericize turbovec-internal doc refs (index.rs swap_remove, rank_io.rs format note) and rename the .tvsb temp files turbovec_* -> ordvec_*. Drop "Harrier" from the sign_bitmap embedding-family example.

Rewrite docs/RANK_MODES.md as standalone (no TurboQuant baseline table; synthetic in-repo headline, real-corpus results deferred to the paper) and fix the stale path in FOLLOWUP_BODY_KERNEL_TIE_BREAK.md. Regenerate benchmarks/rank_modes_results.txt with the FastScan row.

Dual-license MIT OR Apache-2.0: LICENSE-MIT (Nelson Spence) + LICENSE-APACHE, drop the scaffold's bare LICENSE. ordvec is original rank/sign work (no TurboQuant code), so copyright is Nelson's; the turbovec origin stays as a provenance note in README/lib.rs.
5f8593d to 9fc76c5
… review)

Address the bot review wave (gemini/Codex/qodo), all "convert core panics to clean Python errors", completing the binding's boundary-guard design:

- Width validation (check_width): every f32 input now checks ncols == dim (2-D) / len == dim (1-D). The core derives n = len/dim and only asserts divisibility, so a wrong-but-divisible shape (e.g. (1,128) into a dim-64 index) was silently reinterpreted as a different vector count, or panicked on the result reshape. Now a clean ValueError. (gemini x3 critical, Codex x2 P1, qodo #3)
- Constructor validation: Rank/RankQuant/Bitmap/SignBitmap `new` return PyResult and validate against the EXACT core asserts (dim in [2, u16::MAX]; bits in {1,2,4} + dim multiple of 8/bits and 2^bits; dim % 64 + 0 < n_top < dim; dim % 64 + <= MAX_SIGN_BITMAP_DIM) -> ValueError instead of panic. (gemini x4)
- swap_remove (Rank, RankQuant): bounds-check -> IndexError, not panic. (gemini high, qodo #4)
- README provenance tightened to the canonical "developed within turbovec, factored out" phrasing. (qodo #2)

Tests: +9 (width-mismatch x6, swap_remove OOB x3); constructor-rejection tests tightened from BaseException to ValueError. Suite now 117 passed + 1 xfail. clippy -D warnings + fmt clean; MSRV 1.89 builds core + binding.

Not changed: qodo #1 (ndarray via numpy) is a deliberate, documented core-vs-binding split (deps grep + publish scoped to -p ordvec; the core's published lock is clean; the binding is publish = false, PyPI-only), explained on-thread.
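The width-mismatch hazard in this commit can be illustrated in plain Rust. This is a sketch under assumptions: `WidthError` and `core_vector_count` are hypothetical stand-ins, and the real binding raises a Python ValueError rather than returning this error type.

```rust
/// Hypothetical error standing in for the binding's ValueError.
#[derive(Debug, PartialEq, Eq)]
pub struct WidthError {
    pub expected: usize,
    pub got: usize,
}

/// Mirrors the core behaviour described above: derive the vector count from
/// the flat length and assert only divisibility.
pub fn core_vector_count(flat_len: usize, dim: usize) -> usize {
    assert!(flat_len.is_multiple_of(dim), "length not divisible by dim");
    flat_len / dim
}

/// Boundary guard: reject a 2-D input whose column count differs from `dim`
/// before its flat buffer ever reaches the core.
pub fn check_width(ncols: usize, dim: usize) -> Result<(), WidthError> {
    if ncols == dim {
        Ok(())
    } else {
        Err(WidthError { expected: dim, got: ncols })
    }
}
```

A (1,128) array handed to a dim-64 index has a flat length of 128, so `core_vector_count(128, 64)` would report two vectors; `check_width(128, 64)` rejects the same input up front.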
…-fn)

Codex stop-review #2: the previous commit over-claimed write/load "symmetry" for the public write_* free functions. Those are low-level trusted serializers: they do NOT re-validate dim / n_vectors / n_top / bits / divisibility / data semantics, all of which the loaders check, so a direct caller can still produce a file load_* rejects. The MAX_PAYLOAD guard closed only one of several asymmetries.

The actual round-trip guarantee is type-level and was already designed in: Rank/RankQuant/Bitmap/SignBitmap::new validate dim / n_top / bits / divisibility against the loaders' bounds (Bitmap::new's comment says so explicitly), add() caps n_vectors, write() caps payload, and the types emit only loader-valid data, so anything T::write produces, T::load reloads.

- rank_io module docs gain a "Round-trip contract" section: round-trip is a type-level guarantee; write_* are trusted serializers assuming loader-valid input; MAX_PAYLOAD is the one loader bound they also enforce (defense-in-depth + no truncation before File::create).
- Dropped "symmetric write/load" from the four writer comments, check_payload_bytes, and the MAX_VECTORS doc.
- Fixed a pre-existing module-doc inaccuracy: MAX_DIM * MAX_VECTORS is ~8 TiB, not 128 GiB; MAX_PAYLOAD is the binding byte ceiling.

Doc/comment-only; no behavior change. Gate green: fmt, clippy -D warnings, rustdoc -D warnings, test (default/experimental/no-default).
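The distinction between a trusted serializer and a type-level round-trip guarantee can be sketched with a toy format. Everything here is hypothetical (the 4-byte header, `write_raw`, `load_raw`, the `Index` type, and the bounds); it is not the crate's file format, only an illustration of where validation lives.

```rust
// Hypothetical bounds for the toy format.
const MIN_DIM: usize = 2;
const MAX_DIM: usize = u16::MAX as usize;

/// Trusted serializer: writes whatever it is given, with no validation.
fn write_raw(dim: usize, data: &[u8]) -> Vec<u8> {
    let mut out = (dim as u32).to_le_bytes().to_vec();
    out.extend_from_slice(data);
    out
}

/// Loader: validates everything it reads.
fn load_raw(bytes: &[u8]) -> Result<(usize, Vec<u8>), String> {
    let header: [u8; 4] = bytes
        .get(..4)
        .and_then(|h| h.try_into().ok())
        .ok_or("truncated header")?;
    let dim = u32::from_le_bytes(header) as usize;
    if !(MIN_DIM..=MAX_DIM).contains(&dim) {
        return Err(format!("dim {dim} out of range"));
    }
    let data = &bytes[4..];
    if !data.len().is_multiple_of(dim) {
        return Err("payload length not a multiple of dim".to_string());
    }
    Ok((dim, data.to_vec()))
}

/// Index type: its constructor and `add` enforce the loader's bounds, so
/// anything `write` emits is accepted by `load`.
pub struct Index {
    dim: usize,
    data: Vec<u8>,
}

impl Index {
    pub fn new(dim: usize) -> Result<Self, String> {
        if !(MIN_DIM..=MAX_DIM).contains(&dim) {
            return Err(format!("dim {dim} out of range"));
        }
        Ok(Self { dim, data: Vec::new() })
    }

    pub fn add(&mut self, row: &[u8]) -> Result<(), String> {
        if row.len() != self.dim {
            return Err(format!("row has {} values, expected {}", row.len(), self.dim));
        }
        self.data.extend_from_slice(row);
        Ok(())
    }

    pub fn write(&self) -> Vec<u8> {
        write_raw(self.dim, &self.data)
    }

    pub fn load(bytes: &[u8]) -> Result<Self, String> {
        let (dim, data) = load_raw(bytes)?;
        Ok(Self { dim, data })
    }
}
```

In this sketch a direct call such as `write_raw(1, &[0])` produces bytes that `load_raw` rejects, while every value built through `Index::new` and `Index::add` reloads cleanly. That is the asymmetry the commit message describes.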
…he persistence API

Codex stop-review #2 follow-through (Nelson's call): the raw rank_io write_*/load_* free functions were public but are trusted serializers, not validated constructors. Leaving them public invited the wrong mental model (and was the root of the retracted "write/load symmetry" claim). Close the door before crates.io locks the surface.

- write_rank / load_rank / write_rankquant / load_rankquant / write_bitmap / load_bitmap / write_sign_bitmap / load_sign_bitmap -> pub(crate). The MAX_DIM / MAX_SIGN_BITMAP_DIM / MAX_VECTORS constants stay public.
- The supported persistence API is now unambiguously the index types' write()/load(): Rank / RankQuant / Bitmap / SignBitmap. Module docs updated to a "Persistence API & round-trip contract" section.
- Relocated the rank_io-layer tests to a src/rank_io.rs unit-test module (they need crate-internal access): the TV-DESER-004/005 loader red-team tests (from tests/redteam_delta.rs, now deleted) and the write-guard test (from tests/index/main.rs). Same coverage, co-located with the code.

The Python binding only consumed the MAX_* constants (unchanged); examples and benches don't touch the free functions, so there is no other fallout. Gate green: fmt, clippy -D warnings, rustdoc -D warnings, test (default/experimental/no-default), MSRV 1.89, --locked.
De-fiction + dual MIT/Apache license
Removes fictional / leaked claims so the crate is publishable alongside the paper — no fabricated benchmarks, no private-corpus references.
Changes
- `fastscan.rs` now points at the reproducible in-repo bench row + the top-10 parity test instead of "1.63× faster than TurboQuant b=2 on Harrier 207k".
- Genericize turbovec-internal doc refs (`index.rs` `swap_remove`, `rank_io.rs` format note); rename `.tvsb` temp files `turbovec_*` → `ordvec_*`; drop "Harrier" from the `sign_bitmap` embedding-family example.
- `docs/RANK_MODES.md` standalone: no TurboQuant baseline table; synthetic in-repo headline, real-corpus results deferred to the paper. Regenerate `benchmarks/rank_modes_results.txt` with the FastScan row.
- `LICENSE-MIT` (Nelson Spence) + `LICENSE-APACHE`, drop the scaffold's bare `LICENSE`. ordvec is original rank/sign work (contains no TurboQuant code), so copyright is Nelson's; the turbovec origin stays as an honest provenance note in README/lib.rs. README badge + License section updated to the dual pair.

Verification
- src/docs/README; only honest "extracted/ported from turbovec (MIT)" provenance.
- `cargo fmt --check` + `clippy -D warnings` clean; `cargo test` 80/0, experimental 86/0.