docs: de-fiction extracted crate + dual MIT/Apache license by Fieldnote-Echo · Pull Request #2 · Fieldnote-Echo/ordvec

Fieldnote-Echo · 2026-05-22T22:51:29Z

De-fiction + dual MIT/Apache license

Stacked on #1 (hygiene) — merge #1 first; this PR's diff will then collapse to only its own changes.

Removes fictional / leaked claims so the crate is publishable alongside the paper — no fabricated benchmarks, no private-corpus references.

Changes

Strip TurboQuant/Harrier perf claims from doc comments. fastscan.rs now points at the reproducible in-repo bench row + the top-10 parity test instead of "1.63× faster than TurboQuant b=2 on Harrier 207k".
Genericize turbovec-internal doc refs (index.rs swap_remove, rank_io.rs format note); rename .tvsb temp files turbovec_* → ordvec_*; drop "Harrier" from the sign_bitmap embedding-family example.
Rewrite docs/RANK_MODES.md standalone — no TurboQuant baseline table; synthetic in-repo headline, real-corpus results deferred to the paper. Regenerate benchmarks/rank_modes_results.txt with the FastScan row.
Dual-license MIT OR Apache-2.0: add LICENSE-MIT (Nelson Spence) + LICENSE-APACHE, drop the scaffold's bare LICENSE. ordvec is original rank/sign work (contains no TurboQuant code), so copyright is Nelson's; the turbovec origin stays as an honest provenance note in README/lib.rs. README badge + License section updated to the dual pair.

Cross-PR note: the Cargo license = "MIT OR Apache-2.0" field lands in the CI PR (prod/ci). Both this PR (LICENSE files + README) and that one are needed for a coherent dual-license.

Verification

Fiction grep clean — no Harrier / arXiv / "1.63×" / TurboQuant in src/docs/README; only honest "extracted/ported from turbovec (MIT)" provenance.
src diff is comment/string-only — no logic changes.
cargo fmt --check + clippy -D warnings clean; cargo test: 80 passed / 0 failed; experimental: 86 passed / 0 failed.

All 26 clippy errors fixed across src/, tests/, and examples/:

- manual_is_multiple_of (13×): x % n == 0 → x.is_multiple_of(n), stable since 1.87, safe on MSRV 1.89
- manual_range_contains (2×): negated range comparisons → !(a..=b).contains(&x)
- manual_repeat_n (1×): repeat(v).take(n) → repeat_n(v, n), stable since 1.82
- too_many_arguments (7×): #[allow] with justifying comment on scan_b2_fastscan_avx512, scan_b2_fastscan_scalar, scan_via_lut_scalar (src), finalise_row, bench_two_stage, bench_two_stage_batched, bench_sign_two_stage_batched (examples)
- needless_range_loop (9×): #[allow] on all SIMD kernel loops (bitmap.rs AVX-512 kernels, sign_bitmap.rs AVX-512 kernel, fastscan.rs scalar finalize) plus two clear mechanical rewrites in tests/rank_index/quant.rs and two #[allow] in tests/rank_index/index.rs (raw index used in assertion message)

cargo fmt --all run; reformatted bitmap.rs, fastscan.rs and several test/example files. No behavior change.
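For readers unfamiliar with these lints, the three mechanical rewrites can be sketched as below. This is an illustration only: the helper names are hypothetical and are not taken from the ordvec source; only the standard-library calls (`is_multiple_of`, `RangeInclusive::contains`, `std::iter::repeat_n`) are the ones the commit message names.

```rust
use std::iter::repeat_n;

/// manual_is_multiple_of: `qpv % 8 == 0` becomes `qpv.is_multiple_of(8)`.
/// Hypothetical helper; not a function from the crate.
fn packs_into_whole_bytes(qpv: usize) -> bool {
    qpv.is_multiple_of(8)
}

/// manual_range_contains: `x < lo || x > hi` becomes `!(lo..=hi).contains(&x)`.
fn out_of_range(x: u32, lo: u32, hi: u32) -> bool {
    !(lo..=hi).contains(&x)
}

/// manual_repeat_n: `repeat(0u8).take(n)` becomes `repeat_n(0u8, n)`.
fn zero_padding(n: usize) -> Vec<u8> {
    repeat_n(0u8, n).collect()
}
```

One behavioral detail worth knowing: `x % 0` panics, whereas `x.is_multiple_of(0)` returns `true` only when `x` is zero. For the non-zero constant divisors described here the two forms agree.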

Old lockfile was version 3 and still listed serde as a transitive dependency that no longer exists in the dependency tree. Regenerated with `cargo generate-lockfile`; new lockfile is version 4, contains no serde entries, and passes `cargo build --locked` and `cargo test --locked`.

qodo-code-review · 2026-05-22T22:52:02Z

Review Summary by Qodo

De-fiction crate and implement dual MIT/Apache-2.0 licensing

📝 Documentation ✨ Enhancement

Walkthroughs

Description

• **Remove fictional performance claims** from documentation: Strip TurboQuant/Harrier performance
  comparisons and replace with reproducible in-repo benchmark references
• **Genericize codebase references**: Remove TurboQuant-specific and Harrier embedding family
  references; rename temporary files from turbovec_* to ordvec_*
• **Implement dual MIT/Apache-2.0 licensing**: Add LICENSE-APACHE file, update LICENSE-MIT with
  correct copyright holder (Nelson Spence), and update README with dual-license badge and
  documentation
• **Modernize code patterns**: Replace modulo checks with is_multiple_of() method calls, update
  iterator patterns to use repeat_n(), and improve range checking with idiomatic Rust patterns
• **Improve code quality**: Add clippy lint suppressions with justifications, reorganize imports to
  follow Rust conventions, and reformat code for consistency and readability
• **Regenerate benchmark results**: Update benchmarks/rank_modes_results.txt with new FastScan
  benchmark row and refreshed performance metrics
• **Rewrite documentation**: Completely rewrite docs/RANK_MODES.md to focus on synthetic in-repo
  benchmarks instead of external corpus claims; fix internal path references in
  docs/FOLLOWUP_BODY_KERNEL_TIE_BREAK.md

Diagram

flowchart LR
  A["Fictional claims<br/>TurboQuant/Harrier refs"] -->|Remove| B["Reproducible<br/>in-repo benchmarks"]
  C["Single MIT license<br/>Ryan Codrai copyright"] -->|Update| D["Dual MIT/Apache-2.0<br/>Nelson Spence copyright"]
  E["turbovec namespace<br/>references"] -->|Genericize| F["ordvec-only<br/>references"]
  G["Legacy code patterns<br/>modulo/repeat"] -->|Modernize| H["Idiomatic Rust<br/>is_multiple_of/repeat_n"]
  B --> I["Publishable crate"]
  D --> I
  F --> I
  H --> I

File Changes

1. examples/bench_rank.rs Formatting +141/-63

Code formatting and clippy lint suppressions for benchmark

• Reorganized imports to follow Rust conventions (ordvec imports before external crates)
• Added #[allow(clippy::too_many_arguments)] attributes to functions with many parameters (kernel
 arity justification)
• Reformatted long function calls and macro invocations to multi-line for readability
• Replaced std::iter::repeat(0u8).take(n) with std::iter::repeat_n(0u8, n) for clarity

examples/bench_rank.rs

2. tests/rank_index/bitmap.rs Formatting +40/-23

Import reorganization and formatting cleanup

• Reorganized imports to place ordvec imports before external crates
• Reformatted multi-line expressions for consistency and readability
• Added #[allow(clippy::needless_range_loop)] attributes with justification comments

tests/rank_index/bitmap.rs

3. src/rank_index/bitmap.rs ✨ Enhancement +19/-29

Idiomatic Rust patterns and clippy lint handling

• Replaced modulo checks (qpv % 8 == 0) with is_multiple_of(8) method calls
• Added #[allow(clippy::needless_range_loop)] attributes to indexed loop access patterns
• Reformatted long function signatures and calls to multi-line format
• Improved code readability in AVX-512 kernel implementations

src/rank_index/bitmap.rs

4. src/sign_bitmap.rs 📝 Documentation +15/-16

De-fiction documentation and idiomatic code patterns

• Updated doc comment to remove "Harrier" embedding family reference and generalize example
• Replaced modulo checks with is_multiple_of() method calls
• Added #[allow(clippy::needless_range_loop)] attributes to kernel loops
• Renamed temporary file references from turbovec_* to ordvec_* in tests

src/sign_bitmap.rs

5. src/rank_index/fastscan.rs 📝 Documentation +24/-29

Remove fictional performance claims, add reproducible references

• Rewrote doc comments to remove fictional performance claims ("1.63× faster than TurboQuant")
• Replaced with reproducible in-repo benchmark references instead of external claims
• Added #[allow(clippy::too_many_arguments)] attributes to kernel functions
• Reformatted long function signatures and macro invocations

src/rank_index/fastscan.rs

6. src/rank_io.rs 📝 Documentation +14/-23

Genericize documentation and modernize range checking

• Updated doc comments to remove TurboQuant-specific references and genericize descriptions
• Replaced modulo checks with is_multiple_of() method calls throughout
• Reformatted range checks to use (start..=end).contains(&value) pattern
• Simplified error message formatting

src/rank_io.rs

7. src/rank_index/quant.rs ✨ Enhancement +30/-15

Idiomatic Rust patterns in quantization kernel

• Replaced modulo checks with is_multiple_of() method calls for clarity
• Reformatted long function calls to multi-line format
• Improved readability of conditional expressions

src/rank_index/quant.rs

8. tests/redteam_alpha.rs Formatting +21/-6

Code formatting for red-team tests

• Reformatted multi-line iterator chains and function calls
• Improved readability of assertion statements with multi-line formatting

tests/redteam_alpha.rs

9. tests/redteam_beta.rs ✨ Enhancement +11/-29

Iterator pattern improvements and formatting

• Reformatted long iterator chains to multi-line format
• Simplified variable iteration patterns (e.g., for &v in src.iter() instead of indexed loop)
• Improved readability of filter and map operations

tests/redteam_beta.rs
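The indexed-loop rewrite mentioned above, and the case where the PR keeps the index behind an `#[allow]`, might look roughly like this. The function names are hypothetical and the bodies are a sketch, not code from tests/redteam_beta.rs or tests/rank_index/index.rs.

```rust
/// Indexed form that clippy::needless_range_loop flags: `i` only indexes `src`.
#[allow(clippy::needless_range_loop)]
fn sum_indexed(src: &[u32]) -> u64 {
    let mut total = 0u64;
    for i in 0..src.len() {
        total += u64::from(src[i]);
    }
    total
}

/// Mechanical rewrite: iterate over the values directly.
fn sum_iter(src: &[u32]) -> u64 {
    let mut total = 0u64;
    for &v in src.iter() {
        total += u64::from(v);
    }
    total
}

/// Keeping the raw index can be reasonable when it appears in the failure
/// message, which is the justification the PR gives for its #[allow]s.
#[allow(clippy::needless_range_loop)]
fn first_mismatch(got: &[u32], want: &[u32]) -> Option<String> {
    assert_eq!(got.len(), want.len(), "slices must have equal length");
    for i in 0..want.len() {
        if got[i] != want[i] {
            return Some(format!("row {i}: got {}, want {}", got[i], want[i]));
        }
    }
    None
}
```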

10. tests/rank_index/fastscan.rs Formatting +15/-7

Import reorganization and formatting

• Reorganized imports to place ordvec imports before external crates
• Reformatted multi-line iterator chains and function calls

tests/rank_index/fastscan.rs

11. tests/rank_index/main.rs Formatting +14/-11

Import organization and iterator modernization

• Reorganized imports to follow Rust conventions
• Replaced std::iter::repeat(0u8).take(n) with std::iter::repeat_n(0u8, n)
• Reformatted multi-line function calls and assertions

tests/rank_index/main.rs

12. tests/rank_index/quant.rs ✨ Enhancement +17/-14

Import reorganization and iterator improvements

• Reorganized imports to place ordvec imports before external crates
• Simplified loop patterns using iterator methods instead of indexed access
• Reformatted multi-line expressions for readability

tests/rank_index/quant.rs

13. src/rank_index/quant_kernels.rs ✨ Enhancement +13/-8

Clippy lint handling for kernel functions

• Added #[allow(clippy::too_many_arguments)] attributes to kernel functions
• Reformatted long _mm512_setr_epi32 intrinsic calls to single line
• Improved assertion message formatting

src/rank_index/quant_kernels.rs

14. src/rank.rs Formatting +15/-8

Code formatting and clippy lint handling

• Added #[allow(clippy::needless_range_loop)] attribute to indexed loop
• Reformatted multi-line iterator chains and assertions
• Improved readability of mathematical expressions

src/rank.rs

15. tests/rank_index/index.rs Formatting +3/-1

Import reorganization and clippy lint handling

• Reorganized imports to place ordvec imports before external crates
• Added #[allow(clippy::needless_range_loop)] attributes to indexed loops

tests/rank_index/index.rs

16. src/rank_index/util.rs Formatting +2/-7

Code formatting improvements

• Reformatted long conditional expressions to multi-line format
• Simplified function signature formatting

src/rank_index/util.rs

17. src/rank_index/index.rs 📝 Documentation +7/-3

Genericize documentation and improve formatting

• Updated doc comment to remove TurboQuant reference and genericize description
• Reformatted struct initialization to multi-line format

src/rank_index/index.rs

18. src/rank_index/multi_bucket.rs Formatting +3/-6

Code formatting improvement

• Reformatted multi-line iterator chain to more compact format

src/rank_index/multi_bucket.rs

19. tests/redteam_delta.rs Formatting +7/-5

Code formatting

• Reformatted multi-line format string and file creation calls

tests/redteam_delta.rs

20. tests/rank_index/multi_bucket.rs Formatting +2/-2

Import reorganization

• Reorganized imports to place ordvec imports before external crates

tests/rank_index/multi_bucket.rs

21. docs/RANK_MODES.md 📝 Documentation +226/-214

De-fiction documentation and remove external performance claims

• Updated title and introduction to remove TurboQuant references and genericize for ordvec
• Removed fictional performance claims and external corpus references
• Updated benchmark environment details to remove system-specific information
• Replaced TurboQuant comparison tables with ordvec-only results from synthetic corpus
• Updated all references from "turbovec" to "ordvec" and removed Harrier/arXiv citations
• Rewrote sections to focus on reproducible in-repo benchmarks instead of external claims
• Updated API surface table to reflect ordvec's actual capabilities
• Simplified reproducibility instructions (removed BLAS linking requirement)

docs/RANK_MODES.md

22. README.md 📝 Documentation +8/-4

Dual MIT/Apache-2.0 license implementation

• Updated license badge from MIT-only to dual MIT OR Apache-2.0
• Updated license section to document dual licensing with both LICENSE-MIT and LICENSE-APACHE files
• Removed reference to single LICENSE file
• Clarified turbovec extraction attribution in provenance note

README.md

23. benchmarks/rank_modes_results.txt 📝 Documentation +21/-19

Add FastScan benchmark results and refresh performance metrics

• Added new RankQuantFastscanIndex b=2 benchmark row with block-32 PQ-LUT fast path implementation
• Regenerated all benchmark results with updated performance metrics across all index types
• Updated corpus generation time from 0.16s to 0.17s
• Updated JSON benchmark data structure with new fastscan entry and refreshed performance numbers

benchmarks/rank_modes_results.txt

24. LICENSE-APACHE ⚙️ Configuration changes +176/-0

Add Apache License 2.0 for dual-licensing

• Added complete Apache License 2.0 text (176 lines)
• Establishes dual-licensing framework alongside MIT license
• Enables crate publication under either MIT or Apache-2.0 terms

LICENSE-APACHE

25. LICENSE-MIT ⚙️ Configuration changes +1/-6

Update MIT license copyright to Nelson Spence

• Changed copyright holder from Ryan Codrai to Nelson Spence
• Removed detailed turbovec attribution text from license header
• Simplified license to core MIT permission grant

LICENSE-MIT

26. docs/FOLLOWUP_BODY_KERNEL_TIE_BREAK.md 📝 Documentation +1/-1

Fix internal path reference in documentation

• Updated internal path reference from turbovec/src/rank_index/bitmap.rs to
 src/rank_index/bitmap.rs
• Removes fictional turbovec namespace reference to reflect extracted crate structure

docs/FOLLOWUP_BODY_KERNEL_TIE_BREAK.md

qodo-code-review · 2026-05-22T22:52:03Z

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0)

1. ~~License metadata mismatch~~ ✓ Resolved 🐞 Bug ⚙ Maintainability

Description

README.md now states the crate is licensed “MIT OR Apache-2.0”, but Cargo.toml still declares
license = "MIT", so cargo/crates.io metadata and license-scanners will report MIT-only while the
docs/files claim dual licensing.

Code

README.md[R46-51]

Evidence

README explicitly declares dual licensing, while Cargo.toml’s license field is still MIT-only,
creating a direct contradiction in the same repo state after this PR’s README/license-file changes.

README.md[44-51]
Cargo.toml[1-8]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The PR updates README + adds Apache license file to present the crate as dual-licensed, but `Cargo.toml` still declares MIT-only. This creates inconsistent licensing signals for downstream users and automated tooling (e.g., crates.io metadata, compliance scanners).

### Issue Context
- README now claims dual licensing (MIT OR Apache-2.0).
- Cargo manifest still says `license = "MIT"`.

### Fix Focus Areas
- Cargo.toml[1-12]
- README.md[44-51]

### Suggested fix
Update `Cargo.toml` to use the SPDX expression:
- `license = "MIT OR Apache-2.0"`

Optionally, ensure README wording matches the exact SPDX expression used in Cargo.toml.
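A check of this kind could be automated with a small helper along the following lines. This is a sketch under stated assumptions: `manifest_license` and `license_is_consistent` are hypothetical names, the parsing handles only a plain `license = "..."` line rather than full TOML, and the manifest snippets in the usage are made up for illustration.

```rust
/// Extract the value of the `license` key from manifest text.
/// Assumption: a simple `license = "..."` line; this is not a TOML parser.
fn manifest_license(manifest: &str) -> Option<&str> {
    manifest.lines().find_map(|line| {
        // Lines such as `license-file = ...` fail the '=' check and are skipped.
        let rest = line.trim().strip_prefix("license")?;
        let rest = rest.trim_start().strip_prefix('=')?;
        let value = rest.trim().strip_prefix('"')?;
        value.split('"').next()
    })
}

/// True when the manifest declares exactly the expected SPDX expression.
fn license_is_consistent(manifest: &str, expected_spdx: &str) -> bool {
    manifest_license(manifest) == Some(expected_spdx)
}
```

With a manifest that still says `license = "MIT"`, `license_is_consistent(manifest, "MIT OR Apache-2.0")` returns `false`; once the field is updated it returns `true`.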

gemini-code-assist

Code Review

This pull request transitions the project from turbovec to ordvec, updating licenses to a dual MIT/Apache-2.0 scheme and changing the copyright holder. It removes the serde dependency and introduces the RankQuantFastscanIndex for optimized b=2 scans. Extensive updates were made to documentation and benchmarks to reflect the new project name and performance characteristics. The codebase underwent significant formatting and linting improvements, including the use of is_multiple_of for clarity and clippy attribute additions to suppress specific warnings in SIMD kernels. I have no feedback to provide.

The x86 SIMD dispatch (select_simd_tier + the SimdTier match arms, the AVX kernels, BATCHED_AVX512_CHUNK) is cfg(target_arch=x86_64)-gated, but the glue it references — the SimdTier::Avx2/Avx512 variants, the batched-chunk consts, and the simd_tier / centre_drop_used bindings — was defined unconditionally. On non-x86 (aarch64 / macos-latest CI) those are dead/unused and, under RUSTFLAGS=-D warnings, fail the build with 7 dead_code/unused errors. Add cfg_attr(not(target_arch=x86_64), allow(...)) to each so the crate builds clean on aarch64 (scalar path) while x86 is untouched. Verified: aarch64 lib + tests + examples compile clean under -D warnings; x86 fmt/clippy/test 80/86.
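The gating pattern described in this commit can be sketched as follows. The names `SimdTier` and `select_simd_tier` come from the commit message, but the bodies are assumptions made for illustration: the real dispatch and kernels are not shown on this page, and here every tier falls back to a scalar placeholder (`dot` and `dot_scalar` are hypothetical).

```rust
/// SIMD tiers. The x86-only variants are never constructed on other targets,
/// so dead_code is allowed for them there instead of failing `-D warnings`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SimdTier {
    Scalar,
    #[cfg_attr(not(target_arch = "x86_64"), allow(dead_code))]
    Avx2,
    #[cfg_attr(not(target_arch = "x86_64"), allow(dead_code))]
    Avx512,
}

/// Runtime tier selection; the x86 block is compiled out on other targets.
fn select_simd_tier() -> SimdTier {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx512f") {
            return SimdTier::Avx512;
        }
        if std::arch::is_x86_feature_detected!("avx2") {
            return SimdTier::Avx2;
        }
    }
    SimdTier::Scalar
}

/// Scalar reference kernel (hypothetical placeholder).
fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Dispatch. In this sketch every tier uses the scalar kernel; a real
/// implementation would call cfg-gated AVX2 / AVX-512 kernels instead.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    match select_simd_tier() {
        SimdTier::Scalar | SimdTier::Avx2 | SimdTier::Avx512 => dot_scalar(a, b),
    }
}
```

Scoping the `allow` with `cfg_attr` keeps the lint active on x86_64, so a variant that really is unused there would still be reported.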

Strip TurboQuant/Harrier perf claims from doc comments — fastscan.rs now points at the reproducible in-repo bench row and the top-10 parity test instead of "1.63x faster than TurboQuant b=2 on Harrier 207k". Genericize turbovec-internal doc refs (index.rs swap_remove, rank_io.rs format note) and rename the .tvsb temp files turbovec_* -> ordvec_*. Drop "Harrier" from the sign_bitmap embedding-family example. Rewrite docs/RANK_MODES.md as standalone (no TurboQuant baseline table; synthetic in-repo headline, real-corpus results deferred to the paper) and fix the stale path in FOLLOWUP_BODY_KERNEL_TIE_BREAK.md. Regenerate benchmarks/rank_modes_results.txt with the FastScan row. Dual-license MIT OR Apache-2.0: LICENSE-MIT (Nelson Spence) + LICENSE-APACHE, drop the scaffold's bare LICENSE. ordvec is original rank/sign work (no TurboQuant code), so copyright is Nelson's; the turbovec origin stays as a provenance note in README/lib.rs.

… review) Address the bot review wave (gemini/Codex/qodo): all "convert core panics to clean Python errors", completing the binding's boundary-guard design:

- Width validation (check_width): every f32 input now checks ncols == dim (2-D) / len == dim (1-D). The core derives n = len/dim and only asserts divisibility, so a wrong-but-divisible shape (e.g. (1,128) into a dim-64 index) was silently reinterpreted as a different vector count, or panicked on the result reshape. Now a clean ValueError. (gemini x3 critical, Codex x2 P1, qodo #3)
- Constructor validation: Rank/RankQuant/Bitmap/SignBitmap `new` return PyResult and validate against the EXACT core asserts (dim in [2, u16::MAX]; bits in {1,2,4} + dim multiple of 8/bits and 2^bits; dim % 64 + 0 < n_top < dim; dim % 64 + <= MAX_SIGN_BITMAP_DIM) -> ValueError instead of panic. (gemini x4)
- swap_remove (Rank, RankQuant): bounds-check -> IndexError, not panic. (gemini high, qodo #4)
- README provenance tightened to the canonical "developed within turbovec, factored out" phrasing. (qodo #2)

Tests: +9 (width-mismatch x6, swap_remove OOB x3); constructor-rejection tests tightened from BaseException to ValueError. Suite now 117 passed + 1 xfail. clippy -D warnings + fmt clean; MSRV 1.89 builds core + binding.

Not changed: qodo #1 (ndarray via numpy) is a deliberate, documented core-vs-binding split (deps grep + publish scoped to -p ordvec; the core's published lock is clean; the binding is publish = false, PyPI-only) -- explained on-thread.
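The width problem this commit describes is easy to reproduce in isolation. The sketch below is plain Rust rather than the PyO3 binding: `derived_count` is a hypothetical stand-in for the core's "n = len/dim, assert divisibility" behavior, and this `check_width` only mirrors the name and intent of the guard mentioned above, with a `String` error standing in for the Python ValueError.

```rust
/// Core-style count: derive n = len / dim and only check divisibility.
/// Hypothetical stand-in for the behavior described in the commit message.
fn derived_count(len: usize, dim: usize) -> Option<usize> {
    if dim != 0 && len % dim == 0 {
        Some(len / dim)
    } else {
        None
    }
}

/// Boundary guard: the input row width must equal the index dimension.
/// Sketch only; the real check lives in the binding and raises ValueError.
fn check_width(ncols: usize, dim: usize) -> Result<(), String> {
    if ncols == dim {
        Ok(())
    } else {
        Err(format!("expected width {dim}, got {ncols}"))
    }
}
```

For a (1, 128) array passed to a dim-64 index, `derived_count(128, 64)` accepts the buffer and reports two vectors, while `check_width(128, 64)` rejects it before the core ever sees the data.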

…-fn) Codex stop-review #2: the previous commit over-claimed write/load "symmetry" for the public write_* free functions. Those are low-level trusted serializers: they do NOT re-validate dim / n_vectors / n_top / bits / divisibility / data semantics, all of which the loaders check, so a direct caller can still produce a file load_* rejects. The MAX_PAYLOAD guard closed only one of several asymmetries.

The actual round-trip guarantee is type-level and was already designed in: Rank/RankQuant/Bitmap/SignBitmap::new validate dim / n_top / bits / divisibility against the loaders' bounds (Bitmap::new's comment says so explicitly), add() caps n_vectors, write() caps payload, and the types emit only loader-valid data, so anything T::write produces, T::load reloads.

- rank_io module docs gain a "Round-trip contract" section: round-trip is a type-level guarantee; write_* are trusted serializers assuming loader-valid input; MAX_PAYLOAD is the one loader bound they also enforce (defense-in-depth + no truncation before File::create).
- Dropped "symmetric write/load" from the four writer comments, check_payload_bytes, and the MAX_VECTORS doc.
- Fixed a pre-existing module-doc inaccuracy: MAX_DIM * MAX_VECTORS is ~8 TiB, not 128 GiB; MAX_PAYLOAD is the binding byte ceiling.

Doc/comment-only; no behavior change. Gate green: fmt, clippy -D warnings, rustdoc -D warnings, test (default/experimental/no-default).
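The difference between a trusted serializer and a type-level round-trip guarantee can be shown with a toy format. Everything below is hypothetical: the `Index` type, the 4-byte header layout, and the `MAX_DIM` value are invented for illustration and do not reflect ordvec's actual file format or bounds.

```rust
/// Hypothetical bound for this sketch; not the crate's real constant.
const MAX_DIM: usize = u16::MAX as usize;

#[derive(Debug, PartialEq)]
struct Index {
    dim: usize,
    data: Vec<u8>,
}

/// Trusted serializer: writes whatever it is given, with no validation.
fn write_raw(dim: usize, data: &[u8]) -> Vec<u8> {
    let mut out = (dim as u32).to_le_bytes().to_vec();
    out.extend_from_slice(data);
    out
}

/// Loader: validates the header and payload before building an Index.
fn load_raw(bytes: &[u8]) -> Result<Index, String> {
    if bytes.len() < 4 {
        return Err("truncated header".to_string());
    }
    let dim = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]) as usize;
    if !(2..=MAX_DIM).contains(&dim) {
        return Err(format!("dim {dim} out of range"));
    }
    let data = &bytes[4..];
    if data.len() % dim != 0 {
        return Err("payload is not a multiple of dim".to_string());
    }
    Ok(Index { dim, data: data.to_vec() })
}

impl Index {
    /// Validated constructor: enforces the same bound the loader checks.
    fn new(dim: usize) -> Result<Self, String> {
        if !(2..=MAX_DIM).contains(&dim) {
            return Err(format!("dim {dim} out of range"));
        }
        Ok(Index { dim, data: Vec::new() })
    }

    /// Only whole rows are accepted, so the payload stays loader-valid.
    fn add(&mut self, row: &[u8]) -> Result<(), String> {
        if row.len() != self.dim {
            return Err(format!("expected width {}, got {}", self.dim, row.len()));
        }
        self.data.extend_from_slice(row);
        Ok(())
    }

    fn write(&self) -> Vec<u8> {
        write_raw(self.dim, &self.data)
    }

    fn load(bytes: &[u8]) -> Result<Self, String> {
        load_raw(bytes)
    }
}
```

In this sketch anything produced by `Index::write` reloads through `Index::load`, because `new` and `add` never let invalid state exist. Calling `write_raw(1, ...)` directly produces bytes that `load_raw` rejects, which is the asymmetry the commit message describes.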

…he persistence API Codex stop-review #2 follow-through (Nelson's call): the raw rank_io write_*/load_* free functions were public but are trusted serializers, not validated constructors; leaving them public invited the wrong mental model (and was the root of the retracted "write/load symmetry" claim). Close the door before crates.io locks the surface.

- write_rank / load_rank / write_rankquant / load_rankquant / write_bitmap / load_bitmap / write_sign_bitmap / load_sign_bitmap -> pub(crate). The MAX_DIM / MAX_SIGN_BITMAP_DIM / MAX_VECTORS constants stay public.
- The supported persistence API is now unambiguously the index types' write()/load(): Rank / RankQuant / Bitmap / SignBitmap. Module docs updated to a "Persistence API & round-trip contract" section.
- Relocated the rank_io-layer tests to a src/rank_io.rs unit-test module (they need crate-internal access): the TV-DESER-004/005 loader red-team tests (from tests/redteam_delta.rs, now deleted) and the write-guard test (from tests/index/main.rs). Same coverage, co-located with the code.

The Python binding only consumed the MAX_* constants (unchanged); examples and benches don't touch the free functions, so there is no other fallout. Gate green: fmt, clippy -D warnings, rustdoc -D warnings, test (default/experimental/no-default), MSRV 1.89, --locked.

Fieldnote-Echo added 2 commits May 22, 2026 17:39

Fieldnote-Echo mentioned this pull request May 22, 2026

chore(ci): production release gate (CI matrix, SDE AVX-512, deny, MSRV) #3

Merged

gemini-code-assist Bot reviewed May 22, 2026

Fieldnote-Echo added 2 commits May 22, 2026 18:01

Fieldnote-Echo force-pushed the prod/defiction branch from 5f8593d to 9fc76c5 Compare May 22, 2026 23:02

Fieldnote-Echo merged commit b4621b7 into main May 22, 2026

Fieldnote-Echo deleted the prod/defiction branch May 23, 2026 00:08

This was referenced May 23, 2026

feat: ordvec-python — PyO3 bindings for Rank/RankQuant/Bitmap/SignBitmap #13

Merged

perf: portable SIMD (NEON + WASM simd128) + cross-platform CI & wheel testing #19

Merged

Fieldnote-Echo merged 4 commits into main from prod/defiction
