docs: de-fiction extracted crate + dual MIT/Apache license#2

Merged
Fieldnote-Echo merged 4 commits into main from prod/defiction on May 22, 2026

@Fieldnote-Echo
Owner

De-fiction + dual MIT/Apache license

Stacked on #1 (hygiene) — merge #1 first; this PR's diff will then collapse to only its own changes.

Removes fictional / leaked claims so the crate is publishable alongside the paper — no fabricated benchmarks, no private-corpus references.

Changes

  • Strip TurboQuant/Harrier perf claims from doc comments. fastscan.rs now points at the reproducible in-repo bench row + the top-10 parity test instead of "1.63× faster than TurboQuant b=2 on Harrier 207k".
  • Genericize turbovec-internal doc refs (index.rs swap_remove, rank_io.rs format note); rename .tvsb temp files turbovec_* → ordvec_*; drop "Harrier" from the sign_bitmap embedding-family example.
  • Rewrite docs/RANK_MODES.md standalone — no TurboQuant baseline table; synthetic in-repo headline, real-corpus results deferred to the paper. Regenerate benchmarks/rank_modes_results.txt with the FastScan row.
  • Dual-license MIT OR Apache-2.0: add LICENSE-MIT (Nelson Spence) + LICENSE-APACHE, drop the scaffold's bare LICENSE. ordvec is original rank/sign work (contains no TurboQuant code), so copyright is Nelson's; the turbovec origin stays as an honest provenance note in README/lib.rs. README badge + License section updated to the dual pair.

Cross-PR note: the Cargo license = "MIT OR Apache-2.0" field lands in the CI PR (prod/ci). Both this PR (LICENSE files + README) and that one are needed for a coherent dual-license.

Verification

  • Fiction grep clean — no Harrier / arXiv / "1.63×" / TurboQuant in src/docs/README; only honest "extracted/ported from turbovec (MIT)" provenance.
  • src diff is comment/string-only — no logic changes.
  • cargo fmt --check + clippy -D warnings clean; cargo test 80/0, experimental 86/0.

All 26 clippy errors fixed across src/, tests/, and examples/:
- manual_is_multiple_of (13×): x % n == 0 → x.is_multiple_of(n),
  stable since 1.87, safe on MSRV 1.89
- manual_range_contains (2×): negated range comparisons →
  !(a..=b).contains(&x)
- manual_repeat_n (1×): repeat(v).take(n) → repeat_n(v, n),
  stable 1.82
- too_many_arguments (7×): #[allow] with justifying comment on
  scan_b2_fastscan_avx512, scan_b2_fastscan_scalar,
  scan_via_lut_scalar (src), finalise_row, bench_two_stage,
  bench_two_stage_batched, bench_sign_two_stage_batched (examples)
- needless_range_loop (9×): #[allow] on all SIMD kernel loops
  (bitmap.rs AVX-512 kernels, sign_bitmap.rs AVX-512 kernel,
  fastscan.rs scalar finalize) plus two clear mechanical rewrites
  in tests/rank_index/quant.rs and two #[allow] in
  tests/rank_index/index.rs (raw index used in assertion message)

cargo fmt --all run; reformatted bitmap.rs, fastscan.rs and
several test/example files. No behavior change.
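
The three mechanical rewrites listed above can be sketched as follows. This is an illustration only: the function names (`is_lane_aligned`, `bits_out_of_range`, `zero_padding`) and the constants are hypothetical and are not taken from ordvec.

```rust
use std::iter::repeat_n;

/// `manual_is_multiple_of`: `dim % 8 == 0` becomes `dim.is_multiple_of(8)`.
pub fn is_lane_aligned(dim: usize) -> bool {
    dim.is_multiple_of(8)
}

/// `manual_range_contains`: `bits < 1 || bits > 4` becomes a negated
/// inclusive-range membership test.
pub fn bits_out_of_range(bits: u32) -> bool {
    !(1..=4).contains(&bits)
}

/// `manual_repeat_n`: `repeat(0u8).take(n)` becomes `repeat_n(0u8, n)`.
pub fn zero_padding(n: usize) -> Vec<u8> {
    repeat_n(0u8, n).collect()
}
```

One caveat worth keeping in mind: `is_multiple_of` does not panic on a zero divisor (`5usize.is_multiple_of(0)` is `false` and `0usize.is_multiple_of(0)` is `true`), whereas `x % 0` panics. The rewrite is therefore strictly behavior-preserving only where the divisor is known to be nonzero, as with the literal `8` here.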
Old lockfile was version 3 and still listed serde as a transitive
dependency that no longer exists in the dependency tree. Regenerated
with `cargo generate-lockfile`; new lockfile is version 4, contains
no serde entries, and passes `cargo build --locked` and
`cargo test --locked`.
@qodo-code-review

Review Summary by Qodo

De-fiction crate and implement dual MIT/Apache-2.0 licensing

📝 Documentation ✨ Enhancement

Walkthroughs

Description
• **Remove fictional performance claims** from documentation: Strip TurboQuant/Harrier performance
  comparisons and replace with reproducible in-repo benchmark references
• **Genericize codebase references**: Remove TurboQuant-specific and Harrier embedding family
  references; rename temporary files from turbovec_* to ordvec_*
• **Implement dual MIT/Apache-2.0 licensing**: Add LICENSE-APACHE file, update LICENSE-MIT with
  correct copyright holder (Nelson Spence), and update README with dual-license badge and
  documentation
• **Modernize code patterns**: Replace modulo checks with is_multiple_of() method calls, update
  iterator patterns to use repeat_n(), and improve range checking with idiomatic Rust patterns
• **Improve code quality**: Add clippy lint suppressions with justifications, reorganize imports to
  follow Rust conventions, and reformat code for consistency and readability
• **Regenerate benchmark results**: Update benchmarks/rank_modes_results.txt with new FastScan
  benchmark row and refreshed performance metrics
• **Rewrite documentation**: Completely rewrite docs/RANK_MODES.md to focus on synthetic in-repo
  benchmarks instead of external corpus claims; fix internal path references in
  docs/FOLLOWUP_BODY_KERNEL_TIE_BREAK.md
Diagram
flowchart LR
  A["Fictional claims<br/>TurboQuant/Harrier refs"] -->|Remove| B["Reproducible<br/>in-repo benchmarks"]
  C["Single MIT license<br/>Ryan Codrai copyright"] -->|Update| D["Dual MIT/Apache-2.0<br/>Nelson Spence copyright"]
  E["turbovec namespace<br/>references"] -->|Genericize| F["ordvec-only<br/>references"]
  G["Legacy code patterns<br/>modulo/repeat"] -->|Modernize| H["Idiomatic Rust<br/>is_multiple_of/repeat_n"]
  B --> I["Publishable crate"]
  D --> I
  F --> I
  H --> I

File Changes

1. examples/bench_rank.rs Formatting +141/-63

Code formatting and clippy lint suppressions for benchmark

• Reorganized imports to follow Rust conventions (ordvec imports before external crates)
• Added #[allow(clippy::too_many_arguments)] attributes to functions with many parameters (kernel
 arity justification)
• Reformatted long function calls and macro invocations to multi-line for readability
• Replaced std::iter::repeat(0u8).take(n) with std::iter::repeat_n(0u8, n) for clarity

examples/bench_rank.rs


2. tests/rank_index/bitmap.rs Formatting +40/-23

Import reorganization and formatting cleanup

• Reorganized imports to place ordvec imports before external crates
• Reformatted multi-line expressions for consistency and readability
• Added #[allow(clippy::needless_range_loop)] attributes with justification comments

tests/rank_index/bitmap.rs


3. src/rank_index/bitmap.rs ✨ Enhancement +19/-29

Idiomatic Rust patterns and clippy lint handling

• Replaced modulo checks (qpv % 8 == 0) with is_multiple_of(8) method calls
• Added #[allow(clippy::needless_range_loop)] attributes to indexed loop access patterns
• Reformatted long function signatures and calls to multi-line format
• Improved code readability in AVX-512 kernel implementations

src/rank_index/bitmap.rs


4. src/sign_bitmap.rs 📝 Documentation +15/-16

De-fiction documentation and idiomatic code patterns

• Updated doc comment to remove "Harrier" embedding family reference and generalize example
• Replaced modulo checks with is_multiple_of() method calls
• Added #[allow(clippy::needless_range_loop)] attributes to kernel loops
• Renamed temporary file references from turbovec_* to ordvec_* in tests

src/sign_bitmap.rs


5. src/rank_index/fastscan.rs 📝 Documentation +24/-29

Remove fictional performance claims, add reproducible references

• Rewrote doc comments to remove fictional performance claims ("1.63× faster than TurboQuant")
• Replaced with reproducible in-repo benchmark references instead of external claims
• Added #[allow(clippy::too_many_arguments)] attributes to kernel functions
• Reformatted long function signatures and macro invocations

src/rank_index/fastscan.rs


6. src/rank_io.rs 📝 Documentation +14/-23

Genericize documentation and modernize range checking

• Updated doc comments to remove TurboQuant-specific references and genericize descriptions
• Replaced modulo checks with is_multiple_of() method calls throughout
• Reformatted range checks to use (start..=end).contains(&value) pattern
• Simplified error message formatting

src/rank_io.rs


7. src/rank_index/quant.rs ✨ Enhancement +30/-15

Idiomatic Rust patterns in quantization kernel

• Replaced modulo checks with is_multiple_of() method calls for clarity
• Reformatted long function calls to multi-line format
• Improved readability of conditional expressions

src/rank_index/quant.rs


8. tests/redteam_alpha.rs Formatting +21/-6

Code formatting for red-team tests

• Reformatted multi-line iterator chains and function calls
• Improved readability of assertion statements with multi-line formatting

tests/redteam_alpha.rs


9. tests/redteam_beta.rs ✨ Enhancement +11/-29

Iterator pattern improvements and formatting

• Reformatted long iterator chains to multi-line format
• Simplified variable iteration patterns (e.g., for &v in src.iter() instead of indexed loop)
• Improved readability of filter and map operations

tests/redteam_beta.rs
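
As a rough illustration of the iteration change described for this file, the sketch below contrasts an indexed loop with its iterator form, plus one case where keeping the index is reasonable. All three functions are hypothetical examples, not code from the test suite.

```rust
/// Indexed form that `clippy::needless_range_loop` flags; kept here under
/// an explicit allow so both variants can be compared side by side.
#[allow(clippy::needless_range_loop)]
pub fn sum_indexed(src: &[u32]) -> u64 {
    let mut total = 0u64;
    for i in 0..src.len() {
        total += u64::from(src[i]);
    }
    total
}

/// Mechanical rewrite: iterate over the values directly.
pub fn sum_iter(src: &[u32]) -> u64 {
    let mut total = 0u64;
    for &v in src.iter() {
        total += u64::from(v);
    }
    total
}

/// A case where the raw index is itself the result (for example, to report
/// a position in an assertion message), so an indexed loop stays readable.
#[allow(clippy::needless_range_loop)]
pub fn first_mismatch(a: &[u32], b: &[u32]) -> Option<usize> {
    for i in 0..a.len().min(b.len()) {
        if a[i] != b[i] {
            return Some(i);
        }
    }
    None
}
```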


10. tests/rank_index/fastscan.rs Formatting +15/-7

Import reorganization and formatting

• Reorganized imports to place ordvec imports before external crates
• Reformatted multi-line iterator chains and function calls

tests/rank_index/fastscan.rs


11. tests/rank_index/main.rs Formatting +14/-11

Import organization and iterator modernization

• Reorganized imports to follow Rust conventions
• Replaced std::iter::repeat(0u8).take(n) with std::iter::repeat_n(0u8, n)
• Reformatted multi-line function calls and assertions

tests/rank_index/main.rs


12. tests/rank_index/quant.rs ✨ Enhancement +17/-14

Import reorganization and iterator improvements

• Reorganized imports to place ordvec imports before external crates
• Simplified loop patterns using iterator methods instead of indexed access
• Reformatted multi-line expressions for readability

tests/rank_index/quant.rs


13. src/rank_index/quant_kernels.rs ✨ Enhancement +13/-8

Clippy lint handling for kernel functions

• Added #[allow(clippy::too_many_arguments)] attributes to kernel functions
• Reformatted long _mm512_setr_epi32 macro invocations to single line
• Improved assertion message formatting

src/rank_index/quant_kernels.rs


14. src/rank.rs Formatting +15/-8

Code formatting and clippy lint handling

• Added #[allow(clippy::needless_range_loop)] attribute to indexed loop
• Reformatted multi-line iterator chains and assertions
• Improved readability of mathematical expressions

src/rank.rs


15. tests/rank_index/index.rs Formatting +3/-1

Import reorganization and clippy lint handling

• Reorganized imports to place ordvec imports before external crates
• Added #[allow(clippy::needless_range_loop)] attributes to indexed loops

tests/rank_index/index.rs


16. src/rank_index/util.rs Formatting +2/-7

Code formatting improvements

• Reformatted long conditional expressions to multi-line format
• Simplified function signature formatting

src/rank_index/util.rs


17. src/rank_index/index.rs 📝 Documentation +7/-3

Genericize documentation and improve formatting

• Updated doc comment to remove TurboQuant reference and genericize description
• Reformatted struct initialization to multi-line format

src/rank_index/index.rs


18. src/rank_index/multi_bucket.rs Formatting +3/-6

Code formatting improvement

• Reformatted multi-line iterator chain to more compact format

src/rank_index/multi_bucket.rs


19. tests/redteam_delta.rs Formatting +7/-5

Code formatting

• Reformatted multi-line format string and file creation calls

tests/redteam_delta.rs


20. tests/rank_index/multi_bucket.rs Formatting +2/-2

Import reorganization

• Reorganized imports to place ordvec imports before external crates

tests/rank_index/multi_bucket.rs


21. docs/RANK_MODES.md 📝 Documentation +226/-214

De-fiction documentation and remove external performance claims

• Updated title and introduction to remove TurboQuant references and genericize for ordvec
• Removed fictional performance claims and external corpus references
• Updated benchmark environment details to remove system-specific information
• Replaced TurboQuant comparison tables with ordvec-only results from synthetic corpus
• Updated all references from "turbovec" to "ordvec" and removed Harrier/arXiv citations
• Rewrote sections to focus on reproducible in-repo benchmarks instead of external claims
• Updated API surface table to reflect ordvec's actual capabilities
• Simplified reproducibility instructions (removed BLAS linking requirement)

docs/RANK_MODES.md


22. README.md 📝 Documentation +8/-4

Dual MIT/Apache-2.0 license implementation

• Updated license badge from MIT-only to dual MIT OR Apache-2.0
• Updated license section to document dual licensing with both LICENSE-MIT and LICENSE-APACHE files
• Removed reference to single LICENSE file
• Clarified turbovec extraction attribution in provenance note

README.md


23. benchmarks/rank_modes_results.txt 📝 Documentation +21/-19

Add FastScan benchmark results and refresh performance metrics

• Added new RankQuantFastscanIndex b=2 benchmark row with block-32 PQ-LUT fast path implementation
• Regenerated all benchmark results with updated performance metrics across all index types
• Updated corpus generation time from 0.16s to 0.17s
• Updated JSON benchmark data structure with new fastscan entry and refreshed performance numbers

benchmarks/rank_modes_results.txt


24. LICENSE-APACHE ⚙️ Configuration changes +176/-0

Add Apache License 2.0 for dual-licensing

• Added complete Apache License 2.0 text (176 lines)
• Establishes dual-licensing framework alongside MIT license
• Enables crate publication under either MIT or Apache-2.0 terms

LICENSE-APACHE


25. LICENSE-MIT ⚙️ Configuration changes +1/-6

Update MIT license copyright to Nelson Spence

• Changed copyright holder from Ryan Codrai to Nelson Spence
• Removed detailed turbovec attribution text from license header
• Simplified license to core MIT permission grant

LICENSE-MIT


26. docs/FOLLOWUP_BODY_KERNEL_TIE_BREAK.md 📝 Documentation +1/-1

Fix internal path reference in documentation

• Updated internal path reference from turbovec/src/rank_index/bitmap.rs to
 src/rank_index/bitmap.rs
• Removes fictional turbovec namespace reference to reflect extracted crate structure

docs/FOLLOWUP_BODY_KERNEL_TIE_BREAK.md


@qodo-code-review

qodo-code-review Bot commented May 22, 2026

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0)


Remediation recommended

1. License metadata mismatch ✓ Resolved 🐞 Bug ⚙ Maintainability
Description
README.md now states the crate is licensed “MIT OR Apache-2.0”, but Cargo.toml still declares
license = "MIT", so cargo/crates.io metadata and license-scanners will report MIT-only while the
docs/files claim dual licensing.
Code

README.md[R46-51]

Evidence
README explicitly declares dual licensing, while Cargo.toml’s license field is still MIT-only,
creating a direct contradiction in the same repo state after this PR’s README/license-file changes.

README.md[44-51]
Cargo.toml[1-8]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The PR updates README + adds Apache license file to present the crate as dual-licensed, but `Cargo.toml` still declares MIT-only. This creates inconsistent licensing signals for downstream users and automated tooling (e.g., crates.io metadata, compliance scanners).

### Issue Context
- README now claims dual licensing (MIT OR Apache-2.0).
- Cargo manifest still says `license = "MIT"`.

### Fix Focus Areas
- Cargo.toml[1-12]
- README.md[44-51]

### Suggested fix
Update `Cargo.toml` to use the SPDX expression:
- `license = "MIT OR Apache-2.0"`

Optionally, ensure README wording matches the exact SPDX expression used in Cargo.toml.
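
A mismatch of this kind could in principle be caught mechanically. The sketch below is one possible check, written under stated assumptions: `manifest_license` and `license_matches` are hypothetical helpers, and the line scan is deliberately naive (it is not a TOML parser and does not handle trailing comments or `license.workspace = true`).

```rust
/// Return the value of a `license = "..."` line in manifest text, if any.
/// Naive line scan; lines such as `license-file = "..."` are skipped.
pub fn manifest_license(manifest: &str) -> Option<&str> {
    manifest.lines().find_map(|line| {
        let rest = line.trim().strip_prefix("license")?;
        let rest = rest.trim_start().strip_prefix('=')?;
        rest.trim().strip_prefix('"')?.strip_suffix('"')
    })
}

/// True when the manifest declares exactly the expected SPDX expression.
pub fn license_matches(manifest: &str, expected_spdx: &str) -> bool {
    manifest_license(manifest) == Some(expected_spdx)
}
```

With the state described in this finding, `license_matches(manifest, "MIT OR Apache-2.0")` would return `false` until the manifest field is updated.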


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request transitions the project from turbovec to ordvec, updating licenses to a dual MIT/Apache-2.0 scheme and changing the copyright holder. It removes the serde dependency and introduces the RankQuantFastscanIndex for optimized b=2 scans. Extensive updates were made to documentation and benchmarks to reflect the new project name and performance characteristics. The codebase underwent significant formatting and linting improvements, including the use of is_multiple_of for clarity and clippy attribute additions to suppress specific warnings in SIMD kernels. I have no feedback to provide.

The x86 SIMD dispatch (select_simd_tier + the SimdTier match arms, the AVX
kernels, BATCHED_AVX512_CHUNK) is cfg(target_arch=x86_64)-gated, but the glue
it references — the SimdTier::Avx2/Avx512 variants, the batched-chunk consts,
and the simd_tier / centre_drop_used bindings — was defined unconditionally.
On non-x86 (aarch64 / macos-latest CI) those are dead/unused and, under
RUSTFLAGS=-D warnings, fail the build with 7 dead_code/unused errors.

Add cfg_attr(not(target_arch=x86_64), allow(...)) to each so the crate builds
clean on aarch64 (scalar path) while x86 is untouched. Verified: aarch64 lib +
tests + examples compile clean under -D warnings; x86 fmt/clippy/test 80/86.
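
The gating pattern described in this commit message can be sketched roughly as below. `SimdTier`, its `Avx2`/`Avx512` variants, and `select_simd_tier` are names from the message; the `Scalar` variant, the function bodies, and `lanes_per_step` with its values are assumptions made for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SimdTier {
    Scalar,
    // Only constructed on x86_64; elsewhere the variant is dead, so the
    // lint is relaxed for non-x86 targets only.
    #[cfg_attr(not(target_arch = "x86_64"), allow(dead_code))]
    Avx2,
    #[cfg_attr(not(target_arch = "x86_64"), allow(dead_code))]
    Avx512,
}

/// Pick the widest tier the running CPU supports; scalar on non-x86.
pub fn select_simd_tier() -> SimdTier {
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("avx512f") {
            return SimdTier::Avx512;
        }
        if std::is_x86_feature_detected!("avx2") {
            return SimdTier::Avx2;
        }
    }
    SimdTier::Scalar
}

/// Hypothetical consumer that names every variant on every target.
pub fn lanes_per_step(tier: SimdTier) -> usize {
    match tier {
        SimdTier::Scalar => 1,
        SimdTier::Avx2 => 4,
        SimdTier::Avx512 => 8,
    }
}
```

Scoping the `allow` with `cfg_attr` keeps the lint active on x86_64, where an unused variant would still indicate a real problem. The items are `pub` here only to keep the sketch simple; whether `dead_code` fires at all depends on item visibility and crate type.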
Strip TurboQuant/Harrier perf claims from doc comments — fastscan.rs now
points at the reproducible in-repo bench row and the top-10 parity test
instead of "1.63x faster than TurboQuant b=2 on Harrier 207k".
Genericize turbovec-internal doc refs (index.rs swap_remove, rank_io.rs
format note) and rename the .tvsb temp files turbovec_* -> ordvec_*. Drop
"Harrier" from the sign_bitmap embedding-family example.

Rewrite docs/RANK_MODES.md as standalone (no TurboQuant baseline table;
synthetic in-repo headline, real-corpus results deferred to the paper)
and fix the stale path in FOLLOWUP_BODY_KERNEL_TIE_BREAK.md. Regenerate
benchmarks/rank_modes_results.txt with the FastScan row.

Dual-license MIT OR Apache-2.0: LICENSE-MIT (Nelson Spence) +
LICENSE-APACHE, drop the scaffold's bare LICENSE. ordvec is original
rank/sign work (no TurboQuant code), so copyright is Nelson's; the
turbovec origin stays as a provenance note in README/lib.rs.
Fieldnote-Echo merged commit b4621b7 into main on May 22, 2026
Fieldnote-Echo deleted the prod/defiction branch on May 23, 2026 00:08
Fieldnote-Echo added a commit that referenced this pull request May 23, 2026
… review)

Address the bot review wave (gemini/Codex/qodo) — all "convert core panics to
clean Python errors", completing the binding's boundary-guard design:

- Width validation (check_width): every f32 input now checks ncols == dim (2-D)
  / len == dim (1-D). The core derives n = len/dim and only asserts divisibility,
  so a wrong-but-divisible shape (e.g. (1,128) into a dim-64 index) was silently
  reinterpreted as a different vector count, or panicked on the result reshape.
  Now a clean ValueError. (gemini x3 critical, Codex x2 P1, qodo #3)
- Constructor validation: Rank/RankQuant/Bitmap/SignBitmap `new` return PyResult
  and validate against the EXACT core asserts (dim in [2, u16::MAX]; bits in
  {1,2,4} + dim a multiple of 8/bits and 2^bits; dim % 64 == 0 + 0 < n_top < dim;
  dim % 64 == 0 + dim <= MAX_SIGN_BITMAP_DIM) -> ValueError instead of panic. (gemini x4)
- swap_remove (Rank, RankQuant): bounds-check -> IndexError, not panic.
  (gemini high, qodo #4)
- README provenance tightened to the canonical "developed within turbovec,
  factored out" phrasing. (qodo #2)
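
The width-validation point can be illustrated with a small sketch. It assumes a plain-Rust stand-in for the binding: `check_width` is named in the commit message, but its signature here, the `WidthError` type, and `count_by_divisibility` are hypothetical, and the conversion to a Python `ValueError` is omitted.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum WidthError {
    Mismatch { expected: usize, got: usize },
}

/// Validate a 2-D `(nrows, ncols)` shape against the index dimension and
/// return the number of vectors it carries.
pub fn check_width(shape: (usize, usize), dim: usize) -> Result<usize, WidthError> {
    let (nrows, ncols) = shape;
    if ncols != dim {
        return Err(WidthError::Mismatch { expected: dim, got: ncols });
    }
    Ok(nrows)
}

/// What a divisibility-only check infers from the flattened buffer length.
pub fn count_by_divisibility(len: usize, dim: usize) -> Option<usize> {
    if dim != 0 && len % dim == 0 {
        Some(len / dim)
    } else {
        None
    }
}
```

For the `(1, 128)` input against a dim-64 index mentioned above, the divisibility-only path accepts the buffer and reports two vectors, while the width check rejects it.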

Tests: +9 (width-mismatch x6, swap_remove OOB x3); constructor-rejection tests
tightened from BaseException to ValueError. Suite now 117 passed + 1 xfail.
clippy -D warnings + fmt clean; MSRV 1.89 builds core + binding.

Not changed: qodo #1 (ndarray via numpy) is a deliberate, documented core-vs-
binding split (deps grep + publish scoped to -p ordvec; the core's published lock
is clean; the binding is publish = false, PyPI-only) -- explained on-thread.
Fieldnote-Echo added a commit that referenced this pull request May 25, 2026
…-fn)

Codex stop-review #2: the previous commit over-claimed write/load
"symmetry" for the public write_* free functions. Those are low-level
trusted serializers — they do NOT re-validate dim / n_vectors / n_top /
bits / divisibility / data semantics, all of which the loaders check, so a
direct caller can still produce a file load_* rejects. The MAX_PAYLOAD
guard closed only one of several asymmetries.

The actual round-trip guarantee is type-level and was already designed in:
Rank/RankQuant/Bitmap/SignBitmap::new validate dim / n_top / bits /
divisibility against the loaders' bounds (Bitmap::new's comment says so
explicitly), add() caps n_vectors, write() caps payload, and the types emit
only loader-valid data — so anything T::write produces, T::load reloads.

- rank_io module docs gain a "Round-trip contract" section: round-trip is a
  type-level guarantee; write_* are trusted serializers assuming loader-valid
  input; MAX_PAYLOAD is the one loader bound they also enforce
  (defense-in-depth + no truncation before File::create).
- Dropped "symmetric write/load" from the four writer comments,
  check_payload_bytes, and the MAX_VECTORS doc.
- Fixed a pre-existing module-doc inaccuracy: MAX_DIM * MAX_VECTORS is
  ~8 TiB, not 128 GiB — MAX_PAYLOAD is the binding byte ceiling.
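
The distinction between a trusted serializer and a type-level round-trip guarantee can be sketched as follows. Everything here is a toy stand-in under stated assumptions: `ToyIndex`, `write_raw`, `load_raw`, `FormatError`, the on-disk layout, and the `MAX_DIM` value are hypothetical and do not reflect ordvec's actual format or API.

```rust
/// Hypothetical bound for this sketch; not ordvec's real `MAX_DIM`.
pub const MAX_DIM: usize = 1024;

#[derive(Debug, PartialEq, Eq)]
pub enum FormatError {
    BadDim(usize),
    BadLength,
}

/// Trusted serializer: emits a 4-byte little-endian `dim` header followed
/// by the payload, without re-validating either.
pub fn write_raw(dim: usize, data: &[u8], out: &mut Vec<u8>) {
    out.extend_from_slice(&(dim as u32).to_le_bytes());
    out.extend_from_slice(data);
}

/// Loader: validates the header and payload length before accepting.
pub fn load_raw(bytes: &[u8]) -> Result<(usize, Vec<u8>), FormatError> {
    if bytes.len() < 4 {
        return Err(FormatError::BadLength);
    }
    let dim = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]) as usize;
    if !(2..=MAX_DIM).contains(&dim) {
        return Err(FormatError::BadDim(dim));
    }
    let payload = &bytes[4..];
    if payload.len() % dim != 0 {
        return Err(FormatError::BadLength);
    }
    Ok((dim, payload.to_vec()))
}

/// Validated type: the constructor and `add` enforce the loader's bounds,
/// so whatever `write` emits is something `load` accepts.
pub struct ToyIndex {
    dim: usize,
    data: Vec<u8>,
}

impl ToyIndex {
    pub fn new(dim: usize) -> Result<Self, FormatError> {
        if !(2..=MAX_DIM).contains(&dim) {
            return Err(FormatError::BadDim(dim));
        }
        Ok(ToyIndex { dim, data: Vec::new() })
    }

    pub fn add(&mut self, row: &[u8]) -> Result<(), FormatError> {
        if row.len() != self.dim {
            return Err(FormatError::BadLength);
        }
        self.data.extend_from_slice(row);
        Ok(())
    }

    pub fn data(&self) -> &[u8] {
        &self.data
    }

    pub fn write(&self) -> Vec<u8> {
        let mut out = Vec::new();
        write_raw(self.dim, &self.data, &mut out);
        out
    }

    pub fn load(bytes: &[u8]) -> Result<Self, FormatError> {
        let (dim, data) = load_raw(bytes)?;
        Ok(ToyIndex { dim, data })
    }
}
```

In this sketch a direct `write_raw(1, ...)` call produces bytes that `load_raw` rejects, while no sequence of `ToyIndex` calls can do so, which is the sense in which the guarantee lives in the type rather than in the writer.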

Doc/comment-only; no behavior change. Gate green: fmt, clippy -D warnings,
rustdoc -D warnings, test (default/experimental/no-default).
Fieldnote-Echo added a commit that referenced this pull request May 25, 2026
…he persistence API

Codex stop-review #2 follow-through (Nelson's call): the raw rank_io
write_*/load_* free functions were public but are trusted serializers, not
validated constructors — leaving them public invited the wrong mental model
(and was the root of the retracted "write/load symmetry" claim). Close the
door before crates.io locks the surface.

- write_rank / load_rank / write_rankquant / load_rankquant / write_bitmap /
  load_bitmap / write_sign_bitmap / load_sign_bitmap -> pub(crate). The
  MAX_DIM / MAX_SIGN_BITMAP_DIM / MAX_VECTORS constants stay public.
- The supported persistence API is now unambiguously the index types'
  write()/load(): Rank / RankQuant / Bitmap / SignBitmap. Module docs updated
  to a "Persistence API & round-trip contract" section.
- Relocated the rank_io-layer tests to a src/rank_io.rs unit-test module
  (they need crate-internal access): the TV-DESER-004/005 loader red-team
  tests (from tests/redteam_delta.rs, now deleted) and the write-guard test
  (from tests/index/main.rs). Same coverage, co-located with the code.
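
The reason the tests had to move can be shown with a minimal sketch: files under `tests/` compile as separate crates and cannot see `pub(crate)` items, whereas a unit-test module in the same source file can. The function `load_header`, its behavior, and the module name `loader_tests` are hypothetical.

```rust
/// Crate-internal loader stand-in: reads a 4-byte little-endian header.
pub(crate) fn load_header(bytes: &[u8]) -> Option<u32> {
    if bytes.len() < 4 {
        return None;
    }
    Some(u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
}

// Unit tests live next to the code, so `pub(crate)` items are in scope.
#[cfg(test)]
mod loader_tests {
    use super::*;

    #[test]
    fn rejects_truncated_header() {
        assert!(load_header(&[0x01, 0x02]).is_none());
    }

    #[test]
    fn accepts_full_header() {
        assert_eq!(load_header(&[4, 0, 0, 0]), Some(4));
    }
}
```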

The Python binding only consumed the MAX_* constants (unchanged); examples
and benches don't touch the free functions — no other fallout.

Gate green: fmt, clippy -D warnings, rustdoc -D warnings, test
(default/experimental/no-default), MSRV 1.89, --locked.