fix: remediate pre-release audit r5 — SIMD safety docs, fuzz hot paths, loader/Python hardening by Fieldnote-Echo · Pull Request #40 · Fieldnote-Echo/ordvec

Fieldnote-Echo · 2026-05-25T14:09:17Z

Triage + remediation of the 5th external pre-release audit (Rust crate + Python bindings). The audit confirmed no memory-safety / CVE-class findings; everything is hardening, docs, or coverage. Audit was taken at an older commit, so each finding was re-verified against current main first.

Fixed (9)

| # | Sev | Fix | Commit |
|---|---|---|---|
| 5 | Med | Per-lane `SAFETY` + AVX-512 `0xFF00` blend-mask rationale on all 4 scan kernels | `docs: …SIMD scan-kernel safety` |
| 6 | Med | Slice-length `// SAFETY:` invariant on the AVX2/AVX-512 raw-pointer scans | (same) |
| 8 | Med | Path-trust note promoted to per-method docstrings (write/load ×4 types) | `docs(python): …path-trust` |
| 9 | Low | `search_asymmetric_byte_lut` bits==1 error reworded benchmark-only (no `{2,4}` dispatch leak) | (same) |
| 1 | Low | Loaders propagate `metadata()?` instead of swallowing as `0` (TOCTOU false-reject) | `fix: propagate loader metadata error` |
| 7 | Low | `TopK::new(0)` inert-edge test | `test: cover TopK::new(0)` |
| 14 | Low | Weekly scheduled `cargo-deny advisories` scan | `ci: weekly …advisories scan` |
| 11 | Low | `roundtrip_rankquant` fuzz target (write→load) | `test: fuzz …hot paths` |
| 12 | Low | `search_rankquant` fuzz target (add→search, drives SIMD scan + TopK) | (same) |

Fuzz validated locally: cargo +nightly fuzz build green; ~14.7k / 12.8k executions each, no crashes. (Covers the Perplexity-deferred add/search fuzz too.)

Verified already-satisfied (1)

Audit #15: python.yml already builds the abi3 wheel, installs the built wheel, and runs pytest across 3.9 (floor) + 3.13 (ceiling) plus 3.13 on arm64/macOS/windows. No change needed.

Deferred to issues (low-value / higher-effort)

- Audit #3, #4 → #38 (perf: reduce per-operation allocations on hot paths; `rank_transform_into`, `l2_normalise`, thread-local scratch).
- Audit #8 (opt-in guard) → #39 (feat(python): opt-in `safe_paths` guard for write/load, canonicalize + prefix). The per-method disclosure, which is the audit's minimum bar, landed here.
- Audit #13 (Miri) → already tracked in #31 (ci: add a sanitizer job; ASAN for AVX2/NEON unsafe, Miri analysis). This is the Perplexity-deferred sanitizer CI; Miri is rayon-blocked and cannot reach the SIMD unsafe, as documented there.
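
The deferred `safe_paths` guard is described only as "canonicalize + prefix". As a rough illustration of that idea, and not the design tracked in #39, a Rust sketch might look like the following. The helper name `resolve_under_root` and its error kind are assumptions made for this example.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Resolves `candidate` and accepts it only if it lies under `root`.
/// Both paths must already exist, because `canonicalize` resolves
/// symlinks and `..` components against the real filesystem.
fn resolve_under_root(root: &Path, candidate: &Path) -> io::Result<PathBuf> {
    let root = root.canonicalize()?;
    let resolved = candidate.canonicalize()?;
    // `Path::starts_with` compares whole components, so `/data/idx2`
    // is not treated as being inside `/data/idx`.
    if resolved.starts_with(&root) {
        Ok(resolved)
    } else {
        Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            "path escapes the allowed root",
        ))
    }
}
```

A write path would need extra care, since the target file usually does not exist yet; canonicalizing its parent directory is one option.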

Declined (non-issues, with rationale)

- Audit #2 (`write_*` `assert_eq!` → `debug_assert!`): writes are not a hot path, and the hard assert is deliberate defense-in-depth. Converting it would regress the "debug_assert compiles out in release" lesson from #26 (fix: pre-public public-API contract hardening, release-readiness triage).
- Audit #10 (`rank_norm(0)` Python `ValueError`): deliberate Rust↔Python parity. The core is total there (defined output, no panic), so a Python-only `ValueError` would diverge from the Rust API. The same call was made for `rank_to_bucket`/`bucket_centre` in #23 (Pre-flip: remediate release-gate audit (8 findings) + full Python parity) and #26.
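
The first declined item rests on the difference between a hard assert and a debug-only one. The sketch below illustrates that difference with two hypothetical helpers (`checked_row_len` and `debug_checked_row_len` are names invented here, not functions from the crate).

```rust
/// Hard check: enforced in debug and release builds alike.
fn checked_row_len(row: &[u8], expected: usize) -> usize {
    assert_eq!(row.len(), expected, "row length mismatch");
    row.len()
}

/// Debug-only check: `debug_assert_eq!` is only evaluated when
/// `debug_assertions` is enabled, so an optimized release build
/// lets a mismatched row through silently.
fn debug_checked_row_len(row: &[u8], expected: usize) -> usize {
    debug_assert_eq!(row.len(), expected, "row length mismatch");
    row.len()
}
```

Calling both with a two-byte row and `expected = 3` panics in the first helper under every profile, while the second panics only when `debug_assertions` is on.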

Gate

fmt · clippy (core + -p ordvec-python) · test (default / no-default / experimental) · fuzz build + run — all green. No core Cargo.toml/lock change.

Merges under strict protection (CI + 1 non-self code-owner approval).

Audit r5 #5/#6: the four pub(crate) asymmetric scan kernels read packed docs via raw pointer arithmetic but stated no SAFETY invariant. Add the slice-length / query-length / divisibility contract to each, and document the scan_b4_asym_avx512 0xFF00 blend-mask per-lane semantics (lanes 0-7 <- chunk_lo, 8-15 <- chunk_hi, pairing with the shift vector). Comment-only; no behavioural change.
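
The per-lane meaning of the `0xFF00` mask can be modelled in scalar Rust. This is a sketch rather than the kernel code: it assumes the usual AVX-512 mask-blend convention, where a set mask bit selects the second operand (as `_mm512_mask_blend_epi32` does), and `blend16` / `blend_lo_hi` are hypothetical helper names.

```rust
/// Scalar model of a 16-lane mask blend: lane `i` takes `b[i]` when
/// bit `i` of `mask` is set, otherwise `a[i]`.
fn blend16(mask: u16, a: [u32; 16], b: [u32; 16]) -> [u32; 16] {
    let mut out = [0u32; 16];
    for lane in 0..16 {
        out[lane] = if (mask >> lane) & 1 == 1 { b[lane] } else { a[lane] };
    }
    out
}

/// With mask 0xFF00, bits 8..=15 are set: lanes 0-7 come from
/// `chunk_lo` and lanes 8-15 come from `chunk_hi`.
fn blend_lo_hi(chunk_lo: [u32; 16], chunk_hi: [u32; 16]) -> [u32; 16] {
    blend16(0xFF00, chunk_lo, chunk_hi)
}
```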

Audit r5 #1: all four loaders did file.metadata().map(|m| m.len()).unwrap_or(0). On a metadata race (NFS/procfs) that yields a bogus 0 that false-rejects a valid index. Propagate the io::Error via ? instead.
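
A minimal side-by-side of the two patterns, using only `std`. The function names are illustrative; the real loaders do this inline.

```rust
use std::fs::File;
use std::io;

/// Old pattern: a failed `metadata()` call is silently reported as length 0,
/// which later looks like an empty or truncated file.
fn file_len_swallowing(file: &File) -> u64 {
    file.metadata().map(|m| m.len()).unwrap_or(0)
}

/// New pattern: the `io::Error` reaches the caller through `?`,
/// so a metadata failure is reported as what it is.
fn file_len_propagating(file: &File) -> io::Result<u64> {
    Ok(file.metadata()?.len())
}
```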

Audit r5 #7: k==0 arises when an empty index clamps k=min(requested,n)=0. Pin that maybe_insert is a no-op and finalize_into emits nothing, without panicking on the zero-length slots.
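
To make the `k == 0` contract concrete, here is a small stand-in for `TopK`. The struct layout and method signatures are assumptions for illustration; only the method names `maybe_insert` and `finalize_into` come from the commit message.

```rust
/// Hypothetical stand-in for the crate's `TopK` (layout and signatures assumed).
struct SketchTopK {
    k: usize,
    slots: Vec<(f32, u32)>, // (score, doc id), at most `k` entries
}

impl SketchTopK {
    fn new(k: usize) -> Self {
        SketchTopK { k, slots: Vec::with_capacity(k) }
    }

    /// Keeps the `k` highest scores; a no-op when `k == 0`.
    fn maybe_insert(&mut self, score: f32, id: u32) {
        if self.k == 0 {
            return; // never index into zero-length slots
        }
        if self.slots.len() < self.k {
            self.slots.push((score, id));
            return;
        }
        // Find the current minimum and replace it if the new score is higher.
        let mut min_idx = 0;
        for i in 1..self.slots.len() {
            if self.slots[i].0 < self.slots[min_idx].0 {
                min_idx = i;
            }
        }
        if score > self.slots[min_idx].0 {
            self.slots[min_idx] = (score, id);
        }
    }

    /// Appends the retained hits to `out`, best score first.
    fn finalize_into(mut self, out: &mut Vec<(f32, u32)>) {
        self.slots.sort_by(|a, b| b.0.total_cmp(&a.0));
        out.extend(self.slots);
    }
}

/// The clamp described above: an empty index turns any requested k into 0.
fn clamp_k(requested: usize, n: usize) -> usize {
    requested.min(n)
}
```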

…rror Audit r5 #8: promote the module-level 'paths are not traversal-sanitised' disclosure to each write/load method so it shows in IDE hover. Audit r5 #9: reword the search_asymmetric_byte_lut bits==1 error as a benchmark-only-helper message rather than leaking the {2,4} kernel-dispatch detail.

Audit r5 #11/#12 (and the Perplexity-deferred add/search fuzz): the four existing targets only cover the loaders. Add search_rankquant (add -> search_asymmetric, driving the SIMD scan kernels + TopK, k=0 included) and roundtrip_rankquant (write -> load round-trip, the write path the loader targets never reach). Validated: fuzz build green + ~14k/13k runs each with no crashes.

Audit r5 #14: the ci.yml deny job runs on push/PR only, so a new RUSTSEC advisory against an unchanged dependency goes unseen until the next PR. Add a scheduled (weekly) + on-demand advisories scan, read-only token, mirroring ci.yml's EmbarkStudios/cargo-deny-action@v2.

qodo-code-review · 2026-05-25T14:09:41Z

Review Summary by Qodo

Audit r5 remediation: SIMD safety docs, fuzz hot paths, loader hardening

🐞 Bug fix 🧪 Tests 📝 Documentation

Walkthroughs

Description

• Propagate loader metadata errors instead of swallowing as 0 (TOCTOU false-reject fix)
• Add comprehensive SAFETY documentation to all 4 SIMD scan kernels with slice/query invariants
• Add per-method path-trust docstrings to all 8 write/load methods across Python bindings
• Add two new fuzz targets: search_rankquant (compute hot path) and roundtrip_rankquant (write→load)
• Add TopK::new(0) edge-case test and reword search_asymmetric_byte_lut error message
• Add weekly scheduled cargo-deny advisories CI scan for supply-chain hygiene

Diagram

flowchart LR
  A["Audit Findings"] --> B["SIMD Safety Docs"]
  A --> C["Loader Hardening"]
  A --> D["Python Path Trust"]
  A --> E["Fuzz Coverage"]
  A --> F["CI Advisories"]
  B --> G["4 Scan Kernels"]
  C --> H["Metadata Error Propagation"]
  D --> I["8 Write/Load Methods"]
  E --> J["search_rankquant Target"]
  E --> K["roundtrip_rankquant Target"]
  F --> L["Weekly Schedule"]

File Changes

1. src/quant_kernels.rs 📝 Documentation +46/-0

Document SIMD scan kernel safety invariants

• Add comprehensive SAFETY documentation block to all 4 scan kernels (scan_b2_asym_avx2, scan_b4_asym_avx2, scan_b2_asym_avx512, scan_b4_asym_avx512)
• Document slice-length, query-length, and divisibility invariants for raw pointer arithmetic
• Add per-lane semantics documentation for AVX-512 blend mask 0xFF00 in scan_b4_asym_avx512
• Explain lane pairing with shift vector for nibble extraction (lanes 0-7 ← chunk_lo, lanes 8-15 ← chunk_hi)

src/quant_kernels.rs
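
For readers unfamiliar with the convention, a slice-length `// SAFETY:` invariant on a raw-pointer scan typically reads like the scalar sketch below. This is not one of the four kernels: `scan_packed_sum` and its parameters are hypothetical, and the real contract also covers query length and divisibility, which are not modelled here.

```rust
/// Sums the bytes of each of `n_docs` packed documents, `bytes_per_doc`
/// bytes apiece, using raw-pointer reads.
fn scan_packed_sum(packed: &[u8], n_docs: usize, bytes_per_doc: usize) -> Vec<u64> {
    // Hard checks, so the invariant below also holds in release builds.
    let total = n_docs.checked_mul(bytes_per_doc).expect("size overflow");
    assert_eq!(packed.len(), total, "packed length mismatch");

    let base = packed.as_ptr();
    let mut sums = Vec::with_capacity(n_docs);
    for doc in 0..n_docs {
        let mut acc = 0u64;
        for j in 0..bytes_per_doc {
            let idx = doc * bytes_per_doc + j;
            // SAFETY: doc < n_docs and j < bytes_per_doc, so
            // idx < n_docs * bytes_per_doc == packed.len(). The read stays
            // inside the slice, which is borrowed for the whole call.
            let byte = unsafe { *base.add(idx) };
            acc += u64::from(byte);
        }
        sums.push(acc);
    }
    sums
}
```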

2. src/rank_io.rs 🐞 Bug fix +16/-4

Propagate metadata errors in all loaders

• Propagate file.metadata() errors via ? instead of swallowing as 0 in 4 loader functions
• Affected loaders: load_rank, load_rankquant, load_bitmap, load_sign_bitmap
• Add comments explaining TOCTOU race prevention on NFS/procfs filesystems
• Prevents false-rejection of valid indices when metadata call fails

src/rank_io.rs

3. src/util.rs 🧪 Tests +15/-1

Test TopK zero-k edge case

• Add TopK::new(0) edge-case test (topk_zero_k_is_inert)
• Verify maybe_insert is a no-op and finalize_into emits nothing with zero-length slots
• Ensure no panic or out-of-bounds access on empty TopK
• Import TopK in test module for test coverage

src/util.rs

4. ordvec-python/src/lib.rs 📝 Documentation +29/-1

Add path-trust docstrings to Python write/load methods

• Add per-method path-trust docstrings to Rank.write() and Rank.load() methods
• Add per-method path-trust docstrings to RankQuant.write() and RankQuant.load() methods
• Add per-method path-trust docstrings to Bitmap.write() and Bitmap.load() methods
• Add per-method path-trust docstrings to SignBitmap.write() and SignBitmap.load() methods
• Reword search_asymmetric_byte_lut error message to clarify benchmark-only helper status

ordvec-python/src/lib.rs

5. fuzz/fuzz_targets/search_rankquant.rs 🧪 Tests +50/-0

Add search_rankquant hot-path fuzz target

• New libFuzzer target for RankQuant ingest + asymmetric-search hot path
• Exercises add (rank_transform → pack) then search_asymmetric (SIMD scan kernels → TopK)
• Fixed dim=64 divisible by all bits ∈ {1,2,4} and AVX-512 64-code unroll
• Fuzzer shapes doc count, embedding/query values, and k (including k=0 edge)
• Maps embedding values to finite f32 to exercise kernels, not NaN/±Inf guards

fuzz/fuzz_targets/search_rankquant.rs
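
The summary says the target maps fuzzer input to finite `f32` values but does not show how. One plausible mapping, given purely as an assumption, decodes four bytes per value and replaces non-finite results; `bytes_to_finite_f32` is a hypothetical name.

```rust
/// Turns fuzzer bytes into finite f32 values, four bytes per value.
/// Bit patterns that decode to NaN or ±Inf are replaced with 0.0, and
/// any trailing bytes that do not fill a full value are ignored.
fn bytes_to_finite_f32(data: &[u8]) -> Vec<f32> {
    data.chunks_exact(4)
        .map(|c| {
            let v = f32::from_le_bytes([c[0], c[1], c[2], c[3]]);
            if v.is_finite() { v } else { 0.0 }
        })
        .collect()
}
```

Keeping the values finite steers the fuzzer toward the scan kernels and `TopK` instead of spending its budget on the NaN/±Inf input guards.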

6. fuzz/fuzz_targets/roundtrip_rankquant.rs 🧪 Tests +52/-0

Add roundtrip_rankquant write-load fuzz target

• New libFuzzer target for RankQuant write → load round-trip
• Builds real index from fuzzer-shaped data, persists via write, requires load to accept and preserve shape
• Exercises write path that loader-only targets cannot reach
• Validates type-level round-trip guarantee: write then load must succeed
• Fuzzer shapes doc count and embedding values; fixed dim=64

fuzz/fuzz_targets/roundtrip_rankquant.rs
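
The round-trip contract can be sketched without the crate or libFuzzer. Everything below is a stand-in: the file format (a `u64` count followed by raw bytes) is a toy, not the ordvec layout, and the function names are invented for this example. It uses only `std`, so a scratch directory under `std::env::temp_dir()` takes the place of `tempfile`.

```rust
use std::fs;
use std::io::{self, Read, Write};
use std::path::Path;

/// Toy format (an assumption): u64 little-endian count, then the bytes.
fn write_codes(path: &Path, codes: &[u8]) -> io::Result<()> {
    let mut f = fs::File::create(path)?;
    f.write_all(&(codes.len() as u64).to_le_bytes())?;
    f.write_all(codes)
}

fn load_codes(path: &Path) -> io::Result<Vec<u8>> {
    let mut f = fs::File::open(path)?;
    let file_len = f.metadata()?.len(); // propagate, never default to 0
    let mut header = [0u8; 8];
    f.read_exact(&mut header)?;
    let n = u64::from_le_bytes(header);
    if n.checked_add(8) != Some(file_len) {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "length mismatch"));
    }
    let mut codes = vec![0u8; n as usize];
    f.read_exact(&mut codes)?;
    Ok(codes)
}

/// One iteration: write must succeed and load must return the same data.
/// Any failure panics, which a fuzzer would report as a crash.
fn roundtrip_once(codes: &[u8]) {
    let dir = std::env::temp_dir().join(format!("roundtrip_sketch_{}", std::process::id()));
    fs::create_dir_all(&dir).expect("create scratch dir");
    let path = dir.join("index.bin"); // fresh path, no handle held open
    write_codes(&path, codes).expect("write must not fail");
    let loaded = load_codes(&path).expect("load must accept what write produced");
    assert_eq!(loaded, codes);
    fs::remove_dir_all(&dir).expect("clean up scratch dir");
}
```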

7. fuzz/Cargo.toml ⚙️ Configuration changes +16/-0

Register new fuzz targets in Cargo.toml

• Register two new fuzz target binaries: search_rankquant and roundtrip_rankquant
• Both configured with test=false, doc=false, bench=false (standard fuzz target setup)
• Add comment explaining hot-path targets beyond loaders

fuzz/Cargo.toml

8. .github/workflows/audit.yml ⚙️ Configuration changes +34/-0

Add weekly cargo-deny advisories CI scan

• New GitHub Actions workflow for weekly scheduled cargo-deny advisories scan
• Runs on Monday 06:17 UTC and on manual workflow dispatch
• Uses EmbarkStudios/cargo-deny-action@v2 with check advisories command
• Read-only token, no run: steps to prevent injection surface
• Catches newly published RustSec advisories between PR cycles

.github/workflows/audit.yml

qodo-code-review · 2026-05-25T14:09:42Z

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0)

1. ~~Unpinned Actions in cron~~ ✓ Resolved 🐞 Bug ⛨ Security

Description

The new weekly-scheduled audit workflow runs third-party GitHub Actions pinned only to mutable tags
(e.g., @v4, @v2), so the executed code can change without a PR in this repo. Because this runs
on a cron schedule, a compromised/malicious tag update could execute automatically on runners.

Code

.github/workflows/audit.yml[R31-34]

Evidence

The new scheduled workflow uses tag-based uses: references for both actions/checkout and
cargo-deny-action, which are mutable references. The repo’s main CI workflow shows the same
pattern, confirming this is not yet mitigated elsewhere.

.github/workflows/audit.yml[31-34]
.github/workflows/ci.yml[144-152]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The scheduled `audit.yml` workflow references GitHub Actions using mutable version tags (e.g., `actions/checkout@v4`, `EmbarkStudios/cargo-deny-action@v2`). Tags can be retargeted, so the workflow may execute different code over time without any change in this repository.

## Issue Context
This workflow is triggered weekly via `schedule`, so it will execute automatically even when no PRs are being opened. Although permissions are read-only, this is still avoidable supply-chain risk.

## Fix
Pin each referenced action to an immutable commit SHA (optionally keeping the tag in a comment for readability), e.g.:
- `uses: actions/checkout@<full_sha> # v4`
- `uses: EmbarkStudios/cargo-deny-action@<full_sha> # v2`

## Fix Focus Areas
- .github/workflows/audit.yml[31-34]

gemini-code-assist

Code Review

This pull request introduces two new fuzz targets, search_rankquant and roundtrip_rankquant, to test the compute path and serialization round-trip of the RankQuant index. It also improves IO robustness by propagating metadata errors during index loading and adds comprehensive safety documentation to the unsafe SIMD kernels. Additionally, a unit test was added to ensure the TopK utility correctly handles cases where k=0, and Python bindings were updated with documentation regarding path traversal. I have no feedback to provide.

Copilot

Pull request overview

This PR applies the r5 pre-release audit hardening across the Rust core, SIMD docs, fuzzing coverage, Python bindings docs, and CI supply-chain scanning, without introducing new functionality or changing dependency manifests.

Changes:

- Harden loader correctness by propagating metadata() errors instead of silently treating them as length 0.
- Add fuzz coverage for RankQuant hot paths (ingest+search) and write→load round-trips, plus a TopK k=0 edge-case regression test.
- Expand SIMD-kernel safety documentation and add per-method Python path-trust disclosures; add a weekly scheduled cargo-deny advisories workflow.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file

| File | Description |
|---|---|
| src/util.rs | Adds a regression test covering `TopK::new(0)` inert behavior. |
| src/rank_io.rs | Propagates file metadata errors in all four loaders to avoid incorrect length checks. |
| src/quant_kernels.rs | Adds detailed `SAFETY:` invariants and AVX-512 blend-mask rationale to scan kernels. |
| ordvec-python/src/lib.rs | Documents path-trust expectations per `write`/`load` method; rewords a benchmark-only error message. |
| fuzz/fuzz_targets/search_rankquant.rs | New fuzz target exercising RankQuant ingest + asymmetric-search compute path. |
| fuzz/fuzz_targets/roundtrip_rankquant.rs | New fuzz target for RankQuant write→load round-trip. |
| fuzz/Cargo.toml | Registers the two new fuzz targets. |
| .github/workflows/audit.yml | Adds a weekly scheduled cargo-deny advisories scan workflow. |

Comments suppressed due to low confidence (3)

src/rank_io.rs:364

Same issue as in load_rank: the comment claims a swallowed metadata() error would make the payload-vs-file check “pass … false-rejecting”, but in practice file_len = 0 makes check_payload_matches_file reject any non-empty payload and can let empty-payload files bypass the trailing-bytes check. Please reword to match the real behavior.

    // Propagate a metadata failure rather than swallowing it as `0`: a bogus
    // `0` would make the payload-vs-file check pass for an empty corpus even on
    // a TOCTOU-racy filesystem (NFS/procfs), false-rejecting a valid index.
    let file_len = file.metadata()?.len();

src/rank_io.rs:493

Same comment-rationale mismatch here: with file_len = 0, check_payload_matches_file will reject any non-empty bitmap file and will not detect trailing bytes when n_vectors == 0. Consider rewording this duplicated comment block so it describes the actual failure modes.

    // Propagate a metadata failure rather than swallowing it as `0`: a bogus
    // `0` would make the payload-vs-file check pass for an empty corpus even on
    // a TOCTOU-racy filesystem (NFS/procfs), false-rejecting a valid index.
    let file_len = file.metadata()?.len();

src/rank_io.rs:612

Same duplicated comment issue as the other loaders: swallowing metadata() errors as 0 does not just cause “false-reject”; it can also defeat the trailing-bytes check for the empty-payload case. Please reword the rationale to match what check_payload_matches_file actually does when file_len is bogus.

    // Propagate a metadata failure rather than swallowing it as `0`: a bogus
    // `0` would make the payload-vs-file check pass for an empty corpus even on
    // a TOCTOU-racy filesystem (NFS/procfs), false-rejecting a valid index.
    let file_len = file.metadata()?.len();

…4} text) The #9 reword in this PR dropped the '{2, 4}' dispatch detail from the search_asymmetric_byte_lut bits==1 error; test_search_asymmetric_byte_lut_rejects_b1 had pinned that substring via match='2, 4', so CI's pytest leg failed. Re-point it at the new benchmark-only phrasing. Verified locally: maturin develop + pytest = 155 passed.

copilot: the comment conflated 'passes for an empty corpus' with 'false-rejects a valid index'. Reword to the actual failure modes a bogus file_len=0 causes — false-reject of any non-empty index, and an empty-corpus pass that skips the trailing-byte check. Comment-only; all 4 loaders.
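
Both failure modes follow from a saturating length check. The function below is a hypothetical reconstruction for illustration only: the real `check_payload_matches_file` signature is not shown in this PR, and the `header_len` parameter is an assumption.

```rust
/// Compares the bytes left after the header with the expected payload size.
fn check_payload_matches_file(
    file_len: u64,
    header_len: u64,
    payload_bytes: u64,
) -> Result<(), String> {
    // Saturates to 0 when `file_len` is smaller than the header,
    // which is exactly what a bogus `file_len = 0` triggers.
    let remaining = file_len.saturating_sub(header_len);
    if remaining != payload_bytes {
        return Err(format!(
            "payload is {payload_bytes} bytes but {remaining} bytes remain in the file"
        ));
    }
    Ok(())
}
```

With a bogus `file_len = 0`, a non-empty index is rejected (`0 != payload_bytes`), while an empty index passes (`0 == 0`) even if the real file carries trailing bytes that a correct length would have exposed.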

…or (#40 review) copilot: the target swallowed write() failures (hiding regressions) and wrote to an already-open NamedTempFile handle (reopen-by-path write can fail on Windows). Write to a fresh path in tempfile::tempdir() and treat a write failure as a crash, matching the stated round-trip contract. Validated: fuzz build + 7531 runs, no crashes.

Fieldnote-Echo · 2026-05-25T14:31:32Z

Addressed the review (latest commits on the branch):

copilot — `src/rank_io.rs` metadata comment: reworded to the actual failure modes a bogus `file_len = 0` causes — false-reject of any non-empty index (`remaining` saturates to 0 ≠ positive `payload_bytes`), and an empty-corpus pass that skips the trailing-byte check. Applied to all 4 loaders.
copilot — `fuzz/.../roundtrip_rankquant.rs`: now writes to a fresh `tempfile::tempdir()` path (not an already-open `NamedTempFile` handle) and treats a `write()` failure as a crash, matching the stated round-trip contract. Re-validated: cargo +nightly fuzz build + 7531 runs, no crashes.

gemini had no findings. Both threads resolved.

@v2

Review (Bug/Security): audit.yml runs UNATTENDED on a cron schedule, so a mutable @v4/@v2 tag could ship changed/compromised action code that auto-executes on the next tick with no PR in the loop. Pin actions/checkout to 34e1148 (v4.3.1) and EmbarkStudios/cargo-deny-action to a531616 (v2.0.19). ci.yml stays on tags by the team's existing policy — it only runs on human-triggered push/PR, a lower-risk surface.

Fieldnote-Echo added 6 commits May 25, 2026 09:08

fix: propagate loader metadata error instead of swallowing as 0

6bfe161

Audit r5 #1: all four loaders did file.metadata().map(|m| m.len()).unwrap_or(0). On a metadata race (NFS/procfs) that yields a bogus 0 that false-rejects a valid index. Propagate the io::Error via ? instead.

test: cover TopK::new(0) inert edge

3bec534

Audit r5 #7: k==0 arises when an empty index clamps k=min(requested,n)=0. Pin that maybe_insert is a no-op and finalize_into emits nothing, without panicking on the zero-length slots.

Fieldnote-Echo requested a review from project-navi-bot as a code owner May 25, 2026 14:09

Fieldnote-Echo requested a review from Copilot May 25, 2026 14:10

gemini-code-assist Bot reviewed May 25, 2026

Merge branch 'main' into fix/pre-release-audit-r5

b1d02ba

Copilot started reviewing on behalf of Fieldnote-Echo May 25, 2026 14:11

Copilot AI reviewed May 25, 2026

Comment thread src/rank_io.rs Outdated

Comment thread fuzz/fuzz_targets/roundtrip_rankquant.rs Outdated

Fieldnote-Echo added 3 commits May 25, 2026 09:18

project-navi-bot previously approved these changes May 25, 2026

Fieldnote-Echo dismissed project-navi-bot’s stale review via b8fc2d1 May 25, 2026 14:33

project-navi-bot approved these changes May 25, 2026

project-navi-bot merged commit e160e8a into main May 25, 2026
17 checks passed

project-navi-bot deleted the fix/pre-release-audit-r5 branch May 25, 2026 14:38

project-navi-bot merged 11 commits into main from fix/pre-release-audit-r5

3 participants
