
fix: remediate pre-release audit r5 — SIMD safety docs, fuzz hot paths, loader/Python hardening#40

Merged
project-navi-bot merged 11 commits into main from fix/pre-release-audit-r5
May 25, 2026

@Fieldnote-Echo
Owner

Triage + remediation of the 5th external pre-release audit (Rust crate + Python bindings). The audit confirmed no memory-safety / CVE-class findings; everything is hardening, docs, or coverage. Audit was taken at an older commit, so each finding was re-verified against current main first.

Fixed (9)

| # | Sev | Fix | Commit |
|---|---|---|---|
| 5 | Med | Per-lane SAFETY + AVX-512 0xFF00 blend-mask rationale on all 4 scan kernels | docs: …SIMD scan-kernel safety |
| 6 | Med | Slice-length // SAFETY: invariant on the AVX2/AVX-512 raw-pointer scans | (same) |
| 8 | Med | Path-trust note promoted to per-method docstrings (write/load ×4 types) | docs(python): …path-trust |
| 9 | Low | search_asymmetric_byte_lut bits==1 error reworded benchmark-only (no {2,4} dispatch leak) | (same) |
| 1 | Low | Loaders propagate metadata()? instead of swallowing as 0 (TOCTOU false-reject) | fix: propagate loader metadata error |
| 7 | Low | TopK::new(0) inert-edge test | test: cover TopK::new(0) |
| 14 | Low | Weekly scheduled cargo-deny advisories scan | ci: weekly …advisories scan |
| 11 | Low | roundtrip_rankquant fuzz target (write→load) | test: fuzz …hot paths |
| 12 | Low | search_rankquant fuzz target (add→search, drives SIMD scan + TopK) | (same) |

Fuzz validated locally: cargo +nightly fuzz build green; ~14.7k / 12.8k executions each, no crashes. (Covers the Perplexity-deferred add/search fuzz too.)

Verified already-satisfied (1)

Deferred to issues (low-value / higher-effort)

Declined (non-issues, with rationale)

Gate

fmt · clippy (core + -p ordvec-python) · test (default / no-default / experimental) · fuzz build + run — all green. No core Cargo.toml/lock change.

Merges under strict protection (CI + 1 non-self code-owner approval).

Audit r5 #5/#6: the four pub(crate) asymmetric scan kernels read packed docs via raw pointer arithmetic but stated no SAFETY invariant. Add the slice-length / query-length / divisibility contract to each, and document the scan_b4_asym_avx512 0xFF00 blend-mask per-lane semantics (lanes 0-7 <- chunk_lo, 8-15 <- chunk_hi, pairing with the shift vector). Comment-only; no behavioural change.
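To make the 0xFF00 blend-mask semantics concrete, here is a scalar model rather than the kernel itself. It assumes 16 lanes of 32-bit values and the usual AVX-512 mask-blend rule (a set mask bit selects the second operand). `mask_blend_16` and `combine_chunks` are hypothetical names, and the real `scan_b4_asym_avx512` may use different intrinsics or lane widths.

```rust
/// Scalar model of a 16-lane mask blend: lane `i` takes `b[i]` when bit `i`
/// of `mask` is set, and `a[i]` otherwise. This mirrors the selection rule of
/// AVX-512 mask-blend instructions without requiring AVX-512 hardware.
fn mask_blend_16(mask: u16, a: [u32; 16], b: [u32; 16]) -> [u32; 16] {
    let mut out = [0u32; 16];
    for lane in 0..16 {
        out[lane] = if (mask >> lane) & 1 == 1 { b[lane] } else { a[lane] };
    }
    out
}

/// Hypothetical helper: combine two chunks the way the commit message
/// describes, lanes 0-7 from `chunk_lo` and lanes 8-15 from `chunk_hi`.
fn combine_chunks(chunk_lo: [u32; 16], chunk_hi: [u32; 16]) -> [u32; 16] {
    // 0xFF00 has bits 8..=15 set, so only the upper eight lanes pick `chunk_hi`.
    mask_blend_16(0xFF00, chunk_lo, chunk_hi)
}
```

Bits 8 to 15 of 0xFF00 are set, so the upper eight lanes come from `chunk_hi` and the lower eight from `chunk_lo`, which is the pairing the SAFETY comment documents.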
Audit r5 #1: all four loaders did file.metadata().map(|m| m.len()).unwrap_or(0). On a metadata race (NFS/procfs) that yields a bogus 0 that false-rejects a valid index. Propagate the io::Error via ? instead.
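The before/after pattern can be sketched with the standard library alone. The function names below (`file_len_swallowed`, `file_len_propagated`, `open_and_measure`) are hypothetical and only illustrate the change; the actual loader bodies are not shown in this PR page.

```rust
use std::fs::File;
use std::io;
use std::path::Path;

/// Pre-fix pattern: a metadata failure is silently turned into length 0.
fn file_len_swallowed(file: &File) -> u64 {
    file.metadata().map(|m| m.len()).unwrap_or(0)
}

/// Post-fix pattern: a metadata failure surfaces to the caller as io::Error.
fn file_len_propagated(file: &File) -> io::Result<u64> {
    Ok(file.metadata()?.len())
}

/// Hypothetical loader prologue: open the file and measure it, propagating
/// either failure instead of inventing a length.
fn open_and_measure(path: &Path) -> io::Result<(File, u64)> {
    let file = File::open(path)?;
    let len = file_len_propagated(&file)?;
    Ok((file, len))
}
```

With the propagated form, a caller sees the underlying `io::Error` rather than a misleading size-mismatch rejection computed from a length of 0.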
Audit r5 #7: k==0 arises when an empty index clamps k=min(requested,n)=0. Pin that maybe_insert is a no-op and finalize_into emits nothing, without panicking on the zero-length slots.
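A minimal sketch of the inert k == 0 behaviour follows. This `TopK` is a hypothetical stand-in written for illustration: the method names come from the commit message, but the signatures, the `(score, id)` layout, and the `clamp_k` helper are assumptions, not the crate's implementation.

```rust
/// Hypothetical top-k accumulator (higher score wins).
struct TopK {
    k: usize,
    /// (score, id) pairs kept sorted by descending score, at most `k` long.
    slots: Vec<(f32, u32)>,
}

impl TopK {
    fn new(k: usize) -> Self {
        TopK { k, slots: Vec::with_capacity(k) }
    }

    /// Inserts the candidate only if it belongs in the current top k.
    fn maybe_insert(&mut self, score: f32, id: u32) {
        if self.k == 0 {
            return; // zero slots: nothing can ever be kept
        }
        // Index of the first slot whose score is strictly lower.
        let pos = self.slots.partition_point(|&(s, _)| s >= score);
        if pos >= self.k {
            return; // full, and the candidate is no better than the worst slot
        }
        self.slots.insert(pos, (score, id));
        self.slots.truncate(self.k);
    }

    /// Appends the retained results, best first, to `out`.
    fn finalize_into(self, out: &mut Vec<(f32, u32)>) {
        out.extend(self.slots);
    }
}

/// The clamp from the commit message: k = min(requested, n).
fn clamp_k(requested: usize, n: usize) -> usize {
    requested.min(n)
}
```

For an empty index, `clamp_k(10, 0)` yields 0, every `maybe_insert` returns immediately, and `finalize_into` appends nothing.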
…rror

Audit r5 #8: promote the module-level 'paths are not traversal-sanitised' disclosure to each write/load method so it shows in IDE hover. Audit r5 #9: reword the search_asymmetric_byte_lut bits==1 error as a benchmark-only-helper message rather than leaking the {2,4} kernel-dispatch detail.
Audit r5 #11/#12 (and the Perplexity-deferred add/search fuzz): the four existing targets only cover the loaders. Add search_rankquant (add -> search_asymmetric, driving the SIMD scan kernels + TopK, k=0 included) and roundtrip_rankquant (write -> load round-trip, the write path the loader targets never reach). Validated: fuzz build green + ~14k/13k runs each with no crashes.
Audit r5 #14: the ci.yml deny job runs on push/PR only, so a new RUSTSEC advisory against an unchanged dependency goes unseen until the next PR. Add a scheduled (weekly) + on-demand advisories scan, read-only token, mirroring ci.yml's EmbarkStudios/cargo-deny-action@v2.
@qodo-code-review

Review Summary by Qodo

Audit r5 remediation: SIMD safety docs, fuzz hot paths, loader hardening

🐞 Bug fix 🧪 Tests 📝 Documentation


Walkthroughs

Description
• Propagate loader metadata errors instead of swallowing as 0 (TOCTOU false-reject fix)
• Add comprehensive SAFETY documentation to all 4 SIMD scan kernels with slice/query invariants
• Add per-method path-trust docstrings to all 8 write/load methods across Python bindings
• Add two new fuzz targets: search_rankquant (compute hot path) and roundtrip_rankquant
  (write→load)
• Add TopK::new(0) edge-case test and reword search_asymmetric_byte_lut error message
• Add weekly scheduled cargo-deny advisories CI scan for supply-chain hygiene
Diagram
flowchart LR
  A["Audit Findings"] --> B["SIMD Safety Docs"]
  A --> C["Loader Hardening"]
  A --> D["Python Path Trust"]
  A --> E["Fuzz Coverage"]
  A --> F["CI Advisories"]
  B --> G["4 Scan Kernels"]
  C --> H["Metadata Error Propagation"]
  D --> I["8 Write/Load Methods"]
  E --> J["search_rankquant Target"]
  E --> K["roundtrip_rankquant Target"]
  F --> L["Weekly Schedule"]


File Changes

1. src/quant_kernels.rs 📝 Documentation +46/-0

Document SIMD scan kernel safety invariants

• Add comprehensive SAFETY documentation block to all 4 scan kernels (scan_b2_asym_avx2,
 scan_b4_asym_avx2, scan_b2_asym_avx512, scan_b4_asym_avx512)
• Document slice-length, query-length, and divisibility invariants for raw pointer arithmetic
• Add per-lane semantics documentation for AVX-512 blend mask 0xFF00 in scan_b4_asym_avx512
• Explain lane pairing with shift vector for nibble extraction (lanes 0-7 ← chunk_lo, lanes 8-15 ←
 chunk_hi)

src/quant_kernels.rs


2. src/rank_io.rs 🐞 Bug fix +16/-4

Propagate metadata errors in all loaders

• Propagate file.metadata() errors via ? instead of swallowing as 0 in 4 loader functions
• Affected loaders: load_rank, load_rankquant, load_bitmap, load_sign_bitmap
• Add comments explaining TOCTOU race prevention on NFS/procfs filesystems
• Prevents false-rejection of valid indices when metadata call fails

src/rank_io.rs


3. src/util.rs 🧪 Tests +15/-1

Test TopK zero-k edge case

• Add TopK::new(0) edge-case test (topk_zero_k_is_inert)
• Verify maybe_insert is a no-op and finalize_into emits nothing with zero-length slots
• Ensure no panic or out-of-bounds access on empty TopK
• Import TopK in test module for test coverage

src/util.rs


4. ordvec-python/src/lib.rs 📝 Documentation +29/-1

Add path-trust docstrings to Python write/load methods

• Add per-method path-trust docstrings to Rank.write() and Rank.load() methods
• Add per-method path-trust docstrings to RankQuant.write() and RankQuant.load() methods
• Add per-method path-trust docstrings to Bitmap.write() and Bitmap.load() methods
• Add per-method path-trust docstrings to SignBitmap.write() and SignBitmap.load() methods
• Reword search_asymmetric_byte_lut error message to clarify benchmark-only helper status

ordvec-python/src/lib.rs


5. fuzz/fuzz_targets/search_rankquant.rs 🧪 Tests +50/-0

Add search_rankquant hot-path fuzz target

• New libFuzzer target for RankQuant ingest + asymmetric-search hot path
• Exercises add (rank_transform → pack) then search_asymmetric (SIMD scan kernels → TopK)
• Fixed dim=64, chosen to be compatible with every bits ∈ {1,2,4} and with the AVX-512 64-code unroll
• Fuzzer shapes doc count, embedding/query values, and k (including k=0 edge)
• Maps embedding values to finite f32 to exercise kernels, not NaN/±Inf guards

fuzz/fuzz_targets/search_rankquant.rs
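One way to shape raw fuzzer bytes as described above is sketched here. The byte layout (first byte is k, the rest is split into rows of `DIM` values) and the helper names are assumptions for illustration; the actual target's layout is not shown on this page, and the libFuzzer harness itself is omitted so the sketch stays standard-library only.

```rust
/// Fixed dimension, as stated for the fuzz target.
const DIM: usize = 64;

/// Map each byte to a finite f32 in [-1.0, 1.0], so the scan kernels are
/// exercised instead of the NaN/±Inf input guards.
fn bytes_to_finite_f32(data: &[u8]) -> Vec<f32> {
    data.iter()
        .map(|&b| (b as f32 / 255.0) * 2.0 - 1.0)
        .collect()
}

/// Hypothetical input shaping: the first byte is `k` (0 is allowed), and the
/// remaining bytes are cut into complete rows of `DIM` values. A trailing
/// partial row is dropped.
fn shape_input(data: &[u8]) -> Option<(usize, Vec<Vec<f32>>)> {
    let (&k_byte, rest) = data.split_first()?;
    let docs: Vec<Vec<f32>> = rest
        .chunks_exact(DIM)
        .map(bytes_to_finite_f32)
        .collect();
    Some((k_byte as usize, docs))
}
```

Because every `u8` maps to a value in [-1.0, 1.0], the fuzzer cannot produce NaN or infinite inputs through this path, and the document count follows directly from the input length.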


6. fuzz/fuzz_targets/roundtrip_rankquant.rs 🧪 Tests +52/-0

Add roundtrip_rankquant write-load fuzz target

• New libFuzzer target for RankQuant write → load round-trip
• Builds real index from fuzzer-shaped data, persists via write, requires load to accept and
 preserve shape
• Exercises write path that loader-only targets cannot reach
• Validates type-level round-trip guarantee: write then load must succeed
• Fuzzer shapes doc count and embedding values; fixed dim=64

fuzz/fuzz_targets/roundtrip_rankquant.rs


7. fuzz/Cargo.toml ⚙️ Configuration changes +16/-0

Register new fuzz targets in Cargo.toml

• Register two new fuzz target binaries: search_rankquant and roundtrip_rankquant
• Both configured with test=false, doc=false, bench=false (standard fuzz target setup)
• Add comment explaining hot-path targets beyond loaders

fuzz/Cargo.toml


8. .github/workflows/audit.yml ⚙️ Configuration changes +34/-0

Add weekly cargo-deny advisories CI scan

• New GitHub Actions workflow for weekly scheduled cargo-deny advisories scan
• Runs on Monday 06:17 UTC and on manual workflow dispatch
• Uses EmbarkStudios/cargo-deny-action@v2 with check advisories command
• Read-only token, no run: steps to prevent injection surface
• Catches newly published RustSec advisories between PR cycles

.github/workflows/audit.yml



@qodo-code-review

qodo-code-review Bot commented May 25, 2026

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0)



Remediation recommended

1. Unpinned Actions in cron ✓ Resolved 🐞 Bug ⛨ Security
Description
The new weekly-scheduled audit workflow runs third-party GitHub Actions pinned only to mutable tags
(e.g., @v4, @v2), so the executed code can change without a PR in this repo. Because this runs
on a cron schedule, a compromised/malicious tag update could execute automatically on runners.
Code

.github/workflows/audit.yml[R31-34]

Evidence
The new scheduled workflow uses tag-based uses: references for both actions/checkout and
cargo-deny-action, which are mutable references. The repo’s main CI workflow shows the same
pattern, confirming this is not yet mitigated elsewhere.

.github/workflows/audit.yml[31-34]
.github/workflows/ci.yml[144-152]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The scheduled `audit.yml` workflow references GitHub Actions using mutable version tags (e.g., `actions/checkout@v4`, `EmbarkStudios/cargo-deny-action@v2`). Tags can be retargeted, so the workflow may execute different code over time without any change in this repository.

## Issue Context
This workflow is triggered weekly via `schedule`, so it will execute automatically even when no PRs are being opened. Although permissions are read-only, this is still avoidable supply-chain risk.

## Fix
Pin each referenced action to an immutable commit SHA (optionally keeping the tag in a comment for readability), e.g.:
- `uses: actions/checkout@<full_sha> # v4`
- `uses: EmbarkStudios/cargo-deny-action@<full_sha> # v2`

## Fix Focus Areas
- .github/workflows/audit.yml[31-34]




@Fieldnote-Echo Fieldnote-Echo requested a review from Copilot May 25, 2026 14:10

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces two new fuzz targets, search_rankquant and roundtrip_rankquant, to test the compute path and serialization round-trip of the RankQuant index. It also improves IO robustness by propagating metadata errors during index loading and adds comprehensive safety documentation to the unsafe SIMD kernels. Additionally, a unit test was added to ensure the TopK utility correctly handles cases where k=0, and Python bindings were updated with documentation regarding path traversal. I have no feedback to provide.


Copilot AI left a comment


Pull request overview

This PR applies the r5 pre-release audit hardening across the Rust core, SIMD docs, fuzzing coverage, Python bindings docs, and CI supply-chain scanning, without introducing new functionality or changing dependency manifests.

Changes:

  • Harden loader correctness by propagating metadata() errors instead of silently treating them as length 0.
  • Add fuzz coverage for RankQuant hot paths (ingest+search) and write→load round-trips, plus a TopK k=0 edge-case regression test.
  • Expand SIMD-kernel safety documentation and add per-method Python path-trust disclosures; add a weekly scheduled cargo-deny advisories workflow.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file
| File | Description |
|---|---|
| src/util.rs | Adds a regression test covering TopK::new(0) inert behavior. |
| src/rank_io.rs | Propagates file metadata errors in all four loaders to avoid incorrect length checks. |
| src/quant_kernels.rs | Adds detailed SAFETY: invariants and AVX-512 blend-mask rationale to scan kernels. |
| ordvec-python/src/lib.rs | Documents path-trust expectations per write/load method; rewords a benchmark-only error message. |
| fuzz/fuzz_targets/search_rankquant.rs | New fuzz target exercising RankQuant ingest + asymmetric-search compute path. |
| fuzz/fuzz_targets/roundtrip_rankquant.rs | New fuzz target for RankQuant write→load round-trip. |
| fuzz/Cargo.toml | Registers the two new fuzz targets. |
| .github/workflows/audit.yml | Adds a weekly scheduled cargo-deny advisories scan workflow. |
Comments suppressed due to low confidence (3)

src/rank_io.rs:364

  • Same issue as in load_rank: the comment claims a swallowed metadata() error would make the payload-vs-file check “pass … false-rejecting”, but in practice file_len = 0 makes check_payload_matches_file reject any non-empty payload and can let empty-payload files bypass the trailing-bytes check. Please reword to match the real behavior.
    // Propagate a metadata failure rather than swallowing it as `0`: a bogus
    // `0` would make the payload-vs-file check pass for an empty corpus even on
    // a TOCTOU-racy filesystem (NFS/procfs), false-rejecting a valid index.
    let file_len = file.metadata()?.len();

src/rank_io.rs:493

  • Same comment-rationale mismatch here: with file_len = 0, check_payload_matches_file will reject any non-empty bitmap file and will not detect trailing bytes when n_vectors == 0. Consider rewording this duplicated comment block so it describes the actual failure modes.
    // Propagate a metadata failure rather than swallowing it as `0`: a bogus
    // `0` would make the payload-vs-file check pass for an empty corpus even on
    // a TOCTOU-racy filesystem (NFS/procfs), false-rejecting a valid index.
    let file_len = file.metadata()?.len();

src/rank_io.rs:612

  • Same duplicated comment issue as the other loaders: swallowing metadata() errors as 0 does not just cause “false-reject”; it can also defeat the trailing-bytes check for the empty-payload case. Please reword the rationale to match what check_payload_matches_file actually does when file_len is bogus.
    // Propagate a metadata failure rather than swallowing it as `0`: a bogus
    // `0` would make the payload-vs-file check pass for an empty corpus even on
    // a TOCTOU-racy filesystem (NFS/procfs), false-rejecting a valid index.
    let file_len = file.metadata()?.len();


Comment thread src/rank_io.rs Outdated
Comment thread fuzz/fuzz_targets/roundtrip_rankquant.rs Outdated
…4} text)

The #9 reword in this PR dropped the '{2, 4}' dispatch detail from the search_asymmetric_byte_lut bits==1 error; test_search_asymmetric_byte_lut_rejects_b1 had pinned that substring via match='2, 4', so CI's pytest leg failed. Re-point it at the new benchmark-only phrasing. Verified locally: maturin develop + pytest = 155 passed.
copilot: the comment conflated 'passes for an empty corpus' with 'false-rejects a valid index'. Reword to the actual failure modes a bogus file_len=0 causes — false-reject of any non-empty index, and an empty-corpus pass that skips the trailing-byte check. Comment-only; all 4 loaders.
…or (#40 review)

copilot: the target swallowed write() failures (hiding regressions) and wrote to an already-open NamedTempFile handle (reopen-by-path write can fail on Windows). Write to a fresh path in tempfile::tempdir() and treat a write failure as a crash, matching the stated round-trip contract. Validated: fuzz build + 7531 runs, no crashes.
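The round-trip contract described in that commit can be sketched as follows. This is an illustration under stated assumptions: `write_index` and `load_index` are hypothetical stand-ins with a made-up file layout, and a directory under `std::env::temp_dir()` stands in for `tempfile::tempdir()` so the sketch needs no external crates.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical writer: a little-endian u32 length followed by the raw codes.
fn write_index(path: &Path, codes: &[u8]) -> io::Result<()> {
    let mut buf = (codes.len() as u32).to_le_bytes().to_vec();
    buf.extend_from_slice(codes);
    fs::write(path, buf)
}

/// Hypothetical loader: rejects files whose payload length disagrees with
/// the header.
fn load_index(path: &Path) -> io::Result<Vec<u8>> {
    let buf = fs::read(path)?;
    let bad = || io::Error::new(io::ErrorKind::InvalidData, "malformed index file");
    if buf.len() < 4 {
        return Err(bad());
    }
    let n = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    if buf.len() - 4 != n {
        return Err(bad());
    }
    Ok(buf[4..].to_vec())
}

/// Create a scratch directory so the writer always targets a path that is
/// not already held open by another handle.
fn fresh_dir(tag: &str) -> io::Result<PathBuf> {
    let dir = std::env::temp_dir().join(format!("roundtrip_{}_{}", tag, std::process::id()));
    fs::create_dir_all(&dir)?;
    Ok(dir)
}

/// Round-trip check: a write or load failure panics, which a fuzzer reports
/// as a crash, rather than being silently ignored.
fn roundtrip(codes: &[u8], tag: &str) {
    let dir = fresh_dir(tag).expect("create scratch dir");
    let path = dir.join("index.bin");
    write_index(&path, codes).expect("write must succeed for a valid index");
    let loaded = load_index(&path).expect("load must accept what write produced");
    assert_eq!(loaded, codes);
    let _ = fs::remove_dir_all(&dir);
}
```

Using `expect` on the write is the point of the fix: a regression in the writer shows up as a fuzz crash instead of an early return that hides it.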
@Fieldnote-Echo
Copy link
Copy Markdown
Owner Author

Addressed the review (latest commits on the branch):

  • copilot — `src/rank_io.rs` metadata comment: reworded to the actual failure modes a bogus `file_len = 0` causes — false-reject of any non-empty index (`remaining` saturates to 0 ≠ positive `payload_bytes`), and an empty-corpus pass that skips the trailing-byte check. Applied to all 4 loaders.
  • copilot — `fuzz/.../roundtrip_rankquant.rs`: now writes to a fresh `tempfile::tempdir()` path (not an already-open `NamedTempFile` handle) and treats a `write()` failure as a crash, matching the stated round-trip contract. Re-validated: cargo +nightly fuzz build + 7531 runs, no crashes.
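The two failure modes in the first bullet can be reproduced with a small model of the check. The function below is a hypothetical stand-in for `check_payload_matches_file`: only the saturating `remaining` computation is taken from the comment above, while the parameter list and the error type are assumptions.

```rust
/// Hypothetical model of the payload-vs-file check: the bytes remaining
/// after the header must equal the payload size the header promises.
fn check_payload_matches_file(
    file_len: u64,
    header_len: u64,
    payload_bytes: u64,
) -> Result<(), String> {
    // Saturates to 0 when `file_len` is smaller than the header, which is
    // exactly what happens when a metadata error is swallowed as 0.
    let remaining = file_len.saturating_sub(header_len);
    if remaining != payload_bytes {
        return Err(format!(
            "payload is {payload_bytes} bytes but {remaining} bytes remain after the header"
        ));
    }
    Ok(())
}
```

With a bogus `file_len` of 0, a valid non-empty index is rejected because `remaining` is 0 while `payload_bytes` is positive. An empty-payload file passes even if the real file carries trailing bytes, because 0 equals 0 and the trailing-byte mismatch is never observed.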

gemini had no findings. Both threads resolved.

Review (Bug/Security): audit.yml runs UNATTENDED on a cron schedule, so a mutable @v4/@v2 tag could ship changed/compromised action code that auto-executes on the next tick with no PR in the loop. Pin actions/checkout to 34e1148 (v4.3.1) and EmbarkStudios/cargo-deny-action to a531616 (v2.0.19). ci.yml stays on tags by the team's existing policy — it only runs on human-triggered push/PR, a lower-risk surface.
@project-navi-bot project-navi-bot merged commit e160e8a into main May 25, 2026
17 checks passed
@project-navi-bot project-navi-bot deleted the fix/pre-release-audit-r5 branch May 25, 2026 14:38
3 participants