docs: changelog perf entries + 0.6.0 un-patch checklist by Fieldnote-Echo · Pull Request #283 · Project-Navi/ordvec

Fieldnote-Echo · 2026-07-04T00:16:38Z

CHANGELOG ### Performance entries for the four unlogged perf changes (numbers verified against source commit messages) and the RELEASING.md one-time un-patch step for ordinaldb's [patch.crates-io] block. Fleet-authored, reviewed APPROVE. Merge LAST — cut from the integration branch, so its commit list collapses to docs-only once #277/#278/#279/#280/#281 land. Known follow-up: a CHANGELOG line for the CLI-parity PR once it merges.

Verification now bounds every artifact read by its manifest-declared file_size_bytes (manifest hard-capped at 1 MiB; SHA-256 pins content); creation bounds reads by the observed file size. Flat ResourceLimits byte caps become opt-in ceilings (default unbounded). The primary artifact read, previously unbounded, gains a declared-size bound and the artifact_file_too_large reason code. sha256_file_bounded now streams with constant memory instead of materialising files. Fixes the undocumented 64 MiB auxiliary cap that made sign-sidecar bundles impossible to write past 524,288 rows at dim=1024 (measured on a 1,258,135-row corpus).

Independent oracle (score_all + full lexicographic sort by hamming asc, doc_id asc) pins top_m_candidates and top_m_candidates_batched_serial_csr exactly: random corpora across block boundaries, massive-tie and duplicate-run corpora exercising boundary tie-breaks, edge geometries (m >= n, single doc, empty batch), and the dim=1024 shape. Must pass bit-identically before and after the tiling swap.

Security-audit remediation (CIPHER-02): the primary artifact read was bounded only by the attacker-declared size with no configurable ceiling, unlike the auxiliary/profile classes — the documented THREAT-QUERY-003 mitigation was silently incomplete for the largest artifact in a bundle. Adds ResourceLimits::max_index_artifact_bytes (default unbounded) applied as a min() with the declared size, and bounds the create-path primary read by its observed size so the code matches the CHANGELOG claim. CLI flag parity follows separately.

Code-review remediation: the sqlite-feature cache-key path duplicated the hashing logic and missed the derived-bound change — the primary artifact hash was fully unbounded, and the calibration/encoder profile hashes were bounded only by the flat caps, which the default flip turned into effectively unbounded reads. All three now use the same declared-size .min(opt-in ceiling) derivation as the verify path; a bound violation is a cache miss. Adds default-limits grown-profile coverage for the calibration and encoder-distortion call sites, closing the per-site test gap that let this slip.

top_m_candidates_batched_serial_csr previously looped the single-query path, re-streaming the full sign bitmap per query (documented-naive Track-1). The internals now scan the corpus once per call in L2-sized doc blocks, score every query of the call against each hot block in query tiles via the existing batched kernel, and select per-query top-m with bounded (hamming, doc_id) min-collectors — bit-identical to a full sort by construction, independent of processing order (the key IS the contract's sort key). top_m_candidates routes through the same core, dropping its per-call n-row Hamming materialisation. Per-query corpus traffic drops by the call's query count: at 1.26M rows x 1024 dims, a 2048-query call reads the 161MB sidecar once instead of 2048 times. Serial contract preserved (no rayon); the oracle suite (tests/tiled_candgen.rs) pins bit-identical outputs across random, tie-heavy, duplicate-run, and edge geometries.

Audit remediation: routing top_m_candidates through the streamed core measured +50-90% at small/medium n with m in the hundreds (bounded heap O(n log m) vs select_nth_unstable_by O(n)); with one query there is no scan to share, so nq=1 stays on the dense path (parity-or-better at every measured size). Also per audit: the block-boundary oracle test now genuinely spans three blocks (the dim=128 shape fit one block), and adds the dim=768 AVX-512 tail-residue x multi-block case to the permanent suite.

assert_all_finite paid a full serial pass per add/search batch — measured ~0.1s per GiB, twice per ingest batch counting the caller layer. Scans of 1M+ floats now split across the rayon pool (4.4x measured). RankQuant::add's per-row closure allocated a fresh ranks Vec per vector inside the parallel loop; for_each_init now reuses a per-worker scratch via rank_transform_into. Measured on the 1.26M x 1024 corpus slice: encode-path attribution 0.097s serial scan -> 0.022s parallel; alloc churn removed from the hot loop.

Doc ids visit each per-query heap strictly ascending, so a candidate tying the worst kept hamming always loses the (hamming, doc_id) tie-break — once the collector is full, the accept test is exactly 'hamming < worst kept hamming'. Cache that bound in a register-friendly u32 (u32::MAX while filling) and skip the heap peek + tuple compare on the ~99.8% reject path. Bit-identical by construction; pinned by the tie-heavy and duplicate-run oracle suites.

load_rankquant's forged-buffer defense histogrammed every packed code serially — 1.29 billion shift/mask ops at 1.26M x 1024, ~1s of the 1.27s verified open. A 4KB per-byte bucket-count LUT replaces the per-code inner loop and rows validate in parallel; find_first keeps the lowest-offending-row error contract, with a scalar recheck producing the identical message. The security property is unchanged: every row still proves uniform composition before the index is usable.

CHANGELOG Unreleased gains the measured perf work merged to integration/full-stack: tiled streaming sign candidate generation + cached collector worst-bound (bit-identical internals swap; downstream batched search 220 -> 10.2k q/s at 1.26M x 1024), parallel finite validation + scratch rank encode (0.097s -> 0.022s attribution), and the LUT + parallel constant-composition load check (verified open 1.27s -> 0.38s). RELEASING gains a one-time pre-publish item: remove OrdinalDB's [patch.crates-io] block pointing at integration/full-stack when 0.6.0 publishes.

Expose --max-index-artifact-bytes on the ordvec-manifest CLI LimitArgs, wiring it to ResourceLimits::max_index_artifact_bytes so the opt-in primary-artifact read ceiling reaches feature parity with the existing --max-auxiliary-artifact-bytes flag. Close the deferred CIPHER-04 reason-code symmetry: validate_manifest_shape now rejects a zero manifest-declared artifact.file_size_bytes (artifact_file_size_zero) and validate_auxiliary_artifact_shape rejects zero-size declarations on required auxiliary artifacts (auxiliary_artifact_file_size_zero), mirroring the calibration and encoder-distortion *_file_size_zero checks. Optional artifacts keep the established zero-size absent-placeholder convention.

Security-review remediation (fleet CIPHER-001): with derived read bounds, a FIFO inside the bundle directory would block verification forever — File::open on a reader-less FIFO blocks, and a device node would stream without EOF under a large declared size. Stat the path before opening and refuse anything that is not a regular file, at every hashing call site (create, verify, sqlite cache). Regular files terminate at EOF and remain post-checked against the declaration. Unix regression test uses a real FIFO.

Security-review note (fleet CIPHER-002): parallel finite validation introduced in the encode train transitively touches the global rayon pool from inside the 'serial' CSR primitive. The serial guarantee is about candidate scan/selection ownership, not input validation; say so explicitly.

Bot-review remediation (Qodo, PR #277): the streaming loop surfaced ErrorKind::Interrupted as a hash failure, a reliability regression vs the previous read_to_end which retried EINTR internally.

Bot-review remediation (Qodo, PR #278): the shared core derived nq by integer division; a ragged buffer from a future caller would silently truncate. All current callers validate upstream — this is the cheap in-core invariant.

…t-bound' into integration/full-stack

…i-parity

chatgpt-codex-connector · 2026-07-04T00:16:42Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

qodo-code-review · 2026-07-04T00:17:51Z

PR Summary by Qodo

Add performance changelog entries and implement derived artifact size bounds

✨ Enhancement 🐞 Bug fix 🧪 Tests 📝 Documentation 🕐 40+ Minutes

AI Description

• Document previously-unlogged performance work and add a one-time 0.6.0 release hygiene step.
• Make ordvec-manifest artifact read limits derived from declared sizes, with opt-in ceilings.
• Speed up hot paths: streamed batched candidate generation, parallel finite checks, LUT validation.
• Add contract-pinning and regression tests for streaming candgen and derived-size enforcement.

Diagram

graph TD
A["Public APIs"] --> B["SignBitmap candgen"] --> C["Streamed top-m core"]
A --> D["RankQuant"] --> E["Load/encode perf"]
A --> F["ordvec-manifest"] --> G["Bounded hashing/limits"]
H["Docs"] --> A

High-Level Assessment

The following are alternative approaches to this PR:

1. Keep flat default caps; only raise the 64 MiB defaults

➕ Minimal behavioral change for deployments relying on implicit ceilings
➕ Less surface area for new limit interactions (declared-size vs cap)
➖ Still couples correctness to an arbitrary default; legitimate artifacts can regress again
➖ Doesn't address unbounded primary-artifact hashing unless separately handled

2. Stream candgen but use per-query partial sort on chunked score buffers

➕ Could retain O(n) average selection per query via select_nth-like approaches
➕ May reduce heap overhead for large m
➖ More complex to implement correctly with global (hamming, doc_id) ordering across blocks
➖ Harder to make processing-order-independent without careful merging logic

Recommendation: The PR’s approach is the better long-term contract fit: (1) derived-size read bounds make manifests the trust anchor while still allowing opt-in ceilings for hostile-input deployments, and (2) streamed candidate generation removes pathological re-scans while preserving a strict, test-pinned ordering contract. The included oracle/edge-case suites materially de-risk the algorithmic changes, making this strategy preferable to simply bumping flat defaults or introducing more complex block-merge selection logic.

Files changed (13) +824 / -49

Enhancement (4) +156 / -17

quant.rsReuse per-worker rank scratch during RankQuant::add +10/-7

Reuse per-worker rank scratch during RankQuant::add

• Switches the parallel encode loop to 'for_each_init' and reuses a per-worker 'Vec<u16>' ranks buffer. Uses 'rank_transform_into' to avoid per-vector allocation churn inside the hot loop.

src/quant.rs

rank_io.rsSpeed up RankQuant load validation with LUT + parallel scan +34/-2

Speed up RankQuant load validation with LUT + parallel scan

• Adds a per-byte bucket-count LUT to validate constant-composition rows using table lookups instead of per-code shift/mask loops. Parallelizes the row validation with 'find_first' to preserve the 'lowest offending row' contract and re-runs the scalar histogram only for error reporting.

src/rank_io.rs

sign_bitmap.rsImplement streamed batched top-m candidate generation +101/-7

Implement streamed batched top-m candidate generation

• Introduces a new serial streamed core that scans the corpus once per call in doc blocks, scores query tiles, and keeps exact per-query top-m using bounded heaps keyed by (hamming, doc_id). Optimizes collector acceptance via cached worst-hamming bounds and updates batched CSR generation to route through the streamed core (with explicit empty-batch handling) while keeping single-query on the dense partition path for performance.

src/sign_bitmap.rs

util.rsParallelize assert_all_finite for large inputs +11/-1

Parallelize assert_all_finite for large inputs

• Adds a size threshold to scan large float slices in parallel using rayon chunks, reducing validation overhead for large ingest/search batches. Keeps the existing fast serial path for smaller inputs.

src/util.rs

Bug fix (2) +93 / -19

lib.rsDerive artifact read bounds and stream bounded hashing +72/-12

Derive artifact read bounds and stream bounded hashing

• Changes default ResourceLimits byte caps to 'u64::MAX' (unbounded) and introduces 'max_index_artifact_bytes' as an opt-in ceiling. Updates verification and SQLite-related hashing call sites to bound reads by 'min(declared_size, configured_ceiling)' and to fail fast with '*_file_too_large' codes. Reimplements 'sha256_file_bounded' as a streaming reader to keep memory constant and bounds create-time hashing by observed file length.

ordvec-manifest/src/lib.rs

sqlite.rsBound SQLite cache-key hashing like verify path +21/-7

Bound SQLite cache-key hashing like verify path

• Removes unbounded 'sha256_file' usage for artifact cache keys and switches to 'sha256_file_bounded' using declared size with the opt-in ceiling. Updates profile hashing helpers to use declared-size-derived bounds as well, ensuring cache behavior matches verification semantics.

ordvec-manifest/src/sqlite.rs

Tests (3) +473 / -9

derived_limits.rsAdd end-to-end tests for derived-size limits and opt-in ceilings +223/-0

Add end-to-end tests for derived-size limits and opt-in ceilings

• Introduces a new test module asserting that default options accept artifacts above the legacy 64 MiB cap, while verify fails fast when artifacts grow beyond declared size. Adds regression coverage for truncated/inflated declarations and confirms explicit flat caps (including the new index ceiling) remain enforceable.

ordvec-manifest/tests/derived_limits.rs

manifest.rsAdjust artifact corruption tests and add grown-profile fast-fail coverage +75/-9

Adjust artifact corruption tests and add grown-profile fast-fail coverage

• Updates a corruption test to flip bytes in-place (same size) so the declared-size bound passes and the digest mismatch is exercised. Updates the 'not a byte pin' test to expect 'artifact_file_too_large' when the artifact grows past its declaration. Adds a new regression test ensuring grown calibration/distortion profile files fail fast with the *_too_large codes under default limits.

ordvec-manifest/tests/manifest.rs

tiled_candgen.rsAdd oracle-based contract tests for candidate generation ordering +175/-0

Add oracle-based contract tests for candidate generation ordering

• Adds an independent oracle (score_all + full lexicographic sort by (hamming asc, doc_id asc)) to pin 'top_m_candidates' and 'top_m_candidates_batched_serial_csr' outputs exactly. Covers random corpora across block boundaries, tie-heavy and duplicate-run cases, empty batches, and high-dimension shapes (1024 and 768 tail residue).

tests/tiled_candgen.rs

Documentation (4) +102 / -4

CHANGELOG.mdAdd missing Unreleased performance and manifest-limit entries +75/-1

Add missing Unreleased performance and manifest-limit entries

• Replaces the empty Unreleased section with detailed Performance bullets describing streamed batched candidate generation, collector accept-test optimization, parallel finite validation + RankQuant scratch reuse, and LUT-based RankQuant load validation. Adds Changed entries documenting derived artifact size bounds, new opt-in index artifact ceiling, fail-fast reason-code behavior, and constant-memory bounded hashing.

CHANGELOG.md

RELEASING.mdAdd one-time 0.6.0 downstream un-patch checklist step +5/-0

Add one-time 0.6.0 downstream un-patch checklist step

• Documents that OrdinalDB must remove its '[patch.crates-io]' overrides for 'ordvec' and 'ordvec-manifest' upon the 0.6.0 publish. Clarifies this is a one-time release hygiene action so downstream consumes crates.io releases rather than a git branch.

RELEASING.md

THREAT_MODEL.mdDocument derived artifact read bounds threat +13/-0

Document derived artifact read bounds threat

• Adds THREAT-QUERY-003 describing that verification work scales with manifest-declared bytes even with constant-memory hashing. Explains the role of opt-in ResourceLimits ceilings and reiterates that VerifiedLoadPlan is not a byte pin under local write access.

THREAT_MODEL.md

README.mdUpdate limit contract docs for derived-size bounds +9/-3

Update limit contract docs for derived-size bounds

• Rewrites artifact byte-limit bullets to state that reads are bounded by manifest-declared 'file_size_bytes' on verify and observed size on create. Adds the primary index artifact bound and clarifies flat caps are opt-in ceilings with unbounded defaults.

ordvec-manifest/README.md

codecov · 2026-07-04T00:20:43Z

Codecov Report

❌ Patch coverage is 97.88732% with 3 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/sign_bitmap.rs	98.09%	2 Missing ⚠️
src/rank_io.rs	95.83%	1 Missing ⚠️

📢 Thoughts on this report? Let us know!

qodo-code-review · 2026-07-04T00:23:25Z

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0) 🎨 UX issues (0) 🔗 Cross-repo conflicts (0) 📜 Skill insights (0)

1. ~~Bounded hash reads overshoot~~ ✓ Resolved 🐞 Bug ≡ Correctness

Description

sha256_file_bounded enforces max_bytes only after reading a full 64KiB chunk, so it may read up
to ~64KiB past the declared/configured bound (and reads 64KiB even when max_bytes=0). This breaks
the intended strict read-bound contract and causes unnecessary extra I/O under tight limits.

Code

ordvec-manifest/src/lib.rs[R3486-3506]

+    let mut file = File::open(path)?;
    let mut hasher = Sha256::new();
-    hasher.update(&bytes);
+    let mut size_bytes = 0u64;
+    let mut buf = [0u8; 64 * 1024];
+    loop {
+        let n = file.read(&mut buf)?;
+        if n == 0 {
+            break;
+        }
+        size_bytes += n as u64;
+        if size_bytes > max_bytes {
+            return Err(ManifestError::limit_exceeded(
+                code,
+                format!(
+                    "{context} exceeds {max_bytes} bytes while reading {}",
+                    path.display()
+                ),
+            ));
+        }
+        hasher.update(&buf[..n]);
+    }

Relevance

⭐⭐⭐ High
Team previously enforced strict bounded reads via take(max_bytes+1) in read_bounded_file; likely
expects same for hashing.
PR-#157

ⓘ Recommendations generated based on similar findings in past PRs

Evidence

sha256_file_bounded reads fixed 64KiB chunks and only checks size_bytes > max_bytes after
incrementing, so the function can consume bytes past the limit before failing. The repo’s
read_bounded_file demonstrates the strict-bounding approach using take(max_bytes + 1),
highlighting the inconsistency with the new hashing implementation.

ordvec-manifest/src/lib.rs[3479-3511]
ordvec-manifest/src/lib.rs[138-176]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`sha256_file_bounded` can read beyond `max_bytes` by up to one buffer (64KiB) because it reads into a fixed buffer and only checks `size_bytes > max_bytes` after the read. This violates a strict "bounded read" expectation (especially noticeable when `max_bytes` is very small).

### Issue Context
The codebase already uses a strict-bounding pattern for `read_bounded_file` via `take(max_bytes + 1)`.

### Fix Focus Areas
- ordvec-manifest/src/lib.rs[3479-3511]

### Implementation notes
Adjust the loop so the reader never consumes more than `max_bytes + 1` total bytes, e.g.:
- wrap the file with `file.by_ref().take(max_bytes.saturating_add(1))` and stream from that, or
- compute `remaining = max_bytes.saturating_sub(size_bytes)` and only attempt to read `min(buf.len(), remaining as usize + 1)` each iteration.
Ensure the error is still returned with the same `LimitExceeded` code/message when the bound is exceeded.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

2. ~~TOCTOU size/hash mismatch~~ ✓ Resolved 🐞 Bug ≡ Correctness

Description

create_manifest_for_index_with_options probes index metadata (including file_size_bytes) and
then hashes the index using a separately-fetched fs::metadata(index_path)?.len() bound; if the
file size changes between those calls, the manifest can embed a file_size_bytes that does not
match the bytes that were actually hashed. This can produce a manifest that fails verification
immediately or behaves inconsistently under concurrent writers.

Code

ordvec-manifest/src/lib.rs[R3566-3569]

+    let index_hash = sha256_file_bounded(
+        index_path,
+        fs::metadata(index_path)?.len(),
+        "artifact_file_too_large",

Relevance

⭐⭐⭐ High
Similar TOCTOU-hardening was definitely accepted: reuse verification-time resolved paths/metadata to
avoid same-call races.
PR-#163

ⓘ Recommendations generated based on similar findings in past PRs

Evidence

The create path uses probe_index_metadata() (whose file_size_bytes is derived from
file.metadata().len()) but hashes with a fresh fs::metadata(...).len() bound and then writes
metadata.file_size_bytes into the manifest, enabling divergence if the file size changes between
the two observations.

ordvec-manifest/src/lib.rs[3565-3586]
src/rank_io.rs[368-384]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`create_manifest_for_index_with_options` observes the index file size twice: once indirectly via `probe_index_metadata()` (stored into `metadata.file_size_bytes` and later written into the manifest), and again via `fs::metadata(index_path)?.len()` to set the hash read bound. If the file changes between these observations, the created manifest can contain a `file_size_bytes` that doesn't correspond to the hashed content.

### Issue Context
This is a classic TOCTOU inconsistency during manifest creation. The create path should either:
- use one size snapshot consistently for both hashing and the manifest field, or
- explicitly fail if the size changes between probe and hash.

### Fix Focus Areas
- ordvec-manifest/src/lib.rs[3565-3586]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

3. ~~Per-query bitmap allocations~~ ✓ Resolved 🐞 Bug ➹ Performance

Description

SignBitmap::top_m_candidates_streamed builds q_bitmaps by calling build_query_bitmap per
query, which allocates a fresh Vec<u64> each time and re-validates finiteness, even though the
batch entry point already validates the whole query buffer. This adds avoidable allocation/CPU
overhead in the new hot-path implementation.

Code

src/sign_bitmap.rs[R241-245]

+        let mut q_bitmaps = vec![0u64; nq * qpv];
+        for qi in 0..nq {
+            let qb = self.build_query_bitmap(&queries[qi * dim..(qi + 1) * dim]);
+            q_bitmaps[qi * qpv..(qi + 1) * qpv].copy_from_slice(&qb);
+        }

Relevance

⭐⭐⭐ High
Repo has repeatedly accepted perf fixes removing avoidable allocations/copies in hot paths; similar
guidance accepted for SignBitmap/scan code.
PR-#96
PR-#208

ⓘ Recommendations generated based on similar findings in past PRs

Evidence

The streamed path allocates per query by calling build_query_bitmap inside a loop and then copies
into q_bitmaps. build_query_bitmap itself allocates a new Vec<u64> and runs
assert_all_finite, while top_m_candidates_batched_serial_csr already validates the full
queries slice before calling the streamed implementation.

src/sign_bitmap.rs[202-213]
src/sign_bitmap.rs[235-245]
src/sign_bitmap.rs[424-447]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The streamed batched candidate generator constructs query bitmaps via `build_query_bitmap` for each query. `build_query_bitmap` allocates a `Vec<u64>` and also calls `assert_all_finite`, which means the streamed path incurs per-query allocations and additional passes over query memory.

### Issue Context
`top_m_candidates_batched_serial_csr` already asserts `assert_all_finite(queries)` and has a preallocated `q_bitmaps` buffer sized `nq * qpv`.

### Fix Focus Areas
- src/sign_bitmap.rs[202-245]
- src/sign_bitmap.rs[424-447]

### Implementation notes
Introduce a private helper that fills an existing `&mut [u64]` (or `&mut Vec<u64>`) for a single query without allocating, e.g. `build_query_bitmap_into(q: &[f32], out: &mut [u64])`.
- In `top_m_candidates_streamed`, fill `q_bitmaps[qi*qpv..(qi+1)*qpv]` directly.
- Keep `build_query_bitmap` as the public allocating API, but avoid calling it from the streamed inner loop.
- Since the batch entry point already validated `queries`, avoid re-running `assert_all_finite` per query in the streamed path.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

4. ~~Artifact zero-size unvalidated~~ ✓ Resolved 🐞 Bug ⚙ Maintainability

Description

Verification now bounds primary-artifact hashing by manifest.artifact.file_size_bytes, but
validate_manifest_shape does not reject artifact.file_size_bytes == 0 (unlike
calibration/distortion profile artifacts). For malformed manifests that declare file_size_bytes=0
for a non-empty index, this shifts the failure into artifact_file_too_large instead of a clearer
schema/shape error, making reports less consistent and harder to interpret.

Code

ordvec-manifest/src/lib.rs[R264-273]

+        match sha256_file_bounded(
+            &resolved.canonical_path,
+            document
+                .manifest
+                .artifact
+                .file_size_bytes
+                .min(options.limits.max_index_artifact_bytes),
+            "artifact_file_too_large",
+            "index artifact",
+        ) {

Relevance

⭐⭐ Medium
Team accepts tightening verifier contracts, but no prior review explicitly required rejecting
artifact file_size_bytes==0.
PR-#157
PR-#152

ⓘ Recommendations generated based on similar findings in past PRs

Evidence

The verification path now uses artifact.file_size_bytes as the hash bound, but the manifest shape
validator checks many artifact fields and does not validate artifact.file_size_bytes. In contrast,
both profile artifact validators explicitly error when file_size_bytes == 0, showing an
inconsistency in schema enforcement.

ordvec-manifest/src/lib.rs[258-301]
ordvec-manifest/src/lib.rs[328-387]
ordvec-manifest/src/lib.rs[1200-1217]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
Primary artifact `file_size_bytes` is now used as the effective hash read bound, but the manifest shape validator does not assert it is > 0. This creates an inconsistency with profile artifacts (which explicitly reject zero) and can lead to confusing error codes for malformed manifests.

### Issue Context
A zero-sized primary index artifact is not a meaningful/valid index file in practice (it has at least a header), so treating `file_size_bytes == 0` as a schema/shape error keeps reporting consistent and clearer.

### Fix Focus Areas
- ordvec-manifest/src/lib.rs[328-387]
- ordvec-manifest/src/lib.rs[258-301]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

…i-parity

The release-publish SBOM invariant requires member package versions in lockstep with the root; ordvec-ffi and both python bindings follow the 0.6.0 bump.

…i-parity

Closes the remaining release-publish invariant layers, verified by running tests/release_publish_invariants.py locally to a clean exit: pyproject + __init__ versions in lockstep, the changelog cut as a dated 0.6.0 section (invariant convention: the current version always has a dated section; [Unreleased] stays empty), THREAT_MODEL status line at v0.6.0, and the README quickstart installing 0.6.

…i-parity

# Conflicts: # CHANGELOG.md

Bot-review remediation (Qodo, PR #278): nq * m_eff can overflow usize on 32-bit/wasm32 targets, and the CSR wrapper's saturating_mul would attempt a usize::MAX allocation. Both sites now use checked multiplication with a clear tile-the-batch message, matching the crate's checked-allocation discipline. The exact m_eff + 1 heap reservation is kept deliberately: gradual growth double-allocates to the next power of two (~2x peak per query) — the reservation is the memory-optimal choice, now documented.

…d 'origin/feat/index-ceiling-cli-parity' into docs/release-hygiene

…imary shape check Bot-review remediation (Qodo, #283 inline): - create_manifest_for_index_with_options observed the index size twice (probe, then a separate stat for the hash bound) — a concurrent writer could produce a manifest whose size and digest describe different bytes. The hash is now bounded by the probe's size, the manifest records the byte count actually hashed, and any disagreement fails loudly. - sha256_file_bounded could read (not hash) up to one 64KiB chunk past the bound; reads now clamp to max_bytes + 1, mirroring read_bounded_file's take() pattern. - validate_manifest_shape gains artifact_file_size_zero for the primary artifact, matching the profile artifacts' explicit zero rejection instead of surfacing a confusing artifact_file_too_large.

Bot-review remediation (Qodo, #283 inline): build_query_bitmap allocated a fresh Vec and re-validated finiteness per query; the entry points already validate the whole buffer and the destination is preallocated. Oracle suites pin bit-identical output.

…i-parity

…li-parity' into integration/full-stack

Bot-review remediation (Qodo, #282): --max-index-artifact-bytes wired into ResourceLimits but the create path bounded the primary hash by the probed size alone — the opt-in ceiling was ineffective for create, unlike auxiliary artifacts. Create now mirrors verify: declared/observed size min explicit ceiling.

Nelson Spence (Fieldnote-Echo) added 23 commits July 3, 2026 12:52

Merge branch 'perf/encode-path' into integration/full-stack

a9dac16

Merge branch 'perf/collector-worst-bound' into integration/full-stack

675d096

Merge branch 'fix/manifest-derived-limits' into integration/full-stack

90f2873

Merge branch 'perf/parallel-load-validation' into integration/full-stack

522bade

Merge branch 'fix/manifest-derived-limits' into integration/full-stack

183a5af

fix: retry interrupted reads in bounded streaming hash

b18ebaf

Bot-review remediation (Qodo, PR #277): the streaming loop surfaced ErrorKind::Interrupted as a hash failure, a reliability regression vs the previous read_to_end which retried EINTR internally.

fix: assert whole-row query buffers in the streamed core

9350b41

Bot-review remediation (Qodo, PR #278): the shared core derived nq by integer division; a ragged buffer from a future caller would silently truncate. All current callers validate upstream — this is the cheap in-core invariant.

Merge branch 'perf/tiled-batch-candgen' into perf/collector-worst-bound

8bcd71c

Merge branches 'fix/manifest-derived-limits' and 'perf/collector-wors…

7aec40f

…t-bound' into integration/full-stack

Merge branch 'fix/manifest-derived-limits' into feat/index-ceiling-cl…

b4b6a64

…i-parity

Nelson Spence (Fieldnote-Echo) requested review from Navi Bot (project-navi-bot) and Todd Baur (toadkicker) as code owners July 4, 2026 00:16

qodo-code-review Bot reviewed Jul 4, 2026

View reviewed changes

Comment thread ordvec-manifest/src/lib.rs

Comment thread ordvec-manifest/src/lib.rs

Nelson Spence (Fieldnote-Echo) added 17 commits July 3, 2026 21:58

Merge branch 'integration/full-stack' into docs/release-hygiene

92e09c9

chore: track 0.6.0 in the fuzz workspace lock

0bcfb93

Merge branch 'fix/manifest-derived-limits' into feat/index-ceiling-cl…

c2be965

…i-parity

Merge branch 'fix/manifest-derived-limits' into integration/full-stack

577611b

Merge branch 'integration/full-stack' into docs/release-hygiene

60b4d62

chore: lockstep all workspace member versions at 0.6.0

661529a

The release-publish SBOM invariant requires member package versions in lockstep with the root; ordvec-ffi and both python bindings follow the 0.6.0 bump.

Merge branch 'fix/manifest-derived-limits' into feat/index-ceiling-cl…

364f176

…i-parity

Merge branch 'fix/manifest-derived-limits' into integration/full-stack

e55025f

Merge branch 'integration/full-stack' into docs/release-hygiene

744c03f

Merge branch 'fix/manifest-derived-limits' into feat/index-ceiling-cl…

3e654f7

…i-parity

Merge branch 'fix/manifest-derived-limits' into integration/full-stack

cea1d4b

Merge branch 'integration/full-stack' into docs/release-hygiene

c078c33

# Conflicts: # CHANGELOG.md

Merge branch 'perf/tiled-batch-candgen' into perf/collector-worst-bound

89abfb5

Merge branch 'perf/tiled-batch-candgen' into integration/full-stack

1b20caf

Merge branch 'integration/full-stack' into docs/release-hygiene

767760e

project-navi-bot previously approved these changes Jul 4, 2026

View reviewed changes

Merge remote-tracking branches 'origin/perf/collector-worst-bound' an…

974d49d

…d 'origin/feat/index-ceiling-cli-parity' into docs/release-hygiene

Nelson Spence (Fieldnote-Echo) dismissed Navi Bot (project-navi-bot)’s stale review via 974d49d July 4, 2026 04:02

Nelson Spence (Fieldnote-Echo) added 9 commits July 3, 2026 23:12

Merge branch 'fix/manifest-derived-limits' into feat/index-ceiling-cl…

69cbebc

…i-parity

Merge branch 'perf/tiled-batch-candgen' into perf/collector-worst-bound

4569943

Merge branches 'perf/collector-worst-bound' and 'feat/index-ceiling-c…

66dd1e1

…li-parity' into integration/full-stack

Merge branch 'integration/full-stack' into docs/release-hygiene

ee94a09

Merge branch 'fix/manifest-derived-limits' into integration/full-stack

69bb693

Merge branch 'integration/full-stack' into docs/release-hygiene

9cc58a0

toadkicker approved these changes Jul 4, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

docs: changelog perf entries + 0.6.0 un-patch checklist#283

docs: changelog perf entries + 0.6.0 un-patch checklist#283
Nelson Spence (Fieldnote-Echo) wants to merge 60 commits into
mainfrom
docs/release-hygiene

Fieldnote-Echo commented Jul 4, 2026

Uh oh!

chatgpt-codex-connector Bot commented Jul 4, 2026

Uh oh!

qodo-code-review Bot commented Jul 4, 2026

Uh oh!

codecov Bot commented Jul 4, 2026 •

edited

Loading

Uh oh!

qodo-code-review Bot commented Jul 4, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Uh oh!

Conversation

Fieldnote-Echo commented Jul 4, 2026

Uh oh!

chatgpt-codex-connector Bot commented Jul 4, 2026

Uh oh!

qodo-code-review Bot commented Jul 4, 2026

PR Summary by Qodo

Uh oh!

codecov Bot commented Jul 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

qodo-code-review Bot commented Jul 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review by Qodo

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov Bot commented Jul 4, 2026 •

edited

Loading

qodo-code-review Bot commented Jul 4, 2026 •

edited

Loading