docs: changelog perf entries + 0.6.0 un-patch checklist#283
docs: changelog perf entries + 0.6.0 un-patch checklist#283Nelson Spence (Fieldnote-Echo) wants to merge 60 commits into
Conversation
Verification now bounds every artifact read by its manifest-declared file_size_bytes (manifest hard-capped at 1 MiB; SHA-256 pins content); creation bounds reads by the observed file size. Flat ResourceLimits byte caps become opt-in ceilings (default unbounded). The primary artifact read, previously unbounded, gains a declared-size bound and the artifact_file_too_large reason code. sha256_file_bounded now streams with constant memory instead of materialising files. Fixes the undocumented 64 MiB auxiliary cap that made sign-sidecar bundles impossible to write past 524,288 rows at dim=1024 (measured on a 1,258,135-row corpus).
Independent oracle (score_all + full lexicographic sort by hamming asc, doc_id asc) pins top_m_candidates and top_m_candidates_batched_serial_csr exactly: random corpora across block boundaries, massive-tie and duplicate-run corpora exercising boundary tie-breaks, edge geometries (m >= n, single doc, empty batch), and the dim=1024 shape. Must pass bit-identically before and after the tiling swap.
Security-audit remediation (CIPHER-02): the primary artifact read was bounded only by the attacker-declared size with no configurable ceiling, unlike the auxiliary/profile classes — the documented THREAT-QUERY-003 mitigation was silently incomplete for the largest artifact in a bundle. Adds ResourceLimits::max_index_artifact_bytes (default unbounded) applied as a min() with the declared size, and bounds the create-path primary read by its observed size so the code matches the CHANGELOG claim. CLI flag parity follows separately.
Code-review remediation: the sqlite-feature cache-key path duplicated the hashing logic and missed the derived-bound change — the primary artifact hash was fully unbounded, and the calibration/encoder profile hashes were bounded only by the flat caps, which the default flip turned into effectively unbounded reads. All three now use the same declared-size .min(opt-in ceiling) derivation as the verify path; a bound violation is a cache miss. Adds default-limits grown-profile coverage for the calibration and encoder-distortion call sites, closing the per-site test gap that let this slip.
top_m_candidates_batched_serial_csr previously looped the single-query path, re-streaming the full sign bitmap per query (documented-naive Track-1). The internals now scan the corpus once per call in L2-sized doc blocks, score every query of the call against each hot block in query tiles via the existing batched kernel, and select per-query top-m with bounded (hamming, doc_id) min-collectors — bit-identical to a full sort by construction, independent of processing order (the key IS the contract's sort key). top_m_candidates routes through the same core, dropping its per-call n-row Hamming materialisation. Per-query corpus traffic drops by the call's query count: at 1.26M rows x 1024 dims, a 2048-query call reads the 161MB sidecar once instead of 2048 times. Serial contract preserved (no rayon); the oracle suite (tests/tiled_candgen.rs) pins bit-identical outputs across random, tie-heavy, duplicate-run, and edge geometries.
Audit remediation: routing top_m_candidates through the streamed core measured +50-90% at small/medium n with m in the hundreds (bounded heap O(n log m) vs select_nth_unstable_by O(n)); with one query there is no scan to share, so nq=1 stays on the dense path (parity-or-better at every measured size). Also per audit: the block-boundary oracle test now genuinely spans three blocks (the dim=128 shape fit one block), and adds the dim=768 AVX-512 tail-residue x multi-block case to the permanent suite.
assert_all_finite paid a full serial pass per add/search batch — measured ~0.1s per GiB, twice per ingest batch counting the caller layer. Scans of 1M+ floats now split across the rayon pool (4.4x measured). RankQuant::add's per-row closure allocated a fresh ranks Vec per vector inside the parallel loop; for_each_init now reuses a per-worker scratch via rank_transform_into. Measured on the 1.26M x 1024 corpus slice: encode-path attribution 0.097s serial scan -> 0.022s parallel; alloc churn removed from the hot loop.
Doc ids visit each per-query heap strictly ascending, so a candidate tying the worst kept hamming always loses the (hamming, doc_id) tie-break — once the collector is full, the accept test is exactly 'hamming < worst kept hamming'. Cache that bound in a register-friendly u32 (u32::MAX while filling) and skip the heap peek + tuple compare on the ~99.8% reject path. Bit-identical by construction; pinned by the tie-heavy and duplicate-run oracle suites.
load_rankquant's forged-buffer defense histogrammed every packed code serially — 1.29 billion shift/mask ops at 1.26M x 1024, ~1s of the 1.27s verified open. A 4KB per-byte bucket-count LUT replaces the per-code inner loop and rows validate in parallel; find_first keeps the lowest-offending-row error contract, with a scalar recheck producing the identical message. The security property is unchanged: every row still proves uniform composition before the index is usable.
CHANGELOG Unreleased gains the measured perf work merged to integration/full-stack: tiled streaming sign candidate generation + cached collector worst-bound (bit-identical internals swap; downstream batched search 220 -> 10.2k q/s at 1.26M x 1024), parallel finite validation + scratch rank encode (0.097s -> 0.022s attribution), and the LUT + parallel constant-composition load check (verified open 1.27s -> 0.38s). RELEASING gains a one-time pre-publish item: remove OrdinalDB's [patch.crates-io] block pointing at integration/full-stack when 0.6.0 publishes.
Expose --max-index-artifact-bytes on the ordvec-manifest CLI LimitArgs, wiring it to ResourceLimits::max_index_artifact_bytes so the opt-in primary-artifact read ceiling reaches feature parity with the existing --max-auxiliary-artifact-bytes flag. Close the deferred CIPHER-04 reason-code symmetry: validate_manifest_shape now rejects a zero manifest-declared artifact.file_size_bytes (artifact_file_size_zero) and validate_auxiliary_artifact_shape rejects zero-size declarations on required auxiliary artifacts (auxiliary_artifact_file_size_zero), mirroring the calibration and encoder-distortion *_file_size_zero checks. Optional artifacts keep the established zero-size absent-placeholder convention.
Security-review remediation (fleet CIPHER-001): with derived read bounds, a FIFO inside the bundle directory would block verification forever — File::open on a reader-less FIFO blocks, and a device node would stream without EOF under a large declared size. Stat the path before opening and refuse anything that is not a regular file, at every hashing call site (create, verify, sqlite cache). Regular files terminate at EOF and remain post-checked against the declaration. Unix regression test uses a real FIFO.
Security-review note (fleet CIPHER-002): parallel finite validation introduced in the encode train transitively touches the global rayon pool from inside the 'serial' CSR primitive. The serial guarantee is about candidate scan/selection ownership, not input validation; say so explicitly.
Bot-review remediation (Qodo, PR #277): the streaming loop surfaced ErrorKind::Interrupted as a hash failure, a reliability regression vs the previous read_to_end which retried EINTR internally.
Bot-review remediation (Qodo, PR #278): the shared core derived nq by integer division; a ragged buffer from a future caller would silently truncate. All current callers validate upstream — this is the cheap in-core invariant.
…t-bound' into integration/full-stack
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
PR Summary by QodoAdd performance changelog entries and implement derived artifact size bounds
AI Description
Diagram
High-Level Assessment
Files changed (13)
|
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
Code Review by Qodo
1.
|
The release-publish SBOM invariant requires member package versions in lockstep with the root; ordvec-ffi and both python bindings follow the 0.6.0 bump.
Closes the remaining release-publish invariant layers, verified by running tests/release_publish_invariants.py locally to a clean exit: pyproject + __init__ versions in lockstep, the changelog cut as a dated 0.6.0 section (invariant convention: the current version always has a dated section; [Unreleased] stays empty), THREAT_MODEL status line at v0.6.0, and the README quickstart installing 0.6.
# Conflicts: # CHANGELOG.md
Bot-review remediation (Qodo, PR #278): nq * m_eff can overflow usize on 32-bit/wasm32 targets, and the CSR wrapper's saturating_mul would attempt a usize::MAX allocation. Both sites now use checked multiplication with a clear tile-the-batch message, matching the crate's checked-allocation discipline. The exact m_eff + 1 heap reservation is kept deliberately: gradual growth double-allocates to the next power of two (~2x peak per query) — the reservation is the memory-optimal choice, now documented.
…d 'origin/feat/index-ceiling-cli-parity' into docs/release-hygiene
…imary shape check Bot-review remediation (Qodo, #283 inline): - create_manifest_for_index_with_options observed the index size twice (probe, then a separate stat for the hash bound) — a concurrent writer could produce a manifest whose size and digest describe different bytes. The hash is now bounded by the probe's size, the manifest records the byte count actually hashed, and any disagreement fails loudly. - sha256_file_bounded could read (not hash) up to one 64KiB chunk past the bound; reads now clamp to max_bytes + 1, mirroring read_bounded_file's take() pattern. - validate_manifest_shape gains artifact_file_size_zero for the primary artifact, matching the profile artifacts' explicit zero rejection instead of surfacing a confusing artifact_file_too_large.
Bot-review remediation (Qodo, #283 inline): build_query_bitmap allocated a fresh Vec and re-validated finiteness per query; the entry points already validate the whole buffer and the destination is preallocated. Oracle suites pin bit-identical output.
…li-parity' into integration/full-stack
Bot-review remediation (Qodo, #282): --max-index-artifact-bytes wired into ResourceLimits but the create path bounded the primary hash by the probed size alone — the opt-in ceiling was ineffective for create, unlike auxiliary artifacts. Create now mirrors verify: declared/observed size min explicit ceiling.
CHANGELOG
### Performanceentries for the four unlogged perf changes (numbers verified against source commit messages) and the RELEASING.md one-time un-patch step for ordinaldb's[patch.crates-io]block. Fleet-authored, reviewed APPROVE. Merge LAST — cut from the integration branch, so its commit list collapses to docs-only once #277/#278/#279/#280/#281 land. Known follow-up: a CHANGELOG line for the CLI-parity PR once it merges.