Skip to content

[codex] add manifest verifier#104

Merged
project-navi-bot merged 6 commits into
mainfrom
codex/manifest-verifier
May 29, 2026
Merged

[codex] add manifest verifier#104
project-navi-bot merged 6 commits into
mainfrom
codex/manifest-verifier

Conversation

@Fieldnote-Echo
Copy link
Copy Markdown
Owner

Summary

  • add allocation-resistant core index metadata probing for Rank, RankQuant, Bitmap, and SignBitmap persisted formats
  • add the publish=false ordvec-manifest workspace crate with strict v1 schema handling, path policy, artifact verification, row identity checks, attestation shape checks, CLI, and optional SQLite registry/cache support
  • document the verifier as a pre-load sidecar provenance check and add a dedicated CI lane

Validation

  • cargo fmt --all --check
  • git diff --check
  • cargo test -p ordvec-manifest --no-default-features
  • cargo test -p ordvec-manifest --all-features
  • cargo clippy -p ordvec-manifest --all-targets --all-features -- -D warnings
  • cargo test -p ordvec --no-default-features
  • cargo clippy -p ordvec --all-targets --all-features -- -D warnings

Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
@Fieldnote-Echo Fieldnote-Echo force-pushed the codex/manifest-verifier branch from 24e57b4 to 13b28f9 Compare May 29, 2026 06:15
@project-navi-bot project-navi-bot marked this pull request as ready for review May 29, 2026 06:16
@project-navi-bot project-navi-bot self-requested a review as a code owner May 29, 2026 06:16
@Fieldnote-Echo Fieldnote-Echo requested a review from Copilot May 29, 2026 06:16
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces probe_index_metadata to inspect persisted index headers without allocating payloads, and adds ordvec-manifest, a repo-local sidecar verifier crate that validates index files against a JSON manifest schema. The feedback recommends wrapping the manifest file in a BufReader to optimize JSON deserialization performance, and suggests creating the output parent directory in create_manifest_for_index before canonicalization to prevent path resolution failures when the directory does not yet exist.

Comment thread ordvec-manifest/src/lib.rs Outdated
Comment thread ordvec-manifest/src/lib.rs
@codecov
Copy link
Copy Markdown

codecov Bot commented May 29, 2026

Codecov Report

❌ Patch coverage is 99.15730% with 3 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/rank_io.rs 99.15% 3 Missing ⚠️

📢 Thoughts on this report? Let us know!

@Fieldnote-Echo
Copy link
Copy Markdown
Owner Author

/agentic_review

@qodo-code-review
Copy link
Copy Markdown

qodo-code-review Bot commented May 29, 2026

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0)

Grey Divider


Action required

1. SQLite cache bypasses verification ✓ Resolved 🐞 Bug ⛨ Security
Description
When use_cache=true, verify_with_registry returns the latest cached report selected only by
manifest_id, ignoring the current manifest file contents, VerifyOptions, and the current artifact
bytes. This can return ok=true for a modified index file or for stricter verification options than
the cached run used.
Code

ordvec-manifest/src/sqlite.rs[R6-26]

Evidence
The SQLite cache path returns cached results purely by manifest_id, and the test suite demonstrates
that a cached ok report remains ok even after the index file is mutated, proving stale verification
is possible.

ordvec-manifest/src/sqlite.rs[6-26]
ordvec-manifest/src/sqlite.rs[110-129]
ordvec-manifest/tests/manifest.rs[379-437]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`verify_with_registry(..., use_cache=true)` returns a cached `VerificationReport` using only `manifest_id` as the lookup key. This means a cached `ok=true` result can be reused even if:
- the index artifact has changed on disk since the cached report was produced, or
- the caller’s `VerifyOptions` are different (e.g., previously allowed path escape / duplicates), or
- the manifest file contents changed while keeping the same `manifest_id`.

This undermines the verifier’s integrity guarantees when cache is enabled.

### Issue Context
The cache is explicitly user-enabled, but it currently behaves like an unconditional “trust previous result for this manifest_id” shortcut.

### Fix Focus Areas
- ordvec-manifest/src/sqlite.rs[6-26]
- ordvec-manifest/src/sqlite.rs[85-129]

### Proposed fix
1. Persist additional cache key material alongside each report (at minimum):
  - a digest of the manifest JSON bytes (e.g., SHA-256 of the manifest file contents),
  - the `VerifyOptions` used (serialized),
  - and either the artifact SHA-256 observed during verification or a digest of the artifact bytes.
2. When `use_cache=true`, only return a cached report if ALL of the above match the current invocation (same manifest digest, same options, and same artifact digest).
3. If they don’t match, fall back to a fresh `verify_manifest(...)` run and store a new report.

(If you intentionally want “trust cache even if artifact changed”, make that a separate explicit flag and emit a prominent warning in the cached report output.)

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

2. Report history overwritten in SQLite ✓ Resolved 🐞 Bug ☼ Reliability
Description
VerificationReport.checked_at is generated at second precision, but SQLite uses (manifest_id,
checked_at) as the primary key and inserts via INSERT OR REPLACE. Two verifications of the same
manifest_id in the same second will silently overwrite the earlier report, losing audit history.
Code

ordvec-manifest/src/sqlite.rs[R63-107]

Evidence
checked_at is generated with SecondsFormat::Secs, while the SQLite table uses checked_at in the
primary key and uses INSERT OR REPLACE, which will overwrite on key collisions.

ordvec-manifest/src/lib.rs[732-757]
ordvec-manifest/src/sqlite.rs[63-83]
ordvec-manifest/src/sqlite.rs[85-107]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The SQLite schema keys `verification_reports` by `(manifest_id, checked_at)`, but `checked_at` is generated with only seconds resolution. Combined with `INSERT OR REPLACE`, multiple verifications in the same second for the same `manifest_id` overwrite prior rows silently, losing history.

### Issue Context
This affects auditability/debugging of verifier behavior and can occur in automation (e.g., loops, parallel jobs, or rapid successive invocations).

### Fix Focus Areas
- ordvec-manifest/src/lib.rs[745-757]
- ordvec-manifest/src/sqlite.rs[63-83]
- ordvec-manifest/src/sqlite.rs[85-107]

### Proposed fix
Pick one:
1. Increase timestamp precision:
  - Generate `checked_at` with sub-second precision (e.g., `SecondsFormat::Nanos`) so collisions are practically eliminated.
  - Keep the existing primary key.

2. Change the schema key:
  - Add a `report_id` (UUID or INTEGER PRIMARY KEY) and make `(manifest_id, checked_at)` a non-unique index.
  - Replace `INSERT OR REPLACE` with `INSERT` so duplicate history is preserved.

If you change the schema, add a lightweight migration path (or bump schema version for the registry DB).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

Comment thread ordvec-manifest/src/sqlite.rs
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a sidecar manifest verifier for ordvec indexes. Introduces an allocation-resistant header-metadata probe in the core crate (probe_index_metadata over Rank/RankQuant/Bitmap/SignBitmap) and a new publish=false workspace crate ordvec-manifest that binds an index file to a strict v1 JSON manifest via SHA-256, header re-probe, row-identity checks, and attestation shape checks, with a CLI and optional SQLite registry/cache.

Changes:

  • Add IndexKind / IndexParams / IndexMetadata and probe_index_metadata to the core crate, validating header, declared shape, and exact file length without allocating the payload.
  • Add the ordvec-manifest workspace crate (library + CLI + optional sqlite feature) implementing schema validation, path policy, artifact verification, JSONL row identity checks, and attestation shape checks.
  • Wire the new crate into the workspace, lockfile, threat model, provenance docs, README, changelog, dependabot, and a dedicated CI lane.

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/rank_io.rs Adds metadata-probe types and probe_index_metadata for all four persisted formats with header/shape/file-length checks and tests.
src/lib.rs Re-exports the new probe API.
ordvec-manifest/Cargo.toml Defines the new publish=false crate, deps, and sqlite / sqlite-bundled features.
ordvec-manifest/src/lib.rs Implements manifest schema, verification flow, path policy, attestation shape checks, hashing, and manifest creation.
ordvec-manifest/src/sqlite.rs Optional registry/cache: verify_with_registry and activate over a SQLite store.
ordvec-manifest/src/main.rs CLI: hash, inspect, verify, create, and `sqlite verify
ordvec-manifest/tests/manifest.rs End-to-end tests covering create/verify, schema strictness, path policy, JSONL rules, attestations, CLI, and SQLite.
ordvec-manifest/README.md Short usage doc for the sidecar verifier.
Cargo.toml Adds ordvec-manifest workspace member; updates workspace comment.
Cargo.lock New transitive deps (chrono, clap, sha2, rusqlite, tempfile, etc.).
.github/workflows/ci.yml Adds a dedicated manifest-verifier CI lane (no-default-features, all-features, clippy).
.github/dependabot.yml Updates comment to mention the new workspace member.
docs/INDEX_PROVENANCE.md Documents the verifier as a pre-load provenance check and its boundaries.
THREAT_MODEL.md Adds the manifest-verification trust boundary; updates non-shipping-crypto framing.
README.md Points users to the new verifier crate and updates the pre-load story.
CHANGELOG.md Notes the new probe API, new crate, and updated provenance docs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ordvec-manifest/src/main.rs
Comment thread ordvec-manifest/src/lib.rs
Comment thread ordvec-manifest/src/sqlite.rs
Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
Copy link
Copy Markdown
Owner Author

Remediation pushed in 49f8528.

Addressed bot findings:

  • Wrapped manifest JSON reads in BufReader.
  • Created manifest output parent directories before relative path derivation.
  • Hardened SQLite cache reuse with manifest/options/artifact/row-identity SHA-256 key material and fallback to fresh verification on miss or mismatch.
  • Preserved SQLite report history with report_id, INSERT, nanosecond timestamps, and a migration path for the old table shape.
  • Kept sqlite activate --force from masking failed verification in the CLI exit code.
  • Normalized absolute manifest paths to / separators.
  • Added extra rank_io probe rejection tests for Codecov's missing-lines signal.

Validation run:

  • cargo test -p ordvec-manifest --all-features
  • cargo test -p ordvec-manifest --no-default-features
  • cargo test -p ordvec rank_io::tests::probe --no-default-features
  • cargo test -p ordvec --no-default-features
  • cargo clippy -p ordvec-manifest --all-targets --all-features -- -D warnings
  • cargo clippy -p ordvec --all-targets --all-features -- -D warnings
  • cargo fmt --all --check
  • git diff --check

All actionable inline review threads have been replied to and resolved.

Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
Copy link
Copy Markdown
Owner Author

Follow-up merge-gate remediation pushed in b5fbe38.

Addressed:

  • create now defaults to manifest-relative, default-verifiable paths and rejects artifacts/row maps outside the output manifest directory unless create-time path policy opts in.
  • Added CreateManifestOptions plus CLI create --allow-path-escape / --allow-absolute-paths flags.
  • Changed sqlite activate --force semantics to match mutation: it activates, exits 0, and includes a sqlite_activation_forced warning when verification failed.
  • Scoped SQLite wording to cache/audit plus one active-manifest pointer instead of a full registry.
  • Enforced lowercase SHA-256 for schema-owned digest fields.
  • Added optional v1 embedding/build fields and shape checks.
  • Added verify_index_manifest(index_path, manifest_path, options) convenience API.
  • Expanded manifest tests to cover Rank, RankQuant, Bitmap, and SignBitmap create/verify fixtures plus missing artifact, wrong params, row count mismatch, uppercase digests, outside-path create policy, and the convenience API.

Local validation:

  • cargo test -p ordvec-manifest --all-features
  • cargo test -p ordvec-manifest --no-default-features
  • cargo clippy -p ordvec-manifest --all-targets --all-features -- -D warnings
  • cargo fmt --all --check
  • git diff --check

Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
Copy link
Copy Markdown
Owner Author

Calibration follow-up pushed in 96e4532 and CI fix in 18cb410.

Added:

  • Optional typed calibration: Option<CalibrationProfileRef> in the v1 manifest schema, plus CalibrationReport in verification output.
  • Encoder identity and ordinalization compatibility checks for Bitmap/top-k, RankQuant/bucket, SignBitmap/sign, and Rank/rank-position or caller-defined calibration.
  • Profile artifact binding: path policy, lowercase SHA-256 shape, actual profile hash and size, sample count, null-model/profile consistency, and raw f32/f64 shape-size checks. No statistical computation was added.
  • SQLite cache key material now includes the actual calibration profile SHA-256 when a profile is present, so --use-cache cannot hide profile byte drift.
  • inspect reports calibration presence/shape, and docs now describe calibration as verified profile provenance rather than a statistics engine.
  • Regression tests for absent/strict calibration schema, encoder mismatch, format compatibility, profile hash/size/path/sample failures, and SQLite cache invalidation on profile mutation.

Also fixed the CI-only release_signed_release_invariants.sh SIGPIPE under pipefail by materializing job_body before grepping needs: entries.

Local validation:

  • cargo test -p ordvec-manifest --no-default-features
  • cargo test -p ordvec-manifest --all-features
  • cargo clippy -p ordvec-manifest --all-targets --all-features -- -D warnings
  • bash tests/release_signed_release_invariants.sh
  • git diff --check

Current PR head is green from the checks view; only the scheduled weekly fuzz lane is skipped as expected. Existing inline review threads are resolved.

Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
Copy link
Copy Markdown
Owner Author

Calibration semantics follow-up pushed in a224955.

Addressed the remaining schema concern:

  • uniform_hypergeometric is now accepted only with top_k ordinalization; bucket, sign, and rank-position ordinalizations report calibration_null_model_ordinalization_mismatch.
  • Added parameterization-vs-ordinalization compatibility checks independent of optional profile.shape, with calibration_profile_parameterization_ordinalization_mismatch for mismatched profile meaning.
  • Updated the non-top-K calibration fixtures to use profile-backed weighted profiles instead of uniform hypergeometric.
  • Documented that uniform_hypergeometric is reserved for top-K overlap calibration.

Local validation:

  • cargo test -p ordvec-manifest --no-default-features
  • cargo test -p ordvec-manifest --all-features
  • cargo clippy -p ordvec-manifest --all-targets --all-features -- -D warnings
  • git diff --check

Copy link
Copy Markdown
Owner Author

Final PR104 remediation status for head a224955:

  • Addressed the remaining calibration semantics gate: uniform_hypergeometric is restricted to top_k ordinalization, and non-top-K uses now fail with calibration_null_model_ordinalization_mismatch.
  • Added profile parameterization-vs-ordinalization compatibility independent of optional shape metadata.
  • All inline review threads are resolved.
  • Current PR-head checks are green, including manifest verifier, clippy/fmt, coverage, CodeQL, Python matrix, Windows/macOS/Linux Rust tests, and fuzz smoke; only the scheduled weekly fuzz job is skipped as expected.
  • Working tree is clean after push.

@project-navi-bot project-navi-bot merged commit 361816e into main May 29, 2026
35 checks passed
@project-navi-bot project-navi-bot deleted the codex/manifest-verifier branch May 29, 2026 15:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants