
[codex] Add encoder distortion manifest verification#152

Merged
project-navi-bot merged 3 commits into main from codex/encoder-distortion-manifest
Jun 3, 2026

@Fieldnote-Echo
Owner

Summary

  • adds a strict optional encoder_distortion manifest block with typed verifier/report support
  • verifies encoder identity, tokenizer/pooling identity, finite bounds, scoped metric digests, evidence metadata, profile artifact path/hash/size integrity, and optional calibration profile linkage
  • extends the SQLite verification cache key so encoder-distortion profile bytes invalidate stale cached reports
  • updates manifest docs to keep the profile scoped and avoid implying a global encoder theorem

Scope notes

This is a narrow ordvec-manifest provenance/report lane. It is related to #143, #147, and #148, but it does not make the verifier crate publishable, expose a new programmatic cache API, or implement the full unified auxiliary-artifact report model. It does not implement #144, #145, #146, or #149.

Validation

  • cargo fmt --check
  • cargo check -p ordvec-manifest --no-default-features
  • cargo test -p ordvec-manifest
  • cargo test -p ordvec-manifest --features sqlite
  • cargo clippy -p ordvec-manifest --all-targets --all-features -- -D warnings
  • git diff --check

Review

Adversarial review found no blocker for a draft PR, with the main guardrail that this PR should not be described as closing the broader manifest publication, cache API, auxiliary artifact, verified-load, or bounded-parser issues.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces support for verifying optional encoder_distortion profile references in index manifests, updating documentation, manifest schemas, SQLite caching logic, and adding comprehensive tests. The review feedback highlights two key improvements in ordvec-manifest/src/lib.rs: handling potential float overflow to infinity when calculating expected distortion bounds to prevent infinite tolerance checks, and trimming whitespace when comparing calibration profile IDs to avoid false mismatch errors.
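The whitespace point in the review summary above can be illustrated with a minimal sketch. The helper name `calibration_ids_match` is hypothetical and is not taken from ordvec-manifest; it only shows the idea of comparing trimmed identifiers so that incidental padding does not produce a false mismatch.

```rust
/// Hypothetical helper (not the crate's API): compares a declared calibration
/// profile ID against the ID found in the linked profile, ignoring leading and
/// trailing whitespace on both sides.
fn calibration_ids_match(declared: &str, actual: &str) -> bool {
    declared.trim() == actual.trim()
}
```

Under this sketch, `calibration_ids_match(" cal-v1\n", "cal-v1")` is treated as a match, while IDs that differ in their non-whitespace content are still rejected.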


Comment thread ordvec-manifest/src/lib.rs
Comment thread ordvec-manifest/src/lib.rs
@codecov

codecov Bot commented Jun 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@Fieldnote-Echo Fieldnote-Echo marked this pull request as ready for review June 3, 2026 13:56
@Fieldnote-Echo Fieldnote-Echo requested a review from Copilot June 3, 2026 13:56
@qodo-code-review

Review Summary by Qodo

Add encoder distortion manifest verification with profile artifact validation

✨ Enhancement


Walkthroughs

Description
• Adds strict optional encoder_distortion manifest block with comprehensive verification
• Verifies encoder identity, tokenizer/pooling identity, finite bounds, scoped metric digests
• Validates evidence metadata, profile artifact path/hash/size integrity, calibration linkage
• Extends SQLite cache key to include encoder-distortion profile bytes for invalidation
• Adds tokenizer_revision and pooling fields to Embedding struct for encoder tracking
Diagram
flowchart LR
  A["Manifest Document"] -->|contains| B["EncoderDistortionProfileRef"]
  B -->|validates| C["Encoder Identity"]
  B -->|validates| D["Metric Specs"]
  B -->|validates| E["Distortion Bounds"]
  B -->|validates| F["Scope Metadata"]
  B -->|validates| G["Evidence Kind"]
  B -->|references| H["Profile Artifact"]
  H -->|integrity check| I["SHA256 & Size"]
  B -->|optional link| J["Calibration Profile"]
  K["SQLite Cache"] -->|includes| L["EncoderDistortion Profile Hash"]
  L -->|invalidates| M["Stale Reports"]


File Changes

1. ordvec-manifest/src/lib.rs ✨ Enhancement +694/-2

Core encoder distortion verification and data structures

• Adds ENCODER_DISTORTION_SCHEMA_VERSION constant for schema versioning
• Introduces new data structures: EncoderDistortionProfileRef, MetricSpec, DistortionBounds,
 DistortionScope, DistortionEvidence, DistortionEvidenceKind, DistortionProfileArtifactRef
• Adds encoder_distortion optional field to IndexManifest struct
• Adds tokenizer_revision and pooling optional fields to Embedding struct
• Implements comprehensive validation functions: verify_encoder_distortion,
 validate_encoder_distortion_shape, validate_encoder_distortion_encoder,
 validate_encoder_distortion_metrics, validate_encoder_distortion_bounds,
 validate_encoder_distortion_scope, validate_encoder_distortion_evidence,
 validate_encoder_distortion_profile_artifact, validate_encoder_distortion_calibration
• Adds helper validators: validate_optional_sha256_uri, validate_optional_positive_f64,
 validate_optional_nonnegative_f64, validate_optional_probability
• Refactors compare_optional_identity to compare_optional_encoder_identity for reusability
• Adds EncoderDistortionReport struct to verification report with profile metadata fields
• Updates manifest creation to initialize encoder distortion and embedding fields

ordvec-manifest/src/lib.rs


2. ordvec-manifest/src/sqlite.rs ✨ Enhancement +63/-2

SQLite cache key extension for encoder distortion profiles

• Adds encoder_distortion_profile_sha256 column to SQLite verification_reports table
• Updates database schema migration to include new column in table creation and index
• Implements current_encoder_distortion_profile_sha256 function to compute profile hash
• Updates CacheKey struct to include encoder_distortion_profile_sha256 field
• Modifies store_report and load_cached_report functions to include encoder distortion profile
 hash in cache key matching
• Updates cache_key_from_report to extract encoder distortion profile SHA256 from verification
 report
• Updates migration detection to check for new column presence
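As a rough sketch of why adding the profile hash to the cache key invalidates stale reports: the struct below is a simplified, hypothetical stand-in for the real `CacheKey` in sqlite.rs. Only the `encoder_distortion_profile_sha256` field name comes from this PR; the other fields and the `cache_hit` helper are assumptions for illustration.

```rust
/// Simplified, hypothetical cache key. The real `CacheKey` has more fields;
/// `manifest_sha256` and `index_sha256` are illustrative names.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct CacheKey {
    manifest_sha256: String,
    index_sha256: String,
    /// `None` when the manifest declares no encoder-distortion profile.
    encoder_distortion_profile_sha256: Option<String>,
}

/// A cached report is reusable only when every key component is unchanged,
/// so a change in the profile bytes (and therefore its hash) forces a miss.
fn cache_hit(stored: &CacheKey, current: &CacheKey) -> bool {
    stored == current
}
```

Because equality covers the whole key, a profile whose bytes drift on disk yields a different hash and the previously stored report is no longer matched.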

ordvec-manifest/src/sqlite.rs


3. ordvec-manifest/tests/manifest.rs 🧪 Tests +467/-3

Comprehensive encoder distortion verification test coverage

• Adds imports for new encoder distortion types and constants
• Implements sha256_uri helper function for test digest formatting
• Implements distortion_profile helper function to create test encoder distortion profiles
• Adds comprehensive test encoder_distortion_schema_shape_is_strict_and_optional validating schema
 strictness and optional field handling
• Adds test encoder_distortion_identity_bounds_and_scope_are_checked validating encoder identity
 matching, bound constraints, and scope validation
• Adds test encoder_distortion_profile_artifact_checks_are_enforced validating artifact path,
 hash, size, format, and path escape/absolute path rejection
• Adds test encoder_distortion_can_bind_to_calibration_profile_id validating optional calibration
 profile linkage
• Adds SQLite cache test sqlite_cache_key_includes_encoder_distortion_profile_bytes verifying
 profile drift invalidates cached reports

ordvec-manifest/tests/manifest.rs


4. docs/INDEX_PROVENANCE.md 📝 Documentation +15/-0

Document encoder distortion profile verification semantics

• Documents optional encoder_distortion profile verification checks
• Clarifies that encoder distortion is a scoped profile, not a global bi-Lipschitz claim
• Explains verification scope: finite bounds, identity compatibility, byte-bound artifacts
• Documents optional calibration profile ID linkage validation
• Emphasizes verifier does not estimate profiles or promote empirical evidence to theorems

docs/INDEX_PROVENANCE.md


5. ordvec-manifest/README.md 📝 Documentation +10/-8

Update README for encoder distortion verification

• Updates description to include encoder distortion profile verification
• Clarifies verifier does not estimate encoder geometry
• Updates SQLite cache documentation to mention encoder distortion profile bytes in cache key

ordvec-manifest/README.md



@qodo-code-review

qodo-code-review Bot commented Jun 3, 2026

Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (0) 📎 Requirement gaps (1) 🎨 UX issues (0) 🔗 Cross-repo conflicts (0)



Action required

1. Distortion profile hash unbounded 📎 Requirement gap ⛨ Security
Description
The new encoder_distortion verifier and related SQLite cache-key computation hash the referenced
profile file using sha256_file() without any caller-configurable maximum size, allowing hostile
manifests to trigger excessive I/O/time and bypass fail-closed resource-limit behavior. This
violates the requirement to enforce explicit, configurable resource limits with stable size/limit
failure codes consistently across library/CLI verification paths and cache metadata handling.
Code

ordvec-manifest/src/lib.rs[R988-1026]

+    if !profile.path.trim().is_empty() {
+        let path = PathBuf::from(&profile.path);
+        if let Some(resolved) = resolve_existing_path(
+            &path,
+            base_dir,
+            options,
+            "encoder_distortion_profile",
+            &mut report.errors,
+        ) {
+            report.encoder_distortion.profile_canonical_path =
+                Some(path_to_display(&resolved.canonical_path));
+            match sha256_file(&resolved.resolved_path) {
+                Ok(hash) => {
+                    report.encoder_distortion.profile_sha256 = Some(hash.sha256.clone());
+                    report.encoder_distortion.profile_size_bytes = Some(hash.size_bytes);
+                    if !hex_digest_eq(&hash.sha256, &profile.sha256) {
+                        report.error(
+                            "encoder_distortion_profile_sha256_mismatch",
+                            format!(
+                                "encoder distortion profile SHA-256 was {}, manifest declares {}",
+                                hash.sha256, profile.sha256
+                            ),
+                        );
+                    }
+                    if hash.size_bytes != profile.file_size_bytes {
+                        report.error(
+                            "encoder_distortion_profile_file_size_mismatch",
+                            format!(
+                                "encoder distortion profile size was {}, manifest declares {}",
+                                hash.size_bytes, profile.file_size_bytes
+                            ),
+                        );
+                    }
+                }
+                Err(err) => report.error(
+                    "encoder_distortion_profile_hash_failed",
+                    format!("failed to hash encoder distortion profile: {err}"),
+                ),
+            }
Evidence
Rule 1 requires bounded, configurable resource limits and stable error codes for size/limit failures
across both verification and cache metadata handling. In the added encoder_distortion verification
flow, the code resolves the profile path and then hashes the entire file via
sha256_file(&resolved.resolved_path) without consulting any VerifyOptions size limit, meaning an
oversized profile can force unbounded work instead of failing early with a stable limit error;
similarly, current_encoder_distortion_profile_sha256 used for SQLite cache key derivation resolves
the profile path and hashes the full file with sha256_file(&resolved.resolved_path) with no
maximum-size enforcement, enabling the same unbounded behavior during cache operations.

Bounded manifest/parser resource limits with stable error codes (CLI + library)
ordvec-manifest/src/lib.rs[950-1026]
ordvec-manifest/src/sqlite.rs[436-462]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The verifier and SQLite cache-key derivation hash the encoder distortion profile artifact via `sha256_file()` with no maximum-byte ceiling, which can be abused to cause excessive resource use and prevents failing early with a stable size/limit error code.

## Issue Context
PR Compliance ID 1 requires explicit, caller-configurable resource limits (with safe defaults) and stable error codes for limit failures, applied consistently across verifier inputs (including auxiliary/side artifacts) and cache metadata/behavior, and across both library and CLI verification paths.

## Fix Focus Areas
- ordvec-manifest/src/lib.rs[950-1026]
- ordvec-manifest/src/sqlite.rs[436-462]
- ordvec-manifest/src/lib.rs[1690-1696]
- ordvec-manifest/src/lib.rs[2302-2319]
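One way to read the requested fix is a streaming read that stops as soon as a caller-supplied byte ceiling is exceeded. The sketch below uses only the standard library; `read_bounded` and `BoundedReadError` are hypothetical names, and the hashing step is left to the caller-supplied `sink` closure because SHA-256 is not part of `std`. It is not the crate's implementation.

```rust
use std::io::{self, Read};

/// Hypothetical error type: a stable "too large" variant plus plain I/O failure.
#[derive(Debug)]
enum BoundedReadError {
    TooLarge { limit: u64 },
    Io(io::Error),
}

/// Streams `reader` into `sink` (for example, a hasher update) and fails once
/// more than `max_bytes` have been observed. Returns the total byte count.
fn read_bounded<R: Read>(
    reader: R,
    max_bytes: u64,
    mut sink: impl FnMut(&[u8]),
) -> Result<u64, BoundedReadError> {
    // Read at most one byte past the limit so "exactly at the limit" and
    // "over the limit" can be told apart without consuming the whole input.
    let mut limited = reader.take(max_bytes.saturating_add(1));
    let mut buf = [0u8; 8192];
    let mut total: u64 = 0;
    loop {
        let n = limited.read(&mut buf).map_err(BoundedReadError::Io)?;
        if n == 0 {
            return Ok(total);
        }
        total += n as u64;
        if total > max_bytes {
            return Err(BoundedReadError::TooLarge { limit: max_bytes });
        }
        sink(&buf[..n]);
    }
}
```

A verifier built this way could map `TooLarge` to a stable report code and skip caching for over-limit artifacts, instead of hashing an arbitrarily large file.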




Remediation recommended

2. Report JSON not backward-compatible ✓ Resolved 🐞 Bug ☼ Reliability
Description
VerificationReport adds a non-optional encoder_distortion field without #[serde(default)], so
older serialized reports that omit the field will fail to deserialize. This can break
replaying/storing reports (including SQLite’s cached report_json deserialization path) even though
the codebase already treats missing fields as a supported compatibility case (e.g.,
auxiliary_artifacts).
Code

ordvec-manifest/src/lib.rs[R2687-2693]

    #[serde(default)]
    pub auxiliary_artifacts: Vec<AuxiliaryArtifactReport>,
    pub row_identity: RowIdentityReport,
+    pub encoder_distortion: EncoderDistortionReport,
    pub calibration: CalibrationReport,
    pub attestation_shape_checks: Vec<AttestationShapeCheck>,
    pub errors: Vec<ReportIssue>,
Evidence
VerificationReport marks auxiliary_artifacts with #[serde(default)] and has a test ensuring
reports deserialize when that field is missing, indicating missing-field deserialization is an
intended compatibility behavior. However, the newly added encoder_distortion field is required and
lacks #[serde(default)], so missing-field deserialization will fail; SQLite’s cache loader
deserializes report_json directly into VerificationReport via serde_json::from_str, making
this a real compatibility hazard for any stored report JSON.

ordvec-manifest/src/lib.rs[2681-2696]
ordvec-manifest/tests/manifest.rs[2090-2100]
ordvec-manifest/src/sqlite.rs[225-294]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`VerificationReport` includes a new required field `encoder_distortion` but does not mark it with `#[serde(default)]`. This makes deserialization fail when the JSON payload was produced by an older version that didn't include this field.

### Issue Context
The report struct already demonstrates an explicit backward-compatibility pattern via `#[serde(default)]` on `auxiliary_artifacts`, with a test covering missing-field deserialization.

### Fix Focus Areas
- ordvec-manifest/src/lib.rs[2681-2696]
- ordvec-manifest/tests/manifest.rs[2090-2100]



3. Bounds overflow skips mismatch 🐞 Bug ≡ Correctness
Description
validate_encoder_distortion_bounds computes expected = declared_upper_bound / declared_lower_bound
without checking expected.is_finite(), so extreme-but-finite bounds can overflow to +inf and make
the tolerance +inf, silently bypassing the mismatch error. This can allow nonsensical bound
combinations to verify successfully (or produce confusing reports) when estimated_distortion is
present.
Code

ordvec-manifest/src/lib.rs[R845-858]

+        if lower.is_finite() && upper.is_finite() && lower > 0.0 && upper > 0.0 {
+            if let Some(estimated) = bounds.estimated_distortion {
+                let expected = upper / lower;
+                let tolerance = 1e-9_f64.max(expected.abs() * 1e-9);
+                if estimated.is_finite() && (estimated - expected).abs() > tolerance {
+                    report.error(
+                        "encoder_distortion_distortion_mismatch",
+                        format!(
+                            "encoder_distortion.bounds.estimated_distortion {} does not match declared_upper_bound / declared_lower_bound {}",
+                            estimated, expected
+                        ),
+                    );
+                }
+            }
Evidence
The new mismatch check computes expected = upper / lower and builds tolerance from
expected.abs() but only checks estimated.is_finite(), not expected.is_finite(), so overflow to
infinity can bypass the mismatch condition.

ordvec-manifest/src/lib.rs[838-858]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`validate_encoder_distortion_bounds` divides `upper/lower` and uses the result to derive a tolerance, but never checks whether the computed ratio is finite. In floating-point, dividing two finite positive `f64`s can still overflow to `+inf`, causing the tolerance to become `+inf` and the mismatch check to never fire.

### Issue Context
This logic is part of the new `encoder_distortion` manifest block verification.

### Fix Focus Areas
- ordvec-manifest/src/lib.rs[787-861]

### Suggested fix
- After computing `expected = upper / lower`, explicitly check `expected.is_finite()`.
 - If not finite: emit a dedicated verifier error (e.g., `encoder_distortion_distortion_overflow`) explaining that `declared_upper_bound/declared_lower_bound` is not finite.
 - Otherwise, proceed with the tolerance/mismatch check.
- (Optional, if intended) document or relax the strictness of the tolerance if `estimated_distortion` is expected to be rounded/empirical rather than an exact ratio.
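The suggested fix can be sketched as a standalone function. This is an illustration under assumptions, not the crate's code: `check_distortion_ratio` and the `encoder_distortion_bounds_invalid` code are hypothetical, the overflow code is the example name proposed above, and treating a non-finite `estimated` as a mismatch is a simplification made here.

```rust
/// Hypothetical standalone version of the ratio check. Returns a stable
/// error code string on failure.
fn check_distortion_ratio(lower: f64, upper: f64, estimated: f64) -> Result<(), &'static str> {
    if !(lower.is_finite() && upper.is_finite() && lower > 0.0 && upper > 0.0) {
        // Illustrative code name; the real verifier reports bound errors separately.
        return Err("encoder_distortion_bounds_invalid");
    }
    let expected = upper / lower;
    // Two finite positive f64 values can still divide to +inf
    // (for example 1e308 / 1e-308), which would make the tolerance infinite.
    if !expected.is_finite() {
        return Err("encoder_distortion_distortion_overflow");
    }
    let tolerance = 1e-9_f64.max(expected.abs() * 1e-9);
    // Simplification: a non-finite estimate is reported as a mismatch here.
    if !estimated.is_finite() || (estimated - expected).abs() > tolerance {
        return Err("encoder_distortion_distortion_mismatch");
    }
    Ok(())
}
```

With the finiteness check in place, extreme bounds fail with a dedicated code instead of silently passing the mismatch comparison.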


Copilot AI left a comment


Pull request overview

Adds optional, strict encoder_distortion manifest support to ordvec-manifest and extends verification + SQLite caching so distortion-profile artifact bytes participate in cache invalidation. This fits into the crate’s provenance/verifier lane by strengthening “verify before load” guarantees for additional (optional) sidecar provenance.

Changes:

  • Introduces a typed encoder_distortion manifest block (schema v1) plus structured verification + report output.
  • Enforces verification of encoder/tokenizer/pooling identity, metric spec digests, finite bounds/scope/evidence fields, optional calibration linkage, and optional sidecar profile artifact integrity.
  • Extends the SQLite verification cache key + schema to include encoder-distortion profile bytes, with tests ensuring drift invalidates cached reports.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| ordvec-manifest/src/lib.rs | Adds encoder_distortion schema types, verification logic, and report fields; extends embedding identity fields used for cross-checks. |
| ordvec-manifest/src/sqlite.rs | Adds encoder_distortion_profile_sha256 to the verification cache key and SQLite schema/migration + lookup logic. |
| ordvec-manifest/tests/manifest.rs | Adds comprehensive tests for encoder-distortion schema strictness, verification rules, artifact enforcement, calibration linkage, and sqlite cache invalidation. |
| ordvec-manifest/README.md | Updates verifier scope description and sqlite cache behavior wording to include encoder-distortion profile bytes. |
| docs/INDEX_PROVENANCE.md | Documents encoder-distortion verification scope and clarifies it as scoped evidence (not a global theorem). |


Comment thread ordvec-manifest/src/lib.rs
@Fieldnote-Echo Fieldnote-Echo force-pushed the codex/encoder-distortion-manifest branch 2 times, most recently from 73f25d3 to ac3ab63 on June 3, 2026 14:56
Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
@Fieldnote-Echo Fieldnote-Echo force-pushed the codex/encoder-distortion-manifest branch from ac3ab63 to 60b735e on June 3, 2026 17:41
Owner Author

Rebased #152 onto current main after #157/#158 and preserved both bounded verifier layers.

Local validation:

  • cargo fmt --all --check
  • cargo test -p ordvec-manifest
  • cargo test -p ordvec-manifest --features sqlite
  • git diff --check

@Fieldnote-Echo
Owner Author

/agentic_review

@qodo-code-review

qodo-code-review Bot commented Jun 3, 2026

Code review by qodo was updated up to the latest commit 60b735e

Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
Owner Author

Follow-up for the latest Qodo summary on #152:

  • Distortion profile hash unbounded: already fixed on the current head. Library verification uses sha256_file_bounded() with ResourceLimits::max_encoder_distortion_profile_bytes and stable report code encoder_distortion_profile_too_large; SQLite cache-key computation uses the same bounded hash path and treats over-limit profiles as non-cacheable.
  • Bounds overflow skips mismatch: already fixed on the current head. validate_encoder_distortion_bounds() checks that declared_upper_bound / declared_lower_bound is finite before computing tolerance, and encoder_distortion_bounds_ratio_overflow_is_rejected covers the regression.
  • Report JSON not backward-compatible: fixed in a126cc8 by adding #[serde(default)] to VerificationReport::encoder_distortion plus verification_report_deserializes_missing_encoder_distortion_field.

Local validation after the compatibility fix:

  • cargo fmt --all --check
  • cargo test -p ordvec-manifest
  • cargo test -p ordvec-manifest --features sqlite
  • git diff --check

@project-navi-bot project-navi-bot self-requested a review June 3, 2026 23:37
@project-navi-bot project-navi-bot merged commit a9b0e95 into main Jun 3, 2026
30 checks passed
@project-navi-bot project-navi-bot deleted the codex/encoder-distortion-manifest branch June 3, 2026 23:39