Skip to content

v5.2.0 — Structured Verification Diagnostics: Unified 3-Layer DiagnosticResult Model

Latest

Choose a tag to compare

@rahuldass19 rahuldass19 released this 19 Jun 03:26
a800a2b

What's in this release

v5.2.0 introduces the Structured Verification Diagnostics architecture — a unified DiagnosticResult model that establishes a 3-layer diagnostic contract across all QWED verification engines.

This is an architectural completion release, not a feature release. No existing engine return types are changed. This release establishes the contract that all engines will conform to in subsequent patches.


Why this release exists

Before v5.2.0, every QWED verification engine returned its own ad-hoc Dict[str, Any] with no consistency. Three incompatible VerificationResult dataclasses existed in different modules. Some engines returned (bool, str) tuples with no structure at all. A future diagnostic aggregation layer would need N different adapters for N engines.

More critically, verification diagnostics were not separable by audience. Agents saw internal detection patterns. Developers couldn't reliably find expected vs actual values. Auditors had no proof artifact references — engines computed mathematical proofs (Z3 assertion stacks, Newton-Raphson convergence traces, eigenvalue comparisons) and then discarded them.

v5.2.0 fixes this by establishing a single DiagnosticResult type with three disclosure layers, each targeted at a specific audience.


The 3-Layer Diagnostic Model

Layer 1 — Agent-Safe Diagnostics

Field: agent_message: str

Agent/model-facing summary. Allows agents to correct failures without exposing verification internals. Never contains detection logic, rule IDs, regex patterns, or security bypass guidance.

Example: "Claim could not be deterministically verified"

Layer 2 — Developer Diagnostics

Field: developer_fields: dict

Application-developer-facing structured evidence. Includes constraint_id, expected/actual values, advisory_checks, methods_used, and engine-specific evidence. Structured, not free-form strings.

Example:

{
    "constraint_id": "math_verifier.irr_non_convergent",
    "iterations": 100,
    "final_npv": "0.7",
    "converged": False,
    "advisory_checks": [
        {"name": "llm_fallback", "advisory_only": True, "details": {"llm_verdict": "SUPPORTED"}}
    ],
}

Layer 3 — Proof Diagnostics

Field: proof_ref: Optional[str]

Cryptographic hash (sha256:...) of retained proof artifact. Present only when status == VERIFIED and proof was established. None for UNVERIFIABLE / BLOCKED.

This is the authority bit: downstream gates use a mechanical rule — proof_ref is not None = admissible for control flow, proof_ref is None = reject. No separate authoritative boolean needed.


Key Design Decisions

Tri-state status — no proliferation

DiagnosticStatus has exactly 3 values: VERIFIED, UNVERIFIABLE, BLOCKED. No HEURISTIC, AMBIGUOUS, or CORRECTION_NEEDED. Richer distinctions live in developer_fields.constraint_id.

proof_ref is the authority bit

Resolves the design debate from #190 (Keesan's authoritative flag vs Rahul's status taxonomy). Both are the same thing: proof_ref is not None IS the authority bit AND the status taxonomy enforcer. No new semantic layer needed.

VERIFIED requires proof — structurally enforced

__post_init__ raises ValueError if status == VERIFIED and proof_ref is None (or empty string). "VERIFIED without proof" is impossible to construct — not a caller convention, a type-level invariant.

Frozen dataclasses

Both DiagnosticResult and AdvisoryCheck are frozen=True. Post-construction mutation of proof_ref or status is blocked — prevents bypassing the authority contract without explicit object.__setattr__.

Advisory checks never influence verdicts

AdvisoryCheck (LLM fallback, NLI entailment, VLM interpretation) populates developer_fields.advisory_checks with advisory_only=True enforced via __post_init__. They never set status or proof_ref. This structurally enforces the constraint: "diagnostics must never originate from model reasoning, confidence, or self-assessment."

Migration helper for existing engines

from_legacy_dict() converts ad-hoc engine dicts to DiagnosticResult for fail-closed states. It raises for legacy VERIFIED results — proof artifacts were discarded by pre-v5.2.0 engines, so backfilling is impossible. Callers must use DiagnosticResult.verified() with explicit evidence.


What this release does NOT include

  • ❌ No existing engine return types are changed (additive only)
  • ❌ No explainability features (diagnostics ≠ explainability)
  • ❌ No confidence scores in verdict-deciding position (forbidden by design)
  • ❌ No API/SDK response shape changes
  • ❌ No new verification engines

Engine conformance (migrating ad-hoc Dict[str, Any] returns to DiagnosticResult) is tracked in 10 blocked issues.


Blocked issues — next steps

These issues are unblocked by this release and will conform to the DiagnosticResult contract in subsequent patches:

Issue Engine Diagnostic layer focus
#129 MathVerifier (mode ambiguity) Layer 2 + 3
#130 MathVerifier (eigenvalues) Layer 2 + 3
#131 MathVerifier (IRR convergence) Layer 3
#133 FactVerifier (LLM fallback) Layer 1 + 2 + 3
#134 ImageVerifier (VLM fallback) Layer 1 + 3
#162 LogicVerifier (symbol inference) Layer 2 + 3
#163 GraphFactVerifier (NLI fallback) Layer 2 + 3
#164 ReasoningVerifier (no proof path) Layer 3
#190 FactVerifier (provider drift) Layer 3
#205 SecureCodeExecutor (tech-debt) Layer 1 + 2

Version Propagation

Artifact Old New
qwed (PyPI) 5.1.2 5.2.0
qwed_sdk (Python) 5.1.1 5.2.0
@qwed-ai/sdk (NPM) 5.1.2 5.2.0
qwed (crates.io/Rust) 5.1.2 5.2.0
API version marker 5.1.2 5.2.0
K8s deployment image 5.1.2 5.2.0

Quality

  • 83 new tests covering: status taxonomy, all 3 layers, authority contract, fail-closed enforcement, advisory checks, proof hashing determinism, serialization round-trip, legacy migration, frozen dataclass immutability, and realistic scenarios drawn from the 10 blocked issues
  • 7 rounds of automated review (Sentry, Greptile, CodeRabbit, CodeQL) — all issues addressed
  • Boundary check: passed
  • Lint (ruff): clean
  • No regressions in existing test suite

Included PRs

  • #206 — feat(diagnostics): unified 3-layer DiagnosticResult model (#204)
  • #207 — release: v5.2.0 version propagation

Constraints (non-negotiable)

  1. Diagnostics are NOT explainability — no confidence scores, no chain-of-thought, no model reasoning
  2. All diagnostic fields must originate from verification results, constraints, rule evaluation, schema validation, or proof systems
  3. Agent-safe diagnostics must never expose detection logic, rule IDs, regex patterns, or security bypass guidance
  4. Existing fail-closed behavior must not be weakened

Compatibility

Additive release. No breaking changes. Existing VerificationResult dataclasses and ad-hoc engine dicts continue to work. DiagnosticResult is opt-in — engines migrate incrementally via from_legacy_dict() or direct construction.

Minimum Python: 3.10 (unchanged)
Tested on: Python 3.11.9, Windows