v5.2.6: rendering and consistency fixes #2

Merged: jkyberneees merged 2 commits into main from fix/v5.2.6-consistency on May 30, 2026

@jkyberneees
Contributor

Six surgical consistency/rendering fixes to the protocol document. No changes to protocol logic — these restore document self-consistency and fix markdown rendering.

Fixes

| # | Section | Fix |
|---|---------|-----|
| 1 | §7.5 | Moved the mandatory same-family constraint out of the code fence. It was rendering as gray monospace with literal `**asterisks**`; it now reads as body prose. |
| 2 | §6.1, §6.6 | Repaired three table rows that were missing their leading `\|` (Adversarial Surface, Documentation Coverage, Documentation generation) and were misrendering. |
| 3 | §6.7 / §7.5 | Scoped the 3-attempt repair cap explicitly as per-PR (a shared budget across all auto-repairable findings), not per-finding. |
| 4 | §3.5 | Added the missing `spec_independence` row to the ρ sub-signal table. The four prior rows summed to 0.25 while the stated cap is 0.30; the fifth +0.05 row closes the gap. (Appendix A, the schema, and §6.5.1 already listed it; the §3.5 table was the lone omission.) |
| 5 | §3.5 | Added a back-reference to the §0.2 unknown-`generator_identity` fallback (family/version → maximum contribution). |
| 6 | §3.8 | Annotated the `pr_size_cap` ceiling as `HumanReviewRequired` per §0.3 inside the `max_severity()` block. |

Version

Bumped 5.2.5 → 5.2.6 across all carriers: protocol frontmatter, JSON certificate sample, inline changelog, README badge, index.html (badge + footer), and the og.svg social card.

Notes

  • Finding 4 was redirected during review: the original analysis targeted Appendix A, but that glossary already listed all five sub-signals. The real defect was the §3.5 table, where fixing it also resolved the 0.25-vs-0.30 arithmetic inconsistency.

🤖 Generated with Claude Code

jkyberneees and others added 2 commits May 30, 2026 17:48
- §7.5: move mandatory same-family constraint out of the code fence so it
  renders as body prose, not gray monospace
- §6.1, §6.6: repair three table rows missing their leading pipe
  (Adversarial Surface, Documentation Coverage, Documentation generation)
- §6.7 / §7.5: scope the 3-attempt repair cap explicitly as per-PR, not
  per-finding
- §3.5: add the missing spec_independence row to the ρ sub-signal table;
  the four prior rows summed to 0.25 while the stated cap is 0.30
- §3.5: cross-reference the §0.2 unknown-generator_identity fallback
- §3.8: annotate pr_size_cap ceiling as HumanReviewRequired per §0.3
- bump version 5.2.5 → 5.2.6 (frontmatter, JSON cert, changelog, README
  badge, landing page)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@jkyberneees
Contributor Author

🔖 Verification Certificate — Protocol self-run

This PR was verified by running the AI Verification Protocol on itself (five-agent pipeline, §12). Every signal below rests on Agent-D oracle checks that were actually executed — all 8 contract clauses passed.

Headline

Verdict: 🔴 HumanReviewRequired. The binding gate is correlation, not defects. Content is clean (η_raw = 0.93, 8/8 oracles green), but this is Claude verifying Claude with no provenance attestation, so the family and version sub-signals of ρ are maxed, ρ totals 0.20, and η drops to 0.73, below the 0.80 auto-approve threshold. The protocol correctly flagged its own self-verification as insufficiently independent.

PR: #2 — v5.2.6: rendering and consistency fixes
Subject SHA: 522a728 (post-merge)   Base: 518c3a0 (origin/main)

Classification:      GeneratedCode  (AI-authored, generator_identity ABSENT → §0.2 fallback)
Pipeline diversity:  B=C=D=E = Claude Opus 4.8 → INSUFFICIENT (monoculture; §0.1 violated)
η:  0.73   (η_raw 0.93 − ρ 0.20)
    signals: o=1.00, s=1.00, t=0.70, d=1.00 ; m/b/f SKIPPED (no exec code — Appendix B)
ρ:  0.20   family +0.10, version +0.05, ast 0, shared_mut 0, spec_independence +0.05
Ci/Cv:  ratio = null (Ci_zero — no gateway cost captured) ; Cv ≈ $3
Verification Gap (module): ~0.00 | stop-ship: false
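
The ρ line in the summary above can be re-derived by summing the five sub-signals. The following is a minimal sketch, not the protocol's own implementation: the function name `total_rho` and the clipping at the 0.30 cap are assumptions for illustration, while the numeric contributions are copied from the certificate.

```python
# Sub-signal contributions copied from the certificate summary.
RHO_CAP = 0.30  # cap stated in the §3.5 fix description

rho_breakdown = {
    "family": 0.10,
    "version": 0.05,
    "ast": 0.0,
    "shared_mutants": 0.0,
    "spec_independence": 0.05,
}


def total_rho(breakdown, cap=RHO_CAP):
    """Sum the sub-signals; clipping at the cap is an assumption."""
    return min(round(sum(breakdown.values()), 2), cap)


rho = total_rho(rho_breakdown)
eta = round(0.93 - rho, 2)  # eta_raw from the certificate minus rho
print(rho, eta)
```

Without the `spec_independence` entry the same sum would be 0.15, which is why the fifth row matters for this run.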

Axes Summary

| Axis | Status | Key Finding |
|------|--------|-------------|
| 2.1 Semantic Correctness | Pass | All 6 fixes + version bump do what the spec claims (C1–C8 oracles pass) |
| 2.2 Behavioral Contract | ⚠️ Warn | Clauses all match, but the spec was authored by the same agent → §3.5 spec-independence flag |
| 2.3 Security Surface | Pass | index.html/og.svg edits are static text; no new sink |
| 2.4 Structural Integrity | Pass | Documentation-only; no coupling/architecture change |
| 2.5 Behavioral Exploration | Pass | Markup renders/parses (fence balanced, tables well-formed); no runtime behavior |
| 2.6 Dependency Integrity | Pass | No dependency changes |
| 2.7 Generator Provenance | ⚠️ Warn | generator_identity absent on a Generated* PR → ρ family/version maxed per §0.2 |
| 2.8 Adversarial Surface | Pass | No untrusted-input sink introduced |
| 2.9 Documentation Coverage | Pass | d=1: change is docs; no public API altered without docs |

η derivation (Agent E re-derived, §3.2)

Three signals are inapplicable to a docs/markup PR (no executable code → no mutants m, branches b, or fuzz surface f), so per Appendix B they're skipped and their weight redistributed:

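| Signal | Value | Orig w | Redistributed w | Contribution |
|--------|-------|--------|-----------------|--------------|
| o (oracle agreement) | 1.00 | 0.24 | 0.558 | 0.558 |
| s (SAST clean) | 1.00 | 0.04 | 0.093 | 0.093 |
| t (static depth) | 0.70¹ | 0.10 | 0.233 | 0.163 |
| d (doc coverage) | 1.00 | 0.05 | 0.116 | 0.116 |
| m, b, f | | 0.57 | skipped | |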

η_raw = 0.930 → η = clamp(0.930 − 0.20, 0, 1) = 0.730
¹ t=0.70: markup is lint-clean but has no type checker — §3.1 dynamic-language cap.
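
The arithmetic in the table can be reproduced in a few lines. This sketch assumes the Appendix B redistribution is a proportional renormalization of the surviving weights, which is consistent with the redistributed values shown but is not confirmed by the PR text; the helper names are illustrative only.

```python
# Original weights and observed values for the four applicable signals.
ORIGINAL_WEIGHTS = {"o": 0.24, "s": 0.04, "t": 0.10, "d": 0.05}
SIGNALS = {"o": 1.00, "s": 1.00, "t": 0.70, "d": 1.00}
RHO = 0.20


def redistribute(weights):
    """Renormalize the surviving weights so they sum to 1 (assumed rule)."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def clamp(x, lo=0.0, hi=1.0):
    return max(lo, min(hi, x))


weights = redistribute(ORIGINAL_WEIGHTS)
eta_raw = sum(weights[name] * SIGNALS[name] for name in SIGNALS)
eta = clamp(round(eta_raw, 3) - RHO)

print({name: round(w, 3) for name, w in weights.items()})
print(f"eta_raw={eta_raw:.3f} eta={eta:.3f}")
```

The surviving weights sum to 0.43, so each is divided by 0.43; the skipped m, b, f signals account for the remaining 0.57.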

Verdict precedence (§3.8) — max_severity

  • η_band(0.73) → HumanReviewRequired (η < 0.80) ← binding gate
  • ρ_band(0.20) → HumanReviewRecommended
  • ΔDebt(0.03h) → AutoApprove · gap → AutoApprove · pr_size(23 LOC) → AutoApprove · no 🔴 axis
  • Result: HumanReviewRequired
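
The precedence rule above amounts to taking the most severe verdict among the individual gates. A minimal sketch follows, assuming only the three verdict levels that appear in this certificate (the protocol may define others) and a hypothetical `Verdict` enum; it is not the §3.8 `max_severity()` block itself.

```python
from enum import IntEnum


class Verdict(IntEnum):
    # Ordering is an assumption: a higher value means a more severe verdict.
    AutoApprove = 0
    HumanReviewRecommended = 1
    HumanReviewRequired = 2


def max_severity(*verdicts):
    """Return the most severe verdict among the gates."""
    return max(verdicts)


# Per-gate verdicts as listed in the certificate.
gates = {
    "eta_band": Verdict.HumanReviewRequired,
    "rho_band": Verdict.HumanReviewRecommended,
    "delta_debt": Verdict.AutoApprove,
    "gap": Verdict.AutoApprove,
    "pr_size": Verdict.AutoApprove,
}

result = max_severity(*gates.values())
binding = [name for name, verdict in gates.items() if verdict == result]
print(result.name, binding)
```

Listing the gates that equal the result makes the binding gate explicit, here the η band.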

Oracle evidence (Agent D — all executed, all pass)

| Clause | Check | Result |
|--------|-------|--------|
| C1 | §7.5 prose outside fence (line 952 > close-fence 950) | Pass |
| C2 | 3 repaired rows begin with `\|` | Pass |
| C3 | "per-PR" present in §6.7 (826) and §7.5 (956) | Pass |
| C4 | §3.5 table = 5 rows, contributions sum to 0.30 | Pass |
| C5 | §3.5 references §0.2 unknown-identity fallback (319) | Pass |
| C6 | §3.8 ceiling annotation present (363) | Pass |
| C7 | zero residual 5.2.5 | Pass |
| C8 | no weight/band/formula value altered | Pass |
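
Checks of the kind listed for C1, C2, and C7 are simple text scans. The sketch below shows one way they might look; the function names and the sample lines are hypothetical and are not taken from the Agent-D oracles actually run on this PR.

```python
FENCE = "`" * 3  # Markdown code-fence marker


def is_inside_fence(lines, index):
    """C1-style check: True if lines[index] sits inside an open code fence."""
    fences_before = sum(1 for line in lines[:index] if line.lstrip().startswith(FENCE))
    return fences_before % 2 == 1


def rows_missing_leading_pipe(table_lines):
    """C2-style check: return non-empty table rows that do not start with '|'."""
    return [line for line in table_lines
            if line.strip() and not line.lstrip().startswith("|")]


def residual_version_hits(files, old_version="5.2.5"):
    """C7-style check: map file name to the count of leftover old-version strings."""
    return {name: text.count(old_version)
            for name, text in files.items() if old_version in text}


# Hypothetical sample inputs, for illustration only.
doc = ["intro", FENCE, "repair_loop()", FENCE, "Mandatory: same-family constraint"]
table = ["| Adversarial Surface | ok |", "Documentation Coverage | ok |"]
carriers = {"README.md": "badge v5.2.6", "index.html": "footer v5.2.5"}

print(is_inside_fence(doc, 4))
print(rows_missing_leading_pipe(table))
print(residual_version_hits(carriers))
```

An empty list from the second function and an empty dict from the third correspond to the passing C2 and C7 results.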

Unverified gaps

  • Pipeline monoculture (high): B/C/D/E are one model family — no independent oracle (§4.5). This is why ρ=0.20.
  • Provenance absent (medium): no generator_identity block (§0.2) → defaulted to GeneratedCode, max correlation suspicion.
  • Ci unmeasured (low): no generator-gateway cost → ratio: null per the §4 zero-Ci guard.
Machine-readable certificate (§5.1 JSON)
{
  "$schema": "https://21no.de/schemas/verification-certificate-v5.json",
  "certificate_id": "cert-522a728f-pr2",
  "created_at": "2026-05-30T17:52:37+02:00",
  "protocol_version": "5.2.6",
  "weights_version": "weights_v5.1.0",
  "partial": false,
  "pr": { "number": 2, "title": "v5.2.6: rendering and consistency fixes",
    "repo": "BackendStack21/ai-verification-protocol",
    "sha": "522a728f361f48dddc86507bf3bf49c8ce850d89",
    "loc_changed": 23, "oversized": false, "superseded_by": "" },
  "classification": "GeneratedCode",
  "spec": { "reference": "PR body + finding-reassessment", "author": "Claude Opus 4.8",
    "independence_flag": true, "contract_status": "ok" },
  "untrusted_input": { "violation_detected": false, "violation_excerpt": "" },
  "generator": { "model": "claude-opus-4-8", "version": "", "provider": "Anthropic",
    "lineage_status": "not_provided", "rerolls": 0 },
  "pipeline": {
    "agent_b": { "model": "claude-opus-4-8", "provider": "Anthropic" },
    "agent_c": { "model": "claude-opus-4-8", "provider": "Anthropic" },
    "agent_d": { "model": "claude-opus-4-8", "provider": "Anthropic" },
    "agent_e": { "model": "claude-opus-4-8", "provider": "Anthropic" },
    "diversity_ok": false,
    "diversity_notes": "Single-family self-verification; §0.1 two-family minimum not met. family/version rho maxed per §0.2." },
  "eta": { "value": 0.730,
    "signals": { "o": 1.0, "s": 1.0, "t": 0.7, "d": 1.0 },
    "signals_skipped": ["m","b","f"],
    "weights": { "o": 0.558, "s": 0.093, "t": 0.233, "d": 0.116 },
    "weights_redistributed": true,
    "rho": 0.20,
    "rho_breakdown": { "family": 0.10, "version": 0.05, "ast": 0.0, "shared_mutants": 0.0, "spec_independence": 0.05 },
    "model_pair": ["claude-opus-4-8","claude-opus-4-8"],
    "rederivation_attested": true },
  "debt": { "module_path": "/", "delta_hours": 0.03,
    "ci_dollars": 0.0, "ci_estimated": true, "ci_source": "token_estimate",
    "cv_dollars": 3.0, "ratio": null, "ratio_note": "Ci_zero",
    "module_class": "Dormant" },
  "verification_gap": { "module": 0.0, "repo": 0.0, "stop_ship": false },
  "remediations": [],
  "unverified_gaps": [
    { "id": "U1", "axis": "0.1", "reason": "single-family pipeline; no independent oracle", "risk": "high" },
    { "id": "U2", "axis": "2.7", "reason": "generator_identity absent", "risk": "medium" },
    { "id": "U3", "axis": "4", "reason": "no gateway Ci captured", "risk": "low" } ],
  "verdict": "HumanReviewRequired",
  "rationale": "Binding gate: eta_band (eta=0.73 < 0.80). eta_raw=0.93 is strong; the penalty is correlation — rho=0.20 from same-family verification (Claude verifying Claude) with absent provenance and a non-independent spec. All 9 axes pass except 2.2/2.7 (warn), driven by the same root cause. Content is correct (8/8 oracles); the verdict reflects pipeline independence, not defects.",
  "attestation": { "signer": "agent-e:claude-opus-4-8", "signature": "UNSIGNED-dogfood-run",
    "co_signatures": [], "transparency_log_entry": "" }
}
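
The `diversity_ok` flag in the certificate could be derived from the `pipeline` block along these lines. This is a sketch under two assumptions: that "family" can be approximated by the `provider` field, and that only agents B through E count toward the §0.1 two-family minimum. The second provider name in the usage example is a placeholder.

```python
def diversity_ok(pipeline, minimum_families=2):
    """True when the pipeline agents span at least `minimum_families` providers."""
    providers = {
        agent["provider"]
        for key, agent in pipeline.items()
        if key.startswith("agent_")  # skip diversity_ok / diversity_notes fields
    }
    return len(providers) >= minimum_families


# Pipeline block as it appears in the certificate.
pipeline = {
    "agent_b": {"model": "claude-opus-4-8", "provider": "Anthropic"},
    "agent_c": {"model": "claude-opus-4-8", "provider": "Anthropic"},
    "agent_d": {"model": "claude-opus-4-8", "provider": "Anthropic"},
    "agent_e": {"model": "claude-opus-4-8", "provider": "Anthropic"},
    "diversity_ok": False,
}

# Hypothetical variant with one agent from a different provider family.
mixed = dict(pipeline, agent_d={"model": "other-model", "provider": "OtherProvider"})

print(diversity_ok(pipeline), diversity_ok(mixed))
```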

Notes

  • The protocol correctly distrusts itself: a defect-free PR still lands at HumanReviewRequired purely because verifier and generator share a family. ρ doing its job.
  • Two fixes from this very PR fired on this very PR: the §3.5 spec_independence sub-signal (5th row, +0.05) and the §0.2 unknown-identity fallback both contributed to ρ.
  • Zero-Ci guard (v5.2.x) prevented a divide-by-zero: the result is ratio: null, not a crash.
  • To clear to AutoApprove: re-run with B/C/D from a different provider family and attach a generator_identity block — drops ρ family/version (+0.15 → ~0) and lifts η back toward 0.93.
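
The zero-Ci guard mentioned in the notes can be illustrated as follows. This is a minimal sketch, assuming the ratio is Cv divided by Ci (inferred from the divide-by-zero arising when Ci is zero); the function name `cost_ratio` is hypothetical, while the `ratio` and `ratio_note` field names follow the certificate.

```python
import json


def cost_ratio(ci_dollars, cv_dollars):
    """Return ratio fields, using null instead of dividing by a zero Ci."""
    if ci_dollars == 0:
        return {"ratio": None, "ratio_note": "Ci_zero"}
    return {"ratio": cv_dollars / ci_dollars, "ratio_note": ""}


# Values from this certificate: no gateway cost captured, Cv of about $3.
print(json.dumps(cost_ratio(0.0, 3.0)))
```

Serializing `None` through `json.dumps` yields `null`, matching the `"ratio": null` entry in the certificate.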

Certificate generated by an unsigned dogfooding run of the protocol against its own PR. signature: UNSIGNED-dogfood-run — not a production-attested certificate.

@jkyberneees jkyberneees merged commit 39aacbe into main May 30, 2026