"Metrics real" — false-completion-rate and repair-productivity graduate from claims to derivations, red-teamed before release.
loop metrics <loop-dir> derives FCR and RP from a loop's real on-disk evidence — RUNLOG claims × verify bundles, held-out verdicts, repair records, receipts — never from agent narration. FCR is computed two independent ways and disagreement is surfaced, not resolved; unmatched success claims fail closed. Every number ships with a provenance block naming its input files.
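As a rough illustration of "computed two ways, disagreement surfaced, unmatched claims fail closed", here is a minimal sketch. It is not the shipped implementation. The `Claim` type, the function names, and the choice of inputs for each method (verify bundles for one, held-out verdicts for the other) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """A success claim taken from the run log (hypothetical shape)."""
    task: str
    iteration: int


def fcr_from_bundles(claims, bundle_green):
    """Method A (assumed): join each claim to its verify bundle.

    bundle_green maps (task, iteration) -> True when that bundle was green.
    A claim with no matching bundle counts as false (fail closed).
    """
    if not claims:
        return None  # vacuous run: there is no rate to report
    false = sum(
        1 for c in claims
        if not bundle_green.get((c.task, c.iteration), False)
    )
    return false / len(claims)


def fcr_from_verdicts(claims, verdict_pass):
    """Method B (assumed): check each claimed task against a held-out verdict."""
    if not claims:
        return None
    false = sum(1 for c in claims if not verdict_pass.get(c.task, False))
    return false / len(claims)


def derive_fcr(claims, bundle_green, verdict_pass):
    """Report both numbers and whether they agree; never merge them."""
    a = fcr_from_bundles(claims, bundle_green)
    b = fcr_from_verdicts(claims, verdict_pass)
    return {"fcr_bundles": a, "fcr_verdicts": b, "agree": a == b}
```

The point of returning both values plus an `agree` flag is that a caller can refuse to publish on disagreement instead of silently picking one.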
loop metrics --baseline publishes docs/metrics-baseline.json and refuses to publish for any run that is not genuinely gate-backed: a structurally valid held-out verdict artifact is mandatory, rejected or unanchored repair records block it, disagreeing FCR methods block it, and a vacuous zero-claim run cannot baseline.
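The refusal conditions can be pictured as a single gate that collects every blocking reason. This is a sketch under stated assumptions: the `report` dictionary and all of its keys are hypothetical names for the example and do not reflect the tool's actual data model.

```python
def baseline_refusals(report):
    """Return the reasons a baseline must not be published.

    An empty list means the run may baseline. Missing keys are treated
    as failing, so an incomplete report is refused rather than accepted.
    All key names here are illustrative assumptions.
    """
    reasons = []
    if not report.get("held_out_verdict_valid", False):
        reasons.append("missing or structurally invalid held-out verdict")
    if report.get("rejected_repairs", 0) > 0:
        reasons.append("rejected or unanchored repair records")
    if not report.get("fcr_methods_agree", False):
        reasons.append("FCR methods disagree")
    if report.get("claims", 0) == 0:
        reasons.append("vacuous run: zero claims")
    return reasons


good = {
    "held_out_verdict_valid": True,
    "rejected_repairs": 0,
    "fcr_methods_agree": True,
    "claims": 4,
}
print(baseline_refusals(good))  # no blocking reasons
print(baseline_refusals({}))    # an empty report is refused
```

Collecting all reasons, instead of stopping at the first, makes a refused baseline easier to diagnose in one pass.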
Published baseline (gate-backed examples/coverage-repair): FCR 0.0 · RP 1.0 — reproducible with python3 -m loop metrics examples/coverage-repair; a test binds the README literals to the committed JSON.
Honesty invariants:
- productive is recomputed from each record's own evidence, never trusted; repair records must anchor to a same-task red→green verify-bundle pair.
- Claim semantics are outcome-class aware: completion-class claims (task_passed/succeeded/terminal) require every attached bundle green, no exceptions; progress-class (advanced) tolerates a red intermediate only if the same task goes green in a strictly later iteration.
- Canonical record schemas (loop-engineer/repair@1, loop-engineer/rollout@1) end the two-shapes-called-"the repair record" ambiguity.
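The first two invariants can be sketched as follows. This is an illustration, not the project's code: the function names, argument shapes, and the reading of task_passed, succeeded, and terminal as three separate outcome names are all assumptions.

```python
COMPLETION = {"task_passed", "succeeded", "terminal"}  # assumed outcome names
PROGRESS = {"advanced"}


def recompute_productive(record, bundles):
    """Recompute 'productive' from evidence, ignoring the record's own flag.

    bundles: list of (task, iteration, is_green) tuples (hypothetical shape).
    True only if the record's task has a red bundle followed by a green one.
    """
    task = record["task"]
    reds = [it for t, it, green in bundles if t == task and not green]
    greens = [it for t, it, green in bundles if t == task and green]
    return any(r < g for r in reds for g in greens)


def claim_is_backed(outcome, task, iteration, attached_green, green_iterations):
    """Decide whether a claim is supported by its verify bundles.

    attached_green: one bool per bundle attached to the claim.
    green_iterations: task -> set of iterations where that task was green.
    """
    if outcome not in COMPLETION | PROGRESS:
        return False  # unknown outcome class: fail closed
    if not attached_green:
        return False  # unmatched claim: fail closed
    if all(attached_green):
        return True
    if outcome in COMPLETION:
        return False  # completion claims need every bundle green
    # Progress claim with a red bundle: the same task must go green
    # in a strictly later iteration.
    return any(i > iteration for i in green_iterations.get(task, ()))
```

Note that `recompute_productive` never reads a `productive` field from the record, which is the sense in which the value is "never trusted".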
Red-teamed before merge: two adversarial review rounds confirmed 17 issues in the metrics implementation — including a --baseline that would have published a clean FCR over a run its own gate flagged — all fixed and pinned as regression tests. Documented residual: a committed verdict is evidence, not proof; tamper detection belongs to the anti-cheat layer.
Also: loop console script (pip install -e .), doctor reports validated record schemas, 217-test suite (+67).
Full details in CHANGELOG.md.