fix(credibility): M1 slice — validator cross-checks, invocation-graded inspect, gate hardening, honest schemas + docs by SollanSystems · Pull Request #8 · SollanSystems/loop-engineer

SollanSystems · 2026-07-02T17:00:25Z

M1 credibility slice of the v1.0 launch cut-line: make every "the loop proves its work" claim mechanically true, and stop the toolkit from accepting self-asserted success anywhere.

What changed

Validator (G1 + SCHEMAS) — loop/contract.py

A Succeeded terminal now requires false_completion: false AND at least one true criteria_met entry; contradictory terminals emit a doctor issue.
Real JSON-Schema validation against schemas/*.json via the new optional [schemas] extra (pip install -e ".[schemas]"); the report now carries an honest validation_mode field (jsonschema vs structural-fallback) instead of implying schema validation that wasn't running. A field-agreement test pins every schema-required field to actual enforcement.
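A minimal sketch of the honest-mode idea, assuming the schema is already loaded as a dict. validation_mode() and validate_artifact() are hypothetical names, not the real loop/contract.py API. The jsonschema branch uses the package's validator_for() and iter_errors(); the fallback only checks top-level required keys, standing in for the stdlib hand checks.

```python
from __future__ import annotations

import importlib.util
from typing import Any


def validation_mode() -> str:
    """Report which validator will actually run, never implying more."""
    if importlib.util.find_spec("jsonschema") is not None:
        return "jsonschema"
    return "structural-fallback"


def validate_artifact(doc: dict[str, Any], schema: dict[str, Any]) -> dict[str, Any]:
    """Validate one contract artifact and state honestly how it was checked."""
    mode = validation_mode()
    if mode == "jsonschema":
        import jsonschema

        # Pick the draft named by the schema's $schema (latest draft if absent).
        validator = jsonschema.validators.validator_for(schema)(schema)
        issues = [error.message for error in validator.iter_errors(doc)]
    else:
        # Stand-in for the structural checks: required top-level keys only.
        issues = [
            f"missing required field: {key}"
            for key in schema.get("required", [])
            if key not in doc
        ]
    return {"validation_mode": mode, "ok": not issues, "issues": issues}
```

Because the mode travels in the report, a reader can tell a real schema pass from a structural one instead of inferring it from a generic "schemas_checked" flag.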

Inspector (INSPECT) — scripts/inspect_loop.py

False-completion-defense credit is now graded on invocation evidence (a verify-script line or recorded RUNLOG/receipt run), not claims: invoked (full) / wired (half) / none (zero). A bare false_completion: false flag or prose mention earns nothing.
Consequence honestly documented: the flagship example's inspect score drops 90→76 ("strong"→"ok") until M2 wires a real gate invocation into it.

Gates (G2 + G3)

holdout_gate: an empty visible set now returns NotReady (symmetric with empty holdout) — nothing was optimized against, so nothing can be certified.
anticheat_scan: any non-cosmetic self-edit to the scanner's own source is flagged high for human review, closing the one-line return False self-neuter hole (conservative diff-layer invariant; documented rationale for not special-casing docstrings).
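The holdout fix hinges on Python's all([]) being True. A minimal sketch of the symmetric guards, assuming a simple pass/fail check record; Check and the "Failed" label are hypothetical, while decide, _all_passed, NotReady and Succeeded are the names used in this PR.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    """Hypothetical shape of one gate check result."""

    name: str
    passed: bool


def _all_passed(checks: list[Check]) -> bool:
    # all([]) is True, which is why empty sets need explicit guards.
    return all(check.passed for check in checks)


def decide(visible: list[Check], holdout: list[Check]) -> str:
    if not holdout:
        return "NotReady"  # nothing held out: no independent confirmation
    if not visible:
        return "NotReady"  # nothing optimized against: nothing to certify
    if _all_passed(visible) and _all_passed(holdout):
        return "Succeeded"
    return "Failed"  # hypothetical label for the non-certifying outcome
```

With the second guard, decide([], [green_holdout]) can no longer reach the Succeeded branch through a vacuous _all_passed([]).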

Scaffold + templates (TEMPLATES + SCAFFOLD)

New deterministic python3 -m loop scaffold <dir>: renders templates/ with valid defaults; output passes doctor unedited (pinned test, both dependency modes).
Templates aligned with the validator's real shape (schema key, real terminal shape); scaffold task status emits the schema-valid pending.
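A sketch of deterministic placeholder rendering, assuming {{PLACEHOLDER}}-style tokens as described in the scaffold commit. The placeholder names and the default_values() helper are hypothetical; the point is that every token gets an honest, schema-valid default (task status "pending", goal "REPLACE: one-line goal", project name from the target dir) and an unknown token fails loudly instead of surviving into the output.

```python
from __future__ import annotations

import re
from pathlib import Path

_PLACEHOLDER = re.compile(r"\{\{\s*([A-Z_]+)\s*\}\}")


def default_values(target: Path) -> dict[str, str]:
    """Honest, schema-valid defaults (placeholder names are hypothetical)."""
    return {
        "PROJECT_NAME": target.name,
        "GOAL": "REPLACE: one-line goal",
        "TASK_STATUS": "pending",  # must be in the tasks schema enum
    }


def render(template: str, values: dict[str, str]) -> str:
    """Fill every {{PLACEHOLDER}}; an unknown one is an error, never left raw."""

    def substitute(match: re.Match[str]) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no default for placeholder {key}")
        return values[key]

    return _PLACEHOLDER.sub(substitute, template)
```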

Docs honesty (RECEIPTS)

examples/coverage-repair and CHANGELOG no longer assert a receipts trail the frozen example doesn't ship; corrected via a CHANGELOG Errata section (history intact) and guarded by a new test.

Proof (archived in the launch loop workbench)

Mechanical fail-before/pass-after: pre-M1 tree + only the six test files → 7 FAILED; this branch → 7 passed. Every pinned test exercises new behavior, none assert the status quo.
Full suite: 118 passed + 2 skipped (stdlib-only) / 120 passed (with jsonschema).
verify-full: PASS — plugin validate --strict, doctor on the example and the dogfood .loop, self_eval 13/13, frontmatter 9/9.

One integration repair was needed after merging the six independently-built clusters (scaffold status enum vs the now-enforced schema enum): 63c993d.

An empty visible set made _all_passed([]) vacuously True, so decide([], [green_holdout]) wrongly certified Succeeded. Add a symmetric guard mirroring the empty-holdout case: no visible checks means nothing was optimized against, so the gate is NotReady. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

examples/coverage-repair ships no receipts trail, but WORKFLOW.md, README.md, and the CHANGELOG 0.3.4 note asserted it "records receipts at .loop/receipts/*.jsonl". Soften the two example docs to describe the mechanism ("a live run appends receipts to ...; this frozen example ships the contract artifacts, not a receipts trail") and add a dated ## Errata to CHANGELOG correcting the claim without rewriting 0.3.4 history. Add scripts/test_docs_claims.py: a behavioral guard that flags any present-tense "records receipts"/"receipts land" assertion adjacent to the .loop/receipts glob and requires the referenced example to actually ship receipt files (changelog history exonerated by a receipts Errata). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
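A minimal sketch of the docs-claims guard, assuming a one-line adjacency window; unbacked_receipt_claims() is a hypothetical name, and this sketch leaves out the CHANGELOG Errata exoneration that scripts/test_docs_claims.py applies.

```python
from __future__ import annotations

import re
from pathlib import Path

_CLAIM = re.compile(r"\b(records receipts|receipts land)\b", re.IGNORECASE)
_GLOB = ".loop/receipts"


def unbacked_receipt_claims(doc_text: str, example_dir: Path, window: int = 1) -> list[int]:
    """Return 1-based line numbers asserting receipts the example doesn't ship."""
    receipts = example_dir / ".loop" / "receipts"
    if receipts.is_dir() and any(receipts.glob("*.jsonl")):
        return []  # the claim is backed by real receipt files
    lines = doc_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if not _CLAIM.search(line):
            continue
        # Only a present-tense claim adjacent to the receipts path counts.
        nearby = lines[max(0, i - window) : i + window + 1]
        if any(_GLOB in text for text in nearby):
            hits.append(i + 1)
    return hits
```

The softened wording ("a live run appends receipts to ...") describes the mechanism without asserting a shipped trail, so it passes the guard.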

The collection-shape and severity-mapping self-checks only catch two known shapes; a diff inserting `return False` into _is_gate_path's body rewrote the decision logic itself and certified clean:true. Add a diff-layer invariant: any hunk touching anticheat_scan.py that adds or removes a non-comment, non-blank line is a scanner_self_edit finding (high -> FailedUnverifiable). Cosmetic-only edits stay clean. Docstrings are deliberately not exempted — a triple-quote-state heuristic would itself be a bypass vector, and a false-positive on maintenance is correct: scanner maintenance should get human eyes. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
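A sketch of the diff-layer invariant, modeled on the `target = m.group("b") or pending_minus` logic quoted in review. The function name and finding dict are hypothetical; scanner_self_edit and the high severity come from this PR. Any added or removed line in the scanner's own hunks that is neither blank nor a comment is flagged, and docstrings are deliberately not special-cased.

```python
from __future__ import annotations

import re

_PLUS = re.compile(r"^\+\+\+ (?:b/(?P<b>\S+)|/dev/null)")
_MINUS = re.compile(r"^--- (?:a/(?P<a>\S+)|/dev/null)")


def scanner_self_edit_findings(diff: str, scanner: str = "anticheat_scan.py") -> list[dict[str, str]]:
    findings: list[dict[str, str]] = []
    pending_minus = None
    in_self = False
    for line in diff.splitlines():
        if m := _MINUS.match(line):
            pending_minus = m.group("a")
            continue
        if m := _PLUS.match(line):
            # A deleted file has "+++ /dev/null", so fall back to the "---" path.
            target = m.group("b") or pending_minus
            in_self = bool(target) and target.rsplit("/", 1)[-1] == scanner
            continue
        if not in_self or not line.startswith(("+", "-")):
            continue  # other files, context lines, hunk headers
        body = line[1:].strip()
        if body and not body.startswith("#"):
            findings.append({"kind": "scanner_self_edit", "severity": "high", "line": line})
    return findings
```

This keys off file paths in hunk headers only, so it inherits the rename gap raised in review below.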

The false-completion-defense check credited a self-asserted `false_completion: false` terminal flag, a `verifier_gaming` manifest key, prose mentions, or an unreferenced gate script file: all claims a loop makes about itself, not evidence. A temp dir with only `.loop/terminal_state.json={"false_completion":false}` earned full defense credit. Grade the credit instead:

- invoked (full): a scripts/verify-* gate invokes a holdout/anti-cheat gate on an executable line, or RUNLOG/.loop/receipts records a run.
- wired (half): a gate script exists and is referenced from the verify surface but no run is recorded.
- none (zero): a bare terminal flag, prose, or unreferenced script.

examples/coverage-repair loses its self-asserted credit (90/strong → 76/ok) and gains an honest, actionable gap; README snippet updated to the real output. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
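The three-tier grade reduces to a small decision over observed evidence. A sketch assuming the inspector has already summarized what it found on disk; DefenseEvidence, grade_defense and GRADE_CREDIT are hypothetical names, with the credit fractions taken from the full / half / zero tiers described here.

```python
from __future__ import annotations

from dataclasses import dataclass

# Credit fractions for the full / half / zero tiers.
GRADE_CREDIT = {"invoked": 1.0, "wired": 0.5, "none": 0.0}


@dataclass(frozen=True)
class DefenseEvidence:
    """Hypothetical summary of what the inspector observed on disk."""

    verify_invokes_gate: bool  # an executable verify-* line runs a gate
    run_recorded: bool  # RUNLOG / .loop/receipts shows a gate run
    gate_referenced: bool  # gate script referenced from the verify surface


def grade_defense(ev: DefenseEvidence) -> str:
    # A bare false_completion flag or prose mention is deliberately not an input.
    if ev.verify_invokes_gate or ev.run_recorded:
        return "invoked"
    if ev.gate_referenced:
        return "wired"
    return "none"
```

Keeping self-asserted signals out of the input type makes it structurally impossible for them to earn credit.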

state.json.tmpl and terminal_state.json.tmpl emitted schema_version: "1.0" while contract.py checks the schema key against loop-engineer/state@1 and loop-engineer/terminal@1; terminal_state.json.tmpl was a wholesale-obsolete shape missing criteria_met/false_completion/evidence/state. Rewrite both to the validator's real field names. Replace the STUB-marked verify scripts with real, dependency-free minimal gates so a fresh scaffold passes the product's own doctor. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Add python -m loop scaffold <dir>: copies templates/ into the standard repo-OS layout with every {{PLACEHOLDER}} filled by an honest, valid default (goal REPLACE: one-line goal, empty-but-valid structures, project name from the target dir), resolving templates/ relative to the package root so it works from an editable install. It never writes terminal_state.json (written once at loop end) and refuses to overwrite an existing contract dir. Make validate_contract treat a missing terminal_state.json as valid-in-flight when state.json declares terminal_state: null, so a fresh scaffold passes doctor unedited; a state that names a terminal with the file missing still flags. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
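The in-flight rule can be expressed as a pure check over the parsed state.json and whether the terminal file exists. A sketch with a hypothetical function name; treating an absent terminal_state key as not in flight (only an explicit null counts) is an assumption of this sketch.

```python
from __future__ import annotations

from typing import Any


def terminal_presence_issues(state: dict[str, Any], terminal_exists: bool) -> list[str]:
    """A missing terminal_state.json is valid only while the loop is in flight."""
    if terminal_exists:
        return []  # present: its contents are validated separately
    if "terminal_state" in state and state["terminal_state"] is None:
        return []  # in flight: the terminal is written once, at loop end
    declared = state.get("terminal_state", "<undeclared>")
    return [f"state.json names terminal {declared!r} but terminal_state.json is missing"]
```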

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

A Succeeded terminal is the loop's strongest claim, but _validate_terminal accepted one with false_completion=true or with no met criterion. Add a cross-field check (runs in both validation modes) that emits a contradictory_terminal issue naming exactly what contradicts. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
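A sketch of the cross-field check, assuming criteria_met is a mapping from criterion to a boolean (the real shape may differ). contradictory_terminal() here is a hypothetical stand-in for the logic inside _validate_terminal; it reports each contradiction separately so the issue names exactly what is wrong. A missing false_completion is treated as not satisfying "false_completion: false".

```python
from __future__ import annotations

from typing import Any


def contradictory_terminal(terminal: dict[str, Any]) -> list[str]:
    """Name exactly what contradicts a Succeeded terminal (empty if nothing)."""
    if terminal.get("state") != "Succeeded":
        return []
    issues = []
    if terminal.get("false_completion") is not False:
        issues.append("contradictory_terminal: Succeeded requires false_completion: false")
    criteria = terminal.get("criteria_met") or {}
    if not any(met is True for met in criteria.values()):
        issues.append("contradictory_terminal: Succeeded requires at least one met criterion")
    return issues
```

JSON Schema cannot express "if state is Succeeded then some value in criteria_met is true" cleanly, which is why this runs in both validation modes.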

…e (M1-SCHEMAS) schemas/*.json were never loaded; doctor emitted schemas_checked implying validation that never happened. Now: when jsonschema is importable, validate the manifest/state/tasks/terminal artifacts against schemas/*.json (resolved relative to the package repo root); otherwise fall back to the stdlib structural hand checks. The report gains validation_mode ("jsonschema" | "structural-fallback") stating what actually ran. Reconcile the schema files with the real shipped contracts (examples/coverage-repair, roadmap/v1.0) so both pass in BOTH modes -- narrowing over-required fields to optional and widening a few types, each documented in the schema description. Cross-field rules JSON Schema cannot express (terminal contradiction, task id uniqueness, evidence-before-done) run in both modes. Add a schemas extra (jsonschema) to pyproject and jsonschema to the CI install so CI exercises the real-validation path. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
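A sketch of the cross-field task rules named here (id uniqueness, evidence-before-done), plus the status enum quoted from tasks.schema.json. task_issues() is a hypothetical name; checking the enum by hand is only needed on the structural-fallback path, since jsonschema enforces it natively.

```python
from __future__ import annotations

from collections import Counter
from typing import Any

TASK_STATUSES = {"pending", "active", "blocked", "done", "abandoned"}


def task_issues(tasks: list[dict[str, Any]]) -> list[str]:
    issues = []
    # Uniqueness spans the whole list, so JSON Schema alone cannot express it.
    counts = Counter(task.get("id") for task in tasks)
    for task_id, n in sorted(counts.items(), key=lambda kv: str(kv[0])):
        if n > 1:
            issues.append(f"duplicate task id {task_id!r}")
    for task in tasks:
        status = task.get("status")
        if status not in TASK_STATUSES:
            issues.append(f"task {task.get('id')!r}: status {status!r} not in enum")
        # Evidence may be a string or an array; empty either way means none.
        if status == "done" and not task.get("evidence"):
            issues.append(f"task {task.get('id')!r}: done without evidence")
    return issues
```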

…e root .loop to validator shape Scaffold rendered TASKS.json task status "todo", but tasks.schema.json's enum is [pending, active, blocked, done, abandoned] — enforced via the jsonschema validation path in loop/contract.py. Emit "pending" instead so a fresh scaffold passes doctor unedited both with and without jsonschema. Also migrate the repo-root .loop/ v0.3 dogfood contract in place to the validator's shapes (schema fields on state/tasks/terminal; terminal now carries state/criteria_met/evidence/false_completion; tasks map to done + real evidence) so `python -m loop doctor .loop` exits ok:true in both validation modes. (.loop/ is gitignored run telemetry, so the migration lives in the working tree.) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 63c993deae


chatgpt-codex-connector · 2026-07-02T17:04:17Z

+            target = m.group("b") or pending_minus
+            in_self = bool(target) and _basename(target) == "anticheat_scan.py"


Flag scanner renames as self-edits

When a diff only renames scripts/anticheat_scan.py (for example to scripts/_disabled_scan.py), this logic keys self-edit detection solely off the +++ b/... path and then sees no added/removed hunk lines, while parse_changed_files() still excludes the old scanner basename from gate-tampering via _SELF_FILES. That means a pure scanner rename returns clean: true and can disable the anti-cheat gate without human review; please treat rename-from/rename-to metadata involving the scanner as a scanner_self_edit.
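One way to act on this suggestion, as a sketch: when git's rename detection is on, a pure rename produces `rename from <path>` / `rename to <path>` extended header lines and no hunks, so scanning those headers catches it. The function name is hypothetical, and this complements rather than replaces the hunk-based check.

```python
from __future__ import annotations

_RENAME_PREFIXES = ("rename from ", "rename to ")


def scanner_rename_findings(diff: str, scanner: str = "anticheat_scan.py") -> list[dict[str, str]]:
    """Flag any rename whose source or destination is the scanner itself."""
    findings: list[dict[str, str]] = []
    for line in diff.splitlines():
        for prefix in _RENAME_PREFIXES:
            if line.startswith(prefix):
                path = line[len(prefix) :].strip()
                if path.rsplit("/", 1)[-1] == scanner:
                    findings.append({"kind": "scanner_self_edit", "severity": "high", "reason": line})
    return findings
```

Matching both directions also catches the inverse trick of renaming a neutered file onto the scanner's path.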


chatgpt-codex-connector · 2026-07-02T17:04:17Z

+            stripped = line.strip()
+            if not stripped or stripped.startswith("#"):
+                continue
+            if any(token in stripped for token in _GATE_TOKENS):


Require real gate commands before invoked credit

When a verify-* script merely prints or documents a gate name on a non-comment line, such as echo "TODO: run holdout_gate.py", this substring check awards full false-completion defense (invoked) credit and can raise the inspector verdict to strong even though no holdout/anti-cheat gate ran. Since the new scoring is supposed to be based on invocation evidence rather than claims, please distinguish actual command invocations from echo/assignment/prose lines before granting full credit.
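A sketch of distinguishing invocations from mentions, assuming simple one-command shell lines. The gate token list and the sets of non-invoking commands and interpreters are hypothetical. It tokenizes with shlex, skips leading VAR=value assignments, refuses echo/printf-style commands, and then requires the gate name in the command (or the interpreter's first non-option argument), not anywhere in the line. Compound lines such as `cd x && python3 ...` are not handled and would simply earn no credit.

```python
from __future__ import annotations

import shlex

GATE_TOKENS = ("holdout_gate", "anticheat_scan")  # hypothetical token list
_NON_INVOKING = {"echo", "printf", ":", "true"}
_INTERPRETERS = {"python", "python3", "bash", "sh"}


def invokes_gate(line: str) -> bool:
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return False
    try:
        words = shlex.split(stripped, comments=True)
    except ValueError:
        return False  # unbalanced quotes: don't grant credit
    while words and "=" in words[0] and not words[0].startswith("-"):
        words.pop(0)  # leading VAR=value assignments are not commands
    if not words or words[0] in _NON_INVOKING:
        return False
    target = words[0]
    if target.rsplit("/", 1)[-1] in _INTERPRETERS:
        args = [w for w in words[1:] if not w.startswith("-")]
        if not args:
            return False
        target = args[0]  # script path, or module name after -m
    return any(token in target for token in GATE_TOKENS)
```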


Copilot

Pull request overview

This PR hardens the “credibility slice” of the loop contract tooling by making success/verification claims mechanically enforceable: it tightens validator semantics (including optional real JSON-Schema validation), makes the inspector grade false-completion defense based on invocation evidence, hardens holdout/anti-cheat gates, adds a deterministic scaffold command, and corrects docs to avoid overstating shipped receipts.

Changes:

Add optional jsonschema-backed validation with an explicit validation_mode, plus stronger cross-field/cross-task enforcement in the core contract validator.
Update the inspector and gates to prevent self-asserted success/defense signals from receiving credit without evidence.
Introduce a deterministic python -m loop scaffold <dir> path with templates aligned to the enforced contract shape, and add tests guarding docs honesty.

Reviewed changes

Copilot reviewed 26 out of 27 changed files in this pull request and generated 4 comments.


File	Description
templates/verify-full.sh	Adjust verify-full template to compose verify-fast and remove stub markers.
templates/verify-fast.sh	Add a real fast check (contract files present) and remove stub markers.
templates/terminal_state.json.tmpl	Update terminal template to the new schema/key shape.
templates/state.json.tmpl	Update state template to use `schema` key (v1 shape).
scripts/test_scaffold.py	Add scaffold regression tests (doctor-clean output, layout, CLI).
scripts/test_loop_contract_core.py	Add tests for `validation_mode`, jsonschema enforcement, and terminal contradiction rules.
scripts/test_inspect_loop.py	Add tests for invocation-graded false-completion defense scoring.
scripts/test_holdout_gate.py	Add test for empty visible set returning `NotReady`.
scripts/test_docs_claims.py	Add guard test preventing docs from claiming shipped receipts that don’t exist.
scripts/test_anticheat_scan.py	Expand tests to require scanner self-edits be flagged for human review (non-cosmetic).
scripts/inspect_loop.py	Implement invocation/wiring/none grading for false-completion defense; update scoring output.
scripts/holdout_gate.py	Make empty visible set return `NotReady` (cannot certify).
scripts/anticheat_scan.py	Add scanner self-edit detection for non-cosmetic edits to the scanner source.
schemas/terminal.schema.json	Reconcile terminal schema required fields with real contracts; clarify description.
schemas/tasks.schema.json	Broaden task evidence type to allow arrays; clarify description.
schemas/state.schema.json	Narrow required fields and broaden types to match shipped contracts; clarify description.
schemas/manifest.schema.json	Broaden permissions item type; clarify description.
README.md	Document `validation_mode` and updated inspect scoring for the example.
pyproject.toml	Add `[schemas]` extra (jsonschema) and update optional-deps documentation comments.
loop/scaffold.py	Add scaffold implementation (template rendering + verify script installation).
loop/contract.py	Add contradiction checks, jsonschema validation mode, schema loading, and in-flight terminal handling.
loop/main.py	Add `scaffold` CLI subcommand and update usage.
examples/coverage-repair/WORKFLOW.md	Reword receipts language to mechanism description (no shipped receipts claim).
examples/coverage-repair/README.md	Reword receipts language to mechanism description (no shipped receipts claim).
CHANGELOG.md	Add Errata entry correcting prior receipts claim.
.gitignore	Ignore review/ and roadmap/ workbench directories.
.github/workflows/ci.yml	Install jsonschema in CI to exercise jsonschema-mode tests.


+def _gate_run_recorded(paths) -> bool:
+    """RUNLOG.md / .loop/receipts/*.jsonl record an actual gate run."""
+    texts = [_read_text(paths.runlog)]
+    receipts = paths.loop_dir / "receipts"
+    if receipts.is_dir():
+        texts.extend(_read_text(p) for p in sorted(receipts.glob("*.jsonl")))
+    for text in texts:
+        for line in text.splitlines():
+            low = line.lower()
+            if any(token in low for token in _GATE_TOKENS):
+                return True
+            if ("holdout" in low or "anticheat" in low or "anti-cheat" in low) and any(
+                word in low for word in _GATE_RUN_WORDS
+            ):
+                return True
+    return False


+def _validation_mode() -> str:
+    try:
+        import jsonschema  # type: ignore  # noqa: F401
+    except Exception:
+        return "structural-fallback"
+    return "jsonschema"
+


+def scaffold(target: str | Path) -> dict[str, Any]:
+    """Write a fresh, doctor-clean repo-OS contract into ``target``.
+
+    Refuses to overwrite an existing contract dir (a live loop owns its state).
+    """
+
+    target = Path(target)
+    if target.exists() and _has_existing_contract(target):
+        raise FileExistsError(f"contract already exists at {target}")


+# The core is pure-stdlib. Two optional extras enrich validation when present:
+#   yaml     — PyYAML parses the manifest; absent, loop/contract.py falls back to
+#              a stdlib subset parser.
+#   schemas  — jsonschema runs real JSON-Schema validation against schemas/*.json;
+#              absent, loop/contract.py falls back to structural hand checks.
+# So `pip install -e .` pulls in zero third-party runtime dependencies.
 [project.optional-dependencies]
 yaml = ["pyyaml>=6"]
+schemas = ["jsonschema>=4"]



SollanSystems and others added 15 commits July 1, 2026 21:20

chore(gitignore): ignore review/ and roadmap/ workbench dirs

d249e99

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Merge branch 'worktree-wf_5c76d0de-cc8-2' into launch/m1-credibility

6a42736

Merge branch 'worktree-wf_5c76d0de-cc8-3' into launch/m1-credibility

35f3746

Merge branch 'worktree-wf_5c76d0de-cc8-4' into launch/m1-credibility

ca58079

Merge branch 'worktree-wf_5c76d0de-cc8-5' into launch/m1-credibility

c4556e2

Merge branch 'worktree-wf_5c76d0de-cc8-6' into launch/m1-credibility

f3fc7c8

Copilot AI review requested due to automatic review settings July 2, 2026 17:00

Copilot started reviewing on behalf of SollanSystems July 2, 2026 17:00

chatgpt-codex-connector Bot reviewed Jul 2, 2026

Copilot AI reviewed Jul 2, 2026

docs(roadmap): publish the v1.0 roadmap (rides M1 per M0-HYGIENE appr…

e30fbc5

…oval) Provenance note added: review/ and roadmap/ workbenches stay untracked; pointers are maintainer-facing.

SollanSystems merged commit 70408d6 into main Jul 2, 2026
4 checks passed

SollanSystems deleted the launch/m1-credibility branch July 2, 2026 18:57

SollanSystems merged 16 commits into main from launch/m1-credibility
