Skip to content

chore(verification): atom-level Lean 4 spike with proofs + bridge + CI#164

Merged
stevenobiajulu merged 12 commits into
mainfrom
spike/lean4-lcs-verification
May 11, 2026
Merged

chore(verification): atom-level Lean 4 spike with proofs + bridge + CI#164
stevenobiajulu merged 12 commits into
mainfrom
spike/lean4-lcs-verification

Conversation

@stevenobiajulu
Copy link
Copy Markdown
Member

@stevenobiajulu stevenobiajulu commented May 6, 2026

Summary

Six-stage Lean 4 verification spike against the safe-docx comparison engine.
The spike answers the time-boxed question "is Lean 4 + mathlib + AI-assisted
proof work tractable here?" with concrete artifacts: 5 closed proofs, 2
intentionally sorry'd specification targets, a fast-check empirical
falsifiability bridge, and a CI gate that re-checks the proofs on every PR
touching verification/lean/**.

Scope is deliberately narrow per the time-box and prior peer-review
corrections: the atom-level LCS subroutine and a sorry'd specification
surface, not the full comparison engine. See verification/ROADMAP.md for
the full Tier 1 → Tier 3+ trajectory.

What's proved (closed, zero-sorry, machine-checked)

verification/lean/LeanSpike/:

  • INV-ATOMSEQ-001 (AtomsEqual.lean) — hash-collision safety of atomsEqual.
  • INV-LCS-001 (Lcs.lean) — value-level subsequence soundness of computeAtomLcs.
  • INV-LCS-002 — optimality of returned matches in the Lean model.
  • INV-LCS-003 — strict index monotonicity.
  • INV-LCS-004 — partition completeness (matched + deleted partition original; matched + inserted partition revised).

What's stated but not proved (sorry'd specification targets)

verification/lean/LeanSpike/Spec.lean over an uninterpreted axiom signature:

  • INV-FIELD-001 (sorry'd) — field-structure preservation across accept/reject of inplace comparison output.
  • INV-RT-001 (sorry'd) — paired round-trip text equality under normalization.

Closing these is Tier 2 work; see ROADMAP.

What's empirically validated (not proved, but tested on every PR)

packages/docx-core/src/integration/lean-spec-bridge.test.ts — fast-check
property tests at 100 runs/property × 2 properties against the live TS
engine, gated on reconstructionModeUsed === 'inplace'. Surfaces
falsifications of INV-FIELD-001 or INV-RT-001 on randomly-generated synthetic
inputs. Local: 0 falsifications, ~3.6s total per full test:run.

Stages

Stage Scope Commit at HEAD
1 Atom projection + atomsEqual + INV-ATOMSEQ-001 shipped
2 Lean model of computeAtomLcs + Wagner-Fischer brute-force validation (1.19M cases) shipped
3 INV-LCS-001/002/003/004 closed proofs shipped
4 Sorry'd Spec.lean with INV-FIELD-001 + INV-RT-001 over uninterpreted axiom signature shipped
5 fast-check bridge wired to live TS engine via buildSyntheticDocx shipped
6 lean-build GitHub Actions workflow with mathlib cache + sorry audits shipped

Each stage was peer-reviewed by Codex and (where useful) Gemini in parallel
before merging into the spike branch. See PR comments for round-by-round
review summaries.

CI

New workflow .github/workflows/lean-build.yml:

  • Triggers on PR + push to main paths-filtered to verification/lean/** + the workflow file itself
  • Caches ~/.elan + verification/lean/.lake keyed on lean-toolchain + lake-manifest.json
  • Runs lake exe cache get (best-effort) → lake build (authoritative)
  • Audits Stage 1-3 modules stay zero-sorry; Spec.lean stays at exactly 2 sorries
  • Empirical first run (cold cache): 5m14s end-to-end

Workflow is not in branch protection yet. After merge, the maintainer
can flip it to required via Settings → Branches → main → Require status checks.

What this PR does NOT cover

Per the ROADMAP, the following remain open:

Estimated remaining effort (with wide error bars): 2.5–5 years of
intermittent agent-driven work to reach Tier 3 complete.

Caveats (already documented in verification/lean/README.md)

  • Proofs are properties of the Lean model, not the TS code. Extensional equivalence Lean LCS ↔ TS Wagner-Fischer is validated empirically (1.19M brute-force cases on sequences ≤ 6 over a 3-symbol alphabet, zero divergence) but not formally proven.
  • atomsEqual_implies_eq holds only because Atom is projected to 3 fields; would break under a broader projection toward the real ComparisonUnitAtom.
  • The fast-check bridge restricts to field-free synthetic input and independent random pairs. Field-bearing coverage lives in collapsed-field-inplace.test.ts; small-edit regime lives in round-trip-inplace.test.ts and nvca-coi-regression.test.ts.

Verification

  • cd verification/lean && lake build (zero sorry in Stage 1-3 modules; exactly 2 in Spec.lean)
  • npm run build clean
  • npm run lint:workspaces clean
  • npm run test:run — 1231 pass + 1 skip across docx-core
  • npm run check:spec-coverage pass
  • Stage 6 lean-build workflow green on actual GitHub Actions runner (5m14s cold cache)
  • All other CI green on this PR

Follow-on (deferred to a separate issue)

Extend the fast-check bridge's arbitraries to exercise the new
tracked-changes-emitting primitives (recent PRs #137/#138/#139/#140 added
pPrChange/trPrChange/tcPrChange and ins/del markup emitters in
footnote / comment / paragraph-insert primitives). The bridge currently
builds inputs via buildSyntheticDocx which emits plain XML; using the
new primitives as input sources would meaningfully widen INV-FIELD-001 /
INV-RT-001 coverage. Tracked separately to keep this PR scoped.

@vercel
Copy link
Copy Markdown

vercel Bot commented May 6, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
site Ready Ready Preview, Comment May 11, 2026 2:16pm

Request Review

@codecov
Copy link
Copy Markdown

codecov Bot commented May 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@stevenobiajulu stevenobiajulu changed the title spike(verification): Lean 4 verification of LCS subroutine — Stages 1+2 (zero-sorry) chore(verification): prove Lean atom-level LCS invariants May 6, 2026
@github-actions github-actions Bot added the chore label May 6, 2026
@stevenobiajulu
Copy link
Copy Markdown
Member Author

Stage 3 is now pushed on this branch. The Lean spike now proves INV-LCS-002/003/004 in addition to the earlier soundness results, lake build is clean with zero sorry, the full local pre-submit passed (npm run build, npm run lint:workspaces, npm run test:run, npm run check:spec-coverage), and a final codex review --uncommitted found no actionable bugs in the diff.

@stevenobiajulu stevenobiajulu force-pushed the spike/lean4-lcs-verification branch from aa51110 to 0dd683f Compare May 10, 2026 19:15
@stevenobiajulu stevenobiajulu changed the title chore(verification): prove Lean atom-level LCS invariants chore(verification): Lean atom-level LCS proofs + sorry'd Spec.lean May 10, 2026
@stevenobiajulu
Copy link
Copy Markdown
Member Author

Stage 4 landed (sorry'd Spec.lean, INV-FIELD-001 + INV-RT-001)

Three commits stacked on the now-rebased branch:

  • 5cfd6d8 — Initial Stage 4: verification/lean/LeanSpike/Spec.lean (new), LeanSpike.lean import, README section.
  • 4a28caf — Address Codex peer-review findings on 5cfd6d8: narrow compareDocumentXml semantically to inplace-mode comparison output (rebuild-mode candidates bypass evaluateSafetyChecks per pipeline.ts:669); fix 9 citation drifts against current TS source.
  • a1ced8c — Final citation off-by-one: rejectAllChanges body ends at line 626, not 625.

What's in Spec.lean

  • 7 axiom declarations forming a fully uninterpreted Lean signature: OoxmlDoc, compareDocumentXml, acceptAllChanges, rejectAllChanges, validateFieldStructure, extractTextWithParagraphs, normalizeText.
  • Two sorry'd theorems:
    • INV-FIELD-001: validateFieldStructure (acceptAllChanges (compareDocumentXml a b)) = true ∧ validateFieldStructure (rejectAllChanges (compareDocumentXml a b)) = true. Mirrors evaluateSafetyChecks at pipeline.ts:404-440, scoped to the inplace branch.
    • INV-RT-001 (paired): normalizeText (extractTextWithParagraphs (acceptAllChanges combined)) = normalizeText (extractTextWithParagraphs b) ∧ normalizeText (extractTextWithParagraphs (rejectAllChanges combined)) = normalizeText (extractTextWithParagraphs a). Mirrors paired assertions at round-trip-inplace.test.ts:56-63 and :87-94.

Modeling choices (locked in plan peer-review)

  • axiom (not opaque def) for all 7 declarations — bodyless opaque fails the Inhabited-synth check in Lean 4.29.1.
  • compareDocumentXml : OoxmlDoc → OoxmlDoc → OoxmlDoc abstracts the document.xml surface (the public TS compareDocuments is (Buffer, Buffer) => Promise<CompareResult>; this Lean symbol is over the XML inside CompareResult.document, scoped to inplace-mode output).
  • Fully uninterpreted signature — no partial OOXML data model.

Isolation guarantee

  • LeanSpike/AtomsEqual.lean and LeanSpike/Lcs.lean remain zero-sorry.
  • LeanSpike/Spec.lean is the only sorry-bearing file. Verified:
    $ rg -n "sorry" LeanSpike/AtomsEqual.lean LeanSpike/Lcs.lean   # no output
    $ rg -n "sorry" LeanSpike/Spec.lean                            # 2 hits
    $ rg -c "^axiom" LeanSpike/Spec.lean                           # 7
    

Peer review

  • Plan peer-reviewed by Gemini and Codex in parallel before implementation. Both flagged the axiom vs opaque issue, the asymmetric INV-RT-001, and the over-broad INV-FIELD-001; Codex additionally caught that the stale plan file at ~/.claude/plans/ is outside any git repo (split into separate housekeeping).
  • Diff peer-reviewed by Gemini and Codex on 5cfd6d8; Codex found a real semantic precision issue (overclaiming against rebuild mode) plus 9 citation drifts.
  • Fixup 4a28caf peer-reviewed by Codex; found single off-by-one (rejectAllChanges 519-625 → 519-626), fixed in a1ced8c.

Remaining stretch (separate stages)

  • Stage 5: fast-check bridge wiring INV-FIELD-001 and INV-RT-001 to random property tests against the live TS engine in packages/docx-core/src/integration/lean-spec-bridge.test.ts.
  • Stage 6: lean-build GitHub Actions job running lake build in verification/lean/.

PR stays in draft.

…ndness

Stages 1+2 of an experimental Lean 4 verification spike against the
inner LCS subroutine of the comparison engine (atomLcs.ts:45-104).

Why:
- Recurring bug class in reconstruction (e.g. #106 → deferred-adjacent
  #110/#111 follow-ons) suggests provable invariants would prevent
  classes of bugs the test corpus alone won't catch.
- Validates whether Lean 4 + Mathlib + AI agent assist is tractable
  against this codebase before committing to broader verification work.
- Pascal's wager: bounded downside (agent rental we already pay for),
  asymmetric upside if the AI-buyer market materializes and starts
  reading proofs as a procurement signal.

What:
- New verification/lean/ Lake project pinning Lean 4.29.1 + mathlib4
  v4.29.1. Outside npm workspaces; build/distribution via Lake only.
- LeanSpike/Atom.lean: 3-field projected Atom type (sha1Hash,
  textContent, tagName) — only what atomsEqual consults; full
  ComparisonUnitAtom has ~20 fields, the rest are deliberately
  omitted from the model.
- LeanSpike/AtomsEqual.lean: atomsEqual function + INV-ATOMSEQ-001
  proved zero-sorry. Captures hash-collision safety: atomsEqual = true
  implies textContent + tagName equality regardless of sha1Hash.
- LeanSpike/Lcs.lean: Lean 4 model of the Wagner-Fischer LCS as a
  recursive computation over reversed inputs — an alternate executable
  spec, NOT a line-for-line port of the TS DP+backtracking. Proves
  INV-LCS-001 (value-level subsequence soundness) zero-sorry.
- README.md documents proof scope, the LEAN-vs-TS specification gap,
  build instructions, and explicit caveats peer review surfaced.

Verification:
- lake build succeeds in 15s, 3289 jobs, zero warnings, zero errors,
  zero sorry.
- Peer-reviewed by Gemini and Codex CLI in parallel; both approved.
- Codex empirically tested 1.19M sequence-pair comparisons (length ≤ 6,
  3-symbol alphabet) — Lean recursive algorithm and TS DP+backtracking
  produce identical matches in all cases. Strong evidence the alternate
  executable spec is observationally equivalent for small inputs.

Not covered (deferred to future stages):
- INV-LCS-002 (optimality), INV-LCS-003 (strict monotonicity of match
  indices), INV-LCS-004 (partition completeness)
- Reconstruction invariants, round-trip text equality
- Field-structure / document-shape preservation
- The hierarchicalCompare paragraph-level layer (computeAtomLcs is
  the inner subroutine; hierarchicalCompare invokes it per matched
  paragraph group)
- The TypeScript implementation itself (proof targets the Lean model;
  extensional equivalence is a deferred obligation)

Does not modify any TypeScript code. The npm workspaces are unaffected;
verification/lean/ contains no package.json and is not added to the
workspaces array.

Ref: experimental, no issue number; multi-stage spike, expect more
commits on this branch before any merge consideration.
Complete Stage 3 of the Lean LCS spike so the model now covers optimality, index monotonicity, and partition completeness in addition to the earlier soundness results.

This tightens the proof story around the atom-level subroutine before any attempt to extend the verification work toward reconstruction, where the motivating regressions actually live.

Ref: #110
Ref: #111
…T-001

Stage 4 of the Lean 4 verification spike. Adds verification/lean/LeanSpike/Spec.lean
with named specification targets (intentionally sorry'd) over a fully
uninterpreted Lean signature.

INV-FIELD-001 scopes validateFieldStructure to compareDocumentXml output,
matching evaluateSafetyChecks at pipeline.ts:404-430.

INV-RT-001 is paired: both acceptAllChanges->b-text and rejectAllChanges->a-text
under normalizeText, matching gold-standard round-trip tests at
round-trip-inplace.test.ts:4 and nvca-coi-regression.test.ts:77-103.

axiom (not opaque def) for all 7 declarations -- bodyless opaque fails the
Inhabited-synth check in Lean 4.29.1.

Stage 1-3 modules (AtomsEqual.lean, Lcs.lean) remain zero-sorry.
…inplace mode

Peer-review (Codex) findings on the Stage 4 commit:

1. compareDocumentXml docstring read as "any compareDocumentsAtomizer output."
   evaluateSafetyChecks (pipeline.ts:404-440) is gated on the inplace branch
   at pipeline.ts:669; rebuild-mode output bypasses the safety check entirely.
   Narrow the docstring to the inplace-mode comparison candidate so
   INV-FIELD-001 and INV-RT-001 don't overclaim against rebuild mode.

2. Citation drift in Spec.lean and README.md:
   - newDocumentXml: pipeline.ts:608 -> 635-650
   - validateFieldStructure call site: pipeline.ts:404-430 -> 404-440 (call at :439-440)
   - validateFieldStructure body: pipeline.ts:352-401 -> 352-402
   - acceptAllChanges: trackChangesAcceptorAst.ts:368-518 -> 368-506
   - rejectAllChanges: trackChangesAcceptorAst.ts:519-659 -> 519-625
   - extractTextWithParagraphs: trackChangesAcceptorAst.ts:660-699 -> 660-688
   - normalizeText: trackChangesAcceptorAst.ts:701-710 -> 701-711
   - round-trip-inplace.test.ts:4 (banner) -> :56-63 and :87-94 (paired assertions)
   - OoxmlDoc pipeline range: 352-783 -> 352-817

3. README framing: "TS checks and tests justify these statements" softened to
   "motivate these targets" -- with an uninterpreted axiom surface and two
   sorrys, the TS evidence cannot justify a universal Lean claim on its own.

No theorem statements changed; no axiom declarations added or removed; no sorry
changed; Stage 1-3 modules remain zero-sorry.
… theorems

Stage 5 of the Lean 4 verification spike. Empirical property-test bridge
that exercises the sorry'd Lean theorems in verification/lean/LeanSpike/Spec.lean
against the live TypeScript inplace comparison engine.

INV-FIELD-001 and INV-RT-001 hold on 50 independent random (original, revised)
synthetic-document pairs each. fast-check counterexample shrinking would surface
any falsifying input. The Lean theorems remain sorry'd intentionally — this is
an empirical bridge, not a proof.

Also exports validateFieldStructure from pipeline.ts so the bridge can call it
directly without re-implementing the predicate.

Refs PR #164.
… scope

Codex peer-review fixups on Stage 5:

1. Gate both properties on `reconstructionModeUsed === 'inplace'`. The engine
   can request inplace and fall back to rebuild after the pipeline safety
   check (pipeline.ts:404-440). Without this gate, a future regression could
   silently fall back and leave both properties vacuously green while no
   longer testing the Spec.lean surface. A fallback now fails the property
   (which is correct: fallback means INV-FIELD-001 or INV-RT-001 was
   internally falsified on the inplace candidate).

2. Document INV-FIELD-001's coverage limit explicitly. The arbitrary produces
   field-free synthetic paragraphs; field-bearing coverage lives in
   collapsed-field-inplace.test.ts. Independent pairs predominantly test
   wholesale insert/delete; small-edit coverage lives in the fixture-based
   round-trip tests.

3. Drop the redundant `<>&` input filter (synthetic builder already escapes
   them).

4. Raise numRuns from 50 to 100 per property — local bridge runtime ~3.6s for
   200 total cases, well under the 60s test timeout.

5. Move fc.assert into the `then` step for cleaner Allure BDD framing.

Refs PR #164.
@stevenobiajulu stevenobiajulu changed the title chore(verification): Lean atom-level LCS proofs + sorry'd Spec.lean chore(verification): atom-level LCS proofs, sorry'd Spec.lean, fast-check bridge May 11, 2026
@stevenobiajulu
Copy link
Copy Markdown
Member Author

Stage 5 — fast-check bridge (peer review round 2)

Commits: 569902c (initial bridge) → 1129c84 (round-1 fixup).

Round-2 verdict (Codex): "No new findings."

Round-1 issues addressed in 1129c84

Issue Severity Status
Bridge didn't gate on reconstructionModeUsed === 'inplace' — silent rebuild fallback could leave both properties vacuously green. high Fixed. Both properties now return false if modeUsed !== 'inplace'. Local: 200/200 cases stay on the inplace path (0 fallbacks).
INV-FIELD-001 near-vacuous on field-free synthetic input. high Scoped honestly. Docstring now states the field-coverage limitation and points at collapsed-field-inplace.test.ts as the real field-bearing coverage layer.
Independent pairs miss small-edit / run-boundary regime. medium Scoped honestly. Docstring states the limitation and points at the fixture-based round-trip tests. Mutation-derived property deferred as a known follow-on.
Redundant <>& input filter (synthetic builder already escapes). nit Removed.
numRuns of 50 was conservative. nit Raised to 100/property. Local bridge runtime ~3.6s for both properties (200 cases).

Empirical state

  • Bridge tests in isolation: 2/2 pass, 3.6s total at numRuns=100 per property.
  • Full docx-core suite: 84 files, 1231 passed + 1 skipped.
  • check:spec-coverage: PASS.
  • Build + lint: clean.

Known follow-ons (deferred, not blockers for the draft)

  • Mutation-derived revised = mutate(original) property to cover the small-edit regime.
  • Field-bearing arbitrary to make INV-FIELD-001 meaningfully exercise w:fldChar / w:instrText markup.
  • Stage 6: lean-build GitHub Actions job (separate stretch goal).

PR stays in draft.

Engineering-internal tracker sitting next to verification/lean/. Documents
what's complete (Tier 1 LCS soundness, Tier 1.5 sorry'd specs + fast-check
bridge, Tier 1.6 CI in flight) and what's open (Tier 2 definitional OoxmlDoc
model + closed proofs of Spec.lean invariants, Tier 2.5 Lean↔TS equivalence,
Tier 3 reconstruction invariants, Tier 3+ full engine).

Uses wide error bars (e.g. 4-12 months for Tier 2) since the dominant cost
is OOXML-modeling decisions whose tractability is unknown until attempted,
not theorem-prover keystrokes. Explicitly out of OpenSpec for now — the
natural next OpenSpec change is "build a definitional OoxmlDoc subset and
close INV-FIELD-001 against it," which happens when Tier 2 actually starts.

Refs PR #164.
…audits

Stage 6 of the Lean 4 verification spike. New workflow runs `lake build` in
verification/lean/ on PRs and pushes that touch the Lean directory or this
workflow file. Three guarantees enforced per run:

- lake build succeeds (proofs check)
- LeanSpike/AtomsEqual.lean and LeanSpike/Lcs.lean stay zero-sorry
- LeanSpike/Spec.lean has exactly 2 sorries (the intentional spec placeholders)

Performance: caches ~/.elan and verification/lean/.lake keyed on
lean-toolchain + lake-manifest.json. Uses `lake exe cache get` to pull
mathlib pre-built oleans from the community CDN, keeping a cold run under
~5 minutes vs ~30 minutes compiling mathlib from source. Install/cache-get
steps have retry loops mirroring ci.yml's network-retry style.

Workflow is NOT in branch protection — fails surface but don't block merging
yet. Flip to blocking once first green run lands on main.

Refs PR #164.
Peer-review fixups on Stage 6:

1. (HIGH, Codex+Gemini) Drop `lake update`. The cache key hashes
   lean-toolchain + lake-manifest.json, but `lake update` can mutate both
   files (transitive deps pinned to mutable refs like batteries@main and
   aesop@master). On cache miss, actions/cache would save .lake + ~/.elan
   under the key derived from committed files while the contents reflect a
   mutated manifest, poisoning future restores. The committed
   lake-manifest.json is now the sole source of truth in CI.

2. (MEDIUM, Codex) Make `lake exe cache get` best-effort via
   continue-on-error. A CDN outage was previously a hard failure even
   though `lake build` would still succeed (compiling mathlib from source
   ~30 min vs ~30 s with cached oleans). `lake build` is now the
   authoritative step; cache-get is a pure optimization.

Refs PR #164.
First lean-build run on a17d2e1 exposed a latent bug: ripgrep is NOT
pre-installed on ubuntu-latest runners. The Stage 1-3 audit silently
passed because `if rg -n "sorry" ...` evaluates exit 127 (rg-not-found)
as false — no error path runs, a false-negative audit that would not
catch a real sorry leak. The Spec.lean audit failed loudly only because
its `|| echo 0` defaulted the count to 0.

Switching both audits to grep (POSIX-guaranteed on ubuntu-latest). Also
hardened the count fallback from `|| echo 0` to `|| true` so grep's own
"0 matches" output (exit 1) doesn't double-emit a count value.

The substantive Stage 6 design — elan install, mathlib cache-get,
lake build — was empirically validated on the prior run: cold runner
compiled and proof-checked the spike in ~2m26s, well under the 30-min
timeout.

Refs PR #164.
@stevenobiajulu
Copy link
Copy Markdown
Member Author

Stage 6 — lean-build CI (and ROADMAP)

Commits this round:

  • b10335fverification/ROADMAP.md (Tier 1/2/3 tracker, lightweight engineering-internal artifact, ~169 lines)
  • 014e197 — initial lean-build workflow
  • a17d2e1 — peer-review fixup (drop lake update, cache-get best-effort)
  • 5504414 — second fixup (grep instead of rg — rg not pre-installed on ubuntu-latest)

Peer review

Round Reviewer Verdict
Stage 6 initial Codex (high+medium) lake update is a cache-key footgun; cache get failure shouldn't be hard-fail
Stage 6 initial Gemini concurred on lake update
Stage 6 fixup 1 Codex re-review "Fixup correctly resolves both findings; no further changes"
Stage 6 fixup 2 First actual CI run exposed rg-not-on-ubuntu-latest false-negative; grep fix in 5504414

actions/cache@0057852bfaa89a56745cba8c7296529d2fc39830 # v4.3.0 SHA independently verified via git ls-remote https://github.com/actions/cache refs/tags/v4.3.0.

Empirical CI validation

Cold-cache run on 5504414:

  • Elan install: ✓
  • lake exe cache get on cold workspace: ✓ (bootstraps package resolution from committed lake-manifest.json — no lake update needed)
  • lake build: ✓ proofs check from a fully cold runner
  • Audit Stage 1-3 zero-sorry: ✓
  • Audit Spec.lean exactly 2 sorries: ✓
  • Total runtime: 5m14s cold-cache. Subsequent runs hit cache; expect <1 min warm.

Roadmap

verification/ROADMAP.md documents:

  • Tier 1 (LCS soundness) — DONE
  • Tier 1.5 (sorry'd specs + fast-check bridge) — DONE
  • Tier 1.6 (CI gate) — DONE (this round)
  • Tier 2 (definitional OoxmlDoc model + closed proofs) — NOT STARTED (4-12 months)
  • Tier 2.5 (Lean↔TS equivalence + projection broadening) — NOT STARTED (2-6 months)
  • Tier 3 (reconstruction invariants) — NOT STARTED (1.5-3 years)
  • Tier 3+ (full engine) — NOT STARTED (additional 2-4 years)

The spike's narrowed success bar (one proved LCS + 1-2 sorry'd reconstruction specs) is now exceeded. Suggest this is a logical merge point. Branch protection: lean-build workflow is intentionally NOT in branch protection yet; flip it to required after merge.

@stevenobiajulu stevenobiajulu changed the title chore(verification): atom-level LCS proofs, sorry'd Spec.lean, fast-check bridge chore(verification): atom-level Lean 4 spike with proofs + bridge + CI May 11, 2026
@stevenobiajulu stevenobiajulu marked this pull request as ready for review May 11, 2026 13:55
…hardening, bridge diagnostics)

Applies Codex + Gemini final pre-merge peer review:

- Spec.lean: compareDocumentXml is now OoxmlDoc → OoxmlDoc → Option OoxmlDoc;
  INV-FIELD-001 and INV-RT-001 are premised on compareDocumentXml a b =
  some combined, so doc pairs where inplace mode fails (ContainerResolutionError
  or every safety pass failing) are formally out of scope. The TS pipeline is
  partial and the theorems should not claim totality.
- lean-build.yml: drop OS-only restore-keys (could restore stale .lake/elan
  across toolchain/manifest changes); drop continue-on-error and "source
  compile fallback" framing on lake exe cache get (cold mathlib ≈ job
  timeout). Sorry audits use grep -nw "sorry" and the zero-sorry audit
  now globs all non-Spec LeanSpike modules.
- lean-spec-bridge.test.ts: replace bare `return false` on rebuild fallback
  with a throw carrying fallbackReason, failedChecks, and the generated
  paragraph arrays so CI failures are debuggable. Comment header narrowed
  to scope "fallback ⇒ falsification" claim to the paragraph-only
  generator family.
- ROADMAP.md: Tier 1.6 marked COMPLETE; header date bumped.
- Lcs.lean: caveat comment moved to atomsEqual_implies_eq theorem site,
  not just README — load-bearing lemma overfits the 3-field projection.
- AtomsEqual.lean: rewrite misleading "regardless of whether sha1Hash matches"
  docstring (atomsEqual short-circuits on hash inequality; the theorem just
  omits hash equality from its conclusion).

Verification: lake build clean (2 expected sorries at Spec.lean:66,88) ·
npm -w @usejunior/docx-core run build clean · tsc --noEmit clean ·
lean-spec-bridge.test.ts 2/2 pass, 0 falsifications · full docx-core
test:run 1231 pass + 1 skip (baseline preserved).
@stevenobiajulu stevenobiajulu enabled auto-merge (squash) May 11, 2026 14:16
@stevenobiajulu stevenobiajulu merged commit aa9b54c into main May 11, 2026
21 of 23 checks passed
@stevenobiajulu
Copy link
Copy Markdown
Member Author

✅ Post-merge smoke passed

Merged: aa9b54c5a10e9404f65f403865f79982823f3a13
Built from: main @ aa9b54c5a10e9404f65f403865f79982823f3a13
Smoke: clean npm run buildnpm installnpm run lint:workspacesnpm run test:runlake build + CI audits

Steps

  • ✅ build (all workspaces clean, including new fast-check dep wired through)
  • npm install (refreshed lockfile picked up fast-check added in this PR)
  • ✅ lint:workspaces (only pre-existing unrelated eslint warning in docx-mcp/src/cli/commands/edit.test.ts:133)
  • ✅ test:run across all workspaces — 2092 passed, 36 skipped, 169 files:
    • docx-core: 1237 pass + 1 skip (includes the new lean-spec-bridge.test.ts 2 properties × 100 runs, 0 falsifications)
    • docx-mcp: 736 pass
    • google-docs-core: 116 pass + 35 skip
    • safe-docx-mcpb: 3 pass
  • lake build in verification/lean/ (3297 jobs, mathlib v4.29.1 oleans via cache)
  • ✅ Lean sorry audits: zero in non-Spec modules; exactly 2 in Spec.lean (inv_field_001, inv_rt_001) — matches the CI workflow expectation

Notes for triage

  • Two CI checks failed at merge time but were not required by branch protection: actionlint and gemini-manifest. These pre-date PR chore(verification): atom-level Lean 4 spike with proofs + bridge + CI #164 (see existing repo state on main); worth a separate issue if not already tracked.
  • After pull, npm install is required before lint/tests can run because fast-check was newly added.

Log: /tmp/automerge-smoke-164-1778510266.log (local).

stevenobiajulu added a commit that referenced this pull request May 14, 2026
…el + inv_field_001 closure (#202)

* chore(verification): scaffold OpenSpec change for Tier 2 OoxmlDoc model + inv_field_001 closure

Authors the OpenSpec change `add-ooxml-doc-subset-and-inv-field-001-proof`
to kick off Tier 2 of the Lean 4 verification roadmap. Two rounds of
gemini + codex CLI peer review have been incorporated. The proposal does
NOT implement Tier 2; it scopes the work and decomposes it into eight
single-`/codex-implement`-run-sized tasks (T1, T2, T3a, T3b, T4a, T4b,
T5a, T5b/T6) that successor PRs against this change will land.

Key decisions captured:

- Model boundary: small tree-structured syntactic OOXML subset (paragraphs
  → blocks with `Ins`/`Del`/`MoveFrom`/`MoveTo` wrappers → runs → atoms
  including `DelInstrText`). Mirrors `trackChangesAcceptorAst.ts:368-659`
  structurally; deliberately narrower than full `ComparisonUnitAtom`.
- Closure strategy: introduce a recursive well-formedness predicate
  `recursivelyWellformed` (whole-doc `validateFieldStructure` plus every
  wrapper subtree `fieldSelfContained`) and prove a preservation lemma
  over the Lean model. The `Spec.lean:71` sorry closes by composing the
  lemma with a single new named axiom
  `compareDocumentXml_output_recursivelyWellformed`, which becomes the
  load-bearing residual obligation Tier 3 will discharge.
- `fieldSelfContained` strengthening (round-2 peer review fix): the
  counterexample `Begin, Separate, Ins(Del(End, Begin), InstrText), End`
  showed that per-subtree `validateFieldStructure` is too weak, so
  wrapper subtrees must enter and exit the depth/seenSeparator walk at
  `(0, false)`.
- `inv_rt_001` (Spec.lean:95 sorry), Tier 2.5 (Lean↔TS equivalence,
  broader projection), and Tier 3 (definitional `compareDocumentXml`)
  are explicitly out of scope for this change.

Also updates `verification/ROADMAP.md` line 3 to reflect PR #164 merged
and this change in progress (the line was stale, calling PR #164 a draft
when it merged 2026-05-11).

`openspec validate add-ooxml-doc-subset-and-inv-field-001-proof --strict`
passes. The proposal is opened as a draft PR so the model-boundary and
predicate-strengthening decisions can take further peer-review iteration
in PR comments before any implementation work begins.

Ref: #201 (tracking issue), #164 (Tier 1 spike).

* chore(verification): incorporate round-3 peer review (stack-valued field context, T0 generic-lemma split)

Round 3 peer review (codex + gemini CLI, 2026-05-13) caught three further
counterexample families that defeat any "standalone walk from (0, false)
ends at (0, false)" predicate:

- DelInstrText is unchecked: Del(DelInstrText) breaks reject after rewrite
- subtree consumes outer field state via leading End/Separate:
  Begin, Ins(End) breaks reject; Begin, Del(End) breaks accept
- single-boolean seenSeparator reset:
  [Begin, Separate, Del(Begin, End), InstrText, End] slips through

The fix tracks seenSeparator as a depth-indexed stack (FieldCtx := List Bool),
exactly mirroring the TS engine's pastSeparatorAtDepth at pipeline.ts:374-389,
and strengthens the per-subtree predicate to fieldContextNeutral: the subtree
is invariant under any outer field-stack context.

Other revisions in this commit:

- Pull shared walk-over-append + context-extension + DelInstrText rewrite
  lemmas out into a new T0 task (Tier2/WalkLemmas.lean) so accept and
  reject halves consume identical generic results.
- Fix DelInstrText locality claim: rewrite is global at the operation
  level (after both Del and MoveFrom unwraps complete, per TS lines
  602-616); locality is enforced by recursivelyWellformed on the input.
- Tone down empirical wording in the new axiom's docstring; scope it to
  this repo's inplace atomizer rather than to OOXML comparison engines
  in general.
- Add scenario [LEAN-T2-04](f) covering the DelInstrText locality and
  reject ordering documentation.
stevenobiajulu added a commit that referenced this pull request May 22, 2026
…208)

* feat(verification): close inv_field_001 with Tier 2 OoxmlDoc subset

Implements OpenSpec change `add-ooxml-doc-subset-and-inv-field-001-proof`
(merged 2026-05-14 via PR #202).

Replaces the uninterpreted document-level axioms in `LeanSpike/Spec.lean`
with a definitional Lean OOXML subset under `verification/lean/Tier2/`
and proves the field-structure preservation lemma machine-checked.
`inv_field_001` is closed by composing that lemma with a single named
residual axiom `compareDocumentXml_output_recursivelyWellformed`, which
captures the only remaining assumption (discharging it by modeling
`compareDocumentXml` definitionally is Tier 3 work). `inv_rt_001`
remains `sorry`'d (deferred to `add-inv-rt-001-proof`).

Why: PR #164 shipped `Spec.lean` over fully uninterpreted axioms, leaving
both invariants as `sorry`. Closing `inv_field_001` requires either a
definitional comparison model (Tier 3) or carrying an explicit, named
residual obligation. Three peer-review rounds (codex + gemini CLI) drove
the design to a stack-valued field context exactly mirroring the TS
engine's `pastSeparatorAtDepth: number[]` and the strictly stronger
`fieldContextNeutral` per-subtree predicate.

What's delivered:
- `Tier2/OoxmlModel.lean` — tree-structured OOXML subset.
- `Tier2/FieldStructure.lean` — stack-valued walk, `validateFieldStructure`,
  `fieldContextNeutral`, `recursivelyWellformed`.
- `Tier2/WalkLemmas.lean` — `walkBlocks_append`, neutrality no-op,
  `neutral_balanced` (via a tall-stack length-tracking argument so no
  `end` underflows), `delInstrText_rewrite_safe`.
- `Tier2/AcceptReject.lean` — definitional `accept` / `reject`.
- `Tier2/InvFieldOne.lean` — closed `field_structure_preserved` (zero
  `sorry`).
- `LeanSpike/Spec.lean` — `OoxmlDoc` / `acceptAllChanges` /
  `rejectAllChanges` / `validateFieldStructure` rewired from `axiom` to
  Tier 2 definitions; new named residual axiom; `inv_field_001` proof.
- `lean-build.yml` — non-Spec zero-sorry audit extended to `Tier2/`;
  Spec.lean audit updated from 2 sorries to 1.
- `lean-spec-bridge.test.ts` — one field-bearing fixture case
  exercising a TS analogue of `recursivelyWellformed` as a falsifiability
  layer for the new axiom.
- `Tier2/README.md`, `verification/lean/README.md`,
  `verification/ROADMAP.md` — scope, residual obligations, status.

Verification: `lake build` clean with one expected `sorry` warning
(`inv_rt_001`). `#print axioms inv_field_001` depends only on `propext`,
`Quot.sound`, `compareDocumentXml`, and the one residual axiom — no
`sorryAx`. Full `lean-spec-bridge.test.ts` suite (5 tests) passes.
`tsc --noEmit` and ESLint clean on the test file. `openspec validate
--strict` passes; all 10 tasks checked off.

Ref: #201

* feat(verification): extend Tier2 Block with transparent .other container; add delInstrText falsifiability fixtures

Closes peer-review gap B (subset coverage) and tightens gap C (axiom evidence)
on PR #208. Both surfaced by Codex + Gemini peer review (commit-comment thread)
and an ECMA-376 Part 4 deep-research report.

## Lean changes

Adds a 6th `Block` constructor `Block.other (tag : String) (children : List Block)`
to model OOXML structural containers that the field-context walk MUST descend
through transparently but that are NOT track-change wrappers: `w:tbl`, `w:tr`,
`w:tc`, `w:sdt`, `w:sdtContent`, `w:hyperlink`, bookmark elements, etc. Before
this change the Lean model implicitly assumed inplace output contained no such
containers — `compareDocumentXml : OoxmlDoc → OoxmlDoc → Option OoxmlDoc` was
side-stepping a lossy projection.

- `Tier2/OoxmlModel.lean`: add `.other` constructor with docstring explaining
  the modeling intent.
- `Tier2/FieldStructure.lean`: `walkBlocks`, `countBlocks` descend through
  `.other` transparently. `wrapperSubtreesBlocks` does NOT emit the children
  as a wrapper subtree (only wrappers nested inside `.other` are emitted) —
  `.other` is structural, not a track-change wrapper.
- `Tier2/AcceptReject.lean`: `acceptBlocks`, `rejectBlocks`, `renameBlocks`
  preserve the `.other` container with its tag while recursively processing
  children. Accept/reject don't unwrap or drop the container itself.
- `Tier2/WalkLemmas.lean`: add `case7` to `walkBlocks_invalid`,
  `walkBlocks_append`, `countBlocks_append`, and the inductive `walkBlocks_tall`
  key block.
- `Tier2/InvFieldOne.lean`: add `case7` to every `induction ... using
  *.induct` (walkBlocks_renameBlocks, countBlocks_renameBlocks,
  walkBlocks_acceptBlocks, walkBlocks_rejectBlocks, both
  countBlocks_*_balance). Introduce `allNeutral_other` helper that does NOT
  extract a `fieldContextNeutral` obligation on the container's children (since
  `.other` is not in `wrapperSubtreesBlocks`). `acceptBlocks_append` /
  `rejectBlocks_append` / `renameBlocks_append` and `accept_blocks` /
  `reject_blocks` use `cases b <;> simp [...]` patterns and require no edit —
  Lean enumerates all 6 constructors.

## Spec.lean axiom docstring rewrite

Honestly states the engine-specific justification for the `∀ ctx` neutrality
strength: safe-docx's current inplace atomizer emits whole field sequences as
single track-change wrappers (verified at `inPlaceModifier.ts:717, 938, 1505,
1671, 1957, 2300` and `collapsed-field-inplace.test.ts:211`). The axiom is
NOT spec-conformant: under ECMA-376 Part 4 a conformant emitter must fragment
fields across wrapper boundaries when modifying them, and fragmented wrappers
are not `fieldContextNeutral` under `∀ ctx`. When the engine becomes
ECMA-376-conformant, this axiom and the per-subtree neutrality predicate will
need replacement with a document-level form (follow-up PR-B tracked in plan).

## TS bridge fixtures (gap C)

Adds three new tests to `lean-spec-bridge.test.ts`:

  1. **NUMPAGES deletion fixture** — original has the complete field; revised
     deletes it. Exercises the `w:delInstrText` atom case in
     `isFieldContextNeutral`, which the existing NUMPAGES insertion fixture
     never reaches.

  2. **Regression guard — `isFieldContextNeutral` rejects non-neutral shapes**:
     standalone `w:fldChar separate`, standalone `w:fldChar end`, and a
     `[begin, separate]` fragment. Crafted DOM-level XML, not full DOCX.

  3. **Regression guard — `isFieldContextNeutral` accepts a complete
     self-contained field wrapper**: the canonical NUMPAGES-in-w:ins shape the
     engine actually emits.

All 8 bridge tests pass locally; typecheck and lint clean.

## Out of scope (deferred to PR-B)

The `∀ ctx` → document-level predicate weakening. Codex's review concluded
that the current engine satisfies the existing `∀ ctx` predicate, so the
weakening is not required for shipping PR #208. PR-B will land after the
engine fragmentation conformance work (new GH issue to be filed) is scoped.

Ref: #208

* docs(verification): reference issue #217 in inv_field_001 residual axiom docstring

Replaces the placeholder "see follow-up issue" with the concrete cross-reference
to the engine fragmentation conformance work tracked in #217. No behavior
change.

Ref: #208, #217

---------

Co-authored-by: Steven Obiajulu <stevenobiajulu@Stevens-MacBook-Pro.local>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant