chore(verification): atom-level Lean 4 spike with proofs + bridge + CI by stevenobiajulu · Pull Request #164 · UseJunior/safe-docx

stevenobiajulu · 2026-05-06T14:46:08Z

Summary

Six-stage Lean 4 verification spike against the safe-docx comparison engine.
The spike answers the time-boxed question "is Lean 4 + mathlib + AI-assisted
proof work tractable here?" with concrete artifacts: 5 closed proofs, 2
intentionally sorry'd specification targets, a fast-check empirical
falsifiability bridge, and a CI gate that re-checks the proofs on every PR
touching verification/lean/**.

Scope is deliberately narrow per the time-box and prior peer-review
corrections: the atom-level LCS subroutine and a sorry'd specification
surface, not the full comparison engine. See verification/ROADMAP.md for
the full Tier 1 → Tier 3+ trajectory.

What's proved (closed, zero-sorry, machine-checked)

verification/lean/LeanSpike/:

INV-ATOMSEQ-001 (AtomsEqual.lean) — hash-collision safety of atomsEqual.
INV-LCS-001 (Lcs.lean) — value-level subsequence soundness of computeAtomLcs.
INV-LCS-002 — optimality of returned matches in the Lean model.
INV-LCS-003 — strict index monotonicity.
INV-LCS-004 — partition completeness (matched + deleted partition original; matched + inserted partition revised).

What's stated but not proved (sorry'd specification targets)

verification/lean/LeanSpike/Spec.lean over an uninterpreted axiom signature:

INV-FIELD-001 (sorry'd) — field-structure preservation across accept/reject of inplace comparison output.
INV-RT-001 (sorry'd) — paired round-trip text equality under normalization.

Closing these is Tier 2 work; see ROADMAP.

What's empirically validated (not proved, but tested on every PR)

packages/docx-core/src/integration/lean-spec-bridge.test.ts — fast-check
property tests at 100 runs/property × 2 properties against the live TS
engine, gated on reconstructionModeUsed === 'inplace'. Surfaces
falsifications of INV-FIELD-001 or INV-RT-001 on randomly-generated synthetic
inputs. Local: 0 falsifications, ~3.6s total per full test:run.

Stages

Stage	Scope	Commit at HEAD
1	Atom projection + `atomsEqual` + INV-ATOMSEQ-001	shipped
2	Lean model of `computeAtomLcs` + Wagner-Fischer brute-force validation (1.19M cases)	shipped
3	INV-LCS-001/002/003/004 closed proofs	shipped
4	Sorry'd `Spec.lean` with INV-FIELD-001 + INV-RT-001 over uninterpreted axiom signature	shipped
5	fast-check bridge wired to live TS engine via `buildSyntheticDocx`	shipped
6	`lean-build` GitHub Actions workflow with mathlib cache + sorry audits	shipped

Each stage was peer-reviewed by Codex and (where useful) Gemini in parallel
before merging into the spike branch. See PR comments for round-by-round
review summaries.

CI

New workflow .github/workflows/lean-build.yml:

Triggers on PR + push to main paths-filtered to verification/lean/** + the workflow file itself
Caches ~/.elan + verification/lean/.lake keyed on lean-toolchain + lake-manifest.json
Runs lake exe cache get (best-effort) → lake build (authoritative)
Audits Stage 1-3 modules stay zero-sorry; Spec.lean stays at exactly 2 sorries
Empirical first run (cold cache): 5m14s end-to-end

Workflow is not in branch protection yet. After merge, the maintainer
can flip it to required via Settings → Branches → main → Require status checks.

What this PR does NOT cover

Per the ROADMAP, the following remain open:

Tier 2: definitional OoxmlDoc model + closed proofs of INV-FIELD-001 / INV-RT-001.
Tier 2.5: extensional equivalence Lean LCS ↔ TS Wagner-Fischer DP; broadening the Atom projection toward the real ComparisonUnitAtom.
Tier 3: reconstruction invariants on inPlaceModifier.ts / documentReconstructor.ts (the actual layer where bug classes from Reconstruct paragraph-level markers (commentRange, bookmark) outside <w:r> in rebuild mode #106/Reconstruct moveFromRange*/moveToRange* paragraph-level markers on rebuild #110/Reconstruct permStart/permEnd paragraph-level markers on rebuild #111/bug: inplace mode strips spaces from deleted text in word-split reconstructor #76/Inplace modifier misaligns content in tables with merged cells (w:vMerge/w:gridSpan) #65 live).
Tier 3+: full engine verification.

Estimated remaining effort (with wide error bars): 2.5–5 years of
intermittent agent-driven work to reach Tier 3 complete.

Caveats (already documented in `verification/lean/README.md`)

Proofs are properties of the Lean model, not the TS code. Extensional equivalence Lean LCS ↔ TS Wagner-Fischer is validated empirically (1.19M brute-force cases on sequences ≤ 6 over a 3-symbol alphabet, zero divergence) but not formally proven.
atomsEqual_implies_eq holds only because Atom is projected to 3 fields; would break under a broader projection toward the real ComparisonUnitAtom.
The fast-check bridge restricts to field-free synthetic input and independent random pairs. Field-bearing coverage lives in collapsed-field-inplace.test.ts; small-edit regime lives in round-trip-inplace.test.ts and nvca-coi-regression.test.ts.

Verification

cd verification/lean && lake build (zero sorry in Stage 1-3 modules; exactly 2 in Spec.lean)
npm run build clean
npm run lint:workspaces clean
npm run test:run — 1231 pass + 1 skip across docx-core
npm run check:spec-coverage pass
Stage 6 lean-build workflow green on actual GitHub Actions runner (5m14s cold cache)
All other CI green on this PR

Follow-on (deferred to a separate issue)

Extend the fast-check bridge's arbitraries to exercise the new
tracked-changes-emitting primitives (recent PRs #137/#138/#139/#140 added
pPrChange/trPrChange/tcPrChange and ins/del markup emitters in
footnote / comment / paragraph-insert primitives). The bridge currently
builds inputs via buildSyntheticDocx which emits plain XML; using the
new primitives as input sources would meaningfully widen INV-FIELD-001 /
INV-RT-001 coverage. Tracked separately to keep this PR scoped.

vercel · 2026-05-06T14:46:15Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
site	Ready	Preview, Comment	May 11, 2026 2:16pm

codecov · 2026-05-06T14:51:54Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

stevenobiajulu · 2026-05-06T22:11:40Z

Stage 3 is now pushed on this branch. The Lean spike now proves INV-LCS-002/003/004 in addition to the earlier soundness results, lake build is clean with zero sorry, the full local pre-submit passed (npm run build, npm run lint:workspaces, npm run test:run, npm run check:spec-coverage), and a final codex review --uncommitted found no actionable bugs in the diff.

stevenobiajulu · 2026-05-10T19:35:07Z

Stage 4 landed (sorry'd Spec.lean, INV-FIELD-001 + INV-RT-001)

Three commits stacked on the now-rebased branch:

5cfd6d8 — Initial Stage 4: verification/lean/LeanSpike/Spec.lean (new), LeanSpike.lean import, README section.
4a28caf — Address Codex peer-review findings on 5cfd6d8: narrow compareDocumentXml semantically to inplace-mode comparison output (rebuild-mode candidates bypass evaluateSafetyChecks per pipeline.ts:669); fix 9 citation drifts against current TS source.
a1ced8c — Final citation off-by-one: rejectAllChanges body ends at line 626, not 625.

What's in Spec.lean

7 axiom declarations forming a fully uninterpreted Lean signature: OoxmlDoc, compareDocumentXml, acceptAllChanges, rejectAllChanges, validateFieldStructure, extractTextWithParagraphs, normalizeText.
Two sorry'd theorems:
- INV-FIELD-001: validateFieldStructure (acceptAllChanges (compareDocumentXml a b)) = true ∧ validateFieldStructure (rejectAllChanges (compareDocumentXml a b)) = true. Mirrors evaluateSafetyChecks at pipeline.ts:404-440, scoped to the inplace branch.
- INV-RT-001 (paired): normalizeText (extractTextWithParagraphs (acceptAllChanges combined)) = normalizeText (extractTextWithParagraphs b) ∧ normalizeText (extractTextWithParagraphs (rejectAllChanges combined)) = normalizeText (extractTextWithParagraphs a). Mirrors paired assertions at round-trip-inplace.test.ts:56-63 and :87-94.

Modeling choices (locked in plan peer-review)

axiom (not opaque def) for all 7 declarations — bodyless opaque fails the Inhabited-synth check in Lean 4.29.1.
compareDocumentXml : OoxmlDoc → OoxmlDoc → OoxmlDoc abstracts the document.xml surface (the public TS compareDocuments is (Buffer, Buffer) => Promise<CompareResult>; this Lean symbol is over the XML inside CompareResult.document, scoped to inplace-mode output).
Fully uninterpreted signature — no partial OOXML data model.

Isolation guarantee

LeanSpike/AtomsEqual.lean and LeanSpike/Lcs.lean remain zero-sorry.

LeanSpike/Spec.lean is the only sorry-bearing file. Verified:

$ rg -n "sorry" LeanSpike/AtomsEqual.lean LeanSpike/Lcs.lean   # no output
$ rg -n "sorry" LeanSpike/Spec.lean                            # 2 hits
$ rg -c "^axiom" LeanSpike/Spec.lean                           # 7

Peer review

Plan peer-reviewed by Gemini and Codex in parallel before implementation. Both flagged the axiom vs opaque issue, the asymmetric INV-RT-001, and the over-broad INV-FIELD-001; Codex additionally caught that the stale plan file at ~/.claude/plans/ is outside any git repo (split into separate housekeeping).
Diff peer-reviewed by Gemini and Codex on 5cfd6d8; Codex found a real semantic precision issue (overclaiming against rebuild mode) plus 9 citation drifts.
Fixup 4a28caf peer-reviewed by Codex; found single off-by-one (rejectAllChanges 519-625 → 519-626), fixed in a1ced8c.

Remaining stretch (separate stages)

Stage 5: fast-check bridge wiring INV-FIELD-001 and INV-RT-001 to random property tests against the live TS engine in packages/docx-core/src/integration/lean-spec-bridge.test.ts.
Stage 6: lean-build GitHub Actions job running lake build in verification/lean/.

PR stays in draft.

…ndness Stages 1+2 of an experimental Lean 4 verification spike against the inner LCS subroutine of the comparison engine (atomLcs.ts:45-104). Why: - Recurring bug class in reconstruction (e.g. #106 → deferred-adjacent #110/#111 follow-ons) suggests provable invariants would prevent classes of bugs the test corpus alone won't catch. - Validates whether Lean 4 + Mathlib + AI agent assist is tractable against this codebase before committing to broader verification work. - Pascal's wager: bounded downside (agent rental we already pay for), asymmetric upside if the AI-buyer market materializes and starts reading proofs as a procurement signal. What: - New verification/lean/ Lake project pinning Lean 4.29.1 + mathlib4 v4.29.1. Outside npm workspaces; build/distribution via Lake only. - LeanSpike/Atom.lean: 3-field projected Atom type (sha1Hash, textContent, tagName) — only what atomsEqual consults; full ComparisonUnitAtom has ~20 fields, the rest are deliberately omitted from the model. - LeanSpike/AtomsEqual.lean: atomsEqual function + INV-ATOMSEQ-001 proved zero-sorry. Captures hash-collision safety: atomsEqual = true implies textContent + tagName equality regardless of sha1Hash. - LeanSpike/Lcs.lean: Lean 4 model of the Wagner-Fischer LCS as a recursive computation over reversed inputs — an alternate executable spec, NOT a line-for-line port of the TS DP+backtracking. Proves INV-LCS-001 (value-level subsequence soundness) zero-sorry. - README.md documents proof scope, the LEAN-vs-TS specification gap, build instructions, and explicit caveats peer review surfaced. Verification: - lake build succeeds in 15s, 3289 jobs, zero warnings, zero errors, zero sorry. - Peer-reviewed by Gemini and Codex CLI in parallel; both approved. - Codex empirically tested 1.19M sequence-pair comparisons (length ≤ 6, 3-symbol alphabet) — Lean recursive algorithm and TS DP+backtracking produce identical matches in all cases. Strong evidence the alternate executable spec is observationally equivalent for small inputs. Not covered (deferred to future stages): - INV-LCS-002 (optimality), INV-LCS-003 (strict monotonicity of match indices), INV-LCS-004 (partition completeness) - Reconstruction invariants, round-trip text equality - Field-structure / document-shape preservation - The hierarchicalCompare paragraph-level layer (computeAtomLcs is the inner subroutine; hierarchicalCompare invokes it per matched paragraph group) - The TypeScript implementation itself (proof targets the Lean model; extensional equivalence is a deferred obligation) Does not modify any TypeScript code. The npm workspaces are unaffected; verification/lean/ contains no package.json and is not added to the workspaces array. Ref: experimental, no issue number; multi-stage spike, expect more commits on this branch before any merge consideration.

Complete Stage 3 of the Lean LCS spike so the model now covers optimality, index monotonicity, and partition completeness in addition to the earlier soundness results. This tightens the proof story around the atom-level subroutine before any attempt to extend the verification work toward reconstruction, where the motivating regressions actually live. Ref: #110 Ref: #111

…T-001 Stage 4 of the Lean 4 verification spike. Adds verification/lean/LeanSpike/Spec.lean with named specification targets (intentionally sorry'd) over a fully uninterpreted Lean signature. INV-FIELD-001 scopes validateFieldStructure to compareDocumentXml output, matching evaluateSafetyChecks at pipeline.ts:404-430. INV-RT-001 is paired: both acceptAllChanges->b-text and rejectAllChanges->a-text under normalizeText, matching gold-standard round-trip tests at round-trip-inplace.test.ts:4 and nvca-coi-regression.test.ts:77-103. axiom (not opaque def) for all 7 declarations -- bodyless opaque fails the Inhabited-synth check in Lean 4.29.1. Stage 1-3 modules (AtomsEqual.lean, Lcs.lean) remain zero-sorry.

…inplace mode Peer-review (Codex) findings on the Stage 4 commit: 1. compareDocumentXml docstring read as "any compareDocumentsAtomizer output." evaluateSafetyChecks (pipeline.ts:404-440) is gated on the inplace branch at pipeline.ts:669; rebuild-mode output bypasses the safety check entirely. Narrow the docstring to the inplace-mode comparison candidate so INV-FIELD-001 and INV-RT-001 don't overclaim against rebuild mode. 2. Citation drift in Spec.lean and README.md: - newDocumentXml: pipeline.ts:608 -> 635-650 - validateFieldStructure call site: pipeline.ts:404-430 -> 404-440 (call at :439-440) - validateFieldStructure body: pipeline.ts:352-401 -> 352-402 - acceptAllChanges: trackChangesAcceptorAst.ts:368-518 -> 368-506 - rejectAllChanges: trackChangesAcceptorAst.ts:519-659 -> 519-625 - extractTextWithParagraphs: trackChangesAcceptorAst.ts:660-699 -> 660-688 - normalizeText: trackChangesAcceptorAst.ts:701-710 -> 701-711 - round-trip-inplace.test.ts:4 (banner) -> :56-63 and :87-94 (paired assertions) - OoxmlDoc pipeline range: 352-783 -> 352-817 3. README framing: "TS checks and tests justify these statements" softened to "motivate these targets" -- with an uninterpreted axiom surface and two sorrys, the TS evidence cannot justify a universal Lean claim on its own. No theorem statements changed; no axiom declarations added or removed; no sorry changed; Stage 1-3 modules remain zero-sorry.

… -> 519-626)

… theorems Stage 5 of the Lean 4 verification spike. Empirical property-test bridge that exercises the sorry'd Lean theorems in verification/lean/LeanSpike/Spec.lean against the live TypeScript inplace comparison engine. INV-FIELD-001 and INV-RT-001 hold on 50 independent random (original, revised) synthetic-document pairs each. fast-check counterexample shrinking would surface any falsifying input. The Lean theorems remain sorry'd intentionally — this is an empirical bridge, not a proof. Also exports validateFieldStructure from pipeline.ts so the bridge can call it directly without re-implementing the predicate. Refs PR #164.

… scope Codex peer-review fixups on Stage 5: 1. Gate both properties on `reconstructionModeUsed === 'inplace'`. The engine can request inplace and fall back to rebuild after the pipeline safety check (pipeline.ts:404-440). Without this gate, a future regression could silently fall back and leave both properties vacuously green while no longer testing the Spec.lean surface. A fallback now fails the property (which is correct: fallback means INV-FIELD-001 or INV-RT-001 was internally falsified on the inplace candidate). 2. Document INV-FIELD-001's coverage limit explicitly. The arbitrary produces field-free synthetic paragraphs; field-bearing coverage lives in collapsed-field-inplace.test.ts. Independent pairs predominantly test wholesale insert/delete; small-edit coverage lives in the fixture-based round-trip tests. 3. Drop the redundant `<>&` input filter (synthetic builder already escapes them). 4. Raise numRuns from 50 to 100 per property — local bridge runtime ~3.6s for 200 total cases, well under the 60s test timeout. 5. Move fc.assert into the `then` step for cleaner Allure BDD framing. Refs PR #164.

stevenobiajulu · 2026-05-11T00:25:20Z

Stage 5 — fast-check bridge (peer review round 2)

Commits: 569902c (initial bridge) → 1129c84 (round-1 fixup).

Round-2 verdict (Codex): "No new findings."

Round-1 issues addressed in `1129c84`

Issue	Severity	Status
Bridge didn't gate on `reconstructionModeUsed === 'inplace'` — silent rebuild fallback could leave both properties vacuously green.	`high`	Fixed. Both properties now return `false` if `modeUsed !== 'inplace'`. Local: 200/200 cases stay on the inplace path (0 fallbacks).
`INV-FIELD-001` near-vacuous on field-free synthetic input.	`high`	Scoped honestly. Docstring now states the field-coverage limitation and points at `collapsed-field-inplace.test.ts` as the real field-bearing coverage layer.
Independent pairs miss small-edit / run-boundary regime.	`medium`	Scoped honestly. Docstring states the limitation and points at the fixture-based round-trip tests. Mutation-derived property deferred as a known follow-on.
Redundant `<>&` input filter (synthetic builder already escapes).	nit	Removed.
numRuns of 50 was conservative.	nit	Raised to 100/property. Local bridge runtime ~3.6s for both properties (200 cases).

Empirical state

Bridge tests in isolation: 2/2 pass, 3.6s total at numRuns=100 per property.
Full docx-core suite: 84 files, 1231 passed + 1 skipped.
check:spec-coverage: PASS.
Build + lint: clean.

Known follow-ons (deferred, not blockers for the draft)

Mutation-derived revised = mutate(original) property to cover the small-edit regime.
Field-bearing arbitrary to make INV-FIELD-001 meaningfully exercise w:fldChar / w:instrText markup.
Stage 6: lean-build GitHub Actions job (separate stretch goal).

PR stays in draft.

Engineering-internal tracker sitting next to verification/lean/. Documents what's complete (Tier 1 LCS soundness, Tier 1.5 sorry'd specs + fast-check bridge, Tier 1.6 CI in flight) and what's open (Tier 2 definitional OoxmlDoc model + closed proofs of Spec.lean invariants, Tier 2.5 Lean↔TS equivalence, Tier 3 reconstruction invariants, Tier 3+ full engine). Uses wide error bars (e.g. 4-12 months for Tier 2) since the dominant cost is OOXML-modeling decisions whose tractability is unknown until attempted, not theorem-prover keystrokes. Explicitly out of OpenSpec for now — the natural next OpenSpec change is "build a definitional OoxmlDoc subset and close INV-FIELD-001 against it," which happens when Tier 2 actually starts. Refs PR #164.

…audits Stage 6 of the Lean 4 verification spike. New workflow runs `lake build` in verification/lean/ on PRs and pushes that touch the Lean directory or this workflow file. Three guarantees enforced per run: - lake build succeeds (proofs check) - LeanSpike/AtomsEqual.lean and LeanSpike/Lcs.lean stay zero-sorry - LeanSpike/Spec.lean has exactly 2 sorries (the intentional spec placeholders) Performance: caches ~/.elan and verification/lean/.lake keyed on lean-toolchain + lake-manifest.json. Uses `lake exe cache get` to pull mathlib pre-built oleans from the community CDN, keeping a cold run under ~5 minutes vs ~30 minutes compiling mathlib from source. Install/cache-get steps have retry loops mirroring ci.yml's network-retry style. Workflow is NOT in branch protection — fails surface but don't block merging yet. Flip to blocking once first green run lands on main. Refs PR #164.

Peer-review fixups on Stage 6: 1. (HIGH, Codex+Gemini) Drop `lake update`. The cache key hashes lean-toolchain + lake-manifest.json, but `lake update` can mutate both files (transitive deps pinned to mutable refs like batteries@main and aesop@master). On cache miss, actions/cache would save .lake + ~/.elan under the key derived from committed files while the contents reflect a mutated manifest, poisoning future restores. The committed lake-manifest.json is now the sole source of truth in CI. 2. (MEDIUM, Codex) Make `lake exe cache get` best-effort via continue-on-error. A CDN outage was previously a hard failure even though `lake build` would still succeed (compiling mathlib from source ~30 min vs ~30 s with cached oleans). `lake build` is now the authoritative step; cache-get is a pure optimization. Refs PR #164.

First lean-build run on a17d2e1 exposed a latent bug: ripgrep is NOT pre-installed on ubuntu-latest runners. The Stage 1-3 audit silently passed because `if rg -n "sorry" ...` evaluates exit 127 (rg-not-found) as false — no error path runs, a false-negative audit that would not catch a real sorry leak. The Spec.lean audit failed loudly only because its `|| echo 0` defaulted the count to 0. Switching both audits to grep (POSIX-guaranteed on ubuntu-latest). Also hardened the count fallback from `|| echo 0` to `|| true` so grep's own "0 matches" output (exit 1) doesn't double-emit a count value. The substantive Stage 6 design — elan install, mathlib cache-get, lake build — was empirically validated on the prior run: cold runner compiled and proof-checked the spike in ~2m26s, well under the 30-min timeout. Refs PR #164.

stevenobiajulu · 2026-05-11T12:48:18Z

Stage 6 — lean-build CI (and ROADMAP)

Commits this round:

b10335f — verification/ROADMAP.md (Tier 1/2/3 tracker, lightweight engineering-internal artifact, ~169 lines)
014e197 — initial lean-build workflow
a17d2e1 — peer-review fixup (drop lake update, cache-get best-effort)
5504414 — second fixup (grep instead of rg — rg not pre-installed on ubuntu-latest)

Peer review

Round	Reviewer	Verdict
Stage 6 initial	Codex (`high`+`medium`)	`lake update` is a cache-key footgun; `cache get` failure shouldn't be hard-fail
Stage 6 initial	Gemini	concurred on `lake update`
Stage 6 fixup 1	Codex re-review	"Fixup correctly resolves both findings; no further changes"
Stage 6 fixup 2	First actual CI run	exposed rg-not-on-ubuntu-latest false-negative; grep fix in `5504414`

actions/cache@0057852bfaa89a56745cba8c7296529d2fc39830 # v4.3.0 SHA independently verified via git ls-remote https://github.com/actions/cache refs/tags/v4.3.0.

Empirical CI validation

Cold-cache run on 5504414:

Elan install: ✓
lake exe cache get on cold workspace: ✓ (bootstraps package resolution from committed lake-manifest.json — no lake update needed)
lake build: ✓ proofs check from a fully cold runner
Audit Stage 1-3 zero-sorry: ✓
Audit Spec.lean exactly 2 sorries: ✓
Total runtime: 5m14s cold-cache. Subsequent runs hit cache; expect <1 min warm.

Roadmap

verification/ROADMAP.md documents:

Tier 1 (LCS soundness) — DONE
Tier 1.5 (sorry'd specs + fast-check bridge) — DONE
Tier 1.6 (CI gate) — DONE (this round)
Tier 2 (definitional OoxmlDoc model + closed proofs) — NOT STARTED (4-12 months)
Tier 2.5 (Lean↔TS equivalence + projection broadening) — NOT STARTED (2-6 months)
Tier 3 (reconstruction invariants) — NOT STARTED (1.5-3 years)
Tier 3+ (full engine) — NOT STARTED (additional 2-4 years)

The spike's narrowed success bar (one proved LCS + 1-2 sorry'd reconstruction specs) is now exceeded. Suggest this is a logical merge point. Branch protection: lean-build workflow is intentionally NOT in branch protection yet; flip it to required after merge.

…hardening, bridge diagnostics) Applies Codex + Gemini final pre-merge peer review: - Spec.lean: compareDocumentXml is now OoxmlDoc → OoxmlDoc → Option OoxmlDoc; INV-FIELD-001 and INV-RT-001 are premised on compareDocumentXml a b = some combined, so doc pairs where inplace mode fails (ContainerResolutionError or every safety pass failing) are formally out of scope. The TS pipeline is partial and the theorems should not claim totality. - lean-build.yml: drop OS-only restore-keys (could restore stale .lake/elan across toolchain/manifest changes); drop continue-on-error and "source compile fallback" framing on lake exe cache get (cold mathlib ≈ job timeout). Sorry audits use grep -nw "sorry" and the zero-sorry audit now globs all non-Spec LeanSpike modules. - lean-spec-bridge.test.ts: replace bare `return false` on rebuild fallback with a throw carrying fallbackReason, failedChecks, and the generated paragraph arrays so CI failures are debuggable. Comment header narrowed to scope "fallback ⇒ falsification" claim to the paragraph-only generator family. - ROADMAP.md: Tier 1.6 marked COMPLETE; header date bumped. - Lcs.lean: caveat comment moved to atomsEqual_implies_eq theorem site, not just README — load-bearing lemma overfits the 3-field projection. - AtomsEqual.lean: rewrite misleading "regardless of whether sha1Hash matches" docstring (atomsEqual short-circuits on hash inequality; the theorem just omits hash equality from its conclusion). Verification: lake build clean (2 expected sorries at Spec.lean:66,88) · npm -w @usejunior/docx-core run build clean · tsc --noEmit clean · lean-spec-bridge.test.ts 2/2 pass, 0 falsifications · full docx-core test:run 1231 pass + 1 skip (baseline preserved).

stevenobiajulu · 2026-05-11T15:07:31Z

✅ Post-merge smoke passed

Merged: aa9b54c5a10e9404f65f403865f79982823f3a13
Built from: main @ aa9b54c5a10e9404f65f403865f79982823f3a13
Smoke: clean npm run build → npm install → npm run lint:workspaces → npm run test:run → lake build + CI audits

Steps

✅ build (all workspaces clean, including new fast-check dep wired through)
✅ npm install (refreshed lockfile picked up fast-check added in this PR)
✅ lint:workspaces (only pre-existing unrelated eslint warning in docx-mcp/src/cli/commands/edit.test.ts:133)
✅ test:run across all workspaces — 2092 passed, 36 skipped, 169 files:
- docx-core: 1237 pass + 1 skip (includes the new lean-spec-bridge.test.ts 2 properties × 100 runs, 0 falsifications)
- docx-mcp: 736 pass
- google-docs-core: 116 pass + 35 skip
- safe-docx-mcpb: 3 pass
✅ lake build in verification/lean/ (3297 jobs, mathlib v4.29.1 oleans via cache)
✅ Lean sorry audits: zero in non-Spec modules; exactly 2 in Spec.lean (inv_field_001, inv_rt_001) — matches the CI workflow expectation

Notes for triage

Two CI checks failed at merge time but were not required by branch protection: actionlint and gemini-manifest. These pre-date PR chore(verification): atom-level Lean 4 spike with proofs + bridge + CI #164 (see existing repo state on main); worth a separate issue if not already tracked.
After pull, npm install is required before lint/tests can run because fast-check was newly added.

Log: /tmp/automerge-smoke-164-1778510266.log (local).

…el + inv_field_001 closure (#202) * chore(verification): scaffold OpenSpec change for Tier 2 OoxmlDoc model + inv_field_001 closure Authors the OpenSpec change `add-ooxml-doc-subset-and-inv-field-001-proof` to kick off Tier 2 of the Lean 4 verification roadmap. Two rounds of gemini + codex CLI peer review have been incorporated. The proposal does NOT implement Tier 2; it scopes the work and decomposes it into eight single-`/codex-implement`-run-sized tasks (T1, T2, T3a, T3b, T4a, T4b, T5a, T5b/T6) that successor PRs against this change will land. Key decisions captured: - Model boundary: small tree-structured syntactic OOXML subset (paragraphs → blocks with `Ins`/`Del`/`MoveFrom`/`MoveTo` wrappers → runs → atoms including `DelInstrText`). Mirrors `trackChangesAcceptorAst.ts:368-659` structurally; deliberately narrower than full `ComparisonUnitAtom`. - Closure strategy: introduce a recursive well-formedness predicate `recursivelyWellformed` (whole-doc `validateFieldStructure` plus every wrapper subtree `fieldSelfContained`) and prove a preservation lemma over the Lean model. The `Spec.lean:71` sorry closes by composing the lemma with a single new named axiom `compareDocumentXml_output_recursivelyWellformed`, which becomes the load-bearing residual obligation Tier 3 will discharge. - `fieldSelfContained` strengthening (round-2 peer review fix): the counterexample `Begin, Separate, Ins(Del(End, Begin), InstrText), End` showed that per-subtree `validateFieldStructure` is too weak, so wrapper subtrees must enter and exit the depth/seenSeparator walk at `(0, false)`. - `inv_rt_001` (Spec.lean:95 sorry), Tier 2.5 (Lean↔TS equivalence, broader projection), and Tier 3 (definitional `compareDocumentXml`) are explicitly out of scope for this change. Also updates `verification/ROADMAP.md` line 3 to reflect PR #164 merged and this change in progress (the line was stale, calling PR #164 a draft when it merged 2026-05-11). `openspec validate add-ooxml-doc-subset-and-inv-field-001-proof --strict` passes. The proposal is opened as a draft PR so the model-boundary and predicate-strengthening decisions can take further peer-review iteration in PR comments before any implementation work begins. Ref: #201 (tracking issue), #164 (Tier 1 spike). * chore(verification): incorporate round-3 peer review (stack-valued field context, T0 generic-lemma split) Round 3 peer review (codex + gemini CLI, 2026-05-13) caught three further counterexample families that defeat any "standalone walk from (0, false) ends at (0, false)" predicate: - DelInstrText is unchecked: Del(DelInstrText) breaks reject after rewrite - subtree consumes outer field state via leading End/Separate: Begin, Ins(End) breaks reject; Begin, Del(End) breaks accept - single-boolean seenSeparator reset: [Begin, Separate, Del(Begin, End), InstrText, End] slips through The fix tracks seenSeparator as a depth-indexed stack (FieldCtx := List Bool), exactly mirroring the TS engine's pastSeparatorAtDepth at pipeline.ts:374-389, and strengthens the per-subtree predicate to fieldContextNeutral: the subtree is invariant under any outer field-stack context. Other revisions in this commit: - Pull shared walk-over-append + context-extension + DelInstrText rewrite lemmas out into a new T0 task (Tier2/WalkLemmas.lean) so accept and reject halves consume identical generic results. - Fix DelInstrText locality claim: rewrite is global at the operation level (after both Del and MoveFrom unwraps complete, per TS lines 602-616); locality is enforced by recursivelyWellformed on the input. - Tone down empirical wording in the new axiom's docstring; scope it to this repo's inplace atomizer rather than to OOXML comparison engines in general. - Add scenario [LEAN-T2-04](f) covering the DelInstrText locality and reject ordering documentation.

…208) * feat(verification): close inv_field_001 with Tier 2 OoxmlDoc subset Implements OpenSpec change `add-ooxml-doc-subset-and-inv-field-001-proof` (merged 2026-05-14 via PR #202). Replaces the uninterpreted document-level axioms in `LeanSpike/Spec.lean` with a definitional Lean OOXML subset under `verification/lean/Tier2/` and proves the field-structure preservation lemma machine-checked. `inv_field_001` is closed by composing that lemma with a single named residual axiom `compareDocumentXml_output_recursivelyWellformed`, which captures the only remaining assumption (discharging it by modeling `compareDocumentXml` definitionally is Tier 3 work). `inv_rt_001` remains `sorry`'d (deferred to `add-inv-rt-001-proof`). Why: PR #164 shipped `Spec.lean` over fully uninterpreted axioms, leaving both invariants as `sorry`. Closing `inv_field_001` requires either a definitional comparison model (Tier 3) or carrying an explicit, named residual obligation. Three peer-review rounds (codex + gemini CLI) drove the design to a stack-valued field context exactly mirroring the TS engine's `pastSeparatorAtDepth: number[]` and the strictly stronger `fieldContextNeutral` per-subtree predicate. What's delivered: - `Tier2/OoxmlModel.lean` — tree-structured OOXML subset. - `Tier2/FieldStructure.lean` — stack-valued walk, `validateFieldStructure`, `fieldContextNeutral`, `recursivelyWellformed`. - `Tier2/WalkLemmas.lean` — `walkBlocks_append`, neutrality no-op, `neutral_balanced` (via a tall-stack length-tracking argument so no `end` underflows), `delInstrText_rewrite_safe`. - `Tier2/AcceptReject.lean` — definitional `accept` / `reject`. - `Tier2/InvFieldOne.lean` — closed `field_structure_preserved` (zero `sorry`). - `LeanSpike/Spec.lean` — `OoxmlDoc` / `acceptAllChanges` / `rejectAllChanges` / `validateFieldStructure` rewired from `axiom` to Tier 2 definitions; new named residual axiom; `inv_field_001` proof. - `lean-build.yml` — non-Spec zero-sorry audit extended to `Tier2/`; Spec.lean audit updated from 2 sorries to 1. - `lean-spec-bridge.test.ts` — one field-bearing fixture case exercising a TS analogue of `recursivelyWellformed` as a falsifiability layer for the new axiom. - `Tier2/README.md`, `verification/lean/README.md`, `verification/ROADMAP.md` — scope, residual obligations, status. Verification: `lake build` clean with one expected `sorry` warning (`inv_rt_001`). `#print axioms inv_field_001` depends only on `propext`, `Quot.sound`, `compareDocumentXml`, and the one residual axiom — no `sorryAx`. Full `lean-spec-bridge.test.ts` suite (5 tests) passes. `tsc --noEmit` and ESLint clean on the test file. `openspec validate --strict` passes; all 10 tasks checked off. Ref: #201 * feat(verification): extend Tier2 Block with transparent .other container; add delInstrText falsifiability fixtures Closes peer-review gap B (subset coverage) and tightens gap C (axiom evidence) on PR #208. Both surfaced by Codex + Gemini peer review (commit-comment thread) and an ECMA-376 Part 4 deep-research report. ## Lean changes Adds a 6th `Block` constructor `Block.other (tag : String) (children : List Block)` to model OOXML structural containers that the field-context walk MUST descend through transparently but that are NOT track-change wrappers: `w:tbl`, `w:tr`, `w:tc`, `w:sdt`, `w:sdtContent`, `w:hyperlink`, bookmark elements, etc. Before this change the Lean model implicitly assumed inplace output contained no such containers — `compareDocumentXml : OoxmlDoc → OoxmlDoc → Option OoxmlDoc` was side-stepping a lossy projection. - `Tier2/OoxmlModel.lean`: add `.other` constructor with docstring explaining the modeling intent. - `Tier2/FieldStructure.lean`: `walkBlocks`, `countBlocks` descend through `.other` transparently. `wrapperSubtreesBlocks` does NOT emit the children as a wrapper subtree (only wrappers nested inside `.other` are emitted) — `.other` is structural, not a track-change wrapper. - `Tier2/AcceptReject.lean`: `acceptBlocks`, `rejectBlocks`, `renameBlocks` preserve the `.other` container with its tag while recursively processing children. Accept/reject don't unwrap or drop the container itself. - `Tier2/WalkLemmas.lean`: add `case7` to `walkBlocks_invalid`, `walkBlocks_append`, `countBlocks_append`, and the inductive `walkBlocks_tall` key block. - `Tier2/InvFieldOne.lean`: add `case7` to every `induction ... using *.induct` (walkBlocks_renameBlocks, countBlocks_renameBlocks, walkBlocks_acceptBlocks, walkBlocks_rejectBlocks, both countBlocks_*_balance). Introduce `allNeutral_other` helper that does NOT extract a `fieldContextNeutral` obligation on the container's children (since `.other` is not in `wrapperSubtreesBlocks`). `acceptBlocks_append` / `rejectBlocks_append` / `renameBlocks_append` and `accept_blocks` / `reject_blocks` use `cases b <;> simp [...]` patterns and require no edit — Lean enumerates all 6 constructors. ## Spec.lean axiom docstring rewrite Honestly states the engine-specific justification for the `∀ ctx` neutrality strength: safe-docx's current inplace atomizer emits whole field sequences as single track-change wrappers (verified at `inPlaceModifier.ts:717, 938, 1505, 1671, 1957, 2300` and `collapsed-field-inplace.test.ts:211`). The axiom is NOT spec-conformant: under ECMA-376 Part 4 a conformant emitter must fragment fields across wrapper boundaries when modifying them, and fragmented wrappers are not `fieldContextNeutral` under `∀ ctx`. When the engine becomes ECMA-376-conformant, this axiom and the per-subtree neutrality predicate will need replacement with a document-level form (follow-up PR-B tracked in plan). ## TS bridge fixtures (gap C) Adds three new tests to `lean-spec-bridge.test.ts`: 1. **NUMPAGES deletion fixture** — original has the complete field; revised deletes it. Exercises the `w:delInstrText` atom case in `isFieldContextNeutral`, which the existing NUMPAGES insertion fixture never reaches. 2. **Regression guard — `isFieldContextNeutral` rejects non-neutral shapes**: standalone `w:fldChar separate`, standalone `w:fldChar end`, and a `[begin, separate]` fragment. Crafted DOM-level XML, not full DOCX. 3. **Regression guard — `isFieldContextNeutral` accepts a complete self-contained field wrapper**: the canonical NUMPAGES-in-w:ins shape the engine actually emits. All 8 bridge tests pass locally; typecheck and lint clean. ## Out of scope (deferred to PR-B) The `∀ ctx` → document-level predicate weakening. Codex's review concluded that the current engine satisfies the existing `∀ ctx` predicate, so the weakening is not required for shipping PR #208. PR-B will land after the engine fragmentation conformance work (new GH issue to be filed) is scoped. Ref: #208 * docs(verification): reference issue #217 in inv_field_001 residual axiom docstring Replaces the placeholder "see follow-up issue" with the concrete cross-reference to the engine fragmentation conformance work tracked in #217. No behavior change. Ref: #208, #217 --------- Co-authored-by: Steven Obiajulu <stevenobiajulu@Stevens-MacBook-Pro.local>

vercel Bot deployed to Preview May 6, 2026 22:10 View deployment

stevenobiajulu changed the title ~~spike(verification): Lean 4 verification of LCS subroutine — Stages 1+2 (zero-sorry)~~ chore(verification): prove Lean atom-level LCS invariants May 6, 2026

github-actions Bot added the chore label May 6, 2026

stevenobiajulu force-pushed the spike/lean4-lcs-verification branch from aa51110 to 0dd683f Compare May 10, 2026 19:15

vercel Bot deployed to Preview May 10, 2026 19:16 View deployment

vercel Bot deployed to Preview May 10, 2026 19:34 View deployment

stevenobiajulu changed the title ~~chore(verification): prove Lean atom-level LCS invariants~~ chore(verification): Lean atom-level LCS proofs + sorry'd Spec.lean May 10, 2026

stevenobiajulu added 6 commits May 10, 2026 20:00

docs(verification): fix rejectAllChanges citation off-by-one (519-625…

9524f30

… -> 519-626)

stevenobiajulu force-pushed the spike/lean4-lcs-verification branch from a1ced8c to 569902c Compare May 11, 2026 00:10

vercel Bot deployed to Preview May 11, 2026 00:10 View deployment

vercel Bot deployed to Preview May 11, 2026 00:20 View deployment

stevenobiajulu changed the title ~~chore(verification): Lean atom-level LCS proofs + sorry'd Spec.lean~~ chore(verification): atom-level LCS proofs, sorry'd Spec.lean, fast-check bridge May 11, 2026

vercel Bot deployed to Preview May 11, 2026 01:30 View deployment

vercel Bot deployed to Preview May 11, 2026 01:36 View deployment

vercel Bot deployed to Preview May 11, 2026 01:49 View deployment

vercel Bot deployed to Preview May 11, 2026 12:42 View deployment

stevenobiajulu changed the title ~~chore(verification): atom-level LCS proofs, sorry'd Spec.lean, fast-check bridge~~ chore(verification): atom-level Lean 4 spike with proofs + bridge + CI May 11, 2026

stevenobiajulu marked this pull request as ready for review May 11, 2026 13:55

vercel Bot deployed to Preview May 11, 2026 14:16 View deployment

stevenobiajulu enabled auto-merge (squash) May 11, 2026 14:16

stevenobiajulu merged commit aa9b54c into main May 11, 2026
21 of 23 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore(verification): atom-level Lean 4 spike with proofs + bridge + CI#164

chore(verification): atom-level Lean 4 spike with proofs + bridge + CI#164
stevenobiajulu merged 12 commits into
mainfrom
spike/lean4-lcs-verification

stevenobiajulu commented May 6, 2026 •

edited

Loading

Uh oh!

vercel Bot commented May 6, 2026 •

edited

Loading

Uh oh!

codecov Bot commented May 6, 2026

Uh oh!

stevenobiajulu commented May 6, 2026

Uh oh!

stevenobiajulu commented May 10, 2026

Uh oh!

stevenobiajulu commented May 11, 2026

Uh oh!

stevenobiajulu commented May 11, 2026

Uh oh!

Uh oh!

stevenobiajulu commented May 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

stevenobiajulu commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What's proved (closed, zero-sorry, machine-checked)

What's stated but not proved (sorry'd specification targets)

What's empirically validated (not proved, but tested on every PR)

Stages

CI

What this PR does NOT cover

Caveats (already documented in verification/lean/README.md)

Verification

Follow-on (deferred to a separate issue)

Uh oh!

vercel Bot commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov Bot commented May 6, 2026

Codecov Report

Uh oh!

stevenobiajulu commented May 6, 2026

Uh oh!

stevenobiajulu commented May 10, 2026

Stage 4 landed (sorry'd Spec.lean, INV-FIELD-001 + INV-RT-001)

What's in Spec.lean

Modeling choices (locked in plan peer-review)

Isolation guarantee

Peer review

Remaining stretch (separate stages)

Uh oh!

stevenobiajulu commented May 11, 2026

Stage 5 — fast-check bridge (peer review round 2)

Round-1 issues addressed in 1129c84

Empirical state

Known follow-ons (deferred, not blockers for the draft)

Uh oh!

stevenobiajulu commented May 11, 2026

Stage 6 — lean-build CI (and ROADMAP)

Peer review

Empirical CI validation

Roadmap

Uh oh!

Uh oh!

stevenobiajulu commented May 11, 2026

✅ Post-merge smoke passed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

stevenobiajulu commented May 6, 2026 •

edited

Loading

Caveats (already documented in `verification/lean/README.md`)

vercel Bot commented May 6, 2026 •

edited

Loading

Round-1 issues addressed in `1129c84`