chore(verification): atom-level Lean 4 spike with proofs + bridge + CI#164
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
|
Stage 3 is now pushed on this branch. The Lean spike now proves INV-LCS-002/003/004 in addition to the earlier soundness results, |
aa51110 to
0dd683f
Compare
Stage 4 landed (sorry'd Spec.lean, INV-FIELD-001 + INV-RT-001)Three commits stacked on the now-rebased branch:
What's in Spec.lean
Modeling choices (locked in plan peer-review)
Isolation guarantee
Peer review
Remaining stretch (separate stages)
PR stays in draft. |
…ndness Stages 1+2 of an experimental Lean 4 verification spike against the inner LCS subroutine of the comparison engine (atomLcs.ts:45-104). Why: - Recurring bug class in reconstruction (e.g. #106 → deferred-adjacent #110/#111 follow-ons) suggests provable invariants would prevent classes of bugs the test corpus alone won't catch. - Validates whether Lean 4 + Mathlib + AI agent assist is tractable against this codebase before committing to broader verification work. - Pascal's wager: bounded downside (agent rental we already pay for), asymmetric upside if the AI-buyer market materializes and starts reading proofs as a procurement signal. What: - New verification/lean/ Lake project pinning Lean 4.29.1 + mathlib4 v4.29.1. Outside npm workspaces; build/distribution via Lake only. - LeanSpike/Atom.lean: 3-field projected Atom type (sha1Hash, textContent, tagName) — only what atomsEqual consults; full ComparisonUnitAtom has ~20 fields, the rest are deliberately omitted from the model. - LeanSpike/AtomsEqual.lean: atomsEqual function + INV-ATOMSEQ-001 proved zero-sorry. Captures hash-collision safety: atomsEqual = true implies textContent + tagName equality regardless of sha1Hash. - LeanSpike/Lcs.lean: Lean 4 model of the Wagner-Fischer LCS as a recursive computation over reversed inputs — an alternate executable spec, NOT a line-for-line port of the TS DP+backtracking. Proves INV-LCS-001 (value-level subsequence soundness) zero-sorry. - README.md documents proof scope, the LEAN-vs-TS specification gap, build instructions, and explicit caveats peer review surfaced. Verification: - lake build succeeds in 15s, 3289 jobs, zero warnings, zero errors, zero sorry. - Peer-reviewed by Gemini and Codex CLI in parallel; both approved. - Codex empirically tested 1.19M sequence-pair comparisons (length ≤ 6, 3-symbol alphabet) — Lean recursive algorithm and TS DP+backtracking produce identical matches in all cases. Strong evidence the alternate executable spec is observationally equivalent for small inputs. Not covered (deferred to future stages): - INV-LCS-002 (optimality), INV-LCS-003 (strict monotonicity of match indices), INV-LCS-004 (partition completeness) - Reconstruction invariants, round-trip text equality - Field-structure / document-shape preservation - The hierarchicalCompare paragraph-level layer (computeAtomLcs is the inner subroutine; hierarchicalCompare invokes it per matched paragraph group) - The TypeScript implementation itself (proof targets the Lean model; extensional equivalence is a deferred obligation) Does not modify any TypeScript code. The npm workspaces are unaffected; verification/lean/ contains no package.json and is not added to the workspaces array. Ref: experimental, no issue number; multi-stage spike, expect more commits on this branch before any merge consideration.
Complete Stage 3 of the Lean LCS spike so the model now covers optimality, index monotonicity, and partition completeness in addition to the earlier soundness results. This tightens the proof story around the atom-level subroutine before any attempt to extend the verification work toward reconstruction, where the motivating regressions actually live. Ref: #110 Ref: #111
…T-001 Stage 4 of the Lean 4 verification spike. Adds verification/lean/LeanSpike/Spec.lean with named specification targets (intentionally sorry'd) over a fully uninterpreted Lean signature. INV-FIELD-001 scopes validateFieldStructure to compareDocumentXml output, matching evaluateSafetyChecks at pipeline.ts:404-430. INV-RT-001 is paired: both acceptAllChanges->b-text and rejectAllChanges->a-text under normalizeText, matching gold-standard round-trip tests at round-trip-inplace.test.ts:4 and nvca-coi-regression.test.ts:77-103. axiom (not opaque def) for all 7 declarations -- bodyless opaque fails the Inhabited-synth check in Lean 4.29.1. Stage 1-3 modules (AtomsEqual.lean, Lcs.lean) remain zero-sorry.
…inplace mode Peer-review (Codex) findings on the Stage 4 commit: 1. compareDocumentXml docstring read as "any compareDocumentsAtomizer output." evaluateSafetyChecks (pipeline.ts:404-440) is gated on the inplace branch at pipeline.ts:669; rebuild-mode output bypasses the safety check entirely. Narrow the docstring to the inplace-mode comparison candidate so INV-FIELD-001 and INV-RT-001 don't overclaim against rebuild mode. 2. Citation drift in Spec.lean and README.md: - newDocumentXml: pipeline.ts:608 -> 635-650 - validateFieldStructure call site: pipeline.ts:404-430 -> 404-440 (call at :439-440) - validateFieldStructure body: pipeline.ts:352-401 -> 352-402 - acceptAllChanges: trackChangesAcceptorAst.ts:368-518 -> 368-506 - rejectAllChanges: trackChangesAcceptorAst.ts:519-659 -> 519-625 - extractTextWithParagraphs: trackChangesAcceptorAst.ts:660-699 -> 660-688 - normalizeText: trackChangesAcceptorAst.ts:701-710 -> 701-711 - round-trip-inplace.test.ts:4 (banner) -> :56-63 and :87-94 (paired assertions) - OoxmlDoc pipeline range: 352-783 -> 352-817 3. README framing: "TS checks and tests justify these statements" softened to "motivate these targets" -- with an uninterpreted axiom surface and two sorrys, the TS evidence cannot justify a universal Lean claim on its own. No theorem statements changed; no axiom declarations added or removed; no sorry changed; Stage 1-3 modules remain zero-sorry.
… theorems Stage 5 of the Lean 4 verification spike. Empirical property-test bridge that exercises the sorry'd Lean theorems in verification/lean/LeanSpike/Spec.lean against the live TypeScript inplace comparison engine. INV-FIELD-001 and INV-RT-001 hold on 50 independent random (original, revised) synthetic-document pairs each. fast-check counterexample shrinking would surface any falsifying input. The Lean theorems remain sorry'd intentionally — this is an empirical bridge, not a proof. Also exports validateFieldStructure from pipeline.ts so the bridge can call it directly without re-implementing the predicate. Refs PR #164.
a1ced8c to
569902c
Compare
… scope Codex peer-review fixups on Stage 5: 1. Gate both properties on `reconstructionModeUsed === 'inplace'`. The engine can request inplace and fall back to rebuild after the pipeline safety check (pipeline.ts:404-440). Without this gate, a future regression could silently fall back and leave both properties vacuously green while no longer testing the Spec.lean surface. A fallback now fails the property (which is correct: fallback means INV-FIELD-001 or INV-RT-001 was internally falsified on the inplace candidate). 2. Document INV-FIELD-001's coverage limit explicitly. The arbitrary produces field-free synthetic paragraphs; field-bearing coverage lives in collapsed-field-inplace.test.ts. Independent pairs predominantly test wholesale insert/delete; small-edit coverage lives in the fixture-based round-trip tests. 3. Drop the redundant `<>&` input filter (synthetic builder already escapes them). 4. Raise numRuns from 50 to 100 per property — local bridge runtime ~3.6s for 200 total cases, well under the 60s test timeout. 5. Move fc.assert into the `then` step for cleaner Allure BDD framing. Refs PR #164.
Stage 5 — fast-check bridge (peer review round 2)Commits: Round-2 verdict (Codex): "No new findings." Round-1 issues addressed in
|
| Issue | Severity | Status |
|---|---|---|
Bridge didn't gate on reconstructionModeUsed === 'inplace' — silent rebuild fallback could leave both properties vacuously green. |
high |
Fixed. Both properties now return false if modeUsed !== 'inplace'. Local: 200/200 cases stay on the inplace path (0 fallbacks). |
INV-FIELD-001 near-vacuous on field-free synthetic input. |
high |
Scoped honestly. Docstring now states the field-coverage limitation and points at collapsed-field-inplace.test.ts as the real field-bearing coverage layer. |
| Independent pairs miss small-edit / run-boundary regime. | medium |
Scoped honestly. Docstring states the limitation and points at the fixture-based round-trip tests. Mutation-derived property deferred as a known follow-on. |
Redundant <>& input filter (synthetic builder already escapes). |
nit | Removed. |
| numRuns of 50 was conservative. | nit | Raised to 100/property. Local bridge runtime ~3.6s for both properties (200 cases). |
Empirical state
- Bridge tests in isolation: 2/2 pass, 3.6s total at numRuns=100 per property.
- Full
docx-coresuite: 84 files, 1231 passed + 1 skipped. check:spec-coverage: PASS.- Build + lint: clean.
Known follow-ons (deferred, not blockers for the draft)
- Mutation-derived
revised = mutate(original)property to cover the small-edit regime. - Field-bearing arbitrary to make INV-FIELD-001 meaningfully exercise
w:fldChar/w:instrTextmarkup. - Stage 6:
lean-buildGitHub Actions job (separate stretch goal).
PR stays in draft.
Engineering-internal tracker sitting next to verification/lean/. Documents what's complete (Tier 1 LCS soundness, Tier 1.5 sorry'd specs + fast-check bridge, Tier 1.6 CI in flight) and what's open (Tier 2 definitional OoxmlDoc model + closed proofs of Spec.lean invariants, Tier 2.5 Lean↔TS equivalence, Tier 3 reconstruction invariants, Tier 3+ full engine). Uses wide error bars (e.g. 4-12 months for Tier 2) since the dominant cost is OOXML-modeling decisions whose tractability is unknown until attempted, not theorem-prover keystrokes. Explicitly out of OpenSpec for now — the natural next OpenSpec change is "build a definitional OoxmlDoc subset and close INV-FIELD-001 against it," which happens when Tier 2 actually starts. Refs PR #164.
…audits Stage 6 of the Lean 4 verification spike. New workflow runs `lake build` in verification/lean/ on PRs and pushes that touch the Lean directory or this workflow file. Three guarantees enforced per run: - lake build succeeds (proofs check) - LeanSpike/AtomsEqual.lean and LeanSpike/Lcs.lean stay zero-sorry - LeanSpike/Spec.lean has exactly 2 sorries (the intentional spec placeholders) Performance: caches ~/.elan and verification/lean/.lake keyed on lean-toolchain + lake-manifest.json. Uses `lake exe cache get` to pull mathlib pre-built oleans from the community CDN, keeping a cold run under ~5 minutes vs ~30 minutes compiling mathlib from source. Install/cache-get steps have retry loops mirroring ci.yml's network-retry style. Workflow is NOT in branch protection — fails surface but don't block merging yet. Flip to blocking once first green run lands on main. Refs PR #164.
Peer-review fixups on Stage 6: 1. (HIGH, Codex+Gemini) Drop `lake update`. The cache key hashes lean-toolchain + lake-manifest.json, but `lake update` can mutate both files (transitive deps pinned to mutable refs like batteries@main and aesop@master). On cache miss, actions/cache would save .lake + ~/.elan under the key derived from committed files while the contents reflect a mutated manifest, poisoning future restores. The committed lake-manifest.json is now the sole source of truth in CI. 2. (MEDIUM, Codex) Make `lake exe cache get` best-effort via continue-on-error. A CDN outage was previously a hard failure even though `lake build` would still succeed (compiling mathlib from source ~30 min vs ~30 s with cached oleans). `lake build` is now the authoritative step; cache-get is a pure optimization. Refs PR #164.
First lean-build run on a17d2e1 exposed a latent bug: ripgrep is NOT pre-installed on ubuntu-latest runners. The Stage 1-3 audit silently passed because `if rg -n "sorry" ...` evaluates exit 127 (rg-not-found) as false — no error path runs, a false-negative audit that would not catch a real sorry leak. The Spec.lean audit failed loudly only because its `|| echo 0` defaulted the count to 0. Switching both audits to grep (POSIX-guaranteed on ubuntu-latest). Also hardened the count fallback from `|| echo 0` to `|| true` so grep's own "0 matches" output (exit 1) doesn't double-emit a count value. The substantive Stage 6 design — elan install, mathlib cache-get, lake build — was empirically validated on the prior run: cold runner compiled and proof-checked the spike in ~2m26s, well under the 30-min timeout. Refs PR #164.
Stage 6 — lean-build CI (and ROADMAP)Commits this round:
Peer review
Empirical CI validationCold-cache run on
Roadmap
The spike's narrowed success bar (one proved LCS + 1-2 sorry'd reconstruction specs) is now exceeded. Suggest this is a logical merge point. Branch protection: |
…hardening, bridge diagnostics) Applies Codex + Gemini final pre-merge peer review: - Spec.lean: compareDocumentXml is now OoxmlDoc → OoxmlDoc → Option OoxmlDoc; INV-FIELD-001 and INV-RT-001 are premised on compareDocumentXml a b = some combined, so doc pairs where inplace mode fails (ContainerResolutionError or every safety pass failing) are formally out of scope. The TS pipeline is partial and the theorems should not claim totality. - lean-build.yml: drop OS-only restore-keys (could restore stale .lake/elan across toolchain/manifest changes); drop continue-on-error and "source compile fallback" framing on lake exe cache get (cold mathlib ≈ job timeout). Sorry audits use grep -nw "sorry" and the zero-sorry audit now globs all non-Spec LeanSpike modules. - lean-spec-bridge.test.ts: replace bare `return false` on rebuild fallback with a throw carrying fallbackReason, failedChecks, and the generated paragraph arrays so CI failures are debuggable. Comment header narrowed to scope "fallback ⇒ falsification" claim to the paragraph-only generator family. - ROADMAP.md: Tier 1.6 marked COMPLETE; header date bumped. - Lcs.lean: caveat comment moved to atomsEqual_implies_eq theorem site, not just README — load-bearing lemma overfits the 3-field projection. - AtomsEqual.lean: rewrite misleading "regardless of whether sha1Hash matches" docstring (atomsEqual short-circuits on hash inequality; the theorem just omits hash equality from its conclusion). Verification: lake build clean (2 expected sorries at Spec.lean:66,88) · npm -w @usejunior/docx-core run build clean · tsc --noEmit clean · lean-spec-bridge.test.ts 2/2 pass, 0 falsifications · full docx-core test:run 1231 pass + 1 skip (baseline preserved).
✅ Post-merge smoke passedMerged: Steps
Notes for triage
Log: |
…el + inv_field_001 closure (#202) * chore(verification): scaffold OpenSpec change for Tier 2 OoxmlDoc model + inv_field_001 closure Authors the OpenSpec change `add-ooxml-doc-subset-and-inv-field-001-proof` to kick off Tier 2 of the Lean 4 verification roadmap. Two rounds of gemini + codex CLI peer review have been incorporated. The proposal does NOT implement Tier 2; it scopes the work and decomposes it into eight single-`/codex-implement`-run-sized tasks (T1, T2, T3a, T3b, T4a, T4b, T5a, T5b/T6) that successor PRs against this change will land. Key decisions captured: - Model boundary: small tree-structured syntactic OOXML subset (paragraphs → blocks with `Ins`/`Del`/`MoveFrom`/`MoveTo` wrappers → runs → atoms including `DelInstrText`). Mirrors `trackChangesAcceptorAst.ts:368-659` structurally; deliberately narrower than full `ComparisonUnitAtom`. - Closure strategy: introduce a recursive well-formedness predicate `recursivelyWellformed` (whole-doc `validateFieldStructure` plus every wrapper subtree `fieldSelfContained`) and prove a preservation lemma over the Lean model. The `Spec.lean:71` sorry closes by composing the lemma with a single new named axiom `compareDocumentXml_output_recursivelyWellformed`, which becomes the load-bearing residual obligation Tier 3 will discharge. - `fieldSelfContained` strengthening (round-2 peer review fix): the counterexample `Begin, Separate, Ins(Del(End, Begin), InstrText), End` showed that per-subtree `validateFieldStructure` is too weak, so wrapper subtrees must enter and exit the depth/seenSeparator walk at `(0, false)`. - `inv_rt_001` (Spec.lean:95 sorry), Tier 2.5 (Lean↔TS equivalence, broader projection), and Tier 3 (definitional `compareDocumentXml`) are explicitly out of scope for this change. Also updates `verification/ROADMAP.md` line 3 to reflect PR #164 merged and this change in progress (the line was stale, calling PR #164 a draft when it merged 2026-05-11). `openspec validate add-ooxml-doc-subset-and-inv-field-001-proof --strict` passes. The proposal is opened as a draft PR so the model-boundary and predicate-strengthening decisions can take further peer-review iteration in PR comments before any implementation work begins. Ref: #201 (tracking issue), #164 (Tier 1 spike). * chore(verification): incorporate round-3 peer review (stack-valued field context, T0 generic-lemma split) Round 3 peer review (codex + gemini CLI, 2026-05-13) caught three further counterexample families that defeat any "standalone walk from (0, false) ends at (0, false)" predicate: - DelInstrText is unchecked: Del(DelInstrText) breaks reject after rewrite - subtree consumes outer field state via leading End/Separate: Begin, Ins(End) breaks reject; Begin, Del(End) breaks accept - single-boolean seenSeparator reset: [Begin, Separate, Del(Begin, End), InstrText, End] slips through The fix tracks seenSeparator as a depth-indexed stack (FieldCtx := List Bool), exactly mirroring the TS engine's pastSeparatorAtDepth at pipeline.ts:374-389, and strengthens the per-subtree predicate to fieldContextNeutral: the subtree is invariant under any outer field-stack context. Other revisions in this commit: - Pull shared walk-over-append + context-extension + DelInstrText rewrite lemmas out into a new T0 task (Tier2/WalkLemmas.lean) so accept and reject halves consume identical generic results. - Fix DelInstrText locality claim: rewrite is global at the operation level (after both Del and MoveFrom unwraps complete, per TS lines 602-616); locality is enforced by recursivelyWellformed on the input. - Tone down empirical wording in the new axiom's docstring; scope it to this repo's inplace atomizer rather than to OOXML comparison engines in general. - Add scenario [LEAN-T2-04](f) covering the DelInstrText locality and reject ordering documentation.
…208) * feat(verification): close inv_field_001 with Tier 2 OoxmlDoc subset Implements OpenSpec change `add-ooxml-doc-subset-and-inv-field-001-proof` (merged 2026-05-14 via PR #202). Replaces the uninterpreted document-level axioms in `LeanSpike/Spec.lean` with a definitional Lean OOXML subset under `verification/lean/Tier2/` and proves the field-structure preservation lemma machine-checked. `inv_field_001` is closed by composing that lemma with a single named residual axiom `compareDocumentXml_output_recursivelyWellformed`, which captures the only remaining assumption (discharging it by modeling `compareDocumentXml` definitionally is Tier 3 work). `inv_rt_001` remains `sorry`'d (deferred to `add-inv-rt-001-proof`). Why: PR #164 shipped `Spec.lean` over fully uninterpreted axioms, leaving both invariants as `sorry`. Closing `inv_field_001` requires either a definitional comparison model (Tier 3) or carrying an explicit, named residual obligation. Three peer-review rounds (codex + gemini CLI) drove the design to a stack-valued field context exactly mirroring the TS engine's `pastSeparatorAtDepth: number[]` and the strictly stronger `fieldContextNeutral` per-subtree predicate. What's delivered: - `Tier2/OoxmlModel.lean` — tree-structured OOXML subset. - `Tier2/FieldStructure.lean` — stack-valued walk, `validateFieldStructure`, `fieldContextNeutral`, `recursivelyWellformed`. - `Tier2/WalkLemmas.lean` — `walkBlocks_append`, neutrality no-op, `neutral_balanced` (via a tall-stack length-tracking argument so no `end` underflows), `delInstrText_rewrite_safe`. - `Tier2/AcceptReject.lean` — definitional `accept` / `reject`. - `Tier2/InvFieldOne.lean` — closed `field_structure_preserved` (zero `sorry`). - `LeanSpike/Spec.lean` — `OoxmlDoc` / `acceptAllChanges` / `rejectAllChanges` / `validateFieldStructure` rewired from `axiom` to Tier 2 definitions; new named residual axiom; `inv_field_001` proof. - `lean-build.yml` — non-Spec zero-sorry audit extended to `Tier2/`; Spec.lean audit updated from 2 sorries to 1. - `lean-spec-bridge.test.ts` — one field-bearing fixture case exercising a TS analogue of `recursivelyWellformed` as a falsifiability layer for the new axiom. - `Tier2/README.md`, `verification/lean/README.md`, `verification/ROADMAP.md` — scope, residual obligations, status. Verification: `lake build` clean with one expected `sorry` warning (`inv_rt_001`). `#print axioms inv_field_001` depends only on `propext`, `Quot.sound`, `compareDocumentXml`, and the one residual axiom — no `sorryAx`. Full `lean-spec-bridge.test.ts` suite (5 tests) passes. `tsc --noEmit` and ESLint clean on the test file. `openspec validate --strict` passes; all 10 tasks checked off. Ref: #201 * feat(verification): extend Tier2 Block with transparent .other container; add delInstrText falsifiability fixtures Closes peer-review gap B (subset coverage) and tightens gap C (axiom evidence) on PR #208. Both surfaced by Codex + Gemini peer review (commit-comment thread) and an ECMA-376 Part 4 deep-research report. ## Lean changes Adds a 6th `Block` constructor `Block.other (tag : String) (children : List Block)` to model OOXML structural containers that the field-context walk MUST descend through transparently but that are NOT track-change wrappers: `w:tbl`, `w:tr`, `w:tc`, `w:sdt`, `w:sdtContent`, `w:hyperlink`, bookmark elements, etc. Before this change the Lean model implicitly assumed inplace output contained no such containers — `compareDocumentXml : OoxmlDoc → OoxmlDoc → Option OoxmlDoc` was side-stepping a lossy projection. - `Tier2/OoxmlModel.lean`: add `.other` constructor with docstring explaining the modeling intent. - `Tier2/FieldStructure.lean`: `walkBlocks`, `countBlocks` descend through `.other` transparently. `wrapperSubtreesBlocks` does NOT emit the children as a wrapper subtree (only wrappers nested inside `.other` are emitted) — `.other` is structural, not a track-change wrapper. - `Tier2/AcceptReject.lean`: `acceptBlocks`, `rejectBlocks`, `renameBlocks` preserve the `.other` container with its tag while recursively processing children. Accept/reject don't unwrap or drop the container itself. - `Tier2/WalkLemmas.lean`: add `case7` to `walkBlocks_invalid`, `walkBlocks_append`, `countBlocks_append`, and the inductive `walkBlocks_tall` key block. - `Tier2/InvFieldOne.lean`: add `case7` to every `induction ... using *.induct` (walkBlocks_renameBlocks, countBlocks_renameBlocks, walkBlocks_acceptBlocks, walkBlocks_rejectBlocks, both countBlocks_*_balance). Introduce `allNeutral_other` helper that does NOT extract a `fieldContextNeutral` obligation on the container's children (since `.other` is not in `wrapperSubtreesBlocks`). `acceptBlocks_append` / `rejectBlocks_append` / `renameBlocks_append` and `accept_blocks` / `reject_blocks` use `cases b <;> simp [...]` patterns and require no edit — Lean enumerates all 6 constructors. ## Spec.lean axiom docstring rewrite Honestly states the engine-specific justification for the `∀ ctx` neutrality strength: safe-docx's current inplace atomizer emits whole field sequences as single track-change wrappers (verified at `inPlaceModifier.ts:717, 938, 1505, 1671, 1957, 2300` and `collapsed-field-inplace.test.ts:211`). The axiom is NOT spec-conformant: under ECMA-376 Part 4 a conformant emitter must fragment fields across wrapper boundaries when modifying them, and fragmented wrappers are not `fieldContextNeutral` under `∀ ctx`. When the engine becomes ECMA-376-conformant, this axiom and the per-subtree neutrality predicate will need replacement with a document-level form (follow-up PR-B tracked in plan). ## TS bridge fixtures (gap C) Adds three new tests to `lean-spec-bridge.test.ts`: 1. **NUMPAGES deletion fixture** — original has the complete field; revised deletes it. Exercises the `w:delInstrText` atom case in `isFieldContextNeutral`, which the existing NUMPAGES insertion fixture never reaches. 2. **Regression guard — `isFieldContextNeutral` rejects non-neutral shapes**: standalone `w:fldChar separate`, standalone `w:fldChar end`, and a `[begin, separate]` fragment. Crafted DOM-level XML, not full DOCX. 3. **Regression guard — `isFieldContextNeutral` accepts a complete self-contained field wrapper**: the canonical NUMPAGES-in-w:ins shape the engine actually emits. All 8 bridge tests pass locally; typecheck and lint clean. ## Out of scope (deferred to PR-B) The `∀ ctx` → document-level predicate weakening. Codex's review concluded that the current engine satisfies the existing `∀ ctx` predicate, so the weakening is not required for shipping PR #208. PR-B will land after the engine fragmentation conformance work (new GH issue to be filed) is scoped. Ref: #208 * docs(verification): reference issue #217 in inv_field_001 residual axiom docstring Replaces the placeholder "see follow-up issue" with the concrete cross-reference to the engine fragmentation conformance work tracked in #217. No behavior change. Ref: #208, #217 --------- Co-authored-by: Steven Obiajulu <stevenobiajulu@Stevens-MacBook-Pro.local>
Summary
Six-stage Lean 4 verification spike against the safe-docx comparison engine.
The spike answers the time-boxed question "is Lean 4 + mathlib + AI-assisted
proof work tractable here?" with concrete artifacts: 5 closed proofs, 2
intentionally
sorry'd specification targets, a fast-check empiricalfalsifiability bridge, and a CI gate that re-checks the proofs on every PR
touching
verification/lean/**.Scope is deliberately narrow per the time-box and prior peer-review
corrections: the atom-level LCS subroutine and a sorry'd specification
surface, not the full comparison engine. See
verification/ROADMAP.mdforthe full Tier 1 → Tier 3+ trajectory.
What's proved (closed, zero-sorry, machine-checked)
verification/lean/LeanSpike/:AtomsEqual.lean) — hash-collision safety ofatomsEqual.Lcs.lean) — value-level subsequence soundness ofcomputeAtomLcs.original; matched + inserted partitionrevised).What's stated but not proved (sorry'd specification targets)
verification/lean/LeanSpike/Spec.leanover an uninterpreted axiom signature:Closing these is Tier 2 work; see ROADMAP.
What's empirically validated (not proved, but tested on every PR)
packages/docx-core/src/integration/lean-spec-bridge.test.ts— fast-checkproperty tests at 100 runs/property × 2 properties against the live TS
engine, gated on
reconstructionModeUsed === 'inplace'. Surfacesfalsifications of INV-FIELD-001 or INV-RT-001 on randomly-generated synthetic
inputs. Local: 0 falsifications, ~3.6s total per full test:run.
Stages
atomsEqual+ INV-ATOMSEQ-001computeAtomLcs+ Wagner-Fischer brute-force validation (1.19M cases)Spec.leanwith INV-FIELD-001 + INV-RT-001 over uninterpreted axiom signaturebuildSyntheticDocxlean-buildGitHub Actions workflow with mathlib cache + sorry auditsEach stage was peer-reviewed by Codex and (where useful) Gemini in parallel
before merging into the spike branch. See PR comments for round-by-round
review summaries.
CI
New workflow
.github/workflows/lean-build.yml:mainpaths-filtered toverification/lean/**+ the workflow file itself~/.elan+verification/lean/.lakekeyed onlean-toolchain+lake-manifest.jsonlake exe cache get(best-effort) →lake build(authoritative)Spec.leanstays at exactly 2 sorriesWorkflow is not in branch protection yet. After merge, the maintainer
can flip it to required via Settings → Branches → main → Require status checks.
What this PR does NOT cover
Per the ROADMAP, the following remain open:
OoxmlDocmodel + closed proofs ofINV-FIELD-001/INV-RT-001.Atomprojection toward the realComparisonUnitAtom.inPlaceModifier.ts/documentReconstructor.ts(the actual layer where bug classes from Reconstruct paragraph-level markers (commentRange, bookmark) outside <w:r> in rebuild mode #106/Reconstruct moveFromRange*/moveToRange* paragraph-level markers on rebuild #110/Reconstruct permStart/permEnd paragraph-level markers on rebuild #111/bug: inplace mode strips spaces from deleted text in word-split reconstructor #76/Inplace modifier misaligns content in tables with merged cells (w:vMerge/w:gridSpan) #65 live).Estimated remaining effort (with wide error bars): 2.5–5 years of
intermittent agent-driven work to reach Tier 3 complete.
Caveats (already documented in
verification/lean/README.md)atomsEqual_implies_eqholds only becauseAtomis projected to 3 fields; would break under a broader projection toward the realComparisonUnitAtom.collapsed-field-inplace.test.ts; small-edit regime lives inround-trip-inplace.test.tsandnvca-coi-regression.test.ts.Verification
cd verification/lean && lake build(zerosorryin Stage 1-3 modules; exactly 2 inSpec.lean)npm run buildcleannpm run lint:workspacescleannpm run test:run— 1231 pass + 1 skip across docx-corenpm run check:spec-coveragepasslean-buildworkflow green on actual GitHub Actions runner (5m14s cold cache)Follow-on (deferred to a separate issue)
Extend the fast-check bridge's arbitraries to exercise the new
tracked-changes-emitting primitives (recent PRs #137/#138/#139/#140 added
pPrChange/trPrChange/tcPrChangeand ins/del markup emitters infootnote / comment / paragraph-insert primitives). The bridge currently
builds inputs via
buildSyntheticDocxwhich emits plain XML; using thenew primitives as input sources would meaningfully widen INV-FIELD-001 /
INV-RT-001 coverage. Tracked separately to keep this PR scoped.