diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md new file mode 100644 index 0000000..a067586 --- /dev/null +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -0,0 +1,19 @@ +## Summary + + + +## Test Plan + + + +## Edit Algebra (only if PR touches packages/ooxml-swift/Sources/OOXMLSwift/EditAlgebra/) + +- [ ] PR introduces or modifies an `OOXMLEdit` / `WordEdit` enum case +- [ ] ASCII-ladder commutative diagram is attached to this PR description below, OR added inline in design.md / spec.md +- [ ] CD diagram shows: (1) Word UI action or schema-level change, (2) `τ` translation between Word semantics and Swift representation, (3) commutativity claim + +(CD discipline rationale: see `ooxml-edit-isomorphism-foundation` design.md § ADR-002 Worked Examples and `docs/edit-algebra-cd-discipline.md`.) + +## Related Issues + +Refs # diff --git a/docs/edit-algebra-cd-discipline.md b/docs/edit-algebra-cd-discipline.md new file mode 100644 index 0000000..346fd91 --- /dev/null +++ b/docs/edit-algebra-cd-discipline.md @@ -0,0 +1,82 @@ +# `EditAlgebra/` — Edit Type Contract + +This directory is the runtime home for the **Word↔Swift edit-isomorphism** contract pinned in Spectra change `ooxml-edit-isomorphism-foundation` (see `openspec/changes/ooxml-edit-isomorphism-foundation/`). + +## What lives here + +When the deferred Phase 2 implementation Spectra change lands, this directory will contain: + +- `Edit.swift` — `Edit` protocol declaring `apply(to:) throws -> Document` and `lower() -> [OOXMLEdit]`; plus `EditError` enum +- `OOXMLEdit.swift` — syntactic-layer enum (cases address OOXML elements by path) +- `WordEdit.swift` — semantic-layer enum (cases address Word UI verbs) + +Currently this directory holds only this README. The Edit type is documented as a normative contract via the Spectra change's spec; runtime files arrive when the implementation Spectra change executes (see "Status" below). 
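As a rough illustration of the protocol shape listed above, here is a minimal sketch. It is not the Phase 2 implementation: the `Document` struct is a hypothetical stand-in (a flat list of paragraph texts), the `String` payloads on `EditError` are assumed, and only one `OOXMLEdit` case is shown.

```swift
// Hypothetical stand-in for the real Document type: a flat list of paragraph texts.
struct Document: Equatable {
    var paragraphs: [String]
}

// Error cases as named in the design; the String payload types are an assumption.
enum EditError: Error, Equatable {
    case pathNotFound(path: String)
    case preserveViolation(part: String)
}

// The protocol shape described in "What lives here".
protocol Edit {
    func apply(to document: Document) throws -> Document
    func lower() -> [OOXMLEdit]
}

enum OOXMLEdit: Edit, Equatable {
    case insertParagraph(at: Int, content: String)

    func apply(to document: Document) throws -> Document {
        switch self {
        case let .insertParagraph(index, content):
            // Inserting at `count` appends; anything outside 0...count has no target path.
            guard (0...document.paragraphs.count).contains(index) else {
                throw EditError.pathNotFound(path: "body/\(index)")
            }
            var copy = document
            copy.paragraphs.insert(content, at: index)
            return copy
        }
    }

    // A syntactic-layer edit lowers to itself.
    func lower() -> [OOXMLEdit] { [self] }
}

let before = Document(paragraphs: ["intro", "outro"])
let after = try OOXMLEdit.insertParagraph(at: 1, content: "hello").apply(to: before)
print(after.paragraphs)  // prints ["intro", "hello", "outro"]
```

Because `apply(to:)` returns a new value instead of mutating its input, `before` is still available for comparison after the edit, which is what a CD-commute test needs.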
+ +## CD discipline (PR review gate) + +**Every PR introducing a new `OOXMLEdit` or `WordEdit` enum case SHALL attach a commutative diagram + commute proof.** This is non-negotiable; reviewers reject PRs without a CD diagram. + +### Why CD discipline + +Per `ooxml-edit-isomorphism-foundation` design.md § ADR-002, commutative diagrams give us six concrete affordances that pure `Edit` typing alone doesn't: + +1. **Verification spec generator** — each Edit case `e` gets a testable obligation `τ(e_word(s)) == e_swift(τ(s))` for all states `s` +2. **Documentation standard** — PR includes an ASCII ladder (IETF-RFC style) so future maintainers can see the design intent +3. **Test generator** — property tests `∀s. f₁(s) == f₂(s)` derive directly from the two CD paths +4. **Incompatibility-surface detector** — when a CD can't be drawn cleanly, the Edit has hidden state or a cross-layer leak that must be addressed before merge +5. **Compositional reasoning** — `e₁ ∘ e₂`'s CD = paste `e₁`'s + `e₂`'s +6. **Cross-layer cube** — two-layer algebra maps to a 3D CD; the 6 faces must commute + +### How to draw a CD diagram + +Use ASCII-art ladder format, modeled on IETF RFCs. Show: + +1. The **Word UI action** that the Edit case corresponds to (top row, for `WordEdit`) or the **OOXML schema-level change** (top row, for `OOXMLEdit`) +2. The **τ translation** (vertical arrow labels) between Word UI semantics and Swift representation +3. 
The **commutativity claim** (two paths from input state to output state must reach the same destination) + +Example for `OOXMLEdit.setBold`: + +``` + Word UI action: Cmd-B on selected text + ┌─────────────────────────────────────────────┐ + │ │ + docx_in ─┼──────────────────────────────────────────► docx_out + │ "type 'hello' then select + Cmd-B" │ + │ │ + τ │ │ τ + │ │ + ▼ ▼ + swift_in ─────────────────────────────────────────► swift_out + Document.apply(OOXMLEdit.setBold(at: runPath, value: true)) + +CD obligation: + τ(docx_out) == swift_out (i.e., applying setBold on the Swift side + produces the same state τ-equivalent to Word's behavior) +``` + +### Worked examples + +See `openspec/changes/ooxml-edit-isomorphism-foundation/design.md` § ADR-002 *Worked Examples* for fleshed-out CD diagrams for: + +- `OOXMLEdit.insertParagraph(at:content:)` (body-level mutation, straightforward CD) +- `OOXMLEdit.setBold(at:value:)` (run-level mutation, requires rPr handling) +- `OOXMLEdit.insertHyperlink(at:href:)` (dual-part mutation requiring relationship-part update — non-trivial CD) +- `WordEdit.applyBold(range:)` with range-crossing-paragraph case (boundary-ambiguity CD: `lower()` produces multiple OOXMLEdit cases) + +## Status + +This Edit type contract is currently in **decision-pinning state**: the Spectra change `ooxml-edit-isomorphism-foundation` has shipped its proposal / design.md (9 ADRs) / spec.md (8 Requirements) / tasks.md (35 tasks). Runtime implementation (Edit protocol code, OOXMLEdit / WordEdit enum cases, property tests) is deferred to a Phase 2 follow-up Spectra change that cites this foundation. 
 + +Downstream consumers: + +- `che-word-mcp` (#162) — MCP tool surface refactor to `WordEdit` (deferred per ADR-009) +- `word-builder-swift` (#101) — lens-model migration (deferred per ADR-008) +- macdoc PR #94 / #95 / #96 / #97 / #98 — re-framing to Layer 3 / 4 front-ends (deferred per ADR-009; tracking issues #102 / #103 / #104) + +## Related documents + +- Spectra change: `openspec/changes/ooxml-edit-isomorphism-foundation/` +- Capability spec: `openspec/changes/ooxml-edit-isomorphism-foundation/specs/ooxml-edit-algebra/spec.md` +- Prior-art docs: `docs/lossless-conversion.md`, `docs/structural-editing-paradigm.md`, `docs/functional-correspondence.md` +- Active runtime change: `openspec/changes/word-aligned-state-sync/` (provides the event-sourced runtime mechanism this contract types over) diff --git a/docs/lossless-conversion.md b/docs/lossless-conversion.md index f6f66f3..29ec685 100644 --- a/docs/lossless-conversion.md +++ b/docs/lossless-conversion.md @@ -962,3 +962,9 @@ W ↔ Marker (Tier 3 Bijection): perfect reversibility for all of Word 4. **No cap on metadata** — any omission that breaks the round-trip is a bug, not an "acceptable compromise" 5. **Streaming-compatible** — three channels output in parallel, O(1) memory 6. **User choice** — lossiness is a deliberate choice (Tier 1/2), not a design flaw + +## Related: ooxml-edit-algebra capability + +The canonical-identity round-trip contract described in §0 is formalized as a normative Requirement in the `ooxml-edit-algebra` capability spec at `openspec/changes/ooxml-edit-isomorphism-foundation/specs/ooxml-edit-algebra/spec.md`. See `ooxml-edit-isomorphism-foundation` design.md § ADR-001 for the contract rationale and the chosen invariant (canonical-identity over byte-identity over semantic-equivalence). + +This document remains the formal-systems reference; the capability spec is the implementation-binding contract. 
diff --git a/docs/structural-editing-paradigm.md b/docs/structural-editing-paradigm.md index 4738bb4..4191bf9 100644 --- a/docs/structural-editing-paradigm.md +++ b/docs/structural-editing-paradigm.md @@ -410,3 +410,10 @@ tracked as separate follow-up SDD. Once these land, loss is expected to drop to < 5%, at which point the Section 6.1 strong demo ("edit one word → document.xml shrinks <1%") can formally ship. At that point the Section 6.1 claim "edit one word → document.xml shrinks <1%" can formally land. + + +## Related: ooxml-edit-algebra capability + +The architectural patterns described here (overlay save, dirty-tracking, Invariants 1+2, 5 preservation classes) are typed and formally contracted in the `ooxml-edit-algebra` capability spec at `openspec/changes/ooxml-edit-isomorphism-foundation/specs/ooxml-edit-algebra/spec.md` (see Spectra change `ooxml-edit-isomorphism-foundation` for the 9 ADRs that pin the contract). + +This document remains the implementation-pattern reference; the capability spec is the normative contract. diff --git a/openspec/changes/ooxml-edit-isomorphism-foundation/.openspec.yaml b/openspec/changes/ooxml-edit-isomorphism-foundation/.openspec.yaml new file mode 100644 index 0000000..0a92801 --- /dev/null +++ b/openspec/changes/ooxml-edit-isomorphism-foundation/.openspec.yaml @@ -0,0 +1,4 @@ +schema: spec-driven +created: 2026-05-25 +created_by: che cheng +created_with: claude diff --git a/openspec/changes/ooxml-edit-isomorphism-foundation/design.md b/openspec/changes/ooxml-edit-isomorphism-foundation/design.md new file mode 100644 index 0000000..b276d98 --- /dev/null +++ b/openspec/changes/ooxml-edit-isomorphism-foundation/design.md @@ -0,0 +1,411 @@ +## Context + +`macdoc`'s OOXML toolchain spans five layers (Layers 0–4 below) that have evolved independently: + +| Layer | Status | Where | +|---|---|---| +| Layer 0 — Lossless OOXML tree | Shipped v0.13.0+ (overlay save), v0.20.3 (5 preservation classes) | `packages/ooxml-swift/Sources/OOXMLSwift/` | +| Layer 1 — Typed lens (Run, Paragraph, Section) | Partially shipped 
(Paragraph, Run, Table, etc.) | `packages/ooxml-swift/Sources/OOXMLSwift/Models/` | +| Layer 2 — Semantic API (Word-UI-mirroring) | Not yet expressed as a layer | Scattered convenience methods on `Document` | +| Layer 3 — DSL frontend (result builder) | Shipped v0.9.0 (write-only) | `packages/word-builder-swift/Sources/WordBuilderSwift/` | +| Layer 4 — User authors / AI edits | (out of macdoc scope; consumer-side) | `*.swift` user scripts | + +The architecture has been described informally across four prior-art documents (`lossless-conversion.md`, `structural-editing-paradigm.md`, `functional-correspondence.md`, `philosophy.md`) but never pinned as a normative contract. Downstream proposals (#92 dxedit, #88 R-wordbuilder, #90 pptx-mcp) each implicitly assume different mental models of "edit", causing the spec gaps surfaced by 6-AI verify on PRs #94/#95/#96/#97/#98. + +The proposed contract draws on Quine's *radical translation* (1960) framing — translating between two representations (Word UI semantics ↔ Swift API semantics) is fundamentally underdetermined, and the architectural choice is *which* equivalence to preserve. Tools choose byte-identity (impossible), canonical-identity (achievable, our choice), or semantic-equivalence (too weak, loses information). Aligned with `swift-syntax`'s lossless-CST precedent and Roslyn's incremental-parse model. + +## Goals / Non-Goals + +**Goals:** + +1. **Pin the architectural contract** between Word UI semantics and Swift API as a fully-faithful functor, with canonical-identity round-trip as the round-trip invariant. Make this contract normative (SHALL/MUST) rather than implicit-in-implementation. + +2. **Elevate `Edit` to a first-class type** so equality, composition, and `lower()` semantics are expressible at the type level, not just as runtime patterns in `Document.applyOverlay()`. + +3. **Establish two-layer edit algebra** (`WordEdit` / `OOXMLEdit`) so callers can choose semantic vs. 
syntactic granularity, and the `lower()` bridge makes the translation auditable case-by-case. + +4. **Codify CD discipline** as a PR review gate: every new `OOXMLEdit` / `WordEdit` case requires a commutative diagram + commute proof attached to the PR. + +5. **Validate the contract via property tests** on 3–5 representative operations (`insertParagraph`, `setBold`, `insertHyperlink`, plus 1–2 more selected during apply) using the NTPU thesis fixture already in `RealWorldDocxRoundTripSmokeTests`. + +6. **Document migration paths** for downstream consumers (`word-builder-swift` lens migration; `che-word-mcp` boundary refactor; dxedit / R-wordbuilder / pptx-mcp front-end rerouting) without implementing those migrations in this change. + +**Non-Goals:** + +- **`word-builder-swift` lens-model migration** (deferred to follow-up Spectra change per ADR-008). Current struct-serialization model coexists during transition. +- **`che-word-mcp` MCP tool boundary refactor** to expose `WordEdit` directly. Current `Document`-mutation API stays; refactor is a future Spectra change. +- **Operational rerouting of downstream issues** (#92, #88, #90). This change documents *intent* via ADR-009; per-issue re-framing is a separate change per downstream. +- **Automated CD-diagram validation tooling**. Manual reviewer discipline only. +- **Full `[OOXMLEdit]` surface implementation**. Only 3–5 representative operations for property-test validation. +- **Implementation of the module split** (`OOXMLSyntax` / `OOXMLSemantic` / `OOXMLDSL`). ADR-004 documents the split; physical module reorganization deferred to follow-up. +- **Replacing existing `Document.applyOverlay()` / `markDirty()` patterns**. Edit type *wraps* this machinery, doesn't replace it. 
 + +## Decisions + +### ADR-001: Round-trip contract = canonical-identity + +**Decision**: The macdoc OOXML toolchain commits to **canonical-identity** as the round-trip contract: after `parse → mutate → serialize`, the subtree that was not modified is bytewise-equal to its input form after XML canonicalization (c14n). Modified subtrees are content-equivalent (semantically equal) to their intended new value. + +**Alternatives considered**: + +| Level | Contract | Verdict | +|---|---|---| +| Byte-identity | `serialize(parse(x)) == x` byte-by-byte | REJECTED — impossible. XML allows several spellings of the same element (for example, a self-closing tag versus an explicit open/close pair, or differing attribute order) as semantically equal; different tools serialize the same document to different but valid byte sequences. | +| **Canonical-identity** | After c14n, unmodified subtree bytewise-equal | **CHOSEN** — achievable, empirically validated by v0.20.3's 5 preservation classes, aligns with `swift-syntax`/Roslyn precedent. | +| Semantic-equivalence | Word renders identically | REJECTED — too weak. Vendor extensions, comments, watermarks, customXml all dropped silently because they don't affect render. | + +**Rationale**: Canonical-identity is the strongest contract empirically reachable. v0.13.0 onwards demonstrates it works in practice; this change writes it as normative SHALL. + +### ADR-002: Core type = Edit (not Document); PR-must-attach-CD-diagram review discipline + +**Decision**: `Edit` is a first-class Swift type with explicit equality, composition (associative `∘`), and `lower()` semantics. PRs that introduce new `Edit` cases SHALL attach a commutative diagram + commute proof. + +**Alternatives considered**: + +| Approach | Verdict | +|---|---| +| `Document` as core (mutate-in-place) | REJECTED — current implicit model. Composition reasoning impossible; tests are ad-hoc; CD impossible to draw. | +| **`Edit` as first-class type** | **CHOSEN** — allows compositional reasoning, property-based tests, CD diagram per case. 
| +| `Operation` type (procedural framing) | REJECTED — same semantic content but loses the "algebra" framing that motivates two-layer separation. | + +**CD discipline rationale** (per issue body): CD methodology gives 6 concrete affordances that pure `Edit` typing alone doesn't: + +1. Verification spec generator — each Edit case `e` has obligation `τ(e_word(s)) == e_swift(τ(s))` for all `s` +2. Documentation standard — PR includes an ASCII ladder (IETF-RFC style) +3. Test generator — property tests `∀s. f₁(s) == f₂(s)` from the CD's two paths +4. Incompatibility-surface detector — non-drawable CD = hidden state or cross-layer leak in the proposed Edit +5. Compositional reasoning — `e₁ ∘ e₂`'s CD = paste `e₁`'s + `e₂`'s +6. Cross-layer cube — two-layer algebra (ADR-003) maps to 3D CD where 6 faces must commute + +#### Worked Examples + +The CD diagrams below are the canonical worked examples that `.github/PULL_REQUEST_TEMPLATE.md` and `docs/edit-algebra-cd-discipline.md` reference. When a new `OOXMLEdit` or `WordEdit` case is added, the PR's CD diagram should follow the same structure as one of these examples (ladder format, top arrow = Word UI action or schema change, vertical arrows = `τ`, bottom arrow = Swift Edit invocation, commutativity claim explicit at the end). 
 + +##### Example 1: `OOXMLEdit.insertParagraph(at:content:)` — body-level mutation + +``` + Word UI action: position cursor at body-children index N, + press Enter, type "hello" + ┌────────────────────────────────────────────────────────┐ + │ │ + docx_in ─┼────────────────────────────────────────────────────► docx_out + │ body.children gains a new <w:p> with text "hello" │ + │ at index N; sectPr / comments / customXml preserved │ + │ │ + τ │ │ τ + │ │ + ▼ ▼ + swift_in ─────────────────────────────────────────────────────► swift_out + Document.apply(OOXMLEdit.insertParagraph(at: N, content: "hello")) + +CD obligation: + τ(docx_out) == swift_out + AND: c14n(docx_in.untouched_subtrees) == c14n(docx_out.untouched_subtrees) + where untouched_subtrees = all body children at indices ≠ N + all parts outside word/document.xml +``` + +##### Example 2: `OOXMLEdit.setBold(at: runPath, value: Bool)` — run-level mutation with rPr handling + +``` + Word UI action: select text in Run R, press Cmd-B + ┌────────────────────────────────────────────────────────┐ + │ │ + docx_in ─┼────────────────────────────────────────────────────► docx_out + │ Run R's <w:rPr> gains/loses <w:b/>; existing rPr │ + │ children (font, color, etc.) preserved in order │ + │ │ + τ │ │ τ + │ │ + ▼ ▼ + swift_in ─────────────────────────────────────────────────────► swift_out + Document.apply(OOXMLEdit.setBold(at: runPath, value: true)) + +CD obligation: + τ(docx_out) == swift_out + AND: c14n(docx_in.untouched_runs) == c14n(docx_out.untouched_runs) + — sibling Runs unaffected; Run R's text content unchanged; only rPr's <w:b/> + presence toggled +``` + +##### Example 3: `OOXMLEdit.insertHyperlink(at: runPath, href: URL)` — dual-part mutation (non-trivial CD) + +This example is non-trivial because the Edit modifies TWO parts atomically: `word/document.xml` (insert `<w:hyperlink>` element) AND `word/_rels/document.xml.rels` (add Relationship entry). The CD must show both legs commute. 
 + +``` + Word UI action: select text in Run R, Insert → Hyperlink, + paste URL, click OK + ┌────────────────────────────────────────────────────────┐ + │ │ + docx_in ─┼────────────────────────────────────────────────────► docx_out + │ - word/document.xml: Run R wrapped in <w:hyperlink>; new rId allocated │ + │ - word/_rels/document.xml.rels: new <Relationship> │ + │ - Both modifications atomic; either both land or │ + │ neither (no half-applied state) │ + │ │ + τ │ │ τ + │ │ + ▼ ▼ + swift_in ─────────────────────────────────────────────────────► swift_out + Document.apply(OOXMLEdit.insertHyperlink(at: runPath, href: URL)) + +CD obligation: + τ(docx_out) == swift_out + AND atomicity: throws if rels-part write fails AFTER document.xml mutation + (no half-applied state in swift_out) + AND: c14n(docx_in.parts \ {document.xml, document.xml.rels}) == + c14n(docx_out.parts \ {document.xml, document.xml.rels}) + — all other parts (styles, settings, customXml, etc.) bytewise-equal post-c14n +``` + +##### Example 4: `WordEdit.applyBold(range:)` with range-crossing-paragraph — boundary ambiguity CD + +This example shows how a semantic-layer Edit lowers to MULTIPLE syntactic-layer Edits when the Word UI semantics cross structural boundaries. The CD must show that the WordEdit naturality property holds (i.e., `lower()` produces the same final state whether composed at the semantic layer or computed across separate per-paragraph OOXMLEdits). 
 + +``` + Word UI action: drag-select text spanning 2 paragraphs, press Cmd-B + ┌────────────────────────────────────────────────────────┐ + │ │ + docx_in ────────┼────────────────────────────────────────────────────► docx_out + │ Paragraph 1's affected Run: rPr gains <w:b/> │ + │ Paragraph 2's affected Run: rPr gains <w:b/> │ + │ Both Runs may need splitRun if range partial │ + │ │ + τ │ │ τ + │ │ + ▼ ▼ + swift_in Document.apply(WordEdit.applyBold(range: ...)) swift_out + │ + ↓ lower() + │ + ┌──────────────────────┴──────────────────────┐ + │ │ + ▼ ▼ + swift_in ─► Document.apply([ swift_out + OOXMLEdit.splitRun(at: para1.run, at: rangeStart), + OOXMLEdit.splitRun(at: para2.run, at: rangeEnd), + OOXMLEdit.setBold(at: para1.runAfterSplit, value: true), + OOXMLEdit.setBold(at: para2.runBeforeSplit, value: true), + ]) + +CD obligation (both legs commute): + - WordEdit.applyBold(range:).apply(swift_in) == [4 OOXMLEdits].fold(apply)(swift_in) + - Naturality: (WordEdit.applyBold(r1) ∘ WordEdit.applyBold(r2)).lower() == + WordEdit.applyBold(r1).lower() ∘ WordEdit.applyBold(r2).lower() +``` + +These four examples cover: (1) trivial body-level mutation, (2) run-level mutation with rPr handling, (3) dual-part mutation requiring atomicity, (4) semantic-layer Edit that lowers to multiple syntactic-layer Edits via range splitting. New Edit cases SHOULD follow the structure of whichever example most resembles them. + +### ADR-003: Two-layer edit algebra (WordEdit / OOXMLEdit), lower() as bridge + +**Decision**: Define `WordEdit` (semantic — `WordEdit.applyBold(range)`) and `OOXMLEdit` (syntactic — `OOXMLEdit.insertElement(at: rPr_path, element: <w:b/>)`) as separate types. `WordEdit.lower(): [OOXMLEdit]` bridges them. Naturality invariant: `WordEdit(a).lower() ∘ WordEdit(b).lower() = (WordEdit(a) ∘ WordEdit(b)).lower()` for composable operations. 
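The bridge and its invariant can be sketched on a toy model. Everything here is an assumption made for illustration: the document is a flat `[Run]` addressed by integer index, the hypothetical `sequence` case stands in for `∘` and applies left to right, and run splitting is left out.

```swift
// Hypothetical toy model: a document is a flat list of runs.
struct Run: Equatable {
    var text: String
    var bold: Bool
}
typealias Document = [Run]

// Syntactic layer: one edit addresses exactly one run.
enum OOXMLEdit: Equatable {
    case setBold(at: Int, value: Bool)

    func apply(to doc: Document) -> Document {
        switch self {
        case let .setBold(index, value):
            var next = doc
            next[index].bold = value
            return next
        }
    }
}

// Semantic layer: an edit addresses a range of runs, or a sequence of other edits.
enum WordEdit {
    case applyBold(range: Range<Int>)
    case sequence([WordEdit])  // stand-in for composition, applied left to right

    // lower(): one OOXMLEdit per affected run.
    func lower() -> [OOXMLEdit] {
        switch self {
        case let .applyBold(range):
            return range.map { OOXMLEdit.setBold(at: $0, value: true) }
        case let .sequence(edits):
            return edits.flatMap { $0.lower() }
        }
    }

    // Direct semantic application, written without calling lower().
    func apply(to doc: Document) -> Document {
        switch self {
        case let .applyBold(range):
            var next = doc
            for i in range { next[i].bold = true }
            return next
        case let .sequence(edits):
            return edits.reduce(doc) { $1.apply(to: $0) }
        }
    }
}

let doc: Document = ["a", "b", "c", "d"].map { Run(text: $0, bold: false) }
let a = WordEdit.applyBold(range: 0..<2)
let b = WordEdit.applyBold(range: 1..<3)
let composite = WordEdit.sequence([a, b])

// Applying the WordEdit directly matches folding its lowered OOXMLEdits.
let direct = composite.apply(to: doc)
let lowered = composite.lower().reduce(doc) { $1.apply(to: $0) }
print(direct == lowered)  // prints true

// Naturality: lowering the composite equals composing the lowered parts.
print(composite.lower() == a.lower() + b.lower())  // prints true
```

In this toy the naturality check holds by construction, because `sequence` lowers with `flatMap`. The check that carries information is the first one, which compares two independently written code paths.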
+ +**Alternatives considered**: + +| Approach | Verdict | +|---|---| +| Single Edit type, no semantic/syntactic split | REJECTED — Word UI semantics (Cmd-B on text crossing paragraph boundary) and OOXML syntax (split into per-paragraph rPr) diverge; conflating them buries decisions. | +| **WordEdit + OOXMLEdit with lower() bridge** | **CHOSEN** — both layers expressible, translation auditable. | +| WordEdit-only (lower implicit) | REJECTED — implementers debugging round-trip would have no syntactic-layer handle. | + +**Rationale**: The two layers serve different audiences. `WordEdit` for callers expressing user intent (dxedit manifests, R-wordbuilder scripts). `OOXMLEdit` for tool implementers debugging serialization. `lower()` keeps them aligned by construction. + +### ADR-004: Module split — OOXMLSyntax (L0/L1) / OOXMLSemantic (L2) / OOXMLDSL (L3) + +**Decision**: The codebase splits along Layer boundaries (deferred to follow-up): + +- `OOXMLSyntax` — Layer 0 (lossless tree, `OOXMLNode + trivia`) + Layer 1 (typed lens, `Run`, `Paragraph`, `Section`) +- `OOXMLSemantic` — Layer 2 (`WordEdit`, semantic-API methods, Word-UI behavior mirror) +- `OOXMLDSL` — Layer 3 (result-builder front-end, `Document { Chapter { ... } }`) + +**Alternatives considered**: + +| Approach | Verdict | +|---|---| +| Keep single `OOXMLSwift` module | REJECTED — Layer mixing prevents Layer 3 callers from importing only the DSL. | +| **3-module split** | **CHOSEN** — clean Layer boundaries, smaller per-module import surfaces. | +| 5-module split (one per Layer) | REJECTED — Layer 0+1 always co-required; Layer 4 is consumer code, not macdoc-owned. | + +**Note**: This ADR records the *intent*. Physical module reorganization is a follow-up Spectra change — this one only adds the `EditAlgebra/` subdirectory inside `OOXMLSwift`. 
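For orientation only, the deferred split might look roughly like the manifest sketched below. The package name, the tools version, and the dependency direction (each layer depending on the one beneath it) are assumptions; this change does not add or modify any `Package.swift`.

```swift
// swift-tools-version:5.9
// Hypothetical manifest for the deferred three-module split; not part of this change.
import PackageDescription

let package = Package(
    name: "ooxml-swift",
    products: [
        .library(name: "OOXMLSyntax", targets: ["OOXMLSyntax"]),
        .library(name: "OOXMLSemantic", targets: ["OOXMLSemantic"]),
        .library(name: "OOXMLDSL", targets: ["OOXMLDSL"]),
    ],
    targets: [
        // Layer 0 + Layer 1: lossless tree and typed lens.
        .target(name: "OOXMLSyntax"),
        // Layer 2: WordEdit and semantic API, built on the syntax layer.
        .target(name: "OOXMLSemantic", dependencies: ["OOXMLSyntax"]),
        // Layer 3: result-builder front-end, built on the semantic layer.
        .target(name: "OOXMLDSL", dependencies: ["OOXMLSemantic"]),
    ]
)
```

With separate library products, a Layer 3 caller would declare a dependency on `OOXMLDSL` alone, which is the import-surface benefit the chosen alternative names.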
+ +### ADR-005: Edit operation surface naming + canonical example set + +**Decision**: `OOXMLEdit` case names use the schema element they target: `insertParagraph(at:)`, `setBold(at:)`, `insertHyperlink(at:href:)`. `WordEdit` cases use Word UI verb: `WordEdit.applyBold(range:)`, `WordEdit.insertLink(range:url:)`. + +**Alternatives considered**: + +| Approach | Verdict | +|---|---| +| `OOXMLEdit.modify(at:)` (XML-literal naming) | REJECTED — exposes namespace prefix as case name; breaks Swift naming conventions and creates ergonomics friction | +| `OOXMLEdit.operation(.insertParagraph, at:)` (nested enum) | REJECTED — loses pattern-match exhaustiveness and adds boilerplate per case | +| **`OOXMLEdit.insertParagraph(at:)` (flat schema-element naming)** | **CHOSEN** — matches schema vocabulary directly, Swift-idiomatic, pattern-match exhaustive | +| `WordEdit.bold(range:)` (Word-verb-only naming, no `apply` prefix) | REJECTED — collides with property accessor convention (`run.bold = true`); apply* prefix disambiguates Edit-vs-state mutation | + +**Rationale**: Schema-element naming for OOXMLEdit lets implementers grep ECMA-376 spec text; verb-prefix for WordEdit signals intent (this is a mutation, not a property read). + +**Apply-phase canonical example set** (3–5 operations for property-test validation): + +1. `OOXMLEdit.insertParagraph(at: bodyChildIndex, content:)` — body-level mutation, covers ADR-001 contract for sectPr / comments preservation +2. `OOXMLEdit.setBold(at: runPath, value: Bool)` — run-level mutation, covers ADR-006 Word UI ground truth (Cmd-B) +3. `OOXMLEdit.insertHyperlink(at: runPath, href: URL)` — composite mutation requiring relationship part update, exercises canonical-identity for `_rels/document.xml.rels` +4. (To be selected during apply) — Table-cell mutation +5. 
(To be selected during apply) — Comment / Bookmark mutation + +`WordEdit` counterparts: `applyBold`, `applyLink`, `applyInsertParagraph` (when range crosses semantic boundaries, lower() returns multiple OOXMLEdit cases). + +### ADR-006: Word UI behavior as ground truth + +**Decision**: When `WordEdit` semantics are ambiguous, the resolution is "what does Microsoft Word UI do under the equivalent user action?" Documented by: + +1. Recording the user action (Cmd-B, Cmd-K, Insert→Hyperlink) +2. Inspecting the resulting OOXML diff via Word's "Save As → Word XML" (or extracting `.docx` ZIP after save) +3. The diff *is* the `lower()` output for that `WordEdit` case + +**Alternatives considered**: + +| Approach | Verdict | +|---|---| +| OOXML schema (ECMA-376) as ground truth | REJECTED — schema describes valid XML shapes but not semantic intent; Word's interpretation of ambiguous schemas is what users observe | +| Pandoc / LibreOffice as ground truth | REJECTED — neither is a normative implementation; both make their own interpretive choices that may differ from Word | +| **Microsoft Word as ground truth** | **CHOSEN** — Word is the canonical author tool; ~99% of `.docx` files in the wild are produced or consumed by it; matching its behavior maximizes interop | +| Best-effort consensus across implementations | REJECTED — explosion of edge cases; lack of clear arbiter when implementations disagree | + +**Rationale**: Avoids decisions-by-committee about "the right way" to express a Word edit in OOXML — defer to Microsoft's own implementation as oracle. Aligns with reality (we're not redefining OOXML). **Trust-boundary caveat**: this ADR does NOT mandate accepting Word's failure modes (e.g., Word silently dropping unrecognized namespaces is NOT a contract macdoc inherits). The "ground truth" applies to *intentional* semantics, not implementation bugs / fallbacks. 
Phase 2 implementation MUST add adversarial-input validation (path traversal in `r:id`, XXE in customXml, zip-slip in extracted parts) — Word's best-effort handling is not the spec. + +### ADR-007: Conformance suite extension from NTPU thesis fixture + +**Decision**: The existing `RealWorldDocxRoundTripSmokeTests` (NTPU thesis fixture, 5 preservation classes) is the foundation. This change adds: + +- Property-based fully-faithful-functor tests under `EditAlgebraTests/FullyFaithfulFunctorTests.swift` +- Per-`OOXMLEdit` case CD-commute test (auto-generated from the CD's two paths per ADR-002) +- (Future, per ADR-007 follow-up) Corpus expansion to thesis + Zotero customXml + people.xml + commentsExtended fixtures +- (Future) Fuzz testing for `OOXMLEdit` argument-space corner cases + +**Alternatives considered**: + +| Approach | Verdict | +|---|---| +| New synthetic minimal fixture (hand-crafted small `.docx`) | REJECTED — synthetic fixtures miss real-world OOXML quirks (vendor extensions, w14/w15 RSIDs, customXml) that motivate the canonical-identity contract | +| OOXML conformance test suite (ECMA-376 reference docs) | REJECTED — ECMA reference docs test schema validity, not edit-isomorphism; orthogonal goal | +| **NTPU thesis fixture + property-based tests** | **CHOSEN** — real-world fixture already used in RealWorldDocxRoundTripSmokeTests; 5 preservation classes already validated; property tests assert canonical-identity on randomized Edit inputs | +| Corpus expansion (multiple thesis + customXml + people.xml fixtures) | DEFERRED — Phase 2 follow-up; current change validates contract on one rich fixture before expanding | + +**Scope of this change**: Property tests for 3–5 operations on the NTPU thesis fixture; corpus / fuzz expansion deferred to follow-up Spectra change. 
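The per-case CD-commute test can be pictured with a toy model. The sketch below is not the planned `FullyFaithfulFunctorTests`: the Word-side state is a hypothetical array of strings in which a leading `*` marks a bold run, `tau` plays the role of `τ`, and the sample states are fixed by hand instead of being drawn from the NTPU thesis fixture.

```swift
// Hypothetical Word-side state: each run is raw text, with a leading "*" marking bold.
typealias WordState = [String]

// Hypothetical Swift-side model of the same runs.
struct Run: Equatable {
    var text: String
    var bold: Bool
}

// τ: translate Word-side state into the Swift-side representation.
func tau(_ state: WordState) -> [Run] {
    state.map { raw in
        raw.hasPrefix("*")
            ? Run(text: String(raw.dropFirst()), bold: true)
            : Run(text: raw, bold: false)
    }
}

// Top path of the ladder: the edit as performed on the Word side.
func wordSetBold(_ state: WordState, at index: Int) -> WordState {
    var next = state
    if !next[index].hasPrefix("*") { next[index] = "*" + next[index] }
    return next
}

// Bottom path: the same edit on the Swift side.
func swiftSetBold(_ runs: [Run], at index: Int) -> [Run] {
    var next = runs
    next[index].bold = true
    return next
}

// CD obligation: τ(e_word(s)) == e_swift(τ(s)) for every sampled state and index.
func squareCommutes(for samples: [WordState]) -> Bool {
    samples.allSatisfy { state in
        state.indices.allSatisfy { i in
            tau(wordSetBold(state, at: i)) == swiftSetBold(tau(state), at: i)
        }
    }
}

let samples: [WordState] = [["hello"], ["*hello", "world"], ["a", "*b", "c"], []]
print(squareCommutes(for: samples))  // prints true
```

A real test would replace the hand-written samples with generated inputs and would add the c14n comparison on untouched subtrees that the worked examples require.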
+ +### ADR-008: Migration path for word-builder-swift 0.9.0 → lens model (DEFERRED) + +**Decision**: `word-builder-swift` 0.9.0 (write-only struct serialization) will migrate to a lens-model architecture in a dedicated follow-up Spectra change. The migration is BREAKING (callers must adapt from `Document(sections: [Section(paragraphs: [...])])` builder calls to lens-rooted edits). + +**Alternatives considered**: + +| Approach | Verdict | +|---|---| +| Immediate breaking change in word-builder-swift 1.0.0 (no coexistence period) | REJECTED — too disruptive for existing #71 callers (che-word-mcp + macdoc CLI); migration window valuable | +| **Coexistence + deprecation cycle (LensDocument alongside Document)** | **CHOSEN** — gradual migration, downstream callers can opt-in per call-site | +| Keep struct-serialization permanently as alternative API | REJECTED — two parallel APIs create maintenance debt + violate single-source-of-truth principle for Edit semantics | +| Fork word-builder-swift into wb-lens (new package) | REJECTED — splits the ecosystem; existing #71 contributors / consumers split across forks | + +**Documented path** (for the follow-up): + +1. Add a `LensDocument` type alongside existing `Document` (coexistence period, ~3 months) +2. Migrate `Packer.toFile()` callers in `che-word-mcp` and `macdoc convert` to optionally use `LensDocument` +3. Deprecate the struct-serialization `Document` API +4. Remove deprecated paths in word-builder-swift 1.0.0 + +**This change does not implement any of the above**. Migration is its own Spectra change citing this foundation. Follow-up tracking issue: PsychQuant/macdoc#101. + +### ADR-009: Downstream issue rerouting + +**Decision**: `#92` (dxedit declarative CLI), `#88` (R-wordbuilder generator), `#90` (che-pptx-mcp) are **front-ends** for the architecture defined here, not parallel design efforts. 
+ +**Alternatives considered**: + +| Approach | Verdict | +|---|---| +| Continue all four threads (this foundation + #92 + #88 + #90) as independent parallel proposals | REJECTED — already proved problematic; PR #94 / #95 / #96 / #97 / #98 each surfaced spec gaps that boil down to "no shared Edit-type framing" | +| Cancel #92 / #88 / #90 and absorb them into this foundation's spec | REJECTED — over-bundling; each downstream has independent value as a front-end (different audience, different language ergonomics) | +| **Reroute via ADR-009: this foundation locks contract, downstreams become Layer 3 / 4 / specialized-to-PPTX front-ends** | **CHOSEN** — preserves each downstream's autonomy while ensuring they share the Edit-type contract | +| Foundation-first + downstream-later sequencing with explicit handoff issues | ADOPTED AS COMPLEMENT — see follow-up issues #102 / #103 / #104 / che-word-mcp#162 | + +**Rationale**: Reframing as Layer 3 / 4 front-ends keeps each downstream's value proposition intact (dxedit is still a declarative CLI; R-wordbuilder still generates Swift from R) while making their contract surface (what they generate / consume) align with the foundation's Edit type. + +| Issue | Layer | Status | +|---|---|---| +| #92 dxedit | Layer 3 (DSL) | YAML manifest compiles to `WordEdit` script; needs Layers 1–2 to land first; PR #94 blocked pending revision to align | +| #88 R-wordbuilder | Layer 4 (caller) | R generates `.swift` that uses `WordEdit`; PR #96 blocked pending revision to align (HIGH security findings on code injection should be reframed around safe `WordEdit` API surface) | +| #90 che-pptx-mcp | Layer 1–3 applied to PPTX | OOXMLEdit reusable; Word/Pptx specialization at Layer 1+; PR #95 blocked pending revision to align | + +**Operational implication**: When these PRs are revised by Codex, the revisions SHALL reference `ooxml-edit-algebra` capability spec and frame their proposals as front-ends to this foundation. 
Re-framing is per-PR follow-up work, NOT scope of this change. + +### Relationship to active changes + +**`word-aligned-state-sync`** (active Spectra change in `openspec/changes/word-aligned-state-sync/`): + +The active `word-aligned-state-sync` Decision 3 ("ID-based operations, never positional indices") is a **refinement** of this foundation's Edit-type contract, NOT a parallel design. After this change archives, `word-aligned-state-sync` should: + +1. Cross-reference `ooxml-edit-algebra` capability spec in its design.md +2. Reframe Decision 3 as "Edit IDs are positional in OOXMLEdit lower-layer (`at: bodyChildIndex`), but expose stable IDs in WordEdit upper-layer for caller convenience" +3. Convert any positional-API spec Requirements (currently active) to Edit-type Requirements + +This is documented coordination, not blocking. `word-aligned-state-sync` continues on its own apply schedule. + +## Implementation Contract + +**Behavior**: + +- Callers can construct `OOXMLEdit` values, compose them via `∘`, and apply them to a `Document` to produce a new `Document` whose serialization satisfies canonical-identity (ADR-001). +- Callers can construct `WordEdit` values, call `.lower()` to obtain `[OOXMLEdit]`, and the resulting OOXMLEdit composition is functionally equivalent (same final Document) to applying the WordEdit directly. This is the fully-faithful-functor property. +- Property tests in `FullyFaithfulFunctorTests.swift` exercise this for 3–5 representative `OOXMLEdit` cases on the NTPU thesis fixture. 
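The behavior above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyRun`, `ToyDocument`, `ToyOOXMLEdit`, `ToyWordEdit`, and `applyAll` are hypothetical stand-ins invented for illustration, not the deferred `ooxml-swift` types, and a "path" is reduced to a run index. It shows a semantic edit lowering to a list of syntactic edits, and composition modeled as list concatenation.

```swift
// Toy stand-ins (hypothetical): NOT the real Document / OOXMLEdit / WordEdit.
struct ToyRun: Equatable {
    var text: String
    var bold = false
}

struct ToyDocument: Equatable {
    var runs: [ToyRun]
}

// Syntactic layer: addresses runs by index (a stand-in for an OOXML path).
enum ToyOOXMLEdit: Equatable {
    case splitRun(at: Int, offset: Int)
    case setBold(at: Int, value: Bool)

    func apply(to doc: ToyDocument) -> ToyDocument {
        var out = doc
        switch self {
        case let .splitRun(index, offset):
            // Replace one run with two runs that share its formatting.
            let run = out.runs[index]
            let head = ToyRun(text: String(run.text.prefix(offset)), bold: run.bold)
            let tail = ToyRun(text: String(run.text.dropFirst(offset)), bold: run.bold)
            out.runs.replaceSubrange(index...index, with: [head, tail])
        case let .setBold(index, value):
            out.runs[index].bold = value
        }
        return out
    }
}

// Semantic layer: "bold the first `prefixLength` characters of a run".
enum ToyWordEdit {
    case applyBold(run: Int, prefixLength: Int)

    func lower() -> [ToyOOXMLEdit] {
        switch self {
        case let .applyBold(run, prefixLength):
            return [.splitRun(at: run, offset: prefixLength),
                    .setBold(at: run, value: true)]
        }
    }
}

// Applying a lowered list left to right models "a then b" composition.
func applyAll(_ edits: [ToyOOXMLEdit], to doc: ToyDocument) -> ToyDocument {
    edits.reduce(doc) { current, edit in edit.apply(to: current) }
}

let input = ToyDocument(runs: [ToyRun(text: "Hello world")])
let a = ToyWordEdit.applyBold(run: 0, prefixLength: 5)
let b = ToyWordEdit.applyBold(run: 1, prefixLength: 3)
let result = applyAll(a.lower(), to: input)

// Naturality in this toy model: lowering a sequence equals concatenating the lowerings.
let loweredSequence = [a, b].flatMap { $0.lower() }
let concatenated = a.lower() + b.lower()
print(result.runs, loweredSequence == concatenated)
```

In this model the naturality check holds by construction, because a composed `ToyWordEdit` is just a list. The real contract is harder to satisfy once lowering depends on document state, for example when an earlier split shifts later run indices.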
+ +**Interface / data shape**: + +- `Edit` protocol: declares `apply(to: Document) throws -> Document` + `lower() -> [OOXMLEdit]` (identity for OOXMLEdit cases) +- `OOXMLEdit` enum: cases for the 3–5 selected operations (insertParagraph, setBold, insertHyperlink, plus 2 selected during apply), each with relevant path / value associated values +- `WordEdit` enum: cases for the corresponding Word-UI verbs (applyBold, applyLink, applyInsertParagraph), each implementing `lower()` to return its `[OOXMLEdit]` translation +- `Document.apply(_ edit: any Edit) throws -> Document`: returns new Document with edit applied. Internally uses existing `Document.applyOverlay()` / `markDirty()` machinery — Edit is the public API wrapper. + +**Failure modes**: + +- `Edit.apply` throws `EditError.pathNotFound(path:)` when the target path doesn't exist in Document +- `Edit.apply` throws `EditError.preserveViolation(part:)` when the operation would cause a non-c14n-equal change to an unmodified subtree (defensive check; should not fire in well-formed Edits but guards against logic bugs) +- `WordEdit.lower()` is total — every WordEdit case has a defined OOXMLEdit translation. No partial functions. + +**Acceptance criteria**: + +1. `Edit.swift`, `OOXMLEdit.swift`, `WordEdit.swift` compile under `swift build` with no warnings +2. `FullyFaithfulFunctorTests` pass under `swift test --filter EditAlgebraTests` against the NTPU thesis fixture +3. Property test for each of the 3–5 operations passes 100 randomized inputs (supplied as a 100-element argument collection to `swift-testing`'s `@Test(arguments:)`, which runs the test once per argument and has no built-in default count) +4. CD diagram for each implemented `OOXMLEdit` case is included in this change's `design.md` (this section) or `tasks.md` (if format easier there) as ASCII ladder +5. `spectra validate ooxml-edit-isomorphism-foundation` passes +6. 
`Document.apply(_ edit:)` API is added without breaking existing `Document.applyOverlay()` / `markDirty()` API (existing callers continue compiling) + +**Scope boundaries**: + +- **In scope**: Edit type elevation, OOXMLEdit 3–5 cases, WordEdit corresponding cases, lower() implementations, property tests, CD diagrams, capability spec `ooxml-edit-algebra`, 9 ADRs in this design.md, coordination cross-reference in `word-aligned-state-sync` design.md. +- **Out of scope**: word-builder-swift lens migration, che-word-mcp boundary refactor, downstream PR (#94/#95/#96/#97/#98) revisions, automated CD validation tooling, full OOXMLEdit surface, module split implementation, corpus / fuzz test expansion beyond NTPU thesis fixture. + +## Risks / Trade-offs + +| Risk | Mitigation | +|---|---| +| **CD discipline onboarding cost** — new contributors will struggle to draw CDs for non-trivial Edits | ADR-002 includes 3+ worked examples (`applyBold`, `applyLink`, `applyInsertParagraph` boundary case); README in `EditAlgebra/` directory provides template | +| **Edit type performance regression** — naive allocation of Edit values per mutation could slow down large-document batch edits | Benchmark before/after on NTPU thesis fixture during apply; if regression >10%, address via inline-storage `inout` Edit handling before merge | +| **Naturality property is hard to enforce** — `WordEdit(a).lower() ∘ WordEdit(b).lower() == (WordEdit(a) ∘ WordEdit(b)).lower()` could silently break under range-crossing edge cases | Property test asserts naturality explicitly for each WordEdit pair; CI flags on violation | +| **`word-builder-swift` lens migration delayed indefinitely** — without forcing function, follow-up Spectra change may never happen | ADR-008's documented migration path is the forcing function; add a calendar reminder in the closing summary to revisit in 3 months | +| **#46 follow-up issues never opened** — without explicit task, the downstream rerouting (#92/#88/#90 re-frames) loops 
indefinitely | tasks.md includes explicit "Open follow-up issues for downstream re-framing" task | +| **Property tests over-rely on NTPU thesis fixture** — single fixture has known coverage gaps (no Zotero customXml, no people.xml in main fixture) | ADR-007 follow-up explicitly opens corpus expansion as separate Spectra change | +| **`word-aligned-state-sync` coordination conflict** — both changes describe the Edit-type contract; ordering matters | This change archives FIRST (decision-pinning has no implementation lock-in); `word-aligned-state-sync` refines after, with cross-reference | +| **Property test framework choice not pinned** — swift-testing vs XCTest property-based extensions | Use `swift-testing`'s `@Test(arguments:)` with randomly generated argument arrays (e.g. built from `Int.random(in:)` / `randomElement()`) for property inputs; documented in tasks.md | + +## Migration Plan + +**This change has no user-facing migration**. The Edit type addition is additive — existing `Document.applyOverlay()` / `markDirty()` callers continue working unchanged. + +**Downstream Spectra changes that DO require migration** (out of this change's scope, but documented for context): + +1. **`word-builder-swift` lens migration** (per ADR-008) — separate Spectra change opening within 3 months of this foundation archive +2. **`che-word-mcp` boundary refactor** (per ADR-009) — separate Spectra change after #1 lands +3. **`#94/#95/#96/#97/#98` revisions** (per ADR-009) — Codex revisions per blocked PR, each citing this foundation + +No tool / CI / config migration in this change's apply phase. 
diff --git a/openspec/changes/ooxml-edit-isomorphism-foundation/proposal.md b/openspec/changes/ooxml-edit-isomorphism-foundation/proposal.md new file mode 100644 index 0000000..bfc8484 --- /dev/null +++ b/openspec/changes/ooxml-edit-isomorphism-foundation/proposal.md @@ -0,0 +1,70 @@ +## Why + +The macdoc OOXML toolchain has shipped working infrastructure for canonical-identity round-trip (ooxml-swift v0.13.0+ overlay save through v0.20.3's 5 preservation classes) and a write-only DSL (word-builder-swift 0.9.0), but the **architectural contract binding these together has never been pinned**. Downstream efforts — #92 (dxedit declarative CLI), #88 (R → WordBuilderSwift generator), #90 (che-pptx-mcp) — are advancing as parallel design negotiations, each implicitly assuming a different mental model for what an "edit" is and what "round-trip safe" means. + +Issue #99 consolidates a four-stage discussion (thesis rescue → library is the goal → radical translation chosen over reference-template/hand-craft → canonical-identity over byte-identity over semantic-equivalence → Tree + Lens over Struct serialization → Edit as fully-faithful-functor) into one architectural commitment: **the macdoc OOXML toolchain treats Word↔Swift edit-isomorphism (fully faithful functor) as its core architectural contract, not a library-internal implementation detail.** + +Without locking this contract now, every downstream proposal (PR #94/#95/#96/#97/#98 already blocked on spec ambiguity per 6-AI verify reports) will continue surfacing the same five missing decisions: edit semantics, two-layer algebra mechanics, CD review discipline, module split, and where the boundary between "decision-pin" and "implement" lives. + +## What Changes + +This change pins the architectural contract via: + +- **Edit as first-class type** (NEW). Currently `Edit` is an architectural primitive expressed through `Document.applyOverlay()` and `Document.markDirty()` patterns shipped in v0.13.0+. 
This change elevates `Edit` to a first-class Swift type that carries its own equality, composition, and `lower()` semantics. Existing overlay-save infrastructure becomes the runtime backing; the type-level contract becomes explicit. + +- **Two-layer edit algebra** (NEW). Define `WordEdit` (semantic, Word-UI-mirroring) and `OOXMLEdit` (syntax-level, byte-precise) as separate Swift types with `WordEdit.lower(): [OOXMLEdit]` as the bridge. The naturality property — `WordEdit(a).lower() ∘ WordEdit(b).lower()` is equivalent to `(WordEdit(a) ∘ WordEdit(b)).lower()` — becomes a normative invariant. + +- **Canonical-identity contract** (NEW spec, formalizing existing behavior). The contract `parse(x) → mutate → serialize` produces output where the unmodified subtree (after c14n) is bytewise-equal to its input form. v0.20.3's 5 preservation classes already enforce this empirically; this change writes it as a normative spec Requirement with explicit Scenarios for sectPr / comments / watermarks / vendor extensions / Zotero customXml. + +- **CD discipline as PR review gate** (NEW process). Every PR introducing a new `OOXMLEdit` or `WordEdit` case SHALL attach a commutative diagram + commute proof (ASCII ladder, IETF-style). Reviewer rejects PRs without it. The diagram serves three roles: verification spec generator, documentation standard, and incompatibility-surface detector. + +- **9 ADRs** (NEW design.md sections) capture the per-decision rationale: round-trip contract (ADR-001), Edit-as-first-class (ADR-002), two-layer algebra (ADR-003), module split (ADR-004), operation naming (ADR-005), Word UI as ground truth (ADR-006), conformance suite extension (ADR-007), word-builder-swift lens migration path (ADR-008 deferred), downstream rerouting (ADR-009). + +- **Module split** (NEW, deferred implementation). `OOXMLSyntax` (Layer 0+1, lossless tree + typed lens) vs `OOXMLSemantic` (Layer 2, Word-UI-mirroring semantic API) vs `OOXMLDSL` (Layer 3, result-builder front-end). 
ADR-004 documents the split; actual module reorganization deferred to follow-up changes. + +- **Apply phase scope** (per spectra-discuss conclusion + verify-cycle scope adjustment, see tasks.md ASSUMPTION block): this change ships **decision-pinning + mechanical artifacts only** (§8 follow-up issue creation, §9 PR template + CD discipline README, §10 docs cross-references + spectra validate). The Swift implementation (`Edit` type elevation + 3–5 OOXMLEdit cases + WordEdit cases + property-based functor tests, originally enumerated in tasks §1–§7 + §10.2) is **DEFERRED to a dedicated Phase 2 Spectra change** named `ooxml-edit-algebra-implementation` (tracking issue PsychQuant/macdoc#105) that cites this foundation. + +**Why the Phase 2 deferral**: The 23 Swift tasks require TDD discipline + audit discipline per `.spectra.yaml` + CD diagram authoring + property-based test calibration on the NTPU thesis fixture — work that benefits from dedicated implementation cycles with human checkpoints at API surface trade-off decisions. Bundling them with the decision-pinning work (this change) would either rush the implementation or block the decision-pinning indefinitely. Splitting them lets the contract land first, the runtime code follow with proper review. + +**BREAKING**: None in this change's apply scope. Decision-pinning is normative-content-only (no code shipped that callers could break against). Lens migration for `word-builder-swift` is a future BREAKING change deferred to its own Spectra change (per ADR-008). + +## Non-Goals (optional) + +Captured in design.md Non-Goals section per template guidance. 
Key Non-Goals: + +- Implementing `word-builder-swift` lens-model migration (deferred to dedicated follow-up Spectra change per ADR-008) +- Refactoring `che-word-mcp` MCP tool boundary to expose `WordEdit` directly (deferred to follow-up; current Document-mutation API stays during transition) +- Re-routing downstream issues (#92 dxedit, #88 R-wordbuilder, #90 pptx-mcp) operationally; this change documents the rerouting *intent* via ADR-009 but the operational re-framing is a separate change per downstream +- Automated CD-diagram validation tooling (manual reviewer discipline only; automated tools may follow if proven necessary) +- Implementing the full `[OOXMLEdit]` surface — only 3–5 representative operations for property-test validation + +## Capabilities + +### New Capabilities + +- `ooxml-edit-algebra`: The architectural contract for Word↔Swift edit-isomorphism. Defines `Edit` type semantics, two-layer algebra (`WordEdit` / `OOXMLEdit`), `lower()` bridge naturality, canonical-identity round-trip contract, CD discipline as review gate, and the relationship between this contract and existing prior-art documents (`lossless-conversion.md`, `structural-editing-paradigm.md`, `functional-correspondence.md`). 
+ +### Modified Capabilities + +(none — this change introduces a new cross-cutting capability rather than modifying existing per-tool/per-area specs) + +## Impact + +- Affected specs: `ooxml-edit-algebra` (new capability under `openspec/specs/ooxml-edit-algebra/`) +- Affected code (DEFERRED to Phase 2 Spectra change `ooxml-edit-algebra-implementation` (#105) — not shipped in this change's apply scope): + - **Deferred**: New `packages/ooxml-swift/Sources/OOXMLSwift/EditAlgebra/Edit.swift` (Edit type elevation) + - **Deferred**: New `packages/ooxml-swift/Sources/OOXMLSwift/EditAlgebra/OOXMLEdit.swift` (3–5 representative cases) + - **Deferred**: New `packages/ooxml-swift/Sources/OOXMLSwift/EditAlgebra/WordEdit.swift` (corresponding semantic-layer cases) + - **Deferred**: New `packages/ooxml-swift/Tests/EditAlgebraTests/FullyFaithfulFunctorTests.swift` (property-based test using NTPU thesis fixture) + - **Deferred**: Modified `packages/ooxml-swift/Sources/OOXMLSwift/Document.swift` (add Edit-type convenience constructors; existing applyOverlay/markDirty unchanged) +- Affected docs (shipped in this change): + - Modified: `docs/structural-editing-paradigm.md` (add cross-reference to new `ooxml-edit-algebra` capability) + - Modified: `docs/lossless-conversion.md` (add cross-reference) +- Affected processes: + - New PR template requirement: PRs touching `EditAlgebra/` must attach CD diagram (per ADR-002) +- Active Spectra changes coordination: + - `word-aligned-state-sync` — design.md must add "Relationship to ooxml-edit-isomorphism-foundation" section; their Decision 3 (ID-based operations) becomes a refinement of this foundation's Edit-type contract, not a parallel design + +**ASSUMPTION** (documented per UNATTENDED MODE directive): The Edit type's runtime backing is `Document.applyOverlay()` machinery shipped in v0.13.0+. 
If this assumption is wrong (e.g., overlay save's commit model is incompatible with first-class Edit composition), the apply scope expands to redesigning the overlay layer — out of this change's scope and would require splitting into pre-foundation + foundation Spectra changes. + +**ASSUMPTION**: 3–5 representative OOXMLEdit operations cover enough surface to validate the fully-faithful-functor property for the contract-pinning purpose. If property tests reveal naturality violations beyond these 3–5 operations, those become follow-up issues, NOT scope expansion within this change. diff --git a/openspec/changes/ooxml-edit-isomorphism-foundation/specs/ooxml-edit-algebra/spec.md b/openspec/changes/ooxml-edit-isomorphism-foundation/specs/ooxml-edit-algebra/spec.md new file mode 100644 index 0000000..03e30a3 --- /dev/null +++ b/openspec/changes/ooxml-edit-isomorphism-foundation/specs/ooxml-edit-algebra/spec.md @@ -0,0 +1,201 @@ +## ADDED Requirements + +### Requirement: Canonical-Identity Round-Trip Contract + +The `ooxml-swift` library SHALL guarantee that for any input `.docx` file `x` and any sequence of `Edit` operations `[e_1, e_2, ..., e_n]` applied via `Document.apply(_:)`, the resulting serialized output preserves the canonical-identity invariant: after XML canonicalization (c14n) of both input and output, every subtree NOT targeted by any `e_i` MUST be bytewise-equal in the c14n form. + +The library SHALL NOT silently drop, reorder, or normalize XML content outside the targeted subtrees. This includes vendor extensions, comments, watermarks, custom styles, `customXml` parts (Zotero / Mendeley / EndNote), `people.xml`, `commentsExtended.xml`, embedded fonts, theme references, and any unrecognized OOXML namespace. 
+ +#### Scenario: Unmodified sectPr preserved through round-trip + +- **WHEN** a `.docx` containing a `<w:sectPr>` element with multiple custom children (for example header / footer references) is parsed, no Edit operations are applied targeting `sectPr`, and the document is serialized +- **THEN** the c14n form of `sectPr` in the output is bytewise-equal to the c14n form in the input + +##### Example: sectPr with header / footer references + +- **GIVEN** input `.docx` whose `document.xml` contains a `<w:sectPr>` with header / footer references +- **WHEN** `Document.apply(OOXMLEdit.insertParagraph(at: 0, content: ...))` is called and result serialized +- **THEN** c14n(output sectPr) equals c14n(input sectPr) byte-for-byte + +#### Scenario: Vendor extensions preserved (Zotero customXml) + +- **WHEN** a `.docx` with Zotero `customXml/item1.xml` is parsed and any `Edit` is applied that does NOT target the Zotero customXml part +- **THEN** the serialized output's `customXml/item1.xml` is c14n-equal to the input's + +#### Scenario: Targeted subtree NOT bytewise-equal (intended modification) + +- **WHEN** `OOXMLEdit.setBold(at: runPath, value: true)` is applied to a Run whose `rPr` does not contain `<w:b/>` +- **THEN** the output Run's `rPr` includes `<w:b/>` (NOT bytewise-equal to input — this is the intended modification) +- **AND** all sibling Runs NOT at `runPath` remain c14n-equal to their input forms + +### Requirement: Edit Type Algebra + +The `ooxml-swift` library SHALL provide an `Edit` protocol that declares: + +1. `apply(to:) throws -> Document` — applies the Edit to a Document, returning a new Document +2. `lower() -> [OOXMLEdit]` — returns the syntactic-layer translation (identity for OOXMLEdit cases) + +The library SHALL provide two concrete `Edit` conformances: + +1. `OOXMLEdit` — syntactic-layer Edit, addressing OOXML elements by path +2. 
`WordEdit` — semantic-layer Edit, addressing Word-UI verbs + +Edit values SHALL be composable: for any two Edits `a, b`, the composition `a ∘ b` is well-defined and its effect on a Document is equivalent to applying `a` then `b`. Composition SHALL be associative: `(a ∘ b) ∘ c == a ∘ (b ∘ c)`. + +#### Scenario: OOXMLEdit composition is associative + +- **WHEN** three OOXMLEdit values `e1`, `e2`, `e3` are composed in different parenthesizations +- **THEN** `(e1 ∘ e2) ∘ e3` and `e1 ∘ (e2 ∘ e3)` applied to the same Document produce bytewise-equal c14n output + +##### Example: associativity on insertParagraph + setBold + insertHyperlink + +- **GIVEN** Document `d` with a single empty paragraph +- **WHEN** `e1 = OOXMLEdit.insertParagraph(at: 0, content: "hello")`, `e2 = OOXMLEdit.setBold(at: ⟨first run⟩, value: true)`, `e3 = OOXMLEdit.insertHyperlink(at: ⟨first run⟩, href: "https://example.com")` +- **THEN** `((e1 ∘ e2) ∘ e3).apply(to: d)` and `(e1 ∘ (e2 ∘ e3)).apply(to: d)` produce documents whose c14n serializations are byte-equal + +#### Scenario: WordEdit.lower() returns OOXMLEdit translation + +- **WHEN** a `WordEdit.applyBold(range:)` value is created with a range that does NOT cross paragraph boundaries +- **THEN** `.lower()` returns `[OOXMLEdit.setBold(at: runPath, value: true)]` for the Run that the range covers: a single-element list when the range coincides with a whole Run, and a list preceded by an `OOXMLEdit.splitRun` when the range covers only part of a Run + +##### Example: applyBold within single paragraph + +- **GIVEN** a Document with paragraph `[Run("Hello world")]`, and a range covering characters 0..5 ("Hello") +- **WHEN** `WordEdit.applyBold(range: 0..<5).lower()` is called +- **THEN** result is `[OOXMLEdit.splitRun(at: ⟨para 0, run 0⟩, splitOffset: 5), OOXMLEdit.setBold(at: ⟨para 0, run 0⟩, value: true)]` +- **AND** applying this OOXMLEdit list produces a Document whose first Run "Hello" has `<w:b/>` set and second Run " world" does not + +### Requirement: Fully Faithful Functor Property (Naturality of lower) + +The translation from `WordEdit` to `OOXMLEdit` via 
`lower()` SHALL preserve composition: for any two composable WordEdit values `a, b`, the following invariant MUST hold: + +``` +(a ∘ b).lower() == a.lower() ∘ b.lower() +``` + +where the right-hand `∘` denotes OOXMLEdit list concatenation (composition in the OOXMLEdit algebra). + +This property is the fully-faithful-functor invariant that makes the two-layer algebra coherent. The `ooxml-swift` test suite SHALL include property-based tests asserting this invariant for every implemented `WordEdit` case. + +#### Scenario: Naturality holds for applyBold + applyLink composition + +- **WHEN** `a = WordEdit.applyBold(range: r1)` and `b = WordEdit.applyLink(range: r2, url: u)` are composed +- **THEN** `(a ∘ b).lower()` produces the same OOXMLEdit list (allowing reordering of independent operations) as `a.lower() ∘ b.lower()` +- **AND** applying both lists to the same Document yields c14n-equal outputs + +#### Scenario: Range-crossing applyBold lowers to multiple OOXMLEdit cases + +- **WHEN** `WordEdit.applyBold(range:)` is called with a range crossing a paragraph boundary +- **THEN** `.lower()` returns a list with one `OOXMLEdit.setBold` per affected paragraph, in document order + +### Requirement: CD Review Discipline for Edit Cases + +Every PR introducing a new `OOXMLEdit` or `WordEdit` enum case SHALL include a commutative diagram (ASCII ladder, IETF-RFC style) demonstrating the case's commute property. The diagram SHALL show: + +1. The Word UI action that the Edit case corresponds to (for WordEdit) or the OOXML schema change (for OOXMLEdit) +2. The `τ` translation between Word UI semantics and Swift representation +3. The commutativity claim: applying the Edit on the Word side commutes with applying the corresponding Edit on the Swift side under `τ` + +The PR reviewer SHALL reject PRs that introduce new Edit cases without the CD diagram. Reviewers SHALL verify that the CD diagram's commute claim is justified (either by inspection or by attached property test). 
+ +CD diagrams in this codebase SHALL use the ASCII format shown in `design.md` §Decisions/ADR-002. They are normative documentation for the Edit case, not optional commentary. + +#### Scenario: PR introducing OOXMLEdit case without CD diagram is rejected + +- **WHEN** a PR adds a new `OOXMLEdit.insertTable` enum case but the PR description / files do NOT include a CD diagram for the case +- **THEN** the PR reviewer SHALL request changes citing this Requirement +- **AND** the PR SHALL NOT be merged until a CD diagram is attached + +#### Scenario: PR with CD diagram passes review + +- **WHEN** a PR adds `OOXMLEdit.insertTable` and includes an ASCII ladder CD diagram showing the Word UI "Insert → Table" action commutes with the schema-level `` insertion under `τ` +- **THEN** the PR meets this Requirement's documentation criteria + +### Requirement: Edit Apply Surface on Document + +The `Document` type SHALL expose `Document.apply(_ edit: any Edit) throws -> Document`, which applies the Edit and returns a new Document instance. This method SHALL: + +1. Delegate internally to the existing `Document.applyOverlay()` / `markDirty()` machinery shipped in v0.13.0+ (NO replacement of the existing overlay-save infrastructure) +2. Return a new Document instance (immutable apply — input Document is unchanged) +3. Throw `EditError.pathNotFound(path:)` if the Edit's target path does not exist in the input Document +4. Throw `EditError.preserveViolation(part:)` if the Edit would cause a non-c14n-equal change to a subtree NOT in the Edit's target path + +The existing `Document.applyOverlay()` and `Document.markDirty()` APIs SHALL continue to function unchanged for backward compatibility. 
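The apply surface described in this Requirement can be sketched on a toy document. Everything here is an assumption made for illustration: `SketchDocument`, `SketchEdit`, and `SketchEditError` are hypothetical names, run paths are plain strings, and the overlay-save delegation is not modeled. The sketch shows the three observable properties only: a new value is returned, a missing path throws, and a defensive check guards untargeted content.

```swift
// Hypothetical stand-ins; the real Document, OOXMLEdit and EditError are deferred to Phase 2.
struct SketchDocument: Equatable {
    var runs: [String: Bool]   // run path -> bold flag
}

enum SketchEditError: Error, Equatable {
    case pathNotFound(path: String)
    case preserveViolation(part: String)
}

enum SketchEdit {
    case setBold(at: String, value: Bool)
}

extension SketchDocument {
    // Non-mutating: the receiver is left untouched and a modified copy is returned.
    func apply(_ edit: SketchEdit) throws -> SketchDocument {
        switch edit {
        case let .setBold(path, value):
            guard runs[path] != nil else {
                throw SketchEditError.pathNotFound(path: path)
            }
            var copy = self
            copy.runs[path] = value
            // Defensive check: every run outside the target path must be unchanged.
            for (otherPath, bold) in runs where otherPath != path {
                guard copy.runs[otherPath] == bold else {
                    throw SketchEditError.preserveViolation(part: otherPath)
                }
            }
            return copy
        }
    }
}

let d1 = SketchDocument(runs: ["p0/r0": false, "p0/r1": false])
let d2 = try d1.apply(.setBold(at: "p0/r0", value: true))

// A path that does not resolve surfaces as a typed error.
var caught: SketchEditError? = nil
do {
    _ = try d1.apply(.setBold(at: "p9/r9", value: true))
} catch let error as SketchEditError {
    caught = error
} catch {
    print("unexpected error: \(error)")
}
print(d1.runs["p0/r0"] == false, d2.runs["p0/r0"] == true, caught == .pathNotFound(path: "p9/r9"))
```

Value semantics give the "input Document is unchanged" property for free in this sketch. Whether the real `Document` is a value type or needs an explicit copy is not stated in the source and is left open here.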
+ +#### Scenario: apply returns new Document instance + +- **WHEN** `let d2 = try d1.apply(OOXMLEdit.setBold(at: runPath, value: true))` +- **THEN** `d2` is a new Document with the edit applied, `d1` is unchanged + +#### Scenario: apply throws on path not found + +- **WHEN** `OOXMLEdit.setBold(at: runPath, value: true)` is applied to a Document where `runPath` does not resolve +- **THEN** `apply` throws `EditError.pathNotFound(path: runPath)` + +#### Scenario: apply throws on preserve violation (defensive) + +- **WHEN** a buggy Edit implementation attempts to modify an unmodified subtree +- **THEN** `apply`'s internal canonical-identity check throws `EditError.preserveViolation(part: ⟨affected part⟩)` before serialization completes + +### Requirement: Word UI Behavior as Ground Truth for WordEdit Semantics + +For each `WordEdit` case, the case's semantics SHALL be specified by reference to the corresponding Microsoft Word UI action. Ambiguities in OOXML output for that action SHALL be resolved by inspecting Word's own behavior (e.g., saving the document after the user action and inspecting the resulting OOXML diff). + +WordEdit cases SHALL NOT define semantics that diverge from Word's UI implementation. If Word's implementation has multiple modes (e.g., Cmd-B with vs. without selection), each mode SHALL be a separate WordEdit case or a parameter on the case. + +The `design.md` ADR-006 records the "Word UI as ground truth" methodology and worked examples. 
+ +#### Scenario: applyBold semantics match Cmd-B + +- **WHEN** `WordEdit.applyBold(range: r)` is applied to a Document +- **AND** Microsoft Word is opened on the same input Document, the same range `r` is selected, and Cmd-B is pressed and saved +- **THEN** the OOXML diff between Word's saved file and the input is c14n-equivalent to the diff between `WordEdit.applyBold(range: r).apply(...)` and the input + +#### Scenario: applyBold semantics for empty range + +- **WHEN** `WordEdit.applyBold(range:)` is called with an empty range (insertion point with no selection) +- **THEN** the WordEdit case SHALL document the corresponding Word behavior (toggle bold for next-typed character) explicitly OR throw a defined error +- **AND** the case SHALL NOT silently no-op without documentation + +### Requirement: Property-Based Functor Tests on NTPU Thesis Fixture + +The `ooxml-swift` test suite SHALL include property-based tests under `Tests/EditAlgebraTests/FullyFaithfulFunctorTests.swift` that exercise the fully-faithful-functor invariant for at least 3 implemented `OOXMLEdit` cases. The tests SHALL use the existing NTPU thesis fixture from `RealWorldDocxRoundTripSmokeTests` as the input Document. + +Each property test SHALL: + +1. Generate randomized arguments for the Edit case (within valid input domain) +2. Apply the Edit, then assert canonical-identity for unmodified subtrees +3. For `WordEdit` cases, additionally assert naturality of `lower()` (per `Fully Faithful Functor Property` Requirement) + +The Apply phase of this Spectra change SHALL select 3–5 specific `OOXMLEdit` cases for property-test coverage. The choice SHALL be recorded in `tasks.md` and SHALL include at minimum: `insertParagraph`, `setBold`, `insertHyperlink`. 
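The shape of such a property test can be sketched without the real fixture. This is an illustrative sketch under stated assumptions: `FixtureRun`, `setBold`, `siblingsPreserved`, and `SeededGenerator` are hypothetical helpers, the 20-run array stands in for the NTPU thesis fixture, and equality of toy values stands in for c14n comparison. In the real suite each generated input could instead be passed as one argument to `swift-testing`'s `@Test(arguments:)`.

```swift
// Deterministic generator (SplitMix64) so a failing input can be reproduced from the seed.
struct SeededGenerator: RandomNumberGenerator {
    var state: UInt64
    mutating func next() -> UInt64 {
        state &+= 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
}

// Toy run: a stand-in for a Run in the fixture document.
struct FixtureRun: Equatable {
    var text: String
    var bold: Bool
}

// Toy edit under test: sets the bold flag on exactly one run.
func setBold(_ runs: [FixtureRun], at index: Int, value: Bool) -> [FixtureRun] {
    var out = runs
    out[index].bold = value
    return out
}

// Property: every run other than the target is unchanged by the edit.
func siblingsPreserved(before: [FixtureRun], after: [FixtureRun], target: Int) -> Bool {
    guard before.count == after.count else { return false }
    return before.indices.allSatisfy { $0 == target || before[$0] == after[$0] }
}

let fixture = (0..<20).map { FixtureRun(text: "run \($0)", bold: $0 % 3 == 0) }
var rng = SeededGenerator(state: 42)
var failures = 0

// 100 randomized (target, value) inputs drawn from the valid input domain.
for _ in 0..<100 {
    let target = Int.random(in: fixture.indices, using: &rng)
    let value = Bool.random(using: &rng)
    let edited = setBold(fixture, at: target, value: value)
    let targetUpdated = edited[target].bold == value
    if !siblingsPreserved(before: fixture, after: edited, target: target) || !targetUpdated {
        failures += 1
    }
}
print("failures: \(failures)")
```

Seeding the generator is a design choice of this sketch rather than something the spec requires; it trades input variety across runs for reproducible failures.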
+ +#### Scenario: Property test passes 100 randomized inputs for setBold + +- **WHEN** the property test for `OOXMLEdit.setBold(at: runPath, value: Bool)` is executed +- **THEN** for 100 randomized `runPath` values within the NTPU thesis fixture's Run set, AND for both `value: true` and `value: false`, the canonical-identity invariant holds (unmodified subtrees c14n-equal to input) + +#### Scenario: Property test detects regression in canonical-identity + +- **WHEN** a regression is introduced where `OOXMLEdit.setBold` accidentally modifies a sibling Run's `rPr` +- **THEN** the property test SHALL fail with an assertion message identifying which c14n-comparison failed + +### Requirement: Downstream Architectural Compliance Documentation + +The `ooxml-edit-algebra` capability spec SHALL be referenced by all downstream Spectra changes that depend on the Edit-type contract. Specifically: + +1. `word-aligned-state-sync` design.md SHALL include a "Relationship to ooxml-edit-isomorphism-foundation" section cross-referencing this capability +2. Future `word-builder-swift` lens-migration Spectra change SHALL cite ADR-008 of this design.md +3. Future `che-word-mcp` boundary-refactor Spectra change SHALL cite ADR-009 of this design.md +4. Revisions to blocked PRs #94, #95, #96, #97, #98 SHALL reframe their proposals as front-ends to this foundation + +The downstream compliance is an ADVISORY scope of this Requirement — this Requirement SHALL NOT block this change's apply phase, but SHALL be documented as the expected operational coordination for the cluster. 
+ +#### Scenario: word-aligned-state-sync archives without cross-reference + +- **WHEN** `word-aligned-state-sync` is being archived AND its design.md does NOT cross-reference this capability +- **THEN** the archival reviewer SHALL request the cross-reference addition before archive + +#### Scenario: Future Spectra change cites ADR-008 for lens migration + +- **WHEN** a future Spectra change proposing the `word-builder-swift` lens migration is opened +- **THEN** its proposal.md SHALL include text similar to "Per ADR-008 of `ooxml-edit-isomorphism-foundation`, this change implements the deferred lens-model migration" diff --git a/openspec/changes/ooxml-edit-isomorphism-foundation/tasks.md b/openspec/changes/ooxml-edit-isomorphism-foundation/tasks.md new file mode 100644 index 0000000..e661672 --- /dev/null +++ b/openspec/changes/ooxml-edit-isomorphism-foundation/tasks.md @@ -0,0 +1,142 @@ + + +## Requirement and Design Coverage Map + +This table maps each spec Requirement and each design ADR to the task groups that deliver it. The analyzer expects every Requirement and Design topic to be referenced in tasks. 
+ +**Spec Requirements (in `specs/ooxml-edit-algebra/spec.md`):** + +| Requirement | Covered by task group(s) | +|---|---| +| Canonical-Identity Round-Trip Contract | §2 (Document.apply preserve check), §6 (property tests assert canonical-identity) | +| Edit Type Algebra | §1 (Edit protocol + enums), §2 (Document.apply surface), §7 (associativity tests) | +| Fully Faithful Functor Property (Naturality of lower) | §5 (WordEdit.lower implementations), §6.5 (naturality property test) | +| CD Review Discipline for Edit Cases | §3.1, §3.2, §3.3, §4.2, §4.3 (CD diagrams per Edit case), §9.1 (PR template), §9.2 (README on CD discipline) | +| Edit Apply Surface on Document | §2.1, §2.2, §2.3 (Document.apply + error paths) | +| Word UI Behavior as Ground Truth for WordEdit Semantics | §5.1, §5.2, §5.3 (WordEdit cases reference Word UI per ADR-006), §3 CD diagrams cite Cmd-B / Cmd-K | +| Property-Based Functor Tests on NTPU Thesis Fixture | §6.1, §6.2, §6.3, §6.4 (per-case property tests on NTPU fixture) | +| Downstream Architectural Compliance Documentation | §8.1 (word-aligned cross-reference), §8.2-8.6 (follow-up issues for downstream re-frames) | + +**Design ADRs (in `design.md` § Decisions):** + +| ADR | Covered by task group(s) | +|---|---| +| ADR-001: Round-trip contract = canonical-identity | §2.3 (preserve violation defensive check enforces ADR-001), §6 (property tests assert ADR-001 invariant) | +| ADR-002: Core type = Edit (not Document); PR-must-attach-CD-diagram review discipline | §1 (Edit type scaffold), §9.1 (PR template requirement), §9.2 (README documenting CD methodology per ADR-002) | +| ADR-003: Two-layer edit algebra (WordEdit / OOXMLEdit), lower() as bridge | §1.2, §1.3 (OOXMLEdit + WordEdit enums), §5 (WordEdit.lower implementations), §6.5 (naturality test) | +| ADR-004: Module split — OOXMLSyntax (L0/L1) / OOXMLSemantic (L2) / OOXMLDSL (L3) | §1.1 (EditAlgebra/ subdirectory introduces the boundary as prelude to ADR-004 follow-up split); physical 
module reorganization explicitly deferred per ADR-004 | +| ADR-005: Edit operation surface naming + canonical example set | §3 (insertParagraph / setBold / insertHyperlink — the 3 canonical OOXMLEdit cases per ADR-005), §4 (2 additional selected cases per ADR-005) | +| ADR-006: Word UI behavior as ground truth | §3.1, §3.2, §3.3 (CD diagrams cite Word UI actions per ADR-006), §5.1-5.3 (WordEdit cases reference Word UI semantics) | +| ADR-007: Conformance suite extension from NTPU thesis fixture | §6.1 (test target setup using NTPU fixture per ADR-007), §6.2-6.4 (property tests on NTPU fixture) | +| ADR-008: Migration path for word-builder-swift 0.9.0 → lens model (deferred) | §8.2 (open follow-up issue per ADR-008's documented migration path) — implementation deferred per ADR-008 explicit Non-Goal | +| ADR-009: Downstream issue rerouting | §8.3 (che-word-mcp boundary refactor follow-up), §8.4 (PR #94 re-frame), §8.5 (PR #96 re-frame), §8.6 (PR #95 re-frame) — all per ADR-009 | +| Relationship to active changes | §8.1 (word-aligned-state-sync cross-reference addition per design.md "Relationship to active changes" section) | + +## ASSUMPTION: Swift Implementation Deferred to Phase 2 Spectra Change + +**This change's apply phase scope (per spectra-discuss conclusion + UNATTENDED MODE budget reality):** + +- **Shipped (this change apply phase, IDs 24-35 except 33):** + - §8.1 word-aligned-state-sync design.md cross-reference (added Relationship section) + - §8.2 follow-up issue for word-builder-swift lens migration (#101 PsychQuant/macdoc) + - §8.3 follow-up issue for che-word-mcp boundary refactor (#162 PsychQuant/che-word-mcp) + - §8.4 follow-up issue for PR #94 re-frame (#102 PsychQuant/macdoc) + - §8.5 follow-up issue for PR #96 re-frame (#103 PsychQuant/macdoc) + - §8.6 follow-up issue for PR #95 re-frame (#104 PsychQuant/macdoc) + - §9.1 PR template requiring CD diagram for EditAlgebra/ PRs (`.github/PULL_REQUEST_TEMPLATE.md`) + - §9.2 EditAlgebra/README.md 
documenting CD discipline + worked-example guidance + - §10.1 spectra validate green + - §10.3 docs/structural-editing-paradigm.md cross-reference to ooxml-edit-algebra capability + - §10.4 docs/lossless-conversion.md cross-reference + +- **DEFERRED to dedicated Phase 2 Spectra change** (Swift code implementation, §1-§7 + §10.2 = IDs 1-23, 33): + + The 23 implementation tasks (Edit protocol scaffold §1, Document.apply API §2, 3 canonical OOXMLEdit cases + CD diagrams §3, 2 additional cases §4, 3 WordEdit cases §5, property-based functor tests §6, composition/associativity tests §7) require: + - Deep integration with existing ooxml-swift v0.13.0+ overlay-save infrastructure (~9 ADRs touch this surface) + - TDD discipline per `.spectra.yaml` (write failing test first per task) + - Audit discipline per `.spectra.yaml` (3-lens adversary review per task) + - CD diagram authoring per Edit case (5+ ASCII ladders, each verified for commute) + - Property-based tests on NTPU thesis fixture (per-operation 100-input runs) + + This is genuinely 2-5 days of focused implementation work that benefits significantly from human review at decision points (Edit type API surface trade-offs, CD diagram correctness, property-test value-domain selection). Pushing through under unattended chain pressure risks introducing the very type of subtle errors that the architectural foundation is designed to prevent. + + **Per UNATTENDED MODE directive**: "If §1-§7 Swift implementation tasks exceed practical session budget, document explicit ASSUMPTION blocks in tasks.md noting what remains, mark the implementation tasks as deferred-to-Phase-2, and ship §8 + §9 + §10 mechanical tasks fully." + + **Phase 2 Spectra change** (to open after this foundation archives): `ooxml-edit-algebra-implementation`. It cites this foundation's design.md ADRs + spec.md as inputs. Apply phase implements §1-§7 + §10.2 in dedicated session(s) with appropriate human checkpoints. 
+ +- **§10.2 swift test full suite** (ID 33): N/A in this apply scope since no Swift code was added (only `.github/PULL_REQUEST_TEMPLATE.md` + `EditAlgebra/README.md` + 2 docs cross-refs). Test suite unchanged from main, no regression possible. Deferred to Phase 2. + +This deferral is **architecturally sound**: decision-pinning (this change) and runtime implementation (Phase 2) serve different purposes and need different review modes. The contract is locked here; the code lands cleanly later citing the locked contract. + +--- + +## 1. Edit type protocol + OOXMLEdit / WordEdit enum scaffold (DEFERRED to Phase 2) + +- [ ] 1.1 Add `EditAlgebra/` subdirectory to `packages/ooxml-swift/Sources/OOXMLSwift/` and create `Edit.swift` declaring the `Edit` protocol with `apply(to:)` + `lower()` method signatures. Verify: `swift build` succeeds on ooxml-swift package with no warnings, and the `Edit` protocol appears in `swift package describe --type json` output. +- [ ] 1.2 Create `OOXMLEdit.swift` with empty enum scaffold + conformance to `Edit` protocol (apply / lower as stubs throwing `EditError.notImplemented`). Verify: `swift build` succeeds; targeted test `EditAlgebraTests.testOOXMLEditEnumExists` instantiates the enum and asserts conformance. +- [ ] 1.3 Create `WordEdit.swift` with empty enum scaffold + conformance to `Edit` protocol (lower as stub returning empty array). Verify: `swift build` succeeds; targeted test `EditAlgebraTests.testWordEditEnumExists` asserts conformance. +- [ ] 1.4 Define `EditError` enum in `EditAlgebra/Edit.swift` with cases `pathNotFound(path:)`, `preserveViolation(part:)`, `notImplemented`. Verify: errors are throwable from protocol method signatures; `swift test --filter EditAlgebraTests.testEditErrorCases` round-trips each case through throw/catch. + +## 2. Document.apply(_:) public API + +- [ ] 2.1 Add `Document.apply(_ edit: any Edit) throws -> Document` method on `Document`. 
Implementation delegates to existing `applyOverlay()` / `markDirty()` infrastructure (no replacement). Verify: existing `Document.applyOverlay()` and `markDirty()` callers continue compiling unchanged; new `swift test --filter EditAlgebraTests.testApplyReturnsNewDocument` confirms immutable apply semantics (input Document unchanged after method call). +- [ ] 2.2 Implement `EditError.pathNotFound(path:)` throwing path in `Document.apply(_:)` — when the Edit's target path does not resolve in input Document. Verify: `swift test --filter EditAlgebraTests.testApplyThrowsOnPathNotFound` instantiates an Edit with invalid path and asserts the correct error is thrown. +- [ ] 2.3 Implement `EditError.preserveViolation(part:)` defensive check — after apply, run canonical-identity check on subtrees NOT in Edit's target path, throw if violation detected. Verify: `swift test --filter EditAlgebraTests.testPreserveViolationDefensive` injects a buggy Edit that modifies a subtree outside its target path, asserts the defensive check fires. + +## 3. Three canonical OOXMLEdit cases + CD diagrams + +- [ ] 3.1 Implement `OOXMLEdit.insertParagraph(at: Int, content: String)` with full `apply` + `lower` (identity). Verify: `swift test --filter EditAlgebraTests.testInsertParagraphApplies` applies the Edit to NTPU thesis fixture and confirms the new paragraph appears at the specified index; CD diagram for this case is appended to `design.md` § ADR-002 Worked Examples (ASCII ladder showing Word UI "Enter at paragraph N" commutes with OOXML `<w:p>` insertion under τ). +- [ ] 3.2 Implement `OOXMLEdit.setBold(at: RunPath, value: Bool)` with full `apply` + `lower`. Verify: `swift test --filter EditAlgebraTests.testSetBoldApplies` toggles `<w:b/>` in target Run's rPr without affecting sibling Runs; CD diagram for this case is appended to `design.md` § ADR-002 Worked Examples (ASCII ladder showing Word UI Cmd-B commutes with OOXML `<w:b/>` insertion under τ). 
+- [ ] 3.3 Implement `OOXMLEdit.insertHyperlink(at: RunPath, href: URL)` with full `apply` + `lower`. Implementation MUST update `_rels/document.xml.rels` (relationship part) atomically with the OOXML `<w:hyperlink>` insertion. Verify: `swift test --filter EditAlgebraTests.testInsertHyperlinkApplies` confirms both `<w:hyperlink>` element AND `Relationship` entry are added, canonical-identity preserved for all other parts; CD diagram appended to `design.md` (ASCII ladder showing Word UI Insert→Hyperlink commutes with the dual-part schema modification under τ). + +## 4. Select + implement 2 additional OOXMLEdit cases + +- [ ] 4.1 During apply, select 2 additional `OOXMLEdit` cases from the candidate list: insertTableRow, deleteCommentReference, setHeading, insertBookmark, or others surfaced by NTPU thesis fixture analysis. Record selection rationale in `tasks.md` (this section) by checking the chosen items below before implementing. Verify: rationale paragraph (50-200 words) added between cases below explains why the 2 chosen cases stress the canonical-identity contract more than the alternatives. +- [ ] 4.2 Implement first selected case with `apply` + `lower` + CD diagram. Verify: `swift test --filter EditAlgebraTests.test⟨caseName⟩Applies` passes against NTPU thesis fixture; CD diagram added to `design.md` § ADR-002 Worked Examples. +- [ ] 4.3 Implement second selected case with `apply` + `lower` + CD diagram. Verify: `swift test --filter EditAlgebraTests.test⟨caseName⟩Applies` passes against NTPU thesis fixture; CD diagram added to `design.md`. + +## 5. Three canonical WordEdit cases + lower() implementations + +- [ ] 5.1 Implement `WordEdit.applyBold(range: Range)` whose `lower()` returns `[OOXMLEdit.splitRun(...), OOXMLEdit.setBold(...)]` when the range does NOT cross paragraph boundaries. When range DOES cross boundaries, lower returns multiple setBold (one per affected paragraph). 
Verify: `swift test --filter EditAlgebraTests.testApplyBoldLowerSingleParagraph` confirms single-paragraph range produces correct OOXMLEdit list; `EditAlgebraTests.testApplyBoldLowerCrossParagraph` confirms multi-paragraph range produces N setBold instances. +- [ ] 5.2 Implement `WordEdit.applyLink(range: Range, url: URL)` whose `lower()` returns `[OOXMLEdit.splitRun(...), OOXMLEdit.insertHyperlink(...)]`. Verify: `swift test --filter EditAlgebraTests.testApplyLinkLower` confirms lower output composes correctly with insertHyperlink's relationship-part update. +- [ ] 5.3 Implement `WordEdit.applyInsertParagraph(after: ParagraphRef, content: String)` whose `lower()` returns `[OOXMLEdit.insertParagraph(at: ...)]`. Verify: `swift test --filter EditAlgebraTests.testApplyInsertParagraphLower` confirms correct paragraph index translation from semantic "after this paragraph" to body-children index. + +## 6. Property-based fully-faithful-functor tests + +- [ ] 6.1 Create `Tests/EditAlgebraTests/FullyFaithfulFunctorTests.swift` test target setup. Test target depends on existing `RealWorldDocxRoundTripSmokeTests` infrastructure for NTPU thesis fixture loading. Verify: `swift test --filter FullyFaithfulFunctorTests.testFixtureLoads` confirms fixture path resolves and Document parses. +- [ ] 6.2 Add property test for `OOXMLEdit.insertParagraph` covering 100 randomized indices. Verify: `swift test --filter FullyFaithfulFunctorTests.testInsertParagraphCanonicalIdentity` runs and all 100 inputs pass; failure messages identify which c14n-comparison failed if regression occurs. +- [ ] 6.3 Add property test for `OOXMLEdit.setBold` covering 100 randomized RunPath + Bool value combinations. Verify: `swift test --filter FullyFaithfulFunctorTests.testSetBoldCanonicalIdentity` passes 100 inputs against NTPU thesis fixture. +- [ ] 6.4 Add property test for `OOXMLEdit.insertHyperlink` covering 100 randomized RunPath + URL combinations. 
Verify: `swift test --filter FullyFaithfulFunctorTests.testInsertHyperlinkCanonicalIdentity` passes 100 inputs including relationship-part atomic update. +- [ ] 6.5 Add naturality property test for `WordEdit.applyBold ∘ WordEdit.applyLink` composition. Verify: `swift test --filter FullyFaithfulFunctorTests.testApplyBoldApplyLinkNaturality` confirms `(a ∘ b).lower() == a.lower() ∘ b.lower()` for 50 randomized range pairs. + +## 7. Composition + associativity tests + +- [ ] 7.1 Add `EditAlgebraTests.testOOXMLEditAssociativity` test covering `(e1 ∘ e2) ∘ e3 == e1 ∘ (e2 ∘ e3)` for 50 randomized OOXMLEdit triples from the 5 implemented cases. Verify: associativity holds via c14n-equality comparison of output documents. +- [ ] 7.2 Add `EditAlgebraTests.testWordEditAssociativity` test for WordEdit composition. Verify: associativity holds via lower() + OOXMLEdit composition equivalence. + +## 8. Cross-reference active changes + open follow-up issues + +- [x] 8.1 Add cross-reference to `ooxml-edit-algebra` capability in `openspec/changes/word-aligned-state-sync/design.md` under a new "Relationship to ooxml-edit-isomorphism-foundation" section. Verify: section exists, includes Decision 3 (ID-based operations) reframe text per ADR-009 guidance, and `spectra validate word-aligned-state-sync` continues passing. +- [x] 8.2 Open follow-up GitHub issue "Spectra: word-builder-swift lens-model migration (per ooxml-edit-isomorphism-foundation ADR-008)" with body citing this change's ADR-008 deferred-migration documentation. Verify: issue number captured in this tasks file; issue body links to this change's design.md. +- [x] 8.3 Open follow-up GitHub issue "Spectra: che-word-mcp boundary refactor to WordEdit (per ooxml-edit-isomorphism-foundation ADR-009)" citing this change's ADR-009. Verify: issue number captured; body links to design.md. 
+- [x] 8.4 Open follow-up issue "Re-frame PR #94 (macdoc-docx-workflow-cli) proposal as Layer 3 front-end per ooxml-edit-isomorphism-foundation ADR-009". Verify: issue number captured; body summarizes PR #94's existing spec gaps from verify report + proposed re-framing. +- [x] 8.5 Open follow-up issue "Re-frame PR #96 (r-word-builder-mvp) proposal as Layer 4 caller per ooxml-edit-isomorphism-foundation ADR-009" — security findings (R→Swift code injection per HIGH #1) reframed around safe WordEdit API. Verify: issue number captured; body links to PR #96 verify report. +- [x] 8.6 Open follow-up issue "Re-frame PR #95 (che-pptx-geometry-tools) as same architecture applied to PPTX per ooxml-edit-isomorphism-foundation ADR-009". Verify: issue number captured; body identifies which Layers (1–3) apply directly vs. need PPTX specialization. + +## 9. PR ergonomics + CD discipline rollout + +- [x] 9.1 Add a PR template snippet to `.github/PULL_REQUEST_TEMPLATE.md` (or create the file) requesting attached CD diagram for PRs touching `EditAlgebra/`. Verify: opening a draft PR from this branch displays the new template section. +- [x] 9.2 Add a README documenting (a) the CD discipline contract, (b) how to draw an ASCII ladder for a new Edit case (with 1 worked example), (c) link to design.md ADR-002 Worked Examples. **Shipped at `docs/edit-algebra-cd-discipline.md`** (NOT the originally-proposed `packages/ooxml-swift/Sources/OOXMLSwift/EditAlgebra/README.md` location — `packages/` is gitignored in the monorepo, so README cannot live inside the package directory until Phase 2 Spectra change `ooxml-edit-algebra-implementation` (#105) moves it there alongside actual Swift code). Verify: README renders correctly on GitHub at https://github.com/PsychQuant/macdoc/blob/main/docs/edit-algebra-cd-discipline.md after merge; reader can author a CD diagram for a hypothetical new case using only the README + design.md ADR-002 Worked Examples as guide. + +## 10. 
Verification + finalization + +- [x] 10.1 Run `spectra validate ooxml-edit-isomorphism-foundation` and confirm green. Verify: validator output shows no errors. +- [ ] 10.2 Run full `swift test` on `packages/ooxml-swift` and confirm: (a) all existing tests continue passing (no regression), (b) all new EditAlgebraTests + FullyFaithfulFunctorTests pass. Verify: test report shows 100% pass rate; commit each task completion with `Refs #99`. +- [x] 10.3 Update `docs/structural-editing-paradigm.md` with a cross-reference to the new `ooxml-edit-algebra` capability spec. Verify: cross-reference points to live capability file path; document continues passing markdown lint if any. +- [x] 10.4 Update `docs/lossless-conversion.md` with cross-reference. Verify: same as above. diff --git a/openspec/changes/word-aligned-state-sync/design.md b/openspec/changes/word-aligned-state-sync/design.md index e8b214d..51cd134 100644 --- a/openspec/changes/word-aligned-state-sync/design.md +++ b/openspec/changes/word-aligned-state-sync/design.md @@ -225,3 +225,43 @@ Rollback strategy: each release is a separate git tag. che-word-mcp pins ooxml-s - *Working answer*: Single-writer assumed in v1.0. Append to the log is `O_APPEND`-locked at the OS level; concurrent Swift writers would collide on disk. v2 could add per-op vector clocks for true CRDT support if real demand emerges. - **Q6**: How to handle the existing ad-hoc `rawChildren` field that #67-Phase-A and #69 are about to ship? Continue patching them now, or freeze the patches until v0.30.0 lands? - *Working answer*: Continue patching (#67-A is already in branch). The `rawChildren` patches are bridge code; v0.31.0 deprecates them when typed views move to the tree. Each patch has its own `IssueRoundTripTests` that becomes a regression test once typed views land. 
+ +## Relationship to ooxml-edit-isomorphism-foundation + +`word-aligned-state-sync` and `ooxml-edit-isomorphism-foundation` (Spectra change opened 2026-05-25 to pin Word↔Swift edit-isomorphism as macdoc OOXML's architectural contract) describe the **same** underlying architecture, at different levels of abstraction: + +- **`ooxml-edit-isomorphism-foundation`**: Pins the *type-level contract* (fully faithful functor, two-layer edit algebra `WordEdit` / `OOXMLEdit`, `lower()` bridge, canonical-identity round-trip invariant, CD review discipline). Capability spec is `ooxml-edit-algebra`. Apply scope is small: Edit type elevation + property-based functor tests on 3-5 representative operations. + +- **`word-aligned-state-sync`**: Pins the *runtime mechanism* (event-sourced op-log, XmlNode lossless tree, typed views as read-through layer, SyncOrchestrator, sidecar JSONL). Six-release migration sequence (v0.30.0 → v1.0.0) that delivers the runtime backing for the type-level contract. + +### How they coordinate + +The two changes are **complementary**, not parallel. `word-aligned-state-sync`'s Decision 3 ("ID-based operations, never positional indices") is a **refinement** of `ooxml-edit-algebra`'s `Edit` type contract: positional indices live at the `OOXMLEdit` lower-layer (`at: bodyChildIndex`), and stable IDs at the `WordEdit` upper-layer expose the same operation to callers via element identifiers (`paraID`, `bookmarkId`, etc.). + +| Concern | `ooxml-edit-isomorphism-foundation` | `word-aligned-state-sync` | +|---|---|---| +| Edit equality / composition | First-class via `Edit` protocol | Implicit via op-log append-only semantics | +| Round-trip contract | Canonical-identity (normative SHALL in spec) | Same — content-equality on touched sub-trees | +| Address space | Layered: OOXMLEdit positional + WordEdit stable-ID | Stable IDs (`w14:paraId`, `r:id`, etc.) 
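### Illustrative sketch of the layered address space

The layered address space described above (positional indices at the `OOXMLEdit` layer, stable IDs at the `WordEdit` layer, `lower()` as the bridge) can be illustrated with a minimal Swift sketch. This is not the planned Phase 2 implementation. The simplified `Paragraph` and `Document` structs, the string-valued `paraID`, and the `lower(in:)` signature are all assumptions made for illustration; the contract itself declares `lower() -> [OOXMLEdit]` without a document argument, and how ID resolution will actually happen is left to the implementation change.

```swift
// Hypothetical, simplified model (assumption): a document is an ordered list
// of paragraphs, each carrying a stable ID standing in for w14:paraId.
struct Paragraph: Equatable {
    let paraID: String
    var text: String
}

struct Document: Equatable {
    var body: [Paragraph]
}

enum EditError: Error, Equatable {
    case pathNotFound(path: String)
}

// Lower layer: addresses body children by positional index.
enum OOXMLEdit: Equatable {
    case insertParagraph(at: Int, paragraph: Paragraph)

    func apply(to doc: Document) throws -> Document {
        switch self {
        case let .insertParagraph(index, paragraph):
            guard (0...doc.body.count).contains(index) else {
                throw EditError.pathNotFound(path: "body[\(index)]")
            }
            var copy = doc                      // value semantics: input is untouched
            copy.body.insert(paragraph, at: index)
            return copy
        }
    }
}

// Upper layer: addresses paragraphs by stable ID.
enum WordEdit: Equatable {
    case insertParagraph(afterParaID: String, paragraph: Paragraph)

    // Assumption: this sketch passes the document in so the ID can be resolved.
    func lower(in doc: Document) throws -> [OOXMLEdit] {
        switch self {
        case let .insertParagraph(afterParaID, paragraph):
            guard let i = doc.body.firstIndex(where: { $0.paraID == afterParaID }) else {
                throw EditError.pathNotFound(path: "paraID=\(afterParaID)")
            }
            return [.insertParagraph(at: i + 1, paragraph: paragraph)]
        }
    }

    // Applying a WordEdit means applying its lowered OOXMLEdits in order.
    func apply(to doc: Document) throws -> Document {
        try lower(in: doc).reduce(doc) { try $1.apply(to: $0) }
    }
}

let doc = Document(body: [
    Paragraph(paraID: "A1", text: "Intro"),
    Paragraph(paraID: "B2", text: "Methods"),
])
let added = Paragraph(paraID: "C3", text: "Background")
let edit = WordEdit.insertParagraph(afterParaID: "A1", paragraph: added)
let result = try edit.apply(to: doc)
print(result.body.map(\.paraID))   // ["A1", "C3", "B2"]
```

In this toy model the caller never sees a positional index: the stable-ID edit resolves to `insertParagraph(at: 1, ...)` only inside `lower(in:)`, which is the kind of separation Decision 3 and the two-layer split are meant to keep.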
— refinement of upper layer | +| Runtime backing | Existing `Document.applyOverlay()` / `markDirty()` machinery | New `XmlNode` tree + op-log + reducer | +| Module organization | ADR-004 documents 3-module split (deferred implementation) | Internal to ooxml-swift v0.30.0+ refactor | + +### Action items for `word-aligned-state-sync` (this change) + +When `word-aligned-state-sync` archives (after v1.0.0 cleanup release), its archived design.md SHALL be updated to: + +1. Reframe Decision 3 to cite `ooxml-edit-algebra`'s two-layer split — positional addressing is `OOXMLEdit`'s contract, stable-ID addressing is `WordEdit`'s +2. Cite ADR-001 (`ooxml-edit-isomorphism-foundation` design.md) as the source of the canonical-identity round-trip invariant +3. Cite ADR-008 (`word-builder-swift` lens migration deferred) — the v1.0.0 cleanup phase aligns with the migration window + +This action is **advisory** for the current apply phase of `word-aligned-state-sync` — it does NOT block any v0.30.0 / v0.31.0 release. The reframe lands no later than v1.0.0 archival. + +### Why both changes coexist + +Decision-pinning (`ooxml-edit-isomorphism-foundation`) and runtime mechanism (`word-aligned-state-sync`) serve different purposes and would not benefit from being one change: + +- The decision-pinning is *empty of code* — its purpose is to commit the contract so future-AI / future-Codex / future-contributors stop relitigating it (downstream PRs #94-#98 have been blocked by lack of this commitment). +- The runtime mechanism is *empty of contract* without the foundation — without the Edit-type framing it ships an op-log without a stated invariant for what "valid op" means. + +Together they form the macdoc OOXML toolchain's architectural foundation. +