fix(docx-core): emit paragraph-level markers outside <w:r> on rebuild#109
Merged
stevenobiajulu merged 1 commit intomainfrom Apr 28, 2026
Merged
Conversation
Restores commentRange* and bookmark* on rebuild output by atomizing paragraph-internal markers and routing them outside the synthetic <w:r> wrapper that buildSingleRun emits for run-level leaves. PR #101 traded these markers away to fix a Word-can't-open regression (issue #94), accepting that the highlighted span on commentRange and paragraph-internal bookmark spans were lost on rebuild output. This PR puts them back — but at the schema-correct position (siblings of <w:r>, not children) — so Word still opens the file and the spans survive. Mechanics: - Add PARAGRAPH_LEVEL_TAGS (commentRangeStart/End, bookmarkStart/End) and isParagraphLevelLeaf() in atomizer.ts. - Make atomization ancestry-sensitive: only emit a marker atom when the element sits inside a <w:p>. Body/table-sibling placements stay out of the atom stream and are still handled by the existing scaffold-strip block in the reconstructor (avoids misattachment via assignParagraph- Indices, which only assigns indices to atoms with a w:p ancestor). - Gate atomization with the new atomizeParagraphLevelMarkers flag in AtomizeTreeOptions, set true only on the rebuild path. The inplace path's bookmark reconciliation breaks if bookmarkStart/End atoms enter the stream (orphaned-bookmark warnings, round-trip safety failures). - Add buildRunContentWithParagraphMarkers(): walks a RunGroup left to right, buffers run-level atoms, flushes the buffer to <w:r> blocks via subGroupByRPr+buildSingleRun on each marker, and emits markers as bare elements. Always sub-groups by per-atom rPr — group.rPr is captured from the first atom and is intentionally suppressed for moved groups, so trusting it would bleed formatting across segments. - Apply enforceConsumerCompatibility on the rebuilt body to dedupe bookmark names/IDs and synthesize recovery starts/ends for orphans (mirrors the post-processing already used in inplace mode). This is required because cross-context bookmark pairs (one half paragraph- internal, the other body-level) appear unbalanced after rebuild. Tests: - Strengthen the existing "Comment added on revised side" assertion in rebuild-auxiliary-merge.test.ts to verify markers are present and that their immediate parent is one of w:p / w:ins / w:del / w:moveFrom / w:moveTo (DOM-based, not regex). The original "must NOT appear inside <w:r>" regex regression stays as a floor. - Add cross-paragraph comment-span test, sibling-style scaffold-marker test, and an inplace-mode regression test for the comment-span scenario. Extend synthetic-docx-fixture.ts with commentSpanParagraphs, bookmarkOnParagraph, and siblingBookmarkBefore options. Move-range markers (moveFromRange*, moveToRange*) and permStart/permEnd are out of scope here — moveRange collides with the synthetic emission in wrapWithMoveFrom/To and permRange lacks fixture coverage. Tracked in follow-up issues. Reviewed by Codex CLI (with repo access) and Gemini CLI in parallel before implementation; both reviews informed the final design (ancestry- sensitive atomization, rebuild-only gate, sub-group-by-rPr override of group.rPr). Fixes: #106 Ref: #101
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
This was referenced Apr 27, 2026
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
3 tasks
stevenobiajulu
added a commit
that referenced
this pull request
Apr 28, 2026
* chore(release): bump workspace versions to 0.9.1 Patch release covering bug fixes accumulated since v0.9.0: - fix(docx-core): merge auxiliary parts in rebuild mode (#94, #101) - fix(docx-core): emit paragraph-level markers outside <w:r> on rebuild (#109) - fix(docx-core): preserve reply comments in rebuild mode (#108, #112) - test(docx-mcp): harden vitest path allowlist against /tmp symlink topology (#105) - ci: SHA-pin all GitHub Actions, add dependency review and Dependabot - ci: fix PR title check for acronyms and fork PRs (#88) - chore(skills): sync docx-editing to 0.3.0 (#95) Generated via scripts/bump_version.mjs. * fix(docx-mcp): declare @xmldom/xmldom dependency explicitly The bumped package-lock.json regenerated cleanly and exposed a latent bug: docx-mcp directly imports @xmldom/xmldom in tag_parser.ts but did not declare it as a dependency. It was previously masked by npm hoisting from docx-core's transitive dep — the workspace happened to work because the lockfile carried a stale hoisted xmldom@0.8.11. Pinning to ^0.8.11 (matching what was already running in production via the silent hoist) preserves observable behavior. tag_parser depends on xmldom 0.8's lenient parsing of cross-nested tags (<b><i>text</b></i>) — xmldom 0.9 throws on such input. Migrating docx-mcp's tag_parser to xmldom 0.9 is its own concern; this commit keeps the 0.9.1 release a pure version bump. docx-core continues to use xmldom ^0.9.9 (PR #75); the two versions coexist under their respective package's node_modules. No DOM nodes cross the docx-core/docx-mcp boundary, so the version split is safe.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #106. Restores
w:commentRangeStart/w:commentRangeEndandw:bookmarkStart/w:bookmarkEndon rebuild output by atomizing paragraph-internal markers and routing them outside the synthetic<w:r>wrapper thatbuildSingleRunemits for run-level leaves.PR #101 traded these markers away to fix a "Word can't open" regression (issue #94), accepting that comment ranges (the highlighted span) and paragraph-internal bookmark spans were lost on rebuild output. This PR puts them back — at the schema-correct position (siblings of
<w:r>, not children) — so Word still opens the file and the spans survive.Mechanics
PARAGRAPH_LEVEL_TAGSset +isParagraphLevelLeaf()predicate inatomizer.ts.<w:p>. Body/table-sibling placements stay out of the atom stream and are still handled by the existing scaffold-strip block (avoids misattachment viaassignParagraphIndices, which only assigns indices to atoms with aw:pancestor).atomizeParagraphLevelMarkersflag onAtomizeTreeOptions, set true only on the rebuild path. The inplace path's bookmark reconciliation breaks ifbookmarkStart/Endatoms enter the stream (orphaned-bookmark warnings, round-trip safety failures).buildRunContentWithParagraphMarkers(): walks aRunGroupleft to right, buffers run-level atoms, flushes the buffer to<w:r>blocks viasubGroupByRPr+buildSingleRunon each marker, and emits markers as bare elements. Always sub-groups by per-atom rPr —group.rPris captured from the first atom and is intentionally suppressed for moved groups, so trusting it would bleed formatting across segments.enforceConsumerCompatibilityon the rebuilt body (mirrors inplace mode) to dedupe bookmark names/IDs and synthesize recovery starts/ends for orphans. Required because cross-context bookmark pairs (one half paragraph-internal, other body-level) appear unbalanced after rebuild.Tests
rebuild-auxiliary-merge.test.tsto verify markers are present and that their immediate parent is one ofw:p/w:ins/w:del/w:moveFrom/w:moveTo(DOM-based, not regex). Original "must NOT appear inside<w:r>" regex regression stays as a floor.w:id.synthetic-docx-fixture.tswithcommentSpanParagraphs,bookmarkOnParagraph,siblingBookmarkBefore.Counts: docx-core 1094 pass (was 1091, +3 new), docx-mcp 638 pass, full lint clean.
Out of scope (follow-ups will be filed)
moveFromRange*/moveToRange*— collides with synthetic emission inwrapWithMoveFrom/To; needs scaffold-cleanup + dedup work.permStart/permEnd— same placement rule per ECMA-376; mechanical extension once helper exists, but lacks fixture coverage.buildDocumentPreservingStructurecross-paragraph comment-range stripping in inplace mode — separate code path; explicitly deferred by issue Reconstruct paragraph-level markers (commentRange, bookmark) outside <w:r> in rebuild mode #106 itself.Peer review
Reviewed by Codex CLI (with repo access) and Gemini CLI in parallel before implementation; both reviews informed the final design — particularly Codex's catches on:
shouldStartNewParagraphkeepsparagraphIndex === undefinedatoms in the current paragraph, so body-level markers would otherwise misattach)group.rPrbecause moved groups suppress rPr-splitting)Insertedrange, so the marker's parent is<w:ins>— not raw<w:p>)Test plan
npm run buildnpm run lint:workspacesnpm run test:run(1094/1095 across docx-core, +638 docx-mcp, +113 third workspace)npm run check:spec-coverageFixes: #106
Ref: #101