develop #1
Merged
awakecoding (Owner) commented on Feb 13, 2026
- chore(planning): initialize markdown conversion fidelity improvement plan
- fix(docx): rewrite table cell text extraction to use run-aware parsing
- fix(docx): rewrite TOC links to Section_X.Y anchors and strip page numbers
- fix(docx): remove _Toc anchor tags from heading output
- chore(planning): phase 2 complete, TASK-006 cancelled (not needed)
- fix(docx): prepend section numbers to heading text from TOC mapping
- feat(docx): add inline formatting and table cell formatting support
- feat(tests): add anchor validation to Test-OpenSpecMarkdownFidelity
- improve processing
- feat(docx): convert packet diagram tables to mermaid packet-beta syntax
- fix(docx): detect additional packet diagram styles (Definition-Field, Packetdiagramheaderrow)
- feat: rename output to .md and add root index generation
- cleanup
- Add convert-and-publish workflow and Prepare-Publish script
…plan 12 tasks across 5 phases to fix TOC/anchor links, text extraction artifacts, and conversion fidelity issues identified by comparing DOCX-to-markdown output with live Microsoft Learn HTML. Co-authored-by: Cursor <cursoragent@cursor.com>
Replaced flat w:t node collection with paragraph/run-aware extraction in Get-OpenSpecOpenXmlNodeText. The old approach joined all text nodes with spaces, causing mid-word artifacts (e.g., 'W EBAUTHN', '10/8/20 10', 'technica l'). The new approach walks w:p > w:r structure and delegates to ConvertFrom-OpenSpecOpenXmlRunText which correctly handles w:br, w:tab, w:cr elements. Tasks: TASK-001, TASK-002 | Phase: 1/5 | Progress: 2/12 (17%) Co-authored-by: Cursor <cursoragent@cursor.com>
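The run-aware walk described above can be sketched in Python with `xml.etree.ElementTree`. This is a minimal illustration, not the PowerShell code in `Get-OpenSpecOpenXmlNodeText`. The helper names `run_text` and `node_text` are hypothetical, and the sketch assumes runs are concatenated directly, `w:tab` becomes a tab character, and `w:br`/`w:cr` become newlines.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, as defined by ECMA-376.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def run_text(run: ET.Element) -> str:
    """Concatenate one w:r's children in order, mapping tabs and breaks."""
    parts = []
    for child in run:
        if child.tag == f"{{{W}}}t":
            parts.append(child.text or "")
        elif child.tag == f"{{{W}}}tab":
            parts.append("\t")
        elif child.tag in (f"{{{W}}}br", f"{{{W}}}cr"):
            parts.append("\n")
        # Other children (w:rPr, field characters, etc.) contribute no text here.
    return "".join(parts)


def node_text(node: ET.Element) -> str:
    """Walk w:p > w:r, joining runs with no separator and paragraphs with newlines."""
    paragraphs = []
    for p in node.iter(f"{{{W}}}p"):
        # ".//" also reaches runs nested inside w:hyperlink and similar wrappers.
        runs = p.findall(f".//{{{W}}}r")
        paragraphs.append("".join(run_text(r) for r in runs))
    return "\n".join(paragraphs)


# A word split across two runs; joining w:t nodes with spaces would give "W EBAUTHN".
xml = f'<w:p xmlns:w="{W}"><w:r><w:t>W</w:t></w:r><w:r><w:t>EBAUTHN</w:t></w:r></w:p>'
print(node_text(ET.fromstring(xml)))  # WEBAUTHN
```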
…mbers Rewrote Add-OpenSpecSectionAnchorsFromToc to replace _Toc anchor targets with Section_X.Y in TOC links and strip trailing DOCX page numbers from labels. TOC entries now read '[1 Introduction](#Section_1)' instead of '[1 Introduction 5](#_Toc164822728)'. Tasks: TASK-003, TASK-004 | Phase: 2/5 | Progress: 4/12 (33%) Co-authored-by: Cursor <cursoragent@cursor.com>
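A regex sketch of the TOC rewrite, assuming every TOC label starts with a dotted section number, ends with a page number, and links to a `_Toc` bookmark. The function name is hypothetical and the pattern is an illustration, not the one used by `Add-OpenSpecSectionAnchorsFromToc`.

```python
import re

# "[<number> <title> <page>](#_Toc<digits>)". The title is matched lazily so the
# trailing page number is split off, and it may not contain "]".
TOC_LINK = re.compile(
    r"\[(?P<num>\d+(?:\.\d+)*)\s+(?P<title>[^\]]+?)\s+\d+\]\(#_Toc\d+\)"
)


def rewrite_toc_links(markdown: str) -> str:
    """Point TOC links at Section_X.Y anchors and drop DOCX page numbers."""
    return TOC_LINK.sub(
        lambda m: f"[{m['num']} {m['title']}](#Section_{m['num']})", markdown
    )


print(rewrite_toc_links("[1 Introduction 5](#_Toc164822728)"))
# [1 Introduction](#Section_1)
```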
Keep _Toc bookmarks during initial conversion for Section_X.Y anchor placement, then strip all _Toc anchor tags with regex. Each heading now has only bookmark GUID + Section_X.Y anchors. Task: TASK-005 | Phase: 2/5 | Progress: 5/12 (42%) Co-authored-by: Cursor <cursoragent@cursor.com>
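The final strip step might look like this, assuming anchors are emitted as empty HTML `<a id="...">` elements (an assumption about the output format). Only the cleanup is shown; the earlier use of `_Toc` bookmarks to place `Section_X.Y` anchors is not. `strip_toc_anchors` is a hypothetical name.

```python
import re

# Empty HTML anchors whose id (or name) is a Word-generated _Toc bookmark.
TOC_ANCHOR = re.compile(r'<a\s+(?:id|name)="_Toc\d+"\s*>\s*</a>')


def strip_toc_anchors(markdown: str) -> str:
    """Remove _Toc anchors, leaving bookmark GUID and Section_X.Y anchors intact."""
    return TOC_ANCHOR.sub("", markdown)


src = '<a id="b1f2c3"></a><a id="_Toc164822728"></a><a id="Section_1"></a>\n# 1 Introduction'
print(strip_toc_anchors(src))
```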
All MS Open Specs headings are numbered, so slug anchors for non-numbered headings are not needed. Phase 2 complete: 3 tasks done, 1 cancelled. Moving to Phase 3. Phase: 2/5 complete | Progress: 6/12 (50%) Co-authored-by: Cursor <cursoragent@cursor.com>
Word auto-numbers headings, but the number is not stored in the paragraph text. Added post-processing in Add-OpenSpecSectionAnchorsFromToc to inject section numbers from the TOC map into heading lines. Headings now show '# 1 Introduction', matching the live Microsoft Learn HTML. Task: TASK-007 finding | Phase: 3/5 | Progress: 7/12 (58%) Co-authored-by: Cursor <cursoragent@cursor.com>
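One way to inject the numbers, sketched under the assumption that the TOC yields `(number, title)` pairs in document order. Each TOC entry is consumed by the next heading whose text matches it, which keeps repeated titles such as "Overview" apart. The function and its inputs are hypothetical; a TOC entry with no matching heading would stop all later matches, so a real implementation would need a fallback.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.+)$")


def number_headings(lines, toc_entries):
    """Prefix headings with section numbers taken from TOC (number, title) pairs.

    Entries are consumed in order, so a repeated title is matched to the TOC
    entry that comes next rather than the first one with that text.
    """
    pending = list(toc_entries)
    out = []
    for line in lines:
        m = HEADING.match(line)
        if m and pending and m.group(2).strip() == pending[0][1]:
            number, title = pending.pop(0)
            line = f"{m.group(1)} {number} {title}"
        out.append(line)
    return out


toc = [("1", "Introduction"), ("1.1", "Glossary"), ("2", "Messages")]
print(number_headings(["# Introduction", "Intro text.", "## Glossary", "# Messages"], toc))
```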
Add bold/italic/code detection from OpenXML run properties (w:rPr) to markdown output. Uses Unicode noncharacter placeholders for safe marker merging of adjacent same-format runs. Whitespace moved outside markers for CommonMark compliance. Bold stripped from headings. Table cell extraction upgraded from plain text to paragraph-aware rendering, preserving bold formatting and hyperlinks within table cells. Results across 41 specs: 20,258 bold pairs, 669 bold table rows, 796 linked table rows, 0 conversion errors. Completes TASK-009 and TASK-010. All 12 plan tasks now completed. Co-authored-by: Cursor <cursoragent@cursor.com>
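The placeholder technique can be sketched for bold alone; italic and code would need their own placeholder pairs. Runs are wrapped in the Unicode noncharacters U+FDD0/U+FDD1, a close marker immediately followed by an open marker is deleted to merge adjacent bold runs, whitespace is moved outside the markers, and only then are placeholders turned into `**`. The `(text, is_bold)` tuples and the function name are assumptions for illustration.

```python
import re

# Unicode noncharacters do not occur in real document text, so they can stand in
# for "**" without colliding with literal asterisks in the source.
BOLD_OPEN, BOLD_CLOSE = "\ufdd0", "\ufdd1"


def render_bold_runs(runs):
    """Render (text, is_bold) run tuples as CommonMark-friendly markdown."""
    s = "".join(
        BOLD_OPEN + text + BOLD_CLOSE if bold else text for text, bold in runs
    )
    # Merge adjacent bold runs: a close immediately followed by an open cancels out.
    s = s.replace(BOLD_CLOSE + BOLD_OPEN, "")
    # Move whitespace outside the markers; "** word**" is not emphasis in CommonMark.
    s = re.sub(BOLD_OPEN + r"(\s+)", lambda m: m.group(1) + BOLD_OPEN, s)
    s = re.sub(r"(\s+)" + BOLD_CLOSE, lambda m: BOLD_CLOSE + m.group(1), s)
    # A run that was only whitespace leaves an empty pair behind; drop it.
    s = s.replace(BOLD_OPEN + BOLD_CLOSE, "")
    return s.replace(BOLD_OPEN, "**").replace(BOLD_CLOSE, "**")


print(render_bold_runs([("Note:", False), (" important ", True), ("text", False)]))
# Note: **important** text
```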
Extended fidelity tests to validate: Section_X.Y anchors present, no _Toc anchors remain, TOC links resolve to existing anchors, numbered headings exist, bold formatting detected. Fixed CRLF regex issue in table detection. All 41 specs pass. Updated plan.md to reflect completed project status with all checkboxes checked. Co-authored-by: Cursor <cursoragent@cursor.com>
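Two of these checks (no leftover `_Toc` anchors, every internal link resolves) could be approximated as below, assuming anchors appear as `<a id="...">` tags and internal links as `](#target)`. `validate_anchors` is a hypothetical helper, not the logic inside `Test-OpenSpecMarkdownFidelity`.

```python
import re

ANCHOR = re.compile(r'<a\s+id="([^"]+)"')
LINK = re.compile(r"\]\(#([^)\s]+)\)")


def validate_anchors(markdown: str) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    anchors = set(ANCHOR.findall(markdown))
    problems = [
        f"leftover _Toc anchor: {a}" for a in sorted(anchors) if a.startswith("_Toc")
    ]
    problems += [
        f"unresolved link: #{target}"
        for target in LINK.findall(markdown)
        if target not in anchors
    ]
    return problems


good = '<a id="Section_1"></a>\n# 1 Introduction\n\n[1 Introduction](#Section_1)\n'
print(validate_anchors(good))  # []
```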
Detect DOCX packet layout tables by their PacketDiagramHeaderText style and convert them to mermaid packet-beta diagrams instead of wide 32-column markdown tables. Continuation rows are merged into the previous field's bit range for correct multi-row field representation. Co-authored-by: Cursor <cursoragent@cursor.com>
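A sketch of the conversion, assuming each table row has already been read into `(label, bit_width)` cells (for example, bit width from the cell's column span) and that a continuation cell is labelled `...`, a convention assumed here for multi-row fields. The output follows Mermaid's `packet-beta` syntax of `start-end: "label"` lines; `packet_beta` is a hypothetical name.

```python
def packet_beta(rows, continuation="..."):
    """Convert packet-table rows into Mermaid packet-beta source.

    rows: list of rows, each a list of (label, bit_width) cells, left to right.
    A cell labelled `continuation` extends the previous field's bit range.
    """
    fields = []  # [start_bit, end_bit, label]
    bit = 0
    for row in rows:
        for label, width in row:
            label = label.strip()
            start, end = bit, bit + width - 1
            if label == continuation and fields:
                fields[-1][1] = end  # merge into the previous field
            else:
                fields.append([start, end, label])
            bit += width
    lines = ["packet-beta"]
    for start, end, label in fields:
        bits = str(start) if start == end else f"{start}-{end}"
        safe = label.replace('"', "'")  # labels are double-quoted in packet-beta
        lines.append(f'{bits}: "{safe}"')
    return "\n".join(lines)


rows = [[("Type", 8), ("Flags", 8), ("Length", 16)], [("Value (variable)", 32)], [("...", 32)]]
print(packet_beta(rows))
```

The third row's `...` cell folds into `Value (variable)`, which then spans bits 32 to 95.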
… Packetdiagramheaderrow) Extend packet diagram detection to match Packetdiagramheaderrow and Definition-Field/Definition-Field2 styles in addition to PacketDiagramHeaderText. This catches 230 additional packet diagrams across the RDP specs. Co-authored-by: Cursor <cursoragent@cursor.com>
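Style-based detection might look like the following, checking paragraph (`w:pStyle`) and table (`w:tblStyle`) style IDs against the names listed above. Whether those names are style IDs or display names in the source DOCX files is an assumption here, and `is_packet_table` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Style names from the commit message, treated here as style IDs (an assumption).
PACKET_STYLES = {
    "PacketDiagramHeaderText",
    "Packetdiagramheaderrow",
    "Definition-Field",
    "Definition-Field2",
}


def is_packet_table(tbl: ET.Element) -> bool:
    """True if any paragraph or table style inside the w:tbl is a packet style."""
    for tag in ("pStyle", "tblStyle"):
        for el in tbl.iter(f"{{{W}}}{tag}"):
            if el.get(f"{{{W}}}val") in PACKET_STYLES:
                return True
    return False
```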
Change per-spec output filename from index.md to <ProtocolId>.md for unique editor tab names. Update cross-document link generation to match. Add Update-OpenSpecIndex command that generates a README.md catalog of all converted specs with titles and links. Co-authored-by: Cursor <cursoragent@cursor.com>
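Generating the README catalog could be sketched like this, assuming each spec lives at `<ProtocolId>/<ProtocolId>.md` and its title has already been extracted. The heading text, table layout, and function name are placeholders, not necessarily what `Update-OpenSpecIndex` produces.

```python
def build_index(specs: dict[str, str]) -> str:
    """Build a README.md catalog from a ProtocolId -> title mapping."""
    lines = ["# Open Specifications", "", "| Protocol | Title |", "| --- | --- |"]
    for pid in sorted(specs):
        title = specs[pid].replace("|", "\\|")  # keep pipes from splitting cells
        lines.append(f"| [{pid}]({pid}/{pid}.md) | {title} |")
    return "\n".join(lines) + "\n"


print(build_index({"MS-B": "Beta", "MS-A": "Alpha"}))
```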
Co-authored-by: Cursor <cursoragent@cursor.com>