develop #1

Merged
awakecoding merged 14 commits into master from develop on Feb 13, 2026
Conversation

@awakecoding
Owner

  • chore(planning): initialize markdown conversion fidelity improvement plan
  • fix(docx): rewrite table cell text extraction to use run-aware parsing
  • fix(docx): rewrite TOC links to Section_X.Y anchors and strip page numbers
  • fix(docx): remove _Toc anchor tags from heading output
  • chore(planning): phase 2 complete, TASK-006 cancelled (not needed)
  • fix(docx): prepend section numbers to heading text from TOC mapping
  • feat(docx): add inline formatting and table cell formatting support
  • feat(tests): add anchor validation to Test-OpenSpecMarkdownFidelity
  • improve processing
  • feat(docx): convert packet diagram tables to mermaid packet-beta syntax
  • fix(docx): detect additional packet diagram styles (Definition-Field, Packetdiagramheaderrow)
  • feat: rename output to .md and add root index generation
  • cleanup
  • Add convert-and-publish workflow and Prepare-Publish script

awakecoding and others added 14 commits February 12, 2026 21:02
…plan

12 tasks across 5 phases to fix TOC/anchor links, text extraction artifacts, and conversion fidelity issues identified by comparing DOCX-to-markdown output with live Microsoft Learn HTML.

Co-authored-by: Cursor <cursoragent@cursor.com>
Replaced flat w:t node collection with paragraph/run-aware extraction in Get-OpenSpecOpenXmlNodeText. The old approach joined all text nodes with spaces, causing mid-word artifacts (e.g., 'W EBAUTHN', '10/8/20 10', 'technica l'). The new approach walks w:p > w:r structure and delegates to ConvertFrom-OpenSpecOpenXmlRunText which correctly handles w:br, w:tab, w:cr elements.

Tasks: TASK-001, TASK-002 | Phase: 1/5 | Progress: 2/12 (17%)
Co-authored-by: Cursor <cursoragent@cursor.com>
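
To illustrate the difference between flat `w:t` collection and paragraph/run-aware extraction, here is a minimal Python sketch. It is not the project's PowerShell code: the function names `run_text`, `cell_text`, and `flat_text` and the sample XML are hypothetical, and nested tables are ignored for brevity.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def q(name):
    """Return the namespace-qualified tag for a WordprocessingML element."""
    return f"{{{W}}}{name}"


def run_text(run):
    """Concatenate one run's children, mapping w:tab and w:br/w:cr to whitespace."""
    parts = []
    for child in run:
        if child.tag == q("t"):
            parts.append(child.text or "")
        elif child.tag == q("tab"):
            parts.append("\t")
        elif child.tag in (q("br"), q("cr")):
            parts.append("\n")
    return "".join(parts)


def cell_text(cell):
    """Run-aware extraction: join runs with nothing, paragraphs with newlines."""
    paragraphs = []
    for p in cell.iter(q("p")):
        paragraphs.append("".join(run_text(r) for r in p.iter(q("r"))))
    return "\n".join(paragraphs)


def flat_text(cell):
    """The old behavior as described: every w:t node joined with a space."""
    return " ".join(t.text or "" for t in cell.iter(q("t")))


# Hypothetical table cell whose words are split across runs.
SAMPLE = (
    f'<w:tc xmlns:w="{W}">'
    "<w:p><w:r><w:t>W</w:t></w:r><w:r><w:t>EBAUTHN</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>technica</w:t></w:r>"
    "<w:r><w:t>l</w:t><w:br/><w:t>note</w:t></w:r></w:p>"
    "</w:tc>"
)

cell = ET.fromstring(SAMPLE)
print(repr(flat_text(cell)))
print(repr(cell_text(cell)))
```

The flat version reproduces the mid-word spaces the commit describes, while the run-aware version keeps each word intact and only breaks at paragraph or `w:br` boundaries.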
…mbers

Rewrote Add-OpenSpecSectionAnchorsFromToc to replace _Toc anchor targets with Section_X.Y in TOC links and strip trailing DOCX page numbers from labels. TOC entries now read '[1 Introduction](#Section_1)' instead of '[1 Introduction 5](#_Toc164822728)'.

Tasks: TASK-003, TASK-004 | Phase: 2/5 | Progress: 4/12 (33%)
Co-authored-by: Cursor <cursoragent@cursor.com>
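
The TOC rewrite above can be sketched with a single regular expression. This is an illustrative Python approximation, not the body of Add-OpenSpecSectionAnchorsFromToc; the pattern and the name `rewrite_toc_link` are assumptions. It treats any trailing integer in the label as a page number, so a title that genuinely ends in a number would need extra handling.

```python
import re

# Label = section number, title, optional trailing page number; target = _Toc bookmark.
TOC_LINK = re.compile(
    r"\[(?P<num>\d+(?:\.\d+)*)\s+(?P<title>.+?)(?:\s+\d+)?\]\(#_Toc\d+\)"
)


def rewrite_toc_link(line):
    """Point TOC links at Section_X.Y anchors and drop the DOCX page number."""
    def repl(match):
        num = match["num"]
        return f"[{num} {match['title']}](#Section_{num})"

    return TOC_LINK.sub(repl, line)


print(rewrite_toc_link("[1 Introduction 5](#_Toc164822728)"))
```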
Keep _Toc bookmarks during initial conversion so they can be used for Section_X.Y anchor placement, then strip all _Toc anchor tags with a regex. Each heading now carries only its bookmark GUID anchor and its Section_X.Y anchor.

Task: TASK-005 | Phase: 2/5 | Progress: 5/12 (42%)
Co-authored-by: Cursor <cursoragent@cursor.com>
All MS Open Specs headings are numbered, so slug anchors for non-numbered headings are not needed. Phase 2 complete: 3 tasks done, 1 cancelled. Moving to Phase 3.

Phase: 2/5 complete | Progress: 6/12 (50%)
Co-authored-by: Cursor <cursoragent@cursor.com>
Word auto-numbers headings but the number isn't in the paragraph text. Added post-processing in Add-OpenSpecSectionAnchorsFromToc to inject section numbers from the TOC map into heading lines. Headings now show '# 1 Introduction' matching the live Microsoft Learn HTML.

Task: TASK-007 finding | Phase: 3/5 | Progress: 7/12 (58%)
Co-authored-by: Cursor <cursoragent@cursor.com>
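
One way to picture the heading post-processing is below. This Python sketch assumes each heading is preceded by an anchor line of the form `<a id="Section_X.Y"></a>`; that anchor markup and the name `number_headings` are assumptions for illustration, and the real command works from its TOC map instead.

```python
import re

ANCHOR = re.compile(r'<a id="Section_(\d+(?:\.\d+)*)"></a>')
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def number_headings(lines):
    """Prepend the section number from the preceding anchor to each heading."""
    out = []
    pending = None  # section number waiting for its heading
    for line in lines:
        anchor = ANCHOR.search(line)
        if anchor:
            pending = anchor.group(1)
            out.append(line)
            continue
        heading = HEADING.match(line)
        if heading and pending is not None:
            hashes, text = heading.groups()
            # Skip headings that already start with their number.
            if not text.startswith(pending + " "):
                line = f"{hashes} {pending} {text}"
            pending = None
        elif line.strip():
            # Ordinary text between anchor and heading: drop the pending number.
            pending = None
        out.append(line)
    return out


doc = ['<a id="Section_1"></a>', "# Introduction"]
print("\n".join(number_headings(doc)))
```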
Add bold/italic/code detection from OpenXML run properties (w:rPr) to markdown output. Uses Unicode noncharacter placeholders for safe marker merging of adjacent same-format runs. Whitespace moved outside markers for CommonMark compliance. Bold stripped from headings.

Table cell extraction upgraded from plain text to paragraph-aware rendering, preserving bold formatting and hyperlinks within table cells.

Results across 41 specs: 20,258 bold pairs, 669 bold table rows, 796 linked table rows, 0 conversion errors. Completes TASK-009 and TASK-010. All 12 plan tasks now completed.

Co-authored-by: Cursor <cursoragent@cursor.com>
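
The placeholder technique mentioned above can be sketched as follows. This is a simplified Python illustration handling bold only; the choice of U+FDD0 and U+FDD1 as placeholders and the name `render_runs` are assumptions, not the project's actual values.

```python
import re

# Unicode noncharacters stand in for the opening and closing bold markers,
# so they cannot collide with literal asterisks in the document text.
B_OPEN, B_CLOSE = "\uFDD0", "\uFDD1"


def render_runs(runs):
    """Render (text, is_bold) runs to markdown with merged, trimmed bold spans."""
    marked = "".join(
        f"{B_OPEN}{text}{B_CLOSE}" if bold else text for text, bold in runs
    )
    # Adjacent bold runs: a close immediately followed by an open cancels out.
    marked = marked.replace(B_CLOSE + B_OPEN, "")
    # Move whitespace outside the markers so the emphasis stays valid CommonMark.
    marked = re.sub(f"{B_OPEN}(\\s+)", lambda m: m.group(1) + B_OPEN, marked)
    marked = re.sub(f"(\\s+){B_CLOSE}", lambda m: B_CLOSE + m.group(1), marked)
    # A bold span that held only whitespace is now empty; drop it.
    marked = marked.replace(B_OPEN + B_CLOSE, "")
    return marked.replace(B_OPEN, "**").replace(B_CLOSE, "**")


runs = [("The ", False), ("cbSize", True), (" field ", True), ("is required.", False)]
print(render_runs(runs))
```

Merging on placeholders first means the two bold runs become one `**cbSize field**` span rather than `**cbSize**** field **`, which a CommonMark renderer would not treat as emphasis.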
Extended fidelity tests to validate: Section_X.Y anchors present, no _Toc anchors remain, TOC links resolve to existing anchors, numbered headings exist, bold formatting detected. Fixed CRLF regex issue in table detection. All 41 specs pass.

Updated plan.md to reflect completed project status with all checkboxes checked.

Co-authored-by: Cursor <cursoragent@cursor.com>
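
The anchor checks listed above amount to comparing link targets against defined anchors. The Python sketch below is a rough equivalent, not the Test-OpenSpecMarkdownFidelity implementation; it assumes anchors are emitted as `<a id="..."></a>` tags, and `check_anchors` is a hypothetical name.

```python
import re

LINK_TARGET = re.compile(r"\]\(#([^)]+)\)")
ANCHOR_ID = re.compile(r'<a id="([^"]+)"></a>')


def check_anchors(markdown):
    """Report in-document links with no matching anchor and leftover _Toc anchors."""
    anchors = set(ANCHOR_ID.findall(markdown))
    targets = set(LINK_TARGET.findall(markdown))
    return {
        "missing": sorted(targets - anchors),
        "toc_leftovers": sorted(a for a in anchors if a.startswith("_Toc")),
    }


sample = (
    "[1 Introduction](#Section_1)\n"
    "[2 Messages](#Section_2)\n"
    '<a id="Section_1"></a>\n'
    "# 1 Introduction\n"
)
print(check_anchors(sample))
```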
Detect DOCX packet layout tables by their PacketDiagramHeaderText style and convert them to mermaid packet-beta diagrams instead of wide 32-column markdown tables. Continuation rows are merged into the previous field's bit range for correct multi-row field representation.

Co-authored-by: Cursor <cursoragent@cursor.com>
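
As a sketch of the table-to-diagram step, the Python below turns rows of (label, bit width) cells into mermaid packet-beta text. The input shape, the `"..."` continuation marker, and the name `to_packet_beta` are assumptions for illustration; the actual converter reads these from the DOCX table. Labels containing double quotes would need escaping, which is omitted here.

```python
def to_packet_beta(rows, continuation="..."):
    """Convert packet-layout rows of (label, bit_width) cells to packet-beta text."""
    fields = []  # each entry is [label, total_bits]
    for row in rows:
        for label, width in row:
            if label == continuation and fields:
                # Continuation cell: extend the previous field's bit range.
                fields[-1][1] += width
            else:
                fields.append([label, width])

    lines = ["packet-beta"]
    bit = 0
    for label, width in fields:
        end = bit + width - 1
        bit_range = str(bit) if width == 1 else f"{bit}-{end}"
        lines.append(f'{bit_range}: "{label}"')
        bit = end + 1
    return "\n".join(lines)


rows = [
    [("Length", 16), ("Flags", 8), ("Type", 8)],
    [("Payload", 32)],
    [("...", 32)],
]
print(to_packet_beta(rows))
```

Here the 32-bit continuation row is folded into `Payload`, giving it the range 32-95 instead of producing a separate `...` field.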
… Packetdiagramheaderrow)

Extend packet diagram detection to match Packetdiagramheaderrow and Definition-Field/Definition-Field2 styles in addition to PacketDiagramHeaderText. This catches 230 additional packet diagrams across the RDP specs.

Co-authored-by: Cursor <cursoragent@cursor.com>
Change per-spec output filename from index.md to <ProtocolId>.md for unique editor tab names. Update cross-document link generation to match. Add Update-OpenSpecIndex command that generates a README.md catalog of all converted specs with titles and links.

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@awakecoding awakecoding merged commit a24e9c0 into master Feb 13, 2026
@awakecoding awakecoding deleted the develop branch February 13, 2026 20:56