Fix #210: handle percent-suffixed table/cell widths in WmlToHtmlConverter #211

Merged
JSv4 merged 4 commits into main from claude/issue-210-investigation-xkEYo
May 30, 2026

@JSv4 JSv4 commented May 30, 2026

Summary

Fixes #210. convertDocxToHtml (i.e. WmlToHtmlConverter) threw FormatException ("Conversion failed: Format_InvalidStringWithValue, 100%") for any document containing a table whose width (table-level w:tblW or cell-level w:tcW) was expressed as a percentage with w:type="pct" and a percent-suffixed value such as w:w="100%" / w:w="50%". DXA (twips) widths converted fine.

Root cause

The w:w attribute on w:tblW / w:tcW has OOXML schema type ST_TblWidth (a union over ST_MeasurementOrPercent + ST_DecimalNumber). Under w:type="pct" the value may be expressed two schema-valid ways:

| Form | Example | Meaning |
| --- | --- | --- |
| Integer (fiftieths of a percent) | w:w="5000" | 5000 / 50 = 100% |
| Percent-suffixed string | w:w="100%" | a literal 100% |

Microsoft Word writes the integer-fiftieths form. The widely used docx JS library (used in the issue's repro) writes the percent-suffixed string form for WidthType.PERCENTAGE. Docxodus only handled the integer form — it cast the attribute straight to int ((int)tblW.Attribute(W._w) / (int) tcPr...tcW...), and (int)"100%" throws. DXA widths were unaffected because they are always plain integers.
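The failure mode can be reproduced in miniature. The following is a Python analogy only, not the C# code from WmlToHtmlConverter: it reads w:w from two hand-written w:tblW fragments and applies a strict integer parse, which stands in for the (int) attribute cast described above. The names FRAGMENTS and strict_int_width are hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Two schema-valid spellings of a 100% table width under w:type="pct".
FRAGMENTS = {
    "integer fiftieths": f'<w:tblW xmlns:w="{W}" w:w="5000" w:type="pct"/>',
    "percent-suffixed": f'<w:tblW xmlns:w="{W}" w:w="100%" w:type="pct"/>',
}


def strict_int_width(fragment: str) -> int:
    """Read w:w and convert it with a strict integer parse (no tolerance)."""
    element = ET.fromstring(fragment)
    return int(element.attrib[f"{{{W}}}w"])


results = {}
for label, fragment in FRAGMENTS.items():
    try:
        results[label] = strict_int_width(fragment)
    except ValueError as exc:
        # Python raises ValueError here; the .NET cast raised FormatException.
        results[label] = f"failed: {exc}"

for label, outcome in results.items():
    print(f"{label}: {outcome}")
```

The integer form parses cleanly, while the percent-suffixed form is rejected by the strict parse, which is the same shape of failure the issue reported.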

Fix

Width parsing for w:tblW and w:tcW now routes through a single helper, ParseTblWidthValue(XAttribute, out bool isExplicitPercent), that:

  • strips a trailing % and reports it via isExplicitPercent,
  • parses with decimal.TryParse (invariant culture) so a non-numeric/garbage value returns null and is skipped instead of throwing,
  • lets callers treat an explicit "100%" as a literal percentage, while a bare integer under pct is still divided by 50 (fiftieths → percent), preserving existing behavior (5000 → 100%).

The three call sites (table pct, table dxa, cell dxa+pct) were updated to use it. DXA output ({n}pt) and integer-pct output are unchanged.
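The parsing rules described for the helper can be sketched as follows. This is a Python illustration of the described behavior, not the C# source of ParseTblWidthValue: the function names, the tuple return (standing in for the out parameter), the handling of the flag on garbage input, and the "N%" output string are all assumptions made for the example.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional, Tuple


def parse_tbl_width_value(raw: Optional[str]) -> Tuple[Optional[Decimal], bool]:
    """Return (value, is_explicit_percent); value is None when unparseable."""
    if raw is None:
        return None, False
    text = raw.strip()
    is_explicit_percent = text.endswith("%")
    if is_explicit_percent:
        text = text[:-1].strip()  # drop the trailing percent sign
    try:
        value = Decimal(text)
    except InvalidOperation:
        return None, False  # garbage is skipped rather than raised
    if not value.is_finite():
        return None, False  # reject NaN / Infinity spellings
    return value, is_explicit_percent


def pct_width_to_percent_string(raw: Optional[str]) -> Optional[str]:
    """Interpret a w:w value under w:type="pct" as a percentage string."""
    value, explicit = parse_tbl_width_value(raw)
    if value is None:
        return None
    # Explicit "100%" is literal; a bare integer is fiftieths of a percent.
    percent = value if explicit else value / Decimal(50)
    return f"{percent.normalize():f}%"


for sample in ["100%", "5000", "50%", "2500", "12.5%", "garbage"]:
    print(sample, "->", pct_width_to_percent_string(sample))
```

Returning None for unparseable input lets each call site simply omit the width, which matches the "skipped instead of throwing" behavior described in the list above.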

Tests

New Docxodus.Tests/HtmlConverterTablePercentageWidthTests.cs (HcTablePercentageWidthTests) builds minimal in-memory .docx documents (no fixtures on disk) with a 2×2 table and covers four width paths: percent-suffixed pct, integer pct, dxa, and garbage (non-numeric) values.
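For readers who want to reproduce such an input outside the C# test suite, a minimal in-memory .docx with a 2×2 table can be assembled with the Python standard library. This is a sketch, not the CreateDocxWithTableWidth helper from the tests: build_docx_with_table_width and its parameters are hypothetical, the styles and settings parts are empty stubs, and the output has not been verified against Docxodus.

```python
import io
import zipfile

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
REL_TYPE = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
CT = "application/vnd.openxmlformats-officedocument.wordprocessingml"
XML_DECL = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'


def build_docx_with_table_width(table_w, cell_w, width_type="pct"):
    """Return the bytes of a minimal .docx holding one 2x2 table.

    Width values are inserted verbatim (not XML-escaped), so keep them
    free of quotes and angle brackets.
    """
    cell = (
        f'<w:tc><w:tcPr><w:tcW w:w="{cell_w}" w:type="{width_type}"/></w:tcPr>'
        "<w:p><w:r><w:t>x</w:t></w:r></w:p></w:tc>"
    )
    row = f"<w:tr>{cell}{cell}</w:tr>"
    document = (
        f'<w:document xmlns:w="{W_NS}"><w:body><w:tbl>'
        f'<w:tblPr><w:tblW w:w="{table_w}" w:type="{width_type}"/></w:tblPr>'
        f"<w:tblGrid><w:gridCol/><w:gridCol/></w:tblGrid>{row}{row}"
        "</w:tbl><w:p/></w:body></w:document>"
    )
    parts = {
        "[Content_Types].xml": (
            '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
            '<Default Extension="rels" '
            'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
            '<Default Extension="xml" ContentType="application/xml"/>'
            f'<Override PartName="/word/document.xml" ContentType="{CT}.document.main+xml"/>'
            f'<Override PartName="/word/styles.xml" ContentType="{CT}.styles+xml"/>'
            f'<Override PartName="/word/settings.xml" ContentType="{CT}.settings+xml"/>'
            "</Types>"
        ),
        "_rels/.rels": (
            f'<Relationships xmlns="{REL_NS}">'
            f'<Relationship Id="rId1" Type="{REL_TYPE}/officeDocument" '
            'Target="word/document.xml"/></Relationships>'
        ),
        # Styles and settings parts are included because the converter
        # dereferences both before any width parsing runs.
        "word/_rels/document.xml.rels": (
            f'<Relationships xmlns="{REL_NS}">'
            f'<Relationship Id="rId1" Type="{REL_TYPE}/styles" Target="styles.xml"/>'
            f'<Relationship Id="rId2" Type="{REL_TYPE}/settings" Target="settings.xml"/>'
            "</Relationships>"
        ),
        "word/document.xml": document,
        "word/styles.xml": f'<w:styles xmlns:w="{W_NS}"/>',
        "word/settings.xml": f'<w:settings xmlns:w="{W_NS}"/>',
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, xml in parts.items():
            archive.writestr(name, XML_DECL + xml)
    return buffer.getvalue()


docx_bytes = build_docx_with_table_width("100%", "50%")
print(len(docx_bytes), "bytes")
```

Passing "5000" / "2500" gives the integer-fiftieths variant, width_type="dxa" with twip values gives the DXA variant, and a value such as "abc" gives a garbage-width input.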

Docs

  • CHANGELOG.md — entry under [Unreleased] → Fixed.
  • docs/ooxml_corner_cases.md — documents the percent-suffixed w:w corner case with a minimal reproducer and a Word/LibreOffice/Docxodus comparison table.

Notes on similar patterns

Swept the codebase for the same risky cast. tblInd already uses a safe (decimal?) cast and is DXA-only; gridCol / tcW reads in RevisionProcessor.cs are DXA-only in practice. The user-facing crash was confined to the three WmlToHtmlConverter width casts, which now share one tolerant parser.

claude and others added 4 commits May 30, 2026 04:01
…rter

convertDocxToHtml threw FormatException ("Format_InvalidStringWithValue,
100%") for any table whose w:tblW or w:tcW used w:type="pct" with a
percent-suffixed value such as w:w="100%" / w:w="50%". This is the form the
`docx` npm library emits for WidthType.PERCENTAGE, and it is valid per the
OOXML ST_TblWidth / ST_MeasurementOrPercent schema, which permits either a
plain integer in fiftieths-of-a-percent OR a "<number>%" string. The
converter cast the w:w attribute straight to int, which throws on "100%";
DXA (twips) widths were unaffected because they are always plain integers.

Width parsing for w:tblW and w:tcW now routes through a single
ParseTblWidthValue helper that tolerates the percent-suffixed form: an
explicit "100%" is treated as a literal percentage, while a bare integer
under pct is still interpreted as fiftieths of a percent (5000 -> 100%).
Non-numeric values are ignored gracefully instead of throwing.

Tests: HcTablePercentageWidthTests (in-memory docx exercising pct string,
pct integer, dxa, and garbage widths). Documents the corner case in
docs/ooxml_corner_cases.md and adds a CHANGELOG entry.

The previous commit landed the tests, CHANGELOG, and corner-case doc but the
WmlToHtmlConverter.cs edits did not take (the file had been touched between
read and write), so the actual percentage-width fix was missing and the new
HcTablePercentageWidthTests would have failed. This applies the intended
change: route w:tblW / w:tcW width parsing through ParseTblWidthValue so a
percent-suffixed value such as w:w="100%" no longer throws FormatException.

…nt-width fixtures

The in-memory docx built by CreateDocxWithTableWidth lacked a
StyleDefinitionsPart and DocumentSettingsPart, so ConvertToHtml threw
ArgumentNullException (FormattingAssembler dereferences
StyleDefinitionsPart; CalculateSpanWidthForTabs dereferences
DocumentSettingsPart) before any table-width parsing ran. All four
HcTablePercentageWidthTests failed in CI for this reason, not because
of the width fix. Supply both parts so the tests exercise the actual
percent/dxa/integer/garbage width paths.
@JSv4 JSv4 merged commit a091614 into main May 30, 2026
12 checks passed
@JSv4 JSv4 mentioned this pull request May 30, 2026

Development

Successfully merging this pull request may close these issues.

convertDocxToHtml throws Format_InvalidStringWithValue on tables with percentage widths (w:type="pct")