Converting Final Text WORD to HTML

John Moehrke edited this page Mar 30, 2022 · 1 revision

ITI decided to convert its Final-Text Word documents to HTML and, from that point forward, to manage changes in HTML. We looked at intermediate formats that are friendlier to editors, but found no overall benefit. We deliberately kept the HTML as basic as possible, with all styling in the CSS. We also chose to convert only the Final-Text, because it is stable and usually receives only narrowly targeted Change-Proposals, or new chapters when a supplement transitions to Final-Text.

Note that we have converted whitepapers, handbooks, and supplements to Markdown, and manage the publication process from that Markdown. We made this choice because these documents are more dynamic and have a less defined structure. See Authoring long form Supplement and Whitepapers for details.

Converting Word to HTML

A set of tools was used to convert Word to HTML. Pandoc is the usual tool for this kind of conversion, but for this step a different tool was chosen: the primary conversion tool was AConvert.

Each Volume was managed as a single Word document, so it needed to be split on heading level 1 or 2, depending on the volume. These splits were often scripted. The scripts were not preserved, as they were 'quick and dirty'.
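Since the original scripts were not preserved, the following is only a hypothetical sketch of what such a split could look like. It assumes the split is performed on the already converted HTML (not on the Word file) and that headings appear as plain h1 or h2 elements. The function name `split_on_heading` is invented for illustration.

```python
import re


def split_on_heading(html: str, level: int = 1) -> list[tuple[str, str]]:
    """Split HTML into (title, chunk) pairs, one per <hN> heading.

    Assumption: each chapter starts at a heading of the given level.
    Anything before the first heading is dropped.
    """
    heading = re.compile(
        rf"<h{level}\b[^>]*>(.*?)</h{level}>",
        re.IGNORECASE | re.DOTALL,
    )
    matches = list(heading.finditer(html))
    chapters = []
    for i, match in enumerate(matches):
        # A chunk runs from this heading up to the next one (or end of input).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(html)
        # Strip inline tags from the heading to get a plain-text title.
        title = re.sub(r"<[^>]+>", "", match.group(1)).strip()
        chapters.append((title, html[match.start():end]))
    return chapters
```

Calling `split_on_heading(html, level=2)` would split on heading level 2 instead, for volumes where that is the chapter boundary. A regular expression is enough here only because the converted HTML is assumed to be simple; a real HTML parser would be safer for messier input.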

The chapters then had the header and footer applied. We looked at using a display framework, but found that such frameworks added overhead that we did not see as sustainable.
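Applying a shared header and footer without a framework can be as simple as string concatenation. The sketch below is an assumption about how that might be done, not the process ITI actually used; `HEADER`, `FOOTER`, `wrap_chapter`, and the `style.css` file name are all hypothetical placeholders.

```python
from html import escape

# Hypothetical shared fragments; the real header and footer markup is not shown here.
HEADER = '<header><a href="index.html">Home</a></header>'
FOOTER = "<footer>Footer text goes here</footer>"


def wrap_chapter(title: str, body: str, css_href: str = "style.css") -> str:
    """Wrap one chapter body in a minimal page with the shared header and footer.

    All styling is left to the linked stylesheet, keeping the HTML basic.
    """
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        "<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{escape(title)}</title>\n"
        f'<link rel="stylesheet" href="{escape(css_href, quote=True)}">\n'
        "</head>\n"
        "<body>\n"
        f"{HEADER}\n{body}\n{FOOTER}\n"
        "</body>\n"
        "</html>\n"
    )
```

Because the header and footer live in one place, changing them means regenerating the pages rather than editing each chapter by hand.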

Many cleanups were needed

  • The conversion tool often used span elements to carry special styling. Most of the time this styling is unnecessary. As we identified patterns, we bulk edited them out, but some uses of span could not be automated.
  • We chose to make sure tables and figures were decorated with styled titles.
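The bulk removal of styling-only span patterns mentioned in the list above could be sketched as follows. This is a minimal illustration, assuming the unwanted spans carry only a `style` attribute; the actual patterns ITI removed are not documented here, and `unwrap_styled_spans` is a hypothetical name.

```python
import re

# Matches a <span> whose only attribute is style="..." and whose content
# contains no other span tag (innermost spans are handled first).
STYLED_SPAN = re.compile(
    r'<span\s+style="[^"]*"\s*>((?:(?!</?span\b).)*?)</span>',
    re.IGNORECASE | re.DOTALL,
)


def unwrap_styled_spans(html: str) -> str:
    """Remove styling-only spans, keeping the text they wrap.

    Repeats until nothing changes so that nested styling spans are
    unwrapped from the inside out. Spans with other attributes
    (class, id, ...) are left alone for manual review.
    """
    previous = None
    while previous != html:
        previous = html
        html = STYLED_SPAN.sub(r"\1", html)
    return html
```

Spans that carry a class or id are deliberately untouched, since those are the cases where the span may be meaningful and an automated edit is risky.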