Customizing Docx Rendering

Vladimir Schneider edited this page Oct 19, 2019 · 5 revisions

The DocxRenderer is found in the flexmark-docx-converter module.

Rendering involves creating an empty WordprocessingMLPackage and passing it to the renderer, which adds the rendered markdown content to that package.

The renderer uses the styles defined in the package to apply formatting for markdown elements. Customizing the output is a matter of changing the styles you pass to the renderer.

Since rendering appends content to the package, you can also pass a non-empty document and have the markdown content appended to it. This includes rendering several markdown documents into a single package.

A default empty document in XML form is included in the module, and you can get it with the static DocxRenderer.getDefaultTemplate() method.

There are other convenience methods to read in a template from a stream or a resource.

The styles in the package are expected to have specific IDs, which the renderer associates with the various markdown elements it renders. The default IDs are the ones used in empty.xml, the document template used by default.

The renderer will read the styles from the package and do its best to propagate them to nested markdown elements.

For convenience, there is also a DocxContextImpl class that can be used to create docx content from code. The renderer uses it. The ComboDocxConverterSpecTest test file also uses it to append the status of all tests, the markdown source and the resulting docx conversion into a single document. That test is a good source of sample code should you need it.

The DocxConverter CommonMark Sample and DocxConverter Pegdown Sample files show how to use this module.

Customizing the Styles

The renderer relies on the style configuration to produce the document. If you modify the styles inconsistently then your results will reflect this.

The styles are defined in XML form in empty.xml, the default template file used for rendering. You can make a copy of it and modify it for your own use. Then pass that content, either as a string or as an input stream, to DocxRenderer.getDefaultTemplate(). When passing a stream, the data is not limited to XML and can also be a docx document stream.

Unless you are very comfortable manipulating the docx format and don't mind debugging wrong style assignments, it is recommended that you keep the default style names. Change only the style definitions to get the look you want.

Another approach is to open the empty.xml package in a word processor that supports it (LibreOffice, Word). Modify the styles there and save the document as XML or docx.

Modifying styles with MS Word is also possible, but for that you need to start with a docx document that already contains all the needed styles. The root directory of the distribution jar file contains the flexmark-empty-template.docx file for this purpose. It includes all the markdown elements and styles used in the conversion, ready for you to modify, along with instructions on how best to do this.

The file was produced by running the docx conversion on the empty.md file, also found in the jar, with the default template.

A sample that uses empty.md and the empty.xml template to generate flexmark-empty-template.docx can be found in DocxConverterEmpty Sample.

If you do create your own styles template document, it is highly recommended that you run empty.md through the conversion using your modified template. This makes sure all styles are present and nothing got messed up in the process.
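As a cheaper first pass before that full render, a small check can catch style IDs that are missing from a template. The class below is a hypothetical helper, not part of flexmark. It uses only the JDK's DOM parser to collect every w:styleId declared in a flat XML template such as empty.xml, then reports which IDs from a list you supply are absent. The IDs in the example are placeholders, not the renderer's actual style IDs. For a .docx file you would first have to extract word/styles.xml from the zip. The check does not replace rendering empty.md, which also reveals styles that exist but look wrong.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper, not a flexmark API: finds expected style IDs missing from a template. */
final class TemplateStyleCheck {
    // WordprocessingML main namespace used by w:style and w:styleId
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    private TemplateStyleCheck() {}

    /** Collects the w:styleId of every w:style element anywhere in the XML. */
    static Set<String> declaredStyleIds(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the *NS lookups below
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            Set<String> ids = new HashSet<>();
            NodeList styles = doc.getElementsByTagNameNS(W_NS, "style");
            for (int i = 0; i < styles.getLength(); i++) {
                String id = ((Element) styles.item(i)).getAttributeNS(W_NS, "styleId");
                if (!id.isEmpty()) {
                    ids.add(id);
                }
            }
            return ids;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("template is not well-formed XML", e);
        }
    }

    /** Returns the expected IDs that the template does not declare, in their original order. */
    static List<String> missingStyleIds(String xml, List<String> expected) {
        Set<String> declared = declaredStyleIds(xml);
        List<String> missing = new ArrayList<>();
        for (String id : expected) {
            if (!declared.contains(id)) {
                missing.add(id);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String xml = "<w:styles xmlns:w=\"" + W_NS + "\">"
                + "<w:style w:type=\"paragraph\" w:styleId=\"Heading1\"/>"
                + "<w:style w:type=\"paragraph\" w:styleId=\"MyQuote\"/>"
                + "</w:styles>";
        // Placeholder IDs for illustration only
        System.out.println(missingStyleIds(xml, Arrays.asList("Heading1", "MyQuote", "MyCode"))); // [MyCode]
    }
}
```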

Image Embedding

The renderer uses DocxLinkResolver for basic resolution of document-relative and site-relative URLs. For this to work, you need to set the DocxRenderer.DOC_RELATIVE_URL and DocxRenderer.DOC_ROOT_URL options so that your links can be mapped to files or http:// resources.
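The roles of the two options can be made concrete with a rough sketch of the kind of mapping involved. This is not DocxLinkResolver's code, and its rules are assumptions:

- Absolute URLs are kept as they are.
- URLs starting with / are treated as site-relative and appended to the DOC_ROOT_URL value.
- Anything else is resolved against the DOC_RELATIVE_URL value, which is treated as a directory.

The sketch uses only java.net.URI, and the class name is hypothetical.

```java
import java.net.URI;

/**
 * Hypothetical sketch, not DocxLinkResolver's implementation. docRelativeUrl and
 * docRootUrl stand in for the values of DocxRenderer.DOC_RELATIVE_URL and
 * DocxRenderer.DOC_ROOT_URL.
 */
final class LinkMappingSketch {
    private LinkMappingSketch() {}

    static String resolve(String url, String docRelativeUrl, String docRootUrl) {
        URI link = URI.create(url); // throws IllegalArgumentException on malformed URLs
        if (link.isAbsolute()) {
            // Already has a scheme (http:, file:, ...): nothing to map.
            return url;
        }
        if (url.startsWith("/")) {
            // Site-relative: append to the site root as-is.
            return stripTrailingSlash(docRootUrl) + url;
        }
        // Document-relative: resolve against the document's directory,
        // which also normalizes segments such as "../".
        return URI.create(withTrailingSlash(docRelativeUrl)).resolve(link).toString();
    }

    private static String withTrailingSlash(String s) {
        return s.endsWith("/") ? s : s + "/";
    }

    private static String stripTrailingSlash(String s) {
        return s.endsWith("/") ? s.substring(0, s.length() - 1) : s;
    }

    public static void main(String[] args) {
        String docDir = "https://example.com/docs/guide";
        String siteRoot = "https://example.com";
        System.out.println(resolve("images/a.png", docDir, siteRoot));    // https://example.com/docs/guide/images/a.png
        System.out.println(resolve("/img/logo.png", docDir, siteRoot));   // https://example.com/img/logo.png
        System.out.println(resolve("../shared/b.png", docDir, siteRoot)); // https://example.com/docs/shared/b.png
    }
}
```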

The renderer embeds any images referenced by image or image reference elements. Resolved links can use the http: or file: protocol, and in either case the renderer loads the file and embeds it in the document.

You can provide your own link resolver to customize the link resolution rules. DocxRenderer uses the same kind of link resolvers as HtmlRenderer, so you can reuse your existing custom HTML link resolvers.

Limitations

The docx format does not offer the same flexibility as HTML/CSS in a browser. In particular, nested borders are not available for text paragraphs. If a markdown element is nested within several parents that render a border, only the border of the innermost parent is rendered. In the default template this mainly affects block quotes.

Additionally, in docx the border offset from the left margin is specified in pt (points, 1/72 in.) and limited to a maximum of 31 pt. Other measurements are in twips (1/20 of a point) with no practical limit. As a result, the child indent combined with the parent indent can easily exceed the 31 pt limit. That makes it impossible to keep the child's border (really the parent's border extended to the child) aligned with the parent's border. When this happens, the 31 pt limit is respected and the child's border is offset from the parent's.
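The arithmetic can be sketched under a simplified model, which is an assumption and not the renderer's code. In this model a paragraph's left border is drawn a given number of points to the left of its left indent, indents are in twips, and border width is ignored. For the child's border to line up with the parent's, the child needs a border offset equal to the parent's offset plus the indent difference. Anything above 31 pt is capped, and the excess becomes the visible shift. The class and method names are hypothetical.

```java
/**
 * Simplified model (an assumption, not the renderer's code): a left border is drawn
 * spacePt points to the left of the paragraph's left indent; border width is ignored.
 */
final class BorderClampSketch {
    static final int TWIPS_PER_POINT = 20;     // 1 twip = 1/20 pt
    static final int MAX_BORDER_SPACE_PT = 31; // docx cap on the border offset

    private BorderClampSketch() {}

    /** Offset, in twips, that the child's border would need to line up with the parent's. */
    static int requiredChildSpaceTwips(int parentIndentTwips, int parentSpacePt, int childIndentTwips) {
        return parentSpacePt * TWIPS_PER_POINT + (childIndentTwips - parentIndentTwips);
    }

    /** Child border offset actually usable: whole points, capped at 31 pt. */
    static int childSpacePt(int parentIndentTwips, int parentSpacePt, int childIndentTwips) {
        int required = requiredChildSpaceTwips(parentIndentTwips, parentSpacePt, childIndentTwips);
        // Integer division drops any sub-point remainder; that is a separate alignment problem.
        return Math.min(MAX_BORDER_SPACE_PT, required / TWIPS_PER_POINT);
    }

    /** How many points the child's border ends up to the right of the parent's border. */
    static double borderShiftPt(int parentIndentTwips, int parentSpacePt, int childIndentTwips) {
        int required = requiredChildSpaceTwips(parentIndentTwips, parentSpacePt, childIndentTwips);
        int actual = childSpacePt(parentIndentTwips, parentSpacePt, childIndentTwips) * TWIPS_PER_POINT;
        return (required - actual) / (double) TWIPS_PER_POINT;
    }

    public static void main(String[] args) {
        // Parent block quote: 720 twips (36 pt) indent, border 4 pt left of the text.
        // Nested child: another 720 twips of indent, so it would need 4 + 36 = 40 pt.
        System.out.println(childSpacePt(720, 4, 1440));  // 31
        System.out.println(borderShiftPt(720, 4, 1440)); // 9.0
    }
}
```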

Another caveat is that left margins and hanging indents have 20 times the resolution of the border offset. This means the child's border can only be kept aligned with the parent's when the child and parent indents differ by a whole number of points.

The renderer detects when this is not the case, which would leave the child's border visibly misaligned, and adjusts the child's left margin to eliminate the misalignment. Unless your eyes are very sensitive, you will not notice the sub-point shift in the text's left margin, whereas anyone can notice a 1/2 pt break in a straight line.
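The check and adjustment can be sketched with the same twips-and-points arithmetic. The rounding rule here is an assumption: this version moves the child's indent to the nearest whole-point offset from the parent, so the text shifts by at most 10 twips (1/2 pt). The renderer's exact rule may differ. The class and method names are hypothetical.

```java
/** Hypothetical sketch of detecting and fixing sub-point border misalignment. */
final class IndentSnapSketch {
    static final int TWIPS_PER_POINT = 20; // indents are in twips, border offsets in whole points

    private IndentSnapSketch() {}

    /** Borders can line up only if the indents differ by a whole number of points. */
    static boolean canAlignBorders(int parentIndentTwips, int childIndentTwips) {
        return Math.floorMod(childIndentTwips - parentIndentTwips, TWIPS_PER_POINT) == 0;
    }

    /** Moves the child indent to the nearest whole-point offset from the parent indent. */
    static int snapChildIndent(int parentIndentTwips, int childIndentTwips) {
        int diff = childIndentTwips - parentIndentTwips;
        // Round diff to the nearest multiple of 20 twips (halves round up).
        int snappedDiff = Math.floorDiv(diff + TWIPS_PER_POINT / 2, TWIPS_PER_POINT) * TWIPS_PER_POINT;
        return parentIndentTwips + snappedDiff;
    }

    public static void main(String[] args) {
        int parent = 567; // 1 cm
        int child = 1134; // 2 cm, so the difference is 28.35 pt
        System.out.println(canAlignBorders(parent, child)); // false
        int snapped = snapChildIndent(parent, child);
        System.out.println(snapped); // 1127, i.e. a 28 pt difference
        System.out.println((child - snapped) / (double) TWIPS_PER_POINT); // 0.35
    }
}
```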

Acknowledgments

The development of this module was sponsored by Johner Institut GmbH. It was needed to allow easy conversion of their internal documentation in markdown to the docx format preferred by users at large.

The module uses the docx4j library for handling all docx manipulations.