---
title: Round-trip from markdown to docx and back with pandoc
author: Charl P. Botha
---
This is an experiment that checks to what extent we could use docx as a self-contained storage format for markdown + media, whilst allowing for a limited amount of editing directly in the docx as well.
- README.md: this file.
- README.docx: this file, converted to docx by pandoc as shown below.
- README-from-docx.md: the docx, converted back to markdown.
An often-used Markdown convention is to have a single H1 heading per document, and to treat that as the document title.
However, when converting to docx, pandoc treats H1 as just another heading, and instead uses the title from the YAML header above as the document title.
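If a document follows the single-H1 convention instead of carrying a YAML title, one option is a small pandoc JSON filter that moves a leading H1 into the `title` metadata before the docx is written. The sketch below does this in Python with only the standard library; the script name `h1_to_title.py` is hypothetical, and the filter deliberately leaves the document alone when a YAML `title` is already present. (pandoc's own `--shift-heading-level-by=-1` option can also promote a leading H1 to the title, but it moves every other heading up one level as well.)

```python
#!/usr/bin/env python3
"""Sketch of a pandoc JSON filter: use a leading level-1 heading as the title.

Hypothetical helper, not part of the original experiment.
"""
import json
import sys


def promote_h1_to_title(doc):
    """Move a leading H1 into meta.title unless a title is already set."""
    meta = doc.setdefault("meta", {})
    blocks = doc.get("blocks", [])
    if "title" in meta or not blocks:
        return doc  # keep an explicit YAML title untouched
    first = blocks[0]
    # a pandoc Header is encoded as [level, attr, inlines]
    if first.get("t") == "Header" and first["c"][0] == 1:
        meta["title"] = {"t": "MetaInlines", "c": first["c"][2]}
        doc["blocks"] = blocks[1:]
    return doc


def main():
    json.dump(promote_h1_to_title(json.load(sys.stdin)), sys.stdout)


# pandoc calls JSON filters with the target format as argv[1]; requiring it
# keeps the module importable (and testable) without reading stdin.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

For example: `pandoc README.md -t json | python3 h1_to_title.py docx | pandoc -f json -o README.docx`.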
Convert from markdown to docx:
```sh
# the lua filter will add the codeblock language
# as part of a special prepended line
pandoc README.md -o README.docx --lua-filter md_to_docx_filter.lua
```
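The contents of `md_to_docx_filter.lua` are not shown here, so the following is only a sketch of the same idea written as a pandoc JSON filter in Python (standard library only). It prepends a marker line such as `%%lang:python` to every code block that has a language class. The `%%lang:` prefix and the script name `embed_lang.py` are hypothetical, not the format the Lua filter actually writes.

```python
#!/usr/bin/env python3
"""Sketch of a pandoc JSON filter: embed each code block's language in its text.

Assumption: the "%%lang:" marker is hypothetical and only illustrates the idea;
it is not necessarily what md_to_docx_filter.lua writes.
"""
import json
import sys

MARKER = "%%lang:"  # hypothetical marker prefix


def embed_language(node):
    """Prepend a marker line to a CodeBlock whose first class is its language."""
    if isinstance(node, dict) and node.get("t") == "CodeBlock":
        attr, text = node["c"]
        classes = attr[1]  # pandoc Attr is [identifier, classes, key-value pairs]
        if classes:
            return {"t": "CodeBlock", "c": [attr, f"{MARKER}{classes[0]}\n{text}"]}
    return node


def walk(node, fn):
    """Rebuild a pandoc JSON AST bottom-up, applying fn to every object."""
    if isinstance(node, list):
        return [walk(item, fn) for item in node]
    if isinstance(node, dict):
        return fn({key: walk(value, fn) for key, value in node.items()})
    return node


def main():
    doc = json.load(sys.stdin)
    doc["blocks"] = walk(doc["blocks"], embed_language)
    json.dump(doc, sys.stdout)


# pandoc calls JSON filters with the target format as argv[1]; requiring it
# keeps the module importable (and testable) without reading stdin.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Assuming it is saved as `embed_lang.py`, it could be used in a pipeline such as `pandoc README.md -t json | python3 embed_lang.py docx | pandoc -f json -o README.docx`. Walking the whole tree (rather than only top-level blocks) also catches code blocks nested inside lists or block quotes.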
Convert that docx back to markdown:
```sh
# this lua filter reads the stored codeblock languages
# and reconstructs the language-labelled codeblocks
# --standalone is needed to reconstruct the yaml header
# --extract-media=. is required to extract images into ./media/
pandoc README.docx -o README-from-docx.md --lua-filter docx_to_md_filter.lua \
  --standalone --extract-media=.
```
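Since `docx_to_md_filter.lua` is not shown either, here is a matching sketch of the reverse step as a Python JSON filter: if a code block's first line is a language marker, that line is stripped and the language is put back as the block's class. The `%%lang:` marker and the script name `restore_lang.py` are hypothetical; whatever marker is used has to match what the markdown-to-docx step wrote.

```python
#!/usr/bin/env python3
"""Sketch of the reverse pandoc JSON filter: restore code block languages.

Assumption: the "%%lang:" marker is hypothetical; it has to match whatever
the markdown-to-docx step wrote.
"""
import json
import sys

MARKER = "%%lang:"  # hypothetical marker prefix


def restore_language(node):
    """Turn a leading marker line back into the CodeBlock's language class."""
    if isinstance(node, dict) and node.get("t") == "CodeBlock":
        (ident, classes, pairs), text = node["c"]
        first, _, rest = text.partition("\n")
        lang = first[len(MARKER):].strip() if first.startswith(MARKER) else ""
        if lang:
            # put the language first, without duplicating an existing class
            classes = [lang] + [c for c in classes if c != lang]
            return {"t": "CodeBlock", "c": [[ident, classes, pairs], rest]}
    return node


def walk(node, fn):
    """Rebuild a pandoc JSON AST bottom-up, applying fn to every object."""
    if isinstance(node, list):
        return [walk(item, fn) for item in node]
    if isinstance(node, dict):
        return fn({key: walk(value, fn) for key, value in node.items()})
    return node


def main():
    doc = json.load(sys.stdin)
    doc["blocks"] = walk(doc["blocks"], restore_language)
    json.dump(doc, sys.stdout)


# pandoc calls JSON filters with the target format as argv[1]; requiring it
# keeps the module importable (and testable) without reading stdin.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Assuming it is saved as `restore_lang.py`, a pipeline roughly equivalent to the command above might be `pandoc README.docx -t json --extract-media=. | python3 restore_lang.py markdown | pandoc -f json -o README-from-docx.md --standalone`.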
What would happen if I round-tripped this codeblock to docx and back?
```python
def main():
    print("hello world")
```
The `python` language tag seems to get lost in the docx.
One way to fix this is to use filters that embed the language in the docx code block text, and then use that embedded language to restore the tag when converting back to markdown.
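The idea can be checked independently of pandoc: as long as the extra first line is stripped again on the way back, the round trip is lossless. The helper names `embed` and `extract` and the `%%lang:` marker below are hypothetical, a minimal sketch rather than what the author's Lua filters actually do.

```python
MARKER = "%%lang:"  # hypothetical marker; it must never start a real line of code


def embed(lang, code):
    """Store the language as an extra first line of the code text."""
    return f"{MARKER}{lang}\n{code}"


def extract(text):
    """Return (language, code); language is None when no marker is present."""
    first, _, rest = text.partition("\n")
    if first.startswith(MARKER):
        return first[len(MARKER):], rest
    return None, text


source = 'def main():\n    print("hello world")'
stored = embed("python", source)  # roughly what the docx would hold as plain text
lang, code = extract(stored)
print(lang)  # python
print(code == source)  # True
```

The weak point of this design is the marker itself: a code block whose genuine first line happens to start with the marker would be misread on the way back, so the marker should be something that does not occur in real code.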
The equation for finding the roots of a parabola, that is, of a quadratic equation of the form $ax^2 + bx + c = 0$, is:

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
Why yes of course that works pretty well also.
Link to another file in this directory. What to do?