Troubleshooting formatting with pandoc

Greg Boone edited this page Sep 19, 2016 · 9 revisions

To convert a docx file to markdown, run pandoc:

    pandoc -f docx -t markdown --atx-headers /path/to/draft.docx > _posts/201x-mm-dd-post-slug.md
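If you convert drafts often, the same command can be wrapped in a small script. This is a minimal Python sketch that assumes pandoc is installed and on your PATH. The function names `pandoc_command` and `convert` are hypothetical, not part of pandoc or of this repo.

```python
import subprocess

def pandoc_command(docx_path):
    """Build the same argument list as the command shown on this page."""
    # Newer pandoc releases spell --atx-headers as --markdown-headings=atx;
    # check `pandoc --help` for the version you have installed.
    return ["pandoc", "-f", "docx", "-t", "markdown", "--atx-headers", docx_path]

def convert(docx_path, post_path):
    """Run pandoc and write its output to post_path, like the shell `>` redirect."""
    with open(post_path, "wb") as out:
        # check=True raises CalledProcessError if pandoc exits with an error.
        subprocess.run(pandoc_command(docx_path), stdout=out, check=True)
```

Calling `convert("/path/to/draft.docx", "_posts/201x-mm-dd-post-slug.md")` should leave the same file the shell command does. You still need to work through the checks on this page afterwards.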

Pandoc does a few strange things when converting, though, and you'll want to check all these things before publishing.

  • Clean up the frontmatter: remove extra spaces between lines and remove the date

  • Make sure author is listed in _data/authors.yml

  • Check for special characters that pandoc escapes with a \. Remove the backslash unless the character is supposed to print literally and would otherwise be interpreted as markdown. Characters to look for:

    • $
    • _
    • `
  • Replace all [* with [

  • Replace all *] with ]

  • Add or remove image from frontmatter

  • Check for words that leave a space before the closing markdown marker. Jekyll will not render **this statement ** as bold. It will render it like: *this statement * instead. Delete the space before the closing marker.

  • Sometimes links are converted so that each word is linked individually: [like](https://18f.gsa.gov) [this](https://18f.gsa.gov), instead of [like this](https://18f.gsa.gov). Merge these into a single link.

  • Bulleted lists where the list item text is too long get rendered like this:

    - first line of the list item
    > second line of the same list item
    > third line of the same list item
    - second list item
    

    There's no need for those > characters; in fact, they screw up the formatting pretty badly. Make sure you delete them.
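Several of the checks above are mechanical enough to script. The following is a minimal Python sketch, not an established part of this workflow. All function names are hypothetical and the regular expressions are heuristics, so review the diff before committing. Escaped characters are only reported, not changed, because deciding whether a backslash should stay is a judgment call.

```python
import re

# Two adjacent links that share a URL, separated by a single space.
LINK_PAIR = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\) \[([^\]]*)\]\(\2\)")

def find_escaped(text):
    # Report (line number, line) for lines with a backslash-escaped $, _ or `.
    return [(n, line) for n, line in enumerate(text.split("\n"), start=1)
            if re.search(r"\\[$_`]", line)]

def fix_link_asterisks(text):
    # Replace all "[*" with "[" and all "*]" with "]".
    return text.replace("[*", "[").replace("*]", "]")

def close_bold_spaces(text):
    # "**this statement **" -> "**this statement**"
    return re.sub(r"\*\*(\S[^*\n]*?)[ \t]+\*\*", r"**\1**", text)

def merge_split_links(text):
    # Repeat until stable so runs of three or more links collapse too.
    previous = None
    while previous != text:
        previous = text
        text = LINK_PAIR.sub(r"[\1 \3](\2)", text)
    return text

def strip_list_continuations(text):
    # Drop a leading ">" only on lines that directly continue a list item,
    # so ordinary blockquotes elsewhere in the post are left alone.
    out, in_item = [], False
    for line in text.split("\n"):
        m = re.match(r"^(\s*)>\s?(.*)$", line)
        if m and in_item:
            out.append(m.group(1) + "  " + m.group(2))
            continue
        in_item = bool(re.match(r"^\s*[-*+]\s+", line))
        out.append(line)
    return "\n".join(out)

def clean_post(text):
    # Apply the automatic fixes in order; find_escaped is run separately.
    for step in (fix_link_asterisks, close_bold_spaces,
                 merge_split_links, strip_list_continuations):
        text = step(text)
    return text
```

Run `find_escaped` on the converted file first and resolve those lines by hand, then pass the text through `clean_post` and read the result before publishing.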