Word to Markdown converter

A Ruby gem to liberate content from the jail that is Word documents

The problem

Our default content publishing workflow is terribly broken. We've all been trained to make paper, yet today, content authored once is more commonly consumed in multiple formats, and rarely, if ever, does it embody physical form. Put another way, our go-to content authoring workflow remains relatively unchanged since it was conceived in the early 80s.

I'm asked regularly by government employees — knowledge workers who fire up a desktop word processor as the first step to any project — for an automated pipeline to convert Microsoft Word documents to Markdown, the lingua franca of the internet, but as my recent foray into building just such a converter proves, it's not that simple.

Markdown isn't just an alternative format. Markdown forces you to write for the web.

Read more

Demo

Install

You'll need to install LibreOffice. Then:

gem install word-to-markdown

Usage

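require "word-to-markdown"

# Load a Word document
file = WordToMarkdown.new("/path/to/document.docx")
# => <WordToMarkdown path="/path/to/document.docx">

# Get the converted Markdown as a string
file.to_s
# => "# Test\n\n This is a test"

# Inspect the intermediate HTML as a Nokogiri document
file.document.tree
# => <Nokogiri Document>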
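
A common next step is to save the converted Markdown next to the original document. The following is a minimal sketch under stated assumptions, not part of the gem: `markdown_path_for` and `convert_to_file` are hypothetical helper names, and the only gem calls used are `WordToMarkdown.new` and `to_s` as shown in the usage above.

```ruby
# Hypothetical helper (not part of the gem): map a Word document path
# to a sibling path with a .md extension.
def markdown_path_for(docx_path)
  dir  = File.dirname(docx_path)
  base = File.basename(docx_path, ".*")
  File.join(dir, "#{base}.md")
end

# Hypothetical helper: convert one document and write the Markdown beside it.
# Uses only WordToMarkdown.new and #to_s, as shown in the usage section.
def convert_to_file(docx_path)
  # Loaded lazily so the path helper works without the gem installed.
  # The conversion itself needs the gem and LibreOffice.
  require "word-to-markdown"

  output_path = markdown_path_for(docx_path)
  File.write(output_path, WordToMarkdown.new(docx_path).to_s)
  output_path
end
```

Calling `convert_to_file("/path/to/document.docx")` would write `/path/to/document.md` and return that path.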

Command line usage

Once you've installed the gem, it's just:

$ w2m path/to/document.docx

This prints the resulting Markdown to stdout.
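
Because the command writes to stdout, it can also be driven from a script and its output captured. This is a rough sketch using Ruby's standard `Open3` library; `markdown_from_cli` is a hypothetical wrapper name, and the `command:` parameter exists only so the executable can be swapped out (for example, when `w2m` is not on the PATH).

```ruby
require "open3"

# Hypothetical wrapper: run a converter command on a document path and
# return whatever it prints to stdout. Defaults to the w2m executable.
def markdown_from_cli(path, command: "w2m")
  # Passing the command and argument separately avoids shell interpolation.
  stdout, status = Open3.capture2(command, path)
  raise "#{command} failed for #{path}" unless status.success?

  stdout
end
```

For example, `File.write("document.md", markdown_from_cli("path/to/document.docx"))` would save the output to a file.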

Supports

  • Paragraphs
  • Numbered lists
  • Unnumbered lists
  • Nested lists
  • Italic
  • Bold
  • Explicit headings (e.g., selected as "Heading 1" or "Heading 2")
  • Implicit headings (e.g., text with a larger font size relative to paragraph text)
  • Images
  • Tables
  • Hyperlinks

Requirements and configuration

Word-to-markdown requires soffice, a command-line interface to LibreOffice that works on Linux, Mac, and Windows. To install soffice, see the LibreOffice documentation.
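
Before converting, it can help to check whether soffice is reachable. The sketch below only searches the directories listed in PATH; `find_executable` is a hypothetical helper, not the gem's own lookup logic, and LibreOffice may be installed outside PATH on some platforms, so a `nil` result does not prove it is missing.

```ruby
# Hypothetical helper: return the full path of an executable found in the
# given PATH-style string, or nil if no directory contains it.
def find_executable(name, path_env = ENV.fetch("PATH", ""))
  path_env.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

soffice = find_executable("soffice")
warn "soffice was not found on PATH; is LibreOffice installed?" if soffice.nil?
```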

Testing

script/cibuild

Server

Word-to-markdown-demo contains a lightweight server for converting Word documents as a service.

A live version runs at word-to-markdown.herokuapp.com.