Dewordify is used to facilitate the development of online course materials by converting MS Word documents into HTML pages.
- Install node https://nodejs.org/en/download/
- Install dewordify
(sudo) npm install -g
This package includes an NPM script that allows updates with a single line:
(sudo) npm run update
The above command is just sugar for the following commands which you can just as easily run instead:
git pull && npm install -g
In the command line, navigate to the location of the Word Document (docx) and run
cd path/to/file/location
dewordify
If the Word file that you would like to parse is not the most recently modified docx file in the folder, either re-save it so that it is, or tell dewordify which file to target like so:
By default, Dewordify uses an HTML template designed for use in the production of courses at BCIT. This is not ideal for anybody but BCIT.
If you would like to use your own custom template, simply include a file called template.html in the same folder as the Word document you would like to convert.
To save you time, Dewordify will also search up to 3 folders above your working directory. It will use the first template.html file that it finds, starting in the working directory. That means that you can have a general template defined higher up in your course production folders while providing more specific templates closer to the Word documents they will apply to.
Take a look at the following structure:
└───modules
    ├───00
    │   ├───00.docx
    │   └───template.html
    ├───01
    │   └───01.docx
    ├───02
    │   └───02.docx
    └───template.html
In this example both 01.docx and 02.docx will use the common template found in the modules folder, while 00.docx has a custom template.
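The lookup described above can be sketched in a few lines of Node. This is only an illustration of the "first match, walking upward" rule, not Dewordify's actual source; the function name `findUp` and its parameters are assumptions made for the example.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper (not part of Dewordify's API): look for fileName in
// startDir, then in up to maxLevelsUp parent folders, and return the first
// match, or null when nothing is found.
function findUp(fileName, startDir, maxLevelsUp = 3) {
  let dir = path.resolve(startDir);
  for (let level = 0; level <= maxLevelsUp; level++) {
    const candidate = path.join(dir, fileName);
    if (fs.existsSync(candidate)) return candidate;
    const parent = path.dirname(dir);
    if (parent === dir) break; // reached the filesystem root
    dir = parent;
  }
  return null;
}

// Example: findUp('template.html', process.cwd())
```

With the folder structure shown earlier, calling this from modules/00 would return the template inside 00, while calling it from modules/01 would return the one in modules.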
When you create your template.html, you might ask "where will my content go?" This question actually has multiple answers. In priority order:
- Wherever you place a <content></content> tag (the tag itself will be removed).
- Inside a tag with
- Prepended to the <body> tag (allowing for footer text and low priority
If none of these conditions are met, only the contents will be returned.
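As a rough sketch of that priority order, the snippet below handles the first rule, the `<body>` rule, and the fallback. The second rule is left out because its condition is not spelled out here. The function name `placeContent` is hypothetical, and "prepended to the `<body>` tag" is interpreted as "inserted directly after the opening `<body>` tag", which is an assumption.

```javascript
// Hypothetical sketch, not Dewordify's actual implementation.
function placeContent(template, content) {
  // Rule 1: replace an explicit <content></content> tag (the tag is removed).
  const contentTag = /<content>\s*<\/content>/i;
  if (contentTag.test(template)) {
    // A replacer function keeps "$" sequences in the content literal.
    return template.replace(contentTag, () => content);
  }
  // Rule 3 (assumed meaning): insert right after the opening <body> tag.
  const bodyOpen = /<body(\s[^>]*)?>/i;
  if (bodyOpen.test(template)) {
    return template.replace(bodyOpen, (openTag) => openTag + content);
  }
  // Fallback: no usable insertion point, so only the content is returned.
  return content;
}
```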
"Markout" is the name we've given to our unique syntax for indicating blocks of content that require custom styling. Markout consists of 3 items used in combination to form "markers":
- A starting symbol (default #)
- An ending symbol (default /)
- A specific word which is mapped to a wrapping tag (typically an instructional term with an appropriately named wrapper)
A starting marker would be a combination of 1 and 3 (e.g. #marker) while an ending marker would be 2 and 3 (e.g. /marker). You would then use the starting and ending markers to indicate content blocks in your Word document like so:
#marker
Some block of text.
Possibly spanning multiple lines.
/marker
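To make the idea concrete, here is a minimal sketch of how marker lines could be turned into a wrapping tag. It is not Dewordify's real parser: the function name `applyMarkout`, the use of a `<div>` with a class, and the wrapper table passed in are all assumptions for illustration.

```javascript
// Hypothetical sketch: wrappers is a Map of marker word -> CSS class.
function applyMarkout(lines, wrappers, startSymbol = '#', endSymbol = '/') {
  return lines.map((line) => {
    const text = line.trim();
    if (text.startsWith(startSymbol)) {
      const word = text.slice(startSymbol.length).toLowerCase();
      // A known word after the starting symbol opens the wrapper.
      if (wrappers.has(word)) return `<div class="${wrappers.get(word)}">`;
    }
    if (text.startsWith(endSymbol)) {
      const word = text.slice(endSymbol.length).toLowerCase();
      // A known word after the ending symbol closes the wrapper.
      if (wrappers.has(word)) return '</div>';
    }
    // Anything else is treated as ordinary paragraph content.
    return `<p>${line}</p>`;
  });
}

// Example wrapper table (made up for this sketch).
const exampleWrappers = new Map([['marker', 'marker']]);
```

Lines that start with a symbol but do not match a known word are passed through as normal content, so an ordinary line that happens to begin with the symbol is not swallowed.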
While Dewordify includes a default set of markers developed in-house at BCIT (documented in BCIT's Conversion Guide), there may come a time when you want to modify, expand, or define an entirely new set of custom markers. To do this, you will need to supply Dewordify with a
Simply look in the default-files folder for a file with the same name and copy it into your working folder. It's pretty straightforward to edit. For starters, you can edit the start and end symbols if # and / somehow conflict with your internal processes. Second, you will see a set of wrappers, which represent item 3 mentioned above. Lastly, there is a set of mappings which simply map misspelled or derivative words to the appropriate wrappers.
Like templates, you can place these up to 3 folders above your working directory.
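The relationship between wrappers and mappings can be pictured with the sketch below. The object shape, the key names `wrappers` and `mappings`, the sample words, and the function `resolveWrapper` are all invented for illustration; check the file in default-files for the real format.

```javascript
// Hypothetical shape, NOT the real file format.
const exampleMap = {
  wrappers: ['example', 'definition'],
  mappings: { exampel: 'example', examples: 'example', defn: 'definition' },
};

// Resolve a word typed in the Word document to a wrapper name, or null.
function resolveWrapper(word, map) {
  const key = word.trim().toLowerCase();
  // Exact wrapper names win.
  if (map.wrappers.includes(key)) return key;
  // Otherwise fall back to the misspelled/derivative mappings.
  if (Object.prototype.hasOwnProperty.call(map.mappings, key)) {
    return map.mappings[key];
  }
  return null;
}
```

The point of the second lookup is that authors do not have to type the marker word perfectly for the block to be wrapped.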
MS Word Styles
From time to time, you will see the following:
[MAMMOTH WARNING] Unknown Word Styles.
The first step when you encounter this should be to look at the Word document and try to identify the culprit. Usually it's insignificant and can be ignored or manually corrected in either the Word document or the HTML. In cases where it has wide-ranging impacts on the output, read on.
Style Mapping (Advanced)
Similar to the way you include custom templates and markoutMaps, you can supply a custom styleMap.txt file. Unlike the others, the styleMap.txt file that you supply will be appended to the default styleMap found in the default-files folder.
First, style mappings are a little difficult to describe and are actually a feature of Mammoth.js so I'll start by referring you to the custom style map documentation in Mammoth.js.
Second, I've gone to the trouble of creating a styleMap-reference-list.txt file (also found in the default-files folder) which shows a comprehensive listing of every unmapped style that I could find in the guts of MS Word. This gives you a start at figuring out the most confusing part, which is whether the style is a "run" or "paragraph" type (think inline vs block level HTML). You can simply copy the relevant line into your custom styleMap.txt file and modify the output to whatever you desire.
Note: It's possible for people to define their own custom styles, in which case you won't find them listed in the reference list.
Another confusing part of style mapping is the :fresh declaration. Without it, adjacent styles of the same type will be combined into a single tag. I found experimentation to be the best way to understand. For example, what would you expect the output of this to be:
div.parent > p.child:fresh
It's worth noting again that styleMaps are appended to the one found in the default-files folder, so your custom styleMap.txt file should only contain the styles that you are fixing.
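As a hedged illustration of what "appended" could look like, the sketch below joins a default style map with a custom one and drops blank lines and # comment lines. The helper name `combineStyleMaps`, the style names, and the sample mappings are made up; only the mapping syntax itself follows Mammoth.js's documented style map format.

```javascript
// Hypothetical helper: append the custom styleMap.txt text to the default
// text and return one mapping per array entry.
function combineStyleMaps(defaultText, customText) {
  return `${defaultText}\n${customText}`
    .split(/\r?\n/)
    .map((line) => line.trim())
    // Skip blank lines and comment lines (those starting with #).
    .filter((line) => line !== '' && !line.startsWith('#'));
}

// Made-up sample content for both files.
const exampleDefault = "p[style-name='Heading 1'] => h1:fresh";
const exampleCustom = [
  '# styles that triggered the warning',
  "p[style-name='Aside Text'] => div.aside > p:fresh", // paragraph style
  "r[style-name='Code Char'] => code", // run (inline) style
].join('\n');

const styleMap = combineStyleMaps(exampleDefault, exampleCustom);
```

Mammoth.js accepts a style map as a string or an array of strings through the `styleMap` option of `mammoth.convertToHtml`, so an array like this could be handed over directly. Whether Dewordify combines the files in exactly this way is an assumption.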
The following packages make this project possible.