Convert Webster's Unabridged Dictionary from Project Gutenberg to OSX dictionary
After reading James Somers' excellent article about the differences in quality between dictionaries, I was prompted to follow his method for getting Webster's 1913 into Dictionary.app.
The directions worked as promised, but I was disappointed with the formatting. In particular, I was troubled by the hard line breaks at about 80 characters and by the way the pronunciation, part of speech, and etymology were not clearly separated from the rest of the entry.
To get something more like what I was looking for, I started with a copy of Webster's Unabridged Dictionary from Project Gutenberg and wrote this script to convert it to the XML that Apple's Dictionary Development Kit takes as input.
If you just want the results
If you don't care to run the script yourself and just want the finished dictionary, you can download an installer package that will put the finished dictionary in ~/Library/Dictionaries for you.
There's also an iOS app.
For the script to parse it properly, the Gutenberg text needs its beginning and end trimmed off. Its line endings should also be converted to whatever is standard on the local machine.
Assuming OSX with a copy of the Gutenberg text in ../29765 and a Dictionary Development Kit project at ../ddk_project, this should do the trick:
tr -d "\r" < ../29765/29765.txt.utf-8 | tail -n +27 | head -n 973878 | \
./convert-dsm.pl > ../ddk_project/WebstersUnabridged.xml
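To see what the trimming part of that pipeline does before running it on the full text, here is a minimal sketch on a tiny synthetic file. The helper name `trim_text` and the sample contents are made up for illustration and are not part of the script. It assumes a POSIX shell with the usual `tr`, `tail`, and `head`.

```shell
# Hypothetical helper (not part of convert-dsm.pl): strip carriage returns,
# then keep COUNT lines starting at line START.
trim_text() {
    # $1 = input file, $2 = first line to keep (1-based), $3 = number of lines to keep
    tr -d '\r' < "$1" | tail -n +"$2" | head -n "$3"
}

# Synthetic sample: 2 header lines, 3 body lines, 1 footer line, CRLF line endings.
sample=$(mktemp)
printf 'header 1\r\nheader 2\r\nA\r\nB\r\nC\r\nfooter\r\n' > "$sample"

# Keep 3 lines starting at line 3, which drops the header and the footer.
trimmed=$(trim_text "$sample" 3 3)
printf '%s\n' "$trimmed"

rm -f "$sample"
```

With the real file, the equivalent call would be `trim_text ../29765/29765.txt.utf-8 27 973878`, piped into `./convert-dsm.pl` as above. The numbers 27 and 973878 come from the command above and depend on the exact copy of the Gutenberg text, so it is worth checking the first and last lines of the trimmed output if your copy differs.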