Preprocessed data that is smaller and less complete, but also easier to work with
All words that appear in at least 2 of the top 100 or so works from Project Gutenberg (minus a couple of bad words I didn't want to include)
The famous CMU pronunciation dictionary, abridged to just the words that appear in the above list
The Google Quick, Draw! dataset, turned into a JSONish object of SVG paths that can be used in Tracery. It includes 30 doodles for each of 237 object categories.
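
For anyone who wants to rebuild something similar, here is a rough Python sketch of the two text-processing steps described above: collecting the words that show up in at least 2 texts, and trimming a CMU-style pronunciation dictionary down to those words. This is not the script that produced these files. The function names, the tokenizing regex, and the choice to keep only the first pronunciation of each word are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical helper: the tokenizer and threshold are assumptions, not the original method.
def words_in_at_least(texts, min_docs=2):
    """Return the set of words that occur in at least `min_docs` of the texts."""
    doc_counts = Counter()
    for text in texts:
        # Count each word once per text, no matter how often it repeats there.
        tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
        doc_counts.update(set(tokens))
    return {word for word, n in doc_counts.items() if n >= min_docs}

# Hypothetical helper: expects lines like "CAT  K AE1 T", as in the CMU dictionary file.
def abridge_pronunciations(cmu_lines, keep_words):
    """Keep only entries whose headword is in `keep_words` (first pronunciation only)."""
    kept = {}
    for line in cmu_lines:
        if not line.strip() or line.startswith(";;;"):
            continue  # skip blank lines and ";;;" comment lines
        word, *phones = line.split()
        word = word.lower()
        # Alternate entries such as "the(1)" never match keep_words, so they are dropped.
        if word in keep_words and word not in kept:
            kept[word] = phones
    return kept

texts = ["The cat sat.", "A cat ran.", "The dog ran."]
common = words_in_at_least(texts)

cmu_lines = [
    ";;; comment line",
    "CAT  K AE1 T",
    "DOG  D AO1 G",
    "THE  DH AH0",
    "THE(1)  DH IY0",
]
small_dict = abridge_pronunciations(cmu_lines, common)
print(sorted(common))
print(small_dict)
```

With real data, `texts` would be the full text of each Project Gutenberg work and `cmu_lines` would come from reading the CMU dictionary file line by line. A blocklist of unwanted words could be subtracted from `common` before filtering.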