- Major revisions to the command-line arguments, which should now be more consistent.
- Added a feature to subset using data taken from Common Crawl instead of Google's subsets. This appears to work much better for Chinese text, but only marginally better for Japanese text. (TODO: Separate by CJK language; Japanese and Chinese are the biggest culprits.)
- Documentation changes.
- Initial release.