Convert GCIDE XML Data To Slob Format (used in Aard 2)
GCIDE
, short for GNU Collaborative International Dictionary of English
, is a great and open dictionary derived from Webster's Revised Unabridged Dictionary (1913)
, which is valuable even if many definitions are old from today's perspective.
The XML format of GCIDE
data is available on http://www.ibiblio.org/webster/ , the latest version is 0.51. It's our task to produce slob format of GCIDE
, needed by Aard 2, from XML.
Now, we complete this task in this repo.
- JDK 7+
- Maven 3.0+
- GNU/Linux or *BSD (Because we need
patch
utility)
$ mkdir /tmp/gcide-build && cd /tmp/gcide-build
$ git clone https://github.com/darkgeek/gcide-converter
$ wget -c http://www.ibiblio.org/webster/gcide_xml-0.51.zip && unzip gcide_xml-0.51.zip
$ cd /tmp/gcide-build/gcide_xml-0.51/xml_files/
$ patch < /tmp/gcide-build/gcide-converter/files/patch-gcide-f.xml
$ patch < /tmp/gcide-build/gcide-converter/files/patch-gcide.xml
$ cd /tmp/gcide-build/gcide-converter && mvn clean package
$ cd /tmp/gcide-build/gcide_xml-0.51/xml_files/
$ java -jar /tmp/gcide-build/gcide-converter/target/gcide-converter-1.0-SNAPSHOT.jar # It'll generate a `dict_creator.py` python script
$ # Install and setup a slob working environment by following this guide: https://github.com/itkach/slob/blob/master/README.org
$ python dict_creator.py # It'll produce a file named `gcide.slob` in current directory
If you see the file gcide.slob
in /tmp/gcide-build/gcide_xml-0.51/xml_files/
, then fine, it's what we need. You could move it to your Aard dictionary directory.
Q: Help! I get GC overhead limit exceeded
error when I run the gcide-converter-xxx.jar
file!
A: Add option -Xmx
to enlarge the heap space in JVM, like this:
java -Xmx1024m -jar /tmp/gcide-build/gcide-converter/target/gcide-converter-1.0-SNAPSHOT.jar
Whenever you see a bug or something wrong, don't hesitate to open an issue or send a PR.