Purpose of HunspellXML
Hunspell is a flexible and powerful spell-checking engine used in a wide variety of programs, including Firefox, LibreOffice, OpenOffice, and Opera. However, the file format for specifying a Hunspell dictionary, although documented, is complex and difficult to master. HunspellXML aims to make creating Hunspell dictionaries easier by:
- providing a simple XML file format which is more human-readable than raw Hunspell files
- converting the XML to valid Hunspell affix and dictionary files
- creating Firefox, LibreOffice, OpenOffice, and Opera spell-check plugins automatically
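To illustrate what the XML-to-Hunspell conversion step produces, here is a minimal Python sketch. It is not HunspellXML's implementation, and the XML element and attribute names (`<dictionary>`, `<suffix>`, `<rule>`, `<words>`, `<word>`, `strip`, `add`, `condition`, `flags`) are hypothetical, invented for this example rather than taken from the real HunspellXML schema. The output follows the raw Hunspell conventions: an affix file with an `SFX` header line (flag, cross-product, rule count) followed by one line per rule, and a dictionary file whose first line is the number of entries.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: these element and attribute names are invented for
# illustration and are NOT the real HunspellXML schema.
SAMPLE = """
<dictionary>
  <suffix flag="S">
    <rule strip="0" add="s" condition="."/>
  </suffix>
  <words>
    <word flags="S">cat</word>
    <word>the</word>
  </words>
</dictionary>
"""

def to_hunspell(xml_text):
    """Return (aff_text, dic_text) built from the hypothetical XML above."""
    root = ET.fromstring(xml_text)
    aff_lines = ["SET UTF-8"]
    for sfx in root.iter("suffix"):
        flag = sfx.get("flag")
        rules = sfx.findall("rule")
        # Header line: SFX <flag> <cross_product> <number of rules>
        aff_lines.append(f"SFX {flag} Y {len(rules)}")
        for r in rules:
            # Rule line: SFX <flag> <stripping> <affix> <condition>
            aff_lines.append(
                f"SFX {flag} {r.get('strip')} {r.get('add')} {r.get('condition')}"
            )
    words = []
    for w in root.iter("word"):
        flags = w.get("flags")
        words.append(f"{w.text}/{flags}" if flags else w.text)
    # The first line of a .dic file is the (approximate) number of entries.
    dic_lines = [str(len(words))] + words
    return "\n".join(aff_lines) + "\n", "\n".join(dic_lines) + "\n"

aff, dic = to_hunspell(SAMPLE)
print(aff)
print(dic)
```

Deriving the header count from the rules themselves, rather than typing it by hand, avoids one common source of malformed raw affix files.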
Benefits of Using HunspellXML
Defining your dictionary first in HunspellXML provides the following advantages over defining it directly in the raw Hunspell format:
- Human-readable - HunspellXML files are easy to read and write, so you can author the source of a Hunspell dictionary without first learning every formatting option of the raw Hunspell dictionary and affix files.
- Error checking - The HunspellXML library provides some error checking for affix rules, including checks for some restrictions that the Hunspell documentation does not currently describe.
- Plugin packaging - The HunspellXML library provides utilities for creating packaged Hunspell dictionary plugins for Firefox, LibreOffice/OpenOffice, and Opera.
- MyThes thesaurus - HunspellXML also provides basic support for creating MyThes thesaurus files.
- Testing - In HunspellXML, you can define and export tests (correctly and incorrectly spelled words) to help verify that the Hunspell dictionary you create does what you intended.
- Affix multiplication - Hunspell can only represent 3 levels of affixes; one way around this limit is to combine multiple affixes into a single Hunspell affix slot. For example, the Lingala verb extensions (-am, -an, -el, -is, -ol) can combine with the verb tense markers (-a, -i, -aka, -aki), which requires typing 20 rules (5 x 4) into a raw Hunspell affix file. HunspellXML provides a `<multiply>` feature so you don't have to type out all the combinations: you only enter the rules from each affix group (9 rules, 5 + 4, instead of 20 for the Lingala example). For languages that combine many affix rules, this can significantly improve readability and maintainability.
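The arithmetic behind affix multiplication is a Cartesian product: every affix in one group is paired with every affix in the next. The Python sketch below expands the Lingala groups from the example into raw Hunspell `SFX` lines. It is an illustration of the idea, not HunspellXML's code; the flag `V` and the choice to strip nothing (`0`) with condition `.` are assumptions made to keep the example short (they presume stems are listed without their final vowel).

```python
from itertools import product

# Suffix groups from the Lingala example above.
EXTENSIONS = ["am", "an", "el", "is", "ol"]
TENSE_MARKERS = ["a", "i", "aka", "aki"]

def multiply(flag, *groups):
    """Combine one affix from each group into raw Hunspell SFX lines.

    Sketch only: the flag name is hypothetical, and every rule strips
    nothing ("0") and applies to any stem (condition ".").
    """
    combined = ["".join(parts) for parts in product(*groups)]
    lines = [f"SFX {flag} Y {len(combined)}"]
    lines += [f"SFX {flag} 0 {suffix} ." for suffix in combined]
    return lines, combined

lines, suffixes = multiply("V", EXTENSIONS, TENSE_MARKERS)
print(len(EXTENSIONS) + len(TENSE_MARKERS), "rules typed")  # 9 rules typed
print(len(suffixes), "rules generated")                     # 20 rules generated
print("\n".join(lines[:3]))
```

The first three lines printed are `SFX V Y 20`, `SFX V 0 ama .`, and `SFX V 0 ami .`. Adding a third group multiplies the generated count again while the typed count only grows by the size of the new group, which is where the maintainability gain comes from.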
Dependencies
The HunspellXML library uses the following libraries:
- The groovy-all-[version].jar library from the Groovy distribution
- The RelaxNG library (jing.jar) from the Thai Open Source Software Center
- The hunspell.jar library and its jna.jar dependency from HunspellJNA
Command Line Utility
If you download version 1.8 or later from the releases page, you can use the command-line utility to convert HunspellXML to Hunspell format and vice versa. Read more about it on the Command Line Tool page.
Getting Started
- Read through the Getting Started Tutorial to build your first simple HunspellXML file.
- Browse through the HunspellXML File Format Overview and its subsections.
- Read more about creating affixation rules and HunspellXML's powerful `<multiply><group>` syntax in Creating Affixation Rules: an Example with Lingala Verbs.
- Read about [Undocumented Hunspell Constraints](<wiki/Affix-Rules:-Undocumented Hunspell Constraints>)
HunspellXML File Format Reference
- HunspellXML File Format Overview
Tips for Designing Your Dictionary Definition
- Affix Rules
- Suggestion Rules
- Other Tips