Purpose of HunspellXML
Hunspell is a flexible and powerful spell-checking engine used in a wide variety of programs, including Firefox, LibreOffice, OpenOffice, and Opera. However, the file format for specifying a Hunspell dictionary, although documented, is complex and difficult to master. HunspellXML aims to simplify the creation of Hunspell dictionaries by:
- providing a simple XML file format which is more human-readable than raw Hunspell files
- converting the XML to valid Hunspell affix and dictionary files
- creating Firefox, LibreOffice, OpenOffice, and Opera spell-check plugins automatically
Benefits of Using HunspellXML
Defining your dictionary first in HunspellXML provides the following advantages over defining it directly in the raw Hunspell format:
- Human-readable - A HunspellXML file is easier to read than raw Hunspell files, so you can write your dictionary source without having to learn every formatting option of the raw Hunspell dictionary and affix files.
- Error checking - The HunspellXML library provides some error checking for affix rules, including some restrictions that are not currently documented in the Hunspell documentation.
- Plugin packaging - The HunspellXML library provides utilities for creating packaged Hunspell dictionary plugins for Firefox, LibreOffice/OpenOffice, and Opera.
- MyThes thesaurus - HunspellXML also provides basic support for creating MyThes thesaurus files.
- Testing - In HunspellXML, you can define and export tests (correctly and incorrectly spelled words) to help verify that the Hunspell dictionary you create does what you intended.
- Affix multiplication - Hunspell can only represent 3 levels of affixes. One way around this limit is to combine multiple affixes into one Hunspell affix slot. For example, the Lingala verb extensions (-am, -an, -el, -is, -ol) can combine with the verb tense markers (-a, -i, -aka, -aki), which requires 20 rules (5 x 4) to be typed in a raw Hunspell affix file. HunspellXML provides a `<multiply>` feature so you don't have to type out all the combinations: you only enter the rules from each affix group (9 rules instead of 20 for the Lingala example). For languages that need to combine many affix rules, this can significantly improve readability and maintainability.
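The arithmetic behind affix multiplication can be illustrated with a short Python sketch. It is not HunspellXML's implementation and does not use HunspellXML syntax. It only shows how 9 hand-entered affixes expand into 20 raw Hunspell suffix rules. The flag name `V`, the helper names `multiply` and `to_sfx_lines`, and the assumption that the two affix groups combine by plain concatenation with no stripping or conditions are all hypothetical simplifications.

```python
from itertools import product

# Affix groups taken from the Lingala example above (leading hyphens dropped).
EXTENSIONS = ["am", "an", "el", "is", "ol"]
TENSES = ["a", "i", "aka", "aki"]


def multiply(*groups):
    """Return every concatenation that takes one affix from each group, in order."""
    return ["".join(parts) for parts in product(*groups)]


def to_sfx_lines(flag, suffixes):
    """Format suffixes as raw Hunspell SFX rules (no stripping, no condition)."""
    header = f"SFX {flag} Y {len(suffixes)}"
    rules = [f"SFX {flag} 0 {suffix} ." for suffix in suffixes]
    return [header] + rules


combined = multiply(EXTENSIONS, TENSES)
lines = to_sfx_lines("V", combined)  # "V" is a hypothetical flag name

print(len(EXTENSIONS) + len(TENSES), "rules entered by hand")  # 9
print(len(combined), "rules generated")                        # 20
print("\n".join(lines[:5]))
```

Adding a third affix group would multiply the generated rules again while the hand-entered count only grows by the size of that group, which is where the maintainability gain comes from.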
To use the HunspellXML library from your own program, you also need the following libraries:
- The groovy-all-[version].jar library from the Groovy distribution
- The RelaxNG library (jing.jar) from Thai Open Source
- The hunspell.jar library and its jna.jar dependency from HunspellJNA
If you don't want to write your own program to interface with the HunspellXML library, you can use the HunspellXML Converter. Just drop your HunspellXML file onto the running HunspellXML Converter window, and it will automatically create your Hunspell dictionary and all the Hunspell plugins. It also gives you a text area to try out the spell-check functionality of your new dictionary.
- Read through the Getting Started Tutorial to build your first simple HunspellXML file.
- Browse through the HunspellXML File Format Overview and its subsections
- Read more about creating affixation rules and HunspellXML's powerful `<multiply><group>` syntax in Creating Affixation Rules: an Example with Lingala Verbs
- Read about [Undocumented Hunspell Constraints](wiki/Affix-Rules:-Undocumented Hunspell Constraints)
HunspellXML File Format Reference
- HunspellXML File Format Overview
Tips for Designing Your Dictionary Definition
- Affix Rules
- Creating Affixation Rules: an Example with Lingala Verbs
- Lingala Verb Example HunspellXML File
- Creating a Position Class Chart
- [Undocumented Hunspell Constraints](wiki/Affix-Rules:-Undocumented Hunspell Constraints)
- Suggestion Rules
- Other Tips
- [Export a Hunspell dictionary from FLEx (Field Works Language Explorer)](wiki/Other-Tips:-Export a Hunspell dictionary from FLEx)
- [Add Your Language to LibreOffice](wiki/Other-Tips:-Add Your Language to LibreOffice)