Home

TrnsltLife edited this page · 30 revisions

HunspellXML


Introduction

HunspellXML defines an XML format for creating Hunspell dictionaries, and a Java/Groovy library for transforming dictionaries described in HunspellXML into the standard Hunspell format.

Purpose of HunspellXML

Hunspell is a flexible and powerful spell-check dictionary engine used in a wide variety of programs, including Firefox, LibreOffice, OpenOffice, and Opera. Nevertheless, the file format for specifying a Hunspell dictionary, although documented, is complex and difficult to master. HunspellXML aims to make creating Hunspell dictionaries easier by:

  • providing a simple XML file format which is more human-readable than raw Hunspell files
  • converting the XML to valid Hunspell affix and dictionary files
  • creating Firefox, LibreOffice, OpenOffice, and Opera spell-check plugins automatically

Benefits of Using HunspellXML

Defining your dictionary first in HunspellXML provides the following advantages over defining it directly in the raw Hunspell format:

  • Human-readable - A HunspellXML file is easy to read, which makes it a good format for dictionary source code. You do not have to learn every formatting option of the raw Hunspell dictionary and affix files.
  • Error checking - The HunspellXML library provides some error checking for affix rules, including some restrictions that are not currently documented in the Hunspell documentation.
  • Plugin packaging - The HunspellXML library provides utilities for creating packaged Hunspell dictionary plugins for Firefox, LibreOffice/OpenOffice, and Opera.
  • MyThes thesaurus - HunspellXML also provides basic support for creating MyThes thesaurus files.
  • Testing - In HunspellXML, you can define and export tests (correctly and incorrectly spelled words) to help verify that the Hunspell dictionary you create does what you intended.
  • Affix multiplication - Hunspell can represent only 3 levels of affixes. One way around this limit is to combine multiple affixes into a single Hunspell affix slot. For example, the Lingala verb extensions (-am, -an, -el, -is, -ol) can combine with the verb tense markers (-a, -i, -aka, -aki), which requires typing 20 rules (5 × 4) in a raw Hunspell affix file. HunspellXML provides a <multiply> feature so that you enter only the rules from each affix group (9 rules instead of 20 in the Lingala example) rather than every combination. For languages that combine many affix rules, this can significantly improve readability and maintainability.
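The arithmetic behind affix multiplication can be illustrated with a minimal Java sketch. It is not HunspellXML's implementation and does not use the <multiply> syntax. The class name AffixMultiply and the method multiply are hypothetical, and the sketch assumes that combining two suffixes is plain string concatenation, ignoring the strip and condition parts of real Hunspell affix rules.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not part of the HunspellXML library.
class AffixMultiply {

    // Builds every combination of a suffix from the first group followed by
    // a suffix from the second group (assumption: simple concatenation).
    static List<String> multiply(List<String> first, List<String> second) {
        List<String> combined = new ArrayList<>();
        for (String a : first) {
            for (String b : second) {
                combined.add(a + b);
            }
        }
        return combined;
    }

    public static void main(String[] args) {
        // The Lingala affix groups named in the text above.
        List<String> extensions = List.of("am", "an", "el", "is", "ol");
        List<String> tenses = List.of("a", "i", "aka", "aki");

        List<String> combined = multiply(extensions, tenses);

        int sourceRules = extensions.size() + tenses.size();
        // Prints: 20 combined suffixes from 9 source rules
        System.out.println(combined.size() + " combined suffixes from "
                + sourceRules + " source rules");
        for (String suffix : combined) {
            System.out.println("-" + suffix);
        }
    }
}
```

The saving grows with the size of the groups: the number of rules you maintain is the sum of the group sizes, while the number of generated rules is their product.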

Requirements

  • Java
  • The groovy-all-[version].jar library from the Groovy distribution.
  • The RelaxNG library (jing.jar) from Thai Open Source
  • The hunspell.jar library and its jna.jar dependency from HunspellJNA

User Interface

If you don't want to write your own program to interface with the HunspellXML library, you can use the HunspellXML Converter. Drop your HunspellXML file onto the running HunspellXML Converter window and it will automatically create your Hunspell dictionary and all the Hunspell plugins. It also gives you a text area for trying out the spell-check functionality of your new dictionary.

Getting Started

HunspellXML File Format Reference

Tips for Designing Your Dictionary Definition
