HunspellXML Format

TrnsltLife edited this page Dec 2, 2013 · 37 revisions


HunspellXML File Format

The file format specification below is currently only an overview. Some HunspellXML elements are explained in depth; for the rest, consult the Hunspell documentation or the Hunspell test and example files included in the Hunspell-1.3.2.tar.gz file.

Conventions

The following conventions are used in describing the data format of the XML elements:

  • [boolean] - a boolean variable with possible values of either "true" or "false"
  • [char] - a single character
  • [list of chars] - a list of characters separated by spaces
  • [flag] - the format of the flag datatype varies depending on what is set in the <flagType> element in the <settings> block
  • [list of flags] - a list of flags, separated by spaces
  • [list of morphs] - a list of morphology fields, separated by spaces. A morphology field is a two-letter code, a colon, and a description, e.g. is:plural. See HunspellXML Format (DictionaryFile).
  • [list of synonyms] - a list of synonyms (words), separated by vertical bars/pipes (|).
  • [integer] - an integer value
  • [integer:0-99] - an integer value with a minimum possible value of 0 and a maximum possible value of 99
  • [locale] - an ISO 639-1, ISO 639-2, or ISO 639-3 language code, optionally followed by an underscore and an ISO 3166-1 alpha-2 country code (examples: en_US, ebo_CG, en, ebo)
  • [list of locales] - a list of locales, separated by spaces
  • [regex] - a simplified regular expression
  • [text] - any sequence of characters

Top Level Elements

The root <hunspell>...</hunspell> element can contain the following elements:

<hunspell>
	<suppress ... />
	<metadata> ... </metadata>
	<affixFile> ... </affixFile>
	<dictionaryFile> ... </dictionaryFile>
	<thesaurusFile> ... </thesaurusFile>
	<tests> ... </tests>
</hunspell>

<suppress.../>

Attributes:

  • blankLines [boolean]
  • myBlankLines [boolean]
  • comments [boolean]
  • myComments [boolean]
  • metadata [boolean]

The suppress element lets you hide metadata, comments, and blank lines from the final Hunspell affix file. By default, HunspellXML writes some section-heading comments into the Hunspell .aff file and also includes all the metadata elements as comments. These can be suppressed with the comments="true" and metadata="true" attributes.

You can also enter your own comments inside the <affixFile>...</affixFile> element using the <comment>...</comment> element (see below), but these can be hidden with myComments="true". You can enter blank lines with <br/>, but these can be hidden with myBlankLines="true".
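As a minimal sketch of the attributes described above, the fragment below hides the generated section-heading comments and the metadata comments while keeping your own comments and blank lines. It assumes that "true" means "suppress" for every attribute, as it does for myComments and myBlankLines; the remaining elements are elided.

```xml
<hunspell>
	<!-- Hide generated comments and metadata; keep my own comments and blank lines -->
	<suppress comments="true" metadata="true" myComments="false" myBlankLines="false" />
	<metadata> ... </metadata>
	<affixFile> ... </affixFile>
	<dictionaryFile> ... </dictionaryFile>
</hunspell>
```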

<comment>[text]</comment>

The <comment>[text]</comment> element is there to let you add comments to the Hunspell .aff file. Any text you place in the comment element will be written into the Hunspell .aff file with the comment character # at the start of the line. Multi-line text is fine.

The <comment>[text]</comment> element can be placed anywhere directly inside any of the following elements. It may not be placed inside their sub-elements unless those sub-elements also appear in this list:

  • <metadata>...</metadata>
  • <customAttributes>...</customAttributes>
  • <affixFile>...</affixFile>
  • <affixes>...</affixes>
  • <compounds>...</compounds>
  • <convert>...</convert> (only right before the first <input .../> element and before the first <output .../> element)
  • <settings>...</settings>
  • <suggestions>...</suggestions>

The text inside the comment tags will be output into the dictionary .aff file at the place it occurs in the HunspellXML file, preceded by the # comment character.
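For illustration, here is a hedged sketch of comments placed directly inside two of the permitted elements. The comment text is made up, the surrounding content is elided, and nesting <settings> directly inside <affixFile> is an assumption.

```xml
<affixFile>
	<comment>Affix file for my language (hypothetical text)</comment>
	<settings>
		<comment>General settings.
A second line of the same comment.</comment>
		...
	</settings>
	...
</affixFile>
```

Each comment should then appear in the generated .aff file at the corresponding position, with # at the start of the line. The exact spacing after the # character is not specified here.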

<br/>

The <br/> element can be placed anywhere the <comment>[text]</comment> element can go. It is used to write blank lines into the Hunspell .aff file, to allow you to format the .aff file to look the way you want.
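A short sketch of using <br/> to separate two commented groups inside a <settings> block (the comment text is hypothetical and the actual settings are elided):

```xml
<settings>
	<comment>Basic settings</comment>
	...
	<br/>
	<comment>Flag settings</comment>
	...
</settings>
```

With myBlankLines="true" set on the <suppress .../> element, the blank line produced by <br/> would be omitted from the .aff file.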

<!-- Standard XML Comments -->

Standard XML comments can go anywhere that XML allows. These can be useful for documenting your HunspellXML file or commenting out certain affix rules as you are testing your dictionary.
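As a sketch, the fragment below uses standard XML comments both to document the file and to temporarily disable part of it during testing. The contents are hypothetical placeholders. Note that, as in any XML document, the text inside a comment may not contain a double hyphen (--), so a block that already contains XML comments cannot simply be wrapped in another one.

```xml
<affixFile>
	<!-- Notes for maintainers: this text is for the HunspellXML file only -->
	<comment>Stable rules</comment>
	...
	<!-- Disabled while testing:
	<comment>Experimental rules</comment>
	...
	-->
</affixFile>
```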

See the following sections: