Other Tips: Export a Hunspell Dictionary from FLEx

TrnsltLife edited this page Apr 10, 2018 · 3 revisions

These instructions explain how to export a Hunspell-usable wordlist with affix flags from FLEx (FieldWorks Language Explorer). They assume an example database of English words, such as the "Simple" FLEx database, and that you are familiar with how to use FLEx.

Note that these instructions present one way to store Hunspell affixation codes in your FLEx database, but there are probably other (even better) ways to do it, so feel free to experiment.

  1. In FLEx, open the Lists tab on the side.
  2. Choose the Insert->Custom List... menu item.
    1. Name the List: Hunspell Flags
    2. Uncheck "Allow duplicate item names"
    3. Data in Analysis Writing Systems
    4. Display items in the list by: Abbreviation - Name
    5. Click OK to create the list.
  3. Use the Insert->List Item menu or List Item button to create new list items:
    1. Description: Noun with plural in -s; Name&Abbreviation: NS
    2. Description: Regular Verb; Name&Abbreviation: VB
    3. Description: Past Participle in -en; Name&Abbreviation: VN
    4. Description: Verb with present in -s; Name&Abbreviation: VS
  4. (You can choose flags other than NS, VB, VN, and VS if you want. All your flags must be of the same type, chosen from the following:)
    1. Two-character codes (like NS, VB, VN, C2)
    2. One-character codes (like S, V, n, #)
    3. Numbers between 1 and 65000
  5. Go to the Lexicon tab.
  6. Choose the Tools->Configure->Custom Fields... menu item.
    1. Click the Add button to add a new field.
    2. Name the Custom Field Name: Hunspell Flags
    3. Location: Entry
    4. Type: List Reference (multiple items)
    5. List: Hunspell Flags
    6. Writing System(s): First Analysis Writing System.
    7. Click OK to create the custom field.
  7. Add flags to the Hunspell Flags field for some of your words. This is easy to do with Bulk Edit Entries.
    1. Find all your regular verbs and add the VB flag.
    2. Find irregular verb forms like "broke" and "spoke" and add the VN flag.
    3. Find irregular verbs like "speak", "break", "say", "ring" and add the VS flag.
    4. Add the NS flag to most of your nouns.
  8. Now we'll find out how to export this data so you can use it in your Hunspell dictionary.
    1. First, go to the Dictionary view in the Lexicon tab.
    2. Use the Tools->Configure->Dictionary... menu item.
    3. At the top, click the Manage Views link.
      1. Select the Stem-based dictionary from the list and click the Copy button.
      2. Name the new dictionary view "Stem-based Hunspell Export" and click OK.
    4. Change the select box at the top to Stem-based Hunspell Export.
    5. Under the Main Entry, uncheck everything except for Headword and Hunspell Flags.
    6. Click Headword.
      1. Change the Character Style to
      2. Click the Configure Homograph Number link and Hide the number.
      3. Delete all characters and spaces from the Before:, Between:, and After: fields.
    7. Click on Hunspell Flags.
      1. Change Character Style for Content to
      2. Remove all characters and spaces from the Before:, Between:, and After: fields. Put a single slash "/" in the Before: field.
    8. Repeat sub-steps 5 to 7 above, but this time for Minor Entry instead of Main Entry.
    9. Click the OK button at the bottom.
  9. In Lexicon Edit, set the filter on headword to exclude entries that contain spaces (Regular Expression: ^[^ ]*$), and set the Morph Type filter to exclude all prefixes, suffixes, infixes, etc. The resulting dictionary view should look something like the list below: each headword, followed by a slash and its Hunspell Flags (2-letter codes with no spaces between them) when it has any.
break/VS
bright
bright
bring
broke/VN
bye
car/NS
cat/NS
cheerio
color/NS
colour/NS
cross/VBNS
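If you want to double-check the exported lines outside FLEx, the following is a minimal Python sketch of that check, not part of the FLEx workflow itself. It applies the same regular expression to each headword and splits the flag text into two-character codes. The names `KNOWN_FLAGS`, `split_flags`, and `check_entries` are hypothetical, and the flag set assumes the four example flags (NS, VB, VN, VS) defined earlier.

```python
import re

# Same pattern as the headword filter above: matches only strings with no space.
NO_SPACE = re.compile(r"^[^ ]*$")

# Assumption: the four example flags created in the Hunspell Flags list.
KNOWN_FLAGS = {"NS", "VB", "VN", "VS"}


def split_flags(entry):
    """Split 'cross/VBNS' into ('cross', ['VB', 'NS']); 'bright' gives ('bright', [])."""
    word, _, flag_text = entry.partition("/")
    if len(flag_text) % 2 != 0:
        raise ValueError(f"odd-length flag string in {entry!r}")
    # Two-character flags are written back to back, so read them in pairs.
    flags = [flag_text[i:i + 2] for i in range(0, len(flag_text), 2)]
    return word, flags


def check_entries(entries):
    """Return (entry, problem) pairs; an empty list means no problems were found."""
    problems = []
    for entry in entries:
        word, flags = split_flags(entry)
        if not NO_SPACE.match(word):
            problems.append((entry, "headword contains a space"))
        for flag in flags:
            if flag not in KNOWN_FLAGS:
                problems.append((entry, f"unknown flag {flag}"))
    return problems


# Made-up sample lines; the last two are deliberately wrong.
sample = ["break/VS", "bright", "cross/VBNS", "ice cream/NS", "cat/NX"]
problems = check_entries(sample)
for entry, problem in problems:
    print(f"{entry}: {problem}")
```

Keep in mind that Hunspell only reads flags as two-character codes when the affix (.aff) file declares `FLAG long`, so a check like this is only meaningful for that flag type.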
  10. In Dictionary view, press Ctrl+A to Select All the text.
    1. Press Ctrl+C to copy it to the clipboard.
    2. Now open a simple text editor like Notepad or Notepad++, and paste the word list into a new text file.
    3. Look at the number of words in the word list (you can see this in the bottom right status bar in FLEx) and type that number at the top of the file before the very first word.
    4. Finally, save the text in UTF-8 format as en_US.dic. (Replace "en" and "US" with the Ethnologue language code and the country code for your language.)
85
aardvark/NS
across
bank/NS
...
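The last few steps (paste the list, put the word count on the first line, save as UTF-8) can also be scripted. This is a minimal Python sketch under the assumption that the copied text has one entry per line; `build_dic_text` and `write_dic` are hypothetical helper names, and the script counts the pasted lines itself rather than reading the number from the FLEx status bar.

```python
from pathlib import Path


def build_dic_text(pasted_text):
    """Return .dic content: the entry count on the first line, then one entry per line."""
    # Drop blank lines and stray whitespace left over from copy and paste.
    entries = [line.strip() for line in pasted_text.splitlines() if line.strip()]
    return "\n".join([str(len(entries))] + entries) + "\n"


def write_dic(pasted_text, path):
    """Write the .dic file as UTF-8 text."""
    Path(path).write_text(build_dic_text(pasted_text), encoding="utf-8")


# Made-up sample standing in for the text copied from the Dictionary view.
pasted = "aardvark/NS\nacross\n\nbank/NS\n"
dic_text = build_dic_text(pasted)
print(dic_text, end="")

# To save the file, call for example: write_dic(pasted, "en_US.dic")
```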