# Yup'ik Toy FST Morphological Analyzer
Central Alaskan Yup'ik (ISO 639-3: esu) is a polysynthetic language of the Eskimo-Aleut family, spoken in southwestern Alaska.

This lab builds a toy finite-state transducer (FST) morphological analyzer for Yup'ik. The toy grammar focuses on the recursive nature of Yup'ik noun and verb suffixes.

## Yup'ik Words
![Word](Yup'ikWord.png)

Yup'ik nouns and verbs consist of one base, zero or more derivational

## Yup'ik FST Diagram
![FST](Yup'ikFSDiagram.png)

## Yup'ik Morphophonological Processes

| Symbol | Description |
|:---:|:---:|
| \+ | The suffix keeps the final consonant of the base. If the base does not end in a consonant, the postbase simply attaches to the base without changing it. |
| \- | The suffix drops the final consonant of the base. |
| ~ | The suffix drops the final e of the base. |
| : | The suffix drops voiced velar/uvular continuants (the fricatives g and r and the nasal ng) when they occur between single vowels, of which at least the first is a full vowel. |
| (ng) (s) | Used with bases ending in a vowel. |
| (t) | Used with bases ending in a consonant. |
| (g) | Used with bases ending in two vowels. |
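
To make the last three rows concrete, here is a minimal plain-Python sketch of how a parenthesized segment could be kept or dropped depending on the end of the base. It is not part of the lab's FST code: the function name `resolve_parenthesized` is hypothetical, the base and suffix strings are only illustrative, and it follows the table (any consonant-final base triggers `(t)`) rather than any particular rule file.

```python
import re

VOWELS = set("aeiu")


def resolve_parenthesized(base, suffix):
    """Keep or drop each (ng), (s), (t), (g) in `suffix` based on how `base` ends.

    Assumes `base` is a non-empty string; digraph consonants are not special-cased.
    """
    ends_in_vowel = base[-1] in VOWELS
    ends_in_two_vowels = ends_in_vowel and len(base) > 1 and base[-2] in VOWELS

    def choose(match):
        segment = match.group(1)
        if segment in ("ng", "s"):
            keep = ends_in_vowel          # used with vowel-final bases
        elif segment == "t":
            keep = not ends_in_vowel      # used with consonant-final bases
        else:                             # "g"
            keep = ends_in_two_vowels     # used with bases ending in two vowels
        return segment if keep else ""

    return re.sub(r"\((ng|s|t|g)\)", choose, suffix)


print(resolve_parenthesized("nuna", "(ng)a"))    # vowel-final base: ng is kept
print(resolve_parenthesized("angyar", "(ng)a"))  # consonant-final base: ng is dropped
```

The sketch only decides which parenthesized segments surface; the other symbols (`+`, `-`, `~`, `:`) are left untouched in the suffix string.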

```python
import hfst
import fstutils as fst
```

```python
# Named symbol classes that the rules below refer to.
defs = fst.Definitions({
    "Stop":'[ p | t | c | k | q ]',
    "Nasal":'[ m | n | "ng" | ḿ | ń | "ńg" ]',
    "Fricative":'[  v   |  l   |  s   |  g   |  r   | "vv" | "ll" | "ss" | "gg" | "rr" |"u͡g" | "u͡gg" | "u͡r" | "u͡rr" ]',
    'C':'[ Stop | Nasal | Fricative | w | y ]',
    'FullVowel':'[ a | i | u ]',
    'V':'[ e | FullVowel ]',
    'Alphabet':'[ C | V ]',
    'MorphPhonSymbols':'[ "~" | "+" | "-" | ":" | "@" | "`" | "(ng)" | "(s)" | "(g)" | "(t)" ]'
})
```
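
The `fstutils` helper module is not shown in this lab, so the following is only a guess at what a definitions table like the one above needs to do: expand each named class into its bracketed expression before the rule string is compiled. The function name `expand_definitions` is hypothetical, and the sketch assumes the definitions are acyclic and that quoted symbols must never be expanded.

```python
import re

# A token is either a double-quoted symbol or a bare identifier.
TOKEN = re.compile(r'"[^"]*"|[A-Za-z_]\w*')


def expand_definitions(definitions, expression):
    """Replace every bare defined name in `expression` with its definition.

    Quoted symbols such as "ng" are left alone. Nested definitions are
    expanded recursively, so the table must not contain cycles.
    """
    def substitute(match):
        text = match.group(0)
        if text.startswith('"'):
            return text
        if text in definitions:
            return expand_definitions(definitions, definitions[text])
        return text

    return TOKEN.sub(substitute, expression)


toy_defs = {"FullVowel": "[ a | i | u ]", "V": "[ e | FullVowel ]"}
print(expand_definitions(toy_defs, 'e -> 0 || V _ "V"'))
```

In this sketch, the bare `V` is expanded through `FullVowel`, while the quoted `"V"` stays a literal symbol.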

```python
# Resolve parenthesized segments: keep them in the right context, delete the rest.
allomorphy = hfst.regex(defs.replace(
    '"(ng)" -> "ng", "(s)" -> s || V MorphPhonSymbols* _ '
    '.o. "(t)" -> t || [ g | r ] MorphPhonSymbols* _ '
    '.o. "(g)" -> g || V V MorphPhonSymbols* _ '
    '.o. [ "(ng)" | "(s)" | "(g)" | "(t)" ] -> 0'
))
# Each symbol rule applies its change, then deletes the symbol itself.
dropConsonant = hfst.regex(defs.replace('C -> 0 || _ MorphPhonSymbols* "-" .o. "-" -> 0'))
keepConsonant = hfst.regex(defs.replace('"+" -> 0'))
eDeletion = hfst.regex(defs.replace('e -> 0 || _ MorphPhonSymbols* "~" .o. "~" -> 0'))
velarDropping = hfst.regex(defs.replace('[ g | r | "ng" ] -> 0 || C V ":" _ V C .o. ":" -> 0'))
# Adjust base-final segments before an enclitic boundary ("=") or at the end of the word.
baseFinalEndings = hfst.regex(defs.replace('r -> q, g -> k, e -> a || _ [ "=" | .#. ]'))
# Insert e to break up a three-consonant cluster.
tripleConsonant = hfst.regex(defs.replace('[..] -> e || C C _ C'))
engi = hfst.regex(defs.replace('e ng i -> a i'))
# Write the enclitic boundary as a hyphen in the surface form.
cleanup = hfst.regex(defs.replace('"=" -> "-"'))
```

```python
grammar = hfst.compile_lexc_file('esu_toy.lexc')
# compose() modifies `grammar` in place; the rules apply in the order listed.
grammar.compose(allomorphy)
grammar.compose(dropConsonant)
grammar.compose(keepConsonant)
grammar.compose(eDeletion)
grammar.compose(velarDropping)
grammar.compose(baseFinalEndings)
grammar.compose(tripleConsonant)
grammar.compose(engi)
grammar.compose(cleanup)
```
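
Composing the rules one after another means each rule sees the output of the previous one. The following plain-Python sketch imitates a small part of that cascade with ordinary regular expressions so the intermediate forms can be inspected. It is an approximation under stated assumptions, not the HFST behavior: only four of the nine rules are modeled, the consonant class is simplified to single letters plus `ng`, and the input strings are illustrative rather than taken from `esu_toy.lexc`.

```python
import re

SYMBOLS = r"[~+:-]*"                      # simplified stand-in for MorphPhonSymbols*
CONSONANT = r"(?:ng|[ptckqmnvlsgrwy])"    # simplified stand-in for C
FINAL = {"r": "q", "g": "k", "e": "a"}


def drop_consonant(form):
    # Delete a consonant that precedes "-", then delete "-" itself.
    form = re.sub(CONSONANT + "(?=" + SYMBOLS + "-)", "", form)
    return form.replace("-", "")


def keep_consonant(form):
    # "+" changes nothing in the base; only the symbol is removed.
    return form.replace("+", "")


def e_deletion(form):
    # Delete an e that precedes "~", then delete "~" itself.
    form = re.sub("e(?=" + SYMBOLS + "~)", "", form)
    return form.replace("~", "")


def base_final_endings(form):
    # r -> q, g -> k, e -> a before "=" or at the end of the string.
    return re.sub(r"[rge](?==|$)", lambda m: FINAL[m.group(0)], form)


RULES = [drop_consonant, keep_consonant, e_deletion, base_final_endings]


def cascade(form):
    """Apply the rules in order and return every intermediate form."""
    trace = [("input", form)]
    for rule in RULES:
        form = rule(form)
        trace.append((rule.__name__, form))
    return trace


for step, form in cascade("angyar-li"):
    print(f"{step:20} {form}")
```

Because the trace keeps every intermediate string, reordering `RULES` and rerunning `cascade` is a quick way to see which rules depend on a symbol that an earlier rule has not yet removed.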

```python
fst.lookup(grammar, 'boat-big-to.make-place.to.V-to.lack-PST-again-IND.3sg=reported')
```

```
['angyarpaliviitellrunqigtuq-gguq']
```
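
The gloss string passed to the lookup can be read as a base, a chain of suffixes, a final ending, and an enclitic after `=`. The sketch below splits such a string into those parts. The function name `split_gloss` is hypothetical, and it assumes that the first hyphen-separated tag is the base and the last one is the inflectional ending, which matches this example but is not stated by the lab itself.

```python
def split_gloss(gloss):
    """Split a gloss like 'base-suffix-...-ending=enclitic' into its parts."""
    word, *enclitics = gloss.split("=")
    base, *suffixes = word.split("-")
    # Assumption: the last hyphen-separated tag is the inflectional ending.
    ending = suffixes.pop() if suffixes else None
    return {
        "base": base,
        "suffixes": suffixes,
        "ending": ending,
        "enclitics": enclitics,
    }


parts = split_gloss("boat-big-to.make-place.to.V-to.lack-PST-again-IND.3sg=reported")
for name, value in parts.items():
    print(f"{name:10} {value}")
```

The six tags between the base and the ending in this example show the suffix recursion that the toy grammar is meant to model.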