# Yup'ik Toy FST Morphological Analyzer
Central Alaskan Yup'ik (ISO 639-3: esu) is a polysynthetic language of the Eskimo-Aleut family, spoken in southwestern Alaska.

This lab builds a toy FST morphological analyzer for Yup'ik. The toy grammar focuses on the recursive nature of Yup'ik noun and verb suffixes.

## Yup'ik Words
![Word](Yup'ikWord.png)

Yup'ik nouns and verbs consist of one base, zero or more derivational 

## Yup'ik FST Diagram
![FST](Yup'ikFSDiagram.png)

## Yup'ik Morphophonological Processes

| Symbol | Description |
|:---:|:---|
| \+ | The suffix keeps the final consonant of the base; if the base does not end in a consonant, the suffix simply attaches to the base without changing it. |
| \- | The suffix drops the final consonant of the base. |
| ~ | The suffix drops the final e of the base. |
| : | The suffix drops a voiced velar or uvular continuant (a fricative or nasal: g, r, or ng) when it occurs between single vowels of which at least the first is full. |
| (ng), (s) | Used with bases ending in a vowel. |
| (t) | Used with bases ending in a consonant. |
| (g) | Used with bases ending in two vowels. |

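As a rough illustration of the first three symbols, the sketch below mimics them with plain Python regular expressions rather than HFST. The helper names are hypothetical, the consonant inventory is simplified, and the intermediate strings (`angyar-li`, `tuntu-li`, `nere~uq`, `qavar+tuq`) are assumed for illustration; they are not taken from the lab's lexc file.

```python
import re

# Simplified consonant inventory; "ng" comes first so the digraph is matched whole.
# (Doubled fricatives and the marked nasals are left out of this sketch.)
CONSONANT = r"(?:ng|[ptckqmnvlsgrwy])"


def drop_final_consonant(form):
    """'-': delete a consonant standing right before the boundary, then the boundary."""
    form = re.sub(rf"{CONSONANT}(?=-)", "", form)
    return form.replace("-", "")


def keep_final_consonant(form):
    """'+': the boundary disappears and the base is left untouched."""
    return form.replace("+", "")


def drop_final_e(form):
    """'~': delete an e standing right before the boundary, then the boundary."""
    form = re.sub(r"e(?=~)", "", form)
    return form.replace("~", "")


print(drop_final_consonant("angyar-li"))  # angyali
print(drop_final_consonant("tuntu-li"))   # tuntuli (no consonant to drop)
print(drop_final_e("nere~uq"))            # neruq
print(keep_final_consonant("qavar+tuq"))  # qavartuq
```
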
In [1]:
import hfst
import fstutils as fst

In [2]:
defs = fst.Definitions({
    "Stop":'[ p | t | c | k | q ]',
    "Nasal":'[ m | n | "ng" | ḿ | ń | "ńg" ]',
    "Fricative":'[  v   |  l   |  s   |  g   |  r   | "vv" | "ll" | "ss" | "gg" | "rr" |"u͡g" | "u͡gg" | "u͡r" | "u͡rr" ]',
    'C':'[ Stop | Nasal | Fricative | w | y ]',
    'FullVowel':'[ a | i | u ]',
    'V':'[ e | FullVowel ]',
    'Alphabet':'[ C | V ]',
    'MorphPhonSymbols':'[ "~" | "+" | "-" | ":" | "@" | "`" | "(ng)" | "(s)" | "(g)" | "(t)" ]'
})

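The `fstutils` module is a local helper whose source is not shown here. Judging only from how `defs.replace(...)` is used, it appears to expand named classes such as `V` or `C` inside a regex string before the string is handed to `hfst.regex`. The following is a minimal stand-in written under that assumption; the class body is hypothetical and the real helper may behave differently.

```python
import re


class Definitions:
    """Hypothetical stand-in: expands named classes inside a regex string."""

    def __init__(self, definitions):
        self.definitions = {}
        for name, expression in definitions.items():
            # Expand names defined earlier, so later classes can build on them.
            self.definitions[name] = self.replace(expression)

    def replace(self, regex):
        if not self.definitions:
            return regex
        # Longest names first, matched as whole words, in a single pass so
        # that text inserted by one expansion is never expanded again.
        names = sorted(self.definitions, key=len, reverse=True)
        pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
        return pattern.sub(lambda match: self.definitions[match.group(1)], regex)


demo = Definitions({
    "FullVowel": "[ a | i | u ]",
    "V": "[ e | FullVowel ]",
})
print(demo.replace('e -> 0 || _ "~" V'))
```
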
In [3]:
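# Pick suffix-initial segments: (ng)/(s) after a vowel, (t) after g or r, (g) after two vowels; delete the rest.
allomorphy = hfst.regex(defs.replace('"(ng)" -> "ng", "(s)" -> s || V MorphPhonSymbols* _ .o. "(t)" -> t || [ g | r ] MorphPhonSymbols* _ .o. "(g)" -> g || V V MorphPhonSymbols* _ .o. [ "(ng)" | "(s)" | "(g)" | "(t)" ] -> 0'))
# "-": delete the base-final consonant, then the symbol itself.
dropConsonant = hfst.regex(defs.replace('C -> 0 || _ MorphPhonSymbols* "-" .o. "-" -> 0'))
# "+": the base is left unchanged, so only the symbol is removed.
keepConsonant = hfst.regex(defs.replace('"+" -> 0'))
# "~": delete the base-final e, then the symbol itself.
eDeletion = hfst.regex(defs.replace('e -> 0 || _ MorphPhonSymbols* "~" .o. "~" -> 0'))
# ":": delete g, r, or ng between single vowels, then the symbol itself.
velarDropping = hfst.regex(defs.replace('[ g | r | "ng" ] -> 0 || C V _ ":" V C .o. ":" -> 0'))
# r, g, e become q, k, a before "+" or at the end of the word.
baseFinalEndings = hfst.regex(defs.replace('r -> q, g -> k, e -> a || _ [ "+" | .#. ]'))
# Insert e to break up a cluster of three consonants.
tripleConsonant = hfst.regex(defs.replace('[..] -> e || C C _ C'))
# The sequence e ng i surfaces as a i.
engi = hfst.regex(defs.replace('e ng i -> a i'))
# The enclitic boundary "=" is written as a hyphen on the surface.
cleanup = hfst.regex(defs.replace('"=" -> "-"'))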

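The `allomorphy` rule is the densest of the set, so here is a plain-Python approximation of its four composed steps using `re` instead of HFST. The function name is hypothetical, and the ending string `+(g)(t)uq` is an assumed underlying form chosen for illustration; the lexc file may encode the ending differently.

```python
import re

# Boundary symbols and parenthesized segments that may stand between base and suffix.
SYMBOL = r"(?:\(ng\)|\(s\)|\(g\)|\(t\)|[~+\-:@`])"


def select_allomorphs(form):
    # Step 1: (ng) and (s) surface after a vowel.
    form = re.sub(rf"(?<=[aeiu])({SYMBOL}*)\((ng|s)\)", r"\1\2", form)
    # Step 2: (t) surfaces after g or r.
    form = re.sub(rf"(?<=[gr])({SYMBOL}*)\(t\)", r"\1t", form)
    # Step 3: (g) surfaces after two vowels.
    form = re.sub(rf"(?<=[aeiu][aeiu])({SYMBOL}*)\(g\)", r"\1g", form)
    # Step 4: any parenthesized segment still left is deleted.
    return re.sub(r"\((?:ng|s|g|t)\)", "", form)


for underlying in ("qavar+(g)(t)uq", "aqui+(g)(t)uq", "cali+(g)(t)uq"):
    print(underlying, "->", select_allomorphs(underlying))
```
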
In [4]:
grammar = hfst.compile_lexc_file('esu_toy_full.lexc')
grammar.compose(allomorphy)
grammar.compose(dropConsonant)
grammar.compose(keepConsonant)
grammar.compose(eDeletion)
grammar.compose(velarDropping)
grammar.compose(baseFinalEndings)
grammar.compose(tripleConsonant)
grammar.compose(engi)
grammar.compose(cleanup)

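Composing the rules one after another makes them behave like an ordered cascade: each rule sees the output of the one before it. The sketch below is only a string-level analogy (plain functions, not transducers) with hypothetical helper names. It suggests why it can matter that `keepConsonant` is composed before `baseFinalEndings`: once `+` is gone, the r-to-q change only applies at the end of the word.

```python
import re
from functools import reduce


def delete_plus(form):
    """Analogue of keepConsonant: remove the '+' boundary."""
    return form.replace("+", "")


def final_r_to_q(form):
    """Analogue of one part of baseFinalEndings: r becomes q before '+' or word end."""
    return re.sub(r"r(?=\+|$)", "q", form)


def cascade(rules, form):
    """Apply the rules left to right, feeding each result to the next rule."""
    return reduce(lambda current, rule: rule(current), rules, form)


print(cascade([delete_plus, final_r_to_q], "qavar+tuq"))  # qavartuq
print(cascade([final_r_to_q, delete_plus], "qavar+tuq"))  # qavaqtuq (order swapped)
print(cascade([delete_plus, final_r_to_q], "angyar"))     # angyaq
```
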
In [5]:
fst.lookup(grammar, 'boat-big-to.make-place.to-to.lack-PST-again-IND.3sg=reported')

['angyarpaliviitellrunqigtuq-gguq']

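The analysis strings passed to `fst.lookup` follow a visible pattern: morphemes are joined with `-`, a noun ending is attached with `\`, and an enclitic with `=`. The helper below splits an analysis along those separators. It is a hypothetical convenience function, and it assumes that the last segment before any `=` is always the inflectional ending.

```python
import re


def segment_analysis(analysis):
    """Split an analysis string into base, postbases, ending, and enclitic."""
    body, _, enclitic = analysis.partition("=")
    morphemes = re.split(r"[-\\]", body)
    return {
        "base": morphemes[0],
        "postbases": morphemes[1:-1],
        "ending": morphemes[-1],
        "enclitic": enclitic or None,
    }


print(segment_analysis("boat-big-to.make-place.to-to.lack-PST-again-IND.3sg=reported"))
print(segment_analysis(r"caribou\ABS.sg"))
```
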
In [6]:
test_target1 = fst.read_test_file("testsuite1.txt")

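Neither `read_test_file` nor `test_fst` is shown in this notebook. The sketch below is a hypothetical stand-in, assuming the test file alternates an analysis line with its expected surface form (the same layout the test run prints). Unlike the run shown in this notebook, which stops at the first missing path, this version collects every failure and returns them as a list.

```python
def read_pairs(text):
    """Pair up alternating analysis / expected-surface lines, skipping blank lines."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 2:
        raise ValueError("expected an even number of non-blank lines")
    return list(zip(lines[0::2], lines[1::2]))


def run_tests(lookup, pairs):
    """Return (analysis, expected, actual results) for every pair that fails."""
    failures = []
    for analysis, expected in pairs:
        results = lookup(analysis)
        if expected not in results:
            failures.append((analysis, expected, results))
    return failures


# A dictionary stands in for the compiled grammar in this sketch.
fake_grammar = {r"caribou\ABS.sg": ["tuntu"]}
sample = "caribou\\ABS.sg\ntuntu\nwork-IND.3sg\ncaliuq\n"
print(run_tests(lambda analysis: fake_grammar.get(analysis, []), read_pairs(sample)))
```
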
In [7]:
fst.test_fst(grammar, test_target1)

caribou\ABS.sg
tuntu
boat\ABS.sg
angyaq
river\ABS.sg
kuik
dog\ABS.sg
qimugta
work-IND.3sg
caliuq
eat-IND.3sg
neruq
play-IND.3sg
aquiguq
sleep-IND.3sg
qavartuq
be.good-IND.3sg
assirtuq
be.hungry-IND.3sg
kaigtuq
caribou-big\ABS.sg
tunturpak
boat-big\ABS.sg
angyarpak
river-big\ABS.sg
kuirpak
dog-big\ABS.sg
qimugterpak
work-PST-IND.3sg
calillruuq
eat-PST-IND.3sg
nerellruuq
sleep-PST-IND.3sg
qavallruuq
be.hungry-PST-IND.3sg
kaillruuq
caribou-to.make-IND.3sg
tuntuliuq
boat-to.make-IND.3sg
angyaliuq
river-to.make-IND.3sg
kuiliuq
dog-to.make-IND.3sg
qimugteliuq
work-place.to-IND.3sg
calivik


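FstPathNotFound: The string work-place.to-IND.3sg was not found in the transducer

The expected form calivik is a noun: the postbase glossed `place.to` turns a verb base into a noun, so the analysis needs the nominal ending `\ABS.sg` rather than the verbal `-IND.3sg`. Looking up the corrected analysis succeeds: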

In [10]:
fst.lookup(grammar, r'work-place.to\ABS.sg')

['calivik']