# Yup'ik Toy FST Morphological Analyzer

Central Alaskan Yup'ik (ISO 639-3: esu) is a polysynthetic language of the Eskimo-Aleut family, spoken in southwestern Alaska.

In this lab you will build a toy FST morphological analyzer for Yup'ik. The toy grammar covers the recursive nature of Yup'ik noun and verb suffixation and a handful of morphophonological processes.

## Yup'ik Words

![Word](Yup'ikWord.png)

Yup'ik nouns and verbs consist of one base, zero or more derivational suffixes, one inflectional suffix, and zero or more enclitics.
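
The word template above can be sketched in plain Python, independent of hfst. The function name `split_gloss` is hypothetical, and the sketch assumes the gloss notation used in this lab's lookup strings: `-` separates morphemes, `=` introduces an enclitic, and every morpheme between the base and the final suffix is derivational.

```python
def split_gloss(gloss):
    """Split a gloss string into base, derivational suffixes,
    inflectional suffix, and enclitics (hypothetical helper)."""
    # Everything after the first "=" is a sequence of enclitics.
    word, *enclitics = gloss.split("=")
    morphemes = word.split("-")
    if len(morphemes) < 2:
        raise ValueError("expected at least a base and an inflectional suffix")
    return {
        "base": morphemes[0],
        "derivational": morphemes[1:-1],  # zero or more
        "inflection": morphemes[-1],      # exactly one
        "enclitics": enclitics,           # zero or more
    }


parts = split_gloss("boat-big-to.make-IND.3sg=reported")
print(parts["base"], parts["derivational"], parts["inflection"], parts["enclitics"])
```

The split is purely positional; it does not check that the morphemes exist in the lexicon.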

Bases are recursive and allow adding multiple derivational suffixes.

### Base = Base + Derivational Suffix

There are 4 types of derivational suffixes:
* [`N` → `N`] noun elaborating
* [`V` → `V`] verb elaborating
* [`N` → `V`] verbalizing
* [`V` → `N`] nominalizing

Suffixes attach to their corresponding base type (noun or verb) and the resulting base+suffix either stays the same type (`N` → `N` or `V` → `V`) or changes type (`N` → `V` or `V` → `N`).
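
The type bookkeeping described above can be illustrated with a small Python sketch. It is not part of the FST itself; the function `final_type` and the table `SUFFIX_TYPES` are hypothetical names, and the suffix types are taken from the Phase 2 word list in this lab.

```python
# (source type, resulting type) for a few derivational suffixes,
# keyed by gloss. Types follow the Phase 2 table in this lab.
SUFFIX_TYPES = {
    "fake": ("N", "N"),     # noun elaborating
    "A'.say": ("V", "V"),   # verb elaborating
    "to.lack": ("N", "V"),  # verbalizing
    "one.who": ("V", "N"),  # nominalizing
}


def final_type(base_type, suffixes):
    """Return the type (N or V) of a base after attaching each suffix in order."""
    current = base_type
    for gloss in suffixes:
        source, target = SUFFIX_TYPES[gloss]
        if source != current:
            raise ValueError(f"-{gloss} needs a {source} base, got {current}")
        current = target
    return current


print(final_type("N", ["fake", "to.lack"]))  # N -> N -> V
```

Because the output of one suffix is the input of the next, a chain can switch between noun and verb any number of times before the inflectional suffix is added.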

## Yup'ik FST Diagram
![FST](Yup'ikFSDiagram.png)

## Yup'ik Morphophonological Processes

Jacobson describes Yup'ik morphophonology with a set of symbols. Each symbol is associated with a morphophonological process, as listed in the table below.

| Symbol | Description |
|:---|:---|
| `+` | indicates that the suffix keeps final consonants of <br>bases (and if the base does not end in a consonant, <br>the suffix simply attaches to the base without <br>changing it) |
| `-` | indicates that the suffix drops final consonants from <br>bases |
| `~` | indicates that the suffix drops final e from bases |
| `:` | indicates that the suffix drops voiced velar/uvular <br>continuants (g, r, or ng) if they occur between single <br>vowels of which at least the first is full |
| `(ng)` `(s)` | are used with bases ending in a vowel |
| `(t)` | is used with bases ending in a consonant |
| `(g)` | is used with bases ending in two vowels |


### Example

```
boat-big-to.make-IND.3sg
      <==>
angyar-rpag-li~+(g)(t)uq
      <==>
angyarpaliuq
```
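
The derivation above can be traced step by step with a plain-Python sketch. This is an approximation for checking intuitions, not the hfst rules the lab asks for. The function name `attach` is hypothetical, only the Phase 1 symbols (`+`, `-`, `~`, `(g)`, `(t)`) are handled, and the sketch assumes that the `(g)`/`(t)` conditions are checked against the base before anything is dropped.

```python
import re

VOWELS = set("aeiu")

# Leading symbols, then optional (g)/(t) segments, then the suffix body.
SUFFIX = re.compile(r"([~+\-]*)((?:\((?:g|t)\))*)(\w+)")


def attach(base, suffix):
    """Attach one suffix in Jacobson-style notation to a base (sketch)."""
    match = SUFFIX.fullmatch(suffix)
    if match is None:
        raise ValueError(f"unsupported suffix notation: {suffix}")
    symbols, conditional, body = match.groups()

    # Assumption: conditions are evaluated on the base as written.
    ends_in_consonant = base[-1] not in VOWELS
    ends_in_two_vowels = (
        len(base) >= 2 and base[-1] in VOWELS and base[-2] in VOWELS
    )
    linker = ""
    if "(g)" in conditional and ends_in_two_vowels:
        linker = "g"
    elif "(t)" in conditional and ends_in_consonant:
        linker = "t"

    if "~" in symbols and base.endswith("e"):
        base = base[:-1]  # drop final e
    if "-" in symbols and base[-1] not in VOWELS:
        base = base[:-1]  # drop final consonant (single letters only)
    return base + linker + body


word = "angyar"
for suffix in ["-rpag", "-li", "~+(g)(t)uq"]:
    word = attach(word, suffix)
print(word)  # angyarpaliuq
```

The sketch treats every consonant as a single letter, so a base ending in a digraph such as `ng` would lose only its last character under `-`.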

In [None]:
import hfst
import fstutils as fst

# Phase 1

- [ ] <b>Task 1</b> Word-final replacements: `r` to `q`, `g` to `k`, and `e` to `a`
- [ ] <b>Task 2</b> Implement morphophonological processes [`+`, `-`, `~`, `(g)`, `(t)`]
- [ ] <b>Task 3</b> Replace `=` with `-` for enclitics
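
As a reference for the expected behaviour of Tasks 1 and 3, here is a plain-Python approximation using `re`. It is not the hfst replace rule itself. The function names are hypothetical, and the sketch assumes that "word-final" means the very end of the string.

```python
import re

# Word-final replacements from Task 1.
FINAL_REPLACEMENTS = {"r": "q", "g": "k", "e": "a"}


def word_final(word):
    """Apply the Task 1 replacements at the end of the string only."""
    # Note: a final digraph such as "ng" is not treated specially here.
    return re.sub(r"[rge]$", lambda m: FINAL_REPLACEMENTS[m.group()], word)


def enclitic_boundary(word):
    """Rewrite the enclitic boundary "=" as "-" (Task 3)."""
    return word.replace("=", "-")


print(word_final("angyar"))                     # angyaq
print(enclitic_boundary("angyarpaliuq=gguq"))   # angyarpaliuq-gguq
```

How the two steps interact when a word carries an enclitic depends on the order in which the rules are composed, which this sketch leaves open.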

In [None]:
# Convenient Yup'ik character classes for grammar rules
# Add new definitions if you'd like
defs = fst.Definitions({
    "Stop":'[ p | t | c | k | q ]',
    "Nasal":'[ m | n | "ng" | ḿ | ń | "ńg" ]',
    "Fricative":'[  v   |  l   |  s   |  g   |  r   | "vv" | "ll" | "ss" | "gg" | "rr" |"u͡g" | "u͡gg" | "u͡r" | "u͡rr" ]',
    'C':'[ Stop | Nasal | Fricative | w | y ]',
    'FullVowel':'[ a | i | u ]',
    'V':'[ e | FullVowel ]',
    'Alphabet':'[ C | V ]',
    'MorphPhonSymbols':'[ "~" | "+" | "-" | ":" | "@" | "`" | "(ng)" | "(s)" | "(g)" | "(t)" ]'
})
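
The definitions above include multi-character symbols such as `"ng"` and `"ll"`, so one grapheme is not always one character. The following plain-Python sketch shows a longest-match split into graphemes. It is independent of hfst, the name `graphemes` is hypothetical, and it covers only a subset of the digraphs listed above.

```python
import re

# Try the digraphs first, then fall back to any single character.
GRAPHEME = re.compile(r"ng|vv|ll|ss|gg|rr|.")


def graphemes(word):
    """Split a word into graphemes, preferring digraphs over single letters."""
    return GRAPHEME.findall(word)


print(graphemes("angyarpaliuq"))
# ['a', 'ng', 'y', 'a', 'r', 'p', 'a', 'l', 'i', 'u', 'q']
```

A split like this makes it easier to reason about which rules should see `ng` as one consonant rather than as `n` followed by `g`.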

In [None]:
### REGEX REPLACE RULES

# TODO: Define new regex replace rules here!



In [None]:
### GRAMMAR COMPOSITION
grammar = hfst.compile_lexc_file('esu_toy.lexc')

# TODO: Compose new regex replace rules here!



## Testing

In [None]:
fst.lookup(grammar, 'boat-big-to.make-place.to-to.lack-PST-again-IND.3sg=reported')

In [None]:
# Run testsuite1
test_target1 = fst.read_test_file("testsuite1.txt")

In [None]:
fst.test_fst(grammar, test_target1)

# Phase 2

- [ ] **Task 1** Add new words to `esu_toy.lexc`

|Category|Words|
|---|---|
|Noun Base | village:nuna<br>ice.cream:akutar<br>person:yug<br>sun:akerte |
|Verb Base | be.smart:elisnga<br>sit:aqume<br>walk:piyua<br>study:elitnaur<br>go:ayag |
|Noun Deriv | -fake:~+(ng)uar [N → N]<br>-to.lack:~%:(ng)ite [N → V] |
|Verb Deriv | -A'.say:~+ni [V → V]<br>-one.who:+(s)te [V → N] |
|Noun Infl | -ABS.1sg.sg:-ka |
|Verb Infl | -IND.1sg.3sg:~+(g)aqa |
|Enclitic | =reported:=gguq |

- [ ] **Task 2** Implement morphophonological processes [`:`, `(ng)`, `(s)`]
- [ ] **Task 3** Break up triple consonant clusters by inserting `e` after the middle consonant (i.e. `CCC` → `CCeC`)
- [ ] **Task 4** Add the rule `engi` → `ai`
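
Tasks 3 and 4 can be prototyped in plain Python before writing the FST rules. The sketch below is an approximation, not the hfst implementation. The function names are hypothetical, the consonant set is a subset of the definitions in this lab, and the input strings in the usage lines are made up to exercise the rules rather than attested Yup'ik forms.

```python
import re

CONSONANTS = {
    "p", "t", "c", "k", "q", "m", "n", "ng", "v", "l", "s", "g", "r",
    "vv", "ll", "ss", "gg", "rr", "w", "y",
}
# Digraphs are matched before single characters.
GRAPHEME = re.compile(r"ng|vv|ll|ss|gg|rr|.")


def break_clusters(word):
    """Insert e before the third consonant of a cluster (CCC -> CCeC)."""
    out = []
    run = 0  # length of the current consonant run
    for g in GRAPHEME.findall(word):
        if g in CONSONANTS:
            run += 1
            if run == 3:
                out.append("e")
                run = 1  # the third consonant now follows a vowel
        else:
            run = 0
        out.append(g)
    return "".join(out)


def engi_to_ai(word):
    """Rewrite every occurrence of engi as ai."""
    return word.replace("engi", "ai")


print(break_clusters("atrpa"))  # atrepa (made-up input)
print(engi_to_ai("atengi"))     # atai (made-up input)
```

Counting graphemes rather than characters keeps a sequence like `ngy` from being mistaken for a three-consonant cluster.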

## Testing

In [None]:
# Run testsuite2
test_target2 = fst.read_test_file("testsuite2.txt")

In [None]:
fst.test_fst(grammar, test_target2)

In [None]:
# Quick lookup function to check grammar output
fst.lookup(grammar, 'boat-big-to.make-place.to-to.make-PST-again-IND.3sg=QST')