This notebook takes a list of Crosby Schaeffer verbs (ultimately to be extracted from the CSV generated by `GenerateVocabList`) and generates all the forms of each verb using the following steps:
1. Use Stanza to get the lemma of each word and store the lemmas in a file.
2. Using James Tauber's [greek-inflexion GitHub repo](https://github.com/jtauber/greek-inflexion), follow the steps in [`README-morphgnt.md`](https://github.com/jtauber/greek-inflexion/blob/master/README-morphgnt.md#morphgnt) and use the file created in the previous step to generate a list of the forms of each verb.

Step 0: Imports

In [36]:
# Install PyYAML before importing it
!pip install pyyaml

import stanza
import yaml

# Download the default Ancient Greek (grc) models
stanza.download('grc')

2022-07-27 10:25:42 INFO: Downloading default packages for language: grc (Ancient_Greek)...
2022-07-27 10:25:42 INFO: File exists: /Users/mallard/stanza_resources/grc/default.zip
2022-07-27 10:25:43 INFO: Finished downloading models and saved to /Users/mallard/stanza_resources.




Step 1: Define the list of verbs

In [3]:
verbs = ['παύουσ', 'πέμπει', 'γράφω', 'λύω']
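
The hard-coded list above stands in for the verbs that will eventually come from the CSV generated by `GenerateVocabList`. A minimal sketch of that extraction follows. The column names `word` and `part_of_speech`, the value `verb`, and the helper `read_verbs` are all hypothetical, since the CSV layout is not shown in this notebook.

```python
import csv
import io


def read_verbs(csv_file, word_column="word", pos_column="part_of_speech"):
    """Return the words whose part-of-speech cell is 'verb'.

    The column names are assumptions; adjust them to the real CSV header.
    """
    reader = csv.DictReader(csv_file)
    return [
        row[word_column]
        for row in reader
        if row[pos_column].strip().lower() == "verb"
    ]


# Small in-memory sample standing in for the real CSV file
sample = io.StringIO(
    "word,part_of_speech\n"
    "γράφω,verb\n"
    "λόγος,noun\n"
    "λύω,verb\n"
)
verbs_from_csv = read_verbs(sample)
print(verbs_from_csv)
```

With a real file, the same function could be called on `open(path, newline='', encoding='utf-8')` instead of the in-memory sample.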

Step 2: Use Stanza to get the lemma of each word

In [33]:
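# Build the default Ancient Greek pipeline (tokenize, pos, lemma, depparse)
nlp = stanza.Pipeline('grc')
# Analyse all the verbs as one space-separated string
doc = nlp(" ".join(verbs))
# Collect one lemma per word, in input order
lemmas = [word.lemma for sent in doc.sentences for word in sent.words]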
print(lemmas)

2022-07-27 10:17:51 INFO: Loading these models for language: grc (Ancient_Greek):
| Processor | Package |
-----------------------
| tokenize  | proiel  |
| pos       | proiel  |
| lemma     | proiel  |
| depparse  | proiel  |

2022-07-27 10:17:51 INFO: Use device: cpu
2022-07-27 10:17:51 INFO: Loading: tokenize
2022-07-27 10:17:51 INFO: Loading: pos
2022-07-27 10:17:52 INFO: Loading: lemma
2022-07-27 10:17:52 INFO: Loading: depparse
2022-07-27 10:17:52 INFO: Done loading processors!


['παύον', 'πέμπω', 'πέμπω', 'γράφω', 'λύω']


In [49]:
# Morphological feature string of each word (None when no features are assigned)
forms = [word.feats for sent in doc.sentences for word in sent.words]
print(forms)

['Case=Gen|Gender=Masc|Number=Sing', 'Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act', 'Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act', 'Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act', None]
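
In the output above, the first word is tagged with nominal features (`Case=Gen`) and the last has no features at all, so their lemmas may not be reliable as verb lemmas. A minimal sketch of flagging such cases is shown here. It assumes the lemma and feature lists line up word for word, and the helper `is_finite_verb` is hypothetical. The data is copied from the two outputs above so the block runs without Stanza.

```python
def is_finite_verb(feats):
    """True when a feature string contains VerbForm=Fin."""
    if not feats:
        return False
    return "VerbForm=Fin" in feats.split("|")


# (lemma, feats) pairs copied from the outputs of the two cells above
analyses = [
    ("παύον", "Case=Gen|Gender=Masc|Number=Sing"),
    ("πέμπω", "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act"),
    ("πέμπω", "Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act"),
    ("γράφω", "Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act"),
    ("λύω", None),
]

# Lemmas whose analysis is not a finite verb and so deserve a manual check
suspect = [lemma for lemma, feats in analyses if not is_finite_verb(feats)]
print(suspect)
```

Flagged lemmas are only candidates for review: a lemma such as λύω can still be correct even when the tagger returns no features.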


Step 3: Store the lemmas in a file (in the same format as the [`*_lexicon.yaml` files](https://github.com/jtauber/greek-inflexion/blob/master/STEM_DATA/dik_lexicon.yaml))

In [44]:
file_name = 'lib/cs_lexicon.yaml'

dict_file = []
for lemma in lemmas:
    # One single-key entry per lemma, with an empty placeholder for its stems
    dict_file.append({lemma: {'stems': []}})

print(dict_file)

# Write to the YAML file; allow_unicode keeps the Greek text unescaped
with open(file_name, 'w', encoding="utf-8") as file:
    yaml.dump(dict_file, stream=file, allow_unicode=True)

[{'παύον': {'stems': []}}, {'πέμπω': {'stems': []}}, {'πέμπω': {'stems': []}}, {'γράφω': {'stems': []}}, {'λύω': {'stems': []}}]
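
The cell above produces a list of one-key entries and repeats πέμπω. If the target `*_lexicon.yaml` format is a single mapping keyed by lemma (an assumption based on the linked file, not verified here), a deduplicated mapping may be a closer fit. The sketch below uses a hypothetical helper, `build_lexicon`, and keeps the notebook's empty `stems` placeholder.

```python
def build_lexicon(lemmas):
    """Return one mapping keyed by lemma, keeping first-seen order.

    Repeated lemmas collapse into a single entry.
    """
    lexicon = {}
    for lemma in lemmas:
        # setdefault leaves an existing entry untouched
        lexicon.setdefault(lemma, {"stems": []})
    return lexicon


# Lemmas copied from the Step 2 output above
lexicon = build_lexicon(['παύον', 'πέμπω', 'πέμπω', 'γράφω', 'λύω'])
print(list(lexicon))
```

Passing `lexicon` to `yaml.dump(..., allow_unicode=True)` in place of `dict_file` would then write a top-level mapping rather than a sequence. Note that `yaml.dump` sorts keys unless `sort_keys=False` is given.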


Step 4: Save the file to `greek-inflexion/STEM_DATA`
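
This step can be done by hand. As a rough sketch, it could also be scripted as below. The helper `install_lexicon` is hypothetical, and the location of the local greek-inflexion checkout is an assumption to be replaced with the real path.

```python
import shutil
from pathlib import Path


def install_lexicon(lexicon_path, repo_dir):
    """Copy a lexicon file into <repo_dir>/STEM_DATA and return the new path."""
    dest_dir = Path(repo_dir) / "STEM_DATA"
    if not dest_dir.is_dir():
        # Fail early rather than silently creating a stray directory
        raise FileNotFoundError(f"STEM_DATA not found under {repo_dir}")
    # shutil.copy returns the path of the newly created file
    return Path(shutil.copy(lexicon_path, dest_dir))


# Example call, assuming the repo is cloned next to this notebook:
# install_lexicon("lib/cs_lexicon.yaml", "greek-inflexion")
```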

Step 5: Create a new `generate_*_lexicon.py` file