This notebook takes a list of Crosby and Schaeffer verbs (eventually to be extracted from the CSV generated by `GenerateVocabList`) and generates all the forms of each verb in the following steps:

#### Part 0
1. make a local copy of James Tauber's [greek-inflexion GitHub repo](https://github.com/jtauber/greek-inflexion)
2. copy this notebook into greek-inflexion
3. copy the file `Summer-2022/lib/greek-inflexion-files/crosby_schaeffer_generate.py` into `/greek-inflexion/`
4. copy the file `lib/greek-inflexion-files/cs_10_verbs.txt` into `/greek-inflexion/`

#### Part 1
1. use Stanza to get the lemmas of each word, store these in a file
2. follow the steps in [`README-morphgnt.md`](https://github.com/jtauber/greek-inflexion/blob/master/README-morphgnt.md#morphgnt) to generate the stems of each word in the file created above
3. save this yaml file as `crosby_schaeffer_lexicon.yaml` and move it into `greek-inflexion/STEM_DATA/`


#### Part 2
1. conjecture the stems for each word in the yaml file, copy-paste the results into the yaml file, and clean it up (note that I'm also saving a copy of this yaml file in `Summer-2022/lib/greek-inflexion-files/` for reference)
2. from the `greek-inflexion` directory, run `./crosby_schaeffer_generate.py`
3. use greek-inflexion to generate all the forms of each word in the yaml file
4. save these results into a csv file `all_forms.csv`
5. copy this csv file into `Summer-2022/lib/greek-inflexion-files/` 

## Part 1

Step 0: Imports

In [2]:
!pip install pyyaml

import csv

import stanza
import yaml

# download the Ancient Greek ('grc') models
stanza.download('grc')

2022-08-01 14:56:21 INFO: Downloading default packages for language: grc (Ancient_Greek)...
2022-08-01 14:56:21 INFO: File exists: /Users/mallard/stanza_resources/grc/default.zip
2022-08-01 14:56:22 INFO: Finished downloading models and saved to /Users/mallard/stanza_resources.




Step 1: define the list of verbs

In [18]:
# read in the list of verbs (first 10 chapters), one verb per line
with open("./cs_10_verbs.txt", "r", encoding="utf-8") as txt_file:
    verbs = txt_file.read().split("\n")

print(verbs)

['ἔχει', 'παύει', 'πέμπει ', 'ἄγω', 'γράφω', 'ἐθέλω', 'λύω', 'παύω', 'φυλάττω', 'ἀθροίζω', 'ἁρπάζω', 'ἄρχω', 'μέλλω', 'διώκω', 'νομίζω', 'πείθω', 'λείπω']
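
The list printed above contains an entry with a trailing space (`'πέμπει '`), and a file that ends with a newline would add an empty string as well. A minimal sketch of a more forgiving reader, assuming one verb per line; `clean_verbs` is a hypothetical helper, not part of this notebook or of Stanza:

```python
def clean_verbs(text):
    """Split raw file content into verbs, one per line.

    Hypothetical helper: strips surrounding whitespace and drops blank lines.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


# hand-written sample with a trailing space, a blank line, and a final newline
sample = "ἔχει\nπαύει\nπέμπει \n\nἄγω\n"
print(clean_verbs(sample))
# ['ἔχει', 'παύει', 'πέμπει', 'ἄγω']
```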


Step 2: use stanza to get the lemma of each word

In [12]:
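# load the Ancient Greek pipeline and run it over the verbs joined into one string
nlp = stanza.Pipeline('grc')
doc = nlp(" ".join(verbs))

# collect the lemma of every word in every sentence
lemmas = [word.lemma for sent in doc.sentences for word in sent.words]
print(lemmas)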

2022-08-01 15:04:41 INFO: Loading these models for language: grc (Ancient_Greek):
| Processor | Package |
-----------------------
| tokenize  | proiel  |
| pos       | proiel  |
| lemma     | proiel  |
| depparse  | proiel  |

2022-08-01 15:04:41 INFO: Use device: cpu
2022-08-01 15:04:41 INFO: Loading: tokenize
2022-08-01 15:04:41 INFO: Loading: pos
2022-08-01 15:04:41 INFO: Loading: lemma
2022-08-01 15:04:42 INFO: Loading: depparse
2022-08-01 15:04:42 INFO: Done loading processors!


['ἔχω', 'παύω', 'πέμπω', 'ἄγω', 'γράφω', 'ἐθέλω', 'λύω', 'παύω', 'φυλάσσω', 'ἀθροίζω', 'ἁρπάζω', 'ἄρχω', 'μέλλω', 'διώκω', 'νομίζω', 'πείθω', 'λείπω']
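
Because `παύει` and `παύω` share a lemma, `παύω` appears twice in the list above, so Step 3 would write two entries under the same key. One possible way to drop repeats while keeping the original order, shown on a shortened copy of the list (`unique_in_order` is a hypothetical name):

```python
def unique_in_order(items):
    # dict keys keep insertion order (Python 3.7+), so later repeats are dropped
    return list(dict.fromkeys(items))


sample_lemmas = ['ἔχω', 'παύω', 'πέμπω', 'ἄγω', 'λύω', 'παύω']
print(unique_in_order(sample_lemmas))
# ['ἔχω', 'παύω', 'πέμπω', 'ἄγω', 'λύω']
```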


In [13]:
# morphological features Stanza assigned to each word (None where it assigned none)
feats = [word.feats for sent in doc.sentences for word in sent.words]
print(feats)

['Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act', 'Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act', 'Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act', 'Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act', 'Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act', 'Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act', None, None, 'Mood=Imp|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act', 'Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act', 'Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act', None, 'Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act', 'Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act', 'Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act', 'Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act', 'Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act']
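
Each entry above is either `None` or a string of `Key=Value` pairs separated by `|`. A small sketch of turning one entry into a dictionary, which makes it easier to check things like person and number; `parse_feats` is a hypothetical helper, not a Stanza function:

```python
def parse_feats(feats):
    """Turn 'Mood=Ind|Number=Sing' into {'Mood': 'Ind', 'Number': 'Sing'}."""
    if not feats:  # some words above have None instead of a feature string
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


parsed = parse_feats('Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act')
print(parsed['Person'], parsed['Number'])  # 3 Sing
```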


Step 3: Store the lemmas in a file (in the same format as the [`*_lexicon.yaml` files](https://github.com/jtauber/greek-inflexion/blob/master/STEM_DATA/dik_lexicon.yaml))

In [70]:
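file_name = './STEM_DATA/crosby_schaeffer_lexicon.yaml'

# write one entry per lemma with an empty `stems:` key, to be filled in during Part 2
with open(file_name, "w", encoding="utf-8") as f:
    for lemma in lemmas:
        f.write(lemma + ':\n    stems:\n')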
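
For reference, the loop above produces one block per lemma with an empty `stems:` key. The sketch below builds the same text in memory so the layout can be inspected before anything is written to `STEM_DATA/`; `lexicon_skeleton` is a hypothetical name.

```python
def lexicon_skeleton(lemmas):
    # same layout as the f.write call above: lemma key, then an empty stems key
    return "".join(lemma + ":\n    stems:\n" for lemma in lemmas)


skeleton = lexicon_skeleton(['ἄγω', 'γράφω'])
print(skeleton)
```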

-----

## Part 2

In [71]:
from greek_inflexion import GreekInflexion

In [92]:
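inflexion = GreekInflexion('stemming.yaml', 'STEM_DATA/crosby_schaeffer_lexicon.yaml')

# conjecture the stem for each lemma and rewrite the lexicon file with the results
with open(file_name, "w", encoding="utf-8") as f:
    for lemma in lemmas:
        # NOTE: edit this line to include additional stems
        candidates = sorted(inflexion.possible_stems(lemma, '.+1S$'))
        if candidates:
            # keep the first candidate (in sorted order) under the `1-` key
            f.write('\n' + lemma + ':\n    stems:\n        1-: ' + candidates[0][1])
        else:
            # no candidate found: leave `stems:` empty to fill in by hand
            f.write('\n' + lemma + ':\n    stems:\n')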

In [93]:
# generate all the forms, save to a csv file
forms = []
for lemma in lemmas:
    # NOTE: edit this line and the header below to add additional forms
    forms.append(inflexion.conjugate_core(lemma, "PAI", "AAI", tags={})[0][1])

header = ['PAI.1S', 'PAI.2S', 'PAI.3S', 'PAI.1P', 'PAI.2P', 'PAI.3P']

# newline='' lets the csv module control line endings, as the csv docs recommend
with open('all_forms.csv', 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=header)
    writer.writeheader()
    writer.writerows(forms)
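
The CSV written above has no column identifying the verb, and `csv.DictWriter` raises `ValueError` by default if a row contains a key that is not in `fieldnames`. A sketch of one way to handle both, assuming (as the `DictWriter` call above implies) that each entry of `forms` is a dictionary keyed by parse codes such as `PAI.1S`. The sample row is written by hand rather than generated by greek-inflexion, and the `lemma` column is an addition for illustration:

```python
import csv
import io

header = ['lemma', 'PAI.1S', 'PAI.2S', 'PAI.3S', 'PAI.1P', 'PAI.2P', 'PAI.3P']

# hand-written sample row standing in for one entry of `forms`
sample_forms = {
    'PAI.1S': 'λύω', 'PAI.2S': 'λύεις', 'PAI.3S': 'λύει',
    'PAI.1P': 'λύομεν', 'PAI.2P': 'λύετε', 'PAI.3P': 'λύουσι(ν)',
    'AAI.1S': 'ἔλυσα',  # not in the header, so it is skipped
}

buffer = io.StringIO()  # stands in for open('all_forms.csv', 'w', newline='')
# extrasaction='ignore' skips unknown keys; restval fills in missing ones
writer = csv.DictWriter(buffer, fieldnames=header,
                        extrasaction='ignore', restval='')
writer.writeheader()
writer.writerow({'lemma': 'λύω', **sample_forms})
print(buffer.getvalue())
```

With a real file, the same `writer` calls apply unchanged; only the `buffer` line is replaced by the `open(...)` call.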