This notebook takes a list of Crosby Schaeffer verbs (ultimately to be extracted from the csv generated by `GenerateVocabList`) and generates the inflected forms of each verb (currently the present and imperfect active indicative) using the following steps:

#### Part 0
1. make a local copy of James Tauber's [greek-inflexion github repo](https://github.com/jtauber/greek-inflexion)
2. copy this notebook into greek-inflexion
3. copy the file `Summer-2022/lib/greek-inflexion-files/crosby_schaeffer_generate.py` into `/greek-inflexion/`
4. copy the file `lib/greek-inflexion-files/cs_10_verbs.txt` into `/greek-inflexion/`
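
The copy steps above can also be scripted. The following is a minimal sketch, not part of the original workflow: the helper name `copy_into_repo` is hypothetical, and the example call assumes the clone lives in `./greek-inflexion` and that the relative paths from steps 3 and 4 resolve from the current directory.

```python
import shutil
from pathlib import Path


def copy_into_repo(sources, repo_dir):
    """Copy each source file into repo_dir and return the new paths."""
    repo = Path(repo_dir)
    if not repo.is_dir():
        raise FileNotFoundError(f"greek-inflexion checkout not found: {repo}")
    copied = []
    for src in map(Path, sources):
        # shutil.copy2 preserves timestamps; the file keeps its own name.
        copied.append(Path(shutil.copy2(src, repo / src.name)))
    return copied


# Paths taken from steps 3 and 4; adjust them to your own layout.
FILES_TO_COPY = [
    "Summer-2022/lib/greek-inflexion-files/crosby_schaeffer_generate.py",
    "lib/greek-inflexion-files/cs_10_verbs.txt",
]
# Example call (assumes the clone is in ./greek-inflexion):
# copy_into_repo(FILES_TO_COPY, "greek-inflexion")
```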

#### Part 1
1. use Stanza to get the lemmas of each word, store these in a file
2. follow the steps in [`README-morphgnt.md`](https://github.com/jtauber/greek-inflexion/blob/master/README-morphgnt.md#morphgnt) to generate the stems of each word in the file created above (NOTE: Right now, I'm entering the imperfect stems manually. Ideally, this should be automated.)
3. save this yaml file as `crosby_schaeffer_lexicon.yaml` and move it into `greek-inflexion/STEM_DATA/`


#### Part 2
1. conjecture the stems for each word in the yaml file, copy-paste the results into the yaml file, and clean up the yaml file (note that I'm also saving a copy of this yaml file in `Summer-2022/lib/greek-inflexion-files/` for reference)
2. from the `greek-inflexion` directory, run `./crosby_schaeffer_generate.py`
3. use greek-inflexion to generate all the forms of each word in the yaml file
4. save these results into a csv file `all_forms.csv`
5. copy this csv file into `Summer-2022/lib/greek-inflexion-files/` 

## Part 1

Step 0: Imports

In [3]:
import stanza
stanza.download('grc') 

!pip install pyyaml
import yaml

import csv

2022-08-02 12:09:48 INFO: Downloading default packages for language: grc (Ancient_Greek)...





2022-08-02 12:09:49 INFO: File exists: /Users/mallard/stanza_resources/grc/default.zip
2022-08-02 12:09:50 INFO: Finished downloading models and saved to /Users/mallard/stanza_resources.




Step 1: define the list of verbs

In [4]:
# read in the list of verbs (first 10 chapters), one verb per line
with open("./cs_10_verbs.txt", "r", encoding="utf-8") as txt_file:
    # skip blank lines (a trailing newline would otherwise add an empty entry)
    verbs = [line.strip() for line in txt_file if line.strip()]

Step 2: use stanza to get the lemma of each word

In [5]:
nlp = stanza.Pipeline('grc')
doc = nlp(" ".join(verbs))
# one lemma per token that Stanza finds in the joined verb list
lemmas = [word.lemma for sent in doc.sentences for word in sent.words]
print(lemmas)

2022-08-02 12:09:55 INFO: Loading these models for language: grc (Ancient_Greek):
| Processor | Package |
-----------------------
| tokenize  | proiel  |
| pos       | proiel  |
| lemma     | proiel  |
| depparse  | proiel  |

2022-08-02 12:09:55 INFO: Use device: cpu
2022-08-02 12:09:55 INFO: Loading: tokenize
2022-08-02 12:09:55 INFO: Loading: pos
2022-08-02 12:09:55 INFO: Loading: lemma
2022-08-02 12:09:55 INFO: Loading: depparse
2022-08-02 12:09:56 INFO: Done loading processors!


['ἔχω', 'παύω', 'πέμπω', 'ἄγω', 'γράφω', 'ἐθέλω', 'λύω', 'παύω', 'φυλάσσω', 'ἀθροίζω', 'ἁρπάζω', 'ἄρχω', 'μέλλω', 'διώκω', 'νομίζω', 'πείθω', 'λείπω']


In [6]:
forms = [word.feats for sent in doc.sentences for word in sent.words]
print(forms)

['Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act', 'Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act', 'Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act', 'Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act', 'Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act', 'Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act', None, None, 'Mood=Imp|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act', 'Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act', 'Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act', None, 'Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act', 'Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act', 'Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act', 'Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act', 'Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act']
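
The feature strings above use the `Key=Value|Key=Value` layout, and some entries are `None`. Below is a small sketch of how they could be turned into dictionaries, for example to spot entries that Stanza did not tag as first person singular present indicative. The helper names `parse_feats` and `is_present_indicative_1sg` are made up for illustration and are not part of Stanza.

```python
def parse_feats(feats):
    """Turn 'Mood=Ind|Number=Sing' into {'Mood': 'Ind', 'Number': 'Sing'}."""
    if not feats:
        # Stanza returned None (or an empty string) for this word.
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def is_present_indicative_1sg(feats):
    """True if the feature string describes a 1st sg. present indicative."""
    f = parse_feats(feats)
    actual = (f.get("Mood"), f.get("Tense"), f.get("Person"), f.get("Number"))
    return actual == ("Ind", "Pres", "1", "Sing")
```

With the notebook's variables, `[lemma for lemma, feats in zip(lemmas, forms) if not is_present_indicative_1sg(feats)]` would list the lemmas worth checking by hand.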


Step 3: Store the lemmas in a file (in the same format as the [`*_lexicon.yaml` files](https://github.com/jtauber/greek-inflexion/blob/master/STEM_DATA/dik_lexicon.yaml))

In [61]:
file_name = './STEM_DATA/crosby_schaeffer_lexicon.yaml'

# write one entry per lemma, each with an empty `stems:` mapping to fill in later
with open(file_name, "w", encoding="utf-8") as f:
    for lemma in lemmas:
        f.write(lemma + ':\n    stems:\n')
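
Note that the lemma list printed earlier contains `παύω` twice, so writing one entry per list item produces a repeated key in the yaml file. One possible way to avoid that is sketched below; the function name `lexicon_skeleton` is hypothetical, and the output layout simply mirrors the strings written in the cell above.

```python
def lexicon_skeleton(lemmas):
    """Return yaml text with one empty `stems:` entry per distinct lemma."""
    # dict.fromkeys drops repeats while keeping first-seen order.
    unique = list(dict.fromkeys(lemmas))
    return "".join(f"{lemma}:\n    stems:\n" for lemma in unique)
```

The returned text could then be written to `file_name` in a single `f.write(...)` call.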


-----

## Part 2

In [62]:
from greek_inflexion import GreekInflexion

In [119]:
inflexion = GreekInflexion('stemming.yaml', 'STEM_DATA/crosby_schaeffer_lexicon.yaml')

# conjecture the present stem of each lemma and rewrite the lexicon file
with open(file_name, "w", encoding="utf-8") as f:
    for lemma in lemmas:
        # get the list of all possible stems
        stems = sorted(inflexion.possible_stems(lemma, '.+1S$'))

        # get the present stem (in the form of a tuple)
        pres_stem = [item for item in stems if item[0] == 'PAI.1S']

        # if a present stem was successfully generated
        # (checking `stems` alone would raise IndexError when no PAI.1S stem exists)
        if pres_stem:
            f.write('\n' + lemma + ':\n    stems:\n        1-: ' + pres_stem[0][1])
        else:
            f.write('\n' + lemma + ':\n    stems:\n')

        # to produce the imperfect stem, we'll repeat the process with 1+ (the augmented stem)...

In [122]:
inflexion = GreekInflexion('stemming.yaml', 'STEM_DATA/crosby_schaeffer_lexicon.yaml')

In [123]:
# generate all the forms, save to a csv file
forms = []
for lemma in lemmas:
    try:
        # present forms
        a = inflexion.conjugate_core(lemma, "PAI", tags={"final-nu-aai.3s"})[0][1]
        # imperfect forms
        b = inflexion.conjugate_core(lemma, "IAI", tags={"final-nu-aai.3s"})[0][1]

        forms.append({**a, **b})

    except Exception:
        # keep an empty row so the csv stays aligned with `lemmas`;
        # DictWriter needs a dict here (a bare "" would fail in writerows)
        forms.append({})
print(forms)

header = ['PAI.1S', 'PAI.2S', 'PAI.3S', 'PAI.1P', 'PAI.2P', 'PAI.3P', 'IAI.1S', 'IAI.2S', 'IAI.3S', 'IAI.1P', 'IAI.2P', 'IAI.3P']

# newline='' is what the csv module expects; utf-8 keeps the Greek text intact
with open('all_forms.csv', 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=header)
    writer.writeheader()
    writer.writerows(forms)

[{'PAI.1S': 'ἔχω', 'PAI.2S': 'ἔχεις', 'PAI.3S': 'ἔχει', 'PAI.1P': 'ἔχομεν', 'PAI.2P': 'ἔχετε', 'PAI.3P': 'ἔχουσι(ν)', 'IAI.1S': 'ἦχον', 'IAI.2S': 'ἦχες', 'IAI.3S': 'ἦχε(ν)', 'IAI.1P': 'ἤχομεν', 'IAI.2P': 'ἤχετε', 'IAI.3P': 'ἦχον'}, {'PAI.1S': 'παύω', 'PAI.2S': 'παύεις', 'PAI.3S': 'παύει', 'PAI.1P': 'παύομεν', 'PAI.2P': 'παύετε', 'PAI.3P': 'παύουσι(ν)', 'IAI.1S': 'ἔπαυον', 'IAI.2S': 'ἔπαυες', 'IAI.3S': 'ἔπαυε(ν)', 'IAI.1P': 'ἐπαύομεν', 'IAI.2P': 'ἐπαύετε', 'IAI.3P': 'ἔπαυον'}, {'PAI.1S': 'πέμπω', 'PAI.2S': 'πέμπεις', 'PAI.3S': 'πέμπει', 'PAI.1P': 'πέμπομεν', 'PAI.2P': 'πέμπετε', 'PAI.3P': 'πέμπουσι(ν)', 'IAI.1S': 'ἔπεμπον', 'IAI.2S': 'ἔπεμπες', 'IAI.3S': 'ἔπεμπε(ν)', 'IAI.1P': 'ἐπέμπομεν', 'IAI.2P': 'ἐπέμπετε', 'IAI.3P': 'ἔπεμπον'}, {'PAI.1S': 'ἄγω', 'PAI.2S': 'ἄγεις', 'PAI.3S': 'ἄγει', 'PAI.1P': 'ἄγομεν', 'PAI.2P': 'ἄγετε', 'PAI.3P': 'ἄγουσι(ν)', 'IAI.1S': 'ἄγον/ἦγον', 'IAI.2S': 'ἄγες/ἦγες', 'IAI.3S': 'ἄγε(ν)/ἦγε(ν)', 'IAI.1P': 'ἄγομεν/ἤγομεν', 'IAI.2P': 'ἄγετε/ἤγετε', 'IAI.3P': 'ἄγον/
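
Because the csv has no column naming the verb, a row that failed to conjugate cannot be traced back to its lemma. The sketch below shows one way to add a leading `lemma` column. This is a suggested variation rather than what the notebook currently does, and `forms_to_csv` is a made-up name. It builds the csv text in memory so it is easy to inspect.

```python
import csv
import io

# Same column order as the notebook's header.
HEADER = [f"{tense}.{pn}"
          for tense in ("PAI", "IAI")
          for pn in ("1S", "2S", "3S", "1P", "2P", "3P")]


def forms_to_csv(lemmas, forms, header=HEADER):
    """Return csv text with a leading `lemma` column; missing forms stay blank."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=["lemma", *header], restval="")
    writer.writeheader()
    for lemma, paradigm in zip(lemmas, forms):
        row = {"lemma": lemma}
        # A failed lemma may be stored as "" or {}; only dicts carry forms.
        if isinstance(paradigm, dict):
            row.update(paradigm)
        writer.writerow(row)
    return buffer.getvalue()
```

The returned string can be written to `all_forms.csv` with `open(..., 'w', newline='', encoding='utf-8')`.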

In [118]:
# print(list(inflexion.possible_stems('ἄγω')))
# # inflexion.conjugate("ἄγω", "PAI", "IAI", tags={"final-nu-aai.3s"})
# inflexion.conjugate_core('ἄγω', "PAI", "IAI", tags={"final-nu-aai.3s"})[1][1]
# list(inflexion.conjugate_core('ἄγω', "PAI", tags={"final-nu-aai.3s"})[0][1].values())
# inflexion.conjugate('γράφω', tags={"final-nu-aai.3s"})[0][1]

-
    lemma: γράφω

    tags:
      - final-nu-aai.3s




TypeError: 'NoneType' object is not subscriptable
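
The traceback above is cut off, so where the `None` came from is not visible here. This kind of error arises whenever `None` is indexed, which could happen in an expression such as `result[0][1]`. A defensive sketch, assuming the result is expected to be a list of pairs whose second element is a dict of forms (the helper name `first_paradigm` is hypothetical):

```python
def first_paradigm(result):
    """Return result[0][1] if it exists and is a dict, otherwise {}."""
    try:
        paradigm = result[0][1]
    except (TypeError, IndexError, KeyError):
        # result was None, empty, or not shaped like [(key, forms), ...]
        return {}
    return paradigm if isinstance(paradigm, dict) else {}
```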