# Automatic annotation of Greek

This exercise shows how to annotate Ancient Greek text automatically. We will use Stanza models trained on data from two treebanks: PROIEL and Perseus. Stanza is developed by the Stanford NLP Group. In this exercise you will be able to compare how the two models annotate the same text.

To start the exercise, just **run a cell**: click on the cell and then on the `Run` button in the menu at the top or, alternatively, press `Ctrl + Enter` on Windows or Linux, or `Command (⌘) + Enter` on macOS. Then run the following cells in turn. You can also run the complete notebook (from start to end) from the menu: `Cell > Run all`.

If you want, you can change the Greek text in the first cell. 


### Before starting, be aware that...

- ...downloading the models might take up to a couple of minutes.
- ...you must run the cells in the order in which they appear.
- ...if you change the text to use your favourite Greek sentences, you'll need to rerun the cells.
- ...if you get the message `Dead kernel`: just run the first cell (the one where you define the text), and then run the cells of the model you want to use.

### To learn more...
Stanza's performance on the different Universal Dependencies (UD) treebanks is reported on the [website of the Stanza project](https://stanfordnlp.github.io/stanza/performance.html).


In [None]:
# Insert the text you want to annotate automatically (here: Isocrates, To Demonicus (Oration 1), 18)
text = "Ἐὰν ᾖς φιλομαθής, ἔσει πολυμαθής"

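import stanza # import the Stanza library
import spacy # import the spaCy library
from spacy import displacy
# StanzaLanguage comes from spacy-stanza 0.x (spaCy v2); spacy-stanza >= 1.0 replaced it with spacy_stanza.load_pipeline()
from spacy_stanza import StanzaLanguage # wrapper that turns a Stanza pipeline into a spaCy one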

# Then run the following cells and compare the two Greek models: PROIEL (the default model) and Perseus


## Automatic annotation with the PROIEL model (default model)

In [None]:
stanza.download('grc', package='proiel') # download the PROIEL model (the default Greek model)
nlp = stanza.Pipeline('grc', package='proiel') # initialise the Greek neural pipeline
doc = nlp(text) # annotate the text

In [None]:
# lemmatisation and PoS with the PROIEL model
for sentence in doc.sentences:
  for word in sentence.words:
    print(word.text, "lemma:", word.lemma, " PoS:", word.pos)
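To keep these annotations for later use (for instance, to compare them with the Perseus model further down), a minimal sketch could collect them into a list of `(text, lemma, PoS)` tuples and print them as aligned columns. The helper names `to_rows` and `print_table` are our own, not part of Stanza; the sketch only assumes the `sentences`, `words`, `text`, `lemma` and `pos` attributes already used in the cell above.

```python
def to_rows(doc):
    """Collect (text, lemma, PoS) tuples for every word in a Stanza Document."""
    return [
        (word.text, word.lemma, word.pos)
        for sentence in doc.sentences
        for word in sentence.words
    ]


def print_table(rows, headers=("word", "lemma", "PoS")):
    """Print the rows as left-aligned columns."""
    # str() guards against None values (e.g. a missing lemma)
    table = [headers] + [tuple(str(cell) for cell in row) for row in rows]
    widths = [max(len(row[i]) for row in table) for i in range(len(headers))]
    for row in table:
        print("  ".join(cell.ljust(width) for cell, width in zip(row, widths)))


# Usage in this notebook (after running the cells above):
# rows = to_rows(doc)
# print_table(rows)
```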

In [None]:
# morphological analysis with the PROIEL model
for sentence in doc.sentences:
  for word in sentence.words:
    print(word.text, word.feats)
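`word.feats` is a single string in the Universal Dependencies FEATS format, i.e. `Attribute=Value` pairs separated by `|`, or `None` when no features are assigned. A minimal sketch for turning it into a Python dictionary, which makes it easier to look up a single feature such as `Case`; the function name `parse_feats` is our own, and the input shown is illustrative, not actual model output:

```python
def parse_feats(feats):
    """Turn a UD FEATS string such as 'Case=Nom|Number=Sing' into a dict.

    Returns an empty dict for None (no features) or the placeholder '_'.
    """
    if not feats or feats == "_":
        return {}
    result = {}
    for pair in feats.split("|"):
        # partition keeps everything after the first '=' as the value,
        # so multi-valued features like 'PronType=Int,Rel' stay intact
        name, _, value = pair.partition("=")
        result[name] = value
    return result


# Illustrative input (not actual model output):
print(parse_feats("Case=Nom|Gender=Masc|Number=Sing"))
# → {'Case': 'Nom', 'Gender': 'Masc', 'Number': 'Sing'}
```

Inside the loop above you could then write, for example, `parse_feats(word.feats).get("Case")`.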

In [None]:
# Print the dependencies with Stanza: each entry is a (head, relation, dependent) triple, and each word is shown in JSON format
for sentence in doc.sentences:
  print(sentence.dependencies)
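The raw output above prints every word as a full JSON object, which is hard to read. Stanza also provides `sentence.print_dependencies()` for a more compact view. Alternatively, a minimal sketch that lists each word with its relation and the text of its head needs only the `id`, `head` and `deprel` attributes of Stanza words, where `head` is the `id` of the governing word and `0` marks the root. The function name `dependency_triples` is our own:

```python
def dependency_triples(sentence):
    """Return (word, relation, head word) triples for one Stanza sentence."""
    # Map word ids to their text; id 0 is the artificial root.
    # int() keeps this working whether ids are stored as ints or strings.
    id_to_text = {0: "ROOT"}
    id_to_text.update({int(word.id): word.text for word in sentence.words})
    return [
        (word.text, word.deprel, id_to_text[int(word.head)])
        for word in sentence.words
    ]


# Usage in this notebook:
# for sentence in doc.sentences:
#     for dependent, relation, head in dependency_triples(sentence):
#         print(f"{dependent:<15} {relation:<10} head: {head}")
```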

In [None]:
# Visualise the dependencies with spaCy.
# In this visualisation you won't see the root, and the dependencies are shown horizontally.
snlp = stanza.Pipeline(lang="grc", package="proiel")
nlp = StanzaLanguage(snlp)
doc = nlp(text)
displacy.render(doc, style="dep", jupyter=True)
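The cell above needs the extra `spacy_stanza` package. As an alternative, displaCy can also draw a tree from plain data passed with `manual=True`: a dictionary with a `words` list and an `arcs` list. A minimal sketch that converts one Stanza sentence into that format, assuming word ids are consecutive and start at 1 within each sentence; the function name `to_displacy` is our own, and showing the PoS tag under each word is our choice:

```python
def to_displacy(sentence):
    """Convert one Stanza sentence into displaCy's manual 'dep' format."""
    words = [{"text": word.text, "tag": word.pos} for word in sentence.words]
    arcs = []
    for word in sentence.words:
        dep = int(word.id) - 1  # 0-based position of the dependent
        head = int(word.head) - 1  # 0-based position of the head; -1 = root
        if head < 0:
            continue  # the root has no incoming arc in this view
        if dep < head:
            # dependent is left of its head: the arrow points left
            arcs.append({"start": dep, "end": head, "label": word.deprel, "dir": "left"})
        else:
            # dependent is right of its head: the arrow points right
            arcs.append({"start": head, "end": dep, "label": word.deprel, "dir": "right"})
    return {"words": words, "arcs": arcs}


# Usage in this notebook (requires spaCy, but not spacy_stanza):
# from spacy import displacy
# displacy.render(to_displacy(snlp(text).sentences[0]), style="dep", manual=True, jupyter=True)
```

Note that the cell above reassigns `doc` to a spaCy document, so the usage line passes a fresh Stanza result, `snlp(text)`, instead.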

## Automatic annotation with the Perseus model


In [None]:
stanza.download(lang='grc', package='perseus') # download the Perseus model 

In [None]:
# Lemmatisation and PoS with the Perseus model
nlp = stanza.Pipeline(lang='grc', package="perseus")
doc = nlp(text)
for sentence in doc.sentences:
  for word in sentence.words:
    print(word.text, "lemma:", word.lemma, " PoS:", word.pos)
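Since the point of the exercise is to compare the two models, a minimal sketch can list only the words on which they disagree. It assumes both models split the text into the same sequence of words; if their tokenisation differs, a word-by-word comparison would be misleading, so the sketch raises an error instead. The function name `compare_models` and the variable names in the usage example are our own:

```python
def compare_models(doc_a, doc_b, name_a="PROIEL", name_b="Perseus"):
    """Print and return the words whose lemma or PoS differs between two Stanza Documents."""
    words_a = [word for sentence in doc_a.sentences for word in sentence.words]
    words_b = [word for sentence in doc_b.sentences for word in sentence.words]
    if [w.text for w in words_a] != [w.text for w in words_b]:
        raise ValueError("The two models tokenised the text differently.")
    differences = []
    for a, b in zip(words_a, words_b):
        if (a.lemma, a.pos) != (b.lemma, b.pos):
            differences.append((a.text, (a.lemma, a.pos), (b.lemma, b.pos)))
            print(f"{a.text}: {name_a} {a.lemma}/{a.pos} vs {name_b} {b.lemma}/{b.pos}")
    if not differences:
        print("No differences in lemma or PoS.")
    return differences


# Usage in this notebook (hypothetical variable names):
# doc_proiel = stanza.Pipeline(lang="grc", package="proiel")(text)
# doc_perseus = stanza.Pipeline(lang="grc", package="perseus")(text)
# compare_models(doc_proiel, doc_perseus)
```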

In [None]:
# Morphological analysis with the Perseus model  
for sentence in doc.sentences:
  for word in sentence.words:
    print(word.text, word.feats)

In [None]:
# Print the dependencies with Stanza: each entry is a (head, relation, dependent) triple, and each word is shown in JSON format
for sentence in doc.sentences:
  print(sentence.dependencies)

In [None]:
# Visualise the dependencies of the Perseus model with spaCy.
# In this visualisation you won't see the root, and the dependencies are shown horizontally.
snlp = stanza.Pipeline(lang="grc", package="perseus")
nlp = StanzaLanguage(snlp)
doc = nlp(text)
displacy.render(doc, style="dep", jupyter=True)
