# Part of Speech tagging and lemmatisation with 🐍

See information on the [Sunoikisis Wiki](https://github.com/SunoikisisDC/SunoikisisDC-2017-2018/wiki/Python-2:-Part-of-Speech-tagging-and-lemmatisation).

## Imports

In [1]:
import os
import sys
import cltk

In [2]:
cltk.__version__

'0.1.83'

## Download corpora

### Greek

In [33]:
from cltk.corpus.utils.importer import CorpusImporter

In [34]:
grk_corpus_importer = CorpusImporter('greek')

In [35]:
grk_corpus_importer.list_corpora

['greek_software_tlgu',
 'greek_text_perseus',
 'phi7',
 'tlg',
 'greek_proper_names_cltk',
 'greek_models_cltk',
 'greek_treebank_perseus',
 'greek_lexica_perseus',
 'greek_training_set_sentence_cltk',
 'greek_word2vec_cltk',
 'greek_text_lacus_curtius',
 'greek_text_first1kgreek']

In [41]:
grk_corpus_importer.import_corpus('greek_text_perseus')

Downloaded 100% 143.79 MiB | 4.02 MiB/s s 

**Note**: `cltk.corpus.latin.latinlibrary` is a shortcut for several things, and there is nothing comparable (yet) for Greek (see [source code](https://github.com/cltk/cltk/blob/master/cltk/corpus/latin/__init__.py)).

### Latin

In [42]:
la_corpus_importer = CorpusImporter('latin')

In [43]:
la_corpus_importer.import_corpus('latin_text_latin_library')

Downloaded 100% 35.50 MiB | 5.91 MiB/s 

In [3]:
from cltk.corpus.latin import latinlibrary

**NB**: Disclaimer about what the library does behind the scenes when one imports the submodule `latinlibrary`

In [4]:
amicitia_words = latinlibrary.words('cicero/amic.txt')

In [5]:
len(amicitia_words)

11618

We can get `n` number of tokens from this text by using the *slice notation*:

In [17]:
# the first ten tokens
amicitia_words[:10]

['Cicero',
 ':',
 'de',
 'Amicitia',
 'M.',
 'TVLLI',
 'CICERONIS',
 'LAELIVS',
 'DE',
 'AMICITIA']

In [18]:
# or the last token
amicitia_words[-1]

'Page'

We can also count occurrences by using the `count()` method and passing as parameter the token we want to inspect:

In [19]:
amicitia_words.count('et')

236

In [23]:
amicitia_words.count('amicitia')

67

Let's have a closer look to the `type` of the variable `amicitia_words` where we loaded the content of Cicero's *De Amicitia*:

In [11]:
type(amicitia_words)

nltk.corpus.reader.util.StreamBackedCorpusView

In [7]:
help(amicitia_words)

Help on StreamBackedCorpusView in module nltk.corpus.reader.util object:

class StreamBackedCorpusView(nltk.collections.AbstractLazySequence)
 |  A 'view' of a corpus file, which acts like a sequence of tokens:
 |  it can be accessed by index, iterated over, etc.  However, the
 |  tokens are only constructed as-needed -- the entire corpus is
 |  never stored in memory at once.
 |  
 |  The constructor to ``StreamBackedCorpusView`` takes two arguments:
 |  a corpus fileid (specified as a string or as a ``PathPointer``);
 |  and a block reader.  A "block reader" is a function that reads
 |  zero or more tokens from a stream, and returns them as a list.  A
 |  very simple example of a block reader is:
 |  
 |      >>> def simple_block_reader(stream):
 |      ...     return stream.readline().split()
 |  
 |  This simple block reader reads a single line at a time, and
 |  returns a single token (consisting of a string) for each
 |  whitespace-separated substring on the line.
 |  
 |  When d

## Part of Speech Tagging

### Latin

#### CLTK taggers

In [47]:
from cltk.tag.pos import POSTag
tagger = POSTag('latin')
tagger.tag_ngram_123_backoff('Gallia est omnis divisa in partes tres')

AssertionError: CLTK linguistics models not available for unigram.pickle.

In [48]:
la_corpus_importer.list_corpora

['latin_text_perseus',
 'latin_treebank_perseus',
 'latin_text_latin_library',
 'phi5',
 'phi7',
 'latin_proper_names_cltk',
 'latin_models_cltk',
 'latin_pos_lemmata_cltk',
 'latin_treebank_index_thomisticus',
 'latin_lexica_perseus',
 'latin_training_set_sentence_cltk',
 'latin_word2vec_cltk',
 'latin_text_antique_digiliblt',
 'latin_text_corpus_grammaticorum_latinorum',
 'latin_text_poeti_ditalia']

In [49]:
la_corpus_importer.import_corpus('latin_models_cltk')

In [50]:
tagger = POSTag('latin')

In [51]:
tagger

<cltk.tag.pos.POSTag at 0x113476d68>

In [46]:
list(zip(
    tagger.tag_tnt(" ".join([str(w) for w in amicitia_words[100:150]])),
    tagger.tag_ngram_123_backoff(" ".join([str(w) for w in amicitia_words[100:150]])),
    tagger.tag_crf(" ".join([str(w) for w in amicitia_words[100:150]]))
))

[(('91', 'Unk'), ('91', None), ('91', 'A-P---FN-')),
 (('92', 'Unk'), ('92', None), ('92', 'N-P---FN-')),
 (('93', 'Unk'), ('93', None), ('93', 'A-P---FN-')),
 (('94', 'Unk'), ('94', None), ('94', 'N-P---FN-')),
 (('95', 'Unk'), ('95', None), ('95', 'A-P---FN-')),
 (('96', 'Unk'), ('96', None), ('96', 'N-P---FN-')),
 (('97', 'Unk'), ('97', None), ('97', 'A-P---FN-')),
 (('98', 'Unk'), ('98', None), ('98', 'N-P---FN-')),
 (('99', 'Unk'), ('99', None), ('99', 'A-P---FN-')),
 (('100', 'Unk'), ('100', None), ('100', 'N-P---FN-')),
 (('101', 'Unk'), ('101', None), ('101', 'A-P---FN-')),
 (('102', 'Unk'), ('102', None), ('102', 'N-P---FN-')),
 (('103', 'Unk'), ('103', None), ('103', 'A-P---FN-')),
 (('104', 'Unk'), ('104', None), ('104', 'N-P---FN-')),
 (('[', 'U--------'), ('[', 'U--------'), ('[', 'U--------')),
 (('1', 'Unk'), ('1', None), ('1', 'N-S---MV-')),
 ((']', 'U--------'), (']', 'U--------'), (']', 'U--------')),
 (('Q', 'Unk'), ('Q', None), ('Q', 'N-S---MV-')),
 (('.', 'U-------

#### TreeTagger

In [32]:
from treetagger import TreeTagger

In [9]:
os.environ["TREETAGGER_HOME"] = "/Users/rromanello/tree-tagger/cmd/"

In [10]:
tt = TreeTagger(language="latin")

In [12]:
tt.tag(amicitia_words[100:150])

[['91', 'ADJ:NUM', '@card@'],
 ['92', 'ADJ:NUM', '@card@'],
 ['93', 'ADJ:NUM', '@card@'],
 ['94', 'ADJ:NUM', '@card@'],
 ['95', 'ADJ:NUM', '@card@'],
 ['96', 'ADJ:NUM', '@card@'],
 ['97', 'ADJ:NUM', '@card@'],
 ['98', 'ADJ:NUM', '@card@'],
 ['99', 'ADJ:NUM', '@card@'],
 ['100', 'ADJ:NUM', '@card@'],
 ['101', 'ADJ:NUM', '@card@'],
 ['102', 'ADJ:NUM', '@card@'],
 ['103', 'ADJ:NUM', '@card@'],
 ['104', 'ADJ:NUM', '@card@'],
 ['[', 'PUN', '['],
 ['1', 'ADJ:NUM', '@card@'],
 [']', 'PUN', ']'],
 ['Q.', 'ABBR', 'Q.'],
 ['Mucius', 'ADJ', '<unknown>'],
 ['augur', 'N:nom', 'augur'],
 ['multa', 'ADJ', 'multus'],
 ['narrare', 'V:INF', 'narro'],
 ['de', 'PREP', 'de'],
 ['C.', 'ABBR', 'C.'],
 ['Laelio', 'N:abl', '<unknown>'],
 ['socero', 'N:abl', 'socer'],
 ['suo', 'POSS', 'suus'],
 ['memoriter', 'ADV', 'memoriter'],
 ['et', 'CC', 'et'],
 ['iucunde', 'ADJ', '<unknown>'],
 ['solebat', 'V:IND', 'soleo'],
 ['nec', 'CC', 'nec'],
 ['dubitare', 'V:INF', 'dubito'],
 ['illum', 'DIMOS', 'ille'],
 ['in', 'PRE

In [31]:
tt.tag("Cogito ergo sum")

NameError: name 'tt' is not defined

### Greek

#### CLTK taggers

In [36]:
grk_corpus_importer.import_corpus("greek_models_cltk")

In [37]:
from cltk.tag.pos import POSTag
tagger = POSTag('greek')

In [39]:
tagger.tag_ngram_123_backoff('θεοὺς μὲν αἰτῶ τῶνδ᾽ ἀπαλλαγὴν πόνων φρουρᾶς ἐτείας μῆκος')

[('θεοὺς', 'N-P---MA-'),
 ('μὲν', 'G--------'),
 ('αἰτῶ', 'V1SPIA---'),
 ('τῶνδ', 'P-P---MG-'),
 ('᾽', None),
 ('ἀπαλλαγὴν', 'N-S---FA-'),
 ('πόνων', 'N-P---MG-'),
 ('φρουρᾶς', 'N-S---FG-'),
 ('ἐτείας', 'A-S---FG-'),
 ('μῆκος', 'N-S---NA-')]

In [40]:
tagger.tag_tnt('θεοὺς μὲν αἰτῶ τῶνδ᾽ ἀπαλλαγὴν πόνων φρουρᾶς ἐτείας μῆκος')

[('θεοὺς', 'N-P---MA-'),
 ('μὲν', 'G--------'),
 ('αἰτῶ', 'V1SPIA---'),
 ('τῶνδ', 'P-P---NG-'),
 ('᾽', 'Unk'),
 ('ἀπαλλαγὴν', 'N-S---FA-'),
 ('πόνων', 'N-P---MG-'),
 ('φρουρᾶς', 'N-S---FG-'),
 ('ἐτείας', 'A-S---FG-'),
 ('μῆκος', 'N-S---NA-')]

In [41]:
import os.path
from nltk.corpus.reader.plaintext import PlaintextCorpusReader
from nltk.tokenize.punkt import PunktSentenceTokenizer, PunktParameters
from cltk.tokenize.sentence import TokenizeSentence
from cltk.tokenize.word import WordTokenizer

In [46]:
word_tokenizer = WordTokenizer('greek')
sentence_tokenizer = TokenizeSentence("greek")

In [60]:
# my modified version of https://github.com/cltk/greek_text_perseus/blob/master/perseus_compiler.py

import os
import re
import bleach
#from cltk.corpus.classical_greek.replacer import Replacer
from cltk.corpus.greek.beta_to_unicode import Replacer


home = os.path.expanduser('~')
cltk_path = os.path.join(home, 'cltk_data')
print(cltk_path)
perseus_root = cltk_path + '/greek/text/greek_text_perseus/'
print(perseus_root)
ignore = [
    '.git',
    'LICENSE.md',
    'README.md',
    'cltk_json',
    'json',
    'perseus_compiler.py'
]
authors = [d for d in os.listdir(perseus_root) if d not in ignore]

for author in authors:
    texts = os.listdir(perseus_root + author + '/opensource')
    for text in texts:
        text_match = re.match(r'.*_gk.xml', text)
        if text_match:
            gk_file = text_match.group()
            txt_file = perseus_root + author + '/opensource/' + gk_file
            with open(txt_file) as gk:
                html = gk.read()
                beta_code = bleach.clean(html, strip=True).upper()
                a_replacer = Replacer()
                unicode_converted = a_replacer.beta_code(beta_code)
                #print(unicode_converted)
                unicode_root = cltk_path + '/greek/text/perseus_unicode/'
                unic_pres = os.path.isdir(unicode_root)
                if unic_pres is True:
                    pass
                else:
                    os.mkdir(unicode_root)
                author_path = unicode_root + author
                author_path_pres = os.path.isdir(author_path)
                if author_path_pres is True:
                    pass
                else:
                    os.mkdir(author_path)
                gk_file_txt = os.path.splitext(gk_file)[0] + '.txt'
                uni_write = author_path + '/' + gk_file_txt
                print(uni_write)
                with open(uni_write, 'w') as uni_write:
                    uni_write.write(unicode_converted)

/Users/mat/cltk_data
/Users/mat/cltk_data/greek/text/greek_text_perseus/
/Users/mat/cltk_data/greek/text/perseus_unicode/Aeschines/aeschin_gk.txt
/Users/mat/cltk_data/greek/text/perseus_unicode/Aeschines/aeschin_gk.txt
/Users/mat/cltk_data/greek/text/perseus_unicode/Aeschylus/aesch.ag_gk.txt
/Users/mat/cltk_data/greek/text/perseus_unicode/Aeschylus/aesch.ag_gk.txt
/Users/mat/cltk_data/greek/text/perseus_unicode/Aeschylus/aesch.eum_gk.txt
/Users/mat/cltk_data/greek/text/perseus_unicode/Aeschylus/aesch.eum_gk.txt
/Users/mat/cltk_data/greek/text/perseus_unicode/Aeschylus/aesch.lib_gk.txt
/Users/mat/cltk_data/greek/text/perseus_unicode/Aeschylus/aesch.lib_gk.txt
/Users/mat/cltk_data/greek/text/perseus_unicode/Aeschylus/aesch.pb_gk.txt
/Users/mat/cltk_data/greek/text/perseus_unicode/Aeschylus/aesch.pb_gk.txt
/Users/mat/cltk_data/greek/text/perseus_unicode/Aeschylus/aesch.pers_gk.txt
/Users/mat/cltk_data/greek/text/perseus_unicode/Aeschylus/aesch.pers_gk.txt
/Users/mat/cltk_data/greek/text/p

/Users/mat/cltk_data/greek/text/perseus_unicode/Athenaeus/ath09_gk.txt
/Users/mat/cltk_data/greek/text/perseus_unicode/Athenaeus/ath10_gk.txt
/Users/mat/cltk_data/greek/text/perseus_unicode/Athenaeus/ath10_gk.txt
/Users/mat/cltk_data/greek/text/perseus_unicode/Athenaeus/ath11_gk.txt
/Users/mat/cltk_data/greek/text/perseus_unicode/Athenaeus/ath11_gk.txt
/Users/mat/cltk_data/greek/text/perseus_unicode/Athenaeus/ath12_gk.txt
/Users/mat/cltk_data/greek/text/perseus_unicode/Athenaeus/ath12_gk.txt
/Users/mat/cltk_data/greek/text/perseus_unicode/Athenaeus/ath13_gk.txt
/Users/mat/cltk_data/greek/text/perseus_unicode/Athenaeus/ath13_gk.txt
/Users/mat/cltk_data/greek/text/perseus_unicode/Athenaeus/ath14_gk.txt
/Users/mat/cltk_data/greek/text/perseus_unicode/Athenaeus/ath14_gk.txt
/Users/mat/cltk_data/greek/text/perseus_unicode/Athenaeus/ath15_gk.txt
/Users/mat/cltk_data/greek/text/perseus_unicode/Athenaeus/ath15_gk.txt
/Users/mat/cltk_data/greek/text/perseus_unicode/Bacchylides/bacchyl_gk.txt
/U

KeyboardInterrupt: 

In [62]:
try:
    perseusgreek = PlaintextCorpusReader(
        cltk_path + '/greek/text/perseus_unicode/', 
        '.*\.txt',
        word_tokenizer=word_tokenizer, 
        sent_tokenizer=sentence_tokenizer, 
        encoding='utf-8'
    )    
    pass
except IOError as e:
    pass
    # print("Corpus not found. Please check that the Latin Library is installed in CLTK_DATA.")

In [73]:
birds = perseusgreek.words('Aristophanes/aristoph.birds_gk.txt')

In [78]:
print(list(birds[1000:1100]))

['ἐπέγειρον', 'αὐτόν', '.', 'Θεράπων', 'Ἔποποσ', 'οἶδα', 'μὲν', 'σαφῶσ', 'ὅτι', 'ἀχθέσεται', ',', 'σφῷν', 'δ’', 'αὐτὸν', 'οὕνεκ’', 'ἐπεγερῶ', '.', 'Πισθέταιροσ', 'κακῶς', 'σύ', 'γ’', 'ἀπόλοῐ', ',', 'ὥς', 'μ’', 'ἀπέκτεινας', 'δέει', '.', 'Ἐυελπίδησ', 'οἴμοι', 'κακοδαίμων', 'χὠ', 'κολοιός', 'μοἴχεται', 'ὑπὸ', 'τοῦ', 'δέους\\', '.', 'Πισθέταιροσ', 'ὦ', 'δειλότατον', 'σὺ', 'θηρίον', ',', 'δείσας', 'ἀφῆκας', 'τὸν', 'κολοιόν', ';', 'Ἐυελπίδησ', 'εἰπέ', 'μοι', ',', 'σὺ', 'δὲ', 'τὴν', 'κορώνην', 'οὐκ', 'ἀφῆκας', 'καταπεσών', ';', 'Πισθέταιροσ', 'μὰ', 'Δί’', 'οὐκ', 'ἔγωγε', '.', 'Ἐυελπίδησ', 'ποῦ', 'γάρ', 'ἐστ’', ';', 'Πισθέταιροσ', 'ἀπέπτετο', '.', 'Ἐυελπίδησ', 'οὐκ', 'ἆρ’', 'ἀφῆκας', ';', 'ὦγάθ’', 'ὡς', 'ἀνδρεῖος', 'εἶ', '.', 'Ἔποψ', 'ἄνοιγε', 'τὴν', 'ὕλην', ',', 'ἵν’', 'ἐξέλθω', 'ποτέ', '.', 'Ἐυελπίδησ', 'ὦ', 'Ἡράκλεις', 'τουτὶ', 'τί', 'ποτ’']


In [85]:
tagger.tag_tnt(" ".join(birds[1000:1100]))

[('ἐπέγειρον', 'Unk'),
 ('αὐτόν', 'A-S---MA-'),
 ('.', 'U--------'),
 ('Θεράπων', 'Unk'),
 ('Ἔποποσ', 'Unk'),
 ('οἶδα', 'V1SRIA---'),
 ('μὲν', 'G--------'),
 ('σαφῶσ', 'Unk'),
 ('ὅτι', 'C--------'),
 ('ἀχθέσεται', 'Unk'),
 (',', 'U--------'),
 ('σφῷν', 'P-D---MG-'),
 ('δ', 'G--------'),
 ('’', 'Unk'),
 ('αὐτὸν', 'A-S---MA-'),
 ('οὕνεκ', 'C--------'),
 ('’', 'Unk'),
 ('ἐπεγερῶ', 'Unk'),
 ('.', 'U--------'),
 ('Πισθέταιροσ', 'Unk'),
 ('κακῶς', 'D--------'),
 ('σύ', 'P-S----N-'),
 ('γ', 'G--------'),
 ('’', 'Unk'),
 ('ἀπόλοῐ', 'Unk'),
 (',', 'U--------'),
 ('ὥς', 'C--------'),
 ('μ', 'P-S---MA-'),
 ('’', 'Unk'),
 ('ἀπέκτεινας', 'Unk'),
 ('δέει', 'N-S---ND-'),
 ('.', 'U--------'),
 ('Ἐυελπίδησ', 'Unk'),
 ('οἴμοι', 'E--------'),
 ('κακοδαίμων', 'Unk'),
 ('χὠ', 'L-S---MN-'),
 ('κολοιός', 'Unk'),
 ('μοἴχεται', 'Unk'),
 ('ὑπὸ', 'R--------'),
 ('τοῦ', 'L-S---NG-'),
 ('δέους', 'N-S---NG-'),
 ('\\', 'Unk'),
 ('.', 'U--------'),
 ('Πισθέταιροσ', 'Unk'),
 ('ὦ', 'E--------'),
 ('δειλότατον', 'Unk'),

#### TreeTagger

## Lemmatization

### Latin

#### PyCollatinus

* Python port of the [Collatinus lemmatizer](https://github.com/biblissima/collatinus)
* good if you can read some French (or at least practice it) 😉
* the PoS tags used by Collatinus are explained [here](https://github.com/biblissima/collatinus/blob/master/NOTES_Tagger.md)
* morphological analysis not readily machine readable

We import and instantiate the PyCollatinus lemmatizer (`Lemmatiseur`) – and ignore the long list of warnings. 

In [25]:
from pycollatinus import Lemmatiseur
analyzer = Lemmatiseur()

/Users/mat/.local/share/virtualenvs/sunoikisis_dc-I3iKJ3Z3/lib/python3.6/site-packages/pycollatinus/parser.py:335: MissingRadical: honor has no radical 1
/Users/mat/.local/share/virtualenvs/sunoikisis_dc-I3iKJ3Z3/lib/python3.6/site-packages/pycollatinus/parser.py:335: MissingRadical: aer has no radical 1
/Users/mat/.local/share/virtualenvs/sunoikisis_dc-I3iKJ3Z3/lib/python3.6/site-packages/pycollatinus/parser.py:335: MissingRadical: tethys has no radical 1
/Users/mat/.local/share/virtualenvs/sunoikisis_dc-I3iKJ3Z3/lib/python3.6/site-packages/pycollatinus/parser.py:335: MissingRadical: opes has no radical 1
/Users/mat/.local/share/virtualenvs/sunoikisis_dc-I3iKJ3Z3/lib/python3.6/site-packages/pycollatinus/parser.py:335: MissingRadical: dos has no radical 1
/Users/mat/.local/share/virtualenvs/sunoikisis_dc-I3iKJ3Z3/lib/python3.6/site-packages/pycollatinus/parser.py:335: MissingRadical: corpus has no radical 1
/Users/mat/.local/share/virtualenvs/sunoikisis_dc-I3iKJ3Z3/lib/python3.6/site-p

The lemmatiser can take as input a **single word**

In [25]:
list(analyzer.lemmatise("Cogito"))

[{'desinence': 'ito',
  'form': 'cogito',
  'lemma': 'cogo',
  'morph': '2ème singulier impératif futur actif',
  'radical': 'cog'},
 {'desinence': 'ito',
  'form': 'cogito',
  'lemma': 'cogo',
  'morph': '3ème singulier impératif futur actif',
  'radical': 'cog'},
 {'desinence': 'o',
  'form': 'cogito',
  'lemma': 'cogito',
  'morph': '1ère singulier indicatif présent actif',
  'radical': 'cogit'},
 {'desinence': 'o',
  'form': 'cogito',
  'lemma': 'cogito',
  'morph': '1ère singulier indicatif présent actif',
  'radical': 'cogit'}]

or an **entire sentence**

In [26]:
list(analyzer.lemmatise_multiple("Cogito ergo sum"))

[[{'desinence': 'ito',
   'form': 'cogito',
   'lemma': 'cogo',
   'morph': '2ème singulier impératif futur actif',
   'radical': 'cog'},
  {'desinence': 'ito',
   'form': 'cogito',
   'lemma': 'cogo',
   'morph': '3ème singulier impératif futur actif',
   'radical': 'cog'},
  {'desinence': 'o',
   'form': 'cogito',
   'lemma': 'cogito',
   'morph': '1ère singulier indicatif présent actif',
   'radical': 'cogit'},
  {'desinence': 'o',
   'form': 'cogito',
   'lemma': 'cogito',
   'morph': '1ère singulier indicatif présent actif',
   'radical': 'cogit'}],
 [{'desinence': 'o',
   'form': 'ergo',
   'lemma': 'ergo',
   'morph': '1ère singulier indicatif présent actif',
   'radical': 'erg'},
  {'desinence': '',
   'form': 'ergo',
   'lemma': 'ergo',
   'morph': '-',
   'radical': 'ergo'},
  {'desinence': '',
   'form': 'ergo',
   'lemma': 'ergo',
   'morph': 'positif',
   'radical': 'ergo'}],
 [{'desinence': 'um',
   'form': 'sum',
   'lemma': 'sum',
   'morph': '1ère singulier indicatif p

Let's try to output the lemmatisation in a more intellegible way...

In [35]:
# the analyzer output is essentially a list of lists
# for each analyzed token it returns a list of possible lemmata
# here we iterate through both lists and display the analysis as we go along

for n, result in enumerate(analyzer.lemmatise_multiple("Cogito ergo sum")):
    for i, lemma in enumerate(result):
        print(
            "{}.{}\t{}\t\t{} {}".format(
                n + 1,
                i + 1,
                lemma["form"],
                lemma["lemma"],
                lemma["morph"]
            )
        )

1.1	cogito		cogo 2ème singulier impératif futur actif
1.2	cogito		cogo 3ème singulier impératif futur actif
1.3	cogito		cogito 1ère singulier indicatif présent actif
1.4	cogito		cogito 1ère singulier indicatif présent actif
2.1	ergo		ergo 1ère singulier indicatif présent actif
2.2	ergo		ergo -
2.3	ergo		ergo positif
3.1	sum		sum 1ère singulier indicatif présent actif


the same but also with PoS tag

In [30]:
# the analyzer output is essentially a list of lists
# for each analyzed token it returns a list of possible lemmata
# here we iterate through both lists and display the analysis as we go along

for n, result in enumerate(analyzer.lemmatise_multiple("Cogito ergo sum", pos=True),):
    for i, lemma in enumerate(result):
        print(
            "{}.{}\t{}\t{}\t{} {}".format(
                n + 1,
                i + 1,
                lemma["form"],
                lemma["pos"],
                lemma["lemma"],
                lemma["morph"]
            )
        )

1.1	cogito	v	cogo 2ème singulier impératif futur actif
1.2	cogito	v	cogo 3ème singulier impératif futur actif
1.3	cogito	v	cogito 1ère singulier indicatif présent actif
1.4	cogito	v	cogito 1ère singulier indicatif présent actif
2.1	ergo	v	ergo 1ère singulier indicatif présent actif
2.2	ergo	c	ergo -
2.3	ergo	d	ergo positif
3.1	sum	v	sum 1ère singulier indicatif présent actif


### Greek