## POS Tagging

We'll generate POS tags from the tokenized text. You can build your own tagger from a training set (as described in the [docs](http://docs.cltk.org/en/latest/latin.html#making-pos-training-sets)). CLTK also provides pre-trained models, trained with the help of NLTK, that are easy to use.

In [1]:
from cltk.tag.pos import POSTag

In [2]:
from data.line_tokenized_text import line_tokenized_text as text

In [3]:
tagger = POSTag('latin')

### TnT tagger

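[TnT](http://www.nltk.org/_modules/nltk/tag/tnt.html) (short for Trigrams'n'Tags) is a statistical tagger based on a second-order Hidden Markov Model: it predicts each tag from the two preceding tags, using probabilities estimated from the tag trigrams of a tagged training corpus. Tokens the model does not know appear with the tag `Unk` in the output below.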
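To make the trigram idea concrete, here is a minimal sketch of how a trigram tagger can score a tag transition by mixing unigram, bigram, and trigram estimates. It is an illustration in plain Python, not the NLTK or CLTK implementation: the toy tag sequences, the `transition_prob` helper, and the fixed weights in `lambdas` are all assumptions made for this example (TnT itself estimates its weights from the training data).

```python
from collections import Counter

# Hypothetical toy tag sequences, not the CLTK training data.
tag_sequences = [
    ["R", "A", "N", "V"],
    ["R", "N", "V"],
    ["C", "R", "N", "V"],
]

unigrams, bigrams, trigrams = Counter(), Counter(), Counter()
history1, history2 = Counter(), Counter()  # how often each context occurs

for tags in tag_sequences:
    padded = ["BOS", "BOS"] + tags  # pad so the first tag has a two-tag history
    for i in range(2, len(padded)):
        t1, t2, t3 = padded[i - 2 : i + 1]
        unigrams[t3] += 1
        bigrams[(t2, t3)] += 1
        trigrams[(t1, t2, t3)] += 1
        history1[t2] += 1
        history2[(t1, t2)] += 1

total = sum(unigrams.values())


def transition_prob(t1, t2, t3, lambdas=(0.1, 0.3, 0.6)):
    """Interpolated P(t3 | t1, t2); the weights are assumed, not learned."""
    l1, l2, l3 = lambdas
    p_uni = unigrams[t3] / total
    p_bi = bigrams[(t2, t3)] / history1[t2] if history1[t2] else 0.0
    p_tri = trigrams[(t1, t2, t3)] / history2[(t1, t2)] if history2[(t1, t2)] else 0.0
    return l1 * p_uni + l2 * p_bi + l3 * p_tri


print(transition_prob("R", "N", "V"))  # context seen in the toy data
print(transition_prob("V", "V", "N"))  # unseen context, falls back to the unigram term
```

Because the unigram term is never zero for a tag that occurs in training, an unseen two-tag history still yields a small non-zero probability rather than ruling the transition out.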

In [4]:
tnt_tags = []

In [5]:
for line in text:
    tnt_tags.append(tagger.tag_tnt(line))

In [6]:
tnt_tags[0]

[('Conditi', 'Unk'),
 ('paradoxi', 'Unk'),
 ('compositio', 'Unk'),
 ('mellis', 'Unk'),
 ('pondo', 'D--------'),
 ('XV', 'Unk'),
 ('in', 'R--------'),
 ('aeneum', 'A-S---NN-'),
 ('vas', 'N-S---NN-'),
 ('mittuntu', 'Unk'),
 ('praemissis', 'Unk'),
 ('vini', 'N-S---NG-'),
 ('sextariis', 'Unk'),
 ('duobu', 'Unk'),
 ('ut', 'C--------'),
 ('in', 'R--------'),
 ('coctura', 'Unk'),
 ('mellis', 'Unk'),
 ('vinum', 'N-S---NA-'),
 ('decoquas', 'Unk')]
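The nine-character tags are easier to read once they are split into features. The sketch below assumes the tags follow the positional scheme of the Perseus Latin treebank (part of speech, person, number, tense, mood, voice, gender, case, degree); the `decode_tag` helper and the code tables are written for this example and cover only common values, so treat them as an assumption rather than the authoritative tagset.

```python
# Assumed layout: nine positions in the style of the Perseus treebank tagset.
POSITIONS = ["pos", "person", "number", "tense", "mood",
             "voice", "gender", "case", "degree"]

# Partial code tables (assumed); unknown codes are passed through unchanged.
CODES = {
    "pos": {"N": "noun", "V": "verb", "T": "participle", "A": "adjective",
            "D": "adverb", "C": "conjunction", "R": "preposition",
            "P": "pronoun", "M": "numeral"},
    "person": {"1": "first", "2": "second", "3": "third"},
    "number": {"S": "singular", "P": "plural"},
    "tense": {"P": "present", "I": "imperfect", "R": "perfect",
              "L": "pluperfect", "T": "future perfect", "F": "future"},
    "mood": {"I": "indicative", "S": "subjunctive", "N": "infinitive",
             "M": "imperative", "P": "participle", "D": "gerund",
             "G": "gerundive"},
    "voice": {"A": "active", "P": "passive"},
    "gender": {"M": "masculine", "F": "feminine", "N": "neuter"},
    "case": {"N": "nominative", "G": "genitive", "D": "dative",
             "A": "accusative", "B": "ablative", "V": "vocative",
             "L": "locative"},
    "degree": {"C": "comparative", "S": "superlative"},
}


def decode_tag(tag):
    """Return a dict of features, or {} for None, 'Unk', or malformed tags."""
    if tag is None or len(tag) != len(POSITIONS):
        return {}
    features = {}
    for name, code in zip(POSITIONS, tag.upper()):
        if code != "-":  # '-' marks a position that does not apply
            features[name] = CODES[name].get(code, code)
    return features


print(decode_tag("N-S---NG-"))
# {'pos': 'noun', 'number': 'singular', 'gender': 'neuter', 'case': 'genitive'}
```

Under this reading, `('vini', 'N-S---NG-')` is a neuter genitive singular noun, while `Unk` decodes to an empty dict.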

### 1–2–3–gram backoff tagger

The backoff tagger chains trigram, bigram, and unigram taggers: it tries the longest n-gram context first and backs off to a shorter one when that context did not occur in the training corpus. Since it can only recognize n-grams it has seen, it's important to normalize the text and possibly try orthographic variations of unknown n-grams. Tokens it cannot tag are returned with `None`.
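The backoff idea can be sketched in a few lines of plain Python. This is a simplified illustration, not the CLTK or NLTK code: the toy training sentences, the one-letter tags, and the helpers `context`, `train_ngram`, and `tag_backoff` are assumptions made for this example.

```python
from collections import Counter, defaultdict

# Hypothetical toy training data with simplified tags.
train = [
    [("in", "R"), ("vas", "N"), ("vini", "N")],
    [("ut", "C"), ("in", "R"), ("vinum", "N")],
]


def context(prev_tags, word, n):
    """Context of an n-gram tagger: up to n-1 previous tags plus the word."""
    history = tuple(prev_tags[-(n - 1):]) if n > 1 else ()
    return (history, word)


def train_ngram(sentences, n):
    """Map each context seen in training to its most frequent tag."""
    counts = defaultdict(Counter)
    for sent in sentences:
        prev_tags = []
        for word, tag in sent:
            counts[context(prev_tags, word, n)][tag] += 1
            prev_tags.append(tag)
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}


# Longest context first: trigram, then bigram, then unigram.
models = [(n, train_ngram(train, n)) for n in (3, 2, 1)]


def tag_backoff(tokens):
    """Tag tokens, backing off to shorter contexts; unseen words get None."""
    tagged, prev_tags = [], []
    for word in tokens:
        tag = None
        for n, model in models:
            tag = model.get(context(prev_tags, word, n))
            if tag is not None:
                break
        tagged.append((word, tag))
        prev_tags.append(tag)
    return tagged


print(tag_backoff(["in", "vas", "mellis", "vinum"]))
# [('in', 'R'), ('vas', 'N'), ('mellis', None), ('vinum', 'N')]
```

In this toy run `mellis` never occurs in training, so every model misses it and it is tagged `None`; `vinum` follows an untagged word, so the trigram and bigram contexts fail and only the unigram model recognizes it.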

In [7]:
backoff_tags = []

In [8]:
for line in text:
    backoff_tags.append(tagger.tag_ngram_123_backoff(line))

In [9]:
backoff_tags[0]

[('Conditi', None),
 ('paradoxi', None),
 ('compositio', None),
 ('mellis', None),
 ('pondo', 'D--------'),
 ('XV', None),
 ('in', 'R--------'),
 ('aeneum', 'A-S---NN-'),
 ('vas', 'N-S---NN-'),
 ('mittuntu', None),
 ('praemissis', None),
 ('vini', 'N-S---NG-'),
 ('sextariis', None),
 ('duobu', None),
 ('ut', 'C--------'),
 ('in', 'R--------'),
 ('coctura', None),
 ('mellis', None),
 ('vinum', 'N-S---NA-'),
 ('decoquas', None)]
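Many tokens above come back as `None`. One way to act on the advice about orthographic variations is to retry only the untagged tokens with normalized spellings. The sketch below is a plain-Python illustration under stated assumptions: `variants` and `retag_unknown` are hypothetical helpers, the two normalizations (lowercasing, then `j` to `i` and `v` to `u`) are just examples, and the `known` dictionary stands in for a real tagger's vocabulary.

```python
def variants(token):
    """Yield candidate spellings of a token (assumed normalizations)."""
    lowered = token.lower()
    yield lowered
    # Common editorial normalizations in Latin texts: j -> i and v -> u.
    yield lowered.replace("j", "i").replace("v", "u")


def retag_unknown(tagged, lookup):
    """Retry tokens tagged None; `lookup` maps a token to a tag or None."""
    result = []
    for token, tag in tagged:
        if tag is None:
            for candidate in variants(token):
                tag = lookup(candidate)
                if tag is not None:
                    break
        # Keep the original spelling; only the tag is updated.
        result.append((token, tag))
    return result


# Hypothetical stand-in for a tagger's known vocabulary, with simplified tags.
known = {"uinum": "N", "in": "R"}

first_pass = [("Vinum", None), ("in", "R"), ("mellis", None)]
print(retag_unknown(first_pass, known.get))
# [('Vinum', 'N'), ('in', 'R'), ('mellis', None)]
```

Looking up a single token discards its sentence context, so with a real n-gram tagger this kind of retry effectively relies on the unigram level only.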

### Ingredients POS tags

We now want the POS tag of each ingredient, together with the tags of the tokens immediately before and after it.

In [10]:
from data.ingredient_indices import ingredient_indices

In [11]:
def flatten(nested):
    return [item for sublist in nested for item in sublist]

In [12]:
POS_ingredients = []

In [13]:
tnt_flattened_tags = flatten(tnt_tags)

In [14]:
for index in ingredient_indices:
    # Previous token, ingredient, next token; max() guards against index 0
    POS_ingredients.append(tnt_flattened_tags[max(index - 1, 0):index + 2])

In [15]:
tnt_flattened_tags

[('Conditi', 'Unk'),
 ('paradoxi', 'Unk'),
 ('compositio', 'Unk'),
 ('mellis', 'Unk'),
 ('pondo', 'D--------'),
 ('XV', 'Unk'),
 ('in', 'R--------'),
 ('aeneum', 'A-S---NN-'),
 ('vas', 'N-S---NN-'),
 ('mittuntu', 'Unk'),
 ('praemissis', 'Unk'),
 ('vini', 'N-S---NG-'),
 ('sextariis', 'Unk'),
 ('duobu', 'Unk'),
 ('ut', 'C--------'),
 ('in', 'R--------'),
 ('coctura', 'Unk'),
 ('mellis', 'Unk'),
 ('vinum', 'N-S---NA-'),
 ('decoquas', 'Unk'),
 ('quod', 'C--------'),
 ('igni', 'N-S---MB-'),
 ('lento', 'A-S---NB-'),
 ('et', 'C--------'),
 ('aridis', 'Unk'),
 ('lignis', 'Unk'),
 ('calefactu', 'Unk'),
 ('commotum', 'Unk'),
 ('ferula', 'Unk'),
 ('dum', 'C--------'),
 ('coquitu', 'Unk'),
 ('si', 'C--------'),
 ('effervere', 'Unk'),
 ('coeperi', 'Unk'),
 ('vini', 'N-S---NG-'),
 ('rore', 'N-S---MB-'),
 ('compescitur', 'Unk'),
 ('praeter', 'R--------'),
 ('quod', 'P-S---NA-'),
 ('subtracto', 'Unk'),
 ('igni', 'N-S---MB-'),
 ('in', 'R--------'),
 ('se', 'P-S---MA-'),
 ('redit', 'V3SPIA---'),
 ('cum'

In [16]:
POS_ingredients

[[('compositio', 'Unk'), ('mellis', 'Unk'), ('pondo', 'D--------')],
 [('praemissis', 'Unk'), ('vini', 'N-S---NG-'), ('sextariis', 'Unk')],
 [('coctura', 'Unk'), ('mellis', 'Unk'), ('vinum', 'N-S---NA-')],
 [('mellis', 'Unk'), ('vinum', 'N-S---NA-'), ('decoquas', 'Unk')],
 [('coeperi', 'Unk'), ('vini', 'N-S---NG-'), ('rore', 'N-S---MB-')],
 [('dactilis', 'Unk'), ('vino', 'N-S---NB-'), ('molliti', 'Unk')],
 [('suffusione', 'Unk'), ('vini', 'N-S---NG-'), ('de', 'R--------')],
 [('supermittis', 'Unk'), ('vini', 'N-S---NG-'), ('lenis', 'A-S---MN-')],
 [('peregrinanti', 'Unk'), ('piper', 'N-S---NA-'), ('tritum', 'T-SRPPMA-')],
 [('cum', 'R--------'), ('melle', 'N-S---NB-'), ('despumato', 'Unk')],
 [('aut', 'C--------'), ('mellis', 'Unk'), ('proferas', 'Unk')],
 [('aut', 'C--------'), ('vinum', 'N-S---NA-'), ('misceas', 'Unk')],
 [('nonnihil', 'Unk'), ('vini', 'N-S---NG-'), ('melizomo', 'Unk')],
 [('propter', 'R--------'), ('mellis', 'Unk'), ('exitum', 'N-S---MA-')],
 [('II', 'Unk'), ('vini'