# Part-of-speech tagging

- Default tagging
- Training a unigram part-of-speech tagger
- Combining taggers with backoff tagging
- Training and combining ngram taggers
- Creating a model of likely word tags
- Tagging with regular expressions
- Affix tagging
- Training a Brill tagger
- Training the TnT tagger
- Using WordNet for tagging
- Tagging proper names
- Classifier-based tagging
- Training a tagger with NLTK-Trainer

## What is part-of-speech tagging?
- It is the process of converting a sentence into a list of tuples
 - a sentence is in the form of a list of words
 - each tuple is of the form (word, tag)  
   - The tag is a part-of-speech tag that signifies whether the word is a noun, adjective, verb, and so on

## Why do we need part-of-speech tagging?
- It is a necessary step before chunking.
- Without the part-of-speech tags, a chunker cannot know how to extract phrases from a sentence.

### Reference
- https://en.wikipedia.org/wiki/Word_sense_disambiguation

## Default tagging

- This simply assigns the same part-of-speech tag to every token.
- This tagger is useful as a last-resort tagger
- It provides a baseline to measure accuracy improvements

### Alternative tags
- Penn Treebank part-of-speech tags
 - http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html

In [1]:
from nltk.tag import DefaultTagger
tagger = DefaultTagger('NN') #NN, NP, ...
tagger.tag(['Hello', 'World'])

[('Hello', 'NN'), ('World', 'NN')]

### Evaluating accuracy

In [2]:
from nltk.corpus import treebank

test_sents = treebank.tagged_sents()[3000:]
tagger.evaluate(test_sents)

0.14331966328512843
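`evaluate()` reports token-level accuracy: the fraction of tokens whose predicted tag equals the gold tag. A minimal sketch of that computation on a tiny hand-made gold corpus (the `token_accuracy` helper and the sentences are hypothetical, for illustration only):

```python
from nltk.tag import DefaultTagger, untag

def token_accuracy(tagger, gold_sents):
    """Hypothetical helper: share of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for gold in gold_sents:
        predicted = tagger.tag(untag(gold))  # strip the gold tags, then re-tag
        for (_, gold_tag), (_, pred_tag) in zip(gold, predicted):
            correct += int(gold_tag == pred_tag)
            total += 1
    return correct / total if total else 0.0

# Tiny hand-made gold corpus (not taken from treebank)
gold_sents = [
    [('the', 'DT'), ('dog', 'NN')],
    [('cats', 'NNS'), ('sleep', 'VBP')],
]

print(token_accuracy(DefaultTagger('NN'), gold_sents))  # 0.25: only 'dog' is really NN
```

This is why a `DefaultTagger` makes a useful baseline: its score is simply how often the chosen tag happens to be correct.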

### Tagging sentences

In [3]:
tagger.tag_sents([['Hello', 'world', '.'], ['How', 'are', 'you', '?']])

[[('Hello', 'NN'), ('world', 'NN'), ('.', 'NN')],
 [('How', 'NN'), ('are', 'NN'), ('you', 'NN'), ('?', 'NN')]]

### Untagging a tagged sentence

In [4]:
from nltk.tag import untag

untag([('Hello', 'NN'), ('World', 'NN')])

['Hello', 'World']

## Training a unigram part-of-speech tagger

### Inheritance relationships
- SequentialBackoffTagger
 - choose_tag()

- ContextTagger(SequentialBackoffTagger)
 - context()

- AffixTagger(ContextTagger)
- NgramTagger(ContextTagger)
 - UnigramTagger(NgramTagger)
 - BigramTagger(NgramTagger)
 - TrigramTagger(NgramTagger)
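Within this hierarchy, a new `ContextTagger` subclass mostly needs a `context()` method: its return value is the key looked up in the model dict. A minimal sketch, assuming NLTK's `ContextTagger(context_to_tag, backoff)` constructor and its internal `_train()` helper; the `LowercaseUnigramTagger` class and the toy data are hypothetical:

```python
from nltk.tag import UnigramTagger
from nltk.tag.sequential import ContextTagger

class LowercaseUnigramTagger(ContextTagger):
    """Hypothetical tagger: like UnigramTagger, but ignores case."""

    def __init__(self, train=None, model=None, backoff=None, cutoff=0):
        ContextTagger.__init__(self, model, backoff)
        if train:
            # _train() counts tags per context() key and keeps the most frequent one
            self._train(train, cutoff)

    def context(self, tokens, index, history):
        # The context key is just the lowercased word
        return tokens[index].lower()

toy_train = [[('The', 'DT'), ('dog', 'NN'), ('barks', 'VBZ')]]

print(UnigramTagger(toy_train).tag(['the', 'DOG']))
# [('the', None), ('DOG', None)]
print(LowercaseUnigramTagger(toy_train).tag(['the', 'DOG']))
# [('the', 'DT'), ('DOG', 'NN')]
```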

In [5]:
from nltk.tag import UnigramTagger
from nltk.corpus import treebank

train_sents = treebank.tagged_sents()[:3000]
tagger = UnigramTagger(train_sents)
treebank.sents()[0]

['Pierre',
 'Vinken',
 ',',
 '61',
 'years',
 'old',
 ',',
 'will',
 'join',
 'the',
 'board',
 'as',
 'a',
 'nonexecutive',
 'director',
 'Nov.',
 '29',
 '.']

In [6]:
treebank.tagged_sents()[0]

[('Pierre', 'NNP'),
 ('Vinken', 'NNP'),
 (',', ','),
 ('61', 'CD'),
 ('years', 'NNS'),
 ('old', 'JJ'),
 (',', ','),
 ('will', 'MD'),
 ('join', 'VB'),
 ('the', 'DT'),
 ('board', 'NN'),
 ('as', 'IN'),
 ('a', 'DT'),
 ('nonexecutive', 'JJ'),
 ('director', 'NN'),
 ('Nov.', 'NNP'),
 ('29', 'CD'),
 ('.', '.')]

In [7]:
tagger.evaluate(test_sents)

0.8571551910209367

### Overriding the context model
- All taggers that inherit from `ContextTagger` can take a pre-built model instead of training their own.
- The pre-built model is a Python dict mapping a context key to a tag

### Use case
- Unless we know exactly what we are doing, we should let the tagger train its own model instead of passing in our own.
- A good reason to pass a self-created model to the `UnigramTagger` class is when we have a dictionary of words and tags, and we know that every word should always map to its tag.
 - We can put this `UnigramTagger` first in the backoff chain, so it handles these words before any trained tagger.
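A minimal sketch of that use case, assuming a small hand-made word-to-tag dictionary (`known_tags` is hypothetical). Here a `DefaultTagger` stands in for the trained tagger chain that the dictionary tagger would normally sit in front of:

```python
from nltk.tag import DefaultTagger, UnigramTagger

# Hypothetical dictionary of words whose tag we are certain of
known_tags = {'Pierre': 'NNP', 'Vinken': 'NNP', 'will': 'MD'}

# In practice the backoff would be a trained tagger chain;
# DefaultTagger keeps this sketch self-contained
fallback = DefaultTagger('NN')
tagger = UnigramTagger(model=known_tags, backoff=fallback)

print(tagger.tag(['Pierre', 'Vinken', 'will', 'join', 'the', 'board']))
# [('Pierre', 'NNP'), ('Vinken', 'NNP'), ('will', 'MD'), ('join', 'NN'), ('the', 'NN'), ('board', 'NN')]
```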

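In [8]:
tagger = UnigramTagger(model = {'Pierre': 'NN'})
tagger.tag(treebank.sents()[0])

[('Pierre', 'NN'),
 ('Vinken', None),
 (',', None),
 ('61', None),
 ('years', None),
 ('old', None),
 (',', None),
 ('will', None),
 ('join', None),
 ('the', None),
 ('board', None),
 ('as', None),
 ('a', None),
 ('nonexecutive', None),
 ('director', None),
 ('Nov.', None),
 ('29', None),
 ('.', None)]

### Minimum frequency cutoff
- The `ContextTagger` class uses frequency of occurrence to decide which tag is most likely for a given context.
- By default, it keeps a context even if the word and tag occurred only once. Passing a `cutoff` value raises the bar: a context is kept only if its most likely tag was seen more than `cutoff` times (the default is 0).
- Dropped contexts return `None`, so without a backoff tagger a higher cutoff usually lowers accuracy.

In [9]: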
tagger = UnigramTagger(train_sents, cutoff = 3)
tagger.evaluate(test_sents)

0.775350744657889
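To see what `cutoff` does to the model, compare model sizes on a tiny toy corpus (hypothetical data). `size()` returns the number of contexts kept in the model; with `cutoff=3`, only words whose most frequent tag was seen more than three times survive:

```python
from nltk.tag import UnigramTagger

# Toy corpus: 'the' occurs 4 times, every noun only once
toy_train = [
    [('the', 'DT'), ('dog', 'NN')],
    [('the', 'DT'), ('cat', 'NN')],
    [('the', 'DT'), ('cow', 'NN')],
    [('the', 'DT'), ('pig', 'NN')],
]

full = UnigramTagger(toy_train)
pruned = UnigramTagger(toy_train, cutoff=3)

print(full.size(), pruned.size())  # 5 1
print(pruned.tag(['the', 'dog']))  # [('the', 'DT'), ('dog', None)]
```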

## Combining taggers with backoff tagging

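- Backoff tagging is one of the core features of `SequentialBackoffTagger`: if a tagger cannot tag a word (its `choose_tag()` returns `None`), the word is passed on to its backoff tagger, and so on down the chain.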

In [10]:
tagger1 = DefaultTagger('NN')
tagger2 = UnigramTagger(train_sents, backoff = tagger1)
tagger2.evaluate(test_sents)

0.8741204403194475
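Conceptually, a backoff chain asks each tagger in turn and keeps the first tag that is not `None`. A simplified re-implementation of that loop, calling each tagger's `choose_tag()` method (the `tag_with_backoff` function is hypothetical; NLTK's own logic lives inside `SequentialBackoffTagger`):

```python
from nltk.tag import DefaultTagger, UnigramTagger

def tag_with_backoff(taggers, tokens):
    """Hypothetical sketch: tag each token with the first tagger that returns a tag."""
    history = []  # tags assigned so far, passed along as context
    for index in range(len(tokens)):
        tag = None
        for tagger in taggers:
            tag = tagger.choose_tag(tokens, index, history)
            if tag is not None:
                break  # stop at the first tagger that has an answer
        history.append(tag)
    return list(zip(tokens, history))

unigram = UnigramTagger(model={'the': 'DT', 'dog': 'NN'})
default = DefaultTagger('NN')

print(tag_with_backoff([unigram, default], ['the', 'dog', 'barks']))
# [('the', 'DT'), ('dog', 'NN'), ('barks', 'NN')]
```

The result should match `UnigramTagger(model=..., backoff=default).tag(...)` for the same tokens.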

### Saving and loading a trained tagger with pickle

In [11]:
import pickle

with open('tagger.pickle', 'wb') as f:
    pickle.dump(tagger, f)

with open('tagger.pickle', 'rb') as f:
    tagger = pickle.load(f)

In [12]:
tagger.evaluate(test_sents)

0.775350744657889

## Training and combining ngram taggers

- BigramTagger
- TrigramTagger
- NgramTagger
 - e.g. quadgram tagger

In [13]:
from nltk.tag import BigramTagger, TrigramTagger
bitagger = BigramTagger(train_sents)
bitagger.evaluate(test_sents)

0.11318799913662854

In [14]:
tritagger = TrigramTagger(train_sents)
tritagger.evaluate(test_sents)

0.06902654867256637
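The bigram and trigram taggers score so low on their own because their context includes the previous tag(s): once a word gets `None`, the next word's context contains `None` too, so it is usually unseen as well. A toy illustration (hypothetical data):

```python
from nltk.tag import BigramTagger, DefaultTagger, UnigramTagger

toy_train = [[('the', 'DT'), ('dog', 'NN'), ('barks', 'VBZ')]]

bigram = BigramTagger(toy_train)
print(bigram.tag(['the', 'dog', 'barks']))
# [('the', 'DT'), ('dog', 'NN'), ('barks', 'VBZ')]  every (previous tag, word) context was seen
print(bigram.tag(['a', 'dog', 'barks']))
# [('a', None), ('dog', None), ('barks', None)]  'a' is unknown, so the None cascades

# With backoff, the unigram and default taggers fill the gaps
chained = BigramTagger(toy_train, backoff=UnigramTagger(toy_train, backoff=DefaultTagger('NN')))
print(chained.tag(['a', 'dog', 'barks']))
# [('a', 'NN'), ('dog', 'NN'), ('barks', 'VBZ')]
```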

In [15]:
def backoff_tagger(train_sents, tagger_classes, backoff = None):
    # Train each class in turn, using the previously trained tagger as its backoff
    for cls in tagger_classes:
        backoff = cls(train_sents, backoff = backoff)

    return backoff

In [16]:
backoff = DefaultTagger('NN')
tagger = backoff_tagger(train_sents, [UnigramTagger, BigramTagger, TrigramTagger], backoff = backoff)
tagger.evaluate(test_sents)

0.8806388948845241

In [17]:
tagger._taggers

[<TrigramTagger: size=845>,
 <BigramTagger: size=1859>,
 <UnigramTagger: size=8818>,
 <DefaultTagger: tag=NN>]

In [18]:
isinstance(tagger._taggers[0], TrigramTagger)

True

### Quadgram tagger

In [19]:
from nltk.tag import NgramTagger
# quadtagger = NgramTagger(4, train_sents)
# quadtagger.evaluate(test_sents)

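# NgramTagger(4, train_sents) works on its own, but backoff_tagger() calls
# cls(train_sents, backoff=...), so we fix n=4 in a subclass
class QuadgramTagger(NgramTagger):
    def __init__(self, *args, **kwargs):
        NgramTagger.__init__(self, 4, *args, **kwargs)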

In [20]:
quadtagger = backoff_tagger(train_sents, [UnigramTagger, BigramTagger, TrigramTagger, QuadgramTagger], backoff = backoff)
quadtagger.evaluate(test_sents)

0.8805093891646881

## Creating a model of likely word tags

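- The `word_tag_model()` function takes three arguments:
 - a list of all words
 - a list of all tagged words
 - the maximum number of words we want to use for our model
- It returns a dict mapping each of the `limit` most frequent words to its most frequent tag, which can be passed to `UnigramTagger` as a model.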

In [21]:
from nltk.probability import FreqDist, ConditionalFreqDist

def word_tag_model(words, tagged_words, limit = 200):
    fd = FreqDist(words)
    cfd = ConditionalFreqDist(tagged_words)
    most_freq = (word for word, count in fd.most_common(limit))
    return dict((word, cfd[word].max()) for word in most_freq)

In [22]:
from nltk.corpus import treebank

model = word_tag_model(treebank.words(), treebank.tagged_words())
tagger = UnigramTagger(model = model)
tagger.evaluate(test_sents)

0.5593352039715087

In [23]:
default_tagger = DefaultTagger('NN')
likely_tagger = UnigramTagger(model = model, backoff = default_tagger)
tagger = backoff_tagger(train_sents, [UnigramTagger, BigramTagger, TrigramTagger], backoff = likely_tagger)
tagger.evaluate(test_sents)

0.8806388948845241

In [24]:
tagger = backoff_tagger(train_sents, [UnigramTagger, BigramTagger, TrigramTagger], backoff = default_tagger)
likely_tagger = UnigramTagger(model = model, backoff = tagger)
likely_tagger.evaluate(test_sents)

0.8817181092164904

## Tagging with regular expressions

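The `RegexpTagger` matches each word against a list of `(regex, tag)` patterns in order and uses the tag of the first pattern that matches; words that match no pattern get `None`. This makes it useful as a backoff tagger.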

In [25]:
patterns = [
    (r'^\d+$', 'CD'), # Cardinal number
    (r'.*ing$', 'VBG'), # gerunds, e.g. wondering
    (r'.*ment$', 'NN'), # e.g. wonderment
    (r'.*ful$', 'JJ'), # e.g. wonderful
]

In [26]:
from nltk.tag import RegexpTagger

tagger = RegexpTagger(patterns)
tagger.evaluate(test_sents)

0.037470321605870924
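Used alone, the patterns above cover very few words, which explains the low score. A sketch of using `RegexpTagger` as a backoff instead, with an extra catch-all pattern (an addition, not part of the list above) so every word gets some tag; the one-word `UnigramTagger` model is hypothetical:

```python
from nltk.tag import RegexpTagger, UnigramTagger

patterns = [
    (r'^\d+$', 'CD'),    # cardinal numbers
    (r'.*ing$', 'VBG'),  # gerunds
    (r'.*ment$', 'NN'),  # nouns such as "payment"
    (r'.*ful$', 'JJ'),   # adjectives such as "wonderful"
    (r'.*', 'NN'),       # catch-all: anything else becomes NN
]

regexp_tagger = RegexpTagger(patterns)
tagger = UnigramTagger(model={'the': 'DT'}, backoff=regexp_tagger)

print(tagger.tag(['the', 'wonderful', 'payment', 'of', '42', 'dogs']))
# [('the', 'DT'), ('wonderful', 'JJ'), ('payment', 'NN'), ('of', 'NN'), ('42', 'CD'), ('dogs', 'NN')]
```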

## Affix tagging

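- The `AffixTagger` class is another `ContextTagger` subclass; its context is a fixed-length prefix or suffix of the word.
- The `affix_length` keyword argument sets the affix: a positive value uses a prefix, a negative value uses a suffix, and the default is -3 (the last three characters).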

### Working with min_stem_length
- The AffixTagger class also takes a min_stem_length keyword argument
 - The default value is 2.
 - If the word length is less than min_stem_length plus the absolute value of affix_length, then None is returned by the context() method.
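A small sketch of how `affix_length` and `min_stem_length` decide the context, using hand-made models instead of training (the `{'ing': 'VBG'}` and `{'un': 'JJ'}` models are hypothetical). With the defaults `affix_length=-3` and `min_stem_length=2`, a word needs at least 5 characters before its last three letters are used:

```python
from nltk.tag import AffixTagger

# Defaults: affix_length=-3 (3-letter suffix), min_stem_length=2
suffix_tagger = AffixTagger(model={'ing': 'VBG'})
print(suffix_tagger.tag(['running', 'sing']))
# [('running', 'VBG'), ('sing', None)]  'sing' has 4 < 2 + 3 characters

print(suffix_tagger.context(['running'], 0, []))  # ing
print(suffix_tagger.context(['sing'], 0, []))     # None

# Positive affix_length: a 2-letter prefix, so words need at least 2 + 2 characters
prefix_tagger = AffixTagger(model={'un': 'JJ'}, affix_length=2)
print(prefix_tagger.tag(['unhappy', 'up']))
# [('unhappy', 'JJ'), ('up', None)]
```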

In [27]:
from nltk.tag import AffixTagger
tagger = AffixTagger(train_sents)
tagger.evaluate(test_sents)

0.27507014893157783

In [28]:
prefix_tagger = AffixTagger(train_sents, affix_length = 3)
prefix_tagger.evaluate(test_sents)

0.2365637815670192

In [29]:
suffix_tagger = AffixTagger(train_sents, affix_length = -2)
suffix_tagger.evaluate(test_sents)

0.3196201165551478

## Training a Brill tagger

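### What is a Brill tagger?
- A Brill tagger is a transformation-based tagger: it starts from the tags produced by an initial tagger, then learns an ordered list of rules that correct those tags. Each rule is an instance of one of the templates we provide.
- `brill.Template(brill.Pos([-1]))` means that a rule can be generated using the previous part-of-speech tag.
- `brill.Template(brill.Pos([1]))` means that we can look at the next part-of-speech tag to generate a rule.
- `brill.Template(brill.Word([-2, -1]))` means that we can look at the combination of the previous two words to learn a transformation rule.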

In [30]:
from nltk.tag import brill, brill_trainer

def train_brill_tagger(initial_tagger, train_sents, **kwargs):
    templates = [
        brill.Template(brill.Pos([-1])),
        brill.Template(brill.Pos([1])),
        brill.Template(brill.Pos([-2])),
        brill.Template(brill.Pos([2])),
        brill.Template(brill.Pos([-2, -1])),
        brill.Template(brill.Pos([1, 2])),
        brill.Template(brill.Pos([-3, -2, -1])),
        brill.Template(brill.Pos([1, 2, 3])),
        brill.Template(brill.Pos([-1]), brill.Pos([1])),
        brill.Template(brill.Word([-1])),
        brill.Template(brill.Word([1])),
        brill.Template(brill.Word([-2])),
        brill.Template(brill.Word([2])),
        brill.Template(brill.Word([-2, -1])),
        brill.Template(brill.Word([1, 2])),
        brill.Template(brill.Word([-3, -2, -1])),
        brill.Template(brill.Word([1, 2, 3])),
        brill.Template(brill.Word([-1]), brill.Word([1])),
    ]
    
    trainer = brill_trainer.BrillTaggerTrainer(initial_tagger, templates, deterministic = True)

    return trainer.train(train_sents, **kwargs)

In [31]:
default_tagger = DefaultTagger('NN')
initial_tagger = backoff_tagger(train_sents, [UnigramTagger, BigramTagger, TrigramTagger], backoff = default_tagger)
initial_tagger.evaluate(test_sents)

0.8806388948845241

In [32]:
brill_tagger = train_brill_tagger(initial_tagger, train_sents)
brill_tagger.evaluate(test_sents)

0.8822361320958342
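The learned rules can be inspected with `rules()`. A toy sketch (hypothetical data, only two templates) where a unigram initial tagger always tags "can" as a modal, and Brill training learns one rule to fix it after "the". `min_score=1` is passed because the toy corpus contains a single error to fix (the default is 2):

```python
from nltk.tag import UnigramTagger, brill, brill_trainer

# Toy corpus: 'can' is usually a modal, but a noun after 'the'
toy_train = [
    [('I', 'PRP'), ('can', 'MD'), ('swim', 'VB')],
    [('I', 'PRP'), ('can', 'MD'), ('run', 'VB')],
    [('the', 'DT'), ('can', 'NN'), ('rusted', 'VBD')],
]

initial = UnigramTagger(toy_train)  # tags 'can' as MD everywhere (2 votes vs 1)
templates = [
    brill.Template(brill.Pos([-1])),   # previous tag
    brill.Template(brill.Word([-1])),  # previous word
]
trainer = brill_trainer.BrillTaggerTrainer(initial, templates, trace=0, deterministic=True)
brill_tagger = trainer.train(toy_train, max_rules=10, min_score=1)

for rule in brill_tagger.rules():
    print(rule)  # a rule rewriting MD to NN after DT / after 'the'

print(brill_tagger.tag(['the', 'can', 'rusted']))
# [('the', 'DT'), ('can', 'NN'), ('rusted', 'VBD')]
```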

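### Tracing
- `BrillTaggerTrainer` accepts a `trace` keyword argument that controls how much diagnostic output is printed during training (0 prints nothing; higher values print more). `train_brill_tagger()` does not pass it through, so it would have to be added to the `BrillTaggerTrainer(...)` call.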

## Training the TnT tagger

- TnT stands for Trigrams'n'Tags.
- This is a statistical tagger based on second order Markov models
- http://www.coli.uni-saarland.de/~thorsten/tnt/

In [33]:
from nltk.tag import tnt
tnt_tagger = tnt.TnT()
tnt_tagger.train(train_sents)
tnt_tagger.evaluate(test_sents)

0.875545003237643

### TnT()
- The TnT tagger accepts a few optional keyword arguments.
- We can pass in a tagger for unknown words as `unk`.
- If the `unk` tagger is already trained (or, like `DefaultTagger`, needs no training), then we must also pass `Trained = True`; otherwise TnT will call `unk.train(data)` with the same data we pass into its `train()` method.

In [34]:
unk = DefaultTagger('NN')
tnt_tagger = tnt.TnT(unk = unk, Trained = True)
tnt_tagger.train(train_sents)
tnt_tagger.evaluate(test_sents)

0.892467083962875

### Controlling the beam search

- The `N` parameter controls the number of possible solutions the tagger maintains while trying to guess the tags for a sentence.
- The default value of `N` is 1000.
- `N` influences both memory and accuracy: increasing it uses more memory without necessarily improving accuracy, while decreasing it saves memory but may lower accuracy.
- We shouldn't assume that accuracy will stay the same if we decrease `N`; we should experiment with our own data to be sure.

In [36]:
tnt_tagger = tnt.TnT(N = 100)
tnt_tagger.train(train_sents)
tnt_tagger.evaluate(test_sents)

0.875545003237643

### Significance of capitalization

- We can pass `C = True` to the TnT constructor if we want capitalization of words to be significant.
- The default is C = False.

## Using WordNet for tagging

In [46]:
from nltk.tag import SequentialBackoffTagger
from nltk.corpus import wordnet
from nltk.probability import FreqDist

class WordNetTagger(SequentialBackoffTagger):
    '''
    >>> wt = WordNetTagger()
    >>> wt.tag(['food', 'is', 'great'])
    [('food', 'NN'), ('is', 'VB'), ('great', 'JJ')]
    '''
    def __init__(self, *args, **kwargs):
        SequentialBackoffTagger.__init__(self, *args, **kwargs)
        
        self.wordnet_tag_map = {
            'n': 'NN',
            's': 'JJ',
            'a': 'JJ',
            'r': 'RB',
            'v': 'VB'
        }
    
    def choose_tag(self, tokens, index, history):
        word = tokens[index]
        fd = FreqDist()
        
        for synset in wordnet.synsets(word):
            fd[synset.pos()] += 1

        try:
            return self.wordnet_tag_map.get(fd.max())
        except ValueError:
            # fd.max() raises ValueError when WordNet has no synsets for the word
            return self.wordnet_tag_map['n']

wn_tagger = WordNetTagger()
wn_tagger.evaluate(train_sents)

0.18737985576240793

In [48]:
from nltk.tag import UnigramTagger, BigramTagger, TrigramTagger

tagger = backoff_tagger(train_sents, [UnigramTagger, BigramTagger, TrigramTagger], backoff = wn_tagger)
tagger.evaluate(test_sents)

0.8865098208504208

## Tagging proper names

In [52]:
from nltk.tag import SequentialBackoffTagger
from nltk.corpus import names

class NamesTagger(SequentialBackoffTagger):
    def __init__(self, *args, **kwargs):
        SequentialBackoffTagger.__init__(self, *args, **kwargs)
        self.name_set = set([n.lower() for n in names.words()])
        
    def choose_tag(self, tokens, index, history):
        word = tokens[index]

        if word.lower() in self.name_set:
            return 'NNP'
        else:
            return None

nt = NamesTagger()
nt.tag(['Jacob'])

[('Jacob', 'NNP')]

## Classifier-based tagging

- The `ClassifierBasedPOSTagger` class uses classification to do part-of-speech tagging.
 - It is a subclass of `ClassifierBasedTagger`.
 - It implements a feature detector that combines many of the techniques of the previous taggers into a single feature set. The feature detector:
   - finds suffixes of multiple lengths
   - does some regular expression matching
   - looks at the unigram, bigram, and trigram history
 - By default it trains a `NaiveBayesClassifier`; we can use a different classifier by passing in our own `classifier_builder` function.
- The `ClassifierBasedTagger` class is often the most accurate tagger, but it's also one of the slowest taggers.

In [53]:
from nltk.tag.sequential import ClassifierBasedPOSTagger

tagger = ClassifierBasedPOSTagger(train = train_sents)
tagger.evaluate(test_sents)

0.9309734513274336

In [54]:
from nltk.classify import MaxentClassifier

me_tagger = ClassifierBasedPOSTagger(train = train_sents,
                                     classifier_builder = MaxentClassifier.train)
me_tagger.evaluate(test_sents)

  ==> Training (100 iterations)

      Iteration    Log Likelihood    Accuracy
      ---------------------------------------
             1          -3.82864        0.008
             2          -0.76859        0.957


         Final               nan        0.984


0.9258363911072739

### Detecting features with a custom feature detector
- If we want to do our own feature detection, there are two ways to do it:
  1. Subclass `ClassifierBasedTagger` and implement a `feature_detector()` method.
  2. Pass a function as the `feature_detector` keyword argument into `ClassifierBasedTagger` at initialization.
- Either way, we need a feature detection method that takes the same arguments as `choose_tag()` (`tokens`, `index`, `history`) and returns a `dict` of key-value features:
  - the key is the feature name
  - the value is the feature value
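A minimal sketch of the second option on a toy corpus (the `toy_feature_detector` function and the data are hypothetical). It shows that `history` holds the tags already assigned to the preceding tokens, so features can depend on the previous tag:

```python
from nltk.tag.sequential import ClassifierBasedTagger

def toy_feature_detector(tokens, index, history):
    """Hypothetical detector: word, 2-letter suffix, and previous tag."""
    word = tokens[index]
    return {
        'word': word.lower(),
        'suffix2': word[-2:].lower(),
        # history holds the tags already assigned to tokens[:index]
        'prev_tag': history[index - 1] if index > 0 else '<START>',
    }

toy_train = [
    [('the', 'DT'), ('dog', 'NN'), ('barks', 'VBZ')],
    [('a', 'DT'), ('cat', 'NN'), ('sleeps', 'VBZ')],
    [('the', 'DT'), ('bird', 'NN'), ('sings', 'VBZ')],
]

# Trains a NaiveBayesClassifier (the default classifier_builder) on these features
tagger = ClassifierBasedTagger(train=toy_train, feature_detector=toy_feature_detector)
print(tagger.tag(['the', 'frog', 'barks']))
# [('the', 'DT'), ('frog', 'NN'), ('barks', 'VBZ')]
```

The unseen word "frog" is still tagged `NN` here because its suffix matches "dog" and its previous tag is `DT`.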

### Setting a cutoff probability
If we want to use a backoff tagger, we have to pass in a `cutoff_prob` argument to specify the probability threshold for classification. Then, if the probability of the chosen tag is less than `cutoff_prob`, the backoff tagger will be used.

### Using a pre-trained classifier
If we want to use a classifier that's already been trained, then we can pass that into `ClassifierBasedTagger` or `ClassifierBasedPOSTagger` as the `classifier`.
- In this case, the `classifier_builder` argument is ignored and no training takes place.
- We must ensure that the classifier has been trained on, and can classify, feature sets produced by whatever `feature_detector()` method we use.
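A minimal sketch of passing a pre-trained classifier, assuming a toy corpus and a hypothetical one-feature detector. The training featuresets are built with the same detector the tagger will use, which is exactly the requirement above:

```python
from nltk.classify import NaiveBayesClassifier
from nltk.tag.sequential import ClassifierBasedTagger

def word_features(tokens, index, history):
    """Hypothetical detector: just the lowercased word."""
    return {'word': tokens[index].lower()}

toy_train = [
    [('the', 'DT'), ('dog', 'NN'), ('barks', 'VBZ')],
    [('a', 'DT'), ('cat', 'NN'), ('sleeps', 'VBZ')],
]

# Build (featureset, tag) pairs with the SAME detector the tagger will use
train_feats = []
for sent in toy_train:
    words = [word for word, _ in sent]
    for index, (_, tag) in enumerate(sent):
        history = [t for _, t in sent[:index]]
        train_feats.append((word_features(words, index, history), tag))

classifier = NaiveBayesClassifier.train(train_feats)

# No train= argument: the pre-trained classifier is used as-is
tagger = ClassifierBasedTagger(feature_detector=word_features, classifier=classifier)
print(tagger.tag(['a', 'dog', 'sleeps']))
# [('a', 'DT'), ('dog', 'NN'), ('sleeps', 'VBZ')]
```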

In [60]:
def unigram_feature_detector(tokens, index, history):
    return {'word': tokens[index]}

from nltk.tag.sequential import ClassifierBasedTagger
tagger = ClassifierBasedTagger(train = train_sents,
                               feature_detector = unigram_feature_detector)

print(tagger.evaluate(test_sents))

default = DefaultTagger('NN')
tagger = ClassifierBasedPOSTagger(train = train_sents,
                                  backoff = default,
                                  cutoff_prob = 0.3)
print(tagger.evaluate(test_sents))

0.8733865745737104
0.9311029570472696


## Training a tagger with NLTK-Trainer
- NLTK-Trainer is a collection of scripts that give us the ability to run training experiments without writing a single line of code.
- Project link: https://github.com/japerk/nltk-trainer
- Documentation: http://nltk-trainer.readthedocs.org/

- NLTK-Trainer provides several scripts for taggers:
 - `train_tagger.py` can save a pickled tagger, train on a custom corpus, and train with universal tags
 - `analyze_tagger_coverage.py` can analyze a tagger against a tagged corpus
 - `analyze_tagged_corpus.py` can analyze a tagged corpus