# Lab 2: Evaluating an N-Gram Language Model

In this lab, you will evaluate the quality of an n-gram language model using perplexity.

We have built several n-gram language models and provided an implementation for computing their probabilities. The implementation includes [Laplace (additive) smoothing](https://en.wikipedia.org/wiki/Additive_smoothing), which assigns some probability to n-grams that were never encountered during training.
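To make the smoothing concrete, here is a minimal sketch of the additive estimate that `get_prob` computes, (count(context, w) + k) / (count(context) + k·|V|), applied to made-up counts for a single context. Here count(context) is the total number of tokens that followed the context in training. The helper name `laplace_prob` and the toy counts are hypothetical. `k` plays the role of the class's `smoothing` parameter: k = 1 is classic Laplace smoothing, while the class defaults to 0.001.

```python
from typing import Dict


def laplace_prob(counts: Dict[str, int], token: str, vocab_size: int, k: float = 0.001) -> float:
    """Additive (Laplace) smoothed estimate of P(token | context).

    counts: how often each token followed one particular context (hypothetical toy data).
    """
    total = sum(counts.values())
    return (counts.get(token, 0) + k) / (total + k * vocab_size)


# Toy counts for one context, e.g. ('dear',) -> next token (illustrative only)
counts = {'Watson': 3, 'Holmes': 1}
vocab = ['Watson', 'Holmes', 'sir', 'madam']

for tok in vocab:
    print(tok, laplace_prob(counts, tok, len(vocab), k=1.0))
# Watson 0.5
# Holmes 0.25
# sir 0.125
# madam 0.125
```

The unseen tokens `sir` and `madam` receive non-zero probability, and the smoothed values still sum to 1 over the vocabulary.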

First, review the implementation below to make sure that it makes sense to you.

```python
import pickle

BOS = '<BOS>'
EOS = '<EOS>'
OOV = '<OOV>'


class NGramLM:
    def __init__(self, path, smoothing=0.001, verbose=False):
        with open(path, 'rb') as fin:
            data = pickle.load(fin)
        self.n = data['n']
        self.V = set(data['V'])
        self.model = data['model']
        self.smoothing = smoothing
        self.verbose = verbose

    def get_prob(self, context, token):
        # Keep only the n-1 most recent context tokens (Markov assumption);
        # for a unigram model (n=1) this leaves an empty context
        context = tuple(context[max(0, len(context) - (self.n - 1)):])
        # Add <BOS> tokens if the context is too short, i.e., we are at the start of the sequence
        while len(context) < (self.n - 1):
            context = (BOS,) + context
        # Replace words not encountered during training with the special <OOV> token
        context = tuple((c if c in self.V else OOV) for c in context)
        if token not in self.V:
            token = OOV
        if context in self.model:
            # Maximum likelihood estimate with additive (Laplace) smoothing
            count = self.model[context].get(token, 0)
            prob = (count + self.smoothing) / (sum(self.model[context].values()) + self.smoothing * len(self.V))
        else:
            # Context never seen in training: fall back to a uniform distribution over the vocabulary
            prob = 1 / len(self.V)
        # Optional logging
        if self.verbose:
            print(f'{prob:.4n}', *context, '->', token)
        return prob
```
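The context handling at the top of `get_prob` can be tried in isolation. The sketch below uses a hypothetical standalone helper, `prepare_context`, that mirrors those three steps: keep the last n-1 tokens, left-pad with `<BOS>`, and map unknown words to `<OOV>`. It is written so that a unigram model (n = 1) keeps no history at all. As in `get_prob`, padding happens before the vocabulary check, so `<BOS>` itself would become `<OOV>` if it were missing from the vocabulary. The toy vocabulary here includes it.

```python
from typing import Sequence, Set, Tuple

BOS = '<BOS>'
OOV = '<OOV>'


def prepare_context(context: Sequence[str], n: int, vocab: Set[str]) -> Tuple[str, ...]:
    """Hypothetical helper mirroring the context preparation in NGramLM.get_prob."""
    # Markov assumption: keep only the last n-1 tokens (none at all when n == 1)
    history = tuple(context[max(0, len(context) - (n - 1)):])
    # Left-pad with <BOS> when we are near the start of the sequence
    history = (BOS,) * (n - 1 - len(history)) + history
    # Map words outside the vocabulary to <OOV>
    return tuple(c if c in vocab else OOV for c in history)


vocab = {BOS, 'My', 'dear', 'Watson'}
print(prepare_context(['My', 'dear', 'Watson'], 3, vocab))  # ('dear', 'Watson')
print(prepare_context(['My'], 3, vocab))                    # ('<BOS>', 'My')
print(prepare_context(['My', 'dear', 'Watson'], 1, vocab))  # ()
print(prepare_context(['Sherlock'], 2, vocab))              # ('<OOV>',)
```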

```python
# Load the pre-built n-gram language models
model_unigram = NGramLM('arthur-conan-doyle.tok.train.n1.pkl')
model_bigram = NGramLM('arthur-conan-doyle.tok.train.n2.pkl')
model_trigram = NGramLM('arthur-conan-doyle.tok.train.n3.pkl')
model_4gram = NGramLM('arthur-conan-doyle.tok.train.n4.pkl')
model_5gram = NGramLM('arthur-conan-doyle.tok.train.n5.pkl')
```
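If the pickled models are not available (loading fails with `FileNotFoundError` when the `.pkl` files are not in the working directory), a tiny toy model can be built for experimentation. This is a sketch that assumes the file format implied by `NGramLM.__init__`: a dict with keys `'n'`, `'V'`, and `'model'`, where `'model'` maps each context tuple of n-1 tokens to a dict of next-token counts. The helper `build_ngram_counts`, the toy sentences, the file name `toy.n2.pkl`, and the choice to put `<BOS>`, `<EOS>` and `<OOV>` in the vocabulary are all hypothetical; the provided files were not necessarily built this way.

```python
import os
import pickle
import tempfile
from collections import defaultdict
from typing import Dict, List, Tuple

BOS = '<BOS>'
EOS = '<EOS>'
OOV = '<OOV>'


def build_ngram_counts(sentences: List[List[str]], n: int) -> Dict[Tuple[str, ...], Dict[str, int]]:
    """Count how often each token follows each (n-1)-token context (hypothetical helper)."""
    counts = defaultdict(lambda: defaultdict(int))
    for sent in sentences:
        # Pad with n-1 <BOS> tokens and close the sentence with <EOS>
        tokens = [BOS] * (n - 1) + sent + [EOS]
        for i in range(n - 1, len(tokens)):
            context = tuple(tokens[i - n + 1:i])  # empty tuple when n == 1
            counts[context][tokens[i]] += 1
    # Convert to plain dicts: pickle cannot serialize the lambda default factory
    return {ctx: dict(nxt) for ctx, nxt in counts.items()}


sentences = [['My', 'dear', 'Watson', '.'], ['My', 'dear', 'fellow', '.']]
data = {
    'n': 2,
    # Assumption: the vocabulary includes the special tokens
    'V': sorted({tok for sent in sentences for tok in sent} | {BOS, EOS, OOV}),
    'model': build_ngram_counts(sentences, n=2),
}
path = os.path.join(tempfile.gettempdir(), 'toy.n2.pkl')
with open(path, 'wb') as fout:
    pickle.dump(data, fout)

print(data['model'][('dear',)])  # {'Watson': 1, 'fellow': 1}
```

With `NGramLM` defined as above, `NGramLM(path)` should load this toy bigram model. Its vocabulary has 8 entries, so with the default smoothing of 0.001, `get_prob(['dear'], 'Watson')` would be (1 + 0.001) / (2 + 0.008), roughly 0.4985.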


Now it's time to see how well these models fit our data! We'll use perplexity for this, but it's up to you to implement it below.

Recall the formula for perplexity from the lecture:

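$$
\text{perplexity} = 2^{-\frac{1}{N}\sum_{i=1}^{N} \log_2 P(w_i \mid w_{<i})}
$$

where $N$ is the total number of tokens being scored (not the n-gram order $n$).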

Hint: you'll want to use the [`math.log2`](https://docs.python.org/3/library/math.html#math.log2) function.
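As a worked check of the formula, here is a sketch that computes perplexity directly from a list of per-token probabilities $P(w_i \mid w_{<i})$. The helper name `perplexity_from_probs` is hypothetical, and it is not the `perplexity` function you are asked to write, which must obtain these probabilities from the model. Two tokens with probabilities 0.5 and 0.125 have log2-probabilities -1 and -3, an average of -2, and hence a perplexity of 2^2 = 4. A model that assigns every token probability 1/8 has perplexity 8: it is as uncertain as a uniform choice among 8 words.

```python
import math
from typing import List


def perplexity_from_probs(probs: List[float]) -> float:
    """Perplexity of a sequence, given the probability assigned to each token."""
    # Average log2-probability per token
    avg_log2 = sum(math.log2(p) for p in probs) / len(probs)
    # Negate and exponentiate with base 2
    return 2 ** (-avg_log2)


print(perplexity_from_probs([0.5, 0.125]))  # 4.0
print(perplexity_from_probs([1 / 8] * 4))   # 8.0
```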

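```python
import math
from typing import List, Tuple


def perplexity(model: NGramLM, texts: List[Tuple[str, ...]]) -> float:
    log_prob_sum = 0.0
    num_tokens = 0
    for text in texts:
        # Score each token given its preceding tokens; also score a closing <EOS>
        # (an assumption: EOS is defined above, but its use here is not specified)
        tokens = list(text) + [EOS]
        for i, token in enumerate(tokens):
            log_prob_sum += math.log2(model.get_prob(tokens[:i], token))
            num_tokens += 1
    # 2 raised to the negative average log2-probability per token
    return 2 ** (-log_prob_sum / num_tokens)


# Example:
perplexity(model_unigram, [('My', 'dear', 'Watson', '.'), ('Come', 'over', 'here', '!')])
```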