# Language Modeling Using N-grams

In this exercise, we are going to build n-gram language models: a unigram baseline, a bigram model, and several smoothed variations of it. We will build one model of each of the following types and calculate its perplexity:
- Unigram Model
- Bigram Model
- Bigram Model with Laplace smoothing
- Bigram Model with Interpolation
- Bigram Model with Kneser-Ney Interpolation

We will also use NLTK, a natural language processing library for Python, to make our lives easier.



In [1]:
# #download corpus
!wget --no-check-certificate https://github.com/ekapolc/nlp_2019/raw/master/HW4/BEST2010.zip
!unzip BEST2010.zip

--2025-01-16 15:24:48--  https://github.com/ekapolc/nlp_2019/raw/master/HW4/BEST2010.zip
Resolving github.com (github.com)... 4.237.22.38
Connecting to github.com (github.com)|4.237.22.38|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/ekapolc/nlp_2019/master/HW4/BEST2010.zip [following]
--2025-01-16 15:24:48--  https://raw.githubusercontent.com/ekapolc/nlp_2019/master/HW4/BEST2010.zip
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.109.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7423530 (7.1M) [application/zip]
Saving to: ‘BEST2010.zip’


2025-01-16 15:24:49 (202 MB/s) - ‘BEST2010.zip’ saved [7423530/7423530]

Archive:  BEST2010.zip
   creating: BEST2010/
  inflating: BEST2010/article.txt    
  inflating: BEST2010/encyclopedia.txt  
  infla

In [2]:
!wget https://www.dropbox.com/s/jajdlqnp5h0ywvo/tokenized_wiki_sample.csv

--2025-01-16 15:24:50--  https://www.dropbox.com/s/jajdlqnp5h0ywvo/tokenized_wiki_sample.csv
Resolving www.dropbox.com (www.dropbox.com)... 162.125.83.18, 2620:100:6033:18::a27d:5312
Connecting to www.dropbox.com (www.dropbox.com)|162.125.83.18|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://www.dropbox.com/scl/fi/88uzig0mno1b57d6bhwht/tokenized_wiki_sample.csv?rlkey=oya9jw1rljj31jc49fvoaty01 [following]
--2025-01-16 15:24:50--  https://www.dropbox.com/scl/fi/88uzig0mno1b57d6bhwht/tokenized_wiki_sample.csv?rlkey=oya9jw1rljj31jc49fvoaty01
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://uc56b90ee3c8c7c0764750d1d7c8.dl.dropboxusercontent.com/cd/0/inline/CiRSlqf6B9Da_iFFGi-S39GrYCInpwT-VPOhSNeAT0A8lMLBRiudLYPmZdi5JabQGWQ1PCVb3K1KjC_OJgyTYBW69yVtfOVQvNgIEZuN-eD0d9Uts-jxt3zWDv2wuiZDby8LxfT5Oti3cj3fQDXDo2ln/file# [following]
--2025-01-16 15:24:51--  https://uc56b90ee3c8c7c0764750d1d7

In [3]:
# First we import the necessary libraries: math, nltk (including its bigrams helper), and collections.
import math
import nltk
import io
import random
from random import shuffle
from nltk import bigrams, trigrams
from collections import Counter, defaultdict
random.seed(999)

BEST2010 is a free Thai NLP dataset by NECTEC, commonly used as a standard benchmark for various NLP tasks, including language modeling. It is divided into four domains: article, encyclopedia, news, and novel. The data is already tokenized, using '|' as a separator.

For example,

ตาม|ที่|นางประนอม ทองจันทร์| |กับ| |ด.ช.กิตติพงษ์ แหลมผักแว่น| |และ| |ด.ญ.กาญจนา กรองแก้ว| |ป่วย|สงสัย|ติด|เชื้อ|ไข้|ขณะ|นี้|ยัง|ไม่|ดี|ขึ้น|
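As a quick illustration of this format, a line can be turned into a list of tokens with plain string operations. The sample line below is made up for the example and is not taken from the corpus. Note that the single-space tokens between words are kept as tokens of their own.

```python
# Hypothetical sample line in the BEST2010 format: tokens separated by '|',
# with a trailing '|' at the end of the line.
sample_line = "ตาม|ที่| |กับ| |และ|\n"

# Strip the newline, drop the trailing separator, then split on '|'.
tokens = sample_line.strip()[:-1].split("|")
print(tokens)
```

Without the `[:-1]` slice, the trailing separator would produce an extra empty token at the end of every sentence.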

In [4]:
total_word_count = 0
best2010 = []
with open('BEST2010/news.txt','r',encoding='utf-8') as f:
  for i,line in enumerate(f):
    line=line.strip()[:-1] #remove the trailing |
    total_word_count += len(line.split("|"))
    best2010.append(line)

In [5]:
#For simplicity, we assume that each line is a sentence.
print (f'Total sentences in BEST2010 news dataset :\t{len(best2010)}')
print (f'Total word counts in BEST2010 news dataset :\t{total_word_count}')

Total sentences in BEST2010 news dataset :	30969
Total word counts in BEST2010 news dataset :	1660190


We split the input into two sets, train and test, with a 70:30 ratio.

In [6]:
sentences = best2010
# The data is split into train and test sets with a 70:30 ratio.
train = sentences[:int(len(sentences)*0.7)]
test = sentences[int(len(sentences)*0.7):]

#Training data
train_word_count =0
for line in train:
    for word in line.split('|'):
        train_word_count+=1
print ('Total sentences in BEST2010 news training dataset :\t'+ str(len(train)))
print ('Total word counts in BEST2010 news training dataset :\t'+ str(train_word_count))

Total sentences in BEST2010 news training dataset :	21678
Total word counts in BEST2010 news training dataset :	1042797


Here we load the data from Wikipedia, which is also already tokenized. It will be used for answering questions in MyCourseVille.

In [7]:
import pandas as pd
wiki_data = pd.read_csv("tokenized_wiki_sample.csv")

## Data Preprocessing

Before training any language model, the first step is always to process the data into a format suited to the LM.

For this exercise, we will use NLTK to help process our data.

In [8]:
!pip install -U nltk

Collecting nltk
  Downloading nltk-3.9.1-py3-none-any.whl.metadata (2.9 kB)
Downloading nltk-3.9.1-py3-none-any.whl (1.5 MB)
Installing collected packages: nltk
  Attempting uninstall: nltk
    Found existing installation: nltk 3.2.4
    Uninstalling nltk-3.2.4:
      Successfully uninstalled nltk-3.2.4
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
preprocessing 0.1.13 requires nltk==3.2.4, but you have nltk 3.9.1 which is incompatible.
Successfully installed nltk-3.9.1


In [9]:
from nltk.lm.preprocessing import pad_both_ends, flatten
from nltk.lm.vocabulary import Vocabulary
from nltk import ngrams

We begin by "tokenizing" our training set. Note that the data is already tokenized so we can just split it.

In [10]:
tokenized_train = [["<s>"] + t.split("|") + ["</s>"] for t in train] # "tokenize" each sentence

Next we create a vocabulary with the ```Vocabulary``` class from NLTK. It accepts a list of tokens so we flatten our sentences into one long sentence first.
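To make the cutoff behaviour concrete, here is a minimal stand-in written with `collections.Counter` rather than NLTK itself. The helper name `build_vocab` and the toy tokens are assumptions made for illustration. The sketch keeps every token whose count is at least the cutoff and maps everything else to the unknown symbol.

```python
from collections import Counter

def build_vocab(tokens, unk_cutoff=3):
    """Hypothetical helper: keep tokens seen at least `unk_cutoff` times."""
    counts = Counter(tokens)
    return {tok for tok, c in counts.items() if c >= unk_cutoff}

# Toy data: "a" occurs 3 times, "b" twice, "c" once.
toy_tokens = ["a", "a", "a", "b", "b", "c"]
toy_vocab = build_vocab(toy_tokens, unk_cutoff=3)

# Tokens outside the vocabulary are mapped to the unknown symbol.
mapped = [tok if tok in toy_vocab else "<UNK>" for tok in toy_tokens]
print(mapped)
```

With a cutoff of 3, only "a" survives, so rare words share a single `<UNK>` count instead of each getting an unreliable estimate.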







In [11]:
flat_tokens = list(flatten(tokenized_train)) #join all sentences into one long sentence
vocab = Vocabulary(flat_tokens, unk_cutoff=3) #Words with frequency **below** 3 (not exactly 3) will not be considered in our vocab and will be converted to <UNK>.

Then we replace low-frequency words with \<UNK\> and pad each sentence with \<s\> at the front and \</s\> at the back.

Now *each* sentence is going to look something like this:
\["\<s\>", "hello", "my", "name", "is", "\<UNK\>", "\</s\>" \]

In [12]:
tokenized_train = [[token if token in vocab else "<UNK>" for token in sentence] for sentence in tokenized_train]
padded_tokenized_train = [list(pad_both_ends(sentence, n=2)) for sentence in tokenized_train]

Finally, we do the same for the test set and the wiki dataset.

In [13]:
tokenized_test = [t.split("|") for t in test]
tokenized_test = [[token if token in vocab else "<UNK>" for token in sentence] for sentence in tokenized_test]
padded_tokenized_test = [list(pad_both_ends(sentence, n=2)) for sentence in tokenized_test]

tokenized_wiki_test = [t.split("|") for t in wiki_data['tokenized'].tolist()]
tokenized_wiki_test = [[token if token in vocab else "<UNK>" for token in sentence] for sentence in tokenized_wiki_test]
padded_tokenized_wiki_test = [list(pad_both_ends(sentence, n=2)) for sentence in tokenized_wiki_test]

# Unigram

In this section, we will demonstrate how to build a unigram language model <br>
**Important note:** <br>
**\<s\>** = sentence start symbol <br>
**\</s\>** = sentence end symbol

# VERY IMPORTANT:
- In this notebook, we will *not* default the unknown token probability to ```1/len(vocab)``` but instead will treat it as a normal word and let the model learn its probability so that we can compare our results to NLTK.
- **Also make sure that the code in this notebook can be executed without any problem. If we find that you used NLTK to answer questions in MyCourseVille and did not finish the assignment, you will receive a grade of 0 for this assignment.**

In [14]:
class UnigramModel():
  def __init__(self, data, vocab):
    self.unigram_count = defaultdict(lambda: 0.0)
    self.word_count = 0
    self.vocab = vocab
    for sentence in data:
        for w in sentence: #[(word1, ), (word2, ), (word3, )...]
          w = w[0]
          if w in self.vocab:
            self.unigram_count[w] +=1
          else:
            self.unigram_count["<UNK>"] += 1
          self.word_count+=1

  def __getitem__(self, w):
    w = w[0]  #[(word1, ), (word2, ), (word3, )...]
    if w in self.vocab:
      return self.unigram_count[w]/(self.word_count)
    else:
      return self.unigram_count["<UNK>"]/(self.word_count)

In [15]:
train_unigrams = [list(ngrams(sent, n=1)) for sent in padded_tokenized_train] #creating the unigrams by setting n=1
model = UnigramModel(train_unigrams, vocab)

In [16]:
def getLnValue(x):
      return math.log(x)

In [17]:
# NOTE: UnigramModel.__getitem__ reads w[0], so it expects a 1-tuple such as ('นายก',);
# with a bare string, as used below, only the first character of the word is looked up.

#probability of 'นายก'
print(getLnValue(model['นายก']))

#for example, the probability of 'นายกรัฐมนตรี', which is an unknown word, is equal to
print(getLnValue(model['นายกรัฐมนตรี']))

#probability of 'นายก' 'ได้' 'ให้' 'สัมภาษณ์' 'กับ' 'สื่อ'
prob = getLnValue(model['นายก'])+getLnValue(model['ได้'])+ getLnValue(model['ให้'])+getLnValue(model['สัมภาษณ์'])+getLnValue(model['กับ'])+getLnValue(model['สื่อ'])+getLnValue(model['</s>'])
print ('Probability of a sentence', math.exp(prob))

-3.991273499731109
-3.991273499731109
Probability of a sentence 1.408776035744038e-16


# Perplexity

To compare language models, we need to calculate their perplexity. In this task, you should write the perplexity calculation for the unigram model. The resulting perplexity should be around 406.89 on the training data and 376.86 on the test data.
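As a sanity check on the definition, perplexity is the exponential of the average negative log-probability per token, so a model that assigns every token probability 1/4 should have a perplexity of 4. The sketch below uses made-up probability lists rather than the notebook's model, and the function name `toy_perplexity` is only illustrative.

```python
import math

def toy_perplexity(token_probs):
    """Perplexity = exp(-(1/N) * sum(ln p_i)) over N token probabilities."""
    total_ln = sum(math.log(p) for p in token_probs)
    return math.exp(-total_ln / len(token_probs))

uniform = [0.25] * 8                # every token gets probability 1/4
skewed = [0.5, 0.5, 0.125, 0.125]   # geometric mean is also 1/4
print(toy_perplexity(uniform), toy_perplexity(skewed))
```

Both lists give the same perplexity because perplexity depends only on the geometric mean of the token probabilities.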

## TODO #1 Calculate perplexity

In [18]:
def getLnValue(x):
    return math.log(x)

def calculate_sentence_ln_prob(sentence, model):
    sum_prob = 0
    for w in sentence:
        sum_prob += getLnValue(model[w])
    return sum_prob

def perplexity(test,model):
    sum_ln = 0
    n = 0
    for sentence in test:
        sum_ln += calculate_sentence_ln_prob(sentence, model)
        n += len(sentence)
    return math.exp(-1/n * sum_ln)

In [19]:
test_unigrams = [list(ngrams(sent, n=1)) for sent in padded_tokenized_test]

In [20]:
print(perplexity(train_unigrams,model))
print(perplexity(test_unigrams,model))

406.8950820766048
376.86063648570286


## Q1 MCV
Calculate the perplexity of the model on the wiki test set and submit the answer in MyCourseVille.

In [21]:
wiki_test_unigrams = [list(ngrams(sent, n=1)) for sent in padded_tokenized_wiki_test]

In [22]:
print(perplexity([list(flatten(wiki_test_unigrams))], model))

498.2505681123939


# Bigram

Next, you will create a better language model than the unigram model, which is a weak baseline. Counting every pair of words that occurs in the corpus by hand is tedious; luckily, NLTK provides helpers that simplify the process.

In [23]:
#example of nltk usage for bigram
sentence = 'I always search google for an answer .'
padded_sentence = list(pad_both_ends(sentence.split(), n=2))

print('This is how nltk generate bigram.')
for w1,w2 in bigrams(padded_sentence):
    print(w1,w2)
print('\n<s> and </s> are used as a start and end of sentence symbol. respectively.')

This is how nltk generate bigram.
<s> I
I always
always search
search google
google for
for an
an answer
answer .
. </s>

<s> and </s> are used as a start and end of sentence symbol. respectively.


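Now you should be able to implement a bigram model by yourself. You must also write a new perplexity calculation for bigrams. The resulting perplexity should be around 50.21 on the training data and inf on the test data, because the unsmoothed model assigns zero probability to any bigram it never saw in training.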
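The infinite test perplexity can be reproduced on a toy corpus. The following is a minimal sketch with made-up data and illustrative names (`mle`, `toy_bigram_perplexity`), not the assignment's solution: it estimates bigram probabilities by maximum likelihood and scores one sentence made only of seen bigrams and one containing unseen bigrams.

```python
import math
from collections import Counter

# Toy padded corpus (made up for illustration).
toy_corpus = [["<s>", "a", "b", "</s>"], ["<s>", "a", "c", "</s>"]]

bigram_counts = Counter()
context_counts = Counter()
for sent in toy_corpus:
    for w1, w2 in zip(sent, sent[1:]):
        bigram_counts[(w1, w2)] += 1
        context_counts[w1] += 1

def mle(w1, w2):
    """Maximum-likelihood estimate P(w2 | w1) = c(w1, w2) / c(w1)."""
    return bigram_counts[(w1, w2)] / context_counts[w1]

def toy_bigram_perplexity(sent):
    """Perplexity of one padded sentence under the MLE bigram model."""
    pairs = list(zip(sent, sent[1:]))
    total_ln = 0.0
    for w1, w2 in pairs:
        p = mle(w1, w2)
        # A zero probability contributes ln(0) = -inf.
        total_ln += math.log(p) if p > 0 else float("-inf")
    return math.exp(-total_ln / len(pairs))

seen = toy_bigram_perplexity(["<s>", "a", "b", "</s>"])
unseen = toy_bigram_perplexity(["<s>", "b", "a", "</s>"])
print(seen, unseen)
```

A single unseen bigram is enough to drive the whole sum to negative infinity, which is why smoothing is needed before the model can be evaluated on held-out data.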

## TODO #3 Write Bigram Model

In [24]:
class BigramModel():
    def __init__(self, data, vocab):
        self.unigram_count = defaultdict(lambda: 0.0)
        self.bigram_count = defaultdict(lambda: 0.0)
        self.vocab = vocab
          
        for sentence in data:
            for w1,w2 in sentence: #[(word1, word2), (word2, word3), ...]
                self.bigram_count[(w1,w2)] += 1
                self.unigram_count[w1] += 1
            self.unigram_count[sentence[-1][-1]] += 1
    
    def __getitem__(self, bigram):
        w1, w2 = bigram
        bigram_count = self.bigram_count[(w1, w2)]
        unigram_count = self.unigram_count[w1]
        return bigram_count / unigram_count

## TODO #4 Write Perplexity for Bigram Model

Accumulate the log-probability sentence by sentence, then normalize by the total number of bigrams.

In [25]:
def calculate_sentence_ln_prob(sentence, model):
    prob = 0
    for w in sentence:
        p = model[w]
        # an unseen bigram has probability 0, whose log is -inf
        prob += getLnValue(p) if p != 0 else float('-inf')
    return prob

def perplexity(bigram_data, model):
    sum_ln = 0
    n = 0
    for sentence in bigram_data:
        sum_ln += calculate_sentence_ln_prob(sentence, model)
        n += len(sentence)
    return math.exp(-1/n * sum_ln)

In [26]:
train_bigrams = [list(ngrams(sent, n=2)) for sent in padded_tokenized_train]
test_bigrams = [list(ngrams(sent, n=2)) for sent in padded_tokenized_test]

In [27]:
bigram_model_scratch = BigramModel(train_bigrams, vocab)

In [28]:
print(perplexity([list(flatten(train_bigrams))], bigram_model_scratch))
print(perplexity([list(flatten(test_bigrams))[:17]], bigram_model_scratch))
print(perplexity([list(flatten(test_bigrams))], bigram_model_scratch))

50.21343110065738
24.977802535470772
inf


## Q2 MCV

In [29]:
wiki_test_bigrams = [list(ngrams(sent, n=2)) for sent in padded_tokenized_wiki_test]

In [30]:
print(perplexity([list(flatten(wiki_test_bigrams))],bigram_model_scratch))

inf


# Smoothing

N-gram models usually suffer from sparsity: the training data does not contain every possible n-gram. Smoothing techniques can alleviate this problem. In this section, you will implement three basic smoothing methods: Laplace smoothing, interpolation for bigrams, and Kneser-Ney smoothing.

## TODO #5 Write Bigram with Laplace smoothing (Add-One Smoothing)

The result perplexity on training and testing should be:

    307.29, 364.17 for Laplace smoothing
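For reference, add-one smoothing estimates P(w2 | w1) as (c(w1, w2) + 1) / (c(w1) + V), where V is the vocabulary size. The sketch below applies this to a made-up toy corpus; the names `toy_corpus`, `toy_vocab`, and `laplace` are illustrative assumptions, not part of the assignment code.

```python
from collections import Counter

# Toy padded corpus (made up for illustration).
toy_corpus = [["<s>", "a", "b", "</s>"], ["<s>", "a", "c", "</s>"]]
toy_vocab = sorted({w for sent in toy_corpus for w in sent})  # 5 word types

bigram_counts = Counter()
context_counts = Counter()
for sent in toy_corpus:
    for w1, w2 in zip(sent, sent[1:]):
        bigram_counts[(w1, w2)] += 1
        context_counts[w1] += 1

def laplace(w1, w2):
    """Add-one estimate: (c(w1, w2) + 1) / (c(w1) + V)."""
    return (bigram_counts[(w1, w2)] + 1) / (context_counts[w1] + len(toy_vocab))

# A seen bigram and an unseen one; both are now non-zero.
print(laplace("a", "b"), laplace("a", "</s>"))

# The smoothed estimates for a fixed context still sum to 1.
print(sum(laplace("a", w) for w in toy_vocab))
```

Because V is added to every denominator, a large vocabulary moves a lot of probability mass to unseen bigrams, which is consistent with the much higher training perplexity expected here compared with the unsmoothed model.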

In [31]:
class BigramWithLaplaceSmoothing():

  def __init__(self, data, vocab):
    self.unigram_count = defaultdict(lambda: 0.0)
    self.bigram_count = defaultdict(lambda: 0.0)
    self.vocab = vocab
      
    for sentence in data:
        for w1,w2 in sentence: #[(word1, word2), (word2, word3), ...]
            self.bigram_count[(w1,w2)] += 1
            self.unigram_count[w1] += 1
        self.unigram_count[sentence[-1][-1]] += 1
        

  def __getitem__(self, bigram):
    w1, w2 = bigram
    bigram_count = self.bigram_count[(w1, w2)]
    unigram_count = self.unigram_count[w1]
    return (bigram_count+1) / (unigram_count+len(self.vocab))

model = BigramWithLaplaceSmoothing(train_bigrams, vocab)
print(perplexity([list(flatten(train_bigrams))],model))
print(perplexity([list(flatten(test_bigrams))], model))

307.2932191431376
364.17463606907467


## Q3 MCV

In [32]:
print(perplexity([list(flatten(wiki_test_bigrams))],model))

738.5456651453641


## TODO #6 Write Bigram with Interpolation
Set the lambda values to 0.7 for the bigram term, 0.25 for the unigram term, and 0.05 for the unknown-word term (a uniform distribution over the vocabulary).

The result perplexity on training and testing should be:

    62.44, 103.99 for Interpolation
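The interpolated estimate is a weighted sum of three distributions: the bigram MLE, the unigram MLE, and a uniform distribution over the vocabulary. Below is a minimal sketch on a made-up toy corpus; the constant and function names are assumptions for illustration, and the weights are the ones given above.

```python
from collections import Counter

LAMBDA_BIGRAM, LAMBDA_UNIGRAM, LAMBDA_UNIFORM = 0.7, 0.25, 0.05

# Toy padded corpus (made up for illustration).
toy_corpus = [["<s>", "a", "b", "</s>"], ["<s>", "a", "c", "</s>"]]
toy_vocab = sorted({w for sent in toy_corpus for w in sent})

bigram_counts, unigram_counts = Counter(), Counter()
for sent in toy_corpus:
    unigram_counts.update(sent)
    bigram_counts.update(zip(sent, sent[1:]))
total_tokens = sum(unigram_counts.values())

def interpolated(w1, w2):
    """Weighted mix of bigram, unigram, and uniform estimates of P(w2 | w1)."""
    p_bigram = bigram_counts[(w1, w2)] / unigram_counts[w1]
    p_unigram = unigram_counts[w2] / total_tokens
    p_uniform = 1 / len(toy_vocab)
    return (LAMBDA_BIGRAM * p_bigram
            + LAMBDA_UNIGRAM * p_unigram
            + LAMBDA_UNIFORM * p_uniform)

# An unseen bigram still gets mass from the unigram and uniform terms.
print(interpolated("a", "</s>"))
```

Since the three weights sum to 1 and each component is a probability distribution, the mixture for a context such as "a" also sums to 1 over the vocabulary.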

In [33]:
class BigramWithInterpolation():

  def __init__(self, data, vocab, l = 0.7):
    self.unigram_count = defaultdict(lambda: 0.0)
    self.bigram_count = defaultdict(lambda: 0.0)
    self.vocab = vocab
    self.word_count = 0
      
    for sentence in data:
        for w1,w2 in sentence: #[(word1, word2), (word2, word3), ...]
            self.bigram_count[(w1,w2)] += 1
            self.unigram_count[w1] += 1
            self.word_count += 1
        self.unigram_count[sentence[-1][-1]] += 1
        self.word_count += 1

  def __getitem__(self, bigram):
    w1, w2 = bigram
    bigram_prob = self.bigram_count[(w1, w2)] / self.unigram_count[w1]
    unigram_prob = self.unigram_count[w2] / self.word_count 
    return 0.7*bigram_prob + 0.25*unigram_prob + 0.05/len(self.vocab)

model = BigramWithInterpolation(train_bigrams, vocab)
print(perplexity([list(flatten(train_bigrams))],model))
print(perplexity([list(flatten(test_bigrams))], model))

62.44269181334268
103.99017321534633


## Q4 MCV

In [34]:
print(perplexity([list(flatten(wiki_test_bigrams))],model))

255.71779470477514


## Language modeling on multiple domains

Sometimes we do not have enough data to create a language model for a new domain. In that case, we can improvise by combining several models to improve results on the new domain.

In this exercise you will try to merge two language models from news and article domains to create a language model for the encyclopedia domain.

In [35]:
# create encyclopedia data (test data)
encyclo_data=[]
with open('BEST2010/encyclopedia.txt','r',encoding='utf-8') as f:
    for i,line in enumerate(f):
        # print(line)
        # break
        encyclo_data.append(line.strip()[:-1])

(news) First, calculate the perplexity of your news-trained bigram model with interpolation on the encyclopedia data. The perplexity should be around 240.75.

In [36]:
tokenized_encyclo_data = [t.split("|") for t in encyclo_data]
tokenized_encyclo_data = [[token if token in vocab else "<UNK>" for token in sentence] for sentence in tokenized_encyclo_data]
padded_tokenized_encyclo_data = [list(pad_both_ends(sentence, n=2)) for sentence in tokenized_encyclo_data]
encyclopedia_bigrams = [list(ngrams(sent, n=2)) for sent in padded_tokenized_encyclo_data]

In [37]:
# 1) news only on "encyclopedia"
print(perplexity([list(flatten(encyclopedia_bigrams))], model))

240.74578402349226


## TODO #7 - Language Modeling on Multiple Domains
Combine news and article datasets to create another bigram model and evaluate it on the encyclopedia data.



(article) For your information, a bigram model with interpolation trained on the article data and tested on the encyclopedia data has a perplexity of 218.57.

In [38]:
# 2) article only on "encyclopedia"
best2010_article=[]
with open('BEST2010/article.txt','r',encoding='utf-8') as f:
    for i,line in enumerate(f):
        best2010_article.append(line.strip()[:-1])

combined_total_word_count = 0
for line in best2010_article:
    combined_total_word_count += len(line.split('|'))

# article_bigrams = ...
# article_vocab = ...

tokenized_article = [["<s>"] + t.split("|") + ["</s>"] for t in best2010_article]
article_vocab = Vocabulary(list(flatten(tokenized_article)), unk_cutoff=3)

tokenized_article = [[token if token in article_vocab else "<UNK>" for token in sentence] for sentence in tokenized_article]
padded_tokenized_article = [list(pad_both_ends(sentence, n=2)) for sentence in tokenized_article]
article_bigrams = [list(ngrams(sent, n=2)) for sent in tokenized_article]

tokenized_encyclo_data = [t.split("|") for t in encyclo_data]
tokenized_encyclo_data = [[token if token in article_vocab else "<UNK>" for token in sentence] for sentence in tokenized_encyclo_data]
padded_tokenized_encyclo_data = [list(pad_both_ends(sentence, n=2)) for sentence in tokenized_encyclo_data]
encyclopedia_bigrams = [list(ngrams(sent, n=2)) for sent in padded_tokenized_encyclo_data]





model = BigramWithInterpolation(article_bigrams, article_vocab)

In [39]:
print('Perplexity of the bigram model using article data with interpolation smoothing on encyclopedia test data',perplexity([list(flatten(encyclopedia_bigrams))], model))

Perplexity of the bigram model using article data with interpolation smoothing on encyclopedia test data 218.57479345888848


In [40]:
# 3) train on news + article, test on "encyclopedia"
best2010_article_and_news = best2010_article.copy()
with open('BEST2010/news.txt','r',encoding='utf-8') as f:
    for i,line in enumerate(f):
        best2010_article_and_news.append(line.strip()[:-1])

tokenized_article_and_news = [["<s>"] + t.split("|") + ["</s>"] for t in best2010_article_and_news]
article_and_news_flat_tokens = list(flatten(tokenized_article_and_news))
combined_vocab = Vocabulary(article_and_news_flat_tokens, unk_cutoff=3)

tokenized_article_and_news = [[token if token in combined_vocab else "<UNK>" for token in sentence] for sentence in tokenized_article_and_news]
padded_tokenized_article_and_news = [list(pad_both_ends(sentence, n=2)) for sentence in tokenized_article_and_news]
combined_bigrams = [list(ngrams(sent, n=2)) for sent in tokenized_article_and_news]

tokenized_encyclo_data = [t.split("|") for t in encyclo_data]
tokenized_encyclo_data = [[token if token in combined_vocab else "<UNK>" for token in sentence] for sentence in tokenized_encyclo_data]
padded_tokenized_encyclo_data = [list(pad_both_ends(sentence, n=2)) for sentence in tokenized_encyclo_data]
encyclopedia_bigrams = [list(ngrams(sent, n=2)) for sent in padded_tokenized_encyclo_data]

combined_model = BigramWithInterpolation(combined_bigrams, combined_vocab)
print('Perplexity of the combined Bigram model with interpolation smoothing on encyclopedia test data',perplexity([list(flatten(encyclopedia_bigrams))], combined_model))

Perplexity of the combined Bigram model with interpolation smoothing on encyclopedia test data 242.88025282580364


## TODO #8 - Kneser-Ney on "News"

<!-- Reimplement equation 4.33 in SLP textbook (https://lagunita.stanford.edu/c4x/Engineering/CS-224N/asset/slp4.pdf) -->

Implement a bigram Kneser-Ney LM. The resulting perplexity should be around 58.18 on the training data and 93.84 on the test data. Be careful not to mix up the vocab with those from the section above!
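As a reminder of the structure, interpolated Kneser-Ney for bigrams combines a discounted bigram estimate with a continuation probability, which measures how many distinct contexts a word follows. The sketch below is a simplified illustration on a made-up toy corpus, with an assumed absolute discount of 0.75 and illustrative names throughout; it is not meant as the reference solution.

```python
from collections import Counter

DISCOUNT = 0.75  # assumed absolute discount d

# Toy padded corpus (made up for illustration).
toy_corpus = [["<s>", "a", "b", "</s>"], ["<s>", "a", "c", "</s>"]]
toy_vocab = sorted({w for sent in toy_corpus for w in sent})

bigram_counts, context_counts = Counter(), Counter()
for sent in toy_corpus:
    for w1, w2 in zip(sent, sent[1:]):
        bigram_counts[(w1, w2)] += 1
        context_counts[w1] += 1

# Type-level statistics computed over distinct bigrams.
followers = Counter(w1 for (w1, _) in bigram_counts)  # |{w : c(w1, w) > 0}|
preceders = Counter(w2 for (_, w2) in bigram_counts)  # |{v : c(v, w2) > 0}|
num_bigram_types = len(bigram_counts)

def kneser_ney(w1, w2):
    """Discounted bigram estimate plus weighted continuation probability."""
    discounted = max(bigram_counts[(w1, w2)] - DISCOUNT, 0) / context_counts[w1]
    backoff_weight = DISCOUNT * followers[w1] / context_counts[w1]
    p_continuation = preceders[w2] / num_bigram_types
    return discounted + backoff_weight * p_continuation

# Unseen bigram: all of its mass comes from the continuation term.
print(kneser_ney("a", "</s>"))
print(sum(kneser_ney("a", w) for w in toy_vocab))
```

The mass removed by the discount is exactly what the back-off weight redistributes, so the estimates for a given context sum to 1.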


In [41]:
class BigramKneserNey():

  def __init__(self, data, vocab):
    self.unigram_count = defaultdict(lambda: 0.0)
    self.bigram_count = defaultdict(lambda: 0.0)
    self.row_count = defaultdict(lambda: 0.0)
    self.col_count = defaultdict(lambda: 0.0)
    self.vocab = vocab
    self.word_count = 0
    self.unique_count = 0

    for sentence in data:
        for w1, w2 in sentence:
            self.bigram_count[(w1, w2)] += 1
            self.unigram_count[w1] += 1
            self.word_count += 1
            if self.bigram_count[(w1, w2)] == 1:
                self.row_count[w1] += 1
                self.col_count[w2] += 1
                self.unique_count += 1
        self.unigram_count[sentence[-1][-1]] += 1
        self.word_count += 1


  def __getitem__(self, bigram):
    w1, w2 = bigram
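    # discounted bigram estimate with absolute discount d = 0.75
    x = max(self.bigram_count[(w1, w2)] - 0.75, 0) / self.unigram_count[w1]
    # back-off weight: d / c(w1) times the number of distinct words following w1
    lambda1 = (0.75 / self.unigram_count[w1]) * self.row_count[w1]
    # continuation probability: distinct contexts preceding w2 over all distinct bigram types
    p_con = self.col_count[w2] / self.unique_count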

    return x + lambda1 * p_con

model = BigramKneserNey(train_bigrams, vocab)
print(perplexity([list(flatten(train_bigrams))],model))
print(perplexity([list(flatten(train_bigrams))[:1000]],model))
print(perplexity([list(flatten(test_bigrams))[:1000]], model))
print(perplexity([list(flatten(test_bigrams))], model))

58.18312117005813
46.16427141273723
88.87482261840823
93.8399459324311


## Q5 MCV

In [42]:
print(perplexity([list(flatten(wiki_test_bigrams))],model))

268.6766593898691
