## Why do we need Markov chains to compute language models?

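<p>Answer: By the chain rule, the probability of a sentence is a product of conditional probabilities, each conditioned on the entire preceding history. Such long histories rarely occur in a corpus, so these probabilities cannot be estimated reliably. A Markov chain assumes that the next state (word) depends only on the current state, or on a fixed number of previous states. Under this assumption every factor becomes a transition probability between a finite set of states, which can be estimated from simple counts. This simplifies the calculation of the conditional probabilities and works for a sentence of any finite length.</p>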

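Written out, the simplification for a bigram (first-order Markov) model could look as follows. The notation is an assumption made for illustration: $w_1, \dots, w_n$ are the words of the sentence, $w_0$ stands for the `<start>` token, and $C(\cdot)$ denotes a corpus count.

$$
P(w_1, \dots, w_n) = \prod_{k=1}^{n} P(w_k \mid w_1, \dots, w_{k-1}) \approx \prod_{k=1}^{n} P(w_k \mid w_{k-1}),
\qquad
P(w_k \mid w_{k-1}) \approx \frac{C(w_{k-1}\, w_k)}{C(w_{k-1})}
$$

The first equality is the exact chain rule; the approximation is the Markov assumption. The last expression is the maximum-likelihood estimate from counts.
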
## Utilize Bigram Models

In [1]:
import numpy as np
import pandas as pd
words = ['i', 'want', 'to', 'eat', 'chinese', 'food', 'lunch', 'spend']
word_cnts = np.array([2533, 927, 2417, 746, 158, 1093, 341, 278]).reshape(1, -1)
df_word_cnts = pd.DataFrame(word_cnts, columns=words)
df_word_cnts

|   | i | want | to | eat | chinese | food | lunch | spend |
|---|---|---|---|---|---|---|---|---|
| 0 | 2533 | 927 | 2417 | 746 | 158 | 1093 | 341 | 278 |


In [2]:
bigram_word_cnts = [[5, 827, 0, 9, 0, 0, 0, 2], [2, 0, 608, 1, 6, 6, 5, 1], [2, 0, 4, 686, 2, 0, 6, 211],
                    [0, 0, 2, 0, 16, 2, 42, 0], [1, 0, 0, 0, 0, 82, 1, 0], [15, 0, 15, 0, 1, 4, 0, 0],
                    [2, 0, 0, 0, 0, 1, 0, 0], [1, 0, 1, 0, 0, 0, 0, 0]]

df_bigram_word_cnts = pd.DataFrame(bigram_word_cnts, columns=words, index=words)
df_bigram_word_cnts

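|   | i | want | to | eat | chinese | food | lunch | spend |
|---|---|---|---|---|---|---|---|---|
| i | 5 | 827 | 0 | 9 | 0 | 0 | 0 | 2 |
| want | 2 | 0 | 608 | 1 | 6 | 6 | 5 | 1 |
| to | 2 | 0 | 4 | 686 | 2 | 0 | 6 | 211 |
| eat | 0 | 0 | 2 | 0 | 16 | 2 | 42 | 0 |
| chinese | 1 | 0 | 0 | 0 | 0 | 82 | 1 | 0 |
| food | 15 | 0 | 15 | 0 | 1 | 4 | 0 | 0 |
| lunch | 2 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| spend | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |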


In [6]:
# Maximum-likelihood estimate: P(next | prev) = count(prev, next) / count(prev).
# Rows are the previous word, columns the next word, so divide each row
# by the unigram count of its row label (aligned on the index).
df_bigram_prob = df_bigram_word_cnts.div(df_word_cnts.loc[0], axis=0)
df_bigram_prob

|   | i | want | to | eat | chinese | food | lunch | spend |
|---|---|---|---|---|---|---|---|---|
| i | 0.001974 | 0.32649 | 0.0 | 0.003553 | 0.0 | 0.0 | 0.0 | 0.00079 |
| want | 0.002157 | 0.0 | 0.655879 | 0.001079 | 0.006472 | 0.006472 | 0.005394 | 0.001079 |
| to | 0.000827 | 0.0 | 0.001655 | 0.283823 | 0.000827 | 0.0 | 0.002482 | 0.087298 |
| eat | 0.0 | 0.0 | 0.002681 | 0.0 | 0.021448 | 0.002681 | 0.0563 | 0.0 |
| chinese | 0.006329 | 0.0 | 0.0 | 0.0 | 0.0 | 0.518987 | 0.006329 | 0.0 |
| food | 0.013724 | 0.0 | 0.013724 | 0.0 | 0.000915 | 0.00366 | 0.0 | 0.0 |
| lunch | 0.005865 | 0.0 | 0.0 | 0.0 | 0.0 | 0.002933 | 0.0 | 0.0 |
| spend | 0.003597 | 0.0 | 0.003597 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |

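As a sanity check on the probability table, the following self-contained sketch recomputes it from the counts above. The variable names (`unigram_counts`, `bigram_counts`, `bigram_prob`, `row_sums`) are chosen here for illustration and are not part of the original notebook. It also shows that each row sums to less than 1, which is expected because the table covers only eight words of the full vocabulary.

```python
import pandas as pd

words = ['i', 'want', 'to', 'eat', 'chinese', 'food', 'lunch', 'spend']

# Unigram counts: how often each word occurs in the corpus.
unigram_counts = pd.Series([2533, 927, 2417, 746, 158, 1093, 341, 278], index=words)

# Bigram counts: row = previous word, column = next word.
bigram_counts = pd.DataFrame(
    [[5, 827, 0, 9, 0, 0, 0, 2],
     [2, 0, 608, 1, 6, 6, 5, 1],
     [2, 0, 4, 686, 2, 0, 6, 211],
     [0, 0, 2, 0, 16, 2, 42, 0],
     [1, 0, 0, 0, 0, 82, 1, 0],
     [15, 0, 15, 0, 1, 4, 0, 0],
     [2, 0, 0, 0, 0, 1, 0, 0],
     [1, 0, 1, 0, 0, 0, 0, 0]],
    index=words, columns=words)

# P(next | prev) = count(prev, next) / count(prev), applied row by row.
bigram_prob = bigram_counts.div(unigram_counts, axis=0)

# Spot-check one entry: P(want | i) = 827 / 2533.
print(bigram_prob.loc['i', 'want'])

# Each row sums to less than 1: the remaining probability mass belongs to
# next words that are not among the eight columns shown here.
row_sums = bigram_prob.sum(axis=1)
print(row_sums)
```
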

## Calculate the probability of the following sentences:

<p>s1 = "i want english food"</p>
<p>s2 = "want i english food"</p>

```
P(s1) = P(i|<start>) * P(want|i) * P(english|want) * P(food|english) * P(<end>|food)
      = 0.25 * 0.32649 * 0.0011 * 0.5 * 0.68
      = 0.000030526815

P(s2) = P(want|<start>) * P(i|want) * P(english|i) * P(food|english) * P(<end>|food)
      = 0.25 * 0.002157 * 0.0011 * 0.5 * 0.68
      = 0.0000002016795
```
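Hence, s1 is more likely to appear than s2. The two products differ only in one factor, P(want|i) = 0.32649 versus P(i|want) = 0.002157, so P(s1) is about 151 times larger than P(s2). The factors involving `<start>`, `<end>`, and "english" (0.25, 0.0011, 0.5, 0.68) are not in the bigram table above, which covers only eight words; they are taken as given here.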
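
The same arithmetic can be reproduced with a small helper. This is a minimal sketch: the function name `sentence_probability` and the dictionary `BIGRAM_PROB` are hypothetical, and the probabilities are simply the values used in the calculation above, including the ones for `<start>`, `<end>`, and "english" that are taken as given.

```python
# Bigram probabilities used in the worked calculation above.
# Keys are (previous_word, next_word); '<start>' and '<end>' mark sentence boundaries.
BIGRAM_PROB = {
    ('<start>', 'i'): 0.25,
    ('<start>', 'want'): 0.25,
    ('i', 'want'): 0.32649,
    ('want', 'i'): 0.002157,
    ('want', 'english'): 0.0011,
    ('i', 'english'): 0.0011,
    ('english', 'food'): 0.5,
    ('food', '<end>'): 0.68,
}


def sentence_probability(sentence, bigram_prob):
    """Multiply the bigram probabilities of all adjacent token pairs."""
    tokens = ['<start>'] + sentence.split() + ['<end>']
    prob = 1.0
    for prev, nxt in zip(tokens, tokens[1:]):
        # A bigram missing from the dictionary is treated as probability 0
        # (no smoothing in this sketch).
        prob *= bigram_prob.get((prev, nxt), 0.0)
    return prob


p_s1 = sentence_probability('i want english food', BIGRAM_PROB)
p_s2 = sentence_probability('want i english food', BIGRAM_PROB)
print(p_s1, p_s2, p_s1 / p_s2)
```

For longer sentences the product of many small probabilities can underflow, so a common variant sums log-probabilities instead of multiplying.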