# Exercises

## E01

Train a trigram language model, i.e. take two characters as input to predict the third one. Feel free to use either counting or a neural net. Evaluate the loss: did it improve over the bigram model?

### Count Approach

In [98]:
import torch
words = open('names.txt', 'r').read().splitlines()
chars = sorted(list(set(''.join(words))))
stoi = {s:i+1 for i,s in enumerate(chars)}
stoi['.'] = 0
itos = {i:s for s,i in stoi.items()}

Arrange the counts in a (27, 27, 27) tensor, indexed as `N[first, second, third]`:

In [99]:
N = torch.zeros((27, 27, 27), dtype=torch.int32)
for w in words:
    chs = ['.'] + list(w) + ['.']
    for ch1, ch2, ch3 in zip(chs, chs[1:], chs[2:]):
        ix1 = stoi[ch1]
        ix2 = stoi[ch2]
        ix3 = stoi[ch3]
        # count one occurrence of ch3 following the context (ch1, ch2)
        N[ix1, ix2, ix3] += 1
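
To make the sliding window concrete, here is a minimal sketch of the same counting idea in plain Python. The three-name list `toy_words` is hypothetical and only stands in for `names.txt`; a `Counter` keyed by character triples replaces the tensor.

```python
from collections import Counter

# Hypothetical toy data standing in for names.txt
toy_words = ["ann", "anna", "al"]

counts = Counter()
for w in toy_words:
    chs = ["."] + list(w) + ["."]
    # slide a window of three characters over the padded word
    for ch1, ch2, ch3 in zip(chs, chs[1:], chs[2:]):
        counts[(ch1, ch2, ch3)] += 1

# after the context ('.', 'a') we saw 'n' twice and 'l' once
print(counts[(".", "a", "n")], counts[(".", "a", "l")])  # 2 1

# with a single start token, the context ('.', '.') is never observed
print(sum(c for (a, b, _), c in counts.items() if (a, b) == (".", ".")))  # 0
```

Each word of length `n` contributes `n` trigrams, so the three toy names yield 9 counts in total.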

Check the next-character distribution after the context `.a` and sample from it:

In [100]:
p = N[0, 1].float()
p = p / p.sum()
p
g = torch.Generator().manual_seed(2147483647)
ix = torch.multinomial(p, num_samples=1, replacement=True, generator=g).item()
ix, itos[ix]

(12, 'l')

Construct the probability tensor with add-one smoothing, normalizing over the last dimension:

In [101]:
P = (N+1).float()
P /= P.sum(2, keepdim=True)
# float sums are rarely exactly 1, so compare with a tolerance
assert torch.allclose(P.sum(2), torch.ones(27, 27))

In [102]:
P.shape

torch.Size([27, 27, 27])
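
The effect of the `+1` can be checked by hand. The sketch below uses plain Python and a hypothetical 4-symbol vocabulary; `smooth_and_normalize` is an illustrative helper, not part of the notebook. It shows that smoothing removes zero probabilities (so the log in the loss stays finite) and that a context with no counts at all becomes a uniform distribution.

```python
# Hypothetical counts over a 4-symbol vocabulary, for two contexts
seen_row = [6, 2, 0, 0]    # a context that occurred 8 times
unseen_row = [0, 0, 0, 0]  # a context that never occurred

def smooth_and_normalize(row, alpha=1):
    """Add alpha to every count, then divide by the row total."""
    smoothed = [c + alpha for c in row]
    total = sum(smoothed)
    return [s / total for s in smoothed]

p_seen = smooth_and_normalize(seen_row)      # 7/12, 3/12, 1/12, 1/12
p_unseen = smooth_and_normalize(unseen_row)  # uniform: 1/4 each
print(p_seen)
print(p_unseen)
```

In the tensor version, `P.sum(2, keepdim=True)` plays the role of `total`, computed for all 27 × 27 contexts at once.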

Sampling:

In [103]:
g = torch.Generator().manual_seed(2147483647)

for i in range(20):
  out = []
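  # start from the context ('.', '.'); the counting loop never sees it, so the
  # first character is drawn from the smoothed (uniform) row P[0, 0]
  ix1 = 0
  ix2 = 0
  while True:
    p = P[ix1, ix2]  # next-character distribution for the context (ix1, ix2)
    # shift the context window by one character
    ix1 = ix2
    ix2 = torch.multinomial(p, num_samples=1, replacement=True, generator=g).item()
    out.append(itos[ix2])
    if ix2 == 0:  # '.' marks the end of the name
      break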
  print(''.join(out))

junide.
ilyasid.
prelay.
ocin.
fairritoper.
sathen.
dannaaryanileniassibduinrwin.
lessiyanayla.
te.
farmumthyfortumj.
ponn.
zena.
jaylicore.
ya.
zoffra.
jamilyn.
fmouis.
yah.
wanaasnhavi.
honszxhddion.
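
Because words are padded with a single `.`, the start context `('.', '.')` has no counts of its own. One possible variation, which is an assumption here and not the notebook's method, is to pad with two start tokens so that the first character is also learned from data. The sketch below uses plain Python, a hypothetical `toy_words` list, and no smoothing.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy data standing in for names.txt
toy_words = ["ann", "anna", "al", "bo"]

# Variation (an assumption, not the notebook's method): pad with TWO start
# tokens so that the context ('.', '.') is actually observed
follow = defaultdict(Counter)
for w in toy_words:
    chs = [".", "."] + list(w) + ["."]
    for ch1, ch2, ch3 in zip(chs, chs[1:], chs[2:]):
        follow[(ch1, ch2)][ch3] += 1

rng = random.Random(0)

def sample_name():
    """Sample one name, drawing each character in proportion to its count."""
    ctx = (".", ".")
    out = []
    while True:
        options = follow[ctx]
        nxt = rng.choices(list(options), weights=list(options.values()))[0]
        if nxt == ".":  # end-of-name token
            return "".join(out)
        out.append(nxt)
        ctx = (ctx[1], nxt)  # shift the context window by one character

samples = [sample_name() for _ in range(5)]
print(samples)
```

With this padding, every sampled name starts with a letter that actually begins a training name, here `a` or `b`.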


Calculate the loss, i.e. the average negative log likelihood per trigram:

In [104]:
n = 0
log_lh = 0.0
for w in words:
    chs = ['.'] + list(w) + ['.']
    for ch1, ch2, ch3 in zip(chs, chs[1:], chs[2:]):
        ix1 = stoi[ch1]
        ix2 = stoi[ch2]
        ix3 = stoi[ch3]
        
        prob = P[ix1, ix2, ix3]
        logprob = torch.log(prob)
        log_lh += logprob
        n += 1
        #print(f'{ch1}{ch2}: {prob:.4f} {logprob:.4f}')
        
print(f'{log_lh=}')
nll= - log_lh
print(f'{nll=}')
print(f'{nll/n}')

log_lh=tensor(-410414.9688)
nll=tensor(410414.9688)
2.092747449874878


**The loss is \~2.09, which is lower than the bigram model's \~2.4.**

*"What letter comes next after `.a`?"* is a more specific question than *"What letter comes next after `a`?"*, so the extra character of context lets the model make sharper predictions.
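
For a sense of scale, the sketch below compares these losses with a model that guesses uniformly over the 27 symbols. The trigram value is copied from the evaluation output above and the bigram value is the approximate figure quoted above. Since the loss is an average negative log probability, `exp(-loss)` is the geometric-mean probability assigned to the correct character.

```python
import math

vocab_size = 27  # 26 letters plus the '.' boundary token

# A model that guesses uniformly assigns probability 1/27 to every character
uniform_nll = -math.log(1 / vocab_size)

trigram_nll = 2.0927  # from the evaluation cell above
bigram_nll = 2.4      # approximate figure quoted above

for name, nll in [("uniform", uniform_nll),
                  ("bigram", bigram_nll),
                  ("trigram", trigram_nll)]:
    # exp(-nll) is the geometric-mean probability of the correct character
    print(f"{name:8s} nll={nll:.4f}  avg prob={math.exp(-nll):.4f}")
```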

### NN Approach

In [107]:
import torch.nn.functional as F
import matplotlib.pyplot as plt
%matplotlib inline

Create the training set of trigrams (only the first word for now, which keeps the tensors easy to inspect):

In [109]:
xs, ys = [], []
for w in words[:1]:
  chs = ['.'] + list(w) + ['.']
  for ch1, ch2, ch3 in zip(chs, chs[1:], chs[2:]):
    # input is the pair of context characters, target is the third one
    xs.append((stoi[ch1], stoi[ch2]))
    ys.append(stoi[ch3])
    
xs = torch.tensor(xs)
ys = torch.tensor(ys)
num = xs.shape[0]  # one example per row; xs.nelement() would count both columns
print('number of examples: ', num)

number of examples:  4


In [117]:
xenc = F.one_hot(xs, num_classes=27)
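
Applied to an `(N, 2)` index tensor, `F.one_hot` returns shape `(N, 2, 27)`: one 27-way one-hot row per context character. One common way to feed this to a single linear layer, assumed here rather than taken from the notebook, is to concatenate the two rows into one 54-dimensional input. The sketch below shows that layout in plain Python so it can be checked by hand; `one_hot` and `encode_pair` are illustrative helpers.

```python
VOCAB = 27

def one_hot(ix, num_classes=VOCAB):
    """Plain-Python stand-in for a single one-hot row."""
    return [1 if i == ix else 0 for i in range(num_classes)]

def encode_pair(ix1, ix2):
    # concatenate the two one-hot rows into one 54-dimensional input
    return one_hot(ix1) + one_hot(ix2)

# hypothetical context: '.' (index 0) followed by 'e' (index 5)
x = encode_pair(0, 5)
print(len(x), [i for i, v in enumerate(x) if v == 1])  # 54 [0, 32]
```

The second character's position is offset by 27, so the two context slots never share an input feature.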