## **Predicting Next Characters in Names using Trigram Neural Network**

This notebook implements a simple neural network that predicts the next character of a name from the two characters before it (a trigram model). The network is a single linear layer followed by a softmax.

We first process and encode the character-level data, then train the network with PyTorch.


In [1]:
with open('../data/names.txt', 'r') as f:
    names = f.read().splitlines()
names[:1]

['emma']

### Character Encoding

We create mappings from characters to integers and vice versa, which will help us encode characters numerically.

In [2]:
chs = {ch for name in names for ch in name}

s_to_i = {ch : i for i, ch in enumerate(sorted(chs), 1)}
s_to_i['.'] = 0 # Special marker for both the start and the end of a name

i_to_s = {v: k for k, v in s_to_i.items()} # Reverse mapping
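As a quick sanity check, the two mappings should round-trip any name. The sketch below uses a small toy list in place of the real `names`, and the `encode` and `decode` helpers are hypothetical names introduced only for illustration. The indices it prints come from the toy alphabet and will differ from those of the full dataset.

```python
# Toy stand-in for the real `names` list (assumption: the real file is not loaded here).
names = ["emma", "olivia", "ava"]

chs = {ch for name in names for ch in name}
s_to_i = {ch: i for i, ch in enumerate(sorted(chs), 1)}
s_to_i['.'] = 0  # boundary marker, as in the cell above
i_to_s = {i: ch for ch, i in s_to_i.items()}


def encode(text):
    """Hypothetical helper: map each character to its integer index."""
    return [s_to_i[ch] for ch in text]


def decode(ids):
    """Hypothetical helper: map integer indices back to characters."""
    return ''.join(i_to_s[i] for i in ids)


encoded = encode(".emma.")
print(encoded)          # [0, 2, 5, 5, 1, 0] for this toy alphabet
print(decode(encoded))  # .emma.
```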

### Preparing Data

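Here, we extract trigram data: given two consecutive characters, we predict the third. Each name is wrapped in `.` markers, so the first trigram of a name always starts with `.`. For now we use only the first name (`names[:1]`, i.e. `emma`) to keep the example small.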

In [3]:
import torch
import torch.nn.functional as F

xs_ch1 = [] # First character in trigram
xs_ch2 = [] # Second character in trigram

ys = [] # Next character (target)

for name in names[:1]:
    name = f".{name}."
    for ch1, ch2, ch3 in zip(name, name[1:], name[2:]):
        xs_ch1.append(s_to_i[ch1])
        xs_ch2.append(s_to_i[ch2])
        ys.append(s_to_i[ch3])

xs_ch1 = torch.tensor(xs_ch1)
xs_ch2 = torch.tensor(xs_ch2)
ys = torch.tensor(ys)
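To make the sliding window concrete, here is a minimal pure-Python sketch of the same `zip` pattern. The `trigrams` function is a hypothetical helper, not part of the notebook. It shows that a name of length n yields exactly n trigrams.

```python
def trigrams(name, boundary="."):
    """Hypothetical helper: return (ch1, ch2, ch3) windows over the padded name."""
    padded = f"{boundary}{name}{boundary}"
    return list(zip(padded, padded[1:], padded[2:]))


for ch1, ch2, ch3 in trigrams("emma"):
    print(f"{ch1}{ch2} -> {ch3}")
# .e -> m
# em -> m
# mm -> a
# ma -> .
```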

### One-Hot Encoding

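We represent each character as a one-hot vector of length 27 (one entry per symbol, including `.`) and concatenate the two vectors into a single 54-dimensional input. The weight matrix `W` maps this input to 27 logits, one per possible next character.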

In [4]:
xs_ch1_enc = F.one_hot(xs_ch1, num_classes=27).float()
xs_ch2_enc = F.one_hot(xs_ch2, num_classes=27).float()

xs_enc = torch.cat([xs_ch1_enc, xs_ch2_enc], dim=1)

W = torch.randn((54, 27), requires_grad=True)
print(xs_enc.shape)
print(W.shape)

#import matplotlib.pyplot as plt
#plt.imshow(xs_enc)

torch.Size([4, 54])
torch.Size([54, 27])
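Because each input row contains exactly two ones, multiplying it by `W` simply adds two rows of `W`: one from the first 27 rows (first character) and one from the last 27 rows (second character). The sketch below checks this on arbitrary example indices. The index values and the seed are assumptions chosen only for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
W = torch.randn((54, 27))

# Arbitrary example indices for the first and second character of two trigrams.
ch1 = torch.tensor([0, 5])
ch2 = torch.tensor([5, 13])

# One-hot route, as in the cell above.
x = torch.cat(
    [F.one_hot(ch1, num_classes=27), F.one_hot(ch2, num_classes=27)], dim=1
).float()
via_matmul = x @ W

# Equivalent row lookup: the second character selects from rows 27..53.
via_lookup = W[ch1] + W[27 + ch2]

print(torch.allclose(via_matmul, via_lookup))  # True
```

The lookup form avoids building the one-hot matrix, which can matter once the full dataset is used.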


### Calculating Probabilities

Compute the logits (raw predictions) and then convert these into probabilities using softmax.

In [5]:
logits = xs_enc @ W # logits, i.e. log-counts
counts = logits.exp() # unnormalized counts, analogous to the count-based model in the Trigram-based Markov Models notebook
probs = counts / counts.sum(1, keepdim=True) # normalize each row to sum to 1, i.e. softmax
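The exponentiate-and-normalize step is the softmax function, so it should agree with `F.softmax`. The sketch below checks that on random logits (the seed and shapes are assumptions) and also shows one reason to prefer the built-in: the manual version overflows for very large logits.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 27)

# Manual softmax, as in the cell above.
counts = logits.exp()
probs_manual = counts / counts.sum(1, keepdim=True)

# Built-in softmax over the same dimension.
probs_builtin = F.softmax(logits, dim=1)
print(torch.allclose(probs_manual, probs_builtin))  # True

# With a very large logit, exp() overflows to inf and the manual ratio becomes nan.
big = torch.tensor([[1000.0, 0.0]])
manual_big = big.exp() / big.exp().sum(1, keepdim=True)
builtin_big = F.softmax(big, dim=1)
print(manual_big)   # contains nan
print(builtin_big)  # tensor([[1., 0.]])
```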

In [6]:
probs

tensor([[4.8170e-02, 2.3410e-03, 2.8530e-04, 1.0512e-03, 1.4925e-02, 1.0527e-02,
         4.1374e-02, 2.7987e-02, 2.3826e-02, 1.9757e-01, 6.6870e-02, 6.9573e-02,
         8.5822e-03, 9.0980e-03, 1.1941e-02, 4.5654e-03, 1.7216e-02, 5.3950e-03,
         2.3046e-01, 1.3439e-02, 2.7343e-03, 3.4354e-03, 1.4182e-02, 9.3642e-02,
         6.1889e-03, 6.7817e-02, 6.8083e-03],
        [3.7325e-02, 3.8273e-03, 2.1493e-03, 4.0323e-03, 2.1233e-01, 2.3264e-03,
         7.2855e-04, 6.7144e-02, 4.6694e-03, 1.0461e-02, 2.8412e-03, 4.3954e-03,
         5.4544e-03, 1.7789e-02, 1.7657e-03, 3.4086e-02, 2.8205e-03, 2.2611e-02,
         1.7349e-02, 5.4238e-03, 7.9163e-02, 7.1498e-02, 3.0284e-03, 6.5278e-03,
         1.3960e-01, 2.2509e-01, 1.5561e-02],
        [6.0847e-02, 3.3098e-02, 7.4545e-02, 1.0579e-01, 8.0880e-02, 7.7577e-03,
         8.1237e-03, 3.7805e-02, 4.9801e-02, 4.8394e-02, 1.6298e-03, 4.3118e-03,
         1.6220e-02, 8.3676e-02, 2.8420e-02, 2.2705e-01, 2.9415e-02, 4.9935e-03,
         1.0020e-

### Evaluating Predictions

For each trigram of `emma`, we look up the probability that the still-untrained model assigns to the correct next character and score it with the negative log likelihood (lower is better).

In [7]:
nlls = torch.zeros(len(ys))

for i in range(len(ys)):
    ch1 = xs_ch1[i].item()
    ch2 = xs_ch2[i].item()
    ch3 = ys[i].item()
    print('--------------')
    print(f"Trigram {i+1}: {i_to_s[ch1]}{i_to_s[ch2]} -> {i_to_s[ch3]}")
    print(f"Input to the neural net: {i_to_s[ch1]}{i_to_s[ch2]}")
    print(f"Actual next character: {i_to_s[ch3]} (index {ch3})")
    prob = probs[i, ch3] # probability the model assigns to the correct character
    print(f"Probability assigned to the actual next character: {prob:.4f}")
    logprob = torch.log(prob)
    print(f"Negative log likelihood: {-logprob:.4f}")
    nlls[i] = -logprob
print(f"Average nll: {nlls.mean()}")

--------------
Trigram 1: .e -> m
Input to the neural net: .e
Actual next character: m (index 13)
Probability assigned to the actual next character: 0.0091
Negative log likelihood: 4.6997
--------------
Trigram 2: em -> m
Input to the neural net: em
Actual next character: m (index 13)
Probability assigned to the actual next character: 0.0178
Negative log likelihood: 4.0292
--------------
Trigram 3: mm -> a
Input to the neural net: mm
Actual next character: a (index 1)
Probability assigned to the actual next character: 0.0331
Negative log likelihood: 3.4083
--------------
Trigram 4: ma -> .
Input to the neural net: ma
Actual next character: . (index 0)
Probability assigned to the actual next character: 0.0007
Negative log likelihood: 7.2992
Average nll: 4.859097957611084
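The per-trigram loop can be collapsed into a single indexing expression, which is the form the training loop relies on. The sketch below compares the two on random probabilities (the seed and the random `probs` are assumptions, so the values differ from the output above). It also prints the loss of a uniform guess over 27 symbols, a useful reference point when reading the average NLL.

```python
import math

import torch

torch.manual_seed(0)
probs = torch.softmax(torch.randn(4, 27), dim=1)  # stand-in for the model output
ys = torch.tensor([13, 13, 1, 0])                 # targets for "emma": m, m, a, .

# Loop version: pick the probability of the correct character row by row.
nll_loop = torch.stack(
    [-torch.log(probs[i, ys[i]]) for i in range(len(ys))]
).mean()

# Vectorized version: row indices paired with target indices.
nll_vec = -probs[torch.arange(len(ys)), ys].log().mean()

print(torch.allclose(nll_loop, nll_vec))  # True

# NLL of a model that assigns 1/27 to every character.
print(f"{math.log(27):.4f}")  # 3.2958
```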


---

### Training the Neural Network

Now, let's train our neural network on the full dataset of names.

In [8]:
xs_ch1 = [] # First character in trigram
xs_ch2 = [] # Second character in trigram

ys = [] # Next character (target)

for name in names:
    name = f".{name}."
    for ch1, ch2, ch3 in zip(name, name[1:], name[2:]):
        xs_ch1.append(s_to_i[ch1])
        xs_ch2.append(s_to_i[ch2])
        ys.append(s_to_i[ch3])

xs_ch1 = torch.tensor(xs_ch1)
xs_ch2 = torch.tensor(xs_ch2)
ys = torch.tensor(ys)

elems = xs_ch1.nelement()
print(f"{elems} trigrams created.")

196113 trigrams created.


In [9]:
W = torch.randn((54, 27), requires_grad=True) # initialize weights

In [10]:
for i in range(100):
    # forward pass
    xs_ch1_enc = F.one_hot(xs_ch1, num_classes=27).float()
    xs_ch2_enc = F.one_hot(xs_ch2, num_classes=27).float()

    xs_enc = torch.cat([xs_ch1_enc, xs_ch2_enc], dim=1)

    logits = xs_enc @ W # logits, i.e. log-counts
    counts = logits.exp() # unnormalized counts, analogous to the count-based model in the Trigram-based Markov Models notebook
    probs = counts / counts.sum(1, keepdim=True) # normalize each row to sum to 1, i.e. softmax
    loss = -probs[torch.arange(elems), ys].log().mean() + 0.015 * (W**2).mean() # mean negative log likelihood plus L2 regularization
    # backward pass
    W.grad = None
    loss.backward()
    # update (full-batch gradient descent with learning rate 10)
    W.data += -10 * W.grad

print(f"Trained for {i+1} epochs.\nloss: {loss}")

Trained for 100 epochs.
loss: 2.390183448791504
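The same loop can be written a little more compactly. The sketch below is one possible variant, not the notebook's method: it encodes the inputs once outside the loop, uses `F.cross_entropy` in place of the manual softmax and log, and performs the update under `torch.no_grad()`. It runs on the four `emma` trigrams only, and the learning rate of 1.0 is an assumption chosen for this tiny dataset.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-ins for xs_ch1, xs_ch2 and ys (the four trigrams of "emma").
xs_ch1 = torch.tensor([0, 5, 13, 13])
xs_ch2 = torch.tensor([5, 13, 13, 1])
ys = torch.tensor([13, 13, 1, 0])

# The inputs never change, so encode them once, outside the loop.
xs_enc = torch.cat(
    [F.one_hot(xs_ch1, num_classes=27), F.one_hot(xs_ch2, num_classes=27)], dim=1
).float()

W = torch.randn((54, 27), requires_grad=True)
lr = 1.0  # assumption: smaller step than the notebook's 10, for this tiny dataset

losses = []
for step in range(50):
    logits = xs_enc @ W
    # cross_entropy applies log-softmax and averages the negative log likelihood
    loss = F.cross_entropy(logits, ys) + 0.015 * (W ** 2).mean()
    W.grad = None
    loss.backward()
    with torch.no_grad():  # update the weights without tracking the step
        W -= lr * W.grad
    losses.append(loss.item())

# Check the built-in against the manual formula used in the notebook.
with torch.no_grad():
    logits = xs_enc @ W
    probs = logits.exp() / logits.exp().sum(1, keepdim=True)
    manual = -probs[torch.arange(len(ys)), ys].log().mean()
    builtin = F.cross_entropy(logits, ys)

print(torch.allclose(manual, builtin, atol=1e-5))
print(losses[0] > losses[-1])
```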


### Interactive Prediction

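Finally, we use the trained model interactively to generate new names character by character. The user supplies the first character, and the model samples one character at a time until it produces the `.` marker. Any input that is not a single known letter ends the session.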

In [11]:
while True:
    usr_in = input("Enter first character of the name: ").lower()
    if usr_in not in chs: # anything other than a single known letter ends the session
        print("--------\nGoodbye!")
        break

    word = f'.{usr_in}' # start marker followed by the chosen first character

    while True:
        # encode the last two characters the same way as during training
        xs_ch1_enc = F.one_hot(torch.tensor(s_to_i[word[-2]]), num_classes=27).float()
        xs_ch2_enc = F.one_hot(torch.tensor(s_to_i[word[-1]]), num_classes=27).float()
        xs_enc = torch.cat([xs_ch1_enc, xs_ch2_enc])

        logits = xs_enc @ W
        probs = F.softmax(logits, dim=0)

        # sample the next character from the predicted distribution
        sample_val = torch.multinomial(probs, 1).item()
        next_char = i_to_s[sample_val]

        if next_char == '.': # end-of-name marker
            break
        word += next_char

    print(f"Name starting with '{word[1]}': {word[1:]}")


Name starting with 'a': arizalinnee
Name starting with 'b': bel
Name starting with 'c': cbem
Name starting with 'd': dusan
Name starting with 'e': elgy
Name starting with 'f': felen
Name starting with 'g': gsone
--------
Goodbye!
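For repeatable experiments, the sampling loop can also be wrapped in a function that takes a seeded `torch.Generator`. The sketch below is a hypothetical variant: the `generate` function, its `max_len` cap, and the a-z alphabet are assumptions, and `W` here is a random, untrained stand-in, so the printed name is gibberish. Passing the trained `W` from above would give names like those in the output.

```python
import string

import torch

# Assumption: the alphabet is '.' plus a-z, with '.' at index 0.
chars = '.' + string.ascii_lowercase
s_to_i = {ch: i for i, ch in enumerate(chars)}
i_to_s = {i: ch for ch, i in s_to_i.items()}


def generate(W, first_char, generator=None, max_len=20):
    """Hypothetical helper: sample a name of at most max_len characters."""
    word = '.' + first_char
    while len(word) - 1 < max_len:
        # Row lookup, equivalent to multiplying the concatenated one-hot vector by W.
        logits = W[s_to_i[word[-2]]] + W[27 + s_to_i[word[-1]]]
        probs = torch.softmax(logits, dim=0)
        idx = torch.multinomial(probs, 1, generator=generator).item()
        if idx == 0:  # sampled the end-of-name marker
            break
        word += i_to_s[idx]
    return word[1:]


g = torch.Generator().manual_seed(0)
W = torch.randn((54, 27), generator=g)  # untrained stand-in for the trained weights
name = generate(W, 'a', generator=g)
print(name)
```

The `max_len` cap guarantees termination even if the model rarely predicts `.`, which the interactive loop above does not enforce.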
