<a href="https://colab.research.google.com/github/bubuloMallone/NeuralNetworksEX/blob/main/mlp_names.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Implementation of a multilayer perceptron (MLP) character-level language model, trained on the names dataset used previously.

Reference paper: Bengio et al. (2003), "A Neural Probabilistic Language Model"

https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf

In [1]:
import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt

In [2]:
!wget https://raw.githubusercontent.com/bubuloMallone/NeuralNetworksEX/refs/heads/main/datasets/names.txt

words = open('names.txt', 'r').read().splitlines()

words[:10]


['emma',
 'olivia',
 'ava',
 'isabella',
 'sophia',
 'charlotte',
 'mia',
 'amelia',
 'harper',
 'evelyn']

First, let us build the character vocabulary and the corresponding mappings to and from integers.

In [5]:
chars = sorted(list(set(''.join(words))))
alphabet_size = len(chars) + 1
stoi = {s:i+1 for i,s in enumerate(chars)}
stoi['.'] = 0
itos = {i:s for s,i in stoi.items()}
print(itos)
print(alphabet_size)

{1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e', 6: 'f', 7: 'g', 8: 'h', 9: 'i', 10: 'j', 11: 'k', 12: 'l', 13: 'm', 14: 'n', 15: 'o', 16: 'p', 17: 'q', 18: 'r', 19: 's', 20: 't', 21: 'u', 22: 'v', 23: 'w', 24: 'x', 25: 'y', 26: 'z', 0: '.'}
27
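
As a quick sanity check of these mappings, the sketch below rebuilds `stoi` and `itos` on a tiny stand-in word list (an assumption for illustration, not the real dataset) and round-trips a name through two hypothetical helpers, `encode` and `decode`, which are not part of the notebook. Because the stand-in vocabulary is smaller, the indices differ from the ones printed above.

```python
# Tiny stand-in for the names dataset (assumption: illustration only).
sample_words = ['emma', 'ava', 'mia']

# Same construction as in the cell above: index 0 is reserved for '.'.
chars = sorted(set(''.join(sample_words)))
stoi = {s: i + 1 for i, s in enumerate(chars)}
stoi['.'] = 0
itos = {i: s for s, i in stoi.items()}


def encode(word):
    """Hypothetical helper: map each character to its integer index."""
    return [stoi[ch] for ch in word]


def decode(indices):
    """Hypothetical helper: map integer indices back to a string."""
    return ''.join(itos[i] for i in indices)


ids = encode('emma.')
print(ids)
print(decode(ids))
```

Reserving index 0 for `'.'` lets the same token mark both the padding before a name starts and the end of the name.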


Now let us build the dataset

In [6]:
# context length: how many previous characters are used to predict the next one
block_size = 3
X, Y = [], []

for word in words:
  # print(word)

  context = [0] * block_size
  for ch in word + '.':
    idx = stoi[ch]
    X.append(context)
    Y.append(idx)
    # print(''.join(itos[i] for i in context), '--->', itos[idx])
    context = context[1:] + [idx]

X = torch.tensor(X)
Y = torch.tensor(Y)
print('Data:', X.shape, X.dtype)
print('Labels:', Y.shape, Y.dtype)

num_samples = X.shape[0]

Data: torch.Size([228146, 3]) torch.int64
Labels: torch.Size([228146]) torch.int64
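
To make the sliding-window construction concrete, here is a minimal sketch that applies the same loop to a single word. The helper name `build_examples` is hypothetical, and `stoi` is restricted to the few characters needed, using the indices from the mapping printed earlier.

```python
block_size = 3

# Subset of the character mapping printed earlier ('.' is index 0).
stoi = {'.': 0, 'a': 1, 'e': 5, 'm': 13}
itos = {i: s for s, i in stoi.items()}


def build_examples(word, block_size):
    """Hypothetical helper: return (context, target) pairs for one word."""
    context = [0] * block_size          # start with all-'.' padding
    examples = []
    for ch in word + '.':               # the trailing '.' marks the end of the word
        idx = stoi[ch]
        examples.append((list(context), idx))
        context = context[1:] + [idx]   # drop the oldest index, append the new one
    return examples


pairs = build_examples('emma', block_size)
for context, target in pairs:
    print(''.join(itos[i] for i in context), '--->', itos[target])
```

Each word contributes `len(word) + 1` examples, one per character plus one for the end marker, which is why the dataset has many more rows than there are names.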


Now let us build the neural network. It is an MLP consisting of an embedding layer, a hidden layer and an output layer.

In [9]:
g = torch.Generator().manual_seed(2147483647)

# embedding
emb_dim = 2
C = torch.randn((alphabet_size, emb_dim), generator=g)   # use the seeded generator so the initialization is reproducible

# first fully connected layer
hidden_dim = 100
W1 = torch.randn((block_size * emb_dim, hidden_dim), generator=g)
b1 = torch.randn(hidden_dim, generator=g)

# second fully connected layer
W2 = torch.randn((hidden_dim, alphabet_size), generator=g)
b2 = torch.randn(alphabet_size, generator=g)

parameters = [C, W1, b1, W2, b2]

# require gradients
for p in parameters:
  p.requires_grad = True

tot_parameters = sum(p.nelement() for p in parameters)
print(f'Total number of parameters: {tot_parameters}')

Total number of parameters: 3481
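
The reported total can be checked by hand from the layer shapes. The sketch below repeats that arithmetic in plain Python, with the sizes copied from the cells above; the `counts` dictionary is only an illustrative device.

```python
alphabet_size, emb_dim, block_size, hidden_dim = 27, 2, 3, 100

# Number of entries in each parameter tensor, derived from its shape.
counts = {
    'C': alphabet_size * emb_dim,              # embedding table: 27 x 2
    'W1': block_size * emb_dim * hidden_dim,   # first layer weights: 6 x 100
    'b1': hidden_dim,                          # first layer bias: 100
    'W2': hidden_dim * alphabet_size,          # second layer weights: 100 x 27
    'b2': alphabet_size,                       # second layer bias: 27
}

total = sum(counts.values())
print(counts)
print(total)
```

The two weight matrices account for most of the parameters, while the embedding table contributes only 54.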


In [12]:
for _ in range(100):
  # forward pass
  emb = C[X]    # (num_samples, block_size, emb_dim)
  h = torch.tanh(emb.view(-1, block_size * emb_dim) @ W1 + b1)   # (num_samples, hidden_dim)
  logits = h @ W2 + b2    # (num_samples, alphabet_size)
  loss = F.cross_entropy(logits, Y)
  # print(loss.item())

  # backward pass
  for p in parameters:
    p.grad = None
  loss.backward()

  # update parameters
  learning_rate = 0.1
  for p in parameters:
    p.data += -learning_rate * p.grad

print(loss.item())


2.8117642402648926
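
One way to put this number in context is to compare it with the loss of a model that assigns equal probability to all 27 characters. The sketch below computes the cross-entropy of a single example in plain Python; the function `cross_entropy` here is a hypothetical single-example stand-in, not PyTorch's `F.cross_entropy`, which by default averages this quantity over all examples.

```python
import math

alphabet_size = 27


def cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# Equal logits give a uniform distribution over the alphabet,
# so the loss is log(27) whatever the target is.
uniform_loss = cross_entropy([0.0] * alphabet_size, target=5)
print(uniform_loss)
print(math.log(alphabet_size))
```

A uniform guess costs about 3.30, so a loss of roughly 2.81 after 100 steps means the network is already doing better than chance, although further training should lower it.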
