E05: Look up and use F.cross_entropy instead. You should achieve the same result. Can you think of why we'd prefer to use F.cross_entropy?

In [7]:
import torch
import matplotlib.pyplot as plt
import torch.nn.functional as F

In [8]:
words = open('../names.txt', 'r').read().splitlines()

In [9]:
chars = sorted(list(set(''.join(['.'] + words))))

# Create lookup tables for the alphabet
# stoi = string to index
# itos = index to string
stoi = {s:i for i, s in enumerate(chars)}
itos = {i:s for s, i in stoi.items()}

In [10]:
# create the training data
xs, ys = [], []
for w in words:
  chs = ['.'] + list(w) + ['.']
  for ch1, ch2 in zip(chs, chs[1:]):
    ix1 = stoi[ch1]
    ix2 = stoi[ch2]
    xs.append(ix1)
    ys.append(ix2)
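# convert both index lists to tensors so W can be indexed with a tensor
xs = torch.tensor(xs)
ys = torch.tensor(ys)
num = xs.nelement()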
print('number of training examples: ', num)

number of training examples:  228146


In [11]:
# initialize the 'network'
g = torch.Generator().manual_seed(2147483647)
W = torch.randn((27, 27), generator=g, requires_grad=True)

In [12]:
# gradient descent
for k in range(100):
  
  # forward pass
  logits = W[xs, :] # predict log-counts
  loss = F.cross_entropy(logits, ys) + 0.01*(W**2).mean()
  
  # backward pass
  W.grad = None # reset the gradient before backpropagation
  loss.backward()
  
  # update
  W.data += -50 * W.grad
print(f'final training loss: {loss.item():.3f}')

final training loss: 2.490
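
The forward pass above reads the logits out of W by row indexing rather than multiplying a one-hot encoding of xs by W. A minimal sketch of why the two give the same logits, assuming a small hypothetical batch of indices and an arbitrary seed (neither is from the notebook):

```python
import torch
import torch.nn.functional as F

g = torch.Generator().manual_seed(0)      # arbitrary seed, for illustration only
W = torch.randn((27, 27), generator=g)
xs = torch.tensor([0, 5, 13, 13, 1])      # hypothetical input indices

# one-hot route: each row of xenc has a single 1, so xenc @ W picks one row of W
xenc = F.one_hot(xs, num_classes=27).float()
logits_matmul = xenc @ W

# indexing route: select the same rows directly, with no matrix multiply
logits_index = W[xs]

print(torch.allclose(logits_matmul, logits_index))
```

Indexing skips building the (num_examples, 27) one-hot matrix, which matters when there are a couple hundred thousand examples.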


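With F.cross_entropy, we no longer have to turn the logits into probabilities ourselves (exponentiate, then normalize) and then take the negative log likelihood of the targets. The function takes the raw logits and the integer targets and computes the loss directly.

This is preferable for two reasons. First, it is more efficient: the intermediate counts and probabilities do not have to be built as separate tensors, so the forward and backward passes do less work and use less memory. Second, it is numerically better behaved: the log-softmax is computed internally in a stable way, so very large logits do not overflow the way exp() does in a hand-written softmax.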
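
To make the comparison concrete, here is a small sketch that computes the loss both ways on a toy batch and then repeats it with one very large logit. The batch, targets, and the value 1000.0 are assumptions chosen for illustration, not data from the notebook.

```python
import torch
import torch.nn.functional as F

g = torch.Generator().manual_seed(0)          # arbitrary seed, for illustration only
logits = torch.randn((5, 27), generator=g)    # toy batch of 5 examples
targets = torch.tensor([0, 5, 12, 3, 26])     # hypothetical target indices
rows = torch.arange(5)

# manual route: softmax, then average negative log likelihood of the targets
counts = logits.exp()
probs = counts / counts.sum(1, keepdim=True)
manual_loss = -probs[rows, targets].log().mean()

# fused route: same quantity straight from the logits
fused_loss = F.cross_entropy(logits, targets)
print(manual_loss.item(), fused_loss.item())  # the two values should agree closely

# same comparison with one extreme logit
big = logits.clone()
big[0, 0] = 1000.0                            # exp(1000) overflows to inf in float32
big_counts = big.exp()
big_probs = big_counts / big_counts.sum(1, keepdim=True)  # inf / inf gives nan
manual_big = -big_probs[rows, targets].log().mean()
fused_big = F.cross_entropy(big, targets)
print(manual_big.item(), fused_big.item())    # manual is not finite, fused stays finite
```

On ordinary logits the two routes match, so swapping in F.cross_entropy leaves the training loss unchanged. The difference only shows up at the extremes, where the manual softmax breaks down.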