# IS319 - Deep Learning

## TP3 - Recurrent neural networks

Credits: Andrej Karpathy

The goal of this TP is to experiment with recurrent neural networks for character-level language modeling: the model is trained on a text corpus and then generates new text that resembles it.

In [54]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

## 1. Text data preprocessing

Several text datasets are provided; feel free to experiment with different ones throughout the TP. To begin with, use a small subset of a given dataset (for example, only the first 10k characters).

In [55]:
!tar -xvf text-datasets.tgz



In [56]:
text_data_fname = 'baudelaire.txt'  # ~0.1m characters (French)
# text_data_fname = 'proust.txt'      # ~7.3m characters (French)
# text_data_fname = 'shakespeare.txt' # ~0.1m characters (English)
# text_data_fname = 'lotr.txt'        # ~2.5m characters (English)
# text_data_fname = 'doom.c'          # ~1m characters (C code)
# text_data_fname = 'linux.c'         # ~11.5m characters (C code)

# Read the whole file and close it properly
with open(text_data_fname, 'r') as f:
    text_data = f.read()
text_data = text_data[:10000]  # use a small subset
print(f'Dataset `{text_data_fname}` contains {len(text_data)} characters.')
print('Excerpt of the dataset:')
print(text_data[:2000])

Dataset `baudelaire.txt` contains 10000 characters.
Excerpt of the dataset:
LES FLEURS DU MAL

par

CHARLES BAUDELAIRE


AU LECTEUR


La sottise, l'erreur, le péché, la lésine,
Occupent nos esprits et travaillent nos corps,
Et nous alimentons nos aimables remords,
Comme les mendiants nourrissent leur vermine.

Nos péchés sont têtus, nos repentirs sont lâches,
Nous nous faisons payer grassement nos aveux,
Et nous rentrons gaîment dans le chemin bourbeux,
Croyant par de vils pleurs laver toutes nos taches.

Sur l'oreiller du mal c'est Satan Trismégiste
Qui berce longuement notre esprit enchanté,
Et le riche métal de notre volonté
Est tout vaporisé par ce savant chimiste.

C'est le Diable qui tient les fils qui nous remuent!
Aux objets répugnants nous trouvons des appas;
Chaque jour vers l'Enfer nous descendons d'un pas,
Sans horreur, à travers des ténèbres qui puent.

Ainsi qu'un débauché pauvre qui baise et mange
Le sein martyrisé d'une antique catin,
Nous volons au passage un plaisir c

**(Question)** Create a character-level vocabulary for your text data. Create two dictionaries: `ctoi` mapping each character to an index, and the reverse `itoc` mapping each index to its corresponding character. Implement the functions to convert text to tensor and tensor to text using these mappings. Apply these functions to some text data.

In [57]:
# Create the vocabulary and the two mapping dictionaries
# Note: iterating over a set gives an arbitrary order, so the indices can differ between sessions
voc = set(text_data)
ctoi = {c: i for i, c in enumerate(voc)}
itoc = {i: c for c, i in ctoi.items()}

print(ctoi)
print(itoc)
print(len(ctoi))

# Append an extra '$' symbol at the end of the vocabulary (skipped if already present)
if "$" not in ctoi:
    ctoi["$"] = len(ctoi)
    itoc[len(itoc)] = "$"

# Implement the function converting text to tensor
def text_to_tensor(text, ctoi):
    # 1-D tensor of character indices, one entry per character
    return torch.tensor([ctoi[c] for c in text], dtype=torch.long)


# Implement the function converting tensor to text
def tensor_to_text(tensor, itoc):
    # Accepts a 1-D tensor of indices (or a list of single-element tensors)
    return ''.join(itoc[idx.item()] for idx in tensor)

# Apply your functions to some text data
sample_tensor = text_to_tensor(text_data[:10], ctoi)

print(sample_tensor)
print(tensor_to_text(sample_tensor, itoc))



{'e': 0, 'c': 1, 'O': 2, "'": 3, 'F': 4, 'à': 5, 'I': 6, 'q': 7, 's': 8, 'm': 9, 'z': 10, 'g': 11, 'i': 12, 'M': 13, 'l': 14, 'f': 15, 'd': 16, 'î': 17, 'B': 18, ';': 19, 'è': 20, 'a': 21, 'É': 22, 'v': 23, 'N': 24, 'T': 25, '_': 26, 'ê': 27, 'u': 28, 'G': 29, 'A': 30, 'E': 31, '\n': 32, 'D': 33, 'o': 34, ',': 35, 'â': 36, 'k': 37, '?': 38, 'b': 39, '!': 40, 'V': 41, 'S': 42, 'U': 43, 'Q': 44, '.': 45, 'h': 46, 'ç': 47, 'j': 48, 'p': 49, 'P': 50, 'L': 51, 'ù': 52, 'û': 53, 'C': 54, 'y': 55, 'ô': 56, 'J': 57, 'H': 58, 'W': 59, '-': 60, '»': 61, '«': 62, 'x': 63, 'R': 64, ':': 65, 'é': 66, 'n': 67, 't': 68, ' ': 69, 'r': 70}
{0: 'e', 1: 'c', 2: 'O', 3: "'", 4: 'F', 5: 'à', 6: 'I', 7: 'q', 8: 's', 9: 'm', 10: 'z', 11: 'g', 12: 'i', 13: 'M', 14: 'l', 15: 'f', 16: 'd', 17: 'î', 18: 'B', 19: ';', 20: 'è', 21: 'a', 22: 'É', 23: 'v', 24: 'N', 25: 'T', 26: '_', 27: 'ê', 28: 'u', 29: 'G', 30: 'A', 31: 'E', 32: '\n', 33: 'D', 34: 'o', 35: ',', 36: 'â', 37: 'k', 38: '?', 39: 'b', 40: '!', 41: 'V',

## 2. Setup a character-level recurrent neural network

**(Question)** Set up a simple embedding layer with `nn.Embedding` to project character indices to `embedding_dim`-dimensional vectors. Explain precisely how this layer works and what its outputs are for a given input sequence.

In [58]:
# Map each of the len(ctoi) character indices to a 50-dimensional vector
embedding = nn.Embedding(len(ctoi), 50)

embedding(text_to_tensor(text_data[:10], ctoi))

tensor([[ 0.3860,  0.1862,  0.0707,  0.0040, -0.3789, -0.7938, -0.6429,  0.4111,
          0.3506,  0.2699, -1.9981,  0.4843,  0.7209,  0.0319,  0.1646, -0.3573,
          1.4867, -0.9488,  0.0887,  0.3134,  0.8540, -0.0650, -1.0025,  0.6038,
          0.6396, -0.2486, -0.5385, -1.1400, -1.0879,  0.0148, -0.8484, -1.8171,
          1.3409,  1.6867,  0.3975,  2.1242, -0.0626,  0.9216, -1.8180, -1.0942,
         -0.4673,  1.7314,  1.3986, -1.1375, -0.2095,  0.7399, -1.6571, -0.6826,
         -1.4411, -0.5281],
        [ 0.1699,  3.5393,  1.3957, -1.3589,  0.3102, -0.7718, -2.0302,  0.5013,
          0.5569,  1.7336, -0.8657, -0.8459,  1.3793, -0.6010, -0.4635,  0.6749,
         -0.9830, -0.0485,  0.8404, -0.6501,  0.3915,  1.5865, -1.2196,  0.2043,
          0.2346, -0.4659, -0.2888, -0.3179, -0.8887,  0.7649, -1.4995, -0.5024,
         -0.2206, -0.6439, -1.3168, -1.0166,  1.4951,  0.4900, -0.4946,  0.8736,
         -0.3390, -0.0584,  0.5506,  0.3547,  0.8620, -0.4055, -1.8146,  1.8021,
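
One way to see how the embedding layer works is to treat it as a trainable lookup table. The sketch below is only an illustration under assumed sizes (72 characters, 50 dimensions) and hypothetical indices; it checks that the layer output equals row selection in the weight matrix, and also equals a one-hot encoding multiplied by that matrix.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, embedding_dim = 72, 50   # illustrative sizes, assumed for this sketch
embedding = nn.Embedding(vocab_size, embedding_dim)

indices = torch.tensor([3, 0, 3, 7])  # hypothetical character indices
vectors = embedding(indices)          # one vector per input index

# Row i of `weight` is the vector associated with index i.
lookup = embedding.weight[indices]

# Equivalent but slower view: one-hot encoding followed by a matrix product.
one_hot = F.one_hot(indices, num_classes=vocab_size).float()
via_matmul = one_hot @ embedding.weight

print(vectors.shape)                        # torch.Size([4, 50])
print(torch.equal(vectors, lookup))         # True
print(torch.allclose(vectors, via_matmul))  # True
```

The same index always maps to the same vector (rows 0 and 2 above are identical), and the vectors are learned together with the rest of the model.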


**(Question)** Set up a single-layer RNN with `nn.RNN` (without defining a custom class). Use `hidden_dim` size for hidden states. Explain precisely the outputs of this layer for a given input sequence.

In [59]:
# input_size must match the dimension of the embedding vectors fed to the RNN (50 above)
rnn = nn.RNN(input_size=50, hidden_size=20)

YOUR ANSWER HERE
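
As a starting point for the explanation, the shapes returned by `nn.RNN` can be inspected directly. This is a minimal sketch with assumed sizes and a random input standing in for embedded characters; it uses an unbatched sequence, which requires a reasonably recent PyTorch version.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, embedding_dim, hidden_dim = 10, 50, 20  # illustrative sizes
rnn = nn.RNN(input_size=embedding_dim, hidden_size=hidden_dim)

x = torch.randn(seq_len, embedding_dim)  # stand-in for 10 embedded characters
output, h_n = rnn(x)                     # the initial hidden state defaults to zeros

print(output.shape)  # torch.Size([10, 20]): one hidden state per time step
print(h_n.shape)     # torch.Size([1, 20]): final hidden state, one row per layer
print(torch.allclose(output[-1], h_n[0]))  # True for a single-layer, unidirectional RNN
```

`output` is what a dense layer would consume to predict each next character, while `h_n` is what gets passed back in to continue the sequence from where it stopped.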

**(Question)** Create a simple RNN model with a custom `nn.Module` class. It should contain: an embedding layer, a single-layer RNN, and a dense output layer. For each character of the input sequence, the model should predict the probability of the next character. The forward method should return the probabilities for next characters and the corresponding hidden states.
After completing the class, create a model and apply the forward pass on some input text. Understand and explain the results.

*Note:* depending on how you implement the loss function later, it can be convenient to return logits instead of probabilities, i.e. raw values of the output layer before any activation function.
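
To illustrate the note above: `nn.CrossEntropyLoss` expects raw logits and applies log-softmax internally, so a model that returns logits can be plugged into it directly. The sketch below uses random tensors as hypothetical logits and targets.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 72)           # hypothetical scores: 8 characters, 72 classes
targets = torch.randint(0, 72, (8,))  # hypothetical next-character indices

# Loss computed directly from the logits.
loss_from_logits = nn.CrossEntropyLoss()(logits, targets)

# Same value computed explicitly from log-probabilities.
loss_manual = F.nll_loss(F.log_softmax(logits, dim=-1), targets)

# Probabilities are only needed for inspection or sampling.
probs = F.softmax(logits, dim=-1)

print(torch.allclose(loss_from_logits, loss_manual))
print(probs.sum(dim=-1))  # each row sums to 1 (up to rounding)
```

Applying softmax before `nn.CrossEntropyLoss` would normalize twice and distort the loss, which is the main reason to return logits.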

In [60]:
class CharRNN(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, num_layers=1):
        '''Initialize model parameters and layers.'''
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.rnn = nn.RNN(input_size=embedding_dim, hidden_size=hidden_dim, num_layers=num_layers)
        self.dense = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tensor_data, hidden_state=None):
        '''Apply the forward pass for some text data already converted to tensor.'''
        # YOUR CODE HERE
        embedding = self.embedding(tensor_data)
        output, hidden = self.rnn(embedding, hidden_state)
        logits = self.dense(output)
        return logits, hidden

# Initialize a model and apply the forward pass on some input text
charRNN = CharRNN(len(ctoi), 50, 100)
logits, _ = charRNN(text_to_tensor(text_data[:1], ctoi))

print(text_data[0])
print(logits.shape)
output = F.softmax(logits, dim=-1)  # normalize over the vocabulary dimension
print(output)

L
torch.Size([1, 72])
tensor([[0.0122, 0.0255, 0.0097, 0.0180, 0.0111, 0.0090, 0.0151, 0.0120, 0.0173,
         0.0092, 0.0082, 0.0167, 0.0125, 0.0246, 0.0148, 0.0124, 0.0146, 0.0146,
         0.0200, 0.0115, 0.0168, 0.0157, 0.0097, 0.0118, 0.0130, 0.0169, 0.0169,
         0.0114, 0.0109, 0.0105, 0.0132, 0.0115, 0.0120, 0.0129, 0.0152, 0.0122,
         0.0160, 0.0154, 0.0086, 0.0220, 0.0173, 0.0161, 0.0194, 0.0144, 0.0168,
         0.0126, 0.0117, 0.0108, 0.0092, 0.0085, 0.0160, 0.0108, 0.0161, 0.0205,
         0.0141, 0.0169, 0.0147, 0.0118, 0.0152, 0.0147, 0.0134, 0.0115, 0.0132,
         0.0113, 0.0175, 0.0143, 0.0111, 0.0116, 0.0103, 0.0114, 0.0091, 0.0161]],
       grad_fn=<SoftmaxBackward0>)




YOUR ANSWER HERE

**(Question)** Implement a simple training loop to overfit on a small input sequence. The loss function should be a categorical cross entropy on the predicted characters. Monitor the loss function value over the iterations.
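
For next-character prediction, the target sequence is usually the input sequence shifted by one position. The sketch below illustrates this pairing on the first characters of the excerpt; the names `toy_ctoi` and `toy_itoc` are hypothetical and built only for this example.

```python
import torch

text = "LES FLEURS"  # first characters of the excerpt
toy_ctoi = {c: i for i, c in enumerate(sorted(set(text)))}  # toy vocabulary (assumption)
toy_itoc = {i: c for c, i in toy_ctoi.items()}

seq = torch.tensor([toy_ctoi[c] for c in text], dtype=torch.long)
input_seq, target_seq = seq[:-1], seq[1:]  # target = input shifted by one character

print(''.join(toy_itoc[i.item()] for i in input_seq))   # LES FLEUR
print(''.join(toy_itoc[i.item()] for i in target_seq))  # ES FLEURS
```

With this pairing, position `t` of the target is exactly the character that follows position `t` of the input.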

In [99]:
# Sample a small input sequence into tensor `input_seq` and store its corresponding expected sequence into tensor `target_seq`
vocab_size = len(ctoi)
embedding_dim = 20
hidden_dim = 100
input_seq = text_to_tensor(text_data[:10], ctoi)
# Keep the first 8 characters as input and the 10th character as the final target
# (the 9th character is dropped, so the final target is not the true next character)
input_seq, last_seq = input_seq[:-2], input_seq[-1:]
print(input_seq, last_seq)

target_seq = torch.cat([input_seq[1:], last_seq])
print(target_seq)
# One-hot view of the targets, for inspection only: the loss below takes class indices
one_hot_target = torch.eye(vocab_size)[target_seq]
print(one_hot_target.size())
criterion = nn.CrossEntropyLoss()

print(input_seq.shape, target_seq.shape)

# Implement a training loop overfitting an input sequence and monitoring the loss function
def train_overfit(model, input_seq, target_seq, n_iters=200, learning_rate=0.02):
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, weight_decay=5e-3, momentum=0.9)

    for it in range(1, n_iters + 1):
        # The hidden state starts from zeros at every iteration
        logits, _ = model(input_seq)
        # The loss takes raw logits of shape (seq_len, vocab_size) and target indices of shape (seq_len,)
        loss = criterion(logits, target_seq)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if it % 10 == 0:
            print(f'Iteration {it}/{n_iters}, Loss: {loss.item()}')



# Initialize a model and make it overfit the input sequence
# YOUR CODE HERE

charRNN = CharRNN(vocab_size, embedding_dim, hidden_dim)
train_overfit(charRNN, input_seq, target_seq)

tensor([51, 31, 42, 69,  4, 51, 31, 43]) tensor([42])
tensor([31, 42, 69,  4, 51, 31, 43, 42])
torch.Size([8, 72])
torch.Size([8]) torch.Size([8])
Iteration 10/200, Loss: 2.6660232543945312
Iteration 20/200, Loss: 0.38535934686660767
Iteration 30/200, Loss: 0.07076366245746613
Iteration 40/200, Loss: 0.022225189954042435
Iteration 50/200, Loss: 0.012122681364417076
Iteration 60/200, Loss: 0.008934041485190392
Iteration 70/200, Loss: 0.00790470652282238
Iteration 80/200, Loss: 0.007511892355978489
Iteration 90/200, Loss: 0.0074180718511343
Iteration 100/200, Loss: 0.007461337372660637
Iteration 110/200, Loss: 0.007568446919322014
Iteration 120/200, Loss: 0.007710354868322611
Iteration 130/200, Loss: 0.007873008958995342
Iteration 140/200, Loss: 0.008047943003475666
Iteration 150/200, Loss: 0.008229600265622139
Iteration 160/200, Loss: 0.00841391272842884
Iteration 170/200, Loss: 0.008597610518336296
Iteration 180/200, Loss: 0.008777903392910957
Iteration 190/200, Loss: 0.008952690288424

**(Question)** Implement a `predict_argmax` method for your `RNN` model. Then, verify your overfitting: use some characters of your input sequence as context to predict the remaining ones. Experiment with the current model and analyze the results.

In [123]:
class CharRNN(CharRNN):
    def predict_argmax(self, context_tensor, n_predictions):
        predictions = []
        with torch.no_grad():
            # Apply the forward pass for the context tensor
            # Then, store the last prediction and last hidden state
            logits, hidden = self(context_tensor)
            # argmax over the vocabulary dimension (softmax would not change the argmax)
            last_pred = torch.argmax(logits[-1:], dim=-1)
            predictions.append(last_pred)
            # Use the last prediction and last hidden state as inputs to the next forward pass
            # Do this in a loop to predict the next `n_predictions` characters
            for _ in range(n_predictions):
                logits, hidden = self(last_pred, hidden)
                last_pred = torch.argmax(logits, dim=-1)
                predictions.append(last_pred)
        # n_predictions + 1 indices: one predicted from the context, then one per loop iteration
        return predictions


# Initialize a model and make it overfit as above
# Then, verify your overfitting by predicting characters given some context
charRNN = CharRNN(vocab_size, embedding_dim, hidden_dim)
train_overfit(charRNN, input_seq, target_seq)

# Use the first 500 characters as context and predict what follows
context = text_to_tensor(text_data[:500], ctoi)
print(tensor_to_text(charRNN.predict_argmax(context, 100), itoc))


Iteration 10/200, Loss: 2.45266056060791
Iteration 20/200, Loss: 0.24240770936012268
Iteration 30/200, Loss: 0.0919770896434784
Iteration 40/200, Loss: 0.031762223690748215
Iteration 50/200, Loss: 0.011348268017172813
Iteration 60/200, Loss: 0.0066064889542758465
Iteration 70/200, Loss: 0.00528807332739234
Iteration 80/200, Loss: 0.004910900257527828
Iteration 90/200, Loss: 0.004839443601667881
Iteration 100/200, Loss: 0.004896368831396103
Iteration 110/200, Loss: 0.005020711570978165
Iteration 120/200, Loss: 0.005186916328966618
Iteration 130/200, Loss: 0.005383055657148361
Iteration 140/200, Loss: 0.00560236070305109
Iteration 150/200, Loss: 0.005839820019900799
Iteration 160/200, Loss: 0.006091044284403324
Iteration 170/200, Loss: 0.006351749412715435
Iteration 180/200, Loss: 0.006617935840040445
Iteration 190/200, Loss: 0.006885402835905552
Iteration 200/200, Loss: 0.007150441408157349
US FLEUS FLEUS FLEUS FLEUS FLEUS FLEUS FLEUS FLEUS FLEUS FLEUS FLEUS FLEUS FLEUS FLEUS FLEUS FLEU

YOUR ANSWER HERE

Predicting the next character with argmax makes the generator deterministic: a given context always yields the same continuation. Instead, it is common to sample the next character from the predicted output distribution, which adds some randomness to the generator.

**(Question)** Implement a `predict_proba` method for your `RNN` model. It should be very similar to `predict_argmax`, but instead of using argmax, it should randomly sample from the output predictions. To do that, you can use the `torch.distributions.Categorical` class and its `sample()` method. Verify that your method does introduce some randomness.
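
The difference between argmax and sampling can be seen on a single output vector. This is a small sketch with hypothetical logits for a 4-character vocabulary, not the full `predict_proba` method; `Categorical` accepts logits directly and normalizes them internally.

```python
import torch
from torch.distributions import Categorical

torch.manual_seed(0)
logits = torch.tensor([2.0, 1.0, 0.1, -1.0])  # hypothetical scores for 4 characters

greedy = torch.argmax(logits)      # always the same index
dist = Categorical(logits=logits)  # softmax is applied internally
samples = dist.sample((1000,))     # 1000 independent draws

print(greedy.item())  # 0
print(torch.bincount(samples, minlength=4) / 1000.0)  # empirical frequencies
print(dist.probs)     # probabilities the frequencies should approach
```

Inside a generation loop, a single `dist.sample()` call would replace the argmax at each step.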

In [None]:
class CharRNN(CharRNN):
    def predict_proba(self, input_context, n_predictions):
        # YOUR CODE HERE
        

# Verify that your predictions are not deterministic anymore
# YOUR CODE HERE
raise NotImplementedError()

## 3. Train the RNN model on text data

**(Question)** Adapt your previous code to implement a proper training loop for a text dataset. To do so, we need to specify a sequence length `seq_len`, which plays a role similar to the batch size in classic neural networks. Then, you can either randomly sample sequences of length `seq_len` from the text dataset over `n_iters` iterations, or loop over the whole text dataset for `n_epochs` epochs (with a random starting point for each epoch to ensure different sequences), so that the model sees the whole dataset. Feel free to adjust training and model parameters empirically. Start with a small model and a small subset of the text dataset, then move on to larger experiments. Remember to use the GPU if available.
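
The two sampling strategies described above could be sketched as follows. The helper names `sample_sequence` and `epoch_sequences` are hypothetical, and an integer range stands in for the text converted to indices; this is one possible layout, not the expected solution.

```python
import torch

def sample_sequence(data, seq_len):
    """Return a random (input, target) pair of length `seq_len` from a 1-D tensor."""
    # The upper bound leaves room for the extra character needed by the shifted target.
    start = torch.randint(0, len(data) - seq_len, (1,)).item()
    chunk = data[start:start + seq_len + 1]
    return chunk[:-1], chunk[1:]

def epoch_sequences(data, seq_len):
    """Yield consecutive (input, target) pairs over the data, from a random offset."""
    offset = torch.randint(0, seq_len, (1,)).item()
    for start in range(offset, len(data) - seq_len, seq_len):
        chunk = data[start:start + seq_len + 1]
        yield chunk[:-1], chunk[1:]

data = torch.arange(1000)  # stand-in for the text converted to indices
x, y = sample_sequence(data, seq_len=25)
print(x.shape, y.shape)            # torch.Size([25]) torch.Size([25])
print(torch.equal(x[1:], y[:-1]))  # True: targets are inputs shifted by one
print(sum(1 for _ in epoch_sequences(data, seq_len=25)))  # number of sequences in one epoch
```

Random sampling is simpler to write, while the epoch-based generator guarantees that nearly every character is visited once per epoch.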

In [None]:
# Create the text dataset, compute its mappings and convert it to tensor
# YOUR CODE HERE
raise NotImplementedError()

# Initialize training parameters
# YOUR CODE HERE
raise NotImplementedError()

# Initialize a character-level RNN model
# YOUR CODE HERE
raise NotImplementedError()

# Setup the training loop
# Regularly record the loss and sample from the model to monitor what is happening
# YOUR CODE HERE
raise NotImplementedError()

**(Question)** From your trained model, play around with its predictions: start with a custom input sequence and ask the model to predict the rest. Analyze and comment on your results.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

YOUR ANSWER HERE

## 4. Experiment with different RNN architectures

**(Question)** Experiment with different RNN architectures. Potential ideas are multi-layer RNNs, GRUs and LSTMs. All models can be extended to multi-layer using the `num_layers` parameter. Analyze and comment on your results.
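
Since `nn.RNN`, `nn.GRU` and `nn.LSTM` share the same constructor arguments, swapping one for another is mostly a one-line change. The sketch below uses assumed sizes and a random input to compare their outputs; the main difference to handle is that `nn.LSTM` returns its state as a tuple `(h_n, c_n)`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
embedding_dim, hidden_dim, num_layers = 20, 100, 2  # illustrative sizes
x = torch.randn(8, embedding_dim)  # stand-in for 8 embedded characters (unbatched)

for cell in (nn.RNN, nn.GRU, nn.LSTM):
    layer = cell(input_size=embedding_dim, hidden_size=hidden_dim, num_layers=num_layers)
    output, hidden = layer(x)
    # nn.LSTM returns a tuple (h_n, c_n); nn.RNN and nn.GRU return h_n only.
    h_n = hidden[0] if isinstance(hidden, tuple) else hidden
    n_params = sum(p.numel() for p in layer.parameters())
    print(cell.__name__, tuple(output.shape), tuple(h_n.shape), n_params)
```

Because the returned state can be passed back unchanged as the second argument of the same layer, a generation loop that only forwards `hidden` does not need to know which cell type is used.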

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

YOUR ANSWER HERE