# IS319 - Deep Learning

## TP3 - Recurrent neural networks

Credits: Andrej Karpathy

The goal of this TP is to experiment with recurrent neural networks for a character-level language model to generate text that looks like training text data.

In [31]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

## 1. Text data preprocessing

Several text datasets are provided, feel free to experiment with different ones throughout the TP. At the beginning, use a small subset of a given dataset (for example use only 10k characters).

In [32]:
!tar -xvf text-datasets.tgz

tar: Error opening archive: Failed to open 'text-datasets.tgz'


In [33]:
text_data_fname = 'baudelaire.txt'  # ~0.1m characters (French)
# text_data_fname = 'proust.txt'      # ~7.3m characters (French)
# text_data_fname = 'shakespeare.txt' # ~0.1m characters (English)
# text_data_fname = 'lotr.txt'        # ~2.5m characters (English)
# text_data_fname = 'doom.c'          # ~1m characters (C Code)
# text_data_fname = 'linux.c'         # ~11.5m characters (C code)

text_data = open(text_data_fname, 'r',encoding="utf8").read()
text_data = text_data[:10000] # use a small subset
print(f'Dataset `{text_data_fname}` contains {len(text_data)} characters.')
print('Excerpt of the dataset:')
print(text_data[:2000])

Dataset `baudelaire.txt` contains 10000 characters.
Excerpt of the dataset:
LES FLEURS DU MAL

par

CHARLES BAUDELAIRE


AU LECTEUR


La sottise, l'erreur, le péché, la lésine,
Occupent nos esprits et travaillent nos corps,
Et nous alimentons nos aimables remords,
Comme les mendiants nourrissent leur vermine.

Nos péchés sont têtus, nos repentirs sont lâches,
Nous nous faisons payer grassement nos aveux,
Et nous rentrons gaîment dans le chemin bourbeux,
Croyant par de vils pleurs laver toutes nos taches.

Sur l'oreiller du mal c'est Satan Trismégiste
Qui berce longuement notre esprit enchanté,
Et le riche métal de notre volonté
Est tout vaporisé par ce savant chimiste.

C'est le Diable qui tient les fils qui nous remuent!
Aux objets répugnants nous trouvons des appas;
Chaque jour vers l'Enfer nous descendons d'un pas,
Sans horreur, à travers des ténèbres qui puent.

Ainsi qu'un débauché pauvre qui baise et mange
Le sein martyrisé d'une antique catin,
Nous volons au passage un plaisir c

**(Question)** Create a character-level vocabulary for your text data. Create two dictionaries: `ctoi` mapping each character to an index, and the reverse `itoc` mapping each index to its corresponding character. Implement the functions to convert text to tensor and tensor to text using these mappings. Apply these functions to some text data.

In [34]:
# Create the vocabulary and the two mapping dictionaries
# YOUR CODE HERE
import numpy as np

idx = 0
ctoi = {}
itoc = {}

voc = set(text_data)

for elt in voc:
    ctoi[elt] = idx
    itoc[idx] = elt
    idx += 1

print(ctoi)
print(itoc)
print(len(ctoi))
ctoi["$"] = len(ctoi)
itoc[len(itoc)] = "$"
# Implement the function converting text to tensor
def text_to_tensor(text, ctoi):
    # YOUR CODE HERE
    return torch.LongTensor(np.array([ctoi[c] for c in text]))
    #tensor = torch.zeros(len(text), 1, len(ctoi))
    #for i, letter in enumerate(text):
    #  tensor[i][0][ctoi[letter]] = 1
    #return tensor


# Implement the function converting tensor to text
def tensor_to_text(tensor, itoc):
    # YOUR CODE HERE
    return ''.join([itoc[elt.item()] for elt in tensor])#torch.argmax(tensor, dim=2)])

# Apply your functions to some text data
# YOUR CODE HERE
#raise NotImplementedError()
a = text_to_tensor(text_data[:10], ctoi)

print(a)
print(tensor_to_text(a, itoc))



{':': 0, 'R': 1, 'É': 2, "'": 3, '!': 4, 'ô': 5, 'B': 6, ';': 7, ',': 8, 'N': 9, ' ': 10, 'P': 11, 'b': 12, 'D': 13, 'ê': 14, 'G': 15, 'H': 16, 'f': 17, 'T': 18, 'v': 19, 'U': 20, '»': 21, 'I': 22, '.': 23, 'h': 24, 'L': 25, 'n': 26, 'ù': 27, 'W': 28, '?': 29, 'c': 30, 'û': 31, 'r': 32, 'o': 33, 'a': 34, '\n': 35, 't': 36, 'C': 37, 'l': 38, 'é': 39, 'm': 40, 'F': 41, 'â': 42, 'g': 43, 'E': 44, 'V': 45, 's': 46, 'u': 47, 'q': 48, 'e': 49, 'd': 50, '-': 51, 'O': 52, '«': 53, 'k': 54, 'Q': 55, 'à': 56, 'i': 57, 'î': 58, 'j': 59, 'J': 60, 'è': 61, 'A': 62, 'z': 63, 'p': 64, '_': 65, 'y': 66, 'S': 67, 'M': 68, 'x': 69, 'ç': 70}
{0: ':', 1: 'R', 2: 'É', 3: "'", 4: '!', 5: 'ô', 6: 'B', 7: ';', 8: ',', 9: 'N', 10: ' ', 11: 'P', 12: 'b', 13: 'D', 14: 'ê', 15: 'G', 16: 'H', 17: 'f', 18: 'T', 19: 'v', 20: 'U', 21: '»', 22: 'I', 23: '.', 24: 'h', 25: 'L', 26: 'n', 27: 'ù', 28: 'W', 29: '?', 30: 'c', 31: 'û', 32: 'r', 33: 'o', 34: 'a', 35: '\n', 36: 't', 37: 'C', 38: 'l', 39: 'é', 40: 'm', 41: 'F',

## 2. Setup a character-level recurrent neural network

**(Question)** Setup a simple embedding layer with `nn.Embedding` to project character indices to `embedding_dim` dimensional vectors. Explain precisely how this layer works and what are its outputs for a given input sequence.

In [35]:
# YOUR CODE HERE
embedding = nn.Embedding(len(ctoi),50)

embedding(text_to_tensor(text_data[:10], ctoi))

tensor([[-8.1775e-01,  9.7502e-01, -1.7873e+00, -3.2672e-01, -9.3343e-01,
         -1.5130e-01, -4.0676e-01,  7.5977e-01, -1.9959e+00, -1.4231e+00,
         -3.8849e-01,  6.8988e-01, -3.2921e+00,  4.9925e-01, -2.7136e-01,
          6.1775e-01,  1.3765e+00, -9.3984e-01,  1.4457e+00, -7.0095e-02,
          4.9262e-01,  1.2880e+00,  1.5253e-02, -3.7845e-03,  1.1888e-02,
          1.2387e+00, -3.3290e-02,  3.3278e-01,  2.2172e+00,  6.9170e-01,
         -7.2544e-01,  2.5967e-01, -2.7061e-01,  6.7540e-01, -5.1456e-01,
         -4.1789e-01,  3.8044e-01,  9.3984e-02, -2.0808e-01,  1.2979e+00,
          1.2123e+00,  6.4157e-01, -7.8024e-01, -1.4961e+00, -5.5236e-01,
          3.6518e-02, -3.0728e-01, -4.8298e-01, -9.1797e-01,  2.8892e-01],
        [-5.8279e-01, -2.3925e-01, -1.7259e+00, -1.5302e+00, -4.0124e-01,
         -1.2530e+00, -4.9950e-01,  1.2825e+00,  9.0740e-02, -1.1030e+00,
         -1.1058e+00, -2.5028e-01,  2.0488e+00, -6.8769e-01,  2.3465e-01,
         -7.2817e-01,  4.4428e-01,  1

**(Question)** Setup a single-layer RNN with `nn.RNN` (without defining a custom class). Use `hidden_dim` size for hidden states. Explain precisely the outputs of this layer for a given input sequence.

In [36]:
# YOUR CODE HERE
rnn = nn.RNN(input_size=5, hidden_size=20)

YOUR ANSWER HERE

**(Question)** Create a simple RNN model with a custom `nn.Module` class. It should contain: an embedding layer, a single-layer RNN, and a dense output layer. For each character of the input sequence, the model should predict the probability of the next character. The forward method should return the probabilities for next characters and the corresponding hidden states.
After completing the class, create a model and apply the forward pass on some input text. Understand and explain the results.

*Note:* depending on how you implement the loss function later, it can be convenient to return logits instead of probabilities, i.e. raw values of the output layer before any activation function.

In [37]:
class CharRNN(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, num_layers=1):
        '''Initialize model parameters and layers.'''
        super().__init__()
        # YOUR CODE HERE
        self.embedding = nn.Embedding(vocab_size,embedding_dim)
        self.rnn = nn.RNN(input_size=embedding_dim, hidden_size=hidden_dim, num_layers=num_layers)
        self.dense = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tensor_data, hidden_state=None):
        '''Apply the forward pass for some text data already converted to tensor.'''
        # YOUR CODE HERE
        embedding = self.embedding(tensor_data)
        output, hidden = self.rnn(embedding, hidden_state)
        logits = self.dense(output)
        return logits, hidden

# Initialize a model and apply the forward pass on some input text
# YOUR CODE HERE


vocab_size = len(ctoi)
embedding_dim = 200
hidden_dim = 100

charRNN = CharRNN(vocab_size,embedding_dim,hidden_dim)
logits, _ = charRNN.forward(text_to_tensor(text_data[:1],ctoi))

print(text_data[0])
print(logits.shape)
output = F.softmax(logits)
print(output)

L
torch.Size([1, 72])
tensor([[0.0070, 0.0201, 0.0138, 0.0141, 0.0211, 0.0166, 0.0084, 0.0122, 0.0067,
         0.0124, 0.0110, 0.0154, 0.0201, 0.0080, 0.0154, 0.0173, 0.0113, 0.0115,
         0.0167, 0.0185, 0.0108, 0.0123, 0.0078, 0.0144, 0.0140, 0.0157, 0.0148,
         0.0125, 0.0120, 0.0098, 0.0121, 0.0264, 0.0098, 0.0104, 0.0112, 0.0120,
         0.0199, 0.0081, 0.0179, 0.0165, 0.0164, 0.0173, 0.0145, 0.0086, 0.0104,
         0.0102, 0.0126, 0.0112, 0.0076, 0.0136, 0.0118, 0.0166, 0.0170, 0.0172,
         0.0132, 0.0161, 0.0164, 0.0142, 0.0113, 0.0149, 0.0144, 0.0121, 0.0172,
         0.0198, 0.0171, 0.0118, 0.0098, 0.0087, 0.0215, 0.0203, 0.0105, 0.0197]],
       grad_fn=<SoftmaxBackward0>)


  output = F.softmax(logits)


YOUR ANSWER HERE

**(Question)** Implement a simple training loop to overfit on a small input sequence. The loss function should be a categorical cross entropy on the predicted characters. Monitor the loss function value over the iterations.

In [38]:
# Sample a small input sequence into tensor `input_seq` and store its corresponding expected sequence into tensor `target_seq`
# YOUR CODE HERE
vocab_size = len(ctoi)
embedding_dim = 20
hidden_dim = 100
input_seq = text_to_tensor(text_data[:10], ctoi)
input_seq, last_seq = input_seq[:-2], input_seq[-1:]
print(input_seq, last_seq)

target_seq = torch.cat([input_seq[1:], last_seq])

criterion = nn.CrossEntropyLoss()

print(input_seq.shape, target_seq.shape)

# Implement a training loop overfitting an input sequence and monitoring the loss function
def train_overfit(model, input_seq, target_seq, n_iters=200, learning_rate=0.02):
    # YOUR CODE HERE
    optimizer = torch.optim.SGD(model.parameters(), lr = learning_rate, weight_decay=5e-3, momentum=0.9)
    hidden = None

    for iter in range(1, n_iters + 1):
      logits, hidden = model.forward(input_seq, hidden)
      hidden = hidden.detach() #Once we update the hidden state we need to detach it, to not backpropagate through it in the next batch
      #output = F.softmax(logits, dim=1)
      loss = criterion(logits, target_seq)

      optimizer.zero_grad()
      loss.backward()
      optimizer.step()

      if iter % 10 == 0:
          print(f'Iteration {iter}/{n_iters}, Loss: {loss.item()}')



# Initialize a model and make it overfit the input sequence
# YOUR CODE HERE

charRNN = CharRNN(vocab_size, embedding_dim, hidden_dim)
train_overfit(charRNN, input_seq, target_seq)

tensor([25, 44, 67, 10, 41, 25, 44, 20]) tensor([67])
torch.Size([8]) torch.Size([8])
Iteration 10/200, Loss: 2.2508349418640137
Iteration 20/200, Loss: 0.1759185940027237
Iteration 30/200, Loss: 0.027652131393551826
Iteration 40/200, Loss: 0.007345334626734257
Iteration 50/200, Loss: 0.0034764930605888367
Iteration 60/200, Loss: 0.002715783193707466
Iteration 70/200, Loss: 0.002596323611214757
Iteration 80/200, Loss: 0.002659213962033391
Iteration 90/200, Loss: 0.0027965474873781204
Iteration 100/200, Loss: 0.0029749737586826086
Iteration 110/200, Loss: 0.0031818565912544727
Iteration 120/200, Loss: 0.0034112045541405678
Iteration 130/200, Loss: 0.003659548470750451
Iteration 140/200, Loss: 0.0039240713231265545
Iteration 150/200, Loss: 0.004202090669423342


Iteration 160/200, Loss: 0.004490645136684179
Iteration 170/200, Loss: 0.004786449484527111
Iteration 180/200, Loss: 0.005086120218038559
Iteration 190/200, Loss: 0.005385922268033028
Iteration 200/200, Loss: 0.005682464689016342


**(Question)** Implement a `predict_argmax` method for your `RNN` model. Then, verify your overfitting: use some characters of your input sequence as context to predict the remaining ones. Experiment with the current model and analyze the results.

In [39]:
class CharRNN(CharRNN):
    def predict_argmax(self, context_tensor, n_predictions):
        # Apply the forward pass for the context tensor
        # Then, store the last prediction and last hidden state
        # YOUR CODE HERE
        predictions = []
        logits, hidden = self.forward(context_tensor)
        output = F.softmax(logits, dim=1)[-1]
        last_pred = torch.argmax(output)
        last_pred = torch.LongTensor(last_pred).unsqueeze(-1)
        predictions.append(last_pred)
        # Use the last prediction and last hidden state as inputs to the next forward pass
        # Do this in a loop to predict the next `n_predictions` characters
        # YOUR CODE HERE
        for _ in range(n_predictions):
            logits, hidden = self.forward(last_pred, hidden)
            output = F.softmax(logits, dim=1)
            last_pred = torch.argmax(output).unsqueeze(-1)
            predictions.append(last_pred)
        return predictions


# Initialize a model and make it overfit as above
# Then, verify your overfitting by predicting characters given some context
# YOUR CODE HERE

charRNN = CharRNN(vocab_size, embedding_dim, hidden_dim)
train_overfit(charRNN, input_seq, target_seq)

print(tensor_to_text(charRNN.predict_argmax(text_to_tensor(text_data[:10],ctoi),10),itoc))


Iteration 10/200, Loss: 2.3578298091888428
Iteration 20/200, Loss: 0.16633586585521698
Iteration 30/200, Loss: 0.02181057818233967
Iteration 40/200, Loss: 0.00830086413770914
Iteration 50/200, Loss: 0.004669333808124065
Iteration 60/200, Loss: 0.003719083731994033
Iteration 70/200, Loss: 0.00346615188755095
Iteration 80/200, Loss: 0.0034484269563108683
Iteration 90/200, Loss: 0.003531783353537321
Iteration 100/200, Loss: 0.003670725505799055
Iteration 110/200, Loss: 0.0038469890132546425
Iteration 120/200, Loss: 0.004051199648529291
Iteration 130/200, Loss: 0.004277787636965513
Iteration 140/200, Loss: 0.004522814881056547
Iteration 150/200, Loss: 0.004783169366419315
Iteration 160/200, Loss: 0.005055741406977177
Iteration 170/200, Loss: 0.005337526556104422
Iteration 180/200, Loss: 0.005624797195196152
Iteration 190/200, Loss: 0.005914022680372
Iteration 200/200, Loss: 0.0062013971619307995
 FLEUS FLEU


YOUR ANSWER HERE

Using the argmax function to predict the next character can yield a deterministic generator always predicting the same characters. Instead, it is common to predict the next character by sampling from the distribution of output predictions, adding some randomness into the generator.

**(Question)** Implement a `predict_proba` method for your `RNN` model. It should be very similar to `predict_argmax`, but instead of using argmax, it should randomly sample from the output predictions. To do that, you can use the `torch.distribution.Categorical` class and its `sample()` method. Verify that your method correctly added some randomness.

In [40]:
from torch.distributions import Categorical


tensor = torch.tensor([0.06,0.04,0.3,0.2,0.1,0.05,0.09,0.06,0.1])
distribution = Categorical(probs=tensor)
last_pred1 = distribution.sample()
last_pred2 = distribution.sample()
last_pred3 = distribution.sample()
argmax = tensor.argmax()


print("argmax : " ,argmax, tensor[argmax])
print("sample distribution : ", last_pred1, tensor[last_pred1])
print("sample distribution : ", last_pred2, tensor[last_pred2])
print("sample distribution : ", last_pred3, tensor[last_pred3])

argmax :  tensor(2) tensor(0.3000)
sample distribution :  tensor(3) tensor(0.2000)
sample distribution :  tensor(8) tensor(0.1000)
sample distribution :  tensor(2) tensor(0.3000)


In [41]:
class CharRNN(CharRNN):
    def predict_proba(self, input_context, n_predictions):
        # YOUR CODE HERE
        predictions = []
        logits, hidden = self.forward(input_context)
        output = F.softmax(logits, dim=1)[-1]
        from torch.distributions import Categorical
        distribution = Categorical(probs=output)
        last_pred = distribution.sample()
        #max_pred = torch.argmax(output)
        #print(last_pred, max_pred, max(output))
        last_pred = torch.LongTensor(last_pred).unsqueeze(-1)
        predictions.append(last_pred)
        # Use the last prediction and last hidden state as inputs to the next forward pass
        # Do this in a loop to predict the next `n_predictions` characters
        # YOUR CODE HERE
        for _ in range(n_predictions):
            logits, hidden = self.forward(last_pred, hidden)
            output = F.softmax(logits, dim=1)
            #last_pred = torch.argmax(output).unsqueeze(-1)     
            distribution = Categorical(logits=output)
            last_pred = distribution.sample()
            predictions.append(last_pred)
        return predictions
        

# Verify that your predictions are not deterministic anymore
# YOUR CODE HERE

vocab_size = len(ctoi)
embedding_dim = 5
hidden_dim = 50    


charRNN = CharRNN(vocab_size, embedding_dim, hidden_dim)
train_overfit(charRNN, input_seq, target_seq)

print(tensor_to_text(charRNN.predict_proba(text_to_tensor(text_data[:10],ctoi),20),itoc))


Iteration 10/200, Loss: 3.358973979949951
Iteration 20/200, Loss: 1.614370584487915
Iteration 30/200, Loss: 0.868393063545227
Iteration 40/200, Loss: 0.4838734567165375
Iteration 50/200, Loss: 0.31867921352386475
Iteration 60/200, Loss: 0.18157970905303955
Iteration 70/200, Loss: 0.0910157784819603
Iteration 80/200, Loss: 0.0518941730260849
Iteration 90/200, Loss: 0.035380344837903976
Iteration 100/200, Loss: 0.027599308639764786
Iteration 110/200, Loss: 0.02338743954896927
Iteration 120/200, Loss: 0.02085047774016857
Iteration 130/200, Loss: 0.01919364742934704
Iteration 140/200, Loss: 0.01804051548242569
Iteration 150/200, Loss: 0.01719873584806919
Iteration 160/200, Loss: 0.016563240438699722
Iteration 170/200, Loss: 0.016072111204266548
Iteration 180/200, Loss: 0.015685953199863434
Iteration 190/200, Loss: 0.015378683805465698
Iteration 200/200, Loss: 0.015132136642932892
SîSsmVûEf'èufcaHûQFyn


## 3. Train the RNN model on text data

**(Question)** Adapt your previous code to implement a proper training loop for a text dataset. To do so, we need to specify a sequence length `seq_len`, acting similarly to the batch size in classic neural networks. Then, you can either randomly sample sequences of length `seq_len` from the text dataset over `n_iters` iterations, or properly loop over the text dataset for `n_epochs` epochs (with a random starting point for each epoch to ensure different sequences), to make sure the whole dataset is seen by the model. Feel free to adjust training and model parameters empirically. Start with a small model and a small subset of the text dataset, then move on to larger experiments. Remember to use GPU if available.

In [42]:
# Create the text dataset, compute its mappings and convert it to tensor
# YOUR CODE HERE
seq_len = 16
dataset_size = int(len(text_data) / seq_len)
text_dataset_input = torch.empty((dataset_size, seq_len)).long()
text_dataset_output = torch.empty((dataset_size, seq_len)).long()
for i in range(1,len(text_data), seq_len):
    target_input = text_to_tensor(text_data[i-1:i-1+seq_len], ctoi)
    target_output = text_to_tensor(text_data[i:i + seq_len], ctoi)
    if target_input.size(dim=0) != seq_len:
        pad = torch.cat([torch.tensor([len(ctoi) - 1]) for _ in range(seq_len - target_input.size(dim=0))])
        target_input = torch.cat((target_input,pad))
    if target_output.size(dim=0) != seq_len:
        pad = torch.cat([torch.tensor([len(ctoi) - 1]) for _ in range(seq_len - target_output.size(dim=0))])
        target_output = torch.cat((target_output,pad))
    text_dataset_input[int(i / seq_len)] = target_input
    text_dataset_output[int(i / seq_len)] = target_output
#print(text_to_tensor(text_data[i:i+seq_len],ctoi))
print(text_dataset_input[0])

# Initialize training parameters
# YOUR CODE HERE
vocab_size = len(ctoi)
embedding_dim = 16
hidden_dim = 16
n_epochs = 16
# Initialize a character-level RNN model
# YOUR CODE HERE
textRNN = CharRNN(vocab_size, embedding_dim, hidden_dim)

optimizer = torch.optim.SGD(textRNN.parameters(), lr = 0.01)#, weight_decay=5e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()
# Setup the training loop
# Regularly record the loss and sample from the model to monitor what is happening
# YOUR CODE HERE
def fit(model, text_dataset_input, text_dataset_output, dataset_size,n_epochs, optimizer):
  indices = torch.randperm(dataset_size) #randomly permute the indices
  for iter in range(1, n_epochs + 1):
        indices = torch.randperm(dataset_size) #randomly permute the indices
        hidden = None
        for i in range(dataset_size): 
          logits, hidden = model(text_dataset_input[indices[i]],hidden)
          hidden = hidden.detach() #Once we update the hidden state we need to detach it, to not backpropagate through it in the next batch
          loss = criterion(logits, text_dataset_output[indices[i]])
          optimizer.zero_grad()
          loss.backward()
          optimizer.step()

        
        print(f'Epoch {iter}, Loss: {loss.item()}')
  return model

textRNN = fit(textRNN, text_dataset_input, text_dataset_output, dataset_size, n_epochs, optimizer)
      

tensor([25, 44, 67, 10, 41, 25, 44, 20,  1, 67, 10, 13, 20, 10, 68, 62])


Epoch 1, Loss: 2.758643388748169
Epoch 2, Loss: 2.8383214473724365
Epoch 3, Loss: 2.5508549213409424
Epoch 4, Loss: 2.9029555320739746
Epoch 5, Loss: 3.0752131938934326
Epoch 6, Loss: 2.8168880939483643
Epoch 7, Loss: 2.9333038330078125
Epoch 8, Loss: 2.0109949111938477
Epoch 9, Loss: 2.728416681289673
Epoch 10, Loss: 2.8701605796813965
Epoch 11, Loss: 1.8708546161651611
Epoch 12, Loss: 2.7119500637054443
Epoch 13, Loss: 2.2926745414733887
Epoch 14, Loss: 2.2959446907043457
Epoch 15, Loss: 2.2324752807617188
Epoch 16, Loss: 2.3488385677337646
Epoch 17, Loss: 2.0884361267089844
Epoch 18, Loss: 2.1759555339813232
Epoch 19, Loss: 2.146421432495117
Epoch 20, Loss: 2.008077383041382


**(Question)** From your trained model, play around with its predictions: start with a custom input sequence and ask the model to predict the rest. Analyze and comment your results.

In [43]:
# YOUR CODE HERE
context_tensor = text_to_tensor(text_data[10:50], ctoi)

print(tensor_to_text(textRNN.predict_argmax(context_tensor,200),itoc))














































































































































































































YOUR ANSWER HERE

## 4. Experiment with different RNN architectures

**(Question)** Experiment with different RNN architecures. Potential ideas are multi-layer RNNs, GRUs and LSTMs. All models can be extended to multi-layer using the `num_layers` parameter. Analyze and comment your results.

In [44]:
# YOUR CODE HERE
class GruNN(CharRNN):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, num_layers=1):
        '''Initialize model parameters and layers.'''
        super().__init__(vocab_size, embedding_dim, hidden_dim, num_layers=1)
        # YOUR CODE HERE
        self.gru = nn.GRU(input_size=embedding_dim, hidden_size=hidden_dim, num_layers=num_layers)

    def forward(self, tensor_data, hidden_state=None):
        '''Apply the forward pass for some text data already converted to tensor.'''
        # YOUR CODE HERE
        embedding = self.embedding(tensor_data)
        output, hidden = self.gru(embedding, hidden_state)
        logits = self.dense(output)
        return logits, hidden

In [55]:
# YOUR CODE HERE
class LSTMNN(CharRNN):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, num_layers=1):
        '''Initialize model parameters and layers.'''
        super().__init__(vocab_size, embedding_dim, hidden_dim)
        # YOUR CODE HERE
        self.lstm = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_dim, num_layers=num_layers)

    def forward(self, tensor_data, hidden_state=None, c=None):
        '''Apply the forward pass for some text data already converted to tensor.'''
        # YOUR CODE HERE
        embedding = self.embedding(tensor_data)
        output, (hidden, c) = self.lstm(embedding, hidden_state, c)
        logits = self.dense(output)
        return logits, hidden, c

def fit_lstm(model, text_dataset_input, text_dataset_output, dataset_size,n_epochs, optimizer):
  indices = torch.randperm(dataset_size) #randomly permute the indices
  for iter in range(1, n_epochs + 1):
        indices = torch.randperm(dataset_size) #randomly permute the indices
        hidden = None
        c = None
        for i in range(dataset_size): 
          logits, (hidden, c) = model(text_dataset_input[indices[i]],hidden, c)
          hidden = hidden.detach() #Once we update the hidden state we need to detach it, to not backpropagate through it in the next batch
          c = c.detach()
          loss = criterion(logits, text_dataset_output[indices[i]])
          optimizer.zero_grad()
          loss.backward()
          optimizer.step()

        
        print(f'Epoch {iter}, Loss: {loss.item()}')
  return model

In [46]:
vocab_size = len(ctoi)
embedding_dim = 32
hidden_dim = 16

n_epochs = 20

In [47]:
context_tensor = text_to_tensor(text_data[10:50], ctoi)

In [48]:
multilayer_rnn = CharRNN(vocab_size, embedding_dim, hidden_dim, num_layers=4)
optimizer = torch.optim.SGD(multilayer_rnn.parameters(), lr = 0.01)#, weight_decay=5e-3, momentum=0.9)

multilayer_rnn = fit(multilayer_rnn, text_dataset_input, text_dataset_output, dataset_size, n_epochs, optimizer)

Epoch 1, Loss: 2.9483582973480225
Epoch 2, Loss: 2.4559316635131836
Epoch 3, Loss: 3.0863142013549805
Epoch 4, Loss: 2.7661280632019043
Epoch 5, Loss: 2.2989284992218018
Epoch 6, Loss: 2.3460638523101807
Epoch 7, Loss: 2.4761457443237305
Epoch 8, Loss: 2.4451935291290283
Epoch 9, Loss: 2.373699426651001
Epoch 10, Loss: 2.1268067359924316
Epoch 11, Loss: 2.1940388679504395
Epoch 12, Loss: 2.4814724922180176
Epoch 13, Loss: 2.5526845455169678
Epoch 14, Loss: 2.6189045906066895
Epoch 15, Loss: 1.8712354898452759
Epoch 16, Loss: 2.301795482635498
Epoch 17, Loss: 2.7008726596832275
Epoch 18, Loss: 2.039731502532959
Epoch 19, Loss: 2.8939640522003174
Epoch 20, Loss: 2.1882810592651367


In [49]:
print(tensor_to_text(multilayer_rnn.predict_argmax(context_tensor,200),itoc))







L'Et d'ant les pais et le pais et le pais et le pais et le pais et le pais et le pais et le pais et le pais et le pais et le pais et le pais et le pais et le pais et le pais et le pais et le pais


In [50]:


multilayer_gru = GruNN(vocab_size, embedding_dim, hidden_dim, num_layers=4)
optimizer = torch.optim.SGD(multilayer_gru.parameters(), lr = 0.01)#, weight_decay=5e-3, momentum=0.9)
multilayer_gru = fit(multilayer_gru, text_dataset_input, text_dataset_output, dataset_size, n_epochs, optimizer)


Epoch 1, Loss: 3.244809627532959
Epoch 2, Loss: 2.8710639476776123
Epoch 3, Loss: 3.087885618209839
Epoch 4, Loss: 2.854846954345703
Epoch 5, Loss: 2.9981679916381836
Epoch 6, Loss: 2.9538776874542236
Epoch 7, Loss: 2.749758720397949
Epoch 8, Loss: 3.282749652862549
Epoch 9, Loss: 3.024132490158081
Epoch 10, Loss: 3.1337473392486572
Epoch 11, Loss: 3.443592071533203
Epoch 12, Loss: 2.89670729637146
Epoch 13, Loss: 2.857091188430786
Epoch 14, Loss: 3.180155038833618
Epoch 15, Loss: 2.6948320865631104
Epoch 16, Loss: 3.2492191791534424
Epoch 17, Loss: 2.766862392425537
Epoch 18, Loss: 2.833197593688965
Epoch 19, Loss: 3.1480019092559814
Epoch 20, Loss: 2.7279279232025146


In [51]:
print(tensor_to_text(multilayer_gru.predict_argmax(context_tensor,200),itoc))

 e oe e e e e e oe e e e e e oe e e e e e oe e e e e e oe e e e e e oe e e e e e oe e e e e e oe e e e e e oe e e e e e oe e e e e e oe e e e e e oe e e e e e oe e e e e e oe e e e e e oe e e e e e oe 


In [56]:
multilayer_lstm = LSTMNN(vocab_size, embedding_dim, hidden_dim, num_layers=4)
optimizer = torch.optim.SGD(multilayer_lstm.parameters(), lr = 0.01)#, weight_decay=5e-3, momentum=0.9)

multilayer_lstm = fit_lstm(multilayer_lstm, text_dataset_input, text_dataset_output, dataset_size, n_epochs, optimizer)

TypeError: LSTM.forward() takes from 2 to 3 positional arguments but 4 were given

In [None]:
print(tensor_to_text(multilayer_lstm.predict_argmax(context_tensor,200),itoc))

YOUR ANSWER HERE