# Exercise 3

Welcome to the exercise sheet about Recurrent Neural Networks. In this exercise sheet, we will take a closer look at RNNs, LSTMs, and other variants.


The main task is to implement the same models as in the lecture and run the classification on the MNIST dataset.

## Imports


Let's first import all the dependencies we will need for this exercise.

In [1]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

## Loading the Dataset and making it iterable


In [2]:

train_dataset = dsets.MNIST(root='./data', 
                            train=True, 
                            transform=transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root='./data', 
                           train=False, 
                           transform=transforms.ToTensor())


batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset, 
                                          batch_size=batch_size, 
                                          shuffle=False)

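The numbers in the cell above can be checked by hand. The following is a minimal sketch, assuming the standard MNIST training split of 60000 images. It uses a random tensor as a stand-in for one batch from `train_loader` (an assumption, so that the snippet runs without downloading data) and shows how each image becomes a sequence of 28 rows with 28 features each.

```python
import torch

# Values copied from the data-loading cell; 60000 is the size of the MNIST training split.
batch_size = 100
n_iters = 3000
train_size = 60000

iters_per_epoch = train_size // batch_size   # 600 parameter updates per epoch
num_epochs = n_iters // iters_per_epoch      # 5 epochs reach 3000 iterations

# Stand-in for one batch of images: (batch, channels, height, width).
fake_images = torch.rand(batch_size, 1, 28, 28)

# Treat each image row as one time step: (batch, seq_dim, input_dim).
seq_dim, input_dim = 28, 28
sequences = fake_images.view(-1, seq_dim, input_dim)

print(num_epochs)        # 5
print(sequences.shape)   # torch.Size([100, 28, 28])
```

The recurrent models in the following exercises consume tensors of this `(batch, seq_dim, input_dim)` layout, which is why the models are built with `batch_first=True`.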


### Exercise 1.1: Creating the model classes

Implement the RNN and LSTM models from the lecture, starting with one hidden layer and a tanh activation function for the RNN. Hint: PyTorch provides built-in RNN and LSTM modules (`nn.RNN` and `nn.LSTM`).
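Before writing the model classes, it can help to look at what the built-in modules return. The sketch below uses illustrative sizes (the hidden size of 16 and the batch of 4 are assumptions, not values from the lecture) and only inspects tensor shapes.

```python
import torch
import torch.nn as nn

# Illustrative sizes, chosen only for this shape check.
batch, seq_len, n_features, n_hidden = 4, 28, 28, 16
x = torch.rand(batch, seq_len, n_features)

# One hidden layer with tanh; the initial hidden state defaults to zeros.
rnn = nn.RNN(input_size=n_features, hidden_size=n_hidden,
             num_layers=1, nonlinearity='tanh', batch_first=True)
rnn_out, rnn_hn = rnn(x)

# The LSTM additionally returns a cell state.
lstm = nn.LSTM(input_size=n_features, hidden_size=n_hidden,
               num_layers=1, batch_first=True)
lstm_out, (lstm_hn, lstm_cn) = lstm(x)

print(rnn_out.shape)   # torch.Size([4, 28, 16]): top-layer state at every time step
print(rnn_hn.shape)    # torch.Size([1, 4, 16]): final state of each layer

# For classification, a common choice is to feed only the last time step to the readout.
last_step = rnn_out[:, -1, :]
print(last_step.shape)  # torch.Size([4, 16])
```

Note that both modules return a tuple, so the forward pass of a model has to unpack it before applying a readout layer.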

In [None]:
# The RNN
class RNNModel(nn.Module):
    def __init__(self):
        super(RNNModel, self).__init__()
        # Hidden dimensions
        self.hidden_dim = 10
        # Number of hidden layers
        self.num_layers = 2
        # Building your RNN (tanh is the default nonlinearity of nn.RNN)
        self.RNN = nn.RNN(input_size=28, hidden_size=self.hidden_dim, num_layers=self.num_layers, batch_first=True)
        # Readout layer
        self.fc = nn.Linear(self.hidden_dim, 10)

    def forward(self, x):
        # Initialize hidden state with zeros on the same device as the input
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_dim, device=x.device)
        # nn.RNN returns (output, h_n); output has shape (batch, seq_len, hidden_dim)
        out, hn = self.RNN(x, h0)
        # Classify from the hidden state of the last time step
        out = self.fc(out[:, -1, :])
        return out

# The LSTM
class LSTMModel(nn.Module):
    def __init__(self):
        super(LSTMModel, self).__init__()
        # Hidden dimensions
        self.hidden_dim = 10
        # Number of hidden layers
        self.num_layers = 2
        # Building your LSTM
        self.LSTM = nn.LSTM(input_size=28, hidden_size=self.hidden_dim, num_layers=self.num_layers, batch_first=True)
        # Readout layer
        self.fc = nn.Linear(self.hidden_dim, 10)

    def forward(self, x):
        # Initialize hidden state and cell state with zeros on the same device as the input
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_dim, device=x.device)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_dim, device=x.device)
        # nn.LSTM returns (output, (h_n, c_n))
        out, (hn, cn) = self.LSTM(x, (h0, c0))
        # Classify from the hidden state of the last time step
        out = self.fc(out[:, -1, :])
        return out

### Exercise 1.2: Instantiations

In [None]:
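# Instantiate the model classes

# These values mirror the sizes used inside the model classes above
input_dim = 28    # pixels per image row, fed in at each time step
hidden_dim = 10
layer_dim = 2
output_dim = 10   # digit classes 0-9

# Move to GPU if available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model_rnn = RNNModel().to(device)
model_lstm = LSTMModel().to(device)

# Instantiate the loss
criterion = nn.CrossEntropyLoss()

# Instantiate the optimizers (the learning rate has to be defined before it is used)
learning_rate = 0.1
optimizer_rnn = torch.optim.SGD(model_rnn.parameters(), lr=learning_rate)
optimizer_lstm = torch.optim.SGD(model_lstm.parameters(), lr=learning_rate)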


### Exercise 1.3: Training the models

Below are the training steps for the RNN model. Implement the training for the LSTM model accordingly.

In [None]:
# RNN Training
# Number of steps to unroll
seq_dim = 28  

iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Reshape each image into a sequence of rows and move the batch to the device
        images = images.view(-1, seq_dim, input_dim).to(device)
        labels = labels.to(device)

        # Clear gradients w.r.t. parameters
        optimizer_rnn.zero_grad()

        # Forward pass to get output/logits
        outputs = model_rnn(images)

        # Calculate Loss
        loss = criterion(outputs, labels)

        # Getting gradients w.r.t. parameters
        loss.backward()

        # Updating parameters
        optimizer_rnn.step()
        
        iter += 1
        
        if iter % 500 == 0:
            # Calculate Accuracy
            correct = 0
            total = 0
            # Iterate through test dataset without tracking gradients
            with torch.no_grad():
                for images, labels in test_loader:
                    images = images.view(-1, seq_dim, input_dim).to(device)
                    labels = labels.to(device)

                    # Forward pass only to get logits/output
                    outputs = model_rnn(images)

                    # Get predictions from the maximum value
                    _, predicted = torch.max(outputs, 1)

                    # Total number of labels
                    total += labels.size(0)

                    # Total correct predictions
                    correct += (predicted == labels).sum().item()

            accuracy = 100 * correct / total
            
            # Print Loss
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iter, loss.item(), accuracy))

In [None]:
# LSTM Training
# Number of steps to unroll
seq_dim = 28

# Restart the iteration counter so that progress is reported from 0 again
iter = 0

for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Load images as Variable
        
        # Clear gradients w.r.t. parameters
        
        # Forward pass to get output/logits
        
        # Calculate Loss
        
        # Getting gradients w.r.t. parameters
        
        # Updating parameters
        
        iter += 1
        
        if iter % 500 == 0:
            # Calculate Accuracy         
            
            # Iterate through test dataset
            
                # Forward pass only to get logits/output
                
                # Get predictions from the maximum value
                
                # Total number of labels
                
                # Total correct predictions
            
            # Print Loss

## Exercise 2: Classification
We want to compare different model configurations with each other. 

For the RNN: 
* 1, 2, 3 or 4 hidden layers
* tanh and ReLU activation functions
* Additional fully connected layer

For the LSTM: 
* 1, 2 or 3 hidden layers
* Additional fully connected layer


### Exercise 2.1:
Change the above implementation to allow for an efficient way to compare the final classification accuracies in one cell (i.e. define training methods and add model parameters). 
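One possible starting point for the RNN configurations listed above is a model class whose depth, activation, and readout are constructor arguments. The sketch below is only an illustration: the class name `ConfigurableRNN`, the `extra_fc` flag, and the hidden size of 100 are hypothetical choices, not the lecture's implementation, and the training and evaluation functions are left out.

```python
import torch
import torch.nn as nn

class ConfigurableRNN(nn.Module):
    """Hypothetical RNN classifier with configurable depth, activation and readout."""

    def __init__(self, input_dim=28, hidden_dim=100, layer_dim=1, output_dim=10,
                 nonlinearity='tanh', extra_fc=False):
        super().__init__()
        # nn.RNN accepts 'tanh' or 'relu' as its nonlinearity.
        self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim,
                          nonlinearity=nonlinearity, batch_first=True)
        if extra_fc:
            # Optional additional fully connected layer before the output layer.
            self.readout = nn.Sequential(
                nn.Linear(hidden_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, output_dim),
            )
        else:
            self.readout = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        out, _ = self.rnn(x)                 # initial hidden state defaults to zeros
        return self.readout(out[:, -1, :])   # classify from the last time step

# Enumerate every RNN configuration from the list above.
configs = [
    {'layer_dim': n, 'nonlinearity': act, 'extra_fc': fc}
    for n in (1, 2, 3, 4)
    for act in ('tanh', 'relu')
    for fc in (False, True)
]
models = {
    '{layer_dim}-layer {nonlinearity}, extra_fc={extra_fc}'.format(**cfg): ConfigurableRNN(**cfg)
    for cfg in configs
}

# Quick shape check on a random batch of 5 "images".
x = torch.rand(5, 28, 28)
for name, model in models.items():
    print(name, tuple(model(x).shape))
```

An LSTM variant could follow the same pattern with `nn.LSTM`, which has no `nonlinearity` argument. Each model in the dictionary would then be passed to a shared training function so that the final accuracies end up in one table.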


In [None]:
## your code goes here
## type your answer as a comment

### Exercise 2.2:
Do your results differ from the results presented in the lecture? If so, why?

In [None]:
## Your answer goes here

## Exercise 3:

So far, we have always trained for 3000 iterations with a batch size of 100 and a learning rate of 0.1. Our classification accuracies might improve if we change these values. Systematically vary these values and find a better combination (if possible).
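A systematic search can be organized as a grid over the three values. The sketch below only builds the grid and derives the number of epochs for each combination. The candidate values are illustrative assumptions, and the actual training call is omitted.

```python
from itertools import product

train_size = 60000  # size of the MNIST training split

# Illustrative candidate values; the original setting (100, 0.1, 3000) is included.
batch_sizes = [50, 100, 200]
learning_rates = [0.01, 0.1, 0.5]
iteration_budgets = [3000, 6000]

grid = []
for batch_size, lr, n_iters in product(batch_sizes, learning_rates, iteration_budgets):
    iters_per_epoch = train_size // batch_size
    # Same conversion as in the data-loading cell, with at least one epoch.
    num_epochs = max(1, n_iters // iters_per_epoch)
    grid.append({'batch_size': batch_size, 'lr': lr,
                 'n_iters': n_iters, 'num_epochs': num_epochs})

print(len(grid))  # 18 combinations
for setting in grid[:3]:
    print(setting)
```

Keeping `n_iters` fixed while changing the batch size also changes how many passes over the data the model sees, which is worth keeping in mind when comparing accuracies.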

In [None]:
## your code goes here

## Exercise 4: 
1. Why might the LSTM result in better classification accuracies? What are the advantages and disadvantages of using an LSTM in this task, compared to an RNN?
2. We addressed other variants of RNNs in the lecture. Which of them might be suitable for this classification task and why? (GRU, bidirectional RNN, Recursive Neural Network, Encoder-Decoder RNN)
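Two of the variants named above are available directly in PyTorch. The following sketch, with an assumed hidden size of 16, shows how `nn.GRU` and a bidirectional `nn.RNN` differ in their output shapes, which matters for the size of the readout layer. It does not answer which variant is the better fit.

```python
import torch
import torch.nn as nn

hidden_dim = 16                  # illustrative size
x = torch.rand(4, 28, 28)        # (batch, seq_dim, input_dim)

# GRU: same interface as nn.RNN, with a single hidden state and no cell state.
gru = nn.GRU(input_size=28, hidden_size=hidden_dim, batch_first=True)
gru_out, gru_hn = gru(x)
print(gru_out.shape)   # torch.Size([4, 28, 16])

# Bidirectional RNN: the image rows are read top-to-bottom and bottom-to-top.
bi_rnn = nn.RNN(input_size=28, hidden_size=hidden_dim,
                batch_first=True, bidirectional=True)
bi_out, bi_hn = bi_rnn(x)
print(bi_out.shape)    # torch.Size([4, 28, 32]): both directions concatenated
print(bi_hn.shape)     # torch.Size([2, 4, 16]): one final state per direction

# One way to build the readout input: concatenate the final state of each direction.
summary = torch.cat([bi_hn[0], bi_hn[1]], dim=1)
readout = nn.Linear(2 * hidden_dim, 10)
logits = readout(summary)
print(logits.shape)    # torch.Size([4, 10])
```

Note that the backward direction finishes at the first time step, so `bi_out[:, -1, :]` alone does not contain the backward pass over the full image.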