# Recurrent Neural Networks in PyTorch

## 1. About Recurrent Neural Networks
### 1.1 An RNN is essentially an FNN (feedforward neural network)
* The hidden layer receives input from the input layer, and also receives its own output from the previous time step (recurrently) after that output passes through a linear function
* Question: What are the advantages? What sort of problems are RNNs suited for?
* Insight: This should be fairly straightforward in PyTorch based on what we've learned so far. It comes down to the order of the linear and non-linear functions and how they transform the data going through the NN.
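
To make the recurrence concrete, here is a minimal sketch that unrolls a single-layer ReLU RNN by hand and compares the result with `nn.RNN`. The batch size of 4 and the random input are assumptions chosen only for illustration; the weight attribute names (`weight_ih_l0`, `weight_hh_l0`, `bias_ih_l0`, `bias_hh_l0`) are the ones PyTorch exposes on `nn.RNN`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes (assumptions for this sketch)
batch_size, seq_dim, input_dim, hidden_dim = 4, 28, 28, 100

rnn = nn.RNN(input_dim, hidden_dim, num_layers=1,
             batch_first=True, nonlinearity='relu')

x = torch.randn(batch_size, seq_dim, input_dim)
h = torch.zeros(batch_size, hidden_dim)  # initial hidden state

with torch.no_grad():
    # Manual unroll: at each step the hidden state is fed back in
    # through its own linear function, next to the new input
    for t in range(seq_dim):
        x_t = x[:, t, :]
        h = torch.relu(x_t @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
                       + h @ rnn.weight_hh_l0.T + rnn.bias_hh_l0)

    # Built-in layer using the same weights
    out, hn = rnn(x)

# The hand-rolled final hidden state should agree with the last output step
matches = torch.allclose(h, out[:, -1, :], atol=1e-4)
print(matches)
```

The only difference from a plain feedforward layer is the extra `h @ W_hh` term, which carries information from earlier time steps forward.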

## 2. Building an RNN using PyTorch
### Model A: 1 Hidden Layer (ReLU)
* Unroll 28 time steps
    * Each step's input size: 28 x 1 (one row of the image)
    * Total per unroll: 28 x 28
        * For comparison, the feedforward neural network's input size: 28 x 28 (the whole image at once)
* 1 Hidden Layer
* ReLU Activation
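
As a rough sketch of what "unrolling 28 time steps" means for the data, the snippet below reshapes a stand-in batch into sequences of 28 rows. The random tensor is an assumption that stands in for a real MNIST batch of shape (batch, channels, height, width).

```python
import torch

# Stand-in for one MNIST batch: (batch, channels, height, width)
images = torch.rand(100, 1, 28, 28)

seq_dim, input_dim = 28, 28

# Each image becomes a sequence of 28 rows, each row a 28-dim input
sequences = images.view(-1, seq_dim, input_dim)
print(sequences.shape)

# The input at the first time step is the top row of every image
first_step = sequences[:, 0, :]
print(first_step.shape)
```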

### Steps
* STEP 1: Load dataset
* STEP 2: Make dataset iterable
* STEP 3: Create model class
* STEP 4: Instantiate model class
* STEP 5: Instantiate loss class
* STEP 6: Instantiate optimizer class
* STEP 7: Train Model! 

### Step 1: Load MNIST Dataset
**Handwritten digits from 0-9**

In [1]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

In [2]:
train_dataset = dsets.MNIST(root="./data",
                            train=True,
                            transform=transforms.ToTensor(), 
                            download=True)

test_dataset = dsets.MNIST(root='./data',
                           train=False,
                           transform=transforms.ToTensor())

In [3]:
# .data replaces the deprecated .train_data attribute
print(train_dataset.data.size())

torch.Size([60000, 28, 28])


In [4]:
# .targets replaces the deprecated .train_labels attribute
print(train_dataset.targets.size())

torch.Size([60000])


In [5]:
# .data replaces the deprecated .test_data attribute
print(test_dataset.data.size())

torch.Size([10000, 28, 28])


In [6]:
# .targets replaces the deprecated .test_labels attribute
print(test_dataset.targets.size())

torch.Size([10000])


### Step 2: Make dataset iterable

In [7]:
batch_size = 100
n_iters = 3000
num_epochs = int(n_iters / (len(train_dataset)/batch_size))

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)
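
The epoch count above follows from simple arithmetic, sketched here with the training set size (60000) taken from the Step 1 output. `iters_per_epoch` is a name introduced only for this illustration.

```python
batch_size = 100
n_iters = 3000
train_size = 60000  # MNIST training set size, as printed in Step 1

# One epoch is one full pass over the training set
iters_per_epoch = train_size / batch_size

# Number of full passes needed to reach n_iters iterations
num_epochs = int(n_iters / iters_per_epoch)
print(iters_per_epoch, num_epochs)
```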

### Step 3: Create Model Class (RNN)

In [8]:
class RNNModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super(RNNModel, self).__init__()
        # Hidden Dimensions
        self.hidden_dim = hidden_dim
        
        # Number of hidden layers
        self.layer_dim = layer_dim
        
        # Building your RNN
        # batch_first=True causes input/output Tensors to be of shape
        # (batch_dim, seq_dim, input_dim)
        self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity='relu')
        
        # Readout Layer
        self.fc = nn.Linear(hidden_dim, output_dim)
        
    def forward(self, x):
        # Initialize hidden state with zeros
        # (layer_dim, batch_size, hidden_dim)
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim, device=x.device)
        
        # Run the RNN over all time steps in the sequence
        out, hn = self.rnn(x, h0)
        
        # Index hidden state of last time step
        # out.size() --> 100, 28, 100
        # out[:, -1, :] --> 100, 100 --> just want last time step hidden states!
        out = self.fc(out[:, -1, :])
        # out.size() --> 100, 10
        return out
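
To check the shapes noted in the comments above, this small sketch runs a standalone `nn.RNN` on a random batch. The random input is an assumption; the sizes match the ones used in this notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Same sizes as the model in this notebook
rnn = nn.RNN(28, 100, 1, batch_first=True, nonlinearity='relu')

x = torch.randn(100, 28, 28)    # (batch, seq_dim, input_dim)
h0 = torch.zeros(1, 100, 100)   # (layer_dim, batch, hidden_dim)

with torch.no_grad():
    out, hn = rnn(x, h0)

print(out.shape)  # hidden state at every time step
print(hn.shape)   # hidden state after the final time step only

# For a single-layer, one-directional RNN the last step of `out`
# is the same tensor of values as the final hidden state
same = torch.allclose(out[:, -1, :], hn[-1])
print(same)
```

This is why the readout layer can take `out[:, -1, :]`: it holds the hidden state after the whole image has been read.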

### Step 4: Instantiate Model Class
* 28 time steps
    * Each time step: input dimension = 28
* 1 hidden layer
* MNIST 0-9 digits $\to$ output dimension = 10

In [9]:
input_dim = 28
hidden_dim = 100
layer_dim = 1
output_dim = 10

In [10]:
model = RNNModel(input_dim, hidden_dim, layer_dim, output_dim)

### Step 5: Instantiate Loss Class

The RNN also uses **Cross Entropy Loss**, like the FNN, CNN, and logistic regression models

In [11]:
criterion = nn.CrossEntropyLoss()
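
As a quick illustration of what this loss does with the model's raw outputs, the sketch below compares `nn.CrossEntropyLoss` with an explicit log-softmax followed by negative log-likelihood. The random logits and labels are assumptions used only to exercise the functions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in logits and labels with the same shapes as in training
logits = torch.randn(100, 10)
labels = torch.randint(0, 10, (100,))

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, labels)

# The same value computed in two explicit steps
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

same_loss = torch.allclose(loss, manual)
print(same_loss)
```

Because the softmax is applied inside the loss, the model's `forward` returns raw logits and has no softmax layer of its own.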

### Step 6: Instantiate Optimizer Class

In [12]:
learning_rate = 0.1
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

### RNN Model Parameters In-depth

In [13]:
print(len(list(model.parameters())))

6


**Parameters (A = weights, B = bias)**
* Input layer to hidden layer linear function
    * A1, B1
* Hidden layer to hidden layer linear function
    * A2, B2
* Hidden layer to output layer linear function
    * A3, B3

In [14]:
print(model.parameters())
# Number of parameter tensors (weights and biases), not the number of layers
print(len(list(model.parameters())))

# Input to hidden weights (A1)
print(list(model.parameters())[0].size())

# Hidden to hidden weights (A2)
print(list(model.parameters())[1].size())

# Input to hidden bias (B1)
print(list(model.parameters())[2].size())

# Hidden to hidden bias (B2)
print(list(model.parameters())[3].size())

# Hidden to output weights (A3)
print(list(model.parameters())[4].size())

# Hidden to output bias (B3)
print(list(model.parameters())[5].size())

<generator object Module.parameters at 0x111f2fbf8>
6
torch.Size([100, 28])
torch.Size([100, 100])
torch.Size([100])
torch.Size([100])
torch.Size([10, 100])
torch.Size([10])
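
Indexing into `model.parameters()` makes it easy to mislabel a tensor, so a sketch using `named_parameters()` may be clearer. It builds a standalone `nn.RNN` and `nn.Linear` with the same sizes as the model; the `rnn.` and `fc.` prefixes are added by hand here to mirror the attribute names in `RNNModel`.

```python
import torch.nn as nn

# Same sizes as RNNModel in this notebook
rnn = nn.RNN(28, 100, 1, batch_first=True, nonlinearity='relu')
fc = nn.Linear(100, 10)

names = []
total = 0
for module_name, module in [('rnn', rnn), ('fc', fc)]:
    for name, param in module.named_parameters():
        full_name = f'{module_name}.{name}'
        names.append(full_name)
        total += param.numel()
        print(full_name, tuple(param.size()))

# Total number of trainable scalar values
print(total)
```

The names show the order PyTorch uses for an RNN layer: both weight matrices first, then both biases.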


### Step 7: Train Model! 
**Process**
1. **Convert inputs/labels to Variables**
    * RNN input: (1, 28) per time step, 28 time steps per image
    * CNN input: (1, 28, 28)
    * FNN input: (1, 28*28)
2. Clear gradient buffers
3. Get output given inputs
4. Get loss
5. Get gradients w.r.t. parameters
6. Update parameters using gradients
    * parameters = parameters - learning_rate * parameters_gradients
7. REPEAT
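
The update rule in the list above can be checked on a toy problem. This is a minimal sketch, assuming a three-element parameter `w` and a squared-sum loss invented for the illustration; it compares the hand-computed update with what `torch.optim.SGD` does.

```python
import torch

learning_rate = 0.1

# Toy parameter and loss (assumptions for this sketch)
w = torch.tensor([1.0, -2.0, 3.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=learning_rate)

optimizer.zero_grad()        # clear gradient buffers
loss = (w ** 2).sum()        # get loss
loss.backward()              # get gradients: d(loss)/dw = 2w

# parameters = parameters - learning_rate * parameters_gradients
expected = (w - learning_rate * w.grad).detach().clone()

optimizer.step()             # update parameters using gradients

same_update = torch.allclose(w.detach(), expected)
print(same_update)
```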

In [16]:
# Number of steps to unroll
seq_dim = 28

iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
    
        images = Variable(images.view(-1, seq_dim, input_dim)) 
        labels = Variable(labels)
        
        #Clear gradients w.r.t. parameters
        optimizer.zero_grad()
        
        #Forward pass to get outputs / logits
        # outputs.size() --> (100, 10)
        outputs = model(images)
        
        #Calculate Loss: softmax --> Cross Entropy Loss
        loss = criterion(outputs, labels)
        
        #Get gradients w.r.t. parameters
        loss.backward()
        
        #Update parameters
        optimizer.step()
        
        iter += 1
        if iter % 500 == 0:
            # Calculate Accuracy
            correct = 0 
            total = 0
            #Iterate through the test dataset
            for images, labels in test_loader:
                # Load images to Torch Variable
                images = Variable(images.view(-1, seq_dim, input_dim))
                
                # Forward pass only to get outputs/logits
                outputs = model(images)
                
                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data, 1)
                
                # Total number of labels
                total += labels.size(0)
                
                # Total correct predictions
                # .item() converts the 0-dim tensor to a Python number
                correct += (predicted == labels).sum().item()
                    
            accuracy = 100 * correct / total
            
            # Print Loss
            # loss is a 0-dim tensor, so read its value with .item()
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iter, loss.item(), accuracy))

Iteration: 500. Loss: 1.5028343200683594. Accuracy: 51.91
Iteration: 1000. Loss: 0.5447091460227966. Accuracy: 77.21
Iteration: 1500. Loss: 0.4508627653121948. Accuracy: 84.89
Iteration: 2000. Loss: 0.4968450665473938. Accuracy: 87.19
Iteration: 2500. Loss: 0.5650933980941772. Accuracy: 85.4
Iteration: 3000. Loss: 0.20396526157855988. Accuracy: 93.09
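
The accuracy figures above come from counting how often the largest logit matches the label. Here is a small sketch of that calculation; `accuracy_percent` is a hypothetical helper written for this illustration, and the logits and labels are made-up values.

```python
import torch

def accuracy_percent(logits, labels):
    """Percentage of rows whose largest logit matches the label."""
    _, predicted = torch.max(logits, 1)
    correct = (predicted == labels).sum().item()
    return 100 * correct / labels.size(0)

# Four made-up predictions over three classes
logits = torch.tensor([[2.0, 0.1, 0.3],
                       [0.2, 1.5, 0.1],
                       [0.9, 0.8, 0.1],
                       [0.1, 0.2, 3.0]])
labels = torch.tensor([0, 1, 1, 2])

# No gradients are needed when only evaluating
with torch.no_grad():
    acc = accuracy_percent(logits, labels)
print(acc)
```

Wrapping evaluation in `torch.no_grad()` avoids building the autograd graph for the test batches.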


This seemed to take about as long as a CNN, or maybe a little longer. I need to do some reading to find out what the good applications are for RNNs vs. CNNs, et cetera.