## Hey guys, in this notebook we are going to discuss RNNs
A recurrent neural network (RNN) is a class of artificial neural networks where connections between nodes form a directed graph along a temporal sequence. This allows it to exhibit temporal dynamic behavior. Derived from feedforward neural networks, RNNs can use their internal state (memory) to process variable-length sequences of inputs. This makes them applicable to tasks such as unsegmented, connected handwriting recognition[4] or speech recognition.

* An RNN is essentially an FNN whose hidden layer (a non-linear output) also passes its state on to the next time step, i.e. to the next copy of the FNN.
* Compared to an FNN, an RNN has one additional set of weights and bias (hidden-to-hidden) that lets information flow sequentially from one time step to the next, which is what gives it time dependency.
* The diagrams below show this single difference between an FNN and an RNN.

#### one-layer RNN

<img src = "./data/rnn0-1.png">

#### two-layer RNN

<img src = "./data/rnn0-2.png">
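To make the "one additional set of weights and bias" concrete, here is a minimal sketch in PyTorch, assuming a ReLU hidden layer and the sizes used later in this notebook (28 inputs, 100 hidden units). The names `fnn_step`, `rnn_step`, `W_ih`, `W_hh`, `b_ih` and `b_hh` are illustrative only, and the weights are random, not trained.

```python
import torch

torch.manual_seed(0)
in_features, hidden_size = 28, 100   # sizes used later in this notebook

# Weights of an ordinary feedforward hidden layer (random, untrained)
W_ih = torch.randn(hidden_size, in_features)
b_ih = torch.randn(hidden_size)

# The extra weight matrix and bias an RNN adds for the previous hidden state
W_hh = torch.randn(hidden_size, hidden_size)
b_hh = torch.randn(hidden_size)


def fnn_step(x):
    """FNN hidden layer: depends only on the current input."""
    return torch.relu(W_ih @ x + b_ih)


def rnn_step(x, h_prev, W_hh=W_hh, b_hh=b_hh):
    """RNN hidden layer: the FNN layer plus a term for the previous hidden state."""
    return torch.relu(W_ih @ x + b_ih + W_hh @ h_prev + b_hh)


x = torch.randn(in_features)
h_prev = torch.randn(hidden_size)
h_next = rnn_step(x, h_prev)   # this would be fed into the next time step

# Zeroing the recurrent weight and bias removes the link to the previous step,
# and the RNN step reduces to the plain FNN layer
reduces_to_fnn = torch.allclose(
    rnn_step(x, h_prev, torch.zeros(hidden_size, hidden_size), torch.zeros(hidden_size)),
    fnn_step(x),
)
print(h_next.shape)      # torch.Size([100])
print(reduces_to_fnn)    # True
```

The only path from one step to the next goes through `W_hh` and `b_hh`, which is why removing them turns the RNN step back into an FNN layer.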

## RNN model 1

* Unroll 28 time steps (each input is a 28 x 28 pixel image, fed in one row per time step)
    * Each step input size: 28 x 1 (one row of 28 pixels)
    * Total per unroll: 28 x 28
        * For comparison, a Feedforward Neural Network takes all 28 x 28 pixels at once
* 1 hidden layer
* ReLU activation function

<img src = "./data/rnn2n.png">
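The unrolling described above is just a reshape. A minimal sketch, assuming a batch in the `(batch, channel, height, width)` layout that MNIST tensors have, showing that time step `t` of the sequence is row `t` of the image. The fake images are filled with consecutive numbers purely so the check is easy to read.

```python
import torch

# A fake batch of 2 single-channel 28x28 "images" in (batch, channel, height, width)
# layout, filled with 0..1567 so every pixel value encodes its position
fake_images = torch.arange(2 * 28 * 28, dtype=torch.float32).view(2, 1, 28, 28)

seq_len, n_features = 28, 28
# Drop the channel axis: (batch, 1, 28, 28) -> (batch, seq_len, n_features)
sequences = fake_images.view(-1, seq_len, n_features)
print(sequences.shape)   # torch.Size([2, 28, 28])

# Time step t of each sequence is row t of the corresponding image
t = 5
row_matches = torch.equal(sequences[:, t, :], fake_images[:, 0, t, :])
print(row_matches)       # True
```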

In [2]:

import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as datasets

train_dataset = datasets.MNIST(root = './data',
                               train = True,
                               transform = transforms.ToTensor(),
                               download = True)


test_dataset = datasets.MNIST(root = './data',
                              train = False,
                              transform = transforms.ToTensor(),
                              download = True)


Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz




Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Using downloaded and verified file: ./data/MNIST/raw/train-labels-idx1-ubyte.gz
Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Using downloaded and verified file: ./data/MNIST/raw/t10k-images-idx3-ubyte.gz
Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Using downloaded and verified file: ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw

Processing...




Done!


In [180]:
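# shapes of the raw training and test images (.data replaces the deprecated .train_data)
print(train_dataset.data.shape, test_dataset.data.shape)


from torch.utils.data import DataLoader
batch_size = 128
n_iters = 3000
# 60000 / 128 = 468.75 batches per epoch, so 3000 iterations is about 6 epochs
num_epochs = int(n_iters / (len(train_dataset) / batch_size))


train_loader = DataLoader(train_dataset,
                          shuffle = True,
                          batch_size = batch_size)

# the test set does not need shuffling: accuracy does not depend on the order
test_loader = DataLoader(test_dataset,
                         shuffle = False,
                         batch_size = batch_size)

torch.Size([60000, 28, 28]) torch.Size([10000, 28, 28])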
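As a sanity check on the loop settings, the iteration arithmetic can be worked out in plain Python. This sketch assumes the `DataLoader` default `drop_last=False` (so the last partial batch still counts as an iteration), 6 epochs as set in the training cell, and an evaluation every 500 iterations:

```python
import math

train_size = 60000   # len(train_dataset)
batch = 128          # batch_size
epochs = 6           # num_epochs in the training cell
eval_every = 500     # the training loop evaluates when iter % 500 == 0

# With drop_last=False the final, smaller batch is still yielded
iters_per_epoch = math.ceil(train_size / batch)
last_batch = train_size - (iters_per_epoch - 1) * batch
total_iters = iters_per_epoch * epochs
eval_points = list(range(eval_every, total_iters + 1, eval_every))

print(iters_per_epoch)  # 469
print(last_batch)       # 96
print(total_iters)      # 2814
print(eval_points)      # [500, 1000, 1500, 2000, 2500]
```

Five evaluation points is consistent with the five `Iteration:` lines printed at the end of training.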


## Model A: 1 hidden layer

For each element in the input sequence, each layer computes the following function:
$$
h_{t} = \tanh(W_{ih}x_{t} + b_{ih} + W_{hh}h_{t-1} + b_{hh})
$$
where $h_{t}$ is the hidden state at time $t$, $x_{t}$ is the input at time $t$, and $h_{t-1}$ is the hidden state of the same layer at time $t-1$, or the initial hidden state $h_0$ at time 0. In a stacked RNN, the input $x_t$ of every layer after the first is the hidden state $h_t$ of the layer below. If `nonlinearity='relu'`, ReLU is used instead of $\tanh$.
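A minimal sketch, assuming `nonlinearity='relu'` and `batch_first=True` as in the model below, that replays this recurrence by hand with the weights `nn.RNN` creates (`weight_ih_l0`, `weight_hh_l0`, `bias_ih_l0`, `bias_hh_l0`) and compares the result with the module's own output. With a batch, each $x_t$ is a row vector, so $W_{ih}x_t$ is written `x_t @ W_ih.T`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, seq_len, in_features, hidden_size = 3, 28, 28, 100

demo_rnn = nn.RNN(in_features, hidden_size, num_layers=1,
                  batch_first=True, nonlinearity='relu')
inputs = torch.randn(batch, seq_len, in_features)
h_init = torch.zeros(1, batch, hidden_size)    # (layers, batch, hidden)

out, hn = demo_rnn(inputs, h_init)

# Weights created by nn.RNN for layer 0
W_ih, W_hh = demo_rnn.weight_ih_l0, demo_rnn.weight_hh_l0
b_ih, b_hh = demo_rnn.bias_ih_l0, demo_rnn.bias_hh_l0

# Replay h_t = relu(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh) one step at a time
h = h_init[0]                                  # (batch, hidden)
states = []
with torch.no_grad():
    for t in range(seq_len):
        h = torch.relu(inputs[:, t] @ W_ih.T + b_ih + h @ W_hh.T + b_hh)
        states.append(h)
manual_out = torch.stack(states, dim=1)        # (batch, seq_len, hidden)

print(torch.allclose(out, manual_out, atol=1e-4))           # every step matches
print(torch.allclose(hn[0], manual_out[:, -1], atol=1e-4))  # hn is the last step
```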



#### parameters for the RNN model
* input_dim: number of input features per time step (here 28, one row of pixels)
* output_dim: number of nodes in the readout layer (here 10, one per digit class)
* hidden_dim: number of nodes in the hidden layer
* layer_dim: number of recurrent layers (`num_layers` in `nn.RNN`). E.g., setting `num_layers=2` stacks two RNNs to form a stacked RNN, with the second RNN taking in the outputs of the first RNN and computing the final results. Default: 1
* seq_dim: number of time steps in a sequence (the sequence length), e.g. the number of words in a sentence; here 28, one per image row
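How these parameters show up in tensor shapes: a sketch with two stacked layers (chosen only to illustrate stacking; the model below uses one) and `batch_first=True`. Note that `batch_first` affects the input and output tensors but not the hidden state, which stays `(layers, batch, hidden)`.

```python
import torch
import torch.nn as nn

in_features, hidden_size = 28, 100
layers, seq_len, batch = 2, 28, 128   # two stacked layers, only for illustration

demo_rnn = nn.RNN(in_features, hidden_size, num_layers=layers, batch_first=True)
inputs = torch.randn(batch, seq_len, in_features)   # (batch, seq, feature)
h_init = torch.zeros(layers, batch, hidden_size)    # (layers, batch, hidden): not batch-first

out, hn = demo_rnn(inputs, h_init)
print(out.shape)   # torch.Size([128, 28, 100]): top layer's h_t at every time step
print(hn.shape)    # torch.Size([2, 128, 100]): final h_t of every layer

# Layer 2 reads layer 1's hidden states, so its input weights have hidden_size columns
print(demo_rnn.weight_ih_l0.shape)   # torch.Size([100, 28])
print(demo_rnn.weight_ih_l1.shape)   # torch.Size([100, 100])
```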




In [171]:
class RNNmodel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, layer_dim):
        super(RNNmodel, self).__init__()

        # number of features per time step (28 pixels per image row)
        self.input_dim = input_dim

        # hidden dimension
        self.hidden_dim = hidden_dim

        # number of stacked RNN layers
        self.layer_dim = layer_dim

        # build the RNN
        # by default nn.RNN expects input of shape (seq_len, batch_dim, input_dim)
        # with batch_first = True it expects (batch_dim, seq_len, input_dim), e.g. (128, 28, 28)
        # batch_dim = number of samples in a batch
        # (the hidden state h0/hn is (layer_dim, batch_dim, hidden_dim) either way)
        self.rnn = nn.RNN(input_dim, hidden_dim,
                          layer_dim, batch_first = True,
                          nonlinearity = 'relu')

        # readout layer
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # at each step the RNN needs two inputs: the hidden state from the previous
        # time step (h_{t-1}) and the input at time t (x_t),
        # so for the first time step we initialize the initial state h0 ourselves

        # (layer_dim, batch_size, hidden_dim) ==> (1, 128, 100)
        # h0 is a fresh constant for every batch, so it needs no gradient and no detach();
        # detaching only matters when a hidden state is carried over between batches
        # (truncated backpropagation through time, BPTT)
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim, device=x.device)

        # (batch, 1, 28, 28) or (batch, 28, 28) ==> (batch, seq_len, input_dim) = (batch, 28, 28)
        x = x.view(x.size(0), -1, self.input_dim)
        out, hn = self.rnn(x, h0)

        # out.size() = (batch_dim, seq_len, hidden_dim) ==> (128, 28, 100)
        # this tensor contains the top layer's hidden state at every time step,
        # but we only need the one from the last time step
        out = out[:, -1, :]

        out = self.fc(out)

        # out.size() ==> (128, 10)
        return out
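Two details of `forward()` can be checked in isolation. First, for a single-direction RNN, the last time step of `out` is the same hidden state that `nn.RNN` returns in `hn` for the top layer, which is why `out[:, -1, :]` is used. Second, detaching a hidden state only has an effect when that state is carried across batches; a freshly created `h0` of zeros has no history to cut. The sketch below uses random weights and fake inputs, and its second half is a hypothetical truncated-BPTT setup (one 1000-step regression sequence trained in chunks of 50) that is not part of this notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Part 1: out[:, -1, :] is the top layer's final hidden state
demo_rnn = nn.RNN(28, 100, num_layers=1, batch_first=True, nonlinearity='relu')
fake_batch = torch.randn(4, 28, 28)     # 4 fake "images" as (batch, seq, feature)
out, hn = demo_rnn(fake_batch)          # h0 defaults to zeros when omitted
last_matches_hn = torch.allclose(out[:, -1, :], hn[-1])
print(last_matches_hn)

# Part 2: truncated BPTT on a hypothetical long sequence (1000 steps, 1 feature),
# trained in chunks of 50 steps while carrying the hidden state forward
seq_rnn = nn.RNN(input_size=1, hidden_size=16, batch_first=True)
head = nn.Linear(16, 1)
demo_optimizer = torch.optim.SGD(list(seq_rnn.parameters()) + list(head.parameters()), lr=0.01)
long_seq = torch.randn(1, 1000, 1)
targets = torch.randn(1, 1000, 1)
chunk = 50

h = torch.zeros(1, 1, 16)               # (layers, batch, hidden)
for start in range(0, long_seq.size(1), chunk):
    x_chunk = long_seq[:, start:start + chunk]
    y_chunk = targets[:, start:start + chunk]
    # keep the value of h but cut its history, so backward() only runs through
    # the current chunk; without it, the second backward() would try to reach
    # into the previous chunk's graph and raise an error
    h = h.detach()
    out_chunk, h = seq_rnn(x_chunk, h)
    chunk_loss = nn.functional.mse_loss(head(out_chunk), y_chunk)
    demo_optimizer.zero_grad()
    chunk_loss.backward()
    demo_optimizer.step()
```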
        

        
        

 

#### initializing model parameters and criterion

In [172]:
input_dim = 28
hidden_dim = 100
layer_dim = 1
output_dim = 10

model = RNNmodel(input_dim, hidden_dim, output_dim, layer_dim)

criterion = nn.CrossEntropyLoss()
learning_rate = 0.01
optimizer = torch.optim.SGD(model.parameters(), lr = learning_rate)
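`nn.CrossEntropyLoss` expects raw logits, which is why the model has no softmax layer, and plain `torch.optim.SGD` (no momentum, no weight decay) updates each parameter as $p \leftarrow p - \text{lr} \cdot \nabla p$. A small sketch checking both on made-up tensors; the `demo_` names are hypothetical and chosen so they do not overwrite the notebook's `criterion` and `optimizer`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
demo_logits = torch.randn(4, 10, requires_grad=True)   # made-up raw outputs (no softmax)
demo_labels = torch.tensor([3, 0, 9, 1])               # made-up class labels

demo_criterion = nn.CrossEntropyLoss()
demo_loss = demo_criterion(demo_logits, demo_labels)

# CrossEntropyLoss = log-softmax + negative log-likelihood, averaged over the batch
manual = -F.log_softmax(demo_logits, dim=1)[torch.arange(4), demo_labels].mean()
print(torch.allclose(demo_loss, manual))   # True

# One plain SGD step: p <- p - lr * grad
lr = 0.01
demo_optimizer = torch.optim.SGD([demo_logits], lr=lr)
demo_optimizer.zero_grad()
demo_loss.backward()
expected = demo_logits.detach() - lr * demo_logits.grad
demo_optimizer.step()
stepped_ok = torch.allclose(demo_logits.detach(), expected)
print(stepped_ok)                          # True
```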

In [173]:
len(list(model.parameters()))

6

In [174]:
print("Input to hidden weights(Wih): ", list(model.parameters())[0].shape)
print("hidden to hidden weights(Whh): ", list(model.parameters())[1].shape)
print("Input to hidden bias(Bih): ", list(model.parameters())[2].shape)
print("Hidden to hidden bias(Bhh): ", list(model.parameters())[3].shape)
print("Hidden to output weights(Who): ", list(model.parameters())[4].shape)
print("Hidden to output bias(Bho): ", list(model.parameters())[5].shape)

Input to hidden weights(Wih):  torch.Size([100, 28])
hidden to hidden weights(Whh):  torch.Size([100, 100])
Input to hidden bias(Bih):  torch.Size([100])
Hidden to hidden bias(Bhh):  torch.Size([100])
Hidden to output weights(Who):  torch.Size([10, 100])
Hidden to output bias(Bho):  torch.Size([10])
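The indices 0 to 5 above follow the order in which the layers register their parameters. `named_parameters()` yields the same tensors together with their names, and `numel()` gives the total count. A sketch, assuming a hypothetical `nn.ModuleDict` stand-in that has the same two layers and attribute names (`rnn`, `fc`) as `RNNmodel`:

```python
import torch.nn as nn

# Hypothetical stand-in with the same two layers (and attribute names) as RNNmodel
stand_in = nn.ModuleDict({
    "rnn": nn.RNN(28, 100, num_layers=1, batch_first=True, nonlinearity="relu"),
    "fc": nn.Linear(100, 10),
})

names = []
for name, param in stand_in.named_parameters():
    names.append(name)
    print(f"{name:18s} {tuple(param.shape)}")

total = sum(param.numel() for param in stand_in.parameters())
print(total)   # 2800 + 10000 + 100 + 100 + 1000 + 10 = 14010
```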


#### model training step

In [183]:
# num of time steps (number of times to unroll)
seq_dim = 28
num_epochs = 6
iter_num = 0
for epoch in range(num_epochs):
    print("==========================epoch num: {}=======================".format(epoch))

    for i, (images, labels) in enumerate(train_loader):
        # (batch, 1, 28, 28) ==> (batch, seq_dim, input_dim); inputs need no gradients
        images = images.view(-1, seq_dim, input_dim)

        # clear gradients in the optimizer object
        optimizer.zero_grad()

        # forward pass
        outputs = model(images)

        loss = criterion(outputs, labels)

        # getting gradients of loss w.r.t. weights
        loss.backward()

        # update the weights
        optimizer.step()

        iter_num += 1

        if iter_num % 500 == 0:
            correct = 0
            total = 0

            # evaluation: no gradients are needed, so skip building the autograd graph
            model.eval()
            with torch.no_grad():
                for images, labels in test_loader:
                    outputs = model(images)
                    # softmax would not change the arg-max, so predict from the raw logits
                    _, predicted = torch.max(outputs, 1)

                    total += labels.size(0)

                    correct += (predicted == labels).sum()
            model.train()

            accuracy = 100*(correct/total)
            # Print Loss
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iter_num, loss.item(), accuracy))



Iteration: 500. Loss: 0.025200678035616875. Accuracy: 97.0999984741211
Iteration: 1000. Loss: 0.055494390428066254. Accuracy: 96.91999816894531
Iteration: 1500. Loss: 0.033629078418016434. Accuracy: 97.1500015258789
Iteration: 2000. Loss: 0.049652520567178726. Accuracy: 97.01000213623047
Iteration: 2500. Loss: 0.05371534824371338. Accuracy: 97.3499984741211
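The evaluation step can also be pulled out into a reusable function. A sketch, assuming a hypothetical `evaluate` helper and a toy linear model on random data (so the printed accuracy is meaningless). It wraps the loop in `torch.no_grad()`, switches to `eval()` mode and back, and checks that taking the arg-max of the raw logits gives the same predictions as taking it after softmax:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(net, loader):
    """Hypothetical helper: percentage accuracy of `net` on `loader`."""
    net.eval()                        # disable dropout/batch-norm updates, if any
    correct, total = 0, 0
    with torch.no_grad():             # no autograd graph during evaluation
        for images, labels in loader:
            predicted = net(images).argmax(dim=1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    net.train()                       # back to training mode
    return 100.0 * correct / total


torch.manual_seed(0)
# Toy stand-ins: random "images" and labels, and a linear classifier
toy_data = TensorDataset(torch.randn(300, 1, 28, 28), torch.randint(0, 10, (300,)))
toy_loader = DataLoader(toy_data, batch_size=128, shuffle=False)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

toy_accuracy = evaluate(toy_model, toy_loader)
print(toy_accuracy)

# softmax is increasing within each row, so it does not change the predicted class
logits = torch.randn(128, 10)
same_preds = torch.equal(logits.argmax(dim=1), torch.softmax(logits, dim=1).argmax(dim=1))
print(same_preds)
```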
