
# Best Explanation of RNN
![Best Explanation](https://i.stack.imgur.com/0Poch.png)<br/>
Several LSTM cells form one LSTM layer, as shown in the figure above. Real datasets are usually too large to feed into the model in one piece, so the data is divided into batches that are processed one after the other until the last batch has been read. In the lower part of the figure you can see the input (dark gray), which is read in from batch 1 to batch batch_size. The cells above it, LSTM cell 1 to LSTM cell time_step, are the cells of the LSTM model described in http://colah.github.io/posts/2015-08-Understanding-LSTMs/. The number of cells equals the number of time steps, which is fixed. In practice these are the same cell with shared weights, applied once per time step.

For example, a text sequence of 150 characters could be split into 3 sequences (batch_size) of 50 characters each (time_step, and therefore the number of LSTM cells). If each character is one-hot encoded, each element (a dark gray box in the input) is a vector whose length is the size of the vocabulary (the number of features). These vectors flow into the neural networks (green elements) inside each cell, which change their dimension to the number of hidden units (number_units).

So the input has the dimension (batch_size x time_step x features). The long-term memory (cell state) and the short-term memory (hidden state) have the same dimension (batch_size x number_units). The light gray blocks that come out of the cells have a different dimension, (batch_size x time_step x number_units), because the transformations in the neural networks (green elements) map each input to the hidden units. The output can be taken from any cell, but in many problems (not all) only the last block (black border) is relevant, because it summarizes the information from all previous time steps.
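
The shapes described above can be checked with a small sketch. The sizes follow the 150-character example from the text; `vocab_size = 30` and `number_units = 16` are assumed values chosen only for illustration, and the character indices are random stand-ins for real text.

```python
import torch
from torch import nn
import torch.nn.functional as F

# Sizes from the example in the text; vocab_size and number_units are assumptions.
batch_size, time_step, vocab_size, number_units = 3, 50, 30, 16

# Random character indices stand in for a real 150-character text.
char_ids = torch.randint(0, vocab_size, (batch_size, time_step))

# One-hot encoding gives the input shape (batch_size, time_step, features).
x = F.one_hot(char_ids, num_classes=vocab_size).float()

lstm_layer = nn.LSTM(input_size=vocab_size, hidden_size=number_units, batch_first=True)
out, (h_n, c_n) = lstm_layer(x)

print(x.shape)    # input: (batch_size, time_step, features)
print(out.shape)  # light gray blocks: (batch_size, time_step, number_units)
print(h_n.shape)  # hidden state: (num_layers, batch_size, number_units)
print(c_n.shape)  # cell state: (num_layers, batch_size, number_units)
```

PyTorch adds a leading `num_layers` axis to the hidden and cell states, so each layer holds one (batch_size x number_units) state. For a single unidirectional layer, the last time step of `out` is the same tensor as the final hidden state.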

In [0]:
import torch
import torchvision
from torchvision import transforms
from torch import nn
import torch.optim as optim

In [0]:
train_dataset = torchvision.datasets.MNIST(root='data',
                                           train=True, 
                                           transform=transforms.ToTensor(),
                                           download=True)

test_dataset = torchvision.datasets.MNIST(root='data',
                                          train=False, 
                                          transform=transforms.ToTensor())

length_of_train_set = len(train_dataset)
fraction = 0.2
length_of_validation_set = int(fraction*length_of_train_set)
resulting_train_length = length_of_train_set - length_of_validation_set

new_train_dataset , validation_dataset = torch.utils.data.random_split(train_dataset , [resulting_train_length,length_of_validation_set])
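
`random_split` draws a different split on every run unless a seed is fixed. The sketch below shows one way to make the split repeatable by passing a seeded `torch.Generator`; the toy `TensorDataset` and the seed value 0 are assumptions used only to keep the example small, and they stand in for the MNIST training set.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Toy stand-in for the MNIST training set: 100 samples with 4 features each.
toy_dataset = TensorDataset(torch.randn(100, 4), torch.randint(0, 10, (100,)))

fraction = 0.2
val_length = int(fraction * len(toy_dataset))
train_length = len(toy_dataset) - val_length

# A seeded generator makes the random split reproducible across runs.
generator = torch.Generator().manual_seed(0)
train_part, val_part = random_split(toy_dataset, [train_length, val_length],
                                    generator=generator)

print(len(train_part), len(val_part))
```

The two subsets never share a sample, which can be confirmed by comparing `train_part.indices` with `val_part.indices`.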

In [0]:
batch_size = 128
no_hidden_units = 100
# the 28 rows of an image are the time steps
sequence_length = 28
# each row has 28 pixels (features per time step)
input_size = 28
# Since there are 10 classes
output_size = 10
num_layers = 2
num_epochs = 2
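
With `sequence_length = 28` and `input_size = 28`, each image is read row by row: row t of the image becomes the input vector at time step t. A minimal sketch of that reshape, using a synthetic batch in place of real MNIST images (the batch of 2 is an assumption for brevity):

```python
import torch

sequence_length = 28
input_size = 28

# Synthetic batch shaped like MNIST after ToTensor(): (batch, channel, height, width)
images = torch.arange(2 * 1 * 28 * 28, dtype=torch.float32).reshape(2, 1, 28, 28)

# Drop the channel axis so each image becomes 28 time steps of 28 features.
sequences = images.reshape(-1, sequence_length, input_size)

print(sequences.shape)
# Time step 5 of the first sequence is row 5 of the first image.
print(torch.equal(sequences[0, 5], images[0, 0, 5]))
```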

In [4]:
# Data loader
train_loader = torch.utils.data.DataLoader(dataset=new_train_dataset,
                                           batch_size=batch_size, 
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size, 
                                          shuffle=False)

validation_loader = torch.utils.data.DataLoader(dataset=validation_dataset,
                                          batch_size=batch_size, 
                                          shuffle=False)

print("Length of trainset {}".format(len(new_train_dataset)))
print("Length of testset {}".format(len(test_dataset)))
print("Length of validationset {}".format(len(validation_dataset)))

Length of trainset 48000
Length of testset 10000
Length of validationset 12000


In [5]:
class Lstm(nn.Module):
    def __init__(self , input_size , no_hidden_units , output_size , num_layers , bidirectional = False):
        super(Lstm , self).__init__()
        
        # since we will be using these values in the next function
        self.bidirectional = bidirectional
        self.no_hidden_units = no_hidden_units
        self.num_layers = num_layers
        
        if self.bidirectional == False:
            # input_size is the number of features per time step
            # nn.RNN or nn.GRU can be swapped in here to compare performance
            self.lstm = nn.LSTM(input_size , self.no_hidden_units , 
                                num_layers , batch_first = True)
            self.fc = nn.Linear(self.no_hidden_units , output_size)
            
        elif self.bidirectional == True:
            self.lstm = nn.LSTM(input_size , self.no_hidden_units , 
                                num_layers , batch_first = True ,
                                bidirectional = True)
            self.fc = nn.Linear(self.no_hidden_units*2 , output_size)
        
    def forward(self , x):
        # nn.LSTM needs both a hidden state (h0) and a cell state (c0);
        # nn.RNN or nn.GRU would need only the hidden state
        num_directions = 2 if self.bidirectional else 1

        # create the initial states on the same device as the input
        h0 = torch.zeros(self.num_layers*num_directions , x.size(0) , self.no_hidden_units , device=x.device)
        c0 = torch.zeros(self.num_layers*num_directions , x.size(0) , self.no_hidden_units , device=x.device)

        out , _ = self.lstm(x , (h0 , c0))
        # out: tensor of shape (batch_size, seq_length, hidden_size*num_directions)
        # getting the output at the last time step of the sequence
        out = out[:,-1,:]
        output = self.fc(out)
        return output
    
lstm = Lstm(input_size , no_hidden_units , output_size , num_layers , True)
lstm.cuda()
print(lstm)

Lstm(
  (lstm): LSTM(28, 100, num_layers=2, batch_first=True, bidirectional=True)
  (fc): Linear(in_features=200, out_features=10, bias=True)
)
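
One detail of `out[:,-1,:]` may be worth checking in the bidirectional case. In PyTorch the first half of the feature axis holds the forward direction and the second half the backward direction, and the backward direction finishes reading the sequence at time step 0, not at the last step. The sketch below is an illustration under assumed toy sizes, not a change to the model above: it builds a summary from the step where each direction has seen the whole sequence and confirms that it matches the final hidden states `h_n`.

```python
import torch
from torch import nn

torch.manual_seed(0)
hidden = 8  # assumed toy size

bilstm = nn.LSTM(input_size=4, hidden_size=hidden, num_layers=2,
                 batch_first=True, bidirectional=True)
x = torch.randn(3, 5, 4)  # (batch, time steps, features)

out, (h_n, _) = bilstm(x)

# Forward direction has read every step once it reaches the last time step.
forward_final = out[:, -1, :hidden]
# Backward direction has read every step once it reaches the first time step.
backward_final = out[:, 0, hidden:]
summary = torch.cat([forward_final, backward_final], dim=1)

# The same values are available as the last layer's final hidden states.
summary_from_h = torch.cat([h_n[-2], h_n[-1]], dim=1)

print(summary.shape)
print(torch.allclose(summary, summary_from_h, atol=1e-6))
```

Whether this summary improves accuracy on MNIST over `out[:,-1,:]` is not measured here; it would need to be compared on the validation set.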


In [0]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(lstm.parameters() , lr=1e-3)
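
`nn.CrossEntropyLoss` expects raw logits, which is why the model's last layer is a plain `nn.Linear` with no softmax. As a small check with made-up logits and labels, the loss equals a log-softmax followed by the negative log-likelihood loss:

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)

# Made-up logits for 4 samples and 10 classes, plus made-up labels.
logits = torch.randn(4, 10)
labels = torch.tensor([3, 0, 9, 1])

loss = nn.CrossEntropyLoss()(logits, labels)

# Same computation written out in two steps.
manual_loss = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(torch.allclose(loss, manual_loss))
```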

In [7]:
accuracy_max = 0
for i in range(num_epochs):
    
    running_train_loss = 0
    correct = 0
    accuracy = 0
    running_validation_loss = 0
    # Switch to train mode at the start of every epoch: each epoch ends in
    # eval mode, and if a run is interrupted during validation the model
    # would otherwise stay in eval mode when training resumes
    lstm.train()
    for images , labels in train_loader:
        # making images and labels cuda compatible
        images = images.cuda()
        labels = labels.cuda()
        images = images.reshape(-1 ,sequence_length , input_size)
        # making the gradients zero
        optimizer.zero_grad()
        #forward pass
        logits = lstm(images)
        # calculating the loss
        loss = criterion(logits , labels)
        # backward propagation
        loss.backward()
        optimizer.step()
        # .item() stores a Python float instead of keeping the autograd graph alive
        running_train_loss += loss.item()
    
    print("Training loss after epoch {}/{} is {} ".format(i+1 , num_epochs , running_train_loss/len(train_loader)))
    
    # putting the model in evaluation mode
    lstm.eval()
    
    # Since we don't want to compute gradients 
    with torch.no_grad():
        for images_val , labels_val in validation_loader:
            images_val = images_val.cuda()
            labels_val = labels_val.cuda()
            images_val = images_val.reshape(-1 ,sequence_length , input_size)
            
            prediction = lstm(images_val)
            values , indices = torch.max(prediction , 1)
            valid_loss = criterion(prediction , labels_val)
            
            running_validation_loss += valid_loss.item()
            # count the correct predictions of the batch in one vectorized step
            correct += (indices == labels_val).sum().item()
    
    accuracy = (correct/len(validation_dataset))*100
    running_validation_loss = running_validation_loss/len(validation_loader)
    print("Validation loss and accuracy after epoch {}/{} is {} and {}".format(i+1 , 
                                                                               num_epochs , 
                                                                               running_validation_loss,
                                                                               accuracy))
    if accuracy_max < accuracy:
        accuracy_max = accuracy
        print("Maximum validation accuracy of {} at epoch {}/{}".format(accuracy_max,
                                                                    i+1 , 
                                                                    num_epochs))
        print("saving the model \n")
        torch.save(lstm.state_dict(), '/content/TransferLearning.pth')
    else:
        print()

print("\n Training Over")

Training loss after epoch 1/2 is 0.6576919555664062 
Validation loss and accuracy after epoch 1/2 is 0.2457413524389267 and 92.525
Maximum validation accuracy of 92.525 at epoch 1/2
saving the model 

Training loss after epoch 2/2 is 0.19110603630542755 
Validation loss and accuracy after epoch 2/2 is 0.13701337575912476 and 95.76666666666667
Maximum validation accuracy of 95.76666666666667 at epoch 2/2
saving the model 


 Training Over


In [8]:
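# loading the weights of the best model for testing
# a fresh model with the same architecture keeps the loaded weights separate from lstm
model_loaded = Lstm(input_size , no_hidden_units , output_size , num_layers , True)
model_loaded.cuda()
model_loaded.load_state_dict(torch.load('/content/TransferLearning.pth')) 
model_loaded.eval()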

Lstm(
  (lstm): LSTM(28, 100, num_layers=2, batch_first=True, bidirectional=True)
  (fc): Linear(in_features=200, out_features=10, bias=True)
)
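
A checkpoint saved from a GPU model can also be loaded on a machine without CUDA by passing `map_location` to `torch.load`. The sketch below shows the round trip with a small stand-in model and a temporary file; the file name `checkpoint.pth` and the layer sizes are hypothetical.

```python
import os
import tempfile

import torch
from torch import nn

# Small stand-in model; the sizes are assumptions for illustration.
model = nn.LSTM(input_size=28, hidden_size=8, batch_first=True)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "checkpoint.pth")  # hypothetical file name
    torch.save(model.state_dict(), path)
    # map_location places every tensor on the CPU regardless of where it was saved.
    state = torch.load(path, map_location="cpu")

# Load the weights into a fresh model with the same architecture.
restored = nn.LSTM(input_size=28, hidden_size=8, batch_first=True)
restored.load_state_dict(state)
restored.eval()
```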

In [9]:
# testing the model
correct = 0
accuracy = 0
with torch.no_grad():
    for images_test , labels_test in test_loader:
        images_test = images_test.cuda()
        labels_test = labels_test.cuda()
        images_test = images_test.reshape(-1 ,sequence_length , input_size)
        prediction = model_loaded(images_test)
        values , indices = torch.max(prediction , 1)
        # count the correct predictions of the batch in one vectorized step
        correct += (indices == labels_test).sum().item()
    
accuracy = (correct/len(test_dataset))*100
print("Test accuracy is {}".format(accuracy))

Test accuracy is 95.94
