
<h1>Recurrent Neural Networks (RNN)</h1>

<h3 style="color: yellow;">A Recurrent Neural Network (RNN) is a type of artificial neural network designed to recognize patterns in sequences of data such as text, genomes, handwriting, flattened images, spoken words, or numerical time series data from sensors, stock markets, and government agencies.</h3>

<h3 style="color: yellow;">RNNs possess a kind of "memory" that retains information about previous computations.</h3>

<h3 style="color: yellow;">RNNs are especially effective for sequences and lists. Both the input and the output can be sequences.</h3>

<h3 style="color: yellow;">In essence, while traditional neural networks might process a single input to produce a single output (like in image classification), RNNs manage sequences, where the output from one step becomes the input for the next.</h3>

<div style="display: flex; justify-content: center;">
    <img src='rnn.png', width =600>
</div>



In [1]:
# Importing libraries
import torch, torchvision
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from  torchvision.datasets import MNIST
from torchvision.transforms import transforms

  warn(


In [2]:
# Cuda
device= torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [3]:
# In previous contexts and feedforward networks, the batch size was introduced, and the shape for each batch was (batch x channel x height x width).
# In RNNs, the batch size is still introduced, but the shape for each batch is (batch x channel x height).
# we treat each image as rows of sequences of pixels, i.e., (28 sequence each of a length of 28 pixels).
INPUT_SIZE=28
SEQUENCE_LENGTH=28
NUM_LAYERS=2
HIDDEN_SIZE=256
NUM_CLASSES=10
LR=10
BATCH_SIZE=64
EPOCHS=4

In [4]:
# Baisc RNN class

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes,sequence_length=28):
        super(RNN,self).__init__()
        self.input_size=input_size
        self.hidden_size=hidden_size
        self.num_layers=num_layers
        self.num_classes=num_classes
        self.sequence_length=sequence_length
        self.rnn=nn.RNN(self.input_size,self.hidden_size,self.num_layers,batch_first=True) # In the MNIST dataset, the first dimension refers to the batch size
        self.fc=nn.Linear(self.hidden_size*self.sequence_length,self.num_classes)
        
    def forward(self,x):
        # Initialize hidden state with zeros
        h0=torch.zeros(self.num_layers,x.size(0), self.hidden_size).to(device) # x.size(0) considers the size of the batch
        out,_=self.rnn(x,h0) # _ gives the hidden state and we do not want to retrieve it for each single sample, thus we ignore it
        out=out.reshape(out.size(0), -1) # shpe(0) to keep the batch size first and -1 to flatten the output
        out=self.fc(out)
        return out

In [5]:
# Datset and dataloader
train_dataset=MNIST(root='dataset/',train=True,transform=transforms.ToTensor(),download=True)
test_dataset=MNIST(root='dataset/',train=False,transform=transforms.ToTensor(),download=True)
train_loader=DataLoader(dataset=train_dataset,batch_size=BATCH_SIZE,shuffle=True)
test_loader=DataLoader(dataset=test_dataset,batch_size=BATCH_SIZE,shuffle=True)

In [6]:
# Model initialization
model=RNN(INPUT_SIZE,HIDDEN_SIZE,NUM_LAYERS,NUM_CLASSES).to(device)
loss=nn.CrossEntropyLoss()
optimizer=optim.Adam(model.parameters(),lr=LR)

In [7]:
# training loop
for epoch in range(EPOCHS):
    for i, (data,labels) in enumerate(train_loader):
        data=data.to(device).squeeze(1) # squeeze to remove the channel dimension
        labels=labels.to(device)
        prediction=model(data)
        loss_=loss(prediction,labels)
        optimizer.zero_grad()
        loss_.backward()
        optimizer.step()
        
        if i%2==0:
            print(f'{epoch}/{EPOCHS} | step: {i}/ {len(train_loader)} | loss: {loss_.item():.4f}')   
        


0/4 | step: 0/ 938 | loss: 2.3177
0/4 | step: 2/ 938 | loss: 4378.7852
0/4 | step: 4/ 938 | loss: 17551.5176
0/4 | step: 6/ 938 | loss: 53697.1680
0/4 | step: 8/ 938 | loss: 86780.0078
0/4 | step: 10/ 938 | loss: 121697.2344
0/4 | step: 12/ 938 | loss: 77003.0859
0/4 | step: 14/ 938 | loss: 36137.6836
0/4 | step: 16/ 938 | loss: 57747.1484
0/4 | step: 18/ 938 | loss: 54562.9336
0/4 | step: 20/ 938 | loss: 53340.0273
0/4 | step: 22/ 938 | loss: 45295.2578
0/4 | step: 24/ 938 | loss: 74331.6094
0/4 | step: 26/ 938 | loss: 70547.1641
0/4 | step: 28/ 938 | loss: 60793.3359
0/4 | step: 30/ 938 | loss: 53434.7031
0/4 | step: 32/ 938 | loss: 62354.2383
0/4 | step: 34/ 938 | loss: 57321.8945
0/4 | step: 36/ 938 | loss: 56881.9414
0/4 | step: 38/ 938 | loss: 43405.2188
0/4 | step: 40/ 938 | loss: 34138.0469
0/4 | step: 42/ 938 | loss: 39680.7422
0/4 | step: 44/ 938 | loss: 31311.1367
0/4 | step: 46/ 938 | loss: 31764.4297
0/4 | step: 48/ 938 | loss: 49467.0703
0/4 | step: 50/ 938 | loss: 33437.

In [8]:
# check accuracy on training and test to see how good our model is
def check_accuracy(loader,model):
    if loader.dataset.train:
        print('Checking accuracy on training data')
    else:
        print('Checking accuracy on test data')
    num_correct=0
    num_samples=0
    model.eval()
    
    with torch.no_grad():
        for data,label in loader:
            data=data.to(device).squeeze(1)
            label=label.to(device)
            prediction=model(data)
            _,pred=prediction.max(1)
            num_correct+=(pred==label).sum()
            num_samples+=pred.size(0)
        print(f'Obtained {num_correct}/{num_samples} with accuracy: '
            f'{float(num_correct)/float(num_samples)*100:.2f}')
    model.train()
    
    
check_accuracy(train_loader,model)
check_accuracy(test_loader,model)
            

Checking accuracy on training data


Obtained 8557/60000 with accuracy: 14.26
Checking accuracy on test data
Obtained 1460/10000 with accuracy: 14.60



<h1>Performance improvement using GRU</h1>

<h3 style="color: yellow;"> A Gated Recurrent Unit (GRU) is a type of recurrent neural network (RNN) architecture.</h3>
 <h3 style="color: yellow;">It's designed to solve the vanishing gradient problem which can come with standard RNNs, allowing each recurrent unit to adaptively capture dependencies of different time scales.</h3>

 <div style="display: flex; justify-content: center;">
    <img src='gru.png', width =500>
</div>

In [9]:
# GRU class

class GRU(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes,sequence_length=28):
        super(GRU,self).__init__()
        self.input_size=input_size
        self.hidden_size=hidden_size
        self.num_layers=num_layers
        self.num_classes=num_classes
        self.sequence_length=sequence_length
        self.gru=nn.GRU(self.input_size,self.hidden_size,self.num_layers,batch_first=True) # In the MNIST dataset, the first dimension refers to the batch size
        self.fc=nn.Linear(self.hidden_size*self.sequence_length,self.num_classes)
        
    def forward(self,x):
        # Initialize hidden state with zeros
        h0=torch.zeros(self.num_layers,x.size(0), self.hidden_size).to(device) # x.size(0) considers the size of the batch
        out,_=self.gru(x,h0) # _ gives the hidden state and we do not want to retrieve it for each single sample, thus we ignore it
        out=out.reshape(out.size(0), -1) # shpe(0) to keep the batch size first and -1 to flatten the output
        out=self.fc(out)
        return out

In [10]:
# Model initialization
model=GRU(INPUT_SIZE,HIDDEN_SIZE,NUM_LAYERS,NUM_CLASSES).to(device)
loss=nn.CrossEntropyLoss()
optimizer=optim.Adam(model.parameters(),lr=LR)


# training loop
for epoch in range(EPOCHS):
    for i, (data,labels) in enumerate(train_loader):
        data=data.to(device).squeeze(1) # squeeze to remove the channel dimension
        labels=labels.to(device)
        prediction=model(data)
        loss_=loss(prediction,labels)
        optimizer.zero_grad()
        loss_.backward()
        optimizer.step()
        
        if i%2==0:
            print(f'{epoch}/{EPOCHS} | step: {i}/ {len(train_loader)} | loss: {loss_.item():.4f}')   
        



0/4 | step: 0/ 938 | loss: 2.3014
0/4 | step: 2/ 938 | loss: 2699.0737
0/4 | step: 4/ 938 | loss: 17993.4785
0/4 | step: 6/ 938 | loss: 41038.8555
0/4 | step: 8/ 938 | loss: 58198.6992
0/4 | step: 10/ 938 | loss: 37888.8359
0/4 | step: 12/ 938 | loss: 34672.4219
0/4 | step: 14/ 938 | loss: 24222.2539
0/4 | step: 16/ 938 | loss: 29018.4062
0/4 | step: 18/ 938 | loss: 22586.1543
0/4 | step: 20/ 938 | loss: 34523.9570
0/4 | step: 22/ 938 | loss: 21586.3633
0/4 | step: 24/ 938 | loss: 23089.0273
0/4 | step: 26/ 938 | loss: 17955.0840
0/4 | step: 28/ 938 | loss: 8933.5674
0/4 | step: 30/ 938 | loss: 19638.5898
0/4 | step: 32/ 938 | loss: 11345.4648
0/4 | step: 34/ 938 | loss: 13208.8633
0/4 | step: 36/ 938 | loss: 9385.2803
0/4 | step: 38/ 938 | loss: 10905.0205
0/4 | step: 40/ 938 | loss: 10037.5791
0/4 | step: 42/ 938 | loss: 13945.8418
0/4 | step: 44/ 938 | loss: 15117.8740
0/4 | step: 46/ 938 | loss: 11018.1689
0/4 | step: 48/ 938 | loss: 8502.3496
0/4 | step: 50/ 938 | loss: 10219.6816

0/4 | step: 150/ 938 | loss: 4954.4355
0/4 | step: 152/ 938 | loss: 4521.6919
0/4 | step: 154/ 938 | loss: 4898.2334
0/4 | step: 156/ 938 | loss: 4453.4956
0/4 | step: 158/ 938 | loss: 3679.9734
0/4 | step: 160/ 938 | loss: 2724.2454
0/4 | step: 162/ 938 | loss: 4898.4849
0/4 | step: 164/ 938 | loss: 3381.7048
0/4 | step: 166/ 938 | loss: 3188.6084
0/4 | step: 168/ 938 | loss: 2344.6311
0/4 | step: 170/ 938 | loss: 2741.0107
0/4 | step: 172/ 938 | loss: 2920.9685
0/4 | step: 174/ 938 | loss: 3218.6782
0/4 | step: 176/ 938 | loss: 2099.4377
0/4 | step: 178/ 938 | loss: 3383.3684
0/4 | step: 180/ 938 | loss: 6495.0068
0/4 | step: 182/ 938 | loss: 5187.1226
0/4 | step: 184/ 938 | loss: 2065.5459
0/4 | step: 186/ 938 | loss: 4095.8125
0/4 | step: 188/ 938 | loss: 3961.6223
0/4 | step: 190/ 938 | loss: 7947.1348
0/4 | step: 192/ 938 | loss: 4786.7520
0/4 | step: 194/ 938 | loss: 6330.3916
0/4 | step: 196/ 938 | loss: 5604.5322
0/4 | step: 198/ 938 | loss: 4879.5957
0/4 | step: 200/ 938 | lo

In [11]:
# check accuracy on training and test to see how good our model is
def check_accuracy(loader,model):
    if loader.dataset.train:
        print('Checking accuracy on training data')
    else:
        print('Checking accuracy on test data')
    num_correct=0
    num_samples=0
    model.eval()
    
    with torch.no_grad():
        for data,label in loader:
            data=data.to(device).squeeze(1)
            label=label.to(device)
            prediction=model(data)
            _,pred=prediction.max(1)
            num_correct+=(pred==label).sum()
            num_samples+=pred.size(0)
        print(f'Obtained {num_correct}/{num_samples} with accuracy: '
            f'{float(num_correct)/float(num_samples)*100:.2f}')
    model.train()
    
    
check_accuracy(train_loader,model)
check_accuracy(test_loader,model)
            

Checking accuracy on training data
Obtained 39989/60000 with accuracy: 66.65
Checking accuracy on test data
Obtained 6645/10000 with accuracy: 66.45



<h1>Performance improvement using LSTM</h1>

<h3 style="color: yellow;"> Long Short-Term Memory (LSTMs) are designed to avoid long-term dependency issues, making them particularly well-suited for tasks where sequences have long-range dependencies.</h3>


 <h3 style="color: yellow;"STMs have a sophisticated gating mechanism consisting of three gates: Forget Gate (f), Input Gate (i), Output Gate (o).</h3>



 <div style="display: flex; justify-content: center;">
    <img src='lstm.png', width =500>
</div>

In [12]:
# LSTM class
# Beside h0, we must define another hidden state c0 for LSTM

class LSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes,sequence_length=28):
        super(LSTM,self).__init__()
        self.input_size=input_size
        self.hidden_size=hidden_size
        self.num_layers=num_layers
        self.num_classes=num_classes
        self.sequence_length=sequence_length
        self.lstm=nn.LSTM(self.input_size,self.hidden_size,self.num_layers,batch_first=True) # In the MNIST dataset, the first dimension refers to the batch size
        self.fc=nn.Linear(self.hidden_size*self.sequence_length,self.num_classes)
        
    def forward(self,x):
        # Initialize hidden state with zeros
        h0=torch.zeros(self.num_layers,x.size(0), self.hidden_size).to(device) # x.size(0) considers the size of the batch
        
        # Define another hidden state for LSTM
        c0=torch.zeros(self.num_layers,x.size(0), self.hidden_size).to(device)
        
        out,_=self.lstm(x,(h0,c0)) # _ gives the hidden state and we do not want to retrieve it for each single sample, thus we ignore it
        out=out.reshape(out.size(0), -1) # shpe(0) to keep the batch size first and -1 to flatten the output
        out=self.fc(out)
        return out

In [13]:
# Model initialization
model=GRU(INPUT_SIZE,HIDDEN_SIZE,NUM_LAYERS,NUM_CLASSES).to(device)
loss=nn.CrossEntropyLoss()
optimizer=optim.Adam(model.parameters(),lr=LR)


# training loop
for epoch in range(EPOCHS):
    for i, (data,labels) in enumerate(train_loader):
        data=data.to(device).squeeze(1) # squeeze to remove the channel dimension
        labels=labels.to(device)
        prediction=model(data)
        loss_=loss(prediction,labels)
        optimizer.zero_grad()
        loss_.backward()
        optimizer.step()
        
        if i%2==0:
            print(f'{epoch}/{EPOCHS} | step: {i}/ {len(train_loader)} | loss: {loss_.item():.4f}')   
        



0/4 | step: 0/ 938 | loss: 2.3047
0/4 | step: 2/ 938 | loss: 2563.2458
0/4 | step: 4/ 938 | loss: 24669.9844
0/4 | step: 6/ 938 | loss: 38488.5586
0/4 | step: 8/ 938 | loss: 56919.7031
0/4 | step: 10/ 938 | loss: 53240.3438
0/4 | step: 12/ 938 | loss: 45068.6367
0/4 | step: 14/ 938 | loss: 34588.5156
0/4 | step: 16/ 938 | loss: 25836.5039
0/4 | step: 18/ 938 | loss: 23124.2168
0/4 | step: 20/ 938 | loss: 43640.7461
0/4 | step: 22/ 938 | loss: 56013.9648
0/4 | step: 24/ 938 | loss: 50881.5195
0/4 | step: 26/ 938 | loss: 39232.0508
0/4 | step: 28/ 938 | loss: 26047.2910
0/4 | step: 30/ 938 | loss: 28801.4004
0/4 | step: 32/ 938 | loss: 21545.9492
0/4 | step: 34/ 938 | loss: 27779.8672
0/4 | step: 36/ 938 | loss: 26088.8906
0/4 | step: 38/ 938 | loss: 26041.8359
0/4 | step: 40/ 938 | loss: 35257.6953
0/4 | step: 42/ 938 | loss: 25779.2910
0/4 | step: 44/ 938 | loss: 25973.4043
0/4 | step: 46/ 938 | loss: 16212.6250
0/4 | step: 48/ 938 | loss: 17492.6523
0/4 | step: 50/ 938 | loss: 18658.8

In [14]:
# check accuracy on training and test to see how good our model is
def check_accuracy(loader,model):
    if loader.dataset.train:
        print('Checking accuracy on training data')
    else:
        print('Checking accuracy on test data')
    num_correct=0
    num_samples=0
    model.eval()
    
    with torch.no_grad():
        for data,label in loader:
            data=data.to(device).squeeze(1)
            label=label.to(device)
            prediction=model(data)
            _,pred=prediction.max(1)
            num_correct+=(pred==label).sum()
            num_samples+=pred.size(0)
        print(f'Obtained {num_correct}/{num_samples} with accuracy: '
            f'{float(num_correct)/float(num_samples)*100:.2f}')
    model.train()
    
    
check_accuracy(train_loader,model)
check_accuracy(test_loader,model)
            

Checking accuracy on training data
Obtained 18655/60000 with accuracy: 31.09
Checking accuracy on test data
Obtained 3145/10000 with accuracy: 31.45
