
<h1>Recurrent Neural Networks (RNN)</h1>

<h3 style="color: yellow;">A Recurrent Neural Network (RNN) is a type of artificial neural network designed to recognize patterns in sequences of data such as text, genomes, handwriting, flattened images, spoken words, or numerical time series data from sensors, stock markets, and government agencies.</h3>

<h3 style="color: yellow;">RNNs possess a kind of "memory" that retains information about previous computations.</h3>

<h3 style="color: yellow;">RNNs are especially effective for sequences and lists. Both the input and the output can be sequences.</h3>

<h3 style="color: yellow;">In essence, while traditional neural networks might process a single input to produce a single output (like in image classification), RNNs manage sequences, where the output from one step becomes the input for the next.</h3>

<div style="display: flex; justify-content: center;">
    <img src='rnn.png', width =600>
</div>



In [1]:
# Importing libraries
import torch, torchvision
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from  torchvision.datasets import MNIST
from torchvision.transforms import transforms

  warn(


In [2]:
# Cuda
device= torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [3]:
# In previous contexts and feedforward networks, the batch size was introduced, and the shape for each batch was (batch x channel x height x width).
# In RNNs, the batch size is still introduced, but the shape for each batch is (batch x channel x height).
# we treat each image as rows of sequences of pixels, i.e., (28 sequence each of a length of 28 pixels).
INPUT_SIZE=28
SEQUENCE_LENGTH=28
NUM_LAYERS=2
HIDDEN_SIZE=256
NUM_CLASSES=10
LR=0.1
BATCH_SIZE=64
EPOCHS=4

In [4]:
# Baisc RNN class

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes,sequence_length=28):
        super(RNN,self).__init__()
        self.input_size=input_size
        self.hidden_size=hidden_size
        self.num_layers=num_layers
        self.num_classes=num_classes
        self.sequence_length=sequence_length
        self.rnn=nn.RNN(self.input_size,self.hidden_size,self.num_layers,batch_first=True) # In the MNIST dataset, the first dimension refers to the batch size
        self.fc=nn.Linear(self.hidden_size*self.sequence_length,self.num_classes)
        
    def forward(self,x):
        # Initialize hidden state with zeros
        h0=torch.zeros(self.num_layers,x.size(0), self.hidden_size).to(device) # x.size(0) considers the size of the batch
        out,_=self.rnn(x,h0) # _ gives the hidden state and we do not want to retrieve it for each single sample, thus we ignore it
        out=out.reshape(out.size(0), -1) # shpe(0) to keep the batch size first and -1 to flatten the output
        out=self.fc(out)
        return out

In [5]:
# Datset and dataloader
train_dataset=MNIST(root='dataset/',train=True,transform=transforms.ToTensor(),download=True)
test_dataset=MNIST(root='dataset/',train=False,transform=transforms.ToTensor(),download=True)
train_loader=DataLoader(dataset=train_dataset,batch_size=BATCH_SIZE,shuffle=True)
test_loader=DataLoader(dataset=test_dataset,batch_size=BATCH_SIZE,shuffle=True)

In [6]:
# Model initialization
model=RNN(INPUT_SIZE,HIDDEN_SIZE,NUM_LAYERS,NUM_CLASSES).to(device)
loss=nn.CrossEntropyLoss()
optimizer=optim.Adam(model.parameters(),lr=LR)

In [7]:
# training loop
for epoch in range(EPOCHS):
    for i, (data,labels) in enumerate(train_loader):
        data=data.to(device).squeeze(1) # squeeze to remove the channel dimension
        labels=labels.to(device)
        prediction=model(data)
        loss_=loss(prediction,labels)
        optimizer.zero_grad()
        loss_.backward()
        optimizer.step()
        
        if i%2==0:
            print(f'{epoch}/{EPOCHS} | step: {i}/ {len(train_loader)} | loss: {loss_.item():.4f}')   
        


0/4 | step: 0/ 938 | loss: 2.3135
0/4 | step: 2/ 938 | loss: 58.5364
0/4 | step: 4/ 938 | loss: 33.4571
0/4 | step: 6/ 938 | loss: 89.5860
0/4 | step: 8/ 938 | loss: 93.7883
0/4 | step: 10/ 938 | loss: 466.9517
0/4 | step: 12/ 938 | loss: 781.7485
0/4 | step: 14/ 938 | loss: 1065.4987
0/4 | step: 16/ 938 | loss: 600.9093
0/4 | step: 18/ 938 | loss: 622.3619
0/4 | step: 20/ 938 | loss: 509.9616
0/4 | step: 22/ 938 | loss: 603.4929
0/4 | step: 24/ 938 | loss: 484.5089
0/4 | step: 26/ 938 | loss: 904.5291
0/4 | step: 28/ 938 | loss: 809.0738
0/4 | step: 30/ 938 | loss: 540.1201
0/4 | step: 32/ 938 | loss: 527.6788
0/4 | step: 34/ 938 | loss: 321.5785
0/4 | step: 36/ 938 | loss: 508.0666
0/4 | step: 38/ 938 | loss: 607.0573
0/4 | step: 40/ 938 | loss: 562.0098
0/4 | step: 42/ 938 | loss: 537.8113
0/4 | step: 44/ 938 | loss: 435.0861
0/4 | step: 46/ 938 | loss: 434.6760
0/4 | step: 48/ 938 | loss: 422.4437
0/4 | step: 50/ 938 | loss: 386.8527
0/4 | step: 52/ 938 | loss: 290.3809
0/4 | step:

In [8]:
# check accuracy on training and test to see how good our model is
def check_accuracy(loader,model):
    if loader.dataset.train:
        print('Checking accuracy on training data')
    else:
        print('Checking accuracy on test data')
    num_correct=0
    num_samples=0
    model.eval()
    
    with torch.no_grad():
        for data,label in loader:
            data=data.to(device).squeeze(1)
            label=label.to(device)
            prediction=model(data)
            _,pred=prediction.max(1)
            num_correct+=(pred==label).sum()
            num_samples+=pred.size(0)
        print(f'Obtained {num_correct}/{num_samples} with accuracy: '
            f'{float(num_correct)/float(num_samples)*100:.2f}')
    model.train()
    
    
check_accuracy(train_loader,model)
check_accuracy(test_loader,model)
            

Checking accuracy on training data


Obtained 6265/60000 with accuracy: 10.44
Checking accuracy on test data
Obtained 1028/10000 with accuracy: 10.28



<h1>Performance improvement using GRU</h1>

<h3 style="color: yellow;"> A Gated Recurrent Unit (GRU) is a type of recurrent neural network (RNN) architecture.</h3>
 <h3 style="color: yellow;">It's designed to solve the vanishing gradient problem which can come with standard RNNs, allowing each recurrent unit to adaptively capture dependencies of different time scales.</h3>

 <div style="display: flex; justify-content: center;">
    <img src='gru.png', width =500>
</div>

In [9]:
# GRU class

class GRU(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes,sequence_length=28):
        super(GRU,self).__init__()
        self.input_size=input_size
        self.hidden_size=hidden_size
        self.num_layers=num_layers
        self.num_classes=num_classes
        self.sequence_length=sequence_length
        self.gru=nn.GRU(self.input_size,self.hidden_size,self.num_layers,batch_first=True) # In the MNIST dataset, the first dimension refers to the batch size
        self.fc=nn.Linear(self.hidden_size*self.sequence_length,self.num_classes)
        
    def forward(self,x):
        # Initialize hidden state with zeros
        h0=torch.zeros(self.num_layers,x.size(0), self.hidden_size).to(device) # x.size(0) considers the size of the batch
        out,_=self.gru(x,h0) # _ gives the hidden state and we do not want to retrieve it for each single sample, thus we ignore it
        out=out.reshape(out.size(0), -1) # shpe(0) to keep the batch size first and -1 to flatten the output
        out=self.fc(out)
        return out

In [10]:
# Model initialization
model=GRU(INPUT_SIZE,HIDDEN_SIZE,NUM_LAYERS,NUM_CLASSES).to(device)
loss=nn.CrossEntropyLoss()
optimizer=optim.Adam(model.parameters(),lr=LR)


# training loop
for epoch in range(EPOCHS):
    for i, (data,labels) in enumerate(train_loader):
        data=data.to(device).squeeze(1) # squeeze to remove the channel dimension
        labels=labels.to(device)
        prediction=model(data)
        loss_=loss(prediction,labels)
        optimizer.zero_grad()
        loss_.backward()
        optimizer.step()
        
        if i%2==0:
            print(f'{epoch}/{EPOCHS} | step: {i}/ {len(train_loader)} | loss: {loss_.item():.4f}')   
        



0/4 | step: 0/ 938 | loss: 2.2995
0/4 | step: 2/ 938 | loss: 202.6226
0/4 | step: 4/ 938 | loss: 394.8499
0/4 | step: 6/ 938 | loss: 655.2086
0/4 | step: 8/ 938 | loss: 897.9982
0/4 | step: 10/ 938 | loss: 1020.6848
0/4 | step: 12/ 938 | loss: 208.7840
0/4 | step: 14/ 938 | loss: 292.0469
0/4 | step: 16/ 938 | loss: 134.2702
0/4 | step: 18/ 938 | loss: 178.2141
0/4 | step: 20/ 938 | loss: 180.6277
0/4 | step: 22/ 938 | loss: 146.1734
0/4 | step: 24/ 938 | loss: 127.5219
0/4 | step: 26/ 938 | loss: 214.0220


0/4 | step: 28/ 938 | loss: 137.7257
0/4 | step: 30/ 938 | loss: 179.8280
0/4 | step: 32/ 938 | loss: 175.3296
0/4 | step: 34/ 938 | loss: 107.4693
0/4 | step: 36/ 938 | loss: 165.3964
0/4 | step: 38/ 938 | loss: 149.8317
0/4 | step: 40/ 938 | loss: 145.3198
0/4 | step: 42/ 938 | loss: 138.8790
0/4 | step: 44/ 938 | loss: 105.6921
0/4 | step: 46/ 938 | loss: 106.2840
0/4 | step: 48/ 938 | loss: 124.0673
0/4 | step: 50/ 938 | loss: 124.5043
0/4 | step: 52/ 938 | loss: 134.9178
0/4 | step: 54/ 938 | loss: 158.8102
0/4 | step: 56/ 938 | loss: 144.3313
0/4 | step: 58/ 938 | loss: 155.3364
0/4 | step: 60/ 938 | loss: 94.4192
0/4 | step: 62/ 938 | loss: 103.2918
0/4 | step: 64/ 938 | loss: 90.2318
0/4 | step: 66/ 938 | loss: 125.0393
0/4 | step: 68/ 938 | loss: 108.6216
0/4 | step: 70/ 938 | loss: 85.5893
0/4 | step: 72/ 938 | loss: 70.2675
0/4 | step: 74/ 938 | loss: 90.2882
0/4 | step: 76/ 938 | loss: 102.0776
0/4 | step: 78/ 938 | loss: 132.8610
0/4 | step: 80/ 938 | loss: 68.3097
0/4 | s

In [11]:
# check accuracy on training and test to see how good our model is
def check_accuracy(loader,model):
    if loader.dataset.train:
        print('Checking accuracy on training data')
    else:
        print('Checking accuracy on test data')
    num_correct=0
    num_samples=0
    model.eval()
    
    with torch.no_grad():
        for data,label in loader:
            data=data.to(device).squeeze(1)
            label=label.to(device)
            prediction=model(data)
            _,pred=prediction.max(1)
            num_correct+=(pred==label).sum()
            num_samples+=pred.size(0)
        print(f'Obtained {num_correct}/{num_samples} with accuracy: '
            f'{float(num_correct)/float(num_samples)*100:.2f}')
    model.train()
    
    
check_accuracy(train_loader,model)
check_accuracy(test_loader,model)
            

Checking accuracy on training data
Obtained 5923/60000 with accuracy: 9.87
Checking accuracy on test data
Obtained 980/10000 with accuracy: 9.80



<h1>Performance improvement using LSTM</h1>

<h3 style="color: yellow;"> Long Short-Term Memory (LSTMs) are designed to avoid long-term dependency issues, making them particularly well-suited for tasks where sequences have long-range dependencies.</h3>


 <h3 style="color: yellow;"STMs have a sophisticated gating mechanism consisting of three gates: Forget Gate (f), Input Gate (i), Output Gate (o).</h3>



 <div style="display: flex; justify-content: center;">
    <img src='lstm.png', width =500>
</div>

In [12]:
# LSTM class
# Beside h0, we must define another hidden state c0 for LSTM

class LSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes,sequence_length=28):
        super(LSTM,self).__init__()
        self.input_size=input_size
        self.hidden_size=hidden_size
        self.num_layers=num_layers
        self.num_classes=num_classes
        self.sequence_length=sequence_length
        self.lstm=nn.LSTM(self.input_size,self.hidden_size,self.num_layers,batch_first=True) # In the MNIST dataset, the first dimension refers to the batch size
        self.fc=nn.Linear(self.hidden_size*self.sequence_length,self.num_classes)
        
    def forward(self,x):
        # Initialize hidden state with zeros
        h0=torch.zeros(self.num_layers,x.size(0), self.hidden_size).to(device) # x.size(0) considers the size of the batch
        
        # Define another hidden state for LSTM
        c0=torch.zeros(self.num_layers,x.size(0), self.hidden_size).to(device)
        
        out,_=self.lstm(x,(h0,c0)) # _ gives the hidden state and we do not want to retrieve it for each single sample, thus we ignore it
        out=out.reshape(out.size(0), -1) # shpe(0) to keep the batch size first and -1 to flatten the output
        out=self.fc(out)
        return out

In [13]:
# Model initialization
model=GRU(INPUT_SIZE,HIDDEN_SIZE,NUM_LAYERS,NUM_CLASSES).to(device)
loss=nn.CrossEntropyLoss()
optimizer=optim.Adam(model.parameters(),lr=LR)


# training loop
for epoch in range(EPOCHS):
    for i, (data,labels) in enumerate(train_loader):
        data=data.to(device).squeeze(1) # squeeze to remove the channel dimension
        labels=labels.to(device)
        prediction=model(data)
        loss_=loss(prediction,labels)
        optimizer.zero_grad()
        loss_.backward()
        optimizer.step()
        
        if i%2==0:
            print(f'{epoch}/{EPOCHS} | step: {i}/ {len(train_loader)} | loss: {loss_.item():.4f}')   
        



0/4 | step: 0/ 938 | loss: 2.3048
0/4 | step: 2/ 938 | loss: 219.4711
0/4 | step: 4/ 938 | loss: 107.4901
0/4 | step: 6/ 938 | loss: 314.2815
0/4 | step: 8/ 938 | loss: 383.3282
0/4 | step: 10/ 938 | loss: 741.0831
0/4 | step: 12/ 938 | loss: 444.4359
0/4 | step: 14/ 938 | loss: 321.9375
0/4 | step: 16/ 938 | loss: 308.9910
0/4 | step: 18/ 938 | loss: 390.1303
0/4 | step: 20/ 938 | loss: 217.5894
0/4 | step: 22/ 938 | loss: 233.4795
0/4 | step: 24/ 938 | loss: 224.2578
0/4 | step: 26/ 938 | loss: 196.8800
0/4 | step: 28/ 938 | loss: 276.0218
0/4 | step: 30/ 938 | loss: 254.9886
0/4 | step: 32/ 938 | loss: 272.9568
0/4 | step: 34/ 938 | loss: 274.6280
0/4 | step: 36/ 938 | loss: 307.6221
0/4 | step: 38/ 938 | loss: 304.3756
0/4 | step: 40/ 938 | loss: 209.1018
0/4 | step: 42/ 938 | loss: 190.4679
0/4 | step: 44/ 938 | loss: 92.9001
0/4 | step: 46/ 938 | loss: 157.1845
0/4 | step: 48/ 938 | loss: 181.2256
0/4 | step: 50/ 938 | loss: 173.7586
0/4 | step: 52/ 938 | loss: 108.2661
0/4 | ste

0/4 | step: 58/ 938 | loss: 128.7434
0/4 | step: 60/ 938 | loss: 183.5843
0/4 | step: 62/ 938 | loss: 203.4203
0/4 | step: 64/ 938 | loss: 166.6874
0/4 | step: 66/ 938 | loss: 172.4035
0/4 | step: 68/ 938 | loss: 180.6915
0/4 | step: 70/ 938 | loss: 189.6446
0/4 | step: 72/ 938 | loss: 88.3695
0/4 | step: 74/ 938 | loss: 161.6409
0/4 | step: 76/ 938 | loss: 127.6558
0/4 | step: 78/ 938 | loss: 166.5810
0/4 | step: 80/ 938 | loss: 157.9833
0/4 | step: 82/ 938 | loss: 74.1565
0/4 | step: 84/ 938 | loss: 115.4272
0/4 | step: 86/ 938 | loss: 81.0136
0/4 | step: 88/ 938 | loss: 57.8280
0/4 | step: 90/ 938 | loss: 100.0609
0/4 | step: 92/ 938 | loss: 107.3209
0/4 | step: 94/ 938 | loss: 92.7030
0/4 | step: 96/ 938 | loss: 61.5700
0/4 | step: 98/ 938 | loss: 79.5084
0/4 | step: 100/ 938 | loss: 73.1855
0/4 | step: 102/ 938 | loss: 54.4851
0/4 | step: 104/ 938 | loss: 75.5915
0/4 | step: 106/ 938 | loss: 118.4186
0/4 | step: 108/ 938 | loss: 105.2591
0/4 | step: 110/ 938 | loss: 69.6776
0/4 | 

In [14]:
# check accuracy on training and test to see how good our model is
def check_accuracy(loader,model):
    if loader.dataset.train:
        print('Checking accuracy on training data')
    else:
        print('Checking accuracy on test data')
    num_correct=0
    num_samples=0
    model.eval()
    
    with torch.no_grad():
        for data,label in loader:
            data=data.to(device).squeeze(1)
            label=label.to(device)
            prediction=model(data)
            _,pred=prediction.max(1)
            num_correct+=(pred==label).sum()
            num_samples+=pred.size(0)
        print(f'Obtained {num_correct}/{num_samples} with accuracy: '
            f'{float(num_correct)/float(num_samples)*100:.2f}')
    model.train()
    
    
check_accuracy(train_loader,model)
check_accuracy(test_loader,model)
            

Checking accuracy on training data


Obtained 35720/60000 with accuracy: 59.53
Checking accuracy on test data
Obtained 6068/10000 with accuracy: 60.68



<h1> Bidirectional LSTM (BiLSTM)</h1>

<h3 style="color: yellow;"> BiLSTM networks are an extension of traditional LSTM, a type of Recurrent Neural Network (RNN)</h3>

 <h3 style="color: yellow;">LSTMs are a special kind of RNN that can learn and remember over long sequences and are less susceptible to the vanishing gradient problem.</h3>

 <h3 style="color: yellow;">LSTMs is to increase the amount of information available to the network by processing the data in both forward and backward directions.</h3>


 <div style="display: flex; justify-content: center;">
    <img src='blstm.png', width =500>
</div>

In [15]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
import torchvision
from torchvision.datasets import MNIST
from torchvision.transforms import transforms
device=torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [16]:
# Constants and Hyperparameters
INPUT_SIZE=28
SEQUENCE_LENGTH=28
NUM_LAYERS=2
HIDDEN_SIZE=256
NUM_CLASSES=10
LR=0.01
BATCH_SIZE=64
EPOCHS=4

In [17]:
class BLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes,sequence_length=28):
        super(BLSTM,self).__init__()
        self.input_size=input_size
        self.hidden_size=hidden_size
        self.num_layers=num_layers
        self.num_classes=num_classes
        self.sequence_length=sequence_length
        self.lstm=nn.LSTM(self.input_size,self.hidden_size,self.num_layers,batch_first=True,bidirectional=True)
        self.fc=nn.Linear(self.hidden_size*2,self.num_classes) # Multiply by 2 because of  forward and backward directions (later we concatenate them as a single vector)
        
    def forward(self,x):
        h0=torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)
        c0=torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)
        out,_=self.lstm(x,(h0,c0)) #_ represent a tuple of hidden state and cell state
        out=self.fc(out[:,-1,:]) # We take the last hidden state of the last layer
        return out

        

In [18]:
# Dataset and dataloader
train_dataset=MNIST(root='./data',train=True,transform=transforms.ToTensor(),download=True)
test_dataset=MNIST(root='./data',train=False,transform=transforms.ToTensor(),download=True)
train_loader=DataLoader(dataset=train_dataset,batch_size=BATCH_SIZE,shuffle=True)
test_loader=DataLoader(dataset=test_dataset,batch_size=BATCH_SIZE,shuffle=True)


In [19]:
# Model,  loss, and optimizer initialization
model=BLSTM(INPUT_SIZE,HIDDEN_SIZE,NUM_LAYERS,NUM_CLASSES).to(device)
loss=nn.CrossEntropyLoss()
optimizer=optim.Adam(model.parameters(),lr=LR)

In [20]:
# Training loop
for epoch in range(EPOCHS):
    for i, (data,labels) in enumerate(train_loader):
        data=data.to(device).squeeze(1)
        labels=labels.to(device)
        prediction=model(data)
        loss_=loss(prediction,labels)
        optimizer.zero_grad()
        loss_.backward()
        optimizer.step()

        if i%2==0:
            print(f'{epoch}/{EPOCHS} | step: {i}/ {len(train_loader)} | loss: {loss_.item():.4f}')   
    

0/4 | step: 0/ 938 | loss: 2.3010
0/4 | step: 2/ 938 | loss: 2.2952
0/4 | step: 4/ 938 | loss: 2.2043
0/4 | step: 6/ 938 | loss: 2.3240
0/4 | step: 8/ 938 | loss: 2.1541
0/4 | step: 10/ 938 | loss: 2.2877
0/4 | step: 12/ 938 | loss: 2.0364
0/4 | step: 14/ 938 | loss: 2.0841
0/4 | step: 16/ 938 | loss: 1.7900
0/4 | step: 18/ 938 | loss: 1.5778
0/4 | step: 20/ 938 | loss: 1.6798
0/4 | step: 22/ 938 | loss: 1.6371
0/4 | step: 24/ 938 | loss: 1.3330
0/4 | step: 26/ 938 | loss: 1.6218


0/4 | step: 28/ 938 | loss: 1.2351
0/4 | step: 30/ 938 | loss: 1.0011
0/4 | step: 32/ 938 | loss: 1.4940
0/4 | step: 34/ 938 | loss: 1.4615
0/4 | step: 36/ 938 | loss: 1.1227
0/4 | step: 38/ 938 | loss: 1.3199
0/4 | step: 40/ 938 | loss: 1.2437
0/4 | step: 42/ 938 | loss: 0.9844
0/4 | step: 44/ 938 | loss: 1.2426
0/4 | step: 46/ 938 | loss: 0.9758
0/4 | step: 48/ 938 | loss: 1.1774
0/4 | step: 50/ 938 | loss: 1.0874
0/4 | step: 52/ 938 | loss: 1.4202
0/4 | step: 54/ 938 | loss: 1.1261
0/4 | step: 56/ 938 | loss: 1.1085
0/4 | step: 58/ 938 | loss: 0.9689
0/4 | step: 60/ 938 | loss: 1.1049
0/4 | step: 62/ 938 | loss: 1.3295
0/4 | step: 64/ 938 | loss: 1.2333
0/4 | step: 66/ 938 | loss: 0.9794
0/4 | step: 68/ 938 | loss: 0.6990
0/4 | step: 70/ 938 | loss: 1.1003
0/4 | step: 72/ 938 | loss: 1.0945
0/4 | step: 74/ 938 | loss: 0.7357
0/4 | step: 76/ 938 | loss: 0.9587
0/4 | step: 78/ 938 | loss: 0.6882
0/4 | step: 80/ 938 | loss: 0.8248
0/4 | step: 82/ 938 | loss: 0.8371
0/4 | step: 84/ 938 

In [21]:
# Def check teh accuracy
def check_accuracy(loader, model):
    if loader.dataset.train:
        print('Checking accuracy on training data')
    else:
        print('Checking accuracy on test data')
    num_correct=0
    num_samples=0
    model.eval()
    
    with torch.no_grad():
        for data,labels in loader:
            data=data.to(device).squeeze(1)
            labels=labels.to(device)
            prediction=model(data)
            _,pred=prediction.max(1)
            num_correct+=(pred==labels).sum()
            num_samples+=pred.size(0)
        print(f'Obtained {num_correct}/{num_samples} with accuracy: '
            f'{float(num_correct)/float(num_samples)*100:.2f}')
    model.train()
    
check_accuracy(train_loader,model)
check_accuracy(test_loader,model)


Checking accuracy on training data


Obtained 57887/60000 with accuracy: 96.48
Checking accuracy on test data
Obtained 9621/10000 with accuracy: 96.21
