<a href="https://colab.research.google.com/github/divya-r-kamat/DeepVision/blob/main/Neural%20Network%20Architecture/MNIST_Neural_Network_Architecture.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Import functions

In [1]:
from __future__ import print_function  # makes print a function on Python 2; __future__ imports must come first in the file

import torch

import torch.nn as nn  # neural network modules and classes for building and training networks

# The nn.functional package contains many useful loss functions and several other utilities.
import torch.nn.functional as F

import torch.optim as optim
from torchvision import datasets, transforms

## Model
We will use a convolutional neural network, using the nn.Conv2d class from PyTorch. The 2D convolution is a fairly simple operation: start with a kernel, which is simply a small matrix of weights. This kernel “slides” over the 2D input data, performing an elementwise multiplication with the part of the input it is currently on, and then summing up the results into a single output pixel.
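
To make the sliding-kernel description concrete, here is a minimal pure-Python sketch for a single channel with stride 1 and no padding. The function name `conv2d_valid` and the toy `image` and `kernel` values are assumptions for illustration; `nn.Conv2d` additionally sums over input channels and adds a bias, but like this sketch it does not flip the kernel.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists) with stride 1 and no padding."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Elementwise multiply the kernel with the patch under it, then sum.
            total = 0
            for di in range(k_h):
                for dj in range(k_w):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 4 x 4 input and a 3 x 3 vertical-edge kernel
image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [1, 0, 1, 3],
]
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[-6, 12], [-4, 7]]
```

A 4 x 4 input and a 3 x 3 kernel give a 2 x 2 output, which is why the unpadded layers later in the network shrink the feature map by 2 in each dimension.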

The activation function we'll use here is called a Rectified Linear Unit or ReLU, and it has a really simple formula: relu(x) = max(0,x) i.e. if an element is negative, we replace it by 0, otherwise we leave it unchanged.
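
As a quick illustration of the formula, the following sketch applies ReLU to a handful of made-up values. The helper `relu` here is a plain Python function written for this example, not the PyTorch one.

```python
def relu(x):
    # Negative inputs become 0; non-negative inputs pass through unchanged.
    return max(0.0, x)


activations = [relu(v) for v in [-2.0, -0.5, 0.0, 1.5, 3.0]]
print(activations)  # [0.0, 0.0, 0.0, 1.5, 3.0]
```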

To define the model, we extend the nn.Module class:

In [2]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, padding=1)              # input - 28 x 28 x 1   | Output - 28 x 28 x 32  | Receptive Field - 3 x 3
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)             # input - 28 x 28 x 32  | Output - 28 x 28 x 64  | Receptive Field - 5 x 5
        self.pool1 = nn.MaxPool2d(2, 2)                          # input - 28 x 28 x 64  | Output - 14 x 14 x 64  | Receptive Field - 10 x 10  (for now we are considering that with max pooling, receptive field doubles - but this is not entirely correct)
        self.conv3 = nn.Conv2d(64, 128, 3, padding=1)            # input - 14 x 14 x 64  | Output - 14 x 14 x 128 | Receptive Field - 12 x 12
        self.conv4 = nn.Conv2d(128, 256, 3, padding=1)           # input - 14 x 14 x 128 | Output - 14 x 14 x 256 | Receptive Field - 14 x 14
        self.pool2 = nn.MaxPool2d(2, 2)                          # input - 14 x 14 x 256 | Output - 7 x 7 x 256   | Receptive Field - 28 x 28
        self.conv5 = nn.Conv2d(256, 512, 3)                      # input - 7 x 7 x 256   | Output - 5 x 5 x 512   | Receptive Field - 30 x 30
        self.conv6 = nn.Conv2d(512, 1024, 3)                     # input - 5 x 5 x 512   | Output - 3 x 3 x 1024  | Receptive Field - 32 x 32
        self.conv7 = nn.Conv2d(1024, 10, 3)                      # input - 3 x 3 x 1024  | Output - 1 x 1 x 10    | Receptive Field - 34 x 34

    def forward(self, x):

        # The first convolutional layer, self.conv1, is applied to the input tensor x and followed by a ReLU activation.
        # The result is passed to the second convolution, self.conv2, followed by a ReLU, and its output goes to max pooling.
        x = self.pool1(F.relu(self.conv2(F.relu(self.conv1(x))))) 

        # The output of the max pooling is passed through two more convolution + ReLU pairs, followed by another max pooling.
        # relu() and max pooling are pure operations: neither has learnable weights.
        x = self.pool2(F.relu(self.conv4(F.relu(self.conv3(x)))))

        # The output of the second max pooling is passed through another two convolution + ReLU pairs.
        x = F.relu(self.conv6(F.relu(self.conv5(x))))

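        # The output of the 6th convolution is passed to a final convolution followed by a ReLU.
        # Note: this ReLU clamps negative class scores to zero before log_softmax, which may limit accuracy;
        # the activation is often omitted on the last layer.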
        x = F.relu(self.conv7(x))

        # Flatten the 1 x 1 x 10 output into shape (batch_size, 10).
        x = x.view(-1, 10)

        # Inside the network we usually use relu() as the non-linear activation, but for the output layer, when predicting a single category, we use softmax.
        # Softmax returns a positive probability for each class, and the values sum to 1. log_softmax returns the logarithm of
        # these probabilities, which is the input that F.nll_loss expects.

        return F.log_softmax(x, dim=1)  # dim=1 normalizes over the 10 class scores of each sample
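
The output sizes in the layer comments can be cross-checked with the usual recurrence `out = (in + 2 * padding - kernel) // stride + 1`. The same loop can track the receptive field with `rf = rf + (kernel - 1) * jump`, where `jump` is the product of the strides so far. The sketch below is a rough check under those assumptions; the `layers` table and the name `trace_shapes` are introduced here for illustration only.

```python
# (name, kernel_size, stride, padding), mirroring the layers of Net above
layers = [
    ("conv1", 3, 1, 1),
    ("conv2", 3, 1, 1),
    ("pool1", 2, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("pool2", 2, 2, 0),
    ("conv5", 3, 1, 0),
    ("conv6", 3, 1, 0),
    ("conv7", 3, 1, 0),
]


def trace_shapes(input_size, layers):
    """Return (name, output_size, receptive_field) for each layer in order."""
    size, rf, jump = input_size, 1, 1
    rows = []
    for name, k, s, p in layers:
        size = (size + 2 * p - k) // s + 1   # spatial output size
        rf = rf + (k - 1) * jump             # receptive field grows by (k - 1) * jump
        jump = jump * s                      # distance between neighbouring outputs, in input pixels
        rows.append((name, size, rf))
    return rows


trace = trace_shapes(28, layers)
for name, size, rf in trace:
    print(f"{name}: output {size} x {size}, receptive field {rf} x {rf}")
```

Under this recurrence a 2 x 2, stride-2 max pooling adds one pixel to the receptive field and doubles the jump rather than doubling the receptive field itself, so the values come out as 6 after pool1 and 40 after conv7, compared with the simplified 10 and 34 in the layer comments.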

## Using a GPU
As the sizes of our models and datasets increase, we need to use GPUs to train our models within a reasonable amount of time. GPUs contain hundreds to thousands of cores that are optimized for performing expensive matrix operations on floating point numbers in a short time, which makes them ideal for training deep neural networks with many layers.

Reference - [Machine Learning on GPU](https://hsf-training.github.io/hsf-training-ml-gpu-webpage/)

In [3]:
#This command will return a boolean (True/False) based on the GPU availability.
use_cuda = torch.cuda.is_available()


# Find out the specifications of the GPU(s)
# There are a wide variety of GPUs available these days, so it’s often useful to check the specifications of the GPU(s) that are available to you. For example, the following lines of code will tell you
      # (i) which version of cuDNN is in use,
      # (ii) how many GPUs are available,
      # (iii) for a specific GPU (here 0) what kind of GPU it is, and
      # (iv) how much memory it has in total.

if use_cuda:
    print('__CUDNN VERSION:', torch.backends.cudnn.version())
    print('__Number CUDA Devices:', torch.cuda.device_count())
    print('__CUDA Device Name:',torch.cuda.get_device_name(0))
    print('__CUDA Device Total Memory [GB]:',torch.cuda.get_device_properties(0).total_memory/1e9)


#use the use_cuda flag to specify which device you want to use
# This will set the device to the GPU if one is available and to the CPU if there isn’t a GPU available. This means that you don’t need to hard code changes into your code to use one or the other.

device = torch.device("cuda" if use_cuda else "cpu")

__CUDNN VERSION: 7603
__Number CUDA Devices: 1
__CUDA Device Name: Tesla T4
__CUDA Device Total Memory [GB]: 15.843721216


In [4]:
# Install and import torchsummary, this helps in visualization of the model which is very helpful while debugging the network
!pip install torchsummary
from torchsummary import summary

# move the model to the chosen device
model = Net().to(device)

# prints the model summary
summary(model, input_size=(1, 28, 28))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 32, 28, 28]             320
            Conv2d-2           [-1, 64, 28, 28]          18,496
         MaxPool2d-3           [-1, 64, 14, 14]               0
            Conv2d-4          [-1, 128, 14, 14]          73,856
            Conv2d-5          [-1, 256, 14, 14]         295,168
         MaxPool2d-6            [-1, 256, 7, 7]               0
            Conv2d-7            [-1, 512, 5, 5]       1,180,160
            Conv2d-8           [-1, 1024, 3, 3]       4,719,616
            Conv2d-9             [-1, 10, 1, 1]          92,170
Total params: 6,379,786
Trainable params: 6,379,786
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 1.51
Params size (MB): 24.34
Estimated Total Size (MB): 25.85
-------------------------------------
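
The parameter counts in the summary can be reproduced by hand: each convolution has `kernel * kernel * in_channels` weights plus one bias per output channel. The sketch below redoes that arithmetic; the `conv_layers` table and the helper `conv_params` are written for this example, and the size estimate assumes 4-byte float32 parameters.

```python
# (in_channels, out_channels, kernel_size) for conv1..conv7 in Net
conv_layers = [
    (1, 32, 3), (32, 64, 3), (64, 128, 3), (128, 256, 3),
    (256, 512, 3), (512, 1024, 3), (1024, 10, 3),
]


def conv_params(in_ch, out_ch, k):
    # k * k * in_ch weights per output channel, plus one bias per output channel
    return (k * k * in_ch + 1) * out_ch


per_layer = [conv_params(*layer) for layer in conv_layers]
total = sum(per_layer)
size_mb = total * 4 / 1024 ** 2  # assumes 4-byte float32 parameters

print(per_layer)  # [320, 18496, 73856, 295168, 1180160, 4719616, 92170]
print(total, round(size_mb, 2))  # 6379786 24.34
```

Most of the parameters sit in conv5 and conv6, where both channel counts are large.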



## DataLoader

We download the data and create a PyTorch dataset using the MNIST class from torchvision.datasets and create PyTorch data loaders for training and testing.

In [5]:
#Set the torch seed for reproducibility
torch.manual_seed(1)

#set the batch_size
batch_size = 128

# Host to GPU copies are much faster when they originate from pinned (page-locked) memory. 
# For data loading, passing pin_memory=True to a DataLoader will automatically put the fetched data Tensors in pinned memory, and thus enables faster data transfer to CUDA-enabled GPUs.

kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}


#load train data and normalize the pixel values with mean and std computed on the MNIST training set
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                    transform=transforms.Compose([
                        transforms.ToTensor(),
                        transforms.Normalize((0.1307,), (0.3081,))
                    ])),
    batch_size=batch_size, shuffle=True, **kwargs)


#load test data and normalize the pixel values with mean and std computed on the MNIST training set

test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False, transform=transforms.Compose([
                        transforms.ToTensor(),
                        transforms.Normalize((0.1307,), (0.3081,))
                    ])),
    batch_size=batch_size, shuffle=True, **kwargs)
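
To see what the transform and the batch size imply numerically, here is a small sketch in plain Python. It assumes that `ToTensor` has already scaled pixels to the range [0, 1] and that the last, smaller batch is kept; the helper `normalize` is written for this example.

```python
import math

MEAN, STD = 0.1307, 0.3081  # the values passed to transforms.Normalize above


def normalize(pixel):
    # Normalize applies (x - mean) / std to every pixel
    return (pixel - MEAN) / STD


darkest = normalize(0.0)     # a black pixel
brightest = normalize(1.0)   # a white pixel
print(round(darkest, 4), round(brightest, 4))  # -0.4242 2.8215

# Batches per epoch for 60,000 training and 10,000 test images at batch size 128
train_batches = math.ceil(60000 / 128)
test_batches = math.ceil(10000 / 128)
print(train_batches, test_batches)  # 469 79
```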


## Model Training

The steps for training each batch are:

- clear the gradients of all optimized parameters
- forward pass: compute predicted outputs by passing inputs to the model
- calculate the loss
- backward pass: compute the gradient of the loss with respect to the model parameters
- perform a single optimization step (parameter update)
- report the current batch loss on the progress bar
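
The steps above can be sketched without PyTorch on a toy problem with one parameter. This is only an illustration under stated assumptions: the data, the squared-error loss, and the hand-written gradient are made up for the example, and the update uses the common momentum form `velocity = momentum * velocity + grad`.

```python
# Toy data generated by y = 2 * x, so the ideal weight is 2.0
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w = 0.0          # the single trainable parameter
velocity = 0.0   # momentum buffer
lr, momentum = 0.01, 0.9
losses = []

for step in range(300):
    grad = 0.0                                                     # clear the gradient
    preds = [w * x for x in xs]                                    # forward pass
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)  # mean squared error
    # backward pass: d(loss)/dw, derived by hand for this loss
    grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
    velocity = momentum * velocity + grad                          # accumulate momentum
    w -= lr * velocity                                             # optimization step
    losses.append(loss)

print(round(w, 3), losses[0], losses[-1])
```

In the notebook's `train` function the same roles are played by `optimizer.zero_grad()`, `model(data)`, `F.nll_loss`, `loss.backward()`, and `optimizer.step()`.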

### Test the trained network

Finally, we test the trained model on previously unseen test data and evaluate its performance. Testing on unseen data is a good way to check whether the model generalizes well. It may also be useful to be more granular and look at how the model performs on each class, in addition to its overall loss and accuracy.
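
One possible way to get a per-class view is to count correct predictions separately for each label. The sketch below works on plain Python lists; the function `per_class_accuracy` and the tiny 3-class example are hypothetical and are not part of the notebook's `test` function.

```python
def per_class_accuracy(targets, predictions, num_classes=10):
    """Return a list with the fraction of correct predictions for each class."""
    correct = [0] * num_classes
    total = [0] * num_classes
    for target, pred in zip(targets, predictions):
        total[target] += 1
        if pred == target:
            correct[target] += 1
    # None marks classes that never appear in `targets`
    return [c / t if t else None for c, t in zip(correct, total)]


# Hypothetical labels and predictions for a tiny 3-class problem
targets = [0, 0, 1, 1, 1, 2]
predictions = [0, 1, 1, 1, 0, 2]

accuracy = per_class_accuracy(targets, predictions, num_classes=3)
for label, acc in enumerate(accuracy):
    print(f"class {label}: {acc:.2f}")
```

With a real model, `targets` and `predictions` could be collected batch by batch inside the evaluation loop and passed to a function like this at the end.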

In [6]:
from tqdm import tqdm

# Train the model for one epoch, showing the current batch loss on a tqdm progress bar
def train(model, device, train_loader, optimizer, epoch):
    model.train()
    pbar = tqdm(train_loader)
    for batch_idx, (data, target) in enumerate(pbar):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        pbar.set_description(desc= f'loss={loss.item()} batch_id={batch_idx}')


def test(model, device, test_loader):

    # Sets all layers to evaluation mode. For example, dropout layers turn nodes "off" with some probability during training but keep every node on during evaluation.
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
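
Both functions rely on the pairing of log-softmax outputs with the negative log-likelihood loss, and `test` picks the prediction with an argmax. A minimal pure-Python sketch of those three pieces for a single sample follows; the helper names mirror the PyTorch ones but are re-implemented here for illustration, and the `scores` are made up.

```python
import math


def log_softmax(scores):
    # Subtract the max before exponentiating for numerical stability
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]


def nll_loss(log_probs, target):
    # Negative log-probability assigned to the correct class
    return -log_probs[target]


scores = [2.0, 1.0, 0.1]  # hypothetical raw outputs for 3 classes
log_probs = log_softmax(scores)
probs = [math.exp(lp) for lp in log_probs]

pred = max(range(len(log_probs)), key=lambda i: log_probs[i])  # argmax over classes
loss = nll_loss(log_probs, target=0)

print(round(sum(probs), 6), pred, round(loss, 3))  # 1.0 0 0.417
```

The exponentiated outputs sum to 1, and the loss shrinks toward 0 as the probability of the correct class approaches 1.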

In [7]:

model = Net().to(device)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

for epoch in range(1, 2):
    train(model, device, train_loader, optimizer, epoch)
    test(model, device, test_loader)

loss=1.976784110069275 batch_id=468: 100%|██████████| 469/469 [00:19<00:00, 24.27it/s]



Test set: Average loss: 1.8792, Accuracy: 2924/10000 (29%)

