# <font color='Blue'> **The required packages**</font>



> **Torch package**
*   The torch package contains data structures for multi-dimensional tensors and defines mathematical operations over these tensors.

> **Torch cuda**
* It is used to set up and run CUDA operations. It keeps track of the currently selected GPU, and all CUDA tensors you allocate will by default be created on that device.

>**Torch utils**
*  torch.utils is the PyTorch subpackage that collects helper modules. The one used here is torch.utils.data, which provides the Dataset and DataLoader classes.

> **Torch nn**
*  This contains different classes that help to build neural network models.

> **Torch functional**
*  torch.nn.functional provides the stateless, function-style versions of layers, activations, and losses (for example relu, log_softmax, and nll_loss), which are called directly inside a model's forward pass.


> **Torch optim**
*   torch.optim is a package implementing various optimization algorithms.

> **Torchvision**

*   Torchvision provides additional functionality to manipulate and process images with standard image processing algorithms. It also ships computer vision models and datasets.


1.   **datasets:**

        It has common datasets like MNIST (Modified National Institute of Standards and Technology), CIFAR10, ImageNet, etc.

2.  **transforms**

       Torchvision supports common computer vision transformations in the torchvision.transforms and torchvision.transforms.v2 modules. Transforms can be used to transform or augment data for training or inference of different tasks (image classification, detection, segmentation, video classification).


> **Torch Summary**

*  torchsummary prints a Keras-like summary of a PyTorch model: each layer's output shape and parameter count, plus the trainable and non-trainable totals.


In [3]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
# !pip install torchsummary
from torchsummary import summary

**Torch Cuda**

`It keeps track of the currently selected GPU, and all CUDA tensors you allocate will by default be created on that device.`

In [4]:
# !pip install torch
#!pip install numpy
#!pip install torchsummary
#!pip install torchvision
#
# !pip3 install --upgrade pip

In [5]:
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
device

device(type='cpu')
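
As a minimal sketch of what the selected device means in practice, the snippet below places tensors on whichever device was chosen. The tensor names `x`, `y`, and `z` are illustrative and not part of the original notebook.

```python
import torch

# Pick the GPU when one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A tensor created on the CPU can be moved with .to(device) ...
x = torch.ones(2, 3).to(device)
# ... or allocated on the target device directly.
y = torch.zeros(2, 3, device=device)

# Both operands must live on the same device for the addition to work.
z = x + y
print(z.device.type == device.type)
```

The same `.to(device)` call is what moves the model and each batch in the training loop further down.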

# <font color='Blue'>  **What Does a PyTorch DataLoader Do?** </font>

The PyTorch DataLoader class helps to prepare, manage, and serve data to deep learning networks. Because many pre-processing steps are needed before training a model, standardizing them is critical for the readability and maintainability of your code.

The DataLoader steps used below are:

**Define a dataset to work with:** identify where the data comes from and how it should be accessed.
The dataset used here is MNIST.

**Batch the data:**
 Define how many training or testing samples to use in a single iteration. Because the training and testing sets are often large, working with batches makes the training and testing processes more manageable.

**Shuffle the data:** PyTorch can shuffle the data for us as it loads it into batches. Shuffling makes each batch more representative of the whole dataset and prevents skew caused by the order in which the samples are stored.

**Transforms:**

In torchvision, a transform is a callable that is applied to each sample as it is read from the dataset, before the DataLoader collects the samples into a batch.

Several transforms can be chained with transforms.Compose. For example, ToTensor followed by Normalize converts an image to a tensor and then normalizes it using a mean and standard deviation.
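
The batching and shuffling behaviour described above can be sketched on a toy dataset. Everything here (`features`, `labels`, `toy_loader`, and the sizes) is a made-up stand-in for MNIST so that the example runs without a download.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data: 10 samples with 4 features each, labelled 0..9.
features = torch.arange(40, dtype=torch.float32).reshape(10, 4)
labels = torch.arange(10)
toy_dataset = TensorDataset(features, labels)

# batch_size sets the samples per iteration; shuffle=True reorders them every epoch.
toy_loader = DataLoader(toy_dataset, batch_size=4, shuffle=True)

batch_shapes = []
seen_labels = []
for batch_features, batch_labels in toy_loader:
    batch_shapes.append(tuple(batch_features.shape))
    seen_labels.extend(batch_labels.tolist())

print(batch_shapes)         # the last batch is smaller because 10 = 4 + 4 + 2
print(sorted(seen_labels))  # every sample is served exactly once per epoch
```

Passing `drop_last=True` to the DataLoader would discard the final, smaller batch instead of serving it.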



# **Transforms**
Data transformation is also known as data preparation or data preprocessing. It makes sure that your data is clean and in the format that your machine learning algorithm expects. Without it, a model trains less reliably and its predictions tend to be less accurate.

**To Tensor**

This is a very commonly used conversion transform. In PyTorch, we mostly work with data in the form of tensors. If the input data is in the form of a NumPy array or PIL image, we can convert it into a tensor format using ToTensor.

Tensor images are expected to be of shape (C, H, W), where C is the number of channels, and H and W refer to height and width. Most transforms support batched tensor input. A batch of tensor images is a tensor of shape (N, C, H, W), where N is the number of images in the batch. The v2 transforms generally accept an arbitrary number of leading dimensions (..., C, H, W) and can handle batched images or batched videos.

**Normalization**

The goal of normalization is to transform features to be on a similar scale. This improves the performance and training stability of the model.
The original pixel values range from 0 to 255. ToTensor divides them by 255 to bring them into the range 0.0 to 1.0, and Normalize then subtracts the mean and divides by the standard deviation. The benefit of normalizing the input data is that it avoids large gradient values that could make the training process difficult.

In [21]:
batch_size = 128
# ToTensor converts the PIL Image to a torch.Tensor, and Normalize subtracts the
# mean and divides by the standard deviation you provide.

train_dataset = datasets.MNIST(
    '../data', train=True, download=True,
    transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,)),
    ]))
train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=batch_size, shuffle=True)

test_dataset = datasets.MNIST(
    '../data', train=False,
    transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,)),
    ]))
test_loader = torch.utils.data.DataLoader(
    test_dataset, batch_size=batch_size, shuffle=True)

In [13]:
# nn.Conv2d(1,32,3,padding=1)
#torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)

# Some Notes on our naive model

We are going to write a network based on what we have learnt so far.

The size of the input image is 28x28x1. We are going to add as many layers as required to reach a receptive field (RF) of at least 32.
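
The receptive field target can be checked with a small helper before writing the network. This is only a sketch: `trace_layers` and the `layers` list are illustrative names, and the (kernel, stride, padding) values are assumed to match the nine layers of the model defined later in this notebook.

```python
def trace_layers(layers, n_in=28):
    """Track receptive field r, output size n, and jump j through a layer stack."""
    r, j, n = 1, 1, n_in
    rows = []
    for name, k, s, p in layers:
        r = r + (k - 1) * j           # r_out = r_in + (k - 1) * j_in
        n = (n + 2 * p - k) // s + 1  # n_out = (n_in + 2p - k) / s + 1
        j = j * s                     # j_out = j_in * s
        rows.append((name, r, n, j))
    return rows


# (name, kernel, stride, padding) for each layer; assumed to mirror the model below.
layers = [
    ("conv1", 3, 1, 1), ("conv2", 3, 1, 1), ("pool1", 2, 2, 0),
    ("conv3", 3, 1, 1), ("conv4", 3, 1, 1), ("pool2", 2, 2, 0),
    ("conv5", 3, 1, 0), ("conv6", 3, 1, 0), ("conv7", 3, 1, 0),
]

rows = trace_layers(layers)
for name, r, n, j in rows:
    print(f"{name}: r_out={r}, n_out={n}, j_out={j}")
```

With these settings the receptive field first reaches 32 at conv6 and ends at 40 after conv7, where the spatial size has shrunk to 1x1.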

In [22]:


train_transform = transforms.Compose([transforms.ToTensor()])
train_set = datasets.MNIST('../data', train=True, download=True, transform=train_transform)
# .data holds the raw uint8 images (0-255); the older .train_data alias is deprecated.
print(train_set.data.shape)
print(train_set.data.float().mean())
print(train_dataset.data.float().mean())
# Dividing by 255 gives the statistics on the 0-1 scale produced by ToTensor.
print(train_set.data.float().mean()/255)
print(train_set.data.float().std()/255)



torch.Size([60000, 28, 28])
tensor(33.3184)
tensor(33.3184)
tensor(0.1307)
tensor(0.3081)


#### **Input layer:**
 The input layer has nothing to learn; at its core, it just provides the input image's shape. So there are no learnable parameters here, and the number of parameters = 0.
#### **CONV layer:** 
This is where the CNN learns, so it certainly has weight matrices. To calculate the learnable parameters, multiply the filter width m, the filter height n, and the number of filters d in the previous layer, add 1 for the bias term, and multiply by the number of filters k in the current layer. The number of parameters in a CONV layer is therefore ((m * n * d) + 1) * k. The same expression can be written as: ((filter width * filter height * number of filters in the previous layer) + 1) * number of filters in the current layer.
#### **POOL layer:** 
This has no learnable parameters because all it does is compute a fixed function (such as the maximum) over each window, so there is nothing for backprop to update. Thus number of parameters = 0.
#### **Fully Connected Layer (FC):**
 This certainly has learnable parameters. In fact, this category of layers usually has the highest number of parameters, because every neuron is connected to every neuron in the previous layer. The number of weights is the product of the number of neurons in the current layer c and the number of neurons in the previous layer p, plus one bias term per neuron in the current layer. Thus the number of parameters is: (c * p) + c.
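
The formulas above can be checked against PyTorch's own parameter counts. The helper names `conv_params`, `fc_params`, and `count_params` are made up for this sketch, and the 128-to-10 linear layer is a hypothetical example that does not appear in the model below.

```python
import torch.nn as nn


def conv_params(m, n, d, k):
    """((m * n * d) + 1) * k: kernel weights plus one bias for each of k filters."""
    return (m * n * d + 1) * k


def fc_params(p, c):
    """(c * p) + c: one weight per connection plus one bias per output neuron."""
    return c * p + c


def count_params(module):
    """Number of learnable values that PyTorch registers for a module."""
    return sum(t.numel() for t in module.parameters())


print(conv_params(3, 3, 1, 32), count_params(nn.Conv2d(1, 32, 3)))
print(conv_params(3, 3, 32, 64), count_params(nn.Conv2d(32, 64, 3)))
print(fc_params(128, 10), count_params(nn.Linear(128, 10)))
print(count_params(nn.MaxPool2d(2, 2)))
```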

In [32]:
class FirstDNN(nn.Module): 
  def __init__(self):
    super(FirstDNN, self).__init__()  # Call the parent class (nn.Module) __init__
    
     ######################### LAYER 1 conv 1 #############################################################################
     #torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)
    
    #  r_in:1, n_in:28, j_in:1, s:1, r_out:3, n_out:28, j_out:1, k:3 , p =1
    #  r_out = r_in + (k  - 1) * j_in = 1 + 2 * 1 = 3
    #  n_out = (n_in + 2p - K) / s + 1 = (28 + 2 - 3 ) / 1 + 1 = 28
    #  j_out = j_in * s = 1 * 1 = 1
     # No of parameters = (k * k *previous input layer + 1 ) * no of filters = (9 + 1) * 32 = 320
    # r_in:1, n_in:28, j_in:1, s:1, r_out:3, n_out:28, j_out:1


    self.conv1 = nn.Conv2d(1, 32, 3, padding=1)# Edges

    ######################### LAYER 2 Conv 2 #############################################################################
    #  r_in:3 , n_in:28 , j_in:1 , s:1 , r_out: 5, n_out:28 , j_out:1 , k = 3, p =1
    #  r_out = r_in + (k  - 1) * j_in = 3 + 2 *1 = 5
    #  n_out = (n_in + 2p - K) / s + 1 = (28 + 2 - 3 ) / 1 + 1 = 28
    #  j_out = j_in * s = 1 * 1 = 1
    # No of parameters = (k * k *previous input layer + 1 ) * no of filters = (9 *32 + 1) * 64 = 18496
    # r_in: 3, n_in:28 , j_in:1 , s:1 , r_out: 5 , n_out:28 , j_out:1



    self.conv2 = nn.Conv2d(32, 64, 3, padding=1)#Textures


    ######################### LAYER 3 Max #############################################################################
    #  r_in: 5, n_in:28 , j_in:1 , s: 2, r_out: 6, n_out:14 , j_out: 2 k = 2 , p =0
    #  r_out = r_in + (k  - 1) * j_in = 5 + 1* 1 = 6
    #  n_out = (n_in + 2p - K) / s + 1 = (28 + 0 - 2 ) / 2 + 1 = 14
    #  j_out = j_in * s = 1 * 2 = 2
     # No of parameters = 0 (because it is pooling layer)


    # r_in:5 , n_in: 28, j_in: 1, s:2 , r_out:6 , n_out: 14, j_out:2
    self.pool1 = nn.MaxPool2d(2, 2)  # Pooling has no weights, so it adds no parameters



    ######################### LAYER 4 Conv3 #############################################################################
    # r_in:6 , n_in:14 , j_in:2 , s: 1, r_out: 10, n_out: 14, j_out:2 k = 3, p =1
    #  r_out = r_in + (k  - 1) * j_in = 6+ 2*2 = 10
    #  n_out = (n_in + 2p - k) / s + 1 = (14 +2 *1 - 3 ) / 1 + 1 = 13 + 1 = 14
    #  j_out = j_in * s = 2 * 1 = 2
    # No of parameters = (k * k * previous input layer + 1 ) * no of filters = (9 *64 + 1) * 128 = 73,856

   # r_in:6 , n_in: 14, j_in: 2, s:1 , r_out: 10, n_out:14 , j_out:2
    self.conv3 = nn.Conv2d(64, 128, 3, padding=1)


 ######################### LAYER 5 conv 4#############################################################################
    # r_in:10 , n_in:14 , j_in:2 , s: 1, r_out:14 , n_out: 14, j_out:2 k = 3, p =1
    #  r_out = r_in + (k  - 1) * j_in = 10 +2 * 2 = 14
    #  n_out = (n_in + 2p - K) / s + 1 = (14 +2 -3)/ 1 +1 = 14
    #  j_out = j_in * s = 2 *1 = 2
    # No of parameters = (k * k * previous input layer + 1 ) * no of filters = (9 *128 + 1) * 256 = 295,168


    # # r_in:10 , n_in:14 , j_in:2 , s:1 , r_out:14 , n_out:14 , j_out:2
    self.conv4 = nn.Conv2d(128, 256, 3, padding = 1)



    ######################### LAYER 6 Max pool #############################################################################
    # r_in: 14, n_in: 14, j_in:2 , s:2 , r_out:16, n_out:7 , j_out:4 k = 2 , p =0
    #  r_out = r_in + (k  - 1) * j_in = 14 + (1 * 2) = 16
    #  n_out = (n_in + 2p - K) / s + 1 = (14 + 0 - 2 )/ 2 + 1 = 7
    #  j_out = j_in * s = 2 * 2 = 4
    # No of parameters = 0



    # # r_in:14 , n_in:14 , j_in:2 , s:2 , r_out: 16, n_out:7 , j_out:4
    self.pool2 = nn.MaxPool2d(2, 2)


     ######################### LAYER 7 conv 5 #############################################################################
    # r_in: 16, n_in:7 , j_in: 4, s: 1, r_out:24 , n_out: 5, j_out:4 k = 3, p = 0
    #  r_out = r_in + (k  - 1) * j_in = 16 +2 * 4 = 24
    #  n_out = (n_in + 2p - K) / s + 1 = (7 + 0- 3)/1 + 1 = 5
    #  j_out = j_in * s = 4*1 = 4
    # No of parameters = (k * k * previous input layer + 1 ) * no of filters = (9 *256 + 1) * 512 = 1,180,160


    self.conv5 = nn.Conv2d(256, 512, 3)
     ######################### LAYER 8 conv 6 #############################################################################
    # r_in:24 , n_in: 5, j_in:4 , s:1 , r_out:32 , n_out: 3, j_out:4 , k = 3 p =0
    #  r_out = r_in + (k  - 1) * j_in = 24 + 2* 4 = 32
    #  n_out = (n_in + 2p - K) / s + 1 == (5 + 0- 3)/ 1 + 1 = 3
    #  j_out = j_in * s = 4 * 1 = 4
    # No of parameters = (k * k * previous input layer + 1 ) * no of filters = (9 *512 + 1) * 1024 = 4,719,616

    self.conv6 = nn.Conv2d(512, 1024, 3)
     ######################### LAYER 9 conv 7 #############################################################################
    # r_in: 32, n_in: 3, j_in:4 , s:1 , r_out: 40, n_out:1 , j_out:4, k = 3 ,p =0
    #  r_out = r_in + (k  - 1) * j_in = 32 +2 * 4 = 40
    #  n_out = (n_in + 2p - K) / s + 1 = (3 + 0- 3 ) / 1 +1 = 1
    #  j_out = j_in * s = 4 * 1 = 4
    # No of parameters = (k * k * previous input layer + 1 ) * no of filters = (9 *1024 + 1) * 10 = 92,170

    
    self.conv7 = nn.Conv2d(1024, 10, 3)
# Correct values
# https://user-images.githubusercontent.com/498461/238034116-7db4cec0-7738-42df-8b67-afa971428d39.png
  def forward(self, x):
    x = self.pool1(F.relu(self.conv2(F.relu(self.conv1(x)))))
    x = self.pool2(F.relu(self.conv4(F.relu(self.conv3(x)))))
    x = F.relu(self.conv6(F.relu(self.conv5(x))))

    x = self.conv7(x)
    #x = F.relu(x) # this is the last step. Think what ReLU does to our results at this stage!
    x = x.view(-1, 10)
    return F.log_softmax(x, dim=1)  # dim=1 normalizes over the 10 class scores of each sample
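
The network returns log-probabilities, which is the form that `F.nll_loss` expects in the training loop. As a sketch of how the two fit together, the snippet below uses random stand-in logits (`logits` and `target` are hypothetical, not model outputs) to show that `log_softmax` followed by `nll_loss` gives the same value as `F.cross_entropy` applied to the raw scores.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)          # hypothetical batch: 4 samples, 10 class scores
target = torch.tensor([3, 0, 9, 1])  # hypothetical labels

# dim=1 normalizes across the class scores of each sample.
log_probs = F.log_softmax(logits, dim=1)

loss_from_log_probs = F.nll_loss(log_probs, target)
loss_from_logits = F.cross_entropy(logits, target)

print(torch.allclose(loss_from_log_probs, loss_from_logits))
print(log_probs.exp().sum(dim=1))  # each row of probabilities sums to 1
```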


In [33]:
model = FirstDNN().to(device)

In [34]:
summary(model, input_size=(1, 28, 28))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 32, 28, 28]             320
            Conv2d-2           [-1, 64, 28, 28]          18,496
         MaxPool2d-3           [-1, 64, 14, 14]               0
            Conv2d-4          [-1, 128, 14, 14]          73,856
            Conv2d-5          [-1, 256, 14, 14]         295,168
         MaxPool2d-6            [-1, 256, 7, 7]               0
            Conv2d-7            [-1, 512, 5, 5]       1,180,160
            Conv2d-8           [-1, 1024, 3, 3]       4,719,616
            Conv2d-9             [-1, 10, 1, 1]          92,170
Total params: 6,379,786
Trainable params: 6,379,786
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 1.51
Params size (MB): 24.34
Estimated Total Size (MB): 25.85
-------------------------------------



In [19]:
# !pip install tqdm

In [35]:
from tqdm import tqdm
def train(model, device, train_loader, optimizer, epoch):
    model.train()
    pbar = tqdm(train_loader)
    for batch_idx, (data, target) in enumerate(pbar):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        pbar.set_description(desc= f'loss={loss.item()} batch_id={batch_idx}')


def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
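
The accuracy bookkeeping in `test` can be traced on a tiny hand-made batch. The probabilities and labels below are invented for illustration.

```python
import torch

# Hypothetical log-probabilities for 3 samples over 3 classes.
log_probs = torch.log(torch.tensor([[0.7, 0.2, 0.1],
                                    [0.1, 0.8, 0.1],
                                    [0.3, 0.3, 0.4]]))
target = torch.tensor([0, 1, 0])

# argmax over dim=1 picks the most likely class; keepdim=True keeps shape (3, 1).
pred = log_probs.argmax(dim=1, keepdim=True)

# view_as reshapes target to (3, 1) so eq compares element by element.
correct = pred.eq(target.view_as(pred)).sum().item()
print(f"{correct}/{len(target)} correct")
```

Here the third sample is predicted as class 2 while its label is 0, so two of the three predictions count as correct.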

In [36]:
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

for epoch in range(1, 2):
    train(model, device, train_loader, optimizer, epoch)
    test(model, device, test_loader)

loss=0.06854268163442612 batch_id=468: 100%|██████████| 469/469 [08:07<00:00,  1.04s/it] 



Test set: Average loss: 0.0562, Accuracy: 9821/10000 (98%)

