<a href="https://colab.research.google.com/github/anubhavsatpathy/EVA7/blob/main/Session1/EVA7_Session_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Importing the required Libraries

Code is modular in nature. The bulk of the code we need to create, train and test neural nets is encapsulated within libraries and frameworks. There are many such libraries, including *TensorFlow*, *Deeplearning4j* and *PyTorch*.

**PyTorch** has become a de facto standard framework for much cutting-edge research in AI. It was originally developed by a team of engineers at Facebook and is now widely adopted in the AI community because:


*   **More Pythonic and transparent** : PyTorch interfaces well with Python number-crunching libraries such as *numpy*. Its autograd module makes things very transparent, unlike graph-mode TensorFlow, where most of the compute is abstracted away behind the graph.
*   **Dynamic graphs** : Unlike TensorFlow's classic static graphs, the computational graph in PyTorch is built at runtime and can change from one iteration to the next. This makes many research experiments easier, hence the popularity of the framework among researchers.

OK then, let's focus on the libraries we import below:



*   *torch* : The torch package contains data structures for multi-dimensional tensors and defines mathematical operations over these tensors. It also provides utilities for efficient serialization of tensors and arbitrary types, along with other useful helpers.
*   *torch.nn as nn* : This contains the basic building blocks of neural nets, for example layers such as *Convolution layers*, *Linear layers* and *Activation layers*. It also defines containers like *Module*, which all our models will subclass.
*   *torch.nn.functional* : While *torch.nn* defines layers as classes, *torch.nn.functional* defines them as functions. The functional forms are useful when we want to pass weights in explicitly, for example on every epoch or iteration, which gives more control and transparency.
*   *torch.optim* : This package contains the optimizers we may want to use to train our neural networks, ranging from basic optimizers like SGD to more advanced ones like Adam.
*   *torchvision* : This package contains commonly used datasets, data transformations and model architectures for computer vision.





In [2]:
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

#Defining our Model : 
In the following block of code, we define our model. The salient points to note are the following:



*   **nn.Module** : We inherit from nn.Module, the base class for all models defined in PyTorch. It provides handy utilities such as registering and retrieving parameters, and it declares the forward function that we need to override.
*   **nn.Conv2d** : This defines a 2D convolution layer. The parameters used here are (number of input channels, number of output channels, kernel size, padding). It also accepts parameters such as dilation that we will cover in later units. One important thing to note is that the bias parameter is True by default.
*   **nn.MaxPool2d** : This defines the max pooling layer. We use max pooling to reduce the spatial size of our channels and to grow the receptive field faster.
*   **How does our model look** : Our model defines 3 convolution blocks. The first two blocks end with pooling operations and the third block reduces the spatial size of each channel to 1x1. The global receptive field (GRF) of the last convolution layer is given as 34x34 in the code comments, which count each pooling layer as doubling the RF; it is larger than the 28x28 input because four of the convolutions pad their input.
*  **The forward function** : The *forward()* function uses the layers defined in the constructor to compute the output of the network and return it to the caller. As you can see, we apply the ReLU activation function here and finally return the log-softmax of the output. We also reshape the tensor to (-1, num_classes) before applying the log-softmax function.

The input and output sizes mentioned below intentionally ignore the batch dimension.
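
The shapes and receptive fields annotated in the model code can be traced with a few lines of arithmetic. The sketch below is an illustration rather than part of the original notebook: the names `LAYERS` and `trace` are hypothetical, and it assumes the commonly used recurrences `out = (in + 2 * padding - kernel) // stride + 1` and `rf_out = rf_in + (kernel - 1) * jump`. Under this recurrence a 2x2 pooling layer adds one jump to the RF instead of doubling it, so the RF figures it produces differ from the ones in the code comments (it gives 40x40 for the last layer).

```python
# (name, kernel, stride, padding) for each layer of the model described above
LAYERS = [
    ("conv1", 3, 1, 1), ("conv2", 3, 1, 1), ("pool1", 2, 2, 0),
    ("conv3", 3, 1, 1), ("conv4", 3, 1, 1), ("pool2", 2, 2, 0),
    ("conv5", 3, 1, 0), ("conv6", 3, 1, 0), ("conv7", 3, 1, 0),
]


def trace(size, layers):
    """Return (name, output size, receptive field) for every layer."""
    rf, jump = 1, 1  # RF of an input pixel is 1; jump is the step between outputs in input pixels
    rows = []
    for name, kernel, stride, padding in layers:
        size = (size + 2 * padding - kernel) // stride + 1
        rf = rf + (kernel - 1) * jump
        jump = jump * stride
        rows.append((name, size, rf))
    return rows


rows = trace(28, LAYERS)
for name, size, rf in rows:
    print(f"{name}: output {size}x{size}, RF {rf}x{rf}")
```

The output sizes agree with the shapes in the code comments (28, 28, 14, 14, 14, 7, 5, 3, 1); only the RF column depends on which pooling rule is assumed.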



In [3]:
# All shapes are in the format (height,width,channels)
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, padding=1) #input : (28,28,1) output : (28,28,32) RF : (3x3)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1) #input : (28,28,32) output : (28,28,64) RF : (5x5)
        self.pool1 = nn.MaxPool2d(2, 2) #input : (28,28,64) output : (14,14,64) RF : (10x10)
        self.conv3 = nn.Conv2d(64, 128, 3, padding=1) #input : (14,14,64) output : (14,14,128) RF : (12x12)
        self.conv4 = nn.Conv2d(128, 256, 3, padding=1) #input : (14,14,128) output : (14,14,256) RF : (14x14)
        self.pool2 = nn.MaxPool2d(2, 2) #input : (14,14,256) output : (7,7,256) RF : (28x28)
        self.conv5 = nn.Conv2d(256, 512, 3) #input : (7,7,256) output : (5,5,512) RF : (30x30)
        self.conv6 = nn.Conv2d(512, 1024, 3) #input : (5,5,512) output : (3,3,1024) RF : (32x32)
        self.conv7 = nn.Conv2d(1024, 10, 3) #input : (3,3,1024) output : (1,1,10) RF : (34x34)

    def forward(self, x):
        x = self.pool1(F.relu(self.conv2(F.relu(self.conv1(x)))))
        x = self.pool2(F.relu(self.conv4(F.relu(self.conv3(x)))))
        x = F.relu(self.conv6(F.relu(self.conv5(x))))
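        x = F.relu(self.conv7(x)) # note: this ReLU clamps negative class scores to zero before log_softmax
        x = x.view(-1, 10) # reshape (batch, 10, 1, 1) to (batch, 10)
        return F.log_softmax(x, dim=1) # dim=1 normalizes over the 10 classes; omitting dim is deprecated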

#Printing the summary of the model : 



*   **What is torchsummary** : It is a community-contributed package that, given a model and an input size, prints the layers, output sizes and number of parameters in each layer. This is a very useful check before spending time and resources on training the network.
*   **What is CUDA** : CUDA is our interface with the GPU. It is NVIDIA's parallel computing platform, which makes using a GPU for general-purpose computing very seamless.




In [4]:
!pip install torchsummary
from torchsummary import summary
use_cuda = torch.cuda.is_available()  #Checks if CUDA is available in the runtime
device = torch.device("cuda" if use_cuda else "cpu") # Sets device to cuda if GPU is available else to cpu - device is where the params of the model live
model = Net().to(device) # Moves the parameters to the device as specified in the code above
summary(model, input_size=(1, 28, 28)) # prints the summary as explained in the text snippet above this block

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 32, 28, 28]             320
            Conv2d-2           [-1, 64, 28, 28]          18,496
         MaxPool2d-3           [-1, 64, 14, 14]               0
            Conv2d-4          [-1, 128, 14, 14]          73,856
            Conv2d-5          [-1, 256, 14, 14]         295,168
         MaxPool2d-6            [-1, 256, 7, 7]               0
            Conv2d-7            [-1, 512, 5, 5]       1,180,160
            Conv2d-8           [-1, 1024, 3, 3]       4,719,616
            Conv2d-9             [-1, 10, 1, 1]          92,170
Total params: 6,379,786
Trainable params: 6,379,786
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 1.51
Params size (MB): 24.34
Estimated Total Size (MB): 25.85
-------------------------------------
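
The parameter counts in the summary can be reproduced by hand. As a minimal sketch (the helper name `conv2d_params` is hypothetical), a 2D convolution with a k x k kernel and a bias has `(k * k * in_channels + 1) * out_channels` parameters; the size estimate assumes 4-byte float32 values.

```python
def conv2d_params(in_channels, out_channels, kernel, bias=True):
    """Number of learnable parameters in a 2D convolution layer."""
    per_filter = kernel * kernel * in_channels + (1 if bias else 0)
    return per_filter * out_channels


# (in_channels, out_channels) of the seven 3x3 convolutions in the model
CHANNELS = [(1, 32), (32, 64), (64, 128), (128, 256),
            (256, 512), (512, 1024), (1024, 10)]

counts = [conv2d_params(i, o, 3) for i, o in CHANNELS]
total = sum(counts)
size_mb = total * 4 / (1024 ** 2)  # assumes float32 parameters

print(counts)
print(total, round(size_mb, 2))
```

Most of the parameters sit in the last two wide layers (conv5 and conv6), which is worth keeping in mind when judging whether the network is oversized for 28x28 digits.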



#Datasets and DataLoaders : 


*   **Datasets** : A Dataset class wraps a dataset and returns one (sample, label) pair per index. *torchvision.datasets* ships ready-made classes for well-known computer vision datasets such as MNIST.
*   **DataLoader** : A DataLoader wraps a Dataset and serves it in batches, optionally shuffling the samples and using multiple workers to load them in parallel.
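
To make the batching concrete, here is a small pure-Python sketch of how many batches a loader produces per pass. The helper `batch_sizes` is hypothetical; it mirrors the behaviour of a loader that keeps the final partial batch (the `drop_last=False` default), with the MNIST split sizes of 60,000 training and 10,000 test images.

```python
def batch_sizes(num_samples, batch_size, drop_last=False):
    """Sizes of the batches produced by one pass over the dataset."""
    full, rest = divmod(num_samples, batch_size)
    sizes = [batch_size] * full
    if rest and not drop_last:
        sizes.append(rest)  # final, smaller batch
    return sizes


train_batches = batch_sizes(60000, 128)
test_batches = batch_sizes(10000, 128)

print(len(train_batches), train_batches[-1])
print(len(test_batches), test_batches[-1])
```

The 469 training batches match the `469/469` shown by the progress bar during training.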



In [5]:


torch.manual_seed(1) # sets the seed of random number generator to 1
batch_size = 128 # Sets the batch size to 128

kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {} # num_workers is the number of worker processes that load and transform data; pin_memory speeds up transferring the data onto the GPU
# In the code below we declare a DataLoader for our training data
# We pass it the MNIST dataset containing the digit images and labels
# We apply two transformations to each sample of our training data - we convert it to a tensor and normalize it using the mean and stdev of the dataset
# We shuffle the data every epoch
# We pass the batch size of 128
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                    transform=transforms.Compose([
                        transforms.ToTensor(),
                        transforms.Normalize((0.1307,), (0.3081,))
                    ])),
    batch_size=batch_size, shuffle=True, **kwargs)
# In the code below we declare a DataLoader for our test data
# We pass it the MNIST test split (train=False) containing the digit images and labels
# We apply the same two transformations to each sample of our test data - we convert it to a tensor and normalize it using the mean and stdev of the dataset
# shuffle=True reshuffles the data on every pass, although the order does not affect the test metrics
# We pass the batch size of 128
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False, transform=transforms.Compose([
                        transforms.ToTensor(),
                        transforms.Normalize((0.1307,), (0.3081,))
                    ])),
    batch_size=batch_size, shuffle=True, **kwargs)


Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ../data/MNIST/raw/train-images-idx3-ubyte.gz



Extracting ../data/MNIST/raw/train-images-idx3-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ../data/MNIST/raw/train-labels-idx1-ubyte.gz



Extracting ../data/MNIST/raw/train-labels-idx1-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ../data/MNIST/raw/t10k-images-idx3-ubyte.gz



Extracting ../data/MNIST/raw/t10k-images-idx3-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ../data/MNIST/raw/t10k-labels-idx1-ubyte.gz



Extracting ../data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ../data/MNIST/raw
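
The two transforms applied to every image amount to simple per-pixel arithmetic. The sketch below is illustrative only (the function names are hypothetical): it assumes that converting to a tensor rescales 8-bit pixel values from [0, 255] to [0, 1], and that normalization then computes `(value - mean) / std` with the mean 0.1307 and standard deviation 0.3081 used in the loaders.

```python
MEAN, STD = 0.1307, 0.3081  # values passed to transforms.Normalize in the loaders


def to_unit_range(pixel):
    """Rescale an 8-bit pixel value (0..255) to the range 0..1."""
    return pixel / 255.0


def normalize(value, mean=MEAN, std=STD):
    """Shift and scale a value so the dataset has roughly zero mean and unit variance."""
    return (value - mean) / std


black = normalize(to_unit_range(0))
white = normalize(to_unit_range(255))
print(round(black, 2), round(white, 2))
```

A black background pixel ends up slightly below zero and a fully white stroke pixel well above it, which centres the inputs around zero for the first convolution.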





#Defining the train and test methods : 



*   **What is tqdm** : tqdm is a general-purpose progress bar for Python programs and the command line.
*   **optimizer.zero_grad()** : This is important because, by default, gradients accumulate (are summed up) across iterations. Calling it clears the gradients between optimizer steps.
*  **torch.no_grad()** : Any computation performed inside a torch.no_grad() block is not tracked by autograd, so it is not added to the computation graph. This saves memory and time during evaluation.
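
Why accumulated gradients need clearing can be shown with a toy model in plain Python. This is not PyTorch code: `ToyParam`, `backward` and `zero_grad` are hypothetical stand-ins that only mimic the accumulate-by-default behaviour described above, using the loss `(value * x) ** 2`.

```python
class ToyParam:
    """Hypothetical stand-in for a parameter with an accumulating gradient buffer."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0


def backward(param, x):
    # d/d(value) of (value * x) ** 2 is 2 * value * x ** 2; it is ADDED to the buffer
    param.grad += 2 * param.value * x * x


def zero_grad(param):
    param.grad = 0.0


w = ToyParam(1.0)
backward(w, 1.0)   # contributes 2.0
backward(w, 2.0)   # contributes 8.0 on top of the stale 2.0
stale = w.grad     # sum of both batches

zero_grad(w)
backward(w, 2.0)
fresh = w.grad     # gradient of the second batch only

print(stale, fresh)
```

Without the reset, the update for the second batch would be driven by the sum of both gradients rather than by the current one.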



In [6]:
from tqdm import tqdm
def train(model, device, train_loader, optimizer, epoch):
    model.train() # Puts the model in train mode: layers such as dropout are active and batch norm uses batch statistics (in eval mode dropout is disabled and batch norm uses running statistics)
    pbar = tqdm(train_loader) # Wraps train_loader in a tqdm progress bar; iterating over pbar yields the same batches
    for batch_idx, (data, target) in enumerate(pbar): # enumerate adds the batch index; iterate through training batches, performing the following for each batch
        data, target = data.to(device), target.to(device) # Transfer the input and labels to the device (pinned memory, set earlier, speeds up copies to the GPU)
        optimizer.zero_grad() # Clear gradients of the params of the model
        output = model(data) # Run the forward function of the model to get the log-softmax class scores
        loss = F.nll_loss(output, target) # Calculates the negative log likelihood loss for the outputs given the labels
        loss.backward() # Calculates the gradients of the loss wrt the model params
        optimizer.step() # Updates the params using the gradients calculated above, scaled by the learning rate
        pbar.set_description(desc= f'loss={loss.item()} batch_id={batch_idx}') # Prints progress info


def test(model, device, test_loader):
    model.eval() # This puts the model in eval mode so as to deactivate layers like Dropout etc.
    test_loss = 0 # Initializes loss and correct preds
    correct = 0
    with torch.no_grad():
        for data, target in test_loader: # for each batch in test_loader do the following
            data, target = data.to(device), target.to(device) # Transfer the input and labels to the GPU
            output = model(data) # Run the model on the batch
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item() # Count predictions that equal the label; view_as reshapes target to the shape of pred

    test_loss /= len(test_loader.dataset) # Average the summed loss over all test samples
    #Print accuracy and avg loss per sample
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
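
The loss and accuracy bookkeeping in the test loop can be followed on a tiny example in plain Python. This is a sketch with made-up scores for three classes, not the notebook's data; `log_softmax` and `nll` are hypothetical helpers that follow the usual definitions (log of the softmax, and the negative log-probability of the true class).

```python
import math


def log_softmax(scores):
    """Log of the softmax, computed with the max subtracted for numerical stability."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]


def nll(log_probs, target):
    """Negative log likelihood of the true class."""
    return -log_probs[target]


batch_scores = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # made-up class scores
targets = [0, 2]

total_loss, correct = 0.0, 0
for scores, target in zip(batch_scores, targets):
    log_probs = log_softmax(scores)
    total_loss += nll(log_probs, target)                         # sum, like reduction='sum'
    pred = max(range(len(log_probs)), key=log_probs.__getitem__)  # argmax
    correct += int(pred == target)

avg_loss = total_loss / len(targets)  # divide by the number of samples, not batches
print(round(avg_loss, 4), correct)
```

For reference, a model that predicts all 10 digit classes with equal probability has a per-sample loss of ln(10), roughly 2.30, which gives a baseline for reading the reported average loss.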

#Actually Training the model : 


*   **Optimizer** : We set up the optimizer. The learning rate scales the size of each parameter update. Momentum carries a fraction of the previous update direction into the current one, which smooths and speeds up the descent.
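
The effect of the learning rate and momentum can be sketched on a single scalar parameter. The function below is a hypothetical illustration, assuming the plain momentum form `buf = momentum * buf + grad` followed by `param = param - lr * buf` (no dampening, no Nesterov); it minimizes `f(p) = p ** 2`, whose gradient is `2 * p`.

```python
def sgd_momentum_step(param, grad, buf, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update for a scalar parameter."""
    buf = momentum * buf + grad   # running, decayed sum of past gradients
    param = param - lr * buf      # step against the accumulated direction
    return param, buf


param, buf = 1.0, 0.0
history = []
for _ in range(3):
    grad = 2 * param              # gradient of f(p) = p ** 2
    param, buf = sgd_momentum_step(param, grad, buf)
    history.append(param)

print(history)
```

Each step is larger than the one before because the momentum buffer keeps adding gradients that point the same way, which is the "carry forward" behaviour described above.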



In [7]:

model = Net().to(device) # Creates a fresh model and transfers its params to the device
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9) # Sets up the optimizer for training

for epoch in range(1, 2): # Trains the model for one epoch
    train(model, device, train_loader, optimizer, epoch)
    test(model, device, test_loader)

loss=1.99245023727417 batch_id=468: 100%|██████████| 469/469 [00:38<00:00, 12.03it/s]



Test set: Average loss: 1.9687, Accuracy: 2790/10000 (28%)

