
# Advanced Certification in AIML
## A Program by IIIT-H and TalentSprint

## Learning Objective

At the end of the experiment, you will be able to:

* understand the output at each layer of a convolutional neural network
* implement the ConvNet using PyTorch


In [None]:
#@title Experiment Explanation Video
from IPython.display import HTML

HTML("""<video width="854" height="480" controls>
  <source src="https://cdn.iiith.talentsprint.com/aiml/Experiment_related_data/Walkthrough/Single_Image_Convolution_Walkthrough.mp4" type="video/mp4">
</video>
""")

In [None]:
! wget https://cdn.iiith.talentsprint.com/aiml/Parrot1.jpg


### Basic PyTorch packages

**nn:** This package (`torch.nn`) provides an easy and modular way to build and train simple or complex neural networks.

**torch.nn.functional:** This package provides functional versions of common operations, including non-linear activation functions such as ReLU and sigmoid.

**torchvision:** This package is used to load and prepare datasets, and to apply transformations to the input data.

**transforms:** This module of torchvision provides image preprocessing operations, which can be chained and applied sequentially.





In [None]:
from PIL import Image             # Python Image library
import torch.nn.functional as F
import torch.nn as nn
from torchvision import transforms
import numpy as np
from matplotlib import pyplot as plt

### Loading Image using PIL Package

In [None]:
image = Image.open('/content/Parrot1.jpg')

In [None]:
# The size of the image; PIL returns it as (width, height)
image.size

**Defining Transformations**

We will use `transforms.Compose()` to combine all the image transformations in one place.

`transforms.ToTensor()` converts the image into a tensor, scaling the pixel values to the range [0, 1].

`transforms.Resize()` resizes the image to the given size.

In [None]:
data_transforms = transforms.Compose([
        transforms.Resize((256, 256)),
        transforms.ToTensor(),
        ])
img = data_transforms(image)

# The original (600, 600) image is resized to (256, 256); the tensor shape is (channels, height, width)
img.shape

In [None]:
# Printing the tensor values of an image
img

### Visualize the Image

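The `permute()` function rearranges the order of the dimensions of a tensor. PyTorch image tensors are channel-first (channels, height, width), whereas matplotlib expects channel-last images (height, width, channels), so the dimensions must be permuted before plotting. matplotlib can display the permuted tensor directly, without an explicit conversion to a NumPy array: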
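
The difference between permuting and reshaping is easy to miss, so the following sketch illustrates it on a tiny made-up tensor (the names `chw`, `hwc` and `scrambled` are illustrative only). Permuting moves axes while keeping every pixel intact, whereas reshaping to the same target shape reinterprets the underlying data.

```python
import torch

# A hypothetical channel-first "image": 3 channels, height 4, width 5.
chw = torch.arange(3 * 4 * 5, dtype=torch.float32).reshape(3, 4, 5)

# Move the channel axis to the end: (C, H, W) -> (H, W, C).
hwc = chw.permute(1, 2, 0)

print(chw.shape)  # torch.Size([3, 4, 5])
print(hwc.shape)  # torch.Size([4, 5, 3])

# The same pixel is addressed with reordered indices; no values change.
same_pixel = chw[2, 1, 3] == hwc[1, 3, 2]
print(same_pixel.item())  # True

# reshape() to the same target shape would scramble the pixels instead.
scrambled = chw.reshape(4, 5, 3)
print(torch.equal(hwc, scrambled))  # False
```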

In [None]:
print(img.shape)
print(img.permute(1, 2, 0).shape)

# permute(1, 2, 0) reorders (C, H, W) to (H, W, C): index 1 is the height, 2 the width and 0 the channel
plt.imshow(img.permute(1, 2, 0))
plt.show()

### Building a Convolutional Neural Network with PyTorch

A ConvNet is a sequence of layers, and every layer of a ConvNet transforms one volume of activations into another through a differentiable function. Three main types of layers are used to build ConvNet architectures: the Convolutional Layer, the Pooling Layer, and the Fully-Connected Layer. These layers are stacked to form a full ConvNet architecture.









**Convolution Layer** is the first layer applied to the image and performs the feature-extraction step. A small window of weights, usually called a kernel or filter, slides over the image; at each position the layer outputs the weighted sum of the pixels under the kernel, producing a filtered version of the image.
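
As a minimal sketch of the sliding-window idea, the example below applies one hand-picked kernel to a tiny made-up input with `F.conv2d`. The input `x` and the all-ones `kernel` are assumptions chosen so the result can be checked by hand; in an `nn.Conv2d` layer the kernel weights are learned.

```python
import torch
import torch.nn.functional as F

# Hypothetical 4 x 4 single-channel "image", shaped (batch, channels, H, W).
x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

# A hand-picked 3 x 3 kernel of ones, shaped (out_channels, in_channels, K, K).
# With all-ones weights, each output value is the sum of a 3 x 3 patch.
kernel = torch.ones(1, 1, 3, 3)

# Stride 1 and no padding: the kernel fits in 2 x 2 positions.
y = F.conv2d(x, kernel, stride=1, padding=0)

print(y.shape)  # torch.Size([1, 1, 2, 2])
print(y[0, 0])  # patch sums: [[45., 54.], [81., 90.]]
```

For instance, the top-left output 45 is the sum 0+1+2+4+5+6+8+9+10 of the first 3 x 3 patch.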

Output formula for a convolutional layer (rounded down when the division is not exact): $O = \left\lfloor \frac{W - K + 2P}{S} \right\rfloor + 1$

*   O : output height/length
*   W : input height/length
*   K : filter size (kernel size)
*   P : padding 
*   S : stride
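
The formula can be turned into a small helper for checking layer sizes before building a network. This is a sketch; `conv_output_size` is an illustrative name, and the integer division implements the rounding down that PyTorch applies when the division is not exact.

```python
def conv_output_size(w: int, k: int, p: int = 0, s: int = 1) -> int:
    """Output height/width of a convolution: floor((W - K + 2P) / S) + 1."""
    return (w - k + 2 * p) // s + 1


# A 3 x 3 kernel with padding 1 and stride 1 preserves the input size.
print(conv_output_size(256, 3, p=1, s=1))  # 256

# Without padding, the output shrinks by K - 1.
print(conv_output_size(256, 3, p=0, s=1))  # 254

# Stride 2 roughly halves the size.
print(conv_output_size(256, 3, p=1, s=2))  # 128
```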


**Pooling Layer** progressively reduces the spatial size of the representation, which reduces the number of parameters and the amount of computation in the network, and hence also helps control overfitting. The Pooling Layer operates independently on every depth slice (channel) of the input and downsamples it spatially, here using the MAX operation.
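
The sketch below shows the MAX operation on a tiny made-up two-channel input, small enough to verify by hand. The tensors `base` and `x` are assumptions for illustration; each channel is pooled separately.

```python
import torch
import torch.nn as nn

# Hypothetical 2-channel, 4 x 4 input: channel 0 counts up, channel 1 counts down.
base = torch.arange(16, dtype=torch.float32).reshape(4, 4)
x = torch.stack([base, 15 - base]).unsqueeze(0)  # shape (1, 2, 4, 4)

# 2 x 2 windows; the stride defaults to the kernel size, so windows do not overlap.
pool = nn.MaxPool2d(kernel_size=2)
y = pool(x)

print(y.shape)  # torch.Size([1, 2, 2, 2])
print(y[0, 0])  # maxima of channel 0: [[5., 7.], [13., 15.]]
print(y[0, 1])  # maxima of channel 1: [[15., 13.], [7., 5.]]
```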

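Output formula for pooling without padding (rounded down when the division is not exact): $O = \left\lfloor \frac{W - K}{S} \right\rfloor + 1$

With the PyTorch default stride (which is the same as the kernel size), the output formula for pooling reduces to $O = \left\lfloor \frac{W}{K} \right\rfloor$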
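
As a quick check of the pooling formula, the sketch below compares a small helper against the shapes `nn.MaxPool2d` actually produces on a random tensor. The helper name `pool_output_size` and the random input are assumptions made for this illustration.

```python
from typing import Optional

import torch
import torch.nn as nn


def pool_output_size(w: int, k: int, s: Optional[int] = None) -> int:
    """Output height/width of an unpadded pooling layer: floor((W - K) / S) + 1."""
    if s is None:
        s = k  # mirror nn.MaxPool2d, whose stride defaults to kernel_size
    return (w - k) // s + 1


x = torch.rand(1, 3, 256, 256)  # random stand-in for a batch with one image

results = []
for k, s in [(3, None), (3, 2), (2, None)]:
    pooled = nn.MaxPool2d(kernel_size=k, stride=s)(x)
    results.append((pool_output_size(256, k, s), pooled.shape[-1]))
    print(f"kernel={k}, stride={s}: formula={results[-1][0]}, PyTorch={results[-1][1]}")
```

With a kernel size of 3 and the default stride the formula gives 85, and with stride 2 it gives 127.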


![Capture](https://cdn.iiith.talentsprint.com/aiml/Experiment_related_data/Single_Image_Colvolution.PNG)

Output after Convolutional layer 1 = $\frac{256-3+ 2(1)}{1}+1 = 256$

Output after Convolutional layer 2 = $\frac{256-3+ 2(1)}{1}+1 = 256$

Output after Maxpool layer 1 = $\left\lfloor \frac{256-3}{2} \right\rfloor + 1 = 126 + 1 = 127$

#### Let's understand how each layer of CNN works while passing an Image

Output of convolutional layer with Stride 1 and padding 1

In [None]:
# Define a 2D conv layer with in_channels=3, out_channels=3, kernel_size=3, stride=1 and padding=1
cnn2d = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3, stride=1, padding=1)

# nn.Conv2d expects a batch dimension, so unsqueeze(0) turns (3, 256, 256) into (1, 3, 256, 256)
cnn1_output = cnn2d(img.unsqueeze(0))
print(cnn1_output.shape)

Output of Maxpool Layer

In [None]:
# Define a maxpool layer with filter size 3; the stride defaults to the filter size, so 256 becomes 85
max_pool = nn.MaxPool2d(kernel_size=3)
maxpool_output = max_pool(cnn1_output)
maxpool_output.shape

#### Defining ConvNet Architecture

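The architecture uses two convolutional layers followed by one maxpool layer. The output of the first conv layer is the input of the second conv layer; ReLU is applied to the output of the second conv layer, and the result is passed to the maxpool layer. A fully connected layer would normally follow for classification, but the `Net` class in this experiment stops at the maxpool layer.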


In [None]:
class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        
        # Initialize convolutional layer 1 with filter size 3, stride 1 and padding 1
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=1, padding=1)

        # Convolutional layer 2 takes the 6 channels from layer 1 and outputs 3 channels
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=3, kernel_size=3, stride=1, padding=1)

        # Maxpool layer with filter size 3 and stride 2
        self.maxpool1 = nn.MaxPool2d(kernel_size=3, stride=2)

    def forward(self, x):

        # Convolutional layer1 
        out = self.conv1(x)
        print("conv1", out.shape)

        # Convolutional layer2
        out = self.conv2(out)
        print("conv2", out.shape)

        # Activation Function 
        out = F.relu(out)

        # Maxpool layer
        out = self.maxpool1(out)
        
        return out

# Initializing the network by creating an instance
net = Net()
print(net)
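
The `Net` class above stops at the maxpool layer. For reference, a fully connected layer could be attached to the pooled output roughly as in the sketch below. This is not part of the experiment: the class name `NetWithFC`, the layer name `fc1` and the choice of 10 output classes are all assumptions, and the input size of the linear layer only holds for 256 x 256 images.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 10  # hypothetical; the experiment does not define a classification task


class NetWithFC(nn.Module):
    def __init__(self, num_classes: int = NUM_CLASSES):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=3, kernel_size=3, stride=1, padding=1)
        self.maxpool1 = nn.MaxPool2d(kernel_size=3, stride=2)
        # For a 256 x 256 input, the pooled feature map has shape 3 x 127 x 127.
        self.fc1 = nn.Linear(3 * 127 * 127, num_classes)

    def forward(self, x):
        out = self.conv1(x)
        out = F.relu(self.conv2(out))
        out = self.maxpool1(out)
        # Flatten everything except the batch dimension before the linear layer.
        out = torch.flatten(out, start_dim=1)
        return self.fc1(out)


demo_net = NetWithFC()
scores = demo_net(torch.rand(1, 3, 256, 256))  # random stand-in for an image batch
print(scores.shape)  # torch.Size([1, 10])
```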

Pass the image through the model and get the output

In [None]:
output = net(img.unsqueeze(0))
output.shape

In [None]:
# Output after max pooling, shown as a color image. The weights are randomly initialized,
# so each new instance of the network produces different color intensities.
plt.imshow(output[0].permute(1, 2, 0).detach().numpy())
plt.show()

**Summary:** In this experiment we saw how a ConvNet architecture represents an image: the filters at different layers and the output of each layer, i.e. the convolution of the image at each layer. A CNN extracts the features of an image and converts it to a lower dimension without losing its characteristics. Image classification using CNNs is covered in later experiments.