### Learning of CNN through different layers
![Different layers of CNN](img/cnn_layers.png)

        The pictures above show feature maps generated from different layers of VGG16 (named after the Visual Geometry Group at Oxford) trained on ImageNet.
        Layer 1 responds mostly to horizontal, vertical and diagonal lines. These are mostly used for detecting edges in an image. Layer 2 tries to capture more information than the first; it detects corners. The CNN learns to do this on its own: there is no special instruction telling it to focus on more complex objects in deeper layers. That is just how it normally works out when you feed training data into a CNN. Layer 3 is where we start to see more complex patterns such as eyes and faces. We can assume that these feature maps were obtained from a model trained to detect human faces. In Layer 4, the features pick out patterns in the more complex parts of faces, such as the eyes.
![5th layer of VGG16](img/layer5.png)
        
        In Layer 5, you can see that the feature maps show specific human faces, car tyres, animal faces, etc. These feature maps contain the most information about the patterns found in the images.
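
One way to look at feature maps like the ones described above is to register forward hooks on the convolution layers. The sketch below is only an illustration: `toy_net` is a small, untrained, hypothetical network standing in for a pretrained model such as VGG16, and the input is random data rather than a real image.

```python
import torch
from torch import nn

# Hypothetical toy network (untrained); it stands in for a real pretrained model.
toy_net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(8, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
)

feature_maps = {}

def save_output(name):
    # Build a hook that stores the layer's output under the given name.
    def hook(module, inputs, output):
        feature_maps[name] = output.detach()
    return hook

# Attach a hook to every convolution layer.
for name, layer in toy_net.named_children():
    if isinstance(layer, nn.Conv2d):
        layer.register_forward_hook(save_output(name))

# Random placeholder "image": batch of 1, 3 channels, 32x32 pixels.
image = torch.rand(1, 3, 32, 32)
with torch.no_grad():
    toy_net(image)

for name, fmap in feature_maps.items():
    print(name, tuple(fmap.shape))
```

Each entry in `feature_maps` holds one 2D map per output channel, which could then be plotted with `plt.imshow` to inspect what that layer responds to.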
        
### Different Parts of a CNN
![Different Parts of CNN](img/cnn_parts.png)

    Now that we have learnt a lot about CNNs, it's time to implement one using PyTorch.

# Implementation of a CNN Classification Network

In [9]:
import numpy as np
import pandas as pd
import torch
from torch import nn
import torch.nn.functional as F
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
ds = pd.read_csv('data/mnist.csv').values
ds.shape

(42000, 785)

In [4]:
# Reshaping our data according to our CNN
xtrain=ds[2000:10000, 1:].reshape((-1, 1, 28, 28))/255.0
ytrain=ds[2000:10000, 0]

xtest=ds[23000:24500, 1:].reshape((-1, 1, 28, 28))/255.0
ytest=ds[23000:24500, 0]

print(xtrain.shape, ytrain.shape)
print(xtest.shape, ytest.shape) # batch_size, no_of_channels, height, width

(8000, 1, 28, 28) (8000,)
(1500, 1, 28, 28) (1500,)


        A question arises here: why did we divide the image by 255, and why is it necessary?
Ans:-

         This is a scaling technique. After scaling, the pixel values are small (note that these small values still represent the original image), so training typically needs less computation and less time to converge. A CNN will still converge when given inputs in the range 0 to 255 instead of values scaled down to 0 to 1. However, it will converge very slowly.
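
As a small illustration of the point that scaled values still represent the original image, the sketch below scales a made-up array of pixel values (not taken from the MNIST file) and then recovers the original values.

```python
import numpy as np

# Made-up 8-bit pixel values, for illustration only.
raw_pixels = np.array([[0, 51, 255], [102, 204, 153]], dtype=np.uint8)

# Dividing by 255.0 maps the range 0..255 onto 0..1 and converts to float.
scaled = raw_pixels / 255.0
print(scaled.min(), scaled.max())

# Multiplying back by 255 recovers the original values, so no information is lost.
recovered = np.round(scaled * 255).astype(np.uint8)
print(np.array_equal(recovered, raw_pixels))
```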
## Data Normalization
        Data normalization is an important step that ensures each input parameter (a pixel, in this case) has a similar data distribution. This makes convergence faster while training the network. Data normalization is done by subtracting the mean from each pixel and then dividing the result by the standard deviation. The resulting data is centered at zero with unit standard deviation, and its distribution resembles a Gaussian curve when the original data is roughly Gaussian. For image inputs we need the pixel values to be positive, so we might choose to rescale the normalized data to the range [0, 1] or [0, 255].
        In PyTorch, normalization is done channel-wise.
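
The subtract-the-mean, divide-by-the-standard-deviation step can be sketched directly with tensor operations. This is a minimal illustration on random placeholder data shaped like a batch of MNIST images; the variable names are assumptions and not part of the notebook. `transforms.Normalize(mean, std)` applies the same per-channel formula when given the channel means and standard deviations.

```python
import torch

# Random placeholder batch shaped (batch_size, channels, height, width).
batch = torch.rand(8, 1, 28, 28)

# One mean and one standard deviation per channel,
# computed over the batch, height and width dimensions.
mean = batch.mean(dim=(0, 2, 3), keepdim=True)
std = batch.std(dim=(0, 2, 3), keepdim=True)

# Channel-wise normalization: (x - mean) / std
normalized = (batch - mean) / std

print(normalized.mean().item(), normalized.std().item())
```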

In [12]:
from torchvision import transforms
transforms.Normalize??

In [13]:
# Checking Output Values or labels
print(set(ytrain))

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}


In [None]:
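class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()

        # Block 1: (N, 1, 28, 28) -> conv -> (N, 16, 28, 28) -> pool -> (N, 16, 14, 14)
        self.conv1 = nn.Sequential(
                nn.Conv2d(1, 16, 5, 1, 2),  # in_channels, out_channels, kernel_size, stride, padding
                nn.ReLU(),
                nn.MaxPool2d(2))

        # Block 2: (N, 16, 14, 14) -> conv -> (N, 32, 14, 14) -> pool -> (N, 32, 7, 7)
        self.conv2 = nn.Sequential(
                nn.Conv2d(16, 32, 5, 1, 2),
                nn.ReLU(),
                nn.MaxPool2d(2))

        # Fully connected layer mapping the flattened features to the 10 digit classes
        self.out = nn.Linear(32*7*7, 10)


    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.view(x.size(0), -1)  # flatten to (N, 32*7*7)

        # dim=1 applies softmax across the 10 class scores of each sample.
        # Note: if training with nn.CrossEntropyLoss, return self.out(x) instead,
        # because that loss applies log-softmax internally.
        output = F.softmax(self.out(x), dim=1)
#         print(output.size())
        return output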
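
The `32*7*7` input size of the final linear layer follows from how each layer changes the spatial size of a 28x28 image. The sketch below works through that arithmetic using the standard output-size formula for convolution and pooling layers; `conv_output_size` is a hypothetical helper written for this illustration, not a PyTorch function.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    # Output size along one spatial dimension for a convolution or pooling layer.
    return (size + 2 * padding - kernel) // stride + 1

size = 28                                                     # MNIST images are 28x28
size = conv_output_size(size, kernel=5, stride=1, padding=2)  # Conv2d(1, 16, 5, 1, 2)
after_conv1 = size
size = conv_output_size(size, kernel=2, stride=2)             # MaxPool2d(2)
after_pool1 = size
size = conv_output_size(size, kernel=5, stride=1, padding=2)  # Conv2d(16, 32, 5, 1, 2)
size = conv_output_size(size, kernel=2, stride=2)             # MaxPool2d(2)
after_pool2 = size

# 32 output channels, each a (size x size) feature map
flattened = 32 * size * size
print(after_conv1, after_pool1, after_pool2, flattened)
```

With a kernel size of 5, stride 1 and padding 2, each convolution keeps the spatial size unchanged, so only the two pooling layers halve it: 28 to 14 to 7.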
        
        