In [1]:
import cv2
import matplotlib.pyplot as plt

### CNN (Convolutional Neural Network)

#### Why CNN when we can use ANN for classification?
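1) ANNs don't scale well to image data: each neuron in a fully connected layer needs one weight per pixel value, so the number of parameters grows quickly with image size.<br>
2) Image data contains patterns, for example:<br>
a) eyes, nose, ears for a human<br>
b) lights, doors, mirrors, etc. for a car<br>

ANNs struggle to recognize these patterns because they treat every pixel as an independent input and do not exploit the spatial arrangement of neighbouring pixels.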
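A rough parameter count makes the scaling problem concrete. The sizes below (a 224 x 224 RGB image, a 1000-neuron dense layer, 64 filters of size 3 x 3) are hypothetical and chosen only for illustration.

```python
# Hypothetical sizes for illustration only:
# a 224 x 224 RGB image, a dense layer with 1000 neurons,
# and a conv layer with 64 filters of size 3 x 3.
h, w, d = 224, 224, 3
dense_units = 1000
n_filters, fh, fw = 64, 3, 3

# Fully connected: every neuron has one weight per input value, plus a bias.
dense_params = (h * w * d) * dense_units + dense_units

# Convolution: each filter has fh * fw * d weights plus a bias, and these are
# shared across all spatial positions, so the count does not depend on h or w.
conv_params = n_filters * (fh * fw * d) + n_filters

print(f"Dense layer params: {dense_params:,}")  # Dense layer params: 150,529,000
print(f"Conv layer params:  {conv_params:,}")   # Conv layer params:  1,792
```

The convolution's weight sharing is what keeps its parameter count independent of the image resolution.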

### CNN
1) It is a class of neural network commonly used for image classification.<br>
2) A CNN image classifier takes an input image, processes it, and classifies it under one of several categories (e.g., dog, cat, tiger, lion). A computer sees an input image as an array of pixels whose size depends on the image resolution: h x w x d (h = height, w = width, d = depth, i.e. the number of channels). E.g., an RGB image might be a 6 x 6 x 3 array (3 refers to the R, G, B channels) and a grayscale image a 4 x 4 x 1 array.<br>
3) In short, the role of the ConvNet is to reduce images into a form that is easier to process, without losing the features that are critical for a good prediction.<br>
4) A CNN lets a computer "see": it learns to recognize visual patterns directly from pixels.<br>
5) A CNN has 4 types of layers:

a) Convolution + Activation layer<br>
b) Pooling layer<br>
c) Flatten layer<br>
d) Dense layer<br>
 
The layers are arranged so that they detect simpler patterns first (lines, curves, etc.) and more complex patterns (faces, objects, etc.) further along.<br>

6) In general, a CNN architecture contains multiple (Convolution + Pooling) blocks.

### Architecture of CNN
<img src="cnn_arch1.jpeg">

### Layers in CNN

### 1) Convolution Layer + Activation (ReLU)

#### i) Convolution
a) Convolution is the first layer to extract features from an input image. Convolution preserves the spatial relationship between pixels by learning image features.<br>
b) In this layer, the kernel slides across the height and width of the image, producing a representation of each receptive region it covers. The result is a two-dimensional representation of the image known as a feature map, which gives the response of the kernel at each spatial position of the image. The step size by which the kernel slides is called the stride.<br>
c) N different F x F filters (or kernels) are applied to each image, producing N feature maps that capture different features.<br>

<img src="conv1.png">
<img src="conv2.png">
<img src="convolution_gif.gif">


### Mathematically

<img src="conv3.png" height="500" width="400">
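The sliding multiply-and-sum described above can be sketched in NumPy for one single-channel image and one kernel. `conv2d_single` is a hypothetical helper written for illustration; like deep learning libraries, it computes cross-correlation (the kernel is not flipped) and uses no padding.

```python
import numpy as np

def conv2d_single(image, kernel, stride=1):
    """Valid (no padding) 2D convolution of a single-channel image with one kernel.

    Like Keras/TensorFlow, this computes cross-correlation: the kernel is not flipped.
    """
    h, w = image.shape
    fh, fw = kernel.shape
    out_h = (h - fh) // stride + 1
    out_w = (w - fw) // stride + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Receptive region currently covered by the kernel
            region = image[i * stride:i * stride + fh, j * stride:j * stride + fw]
            # Element-wise multiply, then sum into one feature-map value
            feature_map[i, j] = np.sum(region * kernel)
    return feature_map

# A 5 x 5 image with a vertical edge between columns 1 and 2
image = np.array([
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
])
# A 3 x 3 vertical-edge kernel (chosen for illustration)
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
])

feature_map = conv2d_single(image, kernel)
print(feature_map)
# [[3. 3. 0.]
#  [3. 3. 0.]
#  [3. 3. 0.]]
```

The response is non-zero only where the window straddles the edge, which is how a learned kernel acts as a feature detector.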

### Shape of Feature Map

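Image = $h \times w \times d$<br>
Filter = $f_h \times f_w$, with $N$ filters<br>
Stride = $s$<br>
Padding = $p$ (e.g., $p = 1$)

#### Without Padding
$$\text{Feature Map Shape} = \left[\left\lfloor \frac{h - f_h}{s} \right\rfloor + 1,\ \left\lfloor \frac{w - f_w}{s} \right\rfloor + 1,\ N\right]$$

#### With Padding
$$\text{Feature Map Shape} = \left[\left\lfloor \frac{h - f_h + 2p}{s} \right\rfloor + 1,\ \left\lfloor \frac{w - f_w + 2p}{s} \right\rfloor + 1,\ N\right]$$

The floor $\lfloor \cdot \rfloor$ handles strides that do not divide evenly, and the depth of the feature map equals the number of filters $N$.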
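The formula can be checked with a small helper. `feature_map_shape` is a hypothetical name used only for illustration; the three calls use the 28 x 28 x 1 input and the 64 filters of size 3 x 3 from the first Conv2D layer of the sample models later in this notebook.

```python
def feature_map_shape(h, w, fh, fw, n_filters, s=1, p=0):
    """Output shape of a convolution: floor((size - f + 2p) / s) + 1 per axis."""
    out_h = (h - fh + 2 * p) // s + 1  # integer division implements the floor
    out_w = (w - fw + 2 * p) // s + 1
    return (out_h, out_w, n_filters)   # depth = number of filters

print(feature_map_shape(28, 28, 3, 3, 64))       # (26, 26, 64)  no padding, stride 1
print(feature_map_shape(28, 28, 3, 3, 64, p=1))  # (28, 28, 64)  padding 1 keeps the size
print(feature_map_shape(28, 28, 3, 3, 64, s=2))  # (13, 13, 64)  stride 2, floor(25/2) + 1
```

These three results correspond to the first-layer output shapes reported for Model-1, Model-2 and Model-3.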

#### ii) Activation

a) An activation function (such as ReLU) is applied element-wise to the feature map obtained from convolution.
<img src="conv_activation.png">

b) The first convolutional layer usually extracts basic features such as horizontal or diagonal edges. Its output is passed on to the next layer, which detects more complex features such as corners or combinations of edges. As we move deeper into the network, it can identify even more complex features such as objects and faces.
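ReLU simply replaces every negative value in the feature map with zero and leaves positive values unchanged. A minimal NumPy sketch on a small, made-up feature map:

```python
import numpy as np

def relu(x):
    """ReLU applied element-wise: max(0, x)."""
    return np.maximum(0, x)

# A made-up 3 x 3 feature map, e.g. the output of a convolution
feature_map = np.array([[ 3, -1,  0],
                        [-2,  5, -4],
                        [ 1, -3,  2]])

activated = relu(feature_map)
print(activated)
# [[3 0 0]
#  [0 5 0]
#  [1 0 2]]
```

The shape is unchanged; only the values are transformed, which is why ReLU adds no parameters.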


### 2) Pooling Layer

1) The pooling layer reduces the spatial size (height and width) of the convolved feature map. This decreases the computational power required to process the data.<br>
2) Reducing the spatial size of the representation also reduces the number of weights needed in the layers that follow (notably the dense layers after flattening).<br>
3) There are usually 2 types of pooling: MaxPooling and AveragePooling.
As a window slides over the feature map, max pooling keeps only the maximum value inside the window (left side of the picture below), while average pooling keeps the average value (right side). As with the convolution layer, the stride can also be tuned.<br>
Example of Pooling layer with a stride of 2<br>
<img src="pooling2.jpg" height="400" width="350">

4) MaxPooling is applied to the feature map obtained after the Convolution + Activation layer: for each window position, determined by the stride, it selects the maximum value. <br>
In this example stride=1 <br>
<img src="pooling1.png">

5) Pooling can also help reduce overfitting, since the smaller representation leads to fewer parameters in the later dense layers.<br>
6) <b>Convolution + Pooling helps detect Translation Invariant features</b><br>

Example of Translation Invariance<br>
<img src="translation_invariance.jpg">
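Both pooling types can be sketched in NumPy for a single-channel feature map. `pool2d` is a hypothetical helper for illustration; it uses no padding, matching the default `'valid'` behaviour of Keras pooling layers.

```python
import numpy as np

def pool2d(feature_map, pool_size=2, stride=2, mode="max"):
    """Pool a single-channel feature map with a square window (no padding)."""
    reducers = {"max": np.max, "average": np.mean}
    reduce_fn = reducers[mode]  # raises KeyError for an unknown mode
    h, w = feature_map.shape
    out_h = (h - pool_size) // stride + 1
    out_w = (w - pool_size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i * stride:i * stride + pool_size,
                                 j * stride:j * stride + pool_size]
            out[i, j] = reduce_fn(window)
    return out

fm = np.array([[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 0],
               [3, 4, 1, 8]])

max_pooled = pool2d(fm, mode="max")      # values [[6, 4], [7, 9]]
avg_pooled = pool2d(fm, mode="average")  # values [[3.75, 2.25], [4.0, 4.5]]
print(max_pooled)
print(avg_pooled)
```

With a 2 x 2 window and stride 2, an 11 x 11 map shrinks to 5 x 5 because the last row and column do not fill a full window; this matches the 11 to 5 step in Model-1's summary.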

### 3) Flatten Layer

1) Flattening is used to convert all the resultant 2-Dimensional arrays from pooled feature maps into a single long continuous linear vector.<br>
2) Here we flatten the output of the last Conv + Pooling block.<br>
3) The Flatten layer doesn't learn anything, so its number of parameters is 0.<br>
4) The dense layers that perform the classification expect each sample as a 1-dimensional vector, so rectangular (2D) or cubic (3D) feature maps can't be fed to them directly.<br>
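Flattening is only a reshape. The sketch below assumes a batch of 4 pooled outputs with Model-1's final pooled shape (5 x 5 x 32); like the Keras `Flatten` layer, it keeps the batch dimension and flattens the rest.

```python
import numpy as np

# Hypothetical batch of 4 pooled feature maps, each 5 x 5 x 32 (Model-1's last pooling output)
batch = np.random.default_rng(0).random((4, 5, 5, 32))

# Keep the batch axis, merge height, width and channels into one vector per sample
flat = batch.reshape(batch.shape[0], -1)
print(flat.shape)  # (4, 800)
```

Since only the shape changes, there is nothing to learn, which is why Flatten reports 0 parameters.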


### 4) Dense Layer
1) These fully connected layers perform the classification the same way as in an ANN.<br>

<img src="flattening_dense.png" height="400" width="500">
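A minimal NumPy sketch of Model-1's two Dense layers (800 to 16 to 10) follows. The random weights are placeholders for trained ones, and `dense`, `relu` and `softmax` are illustrative helpers, not Keras functions.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0, z)

def softmax(z):
    # Subtract the row maximum for numerical stability
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dense(x, W, b, activation):
    """Fully connected layer: weighted sum of all inputs plus bias, then activation."""
    return activation(x @ W + b)

# Random weights stand in for trained ones; sizes follow Model-1 (800 -> 16 -> 10)
x = rng.random((1, 800))  # one flattened sample
W1, b1 = rng.standard_normal((800, 16)) * 0.05, np.zeros(16)
W2, b2 = rng.standard_normal((16, 10)) * 0.05, np.zeros(10)

hidden = dense(x, W1, b1, relu)
probs = dense(hidden, W2, b2, softmax)

print(probs.shape)                           # (1, 10)
print(np.isclose(probs.sum(), 1.0))          # True
print(W1.size + b1.size, W2.size + b2.size)  # 12816 170
```

The weight-plus-bias counts are the same 12816 and 170 that appear for the Dense layers in Model-1's summary.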

### Stride
1) Stride is the number of pixels by which the filter shifts over the input matrix. With a stride of 1, the filter moves 1 pixel at a time; with a stride of 2, it moves 2 pixels at a time, and so on.

<img src="stride.jpg" width="450">
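The effect of the stride can be seen by listing where a filter starts along one axis. `window_starts` is a hypothetical helper; the 7-pixel input and 3-pixel filter are illustrative sizes.

```python
def window_starts(input_size, filter_size, stride):
    """Starting index of each filter position along one axis, without padding."""
    return list(range(0, input_size - filter_size + 1, stride))

print(window_starts(7, 3, stride=1))  # [0, 1, 2, 3, 4]
print(window_starts(7, 3, stride=2))  # [0, 2, 4]
```

The number of positions (5 and 3) matches $\lfloor (7 - 3)/s \rfloor + 1$ from the feature map shape formula.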

### Padding
1) In Keras, padding can be 'same' or 'valid'. <br>
2) 'valid' padding implies no padding. <br>
3) 'same' padding adds zeros as evenly as possible to the left/right and top/bottom of the input so that, with a stride of 1, the output has the same height/width as the input.<br>
4) Padding is done so that corner and border pixels get their due weight in the convolution; without it, they are covered by fewer filter positions than central pixels.<br>

<img src="padding1.png" align="left">
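The border effect in point 4 can be measured by counting how many 3 x 3 filter positions (stride 1) cover each pixel of a 5 x 5 input, with and without one ring of zero padding. `coverage` is a hypothetical helper written for this illustration.

```python
import numpy as np

def coverage(size, f, p):
    """Count how many f x f filter positions (stride 1) cover each original pixel
    of a size x size input padded with p zeros on every side."""
    padded = size + 2 * p
    out = padded - f + 1  # number of filter positions per axis = output size
    counts = np.zeros((padded, padded), dtype=int)
    for i in range(out):
        for j in range(out):
            counts[i:i + f, j:j + f] += 1
    return counts[p:p + size, p:p + size]  # keep only the original pixels

no_pad = coverage(5, 3, p=0)    # 'valid': output is 3 x 3
same_pad = coverage(5, 3, p=1)  # 'same' for a 3 x 3 filter: output is 5 x 5

print(no_pad[0, 0], no_pad[2, 2])      # 1 9
print(same_pad[0, 0], same_pad[2, 2])  # 4 9

# The zero ring itself, built with np.pad
print(np.pad(np.ones((5, 5)), pad_width=1, mode="constant").shape)  # (7, 7)
```

Without padding, a corner pixel contributes to only 1 output value while the centre contributes to 9; one ring of zeros raises the corner to 4.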

### Sample Architectures of CNN models

In [2]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D,Dense,Flatten,MaxPooling2D,AveragePooling2D

In [3]:
import os
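# Hide TensorFlow's C++ info/warning logs; this is most reliable when set before TensorFlow is first imported
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'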

### Model-1

In [4]:
m1=Sequential()
m1.add(Conv2D(64,(3,3),activation='relu',input_shape=(28,28,1)))#stride=1
m1.add(MaxPooling2D(pool_size=(2,2))) #stride=2

m1.add(Conv2D(32,(3,3),activation='relu'))
m1.add(MaxPooling2D(pool_size=(2,2)))

m1.add(Flatten())
m1.add(Dense(16,activation='relu'))
m1.add(Dense(10,activation='softmax'))

m1.compile(optimizer='adam',loss='sparse_categorical_crossentropy',
          metrics=['accuracy'])

In [5]:
m1.summary()
# conv2D => 64*(1*(3*3)) + 64 = 640 
# conv2D_1 => 32*(64*(3*3)) + 32 = 18464 
# dense => 16*800 + 16 = 12816 
# dense_1 => 10*16 + 10 = 170
#Total= 640+18464+12816+170=32090

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 26, 26, 64)        640       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 64)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 32)        18464     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 32)          0         
_________________________________________________________________
flatten (Flatten)            (None, 800)               0         
_________________________________________________________________
dense (Dense)                (None, 16)                12816     
_________________________________________________________________
dense_1 (Dense)              (None, 10)                170

### Model-2

In [6]:
m2=Sequential()
m2.add(Conv2D(64,(3,3),activation='relu',input_shape=(28,28,1),padding='same'))
m2.add(MaxPooling2D(pool_size=(2,2))) #stride=2

m2.add(Conv2D(32,(3,3),activation='relu'))
m2.add(MaxPooling2D(pool_size=(2,2)))

m2.add(Flatten())
m2.add(Dense(16,activation='relu'))
m2.add(Dense(10,activation='softmax'))

m2.compile(optimizer='adam',loss='sparse_categorical_crossentropy',
          metrics=['accuracy'])

In [7]:
m2.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_2 (Conv2D)            (None, 28, 28, 64)        640       
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 14, 14, 64)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 12, 12, 32)        18464     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 6, 6, 32)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 1152)              0         
_________________________________________________________________
dense_2 (Dense)              (None, 16)                18448     
_________________________________________________________________
dense_3 (Dense)              (None, 10)                170

### Model-3

In [8]:
m3=Sequential()
m3.add(Conv2D(64,(3,3),activation='relu',input_shape=(28,28,1),strides=2))
m3.add(MaxPooling2D(pool_size=(2,2))) #stride=2

m3.add(Conv2D(32,(3,3),activation='relu'))
m3.add(MaxPooling2D(pool_size=(2,2)))

m3.add(Flatten())
m3.add(Dense(16,activation='relu'))
m3.add(Dense(10,activation='softmax'))

m3.compile(optimizer='adam',loss='sparse_categorical_crossentropy',
          metrics=['accuracy'])

In [9]:
m3.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_4 (Conv2D)            (None, 13, 13, 64)        640       
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 6, 6, 64)          0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 4, 4, 32)          18464     
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 2, 2, 32)          0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 128)               0         
_________________________________________________________________
dense_4 (Dense)              (None, 16)                2064      
_________________________________________________________________
dense_5 (Dense)              (None, 10)                170
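The three models differ only in their first Conv2D layer (default, `padding='same'`, `strides=2`). The pure-Python sketch below retraces each architecture with the shape and parameter formulas from this notebook, without TensorFlow; `conv_layer`, `pool_layer` and `walk_model` are hypothetical helpers, not Keras APIs. Its per-layer values can be compared with the Param # column above.

```python
def conv_layer(shape, filters, k=3, stride=1, padding="valid"):
    """Output shape and parameter count of a k x k Conv2D layer (channels last)."""
    h, w, c = shape
    if padding == "same":
        # 'same' padding: output size is ceil(input / stride)
        oh, ow = -(-h // stride), -(-w // stride)
    else:
        # 'valid' padding: no zeros added
        oh, ow = (h - k) // stride + 1, (w - k) // stride + 1
    params = filters * (c * k * k) + filters  # weights + one bias per filter
    return (oh, ow, filters), params

def pool_layer(shape, size=2):
    """Pooling with stride equal to the window size; no parameters."""
    h, w, c = shape
    return ((h - size) // size + 1, (w - size) // size + 1, c)

def walk_model(first_conv_kwargs):
    """Conv(64) -> Pool -> Conv(32) -> Pool -> Flatten -> Dense(16) -> Dense(10)."""
    shape, total = (28, 28, 1), 0
    shape, p = conv_layer(shape, 64, **first_conv_kwargs)
    total += p
    shape = pool_layer(shape)
    shape, p = conv_layer(shape, 32)
    total += p
    shape = pool_layer(shape)
    flat = shape[0] * shape[1] * shape[2]  # Flatten: 0 parameters
    total += flat * 16 + 16                # Dense(16)
    total += 16 * 10 + 10                  # Dense(10)
    return flat, total

for name, kwargs in [("Model-1", {}),
                     ("Model-2", {"padding": "same"}),
                     ("Model-3", {"stride": 2})]:
    flat, total = walk_model(kwargs)
    print(f"{name}: flatten={flat}, total params={total}")
# Model-1: flatten=800, total params=32090
# Model-2: flatten=1152, total params=37722
# Model-3: flatten=128, total params=21338
```

The convolution layers keep the same parameter counts in all three models; only the flatten size changes, and with it the first Dense layer's parameters.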