## CNN

A Convolutional Neural Network (CNN) is a special kind of multi-layer neural network. The name means that the network employs a mathematical operation called convolution, which combines information from two sources to produce a new set of information. A CNN uses this operation to extract relevant features from the input image.
 
A simple CNN has 3 main types of layers:
*  Convolution Layer
*  Pooling Layer
*  Fully Connected Layer

The convolutional layer is the main type of layer in a CNN. Each of its neurons is connected only to a local region of the input, called the **receptive field**.

In a typical CNN architecture, each convolutional layer is followed by a **Rectified Linear Unit (ReLU)** layer, then a **Pooling layer**, then one or more convolutional layers (+ ReLU), and finally one or more **Fully Connected layers**. The output of each convolutional layer is a set of **feature maps**, each generated by a single kernel filter. These feature maps then serve as the input to the next layer.

![](https://www.safaribooksonline.com/library/view/practical-convolutional-neural/9781788392303/assets/f6a1addd-d986-4e6a-b27b-388aa2bfd8f3.png)

### Why CNN?

**FNNs** (feedforward neural networks) are powerful, but one of their main disadvantages is that they **ignore the structure of the input**. Input data has to be converted into a numeric 1D array before it is fed to the network. For higher-dimensional arrays such as images, this conversion discards the spatial layout. It is essential to preserve the structure of an image, because much of its information lies in how neighbouring pixels relate to each other. This is where a CNN comes into the picture: a **CNN considers the structure of the image** while processing it.

Let's consider a face recognition problem. With an FNN we need to convert a face image into a 1D vector. As the figure below shows, the positional relationships between the different rows of the image are lost. This makes the classifier less robust to positional changes. To overcome this issue we need to train the network with spatial context, which is what a CNN provides.


![](https://www.safaribooksonline.com/library/view/ensemble-machine-learning/9781788297752/assets/a1f5bcdc-ccc0-4201-889b-991fa63ef481.png)
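To make the loss of spatial structure concrete, here is a minimal sketch in plain Python. The 4 X 4 image size and the helper name `flat_index` are illustrative assumptions, not part of the original notebook. It shows where neighbouring pixels end up once an image is flattened row by row.

```python
# Illustrative only: a tiny 4 x 4 "image" flattened in row-major order.
HEIGHT, WIDTH = 4, 4

def flat_index(row, col, width=WIDTH):
    """Position of pixel (row, col) in the flattened 1D vector."""
    return row * width + col

# Horizontal neighbours stay adjacent in the 1D vector...
horizontal_gap = flat_index(1, 2) - flat_index(1, 1)
# ...but vertical neighbours end up a full row width apart.
vertical_gap = flat_index(2, 1) - flat_index(1, 1)

print(horizontal_gap, vertical_gap)  # 1 4
```

For a 100 X 100 image the vertical gap would be 100 positions, so a fully connected layer receives no hint that those two pixels touch each other.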

Another disadvantage is that an MLP/FNN works fine for small images but breaks down for larger images because of the huge number of parameters required.

For example, a 100 X 100 image has 10,000 pixels, and if the first layer has just 1,000 neurons, this means 10 million connections.

![](https://www.safaribooksonline.com/library/view/practical-convolutional-neural/9781788392303/assets/685a8fc6-999c-4f76-92ef-0377bfa260f0.png)

CNNs solve this problem using partially connected layers: each neuron is connected only to a small region of the previous layer. In addition, a filter/kernel shares the same weights across all positions of its input. Hence CNNs use far fewer parameters than an MLP.
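The difference in parameter count can be checked with a little arithmetic. The sketch below reuses the 100 X 100 image and 1,000 neurons from the example above; the 5 X 5 filter size and the 32 filters are assumed values chosen only for illustration.

```python
# Fully connected layer: every input pixel connects to every neuron.
image_height, image_width = 100, 100
dense_neurons = 1000
dense_weights = image_height * image_width * dense_neurons

# Convolutional layer: each filter reuses the same small set of weights at
# every position. Filter size and filter count are assumed values.
filter_size, in_channels, num_filters = 5, 1, 32
conv_weights = filter_size * filter_size * in_channels * num_filters
conv_biases = num_filters

print(dense_weights)               # 10000000
print(conv_weights + conv_biases)  # 832
```

Note that the convolutional count does not depend on the image size at all, which is why the approach scales to large images.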


In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import tensorflow as tf
import time

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

import os
print(os.listdir("../input"))

# Any results you write to the current directory are saved as output.

## Convolutional Layer
The main objective of convolution in a ConvNet is to extract features from the input image. This layer does most of the computation in a ConvNet. Let's try to understand how convolution works.


In mathematics, convolution is an operation on two functions that produces a third function, which can be viewed as a modified (convolved) version of one of the original functions. The resulting function gives the integral of the pointwise multiplication of the two functions as a function of the amount that one of the original functions is translated. Interested readers can refer to https://en.wikipedia.org/wiki/Convolution for more information.

Convolution combines two functions, f and g, to produce a new function (f * g). For example, convolving an image f with a filter function g produces a new, filtered version of the image.
 
Let's start by manually convolving a 4 X 4 input with a 3 X 3 filter. The first step in the convolution process is to take the element-wise product of the filter and the local receptive field (the first nine boxes) of the input and sum the results.
 
 ![](https://www.safaribooksonline.com/library/view/deep-learning-quick/9781788837996/assets/47bb2d29-6a4a-46d5-8193-51c49ee62817.jpg)
 
Once the first convolution operation is done, we slide the filter over by one position and repeat the operation until the filter has been slid across all the rows and columns. The result is a feature map of size 2 X 2.
 
  ![](https://www.safaribooksonline.com/library/view/deep-learning-quick/9781788837996/assets/1ffeca84-f312-4324-bb86-19417a50f596.jpg)
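The sliding-window procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration, not an efficient implementation: the function name `conv2d_valid` and the input values are made up (they are not the numbers shown in the figures), and, like deep learning libraries, it does not flip the kernel.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            patch = image[r:r + kh, c:c + kw]   # local receptive field
            out[r, c] = np.sum(patch * kernel)  # element-wise product, then sum
    return out

# Made-up values, not the ones in the figures above.
image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])

feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # (2, 2)
```

A 4 X 4 input and a 3 X 3 filter leave room for only two filter positions along each axis, which is why the feature map is 2 X 2.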
    
Filters slide over the width and height of the input volume to produce a 2D activation map that gives the responses of that filter at every spatial position. These filters detect features such as edges, blocks, etc. Each dot product between the filter and an image chunk results in a single number. We use multiple filters to extract different features from an image.

![](https://www.safaribooksonline.com/library/view/neural-network-programming/9781788390392/assets/7059df7a-658f-47ca-b8df-63dae005f5a7.jpg)
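As a sketch of how several filters yield several feature maps, the example below applies two hand-picked 3 X 3 filters to a small synthetic image. The filter values and the image are assumptions chosen for illustration, and `sliding_window_view` requires NumPy 1.20 or newer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Two hand-picked 3 x 3 filters (illustrative values): the first responds
# to vertical edges, the second to horizontal edges.
filters = np.array([
    [[1.0, 0.0, -1.0], [1.0, 0.0, -1.0], [1.0, 0.0, -1.0]],
    [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [-1.0, -1.0, -1.0]],
])

# A 5 x 5 image whose two leftmost columns are bright and the rest dark.
image = np.zeros((5, 5))
image[:, :2] = 1.0

# windows[i, j] is the 3 x 3 patch whose top-left corner is at (i, j).
windows = sliding_window_view(image, (3, 3))

# One dot product per (filter, position) pair gives one feature map per filter.
feature_maps = np.einsum("ijkl,nkl->nij", windows, filters)

print(feature_maps.shape)  # (2, 3, 3)
print(feature_maps[0])     # vertical-edge responses
print(feature_maps[1])     # horizontal-edge responses (all zeros here)
```

The image contains only a vertical edge, so the first filter responds where its window straddles that edge while the second filter stays silent everywhere.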





    

## Convolution operations in TensorFlow

    tf.nn.conv2d(
         input,
         filter,
         strides,
         padding,
         use_cudnn_on_gpu=True,
         data_format='NHWC',
         dilations=[1, 1, 1, 1],
         name=None
     )
 
## Convolution layers in Keras

    Conv2D(filters, kernel_size, strides=strides, padding=padding, activation='relu', input_shape=input_shape)
    
## Convolution layer in PyTorch

    torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)


### Padding 

A single Conv layer with a 5 X 5 filter (stride 1, no padding) reduces an image of size 32 X 32 to an activation map of size 28 X 28, which is used as input to the next layer. Stacking further Conv layers shrinks the spatial size quickly, which loses information, especially near the borders of the image. Padding is used to overcome this.

Padding increases the size of the input data by appending/prepending constants around it. In most cases this constant is zero, and the technique is called zero padding.

**SAME padding:** With stride 1, SAME means that the output feature map has the same spatial dimensions as the input feature map. The padding is split as evenly as possible between left and right (and top and bottom); if the number of columns to be added is odd, the extra column goes on the right.

**VALID padding:** VALID means no padding. If the filter and stride do not cover the input exactly, the rightmost columns (or bottommost rows) are dropped.
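The two padding modes can be summarised with a small calculation along one axis. The sketch below follows the rule TensorFlow documents for `SAME` and `VALID`; the helper names `same_padding` and `valid_output_size` are hypothetical and exist only for this illustration.

```python
import math

def same_padding(in_size, filter_size, stride):
    """Output size and (before, after) zero padding along one axis for SAME."""
    out_size = math.ceil(in_size / stride)
    pad_total = max((out_size - 1) * stride + filter_size - in_size, 0)
    pad_before = pad_total // 2
    pad_after = pad_total - pad_before   # any odd extra goes to the right/bottom
    return out_size, pad_before, pad_after

def valid_output_size(in_size, filter_size, stride):
    """Output size along one axis for VALID (no padding)."""
    return math.ceil((in_size - filter_size + 1) / stride)

print(same_padding(5, 3, 1))       # (5, 1, 1)
print(same_padding(5, 2, 1))       # (5, 0, 1)
print(valid_output_size(5, 3, 1))  # 3
print(valid_output_size(5, 2, 2))  # 2
```

In the last call the two windows cover columns 0 to 3 of the 5-column input, so the rightmost column is dropped, as described for VALID padding.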

### Strides
The stride makes the kernel skip over positions in the image as it slides. As the convolution slides the kernel over the input, the strides parameter determines how many pixels it moves at each step, instead of visiting every element of the input. A larger stride produces a smaller output.

Let F X F be the size of the filter, with F odd. A Conv layer with stride 1 and zero padding (F - 1) / 2 preserves the spatial size of the input.
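One way to picture the stride is to list where the kernel's left edge lands along a single axis. The helper `window_starts` below is a hypothetical name used only for this sketch, and the width of 7 is an arbitrary example value.

```python
def window_starts(width, filter_size, stride, pad=0):
    """Left edges of every kernel position along one axis of the padded input."""
    padded_width = width + 2 * pad
    return list(range(0, padded_width - filter_size + 1, stride))

print(window_starts(7, 3, stride=1))  # [0, 1, 2, 3, 4]
print(window_starts(7, 3, stride=2))  # [0, 2, 4]

# Stride 1 with zero padding (F - 1) / 2 keeps the output as wide as the input.
same_size = window_starts(7, 3, stride=1, pad=(3 - 1) // 2)
print(len(same_size))                 # 7
```

The number of start positions is the output width, so doubling the stride roughly halves the output.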


In [None]:
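# TensorFlow 1.x API; on 2.x use tf.compat.v1.InteractiveSession() after tf.compat.v1.disable_eager_execution()
sess = tf.InteractiveSession()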

In [None]:
i = tf.constant([
                 [1.0, 1.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 1.0, 1.0],
                 [0.0, 0.0, 1.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0, 0.0, 0.0]], dtype=tf.float32)
k = tf.constant([
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0]
        ], dtype=tf.float32)

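# tf.nn.conv2d expects the filter as [height, width, in_channels, out_channels]
# and the input as [batch, height, width, channels] (NHWC).
kernel = tf.reshape(k, [3, 3, 1, 1], name='Kernel')
image = tf.reshape(i, [1, 4, 5, 1], name='image')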

In [None]:
res = tf.squeeze(tf.nn.conv2d(image, kernel, strides = [1,1,1,1], padding = "VALID"))

In [None]:
sess.run(res)

In [None]:
res = tf.squeeze(tf.nn.conv2d(image, kernel, strides = [1,1,1,1], padding = "SAME"))

In [None]:
sess.run(res)
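The TensorFlow cells above can be cross-checked without a session. The NumPy sketch below reuses the same 4 X 5 input and 3 X 3 kernel; the helper `conv2d` is written only for this check and assumes stride 1, where SAME padding for a 3 X 3 kernel amounts to one ring of zeros.

```python
import numpy as np

image = np.array([[1, 1, 1, 0, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 0, 1, 0, 0]], dtype=float)
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]], dtype=float)

def conv2d(image, kernel, pad=0):
    """Stride-1 convolution (no kernel flip) with `pad` zeros on every side."""
    padded = np.pad(image, pad, mode="constant")
    kh, kw = kernel.shape
    out_h = padded.shape[0] - kh + 1
    out_w = padded.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(padded[r:r + kh, c:c + kw] * kernel)
    return out

valid = conv2d(image, kernel)        # corresponds to padding="VALID"
same = conv2d(image, kernel, pad=1)  # corresponds to padding="SAME" here

print(valid)
# [[3. 3. 3.]
#  [2. 2. 4.]]
print(same.shape)  # (4, 5)
```

The interior of the SAME result equals the VALID result; only the border values, which involve the padded zeros, are new.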

**Conv Layer In short**

* Accepts a volume of size W1 X H1 X D1
* Requires 4 hyperparameters
    * Number of filters K (usually a power of 2)
    * Filter spatial extent F
    * Stride S
    * Amount of zero padding P
* Produces a volume of size W2 X H2 X D2 where
    * W2 = (W1 - F + 2P) / S + 1
    * H2 = (H1 - F + 2P) / S + 1
    * D2 = K
* With parameter sharing, it introduces F * F * D1 weights per filter, for a total of (F * F * D1) * K weights and K biases
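The formulas in the summary translate directly into code. The following is a small sketch: the function names are hypothetical, and the 32 X 32 X 3 input with ten 5 X 5 filters is an assumed example.

```python
def conv_output_shape(w1, h1, k, f, s, p):
    """W2 = (W1 - F + 2P) / S + 1, likewise for H2, and D2 = K."""
    if (w1 - f + 2 * p) % s or (h1 - f + 2 * p) % s:
        raise ValueError("filter does not tile the padded input with this stride")
    w2 = (w1 - f + 2 * p) // s + 1
    h2 = (h1 - f + 2 * p) // s + 1
    return w2, h2, k

def conv_param_count(d1, k, f):
    """(F * F * D1) * K weights plus K biases."""
    weights = f * f * d1 * k
    biases = k
    return weights + biases

print(conv_output_shape(32, 32, k=10, f=5, s=1, p=0))  # (28, 28, 10)
print(conv_output_shape(32, 32, k=10, f=5, s=1, p=2))  # (32, 32, 10)
print(conv_param_count(d1=3, k=10, f=5))               # 760
```

The second call uses P = (F - 1) / 2 = 2, which preserves the 32 X 32 spatial size at stride 1.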

## Activation Functions

### ReLU

The Rectified Linear Unit is the most commonly used activation function in deep learning models. The function returns 0 if it receives any negative input, and for any positive value x it returns that value back.

*Mathematical notation:* **f(x) = max(0, x)**

Graphically it looks like this

![](https://i.imgur.com/gKA4kA9.jpg)

It's surprising that such a simple function (and one composed of two linear pieces) can allow your model to account for non-linearities and interactions so well. But the ReLU function works great in most applications, and it is very widely used as a result.
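A minimal NumPy sketch of ReLU, with arbitrary sample inputs, also shows why it counts as a non-linearity: applying it to a sum is not the same as summing the individual outputs.

```python
import numpy as np

def relu(x):
    """Element-wise f(x) = max(0, x)."""
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x).tolist())  # [0.0, 0.0, 0.0, 1.5, 3.0]

# A linear function would satisfy f(a + b) == f(a) + f(b); ReLU does not.
a, b = -1.0, 2.0
print(relu(a + b), relu(a) + relu(b))  # 1.0 2.0
```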

## Pooling

Pooling layers help reduce overfitting and improve performance by reducing the size of the input tensor. Typically they are used to scale down the input while keeping the important information.

**Pooling mechanisms:**
* Max - max pooling uses the maximum value from each cluster of neurons in the prior layer.
* Average - average pooling uses the mean value from each cluster of neurons in the prior layer.

In [None]:
inp = tf.constant([
    [
     [[1.0], [0.2], [2.0]],
     [[0.1], [1.2], [1.4]],
     [[1.1], [0.4], [0.4]]
    ] 
  ])

# 2 x 2 pooling window and stride 1, both given in NHWC order
kernel = [1, 2, 2, 1]
max_pool = tf.nn.max_pool(inp, kernel, [1, 1, 1, 1], "VALID")
sess.run(max_pool)

In [None]:
avg_pool = tf.nn.avg_pool(inp, kernel, [1, 1, 1, 1], "VALID")
sess.run(avg_pool)
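The same pooling results can be reproduced in plain NumPy, which makes it easier to see what each window contributes. The helper `pool2d` is a hypothetical name written for this sketch; the input values are the ones used in the TensorFlow cell, with the batch and channel dimensions dropped.

```python
import numpy as np

feature_map = np.array([[1.0, 0.2, 2.0],
                        [0.1, 1.2, 1.4],
                        [1.1, 0.4, 0.4]])

def pool2d(x, size, stride, reduce_fn):
    """Apply `reduce_fn` (e.g. np.max or np.mean) to each size x size window."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            top, left = r * stride, c * stride
            out[r, c] = reduce_fn(x[top:top + size, left:left + size])
    return out

max_pooled = pool2d(feature_map, size=2, stride=1, reduce_fn=np.max)
avg_pooled = pool2d(feature_map, size=2, stride=1, reduce_fn=np.mean)

print(max_pooled)  # 2 x 2 result: 1.2, 2.0 / 1.2, 1.4
print(avg_pooled)  # 2 x 2 result, approximately 0.625, 1.2 / 0.7, 0.85
```

Max pooling keeps the strongest response in each window, while average pooling smooths the window into a single mean value.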

## Fully Connected Layer

 After several convolutional and max pooling layers, the high-level reasoning in the neural network is done via fully connected layers. Neurons in a fully connected layer have connections to all activations in the previous layer, as seen in regular neural networks. Their activations can hence be computed with a matrix multiplication followed by a bias offset.
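The "matrix multiplication followed by a bias offset" can be sketched as follows. All shapes here (a batch of 2 pooled volumes of size 4 X 4 X 8 and 10 output units) and the random weights are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed shapes: a batch of 2 pooled feature volumes of size 4 x 4 x 8.
pooled = rng.standard_normal((2, 4, 4, 8))

# Flatten each volume so every activation feeds every output neuron.
flat = pooled.reshape(pooled.shape[0], -1)

num_outputs = 10  # assumed number of output units
weights = rng.standard_normal((flat.shape[1], num_outputs)) * 0.01
bias = np.zeros(num_outputs)

# Fully connected layer: one matrix multiplication plus a bias offset.
activations = flat @ weights + bias

print(flat.shape, activations.shape)  # (2, 128) (2, 10)
```

The weight matrix has 128 * 10 entries, one for every (input activation, output neuron) pair, which is what "fully connected" means.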

## CNN Model

### Hyperparameters

*one epoch = one forward pass and one backward pass of all the training examples*

*batch size = the number of training examples in one forward/backward pass. The higher the batch size, the more memory space you'll need.*

*The learning rate defines how quickly or slowly a network updates its parameters.*

*A low learning rate slows down the learning process but converges smoothly. A larger learning rate speeds up learning but may not converge. Usually a decaying learning rate is preferred.*
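The relationship between these hyperparameters can be sketched with a few lines of arithmetic. The dataset size, batch size, initial rate, and the exponential decay schedule below are all assumed values; exponential decay is just one common way to implement a decaying learning rate.

```python
import math

num_examples = 50_000  # assumed dataset size
batch_size = 128       # assumed batch size

# One epoch visits every example once, so it takes this many batches.
steps_per_epoch = math.ceil(num_examples / batch_size)

def decayed_learning_rate(initial_lr, step, decay_steps, decay_rate):
    """Exponential decay: scale the rate by `decay_rate` every `decay_steps` steps."""
    return initial_lr * decay_rate ** (step / decay_steps)

print(steps_per_epoch)  # 391

# Learning rate at the start of the first three epochs, halving once per epoch.
for epoch in range(3):
    step = epoch * steps_per_epoch
    rate = decayed_learning_rate(0.1, step, decay_steps=steps_per_epoch, decay_rate=0.5)
    print(epoch, rate)
```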



**to be continued..**