# Convolutional Neural Networks (CNNs)

>*Reference:*   
https://towardsdatascience.com/a-beginners-guide-to-convolutional-neural-networks-cnns-14649dbddce8

#### **Convolution:**  
Convolution is a fundamental operation in image processing and signal processing. It involves combining two functions to produce a third function that represents how one function affects the other. In the context of image processing, convolution is often used to process images by applying a filter or kernel to the image pixels.  
> In short, convolution describes how a filter modifies its input.

In convolutional networks, multiple filters slide across the image one at a time, and each filter learns to respond to a different feature of the input image. For example, a convolution can map the dark edges of an image onto a blank image. The network then uses this mapping and filter to learn how to identify a dark edge.

<div style="display: flex; flex-wrap: wrap; justify-content: center;">
    <figure style="margin: 10px;">
        <img src="./Images/CNN1.png" alt="Dark Edge Mapping using a Convolution" style="width: auto-width; height: auto-height; object-fit: cover;">
        <figcaption style="justify-content:center;">Dark Edge Mapping using a Convolution</figcaption>
    </figure>
</div>

#### How Convolution Works:

Convolution can be done in 2D or in 3D. A grayscale image has a single channel, so it is a 2D array of pixel values and we perform 2D convolution. An RGB image has three channels: red, green and blue. Here, we perform either 3D convolution or 2D convolution three times, once for each colour channel.


Convolution filters are small square matrices, usually 3 x 3.

<div style="display: flex; flex-wrap: wrap; justify-content: center;">
    <figure style="margin: 10px;">
        <img src="./Images/CNN2.png" alt="2D Convolution" style="width: auto-width; height: auto-height; object-fit: cover;">
        <figcaption style="justify-content:center;">2D Convolution</figcaption>
    </figure>
</div>

Take a look at the above image. We have the following:
> *Input Image:* 4x4 2D image without any padding.   
> *Convolution Filter:*  3x3 filter  
> *Output Image:* 2x2 image  

Consider the terms used:
> *Padding:*   
Padding is the process of adding extra pixels, usually zeros, around the edges of an image. This stops the output from shrinking and preserves spatial information at the borders.  
> *Stride:*   
Stride refers to the number of columns/rows by which the filter will move.
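
The effect of padding and stride on the output size can be sketched in a few lines of plain Python. This is a minimal illustration, not part of the referenced article: the helper names `conv_output_size` and `zero_pad` are hypothetical, and the sketch assumes zero padding applied equally on every side.

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Output length along one axis: floor((n + 2p - k) / s) + 1."""
    padded = input_size + 2 * padding
    if stride < 1 or kernel_size > padded:
        raise ValueError("stride must be >= 1 and the kernel must fit in the padded input")
    return (padded - kernel_size) // stride + 1


def zero_pad(image, padding):
    """Surround a 2D list of pixels with `padding` rows and columns of zeros (assumed padding value)."""
    width = len(image[0]) + 2 * padding
    border = [[0] * width for _ in range(padding)]
    middle = [[0] * padding + list(row) + [0] * padding for row in image]
    return border + middle + [row[:] for row in border]


print(conv_output_size(4, 3))             # 4x4 input, 3x3 filter, no padding -> 2
print(conv_output_size(4, 3, padding=1))  # one ring of zeros keeps the size at 4
print(conv_output_size(5, 3, stride=2))   # stride 2 skips every other position -> 2
print(zero_pad([[1, 2], [3, 4]], 1))
```

With no padding and a stride of 1, a 4x4 input and a 3x3 filter give a 2x2 output, which matches the sizes listed for the figure above.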

##### Process of Applying the Filter:

1. The filter is first placed over the first 3 x 3 patch of the input image, which spans cells [0,0] to [2,2]. Here that is:
<div style="display: flex; flex-wrap: wrap; justify-content: center;">
    <figure style="margin: 10px;">
        <img src="./Images/CNN3.png" alt="" style="width: auto-width; height: auto-height; object-fit: cover;">
        <figcaption style="justify-content:center;"></figcaption>
    </figure>
</div>

2. Once mapped, the values at the respective positions are multiplied with each other and then all the products are added. That would be:

$$ (2 \times 1) + (0 \times 0) + (1 \times 1) + (0 \times 0) + (1 \times 0) + (0 \times 0) + (0 \times 0) + (0 \times 1) + (1 \times 0) = 3 $$

3. The value obtained here is the filtered value of the first 3 x 3 patch of the input image. It is placed in the [0,0] cell of the output image.

4. Then move the filter across the image by the chosen stride, first to the right along a row and then down to the next row. Repeat the above steps until all the columns and rows of the input matrix have been covered and the output matrix is filled.

The output we obtain here is:
<div style="display: flex; flex-wrap: wrap; justify-content: center;">
    <figure style="margin: 10px;">
        <img src="./Images/CNN4.png" alt="" style="width: auto-width; height: auto-height; object-fit: cover;">
        <figcaption style="justify-content:center;"></figcaption>
    </figure>
</div>
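
The steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions: the function name `conv2d` is hypothetical, and the input and filter values are illustrative. Only the first 3 x 3 patch is chosen to agree with the worked sum above, assuming the first factor of each product is the image pixel and the second is the filter weight; the remaining pixels are made up.

```python
def conv2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and sum the element-wise products.

    Like most CNN libraries, this does not flip the kernel, so strictly
    speaking it computes a cross-correlation.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = (len(image) - k_rows) // stride + 1
    out_cols = (len(image[0]) - k_cols) // stride + 1
    output = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            top, left = i * stride, j * stride
            # Multiply each pixel under the filter by the matching weight, then add.
            total = sum(
                image[top + a][left + b] * kernel[a][b]
                for a in range(k_rows)
                for b in range(k_cols)
            )
            row.append(total)
        output.append(row)
    return output


# Illustrative values (assumed, not read from the figures).
image = [
    [2, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 2],
]
kernel = [
    [1, 0, 1],
    [0, 0, 0],
    [0, 1, 0],
]

print(conv2d(image, kernel))  # [[3, 2], [0, 1]]
```

The top-left entry, 3, is the same sum computed by hand in step 2.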

The above process is for a 2D image, that is, an image in grayscale. For a coloured image the same process is applied three times: once each for the red, green and blue channels. At the end, the outputs for all three channels are combined to obtain the final output image.
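
A minimal sketch of the per-channel process follows. It assumes that "combined" means an element-wise sum of the three per-channel outputs, which is how a standard convolutional layer merges input channels. The function names and all pixel and filter values are hypothetical, and 3x3 channels with 2x2 filters are used only to keep the example short.

```python
def conv2d(channel, kernel):
    """2D convolution of one channel with stride 1 and no padding."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    return [
        [
            sum(channel[i + a][j + b] * kernel[a][b]
                for a in range(k_rows) for b in range(k_cols))
            for j in range(len(channel[0]) - k_cols + 1)
        ]
        for i in range(len(channel) - k_rows + 1)
    ]


def conv_multichannel(channels, kernels):
    """Convolve each channel with its own kernel, then sum the results (assumed combination rule)."""
    per_channel = [conv2d(c, k) for c, k in zip(channels, kernels)]
    rows, cols = len(per_channel[0]), len(per_channel[0][0])
    return [[sum(m[i][j] for m in per_channel) for j in range(cols)] for i in range(rows)]


# Illustrative channel data and filters (made up for this sketch).
red = [[1, 0, 2], [0, 1, 0], [2, 0, 1]]
green = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
blue = [[1, 1, 1], [0, 0, 0], [1, 1, 1]]

k_red = [[1, 0], [0, 1]]
k_green = [[0, 1], [1, 0]]
k_blue = [[1, 1], [0, 0]]

print(conv_multichannel([red, green, blue], [k_red, k_green, k_blue]))  # [[6, 2], [0, 4]]
```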

#### ReLU Activation Function:

> *Reference:*  
https://builtin.com/machine-learning/relu-activation-function  
https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/

The rectified linear activation function, or ReLU for short, is a piecewise linear function that outputs its input directly if the input is positive and outputs zero otherwise. It has become the default activation function for many types of neural networks because a model that uses it is easier to train and often achieves better performance.
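
Written as a formula, this is f(x) = max(0, x). A minimal sketch in plain Python, with the hypothetical helper names `relu` and `relu_map`, applied to a single value and to a small made-up feature map:

```python
def relu(x):
    """Return x if it is positive, otherwise 0."""
    return max(0, x)


def relu_map(feature_map):
    """Apply ReLU to every value of a 2D feature map."""
    return [[relu(value) for value in row] for row in feature_map]


print([relu(v) for v in [-2, 0, 3.5]])    # [0, 0, 3.5]
print(relu_map([[-1, 2], [4, -3]]))       # [[0, 2], [4, 0]]
```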

#### Pooling

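Pooling is used to reduce the x and y dimensions (width and height) of a 3D input while leaving its depth, the number of channels, unchanged. It is similar to convolution in that a window slides across the input, but the window has no learned weights: it only summarizes the values it covers.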

There are two types of pooling that are commonly used:
1. Max pooling - the maximum value within the filter window is taken
2. Average pooling - the average of the values within the filter window is taken
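
Both pooling types can be sketched with one small function. This is an illustrative sketch: the name `pool2d`, the 2x2 window with stride 2, and the feature map values are assumptions chosen for the example, not taken from the references.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Summarize each size x size window by its maximum or its average."""
    if mode not in ("max", "average"):
        raise ValueError("mode must be 'max' or 'average'")
    pooled = []
    for top in range(0, len(feature_map) - size + 1, stride):
        row = []
        for left in range(0, len(feature_map[0]) - size + 1, stride):
            window = [
                feature_map[top + a][left + b]
                for a in range(size)
                for b in range(size)
            ]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        pooled.append(row)
    return pooled


# Made-up 4x4 feature map.
feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 0],
    [7, 2, 9, 8],
    [3, 4, 6, 5],
]

print(pool2d(feature_map, mode="max"))      # [[6, 4], [7, 9]]
print(pool2d(feature_map, mode="average"))  # [[3.75, 1.75], [4.0, 7.0]]
```

In both cases the 4x4 map shrinks to 2x2, so width and height are halved.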

#### Common CNN Architecture:

<div style="display: flex; flex-wrap: wrap; justify-content: center;">
    <figure style="margin: 10px;">
        <img src="./Images/CNN5.png" alt="" style="width: auto-width; height: auto-height; object-fit: cover;">
        <figcaption></figcaption>
    </figure>
</div>

When it comes to implementing CNNs, many successful architectures use one or more stacks of convolution + pooling layers with ReLU activation, followed by a flatten layer and then a dense layer.
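
The layer order described here can be traced with a toy forward pass in plain Python. This is a sketch under stated assumptions, not a trainable network: there is one convolution + pool stack, one filter, and a dense layer with two outputs. All function names, the input image, the vertical-edge filter and the dense weights are hypothetical values picked so the result is easy to check by hand. A real model would use a deep learning library and learn the filter and weights from data.

```python
def conv2d(image, kernel):
    """2D convolution with stride 1 and no padding."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(k_rows) for b in range(k_cols))
            for j in range(len(image[0]) - k_cols + 1)
        ]
        for i in range(len(image) - k_rows + 1)
    ]


def relu_map(feature_map):
    """Replace negative values with 0."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling (stride equal to the window size)."""
    return [
        [
            max(feature_map[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


def flatten(feature_map):
    """Turn a 2D feature map into a flat list."""
    return [v for row in feature_map for v in row]


def dense(vector, weights, bias):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * v for w, v in zip(row, vector)) + b for row, b in zip(weights, bias)]


def forward(image, kernel, weights, bias):
    """Convolution -> ReLU -> pooling -> flatten -> dense."""
    x = conv2d(image, kernel)
    x = relu_map(x)
    x = max_pool(x)
    x = flatten(x)
    return dense(x, weights, bias)


# 6x6 image with a vertical edge: bright left half, dark right half (made up).
image = [[1, 1, 1, 0, 0, 0] for _ in range(6)]
# Hand-picked vertical-edge filter (in practice this would be learned).
kernel = [[1, 0, -1]] * 3
# Dense layer with 4 inputs and 2 outputs (arbitrary illustrative weights).
weights = [[0.25, 0.25, 0.25, 0.25], [1, -1, 1, -1]]
bias = [0.0, 0.5]

print(forward(image, kernel, weights, bias))  # [3.0, 0.5]
```

The 6x6 input becomes a 4x4 feature map after the convolution, a 2x2 map after pooling, and a vector of four values after flattening, which the dense layer maps to two outputs.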