## Convolutional Neural Networks


Convolutional neural networks (CNNs) are widely used and usually outperform ordinary (fully connected) neural networks in tasks such as image recognition and speech recognition.

Take image recognition as an example. In an ordinary network, each pixel is fed to the network as a separate input, with no notion of which pixels are neighbors. A convolutional neural network instead searches for patterns in the image by looking at groups of adjacent pixels rather than individual ones.

A convolutional network is composed of convolutional layers. Each neuron in the first convolutional layer takes as input a group of pixels from the input layer.

This architecture allows the network to concentrate on low-level features in the first layer and then assemble these features into larger, higher-level features in the next hidden layer, and so on.

![image.png](attachment:image.png)


The receptive field of a particular neuron has two dimensions. For pixels, it is a window with a height and a width, such as 3x3 or 5x5 pixels, whose values feed into one neuron of a convolutional layer.





## Stride

The stride is the step size by which the receptive field (the window of pixels) shifts across the image, horizontally and vertically, from one convolution step to the next. The default is 1.

![image-2.png](attachment:image-2.png)

Overlapping columns or rows of pixels can be shared by adjacent receptive fields. 

The stride determines the amount of overlap between windows and the size of the following layers in the network: a larger stride means less overlap and a smaller output.
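To make the effect of the stride concrete, here is a minimal sketch in plain Python that lists where each window starts along one dimension. The helper names `output_size` and `window_starts` are illustrative, and the sketch assumes no padding.

```python
def output_size(input_size, window, stride=1):
    """Number of window positions along one dimension, without padding."""
    return (input_size - window) // stride + 1


def window_starts(input_size, window, stride=1):
    """Start index of every complete window along one dimension."""
    return list(range(0, input_size - window + 1, stride))


# A row of 7 pixels scanned by a window that is 3 pixels wide.
print(window_starts(7, 3, stride=1))  # [0, 1, 2, 3, 4] -> neighbors share 2 pixels
print(window_starts(7, 3, stride=2))  # [0, 2, 4]       -> neighbors share 1 pixel
print(output_size(7, 3, stride=2))    # 3
```

With a stride of 1 the row produces 5 values for the next layer; with a stride of 2 it produces only 3.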





## Padding

Padding is the process of adding zeros symmetrically around the border of the input matrix, typically so that the output keeps the same dimensions as the input.

![image-3.png](attachment:image-3.png)

With larger strides, we eventually run out of pixel columns (or rows) to complete the last window. This is a problem because every neuron's receptive field must have the same size.

There are two possible solutions. With **"valid" padding**, no padding is added: the algorithm ignores the margin pixels that do not fill a complete window, so every receptive field lies entirely inside the image.

**"Same" padding** is used when the margins contain informative pixels. Extra dummy columns or rows of zeros, carrying no information, are added so that the next window after the stride has the same height x width as all the other windows.
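The difference between the two options can be sketched along a single row of pixels. This is only an illustration: the function names are made up, and the "same" rule used here (output length equal to the input length divided by the stride, rounded up, with any odd extra zero placed at the end) is one common convention that is assumed rather than taken from a specific library.

```python
import math


def valid_output_size(input_size, window, stride):
    """'valid': no padding, windows that would run past the edge are dropped."""
    return (input_size - window) // stride + 1


def same_padding(input_size, window, stride):
    """'same': zeros to add (left, right) so the output has ceil(input / stride) values."""
    out = math.ceil(input_size / stride)
    total = max((out - 1) * stride + window - input_size, 0)
    left = total // 2
    return left, total - left


def pad_row(row, left, right):
    """Add dummy zero pixels on both sides of a row."""
    return [0] * left + list(row) + [0] * right


row = [1, 2, 3, 4, 5, 6]
print(valid_output_size(len(row), window=3, stride=2))  # 2 (the last pixel is never covered)
left, right = same_padding(len(row), window=3, stride=2)
print(pad_row(row, left, right))  # [1, 2, 3, 4, 5, 6, 0]
```

After padding, a third window starting at index 4 fits, so the margin pixel contributes to the output.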




## Filter and feature map


Each window is reduced to **one single value** that represents the pixels in that window. This value is obtained through a filter (also called a kernel): a matrix of weights with the same height x width as the window. For example, each cell of the matrix on the right of the image below corresponds to one pixel. However, it does not hold the raw pixel value (for example, the RGB value): each pixel value is multiplied by the corresponding **filter value**, and the products are then added up. The resulting value is the input to a neuron of the convolutional layer. **The filter values are learned by the algorithm while training the model**.


![image-4.png](attachment:image-4.png)


A **feature map** is the matrix containing the representative values of all the windows after the filter has been applied.
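As a rough sketch of how a feature map is computed, the plain-Python function below slides a filter over a small image and stores one value per window. The name `feature_map` and the filter values are chosen by hand for illustration; in a real network the filter values would be learned during training.

```python
def feature_map(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    rows = (len(image) - kh) // stride + 1
    cols = (len(image[0]) - kw) // stride + 1
    result = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            top, left = r * stride, c * stride
            # Multiply each pixel by the matching filter value, then add up.
            value = sum(
                image[top + i][left + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            out_row.append(value)
        result.append(out_row)
    return result


image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [0, 1, 2, 3],
]
kernel = [
    [1, 0],
    [0, -1],
]
print(feature_map(image, kernel))  # [[-4, -4, 2], [-4, -4, 4], [6, 6, 6]]
```

A 4x4 image scanned by a 2x2 filter with stride 1 gives a 3x3 feature map, one value per window.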


![image-5.png](attachment:image-5.png)


A filter can highlight particular features of the image. For example, a vertical filter responds to vertical lines and a horizontal filter responds to horizontal lines.

The filter transforms the pixel values of a particular window by multiplying each pixel by a filter value and summing the products.
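A small experiment shows how a vertical and a horizontal filter react differently to the same image. The two 3x3 filters below are hand-written examples, and `apply_filter` is an illustrative helper (stride 1, no padding), not a library function.

```python
def apply_filter(image, kernel):
    """Weighted sum of every k x k window of `image` (stride 1, no padding)."""
    k = len(kernel)
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(k) for j in range(k))
            for c in range(len(image[0]) - k + 1)
        ]
        for r in range(len(image) - k + 1)
    ]


# Image with a vertical edge: dark left half, bright right half.
image = [[0, 0, 10, 10] for _ in range(4)]

vertical = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
horizontal = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]

print(apply_filter(image, vertical))    # [[30, 30], [30, 30]] -> strong response
print(apply_filter(image, horizontal))  # [[0, 0], [0, 0]]     -> no response
```

The vertical filter compares the left and right columns of each window, so it fires on the vertical edge, while the horizontal filter compares the top and bottom rows and finds no difference.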


![image-6.png](attachment:image-6.png)





**A feature map is the output of one filter applied to the previous layer, and it feeds into the next convolutional layer of the neural network. Several feature maps can be combined into a single input for a convolutional layer; in other words, a convolutional layer can take more than one feature map as input.**
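One way to picture this is a layer that receives two feature maps and applies two filters. In the sketch below, which assumes stride 1 and no padding, each filter holds one small kernel per input map, the per-map results are added together, and each filter yields one output feature map. The name `conv_layer` and all the numbers are illustrative.

```python
def conv_layer(inputs, filters):
    """inputs: list of 2-D maps of equal size.
    filters: list of filters, each a list with one k x k kernel per input map.
    Returns one output feature map per filter."""
    height, width = len(inputs[0]), len(inputs[0][0])
    outputs = []
    for kernels in filters:  # one filter -> one output feature map
        k = len(kernels[0])
        out = [[0] * (width - k + 1) for _ in range(height - k + 1)]
        for fmap, kernel in zip(inputs, kernels):  # combine all input maps
            for r in range(height - k + 1):
                for c in range(width - k + 1):
                    out[r][c] += sum(
                        fmap[r + i][c + j] * kernel[i][j]
                        for i in range(k)
                        for j in range(k)
                    )
        outputs.append(out)
    return outputs


inputs = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],  # feature map 0
    [[1, 0, 1], [0, 1, 0], [1, 0, 1]],  # feature map 1
]
filters = [
    [[[1, 0], [0, 1]], [[1, 1], [1, 1]]],  # filter A: one 2x2 kernel per input map
    [[[0, 1], [1, 0]], [[0, 0], [0, 0]]],  # filter B
]
outputs = conv_layer(inputs, filters)
print(outputs[0])  # [[8, 10], [14, 16]]
print(outputs[1])  # [[6, 8], [12, 14]]
```

Two input maps go in, and the number of output maps equals the number of filters.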


![image-7.png](attachment:image-7.png)


## Channels

In a color image, each pixel is a combination of three channels, in this case Red, Green, and Blue (RGB). Each channel value ranges from 0 (none of that color) to 255 (full intensity), so a pixel with all three channels at 0 is black and a pixel with all three at 255 is white. Combining the channels in different proportions produces a wide range of colors.

RGB values are often written in hexadecimal notation (for example, `#FF0000` for pure red). These RGB pixel values are filtered in the receptive field/window to produce a feature map.
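The relation between hexadecimal notation and the three channels can be sketched in a few lines. The helpers `hex_to_rgb` and `split_channels` are illustrative names, and the 2x2 image is made up for the example.

```python
def hex_to_rgb(color):
    """Convert a '#RRGGBB' string into a tuple of three 0-255 integers."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def split_channels(image):
    """Split a 2-D grid of (R, G, B) pixels into three single-channel grids."""
    return [[[pixel[ch] for pixel in row] for row in image] for ch in range(3)]


# A tiny 2x2 image: red, green, blue, and white pixels.
image = [
    [hex_to_rgb("#FF0000"), hex_to_rgb("#00FF00")],
    [hex_to_rgb("#0000FF"), hex_to_rgb("#FFFFFF")],
]
red, green, blue = split_channels(image)

print(hex_to_rgb("#FFA500"))  # (255, 165, 0)
print(red)                    # [[255, 0], [0, 255]]
```

Each of the three grids has the same height and width as the image, so a filter can be applied to each channel in the same way as to a single grayscale grid.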

## Pooling Layer

A pooling layer subsamples (shrinks) its input in order to reduce the computational load, the memory usage, and the number of parameters to be estimated in later layers (thereby limiting the risk of overfitting).

* There are two approaches to pooling:

    1) **Max pooling**: the maximum pixel value of the window is selected
    
    2) **Average pooling**: the average value of all the pixels in the window is selected
    
The stride determines how much the input shrinks: with a stride of 2, the height and the width are each roughly halved, so the output has about a quarter as many values. Reducing the image by pooling reduces the number of parameters to be estimated in the following layers and significantly lowers the computational load.
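Both pooling approaches can be sketched with one small function. The name `pool` and its `mode` argument are illustrative, and the sketch assumes a 2x2 window with a stride of 2 and no padding.

```python
def pool(image, size=2, stride=2, mode="max"):
    """Reduce every size x size window of `image` to a single value."""
    if mode == "max":
        reduce = max
    else:
        reduce = lambda values: sum(values) / len(values)
    rows = (len(image) - size) // stride + 1
    cols = (len(image[0]) - size) // stride + 1
    return [
        [
            reduce([
                image[r * stride + i][c * stride + j]
                for i in range(size)
                for j in range(size)
            ])
            for c in range(cols)
        ]
        for r in range(rows)
    ]


image = [
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [9, 2, 1, 0],
    [3, 4, 5, 6],
]
print(pool(image, mode="max"))      # [[7, 8], [9, 6]]
print(pool(image, mode="average"))  # [[4.0, 5.0], [4.5, 3.0]]
```

The 4x4 input becomes a 2x2 output: each side is halved, and 16 values are reduced to 4.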


![image.png](attachment:image.png)