# Convolutional Neural Networks

- A CNN uses **convolutional layers** to help alleviate the issues an ANN has with image data.
- A convolutional layer is created when we apply multiple image filters to the input images. 
- The layer will then be trained to figure out the best filter weight values.

A CNN also helps reduce parameters by focusing on **local connectivity**.<br> 
- Not all neurons will be fully connected.
- Instead, neurons are only connected to a subset of local neurons in the next layer (these local connections end up being the filters).

### A fully connected neuron looks like this: <br>
![ssn.png](attachment:ssn.png)

So many parameters slow down the network and reduce its efficiency. That's why local connectivity is used: <br>
![nknk.png](attachment:nknk.png)

So we can see the difference more clearly:<br>
![karsilastirma.png](attachment:karsilastirma.png)
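
To put rough numbers on the comparison, the sketch below counts the parameters of a fully connected layer and of a convolutional layer on the same input. The helper names, the `34 x 34 x 3` input, the 100 dense units and the 12 filters of size `3 x 3` are all assumptions chosen for illustration, not values taken from the figures.

```python
def dense_param_count(n_inputs: int, n_units: int) -> int:
    """Weights plus one bias per unit in a fully connected layer."""
    return n_inputs * n_units + n_units


def conv_param_count(filter_size: int, in_depth: int, n_filters: int) -> int:
    """Weights plus one bias per filter in a convolutional layer."""
    return (filter_size * filter_size * in_depth + 1) * n_filters


# Hypothetical 34 x 34 x 3 input image, flattened for the dense layer.
n_inputs = 34 * 34 * 3  # 3468 values

dense_params = dense_param_count(n_inputs, n_units=100)
conv_params = conv_param_count(filter_size=3, in_depth=3, n_filters=12)

print(dense_params)  # 346900
print(conv_params)   # 336
```

The convolutional count does not depend on the image size at all, because the same small filters are reused at every position.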

Some mathematics involved in the convolution process:<br>
- Convolution layers consist of a set of learnable filters. Every filter has a small width and height and the same depth as the input volume (3 if the input is an RGB image).<br><br>
- For example, suppose we run a convolution on an image with dimensions `34 x 34 x 3`. The filters can have size `a x a x 3`, where `a` can be 3, 5, 7, etc., but is small compared to the image dimensions.<br><br>
- During the forward pass, we slide each filter across the whole input volume step by step. The step size is called the **stride** (often 1, but it can be 2, 3 or even 4 for high-dimensional images). At each position we compute the dot product between the filter weights and the corresponding patch of the input volume.<br><br>
- As we slide each filter we get a 2-D output. Stacking these outputs gives an output volume whose depth equals the number of filters. The network learns all the filters.
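
The sliding-window computation described above can be sketched in NumPy. This is a minimal, unoptimized sketch under stated assumptions: the function name `conv2d_forward`, the `(height, width, depth)` array layout and the random inputs are made up for illustration, padding and bias are left out, and the filter is not flipped (so strictly speaking this is cross-correlation, which is what deep learning libraries usually compute under the name convolution).

```python
import numpy as np


def conv2d_forward(image, filters, stride=1):
    """Naive convolution forward pass without padding.

    image:   array of shape (height, width, depth)
    filters: array of shape (n_filters, f, f, depth)
    returns: array of shape (out_h, out_w, n_filters)
    """
    height, width, depth = image.shape
    n_filters, f, _, filter_depth = filters.shape
    assert depth == filter_depth, "filter depth must match input depth"

    out_h = (height - f) // stride + 1
    out_w = (width - f) // stride + 1
    output = np.zeros((out_h, out_w, n_filters))

    for k in range(n_filters):
        for i in range(out_h):
            for j in range(out_w):
                top, left = i * stride, j * stride
                patch = image[top:top + f, left:left + f, :]
                # Dot product between the filter weights and the patch.
                output[i, j, k] = np.sum(patch * filters[k])
    return output


rng = np.random.default_rng(0)
image = rng.random((34, 34, 3))      # a 34 x 34 x 3 input, as in the example
filters = rng.random((12, 3, 3, 3))  # 12 hypothetical filters of size 3 x 3 x 3

print(conv2d_forward(image, filters, stride=1).shape)  # (32, 32, 12)
print(conv2d_forward(image, filters, stride=2).shape)  # (16, 16, 12)
```

Each filter contributes one 2-D slice, so the output depth is 12 in both calls; only the width and height change with the stride.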

### Layers Used for Convolutional Networks

Let's take an example image of `8 x 8 x 3`.<br>
There are 4 types of layers:<br><br>
- **Input Layer:** This layer holds the raw input of the image with<br>
> - Width ==========> 8<br>
> - Height ==========> 8<br>
> - Depth ==========> 3<br>

- **Convolution Layer:** This layer computes the output volume by computing the dot product between all filters and image patches. Suppose we use a total of 12 filters for this layer, with stride 1 and padding that preserves the width and height. We then get an output volume of dimension `8 x 8 x 12`.<br><br>
- **Activation Function Layer:** This layer applies an element-wise activation function to the output of the convolution layer. Some common activation functions are `ReLU`, `Sigmoid`, `tanh`, `Leaky ReLU`, etc. The size of the volume remains unchanged, so the output volume has dimension `8 x 8 x 12`.<br><br>
- **Pool Layer:** This layer is periodically inserted in convolutional networks. Its main function is to reduce the size of the volume, which speeds up computation, reduces memory use and also helps limit overfitting. Two common types of pooling layers are **max pooling** and **average pooling**. If we use a max pool with `2 x 2` filters and stride 2, the resulting volume has dimension `4 x 4 x 12`.
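
The shapes quoted for the four layers can be traced with a few small helpers. This is only a bookkeeping sketch: the function names are hypothetical, and the `3 x 3` filter with padding 1 and stride 1 is an assumption, chosen because it keeps the `8 x 8` width and height in the convolution layer.

```python
def conv_shape(shape, n_filters, f=3, padding=1, stride=1):
    """Shape after a convolution layer (assumed 3 x 3 filters, padding 1)."""
    height, width, _ = shape
    out_h = (height + 2 * padding - f) // stride + 1
    out_w = (width + 2 * padding - f) // stride + 1
    return (out_h, out_w, n_filters)


def activation_shape(shape):
    """Element-wise activations leave the shape unchanged."""
    return shape


def pool_shape(shape, window=2, stride=2):
    """Shape after pooling; the depth is untouched."""
    height, width, depth = shape
    return ((height - window) // stride + 1,
            (width - window) // stride + 1,
            depth)


input_shape = (8, 8, 3)
after_conv = conv_shape(input_shape, n_filters=12)
after_act = activation_shape(after_conv)
after_pool = pool_shape(after_act)

print(after_conv)  # (8, 8, 12)
print(after_act)   # (8, 8, 12)
print(after_pool)  # (4, 4, 12)
```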

### Convolution Layers

- Running an ANN on the MNIST data set results in a network with relatively good accuracy.
- However, there are some issues with always using ANN for image data.

**NOTE:** A single image is actually a 3-D tensor.

How do we perform a convolution on a color image? We end up with a 3-D filter, with values for each color channel.
- Often convolutional layers are fed into another convolutional layer.
- This allows the networks to discover patterns within patterns, usually with more complexity for later convolutional layers.

> n = input shape<br>
> f = filter shape<br>
> p = padding shape<br>
> s = stride

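Output matrix shape (for a square input and a square filter) is determined by:<br>
> `(⌊(n + 2 * p - f) / s⌋ + 1) x (⌊(n + 2 * p - f) / s⌋ + 1)` = `(m x m)`<br>
> `⌊ ⌋` rounds down to the nearest integer, and `m` is the size of the output layer along each dimension.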

**NOTE:** Even when padding is used to keep the original size, a stride (***s***) greater than 1 reduces the output layer's size.
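
The output-size formula is easy to check with a small helper. The function name `conv_output_size` and the example values are assumptions for illustration; integer division is used so that a stride which does not divide evenly rounds down.

```python
def conv_output_size(n: int, f: int, p: int = 0, s: int = 1) -> int:
    """Output size along one dimension: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1


print(conv_output_size(n=8, f=3, p=0, s=1))  # 6: no padding shrinks the output
print(conv_output_size(n=8, f=3, p=1, s=1))  # 8: padding keeps the original size
print(conv_output_size(n=8, f=3, p=1, s=2))  # 4: stride 2 reduces it despite padding
```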

### Pooling Layers

- Even with local connectivity, when dealing with color images and possibly tens or hundreds of filters, we will have a large number of parameters.<br>
![cnnfilt.png](attachment:cnnfilt.png)

- Pooling layers accept convolutional layers as input.<br>
- Because our convolutional layers will often have many filters, we can use pooling layers to reduce the size of their output:<br>
![pool.png](attachment:pool.png)

We can reduce the size by subsampling. The best-known pooling types are:<br>
- Max Pooling
- Average Pooling

### Max Pooling

![max-pool.png](attachment:max-pool.png)

### Average Pooling

In the example below:<br>
> ***stride*** ==========> `2`<br> 
> ***window*** ==========> `(2 x 2)`

![avg.png](attachment:avg.png)
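
Both pooling types can be sketched with one NumPy function that slides a window over a 2-D feature map. The name `pool2d`, its parameters and the `4 x 4` input values are made up for illustration and are not the values shown in the figures.

```python
import numpy as np


def pool2d(feature_map, window=2, stride=2, mode="max"):
    """Apply max or average pooling to a 2-D feature map."""
    if mode not in ("max", "avg"):
        raise ValueError("mode must be 'max' or 'avg'")
    reduce_fn = np.max if mode == "max" else np.mean

    height, width = feature_map.shape
    out_h = (height - window) // stride + 1
    out_w = (width - window) // stride + 1
    pooled = np.zeros((out_h, out_w))

    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            patch = feature_map[top:top + window, left:left + window]
            pooled[i, j] = reduce_fn(patch)
    return pooled


# Hypothetical 4 x 4 feature map.
feature_map = np.array([
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [4, 2, 0, 1],
    [3, 1, 2, 5],
], dtype=float)

print(pool2d(feature_map, mode="max").tolist())  # [[7.0, 8.0], [4.0, 5.0]]
print(pool2d(feature_map, mode="avg").tolist())  # [[4.0, 5.0], [2.5, 2.0]]
```

With a `(2 x 2)` window and stride 2, the 16 input values shrink to 4. For a volume with several filters, the same operation would be applied to each depth slice independently.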

### Another Example:

![ppool.png](attachment:ppool.png)

- This greatly reduces the number of values passed on to later layers, and with it the number of parameters those layers need.<br>
- A pooling layer ends up removing a lot of information: even a small pooling 'kernel' of `(2 x 2)` with a stride of 2 removes 75% of the data.

- Another common technique deployed with CNN is called **Dropout**.
- Dropout can be thought of as a form of regularization to help prevent overfitting.
- During training, units are randomly dropped, along with their connections.
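
A minimal NumPy sketch of dropout is shown below. It uses the "inverted dropout" variant, which rescales the surviving units during training so that nothing needs to change at inference time; that choice, the function name `dropout` and the drop probability of 0.5 are assumptions for illustration.

```python
import numpy as np


def dropout(activations, drop_prob, rng, training=True):
    """Inverted dropout: zero units at random and rescale the survivors."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        # At inference time every unit is kept.
        return activations
    keep_prob = 1.0 - drop_prob
    # Boolean mask: True where a unit survives this training step.
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob


rng = np.random.default_rng(42)
activations = np.ones((4, 5))

dropped = dropout(activations, drop_prob=0.5, rng=rng)
print(dropped)  # each entry is either 0.0 (dropped) or 2.0 (kept and rescaled)

unchanged = dropout(activations, drop_prob=0.5, rng=rng, training=False)
print(np.array_equal(unchanged, activations))  # True
```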

Dropout helps prevent units from 'co-adapting' too much.

Some of the most famous CNN architectures are:<br>

> - LeNet-5 ==========> Yann LeCun <br>
> - AlexNet ==========> Alex Krizhevsky et al. <br>
> - GoogLeNet ==========> Google <br>
> - ResNet ==========> Kaiming He et al. <br>

### Flattening Layers

A flattening layer turns the 2-D pooled feature maps into a single 1-D vector.

![flatt.png](attachment:flatt.png)
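
In NumPy, flattening amounts to a reshape. The sketch below assumes a pooled volume of shape `4 x 4 x 12` (as in the earlier layer example) and a hypothetical batch of 32 such volumes.

```python
import numpy as np

# Stand-in for the output of a pooling layer: 4 x 4 x 12.
pooled = np.arange(4 * 4 * 12).reshape(4, 4, 12)

flat = pooled.reshape(-1)  # one long 1-D vector
print(flat.shape)  # (192,)

# With a batch dimension, keep the batch axis and flatten the rest.
batch = np.zeros((32, 4, 4, 12))
flat_batch = batch.reshape(batch.shape[0], -1)
print(flat_batch.shape)  # (32, 192)
```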

### Fully Connected Layers

Fully connected layers are optional. When used, they are placed at the end of the network: the input is the output of the previous layer, and the output is N neurons, where N is the number of classes the model predicts, to finalize the classification.
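
A final fully connected layer can be sketched as a matrix product that maps the flattened features to N class scores. The softmax step, the 192 input features, the 10 classes and the random weights are all assumptions for illustration; a trained network would have learned weights instead.

```python
import numpy as np


def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp)


def dense_forward(features, weights, bias):
    """Fully connected layer: every input feeds every output neuron."""
    return features @ weights + bias


rng = np.random.default_rng(0)
n_features, n_classes = 192, 10  # hypothetical sizes

features = rng.random(n_features)  # stand-in for a flattened volume
weights = rng.normal(scale=0.01, size=(n_features, n_classes))
bias = np.zeros(n_classes)

probabilities = softmax(dense_forward(features, weights, bias))
predicted_class = int(np.argmax(probabilities))

print(probabilities.shape)  # (10,)
```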

### Final Note

There are lots of CNN architectures; there is no single true way to build a good model.