# Convolutional Neural Network

**Remember:** Overfitting is detected by comparing the validation loss to the training loss. If the training loss is much lower than the validation loss, then the model might be overfitting.

We have seen that we can build a neural network model with a **Multilayer Perceptron** (MLP), and we have tested it on a dataset of handwritten digit images (MNIST). The problem is that we had to convert each image into a vector before feeding it to the network, and an MLP uses fully connected layers, which creates a very large number of parameters. That means a lot of computation, and flattening the image discards the spatial arrangement of its pixels.
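To get a feel for the numbers, here is a minimal sketch in plain Python. It assumes 28 by 28 pixel images (the MNIST size) and a hypothetical first hidden layer of 512 nodes; the layer size and the helper names `flatten` and `dense_param_count` are illustrative assumptions, not part of any model described here.

```python
def flatten(image):
    """Turn a 2D image (a list of rows) into one flat vector."""
    return [pixel for row in image for pixel in row]


def dense_param_count(n_inputs, n_units):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units


# A blank 28x28 image, the size of an MNIST digit.
image = [[0] * 28 for _ in range(28)]
vector = flatten(image)

# 512 hidden nodes is an assumed, illustrative layer size.
params = dense_param_count(len(vector), 512)

print(len(vector))  # 784
print(params)       # 401920
```

Even this single, modest layer needs about 400,000 parameters, and after flattening, pixels that were vertical neighbours in the image sit 28 positions apart in the vector.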

A better option for image classification is a Convolutional Neural Network (CNN):

* It works with matrices, so we can feed the model the image as a matrix, without flattening it.
* It uses sparsely connected layers, which saves a lot of computation.


**Local Connectivity**

Locally connected layers use far fewer parameters than densely connected layers. The idea is to represent the image as a matrix and divide it into areas, for example 4. Each area is connected to its own hidden node, and that node only learns about that area of the matrix (image). The hidden nodes then report to the output layer, where the information they learned is combined. The hidden nodes can also **share weights**. We can increase the number of nodes by creating several **collections of hidden nodes**, where each node in a collection is responsible for learning a specific area of the image.
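The saving can be made concrete with a small count. The sketch below assumes a tiny 4 by 4 image split into 4 areas of 2 by 2 pixels, one hidden node per area, and ignores biases; the sizes and the helper name `quadrants` are illustrative assumptions.

```python
def quadrants(image):
    """Split a square image (a list of rows) into 4 equal areas."""
    half = len(image) // 2
    areas = []
    for top in (0, half):
        for left in (0, half):
            areas.append([row[left:left + half] for row in image[top:top + half]])
    return areas


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
areas = quadrants(image)

n_pixels = len(image) * len(image[0])            # 16 pixels in the image
n_hidden = len(areas)                            # 4 hidden nodes, one per area
area_pixels = len(areas[0]) * len(areas[0][0])   # 4 pixels per area

dense_weights = n_pixels * n_hidden     # every node sees every pixel
local_weights = area_pixels * n_hidden  # every node sees only its own area
shared_weights = area_pixels            # all nodes reuse one set of weights

print(dense_weights, local_weights, shared_weights)  # 64 16 4
```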

**Convolutional Layers**

Building on the idea of the locally connected layer, we introduce **the convolutional layer**. We choose the width and height of a convolutional window; its weights are arranged in a grid called a filter. The window is then slid horizontally and vertically across the matrix. At each position, the window selects a small patch of pixels and connects it to a single hidden node. The collection of these hidden nodes is the convolutional layer.
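The sliding window can be sketched in a few lines of plain Python. This is a simplified illustration, not how Keras implements it: it assumes a single-channel image stored as a list of rows, no padding, and it leaves out the bias and the activation function. The function name `convolve2d` and the sample values are assumptions made for the example.

```python
def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and return the grid of node values."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (len(image) - k_h) // stride + 1
    out_w = (len(image[0]) - k_w) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride
            # Weighted sum of the patch currently under the window.
            total = 0
            for a in range(k_h):
                for b in range(k_w):
                    total += image[top + a][left + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output


image = [
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
]
# A 2x2 filter that responds to a top-left to bottom-right diagonal.
diagonal_filter = [
    [1, 0],
    [0, 1],
]

print(convolve2d(image, diagonal_filter))            # [[2, 1, 0], [1, 2, 1], [0, 1, 2]]
print(convolve2d(image, diagonal_filter, stride=2))  # [[2, 0], [0, 2]]
```

Each number in the output is the value of one hidden node, and the largest values appear where the patch matches the pattern stored in the filter.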

If we want to find more patterns in the image, we can use more filters. Each filter produces its own collection of nodes, with its own weights and bias. Each such collection of nodes is called a **feature map** or **activation map**.

Grayscale images are interpreted as 2D arrays with height and width. Color images are interpreted as 3D arrays with height, width and depth (for RGB images, the depth is 3), which can be viewed as a stack of 3 two-dimensional matrices. Hence the filter (the convolutional window) must also be a 3-dimensional grid.
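As a rough illustration of a 3-dimensional filter, the sketch below applies one filter to one 2 by 2 patch of an RGB image. The patch values, the filter values and the name `apply_filter_3d` are made up for the example; the point is only that the sum runs over height, width and depth, and still produces a single number per window position.

```python
def apply_filter_3d(patch, filt):
    """Sum of element-wise products over height, width and depth."""
    total = 0
    for patch_row, filt_row in zip(patch, filt):
        for patch_pixel, filt_pixel in zip(patch_row, filt_row):
            for value, weight in zip(patch_pixel, filt_pixel):
                total += value * weight
    return total


# A 2x2 patch of an RGB image: patch[row][col] is [R, G, B].
patch = [
    [[1, 0, 0], [0, 1, 0]],
    [[0, 0, 1], [1, 1, 1]],
]
# A filter of the same 2x2x3 shape that only looks at the red channel.
red_filter = [
    [[1, 0, 0], [1, 0, 0]],
    [[1, 0, 0], [1, 0, 0]],
]

value = apply_filter_3d(patch, red_filter)
n_weights = len(red_filter) * len(red_filter[0]) * len(red_filter[0][0])

print(value)      # 2
print(n_weights)  # 12 weights: 2 * 2 * 3
```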


**Stride and Padding**

We can control the behaviour of the convolutional layers by specifying the number of filters and the size of filters:
* To increase the number of nodes, increase the number of filters
* To increase the size of the detected patterns, increase the size of the filter

Another **hyperparameter** is the **stride** of the convolution, which is the amount by which the filter slides over the image at each step.
A stride of 1 makes the convolutional layer roughly the same width and height as the input image. A stride of 2 makes it roughly half the width and height. Depending on the image size, the filter size and the stride, the filter may end up extending outside the image at the edges. We then have two options: skip those positions, which loses some information near the edges, or set **padding** to add dummy rows and columns of zeros around the image so that the filter fits.
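The effect of stride and padding on the layer size can be worked out with a small helper. The sketch below follows the convention used by Keras for its `'valid'` (no padding) and `'same'` (zero padding) options, applied to one dimension at a time; the function name `conv_output_size` is an assumption made for this example.

```python
import math


def conv_output_size(input_size, kernel_size, stride=1, padding="valid"):
    """Width (or height) of a convolutional layer along one dimension."""
    if padding == "valid":
        # Only positions where the filter fits entirely inside the input.
        return (input_size - kernel_size) // stride + 1
    if padding == "same":
        # The input is padded with zeros so that no position is skipped.
        return math.ceil(input_size / stride)
    raise ValueError("padding must be 'valid' or 'same'")


# A 5 pixel wide input with a 2 pixel wide filter.
print(conv_output_size(5, 2, stride=1, padding="valid"))  # 4
print(conv_output_size(5, 2, stride=1, padding="same"))   # 5
print(conv_output_size(5, 2, stride=2, padding="valid"))  # 2
print(conv_output_size(5, 2, stride=2, padding="same"))   # 3
```

With a stride of 2 and no padding, the last column of the 5 pixel input is never covered by the filter; with padding, one extra column of zeros is added so that it is.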


## Convolutional Layers in Keras

**Import** the module:

```
from keras.layers import Conv2D
```

**Create** a convolutional layer by using the following format:

```
Conv2D(filters, kernel_size, strides=1, padding='valid', activation='relu', input_shape=(height, width, depth))
```

**Arguments**

Required arguments:

- ***filters*** - The number of filters.
- ***kernel_size*** - Number specifying both the height and width of the (square) convolution window.

Optional arguments:

- ***strides*** - The stride of the convolution. If you don't specify anything, strides is set to 1.
- ***padding*** - One of 'valid' or 'same'. If you don't specify anything, padding is set to 'valid'.
- ***activation*** - Typically 'relu'. If you don't specify anything, no activation is applied. You are strongly encouraged to add a ReLU activation function to every convolutional layer in your networks.

**NOTE**: It is possible to represent both kernel_size and strides as either a number or a tuple. 

When using your convolutional layer as the first layer (appearing after the input layer) in a model, you must provide an additional ***input_shape*** argument:

- ***input_shape*** - Tuple specifying the height, width, and depth (in that order) of the input.

Do not include the input_shape argument if the convolutional layer is not the first layer in your network.

**Documentation** can be found here:

https://keras.io/layers/convolutional/

**Example 1**

Say I'm constructing a CNN, and my input layer accepts grayscale images that are 200 by 200 pixels (corresponding to a 3D array with height 200, width 200, and depth 1). Then, say I'd like the next layer to be a convolutional layer with 16 filters, each with a width and height of 2. When performing the convolution, I'd like the filter to jump two pixels at a time. I also don't want the filter to extend outside of the image boundaries; in other words, I don't want to pad the image with zeros. Then, to construct this convolutional layer, I would use the following line of code:

```
Conv2D(filters=16, kernel_size=2, strides=2, activation='relu', input_shape=(200, 200, 1))
```

**Example 2**

Say I'd like the next layer in my CNN to be a convolutional layer that takes the layer constructed in Example 1 as input. Say I'd like my new layer to have 32 filters, each with a height and width of 3. When performing the convolution, I'd like the filter to jump 1 pixel at a time. I want the convolutional layer to see all regions of the previous layer, and so I don't mind if the filter hangs over the edge of the previous layer when it's performing the convolution. Then, to construct this convolutional layer, I would use the following line of code:

```
Conv2D(filters=32, kernel_size=3, padding='same', activation='relu')
```

**Example 3**

If you look up code online, it is also common to see convolutional layers in Keras in this format:

```
Conv2D(64, (2,2), activation='relu')
```

In this case, there are 64 filters, each with a size of 2x2, and the layer has a ReLU activation function. The other arguments in the layer use the default values, so the convolution uses a stride of 1, and the padding has been set to 'valid'.
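To check the three examples by hand, the sketch below traces the output shape and the number of parameters of each layer in plain Python, without importing Keras. It assumes, purely for illustration, that the layer from Example 3 is stacked after the layer from Example 2; the helper name `conv2d_summary` is also an assumption. Each filter has one weight per cell of its height by width by depth grid, plus one bias.

```python
import math


def conv2d_summary(input_shape, filters, kernel_size, strides=1, padding="valid"):
    """Return (output_shape, parameter_count) for one convolutional layer."""
    height, width, depth = input_shape
    if padding == "same":
        out_h = math.ceil(height / strides)
        out_w = math.ceil(width / strides)
    else:  # 'valid'
        out_h = (height - kernel_size) // strides + 1
        out_w = (width - kernel_size) // strides + 1
    # One weight per filter cell across the full input depth, plus one bias per filter.
    params = (kernel_size * kernel_size * depth + 1) * filters
    return (out_h, out_w, filters), params


shape = (200, 200, 1)  # grayscale input from Example 1
layers = [
    dict(filters=16, kernel_size=2, strides=2),       # Example 1
    dict(filters=32, kernel_size=3, padding="same"),  # Example 2
    dict(filters=64, kernel_size=2),                  # Example 3 (stacking assumed)
]

results = []
for layer in layers:
    shape, params = conv2d_summary(shape, **layer)
    results.append((shape, params))
    print(shape, params)
# (100, 100, 16) 80
# (100, 100, 32) 4640
# (99, 99, 64) 8256
```

Note how the depth of each layer's output equals its number of filters, and becomes the depth of the filters in the next layer.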

***Acknowledgement***

* Udacity, Inc.