# Convolutional Neural Network - Keras

**Remember:** Overfitting is detected by comparing the validation loss to the training loss. If the training loss is much lower than the validation loss, then the model might be overfitting.

We have seen that we can build a neural network model with a **Multilayer Perceptron** (MLP), and we have tested it on a dataset of handwritten digit images (MNIST). The problem is that we had to convert each image into a vector before feeding it to the network, and an MLP uses fully connected layers, which create a great deal of parameters. That means a great deal of computation, and flattening the image loses much of the detail about how its pixels are arranged.

A better option for image classification is a Convolutional Neural Network (CNN):

* It works with matrices, so we can feed the image to the model as a matrix instead of a vector.
* It uses sparsely connected layers, which saves a lot of computation time.
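
To make the cost of the MLP approach concrete, the arithmetic can be sketched as follows. The 28 x 28 input is the MNIST image size; the hidden layer width of 512 is an assumption chosen only for illustration.

```python
# Flattening a 28 x 28 grayscale image (the MNIST size) gives a 784-long vector.
height, width = 28, 28
flat_length = height * width

# Hypothetical hidden layer width, chosen only for illustration.
hidden_nodes = 512

# A fully connected layer has one weight per (input, node) pair plus one bias per node.
dense_params = flat_length * hidden_nodes + hidden_nodes

# After flattening, vertically adjacent pixels end up `width` positions apart,
# so the vector no longer shows that they were neighbours in the image.
def flat_index(row, col):
    return row * width + col

neighbour_gap = flat_index(11, 5) - flat_index(10, 5)

print(flat_length)    # 784
print(dense_params)   # 401920
print(neighbour_gap)  # 28
```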


### Local Connectivity

Locally connected layers use far fewer parameters than densely connected layers. The idea is that we represent the image as a matrix and divide it into, for example, 4 different areas. Each area is connected to its own hidden node, and that node only learns about its area of the matrix (image). The hidden nodes then report to the output layer, where the information they learned is combined. The hidden nodes can also **share weights**, which reduces the number of parameters further. We can increase the number of nodes by creating several **collections of hidden nodes**, in which each node remains responsible for one specific area of the image.
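
A rough count of weights shows how much local connectivity and weight sharing save. The 4 x 4 image size and the 2 x 2 areas are assumptions made for this toy example, and biases are left out to keep the comparison simple.

```python
# Toy setup (assumed for illustration): a 4 x 4 image split into four 2 x 2 areas,
# with one hidden node per area. Biases are ignored.
image_pixels = 4 * 4
area_pixels = 2 * 2
hidden_nodes = 4

# Densely connected: every node sees every pixel.
dense_weights = image_pixels * hidden_nodes

# Locally connected: every node sees only the pixels of its own area.
local_weights = area_pixels * hidden_nodes

# Locally connected with weight sharing: all nodes reuse one set of weights.
shared_weights = area_pixels

print(dense_weights, local_weights, shared_weights)  # 64 16 4
```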

### Convolutional Layers

Building on the idea of the locally connected layer, we introduce **the convolutional layer**. We choose the width and height of a convolutional window, whose weights are arranged on a grid called a filter, and the window is then slid horizontally and vertically across the matrix. At each position, the window selects a small section of pixels in the matrix and connects it to a single hidden node. Together, these hidden nodes form the convolutional layer.
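
The sliding window can be sketched in plain Python. This is a minimal illustration rather than how Keras implements it: the function name `convolve2d` and the sample values are made up, there is no padding, and the bias and activation are left out.

```python
def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (both lists of lists) without padding."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (len(image) - k_h) // stride + 1
    out_w = (len(image[0]) - k_w) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride
            # Weighted sum of the pixels currently under the window:
            # this is the value of one hidden node.
            total = sum(
                image[top + a][left + b] * kernel[a][b]
                for a in range(k_h)
                for b in range(k_w)
            )
            row.append(total)
        output.append(row)
    return output

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[6, 8], [12, 14]]
```

Each entry of `feature_map` corresponds to one position of the window, so a 2 x 2 filter moved one pixel at a time over a 3 x 3 image gives a 2 x 2 result.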

If we want to find more patterns in the image, we can use more filters. Each filter produces a collection of nodes with its own weights and bias. These collections of nodes are called **feature maps** or **activation maps**.

Grayscale images are interpreted as 2D arrays with height and width. Color images are interpreted as 3D arrays with height, width and depth (for RGB images, the depth is 3), which can be thought of as a stack of three 2D matrices. Hence, the filter (the convolutional window) also needs to be a 3D grid.
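
As a small sketch of what a 3D filter does at one window position, the snippet below multiplies matching entries across height, width and depth and sums them into a single node value. All pixel values, weights and the bias are made up for illustration.

```python
# One 2 x 2 patch of an RGB image: patch[row][col] is [R, G, B] (made-up values).
patch = [[[1, 0, 2], [0, 1, 0]],
         [[3, 1, 0], [1, 1, 1]]]

# A filter with the same height, width and depth as the patch (made-up weights).
weights = [[[1, 0, 0], [0, 1, 0]],
           [[0, 0, 1], [1, 1, 1]]]
bias = 0.5

# Multiply matching entries across height, width AND depth, then add the bias.
node_value = bias + sum(
    patch[r][c][d] * weights[r][c][d]
    for r in range(2)
    for c in range(2)
    for d in range(3)
)
print(node_value)  # 5.5
```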

### Create Convolutional Layers

```
# Import
from keras.layers import Conv2D

# Create architecture (general form; replace each name with an actual value)
Conv2D(filters, kernel_size, strides=strides, padding=padding, activation='relu', input_shape=input_shape)
```

### Arguments

Required arguments:

- ***filters*** - The number of filters.
- ***kernel_size*** - Number specifying both the height and width of the (square) convolution window.

Optional arguments:

- ***strides*** - The stride of the convolution. If you don't specify anything, strides is set to 1.
- ***padding*** - One of 'valid' or 'same'. If you don't specify anything, padding is set to 'valid'.
- ***activation*** - Typically 'relu'. If you don't specify anything, no activation is applied. You are strongly encouraged to add a ReLU activation function to every convolutional layer in your networks.

**NOTE**: It is possible to represent both kernel_size and strides as either a number or a tuple. 

When using your convolutional layer as the first layer (appearing after the input layer) in a model, you must provide an additional ***input_shape*** argument:

- ***input_shape*** - Tuple specifying the height, width, and depth (in that order) of the input.

Do not include the input_shape argument if the convolutional layer is not the first layer in your network.

### Stride and Padding

We can control the behaviour of a convolutional layer by specifying the number of filters and the size of each filter:
* To increase the number of nodes, increase the number of filters.
* To increase the size of the detected patterns, increase the size of the filter.

Another hyperparameter is the **stride** of the convolution, which is the amount by which the filter slides over the image at each step.
With a stride of 1, the convolutional layer has roughly the same width and height as the input image. A stride of 2, which can be used when the image does not contain too much fine detail, makes the convolutional layer about half as wide and half as tall as the input. Depending on the image and filter sizes, the filter may then extend beyond the edge of the image at its last positions. **Padding** controls what happens there: with 'valid', positions where the filter does not fit entirely inside the image are skipped; with 'same', the image is padded with dummy rows and columns of zeros so that the filter fits.
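
To see the effect of the two padding modes, the sketch below lists which input columns each window position covers along one dimension. The helper `window_columns` is hypothetical, and the 'same' branch assumes the TensorFlow-style rule of splitting the padding as evenly as possible with any extra column on the right.

```python
import math

def window_columns(width, kernel, stride, padding="valid"):
    """Return the (first, last) input column covered by each window position.

    Columns outside 0..width-1 are zero padding.
    """
    if padding == "valid":
        count = (width - kernel) // stride + 1
        pad_left = 0
    else:  # "same"
        count = math.ceil(width / stride)
        total_pad = max((count - 1) * stride + kernel - width, 0)
        pad_left = total_pad // 2
    return [(i * stride - pad_left, i * stride - pad_left + kernel - 1)
            for i in range(count)]

print(window_columns(5, kernel=2, stride=2, padding="valid"))  # [(0, 1), (2, 3)]
print(window_columns(5, kernel=2, stride=2, padding="same"))   # [(0, 1), (2, 3), (4, 5)]
```

With 'valid', column 4 of the 5-column input is never visited. With 'same', a third position covers column 4 together with one padded column (index 5).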


### Example 1

Say I'm constructing a CNN, and my input layer accepts grayscale images that are 200 by 200 pixels (corresponding to a 3D array with height 200, width 200, and depth 1). Then, say I'd like the next layer to be a convolutional layer with 16 filters, each with a width and height of 2. When performing the convolution, I'd like the filter to jump two pixels at a time. I also don't want the filter to extend outside of the image boundaries; in other words, I don't want to pad the image with zeros. Then, to construct this convolutional layer, I would use the following line of code:
```
Conv2D(filters=16, kernel_size=2, strides=2, activation='relu', input_shape=(200, 200, 1))
```

### Example 2

Say I'd like the next layer in my CNN to be a convolutional layer that takes the layer constructed in Example 1 as input. Say I'd like my new layer to have 32 filters, each with a height and width of 3. When performing the convolution, I'd like the filter to jump 1 pixel at a time. I want the convolutional layer to see all regions of the previous layer, and so I don't mind if the filter hangs over the edge of the previous layer when it's performing the convolution. Then, to construct this convolutional layer, I would use the following line of code:
```
Conv2D(filters=32, kernel_size=3, padding='same', activation='relu')
```

### Example 3

If you look up code online, it is also common to see convolutional layers in Keras in this format:
```
Conv2D(64, (2,2), activation='relu')
```
In this case, there are 64 filters, each with a size of 2x2, and the layer has a ReLU activation function. The other arguments in the layer use the default values, so the convolution uses a stride of 1, and the padding has been set to 'valid'.

### Dimensionality of the CNN

Create a CNN, change the parameters, and observe how the architecture changes.

### Formula: Number of Parameters in a Convolutional Layer

The number of parameters in a convolutional layer depends on the supplied values of ***filters***, ***kernel_size***, and ***input_shape***.

The number of parameters in the convolutional layer is given by:
```
filters * kernel_height * kernel_width * input_depth + filters   (the last term is one bias per filter)
```
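
The formula can be checked with a few lines of Python. The helper name `conv2d_param_count` is made up for this sketch; the two calls use the layer settings from the model summaries in the Code section further down, which report 80 and 896 parameters.

```python
def conv2d_param_count(filters, kernel_height, kernel_width, input_depth):
    # One weight per filter entry (height x width x input depth), for every filter.
    weights = filters * kernel_height * kernel_width * input_depth
    # One bias per filter.
    biases = filters
    return weights + biases

print(conv2d_param_count(16, 2, 2, 1))  # 80
print(conv2d_param_count(32, 3, 3, 3))  # 896
```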

### Shape of a Convolutional Layer

The shape of a convolutional layer depends on the supplied values of ***kernel_size***, ***input_shape***, ***padding***, and ***strides***.

The ***depth*** of the convolutional layer will always equal the number of filters.
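
As a sketch of how these values combine, the function below computes the output shape for square kernels and equal strides in both directions. The name `conv2d_output_shape` is hypothetical; the rounding rules are the usual ones for 'valid' and 'same' padding, and the two calls reuse the settings from Example 1 and Example 2.

```python
import math

def conv2d_output_shape(input_shape, filters, kernel_size, strides=1, padding="valid"):
    height, width, _depth = input_shape
    if padding == "same":
        # Zero padding keeps every window position, so only the stride matters.
        out_h = math.ceil(height / strides)
        out_w = math.ceil(width / strides)
    else:
        # 'valid': only positions where the window fits entirely inside the input.
        out_h = (height - kernel_size) // strides + 1
        out_w = (width - kernel_size) // strides + 1
    # The depth always equals the number of filters.
    return (out_h, out_w, filters)

example_1 = conv2d_output_shape((200, 200, 1), filters=16, kernel_size=2, strides=2)
example_2 = conv2d_output_shape(example_1, filters=32, kernel_size=3, padding="same")
print(example_1)  # (100, 100, 16)
print(example_2)  # (100, 100, 32)
```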

### Code

```
from keras.models import Sequential
from keras.layers import Conv2D

model = Sequential()
model.add(Conv2D(filters=16, kernel_size=2, strides=2, padding='valid',
                 activation='relu', input_shape=(200, 200, 1)))
model.summary()
```

```
Using TensorFlow backend.

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 100, 100, 16)      80        
Total params: 80
Trainable params: 80
Non-trainable params: 0
_________________________________________________________________
```


```
from keras.models import Sequential
from keras.layers import Conv2D

model = Sequential()
model.add(Conv2D(filters=32, kernel_size=3, strides=2, padding='same',
                 activation='relu', input_shape=(128, 128, 3)))

model.summary()
# The number of parameters is (32 x 3 x 3 x 3) + 32 = 896
# The depth of the convolutional layer is 32
# The width of the convolutional layer is 64
```

```
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_3 (Conv2D)            (None, 64, 64, 32)        896       
Total params: 896
Trainable params: 896
Non-trainable params: 0
_________________________________________________________________
```


### Pooling Layers

***Remember:*** Convolutional layers are stacks of feature maps, with one feature map for each filter. A complicated dataset requires a large number of filters, each responsible for finding a pattern, which means a high number of parameters.

Pooling layers often take convolutional layers as input. Pooling is a method for reducing the dimensionality of the CNN. Higher dimensionality means more parameters, which can lead to overfitting. Therefore, the goal of the pooling layer is to help avoid overfitting in cases where our model needs a lot of features (filters).

There are 2 types:
- **Max Pooling Layer** - Like the convolutional layer, it is defined by a window size and a stride. The window moves across each feature map stride by stride, and the maximum value inside the window at each position is kept, creating a new feature map with reduced width and height.
- **Global Average Pooling Layer** - This is a more extreme type of dimensionality reduction. It takes a stack of feature maps and computes the average value of the nodes for each map in the stack. Therefore, it takes a 3D array and turns it into a vector.
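
Both pooling types can be sketched in plain Python. The function names and the sample feature map are made up for illustration, and the stack of feature maps is represented simply as a list of 2D maps.

```python
def max_pool2d(feature_map, pool_size=2, stride=2):
    """Keep the largest value inside each window position of one feature map."""
    out_h = (len(feature_map) - pool_size) // stride + 1
    out_w = (len(feature_map[0]) - pool_size) // stride + 1
    return [
        [
            max(
                feature_map[i * stride + a][j * stride + b]
                for a in range(pool_size)
                for b in range(pool_size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def global_average_pool(feature_maps):
    """Average every value in each map, giving one number per map."""
    return [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]

fmap = [[1, 9, 2, 4],
        [5, 6, 2, 6],
        [4, 5, 4, 7],
        [5, 1, 2, 9]]

print(max_pool2d(fmap))             # [[9, 6], [5, 9]]
print(global_average_pool([fmap]))  # [4.5]
```

Max pooling halves the width and height of the 4 x 4 map, while global average pooling collapses the whole map to a single value.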

### Create Max Pooling Layers

```
# Import
from keras.layers import MaxPooling2D

# Create architecture
MaxPooling2D(pool_size, strides, padding)
```

### Arguments

Required arguments:

- ***pool_size*** - Number specifying the height and width of the pooling window.

Optional arguments:

- ***strides*** - The vertical and horizontal stride. If you don't specify anything, strides will default to *pool_size*.
- ***padding*** - One of 'valid' or 'same'. If you don't specify anything, padding is set to 'valid'.

**NOTE**: It is possible to represent both *pool_size* and *strides* as either a number or a tuple.

### Example

Say I'm constructing a CNN, and I'd like to reduce the dimensionality of a convolutional layer by following it with a max pooling layer. Say the convolutional layer has size (100, 100, 15), and I'd like the max pooling layer to have size (50, 50, 15). I can do this by using a 2x2 window in my max pooling layer, with a stride of 2, which could be constructed in the following line of code:

```
MaxPooling2D(pool_size=2, strides=2)
```

### Code

```
from keras.models import Sequential
from keras.layers import MaxPooling2D

model = Sequential()
model.add(MaxPooling2D(pool_size=2, strides=2, input_shape=(100, 100, 15)))
model.summary()
```

```
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
max_pooling2d_1 (MaxPooling2 (None, 50, 50, 15)        0         
Total params: 0
Trainable params: 0
Non-trainable params: 0
_________________________________________________________________
```


### CNNs for Image Classification

- The most used layers in CNNs are Fully Connected Layers, Flatten Layers, Convolutional Layers and Pooling Layers, and they need to be arranged carefully to design a CNN architecture.

- The model must accept an image array as input. It is common to resize the images to a square whose side length is a power of two, for example height x width x depth = 32 x 32 (pixels) x 1 (or 3 for a color image), since images are interpreted by computers as 3D arrays.

- As the image passes through the model, the convolutional layers make the array deeper while the pooling layers decrease the spatial dimensions.

- It is common practice to set the parameters *strides*=1 and *padding*='same' so that the width and height are the same as in the previous layer.

- Usually the number of filters increases from layer to layer, for example 16, 32, 64, and so on.

- Usually the pooling layers are set to *pool_size*=2 and *strides*=2 so that the width and height are half those of the previous layer.

- Usually, once the initial array has been transformed into a spatially small but deep array, it is fed to a fully connected layer to process the information. In some cases, we first flatten the array into a vector and then feed it to the fully connected layer.

- Finally, it is common practice to connect the dense layer to a final dense layer with a *softmax* activation, which performs the classification over the object classes we have. Note that it is common practice to use *relu* activation for all other layers.
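
The way these conventions reshape the array can be traced with simple arithmetic. The sketch below assumes a 32 x 32 x 3 input, three convolutional layers with 16, 32 and 64 filters (each with *strides*=1 and *padding*='same'), and a max pooling layer with *pool_size*=2 after each one; the helper names are made up.

```python
def conv_same(shape, filters):
    # With strides=1 and padding='same', width and height are unchanged;
    # the depth becomes the number of filters.
    height, width, _ = shape
    return (height, width, filters)

def max_pool(shape, pool_size=2):
    # Strides equal to pool_size divide each spatial dimension by pool_size.
    height, width, depth = shape
    return (height // pool_size, width // pool_size, depth)

shape = (32, 32, 3)
trace = [shape]
for filters in (16, 32, 64):
    shape = conv_same(shape, filters)
    trace.append(shape)
    shape = max_pool(shape)
    trace.append(shape)

# Flattening turns the final 3D array into a vector.
flattened = shape[0] * shape[1] * shape[2]

print(trace[-1])  # (4, 4, 64)
print(flattened)  # 1024
```

The array gets deeper at every convolutional layer and smaller at every pooling layer, ending as a 1024-long vector ready for the dense layers.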

**Things to Remember**

- Always add a ReLU activation function to the Conv2D layers in your CNN. With the exception of the final layer in the network, Dense layers should also have a ReLU activation function.
- When constructing a network for classification, the final layer in the network should be a Dense layer with a softmax activation function. The number of nodes in the final layer should equal the total number of classes in the dataset.

Have fun! If you start to feel discouraged, we recommend that you check out Andrej Karpathy's tumblr with user-submitted loss functions, corresponding to models that gave their owners some trouble. Recall that the loss is supposed to decrease during training. These plots show very different behavior :).

### Code

```
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# The network begins with a sequence of three convolutional layers, followed
# by max pooling layers. These first six layers are designed to take the input
# array of image pixels and convert it to an array where all of the spatial
# information has been squeezed out, and only information encoding the content
# of the image remains. The array is then flattened to a vector in the seventh
# layer of the CNN. It is followed by two dense layers designed to further
# elucidate the content of the image. The final layer has one entry for each
# object class in the dataset, and has a softmax activation function, so that it
# returns probabilities.

model = Sequential()
model.add(Conv2D(filters=16, kernel_size=2, padding='same', activation='relu', input_shape=(32, 32, 3)))
model.add(MaxPooling2D(pool_size=2))

model.add(Conv2D(filters=32, kernel_size=2, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=2))

model.add(Conv2D(filters=64, kernel_size=2, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=2))

model.add(Flatten())

model.add(Dense(500, activation='relu'))

model.add(Dense(10, activation='softmax'))

# Display model's architecture
model.summary()
```

```
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_4 (Conv2D)            (None, 32, 32, 16)        208       
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 16, 16, 16)        0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 16, 16, 32)        2080      
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 8, 8, 32)          0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 8, 8, 64)          8256      
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 4, 4, 64)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 1024)              0         
__________
```

### Documentation 

https://keras.io/layers/convolutional/

### Acknowledgement

* Udacity, Inc.