# Convolutional neural networks
Currently following this tutorial: https://medium.freecodecamp.org/an-intuitive-guide-to-convolutional-neural-networks-260c2de0a050

## General
Unlike the layers of classical (deep) neural networks, the layers of a convolutional neural network are 3-dimensional: they have a width, a height and a depth.
<br>
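Width and height correspond to the width and height of the processed images. The depth of the input layer is the number of color channels: 1 for a greyscale image (one brightness value per pixel) or 3 for an RGB image (one value per primary color). In deeper layers, the depth is the number of filters applied by the previous convolution.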
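To make the depth dimension concrete, here is a small NumPy sketch comparing the shape of an RGB image with the same image reduced to one greyscale channel. The 32x32 size is arbitrary, and the luminance weights (ITU-R BT.601) are an illustrative assumption, not something taken from the tutorial.

```python
import numpy as np

# A random 32x32 RGB image: height x width x depth (3 color channels).
rgb = np.random.randint(0, 256, size=(32, 32, 3)).astype(np.float32)

# Collapse the three channels into one brightness value per pixel.
# The ITU-R BT.601 weights are an illustrative choice (they sum to 1).
weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
grey = rgb @ weights            # contracts the last axis -> shape (32, 32)

# Keep an explicit depth axis of size 1, as Keras expects for greyscale input.
grey = grey[..., np.newaxis]    # shape (32, 32, 1)

print(rgb.shape)   # (32, 32, 3)
print(grey.shape)  # (32, 32, 1)
```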
<br>
<br>
Another major difference is that the neurons in one layer need not be connected to all neurons in the previous layer. While there are often fully connected layers at the end, each neuron in a convolutional layer is connected only to a small region of the previous layer.
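One way to see what these local connections save is to count inputs and weights. The sketch below assumes a 28x28 greyscale input (as in the MNIST example further down) and 6 filters of size 3x3 (the filter size is an assumption), and compares the convolutional layer with a fully connected layer that has the same number of output neurons. Note that the convolution also needs so few weights because every position reuses the same filter (weight sharing), not only because of the local connections.

```python
# Input: a 28x28 greyscale image (depth 1).
in_h, in_w, in_d = 28, 28, 1
# Assumed convolution: 6 filters of size 3x3, stride 1, no padding.
n_filters, f = 6, 3
out_h, out_w = in_h - f + 1, in_w - f + 1     # 26 x 26

# Each output neuron of the convolution only sees an f x f x in_d region.
conv_inputs_per_neuron = f * f * in_d
# The filters are reused at every position: weights + one bias per filter.
conv_params = n_filters * (f * f * in_d + 1)

# A fully connected layer with the same number of outputs connects every
# input pixel to every output neuron, plus one bias per output neuron.
n_in = in_h * in_w * in_d
n_out = out_h * out_w * n_filters
dense_inputs_per_neuron = n_in
dense_params = n_in * n_out + n_out

print(conv_inputs_per_neuron, dense_inputs_per_neuron)  # 9 784
print(conv_params, dense_params)                        # 60 3183960
```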
<br>
The convolutional layers come first, and their job is to recognize features. The first layers might only recognize simple things like lines or edges, but the deeper you go, the more complex the shapes become (a head, a leg, a leaf?). The job of the fully connected layers is to combine all this information and use it to make a good prediction for whatever the goal of the network is (like detecting pedestrians).

### The convolution
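In mathematics, a convolution combines two functions into a third function: one function is flipped and slid along the other, and at every position the overlapping values are multiplied and summed up (integrated).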
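For reference, these are the standard textbook definitions (not taken from the tutorial), first for continuous functions $f$ and $g$, then for discrete sequences. The term $g(t - \tau)$ is the flipped and shifted second function:

$$
(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau
\qquad\qquad
(f * g)[n] = \sum_{m=-\infty}^{\infty} f[m]\, g[n - m]
$$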
<br>
In machine learning, a convolution combines the input image (or the output of a previous layer) with a filter, also called a kernel. A filter has a certain job, like detecting edges: applied to the input image, it detects the edges in the image and passes the result on to the next layer for further processing. The result is often called a feature map, since it contains the features detected by the previous layer(s).
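As a minimal sketch of a filter producing a feature map, the NumPy code below slides a hand-picked 3x3 vertical-edge kernel (Prewitt-style; an illustrative choice, not a learned filter) over a tiny 5x5 image whose left side is dark and right side is bright. Like the convolution layers in Keras and most deep learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation. For learned filters this makes no difference, since the network can just as well learn the flipped kernel.

```python
import numpy as np

def filter2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the
    feature map. The kernel is NOT flipped, i.e. this is a cross-correlation,
    which is what CNN convolution layers compute."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]     # region under the filter
            out[i, j] = np.sum(patch * kernel)    # weighted sum -> one value
    return out

# 5x5 image: dark (0) in the two left columns, bright (1) in the rest.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# Responds strongly to dark-to-bright transitions from left to right.
vertical_edge = np.array([[-1, 0, 1],
                          [-1, 0, 1],
                          [-1, 0, 1]], dtype=float)

feature_map = filter2d(image, vertical_edge)
print(feature_map)  # every row is [3. 3. 0.]
```

The feature map is large where the window covers the dark-to-bright edge and 0 where the window only sees the uniformly bright area.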
<br>
<br>
Further (and quite extensive) information can be found here: http://timdettmers.com/2015/03/26/convolution-deep-learning/

### Filters / Kernels
A filter is used to extract features from an image or from the results returned by previous layers. The terms filter and kernel are used interchangeably.
<br>
The filter almost never spans the entire image; usually it is quite small, e.g. 3x3 pixels. The idea behind this is to process only a small part of the image at a time and then put this information together, which is why the area covered by the filter is sometimes called the receptive field.
<br>
To still cover the entire image, the filter is moved across it. You can imagine this as blocking out everything in the image except the area under the filter. When you are done looking at this part, you move the filter so that the area you look at changes. Step by step, the filter is moved across the image until every part of it has passed through the filter.
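The moving window can be made visible by listing every region the filter looks at. This sketch assumes NumPy 1.20 or newer, which provides `sliding_window_view`. It uses a 4x4 "image" and a 3x3 averaging filter (an illustrative choice): each window is one receptive field, and multiplying a window elementwise by the filter and summing gives one value of the feature map.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

image = np.arange(16).reshape(4, 4)   # a 4x4 "image" with values 0..15
kernel = np.ones((3, 3)) / 9          # illustrative 3x3 averaging filter

# windows[i, j] is the 3x3 region the filter sees at position (i, j).
windows = sliding_window_view(image, (3, 3))
print(windows.shape)  # (2, 2, 3, 3): 2x2 positions, each a 3x3 window

# Apply the filter at every position: elementwise product, then sum.
feature_map = np.einsum("ijkl,kl->ij", windows, kernel)
print(feature_map)    # the mean of each window
```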

### Stride
The stride is the number of pixels the filter moves across the layer with each step. With a stride of 1 the filter moves one pixel at a time, with a stride of 2 it moves two pixels with each step, and so on.
<br>
This is repeated until the filter reaches the edge of the image. A larger stride means the filter is applied at fewer positions, so the output becomes smaller.
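A small sketch of the positions a filter visits along one axis (width or height) without padding. The function name `filter_positions` is hypothetical, chosen only for this illustration.

```python
def filter_positions(n, f, s):
    """Hypothetical helper: start offsets at which an f-pixel-wide filter is
    applied along an n-pixel axis with stride s (no padding)."""
    return list(range(0, n - f + 1, s))

# A 7-pixel-wide row with a 3-pixel-wide filter:
print(filter_positions(7, 3, 1))  # [0, 1, 2, 3, 4] -> 5 outputs
print(filter_positions(7, 3, 2))  # [0, 2, 4]       -> 3 outputs
```

The number of outputs along an axis is therefore `(n - f) // s + 1`.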

### Padding
If the filter is bigger than 1x1, filtering reduces the size of your input. Imagine sliding a 3x3 filter over a 4x4 input. Since the filter has to fit entirely inside the input, not every pixel can be its center: of the 16 pixels, only the 4 inner ones can. In other words, the output is no longer 4x4 but 2x2.
<br>
Especially over several convolution layers, this can reduce the size of the resulting feature maps significantly. To prevent this, you can pad your image, meaning you add "dummy pixels" (usually zeros) around it before filtering.
<br>
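Often the padding is chosen so that the output keeps the width and height of the input; with a stride of 1, a 3x3 filter needs one pixel of padding on each side for this. With a larger stride the output still shrinks, because the filter is applied at fewer positions.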
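The size arithmetic from this section and the stride section can be summed up in one formula: for input size `n`, filter size `f`, padding `p` on each side and stride `s`, the output size is `(n + 2p - f) // s + 1`. The sketch below (the helper name `conv_output_size` is hypothetical) checks the 4x4 example and shows zero padding with NumPy's `np.pad`.

```python
import numpy as np

def conv_output_size(n, f, p=0, s=1):
    """Hypothetical helper: output width (or height) for input size n,
    filter size f, padding p on each side and stride s."""
    return (n + 2 * p - f) // s + 1

# The 4x4 input with a 3x3 filter from the text:
print(conv_output_size(4, 3))            # 2 -> the output shrinks to 2x2
print(conv_output_size(4, 3, p=1))       # 4 -> one ring of padding keeps 4x4
print(conv_output_size(4, 3, p=1, s=2))  # 2 -> with stride 2 it still shrinks

# Zero padding: add one ring of "dummy pixels" with the value 0.
image = np.ones((4, 4))
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)
print(padded.shape)                      # (6, 6)
```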

### Pooling layer
A convolution layer is usually followed by a pooling layer. This layer shrinks the resulting feature map by pooling (combining) neighboring values into a single value.
<br>
For example, you can have a pooling layer with a 2x2 receptive field and a stride of 2. On a 4x4 feature map, the pool is applied at 4 positions, each time looking at 4 pixels that do not overlap with those of any other position.
Since the goal of this layer is to reduce the size of the feature map, all the values covered by the pool (in this case 4) are usually reduced to a single value.
<br>
There are several ways to do this: you can simply select the maximum value (max pooling) or calculate the arithmetic mean (average pooling).
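A minimal NumPy sketch of both pooling variants on a 4x4 feature map with a 2x2 window and a stride of 2. The function `pool2d` and the feature map values are made up for illustration. Windows that would extend past the border are dropped, which is also how a 27x27 feature map becomes 13x13 in the Keras example below.

```python
import numpy as np

def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Hypothetical helper: pool a 2D feature map with a square window.
    Windows that would extend past the border are dropped."""
    if mode == "max":
        reduce = np.max
    elif mode == "average":
        reduce = np.mean
    else:
        raise ValueError("mode must be 'max' or 'average'")
    h, w = feature_map.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = reduce(feature_map[r:r + size, c:c + size])
    return out

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 1, 5, 6],
               [2, 2, 7, 8]], dtype=float)

print(pool2d(fm, mode="max"))      # block maxima: 4, 2, 2, 8
print(pool2d(fm, mode="average"))  # block means: 2.5, 1, 1.25, 6.5
```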

# Code example

```python
import numpy as np
from keras.layers import Conv2D, Activation, MaxPool2D, Flatten, Dense
from keras.models import Sequential

from keras.datasets import mnist
```

```python
img_shape = (28, 28, 1)  # MNIST: 28x28 pixels, 1 greyscale channel

model = Sequential()
# Convolution layer with 6 filters, each 2x2 in size (stride 1, no padding)
model.add(Conv2D(6, 2, input_shape=img_shape))
model.add(Activation("relu"))
# Max pooling with a 2x2 window (the stride defaults to the pool size)
model.add(MaxPool2D(2))

# Flatten the 13x13x6 feature maps and classify into the 10 digits
model.add(Flatten())
model.add(Dense(10))
model.add(Activation("softmax"))
model.summary()  # prints the table itself (and returns None)
```

```text
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_7 (Conv2D)            (None, 27, 27, 6)         30        
_________________________________________________________________
activation_9 (Activation)    (None, 27, 27, 6)         0         
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 13, 13, 6)         0         
_________________________________________________________________
flatten_5 (Flatten)          (None, 1014)              0         
_________________________________________________________________
dense_4 (Dense)              (None, 10)                10150     
_________________________________________________________________
activation_10 (Activation)   (None, 10)                0         
Total params: 10,180
Trainable params: 10,180
Non-trainable params: 0
_________________________________________________________________
```
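The shapes and parameter counts in this summary can be recomputed by hand. The sketch below is plain Python arithmetic (no Keras API) that reproduces each number for this particular model.

```python
# Conv2D: 6 filters of size 2x2 on a 28x28x1 input, stride 1, no padding.
n_filters, f, in_depth = 6, 2, 1
conv_out = 28 - f + 1                              # 27 -> (27, 27, 6)
conv_params = n_filters * (f * f * in_depth + 1)   # weights + 1 bias per filter

# MaxPool2D(2): 2x2 windows with stride 2, incomplete windows dropped.
pool_out = (conv_out - 2) // 2 + 1                 # 13 -> (13, 13, 6)

# Flatten, then Dense(10): one weight per input/output pair plus 10 biases.
flat = pool_out * pool_out * n_filters             # 1014
dense_params = flat * 10 + 10

print(conv_out, conv_params)          # 27 30
print(pool_out, flat, dense_params)   # 13 1014 10150
print(conv_params + dense_params)     # 10180
```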


```python
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam", metrics=["acc"])

(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Scale the pixel values from 0..255 to 0..1, which usually helps training
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255

# Add the depth axis: (samples, 28, 28) -> (samples, 28, 28, 1)
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)

history = model.fit(x_train, y_train, batch_size=32, epochs=10, validation_data=(x_test, y_test))
```

```text
Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz
Train on 60000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
```

# Questions
- How do filters work (e.g. how do they detect lines)?
<br>
<br>
- Provide a notebook that helps other beginners more than most blog posts currently do
- This is hard to explain without adding images