### Convolutional Neural Networks

General guides: 
https://medium.freecodecamp.org/an-intuitive-guide-to-convolutional-neural-networks-260c2de0a050

Adit Deshpande (2016) - The 9 deep learning papers you need to know about (understanding CNNs part 3)
https://adeshpande3.github.io/The-9-Deep-Learning-Papers-You-Need-To-Know-About.html


Example: http://scs.ryerson.ca/~aharley/vis/conv/flat.html

<img src="files/CNNs.png">


Steps:
1. Convolution
2. Max Pooling
3. Flattening
4. Full Connection


Yann LeCun (Facebook) - godfather of CNNs

e.g. Yann LeCun et al. (1998) - Gradient-based learning applied to document recognition
http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf

Math: 
Jianxin Wu (2017) - Introduction to Convolutional Neural Networks:
https://pdfs.semanticscholar.org/450c/a19932fcef1ca6d0442cbf52fec38fb9d1e5.pdf


#### What are Convolutional Neural Networks

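CNNs are neural networks that are mainly used to classify images.
Image = matrix of pixel values: one 2D matrix for a grayscale image, three stacked matrices (red, green and blue channels) for a colour image.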
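To make the "image = matrix of pixels" idea concrete, here is a minimal NumPy sketch. The pixel values are made up for illustration; the 64 x 64 x 3 shape and the division by 255 are chosen to match the `input_shape` and `rescale` settings used in the Keras code of this notebook.

```python
import numpy as np

# A tiny 2x2 grayscale image: one intensity value (0-255) per pixel.
gray = np.array([[0, 255],
                 [128, 64]], dtype=np.uint8)

# A colour image has a third axis for the red, green and blue channels.
# (An all-black placeholder image, purely for illustration.)
rgb = np.zeros((64, 64, 3), dtype=np.uint8)

# Rescale the pixel values to the range [0, 1] before feeding them to a network.
scaled = rgb / 255.0

print(gray.shape, rgb.shape, scaled.dtype)
```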


#### Convolution Operation

Filters are applied to an image to detect certain features in it (for example edges).

Convolution product:

In general: $(f * g)(t) = \int_{-\infty}^{+\infty} f(\tau) g(t-\tau) d\tau$

For images: the convolution is discrete and works on matrices --> feature detectors = kernels = filters

(Input Image * Feature Detector) = (Feature Map) [= convolved map, activation map, ...]


<img src="files/Convolution.png">

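Note that the feature map is smaller than the input image: without padding and with a stride of 1, an n x n image and a k x k filter give an (n - k + 1) x (n - k + 1) feature map.

In the NN: several different filters are applied to the original image --> each filter creates its own feature map.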
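The sliding-window operation can be sketched in plain NumPy. This is an illustration only, under some assumptions: the helper name `convolve2d_valid` and the example values are made up, the stride is 1 and there is no padding. Like most deep learning libraries, the sketch does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and,
    at each position, sum the element-wise products."""
    k_rows, k_cols = kernel.shape
    n_rows = image.shape[0] - k_rows + 1
    n_cols = image.shape[1] - k_cols + 1
    feature_map = np.zeros((n_rows, n_cols))
    for i in range(n_rows):
        for j in range(n_cols):
            patch = image[i:i + k_rows, j:j + k_cols]
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

# Hypothetical 5x5 binary image and 3x3 feature detector (diagonal pattern).
image = np.array([[0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0],
                  [1, 1, 0, 0, 1],
                  [0, 1, 0, 1, 1],
                  [0, 0, 1, 1, 0]])
kernel = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]])

feature_map = convolve2d_valid(image, kernel)
print(feature_map.shape)  # (3, 3): a 5x5 image and a 3x3 filter give 5 - 3 + 1 = 3
```

Each entry of the feature map counts how many cells of the filter pattern match at that position, so larger values mean a stronger match.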

##### ReLU Layer

After the convolution layer, a rectifier (ReLU) layer $\phi(x) = \max(x,0)$ is added.
It adds non-linearity to the model: negative values in the feature map are set to zero, so only the positive responses to a filter (the important 'features') are kept.
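As a small illustration, the rectifier can be applied element-wise to a feature map with NumPy. The feature map values below are made up.

```python
import numpy as np

def relu(x):
    """Rectifier: keep positive values, replace negative values by 0."""
    return np.maximum(x, 0)

# Hypothetical feature map with positive and negative responses.
feature_map = np.array([[-2.0, 1.5],
                        [0.0, -0.5]])
rectified = relu(feature_map)
print(rectified)
```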

For more info:

Examples: http://mlss.tuebingen.mpg.de/2015/slides/fergus/Fergus_1.pdf

C.-C. Jay Kuo (2016) - Understanding Convolutional Neural Networks with a Mathematical Model
https://arxiv.org/pdf/1609.04112.pdf

Alternative to ReLU (parametric ReLU): 

Kaiming He et al. (2015) - Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
https://arxiv.org/pdf/1502.01852.pdf
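For comparison with the plain rectifier, the parametric ReLU lets negative inputs through with a small slope `a`: $f(x) = \max(0,x) + a \min(0,x)$. In the paper `a` is learned during training; in the sketch below it is fixed at 0.25 purely for illustration.

```python
import numpy as np

def prelu(x, a=0.25):
    """Parametric ReLU: identity for positive inputs, slope `a` for negative inputs.
    Assumption: `a` is held fixed here, whereas it is normally a learned parameter."""
    return np.maximum(x, 0) + a * np.minimum(x, 0)

activations = prelu(np.array([-2.0, 3.0]))
print(activations)
```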


#### (Max) Pooling

Spatial invariance: a feature can be tilted, rotated or slightly distorted, and the network must still be able to recognise it.

Feature map --> Pooled Feature Map

<img src="files/max_pooling.png">

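Large numbers in the feature map mark the strongest matches with a feature. Max pooling keeps only the largest value in each window, so small distortions are removed from the image --> (images are simplified).
 - The total size of the feature map is reduced.
 - Overfitting is reduced (only the dominant features are kept).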
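Max pooling with non-overlapping windows can be sketched in NumPy as follows. The helper name `max_pool2d` and the feature map values are made up for illustration; rows and columns that do not fill a complete window are dropped.

```python
import numpy as np

def max_pool2d(feature_map, pool_size=2):
    """Split the feature map into non-overlapping pool_size x pool_size windows
    and keep the maximum of each window."""
    rows = feature_map.shape[0] // pool_size
    cols = feature_map.shape[1] // pool_size
    # Drop trailing rows/columns that do not fill a complete window.
    trimmed = feature_map[:rows * pool_size, :cols * pool_size]
    windows = trimmed.reshape(rows, pool_size, cols, pool_size)
    return windows.max(axis=(1, 3))

feature_map = np.array([[1, 0, 2, 3],
                        [4, 6, 6, 8],
                        [3, 1, 1, 0],
                        [1, 2, 2, 4]])
pooled = max_pool2d(feature_map)
print(pooled)  # 4x4 feature map --> 2x2 pooled feature map
```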
 
Different options for pooling:

Dominik Scherer et al. (2010) - Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition
http://ais.uni-bonn.de/papers/icann2010_maxpool.pdf


#### Flattening

Convert the matrices of the pooled feature maps into a single vector, which can be used as the input to an ANN.
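In NumPy terms, flattening is just a reshape. The sketch below assumes three hypothetical 2 x 2 pooled feature maps stored in a single array.

```python
import numpy as np

# Three hypothetical 2x2 pooled feature maps, stored as (height, width, maps).
pooled_maps = np.arange(2 * 2 * 3).reshape(2, 2, 3)

# Flatten everything into one long vector for the fully connected layers.
vector = pooled_maps.reshape(-1)
print(vector.shape)  # (12,)
```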



#### Fully Connected Layer

<img src="files/ANN_layer.png">

A fully connected ANN is added that combines the features to predict the output (classification).

Backpropagation: the error is propagated back through the ANN and the convolutional layers, so both the weights of the ANN and the feature detectors are adjusted during training.
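The forward pass of one fully connected layer is a matrix product followed by an activation. The sketch below uses random placeholder weights and made-up sizes (6 inputs, 4 hidden units); in a real network the weights are learned by backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.random(6)               # flattened feature vector (6 hypothetical features)
W = rng.normal(size=(4, 6))     # one row of weights per hidden unit
b = np.zeros(4)                 # one bias per hidden unit

# Every hidden unit combines all input features, then ReLU is applied.
hidden = np.maximum(W @ x + b, 0)
print(hidden.shape)  # (4,)
```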


#### Softmax & Cross-Entropy

During training, the network needs to find which neurons are important for a certain output value (e.g. dog).
The output neurons need to 'learn' which nodes in the final hidden layer are useful for a certain output (= voting).

- Softmax (normalized exponential function): rescales the outputs so that each one lies between 0 and 1 and together they sum to 1, so they can be read as class probabilities.

$f_j(z) = \frac{e^{z_j}}{\sum_k e^{z_k}}$
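A minimal NumPy sketch of the softmax formula, with made-up output scores. Subtracting the maximum before exponentiating is a common numerical-stability trick and does not change the result.

```python
import numpy as np

def softmax(z):
    """Normalized exponential: positive outputs that sum to 1."""
    shifted = z - np.max(z)     # avoids overflow in exp, result is unchanged
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z)

z = np.array([2.0, 1.0, 0.1])   # hypothetical raw scores of three output neurons
probabilities = softmax(z)
print(probabilities, probabilities.sum())
```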


- Cross-entropy - cost function (loss function)

$H(p,q) = - \sum_x p(x) \log{q(x)}$

- p = true value (the label)
- q = predicted value (the probability given by the network)
- x = a certain node (one output class)

Cross-entropy is the cost function (also called the loss function) used to assess the performance of a CNN.
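The cross-entropy formula can be checked with a small NumPy sketch. The label and the two predictions are made up; the clipping constant `eps` is an assumption added here only to avoid `log(0)`.

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_x p(x) * log(q(x)), with q clipped to avoid log(0)."""
    q = np.clip(q, eps, 1.0)
    return -np.sum(p * np.log(q))

p = np.array([1.0, 0.0, 0.0])        # true label: first class (one-hot)
q_good = np.array([0.9, 0.05, 0.05])  # confident and correct prediction
q_bad = np.array([0.1, 0.6, 0.3])     # mostly wrong prediction

loss_good = cross_entropy(p, q_good)
loss_bad = cross_entropy(p, q_bad)
print(loss_good, loss_bad)  # the better prediction has the lower loss
```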

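Why cross-entropy over MSE?

- At the start of training the predicted value for the correct class can be very small. With MSE the gradient is then also very small, which gives problems with backward propagation of the gradient (slow learning). Because of the logarithm, cross-entropy looks at relative changes: small adjustments in absolute terms can be huge in relative terms, so the gradient stays large enough to learn from.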
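The difference can be illustrated numerically for a single sigmoid output neuron. This is a sketch under stated assumptions: the input `z = -6` is made up, the target is 1, and the gradients are taken with respect to `z` using the standard results for a sigmoid output `q`: `(q - p) * q * (1 - q)` for the squared error `0.5 * (q - p)^2`, and `q - p` for the cross-entropy.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

p = 1.0        # true value
z = -6.0       # hypothetical pre-activation: the network is badly wrong
q = sigmoid(z) # predicted value, close to 0

# Gradients of the loss with respect to z.
grad_mse = (q - p) * q * (1.0 - q)   # squared error 0.5 * (q - p)**2
grad_ce = q - p                      # binary cross-entropy

print(q, grad_mse, grad_ce)
```

With the squared error, the factor `q * (1 - q)` shrinks the gradient exactly when the prediction is most wrong; with cross-entropy that factor cancels out.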

To know more:
YouTube video: The softmax output function, by Geoffrey Hinton.

Rob DiPietro (2016) - A friendly introduction to cross-entropy loss
https://rdipietro.github.io/friendly-intro-to-cross-entropy-loss/

Peter Roelants (2016) - How to implement a neural network Intermezzo 2

https://peterroelants.github.io/


Keras documentation

https://keras.io/


## Image Classification Problem

In [1]:
# Convolutional Neural Network

#libraries
%matplotlib notebook   
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import os

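# - Set path -
# Raw string, so backslashes are not treated as escape characters. A raw string
# cannot end in a single backslash, hence the doubled \\ at the end.
# (named project_dir instead of dir, to avoid shadowing the built-in dir function)
project_dir = (r'C:\Users\msfernandez\Machine Learning A-Z\Machine Learning A-Z Template Folder\Part 8 - Deep Learning\Section 40 - Convolutional Neural Networks (CNN)\\')
os.chdir(project_dir)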

# - - - - - - - - - - - - - - 
# Part 1: Build the CNN
# - - - - - - - - - - - - - -

# Importing the Keras libraries and packages
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense

# Initialising the CNN
# - - - - - - - - - - - - - -

classifier = Sequential()



Using TensorFlow backend.


In [2]:
# Step 1 - Convolution
# 32 feature detectors (filters), each of size (3, 3); input images are 64x64 pixels with 3 colour channels.
classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))

# Step 2 - Pooling
classifier.add(MaxPooling2D(pool_size = (2, 2))) # max pooling over 2x2 windows; the stride defaults to the pool size (2)

# Add a 2nd convolutional layer to improve the accuracy.
# It is popular to double the number of filters (to 64) in the 2nd layer; here 32 filters are kept.
classifier.add(Conv2D(32, (3, 3), activation = 'relu')) # convolution
classifier.add(MaxPooling2D(pool_size = (2, 2))) # pooling

# Step 3 - Flattening
classifier.add(Flatten())

# Step 4 - Full connection
classifier.add(Dense(units = 128, activation = 'relu'))  # fully connected layer (dense)
classifier.add(Dense(units = 1, activation = 'sigmoid')) # output layer

# Compiling the CNN
# - - - - - - - - - - - - - -

classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy']) # binary outcome (two classes)

# Part 2 - Fitting the CNN to the images

from keras.preprocessing.image import ImageDataGenerator

# Image augmentation, to prevent overfitting on the small set.
train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)

test_datagen = ImageDataGenerator(rescale = 1./255)

training_set = train_datagen.flow_from_directory('dataset/training_set',
                                                 target_size = (64, 64),
                                                 batch_size = 32,
                                                 class_mode = 'binary')

test_set = test_datagen.flow_from_directory('dataset/test_set',
                                            target_size = (64, 64),
                                            batch_size = 32,
                                            class_mode = 'binary')

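# Note: fit_generator belongs to older Keras versions; newer versions deprecate it in favour of fit.
classifier.fit_generator(training_set,
                         steps_per_epoch = 250, # number of training images / batch size = 8000/32
                         epochs = 25,
                         validation_data = test_set,
                         validation_steps = 62) # number of test images / batch size = 2000/32 = 62.5, rounded down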


Found 8000 images belonging to 2 classes.
Found 2000 images belonging to 2 classes.
Epoch 1/1


<keras.callbacks.History at 0x14d8921ae80>
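As a sanity check on the architecture above, the layer output sizes can be worked out by hand. The sketch below assumes the Keras defaults used in the code (no padding and a stride of 1 for `Conv2D`, a stride equal to the pool size for `MaxPooling2D`); the helper names are made up for illustration.

```python
def conv_output(size, kernel=3):
    """Output width/height of a convolution without padding, stride 1."""
    return size - kernel + 1

def pool_output(size, pool=2):
    """Output width/height of pooling with stride equal to the pool size."""
    return size // pool

size = 64                   # input images are 64x64
size = conv_output(size)    # 1st convolution: 62
size = pool_output(size)    # 1st pooling:     31
size = conv_output(size)    # 2nd convolution: 29
size = pool_output(size)    # 2nd pooling:     14

flattened_units = size * size * 32   # 32 feature maps of 14x14 values
steps_per_epoch = 8000 // 32         # training images / batch size
validation_steps = 2000 // 32        # test images / batch size, rounded down

print(size, flattened_units, steps_per_epoch, validation_steps)
```

The same shapes can be read from `classifier.summary()`, which lists the output shape of every layer.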