# CNNs in `keras`

Notebook by [Aaron Berk](http://asberk.ca) for the 2017 [BC Data Science workshop](http://workshop.bcdata.ca).

In this tutorial notebook, we demonstrate how to construct a CNN model in `keras` for recognizing handwritten digits. We leave it as an exercise for the reader to combine the material from the previous three notebooks to complete the training and evaluation of the constructed CNN.

## Import packages

In [None]:
import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
import tensorflow as tf
sess = tf.Session()

In [None]:
from keras import backend as K
K.set_session(sess)

`keras` offers several kinds of convolutional layers. Since we're working with images, we'll use a `Conv2D` layer. Following standard practice, we'll also use max pooling to aggregate the output of each convolution and reduce the spatial dimensions of the data.
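To make the dimension reduction concrete, here is a minimal NumPy sketch of 2×2 max pooling on a single-channel array whose sides are even, so no padding is needed. It is an illustration only, not how `keras` implements `MaxPooling2D`, and the name `max_pool_2x2` is hypothetical.

```python
import numpy as np

def max_pool_2x2(a):
    """Hypothetical helper: 2x2 max pooling with stride 2 on an (H, W) array.

    Assumes H and W are even, so no padding is required.
    """
    h, w = a.shape
    # Group the array into non-overlapping 2x2 blocks, then keep each block's max.
    blocks = a.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

a = np.array([[1, 3, 2, 0],
              [4, 2, 1, 5],
              [0, 1, 7, 2],
              [3, 6, 2, 8]])
pooled = max_pool_2x2(a)
print(pooled)
```

The 4×4 input becomes the 2×2 array `[[4, 5], [6, 8]]`: each output entry is the largest value in its 2×2 block, so the data shrinks by a factor of four while the strongest responses survive.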

In [None]:
from keras.models import Model
from keras.layers import Dense, Dropout, Input
from keras.layers import Conv2D, Flatten, MaxPooling2D

Next, import the `mnist` module, which loads the MNIST data set, and `to_categorical`, which converts the `y` labels to categorical form (one-hot encoding).
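As a rough illustration of what one-hot encoding means (not the actual source of `to_categorical`), the NumPy sketch below maps each integer label to a row containing a single 1. The helper name `one_hot` is hypothetical; like `to_categorical`, it infers the number of classes as the largest label plus one when none is given.

```python
import numpy as np

def one_hot(labels, num_classes=None):
    """Hypothetical helper: one-hot encode a 1-D array of integer class labels."""
    labels = np.asarray(labels, dtype=int)
    if num_classes is None:
        # Infer the number of classes from the largest label.
        num_classes = labels.max() + 1
    encoded = np.zeros((labels.shape[0], num_classes), dtype='float32')
    # Row i gets a 1 in column labels[i] and 0 everywhere else.
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

y = np.array([5, 0, 4])            # e.g. three digit labels
y_1h = one_hot(y, num_classes=10)  # shape (3, 10)
print(y_1h)
```

Taking `np.argmax(y_1h, axis=1)` recovers the original labels `[5, 0, 4]`; the same trick turns the CNN's predicted class probabilities back into digit labels.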

In [None]:
from keras.datasets import mnist
from keras.utils import to_categorical

Import `classification_report` and `confusion_matrix` to display the test results.
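For intuition about what `confusion_matrix` reports, here is a minimal NumPy sketch that counts how often true class `i` was predicted as class `j`. The helper `confusion_counts` and the toy labels are hypothetical; the layout follows scikit-learn's convention of rows for true labels and columns for predicted labels.

```python
import numpy as np

def confusion_counts(y_true, y_pred, num_classes):
    """Hypothetical helper: entry [i, j] counts samples with true label i predicted as j."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 0]
cm = confusion_counts(y_true, y_pred, num_classes=3)
print(cm)
```

The result is `[[2, 0, 0], [0, 1, 1], [0, 0, 2]]`. The diagonal holds the correct predictions (5 of the 6 here), and the off-diagonal 1 in row 1, column 2 is the single class-1 sample mistaken for class 2.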

In [None]:
from sklearn.metrics import classification_report, confusion_matrix

## Preliminary definitions

To construct our CNN, we'll chain together several convolutional layers, each followed by a max-pooling layer:
$$
\mathrm{data} \to \mathrm{Conv2D} \to \mathrm{MaxPool2D} \to \mathrm{Conv2D} \to \mathrm{MaxPool2D} \to \mathrm{Conv2D} \to \mathrm{MaxPool2D} \to \cdots
$$
Since the same pattern repeats, it is convenient to define a few helper functions to speed up the construction of our model. Building a CNN in `keras` is very similar to building a dense network in `keras`; it just uses different kinds of layers.

* `Conv2D` takes as arguments a number of `filters`, a `tuple` for the `kernel_size`, an `activation` function, `strides`, and a `padding`. For all of our `Conv2D` layers, we will use `kernel_size=(3,3)`, `activation='relu'`, and `padding='same'`, leaving `strides` at its default of `(1,1)`.
* `MaxPooling2D` (also available under the alias `MaxPool2D`) takes as arguments a `pool_size` and a `padding`. We will use the standard `(2,2)` and `'same'` for these, respectively.
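These settings determine how the image size changes through the network. With `padding='same'`, a layer with stride `s` produces `ceil(n / s)` outputs along a side of length `n`: a stride-1 `Conv2D` keeps the size, and a `(2,2)` max pool (whose stride defaults to its pool size) halves it, rounding up. A small sketch of this arithmetic, assuming three convolution plus pooling blocks with 16 filters as in the helper functions of this notebook:

```python
import math

def same_output_size(n, stride):
    """Spatial size after a layer with padding='same': ceil(n / stride)."""
    return math.ceil(n / stride)

size = 28
sizes = [size]
for _ in range(3):                       # three conv + max-pool blocks
    size = same_output_size(size, 1)     # Conv2D with stride 1: size unchanged
    size = same_output_size(size, 2)     # 2x2 max pool, stride 2: halved, rounded up
    sizes.append(size)

filters = 16                             # assumed number of filters per Conv2D layer
flat_features = sizes[-1] ** 2 * filters
print(sizes, flat_features)
```

This gives sizes `[28, 14, 7, 4]`, so flattening the last 4×4×16 output yields 256 features for the final `Dense` layer.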

To combine the two, we define a small helper for each layer type (`reluConv2d` and `mp2d`) and a function `convMP` that performs convolution and then max-pooling.
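To see what `convMP` computes, here is a single-channel, single-filter NumPy sketch: a 3×3 convolution with zero ("same") padding and ReLU, followed by 2×2 max pooling that pads odd sizes at the bottom and right. Like Keras, it computes a cross-correlation (the kernel is not flipped). The function names and kernel values are hypothetical; a real `Conv2D` layer learns many kernels, sums over input channels, and adds a bias.

```python
import numpy as np

def relu_conv3x3_same(img, kernel):
    """Hypothetical helper: 3x3 cross-correlation with zero padding, then ReLU."""
    h, w = img.shape
    padded = np.pad(img, 1, mode='constant')   # one ring of zeros keeps the output h x w
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)
    return np.maximum(out, 0.0)                 # ReLU

def max_pool_2x2_same(a):
    """Hypothetical helper: 2x2 max pooling, stride 2; odd sizes are padded first."""
    h, w = a.shape
    # Pad bottom/right with -inf so padded cells never win the max.
    padded = np.pad(a, ((0, h % 2), (0, w % 2)), mode='constant',
                    constant_values=-np.inf)
    ph, pw = padded.shape
    return padded.reshape(ph // 2, 2, pw // 2, 2).max(axis=(1, 3))

def conv_mp(img, kernel):
    """Sketch of convMP for one channel and one filter."""
    return max_pool_2x2_same(relu_conv3x3_same(img, kernel))

img = np.arange(49, dtype=float).reshape(7, 7)
kernel = np.array([[0., 1., 0.],
                   [1., -4., 1.],
                   [0., 1., 0.]])   # Laplacian-style kernel, chosen arbitrarily
out = conv_mp(img, kernel)
print(out.shape)
```

A 7×7 input yields a 4×4 output, matching the `ceil(7 / 2) = 4` rule for `'same'` pooling.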

In [None]:
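def reluConv2d(x, filters=16):
    # 3x3 convolution with ReLU; padding='same' keeps the spatial size
    return Conv2D(filters=filters, kernel_size=(3,3),
                  activation='relu', padding='same')(x)

def mp2d(x):
    # 2x2 max pooling; halves each spatial dimension (rounding up)
    return MaxPooling2D(pool_size=(2,2),
                        padding='same')(x)

def convMP(x, filters=16):
    # convolution followed by max pooling
    return mp2d(reluConv2d(x, filters))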

## Construct the network

In [None]:
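input_img = Input(shape=(28,28,1))               # 28x28 grayscale images (channels_last)
x = convMP(input_img)                            # -> 14x14x16
x = convMP(x)                                    # -> 7x7x16
x = convMP(x)                                    # -> 4x4x16
x = Flatten()(x)                                 # -> 256 features
final_cnn = Dense(10, activation='softmax')(x)   # one probability per digit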

digits_cnnclf = Model(input_img, final_cnn, name='cnnClassifier')

In [None]:
digits_cnnclf.compile(optimizer='adadelta',
                      loss='categorical_crossentropy',
                      metrics=['mse', 'accuracy'])
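The `loss='categorical_crossentropy'` setting pairs the final `softmax` layer with one-hot targets such as those produced by `to_categorical`. As a hedged sketch of the formula only (the library version differs in numerical details), the per-sample loss is $-\sum_k y_k \log p_k$, averaged over the batch. The helper names and the 3-class toy scores are hypothetical; the model above uses 10 classes.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true_1h, probs, eps=1e-7):
    """Mean over samples of -sum_k y_k * log(p_k); eps guards against log(0)."""
    probs = np.clip(probs, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true_1h * np.log(probs), axis=1)))

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])   # hypothetical scores: 2 samples, 3 classes
y_1h = np.array([[1., 0., 0.],
                 [0., 0., 1.]])
probs = softmax(logits)
loss = categorical_crossentropy(y_1h, probs)
print(probs.sum(axis=1), loss)
```

Because each target is one-hot, only the log-probability of the true class contributes, so the loss falls as the model assigns more probability to the correct digit.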

## Load data

In [None]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [None]:
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))  # adapt this if using `channels_first` image data format
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))  # adapt this if using `channels_first` image data format
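The comments above mention the `channels_first` image data format. As a sketch using a random stand-in array rather than the real MNIST data, the preprocessing can branch on the data format; here `data_format` is hard-coded as an assumption, whereas in this notebook `K.image_data_format()` reports the configured value. Note that `Input(shape=...)` would then need the matching `input_shape`.

```python
import numpy as np

# Stand-in for x_train: 5 random 28x28 "images" with uint8 pixels (hypothetical data).
x = np.random.randint(0, 256, size=(5, 28, 28)).astype('uint8')

data_format = 'channels_first'   # assumption; K.image_data_format() gives the real setting

x = x.astype('float32') / 255.   # scale pixels to [0, 1]
if data_format == 'channels_first':
    x = np.reshape(x, (len(x), 1, 28, 28))   # (samples, channels, rows, cols)
    input_shape = (1, 28, 28)
else:
    x = np.reshape(x, (len(x), 28, 28, 1))   # (samples, rows, cols, channels)
    input_shape = (28, 28, 1)
print(x.shape, input_shape)
```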

In [None]:
y_train_1h = to_categorical(y_train)
y_test_1h = to_categorical(y_test)

In [None]:
print(x_train.shape)
print(x_test.shape)

## Exercises

<div class='alert alert-block alert-info'>
1. `fit` the model to the **training data** using a `validation_split` of `.2` and 10 epochs. Remember to `shuffle` the data!  
2. Evaluate the model **on the test data**. Report the accuracy.  
3. Display a classification report for the **test** data.  
4. What are precision, recall, and support?  
5. Plot a confusion matrix, using the *inferno* colormap.  
6. What is the total number of *trainable parameters* the model has? How many in each layer? Are there more or fewer parameters than in the previous model?
7. Save your CNN model. In a new notebook, load your *simple model* and your *CNN model*. On which entries of the test data do the two models differ in their predictions? Visualize these images using `matplotlib`.
</div>