# A Convolutional Neural Network implementation example in Keras

[Code for original example](https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py)

### Set-up configurations

In [48]:
batch_size = 128
num_classes = 10
epochs = 1

Input image dimensions

In [2]:
img_rows, img_cols = 28, 28

### Load the data

In [3]:
from keras.datasets import mnist

Using TensorFlow backend.


In [4]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()

### A little bit of housekeeping...

In [5]:
import keras
from keras import backend as K

Different Keras backends assume different image data layouts (channels first or channels last). We are using TensorFlow, but handling both cases is good practice ;)

In [8]:
if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)
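
To make the two layouts concrete, here is a minimal NumPy sketch. The array `fake_images` is a made-up stand-in for a small batch, not real MNIST data:

```python
import numpy as np

# Hypothetical stand-in for a batch of 4 grayscale 28x28 images.
fake_images = np.zeros((4, 28, 28), dtype=np.uint8)

# 'channels_last' layout: (samples, rows, cols, channels)
channels_last = fake_images.reshape(fake_images.shape[0], 28, 28, 1)

# 'channels_first' layout: (samples, channels, rows, cols)
channels_first = fake_images.reshape(fake_images.shape[0], 1, 28, 28)

print(channels_last.shape)   # (4, 28, 28, 1)
print(channels_first.shape)  # (4, 1, 28, 28)
```

With a single channel the pixel values are the same in both cases; only the position of the channel axis changes.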

In [9]:
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

Normalize the values from the [0, 255] range to the [0, 1] range

In [11]:
x_train /= 255
x_test /= 255
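
The cast to `float32` has to come before the division, because NumPy refuses in-place true division on an integer array. A small sketch with made-up pixel values (the `pixels` array is an assumption for illustration):

```python
import numpy as np

# Hypothetical pixel values covering the full uint8 range.
pixels = np.array([0, 51, 255], dtype=np.uint8)

# Cast first: in-place true division is not allowed on an integer array.
scaled = pixels.astype('float32')
scaled /= 255

print(scaled.min(), scaled.max())  # 0.0 1.0
```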

Convert class vectors to binary class matrices

In [18]:
y_train

array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)

In [19]:
y_train = keras.utils.to_categorical(y_train, num_classes)

In [22]:
y_train

array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 1.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       ..., 
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  1.,  0.]])

In [23]:
y_test = keras.utils.to_categorical(y_test, num_classes)
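
The same conversion can be sketched in plain NumPy. This is only an illustration of the idea using `np.eye`, not a description of how `keras.utils.to_categorical` is implemented:

```python
import numpy as np

num_classes = 10
labels = np.array([5, 0, 4], dtype=np.uint8)  # first three labels shown above

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes, dtype='float32')[labels]

print(one_hot.shape)  # (3, 10)
print(one_hot[0])     # 1.0 at index 5, 0.0 elsewhere
```

Taking `one_hot.argmax(axis=1)` recovers the original integer labels.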

### Sanity check

Shape of the training tensor

In [13]:
x_train.shape

(60000, 28, 28, 1)

Number of samples for training

In [15]:
x_train.shape[0]

60000

Number of samples for testing

In [21]:
x_test.shape[0]

10000

## Build the model

In [26]:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D

In [33]:
model = Sequential()

In [34]:
model.add(Conv2D(32, 
                 kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))

ReLU (rectified linear unit)

<img src="http://cs231n.github.io/assets/nn1/relu.jpeg">
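
In code, ReLU is simply an element-wise `max(0, x)`. A minimal NumPy sketch (the helper name `relu` is chosen here for illustration):

```python
import numpy as np

def relu(x):
    """Element-wise max(0, x)."""
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # negatives become 0
```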

Convolutional layer

<img src='http://engineering.flipboard.com/assets/convnets/Convolution_schematic.gif'>

Each convolutional layer has multiple filters in it:

<img src='https://codelabs.developers.google.com/codelabs/cloud-tensorflow-mnist/img/53c160301db12a6e.png'>
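
What a single filter computes can be sketched with plain NumPy loops. This assumes stride 1 and no padding (the Keras default, `padding='valid'`); the input image and the averaging kernel are made up for illustration, whereas a real layer learns its kernels during training:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the patch currently under the kernel.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(28 * 28, dtype='float32').reshape(28, 28)  # made-up input
kernel = np.ones((3, 3), dtype='float32') / 9  # hypothetical averaging filter

feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # (26, 26)
```

A `Conv2D` layer with 32 filters learns 32 such kernels (plus biases), each producing its own feature map. Strictly speaking the loop computes a cross-correlation, which is what deep learning libraries conventionally call convolution.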

In [35]:
model.add(Conv2D(64, 
                 kernel_size=(3, 3),
                 activation='relu'))

In [36]:
model.add(MaxPooling2D(pool_size=(2,2)))

This is what max-pooling does:

<img src='http://i.imgur.com/MxuEsSo.gif'>
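
The same operation as a NumPy sketch, assuming non-overlapping 2x2 windows (the helper `max_pool_2x2` and the sample values are made up for illustration):

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 block."""
    h, w = feature_map.shape
    # Drop a trailing row/column if the size is odd.
    trimmed = feature_map[:h - h % 2, :w - w % 2]
    blocks = trimmed.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 0, 5, 6],
                        [1, 2, 7, 8]])
print(max_pool_2x2(feature_map))
# [[4 2]
#  [2 8]]
```

Each spatial dimension is halved, so the number of values passed to the following layers drops by a factor of four.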

In [37]:
model.add(Dropout(0.25))

In [38]:
model.add(Flatten())

In [39]:
model.add(Dense(128, activation='relu'))

In [40]:
model.add(Dropout(0.5))

In [41]:
model.add(Dense(num_classes, activation='softmax'))

### Compile the model: set the loss and the optimizer

In [42]:
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
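
For intuition, here is a single-sample NumPy sketch of what the softmax output and the categorical cross-entropy loss compute. It is a simplification under stated assumptions: the scores are made up, the `eps` clipping value is chosen here for illustration, and Keras additionally averages the loss over the batch:

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Negative log-probability assigned to the true class."""
    return -np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)))

logits = np.array([2.0, 1.0, 0.1])  # made-up scores for 3 classes
y_true = np.array([1.0, 0.0, 0.0])  # one-hot target: class 0

probs = softmax(logits)
print(probs.sum())
print(categorical_crossentropy(y_true, probs))
```

The loss shrinks toward 0 as the probability assigned to the correct class approaches 1.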

## Train the model

In [49]:
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test))

Train on 60000 samples, validate on 10000 samples
Epoch 1/1


In [45]:
score = model.evaluate(x_test, y_test, verbose=0)

Test loss:

In [46]:
score[0]

0.13700307509973644

Test accuracy:

In [47]:
score[1]

0.95779999999999998

## Exercise 1.

Try training the model for more epochs (i.e. increase the `epochs` variable to, e.g., 12). What do you observe?

## Exercise 2. (the hard one)

Change the model definition (or create a new model) to use the following layer configuration:
 * **2D Convolution**, with 32 filters and a 5x5 window, ReLU activation;
 * **Max-Pooling**, with a factor of 2 horizontally and vertically;
 * **Dropout**, keeping each value with probability 0.7 (careful with this one ;) );
 * **2D Convolution**, with 64 filters and a 5x5 window, ReLU activation;
 * **Max-Pooling**, with a factor of 2 horizontally and vertically;
 * **Dropout**, keeping each value with probability 0.7;
 * **Flatten** the output, to prepare it for the fully-connected layer;
 * A **Dense** layer, with 1024 neurons;
 * An output **Dense** layer, with 10 neurons, activated with softmax.
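
As a hint, it helps to track how the image size changes through the layer list above before building the model. The sketch below is plain arithmetic and assumes stride-1 convolutions without padding (the Keras default, `padding='valid'`); the helper names are made up for illustration:

```python
def conv_out(size, kernel):
    """Output side length of a stride-1 convolution without padding."""
    return size - kernel + 1

def pool_out(size, factor):
    """Output side length of non-overlapping pooling."""
    return size // factor

side = 28                 # MNIST images are 28x28
side = conv_out(side, 5)  # first 5x5 convolution
side = pool_out(side, 2)  # first 2x2 max-pooling
side = conv_out(side, 5)  # second 5x5 convolution
side = pool_out(side, 2)  # second 2x2 max-pooling

flattened = side * side * 64  # 64 feature maps after the second convolution
print(side, flattened)  # 4 1024
```

Calling `model.summary()` on your model prints the output shape of each layer, so you can compare it against this arithmetic.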


Does the result improve?


(TIP: See the [Keras documentation](https://keras.io/) for the layer definitions and their parameters.)

## Exercise 3. (the hardest one)

Can you come up with an architecture and configuration that improves on the previous results? Show it ;)