# Part 2: Training Convolutional Neural Networks

In [None]:
%matplotlib inline

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# specifies the default figure size for this notebook
plt.rcParams['figure.figsize'] = (10, 10)

# specifies the default color map
plt.rcParams['image.cmap'] = 'gray'

# we'll use keras (as bundled with TensorFlow) to build our networks;
# importing everything from tensorflow.keras avoids mixing two keras packages
from tensorflow import keras
from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import Flatten, Dense, Dropout
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.layers import ZeroPadding2D

from tensorflow.keras.datasets import mnist
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.utils import to_categorical

## Training your own Convolutional Neural Network

We're going to use the MNIST dataset, in which the goal is to categorise images of handwritten digits into one of 10 categories. ([more information about MNIST here](http://yann.lecun.com/exdb/mnist/))

If you're interested in benchmarks, have a look [here](http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html). 

### Loading the MNIST dataset

To load and prepare the MNIST dataset:

* use `mnist.load_data()` to load the data (hopefully you have done that earlier)
* separate the train and test sets
* normalise the pixel values by dividing by 255
* there are 10 categories, so use `to_categorical` to one-hot encode the labels into 10 classes
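
To make the last two steps concrete, here is a minimal NumPy sketch of scaling and one-hot encoding. It is not how Keras implements `to_categorical`; the helper `one_hot` and the random stand-in arrays `fake_images` and `fake_labels` are hypothetical names used only for illustration.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float32 matrix with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], num_classes), dtype="float32")
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

# Hypothetical stand-in data: four 28x28 "images" with pixel values in 0..255
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
fake_labels = np.array([5, 0, 4, 1])

# Scale to [0, 1] and add the trailing channel axis expected by Conv2D
scaled = fake_images.astype("float32").reshape(-1, 28, 28, 1) / 255
targets = one_hot(fake_labels, 10)

print(scaled.shape, scaled.min(), scaled.max())
print(targets)
```

Each row of `targets` contains a single 1 in the column of the digit's class, which is the format a 10-way softmax output is compared against.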


In [None]:
# Load and prepare the MNIST data
# Load the data
(X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()

# Turn our images into floating point numbers and add a trailing channel axis
X_train = X_train.astype('float32').reshape(-1, 28, 28, 1)
X_test = X_test.astype('float32').reshape(-1, 28, 28, 1)

# Put our input data in the range 0-1
X_train /= 255
X_test  /= 255

# Convert class vectors to binary class matrices
Y_train = to_categorical(y_train, 10)
Y_test  = to_categorical(y_test, 10)

# Check the shape of the data
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

### Building the model

Let's define a model. You will use a small one so that training is not too slow. Let's go step by step.

### Implementing a convolutional model

You are going to define the first convolutional layer of the network. 

In what follows, you don't have to modify the cells. Just run them, making sure you understand what is being done. Do not tune the parameters, as we will load pre-trained weights on the architecture!

In [None]:
# Create the model; it's a Sequential model (a stack of layers, one after the other)
model = Sequential()

# On the very first layer, you must specify the input shape.
# The first convolutional layer has 32 filters of size 5x5;
# padding='same' zero-pads the input so the output keeps the 28x28 spatial size,
# and the layer uses a relu activation function.
model.add(
    Conv2D(
        32,
        (5, 5),
        padding='same', 
        input_shape=(28, 28, 1),
        activation='relu'
    )
)
model.add(MaxPooling2D(pool_size=(3, 3)))
model.add(Dropout(0.1))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(3, 3)))
model.add(Dropout(0.1))
model.add(Flatten())
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(10, activation='softmax'))

In [None]:
model.summary()

**Quiz: HOW MANY WEIGHTS IN THE NETWORK?**

- How many convolution weights does the first layer contain? What about the second layer?
- Are there any other weights in those layers?
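
If you want to check your answers without reading them off `model.summary()`, the counts can be derived by hand. The sketch below assumes the Keras defaults for the layers defined above (stride-1 convolutions, `'valid'` padding unless stated otherwise, and pooling with a stride equal to the pool size); the helper names are hypothetical and not part of Keras.

```python
def conv_output_size(size, kernel, padding):
    """Spatial size after a stride-1 convolution."""
    if padding == "same":
        return size              # zero-padding keeps the size unchanged
    return size - kernel + 1     # "valid": no padding

def pool_output_size(size, pool):
    """Spatial size after max pooling with stride equal to the pool size."""
    return size // pool

def conv_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """One weight per input-unit pair plus one bias per unit."""
    return inputs * units + units

# Trace the spatial size through the layers defined above
size = conv_output_size(28, kernel=5, padding="same")
size = pool_output_size(size, pool=3)
size = conv_output_size(size, kernel=3, padding="valid")
size = pool_output_size(size, pool=3)
flattened = size * size * 32     # 32 feature maps feed the Flatten layer

counts = {
    "conv_1": conv_params(5, 1, 32),
    "conv_2": conv_params(3, 32, 32),
    "dense_1": dense_params(flattened, 32),
    "dense_2": dense_params(32, 10),
}

for name, count in counts.items():
    print(name, count)
print("total", sum(counts.values()))
```

Pooling and dropout layers do not appear in `counts` because they have no trainable weights.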

### Define the training schedule

Using the Adam optimizer, you can compile the model.
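
The model is compiled with the `categorical_crossentropy` loss, which compares the softmax output with the one-hot labels. As a rough illustration of what that loss measures, here is a NumPy sketch; it is not Keras's implementation, and the function and the clipping constant `eps` are assumptions made for this example.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred))."""
    # Clip to avoid log(0); eps is an illustrative choice
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(-np.sum(y_true * np.log(y_pred), axis=1).mean())

# One sample whose true class is 3
y_true = np.zeros((1, 10))
y_true[0, 3] = 1.0

# A confident, correct prediction: 0.91 on class 3, 0.01 elsewhere
confident = np.full((1, 10), 0.01)
confident[0, 3] = 0.91

# A prediction that spreads probability evenly over all classes
uniform = np.full((1, 10), 0.1)

loss_confident = categorical_crossentropy(y_true, confident)
loss_uniform = categorical_crossentropy(y_true, uniform)
print(loss_confident, loss_uniform)
```

The loss is small when the network puts most of its probability on the correct class, and equals `log(10)` when it guesses uniformly over the 10 digits.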

In [None]:
# Using the Adam optimizer, as before

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

### Image pre-processing

We have to define the preprocessing for the images. Here we:

* introduce small random rotations and random horizontal and vertical shifts to create more ("perturbed") training samples, which also makes the NN more robust
* do not flip the images (why do you think we shouldn't?)

This is called data or image **augmentation**. Such randomization can improve results significantly, especially if you have a small dataset.
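
To see what a random shift does to a single image, here is a simplified NumPy sketch. It is not what `ImageDataGenerator` does internally: it assumes whole-pixel offsets and fills the exposed border with zeros, and the function `random_shift` is a hypothetical helper.

```python
import numpy as np

def random_shift(image, max_fraction, rng):
    """Shift a 2D image by a random whole-pixel offset, zero-filling the border."""
    height, width = image.shape
    max_dy = int(round(height * max_fraction))
    max_dx = int(round(width * max_fraction))
    dy = int(rng.integers(-max_dy, max_dy + 1))
    dx = int(rng.integers(-max_dx, max_dx + 1))

    shifted = np.zeros_like(image)
    # Source and destination windows that overlap after the shift
    src_rows = slice(max(0, -dy), min(height, height - dy))
    dst_rows = slice(max(0, dy), min(height, height + dy))
    src_cols = slice(max(0, -dx), min(width, width - dx))
    dst_cols = slice(max(0, dx), min(width, width + dx))
    shifted[dst_rows, dst_cols] = image[src_rows, src_cols]
    return shifted, (dy, dx)

# Stand-in "digit": a bright 8x8 square in the middle of a 28x28 image
image = np.zeros((28, 28), dtype="float32")
image[10:18, 10:18] = 1.0

rng = np.random.default_rng(42)
shifted, (dy, dx) = random_shift(image, max_fraction=0.1, rng=rng)
print("offset (rows, cols):", (dy, dx))
```

The label stays the same while the pixels move, so every epoch the network sees slightly different versions of each training image.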

In [None]:
# Preprocessing: random augmentation only (no normalization option is enabled here)
datagen = ImageDataGenerator(
        rotation_range=10,                       # randomly rotate images by up to 10 degrees
        width_shift_range=0.1,                   # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1,                  # randomly shift images vertically (fraction of total height)
        horizontal_flip=False,                   # do not flip images horizontally
        vertical_flip=False)                     # do not flip images vertically

# Compute quantities required for featurewise normalization (std, mean);
# only needed when featurewise options are enabled, harmless otherwise
datagen.fit(X_train)
batch_size = 128
data_gen = datagen.flow(X_train, Y_train, batch_size=batch_size)

And you're set! You can start training and see the accuracy improve! 

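Use the `.fit(data_gen, ...)` method. The generator already yields batches of `batch_size = 128` images, so you only need to set the number of steps per epoch, train for 2 epochs, and add the validation data, testing the validation accuracy every epoch.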

In [None]:
nb_epoch = 2

model.fit(
    data_gen,
    steps_per_epoch=X_train.shape[0] // batch_size,
    epochs=nb_epoch,
    validation_data=(X_test, Y_test),
    validation_freq=1)

In [None]:
# how did this model perform?
# evaluate the model on the test data
model.evaluate(X_test, Y_test)
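
Because the model was compiled with `metrics=['accuracy']`, `model.evaluate` reports the loss followed by the accuracy. As a sketch of how that accuracy relates to the predicted probabilities, the NumPy snippet below compares the most probable class with the one-hot label; the function `accuracy` and the small three-class arrays are hypothetical examples, not output from the model.

```python
import numpy as np

def accuracy(probabilities, one_hot_labels):
    """Fraction of samples whose most probable class matches the label."""
    predicted = probabilities.argmax(axis=1)
    actual = one_hot_labels.argmax(axis=1)
    return float((predicted == actual).mean())

# Hypothetical predictions for three samples over three classes
probabilities = np.array([
    [0.1, 0.7, 0.2],   # predicts class 1
    [0.8, 0.1, 0.1],   # predicts class 0
    [0.3, 0.3, 0.4],   # predicts class 2
])

# True classes are 1, 0 and 1, written as one-hot rows
one_hot_labels = np.array([
    [0, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
])

score = accuracy(probabilities, one_hot_labels)
print(score)
```

With the trained network, the same function could be applied to `model.predict(X_test)` and `Y_test`.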


In [None]:
# check the first test image
image = X_test[:1]
plt.imshow(image.reshape(28, 28), cmap='gray')
print(f"Is this number {model.predict(image).argmax()}?")
