# CNN Deep Learning on the MNIST Dataset

## By Christopher Hauman
<br>

This brief guide covers building a simple Convolutional Neural Network (CNN) with keras. It is a sequel to my more detailed guide and introduction to neural networks, [MLP Deep Learning on the MNIST Dataset](https://github.com/chrisman1015/Deep-Learning/blob/master/MLP%20Deep%20Learning%20on%20MNIST%20Data/MLP%20Deep%20Learning%20on%20Mnist%20Data.ipynb). It adapts and explains the CNN example in [keras' documentation](https://keras.io/examples/mnist_cnn/).
<br>

If you're new to CNNs, I'd highly recommend [Brandon Rohrer](https://youtu.be/FmpDIaiMIeA)'s guide on them, which covers all the theory you need for this implementation guide. This type of learning also falls under the umbrella of supervised machine learning, which you can learn much more about in my guides [here](https://github.com/chrisman1015/Supervised-Learning).
<br>

Note: This guide assumes basic knowledge of Python data science. If you don't have it, or you run into something unfamiliar, don't worry! You can get a crash course in my guide, [Cleaning MLB Statcast Data using pandas DataFrames and seaborn Visualization](https://github.com/chrisman1015/Cleaning-Statcast-Data/blob/master/Cleaning%20Statcast%20Data/Cleaning%20Statcast%20Data.ipynb). 
<br>
***

```python
# import libraries
from __future__ import print_function
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K
from keras.callbacks import EarlyStopping

# log the device (CPU or GPU) each operation runs on, to confirm the GPU is being used
# (tf.Session and tf.ConfigProto are TensorFlow 1.x APIs)
import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
```

Let's start by importing the data as usual:

```python
# input image dimensions
img_rows, img_cols = 28, 28

# the data, split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
```

A key difference between using keras for MLP neural networks and for CNNs is the input shape. An MLP needs each image flattened into a 1-D vector (784 values for a 28x28 image), while a CNN needs each image to keep its 2-D (in this case square) shape so the convolution windows can slide across neighboring pixels. 
<br>
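To make the difference concrete, here is a small NumPy sketch of the two input layouts. It uses a stand-in batch of 5 all-zero images rather than the real MNIST data, so only the shapes matter.

```python
import numpy as np

# stand-in batch of 5 grayscale 28x28 images (zeros instead of real digits)
batch = np.zeros((5, 28, 28), dtype=np.uint8)

# MLP input: each image flattened into a single 784-element vector
mlp_input = batch.reshape(batch.shape[0], 28 * 28)

# CNN input: each image keeps its 28x28 grid, plus a trailing channel axis of size 1
cnn_input = batch.reshape(batch.shape[0], 28, 28, 1)

print(mlp_input.shape)  # (5, 784)
print(cnn_input.shape)  # (5, 28, 28, 1)
```

Both arrays hold exactly the same pixels; only the layout differs, which is why reshaping is all the preprocessing the switch from MLP to CNN requires.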

Let's look at the shape of the X_train data:

```python
X_train.shape
```

```
(60000, 28, 28)
```

We see that X_train holds 60,000 images, each 28x28 pixels. Keras' **Conv2D** layer expects 4-D input, by default in the format (batch, height, width, channels), so we are missing one dimension: the channel. The channel dimension has size 3 for RGB color images and size 1 for grayscale images like MNIST. We can fix the shape by reshaping X_train and X_test to add a channel dimension of size 1.

```python
print('X_train before reshaping:', X_train.shape)
# add a channel dimension of size 1 (grayscale) to the train and test images
X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
print("X_train after reshaping:", X_train.shape)
```

```
X_train before reshaping: (60000, 28, 28)
X_train after reshaping: (60000, 28, 28, 1)
```
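The reshape above assumes keras' default channels-last layout. The keras example this guide adapts checks `K.image_data_format()` and puts the channel first when the backend is configured for `'channels_first'`. Below is a NumPy-only sketch of that choice; `add_channel_axis` is a hypothetical helper, and its `data_format` argument simply mirrors the two strings that `K.image_data_format()` can return. `np.expand_dims` gives the same result as the explicit reshape.

```python
import numpy as np

def add_channel_axis(images, data_format="channels_last"):
    """Hypothetical helper: add a size-1 channel axis to a (batch, height, width) array."""
    if data_format == "channels_first":
        return np.expand_dims(images, axis=1)   # (batch, 1, height, width)
    return np.expand_dims(images, axis=-1)      # (batch, height, width, 1)

# stand-in data: 3 random 28x28 "images"
images = np.random.randint(0, 256, size=(3, 28, 28)).astype(np.uint8)

last = add_channel_axis(images)                      # default, as used in this guide
first = add_channel_axis(images, "channels_first")

print(last.shape)   # (3, 28, 28, 1)
print(first.shape)  # (3, 1, 28, 28)
# identical to reshaping with an explicit trailing 1
print(np.array_equal(last, images.reshape(3, 28, 28, 1)))  # True
```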


Now X_train and X_test are in the correct shape. Let's also store the input shape, which we'll pass to the first convolutional layer just as in the MLP example. We'll also normalize the pixel values to the range [0, 1] and convert the labels into one-hot (categorical) vectors as usual.

```python
# get CNN first layer input shape: (28, 28, 1)
input_shape = X_train[0].shape

# normalize pixel values from 0-255 to 0-1
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

num_classes = 10
# convert class vectors (digits 0-9) to binary class matrices (one-hot vectors)
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
```
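Both preprocessing steps are simple array operations. This NumPy sketch shows roughly what they do on a few made-up values; `one_hot` is a hypothetical stand-in for illustration, not keras' implementation of `to_categorical`.

```python
import numpy as np

# scaling: uint8 pixel values 0..255 become floats in [0, 1]
pixels = np.array([0, 51, 255], dtype=np.uint8)
scaled = pixels.astype("float32") / 255

def one_hot(labels, num_classes):
    """Hypothetical stand-in for to_categorical: row i of the identity matrix encodes class i."""
    return np.eye(num_classes, dtype="float32")[labels]

labels = np.array([5, 0, 4])   # three made-up digit labels
encoded = one_hot(labels, 10)

print(encoded.shape)  # (3, 10)
print(encoded[0])     # a 1 in position 5 and zeros everywhere else
```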

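The main new layer is **Conv2D**. Its first argument (32 or 64 in the model below) is the number of filters, i.e. the number of feature maps the layer produces. Its other key argument is:

**kernel_size**
- An integer or tuple/list of 2 integers, specifying the height and width of the 2D convolution window. A single integer uses the same value for both dimensions.
- A 3x3 kernel size means the convolutional window will be a 3x3 square.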
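To see what a 3x3 window does, here is a minimal NumPy sketch of one filter sliding over one single-channel image with stride 1 and no padding (keras' default `padding='valid'`). Like keras' Conv2D it computes a cross-correlation (the kernel is not flipped). The edge-detecting kernel is a hypothetical example; a real Conv2D layer learns its kernels, applies many filters at once, sums across input channels, and adds a bias.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over a 2-D `image` with stride 1 and no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # multiply the window element-wise by the kernel and sum the result
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.rand(28, 28)                # stand-in for one grayscale digit
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])          # hypothetical vertical-edge filter
feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # (26, 26)
```

Without padding, a 3x3 window fits in 26 positions along each 28-pixel side, which is why each convolution in the model shrinks the feature maps by 2 pixels in each direction.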

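You can read about the pooling layer [here](http://cs231n.github.io/convolutional-networks/#pool). The argument **pool_size** is a window size similar to **kernel_size**: **MaxPooling2D** keeps only the largest value in each window, and with pool_size=(2, 2) it halves the height and width of the feature maps.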
<br>
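Here is a minimal NumPy sketch of 2x2 max pooling on a single feature map, assuming keras' defaults for MaxPooling2D (strides equal to the pool size, and `'valid'` padding, which drops a leftover odd row or column). `max_pool_2x2` is a hypothetical helper for illustration.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 block of a 2-D array."""
    h2, w2 = feature_map.shape[0] // 2, feature_map.shape[1] // 2
    trimmed = feature_map[:h2 * 2, :w2 * 2]          # drop an odd last row/column
    # reshape into (block_row, row_in_block, block_col, col_in_block), then reduce
    return trimmed.reshape(h2, 2, w2, 2).max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 7, 2],
               [3, 2, 6, 8]])
print(max_pool_2x2(fm))
# [[4 5]
#  [3 8]]
```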

Otherwise, the model is very similar to the MLP: it is still Sequential and uses similar layers and arguments. Note that about halfway through we use **Flatten** to flatten the data into the 1-D arrays that the **Dense** layers expect.

```python
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))

model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

# compile the model
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer='adam',
              metrics=['accuracy'])
```
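It can help to trace the shapes and parameter counts through this architecture by hand. The arithmetic below is a sketch assuming keras' defaults (stride 1, `'valid'` padding, one bias per filter or unit); Dropout, MaxPooling2D, and Flatten have no trainable parameters. The totals should match what keras' `model.summary()` reports for this model.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # each filter has one weight per kernel cell per input channel, plus one bias
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(inputs, units):
    # one weight per input per unit, plus one bias per unit
    return (inputs + 1) * units

h, w = 28, 28
conv1 = conv2d_params(3, 3, 1, 32)
h, w = h - 2, w - 2            # after Conv2D(32): 26 x 26 x 32
conv2 = conv2d_params(3, 3, 32, 64)
h, w = h - 2, w - 2            # after Conv2D(64): 24 x 24 x 64
h, w = h // 2, w // 2          # after MaxPooling2D(2, 2): 12 x 12 x 64
flat = h * w * 64              # after Flatten: 9216 values
dense1 = dense_params(flat, 128)
dense2 = dense_params(128, 10)
total = conv1 + conv2 + dense1 + dense2

print(flat)                                      # 9216
print(conv1, conv2, dense1, dense2, total)       # 320 18496 1179776 1290 1199882
```

Almost all of the roughly 1.2 million parameters sit in the first Dense layer, because Flatten turns the 12x12x64 feature maps into 9,216 inputs.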

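We'll also fit the model with an early stopping monitor. By default, **EarlyStopping** watches the validation loss (`val_loss`), so we hold out 10% of the training data for validation with `validation_split`. Without validation data, the callback has nothing to monitor and never stops training early.

```python
# stop training once the validation loss fails to improve for 3 epochs in a row
early_stopping_monitor = EarlyStopping(patience=3)

batch_size = 128
epochs = 12

model.fit(X_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          validation_split=0.1,  # use the last 10% of X_train as validation data
          callbacks=[early_stopping_monitor],
          verbose=1)
```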

```
Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12
```


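The patience rule is easy to state in plain Python. This is a simplified sketch of the idea behind EarlyStopping (monitor a value to minimize, `min_delta=0`, no weight restoring); `early_stopping_epoch` is a hypothetical helper, and the loss values are made up, not taken from the run above.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# made-up validation losses: the best value (0.05) comes at epoch 3
losses = [0.10, 0.06, 0.05, 0.052, 0.051, 0.053, 0.04]
print(early_stopping_epoch(losses))            # 6
print(early_stopping_epoch([0.3, 0.2, 0.1]))   # None
```

Note that training stops at epoch 6 even though epoch 7 would have improved; a larger `patience` trades extra training time for tolerance of noisy validation loss.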

```python
score = model.evaluate(X_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
```

```
Test loss: 0.02781256944448555
Test accuracy: 0.9938
```
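The accuracy reported by `evaluate` compares the predicted class (the index of the largest softmax output) with the true class (the index of the 1 in the one-hot label). This NumPy sketch shows that comparison on four made-up predictions; `categorical_accuracy` here is a hypothetical helper, not the keras metric itself.

```python
import numpy as np

def categorical_accuracy(y_true, y_pred):
    # fraction of samples whose highest-probability class matches the one-hot label
    return np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1))

y_true = np.eye(3)[[0, 2, 1, 1]]            # one-hot labels for 4 made-up samples
y_pred = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.3, 0.5],
                   [0.1, 0.6, 0.3],
                   [0.5, 0.4, 0.1]])        # made-up softmax outputs
print(categorical_accuracy(y_true, y_pred))  # 0.75 (the last sample is wrong)
```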


Look at how high the accuracy is! The model correctly classifies 99.38% of the 10,000 test images, missing only 62 of them. For image classification, CNNs are an incredibly useful tool.