## Purpose

In this notebook, we build a basic convolutional neural network to perform a 10-class classification task on the MNIST dataset.

In [1]:
from keras.datasets import mnist
from keras.utils import to_categorical
from keras import layers
from keras import models
import warnings
warnings.filterwarnings('ignore')

Using TensorFlow backend.


## Data

We begin by loading the MNIST dataset.

In [2]:
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz


In [3]:
print(train_images.shape)
print(test_images.shape)
print(train_labels.shape)

(60000, 28, 28)
(10000, 28, 28)
(60000,)


Our training set therefore consists of 60000 images of size 28x28, and our test set of 10000 images of the same size. Let us now print the labels of the first five training images to see how they are expressed.

In [4]:
for i in range(5):
  print(train_labels[i])

5
0
4
1
9


We see that each label is the digit shown in the corresponding image. The images show the digits 0 to 9, so the labels are integers from 0 to 9. Since the network will output one probability per class, we will need to convert the labels to one-hot encoding when preprocessing the data.
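To make the one-hot representation concrete, here is a minimal NumPy sketch applied to the five labels printed above. The helper `one_hot` is a hypothetical name used only for illustration; it is meant to mirror what `to_categorical` is expected to return for integer labels, not to reproduce its implementation.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a (len(labels), num_classes) float32 array with a single 1 per row."""
    labels = np.asarray(labels, dtype=np.int64)
    encoded = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    # In row i, set the column given by labels[i] to 1.
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

# The first five training labels printed above.
first_labels = [5, 0, 4, 1, 9]
encoded = one_hot(first_labels)

print(encoded.shape)
print(encoded[0])  # the only 1 sits at index 5
```

Each row contains exactly one 1, and taking the `argmax` of a row recovers the original integer label.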

## Model

Let us now create the model. It will be a convolutional neural network. Keep in mind that:
* Our images are of size 28x28.
* Our task consists of classifying each image into one of 10 possible classes.

In [5]:
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))







Let us now display the architecture of our network.

In [6]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 3, 3, 64)          36928     
_________________________________________________________________
flatten_1 (Flatten)          (None, 576)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 64)                36928     
__________
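The output shapes and parameter counts listed in the summary can be checked by hand. The sketch below assumes the Keras defaults used in the model cell: `'valid'` padding and stride 1 for `Conv2D`, and non-overlapping 2x2 windows for `MaxPooling2D`. The helper names are hypothetical.

```python
def conv_out(size, kernel=3):
    # 'valid' padding, stride 1: the kernel fits in size - kernel + 1 positions.
    return size - kernel + 1

def pool_out(size, pool=2):
    # Non-overlapping pooling windows; a leftover row or column is dropped.
    return size // pool

def conv_params(in_channels, filters, kernel=3):
    # kernel x kernel x in_channels weights plus one bias, for each filter.
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(in_units, out_units):
    # One weight per input-output pair plus one bias per output unit.
    return (in_units + 1) * out_units

size = 28
size = conv_out(size)    # 26
size = pool_out(size)    # 13
size = conv_out(size)    # 11
size = pool_out(size)    # 5
size = conv_out(size)    # 3
flat = size * size * 64  # 576 values after Flatten

print(size, flat)
print(conv_params(1, 32), conv_params(32, 64), conv_params(64, 64))
print(dense_params(flat, 64))
```

These values agree with the rows of the summary shown above: 320, 18496 and 36928 parameters for the three convolutional layers, and 36928 for the first dense layer.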

## Preprocessing

Before training, we preprocess the data by reshaping it into the shape the network expects and scaling it so that all values lie in the [0, 1] interval. For instance, our training images are stored in an array of shape (60000, 28, 28) of type uint8 with values in the [0, 255] interval. We transform it into a float32 array of shape (60000, 28, 28, 1) with values between 0 and 1; the trailing dimension is the single grayscale channel required by the `input_shape=(28, 28, 1)` of the first convolutional layer.<br>
In addition, we need to one-hot encode the labels so that a sample with label $N$ is represented by a vector of all $0$s with a single $1$ at index $N$ (counting from 0).

In [7]:
train_images = train_images.reshape((60000, 28, 28, 1))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype('float32') / 255

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
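The effect of the reshape and rescaling steps can be checked on a small array without downloading the dataset. The sketch below uses a hypothetical stand-in batch of four random uint8 images, not actual MNIST data.

```python
import numpy as np

# Hypothetical stand-in for a batch of MNIST images: 4 images, uint8, values in [0, 255].
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Add a trailing channel axis (grayscale means 1 channel), as Conv2D expects.
batch = fake_images.reshape((4, 28, 28, 1))
# Convert to float32 and rescale from [0, 255] to [0, 1].
batch = batch.astype('float32') / 255

print(batch.shape, batch.dtype)
print(batch.min() >= 0.0, batch.max() <= 1.0)
```

The reshape only adds an axis of length 1, so the pixel values and their order are unchanged; only the dtype and the value range differ after the division.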

## Training

In [8]:
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(train_images, train_labels, epochs=5, batch_size=64)




Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
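As a rough illustration of the quantity minimized during training, the following NumPy sketch computes a categorical cross-entropy between one-hot labels and predicted class probabilities. It is a simplified version written for this note, not Keras's implementation; the function name, the `eps` clipping value and the toy arrays are assumptions.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred))."""
    # Clip to avoid log(0); eps is an illustrative choice.
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# Two samples, three classes (toy values).
y_true = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
confident = np.array([[0.05, 0.90, 0.05], [0.80, 0.10, 0.10]])
uncertain = np.array([[0.40, 0.30, 0.30], [0.30, 0.40, 0.30]])

loss_confident = categorical_crossentropy(y_true, confident)
loss_uncertain = categorical_crossentropy(y_true, uncertain)
print(loss_confident)
print(loss_uncertain)
```

The loss is smaller when the probability assigned to the true class is close to 1, which is why it decreases as the network's predictions improve.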


Let us now evaluate the model on the test dataset.

In [9]:
test_loss, test_acc = model.evaluate(test_images, test_labels)



In [10]:
print('test accuracy: ', test_acc)
print('test loss: ', test_loss)

test accuracy:  0.9905
test loss:  0.03641391144730378


The model reaches a test accuracy of 99.05% with a test loss of about 0.036. Since the test images were not used during training, this indicates that the trained model generalizes well to unseen data.
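For reference, the accuracy metric can be understood as the fraction of samples whose most probable predicted class matches the true class. The sketch below computes it on toy arrays; the `accuracy` helper and the example values are hypothetical and only illustrate the idea.

```python
import numpy as np

def accuracy(y_true_one_hot, y_prob):
    """Fraction of rows where the most probable class matches the true class."""
    predicted = np.argmax(y_prob, axis=1)
    actual = np.argmax(y_true_one_hot, axis=1)
    return float(np.mean(predicted == actual))

# Four samples, three classes (toy values); true classes are 0, 1, 2, 1.
y_true = np.eye(3)[[0, 1, 2, 1]]
y_prob = np.array([
    [0.7, 0.2, 0.1],  # predicted 0, correct
    [0.1, 0.8, 0.1],  # predicted 1, correct
    [0.3, 0.3, 0.4],  # predicted 2, correct
    [0.6, 0.3, 0.1],  # predicted 0, wrong
])

acc = accuracy(y_true, y_prob)
print(acc)
```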

## Save the model

In [11]:
model.save("model.h5")
print("Model saved")

Model saved
