<img src="images/kiksmeisedwengougent.png" alt="Banner" width="1100"/>

<div style='color: #690027;' markdown="1">
<h1>CLASSIFICATION OF THE MNIST DATASET WITH A CONVOLUTIONAL NEURAL NETWORK</h1></div>

<div class="alert alert-box alert-success">
This notebook contains a <em>concrete example of a convolutional neural network</em> that is built with the functionalities of the Python module <em>Keras</em>. <br>The example concerns a problem of <em>multiclass classification</em>, namely the classification of grayscale images of handwritten digits.</div>

The images are provided in a format of 28x28 pixels; there are 10 classes, i.e. classes with labels 0 through 9.
To train the network, the MNIST dataset is used. This dataset consists of 60,000 images to train the network and 10,000 images to test the network. <br>These images were collected by the National Institute of Standards and Technology (the NIST in the acronym MNIST) in the 1980s.

### Import necessary modules

In this notebook, you will build a *Sequential model* with Keras, just like in the other MNIST notebook. <br>That is a model that consists of *linked layers*. You will be working with a neural network that first contains several *convolutional layers*, alternated with a *max pooling* operation, and finally a *feedforward* network.<br>As is good practice, you work with training data, validation data, and test data.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# import keras
from tensorflow.keras import models
from tensorflow.keras import layers
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.datasets import mnist

<div style='color: #690027;' markdown="1">
<h2>1. Reading the data</h2></div>

The MNIST data consists of a pair. Moreover, each of the two elements of the pair is itself a pair.

The data is structured as follows:
- (training data, corresponding labels) in the first element;
- (test data, corresponding labels) in the second element.

The data therefore consists of four types of data.
You load the dataset with the instruction `mnist.load_data()`.<br>`load_data()` is a function from the `mnist` module. <br>At the same time, you name four variables, each referring to one type of data in the dataset.

In [None]:
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()      # elements of tuples get correct name

In [None]:
train_images[4].shape

The training data is divided into actual training data and validation data. The validation set is used to monitor the network's performance during training. This way, overfitting can be detected more quickly.

In [None]:
validation_images = train_images[0:5000]
validation_labels = train_labels[0:5000]
train_images = train_images[5000:]
train_labels = train_labels[5000:]
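The slicing used for this split can be tried on a stand-in array first. The sketch below uses arrays of zeros instead of the real MNIST images (an assumption, made only to avoid downloading the dataset) and different variable names, so that it does not overwrite the notebook's data. It checks that the split yields 5,000 validation images and 55,000 training images.

```python
import numpy as np

# stand-in for the 60000 MNIST training images and labels (zeros, not real data)
images = np.zeros((60000, 28, 28), dtype=np.uint8)
labels = np.zeros(60000, dtype=np.uint8)

# the first 5000 data points become the validation set
val_images = images[0:5000]
val_labels = labels[0:5000]

# the remaining data points stay in the training set
tr_images = images[5000:]
tr_labels = labels[5000:]

print(val_images.shape, tr_images.shape)
```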

<div style='color: #690027;' markdown="1">
<h2>3. Building the neural network</h2></div>

<div style='color: #690027;' markdown="1">
<h3>3.1 Architecture of the neural network</h3></div>

In [None]:
# network architecture     https://keras.io/getting-started/sequential-model-guide/

# 'Sequential model' is a model that consists of connected layers
# Here first some layers that together form a convolutional network,
# alternated with Max Pooling which reduces the resolution of the images (less computing power needed).
# Convolutional network is followed by network with dense layers:
# (feed forward network with) 1 hidden layer;
# 'dense layers' means 'fully connected',
# i.e. that neurons in a certain layer are connected with all neurons in the previous layer.
# For the first convolutional layer, you should specify input_shape instead of input_dim (input_dim is not supported).
# This input_shape is the dimension of one input data point, so here 1 MNIST image.
# A convolutional layer expects a 3D tensor for an image, such as for an RGB image.
# Model needs to know what form of input it can expect, i.e. dimension of the input points,
# therefore, this is passed to the first layer of the Sequential model;
# only to the first one because subsequent layers get that automatically, by performing mathematical operations.
# Loss function needs to be minimized using an optimizer;
# with metrics, you check the performance of the model.

network = models.Sequential()
network.add(layers.Conv2D(32, (3,3), activation='relu', input_shape=(28,28,1))) # first convolutional layer with ReLU activation
network.add(layers.MaxPooling2D((2,2)))                         # max pooling operation
network.add(layers.Conv2D(64, (3,3), activation='relu'))        # second convolutional layer with ReLU activation
network.add(layers.MaxPooling2D((2,2)))                         # max pooling operation
network.add(layers.Conv2D(64, (3,3), activation='relu'))        # third convolutional layer with ReLU activation
network.add(layers.Flatten())                                   # needed to be able to give output to dense layers
network.add(layers.Dense(64, activation='relu'))                # hidden layer with 64 neurons, ReLU activation
network.add(layers.Dense(10, activation='softmax'))             # output layer  10 output neurons, activation softmax
network.compile(optimizer='sgd',
                loss='categorical_crossentropy',
                metrics=['accuracy'])                           # choose optimizer, loss and metrics
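To see how the resolution of an image shrinks as it passes through this architecture, the arithmetic can be done by hand. The sketch below assumes the default Keras settings for these layers (no padding and stride 1 for the 3x3 convolutions, non-overlapping 2x2 windows for max pooling); the helper names `conv_output` and `pool_output` are made up for this illustration.

```python
def conv_output(size, kernel=3):
    # a convolution without padding shrinks each side by kernel - 1
    return size - kernel + 1

def pool_output(size, pool=2):
    # max pooling with a 2x2 window halves each side (rounded down)
    return size // pool

size = 28                  # MNIST images are 28x28
size = conv_output(size)   # after the first convolutional layer
size = pool_output(size)   # after the first max pooling
size = conv_output(size)   # after the second convolutional layer
size = pool_output(size)   # after the second max pooling
size = conv_output(size)   # after the third convolutional layer

# the last convolutional layer has 64 filters, so Flatten yields size * size * 64 values
flattened = size * size * 64
print(size, flattened)
```

Under these assumptions, the number printed last is the length of the vector that `Flatten()` passes on to the dense layers. It can be compared with the output of `network.summary()`.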

<div style='color: #690027;' markdown="1">
<h3>3.2 Training the neural network</h3></div>

Earlier, you asked for `train_images[4].shape`: each data point is a 28x28 matrix. However, a convolutional layer expects each image as a 3D tensor.

The shape of each data point is fixed in the model architecture by `input_shape=(28,28,1)`. Thus, every data point must be converted into a tensor with shape 28x28x1.
In other words, you have to transform the training set from a stack of 55,000 28x28 matrices into a stack of 55,000 tensors with shape 28x28x1; the validation set and the test set are transformed in the same way.
Moreover, it is better to *normalize* the values of the images.

<div class="alert alert-block alert-warning"> 
More explanation about normalizing can be found in the notebook 'Standardize'.</div>

In [None]:
# training of network
# note that the input_shape of the first convolutional layer is fixed in the architecture
# every 28x28 matrix must therefore be transformed into a tensor of shape (28, 28, 1)
# transform training set from stack of 55000 28x28 matrices into stack of 55000 tensors
train_images = train_images.reshape((55000, 28, 28, 1))
train_images = train_images.astype('float32') / 255      # normalize data: rescale to interval [0,1] instead of [0,255]
validation_images = validation_images.reshape((5000, 28, 28, 1))
validation_images = validation_images.astype('float32') / 255
# transform test set of 10000 28x28 matrices into a set of 10000 tensors
test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype('float32') / 255
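The effect of reshaping and normalizing can be checked on a small example. The sketch below uses a made-up batch of three random images (an assumption, not MNIST data) and applies the same two steps: adding a channel axis and rescaling the pixel values to the interval [0,1].

```python
import numpy as np

# made-up batch of 3 grayscale images with pixel values from 0 to 255
batch = np.random.randint(0, 256, size=(3, 28, 28)).astype('uint8')

# add a channel axis so that each image becomes a 28x28x1 tensor
batch = batch.reshape((3, 28, 28, 1))

# rescale from [0,255] to [0,1]
batch = batch.astype('float32') / 255

print(batch.shape, batch.dtype, batch.min(), batch.max())
```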

In [None]:
# one hot encoding
# store labels in another form e.g. 0 becomes 1000000000, 1 becomes 0100000000, ..., 7 becomes 0000000100 ...
# so for 7 a 1 at position with index 7 (you start counting from 0) and zeros for the rest
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
validation_labels = to_categorical(validation_labels)
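One-hot encoding can also be written out with NumPy alone, which makes it easier to see what happens to each label. The function `one_hot` below is a hypothetical stand-in written for this illustration; it is not the Keras function `to_categorical`.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    # row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype='float32')[labels]

encoded = one_hot(np.array([0, 1, 7]))

# the third label is 7, so its vector has a 1 at index 7 and zeros elsewhere
print(encoded[2])
```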

In [None]:
# training with the fit method of network, in other words, aligning the pictures and the labels with each other
# 3 epochs, so going through the training set 3 times
# batch_size = 64: in each epoch, the training set is processed in batches of 64 images
# the optimizer is applied to 64 images at a time, so average adjustment over the 64 points
# note that the loss and accuracy are reported during and after each epoch
history = network.fit(train_images, train_labels, epochs=3, batch_size=64, validation_data=(validation_images, validation_labels))
loss = history.history["loss"]
epochs = range(1, len(loss) + 1)
acc = history.history["accuracy"]
val_acc = history.history["val_accuracy"]
val_loss = history.history["val_loss"]
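With `batch_size=64`, the weights are adjusted once per batch of 64 images, so one epoch consists of many small steps. The sketch below counts those steps for the 55,000 training images and 3 epochs used here; the variable names are chosen for this illustration only.

```python
import math

num_train = 55000   # training images left after the validation split
batch_size = 64
num_epochs = 3

# the last batch of an epoch may contain fewer than batch_size images
steps_per_epoch = math.ceil(num_train / batch_size)
total_updates = steps_per_epoch * num_epochs

print(steps_per_epoch, total_updates)
```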

Do you see that the loss is decreasing and the accuracy is increasing?

In [None]:
font = {'color': 'black',
        'weight': 'normal',
        'size': 16}

plt.figure(figsize=(12,6))

plt.subplot(1,2,1)
plt.plot(epochs, loss, "o", color="blue", label="train")
plt.plot(epochs, val_loss, "o", color="lightblue", label="val")
plt.xticks(np.arange(0, 6, step=1))
plt.title("Loss on training and validation set", fontdict=font)
plt.xlabel("epoch", fontdict=font)
plt.ylabel("loss", fontdict=font)
plt.legend(loc="lower left")

plt.subplot(1,2,2)
plt.plot(epochs, acc, "o", color="green", label="train")
plt.plot(epochs, val_acc, "o", color="lime", label="val")
plt.xticks(np.arange(0, 6, step=1))
plt.xlabel("epoch", fontdict=font)
plt.ylabel("acc", fontdict=font)
plt.title("Accuracy on training and validation set", fontdict=font)
plt.legend(loc="lower right")

plt.show()

The accuracy of the network after training is quite good. The error is still large though.

<div style='color: #690027;' markdown="1">
<h3>3.3 Operation of the model</h3></div>

By executing the following code cell, you take two data points from the training set.

In [None]:
example1 = train_images[4]
example2 = train_images[100]
# labels
print(train_labels[4], train_labels[100])

Which digits do these data points represent?

Answer:

You ensure that you are working with the correct format.

In [None]:
# prepare data points
# normalization has already occurred
example1 = example1.reshape((1, 28, 28, 1))
example2 = example2.reshape((1, 28, 28, 1))

The method `predict()` returns an array indicating how confident the model is that the given data point is a 0, a 1, a 2, etc., in that order. These certainties are expressed as values between 0 and 1 that add up to 1; multiply by 100 to read them as percentages.

In [None]:
# testing
network.predict(example1)

In [None]:
# testing
network.predict(example2)
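The array returned by `predict()` can be read off with `np.argmax`, which gives the index of the largest value. The sketch below uses made-up values for one image (an assumption; they are not the output of this network) to show how the most likely digit and its certainty are extracted.

```python
import numpy as np

# made-up output of predict() for one image: 10 values that add up to 1
prediction = np.array([[0.01, 0.02, 0.05, 0.02, 0.70, 0.05, 0.05, 0.04, 0.03, 0.03]])

# the index of the largest value is the digit the model considers most likely
predicted_digit = int(np.argmax(prediction[0]))
certainty = float(prediction[0][predicted_digit])

print(predicted_digit, certainty)
```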

Fill in and cross out what does not apply:
The model is most certain that it is a .... <br>That certainty is .... <br>Correctly/Incorrectly classified!

<div style='color: #690027;' markdown="1">
<h3>3.3 Performance of the model</h3></div>

Just because the model performs well on the training data does not mean it also performs well on unseen data. Therefore, you check what the loss and accuracy are on the test data.

In [None]:
test_loss, test_acc = network.evaluate(test_images, test_labels)

In [None]:
print('test_loss:', test_loss)
print('test_acc:', test_acc)
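To get a feeling for what these two numbers mean, accuracy and categorical cross-entropy can be computed by hand on a tiny example. The sketch below uses made-up predictions for three images and three classes (an assumption, not results from this notebook). It follows the textbook definitions, and the values Keras reports may differ slightly because of internal numerical safeguards.

```python
import numpy as np

# made-up predictions for 3 images and 3 classes
predictions = np.array([[0.8, 0.1, 0.1],
                        [0.2, 0.5, 0.3],
                        [0.6, 0.3, 0.1]])

# one-hot labels: the true classes are 0, 1 and 2
true_labels = np.array([[1, 0, 0],
                        [0, 1, 0],
                        [0, 0, 1]])

# accuracy: fraction of images whose most likely class equals the true class
accuracy = np.mean(np.argmax(predictions, axis=1) == np.argmax(true_labels, axis=1))

# categorical cross-entropy: mean of -log(probability assigned to the true class)
mean_loss = np.mean(-np.sum(true_labels * np.log(predictions), axis=1))

print(accuracy, mean_loss)
```

In this example the third image is misclassified with high certainty, which pushes the loss up much more than it pulls the accuracy down.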

Compare the accuracy on the test set with that on the training set. Does the model generalize well?

Answer:

<div style='color: #690027;' markdown="1">
<h3>3.4 Testing model on own data</h3></div>

Can the model also recognize digits that you wrote yourself? Try it out. First, upload the necessary files.

In [None]:
# three images of handwritten digits
# 28 by 28 pixels, white on dark background
# normalized
seven = np.loadtxt("data/zeven.dat")       # 'loadtxt' for dat-file, 'load' for npy-file
four = np.loadtxt("data/four.dat")
two = np.loadtxt("data/two.dat")

In [None]:
print(seven)

In [None]:
print(np.min(seven), np.max(seven))

In [None]:
plt.figure()

plt.subplot(1,3,1)
plt.imshow(seven, cmap="gray")
plt.subplot(1,3,2)
plt.imshow(four, cmap="gray")
plt.subplot(1,3,3)
plt.imshow(two, cmap="gray")

plt.show()
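Besides looking at the images, you can check two properties with code: the size of 28 by 28 pixels and the normalized pixel values. The helper `check_image` below is hypothetical and written for this illustration; it is applied to a made-up stand-in image rather than to the files loaded above.

```python
import numpy as np

def check_image(image):
    # hypothetical helper: True if the image is 28x28 with values in [0,1]
    right_shape = image.shape == (28, 28)
    right_range = image.min() >= 0.0 and image.max() <= 1.0
    return bool(right_shape and right_range)

# stand-in image: dark background with one bright vertical stroke
sample = np.zeros((28, 28))
sample[4:24, 13:15] = 1.0

print(check_image(sample))
```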

Do these digits sufficiently resemble those of the dataset? Why is that important?

Answer:

The data points take the form of matrices.

See how the model performs on these digits.

In [None]:
# preparing data, already normalized
seven = seven.reshape((1, 28, 28, 1))              # reshape into tensor that contains 1 image
four = four.reshape((1, 28, 28, 1))
two = two.reshape((1, 28, 28, 1))

In [None]:
network.predict(seven)

In [None]:
network.predict(four)

In [None]:
network.predict(two)

How does the model perform on these self-written numbers?<br>Answer:

### Assignment 3.1
Write some digits yourself and test whether the model correctly reads your handwriting!

Conclusion:

<div style='color: #690027;' markdown="1">
<h2>4. Searching for a better model</h2></div>

### Assignment 4.1
Adjust the number of neurons and the number of epochs in the network architecture to improve the performance of the network.
Who achieves the best accuracy?

Tip: The difference between the training accuracy and the test accuracy is important. If the training accuracy is clearly higher than the test accuracy, this is referred to as *overfitting*: the model performs worse on new data than on the training data.
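One way to keep an eye on this difference during training is to compute the gap between training and validation accuracy after every epoch. The sketch below uses made-up accuracies (an assumption; they are not results of this notebook) stored in a dictionary with the same keys as `history.history`.

```python
# made-up accuracies per epoch, not results of this notebook
history_dict = {"accuracy": [0.90, 0.95, 0.99],
                "val_accuracy": [0.89, 0.93, 0.94]}

# gap between training and validation accuracy after each epoch
gaps = [round(a - v, 2)
        for a, v in zip(history_dict["accuracy"], history_dict["val_accuracy"])]

print(gaps)
```

A gap that keeps growing from epoch to epoch suggests that the model is starting to overfit.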

### Assignment 4.2
Test your model on your own digits.

<div style='color: #690027;' markdown="1">
<h2>5. Evaluation</h2></div>

The best accuracy that my model achieves is ........ for the training set, .... for the validation set and .... for the test set.

What do you think of this task?

.............

<div>
<h2>Reference List</h2></div>

[1] Chollet, F. (2018). *Deep learning with Python*. Manning Publications Co.<br>[2] Getting started with the Keras Sequential model. Consulted on September 25, 2019 via https://keras.io/getting-started/sequential-model-guide/.

<div>
<h2>With support from</h2></div>

<img src="images/kikssteun2.png" alt="Banner" width="1100"/>

<img src="images/cclic.png" alt="Banner" align="left" width="100"/><br><br>
Notebook KIKS, see <a href="http://www.aiopschool.be">AI At School</a>, by F. wyffels & N. Gesquière is licensed under a <a href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.