<a href="https://colab.research.google.com/github/Machine-Learning-Tokyo/intro-to-DL/blob/master/mnist_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## MNIST model

In this notebook we will see how to train a simple model on the MNIST dataset.

First, we import the packages we need: Keras for the model, pyplot for plotting the images, and NumPy for the array calculations.

In [0]:
import keras
from keras.datasets import mnist, fashion_mnist
from keras.models import Sequential, Model
from keras.layers import *
from keras.activations import softmax, relu
import keras.backend as K
from keras.utils import to_categorical

import matplotlib.pyplot as plt
import numpy as np

The first thing we need is the data. The data are small (28x28 pixels) grayscale images of handwritten digits.

Notice the line:


```
X_train, X_val = X_train / 255.0, X_val / 255.0
```

Originally, the pixels of the images have values in [0, 255]. Values that large are hard for networks to handle, so we usually rescale the inputs to something more "*model friendly*".

This is called data preprocessing.

In our case the preprocessing simply maps the values from [0, 255] to [0, 1].
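
As a minimal sketch of what this rescaling does, the example below applies the same division to a tiny made-up array instead of the real dataset. The array `raw` and its values are illustrative assumptions, not part of the notebook.

```python
import numpy as np

# A tiny made-up 2x2 "image" with 8-bit pixel values (illustrative data only).
raw = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# Dividing by 255.0 converts the array to floating point and rescales it to [0, 1].
scaled = raw / 255.0

print(scaled.dtype)                # float64
print(scaled.min(), scaled.max())  # 0.0 1.0
```

The division also changes the data type from integers to floats, which is what the model works with.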



In [0]:
(X_train, Y_train), (X_val, Y_val) = mnist.load_data()
# (X_train, Y_train), (X_val, Y_val) = fashion_mnist.load_data()
X_train, X_val = X_train / 255.0, X_val / 255.0
labels_names = 'zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine'
# labels_names = 'T-shirt', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot'
r, c = 3, 3

It is very important to know the shape of the arrays we use.

All the data we use when training DL models are n-dimensional arrays of numbers. Whether the input was originally a video, an image, a voice recording, or a text, in the end everything is converted to arrays.
The shape matters because a model is built to accept arrays of a specific shape.
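
To make the idea of shape concrete, here is a small sketch that uses an all-zeros stand-in for a batch of images. The array `batch` is a hypothetical placeholder, not the real data.

```python
import numpy as np

# Stand-in for a batch of five 28x28 grayscale images (all zeros; illustrative only).
batch = np.zeros((5, 28, 28))
print(batch.shape)     # (5, 28, 28): (number of images, height, width)
print(batch[0].shape)  # (28, 28): a single image

# Flattening each image gives the kind of vector a fully connected layer works on.
flat = batch.reshape(len(batch), -1)
print(flat.shape)      # (5, 784), since 28 * 28 = 784
```

The first axis counts the images and the remaining axes describe one image. For the MNIST training set, the first axis should be 60000.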

In [0]:
print('The shape of the training data array is:', X_train.shape)

In the next cell we define some functions for getting random images from the dataset and plotting them. Don't pay too much attention to them for the moment.

In [0]:
def get_random_imgs_labels(X_set, Y_set, n_imgs):
  # Pick n_imgs random indices (with replacement) and return the matching images and labels
  inds = np.random.randint(0, len(X_set), n_imgs)
  images, labels = X_set[inds], Y_set[inds]
  return images, labels

def plot_images(images, labels, preds=None):
  # Draw an r x c grid (r and c are set in the data cell); expects at least r*c images
  fig, axs = plt.subplots(r, c)
  cnt = 0
  for i in range(r):
    for j in range(c):
      axs[i, j].imshow(images[cnt], cmap='gray')
      axs[i, j].axis('off')
      # The title is the true label, or 'true/predicted' when predictions are given
      title = labels_names[labels[cnt]] if preds is None else '%s/%s' % (labels_names[labels[cnt]], labels_names[preds[cnt]])
      axs[i, j].set_title(title, fontsize=16)
      cnt += 1
  plt.show()

Let's plot some of the images together with their labels to see what they look like.

In [0]:
images, labels = get_random_imgs_labels(X_train, Y_train, r*c)
plot_images(images, labels)

Now we have to build the model we will use to predict the label of a given image.

The model has two fully connected layers and outputs ten numbers that can be interpreted as the probability of each of the ten labels (digits).

We use the label with the highest probability as the predicted label.
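
The computation such a model performs can be sketched in plain NumPy. This is only an illustration under stated assumptions: the weights `W1`, `b1`, `W2`, `b2`, the `softmax` helper, and the random stand-in `image` are hypothetical, and Keras initializes and stores its own weights differently.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Subtract the maximum for numerical stability before exponentiating.
    e = np.exp(z - np.max(z))
    return e / e.sum()

image = rng.random((28, 28))  # stand-in for one normalized image

# Randomly initialized weights; the shapes mirror Flatten -> Dense(32) -> Dense(10).
W1, b1 = rng.normal(scale=0.05, size=(784, 32)), np.zeros(32)
W2, b2 = rng.normal(scale=0.05, size=(32, 10)), np.zeros(10)

x = image.reshape(-1)             # flatten: (28, 28) -> (784,)
h = np.maximum(0.0, x @ W1 + b1)  # first fully connected layer with ReLU
probs = softmax(h @ W2 + b2)      # second fully connected layer with softmax

predicted_label = int(np.argmax(probs))  # label with the highest probability
n_params = W1.size + b1.size + W2.size + b2.size

print(probs.shape, round(float(probs.sum()), 6))
print(predicted_label, n_params)
```

The ten outputs are non-negative and sum to 1, which is why they can be read as probabilities. With 32 hidden units, `n_params` is 784·32 + 32 + 32·10 + 10 = 25450, which should match the total that `model.summary()` reports for a model of this shape.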

In [0]:
K.clear_session()
model = Sequential([
    Flatten(input_shape=X_train.shape[1:]),
    Dense(32, activation='relu'),
    Dense(len(labels_names), activation='softmax'),
])
model.summary()

Before we start training the model, we need to define the target and the way to reach it.

In our case we want the model's predicted probabilities for each label to be as close as possible to the given probabilities for each label. Since each image has exactly one correct label, ideally the model returns probability 1 for the correct label and 0 for all the others.

To achieve this we use a *loss function*, which measures how far the predictions are from the target. In our case the loss function is the *categorical crossentropy*.
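
A rough NumPy sketch of the idea follows. The helpers `one_hot` and `categorical_crossentropy` are hypothetical stand-ins written for this illustration, and the two prediction vectors are made up; the Keras versions (`to_categorical` and the built-in loss) work on whole batches and handle more edge cases.

```python
import numpy as np

def one_hot(label, n_classes=10):
    # Target distribution: probability 1 for the correct label, 0 elsewhere.
    vec = np.zeros(n_classes)
    vec[label] = 1.0
    return vec

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0); the loss is -sum(y_true * log(y_pred)).
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(-np.sum(y_true * np.log(y_pred)))

y_true = one_hot(3)

# A prediction that puts most of the probability on the correct label.
confident_right = np.full(10, 0.01)
confident_right[3] = 0.91

# A prediction that spreads the probability evenly over all ten labels.
uniform_guess = np.full(10, 0.1)

loss_right = categorical_crossentropy(y_true, confident_right)
loss_uniform = categorical_crossentropy(y_true, uniform_guess)
print(round(loss_right, 3))    # 0.094
print(round(loss_uniform, 3))  # 2.303
```

The loss is small when the model assigns a high probability to the correct label and grows as that probability shrinks, so minimizing it pushes the predictions toward the target.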

We also need to define the method the model will use to minimize the loss function.

The method (also called the optimizer, since it optimizes the model's parameters) that we will use is Adam.

We don't need to go into much detail about this one.

In [0]:
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])

The model has now been randomly initialized, which means its output will be mostly wrong. Let's look at some examples. We can run the next cell more than once to get different randomly selected images and their (true and predicted) labels. Each plot title reads true label/predicted label.

In [0]:
images, labels = get_random_imgs_labels(X_val, Y_val, r*c)
predictions = model.predict_on_batch(images)
predictions = np.argmax(predictions, 1)

print('correct: %d out of %d' % (np.sum(labels == predictions), len(labels)))
plot_images(images, labels, predictions)

Now let's train the model for some epochs (one epoch is one pass through the whole training dataset) and see if we get better results.

After training the model we can run the previous cell again to see the results of the trained model.

Of course we can train the model further by repeatedly executing the next cell, and check the results by running the previous one.
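
To get a feel for how much work one epoch is, the arithmetic below counts the parameter updates. It assumes the 60000-image MNIST training set, 10 epochs, and a batch size of 32, which is the Keras default when `fit` is called without a `batch_size` argument.

```python
import math

n_train = 60000  # size of the MNIST training set
batch_size = 32  # assumed: the Keras default when fit() gets no batch_size
epochs = 10      # assumed number of epochs for one training run

# One epoch processes every training image once, one batch at a time.
steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, total_updates)  # 1875 18750
```

Each step updates the model's parameters once, so executing the training cell again adds the same number of updates on top of the previous ones.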

In [0]:
history = model.fit(X_train, to_categorical(Y_train), validation_data=(X_val, to_categorical(Y_val)), epochs=10)

That was it!

We trained a model to classify images of handwritten digits.

Now a more challenging task will be to classify images of clothes.

To do this, uncomment (delete the `#` from) these lines in the second cell and run the notebook again from that cell:



```
# (X_train, Y_train), (X_val, Y_val) = fashion_mnist.load_data()

# labels_names = 'T-shirt', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot'
```

You will notice that it is harder for the model to reach very high accuracy, and that beyond some point training for more epochs no longer increases the accuracy.

In this case we need to make some changes to get better results.

These changes could involve the model (architecture, depth, width), the optimizer, the preprocessing, the regularization, the loss function, etc.

## The end
