# Classifying digits using a fully connected neural network

In this practical exercise, a fully connected neural network (also called a multi-layer perceptron) is built using Keras. It is then trained to classify images of handwritten digits from the MNIST database.

Some baseline results:

| Method                                                                      | Test error (%) |
|-----------------------------------------------------------------------------|---------------:|
| Linear classifier (LeCun et al. 1998)                                       |           12.0 |
| K-nearest-neighbors, Euclidean (L2) (LeCun et al. 1998)                     |            5.0 |
| 3-layer NN, 500-300, softmax, cross entropy, weight decay (Hinton, 2005)    |            1.5 |
| Convolutional net LeNet-4 (LeCun et al. 1998)                               |            1.1 |
| Virtual SVM deg-9 poly [data augmentation] (LeCun et al. 1998)              |            0.8 |
| 6-layer NN with [data augmentation] (Ciresan et al. 2010)                   |           0.35 |
| Deep conv. net, 7 layers [data augmentation] (Ciresan et al. IJCAI 2011)    |           0.35 |

More results are available from: http://yann.lecun.com/exdb/mnist/

Try to improve on some of these results, at least on those that do not use data augmentation or convolutional neural networks.

In [None]:
import keras
from keras.datasets import mnist as db
from keras.models import Sequential
from keras.layers import Dense
import numpy as np
import matplotlib.pyplot as plt

# Magic used by the notebook to show figures inline
%matplotlib inline
# matplotlib default values
plt.rcParams['figure.figsize'] = (10.0, 8.0)
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# auto-reloading packages
%load_ext autoreload
%autoreload 2



In [None]:
# Load and have a look at the data
(x, y), (x_test_ori, y_test_ori) = db.load_data()

# Visualize a single digit, with its class
index = 2
plt.imshow(x[index])
print("Class: ", y[index])

In [None]:
# Data management
val_nb = 5000  # number of validation samples
nb_samples = x.shape[0]

if val_nb >= nb_samples:
    raise ValueError("You need some samples to train your network!")

x = x.reshape(nb_samples, 784)
x_test = x_test_ori.reshape(x_test_ori.shape[0], 784)
x = x.astype('float32')
x_test = x_test.astype('float32')
x /= 255
x_test /= 255

x_val = x[:val_nb, ]
x_train = x[val_nb:, ]
y_val = y[:val_nb]
y_train = y[val_nb:]

print(x_train.shape, 'x train samples')
print(x_val.shape, 'x val samples')
print(x_test.shape, 'x test samples')
print(y_train.shape, 'y train samples')
print(y_val.shape, 'y val samples')
print(y_test_ori.shape, 'y test samples')

# convert class vectors to binary class matrices
num_classes = int(y.max()) + 1  # plain Python int rather than a NumPy scalar
y_train = keras.utils.to_categorical(y_train, num_classes)
y_val = keras.utils.to_categorical(y_val, num_classes)
y_test = keras.utils.to_categorical(y_test_ori, num_classes)


# Model definition

The following model uses Keras to build a fully connected network. It has to respect some constraints:

- The input shape has to match the size of each input sample (784 values per flattened image).
- The output should be of size 10 (num_classes).

Other than that, you can change the number of layers as well as the size of each of them.
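
To get a feel for how layer sizes affect model size, the number of trainable parameters of a stack of fully connected layers can be worked out by hand. The sketch below is only an illustration: the helper name `dense_param_count` is hypothetical, and the sizes assume 784 inputs, two hidden layers of 128 units and 10 outputs. The result should agree with the total reported by `model.summary()` for such a network.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases of a stack of fully connected layers.

    layer_sizes lists the input size followed by the size of each Dense layer.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus one bias per unit
    return total


# 784 inputs (28 x 28 pixels), two hidden layers of 128 units, 10 outputs
sizes = [784, 128, 128, 10]
print(dense_param_count(sizes))
```

Almost all parameters sit in the first layer, because it is connected to every input pixel.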

In [None]:
model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(x_train.shape[1],)))
model.add(Dense(128, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))

model.summary()


# Model training

The following section takes care of training.

Firstly, the model has to be 'compiled'. This operation lets the user choose the loss, the optimizer and the metrics, then configures the model for training.

Secondly, the 'fit' method runs the optimization. Training and validation data are specified here, as well as the batch size and the number of epochs.
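
To make the loss and the metric less of a black box, both can be written out in NumPy. This is a minimal sketch of categorical cross entropy and accuracy for one-hot targets, not the Keras implementation; the function names, the clipping constant `eps` and the toy values are assumptions made for illustration.

```python
import numpy as np


def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean cross entropy between one-hot targets and predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))


def accuracy(y_true, y_prob):
    """Fraction of samples whose most probable class is the true class."""
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_prob, axis=1)))


# Toy batch: 3 samples, 3 classes (made-up values)
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
y_prob = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.5, 0.3, 0.2]])
print("loss:", categorical_crossentropy(y_true, y_prob))
print("accuracy:", accuracy(y_true, y_prob))
```

The third sample is misclassified, so it lowers the accuracy and also dominates the loss, since only 0.2 of the probability mass was put on the correct class.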

In [None]:
from keras.optimizers import SGD
batch_size = 128
epochs = 20
learning_rate = 0.1

model.compile(loss='categorical_crossentropy',
              optimizer=SGD(learning_rate=learning_rate),
              metrics=['accuracy'])

output = model.fit(x_train, y_train,
                   batch_size=batch_size,
                   epochs=epochs,
                   validation_data=(x_val, y_val),
)


# Analysis of the results

Visualizing what is going on is extremely important. For that:

- inspecting training and validation performance is essential;

- looking at the errors might also be interesting.

Is there overfitting? How can it be reduced?
Is the network 'confident' when making errors?
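
Both questions can be turned into small computations. The sketch below is one possible way to do it and uses made-up numbers: `best_epoch` and `error_confidences` are hypothetical helper names, and in the notebook the inputs would come from the training history and from the predicted probabilities on held-out data.

```python
import numpy as np


def best_epoch(val_losses):
    """Index of the epoch with the lowest validation loss."""
    return int(np.argmin(val_losses))


def error_confidences(y_true, y_prob):
    """Probability given to the predicted class, for misclassified samples only."""
    y_pred = np.argmax(y_prob, axis=1)
    wrong = y_pred != y_true
    return np.max(y_prob[wrong], axis=1)


# Made-up validation losses: the curve turns upward after epoch 2
val_loss = [0.95, 0.60, 0.45, 0.50, 0.58]
print("Lowest validation loss at epoch", best_epoch(val_loss))

# Made-up predictions for 3 samples and 3 classes; the last two are wrong
y_true = np.array([0, 1, 2])
y_prob = np.array([[0.8, 0.1, 0.1], [0.6, 0.3, 0.1], [0.05, 0.9, 0.05]])
print("Confidence on errors:", error_confidences(y_true, y_prob))
```

A validation loss that rises while the training loss keeps falling is the usual sign of overfitting. High values in the second output indicate a network that is confidently wrong.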


In [None]:
plt.plot(output.epoch, output.history['loss'], label='train')
plt.plot(output.epoch, output.history['val_loss'], label='val')
plt.title('Training and validation performance')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend()
# plt.ylim(0.2, 0.8)

In [None]:
y_predict_proba = model.predict(x_test)
# print("Prediction example:", y_predict_proba[0])

y_predict = np.argmax(y_predict_proba, 1)
# print(y_predict[3000:3010])

diff = y_test_ori != y_predict
# print("Difference mask: ", diff)
x_test_errors = x_test_ori[diff]
y_test_errors = y_test_ori[diff]
y_predict_errors = y_predict[diff]
y_predict_proba_errors = y_predict_proba[diff]

index = 0
print("Correct label is: ", y_test_errors[index])
print("Predicted label is: ", y_predict_errors[index])
print("Probabilities: ", y_predict_proba_errors[index])
plt.imshow(x_test_errors[index])

# Testing

Testing is the last stage of the learning process. Good practice is to do it only once, after you have completely finished optimizing the network parameters and hyperparameters.
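
The baseline table at the top reports test error in percent, whereas Keras reports accuracy as a fraction. A small conversion makes the two comparable. In this sketch the helper name `error_rate_percent` is hypothetical and the accuracy value is a placeholder, not a measured result.

```python
def error_rate_percent(accuracy):
    """Convert an accuracy in [0, 1] into an error rate in percent."""
    return 100.0 * (1.0 - accuracy)


# Baselines copied from the table at the top (no data augmentation, no convolution)
baselines = {
    "Linear classifier": 12.0,
    "K-nearest-neighbors": 5.0,
    "3-layer NN, 500-300": 1.5,
}

hypothetical_accuracy = 0.975  # placeholder, replace with the measured test accuracy
error = error_rate_percent(hypothetical_accuracy)
beaten = [name for name, baseline in baselines.items() if error < baseline]
print(f"Test error: {error:.2f}%, better than: {beaten}")
```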

In [None]:
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

# Experimenting with a more complex database

In the second cell, you can replace:

<code>from keras.datasets import mnist as db</code>

with:

<code>from keras.datasets import fashion_mnist as db</code>

in order to experiment with a more complex database. The best test accuracy reported on this database is 0.967 (see https://github.com/zalandoresearch/fashion-mnist).

You can use the following dictionary to map numeric labels to meaningful names:

In [None]:
fashion_dict = {
    0: "T-shirt/top",
    1: "Trouser",
    2: "Pullover",
    3: "Dress",
    4: "Coat",
    5: "Sandal",
    6: "Shirt",
    7: "Sneaker",
    8: "Bag",
    9: "Ankle boot"
    }

print(fashion_dict[2])

# Regularization

If you run into overfitting problems, you can try to regularize your network. You can use L1 and L2 regularization and try different regularization weights.
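
Both regularizers add a penalty on the layer weights to the training loss, scaled by the regularization weight. The NumPy sketch below shows what each penalty computes for a single kernel; the function names and the example matrix are made up for illustration.

```python
import numpy as np


def l1_penalty(weights, reg_weight):
    """L1 penalty: reg_weight times the sum of absolute weight values."""
    return reg_weight * float(np.sum(np.abs(weights)))


def l2_penalty(weights, reg_weight):
    """L2 penalty: reg_weight times the sum of squared weight values."""
    return reg_weight * float(np.sum(np.square(weights)))


w = np.array([[0.5, -1.0], [0.0, 2.0]])  # made-up kernel
print("L1 penalty:", l1_penalty(w, 0.01))
print("L2 penalty:", l2_penalty(w, 0.01))
```

L1 tends to push weights to exactly zero, while L2 mostly shrinks large weights, so the two can behave quite differently for the same regularization weight.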


In [None]:
from keras.regularizers import l1, l2

reg_weight = 0.01

model = Sequential()
model.add(Dense(
        128,
        kernel_regularizer=l1(reg_weight),
        activation='relu', 
        input_shape=(x_train.shape[1],)))
model.add(Dense(
        128,
        kernel_regularizer=l1(reg_weight),
        activation='relu'))
model.add(Dense(
        num_classes,
        kernel_regularizer=l1(reg_weight),
        activation='softmax'))