# Classifying handwritten digits

Let's take a short break from the theory and see a neural network in action.

  > Readings
  > * Chapter 6, Deep Feedforward Networks, Deep Learning, I. Goodfellow, Y. Bengio, and A. Courville, MIT Press, 2016.
  > * Pattern Recognition and Machine Learning, C. M. Bishop and others, Volume 1. Springer New York, 2006.

We will implement and train our first multilayer neural network to classify handwritten digits from the **Modified National Institute of Standards and Technology (MNIST)** dataset, which was constructed by Yann LeCun and others and serves as a popular benchmark for machine learning algorithms.

## Obtaining the MNIST dataset

The MNIST dataset is publicly available at [MNIST](http://yann.lecun.com/exdb/mnist/) and consists of the following four parts:

  * Training set images: train-images-idx3-ubyte.gz
  * Training set labels: train-labels-idx1-ubyte.gz
  * Test set images: t10k-images-idx3-ubyte.gz
  * Test set labels: t10k-labels-idx1-ubyte.gz

In [None]:
# %load load_mnist.py

The `load_mnist` function returns two arrays. The first is an $n \times m$ dimensional NumPy array (images), where $n$ is the number of samples and $m$ is the number of features (here, pixels). The second array (labels) contains the corresponding class labels, the integers 0-9.
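
Since the contents of `load_mnist.py` are not shown here, the following is only a minimal sketch of what such a loader could look like. It assumes the four files were unzipped next to the code, that they follow the IDX layout (a big-endian header followed by raw unsigned bytes), and that pixels are scaled to the range [-1, 1]; the scaling choice and the helper names `load_idx_images` and `load_idx_labels` are assumptions, not part of the original notebook.

```python
import os
import struct

import numpy as np


def load_idx_images(path):
    """Read an IDX3 image file into an (n, rows * cols) uint8 array."""
    with open(path, "rb") as f:
        # Header: magic number, image count, rows, columns (big-endian uint32)
        magic, n, rows, cols = struct.unpack(">IIII", f.read(16))
        if magic != 2051:
            raise ValueError(f"unexpected image magic number: {magic}")
        data = np.frombuffer(f.read(), dtype=np.uint8)
    return data.reshape(n, rows * cols)


def load_idx_labels(path):
    """Read an IDX1 label file into an (n,) uint8 array."""
    with open(path, "rb") as f:
        # Header: magic number, label count (big-endian uint32)
        magic, n = struct.unpack(">II", f.read(8))
        if magic != 2049:
            raise ValueError(f"unexpected label magic number: {magic}")
        labels = np.frombuffer(f.read(), dtype=np.uint8)
    if labels.shape[0] != n:
        raise ValueError("label count does not match the header")
    return labels


def load_mnist(path, kind="train"):
    """Load images and labels; `kind` is 'train' or 't10k'."""
    images = load_idx_images(os.path.join(path, f"{kind}-images-idx3-ubyte"))
    labels = load_idx_labels(os.path.join(path, f"{kind}-labels-idx1-ubyte"))
    # Assumed preprocessing: scale pixels from [0, 255] to [-1, 1]
    images = ((images / 255.0) - 0.5) * 2
    return images, labels
```

With this sketch, `load_mnist(".", kind="train")` would return the training images and labels, and `kind="t10k"` would return the test set.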

We will now load the 60,000 training samples and the 10,000 test samples from the local directory where we unzipped the MNIST dataset. The following code snippet assumes that the downloaded MNIST files were unzipped to the same directory in which the code is executed:

In [None]:
# load the training and test data

In [None]:
%matplotlib inline
# show some digits

In [None]:
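# showing a representative image
import matplotlib.pyplot as plt
fig, ax = plt.subplots(nrows=2, ncols=5)  # ncols=5 is an assumption; the original call is truncated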

In [None]:
!mkdir savez_data

In [None]:
#  plot multiple examples of the same digit

After going through all the previous steps, it is a good idea to save the scaled images in a format that we can load quickly into a new Python session, which avoids the overhead of reading in and processing the data again. When working with NumPy arrays, an efficient and convenient way to save multidimensional arrays to disk is NumPy's `savez` function. In short, `savez` is analogous to Python's `pickle` module but is optimized for storing NumPy arrays; it produces `.npz` archives that contain one `.npy` file per array.

The following code snippet will save both the training and test datasets to the archive file `mnist_scaled.npz`:

In [None]:
# save data into savez_compressed files.
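
As a rough illustration of the saving step, the sketch below writes four arrays to a compressed `mnist_scaled.npz` archive with `np.savez_compressed`. The small random arrays and the temporary directory are stand-ins (assumptions) for the real MNIST arrays and the `savez_data` directory, so that the snippet runs on its own.

```python
import os
import tempfile

import numpy as np

# Stand-in data (assumption): tiny arrays shaped like scaled MNIST samples
rng = np.random.default_rng(0)
X_train = rng.uniform(-1, 1, size=(6, 784))
y_train = rng.integers(0, 10, size=6)
X_test = rng.uniform(-1, 1, size=(3, 784))
y_test = rng.integers(0, 10, size=3)

# A temporary directory stands in for the savez_data directory
out_dir = tempfile.mkdtemp()
archive = os.path.join(out_dir, "mnist_scaled.npz")

# Each keyword argument becomes the name of one array inside the archive
np.savez_compressed(archive,
                    X_train=X_train, y_train=y_train,
                    X_test=X_test, y_test=y_test)
```

`np.savez` takes the same arguments but skips compression, which trades a larger file for faster writes.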

In [None]:
!ls savez_data

In [None]:
# we can load the preprocessed MNIST image arrays using NumPy's load function as follows:


In [None]:
# The mnist variable now references an object that can access the four data arrays


In [None]:
# to load the training data into our current Python session,
# we will access the 'X_train' array as follows (similar to a Python dictionary)


In [None]:
# Using a list comprehension, we can retrieve all four data arrays as follows
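
The loading steps described in the cells above can be sketched as follows. To keep the example self-contained, it first writes a small placeholder archive (an assumption standing in for the real `mnist_scaled.npz`) and then reads it back with `np.load`.

```python
import os
import tempfile

import numpy as np

# Placeholder archive (assumption) so the example runs without the real data
archive = os.path.join(tempfile.mkdtemp(), "mnist_scaled.npz")
np.savez_compressed(archive,
                    X_train=np.zeros((4, 784)), y_train=np.arange(4),
                    X_test=np.zeros((2, 784)), y_test=np.arange(2))

# np.load returns a lazy, dictionary-like object for .npz archives
mnist = np.load(archive)
print(mnist.files)  # names of the arrays stored in the archive

# Access a single array by name, similar to a Python dictionary
X_train = mnist["X_train"]

# Retrieve all four arrays with a list comprehension
names = ["X_train", "y_train", "X_test", "y_test"]
X_train, y_train, X_test, y_test = [mnist[name] for name in names]

mnist.close()  # release the underlying file handle
```

Listing the names explicitly, rather than iterating over `mnist.files`, keeps the unpacking order independent of the order in which the arrays were saved.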


We will now implement an MLP with one input layer, one hidden layer, and one output layer to classify the images in the MNIST dataset. The code will contain parts that we have not talked about yet, such as the backpropagation algorithm, but most of the code should look familiar to you based on the Adaline implementation.

Do not worry if not all of the code makes immediate sense to you; we will follow up on certain parts later. However, going over the code at this stage can make it easier to follow the theory later.

In [None]:
# create the NeuralNetMLP class
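
The class itself is not shown in this cell, so the following is only a minimal sketch of what a one-hidden-layer `NeuralNetMLP` might look like, not the notebook's actual implementation. It assumes sigmoid activations, a logistic (cross-entropy) cost, plain minibatch gradient descent, and class labels numbered 0 to K-1; it omits refinements such as regularization. The `fit` signature and the `eval_` attribute follow the description in the text, while the constructor parameters `epochs`, `eta`, `minibatch_size`, and `seed` are assumed names.

```python
import numpy as np


class NeuralNetMLP:
    """Sketch of a one-hidden-layer MLP trained with minibatch gradient descent."""

    def __init__(self, n_hidden=100, epochs=200, eta=0.0005,
                 minibatch_size=100, seed=1):
        self.n_hidden = n_hidden
        self.epochs = epochs
        self.eta = eta
        self.minibatch_size = minibatch_size
        self.random = np.random.RandomState(seed)

    @staticmethod
    def _sigmoid(z):
        # Clip the net input to avoid overflow in exp
        return 1.0 / (1.0 + np.exp(-np.clip(z, -250, 250)))

    @staticmethod
    def _onehot(y, n_classes):
        onehot = np.zeros((y.shape[0], n_classes))
        onehot[np.arange(y.shape[0]), y] = 1.0
        return onehot

    def _forward(self, X):
        a_h = self._sigmoid(X @ self.w_h + self.b_h)
        a_out = self._sigmoid(a_h @ self.w_out + self.b_out)
        return a_h, a_out

    def predict(self, X):
        # The predicted class is the output unit with the largest activation
        return np.argmax(self._forward(X)[1], axis=1)

    def fit(self, X_train, y_train, X_valid, y_valid):
        n_features = X_train.shape[1]
        n_output = np.unique(y_train).shape[0]
        self.w_h = self.random.normal(0.0, 0.1, (n_features, self.n_hidden))
        self.b_h = np.zeros(self.n_hidden)
        self.w_out = self.random.normal(0.0, 0.1, (self.n_hidden, n_output))
        self.b_out = np.zeros(n_output)
        self.eval_ = {"cost": [], "train_acc": [], "valid_acc": []}
        y_enc = self._onehot(y_train, n_output)

        for _ in range(self.epochs):
            indices = self.random.permutation(X_train.shape[0])
            for start in range(0, indices.shape[0], self.minibatch_size):
                batch = indices[start:start + self.minibatch_size]
                a_h, a_out = self._forward(X_train[batch])
                # Backpropagation: output error, then hidden-layer error
                delta_out = a_out - y_enc[batch]
                delta_h = (delta_out @ self.w_out.T) * a_h * (1.0 - a_h)
                # Gradient descent updates (gradients summed over the batch)
                self.w_out -= self.eta * (a_h.T @ delta_out)
                self.b_out -= self.eta * delta_out.sum(axis=0)
                self.w_h -= self.eta * (X_train[batch].T @ delta_h)
                self.b_h -= self.eta * delta_h.sum(axis=0)

            # Record cost and accuracies once per epoch
            _, a_out = self._forward(X_train)
            eps = 1e-12  # guards against log(0)
            cost = -np.sum(y_enc * np.log(a_out + eps)
                           + (1.0 - y_enc) * np.log(1.0 - a_out + eps))
            self.eval_["cost"].append(cost)
            self.eval_["train_acc"].append(np.mean(self.predict(X_train) == y_train))
            self.eval_["valid_acc"].append(np.mean(self.predict(X_valid) == y_valid))
        return self
```

Keeping the forward pass in its own method lets `fit` and `predict` share the same computation, which mirrors how the Adaline implementation separated net input from prediction.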

Let's now initialize a new 784-100-10 MLP, that is, a neural network with 784 input units (`n_features`), 100 hidden units (`n_hidden`), and 10 output units (`n_output`):

In [None]:
# use the NeuralNetMLP class
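
To get a feel for the size of a 784-100-10 network, the short calculation below counts its trainable parameters. It assumes one weight per connection between adjacent layers and one bias per hidden and output unit; the notebook's class may organize its parameters differently.

```python
n_features, n_hidden, n_output = 784, 100, 10

# Weights plus biases between the input and hidden layers
hidden_params = n_features * n_hidden + n_hidden
# Weights plus biases between the hidden and output layers
output_params = n_hidden * n_output + n_output

n_params = hidden_params + output_params
print(n_params)  # 79510
```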

Next, we train the MLP using 55,000 samples from the already shuffled MNIST training dataset and use the remaining 5,000 samples for validation during training. Note that training the neural network may take up to 5 minutes on standard desktop computer hardware.
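
The 55,000/5,000 split described above can be sketched with plain array slicing, which is sufficient because the training set is already shuffled. The placeholder arrays below are assumptions that only mimic the number of rows; the real image array has 784 columns.

```python
import numpy as np

# Placeholder arrays (assumption): 60,000 rows like the MNIST training set,
# but only 4 columns to keep the example light
X_train = np.zeros((60000, 4), dtype=np.float32)
y_train = np.zeros(60000, dtype=np.uint8)

# First 55,000 samples for training, remaining 5,000 for validation
X_fit, y_fit = X_train[:55000], y_train[:55000]
X_valid, y_valid = X_train[55000:], y_train[55000:]

print(X_fit.shape, X_valid.shape)  # (55000, 4) (5000, 4)
```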

As you may have noticed from the preceding code implementation, we implemented the `fit` method so that it takes four input arguments: training images, training labels, validation images, and validation labels. In neural network training, it is useful to compare training and validation accuracy while training is still running, because this helps us judge whether the model performs well given the architecture and hyperparameters.

In general, training (deep) neural networks is relatively expensive compared with the other models we have discussed so far. Thus, we may want to stop training early in certain circumstances and start over with different hyperparameter settings. Alternatively, if we find that the network increasingly tends to overfit the training data (noticeable by a growing gap between training and validation set performance), we may want to stop the training early as well.
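
One simple way to turn this idea into a rule is a patience-based check on the validation accuracy. The helper below is hypothetical: the name `should_stop` and the `patience` parameter are assumptions, and nothing in the text indicates that the notebook's `fit` method stops automatically.

```python
def should_stop(valid_acc_history, patience=10):
    """Return True if validation accuracy has not improved in the last `patience` epochs."""
    if len(valid_acc_history) <= patience:
        return False  # not enough history to decide yet
    best_before = max(valid_acc_history[:-patience])
    best_recent = max(valid_acc_history[-patience:])
    return best_recent <= best_before


# Validation accuracy is still rising, so training continues
print(should_stop([0.5, 0.6, 0.7], patience=2))   # False
# No improvement over the earlier best of 0.9 within the last 2 epochs
print(should_stop([0.9, 0.8, 0.85], patience=2))  # True
```

A check like this could be called once per epoch on the recorded validation accuracies to decide whether further training is worthwhile.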

Now, to start the training, we execute the following code:

In [None]:
# train the net

In our `NeuralNetMLP` implementation, we also defined an `eval_` attribute that collects the cost, training accuracy, and validation accuracy for each epoch so that we can visualize the results using Matplotlib:

In [None]:
# plot the cost for each epoch

As we can see, the cost decreased substantially during the first 100 epochs and seems to converge slowly in the last 100 epochs. However, the small slope between epoch 175 and epoch 200 indicates that the cost would decrease further if we trained for additional epochs.

Next, let's take a look at the training and validation accuracy:

In [None]:
# show the accuracy

The plot reveals that the gap between training and validation accuracy widens the longer we train the network. At approximately the 50th epoch, the training and validation accuracy values are equal; after that, the network starts overfitting the training data.

Finally, let's evaluate the generalization performance of the model by calculating the prediction accuracy on the test set:

In [None]:
# predict on the test data
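
Prediction accuracy is simply the fraction of test samples whose predicted label matches the true label. The sketch below shows the calculation on a handful of made-up labels (assumptions for illustration, not actual MNIST predictions).

```python
import numpy as np

# Made-up labels (assumption) standing in for y_test and the model's predictions
y_test = np.array([7, 2, 1, 0, 4])
y_test_pred = np.array([7, 2, 1, 0, 9])

# Count matching labels and divide by the number of samples
acc = np.sum(y_test == y_test_pred) / y_test.shape[0]
print(f"Test accuracy: {acc * 100:.2f}%")  # Test accuracy: 80.00%
```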

Despite the slight overfitting on the training data, our relatively simple one-hidden-layer neural network achieved good performance on the test dataset, similar to the validation set accuracy (97.98 percent).

Lastly, let's take a look at some of the images that our MLP struggles with:

In [None]:
# show results
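
To find the images the network struggles with, we can select the test samples whose predicted label differs from the true label. The sketch below uses small made-up arrays (assumptions) in place of the real test set and predictions, and it stops at selecting and reshaping the images rather than plotting them.

```python
import numpy as np

# Made-up data (assumption): 6 flattened 28 x 28 images with true and predicted labels
X_test = np.arange(6 * 784, dtype=float).reshape(6, 784)
y_test = np.array([3, 5, 8, 1, 9, 4])
y_test_pred = np.array([3, 6, 8, 7, 9, 4])

# Boolean mask marking the misclassified samples
miscl_mask = y_test != y_test_pred

# Keep at most the first 25 misclassified samples for inspection
miscl_img = X_test[miscl_mask][:25]
correct_lab = y_test[miscl_mask][:25]
miscl_lab = y_test_pred[miscl_mask][:25]

for i, (true, pred) in enumerate(zip(correct_lab, miscl_lab)):
    img = miscl_img[i].reshape(28, 28)  # back to 2D, ready for plt.imshow
    print(f"{i + 1}) true: {true}, predicted: {pred}, image shape: {img.shape}")
```

Each reshaped `img` could then be passed to Matplotlib's `imshow` with the true and predicted labels as the subplot title.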