# Example 2 - MNIST with a Shallow Feedforward Network
We will use a shallow feedforward neural network to classify the handwritten digits of the famous MNIST dataset.

First, as usual, we import NumPy to hold the data, and we import Learny McLearnface.

In [1]:
import numpy as np
import LearnyMcLearnface as lml
%matplotlib inline

Now, we will set up the MNIST dataset.  This dataset consists of a training set of 60,000 28x28 images of handwritten digits, along with a test set of 10,000 different images of the same size.  We will use a built-in utility to import the data.

Since these images are 28x28 grayscale pixels, each one will be stretched into a one-dimensional vector of length 784.
Then, the training and test image sets are each combined into one large matrix, where each row is a single 784-element image vector.

There are also two vectors of image classifications (lengths 60,000 and 10,000) which correspond to these two matrices.
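The flattening step can be illustrated with plain NumPy. This is a minimal sketch on a made-up batch of five images; the array `images` is a hypothetical stand-in, and we assume row-major flattening, which the text does not state explicitly.

```python
import numpy as np

# Hypothetical stand-in for a batch of MNIST images: 5 grayscale 28x28 arrays.
images = np.arange(5 * 28 * 28, dtype=np.float64).reshape(5, 28, 28)

# Stretch each image into one row of length 784 (row-major order, an assumption).
flat = images.reshape(images.shape[0], -1)

print(flat.shape)  # (5, 784)

# Under row-major order, column r * 28 + c of row i holds pixel (r, c) of image i.
print(flat[2, 3 * 28 + 7] == images[2, 3, 7])  # True
```

With the full dataset, the same reshape would give a 60,000 x 784 training matrix and a 10,000 x 784 test matrix.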


In [2]:
training_images, training_labels, test_images, test_labels = lml.utils.get_mnist()

It is common to have separate validation and test sets, so we will split MNIST's test set in half to accommodate this. The first 5000 images will be our validation set, and the remaining 5000 will be our test set.

In [3]:
X_train = training_images
y_train = training_labels
X_val = test_images[:5000, :]
y_val = test_labels[:5000]
X_test = test_images[5000:, :]
y_test = test_labels[5000:]
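A quick sanity check on a split like this is to confirm the shapes and look at the per-class counts in each half. The sketch below uses randomly generated stand-in arrays (`fake_images` and `fake_labels` are hypothetical names, not the real MNIST data) so that it runs on its own.

```python
import numpy as np

# Hypothetical stand-ins with the same shapes as MNIST's test set.
rng = np.random.default_rng(0)
fake_images = rng.random((10000, 784))
fake_labels = rng.integers(0, 10, size=10000)

# Same slicing as the split above: first half validation, second half test.
val_X, val_y = fake_images[:5000, :], fake_labels[:5000]
test_X, test_y = fake_images[5000:, :], fake_labels[5000:]

print(val_X.shape, test_X.shape)  # (5000, 784) (5000, 784)

# Per-class counts; a heavily skewed half would make the two accuracies harder to compare.
print(np.bincount(val_y, minlength=10))
print(np.bincount(test_y, minlength=10))
```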

Now, we wrap our training and validation sets in a dictionary as usual, so that we may feed it to Learny McLearnface.

In [4]:
data = {
    'X_train' : X_train,
    'y_train' : y_train,
    'X_val' : X_val,
    'y_val' : y_val
}

Now we will create our model. We will use a similar architecture to Example 1: a shallow fully-connected network with 600 hidden layer neurons, ReLU activations, and a Softmax classifier.

Since this network will be taking 28x28 images, its input dimension will be 28*28. We will use the Xavier scheme to initialize our parameters.

In [5]:
opts = {
    'input_dim' : 28*28,
    'init_scheme' : 'xavier'
}
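For intuition, here is a minimal sketch of what a Xavier (Glorot) initializer typically does. It shows the uniform variant; whether Learny McLearnface uses this variant or the Gaussian one is not stated here, so `xavier_uniform` is a hypothetical helper and not the library's implementation. The hidden size of 600 matches the first Affine layer in the model built below.

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng):
    """Draw weights from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
W = xavier_uniform(28 * 28, 600, rng)

print(W.shape)  # (784, 600)

# The variance of U(-a, a) is a^2 / 3, so the target standard deviation
# is sqrt(2 / (fan_in + fan_out)); the sample value should be close to it.
print(W.std(), np.sqrt(2.0 / (28 * 28 + 600)))
```

Scaling the weights by the layer's fan-in and fan-out keeps the variance of the activations roughly constant from layer to layer at the start of training.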

And then we build the model.

In [6]:
nn = lml.NeuralNetwork(opts)
nn.add_layer('Affine', {'neurons':600})
nn.add_layer('ReLU', {})
nn.add_layer('Affine', {'neurons':10})
nn.add_layer('SoftmaxLoss', {})
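To make the four layers concrete, the following is a rough NumPy sketch of the forward pass such a stack computes. It is an illustration under our own assumptions (the function `forward`, the weight names, and the small Gaussian weights are all hypothetical), not a view into the library's internals.

```python
import numpy as np

def forward(X, W1, b1, W2, b2):
    """Affine -> ReLU -> Affine -> softmax, returning class probabilities."""
    hidden = np.maximum(0.0, X @ W1 + b1)                 # Affine + ReLU
    scores = hidden @ W2 + b2                             # Affine
    scores = scores - scores.max(axis=1, keepdims=True)   # shift for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.random((4, 784))                                  # 4 made-up flattened images
W1, b1 = rng.normal(0.0, 0.01, (784, 600)), np.zeros(600)
W2, b2 = rng.normal(0.0, 0.01, (600, 10)), np.zeros(10)

probs = forward(X, W1, b1, W2, b2)
print(probs.shape)        # (4, 10)
print(probs.sum(axis=1))  # each row sums to 1 (up to rounding)
```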

Like before, we will use a Trainer in order to fit the model to the MNIST dataset. We will use stochastic gradient descent, and we will use a learning rate of 1e-2 and a regularization constant of 1e-8. We will also train the model for 10 epochs.

In [7]:
opts = {
    'update_options' : {'update_rule' : 'sgd', 'learning_rate' : 1e-2},
    'reg_param' : 1e-8,
    'num_epochs' : 10
}
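As a rough picture of what these options control, here is a sketch of a single SGD step. We assume that `reg_param` scales an L2 penalty of the form 0.5 * reg_param * ||w||^2; that is a common convention, but it is our assumption and not something confirmed for this library. The helper `sgd_step` is hypothetical.

```python
import numpy as np

def sgd_step(w, grad, learning_rate=1e-2, reg_param=1e-8):
    """One SGD update on w, given the gradient of the data loss.

    Assumption: the regularizer is 0.5 * reg_param * ||w||^2,
    so it contributes reg_param * w to the gradient.
    """
    return w - learning_rate * (grad + reg_param * w)

w = np.array([1.0, -2.0])
grad = np.array([0.5, 0.5])
w_new = sgd_step(w, grad)
print(w_new)
```

With a regularization constant as small as 1e-8, the penalty term barely changes the update, so the step is essentially `w - learning_rate * grad`.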

We then construct the Trainer object from the model, the data, and the chosen options.

In [8]:
trainer = lml.Trainer(nn, data, opts)

We will evaluate the model once before training. Since the model is randomly initialized, we expect its predictions to be essentially random as well. As there are 10 classes, we expect an initial accuracy near 10%.

In [9]:
accuracy = trainer.accuracy(X_val, y_val)
print('Initial model accuracy:', accuracy)

Initial model accuracy: 0.1242
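The 10% figure can be checked without any model at all. The sketch below compares uniformly random guesses against uniformly random labels; both arrays are synthetic, and it assumes the ten classes are roughly balanced.

```python
import numpy as np

rng = np.random.default_rng(0)

# 5000 synthetic labels and 5000 independent random guesses over 10 classes.
labels = rng.integers(0, 10, size=5000)
guesses = rng.integers(0, 10, size=5000)

# Fraction of guesses that happen to match the label.
chance_accuracy = np.mean(guesses == labels)
print(chance_accuracy)
```

An untrained network is not a perfectly uniform guesser, so its initial accuracy can sit a little above or below this value.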


Now, we train the model.

In [10]:
trainer.train()



Epoch 1 of 10 Validation accuracy: 0.8348
Epoch 2 of 10 Validation accuracy: 0.9128
Epoch 3 of 10 Validation accuracy: 0.8846
Epoch 4 of 10 Validation accuracy: 0.8758
Epoch 5 of 10 Validation accuracy: 0.9216
Epoch 6 of 10 Validation accuracy: 0.9276
Epoch 7 of 10 Validation accuracy: 0.926
Epoch 8 of 10 Validation accuracy: 0.9312
Epoch 9 of 10 Validation accuracy: 0.9372
Epoch 10 of 10 Validation accuracy: 0.9336
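The quantity being minimized during training is the softmax loss: the average negative log-probability assigned to the correct class. Taking the log of a probability that has underflowed to zero gives infinity, so a common remedy is to work in log space. The following is a minimal sketch of that idea; `softmax_loss` is a hypothetical helper written for illustration, not the library's own function.

```python
import numpy as np

def softmax_loss(scores, y):
    """Mean negative log-likelihood of the labels y, computed from raw class scores."""
    # Subtracting the row maximum leaves the softmax unchanged and avoids overflow.
    shifted = scores - scores.max(axis=1, keepdims=True)
    # log softmax = shifted - log(sum(exp(shifted))); no probability is ever logged directly.
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    N = scores.shape[0]
    return -log_probs[np.arange(N), y].mean()

# The first row is an extreme case: the correct class (index 1) has a vanishing probability.
scores = np.array([[1000.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]])
y = np.array([1, 2])

loss = softmax_loss(scores, y)
print(np.isfinite(loss))  # True
```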


And then, we print the final test set accuracy. With the hyperparameters we used earlier, we can expect a test set accuracy of roughly 92-95%, although individual runs vary and the run shown here lands slightly higher. The model has clearly fit the dataset well. However, with a bit of hyperparameter optimization, it can do even better. Feel free to experiment with the hyperparameter values, and try to beat our final accuracy!

In [12]:
accuracy = trainer.accuracy(X_test, y_test)
print('Final test set accuracy:', accuracy)

Final test set accuracy: 0.961
