# Demonstration: The MNIST dataset

## Importing the dataset

The script ``mnist.py`` can read the MNIST data. The ``vectorize`` option converts each sample into a format the network can use.

In [1]:
import mnist


In [None]:
training_set = mnist.read(path='mnist', dataset='training', vectorize=True)
testing_set = mnist.read(path='mnist', dataset='testing', vectorize=True)
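
The code of ``mnist.py`` is not shown here, so the following is only a sketch of what a vectorized sample might look like. It assumes that each image is flattened to 784 values and that each label becomes a one-hot vector of length 10, which matches how the samples are used later (``reshape(28, 28)`` on the image, ``np.argmax`` on the label). The helper name ``vectorize_sample`` and the scaling of pixels to the range 0 to 1 are assumptions made for illustration.

```python
import numpy as np

def vectorize_sample(image, label, num_classes=10):
    """Hypothetical helper: flatten an image and one-hot encode its label."""
    # Flatten the 28x28 pixel grid into a vector of 784 values.
    # Scaling to [0, 1] is an assumption, not taken from mnist.py.
    pixels = np.asarray(image, dtype=float).reshape(-1) / 255.0
    # One-hot encoding: a 1 at the index of the label, 0 everywhere else.
    target = np.zeros(num_classes)
    target[label] = 1.0
    return pixels, target

# Synthetic stand-in for an MNIST image (not real data).
image = np.zeros((28, 28), dtype=np.uint8)
image[10:18, 12:16] = 255
pixels, target = vectorize_sample(image, label=4)
print(pixels.shape, np.argmax(target))
```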

## Constructing and training the network

We can now build the network. We'll use 784 input nodes, one for each pixel of a 28 × 28 image, and 10 output nodes, one for each digit label.

In [None]:
import pynn

In [None]:
network = pynn.Network([784, 10], activation='ReLU', softmax=1)

We'll use ReLU as the activation function and apply a softmax filter since this is a classification problem.
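
To make the roles of ReLU and softmax concrete, here is a minimal NumPy sketch of a forward pass through a 784-to-10 layer. It is not the ``pynn`` implementation. The order of operations (affine map, then ReLU, then softmax) and the names ``W``, ``b`` and ``forward`` are assumptions for illustration.

```python
import numpy as np

def relu(z):
    # ReLU keeps positive values and sets negative values to zero.
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtracting the maximum avoids overflow and leaves the result unchanged.
    exps = np.exp(z - np.max(z))
    return exps / np.sum(exps)

def forward(W, b, x):
    # Assumed layout: one weight row per output node, one column per pixel.
    return softmax(relu(W @ x + b))

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(10, 784))  # random weights, untrained
b = np.zeros(10)
x = rng.random(784)                         # synthetic input, not an MNIST image
probs = forward(W, b, x)
print(probs.sum())
```

The softmax output is a vector of 10 positive values that sum to 1, so it can be read as a probability for each label.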

In [None]:
network.train(training_set, epochs=1, batchsize=1, eta=0.005)
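
What ``train`` does internally is not shown here. As a rough illustration of training with ``batchsize=1``, the sketch below performs stochastic gradient descent on a single softmax layer with a cross-entropy loss, updating the weights after every sample. It assumes that ``eta`` is the learning rate, and it leaves out the ReLU for brevity. The function ``sgd_step`` is hypothetical and not part of ``pynn``.

```python
import numpy as np

def softmax(z):
    exps = np.exp(z - np.max(z))
    return exps / np.sum(exps)

def sgd_step(W, b, x, y, eta):
    """One update on a single sample; returns the loss before the update."""
    p = softmax(W @ x + b)
    # For softmax combined with cross-entropy, the gradient with respect
    # to the pre-softmax scores is simply (prediction - target).
    grad_z = p - y
    W -= eta * np.outer(grad_z, x)  # in-place update of the weights
    b -= eta * grad_z               # in-place update of the biases
    return -np.log(p[np.argmax(y)])

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(10, 784))
b = np.zeros(10)
x = rng.random(784)      # synthetic sample, not an MNIST image
y = np.zeros(10)
y[3] = 1.0               # one-hot target for label 3

# Repeating the step on the same sample should reduce its loss.
losses = [sgd_step(W, b, x, y, eta=0.005) for _ in range(20)]
print(losses[0], losses[-1])
```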

## Predicting test samples

Now that the network is trained, we can try to predict a sample from the testing set.

In [None]:
prediction = network.predict(testing_set[42][0])
print(prediction)

This seems to be a 4. Let's check by showing the actual image alongside its label:

In [None]:
import numpy as np
mnist.show(np.array(testing_set[42][0]).reshape(28, 28))
print(testing_set[42][1])

Seems like the network was correct. But what is our actual error rate over the whole testing set?

In [None]:
num_wrong = 0
for sample in testing_set:
    # The predicted digit is the output node with the highest value.
    prediction = np.argmax(network.predict(sample[0]))
    # The expected digit is the index of the largest entry in the label vector.
    expectation = np.argmax(sample[1])
    if prediction != expectation:
        num_wrong += 1
error_rate = num_wrong / len(testing_set)
print("Achieved testing error rate: %.2f %%." % (error_rate * 100))

Only 9 %? That's pretty good. In fact, it's better than what LeCun et al. achieved in 1998 using a similar setup. [1]

## Training for more epochs

Below is a graph illustrating the evolution of the testing and training errors when training for up to 50 epochs.

In [None]:
import matplotlib.pyplot as plt
data = np.loadtxt("data.txt")
plt.plot(data[:, 0], data[:, 2] * 100, label="Testing set")
plt.plot(data[:, 0], data[:, 1] * 100, linestyle="dashed", label="Training set")
plt.xlim(0, 50)
plt.ylim(6, 10)
plt.grid(linestyle="dashed", color="grey")
plt.xlabel("Epoch")
plt.ylabel(r"Error rate in %")
plt.legend()
plt.show()
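
The plotting code above reads ``data.txt`` as a table with three columns: the epoch, the training error and the testing error, with both errors stored as fractions (they are multiplied by 100 for the plot). How the file was produced is not shown, so the following is only a sketch of writing and reading a file with that layout. The file name ``example_data.txt`` and all numbers are placeholders, not measured results.

```python
import numpy as np

# Placeholder values for illustration only; these are not measured results.
epochs = np.arange(1, 4)
training_error = np.array([0.100, 0.090, 0.085])
testing_error = np.array([0.095, 0.090, 0.088])

# One row per epoch: epoch, training error, testing error.
table = np.column_stack([epochs, training_error, testing_error])

# np.savetxt prefixes the header with "# ", which np.loadtxt skips as a comment.
np.savetxt("example_data.txt", table, header="epoch training_error testing_error")

loaded = np.loadtxt("example_data.txt")
print(loaded.shape)
```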

# Literature

[1] Y. LeCun, L. Bottou, Y. Bengio and P. Haffner: Gradient-Based Learning Applied to Document Recognition, Proceedings of the IEEE, 86(11):2278-2324, November 1998