# Classify handwritten digits -- MNIST

Let's look at a concrete example of a neural network that uses the Python library Keras to learn to classify handwritten digits.

The problem we're trying to solve here is to classify grayscale images of handwritten digits (28 × 28 pixels) into their 10 categories (0 through 9). We'll use the MNIST dataset, a classic in the machine-learning community, which has been around almost as long as the field itself and has been intensively studied.

##### About MNIST 
It's a set of 60,000 training images, plus 10,000 test images, assembled by the National Institute of Standards and Technology (the NIST in MNIST) in the 1980s. 

You can think of "solving" MNIST as the "Hello World" of deep learning -- it's what you do to verify that your algorithms are working as expected. 

```python
# Loading the MNIST dataset in Keras
from tensorflow.keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Let's look at the training data
print(train_images.shape)
print(len(train_labels))

# Then the test data
print(test_images.shape)
print(len(test_labels))
print(repr(test_labels))
```

```
(60000, 28, 28)
60000
(10000, 28, 28)
10000
array([7, 2, 1, ..., 4, 5, 6], dtype=uint8)
```

```python
# The network architecture
from tensorflow.keras import models
from tensorflow.keras import layers

network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax'))

# The compilation step
network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])
```

## Question 1
1.1 How many images are there in the training data?

Solution: There are 60,000 images in the training data. Each one is a 28 × 28 grayscale image.

1.2 How many images are there in the testing data?

Solution: There are 10,000 images in the testing data.

1.3 How many layers are there in the neural network above?

Solution: The network is a sequence of two Dense layers, which are densely connected (also called fully connected) neural layers: a hidden layer of 512 ReLU units and an output layer of 10 units. Counting the input (the 784 flattened pixel values) as a layer, there is one input layer, one hidden layer, and one output layer. The second (and last) layer is a 10-way softmax layer, which means it returns an array of 10 probability scores summing to 1. Each score is the probability that the current digit image belongs to one of our 10 digit classes.
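To make the layer structure concrete, here is a minimal NumPy sketch of one forward pass through the same architecture (784 inputs, 512 ReLU units, 10 softmax outputs). The random weights and the helper names `relu`, `softmax` and `forward` are illustrative assumptions, not Keras internals; Keras initializes and trains its own weights.

```python
import numpy as np

def relu(x):
    # Element-wise max(0, x)
    return np.maximum(0.0, x)

def softmax(z):
    # Subtract the row-wise max for numerical stability; each output row sums to 1
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Hypothetical weights with the same shapes as the two Dense layers
W1, b1 = rng.normal(0.0, 0.05, size=(784, 512)), np.zeros(512)   # hidden layer
W2, b2 = rng.normal(0.0, 0.05, size=(512, 10)), np.zeros(10)     # output layer

def forward(x):
    # x: batch of flattened images, shape (batch, 784)
    h = relu(x @ W1 + b1)          # shape (batch, 512)
    return softmax(h @ W2 + b2)    # shape (batch, 10)

# Trainable parameters per Dense layer: inputs * units (weights) + units (biases)
n_params = [W1.size + b1.size, W2.size + b2.size]
print(n_params, sum(n_params))     # [401920, 5130] 407050

x = rng.random((3, 784))           # three fake "images" with values in [0, 1)
probs = forward(x)
print(probs.shape)                 # (3, 10)
print(probs.sum(axis=1))           # each row sums to 1 (up to rounding)
```

The parameter counts are the same ones `network.summary()` reports for these two Dense layers: 401,920 for the hidden layer and 5,130 for the output layer.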

1.4 Which optimizer was chosen in the example above? What other optimizers are used in deep learning models?

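Solution: The optimizer chosen in this example is "rmsprop" (RMSprop).
Broadly, optimizers fall into two families:

1. Gradient descent optimizers, which apply one global learning rate to every parameter, e.g. SGD in Keras (optionally with momentum);
2. Adaptive optimizers, which scale each parameter's step using running statistics of its past gradients, e.g. RMSprop, Adagrad, Adam.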
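The difference between the two families is easiest to see in their update rules. The sketch below applies plain SGD and the standard RMSprop rule to a toy one-dimensional quadratic. The toy loss, function names, learning rates and step counts are illustrative assumptions; `rho=0.9` and `eps=1e-7` match the Keras defaults, but Keras's own `RMSprop` also supports options such as `momentum` and `centered` that are not shown here.

```python
import numpy as np

def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2, whose minimum is at w = 3
    return 2.0 * (w - 3.0)

def sgd(w, lr=0.1, steps=200):
    # Plain gradient descent: every step is lr times the raw gradient
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def rmsprop(w, lr=0.01, rho=0.9, eps=1e-7, steps=2000):
    # RMSprop: keep a running average of squared gradients and divide the
    # step by its square root, so the step size adapts to the gradient magnitude
    avg_sq = 0.0
    for _ in range(steps):
        g = grad(w)
        avg_sq = rho * avg_sq + (1.0 - rho) * g * g
        w -= lr * g / (np.sqrt(avg_sq) + eps)
    return w

print(sgd(0.0))      # essentially 3.0
print(rmsprop(0.0))  # near 3.0; with a fixed lr it hovers within a few multiples of lr
```

Because RMSprop normalizes by the gradient's running RMS, its steps stay roughly the size of `lr` whether the gradient is large or small, whereas SGD's steps shrink and grow with the raw gradient.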

1.5 Why was 'categorical_crossentropy', rather than 'mean_squared_error', used as the loss function?

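Solution: This is a multi-class classification problem, and the softmax output layer produces a probability distribution over the 10 classes. 'categorical_crossentropy' is the negative log of the probability the network assigns to the true class, so it directly rewards putting probability mass on the correct digit, and combined with softmax it gives large gradients when the network is confidently wrong. 'mean_squared_error' is more common for regression problems, where the target is a continuous value; paired with softmax it gives much weaker gradients for confident mistakes, so learning is slower.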
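A small numerical check makes the gradient argument concrete. For a softmax output that is confidently wrong, the gradient of cross-entropy with respect to the logits is `p - y`, which stays large, while the mean-squared-error gradient has to pass through the softmax Jacobian and nearly vanishes. The logit values are made up for illustration, and the MSE here averages over the 10 classes, as Keras's `mean_squared_error` does over the last axis.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

K = 10
y = np.zeros(K); y[0] = 1.0     # true class is 0 (one-hot target)
z = np.zeros(K); z[3] = 10.0    # logits: the network is confident the digit is a 3
p = softmax(z)

# Cross-entropy and its gradient w.r.t. the logits (softmax + CE simplifies to p - y)
ce = -np.sum(y * np.log(p))
grad_ce = p - y

# MSE over the K outputs; its gradient w.r.t. the logits is J @ dL/dp,
# with dL/dp = 2 (p - y) / K and softmax Jacobian J = diag(p) - p p^T
mse = np.mean((p - y) ** 2)
J = np.diag(p) - np.outer(p, p)
grad_mse = J @ (2.0 * (p - y) / K)

print(f"CE  loss {ce:.3f}, gradient norm {np.linalg.norm(grad_ce):.3e}")
print(f"MSE loss {mse:.3f}, gradient norm {np.linalg.norm(grad_mse):.3e}")
```

In this example the cross-entropy gradient is several orders of magnitude larger than the MSE gradient, so gradient descent corrects the confident mistake far faster under cross-entropy.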

```python
# Preparing the image data
# Note: run this cell only once; re-running it would divide the pixels by 255 again
# and try to one-hot encode labels that are already encoded

# Transform the training images into a float32 array of shape (60000, 28*28)
# with values between 0 and 1
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255

# Similarly for the test data
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255

# We also need to categorically (one-hot) encode the labels
from tensorflow.keras.utils import to_categorical

# Before encoding: train_labels.shape == (60000,), test_labels.shape == (10000,)
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
# After encoding: train_labels.shape == (60000, 10), test_labels.shape == (10000, 10)
```
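What the preparation cell does can be reproduced with plain NumPy on a tiny fake batch. The helper `one_hot` below is a hypothetical stand-in that mirrors what `to_categorical` produces for integer labels 0 to 9; it is not the Keras implementation.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    # Row i gets 1.0 in column labels[i] and 0.0 everywhere else
    out = np.zeros((len(labels), num_classes), dtype='float32')
    out[np.arange(len(labels)), labels] = 1.0
    return out

# Fake batch: 2 "images" of 28x28 uint8 pixels, like mnist.load_data() returns
images = np.array([np.full((28, 28), 255), np.zeros((28, 28))], dtype='uint8')
labels = np.array([7, 2], dtype='uint8')

# Flatten each image to 784 values and scale from [0, 255] to [0, 1]
flat = images.reshape((len(images), 28 * 28)).astype('float32') / 255
print(flat.shape, flat.dtype, flat.min(), flat.max())  # (2, 784) float32 0.0 1.0

encoded = one_hot(labels)
print(encoded.shape)   # (2, 10)
print(encoded[0])      # 1.0 in position 7, zeros elsewhere
```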


```python
# Train the network
network.fit(train_images, train_labels, epochs=5, batch_size=128)

# Evaluate the trained network on the test data (returns loss and accuracy)
test_loss, test_acc = network.evaluate(test_images, test_labels)

print('test_acc: ', test_acc)
```

```
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
test_acc:  0.9782999753952026
```
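The `epochs` and `batch_size` arguments determine how many weight updates `fit()` performs. A quick calculation, assuming all 60,000 training images are used and that the last, smaller batch is kept (Keras does not drop it by default for in-memory arrays):

```python
import math

n_train = 60_000
batch_size = 128
epochs = 5

steps_per_epoch = math.ceil(n_train / batch_size)          # 468 full batches + 1 partial
last_batch = n_train - (steps_per_epoch - 1) * batch_size  # size of the partial batch
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch, total_updates)          # 469 96 2345
```

This is why the Keras progress bar for this cell typically counts 469 steps per epoch.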


## Question 2
2.1 Please use the '.shape' attribute to compare the shape of the training and testing data before and after the transformation. What is the shape of train_images when the network is trained with network.fit()?

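Solution: Before the transformation, the training images were stored in an array of shape (60000, 28, 28) of type uint8 with values in the [0, 255] interval. We transform it into a float32 array of shape (60000, 28 * 28) = (60000, 784) with values between 0 and 1; this is the shape of train_images passed to network.fit(). The test images similarly go from (10000, 28, 28) to (10000, 784), and one-hot encoding changes the label arrays from (60000,) and (10000,) to (60000, 10) and (10000, 10).

What is uint8 in NumPy? An unsigned 8-bit integer type with range 0 to 255, which is why the raw pixels are stored as uint8 values in [0, 255].
What is float32 in NumPy? numpy.float32 is a single-precision (32-bit) float. Its double-precision counterpart is numpy.float64.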
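The dtype facts above can be checked directly with NumPy's `iinfo` and `finfo`, and the item sizes show the memory cost of the conversion for the 60,000 × 784 training array:

```python
import numpy as np

u8 = np.iinfo(np.uint8)
print(u8.min, u8.max)                        # 0 255

# Machine epsilon: gap between 1.0 and the next representable value
print(np.finfo(np.float32).eps)              # about 1.19e-07 (single precision)
print(np.finfo(np.float64).eps)              # about 2.22e-16 (double precision)

n_values = 60_000 * 28 * 28
print(n_values * np.dtype(np.uint8).itemsize)    # 47040000 bytes as uint8
print(n_values * np.dtype(np.float32).itemsize)  # 188160000 bytes as float32
```

So the float32 training array takes four times the memory of the raw uint8 pixels, roughly 188 MB instead of 47 MB.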

2.2 What is the accuracy on the training data? 

Solution: 98.86% (the training accuracy reported by network.fit() for the last epoch).


2.3 What is the accuracy on the testing data?

Solution: 97.83%

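2.4 Please repeat the procedure above five times. Did the training accuracy and testing accuracy change?

Solution: Yes, both accuracies change slightly from run to run. The weights are randomly initialized and fit() shuffles the training data before each epoch by default, so each run follows a different optimization path. Note also that re-running only the training cell does not start over: it continues training the same network from its current weights, so for independent repetitions the model should be rebuilt and recompiled first.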
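Two of these sources of run-to-run variation can be illustrated without TensorFlow. The sketch below draws a Glorot-uniform weight matrix (the default `kernel_initializer` of Keras `Dense` layers) and a shuffled batch order with NumPy. The helper names `glorot_uniform`, `first_batch` and `simulated_run` and the seeds are illustrative assumptions, and NumPy's generator is not the one Keras uses internally.

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    # Glorot (Xavier) uniform: U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def first_batch(n_samples, batch_size, rng):
    # Shuffle the sample indices (as fit() does each epoch by default)
    # and return the indices that would form the first mini-batch
    return rng.permutation(n_samples)[:batch_size]

def simulated_run(seed):
    # One hypothetical "run": initial hidden-layer weights plus first batch order
    rng = np.random.default_rng(seed)
    weights = glorot_uniform(784, 512, rng)
    batch = first_batch(60_000, 128, rng)
    return weights, batch

W_a, batch_a = simulated_run(seed=1)
W_b, batch_b = simulated_run(seed=2)
W_c, batch_c = simulated_run(seed=1)   # same seed as run a

print(np.array_equal(W_a, W_b), np.array_equal(batch_a, batch_b))  # False False
print(np.array_equal(W_a, W_c), np.array_equal(batch_a, batch_c))  # True True
```

In Keras itself, recent TensorFlow versions provide `tf.keras.utils.set_random_seed(seed)`, which seeds Python, NumPy and TensorFlow together; calling it before building the model makes repeated runs much more reproducible, although some GPU operations can still be nondeterministic.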