# A first look at a neural network
------------------
The problem is to **classify grayscale images of handwritten digits** (28 by 28 pixels) into 10 categories (0 to 9). The dataset we will use is MNIST: a set of 60,000 training images plus 10,000 test images, assembled by the National Institute of Standards and Technology (the NIST in MNIST) in the 1980s.

The MNIST dataset comes pre-loaded in Keras, in the form of a set of four NumPy arrays.

In [1]:
import numpy as np 
import matplotlib.pyplot as plt 
import tensorflow as tf

from tensorflow.keras import layers, models, losses, optimizers, metrics
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

## Load dataset

`train_images` and `train_labels` form the "training set", the data that the model will learn from. The model will then be tested on the "test set", `test_images` and `test_labels`. The images are encoded as Numpy arrays, and the labels are simply an array of digits, ranging from 0 to 9. There is a one-to-one correspondence between the images and the labels.

In [14]:
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

In [15]:
train_images.shape

(60000, 28, 28)

In [16]:
train_labels.shape

(60000,)

In [17]:
test_images.shape

(10000, 28, 28)

In [18]:
test_labels.shape

(10000,)

In [19]:
train_labels

array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)
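To make the one-to-one correspondence between images and labels concrete, here is a minimal sketch. It uses a small random stand-in array rather than the real MNIST data, so it runs without a download; the names `fake_images` and `fake_labels` are hypothetical and only mimic the layout and dtype of `train_images` and `train_labels`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data (NOT real MNIST): same layout, (samples, height, width), uint8 pixels.
fake_images = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)
fake_labels = rng.integers(0, 10, size=(100,), dtype=np.uint8)

# Image i and label i belong together.
i = 3
digit = fake_images[i]
label = fake_labels[i]

print(fake_images.ndim, fake_images.shape, fake_images.dtype)
print(digit.shape, int(label))

# Count how many samples fall into each of the 10 classes.
class_counts = np.bincount(fake_labels, minlength=10)
print(class_counts)
```

The same indexing applies to the real arrays: `train_images[i]` is a 28 by 28 array of pixel intensities and `train_labels[i]` is its digit.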

## Build the network

- One dense layer with 512 units and `relu` activation;
- A final dense layer with 10 units and `softmax` activation.

The core building block of neural networks is the "layer", a data-processing module which you can think of as a "filter" for data: some data comes in, and comes out in a more useful form. More precisely, layers extract representations out of the data fed into them, ideally representations that are more meaningful for the problem at hand. Most of deep learning consists of chaining together simple layers that implement a form of progressive "data distillation". A deep learning model is like a sieve for data processing, made of a succession of increasingly refined data filters: the "layers".

Here our network consists of a sequence of two Dense layers, which are densely-connected (also called "fully-connected") neural layers. The second (and last) layer is a 10-way "softmax" layer, which means it will return an array of 10 probability scores (summing to 1). Each score will be the probability that the current digit image belongs to one of our 10 digit classes.
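As a rough illustration of what these two layers compute, the sketch below runs a forward pass in plain NumPy. It is not how Keras implements `Dense`; the weights `W1`, `b1`, `W2`, `b2` are hypothetical random values, the input is random stand-in data, and a ReLU activation is assumed for the first layer.

```python
import numpy as np

def relu(x):
    # Element-wise max(x, 0).
    return np.maximum(x, 0.0)

def softmax(x):
    # Subtract the row-wise max for numerical stability before exponentiating.
    shifted = x - x.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Hypothetical, randomly initialised parameters with the shapes used in this network.
W1 = rng.normal(scale=0.01, size=(28 * 28, 512))
b1 = np.zeros(512)
W2 = rng.normal(scale=0.01, size=(512, 10))
b2 = np.zeros(10)

# A batch of 4 flattened stand-in images with values in [0, 1).
x = rng.random((4, 28 * 28))

hidden = relu(x @ W1 + b1)         # first dense layer: 512 units
probs = softmax(hidden @ W2 + b2)  # second dense layer: 10-way softmax

print(probs.shape)        # one row of 10 scores per image
print(probs.sum(axis=1))  # each row sums to 1 (up to rounding)
```

With untrained weights the 10 scores are close to uniform; training is what moves probability mass toward the correct class.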

To make our network ready for training, we need to pick three more things as part of the "compilation" step:

- A loss function: this is how the network measures how good a job it is doing on its training data, and thus how it steers itself in the right direction.
- An optimizer: this is the mechanism through which the network updates itself based on the data it sees and its loss function.
- Metrics to monitor during training and testing. Here we only care about accuracy (the fraction of the images that were correctly classified).

The exact purpose of the loss function and the optimizer will be made clear throughout the next two chapters.
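To give a feel for what the loss and the accuracy metric measure, here is a small NumPy sketch of categorical cross-entropy and accuracy on hand-written predictions. The helper functions are illustrative only and are not the Keras implementations; three classes are used instead of ten to keep the arrays readable.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0), then average the per-sample losses over the batch.
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(-(y_true * np.log(y_pred)).sum(axis=1).mean())

def accuracy(y_true, y_pred):
    # Fraction of samples whose highest-scoring class matches the true class.
    return float((y_true.argmax(axis=1) == y_pred.argmax(axis=1)).mean())

# Three samples, three classes; labels are one-hot encoded.
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype="float32")
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.5, 0.3, 0.2]], dtype="float32")

loss_value = categorical_crossentropy(y_true, y_pred)
acc_value = accuracy(y_true, y_pred)
print(loss_value, acc_value)
```

The third sample is misclassified, so accuracy is two out of three, and its low score for the true class (0.2) contributes most of the loss.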

In [20]:
model = models.Sequential()

In [21]:
model.add(layers.Dense(units = 512, activation = "relu", input_shape = (28 * 28, )))
model.add(layers.Dense(units = 10, activation = "softmax"))

In [22]:
model.compile(optimizer = "rmsprop", loss = "categorical_crossentropy", metrics = ["accuracy"])

In [23]:
train_images = train_images.reshape(60000, (28 * 28))
train_images = train_images.astype("float32") / 255

test_images = test_images.reshape(10000, (28 * 28))
test_images = test_images.astype("float32") / 255

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
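The preprocessing cell above flattens each 28 by 28 image into a vector of 784 values, rescales pixels from the 0 to 255 range into the 0 to 1 range, and one-hot encodes the labels. The sketch below reproduces those three steps on a tiny stand-in batch using NumPy only; indexing into `np.eye(10)` is used here as an approximation of what `to_categorical` produces, not as its actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in batch (NOT real MNIST): 5 images of 28x28 uint8 pixels and their labels.
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
labels = np.array([5, 0, 4, 1, 9], dtype=np.uint8)

# Flatten each image into a 784-element vector.
flat = images.reshape(len(images), 28 * 28)

# Rescale pixel values from [0, 255] to [0, 1].
scaled = flat.astype("float32") / 255

# One-hot encode: row k of the identity matrix is the vector for class k.
one_hot = np.eye(10, dtype="float32")[labels]

print(scaled.shape, scaled.dtype)
print(one_hot[0])  # the label 5 becomes a vector with a single 1 at index 5
```

One-hot labels are what the `categorical_crossentropy` loss expects, which is why the labels are converted before training.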

## Train the model

Two quantities are displayed during training: the "loss" of the network over the training data, and the accuracy of the network over the training data.

In [25]:
model.fit(train_images, train_labels, epochs = 5, batch_size = 128)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x1ef5788d490>
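The `epochs` and `batch_size` arguments determine how many weight updates the call to `fit` performs. A quick back-of-the-envelope calculation, assuming the final partial batch of each epoch is kept rather than dropped:

```python
import math

num_samples = 60000  # training images
batch_size = 128
epochs = 5

# Number of batches (and therefore weight updates) in one pass over the data.
steps_per_epoch = math.ceil(num_samples / batch_size)

# The last batch of each epoch is smaller, since 128 does not divide 60000 evenly.
last_batch_size = num_samples - (steps_per_epoch - 1) * batch_size

total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch_size, total_updates)
```

Under that assumption, each epoch consists of 469 updates, the last of which uses 96 images, for 2,345 updates over the five epochs.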

In [26]:
test_loss, test_acc = model.evaluate(test_images, test_labels)



In [27]:
print("Accuracy: ", test_acc * 100)
print("Loss: ", test_loss * 100)

Accuracy:  97.50000238418579
Loss:  7.902292162179947


The test set accuracy turns out to be *97.5%*, which is lower than the accuracy the model reaches on the training set. This gap between training accuracy and test accuracy is an example of "overfitting": machine learning models tend to perform worse on new data than on their training data.