### 2.1 A first look at a neural network

The problem we are trying to solve here is to classify grayscale images of handwritten digits (28x28 pixels) into their 10 categories (0 through 9). We will use the MNIST dataset, which contains 60,000 training images plus 10,000 test images, assembled by the National Institute of Standards and Technology.

- Data points are called *samples* (images in this case)
- The class associated with a specific sample is called a *label* (0 through 9)

#### Listing 2.1 Loading the MNIST dataset in Keras

In [1]:
from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

`train_images` and `train_labels` form the *training set*, the data that the model will learn from. The model will then be tested on the *test set*, `test_images` and `test_labels`.

The images are encoded as NumPy arrays, and the labels are an array of digits, ranging from 0 to 9. The images and labels have a one-to-one correspondence.

In [2]:
# Let's look at the training data

print(f"Shape of train_images: {train_images.shape}")
print(f"Length of train_labels: {len(train_labels)}")
train_labels

Shape of train_images: (60000, 28, 28)
Length of train_labels: 60000


array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)

In [3]:
# Let's look at the test data

print(f"Shape of test_images: {test_images.shape}")
print(f"Length of test_labels: {len(test_labels)}")
test_labels

Shape of test_images: (10000, 28, 28)
Length of test_labels: 10000


array([7, 2, 1, ..., 4, 5, 6], dtype=uint8)

The workflow will be as follows: First, we'll feed the neural network the training data, `train_images` and `train_labels`. The network will then learn to associate images and labels. Finally, we'll ask the network to produce predictions for `test_images`, and we'll verify whether these predictions match the labels from `test_labels`.

#### Listing 2.2 The network architecture

In [4]:
from tensorflow.keras import models, layers

In [5]:
network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax'))
network.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 512)               401920    
_________________________________________________________________
dense_1 (Dense)              (None, 10)                5130      
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________


The core building block of neural networks is the *layer*, a data-processing module that you can think of as a filter for data. Some data goes in, and it comes out in a more useful form. Specifically, layers extract *representations* out of the data fed into them. Most of deep learning consists of chaining together simple layers that will implement a form of progressive *data distillation*. 

Here, our network consists of a sequence of two `Dense` layers, which are densely connected (also called *fully connected*) neural layers. The second layer is a 10-way *softmax* layer, which means it will return an array of 10 probability scores that sum to 1. Each score will be the probability that the current digit image belongs to one of our 10 digit classes.
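To make the two layers more concrete, here is a minimal NumPy sketch of the forward pass they compute. It is not how Keras implements `Dense` internally; the weights are random stand-ins for an untrained network, and the names `W1`, `b1`, `W2`, `b2`, and `x` are hypothetical. It also shows where the parameter counts reported by `network.summary()` come from.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Element-wise max(x, 0)
    return np.maximum(x, 0.0)

def softmax(z):
    # Subtract the row-wise max for numerical stability before exponentiating
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Random (untrained) weights with the same shapes as the two Dense layers
W1 = rng.normal(scale=0.05, size=(28 * 28, 512))
b1 = np.zeros(512)
W2 = rng.normal(scale=0.05, size=(512, 10))
b2 = np.zeros(10)

# Hypothetical batch of 4 flattened images with values in [0, 1]
x = rng.random((4, 28 * 28))

hidden = relu(x @ W1 + b1)          # shape (4, 512)
probs = softmax(hidden @ W2 + b2)   # shape (4, 10), each row sums to 1

# Parameter counts: one weight per input/output pair, plus one bias per output
params_1 = W1.size + b1.size        # 784 * 512 + 512
params_2 = W2.size + b2.size        # 512 * 10 + 10
print(params_1, params_2, params_1 + params_2)
print(probs.shape, probs.sum(axis=1))
```

The printed counts should match the `Param #` column in the summary above (401,920 and 5,130, for a total of 407,050).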

To make the network ready for training, we need to pick three more things, as part of the *compilation* step:
- A **loss function**: How the network will be able to measure its performance on the training data, and thus how it will be able to steer itself in the right direction.
- An **optimizer**: The mechanism through which the network will update itself based on the data it sees and its loss function.
- **Metrics** *to monitor during training and testing*: Here we'll only care about accuracy (the fraction of the images that were correctly classified).
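As a rough illustration of what the loss and the metric measure, the sketch below computes categorical crossentropy and accuracy by hand with NumPy. It follows the standard textbook definitions and may differ in detail from Keras' own implementation. The helper names and the tiny 3-class example are hypothetical, chosen only to keep the arrays readable; `y_true` holds one row per sample with a 1 in the position of the correct class.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip predictions so that log(0) can never occur
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # Per-sample loss is -log(probability assigned to the true class);
    # the batch loss is the mean over samples
    return -np.sum(y_true * np.log(y_pred), axis=1).mean()

def accuracy(y_true, y_pred):
    # Fraction of samples whose highest-scoring class is the true class
    return np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1))

# Three samples, three classes (the MNIST model would use 10 classes)
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]], dtype=float)
y_pred = np.array([[0.7, 0.2, 0.1],   # correct and fairly confident
                   [0.1, 0.8, 0.1],   # correct and confident
                   [0.5, 0.3, 0.2]])  # wrong: true class only gets 0.2

loss = categorical_crossentropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(f"loss: {loss:.4f}, accuracy: {acc:.4f}")
```

The third sample dominates the loss because the model assigns its true class a low probability, while accuracy only records that the prediction was wrong.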

#### Listing 2.3: The compilation step

In [6]:
network.compile(
    optimizer='rmsprop',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

Before training, we'll preprocess the data by reshaping it into the shape the network expects and scaling it so that all values are in the `[0, 1]` interval. Previously, our training images, for instance, were stored in an array of shape `(60000, 28, 28)` of type `uint8` with values in the `[0, 255]` interval. We transform it into a `float32` array of shape `(60000, 28 * 28)` with values between 0 and 1.

#### Listing 2.4: Preparing the image data

In [7]:
# Reshape and normalize `train_images`
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255

# Reshape and normalize `test_images`
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255
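The same transformation can be checked on a small synthetic array without loading MNIST. This is only a sketch: `fake_images` is a hypothetical stand-in for a batch of five images, and `-1` in the reshape is used so the sample count does not need to be hard-coded.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for MNIST: 5 images of 28x28 uint8 pixels in [0, 255]
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Flatten each image to a 784-vector, then rescale to float32 values in [0, 1]
flat = fake_images.reshape((-1, 28 * 28)).astype('float32') / 255

print(flat.shape, flat.dtype, flat.min(), flat.max())
```

Writing `reshape((-1, 28 * 28))` instead of `reshape((60000, 28 * 28))` gives the same result on the full dataset and keeps working if the number of samples changes.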

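We also need to categorically encode the labels, turning each digit into a vector of 10 values with a 1 at the index of the digit and 0 everywhere else: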

#### Listing 2.5: Preparing the labels

In [8]:
from tensorflow.keras.utils import to_categorical

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
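To see what categorical (one-hot) encoding produces, here is a minimal NumPy sketch. The `one_hot` helper is hypothetical and is only assumed to behave like `to_categorical` for integer labels in the range 0 through 9; the example labels are the first three training labels shown earlier.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    # One row per label, one column per class, all zeros to start
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype='float32')
    # Set a single 1 in each row, at the column given by the label
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

labels = np.array([5, 0, 4])
encoded = one_hot(labels)
print(encoded)

# Taking the argmax of each row recovers the original integer labels
print(np.argmax(encoded, axis=1))
```

This encoding is what the `categorical_crossentropy` loss expects: a target vector with the same shape as the 10-way softmax output.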

We're now ready to train the network, which in Keras is done via a call to the network's `fit` method:

In [9]:
network.fit(train_images, train_labels, epochs=5, batch_size=128)

Train on 60000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x7f63bab60e10>
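The `epochs` and `batch_size` arguments determine how many weight updates the network performs. The arithmetic below is a sketch that assumes the usual mini-batch scheme, in which each epoch passes over all samples once and the final, smaller batch is kept rather than dropped.

```python
import math

num_samples = 60000
batch_size = 128
epochs = 5

# Number of mini-batches (one weight update each) per pass over the data
steps_per_epoch = math.ceil(num_samples / batch_size)

# The last batch of each epoch holds whatever samples are left over
last_batch_size = num_samples - (steps_per_epoch - 1) * batch_size

total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch_size, total_updates)
```

Under these assumptions, each epoch consists of 469 updates (468 full batches of 128 images and one batch of 96), for 2,345 updates over the 5 epochs.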

Two quantities are displayed during training: the loss of the network over the training data, and the accuracy of the network over the training data. 
We quickly reach an accuracy of 0.989 (98.9%) on the training data. Now let's check that the model performs well on the test set, too:

In [10]:
test_loss, test_acc = network.evaluate(test_images, test_labels, verbose=0)
print(f"test_acc: {test_acc}")

test_acc: 0.980400025844574


The test-set accuracy turns out to be about 98.0%, somewhat lower than the training-set accuracy of 98.9%. This gap between training accuracy and test accuracy is an example of *overfitting*: the fact that machine learning models tend to perform worse on new data than on their training data.