# <font color='red'>MNIST (a simple NN)</font>

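Load the MNIST dataset, which ships with Keras as four NumPy arrays: training images and labels, and test images and labels.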

In [None]:
# Import from tensorflow.keras, consistent with the model imports used later
from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

In [None]:
train_images.shape

In [None]:
test_images.shape

In [None]:
train_images.dtype

In [None]:
len(train_labels)

In [None]:
test_images.dtype

In [None]:
train_labels

In [None]:
len(test_labels)

In [None]:
test_labels

The workflow will be as follows:
1.   We’ll feed the neural network the training data, `train_images` and `train_labels`. The network will then learn to associate images with labels.
2.   We’ll ask the network to produce predictions for `test_images`, and we’ll verify whether these predictions match the labels in `test_labels`.

In [None]:
# the network architecture
from tensorflow.keras import models
from tensorflow.keras import layers
my_network = models.Sequential([
  layers.Dense(512, activation='relu'),
  layers.Dense(10, activation='softmax')
])

A **layer** is a data-processing module that you can think of as a filter for data. Layers extract *representations* out of the data fed into them.


In this specific case, our NN consists of:

*   a sequence of two `Dense` layers, which are *densely connected* (also called *fully connected*) neural layers
*   the second (and last) layer is a 10-way `softmax` layer: it returns an array of 10 *probability scores* (summing to 1), where each score is the probability that the current digit image belongs to one of the 10 digit classes.
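
To make the softmax description concrete, here is a minimal plain-Python sketch of the function. It is not how Keras implements the layer internally, and the input scores are made up for illustration; the point is only that any list of raw scores becomes a list of positive numbers summing to 1.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    # Subtracting the maximum leaves the result unchanged but avoids overflow.
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for the 10 digit classes (not real network output).
example_scores = [1.0, 2.0, 0.5, 0.0, 3.0, 0.1, 0.2, 0.3, 0.4, 0.6]
probs = softmax(example_scores)

# The predicted digit is the index of the highest probability.
predicted_digit = probs.index(max(probs))
print(len(probs), sum(probs), predicted_digit)
```

The class with the largest raw score always receives the largest probability, so softmax changes the scale of the scores but not their ranking.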

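To make the network ready for training, we need to pick three more things as part of the compilation step:

*   A **loss function** (`loss` below): how the network measures its performance on the training data, and thus how it steers itself in the right direction.
*   An **optimizer** (`optimizer` below): the mechanism through which the network updates its weights based on the data it sees and the loss.
*   Some **metrics** (`metrics` below) to monitor during training and testing. Here we only track accuracy, the fraction of images that are correctly classified.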


In [None]:
# the compilation step
my_network.compile(optimizer='rmsprop',
                   loss='sparse_categorical_crossentropy',
                   metrics=['accuracy'])
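
As a rough illustration of what the `sparse_categorical_crossentropy` loss measures, the sketch below computes it for a single sample in plain Python. "Sparse" means the label is an integer class index rather than a one-hot vector. The function name mirrors the Keras string for readability, but this is a simplified stand-in rather than the Keras implementation, and the probability vectors are invented for the example.

```python
import math

def sparse_categorical_crossentropy(true_label, probs):
    """Loss for one sample: minus the log of the probability of the true class."""
    eps = 1e-7  # guard against log(0); the exact value is an illustrative choice
    return -math.log(max(probs[true_label], eps))

# Hypothetical prediction that is confident in the correct class (digit 7).
confident_right = [0.01] * 10
confident_right[7] = 0.91

# Hypothetical prediction that spreads probability evenly over all classes.
uniform = [0.1] * 10

loss_confident = sparse_categorical_crossentropy(7, confident_right)
loss_uniform = sparse_categorical_crossentropy(7, uniform)
print(loss_confident, loss_uniform)
```

The loss shrinks toward 0 as the probability assigned to the true class approaches 1, which is the quantity the optimizer tries to reduce during training.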

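Before training, we’ll preprocess the data by reshaping it into the shape the network expects and scaling it so that all values are in the [0, 1] interval. The training images are currently stored in an array of shape `(60000, 28, 28)` of type `uint8` with values in the [0, 255] interval; we transform it into a `float32` array of shape `(60000, 28 * 28)` with values between 0 and 1.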



In [None]:
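# Preparing the image data: flatten each 28x28 image into a vector of 784 values
# and rescale the pixel values from [0, 255] to [0, 1]
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255
# Apply the same transformation to the test set
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255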
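
The same preprocessing can be written without hard-coding the number of samples. The sketch below assumes NumPy is available; `prepare_images` is a hypothetical helper name, and the input is a small random array standing in for MNIST, not real data.

```python
import numpy as np

def prepare_images(images):
    """Flatten each image into a vector and scale pixel values to [0, 1]."""
    # -1 lets NumPy infer the flattened size (28 * 28 = 784 for MNIST).
    flat = images.reshape((len(images), -1))
    return flat.astype("float32") / 255

# Synthetic stand-in for MNIST: 4 random 28x28 uint8 images.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

prepared = prepare_images(fake_images)
print(prepared.shape, prepared.dtype)
```

With a helper like this, the training and test sets can go through the same function, which avoids the two sets being processed differently by accident.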

In [None]:
train_images.shape

In [None]:
train_images.dtype

In [None]:
test_images.shape

In [None]:
test_images.dtype

We’re now ready to train the network, via the `fit` method. 

In [None]:
# Fit the NN: 5 passes (epochs) over the training data, in mini-batches of 128 samples
my_network.fit(train_images, train_labels, epochs=5, batch_size=128)
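
To see how `epochs` and `batch_size` translate into weight updates, here is a short arithmetic sketch. It assumes the full 60,000-image training set is used and that the final, smaller batch of each epoch is kept rather than dropped.

```python
import math

num_samples = 60000  # size of the MNIST training set
batch_size = 128
epochs = 5

# Each epoch is one pass over the data, split into mini-batches.
steps_per_epoch = math.ceil(num_samples / batch_size)

# The last batch of an epoch holds whatever samples are left over.
last_batch_size = num_samples - (steps_per_epoch - 1) * batch_size

# One gradient update is performed per mini-batch.
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, last_batch_size, total_updates)
```

This is why the progress bar counts up to 469 in each epoch: it counts mini-batches, not individual images.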

Two quantities are displayed during training:

*   the loss of the network over the training data
*   the accuracy of the network over the training data.

We quickly reach a high accuracy on the training data, for example 0.98 or 0.99 (98% or 99%).
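
The accuracy figure is simply the fraction of samples whose highest-scoring class matches the true label. A minimal plain-Python sketch follows, using made-up 3-class probabilities instead of real network output; `accuracy` is a hypothetical helper, not a Keras function.

```python
def accuracy(prob_rows, labels):
    """Fraction of samples whose most probable class equals the true label."""
    correct = 0
    for probs, label in zip(prob_rows, labels):
        # Index of the largest probability is the predicted class.
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)

# Invented predictions for four samples over three classes.
toy_probs = [
    [0.7, 0.2, 0.1],  # predicts class 0
    [0.1, 0.8, 0.1],  # predicts class 1
    [0.3, 0.4, 0.3],  # predicts class 1
    [0.2, 0.2, 0.6],  # predicts class 2
]
toy_labels = [0, 1, 2, 2]

toy_accuracy = accuracy(toy_probs, toy_labels)
print(toy_accuracy)
```

Three of the four toy predictions match their labels, so the function returns 0.75.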

Now let’s check that the model performs well on the test set, too:

In [None]:
test_loss, test_acc = my_network.evaluate(test_images, test_labels)
print('test_acc:', test_acc)

The test set accuracy should turn out to be a bit lower than the training set accuracy. This gap between training accuracy and test accuracy is an example of *overfitting*: machine learning models tend to perform worse on new data than on the data they were trained on.