##### Loading the MNIST dataset in Keras

In [1]:
from keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

The images are encoded as NumPy arrays, and the labels are an array of digits, ranging from 0 to 9. The images and labels have a one-to-one correspondence.

A look at the training data:

In [2]:
train_images.shape

(60000, 28, 28)

In [3]:
len(train_labels)

60000

In [4]:
train_labels

array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)

And here is the test data:

In [5]:
test_images.shape

(10000, 28, 28)

In [6]:
len(test_labels)

10000

In [7]:
test_labels

array([7, 2, 1, ..., 4, 5, 6], dtype=uint8)

The workflow is as follows: first, feed the neural network the training data ('train_images' and 'train_labels') so that it learns to associate images with labels; then ask the network to produce predictions for 'test_images'; finally, verify whether these predictions match the labels in 'test_labels'.

##### The network architecture

In [8]:
from keras import models
from keras import layers

network = models.Sequential()
network.add(layers.Dense(512, activation = 'relu', input_shape = (28 * 28, )))
network.add(layers.Dense(10, activation = 'softmax'))

In [9]:
network.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 512)               401920    
                                                                 
 dense_1 (Dense)             (None, 10)                5130      
                                                                 
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________
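The parameter counts in the summary can be checked by hand. As a small sketch, assuming each 'Dense' layer holds one weight per (input, unit) pair plus one bias per unit, the arithmetic looks like this ('dense_params' is a helper name introduced here for illustration, not a Keras function):

```python
# Hypothetical helper: weights (inputs x units) plus one bias per unit.
def dense_params(n_inputs, n_units):
    return n_inputs * n_units + n_units

hidden = dense_params(28 * 28, 512)  # first Dense layer: 784 inputs, 512 units
output = dense_params(512, 10)       # softmax layer: 512 inputs, 10 units
total = hidden + output

print(hidden, output, total)  # 401920 5130 407050
```

These values match the 'Param #' column and the total reported by 'network.summary()'.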


The core building block of neural networks is the 'layer'. Specifically, layers extract representations out of the data, representations that are more meaningful for the problem at hand. Most of deep learning consists of chaining together simple layers that implement a form of progressive data distillation. A deep-learning model is like a sieve for data processing, made of a succession of increasingly refined data filters: the layers.

Here the network consists of a sequence of two 'Dense' layers, which are densely connected (also called fully connected) neural layers. The second (last) layer is a 10-way 'softmax' layer, which means it will return an array of 10 probability scores (summing to 1). Each score will be the probability that the current digit image belongs to one of our 10 digit classes.
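To make the 'softmax' output concrete, here is a minimal NumPy sketch of the function itself. It is an illustration, not the Keras implementation, and the raw scores in 'logits' are made-up values for a single image.

```python
import numpy as np

def softmax(logits):
    # Subtracting the max improves numerical stability and does not change the result.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

# Hypothetical raw scores for one image, one per digit class 0-9.
logits = np.array([1.0, 0.5, -1.2, 3.0, 0.0, 0.1, -0.3, 2.2, 0.4, -2.0])
probs = softmax(logits)

# The predicted digit is the index of the largest probability.
predicted_digit = int(np.argmax(probs))
print(probs.shape, predicted_digit)  # (10,) 3
```

The 10 scores are non-negative and sum to 1, so they can be read as class probabilities.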

##### The compilation step

To make the network ready for training, we need to pick three more things as part of the 'compilation' step:

In [10]:
network.compile(optimizer = 'rmsprop', 
               loss = 'categorical_crossentropy', 
               metrics = ['accuracy'])  # a list; ('accuracy') is just a string, not a tuple

 - A 'loss function' -- how the network will be able to measure its performance on the training data, and thus how it will be able to steer itself in the right direction 

 - An 'optimizer' -- the mechanism through which the network will update itself based on the data it sees and its loss function

 - 'Metrics to monitor during training and testing' -- here we only care about accuracy (the fraction of the images that were correctly classified)
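As a rough illustration of what the loss and the metric measure, the sketch below computes categorical crossentropy and accuracy with NumPy on a made-up batch of 3 samples and 3 classes. The function names and the 'eps' clipping value are assumptions for this example. Keras' own implementations may differ in detail.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0), then average -sum(true * log(predicted)) over samples.
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

def accuracy(y_true, y_pred):
    # Fraction of samples whose most probable class matches the true class.
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1)))

# Toy one-hot labels and predicted probabilities (made-up values).
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.5, 0.3, 0.2]])

loss = categorical_crossentropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(loss, acc)
```

The third sample is misclassified, so the accuracy is 2/3. The loss also penalizes how little probability that sample assigned to its true class.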

##### Preparing the image data

Before training, we preprocess the data by reshaping it into the shape the network expects and scaling it so that all values are in the [0, 1] interval.

In [11]:
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255

In [12]:
train_images.shape

(60000, 784)

In [13]:
test_images.shape

(10000, 784)

Originally, the training images were stored in an array of shape (60000, 28, 28) of type 'uint8' with values in the [0, 255] interval. We transform it into a 'float32' array of shape (60000, 28 * 28) with values between 0 and 1.
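The same transformation can be tried on a small synthetic array to see what happens to shape, type, and value range. The sketch below uses 4 random 28 x 28 images as a stand-in for MNIST. The seed and array names are arbitrary choices for this example.

```python
import numpy as np

# Stand-in for a tiny batch of images: 4 images of 28x28 uint8 pixels (synthetic, not MNIST).
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

flat = images.reshape((4, 28 * 28))    # each image becomes a 784-element vector
scaled = flat.astype('float32') / 255  # values now lie in [0, 1]

print(scaled.shape, scaled.dtype)  # (4, 784) float32
```

Reshaping only changes how the pixels are indexed: pixel (row, col) of an image ends up at position row * 28 + col of its flattened vector.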

##### Preparing the labels

We also need to categorically encode the labels. Note: with the Keras version used here, 'from keras.utils import' is replaced by 'from tensorflow.keras.utils import'.

In [14]:
from tensorflow.keras.utils import to_categorical

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

In [15]:
train_labels

array([[0., 0., 0., ..., 0., 0., 0.],
       [1., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 1., 0.]], dtype=float32)

In [16]:
test_labels

array([[0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 1., ..., 0., 0., 0.],
       [0., 1., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)
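For intuition, the one-hot encoding shown above can be reproduced with plain NumPy. This is a sketch of the idea rather than the code behind 'to_categorical', and 'one_hot' is a helper name introduced here.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    # Start from all zeros and set a single 1 per row at the label's index.
    encoded = np.zeros((len(labels), num_classes), dtype='float32')
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

labels = np.array([5, 0, 4], dtype=np.uint8)  # first three training labels shown earlier
encoded = one_hot(labels)
print(encoded[0])  # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```

Taking 'np.argmax' along each row recovers the original digit labels.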

##### Train the model 

Keras trains the network via a call to the network's 'fit' method:

In [17]:
network.fit(train_images, 
            train_labels, 
            epochs = 5, 
            batch_size = 128)

Epoch 1/5
 54/469 [==>...........................] - ETA: 1s - loss: 0.6398 - accuracy: 0.8196

Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f55ce5f3a30>
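The '469' in the progress bar is the number of mini-batches per epoch. A quick check of that arithmetic, assuming the last, smaller batch is kept rather than dropped:

```python
import math

n_samples = 60000
batch_size = 128
epochs = 5

# 60000 / 128 is not a whole number, so the final batch of each epoch is smaller.
steps_per_epoch = math.ceil(n_samples / batch_size)
last_batch_size = n_samples - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch_size, total_updates)  # 469 96 2345
```

Over 5 epochs the network therefore performs 2345 weight updates in total.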

Two quantities are displayed during training: the loss and the accuracy of the network over the training data.

The network reaches an accuracy of 0.9887 (98.87%) on the training data.

Now check the model performance on the test set:

In [18]:
test_loss, test_acc = network.evaluate(test_images, test_labels)
print('test_acc:', test_acc)

test_acc: 0.9789000153541565


The test-set accuracy is about 97.9%, somewhat lower than the 98.87% training-set accuracy. This gap between training accuracy and test accuracy is an example of 'overfitting': the tendency of machine-learning models to perform worse on new data than on their training data.
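To put the gap in concrete terms, the reported numbers can be turned into an accuracy difference and an approximate count of misclassified test images. The values below are copied from the outputs above, and the variable names are chosen for this sketch.

```python
train_acc = 0.9887            # training accuracy reported above
test_acc = 0.9789000153541565  # value printed by network.evaluate
n_test = 10000                 # number of test images

gap = train_acc - test_acc
misclassified = round((1 - test_acc) * n_test)

print(f"gap: {gap:.4f}, misclassified test images: {misclassified}")
# gap: 0.0098, misclassified test images: 211
```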