Classifying handwritten digits

In [4]:

from keras.datasets import mnist

In [5]:
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

Let’s look at the training data:

In [6]:
train_images.shape

(60000, 28, 28)

In [7]:
len(train_labels)

60000

In [8]:
train_labels

array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)

And here’s the test data:

In [9]:
test_images.shape

(10000, 28, 28)

In [10]:
len(test_labels)

10000

The network architecture

In [11]:
from keras import models
from keras import layers

network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax'))
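
As a rough sanity check on the size of this model, a Dense layer holds one weight per (input, unit) pair plus one bias per unit. The sketch below works that out for the two layers defined above; the helper name dense_params is made up for illustration and is not part of Keras.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden = dense_params(28 * 28, 512)  # 784 inputs -> 512 hidden units
output = dense_params(512, 10)       # 512 inputs -> 10 digit classes
total = hidden + output
print(hidden, output, total)
```

These figures should agree with the parameter counts reported by network.summary().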


The compilation step

In [12]:
network.compile(optimizer='rmsprop',
   loss='categorical_crossentropy',
   metrics=['accuracy'])
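
To make the loss and metric named in compile more concrete, here is a minimal NumPy sketch of what categorical crossentropy and accuracy measure for a batch of predictions. It is an illustration under simplifying assumptions (three classes, hand-written probabilities, hypothetical function names), not Keras's actual implementation.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over the batch of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

def accuracy(y_true, y_pred):
    """Fraction of samples whose most probable class is the true class."""
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1)))

# Two samples, three classes; each label row has a single 1 at the true class.
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
y_pred = np.array([[0.7, 0.2, 0.1],   # correct: class 0 is ranked first
                   [0.5, 0.3, 0.2]])  # wrong: true class 1 is ranked second

loss = categorical_crossentropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(loss, acc)
```

The loss falls as the probability assigned to the true class rises, whereas accuracy only checks whether the true class is ranked first.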

Before training, we’ll preprocess the data by reshaping it into the shape the network
expects and scaling it so that all values are in the [0, 1] interval. Previously, our
training images, for instance, were stored in an array of shape (60000, 28, 28) of type
uint8 with values in the [0, 255] interval. We transform it into a float32 array of
shape (60000, 28 * 28) with values between 0 and 1.

 Preparing the image data

In [13]:
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255
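
The effect of this reshape-and-scale step can be checked on a small stand-in array. The sketch below uses randomly generated uint8 data in place of MNIST (an assumption made so the example runs without downloading anything) and passes -1 to reshape so the sample count is inferred rather than hard-coded.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for MNIST: 4 fake 28x28 grayscale images with uint8 pixels.
images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Flatten each image to a 784-vector, then rescale to [0, 1] as float32.
flat = images.reshape((-1, 28 * 28)).astype('float32') / 255

print(flat.shape, flat.dtype, flat.min(), flat.max())
```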

We also need to categorically encode the labels, turning each integer label into a one-hot vector of length 10:

 Preparing the labels

In [14]:
from keras.utils import to_categorical
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
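
Categorical (one-hot) encoding turns each integer label into a vector of zeros with a single 1 at the label's index. Below is a minimal NumPy sketch of the idea, using the first three training labels shown earlier (5, 0, 4). The function name one_hot is hypothetical, and this is not necessarily how to_categorical is implemented internally.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a (len(labels), num_classes) float32 array of one-hot rows."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.shape[0], num_classes), dtype='float32')
    # Set column labels[i] of row i to 1 for every sample at once.
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

encoded = one_hot([5, 0, 4])
print(encoded)
```

Taking the argmax of each row recovers the original integer label.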

We’re now ready to train the network, which in Keras is done via a call to the
network’s fit method—we fit the model to its training data:

In [15]:
network.fit(train_images, train_labels, epochs=5, batch_size=128)

Epoch 1/5
469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.8756 - loss: 0.4422
Epoch 2/5
469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.9671 - loss: 0.1151
Epoch 3/5
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.9789 - loss: 0.0720
Epoch 4/5
469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.9855 - loss: 0.0491
Epoch 5/5
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.9891 - loss: 0.0375


<keras.src.callbacks.history.History at 0x2d2445e8890>

Two quantities are displayed during training: the loss of the network over the training
 data, and the accuracy of the network over the training data.
 We quickly reach an accuracy of 0.989 (98.9%) on the training data. Now let’s
 check that the model performs well on the test set, too:

In [16]:
test_loss, test_acc = network.evaluate(test_images, test_labels)
print('test_acc:', test_acc)

313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9778 - loss: 0.0742
test_acc: 0.9817000031471252
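
The step counts in the progress output (469 during training, 313 during evaluation) follow from the dataset size and the batch size. Here is a small arithmetic sketch, assuming evaluate ran with a batch size of 32 because none was passed in the call above (32 is the documented Keras default). The helper name steps_per_pass is hypothetical.

```python
import math

def steps_per_pass(n_samples, batch_size):
    """Number of batches needed to cover the data once; the last may be partial."""
    return math.ceil(n_samples / batch_size)

train_steps = steps_per_pass(60000, 128)  # fit(..., batch_size=128)
test_steps = steps_per_pass(10000, 32)    # evaluate(...) with assumed batch size 32
print(train_steps, test_steps)
```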


The test-set accuracy turns out to be about 98.2%, somewhat lower than the 98.9%
reached on the training set. This gap between training accuracy and test accuracy is
an example of overfitting: the fact that machine-learning models tend to perform worse
on new data than on their training data.