In [None]:
import numpy as np
from keras.datasets import mnist

# Images are uint8 arrays of shape (samples, 28, 28);
# labels are uint8 arrays of shape (samples,).
train_images: np.ndarray
train_labels: np.ndarray
test_images: np.ndarray
test_labels: np.ndarray

training_set, test_set = mnist.load_data()
train_images, train_labels = training_set
test_images, test_labels = test_set

# The layers
The core building block of neural networks is the layer, a data-processing module that you can think of as a filter for data.
Some data goes in, and it comes out in a more useful form. Layers extract representations out of the data fed into them, ideally representations that are more meaningful for the problem at hand. Most of deep learning consists of chaining together simple layers that implement a form of progressive data distillation.
A deep-learning model is like a sieve for data processing, made of a succession of increasingly refined data filters.
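
To make the filter analogy concrete, here is a minimal NumPy sketch of what a densely connected layer computes: a matrix product plus a bias, followed by a nonlinearity. The `dense` and `relu` helpers, the random weights, and the fake input batch are illustrative assumptions, not Keras internals.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    """Element-wise max(z, 0)."""
    return np.maximum(z, 0.0)

def dense(x, w, b, activation):
    """Hypothetical helper: affine transform followed by an activation."""
    return activation(x @ w + b)

# A batch of 4 fake flattened inputs with 784 features each (illustrative data).
x = rng.random((4, 784), dtype=np.float32)

# Randomly initialized weights; a real layer would learn these during training.
w1, b1 = rng.normal(0.0, 0.01, (784, 512)), np.zeros(512)
w2, b2 = rng.normal(0.0, 0.01, (512, 10)), np.zeros(10)

hidden = dense(x, w1, b1, relu)              # first "filter": 784 -> 512 features
output = dense(hidden, w2, b2, lambda z: z)  # second "filter": 512 -> 10 features
print(hidden.shape, output.shape)
```

Each call transforms the representation produced by the previous one, which is the chaining of filters described above.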

In [None]:
from keras import models
from keras import layers
network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax'))

Our network consists of a sequence of two Dense layers, which are densely connected (also called fully connected) neural layers. The second (and last) layer is a 10-way softmax layer, which means it will return an array of 10 probability scores (summing to 1). Each score will be the probability that the current digit image belongs to one of our 10 digit classes.
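
As a rough illustration of what a 10-way softmax produces, the sketch below turns ten raw scores into probabilities and picks the most likely digit. The `softmax` helper and the `logits` values are made up for this example; they are not the output of the network above.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=-1, keepdims=True)

# Hypothetical raw scores for one image, one per digit class 0-9.
logits = np.array([1.0, 0.5, 0.1, 3.0, 0.2, 0.0, -1.0, 0.3, 2.0, 0.4])
probs = softmax(logits)

# The predicted digit is the class with the highest probability.
predicted_digit = int(np.argmax(probs))
print(probs.sum(), predicted_digit)
```

The scores are all positive and sum to 1 (up to floating-point rounding), so they can be read as class probabilities.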
To make the network ready for training, we need to pick three more things, as part of the compilation step:
+ A loss function—How the network will be able to measure its performance on the training data, and thus how it will be able to steer itself in the right direction.
+ An optimizer—The mechanism through which the network will update itself based on the data it sees and its loss function.
+ Metrics to monitor during training and testing—Here, we’ll only care about accuracy (the fraction of the images that were correctly classified).
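
The loss and the accuracy metric from the list above can be sketched in NumPy as follows. This is a simplified illustration on a three-class toy batch, assuming one-hot targets; the helper names and the sample values are invented for the example, and Keras's own implementations may differ in detail.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=1)))

def accuracy(y_true, y_pred):
    """Fraction of samples whose highest-scoring class matches the target."""
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1)))

# Toy batch: three samples, three classes (illustrative values only).
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.5, 0.3, 0.2]])

loss = categorical_crossentropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(loss, acc)
```

The loss falls as the probability assigned to the correct class rises, which gives the optimizer a smooth signal to follow. Accuracy only counts whether the top prediction is correct.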

In [None]:
network.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
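
The `categorical_crossentropy` loss expects each target as a one-hot vector, whereas `mnist.load_data()` returns integer labels. A minimal NumPy sketch of that encoding is shown below; the `one_hot` helper and the sample labels are assumptions for illustration. Keras provides `to_categorical` in `keras.utils` for the same purpose.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Hypothetical helper: turn integer class labels into one-hot rows."""
    encoded = np.zeros((len(labels), num_classes), dtype=np.float32)
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

# Made-up integer labels in the same uint8 format as the MNIST labels.
sample_labels = np.array([5, 0, 4, 1, 9], dtype=np.uint8)
encoded_labels = one_hot(sample_labels)
print(encoded_labels.shape)
```

Each row contains a single 1 at the index of the digit it encodes, matching the 10 outputs of the softmax layer.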

Before training, we’ll preprocess the data by reshaping it into the shape the network expects and scaling it so that all values are in the [0, 1] interval. Our training images were stored in an array of shape (60000, 28, 28) of type uint8 with values in the [0, 255] interval. We transform it into a float32 array of shape (60000, 28 * 28) with values between 0 and 1.
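
The reshaping and scaling described above might look like the following sketch. To keep it self-contained, it runs on a small random stand-in array rather than the real `train_images`; the `preprocess` helper and `fake_images` are assumptions introduced for this example.

```python
import numpy as np

# Stand-in for the real images: random uint8 values with the same (n, 28, 28) layout.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)

def preprocess(images):
    """Flatten each 28x28 image to 784 values and scale them to [0, 1]."""
    flat = images.reshape((images.shape[0], 28 * 28))
    return flat.astype("float32") / 255

prepared = preprocess(fake_images)
print(prepared.shape, prepared.dtype)
```

Applying the same function to `train_images` and `test_images` would yield float32 arrays of shape (60000, 784) and (10000, 784).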