# 2. The Mathematical Building Blocks of Neural Networks

Let’s look at a concrete example of a neural network that uses the Python library Keras to learn to classify handwritten digits.

In [4]:
# 2.1 Loading the MNIST dataset in Keras
from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

train_images and train_labels form the training set, the data that the model will learn from. The model will then be tested on the test set, test_images and test_labels.

In [5]:
# Let's look at the training data:
print("Train images shape =", train_images.shape)
print("Length of train labels =", len(train_labels))
print("The train labels:", train_labels)

# Here's the test data:
print("The shape of the test images =", test_images.shape)
print("The length of the test labels =", len(test_labels))
print("The test labels: ", test_labels)

Train images shape = (60000, 28, 28)
Length of train labels = 60000
The train labels: [5 0 4 ... 5 6 8]
The shape of the test images = (10000, 28, 28)
The length of the test labels = 10000
The test labels:  [7 2 1 ... 4 5 6]


The workflow will be as follows. First, we’ll feed the neural network the training data, train_images and train_labels. The network will then learn to associate images and labels. Finally, we’ll ask the network to produce predictions for test_images, and we’ll verify whether these predictions match the labels from test_labels.

In [6]:
# 2.2 The network architecture
from tensorflow import keras
from tensorflow.keras import layers
model = keras.Sequential([
    layers.Dense(512, activation="relu"),
    layers.Dense(10, activation="softmax")
])
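
As a rough sketch of what these two Dense layers compute, the snippet below runs a forward pass by hand in NumPy. It assumes 28*28 = 784 flattened inputs, which is how the images are prepared later in this section. The weights are random placeholders rather than trained values, and the helper names relu, softmax, and forward are hypothetical, not part of the Keras API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder parameters with the shapes the model above would create
# for 784-pixel inputs (random values, NOT trained weights).
W1 = rng.normal(scale=0.05, size=(784, 512))
b1 = np.zeros(512)
W2 = rng.normal(scale=0.05, size=(512, 10))
b2 = np.zeros(10)

def relu(x):
    # Element-wise max(x, 0).
    return np.maximum(x, 0.0)

def softmax(x):
    # Subtract the row-wise max for numerical stability, then normalize.
    shifted = x - x.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def forward(batch):
    hidden = relu(batch @ W1 + b1)      # Dense(512, activation="relu")
    return softmax(hidden @ W2 + b2)    # Dense(10, activation="softmax")

fake_batch = rng.random((3, 784))       # three made-up flattened "images"
probs = forward(fake_batch)
print(probs.shape)        # one row of 10 class scores per input
print(probs.sum(axis=1))  # each row sums to approximately 1
```

The softmax output is what lets the final layer be read as one probability per digit class.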

The core building block of neural networks is the layer. You can think of a layer as a filter for data: some data goes in, and it comes out in a more useful form. Specifically, layers extract representations out of the data fed into them; hopefully, representations that are more meaningful for the problem at hand.

To make the model ready for training, we need to pick three more things, as part of the compilation step:

• An optimizer: the mechanism through which the model will update itself based on the training data it sees, so as to improve its performance.

• A loss function: how the model will be able to measure its performance on the training data, and thus how it will be able to steer itself in the right direction.

• Metrics to monitor during training and testing: here, we’ll only care about accuracy (the fraction of the images that were correctly classified).

In [8]:
# 2.3 The compilation step
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
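
To make the loss name less opaque, here is a minimal sketch of the quantity that sparse categorical crossentropy measures: the average of -log(probability assigned to the true class). "Sparse" refers to the labels being integer class indices rather than one-hot vectors. The function below is an illustrative reimplementation with made-up numbers, not the Keras code, and it omits the numerical safeguards a real implementation needs.

```python
import math

def sparse_categorical_crossentropy(probabilities, labels):
    # Mean of -log(p[true class]) over the batch.
    losses = [-math.log(p[y]) for p, y in zip(probabilities, labels)]
    return sum(losses) / len(losses)

# Two made-up 3-class predictions (each row sums to 1).
probs = [
    [0.7, 0.2, 0.1],   # true class 0: the model favours the right class
    [0.1, 0.3, 0.6],   # true class 1: the model favours the wrong class
]
labels = [0, 1]

loss = sparse_categorical_crossentropy(probs, labels)
print(f"loss: {loss:.4f}")
```

The second sample contributes more to the loss than the first, because the model gave its true class a lower probability.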

Before training, we’ll preprocess the data by reshaping it into the shape the model expects and scaling it so that all values are in the [0, 1] interval.

In [9]:
train_images = train_images.reshape((60000, 28*28))
train_images = train_images.astype("float32") / 255
test_images = test_images.reshape((10000, 28*28))
test_images = test_images.astype("float32") / 255
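
The effect of these two steps can be checked on a small stand-in array. The sketch below uses two made-up uint8 "images" instead of MNIST, and passes -1 to reshape so that NumPy infers the number of samples (a variation on the hard-coded 60000 and 10000 above, not what the original cell does).

```python
import numpy as np

# Two made-up 28x28 "images" with uint8 pixel values in [0, 255].
fake_images = np.zeros((2, 28, 28), dtype="uint8")
fake_images[0, 0, 0] = 255     # one fully bright pixel
fake_images[1, 27, 27] = 128   # one mid-grey pixel

# Flatten each image to a 784-vector, then scale to the [0, 1] interval.
flat = fake_images.reshape((-1, 28 * 28)).astype("float32") / 255

print(flat.shape, flat.dtype)
print(flat.min(), flat.max())
```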

We’re now ready to train the model, which in Keras is done via a call to the model’s fit() method—we fit the model to its training data:

In [15]:
model.fit(train_images, train_labels, epochs=10, batch_size=128)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x166eb80a0>
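
The batch_size argument determines how many weight updates happen in each epoch. The arithmetic below is a sketch that assumes the final, smaller batch is still used rather than dropped, which is the usual behaviour when fitting on in-memory arrays.

```python
import math

num_samples = 60000   # training images, per the shape printed earlier
batch_size = 128      # as passed to fit()

# Number of gradient updates in one pass over the training set.
steps_per_epoch = math.ceil(num_samples / batch_size)

# The final batch holds whatever is left over.
last_batch_size = num_samples - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch, last_batch_size)  # 469 96
```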

Now that we have a trained model, we can use it to predict class probabilities for new digits, that is, images that weren’t part of the training data, like those from the test set:

In [11]:
# 2.6 Using the model to make predictions
test_digits = test_images[0:10]
predictions = model.predict(test_digits)
predictions[0] # the highest score is at index 7


array([9.3156616e-10, 4.6683359e-11, 5.6938075e-06, 7.4108299e-05,
       1.6151157e-12, 1.1291430e-08, 2.2101914e-14, 9.9991918e-01,
       1.6538783e-08, 9.2814599e-07], dtype=float32)

In [12]:
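# Checking to see if the first digit is a 7:
predictions[0].argmax()  # index of the highest probability; not displayed, since only the last expression is shown
predictions[0][7]        # the probability assigned to class 7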

0.9999192

We can check that the test label agrees:

In [13]:
test_labels[0]

7
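
The same comparison can be reproduced without the model, using the probability vector printed earlier. This is a plain-Python sketch; the values are copied from the predictions[0] output, and the variable names are illustrative.

```python
# Probabilities copied from the predictions[0] output above.
probs = [9.3156616e-10, 4.6683359e-11, 5.6938075e-06, 7.4108299e-05,
         1.6151157e-12, 1.1291430e-08, 2.2101914e-14, 9.9991918e-01,
         1.6538783e-08, 9.2814599e-07]

# argmax: index of the largest probability, i.e. the predicted digit.
predicted_digit = max(range(len(probs)), key=lambda i: probs[i])

true_label = 7  # test_labels[0], as printed above
print(predicted_digit, predicted_digit == true_label)

# Softmax outputs should sum to approximately 1.
print(sum(probs))
```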

On average, how good is our model at classifying such never-seen-before digits? Let’s check by computing average accuracy over the entire test set.

In [17]:
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"test_acc: {test_acc}")

test_acc: 0.9814000725746155
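
Accuracy here is simply the fraction of test images whose predicted class matches the label. The sketch below shows that computation on made-up predictions (the accuracy helper is hypothetical, not a Keras function), and then reads the reported figure back as a count of test images.

```python
def accuracy(predicted, actual):
    # Fraction of positions where the predicted class equals the true label.
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Made-up predictions and labels, for illustration only.
predicted = [7, 2, 1, 0, 4, 1, 4, 9, 6, 9]
actual = [7, 2, 1, 0, 4, 1, 4, 9, 5, 9]
print(accuracy(predicted, actual))  # 9 of 10 match

# Reading the reported test accuracy as a count of images:
reported_acc = 0.9814000725746155
num_test = 10000
approx_correct = round(reported_acc * num_test)
print(approx_correct, num_test - approx_correct)
```

By that reading, roughly 186 of the 10,000 test digits were misclassified.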
