In [40]:
import keras
keras.__version__

'2.10.0'

# The first example of a neural network

Don't worry if you don't understand every element of this example. That is normal if you have no experience with Keras or a similar package. You may not even have installed Keras yet, and that is not a problem. In the next chapter I will describe each element of this example in detail, so don't be discouraged if some steps look like black magic. We have to start somewhere.

In this example, we try to classify grayscale images of handwritten digits (each image has a resolution of 28x28 pixels) into 10 categories (the digits 0 to 9). We will use the MNIST dataset, which the machine learning community regards as a classic. It has been around almost as long as the field itself. The dataset contains 60,000 training images and 10,000 test images. It was established by the National Institute of Standards and Technology (NIST) in the 1980s. Solving this problem is comparable to printing "Hello world!" when learning a new programming language. The dataset is also commonly used to check that an algorithm works properly. If you start working professionally with machine learning, you will find that MNIST appears repeatedly in scientific papers, articles published on the Internet, and so on. Figure 2.1 shows some samples from the dataset.





The MNIST dataset is included with Keras in the form of four NumPy arrays:

In [41]:
from keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

The arrays train_images and train_labels form the training dataset, which is used to train the model. For testing, we will use the test dataset, which consists of the arrays test_images and test_labels. The images are encoded as NumPy arrays, and the labels are an array of digits (0 through 9). Each image has exactly one label.

Let's take a look at the training dataset:

In [42]:
train_images.shape

(60000, 28, 28)

In [43]:
len(train_labels)

60000

In [44]:
train_labels

array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)

In [45]:
import numpy as np

In [46]:
np.unique(train_labels)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint8)

Now let's look at the test data:

In [47]:
test_images.shape

(10000, 28, 28)

In [48]:
len(test_labels)

10000

In [49]:
test_labels

array([7, 2, 1, ..., 4, 5, 6], dtype=uint8)

We will work according to the following workflow: first we will train the neural network on the training data: train_images and train_labels. The network will learn to associate images and labels. Then our network will generate predictions about the test_images set, and we will compare the results with the test_labels labels.

Let's build our network. As a reminder, you do not yet need to understand everything that is happening in this example.

In [50]:
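from keras import models
from keras import layers

network = models.Sequential()
# First hidden layer: takes the flattened 28 * 28 image as input
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
# Dropout randomly zeroes 40% of the activations during training
network.add(layers.Dropout(0.4))
# input_shape is only needed on the first layer; later layers infer it
network.add(layers.Dense(256, activation='tanh'))
# Output layer: one probability per digit class
network.add(layers.Dense(10, activation='softmax'))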
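
To get a feel for the size of this model, the number of trainable parameters can be worked out by hand. The following is a minimal sketch in plain Python; the helper `dense_params` is a hypothetical name introduced only for this illustration, and the layer sizes are copied from the model definition above.

```python
# A Dense layer has (inputs * units) weights plus one bias per unit.
# Dropout has no trainable parameters, so it is not listed here.
def dense_params(n_inputs, n_units):
    return n_inputs * n_units + n_units

# (inputs, units) for the three Dense layers of the model above
layer_sizes = [(28 * 28, 512), (512, 256), (256, 10)]

params_per_layer = [dense_params(i, u) for i, u in layer_sizes]
total_params = sum(params_per_layer)

print(params_per_layer)  # [401920, 131328, 2570]
print(total_params)      # 535818
```

Most of the parameters sit in the first layer, because it connects all 784 input pixels to 512 units.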

The main building block of a neural network is the layer. It is a data processing module that can be thought of as a data filter: the data coming out of it has a more useful form than the data going in. More precisely, layers extract representations from the data fed into them, and these representations should help solve the problem at hand. Most of deep learning consists of chaining simple layers together to implement a form of progressive data distillation. A deep learning model is like a sieve for data processing, made of a succession of increasingly fine filters: the layers.
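The idea of a layer as a filter can be made concrete with a few lines of NumPy. This is only a sketch of what a densely connected layer with a ReLU activation computes, not the Keras implementation; the random weights, the batch of 4 fake images, and the names `W`, `b`, and `relu` are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # ReLU keeps positive values and replaces negative ones with 0
    return np.maximum(x, 0.0)

# Hypothetical input: 4 flattened 28 * 28 images with values in [0, 1)
batch = rng.random((4, 28 * 28), dtype=np.float32)

# Hypothetical, randomly initialized parameters of a 512-unit layer
W = rng.normal(0.0, 0.05, size=(28 * 28, 512)).astype(np.float32)
b = np.zeros(512, dtype=np.float32)

# The layer transforms each 784-value image into a 512-value representation
hidden = relu(batch @ W + b)

print(hidden.shape)  # (4, 512)
```

Training adjusts `W` and `b` so that the representation produced by the layer becomes useful for the task.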

Our network consists of a sequence of three Dense layers, which are densely connected (also called fully connected) layers, with a Dropout layer after the first one. The last layer is a ten-element softmax layer: it returns an array of 10 probability values that sum to 1. Each value is the probability that the given image shows the corresponding digit.
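The behavior of the softmax output can be sketched in NumPy as follows. This is an illustrative implementation rather than the one Keras uses internally, and the ten input scores in `logits` are made-up values.

```python
import numpy as np

def softmax(logits):
    # Subtracting the row maximum avoids overflow and does not change the result
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Hypothetical raw scores for one image, one score per digit 0..9
logits = np.array([[2.0, 1.0, 0.1, 0.0, -1.0, 3.0, 0.5, 0.2, -0.3, 1.5]])

probs = softmax(logits)
predicted_digit = int(probs.argmax(axis=-1)[0])

print(probs.sum(axis=-1))  # the ten probabilities sum to 1 (up to rounding)
print(predicted_digit)     # 5, the index of the largest score
```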

During the compilation phase, we need to define three more things in order to prepare the network for training. These are:

* Loss function - defines how the network measures its performance on the training data, and thus how it can steer its parameters in the right direction.
* Optimizer - the mechanism that updates the network based on the value returned by the loss function.
* Metrics monitored during training and testing - here we are only interested in accuracy (the fraction of images that were classified correctly).

In the next two chapters, I will explain the exact purpose of the loss function and the optimizer.
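As a rough preview, the categorical crossentropy loss can be sketched in NumPy. This is a simplified illustration and not the Keras implementation; the function name, the clipping constant `eps`, and the three-class toy data are assumptions chosen to keep the example short.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0), then average the per-sample losses
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# Hypothetical one-hot labels for two samples and three classes
y_true = np.array([[0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])

# Predictions that put high probability on the correct class
confident = np.array([[0.05, 0.90, 0.05],
                      [0.10, 0.10, 0.80]])

# Predictions that spread the probability almost evenly
unsure = np.array([[0.40, 0.30, 0.30],
                   [0.30, 0.40, 0.30]])

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)

print(loss_confident < loss_unsure)  # True
```

The loss is small when the network assigns a high probability to the correct class, which gives the optimizer a signal for which direction to move the parameters in.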

In [51]:
network.compile(optimizer='adam',
                loss='categorical_crossentropy',
                metrics=['accuracy'])


Before we start training, we reshape the data into the shape the network expects and scale it so that all values fall in the range [0, 1]. Initially, our training images were stored in an array of shape (60000, 28, 28) of type uint8, with values in the range [0, 255]. We transform it into a float32 array of shape (60000, 28 * 28) with values between 0 and 1.

In [52]:
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255
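
The effect of this preprocessing can be checked on a small stand-in array without loading MNIST. The sketch below uses two random fake images; `fake_images` and `flat` are names made up for this illustration, and the batch size is read from the array instead of being hard-coded.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for MNIST: 2 random "images" of 28x28 uint8 pixels
fake_images = rng.integers(0, 256, size=(2, 28, 28), dtype=np.uint8)

# Same two steps as above: flatten each image, then scale to [0, 1]
flat = fake_images.reshape((fake_images.shape[0], 28 * 28))
flat = flat.astype('float32') / 255

print(flat.shape, flat.dtype)                # (2, 784) float32
print(flat.min() >= 0.0, flat.max() <= 1.0)  # True True

# Rows are laid out one after another: index 28 is row 1, column 0
print(np.isclose(flat[0, 28], fake_images[0, 1, 0] / 255))  # True
```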

In [53]:
train_labels

array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)

We also need to encode the labels categorically (one-hot encoding):



In [54]:
from tensorflow.keras.utils import to_categorical

In [55]:
# from tensorflow.keras.utils import to_categorical
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

In [56]:
train_labels[0]

array([0., 0., 0., 0., 0., 1., 0., 0., 0., 0.], dtype=float32)
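
What this encoding does can be reproduced in a few lines of NumPy. This is a minimal sketch for illustration, not the code behind `to_categorical`; the helper `one_hot` and the three sample labels are assumptions.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    # One row per label, with a single 1 in the column given by the label
    encoded = np.zeros((len(labels), num_classes), dtype=np.float32)
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

# The first three training labels shown earlier
sample_labels = np.array([5, 0, 4], dtype=np.uint8)

encoded = one_hot(sample_labels)
print(encoded[0])  # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]

# argmax recovers the original integer labels
decoded = encoded.argmax(axis=1)
print(decoded)     # [5 0 4]
```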

In [58]:
network.fit(train_images, train_labels, epochs=10, batch_size=512)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x1e5816e17b0>

During training, two values are displayed: the loss of the network and the accuracy of the network (both computed on the training dataset).
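The settings passed to fit also determine how many times the weights are updated. The arithmetic below is a sketch that assumes one update per batch and that the final, smaller batch of each epoch is also processed.

```python
import math

num_samples = 60000  # training images
batch_size = 512     # value passed to fit
epochs = 10          # value passed to fit

# Number of batches (and therefore weight updates) in one epoch
steps_per_epoch = math.ceil(num_samples / batch_size)

# The last batch of an epoch holds the samples left over
last_batch_size = num_samples - (steps_per_epoch - 1) * batch_size

total_updates = steps_per_epoch * epochs

print(steps_per_epoch)  # 118
print(last_batch_size)  # 96
print(total_updates)    # 1180
```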

We quickly reach an accuracy of 0.989 (98.9%) on the training data. Now we can check how accurately the model handles the test dataset:

In [59]:
test_loss, test_acc = network.evaluate(test_images, test_labels)



In [60]:
print('test_acc:', test_acc)

test_acc: 0.9850999712944031



On the test dataset we obtained an accuracy of 98.5%, which is slightly lower than on the training set. This gap between training and test accuracy is an example of overfitting: machine learning models tend to perform worse on new data than on the data they were trained on. This issue is at the heart of Chapter 3.
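The accuracy metric itself is simple to compute from predicted probabilities. The sketch below uses a made-up batch of four samples and three classes; in the real workflow, an array like `probs` would come from `network.predict(test_images)`, and `one_hot_labels` would be `test_labels`.

```python
import numpy as np

# Hypothetical predicted probabilities for 4 samples and 3 classes
probs = np.array([
    [0.1, 0.8, 0.1],
    [0.7, 0.2, 0.1],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
])

# Hypothetical one-hot labels for the same 4 samples
one_hot_labels = np.array([
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
], dtype=np.float32)

# The predicted class is the one with the highest probability
predicted = probs.argmax(axis=1)
actual = one_hot_labels.argmax(axis=1)

# Accuracy is the fraction of samples where prediction and label agree
accuracy = float(np.mean(predicted == actual))
print(accuracy)  # 0.75
```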

So much for our first example. You just saw that building and training a neural network to classify handwritten digits can take fewer than 20 lines of Python code. In the next chapter, I will go through every line of this code in detail and explain what happens behind the scenes. You will learn about tensors (the data storage objects used by networks) and tensor operations, about the structure of network layers, and about the gradient descent algorithm that enables networks to learn from a training dataset.