Python import statements load libraries of ready-to-use capabilities.

In [None]:
import random                    # so we can display random images from the dataset
import tensorflow as tf          # tensorflow is a deep learning framework. 'tf' is a commonly-used nickname
from tensorflow import keras     # tensorflow.keras offers higher-level control of common tensorflow tasks
import numpy as np               # numpy handles arrays and matrices
import matplotlib.pyplot as plt  # most common python graphing library

The tensorflow keras library includes some datasets for training. The Modified National Institute of Standards and Technology (MNIST) dataset contains 70000 images of handwritten digits, each a 28x28 pixel grayscale image, along with a label for each image specifying the right answer.

The dataset is pre-divided into 60000 images for training the neural network, and 10000 held back from the training set for independent testing.

In [None]:
(train_images, train_labels), (test_images, test_labels) = keras.datasets.mnist.load_data()
print(len(train_images), 'training images')
print(len(test_images),  'test images')

Just a little data prep here: we scale the pixel values from the range 0..255 to 0..1.0. With numpy this is simple, because dividing an array by a single number divides every value in it by that number.
class_names are string/text labels corresponding to the numerical labels.

In [None]:
train_images = train_images / 255.0  # rescale from 0..255 to 0..1.0
test_images  = test_images  / 255.0 
class_names = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
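
To see what the division does, here is a small sketch on a made-up 2x3 "image" instead of a real 28x28 one (tiny_image is a hypothetical stand-in, not part of the dataset). It assumes the pixels are stored as 8-bit integers, as MNIST pixels are.

```python
import numpy as np

# A tiny stand-in for one image: 2x3 instead of 28x28, same uint8 pixel type.
tiny_image = np.array([[0, 51, 102],
                       [153, 204, 255]], dtype=np.uint8)

# Dividing by a single number applies to every element (numpy broadcasting).
scaled = tiny_image / 255.0

print(scaled.dtype)                 # the result is floating point, no longer uint8
print(scaled.min(), scaled.max())   # smallest and largest scaled values
print(scaled)
```

The same expression works unchanged on the full 60000x28x28 training array, which is why the cell above needs only one line per dataset.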

Let's take a look at what some of these handwritten digits look like. This python code plots 25 of them in a 5x5 grid. In the middle of the loop you can switch between which=i and which=random... to control which 25 get displayed.

If you display enough random digits, can you find some that look ambiguous and might be difficult? Make a note of the index so we can see later how the network did.

In [None]:
# This cell shows 5x5 training images (first 25 or random 25)
plt.figure(figsize=(10,10)) # figure size is 10x10 inches
for i in range(25):        # i = 0...24
    plt.subplot(5,5, i+1)  # in a 5x5 grid, setup subplot 1...25
    plt.xticks([])         # don't use xticks, yticks, or grid
    plt.yticks([])
    plt.grid(False)
    which = i                                       # show the ith training image
    #which = random.randint(0,len(train_images)-1)  # show a random image
    plt.imshow(train_images[which], cmap=plt.cm.binary) # show image in plot
    caption = 'train[{}] is a {}'.format(which, train_labels[which])
    plt.xlabel(caption)        # caption with corresp. label
plt.show()  # after all 25 subplots are set up, show the plot

This is where we set up the structure of the neural network and run the training.

The first Flatten layer maps the individual pixel values from their 28x28 grid to an array of 784 values.
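
As a rough illustration of what flattening does, the sketch below uses a plain numpy reshape on a made-up batch of blank images. This is a stand-in for the idea, not the Keras layer itself; the variable names are hypothetical.

```python
import numpy as np

# Hypothetical stand-in batch: 3 blank "images", each 28x28.
batch = np.zeros((3, 28, 28))
batch[0, 1, 2] = 1.0   # mark one pixel: row 1, column 2 of the first image

# Flattening keeps the batch dimension and unrolls each image row by row.
flat = batch.reshape(len(batch), -1)

print(flat.shape)           # 3 images, 784 values each
print(np.argmax(flat[0]))   # position of the marked pixel: 1*28 + 2 = 30
```

No values change. The pixels are only rearranged from a grid into one long row.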

The next layer is Dense, which means there is an arc from each of the 784 first-layer nodes to every second-layer node. 'relu' is the most common 'activation function'. All it does is check whether the sum of scaled/biased inputs is positive. If it is positive, the node 'fires' by outputting that value. If it is negative, the node doesn't fire and outputs 0.
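
Here is a minimal numpy sketch of what one Dense layer with relu computes for a single image. The weights W and biases b are random made-up values (in the real network they are learned during training), and the input x is random noise rather than a real digit.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.random(784)                  # one flattened image, values in 0..1
W = rng.normal(0, 0.05, (784, 10))   # hypothetical weights: one per arc
b = np.zeros(10)                     # hypothetical biases: one per node

z = x @ W + b             # weighted sum plus bias at each of the 10 nodes
a = np.maximum(z, 0.0)    # relu: keep positive sums, replace negative ones with 0

print(z.round(2))   # raw sums, some may be negative
print(a.round(2))   # after relu, negatives have become 0
```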

The final layer (also dense) must have 10 nodes, because we are classifying the 10 different digits. 'softmax' takes the 10 numerical values that accumulate in the 10 nodes, and rescales them so they are positive and sum to 1. This way we can interpret them as probabilities.
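
A small sketch of softmax, written by hand in numpy to show the arithmetic. The input values in raw are made up for illustration, and this helper is a simplified stand-in for what the 'softmax' activation does.

```python
import numpy as np

def softmax(values):
    """Rescale a 1-D array so its entries are positive and sum to 1."""
    shifted = values - np.max(values)   # subtracting the max avoids overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum()

# Made-up values for the 10 output nodes.
raw = np.array([2.0, 1.0, 0.1, -1.0, 0.0, 3.0, 0.5, -2.0, 1.5, 0.2])
probs = softmax(raw)

print(probs.round(3))
print(probs.sum())        # 1.0, up to floating-point rounding
print(np.argmax(probs))   # index of the largest probability
```

Softmax preserves the ordering of the values, so the node with the largest raw value also gets the largest probability.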

Note that for the middle layer we can choose more or fewer nodes, but the outer layers have to fit the input and output. We could also add more intermediate layers.
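
To get a feel for how the middle-layer size affects the network, this sketch counts the coefficients (one weight per arc plus one bias per node) for a few sizes. The helper names dense_params and count_params are made up for this illustration, and the sizes 32 and 128 are arbitrary examples.

```python
def dense_params(n_inputs, n_nodes):
    """Weights (one per arc) plus one bias per node."""
    return n_inputs * n_nodes + n_nodes

def count_params(hidden_nodes, n_inputs=28 * 28, n_outputs=10):
    # Flatten has no coefficients; only the two Dense layers do.
    return dense_params(n_inputs, hidden_nodes) + dense_params(hidden_nodes, n_outputs)

for hidden in (10, 32, 128):
    print(hidden, 'middle nodes ->', count_params(hidden), 'coefficients')
```

After building a model, model.summary() reports the same kind of count, which is a handy way to check the arithmetic.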

In [None]:
# Model architecture
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(10, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

# 'compile' basically means get ready to run
model.compile(optimizer='adam',
             loss='sparse_categorical_crossentropy',
             metrics=['accuracy'])

# This is where the computation happens: evaluating the candidate network on the training images,
# comparing its outputs to the true labels, and back-propagating the errors to update the coefficients
model.fit(train_images, 
          train_labels, 
          epochs=1)     # More epochs can be added until it converges (stops improving)
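
For the curious, here is a simplified sketch of how the 'comparison to truth' can be scored for one image with a cross-entropy loss: the loss is the negative log of the probability the network gave to the correct digit. The helper name and the probability arrays are made up for illustration, and the library's own version also averages over many images and guards against taking the log of 0.

```python
import numpy as np

def cross_entropy_for_label(probs, label):
    """Loss for one example: -log of the probability given to the true class."""
    return -np.log(probs[label])

true_label = 3

# Made-up outputs: one confident and correct, one completely unsure.
confident_right = np.array([0.01, 0.01, 0.01, 0.91, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01])
unsure = np.full(10, 0.1)

loss_right = cross_entropy_for_label(confident_right, true_label)
loss_unsure = cross_entropy_for_label(unsure, true_label)

print(round(loss_right, 4))    # small loss: high probability on the right digit
print(round(loss_unsure, 4))   # larger loss: only 10% on the right digit
```

Training nudges the coefficients in whatever direction makes this loss smaller across the training images.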

Now that the model is trained (the network coefficients have been fit to the training data), we test it by evaluating on the test images it has never seen.

In [None]:
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
predictions = model.predict(test_images)
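
The accuracy figure can also be worked out by hand from a predictions array. The sketch below shows the idea on made-up data with 4 images and 3 classes (fake_predictions and fake_labels are hypothetical stand-ins for predictions and test_labels).

```python
import numpy as np

# Made-up stand-ins: 4 "images", 3 classes instead of 10, to keep it readable.
fake_predictions = np.array([[0.8, 0.1, 0.1],
                             [0.2, 0.7, 0.1],
                             [0.3, 0.3, 0.4],
                             [0.6, 0.3, 0.1]])
fake_labels = np.array([0, 1, 2, 1])

predicted = np.argmax(fake_predictions, axis=1)   # most probable class per row
accuracy = np.mean(predicted == fake_labels)      # fraction of rows that match

print(predicted)   # [0 1 2 0]
print(accuracy)    # 0.75 (3 of 4 correct)
```

Applying the same two lines to predictions and test_labels should give a number close to test_acc.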

Now that we have applied the trained model to predict classifications for all the test data, let's take a look at a few to see how they compare to truth (the provided labels).

Set i to a different number and run the following cells to investigate.

This first cell looks at the predictions array (the 10 numbers in the final layer of the network, after pushing the image through). The position (index) of the largest value is the predicted digit.

In [None]:
i=0    # i=0 is the first test image (python always counts starting from 0 not 1)

preds = predictions[i]
print('Predictions:', preds)
ansa = test_labels[i]
pred = np.argmax(preds)
print('Correct answer is',      ansa)
print('Largest prediction is:', pred)
if pred == ansa:
    print('Prediction is correct')
else:
    print('WRONG')

This cell shows the image for test data i. Does it look like the prediction? If the prediction is wrong, does it make sense why it could have chosen that wrong prediction?

In [None]:
plt.figure(figsize=(2,2)) # 2x2 inches
plt.grid(False)
plt.xticks([])
plt.yticks([])
plt.imshow(test_images[i], cmap=plt.cm.binary)
plt.show()

This cell plots the predictions for test image i (the preds array from the cell above) as a bar graph.

In [None]:
plt.figure(figsize=(2,2))
plt.grid(False)
plt.xticks(range(10)) # xticks at 0,1,...9, matching the digit labels
plt.yticks([])
barplot = plt.bar(range(10), preds)

Repeat the cells above, setting i to a different index (any of the test images 0...9999), looking at the results for various test data.
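
Rather than guessing which indices are interesting, one option is to list the ones the network got wrong. This is a sketch with a hypothetical helper, wrong_indices, demonstrated on made-up data with 3 images and 2 classes.

```python
import numpy as np

def wrong_indices(predictions, labels):
    """Return the indices where the most probable class differs from the label."""
    predicted = np.argmax(predictions, axis=1)
    return np.nonzero(predicted != labels)[0]

# Made-up stand-ins for predictions and test_labels.
demo_predictions = np.array([[0.9, 0.1],
                             [0.4, 0.6],
                             [0.7, 0.3]])
demo_labels = np.array([0, 0, 1])

print(wrong_indices(demo_predictions, demo_labels))   # [1 2]
```

In this notebook, wrong_indices(predictions, test_labels) would give the test indices worth trying as i in the cells above.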

Below is more complex code that graphs a large number of test results. In each bar graph the bar for the correct digit is blue, and a red bar highlights a wrong prediction.

In [None]:
# This function plots one image with a blue or red caption
def plot_image(i, predictions_array, true_label, img):
  plt.grid(False)
  plt.xticks([])
  plt.yticks([])

  plt.imshow(img, cmap=plt.cm.binary)

  predicted_label = np.argmax(predictions_array)
  if predicted_label == true_label:
    color = 'blue'
  else:
    color = 'red'

  plt.xlabel("test {} {:2.0f}% ({})".format(i,
                                100*np.max(predictions_array),
                                class_names[true_label]),
                                color=color)

In [None]:
# This function plots the corresponding bar graph, red if it's wrong
def plot_value_array(i, predictions_array, true_label):
  true_label = true_label[i]  # true_label arrives as the whole label array; pick out entry i
  plt.grid(False)
  plt.xticks(range(10))
  plt.yticks([])
  thisplot = plt.bar(range(10), predictions_array, color="#777777")
  plt.ylim([0, 1])
  predicted_label = np.argmax(predictions_array)

  thisplot[predicted_label].set_color('red')
  thisplot[true_label].set_color('blue')

In [None]:
# This uses the functions above to graph images and bar graphs in a grid.
# In the middle again choose either which=i for the first results, or which=random
num_rows=6
num_cols=4
num_images = num_rows*num_cols
plt.figure(figsize=(2*2*num_cols, 2*num_rows))
for i in range(num_images):
    which = i
    #which = random.randint(0, len(test_images)-1)
    plt.subplot(num_rows, 2*num_cols, 2*i+1)
    plot_image(which, predictions[which], test_labels[which], test_images[which])
    plt.subplot(num_rows, 2*num_cols, 2*i+2)
    plot_value_array(which, predictions[which], test_labels)
plt.tight_layout()
plt.show()