# Import libraries and modules

Let's start by importing NumPy and Keras. Note that no random seed is set here, so results will vary slightly from run to run:

In [2]:
import numpy as np
from tensorflow import keras

# Load image data 

MNIST is a great dataset for getting started with deep learning and computer vision. It's a big enough challenge to warrant neural networks, but it's manageable on a single computer. 

## The MNIST Dataset

![](assets/mnist.jpg)

The MNIST dataset is one of the most well-studied datasets in the computer vision and machine learning literature. In many cases it serves as a benchmark, a standard against which machine learning algorithms are ranked.

The goal of this dataset is to correctly classify the handwritten digits 0-9. The data points are approximately uniformly distributed per digit, so no substantial class label imbalance exists.
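
A quick way to check the balance claim is to count the labels per digit. The sketch below uses a small made-up label array as a stand-in (the values in `labels` are an assumption for illustration only); with the real data you would pass the training labels instead.

```python
import numpy as np

# Hypothetical stand-in for the real label array: integer class ids in 0-9.
labels = np.array([5, 0, 4, 1, 9, 2, 1, 3, 1, 4])

# Count how many samples fall into each of the 10 digit classes.
counts = np.bincount(labels, minlength=10)

# Fraction of the dataset per class; roughly 0.1 each would mean balanced.
fractions = counts / counts.sum()
print(counts)
print(fractions)
```

Ten samples are far too few to look uniform, of course. On the full training set the fractions should each land close to 0.1 if the claim holds.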

Each feature vector is 784-dim, corresponding to the 28 x 28 grayscale pixel intensities of the image. These grayscale pixel intensities are unsigned integers, falling into the range [0, 255].

All digits are placed on a black background, with the foreground being white and shades of gray.

Keras ships with this dataset (as keras.datasets.mnist), but here we load it with the small mnist package instead:

In [None]:
import mnist

# The first time you run this might be a bit slow, since the
# mnist package has to download and cache the data.
train_images = mnist.train_images()
train_labels = mnist.train_labels()


We can look at the shape of the dataset:



In [None]:
print(train_images.shape) # (60000, 28, 28)
print(train_labels.shape) # (60000,)

Great, so it appears that we have 60,000 samples in our training set, and the images are 28 pixels x 28 pixels each. We can confirm this by plotting the first sample in matplotlib:

In [None]:
from matplotlib import pyplot as plt
# Show the first training image using a grayscale colormap.
plt.imshow(train_images[0], cmap='gray')
plt.show()

In general, when working with computer vision, it's helpful to visually plot the data before doing any algorithm work. It's a quick sanity check that can prevent easily avoidable mistakes (such as misinterpreting the data dimensions).

# Preprocess input data

We need to flatten each image before we can pass it into our neural network. We’ll also normalize the pixel values from [0, 255] to [-0.5, 0.5] to make our network easier to train (using smaller, centered values is often better).

In [None]:
import numpy as np
import mnist

train_images = mnist.train_images()
train_labels = mnist.train_labels()
test_images = mnist.test_images()
test_labels = mnist.test_labels()

# Normalize the images.
train_images = (train_images / 255) - 0.5
test_images = (test_images / 255) - 0.5

# Flatten the images.
train_images = train_images.reshape((-1, 784))
test_images = test_images.reshape((-1, 784))

print(train_images.shape) # (60000, 784)
print(test_images.shape)  # (10000, 784)
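
To see what the normalize and flatten steps do to individual values, here is a minimal sketch on a tiny synthetic batch. The `batch` array is made up for illustration (one all-black and one all-white image); it is not MNIST data.

```python
import numpy as np

# Hypothetical batch of two 28x28 "images": one all black (0), one all white (255).
batch = np.stack([
    np.zeros((28, 28), dtype=np.uint8),
    np.full((28, 28), 255, dtype=np.uint8),
])

# Same arithmetic as the tutorial: scale to [0, 1], then shift to [-0.5, 0.5].
normalized = (batch / 255) - 0.5

# Flatten each 28x28 image into a 784-element vector.
# The -1 lets NumPy infer the batch dimension.
flat = normalized.reshape((-1, 784))

print(flat.shape)              # (2, 784)
print(flat.min(), flat.max())  # -0.5 0.5
```

Black pixels map to -0.5 and white pixels to 0.5, so the inputs end up centered around zero.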

We’re ready to start building our neural network!



# Building the Model

Every Keras model is either built using the [Sequential class](https://keras.io/api/models/sequential/), which represents a linear stack of layers, or the functional [Model class](https://keras.io/api/models/model/), which is more customizable. We’ll be using the simpler Sequential model, since our network is indeed a linear stack of layers.

We start by instantiating a Sequential model:

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# WIP
model = Sequential([
  # layers...
])

The Sequential constructor takes a list of [Keras Layers](https://keras.io/api/layers/). Since we’re just building a standard feedforward network, we only need the [Dense layer](https://keras.io/api/layers/core_layers/#dense), which is a regular fully-connected (dense) network layer.

Let’s throw in 3 Dense layers:

In [None]:
# Still a WIP
model = Sequential([
  Dense(64, activation='relu'),
  Dense(64, activation='relu'),
  Dense(10, activation='softmax'),
])

The first two layers have 64 nodes each and use the ReLU activation function. The last layer is a Softmax output layer with 10 nodes, one for each class.
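
For intuition about the two activation functions named above, here is a minimal NumPy sketch. The helper names `relu` and `softmax` and the `scores` values are ours, chosen for illustration; Keras uses its own internal implementations.

```python
import numpy as np

def relu(x):
    # ReLU keeps positive values and clamps negatives to zero.
    return np.maximum(0, x)

def softmax(x):
    # Subtract the max for numerical stability, then normalize the exponentials.
    exps = np.exp(x - np.max(x))
    return exps / np.sum(exps)

scores = np.array([-2.0, 0.0, 3.0])  # made-up pre-activation values
print(relu(scores))
print(softmax(scores))
```

The softmax outputs are all positive and sum to 1, which is why the last layer's 10 outputs can be read as class probabilities.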

The last thing we need to do is tell Keras what our network’s input will look like. We can do that by specifying an input_shape for the first layer in the Sequential model:

In [None]:
model = Sequential([
  Dense(64, activation='relu', input_shape=(784,)),
  Dense(64, activation='relu'),
  Dense(10, activation='softmax'),
])

Once the input shape is specified, Keras will automatically infer the shapes of inputs for later layers. We’ve finished defining our model! Here’s where we’re at:



In [None]:
import numpy as np
import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

train_images = mnist.train_images()
train_labels = mnist.train_labels()
test_images = mnist.test_images()
test_labels = mnist.test_labels()

# Normalize the images.
train_images = (train_images / 255) - 0.5
test_images = (test_images / 255) - 0.5

# Flatten the images.
train_images = train_images.reshape((-1, 784))
test_images = test_images.reshape((-1, 784))

# Build the model.
model = Sequential([
  Dense(64, activation='relu', input_shape=(784,)),
  Dense(64, activation='relu'),
  Dense(10, activation='softmax'),
])
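
As a sanity check on this architecture, we can work out the number of trainable parameters by hand. The sketch below assumes each Dense layer has one weight per input-unit pair plus one bias per unit (the Keras default); `dense_params` is a helper name of ours, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    # One weight per input-unit pair, plus one bias per unit.
    return n_inputs * n_units + n_units

# Input size followed by the number of units in each Dense layer.
layer_sizes = [784, 64, 64, 10]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(per_layer)       # [50240, 4160, 650]
print(sum(per_layer))  # 55050
```

Calling model.summary() on the model should report matching per-layer counts.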

# Compiling the Model

Before we can begin training, we need to configure the training process. We decide 3 key factors during the compilation step:

 - The optimizer. We’ll stick with a pretty good default: the Adam gradient-based optimizer. Keras has many other optimizers you can look into as well.
 
 
 - The loss function. Since we’re using a Softmax output layer, we’ll use the Cross-Entropy loss. Keras distinguishes between binary_crossentropy (2 classes) and categorical_crossentropy (>2 classes), so we’ll use the latter. See all Keras losses.
 
 
 - A list of metrics. Since this is a classification problem, we’ll just have Keras report on the accuracy metric.
 
 
Here’s what that compilation looks like:

In [None]:
model.compile(
  optimizer='adam',
  loss='categorical_crossentropy',
  metrics=['accuracy'],
)
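
To make the loss less of a black box, here is a minimal NumPy sketch of categorical cross-entropy for a single sample. The function name, the `eps` clipping value, and the example probabilities are assumptions for illustration; Keras has its own implementation.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip predictions so log(0) never occurs.
    y_pred = np.clip(y_pred, eps, 1.0)
    # With one-hot targets, only the true class's log-probability contributes.
    return -np.sum(y_true * np.log(y_pred))

y_true = np.array([0.0, 0.0, 1.0])        # one-hot target: class 2
confident = np.array([0.05, 0.05, 0.90])  # made-up softmax outputs
unsure = np.array([0.40, 0.30, 0.30])

print(categorical_crossentropy(y_true, confident))
print(categorical_crossentropy(y_true, unsure))
```

The confident, correct prediction gets a much smaller loss than the unsure one, which is the behavior training pushes toward.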

# Training the Model

Training a model in Keras literally consists only of calling fit() and specifying some parameters. There are a lot of possible parameters, but we’ll only manually supply a few:

 - The training data (images and labels), commonly known as X and Y, respectively.
 
 
 - The number of epochs (iterations over the entire dataset) to train for.
 
 
 - The batch size (number of samples per gradient update) to use when training.
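
To make the last two settings concrete, we can count how many gradient updates a run performs. This sketch uses the 60,000-sample training set and the values used in this tutorial (5 epochs, batch size 32); the variable names are ours.

```python
import math

num_samples = 60000  # training set size reported earlier
batch_size = 32
epochs = 5

# Each gradient update consumes one batch; the last batch may be smaller.
steps_per_epoch = math.ceil(num_samples / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch)  # 1875
print(total_updates)    # 9375
```

A larger batch size means fewer updates per epoch, so batch size and number of epochs are usually tuned together.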
 
 
Here’s what that looks like:

In [None]:
model.fit(
  train_images, # training data
  train_labels, # training targets
  epochs=5,
  batch_size=32,
)

This doesn’t actually work yet, though - we overlooked one thing. Keras expects the training targets to be 10-dimensional vectors, since there are 10 nodes in our Softmax output layer, but we’re instead supplying a single integer representing the class for each image.

Conveniently, Keras has a utility method that fixes this exact issue: [to_categorical](https://keras.io/api/utils/#to_categorical). It turns our array of class integers into an array of one-hot vectors instead. For example, 2 would become [0, 0, 1, 0, 0, 0, 0, 0, 0, 0] (it’s zero-indexed).
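
The conversion itself is simple enough to sketch in plain NumPy. The `one_hot` helper below is a hypothetical stand-in that mimics the idea; it is not how to_categorical is implemented.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes)[labels]

labels = np.array([2, 0, 9])  # made-up class integers
encoded = one_hot(labels)

print(encoded[0])     # the vector for class 2
print(encoded.shape)  # (3, 10)
```

Each label becomes a row with a single 1 at the index of its class, matching the 10 nodes in our output layer.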

We can now put everything together to train our network:

In [None]:
import numpy as np
import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical

train_images = mnist.train_images()
train_labels = mnist.train_labels()
test_images = mnist.test_images()
test_labels = mnist.test_labels()

# Normalize the images.
train_images = (train_images / 255) - 0.5
test_images = (test_images / 255) - 0.5

# Flatten the images.
train_images = train_images.reshape((-1, 784))
test_images = test_images.reshape((-1, 784))

# Build the model.
model = Sequential([
  Dense(64, activation='relu', input_shape=(784,)),
  Dense(64, activation='relu'),
  Dense(10, activation='softmax'),
])

# Compile the model.
model.compile(
  optimizer='adam',
  loss='categorical_crossentropy',
  metrics=['accuracy'],
)

# Train the model.
model.fit(
  train_images,
  to_categorical(train_labels),
  epochs=5,
  batch_size=32,
)

We reached quite high training accuracy after 5 epochs! This doesn’t tell us much, though - we may be overfitting. The real challenge will be seeing how our model performs on our test data.

# Testing the Model

Evaluating the model is pretty simple:

In [None]:
model.evaluate(
  test_images,
  to_categorical(test_labels)
)

[evaluate()](https://keras.io/api/models/sequential/#evaluate) returns a list containing the test loss followed by any metrics we specified. How much accuracy have you achieved?
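
The accuracy metric is just the fraction of samples whose most probable class matches the true label. Here is a minimal sketch with made-up probabilities for 4 samples over 3 classes (the `probs` and `true_labels` values are for illustration only):

```python
import numpy as np

# Made-up softmax outputs for 4 samples over 3 classes (each row sums to 1).
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])
true_labels = np.array([0, 1, 2, 1])

# A prediction counts as correct when the most probable class matches the label.
predicted = np.argmax(probs, axis=1)
accuracy = np.mean(predicted == true_labels)

print(predicted)  # [0 1 2 0]
print(accuracy)   # 0.75
```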

# Using the Model

Now that we have a working, trained model, let’s put it to use. The first thing we’ll do is save its weights to disk so we can load them back up anytime:

In [None]:
model.save_weights('model.h5')

We can now reload the trained model whenever we want by rebuilding it and loading in the saved weights:

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Build the model.
model = Sequential([
  Dense(64, activation='relu', input_shape=(784,)),
  Dense(64, activation='relu'),
  Dense(10, activation='softmax'),
])

# Load the model's saved weights.
model.load_weights('model.h5')

Using the trained model to make predictions is easy: we pass an array of inputs to predict() and it returns an array of outputs. Keep in mind that the output of our network is 10 probabilities (because of softmax), so we’ll use [np.argmax()](https://numpy.org/doc/stable/reference/generated/numpy.argmax.html) to turn those into actual digits.

In [None]:
# Predict on the first 5 test images.
predictions = model.predict(test_images[:5])

# Print our model's predictions.
print(np.argmax(predictions, axis=1)) # [7, 2, 1, 0, 4]

# Check our predictions against the ground truths.
print(test_labels[:5]) # [7, 2, 1, 0, 4]
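
For a single image, it can also be useful to look at how confident the network was, not just which digit it picked. A small sketch, using a made-up probability vector rather than real model output:

```python
import numpy as np

# Made-up output for one image: 10 probabilities, one per digit.
prediction = np.array([0.01, 0.01, 0.02, 0.01, 0.01,
                       0.01, 0.01, 0.90, 0.01, 0.01])

digit = int(np.argmax(prediction))     # index of the largest probability
confidence = float(prediction[digit])  # probability assigned to that digit

print(digit, confidence)  # 7 0.9
```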

# Extensions

What we’ve covered so far was but a brief introduction - there’s much more we can do to experiment with and improve this network. 

## Tuning Hyperparameters
A good hyperparameter to start with is the learning rate for the Adam optimizer. What happens when you increase or decrease it?
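
For some intuition before experimenting, here is a toy sketch of how the learning rate affects plain gradient descent on f(w) = w². This is deliberately not Adam (which adapts its step sizes and behaves differently), and the function name and rates are our own choices for illustration.

```python
def gradient_descent(learning_rate, steps=20, start=1.0):
    # Minimize f(w) = w**2, whose gradient is 2 * w.
    w = start
    for _ in range(steps):
        w -= learning_rate * 2 * w
    return w

# Too small converges slowly, moderate converges quickly, too large diverges.
for lr in (0.01, 0.1, 1.1):
    print(lr, gradient_descent(lr))
```

With Keras, the rate is set by passing an optimizer object with a learning_rate argument instead of the 'adam' string.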

What about the batch size and number of epochs?

## Network Depth

What happens if we remove or add more fully-connected layers? How does that affect training and/or the model’s final performance?

## Activations

What if we use an activation other than ReLU, e.g. sigmoid?
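
One difference worth knowing before trying this: the gradient of sigmoid is never larger than 0.25 and shrinks toward zero for large inputs, while ReLU's gradient is exactly 1 for any positive input. A minimal NumPy sketch (helper names are ours):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of sigmoid: s * (1 - s), which peaks at 0.25 when x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise.
    return (x > 0).astype(float)

x = np.array([-5.0, 0.0, 5.0])
print(sigmoid_grad(x))
print(relu_grad(x))
```

Smaller gradients can slow down learning in deeper networks, which is one commonly cited reason ReLU is a popular default.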

## Dropout
What if we tried adding Dropout layers, which are commonly used to reduce overfitting?
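
To show what a Dropout layer does during training, here is a sketch of "inverted" dropout in NumPy: randomly zero a fraction of the activations and rescale the rest. The `dropout` function, the seed, and the rate of 0.5 are assumptions for illustration, not the Keras implementation.

```python
import numpy as np

def dropout(activations, rate, rng):
    # Keep each unit with probability (1 - rate) and zero out the rest.
    keep_mask = rng.random(activations.shape) >= rate
    # Scale the survivors so the expected activation stays the same.
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
activations = np.ones(10000)
dropped = dropout(activations, rate=0.5, rng=rng)

print((dropped == 0).mean())  # fraction of units zeroed, close to 0.5
print(dropped.mean())         # stays close to 1.0
```

Dropout is only applied during training. At prediction time all units are kept.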

## Validation

We can also use the testing dataset for validation during training. Keras will evaluate the model on the validation set at the end of each epoch and report the loss and any metrics we asked for. This allows us to monitor our model’s progress over time during training, which can be useful to identify overfitting and even support early stopping.
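
The early stopping idea can be sketched in a few lines of plain Python: stop once the validation loss has failed to improve for a set number of epochs. The function name, the `patience` value, and the loss history below are made up for illustration; Keras offers this behavior through its EarlyStopping callback.

```python
def early_stopping_epoch(val_losses, patience=2):
    # Return the 1-based epoch at which training would stop, or None.
    best = float('inf')
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Made-up validation losses: improving, then getting worse (a sign of overfitting).
history = [0.35, 0.22, 0.18, 0.19, 0.21, 0.25]
print(early_stopping_epoch(history))  # 5
```

In Keras, the validation set is supplied to fit() through its validation_data argument.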

## Useful links
- https://www.tensorflow.org/datasets/keras_example
- https://www.tensorflow.org/quantum/tutorials/mnist