# TensorFlow Keras MNIST Classifier - Local Example

This notebook trains and exports a Keras CNN-based classifier for the MNIST DIGITS dataset, performing all storage and computation here on the notebook instance where you run it.

To give a better idea of how you might apply the example to real-world problems, we walk through the data preparation and model build process manually, rather than relying on Keras built-in functions and data classes.

Can you figure out how to re-create this workflow using SageMaker more effectively?

See the accompanying **Instructions** notebook for more guidance!


In [None]:
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.models import Sequential

%matplotlib inline

print(f"Using TensorFlow version {tf.__version__}")
print(f"Keras version {tf.keras.__version__}")


## Explore the Data

Let's use the Keras built-in to load the MNIST data, but explore exactly what format that gives us:


In [None]:
(x_train_raw, y_train_raw), (x_test_raw, y_test_raw) = tf.keras.datasets.mnist.load_data()

print(f"x_train.shape {x_train_raw.shape}; dtype {x_train_raw.dtype}")
print(f"y_train.shape {y_train_raw.shape}; dtype {y_train_raw.dtype}")
print(f"x_test.shape {x_test_raw.shape}; dtype {x_test_raw.dtype}")
print(f"y_test.shape {y_test_raw.shape}; dtype {y_test_raw.dtype}")

fig = plt.figure(figsize=(14, 3))
ax = plt.subplot(1, 2, 1)
plt.hist(x_train_raw.flatten())
ax.set_title("Histogram of Training Image Data")
ax.set_ylabel("Frequency in Training Set")
ax.set_xlabel("Pixel Value")

ax = plt.subplot(1, 2, 2)
plt.hist(y_train_raw)
ax.set_title("Histogram of Training Set Labels")
ax.set_ylabel("Frequency in Training Set")
ax.set_xlabel("Y Label Value")

plt.show()


The labels look fairly evenly distributed across digits 0-9, and each image is a fixed-size 28x28 matrix of uint8 values from 0 to 255. Let's plot a few examples to get a feel for them:


In [None]:
print("Some example images:")
fig = plt.figure(figsize=(14, 2))
for i in range(5):
    # plt.subplot() returns an Axes object, so name it accordingly:
    ax = plt.subplot(1, 5, i + 1)
    ax.imshow(x_train_raw[i], cmap="gray")
    ax.set_title(f"Number {y_train_raw[i]}")
plt.show()


## Prepare the Data

Rather than just using this data in the nice pre-prepared format, let's assume MNIST is just a stand-in for a real image classification dataset... We'll save the images to disk in per-label folders, like a real dataset might be:
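
As a rough sketch of the folder layout this produces, the snippet below builds the path for one image and recovers the label from its parent folder. The helper names `image_path` and `label_from_path` are hypothetical and only for illustration; the `digit-<label>` naming mirrors the save cell that follows.

```python
import os


def image_path(base_folder, label, index):
    """Build a path like <base_folder>/digit-<label>/digit-<label>-<index>.jpg"""
    label_str = "digit-%d" % label
    return os.path.join(base_folder, label_str, "%s-%06d.jpg" % (label_str, index))


def label_from_path(path):
    """Recover the integer label from the name of the image's parent folder"""
    folder = os.path.basename(os.path.dirname(path))
    return int(folder.split("-")[-1])


example = image_path("data/train", 7, 42)
print(example)
print(label_from_path(example))
```

With this layout the folder name is the only place the label is stored, which is why the loading step later derives labels from the directory listing.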


In [None]:
import os


In [None]:
!rm -rf data/train
!rm -rf data/test

# This can take a while due to the number of images
def save_to_disk(x, y, base_folder):
    """Save an image classification dataset to disk as JPEGs in label-named folders"""
    for ix in range(len(y)):
        label_str = "digit-%d" % y[ix]
        os.makedirs(os.path.join(base_folder, label_str), exist_ok=True)
        tf.keras.preprocessing.image.save_img(
            os.path.join(base_folder, label_str, "%s-%06d.jpg" % (label_str, ix)),
            # This function expects a channels dimension, which we'll add in first:
            np.expand_dims(x[ix], 0),
            data_format="channels_first"
        )

print("Saving training data...")
os.makedirs("data/train", exist_ok=True)
save_to_disk(x_train_raw, y_train_raw, "data/train")
print("Saving test data...")
os.makedirs("data/test", exist_ok=True)
save_to_disk(x_test_raw, y_test_raw, "data/test")
print("Done!")


## Load the Data

There wouldn't be much point saving the data in a typical format if we didn't load it into our training process the same way! Let's do it:
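
The loading cell derives integer class indices from the sorted folder names. Here is a minimal sketch of that mapping, using a hard-coded folder list as a stand-in for `os.listdir("data/train")`. It also shows a caveat: the sort is lexicographic, so the trick only lines up with the digit values because every label here is a single digit.

```python
# Hypothetical stand-in for os.listdir("data/train"), deliberately out of order
folders = ["digit-%d" % d for d in (3, 0, 9, 1, 7, 2, 8, 4, 6, 5)]

# Sorting gives a stable folder-name -> class-index mapping
labels = sorted(folders)
label_to_index = {name: ix for ix, name in enumerate(labels)}
print(label_to_index["digit-3"])

# Caveat: string sorting puts "digit-10" before "digit-2"
multi_digit_order = sorted(["digit-2", "digit-10"])
print(multi_digit_order)
```

For a dataset with arbitrary class names, it is usually safer to store the `labels` list alongside the model so predictions can be mapped back to names.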


In [None]:
# May as well clear out the old memory, in case you're running a small instance:
x_train_raw = None
y_train_raw = None
x_test_raw = None
y_test_raw = None


In [None]:
from PIL import Image
labels = sorted(os.listdir("data/train"))
n_labels = len(labels)

x_train = []
y_train = []
x_test = []
y_test = []
print("Loading label ", end="")
for ix_label in range(n_labels):
    label_str = labels[ix_label]
    print(f"{label_str}...", end="")
    trainfiles = filter(
        lambda s: s.endswith(".jpg"),
        os.listdir(os.path.join("data/train", label_str))
    )
    for filename in trainfiles:
        # Can't just use tf.keras.preprocessing.image.load_img(), because it doesn't close its file
        # handles! So get "Too many open files" error... Grr
        with open(os.path.join("data/train", label_str, filename), "rb") as imgfile:
            x_train.append(
                # Squeeze (drop) that extra channel dimension, to be consistent with prev format:
                np.squeeze(tf.keras.preprocessing.image.img_to_array(
                    Image.open(imgfile)
                ))
            )
            y_train.append(ix_label)
    # Repeat for test data:
    testfiles = filter(
        lambda s: s.endswith(".jpg"),
        os.listdir(os.path.join("data/test", label_str))
    )
    for filename in testfiles:
        with open(os.path.join("data/test", label_str, filename), "rb") as imgfile:
            x_test.append(
                np.squeeze(tf.keras.preprocessing.image.img_to_array(
                    Image.open(imgfile)
                ))
            )
            y_test.append(ix_label)
print()

print("Shuffling trainset...")
train_shuffled = [(x_train[ix], y_train[ix]) for ix in range(len(y_train))]
np.random.shuffle(train_shuffled)

x_train = np.array([datum[0] for datum in train_shuffled])
y_train = np.array([datum[1] for datum in train_shuffled])
train_shuffled = None

print("Shuffling testset...")
test_shuffled = [(x_test[ix], y_test[ix]) for ix in range(len(y_test))]
np.random.shuffle(test_shuffled)

x_test = np.array([datum[0] for datum in test_shuffled])
y_test = np.array([datum[1] for datum in test_shuffled])
test_shuffled = None

print("Done!")


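**Before we go ahead**, let's quickly validate that the reloaded data has the same distribution as the original, just in a different order. (JPEG compression is lossy, so individual pixel values may differ slightly from the source arrays.)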


In [None]:
# Inspect the arrays we reloaded from disk (not the Keras originals). The images are typically
# float32 now rather than uint8, because img_to_array() returns floats:
print(f"x_train.shape {x_train.shape}; dtype {x_train.dtype}")
print(f"y_train.shape {y_train.shape}; dtype {y_train.dtype}")
print(f"x_test.shape {x_test.shape}; dtype {x_test.dtype}")
print(f"y_test.shape {y_test.shape}; dtype {y_test.dtype}")

fig = plt.figure(figsize=(14, 3))
ax = plt.subplot(1, 2, 1)
plt.hist(x_train.flatten())
ax.set_title("Histogram of Training Image Data")
ax.set_ylabel("Frequency in Training Set")
ax.set_xlabel("Pixel Value")

ax = plt.subplot(1, 2, 2)
plt.hist(y_train)
ax.set_title("Histogram of Training Set Labels")
ax.set_ylabel("Frequency in Training Set")
ax.set_xlabel("Y Label Value")

plt.show()


In [None]:
print("Some example images:")
fig = plt.figure(figsize=(14, 2))
for i in range(5):
    # plt.subplot() returns an Axes object, so name it accordingly:
    ax = plt.subplot(1, 5, i + 1)
    ax.imshow(x_train[i], cmap="gray")
    ax.set_title(f"Number {y_train[i]}")
plt.show()


You should find that the distributions haven't shifted, and the advertised labels still visually match the images!
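
If you prefer a numeric check over eyeballing plots, the sketch below shows the idea on small synthetic arrays (hypothetical stand-ins, not the real MNIST data). It shuffles images and labels with one shared permutation, an alternative to the list-of-tuples shuffle used above, then confirms the per-label counts and the image/label pairing are unchanged.

```python
import numpy as np

rng = np.random.RandomState(0)

# Hypothetical stand-in data: 100 tiny "images", each filled with its own label value
labels_before = rng.randint(0, 10, size=100)
images_before = np.stack([np.full((4, 4), lbl, dtype="float32") for lbl in labels_before])

# Shuffle both arrays with one shared permutation so each image keeps its label
perm = rng.permutation(len(labels_before))
images_after = images_before[perm]
labels_after = labels_before[perm]

# Per-label counts are unchanged by shuffling...
counts_before = np.bincount(labels_before, minlength=10)
counts_after = np.bincount(labels_after, minlength=10)
print("Label counts match:", np.array_equal(counts_before, counts_after))

# ...and every image still carries its own label
pairs_aligned = bool(np.all(images_after[:, 0, 0] == labels_after))
print("Image/label pairs aligned:", pairs_aligned)
```

On the real data, the same `np.bincount` comparison could be run between the original Keras labels and the reloaded `y_train`, before the labels are one-hot encoded.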

## Pre-Process the Data for our CNN

We've recovered the dataset from our JPEG files back to the MNIST original format, and verified nothing's majorly broken...

Next, we'll tweak this format for our neural network: Normalizing pixel values to improve the numerical conditioning, and one-hot encoding our labels to suit a softmax classifier output.

Note in particular that our model expects both a batch dimension (for processing multiple samples in parallel) and a channel dimension (e.g. as if this were a 3-channel RGB image, except single-channel for grayscale) - as well as the X and Y axes.
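
To make these three steps concrete, here is a minimal NumPy-only sketch on a hypothetical three-image batch. It assumes the `channels_last` layout, and uses `np.eye` for the one-hot encoding where the notebook cell uses `tf.keras.utils.to_categorical`.

```python
import numpy as np

# Hypothetical stand-in batch: 3 grayscale 28x28 images with labels 0, 3 and 9
x = np.random.RandomState(0).randint(0, 256, size=(3, 28, 28)).astype("float32")
y = np.array([0, 3, 9])
n_classes = 10

# Add a trailing channel axis (the "channels_last" layout) and scale pixels to [0, 1]
x = np.expand_dims(x, -1) / 255.0

# One-hot encode: row i of the identity matrix is the encoding of class i
y_onehot = np.eye(n_classes, dtype="float32")[y]

print(x.shape)         # (batch, height, width, channels)
print(y_onehot.shape)  # (batch, n_classes)
print(y_onehot[1])
```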


In [None]:
# We're feeding the images into a network this time, so we need to pay attention to where Keras
# expects the channel dimension:
if K.image_data_format() == "channels_first":
    x_train = np.expand_dims(x_train, 1)
    x_test = np.expand_dims(x_test, 1)
else:
    x_train = np.expand_dims(x_train, -1)
    x_test = np.expand_dims(x_test, -1)

x_train = x_train.astype("float32")
x_test = x_test.astype("float32")
x_train /= 255
x_test /= 255

input_shape = x_train.shape[1:]

print("x_train shape:", x_train.shape)
print("input_shape:", input_shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")

# convert class vectors to binary class matrices
y_train = tf.keras.utils.to_categorical(y_train, n_labels)
y_test = tf.keras.utils.to_categorical(y_test, n_labels)

print("n_labels:", n_labels)
print("y_train shape:", y_train.shape)


## Build a Model

At its core, the model is a 2D convolutional network with a softmax output layer that'll yield a confidence score for every possible label (e.g. 10 options for digit = 0 to 9).
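
As a sanity check on the architecture defined in the next cell, the sketch below traces the tensor shapes and parameter counts by hand. It assumes 28x28 single-channel inputs and the Keras defaults for these layers (stride 1 and `"valid"` padding for `Conv2D`, pool stride equal to pool size for `MaxPooling2D`); `model.summary()` should report comparable numbers.

```python
def conv_out(size, kernel):
    """Output width/height of a stride-1 convolution with 'valid' (no) padding"""
    return size - kernel + 1


side = 28                                # MNIST images are 28x28 with 1 channel
side = conv_out(side, 3)                 # Conv2D(32, 3x3) -> 26x26x32
conv1_params = (3 * 3 * 1 + 1) * 32      # weights per filter + bias, times filters
side = conv_out(side, 3)                 # Conv2D(64, 3x3) -> 24x24x64
conv2_params = (3 * 3 * 32 + 1) * 64
side = side // 2                         # MaxPooling2D(2x2) -> 12x12x64
flat = side * side * 64                  # Flatten
dense1_params = (flat + 1) * 128         # Dense(128)
dense2_params = (128 + 1) * 10           # Dense(10) softmax output

total = conv1_params + conv2_params + dense1_params + dense2_params
print("Flattened features:", flat)
print("Trainable parameters:", total)
```

Almost all of the parameters sit in the first dense layer, which is why the dropout layers around it matter for regularization.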


In [None]:
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation="relu", input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation="relu"))
model.add(Dropout(0.5))
model.add(Dense(n_labels, activation="softmax"))

model.compile(
    loss=tf.keras.losses.categorical_crossentropy,
    optimizer=tf.keras.optimizers.Adadelta(),
    metrics=["accuracy"]
)


## Fit the Model

Keras makes fitting and evaluating the model straightforward enough: We don't have any fancy hooks, and are happy with the default logging:
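
For a sense of scale, this small sketch works out how many gradient updates the settings in the next cell imply, assuming the standard 60,000-image MNIST training split.

```python
import math

n_train = 60000    # size of the standard MNIST training split
batch_size = 128
epochs = 12

# The final batch of each epoch is smaller than batch_size
steps_per_epoch = math.ceil(n_train / batch_size)
last_batch_size = n_train - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print("Steps per epoch:", steps_per_epoch)
print("Samples in last batch:", last_batch_size)
print("Total weight updates:", total_updates)
```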


In [None]:
batch_size = 128
epochs = 12

model.fit(
    x_train, y_train,
    batch_size=batch_size,
    epochs=epochs,
    shuffle=True,
    verbose=1, # Hint: You might prefer =2 for running in SageMaker!
    validation_data=(x_test, y_test)
)

score = model.evaluate(x_test, y_test, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])


## Save the Trained Model

Keras has a built-in `model.save()` command, which in TensorFlow v2 can directly produce TensorFlow Serving-compatible outputs!

...However, this notebook runs TensorFlow v1. To save you the frustration of figuring it out (there's a nice blog post on the subject [here](https://aws.amazon.com/blogs/machine-learning/deploy-trained-keras-or-tensorflow-models-using-amazon-sagemaker/)), we'll give you a hint by saving the model here in TensorFlow Serving-ready format.
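
To show what "TensorFlow Serving-ready" buys you, here is a hedged sketch of the JSON body a client might send to a TensorFlow Serving REST endpoint hosting this export. It assumes the `channels_last` layout and the `serving_default` signature name registered in the export cell. The URL is hypothetical and depends entirely on your deployment, and no request is actually sent.

```python
import json

# Hypothetical input: one all-black 28x28 image in channels_last layout (28, 28, 1)
image = [[[0.0] for _ in range(28)] for _ in range(28)]

# Row ("instances") format of the TensorFlow Serving REST predict API
payload = {
    "signature_name": "serving_default",
    "instances": [image],
}
body = json.dumps(payload)

# Hypothetical endpoint; the host, port and model name depend on how the server is deployed
url = "http://localhost:8501/v1/models/my-model:predict"
print(url)
print(len(body), "bytes in request body")
```

A real client should apply the same preprocessing as training (scaling pixels to the 0-1 range) before building the payload.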


In [None]:
# WARNING: Running this code will clear your TF session!

from tensorflow.python.saved_model.builder import SavedModelBuilder
from tensorflow.python.saved_model.signature_def_utils import predict_signature_def
from tensorflow.python.saved_model import tag_constants

# The export folder needs to be empty, or non-existent
!rm -rf data/model/export

# Note the export/modelID/version portions of the path are required structure for TFServing:
export_path = "data/model/export/my-model/1"
os.makedirs(export_path, exist_ok=True)

# Freeze learning and take a copy of the model:
K.set_learning_phase(0)
config = model.get_config()
weights = model.get_weights()
model_copy = Sequential.from_config(config)
model_copy.set_weights(weights)

builder = SavedModelBuilder(export_path)
signature = predict_signature_def(
    inputs={ "inputs": model_copy.input },
    outputs={ "score": model_copy.output }
)

with K.get_session() as sess:
    # Save the meta graph and variables
    builder.add_meta_graph_and_variables(
        sess=sess,
        tags=[tag_constants.SERVING],
        signature_def_map={ "serving_default": signature }
    )
    builder.save()
    print("Model exported")

K.clear_session()
