# Using Experiments with the MNIST Dataset and Keras

## Environment Setup

- Image: TensorFlow 2.6 CPU Optimized
- Kernel: Python 3
- Instance type: ml.t3.medium

## Background

This notebook uses the SageMaker SDK to demonstrate how to use SageMaker Experiments. We train a Keras model on the MNIST dataset (a large database of handwritten digits) and use a Keras callback to log metrics back to Experiments.

This notebook has been adapted from the [SageMaker examples](https://github.com/aws/amazon-sagemaker-examples/blob/main//sagemaker-experiments/local_experiment_tracking/keras_experiment.ipynb).

## Initialize Environment and Variables

In [7]:
import sys

In [14]:
# Make sure we have the latest versions of the SDKs
!{sys.executable} -m pip install --upgrade pip
!{sys.executable} -m pip install --upgrade boto3
!{sys.executable} -m pip install --upgrade sagemaker
!{sys.executable} -m pip install --upgrade tensorflow

Collecting protobuf==3.20.1
  Downloading protobuf-3.20.1-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.0 MB)
     1.0/1.0 MB 12.4 MB/s eta 0:00:00
Installing collected packages: protobuf
  Attempting uninstall: protobuf
    Found existing installation: protobuf 3.20.3
    Uninstalling protobuf-3.20.3:
      Successfully uninstalled protobuf-3.20.3
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.12.0 requires protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3, but you have protobuf 3.20.1 which is incompatible.
tensorflow-io 0.21.0 requires tensorflow<2.7.0,>=2.6.0, but you have tensorflow 2.12.0 which is incompatible.
tensorflow-io 0.21.0 requires tensorflow-io-gcs-filesystem==0.21.0, but you have tensor

In [15]:
import json
import boto3
import sagemaker
from sagemaker.session import Session
from sagemaker import get_execution_role

# Get the SageMaker and boto sessions, plus the execution role from the SageMaker domain
sagemaker_session = Session()
boto_sess = boto3.Session()

role = get_execution_role()

default_bucket = sagemaker_session.default_bucket()

sm = boto_sess.client("sagemaker")
region = boto_sess.region_name

TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
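
The dependency conflicts and the `TypeError` above both point at a mismatch between the upgraded `tensorflow` and the installed `protobuf`. Before re-running the imports, it can help to check which versions the kernel actually sees. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of any SDK.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version or None} for each requested distribution."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this kernel's environment.
            versions[name] = None
    return versions


# Packages named in the pip output above, plus the two AWS SDKs.
report = installed_versions(
    ["protobuf", "tensorflow", "tensorflow-io", "sagemaker", "boto3"]
)
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Comparing this report with the version ranges quoted in the pip error shows which package needs to be pinned or upgraded.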

---
## Data

We use the MNIST dataset, downloaded as pre-split NumPy arrays from the `sagemaker-sample-files` S3 bucket.

In [6]:
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
import pandas as pd

In [7]:
!mkdir -p datasets

In [8]:
num_classes = 10
input_shape = (28, 28, 1)

s3 = boto3.client("s3")

train_path = "datasets/input_train.npy"
test_path = "datasets/input_test.npy"
train_labels_path = "datasets/input_train_labels.npy"
test_labels_path = "datasets/input_test_labels.npy"

# Download the pre-split train and test arrays (images and labels) from S3
s3.download_file("sagemaker-sample-files", "datasets/image/MNIST/numpy/input_train.npy", train_path)
s3.download_file("sagemaker-sample-files", "datasets/image/MNIST/numpy/input_test.npy", test_path)
s3.download_file(
    "sagemaker-sample-files", "datasets/image/MNIST/numpy/input_train_labels.npy", train_labels_path
)
s3.download_file(
    "sagemaker-sample-files", "datasets/image/MNIST/numpy/input_test_labels.npy", test_labels_path
)

In [9]:
# Prepare the data for training
x_train = np.load(train_path)
x_test = np.load(test_path)
y_train = np.load(train_labels_path)
y_test = np.load(test_labels_path)

# Reshape the arrays
x_train = np.reshape(x_train, (60000, 28, 28))
x_test = np.reshape(x_test, (10000, 28, 28))
y_train = np.reshape(y_train, (60000,))
y_test = np.reshape(y_test, (10000,))

# Scale images to the [0, 1] range
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255

# Make sure images have shape (28, 28, 1)
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)

print("x_train shape:", x_train.shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")

# Convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
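
The last step of the cell above turns each integer label into a one-hot row so that it lines up with the 10-way softmax output of the model. As a dependency-free sketch of that idea (this is not the Keras implementation, and `one_hot` is a hypothetical helper):

```python
def one_hot(labels, num_classes):
    """Encode integer class labels as one-hot rows (lists of 0.0/1.0)."""
    rows = []
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} is outside [0, {num_classes})")
        row = [0.0] * num_classes
        row[label] = 1.0  # mark the position of the true class
        rows.append(row)
    return rows


encoded = one_hot([3, 0], num_classes=10)
print(encoded[0])  # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

With labels in this form, `categorical_crossentropy` compares each row directly against the predicted class probabilities.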


## Build the Model

In [10]:
def get_model(dropout=0.5):
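    """Build and compile a small CNN for 28x28x1 digit images; `dropout` is the rate applied before the softmax layer."""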
    model = keras.Sequential(
        [
            keras.Input(shape=input_shape),
            layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
            layers.MaxPooling2D(pool_size=(2, 2)),
            layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
            layers.MaxPooling2D(pool_size=(2, 2)),
            layers.Flatten(),
            layers.Dropout(dropout),
            layers.Dense(num_classes, activation="softmax"),
        ]
    )
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

    return model
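
To see what the final `Dense` layer receives, the layer arithmetic can be traced by hand. The sketch below assumes the Keras defaults that `get_model` relies on (stride-1 convolutions with `"valid"` padding and non-overlapping 2x2 pooling); the helper names are hypothetical.

```python
def conv_out(size, kernel):
    """Spatial size after a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1


def pool_out(size, pool):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // pool


def conv_params(kernel, in_ch, out_ch):
    """Kernel weights plus one bias per output channel."""
    return kernel * kernel * in_ch * out_ch + out_ch


size = 28
size = pool_out(conv_out(size, 3), 2)  # Conv2D(32): 28 -> 26, pool: 26 -> 13
size = pool_out(conv_out(size, 3), 2)  # Conv2D(64): 13 -> 11, pool: 11 -> 5
flat = size * size * 64                # Flatten: 5 * 5 * 64 features

total = (
    conv_params(3, 1, 32)      # first convolution: 320
    + conv_params(3, 32, 64)   # second convolution: 18,496
    + flat * 10 + 10           # Dense(10): 16,010
)
print(flat, total)  # 1600 34826
```

If these assumptions hold, `get_model().summary()` should report the same parameter total.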

## Define the Keras Callback

Keras calls a callback's *on_epoch_end* method at the end of each epoch and passes it that epoch's metrics. The callback below logs each of those metrics to the run it was given.

In [11]:
class ExperimentCallback(keras.callbacks.Callback):
    """Forward the metrics Keras reports after each epoch to an Experiments run."""

    def __init__(self, run, model, x_test, y_test):
        """Save params in constructor"""
        super().__init__()
        self.run = run
        self.model = model
        self.x_test = x_test
        self.y_test = y_test

    def on_epoch_end(self, epoch, logs=None):
        """Log every metric in `logs` to the run, using the epoch index as the step."""
        logs = logs or {}  # Keras may pass None
        for key, value in logs.items():
            self.run.log_metric(name=key, value=value, step=epoch)
            print("{} -> {}".format(key, value))
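
Because the callback only needs an object with a `log_metric(name, value, step)` method, its logging logic can be exercised without SageMaker or a training job. The sketch below uses a hypothetical `RecordingRun` stand-in (not part of the SageMaker SDK) and a plain function that mirrors the body of `on_epoch_end`; the metric values are made up for illustration.

```python
class RecordingRun:
    """Hypothetical stand-in for a run object; stores metrics in memory."""

    def __init__(self):
        self.metrics = []

    def log_metric(self, name, value, step=None):
        self.metrics.append((name, value, step))


def log_epoch_metrics(run, epoch, logs=None):
    """Mirror of on_epoch_end: forward each entry in `logs` to the run."""
    for key, value in (logs or {}).items():
        run.log_metric(name=key, value=value, step=epoch)


fake_run = RecordingRun()
log_epoch_metrics(fake_run, epoch=0, logs={"loss": 0.18, "accuracy": 0.94})
log_epoch_metrics(fake_run, epoch=1, logs=None)  # nothing to log, no error
print(fake_run.metrics)  # [('loss', 0.18, 0), ('accuracy', 0.94, 0)]
```

A stand-in like this makes it easy to check which metric names and steps would be sent before pointing the callback at a real run.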

## Set up a SageMaker Experiment and its Runs, then Train

Next, we train the Keras model locally, on the same instance where this notebook is running. Within the run, we log the hyperparameters as parameters, register the downloaded dataset files as input artifacts, and pass an `ExperimentCallback` instance to `model.fit` so that each epoch's metrics are logged to the run.

In [12]:
from sagemaker.experiments.run import Run

batch_size = 10
epochs = 3
dropout = 0.5

model = get_model(dropout)

experiment_name = "mnist-keras-experiment"
run_name = "mnist-keras-batch-size-10"
with Run(experiment_name=experiment_name, run_name=run_name, sagemaker_session=sagemaker_session) as run:
    run.log_parameter("batch_size", batch_size)
    run.log_parameter("epochs", epochs)
    run.log_parameter("dropout", dropout)

    run.log_file("datasets/input_train.npy", is_output=False)
    run.log_file("datasets/input_test.npy", is_output=False)
    run.log_file("datasets/input_train_labels.npy", is_output=False)
    run.log_file("datasets/input_test_labels.npy", is_output=False)

    # Train locally
    model.fit(
        x_train,
        y_train,
        batch_size=batch_size,
        epochs=epochs,
        validation_split=0.1,
        callbacks=[ExperimentCallback(run, model, x_test, y_test)],
    )

    score = model.evaluate(x_test, y_test, verbose=0)
    print("Test loss:", score[0])
    print("Test accuracy:", score[1])

    run.log_metric(name="Final Test Loss", value=score[0])
    run.log_metric(name="Final Test Accuracy", value=score[1])

Epoch 1/3
accuracy -> 0.9449074268341064
val_loss -> 0.05294153466820717
val_accuracy -> 0.984666645526886
Epoch 2/3
accuracy -> 0.976111114025116
val_loss -> 0.03639725595712662
val_accuracy -> 0.9901666641235352
Epoch 3/3
accuracy -> 0.9810000061988831
val_loss -> 0.036584287881851196
val_accuracy -> 0.9890000224113464
Test loss: 0.030803684145212173
Test accuracy: 0.9897000193595886
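
The small batch size means each epoch involves many optimizer steps. As rough arithmetic, assuming Keras holds out the `validation_split` fraction of the 60,000 training samples and divides the remainder into batches (the helper name is hypothetical):

```python
import math


def steps_per_epoch(n_samples, batch_size, validation_split=0.0):
    """Number of batches per epoch after holding out a validation fraction."""
    n_val = round(n_samples * validation_split)
    n_train = n_samples - n_val
    return math.ceil(n_train / batch_size)


print(steps_per_epoch(60000, 10, validation_split=0.1))  # 5400
```

Comparing this figure across batch sizes is one way to decide which values are worth tracking as separate runs.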


In [15]:
# Clean up: delete the experiment and everything tracked under it
from sagemaker.experiments.experiment import Experiment

exp = Experiment.load(experiment_name="mnist-keras-experiment", sagemaker_session=sagemaker_session)
exp._delete_all(action="--force")