# Run a SageMaker Experiment with MNIST Handwritten Digits Classification

This demo shows how you can use the [SageMaker Experiments Python SDK](https://sagemaker-experiments.readthedocs.io/en/latest/) to organize, track, compare, and evaluate your machine learning (ML) model training experiments.

You can track artifacts for experiments, including data sets, algorithms, hyperparameters, and metrics. Experiments executed on SageMaker such as SageMaker Autopilot jobs and training jobs are automatically tracked. You can also track artifacts for additional steps within an ML workflow that come before or after model training, such as data pre-processing or post-training model evaluation.

The APIs also let you search and browse your current and past experiments, compare experiments, and identify best-performing models.

We demonstrate these capabilities through an MNIST handwritten digits classification example. The experiment is organized as follows:

1. Download and prepare the MNIST dataset.
2. Train a Convolutional Neural Network (CNN) Model. Tune the hyperparameter that configures the number of hidden channels in the model. Track the parameter configurations and resulting model accuracy using the SageMaker Experiments Python SDK.
3. Use the search and analytics capabilities of the SDK to search, compare, and evaluate the performance of all model versions generated from model tuning in Step 2.
4. We also show an example of tracing the complete lineage of a model version: the collection of all the data pre-processing and training configurations and inputs that went into creating that model version.

Make sure you select the `Python 3 (Data Science)` kernel in Studio, or `conda_pytorch_p36` in a notebook instance.

## Runtime

This notebook takes approximately 25 minutes to run.

## Contents

1. [Install modules](#Install-modules)
1. [Setup](#Setup)
1. [Download the dataset](#Download-the-dataset)
1. [Step 1: Set up the Experiment](#Step-1:-Set-up-the-Experiment)
1. [Step 2: Track Experiment](#Step-2:-Track-Experiment)
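1. [Push best training job model to model registry](#Push-best-training-job-model-to-model-registry)
1. [Deploy an endpoint for the latest approved version of the model from model registry](#Deploy-an-endpoint-for-the-latest-approved-version-of-the-model-from-model-registry)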
1. [Cleanup](#Cleanup)
1. [Contact](#Contact)

## Install modules

In [41]:
import sys

### Install the SageMaker Experiments Python SDK

In [42]:
!{sys.executable} -m pip install sagemaker-experiments==0.1.35

### Install PyTorch

In [43]:
# PyTorch version needs to be the same in both the notebook instance and the training job container
# https://github.com/pytorch/pytorch/issues/25214
!{sys.executable} -m pip install torch==1.1.0
!{sys.executable} -m pip install torchvision==0.2.2
!{sys.executable} -m pip install pillow==6.2.2
!{sys.executable} -m pip install --upgrade sagemaker

## Setup

In [44]:
import time

import boto3
import numpy as np
import pandas as pd
from IPython.display import set_matplotlib_formats
from matplotlib import pyplot as plt
from torchvision import datasets, transforms

import sagemaker
from sagemaker import get_execution_role
from sagemaker.session import Session
from sagemaker.analytics import ExperimentAnalytics

from smexperiments.experiment import Experiment
from smexperiments.trial import Trial
from smexperiments.trial_component import TrialComponent
from smexperiments.tracker import Tracker

set_matplotlib_formats("retina")



In [45]:
sm_sess = sagemaker.Session()
sess = sm_sess.boto_session
sm = sm_sess.sagemaker_client
role = get_execution_role()
region = sess.region_name

## Download the dataset
We download the MNIST handwritten digits dataset, and then apply a transformation on each image.

In [46]:
bucket = sm_sess.default_bucket()
prefix = "DEMO-mnist"
print("Using S3 location: s3://" + bucket + "/" + prefix + "/")

datasets.MNIST.urls = [
    "https://sagemaker-sample-files.s3.amazonaws.com/datasets/image/MNIST/train-images-idx3-ubyte.gz",
    "https://sagemaker-sample-files.s3.amazonaws.com/datasets/image/MNIST/train-labels-idx1-ubyte.gz",
    "https://sagemaker-sample-files.s3.amazonaws.com/datasets/image/MNIST/t10k-images-idx3-ubyte.gz",
    "https://sagemaker-sample-files.s3.amazonaws.com/datasets/image/MNIST/t10k-labels-idx1-ubyte.gz",
]

# Download the dataset to the ./mnist folder, and load and transform (normalize) them
train_set = datasets.MNIST(
    "mnist",
    train=True,
    transform=transforms.Compose(
        [transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))]
    ),
    download=True,
)

test_set = datasets.MNIST(
    "mnist",
    train=False,
    transform=transforms.Compose(
        [transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))]
    ),
    download=False,
)

Using S3 location: s3://sagemaker-us-west-2-706553727873/DEMO-mnist/
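To make the transformation concrete, here is a minimal sketch of the arithmetic that `transforms.ToTensor()` followed by `transforms.Normalize((0.1307,), (0.3081,))` applies to each pixel: scale the 8-bit value to the range [0, 1], then subtract the mean and divide by the standard deviation. The helper names and the three sample pixel values are illustrative assumptions, not part of torchvision.

```python
# Constants taken from the Normalize call in the cell above
MEAN = 0.1307
STD = 0.3081


def to_unit_range(pixel):
    """Mimic ToTensor for one 8-bit pixel: map 0..255 to 0.0..1.0."""
    return pixel / 255.0


def normalize(value, mean=MEAN, std=STD):
    """Mimic Normalize for one value: (value - mean) / std."""
    return (value - mean) / std


# Hypothetical sample pixels: black, mid-gray, white
raw_pixels = [0, 128, 255]
normalized = [normalize(to_unit_range(p)) for p in raw_pixels]

for raw, norm in zip(raw_pixels, normalized):
    print(f"pixel {raw:3d} -> {norm:.4f}")
```

A black pixel maps to a small negative value and a white pixel to a value near 2.8, so the background of most digits sits slightly below zero.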


View an example image from the dataset.

In [47]:
plt.imshow(train_set.data[2].numpy())

<matplotlib.image.AxesImage at 0x7fb43acdadd0>

Next, we upload the dataset in the local `mnist` folder to S3.

In [48]:
inputs = sagemaker.Session().upload_data(path="mnist", bucket=bucket, key_prefix=prefix)

Now let's track the parameters from the data pre-processing step.

In [49]:
with Tracker.create(display_name="Preprocessing", sagemaker_boto_client=sm) as tracker:
    tracker.log_parameters(
        {
            "normalization_mean": 0.1307,
            "normalization_std": 0.3081,
        }
    )
    # We can log the S3 uri to the dataset we just uploaded
    tracker.log_input(name="mnist-dataset", media_type="s3/uri", value=inputs)

## Step 1: Set up the Experiment

Create an experiment to track all the model training iterations. Experiments are a great way to organize your data science work. You can create experiments to organize all your model development work for: [1] a business use case you are addressing (e.g. create experiment named “customer churn prediction”), or [2] a data science team that owns the experiment (e.g. create experiment named “marketing analytics experiment”), or [3] a specific data science and ML project. Think of it as a “folder” for organizing your “files”.

### Create an Experiment

In [50]:
mnist_experiment = Experiment.create(
    experiment_name=f"mnist-hand-written-digits-classification-{int(time.time())}",
    description="Classification of mnist hand-written digits",
    sagemaker_boto_client=sm,
)
print(mnist_experiment)

Experiment(sagemaker_boto_client=<botocore.client.SageMaker object at 0x7fb4a00a7e90>,experiment_name='mnist-hand-written-digits-classification-1660138330',description='Classification of mnist hand-written digits',tags=None,experiment_arn='arn:aws:sagemaker:us-west-2:706553727873:experiment/mnist-hand-written-digits-classification-1660138330',response_metadata={'RequestId': '5b6ce9c5-04da-4df3-b03e-38d754b7b1a9', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '5b6ce9c5-04da-4df3-b03e-38d754b7b1a9', 'content-type': 'application/x-amz-json-1.1', 'content-length': '123', 'date': 'Wed, 10 Aug 2022 13:32:09 GMT'}, 'RetryAttempts': 0})


## Step 2: Track Experiment
### Now create a Trial for each training run to track its inputs, parameters, and metrics.
While training the CNN model on SageMaker, we experiment with several values for the number of hidden channels in the model. We create a Trial to track each training job run. We also create a TrialComponent from the tracker we created before, and add it to the Trial. This enriches the Trial with the parameters we captured from the data pre-processing stage.
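The relationship described above can be pictured with plain Python classes. This is only an illustrative sketch: the `Sketch*` classes are hypothetical stand-ins, not the `smexperiments` classes, and the names carry no timestamps. It shows one experiment holding five trials, each trial holding its own training component plus the single shared preprocessing component.

```python
from dataclasses import dataclass, field


@dataclass
class SketchTrialComponent:
    """Stand-in for one tracked step (preprocessing or training)."""
    display_name: str
    parameters: dict = field(default_factory=dict)


@dataclass
class SketchTrial:
    """Stand-in for a trial: an ordered collection of components."""
    trial_name: str
    components: list = field(default_factory=list)

    def add_trial_component(self, component):
        self.components.append(component)


@dataclass
class SketchExperiment:
    """Stand-in for an experiment: a collection of trials."""
    experiment_name: str
    trials: list = field(default_factory=list)


# One preprocessing component, created once and shared by every trial
preprocessing = SketchTrialComponent(
    "Preprocessing",
    {"normalization_mean": 0.1307, "normalization_std": 0.3081},
)

experiment = SketchExperiment("mnist-hand-written-digits-classification")
for num_hidden_channel in [2, 5, 10, 20, 32]:
    trial = SketchTrial(f"cnn-training-job-{num_hidden_channel}-hidden-channels")
    trial.add_trial_component(preprocessing)
    trial.add_trial_component(
        SketchTrialComponent("Training", {"hidden_channels": num_hidden_channel})
    )
    experiment.trials.append(trial)

print(len(experiment.trials), "trials share one preprocessing component")
```

Because the same component object is attached to every trial, the preprocessing parameters only need to be logged once.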

In [51]:
from sagemaker.pytorch import PyTorch, PyTorchModel

In [52]:
hidden_channel_trial_name_map = {}

If you want to run the following five training jobs in parallel, you may need to increase your resource limit. Here we run them sequentially.

In [53]:
preprocessing_trial_component = tracker.trial_component

In [54]:
for i, num_hidden_channel in enumerate([2, 5, 10, 20, 32]):
    # Create trial
    trial_name = f"cnn-training-job-{num_hidden_channel}-hidden-channels-{int(time.time())}"
    cnn_trial = Trial.create(
        trial_name=trial_name,
        experiment_name=mnist_experiment.experiment_name,
        sagemaker_boto_client=sm,
    )
    hidden_channel_trial_name_map[num_hidden_channel] = trial_name

    # Associate the preprocessing trial component with the current trial
    cnn_trial.add_trial_component(preprocessing_trial_component)

    # All input configurations, parameters, and metrics specified in
    # the estimator definition are automatically tracked
    estimator = PyTorch(
        py_version="py3",
        entry_point="./mnist.py",
        role=role,
        sagemaker_session=sagemaker.Session(sagemaker_client=sm),
        framework_version="1.1.0",
        instance_count=1,
        instance_type="ml.c4.xlarge",
        hyperparameters={
            "epochs": 2,
            "backend": "gloo",
            "hidden_channels": num_hidden_channel,
            "dropout": 0.2,
            "kernel_size": 5,
            "optimizer": "sgd",
        },
        metric_definitions=[
            {"Name": "train:loss", "Regex": "Train Loss: (.*?);"},
            {"Name": "test:loss", "Regex": "Test Average loss: (.*?),"},
            {"Name": "test:accuracy", "Regex": "Test Accuracy: (.*?)%;"},
        ],
        enable_sagemaker_metrics=True,
    )

    cnn_training_job_name = "cnn-training-job-{}".format(int(time.time()))

    # Associate the estimator with the Experiment and Trial
    estimator.fit(
        inputs={"training": inputs},
        job_name=cnn_training_job_name,
        experiment_config={
            "TrialName": cnn_trial.trial_name,
            "TrialComponentDisplayName": "Training",
        },
        wait=True,
    )

    # Wait two seconds before dispatching the next training job
    time.sleep(2)

INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.
INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.
INFO:sagemaker:Creating training-job with name: cnn-training-job-1660138330


2022-08-10 13:32:11 Starting - Starting the training job...
2022-08-10 13:32:34 Starting - Preparing the instances for trainingProfilerReport-1660138331: InProgress
.........
2022-08-10 13:33:55 Downloading - Downloading input data...
2022-08-10 13:34:35 Training - Downloading the training image.....
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2022-08-10 13:35:24,388 sagemaker-containers INFO     Imported framework sagemaker_pytorch_container.training
2022-08-10 13:35:24,391 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
2022-08-10 13:35:24,403 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2022-08-10 13:35:24,404 sagemaker_pytorch_container.training INFO     Invoking user training script.
2022-08-10 13:35:24,865 sagemaker-containers INFO     Module mnist does not provide a setup.py.

INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.
INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.
INFO:sagemaker:Creating training-job with name: cnn-training-job-1660138619


2022-08-10 13:37:00 Starting - Starting the training job...
2022-08-10 13:37:24 Starting - Preparing the instances for trainingProfilerReport-1660138620: InProgress
.........
2022-08-10 13:38:44 Downloading - Downloading input data...
2022-08-10 13:39:24 Training - Training image download completed. Training in progress..
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2022-08-10 13:39:23,269 sagemaker-containers INFO     Imported framework sagemaker_pytorch_container.training
2022-08-10 13:39:23,272 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
2022-08-10 13:39:23,283 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2022-08-10 13:39:23,284 sagemaker_pytorch_container.training INFO     Invoking user training script.
2022-08-10 13:39:23,622 sagemaker-containers INFO     Module mnist does not provid

INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.
INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.
INFO:sagemaker:Creating training-job with name: cnn-training-job-1660138845


2022-08-10 13:40:45 Starting - Starting the training job...
2022-08-10 13:41:09 Starting - Preparing the instances for trainingProfilerReport-1660138845: InProgress
.........
2022-08-10 13:42:29 Downloading - Downloading input data...
2022-08-10 13:43:10 Training - Training image download completed. Training in progress.
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2022-08-10 13:43:06,056 sagemaker-containers INFO     Imported framework sagemaker_pytorch_container.training
2022-08-10 13:43:06,060 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
2022-08-10 13:43:06,077 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2022-08-10 13:43:06,078 sagemaker_pytorch_container.training INFO     Invoking user training script.
2022-08-10 13:43:06,488 sagemaker-containers INFO     Module mnist does not provide

INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.
INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.
INFO:sagemaker:Creating training-job with name: cnn-training-job-1660139101


2022-08-10 13:45:02 Starting - Starting the training job...
2022-08-10 13:45:27 Starting - Preparing the instances for trainingProfilerReport-1660139101: InProgress
......
2022-08-10 13:46:31 Downloading - Downloading input data.....
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2022-08-10 13:47:15,343 sagemaker-containers INFO     Imported framework sagemaker_pytorch_container.training
2022-08-10 13:47:15,347 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
2022-08-10 13:47:15,358 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2022-08-10 13:47:15,359 sagemaker_pytorch_container.training INFO     Invoking user training script.
2022-08-10 13:47:15,706 sagemaker-containers INFO     Module mnist does not provide a setup.py.
Generating setup.py
2022-08-10 13:47:15,706 sagemaker-con

INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.
INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.
INFO:sagemaker:Creating training-job with name: cnn-training-job-1660139359


2022-08-10 13:49:19 Starting - Starting the training job...
2022-08-10 13:49:43 Starting - Preparing the instances for trainingProfilerReport-1660139359: InProgress
.........
2022-08-10 13:51:03 Downloading - Downloading input data...
2022-08-10 13:51:43 Training - Training image download completed. Training in progress.
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2022-08-10 13:51:38,014 sagemaker-containers INFO     Imported framework sagemaker_pytorch_container.training
2022-08-10 13:51:38,018 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
2022-08-10 13:51:38,039 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2022-08-10 13:51:38,040 sagemaker_pytorch_container.training INFO     Invoking user training script.
2022-08-10 13:51:38,417 sagemaker-containers INFO     Module mnist does not provide

### Compare the model training runs for an experiment

Now we use the analytics capabilities of the Experiments SDK to query and compare the training runs and identify the best model produced by our experiment. You can retrieve trial components by using a search expression.
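The metrics compared here exist because of the `metric_definitions` passed to the estimator earlier: each regex is matched against the training job's log output, and the captured group becomes the metric value. The sketch below applies those same three regexes with Python's `re` module. The two log lines are hypothetical, since the output of `mnist.py` is not shown in full in this notebook, and `extract_metrics` is an illustrative helper rather than a SageMaker function.

```python
import re

# The same definitions passed to the PyTorch estimator above
metric_definitions = [
    {"Name": "train:loss", "Regex": "Train Loss: (.*?);"},
    {"Name": "test:loss", "Regex": "Test Average loss: (.*?),"},
    {"Name": "test:accuracy", "Regex": "Test Accuracy: (.*?)%;"},
]

# Hypothetical log lines in a shape these regexes would match
sample_log_lines = [
    "Train Epoch: 1 [6400/60000 (11%)], Train Loss: 2.3012;",
    "Test Average loss: 0.0912, Test Accuracy: 97%;",
]


def extract_metrics(lines, definitions):
    """Return (metric name, value) pairs for every regex that matches a line."""
    found = []
    for line in lines:
        for definition in definitions:
            match = re.search(definition["Regex"], line)
            if match:
                found.append((definition["Name"], float(match.group(1))))
    return found


extracted = extract_metrics(sample_log_lines, metric_definitions)
for name, value in extracted:
    print(name, value)
```

The non-greedy group `(.*?)` stops at the first delimiter that follows it (`;`, `,`, or `%;`), which is why the training script has to print those delimiters exactly.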

### Some Simple Analyses

In [55]:
search_expression = {
    "Filters": [
        {
            "Name": "DisplayName",
            "Operator": "Equals",
            "Value": "Training",
        }
    ],
}
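Both search expressions in this notebook have the same shape: a `Filters` list whose entries name a field, an operator, and a value. As a small sketch, the two helpers below (hypothetical names, not part of any SDK) build that dictionary so the training filter here and the lineage filter used later in the notebook can be written the same way.

```python
def equals_filter(name, value):
    """Build one filter clause that matches when `name` equals `value`."""
    return {"Name": name, "Operator": "Equals", "Value": value}


def build_search_expression(*filters):
    """Wrap one or more filter clauses in the top-level search expression."""
    return {"Filters": list(filters)}


# Same expression as the cell above: all components displayed as "Training"
training_components = build_search_expression(
    equals_filter("DisplayName", "Training")
)

# Same shape as the lineage query later on; the trial name is the one
# recorded in this notebook's output for the 2-hidden-channel run
lineage_components = build_search_expression(
    equals_filter("Parents.TrialName", "cnn-training-job-2-hidden-channels-1660138330")
)

print(training_components)
print(lineage_components)
```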

In [56]:
trial_component_analytics = ExperimentAnalytics(
    sagemaker_session=Session(sess, sm),
    experiment_name=mnist_experiment.experiment_name,
    search_expression=search_expression,
    sort_by="metrics.test:accuracy.max",
    sort_order="Descending",
    metric_names=["test:accuracy"],
    parameter_names=["hidden_channels", "epochs", "dropout", "optimizer"],
)

In [57]:
trial_component_analytics.dataframe()

Unnamed: 0,TrialComponentName,DisplayName,SourceArn,dropout,epochs,hidden_channels,optimizer,test:accuracy - Min,test:accuracy - Max,test:accuracy - Avg,...,test:accuracy - Last,test:accuracy - Count,training - MediaType,training - Value,SageMaker.DebugHookOutput - MediaType,SageMaker.DebugHookOutput - Value,SageMaker.ModelArtifact - MediaType,SageMaker.ModelArtifact - Value,Trials,Experiments
0,cnn-training-job-1660138845-aws-training-job,Training,arn:aws:sagemaker:us-west-2:706553727873:train...,0.2,2.0,10.0,"""sgd""",95.0,97.0,96.0,...,97.0,2,,s3://sagemaker-us-west-2-706553727873/DEMO-mnist,,s3://sagemaker-us-west-2-706553727873/,,s3://sagemaker-us-west-2-706553727873/cnn-trai...,[cnn-training-job-10-hidden-channels-1660138845],[mnist-hand-written-digits-classification-1660...
1,cnn-training-job-1660139101-aws-training-job,Training,arn:aws:sagemaker:us-west-2:706553727873:train...,0.2,2.0,20.0,"""sgd""",96.0,97.0,96.5,...,97.0,2,,s3://sagemaker-us-west-2-706553727873/DEMO-mnist,,s3://sagemaker-us-west-2-706553727873/,,s3://sagemaker-us-west-2-706553727873/cnn-trai...,[cnn-training-job-20-hidden-channels-1660139100],[mnist-hand-written-digits-classification-1660...
2,cnn-training-job-1660139359-aws-training-job,Training,arn:aws:sagemaker:us-west-2:706553727873:train...,0.2,2.0,32.0,"""sgd""",95.0,97.0,96.0,...,97.0,2,,s3://sagemaker-us-west-2-706553727873/DEMO-mnist,,s3://sagemaker-us-west-2-706553727873/,,s3://sagemaker-us-west-2-706553727873/cnn-trai...,[cnn-training-job-32-hidden-channels-1660139359],[mnist-hand-written-digits-classification-1660...
3,cnn-training-job-1660138330-aws-training-job,Training,arn:aws:sagemaker:us-west-2:706553727873:train...,0.2,2.0,2.0,"""sgd""",95.0,97.0,96.0,...,97.0,2,,s3://sagemaker-us-west-2-706553727873/DEMO-mnist,,s3://sagemaker-us-west-2-706553727873/,,s3://sagemaker-us-west-2-706553727873/cnn-trai...,[cnn-training-job-2-hidden-channels-1660138330],[mnist-hand-written-digits-classification-1660...
4,cnn-training-job-1660138619-aws-training-job,Training,arn:aws:sagemaker:us-west-2:706553727873:train...,0.2,2.0,5.0,"""sgd""",94.0,96.0,95.0,...,96.0,2,,s3://sagemaker-us-west-2-706553727873/DEMO-mnist,,s3://sagemaker-us-west-2-706553727873/,,s3://sagemaker-us-west-2-706553727873/cnn-trai...,[cnn-training-job-5-hidden-channels-1660138619],[mnist-hand-written-digits-classification-1660...


To isolate and measure the impact of the number of hidden channels on model accuracy, we vary the number of hidden channels and keep the other hyperparameters fixed.

Next let's look at an example of tracing the lineage of a model by accessing the data tracked by SageMaker Experiments for the `cnn-training-job-2-hidden-channels` trial.

In [58]:
lineage_table = ExperimentAnalytics(
    sagemaker_session=Session(sess, sm),
    search_expression={
        "Filters": [
            {
                "Name": "Parents.TrialName",
                "Operator": "Equals",
                "Value": hidden_channel_trial_name_map[2],
            }
        ]
    },
    sort_by="CreationTime",
    sort_order="Ascending",
)

In [59]:
lineage_table.dataframe()

Unnamed: 0,TrialComponentName,DisplayName,normalization_mean,normalization_std,mnist-dataset - MediaType,mnist-dataset - Value,Trials,Experiments,SourceArn,SageMaker.ImageUri,...,train:loss - Avg,train:loss - StdDev,train:loss - Last,train:loss - Count,training - MediaType,training - Value,SageMaker.DebugHookOutput - MediaType,SageMaker.DebugHookOutput - Value,SageMaker.ModelArtifact - MediaType,SageMaker.ModelArtifact - Value
0,TrialComponent-2022-08-10-133210-yikj,Preprocessing,0.1307,0.3081,s3/uri,s3://sagemaker-us-west-2-706553727873/DEMO-mnist,[cnn-training-job-10-hidden-channels-166013884...,[mnist-hand-written-digits-classification-1660...,,,...,,,,,,,,,,
1,cnn-training-job-1660138330-aws-training-job,Training,,,,,[cnn-training-job-2-hidden-channels-1660138330],[mnist-hand-written-digits-classification-1660...,arn:aws:sagemaker:us-west-2:706553727873:train...,520713654638.dkr.ecr.us-west-2.amazonaws.com/s...,...,0.456703,0.352488,0.157259,18.0,,s3://sagemaker-us-west-2-706553727873/DEMO-mnist,,s3://sagemaker-us-west-2-706553727873/,,s3://sagemaker-us-west-2-706553727873/cnn-trai...


## Push best training job model to model registry
Now we take the best model and push it to the [model registry](https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry.html).

### Step 1: Create a model package group.

In [60]:
import time

model_package_group_name = "mnist-handwritten-digit-claissification" + str(round(time.time()))
model_package_group_input_dict = {
    "ModelPackageGroupName": model_package_group_name,
    "ModelPackageGroupDescription": "Sample model package group",
}

create_model_package_group_response = sm.create_model_package_group(
    **model_package_group_input_dict
)
model_package_arn = create_model_package_group_response["ModelPackageGroupArn"]

print(f"ModelPackageGroup Arn : {model_package_arn}")

ModelPackageGroup Arn : arn:aws:sagemaker:us-west-2:706553727873:model-package-group/mnist-handwritten-digit-claissification1660139647


In [61]:
model_package_arn

'arn:aws:sagemaker:us-west-2:706553727873:model-package-group/mnist-handwritten-digit-claissification1660139647'

### Step 2: Get the best model training job from SageMaker experiments API

In [62]:
best_trial_component_name = trial_component_analytics.dataframe().iloc[0]["TrialComponentName"]
best_trial_component = TrialComponent.load(best_trial_component_name)

In [63]:
best_trial_component.trial_component_name

'cnn-training-job-1660138845-aws-training-job'
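Taking `iloc[0]` works because the analytics query sorts by `metrics.test:accuracy.max` in descending order. In the results table above, however, four of the five components share a maximum accuracy of 97.0, so the top row is decided by row order rather than by accuracy alone. The sketch below reproduces that selection in plain Python using values copied from the table, and then shows one possible tie-break on average accuracy. The tie-break is an assumption for illustration, not something the notebook does.

```python
# Values copied from the analytics table above, in the order shown there
rows = [
    {"name": "cnn-training-job-1660138845-aws-training-job", "hidden_channels": 10, "max": 97.0, "avg": 96.0},
    {"name": "cnn-training-job-1660139101-aws-training-job", "hidden_channels": 20, "max": 97.0, "avg": 96.5},
    {"name": "cnn-training-job-1660139359-aws-training-job", "hidden_channels": 32, "max": 97.0, "avg": 96.0},
    {"name": "cnn-training-job-1660138330-aws-training-job", "hidden_channels": 2, "max": 97.0, "avg": 96.0},
    {"name": "cnn-training-job-1660138619-aws-training-job", "hidden_channels": 5, "max": 96.0, "avg": 95.0},
]

# Sort by maximum accuracy only; Python's sort is stable, so tied rows keep
# their incoming order and the first tied row wins
by_max = sorted(rows, key=lambda r: r["max"], reverse=True)
best_by_max = by_max[0]

# Illustrative alternative: break ties on average accuracy
by_max_then_avg = sorted(rows, key=lambda r: (r["max"], r["avg"]), reverse=True)
best_with_tie_break = by_max_then_avg[0]

print("max only:      ", best_by_max["name"])
print("max, then avg: ", best_with_tie_break["name"])
```

With only two epochs per run, the accuracy differences are small, so a secondary criterion or a longer training run may be worth considering before treating one model as the best.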

### Step 3: Register the best model.
By default, the model is registered with the `approval_status` set to `PendingManualApproval`. You can then use the API to approve the model manually, based on whatever criteria you set for model evaluation.

In [64]:
# create model object
model_data = best_trial_component.output_artifacts["SageMaker.ModelArtifact"].value
env = {
    "hidden_channels": str(int(best_trial_component.parameters["hidden_channels"])),
    "dropout": str(best_trial_component.parameters["dropout"]),
    "kernel_size": str(int(best_trial_component.parameters["kernel_size"])),
}
model = PyTorchModel(
    model_data,
    role,
    "./mnist.py",
    py_version="py3",
    env=env,
    sagemaker_session=sagemaker.Session(sagemaker_client=sm),
    framework_version="1.1.0",
    name=best_trial_component.trial_component_name,
)
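The `str(int(...))` conversions in the `env` dictionary matter because the tracked numeric parameters come back as floats (the analytics table shows `hidden_channels` as `10.0`), while environment variable values must be strings. The sketch below shows the difference using a hypothetical `parameters` dictionary. The `kernel_size` value of `5.0` is an assumption based on the hyperparameter passed to the estimator, and whether the inference script parses these values with `int()` is also an assumption.

```python
# Hypothetical parameter values in the float form the tracked parameters take
parameters = {"hidden_channels": 10.0, "dropout": 0.2, "kernel_size": 5.0}

# Same conversion as the cell above: integers are cast before stringifying
env = {
    "hidden_channels": str(int(parameters["hidden_channels"])),
    "dropout": str(parameters["dropout"]),
    "kernel_size": str(int(parameters["kernel_size"])),
}

# Without the int() cast the string keeps its decimal point
naive_value = str(parameters["hidden_channels"])

# A consumer that calls int() on the string accepts "10" but rejects "10.0"
parsed_ok = int(env["hidden_channels"])
try:
    int(naive_value)
    naive_parses = True
except ValueError:
    naive_parses = False

print(env)
print("naive string:", naive_value, "parses with int():", naive_parses)
```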

In [65]:
model_package = model.register(
    content_types=["*"],
    response_types=["application/json"],
    inference_instances=["ml.m5.xlarge"],
    transform_instances=["ml.m5.xlarge"],
    description="MNIST image classification model",
    approval_status="PendingManualApproval",
    model_package_group_name=model_package_group_name,
)

### Step 4: Verify model has been registered.

In [66]:
sm.describe_model_package_group(ModelPackageGroupName=model_package_group_name)

{'ModelPackageGroupName': 'mnist-handwritten-digit-claissification1660139647',
 'ModelPackageGroupArn': 'arn:aws:sagemaker:us-west-2:706553727873:model-package-group/mnist-handwritten-digit-claissification1660139647',
 'ModelPackageGroupDescription': 'Sample model package group',
 'CreationTime': datetime.datetime(2022, 8, 10, 13, 54, 7, 26000, tzinfo=tzlocal()),
 'CreatedBy': {'UserProfileArn': 'arn:aws:sagemaker:us-west-2:706553727873:user-profile/d-wywtbp4ylr4f/sagemaker-new-features',
  'UserProfileName': 'sagemaker-new-features',
  'DomainId': 'd-wywtbp4ylr4f'},
 'ModelPackageGroupStatus': 'Completed',
 'ResponseMetadata': {'RequestId': 'dfb34b20-9ddc-4464-8f53-6c0b62a21e8c',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'dfb34b20-9ddc-4464-8f53-6c0b62a21e8c',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '539',
   'date': 'Wed, 10 Aug 2022 13:54:07 GMT'},
  'RetryAttempts': 0}}

In [67]:
## check model version
sm.list_model_packages(ModelPackageGroupName=model_package_group_name)

{'ModelPackageSummaryList': [{'ModelPackageGroupName': 'mnist-handwritten-digit-claissification1660139647',
   'ModelPackageVersion': 1,
   'ModelPackageArn': 'arn:aws:sagemaker:us-west-2:706553727873:model-package/mnist-handwritten-digit-claissification1660139647/1',
   'ModelPackageDescription': 'MNIST image classification model',
   'CreationTime': datetime.datetime(2022, 8, 10, 13, 54, 8, 172000, tzinfo=tzlocal()),
   'ModelPackageStatus': 'Completed',
   'ModelApprovalStatus': 'PendingManualApproval'}],
 'ResponseMetadata': {'RequestId': '05764160-eb1b-401c-9288-0e333c9132e1',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '05764160-eb1b-401c-9288-0e333c9132e1',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '430',
   'date': 'Wed, 10 Aug 2022 13:54:07 GMT'},
  'RetryAttempts': 0}}

In [68]:
model_package_arn = sm.list_model_packages(ModelPackageGroupName=model_package_group_name)[
    "ModelPackageSummaryList"
][0]["ModelPackageArn"]

In [69]:
### Update the model status to approved
model_package_update_input_dict = {
    "ModelPackageArn": model_package_arn,
    "ModelApprovalStatus": "Approved",
}
model_package_update_response = sm.update_model_package(**model_package_update_input_dict)

## Deploy an endpoint for the latest approved version of the model from model registry

Now we take the best model and deploy it to an endpoint so it is available to perform inference.

In [70]:
from datetime import datetime

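now = datetime.now()
# Use a name other than `time` so the `time` module imported earlier is not shadowed
timestamp = now.strftime("%m-%d-%Y-%H-%M-%S")
print("time:", timestamp)
endpoint_name = f"cnn-mnist-{timestamp}"
endpoint_name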

time: 08-10-2022-13-54-08


'cnn-mnist-08-10-2022-13-54-08'

In [71]:
model_package.deploy(
    initial_instance_count=1, instance_type="ml.m5.xlarge", endpoint_name=endpoint_name
)

INFO:sagemaker:Creating model with name: 1-2022-08-10-13-54-08-692
INFO:sagemaker:Creating endpoint-config with name cnn-mnist-08-10-2022-13-54-08
INFO:sagemaker:Creating endpoint with name cnn-mnist-08-10-2022-13-54-08


----!

## Cleanup

Once we're done, clean up the endpoint to prevent unnecessary billing.

In [72]:
sagemaker_client = boto3.client("sagemaker", region_name=region)
# Delete endpoint
sagemaker_client.delete_endpoint(EndpointName=endpoint_name)

{'ResponseMetadata': {'RequestId': '2ec2af4e-ca50-4e8d-bcba-380b0149930c',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '2ec2af4e-ca50-4e8d-bcba-380b0149930c',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '0',
   'date': 'Wed, 10 Aug 2022 13:56:11 GMT'},
  'RetryAttempts': 0}}

In [73]:
sagemaker_client.delete_endpoint_config(EndpointConfigName=endpoint_name)

{'ResponseMetadata': {'RequestId': '973ef714-ab26-4a8a-b588-99bec2812683',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '973ef714-ab26-4a8a-b588-99bec2812683',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '0',
   'date': 'Wed, 10 Aug 2022 13:56:11 GMT'},
  'RetryAttempts': 0}}

Trial components can exist independently of trials and experiments. You might want to keep them if you plan on further exploration. If not, delete all experiment artifacts.

In [74]:
mnist_experiment.delete_all(action="--force")

## Contact
Submit any questions or issues to https://github.com/aws/sagemaker-experiments/issues or mention @aws/sagemakerexperimentsadmin 