# SageMaker Experiments

This notebook shows how to use the SageMaker Experiments Python SDK to organize, track, compare, and evaluate your machine learning (ML) model training experiments.

You can track artifacts for experiments, including datasets, algorithms, hyperparameters, and metrics. Experiments run on SageMaker, such as SageMaker Autopilot jobs and training jobs, are tracked automatically. You can also track artifacts for additional steps in an ML workflow that come before or after model training, such as data preprocessing or post-training model evaluation.

The APIs also let you search and browse your current and past experiments, compare experiments, and identify best performing models.

### Setup 

Uncomment the command below (remove the `#`) to install `sagemaker-experiments`.

In [2]:
#!pip install sagemaker-experiments

### Imports 

In [3]:
import pandas as pd
import numpy as np
import boto3
import time

from sagemaker.analytics import ExperimentAnalytics
from sagemaker.sklearn.model import SKLearnModel
from sagemaker.sklearn.estimator import SKLearn
from sagemaker import get_execution_role
from sagemaker.session import Session
import sagemaker

from smexperiments.trial_component import TrialComponent
from smexperiments.experiment import Experiment
from smexperiments.tracker import Tracker
from smexperiments.trial import Trial

### Essentials

In [4]:
session = sagemaker.Session()
role = get_execution_role()

INFO:botocore.utils:IMDS ENDPOINT: http://169.254.169.254/


In [5]:
boto_session = boto3.Session()
sagemaker_boto_client = boto_session.client('sagemaker')

### Upload Data to S3

In [6]:
WORK_DIRECTORY = '../DATA'

train_data_s3 = session.upload_data(f'{WORK_DIRECTORY}/train', key_prefix='sklearn-clf/train')
test_data_s3 = session.upload_data(f'{WORK_DIRECTORY}/test', key_prefix='sklearn-clf/test')

### Create a Processing Tracker 

In [7]:
with Tracker.create(display_name='Preprocessing', sagemaker_boto_client=sagemaker_boto_client) as processing_tracker:
    processing_tracker.log_parameters({
        'scaling-strategy': 'min-max',
    })
    # Log the S3 URI of the training data we just uploaded
    processing_tracker.log_input(name='training-data',
                                 media_type='s3/uri',
                                 value=train_data_s3)

### Create an Experiment

Create an experiment to track all the model training iterations. Experiments are a good way to organize your data science work. You can create an experiment to organize all the model development work for (1) a business use case you are addressing (e.g. an experiment named “customer churn prediction”), (2) a data science team that owns the experiment (e.g. an experiment named “marketing analytics experiment”), or (3) a specific data science and ML project. Think of an experiment as a “folder” for organizing your “files”.

In [9]:
experiment = Experiment.create(
    experiment_name='KNN-Classifier-Experiment', 
    description='Sklearn Classifier Example', 
    sagemaker_boto_client=sagemaker_boto_client)

In [10]:
experiment

Experiment(sagemaker_boto_client=<botocore.client.SageMaker object at 0x7fe29f2bf210>,experiment_name='KNN-Classifier-Experiment',description='Sklearn Classifier Example',tags=None,experiment_arn='arn:aws:sagemaker:us-east-1:892313895307:experiment/knn-classifier-experiment',response_metadata={'RequestId': '04887e92-d8b7-429c-94e1-8eee4bfbb7fa', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '04887e92-d8b7-429c-94e1-8eee4bfbb7fa', 'content-type': 'application/x-amz-json-1.1', 'content-length': '97', 'date': 'Mon, 09 Nov 2020 01:51:33 GMT'}, 'RetryAttempts': 0})

### Create a Trial

Now create a Trial for each training run to track its inputs, parameters, and metrics.<br>
We will also take the TrialComponent from the <b>Processing Tracker</b> we created earlier and add it to each Trial.<br>
This enriches the Trial with the parameters we captured during the data processing stage.<br>
Note that the following code takes a while to run.
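
The relationship described above can be pictured with plain Python objects. This is only a conceptual sketch: `SketchComponent` and `SketchTrial` are hypothetical stand-ins, not classes from `smexperiments`, and no AWS calls are made. It shows a single preprocessing component being attached to several trials, each of which also gets its own training component.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SketchComponent:
    """Hypothetical stand-in for a trial component (one step of a workflow)."""
    display_name: str
    parameters: Dict[str, object] = field(default_factory=dict)


@dataclass
class SketchTrial:
    """Hypothetical stand-in for a trial (one end-to-end training run)."""
    name: str
    components: List[SketchComponent] = field(default_factory=list)

    def add_component(self, component: SketchComponent) -> None:
        self.components.append(component)


# One shared preprocessing step, logged once.
preprocessing = SketchComponent('Preprocessing', {'scaling-strategy': 'min-max'})

sketch_trials = []
for n_neighbors in [3, 7, 9]:
    sketch_trial = SketchTrial(name=f'sklearn-clf-{n_neighbors}')
    sketch_trial.add_component(preprocessing)  # same object in every trial
    sketch_trial.add_component(SketchComponent('Training', {'nneighbors': n_neighbors}))
    sketch_trials.append(sketch_trial)

for sketch_trial in sketch_trials:
    print(sketch_trial.name, [c.display_name for c in sketch_trial.components])
```

Because the preprocessing step is shared rather than copied, it would be expected to show up under every trial when the lineage of any one of them is inspected.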

In [11]:
trial_name_map = {}

In [12]:
candidates = [3, 7, 9] # number of neighbors

In [13]:
processing_trial_component = processing_tracker.trial_component

In [15]:
for n_neighbors in candidates:
    trial_name = f'sklearn-clf-{n_neighbors}-{int(time.time())}'
    trial = Trial.create(
        trial_name=trial_name, 
        experiment_name=experiment.experiment_name,
        sagemaker_boto_client=sagemaker_boto_client,
    )
    trial_name_map[n_neighbors] = trial_name
    
    # Associate the Processing trial component with the current trial
    trial.add_trial_component(processing_trial_component)
    
    estimator = SKLearn(entry_point='train.py',
                        instance_type='ml.m5.large',
                        instance_count=1,
                        framework_version='0.23-1',
                        role=role,
                        hyperparameters={'nneighbors': n_neighbors},
                        metric_definitions=[
                            # Regex applied to the training logs to extract the metric value
                            {'Name': 'test:accuracy', 'Regex': 'Test Accuracy: (.*?)%;'}
                        ],
                        enable_sagemaker_metrics=True  # IMPORTANT: enables SageMaker metrics so test:accuracy is tracked
                        )
                        
    training_job_name = 'training-job-{}'.format(int(time.time()))
    
    # Associate the estimator with an Experiment and a Trial
    estimator.fit(
        inputs={'train': train_data_s3, 'test': test_data_s3}, 
        job_name=training_job_name,
        experiment_config={
            'TrialName': trial.trial_name,
            'TrialComponentDisplayName': 'Training',
        }, 
    )

INFO:botocore.utils:IMDS ENDPOINT: http://169.254.169.254/
INFO:sagemaker.image_uris:Same images used for training and inference. Defaulting to image scope: inference.
INFO:sagemaker:Creating training-job with name: training-job-1604886734


2020-11-09 01:52:14 Starting - Starting the training job...
2020-11-09 01:52:17 Starting - Launching requested ML instances......
2020-11-09 01:53:27 Starting - Preparing the instances for training......
2020-11-09 01:54:17 Downloading - Downloading input data......
2020-11-09 01:55:19 Training - Downloading the training image.........
2020-11-09 01:57:13 Uploading - Uploading generated training model
2020-11-09 01:57:13 Completed - Training job completed
2020-11-09 01:56:59,335 sagemaker-training-toolkit INFO     Imported framework sagemaker_sklearn_container.training
2020-11-09 01:56:59,337 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2020-11-09 01:56:59,346 sagemaker_sklearn_container.training INFO     Invoking user training script.
2020-11-09 01:56:59,649 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2020-11-09 01:57:01,246 sagemaker-training-toolkit INFO     No GPUs 

INFO:botocore.utils:IMDS ENDPOINT: http://169.254.169.254/
INFO:sagemaker.image_uris:Same images used for training and inference. Defaulting to image scope: inference.


Training seconds: 176
Billable seconds: 176


INFO:sagemaker:Creating training-job with name: training-job-1604887047


2020-11-09 01:57:27 Starting - Starting the training job...
2020-11-09 01:57:30 Starting - Launching requested ML instances......
2020-11-09 01:58:44 Starting - Preparing the instances for training...
2020-11-09 01:59:22 Downloading - Downloading input data...
2020-11-09 01:59:50 Training - Downloading the training image......
2020-11-09 02:00:48 Training - Training image download completed. Training in progress.
2020-11-09 02:01:27 Uploading - Uploading generated training model
2020-11-09 02:01:27 Completed - Training job completed


2020-11-09 02:00:48,840 sagemaker-training-toolkit INFO     Imported framework sagemaker_sklearn_container.training
2020-11-09 02:00:48,843 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2020-11-09 02:00:48,851 sagemaker_sklearn_container.training INFO     Invoking user training script.
2020-11-09 02:00:49,132 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2020-11-09 02:00:55,428 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2020-11-09 02:00:55,438 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2020-11-09 02:00:55,447 sagemaker-training-toolkit INFO     Invoking user script

Training Env:

{
    "additional_framework_parameters": {},
    "channel_input_dirs": {
        "test": "/opt/ml/input/data/test",
        "train": "/opt/ml/input/data/train"
    },
    "current_h

INFO:sagemaker.image_uris:Same images used for training and inference. Defaulting to image scope: inference.
INFO:sagemaker:Creating training-job with name: training-job-1604887300


2020-11-09 02:01:40 Starting - Starting the training job...
2020-11-09 02:01:43 Starting - Launching requested ML instances......
2020-11-09 02:02:58 Starting - Preparing the instances for training...
2020-11-09 02:03:39 Downloading - Downloading input data......
2020-11-09 02:04:15 Training - Downloading the training image...
2020-11-09 02:05:10 Uploading - Uploading generated training model
2020-11-09 02:05:04,600 sagemaker-training-toolkit INFO     Imported framework sagemaker_sklearn_container.training
2020-11-09 02:05:04,602 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2020-11-09 02:05:04,611 sagemaker_sklearn_container.training INFO     Invoking user training script.
2020-11-09 02:05:04,912 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2020-11-09 02:05:06,408 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2020-11-09 02:05:
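
The `metric_definitions` entry passed to the estimator tells SageMaker which regular expression to apply to the training logs. As a quick local check, the same pattern can be tried on a sample line with Python's `re` module. The log line below is an assumption about what `train.py` prints (the script is not shown in this notebook); only the regex itself is taken from the notebook.

```python
import re

# Regex copied from the metric_definitions entry in the estimator.
ACCURACY_REGEX = r'Test Accuracy: (.*?)%;'

# Assumed log line; the real format depends on what train.py prints.
sample_log_line = 'Test Accuracy: 97.7273%;'

match = re.search(ACCURACY_REGEX, sample_log_line)
extracted_accuracy = float(match.group(1)) if match else None
print(extracted_accuracy)

# A line without the trailing "%;" is not captured by this pattern.
no_match = re.search(ACCURACY_REGEX, 'Test Accuracy: 97.7273')
print(no_match)
```

The pattern requires a literal `%;` after the number, so the training script has to print the metric in exactly that shape for it to be picked up.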

### Compare Trials

Now we will use the analytics capabilities of the SageMaker Python SDK to query and compare the training runs and identify the best model produced by our experiment. You can retrieve trial components by using a search expression.



In [16]:
search_expression = {
    "Filters":[
        {
            "Name": "DisplayName",
            "Operator": "Equals",
            "Value": "Training",
        }
    ],
}

In [17]:
trial_component_analytics = ExperimentAnalytics(
                                sagemaker_session=Session(boto_session, sagemaker_boto_client), 
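                                # Name of an experiment from an earlier run; pass experiment.experiment_name
                                # instead to query the experiment created above.
                                experiment_name="Sklearn-Knn-Classifier-Experiment-6",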
                                search_expression=search_expression,
                                sort_by="metrics.test:accuracy.max",
                                sort_order="Descending",
                                metric_names=['test:accuracy'],
                                parameter_names=['nneighbors']
                            )

In [18]:
trial_component_analytics.dataframe()

Unnamed: 0,TrialComponentName,DisplayName,SourceArn,nneighbors,test:accuracy - Min,test:accuracy - Max,test:accuracy - Avg,test:accuracy - StdDev,test:accuracy - Last,test:accuracy - Count,test - MediaType,test - Value,train - MediaType,train - Value,SageMaker.DebugHookOutput - MediaType,SageMaker.DebugHookOutput - Value,SageMaker.ModelArtifact - MediaType,SageMaker.ModelArtifact - Value,Trials,Experiments
0,training-job-1604608548-aws-training-job,Training,arn:aws:sagemaker:us-east-1:892313895307:train...,3.0,97.7273,97.7273,97.7273,0.0,97.7273,1,,s3://sagemaker-us-east-1-892313895307/sklearn-...,,s3://sagemaker-us-east-1-892313895307/sklearn-...,,s3://sagemaker-us-east-1-892313895307/,,s3://sagemaker-us-east-1-892313895307/training...,[sklearn-clf-3-1604608547],[Sklearn-Knn-Classifier-Experiment-6]
1,training-job-1604608802-aws-training-job,Training,arn:aws:sagemaker:us-east-1:892313895307:train...,7.0,93.1818,93.1818,93.1818,0.0,93.1818,1,,s3://sagemaker-us-east-1-892313895307/sklearn-...,,s3://sagemaker-us-east-1-892313895307/sklearn-...,,s3://sagemaker-us-east-1-892313895307/,,s3://sagemaker-us-east-1-892313895307/training...,[sklearn-clf-7-1604608801],[Sklearn-Knn-Classifier-Experiment-6]
2,training-job-1604609057-aws-training-job,Training,arn:aws:sagemaker:us-east-1:892313895307:train...,9.0,90.9091,90.9091,90.9091,0.0,90.9091,1,,s3://sagemaker-us-east-1-892313895307/sklearn-...,,s3://sagemaker-us-east-1-892313895307/sklearn-...,,s3://sagemaker-us-east-1-892313895307/,,s3://sagemaker-us-east-1-892313895307/training...,[sklearn-clf-9-1604609056],[Sklearn-Knn-Classifier-Experiment-6]
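
To make the ranking explicit, the sketch below rebuilds three columns of the table above as plain Python dictionaries and picks the row with the highest accuracy. The values are copied from that output; the variable names are illustrative. In the notebook itself the equivalent step on the real DataFrame would be something like `df.sort_values('test:accuracy - Max', ascending=False).iloc[0]`.

```python
# Values copied from the analytics output shown above.
result_rows = [
    {'TrialComponentName': 'training-job-1604608548-aws-training-job',
     'nneighbors': 3.0, 'test:accuracy - Max': 97.7273},
    {'TrialComponentName': 'training-job-1604608802-aws-training-job',
     'nneighbors': 7.0, 'test:accuracy - Max': 93.1818},
    {'TrialComponentName': 'training-job-1604609057-aws-training-job',
     'nneighbors': 9.0, 'test:accuracy - Max': 90.9091},
]

# Rank explicitly instead of relying on the order returned by the query.
ranked_rows = sorted(result_rows,
                     key=lambda row: row['test:accuracy - Max'],
                     reverse=True)
best_row = ranked_rows[0]
print(best_row['TrialComponentName'], int(best_row['nneighbors']))
```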


### Track Model Lineage
To isolate and measure the impact of the number of neighbors on model accuracy, we vary the number of neighbors and keep the other hyperparameters fixed.

Next, let's look at an example of tracing the lineage of a model by accessing the data tracked by SageMaker Experiments for one of the trials created above.

In [19]:
trial_name_map

{3: 'sklearn-clf-3-1604886734',
 7: 'sklearn-clf-7-1604887047',
 9: 'sklearn-clf-9-1604887300'}

In [20]:
n_neighbors = 3 # PICK candidate for lineage

lineage_table = ExperimentAnalytics(
    sagemaker_session=Session(boto_session, sagemaker_boto_client), 
    search_expression={
        "Filters":[{
            "Name": "Parents.TrialName",
            "Operator": "Equals",
            "Value": trial_name_map[n_neighbors]
        }]
    },
    sort_by="CreationTime",
    sort_order="Ascending",
)

In [21]:
lineage_table.dataframe()

Unnamed: 0,TrialComponentName,DisplayName,scaling-strategy,training-data - MediaType,training-data - Value,Trials,Experiments,SourceArn,SageMaker.ImageUri,SageMaker.InstanceCount,...,test:accuracy - Last,test:accuracy - Count,test - MediaType,test - Value,train - MediaType,train - Value,SageMaker.DebugHookOutput - MediaType,SageMaker.DebugHookOutput - Value,SageMaker.ModelArtifact - MediaType,SageMaker.ModelArtifact - Value
0,TrialComponent-2020-11-09-015014-wdla,Preprocessing,min-max,s3/uri,s3://sagemaker-us-east-1-892313895307/sklearn-...,"[sklearn-clf-9-1604887300, sklearn-clf-7-16048...","[KNN-Classifier-Experiment, KNN-Classifier-Exp...",,,,...,,,,,,,,,,
1,training-job-1604886734-aws-training-job,Training,,,,[sklearn-clf-3-1604886734],[KNN-Classifier-Experiment],arn:aws:sagemaker:us-east-1:892313895307:train...,683313688378.dkr.ecr.us-east-1.amazonaws.com/s...,1.0,...,84.7205,1.0,,s3://sagemaker-us-east-1-892313895307/sklearn-...,,s3://sagemaker-us-east-1-892313895307/sklearn-...,,s3://sagemaker-us-east-1-892313895307/,,s3://sagemaker-us-east-1-892313895307/training...


### Deploy Best Model

In [22]:
# First model is best since sorted by accuracy at the analytics step above 
best_trial_component_name = trial_component_analytics.dataframe().iloc[0]['TrialComponentName']
best_trial_component = TrialComponent.load(best_trial_component_name)

model_data = best_trial_component.output_artifacts['SageMaker.ModelArtifact'].value
model_data

's3://sagemaker-us-east-1-892313895307/training-job-1604608548/output/model.tar.gz'

In [23]:
env = {'n_neighbors': str(int(best_trial_component.parameters['nneighbors']))}
env

{'n_neighbors': '3'}
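
Values in the `env` dictionary reach the container as environment variables, which are always strings. A minimal sketch of how an entry-point script could read the value back is shown here. The helper name `read_n_neighbors` and the default of 5 are assumptions for illustration; the actual `train.py` is not shown in this notebook and may handle this differently.

```python
import os


def read_n_neighbors(default: int = 5) -> int:
    """Read n_neighbors from the environment, falling back to a default.

    Hypothetical helper: the variable name matches the key in `env`,
    but the real entry-point script may do this differently.
    """
    raw_value = os.environ.get('n_neighbors')
    return int(raw_value) if raw_value is not None else default


# Simulate what the container would see for env = {'n_neighbors': '3'}.
os.environ['n_neighbors'] = '3'
print(read_n_neighbors())
```

This is also why the cell above converts the tracked parameter with `str(int(...))`: the parameter comes back as the float `3.0`, and `int('3.0')` would raise a `ValueError` on the reading side.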

In [24]:
model = SKLearnModel(entry_point='train.py',
                     name=best_trial_component.trial_component_name,
                     model_data=model_data,
                     framework_version='0.23-1',
                     role=role,
                     env=env, 
                    )

In [25]:
model.deploy(instance_type='ml.m5.large', 
             initial_instance_count=1)

INFO:botocore.utils:IMDS ENDPOINT: http://169.254.169.254/
INFO:sagemaker.image_uris:Same images used for training and inference. Defaulting to image scope: inference.
INFO:sagemaker:Creating model with name: training-job-1604608548-aws-training-job
INFO:sagemaker:Creating endpoint with name training-job-1604608548-aws-training-jo-2020-11-09-02-07-20-136


---------------!

<sagemaker.sklearn.model.SKLearnPredictor at 0x7fe2973f6f10>