## Use SageMaker Experiments to track and compare multiple trials

In this notebook, we show how data scientists can use SageMaker Experiments to track and organize their machine learning experiments.

### The Machine Learning problem

We will be predicting house values for the California districts. The problem is a classic regression problem and we will use the [California Housing dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_california_housing.html) obtained from [Scikit-Learn](https://scikit-learn.org/stable/datasets/real_world.html), which was originally published in:
> Pace, R. Kelley, and Ronald Barry. "Sparse spatial auto-regressions." Statistics & Probability Letters 33.3 (1997): 291-297.

The target variable is the house value for California districts, expressed in hundreds of thousands of dollars (`$100,000`).
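
As a quick illustration of the unit, the sketch below converts a few target values into dollars. The values are made up for the example, and `to_dollars` is a hypothetical helper, not part of any library.

```python
# One unit of the target corresponds to $100,000.
UNIT_IN_DOLLARS = 100_000


def to_dollars(target_value: float) -> float:
    """Convert a target expressed in hundreds of thousands of dollars to dollars."""
    return target_value * UNIT_IN_DOLLARS


example_targets = [4.526, 0.5, 2.0]  # made-up values for illustration
for value in example_targets:
    print(f"{value:>6.3f} -> ${to_dollars(value):,.0f}")
```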

We will be using the [Amazon SageMaker built-in XGBoost algorithm](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html) to create the model because it does not require the use of a script to run the experiments. However, the use of Amazon SageMaker Experiments or Amazon SageMaker Pipelines is not affected by whether you use a different built-in algorithm, use Amazon SageMaker in script mode, or bring your own container.

### Install and/or update required libraries

At the time of writing, the tested version of the `sagemaker` SDK is `2.73.0`, and the tested version of the `sagemaker-experiments` SDK is `0.1.35`.

In [None]:
import sys
import subprocess
subprocess.check_call([sys.executable, '-m', 'pip', 'install', '-Uq', 'pip'])
subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'sagemaker==2.73.0', '-Uq'])
subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'sagemaker-experiments==0.1.35', '-Uq'])

### Set up

Let's start by specifying:

* The S3 bucket and prefix that you want to use for training and model data. This should be within the same region as the notebook instance, training, and hosting.
* The IAM role arn used to give training and hosting access to your data.
* The experiment name as the logical entity to keep our tests grouped and organized.

In [None]:
import numpy as np
import pandas as pd
import boto3
import sagemaker
import time
from time import strftime

boto_session = boto3.Session()
sagemaker_session = sagemaker.Session(boto_session=boto_session)
sm_client = boto3.client("sagemaker")
region = boto_session.region_name
bucket = sagemaker_session.default_bucket()
role = sagemaker.get_execution_role()

experiment_name = 'DEMO-sagemaker-experiments-pipelines'
prefix = experiment_name

print(f"bucket: {bucket}")
print(f"region: {region}")
print(f"role: {role}")

### Set up the experiment

Amazon SageMaker Experiments is built for data scientists who run many experiments as part of their model development process and want a simple way to organize, track, compare, and evaluate them.

Let's start with an overview of the Amazon SageMaker Experiments features:

* <u>Organize Experiments:</u> Amazon SageMaker Experiments structures experimentation with a first top level entity called experiment that contains a set of trials. Each trial contains a set of steps called trial components. Each trial component is a combination of datasets, algorithms, parameters, and artifacts. You can picture experiments as the top level “folder” for organizing your hypotheses, your trials as the “subfolders” for each group test run, and your trial components as your “files” for each instance of a test run.
* <u>Track Experiments:</u> Amazon SageMaker Experiments allows the data scientist to track experiments automatically or manually. You can automatically assign SageMaker jobs to a trial by specifying the <i>experiment_config</i> argument, or you can call the tracking APIs manually.
* <u>Compare and Evaluate Experiments:</u> The integration of Amazon SageMaker Experiments with Amazon SageMaker Studio makes it easier to produce data visualizations and compare different trials to identify the best combination of hyperparameters.
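
The experiment, trial, and trial component hierarchy described above can be pictured with a few plain Python dataclasses. This is only a mental model: the class names (`SketchExperiment`, `SketchTrial`, `SketchTrialComponent`) and all values are hypothetical and are not part of the SageMaker Experiments SDK.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SketchTrialComponent:
    """One step of a run: its parameters and the metrics it produced (the "file")."""
    name: str
    parameters: Dict[str, float] = field(default_factory=dict)
    metrics: Dict[str, float] = field(default_factory=dict)


@dataclass
class SketchTrial:
    """A group of trial components (the "subfolder")."""
    name: str
    components: List[SketchTrialComponent] = field(default_factory=list)


@dataclass
class SketchExperiment:
    """The top-level container for trials (the "folder")."""
    name: str
    trials: List[SketchTrial] = field(default_factory=list)

    def component_count(self) -> int:
        return sum(len(t.components) for t in self.trials)


# Made-up names and numbers, for illustration only.
sketch_trial = SketchTrial(name="xgboost-tuning")
sketch_trial.components.append(
    SketchTrialComponent("training-job-1", {"lambda": 0.1}, {"validation:rmse": 0.52})
)
sketch_trial.components.append(
    SketchTrialComponent("training-job-2", {"lambda": 1.0}, {"validation:rmse": 0.49})
)
sketch_experiment = SketchExperiment(name="demo-experiment", trials=[sketch_trial])
print(sketch_experiment.component_count())
```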

It is time to create a new <i>experiment</i>. The experiment name needs to be unique within your AWS account and region. We also assign a tag to it. Tags add metadata to your experiments, trials, and trial components, allowing for more fine-grained filtering.

In [None]:
from smexperiments.experiment import Experiment
from smexperiments.trial import Trial
from smexperiments.trial_component import TrialComponent
from smexperiments.tracker import Tracker

# create the experiment if it doesn't exist
try:
    demo_experiment = Experiment.load(experiment_name=experiment_name)
    print("existing experiment loaded")
except Exception as ex:
    if "ResourceNotFound" in str(ex):
        demo_experiment = Experiment.create(
            experiment_name=experiment_name,
            description = "Demo experiment",
            tags = [{'Key': 'demo-experiments', 'Value': 'demo1'}]
        )
        print("new experiment created")
    else:
        print(f"Unexpected error: {ex} ({type(ex).__name__})")
        print("Don't go forward!")
        raise

print(demo_experiment.experiment_arn)

The same considerations we made for experiments also apply to trials. Let's create a new <i>trial</i> for our test and associate it with the experiment created earlier.

In [None]:
create_date = time.strftime("%Y-%m-%d-%H-%M-%S")
trial_name = f"xgboost-tuning-{create_date}"

try:
    trial = Trial.load(trial_name=trial_name)
    print("existing trial loaded")
except Exception as ex:
    if "ResourceNotFound" in str(ex):
        trial = Trial.create(
            experiment_name=experiment_name,
            trial_name=trial_name,
            tags = [{'Key': 'demo-experiments', 'Value': 'demo1'}]
        )
        print("new trial created")
    else:
        print(f"Unexpected error: {ex} ({type(ex).__name__})")
        print("Don't go forward!")
        raise

print(trial.trial_arn)

### Set up data

Download the California housing dataset, and split it into training, validation, and test sets.

In [None]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

databunch = fetch_california_housing()
dataset = np.concatenate((databunch.target.reshape(-1, 1), databunch.data), axis=1)
print(f"Dataset shape = {dataset.shape}")

train, other = train_test_split(dataset, test_size=0.1)
validation, test = train_test_split(other, test_size=0.5)

print(f"Train shape = {train.shape}")
print(f"Validation shape = {validation.shape}")
print(f"Test shape = {test.shape}")
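
The two-stage split above keeps 90% of the rows for training and divides the remaining 10% evenly between validation and test. The sketch below reproduces that arithmetic on row indices with the standard library only. `split_row_indices` is a hypothetical helper, and its rounding may differ slightly from scikit-learn's for row counts that do not divide evenly.

```python
import random


def split_row_indices(n_rows, holdout_percent=10, seed=0):
    """Shuffle row indices and split them into train, validation, and test parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)

    # First cut: hold out `holdout_percent` of the rows.
    n_holdout = n_rows * holdout_percent // 100
    holdout, train_idx = indices[:n_holdout], indices[n_holdout:]

    # Second cut: divide the held-out rows evenly between validation and test.
    n_validation = n_holdout // 2
    validation_idx, test_idx = holdout[:n_validation], holdout[n_validation:]
    return train_idx, validation_idx, test_idx


train_idx, validation_idx, test_idx = split_row_indices(1000)
print(len(train_idx), len(validation_idx), len(test_idx))  # 900 50 50
```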

Finally, upload the datasets to S3.

In [None]:
np.savetxt("train.csv", train, delimiter=",")
np.savetxt("validation.csv", validation, delimiter=",")

train_prefix = f"{prefix}/input/train.csv"
s3_input_train = f"s3://{bucket}/{train_prefix}"
print(s3_input_train)

validation_prefix = f"{prefix}/input/validation.csv"
s3_input_validation = f"s3://{bucket}/{validation_prefix}"
print(s3_input_validation)

s3 = boto3.client("s3")
s3.upload_file("train.csv", bucket, train_prefix)
s3.upload_file("validation.csv", bucket, validation_prefix)
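
For CSV training input, the built-in XGBoost algorithm expects the target in the first column and no header row, which is why the target was concatenated in front of the features earlier. The sketch below shows that layout on two made-up rows with shortened feature lists, using an in-memory buffer instead of a file.

```python
import csv
import io

# Made-up rows for illustration; the real dataset has eight features per row.
feature_rows = [[8.3252, 41.0, 6.98], [3.12, 20.0, 5.1]]
targets = [4.526, 1.5]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
for target, features in zip(targets, feature_rows):
    # Target first, then the features; no header row is written.
    writer.writerow([target] + features)

csv_text = buffer.getvalue()
print(csv_text)
```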

Set an output path where the trained model will be saved.

In [None]:
output_path = f's3://{bucket}/{prefix}/output'
print(output_path)

### Set up automatic model tuning

There are several steps needed to configure an automatic tuning job.

1/ Retrieve the XGBoost algorithm container.

In [None]:
xgboost_container = sagemaker.image_uris.retrieve("xgboost", region, "1.2-2")
print("XGBoost container image URI: {}".format(xgboost_container))

2/ Initialize XGBoost hyperparameters and the XGBoost estimator.

In [None]:
hyperparameters = {
    "objective": "reg:squarederror",
    "num_round": "50",
    "max_depth": "5",
    "eta": "0.2",
    "gamma": "4",
    "min_child_weight": "6",
    "subsample": "0.7",
    "verbosity": "1"
}

estimator = sagemaker.estimator.Estimator(image_uri=xgboost_container, 
                                          hyperparameters=hyperparameters,
                                          role=role,
                                          instance_count=1, 
                                          instance_type='ml.m5.2xlarge', 
                                          volume_size=5, # 5 GB 
                                          output_path=output_path)

3/ Define the hyperparameter value ranges.

In [None]:
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner

hyperparameter_ranges = {
    "lambda": ContinuousParameter(0.01, 10, scaling_type="Logarithmic")
}
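
To see what a logarithmic scale means for the range `0.01` to `10`, the sketch below draws values uniformly in log space. It only illustrates the scale: the `Bayesian` strategy does not pick values by random sampling, and `sample_log_uniform` is a hypothetical helper.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose base-10 logarithm is uniform between log10(low) and log10(high)."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))


rng = random.Random(42)
samples = [sample_log_uniform(0.01, 10, rng) for _ in range(1000)]

# The range spans three decades (0.01-0.1, 0.1-1, 1-10), so roughly two thirds
# of the samples should fall below 1, unlike with a linear scale.
fraction_below_one = sum(s < 1 for s in samples) / len(samples)
print(f"fraction below 1: {fraction_below_one:.2f}")
```

On a linear scale, about 90% of the values would land between 1 and 10, leaving small regularization strengths barely explored.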

4/ Define the objective metric we are interested in and whether we want to minimize or maximize it.

In [None]:
objective_metric_name = 'validation:rmse'
objective_type = 'Minimize'
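
The objective is the root mean squared error (RMSE) on the validation set, so lower is better. As a reminder of what the metric measures, here is a minimal standard-library sketch with made-up values; `rmse` is a helper written for this example, not the implementation XGBoost uses internally.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long, non-empty sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


actual_values = [2.0, 1.5, 3.0]      # made-up targets, in units of $100,000
predicted_values = [2.5, 1.0, 3.0]   # made-up predictions
example_rmse = rmse(actual_values, predicted_values)
print(f"RMSE: {example_rmse:.4f}")
```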

5/ Configure the tuning job.

<u>NOTE:</u> When using the `Bayesian` strategy, we recommend setting the number of parallel jobs to less than 10% of the total number of training jobs (we set it higher in this example to keep the run short).

In [None]:
tuner = HyperparameterTuner(estimator,
                            objective_metric_name,
                            hyperparameter_ranges,
                            objective_type=objective_type,
                            strategy="Bayesian",
                            max_jobs=10,
                            max_parallel_jobs=3)

6/ Execute the training jobs with automatic tuning. This will take about 10-15 minutes.

In [None]:
%%time

from sagemaker.inputs import TrainingInput

content_type = "csv"
train_input = TrainingInput(s3_input_train, content_type=content_type)
validation_input = TrainingInput(s3_input_validation, content_type=content_type)
tuner.fit({'train': train_input, 'validation': validation_input})

Describe the job status.

In [None]:
boto3.client('sagemaker').describe_hyper_parameter_tuning_job(
    HyperParameterTuningJobName=tuner.latest_tuning_job.job_name
)['HyperParameterTuningJobStatus']
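
If you want to block until the tuning job reaches a final state, one option is a small polling loop. The sketch below is written against a generic `get_status` callable so that it runs offline; in this notebook, that callable could wrap the `describe_hyper_parameter_tuning_job` call above. The function name, the polling defaults, and the choice of `Completed`, `Failed`, and `Stopped` as terminal states are assumptions made for this example.

```python
import time

# Statuses treated as final in this sketch.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_terminal_status(get_status, poll_seconds=30.0, max_polls=120):
    """Call get_status() repeatedly until it returns a terminal status."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"no terminal status after {max_polls} polls")


# Offline demonstration with a stubbed sequence of statuses.
fake_statuses = iter(["InProgress", "InProgress", "Completed"])
final_status = wait_for_terminal_status(lambda: next(fake_statuses), poll_seconds=0.0)
print(final_status)  # Completed
```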

## SageMaker Experiments APIs

Once the tuning job is completed, each training job that it spawned has also generated a <i>trial component</i> that is associated with neither a <i>trial</i> nor an <i>experiment</i>. The Amazon SageMaker Experiments SDK offers filtering capabilities to quickly retrieve the list of trial components we are interested in, and associate them with the intended trial. Let's see how we can do this.

In [None]:
def associate_trial_components(trial_name, search_expression, verbose=True):
    # Load the target trial by name instead of relying on a global variable.
    target_trial = Trial.load(trial_name=trial_name)

    # Search iterates over every page of results by default.
    trial_component_search_results = list(
        TrialComponent.search(search_expression=search_expression)
    )

    if verbose:
        print(f"Found {len(trial_component_search_results)} trial components.")

    # Associate the components with the trial.
    for tc in trial_component_search_results:
        if verbose:
            print(f"Associating trial component {tc.trial_component_name} with trial {target_trial.trial_name}.")
        target_trial.add_trial_component(tc.trial_component_name)
        # Sleep to avoid throttling.
        time.sleep(0.5)

Define a search expression as a boolean conditional statement that combines filters. Then manually associate the trial components with the trial.

In [None]:
from smexperiments.search_expression import Filter, Operator, SearchExpression

# Name of the latest tuning job.
tuning_job_name = tuner.latest_tuning_job.name

# Each training job name contains the tuning job name, and each trial component name contains its training job name.
source_arn_filter = Filter(
    name="TrialComponentName", operator=Operator.CONTAINS, value=tuning_job_name
)

source_type_filter = Filter(
    name="Source.SourceType", operator=Operator.EQUALS, value="SageMakerTrainingJob"
)

search_expression = SearchExpression(
    filters=[source_arn_filter, source_type_filter]
)

associate_trial_components(trial_name, search_expression)
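
To make the filtering logic concrete, the sketch below applies the same two conditions to a few in-memory records with plain Python, assuming the filters are combined with a logical AND. The helper functions and the record names are hypothetical; the real search runs on the service side.

```python
def contains(field_name, value):
    """Predicate: the record's field contains `value` as a substring."""
    return lambda record: value in str(record.get(field_name, ""))


def equals(field_name, value):
    """Predicate: the record's field is exactly `value`."""
    return lambda record: record.get(field_name) == value


def matches_all(record, predicates):
    """Combine predicates with a logical AND."""
    return all(predicate(record) for predicate in predicates)


# Made-up records that mimic trial component metadata.
sketch_tuning_job_name = "sagemaker-xgboost-220101-1200"
records = [
    {"TrialComponentName": "sagemaker-xgboost-220101-1200-001-0a1b2c3d-aws-training-job",
     "Source.SourceType": "SageMakerTrainingJob"},
    {"TrialComponentName": "another-job-aws-training-job",
     "Source.SourceType": "SageMakerTrainingJob"},
    {"TrialComponentName": "sagemaker-xgboost-220101-1200-prep-aws-processing-job",
     "Source.SourceType": "SageMakerProcessingJob"},
]

predicates = [
    contains("TrialComponentName", sketch_tuning_job_name),
    equals("Source.SourceType", "SageMakerTrainingJob"),
]
matching = [r["TrialComponentName"] for r in records if matches_all(r, predicates)]
print(matching)
```

Only the first record satisfies both conditions: the second fails the name filter and the third fails the source type filter.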

### Hosting

We deploy the best model to an endpoint. This will take ~5-10 minutes.

In [None]:
%%time

from sagemaker.serializers import CSVSerializer

tuner_predictor = tuner.deploy(
    initial_instance_count=1,
    instance_type="ml.c5.xlarge",
    serializer=CSVSerializer()
)

Predict one test record and compare it with the actual value.

In [None]:
print(f"Predicted:\t{tuner_predictor.predict(test[0, 1:])}")
print(f"Actual:\t\t{test[0, 0]}")
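
The prediction comes back as raw bytes rather than a number. The sketch below shows one way to turn such a response into a float and express the error in dollars, assuming the endpoint returns a CSV-formatted value. The response bytes and the actual value are made up, and `parse_prediction` is a hypothetical helper.

```python
def parse_prediction(raw_response: bytes) -> float:
    """Parse the first value of a CSV-formatted response (assumed format)."""
    return float(raw_response.decode("utf-8").strip().split(",")[0])


raw_response = b"2.0413\n"  # made-up endpoint response
predicted_value = parse_prediction(raw_response)
actual_value = 1.9          # made-up ground truth, in units of $100,000

# The target is expressed in hundreds of thousands of dollars.
abs_error_dollars = abs(predicted_value - actual_value) * 100_000
print(f"Predicted: {predicted_value}, absolute error: ${abs_error_dollars:,.0f}")
```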

### Explore the results of hyperparameter tuning

SageMaker offers a convenient way to retrieve the <i>TrialComponents</i> data. Using the same filtering, we can extract all returned TrialComponents into a `pandas` DataFrame to analyze the data further. For example, we want to see how the hyperparameters have affected the model metric.

In [None]:
from sagemaker.analytics import ExperimentAnalytics

trial_component_analytics = ExperimentAnalytics(
    experiment_name=demo_experiment.experiment_name,
    search_expression=search_expression.to_boto()
)
analytic_table = trial_component_analytics.dataframe()
analytic_table.head()

Plot the last validation RMSE value of each trial component against the lambda that the run used. Once you have the data in a `pandas` DataFrame, you can use your preferred plotting tool, e.g., plot directly from `pandas`, or use `matplotlib` if you want more control.

In [None]:
ax = analytic_table.plot.scatter("lambda", "validation:rmse - Last", grid=True)
analytic_table["TrialComponentID"] = [str(int(x.split('-')[4])) for x in analytic_table["TrialComponentName"]]
for _, v in analytic_table[["TrialComponentID", "lambda", "validation:rmse - Last"]].iterrows():
    ax.annotate(v.TrialComponentID, v[1:])
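
The annotation labels come from the fifth hyphen-separated part of each trial component name. The sketch below shows that parsing on made-up names and picks the run with the lowest RMSE. It assumes names of the form `sagemaker-xgboost-<date>-<time>-<index>-<suffix>-aws-training-job`; with a differently structured tuning job name, the position of the index would change.

```python
def job_index_from_name(trial_component_name):
    """Return the training job index as a string without leading zeros."""
    return str(int(trial_component_name.split("-")[4]))


# (trial component name, lambda, last validation RMSE); all values are made up.
sketch_rows = [
    ("sagemaker-xgboost-220101-1200-001-0a1b2c3d-aws-training-job", 0.05, 0.48),
    ("sagemaker-xgboost-220101-1200-002-1b2c3d4e-aws-training-job", 1.20, 0.47),
    ("sagemaker-xgboost-220101-1200-010-2c3d4e5f-aws-training-job", 8.00, 0.51),
]

for name, lambda_value, last_rmse in sketch_rows:
    print(job_index_from_name(name), lambda_value, last_rmse)

# The best run is the one with the lowest validation RMSE.
best_name, best_lambda, best_rmse = min(sketch_rows, key=lambda row: row[2])
print(f"Best run: {job_index_from_name(best_name)} (lambda={best_lambda}, rmse={best_rmse})")
```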

### Clean up
Delete the endpoint to avoid unnecessary charges.

In [None]:
tuner_predictor.delete_endpoint()

Uncomment the cell below to remove the experiment and all associated trials and trial components.

In [None]:
#demo_experiment.delete_all(action="--force")

### Next steps

Now go to the [SageMaker Pipelines](./02-PipelineExperiments.ipynb) notebook to learn how to package this notebook's workflow in a single pipeline.