# Getting Started with Amazon SageMaker Automatic Model Tuning

## Background

Machine learning model training is controlled by the set of values that is refered to as hyperparameters. In contrast to parameters plugged into optimization functions, such as node weights or bias, hyperparameters are defined before model training. One can tune them manually to reach to better model performance by changing the values based on one's expertise. Alternatively, one can run hyperparameter optimization and tune these parameters automatically. 

Amazon SageMaker Automatic Model Tuning reduces the undifferentiated heavy lifting of researching the hyperparameter space, by launching training jobs with several sets of hyperparameter combinations and provides the set of best performing values as a result.

This tutorial will walk you through SageMaker Automatic Model Tuning (AMT) using a built-in XGBoost algorithm on Amazon SageMaker. Additional information can be found on documentation pages:
* For running a simple hyperparameter tuning job: https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-ex.html
* For HyperParameterTuner API: https://sagemaker.readthedocs.io/en/stable/api/training/tuner.html

## Overview

The notebook is split into the following sections:
* Setup and Imports
* Load and Prepare dataset
* Train a SageMaker Built-In XGBoost Algorithm
* Train and Tune a SageMaker Built-In XGBoost Algorithm
* View the AMT job statistics 
* Visualize AMT job results and tuned Hyperparameters


### Setup and Imports

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
#import sys
import sagemaker

#!{sys.executable} -m pip install --upgrade pip       --quiet # upgrade pip to the latest vesion
#!{sys.executable} -m pip install --upgrade sagemaker --quiet # upgrade SageMaker to the latest vesion
sagemaker.__version__

'2.107.0'

In [4]:
import io
import os
import argparse
import traceback
import boto3
import numpy as np
import pandas as pd

from sklearn import datasets
from pathlib import Path

In [5]:
# SDK setup
role = sagemaker.get_execution_role()
region = boto3.Session().region_name
sm = boto3.client('sagemaker')
boto_sess = boto3.Session(region_name=region)
sm_sess = sagemaker.session.Session(boto_session=boto_sess, sagemaker_client=sm)

In [6]:
BUCKET = sm_sess.default_bucket()
PREFIX = 'amt-visualize-demo/data'
s3_data_url = f's3://{BUCKET}/{PREFIX}'

# Eventual output destination for our XGBoost model
output_path = f's3://{BUCKET}/{PREFIX}/output'
print(output_path)

s3://sagemaker-us-east-1-632581975302/amt-visualize-demo/data/output


## Load and Prepare dataset 

In [7]:
!mkdir -p data

The dataset used in the notebook is a scikit-learn library copy of the test set of the UCI ML hand-written digits datasets https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits. Each datapoint is a 8x8 image of a digit. 

In [8]:
from sklearn import datasets

digits         = datasets.load_digits()
digits_df      = pd.DataFrame(digits.data)
digits_df['y'] = digits.target
digits_df.insert(0, 'y', digits_df.pop('y')) # XGBoost expects the target to be the first column 

In [9]:
# Randomly sort the data then split out into train 70% and validation 30%
train_data, valid_data= np.split(
    digits_df, [int(0.7 * len(digits_df))]
)

In [10]:
train_data.to_csv("data/train.csv", index=False, header=False)
valid_data.to_csv("data/valid.csv", index=False, header=False) # valid

We upload train and validation datasets into Amazon S3. Amazon SageMaker will interact with the data directly from S3.

In [11]:
boto_sess.resource("s3").Bucket(BUCKET).Object(os.path.join(PREFIX, "train/train.csv")
                                                 ).upload_file("data/train.csv")
boto_sess.resource("s3").Bucket(BUCKET).Object(os.path.join(PREFIX, "valid/valid.csv")
                                                 ).upload_file("data/valid.csv")

We will use the built-in algorithm that comes in an image URI as described in the docs here:
https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html

## Train an Amazon SageMaker Built-In XGBoost Algorithm

In [12]:
%%time
from sagemaker import image_uris
from sagemaker.session import Session
from sagemaker.inputs import TrainingInput

hyperparameters = {
        "num_class": "10",
        "max_depth":"5",
        "eta":"0.2",
        "gamma":"1",
        "min_child_weight":"6",
        "subsample":"0.7",
        "objective":"multi:softmax",
        "eval_metric":"accuracy",
        "num_round":"50"}

# lookup the XGBoost image URI and builds an XGBoost container
xgboost_container = sagemaker.image_uris.retrieve("xgboost", region, "1.5-1")
display(xgboost_container)

# construct a SageMaker estimator that calls the XGBoost container
estimator = sagemaker.estimator.Estimator(image_uri=xgboost_container, 
                                          hyperparameters=hyperparameters,
                                          role=role,
                                          instance_count=1, 
                                          instance_type='ml.m5.large', 
                                          volume_size=5, # 5 GB 
                                          output_path=output_path)

# define the data type and paths to the training and validation datasets
s3_input_train = TrainingInput(
    s3_data=f's3://{BUCKET}/{PREFIX}/train', content_type="csv")
s3_input_valid = TrainingInput(
    s3_data=f's3://{BUCKET}/{PREFIX}/valid', content_type="csv")

# execute the XGBoost training job
estimator.fit({'train': s3_input_train, 'validation': s3_input_valid})

'683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:1.5-1'

2022-10-19 10:48:54 Starting - Starting the training job...
2022-10-19 10:49:17 Starting - Preparing the instances for trainingProfilerReport-1666176534: InProgress
.........
2022-10-19 10:50:41 Downloading - Downloading input data......
2022-10-19 10:51:42 Training - Downloading the training image......
2022-10-19 10:52:38 Training - Training image download completed. Training in progress.[34m[2022-10-19 10:52:37.424 ip-10-0-126-71.ec2.internal:1 INFO utils.py:27] RULE_JOB_STOP_SIGNAL_FILENAME: None[0m
[34m[2022-10-19:10:52:37:INFO] Imported framework sagemaker_xgboost_container.training[0m
[34m[2022-10-19:10:52:37:INFO] Failed to parse hyperparameter eval_metric value accuracy to Json.[0m
[34mReturning the value itself[0m
[34m[2022-10-19:10:52:37:INFO] Failed to parse hyperparameter objective value multi:softmax to Json.[0m
[34mReturning the value itself[0m
[34m[2022-10-19:10:52:37:INFO] No GPUs detected (normal if no gpus installed)[0m
[34m[2022-10-19:10:52:37:INFO] R

## Train and Tune an Amazon SageMaker Built-In XGBoost Algorithm

Amazon SageMaker AMT now orchestrates different trials. We use `tuner.wait()` to pause notebook execution until the AMT job is completed. Depending on the number of jobs and teh configuration of their paralelization this may take a while. For the example above it may take around 10 minutes. During this time you can view the status of your jobs in the console by navigating to Amazon SageMaker > Training > Hyperparameter tuning jobs.

For more information on AMT job monitoring, see: https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-monitor.html

In [13]:
from sagemaker.tuner import IntegerParameter, CategoricalParameter
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner

n_jobs = 50
n_parallel_jobs = 2

# redundant declaration - included for visibility 
hyperparameters = {
        "num_class": "10",
        "max_depth":"5",
        "eta":"0.2",
        "gamma":"1",
        "min_child_weight":"6",
        "subsample":"0.7",
        "objective":"multi:softmax",
        "eval_metric":"accuracy",
        "num_round":"50"}

hpt_ranges = {'eta': IntegerParameter(0, 1),
              'alpha': IntegerParameter(0, 2),
              'min_child_weight': IntegerParameter(1, 10),
              'max_depth': IntegerParameter(1, 20)
             }

tuner_parameters = {'estimator': estimator,
                    'base_tuning_job_name': 'bayesian',                   
                    'objective_metric_name': 'validation:accuracy',
                    'objective_type': 'Maximize',
                    'hyperparameter_ranges': hpt_ranges,
                    'strategy': 'Bayesian',
                    'max_jobs': n_jobs,
                    'max_parallel_jobs': n_parallel_jobs}

In [14]:
tuner = HyperparameterTuner(**tuner_parameters)
tuner.fit({'train': s3_input_train, 'validation': s3_input_valid}, wait=False)
tuner_name = tuner.describe()["HyperParameterTuningJobName"]
print(f'tuning job submitted: {tuner_name}.')

tuning job submitted: bayesian-221019-1100.


In [15]:
tuner.wait()

....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................!


## View the AMT job statistics and results 

Tuning jobs you have run can be accessed from the Amazon SageMaker console at https://console.aws.amazon.com/sagemaker/. Select Hyperparameter tuning job from the Training menu to see the list. More information here: https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-monitor.html

We can check the results of the HPO jobs and investigate the hyperparameters used, the final value achieved in the objective function and the total training time per job.

#### This can be done either via the Amazon SageMaker Python SDK

In [16]:
sagemaker.HyperparameterTuningJobAnalytics(tuner_name).dataframe()[:10]

Unnamed: 0,alpha,eta,max_depth,min_child_weight,TrainingJobName,TrainingJobStatus,FinalObjectiveValue,TrainingStartTime,TrainingEndTime,TrainingElapsedTimeSeconds
0,0.0,1.0,19.0,3.0,bayesian-221019-1100-050-476d1788,Completed,0.85926,2022-10-19 11:38:47+00:00,2022-10-19 11:39:39+00:00,52.0
1,0.0,1.0,11.0,4.0,bayesian-221019-1100-049-74bd2888,Completed,0.86296,2022-10-19 11:38:29+00:00,2022-10-19 11:39:26+00:00,57.0
2,0.0,1.0,12.0,4.0,bayesian-221019-1100-048-caac8bc3,Completed,0.86296,2022-10-19 11:37:27+00:00,2022-10-19 11:38:29+00:00,62.0
3,0.0,1.0,13.0,4.0,bayesian-221019-1100-047-125a55e8,Completed,0.86296,2022-10-19 11:37:19+00:00,2022-10-19 11:38:13+00:00,54.0
4,0.0,1.0,14.0,4.0,bayesian-221019-1100-046-2e4c55e4,Completed,0.86296,2022-10-19 11:35:58+00:00,2022-10-19 11:37:10+00:00,72.0
5,0.0,1.0,15.0,4.0,bayesian-221019-1100-045-ca1e48cc,Completed,0.86296,2022-10-19 11:35:49+00:00,2022-10-19 11:36:56+00:00,67.0
6,0.0,1.0,16.0,4.0,bayesian-221019-1100-044-9597baea,Completed,0.86296,2022-10-19 11:34:48+00:00,2022-10-19 11:35:40+00:00,52.0
7,0.0,1.0,17.0,4.0,bayesian-221019-1100-043-81602918,Completed,0.86296,2022-10-19 11:33:59+00:00,2022-10-19 11:35:32+00:00,93.0
8,0.0,1.0,18.0,4.0,bayesian-221019-1100-042-75ea1cce,Completed,0.86296,2022-10-19 11:33:30+00:00,2022-10-19 11:34:32+00:00,62.0
9,0.0,1.0,19.0,4.0,bayesian-221019-1100-041-95b33bb6,Completed,0.86296,2022-10-19 11:32:43+00:00,2022-10-19 11:33:39+00:00,56.0


#### Or via the AWS SDK for Python (Boto3)

With the boto3 client we review the results of HPO job using describe_hyper_parameter_tuning_job() function.

In [17]:
#sm.describe_hyper_parameter_tuning_job(HyperParameterTuningJobName=tuner_name)   # to review all statistics
sm.describe_hyper_parameter_tuning_job(HyperParameterTuningJobName=tuner_name)['BestTrainingJob']

{'TrainingJobName': 'bayesian-221019-1100-020-f99889f3',
 'TrainingJobArn': 'arn:aws:sagemaker:us-east-1:632581975302:training-job/bayesian-221019-1100-020-f99889f3',
 'CreationTime': datetime.datetime(2022, 10, 19, 11, 17, 45, tzinfo=tzlocal()),
 'TrainingStartTime': datetime.datetime(2022, 10, 19, 11, 17, 49, tzinfo=tzlocal()),
 'TrainingEndTime': datetime.datetime(2022, 10, 19, 11, 18, 51, tzinfo=tzlocal()),
 'TrainingJobStatus': 'Completed',
 'TunedHyperParameters': {'alpha': '0',
  'eta': '1',
  'max_depth': '5',
  'min_child_weight': '5'},
 'FinalHyperParameterTuningJobObjectiveMetric': {'MetricName': 'validation:accuracy',
  'Value': 0.8629599809646606},
 'ObjectiveStatus': 'Succeeded'}

We can also utilize the boto3 list_training_jobs_for_hyper_parameter_tuning_job() function to review the results sorted by the value of the objective function and including metric definitions. More functions available for Amazon SageMaker with boto3 are described on this page: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html

In [21]:
hpo_jobs = sm.list_training_jobs_for_hyper_parameter_tuning_job(
    HyperParameterTuningJobName=tuner_name,
    MaxResults=100,
    SortBy='FinalObjectiveMetricValue',
    SortOrder='Descending')

for job in hpo_jobs['TrainingJobSummaries'][:10]:
    job_descr = sm.describe_training_job(TrainingJobName=job['TrainingJobName'])
    metrics = {m['MetricName']:  m['Value'] for m in job_descr['FinalMetricDataList']}
    print(f'{job["TrainingJobName"]} Metrics: {metrics}')

bayesian-221019-1100-032-1f791914 Metrics: {'train:mlogloss': 0.048489999026060104, 'train:accuracy': 0.9960200190544128, 'validation:accuracy': 0.8629599809646606, 'validation:mlogloss': 0.48034000396728516, 'ObjectiveMetric': 0.8629599809646606}
bayesian-221019-1100-026-f3f87857 Metrics: {'train:mlogloss': 0.048489999026060104, 'train:accuracy': 0.9960200190544128, 'validation:accuracy': 0.8629599809646606, 'validation:mlogloss': 0.48034000396728516, 'ObjectiveMetric': 0.8629599809646606}
bayesian-221019-1100-048-caac8bc3 Metrics: {'train:mlogloss': 0.043299999088048935, 'train:accuracy': 0.9976099729537964, 'validation:accuracy': 0.8629599809646606, 'validation:mlogloss': 0.45642000436782837, 'ObjectiveMetric': 0.8629599809646606}
bayesian-221019-1100-046-2e4c55e4 Metrics: {'train:mlogloss': 0.043299999088048935, 'train:accuracy': 0.9976099729537964, 'validation:accuracy': 0.8629599809646606, 'validation:mlogloss': 0.45642000436782837, 'ObjectiveMetric': 0.8629599809646606}
bayesian

## Visualize AMT job results and tuned Hyperparameters

Finally, we want to visualise the behaviour of our hyperparameters at different values.

To do this, we are using the altair library, and have written two custom analysis scripts `job_analytics.py` and `reporting_util.py` that we make available with this notebook

In [22]:
!pip install -Uq altair

[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip available: [0m[31;49m22.2.2[0m[39;49m -> [0m[32;49m22.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [23]:
from job_analytics import *
from reporting_util import *

import altair as alt


_ = alt.data_transformers.disable_max_rows()
alt.renderers.enable('mimetype')

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
pd.set_option('display.max_colwidth', None)  # Don't truncate TrainingJobName        

Please ensure that the role used by SageMaker allows the cloudwatch:ListMetrics action on IAM.

In [25]:
analyze_hpo_job(tuner)

Tuning job bayesian-221019-1100      status: Completed

Number of training jobs with valid objective: 50
Lowest: 0.09814999997615814 Highest 0.8629599809646606


Unnamed: 0,alpha,eta,max_depth,min_child_weight,TrainingJobName,TrainingJobStatus,TrainingStartTime,TrainingEndTime,TrainingElapsedTimeSeconds,TuningJobName,validation:accuracy
25,0.0,1.0,18.0,5.0,bayesian-221019-1100-025-e244e268,Completed,2022-10-19 11:21:42+00:00,2022-10-19 11:22:44+00:00,62.0,bayesian-221019-1100,0.86296
10,0.0,1.0,8.0,4.0,bayesian-221019-1100-040-a13063c2,Completed,2022-10-19 11:31:34+00:00,2022-10-19 11:33:11+00:00,97.0,bayesian-221019-1100,0.86296
26,0.0,1.0,16.0,5.0,bayesian-221019-1100-024-de8fae17,Completed,2022-10-19 11:20:51+00:00,2022-10-19 11:21:53+00:00,62.0,bayesian-221019-1100,0.86296
1,0.0,1.0,11.0,4.0,bayesian-221019-1100-049-74bd2888,Completed,2022-10-19 11:38:29+00:00,2022-10-19 11:39:26+00:00,57.0,bayesian-221019-1100,0.86296
24,0.0,1.0,14.0,5.0,bayesian-221019-1100-026-f3f87857,Completed,2022-10-19 11:22:08+00:00,2022-10-19 11:23:05+00:00,57.0,bayesian-221019-1100,0.86296
23,0.0,1.0,15.0,5.0,bayesian-221019-1100-027-0b4ec286,Completed,2022-10-19 11:23:07+00:00,2022-10-19 11:24:19+00:00,72.0,bayesian-221019-1100,0.86296
30,0.0,1.0,5.0,5.0,bayesian-221019-1100-020-f99889f3,Completed,2022-10-19 11:17:49+00:00,2022-10-19 11:18:51+00:00,62.0,bayesian-221019-1100,0.86296
18,0.0,1.0,20.0,5.0,bayesian-221019-1100-032-1f791914,Completed,2022-10-19 11:26:03+00:00,2022-10-19 11:27:00+00:00,57.0,bayesian-221019-1100,0.86296
15,0.0,1.0,19.0,5.0,bayesian-221019-1100-035-6786fcdc,Completed,2022-10-19 11:28:32+00:00,2022-10-19 11:29:29+00:00,57.0,bayesian-221019-1100,0.86296
11,0.0,1.0,20.0,4.0,bayesian-221019-1100-039-ee79a6e0,Completed,2022-10-19 11:31:18+00:00,2022-10-19 11:32:20+00:00,62.0,bayesian-221019-1100,0.86296


<VegaLite 4 object>

If you see this message, it means the renderer has not been properly enabled
for the frontend that you are using. For more information, see
https://altair-viz.github.io/user_guide/troubleshooting.html
