![ML Logo](images/mod00_logo.png "Logo") 

# Module 6 Part II: ML Model Optimization - Analyze Hyperparameter Tuning


`(Revision History:
PA1, 2020-04-15, @akirmak: Initial version)`

## Module Overview

This notebook is an exact copy of the [AWS SageMaker sample on GitHub: Analyze Hyper Parameter Tuning Results (analyze_results/HPO_Analyze_TuningJob_Results)](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/hyperparameter_tuning/analyze_results/HPO_Analyze_TuningJob_Results.ipynb).
 
In this often-overlooked mini utility notebook, you:
 1. dig into the hyperparameter tuning job details and load them into a pandas DataFrame;
 1. then analyze the correlation between the objective metric and each individual hyperparameter. That insight helps you adjust the search ranges for the next tuning job, because the search itself is stochastic.


# Analyze Results of a Hyperparameter Tuning job

Once you have completed a tuning job (or even while the job is still running), you can use this notebook to analyze the results and understand how each hyperparameter affects the quality of the model.

---
## Set up the environment
To start the analysis, you must pick the name of the hyperparameter tuning job.

In [1]:
import boto3
import sagemaker
import os

region = boto3.Session().region_name
sage_client = boto3.Session().client('sagemaker')

tuning_job_name = 'xgboost-200414-2217'
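
If you do not remember the job name, one way to look it up is to list the most recent tuning jobs. The following is a minimal sketch, not part of the original sample: the helper name `recent_tuning_jobs` is hypothetical, and it assumes a boto3 SageMaker client such as `sage_client` above.

```python
def recent_tuning_jobs(client, max_results=5):
    """Return (name, status) pairs for the most recently created tuning jobs.

    `client` is assumed to be a boto3 SageMaker client; this helper is
    illustrative only.
    """
    response = client.list_hyper_parameter_tuning_jobs(
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=max_results,
    )
    return [
        (job["HyperParameterTuningJobName"], job["HyperParameterTuningJobStatus"])
        for job in response["HyperParameterTuningJobSummaries"]
    ]
```

Calling `recent_tuning_jobs(sage_client)` should return the newest jobs first, so you can copy the name you need into `tuning_job_name`.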

## Track hyperparameter tuning job progress
After you launch a tuning job, you can see its progress by calling the `describe_hyper_parameter_tuning_job` API (`describe-hyper-parameter-tuning-job` in the AWS CLI). The response is a JSON object that contains information about the current state of the tuning job. You can call `list_training_jobs_for_hyper_parameter_tuning_job` to see a detailed list of the training jobs that the tuning job launched.

In [2]:
# run this cell to check current status of hyperparameter tuning job
tuning_job_result = sage_client.describe_hyper_parameter_tuning_job(HyperParameterTuningJobName=tuning_job_name)

status = tuning_job_result['HyperParameterTuningJobStatus']
if status != 'Completed':
    print('Reminder: the tuning job has not been completed.')
    
job_count = tuning_job_result['TrainingJobStatusCounters']['Completed']
print("%d training jobs have completed" % job_count)
    
is_minimize = (tuning_job_result['HyperParameterTuningJobConfig']['HyperParameterTuningJobObjective']['Type'] != 'Maximize')
objective_name = tuning_job_result['HyperParameterTuningJobConfig']['HyperParameterTuningJobObjective']['MetricName']

Reminder: the tuning job has not been completed.
12 training jobs have completed
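
The per-job listing mentioned above is returned in pages, so a tuning job with many training jobs needs more than one call. The sketch below shows one way to follow the `NextToken` field until all summaries are collected. The helper name `list_tuning_training_jobs` is hypothetical, and `client` is assumed to be a boto3 SageMaker client such as `sage_client`.

```python
def list_tuning_training_jobs(client, job_name, page_size=100):
    """Collect every training-job summary for one tuning job.

    Illustrative helper: it keeps requesting pages until the response
    no longer carries a NextToken.
    """
    summaries = []
    kwargs = {"HyperParameterTuningJobName": job_name, "MaxResults": page_size}
    while True:
        response = client.list_training_jobs_for_hyper_parameter_tuning_job(**kwargs)
        summaries.extend(response["TrainingJobSummaries"])
        token = response.get("NextToken")
        if not token:
            return summaries
        # Ask for the next page on the following iteration
        kwargs["NextToken"] = token
```

For example, `len(list_tuning_training_jobs(sage_client, tuning_job_name))` gives the number of training jobs launched so far, whatever their status.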


In [3]:
from pprint import pprint
if tuning_job_result.get('BestTrainingJob',None):
    print("Best model found so far:")
    pprint(tuning_job_result['BestTrainingJob'])
else:
    print("No training jobs have reported results yet.")

Best model found so far:
{'CreationTime': datetime.datetime(2020, 4, 14, 22, 20, 55, tzinfo=tzlocal()),
 'FinalHyperParameterTuningJobObjectiveMetric': {'MetricName': 'validation:auc',
                                                 'Value': 0.7762699723243713},
 'ObjectiveStatus': 'Succeeded',
 'TrainingEndTime': datetime.datetime(2020, 4, 14, 22, 24, 11, tzinfo=tzlocal()),
 'TrainingJobArn': 'arn:aws:sagemaker:us-east-1:503254810580:training-job/xgboost-200414-2217-005-92f42faa',
 'TrainingJobName': 'xgboost-200414-2217-005-92f42faa',
 'TrainingJobStatus': 'Completed',
 'TrainingStartTime': datetime.datetime(2020, 4, 14, 22, 23, 17, tzinfo=tzlocal()),
 'TunedHyperParameters': {'alpha': '0.994167630344954',
                          'eta': '0.19671847152320243',
                          'max_depth': '3',
                          'min_child_weight': '2.917386324038538'}}


## Fetch all results as DataFrame
We can list the hyperparameters and objective metrics of all training jobs and pick out the training job with the best objective metric.

In [4]:
import pandas as pd

tuner = sagemaker.HyperparameterTuningJobAnalytics(tuning_job_name)

full_df = tuner.dataframe()

# Start from the full frame so that `df` is defined even if no job has reported yet
df = full_df
if len(full_df) > 0:
    # Keep only the jobs that reported a finite objective value
    df = full_df[full_df['FinalObjectiveValue'] > -float('inf')]
    if len(df) > 0:
        df = df.sort_values('FinalObjectiveValue', ascending=is_minimize)
        print("Number of training jobs with valid objective: %d" % len(df))
        print({"lowest":min(df['FinalObjectiveValue']),"highest": max(df['FinalObjectiveValue'])})
        pd.set_option('display.max_colwidth', None)  # Don't truncate TrainingJobName (pandas >= 1.0; older versions used -1)
    else:
        print("No training jobs have reported valid results yet.")

df

Number of training jobs with valid objective: 12
{'lowest': 0.5, 'highest': 0.7762699723243713}


| | FinalObjectiveValue | TrainingElapsedTimeSeconds | TrainingEndTime | TrainingJobName | TrainingJobStatus | TrainingStartTime | alpha | eta | max_depth | min_child_weight |
|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 0.77627 | 54.0 | 2020-04-14 22:24:11+00:00 | xgboost-200414-2217-005-92f42faa | Completed | 2020-04-14 22:23:17+00:00 | 0.994168 | 0.196718 | 3.0 | 2.917386 |
| 6 | 0.775866 | 97.0 | 2020-04-14 22:28:10+00:00 | xgboost-200414-2217-009-5174f946 | Completed | 2020-04-14 22:26:33+00:00 | 1.258611 | 0.389956 | 2.0 | 3.131444 |
| 5 | 0.773587 | 68.0 | 2020-04-14 22:31:04+00:00 | xgboost-200414-2217-010-69dab8e5 | Completed | 2020-04-14 22:29:56+00:00 | 1.328396 | 0.159243 | 2.0 | 2.385579 |
| 11 | 0.767794 | 63.0 | 2020-04-14 22:23:59+00:00 | xgboost-200414-2217-004-1c34ed13 | Completed | 2020-04-14 22:22:56+00:00 | 1.115972 | 0.15848 | 7.0 | 2.099117 |
| 13 | 0.763186 | 65.0 | 2020-04-14 22:20:41+00:00 | xgboost-200414-2217-002-2a012d1e | Completed | 2020-04-14 22:19:36+00:00 | 0.515211 | 0.203475 | 1.0 | 4.194148 |
| 3 | 0.760709 | 59.0 | 2020-04-14 22:31:48+00:00 | xgboost-200414-2217-012-77b1ddc8 | Completed | 2020-04-14 22:30:49+00:00 | 0.479314 | 0.735151 | 1.0 | 2.696561 |
| 9 | 0.759532 | 67.0 | 2020-04-14 22:24:27+00:00 | xgboost-200414-2217-006-9acc4f53 | Completed | 2020-04-14 22:23:20+00:00 | 1.439968 | 0.914717 | 1.0 | 4.716771 |
| 14 | 0.756599 | 66.0 | 2020-04-14 22:20:34+00:00 | xgboost-200414-2217-001-0a56dcd3 | Completed | 2020-04-14 22:19:28+00:00 | 0.028123 | 0.583794 | 5.0 | 9.551942 |
| 8 | 0.749732 | 73.0 | 2020-04-14 22:27:44+00:00 | xgboost-200414-2217-007-02ae026a | Completed | 2020-04-14 22:26:31+00:00 | 1.560086 | 0.288759 | 10.0 | 4.703587 |
| 12 | 0.735502 | 83.0 | 2020-04-14 22:21:00+00:00 | xgboost-200414-2217-003-fb8f4445 | Completed | 2020-04-14 22:19:37+00:00 | 0.477242 | 0.597317 | 7.0 | 7.438805 |


## See TuningJob results vs time
Next we show how the objective metric changes over time as the tuning job progresses. With the Bayesian strategy, you should expect to see a general trend towards better results, but this progress will not be steady because the algorithm needs to balance _exploration_ of new areas of the parameter space against _exploitation_ of known good areas. This can give you a sense of whether the number of training jobs is sufficient for the complexity of your search space.

In [5]:
import bokeh
import bokeh.io
bokeh.io.output_notebook()
from bokeh.plotting import figure, show
from bokeh.models import HoverTool

class HoverHelper():

    def __init__(self, tuning_analytics):
        self.tuner = tuning_analytics

    def hovertool(self):
        tooltips = [
            ("FinalObjectiveValue", "@FinalObjectiveValue"),
            ("TrainingJobName", "@TrainingJobName"),
        ]
        for k in self.tuner.tuning_ranges.keys():
            tooltips.append( (k, "@{%s}" % k) )

        ht = HoverTool(tooltips=tooltips)
        return ht

    def tools(self, standard_tools='pan,crosshair,wheel_zoom,zoom_in,zoom_out,undo,reset'):
        return [self.hovertool(), standard_tools]

hover = HoverHelper(tuner)

p = figure(plot_width=900, plot_height=400, tools=hover.tools(), x_axis_type='datetime')
p.circle(source=df, x='TrainingStartTime', y='FinalObjectiveValue')
show(p)
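
A scatter plot can make the trend hard to judge by eye, so it may help to also track the best objective value seen so far. The sketch below is an illustration, not part of the original sample: `running_best` is a hypothetical helper, and the data are the ten rows shown in the results table above, ordered by `TrainingStartTime`.

```python
from itertools import accumulate

def running_best(values, minimize=False):
    """Return the best objective seen so far after each training job."""
    return list(accumulate(values, min if minimize else max))

# (TrainingStartTime, FinalObjectiveValue) pairs copied from the table above,
# sorted by start time. Only the ten rows shown there are included.
runs = [
    ("22:19:28", 0.756599),
    ("22:19:36", 0.763186),
    ("22:19:37", 0.735502),
    ("22:22:56", 0.767794),
    ("22:23:17", 0.776270),
    ("22:23:20", 0.759532),
    ("22:26:31", 0.749732),
    ("22:26:33", 0.775866),
    ("22:29:56", 0.773587),
    ("22:30:49", 0.760709),
]

# validation:auc is maximized, so keep the running maximum
best_so_far = running_best([value for _, value in runs])
for (start, value), best in zip(runs, best_so_far):
    print(f"{start}  objective={value:.6f}  best so far={best:.6f}")
```

In these rows the best value is reached by the fifth job and is not improved afterwards. With the notebook's own variables, the same idea would be `running_best(df.sort_values('TrainingStartTime')['FinalObjectiveValue'], minimize=is_minimize)`.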

### Discussion Points
- What is Bayesian search strategy, and how is it different from Grid Search and Random Search? (A topic we covered in the previous workshop)
- What is the exploration vs exploitation dilemma? (Exploration of new areas of parameter space against exploitation of known good areas).
- What can we infer from the mini experiment above? Is there a relationship between the training start time (x-axis) and the objective metric value (y-axis)?


## Analyze the correlation between objective metric and individual hyperparameters 
Now that you have finished a tuning job, you may want to know how your objective metric correlates with each hyperparameter you selected to tune. That insight helps you decide whether it makes sense to adjust the search ranges for certain hyperparameters and start another tuning job. For example, if the objective metric improves as a numerical hyperparameter increases, you probably want to shift that hyperparameter's tuning range higher in your next tuning job.

The following cell draws a graph for each hyperparameter to show its correlation with your objective metric.

In [6]:
ranges = tuner.tuning_ranges
figures = []
for hp_name, hp_range in ranges.items():
    categorical_args = {}
    if hp_range.get('Values'):
        # This is marked as categorical.  Check if all options are actually numbers.
        def is_num(x):
            try:
                float(x)
                return True
            except (TypeError, ValueError):
                # float() raises one of these when the value is not numeric
                return False
        vals = hp_range['Values']
        if all(is_num(x) for x in vals):
            # Bokeh has issues plotting a "categorical" range that's actually numeric, so plot as numeric
            print("Hyperparameter %s is tuned as categorical, but all values are numeric" % hp_name)
        else:
            # Set up extra options for plotting categoricals.  A bit tricky when they're actually numbers.
            categorical_args['x_range'] = vals

    # Now plot it
    p = figure(plot_width=500, plot_height=500, 
               title="Objective vs %s" % hp_name,
               tools=hover.tools(),
               x_axis_label=hp_name, y_axis_label=objective_name,
               **categorical_args)
    p.circle(source=df, x=hp_name, y='FinalObjectiveValue')
    figures.append(p)
show(bokeh.layouts.Column(*figures))
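
The plots show each relationship visually. To put a number on it, you could compute a rank correlation between a hyperparameter and the objective. The following is a minimal pure-Python sketch of Spearman's rank correlation (Pearson correlation applied to ranks, with ties averaged). The function names are hypothetical, and the `eta` and objective values are copied from the ten rows of the results table above.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# eta and FinalObjectiveValue columns, in the row order of the table above
eta = [0.196718, 0.389956, 0.159243, 0.15848, 0.203475,
       0.735151, 0.914717, 0.583794, 0.288759, 0.597317]
objective = [0.77627, 0.775866, 0.773587, 0.767794, 0.763186,
             0.760709, 0.759532, 0.756599, 0.749732, 0.735502]

rho = spearman(eta, objective)
print(f"Spearman correlation between eta and the objective: {rho:.2f}")
```

For these ten rows the coefficient comes out negative (roughly -0.58), which hints that lower `eta` values did better here. With so few runs this is weak evidence, so treat it as a prompt for a further tuning job rather than a conclusion. With the notebook's DataFrame, the inputs would be `df['eta'].tolist()` and `df['FinalObjectiveValue'].tolist()`.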

## Discussion Points

### Hyperparameters

The tuning job in the previous example tuned four hyperparameters:
    
#### Max Tree Depth
*max_depth*: Maximum depth of a tree. Increasing this value makes the model more complex and more likely to overfit.

#### Learning Rate
*eta*: The learning rate is the shrinkage applied at every boosting step. A lower learning rate (eta) makes the boosting process more conservative, but as eta gets lower you need many more steps (rounds) to reach the optimum:
 - Increasing eta makes computation faster (because you need fewer rounds) but makes it harder to reach the best optimum.
 - Decreasing eta makes computation slower (because you need more rounds) but makes it easier to reach the best optimum.
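
The cost side of this trade-off can be made concrete with a deliberately idealized model: assume every boosting round fits the current residual perfectly and its contribution is then scaled by eta, so the remaining residual shrinks by a factor of `(1 - eta)` per round. This is a sketch under that assumption only, with a hypothetical function name. It shows how the number of rounds grows as eta drops, and it does not model the generalization benefit of a small eta.

```python
def rounds_to_converge(eta, tolerance=0.01):
    """Count rounds until the remaining residual fraction is at most `tolerance`.

    Idealized model: each round removes a fraction `eta` of what is left.
    """
    if not 0 < eta <= 1:
        raise ValueError("eta must be in (0, 1]")
    residual = 1.0
    rounds = 0
    while residual > tolerance:
        residual *= 1 - eta
        rounds += 1
    return rounds

print(rounds_to_converge(0.3))  # 13 rounds to get within 1% of the target
print(rounds_to_converge(0.1))  # 44 rounds to get within 1% of the target
```

In this toy setting, cutting eta from 0.3 to 0.1 more than triples the rounds needed, which is why a lower eta is usually paired with a higher number of rounds.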

#### Regularization 

*alpha*: The L1 regularization term on weights. Remember:

 - Regularization is a technique to discourage model complexity. It does this by adding a penalty to the loss function.
 - The loss function measures prediction error. In squared-error regression, for example, it is the sum of squared differences between the actual and predicted values.
 - Regularization works on the assumption that smaller weights produce a simpler model and thus help avoid overfitting.
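
To see how an L1 penalty pushes weights towards zero, consider the weight assigned to a single tree leaf. The sketch below follows the usual second-order formulation for gradient-boosted trees, in which the leaf weight is the soft-thresholded gradient sum divided by the Hessian sum plus the L2 term. Treat it as an illustration rather than the library's exact code path. The function name and the `reg_lambda` default are assumptions made for this example.

```python
def leaf_weight(grad_sum, hess_sum, alpha=0.0, reg_lambda=1.0):
    """Leaf weight with L1 (alpha) and L2 (reg_lambda) regularization.

    Illustrative formula: w = -sign(G) * max(|G| - alpha, 0) / (H + lambda),
    where G and H are the sums of gradients and Hessians in the leaf.
    """
    shrunk = max(abs(grad_sum) - alpha, 0.0)  # soft-threshold the gradient sum
    sign = 1.0 if grad_sum > 0 else -1.0
    return -sign * shrunk / (hess_sum + reg_lambda)

print(leaf_weight(-4.0, 3.0, alpha=0.0))  # 1.0: no L1 penalty
print(leaf_weight(-4.0, 3.0, alpha=2.0))  # 0.5: weight pulled towards zero
print(leaf_weight(-4.0, 3.0, alpha=5.0))  # 0.0: penalty outweighs the gradient
```

A larger alpha shrinks every leaf weight and zeroes out leaves whose gradient signal is weak, which is the sense in which it makes the model simpler.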

#### Child Weight

*min_child_weight*: The minimum sum of instance weights (Hessian) needed in a child node. In summary:
 - The tree-building process gives up further partitioning when a split would produce a child whose weight falls below this value.
 - In linear regression tasks, this simply corresponds to the minimum number of instances needed in each node. The larger `min_child_weight` is, the more conservative the algorithm is.
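
The "weight" of a child depends on the loss function. With squared error each instance contributes a Hessian of 1, so the weight equals the instance count. With logistic loss each instance contributes `p * (1 - p)`, so confidently predicted instances count for very little. The sketch below illustrates this with hypothetical helper names. It assumes a logistic objective for the second case, which the notebook does not state explicitly, and uses the `min_child_weight` of the best job above (about 2.917) as the threshold.

```python
def child_weight_squared_error(n_instances):
    """Squared-error loss: every instance has Hessian 1."""
    return float(n_instances)

def child_weight_logistic(probabilities):
    """Logistic loss: each instance has Hessian p * (1 - p)."""
    return sum(p * (1 - p) for p in probabilities)

def split_allowed(left_weight, right_weight, min_child_weight):
    """A split is kept only if both children reach the minimum weight."""
    return left_weight >= min_child_weight and right_weight >= min_child_weight

MIN_CHILD_WEIGHT = 2.917  # rounded value from the best training job above

# Squared error: three instances per child are enough, two are not
print(split_allowed(child_weight_squared_error(3),
                    child_weight_squared_error(5), MIN_CHILD_WEIGHT))
print(split_allowed(child_weight_squared_error(2),
                    child_weight_squared_error(5), MIN_CHILD_WEIGHT))

# Logistic loss: ten instances predicted at p = 0.9 weigh only about 0.9 in total
print(child_weight_logistic([0.9] * 10))
```

Under these assumptions, a child full of confidently classified instances can fall below the threshold even when it contains many rows, which is how a higher `min_child_weight` stops the tree from refining regions it already predicts well.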


## Further Discussion Points

The graphs do not show any clear trend because the number of runs is very limited. As an optional exercise, try running more training jobs and interpreting the results after a new tuning job.