# Notebook 2: Breast Cancer Classification using Gene Expression Data

## Learning Objectives
- Use SageMaker processing, training, and hyperparameter tuning jobs to optimize cost and performance
- Compare model performance using SageMaker Experiments

## Environment Notes:
This notebook was created and tested on an `ml.t3.medium (2 vCPU + 4 GiB)` notebook instance running the `Python 3 (Data Science)` kernel in SageMaker Studio.

## Table of Contents
1. [Background](#1.-Background)
    1. [Jobs](#1.A.-SageMaker-Jobs)
    1. [Experiments](#1.B.-SageMaker-Experiments)
1. [Preparation](#2.-Preparation)
    1. [Import Python libraries](#2.A.-Import-Python-Libraries)
    1. [Create Some Necessary Clients](#2.B.-Create-Some-Necessary-Clients)
    1. [Create an Experiment](#2.C.-Create-an-Experiment)
    1. [Specify S3 Bucket and Prefix](#2.D.-Specify-S3-Bucket-and-Prefix)
    1. [Define Local Working Directories](#2.E.-Define-Local-Working-Directories)
1. [Data Preparation with Amazon SageMaker Processing](#3.-Data-Preparation-with-Amazon-SageMaker-Processing)
    1. [Upload Raw Data to S3](#3.A.-Upload-Raw-Data-to-S3)
    1. [Submit SageMaker Processing Job](#3.B.-Submit-SageMaker-Processing-Job)
1. [Model Training](#4.-Model-Training)
    1. [Train Model Using a SKLearn Random Forest Algorithm](#4.A.-Train-Model-Using-a-SKLearn-Random-Forest-Algorithm)
    1. [Train Model using a Keras MLP](#4.B.-Train-Model-using-a-Keras-MLP)
    1. [Train Model Using the XGBoost Algorithm](#4.C.-Train-Model-Using-the-XGBoost-Algorithm)
1. [Model Evaluation](#5.-Model-Evaluation)
    1. [Download and Run the Trained XGBoost Model](#5.A.-Download-and-Run-the-Trained-XGBoost-Model)
    1. [Compare Model Results Using SageMaker Experiments](#5.B.-Compare-Model-Results-Using-SageMaker-Experiments)
1. [Hyperparameter Optimization](#6.-Hyperparameter-Optimization)
    1. [Submit Hyperparameter Optimization Job](#6.A.-Submit-Hyperparameter-Optimization-Job)
    1. [Add HPO Job to Experiment](#6.B.-Add-HPO-Job-to-Experiment)
    1. [Deploy Best Model](#6.C.-Deploy-Best-Model)

---

## 1. Background
In Notebook 1 of this series, we used RNAseq data to predict HER2 status with the compute resources on the notebook server. However, using notebook server resources to process large amounts of data or train complex models is generally not a good idea. It's possible to scale up your notebook server, but you then pay for that extra capacity even while you work on non-compute-intensive tasks (i.e., most of your time). A better idea is to run your notebook on a small server and submit compute-intensive tasks as independent jobs. SageMaker provides managed services for running data processing, model training, and hyperparameter tuning jobs. In this notebook, we'll demonstrate how to use these services to optimize the performance and cost of our tasks.

Specifically, we'll demonstrate two best practices: **Jobs** and **Experiments**.

---

### 1.A. SageMaker Jobs

[SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html) processing, training, and hyperparameter optimization (HPO) jobs allow data scientists to submit compute-heavy processes to external services. This keeps costs optimized and ensures that these tasks run in reproducible environments. It also improves data scientist productivity by allowing these jobs to run in "the background" and provides resiliency if something happens to your notebook environment.

![alt text](img/jobs.png "Jobs")

### 1.B. SageMaker Experiments

![alt text](img/experiments.png "Experiments")

[SageMaker Experiments](https://aws.amazon.com/blogs/aws/amazon-sagemaker-experiments-organize-track-and-compare-your-machine-learning-trainings) makes it easier to track data preparation and analysis steps. Organizing your ML project into experiments helps you manage large numbers of trials and alternative algorithms. Experiments also ensure that any artifacts you generate for production use can be traced back to their source.

---
## 2. Preparation

Let's start by specifying:

- The Python libraries that we'll use throughout the analysis
- The S3 bucket and prefix that you want to use for training and model data.  This should be within the same region as the Notebook Instance, training, and hosting.
- The IAM role ARN used to give training and hosting access to your data. See the documentation for how to create these. Note: if more than one role is required for notebook instances, training, and/or hosting, replace the `sagemaker.get_execution_role()` call with the appropriate full IAM role ARN string(s).

### 2.A. Import Python Libraries

In [None]:
%pip install -r requirements.txt -q -q

In [None]:
import argparse
import boto3
from botocore.exceptions import ClientError
import json
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import pickle
import sagemaker
from sagemaker.analytics import ExperimentAnalytics
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.sklearn.estimator import SKLearn
from sagemaker.xgboost.estimator import XGBoost
from sagemaker.tensorflow import TensorFlow
from sagemaker.tuner import (
    IntegerParameter,
    CategoricalParameter,
    ContinuousParameter,
    HyperparameterTuner,
)
from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import JSONDeserializer
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, f1_score
import seaborn as sns
from smexperiments.experiment import Experiment
from smexperiments.search_expression import Filter, Operator, SearchExpression
from smexperiments.trial import Trial
from smexperiments.trial_component import TrialComponent
from time import strftime, sleep
import xgboost as xgb

### 2.B. Create Some Necessary Clients

In [None]:
session = boto3.session.Session()
sm_session = sagemaker.session.Session(session)
region = session.region_name
role = sagemaker.get_execution_role()
s3 = boto3.client("s3", region_name=region)
account_id = boto3.client("sts").get_caller_identity().get("Account")

### 2.C. Create an Experiment

We create a new SageMaker experiment specific to our scientific goal, in this case to predict HER2 status.

In [None]:
create_date = strftime("%Y-%m-%d-%H-%M-%S")
brca_her2_experiment = Experiment.create(
    experiment_name=f"BRCA-HER2-{create_date}",
    description="Predict HER2 status using TCGA RNAseq data.",
    tags=[{"Key": "Creator", "Value": "arosalez"}],
)

### 2.D. Specify S3 Bucket and Prefix

In [None]:
bucket_name = f"brca-her2-classifier-{account_id}"
print(f"S3 bucket name is {bucket_name}")

### 2.E. Define Local Working Directories

In [None]:
WORKING_DIR = os.getcwd()
DATA_DIR = os.path.join(WORKING_DIR, "data")
print(f"Working directory is {WORKING_DIR}")
print(f"Data directory is {DATA_DIR}")

---
## 3. Data Preparation with Amazon SageMaker Processing

Amazon SageMaker Processing allows you to run steps for data pre- or post-processing, feature engineering, data validation, or model evaluation workloads on Amazon SageMaker. Processing jobs accept data from Amazon S3 as input and store data into Amazon S3 as output.

![processing](https://sagemaker.readthedocs.io/en/stable/_images/amazon_sagemaker_processing_image1.png)

Here, we'll import the dataset and transform it with SageMaker Processing, which can be used to process terabytes of data in a SageMaker-managed cluster separate from the instance running your notebook server. In a typical SageMaker workflow, notebooks are only used for prototyping and can be run on relatively inexpensive and less powerful instances, while processing, training and model hosting tasks are run on separate, more powerful SageMaker-managed instances.  SageMaker Processing includes off-the-shelf support for Scikit-learn, as well as a Bring Your Own Container option, so it can be used with many different data transformation technologies and tasks.    

To use SageMaker Processing, simply supply a Python data preprocessing script as shown below.  For this example, we're using a SageMaker prebuilt Scikit-learn container, which includes many common functions for processing data.  There are few limitations on what kinds of code and operations you can run, and only a minimal contract:  input and output data must be placed in specified directories.  If this is done, SageMaker Processing automatically loads the input data from S3 and uploads transformed data back to S3 when the job is complete.
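To make the "minimal contract" concrete, here is a small sketch of what a processing script could look like. It is not the `processing.py` used in this notebook: the function names, split fractions, and toy data are assumptions for illustration. The only part that reflects the contract is that the script reads everything from an input directory and writes its results under an output directory (in a real job these would be `/opt/ml/processing/input` and `/opt/ml/processing/output`).

```python
import csv
import os
import random
import tempfile


def split_rows(rows, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle rows and split them into train, validation, and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test


def run(input_dir, output_dir):
    """Read every CSV in input_dir and write one CSV per split under output_dir."""
    rows = []
    for name in sorted(os.listdir(input_dir)):
        with open(os.path.join(input_dir, name), newline="") as f:
            rows.extend(csv.reader(f))
    splits = dict(zip(("train", "val", "test"), split_rows(rows)))
    for split, data in splits.items():
        split_dir = os.path.join(output_dir, split)
        os.makedirs(split_dir, exist_ok=True)
        with open(os.path.join(split_dir, f"{split}.csv"), "w", newline="") as f:
            csv.writer(f).writerows(data)
    return {split: len(data) for split, data in splits.items()}


# Local dry run: temporary folders stand in for the container directories.
with tempfile.TemporaryDirectory() as tmp:
    in_dir = os.path.join(tmp, "input")
    out_dir = os.path.join(tmp, "output")
    os.makedirs(in_dir)
    with open(os.path.join(in_dir, "toy.csv"), "w", newline="") as f:
        csv.writer(f).writerows([[i % 2, i, i * 2] for i in range(20)])
    counts = run(in_dir, out_dir)
    written = sorted(os.listdir(out_dir))

print(counts, written)
```

Because the script only depends on two directory paths, the same code can be tested locally (as above) and then submitted unchanged as a processing job.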

### 3.A. Upload Raw Data to S3

Download the raw data

In [None]:
# Define working directories
WORKING_DIR = os.getcwd()
DATA_DIR = os.path.join(WORKING_DIR, "data")
print(f"Working directory is {WORKING_DIR}")
print(f"Data directory is {DATA_DIR}")

# Get TCGA BRCA Gene Expression Data
!wget https://tcga.xenahubs.net/download/TCGA.BRCA.sampleMap/HiSeqV2_PANCAN.gz -nc -P $DATA_DIR/input/raw/
!gzip -df $DATA_DIR/input/raw/HiSeqV2_PANCAN.gz

# Get TCGA BRCA Phenotype Data
!wget https://tcga.xenahubs.net/download/TCGA.BRCA.sampleMap/BRCA_clinicalMatrix -nc -P $DATA_DIR/input/raw/

In [None]:
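# Check if bucket already exists. If it does not, create it
try:
    s3.head_bucket(Bucket=bucket_name)
except ClientError:
    # Outside us-east-1, S3 requires an explicit LocationConstraint
    bucket_config = {"LocationConstraint": region}
    if region == "us-east-1":
        s3.create_bucket(Bucket=bucket_name)
    else:
        s3.create_bucket(Bucket=bucket_name, CreateBucketConfiguration=bucket_config)
    print(f"Created Bucket: {bucket_name} in Region: {region}")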

Upload data to S3

In [None]:
clinical_source = sm_session.upload_data(
    f"{DATA_DIR}/input/raw/BRCA_clinicalMatrix",
    bucket=bucket_name,
    key_prefix="data/input",
)
RNAseq_source = sm_session.upload_data(
    f"{DATA_DIR}/input/raw/HiSeqV2_PANCAN", bucket=bucket_name, key_prefix="data/input"
)
print(f"Clinical phenotypes now available at {clinical_source}")
print(f"Normalized expression data now available at {RNAseq_source}")

### 3.B. Submit SageMaker Processing Job

Notice that this code block references `scripts/processing/processing.py`. The processing job will run this script on a different compute instance, in this case an `ml.m5.xlarge`. This allows us to use a small instance for our notebook server while still taking advantage of a more powerful instance for the processing.

In [None]:
# Define the inputs for the processing job
inputs = [
    ProcessingInput(
        source=f"s3://{bucket_name}/data/input/",
        destination="/opt/ml/processing/input",
        s3_data_distribution_type="ShardedByS3Key",
    )
]

# Define the outputs for the processing job
outputs = [
    ProcessingOutput(
        output_name="train",
        source="/opt/ml/processing/output/train",
        destination=f"s3://{bucket_name}/data/output/train/",
    ),
    ProcessingOutput(
        output_name="validation",
        source="/opt/ml/processing/output/val",
        destination=f"s3://{bucket_name}/data/output/val/",
    ),
    ProcessingOutput(
        output_name="test",
        source="/opt/ml/processing/output/test",
        destination=f"s3://{bucket_name}/data/output/test/",
    ),
]

sklearn_processor = SKLearnProcessor(
    framework_version="0.20.0",
    role=role,
    instance_type="ml.m5.xlarge",
    instance_count=1,
)

processing_run_name = f"Processing-{strftime('%Y-%m-%d-%H-%M-%S')}"

sklearn_processor.run(
    job_name=processing_run_name,
    code="scripts/processing/processing.py",
    inputs=inputs,
    outputs=outputs,
    experiment_config={
        "ExperimentName": brca_her2_experiment.experiment_name,
        "TrialComponentDisplayName": processing_run_name,
    },
    wait=True,
)

Download Processed Data from S3

In [None]:
sm_session.download_data(
    f"{DATA_DIR}/output/train",
    bucket=bucket_name,
    key_prefix="data/output/train/train.csv",
)
sm_session.download_data(
    f"{DATA_DIR}/output/val", bucket=bucket_name, key_prefix="data/output/val/val.csv"
)
sm_session.download_data(
    f"{DATA_DIR}/output/test",
    bucket=bucket_name,
    key_prefix="data/output/test/test.csv",
)

---
## 4. Model Training

Now that our training data is set up, we can train some models. To highlight the benefits of experiment tracking, we're going to train models using three different frameworks:
- The random forest model from Scikit-learn
- A multi-layer perceptron (MLP) neural network in Keras
- The open-source XGBoost algorithm

Since we're using SageMaker jobs to run our training, we don't need to install any additional libraries or spin up expensive compute resources on our notebook server. The jobs use their own dependencies and we're only charged for the time they run.

First, let's define some variables that all three training jobs will need.

In [None]:
# define the data type and paths to the training and validation datasets
content_type = "text/csv"

s3_input_train = sagemaker.inputs.TrainingInput(
    f"s3://{bucket_name}/data/output/train/train.csv", content_type=content_type
)

s3_input_validation = sagemaker.inputs.TrainingInput(
    f"s3://{bucket_name}/data/output/val/val.csv", content_type=content_type
)

s3_input_test = sagemaker.inputs.TrainingInput(
    f"s3://{bucket_name}/data/output/test/test.csv", content_type=content_type
)

model_output_path = f"s3://{bucket_name}/"

### 4.A. Train Model Using a SKLearn Random Forest Algorithm

Create a trial

In [None]:
rf_trial = Trial.create(
    trial_name=f"RF-Trial-{strftime('%Y-%m-%d-%H-%M-%S')}",
    experiment_name=brca_her2_experiment.experiment_name,
)

Here again we're passing a script (`scripts/rf_train/rf_train.py`) to run during the training job. Notice that we've also included a `requirements.txt` file in the training script directory to install additional dependencies in the training container. This is a great way to install an extra package or two without creating your own container image from scratch!

Setting `wait=False` allows us to continue running the notebook while the training job runs in "the background" (on a different machine).

In [None]:
rf_job_name = f"RF-Training-Job-{strftime('%Y-%m-%d-%H-%M-%S')}"

rf_estimator = SKLearn(
    entry_point="rf_train.py",
    source_dir="scripts/rf_train",
    output_path=model_output_path,
    role=role,
    instance_count=1,
    instance_type="ml.c5.2xlarge",
    framework_version="0.23-1",
    enable_sagemaker_metrics=True,
    base_job_name=rf_job_name,
    hyperparameters={
        "n-estimators": 100,
        "min-samples-leaf": 3,
    },
)

rf_estimator.fit(
    {"train": s3_input_train, "validation": s3_input_validation, "test": s3_input_test},
    job_name=rf_job_name,
    experiment_config={
        "TrialName": rf_trial.trial_name,
        "TrialComponentDisplayName": rf_job_name,
    },
    wait=False,
)

Note that you can also run the same training script in the notebook, as long as you have the dependencies installed!

In [None]:
!python scripts/rf_train/rf_train.py --n-estimators 100 \
                   --min-samples-leaf 3 \
                   --model-dir models \
                   --train "data/output/train" \
                   --validation "data/output/val" \
                   --test "data/output/test"
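The same script can run in both places because, in script mode, hyperparameters reach the script as command-line flags and the data and model locations are exposed through environment variables such as `SM_MODEL_DIR` and `SM_CHANNEL_TRAIN`. The sketch below shows one way a training script might parse them. It is an assumption about how such a script could be written, not a copy of `rf_train.py`; the local fallback paths and default values are illustrative.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters and data locations for a training script."""
    parser = argparse.ArgumentParser()

    # Hyperparameters arrive as command-line flags.
    parser.add_argument("--n-estimators", type=int, default=100)
    parser.add_argument("--min-samples-leaf", type=int, default=1)

    # Inside a training container the SM_* variables are set for us;
    # the fallbacks (assumed paths) make the script usable in a notebook.
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "models"))
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "data/output/train"))
    parser.add_argument("--validation", default=os.environ.get("SM_CHANNEL_VALIDATION", "data/output/val"))
    parser.add_argument("--test", default=os.environ.get("SM_CHANNEL_TEST", "data/output/test"))

    return parser.parse_args(argv)


# Simulate the flags passed in the shell command above.
args = parse_args(["--n-estimators", "100", "--min-samples-leaf", "3", "--model-dir", "models"])
print(args.n_estimators, args.min_samples_leaf, args.model_dir)
```

Note that `argparse` converts dashes in flag names to underscores, so `--n-estimators` becomes `args.n_estimators`.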

---

### 4.B. Train Model using a Keras MLP

Create a trial

In [None]:
tf_trial = Trial.create(
    trial_name=f"TF-Trial-{strftime('%Y-%m-%d-%H-%M-%S')}",
    experiment_name=brca_her2_experiment.experiment_name,
)

Submit the training job

In [None]:
tf_job_name = f"TF-Training-Job-{strftime('%Y-%m-%d-%H-%M-%S')}"

tf_estimator = TensorFlow(
    entry_point="tf_train.py",
    source_dir="scripts/tf_train",
    output_path=model_output_path,
    role=role,
    instance_count=1,
    instance_type="ml.g4dn.xlarge",
    enable_sagemaker_metrics=True,
    framework_version="2.2",
    py_version="py37",
    metric_definitions=[
        {"Name": "test:accuracy", "Regex": "Accuracy: ([0-9.]+)$"},
        {"Name": "test:precision", "Regex": "Precision: ([0-9.]+)$"},
        {"Name": "test:f1", "Regex": "F1 Score: ([0-9.]+)$"},
    ],
)

tf_estimator.fit(
    {"train": s3_input_train, "validation": s3_input_validation, "test": s3_input_test},
    job_name=tf_job_name,
    experiment_config={
        "TrialName": tf_trial.trial_name,
        "TrialComponentDisplayName": tf_job_name,
    },
    wait=False,
)
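The `metric_definitions` above tell SageMaker how to pull metric values out of the training job's log output: each regex is applied to the logs and the captured group becomes the metric value. The sketch below applies the same three patterns to a few made-up log lines with Python's `re` module. The log lines and the `extract_metrics` helper are hypothetical, and the matching that SageMaker performs internally may differ in detail.

```python
import re

metric_definitions = [
    {"Name": "test:accuracy", "Regex": "Accuracy: ([0-9.]+)$"},
    {"Name": "test:precision", "Regex": "Precision: ([0-9.]+)$"},
    {"Name": "test:f1", "Regex": "F1 Score: ([0-9.]+)$"},
]

# Hypothetical lines that a training script might print.
log_lines = [
    "Epoch 10/10 - loss: 0.2100",
    "Accuracy: 0.93",
    "Precision: 0.88",
    "F1 Score: 0.90",
]


def extract_metrics(lines, definitions):
    """Return {metric name: value} for every definition that matches a line."""
    found = {}
    for line in lines:
        for definition in definitions:
            match = re.search(definition["Regex"], line)
            if match:
                found[definition["Name"]] = float(match.group(1))
    return found


metrics = extract_metrics(log_lines, metric_definitions)
print(metrics)

# The trailing `$` means the number must end the line.
trailing_space = re.search(metric_definitions[0]["Regex"], "Accuracy: 0.93 ")
```

This is why the training script needs to print its metrics in exactly the format the regexes expect: a line with extra trailing text would not match these patterns.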

### 4.C. Train Model Using the XGBoost Algorithm

Compare the XGBoost training script we're about to run (`scripts/xgb_train/xgb_train.py`) with the training function we used in Notebook 1. You'll notice that the `xgb.train` call is the same in both!

Since we're setting `wait=True`, our Jupyter session will wait until this training job is finished before moving on.

Create a trial

In [None]:
xgb_trial = Trial.create(
    trial_name=f"XGBoost-Trial-{strftime('%Y-%m-%d-%H-%M-%S')}",
    experiment_name=brca_her2_experiment.experiment_name,
)

Submit the training job

In [None]:
xgb_job_name = f"XGB-Training-Job-{strftime('%Y-%m-%d-%H-%M-%S')}"

framework_version = "1.3-1"
py_version = "py3"

hyper_params_dict = {
    "objective": "binary:logistic",
    "booster": "gbtree",
    "eval_metric": "error",
}

xgb_estimator = XGBoost(
    entry_point="xgb_train.py",
    source_dir="scripts/xgb_train",
    output_path=model_output_path,
    framework_version=framework_version,
    py_version=py_version,
    hyperparameters=hyper_params_dict,
    role=role,
    instance_count=1,
    instance_type="ml.m5.2xlarge",
    # Workaround for https://github.com/aws/sagemaker-python-sdk/issues/2876
    image_uri=sagemaker.image_uris.retrieve("xgboost", region, framework_version)
    + "-cpu-"
    + py_version,
    enable_sagemaker_metrics=True,
)

xgb_estimator.fit(
    {"train": s3_input_train, "validation": s3_input_validation, "test": s3_input_test},
    job_name=xgb_job_name,
    experiment_config={
        "TrialName": xgb_trial.trial_name,
        "TrialComponentDisplayName": xgb_job_name,
    },
    logs=True,
    wait=True,
)

---
## 5. Model Evaluation

### 5.A. Download and Run the Trained XGBoost Model

In Notebook 1, we used a confusion matrix to evaluate the accuracy of our model. Let's download our trained XGBoost model and do the same thing here.

First, we download the model artifact from S3 and load it into our notebook.

In [None]:
sm_session.download_data(
    "models", bucket=bucket_name, key_prefix=f"{xgb_job_name}/output/model.tar.gz"
)
!tar xvfz models/model.tar.gz -C models

# Use a context manager so the model file is closed after loading
with open("models/xgboost-model", "rb") as model_file:
    loaded_model = pickle.load(model_file)

Next, we read in the test data and separate it into features and labels.

In [None]:
with open(f"{DATA_DIR}/output/test/test.csv", "rb") as file:
    test_np = np.loadtxt(file, delimiter=",")
test_labels = test_np[:, 0]
test_np = test_np[:, 1:]
test_dm = xgb.DMatrix(test_np)

Finally, we define a function for generating a confusion matrix and use it to analyze our test predictions

In [None]:
# Create a custom function for generating a confusion matrix for a given probability threshold p
def plot_cm(labels, predictions, p=0.5):
    cm = confusion_matrix(labels, predictions > p)
    plt.figure(figsize=(5, 5))
    sns.heatmap(cm, annot=True, fmt="d")
    plt.title("Confusion matrix @{:.2f}".format(p))
    plt.ylabel("Actual label")
    plt.xlabel("Predicted label")

    if len(set(list(labels))) == 2:
        print("Correctly un-detected (True Negatives): ", cm[0][0])
        print("Incorrectly detected (False Positives): ", cm[0][1])
        print("Misses (False Negatives): ", cm[1][0])
        print("Hits (True Positives): ", cm[1][1])
        print("Total positives: ", np.sum(cm[1]))

In [None]:
# evaluate predictions
test_predictions = loaded_model.predict(test_dm)

accuracy = accuracy_score(test_labels, np.rint(test_predictions))
precision = precision_score(test_labels, np.rint(test_predictions))
f1 = f1_score(test_labels, np.rint(test_predictions))

plot_cm(test_labels, np.array(test_predictions))

print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"F1 Score: {f1:.2f}")
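To see how the printed metrics relate to the four cells of the confusion matrix, here is a small pure-Python sketch on toy data. The labels and scores are made up, and the helper names are ours. It also shows how moving the threshold `p` trades false negatives for false positives.

```python
def confusion_counts(labels, scores, threshold=0.5):
    """Count true negatives, false positives, false negatives, true positives."""
    tn = fp = fn = tp = 0
    for label, score in zip(labels, scores):
        predicted = 1 if score > threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1 and predicted == 0:
            fn += 1
        elif label == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def summarize(tn, fp, fn, tp):
    """Derive accuracy, precision, recall, and F1 from the four counts."""
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, precision, recall, f1


# Toy data (not from the TCGA dataset).
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_scores = [0.1, 0.4, 0.6, 0.2, 0.9, 0.7, 0.3, 0.8]

counts_default = confusion_counts(toy_labels, toy_scores, threshold=0.5)
counts_lenient = confusion_counts(toy_labels, toy_scores, threshold=0.25)

print("p=0.50 (tn, fp, fn, tp):", counts_default, summarize(*counts_default))
print("p=0.25 (tn, fp, fn, tp):", counts_lenient, summarize(*counts_lenient))
```

Lowering the threshold catches the one missed positive but adds a false positive, which is the kind of trade-off the `p` argument of `plot_cm` lets you explore.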

### 5.B. Compare Model Results Using SageMaker Experiments

SageMaker Experiments saves key information about our models for easy viewing and comparison in the SageMaker Studio UI.

To start, click on the SageMaker Resources icon on the Studio sidebar and select `Experiments and trials` from the menu. To view information about your experiment, click on its name (it should start with "BRCA-HER2-") and then select `Open in trial component list`.

![alt text](img/sm-resources-tab.png "Studio Resources")

The Trial Component list has a record for each of the training jobs, plus the processing job. You can click on a trial component name for more information about that job.

![alt text](img/Trial-component-list.png "Trial Component List")

We can compare the performance of our model training jobs by adding an additional metric to the table. To do this, click on the Gear on the Studio sidebar and then `test:f1` in the Metrics section.

![alt text](img/metrics.png "Metrics")

Now we can see that the XGBoost model had the highest f1 score on the test data.

![alt text](img/tc-list-2.png "Updated Trial Component List")

You can view the same information programmatically by using the `ExperimentAnalytics` class

In [None]:
search_expression = {
    "Filters": [
        {
            "Name": "DisplayName",
            "Operator": "Contains",
            "Value": "Training",
        }
    ],
}

trial_component_analytics = ExperimentAnalytics(
    sagemaker_session=sm_session,
    experiment_name=brca_her2_experiment.experiment_name,
    search_expression=search_expression,
    sort_by="metrics.test:f1.last",
    sort_order="Descending",
    metric_names=["test:f1"],
    parameter_names=["SageMaker.InstanceType"],
)

trial_component_analytics.dataframe()

---
## 6. Hyperparameter Optimization

In the previous section, we saw that our XGBoost classifier gave the best results on our test dataset. However, we can likely improve its accuracy further through hyperparameter optimization (HPO). During HPO, we repeatedly train our model with small changes to one or more parameters each time. SageMaker Training is a great fit for this because it allows us to run multiple training jobs in parallel.

### 6.A. Submit Hyperparameter Optimization Job

In this example, we'll look at different values of the `alpha` and `eta` hyperparameters to see if we can decrease the error of our model on the validation data.

In [None]:
hyperparameter_ranges = {
    "alpha": IntegerParameter(0, 250, scaling_type="Auto"),
    "eta": ContinuousParameter(0.1, 0.5, scaling_type="Auto"),
}

hpo_tuner = HyperparameterTuner(
    xgb_estimator,
    objective_metric_name="validation:error",
    objective_type="Minimize",
    hyperparameter_ranges=hyperparameter_ranges,
    max_jobs=10,
    max_parallel_jobs=10,
)

hpo_tuner.fit(
    {"train": s3_input_train, "validation": s3_input_validation, "test": s3_input_test},
    include_cls_metadata=True,
)
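Conceptually, a tuning job is a loop: propose hyperparameter values, train, score, and keep the best result. The sketch below illustrates that loop with plain random search over the same `alpha` and `eta` ranges. It is a deliberately simplified stand-in: SageMaker's default tuning strategy is Bayesian optimization, not random search, and `toy_validation_error` is an invented function that replaces a real training job.

```python
import random


def toy_validation_error(alpha, eta):
    """Invented stand-in for a training job; lowest error at alpha=20, eta=0.3."""
    return 0.05 + ((alpha - 20) / 250) ** 2 + (eta - 0.3) ** 2


def random_search(n_trials, seed=0):
    """Sample hyperparameters at random and return the trials sorted by error."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        alpha = rng.randint(0, 250)   # integer range, like IntegerParameter(0, 250)
        eta = rng.uniform(0.1, 0.5)   # continuous range, like ContinuousParameter(0.1, 0.5)
        trials.append({"alpha": alpha, "eta": eta, "error": toy_validation_error(alpha, eta)})
    return sorted(trials, key=lambda trial: trial["error"])


results = random_search(n_trials=10)
best = results[0]
print(f"Best of {len(results)} trials: alpha={best['alpha']}, eta={best['eta']:.3f}")
```

In the real tuning job each trial is a full training job, which is why running them in parallel on separate instances saves so much wall-clock time.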

View tuning job results

In [None]:
tuner_description = hpo_tuner.describe()
objective_name = tuner_description["HyperParameterTuningJobConfig"][
    "HyperParameterTuningJobObjective"
]["MetricName"]
tuner = hpo_tuner.analytics()
tuning_results = tuner.dataframe().sort_values(by="FinalObjectiveValue")
tuning_results

Let's visualize what impact our hyperparameter tuning had on the model error

In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2)
fig.suptitle("Hyperparameter tuning results")
fig.set_size_inches(10, 5)

ax1.scatter(tuning_results.alpha, tuning_results.FinalObjectiveValue)
ax1.set_xlabel("alpha")
ax1.set_ylabel(objective_name)

ax2.scatter(tuning_results.eta, tuning_results.FinalObjectiveValue)
ax2.set_xlabel("eta")
ax2.set_ylabel(objective_name)

plt.show()

This data suggests that `alpha` less than 100 and `eta` values around 0.5 lead to the lowest error.

### 6.B. Add HPO Job to Experiment

HPO training jobs automatically create new, unassigned trial components in SageMaker Experiments. To view them alongside our other trials, we need to manually associate them with our experiment.

Create a trial

In [None]:
hpo_trial = Trial.create(
    trial_name=f"HPO-Trial-{strftime('%Y-%m-%d-%H-%M-%S')}",
    experiment_name=brca_her2_experiment.experiment_name,
)

Filter for the training job names that contain the tuning job name (and have "SageMakerTrainingJob" as the source type)


In [None]:
source_arn_filter = Filter(
    name="TrialComponentName",
    operator=Operator.CONTAINS,
    value=tuner_description["HyperParameterTuningJobName"],
)
source_type_filter = Filter(
    name="Source.SourceType", operator=Operator.EQUALS, value="SageMakerTrainingJob"
)

search_expression = SearchExpression(filters=[source_arn_filter, source_type_filter])

trial_component_search_results = list(
    TrialComponent.search(search_expression=search_expression)
)

Associate the trial components with the trial


In [None]:
for tc in trial_component_search_results:
    print(
        f"Associating trial component {tc.trial_component_name} with trial {hpo_trial.trial_name}."
    )
    hpo_trial.add_trial_component(tc.trial_component_name)
    sleep(0.5)  # sleep to avoid throttling

Now we can view the HPO jobs in the SageMaker Studio UI alongside the training jobs we created above.

![alt text](img/hpo_tc.png "Trial Component List with HPO jobs")

### 6.C. Deploy Best Model

To deploy the best model, we call `deploy()` on the tuner object, which creates an endpoint from the best training job found during tuning.

In [None]:
predictor = hpo_tuner.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.2xlarge",
)

We can now use this endpoint to make a prediction

In [None]:
predictor.serializer = CSVSerializer()
predictor.deserializer = JSONDeserializer()

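# The processed CSV has no header row, so don't treat the first record as one
test_df = pd.read_csv(f"{DATA_DIR}/output/test/test.csv", header=None)

# Send only the feature columns (everything after the label) to the endpoint
first_record = test_df.iloc[0, 1:].to_numpy()
print(f"Prediction for test record 1 is {predictor.predict(first_record)[0]}")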
print(f"Actual label is {test_df.iloc[0,0]}")
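When we call `predict`, the serializer turns the feature values into a CSV request body. The sketch below shows the general shape of that payload for a single toy record. It is a hand-rolled illustration, not the SDK's `CSVSerializer` implementation, and the values are made up. As in our test file, the label sits in the first column and is dropped before sending.

```python
def to_csv_row(values):
    """Join feature values into a single comma-separated line."""
    return ",".join(str(value) for value in values)


# Toy record: label first, then three feature values.
record = [1.0, 0.25, 3.5, 7.0]
label, features = record[0], record[1:]

payload = to_csv_row(features)
print(f"Request body: {payload!r} (label {label} is not sent)")
```

Sending the label column by mistake would shift every feature by one position, so it's worth checking the payload width against the number of features the model was trained on.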

Clean up your endpoint to avoid ongoing charges.

In [None]:
predictor.delete_endpoint()

In the next notebook, we'll look at deployment options in more detail.