Amazon SageMaker Autopilot Candidate Definition Notebook
This notebook was automatically generated by the AutoML job five. It allows you to customize the AutoGluon trials and execute the SageMaker Autopilot workflow.

The dataset has 9 columns, and the column named heart_disease is used as the target column. The target has 2 classes, so this is treated as a BinaryClassification problem. This notebook will build a BinaryClassification model that maximizes the "F1" quality metric of the trained models. The "F1" metric applies to binary classification with a positive and a negative class. It is the harmonic mean of precision and recall, and is recommended when there are more negative examples than positive examples.
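
To make the metric concrete, here is a minimal sketch of how F1 is computed from confusion-matrix counts. The function name and the counts are illustrative assumptions, not values from this AutoML job.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return F1, the harmonic mean of precision and recall."""
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    if predicted_positive == 0 or actual_positive == 0:
        return 0.0  # precision or recall is undefined; report 0 by convention
    precision = true_positives / predicted_positive
    recall = true_positives / actual_positive
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical validation set: 90 negatives and 10 positives.
# The model finds 6 of the positives and wrongly flags 4 negatives.
score = f1_score(true_positives=6, false_positives=4, false_negatives=4)
print(f"F1 = {score:.2f}")
```

With these hypothetical counts, accuracy would be 0.92 because the negatives dominate, while F1 is 0.60. This is why F1 tends to be the more informative metric when positives are rare.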

As part of the AutoML job, the input dataset has been randomly split into two pieces, one for training and one for validation. Given an input dataset, Amazon SageMaker Autopilot runs a number of trials with different base models and metaparameter settings. This notebook helps you inspect and modify the metaparameters proposed by Amazon SageMaker Autopilot. You can interactively select one of the configurations proposed by Amazon SageMaker Autopilot, modify it and execute a processing job to train models as per the selected configuration.

Contents
SageMaker Setup
Downloading Generated Modules
SageMaker Autopilot Job and Amazon Simple Storage Service (Amazon S3) Configuration
Candidate Trials
Select Candidate to Train
Update Selected Candidate
Display Selected Candidate
Executing the Candidate Trial
Run Processing Job
Model Deployment
Deploying the Trained Model

SageMaker Setup
Before you launch the SageMaker Autopilot jobs, set up the environment for Amazon SageMaker:

Check the environment and dependencies.
Create a few helper objects and functions to organize input/output data and SageMaker sessions.

Minimal Environment Requirements
Jupyter: Tested on JupyterLab 1.0.6, jupyter_core 4.5.0 and IPython 6.4.0
Kernel: conda_python3
Dependencies required
sagemaker-python-sdk>=2.40.0
Use !pip install sagemaker==2.40.0 to install this dependency.
The kernel may need to be restarted after installation.
Expected Execution Role/permission
S3 access to the bucket that stores the notebook.
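
The dependency requirement above can be checked before running the rest of the notebook. The sketch below assumes Python 3.8 or later (for importlib.metadata) and that the SDK is installed under the PyPI name sagemaker. The helper names are illustrative and are not part of the generated helper module.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_SAGEMAKER_VERSION = "2.40.0"


def parse_version(text: str) -> tuple:
    """Turn '2.40.0' into (2, 40, 0), padding missing parts with zeros."""
    numbers = [int(n) for n in re.findall(r"\d+", text)[:3]]
    return tuple(numbers + [0] * (3 - len(numbers)))


def meets_minimum(installed: str, minimum: str = MIN_SAGEMAKER_VERSION) -> bool:
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


try:
    installed = version("sagemaker")
    status = "OK" if meets_minimum(installed) else "too old, upgrade and restart the kernel"
    print(f"sagemaker {installed}: {status}")
except PackageNotFoundError:
    print("sagemaker is not installed; install it and restart the kernel")
```

Comparing parsed integer tuples rather than raw strings avoids the pitfall where "2.100.0" sorts before "2.40.0" lexicographically.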


Downloading Generated Modules
Download the generated trial configurations and a SageMaker Autopilot helper module used by this notebook. These artifacts will be downloaded to the five-artifacts folder.

In [None]:
!mkdir -p five-artifacts
!aws s3 sync s3://sagemaker-studio-359522689357-fzxtb6a46av/five/sagemaker-automl-candidates/notebooks/sagemaker_automl_ensemble five-artifacts/sagemaker_automl_ensemble --only-show-errors
!aws s3 sync s3://sagemaker-studio-359522689357-fzxtb6a46av/five/sagemaker-automl-candidates/notebooks/trial_configs five-artifacts/trial_configs --only-show-errors

import sys
sys.path.append("five-artifacts")

SageMaker Autopilot Job and Amazon Simple Storage Service (Amazon S3) Configuration
The following configuration has been derived from the SageMaker Autopilot job. These items configure where this notebook will look for generated candidates, and where input and output data is stored on Amazon S3.


In [None]:
from sagemaker_automl_ensemble import AutoMLLocalEnsembleRunConfig, uid

# Where the existing AutoML job is stored
BASE_AUTOML_JOB_NAME = 'five'
BASE_AUTOML_JOB_CONFIG = {
    'automl_job_name': BASE_AUTOML_JOB_NAME,
    'automl_output_s3_base_path': 's3://sagemaker-studio-359522689357-fzxtb6a46av/five'
}

# Naming and S3 output path conventions for the local AutoML job run by this notebook
LOCAL_AUTOML_JOB_NAME = 'five-notebook-run-{}'.format(uid())
LOCAL_AUTOML_JOB_CONFIG = {
    'local_automl_job_name': LOCAL_AUTOML_JOB_NAME,
    'local_automl_job_output_s3_base_path': 's3://sagemaker-studio-359522689357-fzxtb6a46av/five/{}'.format(LOCAL_AUTOML_JOB_NAME),
}

AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG = AutoMLLocalEnsembleRunConfig(
    test_artifacts_path = 'five-artifacts',
    base_automl_job_config = BASE_AUTOML_JOB_CONFIG,
    local_automl_job_config = LOCAL_AUTOML_JOB_CONFIG
)

AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.display()
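
The naming convention used in the cell above can be illustrated in isolation. This is a sketch only: short_uid is a hypothetical stand-in for the helper module's uid(), whose actual format is not shown in this notebook, and the bucket name is a placeholder.

```python
import uuid


def short_uid() -> str:
    """Hypothetical stand-in for uid(); the real helper's format may differ."""
    return uuid.uuid4().hex[:8]


def local_job_paths(base_job_name: str, base_s3_path: str, suffix: str) -> dict:
    """Compose the notebook-run job name and its S3 output prefix."""
    job_name = f"{base_job_name}-notebook-run-{suffix}"
    return {
        "local_automl_job_name": job_name,
        "local_automl_job_output_s3_base_path": f"{base_s3_path.rstrip('/')}/{job_name}",
    }


# Placeholder bucket; each notebook run gets its own prefix under the base job path.
paths = local_job_paths("five", "s3://example-bucket/five", short_uid())
print(paths["local_automl_job_output_s3_base_path"])
```

Because the suffix is unique per run, re-running the notebook writes to a fresh prefix instead of overwriting the outputs of an earlier run.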

Candidate Trials
Select Candidate to Train
The SageMaker Autopilot Job has analyzed the dataset and has generated a number of trial configurations with different metaparameter settings. You can select a trial configuration that you wish to train:

In [None]:
from ipywidgets import interact

trials_dropdown = AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.dropdown
interact(AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.select_trial, trials_dropdown=trials_dropdown)

Selected Candidate Metaparameters
You have selected the following metaparameters for your trial. Run the cell below to load and display your selection:

In [None]:
AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.display_candidate()

Executing the Candidate Trial
Run Processing Job
Now you are ready to create a processing job with the updated trial configuration.

Prepare Processor and Processing Job Inputs

In [None]:
from sagemaker.processing import Processor

processor_args = AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.prepare_processor_args()
processor = Processor(**processor_args)

processing_inputs = AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.prepare_processing_inputs()
processing_outputs = AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.prepare_processing_outputs()
processing_job_name = AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.local_automl_job_name

Run Processing Job for the Selected Trial

In [None]:
from IPython.display import display, Markdown

display(
    Markdown(f"Creating Processing Job {processing_job_name}, please track the progress from [here](https://{AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.region}.console.aws.amazon.com/sagemaker/home?region={AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.region}#/processing-jobs/{processing_job_name}).")
)

processor.run(
    job_name = processing_job_name,
    inputs = processing_inputs,
    outputs = processing_outputs,
    logs = False
)
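
Because the run call above passes logs = False, progress is not streamed into the notebook. If you want to check the job from another session instead of the console, one option is to poll its status. The sketch below assumes a boto3 SageMaker client and uses its describe_processing_job call. The function name, polling interval, and timeout are illustrative choices.

```python
import time

# Statuses after which the job will not change any further.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_processing_job(sm_client, job_name, poll_seconds=30, max_polls=240, sleep=time.sleep):
    """Poll a processing job until it reaches a terminal status and return that status."""
    for _ in range(max_polls):
        response = sm_client.describe_processing_job(ProcessingJobName=job_name)
        status = response["ProcessingJobStatus"]
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"Processing job {job_name} did not finish after {max_polls} polls")


# Usage (requires AWS credentials and an existing job):
# import boto3
# status = wait_for_processing_job(boto3.client("sagemaker"), processing_job_name)
# print(status)
```

The client is passed in as an argument so the function can be exercised with a stub object, without calling AWS.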

Model Deployment
Now, you can deploy the trained model from the processing job. After the deployment completes, you will get an endpoint that's ready to serve online inference.

In [None]:
from sagemaker.model import Model

model_args = AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.prepare_model_args()
model = Model(**model_args)

model.deploy(initial_instance_count=2,
             instance_type='ml.m5.12xlarge',
             endpoint_name="AutoML-{}".format(processing_job_name),
             wait=True)
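
Once the endpoint is in service, it can be called for online inference. The sketch below assumes the endpoint accepts a single CSV row of the 8 feature columns with content type text/csv. Both that format and the feature values are assumptions to verify against your model. It uses the invoke_endpoint call of a boto3 sagemaker-runtime client.

```python
def to_csv_row(features) -> str:
    """Join feature values into one comma-separated row."""
    return ",".join(str(value) for value in features)


def predict(runtime_client, endpoint_name, features, content_type="text/csv"):
    """Send one row of features to the endpoint and return the raw response text."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType=content_type,
        Body=to_csv_row(features).encode("utf-8"),
    )
    return response["Body"].read().decode("utf-8")


# Usage (requires AWS credentials and a deployed endpoint; feature values are hypothetical):
# import boto3
# endpoint_name = "AutoML-{}".format(processing_job_name)
# print(predict(boto3.client("sagemaker-runtime"), endpoint_name, [63, 1, 3, 145, 233, 1, 150, 0]))
#
# The endpoint is billed while it runs; delete it when you are done:
# boto3.client("sagemaker").delete_endpoint(EndpointName=endpoint_name)
```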