# Amazon SageMaker Autopilot Candidate Definition Notebook

This notebook was automatically generated by the AutoML job `six` (the final model). It allows you to customize the AutoGluon trials and execute the SageMaker Autopilot workflow.

The dataset has 9 columns, and the column named `diabetes` is used as the target column. The target has 2 classes, so this is treated as a `BinaryClassification` problem. This notebook builds a binary classification model that maximizes the "F1" quality metric of the trained models. The F1 metric applies to binary classification with a positive and a negative class. It is the harmonic mean of precision and recall, and it is recommended when there are more negative examples than positive examples.
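The sketch below shows how F1 combines precision and recall. The confusion-matrix counts are hypothetical and are not results from this job. True negatives never appear in the formula. That is why F1 is not inflated by a large majority of easy negative examples.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for the positive class: precision 0.8, recall ~0.667
print(round(f1_score(tp=40, fp=10, fn=20), 4))  # → 0.7273
```

The same value follows from the equivalent form 2·TP / (2·TP + FP + FN) = 80 / 110.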

As part of the AutoML job, the input dataset has been randomly split into two pieces, one for training and one for validation. Given an input dataset, Amazon SageMaker Autopilot runs a number of trials with different base models and metaparameter settings. This notebook helps you inspect and modify the metaparameters proposed by Amazon SageMaker Autopilot. You can interactively select one of the configurations proposed by Amazon SageMaker Autopilot, modify it and execute a processing job to train models as per the selected configuration.
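A random train/validation split can be sketched as follows. This is only an illustration of the idea. The 80/20 ratio and the seed are hypothetical, because this notebook does not state the ratio Autopilot used.

```python
import random

def random_split(rows, validation_fraction=0.2, seed=0):
    """Shuffle rows with a fixed seed and split them into (train, validation)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]

# Hypothetical data: 100 row indices
train, validation = random_split(range(100))
print(len(train), len(validation))  # → 80 20
```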

## Contents

1. SageMaker Setup
    1. Downloading Generated Candidates
    2. SageMaker Autopilot Job and Amazon Simple Storage Service (Amazon S3) Configuration
2. Candidate Trials
    1. Select Candidate to Train
    2. Update Selected Candidate
    3. Display Selected Candidate
3. Executing the Candidate Trial
    1. Run Processing Job
4. Model Deployment
    1. Deploying the Trained Model

## SageMaker Setup
Before you launch the SageMaker Autopilot jobs, set up the environment for Amazon SageMaker:

1. Check the environment and dependencies.
2. Create a few helper objects/functions to organize input/output data and SageMaker sessions.

Minimal environment requirements:

1. Jupyter: tested on JupyterLab 1.0.6, jupyter_core 4.5.0 and IPython 6.4.0.
2. Kernel: `conda_python3`.
3. Dependencies: `sagemaker-python-sdk>=2.40.0`. Use `!pip install sagemaker==2.40.0` to install it. The kernel may need to be restarted after installation.
4. Expected execution role/permissions: S3 access to the bucket that stores the notebook.

# Downloading Generated Modules
Download the generated trial configurations and the SageMaker Autopilot helper module used by this notebook. These artifacts are downloaded to the `six-artifacts` folder.

```python
# Download the helper module and the generated trial configurations (IPython shell magics)
!mkdir -p six-artifacts
!aws s3 sync s3://sagemaker-studio-359522689357-fzxtb6a46av/six/sagemaker-automl-candidates/notebooks/sagemaker_automl_ensemble six-artifacts/sagemaker_automl_ensemble --only-show-errors
!aws s3 sync s3://sagemaker-studio-359522689357-fzxtb6a46av/six/sagemaker-automl-candidates/notebooks/trial_configs six-artifacts/trial_configs --only-show-errors

# Make the downloaded helper module importable
import sys
sys.path.append("six-artifacts")
```

# SageMaker Autopilot Job and Amazon Simple Storage Service (Amazon S3) Configuration
The following configuration has been derived from the SageMaker Autopilot job. These items configure where this notebook will look for generated candidates, and where input and output data is stored on Amazon S3.

```python
from sagemaker_automl_ensemble import AutoMLLocalEnsembleRunConfig, uid

# Where the existing AutoML job is stored
BASE_AUTOML_JOB_NAME = 'six'
BASE_AUTOML_JOB_CONFIG = {
    'automl_job_name': BASE_AUTOML_JOB_NAME,
    'automl_output_s3_base_path': 's3://sagemaker-studio-359522689357-fzxtb6a46av/six',
}

# Output location for this notebook's local run of the AutoML job;
# uid() from the helper module adds a suffix to the run name
LOCAL_AUTOML_JOB_NAME = 'six-notebook-run-{}'.format(uid())
LOCAL_AUTOML_JOB_CONFIG = {
    'local_automl_job_name': LOCAL_AUTOML_JOB_NAME,
    'local_automl_job_output_s3_base_path': 's3://sagemaker-studio-359522689357-fzxtb6a46av/six/{}'.format(LOCAL_AUTOML_JOB_NAME),
}

AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG = AutoMLLocalEnsembleRunConfig(
    test_artifacts_path='six-artifacts',
    base_automl_job_config=BASE_AUTOML_JOB_CONFIG,
    local_automl_job_config=LOCAL_AUTOML_JOB_CONFIG,
)

# Show the resulting configuration
AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.display()
```
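The output path convention of the cell above is: base job path + `/` + local run name. The sketch below reproduces it and splits the resulting S3 URI into bucket and key prefix. It is an illustration only. The bucket `my-bucket` is hypothetical. The short random hex suffix is a hypothetical stand-in for the helper module's `uid()`, whose exact format is not documented here.

```python
import uuid
from urllib.parse import urlparse

def parse_s3_uri(uri):
    """Split 's3://bucket/key/prefix' into (bucket, key_prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")

def local_run_config(base_output_path, base_job_name):
    """Build a local-run config dict following the naming convention above."""
    # Hypothetical stand-in for uid(): 8 random hex characters
    run_name = "{}-notebook-run-{}".format(base_job_name, uuid.uuid4().hex[:8])
    return {
        "local_automl_job_name": run_name,
        "local_automl_job_output_s3_base_path": "{}/{}".format(base_output_path.rstrip("/"), run_name),
    }

cfg = local_run_config("s3://my-bucket/six", "six")  # hypothetical bucket
bucket, key = parse_s3_uri(cfg["local_automl_job_output_s3_base_path"])
print(bucket, key)
```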

# Candidate Trials
# Select Candidate to Train
The SageMaker Autopilot job has analyzed the dataset and generated a number of trial configurations with different metaparameter settings. Select the trial configuration that you want to train:

```python
from ipywidgets import interact

# Dropdown of the generated trial configurations; selecting one calls select_trial
trials_dropdown = AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.dropdown
interact(AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.select_trial, trials_dropdown=trials_dropdown)
```

# Update Selected Candidate
You can update the metaparameters used for training by editing and saving the `metaparameters.json` file. (To edit the file, use Right Click -> Open With -> Editor.)
If you want to select another trial from the dropdown, close and reopen the `metaparameters.json` file tab before editing.

The metaparameters listed below can be updated; change only the ones you need. The updated parameters are passed to the AutoGluon predictor for training. For a detailed description of each parameter, refer to the argument descriptions of the AutoGluon `TabularPredictor.fit()` method.
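You can also edit the file from code instead of the Editor. The sketch below assumes that `metaparameters.json` holds a flat JSON object whose top-level keys are the knobs. The helper `update_metaparameters` is hypothetical. The demo runs on a temporary copy, so pass the real path to your selected trial's file when you use it.

```python
import json
import tempfile
from pathlib import Path

def update_metaparameters(path, **overrides):
    """Load a JSON object from path, apply overrides, write it back, and return it."""
    path = Path(path)
    params = json.loads(path.read_text())
    params.update(overrides)
    path.write_text(json.dumps(params, indent=2))
    return params

# Demo on a temporary file with hypothetical contents
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "metaparameters.json"
    demo_path.write_text(json.dumps({"num_bag_sets": 1, "auto_stack": False}))
    updated = update_metaparameters(demo_path, num_bag_sets=2)
    reread = json.loads(demo_path.read_text())

print(updated)  # → {'num_bag_sets': 2, 'auto_stack': False}
```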

# Available Knobs
- `num_bag_sets`: Number of repeats of k-fold bagging to perform. Valid values: integer.
- `included_model_types`: List of models to train. Valid values: any subset of `["XGB", "GBM", "CAT", "FASTAI", "NN_TORCH", "LR", "RF", "XT"]`, where:
    - `"XGB"`: XGBoost
    - `"GBM"`: LightGBM
    - `"CAT"`: CatBoost
    - `"FASTAI"`: neural network with a FastAI backend
    - `"NN_TORCH"`: neural network implemented in PyTorch
    - `"LR"`: linear model (linear/logistic regression)
    - `"RF"`: random forest
    - `"XT"`: extremely randomized trees
- `presets`: List of preset configurations for various arguments. Valid values: `['best_quality', 'high_quality', 'good_quality', 'medium_quality', 'optimize_for_deployment', 'interpretable', 'ignore_text']`. Use only one quality-based preset in a given call to `fit()`. These presets alter many of the same arguments and are not compatible with each other.
- `auto_stack`: Whether AutoGluon should automatically use bagging and multi-layer stack ensembling to boost predictive accuracy. Valid values: boolean.
- `num_stack_levels`: Number of stacking levels to use in the stack ensemble. Valid values: integer.
- `refit_full`: Whether to retrain all models on all of the data (training + validation) after the normal training procedure. Valid values: boolean.
- `set_best_to_refit_full`: If True, when no model is specified for prediction, the predictor uses the `refit_full` version of the model with the highest validation score as its default. Only valid if `refit_full` is set. Valid values: boolean.
- `save_bag_folds`: Whether bagged models save their fold models. Valid values: boolean.
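Before training, you can check an edited set of knobs against the valid values listed above. The sketch below makes two assumptions. First, the knobs are keys of a plain dict, as they would be after loading `metaparameters.json`. Second, the four `*_quality` presets are the "quality-based" ones. `check_metaparameters` is a hypothetical helper, not part of the Autopilot module or AutoGluon.

```python
MODEL_TYPES = {"XGB", "GBM", "CAT", "FASTAI", "NN_TORCH", "LR", "RF", "XT"}
# Assumption: these four are the quality-based presets
QUALITY_PRESETS = {"best_quality", "high_quality", "good_quality", "medium_quality"}
OTHER_PRESETS = {"optimize_for_deployment", "interpretable", "ignore_text"}
INT_KNOBS = ("num_bag_sets", "num_stack_levels")
BOOL_KNOBS = ("auto_stack", "refit_full", "set_best_to_refit_full", "save_bag_folds")

def check_metaparameters(params):
    """Return a list of problems found in a metaparameters dict (empty if none)."""
    problems = []
    for key in INT_KNOBS:
        value = params.get(key)
        # bool is a subclass of int in Python, so reject it explicitly
        if key in params and (not isinstance(value, int) or isinstance(value, bool)):
            problems.append(f"{key} must be an integer")
    for key in BOOL_KNOBS:
        if key in params and not isinstance(params[key], bool):
            problems.append(f"{key} must be a boolean")
    unknown_models = set(params.get("included_model_types", [])) - MODEL_TYPES
    if unknown_models:
        problems.append(f"unknown model types: {sorted(unknown_models)}")
    presets = set(params.get("presets", []))
    unknown_presets = presets - QUALITY_PRESETS - OTHER_PRESETS
    if unknown_presets:
        problems.append(f"unknown presets: {sorted(unknown_presets)}")
    if len(presets & QUALITY_PRESETS) > 1:
        problems.append("use at most one quality-based preset")
    if params.get("set_best_to_refit_full") and not params.get("refit_full"):
        problems.append("set_best_to_refit_full requires refit_full=True")
    return problems

# Hypothetical edit with a misspelled model type and two quality presets
example = {"num_bag_sets": 1, "included_model_types": ["XGB", "GBM", "XGBOOST"],
           "presets": ["best_quality", "medium_quality"], "refit_full": True}
for problem in check_metaparameters(example):
    print(problem)
```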

# Selected Candidate Metaparameters
You have selected the following metaparameters for your trial. Run the cell below to load and display your selection:

```python
AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.display_candidate()
```

# Executing the Candidate Trial
# Run Processing Job
You are now ready to create a processing job with the updated trial configuration.

# Prepare Processor and Processing Job Inputs

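```python
from sagemaker.processing import Processor

# Processor constructor arguments prepared by the helper module
processor_args = AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.prepare_processor_args()
processor = Processor(**processor_args)

# Inputs and outputs for the processing job, also prepared by the helper module
processing_inputs = AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.prepare_processing_inputs()
processing_outputs = AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.prepare_processing_outputs()

# Reuse the local AutoML job name as the processing job name
processing_job_name = AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.local_automl_job_name
```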

# Run Processing Job for the Selected Trial

```python
from IPython.display import display, Markdown

# Show a link to the processing job in the SageMaker console
display(
    Markdown(
        f"Creating Processing Job {processing_job_name}, please track the progress "
        f"from [here](https://{AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.region}.console.aws.amazon.com/sagemaker/home"
        f"?region={AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.region}#/processing-jobs/{processing_job_name})."
    )
)

# Start the job; logs=False stops the container logs from streaming into the notebook
processor.run(
    job_name=processing_job_name,
    inputs=processing_inputs,
    outputs=processing_outputs,
    logs=False,
)
```

# Model Deployment
Now, you can deploy the trained model from the processing job. After the deployment completes, you will get an endpoint that's ready to serve online inference.

# 💡 Available Knobs
You can customize the initial instance count and instance type used to deploy this model.
You can also change the endpoint name to avoid conflicts with existing endpoints.
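If you change the endpoint name, it must still be a valid SageMaker endpoint name. The sketch below assumes the constraint from the `CreateEndpoint` API reference: at most 63 characters, only letters, digits and hyphens, starting and ending with a letter or digit. Please verify it against the current documentation. Both helpers are hypothetical. `make_endpoint_name` mirrors the `"AutoML-{}"` pattern used in the deployment cell.

```python
import re

# Assumed rule: alphanumeric start, then hyphens only between alphanumerics, max 63 chars
_ENDPOINT_NAME_RE = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}")

def is_valid_endpoint_name(name):
    return len(name) <= 63 and _ENDPOINT_NAME_RE.fullmatch(name) is not None

def make_endpoint_name(job_name, prefix="AutoML"):
    """Build '<prefix>-<job_name>' and fail early if it breaks the naming rule."""
    name = f"{prefix}-{job_name}"
    if not is_valid_endpoint_name(name):
        raise ValueError(f"invalid SageMaker endpoint name: {name!r}")
    return name

print(make_endpoint_name("six-notebook-run-abc123"))  # → AutoML-six-notebook-run-abc123
```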

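```python
from sagemaker.model import Model

# Build a SageMaker Model from the arguments prepared by the helper module
model_args = AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.prepare_model_args()
model = Model(**model_args)

# Create a real-time endpoint; adjust the instance count/type and endpoint name as needed
model.deploy(
    initial_instance_count=2,
    instance_type='ml.m5.12xlarge',
    endpoint_name="AutoML-{}".format(processing_job_name),
    wait=True,
)
```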
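Once the endpoint is in service, you can call it with the SageMaker Runtime `InvokeEndpoint` API. The sketch below makes several assumptions that this notebook does not confirm:

- The serving container accepts `text/csv` rows with no header.
- Each row holds the 8 feature columns in training order, without the `diabetes` target.
- The response body is text.

The feature values are hypothetical. The fake runtime client stands in for `boto3.client("sagemaker-runtime")` so the sketch runs offline.

```python
import csv
import io

def invoke_csv(runtime_client, endpoint_name, rows):
    """Serialize rows as CSV, send them to the endpoint, and return the decoded body."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=buf.getvalue().encode("utf-8"),
    )
    return response["Body"].read().decode("utf-8")

class _FakeRuntime:
    """Hypothetical offline stand-in that records the request and returns a fixed body."""
    def invoke_endpoint(self, EndpointName, ContentType, Body):
        self.last_body = Body
        return {"Body": io.BytesIO(b"1\n")}

fake = _FakeRuntime()
prediction = invoke_csv(fake, "AutoML-demo", [[1, 2, 3, 4, 5, 6.5, 0.5, 30]])
print(repr(fake.last_body), repr(prediction))
```

The endpoint keeps running, and is billed, until you delete it, for example with `boto3.client("sagemaker").delete_endpoint(EndpointName=...)`.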
