# Amazon SageMaker Autopilot Candidate Definition Notebook

This notebook was automatically generated by the AutoML job **Canvas1722881529759**.
This notebook allows you to customize the [AutoGluon](https://auto.gluon.ai/stable/index.html) trial and execute the SageMaker Autopilot workflow.

The dataset has **4** columns and the column named **QUANTIDADE_ESTOQUE** is used as
the target column. This is being treated as a **Regression** problem. 
This notebook will build a **[Regression](https://en.wikipedia.org/wiki/Regression_analysis)** model that
**minimizes** the "**MSE**" quality metric of the trained models.
The "**MSE**" metric stands for mean squared error: the average of the squared differences between the model's predictions and the true values.
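
As a minimal sketch of how this metric is computed, the following plain-Python function averages the squared differences between targets and predictions. The function name and the sample numbers are illustrative assumptions and do not come from the AutoML job.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one value is required")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical stock quantities and model predictions, for illustration only.
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
mse = mean_squared_error(actual, predicted)
print(mse)
```

Because the errors are squared, a single large miss raises the metric more than several small ones.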

As part of the AutoML job, the input dataset has been randomly split into two pieces, one for **training** and one for
**validation**. Given an input dataset, Amazon SageMaker Autopilot runs one trial with hyperparameter settings.
This notebook helps you inspect and modify the hyperparameters proposed by Amazon SageMaker Autopilot.
You can modify the hyperparameters and execute a training job to train models with the modified configuration.
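
The random split described above can be pictured with a short sketch. The 80/20 ratio, the fixed seed, and the function name are assumptions made for illustration; the actual split used by the AutoML job is not stated in this notebook.

```python
import random


def random_split(rows, validation_fraction=0.2, seed=0):
    """Shuffle the rows and return (training_rows, validation_rows)."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_validation = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


rows = list(range(10))  # stand-in for dataset rows
train_rows, validation_rows = random_split(rows)
print(len(train_rows), len(validation_rows))
```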

---

## Contents

1. [Sagemaker Setup](#Sagemaker-Setup)
    1. [Downloading Generated Modules](#Downloading-Generated-Modules)
    1. [SageMaker Autopilot Job and Amazon Simple Storage Service (Amazon S3) Configuration](#SageMaker-Autopilot-Job-and-Amazon-Simple-Storage-Service-(Amazon-S3)-Configuration)
1. [Modify Hyperparameters](#Modify-Hyperparameters)
1. [Executing Training Job](#Executing-Training-Job)
    1. [Run Training Job](#Run-Training-Job)
1. [Model Deployment](#Model-Deployment)

---

## Sagemaker Setup

Before you launch the SageMaker Autopilot jobs, set up the environment for Amazon SageMaker:
- Check the environment and dependencies.
- Create a few helper objects and functions to organize input/output data and SageMaker sessions.

**Minimal Environment Requirements**

- Jupyter: Tested on `JupyterLab 4.1.5`, `jupyter_core 5.7.2` and `IPython 8.22.2`
- Kernel: `conda_python3`
- Dependencies required
  - `sagemaker>=2.214.3` (the SageMaker Python SDK)
    - Use `!pip install sagemaker==2.214.3` to install this dependency.
    - The kernel may need to be restarted after installation.
- Expected execution role permissions
  - S3 access to the bucket that stores the notebook.
  - Permission to create SageMaker training jobs and deploy endpoints.
  - Permission to call `describe_training_job` on the SageMaker training job.
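
One way to check the SDK requirement listed above before running the rest of the notebook is sketched below. It relies only on the standard library's `importlib.metadata`. The helper names are hypothetical, and the version parsing is deliberately simple: it compares numeric components only.

```python
from importlib import metadata

MINIMUM_SDK_VERSION = "2.214.3"  # taken from the requirements list above


def version_tuple(version):
    """Turn '2.214.3' into (2, 214, 3), ignoring any non-numeric suffix."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum=MINIMUM_SDK_VERSION):
    """True when the installed version is at least the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_sdk_version():
    """Return the installed version of the `sagemaker` package, or None."""
    try:
        return metadata.version("sagemaker")
    except metadata.PackageNotFoundError:
        return None


installed = installed_sdk_version()
if installed is None:
    print("sagemaker is not installed")
elif meets_minimum(installed):
    print(f"sagemaker {installed} satisfies >= {MINIMUM_SDK_VERSION}")
else:
    print(f"sagemaker {installed} is older than {MINIMUM_SDK_VERSION}")
```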

### Downloading Generated Modules
Download the generated trial configurations and the SageMaker Autopilot helper module used by this notebook.
These artifacts will be downloaded to the **Canvas1722881529759-artifacts** folder.

In [None]:
!mkdir -p Canvas1722881529759-artifacts
!aws s3 sync \
$(aws sagemaker describe-training-job --training-job-name Canvas1722881529759-t1-1-4a6078fec54c46599527ec1c0848ee76e29bf7 --query 'CheckpointConfig.S3Uri' --output text)/sagemaker-automl-candidates/notebooks/sagemaker_automl_ensemble \
Canvas1722881529759-artifacts/sagemaker_automl_ensemble --only-show-errors

import sys
sys.path.append("Canvas1722881529759-artifacts")
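
The shell command above builds its S3 source path by appending a fixed suffix to the training job's `CheckpointConfig.S3Uri`. The sketch below shows the same lookup in Python with `boto3`. The function names and the example bucket are illustrative assumptions, and `lookup_checkpoint_uri` needs AWS credentials, so it is defined here but not called.

```python
ARTIFACT_SUFFIX = "sagemaker-automl-candidates/notebooks/sagemaker_automl_ensemble"


def artifact_source_uri(checkpoint_s3_uri):
    """Join the checkpoint URI and the artifact suffix with exactly one slash."""
    return checkpoint_s3_uri.rstrip("/") + "/" + ARTIFACT_SUFFIX


def lookup_checkpoint_uri(training_job_name):
    """Read CheckpointConfig.S3Uri for a training job (needs AWS credentials)."""
    import boto3  # imported lazily so the pure helper above works without it

    response = boto3.client("sagemaker").describe_training_job(
        TrainingJobName=training_job_name
    )
    return response["CheckpointConfig"]["S3Uri"]


# Placeholder bucket and prefix, for illustration only.
example = artifact_source_uri("s3://example-bucket/example-prefix/")
print(example)
```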

### SageMaker Autopilot Job and Amazon Simple Storage Service (Amazon S3) Configuration

The following configuration has been derived from the SageMaker Autopilot job. These items configure where this notebook will
look for generated candidates, and where input and output data is stored on Amazon S3.

In [None]:
from sagemaker_automl_ensemble import AutoMLLocalEnsembleTrainingJobConfig, uid

# Where the existing AutoML job is stored
BASE_AUTOML_JOB_NAME = 'Canvas1722881529759'
BASE_AUTOML_JOB_CONFIG = {
    'automl_job_name': BASE_AUTOML_JOB_NAME,
    'base_deployment_image_uri': '763104351884.dkr.ecr.us-east-2.amazonaws.com/autogluon-inference:0.4.3-cpu-py38-ubuntu20.04',
}

# Path conventions of the output data storage path from the local AutoML job run of this notebook
LOCAL_TRAINING_JOB_NAME = 'Canvas1722-notebook-run-{}'.format(uid())
LOCAL_TRAINING_JOB_CONFIG = {
    'local_training_job_name': LOCAL_TRAINING_JOB_NAME,
}

AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG = AutoMLLocalEnsembleTrainingJobConfig(
    test_artifacts_path = 'Canvas1722881529759-artifacts',
    base_training_job_name = 'Canvas1722881529759-t1-1-4a6078fec54c46599527ec1c0848ee76e29bf7',
    base_automl_job_config = BASE_AUTOML_JOB_CONFIG,
    local_training_job_config = LOCAL_TRAINING_JOB_CONFIG
)

## Modify Hyperparameters

By editing the hyperparameters in the next cell, you can change the settings that will be used for training.

The following hyperparameters can be updated; change any of them as needed.
The updated values will be passed to the AutoGluon predictor for training. For a detailed description of the parameters,
refer to the [description of each argument in the AutoGluon predictor](https://auto.gluon.ai/stable/_modules/autogluon/tabular/predictor/predictor.html).

<div class="alert alert-info"> 💡 <strong> Available Knobs</strong>

1. excluded_model_types: List of model types to exclude from training. Valid values: any subset of the following list: ["XGB", "GBM", "CAT", "FASTAI", "NN_TORCH", "LR", "RF", "XT"]
    1. "XGB" (XGBoost)
    1. "GBM" (LightGBM)
    1. "CAT" (CatBoost)
    1. "FASTAI" (neural network with FastAI backend)
    1. "NN_TORCH" (neural network implemented in PyTorch)
    1. "LR" (linear regression)
    1. "RF" (random forest)
    1. "XT" (extremely randomized trees)
1. presets: List of preset configurations for various arguments. Valid values: ['best_quality', 'high_quality', 'good_quality', 'medium_quality', 'optimize_for_deployment', 'interpretable', 'ignore_text']
    - It is recommended to use only one `quality`-based preset in a given call to `fit()`, because these presets alter many of the same arguments and are not compatible with each other.
1. auto_stack: Whether AutoGluon should automatically utilize bagging and multi-layer stack ensembling to boost predictive accuracy. Valid values: boolean
1. refit_full: Whether to retrain all models on all of the data (training + validation) after the normal training procedure. Valid values: boolean
1. set_best_to_refit_full: If True, the default model that Predictor uses for prediction (when no model is specified) becomes the refit_full version
    of the model that exhibited the highest validation score. Only valid if refit_full is set. Valid values: boolean
1. save_bag_folds: Whether bagged models will save their fold models. Valid values: boolean
1. time_limit: Approximately how long training should run for (time in seconds).

</div>
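
The generated cell in this section passes list-valued settings as comma-separated strings and `time_limit` as a string. The sketch below assembles overrides in that same form and reports any model type that is not in the documented list. The helper names are hypothetical. The check reports rather than rejects, because the generated defaults themselves include names such as `KNN` that the list above does not document.

```python
DOCUMENTED_MODEL_TYPES = ["XGB", "GBM", "CAT", "FASTAI", "NN_TORCH", "LR", "RF", "XT"]


def join_values(values):
    """Format a list the way the generated cell does: 'A, B, C'."""
    return ", ".join(values)


def undocumented_model_types(excluded):
    """Return the names that do not appear in the documented list."""
    return [name for name in excluded if name not in DOCUMENTED_MODEL_TYPES]


excluded = ["GBM", "CAT", "KNN"]  # example selection, not a recommendation
hyperparameter_overrides = {
    "excluded_model_types": join_values(excluded),
    "presets": join_values(["good_quality"]),  # a single quality preset
    "time_limit": str(600),  # seconds, passed as a string like the generated cell
}
unknown = undocumented_model_types(excluded)
print(hyperparameter_overrides)
print("Not in the documented list:", unknown)
```

These overrides could be merged into the `hyperparameters` dictionary before it is passed to `set_hyperparameters`.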

In [None]:
hyperparameters = {
    "eval_metric": "MSE",
    "excluded_model_types": "KNN, NN_TORCH, GBM, CAT, FASTAI, LR, RF, XT, custom",
    "presets": "good_quality, optimize_for_deployment",
    "problem_type": "Regression",
    "time_limit": "300"
}

AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.set_hyperparameters(hyperparameters)

## Executing Training Job
### Run Training Job
Now you are ready to create a training job with the modified hyperparameters.

#### Prepare Training Job Inputs

In [None]:
from sagemaker.estimator import Estimator

estimator_args = AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.prepare_estimator_args()
estimator = Estimator(**estimator_args)

inputs = AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.prepare_training_input()
training_job_name = AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.local_training_job_name

#### Run Training Job with modified hyperparameters

In [None]:
from IPython.display import display, Markdown

display(
    Markdown(
        f"Creating Training Job {training_job_name}, please track the progress from "
        f"[here](https://{AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.region}.console.aws.amazon.com/sagemaker/home"
        f"?region={AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.region}#/jobs/{training_job_name})."
    )
)

estimator.fit(
    inputs=inputs,
    job_name=training_job_name
)
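
To check on the job from another session, the console link shown above can be rebuilt from the Region and job name, and the job status can be read with `boto3`. This is a sketch under assumptions: the helper names and the example values are hypothetical, and `training_job_status` needs AWS credentials, so it is defined but not called.

```python
def training_job_console_url(region, job_name):
    """Build the same console link that the cell above displays."""
    return (
        f"https://{region}.console.aws.amazon.com/sagemaker/home"
        f"?region={region}#/jobs/{job_name}"
    )


def training_job_status(job_name, region):
    """Return TrainingJobStatus for a job (needs AWS credentials)."""
    import boto3  # imported lazily so the URL helper works without it

    client = boto3.client("sagemaker", region_name=region)
    response = client.describe_training_job(TrainingJobName=job_name)
    return response["TrainingJobStatus"]


# Placeholder Region and job name, for illustration only.
url = training_job_console_url("us-east-2", "example-job")
print(url)
```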

## Model Deployment
Now, you can deploy the trained model from the training job. After the deployment completes, you will get an endpoint that's ready to serve online inference.

<div class="alert alert-info"> 💡 <strong> Available Knobs</strong>

1. You can customize the initial instance count and the instance type used to deploy this model.
2. The endpoint name can be changed to avoid conflicts with existing endpoints.

</div>
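
One way to avoid endpoint name conflicts is to append a timestamp to the name. The sketch below assumes the SageMaker naming rule for endpoints is at most 63 characters, made up of alphanumerics and hyphens, with no leading or trailing hyphen. Verify this against the current service documentation. The helper name is hypothetical.

```python
import re
import time

MAX_ENDPOINT_NAME_LENGTH = 63  # assumed service limit
_VALID_NAME = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9])*$")  # assumed pattern


def unique_endpoint_name(base, suffix=None):
    """Return '<base>-<suffix>', cleaned and trimmed to the length limit."""
    if suffix is None:
        suffix = time.strftime("%Y%m%d-%H%M%S")
    # Replace characters outside the assumed alphabet with hyphens.
    cleaned = re.sub(r"[^a-zA-Z0-9-]", "-", base).strip("-")
    room = MAX_ENDPOINT_NAME_LENGTH - len(suffix) - 1
    name = f"{cleaned[:room].rstrip('-')}-{suffix}"
    if not _VALID_NAME.match(name):
        raise ValueError(f"cannot build a valid endpoint name from {base!r}")
    return name


print(unique_endpoint_name("AutoML-Canvas1722-notebook-run"))
```

The result could be passed as `endpoint_name` in the deployment cell.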

In [None]:
from sagemaker.model import Model

model_args = AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.prepare_model_args()
model = Model(**model_args)

local_training_job_name = AUTOML_LOCAL_ENSEMBLE_RUN_CONFIG.local_training_job_name

model.deploy(initial_instance_count=2,
             instance_type='ml.m5.12xlarge',
             endpoint_name="AutoML-{}".format(local_training_job_name),
             wait=True,
             tags=[{'Key': 'sagemaker:is-canvas-resource', 'Value': 'True'}, {'Key': 'sagemaker:service:source:additionalMetadata', 'Value': 'canvas:notebook:tabular-quickbuild'}])

Congratulations! You can now visit the SageMaker
[endpoint console page](https://us-west-2.console.aws.amazon.com/sagemaker/home?region=us-west-2#/endpoints) to find the deployed endpoint (it takes a few minutes to come into service). The link opens the us-west-2 console; switch to the Region you deployed in if it differs.

<div class="alert alert-warning">
    <strong>To rerun this notebook, delete or change the name of your endpoint!</strong> <br>
    If you rerun this notebook, you'll run into an error on the last step because the endpoint already exists. You can either delete the endpoint from the endpoint console page or you can change the <code>endpoint_name</code> in the previous code block.
</div>
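
Deleting the endpoint before a rerun can also be scripted. The sketch below uses the `list_endpoints` and `delete_endpoint` calls of the `boto3` SageMaker client. The helper name is hypothetical. A small in-memory stand-in client is used so the example runs without AWS access; in practice `boto3.client("sagemaker")` would take its place. The lookup reads only the first page of results.

```python
def delete_endpoint_if_exists(client, endpoint_name):
    """Delete the endpoint when an exact name match exists; return True if deleted."""
    response = client.list_endpoints(NameContains=endpoint_name)
    names = [e["EndpointName"] for e in response.get("Endpoints", [])]
    if endpoint_name not in names:  # NameContains is a substring filter
        return False
    client.delete_endpoint(EndpointName=endpoint_name)
    return True


class _StandInClient:
    """Tiny in-memory stand-in so the sketch runs without AWS access."""

    def __init__(self, names):
        self.names = list(names)

    def list_endpoints(self, NameContains):
        matches = [n for n in self.names if NameContains in n]
        return {"Endpoints": [{"EndpointName": n} for n in matches]}

    def delete_endpoint(self, EndpointName):
        self.names.remove(EndpointName)


client = _StandInClient(["AutoML-example-run", "AutoML-example-run-2"])
deleted_first = delete_endpoint_if_exists(client, "AutoML-example-run")
deleted_again = delete_endpoint_if_exists(client, "AutoML-example-run")
print(deleted_first, deleted_again, client.names)
```

The endpoint configuration and the model are separate resources and are not touched by this sketch.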