# <font color='blue'>Data Science Academy</font>
# <font color='blue'>Deploying Machine Learning Models</font>

# <font color='blue'>Amazon SageMaker</font>
## <font color='blue'>Lab</font>
### <font color='blue'>Model Deployment for Disease Prediction Using Electronic Medical Records</font>

## Part 6 - Parallel Hyperparameter Optimization (HPO) with SageMaker Tuning

Training a Machine Learning model is governed by the hyperparameters of the chosen algorithm. We cannot know the ideal hyperparameter values in advance, because each model and dataset may require a different set.

To improve a model's performance, we can run hyperparameter optimization: a search for the combination of values that yields the best possible score on a chosen metric.

This Jupyter notebook contains a complete example of hyperparameter optimization with SageMaker Tuning.
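
To make the idea of "searching for the best combination" concrete, here is a minimal random-search sketch in plain Python. It is not how SageMaker Tuning searches (its default strategy is Bayesian), and everything in it is an assumption made for illustration: the search space, the `toy_score` function standing in for "train a model and measure validation accuracy", and all function names.

```python
import random

# Hypothetical search space: name -> (low, high, kind)
SEARCH_SPACE = {
    "eta": (0.0, 1.0, "continuous"),
    "max_depth": (1, 10, "integer"),
}

def sample_config(rng):
    """Draw one hyperparameter combination at random from SEARCH_SPACE."""
    config = {}
    for name, (low, high, kind) in SEARCH_SPACE.items():
        if kind == "integer":
            config[name] = rng.randint(low, high)  # inclusive on both ends
        else:
            config[name] = rng.uniform(low, high)
    return config

def toy_score(config):
    """Stand-in for training plus validation; invented so that the
    score peaks at eta = 0.3 and max_depth = 4."""
    return 1.0 - abs(config["eta"] - 0.3) - 0.05 * abs(config["max_depth"] - 4)

def random_search(n_trials, seed=0):
    """Score n_trials random configurations and return the best (score, config)."""
    rng = random.Random(seed)
    trials = [sample_config(rng) for _ in range(n_trials)]
    scored = [(toy_score(config), config) for config in trials]
    return max(scored, key=lambda pair: pair[0])

best_score, best_config = random_search(n_trials=10)
print(best_score, best_config)
```

In a real tuning job each call to the scoring function is a full training run, which is why the number of trials and how many of them run in parallel matter so much.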

In [1]:
# Python language version
from platform import python_version
print('Python Language Version Used in This Jupyter Notebook:', python_version())

Python Language Version Used in This Jupyter Notebook: 3.7.10


### Imports 

In [2]:
# ML Imports 
import os
import json
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.model_selection import train_test_split

# AWS Imports 
import boto3
import sagemaker
from sagemaker.tuner import IntegerParameter, CategoricalParameter, ContinuousParameter, HyperparameterTuner
from sagemaker.serializers import CSVSerializer
from sagemaker.inputs import TrainingInput
from sagemaker import get_execution_role

## Load the Data and Define Parameters

In [3]:
# Parameters
sagemaker_execution_role = get_execution_role()
print('Role = {}'.format(sagemaker_execution_role))
session = boto3.Session()

# Clients and resources
s3 = session.resource('s3')
sagemaker_session = sagemaker.Session()
sagemaker_client = boto3.client('sagemaker')

BUCKET = sagemaker_session.default_bucket()
PREFIX = 'xgboost-clf'

Couldn't call 'get_role' to get Role ARN from role name AmazonSageMaker-ExecutionRole-20210330T151228 to get Role path.
Assuming role was created in SageMaker AWS console, as the name contains `AmazonSageMaker-ExecutionRole`. Defaulting to Role ARN with service-role in path. If this Role ARN is incorrect, please add IAM read permissions to your role or supply the Role Arn directly.


Role = arn:aws:iam::879456481532:role/service-role/AmazonSageMaker-ExecutionRole-20210330T151228


In [4]:
# Change this to the name of your bucket
s3_bucket = 'dsa-deploy-app'
prefix = 'dados'

In [5]:
raiz = 's3://{}/{}/'.format(s3_bucket, prefix)
print(raiz)

s3://dsa-deploy-app/dados/


In [6]:
dados_treino = TrainingInput(s3_data = raiz + 'treino.csv', content_type = 'csv')
dados_teste = TrainingInput(s3_data = raiz + 'teste.csv', content_type = 'csv')

In [7]:
print(json.dumps(dados_treino.__dict__, indent = 2))

{
  "config": {
    "DataSource": {
      "S3DataSource": {
        "S3DataType": "S3Prefix",
        "S3Uri": "s3://dsa-deploy-app/dados/treino.csv",
        "S3DataDistributionType": "FullyReplicated"
      }
    },
    "ContentType": "csv"
  }
}
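
The `TrainingInput` objects above only point at CSV files in S3; they say nothing about the layout of those files. For CSV training data, the SageMaker built-in XGBoost algorithm expects no header row and the target variable in the first column. The sketch below shows one way to produce that layout with the standard library. The records and the column names (`age`, `glucose`, `label`) are hypothetical, since the actual columns of `treino.csv` are not shown in this notebook, and `to_xgboost_csv` is a helper invented for this example.

```python
import csv
import io

def to_xgboost_csv(rows, label_column):
    """Serialize dict rows as header-less CSV with the label in the first column."""
    if not rows:
        return ""
    # Keep the feature order of the first row, with the label column removed
    feature_columns = [column for column in rows[0] if column != label_column]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow([row[label_column]] + [row[column] for column in feature_columns])
    return buffer.getvalue()

# Hypothetical records, for illustration only
records = [
    {"age": 63, "glucose": 148, "label": 1},
    {"age": 41, "glucose": 85, "label": 0},
]
csv_text = to_xgboost_csv(records, label_column="label")
print(csv_text)
```

The resulting text could then be uploaded to the bucket and prefix configured above.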


## Training the Model with SageMaker and the XGBoost Algorithm

In [8]:
# Image URI
container_uri = sagemaker.image_uris.retrieve(region = session.region_name, 
                                              framework = 'xgboost', 
                                              version = '1.0-1', 
                                              image_scope = 'training')

In [9]:
# Estimator
xgb = sagemaker.estimator.Estimator(image_uri = container_uri,
                                    role = sagemaker_execution_role, 
                                    instance_count = 1, 
                                    instance_type = 'ml.m5.large',
                                    output_path = 's3://{}/artefatos'.format(s3_bucket),
                                    sagemaker_session = sagemaker_session,
                                    base_job_name = 'clf-xgboost')

In [10]:
# Set the basic (fixed) hyperparameters
xgb.set_hyperparameters(objective='binary:logistic', num_round = 100)

In [11]:
# Dictionary of hyperparameter ranges to explore during optimization
hyperparameter_ranges = {'eta': ContinuousParameter(0, 1),
                         'min_child_weight': ContinuousParameter(1, 10),
                         'alpha': ContinuousParameter(0, 2),
                         'max_depth': IntegerParameter(1, 10)}

In [12]:
# Objective metric
objective_metric_name = 'validation:accuracy'

In [13]:
# Create the hyperparameter tuning object (at most 10 training jobs, up to 5 running in parallel)
tuner = HyperparameterTuner(xgb, objective_metric_name, hyperparameter_ranges, max_jobs = 10, max_parallel_jobs = 5)
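
The `max_jobs` and `max_parallel_jobs` arguments together determine how the tuning budget is spent over time. As a rough approximation, assuming every training job takes about the same time, the jobs run in successive waves. The helper `job_waves` below is a hypothetical illustration of that arithmetic, not part of the SageMaker SDK; in practice SageMaker may start a new job as soon as a slot frees up.

```python
def job_waves(max_jobs, max_parallel_jobs):
    """Split max_jobs into successive waves of at most max_parallel_jobs each."""
    if max_jobs < 0 or max_parallel_jobs <= 0:
        raise ValueError("max_jobs must be >= 0 and max_parallel_jobs must be > 0")
    waves = []
    remaining = max_jobs
    while remaining > 0:
        size = min(max_parallel_jobs, remaining)
        waves.append(size)
        remaining -= size
    return waves

print(job_waves(10, 5))   # the configuration used in this notebook
print(job_waves(10, 3))   # lower parallelism, more sequential waves
```

With a strategy that learns from completed jobs, such as the default Bayesian one, only later waves can benefit from earlier results. Higher parallelism therefore tends to finish sooner but gives the search fewer opportunities to adapt.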

In [14]:
# Training
tuner.fit({'train': dados_treino, 'validation': dados_teste}, include_cls_metadata = False)

................................................................................................!


In [15]:
# Job name
hpo_job_name = tuner.latest_tuning_job.job_name
hpo_job_name

'sagemaker-xgboost-210402-1950'

In [16]:
# Optimization results
tuning_job_results = sagemaker_client.describe_hyper_parameter_tuning_job(HyperParameterTuningJobName = hpo_job_name)
status = tuning_job_results['HyperParameterTuningJobStatus']
status

'Completed'

In [17]:
# Best result of the optimization
best_training_job = tuning_job_results['BestTrainingJob']
best_training_job

{'TrainingJobName': 'sagemaker-xgboost-210402-1950-008-4c50c2df',
 'TrainingJobArn': 'arn:aws:sagemaker:us-east-2:879456481532:training-job/sagemaker-xgboost-210402-1950-008-4c50c2df',
 'CreationTime': datetime.datetime(2021, 4, 2, 19, 54, 54, tzinfo=tzlocal()),
 'TrainingStartTime': datetime.datetime(2021, 4, 2, 19, 57, 19, tzinfo=tzlocal()),
 'TrainingEndTime': datetime.datetime(2021, 4, 2, 19, 58, 32, tzinfo=tzlocal()),
 'TrainingJobStatus': 'Completed',
 'TunedHyperParameters': {'alpha': '1.8647070752210735',
  'eta': '0.6177407010490166',
  'max_depth': '2',
  'min_child_weight': '8.540440581373812'},
 'FinalHyperParameterTuningJobObjectiveMetric': {'MetricName': 'validation:accuracy',
  'Value': 0.7961400151252747},
 'ObjectiveStatus': 'Succeeded'}
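
Note that `TunedHyperParameters` comes back as strings. The sketch below converts them to numeric types so they can be reused, for example to train a final model. It works on a hand-copied excerpt of the response shown above; the names `best_job_excerpt`, `INTEGER_PARAMS`, and `parse_tuned_hyperparameters` are introduced here for illustration only.

```python
# Excerpt of the BestTrainingJob response shown above (only the fields used here)
best_job_excerpt = {
    "TrainingJobName": "sagemaker-xgboost-210402-1950-008-4c50c2df",
    "TunedHyperParameters": {
        "alpha": "1.8647070752210735",
        "eta": "0.6177407010490166",
        "max_depth": "2",
        "min_child_weight": "8.540440581373812",
    },
    "FinalHyperParameterTuningJobObjectiveMetric": {
        "MetricName": "validation:accuracy",
        "Value": 0.7961400151252747,
    },
}

# Assumption: max_depth is the only integer range tuned in this notebook
INTEGER_PARAMS = {"max_depth"}

def parse_tuned_hyperparameters(job):
    """Convert the string-valued tuned hyperparameters to int or float."""
    parsed = {}
    for name, value in job["TunedHyperParameters"].items():
        parsed[name] = int(value) if name in INTEGER_PARAMS else float(value)
    return parsed

best_params = parse_tuned_hyperparameters(best_job_excerpt)
best_accuracy = best_job_excerpt["FinalHyperParameterTuningJobObjectiveMetric"]["Value"]
print(best_params)
print(best_accuracy)
```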

## Evaluation

We can list the hyperparameters and objective metric of every training job in the tuning run, and then pick the training job with the best objective metric.

In [18]:
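# Use a separate name so the HyperparameterTuner object created earlier is not overwritten
tuner_analytics = sagemaker.HyperparameterTuningJobAnalytics(hpo_job_name)
hpo_results_df = tuner_analytics.dataframe()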

In [19]:
hpo_results_df

| | alpha | eta | max_depth | min_child_weight | TrainingJobName | TrainingJobStatus | FinalObjectiveValue | TrainingStartTime | TrainingEndTime | TrainingElapsedTimeSeconds |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.852622 | 0.595463 | 3.0 | 9.340844 | sagemaker-xgboost-210402-1950-010-aa78fc81 | Completed | 0.79332 | 2021-04-02 19:56:59+00:00 | 2021-04-02 19:58:18+00:00 | 79.0 |
| 1 | 1.813682 | 0.344181 | 1.0 | 4.495148 | sagemaker-xgboost-210402-1950-009-992d9c92 | Completed | 0.79614 | 2021-04-02 19:57:11+00:00 | 2021-04-02 19:58:32+00:00 | 81.0 |
| 2 | 1.864707 | 0.617741 | 2.0 | 8.540441 | sagemaker-xgboost-210402-1950-008-4c50c2df | Completed | 0.79614 | 2021-04-02 19:57:19+00:00 | 2021-04-02 19:58:32+00:00 | 73.0 |
| 3 | 0.21106 | 0.568119 | 2.0 | 8.852875 | sagemaker-xgboost-210402-1950-007-619d3a67 | Completed | 0.79574 | 2021-04-02 19:56:55+00:00 | 2021-04-02 19:58:14+00:00 | 79.0 |
| 4 | 1.931602 | 0.615543 | 2.0 | 5.75528 | sagemaker-xgboost-210402-1950-006-aa796bd2 | Completed | 0.79574 | 2021-04-02 19:56:49+00:00 | 2021-04-02 19:58:11+00:00 | 82.0 |
| 5 | 1.505116 | 0.595216 | 2.0 | 4.2339 | sagemaker-xgboost-210402-1950-005-6663b100 | Completed | 0.79534 | 2021-04-02 19:53:20+00:00 | 2021-04-02 19:54:39+00:00 | 79.0 |
| 6 | 1.166638 | 0.490399 | 10.0 | 1.127587 | sagemaker-xgboost-210402-1950-004-c3f169e9 | Completed | 0.76076 | 2021-04-02 19:53:01+00:00 | 2021-04-02 19:54:20+00:00 | 79.0 |
| 7 | 1.548065 | 0.814118 | 2.0 | 8.613298 | sagemaker-xgboost-210402-1950-003-6911455e | Completed | 0.79614 | 2021-04-02 19:53:00+00:00 | 2021-04-02 19:54:19+00:00 | 79.0 |
| 8 | 0.686665 | 0.781193 | 9.0 | 2.576308 | sagemaker-xgboost-210402-1950-002-ac2c1162 | Completed | 0.74347 | 2021-04-02 19:52:53+00:00 | 2021-04-02 19:54:10+00:00 | 77.0 |
| 9 | 0.559307 | 0.736994 | 9.0 | 4.47546 | sagemaker-xgboost-210402-1950-001-10c37320 | Completed | 0.75231 | 2021-04-02 19:52:47+00:00 | 2021-04-02 19:54:10+00:00 | 83.0 |
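
To choose among these jobs programmatically, one option is to rank them by objective value. The sketch below does this in plain Python on values copied by hand from the results above. The tie-break by shortest training time is an arbitrary choice made for this example, not a rule SageMaker is known to apply, and since the displayed values are rounded, ties seen here may not be exact ties in the raw metric.

```python
# (TrainingJobName, FinalObjectiveValue, TrainingElapsedTimeSeconds), copied from the results above
results = [
    ("sagemaker-xgboost-210402-1950-010-aa78fc81", 0.79332, 79.0),
    ("sagemaker-xgboost-210402-1950-009-992d9c92", 0.79614, 81.0),
    ("sagemaker-xgboost-210402-1950-008-4c50c2df", 0.79614, 73.0),
    ("sagemaker-xgboost-210402-1950-007-619d3a67", 0.79574, 79.0),
    ("sagemaker-xgboost-210402-1950-006-aa796bd2", 0.79574, 82.0),
    ("sagemaker-xgboost-210402-1950-005-6663b100", 0.79534, 79.0),
    ("sagemaker-xgboost-210402-1950-004-c3f169e9", 0.76076, 79.0),
    ("sagemaker-xgboost-210402-1950-003-6911455e", 0.79614, 79.0),
    ("sagemaker-xgboost-210402-1950-002-ac2c1162", 0.74347, 77.0),
    ("sagemaker-xgboost-210402-1950-001-10c37320", 0.75231, 83.0),
]

# Highest accuracy reached by any job
best_value = max(value for _, value, _ in results)

# All jobs that reached it (there can be ties)
tied_jobs = [(name, elapsed) for name, value, elapsed in results if value == best_value]

# Arbitrary tie-break for this example: the job that trained fastest
fastest_best_job = min(tied_jobs, key=lambda pair: pair[1])[0]

print(best_value, len(tied_jobs), fastest_best_job)
```

On the DataFrame itself, `hpo_results_df.sort_values('FinalObjectiveValue', ascending=False)` gives the same ranking in one line.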


# End