# Disease Prediction Model Using Electronic Medical Records - Part 6

## Part 6 - Parallel Hyperparameter Optimization (HPO) with SageMaker Tuning

Training a Machine Learning model is governed by the hyperparameters of the chosen algorithm. We cannot know the ideal hyperparameter values in advance, because each model may require a different set.

To improve a model's performance, we can run hyperparameter optimization, which searches for the combination of values that yields the best possible result.
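To make the idea concrete, here is a minimal sketch of random search, the simplest search strategy, in plain Python. The `toy_score` function and its ranges are hypothetical stand-ins for a real training and validation run; SageMaker Tuning uses Bayesian optimization by default, which chooses candidates more deliberately than this.

```python
import random

def toy_score(eta, max_depth):
    """Hypothetical stand-in for 'train a model and return validation accuracy'.

    By construction it peaks at eta = 0.3 and max_depth = 4.
    """
    return 1.0 - (eta - 0.3) ** 2 - 0.01 * (max_depth - 4) ** 2

def random_search(n_trials, seed=0):
    """Sample n_trials random combinations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {"eta": rng.uniform(0.0, 1.0), "max_depth": rng.randint(1, 10)}
        score = toy_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

best_params, best_score = random_search(n_trials=10)
print(best_params, round(best_score, 4))
```

Each trial here is independent of the others, which is why this kind of search parallelizes trivially.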

This Jupyter notebook contains a complete example of hyperparameter optimization with SageMaker Tuning.

### Imports 

In [1]:
# ML Imports 
import os
import json
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.model_selection import train_test_split

# AWS Imports 
import boto3
import sagemaker
from sagemaker.tuner import IntegerParameter, CategoricalParameter, ContinuousParameter, HyperparameterTuner
from sagemaker.serializers import CSVSerializer
from sagemaker.inputs import TrainingInput
from sagemaker import get_execution_role

## Load the Data and Define Parameters

In [2]:
# Parameters
sagemaker_execution_role = get_execution_role()
print('Role = {}'.format(sagemaker_execution_role))
session = boto3.Session()

# Clients and resources
s3 = session.resource('s3')
sagemaker_session = sagemaker.Session()
sagemaker_client = boto3.client('sagemaker')

BUCKET = sagemaker_session.default_bucket()
PREFIX = 'xgboost-clf'

Role = arn:aws:iam::351371806175:role/service-role/AmazonSageMaker-ExecutionRole-20220722T092670


In [3]:
# Change this to the name of your bucket
s3_bucket = 'krupck-bucket-bloodpressure'
prefix = 'dados'

In [4]:
raiz = 's3://{}/{}/'.format(s3_bucket, prefix)
print(raiz)

s3://krupck-bucket-bloodpressure/dados/


In [5]:
dados_treino = TrainingInput(s3_data = raiz + 'treino.csv', content_type = 'csv')
dados_teste = TrainingInput(s3_data = raiz + 'teste.csv', content_type = 'csv')

In [6]:
print(json.dumps(dados_treino.__dict__, indent = 2))

{
  "config": {
    "DataSource": {
      "S3DataSource": {
        "S3DataType": "S3Prefix",
        "S3Uri": "s3://krupck-bucket-bloodpressure/dados/treino.csv",
        "S3DataDistributionType": "FullyReplicated"
      }
    },
    "ContentType": "csv"
  }
}


## Training the Model with SageMaker and the XGBoost Algorithm

In [7]:
# Image URI
container_uri = sagemaker.image_uris.retrieve(region = session.region_name, 
                                              framework = 'xgboost', 
                                              version = '1.0-1', 
                                              image_scope = 'training')

In [8]:
# Estimator
xgb = sagemaker.estimator.Estimator(image_uri = container_uri,
                                    role = sagemaker_execution_role, 
                                    instance_count = 1, 
                                    instance_type = 'ml.m5.large',
                                    output_path = 's3://{}/artefatos'.format(s3_bucket),
                                    sagemaker_session = sagemaker_session,
                                    base_job_name = 'clf-xgboost')

In [9]:
# Set the basic (fixed) hyperparameters
xgb.set_hyperparameters(objective='binary:logistic', num_round = 100)

In [10]:
# Dictionary with the hyperparameter ranges to search during optimization
hyperparameter_ranges = {'eta': ContinuousParameter(0, 1),
                         'min_child_weight': ContinuousParameter(1, 10),
                         'alpha': ContinuousParameter(0, 2),
                         'max_depth': IntegerParameter(1, 10)}

In [11]:
# Objective metric
objective_metric_name = 'validation:accuracy'

In [12]:
# Create the hyperparameter tuning object
tuner = HyperparameterTuner(xgb, objective_metric_name, hyperparameter_ranges, max_jobs = 10, max_parallel_jobs = 5)
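With `max_jobs = 10` and `max_parallel_jobs = 5`, the tuner never runs more than five training jobs at once, so the ten jobs need at least two rounds. The arithmetic below is a rough sketch; the helper name `min_rounds` is hypothetical, and in practice the tuner may start a new job whenever a slot frees up rather than in strict rounds.

```python
import math

def min_rounds(max_jobs, max_parallel_jobs):
    """Lower bound on the number of sequential rounds needed to run all jobs."""
    return math.ceil(max_jobs / max_parallel_jobs)

print(min_rounds(10, 5))  # 2
```

More parallelism shortens the wall-clock time, but with the default Bayesian strategy it also leaves the tuner fewer finished results to learn from before it picks the next candidates.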

In [13]:
# Training
tuner.fit({'train': dados_treino, 'validation': dados_teste}, include_cls_metadata = False)

No finished training job found associated with this estimator. Please make sure this estimator is only used for building workflow config


...................................................................................................!


In [14]:
# Tuning job name
hpo_job_name = tuner.latest_tuning_job.job_name
hpo_job_name

'sagemaker-xgboost-220722-1440'

In [15]:
# Optimization results
tuning_job_results = sagemaker_client.describe_hyper_parameter_tuning_job(HyperParameterTuningJobName = hpo_job_name)
status = tuning_job_results['HyperParameterTuningJobStatus']
status

'Completed'
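If the tuning job is started without blocking, or checked from a different session, the status may still be `InProgress`. A minimal polling sketch follows; `wait_for_tuning_job` is a hypothetical helper, and the set of terminal states is an assumption about the statuses the API reports.

```python
import time

# Assumed terminal states for a tuning job
TERMINAL_STATES = {"Completed", "Failed", "Stopped"}

def wait_for_tuning_job(client, job_name, poll_seconds=30, sleep=time.sleep):
    """Poll until the tuning job reaches a terminal state and return that state.

    `client` is any object exposing describe_hyper_parameter_tuning_job,
    such as boto3.client('sagemaker').
    """
    while True:
        response = client.describe_hyper_parameter_tuning_job(
            HyperParameterTuningJobName=job_name
        )
        status = response["HyperParameterTuningJobStatus"]
        if status in TERMINAL_STATES:
            return status
        sleep(poll_seconds)
```

In this notebook it could be called as `wait_for_tuning_job(sagemaker_client, hpo_job_name)`. The `sleep` argument is injectable only so the loop can be exercised without real waiting.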

In [16]:
# Best result of the optimization
best_training_job = tuning_job_results['BestTrainingJob']
best_training_job

{'TrainingJobName': 'sagemaker-xgboost-220722-1440-006-094bd182',
 'TrainingJobArn': 'arn:aws:sagemaker:us-east-2:351371806175:training-job/sagemaker-xgboost-220722-1440-006-094bd182',
 'CreationTime': datetime.datetime(2022, 7, 22, 14, 43, 52, tzinfo=tzlocal()),
 'TrainingStartTime': datetime.datetime(2022, 7, 22, 14, 45, 31, tzinfo=tzlocal()),
 'TrainingEndTime': datetime.datetime(2022, 7, 22, 14, 47, 28, tzinfo=tzlocal()),
 'TrainingJobStatus': 'Completed',
 'TunedHyperParameters': {'alpha': '1.5198635096218103',
  'eta': '0.46553827054181685',
  'max_depth': '1',
  'min_child_weight': '4.651519262461322'},
 'FinalHyperParameterTuningJobObjectiveMetric': {'MetricName': 'validation:accuracy',
  'Value': 0.8166700005531311},
 'ObjectiveStatus': 'Succeeded'}
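Note that `TunedHyperParameters` returns every value as a string. The sketch below converts them back to numbers, which helps when comparing or reusing them in local code; the helper `cast_hyperparameters` and the choice of which names are integers are assumptions for illustration.

```python
# Assumption: only max_depth was declared as an IntegerParameter
INTEGER_PARAMS = {"max_depth"}

def cast_hyperparameters(tuned):
    """Convert string hyperparameter values to int or float."""
    return {
        name: int(value) if name in INTEGER_PARAMS else float(value)
        for name, value in tuned.items()
    }

# Values copied from the BestTrainingJob output
tuned = {
    "alpha": "1.5198635096218103",
    "eta": "0.46553827054181685",
    "max_depth": "1",
    "min_child_weight": "4.651519262461322",
}

best_hyperparameters = cast_hyperparameters(tuned)
print(best_hyperparameters)
```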

## Evaluation

We can list the hyperparameters and objective metric of every training job and pick the job with the best objective value.

In [17]:
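# Use a separate name so the HyperparameterTuner object in `tuner` is not overwritten
tuner_analytics = sagemaker.HyperparameterTuningJobAnalytics(hpo_job_name)
hpo_results_df = tuner_analytics.dataframe()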

In [18]:
hpo_results_df

| | alpha | eta | max_depth | min_child_weight | TrainingJobName | TrainingJobStatus | FinalObjectiveValue | TrainingStartTime | TrainingEndTime | TrainingElapsedTimeSeconds |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.882727 | 0.0 | 10.0 | 1.981523 | sagemaker-xgboost-220722-1440-010-52cc0ddd | Completed | 0.81667 | 2022-07-22 14:45:30+00:00 | 2022-07-22 14:47:37+00:00 | 127.0 |
| 1 | 0.918899 | 0.117683 | 10.0 | 2.796062 | sagemaker-xgboost-220722-1440-009-08c33dc9 | Completed | 0.8075 | 2022-07-22 14:45:20+00:00 | 2022-07-22 14:47:22+00:00 | 122.0 |
| 2 | 0.575891 | 0.0 | 10.0 | 3.725771 | sagemaker-xgboost-220722-1440-008-f61c8293 | Completed | 0.81667 | 2022-07-22 14:45:26+00:00 | 2022-07-22 14:47:38+00:00 | 132.0 |
| 3 | 0.212223 | 0.397208 | 1.0 | 1.495 | sagemaker-xgboost-220722-1440-007-536fbc8c | Completed | 0.81667 | 2022-07-22 14:45:25+00:00 | 2022-07-22 14:47:39+00:00 | 134.0 |
| 4 | 1.519864 | 0.465538 | 1.0 | 4.651519 | sagemaker-xgboost-220722-1440-006-094bd182 | Completed | 0.81667 | 2022-07-22 14:45:31+00:00 | 2022-07-22 14:47:28+00:00 | 117.0 |
| 5 | 0.999369 | 0.481301 | 5.0 | 6.59841 | sagemaker-xgboost-220722-1440-005-963207a2 | Completed | 0.80333 | 2022-07-22 14:41:34+00:00 | 2022-07-22 14:43:37+00:00 | 123.0 |
| 6 | 0.94767 | 0.459445 | 3.0 | 7.052626 | sagemaker-xgboost-220722-1440-004-013d346a | Completed | 0.81292 | 2022-07-22 14:41:43+00:00 | 2022-07-22 14:43:45+00:00 | 122.0 |
| 7 | 1.929878 | 0.044551 | 10.0 | 2.063586 | sagemaker-xgboost-220722-1440-003-41392c20 | Completed | 0.81667 | 2022-07-22 14:41:33+00:00 | 2022-07-22 14:43:35+00:00 | 122.0 |
| 8 | 1.73854 | 0.515396 | 5.0 | 3.944418 | sagemaker-xgboost-220722-1440-002-ed8f5043 | Completed | 0.79542 | 2022-07-22 14:41:35+00:00 | 2022-07-22 14:43:37+00:00 | 122.0 |
| 9 | 1.171897 | 0.466366 | 1.0 | 2.883897 | sagemaker-xgboost-220722-1440-001-8afa49e0 | Completed | 0.81667 | 2022-07-22 14:41:29+00:00 | 2022-07-22 14:43:21+00:00 | 112.0 |
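
In the results above, six of the ten jobs share the top objective value of 0.81667, so the job reported as best is only one of several equivalent candidates. One possible way to break the tie, sketched below with values copied from the results, is to prefer the shortest training time; this rule is an assumption for illustration, not something the tuner does.

```python
# (job name suffix, FinalObjectiveValue, TrainingElapsedTimeSeconds), copied from the results
rows = [
    ("010-52cc0ddd", 0.81667, 127.0),
    ("009-08c33dc9", 0.8075, 122.0),
    ("008-f61c8293", 0.81667, 132.0),
    ("007-536fbc8c", 0.81667, 134.0),
    ("006-094bd182", 0.81667, 117.0),
    ("005-963207a2", 0.80333, 123.0),
    ("004-013d346a", 0.81292, 122.0),
    ("003-41392c20", 0.81667, 122.0),
    ("002-ed8f5043", 0.79542, 122.0),
    ("001-8afa49e0", 0.81667, 112.0),
]

# Keep only the jobs that reached the highest objective value
top_value = max(value for _, value, _ in rows)
tied = [row for row in rows if row[1] == top_value]

# Assumed tie-break rule: among the top-scoring jobs, prefer the shortest training time
best = min(tied, key=lambda row: row[2])

print(len(tied), best)
```

On the full DataFrame, a similar ordering can be obtained with `hpo_results_df.sort_values(['FinalObjectiveValue', 'TrainingElapsedTimeSeconds'], ascending=[False, True])`.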
