# Part 2/4 - Optimizing the model

Now that you know how to train an ML model using SageMaker, it's time to optimize it using [Automatic Model Tuning](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning.html), also known as hyperparameter optimization. This technique explores the space of possible values for the selected hyperparameters and tries to find the best combination based on a given metric. You also choose the direction in which that metric should be optimized. For instance, if the objective metric is **Accuracy**, select the **Maximize** strategy; if the metric is **Error**, select **Minimize**.
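To make the Maximize/Minimize choice concrete, here is a minimal sketch. The helper `objective_type_for` and the two metric sets are hypothetical, written for this notebook only; they are not part of the SageMaker SDK. The metric names follow the `validation:<metric>` pattern used later in this notebook.

```python
# Hypothetical helper (not part of the SageMaker SDK): pick the tuning
# direction from the kind of metric being optimized.
HIGHER_IS_BETTER = {"accuracy", "auc", "f1"}
LOWER_IS_BETTER = {"error", "merror", "rmse", "mlogloss"}

def objective_type_for(metric_name):
    """Return 'Maximize' or 'Minimize' for a metric such as 'validation:merror'."""
    # Drop the channel prefix ('validation:') and keep the metric itself
    short_name = metric_name.split(":")[-1].lower()
    if short_name in HIGHER_IS_BETTER:
        return "Maximize"
    if short_name in LOWER_IS_BETTER:
        return "Minimize"
    raise ValueError("Unknown metric: %s" % metric_name)

print(objective_type_for("validation:merror"))
print(objective_type_for("validation:accuracy"))
```

The returned string is the kind of value that is passed as `objective_type` when the tuner is created further down.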

SageMaker library 2.0+ is required!

## Let's start by recreating the estimator

In [None]:
import sagemaker
import boto3
import numpy as np

from sagemaker import get_execution_role
from sklearn.model_selection import train_test_split
from sklearn import datasets

role = get_execution_role()

prefix='mlops/iris'
# Retrieve the default bucket
sagemaker_session = sagemaker.Session()
bucket = sagemaker_session.default_bucket()
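# Compare the major version as a number; comparing version strings
# breaks for values such as "10.0" vs "2.0"
assert int(sagemaker.__version__.split('.')[0]) >= 2, 'SageMaker SDK 2.0+ is required'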

### Preparing the dataset and uploading it

In [None]:
iris = datasets.load_iris()
X=iris.data
y=iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42, stratify=y)
yX_train = np.column_stack((y_train, X_train))
yX_test = np.column_stack((y_test, X_test))
np.savetxt("iris_train.csv", yX_train, delimiter=",", fmt='%0.3f')
np.savetxt("iris_test.csv", yX_test, delimiter=",", fmt='%0.3f')

# Upload the dataset to an S3 bucket
input_train = sagemaker_session.upload_data(path='iris_train.csv', key_prefix='%s/data' % prefix)
input_test = sagemaker_session.upload_data(path='iris_test.csv', key_prefix='%s/data' % prefix)

train_data = sagemaker.inputs.TrainingInput(s3_data=input_train,content_type="csv")
test_data = sagemaker.inputs.TrainingInput(s3_data=input_test,content_type="csv")

In [None]:
# Get the URI of the XGBoost container image for the current region
container_uri = sagemaker.image_uris.retrieve('xgboost', boto3.Session().region_name, version='1.0-1')

# Create the estimator
xgb = sagemaker.estimator.Estimator(container_uri,
                                    role, 
                                    instance_count=1, 
                                    instance_type='ml.m4.xlarge',
                                    output_path='s3://{}/{}/output'.format(bucket, prefix),
                                    sagemaker_session=sagemaker_session)
# Set the hyperparameters
xgb.set_hyperparameters(num_class=len(np.unique(y)),
                        silent=0,
                        objective='multi:softmax',
                        num_round=30)

## Hyperparameter Tuning Jobs
#### A.K.A. Hyperparameter Optimization

We know that the iris dataset is an easy challenge. We can achieve a better score with XGBoost. However, we don't want to waste time testing all the possible variations of the hyperparameters in order to optimize the training process.

Instead, we'll use SageMaker's tuning feature. We'll keep the same estimator, but create a Tuner and ask it to optimize the model for us.
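To build some intuition for what a tuner does, here is a minimal local sketch of the simplest possible strategy, plain random search. It is not how SageMaker implements tuning (its default strategy is Bayesian), and everything in it is an assumption made for illustration: `toy_error` is a made-up stand-in for a training job, and `search_space`, `sample`, and `random_search` are hypothetical names.

```python
import random

# Two of the ranges used in this notebook, written as plain tuples
search_space = {
    "eta": ("continuous", 0.0, 1.0),
    "max_depth": ("integer", 1, 10),
}

def sample(space, rng):
    """Draw one random value for every hyperparameter in the space."""
    params = {}
    for name, (kind, low, high) in space.items():
        if kind == "integer":
            params[name] = rng.randint(low, high)   # inclusive on both ends
        else:
            params[name] = rng.uniform(low, high)
    return params

def toy_error(params):
    """Made-up stand-in for a training job; lowest at eta=0.3, max_depth=6."""
    return (params["eta"] - 0.3) ** 2 + 0.01 * (params["max_depth"] - 6) ** 2

def random_search(space, objective, max_jobs, seed=0):
    """Run max_jobs trials and return the best (score, params) plus all trials."""
    rng = random.Random(seed)
    trials = []
    for _ in range(max_jobs):
        params = sample(space, rng)
        trials.append((objective(params), params))
    best = min(trials, key=lambda trial: trial[0])  # 'Minimize' objective
    return best, trials

best, trials = random_search(search_space, toy_error, max_jobs=20)
print("best error: %.4f with %s" % best)
```

A real tuning job replaces `toy_error` with a full training run and reads the objective metric from the training logs, which is why delegating the search to a managed service saves so much time.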

In [None]:
from sagemaker.tuner import IntegerParameter, CategoricalParameter, ContinuousParameter, HyperparameterTuner

hyperparameter_ranges = {'eta': ContinuousParameter(0, 1),
                         'min_child_weight': ContinuousParameter(1, 10),
                         'alpha': ContinuousParameter(0, 2),
                         'gamma': ContinuousParameter(0, 10),
                         'max_depth': IntegerParameter(1, 10)}

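# Multiclass classification error rate on the 'validation' channel
objective_metric_name = 'validation:merror'

# Run up to 20 training jobs in total, at most 4 at a time,
# looking for the lowest error
tuner = HyperparameterTuner(xgb,
                            objective_metric_name,
                            hyperparameter_ranges,
                            max_jobs=20,
                            max_parallel_jobs=4,
                            objective_type='Minimize')

# Start the tuning job and block until every training job has finished
tuner.fit({'train': train_data, 'validation': test_data})
tuner.wait()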

In [None]:
# Attach to the finished tuning job and deploy the model from its best training job
job_name = tuner.latest_tuning_job.name
attached_tuner = HyperparameterTuner.attach(job_name)
xgb_predictor = attached_tuner.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')

In [None]:
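endpoint_name = xgb_predictor.endpoint_name
sm_client = boto3.client('sagemaker')
# Look up the endpoint config instead of assuming it shares the endpoint's name
endpoint_config_name = sm_client.describe_endpoint(
    EndpointName=endpoint_name
)['EndpointConfigName']
model_name = sm_client.describe_endpoint_config(
    EndpointConfigName=endpoint_config_name
)['ProductionVariants'][0]['ModelName']
# Save both names so the next parts of the warmup can reuse them
!echo $model_name > model_name.txt
!echo $endpoint_name > endpoint_name2.txt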

## A simple test before we move on

In [None]:
from sagemaker.serializers import CSVSerializer
from sklearn.metrics import f1_score
csv_serializer = CSVSerializer()
xgb_predictor.serializer = csv_serializer

In [None]:
predictions_test = [float(xgb_predictor.predict(x).decode('utf-8')) for x in X_test]
score = f1_score(y_test, predictions_test, labels=[0.0, 1.0, 2.0], average='micro')

print('F1 Score(micro): %.1f' % (score * 100.0))
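A note on the metric used above: with exactly one label per sample, micro-averaged F1 pools the true positives, false positives, and false negatives of all classes, which makes it equal to plain accuracy. The sketch below checks this by hand. The labels are made up for illustration (they are not predictions from the endpoint), and `micro_f1` and `accuracy` are hypothetical helpers rather than scikit-learn functions.

```python
# Made-up labels for illustration only; not predictions from the endpoint
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 0]

def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP and FN over every class, then compute F1 once."""
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for truth, guess in zip(y_true, y_pred):
            if guess == label and truth == label:
                tp += 1   # predicted this class, and it was right
            elif guess == label:
                fp += 1   # predicted this class, but it was another one
            elif truth == label:
                fn += 1   # missed a sample of this class
    return 2 * tp / (2 * tp + fp + fn)

def accuracy(y_true, y_pred):
    """Fraction of samples whose prediction matches the true label."""
    correct = sum(1 for truth, guess in zip(y_true, y_pred) if truth == guess)
    return correct / len(y_true)

print('micro-F1: %.3f, accuracy: %.3f' % (micro_f1(y_true, y_pred), accuracy(y_true, y_pred)))
```

If per-class performance matters, `average='macro'` in `f1_score` weights every class equally instead.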

## Alright, now that you know how to optimize a model, let's run a batch prediction

Click [here to start the Part 3/4](03_BasicModel_Part3_BatchPrediction.ipynb) of this warmup: Batch Prediction

## Cleaning up (Attention! Read the message before deleting the Endpoint)
Only run the next cell if you will **NOT** continue to the next part of the warmup. If you decide to continue, please click on the link above.

In [None]:
xgb_predictor.delete_endpoint()

# The end