## Initialize the Model

Import the necessary libraries, find the samples bucket, and initialize an **XGBoost** model.

In [None]:
import math
import boto3
import sagemaker
from sagemaker import get_execution_role
from sagemaker.amazon.amazon_estimator import get_image_uri
from sagemaker.session import s3_input

# List every S3 bucket in the account and pick the samples bucket by its name prefix.
all_buckets = boto3.client('s3').list_buckets()['Buckets']
samples_bucket = [bucket['Name'] for bucket in all_buckets if bucket['Name'].startswith('aim368-samples-bucket-')][0]

# Configure an estimator for the built-in XGBoost container (version 0.90-1) on one ml.c5.2xlarge instance.
model = sagemaker.estimator.Estimator(image_name = get_image_uri(boto3.Session().region_name, 'xgboost', '0.90-1'),
                                      role = get_execution_role(),
                                      train_instance_count = 1,
                                      train_instance_type = 'ml.c5.2xlarge',
                                      sagemaker_session = sagemaker.Session())

print('Done!')
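
The bucket lookup above indexes `[0]` directly, so it raises a bare `IndexError` when no bucket name matches the prefix. A minimal sketch of a more explicit lookup follows. The helper `find_bucket` and the sample bucket names are hypothetical, and the helper works on a plain list of names, so it makes no AWS call.

```python
def find_bucket(bucket_names, prefix):
    """Return the first bucket name that starts with prefix, or raise a clear error."""
    for name in bucket_names:
        if name.startswith(prefix):
            return name
    raise LookupError('No bucket found with prefix ' + repr(prefix))


# Made-up bucket names, for illustration only.
names = ['logs-bucket', 'aim368-samples-bucket-abc123']
print(find_bucket(names, 'aim368-samples-bucket-'))
```

In the notebook, the list of names could come from `[bucket['Name'] for bucket in all_buckets]`.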

## Choose Hyperparameters

**Hyperparameters** are settings that adjust how a machine learning algorithm learns from a dataset.

For descriptions of each hyperparameter available in the **XGBoost** algorithm, refer to the official SageMaker documentation:
https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost_hyperparameters.html

---

In this cell, delete each **REPLACE_ME** below and type in a value for that hyperparameter. Each one has a comment next to it that specifies a range of reasonable values to help you choose. Feel free to add more hyperparameters, but remember that they could affect the quality of your model!

In [None]:
model.set_hyperparameters(
    num_round             = REPLACE_ME, # integer [20, 200]
    early_stopping_rounds = REPLACE_ME, # integer [1, 10]
    max_depth             = REPLACE_ME, # integer [3, 6]
    eta                   = REPLACE_ME  # float [0.1, 1.0]
)

print('Success!')
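
One way to catch a typo before launching a training job is to compare your choices against the ranges given in the comments above. The sketch below assumes a hypothetical helper, `check_hyperparameters`, which is not part of the SageMaker SDK. The example values are arbitrary picks from inside each range, not recommendations.

```python
# Ranges copied from the comments in the cell above: (expected type, low, high).
SUGGESTED_RANGES = {
    'num_round': (int, 20, 200),
    'early_stopping_rounds': (int, 1, 10),
    'max_depth': (int, 3, 6),
    'eta': (float, 0.1, 1.0),
}


def check_hyperparameters(values):
    """Return a list of warnings for missing or out-of-range values."""
    warnings = []
    for name, (kind, low, high) in SUGGESTED_RANGES.items():
        if name not in values:
            warnings.append(name + ' is missing')
            continue
        value = values[name]
        if kind is int and not isinstance(value, int):
            warnings.append(name + ' should be an integer')
        elif not low <= value <= high:
            warnings.append('%s = %s is outside [%s, %s]' % (name, value, low, high))
    return warnings


# Arbitrary in-range values, for illustration only.
chosen = {'num_round': 100, 'early_stopping_rounds': 5, 'max_depth': 4, 'eta': 0.3}
print(check_hyperparameters(chosen))
```

Values outside the suggested ranges are not necessarily invalid for XGBoost, so the helper returns warnings instead of raising an error.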

## Start the Training Job

The code below will begin training your ML model with the hyperparameters you selected.

---

Training your ML model should take about 5-10 minutes, so in the meantime, feel free to start training jobs for the other algorithms.

You will know the command is done when you see ```Completed - Training job completed``` in the output.

In [None]:
# Train on TrainSamples.csv and validate on TestSamples.csv, both read as CSV from the samples bucket.
model.fit(inputs = {'train': s3_input('s3://' + samples_bucket + '/TrainSamples.csv', content_type = 'text/csv'),
                    'validation': s3_input('s3://' + samples_bucket + '/TestSamples.csv', content_type = 'text/csv')},
          logs = False)
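
The `fit` call above blocks until the job finishes. If a job is started with `wait = False` instead, or checked from another notebook, its status can be polled through the SageMaker `describe_training_job` API. The sketch below assumes a hypothetical helper, `wait_for_training_job`. It takes the client as a parameter so that it can be tried without AWS credentials.

```python
import time

# Statuses after which a training job will not change any further.
TERMINAL_STATES = {'Completed', 'Failed', 'Stopped'}


def wait_for_training_job(sm_client, job_name, poll_seconds=30, sleep=time.sleep):
    """Poll describe_training_job until the job reaches a terminal status."""
    while True:
        response = sm_client.describe_training_job(TrainingJobName=job_name)
        status = response['TrainingJobStatus']
        if status in TERMINAL_STATES:
            return status
        sleep(poll_seconds)


# Usage inside the notebook (needs AWS credentials), left as a comment:
# status = wait_for_training_job(boto3.client('sagemaker'), model.latest_training_job.name)
```

Passing `sleep` as a parameter is a design choice for this sketch only: it lets the loop be exercised with a stub client and no real delay.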

## Evaluate the Model

Once your model has finished training, execute the next cell to extract the root mean squared error (RMSE) from the training job logs. RMSE is the square root of the average squared prediction error, so it measures your model's typical inaccuracy in milliseconds, with large misses weighted more heavily. The smaller the number, the better!
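
To make the metric concrete, here is a minimal sketch of how a root mean squared error is computed by hand. The function name `rmse` and the sample numbers are made up for illustration; they are not taken from the workshop data.

```python
import math


def rmse(actual, predicted):
    """Square root of the mean of the squared differences between two sequences."""
    if not actual or len(actual) != len(predicted):
        raise ValueError('actual and predicted must be non-empty and the same length')
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values: the errors are 10, -10 and 30.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]
print(rmse(actual, predicted))
```

Because the errors are squared before averaging, the single error of 30 contributes more to the result than the two errors of 10 combined.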

In [None]:
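logs_client = boto3.client('logs')
log_group = '/aws/sagemaker/TrainingJobs'
# Find the log streams whose names start with 'sagemaker-xgboost' and read the events from the last one returned.
log_streams = logs_client.describe_log_streams(logGroupName = log_group, logStreamNamePrefix = 'sagemaker-xgboost')['logStreams']
events = logs_client.get_log_events(logGroupName = log_group, logStreamName = log_streams[-1]['logStreamName'])['events']

# The error value is the text after the last colon in the final log message.
print('XGBoost training error: ' + str(int(float(events[-1]['message'].split(':')[-1]))))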
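
The cell above takes whatever follows the last colon in the final log message. The sketch below shows that parsing step on its own, together with a variant that keeps every metric on the line. The sample message is an assumption about what an evaluation line looks like, not output copied from a real job, and both function names are hypothetical.

```python
def parse_last_metric(message):
    """Mirror the notebook's parsing: the text after the last colon, as a float."""
    return float(message.split(':')[-1])


def parse_metrics(message):
    """Split an evaluation line into a {metric_name: value} dictionary."""
    metrics = {}
    # '#011' is treated here as an escaped tab separating the fields (assumption).
    for field in message.replace('#011', '\t').split('\t'):
        if ':' in field:
            name, value = field.rsplit(':', 1)
            metrics[name.strip()] = float(value)
    return metrics


# Assumed shape of an evaluation line, for illustration only.
sample = '[42]#011train-rmse:118.25#011validation-rmse:131.5'
print(parse_last_metric(sample))
print(parse_metrics(sample))
```

If the log line has this shape, the value after the last colon is the validation metric, not the training one, so it is worth checking the metric name before comparing models.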

## Deploy the Model

If you believe your **XGBoost** model is better than your other two, copy the deployment code from the instructions into the cell below and execute it.