# Finding the Best Compute 
Finding the optimal EC2 instance can be framed as its own learning problem! Here we suggest a method that explores a variety of EC2 instance types. It is aimed at SageMaker data scientists who struggle with:
- Finding the best EC2 instance for a training job
- Picking the right EC2 instance for deploying a model
- Updating your endpoint EC2 instance when your model changes
- Updating your instances when AWS launches new instance types

### Getting data
To step through this notebook, you'll need some data. We recommend first running the example notebook below, which is available on GitHub in the amazon-sagemaker-examples repository and also comes pre-installed on your SageMaker notebook instance:
- https://github.com/awslabs/amazon-sagemaker-examples/tree/master/introduction_to_applying_machine_learning/xgboost_direct_marketing 

Feel free to run all of the cells in that notebook after you specify your bucket. Here, we'll point to the train and validation sets it produces to run our instance experiments.

In [6]:
import sagemaker
import pandas as pd
import boto3
import os

In [2]:
train = '/home/ec2-user/SageMaker/xgboost_direct_marketing_2019-11-22/train.csv'
validation = '/home/ec2-user/SageMaker/xgboost_direct_marketing_2019-11-22/validation.csv'

In [4]:
from sagemaker import get_execution_role

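# IAM role attached to this notebook instance; SageMaker assumes it to run training jobs
role = get_execution_role()

sess = sagemaker.Session()

# Replace with the name of an S3 bucket you own
bucket = 'mandalorian'

# Key prefix under which the data and model artifacts are stored
prefix = 'xgboost/direct-marketing'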


In [8]:
# Upload the local CSVs to S3; keys are joined with '/' because S3 keys are not OS paths
s3_bucket = boto3.Session().resource('s3').Bucket(bucket)
s3_bucket.Object('{}/train/train.csv'.format(prefix)).upload_file(train)
s3_bucket.Object('{}/validation/validation.csv'.format(prefix)).upload_file(validation)

In [None]:
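# Note: get_image_uri and s3_input are SageMaker Python SDK v1 names
from sagemaker.amazon.amazon_estimator import get_image_uri

# Built-in XGBoost container image for the current region
container = get_image_uri(boto3.Session().region_name, 'xgboost', '0.90-1')

# Input channels; the trailing slash keeps each prefix scoped to its own folder
s3_input_train = sagemaker.s3_input(s3_data='s3://{}/{}/train/'.format(bucket, prefix), content_type='csv')
s3_input_validation = sagemaker.s3_input(s3_data='s3://{}/{}/validation/'.format(bucket, prefix), content_type='csv')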

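# Baseline estimator: one ml.m4.xlarge training instance.
# train_instance_type is the setting we will vary in the instance experiments.
xgb = sagemaker.estimator.Estimator(container,
                                    role,
                                    train_instance_count=1,
                                    train_instance_type='ml.m4.xlarge',
                                    output_path='s3://{}/{}/output'.format(bucket, prefix),
                                    sagemaker_session=sess)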
xgb.set_hyperparameters(max_depth=5,
                        eta=0.2,
                        gamma=4,
                        min_child_weight=6,
                        subsample=0.8,
                        silent=0,
                        objective='binary:logistic',
                        num_round=100)

# Launch the training job; this call blocks until the job finishes
xgb.fit({'train': s3_input_train, 'validation': s3_input_validation})
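
Once the same job has been run on several instance types, one simple way to compare them is by estimated cost: billable seconds times the hourly price. The following is a minimal sketch of that comparison and is not part of the original notebook. The helper name `rank_instances`, the candidate instance types, and every number below are hypothetical placeholders, not measured timings or actual AWS prices.

```python
from typing import Dict, List, Tuple


def rank_instances(billable_seconds: Dict[str, float],
                   hourly_price_usd: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (instance_type, estimated_cost_usd) pairs, cheapest first."""
    costs = []
    for instance_type, seconds in billable_seconds.items():
        # Cost of one training run = hours used * price per hour
        cost = seconds / 3600.0 * hourly_price_usd[instance_type]
        costs.append((instance_type, cost))
    return sorted(costs, key=lambda pair: pair[1])


# Placeholder inputs for illustration only (not real timings or prices)
example_seconds = {'ml.m4.xlarge': 120.0, 'ml.c5.xlarge': 90.0}
example_prices = {'ml.m4.xlarge': 0.30, 'ml.c5.xlarge': 0.20}

ranking = rank_instances(example_seconds, example_prices)
for instance_type, cost in ranking:
    print('{}: ${:.4f} per training run'.format(instance_type, cost))
```

In practice, the billable seconds for each run could be read from the `BillableTimeInSeconds` field that the SageMaker `DescribeTrainingJob` API returns, and the prices taken from the SageMaker pricing page for your region. Cost is only one possible criterion; wall-clock time or model quality could be ranked the same way.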