## UFO Sightings Evaluation and Optimization Lab

The goal of this notebook is to find out whether a model trained with the optimized hyperparameters outperforms our baseline Linear Learner model. We will compare metrics such as accuracy and see whether they differ.

1. [Create and train our "optimized" model (Linear Learner)](#1.-Create-and-train-our-%22optimized%22-model-(Linear-Learner))
1. Compare the results!

Import all the needed libraries.

In [1]:
import pandas as pd
import numpy as np
from datetime import datetime


import boto3
from sagemaker import get_execution_role
import sagemaker

In [2]:
role = get_execution_role()
bucket='ml-lab-ufo-elly'

---

### 1. Create and train our "optimized" model (Linear Learner)
Let's evaluate the Linear Learner algorithm with the new optimized hyperparameters. First, get the data that we already stored in S3 as recordIO protobuf data.

Get the recordIO file for the training data that is in S3.

In [3]:
train_file = 'ufo_sightings_train_recordIO_protobuf.data'
training_recordIO_protobuf_location = 's3://{}/algorithms_lab/linearlearner_train/{}'.format(bucket, train_file)
print('The Pipe mode recordIO protobuf training data: {}'.format(training_recordIO_protobuf_location))

The Pipe mode recordIO protobuf training data: s3://ml-lab-ufo-elly/algorithms_lab/linearlearner_train/ufo_sightings_train_recordIO_protobuf.data


Get the recordIO file for the validation data that is in S3.

In [4]:
validation_file = 'ufo_sightings_validatioin_recordIO_protobuf.data'
validate_recordIO_protobuf_location = 's3://{}/algorithms_lab/linearlearner_validation/{}'.format(bucket, validation_file)
print('The Pipe mode recordIO protobuf validation data: {}'.format(validate_recordIO_protobuf_location))

The Pipe mode recordIO protobuf validation data: s3://ml-lab-ufo-elly/algorithms_lab/linearlearner_validation/ufo_sightings_validatioin_recordIO_protobuf.data
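
A misnamed or missing object only surfaces once the training job has started, so it can be worth confirming that both files exist first. The sketch below splits an S3 URI into its bucket and key; `split_s3_uri` is a hypothetical helper written for illustration, not part of boto3 or the SageMaker SDK.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key). Hypothetical helper for illustration."""
    parsed = urlparse(uri)
    if parsed.scheme != 's3' or not parsed.netloc:
        raise ValueError('Not an S3 URI: {}'.format(uri))
    # The path keeps its leading slash, which is not part of the object key
    return parsed.netloc, parsed.path.lstrip('/')


example_uri = 's3://ml-lab-ufo-elly/algorithms_lab/linearlearner_train/ufo_sightings_train_recordIO_protobuf.data'
example_bucket, example_key = split_s3_uri(example_uri)
print('bucket: {}\nkey: {}'.format(example_bucket, example_key))
```

The resulting pair could then be passed to `boto3.client('s3').head_object(Bucket=..., Key=...)`, which raises a `ClientError` when the object does not exist.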


In [6]:
from sagemaker.amazon.amazon_estimator import get_image_uri
import sagemaker

container = get_image_uri(boto3.Session().region_name, 'linear-learner', "1")

'get_image_uri' method will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.


Create a job and use the optimized hyperparameters.

In [7]:
# Create a training job name
job_name = 'ufo-linear-learner-job-optimized-{}'.format(datetime.now().strftime("%Y%m%d%H%M%S"))
print('Here is the job name {}'.format(job_name))

# Here is where the model-artifact will be stored
output_location = 's3://{}/optimization_evaluation_lab/linearlearner_optimized_output'.format(bucket)

Here is the job name ufo-linear-learner-job-optimized-20200810030637
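
SageMaker training job names are limited to 63 characters and may contain only letters, digits, and hyphens, so a generated name can be checked before the job is submitted. This is a minimal sketch; `is_valid_job_name` is a hypothetical helper, and the pattern is an assumption based on the constraint documented for the CreateTrainingJob API.

```python
import re
from datetime import datetime

# Assumed constraint: starts with an alphanumeric character, then
# alphanumerics optionally separated by hyphens, 63 characters at most.
JOB_NAME_PATTERN = re.compile(r'[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}')


def is_valid_job_name(name):
    """Return True if the name fits the assumed training job naming rule."""
    return len(name) <= 63 and JOB_NAME_PATTERN.fullmatch(name) is not None


candidate = 'ufo-linear-learner-job-optimized-{}'.format(datetime.now().strftime("%Y%m%d%H%M%S"))
print(candidate, is_valid_job_name(candidate))
```

Underscores are a common cause of rejected names, so a name such as `ufo_linear_learner_job` would fail this check.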


Next, start building the model with the SageMaker Python SDK, passing in everything that is required to create a Linear Learner training job.

Here are the [linear learner hyperparameters](https://docs.aws.amazon.com/sagemaker/latest/dg/ll_hyperparameters.html) 


In [None]:
%%time
sess = sagemaker.Session()

# Set up the Linear Learner algorithm from the ECR container
linear = sagemaker.estimator.Estimator(container,
                                       role, 
                                       train_instance_count=1, 
                                       train_instance_type='ml.c4.xlarge',
                                       output_path=output_location,
                                       sagemaker_session=sess,
                                       input_mode='Pipe')
# Set up the optimized hyperparameters
linear.set_hyperparameters( feature_dim=22, 
                            predictor_type='multiclass_classifier',
                            num_classes=3,
                            l1 = 0.00015634445960768285,
                            learning_rate= 0.02986332628768753,
                            mini_batch_size = 1978,
                            use_bias='true',
                            wd=0.0028658419345028887
                          )


# Launch a training job. This method calls the CreateTrainingJob API
data_channels = {
    'train': training_recordIO_protobuf_location,
    'validation': validate_recordIO_protobuf_location
}
linear.fit(data_channels, job_name=job_name)

Parameter image_name will be renamed to image_uri in SageMaker Python SDK v2.
's3_input' class will be renamed to 'TrainingInput' in SageMaker Python SDK v2.
's3_input' class will be renamed to 'TrainingInput' in SageMaker Python SDK v2.


2020-08-10 03:07:04 Starting - Starting the training job...
2020-08-10 03:07:06 Starting - Launching requested ML instances......
2020-08-10 03:08:17 Starting - Preparing the instances for training...

Once the job finishes, compare its billable time and accuracy against those of our baseline model.
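
One way to make that comparison is to read both jobs' descriptions and take the difference. The sketch below works on dictionaries shaped like the response of `describe_training_job`, using its `BillableTimeInSeconds` and `FinalMetricDataList` fields. `summarize_job` and `compare_jobs` are hypothetical helpers, the metric name `validation:multiclass_accuracy` is an assumption about what the job reports, and the numbers in the placeholder dictionaries are made up to exercise the code; they are not results from this lab.

```python
def summarize_job(description, metric_name):
    """Pull billable seconds and one final metric from a job description dict."""
    metrics = {m['MetricName']: m['Value']
               for m in description.get('FinalMetricDataList', [])}
    return {
        'billable_seconds': description.get('BillableTimeInSeconds'),
        'metric': metrics.get(metric_name),
    }


def compare_jobs(baseline, optimized, metric_name):
    """Return optimized minus baseline for billable time and the chosen metric."""
    base = summarize_job(baseline, metric_name)
    opt = summarize_job(optimized, metric_name)
    return {
        'billable_seconds_delta': opt['billable_seconds'] - base['billable_seconds'],
        'metric_delta': opt['metric'] - base['metric'],
    }


# Placeholder values only, NOT real results from either training job
metric = 'validation:multiclass_accuracy'  # assumed metric name
placeholder_baseline = {
    'BillableTimeInSeconds': 100,
    'FinalMetricDataList': [{'MetricName': metric, 'Value': 0.5}],
}
placeholder_optimized = {
    'BillableTimeInSeconds': 80,
    'FinalMetricDataList': [{'MetricName': metric, 'Value': 0.75}],
}
print(compare_jobs(placeholder_baseline, placeholder_optimized, metric))
```

In the notebook, the real dictionaries would come from `boto3.client('sagemaker').describe_training_job(TrainingJobName=...)`, called once with `job_name` from above and once with the baseline job's name, which is not defined in this notebook.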
