## UFO Sightings Evaluation and Optimization Lab

The goal of this notebook is to find out whether a Linear Learner model trained with our optimized hyperparameters outperforms our baseline Linear Learner model. We can also compare metrics such as accuracy and billable training time to see how they differ.

Here is what we plan to accomplish:
1. [Create and train our "optimized" model (Linear Learner)](#1.-Create-and-train-our-%22optimized%22-model-(Linear-Learner))
1. Compare the results!

First let's go ahead and import all the needed libraries.

In [None]:
import pandas as pd
import numpy as np
from datetime import datetime


import boto3
from sagemaker import get_execution_role
import sagemaker

In [None]:
role = get_execution_role()
bucket='<INSERT_BUCKET_NAME_HERE>'

---

### 1. Create and train our "optimized" model (Linear Learner)

Let's evaluate the Linear Learner algorithm with the new optimized hyperparameters. We start by pointing to the data that we already stored in S3 as recordIO protobuf files.

Let's get the S3 location of the recordIO file that holds the training data.

In [None]:
train_file = 'ufo_sightings_train_recordIO_protobuf.data'
training_recordIO_protobuf_location = 's3://{}/algorithms_lab/linearlearner_train/{}'.format(bucket, train_file)
print('The Pipe mode recordIO protobuf training data: {}'.format(training_recordIO_protobuf_location))

Let's get the S3 location of the recordIO file that holds the validation data.

In [None]:
validation_file = 'ufo_sightings_validatioin_recordIO_protobuf.data'
validate_recordIO_protobuf_location = 's3://{}/algorithms_lab/linearlearner_validation/{}'.format(bucket, validation_file)
print('The Pipe mode recordIO protobuf validation data: {}'.format(validate_recordIO_protobuf_location))
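
Both locations above are plain `s3://bucket/key` strings, so a typo or a forgotten bucket placeholder only shows up once the training job fails. The following is a minimal sketch of a sanity check, assuming a hypothetical helper named `split_s3_uri` and a made-up bucket name; neither is part of the original lab.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3://bucket/key URI into a (bucket, key) tuple.

    Hypothetical helper for illustration; raises ValueError when the
    URI is not an S3 URI or the bucket placeholder was never replaced.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("Not a valid S3 URI: {}".format(uri))
    if "<" in parsed.netloc or ">" in parsed.netloc:
        raise ValueError("Bucket placeholder was not replaced: {}".format(uri))
    return parsed.netloc, parsed.path.lstrip("/")


# 'my-example-bucket' is a made-up bucket name used only for this sketch
example_uri = (
    "s3://my-example-bucket/algorithms_lab/linearlearner_train/"
    "ufo_sightings_train_recordIO_protobuf.data"
)
example_bucket, example_key = split_s3_uri(example_uri)
print("Bucket: {}".format(example_bucket))
print("Key: {}".format(example_key))
```

In the notebook itself, the resulting bucket and key could be passed to the boto3 S3 client's `head_object` call to confirm that each file exists before a training job is launched.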

---

Alright, we are good to go for the Linear Learner algorithm. Let's retrieve the URI of the Linear Learner container image from the ECR repository so that we can call the algorithm.

In [None]:
from sagemaker import image_uris
container = image_uris.retrieve('linear-learner', boto3.Session().region_name, '1')

Let's create a job and use the optimized hyperparameters.

In [None]:
# Create a training job name
job_name = 'ufo-linear-learner-job-optimized-{}'.format(datetime.now().strftime("%Y%m%d%H%M%S"))
print('Here is the job name {}'.format(job_name))

# Here is where the model-artifact will be stored
output_location = 's3://{}/optimization_evaluation_lab/linearlearner_optimized_output'.format(bucket)
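
The timestamp suffix keeps each job name unique, which matters because training job names cannot be reused within an account and region. As a rough sketch, the name can also be checked locally before it is submitted. The pattern and the 63-character limit below reflect my understanding of the `CreateTrainingJob` naming rules and should be treated as an assumption to verify against the API reference; `is_valid_job_name` is a hypothetical helper.

```python
import re
from datetime import datetime

# Assumed naming rule: alphanumerics and hyphens, starting and ending with
# an alphanumeric character, at most 63 characters long
JOB_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$")


def is_valid_job_name(name):
    """Return True when the name fits the assumed training job naming rule."""
    return len(name) <= 63 and JOB_NAME_PATTERN.match(name) is not None


# Same naming scheme as the cell above
candidate = "ufo-linear-learner-job-optimized-{}".format(
    datetime.now().strftime("%Y%m%d%H%M%S")
)
print("{} valid: {}".format(candidate, is_valid_job_name(candidate)))
```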

Next we can start building out our model by using the SageMaker Python SDK and passing in everything that is required to create a Linear Learner training job.

Here are the [linear learner hyperparameters](https://docs.aws.amazon.com/sagemaker/latest/dg/ll_hyperparameters.html) that we can use within our training job.

After we run this job we can view the results.
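
The placeholders in the next cell are meant to be filled in with the values found by the tuning job. One possible way to organize them, shown as a sketch only, is to keep the fixed settings from this notebook in one dictionary and merge the tuned values into it. The tuned names used here (`learning_rate`, `mini_batch_size`, `wd`) are real Linear Learner hyperparameters, but their values are made-up placeholders and not results from any tuning job. The `_tuning_objective_metric` entry reflects my assumption that tuning jobs attach an internal key with a leading underscore; `merge_hyperparameters` is a hypothetical helper.

```python
# Fixed settings, taken from this notebook
FIXED_HYPERPARAMETERS = {
    "feature_dim": 22,
    "predictor_type": "multiclass_classifier",
    "num_classes": 3,
}


def merge_hyperparameters(fixed, tuned):
    """Combine fixed settings with tuned values.

    Keys that start with an underscore are skipped (assumed to be internal
    to the tuning job), and fixed settings are never overwritten.
    """
    merged = dict(fixed)
    for name, value in tuned.items():
        if name.startswith("_") or name in fixed:
            continue
        merged[name] = value
    return merged


# PLACEHOLDER values for illustration only; substitute your own tuning results
tuned_placeholder = {
    "learning_rate": "0.01",
    "mini_batch_size": "500",
    "wd": "0.001",
    "feature_dim": "22",
    "_tuning_objective_metric": "validation:objective_loss",
}

merged = merge_hyperparameters(FIXED_HYPERPARAMETERS, tuned_placeholder)
for name in sorted(merged):
    print("{} = {}".format(name, merged[name]))
```

Because `set_hyperparameters` accepts keyword arguments, a dictionary like this could be passed as `linear.set_hyperparameters(**merged)` instead of typing each value by hand.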

In [None]:
%%time
sess = sagemaker.Session()

# Set up the Linear Learner algorithm from the ECR container
linear = sagemaker.estimator.Estimator(container,
                                       role, 
                                       instance_count=1, 
                                       instance_type='ml.c4.xlarge',
                                       output_path=output_location,
                                       sagemaker_session=sess,
                                       input_mode='Pipe')
# Setup the hyperparameters
linear.set_hyperparameters( feature_dim=22, 
                            predictor_type='multiclass_classifier',
                            num_classes=3,
                            ## enter optimized hyperparameters here
                            ## enter optimized hyperparameters here
                            ## enter optimized hyperparameters here
                            ## enter optimized hyperparameters here
                            ## enter optimized hyperparameters here
                            ## enter optimized hyperparameters here
                            ## enter optimized hyperparameters here
                            ## enter optimized hyperparameters here
                            ## enter optimized hyperparameters here
                            ## enter optimized hyperparameters here
                          )


# Launch a training job. This method calls the CreateTrainingJob API
data_channels = {
    'train': training_recordIO_protobuf_location,
    'validation': validate_recordIO_protobuf_location
}
linear.fit(data_channels, job_name=job_name)

Now we can compare the billable training time and the accuracy of this model against our baseline model.
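
The comparison can also be scripted. The sketch below assumes each job is described by a dictionary shaped like a `DescribeTrainingJob` response, which includes `BillableTimeInSeconds` and a `FinalMetricDataList` of metric entries. The helper names are hypothetical, the metric name `validation:multiclass_accuracy` is an assumption about what Linear Learner reports for a multiclass classifier, and every number is a made-up placeholder rather than a result from this lab.

```python
def summarize_job(description, metric_name):
    """Pull the billable seconds and one final metric out of a job description."""
    metrics = {
        entry["MetricName"]: entry["Value"]
        for entry in description.get("FinalMetricDataList", [])
    }
    return {
        "billable_seconds": description.get("BillableTimeInSeconds"),
        "metric": metrics.get(metric_name),
    }


def compare_jobs(baseline, optimized, metric_name):
    """Return optimized minus baseline for billable time and the chosen metric."""
    base = summarize_job(baseline, metric_name)
    opt = summarize_job(optimized, metric_name)
    return {
        "billable_seconds_delta": opt["billable_seconds"] - base["billable_seconds"],
        "metric_delta": opt["metric"] - base["metric"],
    }


# Assumed metric name; check the metrics your own training jobs emit
METRIC = "validation:multiclass_accuracy"

# PLACEHOLDER descriptions with made-up numbers, for illustration only
baseline_description = {
    "BillableTimeInSeconds": 100,
    "FinalMetricDataList": [{"MetricName": METRIC, "Value": 0.5}],
}
optimized_description = {
    "BillableTimeInSeconds": 80,
    "FinalMetricDataList": [{"MetricName": METRIC, "Value": 0.75}],
}

comparison = compare_jobs(baseline_description, optimized_description, METRIC)
print("Billable seconds delta: {}".format(comparison["billable_seconds_delta"]))
print("Metric delta: {}".format(comparison["metric_delta"]))
```

In practice the two descriptions would come from `boto3.client('sagemaker').describe_training_job(TrainingJobName=...)`, called once with the baseline job name and once with the `job_name` defined in this notebook.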