# Blog post 3: Excel in tuning models using the Amazon LinearLearner algorithm

# Introduction
In this third part of the three-part blog series explaining the AWS implementation of a scalable linear regression model, I continue to explore the inner workings of Amazon SageMaker and the Amazon LinearLearner algorithm. I attempt to fine-tune the model on the Visa dataset to see whether the recall can be improved. In blog post 2 I copied the Visa dataset, originally from Kaggle (https://www.kaggle.com/mlg-ulb/creditcardfraud/data), from an Amazon S3 location to my notebook instance and pre-processed it so that it could be fed to the algorithm. I then created a live endpoint and made predictions using the trained model.

# Prerequisites
For this series of blog posts, I assume that you have already completed the following tutorials from the Amazon SageMaker documentation:

- [Setting up](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-set-up.html) 
- [Create an Amazon SageMaker Notebook Instance](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-setup-working-env.html)

I have included “sagemaker” in the name of my S3 bucket, "cyrusmv-sagemaker-demos", and have chosen to let any SageMaker notebook instance access any Amazon S3 bucket with the term “sagemaker” included in the name. Note: This is not a recommended security option for production and is only useful for simplifying the flow of the blog.

In this blog I am using the [Visa dataset from Kaggle](https://www.kaggle.com/mlg-ulb/creditcardfraud). I have put the dataset in an Amazon S3 bucket. You should also download the dataset and upload the data onto Amazon S3, otherwise you will receive errors.

I assume that you are familiar with linear regression. If you’re not, read [blog post one](linearlearner-blogpost-part1.ipynb) in this series.

# Hyperparameter tuning
In [blog post 2](linearlearner-blogpost-part2.ipynb) in this three-part series, we used the default hyperparameters:
u'epochs': u'10', u'init_bias': u'0.0', u'lr_scheduler_factor': u'0.99', u'num_calibration_samples': u'10000000', u'_num_kv_servers': u'auto', u'use_bias': u'true', u'num_point_for_scaler': u'10000', u'_log_level': u'info', u'bias_lr_mult': u'10', u'lr_scheduler_step': u'100', u'init_method': u'uniform', u'init_sigma': u'0.01', u'lr_scheduler_minimum_lr': u'0.00001', **u'target_recall': u'0.8'**, **u'num_models': u'32'**, u'momentum': u'0.0', u'unbias_label': u'auto', u'wd': u'0.0', u'optimizer': u'adam', u'learning_rate': u'auto', u'_kvstore': u'auto', **u'normalize_data': u'true'**, **u'binary_classifier_model_selection_criteria': u'accuracy'**, u'use_lr_scheduler': u'true', **u'target_precision': u'0.8'**, u'force_dense': u'true', u'unbias_data': u'auto', u'init_scale': u'0.07', u'bias_wd_mult': u'0', u'mini_batch_size': u'1000', u'beta_1': u'0.9', u'loss': u'auto', u'beta_2': u'0.999', u'normalize_label': u'auto', u'_num_gpus': u'auto', u'_data_format': u'record', u'positive_example_weight_mult': u'1.0', u'l1': u'0.0'}

Let's highlight a few of these parameters:
- **target_recall and target_precision** are both set to 80%. I intend to optimize for recall, so I raise the recall target to 90% and see what accuracy we achieve.
- **normalize_data** is already true.
- **binary_classifier_model_selection_criteria** is ```accuracy```. I will change it to **precision_at_target_recall**. With this criterion, the model is selected by its precision at the 90% recall target, whatever the accuracy might end up at.
- **num_models** is 32, so we know that LinearLearner trains 32 models in parallel. Each of these parallel models uses a different combination of hyperparameter values, in order to find the combination that yields the best result.

For more information on LinearLearner hyperparameters, please see the [Amazon SageMaker documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/ll_hyperparameters.html).
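To make the **precision_at_target_recall** criterion more concrete, here is a minimal sketch of the metric itself. It is not LinearLearner's internal implementation: the function name `precision_at_target_recall` and the toy scores and labels are assumptions made up for illustration. The sketch lowers the decision threshold until recall reaches the target and reports the precision at that point.

```python
import numpy as np

def precision_at_target_recall(scores, labels, target_recall):
    """Return (threshold, precision, recall) at the highest threshold
    whose recall reaches target_recall. Hypothetical helper for illustration."""
    if not 0 < target_recall <= 1:
        raise ValueError("target_recall must be in (0, 1]")
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    total_pos = int(labels.sum())
    if total_pos == 0:
        raise ValueError("labels contain no positive examples")
    # Candidate thresholds from the highest score down to the lowest.
    # The lowest threshold marks everything positive, so recall reaches 1.0
    # and the loop always returns.
    for threshold in np.unique(scores)[::-1]:
        predicted = scores >= threshold
        tp = int(np.sum(predicted & (labels == 1)))
        recall = tp / total_pos
        if recall >= target_recall:
            precision = tp / int(predicted.sum())
            return float(threshold), precision, recall

# Toy data (made up): eight scored examples, four of them positive
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
at_75 = precision_at_target_recall(toy_scores, toy_labels, 0.75)
at_100 = precision_at_target_recall(toy_scores, toy_labels, 1.0)
print(at_75, at_100)
```

On the toy data, demanding a higher recall pushes the threshold down and the precision drops with it, which is the same trade-off I expect to see on the Visa dataset.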


In [1]:
#imports
import boto3 #AWS python SDK for accessing AWS services
import numpy as np #Array library with probability and statistics capabilities
import io
import sagemaker.amazon.common as smac # Amazon Sagemaker common library that includes data formats
import sagemaker #sagemaker python sdk
import os
from sagemaker.predictor import csv_serializer, json_deserializer #sagemaker prediction sdk
from sagemaker import get_execution_role


In [2]:
bucket = 'cyrusmv-sagemaker-demos'     #replace this with your own bucket 
original_key = 'visa-kaggle/original.csv'    #replace this with your own file inside the bucket
local_pickel_root = '../data/'
dist = 'visa-kaggle/data/'
s3_4_output = 'visa-kaggle/'

files = {}

role = get_execution_role() #this is SageMaker role that would be later used for authorizing SageMaker to access S3
print(role) 

sagemaker_session = sagemaker.Session()

arn:aws:iam::475933981307:role/service-role/AmazonSageMaker-ExecutionRole-20180102T172706


# Downloading data files from Amazon S3
We iterate over Amazon S3 sub-directories recursively, and when we reach a leaf (files are the leaves of the directory structure), we download the file. We also add the file's local path to the `files` dictionary, keyed by the file name without its extension, so the code can be generalized based on your folder structure in S3.

*Disclaimer: The code here is based on [this stackoverflow reference](https://stackoverflow.com/questions/31918960/boto3-to-download-all-files-from-a-s3-bucket) plus exception handling and creating a dictionary of files.*

In [18]:
def download_dir(client, resource, dist, local, bucket):
    paginator = client.get_paginator('list_objects')
    for result in paginator.paginate(Bucket=bucket, Delimiter='/', Prefix=dist):
        if result.get('CommonPrefixes') is not None:
            for subdir in result.get('CommonPrefixes'):
                download_dir(client, resource, subdir.get('Prefix'), local, bucket)
        if result.get('Contents') is not None:
            for file in result.get('Contents'):
                if not os.path.exists(os.path.dirname(local + os.sep + file.get('Key'))):
                    os.makedirs(os.path.dirname(local + os.sep + file.get('Key')))
                print('bucket: {} source file: {}; ==> local: {} \n'.format(bucket, file.get('Key'), local + os.sep + file.get('Key')))
                try:
                    dest = local + os.sep + file.get('Key')
                    key = dest.rsplit('/',1)[-1]
                    key = key.rsplit('.', 1)[0]
                    resource.meta.client.download_file(bucket, file.get('Key'),dest)
                    files[key] = dest
                except (IsADirectoryError, NotADirectoryError):
                    print('WARNING: {}/{} is a directory, skipping download operation'.format(bucket, file.get('Key')))

                    
def _start():
    client = boto3.client('s3')
    resource = boto3.resource('s3')
    download_dir(client, resource, local=local_pickel_root, bucket=bucket, dist=dist)
    print('\ndownload completed.')
    
_start()

files


bucket: cyrusmv-sagemaker-demos source file: visa-kaggle/data/test/val_data.npy; ==> local: ../data//visa-kaggle/data/test/val_data.npy 

bucket: cyrusmv-sagemaker-demos source file: visa-kaggle/data/test/val_label.npy; ==> local: ../data//visa-kaggle/data/test/val_label.npy 

bucket: cyrusmv-sagemaker-demos source file: visa-kaggle/data/train/train_data.npy; ==> local: ../data//visa-kaggle/data/train/train_data.npy 

bucket: cyrusmv-sagemaker-demos source file: visa-kaggle/data/train/train_label.npy; ==> local: ../data//visa-kaggle/data/train/train_label.npy 

bucket: cyrusmv-sagemaker-demos source file: visa-kaggle/data/; ==> local: ../data//visa-kaggle/data/ 

bucket: cyrusmv-sagemaker-demos source file: visa-kaggle/data/recordio-pb-data; ==> local: ../data//visa-kaggle/data/recordio-pb-data 


download completed.


{'recordio-pb-data': '../data//visa-kaggle/data/recordio-pb-data',
 'train_data': '../data//visa-kaggle/data/train/train_data.npy',
 'train_label': '../data//visa-kaggle/data/train/train_label.npy',
 'val_data': '../data//visa-kaggle/data/test/val_data.npy',
 'val_label': '../data//visa-kaggle/data/test/val_label.npy'}
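The log above lists `visa-kaggle/data/` as a source file. That key is a folder placeholder, and the download function relies on its exception handler to skip it. As a minimal sketch, assuming you would rather filter such keys before attempting a download, the hypothetical helper `local_name_for` below derives the dictionary key from an S3 object key and returns `None` for folder placeholders.

```python
import posixpath

def local_name_for(s3_key):
    """Return the dictionary key for an S3 object key, or None for a
    folder placeholder (a key ending in '/'). Hypothetical helper."""
    if s3_key.endswith('/'):
        return None  # nothing to download for a folder placeholder
    base = posixpath.basename(s3_key)    # e.g. 'val_data.npy'
    return posixpath.splitext(base)[0]   # e.g. 'val_data'

# Keys copied from the download log above
keys = ['visa-kaggle/data/test/val_data.npy',
        'visa-kaggle/data/',
        'visa-kaggle/data/recordio-pb-data']
names = [local_name_for(k) for k in keys]
print(names)
```

S3 keys always use `/` as the separator, which is why the sketch uses `posixpath` rather than `os.path`.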

# Loading data into vectors
We need to load the training and validation data into numpy arrays before processing them.

In [19]:
train_data = np.load(files['train_data'])
train_label = np.load(files['train_label'])

val_data = np.load(files['val_data'])
val_label = np.load(files['val_label'])

print("training data shape= {}; training label shape = {} \nValidation data shape= {}; validation label shape = {}".format(train_data.shape, 
                                                                        train_label.shape,
                                                                        val_data.shape,
                                                                        val_label.shape))
train_set = (train_data, train_label)
test_set = (val_data, val_label)


training data shape= (199364, 30); training label shape = (199364,) 
Validation data shape= (85443, 30); validation label shape = (85443,)


# Converting the data
Amazon algorithms support CSV and recordIO-wrapped protobuf. RecordIO is faster than CSV, especially for algorithms that deal with sparse matrices.
In the snippet below I use the sagemaker.amazon.common library to convert my numpy arrays into protobuf recordIO.

In [20]:
vectors = np.array([t.tolist() for t in train_set[0]]).astype('float32')
labels = np.array([t.tolist() for t in train_set[1]]).astype('float32')

buf = io.BytesIO()
smac.write_numpy_to_dense_tensor(buf, vectors, labels)
buf.seek(0)

0
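The list comprehensions in the cell above rebuild each array row by row before casting. As a small sketch on made-up data (the `features` and `toy_labels` arrays are assumptions, not the Visa data), calling `astype('float32')` directly on the numpy array gives the same result without the intermediate Python lists.

```python
import numpy as np

# Made-up stand-ins with the same layout as the training data: 30 features per row
features = np.arange(150, dtype='float64').reshape(5, 30) / 7.0
toy_labels = np.array([0, 1, 0, 0, 1])

# Conversion as written in the cell above
via_lists = np.array([row.tolist() for row in features]).astype('float32')
labels_via_lists = np.array([t.tolist() for t in toy_labels]).astype('float32')

# Direct cast on the existing arrays
direct = features.astype('float32')
labels_direct = toy_labels.astype('float32')

same_features = np.array_equal(via_lists, direct)
same_labels = np.array_equal(labels_via_lists, labels_direct)
print(same_features, same_labels, direct.dtype, direct.shape)
```

On arrays as large as the training set, the direct cast avoids creating a Python list for every row.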

# Upload training data to Amazon S3
Now that we've created our recordIO-wrapped protobuf, we'll need to upload it to Amazon S3, so that Amazon SageMaker training can use it.

In [21]:
key = 'recordio-pb-data'
boto3.resource('s3').Bucket(bucket).Object(os.path.join(dist, key)).upload_fileobj(buf)
s3_train_data = 's3://{}/{}{}'.format(bucket, dist, key)
print('uploaded training data location: {}'.format(s3_train_data))

uploaded training data location: s3://cyrusmv-sagemaker-demos/visa-kaggle/data/recordio-pb-data


Let's also set up an output Amazon S3 location where the model artifacts can be uploaded after training is complete.

In [22]:
output_location = 's3://{}/{}output'.format(bucket, s3_4_output)
print('training artifacts will be uploaded to: {}'.format(output_location))

training artifacts will be uploaded to: s3://cyrusmv-sagemaker-demos/visa-kaggle/output


# Training the model with new hyperparameters

In [23]:
containers = {'us-west-2': '174872318107.dkr.ecr.us-west-2.amazonaws.com/linear-learner:latest',
              'us-east-1': '382416733822.dkr.ecr.us-east-1.amazonaws.com/linear-learner:latest',
              'us-east-2': '404615174143.dkr.ecr.us-east-2.amazonaws.com/linear-learner:latest',
              'eu-west-1': '438346466558.dkr.ecr.eu-west-1.amazonaws.com/linear-learner:latest'}

In [26]:
sess = sagemaker.Session()

linear = sagemaker.estimator.Estimator(containers[boto3.Session().region_name],
                                       role, #SageMaker execution role, so training can read the data and upload the model
                                       train_instance_count=1, #number of instances for training
                                       train_instance_type='ml.m4.xlarge', # type of training instance
                                       output_path=output_location, #s3 location for uploading trained model
                                       sagemaker_session=sess)

linear.set_hyperparameters(feature_dim=30, #dataset has 30 columns (features)
                           predictor_type='binary_classifier', # we predict a binary value. it could have been regressor
                           mini_batch_size=200,
                           #selecting the model by precision at the target recall and raising the recall target to 0.9
                           binary_classifier_model_selection_criteria = 'precision_at_target_recall', 
                           target_recall = 0.9   
                          )

linear.fit({'train': s3_train_data})

INFO:sagemaker:Creating training-job with name: linear-learner-2018-02-26-12-55-21-014


..................................................................
Docker entrypoint called with argument(s): train
[02/26/2018 13:00:44 INFO 139651068401472] Reading default configuration from /opt/amazon/lib/python2.7/site-packages/algorithm/default-input.json: {u'epochs': u'10', u'init_bias': u'0.0', u'lr_scheduler_factor': u'0.99', u'num_calibration_samples': u'10000000', u'_num_kv_servers': u'auto', u'use_bias': u'true', u'num_point_for_scaler': u'10000', u'_log_level': u'info', u'bias_lr_mult': u'10', u'lr_scheduler_step': u'100', u'init_method': u'uniform', u'init_sigma': u'0.01', u'lr_scheduler_minimum_lr': u'0.00001', u'target_recall': u'0.8', u'num_models': u'32', u'early_stopping_patience': u'3', u'momentum': u'0.0', u'unbias_label': u'auto', u'wd': u'0.0', u'optimizer': u'adam', u'early_stopping_tolerance': u'0.001', u'learning_rate': u'auto', u'_kvstore': u'auto', u'normalize_data': u'true', u'binary_classifier_model_selection_criteria': u'accuracy', u'use_lr

#metrics {"Metrics": {"training_binary_classification_cross_entropy": {"count": 1, "max": 0.005667652155470896, "sum": 0.005667652155470896, "min": 0.005667652155470896}, "validation_binary_classification_cross_entropy": {"count": 1, "max": -Infinity, "sum": NaN, "min": Infinity}}, "EndTime": 1519650083.074053, "Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1519650083.073974}
#metrics {"Metrics": {"training_binary_classification_cross_entropy": {"count": 1, "max": 0.004825513667789808, "sum": 0.004825513667789808, "min": 0.004825513667789808}, "validation_binary_classification_cross_entropy": {"count": 1, "max": -Infinity, "sum": NaN, "min": Infinity}}, "EndTime": 1519650083.074134, "Dimensions": {"model": 1, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1519650083.074119}
#metrics {"Metrics": {"training_binary_classification_cro

#metrics {"Metrics": {"training_binary_classification_cross_entropy": {"count": 1, "max": 0.004935127349054239, "sum": 0.004935127349054239, "min": 0.004935127349054239}, "validation_binary_classification_cross_entropy": {"count": 1, "max": -Infinity, "sum": NaN, "min": Infinity}}, "EndTime": 1519650102.52742, "Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1519650102.527341}
#metrics {"Metrics": {"training_binary_classification_cross_entropy": {"count": 1, "max": 0.00427203314891272, "sum": 0.00427203314891272, "min": 0.00427203314891272}, "validation_binary_classification_cross_entropy": {"count": 1, "max": -Infinity, "sum": NaN, "min": Infinity}}, "EndTime": 1519650102.527499, "Dimensions": {"model": 1, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1519650102.527485}
#metrics {"Metrics": {"training_binary_classification_cross_e

#metrics {"Metrics": {"training_binary_classification_cross_entropy": {"count": 1, "max": 0.00470667933693612, "sum": 0.00470667933693612, "min": 0.00470667933693612}, "validation_binary_classification_cross_entropy": {"count": 1, "max": -Infinity, "sum": NaN, "min": Infinity}}, "EndTime": 1519650121.662357, "Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1519650121.662279}
#metrics {"Metrics": {"training_binary_classification_cross_entropy": {"count": 1, "max": 0.004219565484186552, "sum": 0.004219565484186552, "min": 0.004219565484186552}, "validation_binary_classification_cross_entropy": {"count": 1, "max": -Infinity, "sum": NaN, "min": Infinity}}, "EndTime": 1519650121.662438, "Dimensions": {"model": 1, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1519650121.662424}
#metrics {"Metrics": {"training_binary_classification_cross_

#metrics {"Metrics": {"training_binary_classification_cross_entropy": {"count": 1, "max": 0.004611710490262413, "sum": 0.004611710490262413, "min": 0.004611710490262413}, "validation_binary_classification_cross_entropy": {"count": 1, "max": -Infinity, "sum": NaN, "min": Infinity}}, "EndTime": 1519650140.604067, "Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 5}, "StartTime": 1519650140.603988}
#metrics {"Metrics": {"training_binary_classification_cross_entropy": {"count": 1, "max": 0.004105323628114769, "sum": 0.004105323628114769, "min": 0.004105323628114769}, "validation_binary_classification_cross_entropy": {"count": 1, "max": -Infinity, "sum": NaN, "min": Infinity}}, "EndTime": 1519650140.604149, "Dimensions": {"model": 1, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 5}, "StartTime": 1519650140.604135}
#metrics {"Metrics": {"training_binary_classification_cro

#metrics {"Metrics": {"training_binary_classification_cross_entropy": {"count": 1, "max": 0.004567881404874794, "sum": 0.004567881404874794, "min": 0.004567881404874794}, "validation_binary_classification_cross_entropy": {"count": 1, "max": -Infinity, "sum": NaN, "min": Infinity}}, "EndTime": 1519650159.643498, "Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 6}, "StartTime": 1519650159.643419}
#metrics {"Metrics": {"training_binary_classification_cross_entropy": {"count": 1, "max": 0.004108989967788797, "sum": 0.004108989967788797, "min": 0.004108989967788797}, "validation_binary_classification_cross_entropy": {"count": 1, "max": -Infinity, "sum": NaN, "min": Infinity}}, "EndTime": 1519650159.643579, "Dimensions": {"model": 1, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 6}, "StartTime": 1519650159.643564}
#metrics {"Metrics": {"training_binary_classification_cro

#metrics {"Metrics": {"training_binary_classification_cross_entropy": {"count": 1, "max": 0.004545867700937642, "sum": 0.004545867700937642, "min": 0.004545867700937642}, "validation_binary_classification_cross_entropy": {"count": 1, "max": -Infinity, "sum": NaN, "min": Infinity}}, "EndTime": 1519650179.086655, "Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 7}, "StartTime": 1519650179.086576}
#metrics {"Metrics": {"training_binary_classification_cross_entropy": {"count": 1, "max": 0.004049344359650221, "sum": 0.004049344359650221, "min": 0.004049344359650221}, "validation_binary_classification_cross_entropy": {"count": 1, "max": -Infinity, "sum": NaN, "min": Infinity}}, "EndTime": 1519650179.086736, "Dimensions": {"model": 1, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 7}, "StartTime": 1519650179.086722}
#metrics {"Metrics": {"training_binary_classification_cro

#metrics {"Metrics": {"training_binary_classification_cross_entropy": {"count": 1, "max": 0.004526676941722871, "sum": 0.004526676941722871, "min": 0.004526676941722871}, "validation_binary_classification_cross_entropy": {"count": 1, "max": -Infinity, "sum": NaN, "min": Infinity}}, "EndTime": 1519650199.028148, "Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 8}, "StartTime": 1519650199.028068}
#metrics {"Metrics": {"training_binary_classification_cross_entropy": {"count": 1, "max": 0.004051024555509169, "sum": 0.004051024555509169, "min": 0.004051024555509169}, "validation_binary_classification_cross_entropy": {"count": 1, "max": -Infinity, "sum": NaN, "min": Infinity}}, "EndTime": 1519650199.028227, "Dimensions": {"model": 1, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 8}, "StartTime": 1519650199.028214}
#metrics {"Metrics": {"training_binary_classification_cro

#metrics {"Metrics": {"training_binary_classification_cross_entropy": {"count": 1, "max": 0.004505495541470956, "sum": 0.004505495541470956, "min": 0.004505495541470956}, "validation_binary_classification_cross_entropy": {"count": 1, "max": -Infinity, "sum": NaN, "min": Infinity}}, "EndTime": 1519650219.623476, "Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 9}, "StartTime": 1519650219.623397}
#metrics {"Metrics": {"training_binary_classification_cross_entropy": {"count": 1, "max": 0.004010293081058675, "sum": 0.004010293081058675, "min": 0.004010293081058675}, "validation_binary_classification_cross_entropy": {"count": 1, "max": -Infinity, "sum": NaN, "min": Infinity}}, "EndTime": 1519650219.623554, "Dimensions": {"model": 1, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 9}, "StartTime": 1519650219.62354}
#metrics {"Metrics": {"training_binary_classification_cros

#metrics {"Metrics": {"training_binary_classification_cross_entropy": {"count": 1, "max": 0.004484348957871936, "sum": 0.004484348957871936, "min": 0.004484348957871936}, "validation_binary_classification_cross_entropy": {"count": 1, "max": -Infinity, "sum": NaN, "min": Infinity}}, "EndTime": 1519650239.849232, "Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 10}, "StartTime": 1519650239.849153}
#metrics {"Metrics": {"training_binary_classification_cross_entropy": {"count": 1, "max": 0.004022592038222884, "sum": 0.004022592038222884, "min": 0.004022592038222884}, "validation_binary_classification_cross_entropy": {"count": 1, "max": -Infinity, "sum": NaN, "min": Infinity}}, "EndTime": 1519650239.849303, "Dimensions": {"model": 1, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 10}, "StartTime": 1519650239.849289}
#metrics {"Metrics": {"training_binary_classification_c

===== Job Complete =====


# Hosting

In [29]:
linear_predictor = linear.deploy(initial_instance_count=1, #Initial number of instances. 
                                                           #Autoscaling can increase the number of instances.
                                 instance_type='ml.m4.xlarge') # instance type

INFO:sagemaker:Creating model with name: linear-learner-2018-02-26-13-10-07-983
INFO:sagemaker:Creating endpoint with name linear-learner-2018-02-26-12-55-21-014


----------------------------------------------------------------------------------------------------------------------------!

In [30]:
type(linear_predictor)

sagemaker.predictor.RealTimePredictor

# Prediction

In [31]:
linear_predictor.content_type = 'text/csv'
linear_predictor.serializer = csv_serializer
linear_predictor.deserializer = json_deserializer

In [32]:
predictions = []
for array in np.array_split(test_set[0], 100):
    result = linear_predictor.predict(array)
    predictions += [r['predicted_label'] for r in result['predictions']]

predictions = np.array(predictions)
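The loop above sends the validation set to the endpoint in 100 batches, presumably to keep each request small, and then stitches the predicted labels back together in order. Here is a minimal sketch of that pattern that runs without an endpoint. The `fake_predict` function is a stand-in I made up; it only mimics the response shape that the loop above reads (`result['predictions']` with a `predicted_label` per row).

```python
import numpy as np

def fake_predict(batch):
    """Made-up stand-in for linear_predictor.predict: labels a row 1.0
    when its first feature exceeds 0.5."""
    return {'predictions': [{'predicted_label': float(row[0] > 0.5)}
                            for row in batch]}

def predict_in_batches(predict, data, n_batches):
    """Split data into n_batches chunks, call predict on each chunk,
    and return the predicted labels in the original row order."""
    labels = []
    for batch in np.array_split(data, n_batches):
        if len(batch) == 0:
            continue  # array_split can yield empty chunks for small inputs
        result = predict(batch)
        labels.extend(r['predicted_label'] for r in result['predictions'])
    return np.array(labels)

toy_data = np.linspace(0, 1, 10).reshape(10, 1)  # ten rows, one feature
toy_predictions = predict_in_batches(fake_predict, toy_data, n_batches=3)
print(toy_predictions)
```

Because `np.array_split` preserves row order, the resulting array lines up with the label vector, which is what makes the cross-tabulation in the next cell valid.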

In [33]:
import pandas as pd

pd.crosstab(test_set[1], predictions, rownames=['actuals'], colnames=['predictions'])

| actuals \ predictions | 0.0   | 1.0  |
|:----------------------|:------|:-----|
| 0.0                   | 83778 | 1518 |
| 1.0                   | 20    | 127  |


In [37]:
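tn, fp, fn, tp = 83778, 1518, 20, 127 #counts taken from the confusion matrix above
print("recall after Hyper-Parameter change = {:.4f}".format(tp/(tp+fn)))
print("precision after Hyper-Parameter change = {:.4f}".format(tp/(tp+fp)))
print("accuracy after Hyper-Parameter change = {:.4f}".format((tp+tn)/(tn+fp+fn+tp)))


recall after Hyper-Parameter change = 0.8639
precision after Hyper-Parameter change = 0.0772
accuracy after Hyper-Parameter change = 0.9820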


# Analyzing the results
The confusion matrix in the earlier section indicates the following:
- Num Examples (NE) = 85443
- Total fraudulent transactions = 147
- True Positive (TP) = 127
- False Positive (FP) = 1518
- False Negative (FN) = 20
- True Negative (TN) = 83778

- **Recall** = TP/(TP+FN) = 127/(127+20) = 0.86
- **Precision** = TP/(TP+FP) = 127/(127+1518) = 0.08
- **Accuracy** = 1 - (FP+FN)/NE = 1 - (1538/85443) = 0.98
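The same figures can be reproduced from label vectors rather than by hand. The sketch below is only an illustration: `binary_metrics` is a hypothetical helper, and the two vectors are rebuilt from the counts in the confusion matrix, not taken from the endpoint. It also computes the accuracy of a trivial model that never predicts fraud, which shows why accuracy alone says little on data this imbalanced.

```python
import numpy as np

def binary_metrics(actuals, predictions):
    """Recall, precision and accuracy for 0/1 label vectors.
    Hypothetical helper; assumes at least one actual and one predicted positive."""
    actuals = np.asarray(actuals).astype(bool)
    predictions = np.asarray(predictions).astype(bool)
    tp = int(np.sum(actuals & predictions))
    fp = int(np.sum(~actuals & predictions))
    fn = int(np.sum(actuals & ~predictions))
    tn = int(np.sum(~actuals & ~predictions))
    return {'recall': tp / (tp + fn),
            'precision': tp / (tp + fp),
            'accuracy': (tp + tn) / (tp + fp + fn + tn)}

# Vectors rebuilt from the confusion matrix counts: TN=83778, FP=1518, FN=20, TP=127
actuals = np.concatenate([np.zeros(83778 + 1518), np.ones(20 + 127)])
predictions = np.concatenate([np.zeros(83778), np.ones(1518),
                              np.zeros(20), np.ones(127)])
metrics = binary_metrics(actuals, predictions)

# Accuracy of a model that labels every transaction as non-fraudulent
baseline_accuracy = float(np.mean(actuals == 0))
print(metrics, baseline_accuracy)
```

The never-fraud baseline is right on every one of the 85296 legitimate transactions, so its accuracy is higher than the tuned model's 0.98 even though its recall is zero.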

Recall on fraud in this model is 86%, as opposed to 80% with default parameters. This is a significant improvement in recall, even though precision has now dropped to a very low value (8%).
An important fact to notice is that a different parallel model yielded the best result in each run: model #0 in this run, and model #12 in the run with default values. This is a testament to the power of parallel training across hyperparameter combinations, which LinearLearner provides out of the box.

Using hyperparameter optimization significantly shortens experiment time, freeing your scientists to work on new problems while reducing time to market for your model.

| After Changing Hyper-Parameters | Before Changing Hyper-Parameters|
|:--------------------------------|:--------------------------------|
| model: 0                        | model: 12                       |
| threshold: 0.002781             | threshold: 0.028                |
| score: 0.079990                 | score: 0.999418                 |

# Delete the endpoint
If you're ready to be done with this notebook, please run the delete_endpoint line in the cell below. This will remove the hosted endpoint you created and avoid any charges from a stray instance being left on.

In [None]:
linear.delete_endpoint()

# Conclusions
By optimizing hyperparameters we have significantly improved recall, from 80% to 86%, at the cost of precision. This trade-off could work well for this example, given our goal and the distribution of the data.