## Setup
This notebook was created and tested on an ml.m5.xlarge notebook instance.

Let's start by specifying:

- The S3 bucket and prefix to use for training and model data. The bucket should be in the same region as the notebook instance, training, and hosting.
- The IAM role ARN that gives training and hosting access to your data. See the documentation for how to create one. If notebook instances, training, and hosting require different roles, replace the `get_execution_role()` call with the appropriate full IAM role ARN string(s).

In [2]:
# Define the IAM role used for training and hosting
import boto3
from sagemaker import get_execution_role

role = get_execution_role()

bucket = ''  # set your bucket
key = ''  # set your prefix
data_loc = 's3://{}/{}'.format(bucket, key)
train_file = data_loc + '/xxx.csv'  # set your train file name
output_location = 's3://{}/{}'.format(bucket, 'model')  # where the trained model artifact is written
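
With `bucket` and `key` left empty, the format calls above produce `s3:///`-style paths that only fail later, when the training job starts. A minimal sketch of a guard, assuming a hypothetical helper `s3_uri` (not part of the SageMaker SDK) that rejects an empty bucket and normalizes slashes; the bucket and prefix names are placeholders:

```python
def s3_uri(bucket, *parts):
    """Build an s3:// URI from a bucket and path parts (hypothetical helper)."""
    if not bucket:
        raise ValueError("bucket must be set before building S3 paths")
    # Drop empty parts and stray slashes so we never emit 's3://bucket//file'.
    cleaned = [p.strip('/') for p in parts if p and p.strip('/')]
    return '/'.join(['s3://' + bucket.strip('/')] + cleaned)


# Placeholder names for illustration only.
example_train_file = s3_uri('my-bucket', 'my-prefix/', 'xxx.csv')
example_output = s3_uri('my-bucket', 'model')
print(example_train_file)
print(example_output)
```

Failing early on an empty bucket keeps the error next to the line that needs fixing, rather than inside the training job logs.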


## Training the Linear-learner model
After setting the training parameters, we start training and poll for status until the job completes. In this example, that takes between 5 and 6 minutes.

In [7]:
from sagemaker.amazon.amazon_estimator import get_image_uri
container = get_image_uri(boto3.Session().region_name, 'linear-learner')

import sagemaker
from sagemaker.session import s3_input

sess = sagemaker.Session()

linear = sagemaker.estimator.Estimator(container,
                                       role, 
                                       train_instance_count=1, 
                                       train_instance_type='ml.m5.xlarge',
                                       output_path=output_location,
                                       sagemaker_session=sess)
linear.set_hyperparameters(feature_dim=11,
                           predictor_type='regressor',
                           mini_batch_size=200)

content_type = "text/csv"
train_data = s3_input(train_file, content_type=content_type)
linear.fit({'train': train_data})



2020-08-03 17:02:13 Starting - Starting the training job...
2020-08-03 17:02:27 Starting - Launching requested ML instances......
2020-08-03 17:03:38 Starting - Preparing the instances for training.........
2020-08-03 17:04:47 Downloading - Downloading input data...
2020-08-03 17:05:39 Training - Training image download completed. Training in progress..
Docker entrypoint called with argument(s): train
Running default environment configuration script
[08/03/2020 17:05:43 INFO 139895780669248] Reading default configuration from /opt/amazon/lib/python2.7/site-packages/algorithm/resources/default-input.json: {u'loss_insensitivity': u'0.01', u'epochs': u'15', u'feature_dim': u'auto', u'init_bias': u'0.0', u'lr_scheduler_factor': u'auto', u'num_calibration_samples': u'10000000', u'accuracy_top_k': u'3', u'_num_kv_servers': u'auto', u'use_bias': u'true', u'num_point_for_scaler': u'10000', u'_log_level': u'info', u'quantile': u'0.5', u'bias_lr_mult': u'auto', u'lr_schedul

#metrics {"Metrics": {"train_mse_objective": {"count": 1, "max": 0.7360183404239398, "sum": 0.7360183404239398, "min": 0.7360183404239398}}, "EndTime": 1596474393.532748, "Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1596474393.532687}
#metrics {"Metrics": {"train_mse_objective": {"count": 1, "max": 0.734350600226531, "sum": 0.734350600226531, "min": 0.734350600226531}}, "EndTime": 1596474393.532827, "Dimensions": {"model": 1, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1596474393.532815}
#metrics {"Metrics": {"train_mse_objective": {"count": 1, "max": 0.7319352710800512, "sum": 0.7319352710800512, "min": 0.7319352710800512}}, "EndTime": 1596474393.532881, "Dimensions": {"model": 2, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1596474393.532869}
#metrics {"Metric

#metrics {"Metrics": {"train_mse_objective": {"count": 1, "max": 0.7308934365175073, "sum": 0.7308934365175073, "min": 0.7308934365175073}}, "EndTime": 1596474417.616814, "Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1596474417.616753}
#metrics {"Metrics": {"train_mse_objective": {"count": 1, "max": 0.7378766935770473, "sum": 0.7378766935770473, "min": 0.7378766935770473}}, "EndTime": 1596474417.616894, "Dimensions": {"model": 1, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1596474417.616882}
#metrics {"Metrics": {"train_mse_objective": {"count": 1, "max": 0.7300591093405845, "sum": 0.7300591093405845, "min": 0.7300591093405845}}, "EndTime": 1596474417.616932, "Dimensions": {"model": 2, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1596474417.616922}
#metrics {"Met

#metrics {"Metrics": {"train_mse_objective": {"count": 1, "max": 0.7299816452066104, "sum": 0.7299816452066104, "min": 0.7299816452066104}}, "EndTime": 1596474442.247328, "Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1596474442.247267}
#metrics {"Metrics": {"train_mse_objective": {"count": 1, "max": 0.7380743595602967, "sum": 0.7380743595602967, "min": 0.7380743595602967}}, "EndTime": 1596474442.247408, "Dimensions": {"model": 1, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1596474442.247395}
#metrics {"Metrics": {"train_mse_objective": {"count": 1, "max": 0.7299601268881843, "sum": 0.7299601268881843, "min": 0.7299601268881843}}, "EndTime": 1596474442.247462, "Dimensions": {"model": 2, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1596474442.247446}
#metrics {"Met

#metrics {"Metrics": {"train_mse_objective": {"count": 1, "max": 0.7298532285676116, "sum": 0.7298532285676116, "min": 0.7298532285676116}}, "EndTime": 1596474467.348028, "Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1596474467.347967}
#metrics {"Metrics": {"train_mse_objective": {"count": 1, "max": 0.7454178621891945, "sum": 0.7454178621891945, "min": 0.7454178621891945}}, "EndTime": 1596474467.348108, "Dimensions": {"model": 1, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1596474467.348095}
#metrics {"Metrics": {"train_mse_objective": {"count": 1, "max": 0.730004672857031, "sum": 0.730004672857031, "min": 0.730004672857031}}, "EndTime": 1596474467.348148, "Dimensions": {"model": 2, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1596474467.348135}
#metrics {"Metric

#metrics {"Metrics": {"train_mse_objective": {"count": 1, "max": 0.7298217714518782, "sum": 0.7298217714518782, "min": 0.7298217714518782}}, "EndTime": 1596474491.70199, "Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 5}, "StartTime": 1596474491.701927}
#metrics {"Metrics": {"train_mse_objective": {"count": 1, "max": 0.735667759870726, "sum": 0.735667759870726, "min": 0.735667759870726}}, "EndTime": 1596474491.70207, "Dimensions": {"model": 1, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 5}, "StartTime": 1596474491.702058}
#metrics {"Metrics": {"train_mse_objective": {"count": 1, "max": 0.7301270789896684, "sum": 0.7301270789896684, "min": 0.7301270789896684}}, "EndTime": 1596474491.702133, "Dimensions": {"model": 2, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 5}, "StartTime": 1596474491.702115}
#metrics {"Metrics"


2020-08-03 17:08:24 Uploading - Uploading generated training model
2020-08-03 17:08:24 Completed - Training job completed
Training seconds: 217
Billable seconds: 217
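
Each complete `#metrics` line in the training output is a JSON record, so the per-epoch objective can be pulled out programmatically. A minimal sketch, assuming the log text has been captured as plain strings; `parse_metric_line` is a hypothetical helper, and the sample lines are copied from the output above (including one truncated line, which is skipped):

```python
import json

METRICS_PREFIX = '#metrics '


def parse_metric_line(line):
    """Return (epoch, model, metric_name, mean_value) tuples from one log line.

    Lines without a complete '#metrics' JSON record (for example,
    truncated ones) yield an empty list.
    """
    start = line.find(METRICS_PREFIX)
    if start == -1:
        return []
    try:
        record = json.loads(line[start + len(METRICS_PREFIX):])
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return []
    dims = record.get('Dimensions', {})
    return [(dims.get('epoch'), dims.get('model'), name, stats['sum'] / stats['count'])
            for name, stats in record.get('Metrics', {}).items()]


sample_log = [
    '#metrics {"Metrics": {"train_mse_objective": {"count": 1, "max": 0.7360183404239398, '
    '"sum": 0.7360183404239398, "min": 0.7360183404239398}}, "EndTime": 1596474393.532748, '
    '"Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", '
    '"Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1596474393.532687}',
    '#metrics {"Metrics": {"train_mse_objective": {"count": 1, "max": 0.7298217714518782, '
    '"sum": 0.7298217714518782, "min": 0.7298217714518782}}, "EndTime": 1596474491.70199, '
    '"Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", '
    '"Algorithm": "Linear Learner", "epoch": 5}, "StartTime": 1596474491.701927}',
    '#metrics {"Metric',  # truncated line, ignored by the parser
]

rows = [row for line in sample_log for row in parse_metric_line(line)]
for epoch, model, name, value in rows:
    print('epoch={} model={} {}={:.6f}'.format(epoch, model, name, value))
```

For model 0, the two sample records show `train_mse_objective` falling from about 0.7360 at epoch 1 to about 0.7298 at epoch 5.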


## Create endpoint
After training produces the model, we create an endpoint that serves it, specifying the endpoint name. The result is an endpoint that can be validated and incorporated into production applications. This takes 9-11 minutes to complete.

In [9]:
linear_predictor = linear.deploy(initial_instance_count=1,
                                 instance_type='ml.m5.xlarge',
                                 endpoint_name='linearendpoint')



-------------!
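
The `deploy` call above waits for the endpoint itself, printing the progress dashes shown. If you need to check on the endpoint from a separate process, a polling loop is one option. A minimal sketch, assuming a hypothetical helper `wait_for_endpoint` that takes any zero-argument status function; the status strings `Creating`, `InService`, and `Failed` follow the `EndpointStatus` field returned by the boto3 SageMaker client's `describe_endpoint`, and a fake status source stands in for the real call here:

```python
import time


def wait_for_endpoint(get_status, poll_seconds=30, timeout_seconds=1800, sleep=time.sleep):
    """Poll get_status() until it returns 'InService'; raise on failure or timeout."""
    waited = 0
    while True:
        status = get_status()
        if status == 'InService':
            return status
        if status == 'Failed':
            raise RuntimeError('endpoint creation failed')
        if waited >= timeout_seconds:
            raise TimeoutError('endpoint not in service after {} seconds'.format(waited))
        sleep(poll_seconds)
        waited += poll_seconds


# Stand-in for a real status call, so the sketch runs without AWS access.
fake_statuses = iter(['Creating', 'Creating', 'InService'])
final_status = wait_for_endpoint(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(final_status)
```

Against a real account, `get_status` could be `lambda: boto3.client('sagemaker').describe_endpoint(EndpointName='linearendpoint')['EndpointStatus']`.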

## Validate the model
Finally, we validate the model. We use the endpoint created above to generate predictions from the trained model.

In [14]:
from sagemaker.predictor import csv_serializer, json_deserializer
import pandas as pd

linear_predictor.content_type = 'text/csv'
linear_predictor.serializer = csv_serializer
linear_predictor.deserializer = json_deserializer

# a single test row of 11 feature values, in CSV form (matches feature_dim=11)
test = '1.0,142424.29207611087,40.86764526367188,39.803932189941406,40.750870408043156,221935.611618042,41.00096130371094,40.58504867553711,40.75204032648586,5446.0,3495.0'

result = linear_predictor.predict(test)
print(result)

{'predictions': [{'score': 783.171875}]}
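
The model was trained with `feature_dim=11`, so each CSV row sent to the endpoint should carry exactly 11 numeric values, and the response nests each score under `predictions`. A minimal sketch of both checks, assuming two hypothetical helpers (`check_payload` and `extract_scores`) and reusing the test row and response shown above:

```python
FEATURE_DIM = 11  # same value passed to set_hyperparameters(feature_dim=...)


def check_payload(csv_row, expected=FEATURE_DIM):
    """Parse one CSV row into floats and verify the feature count."""
    values = [float(v) for v in csv_row.split(',')]
    if len(values) != expected:
        raise ValueError('expected {} features, got {}'.format(expected, len(values)))
    return values


def extract_scores(response):
    """Pull the numeric scores out of a response shaped like the one above."""
    return [prediction['score'] for prediction in response['predictions']]


row = ('1.0,142424.29207611087,40.86764526367188,39.803932189941406,'
       '40.750870408043156,221935.611618042,41.00096130371094,'
       '40.58504867553711,40.75204032648586,5446.0,3495.0')
features = check_payload(row)
scores = extract_scores({'predictions': [{'score': 783.171875}]})
print(len(features), scores)
```

Checking the row locally gives a clearer error than a failed endpoint invocation when a column is missing.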


## Delete Endpoint
Once you are done using the endpoint, you can use the following to delete it.

In [16]:
linear_predictor.delete_endpoint()
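
Because a running endpoint is billed until it is deleted, it can help to tie the cleanup to the prediction step. A minimal sketch, assuming a hypothetical wrapper `predict_then_cleanup` that relies only on the `predict` and `delete_endpoint` methods used above; `FakePredictor` is a stand-in so the example runs without a live endpoint:

```python
def predict_then_cleanup(predictor, payloads):
    """Run predictions, then delete the endpoint even if a prediction raises."""
    try:
        return [predictor.predict(payload) for payload in payloads]
    finally:
        predictor.delete_endpoint()


class FakePredictor:
    """Stand-in exposing the same two methods as the predictor used above."""

    def __init__(self):
        self.deleted = False

    def predict(self, payload):
        # Echo back the number of comma-separated values instead of a real score.
        return {'predictions': [{'score': float(len(payload.split(',')))}]}

    def delete_endpoint(self):
        self.deleted = True


fake = FakePredictor()
results = predict_then_cleanup(fake, ['1.0,2.0,3.0'])
print(results, fake.deleted)
```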