<br />

<div style="text-align: center;">
<font size="7">Training Built-in Machine Learning Models</font>
<br /> 
<font size="5">Simple Linear Model</font>
    
</div>
<br />

<div style="text-align: right;">
<font size="4">2020/11/11</font>
<br />
<font size="4">Ryutaro Hashimoto</font>
</div>

___

# Summary
- We use a machine learning model that is built into SageMaker (Linear Learner).
- The sample data can be created by running `create_sample_dataset.ipynb`.

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Define-Training-Job" data-toc-modified-id="Define-Training-Job-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Define Training Job</a></span><ul class="toc-item"><li><span><a href="#Get-the-container-image-to-use." data-toc-modified-id="Get-the-container-image-to-use.-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Get the container image to use.</a></span></li><li><span><a href="#Define-learning-jobs." data-toc-modified-id="Define-learning-jobs.-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Define learning jobs.</a></span></li><li><span><a href="#Define-data-input-and-output" data-toc-modified-id="Define-data-input-and-output-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Define data input and output</a></span></li></ul></li><li><span><a href="#Execute-Training-Job" data-toc-modified-id="Execute-Training-Job-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Execute Training Job</a></span></li><li><span><a href="#Create-endpoints-and-predict-with-trained-models" data-toc-modified-id="Create-endpoints-and-predict-with-trained-models-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Create endpoints and predict with trained models</a></span><ul class="toc-item"><li><span><a href="#Launch-endpoint" data-toc-modified-id="Launch-endpoint-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Launch endpoint</a></span></li><li><span><a href="#Predict-a-sample" data-toc-modified-id="Predict-a-sample-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Predict a sample</a></span><ul class="toc-item"><li><span><a href="#Pattern-1:-Input-sample-as-a-string" data-toc-modified-id="Pattern-1:-Input-sample-as-a-string-3.2.1"><span class="toc-item-num">3.2.1&nbsp;&nbsp;</span>Pattern 1: Input sample as a string</a></span></li><li><span><a href="#Pattern-2:-Input-sample-as-a-string-(another-way-of-writing)" data-toc-modified-id="Pattern-2:-Input-sample-as-a-string-(another-way-of-writing)-3.2.2"><span 
class="toc-item-num">3.2.2&nbsp;&nbsp;</span>Pattern 2: Input sample as a string (another way of writing)</a></span></li><li><span><a href="#Pattern-3-Entering-multiple-samples-as-strings" data-toc-modified-id="Pattern-3-Entering-multiple-samples-as-strings-3.2.3"><span class="toc-item-num">3.2.3&nbsp;&nbsp;</span>Pattern 3 Entering multiple samples as strings</a></span></li></ul></li><li><span><a href="#Delete-endpoint." data-toc-modified-id="Delete-endpoint.-3.3"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>Delete endpoint.</a></span></li></ul></li></ul></div>

## Define Training Job

### Get the container image to use.

In [1]:
import boto3
import sagemaker
from sagemaker import image_uris

region = boto3.Session().region_name
container = image_uris.retrieve('linear-learner', region)

print(container)

174872318107.dkr.ecr.us-west-2.amazonaws.com/linear-learner:1
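
The printed value is an Amazon ECR image URI. As a minimal sketch (the helper name `parse_ecr_image_uri` is hypothetical and not part of the SageMaker SDK), the URI can be split into its parts to confirm that the image comes from the expected region and repository:

```python
def parse_ecr_image_uri(uri):
    """Split '<account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>' into parts."""
    registry, _, image = uri.partition('/')
    repository, _, tag = image.partition(':')
    host_parts = registry.split('.')
    return {
        'account': host_parts[0],
        'region': host_parts[3],
        'repository': repository,
        'tag': tag,
    }

# URI copied from the output above.
parts = parse_ecr_image_uri('174872318107.dkr.ecr.us-west-2.amazonaws.com/linear-learner:1')
print(parts)
```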


### Define learning jobs.

In [2]:
from sagemaker.estimator import Estimator

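role_ARN = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxx'    # ← your IAM role ARN

# Configure the training job: container image, permissions, instance, and output location.
ll_estimator = Estimator(
    container,
    role=role_ARN,
    instance_count=1,
    instance_type='ml.m5.large',
    output_path='<S3 path>',        # S3 location for the trained model artifact
    base_job_name='for-deploy',     # prefix for the generated training job name
)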

ll_estimator.set_hyperparameters(predictor_type='regressor', mini_batch_size=32)
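
To make the two hyperparameters concrete: `predictor_type='regressor'` asks for a model that outputs a continuous value, and `mini_batch_size=32` means each gradient update uses 32 samples. The following is only a pure-Python sketch of mini-batch gradient descent on synthetic one-feature data. It is not Linear Learner's actual implementation, and the data, learning rate, epoch count, and function name are all assumptions made for illustration.

```python
import random

def fit_linear_minibatch(xs, ys, mini_batch_size=32, epochs=500, lr=0.5, seed=0):
    """Fit y = w * x + b (approximately) with mini-batch gradient descent."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new order each epoch
        for start in range(0, len(indices), mini_batch_size):
            batch = indices[start:start + mini_batch_size]
            # Average gradient of 0.5 * (prediction - target)^2 over the batch.
            grad_w = sum((w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum((w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Synthetic, noise-free data (an assumption): y = 2x + 1 for 256 points in [0, 1).
xs = [i / 256 for i in range(256)]
ys = [2.0 * x + 1.0 for x in xs]
w, b = fit_linear_minibatch(xs, ys)
print(round(w, 3), round(b, 3))
```

With 256 samples and a batch size of 32, each epoch performs 8 parameter updates.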

### Define data input and output

In [3]:
# CSV training data in S3
training_data_channel = sagemaker.TrainingInput(
    s3_data='<S3 path>',
    content_type='text/csv')

# CSV validation data in S3
validation_data_channel = sagemaker.TrainingInput(
    s3_data='<S3 path>',
    content_type='text/csv')

# Channel names map to the inputs passed to fit()
ll_data = {'train': training_data_channel, 'validation': validation_data_channel}
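
Both channels point at CSV objects in S3. For CSV training input, SageMaker's built-in algorithms expect files without a header row and with the target value in the first column. A minimal sketch of producing that layout with the standard library follows. The rows are made-up values and `to_training_csv` is a hypothetical helper, not part of the sample-dataset notebook.

```python
import csv
import io

def to_training_csv(features, targets):
    """Serialize rows as header-less CSV with the target in the first column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    for row, target in zip(features, targets):
        writer.writerow([target, *row])
    return buffer.getvalue()

features = [[0.1, 2.0, 3.5], [0.4, 1.0, 2.5]]   # made-up feature rows
targets = [24.0, 21.6]                          # made-up target values
csv_text = to_training_csv(features, targets)
print(csv_text)
```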

## Execute Training Job

In [4]:
ll_estimator.fit(ll_data)

2021-02-05 06:08:54 Starting - Starting the training job...
2021-02-05 06:09:18 Starting - Launching requested ML instances
ProfilerReport-1612505333: InProgress
......
2021-02-05 06:10:21 Starting - Preparing the instances for training...
2021-02-05 06:11:00 Downloading - Downloading input data...
2021-02-05 06:11:23 Training - Downloading the training image...
2021-02-05 06:12:03 Uploading - Uploading generated training model
Docker entrypoint called with argument(s): train
Running default environment configuration script
[02/05/2021 06:11:55 INFO 140156597581632] Reading default configuration from /opt/amazon/lib/python2.7/site-packages/algorithm/resources/default-input.json: {u'loss_insensitivity': u'0.01', u'epochs': u'15', u'feature_dim': u'auto', u'init_bias': u'0.0', u'lr_scheduler_factor': u'auto', u'num_calibration_samples': u'10000000', u'accuracy_top_k': u'3', u'_num_kv_servers': u'auto', u'use_bias': u'true', u'num_point_for_scaler': u'10000', u'_log_l


2021-02-05 06:12:40 Completed - Training job completed
Training seconds: 72
Billable seconds: 72
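
The log reports 72 billable seconds, while the wall-clock time from the first to the last status line is longer, since it also covers phases such as launching and preparing the instances. A small sketch of that arithmetic, using the timestamps printed above:

```python
from datetime import datetime

fmt = '%Y-%m-%d %H:%M:%S'
started = datetime.strptime('2021-02-05 06:08:54', fmt)    # first status line
completed = datetime.strptime('2021-02-05 06:12:40', fmt)  # "Completed" status line

wall_clock_seconds = int((completed - started).total_seconds())
billable_seconds = 72  # from the "Billable seconds" line of the log

print(wall_clock_seconds)                     # 226
print(wall_clock_seconds - billable_seconds)  # 154 seconds outside the billable window
```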


## Create endpoints and predict with trained models

### Launch endpoint

In [5]:
from time import strftime, gmtime
timestamp = strftime('%d-%H-%M-%S', gmtime())

endpoint_name = 'linear-learner-demo-'+timestamp
print(endpoint_name)

ll_predictor = ll_estimator.deploy(endpoint_name=endpoint_name, 
                        initial_instance_count=1, 
                        instance_type='ml.t2.medium')

# ll_predictor.content_type = 'text/csv'
ll_predictor.serializer = sagemaker.serializers.CSVSerializer()
ll_predictor.deserializer = sagemaker.deserializers.CSVDeserializer()

linear-learner-demo-05-06-13-01
---------------------!
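
The timestamp format `'%d-%H-%M-%S'` encodes only the day of the month and the time, so the name printed above reads as day 05, 06:13:01 UTC. The sketch below rebuilds that name from a fixed time and checks it against the endpoint naming rule as I understand it (alphanumerics and hyphens, at most 63 characters). Treat the regular expression and both helper names as assumptions, and confirm the rule against the SageMaker API reference.

```python
import re
from datetime import datetime, timezone

# Assumed naming rule: alphanumerics separated by hyphens, at most 63 characters.
ENDPOINT_NAME_PATTERN = re.compile(r'^[a-zA-Z0-9](-*[a-zA-Z0-9])*$')

def make_endpoint_name(prefix, now):
    """Append a day-hour-minute-second suffix, as the notebook cell does."""
    return prefix + '-' + now.strftime('%d-%H-%M-%S')

def is_valid_endpoint_name(name):
    return len(name) <= 63 and ENDPOINT_NAME_PATTERN.match(name) is not None

fixed_time = datetime(2021, 2, 5, 6, 13, 1, tzinfo=timezone.utc)
name = make_endpoint_name('linear-learner-demo', fixed_time)
print(name)                          # linear-learner-demo-05-06-13-01
print(is_valid_endpoint_name(name))  # True
```

Because the suffix omits the year and month, two endpoints created at the same day-of-month and time in different months would get the same name. Adding `%Y%m` to the format is one way to avoid that.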

### Predict a sample
Send sample data to the endpoint and check the predicted values.

#### Pattern 1: Input sample as a string

In [6]:
test_sample = '0.00632,18.00,2.310,0,0.5380,6.5750,65.20,4.0900,1,296.0,15.30,4.98'
response = ll_predictor.predict(test_sample)
print(response)

[['29.4456710815']]
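
With `CSVDeserializer`, the response arrives as a list of rows whose cells are strings. A minimal sketch of converting that structure to floats, using the response printed above as a literal (the helper name is hypothetical):

```python
def predictions_to_floats(rows):
    """Flatten CSV-deserialized rows such as [['29.44']] into a list of floats."""
    return [float(cell) for row in rows for cell in row]

response = [['29.4456710815']]  # copied from the output above
scores = predictions_to_floats(response)
print(scores)  # [29.4456710815]
```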


#### Pattern 2: Input sample as a string (another way of writing)

In [7]:
# Low-level runtime client: sends the raw CSV string without the SDK's serializers.
runtime = boto3.Session().client(service_name='sagemaker-runtime')
response = runtime.invoke_endpoint(EndpointName=endpoint_name,
                                   ContentType='text/csv',
                                   Body=test_sample)

print(response['Body'].read())

b'{"predictions": [{"score": 29.44567108154297}]}'
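
Here the body is returned as JSON bytes rather than CSV rows. A short sketch of extracting the score with the standard library, using the bytes printed above as a literal:

```python
import json

# Copied from the output above.
raw_body = b'{"predictions": [{"score": 29.44567108154297}]}'

payload = json.loads(raw_body)  # json.loads accepts bytes on Python 3.6+
scores = [item['score'] for item in payload['predictions']]
print(scores)  # [29.44567108154297]
```

Note that `response['Body']` is a stream, so it is safest to call `read()` once and keep the result in a variable.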


#### Pattern 3 Entering multiple samples as strings

In [8]:
test_samples = ['0.00632,18.00,2.310,0,0.5380,6.5750,65.20,4.0900,1,296.0,15.30,4.98',
                '0.02731,0.00,7.070,0,0.4690,6.4210,78.90,4.9671,2,242.0,17.80,9.14']

response = ll_predictor.predict(test_samples)
print(response)

[['29.4456710815'], ['24.0120048523']]
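
For several samples, the CSV request body holds one line per sample. The sketch below builds such a body by hand and pairs each sample with its prediction from the output above. Joining with newlines reflects my understanding of what `CSVSerializer` produces for a list of strings, so treat it as an assumption rather than documented behavior.

```python
test_samples = [
    '0.00632,18.00,2.310,0,0.5380,6.5750,65.20,4.0900,1,296.0,15.30,4.98',
    '0.02731,0.00,7.070,0,0.4690,6.4210,78.90,4.9671,2,242.0,17.80,9.14',
]

# One CSV line per sample (assumed to match the serialized request body).
body = '\n'.join(test_samples)

response = [['29.4456710815'], ['24.0120048523']]  # copied from the output above
paired = []
for sample, row in zip(test_samples, response):
    n_features = len(sample.split(','))
    paired.append((n_features, float(row[0])))
print(paired)
```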


### Delete endpoint.
The endpoint incurs charges for as long as it is running.
Once you are done, delete it with the following code.

In [9]:
ll_predictor.delete_endpoint()
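
Because a forgotten endpoint keeps accruing cost, it can help to guarantee cleanup even when a prediction raises an error. The following is a sketch of that pattern with `contextlib`. `auto_delete` and `StubPredictor` are hypothetical names: the stub stands in for a deployed predictor so the example runs without AWS access, and in the notebook the real `ll_predictor` would take its place.

```python
from contextlib import contextmanager

@contextmanager
def auto_delete(predictor):
    """Yield the predictor and always call delete_endpoint() afterwards."""
    try:
        yield predictor
    finally:
        predictor.delete_endpoint()

class StubPredictor:
    """Hypothetical stand-in for a deployed predictor (makes no AWS calls)."""
    def __init__(self):
        self.deleted = False

    def predict(self, data):
        return [['0.0']]

    def delete_endpoint(self):
        self.deleted = True

stub = StubPredictor()
with auto_delete(stub) as predictor:
    predictor.predict('1,2,3')
print(stub.deleted)  # True
```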
