## Train, tune, and deploy a custom ML model using IoT Sensors Data Imputer and Classifier Algorithm from AWS Marketplace 


This solution is a deep learning-based trainable algorithm, capable of detecting anomalous behavior in temperature data from IoT sensors.



This sample notebook shows you how to train a custom ML model using IoT Sensors Data Imputer and Classifier from AWS Marketplace.

> **Note**: This is a reference notebook; it will not run until you make the changes suggested in the notebook.

#### Pre-requisites:
1. **Note**: This notebook contains elements that render correctly in the Jupyter interface. Open this notebook from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.
1. Ensure that the IAM role used has the **AmazonSageMakerFullAccess** policy attached.
1. Some hands-on experience using [Amazon SageMaker](https://aws.amazon.com/sagemaker/).
1. To use this algorithm successfully, ensure that:
    1. Either your IAM role has these three permissions and you have the authority to make AWS Marketplace subscriptions in the AWS account used: 
        1. **aws-marketplace:ViewSubscriptions**
        1. **aws-marketplace:Unsubscribe**
        1. **aws-marketplace:Subscribe**  
    2. or your AWS account has a subscription to Temperature IoT Data Anomaly Detection.
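
The three AWS Marketplace permissions listed above could be granted through an IAM policy. The following is a minimal sketch that only builds and prints a policy document; the statement ID and the wildcard `Resource` are assumptions, so adapt them to your account's conventions before attaching anything to a role.

```python
import json

# Actions taken from the pre-requisites list above.
MARKETPLACE_ACTIONS = [
    "aws-marketplace:ViewSubscriptions",
    "aws-marketplace:Unsubscribe",
    "aws-marketplace:Subscribe",
]


def build_marketplace_policy(actions=MARKETPLACE_ACTIONS):
    """Return an IAM policy document (as a dict) that allows the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "MarketplaceSubscriptions",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": "*",  # assumption: no resource-level restriction
            }
        ],
    }


policy_json = json.dumps(build_marketplace_policy(), indent=2)
print(policy_json)
```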

#### Contents:
1. [Subscribe to the algorithm](#1.-Subscribe-to-the-algorithm)
1. [Prepare dataset](#2.-Prepare-dataset)
	1. [Dataset format expected by the algorithm](#A.-Dataset-format-expected-by-the-algorithm)
	1. [Configure and visualize train and test dataset](#B.-Configure-and-visualize-train-and-test-dataset)
	1. [Upload datasets to Amazon S3](#C.-Upload-datasets-to-Amazon-S3)
1. [Train a machine learning model](#3:-Train-a-machine-learning-model)
	1. [Set up environment](#3.1-Set-up-environment)
	1. [Train a model](#3.2-Train-a-model)
1. [Deploy model and verify results](#4:-Deploy-model-and-verify-results)
    1. [Deploy trained model](#A.-Deploy-trained-model)
    1. [Create input payload](#B.-Create-input-payload)
    1. [Perform real-time inference](#C.-Perform-real-time-inference)
    1. [Visualize output](#D.-Visualize-output)
    1. [Calculate relevant metrics](#E.-Calculate-relevant-metrics)
    1. [Delete the endpoint](#F.-Delete-the-endpoint)
1. [Tune your model! (optional)](#5:-Tune-your-model!-(optional))
	1. [Tuning Guidelines](#A.-Tuning-Guidelines)
	1. [Define Tuning configuration](#B.-Define-Tuning-configuration)
	1. [Run a model tuning job](#C.-Run-a-model-tuning-job)
1. [Perform Batch inference](#6.-Perform-Batch-inference)
1. [Clean-up](#7.-Clean-up)
	1. [Delete the model](#A.-Delete-the-model)
	1. [Unsubscribe to the listing (optional)](#B.-Unsubscribe-to-the-listing-(optional))


#### Usage instructions
You can run this notebook one cell at a time by pressing Shift+Enter.

### 1. Subscribe to the algorithm

To subscribe to the algorithm:
1. Open the algorithm listing page: IoT Sensors Data Imputer and Classifier.
1. On the AWS Marketplace listing, click the **Continue to subscribe** button.
1. On the **Subscribe to this software** page, review the EULA, pricing, and support terms, and click **"Accept Offer"** if you agree with them.
1. Once you click the **Continue to configuration** button and choose a **region**, you will see a **Product Arn**. This is the algorithm ARN that you need to specify when training a custom ML model. Copy the ARN corresponding to your region and specify it in the following cell.

In [49]:
algo_arn='arn:aws:sagemaker:us-east-2:786796469737:algorithm/mm-imputer-classifier'
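
Because the ARN is region-specific, it can help to confirm that its region matches the region your SageMaker session runs in. A minimal sketch, assuming the standard colon-separated ARN layout; the helper name `arn_region` is hypothetical, and in the notebook `session_region` would come from `sagemaker_session.boto_region_name` once the session has been created.

```python
def arn_region(arn):
    """Return the region field of an ARN (arn:partition:service:region:account:resource)."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"Not a valid ARN: {arn!r}")
    return parts[3]


algo_arn = "arn:aws:sagemaker:us-east-2:786796469737:algorithm/mm-imputer-classifier"
session_region = "us-east-2"  # assumption: replace with sagemaker_session.boto_region_name

if arn_region(algo_arn) != session_region:
    print("Warning: the algorithm ARN belongs to a different region than the session.")
else:
    print("The algorithm ARN matches the session region.")
```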

### 2. Prepare dataset

In [50]:
import base64
import json
import uuid
from sagemaker import ModelPackage
import sagemaker as sage
from sagemaker import get_execution_role
from urllib.parse import urlparse
import boto3
from IPython.display import Image
from PIL import Image as ImageEdit
import urllib.request
import numpy as np

#### A. Dataset format expected by the algorithm

For best results, the algorithm requires data in the format described below:

1. Supported content type: text/csv.

1. The solution takes only non-anomalous data as training input.

1. The input data should be in numerical format so that the algorithm can train on it and learn the patterns.

1. Try to incorporate as many patterns from non-anomalous data as possible to increase out-of-sample accuracy.
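
The numerical-format requirement above can be checked before uploading anything. The sketch below uses only the standard library and reports columns that contain non-numeric values; the helper name `non_numeric_columns` and the sample rows are made up for illustration, and empty cells are treated as missing rather than as invalid.

```python
import csv
import io


def non_numeric_columns(csv_text):
    """Return the names of columns holding at least one non-empty, non-numeric cell."""
    reader = csv.DictReader(io.StringIO(csv_text))
    bad = set()
    for row in reader:
        for name, value in row.items():
            # Skip missing or empty cells; they are gaps, not format errors.
            if not isinstance(value, str) or not value.strip():
                continue
            try:
                float(value)
            except ValueError:
                bad.add(name)
    return sorted(bad)


# Made-up sample: the "label" column contains text in its first row.
sample = "timestamp,mean,label\n1440354000.0,0.992302,ok\n1445468000.0,,0.5\n"
print(non_numeric_columns(sample))
```

In the notebook, the same helper could be applied to the contents of the CSV file before it is packaged for training.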

#### B. Configure and visualize train and test dataset

In [51]:
training_dataset='training/train.zip'

In [52]:
test_dataset='testing/test.csv'

In [53]:
import pandas as pd
df = pd.read_csv(test_dataset)
df.head()

Unnamed: 0,timestamp,raw_acc:magnitude_stats:mean,raw_acc:magnitude_stats:std,raw_acc:magnitude_stats:moment3,raw_acc:magnitude_stats:moment4,raw_acc:magnitude_stats:percentile25,raw_acc:magnitude_stats:percentile50,raw_acc:magnitude_stats:percentile75,raw_acc:magnitude_stats:value_entropy,raw_acc:magnitude_stats:time_entropy,...,lf_measurements:screen_brightness,lf_measurements:temperature_ambient,discrete:time_of_day:between0and6,discrete:time_of_day:between3and9,discrete:time_of_day:between6and12,discrete:time_of_day:between9and15,discrete:time_of_day:between12and18,discrete:time_of_day:between15and21,discrete:time_of_day:between18and24,discrete:time_of_day:between21and3
0,1440354000.0,0.992302,0.001369,-0.00042,0.001798,0.991377,0.992315,0.993209,2.63183,6.684611,...,0.049798,,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0
1,1445468000.0,0.998944,0.001488,0.00036,0.002002,0.998017,0.998954,0.999922,2.469889,6.684611,...,0.465863,,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0
2,1445864000.0,0.991375,0.001112,0.000394,0.001448,0.99064,0.99137,0.992086,2.571172,6.684611,...,1.5e-05,,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
3,1439378000.0,0.993948,0.00089,0.000483,0.001172,0.993355,0.993938,0.994578,2.545904,6.684611,...,0.194424,,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
4,1446885000.0,0.991202,0.004023,0.004831,0.005937,0.988957,0.989741,0.990925,2.228749,6.684604,...,0.29487,,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0


#### C. Upload datasets to Amazon S3

In [54]:
sagemaker_session = sage.Session()
bucket=sagemaker_session.default_bucket()
bucket

'sagemaker-us-east-2-786796469737'

In [55]:
training_data=sagemaker_session.upload_data(training_dataset, bucket=bucket, key_prefix='iot-sensor-data-imputer')
test_data=sagemaker_session.upload_data(test_dataset, bucket=bucket, key_prefix='iot-sensor-data-imputer')

In [56]:
print("Training input uploaded to " + training_data)

Training input uploaded to s3://sagemaker-us-east-2-786796469737/iot-sensor-data-imputer/train.zip
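
The returned S3 URI can be split into its bucket and object key with `urlparse`, which the notebook already imports. A minimal sketch; the helper name `split_s3_uri` is hypothetical.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


bucket_name, key = split_s3_uri(
    "s3://sagemaker-us-east-2-786796469737/iot-sensor-data-imputer/train.zip"
)
print(bucket_name)
print(key)
```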


### 3: Train a machine learning model

Now that the dataset is available in an accessible Amazon S3 bucket, we are ready to train a machine learning model. 

### 3.1 Set up environment

In [57]:
role = get_execution_role()


In [58]:
output_location = 's3://{}/iot-sensor-data-imputer/{}'.format(bucket, 'output')

### 3.2 Train a model

You can find more information about hyperparameters in the **Hyperparameters** section of the Temperature IoT Data Anomaly Detection listing.

In [59]:
#Define hyperparameters
hyperparameters={"epochs":1,"batch_size":10}
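
To get a feel for how `epochs` and `batch_size` affect training length, the sketch below counts parameter updates under the usual mini-batch convention (one update per batch, with the last partial batch kept). Whether this algorithm follows that convention is an assumption, and `num_rows=1000` is a made-up dataset size.

```python
import math


def total_training_steps(num_rows, epochs, batch_size):
    """Number of mini-batch updates, assuming the last partial batch is kept."""
    if num_rows < 0 or epochs < 1 or batch_size < 1:
        raise ValueError("num_rows must be >= 0; epochs and batch_size must be >= 1")
    return epochs * math.ceil(num_rows / batch_size)


hyperparameters = {"epochs": 1, "batch_size": 10}
steps = total_training_steps(num_rows=1000, **hyperparameters)
print(steps)  # 100
```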

For information on creating an `Estimator` object, see [documentation](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html)

In [60]:
#Create an estimator object for running a training job
estimator = sage.algorithm.AlgorithmEstimator(
    algorithm_arn=algo_arn,
    base_job_name="iot-sensor-data-imputer-training",
    role=role,
    input_mode="File",
    output_path=output_location,
    sagemaker_session=sagemaker_session,
    hyperparameters=hyperparameters,
    # sagemaker>=2 uses instance_count/instance_type; the train_* names are deprecated
    instance_count=1,
    instance_type='ml.m5.large'
)
#Run the training job.
estimator.fit({"training": training_data})

2022-04-25 12:13:39 Starting - Starting the training job...
2022-04-25 12:14:02 Starting - Preparing the instances for training
ProfilerReport-1650888819: InProgress
......
2022-04-25 12:15:04 Downloading - Downloading input data...
2022-04-25 12:15:35 Training - Downloading the training image...
2022-04-25 12:16:03 Training - Training image download completed. Training in progress.
2022-04-25 12:15:58.950066: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-04-25 12:15:58.950127: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Starting the training.
Reading file
File read
2022-04-25 12:16:02.834361: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.

See this [blog-post](https://aws.amazon.com/blogs/machine-learning/easily-monitor-and-visualize-metrics-while-training-models-on-amazon-sagemaker/) for more information on how to visualize metrics during training. You can also open the training job from the [Amazon SageMaker console](https://console.aws.amazon.com/sagemaker/home?#/jobs/) and monitor the metrics and logs in the **Monitor** section.
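
Final metric values can also be read programmatically from the response of the SageMaker `describe_training_job` API, which reports them under `FinalMetricDataList`. The sketch below only parses such a response; the helper name `final_metrics`, the metric name `loss`, and its value are made up, since the metrics this algorithm emits are not listed here.

```python
def final_metrics(describe_response):
    """Map metric names to their final values from a describe_training_job response."""
    return {
        metric["MetricName"]: metric["Value"]
        for metric in describe_response.get("FinalMetricDataList", [])
    }


# In the notebook, the response would come from the API, for example:
#   import boto3
#   response = boto3.client("sagemaker").describe_training_job(
#       TrainingJobName=estimator.latest_training_job.name
#   )
# Made-up response used here so the sketch runs without AWS access.
sample_response = {
    "TrainingJobStatus": "Completed",
    "FinalMetricDataList": [{"MetricName": "loss", "Value": 0.12}],
}
print(final_metrics(sample_response))
```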

### 4: Deploy model and verify results

Now you can deploy the model for performing real-time inference.

In [61]:
model_name='iot-sensor-data-imputer-inference'

content_type='text/csv'

real_time_inference_instance_type='ml.m5.large'
batch_transform_inference_instance_type='ml.m5.large'

#### A. Deploy trained model

In [62]:
from sagemaker.serializers import CSVSerializer
predictor = estimator.deploy(1, real_time_inference_instance_type, serializer=CSVSerializer())

..........
-----!

Once the endpoint is created, you can perform real-time inference.

#### B. Create input payload

In [63]:
df = pd.read_csv("inference.csv")

In [64]:
df

Unnamed: 0,timestamp,raw_acc:magnitude_stats:mean,raw_acc:magnitude_stats:std,raw_acc:magnitude_stats:moment3,raw_acc:magnitude_stats:moment4,raw_acc:magnitude_stats:percentile25,raw_acc:magnitude_stats:percentile50,raw_acc:magnitude_stats:percentile75,raw_acc:magnitude_stats:value_entropy,raw_acc:magnitude_stats:time_entropy,...,lf_measurements:screen_brightness,lf_measurements:temperature_ambient,discrete:time_of_day:between0and6,discrete:time_of_day:between3and9,discrete:time_of_day:between6and12,discrete:time_of_day:between9and15,discrete:time_of_day:between12and18,discrete:time_of_day:between15and21,discrete:time_of_day:between18and24,discrete:time_of_day:between21and3
0,1440354248,0.992302,0.001369,-0.00042,0.001798,0.991377,0.992315,0.993209,2.63183,6.684611,...,0.049798,,0,0,1,1,0,0,0,0
1,1445467646,0.998944,0.001488,0.00036,0.002002,0.998017,0.998954,0.999922,2.469889,6.684611,...,0.465863,,0,0,0,0,1,1,0,0
2,1445864334,0.991375,0.001112,0.000394,0.001448,0.99064,0.99137,0.992086,2.571172,6.684611,...,1.5e-05,,1,1,0,0,0,0,0,0
3,1439378319,0.993948,0.00089,0.000483,0.001172,0.993355,0.993938,0.994578,2.545904,6.684611,...,0.194424,,1,1,0,0,0,0,0,0
4,1446885058,0.991202,0.004023,0.004831,0.005937,0.988957,0.989741,0.990925,2.228749,6.684604,...,0.29487,,1,0,0,0,0,0,0,1
5,1441921673,0.986758,0.06642,0.087904,0.152966,0.976837,0.984077,0.990767,1.161228,6.682425,...,0.332769,,0,0,0,1,1,0,0,0
6,1441843141,0.995229,0.001026,0.000347,0.001354,0.994542,0.995228,0.995885,2.501182,6.684611,...,0.299112,,0,0,0,0,1,1,0,0
7,1445485451,1.040453,0.281407,0.327393,0.492929,0.965083,0.99539,1.068469,1.840273,6.650234,...,,,0,0,0,0,0,1,1,0
8,1449704568,0.976557,0.001868,0.001895,0.003957,0.975586,0.97656,0.977462,1.477629,6.68461,...,0.332751,,0,0,0,0,1,1,0,0
9,1449300425,1.014876,0.002588,-0.001385,0.003397,1.013108,1.015021,1.016713,2.625336,6.684608,...,0.691706,,0,0,0,0,0,0,1,1


#### C. Perform real-time inference

In [65]:
file_name = "inference.csv"
output_file_name = "inference_out.csv"

In [66]:
!aws sagemaker-runtime invoke-endpoint \
    --endpoint-name $predictor.endpoint_name \
    --body fileb://$file_name \
    --content-type $content_type \
    --region $sagemaker_session.boto_region_name \
    $output_file_name


{
    "ContentType": "text/csv; charset=utf-8",
    "InvokedProductionVariant": "AllTraffic"
}
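
The same request could be sent from Python instead of the AWS CLI. A minimal sketch, assuming a boto3 `sagemaker-runtime` client is passed in; the helper name `invoke_csv_endpoint` is hypothetical, while `invoke_endpoint` and its `EndpointName`, `ContentType`, and `Body` parameters are the regular boto3 call.

```python
def invoke_csv_endpoint(runtime_client, endpoint_name, csv_bytes, content_type="text/csv"):
    """Send CSV bytes to a SageMaker endpoint and return the response body as text."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType=content_type,
        Body=csv_bytes,
    )
    # The response body is a stream; read it fully and decode it.
    return response["Body"].read().decode("utf-8")


# In the notebook (requires AWS credentials and a running endpoint):
#   import boto3
#   runtime = boto3.client("sagemaker-runtime",
#                          region_name=sagemaker_session.boto_region_name)
#   with open(file_name, "rb") as f:
#       output_text = invoke_csv_endpoint(runtime, predictor.endpoint_name, f.read())
#   with open(output_file_name, "w") as f:
#       f.write(output_text)
```

Passing the client in as an argument keeps the helper easy to exercise with a stub object when no endpoint is available.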


#### D. Visualize output

In [67]:
result = pd.read_csv("inference_out.csv", header=None)
result

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,200,201,202,203,204,205,206,207,208,209
0,raw_acc:magnitude_stats:mean,raw_acc:magnitude_stats:std,raw_acc:magnitude_stats:moment3,raw_acc:magnitude_stats:moment4,raw_acc:magnitude_stats:percentile25,raw_acc:magnitude_stats:percentile50,raw_acc:magnitude_stats:percentile75,raw_acc:magnitude_stats:value_entropy,raw_acc:magnitude_stats:time_entropy,raw_acc:magnitude_spectrum:log_energy_band0,...,lf_measurements:screen_brightness,discrete:time_of_day:between0and6,discrete:time_of_day:between3and9,discrete:time_of_day:between6and12,discrete:time_of_day:between9and15,discrete:time_of_day:between12and18,discrete:time_of_day:between15and21,discrete:time_of_day:between18and24,discrete:time_of_day:between21and3,predicted_label
1,0.992302,0.001369,-0.00042,0.0017980000000000001,0.9913770000000001,0.992315,0.993209,2.63183,6.684611,5.043029,...,0.049798,0,0,1,1,0,0,0,0,Sitting
2,0.998944,0.0014880000000000002,0.00036,0.002002,0.998017,0.9989540000000001,0.999922,2.469889,6.684611,5.0430589999999995,...,0.465863,0,0,0,0,1,1,0,0,Lying down
3,0.991375,0.001112,0.000394,0.001448,0.99064,0.9913700000000001,0.992086,2.5711720000000002,6.684611,5.0433069999999995,...,1.4999999999999999e-05,1,1,0,0,0,0,0,0,Sitting
4,0.9939479999999999,0.0008900000000000001,0.00048300000000000003,0.001172,0.9933549999999999,0.9939379999999999,0.9945780000000001,2.545904,6.684611,5.043362999999999,...,0.194424,1,1,0,0,0,0,0,0,Lying down
5,0.991202,0.004023,0.004831,0.005937,0.988957,0.989741,0.990925,2.228749,6.684603999999999,5.040080000000001,...,0.29486999999999997,1,0,0,0,0,0,0,1,Lying down
6,0.986758,0.06642,0.087904,0.15296600000000002,0.9768370000000001,0.9840770000000001,0.990767,1.1612280000000001,6.682425,5.049084,...,0.332769,0,0,0,1,1,0,0,0,Lying down
7,0.9952290000000001,0.001026,0.00034700000000000003,0.001354,0.994542,0.9952280000000001,0.995885,2.501182,6.684611,5.043514,...,0.299112,0,0,0,0,1,1,0,0,Lying down
8,1.040453,0.281407,0.327393,0.492929,0.965083,0.99539,1.0684690000000001,1.840273,6.650233999999999,4.963621,...,-1.0,0,0,0,0,0,1,1,0,Lying down
9,0.976557,0.0018679999999999999,0.001895,0.0039570000000000004,0.9755860000000001,0.97656,0.977462,1.477629,6.68461,5.043351,...,0.332751,0,0,0,0,1,1,0,0,Lying down
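
As the table shows, the first line of the output file is a header row and the last column is `predicted_label`. A quick way to summarize the predictions is to count how often each label occurs. The sketch below uses only the standard library; the helper name `label_counts`, the column `f1`, and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def label_counts(csv_text, label_column="predicted_label"):
    """Count occurrences of each value in the label column of a CSV with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[label_column] for row in reader)


# Made-up sample in the same shape as the endpoint output (header row first).
sample = "f1,predicted_label\n0.99,Sitting\n0.98,Lying down\n0.97,Lying down\n"
counts = label_counts(sample)
print(counts.most_common())

# In the notebook:
#   with open(output_file_name) as f:
#       print(label_counts(f.read()).most_common())
```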


#### F. Delete the endpoint

Now that you have successfully performed a real-time inference, you no longer need the endpoint. You can delete it to avoid being charged.

In [68]:
predictor.delete_endpoint(delete_endpoint_config=True)

Since this is an experiment, you do not need to run a hyperparameter tuning job. However, if you would like to see how to tune a model trained using a third-party algorithm with Amazon SageMaker's hyperparameter tuning functionality, you can run the optional tuning step.

### 6. Perform Batch inference

In this section, you will perform batch inference using multiple input payloads together.

In [69]:
#upload the batch-transform job input files to S3
transform_input_folder = "inference.csv"
transform_input = sagemaker_session.upload_data(transform_input_folder, key_prefix=model_name) 
print("Transform input uploaded to " + transform_input)

Transform input uploaded to s3://sagemaker-us-east-2-786796469737/iot-sensor-data-imputer-inference/inference.csv


In [70]:
#Run the batch-transform job
transformer = estimator.transformer(1, batch_transform_inference_instance_type)
transformer.transform(transform_input, content_type=content_type)
transformer.wait()

..........
........................
Starting the inference server with 2 workers.
[2022-04-25 12:25:48 +0000] [10] [INFO] Starting gunicorn 20.1.0
[2022-04-25 12:25:48 +0000] [10] [INFO] Listening at: unix:/tmp/gunicorn.sock (10)
[2022-04-25 12:25:48 +0000] [10] [INFO] Using worker: gevent
[2022-04-25 12:25:48 +0000] [14] [INFO] Booting worker with pid: 14
[2022-04-25 12:25:48 +0000] [15] [INFO] Booting worker with pid: 15
2022-04-25 12:25:48.615287: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-04-25 12:25:48.615322: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-04-25 12:25:48.672224: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load d

In [71]:
#output is available on following path
transformer.output_path

's3://sagemaker-us-east-2-786796469737/iot-sensor-data-imputer-training-2022-04-25-12-21-58-871'
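
Batch transform normally writes one result object per input object, named after the input file with `.out` appended. The sketch below derives the expected result location from the output path and the input URI; the helper name `transform_output_uri` is hypothetical, and it is worth confirming the actual key in the S3 console.

```python
import posixpath


def transform_output_uri(output_path, input_uri):
    """Build the expected S3 URI of a batch-transform result for one input object."""
    filename = posixpath.basename(input_uri)
    return f"{output_path.rstrip('/')}/{filename}.out"


expected_output = transform_output_uri(
    "s3://sagemaker-us-east-2-786796469737/iot-sensor-data-imputer-training-2022-04-25-12-21-58-871",
    "s3://sagemaker-us-east-2-786796469737/iot-sensor-data-imputer-inference/inference.csv",
)
print(expected_output)
```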

### 7. Clean-up

#### A. Delete the model

In [72]:
transformer.delete_model()


#### B. Unsubscribe to the listing (optional)

If you would like to unsubscribe from the algorithm, follow these steps. Before you cancel the subscription, ensure that you do not have any [deployable model](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package or using the algorithm. Note: You can find this information by looking at the container name associated with the model. 

**Steps to unsubscribe from the product on AWS Marketplace**:
1. Navigate to the __Machine Learning__ tab on [__Your Software subscriptions page__](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=mlmp_gitdemo_indust)
2. Locate the listing whose subscription you want to cancel, and then choose __Cancel Subscription__.

