## Train, tune, and deploy a custom ML model using Temperature IoT Data Anomaly Detection Algorithm from AWS Marketplace 


This solution is a deep learning-based trainable algorithm, capable of detecting anomalous behavior in temperature data from IoT sensors.



This sample notebook shows you how to train a custom ML model using Temperature IoT Data Anomaly Detection from AWS Marketplace.

> **Note**: This is a reference notebook. It cannot run unless you make the changes suggested in the notebook.

#### Pre-requisites:
1. **Note**: This notebook contains elements that render correctly in the Jupyter interface. Open this notebook from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.
1. Ensure that the IAM role used has **AmazonSageMakerFullAccess**.
1. Some hands-on experience using [Amazon SageMaker](https://aws.amazon.com/sagemaker/).
1. To use this algorithm successfully, ensure that:
    1. Either your IAM role has these three permissions and you have authority to make AWS Marketplace subscriptions in the AWS account used:
        1. **aws-marketplace:ViewSubscriptions**
        1. **aws-marketplace:Unsubscribe**
        1. **aws-marketplace:Subscribe**
    1. Or your AWS account has a subscription to Temperature IoT Data Anomaly Detection.

#### Contents:
1. [Subscribe to the algorithm](#1.-Subscribe-to-the-algorithm)
1. [Prepare dataset](#2.-Prepare-dataset)
    1. [Dataset format expected by the algorithm](#A.-Dataset-format-expected-by-the-algorithm)
    1. [Configure and visualize train and test dataset](#B.-Configure-and-visualize-train-and-test-dataset)
    1. [Upload datasets to Amazon S3](#C.-Upload-datasets-to-Amazon-S3)
1. [Train a machine learning model](#3:-Train-a-machine-learning-model)
    1. [Set up environment](#3.1-Set-up-environment)
    1. [Train a model](#3.2-Train-a-model)
1. [Deploy model and verify results](#4:-Deploy-model-and-verify-results)
    1. [Deploy trained model](#A.-Deploy-trained-model)
    1. [Create input payload](#B.-Create-input-payload)
    1. [Perform real-time inference](#C.-Perform-real-time-inference)
    1. [Visualize output](#D.-Visualize-output)
    1. [Delete the endpoint](#F.-Delete-the-endpoint)
1. [Perform Batch inference](#5.-Perform-Batch-inference)
1. [Clean-up](#7.-Clean-up)
    1. [Delete the model](#A.-Delete-the-model)
    1. [Unsubscribe to the listing (optional)](#B.-Unsubscribe-to-the-listing-(optional))


#### Usage instructions
You can run this notebook one cell at a time (press Shift+Enter to run a cell).

### 1. Subscribe to the algorithm

To subscribe to the algorithm:
1. Open the algorithm listing page for Temperature IoT Data Anomaly Detection.
1. On the AWS Marketplace listing, choose **Continue to subscribe**.
1. On the **Subscribe to this software** page, review the EULA, pricing, and support terms, and choose **Accept Offer** if you agree.
1. Choose **Continue to configuration**, and then choose a **Region**. You will see a **Product Arn**. This is the algorithm ARN that you need to specify while training a custom ML model. Copy the ARN corresponding to your Region and specify it in the following cell.

In [1]:
# Replace this value with the Product ARN copied from the listing for your Region
algo_arn='temp-iot-anomaly'
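The value above is a bare name, which the SageMaker Python SDK resolves only for an algorithm that your own account owns. A quick local check can catch a missing or mis-pasted Marketplace ARN before training starts. The sketch below assumes the SageMaker algorithm ARN shape `arn:<partition>:sagemaker:<region>:<account-id>:algorithm/<name>`; `algorithm_arn_region` is a hypothetical helper, not part of the SDK.

```python
import re

# Assumed shape of a SageMaker algorithm ARN:
# arn:<partition>:sagemaker:<region>:<12-digit account id>:algorithm/<name>
_ALGORITHM_ARN = re.compile(
    r"^arn:aws[a-z-]*:sagemaker:([a-z0-9-]+):(\d{12}):algorithm/(.+)$"
)


def algorithm_arn_region(arn):
    """Return the Region embedded in an algorithm ARN, or None if `arn` does not look like one."""
    match = _ALGORITHM_ARN.match(arn)
    return match.group(1) if match else None


# Hypothetical ARNs, for illustration only
print(algorithm_arn_region("arn:aws:sagemaker:us-east-2:123456789012:algorithm/example-algo"))  # us-east-2
print(algorithm_arn_region("temp-iot-anomaly"))  # None
```

Once `sagemaker_session` exists (section 2.C), comparing `algorithm_arn_region(algo_arn)` with `sagemaker_session.boto_region_name` confirms that the copied ARN matches the Region the notebook runs in.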

### 2. Prepare dataset

In [2]:
import base64
import json 
import uuid
from sagemaker import ModelPackage
import sagemaker as sage
from sagemaker import get_execution_role
from urllib.parse import urlparse
import boto3
from IPython.display import Image
from PIL import Image as ImageEdit
import urllib.request
import numpy as np

#### A. Dataset format expected by the algorithm

For best results, the algorithm expects data in the following format:

1. Supported content type: `text/csv`.

1. The training data should contain only non-anomalous records.

1. All input data must be numeric.

1. Include as many patterns of normal (non-anomalous) behavior as possible to improve out-of-sample accuracy.
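These requirements can be checked locally before the data is uploaded. A minimal sketch, assuming that the `anomaly` column seen in the sample data marks anomalous rows with a non-zero value; the missing-value check is an extra precaution not stated above, and `check_training_frame` is a hypothetical helper.

```python
import pandas as pd


def check_training_frame(df, label_col="anomaly"):
    """Return a list of problems; an empty list means the frame passes the checks below."""
    problems = []
    # Every column must be numeric
    non_numeric = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
    if non_numeric:
        problems.append(f"non-numeric columns: {non_numeric}")
    # Training data should contain only normal rows (assumes non-zero label = anomalous)
    if label_col in df.columns:
        n_anomalous = int((df[label_col] != 0).sum())
        if n_anomalous:
            problems.append(f"{n_anomalous} row(s) labelled anomalous in '{label_col}'")
    # Extra precaution (assumption): reject missing values
    if df.isna().any().any():
        problems.append("missing values present")
    return problems
```

For example, `check_training_frame(pd.read_csv("train.csv"))` should return an empty list for data that meets these assumptions.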

#### B. Configure and visualize train and test dataset

In [3]:
training_dataset='train.csv'

In [4]:
test_dataset='test.csv'

In [5]:
import pandas as pd
df = pd.read_csv(training_dataset)
df.head()

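| Unnamed: 0 | value | anomaly | year | month | day | hour | minute | daylight | day_of_week | weekday | time_epoch |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 69.880835 | 0 | 2013 | 7 | 4 | 0 | 0 | 0 | 3 | 1 | 13728960 |
| 1 | 71.220227 | 0 | 2013 | 7 | 4 | 1 | 0 | 0 | 3 | 1 | 13728996 |
| 2 | 70.877805 | 0 | 2013 | 7 | 4 | 2 | 0 | 0 | 3 | 1 | 13729032 |
| 3 | 68.9594 | 0 | 2013 | 7 | 4 | 3 | 0 | 0 | 3 | 1 | 13729068 |
| 4 | 69.283551 | 0 | 2013 | 7 | 4 | 4 | 0 | 0 | 3 | 1 | 13729104 |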


#### C. Upload datasets to Amazon S3

In [6]:
sagemaker_session = sage.Session()
bucket=sagemaker_session.default_bucket()
bucket

'sagemaker-us-east-2-786796469737'

In [7]:
training_data=sagemaker_session.upload_data(training_dataset, bucket=bucket, key_prefix='temp-iot-anomaly')
test_data=sagemaker_session.upload_data(test_dataset, bucket=bucket, key_prefix='temp-iot-anomaly')

### 3: Train a machine learning model

Now that the dataset is available in an accessible Amazon S3 bucket, you are ready to train a machine learning model.

### 3.1 Set up environment

In [8]:
role = get_execution_role()


In [9]:
output_location = 's3://{}/temp-iot-anomaly/{}'.format(bucket, 'output')

### 3.2 Train a model

You can find more information about hyperparameters in the **Hyperparameters** section of the Temperature IoT Data Anomaly Detection listing.

In [10]:
#Define hyperparameters
hyperparameters={"epochs":1,"batch_size":10}

For information on creating an `Estimator` object, see the [documentation](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html).

In [12]:
# Create an estimator object for running a training job
estimator = sage.algorithm.AlgorithmEstimator(
    algorithm_arn=algo_arn,
    base_job_name="temp-iot-anomaly-training",
    role=role,
    instance_count=1,
    instance_type='ml.m5.large',
    input_mode="File",
    output_path=output_location,
    sagemaker_session=sagemaker_session,
    hyperparameters=hyperparameters,
)
# Run the training job on the "training" channel. A dict literal keeps only the
# last value for a repeated key, so every channel needs its own name; check the
# listing's channel specification before adding a channel for test data.
estimator.fit({"training": training_data})

2022-02-07 03:17:02 Starting - Starting the training job...
2022-02-07 03:17:18 Starting - Preparing the instances for training
ProfilerReport-1644203822: InProgress
......
2022-02-07 03:18:22 Downloading - Downloading input data...
2022-02-07 03:18:59 Training - Downloading the training image...
2022-02-07 03:19:19 Training - Training image download completed. Training in progress.
Starting the training.
Model saved
Training complete.
Training Successful

2022-02-07 03:20:00 Uploading - Uploading generated training model
2022-02-07 03:20:00 Completed - Training job completed
Training seconds: 82
Billable seconds: 82


See this [blog post](https://aws.amazon.com/blogs/machine-learning/easily-monitor-and-visualize-metrics-while-training-models-on-amazon-sagemaker/) for more information on how to visualize metrics during training. You can also open the training job from the [Amazon SageMaker console](https://console.aws.amazon.com/sagemaker/home?#/jobs/) and monitor the metrics and logs in the **Monitor** section.
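The same status information is also available programmatically. A minimal sketch using the boto3 SageMaker client's `describe_training_job` operation; `training_job_summary` is a hypothetical helper, and the client is passed in so the function can be exercised with a stub instead of a live AWS account.

```python
def training_job_summary(sm_client, job_name):
    """Return a few fields from DescribeTrainingJob as a plain dict."""
    desc = sm_client.describe_training_job(TrainingJobName=job_name)
    return {
        "status": desc["TrainingJobStatus"],
        # S3 URI of the model artifact produced by the job
        "model_artifacts": desc.get("ModelArtifacts", {}).get("S3ModelArtifacts"),
        "billable_seconds": desc.get("BillableTimeInSeconds"),
    }


# Usage in the notebook (requires AWS credentials):
# import boto3
# sm = boto3.client("sagemaker", region_name=sagemaker_session.boto_region_name)
# training_job_summary(sm, estimator.latest_training_job.name)
```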

### 4: Deploy model and verify results

Now you can deploy the model for performing real-time inference.

In [25]:
model_name='temp-iot-anomaly-inference'

content_type='text/csv'

real_time_inference_instance_type='ml.m5.large'
batch_transform_inference_instance_type='ml.m5.large'

#### A. Deploy trained model

In [28]:
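from sagemaker.serializers import CSVSerializer

# Deploy the trained model to a real-time endpoint that accepts CSV input
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type=real_time_inference_instance_type,
    serializer=CSVSerializer(),
)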

..........
-----!

Once the endpoint is created, you can perform real-time inference.

#### B. Create input payload

In [29]:
df = pd.read_csv("inference.csv")

In [30]:
df

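| Unnamed: 0 | value | anomaly | year | month | day | hour | minute | daylight | day_of_week | weekday | time_epoch |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 73.271959 | 0 | 2013 | 12 | 10 | 1 | 0 | 0 | 1 | 1 | 13866372 |
| 1 | 73.909588 | 0 | 2013 | 12 | 10 | 2 | 0 | 0 | 1 | 1 | 13866408 |
| 2 | 74.127059 | 0 | 2013 | 12 | 10 | 3 | 0 | 0 | 1 | 1 | 13866444 |
| 3 | 73.649890 | 0 | 2013 | 12 | 10 | 4 | 0 | 0 | 1 | 1 | 13866480 |
| 4 | 73.120198 | 0 | 2013 | 12 | 10 | 5 | 0 | 0 | 1 | 1 | 13866516 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3848 | 72.370206 | 0 | 2014 | 5 | 28 | 11 | 0 | 1 | 2 | 1 | 14012748 |
| 3849 | 72.172956 | 0 | 2014 | 5 | 28 | 12 | 0 | 1 | 2 | 1 | 14012784 |
| 3850 | 72.046565 | 0 | 2014 | 5 | 28 | 13 | 0 | 1 | 2 | 1 | 14012820 |
| 3851 | 71.825226 | 0 | 2014 | 5 | 28 | 14 | 0 | 1 | 2 | 1 | 14012856 |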


#### C. Perform real-time inference

In [34]:
file_name = "inference.csv"
output_file_name = "inference_out.csv"

In [35]:
!aws sagemaker-runtime invoke-endpoint \
    --endpoint-name $predictor.endpoint_name \
    --body fileb://$file_name \
    --content-type $content_type \
    --region $sagemaker_session.boto_region_name \
    $output_file_name


{
    "ContentType": "text/csv; charset=utf-8",
    "InvokedProductionVariant": "AllTraffic"
}
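The AWS CLI call above can also be made from Python with the boto3 `sagemaker-runtime` client's `invoke_endpoint` operation. A minimal sketch, where `invoke_csv_endpoint` is a hypothetical helper and the client is passed in so it can be replaced by a stub for local testing:

```python
def invoke_csv_endpoint(runtime_client, endpoint_name, csv_path, out_path, content_type="text/csv"):
    """Send a CSV file to a SageMaker endpoint, save the raw response body, and return its content type."""
    with open(csv_path, "rb") as f:
        payload = f.read()
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType=content_type,
        Body=payload,
    )
    # Body is a streaming object; read() returns the full response bytes
    body = response["Body"].read()
    with open(out_path, "wb") as f:
        f.write(body)
    return response["ContentType"]


# Usage in the notebook (requires a running endpoint):
# import boto3
# runtime = boto3.client("sagemaker-runtime", region_name=sagemaker_session.boto_region_name)
# invoke_csv_endpoint(runtime, predictor.endpoint_name, file_name, output_file_name, content_type)
```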


#### D. Visualize output

In [38]:
result = pd.read_csv("inference_out.csv", header=None)
result

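|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 73.271959 | Non-Anomalous | 2013 | 12 | 10 | 1 | 0 | 0 | 1 | 1 | 13866372 |
| 1 | 73.909588 | Non-Anomalous | 2013 | 12 | 10 | 2 | 0 | 0 | 1 | 1 | 13866408 |
| 2 | 74.127059 | Non-Anomalous | 2013 | 12 | 10 | 3 | 0 | 0 | 1 | 1 | 13866444 |
| 3 | 73.649890 | Non-Anomalous | 2013 | 12 | 10 | 4 | 0 | 0 | 1 | 1 | 13866480 |
| 4 | 73.120198 | Non-Anomalous | 2013 | 12 | 10 | 5 | 0 | 0 | 1 | 1 | 13866516 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3848 | 72.370206 | Non-Anomalous | 2014 | 5 | 28 | 11 | 0 | 1 | 2 | 1 | 14012748 |
| 3849 | 72.172956 | Non-Anomalous | 2014 | 5 | 28 | 12 | 0 | 1 | 2 | 1 | 14012784 |
| 3850 | 72.046565 | Non-Anomalous | 2014 | 5 | 28 | 13 | 0 | 1 | 2 | 1 | 14012820 |
| 3851 | 71.825226 | Non-Anomalous | 2014 | 5 | 28 | 14 | 0 | 1 | 2 | 1 | 14012856 |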
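Beyond inspecting the table, the predicted labels can be compared with the ground-truth `anomaly` column of `inference.csv`. A minimal sketch under three assumptions: column 1 of the headerless output holds the predicted label, as in the table above; any label other than `Non-Anomalous` marks a flagged row (only `Non-Anomalous` appears in the sample output, so the anomalous label string is not confirmed here); and `anomaly` uses 1 for anomalous rows and 0 otherwise. `confusion_counts` and `precision_recall` are hypothetical helpers.

```python
import pandas as pd

PRED_COL = 1                     # assumed position of the predicted label
NORMAL_LABEL = "Non-Anomalous"   # label observed in the output above


def confusion_counts(truth, predicted_labels):
    """Compare 0/1 ground-truth flags with predicted label strings, row by row."""
    if len(truth) != len(predicted_labels):
        raise ValueError("truth and predictions must have the same number of rows")
    actual = truth.to_numpy().astype(bool)
    flagged = predicted_labels.to_numpy() != NORMAL_LABEL
    return {
        "tp": int((actual & flagged).sum()),
        "fp": int((~actual & flagged).sum()),
        "fn": int((actual & ~flagged).sum()),
        "tn": int((~actual & ~flagged).sum()),
    }


def precision_recall(counts):
    """Return (precision, recall); a metric is None when its denominator is zero."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return precision, recall
```

With the notebook's variables, `confusion_counts(df["anomaly"], result[PRED_COL])` aligns the two frames by position, and `result[PRED_COL].value_counts()` gives a quick tally of the predicted labels.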


#### F. Delete the endpoint

Now that you have successfully performed real-time inference, you no longer need the endpoint. Delete it to avoid being charged.

In [39]:
predictor.delete_endpoint(delete_endpoint_config=True)

Since this is an experiment, you do not need to run a hyperparameter tuning job, and this notebook does not include a tuning step. To learn how to tune a model trained with a third-party algorithm, see the Amazon SageMaker documentation on automatic model tuning.

### 5. Perform Batch inference

In this section, you perform batch inference on all the records in `inference.csv` with a single batch transform job.

In [40]:
# Upload the batch transform input file to S3
transform_input_file = "inference.csv"
transform_input = sagemaker_session.upload_data(transform_input_file, key_prefix=model_name)
print("Transform input uploaded to " + transform_input)

Transform input uploaded to s3://sagemaker-us-east-2-786796469737/temp-iot-anomaly-inference/inference.csv


In [41]:
#Run the batch-transform job
transformer = estimator.transformer(1, batch_transform_inference_instance_type)
transformer.transform(transform_input, content_type=content_type)
transformer.wait()

..........
.........................
Starting the inference server with 2 workers.
[2022-02-07 03:38:32 +0000] [11] [INFO] Starting gunicorn 20.1.0
[2022-02-07 03:38:32 +0000] [11] [INFO] Listening at: unix:/tmp/gunicorn.sock (11)
[2022-02-07 03:38:32 +0000] [11] [INFO] Using worker: gevent
[2022-02-07 03:38:32 +0000] [15] [INFO] Booting worker with pid: 15
[2022-02-07 03:38:32 +0000] [16] [INFO] Booting worker with pid: 16
169.254.255.130 - - [07/Feb/2022:03:38:41 +0000] "GET /ping HTTP/1.1" 200 1 "-" "Go-http-client/1.1"
169.254.255.130 - - [07/Feb/2022:03:38:41 +0000] "GET /execution-parameters HTTP/1.1" 404 2 "-" "Go-http-client/1.1"
Invoked with 3853 records
169.254.255.130 - - [07/Feb/2022:03:38:42 +0000] "POST /invocations HTTP/1.1" 200 214164 "-" "Go-http-client/1.1"
2022-02-07T03:38:41.503:[sagemaker logs]: MaxConcurrentTransforms=1, MaxPayloadInMB=6, BatchStrategy=MULTI_RECORD



In [42]:
# The output is available at the following S3 path
transformer.output_path

's3://sagemaker-us-east-2-786796469737/temp-iot-anomaly-training-2022-02-07-03-34-35-642'
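Batch transform writes its results under `output_path`, one object per input object, named after the input file with an `.out` suffix. A minimal sketch for locating and loading that object, assuming the single `inference.csv` input used above; `batch_output_uri` and `parse_batch_output` are hypothetical helpers, and the download itself is left to `S3Downloader.read_file` from the SageMaker Python SDK.

```python
import io
from urllib.parse import urlparse

import pandas as pd


def batch_output_uri(output_path, input_uri):
    """Build the S3 URI of the '<input file name>.out' object for one input object."""
    input_name = urlparse(input_uri).path.rsplit("/", 1)[-1]
    return output_path.rstrip("/") + "/" + input_name + ".out"


def parse_batch_output(text):
    """Load headerless CSV output, like the real-time response above, into a DataFrame."""
    return pd.read_csv(io.StringIO(text), header=None)


# Usage in the notebook (requires AWS credentials):
# from sagemaker.s3 import S3Downloader
# uri = batch_output_uri(transformer.output_path, transform_input)
# batch_result = parse_batch_output(S3Downloader.read_file(uri, sagemaker_session=sagemaker_session))
```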

### 7. Clean-up

#### A. Delete the model

In [43]:
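# Delete the model that estimator.transformer() created for the batch transform job.
# The model behind the real-time endpoint is a separate SageMaker model; delete it
# from the Models page of the SageMaker console if you no longer need it.
sagemaker_session.delete_model(transformer.model_name)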


#### B. Unsubscribe to the listing (optional)

If you would like to unsubscribe from the algorithm, follow these steps. Before you cancel the subscription, ensure that you do not have any [deployable model](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package or using the algorithm. **Note**: You can find this information by looking at the container name associated with the model.

**Steps to unsubscribe from the product on AWS Marketplace**:
1. Navigate to the __Machine Learning__ tab on the [__Your Software subscriptions page__](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=mlmp_gitdemo_indust).
2. Locate the listing that you want to cancel the subscription for, and then choose __Cancel Subscription__.

