## Deploy Anomaly Detection in IoT Data Algorithm from AWS Marketplace 

Imbalanced data is a major challenge in anomaly detection: non-anomalous data is abundant, while anomalous data is scarce. This solution is designed for that imbalance. It takes a semi-supervised approach: a generative deep learning model learns normal IoT sensor patterns from non-anomalous data, and a 1-rule threshold model is then built using data from both classes to identify anomalous sensor behavior using the inclusion-exclusion principle. The solution can also be retrained to capture information drift.
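As a rough illustration of the "1-rule threshold" idea, the sketch below picks a single cut-off on an anomaly score (for example, a reconstruction error from a model trained only on normal data) using labelled scores from both classes. The function name `fit_threshold`, the balanced-accuracy criterion, the "Yes" label, and the sample scores are all assumptions made for illustration; the listing does not document how the algorithm actually chooses its threshold.

```python
def fit_threshold(normal_scores, anomalous_scores):
    """Return the cut-off that maximizes balanced accuracy.

    A score strictly greater than the cut-off is flagged as anomalous.
    Hypothetical helper: not the Marketplace algorithm's actual rule.
    """
    candidates = sorted(set(normal_scores) | set(anomalous_scores))
    best_threshold, best_score = None, -1.0
    for t in candidates:
        # True-negative rate: normal points at or below the cut-off.
        tnr = sum(s <= t for s in normal_scores) / len(normal_scores)
        # True-positive rate: anomalous points above the cut-off.
        tpr = sum(s > t for s in anomalous_scores) / len(anomalous_scores)
        balanced = (tnr + tpr) / 2
        if balanced > best_score:
            best_threshold, best_score = t, balanced
    return best_threshold


# Made-up scores: many normal readings, few anomalous ones.
normal = [0.010, 0.012, 0.015, 0.011, 0.020, 0.013]
anomalous = [0.080, 0.095]
threshold = fit_threshold(normal, anomalous)

# "Yes" as the positive label is an assumption; the sample output only shows "No".
flags = ["Yes" if s > threshold else "No" for s in [0.014, 0.090]]
```

Balanced accuracy is used here because it weights the rare anomalous class equally with the abundant normal class, which suits imbalanced data.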

This sample notebook shows you how to deploy the Anomaly Detection in IoT Data algorithm using Amazon SageMaker.

> **Note**: This is a reference notebook and it cannot run unless you make the changes suggested in the notebook.

#### Pre-requisites:
1. **Note**: This notebook contains elements that render correctly in the Jupyter interface. Open this notebook from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.
1. Ensure that the IAM role used has **AmazonSageMakerFullAccess**.
1. To deploy this ML model successfully, ensure that:
    1. Either your IAM role has these three permissions and you have authority to make AWS Marketplace subscriptions in the AWS account used: 
        1. **aws-marketplace:ViewSubscriptions**
        1. **aws-marketplace:Unsubscribe**
        1. **aws-marketplace:Subscribe**  
    2. or your AWS account has a subscription to Anomaly Detection in IoT Data.
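For reference, the three Marketplace permissions listed above could be expressed as an IAM policy document along these lines. This is a minimal sketch: the `"Resource": "*"` scope and the variable names are assumptions, and your organization may require a narrower policy.

```python
import json

# Minimal identity-based policy covering the three Marketplace actions
# listed above. "Resource": "*" is an assumption; scope it according to
# your organization's rules.
marketplace_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "aws-marketplace:ViewSubscriptions",
                "aws-marketplace:Unsubscribe",
                "aws-marketplace:Subscribe",
            ],
            "Resource": "*",
        }
    ],
}

policy_json = json.dumps(marketplace_policy, indent=2)
print(policy_json)
```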

#### Contents:
1. [Subscribe to the Algorithm](#1.-Subscribe-to-the-Algorithm)
2. [Prepare dataset](#2.-Prepare-dataset)
    1. [Dataset format expected by the algorithm](#A.-Dataset-format-expected-by-the-algorithm)
    2. [Configure and visualize train, validation and test dataset](#B.-Configure-and-visualize-train,-validation-and-test-dataset)
    3. [Upload datasets to Amazon S3](#C.-Upload-datasets-to-Amazon-S3)
3. [Train a machine learning model](#3.-Train-a-machine-learning-model)
    1. [Set up environment](#A.-Set-up-environment)
    2. [Train a model](#B.-Train-a-model)
4. [Deploy model and verify results](#4.-Deploy-model-and-verify-results)
    1. [Deploy trained model](#A.-Deploy-trained-model)
    2. [Create input payload](#B.-Create-input-payload)
    3. [Perform real-time inference](#C.-Perform-real-time-inference)
    4. [Visualize output](#D.-Visualize-output)
    5. [Delete the endpoint](#E.-Delete-the-endpoint)
5. [Perform Batch inference](#5.-Perform-Batch-inference)
    1. [Inspect the Batch Transform Output in S3](#A.-Inspect-the-Batch-Transform-Output-in-S3)
6. [Clean-up](#6.-Clean-up)
    1. [Delete the model](#A.-Delete-the-model)
    2. [Unsubscribe to the listing (optional)](#B.-Unsubscribe-to-the-listing-(optional))
    

#### Usage instructions
You can run this notebook one cell at a time by pressing Shift+Enter to run each cell.

### 1. Subscribe to the Algorithm

To subscribe to the Algorithm:
1. Open the algorithm listing page **Anomaly Detection in IoT Data**.
1. On the AWS Marketplace listing, click the **Continue to subscribe** button.
1. On the **Subscribe to this software** page, review the terms and click **"Accept Offer"** if you and your organization agree with the EULA, pricing, and support terms.
1. Once you click the **Continue to configuration** button and choose a **region**, you will see a **Product Arn** displayed. This is the algorithm ARN that you need to specify while creating a deployable model using Boto3. Copy the ARN corresponding to your region and specify it in the following cell.

In [1]:
algorithm_arn = 'arn:aws:sagemaker:us-east-2:786796469737:algorithm/anomaly-detection-in-iot-data'
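Because the ARN is region-specific, it can help to check that the region embedded in it matches the region your session runs in before training. The sketch below is one way to do that; `parse_arn` is a hypothetical helper, and `session_region` is hard-coded here for illustration (in the notebook you would use `sagemaker_session.boto_region_name`).

```python
def parse_arn(arn):
    """Split an ARN of the form arn:partition:service:region:account:resource."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"Not a valid ARN: {arn!r}")
    keys = ("prefix", "partition", "service", "region", "account", "resource")
    return dict(zip(keys, parts))


algorithm_arn = (
    "arn:aws:sagemaker:us-east-2:786796469737:"
    "algorithm/anomaly-detection-in-iot-data"
)
arn_fields = parse_arn(algorithm_arn)

# Assumed value for illustration; in the notebook, compare against
# sagemaker_session.boto_region_name instead.
session_region = "us-east-2"
if arn_fields["region"] != session_region:
    raise ValueError("Algorithm ARN region does not match the session region")
```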

### 2. Prepare dataset

In [2]:
import base64
import json
import uuid
import urllib.request
from urllib.parse import urlparse

import boto3
import numpy as np
import pandas as pd
import sagemaker as sage
from IPython.display import Image
from PIL import Image as ImageEdit
from sagemaker import ModelPackage, get_execution_role
from sagemaker.algorithm import AlgorithmEstimator

#### A. Dataset format expected by the algorithm

The deployed solution has **2 steps**: training the algorithm and testing.

1. The algorithm trains on a user-provided dataset.
1. The training dataset must contain a file named "train.csv" with 'utf-8' encoding.
1. The machine learning model is trained in the training step; once the model is generated, it can be used to make predictions on test data.
1. The testing API takes a CSV file, "test.csv", and predicts whether each sensor reading is anomalous or not.
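Before uploading, you may want to confirm that a file matches this format. The following is a minimal sketch of such a check; the layout it enforces (a timestamp index column followed by numeric sensor columns) is inferred from the sample data in this notebook, not from a published specification, and `check_sensor_csv` is a hypothetical helper.

```python
import csv
import io


def check_sensor_csv(raw_bytes):
    """Return (n_rows, sensor_columns) or raise ValueError.

    Assumed layout, inferred from the sample data: an index column of
    timestamps followed by numeric sensor columns.
    """
    try:
        text = raw_bytes.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError("File is not valid UTF-8") from exc

    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None or len(header) < 2:
        raise ValueError("Expected an index column plus at least one sensor column")

    n_rows = 0
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            raise ValueError(f"Line {line_no}: expected {len(header)} fields")
        for value in row[1:]:
            float(value)  # raises ValueError on non-numeric readings
        n_rows += 1
    return n_rows, header[1:]


sample = (
    b",Bearing 1,Bearing 2,Bearing 3,Bearing 4\n"
    b"2/15/2004 10:32,0.060959,0.074498,0.076289,0.043229\n"
    b"2/15/2004 10:42,0.060307,0.073227,0.075189,0.04328\n"
)
n_rows, sensors = check_sensor_csv(sample)
```

To check a real file, pass its bytes, for example `check_sensor_csv(open("data/training/train.csv", "rb").read())`.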

#### B. Configure and visualize train, validation and test dataset

In [3]:
training_dataset='data/training/train.csv'

In [4]:
train_input_df = pd.read_csv(training_dataset, index_col=0)
train_input_df.head()

Unnamed: 0,Bearing 1,Bearing 2,Bearing 3,Bearing 4
2/15/2004 10:32,0.060959,0.074498,0.076289,0.043229
2/15/2004 10:42,0.060307,0.073227,0.075189,0.04328
2/15/2004 10:52,0.061289,0.074106,0.076946,0.043786
2/15/2004 11:02,0.061656,0.07476,0.07607,0.043548
2/15/2004 11:12,0.060574,0.073734,0.078335,0.04499


In [5]:
test_dataset='data/transform/test.csv'

In [6]:
test_input_df = pd.read_csv(test_dataset, index_col=0)
test_input_df.head()

Unnamed: 0,Bearing 1,Bearing 2,Bearing 3,Bearing 4
2/15/2004 13:02,0.061667,0.074043,0.076449,0.044803
2/15/2004 13:12,0.059932,0.074051,0.075502,0.044021
2/15/2004 13:22,0.059952,0.073974,0.076825,0.04349
2/15/2004 13:32,0.060587,0.074444,0.077079,0.044251
2/15/2004 13:42,0.062083,0.074579,0.077464,0.043866


#### C. Upload datasets to Amazon S3

In [7]:
sagemaker_session = sage.Session()
bucket=sagemaker_session.default_bucket()

In [8]:
# training input location
common_prefix = "anomaly-detection-in-iot-data"
training_input_prefix = common_prefix + "/training-input-data"
TRAINING_WORKDIR = "data/training"
training_input = sagemaker_session.upload_data(TRAINING_WORKDIR, key_prefix=training_input_prefix)

In [9]:
TRANSFORM_WORKDIR = "data/transform"
batch_inference_input_prefix = common_prefix + "/batch-inference-input-data"
# data/transform contains test.csv, so point the transform input at that file
transform_input = sagemaker_session.upload_data(TRANSFORM_WORKDIR, key_prefix=batch_inference_input_prefix) + "/test.csv"
print("Transform input uploaded to " + transform_input)

Transform input uploaded to s3://sagemaker-us-east-2-786796469737/anomaly-detection-in-iot-data/batch-inference-input-data/test.csv
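When a whole directory is uploaded, it is worth knowing which objects end up under the prefix. The sketch below lists the S3 URIs you would expect, assuming each local file is placed at `key_prefix/<relative path>`, which is how directory uploads with `upload_data` appear to behave. `planned_s3_uris` and the bucket name are hypothetical and nothing here talks to AWS.

```python
import os
import tempfile


def planned_s3_uris(local_dir, bucket, key_prefix):
    """List the S3 URIs a recursive directory upload is expected to create."""
    uris = []
    for dirpath, _dirnames, filenames in os.walk(local_dir):
        rel_dir = os.path.relpath(dirpath, local_dir)
        rel_parts = [] if rel_dir == "." else rel_dir.split(os.sep)
        for name in filenames:
            key = "/".join([key_prefix, *rel_parts, name])
            uris.append(f"s3://{bucket}/{key}")
    return sorted(uris)


# Build a throwaway directory that mimics data/transform, including the
# hidden checkpoint folder that Jupyter creates.
with tempfile.TemporaryDirectory() as workdir:
    open(os.path.join(workdir, "test.csv"), "w").close()
    os.makedirs(os.path.join(workdir, ".ipynb_checkpoints"))
    open(os.path.join(workdir, ".ipynb_checkpoints", "test-checkpoint.csv"), "w").close()
    uris = planned_s3_uris(
        workdir,
        "example-bucket",  # hypothetical bucket name
        "anomaly-detection-in-iot-data/batch-inference-input-data",
    )

for uri in uris:
    print(uri)
```

Under that assumption, hidden folders such as `.ipynb_checkpoints` are uploaded as well, so you may prefer to remove them from the working directory first.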


### 3. Train a machine learning model

Now that the dataset is available in an accessible Amazon S3 bucket, we are ready to train a machine learning model.

#### A. Set up environment

In [10]:
role = get_execution_role()

#### B. Train a model

In [11]:
algo = AlgorithmEstimator(
    algorithm_arn=algorithm_arn,
    role=role,
    instance_count=1,
    instance_type='ml.c4.xlarge',
    base_job_name='anomaly-detection-in-iot-data-marketplace')

In [12]:
print("Now run the training job using algorithm arn %s in region %s" % (algorithm_arn, sagemaker_session.boto_region_name))
algo.fit({'training': training_input})

Now run the training job using algorithm arn arn:aws:sagemaker:us-east-2:786796469737:algorithm/anomaly-detection-in-iot-data in region us-east-2
2021-11-10 12:52:00 Starting - Starting the training job...
2021-11-10 12:52:23 Starting - Launching requested ML instancesProfilerReport-1636548720: InProgress
...
2021-11-10 12:52:52 Starting - Preparing the instances for training.........
2021-11-10 12:54:23 Downloading - Downloading input data...
2021-11-10 12:54:43 Training - Downloading the training image......
2021-11-10 12:55:56 Uploading - Uploading generated training model
2021-11-10 12:55:43.672775: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-11-10 12:55:43.672859: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Startin


2021-11-10 12:56:24 Completed - Training job completed
Training seconds: 101
Billable seconds: 101


### 4. Deploy model and verify results
Now you can deploy the model for performing real-time inference.

In [16]:
model_name='anomaly-detection-in-iot-data-1'

content_type='text/csv'

real_time_inference_instance_type='ml.c4.xlarge'
batch_transform_inference_instance_type='ml.c4.large'

#### A. Deploy trained model

In [17]:
# Deploy the model to a real-time endpoint
predictor = algo.deploy(1, real_time_inference_instance_type, endpoint_name=model_name)

...........
-----!

Once the endpoint is created, you can perform real-time inference.

#### B. Create input payload

In [18]:
file_name = 'data/transform/test.csv'

#### C. Perform real-time inference

In [19]:
!aws sagemaker-runtime invoke-endpoint \
    --endpoint-name 'anomaly-detection-in-iot-data-1' \
    --body fileb://$file_name \
    --content-type 'text/csv' \
    --region us-east-2 \
    "output.csv"

{
    "ContentType": "text/csv; charset=utf-8",
    "InvokedProductionVariant": "AllTraffic"
}
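The same request can be made from Python instead of the AWS CLI. The sketch below wraps the `invoke_endpoint` call of the boto3 `sagemaker-runtime` client; `invoke_csv_endpoint` is a hypothetical helper, and the client is passed in so that the function can be exercised without AWS credentials.

```python
def invoke_csv_endpoint(runtime_client, endpoint_name, csv_path, output_path):
    """Send a CSV file to a SageMaker endpoint and save the response body.

    runtime_client is expected to be boto3.client("sagemaker-runtime").
    """
    with open(csv_path, "rb") as f:
        payload = f.read()
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=payload,
    )
    # response["Body"] is a streaming object; read() returns the raw bytes.
    with open(output_path, "wb") as out:
        out.write(response["Body"].read())
    return response.get("ContentType")


# Usage inside the notebook (requires AWS credentials and a live endpoint):
#   import boto3
#   client = boto3.client("sagemaker-runtime", region_name="us-east-2")
#   invoke_csv_endpoint(client, model_name, file_name, "output.csv")
```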


#### D. Visualize output

In [20]:
output = pd.read_csv("output.csv")
output.head(10)

Unnamed: 0,Bearing 1,Bearing 2,Bearing 3,Bearing 4,Anomaly
0,0.061667,0.074043,0.076449,0.044803,No
1,0.059932,0.074051,0.075502,0.044021,No
2,0.059952,0.073974,0.076825,0.04349,No
3,0.060587,0.074444,0.077079,0.044251,No
4,0.062083,0.074579,0.077464,0.043866,No
5,0.060754,0.074319,0.077035,0.043082,No
6,0.062053,0.074256,0.077871,0.043901,No
7,0.062129,0.074085,0.077453,0.044017,No
8,0.061882,0.073684,0.075903,0.044326,No
9,0.06117,0.073761,0.074654,0.044231,No
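To see at a glance how many rows were flagged, you can tally the values in the `Anomaly` column. The sketch below does this with the standard library on a small inline sample copied from the output above; `anomaly_counts` is a hypothetical helper, and only the "No" label is known from the sample output.

```python
import csv
import io
from collections import Counter


def anomaly_counts(csv_text, label_column="Anomaly"):
    """Count how many rows carry each label in the given column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[label_column] for row in reader)


sample_output = (
    ",Bearing 1,Bearing 2,Bearing 3,Bearing 4,Anomaly\n"
    "0,0.061667,0.074043,0.076449,0.044803,No\n"
    "1,0.059932,0.074051,0.075502,0.044021,No\n"
)
counts = anomaly_counts(sample_output)
print(counts)
```

With the DataFrame already loaded in the notebook, `output["Anomaly"].value_counts()` gives the same tally.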


#### E. Delete the endpoint
Now that you have successfully performed a real-time inference, you do not need the endpoint any more. You can delete it to avoid being charged.

In [21]:
predictor.delete_endpoint(delete_endpoint_config=True)

### 5. Perform Batch inference
In this section, you will perform batch inference using multiple input payloads together. If you are not familiar with batch transform and want to learn more, see these links:
1. [How it works](https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-batch-transform.html)
2. [How to run a batch transform job](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-batch.html)

In [22]:
TRANSFORM_WORKDIR = "data/transform"
transform_input = sagemaker_session.upload_data(TRANSFORM_WORKDIR, key_prefix=batch_inference_input_prefix) + "/test.csv"
print("Transform input uploaded to " + transform_input)

Transform input uploaded to s3://sagemaker-us-east-2-786796469737/anomaly-detection-in-iot-data/batch-inference-input-data/test.csv


In [23]:
transformer = algo.transformer(1, 'ml.m4.xlarge')
transformer.transform(transform_input, content_type='text/csv')
transformer.wait()

print("Batch Transform output saved to " + transformer.output_path)

..........
................................
Starting the inference server with 4 workers.
[2021-11-10 13:13:33 +0000] [13] [INFO] Starting gunicorn 20.1.0
[2021-11-10 13:13:33 +0000] [13] [INFO] Listening at: unix:/tmp/gunicorn.sock (13)
[2021-11-10 13:13:33 +0000] [13] [INFO] Using worker: gevent
[2021-11-10 13:13:33 +0000] [17] [INFO] Booting worker with pid: 17
[2021-11-10 13:13:33 +0000] [18] [INFO] Booting worker with pid: 18
[2021-11-10 13:13:33 +0000] [22] [INFO] Booting worker with pid: 22
[2021-11-10 13:13:33 +0000] [26] [INFO] Booting worker with pid: 26
2021-11-10 13:13:34.712753: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-11-10 13:13:34.712820: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if yo

In [24]:
# The output is available at the following path
transformer.output_path

's3://sagemaker-us-east-2-786796469737/anomaly-detection-in-iot-data-marketpla-2021-11-10-13-08-25-535'

#### A. Inspect the Batch Transform Output in S3

In [25]:
from urllib.parse import urlparse

# Split the S3 output path into bucket and key prefix
parsed_url = urlparse(transformer.output_path)
bucket_name = parsed_url.netloc
file_key = '{}/{}.out'.format(parsed_url.path[1:], "test.csv")

s3_client = sagemaker_session.boto_session.client('s3')

# Read from the bucket named in the output path rather than assuming the default bucket
response = s3_client.get_object(Bucket=bucket_name, Key=file_key)

In [26]:
# Key prefix of the transform output (everything after the bucket name)
bucketFolder = urlparse(transformer.output_path).path.lstrip('/')

In [27]:
import boto3
s3_conn = boto3.client("s3")
bucket_name=bucket
with open('output.csv', 'wb') as f:
    s3_conn.download_fileobj(bucket_name, bucketFolder+'/' + "test.csv" +'.out', f)
    print("Output file loaded from bucket")

Output file loaded from bucket
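The bucket and object key used above both follow from the transform output path: the bucket is the host part of the S3 URI, and the result for each input file is stored under the same prefix with `.out` appended to the input file name. A small sketch of that derivation, using the output path shown earlier (`batch_output_location` is a hypothetical helper):

```python
from urllib.parse import urlparse


def batch_output_location(output_path, input_file_name):
    """Return (bucket, key) of the '.out' object for one transform input file."""
    parsed = urlparse(output_path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {output_path!r}")
    prefix = parsed.path.strip("/")
    key = f"{prefix}/{input_file_name}.out" if prefix else f"{input_file_name}.out"
    return parsed.netloc, key


bucket_name, file_key = batch_output_location(
    "s3://sagemaker-us-east-2-786796469737/"
    "anomaly-detection-in-iot-data-marketpla-2021-11-10-13-08-25-535",
    "test.csv",
)
print(bucket_name, file_key)
```

Parsing the URI this way also works when the output prefix contains more than one path segment.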


In [28]:
output = pd.read_csv('output.csv')

In [29]:
output.head(10)

Unnamed: 0,Bearing 1,Bearing 2,Bearing 3,Bearing 4,Anomaly
0,0.061667,0.074043,0.076449,0.044803,No
1,0.059932,0.074051,0.075502,0.044021,No
2,0.059952,0.073974,0.076825,0.04349,No
3,0.060587,0.074444,0.077079,0.044251,No
4,0.062083,0.074579,0.077464,0.043866,No
5,0.060754,0.074319,0.077035,0.043082,No
6,0.062053,0.074256,0.077871,0.043901,No
7,0.062129,0.074085,0.077453,0.044017,No
8,0.061882,0.073684,0.075903,0.044326,No
9,0.06117,0.073761,0.074654,0.044231,No


### 6. Clean-up

#### A. Delete the model

In [30]:
transformer.delete_model()

#### B. Unsubscribe to the listing (optional)
If you would like to unsubscribe from the algorithm, follow these steps. Before you cancel the subscription, ensure that you do not have any [deployable model](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package or using the algorithm. Note: you can find this information by looking at the container name associated with the model.

**Steps to unsubscribe from the product on AWS Marketplace**:
1. Navigate to the __Machine Learning__ tab on [__Your Software subscriptions page__](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=mlmp_gitdemo_indust)
2. Locate the listing whose subscription you want to cancel, and then choose __Cancel Subscription__.

In [31]:
!tar cvfz allfiles.tar.gz *

Anomaly Detection in IoT Data.ipynb
data/
data/training/
data/training/train.csv
data/training/.ipynb_checkpoints/
data/.ipynb_checkpoints/
data/transform/
data/transform/test.csv
data/transform/.ipynb_checkpoints/
output.csv
