## Train, tune, and deploy a custom ML model using Smartwatch Health Data Anomaly Detection Algorithm from AWS Marketplace 

This solution uses an unsupervised anomaly detection approach to identify suspect health metrics in a person using their smartwatch data.

This sample notebook shows you how to train a custom ML model using Smartwatch Health Data Anomaly Detection Algorithm from AWS Marketplace.

> **Note**: This is a reference notebook and it cannot run unless you make changes suggested in the notebook.

#### Pre-requisites:
1. **Note**: This notebook contains elements that render correctly in the Jupyter interface. Open this notebook from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.
1. Ensure that the IAM role used has the **AmazonSageMakerFullAccess** policy attached.
1. Some hands-on experience using [Amazon SageMaker](https://aws.amazon.com/sagemaker/).
1. To use this algorithm successfully, ensure that:
    1. Either your IAM role has these three permissions and you have authority to make AWS Marketplace subscriptions in the AWS account used: 
        1. **aws-marketplace:ViewSubscriptions**
        1. **aws-marketplace:Unsubscribe**
        1. **aws-marketplace:Subscribe**  
    2. or your AWS account has a subscription to Smartwatch Health Data Anomaly Detection.
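
As a rough illustration of the three AWS Marketplace permissions listed above, the sketch below builds an IAM policy document that allows them. The helper name `marketplace_policy_document` and the `"Resource": "*"` scope are assumptions made for this example; your organization may require a narrower policy.

```python
import json

# The three AWS Marketplace actions named in the prerequisites.
MARKETPLACE_ACTIONS = [
    "aws-marketplace:ViewSubscriptions",
    "aws-marketplace:Unsubscribe",
    "aws-marketplace:Subscribe",
]


def marketplace_policy_document(actions=MARKETPLACE_ACTIONS):
    """Return an IAM policy document (as a dict) allowing the given actions.

    Hypothetical helper; Resource "*" is an assumption for illustration.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": list(actions), "Resource": "*"}
        ],
    }


# Print the JSON so it can be reviewed before attaching it to a role.
print(json.dumps(marketplace_policy_document(), indent=2))
```

The printed JSON can be reviewed by an account administrator before it is attached to the notebook's execution role.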

#### Contents:
1. [Subscribe to the algorithm](#1.-Subscribe-to-the-algorithm)
1. [Prepare dataset](#2.-Prepare-dataset)
	1. [Dataset format expected by the algorithm](#A.-Dataset-format-expected-by-the-algorithm)
	1. [Configure and visualize train and test dataset](#B.-Configure-and-visualize-train-and-test-dataset)
	1. [Upload datasets to Amazon S3](#C.-Upload-datasets-to-Amazon-S3)
1. [Train a machine learning model](#3:-Train-a-machine-learning-model)
	1. [Set up environment](#3.1-Set-up-environment)
	1. [Train a model](#3.2-Train-a-model)
1. [Deploy model and verify results](#4:-Deploy-model-and-verify-results)
    1. [Deploy trained model](#A.-Deploy-trained-model)
    1. [Create input payload](#B.-Create-input-payload)
    1. [Perform real-time inference](#C.-Perform-real-time-inference)
    1. [Visualize output](#D.-Visualize-output)
    1. [Delete the endpoint](#F.-Delete-the-endpoint)
1. [Perform Batch inference](#5.-Perform-Batch-inference)
1. [Clean-up](#7.-Clean-up)
	1. [Delete the model](#A.-Delete-the-model)
	1. [Unsubscribe to the listing (optional)](#B.-Unsubscribe-to-the-listing-(optional))


#### Usage instructions
You can run this notebook one cell at a time (press Shift+Enter to run a cell).

### 1. Subscribe to the algorithm

To subscribe to the algorithm:
1. Open the algorithm listing page Smartwatch Health Data Anomaly Detection.
1. On the AWS Marketplace listing, click the **Continue to subscribe** button.
1. On the **Subscribe to this software** page, review the EULA, pricing, and support terms, and click **Accept Offer** if you agree with them.
1. Click the **Continue to configuration** button and choose a **region**. You will then see a **Product Arn**. This is the algorithm ARN that you need to specify when training a custom ML model. Copy the ARN corresponding to your region and specify it in the following cell.

In [1]:
algo_arn='arn:aws:sagemaker:us-east-2:786796469737:algorithm/healthcare-anomaly-detection'
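
Because the ARN is region-specific, it can help to check that the region embedded in the ARN matches the region you intend to train in. The sketch below is a minimal, offline check; `parse_algorithm_arn` is a hypothetical helper, not part of the SageMaker SDK.

```python
def parse_algorithm_arn(arn):
    """Split a SageMaker algorithm ARN into its components.

    Hypothetical helper: raises ValueError if the string does not look like
    arn:<partition>:sagemaker:<region>:<account>:algorithm/<name>.
    """
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sagemaker":
        raise ValueError(f"not a SageMaker ARN: {arn!r}")
    resource_type, _, name = parts[5].partition("/")
    if resource_type != "algorithm" or not name:
        raise ValueError(f"not an algorithm ARN: {arn!r}")
    return {
        "partition": parts[1],
        "region": parts[3],
        "account": parts[4],
        "name": name,
    }


example_arn = "arn:aws:sagemaker:us-east-2:786796469737:algorithm/healthcare-anomaly-detection"
info = parse_algorithm_arn(example_arn)
print(info["region"], info["name"])
```

In the notebook, `info["region"]` could be compared with `sagemaker_session.boto_region_name` once the session has been created.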

### 2. Prepare dataset

In [2]:
import base64
import json
import uuid
import urllib.request
from urllib.parse import urlparse

import boto3
import numpy as np
import sagemaker as sage
from IPython.display import Image
from PIL import Image as ImageEdit
from sagemaker import ModelPackage, get_execution_role

#### A. Dataset format expected by the algorithm

The algorithm expects data in the following format for best results:

- The input data should contain only numerical values. Columns of string data type are not supported.
- Null values should not be encoded as empty strings (""), because these are not detected during processing for this solution.
- Along with the training file, upload a text file named 'sensor_list.txt' that contains the sensor names in list format, e.g. ['sensor_1', 'sensor_2', 'sensor_3'].
- The sensor names in this list should also appear in the column names, to indicate which sensor each column belongs to.
- Zip the files into an archive named 'train.zip' and upload it for training.
- Keep null/missing values in the training file to a minimum, so that the model can learn as many patterns as possible from clean data. After training, the test data can contain a greater proportion of null values.
- The expected target values should be present in a column named 'label'.
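
The format rules above can be checked before uploading anything. The following is a minimal sketch using only the standard library: `check_training_csv` and `build_train_zip` are hypothetical helpers, the `ignore_columns` parameter is an assumption for columns such as identifiers that you plan to drop, and the CSV file name inside the archive is also an assumption, since the listing only names 'sensor_list.txt' and 'train.zip'.

```python
import csv
import io
import zipfile


def check_training_csv(csv_text, sensors, ignore_columns=()):
    """Return a list of human-readable problems found in the CSV text."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    # Every sensor name should appear in at least one column name.
    for sensor in sensors:
        if not any(sensor in col for col in columns):
            problems.append(f"no column name contains sensor '{sensor}'")
    # Data lines start at line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        for col, value in row.items():
            # Skip ignored columns and any surplus fields (key None).
            if col is None or col in ignore_columns:
                continue
            if value is None or value.strip() == "":
                problems.append(f"line {line_no}: empty value in '{col}'")
                continue
            try:
                float(value)
            except ValueError:
                problems.append(
                    f"line {line_no}: non-numeric value {value!r} in '{col}'"
                )
    return problems


def build_train_zip(csv_text, sensors, csv_name="train.csv"):
    """Return the bytes of a zip archive holding the CSV and sensor_list.txt.

    csv_name is an assumption; adjust it to whatever the algorithm expects.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(csv_name, csv_text)
        # Written in Python list syntax, e.g. ['sensor_1', 'sensor_2'].
        archive.writestr("sensor_list.txt", str(list(sensors)))
    return buffer.getvalue()


sample = "heartrate,steps,label\n56,0,0\n57,0,1\n"
print(check_training_csv(sample, ["heartrate", "steps"]))
```

An empty list means no problems were detected; the bytes returned by `build_train_zip` could then be written to a local 'train.zip' file and uploaded.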

#### B. Configure and visualize train and test dataset

In [3]:
training_dataset='training/train_hr_steps.csv'

In [4]:
test_dataset='testing/inf_data.csv'

In [5]:
import pandas as pd
df = pd.read_csv(test_dataset)
df.head()

Unnamed: 0,datetime,user,heartrate,steps
0,12-08-2020 00:19,AHYIJDV,56,0
1,12-08-2020 00:23,AHYIJDV,57,0
2,12-08-2020 00:25,AHYIJDV,59,0
3,12-08-2020 00:26,AHYIJDV,60,0
4,12-08-2020 00:27,AHYIJDV,59,0
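
The `datetime` column appears to be day-first (later rows in this file read `20-08-2020`), so it is worth parsing it explicitly instead of relying on automatic detection. The sketch below assumes the format `%d-%m-%Y %H:%M` and uses a hypothetical helper, `load_readings`, on a few rows copied from the sample file.

```python
import csv
import io
from datetime import datetime

# Assumption: timestamps are day-month-year, as suggested by "20-08-2020".
TIMESTAMP_FORMAT = "%d-%m-%Y %H:%M"


def load_readings(csv_text):
    """Parse rows into dicts with a real datetime and numeric sensor values."""
    readings = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        readings.append(
            {
                "datetime": datetime.strptime(row["datetime"], TIMESTAMP_FORMAT),
                "heartrate": float(row["heartrate"]),
                "steps": float(row["steps"]),
            }
        )
    return readings


sample = """datetime,user,heartrate,steps
12-08-2020 00:19,AHYIJDV,56,0
12-08-2020 00:23,AHYIJDV,57,0
20-08-2020 22:24,AHYIJDV,67,0
"""
readings = load_readings(sample)
span = readings[-1]["datetime"] - readings[0]["datetime"]
print(len(readings), "readings spanning", span.days, "full days")
```

With pandas, the same intent can be expressed by passing an explicit `format` to `pd.to_datetime`.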


#### C. Upload datasets to Amazon S3

In [6]:
sagemaker_session = sage.Session()
bucket=sagemaker_session.default_bucket()
bucket

'sagemaker-us-east-2-786796469737'

In [7]:
training_data=sagemaker_session.upload_data(training_dataset, bucket=bucket, key_prefix='smartwatch_health_data_anomaly_detection')
test_data=sagemaker_session.upload_data(test_dataset, bucket=bucket, key_prefix='smartwatch_health_data_anomaly_detection')

In [8]:
print("Training input uploaded to " + training_data)

Training input uploaded to s3://sagemaker-us-east-2-786796469737/smartwatch_health_data_anomaly_detection/train_hr_steps.csv
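
`upload_data` returns an `s3://` URI. If you need the bucket and key separately, for example to pass them to a boto3 call, the URI can be split with `urlparse`, which the notebook already imports. `split_s3_uri` is a hypothetical helper written for this example.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


bucket_name, key = split_s3_uri(
    "s3://sagemaker-us-east-2-786796469737/"
    "smartwatch_health_data_anomaly_detection/train_hr_steps.csv"
)
print(bucket_name)
print(key)
```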


### 3: Train a machine learning model

Now that the dataset is available in an accessible Amazon S3 bucket, we are ready to train a machine learning model.

### 3.1 Set up environment

In [9]:
role = get_execution_role()


In [10]:
output_location = 's3://{}/smartwatch_health_data_anomaly_detection/{}'.format(bucket, 'output')

### 3.2 Train a model

You can find more information about the hyperparameters in the **Hyperparameters** section of the Smartwatch Health Data Anomaly Detection algorithm listing.

In [11]:
#Define hyperparameters
hyperparameters={"outliers_fraction":0.07}
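
This notebook does not describe how the algorithm uses `outliers_fraction` internally. The sketch below assumes it acts like the contamination setting found in many unsupervised detectors, that is, roughly this share of points ends up labelled anomalous. `flag_top_fraction` and the toy scores are purely illustrative and are not the algorithm's implementation.

```python
def flag_top_fraction(scores, outliers_fraction):
    """Label the highest-scoring fraction of points as 'Anomalous'.

    Illustrative only: assumes higher score means more anomalous.
    """
    if not 0 < outliers_fraction < 1:
        raise ValueError("outliers_fraction must be between 0 and 1")
    n_outliers = round(len(scores) * outliers_fraction)
    # Indices ordered from the largest score to the smallest.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_outliers])
    return ["Anomalous" if i in flagged else "Normal" for i in range(len(scores))]


toy_scores = list(range(100))  # stand-in anomaly scores
labels = flag_top_fraction(toy_scores, 0.07)
print(labels.count("Anomalous"), "of", len(labels), "points flagged")
```

Under this assumption, raising the fraction flags more points and lowering it flags fewer; check the listing's **Hyperparameters** section for the authoritative definition.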

For information on creating an `Estimator` object, see the [documentation](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html).

In [12]:
# Create an estimator object for running a training job.
# sagemaker>=2 uses instance_count/instance_type
# (formerly train_instance_count/train_instance_type).
estimator = sage.algorithm.AlgorithmEstimator(
    algorithm_arn=algo_arn,
    base_job_name="smartwatch-health-data-anomaly-detection",
    role=role,
    instance_count=1,
    instance_type="ml.m5.large",
    input_mode="File",
    output_path=output_location,
    sagemaker_session=sagemaker_session,
    hyperparameters=hyperparameters,
)
# Run the training job on the "training" channel.
estimator.fit({"training": training_data})

2022-05-26 09:50:51 Starting - Starting the training job...
2022-05-26 09:51:14 Starting - Preparing the instances for training
ProfilerReport-1653558651: InProgress
......
2022-05-26 09:52:15 Downloading - Downloading input data...
2022-05-26 09:52:46 Training - Training image download completed. Training in progress..
Starting the training.
Model saved
Training complete.
Training Successful

2022-05-26 09:53:14 Uploading - Uploading generated training model
2022-05-26 09:53:14 Completed - Training job completed
Training seconds: 52
Billable seconds: 52


See this [blog post](https://aws.amazon.com/blogs/machine-learning/easily-monitor-and-visualize-metrics-while-training-models-on-amazon-sagemaker/) for more information on how to visualize metrics during training. You can also open the training job from the [Amazon SageMaker console](https://console.aws.amazon.com/sagemaker/home?#/jobs/) and monitor the metrics and logs in the **Monitor** section.
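
Besides the console, the job can be inspected programmatically. The sketch below wraps the boto3 SageMaker `describe_training_job` call; `summarize_training_job` is a hypothetical helper, and the client is passed in so that the function can be exercised without AWS access.

```python
def summarize_training_job(sm_client, job_name):
    """Return status, timing, and artifact location for a training job.

    sm_client is expected to behave like boto3.client("sagemaker").
    """
    desc = sm_client.describe_training_job(TrainingJobName=job_name)
    return {
        "status": desc["TrainingJobStatus"],
        "training_seconds": desc.get("TrainingTimeInSeconds"),
        "billable_seconds": desc.get("BillableTimeInSeconds"),
        "model_artifacts": desc.get("ModelArtifacts", {}).get("S3ModelArtifacts"),
    }


# Usage in the notebook (requires AWS credentials):
#   import boto3
#   summarize_training_job(
#       boto3.client("sagemaker"), estimator.latest_training_job.name
#   )
```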

### 4: Deploy model and verify results

Now you can deploy the model for performing real-time inference.

In [13]:
model_name='smartwatch_health_data_anomaly_detection_inference'

content_type='text/csv'

real_time_inference_instance_type='ml.m5.large'
batch_transform_inference_instance_type='ml.m5.large'

#### A. Deploy trained model

In [14]:
# sagemaker>=2 provides serializers as classes in sagemaker.serializers.
from sagemaker.serializers import CSVSerializer

predictor = estimator.deploy(1, real_time_inference_instance_type, serializer=CSVSerializer())

..........
----!

Once the endpoint is created, you can perform real-time inference.

#### B. Create input payload

In [15]:
df = pd.read_csv("testing/inf_data.csv")

In [16]:
df

Unnamed: 0,datetime,user,heartrate,steps
0,12-08-2020 00:19,AHYIJDV,56,0
1,12-08-2020 00:23,AHYIJDV,57,0
2,12-08-2020 00:25,AHYIJDV,59,0
3,12-08-2020 00:26,AHYIJDV,60,0
4,12-08-2020 00:27,AHYIJDV,59,0
...,...,...,...,...
1256,20-08-2020 22:19,AHYIJDV,69,0
1257,20-08-2020 22:20,AHYIJDV,65,0
1258,20-08-2020 22:23,AHYIJDV,64,8
1259,20-08-2020 22:24,AHYIJDV,67,0


#### C. Perform real-time inference

In [17]:
file_name = "testing/inf_data.csv"
output_file_name = "inference_out.csv"

In [18]:
!aws sagemaker-runtime invoke-endpoint \
    --endpoint-name $predictor.endpoint_name \
    --body fileb://$file_name \
    --content-type $content_type \
    --region $sagemaker_session.boto_region_name \
    $output_file_name


{
    "ContentType": "text/csv; charset=utf-8",
    "InvokedProductionVariant": "AllTraffic"
}
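
The same request can be made from Python instead of the AWS CLI. The sketch below uses the boto3 `sagemaker-runtime` `invoke_endpoint` call; `invoke_csv_endpoint` is a hypothetical helper, and the client is injected so that the function can be tried with a stand-in object.

```python
def invoke_csv_endpoint(runtime_client, endpoint_name, csv_path, content_type="text/csv"):
    """Send a CSV file to a real-time endpoint and return the response text.

    runtime_client is expected to behave like boto3.client("sagemaker-runtime").
    """
    with open(csv_path, "rb") as f:
        payload = f.read()
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType=content_type,
        Body=payload,
    )
    # The response body is a stream; read it fully and decode as text.
    return response["Body"].read().decode("utf-8")


# Usage in the notebook (requires AWS credentials and a live endpoint):
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   text = invoke_csv_endpoint(runtime, predictor.endpoint_name, file_name)
#   with open(output_file_name, "w") as out:
#       out.write(text)
```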


#### D. Visualize output

In [19]:
result = pd.read_csv("inference_out.csv", header=None)
result

Unnamed: 0,0,1,2,3
0,index,heartrate,steps_window_12,predicted_label
1,2020-08-15 10:00:00,-2.675685172385544,2.4202229689617893,Anomalous
2,2020-08-15 11:00:00,-2.3789675415562117,2.4536097188634205,Anomalous
3,2020-08-15 15:00:00,-2.7179218924623214,2.954932635355101,Anomalous
4,2020-08-15 19:00:00,-2.8317303808798817,3.177619648859143,Anomalous
...,...,...,...,...
74,2020-08-20 17:00:00,0.2788786494797608,0.6801994643228628,Normal
75,2020-08-20 18:00:00,-0.22556507013599159,0.789430940346521,Normal
76,2020-08-20 19:00:00,-0.33986858820485105,0.47638875779910916,Normal
77,2020-08-20 21:00:00,-0.10251387515693833,0.2553009050278904,Normal
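
A quick way to summarize the output is to count how many rows received each label. The sketch below reads the response with its own header row (`index`, `heartrate`, `steps_window_12`, `predicted_label`) and tallies the `predicted_label` column; `count_predicted_labels` is a hypothetical helper, and the sample rows are copied from the output above.

```python
import csv
import io
from collections import Counter


def count_predicted_labels(csv_text):
    """Count occurrences of each value in the 'predicted_label' column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["predicted_label"] for row in reader)


sample_output = """index,heartrate,steps_window_12,predicted_label
2020-08-15 10:00:00,-2.675685172385544,2.4202229689617893,Anomalous
2020-08-15 11:00:00,-2.3789675415562117,2.4536097188634205,Anomalous
2020-08-20 17:00:00,0.2788786494797608,0.6801994643228628,Normal
"""
counts = count_predicted_labels(sample_output)
print(dict(counts))
```

In the notebook, the text of `inference_out.csv` can be passed to the same function to get counts for the full result.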


#### F. Delete the endpoint

Now that you have successfully performed a real-time inference, you no longer need the endpoint. You can delete it to avoid being charged.

In [20]:
predictor.delete_endpoint(delete_endpoint_config=True)
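
To confirm that nothing is left running, you can list the endpoints whose names contain the job prefix. The sketch below pages through the boto3 SageMaker `list_endpoints` call; `remaining_endpoints` is a hypothetical helper, and the client is injected so the logic can be checked offline.

```python
def remaining_endpoints(sm_client, name_contains):
    """Return the names of endpoints whose name contains the given text.

    sm_client is expected to behave like boto3.client("sagemaker").
    """
    names = []
    kwargs = {"NameContains": name_contains}
    while True:
        page = sm_client.list_endpoints(**kwargs)
        names.extend(e["EndpointName"] for e in page.get("Endpoints", []))
        token = page.get("NextToken")
        if not token:
            return names
        # Ask for the next page of results.
        kwargs["NextToken"] = token


# Usage in the notebook (requires AWS credentials):
#   import boto3
#   remaining_endpoints(boto3.client("sagemaker"), "smartwatch-health-data")
```

An empty list suggests that no matching endpoint is still accruing charges.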

Since this is an experiment, you do not need to run a hyperparameter tuning job. However, if you would like to see how to tune a model trained using a third-party algorithm with Amazon SageMaker's hyperparameter tuning functionality, you can run the optional tuning step.

### 5. Perform Batch inference

In this section, you will perform batch inference using multiple input payloads together.

In [21]:
#upload the batch-transform job input files to S3
transform_input_folder = "testing/inf_data.csv"
transform_input = sagemaker_session.upload_data(transform_input_folder, key_prefix=model_name) 
print("Transform input uploaded to " + transform_input)

Transform input uploaded to s3://sagemaker-us-east-2-786796469737/smartwatch_health_data_anomaly_detection_inference/inf_data.csv


In [22]:
#Run the batch-transform job
transformer = estimator.transformer(1, batch_transform_inference_instance_type)
transformer.transform(transform_input, content_type=content_type)
transformer.wait()

..........
.....................
Starting the inference server with 2 workers.
[2022-05-26 10:00:32 +0000] [10] [INFO] Starting gunicorn 20.1.0
[2022-05-26 10:00:32 +0000] [10] [INFO] Listening at: unix:/tmp/gunicorn.sock (10)
[2022-05-26 10:00:32 +0000] [10] [INFO] Using worker: gevent
[2022-05-26 10:00:32 +0000] [14] [INFO] Booting worker with pid: 14
[2022-05-26 10:00:32 +0000] [15] [INFO] Booting worker with pid: 15
169.254.255.130 - - [26/May/2022:10:00:40 +0000] "GET /ping HTTP/1.1" 200 1 "-" "Go-http-client/1.1"
169.254.255.130 - - [26/May/2022:10:00:40 +0000] "GET /execution-parameters HTTP/1.1" 404 2 "-" "Go-http-client/1.1"
Invoked with (1261, 3) records
[INFO] Predicting...
169.254.255.130 - - [26/May/2022:10:00:41 +0000] "POST /invocations HTTP/1.1" 200 5260 "-" "Go-http-client/1.1"

In [23]:
#output is available on following path
transformer.output_path

's3://sagemaker-us-east-2-786796469737/smartwatch-health-data-anomaly-detectio-2022-05-26-09-57-07-139'
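
Batch transform writes one result object per input file, named after the input file with `.out` appended. The sketch below derives the expected result URI from the output path and the input URI; `batch_output_uri` is a hypothetical helper.

```python
import posixpath
from urllib.parse import urlparse


def batch_output_uri(output_path, input_uri):
    """Return the S3 URI where batch transform stores the result for input_uri."""
    input_name = posixpath.basename(urlparse(input_uri).path)
    return f"{output_path.rstrip('/')}/{input_name}.out"


print(
    batch_output_uri(
        "s3://sagemaker-us-east-2-786796469737/"
        "smartwatch-health-data-anomaly-detectio-2022-05-26-09-57-07-139",
        "s3://sagemaker-us-east-2-786796469737/"
        "smartwatch_health_data_anomaly_detection_inference/inf_data.csv",
    )
)
```

In the notebook, the two arguments would be `transformer.output_path` and `transform_input`.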

### 7. Clean-up

#### A. Delete the model

In [24]:
# Delete the SageMaker model that was created for the batch transform job.
# (estimator.delete_endpoint() is a no-op in sagemaker>=2.)
transformer.delete_model()


#### B. Unsubscribe to the listing (optional)

If you would like to unsubscribe from the algorithm, follow these steps. Before you cancel the subscription, ensure that you do not have any [deployable model](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package or using the algorithm. Note: you can find this information by looking at the container name associated with the model.

**Steps to unsubscribe from the product on AWS Marketplace**:
1. Navigate to the __Machine Learning__ tab on [__Your Software subscriptions page__](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=mlmp_gitdemo_indust)
2. Locate the listing that you want to cancel the subscription for, and then choose __Cancel Subscription__.

