# Train, tune, and deploy a custom ML model using <font color='red'>AI-Advisor Univariate Point Anomaly Detection </font> Algorithm from AWS Marketplace 


<font color='red'> This algorithm detects point anomalies (sudden spikes) in continuous univariate time series data, using an unsupervised machine learning approach. </font>

This sample notebook shows you how to train a custom ML model using <font color='red'>[AI-Advisor UPAD](https://github.com/AI-Advisor-ML-Marketplace/point-anomaly-detection)</font> from AWS Marketplace.

> **Note**: This is a reference notebook. It cannot run unless you make the changes suggested in the notebook.

## Pre-requisites
1. **Note**: This notebook contains elements that render correctly only in the Jupyter interface. Open this notebook from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.
1. Ensure that the IAM role used has **AmazonSageMakerFullAccess**.
1. Some hands-on experience using [Amazon SageMaker](https://aws.amazon.com/sagemaker/).
1. To use this algorithm successfully, ensure that:
    1. Either your IAM role has these three permissions and you have authority to make AWS Marketplace subscriptions in the AWS account used: 
        1. **aws-marketplace:ViewSubscriptions**
        1. **aws-marketplace:Unsubscribe**
        1. **aws-marketplace:Subscribe**  
    2. or your AWS account has a subscription to <font color='red'>[AI-Advisor UPAD](https://github.com/AI-Advisor-ML-Marketplace/point-anomaly-detection)</font>. 

## Contents
1. [Subscribe to the algorithm](#1.-Subscribe-to-the-algorithm)
2. [Prepare dataset](#2.-Prepare-dataset)
	1. [Dataset format expected by the algorithm](#A.-Dataset-format-expected-by-the-algorithm)
	2. [Configure and visualize train and test dataset](#B.-Configure-and-visualize-train-and-test-dataset)
	3. [Upload datasets to Amazon S3](#C.-Upload-datasets-to-Amazon-S3)
3. [Train a machine learning model](#3:-Train-a-machine-learning-model)
	1. [Set up environment](#3.1-Set-up-environment)
	2. [Train a model](#3.2-Train-a-model)
4. [Deploy model and verify results](#4:-Deploy-model-and-verify-results)
    1. [Deploy trained model](#A.-Deploy-trained-model)
    2. [Create input payload](#B.-Create-input-payload)
    3. [Perform real-time inference](#C.-Perform-real-time-inference)
    4. [Visualize output](#D.-Visualize-output)
    5. [Delete the endpoint](#F.-Delete-the-endpoint)
5. [Perform Batch inference](#5.-Perform-Batch-inference)
6. [Clean-up](#6.-Clean-up)
	1. [Delete the model](#A.-Delete-the-model)


## Usage instructions
You can run this notebook one cell at a time (press Shift+Enter to run a cell).

## 1. Subscribe to the algorithm

To subscribe to the algorithm:
1. Open the algorithm listing page: <font color='red'>[AI-Advisor UPAD](https://github.com/AI-Advisor-ML-Marketplace/point-anomaly-detection)</font>
1. On the AWS Marketplace listing, click the **Continue to subscribe** button.
1. On the **Subscribe to this software** page, review the EULA, pricing, and support terms, and click **Accept Offer** if you agree.
1. Click the **Continue to configuration** button and choose a **region**. You will then see a **Product Arn**. This is the algorithm ARN that you need to specify when training a custom ML model. Copy the ARN corresponding to your region and specify it in the following cell.

![product_arn_image](images/product_arn_image.png)
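
Because the Product ARN is region-specific, a quick sanity check before training can catch a mismatch between the ARN and the session region. The sketch below is only an illustration: `arn_region` is a hypothetical helper, not part of the SageMaker SDK, and it relies on nothing more than the standard ARN layout `arn:partition:service:region:account-id:resource`. The sample values are the ones used in this notebook.

```python
def arn_region(arn: str) -> str:
    """Return the region field of an ARN (arn:partition:service:region:account:resource)."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not a valid ARN: {arn!r}")
    return parts[3]


# Sample values taken from this notebook; replace them with your own.
sample_arn = "arn:aws:sagemaker:us-east-2:438613450817:algorithm/aiadvisor-pad-v1-3-1"
sample_region = "us-east-2"

# True when the ARN was copied for the same region the session will use.
region_matches = arn_region(sample_arn) == sample_region
print(region_matches)
```

If the check is false, copy the ARN for the correct region from the configuration page before continuing.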

In [29]:
from getpass import getpass 

# SHAPE
# algo_arn = "<Customer to specify algorithm ARN corresponding to their AWS region follow the instruction above>"

########################################CHANGE####################################################
# SAMPLE
algo_arn='arn:aws:sagemaker:us-east-2:438613450817:algorithm/aiadvisor-pad-v1-3-1'
##################################################################################################

# get your session information
#####################################################
aws_region = "us-east-2"  # must match the region in algo_arn
aws_access_key = getpass(prompt="Access key: ")
aws_secret_key = getpass(prompt="Secret key: ")
#####################################################

Access key:  ········
Secret key:  ········


## 2. Prepare dataset

In [30]:
import base64
import json
import uuid
from sagemaker import ModelPackage
import sagemaker as sage
from sagemaker import get_execution_role
from urllib.parse import urlparse
import boto3
from IPython.display import Image
from PIL import Image as ImageEdit
import urllib.request
import numpy as np
import pandas as pd

### A. Dataset format expected by the algorithm

This solution follows these **2 steps**: `Training` and `Testing` the algorithm.

**Train**
- The algorithm trains on a user-provided dataset.
- The dataset must be in `txt/csv` format, in the `./data/train/` folder, with UTF-8 encoding.

**Test**
- After the machine learning model is trained, it can make predictions on a test dataset.
- The test dataset is also user-provided.
- The dataset must be in `txt/csv` format, in the `./data/test/` folder, with UTF-8 encoding.
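
The format requirements above can be checked before uploading anything. The following is a minimal sketch, not the algorithm's own validation: `check_dataset` is a hypothetical helper, and the required column names (`timestamp`, `value`) are an assumption taken from the sample data and the sample hyperparameters (`common_time_column`, `common_x_columns`) used later in this notebook.

```python
import csv
import io


def check_dataset(raw: bytes, required=("timestamp", "value")) -> list:
    """Decode a CSV payload as UTF-8 and return the required columns that are missing."""
    text = raw.decode("utf-8")  # raises UnicodeDecodeError if the file is not UTF-8
    header = next(csv.reader(io.StringIO(text)))
    return [col for col in required if col not in header]


# A two-line excerpt in the same layout as the sample training data.
sample = b",timestamp,value,anomaly\n598,2014-03-09 06:26:00,44.774,False\n"
missing = check_dataset(sample)
print(missing)  # an empty list means all required columns are present
```

To check a real file, pass its bytes, for example `check_dataset(open(path, "rb").read())`.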

### B. Configure and visualize train and test dataset
The `train` and `test` datasets should look like the samples below:

In [31]:
import pandas as pd  # import pandas to show what the data looks like

In [32]:
# SHAPE
# training_dataset = "data/train/<FileName.ext>"

########################################CHANGE####################################################
# SAMPLE
training_dataset = "data/training/train.csv"
##################################################################################################

In [33]:
# show sample of training dataset
df = pd.read_csv(training_dataset)
df.tail(5)

Unnamed: 0,timestamp,value,anomaly
598,2014-03-09 06:26:00,44.774,False
599,2014-03-09 06:31:00,45.728,False
600,2014-03-09 06:36:00,44.756,False
601,2014-03-09 06:41:00,45.258,False
602,2014-03-09 06:46:00,42.522,False


In [34]:
# SHAPE
# test_dataset = "data/test/<FileName.ext>"

########################################CHANGE####################################################
# SAMPLE
test_dataset = "data/inference/test.csv"
##################################################################################################

In [35]:
# show sample of test dataset
df = pd.read_csv(test_dataset)
df.tail(5)

Unnamed: 0,timestamp,value,anomaly
3413,2014-03-21 03:21:00,25.352,False
3414,2014-03-21 03:26:00,38.216,False
3415,2014-03-21 03:31:00,22.864,False
3416,2014-03-21 03:36:00,66.26,False
3417,2014-03-21 03:41:00,30.962,False


### C. Upload datasets to Amazon S3

<font color='red'>Do not change bucket parameter value. Do not hardcode your S3 bucket name.</font>

In [36]:
import boto3
import sagemaker

boto_session = boto3.Session(region_name=aws_region, aws_access_key_id=aws_access_key, aws_secret_access_key=aws_secret_key)
sagemaker_session = sagemaker.Session(boto_session=boto_session) # get session info

bucket = sagemaker_session.default_bucket()
bucket

'sagemaker-us-east-2-438613450817'

In [37]:
# upload training data to s3 bucket
algo_prefix = "point-anomaly-detection"
training_data = sagemaker_session.upload_data(training_dataset, bucket=bucket, key_prefix=algo_prefix + "/traing-input-data")
print("Training input uploaded to : " + training_data)

Training input uploaded to : s3://sagemaker-us-east-2-438613450817/point-anomaly-detection/traing-input-data/train.csv


In [38]:
# upload test data to s3 bucket
test_data = sagemaker_session.upload_data(test_dataset, bucket=bucket, key_prefix=algo_prefix+"/inference-input-data")
print("Inference input uploaded to : " + test_data)

Inference input uploaded to : s3://sagemaker-us-east-2-438613450817/point-anomaly-detection/inference-input-data/test.csv
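
`upload_data` returns a full `s3://` URI. If you need the bucket and key separately, for example to inspect the uploaded object with boto3, the URI can be split with the standard library. This is a small sketch; `split_s3_uri` is a hypothetical helper name, and the URI is the one printed above.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple:
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


bucket_name, object_key = split_s3_uri(
    "s3://sagemaker-us-east-2-438613450817/point-anomaly-detection/inference-input-data/test.csv"
)
print(bucket_name)
print(object_key)
```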


## 3: Train a machine learning model

Now that the dataset is available in an accessible Amazon S3 bucket, we are ready to train a machine learning model. 

### 3.1 Set up environment

In [39]:
## If you are running on a local server, enter the role name specified in IAM role.

sts = boto3.client('sts', region_name=aws_region, aws_access_key_id=aws_access_key, aws_secret_access_key=aws_secret_key)
caller_identity = sts.get_caller_identity()
account_id = caller_identity['Account']
role_name = input("Role name: ")
role = f'arn:aws:iam::{account_id}:role/{role_name}'



### If you are running in a SageMaker Jupyter notebook, comment out the lines above and uncomment the two lines below.

#from sagemaker import get_execution_role
#role = get_execution_role(sagemaker_session=sagemaker_session)

print (f"Result: {role}")

Role name:  sagemaker-operation


Result: arn:aws:iam::438613450817:role/sagemaker-operation


<font color='red'>For Seller to update: update the algorithm-specific unique prefix in the following cell. </font>

In [40]:
# SHAPE
# output_location = "s3://{}/<For seller to Update:Update a unique prefix>/{}".format(bucket, "output")

########################################CHANGE####################################################
# SAMPLE
output_location = "s3://{}/ai-advisor-upad/{}".format(bucket, "output")
##################################################################################################

### 3.2 Train a model

You can also find more information about the dataset format in the **Hyperparameters** section of <font color='red'>[AI-Advisor UPAD](https://github.com/AI-Advisor-ML-Marketplace/point-anomaly-detection).</font>

In [43]:
# SHAPE
# hyperparameters = {}

########################################CHANGE####################################################
# Define hyperparameters
#hyperparameters = {'hpo_repeat': '10', 'min_rows': '200', 'train_models': 'all'}
hyperparameters = {
            'common_x_columns': 'value',
            'common_time_column': 'timestamp',
            'common_index_columns': '',
    
            'train_hpo_repeat': 3,                 
            'train_decision_rule': 'upper'}
##################################################################################################
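
Note that the SageMaker training API accepts hyperparameters as a string-to-string map, so a value such as the integer `train_hpo_repeat` reaches the training container as text in `/opt/ml/input/config/hyperparameters.json`. The sketch below only approximates that conversion for illustration; the SDK's exact serialization may differ, and `sample_hyperparameters` is a copy of the dictionary above, used so the notebook's own variable is not overwritten.

```python
import json

# Copy of the sample hyperparameters defined in the cell above.
sample_hyperparameters = {
    "common_x_columns": "value",
    "common_time_column": "timestamp",
    "common_index_columns": "",
    "train_hpo_repeat": 3,
    "train_decision_rule": "upper",
}

# Approximation (assumption): every key and value is converted to a string.
as_strings = {str(k): str(v) for k, v in sample_hyperparameters.items()}
print(json.dumps(as_strings, indent=2))
```

In practice this means numeric hyperparameters can be given either as numbers or as strings in the notebook.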

<font color='red'>For Seller to update: Update appropriate values in estimator definition and ensure that fit call works as expected.</font>

For information on creating an `Estimator` object, see [documentation](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html)

In [76]:
########################################CHANGE####################################################
# Create an estimator object for running a training job
estimator = sagemaker.algorithm.AlgorithmEstimator(
    algorithm_arn=algo_arn,
    base_job_name="ai-advisor-upad",
    role=role,
    instance_count=1,
    instance_type='ml.c5.xlarge',
    input_mode="File",
    output_path=output_location,
    sagemaker_session=sagemaker_session,
    hyperparameters=hyperparameters,
)
##################################################################################################

# Run the training job.
estimator.fit({'training': training_data})

INFO:sagemaker:Creating training-job with name: ai-advisor-upad-2023-03-31-04-27-03-412


2023-03-31 04:27:04 Starting - Starting the training job...
2023-03-31 04:27:18 Starting - Preparing the instances for training...
2023-03-31 04:28:20 Downloading - Downloading input data
2023-03-31 04:28:20 Training - Downloading the training image.........
2023-03-31 04:29:46 Training - Training image download completed. Training in progress.
json_path:  /opt/ml/input/config/hyperparameters.json
yaml_path:  /opt/program/framework/configure/pad.train.workflow.yaml
current mode :  train
##############Train Hyperparameters overload complete##############
json_path:  /opt/ml/input/config/hyperparameters.json
yaml_path:  /opt/program/framework/configure/pad.inference.workflow.yaml
current mode :  inference
##############Inference Hyperparameters overload complete##############
Train, Inference hyperparams overriden!
aip yaml file is replaced with aip_train yaml file!
Start Train Pipeline!

See this [blog post](https://aws.amazon.com/blogs/machine-learning/easily-monitor-and-visualize-metrics-while-training-models-on-amazon-sagemaker/) for more information on how to visualize metrics during training. You can also open the training job from the [Amazon SageMaker console](https://console.aws.amazon.com/sagemaker/home?#/jobs/) and monitor the metrics and logs in the **Monitor** section.

## 4: Deploy model and verify results

Now you can deploy the model for performing real-time inference.

In [47]:
########################################CHANGE####################################################
model_name = "ai-advisor-upad"
content_type='text/csv'

# set instance type
instance_type = 'ml.c5.2xlarge'
##################################################################################################

### A. Deploy trained model

In [72]:
from sagemaker.serializers import CSVSerializer  # replaces the deprecated csv_serializer

# deploy model
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    serializer=CSVSerializer())

INFO:sagemaker:Creating model package with name: ai-advisor-upad-2023-03-30-14-33-29-058


..........


INFO:sagemaker:Creating model with name: ai-advisor-upad-2023-03-30-14-33-29-058
INFO:sagemaker:Creating endpoint-config with name ai-advisor-upad-2023-03-30-14-33-29-058
INFO:sagemaker:Creating endpoint with name ai-advisor-upad-2023-03-30-14-33-29-058


-----!

Once the endpoint is created, you can perform real-time inference.

### B. Create input payload

In [73]:
file_name = "data/inference/test.csv"
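
The payload sent to the endpoint is simply the raw contents of the CSV file. A minimal sketch for previewing it is shown here; `preview_payload` is a hypothetical helper, and the path is assumed to be the same file as `file_name` above.

```python
from itertools import islice
from pathlib import Path


def preview_payload(path, n_lines=5):
    """Return the first n_lines of a UTF-8 text payload without reading the whole file."""
    with Path(path).open(encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in islice(handle, n_lines)]


payload_path = Path("data/inference/test.csv")  # same file as file_name above
if payload_path.exists():
    for line in preview_payload(payload_path):
        print(line)
```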


### C. Perform real-time inference

In [74]:
import pandas as pd
import io

runtime = boto3.client('sagemaker-runtime', region_name=aws_region, aws_access_key_id=aws_access_key, aws_secret_access_key=aws_secret_key)

response = runtime.invoke_endpoint(
    EndpointName=predictor.endpoint_name,
    ContentType=content_type,
    #Body=file_name.encode('utf-8'),
    Body=open(file_name, 'rb').read(),
    Accept=content_type
)

content = response['Body'].read()
binary_stream = io.BytesIO(content)

In [75]:
import tarfile
from PIL import Image


with tarfile.open(fileobj=binary_stream, mode='r') as tar:
    csv_contents = list()
    image_contents = list()
    for member in tar.getmembers():
        if member.name.endswith('.csv'):
            csv_contents.append(tar.extractfile(member).read())
        elif member.name.endswith('.png'):
            image_contents.append(tar.extractfile(member).read())

if len(csv_contents) != 0 :
    for csv_raw in csv_contents:
        result_df = pd.read_csv(io.StringIO(csv_raw.decode('utf-8')))
        display(result_df.tail(10))
        
if len(image_contents) != 0 : 
    for img_raw in image_contents:
        img = Image.open(io.BytesIO(img_raw))  # decode each extracted PNG
        display(img)



Unnamed: 0,index,timestamp,value,anomaly,anomaly_detection,anomaly_score,model_name,threshold_upper,threshold_lower,decision_rule
3408,3408,2014-03-21 02:56:00,46.948,False,False,0.01162,rrcf,0.513029,,upper
3409,3409,2014-03-21 03:01:00,25.422,True,False,0.0,rrcf,0.513029,,upper
3410,3410,2014-03-21 03:06:00,57.958,False,False,0.291291,rrcf,0.513029,,upper
3411,3411,2014-03-21 03:11:00,28.052,False,False,0.445745,rrcf,0.513029,,upper
3412,3412,2014-03-21 03:16:00,56.572,False,False,0.458555,rrcf,0.513029,,upper
3413,3413,2014-03-21 03:21:00,25.352,False,False,0.0,rrcf,0.513029,,upper
3414,3414,2014-03-21 03:26:00,38.216,False,False,0.175338,rrcf,0.513029,,upper
3415,3415,2014-03-21 03:31:00,22.864,False,False,0.0,rrcf,0.513029,,upper
3416,3416,2014-03-21 03:36:00,66.26,False,False,0.503243,rrcf,0.513029,,upper
3417,3417,2014-03-21 03:41:00,30.962,False,False,0.402267,rrcf,0.513029,,upper


Unnamed: 0,index,timestamp,value,anomaly,anomaly_detection,anomaly_score,model_name,threshold_upper,threshold_lower,decision_rule
0,2777,2014-03-18 22:21:00,54.508,False,True,0.604043,rrcf,0.513029,,upper
1,2780,2014-03-18 22:36:00,65.68,False,True,0.872745,rrcf,0.513029,,upper
2,2781,2014-03-18 22:41:00,99.248,True,True,1.0,rrcf,0.513029,,upper


### D. Visualize output

In [67]:
# print result
result_df.tail(20)

Unnamed: 0,index,timestamp,value,anomaly,anomaly_detection,anomaly_score,model_name,threshold_upper,threshold_lower,decision_rule
0,2777,2014-03-18 22:21:00,54.508,False,True,0.604043,rrcf,0.513029,,upper
1,2780,2014-03-18 22:36:00,65.68,False,True,0.872745,rrcf,0.513029,,upper
2,2781,2014-03-18 22:41:00,99.248,True,True,1.0,rrcf,0.513029,,upper
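
In the sample output, every row flagged in `anomaly_detection` has an `anomaly_score` above `threshold_upper`, which is consistent with the `upper` decision rule set in the hyperparameters. The sketch below re-applies that comparison to a few rows copied from the output above. That the algorithm uses exactly this comparison is an assumption, not something the listing confirms, and `flag_upper` is a hypothetical helper.

```python
import csv
import io

# A few rows copied from the inference output shown in this notebook.
rows_csv = """index,value,anomaly_score,threshold_upper,anomaly_detection
2777,54.508,0.604043,0.513029,True
2781,99.248,1.0,0.513029,True
3416,66.26,0.503243,0.513029,False
"""


def flag_upper(score: float, threshold: float) -> bool:
    """Assumed 'upper' rule: a point is flagged when its score exceeds the upper threshold."""
    return score > threshold


results = []
for row in csv.DictReader(io.StringIO(rows_csv)):
    predicted = flag_upper(float(row["anomaly_score"]), float(row["threshold_upper"]))
    reported = row["anomaly_detection"] == "True"
    results.append((row["index"], predicted, reported))
    print(row["index"], predicted, reported)
```

For these rows the recomputed flag agrees with the reported `anomaly_detection` column.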


### F. Delete the endpoint

Now that you have successfully performed a real-time inference, you no longer need the endpoint. You can delete it to avoid being charged.

In [None]:
predictor.delete_endpoint(delete_endpoint_config=True)

Since this is an experiment, you do not need to run a hyperparameter tuning job. However, if you would like to see how to tune a model trained using a third-party algorithm with Amazon SageMaker's hyperparameter tuning functionality, you can run the optional tuning step.

<font color='red'>For seller to update: Review/update the tuner configuration including but not limited to `base_tuning_job_name`, `max_jobs`, and `max_parallel_jobs`. </font>

In [None]:
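from sagemaker.tuner import HyperparameterTuner

# objective_metric_name, tuning_direction, and hyperparameter_ranges must be
# defined for this algorithm before running this cell.
tuner = HyperparameterTuner(
    estimator=estimator,
    base_tuning_job_name="<For Seller to update: Specify base job name>",
    objective_metric_name=objective_metric_name,
    objective_type=tuning_direction,
    hyperparameter_ranges=hyperparameter_ranges,
    max_jobs=50,
    max_parallel_jobs=7,
)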

<font color='red'>For seller to update: Uncomment following lines, specify appropriate channels, and run the tuner to test it out. </font>

In [None]:
# Uncomment following two lines to run Hyperparameter optimization job.
# tuner.fit({'training': training_data})
# tuner.wait()

<font color='red'>For seller to update: Once you have tested the code written in the preceding cell, comment three lines in the preceding cell so that customers who choose to simply run entire notebook do not end up triggering a tuning job. </font>

Once you have completed a tuning job (or even while the job is still running), you can [clone and use this notebook](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/hyperparameter_tuning/analyze_results/HPO_Analyze_TuningJob_Results.ipynb) to analyze the results and understand how each hyperparameter affects the quality of the model.

## 5. Perform Batch inference

In this section, you will perform batch inference using multiple input payloads together.

In [None]:
########################################CHANGE####################################################
# upload the batch-transform job input files to S3
transform_dataset = "data/inference/test.csv"
##################################################################################################

transform_input = sagemaker_session.upload_data(transform_dataset, key_prefix=model_name)
print("Transform input uploaded to : " + transform_input)

In [None]:
# Run the batch-transform job
transformer = estimator.transformer(instance_count=1, instance_type=instance_type)
transformer.transform(transform_input, content_type=content_type)
transformer.wait()

In [None]:
# output is available on following path
transformer.output_path
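
Batch transform writes one result object per input object under `transformer.output_path`, named after the input file with `.out` appended. The sketch below derives the expected result URI so that it can be downloaded afterwards. `expected_output_uri` is a hypothetical helper, and the bucket and prefixes in the example call are placeholders, not values from this notebook.

```python
import posixpath


def expected_output_uri(output_path: str, input_uri: str) -> str:
    """Build the S3 URI where batch transform stores the result for one input object."""
    input_name = posixpath.basename(input_uri)
    return output_path.rstrip("/") + "/" + input_name + ".out"


# Placeholder values for illustration only.
example_uri = expected_output_uri(
    "s3://example-bucket/example-transform-job",
    "s3://example-bucket/ai-advisor-upad/test.csv",
)
print(example_uri)
```

In the notebook, the same helper could be called with `transformer.output_path` and `transform_input`.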

## 6. Clean-up

### A. Delete the model

In [None]:
predictor.delete_model()