## Train, tune, and deploy a custom ML model using <font color='red'>Optimize Next Best Action Prediction</font> Algorithm from AWS Marketplace 


<font color='red'> A deep-learning-based solution that analyzes event log data (e.g. from a loan approval process) together with contextual information (e.g. loan request parameters) and predicts the next step, and the time to that step, for an open request within a process. With process execution data stored in the form of event logs, an AI-based operations planning system can help in understanding the future system state based on the current state and business context. This solution improves business operations planning by reducing cost and improving efficiency through dynamic resource planning. </font>

This sample notebook shows you how to train a custom ML model using <font color='red'>  [Optimize Next Best Action Prediction](https://aws.amazon.com/marketplace/pp/prodview-abpjqnkcxrmhq?qid=1614675374394&sr=0-1&ref_=srh_res_product_title)</font> from AWS Marketplace.

> **Note**: This is a reference notebook and it cannot run unless you make the changes suggested in the notebook.

#### Pre-requisites:
1. **Note**: This notebook contains elements which render correctly only in the Jupyter interface. Open this notebook from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.
1. Ensure that the IAM role used has the **AmazonSageMakerFullAccess** policy attached.
1. Some hands-on experience using [Amazon SageMaker](https://aws.amazon.com/sagemaker/).
1. To use this algorithm successfully, ensure that:
    1. Either your IAM role has these three permissions and you have authority to make AWS Marketplace subscriptions in the AWS account used: 
        1. **aws-marketplace:ViewSubscriptions**
        1. **aws-marketplace:Unsubscribe**
        1. **aws-marketplace:Subscribe**  
    2. or your AWS account has a subscription to <font color='red'>  [Optimize Next Best Action Prediction](https://aws.amazon.com/marketplace/pp/prodview-abpjqnkcxrmhq?qid=1614675374394&sr=0-1&ref_=srh_res_product_title)</font>. 

#### Contents:
1. [Subscribe to the algorithm](#1.-Subscribe-to-the-algorithm)
1. [Prepare dataset](#2.-Prepare-dataset)
	1. [Dataset format expected by the algorithm](#A.-Dataset-format-expected-by-the-algorithm)
	1. [Configure and visualize train and test dataset](#B.-Configure-and-visualize-train-and-test-dataset)
	1. [Upload datasets to Amazon S3](#C.-Upload-datasets-to-Amazon-S3)
1. [Train a machine learning model](#3:-Train-a-machine-learning-model)
	1. [Set up environment](#3.1-Set-up-environment)
	1. [Train a model](#3.2-Train-a-model)
1. [Deploy model and verify results](#4:-Deploy-model-and-verify-results)
    1. [Deploy trained model](#A.-Deploy-trained-model)
    1. [Create input payload](#B.-Create-input-payload)
    1. [Perform real-time inference](#C.-Perform-real-time-inference)
    1. [Visualize output](#D.-Visualize-output)
    1. [Calculate relevant metrics](#E.-Calculate-relevant-metrics)
    1. [Delete the endpoint](#F.-Delete-the-endpoint)
1. [Tune your model! (optional)](#5:-Tune-your-model!-(optional))
	1. [Tuning Guidelines](#A.-Tuning-Guidelines)
	1. [Define Tuning configuration](#B.-Define-Tuning-configuration)
	1. [Run a model tuning job](#C.-Run-a-model-tuning-job)
1. [Perform Batch inference](#6.-Perform-Batch-inference)
1. [Clean-up](#7.-Clean-up)
	1. [Delete the model](#A.-Delete-the-model)
	1. [Unsubscribe to the listing (optional)](#B.-Unsubscribe-to-the-listing-(optional))


#### Usage instructions
You can run this notebook one cell at a time (press Shift+Enter to run a cell).

### 1. Subscribe to the algorithm

To subscribe to the algorithm:
1. Open the algorithm listing page <font color='red'> [Optimize Next Best Action Prediction](https://aws.amazon.com/marketplace/pp/prodview-abpjqnkcxrmhq?qid=1614675374394&sr=0-1&ref_=srh_res_product_title).</font>
1. On the AWS Marketplace listing, click the **Continue to subscribe** button.
1. On the **Subscribe to this software** page, review the terms and click **Accept Offer** if you agree with the EULA, pricing, and support terms.
1. After you click the **Continue to configuration** button and choose a **region**, you will see a **Product Arn**. This is the algorithm ARN that you need to specify when training a custom ML model. Copy the ARN corresponding to your region and specify it in the following cell.

In [9]:
algo_arn='arn:aws:sagemaker:us-east-2:786796469737:algorithm/mphasis-marketplace-next-best-action-new-v1'
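
Because the ARN is region-specific, it can help to confirm that the value you pasted is a well-formed SageMaker algorithm ARN before starting a training job. The following is a minimal sketch, not part of the original notebook; the helper name `parse_algorithm_arn` is hypothetical and the check only looks at the general `arn:partition:sagemaker:region:account:algorithm/name` layout.

```python
def parse_algorithm_arn(arn):
    """Split a SageMaker algorithm ARN into its parts (hypothetical helper)."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sagemaker":
        raise ValueError("not a SageMaker ARN: {!r}".format(arn))
    resource_type, _, resource_name = parts[5].partition("/")
    if resource_type != "algorithm" or not resource_name:
        raise ValueError("not an algorithm ARN: {!r}".format(arn))
    return {
        "partition": parts[1],
        "region": parts[3],
        "account": parts[4],
        "name": resource_name,
    }


algo_arn = 'arn:aws:sagemaker:us-east-2:786796469737:algorithm/mphasis-marketplace-next-best-action-new-v1'
arn_info = parse_algorithm_arn(algo_arn)
print(arn_info["region"], arn_info["name"])
```

In the notebook you could compare `arn_info["region"]` with `sagemaker_session.boto_region_name` once the session has been created.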

### 2. Prepare dataset

In [10]:
# Standard library
import base64
import json
import uuid
import urllib.request
from urllib.parse import urlparse

# Third-party packages
import boto3
import numpy as np
import pandas as pd
import sagemaker as sage
from IPython.display import Image
from PIL import Image as ImageEdit
from sagemaker import ModelPackage, get_execution_role

#### A. Dataset format expected by the algorithm

The deployed solution has these **2 steps**:
1. Training API: The system trains on user-provided historical process data with contextual information, then builds and saves a deep learning model that represents the process behavior.
2. Testing API: Once the model is generated, the solution can be used to predict the next possible step and the time to that step for a given open request within a process.

**The following inputs are mandatory for both APIs:**
* *CaseID*: Unique identifier of a request or journey, e.g. an e-commerce order ID or a loan ID.
* *ActivityID*: Identifier or name of the activity performed for each CaseID, e.g. INVOICE GENERATION or KYC.
* *CompleteTimestamp*: Timestamp of each unique CaseID/ActivityID combination.
* *context*: Any contextual variable that provides information about the case, e.g. loan amount or vendor ID.

You can also find more information about dataset format in **Usage Information** section of <font color='red'>  [Optimize Next Best Action Prediction](https://aws.amazon.com/marketplace/pp/prodview-abpjqnkcxrmhq?qid=1614675374394&sr=0-1&ref_=srh_res_product_title).</font>
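
The column requirements above can be checked before uploading a file. The following is a minimal sketch using only the standard library; the function name `validate_event_log` is hypothetical, and the timestamp format `%m/%d/%Y %H:%M` is an assumption inferred from the sample files shipped with this notebook, not a format documented by the algorithm. The sample rows omit the unnamed index column that the sample files also contain.

```python
import csv
import io
from datetime import datetime

REQUIRED_COLUMNS = ["CaseID", "ActivityID", "CompleteTimestamp", "context"]
TIMESTAMP_FORMAT = "%m/%d/%Y %H:%M"  # assumption: inferred from the sample data


def validate_event_log(csv_text):
    """Check required columns and group events by case (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError("missing required columns: {}".format(missing))
    traces = {}
    for row in reader:
        # strptime raises ValueError if a timestamp does not match the format.
        stamp = datetime.strptime(row["CompleteTimestamp"], TIMESTAMP_FORMAT)
        traces.setdefault(row["CaseID"], []).append((stamp, row["ActivityID"]))
    for events in traces.values():
        events.sort(key=lambda event: event[0])  # stable sort keeps file order on ties
    return traces


sample = """CaseID,ActivityID,CompleteTimestamp,context
2,1,4/3/2012 16:55,1
2,8,4/3/2012 16:55,1
2,6,4/5/2012 17:15,1
3,1,10/29/2010 18:14,5
3,8,11/4/2010 1:16,5
"""
traces = validate_event_log(sample)
for case_id, events in traces.items():
    print(case_id, [activity for _, activity in events])
```

Each entry in `traces` is the ordered sequence of activities for one case, which is the shape of information the model uses to predict the next step.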

#### B. Configure and visualize train and test dataset

In [11]:
training_dataset='data/train/sample_train_file.csv'

In [12]:
train_input_df = pd.read_csv(training_dataset)
train_input_df.head()

Unnamed: 0,CaseID,ActivityID,CompleteTimestamp,context
0,2,1,4/3/2012 16:55,1
1,2,8,4/3/2012 16:55,1
2,2,6,4/5/2012 17:15,1
3,3,1,10/29/2010 18:14,5
4,3,8,11/4/2010 1:16,5


In [13]:
test_dataset='data/test/sample_test_file.csv'

In [14]:
test_input_df = pd.read_csv(test_dataset)
test_input_df.head()

Unnamed: 0,CaseID,ActivityID,CompleteTimestamp,context
0,4350,1,6/30/2011 22:58,2
1,4350,1,6/30/2011 22:58,2
2,4350,8,7/5/2011 17:27,2
3,4350,6,7/19/2011 21:37,2
4,4355,1,8/2/2012 17:31,2


#### C. Upload datasets to Amazon S3

In [15]:
sagemaker_session = sage.Session()
bucket=sagemaker_session.default_bucket()
bucket

'sagemaker-us-east-2-786796469737'

In [16]:
training_data=sagemaker_session.upload_data(training_dataset, bucket=bucket, key_prefix='nba')
test_data=sagemaker_session.upload_data(test_dataset, bucket=bucket, key_prefix='nba')
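
Both files are uploaded under the same `nba` prefix, and `upload_data` returns the S3 URI that later cells pass to training. As a rough illustration of what those return values look like, the sketch below mirrors the naming behaviour of `upload_data` in SageMaker Python SDK v2 as I understand it (full object URI for a single file, prefix URI for a directory). The helper `expected_upload_uri` is hypothetical and does not call AWS.

```python
import os


def expected_upload_uri(bucket, key_prefix, local_path, is_directory=False):
    """Return the S3 URI that upload_data is expected to report (hypothetical helper)."""
    base = "s3://{}/{}".format(bucket, key_prefix)
    if is_directory:
        # Directory uploads report only the prefix.
        return base
    # Single-file uploads report the full object URI.
    return "{}/{}".format(base, os.path.basename(local_path))


bucket_name = "sagemaker-us-east-2-786796469737"
train_uri = expected_upload_uri(bucket_name, "nba", "data/train/sample_train_file.csv")
test_uri = expected_upload_uri(bucket_name, "nba", "data/test/sample_test_file.csv")
print(train_uri)
print(test_uri)
```

Because a single-file upload yields the full object URI, the training channel reads only that file even though other objects share the prefix.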

## 3: Train a machine learning model

Now that the dataset is available in an accessible Amazon S3 bucket, we are ready to train a machine learning model.

### 3.1 Set up environment

In [17]:
role = get_execution_role()


In [18]:
output_location = 's3://{}/nba/{}'.format(bucket, 'output')

### 3.2 Train a model

For information on creating an `Estimator` object, see the [documentation](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html).

In [23]:
#Create an estimator object for running a training job
estimator = sage.algorithm.AlgorithmEstimator(
    algorithm_arn=algo_arn,
    base_job_name="nba",
    role=role,
    instance_count=1,
    instance_type='ml.m5.large',
    input_mode="File",
    output_path=output_location,
    sagemaker_session=sagemaker_session,
    hyperparameters={}
)
#Run the training job.
estimator.fit({"training": training_data})

2021-03-02 09:45:35 Starting - Starting the training job...
2021-03-02 09:45:58 Starting - Launching requested ML instancesProfilerReport-1614678335: InProgress
......
2021-03-02 09:46:59 Starting - Preparing the instances for training...
2021-03-02 09:47:33 Downloading - Downloading input data
2021-03-02 09:47:33 Training - Downloading the training image......
2021-03-02 09:48:31 Training - Training image download completed. Training in progress..
2021-03-02 09:48:33.666628: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-03-02 09:48:33.666678: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Starting the training.
list.remove(x): x not in list
total chars: 6, target chars: 6
{0: 1, 1: 2, 2: 3, 3: 4, 

See this [blog post](https://aws.amazon.com/blogs/machine-learning/easily-monitor-and-visualize-metrics-while-training-models-on-amazon-sagemaker/) for more information on how to visualize metrics during training. You can also open the training job from the [Amazon SageMaker console](https://console.aws.amazon.com/sagemaker/home?#/jobs/) and monitor the metrics and logs in the **Monitor** section.

### 4: Deploy model and verify results

Now you can deploy the model for performing real-time inference.

In [24]:
model_name='nba-test'

content_type='text/csv'

real_time_inference_instance_type='ml.m5.large'
batch_transform_inference_instance_type='ml.m5.large'

#### A. Deploy trained model

In [25]:
predictor = estimator.deploy(1, real_time_inference_instance_type)

..........
-------------!

Once endpoint is created, you can perform real-time inference.

#### B. Create input payload

In [28]:
file_name = test_dataset
output_file_name = "nba_test_output"

#### C. Perform real-time inference

In [29]:
!aws sagemaker-runtime invoke-endpoint \
    --endpoint-name $predictor.endpoint_name \
    --body fileb://$file_name \
    --content-type $content_type \
    --region $sagemaker_session.boto_region_name \
    $output_file_name


{
    "ContentType": "text/csv; charset=utf-8",
    "InvokedProductionVariant": "AllTraffic"
}
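
The same request can also be sent from Python through the `sagemaker-runtime` client in boto3 instead of the AWS CLI. The sketch below is one possible way to do it, not the notebook's original method; the wrapper name `invoke_csv_endpoint` is hypothetical. It takes the client as a parameter so it can be exercised without AWS access, and the commented usage lines assume a live endpoint and valid credentials.

```python
def invoke_csv_endpoint(runtime_client, endpoint_name, csv_bytes, content_type="text/csv"):
    """Send CSV bytes to an endpoint and return the decoded response body.

    `runtime_client` is expected to be a boto3 "sagemaker-runtime" client
    (or any object exposing a compatible invoke_endpoint method).
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType=content_type,
        Body=csv_bytes,
    )
    # The response body is a stream; read it fully and decode to text.
    return response["Body"].read().decode("utf-8")


# Usage inside the notebook (requires AWS credentials and a running endpoint):
# import boto3
# runtime = boto3.client("sagemaker-runtime",
#                        region_name=sagemaker_session.boto_region_name)
# with open(file_name, "rb") as payload:
#     result_csv = invoke_csv_endpoint(runtime, predictor.endpoint_name,
#                                      payload.read(), content_type)
# with open(output_file_name, "w") as out:
#     out.write(result_csv)
```

Writing the returned text to `output_file_name` keeps the later cells, which read that file with pandas, unchanged.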


#### D. Visualize output

In [33]:
output_df = pd.read_csv(output_file_name)
output_df.head()

Unnamed: 0.1,Unnamed: 0,Case_id,nextStep,Time(mins)
0,0,4350,6,34.7
1,1,4355,6,0.0
2,2,4356,6,94.63
3,3,4357,6,0.0
4,4,4358,6,0.0
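
Beyond looking at the first rows, a quick summary of the predictions can show how the predicted next steps are distributed and how long cases are expected to wait. The following is a minimal sketch using the standard library; the function name `summarize_predictions` is hypothetical, the column names `nextStep` and `Time(mins)` are taken from the output above, and the sample rows are the five rows shown (without the two index columns).

```python
import csv
import io
from collections import Counter


def summarize_predictions(csv_text):
    """Count predicted next steps and average the predicted time (hypothetical helper)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    step_counts = Counter(row["nextStep"] for row in rows)
    times = [float(row["Time(mins)"]) for row in rows]
    mean_time = sum(times) / len(times) if times else 0.0
    return step_counts, mean_time


sample_output = """Case_id,nextStep,Time(mins)
4350,6,34.7
4355,6,0.0
4356,6,94.63
4357,6,0.0
4358,6,0.0
"""
step_counts, mean_time = summarize_predictions(sample_output)
print(dict(step_counts), round(mean_time, 2))
```

With the full output file, the same summary is available in pandas through `output_df["nextStep"].value_counts()` and `output_df["Time(mins)"].mean()`.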


#### F. Delete the endpoint

Now that you have successfully performed a real-time inference, you no longer need the endpoint. You can delete it to avoid being charged.

In [35]:
predictor.delete_endpoint(delete_endpoint_config=True)

### 6. Perform Batch inference

In this section, you will perform batch inference using multiple input payloads together.

In [36]:
#upload the batch-transform job input files to S3
transform_input_folder = "data/test"
transform_input = sagemaker_session.upload_data(transform_input_folder, key_prefix=model_name) 
print("Transform input uploaded to " + transform_input)

Transform input uploaded to s3://sagemaker-us-east-2-786796469737/nba-test


In [None]:
#Run the batch-transform job
transformer = estimator.transformer(1, batch_transform_inference_instance_type)
transformer.transform(transform_input, content_type=content_type)
transformer.wait()

In [None]:
#output is available on following path
transformer.output_path
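
Batch transform writes one result object per input object under the output path, named after the input file with `.out` appended. The sketch below shows how the result location could be derived for a flat input prefix; the helper `batch_output_uri` and the bucket and prefix names are hypothetical, and nested input prefixes are not handled.

```python
import posixpath


def batch_output_uri(output_path, input_uri):
    """Return the expected S3 URI of a batch transform result (hypothetical helper)."""
    input_name = posixpath.basename(input_uri.rstrip("/"))
    return "{}/{}.out".format(output_path.rstrip("/"), input_name)


# Hypothetical locations, for illustration only.
result_uri = batch_output_uri(
    "s3://example-bucket/batch-output",
    "s3://example-bucket/nba-test/sample_test_file.csv",
)
print(result_uri)
```

In the notebook, `transformer.output_path` would take the place of the first argument.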

### 7. Clean-up

#### A. Delete the model

In [38]:
predictor.delete_model()

#### B. Unsubscribe to the listing (optional)

If you would like to unsubscribe from the algorithm, follow these steps. Before you cancel the subscription, ensure that you do not have any [deployable model](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package or using the algorithm. Note: You can find this information by looking at the container name associated with the model.

**Steps to unsubscribe from the product on AWS Marketplace**:
1. Navigate to __Machine Learning__ tab on [__Your Software subscriptions page__](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=mlmp_gitdemo_indust)
2. Locate the listing that you want to cancel the subscription for, and then choose __Cancel Subscription__  to cancel the subscription.

