# Mphasis HyperGraf Market Basket Analysis

Mphasis HyperGraf Market Basket Analysis analyzes large volumes of transactional data to uncover associations between articles and to identify products that are frequently purchased together. Mphasis HyperGraf is an omni-channel digital 360° solution that transforms enterprise decision-making by providing the most comprehensive, accurate, real-time, and actionable customer engagement insights across millions of data points spread over multiple engagement channels.

### Prerequisite

To run this algorithm, you need access to the following AWS services:
- Amazon SageMaker and the model package.
- An S3 bucket for input and output data.
- An IAM role that allows SageMaker to read input from and write output to S3.

This sample notebook shows you how to deploy Mphasis HyperGraf Market Basket Analysis using Amazon SageMaker.

> **Note**: This is a reference notebook; it cannot run unless you make the changes suggested in it.

#### Pre-requisites:
1. **Note**: This notebook contains elements which render correctly in Jupyter interface. Open this notebook from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.
1. Ensure that the IAM role used has **AmazonSageMakerFullAccess**.
1. To deploy this ML model successfully, ensure that:
    1. Either your IAM role has these three permissions and you have authority to make AWS Marketplace subscriptions in the AWS account used: 
        1. **aws-marketplace:ViewSubscriptions**
        1. **aws-marketplace:Unsubscribe**
        1. **aws-marketplace:Subscribe**  
    2. or your AWS account has a subscription to Mphasis HyperGraf Market Basket Analysis. If so, skip step: [Subscribe to the model package](#1.-Subscribe-to-the-model-package)
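If you manage your own IAM permissions, the three actions above can be granted with an identity-based policy. The sketch below only builds the policy document as a Python dict; the `"Resource": "*"` scope and the helper name `marketplace_subscription_policy` are assumptions for illustration, and your administrator may require a narrower policy.

```python
import json

# The three AWS Marketplace actions listed above.
MARKETPLACE_ACTIONS = [
    "aws-marketplace:ViewSubscriptions",
    "aws-marketplace:Unsubscribe",
    "aws-marketplace:Subscribe",
]


def marketplace_subscription_policy(resource="*"):
    """Return an IAM policy document (as a dict) that allows the subscription actions.

    The default resource "*" is an assumption; scope it down if your account requires it.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(MARKETPLACE_ACTIONS),
                "Resource": resource,
            }
        ],
    }


# Print the policy as JSON so it can be pasted into the IAM console.
print(json.dumps(marketplace_subscription_policy(), indent=2))
```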

#### Contents:
1. [Subscribe to the model package](#1.-Subscribe-to-the-model-package)
2. [Create an endpoint and perform real-time inference](#2.-Create-an-endpoint-and-perform-real-time-inference)
   1. [Create an endpoint](#A.-Create-an-endpoint)
   2. [Create input payload](#B.-Create-input-payload)
   3. [Perform real-time inference](#C.-Perform-real-time-inference)
   4. [Output Result](#D.-Output-Result)
   5. [Delete the endpoint](#E.-Delete-the-endpoint)
3. [Perform batch inference](#3.-Perform-batch-inference) 
4. [Clean-up](#4.-Clean-up)
    1. [Delete the model](#A.-Delete-the-model)
    2. [Unsubscribe to the listing (optional)](#B.-Unsubscribe-to-the-listing-(optional))
    

#### Usage instructions
You can run this notebook one cell at a time (press Shift+Enter to run a cell).

### 1. Subscribe to the model package

To subscribe to the model package:
1. Open the Mphasis HyperGraf Market Basket Analysis model package listing page in AWS Marketplace.
1. On the AWS Marketplace listing, click the **Continue to subscribe** button.
1. On the **Subscribe to this software** page, review the EULA, pricing, and support terms, and click **Accept Offer** if you and your organization agree with them.
1. Click **Continue to configuration** and then choose a **region**; a **Product Arn** is displayed. This is the model package ARN that you need to specify when creating a deployable model using Boto3. Copy the ARN corresponding to your region and specify it in the following cell.

```python
model_package_arn = 'arn:aws:sagemaker:us-east-2:786796469737:model-package/marketplace-market-basket-2'
```
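Because the ARN is region-specific, checking that its region matches the region your notebook runs in can catch a copy-paste mistake early. This is a minimal sketch; `arn_region` is a hypothetical helper that relies only on the standard `arn:partition:service:region:account-id:resource` layout of ARNs.

```python
def arn_region(arn):
    """Return the region field of an ARN (arn:partition:service:region:account-id:resource)."""
    # Split at most 5 times so that colons inside the resource part are preserved.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not a valid ARN: {arn!r}")
    return parts[3]


example_arn = "arn:aws:sagemaker:us-east-2:786796469737:model-package/marketplace-market-basket-2"
print(arn_region(example_arn))  # us-east-2
```

In the notebook, you could compare `arn_region(model_package_arn)` with `boto3.Session().region_name` and stop if they differ.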

```python
import os
import json
import uuid
from zipfile import ZipFile

import boto3
import numpy as np
import pandas as pd
import sagemaker as sage
from sagemaker import ModelPackage, get_execution_role
from IPython.display import Image, display
```

```python
role = get_execution_role()

sagemaker_session = sage.Session()

bucket = sagemaker_session.default_bucket()
bucket
```

### 2. Create an endpoint and perform real-time inference

If you want to understand how real-time inference with Amazon SageMaker works, see [Documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-hosting.html).

```python
model_name = 'mba-a'

content_type = 'text/csv'

real_time_inference_instance_type = 'ml.m5.large'
batch_transform_inference_instance_type = 'ml.m5.large'
```

#### A. Create an endpoint

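```python
from sagemaker.predictor import Predictor
from sagemaker.serializers import CSVSerializer


def predict_wrapper(endpoint, session):
    # SageMaker Python SDK v2: Predictor replaces RealTimePredictor, and the
    # request content type (text/csv) is set through a serializer.
    return Predictor(endpoint, session, serializer=CSVSerializer())


# Create a deployable model from the model package.
model = ModelPackage(role=role,
                     model_package_arn=model_package_arn,
                     sagemaker_session=sagemaker_session,
                     predictor_cls=predict_wrapper)
```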

```python
predictor = model.deploy(initial_instance_count=1,
                         instance_type=real_time_inference_instance_type,
                         endpoint_name=model_name)
```

Once the endpoint has been created, you can perform real-time inference.
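`model.deploy()` in the SageMaker Python SDK blocks until the endpoint is ready by default, but if you return to the notebook later, you may want to confirm that the endpoint still exists and is `InService` before invoking it. The sketch below uses the boto3 `endpoint_in_service` waiter and `describe_endpoint`; the helper name `wait_for_endpoint` and the optional injected `client` (useful for testing) are illustrative assumptions.

```python
def wait_for_endpoint(endpoint_name, client=None):
    """Block until a SageMaker endpoint is InService and return its status string."""
    if client is None:
        import boto3  # imported lazily so that a stub client can be passed in instead

        client = boto3.client("sagemaker")
    # boto3's built-in waiter polls DescribeEndpoint until the endpoint is InService
    # (it raises a WaiterError if the endpoint fails or the wait times out).
    client.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)
    return client.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
```

For example, `wait_for_endpoint(model_name)` returns `"InService"` once the endpoint is ready.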

#### B. Create input payload

#### Instructions

1. The input dataset must be in CSV format.
2. The input file must contain the following columns:
    * **InvoiceID**: The invoice number, a sequential code systematically assigned to each invoice and unique to it.
    * **SKUID**: The stock keeping unit ID.
    * **Item**: A string describing the item: its name along with its brand and color.
3. Several items may share the same stock keeping unit ID (SKUID), but no item (as named in the description) can have more than one SKUID.
4. The invoice numbers of return orders must contain 'C'.
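These rules can be checked locally before calling the endpoint. The sketch below builds a small, made-up input frame with the three required columns, flags any item mapped to more than one SKUID, and shows one way to pick out return orders by the 'C' in the invoice number. The data and the `validate_basket_input` and `return_orders` helpers are hypothetical, not part of the product.

```python
import pandas as pd

REQUIRED_COLUMNS = ["InvoiceID", "SKUID", "Item"]

# Hypothetical toy input; "C536367" stands for a return order.
sample = pd.DataFrame(
    {
        "InvoiceID": ["536365", "536365", "536366", "C536367"],
        "SKUID": ["SKU-1", "SKU-2", "SKU-1", "SKU-2"],
        "Item": ["RED MUG", "BLUE PLATE", "RED MUG", "BLUE PLATE"],
    }
)


def validate_basket_input(df):
    """Return a list of problems; an empty list means the checks passed."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    # Several items may share a SKUID, but each item must map to exactly one SKUID.
    skus_per_item = df.groupby("Item")["SKUID"].nunique()
    for item in skus_per_item[skus_per_item > 1].index:
        problems.append(f"item {item!r} has more than one SKUID")
    return problems


def return_orders(df):
    """Rows whose invoice number contains 'C' (return orders)."""
    return df[df["InvoiceID"].astype(str).str.contains("C", regex=False)]


print(validate_basket_input(sample))  # []
print(return_orders(sample)["InvoiceID"].tolist())  # ['C536367']
```

To check your own data, load it with `pd.read_csv` and pass the frame to `validate_basket_input`; when writing a frame back to disk, `to_csv(..., index=False)` avoids adding an extra unnamed index column.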

```python
file_name = 'sample_input3.csv'
```

#### C. Perform real-time inference

```
!aws sagemaker-runtime invoke-endpoint --endpoint-name $model_name --body fileb://$file_name --content-type 'text/csv' --region us-east-2 output.csv
```

```
{
    "ContentType": "text/csv; charset=utf-8",
    "InvokedProductionVariant": "AllTraffic"
}
```
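The same call can be made from Python with the boto3 `sagemaker-runtime` client, which avoids depending on the AWS CLI being available in the kernel. A minimal sketch; `invoke_csv_endpoint` is a hypothetical helper, and the optional `client` argument exists only so that a stub can be passed in for testing.

```python
def invoke_csv_endpoint(endpoint_name, input_path, output_path, client=None):
    """Send a CSV file to a SageMaker endpoint and write the response body to disk."""
    if client is None:
        import boto3  # imported lazily so that a stub client can be passed in instead

        client = boto3.client("sagemaker-runtime")
    with open(input_path, "rb") as f:
        response = client.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="text/csv",
            Body=f.read(),
        )
    # response["Body"] is a streaming body; read() returns the raw response bytes.
    with open(output_path, "wb") as out:
        out.write(response["Body"].read())
    return response["ContentType"]
```

For example, `invoke_csv_endpoint(model_name, file_name, "output.csv")`. If your default region differs from the endpoint's, create the client with `boto3.client("sagemaker-runtime", region_name="us-east-2")` and pass it in.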


#### D. Output Result

- The output file (in CSV format) contains the following columns:
    1. **Item in cart**: A single item added to the cart. If nothing is added, the output file contains all the rules that pass the set threshold values.
    2. **Recommendation**: The consequent of the item added to the cart.
    3. **Item Support**: The proportion of transactions containing the item in cart (the antecedent).
    4. **Support**: The proportion of transactions containing both the item in cart and the recommendation (the rule support).
    5. **Confidence**: The probability that the consequent appears in a transaction, given that the antecedent appears in it.
    6. **Lift**: Measures how much the consequent depends on the antecedent. If Lift = 1, the antecedent and consequent are independent. If Lift > 1, the probability of the consequent given the antecedent is greater than the support of the consequent.
    7. **Leverage**: The difference between the observed support of the antecedent and consequent together and the support expected if they were independent. If Leverage = 0, the antecedent and consequent are independent.
    8. **Conviction**: The dependency of the consequent on the antecedent increases with the conviction value. If Conviction = 1, the antecedent and consequent are independent.
- Generated results are sorted in decreasing order of item support and can be filtered based on rule support, confidence, and lift.
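The metrics above follow the standard association-rule definitions. The sketch below computes them for a single rule from a list of transactions; it illustrates those textbook formulas and is not the product's own implementation, and the toy transactions are made up. With this toy data the rounded values match the ones in the sample output rows, including an infinite conviction whenever confidence is 1.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Standard association-rule metrics for the rule antecedent -> consequent.

    transactions is a list of sets of item names (one set per invoice); both items
    are assumed to occur at least once, otherwise the divisions below fail.
    """
    n = len(transactions)
    sup_a = sum(antecedent in t for t in transactions) / n  # Item Support
    sup_c = sum(consequent in t for t in transactions) / n
    sup_ac = sum(antecedent in t and consequent in t for t in transactions) / n  # Support
    confidence = sup_ac / sup_a  # P(consequent | antecedent)
    lift = confidence / sup_c  # 1 means independence
    leverage = sup_ac - sup_a * sup_c  # 0 means independence
    # Conviction = (1 - sup_c) / (1 - confidence); unbounded when the rule always holds.
    conviction = float("inf") if confidence == 1 else (1 - sup_c) / (1 - confidence)
    return {
        "Item Support": round(sup_a, 5),
        "Support": round(sup_ac, 5),
        "Confidence": round(confidence, 5),
        "Lift": round(lift, 5),
        "Leverage": round(leverage, 5),
        "Conviction": conviction if conviction == float("inf") else round(conviction, 5),
    }


# Hypothetical toy data: three invoices, only one of which contains PRODUCT_1..3.
toy = [{"PRODUCT_1", "PRODUCT_2", "PRODUCT_3"}, {"PRODUCT_4"}, {"PRODUCT_5"}]
print(rule_metrics(toy, "PRODUCT_2", "PRODUCT_1"))
```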

```python
# Load the endpoint response saved by the invoke-endpoint cell.
file_name = 'output.csv'

output_df = pd.read_csv(file_name)
output_df.head()
```

| Unnamed: 0 | Item in cart | Recommendation | Item Support | Support | Confidence | Lift | Leverage | Conviction |
|---|---|---|---|---|---|---|---|---|
| 0 | PRODUCT_2 | PRODUCT_1 | 0.33333 | 0.33333 | 1.0 | 3.0 | 0.22222 | inf |
| 1 | PRODUCT_1 | PRODUCT_2 | 0.33333 | 0.33333 | 1.0 | 3.0 | 0.22222 | inf |
| 2 | PRODUCT_3 | PRODUCT_1 | 0.33333 | 0.33333 | 1.0 | 3.0 | 0.22222 | inf |
| 3 | PRODUCT_1 | PRODUCT_3 | 0.33333 | 0.33333 | 1.0 | 3.0 | 0.22222 | inf |
| 4 | PRODUCT_2 | PRODUCT_3 | 0.33333 | 0.33333 | 1.0 | 3.0 | 0.22222 | inf |
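Since the results can be filtered on rule support, confidence, and lift, a small pandas helper can apply such thresholds to `output_df` and look up the recommendations for one item. This is a sketch; the helper names and any threshold values you pass are hypothetical and not defaults of the product.

```python
import pandas as pd


def filter_rules(rules, min_support=0.0, min_confidence=0.0, min_lift=0.0):
    """Keep rules that meet every threshold, sorted by decreasing Item Support."""
    mask = (
        (rules["Support"] >= min_support)
        & (rules["Confidence"] >= min_confidence)
        & (rules["Lift"] >= min_lift)
    )
    return rules[mask].sort_values("Item Support", ascending=False)


def recommendations_for(rules, item):
    """Return the rows whose 'Item in cart' equals the given item."""
    return rules[rules["Item in cart"] == item]
```

For example, `filter_rules(output_df, min_support=0.1, min_confidence=0.5, min_lift=1.0)` keeps only rules at or above those illustrative thresholds, and `recommendations_for(output_df, "PRODUCT_1")` lists what to suggest when PRODUCT_1 is in the cart.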


#### E. Delete the endpoint

Now that you have successfully performed a real-time inference, you no longer need the endpoint. Delete it to avoid being charged.

```python
# Re-attach to the endpoint by name, then delete it together with its endpoint configuration.
predictor = sage.predictor.Predictor(model_name, sagemaker_session)
predictor.delete_endpoint(delete_endpoint_config=True)
```

### 3. Perform batch inference

In this section, you will perform batch inference on multiple input payloads at once. If you are not familiar with batch transform and want to learn more, see these links:
1. [How it works](https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-batch-transform.html)
2. [How to run a batch transform job](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-batch.html)

```python
# Upload the batch-transform job input files to S3.
transform_input_folder = "data/input/batch"
transform_input = sagemaker_session.upload_data(transform_input_folder, key_prefix=model_name)
print("Transform input uploaded to " + transform_input)
```

```python
# Run the batch-transform job.
transformer = model.transformer(1, batch_transform_inference_instance_type)
transformer.transform(transform_input, content_type=content_type)
transformer.wait()
```

```python
# The output is available at the following S3 path.
transformer.output_path
```

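```python
from urllib.parse import urlparse

# transformer.output_path has the form s3://<bucket>/<prefix>; batch transform names
# each output object after its input file with a ".out" suffix.
parsed = urlparse(transformer.output_path)
output_key = parsed.path.strip('/') + '/sample_input2.csv.out'

s3_conn = boto3.client("s3")
with open('output.csv', 'wb') as f:
    s3_conn.download_fileobj(parsed.netloc, output_key, f)
    print("Output file loaded from bucket")
```

```
Output file loaded from bucket
```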
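The download cell above hard-codes the output file name `sample_input2.csv.out`, which only works if that was the name of the uploaded input file. Because batch transform names each output object after its input file with a `.out` suffix, listing the output prefix avoids guessing. A sketch using the boto3 `list_objects_v2` paginator; `list_transform_outputs` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def list_transform_outputs(output_path, client=None):
    """Return (bucket, keys) for the *.out files under a batch-transform output path."""
    if client is None:
        import boto3  # imported lazily so that a stub client can be passed in instead

        client = boto3.client("s3")
    parsed = urlparse(output_path)
    bucket, prefix = parsed.netloc, parsed.path.lstrip("/")
    # A trailing slash keeps the listing from matching other jobs with a similar name.
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    keys = []
    for page in client.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(".out"):
                keys.append(obj["Key"])
    return bucket, keys
```

For example, call it with `transformer.output_path` and download each returned key with `s3_conn.download_file(bucket, key, local_name)`.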


```python
# Load the batch-transform output downloaded in the previous cell.
file_name = 'output.csv'

output_df = pd.read_csv(file_name)
output_df.head()
```

| Unnamed: 0 | Item in cart | Recommendation | Item Support | Support | Confidence | Lift | Leverage | Conviction |
|---|---|---|---|---|---|---|---|---|
| 0 | PRODUCT_1 | PRODUCT_2 | 0.33333 | 0.33333 | 1.0 | 3.0 | 0.22222 | inf |
| 1 | PRODUCT_2 | PRODUCT_1 | 0.33333 | 0.33333 | 1.0 | 3.0 | 0.22222 | inf |
| 2 | PRODUCT_3 | PRODUCT_1 | 0.33333 | 0.33333 | 1.0 | 3.0 | 0.22222 | inf |
| 3 | PRODUCT_1 | PRODUCT_3 | 0.33333 | 0.33333 | 1.0 | 3.0 | 0.22222 | inf |
| 4 | PRODUCT_3 | PRODUCT_2 | 0.33333 | 0.33333 | 1.0 | 3.0 | 0.22222 | inf |


### 4. Clean-up

#### A. Delete the model

```python
model.delete_model()
```

#### B. Unsubscribe to the listing (optional)

If you would like to unsubscribe from the model package, follow these steps. Before you cancel the subscription, ensure that you do not have any [deployable model](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package or using the algorithm. **Note**: You can find this information by looking at the container name associated with the model.
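To check for such models from the notebook instead of the console, you could list your models and inspect their container definitions. A sketch with boto3's paginated `list_models` and `describe_model`, assuming that a model created from a model package records it in its container's `ModelPackageName` field as the same ARN string you passed in; `models_using_package` is a hypothetical helper name. It makes one `describe_model` call per model, so it may be slow in accounts with many models.

```python
def models_using_package(model_package_arn, client=None):
    """Return names of SageMaker models whose containers reference the given model package."""
    if client is None:
        import boto3  # imported lazily so that a stub client can be passed in instead

        client = boto3.client("sagemaker")
    matches = []
    for page in client.get_paginator("list_models").paginate():
        for summary in page["Models"]:
            desc = client.describe_model(ModelName=summary["ModelName"])
            # A model has either a single PrimaryContainer or a list of Containers.
            containers = [desc.get("PrimaryContainer", {})] + desc.get("Containers", [])
            if any(c.get("ModelPackageName") == model_package_arn for c in containers):
                matches.append(summary["ModelName"])
    return matches
```

An empty result from `models_using_package(model_package_arn)` suggests no remaining deployable models reference the package.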

**Steps to unsubscribe to product from AWS Marketplace**:
1. Navigate to the __Machine Learning__ tab on the [__Your Software subscriptions page__](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=mlmp_gitdemo_indust).
2. Locate the listing that you want to cancel the subscription for, and then choose __Cancel Subscription__ to cancel the subscription.