# AWS Marketplace Product Usage Demonstration - 7Park Data transaction data parsing

**7Park Data** transaction data parsing allows you to wrangle more value out of your credit card, POS, and receipt data by identifying and extracting key entities. 

Our transaction data classifier (NER) has been trained and optimized on trillions of credit card transactions over the last 5 years. Entities covered in this solution include: 
- **Merchants / Companies** (e.g., "Starbucks")
- **Locations** (e.g., "Venice Beach, Los Angeles, CA")

F1 scores are 95% or higher for all entities on our data.

# Pre-requisites

This sample notebook requires a subscription to the following pre-trained machine learning model package from AWS Marketplace:

**[Transaction Data Parsing (NER)](https://aws.amazon.com/marketplace/pp/prodview-sqnwjvzzqntn2)**
    
If your AWS account is not yet subscribed to this listing, follow these steps:

1. Open the listing from AWS Marketplace
1. Read the **Highlights** section and then **product overview** section of the listing.
1. View **usage information** and then **additional resources.**
1. Note the supported instance types.
1. Next, click on **Continue to subscribe.**
1. Review **End user license agreement, support terms**, as well as **pricing information.**
1. If your organization agrees with the EULA, pricing information, and support terms, click the **Accept Offer** button.

**Notes:**

- If the **Continue to configuration** button is active, your account already has a subscription to this listing.
- Once you click **Continue to configuration** and choose a region, a Product Arn appears. This is the model package ARN that you need to specify when creating a deployable model. For this notebook, however, the model package ARN is already specified in the **src/model_package_arns.py** file, so you do not need to specify it explicitly.
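
The contents of **src/model_package_arns.py** are not shown in this notebook. As a rough sketch only, a provider of this kind could be a per-region lookup like the one below. The class name `ExampleModelPackageArnProvider`, the account ID, and the package name are all placeholders invented for illustration; the real values are the Product Arn strings shown on the configuration page.

```python
class ExampleModelPackageArnProvider:
    """Hypothetical stand-in for the provider in src/model_package_arns.py."""

    # Placeholder ARNs (assumption): replace each value with the Product Arn
    # shown on the "Continue to configuration" page for that region.
    _ARNS = {
        "us-east-1": "arn:aws:sagemaker:us-east-1:111122223333:model-package/example-txn-ner",
        "us-west-2": "arn:aws:sagemaker:us-west-2:111122223333:model-package/example-txn-ner",
    }

    @classmethod
    def get_model_package_arn(cls, region):
        # Fail loudly for regions that have no ARN configured
        try:
            return cls._ARNS[region]
        except KeyError:
            raise ValueError("No model package ARN configured for region: " + region)


example_arn = ExampleModelPackageArnProvider.get_model_package_arn("us-east-1")
```

Keeping the lookup keyed by region lets the same notebook run unchanged in any region where the listing is available.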

# Set up environment

In this section, we import the necessary libraries and define the IAM role and SageMaker session used throughout the notebook.

In [6]:
import json
from pprint import pprint
import sagemaker as sage
from sagemaker import get_execution_role
from sagemaker import ModelPackage

from src.model_package_arns import ModelPackageArnProvider

role = get_execution_role()

sagemaker_session = sage.Session()

# Live Inference Endpoint

## Step 1: Deploy the model for performing real-time inference.

In [138]:
# Get the model_package_arn
modelpackage_arn = ModelPackageArnProvider.get_model_package_arn(sagemaker_session.boto_region_name)

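# Define a factory function that wraps the endpoint in a JSON predictor
# (RealTimePredictor is the SageMaker Python SDK v1 name; SDK v2 renamed it to Predictor)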
def ner_detection_predict_wrapper(endpoint, session):
    return sage.RealTimePredictor(endpoint, session, content_type='application/json')

# Create a deployable model for the transaction data parsing model package.
ner_model = ModelPackage(role=role,
                         model_package_arn=modelpackage_arn,
                         sagemaker_session=sagemaker_session,
                         predictor_cls=ner_detection_predict_wrapper)

# Deploy the model
ner_predictor = ner_model.deploy(initial_instance_count=1, 
                                 instance_type='ml.m5.xlarge',
                                 endpoint_name='txn-ner-endpoint')

--------------------------------------------------------------------------------------------------------------!

## Step 2: Perform a prediction on the Amazon SageMaker endpoint

In [151]:
sample = {'instance': '63212 THE HOMEDEPOT 9325 LUBBOCK TX'}

# Perform a prediction
ner_result = ner_predictor.predict(json.dumps(sample)).decode('utf-8')

# View the prediction
pprint(json.loads(ner_result))

{'ner': [{'end_pos': 9, 'key': 'THE', 'start_pos': 6, 'type': 'NE_MERCHANT'},
         {'end_pos': 19,
          'key': 'HOMEDEPOT',
          'start_pos': 10,
          'type': 'NE_MERCHANT'},
         {'end_pos': 24,
          'key': '9325',
          'start_pos': 20,
          'type': 'NE_STORE_LOCATION'},
         {'end_pos': 32,
          'key': 'LUBBOCK',
          'start_pos': 25,
          'type': 'NE_STORE_LOCATION'},
         {'end_pos': 35,
          'key': 'TX',
          'start_pos': 33,
          'type': 'NE_STORE_LOCATION'}]}
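
The response lists one entry per token, so a multi-word merchant such as THE HOMEDEPOT arrives as separate entries. A minimal sketch of merging adjacent tokens of the same type back into entity strings is shown here. `group_entities` is a hypothetical helper, not part of the model package, and it assumes that `start_pos` and `end_pos` are character offsets into the input with an exclusive end, which is what the sample output above suggests.

```python
def group_entities(text, tokens):
    """Merge neighbouring tokens of the same type into (type, substring) pairs."""
    spans = []
    for tok in sorted(tokens, key=lambda t: t["start_pos"]):
        last = spans[-1] if spans else None
        # Merge only when the previous span has the same type and nothing but
        # whitespace separates it from the current token
        if (last is not None
                and last["type"] == tok["type"]
                and text[last["end_pos"]:tok["start_pos"]].strip() == ""):
            last["end_pos"] = tok["end_pos"]
        else:
            spans.append({"type": tok["type"],
                          "start_pos": tok["start_pos"],
                          "end_pos": tok["end_pos"]})
    return [(s["type"], text[s["start_pos"]:s["end_pos"]]) for s in spans]


# Input and tokens copied from the prediction above
text = "63212 THE HOMEDEPOT 9325 LUBBOCK TX"
tokens = [
    {"end_pos": 9, "key": "THE", "start_pos": 6, "type": "NE_MERCHANT"},
    {"end_pos": 19, "key": "HOMEDEPOT", "start_pos": 10, "type": "NE_MERCHANT"},
    {"end_pos": 24, "key": "9325", "start_pos": 20, "type": "NE_STORE_LOCATION"},
    {"end_pos": 32, "key": "LUBBOCK", "start_pos": 25, "type": "NE_STORE_LOCATION"},
    {"end_pos": 35, "key": "TX", "start_pos": 33, "type": "NE_STORE_LOCATION"},
]
entities = group_entities(text, tokens)
# entities == [("NE_MERCHANT", "THE HOMEDEPOT"),
#              ("NE_STORE_LOCATION", "9325 LUBBOCK TX")]
```

Slicing the original text rather than joining the `key` values preserves the exact spacing and punctuation of the input.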


# Batch Transform Job

Now let's use the model created above to run a batch inference job and verify that it works.

The model supports data in [jsonlines](http://jsonlines.org/) format.

In [88]:
# review input file
SAMPLE_FILE = 'data/samples.jl'

with open(SAMPLE_FILE) as f:
    print(f.read())

{"id": 0, "instance": "02/07 THE HOME DEPOT 0561 MIDLAND TX"}
{"id": 1, "instance": "NST THE HOME-Depot 682011 7545 N MESA & REMCON EL PASO TX"}
{"id": 2, "instance": "63212 THE HOMEDEPOT 9325 LUBBOCK TX"}
{"id": 3, "instance": "01-23 THE HOME DEP 7943 FLUSHING NY"}
{"id": 4, "instance": "DUNKIn #352275 Q ROCKWALL TX"}
{"id": 5, "instance": "SA DUNKIN #293874 Q BRENHAM TX"}
{"id": 6, "instance": "Da DUNKIN donuts CA"}
{"id": 7, "instance": "ASMW DUNKIN-DONUTS MA"}
{"id": 8, "instance": "Wal-Mart Su 0321 WAL-MARTS BRENHAM TX"}
{"id": 9, "instance": "MURPHY 6781 AT WALMRT TEXARKANA AR"}
{"id": 10, "instance": "wal-Mart #4367 TEXARKANA TX"}
{"id": 11, "instance": "SW 2738 WAL-marts CA"}
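
If your transactions are not yet in this format, a file like the one above could be produced along these lines. This is only a sketch: `to_json_lines` is a hypothetical helper, and the choice of sequential integers for `id` simply mirrors the sample file.

```python
import json


def to_json_lines(descriptions):
    """Serialize transaction strings as JSON Lines, one record per line."""
    records = ({"id": i, "instance": d} for i, d in enumerate(descriptions))
    return "\n".join(json.dumps(r) for r in records) + "\n"


json_lines = to_json_lines([
    "02/07 THE HOME DEPOT 0561 MIDLAND TX",
    "DUNKIn #352275 Q ROCKWALL TX",
])
# To save the result for upload (hypothetical path):
# with open('data/my_samples.jl', 'w') as f:
#     f.write(json_lines)
```

Each line is a complete JSON object, which is what allows Batch Transform to split the input by line.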


## Step 1: Upload the input file to S3

In [None]:
transform_input = sagemaker_session.upload_data(
    SAMPLE_FILE, 
    key_prefix='transaction_ner/' + SAMPLE_FILE)
print("Transform input uploaded to " + transform_input)

## Step 2: Run a new transform job

In [None]:
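# Create a transformer that runs on one ml.m5.xlarge instance and
# writes JSON Lines output, one record per line
transformer = ner_model.transformer(instance_count=1,
                                    instance_type='ml.m5.xlarge',
                                    accept='application/jsonlines',
                                    assemble_with='Line')

# Split the input by line and join each prediction with its input record
transformer.transform(
    transform_input,
    content_type='application/jsonlines',
    join_source='Input',
    split_type='Line'
)
transformer.wait()

print("Batch Transform output saved to " + transformer.output_path)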

## Step 3: Inspect the Batch Transform Output in S3

In [96]:
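from urllib.parse import urlparse

# Split the output S3 URI into bucket and key prefix
parsed_url = urlparse(transformer.output_path)
bucket_name = parsed_url.netloc
# Batch Transform names each output object after its input file plus '.out'
file_key = '{}/{}.out'.format(parsed_url.path[1:], "samples.jl")

s3_client = sagemaker_session.boto_session.client('s3')

# Read from the bucket the transform job actually wrote to
response = s3_client.get_object(Bucket=bucket_name, Key=file_key)
response_text = response['Body'].read().decode('utf-8')

# Print only the first 1000 characters of the output
print(response_text[:1000] + '...')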

{"SageMakerOutput":{"ner":[{"end_pos":9,"key":"THE","start_pos":6,"type":"NE_MERCHANT"},{"end_pos":14,"key":"HOME","start_pos":10,"type":"NE_MERCHANT"},{"end_pos":20,"key":"DEPOT","start_pos":15,"type":"NE_MERCHANT"},{"end_pos":33,"key":"MIDLAND","start_pos":26,"type":"NE_STORE_LOCATION"},{"end_pos":36,"key":"TX","start_pos":34,"type":"NE_STORE_LOCATION"}]},"id":0,"instance":"02/07 THE HOME DEPOT 0561 MIDLAND TX"}
{"SageMakerOutput":{"ner":[{"end_pos":7,"key":"THE","start_pos":4,"type":"NE_MERCHANT"},{"end_pos":12,"key":"HOME","start_pos":8,"type":"NE_MERCHANT"},{"end_pos":13,"key":"-","start_pos":12,"type":"NE_MERCHANT"},{"end_pos":18,"key":"Depot","start_pos":13,"type":"NE_MERCHANT"},{"end_pos":32,"key":"N","start_pos":31,"type":"NE_STORE_LOCATION"},{"end_pos":37,"key":"MESA","start_pos":33,"type":"NE_STORE_LOCATION"},{"end_pos":39,"key":"\u0026","start_pos":38,"type":"NE_MERCHANT"},{"end_pos":46,"key":"REMCON","start_pos":40,"type":"NE_MERCHANT"},{"end_pos":49,"key":"EL","start_pos"
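
Because the job joins each prediction with its input, every output line is a JSON object carrying `SageMakerOutput`, `id`, and `instance`. One possible way to pull the merchant tokens out per record is sketched below. `merchants_by_id` is a hypothetical helper; it should be given the full decoded output rather than the truncated print, since a partial last line is not valid JSON.

```python
import json


def merchants_by_id(output_text):
    """Map each record id to its merchant tokens, joined with spaces."""
    result = {}
    for line in output_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        tokens = record["SageMakerOutput"]["ner"]
        merchant = " ".join(t["key"] for t in tokens if t["type"] == "NE_MERCHANT")
        result[record["id"]] = merchant
    return result


# First output line from the job above, split across literals for readability
sample_output = (
    '{"SageMakerOutput":{"ner":['
    '{"end_pos":9,"key":"THE","start_pos":6,"type":"NE_MERCHANT"},'
    '{"end_pos":14,"key":"HOME","start_pos":10,"type":"NE_MERCHANT"},'
    '{"end_pos":20,"key":"DEPOT","start_pos":15,"type":"NE_MERCHANT"},'
    '{"end_pos":33,"key":"MIDLAND","start_pos":26,"type":"NE_STORE_LOCATION"},'
    '{"end_pos":36,"key":"TX","start_pos":34,"type":"NE_STORE_LOCATION"}]},'
    '"id":0,"instance":"02/07 THE HOME DEPOT 0561 MIDLAND TX"}\n'
)
merchants = merchants_by_id(sample_output)
# merchants == {0: "THE HOME DEPOT"}
```

Joining on spaces is a simplification: tokens such as the hyphen in "HOME-Depot" come back as separate entries, so slicing `instance` by `start_pos` and `end_pos` may be preferable when exact spacing matters.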

# Cleanup

In [None]:
ner_predictor.delete_endpoint()
ner_predictor.delete_model()

Finally, if the AWS Marketplace subscription was created just for an experiment and you would like to unsubscribe, follow the steps below. Before you cancel the subscription, ensure that you do not have any [deployable model](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package or using the algorithm. Note: you can find this information by looking at the container name associated with the model.

**Steps to unsubscribe from the product on AWS Marketplace:**

1. Navigate to the **Machine Learning** tab on your [Software subscriptions page](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=lbr_tab_ml).
1. Locate the listing that you need to cancel, and click **Cancel Subscription**.