# NLP Online Explainability with SageMaker Clarify

---

1. [Introduction](#Introduction)
1. [General Setup](#General-Setup)
    1. [Install dependencies](#Install-dependencies)
    1. [Import libraries](#Import-libraries)
    1. [Set configurations](#Set-configurations)
    1. [Create serializer and deserializer](#Create-serializer-and-deserializer)
    1. [For visualization](#For-visualization)
1. [Prepare data](#Prepare-data)
    1. [Download data](#Download-data)
    1. [Load the dataset](#Load-the-dataset)
    1. [Data preparation for model training](#Data-preparation-for-model-training)
    1. [Upload the dataset](#Upload-the-dataset)
1. [Train and Deploy Hugging Face Model](#Train-and-Deploy-Hugging-Face-Model)
    1. [Train model with Hugging Face estimator](#Train-model-with-Hugging-Face-estimator)
    1. [Download the trained model files](#Download-the-trained-model-files)
    1. [Prepare model container definition](#Prepare-model-container-definition)
1. [Create endpoint](#Create-endpoint)
    1. [Create model](#Create-model)
    1. [Create endpoint config](#Create-endpoint-config)
    1. [Create endpoint](#Create-endpoint)
1. [Invoke endpoint](#Invoke-endpoint)
    1. [Single record request](#Single-record-request)
    1. [Single record request, no explanation](#Single-record-request,-no-explanation)
    1. [Batch request, explain both](#Batch-request,-explain-both)
    1. [Batch request with more records, explain some of the records](#Batch-request-with-more-records,-explain-some-of-the-records)
1. [Cleanup](#Cleanup)

## Introduction

Amazon SageMaker Clarify helps improve your machine learning models by detecting potential bias and helping explain how these models make predictions. The fairness and explainability functionality provided by SageMaker Clarify takes a step towards enabling AWS customers to build trustworthy and understandable machine learning models. 

SageMaker Clarify currently supports explainability for SageMaker models as an offline processing job. This example notebook showcases a new feature for explainability on a [SageMaker real-time inference](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints.html) endpoint, a.k.a. [Online Explainability](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-online-explainability.html).

This example notebook walks you through:  
1. Key terms and concepts needed to understand SageMaker Clarify.
1. Training a model on the Women's E-Commerce Clothing Reviews dataset.
1. Creating a model from the trained model artifacts, creating an endpoint configuration with the SageMaker Clarify explainer configuration, and creating an endpoint from that endpoint configuration.
1. Invoking the endpoint with single-record and batch requests, using different values of the `EnableExplanations` parameter.
1. Explaining the importance of the input features to the model's decision.


## General Setup

We recommend that you use the `Python 3 (Data Science)` kernel on SageMaker Studio or the `conda_python3` kernel on a SageMaker notebook instance.

### Install dependencies

Install the required dependencies. `datasets[s3]` and `transformers` are used for data preparation and training, and `captum` is used to visualize the feature attributions.

In [1]:
! pip install -r requirements.txt --upgrade --quiet

### Import libraries

In [2]:
import boto3
import csv
import pandas as pd
import numpy as np
import pprint
import tarfile

from sagemaker.huggingface import HuggingFace
from datasets import Dataset
from datasets.filesystems import S3FileSystem
from captum.attr import visualization
from sklearn.model_selection import train_test_split
from sagemaker import get_execution_role, Session
from sagemaker.s3 import S3Uploader
from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import JSONDeserializer
from sagemaker.utils import unique_name_from_base

### Set configurations

In [3]:
boto3_session = boto3.session.Session()
sagemaker_client = boto3.client("sagemaker")
sagemaker_runtime_client = boto3.client("sagemaker-runtime")

# Initialize sagemaker session
sagemaker_session = Session(
    boto_session=boto3_session,
    sagemaker_client=sagemaker_client,
    sagemaker_runtime_client=sagemaker_runtime_client,
)

region = sagemaker_session.boto_region_name
print(f"Region: {region}")

role = get_execution_role()
print(f"Role: {role}")

prefix = unique_name_from_base("DEMO-NLP-Women-Clothing")

s3_bucket = sagemaker_session.default_bucket()
s3_prefix = f"sagemaker/{prefix}"
s3_key = f"s3://{s3_bucket}/{s3_prefix}"
print(f"Demo S3 key: {s3_key}")

model_name = f"{prefix}-model"
print(f"Demo model name: {model_name}")
endpoint_config_name = f"{prefix}-endpoint-config"
print(f"Demo endpoint config name: {endpoint_config_name}")
endpoint_name = f"{prefix}-endpoint"
print(f"Demo endpoint name: {endpoint_name}")

# SageMaker Clarify model directory name
model_path = "model/"

# Instance type for training and hosting
instance_type = "ml.m5.xlarge"

Region: us-west-2
Role: arn:aws:iam::000000000000:role/service-role/SMClarifySageMaker-ExecutionRole
Demo S3 key: s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-NLP-Women-Clothing-1687464029-bfff
Demo model name: DEMO-NLP-Women-Clothing-1687464029-bfff-model
Demo endpoint config name: DEMO-NLP-Women-Clothing-1687464029-bfff-endpoint-config
Demo endpoint name: DEMO-NLP-Women-Clothing-1687464029-bfff-endpoint


### Create serializer and deserializer

Create a CSV serializer to serialize the test data to a string.

In [4]:
csv_serializer = CSVSerializer()

Create a JSON deserializer to deserialize the endpoint's response.

In [5]:
json_deserializer = JSONDeserializer()

### For visualization
Some visualization methods are implemented in the `visualization_utils.py` file.

In [6]:
%run visualization_utils.py

## Prepare data

### Download data
Data Source: `https://www.kaggle.com/nicapotato/womens-ecommerce-clothing-reviews/`

The Women’s E-Commerce Clothing Reviews dataset has been made available under a Creative Commons Public Domain license. A copy of the dataset has been saved in a sample data Amazon S3 bucket. The following cell downloads that copy.

In [7]:
s3 = boto3.client("s3")
s3.download_file(
    f"sagemaker-example-files-prod-{region}",
    "datasets/tabular/womens_clothing_ecommerce/Womens_Clothing_E-Commerce_Reviews.csv",
    "womens_clothing_reviews_dataset.csv",
)

### Load the dataset

In [8]:
df = pd.read_csv("womens_clothing_reviews_dataset.csv", index_col=[0])
df.head()

| | Clothing ID | Age | Title | Review Text | Rating | Recommended IND | Positive Feedback Count | Division Name | Department Name | Class Name |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 767 | 33 | | Absolutely wonderful - silky and sexy and comf... | 4 | 1 | 0 | Initmates | Intimate | Intimates |
| 1 | 1080 | 34 | | Love this dress! it's sooo pretty. i happene... | 5 | 1 | 4 | General | Dresses | Dresses |
| 2 | 1077 | 60 | Some major design flaws | I had such high hopes for this dress and reall... | 3 | 0 | 0 | General | Dresses | Dresses |
| 3 | 1049 | 50 | My favorite buy! | I love, love, love this jumpsuit. it's fun, fl... | 5 | 1 | 0 | General Petite | Bottoms | Pants |
| 4 | 847 | 47 | Flattering shirt | This shirt is very flattering to all due to th... | 5 | 1 | 6 | General | Tops | Blouses |


**Context**

The Women’s Clothing E-Commerce dataset contains reviews written by customers. Because the dataset contains real commercial data, it has been anonymized, and any references to the company in the review text and body have been replaced with “retailer”.



**Content**

The dataset contains 23486 rows and 10 columns. Each row corresponds to a customer review.

The columns include:

* Clothing ID: Integer Categorical variable that refers to the specific piece being reviewed.
* Age: Positive Integer variable of the reviewer's age.
* Title: String variable for the title of the review.
* Review Text: String variable for the review body.
* Rating: Positive Ordinal Integer variable for the product score granted by the customer from 1 Worst, to 5 Best.
* Recommended IND: Binary variable stating whether the customer recommends the product, where 1 is recommended and 0 is not recommended.
* Positive Feedback Count: Positive Integer documenting the number of other customers who found this review positive.
* Division Name: Categorical name of the product high level division.
* Department Name: Categorical name of the product department name.
* Class Name: Categorical name of the product class name.

**Goal**

To predict the sentiment of a review based on the text, and then explain the predictions using SageMaker Clarify.

### Data preparation for model training

#### Target Variable Creation
Since the dataset does not contain a column that indicates the sentiment of the customer reviews, let's create one. We assume that reviews with a `Rating` of 4 or higher indicate positive sentiment and reviews with a `Rating` of 2 or lower indicate negative sentiment. We also assume that a `Rating` of 3 indicates neutral sentiment, and we exclude these rows from the dataset. Additionally, because we predict sentiment from the `Review Text` column, we remove rows where `Review Text` is empty.


In [None]:
def create_target_column(df, min_positive_score, max_negative_score):
    # Drop neutral ratings, i.e. those strictly between the two thresholds
    is_neutral = (df["Rating"] > max_negative_score) & (df["Rating"] < min_positive_score)
    df = df[~is_neutral].copy()  # copy to avoid pandas' SettingWithCopyWarning
    # Label the remaining reviews: 1 = positive sentiment, 0 = negative sentiment
    df["Sentiment"] = (df["Rating"] >= min_positive_score).astype(int)
    return df


df = create_target_column(df, 4, 2)
df = df[~df["Review Text"].isna()]
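
The labeling rule described above can be checked on individual ratings without pandas. The following is a minimal sketch: `rating_to_sentiment` is a hypothetical helper that is not part of the notebook, and its default thresholds mirror the `create_target_column(df, 4, 2)` call.

```python
def rating_to_sentiment(rating, min_positive_score=4, max_negative_score=2):
    """Map a 1-5 rating to 1 (positive), 0 (negative), or None (neutral, dropped)."""
    if rating >= min_positive_score:
        return 1
    if rating <= max_negative_score:
        return 0
    return None  # neutral ratings are excluded from the dataset


# Apply the rule to every possible rating
labels = {rating: rating_to_sentiment(rating) for rating in range(1, 6)}
print(labels)
```

Only a `Rating` of 3 falls between the two thresholds, so it is the only value that maps to `None`.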

#### Train-Validation-Test splits

The most common approach to model evaluation is a train/validation/test split. Although this approach is effective in general, it can give misleading results on classification problems with severe class imbalance. Instead, we stratify the sampling by the class label, as below. Stratification ensures that every class is represented in the same proportion across the train, validation, and test datasets.


In [10]:
target = "Sentiment"
cols = "Review Text"

X = df[cols]
y = df[target]

# Data split: 10% test first, then 11% of the remaining 90% (train and validation) as validation (~10%), giving roughly an 80:10:10 split
test_dataset_size = 0.10
val_dataset_size = 0.11
RANDOM_STATE = 42

# Stratified train-val-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=test_dataset_size, stratify=y, random_state=RANDOM_STATE
)
X_train, X_val, y_train, y_val = train_test_split(
    X_train,
    y_train,
    test_size=val_dataset_size,
    stratify=y_train,
    random_state=RANDOM_STATE,
)

print(
    "Dataset: train ",
    X_train.shape,
    y_train.shape,
    y_train.value_counts(dropna=False, normalize=True).to_dict(),
)
print(
    "Dataset: validation ",
    X_val.shape,
    y_val.shape,
    y_val.value_counts(dropna=False, normalize=True).to_dict(),
)
print(
    "Dataset: test ",
    X_test.shape,
    y_test.shape,
    y_test.value_counts(dropna=False, normalize=True).to_dict(),
)

# Combine the independent columns with the label
df_train = pd.concat([X_train, y_train], axis=1).reset_index(drop=True)
df_test = pd.concat([X_test, y_test], axis=1).reset_index(drop=True)
df_val = pd.concat([X_val, y_val], axis=1).reset_index(drop=True)

Dataset: train  (15874,) (15874,) {1: 0.8804334131283861, 0: 0.11956658687161396}
Dataset: validation  (1962,) (1962,) {1: 0.8802242609582059, 0: 0.11977573904179409}
Dataset: test  (1982,) (1982,) {1: 0.8804238143289607, 0: 0.11957618567103935}
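
As a quick arithmetic check, the row counts printed above can be converted into split fractions. This is a minimal sketch that uses only the numbers in the output and the two split sizes from the cell.

```python
# Row counts copied from the cell output above
n_train, n_val, n_test = 15874, 1962, 1982
n_total = n_train + n_val + n_test

fractions = {
    "train": n_train / n_total,
    "validation": n_val / n_total,
    "test": n_test / n_total,
}
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.3f}")

# The validation share is val_dataset_size applied to the ~90% left after the test split
print(f"0.11 * 0.90 = {0.11 * 0.90:.3f}")
```

The fractions come out close to 80:10:10, which is why `val_dataset_size` is set to 0.11 rather than 0.10.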


In [11]:
headers = df_test.columns.to_list()
feature_headers = headers[0]
label_header = headers[1]
print(f"Feature names: {feature_headers}")
print(f"Label name: {label_header}")
print("Test data (without label column):")
test_data = df_test.iloc[:, :1]
test_data

Feature names: Review Text
Label name: Sentiment
Test data (without label column):


| | Review Text |
|---|---|
| 0 | I am 5'6", 130 lbs with an athletic body type ... |
| 1 | The design on the blue sweater is actually a d... |
| 2 | The colors are so much brighter than pictured.... |
| 3 | A very versatile and cozy top. would look grea... |
| 4 | Just not cute. i don't know how else to descri... |
| ... | ... |
| 1977 | As soon as i opened the package, i knew that t... |
| 1978 | As the title suggests, i am very skeptical and... |
| 1979 | I love this dress. i'm 6' so it's a tad bit sh... |
| 1980 | I love the concept of this dress. i love the s... |

We have split the dataset into train, validation, and test datasets. We use the train and validation datasets during the training process, and run Clarify on the test dataset.

### Upload the dataset
Here, we upload the prepared datasets to an S3 bucket so that we can train the model with the Hugging Face estimator.

In [12]:
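df_train.to_csv("train.csv", index=False, header=False)
# The training job reads its evaluation data from test.csv (see the test_file
# hyperparameter), so the validation split is saved under that name
df_val.to_csv("test.csv", index=False, header=False)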

In [13]:
training_input_path = f"{s3_key}/train"
print(f"training input path: {training_input_path}")
val_input_path = f"{s3_key}/test"
print(f"validation input path: {val_input_path}")

train_uri = S3Uploader.upload("train.csv", training_input_path)
test_uri = S3Uploader.upload("test.csv", val_input_path)

training input path: s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-NLP-Women-Clothing-1687464029-bfff/train
validation input path: s3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-NLP-Women-Clothing-1687464029-bfff/test


## Train and Deploy Hugging Face Model

In this step of the workflow, we use the [Hugging Face Estimator](https://sagemaker.readthedocs.io/en/stable/frameworks/huggingface/sagemaker.huggingface.html) to load the pre-trained `distilbert-base-uncased` model and fine-tune the model on our dataset.

### Train model with Hugging Face estimator
The hyperparameters defined below are passed to the custom PyTorch code in [`scripts/train.py`](./scripts/train.py). The only required parameter is `model_name`. The other parameters, such as `epochs` and `train_batch_size`, have default values that can be overridden by setting them here.

The training job requires a GPU instance type. Here, we use `ml.g4dn.xlarge`.
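
SageMaker passes hyperparameters to the entry point as command-line arguments of the form `--name value`. The contents of `scripts/train.py` are not shown here, so the following is only a sketch of how such a script might read them: the function name and the default values are assumptions, not the notebook's actual code.

```python
import argparse


def parse_hyperparameters(argv=None):
    """Parse hyperparameters the way a SageMaker training script typically does."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_name", type=str, required=True)
    parser.add_argument("--epochs", type=int, default=3)  # assumed default
    parser.add_argument("--train_batch_size", type=int, default=32)  # assumed default
    parser.add_argument("--train_file", type=str, default="train.csv")  # assumed default
    parser.add_argument("--test_file", type=str, default="test.csv")  # assumed default
    # Ignore any extra arguments that the container passes along
    args, _ = parser.parse_known_args(argv)
    return args


args = parse_hyperparameters(["--model_name", "distilbert-base-uncased", "--epochs", "1"])
print(args.model_name, args.epochs, args.train_batch_size)
```

Values set in the `hyperparameters` dictionary override the defaults, and anything omitted falls back to the script's own default.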

In [14]:
training_input_path

's3://sagemaker-us-west-2-000000000000/sagemaker/DEMO-NLP-Women-Clothing-1687464029-bfff/train'

In [15]:
# Hyperparameters passed into the training job
hyperparameters = {
    "epochs": 1,
    "model_name": "distilbert-base-uncased",
    "train_file": "train.csv",
    "test_file": "test.csv",
}

huggingface_estimator = HuggingFace(
    entry_point="train.py",
    source_dir="scripts",
    instance_type="ml.g4dn.xlarge",
    instance_count=1,
    transformers_version="4.6.1",
    pytorch_version="1.7.1",
    py_version="py36",
    role=role,
    hyperparameters=hyperparameters,
    disable_profiler=True,
    debugger_hook_config=False,
)

# starting the train job with our uploaded datasets as input
huggingface_estimator.fit({"train": training_input_path, "test": val_input_path}, logs=True)

INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.
INFO:sagemaker:Creating training-job with name: huggingface-pytorch-training-2023-06-22-20-00-31-761


Using provided s3_resource
2023-06-22 20:00:32 Starting - Starting the training job...
2023-06-22 20:00:47 Starting - Preparing the instances for training......
2023-06-22 20:01:51 Downloading - Downloading input data...
2023-06-22 20:02:16 Training - Downloading the training image...............
2023-06-22 20:04:42 Training - Training image download completed. Training in progress.
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2023-06-22 20:04:55,537 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2023-06-22 20:04:55,567 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2023-06-22 20:04:55,571 sagemaker_pytorch_container.training INFO     Invoking user training script.
2023-06-22 20:04:55,825 sagemaker-training-toolkit INFO     Invoking user script
Training Env:
{
    "addi

### Download the trained model files

In [16]:
! aws s3 cp {huggingface_estimator.model_data} model.tar.gz
! mkdir -p {model_path}
! tar -xvf model.tar.gz -C  {model_path}/

download: s3://sagemaker-us-west-2-000000000000/huggingface-pytorch-training-2023-06-22-20-00-31-761/output/model.tar.gz to ./model.tar.gz
tokenizer.json
training_args.bin
tokenizer_config.json
special_tokens_map.json
config.json
vocab.txt
pytorch_model.bin


### Prepare model container definition

We are going to use the trained model files along with the HuggingFace Inference container to deploy the model to a SageMaker endpoint.

In [17]:
with tarfile.open("hf_model.tar.gz", mode="w:gz") as archive:
    archive.add(model_path, recursive=True)
    archive.add("code/")
directory_name = s3_prefix.split("/")[-1]
zipped_model_path = sagemaker_session.upload_data(
    path="hf_model.tar.gz", key_prefix=directory_name + "/hf-model-sm"
)
zipped_model_path

's3://sagemaker-us-west-2-000000000000/DEMO-NLP-Women-Clothing-1687464029-bfff/hf-model-sm/hf_model.tar.gz'
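
Before uploading, it can help to confirm that the archive contains both the model files and the inference code. The sketch below lists the members of a `.tar.gz` file. `list_archive_members` is a hypothetical helper, and the demo archive is built from placeholder files; the `code/inference.py` location is an assumption based on the `code/` directory added above and the `SAGEMAKER_PROGRAM` setting used later.

```python
import os
import tarfile
import tempfile


def list_archive_members(archive_path):
    """Return the sorted member names of a .tar.gz archive."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return sorted(archive.getnames())


# Build a throwaway archive with placeholder files to show the expected layout
with tempfile.TemporaryDirectory() as workdir:
    for relative_path in ("model/config.json", "code/inference.py"):
        full_path = os.path.join(workdir, relative_path)
        os.makedirs(os.path.dirname(full_path), exist_ok=True)
        with open(full_path, "w") as f:
            f.write("placeholder")

    demo_archive = os.path.join(workdir, "demo_model.tar.gz")
    with tarfile.open(demo_archive, mode="w:gz") as archive:
        archive.add(os.path.join(workdir, "model"), arcname="model")
        archive.add(os.path.join(workdir, "code"), arcname="code")

    members = list_archive_members(demo_archive)

print(members)
```

In the notebook, the same check would be `list_archive_members("hf_model.tar.gz")`.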

Create a new model object and then update its model artifact and inference script. The model object will be used to create the SageMaker model.

In [18]:
model = huggingface_estimator.create_model(name=model_name)
container_def = model.prepare_container_def(instance_type=instance_type)
container_def["ModelDataUrl"] = zipped_model_path
container_def["Environment"]["SAGEMAKER_PROGRAM"] = "inference.py"
pprint.pprint(container_def)

{'Environment': {'SAGEMAKER_CONTAINER_LOG_LEVEL': '20',
                 'SAGEMAKER_PROGRAM': 'inference.py',
                 'SAGEMAKER_REGION': 'us-west-2',
                 'SAGEMAKER_SUBMIT_DIRECTORY': ''},
 'Image': '763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-inference:1.7.1-transformers4.6.1-cpu-py36-ubuntu18.04',
 'ModelDataUrl': 's3://sagemaker-us-west-2-000000000000/DEMO-NLP-Women-Clothing-1687464029-bfff/hf-model-sm/hf_model.tar.gz'}


## Create endpoint

### Create model

The following parameters are required to create a SageMaker model:

* `ExecutionRoleArn`: The ARN of the IAM role that Amazon SageMaker can assume to access the model artifacts and Docker images for deployment.

* `ModelName`: The name of the SageMaker model.

* `PrimaryContainer`: The location of the primary Docker image containing inference code, associated artifacts, and the custom environment map that the inference code uses when the model is deployed for predictions.

In [19]:
sagemaker_client.create_model(
    ExecutionRoleArn=role,
    ModelName=model_name,
    PrimaryContainer=container_def,
)
print(f"Model created: {model_name}")

Model created: DEMO-NLP-Women-Clothing-1687464029-bfff-model


### Create endpoint config

Create an endpoint configuration by calling the `create_endpoint_config` API. Supply the same `model_name` used in the `create_model` API call. The `create_endpoint_config` API supports the additional parameter `ClarifyExplainerConfig` to enable the Clarify explainer. The SHAP baseline is mandatory; it can be provided either as inline baseline data (the `ShapBaseline` parameter) or as an S3 baseline file (the `ShapBaselineUri` parameter). The baseline dataset must have the same type as the input dataset, and baseline samples must include only features. For more details on baseline selection, refer to [this documentation](https://docs.aws.amazon.com/en_us/sagemaker/latest/dg/clarify-feature-attribute-shap-baselines.html).

Please see [the API documentation](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateEndpointConfig.html) for details on other config parameters.

Here we use a special token as the baseline.

In [20]:
baseline = [["<UNK>"]]
print(f"SHAP baseline: {baseline}")

SHAP baseline: [['<UNK>']]
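
The two ways of supplying the baseline can be sketched as a small helper that builds the `ShapBaselineConfig` dictionary. `shap_baseline_config` is a hypothetical function, the S3 URI is a placeholder, and the inline rows are serialized with the standard `csv` module here as a stand-in for the notebook's `csv_serializer`.

```python
import csv
import io


def shap_baseline_config(rows=None, s3_uri=None):
    """Build a ShapBaselineConfig dict from inline rows or an S3 URI (exactly one)."""
    if (rows is None) == (s3_uri is None):
        raise ValueError("Provide exactly one of rows or s3_uri")
    if s3_uri is not None:
        return {"MimeType": "text/csv", "ShapBaselineUri": s3_uri}
    # Serialize the inline rows to CSV text, one baseline record per line
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return {"MimeType": "text/csv", "ShapBaseline": buffer.getvalue().rstrip("\n")}


inline_config = shap_baseline_config(rows=[["<UNK>"]])
uri_config = shap_baseline_config(s3_uri="s3://my-bucket/baseline.csv")  # hypothetical URI
print(inline_config)
print(uri_config)
```

An inline baseline is convenient for a single short record like the one used here, while a baseline file in S3 suits larger baseline datasets.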


The `TextConfig` is configured with `sentence`-level granularity and English as the language. With `sentence` granularity, each sentence is a feature, so a review needs a few sentences for a good visualization.

In [21]:
sagemaker_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "VariantName": "TestVariant",
            "ModelName": model_name,
            "InitialInstanceCount": 1,
            "InstanceType": instance_type,
        }
    ],
    ExplainerConfig={
        "ClarifyExplainerConfig": {
            "InferenceConfig": {"FeatureTypes": ["text"]},
            "ShapConfig": {
                "ShapBaselineConfig": {"ShapBaseline": csv_serializer.serialize(baseline)},
                "TextConfig": {"Granularity": "sentence", "Language": "en"},
            },
        }
    },
)

{'EndpointConfigArn': 'arn:aws:sagemaker:us-west-2:000000000000:endpoint-config/demo-nlp-women-clothing-1687464029-bfff-endpoint-config',
 'ResponseMetadata': {'RequestId': 'ad8d98fe-ac16-4227-b80b-570abb94ac58',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'ad8d98fe-ac16-4227-b80b-570abb94ac58',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '136',
   'date': 'Thu, 22 Jun 2023 20:10:54 GMT'},
  'RetryAttempts': 0}}
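
To make the `sentence` granularity configured above concrete, the sketch below splits a review into sentence-level units, each of which would receive its own attribution. The regular-expression splitter is a naive illustration only; the segmentation that Clarify performs is done by the service and may differ. The review text is a made-up example.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


review = "I ordered a size small. These were really baggy! The fabric quality is very nice."
sentences = split_sentences(review)
for index, sentence in enumerate(sentences):
    print(index, sentence)
```

A review with a single sentence would yield a single feature, which is why reviews with several sentences make for more informative visualizations.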

### Create endpoint

Once the model and endpoint configuration are ready, use the `create_endpoint` API to create the endpoint. The `endpoint_name` must be unique within an AWS Region in your AWS account. The `create_endpoint` call returns immediately, with the endpoint in the `Creating` state; provisioning continues in the background.

In [22]:
sagemaker_client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name,
)

{'EndpointArn': 'arn:aws:sagemaker:us-west-2:000000000000:endpoint/demo-nlp-women-clothing-1687464029-bfff-endpoint',
 'ResponseMetadata': {'RequestId': 'd8f02d0a-b221-4dc3-8a58-317eb9077844',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'd8f02d0a-b221-4dc3-8a58-317eb9077844',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '116',
   'date': 'Thu, 22 Jun 2023 20:10:55 GMT'},
  'RetryAttempts': 0}}

Wait for the endpoint to reach the `InService` state.

In [23]:
sagemaker_session.wait_for_endpoint(endpoint_name)

--!

{'EndpointName': 'DEMO-NLP-Women-Clothing-1687464029-bfff-endpoint',
 'EndpointArn': 'arn:aws:sagemaker:us-west-2:000000000000:endpoint/demo-nlp-women-clothing-1687464029-bfff-endpoint',
 'EndpointConfigName': 'DEMO-NLP-Women-Clothing-1687464029-bfff-endpoint-config',
 'ProductionVariants': [{'VariantName': 'TestVariant',
   'DeployedImages': [{'SpecifiedImage': '763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-inference:1.7.1-transformers4.6.1-cpu-py36-ubuntu18.04',
     'ResolvedImage': '763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-inference@sha256:97cdf11484b82818b195579c7b5d8f16bc97d600ae352f47667e0587de7ae7f0',
     'ResolutionTime': datetime.datetime(2023, 6, 22, 20, 10, 56, 696000, tzinfo=tzlocal())}],
   'CurrentWeight': 1.0,
   'DesiredWeight': 1.0,
   'CurrentInstanceCount': 1,
   'DesiredInstanceCount': 1}],
 'EndpointStatus': 'InService',
 'CreationTime': datetime.datetime(2023, 6, 22, 20, 10, 56, 158000, tzinfo=tzlocal()),
 'LastModified
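
For readers who want to see what waiting for the endpoint involves, the following is a minimal polling sketch. `wait_until_in_service` is a hypothetical helper, not the SDK's implementation; it takes any callable with the shape of `sagemaker_client.describe_endpoint`, and a stand-in is used here so that the example runs without AWS access.

```python
import time


def wait_until_in_service(describe_endpoint, endpoint_name, poll_seconds=30, max_polls=60):
    """Poll until the endpoint is InService; raise if it fails or the polls run out."""
    for _ in range(max_polls):
        status = describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"Endpoint {endpoint_name} failed to deploy")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Endpoint {endpoint_name} not in service after {max_polls} polls")


def make_fake_describe(statuses):
    """Stand-in for sagemaker_client.describe_endpoint that replays the given statuses."""
    remaining = iter(statuses)

    def describe(EndpointName):
        return {"EndpointName": EndpointName, "EndpointStatus": next(remaining)}

    return describe


fake_describe = make_fake_describe(["Creating", "Creating", "InService"])
final_status = wait_until_in_service(fake_describe, "demo-endpoint", poll_seconds=0)
print(final_status)
```

With a real client, the call would be `wait_until_in_service(sagemaker_client.describe_endpoint, endpoint_name)`.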

## Invoke endpoint

There are expanding business needs and legislative regulations that require explanations of _why_ a model made the decision it did. SageMaker Clarify uses SHAP to explain the contribution that each input feature makes to the final decision.

The Kernel SHAP algorithm requires a baseline (also known as a background dataset). The baseline is either an S3 URI of a baseline dataset file or an inline list of records. The baseline dataset must have the same type as the original request data, and baseline records must include only features.
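
To illustrate the role of the baseline, the sketch below computes exact Shapley values for a toy scorer over three sentences, replacing any "absent" sentence with the baseline token. This is not Clarify's implementation: Kernel SHAP approximates these values by sampling, while this example enumerates every subset, which is feasible only for a handful of features. `toy_model` and the sentences are made up for illustration.

```python
from itertools import combinations
from math import factorial


def toy_model(sentences):
    """Hypothetical scorer that reacts to two keywords; not a real model."""
    score = 0.5
    for sentence in sentences:
        if "love" in sentence:
            score += 0.3
        if "baggy" in sentence:
            score -= 0.2
    return score


def shapley_values(sentences, model, baseline="<UNK>"):
    """Exact Shapley value of each sentence, masking absent ones with the baseline."""
    n = len(sentences)
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                with_i = [sentences[j] if (j in subset or j == i) else baseline for j in range(n)]
                without_i = [sentences[j] if j in subset else baseline for j in range(n)]
                phi += weight * (model(with_i) - model(without_i))
        values.append(phi)
    return values


sentences = ["I love the fabric.", "They were baggy.", "Arrived on time."]
attributions = shapley_values(sentences, toy_model)
for sentence, value in zip(sentences, attributions):
    print(f"{value:+.2f}  {sentence}")
```

The attributions sum to the difference between the score of the full review and the score of the all-baseline input, which is why the choice of baseline affects every attribution.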

Below are several different endpoint invocations. Run them one by one, and visualize the explanations by running the cell that follows each.

### Single record request

Put only one record in the request body, and then send the request to the endpoint to get its predictions and explanations.

In [24]:
num_records = 1

In [25]:
response = sagemaker_runtime_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="text/csv",
    Accept="text/csv",
    Body=csv_serializer.serialize(test_data.iloc[:num_records, :].to_numpy()),
)
pprint.pprint(response)

{'Body': <botocore.response.StreamingBody object at 0x7fbc8bb4ffd0>,
 'ContentType': 'application/json',
 'InvokedProductionVariant': 'TestVariant',
 'ResponseMetadata': {'HTTPHeaders': {'connection': 'keep-alive',
                                      'content-length': '809',
                                      'content-type': 'application/json',
                                      'date': 'Thu, 22 Jun 2023 20:13:28 GMT',
                                      'x-amzn-invoked-production-variant': 'TestVariant',
                                      'x-amzn-requestid': '3acca534-1feb-42dc-b322-9f8d64f27e75'},
                      'HTTPStatusCode': 200,
                      'RequestId': '3acca534-1feb-42dc-b322-9f8d64f27e75',
                      'RetryAttempts': 0}}


In [26]:
result = json_deserializer.deserialize(response["Body"], content_type=response["ContentType"])
pprint.pprint(result)

{'explanations': {'kernel_shap': [[{'attributions': [{'attribution': [0.06300842799999996],
                                                      'description': {'partial_text': 'I '
                                                                                      'am '
                                                                                      '5\'6", '
                                                                                      '130 '
                                                                                      'lbs '
                                                                                      'with '
                                                                                      'an '
                                                                                      'athletic '
                                                                                      'body '
                                                                 

In [27]:
visualize_result(result, df_test[label_header][:num_records])

| True Label | Predicted Label | Attribution Label | Attribution Score | Word Importance |
|---|---|---|---|---|
| 0.0 | 1 (0.97) | True | 0.92 | I am 5'6", 130 lbs with an athletic body type and i ordered a size small. these were really baggy in the thigh/quadricep area and made my thighs look bulky. the fabric quality is very nice and i like the idea of them for curvier body types. my son commented that they looked like pajama pants and i agreed. |
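
The deserialized response can also be inspected without the visualization helper. The sketch below walks the `explanations` structure shown in the output above and collects each text unit with its attribution. `summarize_explanations` is a hypothetical helper, and the sample response is hand-written with made-up values; a record that was not explained appears as `None`, as in the filtered batch request later in this notebook.

```python
def summarize_explanations(result):
    """Return, per record, a list of (partial_text, attribution) pairs, or None."""
    summaries = []
    for record in result.get("explanations", {}).get("kernel_shap", []):
        if record is None:  # this record was not explained
            summaries.append(None)
            continue
        pairs = []
        for feature in record:
            for item in feature["attributions"]:
                pairs.append((item["description"]["partial_text"], item["attribution"][0]))
        summaries.append(pairs)
    return summaries


# Hand-written fragment with made-up values, shaped like the response above
sample_result = {
    "explanations": {
        "kernel_shap": [
            [
                {
                    "attributions": [
                        {"attribution": [0.06], "description": {"partial_text": "First sentence."}},
                        {"attribution": [-0.02], "description": {"partial_text": "Second sentence."}},
                    ]
                }
            ],
            None,
        ]
    }
}
summaries = summarize_explanations(sample_result)
for record_summary in summaries:
    print(record_summary)
```

In the notebook, `summarize_explanations(result)` would return one entry per record in the request.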


### Single record request, no explanation

Use the `EnableExplanations` parameter to disable the explanations for this request.

In [28]:
num_records = 1

In [29]:
response = sagemaker_runtime_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="text/csv",
    Accept="text/csv",
    Body=csv_serializer.serialize(test_data.iloc[:num_records, :].to_numpy()),
    EnableExplanations="`false`",  # Do not provide explanations
)
pprint.pprint(response)

{'Body': <botocore.response.StreamingBody object at 0x7fbc8cd62440>,
 'ContentType': 'application/json',
 'InvokedProductionVariant': 'TestVariant',
 'ResponseMetadata': {'HTTPHeaders': {'connection': 'keep-alive',
                                      'content-length': '98',
                                      'content-type': 'application/json',
                                      'date': 'Thu, 22 Jun 2023 20:13:28 GMT',
                                      'x-amzn-invoked-production-variant': 'TestVariant',
                                      'x-amzn-requestid': '268abfc9-870c-4423-8e38-30a86add6a20'},
                      'HTTPStatusCode': 200,
                      'RequestId': '268abfc9-870c-4423-8e38-30a86add6a20',
                      'RetryAttempts': 0}}


In [30]:
result = json_deserializer.deserialize(response["Body"], content_type=response["ContentType"])
pprint.pprint(result)

{'explanations': {},
 'predictions': {'content_type': 'text/csv', 'data': '0.9713779\n'},
 'version': '1.0'}
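
When explanations are disabled, the predictions are still returned as CSV text under `predictions.data`. The sketch below converts that payload into floats. `parse_predictions` is a hypothetical helper that assumes one score per line, which matches the single-score output of this model.

```python
def parse_predictions(result):
    """Convert the CSV payload under result['predictions']['data'] to one float per record."""
    data = result["predictions"]["data"]
    return [float(line) for line in data.splitlines() if line.strip()]


# Response copied from the output above
result_without_explanations = {
    "explanations": {},
    "predictions": {"content_type": "text/csv", "data": "0.9713779\n"},
    "version": "1.0",
}
scores = parse_predictions(result_without_explanations)
print(scores)
```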


In [31]:
visualize_result(result, df_test[label_header][:num_records])

No Clarify explanations for the record(s)


### Batch request, explain both

Put two records in the request body, and then send the request to the endpoint to get their predictions and explanations.

In [32]:
num_records = 2

In [33]:
response = sagemaker_runtime_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="text/csv",
    Accept="text/csv",
    Body=csv_serializer.serialize(test_data.iloc[:num_records, :].to_numpy()),
)
pprint.pprint(response)

{'Body': <botocore.response.StreamingBody object at 0x7fbc8bb4ee30>,
 'ContentType': 'application/json',
 'InvokedProductionVariant': 'TestVariant',
 'ResponseMetadata': {'HTTPHeaders': {'connection': 'keep-alive',
                                      'content-length': '1574',
                                      'content-type': 'application/json',
                                      'date': 'Thu, 22 Jun 2023 20:13:31 GMT',
                                      'x-amzn-invoked-production-variant': 'TestVariant',
                                      'x-amzn-requestid': '168b10a5-27d2-479a-a5cc-bb25d41e7030'},
                      'HTTPStatusCode': 200,
                      'RequestId': '168b10a5-27d2-479a-a5cc-bb25d41e7030',
                      'RetryAttempts': 0}}


In [34]:
result = json_deserializer.deserialize(response["Body"], content_type=response["ContentType"])
pprint.pprint(result)

{'explanations': {'kernel_shap': [[{'attributions': [{'attribution': [0.06300842799999996],
                                                      'description': {'partial_text': 'I '
                                                                                      'am '
                                                                                      '5\'6", '
                                                                                      '130 '
                                                                                      'lbs '
                                                                                      'with '
                                                                                      'an '
                                                                                      'athletic '
                                                                                      'body '
                                                                 

In [35]:
visualize_result(result, df_test[label_header][:num_records])

| True Label | Predicted Label | Attribution Label | Attribution Score | Word Importance |
|---|---|---|---|---|
| 0.0 | 1 (0.97) | True | 0.92 | I am 5'6", 130 lbs with an athletic body type and i ordered a size small. these were really baggy in the thigh/quadricep area and made my thighs look bulky. the fabric quality is very nice and i like the idea of them for curvier body types. my son commented that they looked like pajama pants and i agreed. |
| 1.0 | 1 (0.99) | True | 2.68 | The design on the blue sweater is actually a dark navy (not black, as i thought it was), but it still looks beautiful with black underneath. rather than jeans like it's shown, the v-neck is a nice change from other cardigans i have - looks great with a cami or another v-neck underneath it. soft, nice medium-weight and not at all itchy. happy with this as an easy everyday sweater. |


### Batch request with more records, explain some of the records

Add a few more records to the request body, then use the `EnableExplanations` expression to explain only the records whose predictions satisfy a condition.
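
To make the filter concrete, here is a minimal local sketch of what an expression such as ``[0]>`0.99` `` selects. It is only an illustration: the real evaluation happens inside the endpoint, and this sketch assumes each record's prediction is a list whose first element is the score. The function name `should_explain` and the scores are made up for the example.

```python
# Local illustration only; the endpoint performs the real filtering.
def should_explain(prediction_row, threshold=0.99):
    """Mimic the condition [0] > 0.99 for a single record's prediction."""
    return prediction_row[0] > threshold


# Made-up scores for four records (not taken from the notebook output).
predictions = [[0.97], [0.995], [0.42], [0.999]]

explained = [should_explain(row) for row in predictions]
print(explained)  # [False, True, False, True]
```

Records for which the condition is false still receive a prediction; they are simply not explained.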

In [36]:
num_records = 4

In [37]:
response = sagemaker_runtime_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="text/csv",
    Accept="text/csv",
    Body=csv_serializer.serialize(test_data.iloc[:num_records, :].to_numpy()),
    EnableExplanations="[0]>`0.99`",  # Explain a record only when its prediction meets the condition
)
pprint.pprint(response)

{'Body': <botocore.response.StreamingBody object at 0x7fbc8bb4db70>,
 'ContentType': 'application/json',
 'InvokedProductionVariant': 'TestVariant',
 'ResponseMetadata': {'HTTPHeaders': {'connection': 'keep-alive',
                                      'content-length': '1340',
                                      'content-type': 'application/json',
                                      'date': 'Thu, 22 Jun 2023 20:13:33 GMT',
                                      'x-amzn-invoked-production-variant': 'TestVariant',
                                      'x-amzn-requestid': 'd3ddf9ed-042d-41de-ad26-71a77fc90d0b'},
                      'HTTPStatusCode': 200,
                      'RequestId': 'd3ddf9ed-042d-41de-ad26-71a77fc90d0b',
                      'RetryAttempts': 0}}


In [38]:
result = json_deserializer.deserialize(response["Body"], content_type=response["ContentType"])
pprint.pprint(result)

{'explanations': {'kernel_shap': [None,
                                  [{'attributions': [{'attribution': [0.0625544215],
                                                      'description': {'partial_text': 'The '
                                                                                      'design '
                                                                                      'on '
                                                                                      'the '
                                                                                      'blue '
                                                                                      'sweater '
                                                                                      'is '
                                                                                      'actually '
                                                                                      'a '
                             

In [39]:
visualize_result(result, df_test[label_header][:num_records])

| True Label | Predicted Label | Attribution Label | Attribution Score | Word Importance |
| --- | --- | --- | --- | --- |
| 1.0 | 1 (0.99) | True | 2.68 | The design on the blue sweater is actually a dark navy (not black, as i thought it was), but it still looks beautiful with black underneath. rather than jeans like it's shown, the v-neck is a nice change from other cardigans i have - looks great with a cami or another v-neck underneath it. soft, nice medium-weight and not at all itchy. happy with this as an easy everyday sweater. |
| 1.0 | 1 (1.00) | True | 2.96 | A very versatile and cozy top. would look great dressed up or down for a casual comfy fall day. what a fun piece for my wardrobe! |
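
In the deserialized result above, records that did not meet the condition appear as `None` in the `kernel_shap` list. The sketch below shows one way to find which positions were explained. The helper name `explained_indices` and the trimmed `sample_result` are hypothetical; only the `explanations` and `kernel_shap` keys and the `None` placeholders are taken from the output shown.

```python
def explained_indices(result):
    """Return the positions of records that received an explanation."""
    per_record = result["explanations"]["kernel_shap"]
    return [i for i, entry in enumerate(per_record) if entry is not None]


# Hypothetical, heavily trimmed response in the same shape as the output above.
sample_result = {
    "explanations": {
        "kernel_shap": [
            None,  # record skipped by the filter
            [{"attributions": [{"attribution": [0.06]}]}],  # made-up value
            None,
            [{"attributions": [{"attribution": [0.05]}]}],  # made-up value
        ]
    }
}

print(explained_indices(sample_result))  # [1, 3]
```

Checking for `None` before reading `attributions` avoids a `TypeError` when only some of the records in a batch are explained.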


## Cleanup

Finally, delete the endpoint, endpoint configuration, and model created for this demo so that they stop incurring charges.

In [40]:
sagemaker_client.delete_endpoint(EndpointName=endpoint_name);

In [41]:
sagemaker_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name);

In [42]:
sagemaker_client.delete_model(ModelName=model_name);
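
If one of the deletions above fails, the cells after it may never be run. A minimal sketch of a more defensive cleanup is shown below, assuming a client that exposes the same three `delete_*` calls used above. The `cleanup` helper and the `FakeClient` stand-in are hypothetical and exist only so the example runs without AWS access.

```python
def cleanup(client, endpoint_name, endpoint_config_name, model_name):
    """Attempt every deletion and return the labels of the steps that failed."""
    steps = [
        ("endpoint", lambda: client.delete_endpoint(EndpointName=endpoint_name)),
        (
            "endpoint_config",
            lambda: client.delete_endpoint_config(EndpointConfigName=endpoint_config_name),
        ),
        ("model", lambda: client.delete_model(ModelName=model_name)),
    ]
    failed = []
    for label, delete in steps:
        try:
            delete()
        except Exception as error:  # broad on purpose: keep going after any failure
            print(f"Could not delete {label}: {error}")
            failed.append(label)
    return failed


class FakeClient:
    """Hypothetical stand-in for the SageMaker client so the sketch runs offline."""

    def __init__(self):
        self.calls = []

    def delete_endpoint(self, EndpointName):
        self.calls.append(("delete_endpoint", EndpointName))

    def delete_endpoint_config(self, EndpointConfigName):
        raise RuntimeError("simulated failure")

    def delete_model(self, ModelName):
        self.calls.append(("delete_model", ModelName))


fake = FakeClient()
failed_steps = cleanup(fake, "demo-endpoint", "demo-config", "demo-model")
print(failed_steps)  # ['endpoint_config']
```

With the real client, the call would be `cleanup(sagemaker_client, endpoint_name, endpoint_config_name, model_name)`; any label in the returned list names a resource that still needs to be deleted manually.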

## Notebook CI Test Results

This notebook was tested in multiple regions. The results are shown below, except for us-west-2, which is shown at the top of the notebook.

![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-1/sagemaker-clarify|online_explainability|natural_language_processing|nlp_online_explainability_with_sagemaker_clarify.ipynb)

![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-2/sagemaker-clarify|online_explainability|natural_language_processing|nlp_online_explainability_with_sagemaker_clarify.ipynb)

![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-west-1/sagemaker-clarify|online_explainability|natural_language_processing|nlp_online_explainability_with_sagemaker_clarify.ipynb)

![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ca-central-1/sagemaker-clarify|online_explainability|natural_language_processing|nlp_online_explainability_with_sagemaker_clarify.ipynb)

![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/sa-east-1/sagemaker-clarify|online_explainability|natural_language_processing|nlp_online_explainability_with_sagemaker_clarify.ipynb)

![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-1/sagemaker-clarify|online_explainability|natural_language_processing|nlp_online_explainability_with_sagemaker_clarify.ipynb)

![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-2/sagemaker-clarify|online_explainability|natural_language_processing|nlp_online_explainability_with_sagemaker_clarify.ipynb)

![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-3/sagemaker-clarify|online_explainability|natural_language_processing|nlp_online_explainability_with_sagemaker_clarify.ipynb)

![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-central-1/sagemaker-clarify|online_explainability|natural_language_processing|nlp_online_explainability_with_sagemaker_clarify.ipynb)

![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-north-1/sagemaker-clarify|online_explainability|natural_language_processing|nlp_online_explainability_with_sagemaker_clarify.ipynb)

![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-1/sagemaker-clarify|online_explainability|natural_language_processing|nlp_online_explainability_with_sagemaker_clarify.ipynb)

![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-2/sagemaker-clarify|online_explainability|natural_language_processing|nlp_online_explainability_with_sagemaker_clarify.ipynb)

![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-1/sagemaker-clarify|online_explainability|natural_language_processing|nlp_online_explainability_with_sagemaker_clarify.ipynb)

![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-2/sagemaker-clarify|online_explainability|natural_language_processing|nlp_online_explainability_with_sagemaker_clarify.ipynb)

![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-south-1/sagemaker-clarify|online_explainability|natural_language_processing|nlp_online_explainability_with_sagemaker_clarify.ipynb)
