
## Feature attribute drift monitoring with Amazon SageMaker Clarify


This notebook provides a walkthrough of the high level steps involved in monitoring a production ML model with SageMaker Clarify for bias drift and feature attribute drift. To demonstrate the data drift monitoring we will use a pre-trained model to deploy an endpoint.  We provide the pre-trained model artifact along with baseline and test datasets along with this notebook.

1. Set up
2. Enable datacapture on a SageMaker endpoint 
3. Generate a baseline with Model Monitor 
4. Schedule continous monitoring to monitor predictions for bias drift on a regular basis.
5. Analyze bias drift monitoring results
6. Schedule continous monitoring to monitor predictions for feature attribute drift on a regular basis.
7. Analyze feature attribute drift monitoring results
8. Clean up

### 1. Set up

#### 1.1. Imports

In [1]:
import copy
import json
import random
import time
import pandas as pd
import os
import boto3
import re
from botocore.response import StreamingBody
from sagemaker import get_execution_role, session

from datetime import datetime, timedelta

from sagemaker import get_execution_role, image_uris, Session

from time import gmtime, strftime
from sagemaker.model import Model
from sagemaker.image_uris import retrieve

from sagemaker.clarify import (
    BiasConfig,
    DataConfig,
    ModelConfig,
    ModelPredictedLabelConfig,
    SHAPConfig,
)
from sagemaker.model import Model
from sagemaker.model_monitor import (
    #BiasAnalysisConfig,
    CronExpressionGenerator,
    DataCaptureConfig,
    EndpointInput,
    ExplainabilityAnalysisConfig,
    #ModelBiasMonitor,
    ModelExplainabilityMonitor,
)
from sagemaker.s3 import S3Downloader, S3Uploader

#### 1.2 Setup variables

In [2]:
region = boto3.Session().region_name

role = get_execution_role()
print("RoleArn: {}".format(role))

#This is the bucket into which the data is captured
bucket = 'bestpractices-bucket-sm' ##TODO Upadate
prefix = "FeatureAttributionMonitoring"

data_capture_prefix = "{}/datacapture".format(prefix)
s3_capture_upload_path = "s3://{}/{}".format(bucket, data_capture_prefix)
reports_prefix = "{}/reports".format(prefix)
s3_report_path = "s3://{}/{}".format(bucket, reports_prefix)
#code_prefix = "{}/code".format(prefix)
#s3_code_preprocessor_uri = "s3://{}/{}/{}".format(bucket, code_prefix, "preprocessor.py")
#s3_code_postprocessor_uri = "s3://{}/{}/{}".format(bucket, code_prefix, "postprocessor.py")

ground_truth_upload_path = (
    f"s3://{bucket}/{prefix}/ground_truth_data/{datetime.now():%Y-%m-%d-%H-%M-%S}"
)

print("Capture path: {}".format(s3_capture_upload_path))
print("Report path: {}".format(s3_report_path))
#print("Preproc Code path: {}".format(s3_code_preprocessor_uri))
#print("Postproc Code path: {}".format(s3_code_postprocessor_uri))

RoleArn: arn:aws:iam::802439482869:role/service-role/AmazonSageMaker-ExecutionRole-20210418T143524
Capture path: s3://bestpractices-bucket-sm/FeatureAttributionMonitoring/datacapture
Report path: s3://bestpractices-bucket-sm/FeatureAttributionMonitoring/reports


#### 1.3 Setup service clients

In [3]:
s3_client = boto3.Session().client("s3")
sagemaker_runtime_client = boto3.Session().client("sagemaker-runtime")

### 2. Enable datacapture on a SageMaker endpoint 

Create an endpoint to showcase the data capture capability in action.

For the endpoint we will use a pre-trained XGBoost model that is ready to deploy. This model was trained in the previous chapters using the weather dataset and has been included in the model directory for ease of use.

Note that you can also train a new model and use your model and data below as well.

#### 2.1 Upload the model object into S3

In [5]:
model_file = open("model/weather-prediction-model.tar.gz", "rb")
s3_key = os.path.join(prefix, "weather-prediction-model.tar.gz")
boto3.Session().resource("s3").Bucket(bucket).Object(s3_key).upload_fileobj(model_file)

#### 2.2  Create SageMaker Model

In [6]:
model_name = f"weather-pred-model-monitor-{datetime.utcnow():%Y-%m-%d-%H%M}"
print("Model name: ", model_name)

model_url = "https://{}.s3-{}.amazonaws.com/{}/weather-prediction-model.tar.gz".format(
    bucket, region, prefix
)

print(model_url)

image_uri = retrieve("xgboost", boto3.Session().region_name, "1.2-1")

model = Model(name=model_name, image_uri=image_uri, model_data=model_url, role=role)

Model name:  weather-pred-model-monitor-2021-08-04-1719
https://bestpractices-bucket-sm.s3-us-west-2.amazonaws.com/FeatureAttributionMonitoring/weather-prediction-model.tar.gz


In [8]:
##Test and validation files to use with model
test_dataset="data/t_file.csv"
validation_dataset="data/v_file.csv"
dataset_type = "text/csv"

with open(validation_dataset) as f:
    headers_line = f.readline().rstrip()
    all_headers = headers_line.split(",")
##Get the label name
label_header = all_headers[0]
print(label_header)

value


#### 2.3  Configure datacapture

To enable data capture on the endpoint, you specify the new capture option called `DataCaptureConfig`. On enabling data capture, input to and output from the SageMaker endpoint are captured and saved in S3. Input captured includes the live inference traffic requests and output captured includes predictions from the deployed model.

In [10]:
endpoint_name = "weather-prediction-fa-drift-model-monitor-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print("EndpointName={}".format(endpoint_name))

data_capture_config = DataCaptureConfig(
    enable_capture=True, sampling_percentage=100, destination_s3_uri=s3_capture_upload_path
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m4.xlarge",
    endpoint_name=endpoint_name,
    data_capture_config=data_capture_config,
)

EndpointName=weather-prediction-fa-drift-model-monitor-2021-08-04-17-21-12


Using already existing model: weather-pred-model-monitor-2021-08-04-1719


-----------------!

#### 2.4 Capture data from endpoint 

This step invokes the endpoint with included sample data for about 3 minutes. Data is captured based on the sampling percentage specified and the capture continues until the data capture option is turned off.

In [11]:
##Use the test file in the data directory  to execute inferences using the test file 't_file.csv' provided
with open('data/t_file.csv', 'r') as TF:
    t_lines = TF.readlines()

In [12]:
### Define a method to run inferences against the endpoint
def get_predictions():
    smrt = boto3.Session().client("sagemaker-runtime")
    #Skip the first line since it has column headers
    for tl in t_lines[1:50]:
        #Remove the first column since it is the label
        test_list = tl.split(",")
        test_list.pop(0)
        test_string = ','.join([str(elem) for elem in test_list])
        
        #print("invoking with payload " + test_string)
    
        result = smrt.invoke_endpoint(EndpointName=endpoint_name,
                                   ContentType="text/csv",
                                   Body=test_string)
        rbody = StreamingBody(raw_stream=result['Body'],content_length=int(result['ResponseMetadata']['HTTPHeaders']['content-length']))
        #print(f"Result from {result['InvokedProductionVariant']} = {rbody.read().decode('utf-8')}")
        print(".", end="", flush=True)
        time.sleep(0.5)

In [13]:
#Get predictions
get_predictions()

.................................................

#### 2.5  View captured data

Now list the data capture files stored in Amazon S3. You should expect to see different files from different time periods organized based on the hour in which the invocation occurred. The format of the Amazon S3 path is:

`s3://{destination-bucket-prefix}/{endpoint-name}/{variant-name}/yyyy/mm/dd/hh/filename.jsonl`

In [14]:
s3_capture_upload_path

's3://bestpractices-bucket-sm/FeatureAttributionMonitoring/datacapture'

In [17]:
#Note : If you see an error in this cell, it could be because the captured files didn't appear in S3 yet.
#Retry after a minute.
current_endpoint_capture_prefix = "{}/{}".format(data_capture_prefix, endpoint_name)

result = s3_client.list_objects(Bucket=bucket, Prefix=current_endpoint_capture_prefix)
capture_files = [capture_file.get("Key") for capture_file in result.get("Contents")]
print("Found Capture Files:")
print("\n ".join(capture_files))

Found Capture Files:
FeatureAttributionMonitoring/datacapture/weather-prediction-fa-drift-model-monitor-2021-08-04-17-21-12/AllTraffic/2021/08/04/17/33-09-047-8f7d436a-6d6d-4314-bb08-fd3349a7ffaa.jsonl


Next, view the content of a single capture file. Take a quick peek at the first few lines in the captured file.

In [18]:
def get_obj_body(obj_key):
    return s3_client.get_object(Bucket=bucket, Key=obj_key).get("Body").read().decode("utf-8")


capture_file = get_obj_body(capture_files[-1])
print(capture_file[:2000])

#capture_file = S3Downloader.read_file(capture_files[-1]).split("\n")[-10:-1]
#print(capture_file[-1])

{"captureData":{"endpointInput":{"observedContentType":"text/csv","mode":"INPUT","data":"0,2020,12,4,31,0,19.0,0.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0\n","encoding":"CSV"},"endpointOutput":{"observedContentType":"text/csv; charset=utf-8","mode":"OUTPUT","data":"-4.902510643005371","encoding":"CSV"}},"eventMetadata":{"eventId":"1585e9cd-a9c8-4d55-ab19-0f871d3cf094","inferenceTime":"2021-08-04T17:33:09Z"},"eventVersion":"0"}
{"captureData":{"endpointInput":{"observedContentType":"text/csv","mode":"INPUT","data":"0,2020,12,4,31,0,19.0,0.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0\n","encoding":"CSV"},"endpointOutput":{"observedContentType":"text/csv; charset=utf-8","mode":"OUTPUT","data":"-4.902510643005371","encoding":"CSV"}},"eventMetadata":{"eventId":"4c980935-76a5-4070-ab82-7e8fb5967d97","inferenceTime":"2021-08-04T17:33:09Z"},"eventVersion":"0"}
{"captureData":{"endpointInput":{"observedContentType":"text/csv","mode":"INPUT","data":"0,2020,12,4,31,0,19.0,0.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0\n","enc

Finally, the contents of a single line is present below in a formatted JSON file to observe a little better.

In [19]:
#print(json.dumps(json.loads(capture_file[-1]), indent=2))
print(json.dumps(json.loads(capture_file.split("\n")[0]), indent=2))

{
  "captureData": {
    "endpointInput": {
      "observedContentType": "text/csv",
      "mode": "INPUT",
      "data": "0,2020,12,4,31,0,19.0,0.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0\n",
      "encoding": "CSV"
    },
    "endpointOutput": {
      "observedContentType": "text/csv; charset=utf-8",
      "mode": "OUTPUT",
      "data": "-4.902510643005371",
      "encoding": "CSV"
    }
  },
  "eventMetadata": {
    "eventId": "1585e9cd-a9c8-4d55-ab19-0f871d3cf094",
    "inferenceTime": "2021-08-04T17:33:09Z"
  },
  "eventVersion": "0"
}


### 3. Create a baseline with ....

## Start generating some artificial traffic
The cell below starts a thread to send some traffic to the endpoint. If there is no traffic, the monitoring jobs are marked as `Failed` since there is no data to process.

Notice the `inferenceId` attribute used above to invoke. If this is present, it will be used to join with ground truth data (otherwise `eventId` will be used):

In [131]:
from threading import Thread
from time import sleep
import time

#Invoke the endpoint in a loop
def invoke_endpoint_forever():
    while True:
        get_predictions()
        
# Note that you need to stop the kernel to stop the invocations
#thread = Thread(target=invoke_endpoint_forever)
#thread.start()

## Start generating some fake ground truth

Besides captures, model bias monitoring execution also requires ground truth data. In real use cases, ground truth data should be regularly collected and uploaded to designated S3 location. In this example notebook, below code snippet is used to generate fake ground truth data. The first-party merge container will combine captures and ground truth data, and the merged data will be passed to model bias monitoring job for analysis. Similar to captures, the model bias monitoring execution will fail if there's no data to merge.

In [132]:
import random

test_dataset_size  = 350


def ground_truth_with_id(inference_id):
    random.seed(inference_id)  # to get consistent results
    rand = random.random()
    # format required by the merge container
    return {
        "groundTruthData": {
            "data": "1" if rand < 0.7 else "0",  # randomly generate positive labels 70% of the time # randomly generate positive labels 70% of the time #
             # TODO : Need to make this a decimal??
            "encoding": "CSV",
        },
        "eventMetadata": {
            "eventId": str(inference_id),
        },
        "eventVersion": "0",
    }


def upload_ground_truth(upload_time):
    records = [ground_truth_with_id(i) for i in range(test_dataset_size)]
    fake_records = [json.dumps(r) for r in records]
    data_to_upload = "\n".join(fake_records)
    target_s3_uri = f"{ground_truth_upload_path}/{upload_time:%Y/%m/%d/%H/%M%S}.jsonl"
    print(f"Uploading {len(fake_records)} records to", target_s3_uri)
    S3Uploader.upload_string_as_file_body(data_to_upload, target_s3_uri)

In [133]:
# Generate data for the last hour
upload_ground_truth(datetime.utcnow() - timedelta(hours=1))

Uploading 350 records to s3://bestpractices-bucket-sm/BiasDriftFeatureAttributionMonitoring/ground_truth_data/2021-08-02-23-38-12/2021/08/03/17/2451.jsonl


In [134]:
# Generate data once a hour
#def generate_fake_ground_truth(terminate_event):
def generate_fake_ground_truth_forever():
    upload_ground_truth(datetime.utcnow())
    for _ in range(0, 60):
        time.sleep(60)
        #if terminate_event.is_set():
         #   break

gt_thread = Thread(target=generate_fake_ground_truth_forever)
gt_thread.start()

#ground_truth_thread = WorkerThread(do_run=generate_fake_ground_truth)
#ground_truth_thread.start()

Uploading 350 records to s3://bestpractices-bucket-sm/BiasDriftFeatureAttributionMonitoring/ground_truth_data/2021-08-02-23-38-12/2021/08/03/18/2523.jsonl


In [None]:
baseline_prefix = prefix + "/baselining"
baseline_data_prefix = baseline_prefix + "/data"
baseline_results_prefix = baseline_prefix + "/results"

baseline_data_uri = f"s3://{bucket}/{baseline_data_prefix}"
baseline_results_uri = f"s3://{bucket}/{baseline_results_prefix}"
model_bias_baselining_job_result_uri = f"{baseline_results_uri}/model_bias"

print(f"Baseline data uri: {baseline_data_uri}")
print(f"Baseline results uri: {baseline_results_uri}")

In [25]:
##TODO : Delete it from here or from the previous section
validation_dataset = "v_file.csv"

validation_dataset = "data/data-drift-baseline-data.csv"  ##Lets try with this
#dataset_type = "text/csv"

#with open(validation_dataset) as f:
 #   headers_line = f.readline().rstrip()
#all_headers = headers_line.split(",")
#label_header = all_headers[0]

#with open('data/v_file.csv', 'r') as TF:
 #   t_lines = TF.readlines()
    
#df_for_validation = pd.read_csv("data/v_file.csv")
#df_for_validation['city'] 

`ModelConfig` is configuration related to model to be used for inferencing. In order to compute post-training bias metrics, the computation needs to get inferences for the model name provided. To accomplish this, the processing job will use the model to create an ephemeral endpoint (also known as "shadow endpoint"). The processing job will delete the shadow endpoint after the computations are completed. The configuration is also used by explainability monitor.

In [30]:
endpoint_instance_count=1

endpoint_instance_type="ml.m4.xlarge"
    
model_config = ModelConfig(
    model_name=model_name,
    instance_count=endpoint_instance_count,
    instance_type=endpoint_instance_type,
    content_type=dataset_type,
    accept_type=dataset_type,
)

### 4. Schedule continous monitoring
When you have collected the data above, analyze and monitor the data with Monitoring Schedules

#### 4.1 Generate prediction data for Model Quality  Monitoring

Start generating some artificial traffic.  The cell below starts a thread to send some traffic to the endpoint. Note that you need to stop the kernel to terminate this thread. If there is no traffic, the monitoring jobs are marked as `Failed` since there is no data to process.

In [251]:
import random


def ground_truth_with_id(inference_id):
    random.seed(inference_id)  # to get consistent results
    rand = random.random()
    return {
        "groundTruthData": {
            "data": "1" if rand < 0.7 else "0",  # randomly generate positive labels 70% of the time #
             # TODO : Need to make this a decimal??
            "encoding": "CSV",
        },
        "eventMetadata": {
            "eventId": str(inference_id),
        },
        "eventVersion": "0",
    }


def upload_ground_truth(records, upload_time):
    fake_records = [json.dumps(r) for r in records]
    data_to_upload = "\n".join(fake_records)
    target_s3_uri = f"{ground_truth_upload_path}/{upload_time:%Y/%m/%d/%H/%M%S}.jsonl"
    print(f"Uploading {len(fake_records)} records to", target_s3_uri)
    S3Uploader.upload_string_as_file_body(data_to_upload, target_s3_uri)

In [252]:
NUM_GROUND_TRUTH_RECORDS = 300


def generate_fake_ground_truth_forever():
    j = 0
    while True:
        fake_records = [ground_truth_with_id(i) for i in range(NUM_GROUND_TRUTH_RECORDS)]
        upload_ground_truth(fake_records, datetime.utcnow())
        j = (j + 1) % 5
        sleep(60 * 60)  # do this once an hour


gt_thread = Thread(target=generate_fake_ground_truth_forever)
gt_thread.start()

Uploading 300 records to s3://bestpractices-bucket-sm/BiasDriftFeatureAttributionMonitoring/ground_truth_data/2021-08-03-18-31-49/2021/08/04/15/3411.jsonl
Uploading 300 records to s3://bestpractices-bucket-sm/BiasDriftFeatureAttributionMonitoring/ground_truth_data/2021-08-03-18-31-49/2021/08/04/15/5615.jsonl


#### 4.4 Create a monitoring schedule

Now that you have the baseline information and ground truth labels, create a monitoring schedule to run model quality monitoring job.

# PART C: Model Explainability Monitor

Model explainability monitor can explain the predictions of a deployed model producing inferences and detect feature attribution drift on a regular basis.

In [26]:
session = Session()
model_explainability_monitor = ModelExplainabilityMonitor(
    role=role,
    sagemaker_session=session,
    #max_runtime_in_seconds=5400,
    max_runtime_in_seconds=3000,
)

## Create a baselining job

Similary, a baselining job can be scheduled to suggest constraints for model explainability monitor.

### Configuration

In this example, the explainability baselining job shares the test dataset with the bias baselining job, so here it uses the same `DataConfig`, the only difference is the job output URI.

In [27]:
model_explainability_baselining_job_result_uri = f"{baseline_results_uri}/model_explainability"
model_explainability_data_config = DataConfig(
    s3_data_input_path=validation_dataset,
    s3_output_path=model_explainability_baselining_job_result_uri,
    label=label_header,
    headers=all_headers,
    dataset_type=dataset_type,
)

Currently the Clarify explainer offers a scalable and efficient implementation of SHAP, so the explainability config is `SHAPConfig`, including
* baseline: A list of rows (at least one) or S3 object URI to be used as the baseline dataset in the Kernel SHAP algorithm. The format should be the same as the dataset format. Each row should contain only the feature columns/values and omit the label column/values.
* num_samples: Number of samples to be used in the Kernel SHAP algorithm. This number determines the size of the generated synthetic dataset to compute the SHAP values.
* agg_method: Aggregation method for global SHAP values. Valid values are
  * "mean_abs" (mean of absolute SHAP values for all instances),
  * "median" (median of SHAP values for all instances) and
  * "mean_sq" (mean of squared SHAP values for all instances).
* use_logit: Indicator of whether the logit function is to be applied to the model predictions. Default is False. If "use_logit" is true then the SHAP values will have log-odds units.
* save_local_shap_values (bool): Indicator of whether to save the local SHAP values in the output location. Default is True.

In [28]:
# Here use the mean value of test dataset as SHAP baseline
test_dataframe_full = pd.read_csv(test_dataset, header=[0])
#Remove the first column since it is the label
test_dataframe = test_dataframe_full.iloc[:, 1:]
print(test_dataframe)

shap_baseline = [list(test_dataframe.mean())]

print(shap_baseline)

shap_config = SHAPConfig(
    baseline=shap_baseline,
    num_samples=50,
    #num_samples=100,
    agg_method="mean_abs",
    save_local_shap_values=False,
)

       ismobile  year  month  quarter  day  isBadAir  location   city  \
0             0  2020     12        4   31         0      19.0    0.0   
1             0  2020     12        4   31         0      19.0    0.0   
2             0  2020     12        4   31         0      19.0    0.0   
3             0  2020     12        4   31         0      19.0    0.0   
4             0  2020     12        4   31         0      19.0    0.0   
...         ...   ...    ...      ...  ...       ...       ...    ...   
23154         0  2021      1        1    1         0    3424.0  127.0   
23155         0  2020     12        4   31         0     333.0  165.0   
23156         0  2020     12        4   31         0     333.0  165.0   
23157         0  2020     12        4   31         0     333.0  165.0   
23158         0  2020     12        4   31         0     333.0  165.0   

       sourcename  sourcetype  no2   o3  pm10  pm25  so2   co  
0             6.0         0.0  0.0  0.0   0.0   0.0  0.0  1

### Kick off baselining job

The same model_config is required, because the explainability baselining job needs to create shadow endpoint to get predictions for generated synthetic dataset.

In [31]:
model_explainability_monitor.suggest_baseline(
    data_config=model_explainability_data_config,
    model_config=model_config,
    explainability_config=shap_config,
)
print(
    f"ModelExplainabilityMonitor baselining job: {model_explainability_monitor.latest_baselining_job_name}"
)


Job Name:  baseline-suggestion-job-2021-08-04-17-39-09-269
Inputs:  [{'InputName': 'dataset', 'AppManaged': False, 'S3Input': {'S3Uri': 's3://sagemaker-us-west-2-802439482869/baseline-suggestion-job-2021-08-04-17-39-09-269/input/dataset/data-drift-baseline-data.csv', 'LocalPath': '/opt/ml/processing/input/data', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}, {'InputName': 'analysis_config', 'AppManaged': False, 'S3Input': {'S3Uri': 's3://bestpractices-bucket-sm/FeatureAttributionMonitoring/baselining/results/model_explainability/analysis_config.json', 'LocalPath': '/opt/ml/processing/input/config', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}]
Outputs:  [{'OutputName': 'analysis_result', 'AppManaged': False, 'S3Output': {'S3Uri': 's3://bestpractices-bucket-sm/FeatureAttributionMonitoring/baselining/results/model_explainability', '

Wait for baselining job to finish (or skip this cell because the monitor to be scheduled will wait for it anyway).

In [32]:
model_explainability_monitor.latest_baselining_job.wait(logs=False)

..........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................!

Now we can inspects the constraints suggested by the baseline job. 

In [33]:

model_explainability_constraints = model_explainability_monitor.suggested_constraints()
print()
print(
    f"ModelExplainabilityMonitor suggested constraints: {model_explainability_constraints.file_s3_uri}"
)
print(S3Downloader.read_file(model_explainability_constraints.file_s3_uri))


ModelExplainabilityMonitor suggested constraints: s3://bestpractices-bucket-sm/FeatureAttributionMonitoring/baselining/results/model_explainability/analysis.json
{
    "version": "1.0",
    "explanations": {
        "kernel_shap": {
            "label0": {
                "global_shap_values": {
                    "ismobile": 0.007424901047090124,
                    "year": 0.009579587470671898,
                    "month": 0.010584970117213112,
                    "quarter": 0.007406574647169218,
                    "day": 0.007720702139818704,
                    "isBadAir": 0.011302626773941254,
                    "location": 0.0192333616285298,
                    "city": 0.01893646882046639,
                    "sourcename": 0.01956434880904637,
                    "sourcetype": 0.007430860663886335,
                    "no2": 0.007457049793477212,
                    "o3": 0.007441452425478423,
                    "pm10": 0.014456348477081812,
                    "pm25": 0.00

## Schedule model explainability monitor

Call `create_monitoring_schedule()` method to schedule a hourly monitor, to analyze the data with monitoring schedule. If a baselining job has been submitted, then the monitor will automatically pick up analysis configuration from the baselining job. But if the baselining step is skipped, or the capture dataset has different nature than the training dataset, then analysis configuration has to be provided.

`ModelConfig` is required by `ExplainabilityAnalysisConfig` for the same reason as it is required by the baselining job. Note that only features are required for computing feature attribution, so ground truth label should be excluded.

In [241]:
#CronExpressionGenerator.hourly()

'cron(0 * ? * * *)'

In [242]:
#CronExpressionGenerator.daily_every_x_hours(hour_interval=2, starting_hour=0)

'cron(0 0/2 ? * * *)'

In [34]:
model_explainability_monitor.latest_baselining_job

<sagemaker.model_monitor.clarify_model_monitoring.ClarifyBaseliningJob at 0x7fd629069208>

In [35]:
#model_explainability_analysis_config = None
#if not model_explainability_monitor.latest_baselining_job:
    # Remove label because only features are required for the analysis
 #   headers_without_label_header = copy.deepcopy(all_headers)
  #  headers_without_label_header.remove(label_header)
   # model_explainability_analysis_config = ExplainabilityAnalysisConfig(
    #    explainability_config=shap_config,
     #   model_config=model_config,
      #  headers=headers_without_label_header,
    #)
    
model_explainability_monitor.create_monitoring_schedule(
    output_s3_uri=s3_report_path,
    endpoint_input=endpoint_name,
    schedule_cron_expression=CronExpressionGenerator.hourly()
    #schedule_cron_expression=schedule_expression,
    #schedule_cron_expression=CronExpressionGenerator.daily_every_x_hours(hour_interval=2, starting_hour=0)
)

## Wait for execution and inspect analysis results

Once created the schedule is started by default, here wait for the its first execution to start, then stop the schedule to avoid incurring charges.

In [37]:
def wait_for_execution_to_start(model_monitor):
    print(
        "A hourly schedule was created above and it will kick off executions ON the hour (plus 0 - 20 min buffer)."
    )

    print("Waiting for the first execution to happen", end="")
    schedule_desc = model_monitor.describe_schedule()
    while "LastMonitoringExecutionSummary" not in schedule_desc:
        schedule_desc = model_monitor.describe_schedule()
        print(".", end="", flush=True)
        time.sleep(60)
    print()
    print("Done! Execution has been created")

    print("Now waiting for execution to start", end="")
    while schedule_desc["LastMonitoringExecutionSummary"]["MonitoringExecutionStatus"] in "Pending":
        schedule_desc = model_monitor.describe_schedule()
        print(".", end="", flush=True)
        time.sleep(10)

    print()
    print("Done! Execution has started")

In [38]:
wait_for_execution_to_start(model_explainability_monitor)

A hourly schedule was created above and it will kick off executions ON the hour (plus 0 - 20 min buffer).
Waiting for the first execution to happen.........................................
Done! Execution has been created
Now waiting for execution to start
Done! Execution has started


In [None]:
#model_explainability_monitor.stop_monitoring_schedule()
#model_explainability_monitor.delete_monitoring_schedule()

Wait further for the execution to finish, then inspect its analysis results,

In [39]:
# Waits for the schedule to have last execution in a terminal status.
def wait_for_execution_to_finish(model_monitor):
    schedule_desc = model_monitor.describe_schedule()
    execution_summary = schedule_desc.get("LastMonitoringExecutionSummary")
    if execution_summary is not None:
        print("Waiting for execution to finish", end="")
        while execution_summary["MonitoringExecutionStatus"] not in [
            "Completed",
            "CompletedWithViolations",
            "Failed",
            "Stopped",
        ]:
            print(".", end="", flush=True)
            time.sleep(60)
            schedule_desc = model_monitor.describe_schedule()
            execution_summary = schedule_desc["LastMonitoringExecutionSummary"]
        print()
        print("Done! Execution has finished")
    else:
        print("Last execution not found")

In [40]:
wait_for_execution_to_finish(model_explainability_monitor)

Waiting for execution to finish
Done! Execution has finished


In [42]:
schedule_desc = model_explainability_monitor.describe_schedule()
execution_summary = schedule_desc.get("LastMonitoringExecutionSummary")
if execution_summary and execution_summary["MonitoringExecutionStatus"] in [
    "Completed",
    "CompletedWithViolations",
]:
    last_model_explainability_monitor_execution = model_explainability_monitor.list_executions()[-1]
    last_model_explainability_monitor_execution_report_uri = (
        last_model_explainability_monitor_execution.output.destination
    )
    print(f"Report URI: {last_model_explainability_monitor_execution_report_uri}")
    last_model_explainability_monitor_execution_report_files = sorted(
        S3Downloader.list(last_model_explainability_monitor_execution_report_uri)
    )
    print("Found Report Files:")
    print("\n ".join(last_model_explainability_monitor_execution_report_files))
else:
    last_model_explainability_monitor_execution = None
    print(
        "====STOP==== \n No completed executions to inspect further. Please wait till an execution completes or investigate previously reported failures."
    )

====STOP==== 
 No completed executions to inspect further. Please wait till an execution completes or investigate previously reported failures.


If there are any violations compared to the baseline, they will be listed here.

In [43]:
if last_model_explainability_monitor_execution:
    model_explainability_violations = (
        last_model_explainability_monitor_execution.constraint_violations()
    )
    if model_explainability_violations:
        print(model_explainability_violations.body_dict)

The analysis results and CloudWatch metrics are visualized in SageMaker Studio. Select the Endpoints tab, then double click the endpoint to show the UI.

# PART D: Cleanup

The endpoint can keep running and capturing data, but if there is no plan to collect more data or use this endpoint further, it should be deleted to avoid incurring additional charges. Note that deleting endpoint does not delete the data that was captured during the model invocations.

First stop the worker threads,

In [None]:
#invoke_endpoint_thread.terminate()
#ground_truth_thread.terminate()

Then stop all monitors scheduled for the endpoint

In [261]:
model_bias_monitor.delete_monitoring_schedule()


Deleting Monitoring Schedule with name: monitoring-schedule-2021-08-03-20-57-17-238


In [44]:
from sagemaker.predictor import Predictor

predictor = Predictor(endpoint_name, sagemaker_session=sagemaker_session)
model_monitors = predictor.list_monitors()
for model_monitor in model_monitors:
    model_monitor.stop_monitoring_schedule()
    wait_for_execution_to_finish(model_monitor)
    model_monitor.delete_monitoring_schedule()

NameError: name 'sagemaker_session' is not defined

Finally delete the endpoint

In [None]:
predictor.delete_endpoint()
predictor.delete_model()