## Model quality drift monitoring with Amazon SageMaker Model Monitor


This notebook walks through the high-level steps involved in monitoring a production ML model for model quality drift with SageMaker Model Monitor. To demonstrate model quality monitoring, we deploy a pre-trained model to an endpoint. The pre-trained model artifact and the baseline and test datasets are provided with this notebook.

1. Set up
2. Enable datacapture on a SageMaker endpoint 
3. Create a baseline with Model Monitor
4. Schedule continuous monitoring
5. Analyze monitoring results
6. Clean up

### 1. Set up

#### 1.1. Imports

In [1]:
import os
import boto3
import re
import json
import pandas as pd

from botocore.response import StreamingBody

from sagemaker import get_execution_role, session, Session
from sagemaker.model import Model
from sagemaker.image_uris import retrieve
from sagemaker.s3 import S3Downloader, S3Uploader
from sagemaker.model_monitor import ModelQualityMonitor
from sagemaker.model_monitor import EndpointInput
from sagemaker.model_monitor.dataset_format import DatasetFormat

from datetime import datetime, timedelta, timezone
from threading import Thread

from time import sleep
from time import gmtime, strftime


#### 1.2 Setup variables

In [2]:
region = boto3.Session().region_name

role = get_execution_role()
print("RoleArn: {}".format(role))

#This is the bucket into which the data is captured
#bucket = 'bestpractices-bucket-sm'
bucket = 'datascience-environment-notebookinstance--06dc7a0224df'
prefix = "ModelQualityMonitor"

data_capture_prefix = "{}/datacapture".format(prefix)
s3_capture_upload_path = "s3://{}/{}".format(bucket, data_capture_prefix)
reports_prefix = "{}/reports".format(prefix)
s3_report_path = "s3://{}/{}".format(bucket, reports_prefix)
code_prefix = "{}/code".format(prefix)
#s3_code_preprocessor_uri = "s3://{}/{}/{}".format(bucket, code_prefix, "preprocessor.py")
#s3_code_postprocessor_uri = "s3://{}/{}/{}".format(bucket, code_prefix, "postprocessor.py")


ground_truth_upload_path = (
    f"s3://{bucket}/{prefix}/ground_truth_data/{datetime.now():%Y-%m-%d-%H-%M-%S}"
)

print("Capture path: {}".format(s3_capture_upload_path))
print("Report path: {}".format(s3_report_path))
#print("Preproc Code path: {}".format(s3_code_preprocessor_uri))
#print("Postproc Code path: {}".format(s3_code_postprocessor_uri))

RoleArn: arn:aws:iam::802439482869:role/service-role/AmazonSageMaker-ExecutionRole-20210418T143524
Capture path: s3://datascience-environment-notebookinstance--06dc7a0224df/ModelQualityMonitor/datacapture
Report path: s3://datascience-environment-notebookinstance--06dc7a0224df/ModelQualityMonitor/reports


#### 1.3 Setup service clients

In [3]:
s3_client = boto3.Session().client("s3")
sagemaker_client = boto3.client('sagemaker')
sagemaker_runtime_client = boto3.Session().client("sagemaker-runtime")

### 2. Enable datacapture on a SageMaker endpoint 

Create an endpoint to showcase the data capture capability in action.

For the endpoint we will use a pre-trained XGBoost model that is ready to deploy. This model was trained in the previous chapters using the weather dataset and has been included in the model directory for ease of use.

Note that you can also train a new model and use your model and data below as well.

#### 2.1 Upload the model object into S3

In [4]:
s3_key = os.path.join(prefix, "weather-prediction-model.tar.gz")
# Open the model archive in a context manager so the file handle is closed after the upload
with open("model/weather-prediction-model.tar.gz", "rb") as model_file:
    boto3.Session().resource("s3").Bucket(bucket).Object(s3_key).upload_fileobj(model_file)

#### 2.2  Create SageMaker Model

In [5]:
model_url = "https://{}.s3-{}.amazonaws.com/{}/weather-prediction-model.tar.gz".format(
    bucket, region, prefix
)

print(model_url)

image_uri = retrieve("xgboost", boto3.Session().region_name, "1.2-1")

model = Model(image_uri=image_uri, model_data=model_url, role=role)

https://datascience-environment-notebookinstance--06dc7a0224df.s3-us-west-2.amazonaws.com/ModelQualityMonitor/weather-prediction-model.tar.gz
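
The cell above addresses the model artifact through an HTTPS-style URL. The SageMaker Python SDK describes `model_data` as the S3 location of the model archive, so an `s3://` URI is the form you will see most often. The helper below is a minimal sketch of building such a URI; the function name `s3_uri` and the bucket name are hypothetical and used only for illustration.

```python
def s3_uri(bucket: str, key: str) -> str:
    """Build an s3:// URI from a bucket name and an object key."""
    # Strip a leading slash so the URI never contains a double slash after the bucket.
    return f"s3://{bucket}/{key.lstrip('/')}"


# Hypothetical bucket name, used only for illustration.
example_uri = s3_uri("my-example-bucket", "ModelQualityMonitor/weather-prediction-model.tar.gz")
print(example_uri)
```

In the notebook, the equivalent call would use the `bucket` and `prefix` variables defined in the setup section.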


#### 2.3  Configure datacapture

To enable data capture on the endpoint, you specify the new capture option called `DataCaptureConfig`. On enabling data capture, input to and output from the SageMaker endpoint are captured and saved in S3. Input captured includes the live inference traffic requests and output captured includes predictions from the deployed model.

In [6]:
from sagemaker.model_monitor import DataCaptureConfig

endpoint_name = "xgb-weather-prediction-model-monitor-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print("EndpointName={}".format(endpoint_name))

data_capture_config = DataCaptureConfig(
    enable_capture=True, sampling_percentage=100, destination_s3_uri=s3_capture_upload_path
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m4.xlarge",
    endpoint_name=endpoint_name,
    data_capture_config=data_capture_config,
)

EndpointName=xgb-weather-prediction-model-monitor-2021-08-03-06-29-41
---------------!

#### 2.4 Capture data from endpoint

This step invokes the endpoint with up to 49 records from the included sample data, pausing half a second between requests, so it takes roughly half a minute. Data is captured based on the sampling percentage specified, and capture continues until the data capture option is turned off.

In [7]:
##Download the test files to execute inferences
#s3 = boto3.client('s3')
#s3_prefix = 'prepared'
#test_file_name='part-00000-0b01100d-c57d-4375-9fa3-e11879c4cd0a-c000.csv'
#s3.download_file(bucket, f"{s3_prefix}/test/{test_file_name}", 't_file.csv')

#with open('t_file.csv', 'r') as TF:
 #   t_lines = TF.readlines()

In [8]:
##Use the test file 't_file.csv' provided in the data directory to execute inferences
with open('data/t_file.csv', 'r') as TF:
    t_lines = TF.readlines()

In [9]:
### Define a method to run inferences against the endpoint
def get_predictions():
    smrt = boto3.Session().client("sagemaker-runtime")
    # Skip the first line since it has column headers
    for tl in t_lines[1:50]:
        # Remove the first column since it is the label
        test_list = tl.split(",")
        test_list.pop(0)
        test_string = ",".join(test_list)

        result = smrt.invoke_endpoint(
            EndpointName=endpoint_name, ContentType="text/csv", Body=test_string
        )
        # result["Body"] is already a StreamingBody; read it to consume the response
        result["Body"].read()
        print(".", end="", flush=True)
        sleep(0.5)

In [10]:
#Get predictions
get_predictions()

.................................................

#### 2.5  View captured data

Now list the data capture files stored in Amazon S3. You should expect to see different files from different time periods organized based on the hour in which the invocation occurred. The format of the Amazon S3 path is:

`s3://{destination-bucket-prefix}/{endpoint-name}/{variant-name}/yyyy/mm/dd/hh/filename.jsonl`
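
The path layout above can be sketched as a small helper that builds the hourly prefix for a given endpoint and time. This is an illustration only: the function name `hourly_capture_prefix` is hypothetical, and the sketch assumes that the hour component of the path is in UTC, which is consistent with the `inferenceTime` values in the capture records shown in this notebook.

```python
from datetime import datetime, timezone


def hourly_capture_prefix(capture_prefix, endpoint_name, variant_name, when):
    """Return the S3 key prefix under which captures for the given hour are stored."""
    # The date and hour components are zero-padded, matching yyyy/mm/dd/hh.
    return f"{capture_prefix}/{endpoint_name}/{variant_name}/{when:%Y/%m/%d/%H}"


example_prefix = hourly_capture_prefix(
    "ModelQualityMonitor/datacapture",
    "xgb-weather-prediction-model-monitor-2021-08-03-06-29-41",
    "AllTraffic",
    datetime(2021, 8, 3, 6, 37, 13, tzinfo=timezone.utc),
)
print(example_prefix)
```

A prefix built this way can be passed as the `Prefix` argument when listing objects, to narrow the listing to a single hour.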

In [21]:
s3_capture_upload_path

's3://datascience-environment-notebookinstance--06dc7a0224df/ModelQualityMonitor/datacapture'

In [22]:
#Note : If you see an error in this cell, it could be because the captured files didn't appear in S3 yet.
#Retry after a minute.

s3_client = boto3.Session().client("s3")
current_endpoint_capture_prefix = "{}/{}".format(data_capture_prefix, endpoint_name)
result = s3_client.list_objects(Bucket=bucket, Prefix=current_endpoint_capture_prefix)

#print(result.get("Contents"))

capture_files = [capture_file.get("Key") for capture_file in result.get("Contents")]
print("Found Capture Files:")
print("\n ".join(capture_files))

Found Capture Files:
ModelQualityMonitor/datacapture/xgb-weather-prediction-model-monitor-2021-08-03-06-29-41/AllTraffic/2021/08/03/06/37-13-691-105c7f5d-682e-4405-ba5a-0c38e213f99a.jsonl


Next, view the contents of a single capture file. Here you should see all the data captured in an Amazon SageMaker specific JSON-line formatted file. Take a quick peek at the first few lines in the captured file.

In [24]:
def get_obj_body(obj_key):
    return s3_client.get_object(Bucket=bucket, Key=obj_key).get("Body").read().decode("utf-8")


capture_file = get_obj_body(capture_files[-1])
print(capture_file[:2000])

{"captureData":{"endpointInput":{"observedContentType":"text/csv","mode":"INPUT","data":"0,2020,12,4,31,0,19.0,0.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0\n","encoding":"CSV"},"endpointOutput":{"observedContentType":"text/csv; charset=utf-8","mode":"OUTPUT","data":"-4.902510643005371","encoding":"CSV"}},"eventMetadata":{"eventId":"747a755b-8410-408e-86f2-1a76988d346e","inferenceTime":"2021-08-03T06:37:13Z"},"eventVersion":"0"}
{"captureData":{"endpointInput":{"observedContentType":"text/csv","mode":"INPUT","data":"0,2020,12,4,31,0,19.0,0.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0\n","encoding":"CSV"},"endpointOutput":{"observedContentType":"text/csv; charset=utf-8","mode":"OUTPUT","data":"-4.902510643005371","encoding":"CSV"}},"eventMetadata":{"eventId":"d631cfc1-bb7b-4a19-bdd3-5ed8f2f3bf9b","inferenceTime":"2021-08-03T06:37:14Z"},"eventVersion":"0"}
{"captureData":{"endpointInput":{"observedContentType":"text/csv","mode":"INPUT","data":"0,2020,12,4,31,0,19.0,0.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0\n","enc

Finally, the contents of a single line are shown below as formatted JSON so that they are easier to read.

In [25]:
print(json.dumps(json.loads(capture_file.split("\n")[0]), indent=2))

{
  "captureData": {
    "endpointInput": {
      "observedContentType": "text/csv",
      "mode": "INPUT",
      "data": "0,2020,12,4,31,0,19.0,0.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0\n",
      "encoding": "CSV"
    },
    "endpointOutput": {
      "observedContentType": "text/csv; charset=utf-8",
      "mode": "OUTPUT",
      "data": "-4.902510643005371",
      "encoding": "CSV"
    }
  },
  "eventMetadata": {
    "eventId": "747a755b-8410-408e-86f2-1a76988d346e",
    "inferenceTime": "2021-08-03T06:37:13Z"
  },
  "eventVersion": "0"
}


As you can see, each inference request is captured in one line in the jsonl file. The line contains both the input and output merged together. In the example, you provided the ContentType as `text/csv`, which is reflected in the `observedContentType` value. The `encoding` value shows how the input and output payloads were encoded in the capture format.
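
To make the capture format concrete, here is a minimal sketch that parses one capture line into its features, prediction, and identifiers. The function name `parse_capture_line` and the returned dictionary keys are hypothetical; the field names inside the record are the ones shown in the capture output above, and the sketch assumes CSV-encoded payloads with numeric values.

```python
import json


def parse_capture_line(line):
    """Extract the input features, the prediction, and the IDs from one capture record."""
    record = json.loads(line)
    data = record["captureData"]
    meta = record["eventMetadata"]
    # The input payload is a CSV row that ends with a newline.
    features = [float(v) for v in data["endpointInput"]["data"].strip().split(",")]
    return {
        "features": features,
        "prediction": float(data["endpointOutput"]["data"]),
        "event_id": meta["eventId"],
        # inferenceId is only present when the caller supplied one.
        "inference_id": meta.get("inferenceId"),
        "inference_time": meta["inferenceTime"],
    }


# Sample record rebuilt from the capture output shown in this notebook.
sample_line = json.dumps({
    "captureData": {
        "endpointInput": {
            "observedContentType": "text/csv",
            "mode": "INPUT",
            "data": "0,2020,12,4,31,0,19.0,0.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0\n",
            "encoding": "CSV",
        },
        "endpointOutput": {
            "observedContentType": "text/csv; charset=utf-8",
            "mode": "OUTPUT",
            "data": "-4.902510643005371",
            "encoding": "CSV",
        },
    },
    "eventMetadata": {
        "eventId": "747a755b-8410-408e-86f2-1a76988d346e",
        "inferenceTime": "2021-08-03T06:37:13Z",
    },
    "eventVersion": "0",
})

parsed = parse_capture_line(sample_line)
print(len(parsed["features"]), parsed["prediction"])
```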

To recap, you observed how you can enable capturing the input or output payloads to an endpoint with a new parameter. You have also observed what the captured format looks like in Amazon S3. Next, continue to explore how Amazon SageMaker helps with monitoring the data collected in Amazon S3.

### 3. Create a baseline with Model Monitor 

In addition to collecting the data, Amazon SageMaker provides the capability for you to monitor and evaluate the data observed by the endpoints. To see this in action, let's first create a baseline, which will then be used to compare the real-time traffic against.

#### 3.1 Setup the baseline dataset

For generating a baseline, you need to provide a baseline dataset.  

The model quality baseline job compares the predictions made by the model with the labels in a baseline dataset. So instead of using the training data directly, you first have to generate a baseline dataset that contains both predictions and labels by running predictions against the model. In this example, we use the validation dataset to run predictions against the model and use the results as input to the baseline generation job.

In [26]:
##Use the validation file 'validation_data.csv' provided in the data directory to execute inferences
with open('data/validation_data.csv', 'r') as TF:
    v_lines = TF.readlines()

In [27]:
model_baseline_file = 'model-quality-baseline-data.csv'

In [28]:
with open(f"data/{model_baseline_file}", "w") as baseline_file:
    baseline_file.write("probability,prediction,label\n")  # Header of the file
    for vl in v_lines[1:300]:
        #Remove the first column since it is the label
        validation_list = vl.split(",")
        label = validation_list.pop(0)
        validation_string = ','.join([str(elem) for elem in validation_list])
        
        #print("invoking with payload " + test_string)
    
        result = sagemaker_runtime_client.invoke_endpoint(EndpointName=endpoint_name,
                                   ContentType="text/csv",
                                   Body=validation_string)
        #print(result)                              
        rbody = StreamingBody(raw_stream=result['Body'],content_length=int(result['ResponseMetadata']['HTTPHeaders']['content-length']))
        #print(rbody)
        prediction = rbody.read().decode('utf-8')
        #print('prediction : ' , prediction)
        ##Using prediction as the probability
        baseline_file.write(f"{prediction},{prediction},{label}\n")
        #print(f"label {label} ; prediction {prediction} ")
        print(".", end="", flush=True)
        sleep(0.5)

...........................................................................................................................................................................................................................................................................................................

In [31]:
!head data/{model_baseline_file}

probability,prediction,label
-4.902510643005371,-4.902510643005371,-7.535634515882377
-4.902510643005371,-4.902510643005371,-7.535634515882377
-4.902510643005371,-4.902510643005371,-7.535634515882377
-4.902510643005371,-4.902510643005371,-7.535634515882377
-4.902510643005371,-4.902510643005371,-7.535634515882377
-4.902510643005371,-4.902510643005371,-7.535634515882377
-4.902510643005371,-4.902510643005371,-7.535634515882377
-4.902510643005371,-4.902510643005371,-7.535634515882377
-4.902510643005371,-4.902510643005371,-7.535634515882377


#### 3.2 Upload the predictions as a baseline dataset.
Now we will upload the predictions made using the validation dataset to S3, where they will be used to create the model quality baseline statistics and constraints.

In [32]:
baseline_prefix = prefix + "/baselining"
baseline_data_prefix = baseline_prefix + "/data"
baseline_results_prefix = baseline_prefix + "/results"

baseline_data_uri = f"s3://{bucket}/{baseline_data_prefix}"
baseline_results_uri = f"s3://{bucket}/{baseline_results_prefix}"
print(f"Baseline data uri: {baseline_data_uri}")
print(f"Baseline results uri: {baseline_results_uri}")

Baseline data uri: s3://datascience-environment-notebookinstance--06dc7a0224df/ModelQualityMonitor/baselining/data
Baseline results uri: s3://datascience-environment-notebookinstance--06dc7a0224df/ModelQualityMonitor/baselining/results


In [33]:
baseline_dataset_uri = S3Uploader.upload(f"data/{model_baseline_file}", baseline_data_uri)
baseline_dataset_uri

's3://datascience-environment-notebookinstance--06dc7a0224df/ModelQualityMonitor/baselining/data/model-quality-baseline-data.csv'

#### 3.3 Create a baselining job

Now that you have the baseline data ready in Amazon S3, start a job to `suggest` constraints. `ModelQualityMonitor.suggest_baseline(..)` starts a `ProcessingJob` using an Amazon SageMaker provided Model Monitor container to generate the constraints.

In [36]:
session = Session()

model_quality_monitor = ModelQualityMonitor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    volume_size_in_gb=20,
    max_runtime_in_seconds=1800,
    sagemaker_session=session,
)

In [37]:
# Name of the model quality baseline job
baseline_job_name = f"model-quality-baseline-job-{datetime.utcnow():%Y-%m-%d-%H%M}"

In [38]:
baseline_dataset_uri

's3://datascience-environment-notebookinstance--06dc7a0224df/ModelQualityMonitor/baselining/data/model-quality-baseline-data.csv'

In [39]:
# Execute the baseline suggestion job.
# You will specify problem type, in this case regression, and provide other required attributes.
job = model_quality_monitor.suggest_baseline(
    job_name=baseline_job_name,
    baseline_dataset=baseline_dataset_uri,
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri=baseline_results_uri,
    problem_type="Regression",
    inference_attribute="prediction",
    probability_attribute="probability",
    ground_truth_attribute="label",
)
job.wait(logs=False)





Job Name:  model-quality-baseline-job-2021-08-03-0641
Inputs:  [{'InputName': 'baseline_dataset_input', 'AppManaged': False, 'S3Input': {'S3Uri': 's3://datascience-environment-notebookinstance--06dc7a0224df/ModelQualityMonitor/baselining/data/model-quality-baseline-data.csv', 'LocalPath': '/opt/ml/processing/input/baseline_dataset_input', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}]
Outputs:  [{'OutputName': 'monitoring_output', 'AppManaged': False, 'S3Output': {'S3Uri': 's3://datascience-environment-notebookinstance--06dc7a0224df/ModelQualityMonitor/baselining/results', 'LocalPath': '/opt/ml/processing/output', 'S3UploadMode': 'EndOfJob'}}]
.................................................................!

#### 3.4 Explore the results of the baselining job
The baselining job uploads the baseline constraints and statistics files to the S3 output location.

In [40]:
baseline_job = model_quality_monitor.latest_baselining_job

##### 3.4.1 View the metrics generated
The cells below load the baseline statistics file that the job uploaded to S3 and display its regression metrics.

In [41]:
#baseline_job.baseline_statistics().body_dict

In [42]:

# The problem type is regression, so read the regression metrics from the statistics file
regression_metrics = baseline_job.baseline_statistics().body_dict["regression_metrics"]
pd.json_normalize(regression_metrics).T

| | 0 |
|---|---|
| mae.value | 3.955825 |
| mae.standard_deviation | 0.058059 |
| mse.value | 18.548753 |
| mse.standard_deviation | 0.495597 |
| rmse.value | 4.306826 |
| rmse.standard_deviation | 0.058016 |
| r2.value | -7.741915 |
| r2.standard_deviation | 1.346406 |
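
To clarify what the metric names in the baseline statistics mean, the sketch below computes MAE, MSE, RMSE, and R² from paired predictions and labels using their textbook definitions. It is an illustration, not the Model Monitor implementation: the function name `compute_regression_metrics` and the sample numbers are hypothetical, and the sketch does not reproduce the standard deviations reported by the baselining job.

```python
import math


def compute_regression_metrics(predictions, labels):
    """Compute MAE, MSE, RMSE and R^2 from paired predictions and labels."""
    n = len(labels)
    errors = [p - y for p, y in zip(predictions, labels)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # R^2 compares the model's squared error with that of always predicting the mean label.
    mean_label = sum(labels) / n
    ss_total = sum((y - mean_label) ** 2 for y in labels)
    ss_residual = sum(e * e for e in errors)
    r2 = 1 - ss_residual / ss_total if ss_total else float("nan")
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Hypothetical values, used only for illustration.
metrics = compute_regression_metrics(predictions=[1.0, 2.0, 3.0], labels=[1.0, 2.0, 5.0])
print(metrics)
```

A negative R², like the baseline value in this notebook, means the predictions have a larger squared error than a constant prediction of the mean label would.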


##### 3.4.2 View the constraints generated

In [43]:
pd.DataFrame(baseline_job.suggested_constraints().body_dict["regression_constraints"]).T

| | threshold | comparison_operator |
|---|---|---|
| mae | 3.95583 | GreaterThanThreshold |
| mse | 18.5488 | GreaterThanThreshold |
| rmse | 4.30683 | GreaterThanThreshold |
| r2 | -7.74191 | LessThanThreshold |
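
The following sketch illustrates how constraints of this shape could be evaluated against metrics observed in a later monitoring window. It assumes that `GreaterThanThreshold` flags a violation when the observed metric exceeds the threshold and `LessThanThreshold` flags one when it falls below; the function name `find_violations` and the observed values are hypothetical, and this is not the Model Monitor implementation.

```python
# Thresholds copied from the suggested constraints shown in this notebook.
BASELINE_CONSTRAINTS = {
    "mae": {"threshold": 3.95583, "comparison_operator": "GreaterThanThreshold"},
    "mse": {"threshold": 18.5488, "comparison_operator": "GreaterThanThreshold"},
    "rmse": {"threshold": 4.30683, "comparison_operator": "GreaterThanThreshold"},
    "r2": {"threshold": -7.74191, "comparison_operator": "LessThanThreshold"},
}


def find_violations(constraints, observed):
    """Return the names of metrics whose observed value breaches its constraint."""
    violations = []
    for metric, rule in constraints.items():
        value = observed.get(metric)
        if value is None:
            continue  # metric not reported in this window
        operator = rule["comparison_operator"]
        threshold = rule["threshold"]
        if operator == "GreaterThanThreshold" and value > threshold:
            violations.append(metric)
        elif operator == "LessThanThreshold" and value < threshold:
            violations.append(metric)
    return violations


# Hypothetical observed metrics, used only for illustration.
observed_metrics = {"mae": 4.2, "mse": 17.64, "rmse": 4.2, "r2": -7.0}
print(find_violations(BASELINE_CONSTRAINTS, observed_metrics))
```

Error metrics use an upper bound because larger values mean a worse model, while R² uses a lower bound because smaller values mean a worse fit.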


### 4. Schedule continuous monitoring
Now that you have collected the data above, analyze and monitor it with monitoring schedules.

#### 4.1 Generate prediction data for Model Quality  Monitoring

Start generating some artificial traffic. The cell below sends up to 299 test records to the endpoint in a loop, one request about every half second. If there is no traffic, the monitoring jobs are marked as `Failed` since there is no data to process.

In [44]:
# Send test records to the endpoint, tagging each request with an InferenceId
for i, tl in enumerate(t_lines[1:300]):
    # Remove the first column since it is the label
    test_list = tl.split(",")
    label = test_list.pop(0)
    test_string = ",".join(test_list)

    result = sagemaker_runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=test_string,
        InferenceId=str(i),  # unique ID per row
    )
    # result["Body"] is already a StreamingBody, so it can be read directly
    prediction = result["Body"].read().decode("utf-8")
    print(".", end="", flush=True)
    sleep(0.5)
        

...........................................................................................................................................................................................................................................................................................................

Notice the new `InferenceId` parameter, which we're setting when invoking the endpoint. It appears as `inferenceId` in the captured records and is used to join the prediction data with the ground truth data.
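
The join between predictions and ground truth can be sketched as follows. This is a simplified illustration of the idea, not the actual merge job: the function names `join_key` and `merge_with_ground_truth` are hypothetical, the record shapes follow the capture and ground truth formats used in this notebook, and the sketch assumes numeric predictions and labels. When a capture record has no `inferenceId`, its `eventId` is used as the key instead.

```python
def join_key(capture_record):
    """Use inferenceId when the caller supplied one, otherwise fall back to eventId."""
    meta = capture_record["eventMetadata"]
    return meta.get("inferenceId") or meta["eventId"]


def merge_with_ground_truth(capture_records, ground_truth_records):
    """Pair each captured prediction with the ground truth label that shares its key."""
    labels = {
        gt["eventMetadata"]["eventId"]: gt["groundTruthData"]["data"]
        for gt in ground_truth_records
    }
    merged = []
    for record in capture_records:
        key = join_key(record)
        if key in labels:
            merged.append({
                "id": key,
                "prediction": float(record["captureData"]["endpointOutput"]["data"]),
                "label": float(labels[key]),
            })
    return merged


# Hypothetical, trimmed-down records used only for illustration.
captures = [
    {"captureData": {"endpointOutput": {"data": "0.17"}},
     "eventMetadata": {"eventId": "73cccb5c", "inferenceId": "234"}},
    {"captureData": {"endpointOutput": {"data": "-4.31"}},
     "eventMetadata": {"eventId": "d9b99b7b"}},
]
ground_truth = [
    {"groundTruthData": {"data": "1", "encoding": "CSV"},
     "eventMetadata": {"eventId": "234"}, "eventVersion": "0"},
]

merged_rows = merge_with_ground_truth(captures, ground_truth)
print(merged_rows)
```

The second capture record is dropped here because no ground truth record carries its key, which is why every request in this notebook is tagged with a predictable ID.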

#### 4.2 View captured data

Now list the data capture files stored in Amazon S3. You should expect to see different files from different time periods organized based on the hour in which the invocation occurred. The format of the Amazon S3 path is:

`s3://{destination-bucket-prefix}/{endpoint-name}/{variant-name}/yyyy/mm/dd/hh/filename.jsonl`

In [45]:
print("Waiting for captures to show up", end="")
for _ in range(120):
    capture_files = sorted(S3Downloader.list(f"{s3_capture_upload_path}/{endpoint_name}"))
    if capture_files:
        capture_file = S3Downloader.read_file(capture_files[-1]).split("\n")
        capture_record = json.loads(capture_file[0])
        if "inferenceId" in capture_record["eventMetadata"]:
            break
    print(".", end="", flush=True)
    sleep(1)
print()
print("Found Capture Files:")
print("\n ".join(capture_files[-3:]))

Waiting for captures to show up
Found Capture Files:
s3://datascience-environment-notebookinstance--06dc7a0224df/ModelQualityMonitor/datacapture/xgb-weather-prediction-model-monitor-2021-08-03-06-29-41/AllTraffic/2021/08/03/06/41-08-476-2e02d9ae-512f-4188-9d8d-0ed1f1534950.jsonl
 s3://datascience-environment-notebookinstance--06dc7a0224df/ModelQualityMonitor/datacapture/xgb-weather-prediction-model-monitor-2021-08-03-06-29-41/AllTraffic/2021/08/03/06/47-09-023-c2948643-51cc-456c-9c9c-6520614a3f77.jsonl
 s3://datascience-environment-notebookinstance--06dc7a0224df/ModelQualityMonitor/datacapture/xgb-weather-prediction-model-monitor-2021-08-03-06-29-41/AllTraffic/2021/08/03/06/48-09-401-087f7348-9179-442d-a6ae-11317b831d87.jsonl


Next, view the contents of a single capture file. Here you should see all the data captured in an Amazon SageMaker specific JSON-line formatted file. Take a quick peek at the last few lines in the captured file.

In [46]:
print("\n".join(capture_file[-3:-1]))

{"captureData":{"endpointInput":{"observedContentType":"text/csv","mode":"INPUT","data":"0,2020,12,4,31,0,2791.0,76.0,2.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0\n","encoding":"CSV"},"endpointOutput":{"observedContentType":"text/csv; charset=utf-8","mode":"OUTPUT","data":"0.17314249277114868","encoding":"CSV"}},"eventMetadata":{"eventId":"73cccb5c-fb3a-4973-b55c-5d0e3c043cf6","inferenceId":"234","inferenceTime":"2021-08-03T06:49:08Z"},"eventVersion":"0"}
{"captureData":{"endpointInput":{"observedContentType":"text/csv","mode":"INPUT","data":"0,2020,12,4,31,0,4252.0,39.0,2.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0\n","encoding":"CSV"},"endpointOutput":{"observedContentType":"text/csv; charset=utf-8","mode":"OUTPUT","data":"0.17314249277114868","encoding":"CSV"}},"eventMetadata":{"eventId":"96c5a119-2c78-4358-b01f-5b8fae830240","inferenceId":"235","inferenceTime":"2021-08-03T06:49:09Z"},"eventVersion":"0"}


Finally, the contents of a single line are shown below as formatted JSON so that they are easier to read.

Again, notice the `inferenceId` attribute that is set as part of the invoke_endpoint call.  If this is present, it will be used to join with ground truth data (otherwise `eventId` will be used):

In [47]:
print(json.dumps(capture_record, indent=2))

{
  "captureData": {
    "endpointInput": {
      "observedContentType": "text/csv",
      "mode": "INPUT",
      "data": "0,2020,12,4,31,0,80.0,0.0,6.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0\n",
      "encoding": "CSV"
    },
    "endpointOutput": {
      "observedContentType": "text/csv; charset=utf-8",
      "mode": "OUTPUT",
      "data": "-4.3170342445373535",
      "encoding": "CSV"
    }
  },
  "eventMetadata": {
    "eventId": "d9b99b7b-b3f9-4a4f-bd89-a710b234ea6a",
    "inferenceId": "118",
    "inferenceTime": "2021-08-03T06:48:09Z"
  },
  "eventVersion": "0"
}


#### 4.3 Generate synthetic ground truth

Next, start generating ground truth data. The model quality job will fail if there's no ground truth data to merge.

In [48]:
import random


def ground_truth_with_id(inference_id):
    random.seed(inference_id)  # to get consistent results
    rand = random.random()
    return {
        "groundTruthData": {
            "data": "1" if rand < 0.7 else "0",  # randomly generate positive labels 70% of the time #
             # TODO : Need to make this a decimal??
            "encoding": "CSV",
        },
        "eventMetadata": {
            "eventId": str(inference_id),
        },
        "eventVersion": "0",
    }


def upload_ground_truth(records, upload_time):
    fake_records = [json.dumps(r) for r in records]
    data_to_upload = "\n".join(fake_records)
    target_s3_uri = f"{ground_truth_upload_path}/{upload_time:%Y/%m/%d/%H/%M%S}.jsonl"
    print(f"Uploading {len(fake_records)} records to", target_s3_uri)
    S3Uploader.upload_string_as_file_body(data_to_upload, target_s3_uri)

In [49]:
#NUM_GROUND_TRUTH_RECORDS = 334  # 334 are the number of rows in data we're sending for inference
NUM_GROUND_TRUTH_RECORDS = 300


def generate_fake_ground_truth_forever():
    while True:
        fake_records = [ground_truth_with_id(i) for i in range(NUM_GROUND_TRUTH_RECORDS)]
        upload_ground_truth(fake_records, datetime.utcnow())
        sleep(60 * 60)  # do this once an hour


gt_thread = Thread(target=generate_fake_ground_truth_forever)
gt_thread.start()

Uploading 300 records to s3://datascience-environment-notebookinstance--06dc7a0224df/ModelQualityMonitor/ground_truth_data/2021-08-03-06-29-40/2021/08/03/06/4942.jsonl
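
The synthetic labels above are random 0/1 values. Since the weather model is a regressor and the first column of the test file is the label, one alternative is to build ground truth records from the real labels. The sketch below shows that idea under stated assumptions: the function names `ground_truth_from_rows` and `to_json_lines` are hypothetical, the rows are assumed to be the same lines, in the same order, that were sent to the endpoint (so the row index matches the `InferenceId`), and the sample rows are made up.

```python
import json


def ground_truth_from_rows(rows):
    """Build ground truth records from CSV rows whose first column is the label."""
    records = []
    for inference_id, row in enumerate(rows):
        label = row.split(",", 1)[0].strip()
        records.append({
            "groundTruthData": {"data": label, "encoding": "CSV"},
            # Must match the InferenceId sent with the corresponding request.
            "eventMetadata": {"eventId": str(inference_id)},
            "eventVersion": "0",
        })
    return records


def to_json_lines(records):
    """Serialize records in JSON Lines form, one record per line."""
    return "\n".join(json.dumps(r) for r in records)


# Hypothetical rows, used only for illustration.
sample_rows = ["-7.5,0,2020,12\n", "1.25,0,2020,12\n"]
gt_records = ground_truth_from_rows(sample_rows)
print(to_json_lines(gt_records))
```

In the notebook, the equivalent input would be `t_lines[1:300]`, and the resulting records could be passed to `upload_ground_truth` in place of the random ones.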


#### 4.4 Create a monitoring schedule

Now that you have the baseline information and ground truth labels, create a monitoring schedule to run the model quality monitoring job.

In [50]:
##Monitoring schedule name
model_quality_monitor_schedule_name = (
    f"model-quality-monitoring-schedule-{datetime.utcnow():%Y-%m-%d-%H%M}"
)

For the monitoring schedule you need to specify how to interpret an endpoint's output. Given that the endpoint in this notebook outputs CSV data, the code below specifies that the first column of the output, `0`, contains the predicted value. Because this is a regression problem, the probability attribute and the probability threshold are left commented out.

In [51]:
# Create an EndpointInput
endpointInput = EndpointInput(
    endpoint_name=endpoint_name,
    inference_attribute='0',
    # The probability settings below are not used for this regression problem
    #probability_attribute="0",
    #probability_threshold_attribute=0.5,
    destination="/opt/ml/processing/input_data",
)

In [52]:
# Create the monitoring schedule to execute every hour.
from sagemaker.model_monitor import CronExpressionGenerator

response = model_quality_monitor.create_monitoring_schedule(
    monitor_schedule_name=model_quality_monitor_schedule_name,
    endpoint_input=endpointInput,
    output_s3_uri=baseline_results_uri,
    problem_type="Regression",
    ground_truth_input=ground_truth_upload_path,
    constraints=baseline_job.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
    enable_cloudwatch_metrics=True
)

In [53]:
# Describe the monitoring schedule
# The schedule starts in the 'Pending' status and moves to 'Scheduled' once it is ready
model_quality_monitor.describe_schedule()

{'MonitoringScheduleArn': 'arn:aws:sagemaker:us-west-2:802439482869:monitoring-schedule/model-quality-monitoring-schedule-2021-08-03-0649',
 'MonitoringScheduleName': 'model-quality-monitoring-schedule-2021-08-03-0649',
 'MonitoringScheduleStatus': 'Pending',
 'MonitoringType': 'ModelQuality',
 'CreationTime': datetime.datetime(2021, 8, 3, 6, 49, 43, 236000, tzinfo=tzlocal()),
 'LastModifiedTime': datetime.datetime(2021, 8, 3, 6, 49, 43, 259000, tzinfo=tzlocal()),
 'MonitoringScheduleConfig': {'ScheduleConfig': {'ScheduleExpression': 'cron(0 * ? * * *)'},
  'MonitoringJobDefinitionName': 'model-quality-job-definition-2021-08-03-06-49-42-976',
  'MonitoringType': 'ModelQuality'},
 'EndpointName': 'xgb-weather-prediction-model-monitor-2021-08-03-06-29-41',
 'ResponseMetadata': {'RequestId': '67097672-ccd3-480e-98a2-c0f1d77e2ce0',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '67097672-ccd3-480e-98a2-c0f1d77e2ce0',
   'content-type': 'application/x-amz-json-1.1',
   'cont

#### 4.5 Examine monitoring schedule executions

In [55]:
# Initially there will be no executions since the first execution happens at the top of the hour.
# Note that it is common for the execution to launch up to 20 minutes after the hour.
executions = model_quality_monitor.list_executions()
executions

No executions found for schedule. monitoring_schedule_name: model-quality-monitoring-schedule-2021-08-03-0649


[]
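
To estimate when the first execution should appear, the sketch below computes the next top-of-the-hour time for an hourly schedule and the end of the start window mentioned in the comment above. The function names are hypothetical, and the 20-minute delay is taken from that comment rather than from a guaranteed service limit.

```python
from datetime import datetime, timedelta, timezone


def next_hourly_run(now):
    """Return the next top-of-the-hour time strictly after `now`."""
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    return top_of_hour + timedelta(hours=1)


def latest_expected_start(scheduled, max_delay_minutes=20):
    """Return the end of the window in which the execution is expected to launch."""
    return scheduled + timedelta(minutes=max_delay_minutes)


# Creation time of the schedule shown in this notebook, interpreted as UTC.
created = datetime(2021, 8, 3, 6, 49, 43, tzinfo=timezone.utc)
scheduled = next_hourly_run(created)
print("Scheduled for:", scheduled.isoformat())
print("Expected to start by:", latest_expected_start(scheduled).isoformat())
```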

In [56]:
# Wait for the first execution of the monitoring_schedule
print("Waiting for first execution", end="")
while True:
    execution = model_quality_monitor.describe_schedule().get(
        "LastMonitoringExecutionSummary"
    )
    if execution:
        break
    print(".", end="", flush=True)
    sleep(10)
print()
print("Execution found!")

Waiting for first execution............................................................................
Execution found!


In [57]:
while not executions:
    executions = model_quality_monitor.list_executions()
    sleep(10)
latest_execution = executions[-1]
latest_execution.describe()

{'ProcessingInputs': [{'InputName': 'groundtruth_input_1',
   'AppManaged': False,
   'S3Input': {'S3Uri': 's3://datascience-environment-notebookinstance--06dc7a0224df/ModelQualityMonitor/ground_truth_data/2021-08-03-06-29-40/2021/08/03/06',
    'LocalPath': '/opt/ml/processing/groundtruth/2021/08/03/06',
    'S3DataType': 'S3Prefix',
    'S3InputMode': 'File',
    'S3DataDistributionType': 'FullyReplicated',
    'S3CompressionType': 'None'}},
  {'InputName': 'endpoint_input_1',
   'AppManaged': False,
   'S3Input': {'S3Uri': 's3://datascience-environment-notebookinstance--06dc7a0224df/ModelQualityMonitor/datacapture/xgb-weather-prediction-model-monitor-2021-08-03-06-29-41/AllTraffic/2021/08/03/06',
    'LocalPath': '/opt/ml/processing/input_data/xgb-weather-prediction-model-monitor-2021-08-03-06-29-41/AllTraffic/2021/08/03/06',
    'S3DataType': 'S3Prefix',
    'S3InputMode': 'File',
    'S3DataDistributionType': 'FullyReplicated',
    'S3CompressionType': 'None'}}],
 'ProcessingOutpu

##### Inspect a specific execution (latest execution)
In the previous cell, you picked up the latest completed or failed scheduled execution. Here are the possible terminal states and what each of them means:
* Completed - The monitoring execution completed and no issues were found in the violations report.
* CompletedWithViolations - The execution completed, but constraint violations were detected.
* Failed - The monitoring execution failed, possibly due to a client error (for example, incorrect role permissions) or infrastructure issues. Examine FailureReason and ExitMessage to identify what exactly happened.
* Stopped - The job exceeded the maximum runtime or was manually stopped.
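
The states listed above can be captured in a small helper that decides whether an execution has finished and whether its results are worth inspecting. This is a minimal sketch: the function names are hypothetical, and treating both `Completed` and `CompletedWithViolations` as having results to inspect is an assumption based on the descriptions above.

```python
TERMINAL_STATUSES = {"Completed", "CompletedWithViolations", "Failed", "Stopped"}
# Assumption: both of these states mean the monitoring job ran to the end.
STATUSES_WITH_RESULTS = {"Completed", "CompletedWithViolations"}


def is_terminal(status):
    """Return True when the execution will not change state any further."""
    return status in TERMINAL_STATUSES


def has_results(status):
    """Return True when the execution finished and its output can be inspected."""
    return status in STATUSES_WITH_RESULTS


for example_status in ["InProgress", "CompletedWithViolations", "Failed"]:
    print(example_status, is_terminal(example_status), has_results(example_status))
```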

In [58]:
status = execution["MonitoringExecutionStatus"]

while status in ["Pending", "InProgress"]:
    print("Waiting for execution to finish", end="")
    latest_execution.wait(logs=False)
    latest_job = latest_execution.describe()
    print()
    print(f"{latest_job['ProcessingJobName']} job status:", latest_job["ProcessingJobStatus"])
    print(
        f"{latest_job['ProcessingJobName']} job exit message, if any:",
        latest_job.get("ExitMessage"),
    )
    print(
        f"{latest_job['ProcessingJobName']} job failure reason, if any:",
        latest_job.get("FailureReason"),
    )
    # Model quality executions consist of two Processing jobs; wait for the second job to start.
    sleep(30)
    latest_execution = model_quality_monitor.list_executions()[-1]
    execution = model_quality_monitor.describe_schedule()["LastMonitoringExecutionSummary"]
    status = execution["MonitoringExecutionStatus"]

print("Execution status is:", status)

if status != "Completed":
    print(execution)
    print(
        "====STOP==== \n No completed executions to inspect further. Please wait till an execution completes or investigate previously reported failures."
    )

Waiting for execution to finish...........................................................!
groundtruth-merge-202108030700-259fee1f24dfb6ac09b389dd job status: Completed
groundtruth-merge-202108030700-259fee1f24dfb6ac09b389dd job exit message, if any: None
groundtruth-merge-202108030700-259fee1f24dfb6ac09b389dd job failure reason, if any: None
Waiting for execution to finish............................................................!
model-quality-monitoring-202108030700-259fee1f24dfb6ac09b389dd job status: Completed
model-quality-monitoring-202108030700-259fee1f24dfb6ac09b389dd job exit message, if any: CompletedWithViolations: Job completed successfully with 1 violations.
model-quality-monitoring-202108030700-259fee1f24dfb6ac09b389dd job failure reason, if any: None
Execution status is: CompletedWithViolations
{'MonitoringScheduleName': 'model-quality-monitoring-schedule-2021-08-03-0649', 'ScheduledTime': datetime.datetime(2021, 8, 3, 7, 0, tzinfo=tzlocal()), 'CreationTime': datetim
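
The polling loop above runs until a terminal status appears, with no upper bound on the wait. As a hedged alternative, the sketch below adds a timeout. `wait_for_terminal_status` is a hypothetical helper, and the status source is passed in as a callable so that the function itself makes no AWS calls.

```python
import time


def wait_for_terminal_status(get_status, poll_seconds=30, timeout_seconds=3600, sleep=time.sleep):
    """Poll get_status() until it returns something other than Pending/InProgress.

    Raises TimeoutError if no terminal status is seen within timeout_seconds.
    """
    waited = 0
    status = get_status()
    while status in ("Pending", "InProgress"):
        if waited >= timeout_seconds:
            raise TimeoutError(f"Still {status} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
        status = get_status()
    return status


# Possible usage with the notebook's monitor object (not executed here):
# status = wait_for_terminal_status(
#     lambda: model_quality_monitor.describe_schedule()[
#         "LastMonitoringExecutionSummary"
#     ]["MonitoringExecutionStatus"]
# )
```

Injecting `sleep` keeps the helper easy to exercise without real delays.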

In [59]:
latest_execution = model_quality_monitor.list_executions()[-1]
report_uri = latest_execution.describe()["ProcessingOutputConfig"]["Outputs"][0]["S3Output"][
    "S3Uri"
]
print("Report Uri:", report_uri)

Report Uri: s3://datascience-environment-notebookinstance--06dc7a0224df/ModelQualityMonitor/baselining/results/xgb-weather-prediction-model-monitor-2021-08-03-06-29-41/model-quality-monitoring-schedule-2021-08-03-0649/2021/08/03/07
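
To work with the report location through the S3 API, the URI has to be split into a bucket and a key prefix. The following sketch uses only the standard library. `split_s3_uri` and `report_key` are hypothetical helpers, and the default file name `constraint_violations.json` is an assumption about what the monitoring job writes under the report prefix.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/prefix' into ('bucket', 'some/prefix')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def report_key(report_uri, file_name="constraint_violations.json"):
    """Build the object key for a report file under the report prefix.

    The default file name is an assumption; adjust it to the files actually
    present under the prefix.
    """
    _, prefix = split_s3_uri(report_uri)
    return f"{prefix.rstrip('/')}/{file_name}"
```

For example, `split_s3_uri(report_uri)` yields the bucket and prefix that could be passed to `list_objects_v2` on a boto3 S3 client.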


### 5. Analyze monitoring results

#### 5.1 View violations generated by monitoring schedule

If there are any violations compared to the baseline, they will be listed in the reports uploaded to S3.

In [60]:
pd.options.display.max_colwidth = None
violations = latest_execution.constraint_violations().body_dict["violations"]
violations_df = pd.json_normalize(violations)
violations_df.head(10)

| | constraint_check_type | description | metric_name |
|---|---|---|---|
| 0 | LessThanThreshold | Metric r2 with -53.790146743270654 +/- 4.829550001308316 was LessThanThreshold '-7.7419149776821' | r2 |


Here you can see that one violation was generated: the r2 score is less than the threshold value set as part of baselining.
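
When a report contains many violations, it can help to filter or count them by metric. The sketch below operates on a plain list of violation dicts, such as the `violations` list loaded above. `violations_for_metric` and `count_by_metric` are hypothetical helpers, and the `sample` entry is copied from the output shown above.

```python
from collections import Counter


def violations_for_metric(violations, metric_name):
    """Return only the violations reported for the given metric."""
    return [v for v in violations if v.get("metric_name") == metric_name]


def count_by_metric(violations):
    """Count violations per metric name."""
    return dict(Counter(v.get("metric_name") for v in violations))


# Sample entry taken from the violations output above.
sample = [
    {
        "constraint_check_type": "LessThanThreshold",
        "description": "Metric r2 with -53.790146743270654 +/- 4.829550001308316 "
        "was LessThanThreshold '-7.7419149776821'",
        "metric_name": "r2",
    }
]
r2_violations = violations_for_metric(sample, "r2")
```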

In addition to the violations, the monitoring schedule also emits CloudWatch metrics. In this section, you will view the generated metrics and set up a CloudWatch alarm that is triggered when model quality drifts from the baseline thresholds. You could use CloudWatch alarms to trigger remedial actions such as retraining your model or updating the training dataset.

#### 5.2 List the CloudWatch metrics generated

In [61]:
# Create CloudWatch client
cw_client = boto3.Session().client("cloudwatch")

namespace = "aws/sagemaker/Endpoints/model-metrics"

cw_dimensions = [
    {"Name": "Endpoint", "Value": endpoint_name},
    {"Name": "MonitoringSchedule", "Value": model_quality_monitor_schedule_name},
]

In [62]:
# List metrics through the pagination interface
paginator = cw_client.get_paginator("list_metrics")

for response in paginator.paginate(Dimensions=cw_dimensions, Namespace=namespace):
    model_quality_metrics = response["Metrics"]
    for metric in model_quality_metrics:
        print(metric["MetricName"])

mae
mse
rmse
r2
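
Listing metric names does not show their values. As a hedged sketch, the helper below reads the most recent datapoint for one metric through the CloudWatch `get_metric_statistics` call. `latest_metric_value` is a hypothetical name, and the 24-hour window and one-hour period are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def latest_metric_value(cw_client, namespace, metric_name, dimensions, hours=24, period=3600):
    """Return the most recent Average datapoint for a metric, or None if there is none."""
    end = datetime.now(timezone.utc)
    response = cw_client.get_metric_statistics(
        Namespace=namespace,
        MetricName=metric_name,
        Dimensions=dimensions,
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=period,
        Statistics=["Average"],
    )
    # Datapoints are not guaranteed to arrive in time order, so sort them first.
    datapoints = sorted(response["Datapoints"], key=lambda d: d["Timestamp"])
    return datapoints[-1]["Average"] if datapoints else None
```

With the notebook's variables, a call might look like `latest_metric_value(cw_client, namespace, "r2", cw_dimensions)`.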


#### 5.3 Create a CloudWatch Alarm

Based on the CloudWatch metrics, you can create a CloudWatch alarm that fires when a specific metric does not meet the configured threshold. Here you will create an alarm that fires if the r2 value of the model falls to or below the threshold suggested by the baseline constraints.

In [63]:
alarm_name = "MODEL_QUALITY_R2_SCORE"
alarm_desc = (
    "Trigger a CloudWatch alarm when the r2 score drifts away from the baseline constraints"
)
# Setting this threshold purposefully low to see the alarm quickly.
model_quality_r2_drift_threshold = -7.7419149776821
metric_name = "r2"
namespace = "aws/sagemaker/Endpoints/model-metrics"

cw_client.put_metric_alarm(
    AlarmName=alarm_name,
    AlarmDescription=alarm_desc,
    ActionsEnabled=True,
    MetricName=metric_name,
    Namespace=namespace,
    Statistic="Average",
    Dimensions=[
        {"Name": "Endpoint", "Value": endpoint_name},
        {"Name": "MonitoringSchedule", "Value": model_quality_monitor_schedule_name},
    ],
    Period=600,
    EvaluationPeriods=1,
    DatapointsToAlarm=1,
    Threshold=model_quality_r2_drift_threshold,
    ComparisonOperator="LessThanOrEqualToThreshold",
    TreatMissingData="breaching",
)

{'ResponseMetadata': {'RequestId': '63071239-3a34-4ab0-b4d7-022cba8fde19',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '63071239-3a34-4ab0-b4d7-022cba8fde19',
   'content-type': 'text/xml',
   'content-length': '214',
   'date': 'Tue, 03 Aug 2021 07:13:41 GMT'},
  'RetryAttempts': 0}}
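
The alarm above covers r2 only. The schedule also emits mae, mse, and rmse, for which the comparison direction would likely be reversed. The sketch below builds `put_metric_alarm` keyword arguments for either kind of metric. `build_alarm_kwargs` and `HIGHER_IS_BETTER` are hypothetical names, and the assumption that the error metrics should alarm when they rise above a threshold is mine, not the notebook's.

```python
# Assumption for this sketch: r2 degrades as it decreases, while the error
# metrics (mae, mse, rmse) degrade as they increase.
HIGHER_IS_BETTER = {"r2"}


def build_alarm_kwargs(
    metric_name,
    threshold,
    endpoint_name,
    schedule_name,
    namespace="aws/sagemaker/Endpoints/model-metrics",
):
    """Build keyword arguments for cw_client.put_metric_alarm for one metric."""
    if metric_name in HIGHER_IS_BETTER:
        operator = "LessThanOrEqualToThreshold"
    else:
        operator = "GreaterThanOrEqualToThreshold"
    return {
        "AlarmName": f"MODEL_QUALITY_{metric_name.upper()}",
        "AlarmDescription": f"Model quality alarm for {metric_name}",
        "ActionsEnabled": True,
        "MetricName": metric_name,
        "Namespace": namespace,
        "Statistic": "Average",
        "Dimensions": [
            {"Name": "Endpoint", "Value": endpoint_name},
            {"Name": "MonitoringSchedule", "Value": schedule_name},
        ],
        "Period": 600,
        "EvaluationPeriods": 1,
        "DatapointsToAlarm": 1,
        "Threshold": threshold,
        "ComparisonOperator": operator,
        "TreatMissingData": "breaching",
    }
```

The result could be passed as `cw_client.put_metric_alarm(**kwargs)`. Thresholds for the error metrics would need to come from the baseline constraints rather than being guessed.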

#### 5.4 Validation
In a few minutes, you should see the CloudWatch alarm created. The alarm will first be in the "Insufficient data" state and will then move into the "In alarm" state. You can verify this in the CloudWatch console.

<IMG src=images/r2_InsufficientData.png/>

<IMG src=images/r2_Alarm.png/>

Once the CloudWatch alarm is triggered, you can decide what actions to take in response. A possible action is updating the training data and retraining the model.
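
The alarm state can also be checked from code instead of the console. The sketch below reads it through the CloudWatch `describe_alarms` call. `alarm_state` is a hypothetical helper, and the API reports the state as `INSUFFICIENT_DATA`, `OK`, or `ALARM`.

```python
def alarm_state(cw_client, alarm_name):
    """Return the alarm's StateValue, or None if no alarm with that name exists."""
    response = cw_client.describe_alarms(AlarmNames=[alarm_name])
    alarms = response.get("MetricAlarms", [])
    return alarms[0]["StateValue"] if alarms else None
```

With the notebook's variables, a call might look like `alarm_state(cw_client, alarm_name)`.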

### Other commands
We can also start and stop the monitoring schedules.

In [64]:
# model_quality_monitor.stop_monitoring_schedule()
# model_quality_monitor.start_monitoring_schedule()

### 6. Clean up 

You can keep your endpoint running to continue capturing data. If you do not plan to collect more data or use this endpoint further, you should delete the endpoint to avoid incurring additional charges. Note that deleting your endpoint does not delete the data that was captured during the model invocations. That data persists in Amazon S3 until you delete it yourself.

Before you delete the endpoint, you must first delete the monitoring schedule.

In [65]:
model_quality_monitor.delete_monitoring_schedule()


Deleting Monitoring Schedule with name: model-quality-monitoring-schedule-2021-08-03-0649
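
Before deleting the endpoint, it may be worth confirming that no monitoring schedule is still attached to it. The sketch below uses the SageMaker `list_monitoring_schedules` call with its `EndpointName` filter. `remaining_schedules` is a hypothetical helper, and it inspects only the first page of results.

```python
def remaining_schedules(sagemaker_client, endpoint_name):
    """Return the names of monitoring schedules still listed for the endpoint."""
    response = sagemaker_client.list_monitoring_schedules(EndpointName=endpoint_name)
    summaries = response.get("MonitoringScheduleSummaries", [])
    return [s["MonitoringScheduleName"] for s in summaries]
```

An empty list suggests the endpoint has no schedules left. Deletion may take a short while to be reflected.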


In [67]:
# Delete the endpoint
response = sagemaker_client.delete_endpoint(
    EndpointName=endpoint_name
)

ClientError: An error occurred (ValidationException) when calling the DeleteEndpoint operation: Could not find endpoint "arn:aws:sagemaker:us-west-2:802439482869:endpoint/xgb-weather-prediction-model-monitor-2021-08-03-06-29-41".
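
The `ValidationException` above indicates that the endpoint could not be found when `DeleteEndpoint` was called. To make this cell safe to re-run, the deletion can be guarded by an existence check. The sketch below uses the SageMaker `list_endpoints` call with `NameContains`. `delete_endpoint_if_present` is a hypothetical helper, and it inspects only the first page of results.

```python
def delete_endpoint_if_present(sagemaker_client, endpoint_name):
    """Delete the endpoint only if it is still listed.

    Returns True if a delete request was issued, False otherwise.
    """
    response = sagemaker_client.list_endpoints(NameContains=endpoint_name)
    names = {e["EndpointName"] for e in response.get("Endpoints", [])}
    # NameContains is a substring match, so require an exact name match here.
    if endpoint_name not in names:
        print(f"Endpoint {endpoint_name} not found; nothing to delete.")
        return False
    sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
    return True
```

With the notebook's variables, a call might look like `delete_endpoint_if_present(sagemaker_client, endpoint_name)`.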