## Data drift monitoring with Amazon SageMaker Model Monitor


This notebook walks through the high-level steps for monitoring a production ML model for data drift with SageMaker Model Monitor. To demonstrate data drift monitoring, we deploy a pre-trained model to an endpoint. The pre-trained model artifact, together with baseline and test datasets, is provided with this notebook.

1. Set up
2. Enable data capture on a SageMaker endpoint
3. Create a baseline with Model Monitor
4. Schedule continuous monitoring
5. Analyze monitoring results
6. Clean up

### 1. Set up

#### 1.1 Imports

In [19]:
import os
import boto3
import re
import json
import time
import pandas as pd

from botocore.response import StreamingBody
from sagemaker import get_execution_role, session
from sagemaker.model import Model
from sagemaker.image_uris import retrieve
from sagemaker.model_monitor import DataCaptureConfig

from time import gmtime, strftime

#### 1.2 Set up variables

In [20]:
region = boto3.Session().region_name

role = get_execution_role()

# Bucket into which the data is captured.
# Set s3_bucket to the bucket created in your data science environment.
s3_bucket = 'datascience-environment-notebookinstance--06dc7a0224df'
prefix = "DataDrift-ModelMonitor"

data_capture_prefix = "{}/datacapture".format(prefix)
s3_capture_upload_path = "s3://{}/{}".format(s3_bucket, data_capture_prefix)
reports_prefix = "{}/reports".format(prefix)
s3_report_path = "s3://{}/{}".format(s3_bucket, reports_prefix)
code_prefix = "{}/code".format(prefix)

print("Capture path: {}".format(s3_capture_upload_path))
print("Report path: {}".format(s3_report_path))

Capture path: s3://datascience-environment-notebookinstance--06dc7a0224df/DataDrift-ModelMonitor/datacapture
Report path: s3://datascience-environment-notebookinstance--06dc7a0224df/DataDrift-ModelMonitor/reports


#### 1.3 Set up service clients

In [21]:
s3_client = boto3.Session().client("s3")
sagemaker_client = boto3.client('sagemaker')

### 2. Enable data capture on a SageMaker endpoint

Create an endpoint to showcase the data capture capability in action.

For the endpoint we will use a pre-trained XGBoost model that is ready to deploy. This model was trained in the previous chapters using the weather dataset and has been included in the model directory for ease of use.

You can also train a new model and use your own model and data below.


#### 2.1 Upload the model object into S3

In [22]:
s3_key = os.path.join(prefix, "weather-prediction-model.tar.gz")
# Open the artifact in a context manager so the file handle is closed after the upload
with open("model/weather-prediction-model.tar.gz", "rb") as model_file:
    boto3.Session().resource("s3").Bucket(s3_bucket).Object(s3_key).upload_fileobj(model_file)

#### 2.2 Create SageMaker Model

In [23]:
model_url = "https://{}.s3-{}.amazonaws.com/{}/weather-prediction-model.tar.gz".format(
    s3_bucket, region, prefix
)

print(model_url)

image_uri = retrieve("xgboost", boto3.Session().region_name, "1.2-1")

model = Model(image_uri=image_uri, model_data=model_url, role=role)

https://datascience-environment-notebookinstance--06dc7a0224df.s3-us-west-2.amazonaws.com/DataDrift-ModelMonitor/weather-prediction-model.tar.gz


#### 2.3 Configure data capture

To enable data capture on the endpoint, pass a `DataCaptureConfig` when you deploy the model. With data capture enabled, the input to and output from the SageMaker endpoint are saved in S3. The captured input is the live inference request traffic, and the captured output is the predictions from the deployed model.
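For readers who work with boto3 directly rather than the SageMaker Python SDK, the same settings can be expressed as the `DataCaptureConfig` dictionary passed to `create_endpoint_config`. The sketch below only builds that dictionary; the helper name `build_data_capture_config` and the bucket name are hypothetical, and this notebook itself uses the SDK class shown in the next cell.

```python
def build_data_capture_config(destination_s3_uri, sampling_percentage=100):
    """Build a DataCaptureConfig dict in the shape boto3's create_endpoint_config expects.

    Hypothetical helper for illustration; it performs no AWS calls.
    """
    if not 0 <= sampling_percentage <= 100:
        raise ValueError("sampling_percentage must be between 0 and 100")
    return {
        "EnableCapture": True,
        "InitialSamplingPercentage": sampling_percentage,
        "DestinationS3Uri": destination_s3_uri,
        # Capture both the request payload and the model response.
        "CaptureOptions": [{"CaptureMode": "Input"}, {"CaptureMode": "Output"}],
    }


# Placeholder bucket name, not a real resource.
config = build_data_capture_config("s3://my-bucket/DataDrift-ModelMonitor/datacapture")
print(config)
```

Lowering `sampling_percentage` reduces the share of requests that are captured, which can help on high-traffic endpoints.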

In [24]:
endpoint_name = "xgb-weather-prediction-model-monitor-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print("EndpointName={}".format(endpoint_name))

data_capture_config = DataCaptureConfig(
    enable_capture=True, sampling_percentage=100, destination_s3_uri=s3_capture_upload_path
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m4.xlarge",
    endpoint_name=endpoint_name,
    data_capture_config=data_capture_config,
)

EndpointName=xgb-weather-prediction-model-monitor-2021-08-03-03-01-53
-----------------!

#### 2.4 Capture data from endpoint 

This step invokes the endpoint with 49 records from the included sample data, pausing half a second between requests. Data is captured according to the sampling percentage specified, and capture continues until the data capture option is turned off.

In [25]:
# Load the provided test file 'data/t_file.csv'; its rows are used as inference payloads
with open('data/t_file.csv', 'r') as TF:
    t_lines = TF.readlines()

In [26]:
### Define a method to run inferences against the endpoint
def get_predictions():
    smrt = boto3.Session().client("sagemaker-runtime")
    #Skip the first line since it has column headers
    for tl in t_lines[1:50]:
        #Remove the first column since it is the label
        test_list = tl.split(",")
        test_list.pop(0)
        test_string = ','.join([str(elem) for elem in test_list])
        
        #print("invoking with payload " + test_string)
    
        result = smrt.invoke_endpoint(EndpointName=endpoint_name,
                                   ContentType="text/csv",
                                   Body=test_string)
        rbody = result["Body"]  # invoke_endpoint already returns the body as a botocore StreamingBody
        #print(f"Result from {result['InvokedProductionVariant']} = {rbody.read().decode('utf-8')}")
        print(".", end="", flush=True)
        time.sleep(0.5)

In [27]:
#Get predictions
get_predictions()

.................................................
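The payload preparation inside `get_predictions()` can be isolated into a small helper, which makes it easier to check on its own. This is a minimal sketch, assuming each row has the label in the first column; the helper name `to_payload` and the sample row are made up. Unlike the loop above, it also strips the trailing newline, which is optional.

```python
def to_payload(csv_line):
    """Drop the first (label) column and the trailing newline from one CSV row."""
    fields = csv_line.rstrip("\r\n").split(",")
    return ",".join(fields[1:])


# Made-up row for illustration: label first, then feature values.
sample = "-4.9,0,2020,12,4,31\n"
print(to_payload(sample))
```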

#### 2.5 View captured data

Now list the data capture files stored in Amazon S3. You should expect to see different files from different time periods organized based on the hour in which the invocation occurred. The format of the Amazon S3 path is:

`s3://{destination-bucket-prefix}/{endpoint-name}/{variant-name}/yyyy/mm/dd/hh/filename.jsonl`
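To make the layout concrete, the following sketch splits one capture object key into its parts. The helper name `parse_capture_key` is hypothetical, the sample key is taken from this notebook's own run, and treating the hour in the path as UTC is an assumption that matches the `inferenceTime` values in the captured records.

```python
from datetime import datetime, timezone


def parse_capture_key(key, capture_prefix):
    """Split a capture object key into endpoint, variant, capture hour, and file name."""
    prefix = capture_prefix.strip("/") + "/"
    if not key.startswith(prefix):
        raise ValueError("key is not under the capture prefix")
    parts = key[len(prefix):].split("/")
    if len(parts) != 7:
        raise ValueError("unexpected capture key layout")
    endpoint, variant, year, month, day, hour, filename = parts
    # Assumption: the yyyy/mm/dd/hh path components are in UTC.
    captured_hour = datetime(int(year), int(month), int(day), int(hour), tzinfo=timezone.utc)
    return {"endpoint": endpoint, "variant": variant, "hour": captured_hour, "file": filename}


key = (
    "DataDrift-ModelMonitor/datacapture/"
    "xgb-weather-prediction-model-monitor-2021-08-03-03-01-53/AllTraffic/"
    "2021/08/03/03/10-26-266-0d88af2b-1330-475f-9ee6-fe0b31295b77.jsonl"
)
info = parse_capture_key(key, "DataDrift-ModelMonitor/datacapture")
print(info["endpoint"], info["variant"], info["hour"].isoformat())
```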

In [28]:
s3_capture_upload_path

's3://datascience-environment-notebookinstance--06dc7a0224df/DataDrift-ModelMonitor/datacapture'

In [31]:
#Note : If you see an error in this cell, it could be because the captured files didn't appear in S3 yet.
#Retry after a minute.
current_endpoint_capture_prefix = "{}/{}".format(data_capture_prefix, endpoint_name)

result = s3_client.list_objects(Bucket=s3_bucket, Prefix=current_endpoint_capture_prefix)
capture_files = [capture_file.get("Key") for capture_file in result.get("Contents")]
print("Found Capture Files:")
print("\n ".join(capture_files))

Found Capture Files:
DataDrift-ModelMonitor/datacapture/xgb-weather-prediction-model-monitor-2021-08-03-03-01-53/AllTraffic/2021/08/03/03/10-26-266-0d88af2b-1330-475f-9ee6-fe0b31295b77.jsonl
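Because capture files can take a minute or so to appear, the listing cell may need to be retried. One way to automate the retry is a small polling helper. This is a sketch under stated assumptions: `wait_for_result` is a hypothetical name, and `fetch` is any zero-argument callable that returns a list (for example a function wrapping the `list_objects` call above). The demo uses a fake `fetch` so that it runs without AWS access.

```python
import time


def wait_for_result(fetch, attempts=5, delay_seconds=60, sleep=time.sleep):
    """Call fetch() until it returns a non-empty result or the attempts run out."""
    for attempt in range(1, attempts + 1):
        result = fetch()
        if result:
            return result
        if attempt < attempts:
            sleep(delay_seconds)
    return []


# Fake fetch for illustration: empty twice, then one (made-up) key.
responses = iter([[], [], ["prefix/file.jsonl"]])
found = wait_for_result(lambda: next(responses), attempts=5, delay_seconds=0)
print(found)
```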


Next, view the contents of a single capture file. It contains all the captured data in an Amazon SageMaker-specific JSON Lines format. Take a quick peek at the first few lines of the captured file.

In [33]:
def get_obj_body(obj_key):
    return s3_client.get_object(Bucket=s3_bucket, Key=obj_key).get("Body").read().decode("utf-8")


capture_file = get_obj_body(capture_files[-1])
print(capture_file[:2000])

{"captureData":{"endpointInput":{"observedContentType":"text/csv","mode":"INPUT","data":"0,2020,12,4,31,0,19.0,0.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0\n","encoding":"CSV"},"endpointOutput":{"observedContentType":"text/csv; charset=utf-8","mode":"OUTPUT","data":"-4.902510643005371","encoding":"CSV"}},"eventMetadata":{"eventId":"f5677ce4-2bb2-48a9-bf05-267b8a62ebe4","inferenceTime":"2021-08-03T03:10:26Z"},"eventVersion":"0"}
{"captureData":{"endpointInput":{"observedContentType":"text/csv","mode":"INPUT","data":"0,2020,12,4,31,0,19.0,0.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0\n","encoding":"CSV"},"endpointOutput":{"observedContentType":"text/csv; charset=utf-8","mode":"OUTPUT","data":"-4.902510643005371","encoding":"CSV"}},"eventMetadata":{"eventId":"02b22ee8-4084-4591-ad4e-8f63059d5ce6","inferenceTime":"2021-08-03T03:10:26Z"},"eventVersion":"0"}
{"captureData":{"endpointInput":{"observedContentType":"text/csv","mode":"INPUT","data":"0,2020,12,4,31,0,19.0,0.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0\n","enc

Finally, a single line is shown below as formatted JSON so that it is easier to read.

In [34]:
print(json.dumps(json.loads(capture_file.split("\n")[0]), indent=2))

{
  "captureData": {
    "endpointInput": {
      "observedContentType": "text/csv",
      "mode": "INPUT",
      "data": "0,2020,12,4,31,0,19.0,0.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0\n",
      "encoding": "CSV"
    },
    "endpointOutput": {
      "observedContentType": "text/csv; charset=utf-8",
      "mode": "OUTPUT",
      "data": "-4.902510643005371",
      "encoding": "CSV"
    }
  },
  "eventMetadata": {
    "eventId": "f5677ce4-2bb2-48a9-bf05-267b8a62ebe4",
    "inferenceTime": "2021-08-03T03:10:26Z"
  },
  "eventVersion": "0"
}


As you can see, each inference request is captured as one line in the JSON Lines file. The line contains both the input and the output, merged together. In this example, you set the ContentType to `text/csv`, which is reflected in the `observedContentType` value. The `encoding` value records the encoding used for the input and output payloads in the capture format.
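The record structure shown above can be unpacked with the standard `json` module. The sketch below assumes CSV-encoded payloads with numeric values, as in this notebook; `parse_capture_lines` is a hypothetical helper, and the sample record is copied from the output above. It expects complete lines, so it should be given a whole capture file rather than a truncated slice.

```python
import json


def parse_capture_lines(jsonl_text):
    """Turn JSON Lines capture text into a list of dicts with features and prediction."""
    rows = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline at end of file
        record = json.loads(line)
        capture = record["captureData"]
        # Assumption: both payloads are CSV text holding numeric values.
        features = [float(v) for v in capture["endpointInput"]["data"].strip().split(",")]
        prediction = float(capture["endpointOutput"]["data"])
        rows.append(
            {
                "inference_time": record["eventMetadata"]["inferenceTime"],
                "features": features,
                "prediction": prediction,
            }
        )
    return rows


sample_record = {
    "captureData": {
        "endpointInput": {
            "observedContentType": "text/csv",
            "mode": "INPUT",
            "data": "0,2020,12,4,31,0,19.0,0.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0\n",
            "encoding": "CSV",
        },
        "endpointOutput": {
            "observedContentType": "text/csv; charset=utf-8",
            "mode": "OUTPUT",
            "data": "-4.902510643005371",
            "encoding": "CSV",
        },
    },
    "eventMetadata": {
        "eventId": "f5677ce4-2bb2-48a9-bf05-267b8a62ebe4",
        "inferenceTime": "2021-08-03T03:10:26Z",
    },
    "eventVersion": "0",
}
parsed = parse_capture_lines(json.dumps(sample_record) + "\n")
print(len(parsed[0]["features"]), parsed[0]["prediction"])
```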

### 3. Create a baseline with Model Monitor 

In addition to collecting the data, Amazon SageMaker lets you monitor and evaluate the data observed by the endpoints. To see this in action, let's first create a baseline, which will then be used to compare the real-time traffic against.

#### 3.1 Set up the baseline dataset

To generate a baseline, you need to provide a baseline dataset. The training dataset with which you trained the model is usually a good choice.

From the training dataset you can ask Amazon SageMaker to suggest a set of baseline `constraints` and generate descriptive `statistics` to explore the data. For this example, we provide a subset of the training data in the data directory to use as the baseline.

Alternatively, you can point directly to the complete training data available in S3.

In [37]:
# copy over the training dataset to Amazon S3 (if you already have it in Amazon S3, you could reuse it)
baseline_prefix = prefix + "/baselining"
baseline_data_prefix = baseline_prefix + "/data"
baseline_results_prefix = baseline_prefix + "/results"
baseline_data_uri = "s3://{}/{}".format(s3_bucket, baseline_data_prefix)
baseline_results_uri = "s3://{}/{}".format(s3_bucket, baseline_results_prefix)
print("Baseline data uri: {}".format(baseline_data_uri))
print("Baseline results uri: {}".format(baseline_results_uri))

Baseline data uri: s3://datascience-environment-notebookinstance--06dc7a0224df/DataDrift-ModelMonitor/baselining/data
Baseline results uri: s3://datascience-environment-notebookinstance--06dc7a0224df/DataDrift-ModelMonitor/baselining/results


In [40]:
#Uploading the baseline data provided in the data directory to S3 bucket
#Note that this step will not be necessary if you are directly using the training data in s3 bucket as baseline
s3_key = os.path.join(baseline_data_prefix, "data-drift-baseline-data.csv")
# Open the baseline file in a context manager so the handle is closed after the upload
with open("data/data-drift-baseline-data.csv", "rb") as baseline_data_file:
    boto3.Session().resource("s3").Bucket(s3_bucket).Object(s3_key).upload_fileobj(baseline_data_file)

#### 3.2 Create a baselining job

Now that you have the baseline data ready in Amazon S3, start a job to `suggest` constraints. `DefaultModelMonitor.suggest_baseline(..)` starts a `ProcessingJob` using an Amazon SageMaker provided Model Monitor container to generate the constraints.

In [41]:
from sagemaker.model_monitor import DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

my_default_monitor = DefaultModelMonitor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    volume_size_in_gb=20,
    max_runtime_in_seconds=3600,
)

my_default_monitor.suggest_baseline(
    baseline_dataset=baseline_data_uri,
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri=baseline_results_uri,
    wait=True,
)


Job Name:  baseline-suggestion-job-2021-08-03-03-14-25-404
Inputs:  [{'InputName': 'baseline_dataset_input', 'AppManaged': False, 'S3Input': {'S3Uri': 's3://datascience-environment-notebookinstance--06dc7a0224df/DataDrift-ModelMonitor/baselining/data', 'LocalPath': '/opt/ml/processing/input/baseline_dataset_input', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}]
Outputs:  [{'OutputName': 'monitoring_output', 'AppManaged': False, 'S3Output': {'S3Uri': 's3://datascience-environment-notebookinstance--06dc7a0224df/DataDrift-ModelMonitor/baselining/results', 'LocalPath': '/opt/ml/processing/output', 'S3UploadMode': 'EndOfJob'}}]
..........................2021-08-03 03:18:34.683570: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-08-03 03:18:34.

<sagemaker.processing.ProcessingJob at 0x7f0885e648d0>

#### 3.3 Explore the generated constraints and statistics

In [43]:
result = s3_client.list_objects(Bucket=s3_bucket, Prefix=baseline_results_prefix)
report_files = [report_file.get("Key") for report_file in result.get("Contents")]
print("Found Files:")
print("\n ".join(report_files))

Found Files:
DataDrift-ModelMonitor/baselining/results/constraints.json
 DataDrift-ModelMonitor/baselining/results/statistics.json


In [44]:
baseline_job = my_default_monitor.latest_baselining_job
# pd.json_normalize replaces the deprecated pd.io.json.json_normalize (pandas >= 1.0)
schema_df = pd.json_normalize(baseline_job.baseline_statistics().body_dict["features"])
schema_df.head(10)



| | name | inferred_type | string_statistics.common.num_present | string_statistics.common.num_missing | string_statistics.distinct_count | numerical_statistics.common.num_present | numerical_statistics.common.num_missing | numerical_statistics.mean | numerical_statistics.sum | numerical_statistics.std_dev | numerical_statistics.min | numerical_statistics.max | numerical_statistics.distribution.kll.buckets | numerical_statistics.distribution.kll.sketch.parameters.c | numerical_statistics.distribution.kll.sketch.parameters.k | numerical_statistics.distribution.kll.sketch.data |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | value | String | 652652.0 | 0.0 | 16902.0 | | | | | | | | | | | |
| 1 | ismobile | Integral | | | | 652652.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | [{'lower_bound': 0.0, 'upper_bound': 0.0, 'cou... | 0.64 | 2048.0 | [[], [], [0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0,... |
| 2 | year | Integral | | | | 652652.0 | 0.0 | 2020.27611 | 1318537000.0 | 0.469578 | 2018.0 | 2021.0 | [{'lower_bound': 2018.0, 'upper_bound': 2018.3... | 0.64 | 2048.0 | [[], [], [2021.0], [2021.0, 2019.0, 2020.0, 20... |
| 3 | month | Integral | | | | 652652.0 | 0.0 | 8.838076 | 5768188.0 | 4.975293 | 1.0 | 12.0 | [{'lower_bound': 1.0, 'upper_bound': 2.1, 'cou... | 0.64 | 2048.0 | [[], [], [12.0], [12.0, 1.0, 1.0, 1.0, 1.0, 1.... |
| 4 | quarter | Integral | | | | 652652.0 | 0.0 | 3.13831 | 2048224.0 | 1.35708 | 1.0 | 4.0 | [{'lower_bound': 1.0, 'upper_bound': 1.3, 'cou... | 0.64 | 2048.0 | [[], [], [4.0], [4.0, 1.0, 1.0, 1.0, 1.0, 1.0,... |
| 5 | day | Integral | | | | 652652.0 | 0.0 | 22.355307 | 14590240.0 | 13.546739 | 1.0 | 31.0 | [{'lower_bound': 1.0, 'upper_bound': 4.0, 'cou... | 0.64 | 2048.0 | [[], [], [31.0], [31.0, 1.0, 1.0, 1.0, 1.0, 1.... |
| 6 | isBadAir | Integral | | | | 652652.0 | 0.0 | 0.085516 | 55812.0 | 0.279648 | 0.0 | 1.0 | [{'lower_bound': 0.0, 'upper_bound': 0.1, 'cou... | 0.64 | 2048.0 | [[], [], [1.0], [1.0, 0.0, 0.0, 0.0, 0.0, 0.0,... |
| 7 | location | Fractional | | | | 652652.0 | 0.0 | 1698.446891 | 1108495000.0 | 1639.521162 | 0.0 | 7069.0 | [{'lower_bound': 0.0, 'upper_bound': 706.9, 'c... | 0.64 | 2048.0 | [[], [], [4890.0], [4890.0, 2.0, 2.0, 2.0, 2.0... |
| 8 | city | Fractional | | | | 652652.0 | 0.0 | 368.281914 | 240359900.0 | 402.585782 | 0.0 | 2279.0 | [{'lower_bound': 0.0, 'upper_bound': 227.9, 'c... | 0.64 | 2048.0 | [[], [], [1597.0], [1597.0, 6.0, 6.0, 6.0, 6.0... |
| 9 | sourcename | Fractional | | | | 652652.0 | 0.0 | 7.080349 | 4621004.0 | 10.258794 | 0.0 | 80.0 | [{'lower_bound': 0.0, 'upper_bound': 8.0, 'cou... | 0.64 | 2048.0 | [[], [], [56.0], [56.0, 0.0, 0.0, 0.0, 0.0, 0.... |


In [45]:
schema_df.columns

Index(['name', 'inferred_type', 'string_statistics.common.num_present',
       'string_statistics.common.num_missing',
       'string_statistics.distinct_count',
       'numerical_statistics.common.num_present',
       'numerical_statistics.common.num_missing', 'numerical_statistics.mean',
       'numerical_statistics.sum', 'numerical_statistics.std_dev',
       'numerical_statistics.min', 'numerical_statistics.max',
       'numerical_statistics.distribution.kll.buckets',
       'numerical_statistics.distribution.kll.sketch.parameters.c',
       'numerical_statistics.distribution.kll.sketch.parameters.k',
       'numerical_statistics.distribution.kll.sketch.data'],
      dtype='object')

In [46]:
# Split the features into string and numerical subsets
is_numerical_feature = schema_df['inferred_type'] != 'String'
is_string_feature = schema_df['inferred_type'] == 'String'
#print(is_numerical_feature)

schema_df_string = schema_df[is_string_feature]
schema_df_numerical = schema_df[is_numerical_feature]

In [47]:
## Print statistics of String features 
print("String Features")
schema_df_string[['name',
                  'inferred_type',
                  'string_statistics.common.num_present',
                  'string_statistics.common.num_missing',
                  'string_statistics.distinct_count']]

String Features


| | name | inferred_type | string_statistics.common.num_present | string_statistics.common.num_missing | string_statistics.distinct_count |
|---|---|---|---|---|---|
| 0 | value | String | 652652.0 | 0.0 | 16902.0 |


In [48]:
## Print statistics of Numerical features
print("Numerical Features")
schema_df_numerical[['name',
                     'inferred_type',
                     'numerical_statistics.mean',
                     'numerical_statistics.sum',
                     'numerical_statistics.std_dev',
                     'numerical_statistics.min',
                     'numerical_statistics.max'
                    ]]

Numerical Features


| | name | inferred_type | numerical_statistics.mean | numerical_statistics.sum | numerical_statistics.std_dev | numerical_statistics.min | numerical_statistics.max |
|---|---|---|---|---|---|---|---|
| 1 | ismobile | Integral | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | year | Integral | 2020.27611 | 1318537000.0 | 0.469578 | 2018.0 | 2021.0 |
| 3 | month | Integral | 8.838076 | 5768188.0 | 4.975293 | 1.0 | 12.0 |
| 4 | quarter | Integral | 3.13831 | 2048224.0 | 1.35708 | 1.0 | 4.0 |
| 5 | day | Integral | 22.355307 | 14590240.0 | 13.546739 | 1.0 | 31.0 |
| 6 | isBadAir | Integral | 0.085516 | 55812.0 | 0.279648 | 0.0 | 1.0 |
| 7 | location | Fractional | 1698.446891 | 1108495000.0 | 1639.521162 | 0.0 | 7069.0 |
| 8 | city | Fractional | 368.281914 | 240359900.0 | 402.585782 | 0.0 | 2279.0 |
| 9 | sourcename | Fractional | 7.080349 | 4621004.0 | 10.258794 | 0.0 | 80.0 |
| 10 | sourcetype | Fractional | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


In [49]:
# pd.json_normalize replaces the deprecated pd.io.json.json_normalize (pandas >= 1.0)
constraints_df = pd.json_normalize(
    baseline_job.suggested_constraints().body_dict["features"]
)
constraints_df.head(10)



| | name | inferred_type | completeness | num_constraints.is_non_negative |
|---|---|---|---|---|
| 0 | value | String | 1.0 | |
| 1 | ismobile | Integral | 1.0 | True |
| 2 | year | Integral | 1.0 | True |
| 3 | month | Integral | 1.0 | True |
| 4 | quarter | Integral | 1.0 | True |
| 5 | day | Integral | 1.0 | True |
| 6 | isBadAir | Integral | 1.0 | True |
| 7 | location | Fractional | 1.0 | True |
| 8 | city | Fractional | 1.0 | True |
| 9 | sourcename | Fractional | 1.0 | True |
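To build intuition for the `completeness` and `is_non_negative` columns, here is a plain-Python sketch that applies those two constraints to a handful of rows. It is only an illustration of what the constraints mean, not how Model Monitor evaluates them; `check_constraints`, the flattened constraint dictionaries, and the sample rows are all made up for this example.

```python
def check_constraints(rows, constraints):
    """Return (feature, broken_constraint) pairs for a list of row dicts."""
    problems = []
    for constraint in constraints:
        name = constraint["name"]
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        # completeness = share of rows in which the feature has a value
        completeness = len(present) / len(values) if values else 0.0
        if completeness < constraint["completeness"]:
            problems.append((name, "completeness"))
        if constraint.get("is_non_negative") and any(v < 0 for v in present):
            problems.append((name, "is_non_negative"))
    return problems


# Constraint values mirror the suggested constraints above; the rows are invented.
constraints = [
    {"name": "month", "completeness": 1.0, "is_non_negative": True},
    {"name": "day", "completeness": 1.0, "is_non_negative": True},
]
rows = [{"month": 12, "day": 31}, {"month": -1, "day": None}]
problems = check_constraints(rows, constraints)
print(problems)
```

In this invented sample, `month` breaks the non-negativity constraint and `day` breaks the completeness constraint because one of its two values is missing.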


### 4. Schedule continuous monitoring
Once you have collected the data above, analyze and monitor it with monitoring schedules.

#### 4.1 Create a schedule

You can create a model monitoring schedule for the endpoint created earlier. Use the baseline resources (constraints and statistics) to compare against the realtime traffic.

In [50]:
from sagemaker.model_monitor import CronExpressionGenerator
from time import gmtime, strftime

mon_schedule_name = "weather-pred-model-monitor-schedule-" + strftime(
    "%Y-%m-%d-%H-%M-%S", gmtime()
)
my_default_monitor.create_monitoring_schedule(
    monitor_schedule_name=mon_schedule_name,
    endpoint_input=endpoint_name,
    output_s3_uri=s3_report_path,
    statistics=my_default_monitor.baseline_statistics(),
    constraints=my_default_monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
    enable_cloudwatch_metrics=True
)

#### 4.2 Generate traffic

Next, let's send some traffic to the endpoint. If there is no traffic, the monitoring jobs are marked as `Failed` since there is no data to process.

The cell below starts a thread to send some traffic to the endpoint. Note that you need to stop the kernel to terminate this thread.

In [51]:
from threading import Thread
from time import sleep
import time

#Invoke the endpoint in a loop
def invoke_endpoint_forever():
    while True:
        get_predictions()
        
# Note that you need to stop the kernel to stop the invocations
thread = Thread(target=invoke_endpoint_forever)
thread.start()



.......
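If stopping the kernel is inconvenient, a common alternative is to have the loop check a `threading.Event` so that it can be stopped from another cell. The sketch below is a generic pattern, not part of the original notebook: `run_until_stopped` is a hypothetical helper and the demo task only appends to a list. In this notebook the task would be `get_predictions`, in which case the stop takes effect between batches of requests.

```python
import threading
import time


def run_until_stopped(task, stop_event, pause_seconds=0.5):
    """Call task() repeatedly until stop_event is set."""
    while not stop_event.is_set():
        task()
        # wait() returns as soon as the event is set, so shutdown is prompt.
        stop_event.wait(pause_seconds)


stop_event = threading.Event()
calls = []
worker = threading.Thread(
    target=run_until_stopped,
    args=(lambda: calls.append(1), stop_event, 0.01),
    daemon=True,  # do not keep the interpreter alive because of this thread
)
worker.start()

time.sleep(0.1)      # let the worker run for a moment
stop_event.set()     # ask the loop to finish
worker.join(timeout=2)
print("worker alive:", worker.is_alive())
```

Setting the event before deleting the endpoint also avoids the `InvokeEndpoint` exception that a still-running thread raises once the endpoint is gone.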

#### 4.3 Describe and inspect the schedule
Describe the schedule and check its `MonitoringScheduleStatus`. The status is initially `Pending` and changes to `Scheduled` once the schedule is active.

In [52]:
desc_schedule_result = my_default_monitor.describe_schedule()
print("Schedule status: {}".format(desc_schedule_result["MonitoringScheduleStatus"]))

Schedule status: Pending
........

#### 4.4 List executions
The schedule starts jobs at the previously specified intervals. Here, you list the latest five executions. Note that if you are kicking this off after creating the hourly schedule, you might find the executions empty. You might have to wait until you cross the hour boundary (in UTC) to see executions kick off. The code below has the logic for waiting.

Note: Even for an hourly schedule, Amazon SageMaker has a buffer period of 20 minutes to schedule your execution. You might see your execution start in anywhere from zero to ~20 minutes from the hour boundary. This is expected and done for load balancing in the backend.
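The timing described above can be worked out with a few lines of `datetime` arithmetic. This sketch assumes an hourly schedule and the 20-minute buffer mentioned in the note; `next_execution_window` is a hypothetical helper, and the sample timestamp is the creation time embedded in this notebook's schedule name.

```python
from datetime import datetime, timedelta, timezone


def next_execution_window(now, buffer_minutes=20):
    """Return (earliest, latest) expected start times of the next hourly execution."""
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    earliest = top_of_hour + timedelta(hours=1)
    return earliest, earliest + timedelta(minutes=buffer_minutes)


created = datetime(2021, 8, 3, 3, 25, 16, tzinfo=timezone.utc)
earliest, latest = next_execution_window(created)
print(earliest.isoformat(), latest.isoformat())
# prints: 2021-08-03T04:00:00+00:00 2021-08-03T04:20:00+00:00
```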

In [53]:
mon_executions = my_default_monitor.list_executions()
print(
    "We created an hourly schedule above and it will kick off executions ON the hour (plus 0 - 20 min buffer).\nWe will have to wait till we hit the hour..."
)

while len(mon_executions) == 0:
    print("Waiting for the 1st execution to happen...")
    time.sleep(60)
    mon_executions = my_default_monitor.list_executions()

No executions found for schedule. monitoring_schedule_name: weather-pred-model-monitor-schedule-2021-08-03-03-25-16
We created an hourly schedule above and it will kick off executions ON the hour (plus 0 - 20 min buffer).
We will have to wait till we hit the hour...
Waiting for the 1st execution to happen...
................................................................................................................No executions found for schedule. monitoring_schedule_name: weather-pred-model-monitor-schedule-2021-08-03-03-25-16
Waiting for the 1st execution to happen...
..................................................................................................................No executions found for schedule. monitoring_schedule_name: weather-pred-model-monitor-schedule-2021-08-03-03-25-16
Waiting for the 1st execution to happen...
..................................................................................................................No executions found for schedule. 

#### 4.5 Inspect a specific execution (latest execution)
The cell below picks up the latest scheduled execution and waits for it to finish. Here are the possible terminal states and what each of them means:
* Completed - The monitoring execution completed and no issues were found in the violations report.
* CompletedWithViolations - The execution completed, but constraint violations were detected.
* Failed - The monitoring execution failed, possibly due to a client error (for example, incorrect role permissions) or infrastructure issues. Examine FailureReason and ExitMessage to identify what exactly happened.
* Stopped - The job exceeded its maximum runtime or was manually stopped.
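The states listed above can be folded into a small helper that classifies the dictionary returned by `describe()`. This is a sketch based on the fields this notebook prints (`ProcessingJobStatus` and `ExitMessage`); the helper name `summarize_execution` and its return labels are hypothetical, and it assumes that violations are signalled by an `ExitMessage` beginning with `CompletedWithViolations`, as in this notebook's output.

```python
def summarize_execution(description):
    """Classify a monitoring execution description into a short label."""
    status = description.get("ProcessingJobStatus")
    exit_message = description.get("ExitMessage", "")
    if status == "Completed":
        # Assumption: violations show up as a prefix of ExitMessage.
        if exit_message.startswith("CompletedWithViolations"):
            return "violations"
        return "clean"
    if status in ("Failed", "Stopped"):
        return status.lower()
    return "in_progress"


sample = {
    "ProcessingJobStatus": "Completed",
    "ExitMessage": "CompletedWithViolations: Job completed successfully with 1 violations.",
}
print(summarize_execution(sample))
```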

In [54]:
latest_execution = mon_executions[-1]  # latest execution's index is -1, second to last is -2 and so on..
time.sleep(60)
latest_execution.wait(logs=False)

print("Latest execution status: {}".format(latest_execution.describe()["ProcessingJobStatus"]))
print("Latest execution result: {}".format(latest_execution.describe()["ExitMessage"]))

latest_job = latest_execution.describe()
if latest_job["ProcessingJobStatus"] != "Completed":
    print(
        "====STOP==== \n No completed executions to inspect further. Please wait till an execution completes or investigate previously reported failures."
    )

................................................................................................................................................................................................................................................................................!Latest execution status: Completed
Latest execution result: CompletedWithViolations: Job completed successfully with 1 violations.
.....................................................................................................................................................................................................

In [55]:
latest_execution

<sagemaker.model_monitor.model_monitoring.MonitoringExecution at 0x7f0884185240>

..........

In [56]:
report_uri = latest_execution.output.destination
print("Report Uri: {}".format(report_uri))

Report Uri: s3://datascience-environment-notebookinstance--06dc7a0224df/DataDrift-ModelMonitor/reports/xgb-weather-prediction-model-monitor-2021-08-03-03-01-53/weather-pred-model-monitor-schedule-2021-08-03-03-25-16/2021/08/03/04
.......

### 5. Analyze monitoring results

#### 5.1 List the generated reports

In [57]:
from urllib.parse import urlparse

s3uri = urlparse(report_uri)
report_bucket = s3uri.netloc
report_key = s3uri.path.lstrip("/")
print("Report bucket: {}".format(report_bucket))
print("Report key: {}".format(report_key))

result = s3_client.list_objects(Bucket=report_bucket, Prefix=report_key)
report_files = [report_file.get("Key") for report_file in result.get("Contents")]
print("Found Report Files:")
print("\n ".join(report_files))

Report bucket: datascience-environment-notebookinstance--06dc7a0224df
Report key: DataDrift-ModelMonitor/reports/xgb-weather-prediction-model-monitor-2021-08-03-03-01-53/weather-pred-model-monitor-schedule-2021-08-03-03-25-16/2021/08/03/04
.Found Report Files:
DataDrift-ModelMonitor/reports/xgb-weather-prediction-model-monitor-2021-08-03-03-01-53/weather-pred-model-monitor-schedule-2021-08-03-03-25-16/2021/08/03/04/constraint_violations.json
 DataDrift-ModelMonitor/reports/xgb-weather-prediction-model-monitor-2021-08-03-03-01-53/weather-pred-model-monitor-schedule-2021-08-03-03-25-16/2021/08/03/04/constraints.json
 DataDrift-ModelMonitor/reports/xgb-weather-prediction-model-monitor-2021-08-03-03-01-53/weather-pred-model-monitor-schedule-2021-08-03-03-25-16/2021/08/03/04/statistics.json
.......

#### 5.2 Violations report

If there are any violations compared to the baseline, they will be listed here.

In [58]:
violations = my_default_monitor.latest_monitoring_constraint_violations()
# None shows full column contents; passing -1 is deprecated in pandas >= 1.0
pd.set_option("display.max_colwidth", None)
constraints_df = pd.json_normalize(violations.body_dict["violations"])
constraints_df.head(10)

.



| | feature_name | constraint_check_type | description |
|---|---|---|---|
| 0 | value | data_type_check | Data type match requirement is not met. Expected data type: String, Expected match: 100.0%. Observed: Only 0.0% of data is String. |
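When a report contains many violations, grouping them by check type gives a quicker overview than reading row by row. The sketch below works on a list of dictionaries shaped like the entries in the violations report; `group_violations` is a hypothetical helper, and the sample entry is the single violation reported above.

```python
from collections import defaultdict


def group_violations(violations):
    """Map each constraint_check_type to the list of affected feature names."""
    grouped = defaultdict(list)
    for violation in violations:
        grouped[violation["constraint_check_type"]].append(violation["feature_name"])
    return dict(grouped)


sample_violations = [
    {
        "feature_name": "value",
        "constraint_check_type": "data_type_check",
        "description": (
            "Data type match requirement is not met. Expected data type: String, "
            "Expected match: 100.0%. Observed: Only 0.0% of data is String."
        ),
    }
]
grouped = group_violations(sample_violations)
print(grouped)
```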


....................................................................................................................

### 6. Clean up 

#### Delete the resources

You can keep your endpoint running to continue capturing data. If you do not plan to collect more data or use this endpoint further, you should delete the endpoint to avoid incurring additional charges. Note that deleting your endpoint does not delete the data that was captured during the model invocations. That data persists in Amazon S3 until you delete it yourself.

You can also stop and restart the monitoring schedule.

In [None]:
#my_default_monitor.stop_monitoring_schedule()
#my_default_monitor.start_monitoring_schedule()

Before deleting the endpoint, you need to delete the monitoring schedule associated with the endpoint.

In [59]:
my_default_monitor.delete_monitoring_schedule()


Deleting Monitoring Schedule with name: weather-pred-model-monitor-schedule-2021-08-03-03-25-16
....................

In [60]:
##Delete the endpoint
sagemaker_client.delete_endpoint(EndpointName=endpoint_name)

{'ResponseMetadata': {'RequestId': '1a0e8d95-f05b-4f78-97fb-3a214cf77989',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '1a0e8d95-f05b-4f78-97fb-3a214cf77989',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '0',
   'date': 'Tue, 03 Aug 2021 04:18:16 GMT'},
  'RetryAttempts': 0}}

........

Exception in thread Thread-5:
Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "<ipython-input-51-9776ac0062e1>", line 8, in invoke_endpoint_forever
    get_predictions()
  File "<ipython-input-26-5b7e8f0ff2ef>", line 15, in get_predictions
    Body=test_string)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/botocore/client.py", line 386, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/botocore/client.py", line 705, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.ValidationError: An error occurred (ValidationError) when calling the InvokeEndpoint operation: Endpoint xgb-weat