# Step 6: Add data and model monitoring
After executing the five previous notebooks, you have a production-ready solution with automated model building and model deployment CI/CD pipelines.

In this notebook you use [Amazon SageMaker Model Monitor](https://aws.amazon.com/sagemaker/model-monitor/) to add continuous and automated [monitoring of the data quality](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-data-quality.html) for the traffic to your real-time SageMaker inference endpoints. You also implement [model monitoring](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality.html) to detect performance drift and model metric anomalies.

Using the Model Monitor integration with [Amazon EventBridge](https://aws.amazon.com/eventbridge/), you can implement automated response and remediation for any detected issues with data and model quality. For example, you can launch an automated model retraining if the model performance falls below a specific threshold.

In addition to data and model quality monitoring, you can implement [bias drift](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-model-monitor-bias-drift.html) and [feature attribution drift](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-model-monitor-feature-attribution-drift.html) monitoring.

![](img/six-steps-6.png)

In [2]:
%pip install jsonlines tqdm

Collecting jsonlines
  Downloading jsonlines-3.1.0-py3-none-any.whl (8.6 kB)
Installing collected packages: jsonlines
Successfully installed jsonlines-3.1.0
Note: you may need to restart the kernel to use updated packages.


In [31]:
import boto3
import botocore
import sagemaker 
import json
import jsonlines
import random
from tqdm import trange
from sagemaker.predictor import Predictor
from sagemaker import ModelPackage
import time
from time import gmtime, strftime
from datetime import datetime, timedelta
import uuid
import pandas as pd
import numpy as np
from sagemaker.model_monitor import (
    DefaultModelMonitor,
    DataCaptureConfig,
    CronExpressionGenerator,
    ModelQualityMonitor,
    EndpointInput,
)
from sagemaker.model_monitor.dataset_format import DatasetFormat
from utils.monitoring_utils import run_model_monitor_job
from sagemaker.s3 import S3Downloader, S3Uploader
from sagemaker.clarify import (
    BiasConfig,
    DataConfig,
    ModelConfig,
    ModelPredictedLabelConfig,
    SHAPConfig,
)
from urllib.parse import urlparse

sagemaker.__version__

'2.132.0'

In [5]:
sm = boto3.client("sagemaker")
s3 = boto3.client("s3")

In [6]:
session = sagemaker.Session()

In [7]:
pd.set_option("display.max_colwidth", None)

In [8]:
%store -r 

%store

try:
    initialized
except NameError:
    print("+++++++++++++++++++++++++++++++++++++++++++++++++")
    print("[ERROR] YOU HAVE TO RUN 00-start-here notebook   ")
    print("+++++++++++++++++++++++++++++++++++++++++++++++++")

Stored variables and their in-db values:
baseline_s3_url                        -> 's3://sagemaker-us-east-1-906545278380/from-idea-t
bucket_name                            -> 'sagemaker-us-east-1-906545278380'
bucket_prefix                          -> 'from-idea-to-prod/xgboost'
domain_id                              -> 'd-2dbyvqm5ecfc'
evaluation_s3_url                      -> 's3://sagemaker-us-east-1-906545278380/from-idea-t
experiment_name                        -> 'from-idea-to-prod-experiment-11-22-12-47'
initialized                            -> True
input_s3_url                           -> 's3://sagemaker-us-east-1-906545278380/from-idea-t
model_package_group_name               -> 'from-idea-to-prod-model-group'
output_s3_url                          -> 's3://sagemaker-us-east-1-906545278380/from-idea-t
prediction_baseline_s3_url             -> 's3://sagemaker-us-east-1-906545278380/from-idea-t
region                                 -> 'us-east-1'
sm_role                     

## How Model Monitor works
Amazon SageMaker Model Monitor automatically monitors ML models in production and notifies you when quality issues arise. Model Monitor uses rules to detect drift in your models and data and alerts you when it happens. The following figure shows how this process works.

![](img/data-monitoring-architecture.png)

The process for setting up and using data monitoring:
1. Enable the SageMaker endpoint to capture data from incoming requests to a trained ML model and the resulting model predictions.
2. Create a baseline from the dataset that was used to train the model. The baselining job computes metrics and suggests constraints for the metrics.
3. Create a monitoring schedule specifying what data to collect, how often to collect it, and how to analyze it. Data traffic to your model and predictions from the model are compared to the constraints, and are reported as violations if they are outside the constrained values. You can define multiple monitoring schedules per endpoint.
4. Inspect the reports, which compare the latest data with the baseline, and watch for any reported violations and for metrics and notifications from Amazon CloudWatch.
5. Implement observability for your ML models with Amazon CloudWatch and event-based architecture with Amazon EventBridge. You can automate data and model updates, model retraining, and user notification based on the data and model quality events.

## Real-time inference data capture from a SageMaker endpoint
To work with the model monitor, you need to enable data capture on a SageMaker real-time inference endpoint. 
If you completed the [step 5](05-deploy.ipynb) notebook, there is at least one deployed endpoint with a name like `model-deploy-19-20-31-59-staging`. If you don't have an active endpoint, you need to create one.

In [56]:
# List all deployed real-time endpoints. Depending on your existing environment you might have multiple endpoints
endpoints = sm.list_endpoints(StatusEquals="InService")["Endpoints"]

if not endpoints:
    print("There are no active deployed endpoints. You must have at least one endpoint. Run the step 3 pipeline to create a model")
    
for ep in endpoints:
    print(f"Data capture configuration for {ep['EndpointName']}:")
    # DataCaptureConfig may be absent from the response if data capture is not configured
    data_capture = sm.describe_endpoint(EndpointName=ep['EndpointName']).get('DataCaptureConfig', {})
    print(json.dumps(data_capture, indent=2))

Data capture configuration for from-idea-to-prod-endpoint-13-15-39-04:
{
  "EnableCapture": true,
  "CaptureStatus": "Started",
  "CurrentSamplingPercentage": 100,
  "DestinationS3Uri": "s3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/data-capture"
}


<div class="alert alert-info"> 💡
If there are no active endpoints, you can run the step 3 notebook to create a model and register the model in the model registry.
If you have an active endpoint, you can go to the <strong>Check the data capture configuration</strong> section.
</div>

### Deploy a model from the model registry as a real-time endpoint
Run this section if you'd like to create an endpoint with a model from the model registry you created in the step 3 pipeline.

In [38]:
try:
    model_package_group = sm.describe_model_package_group(ModelPackageGroupName=model_package_group_name)
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == 'ValidationException':
        print("******* ERROR *********")
        print(f"Model package group with the name {model_package_group_name} is not found. You need to run the step 3 pipeline to create a model")
    else:
        # Do not hide unrelated errors, for example missing permissions
        raise

In [43]:
model_packages = []

# Find the latest model package
# Set the parameter ModelApprovalStatus='Approved' if you'd like to get only the approved packages
# Sort by the CreationTime
for p in sm.get_paginator('list_model_packages').paginate(
    ModelPackageGroupName=model_package_group_name,
    # ModelApprovalStatus='Approved',
    SortBy="CreationTime",
    SortOrder="Descending",
    ):
    model_packages.extend(p["ModelPackageSummaryList"])
    
if not model_packages:
    print(f"There are no model packages in the model package group {model_package_group_name}. You need to run the step 3 pipeline to create a model")
    
latest_model_package_arn = model_packages[0]['ModelPackageArn']
print(f"The latest model package is version {model_packages[0]['ModelPackageVersion']}, {latest_model_package_arn}")


The latest model package is version 16, arn:aws:sagemaker:us-east-1:906545278380:model-package/from-idea-to-prod-model-group/16


You can only deploy a model with the model approval status `Approved`, so the next code cell updates the status.

In [44]:
sm.update_model_package(
    ModelPackageArn=latest_model_package_arn,
    ModelApprovalStatus="Approved",
)

{'ModelPackageArn': 'arn:aws:sagemaker:us-east-1:906545278380:model-package/from-idea-to-prod-model-group/16',
 'ResponseMetadata': {'RequestId': '03803851-8efe-4706-a894-db9bf00f38dd',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '03803851-8efe-4706-a894-db9bf00f38dd',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '109',
   'date': 'Mon, 13 Feb 2023 15:38:53 GMT'},
  'RetryAttempts': 0}}

In [45]:
# Create a model from the registry using Python SDK
model = ModelPackage(role=sm_role, 
                     model_package_arn=model_packages[0]['ModelPackageArn'], 
                     sagemaker_session=session)

In [46]:
endpoint_name = f"from-idea-to-prod-endpoint-{strftime('%d-%H-%M-%S', gmtime())}"

data_capture_config = DataCaptureConfig(
            enable_capture=True,
            sampling_percentage=100,
            destination_s3_uri=f"s3://{bucket_name}/{bucket_prefix}/data-capture",
        )

In [47]:
# Deploy the model
model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    wait=False,
    data_capture_config=data_capture_config,
    endpoint_name=endpoint_name,
    serializer=sagemaker.serializers.CSVSerializer(),
    deserializer=sagemaker.deserializers.CSVDeserializer(),
)

------!

In [48]:
# Wait until the endpoint has the status InService, it takes approximately 5 min
waiter = session.sagemaker_client.get_waiter('endpoint_in_service')
waiter.wait(EndpointName=endpoint_name)

### Check the data capture configuration
If you completed the step 5 [notebook](05-deploy.ipynb), the model deployment CI/CD pipeline contains an infrastructure as code (IaC) data capture configuration for the deployed endpoints. If you clone the project's code repository to the Studio file system, you can browse the project files. Let's take a look at the endpoint configuration.

The CloudFormation deployment template `endpoint-config-template.yml` in the project directory enables data capture for the endpoint configuration:
```yaml
EndpointConfig:
    Type: AWS::SageMaker::EndpointConfig
    Properties:
      ProductionVariants:
        - InitialInstanceCount: !Ref EndpointInstanceCount
          InitialVariantWeight: 1.0
          InstanceType: !Ref EndpointInstanceType
          ModelName: !GetAtt Model.ModelName
          VariantName: AllTraffic
      DataCaptureConfig:
          EnableCapture: !Ref EnableDataCapture 
          InitialSamplingPercentage: !Ref SamplingPercentage
          DestinationS3Uri: !Ref DataCaptureUploadPath
          CaptureOptions:
            - CaptureMode: Input
            - CaptureMode: Output
          CaptureContentTypeHeader:
            CsvContentTypes:
              - "text/csv"
```

The MLOps deploy project parametrizes all settings in the CloudFormation template.
The configuration files `prod-config.json` and `staging-config.json` provide the actual values for `EnableCapture`, `InitialSamplingPercentage`, and `DestinationS3Uri`:
```json
{
  "Parameters": {
    "StageName": "prod",
    "EndpointInstanceCount": "1",
    "EndpointInstanceType": "ml.m5.large",
    "SamplingPercentage": "80",
    "EnableDataCapture": "true"
  }
}
```
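
To illustrate how such a configuration file can feed a CloudFormation deployment, here is a minimal sketch that converts the `Parameters` map into the list of `ParameterKey`/`ParameterValue` pairs that CloudFormation APIs accept. The helper name `to_cfn_parameters` is hypothetical, and how the project's own pipeline passes the parameters is not shown in this notebook, so treat this as an assumption rather than the project's implementation.

```python
import json


def to_cfn_parameters(config):
    """Convert {"Parameters": {name: value}} into CloudFormation's parameter list shape."""
    return [
        {"ParameterKey": key, "ParameterValue": str(value)}
        for key, value in config.get("Parameters", {}).items()
    ]


# Same content as the prod configuration shown above
config = json.loads("""
{
  "Parameters": {
    "StageName": "prod",
    "EndpointInstanceCount": "1",
    "EndpointInstanceType": "ml.m5.large",
    "SamplingPercentage": "80",
    "EnableDataCapture": "true"
  }
}
""")

cfn_parameters = to_cfn_parameters(config)
print(json.dumps(cfn_parameters, indent=2))
```

The resulting list has the shape expected by the `Parameters` argument of the boto3 CloudFormation `create_stack` and `update_stack` calls.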

Let's check the endpoint configuration and see how data capture is configured.

<div class="alert alert-info"> 💡
If you use the endpoint deployed in the step 5 notebook, set the endpoint_name to the name of the endpoint in the following code block.
</div>

In [51]:
# Get the configuration for the endpoint
# endpoint_name = "model-deploy-19-20-31-59-staging" # USE YOUR STAGING ENDPOINT NAME
data_capture_s3_url = sm.describe_endpoint(EndpointName=endpoint_name)['DataCaptureConfig']['DestinationS3Uri']
data_capture_bucket = data_capture_s3_url.split('/')[2]
data_capture_prefix = '/'.join(data_capture_s3_url.split('/')[3:])

print(data_capture_s3_url)

s3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/data-capture
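
The cell above splits the S3 URL by hand. As an alternative sketch, the already imported `urlparse` can do the same job and also reject strings that are not S3 URLs. The helper name `split_s3_url` and the `example_` variables are hypothetical and do not replace the variables used in the rest of the notebook.

```python
from urllib.parse import urlparse


def split_s3_url(s3_url):
    """Return (bucket, key_or_prefix) for an s3://bucket/key URL."""
    parsed = urlparse(s3_url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URL: {s3_url}")
    # The path starts with "/", which is not part of the S3 key
    return parsed.netloc, parsed.path.lstrip("/")


example_bucket, example_prefix = split_s3_url(
    "s3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/data-capture"
)
print(example_bucket)
print(example_prefix)
```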


### Define helper functions
Define some helper functions with code snippets that you're going to use throughout this notebook.

In [52]:
# Send data to the endpoint
def generate_endpoint_traffic(predictor, data):
    l = len(data)
    for i in trange(l):
        predictions = np.array(predictor.predict(data.iloc[i].values), dtype=float).squeeze()

In [53]:
# Get all file keys under a specified prefix
def get_file_list(bucket, prefix):
    try:
        files = [f.get("Key") for f in s3.list_objects(Bucket=bucket, Prefix=prefix).get("Contents")]
        print(f"Found {len(files)} files in s3://{bucket}/{prefix}")
        
        return files
    except TypeError:
        print(f"No files found in s3://{bucket}/{prefix}")
        return None
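
Note that a single `list_objects` call returns at most 1,000 keys. That is enough for this notebook, but an endpoint with steady traffic accumulates more capture files over time. The following is a minimal sketch of a paginated variant. The function name `list_all_keys` is hypothetical, and the S3 client is passed in as an argument so the function can be tried without AWS access.

```python
def list_all_keys(s3_client, bucket, prefix):
    """Collect every object key under a prefix, following pagination."""
    keys = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page has no "Contents" entry when nothing matches the prefix
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys


# Usage in this notebook (requires AWS credentials):
# capture_files = list_all_keys(s3, data_capture_bucket, data_capture_prefix)
```

Unlike `get_file_list`, this sketch returns an empty list instead of `None` when no files are found.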

In [54]:
# Get S3 url for the latest captured data
def get_latest_data_capture_s3_url(bucket, prefix):
    capture_files = get_file_list(bucket, prefix)
    
    if capture_files:
        latest_data_capture_s3_url = f"s3://{bucket}/{'/'.join(capture_files[-1].split('/')[:-1])}"

        print(f"Latest data capture S3 url: {latest_data_capture_s3_url}")
        
        return latest_data_capture_s3_url
    else:
        return None

In [55]:
# Get S3 url for the latest monitoring job output
def get_latest_monitoring_report_s3_url(job_name):
    monitor_job = sm.list_processing_jobs(
        NameContains=job_name,
        SortOrder='Descending',
        MaxResults=2
    )['ProcessingJobSummaries'][0]['ProcessingJobName']

    monitoring_job_output_s3_url = sm.describe_processing_job(
        ProcessingJobName=monitor_job
    )['ProcessingOutputConfig']['Outputs'][0]['S3Output']['S3Uri']

    print(f"Latest monitoring report S3 url: {monitoring_job_output_s3_url}")
    
    return monitoring_job_output_s3_url

In [57]:
# Helper to load a json file from S3
def load_json_from_file(file_s3_url):
    bucket = file_s3_url.split('/')[2]
    key = '/'.join(file_s3_url.split('/')[3:])
    print(f"Load JSON from: {bucket}/{key}")
    
    return json.loads(
        s3.get_object(Bucket=bucket, 
                      Key=key)["Body"].read().decode("utf-8")
    )

In [58]:
def get_latest_monitor_execution(monitor):
    mon_executions = monitor.list_executions()

    if len(mon_executions):
        latest_execution = mon_executions[-1]  # get the latest execution
        latest_execution.wait(logs=False)

        print(f"Latest execution status: {latest_execution.describe().get('ProcessingJobStatus')}")
        print(f"Latest execution result: {latest_execution.describe().get('ExitMessage')}")

        latest_job = latest_execution.describe()
        if latest_job["ProcessingJobStatus"] != "Completed":
            print("No completed executions to inspect further")
        else:
            report_uri = latest_execution.output.destination
            print(f"Report Uri: {report_uri}")
        
        return latest_execution
    else:
        print("No monitoring schedule executions found")
        return None

### Generate endpoint traffic and captured data
You must send some data to the endpoint for inference to generate captured data.

In [59]:
# Create a predictor class for the endpoint
predictor = Predictor(
    endpoint_name=endpoint_name, 
    serializer=sagemaker.serializers.CSVSerializer(),
    deserializer=sagemaker.deserializers.CSVDeserializer()
)

Use the test dataset prepared in the [step 2](02-sagemaker-containers.ipynb) notebook (`test_x.csv`) and saved on the EFS volume:

In [60]:
# Set the number of data vectors from the test dataset sent to the inference endpoint
number_of_vectors = 100

In [61]:
test_x = pd.read_csv("tmp/test_x.csv", names=[f'_c{i}' for i in range(59)]).sample(number_of_vectors)

In [62]:
test_x.head(1)

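|      | _c0 | _c1 | _c2 | _c3 | _c4 | _c5 | _c6 | _c7 | _c8 | _c9 | ... | _c49 | _c50 | _c51 | _c52 | _c53 | _c54 | _c55 | _c56 | _c57 | _c58 |
|------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|------|------|------|------|------|------|------|------|------|------|
| 3376 | 31  | 1   | 999 | 0   | 1   | 0   | 0   | 0   | 0   | 0   | ... | 0    | 0    | 0    | 0    | 0    | 0    | 1    | 0    | 1    | 0    |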


If you need to add or update the data capture configuration for the endpoint, you can use `DataCaptureConfig` and call the [`update_data_capture_config()`](https://sagemaker.readthedocs.io/en/stable/api/inference/predictors.html#sagemaker.predictor.Predictor.update_data_capture_config) method of the predictor:

In [63]:
# data_capture_config = DataCaptureConfig(
#     enable_capture=True,
#     sampling_percentage=100,
#     destination_s3_uri=data_capture_s3_url,
# )

# predictor.update_data_capture_config(data_capture_config)

Send the data to the endpoint:

In [276]:
generate_endpoint_traffic(predictor, test_x)

100%|██████████| 100/100 [00:01<00:00, 90.67it/s]


### View captured data
Now list the data capture files stored in Amazon S3. The data is stored as `jsonl` files, and the Amazon S3 path format is `s3://{data-capture-destination-s3-url}/{endpoint-name}/{variant-name}/yyyy/mm/dd/hh/filename.jsonl`.

Wait until the captured data appears in the Amazon S3 bucket. This may take several minutes.
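
To make the path format concrete, the following sketch splits a capture file key into its endpoint name, variant name, and capture hour. The helper name `parse_capture_key` is hypothetical, and treating the `yyyy/mm/dd/hh` path segments as UTC is an assumption.

```python
from datetime import datetime, timezone


def parse_capture_key(key, capture_prefix):
    """Split {prefix}/{endpoint}/{variant}/yyyy/mm/dd/hh/{file} into its parts."""
    prefix = capture_prefix.strip("/") + "/"
    if not key.startswith(prefix):
        raise ValueError(f"Key is not under the capture prefix: {key}")
    parts = key[len(prefix):].split("/")
    if len(parts) != 7:
        raise ValueError(f"Unexpected capture key layout: {key}")
    endpoint, variant, year, month, day, hour, filename = parts
    return {
        "endpoint_name": endpoint,
        "variant_name": variant,
        # Assumption: the date and hour segments of the path are in UTC
        "capture_hour": datetime(int(year), int(month), int(day), int(hour), tzinfo=timezone.utc),
        "filename": filename,
    }


capture_info = parse_capture_key(
    "from-idea-to-prod/xgboost/data-capture/from-idea-to-prod-endpoint-13-15-39-04/"
    "AllTraffic/2023/02/13/15/52-32-739-b3380519-6e72-4079-ac47-13312a2eb253.jsonl",
    "from-idea-to-prod/xgboost/data-capture",
)
print(capture_info)
```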

In [269]:
!aws s3 ls {data_capture_s3_url} --recursive

2023-02-13 15:53:40      48702 from-idea-to-prod/xgboost/data-capture/from-idea-to-prod-endpoint-13-15-39-04/AllTraffic/2023/02/13/15/52-32-739-b3380519-6e72-4079-ac47-13312a2eb253.jsonl
2023-02-13 16:21:30      48890 from-idea-to-prod/xgboost/data-capture/from-idea-to-prod-endpoint-13-15-39-04/AllTraffic/2023/02/13/16/20-24-369-8f1b57e7-8215-4711-9889-c908d356f5fc.jsonl
2023-02-13 21:04:40      48702 from-idea-to-prod/xgboost/data-capture/from-idea-to-prod-endpoint-13-15-39-04/AllTraffic/2023/02/13/21/03-35-139-5a322d84-4197-4e5e-86e2-8ec9b36f5fb7.jsonl
2023-02-14 10:01:10      60690 from-idea-to-prod/xgboost/data-capture/from-idea-to-prod-endpoint-13-15-39-04/AllTraffic/2023/02/14/10/00-05-920-b3f34273-1268-422b-ab84-2a9a2161b20b.jsonl


In [265]:
capture_files = get_file_list(data_capture_bucket, data_capture_prefix)

Found 4 files in s3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/data-capture


In [84]:
assert capture_files, "Wait until the captured data is delivered to the Amazon S3 bucket"

In [None]:
capture_files[0]

'from-idea-to-prod/xgboost/data-capture/from-idea-to-prod-endpoint-13-15-39-04/AllTraffic/2023/02/13/15/52-32-739-b3380519-6e72-4079-ac47-13312a2eb253.jsonl'

Each inference request is captured in one line in the `jsonl` file. The line contains both the input and output merged together. In the example, you provided the ContentType as `text/csv`, which is reflected in the `observedContentType` value. The `encoding` value records how the input and output payloads are encoded in the capture file.

In [275]:
# Download a capture data file and print its content
file_key = capture_files[-1]
S3Downloader.download(f"s3://{data_capture_bucket}/{file_key}", "./tmp")

print("Content of the capture file:")
# Read the jsonl file and show the first object
with jsonlines.open(f"./tmp/{file_key.split('/')[-1]}") as reader:
    print(json.dumps(reader.read(), indent=2))

Content of the capture file:
{
  "captureData": {
    "endpointInput": {
      "observedContentType": "text/csv",
      "mode": "INPUT",
      "data": "-99.99,1.0,999.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0",
      "encoding": "CSV"
    },
    "endpointOutput": {
      "observedContentType": "text/csv; charset=utf-8",
      "mode": "OUTPUT",
      "data": "0.2193777859210968\n",
      "encoding": "CSV"
    }
  },
  "eventMetadata": {
    "eventId": "11b8a634-f96f-4a16-8f22-666209d1d8cf",
    "inferenceTime": "2023-02-14T10:00:05Z"
  },
  "eventVersion": "0"
}
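
If you want to work with the captured records programmatically, the following sketch shows one way to turn a capture line into a feature vector and a prediction. It assumes both payloads use the `CSV` encoding, as in the record above. The helper name `parse_capture_record` is hypothetical, and the sample line is shortened to three features for readability.

```python
import json


def parse_capture_record(line):
    """Extract features, prediction, and inference time from one capture line."""
    record = json.loads(line)
    capture = record["captureData"]
    for part in ("endpointInput", "endpointOutput"):
        if capture[part]["encoding"] != "CSV":
            raise ValueError(f"Unsupported encoding in {part}: {capture[part]['encoding']}")
    features = [float(value) for value in capture["endpointInput"]["data"].split(",")]
    # The output payload ends with a newline, so strip it before converting
    prediction = float(capture["endpointOutput"]["data"].strip())
    return features, prediction, record["eventMetadata"]["inferenceTime"]


# Shortened, hypothetical record with the same structure as the captured data
sample_line = json.dumps({
    "captureData": {
        "endpointInput": {"observedContentType": "text/csv", "mode": "INPUT",
                          "data": "31.0,1.0,999.0", "encoding": "CSV"},
        "endpointOutput": {"observedContentType": "text/csv; charset=utf-8", "mode": "OUTPUT",
                           "data": "0.25\n", "encoding": "CSV"},
    },
    "eventMetadata": {"eventId": "example-id", "inferenceTime": "2023-02-14T10:00:05Z"},
    "eventVersion": "0",
})

features, prediction, inference_time = parse_capture_record(sample_line)
print(features, prediction, inference_time)
```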


## Part 1: Monitor data quality
In this part you learn how to set up data quality monitoring for SageMaker real-time endpoints.

To enable inference data quality monitoring and evaluation you must:
1. Enable [data capture](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-data-capture.html)
1. [Create a baseline](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-create-baseline.html) with which you compare the real-time traffic
1. Once a baseline is ready, [schedule monitoring jobs](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-scheduling.html) to continuously evaluate and compare against the baseline
1. [See and interpret the results](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-interpreting-results.html) of monitoring jobs
1. [Integrate data quality monitoring](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-interpreting-cloudwatch.html) with Amazon CloudWatch

### Create a baselining job with the training dataset
The whole dataset with which you trained and tested the model is usually a good baseline dataset. Note that the baseline dataset data schema and the inference dataset schema should exactly match (i.e. the number and order of the features).

From the baseline dataset you can ask Amazon SageMaker to suggest a set of baseline _constraints_ and generate descriptive _statistics_ to explore the data. Model Monitor provides a [built-in container](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-pre-built-container.html) that can suggest the constraints automatically for CSV and flat JSON input. This `sagemaker-model-monitor-analyzer` container also provides a range of model monitoring capabilities, including constraint validation against a baseline and emitting Amazon CloudWatch metrics. The container is based on Spark and is built with [Deequ](https://github.com/awslabs/deequ).

<div class="alert alert-info"> 💡 <strong> All column names in your baseline dataset must be compliant with Spark. For column names, use only lowercase characters, and _ as the only special character. </strong>
</div>
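
If your baseline dataset has a header, a quick check of the column names before you start the baselining job can save a failed run. The following sketch flags names that contain anything other than lowercase letters, digits, and underscores. Allowing digits is an assumption based on the `_c0`-style names used in this notebook, and the helper name `find_noncompliant_columns` is hypothetical.

```python
import re

# Assumption: lowercase letters, digits, and "_" are acceptable
_COMPLIANT_NAME = re.compile(r"[a-z0-9_]+")


def find_noncompliant_columns(columns):
    """Return the column names that do not follow the naming rule."""
    return [name for name in columns if not _COMPLIANT_NAME.fullmatch(str(name))]


# Hypothetical column names for illustration
bad_columns = find_noncompliant_columns(["_c0", "age", "Age", "emp.var.rate", "cons_price_idx"])
print(bad_columns)
```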

Use the baseline dataset you created in the data processing part of the [step 2](02-sagemaker-containers.ipynb) notebook. The baseline dataset is the full dataset without header, index, and label column.

In [87]:
!aws s3 ls {baseline_s3_url}/

2023-02-13 11:40:07    4982068 baseline.csv


In [88]:
baseline_results_s3_url = f"{baseline_s3_url}/results"
data_mon_reports_s3_url = f"{baseline_s3_url}/reports"
baseline_dataset_uri = f"{baseline_s3_url}/baseline.csv"

Use the Python SDK class [`DefaultModelMonitor`](https://sagemaker.readthedocs.io/en/stable/api/inference/model_monitor.html#sagemaker.model_monitor.model_monitoring.DefaultModelMonitor) to create a data monitor and interact with it:

In [89]:
data_monitor = DefaultModelMonitor(
    role=sm_role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    volume_size_in_gb=20,
    max_runtime_in_seconds=3600,
    sagemaker_session=session,
)

Start a SageMaker processing job on the baseline data to profile data and suggest constraints.

In [90]:
data_baseline_job_name = f"from-idea-to-prod-data-baselining-{strftime('%d-%H-%M-%S', gmtime())}-{str(uuid.uuid4())[:8]}"

data_baseline_job = data_monitor.suggest_baseline(
    baseline_dataset=baseline_dataset_uri,
    dataset_format=DatasetFormat.csv(header=False),
    output_s3_uri=baseline_results_s3_url,
    wait=False,
    logs=False,
    job_name=data_baseline_job_name,
)

print(data_baseline_job_name)

INFO:sagemaker:Creating processing-job with name from-idea-to-prod-data-baselining-13-15-55-51-17622085


from-idea-to-prod-data-baselining-13-15-55-51-17622085


The baselining job takes about 7 minutes to complete:

In [91]:
data_baseline_job.wait(logs=False)

...........................................!

### See the generated statistics and constraints
After the baselining job finishes, it saves the baseline statistics to the `statistics.json` file and the suggested baseline constraints to the `constraints.json` file in the location you specify with `output_s3_uri`.

In [169]:
data_monitor.describe_latest_baselining_job()

{'ProcessingInputs': [{'InputName': 'baseline_dataset_input',
   'AppManaged': False,
   'S3Input': {'S3Uri': 's3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/baseline/baseline.csv',
    'LocalPath': '/opt/ml/processing/input/baseline_dataset_input',
    'S3DataType': 'S3Prefix',
    'S3InputMode': 'File',
    'S3DataDistributionType': 'FullyReplicated',
    'S3CompressionType': 'None'}}],
 'ProcessingOutputConfig': {'Outputs': [{'OutputName': 'monitoring_output',
    'S3Output': {'S3Uri': 's3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/baseline/results',
     'LocalPath': '/opt/ml/processing/output',
     'S3UploadMode': 'EndOfJob'},
    'AppManaged': False}]},
 'ProcessingJobName': 'from-idea-to-prod-data-baselining-13-15-55-51-17622085',
 'ProcessingResources': {'ClusterConfig': {'InstanceCount': 1,
   'InstanceType': 'ml.m5.xlarge',
   'VolumeSizeInGB': 20}},
 'StoppingCondition': {'MaxRuntimeInSeconds': 3600},
 'AppSpecification': {'ImageUri': '156

In [170]:
!aws s3 ls {baseline_results_s3_url}/

2023-02-13 16:01:43       9272 constraints.json
2023-02-13 16:01:43    1216759 statistics.json


In [171]:
data_statistics_s3_url = f"{baseline_results_s3_url}/statistics.json"
data_constraints_s3_url = f"{baseline_results_s3_url}/constraints.json"

Copy statistics and constraints JSON files to the Studio EFS:

In [172]:
!aws s3 cp {data_constraints_s3_url} ./tmp/
!aws s3 cp {data_statistics_s3_url} ./tmp/

download: s3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/baseline/results/constraints.json to tmp/constraints.json
download: s3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/baseline/results/statistics.json to tmp/statistics.json


In [96]:
!head -20 tmp/constraints.json

{
  "version" : 0.0,
  "features" : [ {
    "name" : "_c0",
    "inferred_type" : "Integral",
    "completeness" : 1.0,
    "num_constraints" : {
      "is_non_negative" : true
    }
  }, {
    "name" : "_c1",
    "inferred_type" : "Integral",
    "completeness" : 1.0,
    "num_constraints" : {
      "is_non_negative" : true
    }
  }, {
    "name" : "_c2",
    "inferred_type" : "Integral",
    "completeness" : 1.0,


In [97]:
!head -20 tmp/statistics.json

{
  "version" : 0.0,
  "dataset" : {
    "item_count" : 41188
  },
  "features" : [ {
    "name" : "_c0",
    "inferred_type" : "Integral",
    "numerical_statistics" : {
      "common" : {
        "num_present" : 41188,
        "num_missing" : 0
      },
      "mean" : 40.02406040594348,
      "sum" : 1648511.0,
      "std_dev" : 10.42112347183873,
      "min" : 17.0,
      "max" : 98.0,
      "distribution" : {
        "kll" : {


Load the generated JSON as a pandas DataFrame and see the content of `statistics.json` and `constraints.json`:

In [101]:
baseline_job = data_monitor.latest_baselining_job
statistics_df = pd.json_normalize(baseline_job.baseline_statistics().body_dict["features"])
statistics_df.head()

Unnamed: 0,name,inferred_type,numerical_statistics.common.num_present,numerical_statistics.common.num_missing,numerical_statistics.mean,numerical_statistics.sum,numerical_statistics.std_dev,numerical_statistics.min,numerical_statistics.max,numerical_statistics.distribution.kll.buckets,numerical_statistics.distribution.kll.sketch.parameters.c,numerical_statistics.distribution.kll.sketch.parameters.k,numerical_statistics.distribution.kll.sketch.data
0,_c0,Integral,41188,0,40.02406,1648511.0,10.421123,17.0,98.0,"[{'lower_bound': 17.0, 'upper_bound': 25.1, 'count': 1676.0}, {'lower_bound': 25.1, 'upper_bound': 33.2, 'count': 11336.0}, {'lower_bound': 33.2, 'upper_bound': 41.3, 'count': 12040.0}, {'lower_bound': 41.3, 'upper_bound': 49.4, 'count': 8072.0}, {'lower_bound': 49.4, 'upper_bound': 57.5, 'count': 5856.0}, {'lower_bound': 57.5, 'upper_bound': 65.6, 'count': 1600.0}, {'lower_bound': 65.6, 'upper_bound': 73.7, 'count': 308.0}, {'lower_bound': 73.7, 'upper_bound': 81.8, 'count': 192.0}, {'lower_bound': 81.8, 'upper_bound': 89.9, 'count': 96.0}, {'lower_bound': 89.9, 'upper_bound': 98.0, 'count': 12.0}]",0.64,2048.0,"[[], [], [17.0, 18.0, 18.0, 18.0, 18.0, 19.0, 19.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 22.0, 22.0, 22.0, 22.0, 22.0, 22.0, 22.0, 22.0, 22.0, 22.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, ...], [88.0], [88.0], [19.0, 21.0, 22.0, 22.0, 23.0, 23.0, 23.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 26.0, 26.0, 26.0, 26.0, 26.0, 26.0, 26.0, 26.0, 26.0, 26.0, 26.0, 26.0, 26.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, ...]]"
1,_c1,Integral,41188,0,2.567593,105754.0,2.76998,1.0,56.0,"[{'lower_bound': 1.0, 'upper_bound': 6.5, 'count': 38792.0}, {'lower_bound': 6.5, 'upper_bound': 12.0, 'count': 1712.0}, {'lower_bound': 12.0, 'upper_bound': 17.5, 'count': 444.0}, {'lower_bound': 17.5, 'upper_bound': 23.0, 'count': 128.0}, {'lower_bound': 23.0, 'upper_bound': 28.5, 'count': 64.0}, {'lower_bound': 28.5, 'upper_bound': 34.0, 'count': 48.0}, {'lower_bound': 34.0, 'upper_bound': 39.5, 'count': 0.0}, {'lower_bound': 39.5, 'upper_bound': 45.0, 'count': 0.0}, {'lower_bound': 45.0, 'upper_bound': 50.5, 'count': 0.0}, {'lower_bound': 50.5, 'upper_bound': 56.0, 'count': 0.0}]",0.64,2048.0,"[[], [], [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, ...], [13.0], [33.0], [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, ...]]"
2,_c2,Integral,41188,0,962.475454,39642439.0,186.908638,0.0,999.0,"[{'lower_bound': 0.0, 'upper_bound': 99.9, 'count': 1516.0}, {'lower_bound': 99.9, 'upper_bound': 199.8, 'count': 0.0}, {'lower_bound': 199.8, 'upper_bound': 299.7, 'count': 0.0}, {'lower_bound': 299.7, 'upper_bound': 399.6, 'count': 0.0}, {'lower_bound': 399.6, 'upper_bound': 499.5, 'count': 0.0}, {'lower_bound': 499.5, 'upper_bound': 599.4, 'count': 0.0}, {'lower_bound': 599.4, 'upper_bound': 699.3, 'count': 0.0}, {'lower_bound': 699.3, 'upper_bound': 799.2, 'count': 0.0}, {'lower_bound': 799.2, 'upper_bound': 899.1, 'count': 0.0}, {'lower_bound': 899.1, 'upper_bound': 999.0, 'count': 39672.0}]",0.64,2048.0,"[[], [], [1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 5.0, 5.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, ...], [999.0], [999.0], [3.0, 6.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, ...]]"
3,_c3,Integral,41188,0,0.172963,7124.0,0.494895,0.0,7.0,"[{'lower_bound': 0.0, 'upper_bound': 0.7, 'count': 35572.0}, {'lower_bound': 0.7, 'upper_bound': 1.4, 'count': 4548.0}, {'lower_bound': 1.4, 'upper_bound': 2.1, 'count': 760.0}, {'lower_bound': 2.1, 'upper_bound': 2.8, 'count': 0.0}, {'lower_bound': 2.8, 'upper_bound': 3.5, 'count': 216.0}, {'lower_bound': 3.5, 'upper_bound': 4.2, 'count': 68.0}, {'lower_bound': 4.2, 'upper_bound': 4.9, 'count': 0.0}, {'lower_bound': 4.9, 'upper_bound': 5.6, 'count': 16.0}, {'lower_bound': 5.6, 'upper_bound': 6.3, 'count': 4.0}, {'lower_bound': 6.3, 'upper_bound': 7.0, 'count': 4.0}]",0.64,2048.0,"[[], [], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...], [2.0], [2.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...]]"
4,_c4,Integral,41188,0,0.963217,39673.0,0.188228,0.0,1.0,"[{'lower_bound': 0.0, 'upper_bound': 0.1, 'count': 1516.0}, {'lower_bound': 0.1, 'upper_bound': 0.2, 'count': 0.0}, {'lower_bound': 0.2, 'upper_bound': 0.3, 'count': 0.0}, {'lower_bound': 0.3, 'upper_bound': 0.4, 'count': 0.0}, {'lower_bound': 0.4, 'upper_bound': 0.5, 'count': 0.0}, {'lower_bound': 0.5, 'upper_bound': 0.6, 'count': 0.0}, {'lower_bound': 0.6, 'upper_bound': 0.7, 'count': 0.0}, {'lower_bound': 0.7, 'upper_bound': 0.8, 'count': 0.0}, {'lower_bound': 0.8, 'upper_bound': 0.9, 'count': 0.0}, {'lower_bound': 0.9, 'upper_bound': 1.0, 'count': 39672.0}]",0.64,2048.0,"[[], [], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...], [1.0], [1.0], [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, ...]]"


In [102]:
constraints_df = pd.json_normalize(
    baseline_job.suggested_constraints().body_dict["features"]
)
constraints_df.head()

Unnamed: 0,name,inferred_type,completeness,num_constraints.is_non_negative
0,_c0,Integral,1.0,True
1,_c1,Integral,1.0,True
2,_c2,Integral,1.0,True
3,_c3,Integral,1.0,True
4,_c4,Integral,1.0,True


For this dataset, the baselining job suggests three constraints:
1. DataType
2. Completeness
3. Is non-negative

Additionally, the Model Monitor prebuilt container performs missing and extra column checks, a baseline drift check, and a categorical values check. Refer to the [Developer Guide](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-interpreting-violations.html) for more details.

In a real-world project, you can add your own constraints that the data must comply with.
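
As a minimal sketch of that idea, the snippet below edits a constraints document in memory before it would be saved and used by a schedule. It assumes the `features` layout shown in the table above (`name`, `inferred_type`, `completeness`, `num_constraints.is_non_negative`). The `override_constraints` helper and the sample values are illustrative and not part of the SageMaker SDK.

```python
import copy
import json


def override_constraints(constraints, overrides):
    """Return a copy of a constraints document with per-feature overrides applied.

    `overrides` maps a feature name to the top-level fields to replace,
    for example {"_c0": {"completeness": 0.95}}.
    """
    updated = copy.deepcopy(constraints)
    known = {feature["name"] for feature in updated["features"]}
    unknown = set(overrides) - known
    if unknown:
        raise KeyError(f"Unknown feature(s): {sorted(unknown)}")
    for feature in updated["features"]:
        feature.update(overrides.get(feature["name"], {}))
    return updated


# Hand-written stand-in for baseline_job.suggested_constraints().body_dict
suggested = {
    "features": [
        {"name": "_c0", "inferred_type": "Integral", "completeness": 1.0,
         "num_constraints": {"is_non_negative": True}},
        {"name": "_c1", "inferred_type": "Integral", "completeness": 1.0,
         "num_constraints": {"is_non_negative": True}},
    ],
}

# Tolerate up to 5% missing values in _c0 and leave _c1 untouched
custom = override_constraints(suggested, {"_c0": {"completeness": 0.95}})
print(json.dumps(custom["features"][0], indent=2))
```

The edited document would still need to be written to a `constraints.json` file in Amazon S3 and referenced by the monitoring schedule before it has any effect.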

Next you schedule and run a monitoring job to validate incoming data against these constraints and statistics.

### Create a data monitoring schedule
With a monitoring schedule, SageMaker launches processing jobs at a specified frequency to analyze the data collected during a given period. SageMaker provides a [built-in container](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-pre-built-container.html) for performing analysis on tabular datasets. In the processing job, SageMaker compares the dataset for the current analysis with the baseline statistics and constraints and generates a violations report. In addition, CloudWatch metrics are emitted for each data feature under analysis.

#### Implement custom record processing with a preprocessing script
You can extend Model Monitor by providing a custom record preprocessing function. In this function you can implement your own filtering or preprocessing of every data record. For example, you can skip some records from analysis based on values or some event metadata. Refer to [Preprocessing and Postprocessing](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-pre-and-post-processing.html) documentation for more details and examples.

When you created a monitoring baseline, you used the baseline dataset with all features but without the label. By default, Model Monitor concatenates the model input and output, resulting in a dataset that contains all features plus the label. If you don't preprocess records before passing them to Model Monitor, the number of columns in the baseline dataset won't match the number of columns in the data capture record, and Model Monitor will report an `extra_column_check` violation. To avoid this situation, you need to either include the label column in the baselining or remove the model output from the monitored records. This notebook uses the latter approach and provides a preprocessing script that returns only input data without the label.
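
A record preprocessor for the second approach can be very small. The sketch below follows the `preprocess_handler(inference_record)` pattern from the Preprocessing and Postprocessing documentation and returns only the endpoint input. It assumes CSV-encoded requests and is not necessarily identical to the `record_preprocessor.py` used by this notebook. The `SimpleNamespace` object at the end is a local stand-in for the record that Model Monitor passes in.

```python
from types import SimpleNamespace


def preprocess_handler(inference_record):
    """Return only the model input, keyed by zero-padded column position."""
    # The endpoint output (the prediction) is ignored on purpose, so the
    # column count matches a baseline created without the label.
    input_data = inference_record.endpoint_input.data.rstrip("\n")
    # Zero-padded keys keep the columns in their original order when sorted
    return {str(i).zfill(20): value for i, value in enumerate(input_data.split(","))}


# Local stand-in for a captured record (assumption: CSV-encoded payloads)
fake_record = SimpleNamespace(
    endpoint_input=SimpleNamespace(data="39,1,999,0,1\n"),
    endpoint_output=SimpleNamespace(data="0.0123\n"),
)
print(preprocess_handler(fake_record))
```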

For another example of custom preprocessing see the blog post [Design a compelling record filtering method with Amazon SageMaker Model Monitor](https://aws.amazon.com/blogs/machine-learning/design-a-compelling-record-filtering-method-with-amazon-sagemaker-model-monitor/).

In [103]:
# !pygmentize ./record_preprocessor.py

In [104]:
# S3 location for the preprocessing script (uploaded in the next cell)
record_preprocessor_s3_url = f"s3://{bucket_name}/{bucket_prefix}/code"

In [105]:
!aws s3 cp ./record_preprocessor.py {record_preprocessor_s3_url}/

upload: ./record_preprocessor.py to s3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/code/record_preprocessor.py


In [106]:
data_mon_schedule_name = "from-idea-to-prod-data-monitor-schedule-" + strftime(
    "%Y-%m-%d-%H-%M-%S", gmtime()
)

data_monitor.create_monitoring_schedule(
    monitor_schedule_name=data_mon_schedule_name,
    endpoint_input=predictor.endpoint_name,
    record_preprocessor_script=f"{record_preprocessor_s3_url}/record_preprocessor.py",
    # post_analytics_processor_script=s3_code_postprocessor_uri,
    output_s3_uri=data_mon_reports_s3_url,
    statistics=data_monitor.baseline_statistics(),
    constraints=data_monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
    enable_cloudwatch_metrics=True,
)

INFO:sagemaker.model_monitor.model_monitoring:Creating Monitoring Schedule with name: from-idea-to-prod-data-monitor-schedule-2023-02-13-16-06-24


In [107]:
while data_monitor.describe_schedule()["MonitoringScheduleStatus"] != "Scheduled":
    print("Waiting until data monitoring schedule status becomes Scheduled")
    time.sleep(3)

data_monitor.describe_schedule()

Waiting until data monitoring schedule status becomes Scheduled
Waiting until data monitoring schedule status becomes Scheduled
Waiting until data monitoring schedule status becomes Scheduled


{'MonitoringScheduleArn': 'arn:aws:sagemaker:us-east-1:906545278380:monitoring-schedule/from-idea-to-prod-data-monitor-schedule-2023-02-13-16-06-24',
 'MonitoringScheduleName': 'from-idea-to-prod-data-monitor-schedule-2023-02-13-16-06-24',
 'MonitoringScheduleStatus': 'Scheduled',
 'MonitoringType': 'DataQuality',
 'CreationTime': datetime.datetime(2023, 2, 13, 16, 6, 25, 235000, tzinfo=tzlocal()),
 'LastModifiedTime': datetime.datetime(2023, 2, 13, 16, 6, 34, 320000, tzinfo=tzlocal()),
 'MonitoringScheduleConfig': {'ScheduleConfig': {'ScheduleExpression': 'cron(0 * ? * * *)'},
  'MonitoringJobDefinitionName': 'data-quality-job-definition-2023-02-13-16-06-24-914',
  'MonitoringType': 'DataQuality'},
 'EndpointName': 'from-idea-to-prod-endpoint-13-15-39-04',
 'ResponseMetadata': {'RequestId': '29878010-5ffe-4197-9e78-c357c8885786',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '29878010-5ffe-4197-9e78-c357c8885786',
   'content-type': 'application/x-amz-json-1.1',
   'c
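
The polling loop above runs until the status changes, however long that takes. A bounded variant is sketched below. `wait_for_status` is a hypothetical helper, and the lambda in the usage example fakes the status sequence so that the snippet runs without AWS access. In the notebook you would pass something like `lambda: data_monitor.describe_schedule()["MonitoringScheduleStatus"]` instead.

```python
import time


def wait_for_status(get_status, target="Scheduled", timeout_s=300.0, poll_s=3.0):
    """Poll get_status() until it returns target or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == target:
            return status
        if status == "Failed":
            raise RuntimeError("Monitoring schedule creation failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Status is still {status!r} after {timeout_s} s")
        print(f"Status is {status}, waiting for {target}")
        time.sleep(poll_s)


# Simulated status sequence standing in for describe_schedule() calls
fake_statuses = iter(["Pending", "Pending", "Scheduled"])
final_status = wait_for_status(lambda: next(fake_statuses), poll_s=0.01)
print(final_status)
```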

### Generate compliant traffic
Generate traffic that won't trigger any violations. Use the `test_x.csv` dataset to send requests to the endpoint.

In [108]:
generate_endpoint_traffic(predictor, test_x)

100%|██████████| 100/100 [00:01<00:00, 69.70it/s]


### See the captured data
List captured data files under `data_capture_s3_url`. Wait a couple of minutes for the captured data to appear in the Amazon S3 bucket.

In [263]:
!aws s3 ls {data_capture_s3_url} --recursive

2023-02-13 15:53:40      48702 from-idea-to-prod/xgboost/data-capture/from-idea-to-prod-endpoint-13-15-39-04/AllTraffic/2023/02/13/15/52-32-739-b3380519-6e72-4079-ac47-13312a2eb253.jsonl
2023-02-13 16:21:30      48890 from-idea-to-prod/xgboost/data-capture/from-idea-to-prod-endpoint-13-15-39-04/AllTraffic/2023/02/13/16/20-24-369-8f1b57e7-8215-4711-9889-c908d356f5fc.jsonl
2023-02-13 21:04:40      48702 from-idea-to-prod/xgboost/data-capture/from-idea-to-prod-endpoint-13-15-39-04/AllTraffic/2023/02/13/21/03-35-139-5a322d84-4197-4e5e-86e2-8ec9b36f5fb7.jsonl
2023-02-14 10:01:10      60690 from-idea-to-prod/xgboost/data-capture/from-idea-to-prod-endpoint-13-15-39-04/AllTraffic/2023/02/14/10/00-05-920-b3f34273-1268-422b-ab84-2a9a2161b20b.jsonl
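
Each `.jsonl` file holds one JSON document per captured request. The sketch below parses one such line with the standard library. The field names follow the data capture format described in the SageMaker Developer Guide, while the sample line itself is made up for illustration.

```python
import json

# Made-up capture record in the documented layout (values are illustrative)
sample_line = json.dumps({
    "captureData": {
        "endpointInput": {
            "observedContentType": "text/csv",
            "mode": "INPUT",
            "data": "39,1,999,0,1",
            "encoding": "CSV",
        },
        "endpointOutput": {
            "observedContentType": "text/csv; charset=utf-8",
            "mode": "OUTPUT",
            "data": "0.0123",
            "encoding": "CSV",
        },
    },
    "eventMetadata": {
        "eventId": "00000000-0000-0000-0000-000000000000",
        "inferenceTime": "2023-02-13T16:20:24Z",
    },
    "eventVersion": "0",
})


def parse_capture_line(line):
    """Extract the input features, prediction, and timestamp from one line."""
    record = json.loads(line)
    capture = record["captureData"]
    return {
        "features": capture["endpointInput"]["data"].strip().split(","),
        "prediction": capture["endpointOutput"]["data"].strip(),
        "inference_time": record["eventMetadata"]["inferenceTime"],
    }


parsed = parse_capture_line(sample_line)
print(parsed)
```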


In [112]:
# !aws s3 rm {data_capture_s3_url} --recursive

### Launch a manual monitoring job
You can launch a monitoring job manually instead of waiting for the next scheduled execution. Because you created an hourly schedule, the first scheduled executions appear only after you cross the next hour boundary.

Since the Model Monitor uses a [built-in container](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-pre-built-container.html) and a SageMaker [processing job](https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html) to run analysis of the captured data, you can manually configure and run a monitoring job. 

This [repository](https://github.com/aws-samples/reinvent2019-aim362-sagemaker-debugger-model-monitor/tree/master/02_deploy_and_monitor) contains an implementation of a helper function to manually run a monitoring job.

In [113]:
# !pygmentize ./utils/monitoring_utils.py

Get the S3 URL of the latest captured data files:

In [264]:
latest_data_capture_s3_url = get_latest_data_capture_s3_url(data_capture_bucket, data_capture_prefix)

Found 4 files in s3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/data-capture
Latest data capture S3 url: s3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/data-capture/from-idea-to-prod-endpoint-13-15-39-04/AllTraffic/2023/02/14/10
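
The core idea behind a helper like `get_latest_data_capture_s3_url` can be sketched without AWS calls: data capture keys embed `yyyy/mm/dd/hh`, so the lexicographically largest key belongs to the most recent hour. The function name, bucket, and shortened keys below are illustrative and are not the notebook's implementation.

```python
import posixpath


def latest_capture_prefix(bucket, keys):
    """Return the S3 URL of the hour folder holding the newest capture file."""
    jsonl_keys = [key for key in keys if key.endswith(".jsonl")]
    if not jsonl_keys:
        raise ValueError("No data capture files found")
    # Works as long as all keys share one endpoint/variant prefix, because
    # the zero-padded yyyy/mm/dd/hh path segments sort chronologically.
    newest = max(jsonl_keys)
    return f"s3://{bucket}/{posixpath.dirname(newest)}"


keys = [
    "data-capture/my-endpoint/AllTraffic/2023/02/13/15/52-32-739-aaaa.jsonl",
    "data-capture/my-endpoint/AllTraffic/2023/02/13/21/03-35-139-bbbb.jsonl",
    "data-capture/my-endpoint/AllTraffic/2023/02/14/10/00-05-920-cccc.jsonl",
]
latest_url = latest_capture_prefix("my-bucket", keys)
print(latest_url)
```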


In [115]:
print(f"Data capture path: {latest_data_capture_s3_url}")
print(f"Data baseline statistics file: {data_statistics_s3_url}")
print(f"Data baseline constraints file: {data_constraints_s3_url}")
print(f"Data monitor report output path: {data_mon_reports_s3_url}")
print(f"Record preprocessor script path: {record_preprocessor_s3_url}")

Data capture path: s3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/data-capture/from-idea-to-prod-endpoint-13-15-39-04/AllTraffic/2023/02/13/16
Data baseline statistics file: s3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/baseline/results/statistics.json
Data baseline constraints file: s3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/baseline/results/constraints.json
Data monitor report output path: s3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/baseline/reports
Record preprocessor script path: s3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/code


Run a monitoring job. It takes about 7 minutes:

In [None]:
from utils.monitoring_utils import run_model_monitor_job

run_model_monitor_job(
    region=region,
    instance_type="ml.m5.xlarge",
    role=sm_role,
    data_capture_path=latest_data_capture_s3_url,
    statistics_path=data_statistics_s3_url,
    constraints_path=data_constraints_s3_url,
    reports_path=data_mon_reports_s3_url,
    instance_count=1,
    preprocessor_path=f"{record_preprocessor_s3_url}/record_preprocessor.py",
    postprocessor_path=None,
    publish_cloudwatch_metrics="Disabled",
    logs=False,
)

INFO:sagemaker:Creating processing-job with name sagemaker-model-monitor-analyzer-2023-02-13-16-08-44-980


.....................................................................!

### See the monitoring job output
Let's check what reports the monitoring job generated. 

In [117]:
manual_monitoring_job_output_s3_url = get_latest_monitoring_report_s3_url("sagemaker-model-monitor-analyzer")

Latest monitoring report S3 url: s3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/baseline/reports/6


In [118]:
!aws s3 ls {manual_monitoring_job_output_s3_url}/

2023-02-13 16:14:26         24 constraint_violations.json
2023-02-13 16:14:24       9272 constraints.json
2023-02-13 16:14:24     125280 statistics.json


Load the monitoring report and see if there are any violations:

In [277]:
violations = load_json_from_file(f"{manual_monitoring_job_output_s3_url}/constraint_violations.json")

Load JSON from: sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/baseline/reports/0/constraint_violations.json


In [278]:
pd.json_normalize(violations["violations"])

Unnamed: 0,feature_name,constraint_check_type,description
0,_c41,data_type_check,"Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 0.0% of data is Integral."
1,_c34,data_type_check,"Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 0.0% of data is Integral."
2,_c17,data_type_check,"Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 0.0% of data is Integral."
3,_c51,data_type_check,"Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 0.0% of data is Integral."
4,_c13,data_type_check,"Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 0.0% of data is Integral."
5,_c37,data_type_check,"Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 0.0% of data is Integral."
6,_c38,data_type_check,"Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 0.0% of data is Integral."
7,_c5,data_type_check,"Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 0.0% of data is Integral."
8,_c27,data_type_check,"Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 0.0% of data is Integral."
9,_c23,data_type_check,"Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 0.0% of data is Integral."


You can also copy the constraint violations report to the Studio EFS and print the content of the file:

In [175]:
!aws s3 cp {manual_monitoring_job_output_s3_url}/constraint_violations.json ./tmp/

download: s3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/baseline/reports/6/constraint_violations.json to tmp/constraint_violations.json


In [176]:
!head ./tmp/constraint_violations.json

{
  "violations" : [ ]
}

Now load the newly calculated statistics and constraints based on the captured dataset:

In [123]:
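# Load the files produced by the monitoring job, not the baseline files
statistics = load_json_from_file(f"{manual_monitoring_job_output_s3_url}/statistics.json")
constraints = load_json_from_file(f"{manual_monitoring_job_output_s3_url}/constraints.json")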

print(f"Records processed: {statistics['dataset']['item_count']}")

Load JSON from: sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/baseline/reports/6/statistics.json
Load JSON from: sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/baseline/reports/6/constraints.json
Records processed: 100


In [125]:
pd.json_normalize(statistics["features"]).head()

Unnamed: 0,name,inferred_type,numerical_statistics.common.num_present,numerical_statistics.common.num_missing,numerical_statistics.mean,numerical_statistics.sum,numerical_statistics.std_dev,numerical_statistics.min,numerical_statistics.max,numerical_statistics.distribution.kll.buckets,numerical_statistics.distribution.kll.sketch.parameters.c,numerical_statistics.distribution.kll.sketch.parameters.k,numerical_statistics.distribution.kll.sketch.data
0,_c0,Integral,100,0,39.8,3980.0,11.027239,24.0,74.0,"[{'lower_bound': 24.0, 'upper_bound': 29.0, 'count': 13.0}, {'lower_bound': 29.0, 'upper_bound': 34.0, 'count': 21.0}, {'lower_bound': 34.0, 'upper_bound': 39.0, 'count': 20.0}, {'lower_bound': 39.0, 'upper_bound': 44.0, 'count': 16.0}, {'lower_bound': 44.0, 'upper_bound': 49.0, 'count': 10.0}, {'lower_bound': 49.0, 'upper_bound': 54.0, 'count': 5.0}, {'lower_bound': 54.0, 'upper_bound': 59.0, 'count': 10.0}, {'lower_bound': 59.0, 'upper_bound': 64.0, 'count': 2.0}, {'lower_bound': 64.0, 'upper_bound': 69.0, 'count': 0.0}, {'lower_bound': 69.0, 'upper_bound': 74.0, 'count': 3.0}]",0.64,2048.0,"[[37.0, 40.0, 27.0, 35.0, 24.0, 46.0, 47.0, 25.0, 34.0, 74.0, 37.0, 30.0, 60.0, 30.0, 41.0, 59.0, 25.0, 31.0, 27.0, 31.0, 26.0, 58.0, 44.0, 31.0, 43.0, 42.0, 31.0, 39.0, 26.0, 38.0, 43.0, 39.0, 49.0, 55.0, 44.0, 43.0, 57.0, 37.0, 47.0, 32.0, 36.0, 48.0, 43.0, 29.0, 30.0, 55.0, 40.0, 31.0, 37.0, 31.0, 51.0, 34.0, 71.0, 49.0, 57.0, 42.0, 51.0, 38.0, 26.0, 29.0, 43.0, 32.0, 33.0, 33.0, 38.0, 41.0, 51.0, 40.0, 57.0, 30.0, 32.0, 32.0, 58.0, 38.0, 32.0, 58.0, 48.0, 24.0, 35.0, 47.0, 33.0, 34.0, 28.0, 25.0, 39.0, 34.0, 35.0, 69.0, 41.0, 28.0, 29.0, 34.0, 47.0, 35.0, 48.0, 35.0, 37.0, 26.0, 55.0, 54.0]]"
1,_c1,Integral,100,0,2.62,262.0,2.777697,1.0,22.0,"[{'lower_bound': 1.0, 'upper_bound': 3.1, 'count': 82.0}, {'lower_bound': 3.1, 'upper_bound': 5.2, 'count': 10.0}, {'lower_bound': 5.2, 'upper_bound': 7.3, 'count': 2.0}, {'lower_bound': 7.3, 'upper_bound': 9.4, 'count': 3.0}, {'lower_bound': 9.4, 'upper_bound': 11.5, 'count': 2.0}, {'lower_bound': 11.5, 'upper_bound': 13.6, 'count': 0.0}, {'lower_bound': 13.6, 'upper_bound': 15.7, 'count': 0.0}, {'lower_bound': 15.7, 'upper_bound': 17.8, 'count': 0.0}, {'lower_bound': 17.8, 'upper_bound': 19.9, 'count': 0.0}, {'lower_bound': 19.9, 'upper_bound': 22.0, 'count': 1.0}]",0.64,2048.0,"[[1.0, 3.0, 4.0, 2.0, 1.0, 5.0, 22.0, 4.0, 3.0, 3.0, 2.0, 2.0, 3.0, 10.0, 1.0, 1.0, 1.0, 1.0, 3.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 3.0, 1.0, 3.0, 1.0, 2.0, 3.0, 3.0, 1.0, 3.0, 1.0, 1.0, 10.0, 1.0, 1.0, 1.0, 3.0, 1.0, 5.0, 2.0, 6.0, 4.0, 1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 1.0, 4.0, 1.0, 2.0, 1.0, 3.0, 5.0, 2.0, 1.0, 1.0, 6.0, 8.0, 5.0, 3.0, 2.0, 1.0, 4.0, 1.0, 2.0, 2.0, 1.0, 1.0, 3.0, 4.0, 3.0, 3.0, 1.0, 8.0, 1.0, 2.0, 2.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 9.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0]]"
2,_c2,Integral,100,0,0.05,5.0,0.217945,0.0,1.0,"[{'lower_bound': 0.0, 'upper_bound': 0.1, 'count': 95.0}, {'lower_bound': 0.1, 'upper_bound': 0.2, 'count': 0.0}, {'lower_bound': 0.2, 'upper_bound': 0.3, 'count': 0.0}, {'lower_bound': 0.3, 'upper_bound': 0.4, 'count': 0.0}, {'lower_bound': 0.4, 'upper_bound': 0.5, 'count': 0.0}, {'lower_bound': 0.5, 'upper_bound': 0.6, 'count': 0.0}, {'lower_bound': 0.6, 'upper_bound': 0.7, 'count': 0.0}, {'lower_bound': 0.7, 'upper_bound': 0.8, 'count': 0.0}, {'lower_bound': 0.8, 'upper_bound': 0.9, 'count': 0.0}, {'lower_bound': 0.9, 'upper_bound': 1.0, 'count': 5.0}]",0.64,2048.0,"[[0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]"
3,_c3,Integral,100,0,0.06,6.0,0.237487,0.0,1.0,"[{'lower_bound': 0.0, 'upper_bound': 0.1, 'count': 94.0}, {'lower_bound': 0.1, 'upper_bound': 0.2, 'count': 0.0}, {'lower_bound': 0.2, 'upper_bound': 0.3, 'count': 0.0}, {'lower_bound': 0.3, 'upper_bound': 0.4, 'count': 0.0}, {'lower_bound': 0.4, 'upper_bound': 0.5, 'count': 0.0}, {'lower_bound': 0.5, 'upper_bound': 0.6, 'count': 0.0}, {'lower_bound': 0.6, 'upper_bound': 0.7, 'count': 0.0}, {'lower_bound': 0.7, 'upper_bound': 0.8, 'count': 0.0}, {'lower_bound': 0.8, 'upper_bound': 0.9, 'count': 0.0}, {'lower_bound': 0.9, 'upper_bound': 1.0, 'count': 6.0}]",0.64,2048.0,"[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]"
4,_c4,Integral,100,0,0.02,2.0,0.14,0.0,1.0,"[{'lower_bound': 0.0, 'upper_bound': 0.1, 'count': 98.0}, {'lower_bound': 0.1, 'upper_bound': 0.2, 'count': 0.0}, {'lower_bound': 0.2, 'upper_bound': 0.3, 'count': 0.0}, {'lower_bound': 0.3, 'upper_bound': 0.4, 'count': 0.0}, {'lower_bound': 0.4, 'upper_bound': 0.5, 'count': 0.0}, {'lower_bound': 0.5, 'upper_bound': 0.6, 'count': 0.0}, {'lower_bound': 0.6, 'upper_bound': 0.7, 'count': 0.0}, {'lower_bound': 0.7, 'upper_bound': 0.8, 'count': 0.0}, {'lower_bound': 0.8, 'upper_bound': 0.9, 'count': 0.0}, {'lower_bound': 0.9, 'upper_bound': 1.0, 'count': 2.0}]",0.64,2048.0,"[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]"


In [126]:
pd.json_normalize(constraints["features"]).head()

Unnamed: 0,name,inferred_type,completeness,num_constraints.is_non_negative
0,_c0,Integral,1.0,True
1,_c1,Integral,1.0,True
2,_c2,Integral,1.0,True
3,_c3,Integral,1.0,True
4,_c4,Integral,1.0,True


### What is monitored
Refer to [Schema for Violations](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-interpreting-violations.html) in the Developer Guide to see which constraints Model Monitor checks. You can configure tolerance thresholds that fit your specific data quality requirements. To configure the thresholds, change the `monitoring_config` section of the baseline `constraints.json` file:

In [127]:
with open("tmp/constraints.json", "r") as c:
    data = c.read()
    
print(json.dumps(json.loads(data)["monitoring_config"], indent=2))

{
  "evaluate_constraints": "Enabled",
  "emit_metrics": "Enabled",
  "datatype_check_threshold": 1.0,
  "domain_content_threshold": 1.0,
  "distribution_constraints": {
    "perform_comparison": "Enabled",
    "comparison_threshold": 0.1,
    "comparison_method": "Robust"
  }
}


To modify the monitoring configuration, change this section and upload the file to Amazon S3.
You can use the `Robust` or `Simple` method to detect data distribution drift; refer to [Schema for Constraints](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-byoc-constraints.html) in the Developer Guide. The `Robust` method is recommended for small datasets and is based on the [two-sample Kolmogorov-Smirnov test](https://en.m.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test).
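
A minimal sketch of such a change is shown below. It works on a dictionary with the same `monitoring_config` values as printed above; the `update_monitoring_config` helper and the chosen thresholds are illustrative assumptions, not recommended settings.

```python
import copy
import json


def update_monitoring_config(constraints, comparison_threshold=None,
                             comparison_method=None, datatype_check_threshold=None):
    """Return a copy of the constraints document with adjusted thresholds."""
    updated = copy.deepcopy(constraints)
    config = updated["monitoring_config"]
    if comparison_method is not None:
        if comparison_method not in ("Robust", "Simple"):
            raise ValueError("comparison_method must be 'Robust' or 'Simple'")
        config["distribution_constraints"]["comparison_method"] = comparison_method
    if comparison_threshold is not None:
        config["distribution_constraints"]["comparison_threshold"] = comparison_threshold
    if datatype_check_threshold is not None:
        config["datatype_check_threshold"] = datatype_check_threshold
    return updated


# Same monitoring_config values as printed above; the feature list is omitted
baseline_constraints = {
    "features": [],
    "monitoring_config": {
        "evaluate_constraints": "Enabled",
        "emit_metrics": "Enabled",
        "datatype_check_threshold": 1.0,
        "domain_content_threshold": 1.0,
        "distribution_constraints": {
            "perform_comparison": "Enabled",
            "comparison_threshold": 0.1,
            "comparison_method": "Robust",
        },
    },
}

relaxed = update_monitoring_config(
    baseline_constraints, comparison_threshold=0.2, datatype_check_threshold=0.95
)
print(json.dumps(relaxed["monitoring_config"], indent=2))
```

Writing the result back to `constraints.json` and uploading it, for example with `aws s3 cp`, would make the new thresholds available to monitoring jobs that read that file.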

### Generate non-compliant traffic
Now generate traffic that will trigger violations in the Model Monitor data quality check.

In [245]:
non_compliant_pd = test_x.copy()
non_compliant_pd.iloc[:,0] = -99.99

In [231]:
non_compliant_pd.drop(columns=['_c0'], inplace=True)

In [246]:
non_compliant_pd.head()

Unnamed: 0,_c0,_c1,_c2,_c3,_c4,_c5,_c6,_c7,_c8,_c9,...,_c49,_c50,_c51,_c52,_c53,_c54,_c55,_c56,_c57,_c58
3376,-99.99,1,999,0,1,0,0,0,0,0,...,0,0,0,0,0,0,1,0,1,0
3218,-99.99,1,999,0,1,0,0,1,0,0,...,0,1,0,0,0,1,0,0,1,0
1336,-99.99,1,999,0,1,0,0,0,0,0,...,0,0,0,0,0,1,0,0,1,0
1709,-99.99,3,999,0,1,1,0,0,0,0,...,0,0,0,1,0,0,0,0,1,0
447,-99.99,3,999,0,1,0,0,0,0,0,...,0,0,0,0,0,0,1,0,1,0
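
Overwriting the first column with `-99.99` conflicts with two of the suggested constraints for `_c0`: the value is fractional rather than Integral, and it is negative. The local check below illustrates this with a handful of values. It is a simplified stand-in written for this explanation, not the logic that runs inside the Model Monitor container, and `check_column` is a hypothetical helper.

```python
def check_column(values, constraint):
    """Return a list of human-readable problems for one column.

    A simplified local check, not the logic of the Model Monitor container.
    """
    problems = []
    if constraint["inferred_type"] == "Integral":
        non_integral = [v for v in values if float(v) != int(float(v))]
        if non_integral:
            problems.append(f"{len(non_integral)} of {len(values)} values are not Integral")
    if constraint.get("num_constraints", {}).get("is_non_negative"):
        negative = [v for v in values if float(v) < 0]
        if negative:
            problems.append(f"{len(negative)} of {len(values)} values are negative")
    return problems


# Constraint for _c0 as suggested by the baselining job
c0_constraint = {
    "name": "_c0",
    "inferred_type": "Integral",
    "completeness": 1.0,
    "num_constraints": {"is_non_negative": True},
}

compliant = [37, 40, 27, 35, 24]
non_compliant = [-99.99] * 5

print(check_column(compliant, c0_constraint))
print(check_column(non_compliant, c0_constraint))
```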


In [249]:
# Get the S3 URL of the latest data capture files (removal is optional, see the next cell)
latest_data_capture_s3_url = get_latest_data_capture_s3_url(data_capture_bucket, data_capture_prefix)

Found 3 files in s3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/data-capture
Latest data capture S3 url: s3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/data-capture/from-idea-to-prod-endpoint-13-15-39-04/AllTraffic/2023/02/13/21


In [248]:
# uncomment the next line to remove the previous data capture files
# !aws s3 rm {latest_data_capture_s3_url} --recursive

delete: s3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/data-capture/from-idea-to-prod-endpoint-13-15-39-04/AllTraffic/2023/02/14/09/45-50-619-7e16df8a-1ccf-4ad4-890a-d244a3e6dc72.jsonl


In [250]:
generate_endpoint_traffic(predictor, non_compliant_pd)

100%|██████████| 100/100 [00:01<00:00, 77.59it/s]


### See the captured data
List captured data files under `data_capture_s3_url`. Wait a couple of minutes for the captured data to appear in the Amazon S3 bucket.

In [252]:
!aws s3 ls {data_capture_s3_url} --recursive

2023-02-13 15:53:40      48702 from-idea-to-prod/xgboost/data-capture/from-idea-to-prod-endpoint-13-15-39-04/AllTraffic/2023/02/13/15/52-32-739-b3380519-6e72-4079-ac47-13312a2eb253.jsonl
2023-02-13 16:21:30      48890 from-idea-to-prod/xgboost/data-capture/from-idea-to-prod-endpoint-13-15-39-04/AllTraffic/2023/02/13/16/20-24-369-8f1b57e7-8215-4711-9889-c908d356f5fc.jsonl
2023-02-13 21:04:40      48702 from-idea-to-prod/xgboost/data-capture/from-idea-to-prod-endpoint-13-15-39-04/AllTraffic/2023/02/13/21/03-35-139-5a322d84-4197-4e5e-86e2-8ec9b36f5fb7.jsonl
2023-02-14 10:01:10      60690 from-idea-to-prod/xgboost/data-capture/from-idea-to-prod-endpoint-13-15-39-04/AllTraffic/2023/02/14/10/00-05-920-b3f34273-1268-422b-ab84-2a9a2161b20b.jsonl


### Launch a manual monitoring job
Let's run a manual monitoring job again to analyze the captured data:

In [253]:
latest_data_capture_s3_url = get_latest_data_capture_s3_url(data_capture_bucket, data_capture_prefix)

Found 4 files in s3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/data-capture
Latest data capture S3 url: s3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/data-capture/from-idea-to-prod-endpoint-13-15-39-04/AllTraffic/2023/02/14/10


In [None]:
run_model_monitor_job(
    region=region,
    instance_type="ml.m5.xlarge",
    role=sm_role,
    data_capture_path=latest_data_capture_s3_url,
    statistics_path=data_statistics_s3_url,
    constraints_path=data_constraints_s3_url,
    reports_path=data_mon_reports_s3_url,
    instance_count=1,
    preprocessor_path=f"{record_preprocessor_s3_url}/record_preprocessor.py",
    postprocessor_path=None,
    publish_cloudwatch_metrics="Disabled",
    logs=False,
)

INFO:sagemaker:Creating processing-job with name sagemaker-model-monitor-analyzer-2023-02-14-10-21-03-307


.................................................

### See the monitoring job output
Let's check what reports the monitoring job generated. Since you sent non-compliant data to the endpoint, you should see a violation report.

In [256]:
manual_monitoring_job_output_s3_url = get_latest_monitoring_report_s3_url("sagemaker-model-monitor-analyzer")

Latest monitoring report S3 url: s3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/baseline/reports/0


In [257]:
!aws s3 ls {manual_monitoring_job_output_s3_url}/

2023-02-14 10:26:47      14291 constraint_violations.json
2023-02-14 10:26:46       9391 constraints.json
2023-02-14 10:26:46     125649 statistics.json


Load the monitoring report and see the violations:

In [258]:
violations = load_json_from_file(f"{manual_monitoring_job_output_s3_url}/constraint_violations.json")
violations

Load JSON from: sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/baseline/reports/0/constraint_violations.json


{'violations': [{'feature_name': '_c41',
   'constraint_check_type': 'data_type_check',
   'description': 'Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 0.0% of data is Integral.'},
  {'feature_name': '_c34',
   'constraint_check_type': 'data_type_check',
   'description': 'Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 0.0% of data is Integral.'},
  {'feature_name': '_c17',
   'constraint_check_type': 'data_type_check',
   'description': 'Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 0.0% of data is Integral.'},
  {'feature_name': '_c51',
   'constraint_check_type': 'data_type_check',
   'description': 'Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 0.0% of data is Integral.'},
  {'feature_name': '_c13',
   'constraint_check_type': 'data_t

In [259]:
pd.json_normalize(violations["violations"])

Unnamed: 0,feature_name,constraint_check_type,description
0,_c41,data_type_check,"Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 0.0% of data is Integral."
1,_c34,data_type_check,"Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 0.0% of data is Integral."
2,_c17,data_type_check,"Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 0.0% of data is Integral."
3,_c51,data_type_check,"Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 0.0% of data is Integral."
4,_c13,data_type_check,"Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 0.0% of data is Integral."
5,_c37,data_type_check,"Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 0.0% of data is Integral."
6,_c38,data_type_check,"Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 0.0% of data is Integral."
7,_c5,data_type_check,"Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 0.0% of data is Integral."
8,_c27,data_type_check,"Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 0.0% of data is Integral."
9,_c23,data_type_check,"Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 0.0% of data is Integral."
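
With many violations, a per-check summary is often easier to read than the full table. The sketch below groups a violations report by `constraint_check_type` using only the standard library. The `summarize_violations` helper is hypothetical, and the sample report is a shortened stand-in for the content of `constraint_violations.json`.

```python
from collections import Counter, defaultdict


def summarize_violations(report):
    """Count violations per check type and list the affected features."""
    counts = Counter()
    features = defaultdict(list)
    for violation in report.get("violations", []):
        check = violation["constraint_check_type"]
        counts[check] += 1
        features[check].append(violation["feature_name"])
    return counts, dict(features)


# Shortened stand-in for the content of constraint_violations.json
sample_report = {
    "violations": [
        {"feature_name": "_c41", "constraint_check_type": "data_type_check",
         "description": "Data type match requirement is not met."},
        {"feature_name": "_c34", "constraint_check_type": "data_type_check",
         "description": "Data type match requirement is not met."},
    ]
}

counts, features = summarize_violations(sample_report)
for check, count in counts.most_common():
    print(f"{check}: {count} feature(s): {', '.join(sorted(features[check]))}")
```

In the notebook, you could pass the `violations` dictionary loaded above instead of `sample_report`.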


In [260]:
statistics = load_json_from_file(f"{manual_monitoring_job_output_s3_url}/statistics.json")
constraints = load_json_from_file(f"{manual_monitoring_job_output_s3_url}/constraints.json")

print(f"Records processed: {statistics['dataset']['item_count']}")

Load JSON from: sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/baseline/reports/0/statistics.json
Load JSON from: sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/baseline/reports/0/constraints.json
Records processed: 100


In [261]:
pd.json_normalize(statistics["features"]).head()

Unnamed: 0,name,inferred_type,numerical_statistics.common.num_present,numerical_statistics.common.num_missing,numerical_statistics.mean,numerical_statistics.sum,numerical_statistics.std_dev,numerical_statistics.min,numerical_statistics.max,numerical_statistics.distribution.kll.buckets,numerical_statistics.distribution.kll.sketch.parameters.c,numerical_statistics.distribution.kll.sketch.parameters.k,numerical_statistics.distribution.kll.sketch.data
0,_c0,Fractional,100,0,-99.99,-9999.0,0.0,-99.99,-99.99,"[{'lower_bound': -99.99, 'upper_bound': -99.99, 'count': 0.0}, {'lower_bound': -99.99, 'upper_bound': -99.99, 'count': 0.0}, {'lower_bound': -99.99, 'upper_bound': -99.99, 'count': 0.0}, {'lower_bound': -99.99, 'upper_bound': -99.99, 'count': 0.0}, {'lower_bound': -99.99, 'upper_bound': -99.99, 'count': 0.0}, {'lower_bound': -99.99, 'upper_bound': -99.99, 'count': 0.0}, {'lower_bound': -99.99, 'upper_bound': -99.99, 'count': 0.0}, {'lower_bound': -99.99, 'upper_bound': -99.99, 'count': 0.0}, {'lower_bound': -99.99, 'upper_bound': -99.99, 'count': 0.0}, {'lower_bound': -99.99, 'upper_bound': -99.99, 'count': 100.0}]",0.64,2048.0,"[[-99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99]]"
1,_c1,Fractional,100,0,2.62,262.0,2.777697,1.0,22.0,"[{'lower_bound': 1.0, 'upper_bound': 3.1, 'count': 82.0}, {'lower_bound': 3.1, 'upper_bound': 5.2, 'count': 10.0}, {'lower_bound': 5.2, 'upper_bound': 7.3, 'count': 2.0}, {'lower_bound': 7.3, 'upper_bound': 9.4, 'count': 3.0}, {'lower_bound': 9.4, 'upper_bound': 11.5, 'count': 2.0}, {'lower_bound': 11.5, 'upper_bound': 13.6, 'count': 0.0}, {'lower_bound': 13.6, 'upper_bound': 15.7, 'count': 0.0}, {'lower_bound': 15.7, 'upper_bound': 17.8, 'count': 0.0}, {'lower_bound': 17.8, 'upper_bound': 19.9, 'count': 0.0}, {'lower_bound': 19.9, 'upper_bound': 22.0, 'count': 1.0}]",0.64,2048.0,"[[1.0, 3.0, 4.0, 2.0, 1.0, 5.0, 22.0, 4.0, 3.0, 3.0, 2.0, 2.0, 3.0, 10.0, 1.0, 1.0, 1.0, 1.0, 3.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 3.0, 1.0, 3.0, 1.0, 2.0, 3.0, 3.0, 1.0, 3.0, 1.0, 1.0, 10.0, 1.0, 1.0, 1.0, 3.0, 1.0, 5.0, 2.0, 6.0, 4.0, 1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 1.0, 4.0, 1.0, 2.0, 1.0, 3.0, 5.0, 2.0, 1.0, 1.0, 6.0, 8.0, 5.0, 3.0, 2.0, 1.0, 4.0, 1.0, 2.0, 2.0, 1.0, 1.0, 3.0, 4.0, 3.0, 3.0, 1.0, 8.0, 1.0, 2.0, 2.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 9.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0]]"
2,_c2,Fractional,100,0,0.05,5.0,0.217945,0.0,1.0,"[{'lower_bound': 0.0, 'upper_bound': 0.1, 'count': 95.0}, {'lower_bound': 0.1, 'upper_bound': 0.2, 'count': 0.0}, {'lower_bound': 0.2, 'upper_bound': 0.3, 'count': 0.0}, {'lower_bound': 0.3, 'upper_bound': 0.4, 'count': 0.0}, {'lower_bound': 0.4, 'upper_bound': 0.5, 'count': 0.0}, {'lower_bound': 0.5, 'upper_bound': 0.6, 'count': 0.0}, {'lower_bound': 0.6, 'upper_bound': 0.7, 'count': 0.0}, {'lower_bound': 0.7, 'upper_bound': 0.8, 'count': 0.0}, {'lower_bound': 0.8, 'upper_bound': 0.9, 'count': 0.0}, {'lower_bound': 0.9, 'upper_bound': 1.0, 'count': 5.0}]",0.64,2048.0,"[[0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]"
3,_c3,Fractional,100,0,0.06,6.0,0.237487,0.0,1.0,"[{'lower_bound': 0.0, 'upper_bound': 0.1, 'count': 94.0}, {'lower_bound': 0.1, 'upper_bound': 0.2, 'count': 0.0}, {'lower_bound': 0.2, 'upper_bound': 0.3, 'count': 0.0}, {'lower_bound': 0.3, 'upper_bound': 0.4, 'count': 0.0}, {'lower_bound': 0.4, 'upper_bound': 0.5, 'count': 0.0}, {'lower_bound': 0.5, 'upper_bound': 0.6, 'count': 0.0}, {'lower_bound': 0.6, 'upper_bound': 0.7, 'count': 0.0}, {'lower_bound': 0.7, 'upper_bound': 0.8, 'count': 0.0}, {'lower_bound': 0.8, 'upper_bound': 0.9, 'count': 0.0}, {'lower_bound': 0.9, 'upper_bound': 1.0, 'count': 6.0}]",0.64,2048.0,"[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]"
4,_c4,Fractional,100,0,0.02,2.0,0.14,0.0,1.0,"[{'lower_bound': 0.0, 'upper_bound': 0.1, 'count': 98.0}, {'lower_bound': 0.1, 'upper_bound': 0.2, 'count': 0.0}, {'lower_bound': 0.2, 'upper_bound': 0.3, 'count': 0.0}, {'lower_bound': 0.3, 'upper_bound': 0.4, 'count': 0.0}, {'lower_bound': 0.4, 'upper_bound': 0.5, 'count': 0.0}, {'lower_bound': 0.5, 'upper_bound': 0.6, 'count': 0.0}, {'lower_bound': 0.6, 'upper_bound': 0.7, 'count': 0.0}, {'lower_bound': 0.7, 'upper_bound': 0.8, 'count': 0.0}, {'lower_bound': 0.8, 'upper_bound': 0.9, 'count': 0.0}, {'lower_bound': 0.9, 'upper_bound': 1.0, 'count': 2.0}]",0.64,2048.0,"[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]"


In [262]:
pd.json_normalize(constraints["features"]).head()

Unnamed: 0,name,inferred_type,completeness,num_constraints.is_non_negative
0,_c0,Fractional,1.0,False
1,_c1,Fractional,1.0,True
2,_c2,Fractional,1.0,True
3,_c3,Fractional,1.0,True
4,_c4,Fractional,1.0,True


### List schedule executions and monitoring reports
You created an hourly schedule above that begins executions on the hour plus a 0-20 minute buffer, so you have to wait until the top of the next hour for the first execution. You can also change the schedule.
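To estimate how long you have to wait, here is a minimal sketch that computes the window in which the next hourly execution can start. The helper `next_hourly_window` is hypothetical (it is not part of the SageMaker SDK), and the 20 minute buffer is simply the upper bound mentioned above.

```python
from datetime import datetime, timedelta, timezone


def next_hourly_window(now, buffer_minutes=20):
    """Return the (earliest, latest) start time of the next hourly execution.

    Hypothetical helper for illustration only: it assumes executions start
    at the top of the next hour plus a buffer of up to `buffer_minutes`.
    """
    top_of_hour = now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    return top_of_hour, top_of_hour + timedelta(minutes=buffer_minutes)


# Example timestamp chosen for illustration
now = datetime(2023, 2, 13, 16, 6, 24, tzinfo=timezone.utc)
earliest, latest = next_hourly_window(now)
print(f"Next execution starts between {earliest:%H:%M} and {latest:%H:%M} UTC")
```

In your own notebook you would pass `datetime.now(timezone.utc)` instead of the fixed example timestamp.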

This section demonstrates how to work with scheduled monitoring job executions. The Python SDK class [`DefaultModelMonitor`](https://sagemaker.readthedocs.io/en/stable/api/inference/model_monitor.html#sagemaker.model_monitor.model_monitoring.DefaultModelMonitor) implements helper methods to load and inspect the executions and monitoring reports.

List the executions and view the details of a monitoring job:

In [189]:
latest_execution = get_latest_monitor_execution(data_monitor)

!Latest execution status: Completed
Latest execution result: Completed: Job completed successfully with no violations.
Report Uri: s3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/baseline/reports/from-idea-to-prod-endpoint-13-15-39-04/from-idea-to-prod-data-monitor-schedule-2023-02-13-16-06-24/2023/02/13/17


See details about the latest scheduled monitoring execution:

In [152]:
latest_execution.describe()

{'ProcessingInputs': [{'InputName': 'baseline',
   'AppManaged': False,
   'S3Input': {'S3Uri': 's3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/baseline/results/statistics.json',
    'LocalPath': '/opt/ml/processing/baseline/stats',
    'S3DataType': 'S3Prefix',
    'S3InputMode': 'File',
    'S3DataDistributionType': 'FullyReplicated'}},
  {'InputName': 'constraints',
   'AppManaged': False,
   'S3Input': {'S3Uri': 's3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/baseline/results/constraints.json',
    'LocalPath': '/opt/ml/processing/baseline/constraints',
    'S3DataType': 'S3Prefix',
    'S3InputMode': 'File',
    'S3DataDistributionType': 'FullyReplicated'}},
  {'InputName': 'pre_processor_script',
   'AppManaged': False,
   'S3Input': {'S3Uri': 's3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/code/record_preprocessor.py',
    'LocalPath': '/opt/ml/processing/code/preprocessing',
    'S3DataType': 'S3Prefix',
    'S3InputMode': 'Fil

Get the latest execution statistics and constraint violations as objects:

In [153]:
last_execution_statistics = latest_execution.statistics()
last_execution_violations = latest_execution.constraint_violations()

Load the reports into pandas DataFrames:

In [155]:
pd.json_normalize(last_execution_statistics.body_dict["features"]).head()

Unnamed: 0,name,inferred_type,numerical_statistics.common.num_present,numerical_statistics.common.num_missing,numerical_statistics.mean,numerical_statistics.sum,numerical_statistics.std_dev,numerical_statistics.min,numerical_statistics.max,numerical_statistics.distribution.kll.buckets,numerical_statistics.distribution.kll.sketch.parameters.c,numerical_statistics.distribution.kll.sketch.parameters.k,numerical_statistics.distribution.kll.sketch.data
0,_c0,Integral,100,0,-100.0,-10000.0,0.0,-100.0,-100.0,"[{'lower_bound': -100.0, 'upper_bound': -100.0, 'count': 0.0}, {'lower_bound': -100.0, 'upper_bound': -100.0, 'count': 0.0}, {'lower_bound': -100.0, 'upper_bound': -100.0, 'count': 0.0}, {'lower_bound': -100.0, 'upper_bound': -100.0, 'count': 0.0}, {'lower_bound': -100.0, 'upper_bound': -100.0, 'count': 0.0}, {'lower_bound': -100.0, 'upper_bound': -100.0, 'count': 0.0}, {'lower_bound': -100.0, 'upper_bound': -100.0, 'count': 0.0}, {'lower_bound': -100.0, 'upper_bound': -100.0, 'count': 0.0}, {'lower_bound': -100.0, 'upper_bound': -100.0, 'count': 0.0}, {'lower_bound': -100.0, 'upper_bound': -100.0, 'count': 100.0}]",0.64,2048.0,"[[-100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0]]"
1,_c1,Integral,100,0,2.62,262.0,2.777697,1.0,22.0,"[{'lower_bound': 1.0, 'upper_bound': 3.1, 'count': 82.0}, {'lower_bound': 3.1, 'upper_bound': 5.2, 'count': 10.0}, {'lower_bound': 5.2, 'upper_bound': 7.3, 'count': 2.0}, {'lower_bound': 7.3, 'upper_bound': 9.4, 'count': 3.0}, {'lower_bound': 9.4, 'upper_bound': 11.5, 'count': 2.0}, {'lower_bound': 11.5, 'upper_bound': 13.6, 'count': 0.0}, {'lower_bound': 13.6, 'upper_bound': 15.7, 'count': 0.0}, {'lower_bound': 15.7, 'upper_bound': 17.8, 'count': 0.0}, {'lower_bound': 17.8, 'upper_bound': 19.9, 'count': 0.0}, {'lower_bound': 19.9, 'upper_bound': 22.0, 'count': 1.0}]",0.64,2048.0,"[[1.0, 3.0, 4.0, 2.0, 1.0, 5.0, 22.0, 4.0, 3.0, 3.0, 2.0, 2.0, 3.0, 10.0, 1.0, 1.0, 1.0, 1.0, 3.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 3.0, 1.0, 3.0, 1.0, 2.0, 3.0, 3.0, 1.0, 3.0, 1.0, 1.0, 10.0, 1.0, 1.0, 1.0, 3.0, 1.0, 5.0, 2.0, 6.0, 4.0, 1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 1.0, 4.0, 1.0, 2.0, 1.0, 3.0, 5.0, 2.0, 1.0, 1.0, 6.0, 8.0, 5.0, 3.0, 2.0, 1.0, 4.0, 1.0, 2.0, 2.0, 1.0, 1.0, 3.0, 4.0, 3.0, 3.0, 1.0, 8.0, 1.0, 2.0, 2.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 9.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0]]"
2,_c2,Integral,100,0,0.05,5.0,0.217945,0.0,1.0,"[{'lower_bound': 0.0, 'upper_bound': 0.1, 'count': 95.0}, {'lower_bound': 0.1, 'upper_bound': 0.2, 'count': 0.0}, {'lower_bound': 0.2, 'upper_bound': 0.3, 'count': 0.0}, {'lower_bound': 0.3, 'upper_bound': 0.4, 'count': 0.0}, {'lower_bound': 0.4, 'upper_bound': 0.5, 'count': 0.0}, {'lower_bound': 0.5, 'upper_bound': 0.6, 'count': 0.0}, {'lower_bound': 0.6, 'upper_bound': 0.7, 'count': 0.0}, {'lower_bound': 0.7, 'upper_bound': 0.8, 'count': 0.0}, {'lower_bound': 0.8, 'upper_bound': 0.9, 'count': 0.0}, {'lower_bound': 0.9, 'upper_bound': 1.0, 'count': 5.0}]",0.64,2048.0,"[[0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]"
3,_c3,Integral,100,0,0.06,6.0,0.237487,0.0,1.0,"[{'lower_bound': 0.0, 'upper_bound': 0.1, 'count': 94.0}, {'lower_bound': 0.1, 'upper_bound': 0.2, 'count': 0.0}, {'lower_bound': 0.2, 'upper_bound': 0.3, 'count': 0.0}, {'lower_bound': 0.3, 'upper_bound': 0.4, 'count': 0.0}, {'lower_bound': 0.4, 'upper_bound': 0.5, 'count': 0.0}, {'lower_bound': 0.5, 'upper_bound': 0.6, 'count': 0.0}, {'lower_bound': 0.6, 'upper_bound': 0.7, 'count': 0.0}, {'lower_bound': 0.7, 'upper_bound': 0.8, 'count': 0.0}, {'lower_bound': 0.8, 'upper_bound': 0.9, 'count': 0.0}, {'lower_bound': 0.9, 'upper_bound': 1.0, 'count': 6.0}]",0.64,2048.0,"[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]"
4,_c4,Integral,100,0,0.02,2.0,0.14,0.0,1.0,"[{'lower_bound': 0.0, 'upper_bound': 0.1, 'count': 98.0}, {'lower_bound': 0.1, 'upper_bound': 0.2, 'count': 0.0}, {'lower_bound': 0.2, 'upper_bound': 0.3, 'count': 0.0}, {'lower_bound': 0.3, 'upper_bound': 0.4, 'count': 0.0}, {'lower_bound': 0.4, 'upper_bound': 0.5, 'count': 0.0}, {'lower_bound': 0.5, 'upper_bound': 0.6, 'count': 0.0}, {'lower_bound': 0.6, 'upper_bound': 0.7, 'count': 0.0}, {'lower_bound': 0.7, 'upper_bound': 0.8, 'count': 0.0}, {'lower_bound': 0.8, 'upper_bound': 0.9, 'count': 0.0}, {'lower_bound': 0.9, 'upper_bound': 1.0, 'count': 2.0}]",0.64,2048.0,"[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]"


In [156]:
pd.json_normalize(last_execution_violations.body_dict["violations"]).head()

See the baseline and the latest data profiling statistics:

In [157]:
pd.json_normalize(data_monitor.baseline_statistics().body_dict["features"]).head()

Unnamed: 0,name,inferred_type,numerical_statistics.common.num_present,numerical_statistics.common.num_missing,numerical_statistics.mean,numerical_statistics.sum,numerical_statistics.std_dev,numerical_statistics.min,numerical_statistics.max,numerical_statistics.distribution.kll.buckets,numerical_statistics.distribution.kll.sketch.parameters.c,numerical_statistics.distribution.kll.sketch.parameters.k,numerical_statistics.distribution.kll.sketch.data
0,_c0,Integral,41188,0,40.02406,1648511.0,10.421123,17.0,98.0,"[{'lower_bound': 17.0, 'upper_bound': 25.1, 'count': 1676.0}, {'lower_bound': 25.1, 'upper_bound': 33.2, 'count': 11336.0}, {'lower_bound': 33.2, 'upper_bound': 41.3, 'count': 12040.0}, {'lower_bound': 41.3, 'upper_bound': 49.4, 'count': 8072.0}, {'lower_bound': 49.4, 'upper_bound': 57.5, 'count': 5856.0}, {'lower_bound': 57.5, 'upper_bound': 65.6, 'count': 1600.0}, {'lower_bound': 65.6, 'upper_bound': 73.7, 'count': 308.0}, {'lower_bound': 73.7, 'upper_bound': 81.8, 'count': 192.0}, {'lower_bound': 81.8, 'upper_bound': 89.9, 'count': 96.0}, {'lower_bound': 89.9, 'upper_bound': 98.0, 'count': 12.0}]",0.64,2048.0,"[[], [], [17.0, 18.0, 18.0, 18.0, 18.0, 19.0, 19.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 22.0, 22.0, 22.0, 22.0, 22.0, 22.0, 22.0, 22.0, 22.0, 22.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, ...], [88.0], [88.0], [19.0, 21.0, 22.0, 22.0, 23.0, 23.0, 23.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 26.0, 26.0, 26.0, 26.0, 26.0, 26.0, 26.0, 26.0, 26.0, 26.0, 26.0, 26.0, 26.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, ...]]"
1,_c1,Integral,41188,0,2.567593,105754.0,2.76998,1.0,56.0,"[{'lower_bound': 1.0, 'upper_bound': 6.5, 'count': 38792.0}, {'lower_bound': 6.5, 'upper_bound': 12.0, 'count': 1712.0}, {'lower_bound': 12.0, 'upper_bound': 17.5, 'count': 444.0}, {'lower_bound': 17.5, 'upper_bound': 23.0, 'count': 128.0}, {'lower_bound': 23.0, 'upper_bound': 28.5, 'count': 64.0}, {'lower_bound': 28.5, 'upper_bound': 34.0, 'count': 48.0}, {'lower_bound': 34.0, 'upper_bound': 39.5, 'count': 0.0}, {'lower_bound': 39.5, 'upper_bound': 45.0, 'count': 0.0}, {'lower_bound': 45.0, 'upper_bound': 50.5, 'count': 0.0}, {'lower_bound': 50.5, 'upper_bound': 56.0, 'count': 0.0}]",0.64,2048.0,"[[], [], [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, ...], [13.0], [33.0], [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, ...]]"
2,_c2,Integral,41188,0,962.475454,39642439.0,186.908638,0.0,999.0,"[{'lower_bound': 0.0, 'upper_bound': 99.9, 'count': 1516.0}, {'lower_bound': 99.9, 'upper_bound': 199.8, 'count': 0.0}, {'lower_bound': 199.8, 'upper_bound': 299.7, 'count': 0.0}, {'lower_bound': 299.7, 'upper_bound': 399.6, 'count': 0.0}, {'lower_bound': 399.6, 'upper_bound': 499.5, 'count': 0.0}, {'lower_bound': 499.5, 'upper_bound': 599.4, 'count': 0.0}, {'lower_bound': 599.4, 'upper_bound': 699.3, 'count': 0.0}, {'lower_bound': 699.3, 'upper_bound': 799.2, 'count': 0.0}, {'lower_bound': 799.2, 'upper_bound': 899.1, 'count': 0.0}, {'lower_bound': 899.1, 'upper_bound': 999.0, 'count': 39672.0}]",0.64,2048.0,"[[], [], [1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 5.0, 5.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, ...], [999.0], [999.0], [3.0, 6.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, 999.0, ...]]"
3,_c3,Integral,41188,0,0.172963,7124.0,0.494895,0.0,7.0,"[{'lower_bound': 0.0, 'upper_bound': 0.7, 'count': 35572.0}, {'lower_bound': 0.7, 'upper_bound': 1.4, 'count': 4548.0}, {'lower_bound': 1.4, 'upper_bound': 2.1, 'count': 760.0}, {'lower_bound': 2.1, 'upper_bound': 2.8, 'count': 0.0}, {'lower_bound': 2.8, 'upper_bound': 3.5, 'count': 216.0}, {'lower_bound': 3.5, 'upper_bound': 4.2, 'count': 68.0}, {'lower_bound': 4.2, 'upper_bound': 4.9, 'count': 0.0}, {'lower_bound': 4.9, 'upper_bound': 5.6, 'count': 16.0}, {'lower_bound': 5.6, 'upper_bound': 6.3, 'count': 4.0}, {'lower_bound': 6.3, 'upper_bound': 7.0, 'count': 4.0}]",0.64,2048.0,"[[], [], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...], [2.0], [2.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...]]"
4,_c4,Integral,41188,0,0.963217,39673.0,0.188228,0.0,1.0,"[{'lower_bound': 0.0, 'upper_bound': 0.1, 'count': 1516.0}, {'lower_bound': 0.1, 'upper_bound': 0.2, 'count': 0.0}, {'lower_bound': 0.2, 'upper_bound': 0.3, 'count': 0.0}, {'lower_bound': 0.3, 'upper_bound': 0.4, 'count': 0.0}, {'lower_bound': 0.4, 'upper_bound': 0.5, 'count': 0.0}, {'lower_bound': 0.5, 'upper_bound': 0.6, 'count': 0.0}, {'lower_bound': 0.6, 'upper_bound': 0.7, 'count': 0.0}, {'lower_bound': 0.7, 'upper_bound': 0.8, 'count': 0.0}, {'lower_bound': 0.8, 'upper_bound': 0.9, 'count': 0.0}, {'lower_bound': 0.9, 'upper_bound': 1.0, 'count': 39672.0}]",0.64,2048.0,"[[], [], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...], [1.0], [1.0], [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, ...]]"


In [158]:
pd.json_normalize(data_monitor.latest_monitoring_statistics().body_dict["features"]).head()

Unnamed: 0,name,inferred_type,numerical_statistics.common.num_present,numerical_statistics.common.num_missing,numerical_statistics.mean,numerical_statistics.sum,numerical_statistics.std_dev,numerical_statistics.min,numerical_statistics.max,numerical_statistics.distribution.kll.buckets,numerical_statistics.distribution.kll.sketch.parameters.c,numerical_statistics.distribution.kll.sketch.parameters.k,numerical_statistics.distribution.kll.sketch.data
0,_c0,Integral,100,0,-100.0,-10000.0,0.0,-100.0,-100.0,"[{'lower_bound': -100.0, 'upper_bound': -100.0, 'count': 0.0}, {'lower_bound': -100.0, 'upper_bound': -100.0, 'count': 0.0}, {'lower_bound': -100.0, 'upper_bound': -100.0, 'count': 0.0}, {'lower_bound': -100.0, 'upper_bound': -100.0, 'count': 0.0}, {'lower_bound': -100.0, 'upper_bound': -100.0, 'count': 0.0}, {'lower_bound': -100.0, 'upper_bound': -100.0, 'count': 0.0}, {'lower_bound': -100.0, 'upper_bound': -100.0, 'count': 0.0}, {'lower_bound': -100.0, 'upper_bound': -100.0, 'count': 0.0}, {'lower_bound': -100.0, 'upper_bound': -100.0, 'count': 0.0}, {'lower_bound': -100.0, 'upper_bound': -100.0, 'count': 100.0}]",0.64,2048.0,"[[-100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0, -100.0]]"
1,_c1,Integral,100,0,2.62,262.0,2.777697,1.0,22.0,"[{'lower_bound': 1.0, 'upper_bound': 3.1, 'count': 82.0}, {'lower_bound': 3.1, 'upper_bound': 5.2, 'count': 10.0}, {'lower_bound': 5.2, 'upper_bound': 7.3, 'count': 2.0}, {'lower_bound': 7.3, 'upper_bound': 9.4, 'count': 3.0}, {'lower_bound': 9.4, 'upper_bound': 11.5, 'count': 2.0}, {'lower_bound': 11.5, 'upper_bound': 13.6, 'count': 0.0}, {'lower_bound': 13.6, 'upper_bound': 15.7, 'count': 0.0}, {'lower_bound': 15.7, 'upper_bound': 17.8, 'count': 0.0}, {'lower_bound': 17.8, 'upper_bound': 19.9, 'count': 0.0}, {'lower_bound': 19.9, 'upper_bound': 22.0, 'count': 1.0}]",0.64,2048.0,"[[1.0, 3.0, 4.0, 2.0, 1.0, 5.0, 22.0, 4.0, 3.0, 3.0, 2.0, 2.0, 3.0, 10.0, 1.0, 1.0, 1.0, 1.0, 3.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 3.0, 1.0, 3.0, 1.0, 2.0, 3.0, 3.0, 1.0, 3.0, 1.0, 1.0, 10.0, 1.0, 1.0, 1.0, 3.0, 1.0, 5.0, 2.0, 6.0, 4.0, 1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 1.0, 4.0, 1.0, 2.0, 1.0, 3.0, 5.0, 2.0, 1.0, 1.0, 6.0, 8.0, 5.0, 3.0, 2.0, 1.0, 4.0, 1.0, 2.0, 2.0, 1.0, 1.0, 3.0, 4.0, 3.0, 3.0, 1.0, 8.0, 1.0, 2.0, 2.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 9.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0]]"
2,_c2,Integral,100,0,0.05,5.0,0.217945,0.0,1.0,"[{'lower_bound': 0.0, 'upper_bound': 0.1, 'count': 95.0}, {'lower_bound': 0.1, 'upper_bound': 0.2, 'count': 0.0}, {'lower_bound': 0.2, 'upper_bound': 0.3, 'count': 0.0}, {'lower_bound': 0.3, 'upper_bound': 0.4, 'count': 0.0}, {'lower_bound': 0.4, 'upper_bound': 0.5, 'count': 0.0}, {'lower_bound': 0.5, 'upper_bound': 0.6, 'count': 0.0}, {'lower_bound': 0.6, 'upper_bound': 0.7, 'count': 0.0}, {'lower_bound': 0.7, 'upper_bound': 0.8, 'count': 0.0}, {'lower_bound': 0.8, 'upper_bound': 0.9, 'count': 0.0}, {'lower_bound': 0.9, 'upper_bound': 1.0, 'count': 5.0}]",0.64,2048.0,"[[0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]"
3,_c3,Integral,100,0,0.06,6.0,0.237487,0.0,1.0,"[{'lower_bound': 0.0, 'upper_bound': 0.1, 'count': 94.0}, {'lower_bound': 0.1, 'upper_bound': 0.2, 'count': 0.0}, {'lower_bound': 0.2, 'upper_bound': 0.3, 'count': 0.0}, {'lower_bound': 0.3, 'upper_bound': 0.4, 'count': 0.0}, {'lower_bound': 0.4, 'upper_bound': 0.5, 'count': 0.0}, {'lower_bound': 0.5, 'upper_bound': 0.6, 'count': 0.0}, {'lower_bound': 0.6, 'upper_bound': 0.7, 'count': 0.0}, {'lower_bound': 0.7, 'upper_bound': 0.8, 'count': 0.0}, {'lower_bound': 0.8, 'upper_bound': 0.9, 'count': 0.0}, {'lower_bound': 0.9, 'upper_bound': 1.0, 'count': 6.0}]",0.64,2048.0,"[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]"
4,_c4,Integral,100,0,0.02,2.0,0.14,0.0,1.0,"[{'lower_bound': 0.0, 'upper_bound': 0.1, 'count': 98.0}, {'lower_bound': 0.1, 'upper_bound': 0.2, 'count': 0.0}, {'lower_bound': 0.2, 'upper_bound': 0.3, 'count': 0.0}, {'lower_bound': 0.3, 'upper_bound': 0.4, 'count': 0.0}, {'lower_bound': 0.4, 'upper_bound': 0.5, 'count': 0.0}, {'lower_bound': 0.5, 'upper_bound': 0.6, 'count': 0.0}, {'lower_bound': 0.6, 'upper_bound': 0.7, 'count': 0.0}, {'lower_bound': 0.7, 'upper_bound': 0.8, 'count': 0.0}, {'lower_bound': 0.8, 'upper_bound': 0.9, 'count': 0.0}, {'lower_bound': 0.9, 'upper_bound': 1.0, 'count': 2.0}]",0.64,2048.0,"[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]"
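To make the difference between the baseline and the latest statistics easier to spot, you could compare the per-feature means. The following is a minimal sketch that uses a few values copied from the two tables above. The function `mean_shift` and the threshold of 3 standard deviations are assumptions for illustration only; this is a simple heuristic and not the drift check that Model Monitor itself runs.

```python
# A few entries in the same nested layout as the "features" list of statistics.json.
# The numbers are copied from the baseline and latest tables above.
baseline_features = [
    {"name": "_c0", "numerical_statistics": {"mean": 40.02406, "std_dev": 10.421123}},
    {"name": "_c1", "numerical_statistics": {"mean": 2.567593, "std_dev": 2.76998}},
    {"name": "_c2", "numerical_statistics": {"mean": 962.475454, "std_dev": 186.908638}},
]
latest_features = [
    {"name": "_c0", "numerical_statistics": {"mean": -100.0}},
    {"name": "_c1", "numerical_statistics": {"mean": 2.62}},
    {"name": "_c2", "numerical_statistics": {"mean": 0.05}},
]


def mean_shift(baseline, latest):
    """Return {feature: (latest mean - baseline mean) / baseline std_dev}."""
    latest_by_name = {f["name"]: f["numerical_statistics"] for f in latest}
    shifts = {}
    for feature in baseline:
        stats = feature["numerical_statistics"]
        latest_stats = latest_by_name.get(feature["name"])
        # Skip features missing from the latest report or with zero spread
        if latest_stats is None or stats["std_dev"] == 0:
            continue
        shifts[feature["name"]] = (latest_stats["mean"] - stats["mean"]) / stats["std_dev"]
    return shifts


shifts = mean_shift(baseline_features, latest_features)
# Illustrative threshold: flag features whose mean moved more than 3 baseline std devs
flagged = [name for name, z in shifts.items() if abs(z) > 3.0]
for name, z in shifts.items():
    print(f"{name}: shift of {z:.2f} baseline standard deviations")
print("Flagged features:", flagged)
```

With the real reports you could pass `data_monitor.baseline_statistics().body_dict["features"]` and `data_monitor.latest_monitoring_statistics().body_dict["features"]` directly, assuming every feature has numerical statistics.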


#### View a violation report
Model Monitor writes any violations of the baseline constraints to a violation report.

In [168]:
violations = data_monitor.latest_monitoring_constraint_violations()

In [166]:
violations

<sagemaker.model_monitor.monitoring_files.ConstraintViolations at 0x7efc8a8ebdd0>

In [160]:
if not violations:
    print("No constraint violations report found")
    violations_df = pd.DataFrame()  # keep the next cell runnable when no report exists
else:
    violations_df = pd.json_normalize(violations.body_dict["violations"]).head()

In [161]:
violations_df

---

## Part 2: Monitor model quality
Model quality monitoring jobs monitor the performance of a model by comparing the predictions that the model makes with the actual ground truth labels that the model attempts to predict. To do this, model quality monitoring merges data that is captured from real-time inference with actual labels (ground truth labels) that you store in an Amazon S3 bucket, and then compares the predictions with the ground truth labels.

Model quality monitoring follows the same steps as data quality monitoring, but adds a step that merges the ground truth labels from Amazon S3 with the predictions captured from the real-time inference endpoint.
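
The merge step can be pictured as a join on `eventId`. The following is a minimal sketch under simplifying assumptions, not the implementation of the merge job: the function name `merge_by_event_id` and the reduced record shapes are hypothetical and chosen only for illustration.

```python
def merge_by_event_id(captured_records, ground_truth_records):
    """Join captured inference records with ground truth labels on eventId."""
    # Index the ground truth labels by eventId for constant-time lookup
    labels = {
        gt["eventMetadata"]["eventId"]: gt["groundTruthData"]
        for gt in ground_truth_records
    }
    merged = []
    for record in captured_records:
        event_id = record["eventMetadata"]["eventId"]
        if event_id in labels:  # records without a label are skipped in this sketch
            merged.append({**record, "groundTruthData": labels[event_id]})
    return merged


# Hypothetical, heavily shortened records
captured = [
    {"eventMetadata": {"eventId": "a"}, "captureData": {"endpointOutput": {"data": "0.09"}}},
    {"eventMetadata": {"eventId": "b"}, "captureData": {"endpointOutput": {"data": "0.81"}}},
]
ground_truth = [
    {"eventMetadata": {"eventId": "a"}, "groundTruthData": {"data": "0", "encoding": "CSV"}},
]

merged = merge_by_event_id(captured, ground_truth)
print(len(merged), merged[0]["groundTruthData"]["data"])
```

Only the record with a matching `eventId` ends up in the merged result, which is why the ground truth labels must reuse the identifiers found in the capture data.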

To monitor model quality, follow these steps:
1. Enable [data capture](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-data-capture.html)
1. [Create a baseline](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality-baseline.html). A baseline job compares predictions from the model with ground truth labels in a baseline dataset
1. [Schedule monitoring jobs](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality-schedule.html)
1. [Ingest ground truth labels](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality-merge.html) that model monitor merges with captured prediction data from the real-time inference endpoint
1. [Interpret the results](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-interpreting-results.html)
1. [Integrate model quality monitoring](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality-cw.html) with Amazon CloudWatch and Amazon EventBridge
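
For the last step, one option is a CloudWatch alarm on a model quality metric. The sketch below only builds the keyword arguments for the boto3 CloudWatch `put_metric_alarm` call and does not contact AWS. The namespace `aws/sagemaker/Endpoints/model-metrics` and the dimension names `Endpoint` and `MonitoringSchedule` are assumptions that you should verify against the metrics emitted in your account. The helper name `build_model_quality_alarm` and the endpoint and schedule names are hypothetical.

```python
def build_model_quality_alarm(endpoint_name, schedule_name, metric_name, threshold):
    """Build keyword arguments for the CloudWatch put_metric_alarm API call."""
    return {
        "AlarmName": f"{schedule_name}-{metric_name}-below-threshold",
        # Assumption: namespace and dimension names used for model quality
        # metrics; verify them in the CloudWatch console before relying on them.
        "Namespace": "aws/sagemaker/Endpoints/model-metrics",
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": "Endpoint", "Value": endpoint_name},
            {"Name": "MonitoringSchedule", "Value": schedule_name},
        ],
        "Statistic": "Average",
        "Period": 3600,  # one hour, matching an hourly monitoring schedule
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "notBreaching",
    }


alarm_kwargs = build_model_quality_alarm(
    endpoint_name="my-endpoint",          # hypothetical endpoint name
    schedule_name="my-monitor-schedule",  # hypothetical schedule name
    metric_name="auc",
    threshold=0.77,
)
print(alarm_kwargs["AlarmName"])
# To create the alarm: boto3.client("cloudwatch").put_metric_alarm(**alarm_kwargs)
```

Keeping the argument construction separate from the API call makes the alarm definition easy to review and test before anything is created in the account.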

![](img/model-monitoring-architecture.png)

In the following sections you implement the model quality monitoring in this lab environment.

### Define helper functions
Some helper functions for model quality monitoring setup.

In [190]:
def generate_ground_truth_with_id(inference_id):
    # set random seed to get consistent results
    random.seed(inference_id) 
    rand = random.random()
    
    # Format required by the merge container
    return {
        "groundTruthData": {
            "data": "0" if rand < 0.5 else "1",  # synthetic label: "0" or "1" with equal probability
            "encoding": "CSV",
        },
        "eventMetadata": {
            "eventId": str(inference_id), # eventId must correlate with the eventId in the data capture file
        },
        "eventVersion": "0",
    }

In [191]:
def upload_ground_truth(ground_truth_upload_s3_url, file_name, records, upload_time):
    target_s3_uri = f"{ground_truth_upload_s3_url}/{upload_time:%Y/%m/%d/%H}/{file_name}"
    number_of_records = len(records.split('\n'))
    print(f"Uploading {number_of_records} records to {target_s3_uri}")
    
    S3Uploader.upload_string_as_file_body(records, target_s3_uri)
    
    return target_s3_uri
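
To make the target location explicit, here is a small sketch of the hour-partitioned key that `upload_ground_truth` builds. It only formats a string and uploads nothing. The function name `ground_truth_s3_uri`, the bucket, and the file name are hypothetical.

```python
from datetime import datetime, timezone


def ground_truth_s3_uri(base_s3_url, file_name, upload_time):
    """Return the hour-partitioned S3 URI, mirroring the f-string in upload_ground_truth."""
    return f"{base_s3_url}/{upload_time:%Y/%m/%d/%H}/{file_name}"


uri = ground_truth_s3_uri(
    "s3://example-bucket/ground_truth_data/my-endpoint/AllTraffic",  # hypothetical location
    "gt-sample.jsonl",                                               # hypothetical file name
    datetime(2023, 2, 13, 21, 13, 5, tzinfo=timezone.utc),
)
print(uri)
```

The `{year}/{month}/{day}/{hour}` part comes from the upload time, so labels uploaded at 21:13 UTC land under the `.../2023/02/13/21/` prefix.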

### Create a model quality monitor
Use the Python SDK class [`ModelQualityMonitor`](https://sagemaker.readthedocs.io/en/stable/api/inference/model_monitor.html#sagemaker.model_monitor.model_monitoring.ModelQualityMonitor) to create a model quality monitor and interact with it:

In [192]:
model_monitor = ModelQualityMonitor(
    role=sm_role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    volume_size_in_gb=20,
    max_runtime_in_seconds=1800,
    sagemaker_session=session
)

INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: .
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.


### Run a model quality baseline job
Your model building pipeline in the [step 3](03-sagemaker-pipeline.ipynb) notebook saved the model predictions on the test dataset. Now you use the model monitor to establish a [model performance baseline](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality-baseline.html). The baseline dataset contains three columns: `prediction`, `probability`, and `label`.
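
As an illustration of the expected layout, the following sketch writes a tiny baseline file in memory. It is not the code of the pipeline from step 3. The scored pairs and the 0.5 decision threshold are assumptions, and the column names follow the attribute names passed to `suggest_baseline` in this section.

```python
import csv
import io

# Hypothetical (probability, label) pairs standing in for the test-set predictions
scored = [(0.09, 0), (0.81, 1), (0.47, 1)]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["prediction", "probability", "label"])  # header row, read with header=True
for probability, label in scored:
    prediction = 1 if probability > 0.5 else 0  # assumed 0.5 decision threshold
    writer.writerow([prediction, probability, label])

baseline_csv = buffer.getvalue()
print(baseline_csv)
```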

In [193]:
!aws s3 ls {prediction_baseline_s3_url}/

2023-02-13 10:58:31      63064 prediction_baseline.csv


In [194]:
prediction_baseline_results_s3_url = f"{prediction_baseline_s3_url}/results"
model_mon_reports_s3_url = f"{prediction_baseline_s3_url}/reports"
prediction_baseline_dataset_uri = f"{prediction_baseline_s3_url}/prediction_baseline.csv"

In [195]:
model_baseline_job_name = f"from-idea-to-prod-model-baselining-{strftime('%d-%H-%M-%S', gmtime())}-{str(uuid.uuid4())[:8]}"

model_baseline_job = model_monitor.suggest_baseline(
    baseline_dataset=prediction_baseline_dataset_uri,
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri = prediction_baseline_results_s3_url, 
    problem_type="BinaryClassification",
    inference_attribute= "prediction", # The column in the dataset that contains predictions
    probability_attribute= "probability", # The column in the dataset that contains probabilities
    ground_truth_attribute= "label", # The column in the dataset that contains ground truth labels
    job_name=model_baseline_job_name,
)

print(model_baseline_job_name)

INFO:sagemaker:Creating processing-job with name from-idea-to-prod-model-baselining-13-20-51-05-baa3f00d


from-idea-to-prod-model-baselining-13-20-51-05-baa3f00d


In [196]:
model_baseline_job.wait(logs=False)

!

### Inspect the generated baseline statistics and constraints


In [197]:
!aws s3 ls {prediction_baseline_results_s3_url}/

2023-02-13 20:56:44       1372 constraints.json
2023-02-13 20:56:44      86674 statistics.json


In [198]:
latest_model_baseline_job = model_monitor.latest_baselining_job
pd.DataFrame(latest_model_baseline_job.suggested_constraints().body_dict["binary_classification_constraints"]).T

Unnamed: 0,threshold,comparison_operator
recall,0.196687,LessThanThreshold
precision,0.693431,LessThanThreshold
accuracy,0.895606,LessThanThreshold
true_positive_rate,0.196687,LessThanThreshold
true_negative_rate,0.988449,LessThanThreshold
false_positive_rate,0.011551,GreaterThanThreshold
false_negative_rate,0.803313,GreaterThanThreshold
auc,0.770121,LessThanThreshold
f0_5,0.460718,LessThanThreshold
f1,0.306452,LessThanThreshold


In [199]:
pd.DataFrame(latest_model_baseline_job.baseline_statistics().body_dict["binary_classification_metrics"]["confusion_matrix"])

Unnamed: 0,0,1
0,3594,388
1,42,95


In [200]:
pd.json_normalize(latest_model_baseline_job.baseline_statistics().body_dict["binary_classification_metrics"]).T

Unnamed: 0,0
confusion_matrix.0.0,3594
confusion_matrix.0.1,42
confusion_matrix.1.0,388
confusion_matrix.1.1,95
recall.value,0.196687
recall.standard_deviation,0.005727
precision.value,0.693431
precision.standard_deviation,0.013516
accuracy.value,0.895606
accuracy.standard_deviation,0.001225
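
The reported metrics can be reproduced from the confusion matrix. The sketch below reads the flattened entries as `confusion_matrix.<actual>.<predicted>`, which is an assumption, although it is consistent with the metric values shown above.

```python
# Counts from the flattened confusion matrix, read as confusion_matrix.<actual>.<predicted>
tn, fp, fn, tp = 3594, 42, 388, 95

recall = tp / (tp + fn)
precision = tp / (tp + fp)
accuracy = (tp + tn) / (tp + tn + fp + fn)
false_positive_rate = fp / (fp + tn)
f1 = 2 * precision * recall / (precision + recall)

for name, value in [
    ("recall", recall),
    ("precision", precision),
    ("accuracy", accuracy),
    ("false_positive_rate", false_positive_rate),
    ("f1", f1),
]:
    print(f"{name}: {value:.6f}")
```

The low recall next to the high accuracy reflects the class imbalance in the baseline dataset: most records are negative, so a model can be accurate overall while missing most positives.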


### Generate endpoint traffic
Generate synthetic traffic to the endpoint to capture inference input and output.

In [201]:
# Get the S3 url of the latest data capture saved to the S3 bucket
latest_data_capture_s3_url = get_latest_data_capture_s3_url(data_capture_bucket, data_capture_prefix)

Found 2 files in s3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/data-capture
Latest data capture S3 url: s3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/data-capture/from-idea-to-prod-endpoint-13-15-39-04/AllTraffic/2023/02/13/16


In [202]:
# uncomment the next line to remove the previous data capture files
# !aws s3 rm {latest_data_capture_s3_url} --recursive

In [203]:
test_x.shape

(100, 59)

In [204]:
generate_endpoint_traffic(predictor, test_x)

100%|██████████| 100/100 [00:01<00:00, 77.46it/s]


Wait until the captured data appears in the Amazon S3 bucket. This may take several minutes. The capture data is delivered to the Amazon S3 prefix `{data-capture-prefix}/{EndpointName}/{VariantName}/{year}/{month}/{day}/{UTC hour}`.
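
Because the date parts of the prefix are zero-padded, the most recent capture folder can be found with a plain string comparison. The sketch below illustrates the idea on a list of object keys. It is not the `get_latest_data_capture_s3_url` helper used in this notebook, and the function name `latest_capture_prefix` and the keys are hypothetical.

```python
import posixpath


def latest_capture_prefix(keys):
    """Return the most recent {year}/{month}/{day}/{hour} folder among capture file keys."""
    # Zero-padded date parts sort lexicographically in chronological order.
    # This sketch assumes all keys belong to the same endpoint and variant.
    return max(posixpath.dirname(key) for key in keys)


keys = [
    "data-capture/my-endpoint/AllTraffic/2023/02/13/15/52-32-739-a.jsonl",  # hypothetical keys
    "data-capture/my-endpoint/AllTraffic/2023/02/13/21/03-35-139-c.jsonl",
    "data-capture/my-endpoint/AllTraffic/2023/02/13/16/20-24-369-b.jsonl",
]
print(latest_capture_prefix(keys))
```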

In [206]:
!aws s3 ls {data_capture_s3_url} --recursive

2023-02-13 15:53:40      48702 from-idea-to-prod/xgboost/data-capture/from-idea-to-prod-endpoint-13-15-39-04/AllTraffic/2023/02/13/15/52-32-739-b3380519-6e72-4079-ac47-13312a2eb253.jsonl
2023-02-13 16:21:30      48890 from-idea-to-prod/xgboost/data-capture/from-idea-to-prod-endpoint-13-15-39-04/AllTraffic/2023/02/13/16/20-24-369-8f1b57e7-8215-4711-9889-c908d356f5fc.jsonl
2023-02-13 21:04:40      48702 from-idea-to-prod/xgboost/data-capture/from-idea-to-prod-endpoint-13-15-39-04/AllTraffic/2023/02/13/21/03-35-139-5a322d84-4197-4e5e-86e2-8ec9b36f5fb7.jsonl


### Ingest ground truth data
<div class="alert alert-info"> 💡 <strong> Run this section only after the capture data from the latest endpoint invocations has appeared in the Amazon S3 bucket. The capture data is organized based on the UTC hour in which the invocation happened.</strong>
</div>

For model quality monitoring you must have ground truth labels that the model monitor merges with captured inference data from the endpoint.

In this lab environment you generate synthetic ground truth data to use with the model quality monitoring. In a real-world project you need to implement a workflow to produce and store the ground truth labels to evaluate the quality of the model predictions.

The following code cells generate and save synthetic ground truth labels for all inference records in the latest capture data files.

In [207]:
# Set the S3 url where to store the ground truth labels
variant_name = sm.describe_endpoint(EndpointName=predictor.endpoint_name)["ProductionVariants"][0]["VariantName"]
ground_truth_upload_s3_url = f"s3://{data_capture_bucket}/ground_truth_data/{predictor.endpoint_name}/{variant_name}"
ground_truth_upload_s3_url

's3://sagemaker-us-east-1-906545278380/ground_truth_data/from-idea-to-prod-endpoint-13-15-39-04/AllTraffic'

In [208]:
# Get the S3 prefix where the latest capture data has been delivered
latest_data_capture_s3_url = get_latest_data_capture_s3_url(data_capture_bucket, data_capture_prefix)
latest_data_capture_prefix = '/'.join(latest_data_capture_s3_url.split('/')[3:])

Found 3 files in s3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/data-capture
Latest data capture S3 url: s3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/data-capture/from-idea-to-prod-endpoint-13-15-39-04/AllTraffic/2023/02/13/21


In [209]:
# Get the list of capture data file prefixes in the latest capture data location
capture_files = get_file_list(data_capture_bucket, latest_data_capture_prefix)

assert capture_files, f"No capture data files found in {latest_data_capture_prefix}. Generate endpoint traffic and wait until capture data appears in the bucket!"

# For each capture data file get the eventIds and generate correlated ground truth labels
for f in capture_files:
    f_name = f.split('/')[-1]
    
    print(f"Downloading {f}")
    S3Downloader.download(f"s3://{data_capture_bucket}/{f}", "./tmp")
    
    print(f"Reading inference ids from the file: ./tmp/{f_name}")
    with jsonlines.open(f"./tmp/{f_name}") as reader: 
        ground_truth_records = "\n".join([
            json.dumps(r) for r in [generate_ground_truth_with_id(l["eventMetadata"]["eventId"]) for l in reader]
        ])
    lastest_ground_truth_s3_uri = upload_ground_truth(ground_truth_upload_s3_url, f"gt-{f_name}", ground_truth_records, datetime.utcnow())

Found 1 files in s3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/data-capture/from-idea-to-prod-endpoint-13-15-39-04/AllTraffic/2023/02/13/21
Downloading from-idea-to-prod/xgboost/data-capture/from-idea-to-prod-endpoint-13-15-39-04/AllTraffic/2023/02/13/21/03-35-139-5a322d84-4197-4e5e-86e2-8ec9b36f5fb7.jsonl
Reading inference ids from the file: ./tmp/03-35-139-5a322d84-4197-4e5e-86e2-8ec9b36f5fb7.jsonl
Uploading 100 records to s3://sagemaker-us-east-1-906545278380/ground_truth_data/from-idea-to-prod-endpoint-13-15-39-04/AllTraffic/2023/02/13/21/gt-03-35-139-5a322d84-4197-4e5e-86e2-8ec9b36f5fb7.jsonl


In [210]:
# List uploaded ground truth files
!aws s3 ls {ground_truth_upload_s3_url} --recursive

2023-02-13 21:13:05      14499 ground_truth_data/from-idea-to-prod-endpoint-13-15-39-04/AllTraffic/2023/02/13/21/gt-03-35-139-5a322d84-4197-4e5e-86e2-8ec9b36f5fb7.jsonl


Download the last ingested ground truth data file and see its content:

In [211]:
# Download the last ground truth file to Studio's EFS
!aws s3 cp {lastest_ground_truth_s3_uri} ./tmp/groundtruth.jsonl

download: s3://sagemaker-us-east-1-906545278380/ground_truth_data/from-idea-to-prod-endpoint-13-15-39-04/AllTraffic/2023/02/13/21/gt-03-35-139-5a322d84-4197-4e5e-86e2-8ec9b36f5fb7.jsonl to tmp/groundtruth.jsonl


In [212]:
!head ./tmp/groundtruth.jsonl

{"groundTruthData": {"data": "0", "encoding": "CSV"}, "eventMetadata": {"eventId": "24a6c6fb-f252-4e74-899a-c5c689aeecef"}, "eventVersion": "0"}
{"groundTruthData": {"data": "0", "encoding": "CSV"}, "eventMetadata": {"eventId": "c57f8e54-1e74-41c6-9ad0-5aec10f23918"}, "eventVersion": "0"}
{"groundTruthData": {"data": "1", "encoding": "CSV"}, "eventMetadata": {"eventId": "76a4d9ed-8491-4de7-9046-07203614aa5b"}, "eventVersion": "0"}
{"groundTruthData": {"data": "0", "encoding": "CSV"}, "eventMetadata": {"eventId": "1f55c427-0476-4d8f-9b73-404c875a9a18"}, "eventVersion": "0"}
{"groundTruthData": {"data": "1", "encoding": "CSV"}, "eventMetadata": {"eventId": "91393a9d-ad0d-47af-a47f-5228afce8f4b"}, "eventVersion": "0"}
{"groundTruthData": {"data": "0", "encoding": "CSV"}, "eventMetadata": {"eventId": "ca7760a5-3ec7-4385-b509-86cb4804743c"}, "eventVersion": "0"}
{"groundTruthData": {"data": "0", "encoding": "CSV"}, "eventMetadata": {"eventId": "ace0b5f2-17f2-4454-b797-660ec923b260"}, "event

### Create a model monitoring schedule
Now that you have the capture data and the ground truth data, you can create a model monitoring schedule.
Use the [`create_monitoring_schedule()`](https://sagemaker.readthedocs.io/en/stable/api/inference/model_monitor.html#sagemaker.model_monitor.model_monitoring.ModelQualityMonitor.create_monitoring_schedule) method of the `ModelQualityMonitor` class to create a model quality monitoring schedule.

In [213]:
endpoint_input = EndpointInput(
    endpoint_name=predictor.endpoint_name,
    probability_attribute="0",
    probability_threshold_attribute=0.5,
    destination="/opt/ml/processing/input_data",
)
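
The endpoint returns a single probability per request, so `probability_attribute="0"` points at the first CSV column of the response and the threshold of `0.5` turns that score into a class label. A minimal sketch of this conversion follows. The function `predicted_label` is hypothetical, and treating only scores strictly above the threshold as positive is an assumption, not a documented behavior of the monitor.

```python
def predicted_label(endpoint_output, probability_index=0, threshold=0.5):
    """Convert a CSV endpoint response into a binary label."""
    fields = endpoint_output.strip().split(",")
    probability = float(fields[probability_index])
    # Assumption: scores above the threshold map to the positive class;
    # how the monitor treats a score exactly at the threshold is not covered here.
    return 1 if probability > threshold else 0


print(predicted_label("0.09129378944635391\n"))
print(predicted_label("0.81\n"))
```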

In [214]:
model_mon_schedule_name = "from-idea-to-prod-model-monitor-schedule-" + strftime(
    "%Y-%m-%d-%H-%M-%S", gmtime()
)

model_monitor.create_monitoring_schedule(
    monitor_schedule_name=model_mon_schedule_name,
    endpoint_input=endpoint_input,
    problem_type="BinaryClassification",
    # record_preprocessor_script=f"{record_preprocessor_s3_url}/record_preprocessor.py",
    # post_analytics_processor_script=s3_code_postprocessor_uri,
    output_s3_uri=model_mon_reports_s3_url,
    ground_truth_input=ground_truth_upload_s3_url,
    constraints=model_monitor.suggested_constraints() if model_monitor.latest_baselining_job else f"{prediction_baseline_results_s3_url}/constraints.json",
    schedule_cron_expression=CronExpressionGenerator.hourly(),
    enable_cloudwatch_metrics=True,
)

INFO:sagemaker.model_monitor.model_monitoring:Creating Monitoring Schedule with name: from-idea-to-prod-model-monitor-schedule-2023-02-13-21-13-30


In [215]:
while model_monitor.describe_schedule()['MonitoringScheduleStatus'] != "Scheduled":
    print("Waiting until model monitoring status becomes Scheduled")
    time.sleep(3)
    
model_monitor.describe_schedule()

Waiting until model monitoring status becomes Scheduled
Waiting until model monitoring status becomes Scheduled


{'MonitoringScheduleArn': 'arn:aws:sagemaker:us-east-1:906545278380:monitoring-schedule/from-idea-to-prod-model-monitor-schedule-2023-02-13-21-13-30',
 'MonitoringScheduleName': 'from-idea-to-prod-model-monitor-schedule-2023-02-13-21-13-30',
 'MonitoringScheduleStatus': 'Scheduled',
 'MonitoringType': 'ModelQuality',
 'CreationTime': datetime.datetime(2023, 2, 13, 21, 13, 30, 605000, tzinfo=tzlocal()),
 'LastModifiedTime': datetime.datetime(2023, 2, 13, 21, 13, 38, 776000, tzinfo=tzlocal()),
 'MonitoringScheduleConfig': {'ScheduleConfig': {'ScheduleExpression': 'cron(0 * ? * * *)'},
  'MonitoringJobDefinitionName': 'model-quality-job-definition-2023-02-13-21-13-30-269',
  'MonitoringType': 'ModelQuality'},
 'EndpointName': 'from-idea-to-prod-endpoint-13-15-39-04',
 'ResponseMetadata': {'RequestId': 'f1e56b82-f307-4dc8-8405-49d91899e403',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'f1e56b82-f307-4dc8-8405-49d91899e403',
   'content-type': 'application/x-amz-json-1.1'

The endpoint has two scheduled monitors now, a data quality and a model quality monitor:

In [216]:
predictor.list_monitors()

INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: .
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: .
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.


[<sagemaker.model_monitor.model_monitoring.ModelQualityMonitor at 0x7efc82031cd0>,
 <sagemaker.model_monitor.model_monitoring.DefaultModelMonitor at 0x7efc8205e0d0>]

### See model monitoring schedule executions
<div class="alert alert-info"> 💡 You created a model monitoring schedule which runs every hour. <strong>You need to wait until you cross the hour boundary to see any executions.</strong>
</div>
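
The hourly schedule expression `cron(0 * ? * * *)` fires at minute 0 of every hour. To estimate when the first execution is due, you can round the current time up to the next full hour, as in this sketch. The function name `next_hourly_run` is hypothetical, and the actual start of the processing job may lag behind the scheduled time.

```python
from datetime import datetime, timedelta, timezone


def next_hourly_run(now):
    """Return the next top-of-the-hour time strictly after `now`."""
    return now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)


created = datetime(2023, 2, 13, 21, 13, 30, tzinfo=timezone.utc)
print(next_hourly_run(created))
```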
 
A monitoring job started by the schedule looks for the ground truth data under the Amazon S3 prefix `{ground_truth_upload_s3_url}/{year}/{month}/{day}/{UTC hour}/`. If there are no ground truth label datasets under this prefix, the model monitoring job fails with an exception `No S3 objects found under S3 URL ...`. In the previous section **Ingest ground truth data** you created a synthetic ground truth dataset and saved it under the correct prefix.

Model quality monitor runs two processing jobs for each schedule execution:
1. A ground truth merge job to join the capture data and the ground truth label datasets based on the `eventId`
2. A model quality monitoring job to evaluate model performance compared to the baseline

You can see these two jobs for each monitor execution in the SageMaker console under **Processing jobs**:
![](img/model-quality-monitor-execution.png)

#### Inspect the latest model monitor execution

In [219]:
# call describe_schedule to see the status of the latest completed execution
model_monitor.describe_schedule()

{'MonitoringScheduleArn': 'arn:aws:sagemaker:us-east-1:906545278380:monitoring-schedule/from-idea-to-prod-model-monitor-schedule-2023-02-13-21-13-30',
 'MonitoringScheduleName': 'from-idea-to-prod-model-monitor-schedule-2023-02-13-21-13-30',
 'MonitoringScheduleStatus': 'Scheduled',
 'MonitoringType': 'ModelQuality',
 'CreationTime': datetime.datetime(2023, 2, 13, 21, 13, 30, 605000, tzinfo=tzlocal()),
 'LastModifiedTime': datetime.datetime(2023, 2, 14, 9, 14, 21, 835000, tzinfo=tzlocal()),
 'MonitoringScheduleConfig': {'ScheduleConfig': {'ScheduleExpression': 'cron(0 * ? * * *)'},
  'MonitoringJobDefinitionName': 'model-quality-job-definition-2023-02-13-21-13-30-269',
  'MonitoringType': 'ModelQuality'},
 'EndpointName': 'from-idea-to-prod-endpoint-13-15-39-04',
 'LastMonitoringExecutionSummary': {'MonitoringScheduleName': 'from-idea-to-prod-model-monitor-schedule-2023-02-13-21-13-30',
  'ScheduledTime': datetime.datetime(2023, 2, 14, 9, 0, tzinfo=tzlocal()),
  'CreationTime': datetim

In [220]:
# List all _completed_ model monitor executions
model_mon_executions = model_monitor.list_executions()

In [221]:
model_mon_executions

[<sagemaker.model_monitor.model_monitoring.MonitoringExecution at 0x7efc81ce8f10>]

In [222]:
# See the details of the latest model monitor execution
latest_model_mon_execution = get_latest_monitor_execution(model_monitor)
execution_details = latest_model_mon_execution.describe()
execution_details

!Latest execution status: Completed
Latest execution result: CompletedWithViolations: Job completed successfully with 10 violations.
Report Uri: s3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/prediction_baseline/reports/from-idea-to-prod-endpoint-13-15-39-04/from-idea-to-prod-model-monitor-schedule-2023-02-13-21-13-30/2023/02/13/22


{'ProcessingInputs': [{'InputName': 'constraints',
   'AppManaged': False,
   'S3Input': {'S3Uri': 's3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/prediction_baseline/results/constraints.json',
    'LocalPath': '/opt/ml/processing/baseline/constraints',
    'S3DataType': 'S3Prefix',
    'S3InputMode': 'File',
    'S3DataDistributionType': 'FullyReplicated'}},
  {'InputName': 'endpoint_input_1',
   'AppManaged': False,
   'S3Input': {'S3Uri': 's3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/prediction_baseline/reports/merge/from-idea-to-prod-endpoint-13-15-39-04/AllTraffic/2023/02/13/21',
    'LocalPath': '/opt/ml/processing/input_data/from-idea-to-prod-endpoint-13-15-39-04/AllTraffic/2023/02/13/21',
    'S3DataType': 'S3Prefix',
    'S3InputMode': 'File',
    'S3DataDistributionType': 'FullyReplicated',
    'S3CompressionType': 'None'}}],
 'ProcessingOutputConfig': {'Outputs': [{'OutputName': 'result',
    'S3Output': {'S3Uri': 's3://sagemaker-us-east-1

#### See the execution reports
Each completed model monitor execution produces new statistics, constraints, and violations reports for the capture data. You have various ways to access these reports:
- directly access the files on Amazon S3 under the job output S3 uri
- use the Python SDK class [`MonitoringExecution`](https://sagemaker.readthedocs.io/en/stable/api/inference/model_monitor.html#sagemaker.model_monitor.model_monitoring.MonitoringExecution)
- use [`latest_monitoring_statistics`](https://sagemaker.readthedocs.io/en/stable/api/inference/model_monitor.html#sagemaker.model_monitor.model_monitoring.ModelMonitor.latest_monitoring_statistics) and [`latest_monitoring_constraint_violations`](https://sagemaker.readthedocs.io/en/stable/api/inference/model_monitor.html#sagemaker.model_monitor.model_monitoring.ModelMonitor.latest_monitoring_constraint_violations) methods of the [`ModelMonitor`](https://sagemaker.readthedocs.io/en/stable/api/inference/model_monitor.html#sagemaker.model_monitor.model_monitoring.ModelMonitor) class

In [223]:
# Get the job output S3 uri
mon_job_output_s3_uri = execution_details["ProcessingOutputConfig"]["Outputs"][0]["S3Output"]["S3Uri"]
mon_job_output_s3_uri

's3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/prediction_baseline/reports/from-idea-to-prod-endpoint-13-15-39-04/from-idea-to-prod-model-monitor-schedule-2023-02-13-21-13-30/2023/02/13/22'

In [224]:
# Same S3 uri is accessible via the MonitoringExecution class
latest_model_mon_execution.output.destination

's3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/prediction_baseline/reports/from-idea-to-prod-endpoint-13-15-39-04/from-idea-to-prod-model-monitor-schedule-2023-02-13-21-13-30/2023/02/13/22'

In [225]:
# See the generated files - new statistics, constraints, and violations
!aws s3 ls {mon_job_output_s3_uri} --recursive

2023-02-13 22:22:58       2002 from-idea-to-prod/xgboost/prediction_baseline/reports/from-idea-to-prod-endpoint-13-15-39-04/from-idea-to-prod-model-monitor-schedule-2023-02-13-21-13-30/2023/02/13/22/constraint_violations.json
2023-02-13 22:22:58       1351 from-idea-to-prod/xgboost/prediction_baseline/reports/from-idea-to-prod-endpoint-13-15-39-04/from-idea-to-prod-model-monitor-schedule-2023-02-13-21-13-30/2023/02/13/22/constraints.json
2023-02-13 22:22:58       9202 from-idea-to-prod/xgboost/prediction_baseline/reports/from-idea-to-prod-endpoint-13-15-39-04/from-idea-to-prod-model-monitor-schedule-2023-02-13-21-13-30/2023/02/13/22/statistics.json


<div class="alert alert-info"> 💡 Because the synthetic ground truth labels are random, expect to see violations, specifically `LessThanThreshold` constraint violations for model performance metrics such as `auc`, `accuracy`, and `precision`.
</div>

In [279]:
# Get the violation report from the MonitoringExecution class
last_execution_violations = latest_model_mon_execution.constraint_violations()

In [280]:
pd.json_normalize(last_execution_violations.body_dict["violations"]).head()

Unnamed: 0,constraint_check_type,description,metric_name
0,LessThanThreshold,Metric auc with 0.5276442307692306 was LessThanThreshold '0.7701214220800956',auc
1,LessThanThreshold,Metric precision with 0.5 was LessThanThreshold '0.6934306569343066',precision
2,LessThanThreshold,Metric truePositiveRate with 0.020833333333333332 was LessThanThreshold '0.19668737060041408',truePositiveRate
3,LessThanThreshold,Metric f1 with 0.039999999999999994 was LessThanThreshold '0.3064516129032258',f1
4,LessThanThreshold,Metric accuracy with 0.52 was LessThanThreshold '0.8956057295460063',accuracy


You can access the violation report directly from the model monitor class:

In [228]:
# Use the ModelMonitor class
violations = model_monitor.latest_monitoring_constraint_violations()
pd.json_normalize(violations.body_dict["violations"]).head()

Unnamed: 0,constraint_check_type,description,metric_name
0,LessThanThreshold,Metric auc with 0.5276442307692306 was LessThanThreshold '0.7701214220800956',auc
1,LessThanThreshold,Metric precision with 0.5 was LessThanThreshold '0.6934306569343066',precision
2,LessThanThreshold,Metric truePositiveRate with 0.020833333333333332 was LessThanThreshold '0.19668737060041408',truePositiveRate
3,LessThanThreshold,Metric f1 with 0.039999999999999994 was LessThanThreshold '0.3064516129032258',f1
4,LessThanThreshold,Metric accuracy with 0.52 was LessThanThreshold '0.8956057295460063',accuracy
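
To see how such a report comes about, the sketch below compares observed metrics with baseline constraints. It is a simplified illustration and not the logic of the monitoring container. The function `find_violations` is hypothetical, the thresholds are rounded values from the suggested constraints, and the observed `false_positive_rate` is a made-up value.

```python
def find_violations(metrics, constraints):
    """Compare observed metrics with baseline constraints and list violations."""
    violations = []
    for name, rule in constraints.items():
        if name not in metrics:
            continue  # no observation for this metric
        value = metrics[name]
        threshold = rule["threshold"]
        operator = rule["comparison_operator"]
        if operator == "LessThanThreshold":
            violated = value < threshold
        elif operator == "GreaterThanThreshold":
            violated = value > threshold
        else:
            violated = False  # unknown operators are ignored in this sketch
        if violated:
            violations.append({
                "metric_name": name,
                "constraint_check_type": operator,
                "value": value,
                "threshold": threshold,
            })
    return violations


constraints = {
    "auc": {"threshold": 0.770121, "comparison_operator": "LessThanThreshold"},
    "accuracy": {"threshold": 0.895606, "comparison_operator": "LessThanThreshold"},
    "false_positive_rate": {"threshold": 0.011551, "comparison_operator": "GreaterThanThreshold"},
}
# auc and accuracy come from the report above; false_positive_rate is hypothetical
observed = {"auc": 0.527644, "accuracy": 0.52, "false_positive_rate": 0.01}

found = find_violations(observed, constraints)
for violation in found:
    print(violation["metric_name"], violation["constraint_check_type"])
```

A `LessThanThreshold` constraint is violated when the observed metric falls below the baseline value, which is the expected outcome for random labels.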


#### See the merged datasets
Finally, take a look at the merged dataset generated by the merge job. The merged dataset contains the inference input, the inference output, and the ingested ground truth labels. The inference output and the ground truth are connected via the `eventMetadata.eventId` identifier.

In [229]:
# Get the S3 url to the merge datasets from the monitor job inputs
mon_job_merge_input_s3_uri = execution_details["ProcessingInputs"][1]["S3Input"]["S3Uri"]

mon_job_merge_bucket = mon_job_merge_input_s3_uri.split('/')[2]
mon_job_merge_prefix = '/'.join(mon_job_merge_input_s3_uri.split('/')[3:])

In [230]:
merge_files = get_file_list(mon_job_merge_bucket, mon_job_merge_prefix)

if merge_files:
    S3Downloader.download(f"s3://{mon_job_merge_bucket}/{merge_files[0]}", "./tmp")

    print("Content of the merge file:")
    # Read the jsonl file and show the first two objects
    with jsonlines.open(f"./tmp/{merge_files[0].split('/')[-1]}") as reader:      
        print(json.dumps(reader.read(), indent=2))
        print(json.dumps(reader.read(), indent=2))

Found 1 files in s3://sagemaker-us-east-1-906545278380/from-idea-to-prod/xgboost/prediction_baseline/reports/merge/from-idea-to-prod-endpoint-13-15-39-04/AllTraffic/2023/02/13/21
Content of the merge file:
{
  "eventVersion": "0",
  "groundTruthData": {
    "data": "0",
    "encoding": "CSV"
  },
  "captureData": {
    "endpointInput": {
      "data": "31,1,999,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,1,1,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,1,0",
      "encoding": "CSV",
      "mode": "INPUT",
      "observedContentType": "text/csv"
    },
    "endpointOutput": {
      "data": "0.09129378944635391\n",
      "encoding": "CSV",
      "mode": "OUTPUT",
      "observedContentType": "text/csv; charset=utf-8"
    }
  },
  "eventMetadata": {
    "eventId": "24a6c6fb-f252-4e74-899a-c5c689aeecef",
    "inferenceTime": "2023-02-13T21:03:35Z"
  }
}
{
  "eventVersion": "0",
  "groundTruthData": {
    "data": "0",
    "encoding": "CSV"
  },
  "captureData": {
    "endpoin

## Additional monitoring
In addition to data and model quality monitoring with Model Monitor, you can use Amazon SageMaker Clarify to:
- [Monitor bias drift](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-model-monitor-bias-drift.html)
- [Monitor feature attribution drift](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-model-monitor-feature-attribution-drift.html)

Refer to a sample notebook [Monitoring bias drift and feature attribution drift Amazon SageMaker Clarify](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker_model_monitor/fairness_and_explainability/SageMaker-Model-Monitor-Fairness-and-Explainability.html) for a hands-on example and more details.

## Use SageMaker Studio for data and model monitoring
You can use Studio UX to enable and configure data and model monitoring and to visualize results. You can view the details of any monitoring job run, and you can create charts that show the baseline and captured values for any metric that the monitoring job calculates.

In the left sidebar, navigate to **Home**, choose **Deployments**, and then choose **Endpoints**. In the list, click the endpoint for which you would like to configure model monitoring:

![](img/endpoints.png)

In the displayed **Endpoint details** pane you can configure data and model monitoring:

![](img/model-monitoring-ux.png)

## Clean-up resources
Stop and remove the monitoring schedules for the endpoint.

In [None]:
for monitor in predictor.list_monitors():
    try:
        monitor.stop_monitoring_schedule()
        monitor.delete_monitoring_schedule()
    except botocore.exceptions.ClientError as e:
        if e.response['Error']['Code'] == 'ValidationException':
            print(f"ValidationException: {e.response['Error']['Message']}. Wait until the monitoring job is done and run the cell again.")
        else:
            raise

### Final clean-up
This is the last notebook in this workshop. If you are finished with exploration, to avoid charges on your AWS account, run the [clean-up notebook](99-clean-up.ipynb).

<div class="alert alert-info">
You have at least one real-time endpoint active in your AWS account. To avoid charges, you must delete the endpoint. Go to the clean-up notebook.
</div>

## Further development ideas for your real-world projects
- Add [visualizations](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker_model_monitor/visualization/SageMaker-Model-Monitor-Visualize.html) for model monitoring reports
- Add data baselining, explainability report generation, and bias report to the model building pipeline
- Implement [model quality monitoring](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality.html)
- Try different inference options such as [serverless](https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html) or [asynchronous](https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference.html) inference
- Address security considerations for your ML environment and solutions. Start with the developer guide [Security in Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/security.html)
- Implement [deployment guardrails](https://docs.aws.amazon.com/sagemaker/latest/dg/deployment-guardrails.html) to control how to update your models in production

## Additional resources
- [Amazon SageMaker Model Monitor: A System for Real-Time Insights into Deployed Machine Learning Models](https://assets.amazon.science/97/cc/8dc8526547859351f46d2710aba9/amazon-sagemaker-model-monitor-a-system-for-real-time-insights-into-deployed-machine-learning-models.pdf)
- [Monitor models for data and model quality, bias, and explainability](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor.html)
- [Monitor data quality](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-data-quality.html)
- [Model Monitor visualizations](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker_model_monitor/visualization/SageMaker-Model-Monitor-Visualize.html)
- [Monitor Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/monitoring-overview.html)
- [Monitoring a Model in Production](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-model-monitor.html)
- [ModelMonitor for batch transform jobs](https://aws.amazon.com/about-aws/whats-new/2022/10/amazon-sagemaker-model-monitor-batch-transform-jobs/)
- [Security in Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/security.html)
- [Deployment guardrails](https://docs.aws.amazon.com/sagemaker/latest/dg/deployment-guardrails.html)
- [Design a compelling record filtering method with Amazon SageMaker Model Monitor](https://aws.amazon.com/blogs/machine-learning/design-a-compelling-record-filtering-method-with-amazon-sagemaker-model-monitor/)

# Shutdown kernel

In [None]:
%%html

<p><b>Shutting down your kernel for this notebook to release resources.</b></p>
<button class="sm-command-button" data-commandlinker-command="kernelmenu:shutdown" style="display:none;">Shutdown Kernel</button>
        
<script>
try {
    els = document.getElementsByClassName("sm-command-button");
    els[0].click();
}
catch(err) {
    // NoOp
}    
</script>