# SageMaker Model Monitoring 

<div class="alert alert-warning"> This notebook has been last tested on a SageMaker Studio JupyterLab instance using the <code>SageMaker Distribution Image 3.0.1</code> and with the SageMaker Python SDK version <code>2.245.0</code></div>

In this notebook you are going to use [Amazon SageMaker model monitor](https://aws.amazon.com/sagemaker/model-monitor/) to add continuous and automated [monitoring of the data quality](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-data-quality.html) for the traffic to your real-time SageMaker inference endpoints. You also implement [model monitoring](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality.html) to detect performance drift and model metric anomalies.

Using Model Monitor integration with [Amazon EventBridge](https://aws.amazon.com/eventbridge/) you can implement automated response and remediation to any detected issues with data and model quality. For example, you can launch an automated model retraining if the model performance falls below a specific threshold.

Additionally to data and model quality monitoring you can implement [bias drift](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-model-monitor-bias-drift.html) and [feature attribution drift](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-model-monitor-feature-attribution-drift.html) monitoring.
    
##  Context

In this deployment module, you have:
1. ‚úÖ **Pre-provisioned**: SageMaker Unified Studio domain with registered models
2. ‚úÖ **Deployed**: SageMaker endpoint with data capture enabled - preprovsioned 
3. ‚úÖ **Tested**: Basic endpoint functionality in the previous notebook
4. üéØ **Now**: Set up comprehensive model monitoring

## Prerequisites
- Completed Lab 5.1: Model approved and triggered CDK for endpoint deployment
- SageMaker endpoint with data capture enabled
- **IAM Permissions**: Your execution role must have Model Monitor permissions

### ‚ö†Ô∏è Important: IAM Permissions Required

Your IAM role needs these additional permissions for Model Monitor:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:CreateDataQualityJobDefinition",
                "sagemaker:CreateModelQualityJobDefinition",
                "sagemaker:CreateMonitoringSchedule",
                "sagemaker:DescribeMonitoringSchedule",
                "sagemaker:ListMonitoringSchedules",
                "sagemaker:StopMonitoringSchedule",
                "sagemaker:DeleteMonitoringSchedule",
                "sagemaker:CreateProcessingJob",
                "sagemaker:DescribeProcessingJob"
            ],
            "Resource": "*"
        }
    ]
}
```

**If you get AccessDeniedException errors:**
1. Go to AWS Console ‚Üí IAM ‚Üí Roles
2. Find your execution role (shown in setup section below)
3. Add the above permissions as an inline policy

---

<div class="alert alert-info"> üí°
This notebook contains two parts:<br/>
- Part 1: Monitor data quality<br/>
- Part 2: Monitor model quality<br/>
<br/>

You need approximately between 60 and 90 minutes to go through this notebook. To optimize time you can execute both parts independently. For both parts you must execute all following sections up to the <strong>Part 1</storng>.
</div>

<div class="alert alert-info"> Make sure you using <code>Python 3</code> kernel in JupyterLab for this notebook.</div>

## Model Monitoring Architecture

Amazon SageMaker Model Monitor provides continuous monitoring capabilities:

![Model Monitoring Architecture](images/model-monitoring-architecture.png)

### Key Components:
1. **Data Capture**: Real-time capture of inference requests and responses
2. **Baseline Creation**: Statistical baseline from training data
3. **Monitoring Jobs**: Scheduled analysis comparing live data to baseline
4. **Violation Reports**: Automated detection of data/model drift
5. **CloudWatch Integration**: Metrics and alerting
6. **EventBridge Integration**: Automated responses to violations

---


## How Model Monitor works
Amazon SageMaker Model Monitor automatically monitors ML models in production and notifies you when quality issues arise. Model Monitor uses rules to detect drift in your models and data and alerts you when it happens. The following figure shows how this process works.


The process for setting up and using the data monitoring:
1. Enable the SageMaker endpoint to capture data from incoming requests to a trained ML model and the resulting model predictions
2. Create a baseline from the dataset that was used to train the model. The baseline computes metrics and suggests constraints for the metrics. 
3. Create a monitoring schedule specifying what data to collect, how often to collect it, and how to analyze it. Data traffic to your model and predictions from the model are compared to the constraints, and are reported as violations if they are outside the constrained values. You can define multiple monitoring schedule per endpoint
4. Inspect the reports, which compare the latest data with the baseline, and watch for any violations reported and for metrics and notifications from Amazon CloudWatch
5. Implement observability for your ML models with Amazon CloudWatch and event-based architecture with Amazon EventBridge. You can automate data and model updates, model retraining, and user notification based on the data and model quality events

## Setup and Imports

In [1]:
# Install required packages
!pip install -q --use-pep517 sagemaker boto3 pandas numpy matplotlib seaborn tqdm jsonlines


In [2]:
import boto3
import sagemaker
import pandas as pd
import numpy as np
import json
import jsonlines
import time
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm

# SageMaker Model Monitor imports
from sagemaker.model_monitor import (
    DefaultModelMonitor,
    DataCaptureConfig,
    CronExpressionGenerator,
    ModelQualityMonitor,
    EndpointInput,
)
from sagemaker.model_monitor.dataset_format import DatasetFormat
from sagemaker.s3 import S3Downloader, S3Uploader
from sagemaker.predictor import Predictor

# Initialize clients and session
sm_client = boto3.client('sagemaker')
s3_client = boto3.client('s3')
session = sagemaker.Session()

# Handle execution role for both SageMaker and local environments
try:
    role = sagemaker.get_execution_role()
except ValueError:
    # For local development - replace with your actual SageMaker execution role ARN
    print("‚ö†Ô∏è  Running locally - using default role pattern")
    print("   Update this with your actual SageMaker execution role ARN if needed")
    sts_client = boto3.client('sts')
    account_id = sts_client.get_caller_identity()['Account']
    role = f"arn:aws:iam::{account_id}:role/SageMakerExecutionRole"

region = session.boto_region_name
bucket = session.default_bucket()

print(f"‚úÖ SageMaker SDK version: {sagemaker.__version__}")
print(f"‚úÖ Region: {region}")
print(f"‚úÖ Default bucket: {bucket}")
print(f"‚úÖ Role: {role}")

sagemaker.config INFO - Fetched defaults config from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.Session.DefaultS3Bucket
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.Session.DefaultS3ObjectKeyPrefix
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.Session.DefaultS3Bucket
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.Session.DefaultS3ObjectKeyPrefix
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.Session.DefaultS3Bucket
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.Session.DefaultS3ObjectKeyPrefix
‚úÖ SageMaker SDK version: 2.245.0
‚úÖ Region: us-west-2
‚úÖ Default bucket: amazon-sagemaker-006230620263-us-west-

## 1. Discover and Inspect Deployed Endpoint

First, let's find the endpoint deployed by our CDK stack and verify its data capture configuration.

In [3]:
def get_deployed_endpoints():
    """Get all InService endpoints with data capture enabled"""
    
    endpoints = []
    
    try:
        response = sm_client.list_endpoints(
            SortBy='CreationTime',
            SortOrder='Descending',
            MaxResults=20
        )
        
        for endpoint in response['Endpoints']:
            if endpoint['EndpointStatus'] == 'InService':
                # Get detailed endpoint info
                endpoint_details = sm_client.describe_endpoint(
                    EndpointName=endpoint['EndpointName']
                )
                
                # Check if data capture is enabled
                data_capture_config = endpoint_details.get('DataCaptureConfig', {})
                
                endpoints.append({
                    'name': endpoint['EndpointName'],
                    'status': endpoint['EndpointStatus'],
                    'creation_time': endpoint['CreationTime'],
                    'data_capture_enabled': data_capture_config.get('EnableCapture', False),
                    'data_capture_status': data_capture_config.get('CaptureStatus', 'Not Configured'),
                    'sampling_percentage': data_capture_config.get('CurrentSamplingPercentage', 0),
                    's3_uri': data_capture_config.get('DestinationS3Uri', 'Not Configured')
                })
        
        return endpoints
        
    except Exception as e:
        print(f"‚ùå Error listing endpoints: {e}")
        return []

# Get endpoints
endpoints = get_deployed_endpoints()

if endpoints:
    print(f"üìä Found {len(endpoints)} InService endpoint(s):")
    print("-" * 80)
    
    for i, ep in enumerate(endpoints, 1):
        print(f"{i}. {ep['name']}")
        print(f"   Status: {ep['status']}")
        print(f"   Created: {ep['creation_time']}")
        print(f"   Data Capture: {'‚úÖ Enabled' if ep['data_capture_enabled'] else '‚ùå Disabled'}")
        if ep['data_capture_enabled']:
            print(f"   Capture Status: {ep['data_capture_status']}")
            print(f"   Sampling: {ep['sampling_percentage']}%")
            print(f"   S3 Location: {ep['s3_uri']}")
        print()
else:
    print("‚ùå No InService endpoints found. Please deploy an endpoint first.")

üìä Found 2 InService endpoint(s):
--------------------------------------------------------------------------------
1. dev-endpoint-20250918-141753
   Status: InService
   Created: 2025-09-18 13:18:26.907000+00:00
   Data Capture: ‚úÖ Enabled
   Capture Status: Started
   Sampling: 100%
   S3 Location: s3://sagemaker-model-monitor-006230620263-us-west-2-dev/data-capture

2. dev-endpoint-20250828-084146
   Status: InService
   Created: 2025-08-28 07:42:39.695000+00:00
   Data Capture: ‚úÖ Enabled
   Capture Status: Started
   Sampling: 100%
   S3 Location: s3://sagemaker-model-monitor-006230620263-us-west-2-dev/data-capture



In [4]:
# Select the endpoint to monitor
if endpoints:
    # Auto-select the first endpoint with data capture enabled
    monitoring_endpoint = None
    for ep in endpoints:
        if ep['data_capture_enabled']:
            monitoring_endpoint = ep
            break
    
    if monitoring_endpoint:
        endpoint_name = monitoring_endpoint['name']
        data_capture_s3_uri = monitoring_endpoint['s3_uri']
        
        print(f"üéØ Selected endpoint for monitoring: {endpoint_name}")
        print(f"üìÅ Data capture location: {data_capture_s3_uri}")
    else:
        print("‚ùå No endpoints with data capture enabled found.")
        print("Please ensure your CDK deployment includes data capture configuration.")
        endpoint_name = None
else:
    endpoint_name = None
    print("‚ùå No endpoints available for monitoring.")

üéØ Selected endpoint for monitoring: dev-endpoint-20250918-141753
üìÅ Data capture location: s3://sagemaker-model-monitor-006230620263-us-west-2-dev/data-capture


In [5]:
# Get the data capture configuration for the endpoint
# endpoint_name = "model-deploy-16-21-26-26-staging" # must be set before, but you can use any suitable endpoint

if not endpoint_name:
    print(f"You must have at least on endpoint with data capture configuration enabled!")
else:
    print(f"Checking the data capture configuration for the endpoint {endpoint_name}")
    data_capture_config = sm_client.describe_endpoint(EndpointName=endpoint_name)['DataCaptureConfig']
    data_capture_s3_url = data_capture_config['DestinationS3Uri']
    data_capture_bucket = data_capture_s3_url.split('/')[2]
    data_capture_prefix = '/'.join(data_capture_s3_url.split('/')[3:])

    print(json.dumps(data_capture_config, indent=2))
    print(f"Data capture S3 url: {data_capture_s3_url}")
    
    if not data_capture_config['EnableCapture']:
        print(f"Data capture config for the endpoint {endpoint_name} IS NOT ENABLED. You need to enable data capture for monitoring")

Checking the data capture configuration for the endpoint dev-endpoint-20250918-141753
{
  "EnableCapture": true,
  "CaptureStatus": "Started",
  "CurrentSamplingPercentage": 100,
  "DestinationS3Uri": "s3://sagemaker-model-monitor-006230620263-us-west-2-dev/data-capture"
}
Data capture S3 url: s3://sagemaker-model-monitor-006230620263-us-west-2-dev/data-capture


## 2. Generate Test Traffic for Data Capture

Before setting up monitoring, we need to generate some inference traffic to create captured data that we can use for baseline creation.

In [6]:
def generate_test_traffic(endpoint_name, num_requests=50):
    """Generate test traffic to create captured data for monitoring"""
    
    if not endpoint_name:
        print("‚ùå No endpoint available for traffic generation")
        return False
    
    runtime_client = boto3.client('sagemaker-runtime')
    
    print(f"üöÄ Generating {num_requests} test requests to {endpoint_name}...")
    
    successful_requests = 0
    failed_requests = 0
    
    for i in tqdm(range(num_requests), desc="Sending requests"):
        try:
            # Generate realistic test data for abalone dataset
            sex = np.random.choice([0, 1, 2])
            length = np.random.normal(0.5, 0.1)
            diameter = length * np.random.uniform(0.7, 0.9)
            height = length * np.random.uniform(0.15, 0.25)
            whole_weight = length * diameter * height * np.random.uniform(8, 12)
            shucked_weight = whole_weight * np.random.uniform(0.3, 0.5)
            viscera_weight = whole_weight * np.random.uniform(0.1, 0.2)
            shell_weight = whole_weight * np.random.uniform(0.2, 0.4)
            
            test_data = f"{sex},{length:.3f},{diameter:.3f},{height:.3f},{whole_weight:.3f},{shucked_weight:.3f},{viscera_weight:.3f},{shell_weight:.3f},0.0,0.0"
            
            # Send request
            response = runtime_client.invoke_endpoint(
                EndpointName=endpoint_name,
                ContentType='text/csv',
                Body=test_data
            )
            
            successful_requests += 1
            time.sleep(0.1)  # Small delay
            
        except Exception as e:
            failed_requests += 1
            if failed_requests <= 3:
                print(f"\n‚ùå Request {i+1} failed: {e}")
    
    print(f"\nüìä Traffic generation complete:")
    print(f"   ‚úÖ Successful requests: {successful_requests}")
    print(f"   ‚ùå Failed requests: {failed_requests}")
    
    if successful_requests > 0:
        print(f"\nüí° Data capture is enabled, so these requests are now available for monitoring analysis.")
        return True
    return False

# Generate test traffic
if endpoint_name:
    traffic_generated = generate_test_traffic(endpoint_name, num_requests=30)
else:
    print("‚ö†Ô∏è Skipping traffic generation - no endpoint available")
    traffic_generated = False

üöÄ Generating 30 test requests to dev-endpoint-20250918-141753...


Sending requests: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 30/30 [00:04<00:00,  6.39it/s]


üìä Traffic generation complete:
   ‚úÖ Successful requests: 30
   ‚ùå Failed requests: 0

üí° Data capture is enabled, so these requests are now available for monitoring analysis.





## 3. Verify Captured Data is Available

Let's wait a moment and then check that our traffic has been captured:

In [7]:
def inspect_captured_data(s3_uri, max_files=10):
    """Inspect captured data files in S3"""
    
    if not s3_uri or s3_uri == 'Not Configured':
        print("‚ùå No data capture S3 URI configured")
        return []
    
    try:
        # List captured data files
        captured_files = S3Downloader.list(s3_uri)
        
        if not captured_files:
            print(f"üì≠ No captured data files found in {s3_uri}")
            return []
        
        print(f"üìä Found {len(captured_files)} captured data files")
        print(f"üìÅ Location: {s3_uri}")
        
        # Show recent files
        recent_files = sorted(captured_files, reverse=True)[:max_files]
        print(f"\nüìã Recent files ({len(recent_files)}):")
        for i, file_path in enumerate(recent_files, 1):
            file_name = file_path.split('/')[-1]
            print(f"  {i}. {file_name}")
        
        return recent_files
        
    except Exception as e:
        print(f"‚ùå Error inspecting captured data: {e}")
        return []

# Wait for data capture to process and then check
if traffic_generated:
    print("‚è≥ Waiting for data capture to process requests...")
    time.sleep(30)  # Wait for data capture
    
    # Check captured data
    captured_files = inspect_captured_data(data_capture_s3_uri)
    
    if captured_files:
        print(f"\n‚úÖ Data capture is working! Found {len(captured_files)} files.")
        print("üìä We can now proceed to create monitoring baselines using this captured data.")
        has_captured_data = True
    else:
        print("\n‚ö†Ô∏è No captured data found yet. Monitoring will use synthetic baselines.")
        has_captured_data = False
else:
    has_captured_data = False
    captured_files = []

‚è≥ Waiting for data capture to process requests...
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.Session.DefaultS3Bucket
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.Session.DefaultS3ObjectKeyPrefix
üìä Found 15 captured data files
üìÅ Location: s3://sagemaker-model-monitor-006230620263-us-west-2-dev/data-capture

üìã Recent files (10):
  1. 08-59-272-45c5a082-9938-4b62-b27e-d4d83d021e1f.jsonl
  2. 26-26-810-ab62a649-66b1-4a77-8bd2-95d0d2c1d4d7.jsonl
  3. 42-03-176-550bd082-8191-4b9a-afc1-74c5b78a6337.jsonl
  4. 49-52-915-9f462381-9dd1-47a0-89bc-7182b01c76da.jsonl
  5. 52-51-780-01ee6223-8f2b-46ee-91cb-ebbd36c37c62.jsonl
  6. 34-44-026-73285976-8594-4a4e-b8dc-32f419b2feae.jsonl
  7. 56-17-023-29dc6e17-995a-4469-b068-31192b27353d.jsonl
  8. 20-29-410-896912ca-75f0-49d7-8785-b8467dc37e78.jsonl
  9. 19-20-152-70e6017c-4de5-41b5-9282-2feacc6fcd47.jsonl
  10. 56-44-921-c3cbadea-46a9-4ced-9d8a-8d4f4170c60e.jsonl

In [8]:
# Download a capture data file and print its content
if captured_files:
    file_key = captured_files[-1]  # Get the latest file
    local_path = "./tmp"
    
    # Create tmp directory if it doesn't exist
    import os
    os.makedirs(local_path, exist_ok=True)
    
    print(f"üì• Downloading latest captured file: {file_key.split('/')[-1]}")
    
    # Download the file
    S3Downloader.download(file_key, local_path)
    
    print(f"\nüìÑ Content of the capture file:")
    print("-" * 80)
    
    # Read the jsonl file and show the first object
    import jsonlines
    import json
    
    local_file_path = f"{local_path}/{file_key.split('/')[-1]}"
    
    with jsonlines.open(local_file_path) as reader:
        first_record = reader.read()
        print(json.dumps(first_record, indent=2))
        
        # Optionally show second record if available
        try:
            second_record = reader.read()
            print("\n" + "="*40 + " SECOND RECORD " + "="*40)
            print(json.dumps(second_record, indent=2))
        except:
            print(f"\n(Only one record found in file)")
            
    print("-" * 80)
    print(f"‚úÖ File downloaded to: {local_file_path}")
    
else:
    print("‚ùå No captured files available. Run the data capture inspection cell first.")

üì• Downloading latest captured file: 56-44-921-c3cbadea-46a9-4ced-9d8a-8d4f4170c60e.jsonl
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.Session.DefaultS3Bucket
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.Session.DefaultS3ObjectKeyPrefix

üìÑ Content of the capture file:
--------------------------------------------------------------------------------
{
  "captureData": {
    "endpointInput": {
      "observedContentType": "text/csv",
      "mode": "INPUT",
      "data": "0,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,0.0,0.0",
      "encoding": "CSV"
    },
    "endpointOutput": {
      "observedContentType": "text/csv; charset=utf-8",
      "mode": "OUTPUT",
      "data": "7.912803649902344",
      "encoding": "CSV"
    }
  },
  "eventMetadata": {
    "eventId": "fd57342a-e2d1-4bea-a879-3afdcd979fd7",
    "inferenceTime": "2025-09-20T08:56:44Z"
  },
  "eventVersion": "0"
}

{
  "captureData": {
    "endpointI

## Part 1- Data Quality Monitoring

Now that we have captured data, let's set up data quality monitoring.


In this part you learn how to setup data quality monitoring for SageMaker real-time endpoints.

To enable inference data quality monitoring and evaluation you must:
1. Enable [data capture](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-data-capture.html)
1. [Create a baseline](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-create-baseline.html) with which you compare the realtime traffic
1. Once a baseline is ready, [schedule monitoring jobs](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-scheduling.html) to continously evaluate and compare against the baseline
1. [See and interpret the results](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-interpreting-results.html) of monitoring jobs
1. [Integrate data quality monitoring](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-interpreting-cloudwatch.html) with Amazon CloudWatch

![Data Monitoring Architecture](images/data-monitoring-architecture.png)

### 4.1 Create Baseline Dataset


The whole dataset with which you trained and tested the model is usually a good baseline dataset. Note that the baseline dataset data schema and the inference dataset schema should exactly match (i.e. the number and order of the features).

From the baseline dataset you can ask Amazon SageMaker to suggest a set of baseline _constraints_ and generate descriptive _statistics_ to explore the data. Model Monitor provides a [built-in container](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-pre-built-container.html) that provides the ability to suggest the constraints automatically for CSV and flat JSON input. This `sagemaker-model-monitor-analyzer` container also provides you with a range of model monitoring capabilities, including constraint validation against a baseline, and emitting Amazon CloudWatch metrics. This container is based on Spark and is built with [Deequ](https://github.com/awslabs/deequ). 

<div class="alert alert-info"> üí° <strong> All column names in your baseline dataset must be compliant with Spark. For column names, use only lowercase characters, and _ as the only special character. </strong>
</div>

We'll create a baseline dataset for data quality monitoring using captured data if available, or synthetic data as fallback.

In [12]:
def create_baseline_from_captured_data():
    """Create baseline from captured inference data if available"""
    
    if not has_captured_data or not captured_files:
        return None
    
    try:
        print("üìä Creating baseline from captured inference data...")
        
        # Download and parse captured data
        sample_file = captured_files[0]
        captured_content = S3Downloader.read_file(sample_file)
        
        # Parse JSON lines and extract input data
        baseline_data = []
        for line in captured_content.strip().split('\n'):
            record = json.loads(line)
            if 'captureData' in record and 'endpointInput' in record['captureData']:
                input_data = record['captureData']['endpointInput']['data']
                baseline_data.append(input_data)
        
        if baseline_data:
            print(f"‚úÖ Extracted {len(baseline_data)} samples from captured data")
            return baseline_data
        else:
            print("‚ö†Ô∏è No input data found in captured files")
            return None
            
    except Exception as e:
        print(f"‚ùå Error processing captured data: {e}")
        return None

def create_synthetic_baseline():
    """Create synthetic baseline data as fallback"""
    
    print("üìä Creating synthetic baseline dataset...")
    
    np.random.seed(42)
    n_samples = 100
    baseline_data = []
    
    for _ in range(n_samples):
        sex = np.random.choice([0, 1, 2])
        length = np.random.normal(0.5, 0.1)
        diameter = length * np.random.uniform(0.7, 0.9)
        height = length * np.random.uniform(0.15, 0.25)
        whole_weight = length * diameter * height * np.random.uniform(8, 12)
        shucked_weight = whole_weight * np.random.uniform(0.3, 0.5)
        viscera_weight = whole_weight * np.random.uniform(0.1, 0.2)
        shell_weight = whole_weight * np.random.uniform(0.2, 0.4)
        
        baseline_data.append(f"{sex},{length:.3f},{diameter:.3f},{height:.3f},{whole_weight:.3f},{shucked_weight:.3f},{viscera_weight:.3f},{shell_weight:.3f},0.0,0.0")
    
    return baseline_data

# Create baseline dataset
baseline_data = create_baseline_from_captured_data()
if not baseline_data:
    baseline_data = create_synthetic_baseline()

print(f"üìä Created baseline with {len(baseline_data)} samples")

# Upload baseline dataset to S3
baseline_s3_prefix = f"model-monitor/{endpoint_name}/baseline"
baseline_s3_uri = f"s3://{bucket}/{baseline_s3_prefix}/baseline.csv"

# Save baseline dataset
baseline_csv = "\n".join(baseline_data)
s3_client.put_object(
    Bucket=bucket,
    Key=f"{baseline_s3_prefix}/baseline.csv",
    Body=baseline_csv
)

print(f"‚úÖ Baseline dataset uploaded to: {baseline_s3_uri}")

üìä Creating baseline from captured inference data...
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.Session.DefaultS3Bucket
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.Session.DefaultS3ObjectKeyPrefix
‚úÖ Extracted 12 samples from captured data
üìä Created baseline with 12 samples
‚úÖ Baseline dataset uploaded to: s3://amazon-sagemaker-006230620263-us-west-2-f717bf909848/model-monitor/dev-endpoint-20250918-141753/baseline/baseline.csv


### 4.2 Create Baseline Job

Create a baseline job to establish statistical constraints:

In [27]:
# Create DefaultModelMonitor instance
data_quality_monitor = DefaultModelMonitor(
    role=role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    volume_size_in_gb=20,
    max_runtime_in_seconds=3600,
    sagemaker_session=session
)

# Define S3 paths for baseline job outputs
baseline_job_name = f"data-quality-baseline-{endpoint_name}-{int(time.time())}"
baseline_results_s3_uri = f"s3://{bucket}/{baseline_s3_prefix}/results"

print(f"üöÄ Starting baseline job: {baseline_job_name}")
print(f"üìä Input data: {baseline_s3_uri}")
print(f"üìÅ Output location: {baseline_results_s3_uri}")

# Start baseline job
try:
    data_quality_monitor.suggest_baseline(
        baseline_dataset=baseline_s3_uri,
        dataset_format=DatasetFormat.csv(header=False),
        output_s3_uri=baseline_results_s3_uri,
        job_name=baseline_job_name
    )
    
    print("‚úÖ Baseline job started successfully!")
    print("‚è≥ This will take approximately 10-15 minutes...")
    
except Exception as e:
    print(f"‚ùå Error starting baseline job: {e}")
    baseline_job_name = None

sagemaker.config INFO - Applied value from config key = SageMaker.MonitoringSchedule.MonitoringScheduleConfig.MonitoringJobDefinition.NetworkConfig.VpcConfig.Subnets
sagemaker.config INFO - Applied value from config key = SageMaker.MonitoringSchedule.MonitoringScheduleConfig.MonitoringJobDefinition.NetworkConfig.VpcConfig.SecurityGroupIds
üöÄ Starting baseline job: data-quality-baseline-dev-endpoint-20250918-141753-1758459469
üìä Input data: s3://amazon-sagemaker-006230620263-us-west-2-f717bf909848/model-monitor/dev-endpoint-20250918-141753/baseline/baseline.csv
üìÅ Output location: s3://amazon-sagemaker-006230620263-us-west-2-f717bf909848/model-monitor/dev-endpoint-20250918-141753/baseline/results
.............[34m2025-09-21 12:59:58.142093: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory[0m
[34m2025-09-21 12:59:58.142129: I te

In [28]:
# Monitor baseline job progress
if baseline_job_name:
    print("üìä Monitoring baseline job progress...")
    
    # Wait for job completion with progress updates
    start_time = time.time()
    
    while True:
        try:
            job_desc = sm_client.describe_processing_job(ProcessingJobName=baseline_job_name)
            status = job_desc['ProcessingJobStatus']
            
            elapsed_time = int(time.time() - start_time)
            print(f"\r‚è±Ô∏è  Status: {status} | Elapsed: {elapsed_time//60}m {elapsed_time%60}s", end="", flush=True)
            
            if status in ['Completed', 'Failed', 'Stopped']:
                print(f"\n\nüèÅ Baseline job {status.lower()}!")
                
                if status == 'Completed':
                    print("‚úÖ Baseline creation successful!")
                    baseline_job_completed = True
                else:
                    print(f"‚ùå Baseline job failed with status: {status}")
                    if 'FailureReason' in job_desc:
                        print(f"Failure reason: {job_desc['FailureReason']}")
                    baseline_job_completed = False
                break
                
            time.sleep(30)  # Check every 30 seconds
            
        except KeyboardInterrupt:
            print("\n\n‚è∏Ô∏è  Monitoring interrupted. Job continues running in background.")
            baseline_job_completed = None
            break
        except Exception as e:
            print(f"\n‚ùå Error monitoring job: {e}")
            baseline_job_completed = None
            break
else:
    baseline_job_completed = False

üìä Monitoring baseline job progress...
‚è±Ô∏è  Status: Completed | Elapsed: 0m 0s

üèÅ Baseline job completed!
‚úÖ Baseline creation successful!


### See the generated statistics and constraints
After the baselining jobs finished, it saves the baseline statistics to the statistics.json file and the suggested baseline constraints to the constraints.json file in the location you specify with output_s3_uri

In [29]:
data_quality_monitor.describe_latest_baselining_job()

{'ProcessingInputs': [{'InputName': 'baseline_dataset_input',
   'AppManaged': False,
   'S3Input': {'S3Uri': 's3://amazon-sagemaker-006230620263-us-west-2-f717bf909848/model-monitor/dev-endpoint-20250918-141753/baseline/baseline.csv',
    'LocalPath': '/opt/ml/processing/input/baseline_dataset_input',
    'S3DataType': 'S3Prefix',
    'S3InputMode': 'File',
    'S3DataDistributionType': 'FullyReplicated',
    'S3CompressionType': 'None'}}],
 'ProcessingOutputConfig': {'Outputs': [{'OutputName': 'monitoring_output',
    'S3Output': {'S3Uri': 's3://amazon-sagemaker-006230620263-us-west-2-f717bf909848/model-monitor/dev-endpoint-20250918-141753/baseline/results',
     'LocalPath': '/opt/ml/processing/output',
     'S3UploadMode': 'EndOfJob'},
    'AppManaged': False}]},
 'ProcessingJobName': 'data-quality-baseline-dev-endpoint-20250918-141753-1758459469',
 'ProcessingResources': {'ClusterConfig': {'InstanceCount': 1,
   'InstanceType': 'ml.m5.xlarge',
   'VolumeSizeInGB': 20}},
 'Stopping

In [30]:
!aws s3 ls {baseline_results_s3_uri}/

2025-09-21 13:01:24       1987 constraints.json
2025-09-21 13:01:24      20012 statistics.json


In [31]:
data_statistics_s3_url = f"{baseline_results_s3_uri}/statistics.json"
data_constraints_s3_url = f"{baseline_results_s3_uri}/constraints.json"

#### Copy statistics and constraints JSON files to the Studio EFS:

In [32]:
!aws s3 cp {data_constraints_s3_url} ./tmp/
!aws s3 cp {data_statistics_s3_url} ./tmp/

download: s3://amazon-sagemaker-006230620263-us-west-2-f717bf909848/model-monitor/dev-endpoint-20250918-141753/baseline/results/constraints.json to tmp/constraints.json
download: s3://amazon-sagemaker-006230620263-us-west-2-f717bf909848/model-monitor/dev-endpoint-20250918-141753/baseline/results/statistics.json to tmp/statistics.json


In [33]:
!head -20 tmp/constraints.json

{
  "version" : 0.0,
  "features" : [ {
    "name" : "_c0",
    "inferred_type" : "Integral",
    "completeness" : 1.0,
    "num_constraints" : {
      "is_non_negative" : true
    }
  }, {
    "name" : "_c1",
    "inferred_type" : "Fractional",
    "completeness" : 1.0,
    "num_constraints" : {
      "is_non_negative" : true
    }
  }, {
    "name" : "_c2",
    "inferred_type" : "Fractional",
    "completeness" : 1.0,


In [34]:
!head -20 tmp/statistics.json

{
  "version" : 0.0,
  "dataset" : {
    "item_count" : 30
  },
  "features" : [ {
    "name" : "_c0",
    "inferred_type" : "Integral",
    "numerical_statistics" : {
      "common" : {
        "num_present" : 30,
        "num_missing" : 0
      },
      "mean" : 1.0666666666666667,
      "sum" : 32.0,
      "std_dev" : 0.8137703743822469,
      "min" : 0.0,
      "max" : 2.0,
      "approximate_num_distinct_values" : 3,
      "completeness" : 1.0,


#### Load the generated JSON as Pandas DataFrame and see the content of statistics.json and constaints.json:

In [35]:
baseline_job = data_quality_monitor.latest_baselining_job
statistics_df = pd.json_normalize(baseline_job.baseline_statistics().body_dict["features"])
statistics_df.head()

Unnamed: 0,name,inferred_type,numerical_statistics.common.num_present,numerical_statistics.common.num_missing,numerical_statistics.mean,numerical_statistics.sum,numerical_statistics.std_dev,numerical_statistics.min,numerical_statistics.max,numerical_statistics.approximate_num_distinct_values,numerical_statistics.completeness,numerical_statistics.distribution.kll.buckets,numerical_statistics.distribution.kll.sketch.parameters.c,numerical_statistics.distribution.kll.sketch.parameters.k,numerical_statistics.distribution.kll.sketch.data
0,_c0,Integral,30,0,1.066667,32.0,0.81377,0.0,2.0,3,1.0,"[{'lower_bound': 0.0, 'upper_bound': 0.2, 'cou...",0.64,2048.0,"[[0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 1.0, 0.0, 0.0,..."
1,_c1,Fractional,30,0,0.5196,15.588,0.119817,0.282,0.735,30,1.0,"[{'lower_bound': 0.282, 'upper_bound': 0.3273,...",0.64,2048.0,"[[0.618, 0.662, 0.735, 0.479, 0.472, 0.467, 0...."
2,_c2,Fractional,30,0,0.4129,12.387,0.093845,0.235,0.586,29,1.0,"[{'lower_bound': 0.235, 'upper_bound': 0.2701,...",0.64,2048.0,"[[0.441, 0.527, 0.538, 0.407, 0.332, 0.333, 0...."
3,_c3,Fractional,30,0,0.099367,2.981,0.025073,0.045,0.167,28,1.0,"[{'lower_bound': 0.045, 'upper_bound': 0.0572,...",0.64,2048.0,"[[0.093, 0.105, 0.167, 0.076, 0.087, 0.112, 0...."
4,_c4,Fractional,30,0,0.234167,7.025,0.138781,0.034,0.541,31,1.0,"[{'lower_bound': 0.034, 'upper_bound': 0.0847,...",0.64,2048.0,"[[0.24, 0.342, 0.541, 0.147, 0.158, 0.191, 0.1..."


In [36]:
constraints_df = pd.json_normalize(
    baseline_job.suggested_constraints().body_dict["features"]
)
constraints_df

Unnamed: 0,name,inferred_type,completeness,num_constraints.is_non_negative
0,_c0,Integral,1.0,True
1,_c1,Fractional,1.0,True
2,_c2,Fractional,1.0,True
3,_c3,Fractional,1.0,True
4,_c4,Fractional,1.0,True
5,_c5,Fractional,1.0,True
6,_c6,Fractional,1.0,True
7,_c7,Fractional,1.0,True
8,_c8,Fractional,1.0,True
9,_c9,Fractional,1.0,True


For this dataset the baselining job suggest three constraints:

1. DataType
2. Completeness
3. Is non-negative
Additionally, the Model Monitor prebuilt container does missing and extra column check, baseline drift check, and categorical values check. Refer to Developer Guide for more details.

In a real-world project you can add your own constraints the data must comply with.

Next you schedule and run a monitoring job to validate incoming data against these constraints and statistics.

### 4.3 Create Data Quality Monitoring Schedule

Set up automated monitoring to run on a schedule:
With a monitoring schedule, SageMaker launches processing jobs at a specified frequency to analyze the data collected during a given period. SageMaker provides a [built-in container](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-pre-built-container.html) for performing analysis on tabular datasets. In the processing job, SageMaker compares the dataset for the current analysis with the baseline statistics and constraints and generates a violations report. In addition, CloudWatch metrics are emitted for each data feature under analysis.

In [41]:
# Create data quality monitoring schedule
data_quality_schedule_name = f"{endpoint_name}-data-quality-schedule"
data_quality_reports_uri = f"s3://{bucket}/model-monitor/{endpoint_name}/data-quality-reports"

print(f"üîÑ Creating data quality monitoring schedule: {data_quality_schedule_name}")

try:
    data_quality_monitor.create_monitoring_schedule(
        monitor_schedule_name=data_quality_schedule_name,
        endpoint_input=EndpointInput(
            endpoint_name=endpoint_name,
            destination="/opt/ml/processing/input"
        ),
        output_s3_uri=data_quality_reports_uri,
        schedule_cron_expression=CronExpressionGenerator.hourly(),
        enable_cloudwatch_metrics=True
    )
    
    print(f"‚úÖ Data quality monitoring schedule created successfully")
    print(f"üìä Schedule: Hourly monitoring")
    print(f"üìÅ Reports will be stored at: {data_quality_reports_uri}")
    data_quality_schedule_created = True
    
except Exception as e:
    print(f"‚ùå Error creating data quality monitoring schedule: {e}")
    print("üí° This might be due to existing schedule or permission issues.")
    data_quality_schedule_created = False

üîÑ Creating data quality monitoring schedule: dev-endpoint-20250918-141753-data-quality-schedule
‚úÖ Data quality monitoring schedule created successfully
üìä Schedule: Hourly monitoring
üìÅ Reports will be stored at: s3://amazon-sagemaker-006230620263-us-west-2-f717bf909848/model-monitor/dev-endpoint-20250918-141753/data-quality-reports


## 5. Launch a Manual Monitoring Job

You can launch a monitoring job manually and don't wait until a configured data monitor schedule execution. You created an hourly schedule, so you need to wait until you cross the hour boundary to see some schedule executions.

Since the Model Monitor uses a built-in container and a SageMaker processing job to run analysis of the captured data, you can manually configure and run a monitoring job.

This utils folder contains an implementation of a helper function to manually run a monitoring job.

In [42]:
# Import the monitoring utilities
import sys
sys.path.append('./utils')
from monitoring_utils import run_model_monitor_job

print("‚úÖ Imported monitoring utilities")

‚úÖ Imported monitoring utilities


### Generate non-compliant traffic
Now generate traffic that will trigger the violation in the model monitor data quality check.

In [43]:
# Generate drift traffic first to test monitoring
def generate_drift_traffic(endpoint_name, num_requests=15):
    """Generate traffic that will likely trigger monitoring violations"""
    
    if not endpoint_name:
        print("‚ùå No endpoint available for drift simulation")
        return False
    
    runtime_client = boto3.client('sagemaker-runtime')
    
    print(f"üö® Generating {num_requests} drift-inducing requests...")
    print("   This data is intentionally different from baseline to trigger violations")
    
    successful_requests = 0
    
    for i in tqdm(range(num_requests), desc="Sending drift traffic"):
        try:
            # Generate data that's significantly different from baseline
            if i % 3 == 0:
                # Extremely large values (outliers)
                test_data = "2,2.0,1.8,0.8,5.0,2.5,1.2,1.8,0.0,0.0"
            elif i % 3 == 1:
                # Extremely small values (outliers)
                test_data = "0,0.05,0.04,0.01,0.01,0.005,0.002,0.003,0.0,0.0"
            else:
                # Negative values (constraint violations)
                test_data = "1,-0.5,0.4,0.1,0.6,-0.2,0.1,0.2,0.0,0.0"
            
            runtime_client.invoke_endpoint(
                EndpointName=endpoint_name,
                ContentType='text/csv',
                Body=test_data
            )
            
            successful_requests += 1
            time.sleep(0.2)
            
        except Exception as e:
            print(f"\n‚ùå Request {i+1} failed: {e}")
    
    print(f"\nüìä Generated {successful_requests} drift requests")
    print("‚ö†Ô∏è This should trigger violations in monitoring analysis")
    return successful_requests > 0

# Generate drift traffic to test monitoring
if endpoint_name:
    print("üéØ Generating drift traffic to test monitoring...")
    drift_generated = generate_drift_traffic(endpoint_name, num_requests=12)
    
    if drift_generated:
        print("\n‚è≥ Waiting for data capture to process...")
        time.sleep(30)
        print("\n‚è≥ Completed")
else:
    drift_generated = False

üéØ Generating drift traffic to test monitoring...
üö® Generating 12 drift-inducing requests...
   This data is intentionally different from baseline to trigger violations


Sending drift traffic: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 12/12 [00:03<00:00,  3.80it/s]



üìä Generated 12 drift requests
‚ö†Ô∏è This should trigger violations in monitoring analysis

‚è≥ Waiting for data capture to process...


### Inspect Captured Drift Data

Let's examine the captured data to see the drift traffic we just generated:

In [46]:
def show_captured_drift_data(s3_uri, max_records=3):
    """Show the latest captured data including drift traffic"""
    
    if not s3_uri or not drift_generated:
        print("‚ö†Ô∏è No drift data to examine")
        return
    
    try:
        print("üîç Examining captured drift data...")
        
        # Get latest captured files
        captured_files = S3Downloader.list(s3_uri)
        
        if not captured_files:
            print("üì≠ No captured files found")
            return
        
        # Get the most recent file (should contain drift data)
        latest_file = sorted(captured_files)[-1]
        print(f"üìÑ Latest capture file: {latest_file.split('/')[-1]}")
        
        # Download and parse the file
        file_content = S3Downloader.read_file(latest_file)
        lines = file_content.strip().split('\n')
        
        print(f"üìä File contains {len(lines)} inference records")
        print(f"\nüîç Showing last {min(max_records, len(lines))} records (likely drift data):")
        
        # Show the last few records (most likely to be drift data)
        for i, line in enumerate(lines[-max_records:], 1):
            try:
                record = json.loads(line)
                
                # Extract input and output data
                if 'captureData' in record:
                    capture_data = record['captureData']
                    
                    # Get input data
                    input_data = capture_data.get('endpointInput', {}).get('data', 'N/A')
                    
                    # Get output data
                    output_data = capture_data.get('endpointOutput', {}).get('data', 'N/A')
                    
                    print(f"\nüìù Record {i}:")
                    print(f"   Input:  {input_data}")
                    print(f"   Output: {output_data}")
                    
                    # Analyze if this looks like drift data
                    if input_data != 'N/A':
                        values = input_data.split(',')
                        if len(values) >= 3:
                            try:
                                # Check for extreme values that indicate drift
                                val1, val2, val3 = float(values[1]), float(values[2]), float(values[3])
                                if val1 > 1.5 or val2 > 1.5:
                                    print(f"   üö® DRIFT: Extremely large values detected!")
                                elif val1 < 0 or val2 < 0 or val3 < 0:
                                    print(f"   ‚ö†Ô∏è VIOLATION: Negative values detected!")
                                elif val1 < 0.1 and val2 < 0.1:
                                    print(f"   üö® DRIFT: Extremely small values detected!")
                                else:
                                    print(f"   ‚úÖ Normal range values")
                            except ValueError:
                                print(f"   ‚ùì Could not analyze values")
                
            except json.JSONDecodeError:
                print(f"   ‚ùå Could not parse record {i}")
        
        print(f"\nüí° The drift data shows intentionally problematic values that should trigger violations!")
        
    except Exception as e:
        print(f"‚ùå Error examining drift data: {e}")

# Show captured drift data
if drift_generated:
    show_captured_drift_data(data_capture_s3_uri, max_records=10)
else:
    print("üí° No drift traffic generated to examine")

üîç Examining captured drift data...
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.Session.DefaultS3Bucket
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.Session.DefaultS3ObjectKeyPrefix
üìÑ Latest capture file: 26-26-810-ab62a649-66b1-4a77-8bd2-95d0d2c1d4d7.jsonl
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.Session.DefaultS3Bucket
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.Session.DefaultS3ObjectKeyPrefix
üìä File contains 12 inference records

üîç Showing last 10 records (likely drift data):

üìù Record 1:
   Input:  1,-0.5,0.4,0.1,0.6,-0.2,0.1,0.2,0.0,0.0
   Output: 9.007847785949707
   ‚ö†Ô∏è VIOLATION: Negative values detected!

üìù Record 2:
   Input:  2,2.0,1.8,0.8,5.0,2.5,1.2,1.8,0.0,0.0
   Output: 9.617115020751953
   üö® DRIFT: Extremely large values detected!

üìù Record 3:
   Input:  0,0.05,0.04,0.01,0.01,0.005,0.

### Run Manual Monitoring Job with Util Function

Now let's use the utility function to manually run a monitoring job on the captured data:

In [48]:
# Run manual monitoring job using the utility function
if baseline_job_completed and drift_generated:
    print("üöÄ Running manual monitoring job using utility function...")
    
    # Define paths for the monitoring job
    manual_reports_path = f"s3://{bucket}/model-monitor/{endpoint_name}/manual-reports"
    
    try:
        # Use the utility function to run monitoring job
        processing_job = run_model_monitor_job(
            region=region,
            instance_type='ml.m5.xlarge',
            role=role,
            data_capture_path=data_capture_s3_uri,
            statistics_path=data_statistics_s3_url,
            constraints_path=data_constraints_s3_url,
            reports_path=manual_reports_path,
            instance_count=1,
            publish_cloudwatch_metrics='Disabled',
            wait=False,  # Don't wait for completion
            logs=False
        )
        
        
        print(f"üìÅ Reports will be saved to: {manual_reports_path}")
        print("‚è≥ Job is running in background - will analyze drift data")
        
    except Exception as e:
        print(f"‚ùå Error running manual monitoring job: {e}")
        manual_job_name = None
        
else:
    print("‚ö†Ô∏è Skipping manual monitoring job - prerequisites not met")
    print(f"   Baseline completed: {baseline_job_completed if 'baseline_job_completed' in locals() else False}")
    print(f"   Drift generated: {drift_generated}")
    manual_job_name = None

üöÄ Running manual monitoring job using utility function...
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.Session.DefaultS3Bucket
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.Session.DefaultS3ObjectKeyPrefix
sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.NetworkConfig.VpcConfig.Subnets
sagemaker.config INFO - Applied value from config key = SageMaker.ProcessingJob.NetworkConfig.VpcConfig.SecurityGroupIds
üìÅ Reports will be saved to: s3://amazon-sagemaker-006230620263-us-west-2-f717bf909848/model-monitor/dev-endpoint-20250918-141753/manual-reports
‚è≥ Job is running in background - will analyze drift data


## 6. Check Latest Captured Data and Monitoring Results

Let's examine the latest captured data and check for any monitoring violations:

In [60]:

# Check monitoring schedule executions
def check_monitoring_executions(schedule_name):
    """Check recent monitoring schedule executions"""
    
    try:
        response = sm_client.list_monitoring_executions(
            MonitoringScheduleName=schedule_name,
            MaxResults=2
        )
        
        executions = response.get('MonitoringExecutionSummaries', [])
        
        if executions:
            print(f"üìä Found {len(executions)} recent execution(s):")
            print(executions)
            for execution in executions:
                status = execution.get('MonitoringExecutionStatus', 'Unknown')
                scheduled_time = execution.get('ScheduledTime', 'Unknown')
                
                print(f"   ‚Ä¢ Status: {status} | Scheduled: {scheduled_time}")
                
                if status == 'Completed':
                    print("     ‚úÖ Execution completed - check S3 for reports")
                elif status == 'Failed':
                    print("     ‚ùå Execution failed")
                elif status == 'InProgress':
                    print("     ‚è≥ Execution in progress")
        else:
            print("üì≠ No executions found yet - monitoring jobs run on schedule")
            
    except Exception as e:
        print(f"‚ùå Error checking executions: {e}")

# Check monitoring schedule executions
if data_quality_schedule_created:
    print("\nüîç Checking monitoring schedule executions...")
    check_monitoring_executions(data_quality_schedule_name)
else:
    print("üí° No monitoring schedule to check")


üîç Checking monitoring schedule executions...
üìä Found 2 recent execution(s):
[{'MonitoringScheduleName': 'dev-endpoint-20250918-141753-data-quality-schedule', 'ScheduledTime': datetime.datetime(2025, 9, 21, 13, 0, tzinfo=tzlocal()), 'CreationTime': datetime.datetime(2025, 9, 21, 13, 2, 23, 286000, tzinfo=tzlocal()), 'LastModifiedTime': datetime.datetime(2025, 9, 21, 13, 8, 12, 811000, tzinfo=tzlocal()), 'MonitoringExecutionStatus': 'CompletedWithViolations', 'ProcessingJobArn': 'arn:aws:sagemaker:us-west-2:006230620263:processing-job/model-monitoring-202509211300-f944e75d79c63a5ad7055637', 'EndpointName': 'dev-endpoint-20250918-141753', 'MonitoringJobDefinitionName': 'data-quality-job-definition-2025-09-20-18-04-24-772', 'MonitoringType': 'DataQuality'}, {'MonitoringScheduleName': 'dev-endpoint-20250918-141753-data-quality-schedule', 'ScheduledTime': datetime.datetime(2025, 9, 21, 12, 0, tzinfo=tzlocal()), 'CreationTime': datetime.datetime(2025, 9, 21, 12, 2, 23, 346000, tzinfo=t

## 7. View Monitoring Reports and Violations

Let's examine any monitoring reports that have been generated:

In [61]:
def examine_monitoring_reports(reports_s3_uri):
    """Examine monitoring reports and violations"""
    
    try:
        print(f"üìä Examining reports in: {reports_s3_uri}")
        
        # List report files
        report_files = S3Downloader.list(reports_s3_uri)
        
        if not report_files:
            print("üì≠ No report files found yet")
            print("üí° Reports appear after monitoring jobs complete (may take time)")
            return
        
        print(f"üìÅ Found {len(report_files)} report files:")
        for file_path in report_files[-5:]:  # Show last 5
            file_name = file_path.split('/')[-1]
            print(f"  ‚Ä¢ {file_name}")
        
        # Look for constraint violations
        violation_files = [f for f in report_files if 'constraint_violations.json' in f]
        
        if violation_files:
            latest_violations = violation_files[-1]  # Most recent
            print(f"\nüîç Analyzing violations in: {latest_violations.split('/')[-1]}")
            
            violations_content = S3Downloader.read_file(latest_violations)
            violations_data = json.loads(violations_content)
            
            violations_list = violations_data.get('violations', [])
            
            if violations_list:
                print(f"‚ö†Ô∏è Found {len(violations_list)} violations:")
                
                for i, violation in enumerate(violations_list[:3], 1):  # Show first 3
                    feature = violation.get('feature_name', 'Unknown')
                    check_type = violation.get('constraint_check_type', 'Unknown')
                    description = violation.get('description', 'No description')
                    
                    print(f"\n{i}. Feature: {feature}")
                    print(f"   Check: {check_type}")
                    print(f"   Issue: {description[:80]}..." if len(description) > 80 else f"   Issue: {description}")
                
                print(f"\nüéØ Success! Monitoring detected data drift as expected.")
            else:
                print("‚úÖ No violations detected - data within baseline constraints")
        else:
            print("üìã No violation files found - monitoring may still be processing")
            
    except Exception as e:
        print(f"‚ùå Error examining reports: {e}")

# Examine monitoring reports
if data_quality_schedule_created:
    examine_monitoring_reports(data_quality_reports_uri)
else:
    print("üí° No monitoring reports to examine yet")

üìä Examining reports in: s3://amazon-sagemaker-006230620263-us-west-2-f717bf909848/model-monitor/dev-endpoint-20250918-141753/data-quality-reports
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.Session.DefaultS3Bucket
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.Session.DefaultS3ObjectKeyPrefix
üìÅ Found 2 report files:
  ‚Ä¢ constraint_violations.json
  ‚Ä¢ constraint_violations.json

üîç Analyzing violations in: constraint_violations.json
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.Session.DefaultS3Bucket
sagemaker.config INFO - Applied value from config key = SageMaker.PythonSDK.Modules.Session.DefaultS3ObjectKeyPrefix
‚ö†Ô∏è Found 1 violations:

1. Feature: Extra columns
   Check: extra_column_check
   Issue: There are extra columns in current dataset. Number of columns in current dataset...

üéØ Success! Monitoring detected data drift as expected.


In [62]:
while data_quality_monitor.describe_schedule()["MonitoringScheduleStatus"] != "Scheduled":
    print(f"Waiting until data monitoring schedule status becomes Scheduled")
    time.sleep(3)

data_quality_monitor.describe_schedule()

{'MonitoringScheduleArn': 'arn:aws:sagemaker:us-west-2:006230620263:monitoring-schedule/dev-endpoint-20250918-141753-data-quality-schedule',
 'MonitoringScheduleName': 'dev-endpoint-20250918-141753-data-quality-schedule',
 'MonitoringScheduleStatus': 'Scheduled',
 'MonitoringType': 'DataQuality',
 'CreationTime': datetime.datetime(2025, 9, 21, 13, 20, 0, 617000, tzinfo=tzlocal()),
 'LastModifiedTime': datetime.datetime(2025, 9, 21, 13, 20, 8, 635000, tzinfo=tzlocal()),
 'MonitoringScheduleConfig': {'ScheduleConfig': {'ScheduleExpression': 'cron(0 * ? * * *)'},
  'MonitoringJobDefinitionName': 'data-quality-job-definition-2025-09-21-13-19-59-803',
  'MonitoringType': 'DataQuality'},
 'EndpointName': 'dev-endpoint-20250918-141753',
 'LastMonitoringExecutionSummary': {'MonitoringScheduleName': 'dev-endpoint-20250918-141753-data-quality-schedule',
  'ScheduledTime': datetime.datetime(2025, 9, 21, 13, 0, tzinfo=tzlocal()),
  'CreationTime': datetime.datetime(2025, 9, 21, 13, 2, 23, 286000, 

In [75]:
# Get S3 url for the latest monitoring job output
def get_latest_monitoring_report_s3_url(job_name):
    monitor_job = sm_client.list_processing_jobs(
        NameContains=job_name,
        SortOrder='Descending',
        MaxResults=2
    )['ProcessingJobSummaries'][0]['ProcessingJobName']

    monitoring_job_output_s3_url = sm_client.describe_processing_job(
        ProcessingJobName=monitor_job
    )['ProcessingOutputConfig']['Outputs'][0]['S3Output']['S3Uri']

    print(f"Latest monitoring report S3 url: {monitoring_job_output_s3_url}")
    
    return monitoring_job_output_s3_url

In [76]:
manual_monitoring_job_output_s3_url = get_latest_monitoring_report_s3_url("sagemaker-model-monitor-analyzer")

Latest monitoring report S3 url: s3://amazon-sagemaker-006230620263-us-west-2-f717bf909848/model-monitor/dev-endpoint-20250918-141753/manual-reports/e


In [77]:
!aws s3 ls {manual_monitoring_job_output_s3_url}/

2025-09-21 13:32:43        274 constraint_violations.json


In [81]:
# Helper to load a json file from S3
def load_json_from_file(file_s3_url):
    bucket = file_s3_url.split('/')[2]
    key = '/'.join(file_s3_url.split('/')[3:])
    print(f"Load JSON from: {bucket}/{key}")
    
    return json.loads(
        s3_client.get_object(Bucket=bucket, 
                      Key=key)["Body"].read().decode("utf-8")
    )

In [82]:
violations = load_json_from_file(f"{manual_monitoring_job_output_s3_url}/constraint_violations.json")


Load JSON from: amazon-sagemaker-006230620263-us-west-2-f717bf909848/model-monitor/dev-endpoint-20250918-141753/manual-reports/e/constraint_violations.json


In [83]:
pd.json_normalize(violations["violations"])


Unnamed: 0,feature_name,constraint_check_type,description
0,Extra columns,extra_column_check,There are extra columns in current dataset. Nu...


In [84]:
!aws s3 cp {manual_monitoring_job_output_s3_url}/constraint_violations.json ./tmp/


download: s3://amazon-sagemaker-006230620263-us-west-2-f717bf909848/model-monitor/dev-endpoint-20250918-141753/manual-reports/e/constraint_violations.json to tmp/constraint_violations.json


In [85]:
!head ./tmp/constraint_violations.json


{
  "violations" : [ {
    "feature_name" : "Extra columns",
    "constraint_check_type" : "extra_column_check",
    "description" : "There are extra columns in current dataset. Number of columns in current dataset: 11, Number of columns in baseline constraints: 10"
  } ]
}

## 5. Summary and Next Steps

Congratulations! You have successfully set up model monitoring following the correct flow:

1. ‚úÖ **Discovered endpoints** with data capture enabled
2. ‚úÖ **Generated traffic** to create captured data
3. ‚úÖ **Created baselines** from captured data (or synthetic fallback)
4. ‚úÖ **Set up monitoring schedules** for automated analysis

This follows the same pattern as the reference notebook and represents best practices for model monitoring.

---

## 6. Cleanup Resources 

### 6.1 Stop Monitoring Schedules



In [1]:
def cleanup_monitoring_resources():
    """Clean up monitoring resources to avoid ongoing costs"""
    
    print("üßπ CLEANING UP MONITORING RESOURCES")
    print("=" * 50)
    
    cleanup_actions = []
    
    # Stop data quality monitoring schedule
    if 'data_quality_schedule_name' in locals() and data_quality_schedule_created:
        try:
            data_quality_monitor.stop_monitoring_schedule()
            cleanup_actions.append(f"‚úÖ Stopped data quality schedule: {data_quality_schedule_name}")
        except Exception as e:
            cleanup_actions.append(f"‚ùå Error stopping data quality schedule: {e}")
    
    # Delete monitoring schedules
    try:
        schedules = sm_client.list_monitoring_schedules(
            EndpointName=endpoint_name if endpoint_name else "dummy"
        )['MonitoringScheduleSummaries']
        
        for schedule in schedules:
            schedule_name = schedule['MonitoringScheduleName']
            try:
                sm_client.delete_monitoring_schedule(
                    MonitoringScheduleName=schedule_name
                )
                cleanup_actions.append(f"‚úÖ Deleted schedule: {schedule_name}")
            except Exception as e:
                cleanup_actions.append(f"‚ùå Error deleting {schedule_name}: {e}")
                
    except Exception as e:
        cleanup_actions.append(f"‚ùå Error listing schedules: {e}")
    
    # Show cleanup results
    if cleanup_actions:
        print("\nüìã Cleanup Results:")
        for action in cleanup_actions:
            print(f"   {action}")
    else:
        print("üí° No monitoring resources found to clean up")
    
    print(f"\n‚ö†Ô∏è Note: This does not delete:")
    print(f"   ‚Ä¢ S3 data (baselines, reports, captured data)")
    print(f"   ‚Ä¢ CloudWatch alarms")
    print(f"   ‚Ä¢ EventBridge rules")
    print(f"   ‚Ä¢ The endpoint itself")
    
    return len([a for a in cleanup_actions if "‚úÖ" in a])

# Uncomment the lines below to run cleanup
# cleanup_count = cleanup_monitoring_resources()
# print(f"\nüéØ Cleaned up {cleanup_count} monitoring resources")

print("üí° Cleanup code is ready but commented out for safety.")
print("   Uncomment the lines above if you want to clean up resources.")

üí° Cleanup code is ready but commented out for safety.
   Uncomment the lines above if you want to clean up resources.
