# Model Quality Monitor

## Predictive Maintenance for Pharmaceutical Manufacturing Equipment: Model Quality Monitoring (Artificial Ground Truth for Testing Alarms)

This notebook demonstrates model quality monitoring with artificial ground truth data for testing alarms. It was developed as part of the AAI-540 course and is aligned with Lab 5.

#### Host a trained machine learning model in Amazon SageMaker, then monitor and detect model quality drift


This notebook shows how to:
- Host a machine learning model in Amazon SageMaker and capture inference requests, results, and metadata.
- Generate a baseline of model quality and establish suggested constraints.
- Monitor a live endpoint for violations against these constraints.
- Generate CloudWatch Alarms on model quality drift.


**Table of Contents** 

1. [Introduction](#intro)
2. [Section 1 - Setup](#setup)
3. [Section 2 - Deploy pre-trained model with data capture enabled](#deploy)
4. [Section 3 - Generate baseline for model quality performance](#generate-baseline)
5. [Section 4 - Setup continuous model monitoring to identify model quality drift](#analyze-model-quality-drift)
6. [Section 5 - Analyze model quality CloudWatch metrics](#analyze-cloudwatch-metrics)
7. [Clean up](#cleanup)



## Introduction <a id='intro'></a>    

Amazon SageMaker provides developers and data scientists with a comprehensive, fully-managed ML service that covers the entire ML workflow. SageMaker Model Monitor allows you to maintain high-quality ML models by automatically detecting inaccuracies in model predictions and identifying changes in independent variables.

In this notebook, you will learn how to use SageMaker Model Monitor to assess the performance of the predictive maintenance model deployed for pharmaceutical manufacturing equipment. The model aims to reduce unplanned downtime and ensure regulatory compliance by predicting equipment failures.

### Project Context

The project uses the Predictive Maintenance Dataset from Kaggle, containing 124,000 entries of sensor readings and failure indicators. This dataset was chosen for its rich historical data, which supports the development of a robust predictive model. The Random Forest classifier used in this project is suitable for handling complex sensor data and balancing precision and recall, which is critical in predictive maintenance scenarios.

The system aims to deploy a batch-mode model that provides daily or weekly predictions for scheduled maintenance, as opposed to real-time predictions.

## Section 1 - Setup <a id='setup'></a>

In this section, you will import the necessary libraries and set up the variables used throughout the predictive maintenance model monitoring workflow.

Let's start by specifying:

- **AWS Region**: The region used to host the predictive maintenance model.
- **IAM Role**: The IAM role associated with this SageMaker notebook instance.
- **S3 Bucket**: The S3 bucket storing training data, model artifacts, and captured inference data.

#### 1.1 Import necessary libraries

In [2]:
%%time

from datetime import datetime, timedelta, timezone
import json
import os
import re
import boto3
from time import sleep
from threading import Thread

import pandas as pd

from sagemaker import get_execution_role, Session, image_uris  # `session` module import dropped: the name is reassigned below
from sagemaker.s3 import S3Downloader, S3Uploader
from sagemaker.processing import ProcessingJob
from sagemaker.serializers import CSVSerializer

from sagemaker.model import Model
from sagemaker.model_monitor import DataCaptureConfig

session = Session()

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml
CPU times: user 1.59 s, sys: 250 ms, total: 1.84 s
Wall time: 4.05 s


#### 1.2 AWS region and IAM Role

In [3]:
# Get Execution role
role = get_execution_role()
print("RoleArn:", role)

region = session.boto_region_name
print("Region:", region)

RoleArn: arn:aws:iam::807494057176:role/LabRole
Region: us-east-1


#### 1.3 S3 bucket and prefixes

In [4]:
# Define the S3 bucket (default) and prefix for model monitoring
bucket = session.default_bucket()  # Using the default SageMaker bucket
prefix = "predictive-maintenance-model-monitor"

# Define the S3 paths for data capture, ground truth, and reports
data_capture_prefix = f"{prefix}/datacapture"
s3_capture_upload_path = f"s3://{bucket}/{data_capture_prefix}"
ground_truth_upload_path = f"s3://{bucket}/{prefix}/ground_truth_data/{datetime.now():%Y-%m-%d-%H-%M-%S}"
reports_prefix = f"{prefix}/reports"
s3_report_path = f"s3://{bucket}/{reports_prefix}"

# Model Monitor image URI
monitor_image_uri = image_uris.retrieve(framework="model-monitor", region=region)

# Print setup details
print("S3 Bucket:", bucket)
print(f"Data Capture Path: {s3_capture_upload_path}")
print(f"Ground Truth Path: {ground_truth_upload_path}")
print(f"Report Path: {s3_report_path}")
print(f"Monitor Image URI: {monitor_image_uri}")

S3 Bucket: sagemaker-us-east-1-807494057176
Data Capture Path: s3://sagemaker-us-east-1-807494057176/predictive-maintenance-model-monitor/datacapture
Ground Truth Path: s3://sagemaker-us-east-1-807494057176/predictive-maintenance-model-monitor/ground_truth_data/2024-10-22-23-47-31
Report Path: s3://sagemaker-us-east-1-807494057176/predictive-maintenance-model-monitor/reports
Monitor Image URI: 156813124566.dkr.ecr.us-east-1.amazonaws.com/sagemaker-model-monitor-analyzer


#### 1.4 Test access to the S3 bucket
Verify that the notebook has permission to write to the specified S3 bucket by uploading a small test file.

If this command fails, data capture and model monitoring will not work correctly from this notebook. To resolve the issue, grant the `s3:PutObject` permission to the IAM role associated with this notebook instance, then rerun the check.


In [5]:
from sagemaker.s3 import S3Uploader

# Use the default SageMaker bucket (the same bucket as in section 1.3)
bucket = session.default_bucket()

# Define the local path to the test_data.csv file
file_path = 'test_data.csv'

# Upload the test_data.csv file to the specified S3 bucket
try:
    S3Uploader.upload(file_path, f"s3://{bucket}/test_upload")
    print("Success! You are all set to proceed.")
except FileNotFoundError:
    print(f"Error: File '{file_path}' not found.")
except Exception as e:
    print(f"Error occurred: {e}")

Success! You are all set to proceed.


## Section 2 - Deploy pre-trained model with data capture enabled <a id='deploy'></a>

In this section, we will:
1. Upload the pre-trained XGBoost model to the S3 bucket.
2. Create an Amazon SageMaker Model.
3. Deploy the model to a real-time endpoint with data capture enabled.
4. Create a SageMaker Predictor object to invoke the model.

#### 2.1 Upload the pre-trained model to S3

In this step, you specify the S3 location of the pre-trained XGBoost model artifact (`model.tar.gz`). The artifact is already stored in Amazon S3, so the cell below only records its path. To use your own pre-trained model, set `model_s3_path` to its location.

In [6]:
# Updated model S3 path
model_s3_path = 's3://sagemaker-us-east-1-807494057176/predictive-maintenance-feature-store/output/xgb-2024-10-22-13-55-23/xgb-2024-10-22-13-55-23/output/model.tar.gz'

print(f"Model will be used from: {model_s3_path}")

Model will be used from: s3://sagemaker-us-east-1-807494057176/predictive-maintenance-feature-store/output/xgb-2024-10-22-13-55-23/xgb-2024-10-22-13-55-23/output/model.tar.gz


#### 2.2 Create SageMaker Model entity

This step involves creating an Amazon SageMaker model from the model file uploaded to S3.

In [7]:
from sagemaker.model import Model
from sagemaker import image_uris

# Define model details
model_name = f"sagemaker-xgboost-predictive-maintenance-{datetime.utcnow():%Y-%m-%d-%H-%M-%S}"

# Use the XGBoost image URI from SageMaker
image_uri = image_uris.retrieve(framework="xgboost", version="1.7-1", region=region)

# Create the SageMaker Model entity
model = Model(
    name=model_name,  # Without this, the name printed below is never applied to the model
    image_uri=image_uri,
    model_data=model_s3_path,  # Use the S3 path from 2.1
    role=role,
    sagemaker_session=session
)

print(f"Model entity created with name: {model_name}")

Model entity created with name: sagemaker-xgboost-predictive-maintenance-2024-10-22-23-47-32


#### 2.3 Deploy the model with data capture enabled.
Now, deploy the SageMaker model to a real-time endpoint on a specific instance type, with data capture enabled to record endpoint invocations, predictions, and metadata.

In [8]:
from sagemaker.model_monitor import DataCaptureConfig

# Define endpoint name
endpoint_name = f"sagemaker-xgboost-endpoint-{datetime.utcnow():%Y-%m-%d-%H-%M-%S}"

# Configure data capture
data_capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    destination_s3_uri=s3_capture_upload_path  # Data capture path defined in section 1.3
)

# Deploy the model to an endpoint
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    endpoint_name=endpoint_name,
    data_capture_config=data_capture_config
)

print(f"Model deployed to endpoint: {endpoint_name}")


------!Model deployed to endpoint: sagemaker-xgboost-endpoint-2024-10-22-23-47-32


#### 2.4 Create the SageMaker Predictor object from the endpoint to be used for invoking the model
The SageMaker Predictor object will be used to invoke the model and send data for inference.

In [9]:
from sagemaker.predictor import Predictor
from sagemaker.serializers import CSVSerializer

# Create the Predictor object for invoking the model
predictor = Predictor(
    endpoint_name=endpoint_name,
    sagemaker_session=session,
    serializer=CSVSerializer()
)

print(f"Predictor created for endpoint: {endpoint_name}")

Predictor created for endpoint: sagemaker-xgboost-endpoint-2024-10-22-23-47-32


## Section 3 - Generate a baseline for model quality performance <a id='generate-baseline'></a>

In this section, we will invoke the endpoint created above using validation data. The deployed model's predictions on this validation data serve as the baseline dataset. We will then use SageMaker Model Monitor to run a baseline job that computes model performance metrics and suggests model quality constraints based on the baseline dataset.

#### 3.1 Execute predictions using the validation dataset. 

The deployed model returns the probability that an equipment failure will occur. We'll use a cutoff of 0.8 to classify a failure.

In [10]:
failure_cutoff = 0.8
validate_dataset = "validation_with_predictions.csv"
limit = 200  # Need at least 200 samples to compute standard deviations
i = 0
with open(f"{validate_dataset}", "w") as baseline_file:
    baseline_file.write("probability,prediction,label\n")  # our header
    with open("validation_data.csv", "r") as f:
        for row in f:
            (label, input_cols) = row.split(",", 1)
            probability = float(predictor.predict(input_cols))
            prediction = "1" if probability > failure_cutoff else "0"
            baseline_file.write(f"{probability},{prediction},{label}\n")
            i += 1
            if i > limit:
                break
            print(".", end="", flush=True)
            sleep(0.5)
print()
print("Done!")

........................................................................................................................................................................................................
Done!


#### 3.2 Examine the predictions from the model

To verify the predictions, we will display the first few rows of the baseline dataset.

In [11]:
!head validation_with_predictions.csv

probability,prediction,label
0.9942198991775513,1,1
0.947282075881958,1,1
0.9391994476318359,1,1
0.9983158111572266,1,1
0.9875530004501343,1,1
0.9990541338920593,1,1
0.9955667853355408,1,1
0.9984889030456543,1,1
0.9997400641441345,1,1


#### 3.3 Upload the predictions as a baseline dataset.
Next, upload the baseline dataset to S3 to be used for creating baseline statistics and constraints.

In [12]:
from sagemaker.s3 import S3Uploader

# Define the S3 prefixes for baseline data and results
baseline_prefix = prefix + "/baselining"
baseline_data_prefix = baseline_prefix + "/data"
baseline_results_prefix = baseline_prefix + "/results"

# Construct the S3 URIs
baseline_data_uri = f"s3://{bucket}/{baseline_data_prefix}"
baseline_results_uri = f"s3://{bucket}/{baseline_results_prefix}"

print(f"Baseline data uri: {baseline_data_uri}")
print(f"Baseline results uri: {baseline_results_uri}")

# Upload the validation_with_predictions.csv file to S3
baseline_dataset_uri = S3Uploader.upload(validate_dataset, baseline_data_uri)

# Output the URI of the uploaded dataset
print(f"Baseline dataset uploaded to: {baseline_dataset_uri}")

Baseline data uri: s3://sagemaker-us-east-1-807494057176/predictive-maintenance-model-monitor/baselining/data
Baseline results uri: s3://sagemaker-us-east-1-807494057176/predictive-maintenance-model-monitor/baselining/results
Baseline dataset uploaded to: s3://sagemaker-us-east-1-807494057176/predictive-maintenance-model-monitor/baselining/data/validation_with_predictions.csv


#### 3.4 Create a baselining job with validation dataset predictions
Define the model quality monitoring object and execute the baseline job. This job will generate baseline statistics and constraints based on the validation dataset.

In [13]:
from sagemaker.model_monitor import ModelQualityMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

# Create the ModelQualityMonitor object
predictive_maintenance_monitor = ModelQualityMonitor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    volume_size_in_gb=20,
    max_runtime_in_seconds=1800,
    sagemaker_session=session,
)

# Define the name of the baseline job
baseline_job_name = f"PM-xgb-model-baseline-job-{datetime.utcnow():%Y-%m-%d-%H%M}"

# Execute the baseline suggestion job
job = predictive_maintenance_monitor.suggest_baseline(
    job_name=baseline_job_name,
    baseline_dataset=baseline_dataset_uri,
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri=baseline_results_uri,
    problem_type="BinaryClassification",
    inference_attribute="prediction",
    probability_attribute="probability",
    ground_truth_attribute="label",
)

# Wait for the job to complete
job.wait(logs=False)

print(f"Baseline job '{baseline_job_name}' has completed.")


INFO:sagemaker:Creating processing-job with name PM-xgb-model-baseline-job-2024-10-22-2352


...........................................................!Baseline job 'PM-xgb-model-baseline-job-2024-10-22-2352' has completed.


#### 3.5 Explore the results of the baselining job


##### 3.5.1 View the metrics generated
The baseline statistics and constraints files are already uploaded to the S3 location.

In [14]:
# Retrieve the latest baseline job
baseline_job = predictive_maintenance_monitor.latest_baselining_job

# View the baseline metrics generated
binary_metrics = baseline_job.baseline_statistics().body_dict["binary_classification_metrics"]

# Display the metrics in a structured format
pd.set_option('display.max_rows', None)  # Optional: to view all rows
print(pd.json_normalize(binary_metrics).T)


                                                                                                    0
confusion_matrix.1.1                                                                              200
confusion_matrix.1.0                                                                                1
confusion_matrix.0.1                                                                                0
confusion_matrix.0.0                                                                                0
recall.value                                                                                 0.995025
recall.standard_deviation                                                                     0.00239
precision.value                                                                                   1.0
precision.standard_deviation                                                                      0.0
accuracy.value                                                                    

# Results:

### **Confusion Matrix**
The confusion matrix indicates how well the model classified the validation data:

- **confusion_matrix.1.1 (True Positives): 200**  
  The model correctly identified 200 cases as "failures" (positive class), which represents accurate detection of equipment failures.

- **confusion_matrix.1.0 (False Negatives): 1**  
  The model missed 1 case of actual failure, misclassifying it as "no failure." This suggests a low rate of missed failures.

- **confusion_matrix.0.1 (False Positives): 0**  
  No cases were mistakenly classified as failures when they weren’t. This indicates that the model didn't produce false alarms.

- **confusion_matrix.0.0 (True Negatives): 0**  
  No instances were correctly identified as "no failure." Together with the zero false positives, this shows that all 201 sampled validation rows were actual failures, so the baseline contains no negative cases.

### **Key Metrics**
- **Recall.value: 0.995025**  
  Recall measures the model's ability to identify actual failures. At ~99.5%, it indicates that the model is highly effective at detecting true failures, with a low chance of missing them.

- **Recall.standard_deviation: 0.00239**  
  This low standard deviation shows that the recall is consistent across the validation data.

- **Precision.value: 1.0**  
  Precision measures the accuracy of the model’s positive predictions. A precision of 1.0 means that every failure predicted by the model was indeed a failure, indicating no false alarms.

- **Precision.standard_deviation: 0.0**  
  The zero standard deviation suggests perfect consistency in the model’s positive predictions.

- **Accuracy.value: 0.995025**  
  Accuracy represents the proportion of correct predictions (both positive and negative). At ~99.5%, the model accurately predicts failures most of the time.

- **Accuracy.standard_deviation: 0.002447**  
  The accuracy is consistent across the validation data, with a low standard deviation.

### **Other Metrics**
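- **AUC (Area Under the ROC Curve): 1.0**  
  The AUC measures the model's ability to distinguish between classes. Because this baseline sample contains no actual "no failure" cases (both negative cells of the confusion matrix are 0), the reported 1.0 cannot be taken as evidence that the model separates failures from non-failures.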

- **AU PRC (Area Under the Precision-Recall Curve): 1.0**  
  This metric measures the model’s performance in handling imbalanced datasets. A score of 1.0 indicates excellent precision-recall balance.

- **F1 Score: 0.997506**  
  The F1 score balances precision and recall. At ~99.7%, it reflects a strong balance between detecting failures and avoiding false positives.

- **F2 Score: 0.996016**  
  The F2 score places more weight on recall than precision, making it suitable for applications where missing failures is more costly. The model’s F2 score (~99.6%) suggests high effectiveness in this context.
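
The reported values can be reproduced by hand from the four confusion-matrix counts. The sketch below is only an illustration of the standard formulas, not the code that the baseline job runs; the helper name `f_beta` is our own.

```python
def f_beta(tp, fp, fn, beta):
    """F-beta score computed directly from confusion-matrix counts."""
    b2 = beta ** 2
    denominator = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denominator if denominator else float("nan")


# Counts taken from the baseline output above.
tp, fn, fp, tn = 200, 1, 0, 0

recall = tp / (tp + fn)                      # share of actual failures detected
precision = tp / (tp + fp)                   # share of predicted failures that were real
accuracy = (tp + tn) / (tp + fn + fp + tn)   # share of all predictions that were correct

print(f"recall    = {recall:.6f}")                      # 0.995025
print(f"precision = {precision:.6f}")                   # 1.000000
print(f"accuracy  = {accuracy:.6f}")                    # 0.995025
print(f"f0.5      = {f_beta(tp, fp, fn, 0.5):.6f}")     # 0.999001
print(f"f1        = {f_beta(tp, fp, fn, 1.0):.6f}")     # 0.997506
print(f"f2        = {f_beta(tp, fp, fn, 2.0):.6f}")     # 0.996016
```

Recall and accuracy coincide here because the sample has no negative cases, so every row counts toward both.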

### **Interpretation**
The results indicate a highly accurate and consistent model with minimal false negatives on this sample. The single false negative still represents a risk of missing a failure, which could be costly in this context. The model also achieves perfect precision, meaning no false positives occurred. However, because the sample contains no actual "no failure" cases, this baseline cannot show how often the model would raise false alarms.

The model demonstrates a strong ability to detect equipment failures, which aligns well with the goal of minimizing unplanned downtime. Continued monitoring and periodic updates will be needed to confirm and maintain this level of performance over time.

##### 3.5.2 View the constraints generated
Check the suggested constraints from the baseline job to ensure model quality.

In [15]:
# View the suggested constraints from the baseline job
constraints = pd.DataFrame(baseline_job.suggested_constraints().body_dict["binary_classification_constraints"])

# Display the constraints in a structured format
print(constraints.T)

                    threshold   comparison_operator
recall               0.995025     LessThanThreshold
precision                 1.0     LessThanThreshold
accuracy             0.995025     LessThanThreshold
true_positive_rate   0.995025     LessThanThreshold
true_negative_rate       None     LessThanThreshold
false_positive_rate      None  GreaterThanThreshold
false_negative_rate  0.004975  GreaterThanThreshold
auc                       1.0     LessThanThreshold
f0_5                 0.999001     LessThanThreshold
f1                   0.997506     LessThanThreshold
f2                   0.996016     LessThanThreshold


In the above example, the model quality monitor suggested a constraint to ensure that the model’s F2 score does not drop below **0.996**. Some of the generated constraints, such as **precision**, may be a bit aggressive, as they will trigger alerts for any drops below **1.0**. It is advisable to adjust these constraints as needed before implementing them for ongoing monitoring.
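
To make the effect of relaxing a constraint concrete, here is a minimal sketch of how a threshold and its comparison operator could be evaluated. It assumes that `LessThanThreshold` flags a metric that falls below the threshold and `GreaterThanThreshold` flags one that rises above it, which matches the description above. The function `find_violations`, the `observed` metrics, and the relaxed value of 0.95 are all hypothetical and are not part of the SageMaker SDK or of this notebook's results.

```python
def find_violations(metrics, constraints):
    """Return the names of metrics that breach their constraint."""
    violations = []
    for name, rule in constraints.items():
        threshold = rule["threshold"]
        value = metrics.get(name)
        if threshold is None or value is None:
            continue  # nothing to compare against
        operator = rule["comparison_operator"]
        if operator == "LessThanThreshold" and value < threshold:
            violations.append(name)
        elif operator == "GreaterThanThreshold" and value > threshold:
            violations.append(name)
    return violations


# A few of the suggested constraints from the table above.
suggested = {
    "precision": {"threshold": 1.0, "comparison_operator": "LessThanThreshold"},
    "f2": {"threshold": 0.996016, "comparison_operator": "LessThanThreshold"},
    "false_negative_rate": {"threshold": 0.004975, "comparison_operator": "GreaterThanThreshold"},
    "true_negative_rate": {"threshold": None, "comparison_operator": "LessThanThreshold"},
}

# Hypothetical metrics from a later monitoring run (illustrative values only).
observed = {"precision": 0.98, "f2": 0.997, "false_negative_rate": 0.003}

print(find_violations(observed, suggested))  # ['precision']

# Relax only the precision constraint; 0.95 is an arbitrary illustrative value.
relaxed = {**suggested, "precision": {**suggested["precision"], "threshold": 0.95}}
print(find_violations(observed, relaxed))    # []
```

With the suggested threshold of 1.0, a single false positive is enough to raise a precision violation; the relaxed copy tolerates it.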

## Section 4 - Setup continuous model monitoring to identify model quality drift <a id='analyze-model-quality-drift'></a>

In this section, we will set up a continuous model monitoring job that tracks the quality of the deployed predictive maintenance model against the baseline established in the previous section. This monitoring is crucial to ensure that the model's performance remains consistent and does not degrade over time.

In addition to the baseline metrics and constraints, Amazon SageMaker Model Quality Monitoring requires two additional inputs:

1. **Predictions Made by the Deployed Model Endpoint**: This data is automatically captured in S3, as data capture was enabled during deployment.
2. **Ground Truth Data**: This represents the actual outcomes that occur after predictions are made, such as whether the equipment truly failed or not. Ground truth data is essential for measuring model accuracy over time. In the context of predictive maintenance, this could be obtained from maintenance records or operational logs indicating actual equipment failures.

To simulate real-world conditions, we will generate synthetic ground truth data. This approach allows us to evaluate the monitoring system’s ability to detect deviations from the baseline performance and ensure that the model maintains its predictive accuracy in identifying equipment failures.

The steps in this section include:
1. **Generate Prediction Data**: We will simulate traffic by sending data to the model endpoint to capture predictions.
2. **Generate Synthetic Ground Truth Data**: We will create synthetic ground truth data to compare with predictions.
3. **Create a Monitoring Schedule**: We will set up the monitoring job to run at regular intervals, checking the model's predictions against the ground truth data.
4. **View Captured Data**: We will examine the captured data in S3 to ensure it is properly logged and aligned with the ground truth.
5. **Analyze Violations**: We will review potential violations compared to the baseline to detect any significant quality drift.

By the end of this section, we will have established a continuous monitoring setup that ensures the predictive maintenance model maintains its quality and consistency, minimizing the risk of undetected equipment failures.

#### 4.1 Generate prediction data for Model Quality Monitoring

We will begin by generating some artificial traffic to simulate inference requests to the model endpoint. The code cell below starts a thread that continuously sends data to the endpoint, triggering predictions and enabling data capture.

It's important to note that we need to keep this process running to ensure that data is continuously sent to the endpoint, as the monitoring jobs depend on having enough data to process. If no traffic is detected, the monitoring jobs will be marked as "Failed" due to the lack of incoming data.

In [16]:
def invoke_endpoint(ep_name, file_name):
    with open(file_name, "r") as f:
        i = 0
        for row in f:
            payload = row.rstrip("\n")
            response = session.sagemaker_runtime_client.invoke_endpoint(
                EndpointName=ep_name,  # use the function argument, not the global
                ContentType="text/csv",
                Body=payload,
                InferenceId=str(i),  # unique ID per row
            )["Body"].read()
            i += 1
            sleep(1)


def invoke_endpoint_forever():
    while True:
        try:
            invoke_endpoint(endpoint_name, "batch_data_noID.csv")
        except session.sagemaker_runtime_client.exceptions.ValidationError:
            pass


thread = Thread(target=invoke_endpoint_forever)
thread.start()

Note: We set the `InferenceId` parameter when invoking the endpoint. It is recorded as `inferenceId` in the captured data and allows us to associate each prediction with the corresponding ground truth record during model quality evaluation.

#### 4.2 View captured data

We will now list the data capture files stored in Amazon S3. These files are expected to be organized based on the time period in which each invocation occurred. The file path structure in S3 follows this format:

`s3://{destination-bucket-prefix}/{endpoint-name}/{variant-name}/yyyy/mm/dd/hh/filename.jsonl`
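
As an illustration of this layout, the sketch below splits a capture file URI into its parts with a regular expression. The pattern, the helper `parse_capture_uri`, and the example URI (bucket, endpoint, and file name) are hypothetical and only mirror the format shown above.

```python
import re

# One named group per component of the documented path format.
CAPTURE_URI = re.compile(
    r"^s3://(?P<prefix>.+)/(?P<endpoint>[^/]+)/(?P<variant>[^/]+)/"
    r"(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/(?P<hour>\d{2})/"
    r"(?P<filename>[^/]+\.jsonl)$"
)


def parse_capture_uri(uri):
    """Return the path components as a dict, or None if the URI does not match."""
    match = CAPTURE_URI.match(uri)
    return match.groupdict() if match else None


# Hypothetical URI that follows the format above.
example_uri = (
    "s3://example-bucket/datacapture/my-endpoint/AllTraffic/"
    "2024/10/22/23/51-05-826-example.jsonl"
)

parts = parse_capture_uri(example_uri)
print(parts["endpoint"], parts["variant"])                            # my-endpoint AllTraffic
print(parts["year"], parts["month"], parts["day"], parts["hour"])    # 2024 10 22 23
```

Grouping files by the `hour` component is a convenient way to see which capture files a given hourly monitoring run would cover.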

In [17]:
print("Waiting for captures to show up", end="")
for _ in range(120):
    capture_files = sorted(S3Downloader.list(f"{s3_capture_upload_path}/{endpoint_name}"))
    if capture_files:
        capture_file = S3Downloader.read_file(capture_files[-1]).split("\n")
        capture_record = json.loads(capture_file[0])
        if "inferenceId" in capture_record["eventMetadata"]:
            break
    print(".", end="", flush=True)
    sleep(1)
print()
print("Found Capture Files:")
print("\n ".join(capture_files[-3:]))

Waiting for captures to show up.................................
Found Capture Files:
s3://sagemaker-us-east-1-807494057176/predictive-maintenance-model-monitor/datacapture/sagemaker-xgboost-endpoint-2024-10-22-23-47-32/AllTraffic/2024/10/22/23/51-05-826-008bad9e-9650-4900-9765-1f73eeb6336c.jsonl
 s3://sagemaker-us-east-1-807494057176/predictive-maintenance-model-monitor/datacapture/sagemaker-xgboost-endpoint-2024-10-22-23-47-32/AllTraffic/2024/10/22/23/52-06-188-f7852649-01e1-4643-9bff-1798838df9dc.jsonl
 s3://sagemaker-us-east-1-807494057176/predictive-maintenance-model-monitor/datacapture/sagemaker-xgboost-endpoint-2024-10-22-23-47-32/AllTraffic/2024/10/22/23/57-52-827-0ec4dec3-0b95-4dd0-8630-c22be0058955.jsonl


Next, we will examine the contents of one of the captured data files stored in S3. These files are formatted as Amazon SageMaker-specific JSON-lines, capturing both input and output data from the model endpoint.

In [18]:
print("\n".join(capture_file[-3:-1]))

{"captureData":{"endpointInput":{"observedContentType":"text/csv","mode":"INPUT","data":"160599109,4526,0,744,10,245268,8,8,3","encoding":"CSV"},"endpointOutput":{"observedContentType":"text/csv; charset=utf-8","mode":"OUTPUT","data":"0.9904484152793884\n","encoding":"CSV"}},"eventMetadata":{"eventId":"8e1e6286-a4bf-4d26-91f6-95c9cdbb0f35","inferenceId":"56","inferenceTime":"2024-10-22T23:58:51Z"},"eventVersion":"0"}
{"captureData":{"endpointInput":{"observedContentType":"text/csv","mode":"INPUT","data":"241881004,3362,0,70,6,113180,15,15,0","encoding":"CSV"},"endpointOutput":{"observedContentType":"text/csv; charset=utf-8","mode":"OUTPUT","data":"0.9877150058746338\n","encoding":"CSV"}},"eventMetadata":{"eventId":"6e349854-fac2-4251-8e51-e564a8baf1c0","inferenceId":"57","inferenceTime":"2024-10-22T23:58:52Z"},"eventVersion":"0"}


The contents of a single captured record are shown below as formatted JSON for easier reading.

Key components to note:
- **`endpointInput`**: Contains information about the input data sent to the endpoint, including its content type, mode, data, and encoding.
- **`endpointOutput`**: Includes the model's output, such as content type, mode, the prediction data, and encoding.
- **`eventMetadata`**: Provides metadata about the event, including **`eventId`**, **`inferenceId`**, and **`inferenceTime`**.
- **`eventVersion`**: Indicates the version of the event format.

Notice the **`inferenceId`** attribute set during the `invoke_endpoint` call. This attribute is used to link prediction data with the corresponding ground truth data. If **`inferenceId`** is unavailable, **`eventId`** will be used instead for this purpose.

In [19]:
print(json.dumps(capture_record, indent=2))

{
  "captureData": {
    "endpointInput": {
      "observedContentType": "text/csv",
      "mode": "INPUT",
      "data": "64506368,0,0,0,8,296181,0,0,0",
      "encoding": "CSV"
    },
    "endpointOutput": {
      "observedContentType": "text/csv; charset=utf-8",
      "mode": "OUTPUT",
      "data": "0.08915980160236359\n",
      "encoding": "CSV"
    }
  },
  "eventMetadata": {
    "eventId": "4d056182-e5df-4a00-96cd-191e104e2239",
    "inferenceId": "0",
    "inferenceTime": "2024-10-22T23:57:52Z"
  },
  "eventVersion": "0"
}
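
The fields described above can be pulled out of a capture line with the standard `json` module. The following is a minimal sketch, assuming every record has the structure shown; the helper `summarize_capture` is our own, and the sample record reuses the values printed above.

```python
import json


def summarize_capture(line):
    """Extract the join ID, input features, and predicted probability from one capture line."""
    record = json.loads(line)
    metadata = record["eventMetadata"]
    capture = record["captureData"]
    return {
        # inferenceId links to ground truth; fall back to eventId if it is absent
        "join_id": metadata.get("inferenceId") or metadata["eventId"],
        "features": capture["endpointInput"]["data"].split(","),
        "probability": float(capture["endpointOutput"]["data"].strip()),
        "inference_time": metadata["inferenceTime"],
    }


# Sample record built from the formatted output above.
sample_line = json.dumps({
    "captureData": {
        "endpointInput": {"observedContentType": "text/csv", "mode": "INPUT",
                          "data": "64506368,0,0,0,8,296181,0,0,0", "encoding": "CSV"},
        "endpointOutput": {"observedContentType": "text/csv; charset=utf-8", "mode": "OUTPUT",
                           "data": "0.08915980160236359\n", "encoding": "CSV"},
    },
    "eventMetadata": {"eventId": "4d056182-e5df-4a00-96cd-191e104e2239",
                      "inferenceId": "0", "inferenceTime": "2024-10-22T23:57:52Z"},
    "eventVersion": "0",
})

summary = summarize_capture(sample_line)
print(summary["join_id"], summary["probability"], len(summary["features"]))
```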


#### 4.3 Generate synthetic ground truth

Now, we will generate synthetic ground truth data to simulate the actual outcomes of equipment conditions. This is crucial, as the model quality monitoring job requires ground truth data to merge with the captured prediction data. Without it, the monitoring job will fail.

The synthetic ground truth will assign a label of "1" (indicating failure) 70% of the time, mimicking a scenario with a higher rate of failures.

In [20]:
import random


def ground_truth_with_id(inference_id):
    random.seed(inference_id)  # to get consistent results
    rand = random.random()
    return {
        "groundTruthData": {
            "data": "1" if rand < 0.7 else "0",  # randomly generate positive labels 70% of the time
            "encoding": "CSV",
        },
        "eventMetadata": {
            "eventId": str(inference_id),
        },
        "eventVersion": "0",
    }


def upload_ground_truth(records, upload_time):
    fake_records = [json.dumps(r) for r in records]
    data_to_upload = "\n".join(fake_records)
    target_s3_uri = f"{ground_truth_upload_path}/{upload_time:%Y/%m/%d/%H/%M%S}.jsonl"
    print(f"Uploading {len(fake_records)} records to", target_s3_uri)
    S3Uploader.upload_string_as_file_body(data_to_upload, target_s3_uri)

In [21]:
NUM_GROUND_TRUTH_RECORDS = 334  # number of rows in the data we send for inference


def generate_fake_ground_truth_forever():
    while True:
        fake_records = [ground_truth_with_id(i) for i in range(NUM_GROUND_TRUTH_RECORDS)]
        upload_ground_truth(fake_records, datetime.utcnow())
        sleep(60 * 60)  # do this once an hour


gt_thread = Thread(target=generate_fake_ground_truth_forever)
gt_thread.start()

Uploading 334 records to s3://sagemaker-us-east-1-807494057176/predictive-maintenance-model-monitor/ground_truth_data/2024-10-22-23-47-31/2024/10/22/23/5856.jsonl
Uploading 334 records to s3://sagemaker-us-east-1-807494057176/predictive-maintenance-model-monitor/ground_truth_data/2024-10-22-23-47-31/2024/10/23/00/5857.jsonl
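
The upload paths above come from the format string in `upload_ground_truth`, which places each file under a `yyyy/mm/dd/hh` prefix based on the upload time. The sketch below reproduces that key layout in isolation. The helper name `ground_truth_key` and the base URI are hypothetical; only the format string is taken from the notebook.

```python
from datetime import datetime


def ground_truth_key(base_uri, upload_time):
    """Build the hour-partitioned object URI used for a ground truth upload."""
    return f"{base_uri}/{upload_time:%Y/%m/%d/%H/%M%S}.jsonl"


base_uri = "s3://example-bucket/ground_truth_data"  # hypothetical location

print(ground_truth_key(base_uri, datetime(2024, 10, 22, 23, 58, 56)))
# s3://example-bucket/ground_truth_data/2024/10/22/23/5856.jsonl

print(ground_truth_key(base_uri, datetime(2024, 10, 23, 0, 58, 57)))
# s3://example-bucket/ground_truth_data/2024/10/23/00/5857.jsonl
```

The file name is only the minute and second of the upload, so two uploads within the same hour land in the same prefix under different names.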


#### 4.4 Create a monitoring schedule

With the baseline metrics established and synthetic ground truth data being generated, we can now set up a monitoring schedule. This schedule will trigger a model quality monitoring job at regular intervals, checking for deviations in model performance.

The monitoring job will compare the captured predictions against the ground truth data and generate alerts if the model's performance drops below the baseline thresholds.

In [22]:
# Define the monitoring schedule name
monitor_schedule_name = f"PM-xgb-predictive-maintenance-schedule-{datetime.utcnow():%Y-%m-%d-%H%M}"

# Output the name for verification
print(f"Monitoring Schedule Name: {monitor_schedule_name}")

Monitoring Schedule Name: PM-xgb-predictive-maintenance-schedule-2024-10-22-2358


To configure the monitoring schedule, we need to define how to interpret the model's output. The endpoint in this notebook outputs predictions in CSV format, where the first column (index 0) contains the probability of failure.

For this predictive maintenance model, we set a cutoff of 0.5 to classify the prediction as a positive label. In this case, a probability above 0.5 indicates a failure prediction, while a value below 0.5 indicates no failure.

This configuration ensures that the monitoring schedule accurately interprets the endpoint’s output for model quality assessments and drift detection.
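
To make the interpretation rule concrete, here is a small sketch of how a CSV response could be turned into a label with the 0.5 cutoff. The function names are hypothetical, and the handling of a probability exactly equal to the cutoff (treated here as "no failure") is an assumption, since the text above only covers values above and below 0.5.

```python
def to_label(probability: float, threshold: float = 0.5) -> int:
    """Return 1 (failure predicted) if probability exceeds threshold, else 0."""
    return 1 if probability > threshold else 0


def parse_csv_prediction(body: str, threshold: float = 0.5) -> int:
    """Read the probability from column 0 of a CSV response and threshold it."""
    probability = float(body.strip().split(",")[0])
    return to_label(probability, threshold)


print(parse_csv_prediction("0.91\n"))  # probability above the cutoff
print(parse_csv_prediction("0.12\n"))  # probability below the cutoff
```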

In [23]:
from sagemaker.model_monitor import EndpointInput

# Create an EndpointInput object
endpoint_input = EndpointInput(
    endpoint_name=predictor.endpoint_name,
    probability_attribute="0",  # Index for the probability output (e.g., first column)
    probability_threshold_attribute=0.5,  # Threshold for binary classification (e.g., 0.5)
    destination="/opt/ml/processing/input_data"
)

# Print for verification
print(f"Endpoint Input Created: {endpoint_input}")

Endpoint Input Created: <sagemaker.model_monitor.model_monitoring.EndpointInput object at 0x7f31ab8821d0>


In [24]:
from sagemaker.model_monitor import CronExpressionGenerator

# Create the monitoring schedule to execute every hour
response = predictive_maintenance_monitor.create_monitoring_schedule(
    monitor_schedule_name=monitor_schedule_name,
    endpoint_input=endpoint_input,
    output_s3_uri=baseline_results_uri,
    problem_type="BinaryClassification",
    ground_truth_input=ground_truth_upload_path,
    constraints=baseline_job.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
    enable_cloudwatch_metrics=True,
)

# Print the response for verification
print("Monitoring schedule created:", response)

INFO:sagemaker.model_monitor.model_monitoring:Creating Monitoring Schedule with name: PM-xgb-predictive-maintenance-schedule-2024-10-22-2358


Monitoring schedule created: None


In [25]:
# Describe the monitoring schedule
schedule_description = predictive_maintenance_monitor.describe_schedule()

# Print the schedule details for verification
print("Monitoring Schedule Details:")
print(schedule_description)


Monitoring Schedule Details:
{'MonitoringScheduleArn': 'arn:aws:sagemaker:us-east-1:807494057176:monitoring-schedule/PM-xgb-predictive-maintenance-schedule-2024-10-22-2358', 'MonitoringScheduleName': 'PM-xgb-predictive-maintenance-schedule-2024-10-22-2358', 'MonitoringScheduleStatus': 'Pending', 'MonitoringType': 'ModelQuality', 'CreationTime': datetime.datetime(2024, 10, 22, 23, 58, 57, 562000, tzinfo=tzlocal()), 'LastModifiedTime': datetime.datetime(2024, 10, 22, 23, 58, 57, 635000, tzinfo=tzlocal()), 'MonitoringScheduleConfig': {'ScheduleConfig': {'ScheduleExpression': 'cron(0 * ? * * *)'}, 'MonitoringJobDefinitionName': 'model-quality-job-definition-2024-10-22-23-58-56-974', 'MonitoringType': 'ModelQuality'}, 'EndpointName': 'sagemaker-xgboost-endpoint-2024-10-22-23-47-32', 'ResponseMetadata': {'RequestId': '28dd1c12-bfbb-438a-9aed-010e82cf031a', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '28dd1c12-bfbb-438a-9aed-010e82cf031a', 'content-type': 'application/x-amz-jso

#### 4.5 Examine monitoring schedule executions

In [26]:
# List the executions for the created monitoring schedule
executions = predictive_maintenance_monitor.list_executions()

# Print the list of executions for verification
print("List of Executions:")
for execution in executions:
    print(execution)



List of Executions:


In [27]:
# Wait for the first execution of the monitoring_schedule
print("Waiting for first execution", end="")
while True:
    execution = predictive_maintenance_monitor.describe_schedule().get(
        "LastMonitoringExecutionSummary"
    )
    if execution:
        break
    print(".", end="", flush=True)
    sleep(10)
print()
print("Execution found!")

Waiting for first execution.....................................
Execution found!
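
The polling loop above runs until an execution appears and has no upper bound on how long it waits. As a hedged alternative, the sketch below shows a generic polling helper with a timeout. The name `wait_for` and its default values are assumptions chosen for illustration (the timeout is arbitrary and only slightly longer than the hourly schedule); it is not part of the SageMaker SDK.

```python
import time


def wait_for(check, interval=10.0, timeout=3900.0, sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns a truthy value or timeout seconds elapse."""
    deadline = clock() + timeout
    while True:
        result = check()
        if result:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        sleep(interval)
```

In this notebook it could be called as `wait_for(lambda: predictive_maintenance_monitor.describe_schedule().get("LastMonitoringExecutionSummary"))`. The `sleep` and `clock` parameters exist only so the helper can be exercised without real waiting.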


In [28]:
while not executions:
    executions = predictive_maintenance_monitor.list_executions()
    print(".", end="", flush=True)
    sleep(10)
latest_execution = executions[-1]
latest_execution.describe()

...............

{'ProcessingInputs': [{'InputName': 'groundtruth_input_1',
   'AppManaged': False,
   'S3Input': {'S3Uri': 's3://sagemaker-us-east-1-807494057176/predictive-maintenance-model-monitor/ground_truth_data/2024-10-22-23-47-31/2024/10/22/23',
    'LocalPath': '/opt/ml/processing/groundtruth/2024/10/22/23',
    'S3DataType': 'S3Prefix',
    'S3InputMode': 'File',
    'S3DataDistributionType': 'FullyReplicated',
    'S3CompressionType': 'None'}},
  {'InputName': 'endpoint_input_1',
   'AppManaged': False,
   'S3Input': {'S3Uri': 's3://sagemaker-us-east-1-807494057176/predictive-maintenance-model-monitor/datacapture/sagemaker-xgboost-endpoint-2024-10-22-23-47-32/AllTraffic/2024/10/22/23',
    'LocalPath': '/opt/ml/processing/input_data/sagemaker-xgboost-endpoint-2024-10-22-23-47-32/AllTraffic/2024/10/22/23',
    'S3DataType': 'S3Prefix',
    'S3InputMode': 'File',
    'S3DataDistributionType': 'FullyReplicated',
    'S3CompressionType': 'None'}}],
 'ProcessingOutputConfig': {'Outputs': [{'Outpu

### Inspect a Specific Execution (Latest Execution)

In the previous step, we retrieved the latest execution of the monitoring schedule. Here’s an overview of the potential terminal states and what they indicate:

- **Completed**: The monitoring execution finished successfully, with no constraint violations detected in the report.
- **CompletedWithViolations**: The execution completed, but constraint violations were found, indicating possible quality drift.
- **Failed**: The monitoring execution failed, potentially due to client-side errors (e.g., incorrect role permissions) or infrastructure issues. To identify the cause, we need to review the `FailureReason` and `ExitMessage`.
- **Stopped**: The job was either stopped manually or exceeded the maximum runtime.

This inspection allows us to understand the current status of the monitoring job and respond accordingly:

- If **violations** are detected, we should review the report for further details.
- If the job **failed**, we need to investigate based on the provided failure reason and exit message.
- If the job was **stopped**, we should verify whether this was intentional or due to a runtime limit.

This approach ensures that we can effectively manage and maintain model quality, taking corrective action when necessary to keep the model performing within acceptable limits.
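
The status handling described above can be sketched as a small lookup. The function names and the wording of the actions are hypothetical. The sketch also assumes that both `Completed` and `CompletedWithViolations` executions leave a report worth inspecting, which follows from the state descriptions above.

```python
# Terminal states listed in the overview above.
TERMINAL_STATUSES = {"Completed", "CompletedWithViolations", "Failed", "Stopped"}


def next_action(status: str) -> str:
    """Map a monitoring execution status to a suggested follow-up."""
    actions = {
        "Completed": "no violations; nothing to do",
        "CompletedWithViolations": "review the constraint violations report",
        "Failed": "check FailureReason and ExitMessage",
        "Stopped": "confirm whether the stop was manual or a runtime limit",
    }
    return actions.get(status, "still running; wait")


def has_report(status: str) -> bool:
    """True when the execution finished and produced a report to inspect."""
    return status in {"Completed", "CompletedWithViolations"}


for s in ["InProgress", "CompletedWithViolations", "Failed"]:
    print(s, "->", next_action(s), "| report available:", has_report(s))
```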

In [30]:
status = execution["MonitoringExecutionStatus"]

while status in ["Pending", "InProgress"]:
    print("Waiting for execution to finish", end="")
    latest_execution.wait(logs=False)
    latest_job = latest_execution.describe()
    print()
    print(f"{latest_job['ProcessingJobName']} job status:", latest_job["ProcessingJobStatus"])
    print(
        f"{latest_job['ProcessingJobName']} job exit message, if any:",
        latest_job.get("ExitMessage"),
    )
    print(
        f"{latest_job['ProcessingJobName']} job failure reason, if any:",
        latest_job.get("FailureReason"),
    )
    sleep(
        30
    )  # model quality executions consist of two Processing jobs, wait for second job to start
    latest_execution = predictive_maintenance_monitor.list_executions()[-1]
    execution = predictive_maintenance_monitor.describe_schedule()["LastMonitoringExecutionSummary"]
    status = execution["MonitoringExecutionStatus"]

print("Execution status is:", status)

if status != "Completed":
    print(execution)
    print(
        "====STOP==== \n No completed executions to inspect further. Please wait till an execution completes or investigate previously reported failures."
    )

Waiting for execution to finish!
groundtruth-merge-202410230000-c9dc23b38db6393a3ca9aa0c job status: Completed
groundtruth-merge-202410230000-c9dc23b38db6393a3ca9aa0c job exit message, if any: None
groundtruth-merge-202410230000-c9dc23b38db6393a3ca9aa0c job failure reason, if any: None
Execution status is: CompletedWithViolations
{'MonitoringScheduleName': 'PM-xgb-predictive-maintenance-schedule-2024-10-22-2358', 'ScheduledTime': datetime.datetime(2024, 10, 23, 0, 0, tzinfo=tzlocal()), 'CreationTime': datetime.datetime(2024, 10, 23, 0, 5, 10, 289000, tzinfo=tzlocal()), 'LastModifiedTime': datetime.datetime(2024, 10, 23, 0, 21, 58, 674000, tzinfo=tzlocal()), 'MonitoringExecutionStatus': 'CompletedWithViolations', 'ProcessingJobArn': 'arn:aws:sagemaker:us-east-1:807494057176:processing-job/model-quality-monitoring-202410230000-c9dc23b38db6393a3ca9aa0c', 'EndpointName': 'sagemaker-xgboost-endpoint-2024-10-22-23-47-32'}
====STOP==== 
 No completed executions to inspect further. Please wait

In [32]:
latest_execution = predictive_maintenance_monitor.list_executions()[-1]
report_uri = latest_execution.describe()["ProcessingOutputConfig"]["Outputs"][0]["S3Output"][
    "S3Uri"
]
print("Report Uri:", report_uri)

Report Uri: s3://sagemaker-us-east-1-807494057176/predictive-maintenance-model-monitor/baselining/results/sagemaker-xgboost-endpoint-2024-10-22-23-47-32/PM-xgb-predictive-maintenance-schedule-2024-10-22-2358/2024/10/23/00


#### 4.6 View violations generated by monitoring schedule

If any violations were detected during the monitoring execution, they are recorded in the violations report generated by Amazon SageMaker Model Monitor. These reports are uploaded to the specified S3 location.

In [33]:
pd.options.display.max_colwidth = None
violations = latest_execution.constraint_violations().body_dict["violations"]
violations_df = pd.json_normalize(violations)
violations_df.head(10)

| | constraint_check_type | description | metric_name |
|---|---|---|---|
| 0 | LessThanThreshold | Metric auc with 0.4617049617049617 was LessThanThreshold '1.0' | auc |
| 1 | LessThanThreshold | Metric precision with 0.7391304347826086 was LessThanThreshold '1.0' | precision |
| 2 | LessThanThreshold | Metric truePositiveRate with 0.18681318681318682 was LessThanThreshold '0.9950248756218906' | truePositiveRate |
| 3 | LessThanThreshold | Metric f1 with 0.2982456140350877 was LessThanThreshold '0.9975062344139651' | f1 |
| 4 | LessThanThreshold | Metric accuracy with 0.3548387096774194 was LessThanThreshold '0.9950248756218906' | accuracy |
| 5 | GreaterThanThreshold | Metric falseNegativeRate with 0.8131868131868132 was GreaterThanThreshold '0.00497512437810943' | falseNegativeRate |
| 6 | LessThanThreshold | Metric recall with 0.18681318681318682 was LessThanThreshold '0.9950248756218906' | recall |
| 7 | LessThanThreshold | Metric f2 with 0.2196382428940568 was LessThanThreshold '0.9960159362549802' | f2 |


#### Interpretation
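- The **F2 score** (0.22) is well below the threshold set during baseline creation (0.996), indicating that the model's recall-focused performance has dropped.
- **AUC** (0.46), **precision** (0.74), **truePositiveRate** and **recall** (both 0.19), **F1** (0.30), and **accuracy** (0.35) are also below their baseline thresholds, suggesting a decrease in the model's ability to distinguish between classes.
- The **falseNegativeRate** (0.81) is far above the baseline threshold (0.005), indicating that the model is missing most failures according to the supplied labels.
- Because the ground truth labels in this notebook are generated at random (positive 70% of the time), these violations are expected. They show that the alarm pipeline works rather than pointing to real model degradation.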
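
As a consistency check on the report, the F1 and F2 values can be recomputed from the reported precision and recall with the standard F-beta formula, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). The sketch below is not part of the monitoring job, and the function name `f_beta` is hypothetical. The inputs are copied from the violations report above.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score; beta > 1 weights recall more heavily than precision."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return 0.0 if denom == 0 else (1 + b2) * precision * recall / denom


# Values taken from the violations report.
precision = 0.7391304347826086
recall = 0.18681318681318682

f1 = f_beta(precision, recall, beta=1.0)
f2 = f_beta(precision, recall, beta=2.0)
print(f"f1={f1:.6f} f2={f2:.6f}")
```

Both results agree with the `f1` and `f2` rows of the report to within floating-point rounding. F2 falls below F1 here because recall is much lower than precision and F2 weights recall more heavily.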

#### Next Steps
- **Investigate the cause** of these violations, which could be due to changes in data distribution, model performance degradation, or other factors.
- **Retrain or fine-tune the model** if significant drifts are observed.
- **Adjust baseline thresholds** if necessary, ensuring that they are set realistically to balance precision and recall without triggering unnecessary alerts.

Monitoring these violations helps maintain model performance, ensuring that predictive maintenance remains effective and reliable over time.

## Section 5 - Analyze model quality CloudWatch metrics <a id='analyze-cloudwatch-metrics'></a> 

In addition to generating violation reports, the monitoring schedule also emits CloudWatch metrics that provide detailed insights into the model's performance over time. By tracking these metrics, we can set up CloudWatch alarms to automatically alert us when the model’s performance drifts from baseline thresholds. This allows us to take remedial actions like model retraining or updating the training dataset to maintain model quality.

#### 5.1 List the CloudWatch metrics generated

In [36]:
# Create CloudWatch client
cw_client = boto3.Session().client("cloudwatch")

namespace = "aws/sagemaker/Endpoints/model-metrics"

cw_dimensions = [
    {"Name": "Endpoint", "Value": endpoint_name},
    {"Name": "MonitoringSchedule", "Value": monitor_schedule_name},
]

In [37]:
# List metrics through the pagination interface
paginator = cw_client.get_paginator("list_metrics")

for response in paginator.paginate(Dimensions=cw_dimensions, Namespace=namespace):
    model_quality_metrics = response["Metrics"]
    for metric in model_quality_metrics:
        print(metric["MetricName"])

recall
total_number_of_violations
f0_5_best_constant_classifier
f1_best_constant_classifier
f2
f0_5
precision_best_constant_classifier
accuracy
auc
true_positive_rate
au_prc
recall_best_constant_classifier
f1
precision
f2_best_constant_classifier
true_negative_rate
accuracy_best_constant_classifier
false_negative_rate
false_positive_rate
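
Note that the violations report earlier in this notebook names metrics in camelCase (for example `truePositiveRate`), while the CloudWatch metric names listed here use snake_case (`true_positive_rate`). The helper below is a hypothetical convenience for looking up the CloudWatch metric that corresponds to a violation. The mapping is inferred only from the names visible in this notebook and is not a documented guarantee for every metric.

```python
import re


def camel_to_snake(name: str) -> str:
    """Insert an underscore before each interior capital letter, then lowercase."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


for violation_metric in ["truePositiveRate", "falseNegativeRate", "f2", "auc"]:
    print(violation_metric, "->", camel_to_snake(violation_metric))
```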


#### 5.2 Create a CloudWatch Alarm

Based on the metrics emitted by the model quality monitoring schedule, we can set up a CloudWatch alarm to alert us when the F2 score falls to or below a chosen threshold. In production this threshold would typically be derived from the baseline constraints; in this notebook it is set to 0.5 so that the alarm can be observed quickly. This ensures we are promptly notified about potential model performance degradation.

In [40]:
alarm_name = "MODEL_QUALITY_F2_SCORE"
alarm_desc = (
    "Trigger a CloudWatch alarm when the f2 score drifts away from the baseline constraints"
)
model_quality_f2_drift_threshold = (
    0.5  # Set purposefully low to see the alarm quickly.
)
metric_name = "f2"
namespace = "aws/sagemaker/Endpoints/model-metrics"

cw_client.put_metric_alarm(
    AlarmName=alarm_name,
    AlarmDescription=alarm_desc,
    ActionsEnabled=True,
    MetricName=metric_name,
    Namespace=namespace,
    Statistic="Average",
    Dimensions=[
        {"Name": "Endpoint", "Value": endpoint_name},
        {"Name": "MonitoringSchedule", "Value": monitor_schedule_name},
    ],
    Period=600,
    EvaluationPeriods=1,
    DatapointsToAlarm=1,
    Threshold=model_quality_f2_drift_threshold,
    ComparisonOperator="LessThanOrEqualToThreshold",
    TreatMissingData="breaching",
)

{'ResponseMetadata': {'RequestId': '3e03841b-f00d-40ce-8d51-d5185439a10f',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '3e03841b-f00d-40ce-8d51-d5185439a10f',
   'content-type': 'text/xml',
   'content-length': '214',
   'date': 'Wed, 23 Oct 2024 01:01:49 GMT'},
  'RetryAttempts': 0}}
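
To illustrate what the alarm settings above mean for a single evaluation period, here is a deliberately simplified sketch. It covers only the case configured here (one evaluation period, one datapoint to alarm, `LessThanOrEqualToThreshold`, missing data treated as breaching); the real CloudWatch evaluation logic handles more cases. The function name `evaluate_datapoint` is hypothetical.

```python
from typing import Optional


def evaluate_datapoint(value: Optional[float], threshold: float = 0.5) -> str:
    """Simplified single-period evaluation mirroring the alarm settings above."""
    if value is None:
        # TreatMissingData="breaching": a missing datapoint counts as a breach.
        return "ALARM"
    # ComparisonOperator="LessThanOrEqualToThreshold"
    return "ALARM" if value <= threshold else "OK"


print(evaluate_datapoint(0.2196382428940568))  # the f2 value from the violations report
print(evaluate_datapoint(0.9))
print(evaluate_datapoint(None))
```

With the synthetic ground truth used in this notebook, the reported F2 value is below 0.5, so the first call falls on the alarm side of the comparison.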

#### 5.3 Validation
Within a few minutes, the alarm should appear in CloudWatch. It starts in the `INSUFFICIENT_DATA` state and then moves to the `ALARM` state. You can verify this in the CloudWatch console.

Once the alarm fires, you can decide what action to take. One possible action is updating the training data and retraining the model.


## Clean up <a id='cleanup'></a>  

You can keep your endpoint running to continue capturing data. If you do not plan to collect more data or use this endpoint further, you should delete the endpoint to avoid incurring additional charges. Note that deleting your endpoint does not delete the data that was captured during the model invocations. That data persists in Amazon S3 until you delete it yourself.

Before deleting the endpoint, you must first delete the monitoring schedule.

In [2]:
predictive_maintenance_monitor.delete_monitoring_schedule()
sleep(60)  # actually wait for the deletion


In [3]:
predictor.delete_model()
predictor.delete_endpoint()


