# SageMaker Model Monitor with Batch Transform - Model Quality Monitoring On-schedule

In this notebook, we use SageMaker Model Monitor to monitor the model quality of a batch transform job.

Model quality monitoring jobs monitor the performance of a model by comparing the predictions that the model makes with the actual ground truth labels that the model attempts to predict. To do this, model quality monitoring merges the data captured at inference time (in this notebook, from a batch transform job) with the actual labels that you store in an Amazon S3 bucket, and then compares the predictions with those labels.
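
To make the merge-and-compare idea concrete, here is a minimal sketch in plain Python. It is not how Model Monitor is implemented; the `merge_and_score` helper, the record layout (dictionaries keyed by an inference ID), and the sample values are all assumptions made for illustration.

```python
def merge_and_score(predictions, ground_truth):
    """Join predictions and labels on a shared ID, then compute binary metrics.

    Both arguments map an inference ID to a 0/1 label.
    Only IDs present in both dictionaries are scored.
    """
    shared_ids = sorted(set(predictions) & set(ground_truth))
    tp = fp = tn = fn = 0
    for inference_id in shared_ids:
        predicted = predictions[inference_id]
        actual = ground_truth[inference_id]
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    total = tp + fp + tn + fn
    return {
        "merged": total,
        "accuracy": (tp + tn) / total if total else None,
        "precision": tp / (tp + fp) if (tp + fp) else None,
        "recall": tp / (tp + fn) if (tp + fn) else None,
    }


# Made-up sample data: four predictions, labels available for three of them.
sample_predictions = {"id-1": 1, "id-2": 0, "id-3": 1, "id-4": 0}
sample_ground_truth = {"id-1": 1, "id-2": 0, "id-3": 0}
score = merge_and_score(sample_predictions, sample_ground_truth)
print(score)
```

Predictions without a matching label (`id-4` here) are simply left out of the metrics in this sketch, which is why matching IDs between the inference output and the ground truth matters later in the notebook.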

### Setup

In [None]:
import sys

! pip install --upgrade pip
!{sys.executable} -m pip install sagemaker==2.114.0
!{sys.executable} -m pip install -U boto3

In [None]:
import sys

!{sys.executable} -m pip show sagemaker

If you run this notebook in SageMaker Studio, make sure the latest SageMaker Python SDK is installed, then restart the kernel. To do so, uncomment the code in the next cell and run it.

In [None]:
# import IPython
# IPython.Application.instance().kernel.do_shutdown(True)  # has to restart kernel so changes are used

In [None]:
%%time

# A handful of configuration values

import os
import boto3
import re
import json
from datetime import datetime, timedelta
from sagemaker import get_execution_role, session
import pandas as pd

region = boto3.Session().region_name

role = get_execution_role()
print("RoleArn: {}".format(role))

# You can use a different bucket, but make sure the role you chose for this notebook
# has the s3:PutObject permissions. This is the bucket into which the data is captured
bucket = session.Session(boto3.Session()).default_bucket()
print("Demo Bucket: {}".format(bucket))
prefix = "sagemaker/DEMO-ModelMonitor"

data_capture_prefix = "{}/datacapture".format(prefix)
s3_capture_upload_path = "s3://{}/{}".format(bucket, data_capture_prefix)
reports_prefix = "{}/reports".format(prefix)
s3_report_path = "s3://{}/{}".format(bucket, reports_prefix)

transform_output_path = "s3://{}/{}/transform-outputs".format(bucket, prefix)

print("Transform Output path: {}".format(transform_output_path))
print("Capture path: {}".format(s3_capture_upload_path))
print("Report path: {}".format(s3_report_path))

In [None]:
s3 = boto3.client("s3")

### 1) Create a model in Amazon SageMaker
Create a SageMaker Model from a pre-trained churn prediction model.

In [None]:
s3_key = os.path.join(prefix, "xgb-churn-prediction-model.tar.gz")
# use a context manager so the local file handle is closed after the upload
with open("model/xgb-churn-prediction-model.tar.gz", "rb") as model_file:
    boto3.Session().resource("s3").Bucket(bucket).Object(s3_key).upload_fileobj(model_file)

In [None]:
from time import gmtime, strftime
from sagemaker.model import Model
from sagemaker.image_uris import retrieve

model_name = "DEMO-xgb-churn-pred-model-monitor-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
# point at the uploaded artifact with an S3 URI
model_url = "s3://{}/{}/xgb-churn-prediction-model.tar.gz".format(bucket, prefix)

image_uri = retrieve("xgboost", boto3.Session().region_name, "0.90-1")

model = Model(image_uri=image_uri, model_data=model_url, role=role)

### 2) Upload the test data that is used as input for the Batch Transform Job

In [None]:
!aws s3 cp test_data/test-dataset-input-cols.csv s3://{bucket}/transform-input/test-dataset-input-cols.csv

### 3) Create the Batch Transform Job

In [None]:
from sagemaker.inputs import BatchDataCaptureConfig

In [None]:
transfomer = model.transformer(
    instance_count=1,
    instance_type="ml.m4.xlarge",
    accept="text/csv",
    assemble_with="Line",
    output_path=transform_output_path,
)

transfomer.transform(
    "s3://{}/transform-input".format(bucket),
    content_type="text/csv",
    split_type="Line",
    # we join the input and the output (you can set this to None)
    join_source="Input",
    # configure the data capturing
    batch_data_capture_config=BatchDataCaptureConfig(
        destination_s3_uri=s3_capture_upload_path,
        # set it to true for model quality monitoring
        generate_inference_id=True,
    ),
    # wait=True,
)

### 4) Examine the Batch Transform Output

#### Captured data

There are two directories under `s3_capture_upload_path`: `/input` and `/output`. The `/input` directory holds the captured data file for the transform input, and the `/output` directory holds the captured data file for the transform output. Unlike endpoint data capture, batch transform data capture does not copy the data itself to S3, because that would create a large amount of duplication. Instead, batch transform captures data in manifests, which contain the S3 locations of the source transform input or output.

Let's take a look at the captured data.

In [None]:
!aws s3 ls {s3_capture_upload_path}/input/ --recursive

In [None]:
captured_input_s3_key = [
    k["Key"]
    for k in s3.list_objects_v2(Bucket=bucket, Prefix=f"{data_capture_prefix}/input/")["Contents"]
]
assert len(captured_input_s3_key) > 0

In [None]:
sample_input_body = s3.get_object(Bucket=bucket, Key=captured_input_s3_key[0])["Body"]
sample_input_content = json.loads(sample_input_body.read())

In [None]:
sample_input_content
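
As a rough illustration of how a manifest can be turned back into full S3 locations, the sketch below assumes the captured manifest follows the SageMaker manifest-file layout (a `{"prefix": ...}` object followed by relative keys). That layout, the `resolve_manifest` helper, and the example bucket name are assumptions; compare them with the actual `sample_input_content` shown above before relying on this.

```python
def resolve_manifest(manifest):
    """Expand a manifest of the form [{"prefix": "s3://..."}, "key1", "key2", ...]."""
    if not manifest or not isinstance(manifest[0], dict) or "prefix" not in manifest[0]:
        raise ValueError("expected the first manifest entry to be a {'prefix': ...} object")
    prefix = manifest[0]["prefix"]
    # every remaining entry is treated as a key relative to the prefix
    return [prefix + relative_key for relative_key in manifest[1:]]


# Hypothetical manifest content, not taken from a real capture.
example_manifest = [
    {"prefix": "s3://example-bucket/transform-input/"},
    "test-dataset-input-cols.csv",
]
resolved = resolve_manifest(example_manifest)
print(resolved)
```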

Similarly, the captured data for the transform output is available here.

In [None]:
!aws s3 ls {s3_capture_upload_path}/output/ --recursive

In [None]:
# list the captured manifests for the transform output
captured_output_s3_key = [
    k["Key"]
    for k in s3.list_objects_v2(Bucket=bucket, Prefix=f"{data_capture_prefix}/output/")["Contents"]
]
assert len(captured_output_s3_key) > 0
sample_output_body = s3.get_object(Bucket=bucket, Key=captured_output_s3_key[0])["Body"]
sample_output_content = json.loads(sample_output_body.read())

In [None]:
sample_output_content

#### Batch Transform Inference Result

Because the `generate_inference_id` flag is turned on, an inference ID and an inference time (the time the transform job started) are appended to each record in the `.out` file during inference. If your input file is a CSV, the inference ID and inference time are always appended as the last two columns. If your input file is JSON, the `SageMakerInferenceId` and `SageMakerInferenceTime` attributes are added.

In [None]:
output_prefix = transfomer.output_path.split("/")[-1]

In [None]:
inf_output_s3_key = [
    k["Key"]
    for k in s3.list_objects_v2(Bucket=bucket, Prefix=f"{prefix}/{output_prefix}")["Contents"]
]
assert len(inf_output_s3_key) > 0

In [None]:
bucket, inf_output_s3_key

In [None]:
sample_inf_output_body = s3.get_object(Bucket=bucket, Key=inf_output_s3_key[0])["Body"]
inf_outputs = sample_inf_output_body.read().decode("utf-8").strip().split("\n")
# show the first output row so the appended columns are visible
print(inf_outputs[0])

In each output row, the second-to-last element is the inference ID and the last is the inference time. These two columns are needed to run model quality monitoring: they are used to match each prediction with the ground truth you provide.

Because we joined the input and output, the third-to-last column is the inference result and the remaining columns are the inputs.
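
The column layout described above can be sketched as a small parser. The `split_joined_row` helper and the example row are made up for illustration, and the sketch assumes a plain comma split is safe (no quoted commas in any field).

```python
def split_joined_row(row):
    """Split one joined CSV output row into its four logical parts."""
    parts = row.strip().split(",")
    if len(parts) < 4:
        raise ValueError("expected at least one feature plus three trailing columns")
    return {
        "features": parts[:-3],           # the original transform input columns
        "prediction": float(parts[-3]),   # the inference result
        "inference_id": parts[-2],        # used to match the ground truth
        "inference_time": parts[-1],
    }


# Hypothetical row: four input columns, then prediction, inference ID, inference time.
example_row = "0.1,12,0,1,0.0372,3f1c9a2e-example-id,2022-10-20T18:00:00Z"
parsed = split_joined_row(example_row)
print(parsed["prediction"], parsed["inference_id"], parsed["inference_time"])
```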

### 5) Prepare the Ground Truth

For details on how ground truth labels are ingested and merged with predictions, see https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality-merge.html

In [None]:
# read the ground truth for the test dataset
test_gt = pd.read_csv("./test_data/test-dataset-gt-col.csv", header=None)
assert len(test_gt) == len(inf_outputs)

In [None]:
tmp_monitoring_gt_dir = "./test_gt"
!rm -rf {tmp_monitoring_gt_dir}
os.makedirs(tmp_monitoring_gt_dir, exist_ok=True)

In [None]:
for idx, inf_output in enumerate(inf_outputs):
    inf_components = inf_output.split(",")
    # the inference id is the second-to-last column of each output row
    inference_id = inf_components[-2]
    gt_record = {
        "groundTruthData": {
            # note that the data has to be a comma delimited string
            "data": str(test_gt.values[idx, 0]),
            "encoding": "CSV",
        },
        "eventMetadata": {
            "eventId": inference_id,
        },
        "eventVersion": "0",
    }
    # use a context manager so each file is flushed and closed
    with open(os.path.join(tmp_monitoring_gt_dir, f"{inference_id}.jsonl"), "w") as gt_file:
        json.dump(gt_record, gt_file)
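
The record written in the cell above can also be produced by a small helper, which makes the shape easier to inspect without touching the file system. The field names are the ones used in the cell above; the `build_ground_truth_line` name and the sample values are hypothetical.

```python
import json


def build_ground_truth_line(inference_id, label):
    """Serialize one ground truth record as a single JSON line."""
    record = {
        "groundTruthData": {
            "data": str(label),  # the label is passed as a string
            "encoding": "CSV",
        },
        "eventMetadata": {
            "eventId": inference_id,  # must equal the inference ID in the transform output
        },
        "eventVersion": "0",
    }
    return json.dumps(record)


line = build_ground_truth_line("example-inference-id", 1)
print(line)
```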

Let's upload the ground truth data to S3.

In [None]:
gt_s3_uri = f"s3://{bucket}/{prefix}/GT"
# upload the ground truth under the previous, current, and next UTC hour
for i in range(-1, 2):
    curr_utc = datetime.utcnow() + timedelta(hours=i)

    # zero-padded yyyy/mm/dd/hh path suffix
    time_suffix = curr_utc.strftime("%Y/%m/%d/%H")
    gt_s3_dst = f"{gt_s3_uri}/{time_suffix}"

    !aws s3 cp --recursive {tmp_monitoring_gt_dir} {gt_s3_dst}
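
To see which hourly prefixes the upload loop above produces, including across a day or year boundary, here is a small standalone sketch. The `hourly_prefixes` helper and the example bucket are assumptions for illustration; only the `yyyy/mm/dd/hh` layout comes from the cell above.

```python
from datetime import datetime, timedelta


def hourly_prefixes(base_uri, center, hours_before=1, hours_after=1):
    """Return one zero-padded yyyy/mm/dd/hh prefix per hour around `center`."""
    prefixes = []
    for offset in range(-hours_before, hours_after + 1):
        hour = center + timedelta(hours=offset)
        prefixes.append(f"{base_uri}/{hour.strftime('%Y/%m/%d/%H')}")
    return prefixes


# A fixed timestamp just before midnight on New Year's Eve shows the rollover.
example_prefixes = hourly_prefixes("s3://example-bucket/GT", datetime(2022, 12, 31, 23, 15))
for example_prefix in example_prefixes:
    print(example_prefix)
```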

### 6) Create a Baseline that is used by Model Monitor
In general, this can be done in parallel with the Transform Job.

In [None]:
# copy over the training dataset to Amazon S3 (if you already have it in Amazon S3, you could reuse it)
baseline_prefix = prefix + "/baselining"
baseline_data_prefix = baseline_prefix + "/data"
baseline_results_prefix = baseline_prefix + "/results"

baseline_data_uri = "s3://{}/{}".format(bucket, baseline_data_prefix)
baseline_results_uri = "s3://{}/{}".format(bucket, baseline_results_prefix)
print("Baseline data uri: {}".format(baseline_data_uri))
print("Baseline results uri: {}".format(baseline_results_uri))

In [None]:
s3_key = os.path.join(baseline_prefix, "data", "training-dataset-with-header.csv")
# use a context manager so the local file handle is closed after the upload
with open("test_data/training-dataset-with-header.csv", "rb") as training_data_file:
    boto3.Session().resource("s3").Bucket(bucket).Object(s3_key).upload_fileobj(training_data_file)

In [None]:
from sagemaker.model_monitor import ModelQualityMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

my_default_monitor = ModelQualityMonitor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    volume_size_in_gb=30,
    max_runtime_in_seconds=1800,
)

my_default_monitor.suggest_baseline(
    baseline_dataset=baseline_data_uri + "/training-dataset-with-header.csv",
    problem_type="BinaryClassification",
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri=baseline_results_uri,
    ground_truth_attribute="Churn",
    # for demonstration purposes, we set inference_attribute to the same column as
    # ground_truth_attribute; realistically, we recommend running the model against the
    # training dataset inputs and using its predictions as the inference attribute
    inference_attribute="Churn",
    wait=True,
)

In [None]:
s3_client = boto3.Session().client("s3")
result = s3_client.list_objects(Bucket=bucket, Prefix=baseline_results_prefix)
report_files = [report_file.get("Key") for report_file in result.get("Contents")]
print("Found Files:")
print("\n ".join(report_files))

In [None]:
baseline_job = my_default_monitor.latest_baselining_job
schema_df = pd.json_normalize(
    baseline_job.baseline_statistics().body_dict["binary_classification_metrics"]
)
schema_df.transpose().head(10)

In [None]:
constraints_df = pd.json_normalize(
    baseline_job.suggested_constraints().body_dict["binary_classification_constraints"]
)
constraints_df.transpose().head(10)
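
As a rough sketch of how a metric is checked against a suggested constraint, the code below assumes each constraint is a `threshold` plus a `comparison_operator`, and that the operator names the condition that counts as a violation (for example, `LessThanThreshold` flags a metric that falls below its threshold). The `find_violations` helper, the reading of the operators, and the numbers are assumptions for illustration; check them against the constraints printed above.

```python
def find_violations(metrics, constraints):
    """Return the metrics that break their constraint."""
    violations = []
    for name, rule in constraints.items():
        if name not in metrics:
            continue  # nothing to compare for this constraint
        value = metrics[name]
        threshold = rule["threshold"]
        operator = rule["comparison_operator"]
        if operator == "LessThanThreshold":
            violated = value < threshold
        elif operator == "GreaterThanThreshold":
            violated = value > threshold
        else:
            raise ValueError(f"unsupported comparison operator: {operator}")
        if violated:
            violations.append({"metric_name": name, "value": value, "threshold": threshold})
    return violations


# Made-up numbers: recall has dropped below its baseline threshold, precision has not.
example_constraints = {
    "recall": {"threshold": 0.6, "comparison_operator": "LessThanThreshold"},
    "precision": {"threshold": 0.85, "comparison_operator": "LessThanThreshold"},
}
example_metrics = {"recall": 0.55, "precision": 0.9}
found = find_violations(example_metrics, example_constraints)
print(found)
```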

### 7) Monitoring Schedule


In [None]:
from sagemaker.model_monitor import (
    CronExpressionGenerator,
    BatchTransformInput,
    MonitoringDatasetFormat,
    MonitoringExecution,
)
from time import gmtime, strftime

### Create a schedule

You can create a model monitoring schedule. Use the baseline resources (constraints and statistics) to compare against the batch transform inference inputs and outputs.

In [None]:
statistics_path = "{}/statistics.json".format(baseline_results_uri)
constraints_path = "{}/constraints.json".format(baseline_results_uri)

mon_schedule_name = "DEMO-xgb-churn-pred-model-quality-schedule-" + strftime(
    "%Y-%m-%d-%H-%M-%S", gmtime()
)
my_default_monitor.create_monitoring_schedule(
    monitor_schedule_name=mon_schedule_name,
    batch_transform_input=BatchTransformInput(
        data_captured_destination_s3_uri=s3_capture_upload_path,
        destination="/opt/ml/processing/input",
        dataset_format=MonitoringDatasetFormat.csv(header=False),
        # since we joined the transform input and output, the output columns
        # follow the input columns. There are 69 input features, so the 0-based index of
        # the output (inference prediction) is 69
        probability_attribute="69",
        probability_threshold_attribute=0.5,
        # look back 6 hours to ensure we get the transform job outputs.
        start_time_offset="-PT6H",
        end_time_offset="-PT0H",
    ),
    ground_truth_input=gt_s3_uri,
    output_s3_uri=s3_report_path,
    problem_type="BinaryClassification",
    constraints=constraints_path,
    schedule_cron_expression=CronExpressionGenerator.hourly(),
    enable_cloudwatch_metrics=True,
)
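
The `start_time_offset` and `end_time_offset` values above are negative ISO 8601 durations. The sketch below shows how such offsets translate into a time window; it only handles whole-hour offsets of the form `-PT<n>H` as used in this notebook, and it assumes the offsets are applied to the scheduled execution time. The helper names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# matches whole-hour negative offsets such as "-PT6H" or "-PT0H"
_HOUR_OFFSET = re.compile(r"^-PT(\d+)H$")


def parse_hour_offset(offset):
    """Convert an offset like '-PT6H' into a negative timedelta."""
    match = _HOUR_OFFSET.match(offset)
    if match is None:
        raise ValueError(f"unsupported offset: {offset!r}")
    return -timedelta(hours=int(match.group(1)))


def monitoring_window(reference_time, start_offset, end_offset):
    """Return the (start, end) of the window relative to reference_time."""
    return (
        reference_time + parse_hour_offset(start_offset),
        reference_time + parse_hour_offset(end_offset),
    )


window_start, window_end = monitoring_window(datetime(2022, 10, 20, 18, 0), "-PT6H", "-PT0H")
print(window_start.isoformat(), window_end.isoformat())
```

With these offsets, an execution scheduled for 18:00 would look at data from 12:00 to 18:00, which is why the transform job has to have run within the last six hours.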

---

### 8) Describe and inspect the schedule

When you describe the schedule, the `MonitoringScheduleStatus` should change to `Scheduled`.

In [None]:
desc_schedule_result = my_default_monitor.describe_schedule()
print("Schedule status: {}".format(desc_schedule_result["MonitoringScheduleStatus"]))

### List executions
The schedule starts jobs at the previously specified intervals. Here, you list the latest five executions. Note that if you are kicking this off after creating the hourly schedule, you might find the executions empty. You might have to wait until you cross the hour boundary (in UTC) to see executions kick off. The code below has the logic for waiting.

Note: Even for an hourly schedule, Amazon SageMaker has a buffer period of 20 minutes to schedule your execution. You might see your execution start in anywhere from zero to ~20 minutes from the hour boundary. This is expected and done for load balancing in the backend.

In [None]:
import time

mon_executions = my_default_monitor.list_executions()
print(
    "We created an hourly schedule above and it will kick off executions ON the hour (plus a 0 - 20 min buffer).\nWe have to wait till we hit the hour..."
)

while len(mon_executions) == 0:
    print("Waiting for the 1st execution to happen...")
    time.sleep(60)
    mon_executions = my_default_monitor.list_executions()
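
The loop above waits indefinitely. If you prefer to give up after a while, a polling helper with a timeout is one option. This is a generic sketch, not part of the SageMaker SDK; `wait_for_items` and its parameters are hypothetical, and the example uses a fake fetch function so it runs without AWS access.

```python
import time


def wait_for_items(fetch, timeout_seconds, poll_seconds, sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until it returns a non-empty result or the timeout expires."""
    deadline = clock() + timeout_seconds
    while True:
        items = fetch()
        if items:
            return items
        if clock() >= deadline:
            raise TimeoutError(f"nothing returned within {timeout_seconds} seconds")
        sleep(poll_seconds)


# Fake fetch: empty twice, then one execution. Sleeping is disabled for the demo.
fake_responses = iter([[], [], ["execution-1"]])
found_items = wait_for_items(
    lambda: next(fake_responses), timeout_seconds=10, poll_seconds=1, sleep=lambda seconds: None
)
print(found_items)
```

In this notebook, `fetch` could be `my_default_monitor.list_executions`, with a timeout a little longer than one hour plus the scheduling buffer.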

### Inspect a specific execution (latest execution)
The next cell picks up the latest scheduled execution and waits for it to finish. Here are the possible terminal states and what each of them means:
* Completed - The monitoring execution completed and no issues were found in the violations report.
* CompletedWithViolations - The execution completed, but constraint violations were detected.
* Failed - The monitoring execution failed, possibly due to a client error (for example, incorrect role permissions) or infrastructure issues. Examine FailureReason and ExitMessage to identify what exactly happened.
* Stopped - The job exceeded the maximum runtime or was manually stopped.

In [None]:
# the latest execution is at index -1, the second to last at -2, and so on
latest_execution = mon_executions[-1]
# time.sleep(60)
latest_execution.wait(logs=False)

# describe the execution once and reuse the response
latest_job = latest_execution.describe()
print("Latest execution status: {}".format(latest_job["ProcessingJobStatus"]))
# ExitMessage may be absent from the response, so use .get() to avoid a KeyError
print("Latest execution result: {}".format(latest_job.get("ExitMessage")))

if latest_job["ProcessingJobStatus"] != "Completed":
    print(
        "====STOP==== \n No completed executions to inspect further. Please wait till an execution completes or investigate previously reported failures."
    )

In [None]:
report_uri = latest_execution.output.destination
print("Report Uri: {}".format(report_uri))

### List the generated reports

In [None]:
from urllib.parse import urlparse

s3uri = urlparse(report_uri)
report_bucket = s3uri.netloc
report_key = s3uri.path.lstrip("/")
print("Report bucket: {}".format(report_bucket))
print("Report key: {}".format(report_key))

s3_client = boto3.Session().client("s3")
result = s3_client.list_objects(Bucket=report_bucket, Prefix=report_key)
report_files = [report_file.get("Key") for report_file in result.get("Contents")]
print("Found Report Files:")
print("\n ".join(report_files))

### Violations report

If there are any violations compared to the baseline, they are listed here.

In [None]:
violations = my_default_monitor.latest_monitoring_constraint_violations()
# None removes the column width limit (-1 is deprecated since pandas 1.0)
pd.set_option("display.max_colwidth", None)
violations_df = pd.json_normalize(violations.body_dict["violations"])
violations_df.head(10)

### Other commands
We can also start and stop the monitoring schedules.

In [None]:
# my_default_monitor.stop_monitoring_schedule()
# my_default_monitor.start_monitoring_schedule()

### 9) Delete the resources


In [None]:
# my_default_monitor.stop_monitoring_schedule()
# my_default_monitor.delete_monitoring_schedule()
# time.sleep(60)  # actually wait for the deletion

In [None]:
# model.delete_model()