# SageMaker Model Monitor - visualizing monitoring results


SageMaker's prebuilt container computes a variety of statistics and evaluates constraints out of the box. This notebook demonstrates how to visualize them. Take the ProcessingJob ARN from any execution of a MonitoringSchedule and use this notebook to visualize that execution's results.
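
One way to find that ARN programmatically is to list the schedule's executions and take the most recent one that has a processing job attached. The following is a minimal sketch, not part of the original notebook: the helper name `latest_processing_job_arn` and the schedule name in the usage comment are hypothetical, and the client is passed in so the function can be tried without AWS credentials.

```python
from typing import Any, Optional


def latest_processing_job_arn(sm_client: Any, schedule_name: str) -> Optional[str]:
    """Return the ProcessingJob ARN of the newest execution that has one.

    `sm_client` is expected to behave like boto3.client("sagemaker").
    """
    response = sm_client.list_monitoring_executions(
        MonitoringScheduleName=schedule_name,
        SortBy="ScheduledTime",
        SortOrder="Descending",
        MaxResults=10,
    )
    for summary in response.get("MonitoringExecutionSummaries", []):
        # Executions that never started a job carry no ProcessingJobArn.
        arn = summary.get("ProcessingJobArn")
        if arn:
            return arn
    return None


# Usage (requires AWS credentials; the schedule name is a placeholder):
# import boto3
# arn = latest_processing_job_arn(boto3.client("sagemaker"), "my-monitoring-schedule")
```

Only the first page of results is inspected here, which is usually enough when the goal is the latest execution.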

Let's import some Python libraries that will be helpful for visualization.

In [2]:
from IPython.display import HTML, display
import json
import os
import boto3

import sagemaker
from sagemaker import session
from sagemaker.model_monitor import MonitoringExecution
from sagemaker.s3 import S3Downloader

## Get Utilities for Rendering

The functions for plotting and rendering distribution statistics and constraint violations are implemented in a `utils.py` file, so let's download it.

In [3]:
!wget https://raw.githubusercontent.com/awslabs/amazon-sagemaker-examples/master/sagemaker_model_monitor/visualization/utils.py

import utils as mu



## Get Execution and Baseline details from Processing Job Arn

Enter the ProcessingJob ARN for an execution of a MonitoringSchedule below to get the result files associated with that execution.

In [7]:
processing_job_arn = "arn:aws:sagemaker:us-east-1:086566025687:processing-job/model-monitoring-202205310900-c47d89f2d22a25a8613f4ec8" 
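
Because a mistyped ARN only fails later, at the API call, it can help to check its shape first. The sketch below is an illustration rather than part of the original notebook: `ProcessingJobArn` and `parse_processing_job_arn` are hypothetical names, and the parser assumes the standard `arn:partition:service:region:account:resource` layout.

```python
from typing import NamedTuple


class ProcessingJobArn(NamedTuple):
    partition: str
    region: str
    account_id: str
    job_name: str


def parse_processing_job_arn(arn: str) -> ProcessingJobArn:
    """Split a SageMaker processing-job ARN into its parts, or raise ValueError."""
    parts = arn.strip().split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sagemaker":
        raise ValueError(f"Not a SageMaker ARN: {arn!r}")
    resource_type, _, job_name = parts[5].partition("/")
    if resource_type != "processing-job" or not job_name:
        raise ValueError(f"Not a processing-job ARN: {arn!r}")
    return ProcessingJobArn(parts[1], parts[3], parts[4], job_name)


parsed = parse_processing_job_arn(
    "arn:aws:sagemaker:us-east-1:086566025687:processing-job/"
    "model-monitoring-202205310900-c47d89f2d22a25a8613f4ec8"
)
print(parsed.region, parsed.job_name)
```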

In [8]:
execution = MonitoringExecution.from_processing_arn(sagemaker_session=session.Session(), processing_job_arn=processing_job_arn)
exec_inputs = {inp['InputName']: inp for inp in execution.describe()['ProcessingInputs']}
exec_results = execution.output.destination

In [10]:
exec_results

's3://sagemaker-us-east-1-086566025687/model-monitor/monitoring/wine-model-monitoring/results/xgboost-2022-05-31-08-13-07-565/wine-model-monitoring/2022/05/31/09'

In [12]:
# Baseline input location, if the execution was given one.
baseline_statistics_filepath = exec_inputs['baseline']['S3Input']['S3Uri'] if 'baseline' in exec_inputs else None
# Result files of this execution. If the execution did not write
# statistics.json, reading it below fails.
execution_statistics_filepath = os.path.join(exec_results, 'statistics.json')
violations_filepath = os.path.join(exec_results, 'constraint_violations.json')

baseline_statistics = json.loads(S3Downloader.read_file(baseline_statistics_filepath)) if baseline_statistics_filepath is not None else None
execution_statistics = json.loads(S3Downloader.read_file(execution_statistics_filepath))
violations = json.loads(S3Downloader.read_file(violations_filepath))['violations']
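
The cell above builds S3 URIs with `os.path.join`, which inserts backslashes when the notebook runs on Windows. A small alternative, sketched here under the assumption that the URIs are plain `s3://bucket/prefix` strings, is to join with `posixpath` so the separator is always `/`. The helper name `s3_join` and the bucket in the example are hypothetical.

```python
import posixpath


def s3_join(prefix: str, *parts: str) -> str:
    """Join an S3 URI prefix and key parts with forward slashes only."""
    cleaned = [part.strip("/") for part in parts]
    return posixpath.join(prefix.rstrip("/"), *cleaned)


print(s3_join("s3://example-bucket/results/", "statistics.json"))
```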

In [19]:
violations

[{'feature_name': 'Extra columns',
  'constraint_check_type': 'extra_column_check',
  'description': 'There are extra columns in current dataset. Number of columns in current dataset: 18, Number of columns in baseline constraints: 14'}]
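
When an execution reports many violations, a quick count per check type gives a first impression before rendering the full table. The sketch below reuses the single violation record shown above as sample data; `violations_example` and `count_by_check_type` are hypothetical names introduced for illustration.

```python
from collections import Counter

# Sample data copied from the output above.
violations_example = [
    {
        "feature_name": "Extra columns",
        "constraint_check_type": "extra_column_check",
        "description": (
            "There are extra columns in current dataset. "
            "Number of columns in current dataset: 18, "
            "Number of columns in baseline constraints: 14"
        ),
    }
]


def count_by_check_type(violations):
    """Count violation records per constraint_check_type."""
    return Counter(v.get("constraint_check_type", "unknown") for v in violations)


for check_type, count in count_by_check_type(violations_example).most_common():
    print(f"{check_type}: {count}")
```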

## Overview

The code below shows the violations and constraint checks across all features in a simple table.

In [15]:
# execution_statistics must be loaded from the execution's statistics.json first.
mu.show_violation_df(baseline_statistics=baseline_statistics, latest_statistics=execution_statistics, violations=violations)

## Distributions

This section visualizes the distributions and renders the distribution statistics for all features.

In [16]:
# execution_statistics must be loaded from the execution's statistics.json first.
features = mu.get_features(execution_statistics)
feature_baselines = mu.get_features(baseline_statistics)

In [17]:
mu.show_distributions(features)


### Execution Stats vs Baseline

In [18]:
mu.show_distributions(features, feature_baselines)
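
Alongside the plots, a numeric comparison can make drift easier to spot. The sketch below is an illustration under stated assumptions, not the behavior of `utils.py`: it assumes each statistics file has a top-level `features` list whose entries carry a `name` and, for numeric columns, a `numerical_statistics` object with a `mean`. The feature names and values are made-up sample data.

```python
def index_by_name(statistics):
    """Map feature name -> feature record for one statistics document."""
    return {f["name"]: f for f in statistics.get("features", [])}


def compare_means(baseline, latest):
    """Return (mean shift per shared numeric feature, features only in latest)."""
    base, cur = index_by_name(baseline), index_by_name(latest)
    shifts = {}
    for name in sorted(base.keys() & cur.keys()):
        b = base[name].get("numerical_statistics")
        c = cur[name].get("numerical_statistics")
        if b and c:
            shifts[name] = c["mean"] - b["mean"]
    extra = sorted(cur.keys() - base.keys())
    return shifts, extra


# Made-up sample documents in the assumed schema.
baseline_example = {"features": [
    {"name": "alcohol", "numerical_statistics": {"mean": 10.0}},
    {"name": "pH", "numerical_statistics": {"mean": 3.25}},
]}
latest_example = {"features": [
    {"name": "alcohol", "numerical_statistics": {"mean": 10.5}},
    {"name": "pH", "numerical_statistics": {"mean": 3.25}},
    {"name": "new_column", "numerical_statistics": {"mean": 1.0}},
]}

shifts, extra = compare_means(baseline_example, latest_example)
print(shifts)
print(extra)
```

Features that appear only in the latest statistics are reported separately, which corresponds to the kind of mismatch an `extra_column_check` violation describes.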
