# Monitoring Production Models with Amazon SageMaker Model Monitor and Clarify

Monitoring production machine learning (ML) models is a critical step in ensuring that they continue to meet business needs. Beyond the infrastructure that hosts a model, several other aspects of an ML model should be monitored regularly. As a model ages, the distribution of real-world inference data may shift away from that of the data used to train it. For example, consumer purchase patterns may change in the retail industry, and economic conditions such as mortgage rates may change in the financial industry.

This gradual misalignment between the training data and the live inference data can significantly affect model predictions. Model quality metrics such as accuracy may also degrade over time, which harms business outcomes. Regulatory requirements, such as ensuring that ML models are unbiased and explainable, add another dimension to model monitoring. Comprehensive monitoring of these aspects lets you proactively identify whether and when a production model needs to be updated. Because updating a production model requires both retraining and deployment resources, the cost of an update should be weighed against the opportunity cost of continuing to serve model consumers with a degraded model.

Amazon SageMaker Model Monitor provides capabilities to monitor data drift and model quality for models deployed as SageMaker real-time endpoints. Amazon SageMaker Clarify provides capabilities to monitor a deployed model for bias drift and feature attribution drift. Together, these two services let you monitor four aspects of ML models deployed on SageMaker:

- Data drift: If the live inference traffic served by the deployed model is statistically different from the data the model was trained on, the model's prediction accuracy is likely to deteriorate. SageMaker Model Monitor detects data drift by generating a baseline from the training data and periodically comparing incoming inference requests against that baseline. Model Monitor also emits data drift metrics to Amazon CloudWatch, where you can configure alarms that alert you when drift is detected.

- Model quality: Monitoring model quality involves comparing the labels predicted by a model with the actual labels, also called ground truth labels. Model Monitor periodically merges the data captured from real-time inferences with the ground truth labels and compares the resulting quality metrics against a baseline generated with training data. As with data drift, model quality metrics are emitted to CloudWatch, so alarms can be raised if model quality falls below a threshold.

- Bias drift: Statistically significant drift between the live inference traffic and the training data can also introduce bias into the model's predictions over time. This can happen even if bias in the training data was detected and addressed before the model was trained and deployed. SageMaker Clarify continuously monitors a deployed model for bias and emits bias metrics to CloudWatch.

- Feature attribution drift: Besides introducing bias, drift in the distribution of live inference data can also cause drift in feature attribution values. Feature attribution assigns each feature of a dataset an importance score that reflects its relative contribution to the predictions of a model trained on that dataset, and ranks the features by that score. These scores help explain the model's predictions by showing which features drove them. SageMaker Clarify compares the feature attributions, or rankings, computed on the training data with those computed on live inference traffic. As with the other monitoring types, feature attribution drift metrics are emitted to CloudWatch.
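To make the model quality and feature attribution ideas above concrete, here is a minimal pure-Python sketch. It is not how Model Monitor or Clarify are implemented: the record fields (`event_id`, `prediction`, `label`), the function names, and the exact NDCG formulation are assumptions for illustration. `merge_and_score` joins captured predictions with ground truth labels on a shared event ID and computes accuracy. `attribution_ndcg` scores how well the live feature ranking preserves the baseline ranking using normalized discounted cumulative gain (NDCG), where 1.0 means the top features appear in the same order.

```python
import math
from typing import Dict, List


def merge_and_score(captured: List[Dict], ground_truth: List[Dict]) -> float:
    """Join captured predictions with ground truth labels on event_id; return accuracy."""
    labels = {row["event_id"]: row["label"] for row in ground_truth}
    # Only inferences whose ground truth label has already arrived can be scored.
    matched = [(row["prediction"], labels[row["event_id"]])
               for row in captured if row["event_id"] in labels]
    if not matched:
        raise ValueError("no captured inference has a ground truth label yet")
    return sum(pred == label for pred, label in matched) / len(matched)


def attribution_ndcg(baseline: Dict[str, float], live: Dict[str, float], top_n: int = 3) -> float:
    """NDCG of the live top-n feature ranking, using baseline importances as gains."""
    # Rank by absolute attribution, since attribution scores (e.g. SHAP) can be negative.
    live_order = sorted(live, key=lambda f: abs(live[f]), reverse=True)[:top_n]
    ideal_order = sorted(baseline, key=lambda f: abs(baseline[f]), reverse=True)[:top_n]
    dcg = sum(abs(baseline.get(f, 0.0)) / math.log2(rank + 2)
              for rank, f in enumerate(live_order))
    idcg = sum(abs(baseline[f]) / math.log2(rank + 2)
               for rank, f in enumerate(ideal_order))
    return dcg / idcg if idcg > 0 else 0.0


captured = [{"event_id": "a", "prediction": 1},
            {"event_id": "b", "prediction": 0},
            {"event_id": "c", "prediction": 1}]  # "c" has no label yet
ground_truth = [{"event_id": "a", "label": 1}, {"event_id": "b", "label": 1}]
print(merge_and_score(captured, ground_truth))  # 0.5

baseline_attr = {"income": 0.5, "age": 0.3, "zip": 0.2}
live_attr = {"zip": 0.6, "age": 0.3, "income": 0.1}  # ranking has reversed
print(round(attribution_ndcg(baseline_attr, live_attr), 2))  # 0.81
```

Note that ground truth labels usually arrive later than predictions, which is why the sketch scores only the inferences that have a matching label.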

Each of the four types of monitoring follows the same general workflow:

1. Enable data capture: The first step is to enable data capture on the real-time endpoint. When data capture is enabled, the inputs to and outputs from the SageMaker endpoint are captured and saved to Amazon Simple Storage Service (Amazon S3). The captured input consists of the live inference requests, and the captured output consists of the deployed model's predictions. This step is common to all four types of monitoring: data drift, model quality, bias drift, and feature attribution drift.

2. Generate baseline: In this step, the training or validation data is analyzed to generate a baseline, which the monitoring jobs later compare against the live inference traffic. The baselining process computes metrics on the analyzed data and suggests constraints for those metrics. The contents of the baseline are specific to each type of monitoring.
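In the SageMaker Python SDK, data quality baselining is started with `DefaultModelMonitor.suggest_baseline`, which writes statistics and constraints files with a richer schema than shown here. As a simplified, hypothetical illustration of what "compute metrics and suggest constraints" means, the sketch below computes per-feature statistics on training rows and derives a minimum completeness constraint. The function name and the output shape are assumptions.

```python
import statistics
from typing import Dict, List, Optional


def sketch_baseline(rows: List[Dict[str, Optional[float]]]) -> Dict[str, Dict[str, float]]:
    """Compute per-feature statistics on training rows and suggest simple constraints."""
    if not rows:
        raise ValueError("baseline dataset is empty")
    baseline: Dict[str, Dict[str, float]] = {}
    for name in rows[0]:  # assumes every row has the same feature names
        values = [row[name] for row in rows if row.get(name) is not None]
        if not values:
            raise ValueError(f"feature {name!r} has no values")
        baseline[name] = {
            "mean": statistics.fmean(values),
            "min": min(values),
            "max": max(values),
            # Suggested constraint: live data should be at least this complete.
            "min_completeness": len(values) / len(rows),
        }
    return baseline


training_rows = [
    {"age": 34, "income": 52000},
    {"age": 51, "income": None},  # missing value
    {"age": 29, "income": 61000},
    {"age": 42, "income": 48000},
]
print(sketch_baseline(training_rows)["income"]["min_completeness"])  # 0.75
```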

3. Schedule and execute monitoring jobs: To monitor the real-time endpoint continuously, the next step is to create a monitoring schedule that runs at a predefined interval. Once the schedule is in place, SageMaker Processing jobs are started automatically at each interval to analyze the data captured from the endpoint. Each execution compares the captured live traffic with the baseline. If the metrics computed on the live traffic for that period fall outside the constraints suggested by the baseline, a violation is generated. Each execution also produces a monitoring report, which is saved to an S3 bucket, and emits CloudWatch metrics that are specific to the type of monitoring.
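With the SageMaker Python SDK, the schedule itself is created with a monitor's `create_monitoring_schedule` method, for example using an hourly cron expression. The per-interval comparison can be sketched as below, as a hypothetical simplification: it checks one interval of captured rows against completeness and range constraints. The constraint shape, check names, and function name are assumptions and do not reproduce Model Monitor's own checks.

```python
from typing import Dict, List, Optional


def check_violations(constraints: Dict[str, Dict[str, float]],
                     live_rows: List[Dict[str, Optional[float]]]) -> List[Dict[str, str]]:
    """Compare one interval of captured live data with baseline constraints."""
    violations: List[Dict[str, str]] = []
    for name, c in constraints.items():
        values = [row.get(name) for row in live_rows]
        present = [v for v in values if v is not None]
        completeness = len(present) / len(values) if values else 0.0
        if completeness < c["min_completeness"]:
            violations.append({"feature_name": name, "check": "completeness",
                               "description": f"completeness {completeness:.2f} "
                                              f"< baseline {c['min_completeness']:.2f}"})
        out_of_range = [v for v in present if not c["min"] <= v <= c["max"]]
        if out_of_range:
            violations.append({"feature_name": name, "check": "range",
                               "description": f"{len(out_of_range)} value(s) outside "
                                              f"[{c['min']}, {c['max']}]"})
    return violations


# Hypothetical constraints, in the shape a baselining step might suggest.
constraints = {"age": {"min": 18, "max": 90, "min_completeness": 1.0}}
interval_rows = [{"age": 25}, {"age": None}, {"age": 130}]
for v in check_violations(constraints, interval_rows):
    print(v["feature_name"], v["check"], v["description"])
```

An empty list means the interval passed; otherwise each entry plays the role of a violation recorded in that execution's report.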

4. Analyze and act on results: Reports generated by the monitoring jobs can be downloaded directly from S3 or visualized in SageMaker Studio. In Studio, you can also view the details of the monitoring jobs and create charts that compare the baseline metrics with the metrics calculated by each monitoring job.
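Reports downloaded from S3 can also be inspected programmatically. This sketch assumes a violations report (for data quality monitoring, a file named `constraint_violations.json`) with a top-level `violations` list whose entries carry `feature_name`, `constraint_check_type`, and `description` fields; treat that layout as an assumption and check it against your own reports. The function name and the sample content are illustrative.

```python
import json
from typing import Dict, List


def summarize_violations(report_json: str) -> Dict[str, List[str]]:
    """Group the features in a violations report by the constraint check that failed."""
    report = json.loads(report_json)
    grouped: Dict[str, List[str]] = {}
    for violation in report.get("violations", []):
        grouped.setdefault(violation["constraint_check_type"], []).append(
            violation["feature_name"])
    return grouped


# Illustrative report content; in practice you would download the file from S3.
example_report = json.dumps({"violations": [
    {"feature_name": "income", "constraint_check_type": "baseline_drift_check",
     "description": "illustrative description"},
    {"feature_name": "age", "constraint_check_type": "completeness_check",
     "description": "illustrative description"},
    {"feature_name": "zip", "constraint_check_type": "baseline_drift_check",
     "description": "illustrative description"},
]})
print(summarize_violations(example_report))
```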

To act on the issues discovered, you can use the CloudWatch metrics emitted by the monitoring jobs; the specific metrics depend on the type of monitoring. You can configure CloudWatch alarms on these metrics, with thresholds based on the values suggested by the baseline job. CloudWatch alarms let you automate responses to the violations and metrics generated by monitoring jobs.
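As a sketch of the alarm setup, the function below builds keyword arguments for boto3's CloudWatch `put_metric_alarm` call without calling AWS, so it runs anywhere. The namespace `aws/sagemaker/Endpoints/data-metrics`, the `feature_baseline_drift_<feature>` metric name, and the `Endpoint`/`MonitoringSchedule` dimensions are assumptions about how data quality metrics are published; confirm them in the CloudWatch console for your endpoint. The one-hour period (matching an hourly schedule) and the 0.1 threshold are hypothetical.

```python
from typing import Dict, Optional


def build_drift_alarm(endpoint_name: str, schedule_name: str, feature: str,
                      threshold: float, sns_topic_arn: Optional[str] = None) -> Dict:
    """Build keyword arguments for CloudWatch put_metric_alarm on one feature's drift metric."""
    params = {
        "AlarmName": f"{endpoint_name}-{feature}-drift",
        # Assumed namespace, metric name, and dimensions; verify them in CloudWatch.
        "Namespace": "aws/sagemaker/Endpoints/data-metrics",
        "MetricName": f"feature_baseline_drift_{feature}",
        "Dimensions": [
            {"Name": "Endpoint", "Value": endpoint_name},
            {"Name": "MonitoringSchedule", "Value": schedule_name},
        ],
        "Statistic": "Maximum",
        "Period": 3600,  # hypothetical: one evaluation per hourly monitoring run
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
    }
    if sns_topic_arn:
        # Notify an SNS topic, which can in turn trigger automated remediation.
        params["AlarmActions"] = [sns_topic_arn]
    return params


params = build_drift_alarm("my-endpoint", "my-schedule", "income", threshold=0.1)
print(params["MetricName"])
# With AWS credentials configured:
# import boto3
# boto3.client("cloudwatch").put_metric_alarm(**params)
```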

## Data drift monitoring

You monitor a production model for data drift to ensure that the distribution of the live inference traffic served by the deployed model has not drifted away from the distribution of the data used to train the model.
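One common way to quantify drift in a numerical feature is the two-sample Kolmogorov-Smirnov (KS) statistic: the largest gap between the empirical cumulative distribution functions of the baseline sample and the live sample. This is an illustrative, swapped-in technique and not necessarily the distance Model Monitor computes. The 0.1 threshold and the sample values are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(baseline: Sequence[float], live: Sequence[float]) -> float:
    """Two-sample KS statistic: max absolute gap between the two empirical CDFs."""
    if not baseline or not live:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(baseline), sorted(live)
    # Both empirical CDFs only change at observed values, so checking those suffices.
    points = sorted(set(a) | set(b))
    return max(abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
               for x in points)


DRIFT_THRESHOLD = 0.1  # hypothetical alerting threshold

baseline_income = [48000, 52000, 55000, 61000, 58000]
live_income = [72000, 80000, 69000, 75000, 77000]
drift = ks_statistic(baseline_income, live_income)
print(drift, drift > DRIFT_THRESHOLD)  # 1.0 True
```

A statistic of 0 means the two samples have identical empirical distributions; 1 means they do not overlap at all, as in the example where every live income exceeds every baseline income.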
