# Automated Model Approval Pipeline with SageMaker Pipelines, CloudWatch Events and AWS Lambda

---

This notebook's CI test result for us-east-1 is as follows. CI test results in other regions can be found at the end of the notebook.

![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-1/sagemaker-pipelines|automated-model-approval|sm_pipeline_automated-approval.ipynb)

---

This example notebook shows how to create an automated model approval pipeline with Amazon SageMaker, CloudWatch Events and AWS Lambda. It is designed to run on SageMaker Notebook Instances or in SageMaker Studio.

For an organization with multiple distributed teams of ML practitioners, integrating SageMaker Pipelines with the model approval process is useful for the following reasons:
1. It ensures a standardized and repeatable process for approving models for deployment.
2. It decouples the individual stages of the approval process and lets you visualize them as a Directed Acyclic Graph (DAG).

This notebook can be used as a template to start building your own SageMaker Pipelines for automated model approval. Before you start, make sure that your SageMaker Execution Role has the following policy:

    AmazonSageMakerFullAccess


Your SageMaker Execution Role should already have access to S3. If not, add an S3 policy. You will also need to add the inline policy described below:

In [None]:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "iam:GetRole",
                "s3:*",
                "iam:CreateRole",
                "iam:AttachRolePolicy"
            ],
            "Resource": "*"
        }
    ]
}

Finally, you may need the following trust policy:

In [None]:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": ["s3.amazonaws.com", "sagemaker.amazonaws.com"]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
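If you prefer to attach the inline policy programmatically, the following sketch calls IAM's `PutRolePolicy` API through boto3. It assumes you run it with credentials that are allowed to call `iam:PutRolePolicy` (the execution role itself usually is not). The helper names are our own, and the client is injectable so the logic can be exercised without AWS access.

```python
import json

# The inline policy shown above, as a Python dict.
INLINE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": ["iam:GetRole", "s3:*", "iam:CreateRole", "iam:AttachRolePolicy"],
            "Resource": "*",
        }
    ],
}


def role_name_from_arn(role_arn: str) -> str:
    """Extract the role name from arn:aws:iam::<account>:role/<optional-path/>name."""
    return role_arn.split(":role/", 1)[1].rsplit("/", 1)[-1]


def attach_inline_policy(role_arn: str, policy_name: str, policy: dict, iam_client=None) -> str:
    """Attach `policy` to the role as an inline policy and return the role name."""
    if iam_client is None:
        import boto3  # imported lazily so the helpers work without boto3 installed

        iam_client = boto3.client("iam")
    role_name = role_name_from_arn(role_arn)
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy),
    )
    return role_name
```

For example, `attach_inline_policy(role, "model-approval-inline-policy", INLINE_POLICY)`, where the policy name is a placeholder. Trust policies are changed with `update_assume_role_policy` instead, which replaces the role's entire trust policy, so merge the `s3.amazonaws.com` and `sagemaker.amazonaws.com` principals into the existing document rather than overwriting it.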

## Prerequisites

To recreate the results in this notebook, you need to generate the Clarify artifacts for model quality, bias and explainability when registering a new model. You can do this by running the cells in the [SageMaker Pipelines integration with Model Monitor](sagemaker-pipelines/tabular/model-monitor-clarify-pipelines/sagemaker-pipeline-model-monitor-clarify-steps.ipynb) notebook. The checks in `model-approval-checks.py` are meant to serve as an example to be extended depending on the use case and your organisation's internal processes.

First, we are going to import the SageMaker SDK and set some default variables such as the `role` for permissioned execution and the `default_bucket` to store model artifacts.

In [None]:
! pip install sagemaker botocore boto3 awscli --upgrade

In [274]:

import boto3
import sagemaker
import json 

client = boto3.client(service_name="sagemaker")
runtime = boto3.client(service_name="sagemaker-runtime")

s3_client = boto3.client('s3')

boto_session = boto3.session.Session()
region = boto_session.region_name
print(region)

sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()
print(role)

default_bucket = sagemaker_session.default_bucket()

## Manually run the checks in the Notebook

In this section of the Notebook, the metadata associated with the registered model is used to retrieve Clarify artifacts. These artifacts contain measures of model quality, bias and feature importance, which are used to approve the model for deployment.

In [255]:
model_package_group_name = "<EXISTING_MODEL_PACKAGE_GROUP_NAME>"

### Retrieve Clarify artifacts based on Model Package 

In [256]:
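# Sort newest first so that [0] is the most recently registered model package version
model_package_arn = client.list_model_packages(
    ModelPackageGroupName=model_package_group_name,
    SortBy="CreationTime",
    SortOrder="Descending",
)["ModelPackageSummaryList"][0]["ModelPackageArn"]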

In [223]:
model_package_metrics = client.describe_model_package(ModelPackageName=model_package_arn)["ModelMetrics"]

model_quality_s3_key = model_package_metrics["ModelQuality"]["Statistics"]["S3Uri"].split(f"{default_bucket}/")[1]
model_quality_bias = model_package_metrics["Bias"]
model_quality_pretrain_bias_key = model_quality_bias["PreTrainingReport"]["S3Uri"].split(f"{default_bucket}/")[1]
model_quality__post_train_bias_key = model_quality_bias["PostTrainingReport"]["S3Uri"].split(f"{default_bucket}/")[1]
model_explainability_s3_key = model_package_metrics["Explainability"]["Report"]["S3Uri"].split(f"{default_bucket}/")[1]
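Splitting each URI on `f"{default_bucket}/"` only works when every Clarify artifact lives in the session's default bucket. A slightly more general sketch parses the bucket and key out of any `s3://` URI with the standard library. The helper name `parse_s3_uri` is ours, not part of the SageMaker SDK, and keys containing `?` or `#` would need extra handling because `urlparse` treats them as query or fragment delimiters.

```python
from urllib.parse import urlparse


def parse_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    # urlparse keeps the leading "/" on the path; S3 keys do not have it.
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical URI, for illustration only.
bucket, key = parse_s3_uri("s3://my-bucket/clarify/model-quality/statistics.json")
print(bucket, key)  # my-bucket clarify/model-quality/statistics.json
```

Applied here, `parse_s3_uri(model_package_metrics["ModelQuality"]["Statistics"]["S3Uri"])` would return both the bucket and the key, so the later `get_object` calls no longer depend on `default_bucket`.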

### Model Quality Check

In [224]:
# MODEL QUALITY (in step 1)
mae_threshold = 1.5

s3_obj = s3_client.get_object(Bucket=default_bucket, Key=model_quality_s3_key)
s3_obj_data = s3_obj['Body'].read().decode('utf-8')
model_quality_json = json.loads(s3_obj_data)
mae = model_quality_json["regression_metrics"]["mae"]["value"]

mae_check = mae < mae_threshold
mae_check
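The same download-and-parse pattern repeats in each of the four checks. A small helper (our own, not part of boto3) keeps it in one place. The stub client below stands in for `boto3.client("s3")` only so the example runs without AWS credentials; a real client's `get_object` returns a streaming `Body` with the same `read()` method.

```python
import io
import json


def load_json_from_s3(s3_client, bucket: str, key: str) -> dict:
    """Download an S3 object and parse it as UTF-8 JSON."""
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"]
    return json.loads(body.read().decode("utf-8"))


class _StubS3:
    """Hypothetical stand-in for boto3's S3 client, for local illustration only."""

    def __init__(self, objects: dict):
        self.objects = objects

    def get_object(self, Bucket, Key):
        return {"Body": io.BytesIO(self.objects[(Bucket, Key)])}


stub = _StubS3({("my-bucket", "stats.json"): b'{"regression_metrics": {"mae": {"value": 1.1}}}'})
print(load_json_from_s3(stub, "my-bucket", "stats.json")["regression_metrics"]["mae"]["value"])  # 1.1
```

In the notebook, the model quality cell would reduce to `model_quality_json = load_json_from_s3(s3_client, default_bucket, model_quality_s3_key)`.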

### Model Pre-training Bias Check

In [None]:
# MODEL PRE-TRAINING BIAS (in step 1)
kullback_liebler_check_threshold = 0.5

s3_obj = s3_client.get_object(Bucket=default_bucket, Key=model_quality_pretrain_bias_key)
s3_obj_data = s3_obj['Body'].read().decode('utf-8')
model_quality_pretrain_bias_json = json.loads(s3_obj_data)
kullback_liebler = model_quality_pretrain_bias_json["pre_training_bias_metrics"][
    "facets"]["column_8"][0]["metrics"][4]["value"]

kullback_liebler_check = kullback_liebler < kullback_liebler_check_threshold
kullback_liebler_check

### Model Post-training Bias Check

In [None]:
# MODEL POST-TRAINING BIAS (in step 1)
treatment_equity_check_threshold = 0

s3_obj = s3_client.get_object(Bucket=default_bucket, Key=model_quality__post_train_bias_key)
s3_obj_data = s3_obj['Body'].read().decode('utf-8')
model_quality__post_train_bias_json = json.loads(s3_obj_data)
treatment_equity = model_quality__post_train_bias_json["post_training_bias_metrics"][
    "facets"]["column_8"][0]["metrics"][-1]["value"]

treatment_equity_check = treatment_equity < treatment_equity_check_threshold
treatment_equity_check
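Indexing into `metrics` by position (`[4]`, `[-1]`) silently picks the wrong value if a report lists its metrics in a different order. A sketch that looks a metric up by its `name` field instead, assuming each entry in a facet's `metrics` list carries `name` and `value` keys and that Clarify uses short codes such as `KL` (Kullback-Leibler divergence) and `TE` (treatment equality); check both against your own report. `find_bias_metric` is a hypothetical helper and the sample report is made up.

```python
def find_bias_metric(report: dict, section: str, facet: str, metric_name: str,
                     facet_value_index: int = 0) -> float:
    """Return the value of `metric_name` for one facet in a Clarify bias report."""
    metrics = report[section]["facets"][facet][facet_value_index]["metrics"]
    for metric in metrics:
        if metric.get("name") == metric_name:
            return metric["value"]
    available = [metric.get("name") for metric in metrics]
    raise KeyError(f"{metric_name!r} not found; available metrics: {available}")


# Made-up report fragment mirroring the structure used in the cells above.
sample_report = {
    "pre_training_bias_metrics": {
        "facets": {"column_8": [{"metrics": [{"name": "CI", "value": 0.12}, {"name": "KL", "value": 0.03}]}]}
    },
    "post_training_bias_metrics": {
        "facets": {"column_8": [{"metrics": [{"name": "DPPL", "value": 0.2}, {"name": "TE", "value": -0.1}]}]}
    },
}

kl = find_bias_metric(sample_report, "pre_training_bias_metrics", "column_8", "KL")
te = find_bias_metric(sample_report, "post_training_bias_metrics", "column_8", "TE")
print(kl, te)
```

With the real reports, the call would be `find_bias_metric(model_quality_pretrain_bias_json, "pre_training_bias_metrics", "column_8", "KL")`. Raising a `KeyError` that lists the available names makes a renamed or missing metric obvious instead of feeding a wrong value into the approval decision.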

### Model Explainability Report Check

In [None]:
# MODEL EXPLAINABILITY REPORT (in step 1)
age_shap_label_0_threshold = 0.5

s3_obj = s3_client.get_object(Bucket=default_bucket, Key=model_explainability_s3_key)
s3_obj_data = s3_obj['Body'].read().decode('utf-8')
model_explainability_s3_json = json.loads(s3_obj_data)
age_shap_label_0 = model_explainability_s3_json[
    "explanations"]["kernel_shap"]["label0"]["global_shap_values"]["column_1"]

age_shap_label_0_check = age_shap_label_0 < age_shap_label_0_threshold
age_shap_label_0_check
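The explainability check compares a single global SHAP value, that of `column_1` (age), against a fixed threshold. When choosing such a threshold, it can help to see how that feature ranks against the others. A sketch, assuming the same `["explanations"]["kernel_shap"]["label0"]["global_shap_values"]` path used above; `rank_global_shap` and the sample values are hypothetical.

```python
def rank_global_shap(report: dict, label: str = "label0", top_k: int = 5) -> list:
    """Return (feature, global SHAP value) pairs, largest magnitude first."""
    values = report["explanations"]["kernel_shap"][label]["global_shap_values"]
    return sorted(values.items(), key=lambda item: abs(item[1]), reverse=True)[:top_k]


# Made-up explainability report fragment.
sample_explanations = {
    "explanations": {
        "kernel_shap": {
            "label0": {"global_shap_values": {"column_1": 0.42, "column_2": 0.05, "column_3": 0.17}}
        }
    }
}

for feature, value in rank_global_shap(sample_explanations, top_k=2):
    print(feature, value)
```

Run against `model_explainability_s3_json`, this shows whether `column_1` actually dominates before you commit to the `0.5` threshold.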

### Model Approval Based on Previous Checks

In [100]:
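all_checks_passed = all([mae_check, kullback_liebler_check, treatment_equity_check, age_shap_label_0_check])
model_package_update_input_dict = {
    "ModelPackageArn": model_package_arn,
    "ModelApprovalStatus": "Approved" if all_checks_passed else "Rejected",
}
# Apply the decision to the model package in the Model Registry
client.update_model_package(**model_package_update_input_dict)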
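All four checks share one shape, a metric compared against an upper bound, so the approval decision can also be driven by a small threshold table. A sketch where `decide_approval` and the dictionary layout are our own illustration; the thresholds are the ones used in this notebook and the metric values are made up.

```python
def decide_approval(metrics: dict[str, float], thresholds: dict[str, float]) -> str:
    """Approve only if every metric is strictly below its threshold."""
    missing = set(thresholds) - set(metrics)
    if missing:
        # A missing metric is treated as an error, never as a pass.
        raise KeyError(f"Missing metrics: {sorted(missing)}")
    failed = [name for name, limit in thresholds.items() if not metrics[name] < limit]
    return "Rejected" if failed else "Approved"


thresholds = {
    "mae": 1.5,
    "kullback_liebler": 0.5,
    "treatment_equity": 0.0,
    "age_shap_label_0": 0.5,
}

# Made-up metric values for illustration.
status = decide_approval(
    {"mae": 1.2, "kullback_liebler": 0.1, "treatment_equity": -0.2, "age_shap_label_0": 0.3},
    thresholds,
)
print(status)  # Approved
```

Adding a check then means adding one entry to `thresholds` rather than another boolean to the `all([...])` expression.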

## Automate using SageMaker Pipelines and SageMaker Processors

In this section of the Notebook, the previous steps are orchestrated as a SageMaker Pipeline. 

In [None]:
from sagemaker.workflow.pipeline_context import PipelineSession

pipeline_session = PipelineSession()
pipeline_name = "automated-model-governance"

### Define Pipeline Input Parameters

In [263]:
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.parameters import ParameterString

# Note: this rebinds `model_package_group_name` (a plain string above) to a pipeline parameter
model_package_group_name = ParameterString(
    name="ModelPackageGroupName",
    default_value="ModelPackageGroupName is a required variable."
)

default_bucket_s3 = ParameterString(
    name="Bucket",
    default_value="Bucket is a required variable."
)

### Processors for Validation and Model Status Update Steps

In [None]:
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.steps import ProcessingStep
from sagemaker.sklearn.processing import SKLearnProcessor

validation_step = SKLearnProcessor(
    framework_version="1.0-1",
    role=role,
    instance_type="ml.t3.medium",
    base_job_name="RegisteredModelValidationProcessor",
    instance_count=1,
    sagemaker_session=pipeline_session,
)

model_status_update_step = SKLearnProcessor(
    framework_version="1.0-1",
    role=role,
    instance_type="ml.t3.medium",
    base_job_name="UpdateModelStatusProcessor",
    instance_count=1,
    sagemaker_session=pipeline_session,
)

### Model Validation Step

In [265]:
process_step = ProcessingStep(
    name="RegisteredModelValidationStep",
    step_args=validation_step.run(
        code="model-approval-checks.py",
        inputs=[],
        outputs=[
            ProcessingOutput(
                output_name="checks",
                source="/opt/ml/processing/output",
                destination=f"s3://{default_bucket}/governance-pipeline/processor/",
            )
        ],
        arguments=[
            "--default_bucket", default_bucket_s3,
            "--model_package_group_name", model_package_group_name,
        ],
    ),
)

### Update Model Status Step

In [267]:
update_model_status_step = ProcessingStep(
    name="UpdateModelStatusStep",
    step_args=model_status_update_step.run(
        code="validate-model.py",
        inputs=[
            ProcessingInput(
                # Consume the "checks" output written by the validation step
                source=process_step.properties.ProcessingOutputConfig.Outputs["checks"].S3Output.S3Uri,
                destination="/opt/ml/processing/input",
            ),
        ],
        outputs=[],
        arguments=["--model_package_group_name", model_package_group_name],
    ),
)

### Assemble the Pipeline

The pipeline is created only after both steps are defined, because the `Pipeline` constructor needs the step objects themselves.

In [None]:
pipeline = Pipeline(
    name=pipeline_name,
    parameters=[model_package_group_name, default_bucket_s3],
    steps=[process_step, update_model_status_step],
)
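Neither `model-approval-checks.py` nor `validate-model.py` is shown in this notebook. The two steps only share what the validation step writes to `/opt/ml/processing/output`, which the update step receives under `/opt/ml/processing/input`. A minimal sketch of that hand-off, assuming the check results travel as a single JSON file of booleans; the file name `checks.json` and all helper names are hypothetical, not taken from the actual scripts.

```python
import argparse
import json
import tempfile
from pathlib import Path

CHECKS_FILE = "checks.json"  # hypothetical file name shared by both scripts


def parse_args(argv=None):
    """Parse the values passed through the ProcessingStep `arguments` list."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_package_group_name", required=True)
    parser.add_argument("--default_bucket", default=None)  # only the validation step passes this
    return parser.parse_args(argv)


def write_checks(checks: dict, output_dir: str) -> Path:
    """Validation step: save check results where the `checks` output is collected."""
    path = Path(output_dir) / CHECKS_FILE
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(checks))
    return path


def read_approval_status(input_dir: str) -> str:
    """Update step: approve only if at least one check ran and all of them passed."""
    checks = json.loads((Path(input_dir) / CHECKS_FILE).read_text())
    return "Approved" if checks and all(checks.values()) else "Rejected"


# Local round trip; inside the processing jobs the directories would be
# /opt/ml/processing/output (validation) and /opt/ml/processing/input (update).
args = parse_args(["--model_package_group_name", "my-group", "--default_bucket", "my-bucket"])
with tempfile.TemporaryDirectory() as tmp:
    write_checks({"mae_check": True, "kullback_liebler_check": True}, tmp)
    print(args.model_package_group_name, read_approval_status(tmp))  # my-group Approved
```

In the update script, the resulting status would then be applied with `update_model_package`, as in the manual approval cell. That script also needs to know which model package to update, for example by looking up the newest version in `--model_package_group_name` the way the manual section does.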

### Start Pipeline

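In the automated setup, this step is carried out by the Lambda function that is triggered when a new model is registered.

In [None]:
pipeline.upsert(role_arn=role)
execution = pipeline.start(
    parameters=dict(
        # `model_package_group_name` now refers to the pipeline parameter, so pass the actual group name
        ModelPackageGroupName="<EXISTING_MODEL_PACKAGE_GROUP_NAME>",
        Bucket=default_bucket,
    )
)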

### Wait for the pipeline execution to complete

In [None]:
execution.wait()

## Cleaning up resources

Users are responsible for cleaning up the resources created when running this notebook, including the `automated-model-governance` pipeline. Specify the `ModelName`, `ModelPackageName`, and `ModelPackageGroupName` values that need to be deleted. The model names are generated by the CreateModel step of the Pipeline, and those property values are available only in the Pipeline context. To find the models created by the pipeline, use the Model Registry and the SageMaker console.

In [None]:
# Create a SageMaker client
# sm_client = boto3.client("sagemaker")

# # Delete SageMaker Models
# sm_client.delete_model(ModelName="...")

# # Delete Model Packages
# sm_client.delete_model_package(ModelPackageName="...")

# # Delete the Model Package Group
# sm_client.delete_model_package_group(ModelPackageGroupName="...")

# # Delete the Pipeline
# sm_client.delete_pipeline(PipelineName="...")
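Deleting a model package group one version at a time is tedious. A sketch that removes every version in a group and then the group itself, using the `list_model_packages` paginator and the `delete_model_package` / `delete_model_package_group` calls. We delete the versions first because deleting a group that still contains versions is expected to fail. The helper name is our own, and `dry_run=True` is the default so that a first call only lists what would be removed.

```python
def delete_model_package_group(group_name: str, sagemaker_client=None, dry_run: bool = True) -> list:
    """Delete every model package version in a group, then the group itself.

    With dry_run=True nothing is deleted; the ARNs that would be removed are returned.
    """
    if sagemaker_client is None:
        import boto3

        sagemaker_client = boto3.client("sagemaker")

    paginator = sagemaker_client.get_paginator("list_model_packages")
    arns = [
        summary["ModelPackageArn"]
        for page in paginator.paginate(ModelPackageGroupName=group_name)
        for summary in page["ModelPackageSummaryList"]
    ]
    if not dry_run:
        for arn in arns:
            sagemaker_client.delete_model_package(ModelPackageName=arn)
        sagemaker_client.delete_model_package_group(ModelPackageGroupName=group_name)
    return arns
```

For example, inspect `delete_model_package_group("<EXISTING_MODEL_PACKAGE_GROUP_NAME>")` first, then repeat the call with `dry_run=False`. The pipeline itself can be removed with `sm_client.delete_pipeline(PipelineName=pipeline_name)`.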