# Notebook 3. Track Model Quality with SageMaker MLOps

## Learning Objectives
- Automate Machine Learning Operations (MLOps) with SageMaker Pipelines.
- Track model versions with the SageMaker Model Registry.
- Validate model performance using SageMaker Model Monitoring and Model Lineage.

## Environment Notes
This notebook was created and tested on an `ml.t3.medium (2 vCPU + 4 GiB)` notebook instance running the `Python 3 (Data Science)` kernel in SageMaker Studio.

## Table of Contents
1. [Background](#1.-Background)
    1. [Amazon SageMaker Model Building Pipelines](#1.1.-Amazon-SageMaker-Model-Building-Pipelines)
    1. [Amazon SageMaker Model Registry](#1.2.-Amazon-SageMaker-Model-Registry)
    1. [Amazon SageMaker Model Lineage](#1.3.-Amazon-SageMaker-Model-Lineage)
1. [Create SageMaker MLOps Project](#2.-Create-SageMaker-MLOps-Project)
    1. [Create a New Project Using the Build, Train, and Deploy Template](#2.1.-Create-a-New-Project-Using-the-Build,-Train,-and-Deploy-Template)
    1. [Clone the New Git Repositories](#2.2.-Clone-the-New-Git-Repositories)
    1. [Update the Build Repository](#2.3.-Update-the-Build-Repository)
    1. [Approve the New Model Version](#2.4.-Approve-the-New-Model-Version)
1. [Test the Model Inference Endpoint](#3.-Test-the-Model-Inference-Endpoint)
    1. [Examine Model Training Reports](#3.1.-Examine-Model-Training-Reports)
    1. [Invoke the Staging Model Endpoint](#3.2.-Invoke-the-Staging-Model-Endpoint)
1. [Explore the Model Lineage](#4.-Explore-the-Model-Lineage)
    1. [Visualize Lineage Entities as a Table](#4.1.-Visualize-Lineage-Entities-as-a-Table)
    1. [Visualize Lineage Entities as a Graph](#4.2.-Visualize-Lineage-Entities-as-a-Graph)
1. [(Optional) Deploy to Production](#5.-(Optional)-Deploy-to-Production)
1. [Clean Up](#6.-Clean-Up)

-----
## 1. Background

In Notebook 2 of this series, we demonstrated how SageMaker Processing, Training, and Hyperparameter Optimization (HPO) jobs can make the development of new machine learning (ML) models faster and more cost efficient. In this notebook, we'll look at some best practices for deploying your models to production and managing them once they are there. Many of these practices fall into the category of "Machine Learning Operations", or "MLOps", and are increasingly a part of many [regulatory and quality requirements](https://www.fda.gov/files/medical%20devices/published/US-FDA-Artificial-Intelligence-and-Machine-Learning-Discussion-Paper.pdf).

MLOps plays a key role in the **Model Deployment** and **Model Monitoring/Maintenance** phases of the Machine Learning Lifecycle. For more information, please refer to the [Machine Learning Best Practices in Healthcare and Life Sciences Whitepaper](https://d1.awsstatic.com/whitepapers/ML-best-practices-health-science.pdf?did=wp_card&trk=wp_card).

![Machine Learning Life Cycle - Part 1](img/MLLC2.png "ML Life Cycle - Part 1")

### 1.1. Amazon SageMaker Model Building Pipelines

[Amazon SageMaker Model Building Pipelines](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines.html) is a tool for building machine learning pipelines that take advantage of direct SageMaker integration. Because of this integration, you can create a pipeline and set up SageMaker Projects for orchestration using a tool that handles much of the step creation and management for you. You can manage these pipelines in the SageMaker Studio UI and automatically capture data and model lineage.

One of the challenges with deploying ML solutions is that their effectiveness can change over time. For example, the distribution of your data may shift from year to year, or the boundaries of a classification category may move. In these cases, you want to be able to quickly retrain and deploy new versions of your model, either on a schedule or in response to some event.

Amazon SageMaker Pipelines allows us to define reproducible ML processes that we can trigger at will. In this example, we'll use the processing, training, and registration artifacts from above to create a pipeline and demonstrate how to execute it.
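
As a rough illustration of "triggering at will", the sketch below starts a new execution of an existing pipeline through the boto3 `start_pipeline_execution` call. The helper name `start_retraining` and the display-name format are our own illustrative choices, not part of the project template, and the pipeline itself is only created later in this notebook.

```python
from datetime import datetime, timezone


def start_retraining(sm_client, pipeline_name, reason="manual"):
    """Start a new execution of an existing SageMaker pipeline.

    sm_client is assumed to be a boto3 SageMaker client, for example
    boto3.client("sagemaker"). The display-name format is illustrative.
    """
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    response = sm_client.start_pipeline_execution(
        PipelineName=pipeline_name,
        PipelineExecutionDisplayName=f"{reason}-{timestamp}",
    )
    # The ARN identifies this specific run for later status checks
    return response["PipelineExecutionArn"]
```

Once the project pipeline exists, a call such as `start_retraining(sagemaker_boto_client, pipeline_name, reason="drift")` could be wired to a schedule or an event handler.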

### 1.2. Amazon SageMaker Model Registry

The [Amazon SageMaker Model Registry](https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry.html) is a managed service that allows you to track model metadata, approve releases, and deploy new versions to production. It involves two concepts:

- A **Model Package Group** is a group of models that share a common business goal. For example, you might create a model package group to track models for segmenting a specific kind of medical image.
- A **Model Package** or **Model Version** is a member of a Model Package Group. It refers to a specific implementation of a model with its own training artifact and/or inference container.
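
To make the relationship between the two concepts concrete, here is a minimal sketch that lists the versions in a model package group together with their approval status. It assumes a boto3 SageMaker client is passed in; the helper name `summarize_model_versions` is hypothetical, and only the first page of results is read.

```python
def summarize_model_versions(sm_client, group_name):
    """Return (version, approval status, ARN) for each model package in a group.

    sm_client is assumed to be a boto3 SageMaker client. Only the first
    page of results is read, which is enough for a small demo group.
    """
    response = sm_client.list_model_packages(
        ModelPackageGroupName=group_name,
        SortBy="CreationTime",
        SortOrder="Descending",  # newest version first
    )
    return [
        (
            package.get("ModelPackageVersion"),
            package.get("ModelApprovalStatus"),
            package.get("ModelPackageArn"),
        )
        for package in response.get("ModelPackageSummaryList", [])
    ]
```

Each tuple corresponds to one Model Version; the group name passed in identifies the Model Package Group.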

### 1.3. Amazon SageMaker Model Lineage

[Amazon SageMaker Model Lineage](https://docs.aws.amazon.com/sagemaker/latest/dg/lineage-tracking.html) creates and stores information about the steps of a machine learning (ML) workflow from data preparation to model deployment. With the tracking information, you can reproduce the workflow steps, track model and dataset lineage, and establish model governance and audit standards.

-----
## 2. Create SageMaker MLOps Project

### 2.1. Create a New Project Using the Build, Train, and Deploy Template

Select the Home icon from the SageMaker Studio sidebar and then the Deployments group.

![Resources](img/deployments.png)

Select **Projects** and then **Create project**.

![Projects](img/projects.png)

In the **SageMaker project templates** view, select the **MLOps template for model building, training, and deployment** template and then **Select Template**.

![Create Project](img/create_project.png)

In the **Project details** view, type `her2-brca-classifier` in the **Name** field and select **Create Project**.

![Create Project](img/project_name.png)

While the new project is starting, import some libraries and create clients.

In [None]:
%pip install --disable-pip-version-check setuptools==59.5.0 -q -q
%pip install --disable-pip-version-check -U sagemaker jsonlines pyvis -q -q

In [None]:
import base64
import boto3
import jsonlines
import os
import pandas as pd
import sagemaker
from sagemaker.lineage.context import Context, EndpointContext
from sagemaker.lineage.lineage_trial_component import LineageTrialComponent
from sagemaker.lineage.visualizer import LineageTableVisualizer
from sagemaker.predictor import Predictor
import shutil
from time import sleep
from visualizer.visualizer import Visualizer

boto_session = boto3.session.Session()
region = boto_session.region_name
sagemaker_session = sagemaker.session.Session(boto_session)
sagemaker_execution_role = sagemaker.session.get_execution_role(sagemaker_session)
sagemaker_boto_client = boto_session.client("sagemaker")
s3_boto_client = boto_session.client("s3")
account_id = boto_session.client("sts").get_caller_identity().get("Account")
print(f"Assumed SageMaker role is {sagemaker_execution_role}")

Capture some information associated with your new project.

In [None]:
# NOTE If you use a different name for your project, please update this variable:
project_name = "her2-brca-classifier"

In [None]:
project_id = sagemaker_boto_client.describe_project(ProjectName=project_name).get(
    "ProjectId"
)
print(f"SageMaker project name is {project_name}")
print(f"SageMaker project ID is {project_id}")

s3_bucket = f"sagemaker-project-{project_id}"
pipeline_name = f"{project_name.lower()}-{project_id}"
print(f"Pipeline name is {pipeline_name}")
staging_endpoint_name = f"{project_name}-staging"
prod_endpoint_name = f"{project_name}-prod"
build_code_path = f"/root/{pipeline_name}/sagemaker-{pipeline_name}-modelbuild"
deploy_code_path = f"/root/{pipeline_name}/sagemaker-{pipeline_name}-modeldeploy"

SageMaker will automatically create and run a default "abalone" pipeline when it creates the new project, which can take as long as 15 minutes to finish. To speed things up, run the following cell to stop this pipeline execution.

In [None]:
# Halt execution of default "Abalone" pipeline
codepipeline_boto_client = boto3.client("codepipeline")
codepipeline_name_build = f"sagemaker-{pipeline_name}-modelbuild"
pipelineExecutionId = codepipeline_boto_client.list_pipeline_executions(
    pipelineName=codepipeline_name_build
)["pipelineExecutionSummaries"][-1]["pipelineExecutionId"]

codepipeline_boto_client.stop_pipeline_execution(
    pipelineName=codepipeline_name_build,
    pipelineExecutionId=pipelineExecutionId,
    abandon=True,
)

Here's an overview of the services created by the project pipeline:

**Model Building and Training Stack**

![Model Building and Training Stack](img/template_build.jpg)

**Model Deployment Stack**

![Model Deployment Stack](img/template_deploy.jpg)

### 2.2. Clone the New Git Repositories
Once the project has successfully been created, navigate to the **Repositories** tab in the project view.

![Repositories](img/repositories.png)

Select the **clone repo...** link for the first repository and then **Clone Repository** with the default options on the next view.

![Default Repo Settings](img/repo_defaults.png)

Repeat for the second repository. 

From the SageMaker Studio sidebar, select the **File Browser** icon. Verify that there is a new folder in your home directory named `her2-brca-classifier`, followed by the project ID. There should also be two subfolders, one for the model build steps and another for the model deploy steps.

![Default Repo Settings](img/cloned_folders.png)
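
If you prefer to verify the cloned folders from code rather than the file browser, a small check like the following may help. It is a sketch that assumes the folder layout used by the `build_code_path` and `deploy_code_path` variables defined earlier; the helper name `find_cloned_repos` is hypothetical.

```python
import os


def find_cloned_repos(home_dir, pipeline_name):
    """Report whether the expected build and deploy folders exist.

    Assumes the layout <home_dir>/<pipeline_name>/sagemaker-<pipeline_name>-model{build,deploy},
    which mirrors the path variables defined earlier in this notebook.
    """
    project_dir = os.path.join(home_dir, pipeline_name)
    expected = {
        "modelbuild": os.path.join(project_dir, f"sagemaker-{pipeline_name}-modelbuild"),
        "modeldeploy": os.path.join(project_dir, f"sagemaker-{pipeline_name}-modeldeploy"),
    }
    # Map each repository kind to True/False depending on whether its folder exists
    return {name: os.path.isdir(path) for name, path in expected.items()}
```

For example, `find_cloned_repos("/root", pipeline_name)` should return `True` for both entries once both repositories are cloned.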

<!-- 6. Within the build subfolder (It will be named something like `sagemaker-her2-brca-classifier-[PROJECT ID]-modelbuild`) navigate to `pipelines/abalone`. This folder will contain three files:
- `evaluate.py`: A Python module for measuring model performance.
- `pipeline.py`: A Python module that defines a SageMaker Pipelines model building workflow.
- `preprocess.py`: A Python module for running a data processing job.

Each of these files contains placeholder code for now. We'll update them with our own code in the next section. -->

### 2.3. Update the Build Repository

Run the following cells to update your cloned repository with custom pipeline code and push the changes to CodeCommit. This will restart the MLOps process and build a new version of your pipeline with your custom model training code.

In [None]:
%%bash -s "$build_code_path" 
cp -r scripts/pipelines/her2pipeline $1/pipelines/her2pipeline
cp scripts/pipelines/codebuild-buildspec.yml $1
cd $1
git config --global user.email "awsuser@amazon.com"
git config --global user.name "AWS User"
git add .
git commit -a -m "Update pipeline code"
git config --global credential.helper '!aws codecommit credential-helper $@'
git config --global credential.UseHttpPath true
git push -u origin

It will take approximately 15 minutes to rebuild and execute the pipeline. You can track the progress either on the **Pipelines** tab of the project view or on the AWS **CodeBuild** console.
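
Progress can also be polled from the notebook. The sketch below repeatedly calls `list_pipeline_executions` until the newest execution reaches a terminal status. The helper name `wait_for_pipeline`, the polling interval, and the optional `started_after` filter are illustrative assumptions rather than part of the project template.

```python
import time

# Statuses after which a pipeline execution will not change again
TERMINAL_STATUSES = {"Succeeded", "Failed", "Stopped"}


def wait_for_pipeline(
    sm_client, pipeline_name, started_after=None, poll_seconds=30, max_polls=60
):
    """Poll the newest pipeline execution until it reaches a terminal status.

    sm_client is assumed to be a boto3 SageMaker client. If started_after
    (a timezone-aware datetime) is given, older executions are ignored so
    that a previous run is not mistaken for the new one.
    """
    for _ in range(max_polls):
        summaries = sm_client.list_pipeline_executions(
            PipelineName=pipeline_name,
            SortBy="CreationTime",
            SortOrder="Descending",
        )["PipelineExecutionSummaries"]
        if started_after is not None:
            summaries = [s for s in summaries if s["StartTime"] >= started_after]
        if summaries and summaries[0]["PipelineExecutionStatus"] in TERMINAL_STATUSES:
            return summaries[0]["PipelineExecutionStatus"]
        time.sleep(poll_seconds)
    raise TimeoutError(f"Pipeline {pipeline_name} did not finish in time")
```

Because the push first triggers CodeBuild, the new execution may take a few minutes to show up; until then the loop simply keeps polling.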

![Pipeline Execution](img/pipeline_execution.png)

### 2.4. Approve the New Model Version
1. Navigate back to the project view.
2. Select the **Model groups** tab.
3. Double-click on the model group name (e.g. `her2-brca-classifier-[PROJECT ID]`) to view the available model versions.
4. Double-click on model version 2 to view its details.
5. Navigate between the **Activity**, **Model quality**, and **Settings** tabs to view information about the model inference endpoint.
6. Select the orange **Update Status** button in the upper-right corner of the model registry view.
7. Update the status to **Approved** and (optionally) add a comment.

![Update the model status](img/update-status.png "Update the model status")

8. Wait several minutes for the "Staging" endpoint to appear in the Endpoints tab.
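
The approval steps above can also be scripted. The following sketch looks up the newest model package in a group and sets its status to `Approved` with `update_model_package`. The helper name `approve_latest_model_version` and the default comment are hypothetical; we assume the model package group is named after the pipeline, as in the Clean Up section.

```python
def approve_latest_model_version(sm_client, group_name, comment="Approved from notebook"):
    """Approve the most recently created model package in a group.

    sm_client is assumed to be a boto3 SageMaker client.
    Returns the ARN of the approved model package.
    """
    packages = sm_client.list_model_packages(
        ModelPackageGroupName=group_name,
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )["ModelPackageSummaryList"]
    if not packages:
        raise ValueError(f"No model packages found in group {group_name}")
    model_package_arn = packages[0]["ModelPackageArn"]
    # Changing the approval status is what triggers the deployment pipeline
    sm_client.update_model_package(
        ModelPackageArn=model_package_arn,
        ModelApprovalStatus="Approved",
        ApprovalDescription=comment,
    )
    return model_package_arn
```

In this notebook the call would look like `approve_latest_model_version(sagemaker_boto_client, pipeline_name)`.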

Real-time inference endpoints are deployed to persistent EC2 instances. This allows them to respond quickly to requests and support a wide range of custom properties, which makes them a good choice for models with steady usage. However, there are other ways to deploy a model on SageMaker as well.

![SageMaker Model Deployment Options](img/deployment_options.png "SageMaker Model Deployment Options")

## 3. Test the Model Inference Endpoint

### 3.1. Examine Model Training Reports

In [None]:
# Download training reports
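# Sort explicitly so the first summary is the most recent training job
last_training_job_name = (
    sagemaker_boto_client.list_training_jobs(
        SortBy="CreationTime", SortOrder="Descending"
    )
    .get("TrainingJobSummaries")[0]
    .get("TrainingJobName")
)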
rule_output_path = (
    f"s3://sagemaker-project-{project_id}/{last_training_job_name}/rule-output"
)
print(f"Downloading training reports from {rule_output_path}")
sagemaker.s3.S3Downloader.download(
    s3_uri=rule_output_path, local_path="training_reports/"
)

-----
### 3.2. Invoke the Staging Model Endpoint

In [None]:
# Download the test data written by the most recent preprocessing job
latest_processing_job_name = sagemaker_boto_client.list_processing_jobs(
    NameContains="PreprocessHER2Data",
    SortBy="CreationTime",
    SortOrder="Descending",
)["ProcessingJobSummaries"][0]["ProcessingJobName"]
recent_test_data_uri = sagemaker.s3.parse_s3_url(
    sagemaker_boto_client.describe_processing_job(
        ProcessingJobName=latest_processing_job_name
    )["ProcessingOutputConfig"]["Outputs"][-1]["S3Output"]["S3Uri"]
)
sagemaker_session.download_data(
    "data/output/test",
    bucket=recent_test_data_uri[0],
    key_prefix=f"{recent_test_data_uri[1]}/test.csv",
)

# Create a Predictor object for testing
predictor = Predictor(
    endpoint_name=staging_endpoint_name,
    sagemaker_session=sagemaker_session,
    serializer=sagemaker.serializers.CSVSerializer(),
    deserializer=sagemaker.deserializers.JSONDeserializer(),
)

# Load a random sample of 25 records from the test data
test_df = pd.read_csv("data/output/test/test.csv").sample(n=25)

# Submit the 25 samples to the inference endpoint and compare the actual and predicted values
print(
    "Sending test traffic to the endpoint {}. \nPlease wait...".format(
        staging_endpoint_name
    )
)

for i, row in test_df.iterrows():
    # The first column holds the label; the remaining columns are the features
    print(
        f"[Actual | predicted] labels for record {i:3} are [{row.iloc[0]} | {predictor.predict(row.iloc[1:]):.3f}]"
    )
    sleep(0.1)

Wait for the monitoring data to finish processing. This will take about a minute to complete.

In [None]:
# Poll the S3 location configured for the endpoint's data capture
endpoint_capture_uri = (
    sagemaker_boto_client.describe_endpoint(EndpointName=staging_endpoint_name)
    .get("DataCaptureConfig")
    .get("DestinationS3Uri")
)
endpoint_capture_bucket, endpoint_capture_prefix = sagemaker.s3.parse_s3_url(
    endpoint_capture_uri
)
result = {}
while result.get("Contents") is None:
    print("Waiting for endpoint monitoring data to populate...")
    result = s3_boto_client.list_objects(
        Bucket=endpoint_capture_bucket, Prefix=endpoint_capture_prefix
    )
    sleep(10)
capture_files = [capture_file.get("Key") for capture_file in result.get("Contents")]
print("Found Capture Files:")
print("\n ".join(capture_files))

Examine the contents of the first data capture file.

In [None]:
# Download the monitoring data from S3
sagemaker_session.download_data(
    "data", bucket=endpoint_capture_bucket, key_prefix=capture_files[0]
)
# Open the jsonlines file and summarize the contents
with jsonlines.open(f"data/{os.path.basename(capture_files[0])}") as reader:
    runs = list(reader)

print(f"Number of runs captured in file: {len(runs)}")
print(f"First event metadata: {runs[0]['eventMetadata']}")

first_request = runs[0]["captureData"]["endpointInput"]["data"]
print(f"First event input: {first_request[:2000]}...")

first_response = runs[0]["captureData"]["endpointOutput"]["data"]

print(f"First event output: {first_response}")
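
Depending on the content type, captured payloads may be stored as plain text or as Base64. The sketch below decodes a single capture part. It assumes each part (for example `runs[0]["captureData"]["endpointInput"]`) carries an `encoding` field next to `data`, which is our understanding of the capture format; the helper name `decode_capture_payload` is hypothetical.

```python
import base64


def decode_capture_payload(capture_part):
    """Return the payload of one capture part as text.

    capture_part is assumed to be a dict with a "data" field and an
    optional "encoding" field. Base64 payloads are decoded to UTF-8 text;
    anything else is returned unchanged.
    """
    data = capture_part["data"]
    if capture_part.get("encoding", "").upper() == "BASE64":
        return base64.b64decode(data).decode("utf-8")
    return data
```

For CSV-encoded captures this is a no-op, so it is safe to apply before resubmitting a request to the endpoint.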

Resubmit the data from the first prediction.

In [None]:
predictor.predict(first_request)

-----
## 4. Explore the Model Lineage

Effective model governance requires a detailed understanding of the data and data transformations used in the modeling process, in addition to nearly continuous tracking of all model development iterations. It is important to keep track of which dataset was used, what transformations were applied to the data, where the dataset was stored, and what type of model was built. This metadata that tracks the relationships between various entities in your ML workflows is called the "lineage".

In this section, we'll explore the model artifacts and events that Amazon SageMaker ML Lineage Tracking creates for us automatically. We'll also see how to expand the lineage by manually adding additional artifacts.

### 4.1. Visualize Lineage Entities as a Table

Amazon SageMaker automatically creates tracking entities for SageMaker jobs, models, model packages, and endpoints if the data is available.

In [None]:
from sagemaker.workflow import pipeline

pipeline_execution_arn = (
    sagemaker_boto_client.list_pipeline_executions(PipelineName=pipeline_name)
    .get("PipelineExecutionSummaries")[0]
    .get("PipelineExecutionArn")
)

execution = sagemaker.workflow.pipeline._PipelineExecution(arn=pipeline_execution_arn)
table_viz = LineageTableVisualizer(sagemaker_session=sagemaker_session)
for execution_step in reversed(execution.list_steps()):
    print(execution_step.get("StepName"))
    display(table_viz.show(pipeline_execution_step=execution_step))
    sleep(1)

### 4.2. Visualize Lineage Entities as a Graph

We can also visualize the ML lineage as a graph.

In [None]:
endpoint_info = sagemaker_boto_client.describe_endpoint(
    EndpointName=staging_endpoint_name
)
endpoint_arn = endpoint_info["EndpointArn"]
print(f"Endpoint Name: {endpoint_info['EndpointName']}")

# Get the endpoint context for querying the lineage graph
contexts = Context.list(source_uri=endpoint_arn, sagemaker_session=sagemaker_session)
context_name = list(contexts)[0].context_name

viz = Visualizer()
print("Querying lineage for context", context_name)
endpoint_context = EndpointContext.load(
    context_name=context_name, sagemaker_session=sagemaker_session
)
query_response = sagemaker_boto_client.query_lineage(
    StartArns=[endpoint_context.context_arn],
    Direction="Ascendants",
    IncludeEdges=True,
)
viz.render(query_response, "Endpoint", sagemaker_session=sagemaker_session)
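
For a quick text summary alongside the graph, the lineage query response can be tallied by entity and association type. This is a small sketch that assumes the response contains `Vertices` (each with a `Type`) and `Edges` (each with an `AssociationType`); the helper name `summarize_lineage` is hypothetical.

```python
from collections import Counter


def summarize_lineage(query_response):
    """Count lineage vertices by Type and edges by AssociationType.

    query_response is assumed to be the dict returned by the SageMaker
    query_lineage call with IncludeEdges=True.
    """
    vertex_counts = Counter(
        vertex.get("Type", "Unknown") for vertex in query_response.get("Vertices", [])
    )
    edge_counts = Counter(
        edge.get("AssociationType", "Unknown")
        for edge in query_response.get("Edges", [])
    )
    return dict(vertex_counts), dict(edge_counts)
```

Calling `summarize_lineage(query_response)` returns two dictionaries that can be printed or loaded into a DataFrame.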

## 5. (Optional) Deploy to Production
1. In the AWS Console, search for and select **CodePipeline**.

![Search for CodePipeline](img/code-pipeline.png)

2. Navigate to **Pipeline > Pipelines** and select the model deploy pipeline already in progress.

![Find Prod Deploy Pipeline](img/find-prod-deploy.png)

3. Scroll down to the **DeployStaging** stage and select **Review**.

![Select Review](img/deploy-stage.png)

4. Select **Approve** in the Review view.

![Approve Prod Deployment](img/approve-prod.png)

5. Navigate back to the SageMaker Project view. After several minutes, a second "prod" endpoint will appear.

![Second Endpoint](img/second-endpoint.png)
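
The console steps above can be approximated in code. The sketch below reads the pipeline state from CodePipeline and approves every action that is waiting with an approval token. The helper name `approve_pending_actions` is hypothetical, and the deploy pipeline name is assumed to follow the pattern `sagemaker-{pipeline_name}-modeldeploy`, by analogy with the build pipeline name used earlier.

```python
def approve_pending_actions(cp_client, codepipeline_name, summary="Approved from notebook"):
    """Approve every manual approval action currently waiting in a pipeline.

    cp_client is assumed to be a boto3 CodePipeline client. Only actions
    that are in progress and expose an approval token are touched.
    Returns a list of (stage name, action name) pairs that were approved.
    """
    state = cp_client.get_pipeline_state(name=codepipeline_name)
    approved = []
    for stage in state.get("stageStates", []):
        for action in stage.get("actionStates", []):
            latest = action.get("latestExecution", {})
            if latest.get("status") == "InProgress" and "token" in latest:
                cp_client.put_approval_result(
                    pipelineName=codepipeline_name,
                    stageName=stage["stageName"],
                    actionName=action["actionName"],
                    result={"summary": summary, "status": "Approved"},
                    token=latest["token"],
                )
                approved.append((stage["stageName"], action["actionName"]))
    return approved
```

With the client created earlier, a call might look like `approve_pending_actions(codepipeline_boto_client, f"sagemaker-{pipeline_name}-modeldeploy")`.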


## 6. Clean Up

In [None]:
# Delete model registry records
for package in sagemaker_boto_client.list_model_packages(
    ModelPackageGroupName=pipeline_name
).get("ModelPackageSummaryList"):
    print(package)
    sagemaker_boto_client.delete_model_package(
        ModelPackageName=package.get("ModelPackageArn")
    )
sagemaker_boto_client.delete_model_package_group(ModelPackageGroupName=pipeline_name)

# Delete endpoint
predictor.delete_endpoint()

# Delete pipeline
sagemaker_boto_client.delete_pipeline(PipelineName=pipeline_name)

# Delete all S3 objects
bucket = boto_session.resource("s3").Bucket(s3_bucket)
bucket.objects.filter().delete()
bucket.delete()

# Delete Project
sagemaker_boto_client.delete_project(ProjectName=project_name)

# Delete deployment infrastructure
cfn = boto3.client("cloudformation")
cfn.delete_stack(StackName=f"sagemaker-{project_name}-{project_id}-deploy-staging")
cfn.delete_stack(StackName=f"sagemaker-{project_name}-{project_id}-deploy-prod")

# Delete local objects
os.system(f"rm -rf ~/{project_name}-{project_id}")
os.system("rm -rf data models generated training_reports")