# Notebook 3. Track Model Quality with SageMaker MLOps

## Learning Objectives
- Automate Machine Learning Operations (MLOps) with SageMaker Pipelines.
- Track model versions with the SageMaker Model Registry.
- Validate model performance using SageMaker Model Monitoring and Model Lineage.

## Environment Notes:
This notebook was created and tested on an `ml.t3.medium (2 vCPU + 4 GiB)` notebook instance running the `Python 3 (Data Science)` kernel in SageMaker Studio.

## Table of Contents
1. [Background](#1.-Background)
    1. [Amazon SageMaker Model Building Pipelines](#1.A.-Amazon-SageMaker-Model-Building-Pipelines)
    1. [Amazon SageMaker Model Registry](#1.B.-Amazon-SageMaker-Model-Registry)
    1. [Amazon SageMaker Model Lineage](#1.D.-Amazon-SageMaker-Model-Lineage)
1. [Create SageMaker MLOps Project](#2.-Create-SageMaker-MLOps-Project)
    1. [Create a New Project Using the Build, Traing, and Deploy Template](#2.A.-Create-a-New-Project-Using-the-Build,-Traing,-and-Deploy-Template)
    1. [Clone the New Git Repositories](#2.B.-Clone-the-New-Git-Repositories)
    1. [Update the Build Repository](#2.C.-Update-the-Build-Repository)
    1. [Approve the New Model Version](#2.D.-Approve-the-New-Model-Version)
1. [Test the Model Inference Endpoint](#3.-Test-the-Model-Inference-Endpoint)
    1. [Import Libraries and Create Clients](#3.A.-Import-Libraries-and-Create-Clients)
    1. [Examine Model Training Reports](#3.B.-Examine-Model-Training-Reports)
    1. [Invoke the Staging Model Endpoint](#3.C.-Invoke-the-Staging-Model-Endpoint)
1. [Explore the Model Lineage](#4.-Explore-the-Model-Lineage)
    1. [Visualize Lineage Entities as a Table](#4.A.-Visualize-Lineage-Entities-as-a-Table)
    1. [Visualize Lineage Entities as a Graph](#4.B.-Visualize-Lineage-Entities-as-a-Graph)
1. [Approve the Model Version for Release](#5.-Approve-the-Model-Version-for-Release)
1. [Clean Up](#6.-Clean-Up)

-----
## 1. Background

In Notebook 2 of this series, we demonstrated how SageMaker Processing, Training, and Hyperparameter Optimization (HPO) jobs can make the development of new machine learning (ML) models faster and more cost efficient. In this notebook, we'll look at some best practices for deploying and managing your models into production. Many of these practices fall into the category of "Machine Learning Operations", or "MLOps" and are increasingly a part of many [regulatory and quality requirements](https://www.fda.gov/files/medical%20devices/published/US-FDA-Artificial-Intelligence-and-Machine-Learning-Discussion-Paper.pdf).

MLOps plays a key role in the **Model Deployment** and **Model Monitoring/Maintenance** phases of the Machine Learning Lifecycle. For more information, please refer to the [Machine Learning Best Practices in Healthcare and Life Sciences Whitepaper](https://d1.awsstatic.com/whitepapers/ML-best-practices-health-science.pdf?did=wp_card&trk=wp_card).

![Machine Learning Life Cycle - Part 1](img/MLLC2.png "ML Life Cycle - Part 1")

### 1.A. Amazon SageMaker Model Building Pipelines

[Amazon SageMaker Model Building Pipelines](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines.html) is a tool for building machine learning pipelines that take advantage of direct SageMaker integration. Because of this integration, you can create a pipeline and set up SageMaker Projects for orchestration using a tool that handles much of the step creation and management for you. You can manage these pipelines in the SageMaker Studio UI and automatically capture data and model lineage.

One of the challenges with deploying ML solutions is that their effectiveness can change over time.  For example, perhaps the distribution of your data shifts from year-to-year? Or the boundaries of a classification category? In these cases, you want to be able to quickly retrain and deploy new versions of your model, either on a schedule or in response to some event.

Amazon SageMaker Pipelines allows us to define reproducible ML processes that we can trigger at will. In this example, we'll use the processing, training, and registration artifacts from above to create a pipeline and demonstrate how to execute it.

### 1.B. Amazon SageMaker Model Registry

The [Amazon SageMaker Model Registry](https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry.html) is a managed service that allows you to track model metadata, approve releases, and deploy new versions to production. It involves two concepts:

- A **Model Package Group** is a group of models that share a common business goal. For example, you might create a model package group to track models for segmenting a specific kind of medical image.
- A **Model Package** or **Model Version** is a member of a Model Package Group. It refers to the a specific implementation of a model with its own training artifact and/or inference container.

### 1.C. Amazon SageMaker Model Lineage

[Amazon SageMaker Model Lineage](https://docs.aws.amazon.com/sagemaker/latest/dg/lineage-tracking.html) creates and stores information about the steps of a machine learning (ML) workflow from data preparation to model deployment. With the tracking information, you can reproduce the workflow steps, track model and dataset lineage, and establish model governance and audit standards.

-----
## 2. Create SageMaker MLOps Project

### 2.A. Create a New Project Using the Build, Traing, and Deploy Template

1. From the SageMaker Studio sidebar, select the SageMaker Resources icon.

![Resources](img/resources.png)

2. Select **Projects** from the resources menu.
3. Select **Create project**
4. In the **SageMaker project templates** view, select the **MLOps template for model building, training, and deployment** template and then **Select Template**.

![Create Project](img/create_project.png)

5. In the **Project details** view, type `her2-brca-classifier` in the **Name** field and select **Create Project**.

![Create Project](img/name_project.png)

6. Wait approximately 2 minutes for the CloudFormation templation template associated with the project to finish deployment.

### 2.B. Clone the New Git Repositories
1. Once the project has successfully been created, navigate to the **Repositories** tab in the project view.

![Repositories](img/repositories.png)

2. Select the **clone repo...** link for the first repository and then **Clone Repository** with the default options on the next view.

![Default Repo Settings](img/repo_defaults.png)

3. Repeat for the second repository.
4. From the SageMaker Studio sidebar, select the **File Browser** icon.
5. Verify that there is a new folder in your home directory named `her2-brca-classifier`, followed by the project ID. There should also be two subfolders, one for the model build steps and another for the model deploy steps.

![Default Repo Settings](img/cloned_folders.png)

6. Within the build subfolder (It will be named something like `sagemaker-her2-brca-classifier-[PROJECT ID]-modelbuild`) navigate to `pipelines/abalone`. This folder will contain three files:
- `evaluate.py`: A Python module for measuring model performance.
- `pipeline.py`: A Python module that defines a SageMaker Pipelines model building workflow.
- `preprocess.py`: A Python module for running a data processing job.

Each of these files contains placeholder code for now. We'll update them with our own code in the next section.

### 2.C. Update the Build Repository
1. Navigate to the workshop folder (i.e. the one holding this notebook) in the File Browser.
1. Right-click on the "abalone" folder and select **Cut**.
1. Navigate back to the model build pipeline folder (e.g. `~/her2-brca-classifier-[PROJECT ID]/sagemaker-her2-brca-classifier-[PROJECT ID]-modelbuild/pipelines`), right-click, and select **Paste**. This will overwrite the existing `abalone` folder with the custom one.
1. From the SageMaker Studio sidebar, select the **Git** icon. The **evaluate**, **pipeline.py**, **preprocess.py** files should all appear in the **Changed** section of the **Changes** tag. Click on the `+` symbol on the section header to stage all changes. Write a short summary of the changes and then select **Commit**
1. Select **Push committed changes** (The cloud with an up arrow icon) from the top of the Git view to push your changes to CodeCommit. This will restart the MLOps process and build a new version of your pipeline and model.
1. It will take approximately 15 minutes to rebuild and execute the pipeline. You can track the progress either on the **Pipelines** tab of the project view or on the AWS **CodeBuild** console.

![Pipeline Execution](img/pipeline_execution.png)

### 2.D. Approve the New Model Version
1. Navigate back to the project view.
2. Select the **Model groups** tab.
3. Double-click on the model group name (e.g. `her2-brca-classifier-[PROJECT ID]`) to view the available model versions.
4. Double-click on model version 2 to view its details.
5. Navigate between the **Activity**, **Model quality**, and **Settings** tabs to view information about the model inference endpoint.
6. Select the orange **Update Status** button in the upper-right corner of the model registry view.
7. Update the **Approved** status and (optionally) add a comment.

![Update the model status](img/update-status.png "Update the model status")

8. Wait several minutes for the "Staging" endpoint appear in the Endpoints tab.

## 3. Test the Model Inference Endpoint

### 3.A. Import Libraries and Create Clients

In [9]:
# %pip install -r ../requirements.txt -q -q
%pip install --disable-pip-version-check setuptools==59.5.0 -q -q
%pip install --disable-pip-version-check -U sagemaker jsonlines pyvis -q -q

  from cryptography.utils import int_from_bytes
  from cryptography.utils import int_from_bytes
Note: you may need to restart the kernel to use updated packages.
  from cryptography.utils import int_from_bytes
  from cryptography.utils import int_from_bytes
Note: you may need to restart the kernel to use updated packages.


In [17]:
import base64
import boto3
import jsonlines
import os
import pandas as pd
import sagemaker
from sagemaker.lineage.context import Context, EndpointContext
from sagemaker.lineage.lineage_trial_component import LineageTrialComponent
from sagemaker.lineage.visualizer import LineageTableVisualizer
from sagemaker.predictor import Predictor
from time import strftime, sleep
from visualizer.visualizer import Visualizer

boto_session = boto3.session.Session()
region = boto_session.region_name
sagemaker_session = sagemaker.session.Session(boto_session)
sagemaker_execution_role = sagemaker.session.get_execution_role(sagemaker_session)
sagemaker_boto_client = boto_session.client("sagemaker")
s3_boto_client = boto_session.client("s3")
account_id = boto_session.client("sts").get_caller_identity().get("Account")
print(f"Assumed SageMaker role is {sagemaker_execution_role}")

project_name = "her2-brca-classifier" #Change this if you named your project something else
project_name = "brca-classification" #Change this if you named your project something else
project_id = sagemaker_boto_client.describe_project(ProjectName=project_name).get('ProjectId')
print(f"SageMaker project ID is {project_id}")

s3_bucket = f"sagemaker-project-{project_id}"
pipeline_name = model_package_group_name = f"{project_name}-{project_id}"
staging_endpoint_name=f"{project_name}-staging"
prod_endpoint_name=f"{project_name}-prod"

Assumed SageMaker role is arn:aws:iam::167428594774:role/service-role/AmazonSageMaker-ExecutionRole-20220126T100392
SageMaker project ID is p-k8h7hzoq80hg


### 3.B. Examine Model Training Reports

In [18]:
# Download training reports
last_training_job_name = sagemaker_boto_client.list_training_jobs().get("TrainingJobSummaries")[0].get("TrainingJobName")
rule_output_path = f"s3://sagemaker-project-{project_id}/{last_training_job_name}/rule-output"
print(f"Downloading training reports from {rule_output_path}")
sagemaker.s3.S3Downloader.download(s3_uri=rule_output_path, local_path="training_reports/")

Downloading training reports from s3://sagemaker-project-p-k8h7hzoq80hg/pipelines-s7sjppjkseso-TrainHER2Model-oAVf2ME1ES/rule-output


-----
### 3.C. Invoke the Staging Model Endpoint

In [23]:
# Create a Predictor object for testing
predictor = Predictor(
    endpoint_name=staging_endpoint_name,
    sagemaker_session=sagemaker_session,
    serializer=sagemaker.serializers.CSVSerializer(),
    deserializer=sagemaker.deserializers.JSONDeserializer(),
)

# Load a random sample of 10 records from the validation data
test_df = pd.read_csv("data/output/test/test.csv").sample(n=25)

# Submit the 10 samples to the inference endpoint and compare the actual and predicted values
print("Sending test traffic to the endpoint {}. \nPlease wait...".format(staging_endpoint_name))

for i, row in test_df.iterrows():    
    print(f"[Actual | predicted] labels for record {i:3} are [{row[0]} | {predictor.predict(row.iloc[1:]):.3f}]")
    sleep(0.1)    

Sending test traffic to the endpoint brca-classification-staging. 
Please wait...
[Actual | predicted] labels for record  17 are [0.0 | 0.004]
[Actual | predicted] labels for record  65 are [0.0 | 0.001]
[Actual | predicted] labels for record  78 are [0.0 | 0.207]
[Actual | predicted] labels for record  94 are [0.0 | 0.021]
[Actual | predicted] labels for record  75 are [0.0 | 0.009]
[Actual | predicted] labels for record 114 are [0.0 | 0.003]
[Actual | predicted] labels for record  27 are [0.0 | 0.026]
[Actual | predicted] labels for record  62 are [0.0 | 0.001]
[Actual | predicted] labels for record 100 are [0.0 | 0.003]
[Actual | predicted] labels for record  84 are [0.0 | 0.003]
[Actual | predicted] labels for record  71 are [0.0 | 0.010]
[Actual | predicted] labels for record  93 are [0.0 | 0.003]
[Actual | predicted] labels for record  23 are [1.0 | 0.986]
[Actual | predicted] labels for record 120 are [0.0 | 0.001]
[Actual | predicted] labels for record 117 are [0.0 | 0.009]
[Ac

Wait for the monitoring data to finish processing. This will take about a minute to complete.

In [25]:
# Watch the S3 bucket we specified above for storing monitoring data
endpoint_capture_uri = sagemaker_boto_client.describe_endpoint(EndpointName=staging_endpoint_name).get('DataCaptureConfig').get('DestinationS3Uri')
endpoint_capture_bucket = sagemaker.s3.parse_s3_url(endpoint_capture_uri)[0]
endpoint_capture_prefix = sagemaker.s3.parse_s3_url(endpoint_capture_uri)[1]
result = {}
while result.get("Contents") is None:
    print("Waiting for endpoint monitoring data to populate...")
    result = s3_boto_client.list_objects(
        Bucket=endpoint_capture_bucket, 
        Prefix=endpoint_capture_prefix
    )
    sleep(10)
capture_files = [capture_file.get("Key") for capture_file in result.get("Contents")]
print("Found Capture Files:")
print("\n ".join(capture_files))

Waiting for endpoint monitoring data to populate...
Found Capture Files:
datacapture-staging/brca-classification-staging/AllTraffic/2022/04/27/20/59-31-021-b47ddd47-be52-47ce-8679-b8703946492b.jsonl
 datacapture-staging/brca-classification-staging/AllTraffic/2022/04/27/20/59-57-331-02589d90-cf16-47f7-9cd0-b729091e424b.jsonl


Examine the contents of the first data capture file

In [26]:
# Download the monitoring data from S3
sagemaker_session.download_data("data", bucket=endpoint_capture_bucket, key_prefix=capture_files[0])
runs = []

# Open the jsonlines file and summarize the contents
with jsonlines.open(f"data/{os.path.basename(capture_files[0])}") as reader:
    [runs.append(obj) for obj in reader]

print(f"Number of runs captured in file: {len(runs)}")
print(f"First event metadata: {runs[0]['eventMetadata']}")

first_request = runs[0]['captureData']['endpointInput']['data']
decoded_first_request = base64.b64decode(first_request).decode('ascii')
print(f"First event input: {decoded_first_request[:2000]}...")

first_response = runs[0]['captureData']['endpointOutput']['data']
decoded_first_response = base64.b64decode(first_response).decode('ascii')

print(f"First event output: {decoded_first_response}")

Number of runs captured in file: 24
First event metadata: {'eventId': '8bc7e4fa-b45d-4a47-a369-7e6eeaea877e', 'inferenceTime': '2022-04-27T20:59:31Z'}
First event input: 0.470007530238,-4.49192633632,-0.531035005853,0.404428014046,-0.592278134998,-0.654309910261,-0.425694490831,0.779354342567,0.023400593055,0.680028458837,1.37491071401,-0.543084120172,-1.29535382755,-0.07248715567689999,-0.979302528287,0.252941521654,0.7041523995319999,-3.3786490440900003,-0.5059047678499999,-0.101760452595,-0.404498899727,0.050978821693400005,-0.479071065158,-2.49481390558,0.442446554819,1.07251135388,1.5930061412399998,0.948075965665,1.28373373391,0.0194023800234,-2.54176065548,0.601995575497,-0.126100358954,-0.221068646118,2.1133939602000003,-0.442505922747,-1.9531151073,1.92173272727,1.49256006243,-2.67788274678,5.26076107686,-0.0118447132267,-0.512985946157,0.023835630121,1.7993688021799998,0.158434880999,-1.65111706594,-0.7524733437379999,0.231829504487,-0.7076605696449999,0.51994125634,-2.882875

Resubmit the data from the first prediction

In [27]:
predictor.predict(decoded_first_request)

0.003618627553805709

-----
## 4. Explore the Model Lineage

Effective model governance requires a detailed understanding of the data and data transformations used in the modeling process, in addition to nearly continuous tracking of all model development iterations. It is important to keep track of which dataset was used, what transformations were applied to the data, where the dataset was stored, and what type of model was built. This metadata that tracks the relationships between various entities in your ML workflows is called the "lineage".

In this section, we'll explore the model artifacts and events that Amazon SageMaker ML Lineage Tracking creates for us automatically. We'll also see how to expand the lineage by manually adding additional artifacts.

### 4.A. Visualize Lineage Entities as a Table

Amazon SageMaker automatically creates tracking entities for SageMaker jobs, models, model packages, and endpoints if the data is available.

In [28]:
from sagemaker.workflow import pipeline
pipeline_execution_arn = sagemaker_boto_client.list_pipeline_executions(
    PipelineName=pipeline_name
    ).get('PipelineExecutionSummaries')[0].get('PipelineExecutionArn')

execution = sagemaker.workflow.pipeline._PipelineExecution(arn=pipeline_execution_arn)
table_viz = LineageTableVisualizer(sagemaker_session=sagemaker_session)
for execution_step in reversed(execution.list_steps()):
    print(execution_step.get('StepName'))
    display(table_viz.show(pipeline_execution_step=execution_step))
    sleep(1)

PreprocessHER2Data


Unnamed: 0,Name/Source,Direction,Type,Association Type,Lineage Type
0,s3://...8811b0b83bbf1af/input/code/preprocess.py,Input,DataSet,ContributedTo,artifact
1,25775...om/sagemaker-scikit-learn:0.23-1-cpu-py3,Input,Image,ContributedTo,artifact
2,s3://...1aa62c930b0568811b0b83bbf1af/output/test,Output,DataSet,Produced,artifact
3,s3://...930b0568811b0b83bbf1af/output/validation,Output,DataSet,Produced,artifact
4,s3://...aa62c930b0568811b0b83bbf1af/output/train,Output,DataSet,Produced,artifact


TrainHER2Model


Unnamed: 0,Name/Source,Direction,Type,Association Type,Lineage Type
0,s3://...930b0568811b0b83bbf1af/output/validation,Input,DataSet,ContributedTo,artifact
1,s3://...aa62c930b0568811b0b83bbf1af/output/train,Input,DataSet,ContributedTo,artifact
2,25775...-2.amazonaws.com/sagemaker-xgboost:1.2-1,Input,Image,ContributedTo,artifact
3,s3://...HER2Model-oAVf2ME1ES/output/model.tar.gz,Output,Model,Produced,artifact


EvaluateHER2Model


Unnamed: 0,Name/Source,Direction,Type,Association Type,Lineage Type
0,s3://...eb0c265f465a8ce8b/input/code/evaluate.py,Input,DataSet,ContributedTo,artifact
1,s3://...1aa62c930b0568811b0b83bbf1af/output/test,Input,DataSet,ContributedTo,artifact
2,s3://...HER2Model-oAVf2ME1ES/output/model.tar.gz,Input,Model,ContributedTo,artifact
3,25775...-2.amazonaws.com/sagemaker-xgboost:1.2-1,Input,Image,ContributedTo,artifact
4,s3://...c8346eb0c265f465a8ce8b/output/evaluation,Output,DataSet,Produced,artifact


CheckHER2Evaluation


None

RegisterHER2Model


Unnamed: 0,Name/Source,Direction,Type,Association Type,Lineage Type
0,brca-classification-p-k8h7hzoq80hg-2-Approved-...,Input,Approval,ContributedTo,action
1,s3://...HER2Model-oAVf2ME1ES/output/model.tar.gz,Input,Model,ContributedTo,artifact
2,25775...-2.amazonaws.com/sagemaker-xgboost:1.2-1,Input,Image,ContributedTo,artifact
3,brca-classification-p-k8h7hzoq80hg-2-PendingMa...,Input,Approval,ContributedTo,action
4,brca-classification-staging-1651092533-1-aws-e...,Output,ModelDeployment,ContributedTo,action
5,brca-classification-p-k8h7hzoq80hg-1651085910-...,Output,ModelGroup,AssociatedWith,context


### 4.B. Visualize Lineage Entities as a Graph

We can also visualize the ML lineage as a graph.

In [30]:
endpoint_info = sagemaker_boto_client.describe_endpoint(EndpointName=staging_endpoint_name)
endpoint_arn = endpoint_info["EndpointArn"]
print(f"Endpoint Name: {endpoint_info['EndpointName']}")

# Get the endpoint context for querying the lineage graph
contexts = Context.list(source_uri=endpoint_arn, sagemaker_session=sagemaker_session)
context_name = list(contexts)[0].context_name

viz = Visualizer()
print("Querying lineage for context", context_name)
endpoint_context = EndpointContext.load(
    context_name=context_name, sagemaker_session=sagemaker_session
)
query_response = sagemaker_boto_client.query_lineage(
    StartArns=[endpoint_context.context_arn],
    Direction="Ascendants",
    IncludeEdges=True,
)
viz.render(query_response, "Endpoint", sagemaker_session=sagemaker_session)

Endpoint Name: brca-classification-staging
Querying lineage for context brca-classification-staging-1651092533-aws-endpoint


## 5. (Optional) Deploy to Production
1. In the AWS Console, search for and select **CodePipeline**.

![Search for CodePipeline](img/code-pipeline.png)

2. Navigate to **Pipeline > Pipelines** and select the model deploy pipeline already in progress.

![Find Prod Deploy Pipeline](img/find-prod-deploy.png)

3. Scroll down to the **DeployStaging** stage and select Review.

![Select Review](img/deploy-stage.png)

4. Select **Approve** in the Review view.

![Approve Prod Deployment](img/approve-prod.png)

5. Navigate back to the SageMaker Project view. After several minutes, a second "prod" endpoint will appear.

![Second Endpoint](img/second-endpoint.png)


## 6. Clean Up

In [33]:
if False: # Switch this to True and run the cell to delete resources
    # Delete model registry records
    for package in sagemaker_boto_client.list_model_packages(ModelPackageGroupName=model_package_group_name).get("ModelPackageSummaryList"):
        print(package)
        sagemaker_boto_client.delete_model_package(ModelPackageName=package.get("ModelPackageArn"))
    sagemaker_boto_client.delete_model_package_group(ModelPackageGroupName=model_package_group_name)

    # Delete endpoint
    predictor.delete_endpoint()

    # Delete pipeline
    sagemaker_boto_client.delete_pipeline(PipelineName=pipeline_name)

    # Delete all S3 objects
    bucket = boto_session.resource("s3").Bucket(s3_bucket)
    bucket.objects.filter().delete()
    bucket.delete()
    
    # Delete Project
    sagemaker_boto_client.delete_project(ProjectName=project_name)
    
    # Delete deployment infrastructure
    cfn = boto3.client('cloudformation')
    cfn.delete_stack(StackName=f"sagemaker-{project_name}-{project_id}-deploy-staging")
    cfn.delete_stack(StackName=f"sagemaker-{project_name}-{project_id}-deploy-prod")    
    
    # Delete local  objects
    os.system(f"rm -rf ~/{project_name}-{project_id}")
    os.system("rm -rf data models generated training_reports")