## Build Custom Model Monitoring for Foundation Models with Amazon SageMaker Model Monitor

This notebook shows how to:

* Test a custom monitoring script locally
* Build a Docker container that includes your custom drift algorithms
* Monitor a live Llama 2 model endpoint for answer relevance


Amazon SageMaker enables you to capture the input, output and metadata for invocations of the models that you deploy. It also enables you to bring your own metrics to analyze the data and monitor its quality. In this notebook, you learn how Amazon SageMaker enables these capabilities.

## Prerequisites

To get started, make sure you have completed these prerequisites.

* Complete the previous lab, where you hosted a fine-tuned Llama 2 model and enabled data capture on the live endpoint.
* Add **Amazon Bedrock permissions** to the SageMaker execution role.

**Inline policy**
```
{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Sid": "BedrockConsole",
			"Effect": "Allow",
			"Action": [
				"bedrock:*"
			],
			"Resource": "*"
		}
	]
}
```
**Trust relationship**
```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "sagemaker.amazonaws.com",
                    "bedrock.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
```
* Add permission to access Amazon ECR: attach the **AmazonEC2ContainerRegistryFullAccess** policy to the SageMaker execution role.

### Setup

In [None]:
!pip install -Uq langchain
!pip install -Uq botocore
!pip install -Uq boto3

In [None]:
# Basic configuration: region, SageMaker client, and execution role

import os
import boto3
import json
from sagemaker import get_execution_role

region = boto3.Session().region_name

sm_client = boto3.client('sagemaker')

role = get_execution_role()
print("RoleArn: {}".format(role))

Bring the parameters from the previous lab.

In [None]:
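endpoint_name = "<provide endpoint name>"  # endpoint deployed in the previous lab
default_bucket = "<default bucket>"  # S3 bucket that holds the captured data
current_endpoint_capture_prefix = ""  # S3 key prefix of the captured data
s3_key_prefix = ""  # S3 key prefix for the monitoring results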

Download example captured data for testing

Example file path from lab 2: s3://sagemaker-project-p-nebjikc0mfsc/datacapture-staging/hf-llama2-b987c-pipeline-staging/AllTraffic/2023/11/14/04/
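
The example path above suggests that captured data is partitioned by endpoint name, variant name, and hour. The following is a minimal sketch of how such a prefix could be assembled for a given hour. The function name `hourly_capture_prefix` is hypothetical, and both the layout and the use of UTC for the hour folders are assumptions inferred from the example path, so check them against your own bucket.

```python
from datetime import datetime, timezone


def hourly_capture_prefix(base_prefix, endpoint_name, when, variant="AllTraffic"):
    """Return the S3 key prefix that holds one hour of captured traffic.

    Assumed layout: <base>/<endpoint>/<variant>/yyyy/mm/dd/hh/
    """
    base = base_prefix.strip("/")
    # Assumption: the hour folders are expressed in UTC.
    hour_path = when.astimezone(timezone.utc).strftime("%Y/%m/%d/%H")
    return f"{base}/{endpoint_name}/{variant}/{hour_path}/"


example_prefix = hourly_capture_prefix(
    "datacapture-staging",
    "hf-llama2-b987c-pipeline-staging",
    datetime(2023, 11, 14, 4, tzinfo=timezone.utc),
)
print(example_prefix)
```

The resulting string can be assigned to `current_endpoint_capture_prefix` to download a single hour of traffic.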

In [None]:
!aws s3 cp --recursive s3://{default_bucket}/{current_endpoint_capture_prefix} workspace/data

## Test script locally

Preview the custom algorithm script to evaluate answer relevance.

Explain how the algorithm works.

In [None]:
!pygmentize workspace/src/llm_monitoring.py

In [None]:
import json 
import base64
import os 
import pathlib

infer_dir = os.path.join(os.getcwd(), "workspace/data")

# Walk the downloaded capture files; json_data keeps only the records
# from the last file visited
for filepath in pathlib.Path(infer_dir).rglob('*.jsonl'):
    print(filepath)
    with open(filepath, 'r') as handle:
        json_data = [json.loads(line) for line in handle]

In [None]:
base64.b64decode(json_data[1]['captureData']['endpointInput']['data']).decode('utf-8')

In [None]:
base64.b64decode(json_data[1]['captureData']['endpointOutput']['data']).decode('utf-8')
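
The two cells above decode the request and the response of one record by hand. A small helper can do both in one call. This is a sketch only: the function name `decode_capture_record` is hypothetical, the sample record is synthetic, and the handling of the `encoding` field assumes that records not marked `BASE64` already hold plain text.

```python
import base64


def decode_capture_record(record):
    """Return the decoded request and response text of one capture record."""
    decoded = {}
    for key in ("endpointInput", "endpointOutput"):
        part = record["captureData"].get(key)
        if part is None:
            continue  # only one side of the invocation may have been captured
        data = part["data"]
        # Assumption: a missing 'encoding' field means Base64, as in the cells above.
        if part.get("encoding", "BASE64").upper() == "BASE64":
            data = base64.b64decode(data).decode("utf-8")
        decoded[key] = data
    return decoded


# Synthetic record for illustration only; not real captured traffic.
sample_record = {
    "captureData": {
        "endpointInput": {
            "encoding": "BASE64",
            "data": base64.b64encode(b'{"inputs": "What is drift?"}').decode("ascii"),
        },
        "endpointOutput": {
            "encoding": "BASE64",
            "data": base64.b64encode(b'[{"generated_text": "A change in data."}]').decode("ascii"),
        },
    }
}
decoded_sample = decode_capture_record(sample_record)
print(decoded_sample["endpointInput"])
print(decoded_sample["endpointOutput"])
```

With the data loaded earlier, `decode_capture_record(json_data[1])` would return the same two strings as the cells above.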

In [None]:
os.environ['dataset_source'] = f'{os.getcwd()}/workspace/data'
os.environ['output_path'] = f'{os.getcwd()}/workspace/output'

!python workspace/src/llm_monitoring.py
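
The cell above passes the input and output locations to the script through the `dataset_source` and `output_path` environment variables. The sketch below illustrates that contract only. It is not the content of `llm_monitoring.py`: the record counting stands in for the real relevance evaluation, and the names `iter_capture_records`, `run`, `summary.json`, and `records_seen` are hypothetical.

```python
import json
import os
import pathlib
import tempfile


def iter_capture_records(dataset_source):
    """Yield one parsed JSON object per non-empty line of every .jsonl file."""
    for path in sorted(pathlib.Path(dataset_source).rglob("*.jsonl")):
        with open(path, "r") as handle:
            for line in handle:
                if line.strip():
                    yield json.loads(line)


def run(dataset_source, output_path):
    """Count the captured records and write a small JSON summary."""
    summary = {"records_seen": sum(1 for _ in iter_capture_records(dataset_source))}
    os.makedirs(output_path, exist_ok=True)
    with open(os.path.join(output_path, "summary.json"), "w") as handle:
        json.dump(summary, handle)
    return summary


# Offline demonstration with two synthetic records in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = os.path.join(tmp, "data", "2023", "11", "14", "04")
    os.makedirs(data_dir)
    with open(os.path.join(data_dir, "capture.jsonl"), "w") as handle:
        handle.write('{"eventVersion": "0"}\n{"eventVersion": "0"}\n')
    demo_summary = run(os.path.join(tmp, "data"), os.path.join(tmp, "output"))

print(demo_summary)
```

Inside a container, the same function would be called as `run(os.environ["dataset_source"], os.environ["output_path"])`, which keeps the script independent of where the data is mounted.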

## Bring your own custom algorithm for model monitoring

To bring your own custom algorithm for model monitoring, you need to do the following:
* Create custom detection algorithms. We have included algorithms under the src folder.
* Create a Docker container.
* Set environment variables that tell the container where to find the data captured by SageMaker Model Monitor. These variables have to match the values we provide to the monitoring schedule later.

## Test container locally

Preview the Dockerfile.

In [None]:
!pygmentize workspace/Dockerfile

Build and test the Docker container locally.

In [None]:
!cd workspace && docker build -t workspace .

In [None]:
!docker run -v {os.getcwd()}/workspace/data:/home/data -v {os.getcwd()}/workspace/output:/home/output -e dataset_source=data/ -e output_path=output workspace

Build and push the container to Amazon ECR.

In [None]:
from docker_utils import build_and_push_docker_image

repository_short_name = 'custom-llm-monitor'

image_name = build_and_push_docker_image(repository_short_name, dockerfile='workspace/Dockerfile', context='workspace')
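
The monitoring schedule later needs the full URI of the pushed image. What `build_and_push_docker_image` returns is not shown here, so the following sketch only illustrates the general shape of an Amazon ECR image URI. The function name `ecr_image_uri`, the account ID, and the `latest` tag are placeholders, and the format assumes the standard AWS partition.

```python
def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Compose an Amazon ECR image URI from its parts (standard AWS partition)."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Placeholder account ID and Region, for illustration only.
example_uri = ecr_image_uri("123456789012", "us-east-1", "custom-llm-monitor")
print(example_uri)
```

Printing `image_name` after the build and comparing it with this shape is a quick way to confirm that the push targeted the expected account, Region, and repository.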

### Create a monitoring schedule to detect drift on an hourly basis
The default Model Monitor can be set up to check inference data hourly against baseline metrics and report violations. In this example, we set up a custom model monitor instead, using Boto3 calls directly with the container we built above. Note that we need to set the input and output paths on the container.

In [None]:
s3_result_path = f's3://{default_bucket}/{s3_key_prefix}/result/{endpoint_name}'

sm_client.create_monitoring_schedule(
    MonitoringScheduleName=endpoint_name,
    MonitoringScheduleConfig={
        'ScheduleConfig': {
            'ScheduleExpression': 'cron(0 * ? * * *)'
        },
        'MonitoringJobDefinition': {
            'MonitoringInputs': [
                {
                    'EndpointInput': {
                        'EndpointName': endpoint_name,
                        'LocalPath': '/opt/ml/processing/endpointdata'
                    }
                },
            ],
            'MonitoringOutputConfig': {
                'MonitoringOutputs': [
                    {
                        'S3Output': {
                            'S3Uri': s3_result_path,
                            'LocalPath': '/opt/ml/processing/resultdata',
                            'S3UploadMode': 'EndOfJob'
                        }
                    },
                ]
            },
            'MonitoringResources': {
                'ClusterConfig': {
                    'InstanceCount': 1,
                    'InstanceType': 'ml.c5.xlarge',
                    'VolumeSizeInGB': 10
                }
            },
            'MonitoringAppSpecification': {
                'ImageUri': image_name,
            },
            'StoppingCondition': {
                'MaxRuntimeInSeconds': 600
            },
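            # Point the script at the LocalPath values configured above
            'Environment': {
                'dataset_source': '/opt/ml/processing/endpointdata',
                'output_path': '/opt/ml/processing/resultdata'
            },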
            'RoleArn': role
        }
    }
)

## Triggering job execution manually
Instead of waiting for the monitoring job to execute hourly, you can also trigger the execution manually. Model monitoring is essentially a scheduled processing job.

In [None]:
from sagemaker.processing import Processor, ProcessingInput, ProcessingOutput
from urllib.parse import urlparse

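# Reuses `role`, `image_name`, and `s3_result_path` from the cells above
data_capture_path = f's3://{default_bucket}/{current_endpoint_capture_prefix}'
instance_count = 1
instance_type = 'ml.c5.xlarge'
# publish_cloudwatch_metrics='Disabled'

# Keep the part of the capture path that follows the 'datacapture/' folder
# (this assumes the path contains a folder with exactly that name)
data_capture_sub_path = data_capture_path[data_capture_path.rfind('datacapture/') :]
data_capture_sub_path = data_capture_sub_path[data_capture_sub_path.find('/') + 1 :]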

input_1 = ProcessingInput(input_name='input_1',
                      source=data_capture_path,
                      destination='/opt/ml/processing/input/endpoint/' + data_capture_sub_path,
                      s3_data_type='S3Prefix',
                      s3_input_mode='File')

outputs = ProcessingOutput(output_name='result',
                           source='/opt/ml/processing/output',
                           destination=s3_result_path,
                           s3_upload_mode='Continuous')

env = {'dataset_source': '/opt/ml/processing/input/endpoint',
       'output_path': '/opt/ml/processing/output'}

processor = Processor(image_uri = image_name,
                      instance_count = instance_count,
                      instance_type = instance_type,
                      role=role,
                      env = env)

processor.run(inputs=[input_1], outputs=[outputs])
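
Once the job finishes, its output is uploaded under `s3_result_path`. To inspect it with Boto3, the URI first has to be split into a bucket and a key prefix. The following is a minimal sketch using `urlparse`. The helper name `split_s3_uri` and the example URI are hypothetical.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/key/prefix' into ('bucket', 'key/prefix')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# Placeholder URI, for illustration only.
bucket, prefix = split_s3_uri("s3://my-bucket/monitoring/result/my-endpoint")
print(bucket, prefix)

# With AWS access, the result files could then be listed, for example:
#   s3 = boto3.client("s3")
#   s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
```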

## Clean up resources
Delete the monitoring schedule.

In [None]:
sm_client.delete_monitoring_schedule(MonitoringScheduleName=endpoint_name)

In [None]:
# The container ID below comes from the original run; replace it with the ID shown by `docker ps`
!docker stop 94fb507a2d09