## Build custom monitoring for foundation models with Amazon SageMaker Model Monitor

This notebook shows how to:

* Test a custom monitoring script locally
* Build a Docker container that includes your custom drift algorithms
* Monitor a live Llama 2 model endpoint for answer relevance


Amazon SageMaker enables you to capture the input, output and metadata for invocations of the models that you deploy. It also enables you to bring your own metrics to analyze the data and monitor its quality. In this notebook, you learn how Amazon SageMaker enables these capabilities.

## Prerequisites

To get started, make sure you have these prerequisites completed.

* Complete the previous lab, where you hosted a fine-tuned Llama 2 model and enabled data capture on the live endpoint.
* Add **Amazon Bedrock permission** to the SageMaker execution role

**inline policy**
```
{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Sid": "BedrockConsole",
			"Effect": "Allow",
			"Action": [
				"bedrock:*"
			],
			"Resource": "*"
		}
	]
}
```
**trust relationship**
```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "sagemaker.amazonaws.com",
                    "bedrock.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
```
* Add permission to access Amazon ECR: attach the **AmazonEC2ContainerRegistryFullAccess** policy to the SageMaker execution role

### Setup

In [3]:
# Handful of configuration

import os
import boto3
import json
from sagemaker import get_execution_role, session

region = boto3.Session().region_name
sm_client = boto3.client('sagemaker')

role = get_execution_role()
print("RoleArn: {}".format(role))

RoleArn: arn:aws:iam::376678947624:role/vegetation-management-works-SageMakerExecutionRole-OZ2K30BYST0I


Load the parameters stored in the previous lab

In [2]:
%store -r endpoint_name
%store -r default_bucket
%store -r current_endpoint_capture_prefix
%store -r s3_key_prefix

Download example captured data for testing

In [7]:
!aws s3 sync s3://{default_bucket}/{current_endpoint_capture_prefix} workspace/data

download: s3://sagemaker-us-west-2-376678947624/Mikael110-llama-2-7b-guanaco-fp16/datacapture/llama-2-7b-2023-10-21-02-26-02-152-endpoint/AllTraffic/2023/10/21/02/35-09-353-1f41490f-4e1e-4c43-8d9a-2bb25433a0e6.jsonl to workspace/data/AllTraffic/2023/10/21/02/35-09-353-1f41490f-4e1e-4c43-8d9a-2bb25433a0e6.jsonl
download: s3://sagemaker-us-west-2-376678947624/Mikael110-llama-2-7b-guanaco-fp16/datacapture/llama-2-7b-2023-10-21-02-26-02-152-endpoint/AllTraffic/2023/10/21/03/39-07-208-5459146d-0729-4374-b559-5b391308ce08.jsonl to workspace/data/AllTraffic/2023/10/21/03/39-07-208-5459146d-0729-4374-b559-5b391308ce08.jsonl
download: s3://sagemaker-us-west-2-376678947624/Mikael110-llama-2-7b-guanaco-fp16/datacapture/llama-2-7b-2023-10-21-02-26-02-152-endpoint/AllTraffic/2023/10/21/03/41-03-334-9a247cda-7335-4bb7-b896-05f56d5b1afd.jsonl to workspace/data/AllTraffic/2023/10/21/03/41-03-334-9a247cda-7335-4bb7-b896-05f56d5b1afd.jsonl
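Each downloaded `.jsonl` file holds one captured invocation per line. As a minimal sketch (not the lab's actual script), the records could be walked and decoded as shown below. The field names (`captureData`, `endpointInput`, `endpointOutput`, `data`, `encoding`) follow the documented SageMaker data capture format. The helper names `decode_payload` and `iter_capture_records` are hypothetical, and the sample record, including its `inputs` and `generated_text` payload shape, is synthetic and only assumed to resemble the real traffic.

```python
import base64
import json
import pathlib
import tempfile


def decode_payload(section: dict) -> str:
    """Return the captured payload as text, decoding it when it is BASE64-encoded."""
    data = section["data"]
    if section.get("encoding") == "BASE64":
        return base64.b64decode(data).decode("utf-8")
    return data


def iter_capture_records(root: str):
    """Yield (request, response) text pairs from every .jsonl file under root."""
    for path in sorted(pathlib.Path(root).rglob("*.jsonl")):
        with path.open() as f:
            for line in f:
                if not line.strip():
                    continue  # skip blank lines
                capture = json.loads(line)["captureData"]
                yield (
                    decode_payload(capture["endpointInput"]),
                    decode_payload(capture["endpointOutput"]),
                )


# Synthetic record for illustration only; real files come from the S3 sync above.
sample = {
    "captureData": {
        "endpointInput": {
            "mode": "INPUT",
            "encoding": "BASE64",
            "data": base64.b64encode(b'{"inputs": "What is an anemone?"}').decode(),
        },
        "endpointOutput": {
            "mode": "OUTPUT",
            "encoding": "JSON",
            "data": '[{"generated_text": "An anemone is a flower."}]',
        },
    }
}

with tempfile.TemporaryDirectory() as tmp:
    # Mirror the nested AllTraffic/yyyy/mm/dd/hh layout seen in the download log.
    capture_dir = pathlib.Path(tmp) / "AllTraffic" / "2023" / "10" / "21" / "02"
    capture_dir.mkdir(parents=True)
    (capture_dir / "sample.jsonl").write_text(json.dumps(sample) + "\n")
    records = list(iter_capture_records(tmp))

print(records)
```

To try this against the real data, pass `workspace/data` to `iter_capture_records` instead of the temporary directory.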


## Test script locally

Preview the custom algorithm script to evaluate answer relevance.

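The script reads the captured prompts and responses, computes answer relevance scores for them, and writes the mean and standard deviation of those scores to `results.json` under `llm_metrics.answer_relevancy`.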

In [36]:
!pygmentize workspace/src/llm_monitoring.py

import boto3
import pathlib
import os
import re
import base64
import json
import numpy as np

from langchain.llms import Bedrock
from langchain.embeddings import BedrockEmbeddings
from langchain.chains import LLMChain
from

    output = {
        "llm_metrics": {
            "answer_relevancy": {"value": np.mean(scores), "standard_deviation": np.std(scores)},
        },
    }

    with open(f"{os.environ['output_path']}/results.json", 'w') as f:
        json.dump(output, f)


if __name__ == '__main__':

In [37]:
os.environ['dataset_source'] = f'{os.getcwd()}/workspace/data'
os.environ['output_path'] = f'{os.getcwd()}/workspace/output'

!python workspace/src/llm_monitoring.py

['Tell me a fun fact about Boca Raton, Florida', 'What is an anemone?', 'What are some quick ways to lose all of my money?', 'What is core banking?', 'What are some items that you might see in a fridge?', 'what can we do when coffee spill on laptop to make it working', 'Using examples taken from the paragraph, provide the major risks to humans with climate change in a short bulleted list', 'How many world championships has Max Verstappen won?', 'Which is a species of fish? Tetra or Quart', 'Why is pricing important in the overall strategy of a product?']
['Boca Raton is known for various items including:', 'An anemone is a flower with multiple petals that are joined at the center, forming a shape that resembles a cup. The petals are usually arranged in a radial pattern, with the center of the flower being the hub of the anemone. The flower is typically pink, red, or white in color, and is characterized by its delicate and intricate appearance. The anemone is a very popular flower choic
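The scoring logic of `llm_monitoring.py` is not shown in full above, so the following is only a sketch of one common approach, not the lab's implementation: score each prompt/response pair by the cosine similarity of their embeddings, then aggregate the scores into the same `llm_metrics` structure the script writes to `results.json`. The `toy_embed` function is a hypothetical stand-in (a hashed bag of words) for a real embedding model such as the Bedrock embeddings the script imports. Scoring a missing response as 0 is also an assumption.

```python
import json
import zlib

import numpy as np


def toy_embed(text: str, dim: int = 64) -> np.ndarray:
    """Hypothetical stand-in for an embedding model: a hashed bag of words."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def relevance_scores(questions, answers):
    """Score each question/answer pair; a missing answer scores 0 (assumption)."""
    scores = []
    for question, answer in zip(questions, answers):
        if not answer:
            scores.append(0.0)
            continue
        scores.append(cosine_similarity(toy_embed(question), toy_embed(answer)))
    return scores


questions = ["What is an anemone?", "What is core banking?"]
answers = ["An anemone is a flower with multiple petals.", None]
scores = relevance_scores(questions, answers)

# Same shape as the dictionary the monitoring script dumps to results.json.
output = {
    "llm_metrics": {
        "answer_relevancy": {
            "value": float(np.mean(scores)),
            "standard_deviation": float(np.std(scores)),
        },
    },
}
print(json.dumps(output, indent=2))
```

Swapping `toy_embed` for a real embedding call changes the score values but leaves the aggregation step unchanged.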

## Bring your own custom algorithm for model monitoring

To bring your own custom algorithm for model monitoring, you need to do the following:
* Create custom detection algorithms. This lab includes them under the `src` folder.
* Create a Docker container.
* Set environment variables that tell the container where to find the data captured by SageMaker Model Monitor. These variables must match the values provided to the monitoring schedule later.

## Test container locally

Preview the Dockerfile.

In [24]:
!pygmentize workspace/Dockerfile

FROM python:3.9-slim-buster

RUN pip3 install botocore boto3==1.28.67 langchain==0.0.319

WORKDIR /home

COPY src/* /home/

ENTRYPOINT ["python3", "llm_monitoring.py"]


Build and test the Docker container locally.

In [25]:
!cd workspace && docker build -t workspace .

Sending build context to Docker daemon  50.69kB
Step 1/5 : FROM python:3.9-slim-buster
3.9-slim-buster: Pulling from library/python

b88d5577: Already exists
16e23423: Already exists
da260408: Already exists
c79126f6: Already exists
130fa3ec: Already exists
Digest: sha256:320a7a4250aba4249f458872adecf92eea88dc6abd2d76dc5c0f01cac9b53990
Status: Downloaded newer image for python:3.9-slim-buster
 ---> c84dbfe3b8de
Step 2/5 : RUN pip3 install botocore boto3==1.28.67 langchain==0.0.319
 ---> Running in c8a35dfa6451
Collecting botocore
  Downloading botocore-1.31.70-py3-none-any.whl (11.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.3/11.3 MB 97.4 MB/s eta 0:00:00
Collecting boto3==1.28.67
  Downloading boto3-1.28.67-py3-none-any.whl (135 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 135.8/135.8 kB 32.1 MB/s eta 0:00:00
Collecting langchain==0.0.319
  Downloading langchain-0.0.319-py3-none-any.whl (1.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.9/1.

In [26]:
!docker run -v {os.getcwd()}/workspace/data:/home/data -v {os.getcwd()}/workspace/output:/home/output -e dataset_source=data/ -e output_path=output workspace

['Tell me a fun fact about Boca Raton, Florida', 'What is an anemone?', 'What are some quick ways to lose all of my money?', 'What is core banking?', 'What are some items that you might see in a fridge?', 'what can we do when coffee spill on laptop to make it working', 'Using examples taken from the paragraph, provide the major risks to humans with climate change in a short bulleted list', 'How many world championships has Max Verstappen won?', 'Which is a species of fish? Tetra or Quart', 'Why is pricing important in the overall strategy of a product?']
['Boca Raton is known for various items including:', 'An anemone is a flower with multiple petals that are joined at the center, forming a shape that resembles a cup. The petals are usually arranged in a radial pattern, with the center of the flower being the hub of the anemone. The flower is typically pink, red, or white in color, and is characterized by its delicate and intricate appearance. The anemone is a very popular flower choi

Build and push the container to Amazon ECR.

In [27]:
from docker_utils import build_and_push_docker_image

repository_short_name = 'custom-llm-monitor'

image_name = build_and_push_docker_image(repository_short_name, dockerfile='workspace/Dockerfile', context='workspace')

INFO:botocore.credentials:Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole


Building docker image custom-llm-monitor from workspace/Dockerfile
$ docker build -t custom-llm-monitor -f workspace/Dockerfile workspace
Sending build context to Docker daemon  50.69kB
Step 1/5 : FROM python:3.9-slim-buster
 ---> c84dbfe3b8de
Step 2/5 : RUN pip3 install botocore boto3==1.28.67 langchain==0.0.319
 ---> Using cache
 ---> cc57b9b43172
Step 3/5 : WORKDIR /home
 ---> Using cache
 ---> 5f643878788d
Step 4/5 : COPY src/* /home/
 ---> Using cache
 ---> 30695f80f2d3
Step 5/5 : ENTRYPOINT ["python3", "llm_monitoring.py"]
 ---> Using cache
 ---> 02368b153529
Successfully built 02368b153529
Successfully tagged custom-llm-monitor:latest
Done building docker image custom-llm-monitor
ECR repository already exists: custom-llm-monitor
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
Logged into ECR
$ docker tag custom-llm-monitor 376678947624.dkr.ecr.us-west-2.amazonaws.com/custom-llm-monitor
Pushing docker image to ECR repository 37667894
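The source of `build_and_push_docker_image` is not shown here, but the log above indicates the image is tagged with a standard ECR URI of the form `<account>.dkr.ecr.<region>.amazonaws.com/<repository>`. As a small sketch, that URI can be composed as follows. The function name `ecr_image_uri` is hypothetical, the account ID is a placeholder, and appending a `:latest` tag is an assumption about what the helper returns.

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Compose a private ECR image URI from its parts."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Placeholder account ID; substitute your own account and region.
uri = ecr_image_uri("123456789012", "us-west-2", "custom-llm-monitor")
print(uri)  # 123456789012.dkr.ecr.us-west-2.amazonaws.com/custom-llm-monitor:latest
```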

### Create a monitoring schedule to detect drift on an hourly basis
The default Model Monitor can be set up to check inference data hourly against baseline statistics and constraints. In this example, we set up a custom model monitor instead, using Boto3 calls directly to create the monitoring schedule with the container built above. Note that you need to set the input and output paths for the container.

In [29]:
s3_result_path = f's3://{default_bucket}/{s3_key_prefix}/result/{endpoint_name}'

sm_client.create_monitoring_schedule(
    MonitoringScheduleName=endpoint_name,
    MonitoringScheduleConfig={
        'ScheduleConfig': {
            'ScheduleExpression': 'cron(0 * ? * * *)'
        },
        'MonitoringJobDefinition': {
            'MonitoringInputs': [
                {
                    'EndpointInput': {
                        'EndpointName': endpoint_name,
                        'LocalPath': '/opt/ml/processing/endpointdata'
                    }
                },
            ],
            'MonitoringOutputConfig': {
                'MonitoringOutputs': [
                    {
                        'S3Output': {
                            'S3Uri': s3_result_path,
                            'LocalPath': '/opt/ml/processing/resultdata',
                            'S3UploadMode': 'EndOfJob'
                        }
                    },
                ]
            },
            'MonitoringResources': {
                'ClusterConfig': {
                    'InstanceCount': 1,
                    'InstanceType': 'ml.c5.xlarge',
                    'VolumeSizeInGB': 10
                }
            },
            'MonitoringAppSpecification': {
                'ImageUri': image_name,
            },
            'StoppingCondition': {
                'MaxRuntimeInSeconds': 600
            },
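            'Environment': {
                # Must match the LocalPath values above; the script reads both variables.
                'dataset_source': '/opt/ml/processing/endpointdata',
                'output_path': '/opt/ml/processing/resultdata'
            },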
            'RoleArn': role
        }
    }
)

{'MonitoringScheduleArn': 'arn:aws:sagemaker:us-west-2:376678947624:monitoring-schedule/llama-2-7b-2023-10-21-02-26-02-152-endpoint',
 'ResponseMetadata': {'RequestId': 'afb5edca-30bb-433b-b4ee-4a8b8072c5f6',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'afb5edca-30bb-433b-b4ee-4a8b8072c5f6',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '132',
   'date': 'Wed, 25 Oct 2023 01:18:44 GMT'},
  'RetryAttempts': 0}}

## Triggering job execution manually
Instead of waiting for the hourly monitoring job, you can trigger an execution manually. A monitoring schedule is essentially a scheduled SageMaker Processing job, so the same container can be run as a one-off processing job.

In [30]:
from sagemaker.processing import Processor, ProcessingInput, ProcessingOutput
from urllib.parse import urlparse

# region
# role
data_capture_path=f's3://{default_bucket}/{current_endpoint_capture_prefix}'
# s3_result_path
instance_count=1
instance_type='ml.c5.xlarge'
# publish_cloudwatch_metrics='Disabled'

data_capture_sub_path = data_capture_path[data_capture_path.rfind('datacapture/') :]
data_capture_sub_path = data_capture_sub_path[data_capture_sub_path.find('/') + 1 :]

input_1 = ProcessingInput(input_name='input_1',
                      source=data_capture_path,
                      destination='/opt/ml/processing/input/endpoint/' + data_capture_sub_path,
                      s3_data_type='S3Prefix',
                      s3_input_mode='File')

outputs = ProcessingOutput(output_name='result',
                           source='/opt/ml/processing/output',
                           destination=s3_result_path,
                           s3_upload_mode='Continuous')

env = {'dataset_source': '/opt/ml/processing/input/endpoint',
       'output_path': '/opt/ml/processing/output'}

processor = Processor(image_uri = image_name,
                      instance_count = instance_count,
                      instance_type = instance_type,
                      role=role,
                      env = env)

processor.run(inputs=[input_1], outputs=[outputs])

INFO:sagemaker:Creating processing-job with name custom-llm-monitor-2023-10-25-01-18-49-705


..............................................['Using examples taken from the paragraph, provide the major risks to humans with climate change in a short bulleted list', 'How many world championships has Max Verstappen won?', 'Which is a species of fish? Tetra or Quart', 'Why is pricing important in the overall strategy of a product?', 'what can we do when coffee spill on laptop to make it working', 'Tell me a fun fact about Boca Raton, Florida', 'What is an anemone?', 'What are some quick ways to lose all of my money?', 'What is core banking?', 'What are some items that you might see in a fridge?', 'Why some people are more stressed than others and how to manage stress?', 'Which episode of The X-Files did Dana Scully get diagnosed with cancer?', 'Why do I have a belly button?']
[None, 'Max Verstappen has won five World Championships: ', 'Quart### Explanation', 'Pricing is a crucial element of a product’s strategy as it directly impacts a product’s success in the market. 
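When the job finishes, the container's `results.json` is uploaded to the output destination. A minimal sketch of acting on that file is shown below, assuming it has been downloaded and parsed into a dictionary with the `llm_metrics` structure the script writes. The function name `relevancy_below_threshold`, the threshold of 0.5, and the sample values are all hypothetical.

```python
import json


def relevancy_below_threshold(results: dict, threshold: float = 0.5) -> bool:
    """Return True when mean answer relevancy falls below the (hypothetical) threshold."""
    value = results["llm_metrics"]["answer_relevancy"]["value"]
    return value < threshold


# Synthetic values for illustration; real values come from the job's results.json.
sample_results = json.loads(
    '{"llm_metrics": {"answer_relevancy": {"value": 0.42, "standard_deviation": 0.1}}}'
)

if relevancy_below_threshold(sample_results):
    print("Answer relevancy is below the threshold; investigate recent responses.")
else:
    print("Answer relevancy is within the expected range.")
```

A check like this could feed an alert or a CloudWatch metric, though the right threshold depends on how the scores are computed.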

## Clean up resources
Delete the monitoring schedule.

In [28]:
sm_client.delete_monitoring_schedule(MonitoringScheduleName=endpoint_name)

{'ResponseMetadata': {'RequestId': '436eb725-dfa4-4b95-8e1e-a72a8484baaf',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '436eb725-dfa4-4b95-8e1e-a72a8484baaf',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '0',
   'date': 'Wed, 25 Oct 2023 01:18:39 GMT'},
  'RetryAttempts': 0}}

In [35]:
!docker stop 94fb507a2d09

94fb507a2d09
