# SageMaker monitor
* Container: codna_python3
    - https://github.com/aws-samples/amazon-sagemaker-data-quality-monitor-custom-preprocessing
    - https://github.com/aws/amazon-sagemaker-examples/tree/main/sagemaker_model_monitor/introduction
    - https://sagemaker-examples.readthedocs.io/en/latest/sagemaker_model_monitor/introduction/SageMaker-ModelMonitoring.html#Create-a-baselining-job-with-training-dataset
    - When the number of columns does not match: https://repost.aws/questions/QU8Xkelo1ARA2zcn4rHuk09w/questions/QU8Xkelo1ARA2zcn4rHuk09w/sagemaker-model-monitor-missing-columns-constraint-violation?

## AutoReload

In [126]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## 0. Install packages

In [127]:
install_needed = True  # should only be True once
# install_needed = False

In [3]:
%%bash
#!/bin/bash

DAEMON_PATH="/etc/docker"
MEMORY_SIZE=10G

FLAG=$(cat $DAEMON_PATH/daemon.json | jq 'has("data-root")')
# echo $FLAG

if [ "$FLAG" == true ]; then
    echo "Already revised"
else
    echo "Add data-root and default-shm-size=$MEMORY_SIZE"
    sudo cp $DAEMON_PATH/daemon.json $DAEMON_PATH/daemon.json.bak
    sudo cat $DAEMON_PATH/daemon.json.bak | jq '. += {"data-root":"/home/ec2-user/SageMaker/.container/docker","default-shm-size":"'$MEMORY_SIZE'"}' | sudo tee $DAEMON_PATH/daemon.json > /dev/null
    sudo service docker restart
    echo "Docker Restart"
fi

Already revised


In [4]:
import sys
import IPython

if install_needed:
    print("installing deps and restarting kernel")
    !{sys.executable} -m pip install -U pip
    !{sys.executable} -m pip install -U smdebug sagemaker-experiments
    !{sys.executable} -m pip install -U sagemaker
    !{sys.executable} -m pip install -U xgboost==1.3.1

    IPython.Application.instance().kernel.do_shutdown(True)

installing deps and restarting kernel
Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com
Collecting pip
  Using cached pip-23.1-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 23.0.1
    Uninstalling pip-23.0.1:
      Successfully uninstalled pip-23.0.1
Successfully installed pip-23.1
Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com
Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com
Collecting sagemaker
  Downloading sagemaker-2.147.0.tar.gz (718 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: sagemaker
  Building wheel for sagemaker (setup.py) ... done
  Created wheel for sagemaker: filename=sagemaker-2.147.0-py2.py3-none-any.whl 

## 1. Parameter store setup

In [128]:
import boto3
from utils.ssm import parameter_store

In [129]:
strRegionName=boto3.Session().region_name
pm = parameter_store(strRegionName)
strPrefix = pm.get_params(key="PREFIX")

In [130]:
strBucketName = pm.get_params(key="-".join([strPrefix, "BUCKET"]))
strExecutionRole = pm.get_params(key="-".join([strPrefix, "SAGEMAKER-ROLE-ARN"]))

In [131]:
print (f'strBucketName: {strBucketName}')
print (f'strExecutionRole: {strExecutionRole}')

strBucketName: sagemaker-us-east-1-419974056037
strExecutionRole: arn:aws:iam::419974056037:role/service-role/AmazonSageMaker-ExecutionRole-20221206T163436


## 2. Dataset

In [132]:
import os

In [133]:
strS3DataPath = f"s3://{strBucketName}/dataset" 
strLocalDataPath = os.path.join(os.getcwd(), "data")

## 3. Deploy with data capture
- https://github.com/aws-samples/amazon-sagemaker-data-quality-monitor-custom-preprocessing
- https://github.com/aws/amazon-sagemaker-examples/tree/main/sagemaker_model_monitor/introduction
- https://sagemaker-examples.readthedocs.io/en/latest/sagemaker_model_monitor/introduction/SageMaker-ModelMonitoring.html#Create-a-baselining-job-with-training-dataset

### 3.1 Check functions in local mode
[Important] You must create inference.py
* model_fn: loads the trained model
* input_fn: processes the input passed in when the endpoint is invoked
* predict_fn: runs forward propagation; called after input_fn
* output_fn: returns the result to the user

- Before you define and use custom inference code, testing and debugging it in the notebook first speeds up inference development.
- You can use the default inference code (input_fn, model_fn, predict_fn, output_fn), but some situations call for custom handlers. See the links below for the default code.
    - [Deploy PyTorch Models](https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html#deploy-pytorch-models)
    - [Default inference code](https://github.com/aws/sagemaker-pytorch-inference-toolkit/blob/master/src/sagemaker_pytorch_serving_container/default_pytorch_inference_handler.py)
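
The four handlers listed above are called in a fixed order for each request. The following is a minimal sketch of that order using toy stand-ins. `handle_request` and the handler bodies are hypothetical illustrations, not the SageMaker model server's actual code, and a real inference.py would load an XGBoost model instead of the toy dictionary used here.

```python
# Toy stand-ins for the four handlers (assumptions for illustration only).
def model_fn(model_dir):
    # Pretend "model": a weight applied to the sum of the features.
    return {"weight": 0.5}


def input_fn(request_body, request_content_type):
    if request_content_type != "text/csv":
        raise ValueError(f"Content type {request_content_type} is not supported.")
    return [float(v) for v in request_body.strip().split(",")]


def predict_fn(input_data, model):
    return model["weight"] * sum(input_data)


def output_fn(prediction, accept):
    return str(prediction)


def handle_request(model, body, content_type, accept):
    """Hypothetical driver showing the order: input_fn -> predict_fn -> output_fn."""
    data = input_fn(body, content_type)
    prediction = predict_fn(data, model)
    return output_fn(prediction, accept)


model = model_fn("/opt/ml/model")  # loaded once, before any request is served
print(handle_request(model, "1.0,2.0,3.0\n", "text/csv", "text/csv"))
```

Exercising the handlers this way in the notebook makes it possible to catch content-type and shape errors before any container is started.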

### When running in local mode, the Docker container from the previous run must be stopped before a new local-mode run can start
* Check the ID of the running container with docker ps -a, then
* docker stop "<<container ID>>"
* docker container prune -f

* 3.1.1 Create inference.py
    - https://aws.amazon.com/ko/blogs/machine-learning/design-a-compelling-record-filtering-method-with-amazon-sagemaker-model-monitor/
    -  We also need to ensure that Flask Response is returned to match both input and output content types exactly. It is a necessary step for Model Monitor to work for the image running Gunicorn/Flask. The content type of output data captured by Model Monitor, which only works with CSV or JSON, is Base64 by default unless Response() explicitly converts it to a specific type.

In [135]:
%%writefile source/deploy/inference.py
import io
import os
import csv
import time
import json
import pickle as pkl
import numpy as np
import pandas as pd
from io import BytesIO
import xgboost as xgb
import sagemaker_xgboost_container.encoder as xgb_encoders
from sagemaker.serializers import CSVSerializer
from io import StringIO

#For Gunicorn/Flask xgboost image, we need to ensure input and output encoding match exactly for model monitor (CSV or JSON)
from flask import Response 

NUM_FEATURES = 58
CSV_SERIALIZER = CSVSerializer(content_type='text/csv')

def model_fn(model_dir):
    """
    Deserialize and return fitted model.
    """
    model_file = "xgboost-model"
    model = xgb.Booster()
    model.load_model(os.path.join(model_dir, model_file))
    return model
                     

def input_fn(request_body, request_content_type):
    """
    The SageMaker XGBoost model server receives the request data body and the content type,
    and invokes the `input_fn`.
    Return a DMatrix (an object that can be passed to predict_fn).
    """

    print (f'Input, Content_type: {request_content_type}')
    if request_content_type == "application/x-npy":        
        stream = BytesIO(request_body)
        # application/x-npy payloads (np.save format) carry a header, so parse with np.load
        array = np.load(stream, allow_pickle=False)
        array = array.reshape(-1, NUM_FEATURES)
        return xgb.DMatrix(array)
    
    elif request_content_type == "text/csv":
        return xgb_encoders.csv_to_dmatrix(request_body.rstrip("\n"))
    
    elif request_content_type == "text/libsvm":
        return xgb_encoders.libsvm_to_dmatrix(request_body)
    
    else:
        raise ValueError(
            "Content type {} is not supported.".format(request_content_type)
        )

def predict_fn(input_data, model):
    """
    SageMaker XGBoost model server invokes `predict_fn` on the return value of `input_fn`.

    Return a 2-D NumPy array of shape (1, 2 * n_rows): probabilities first, then 0/1 predictions.
    """
    start_time = time.time()
    y_probs = model.predict(input_data)
    print("--- Inference time: %s secs ---" % (time.time() - start_time))    
    y_preds = [1 if e >= 0.5 else 0 for e in y_probs] 
    #return np.vstack((y_preds, y_probs))
    y_probs = np.array(y_probs).reshape(1, -1)
    y_preds = np.array(y_preds).reshape(1, -1)   
    output = np.concatenate([y_probs, y_preds], axis=1)
    
    return output


def output_fn(predictions, content_type="text/csv"):
    """
    After invoking predict_fn, the model server invokes `output_fn`.
    """
    print (f'Output, Content_type: {content_type}')
    
    if content_type == "text/csv":
        outputs = CSV_SERIALIZER.serialize(predictions)
        print (outputs)
        return Response(outputs, mimetype=content_type)

    elif content_type == "application/json":

        # predict_fn places the probability first and the 0/1 prediction second
        outputs = json.dumps({
            'prob': predictions[0][0],
            'pred': predictions[0][1]
        })                
        #return outputs
        return Response(outputs, mimetype=content_type)
    else:
        raise ValueError("Content type {} is not supported.".format(content_type))

Overwriting source/deploy/inference.py
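
For a single-row request, `predict_fn` in the script above produces one CSV line of the form `probability,prediction`. For a batch of N rows, the `reshape(1, -1)` calls followed by the `axis=1` concatenation give a single line of 2N values (all probabilities, then all predictions). If one output line per input row is preferred, a per-row layout could look like the sketch below. `to_rows` and `rows_to_csv` are hypothetical plain-Python helpers, not part of the script above.

```python
def to_rows(probabilities, threshold=0.5):
    """Pair each probability with its thresholded 0/1 label, one pair per input row."""
    return [(p, 1.0 if p >= threshold else 0.0) for p in probabilities]


def rows_to_csv(rows):
    """Serialize rows as CSV text, one line per row."""
    return "\n".join(",".join(str(v) for v in row) for row in rows)


rows = to_rows([0.29, 0.81])
print(rows_to_csv(rows))
```

Keeping one line per record also keeps each captured output aligned with the input row it belongs to.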


* 3.1.2 param setting

In [136]:
import time
import sagemaker
from sagemaker.model_monitor import DataCaptureConfig

In [137]:
local_mode = False

if local_mode:
    
    from sagemaker.local import LocalSession
    
    strInstanceType = "local"
    sagemaker_session = LocalSession()
    sagemaker_session.config = {'local': {'local_code': True}}
    strDeployType = "local"
        
else:
    strInstanceType = "ml.p3.2xlarge" #"ml.p3.2xlarge"#"ml.g4dn.8xlarge"#"ml.p3.2xlarge", 'ml.p3.16xlarge' , ml.g4dn.8xlarge
    sagemaker_session = sagemaker.Session()
    strDeployType = "cloud"
    
strS3ModelPath = pm.get_params(key="-".join([strPrefix, "MODEL-PATH"]))
#strEndpointName = "endpoint-cloud-DJ-SM-IMD-1682040411"
strEndpointName = f"endpoint-{strDeployType}-{strPrefix}-{int(time.time())}"
strS3DataCapturePath = os.path.join(
    "s3://{}".format(strBucketName),
    strPrefix,
    "monitor",
    "data-capture"
)

data_capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    destination_s3_uri=strS3DataCapturePath,
    capture_options=["REQUEST", "RESPONSE"],
    csv_content_types=["text/csv"]
)

In [138]:
print (f'strInstanceType: {strInstanceType}')
print (f'sagemaker_session: {sagemaker_session}')
print (f'strS3ModelPath: {strS3ModelPath}')
print (f'strEndpointName: {strEndpointName}')

strInstanceType: ml.p3.2xlarge
sagemaker_session: <sagemaker.session.Session object at 0x7fcc52826440>
strS3ModelPath: s3://sagemaker-us-east-1-419974056037/DJ-SM-IMD/training/model-output/DJ-SM-IMD-experiments-0419-04191681877971/output/model.tar.gz
strEndpointName: endpoint-cloud-DJ-SM-IMD-1682062443


* Create model

In [139]:
from sagemaker.xgboost.model import XGBoostModel
from sagemaker.serializers import CSVSerializer, NumpySerializer, JSONSerializer
from sagemaker.deserializers import CSVDeserializer, JSONDeserializer, NumpyDeserializer

In [140]:
xgb_model = XGBoostModel(
    model_data=strS3ModelPath,
    role=strExecutionRole,
    source_dir="./source/deploy",
    entry_point="inference.py",
    framework_version="1.3-1",
    sagemaker_session=sagemaker_session,
)

* Create Endpoint with **data capture**
    * When `deploy(...)` is called, the SageMaker SDK runs both `create-endpoint-config` and `create-endpoint`. For finer-grained parameter control, use the AWS CLI or a boto3 SDK client instead.
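
As a rough sketch of the boto3 route mentioned above, the request for `create_endpoint_config` can be assembled as a plain dictionary and inspected before anything is created. The helper `build_endpoint_config_request` and every name, instance type, and URI passed to it are hypothetical placeholders. The actual boto3 calls are left as comments because they need AWS credentials and an existing model.

```python
def build_endpoint_config_request(config_name, model_name, instance_type, capture_s3_uri):
    """Build keyword arguments for sagemaker_client.create_endpoint_config(**request)."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InitialInstanceCount": 1,
                "InstanceType": instance_type,
            }
        ],
        "DataCaptureConfig": {
            "EnableCapture": True,
            "InitialSamplingPercentage": 100,
            "DestinationS3Uri": capture_s3_uri,
            "CaptureOptions": [{"CaptureMode": "Input"}, {"CaptureMode": "Output"}],
            "CaptureContentTypeHeader": {"CsvContentTypes": ["text/csv"]},
        },
    }


request = build_endpoint_config_request(
    config_name="my-endpoint-config",                       # hypothetical name
    model_name="my-xgboost-model",                          # hypothetical; must already exist
    instance_type="ml.m5.xlarge",                           # hypothetical choice
    capture_s3_uri="s3://my-bucket/monitor/data-capture",   # hypothetical URI
)
print(sorted(request))

# With credentials available, the two steps that deploy() combines would be:
#   sm = boto3.client("sagemaker")
#   sm.create_endpoint_config(**request)
#   sm.create_endpoint(EndpointName="my-endpoint", EndpointConfigName="my-endpoint-config")
```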

In [141]:
xgb_predictor = xgb_model.deploy(
    endpoint_name=strEndpointName,
    instance_type=strInstanceType, 
    initial_instance_count=1,
    data_capture_config=data_capture_config,
    serializer=CSVSerializer(),
    deserializer=CSVDeserializer(),
    wait=True,
    log=True,
)

INFO:sagemaker.image_uris:Ignoring unnecessary instance type: ml.p3.2xlarge.
INFO:sagemaker:Creating model with name: sagemaker-xgboost-2023-04-21-07-34-12-678
INFO:sagemaker:Creating endpoint-config with name endpoint-cloud-DJ-SM-IMD-1682062443
INFO:sagemaker:Creating endpoint with name endpoint-cloud-DJ-SM-IMD-1682062443


------!

* inference (based on SageMaker SDK)

In [142]:
import pandas as pd

pdTest = pd.read_csv(f'{strLocalDataPath}/test.csv')
pdLabel = pdTest.iloc[:, 0].astype('int')
pdTest = pdTest.drop('fraud', axis=1)
payload = pdTest.values[108, :]
outputs = xgb_predictor.predict(payload) ## Auto serialization/deserialization
outputs

[['0.29007914662361145', '0.0']]

* inference (based on **boto3**)
    - **Invoking through boto3 requires a runtime_client**
    - The serialization and deserialization configured at deploy time are not applied, so **you have to serialize and deserialize manually**
        - This can be tedious, but it gives you more freedom over de/serialization
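
To make the manual step concrete, here is a minimal standard-library sketch of what CSV serialization and deserialization amount to. `serialize_csv` and `deserialize_csv` are hypothetical helpers written for illustration. They are not the SageMaker SDK's `CSVSerializer` / `CSVDeserializer`, which the cells below use.

```python
import csv
import io


def serialize_csv(row):
    """Turn one feature row into a single CSV line (the request body)."""
    return ",".join(str(v) for v in row)


def deserialize_csv(body_bytes):
    """Parse a CSV response body into a list of rows of floats."""
    text = body_bytes.decode("utf-8")
    return [[float(v) for v in line] for line in csv.reader(io.StringIO(text)) if line]


payload = serialize_csv([750.0, 2650.0, 1.0])
print(payload)
print(deserialize_csv(b"0.1539903,0.0\n"))
```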

In [143]:
import json
import boto3
import sagemaker
import pandas as pd
import numpy as np

In [144]:
if "local" in strInstanceType: runtime_client = sagemaker.local.LocalSagemakerRuntimeClient()    
else: runtime_client = boto3.Session().client('sagemaker-runtime')
print (f'runtime_client: {runtime_client}')

INFO:botocore.credentials:Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole


runtime_client: <botocore.client.SageMakerRuntime object at 0x7fcc4adf6860>


In [145]:
pdTest = pd.read_csv(f'{strLocalDataPath}/test.csv')
pdTest = pdTest.drop('fraud', axis=1)

* serialization (csv)

In [146]:
csv_serializer = CSVSerializer()
csv_deserializer = CSVDeserializer()

In [159]:
payload = csv_serializer.serialize(pdTest.values[165, :])

In [160]:
payload,strEndpointName

('17047.719421914477,28347.719421914484,52.0,51.0,0.0,1.0,750.0,2650.0,94601.0,2020.0,1.0,0.0,0.0,11300.0,1.0,21.0,0.0,11.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0',
 'endpoint-cloud-DJ-SM-IMD-1682062443')

In [161]:
response = runtime_client.invoke_endpoint(
    EndpointName=strEndpointName, 
    ContentType='text/csv',
    Accept='text/csv',
    Body=payload
)
pred = np.array(
    csv_deserializer.deserialize(
        stream=response['Body'],
        content_type="text/csv"
    ),
    dtype=np.float32
)
pred

array([[0.1539903, 0.       ]], dtype=float32)

## 4. View captured data

In [162]:
import json

In [163]:
s3_client = boto3.Session().client("s3")

INFO:botocore.credentials:Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole


In [164]:
def get_obj_body(obj_key, strBucketName):
    return s3_client.get_object(Bucket=strBucketName, Key=obj_key).get("Body").read().decode("utf-8")

In [165]:
current_endpoint_capture_prefix = os.path.join(
    strPrefix,
    "monitor",
    "data-capture",
    strEndpointName
)
result = s3_client.list_objects(Bucket=strBucketName, Prefix=current_endpoint_capture_prefix)
capture_files = [capture_file.get("Key") for capture_file in result.get("Contents")]
print("Found Capture Files:")
print("\n ".join(capture_files))
print (capture_files[len(capture_files) - 1][:capture_files[len(capture_files) - 1].rfind("/")])

Found Capture Files:
DJ-SM-IMD/monitor/data-capture/endpoint-cloud-DJ-SM-IMD-1682062443/AllTraffic/2023/04/21/07/37-45-420-13675920-9041-43dc-ba75-30351ca76e99.jsonl
DJ-SM-IMD/monitor/data-capture/endpoint-cloud-DJ-SM-IMD-1682062443/AllTraffic/2023/04/21/07
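
The listing above suggests that capture files are grouped under `<prefix>/<endpoint name>/<variant>/yyyy/mm/dd/hh/`. Assuming that layout, and assuming the hour is in UTC (which matches the `inferenceTime` in the captured record), a prefix for one specific hour could be built as in this sketch. `capture_hour_prefix` is a hypothetical helper.

```python
from datetime import datetime, timezone


def capture_hour_prefix(base_prefix, endpoint_name, when, variant="AllTraffic"):
    """Build the S3 key prefix holding capture files for the UTC hour of `when`."""
    when = when.astimezone(timezone.utc)
    return "/".join(
        [base_prefix.rstrip("/"), endpoint_name, variant, when.strftime("%Y/%m/%d/%H")]
    )


prefix = capture_hour_prefix(
    "DJ-SM-IMD/monitor/data-capture",
    "endpoint-cloud-DJ-SM-IMD-1682062443",
    datetime(2023, 4, 21, 7, 37, tzinfo=timezone.utc),
)
print(prefix)
```

Such a prefix can be passed as `Prefix` to `list_objects` to narrow the listing to a single hour.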


In [166]:
capture_file = get_obj_body(capture_files[-1], strBucketName)
#print(capture_file[:2000])
#print(json.dumps(json.loads(capture_file.split("\n")[0]), indent=2))
print(json.dumps(json.loads(capture_file.split("\n")[-2]), indent=2))

{
  "captureData": {
    "endpointInput": {
      "observedContentType": "text/csv",
      "mode": "INPUT",
      "data": "42730.3308953442,54130.3308953442,58.0,111.0,0.0,1.0,750.0,3000.0,85374.0,2018.0,3.0,1.0,1.0,11400.0,6.0,20.0,3.0,11.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0",
      "encoding": "CSV"
    },
    "endpointOutput": {
      "observedContentType": "text/csv; charset=utf-8",
      "mode": "OUTPUT",
      "data": "0.29007914662361145,0.0",
      "encoding": "CSV"
    }
  },
  "eventMetadata": {
    "eventId": "abe96858-af79-4df9-b9a7-7c3ae3bf52a4",
    "inferenceTime": "2023-04-21T07:37:45Z"
  },
  "eventVersion": "0"
}
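
Each line of a capture file is one JSON record shaped like the one above. The cell indexes `split("\n")[-2]` because the file ends with a newline, which leaves an empty last element. A small sketch that walks every record and also copes with Base64-encoded payloads follows. `decode_part` and `iter_capture_records` are hypothetical helpers, and the sample record is made up in the same shape as the output above.

```python
import base64
import json


def decode_part(part):
    """Return captured data as text; Base64 payloads are decoded, CSV/JSON pass through."""
    if part.get("encoding", "").upper() == "BASE64":
        return base64.b64decode(part["data"]).decode("utf-8")
    return part["data"]


def iter_capture_records(jsonl_text):
    """Yield (input_text, output_text, inference_time) for each non-empty JSON line."""
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        capture = record["captureData"]
        yield (
            decode_part(capture["endpointInput"]),
            decode_part(capture["endpointOutput"]),
            record["eventMetadata"]["inferenceTime"],
        )


# Made-up sample record (illustrative values only).
sample = json.dumps({
    "captureData": {
        "endpointInput": {"observedContentType": "text/csv", "mode": "INPUT",
                          "data": "1.0,2.0", "encoding": "CSV"},
        "endpointOutput": {"observedContentType": "text/csv; charset=utf-8", "mode": "OUTPUT",
                           "data": "0.29,0.0", "encoding": "CSV"},
    },
    "eventMetadata": {"eventId": "example-id", "inferenceTime": "2023-04-21T07:37:45Z"},
    "eventVersion": "0",
}) + "\n"

for features, prediction, when in iter_capture_records(sample):
    print(when, features, "->", prediction)
```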


## 5. Model Monitor - Baselining and continuous monitoring

### 5.1 Constraint suggestion with baseline/training dataset

* copy over the training dataset to Amazon S3 (if you already have it in Amazon S3, you could reuse it)

In [167]:
strS3DataBaselinePrefix = os.path.join(
    strPrefix,
    "monitor",
    "baselining",
    "data"
)
strS3DataBaselineDataPrefix = os.path.join(
    strS3DataBaselinePrefix,
    "data"
)
strS3DataBaselineResultsPrefix = os.path.join(
    strS3DataBaselinePrefix,
    "results"
)
strS3DataBaselineDataUri = os.path.join(
    "s3://{}".format(strBucketName),
    strS3DataBaselineDataPrefix
)
strS3DataBaselineResultsUri = os.path.join(
    "s3://{}".format(strBucketName),
    strS3DataBaselineResultsPrefix
)

print (f'strS3DataBaselinePrefix: {strS3DataBaselinePrefix}')
print (f'strS3DataBaselineDataUri: {strS3DataBaselineDataUri}')
print (f'strS3DataBaselineResultsUri: {strS3DataBaselineResultsUri}')

strS3DataBaselinePrefix: DJ-SM-IMD/monitor/baselining/data
strS3DataBaselineDataUri: s3://sagemaker-us-east-1-419974056037/DJ-SM-IMD/monitor/baselining/data/data
strS3DataBaselineResultsUri: s3://sagemaker-us-east-1-419974056037/DJ-SM-IMD/monitor/baselining/data/results


* add probability
    - So that it can be used together with model drift monitoring

In [168]:
pdTrain = pd.read_csv(f'{strLocalDataPath}/train.csv')
pdTrain['probability'] = pdTrain['fraud']
listCols = ["probability"] + [strCol for strCol in pdTrain.columns if strCol != "probability"]
pdTrain = pdTrain[listCols]
pdTrain.to_csv(f'{strLocalDataPath}/train.csv', index=False, header=True)

* change dtype
    - Columns can be cast to whatever dtype you want

In [169]:
dicDtypes = {}
for strCol, dtype in zip(pdTrain.columns, pdTrain.dtypes):
    strDtype = str(dtype)
    if strDtype == "int64": dtype = np.float64 
    dicDtypes[strCol] = dtype
dicDtypes
pdTrain = pd.read_csv(f'{strLocalDataPath}/train.csv', dtype=dicDtypes)
pdTrain.dtypes

probability                             float64
fraud                                   float64
vehicle_claim                           float64
total_claim_amount                      float64
customer_age                            float64
months_as_customer                      float64
num_claims_past_year                    float64
num_insurers_past_5_years               float64
policy_deductable                       float64
policy_annual_premium                   float64
customer_zip                            float64
auto_year                               float64
num_vehicles_involved                   float64
num_injuries                            float64
num_witnesses                           float64
injury_claim                            float64
incident_month                          float64
incident_day                            float64
incident_dow                            float64
incident_hour                           float64
policy_state_AZ                         

* upload train data to s3

In [170]:
from io import StringIO
s3_client = boto3.client("s3")
s3_key = os.path.join(strS3DataBaselinePrefix, "data", "train.csv")

with StringIO() as csv_buffer:
    pdTrain.to_csv(csv_buffer, index=False, header=True)
    response = s3_client.put_object(
        Bucket=strBucketName, Key=s3_key, Body=csv_buffer.getvalue()
    )
# training_data_file = open(f'{strLocalDataPath}/train.csv', "rb")
# s3_key = os.path.join(strS3BaselinePrefix, "data", "train.csv")
# boto3.Session().resource("s3").Bucket(strBucketName).Object(s3_key).upload_fileobj(training_data_file)

* Create a baselining job with training dataset

In [171]:
from sagemaker.model_monitor import DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

In [172]:
my_default_monitor = DefaultModelMonitor(
    role=strExecutionRole,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    volume_size_in_gb=20,
    max_runtime_in_seconds=3600,
)

my_default_monitor.suggest_baseline(
    baseline_dataset=os.path.join(
        strS3DataBaselineDataUri,
        "train.csv"
    ),
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri=strS3DataBaselineResultsUri,
    wait=True,
)

INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: .
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.
INFO:sagemaker:Creating processing-job with name baseline-suggestion-job-2023-04-21-07-40-59-309


..........................2023-04-21 07:45:21,790 - matplotlib.font_manager - INFO - Generating new fontManager, this may take some time...
2023-04-21 07:45:22.327015: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-04-21 07:45:22.327046: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-04-21 07:45:23.854598: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2023-04-21 07:45:23.854628: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2023-04-21 07:45:23.854649: I tensorflow/stream_executor/cuda/cuda_diagnostic

<sagemaker.processing.ProcessingJob at 0x7fcc48acf970>

* Explore the generated constraints and statistics

In [175]:
result = s3_client.list_objects(Bucket=strBucketName, Prefix=strS3DataBaselineResultsPrefix)
report_files = [report_file.get("Key") for report_file in result.get("Contents")]
print("Found Files:")
print("\n ".join(report_files))

Found Files:
DJ-SM-IMD/monitor/baselining/data/results/constraints.json
 DJ-SM-IMD/monitor/baselining/data/results/statistics.json
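
The baselining job writes `constraints.json` next to `statistics.json`. As a sketch of how the constraints could be skimmed, the snippet below summarizes a small made-up document in the shape that file is generally understood to have (a `features` list with `name`, `inferred_type`, and `completeness`). The sample values and the `summarize_constraints` helper are assumptions for illustration. In the notebook, the real text could be fetched with `get_obj_body(...)` on the constraints key.

```python
import json

# Illustrative sample in the shape of constraints.json; values are made up.
sample_constraints = json.loads("""
{
  "version": 0.0,
  "features": [
    {"name": "probability", "inferred_type": "Fractional", "completeness": 1.0,
     "num_constraints": {"is_non_negative": true}},
    {"name": "vehicle_claim", "inferred_type": "Fractional", "completeness": 1.0,
     "num_constraints": {"is_non_negative": true}}
  ]
}
""")


def summarize_constraints(constraints):
    """Return (name, inferred_type, completeness) for every feature."""
    return [(f["name"], f["inferred_type"], f["completeness"]) for f in constraints["features"]]


for name, inferred_type, completeness in summarize_constraints(sample_constraints):
    print(f"{name:<15} {inferred_type:<12} completeness={completeness}")
```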


In [176]:
baseline_job = my_default_monitor.latest_baselining_job
schema_df = pd.json_normalize(baseline_job.baseline_statistics().body_dict["features"])
schema_df.head(3)


| | name | inferred_type | numerical_statistics.common.num_present | numerical_statistics.common.num_missing | numerical_statistics.mean | numerical_statistics.sum | numerical_statistics.std_dev | numerical_statistics.min | numerical_statistics.max | numerical_statistics.distribution.kll.sketch.parameters.c | numerical_statistics.distribution.kll.sketch.parameters.k |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | probability | Fractional | 4000 | 0 | 0.03275 | 131.0 | 0.177982 | 0.0 | 1.0 | 0.64 | 2048.0 |
| 1 | fraud | Fractional | 4000 | 0 | 0.03275 | 131.0 | 0.177982 | 0.0 | 1.0 | 0.64 | 2048.0 |
| 2 | vehicle_claim | Fractional | 4000 | 0 | 17379.745971 | 69518980.0 | 10122.664885 | 1000.0 | 51051.625749 | 0.64 | 2048.0 |

The `numerical_statistics.distribution.kll.buckets` and `numerical_statistics.distribution.kll.sketch.data` columns hold long lists (ten histogram buckets and the raw sketch samples per feature) and are not reproduced here.


### 5.2 Analyze collected data for data quality issues

* Upload some test scripts to the S3 bucket for pre- and post-processing

In [177]:
bucket = boto3.Session().resource("s3").Bucket(strBucketName)
strLocalCodePrefix = os.path.join(os.getcwd(), "source", "monitor")
strS3CodePrepKey = os.path.join(
    strPrefix,
    "monitor",
    "code",
    "data",
    "prep",
    "preprocessor.py"
)
strS3CodePrepUri = os.path.join(
    "s3://{}".format(strBucketName),
    strS3CodePrepKey
)
strS3CodePostpKey = os.path.join(
    strPrefix,
    "monitor",
    "code",
    "data",
    "postp",
    "postprocessor.py"
)
strS3CodePostpUri = os.path.join(
    "s3://{}".format(strBucketName),
    strS3CodePostpKey
)
print (strLocalCodePrefix)
print (strS3CodePostpUri)
print (strS3CodePrepUri)
boto3.Session().resource("s3").Bucket(strBucketName).Object(strS3CodePostpKey).upload_file(strLocalCodePrefix + "/postprocessor.py")
boto3.Session().resource("s3").Bucket(strBucketName).Object(strS3CodePrepKey).upload_file(strLocalCodePrefix + "/preprocessor.py")

INFO:botocore.credentials:Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole
INFO:botocore.credentials:Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole
INFO:botocore.credentials:Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole


/home/ec2-user/SageMaker/sagemaker-immersion-day/source/monitor
s3://sagemaker-us-east-1-419974056037/DJ-SM-IMD/monitor/code/data/postp/postprocessor.py
s3://sagemaker-us-east-1-419974056037/DJ-SM-IMD/monitor/code/data/prep/preprocessor.py
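
The contents of `preprocessor.py` are not shown in this notebook. As a hedged guess at what such a script could look like, the sketch below merges the captured output (probability, prediction) and the captured input features into one flat record, so that the column count lines up with a baseline whose first columns are `probability` and `fraud`. The `preprocess_handler(inference_record)` entry point and the `endpoint_input.data` / `endpoint_output.data` attributes follow the convention described for Model Monitor custom preprocessing, but should be treated as assumptions here. The `SimpleNamespace` record is only a local stand-in for testing.

```python
from types import SimpleNamespace


def preprocess_handler(inference_record):
    """Merge endpoint output and input into one flat record, output columns first."""
    output_values = inference_record.endpoint_output.data.rstrip("\n").split(",")
    input_values = inference_record.endpoint_input.data.rstrip("\n").split(",")
    values = output_values + input_values
    # Zero-padded keys keep the columns in positional order when sorted as strings.
    return {str(i).zfill(20): v for i, v in enumerate(values)}


# Local stand-in for the record object passed to the handler (assumed attribute names).
record = SimpleNamespace(
    endpoint_input=SimpleNamespace(data="750.0,2650.0,1.0"),
    endpoint_output=SimpleNamespace(data="0.29,0.0\n"),
)
print(list(preprocess_handler(record).values()))
```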


* Create a schedule

In [178]:
from time import strftime, gmtime
from sagemaker.model_monitor import CronExpressionGenerator

In [179]:
mon_schedule_name = "DEMO-data-drift-monitor-schedule-" + strftime(
    "%Y-%m-%d-%H-%M-%S", gmtime()
)

strS3ReportPath = os.path.join(
    "s3://{}".format(strBucketName),
    strPrefix,
    "monitor",
    "report",
    "data"
)

mon_schedule_name

'DEMO-data-drift-monitor-schedule-2023-04-21-07-48-14'

In [180]:
my_default_monitor.create_monitoring_schedule(
    monitor_schedule_name=mon_schedule_name,
    endpoint_input=strEndpointName,
    record_preprocessor_script=strS3CodePrepUri,
    # post_analytics_processor_script=strS3CodePostpUri,
    output_s3_uri=strS3ReportPath,
    statistics=my_default_monitor.baseline_statistics(),
    constraints=my_default_monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
    enable_cloudwatch_metrics=True,
)

INFO:sagemaker.model_monitor.model_monitoring:Creating Monitoring Schedule with name: DEMO-data-drift-monitor-schedule-2023-04-21-07-48-14
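
The script passed as `record_preprocessor_script` is expected to define a `preprocess_handler(inference_record)` function that returns a flat dict of column name to value. The contents of this notebook's `preprocessor.py` are not shown, so the following is only a sketch of that contract, not the author's script. The sample record is a local stand-in built with `SimpleNamespace`, and its values are made up.

```python
from types import SimpleNamespace


def preprocess_handler(inference_record):
    """Flatten one captured record into {column_name: value}.

    Assumption: input and output are both CSV strings, and the prediction
    should come first so the column order matches a baseline dataset whose
    first column is the label.
    """
    input_data = inference_record.endpoint_input.data.rstrip("\n")
    output_data = inference_record.endpoint_output.data.rstrip("\n")
    values = [output_data] + input_data.split(",")
    # Zero-padded keys keep the columns in a stable order
    return {str(i).zfill(20): v for i, v in enumerate(values)}


# Stand-in for a captured record, for local testing only
fake_record = SimpleNamespace(
    endpoint_input=SimpleNamespace(data="1.0,2.0,3.0\n"),
    endpoint_output=SimpleNamespace(data="0.7\n"),
)
flattened = preprocess_handler(fake_record)
print(flattened)
```

If the flattened record has more or fewer columns than the baseline, the monitoring job is likely to report `extra_column_check` or `missing_column_check` violations, so it helps to test the handler locally before scheduling.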


### 5.3 Violations report
- https://docs.aws.amazon.com/ko_kr/sagemaker/latest/dg/model-monitor-interpreting-violations.html

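    - data_type_check
        - This violation is flagged if the data types in the current execution differ from those in the baseline dataset.
        - The constraints generated during the baseline step suggest an inferred data type for each column. The monitoring_config.datatype_check_threshold parameter can be tuned to adjust the threshold at which a violation is flagged.

    - completeness_check
        - This violation is flagged if the completeness (% of non-null items) observed in the current execution exceeds the completeness threshold specified per feature.
        - The constraints generated during the baseline step suggest a completeness value.

    - baseline_drift_check
        - This violation is flagged if the distribution distance calculated between the current and baseline datasets is greater than the threshold specified in monitoring_config.comparison_threshold.

    - missing_column_check
        - This violation is flagged if the current dataset has fewer columns than the baseline dataset.

    - extra_column_check
        - This violation is flagged if the current dataset has more columns than the baseline.

    - categorical_values_check
        - This violation is flagged if the current dataset contains more unknown values than the baseline dataset. The limit is set by the threshold in monitoring_config.domain_content_threshold.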
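
The check types listed above appear in the `constraint_check_type` field of each entry in the violations report. The following sketch summarizes a report by check type. `sample_report` is a hypothetical example shaped like `constraint_violations.json`, with made-up feature names and descriptions, and `summarize_violations` is an illustrative helper, not part of the SageMaker SDK.

```python
from collections import Counter

# Hypothetical sample shaped like constraint_violations.json; all values are made up
sample_report = {
    "violations": [
        {
            "feature_name": "feature_a",
            "constraint_check_type": "baseline_drift_check",
            "description": "example description",
        },
        {
            "feature_name": "feature_b",
            "constraint_check_type": "baseline_drift_check",
            "description": "example description",
        },
        {
            "feature_name": "feature_c",
            "constraint_check_type": "data_type_check",
            "description": "example description",
        },
    ]
}


def summarize_violations(report):
    """Count violations per check type and collect the affected feature names."""
    entries = report.get("violations", [])
    counts = Counter(entry["constraint_check_type"] for entry in entries)
    features = {}
    for entry in entries:
        features.setdefault(entry["constraint_check_type"], []).append(entry["feature_name"])
    return counts, features


counts, features = summarize_violations(sample_report)
for check_type, n in counts.most_common():
    print(check_type, n, features[check_type])
```

With a real report, the same function could be called on `violations.body_dict`.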

In [125]:
violations = my_default_monitor.latest_monitoring_constraint_violations()
pd.set_option("display.max_colwidth", None)
# pd.json_normalize replaces the deprecated pd.io.json.json_normalize
constraints_df = pd.json_normalize(violations.body_dict["violations"])
constraints_df.head(10)


## [Optional] Triggering execution manually
- To trigger the execution manually, we first collect the paths to the data capture, baseline statistics, baseline constraints, and so on. We then use a utility function defined in monitoringjob_utils.py to run the processing job.

In [181]:
from utils.monitoringjob_utils import run_model_monitor_job_processor

In [182]:
current_endpoint_capture_prefix = os.path.join(
    strPrefix,
    "monitor",
    "data-capture",
    strEndpointName
)
result = s3_client.list_objects(Bucket=strBucketName, Prefix=current_endpoint_capture_prefix)
capture_files = [capture_file.get("Key") for capture_file in result.get("Contents")]
# Use the folder (hour partition) that contains the last listed capture file
data_capture_path = capture_files[-1].rsplit("/", 1)[0]
strS3DataCapturePath = os.path.join(
    "s3://{}".format(strBucketName),
    data_capture_path
)

strS3StatisticsPath = os.path.join(
    strS3DataBaselineResultsUri,
    "statistics.json"
)
strS3ConstraintsPath = os.path.join(
    strS3DataBaselineResultsUri,
    "constraints.json"
)
    

In [183]:
print (f'data_capture_path: {data_capture_path}')
print (f'strS3BaselineResultsUri: {strS3DataBaselineResultsUri}')

data_capture_path: DJ-SM-IMD/monitor/data-capture/endpoint-cloud-DJ-SM-IMD-1682062443/AllTraffic/2023/04/21/07
strS3BaselineResultsUri: s3://sagemaker-us-east-1-419974056037/DJ-SM-IMD/monitor/baselining/data/results
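
The capture path above is taken from the last key returned by a single `list_objects` call, which returns at most 1,000 keys and fails with a `TypeError` when nothing has been captured yet. A more defensive selection could look like the sketch below. `latest_capture_prefix` is a hypothetical helper and the sample keys are made up; it relies on the zero-padded `YYYY/MM/DD/HH` layout seen in the output above, so string order matches time order.

```python
import posixpath


def latest_capture_prefix(keys):
    """Return the folder of the most recent capture file.

    Assumes keys end in .../YYYY/MM/DD/HH/<file>.jsonl with zero-padded
    date parts, so lexicographic order equals chronological order.
    """
    jsonl_keys = sorted(key for key in keys if key.endswith(".jsonl"))
    if not jsonl_keys:
        raise ValueError("no capture files found under the given prefix")
    return posixpath.dirname(jsonl_keys[-1])


# Made-up keys for illustration; real keys would come from the S3 listing
sample_keys = [
    "example-prefix/monitor/data-capture/example-endpoint/AllTraffic/2023/04/21/08/file-b.jsonl",
    "example-prefix/monitor/data-capture/example-endpoint/AllTraffic/2023/04/21/07/file-a.jsonl",
    "example-prefix/monitor/data-capture/example-endpoint/AllTraffic/2023/04/20/23/file-c.jsonl",
]
latest_prefix = latest_capture_prefix(sample_keys)
print(latest_prefix)
```

For endpoints with many capture files, the keys could be gathered with the boto3 `list_objects_v2` paginator before being passed to this function.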


In [184]:
processor = run_model_monitor_job_processor(
    strRegionName,
    "ml.p3.2xlarge",
    strExecutionRole,
    strS3DataCapturePath,
    strS3StatisticsPath,
    strS3ConstraintsPath,
    strS3ReportPath,
    preprocessor_path=strS3CodePrepUri,
    postprocessor_path=None
)

INFO:sagemaker:Creating processing-job with name sagemaker-model-monitor-analyzer-2023-04-21-07-48-43-572


..............................2023-04-21 07:53:44,560 - matplotlib.font_manager - INFO - Generating new fontManager, this may take some time...
2023-04-21 07:53:45.199320: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-04-21 07:53:45.199361: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-04-21 07:53:47.411637: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-04-21 07:53:47.412719: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No 

* clean-up (local endpoint)

In [858]:
if "local" in strInstanceType:
    # delete_endpoint takes a delete_endpoint_config flag, not the endpoint name
    xgb_predictor.delete_endpoint(delete_endpoint_config=True)

INFO:sagemaker:Deleting endpoint configuration with name: endpoint-local-DJ-SM-IMD-1681953455
INFO:sagemaker:Deleting endpoint with name: endpoint-local-DJ-SM-IMD-1681953455


Gracefully stopping... (press Ctrl+C again to force)


* save endpoint name

In [122]:
pm.put_params(key="-".join([strPrefix, "ENDPOINT-NAME-DEPLOY"]), value=strEndpointName, overwrite=True)

'Store suceess'

In [124]:
pm.get_params(key="-".join([strPrefix, "ENDPOINT-NAME-DEPLOY"]))

'endpoint-cloud-DJ-SM-IMD-1682057307'

Model Monitor
https://sagemaker-examples.readthedocs.io/en/latest/sagemaker_model_monitor/model_quality/model_quality_churn_sdk.html