# Monitoring an Amazon SageMaker Model with Arthur
#### Host a trained machine learning model in Amazon SageMaker and log that model's inferences in Arthur


This notebook shows how to:
* Host a machine learning model in Amazon SageMaker and capture inference requests, results, and metadata
* Set up logging of the inputs and outputs of that model into the Arthur platform


**Table of Contents** 

 [Introduction](#intro)
1. [Section 1 - Setup](#setup)
2. [Section 2 - Deploy pre-trained model with data capture enabled](#deploy)
3. [Section 3 - Building your Arthur model.](#BuildArthur)
4. [Section 4 - Sending inferences through SageMaker and Capturing the Data](#SendInferences)




## Introduction <a id='intro'></a>    

Amazon SageMaker provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly by bringing together a broad set of capabilities purpose-built for ML. Amazon SageMaker is a fully managed service that encompasses the entire ML workflow. You can label and prepare your data, choose an algorithm, train a model, and then tune and optimize it for deployment. You can deploy your models to production with Amazon SageMaker. With minimal setup, you can then log that model's inferences in the Arthur platform.

In this notebook, you will learn how to use Amazon SageMaker with Arthur to monitor the inferences of your in-production ML models.

## Section 1 - Setup <a id='setup'></a>

In this section, you will import the necessary libraries, set up variables, and configure access to both Arthur and AWS.

Let's start by specifying:

* Class definitions specific to the pretrained model we will be using
* Your Arthur credentials
* Your Arthur model metadata
* The AWS region used to host your model
* The IAM role associated with this SageMaker notebook instance.
* The S3 bucket used to store the data used to train your model, any additional model data, and the data captured from model invocations.

#### 1.1 Import necessary libraries

In [None]:
from datetime import date, datetime, time, timedelta, timezone
import json
import os
import re
import boto3
from time import sleep
from threading import Thread
import sys
import pytz

import pandas as pd
import numpy as np

from scripts.data import download_model, download_reference_dataset, download_test_dataset, \
    MODEL_METADATA_PATH, REFERENCE_DATA_PATH, TEST_DATA_PATH

# Importing sagemaker packages

import sagemaker
from sagemaker import get_execution_role, session, Session, image_uris
from sagemaker.s3 import S3Downloader, S3Uploader
from sagemaker.processing import ProcessingJob
from sagemaker.serializers import CSVSerializer

from sagemaker.model import Model
from sagemaker.model_monitor import DataCaptureConfig

In [None]:
# Define the class required for our pretrained model

from sagemaker.mxnet import MXNetModel
from sagemaker import image_uris

class AutoGluonInferenceModel(MXNetModel):
    def __init__(
        self,
        model_data,
        role,
        entry_point,
        region,
        framework_version,
        py_version,
        instance_type,
        **kwargs,
    ):
        image_uri = image_uris.retrieve(
            "autogluon",
            region=region,
            version=framework_version,
            py_version=py_version,
            image_scope="inference",
            instance_type=instance_type,
        )
        super().__init__(
            model_data, role, entry_point, image_uri=image_uri, framework_version="1.8.0", **kwargs
        )


In [None]:
# Instantiate SageMaker Session

session = Session()

In [None]:
# Importing Arthur packages

from arthurai import ArthurAI
from arthurai.common.constants import InputType, OutputType, ValueType, Stage

#### 1.2 Connecting to Arthur and instantiating an Arthur model object

In [None]:
# Connecting to Arthur
# Please provide the url and api key to your Arthur instance

url = ''
api_key = ''

# Credentials are passed to the client explicitly through the `url` and `api_key` variables above
connection = ArthurAI(url=url, access_key=api_key)
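
If you would rather not paste credentials into the notebook, one option is to read them from environment variables before creating the client. The following is a minimal sketch of that idea. The variable names `ARTHUR_ENDPOINT_URL` and `ARTHUR_API_KEY` and the helper `load_arthur_credentials` are assumptions made for this example, not names required by this notebook.

```python
import os


def load_arthur_credentials(env=os.environ):
    """Return (url, api_key) read from environment variables, failing loudly if unset."""
    # ARTHUR_ENDPOINT_URL and ARTHUR_API_KEY are assumed names for this sketch.
    required = ("ARTHUR_ENDPOINT_URL", "ARTHUR_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return env["ARTHUR_ENDPOINT_URL"], env["ARTHUR_API_KEY"]


# Usage in the notebook:
# url, api_key = load_arthur_credentials()
# connection = ArthurAI(url=url, access_key=api_key)
```

Failing early with a clear message avoids a harder-to-read authentication error later in the notebook.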

In [None]:
# Initialize an Arthur Model Object
arthur_model = connection.model(partner_model_id=f"SageMakerModel_{datetime.now().strftime('%Y%m%d%H%M%S')}",
                                display_name="SageMakerArthurGluonDemo1",
                                input_type=InputType.Tabular,
                                output_type=OutputType.Multiclass,
                                is_batch=True)

#### 1.3 AWS region and IAM role

In [None]:
# Replace with the ARN of an AWS IAM role (we recommend one with the AmazonSageMakerFullAccess permission policy attached)
role = ''

# To successfully run this notebook, ensure that valid AWS credentials have been set in the shell environment from which this notebook is running
sagemaker_session = sagemaker.session.Session()
region = sagemaker_session.boto_region_name  # public attribute; avoids relying on the private _region_name

#### 1.4 S3 bucket and prefixes

In [None]:
# Setup S3 bucket
# You can use a different bucket, but make sure the role you chose for this notebook
# has the s3:PutObject permissions. This is the bucket into which the data is captured

bucket = sagemaker_session.default_bucket()
s3_prefix = f"autogluon_sm/{sagemaker.utils.sagemaker_timestamp()}"
output_path = f"s3://{bucket}/{s3_prefix}/output/"

# Data Capture prefixes
data_capture_prefix = f"{s3_prefix}/datacapture"
s3_capture_upload_path = f"s3://{bucket}/{data_capture_prefix}"

print(f"Capture path: {s3_capture_upload_path}")

#### 1.5 Test access to the S3 bucket
Let's quickly verify that the notebook has the right permissions to access the S3 bucket specified above.
Upload a simple test object into the S3 bucket. If this command fails, the data capture and model monitoring capabilities will not work from this notebook. You can fix this by updating the role associated with this notebook instance to include the `s3:PutObject` permission and then running this validation again.

In [None]:
# Upload a test file
S3Uploader.upload_string_as_file_body(body="test file", desired_s3_uri=f"s3://{bucket}/test_upload")

# Remove from S3 bucket once upload capability is confirmed
boto3.resource('s3').Object(bucket, "test_upload").delete()

print("Success! You are all set to proceed.")

## Section 2 - Deploy pre-trained model with data capture enabled <a id='deploy'></a>

In this section, you will upload the pretrained model to the S3 bucket, create an Amazon SageMaker Model, create an Amazon SageMaker real time endpoint, and enable data capture on the endpoint to capture endpoint invocations, predictions, and metadata.

#### 2.1 Upload the pre-trained model to S3

This code uploads a pre-trained model and gets it ready to deploy. If you already have a pretrained model in Amazon S3, you can add it instead by specifying the s3_key.


In [None]:
# Get model data to create your SageMaker model

download_model() # Pass in `skip_if_exists=False` to download the model metadata file even if one already exists

model_data = sagemaker_session.upload_data(path=str(MODEL_METADATA_PATH))

#### 2.2 Create SageMaker Model entity

This step creates an Amazon SageMaker model from the `model_data`.

In [None]:
instance_type = "ml.m5.2xlarge"

In [None]:
model = AutoGluonInferenceModel(
    model_data=model_data,
    role=role,
    region=region,
    framework_version="0.4",
    py_version="py38",
    instance_type=instance_type,
    source_dir="scripts",
    entry_point="tabular_serve.py",
)

#### 2.3 Deploy the model with data capture enabled.
Next, deploy the SageMaker model on a specific instance with data capture enabled.

In [None]:
endpoint_name = sagemaker.utils.unique_name_from_base("sagemaker-arthur-integration-test")

# DataCapture Configuration
capture_modes = ['REQUEST','RESPONSE']
data_capture_config = sagemaker.model_monitor.DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    destination_s3_uri=s3_capture_upload_path,
    capture_options=capture_modes
)

predictor = model.deploy(
    initial_instance_count=1, serializer=CSVSerializer(), instance_type=instance_type,
    endpoint_name=endpoint_name,
    data_capture_config=data_capture_config,
)
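
Once the endpoint is up, it can be useful to confirm that data capture is actually active before sending traffic. The sketch below reads the endpoint description returned by the SageMaker `describe_endpoint` API and summarizes the capture settings. The helper name `summarize_data_capture` is hypothetical, and the response field names reflect my understanding of that API, so compare them against the response you actually receive.

```python
def summarize_data_capture(sm_client, endpoint_name):
    """Return the endpoint status plus the data capture settings SageMaker reports."""
    desc = sm_client.describe_endpoint(EndpointName=endpoint_name)
    # The DataCaptureConfig block is absent when capture was never configured.
    capture = desc.get("DataCaptureConfig", {})
    return {
        "endpoint_status": desc.get("EndpointStatus"),
        "capture_enabled": capture.get("EnableCapture", False),
        "capture_status": capture.get("CaptureStatus"),
        "sampling_percentage": capture.get("CurrentSamplingPercentage"),
        "destination": capture.get("DestinationS3Uri"),
    }


# Usage in the notebook (requires valid AWS credentials):
# import boto3
# print(summarize_data_capture(boto3.client("sagemaker"), endpoint_name))
```

Passing the client in as an argument keeps the helper easy to exercise with a stub when AWS access is not available.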

## Section 3 - Building your Arthur model. <a id='BuildArthur'></a>  

Arthur needs a copy of (reference) data in order to, among other purposes, establish a _data schema_ - an expectation of what future data coming into the platform will look like. This schema should include all input features, a field for predictions, and a field for ground truths. If your reference data does not include these columns, it will be necessary to add them before building your Arthur model.

**Important: This step needs to be done before you start generating inferences with your SageMaker model, or those inferences will not be logged with Arthur**.

#### 3.1 Reading in and cleaning up our data.


In [None]:
# Read in reference_data.csv and format the dataframe for sending to arthur

download_reference_dataset()

df = pd.read_csv(REFERENCE_DATA_PATH, header=None)
pd.set_option('display.max_columns', None)
df.head()

In [None]:
# The first column of this dataset corresponds to the ground truth. 
# We will actually be removing/setting aside this column for the moment to mimic the state of input data that will be sent to our sagemaker model.

testDataTruth = df.iloc[: ,0]
df = df.iloc[:,1:]
df.head()

In [None]:
# Rearranging Columns to match our SageMaker output

df['index1'] = df.index
cols = df.columns.tolist()
cols = cols[-1:] + cols[:-1]
df = df[cols]

#### 3.2 Creating/formatting the prediction and ground truth columns (and mapping)

In [None]:
# Introducing column names for our input features.

num_cols = len(list(df))
rng = range(1, num_cols+1)
colNames = ['Feature_' + str(i) for i in rng]
df.columns = colNames

# Adding placeholder columns for predictions and ground truth (with their eventual dtypes)...

df['prediction_0'], df['prediction_1'] = None, None
df['prediction_0'], df['prediction_1'] = df['prediction_0'].astype('float'), df['prediction_1'].astype('float')

testDataTruth = testDataTruth.astype(int)
df['gt_0'] = 1-testDataTruth
df['gt_1'] = testDataTruth

df.head()

In [None]:
# Create a mapping between predictions and ground truth.

prediction_to_ground_truth_map = {
    "prediction_0": "gt_0",
    "prediction_1": "gt_1"
}
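
Because the schema is built from the reference dataframe, a quick consistency check on the mapping can catch a misspelled column name before the model is built. The following is a small sketch of such a check. `check_prediction_mapping` is a hypothetical helper written for this example and is not part of the Arthur SDK.

```python
def check_prediction_mapping(columns, pred_to_gt):
    """Return a list of problems found; an empty list means the mapping is consistent."""
    columns = set(columns)
    problems = []
    for pred_col, gt_col in pred_to_gt.items():
        if pred_col not in columns:
            problems.append(f"missing prediction column: {pred_col}")
        if gt_col not in columns:
            problems.append(f"missing ground truth column: {gt_col}")
    # Each ground truth column should be the target of exactly one prediction column.
    gt_cols = list(pred_to_gt.values())
    if len(set(gt_cols)) != len(gt_cols):
        problems.append("a ground truth column is mapped more than once")
    return problems


# Usage in the notebook:
# assert not check_prediction_mapping(df.columns, prediction_to_ground_truth_map)
```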

#### 3.3 Build and review your Arthur Model.

In [None]:
# Building the Arthur model. We provide inputs including:
# df - the dataframe which will help establish the data schema of our model
# "pred_to_ground_truth_map" - the mapping which relates each predicted-probability column to its ground truth column
# "positive_predicted_attr" - the prediction column which corresponds to the positive class

arthur_model.build(df, pred_to_ground_truth_map=prediction_to_ground_truth_map, positive_predicted_attr="prediction_1")

In [None]:
# Saving the arthur_model and returning the model_id

model_id = arthur_model.save()
print(model_id)

with open("arthur_model_id.txt", "w+") as f:
    f.write(model_id)

##  Section 4 - Sending inferences through SageMaker and Capturing the Data <a id='SendInferences'></a>  

In this section, you will send inferences through the SageMaker endpoint we just created and generate corresponding Data Capture files in S3.

#### 4.1 Generate prediction data

The cells below send a small sample of 100 rows from the test dataset to the endpoint as inference requests.

In [None]:
# Format test data to generate predictions

download_test_dataset()

test_df = pd.read_csv(TEST_DATA_PATH)
test_df = test_df.drop(columns=["class"])
test_df = test_df.iloc[0:-1, 1:]
test_df_sample = test_df[:100]

test_df.head()

In [None]:
import boto3

content_type = "text/csv"

def query_endpoint(encoded_tabular_data):
    client = boto3.client("runtime.sagemaker")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name, ContentType=content_type, Body=encoded_tabular_data
    )
    return response

In [None]:
query_endpoint(test_df_sample.to_csv(header=False).encode("utf-8"))
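
The response returned by `invoke_endpoint` carries the predictions in a streaming `Body` object. The sketch below shows one way to read it into lines of text. The helper name `read_predictions` is hypothetical, and the exact payload layout depends on the serving script (`tabular_serve.py`), which is not shown here, so treat the line-based parsing as an assumption.

```python
import io


def read_predictions(response, encoding="utf-8"):
    """Decode the endpoint response body and return its non-empty lines."""
    # The body is a stream, so it can only be read once.
    body = response["Body"].read().decode(encoding)
    return [line for line in body.splitlines() if line.strip()]


# Usage in the notebook:
# response = query_endpoint(test_df_sample.to_csv(header=False).encode("utf-8"))
# predictions = read_predictions(response)
```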

#### 4.2 View captured data

Now list the data capture files stored in Amazon S3. You should expect to see different files from different time periods organized based on the hour in which the invocation occurred. The format of the Amazon S3 path is:

`s3://{destination-bucket-prefix}/{endpoint-name}/{variant-name}/yyyy/mm/dd/hh/filename.jsonl`
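
As an illustration of the path layout above, the following sketch pulls the endpoint name, variant name, and capture hour out of a capture file URI. `parse_capture_uri` is a hypothetical helper, and treating the hour as UTC is an assumption of this example.

```python
import re
from datetime import datetime, timezone

# Matches the trailing {endpoint-name}/{variant-name}/yyyy/mm/dd/hh/filename.jsonl part.
CAPTURE_KEY = re.compile(
    r"(?:^|/)(?P<endpoint>[^/]+)/(?P<variant>[^/]+)/"
    r"(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/(?P<hour>\d{2})/"
    r"(?P<filename>[^/]+\.jsonl)$"
)


def parse_capture_uri(uri):
    """Split a data capture URI into endpoint, variant, capture hour, and file name."""
    match = CAPTURE_KEY.search(uri)
    if match is None:
        raise ValueError(f"Not a data capture path: {uri}")
    parts = match.groupdict()
    hour = datetime(
        int(parts["year"]), int(parts["month"]), int(parts["day"]), int(parts["hour"]),
        tzinfo=timezone.utc,  # assumption: capture folders are named by UTC hour
    )
    return {
        "endpoint": parts["endpoint"],
        "variant": parts["variant"],
        "hour": hour,
        "filename": parts["filename"],
    }
```

Grouping capture files by the parsed hour can help when you only want to inspect the most recent invocations.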

In [None]:
print("Waiting for captures to show up", end="")
capture_files, capture_file, capture_record = [], [], None
for _ in range(120):
    capture_files = sorted(S3Downloader.list(f"{s3_capture_upload_path}/{endpoint_name}"))
    if capture_files:
        capture_file = S3Downloader.read_file(capture_files[-1]).split("\n")
        capture_record = json.loads(capture_file[0])
        # query_endpoint() does not send an InferenceId, so stop as soon as a
        # well-formed capture record (one with eventMetadata) is available.
        if "eventMetadata" in capture_record:
            break
    print(".", end="", flush=True)
    sleep(1)
print()
print("Found Capture Files:")
print("\n ".join(capture_files[-3:]))

Next, view the contents of a single capture file. Here you should see all the data captured in an Amazon SageMaker-specific JSON Lines file. Take a quick peek at a few lines of the captured file.

In [None]:
print("\n".join(capture_file[-3:-1]))

Finally, the contents of a single line are printed below as formatted JSON so that they are easier to read.


In [None]:
print(json.dumps(capture_record, indent=2))
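
To work with a capture record programmatically, it helps to pull out the request and response payloads along with the event metadata. The sketch below does this for one JSON line. The helper `decode_capture_record` is hypothetical, and the field names (`captureData`, `endpointInput`, `endpointOutput`, `encoding`) reflect my understanding of the data capture format, so compare them against the record printed above.

```python
import base64
import json


def decode_capture_record(line):
    """Extract event metadata and the request/response payloads from one capture line."""
    record = json.loads(line)
    capture = record["captureData"]

    def payload(section):
        part = capture.get(section)
        if part is None:
            return None  # e.g. only REQUEST or only RESPONSE was captured
        data = part["data"]
        if part.get("encoding") == "BASE64":
            data = base64.b64decode(data).decode("utf-8")
        return data

    metadata = record.get("eventMetadata", {})
    return {
        "event_id": metadata.get("eventId"),
        "inference_time": metadata.get("inferenceTime"),
        "request": payload("endpointInput"),
        "response": payload("endpointOutput"),
    }


# Usage in the notebook:
# decoded = decode_capture_record(capture_file[0])
```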

#### 4.3 Send the remaining inferences

Once you've tested your data capture on a sample of your inferences, set up your Lambda with your saved Arthur model ID, then run the cell below to send the remainder of your inferences. Lambda setup documentation for Arthur's SageMaker integration can be found at [this link](https://docs.arthur.ai/user-guide/integrations.html#aws-lambda-setup).

In [None]:
query_endpoint(test_df.to_csv(header=False).encode("utf-8"))
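
If the full test set turns out to be too large to send in a single request, one alternative is to send it in smaller batches. The following is a minimal sketch of that approach. `iter_csv_batches` is a hypothetical helper, the batch size of 500 in the usage note is an arbitrary illustrative value, and splitting on lines assumes that no CSV field contains an embedded newline.

```python
def iter_csv_batches(rows, batch_size):
    """Yield UTF-8 encoded CSV payloads containing at most batch_size rows each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        # Keep a trailing newline so each payload looks like DataFrame.to_csv output.
        yield ("\n".join(chunk) + "\n").encode("utf-8")


# Usage in the notebook:
# rows = test_df.to_csv(header=False).splitlines()
# for payload in iter_csv_batches(rows, batch_size=500):
#     query_endpoint(payload)
```

Each batch produces its own capture record, so smaller batches also mean more, smaller entries in the capture files.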