# AWS SageMaker Model Monitoring

This notebook shows simple examples of:
- SageMaker Feature Store
- SageMaker Model Monitor and SageMaker Clarify

Dataset: [Wine Dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_wine.html)

In [2]:
import sagemaker
from sagemaker.session import Session
from sagemaker import get_execution_role

role = get_execution_role()
session = sagemaker.Session()
region = session.boto_region_name
bucket = session.default_bucket()

# Feature Store

First, a simple example of using SageMaker Feature Store.


## Loading libraries and Data

In [3]:
import pandas as pd
from sklearn import datasets
import time
import uuid

data = datasets.load_wine()
df = pd.DataFrame(data['data'])
df.columns = data['feature_names']

Visualizing the head of the dataframe as usual

In [4]:
df.head()

Unnamed: 0,alcohol,malic_acid,ash,alcalinity_of_ash,magnesium,total_phenols,flavanoids,nonflavanoid_phenols,proanthocyanins,color_intensity,hue,od280/od315_of_diluted_wines,proline
0,14.23,1.71,2.43,15.6,127.0,2.8,3.06,0.28,2.29,5.64,1.04,3.92,1065.0
1,13.2,1.78,2.14,11.2,100.0,2.65,2.76,0.26,1.28,4.38,1.05,3.4,1050.0
2,13.16,2.36,2.67,18.6,101.0,2.8,3.24,0.3,2.81,5.68,1.03,3.17,1185.0
3,14.37,1.95,2.5,16.8,113.0,3.85,3.49,0.24,2.18,7.8,0.86,3.45,1480.0
4,13.24,2.59,2.87,21.0,118.0,2.8,2.69,0.39,1.82,4.32,1.04,2.93,735.0


Feature Store does not accept '/' in feature (column) names, so we rename this column:

In [5]:
df.rename(columns = {'od280/od315_of_diluted_wines':'od280_od315_of_diluted_wines'}, inplace=True)
df.head()

Unnamed: 0,alcohol,malic_acid,ash,alcalinity_of_ash,magnesium,total_phenols,flavanoids,nonflavanoid_phenols,proanthocyanins,color_intensity,hue,od280_od315_of_diluted_wines,proline
0,14.23,1.71,2.43,15.6,127.0,2.8,3.06,0.28,2.29,5.64,1.04,3.92,1065.0
1,13.2,1.78,2.14,11.2,100.0,2.65,2.76,0.26,1.28,4.38,1.05,3.4,1050.0
2,13.16,2.36,2.67,18.6,101.0,2.8,3.24,0.3,2.81,5.68,1.03,3.17,1185.0
3,14.37,1.95,2.5,16.8,113.0,3.85,3.49,0.24,2.18,7.8,0.86,3.45,1480.0
4,13.24,2.59,2.87,21.0,118.0,2.8,2.69,0.39,1.82,4.32,1.04,2.93,735.0
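
The rename above fixes one column by hand. For a dataframe with many awkward column names, a small helper can do the same job. The sketch below assumes that feature names may only contain letters, digits, hyphens and underscores, must start with a letter or digit, and are limited to 64 characters; please check these limits against the Feature Store documentation. `sanitize_feature_name` is a hypothetical helper, not part of the SageMaker SDK, and it does not handle two names collapsing into the same cleaned name.

```python
import re


def sanitize_feature_name(name, max_length=64):
    """Replace characters outside the assumed allowed set with underscores."""
    cleaned = re.sub(r"[^A-Za-z0-9_-]", "_", name)
    # Assumed rule: a name must start with a letter or digit
    cleaned = cleaned.lstrip("_-")
    return cleaned[:max_length]


print(sanitize_feature_name("od280/od315_of_diluted_wines"))
# od280_od315_of_diluted_wines
```

With pandas this could be applied to every column at once, for example `df.columns = [sanitize_feature_name(c) for c in df.columns]`.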


We add two columns to the data: one for an 'id' and one for an 'EventTime'. Feature Store requires every record to carry a unique record identifier and an event time, and these two tags fill those roles when the feature group is created below. [Link](https://docs.aws.amazon.com/sagemaker/latest/dg/feature-store.html)

In [6]:
# Add tags
df["EventTime"] = time.time()
df["id"] = range(len(df))

df.head()

Unnamed: 0,alcohol,malic_acid,ash,alcalinity_of_ash,magnesium,total_phenols,flavanoids,nonflavanoid_phenols,proanthocyanins,color_intensity,hue,od280_od315_of_diluted_wines,proline,EventTime,id
0,14.23,1.71,2.43,15.6,127.0,2.8,3.06,0.28,2.29,5.64,1.04,3.92,1065.0,1661696000.0,0
1,13.2,1.78,2.14,11.2,100.0,2.65,2.76,0.26,1.28,4.38,1.05,3.4,1050.0,1661696000.0,1
2,13.16,2.36,2.67,18.6,101.0,2.8,3.24,0.3,2.81,5.68,1.03,3.17,1185.0,1661696000.0,2
3,14.37,1.95,2.5,16.8,113.0,3.85,3.49,0.24,2.18,7.8,0.86,3.45,1480.0,1661696000.0,3
4,13.24,2.59,2.87,21.0,118.0,2.8,2.69,0.39,1.82,4.32,1.04,2.93,735.0,1661696000.0,4


Now the data is ready and we can create a feature group. The steps are:
- Preparing the data and adding the tags (done above)
- Creating a feature group object by providing a name and a session
- Loading the feature definitions
- Calling the `create` method of the feature group object with:
    - an S3 location for the stored features
    - the record identifier column (the `id` tag)
    - the event time column (the `EventTime` tag)
    - a role
    - the option to enable the online store
- Ingesting the data

### Creating a feature group object

In [7]:
from sagemaker.feature_store.feature_group import FeatureGroup

feature_group = FeatureGroup(
    name = "wine-features", 
    sagemaker_session = session
)

### Loading the feature definitions

In [8]:
feature_group.load_feature_definitions(data_frame = df)

[FeatureDefinition(feature_name='alcohol', feature_type=<FeatureTypeEnum.FRACTIONAL: 'Fractional'>),
 FeatureDefinition(feature_name='malic_acid', feature_type=<FeatureTypeEnum.FRACTIONAL: 'Fractional'>),
 FeatureDefinition(feature_name='ash', feature_type=<FeatureTypeEnum.FRACTIONAL: 'Fractional'>),
 FeatureDefinition(feature_name='alcalinity_of_ash', feature_type=<FeatureTypeEnum.FRACTIONAL: 'Fractional'>),
 FeatureDefinition(feature_name='magnesium', feature_type=<FeatureTypeEnum.FRACTIONAL: 'Fractional'>),
 FeatureDefinition(feature_name='total_phenols', feature_type=<FeatureTypeEnum.FRACTIONAL: 'Fractional'>),
 FeatureDefinition(feature_name='flavanoids', feature_type=<FeatureTypeEnum.FRACTIONAL: 'Fractional'>),
 FeatureDefinition(feature_name='nonflavanoid_phenols', feature_type=<FeatureTypeEnum.FRACTIONAL: 'Fractional'>),
 FeatureDefinition(feature_name='proanthocyanins', feature_type=<FeatureTypeEnum.FRACTIONAL: 'Fractional'>),
 FeatureDefinition(feature_name='color_intensity',

The feature group is not actually created until we call the `create` method, so let's do that now.

### Creating the feature group:

We call the `create` method of the object:

In [10]:
feature_group.create(
    s3_uri=f"s3://{bucket}/features",
    record_identifier_name='id',
    event_time_feature_name="EventTime",
    role_arn=role,
    enable_online_store=True,
)

{'FeatureGroupArn': 'arn:aws:sagemaker:us-east-1:286375333242:feature-group/wine-features',
 'ResponseMetadata': {'RequestId': 'a446807b-ab8b-4bb2-99e4-6f79950d4d60',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'a446807b-ab8b-4bb2-99e4-6f79950d4d60',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '90',
   'date': 'Sun, 28 Aug 2022 14:13:16 GMT'},
  'RetryAttempts': 0}}

We can run the line below when we want to delete the feature group later.

In [11]:
#feature_group.delete()

### Ingesting the data:

We may need to wait a short while for the feature group to finish being created before this cell runs successfully.

In [13]:
feature_group.ingest(data_frame = df, max_workers = 3, wait = True)

IngestionManagerPandas(feature_group_name='wine-features', sagemaker_session=<sagemaker.session.Session object at 0x7f483b9fc400>, data_frame=     alcohol  malic_acid   ash  alcalinity_of_ash  magnesium  total_phenols  \
0      14.23        1.71  2.43               15.6      127.0           2.80   
1      13.20        1.78  2.14               11.2      100.0           2.65   
2      13.16        2.36  2.67               18.6      101.0           2.80   
3      14.37        1.95  2.50               16.8      113.0           3.85   
4      13.24        2.59  2.87               21.0      118.0           2.80   
..       ...         ...   ...                ...        ...            ...   
173    13.71        5.65  2.45               20.5       95.0           1.68   
174    13.40        3.91  2.48               23.0      102.0           1.80   
175    13.27        4.28  2.26               20.0      120.0           1.59   
176    13.17        2.59  2.37               20.0      120.0        
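
If the ingest call fails because the feature group is still being created, the wait can be made explicit by polling the group's status before ingesting. The sketch below is one way to do that, assuming the dictionary returned by `feature_group.describe()` has a `FeatureGroupStatus` key that moves from `Creating` to `Created` (or `CreateFailed`). `wait_for_feature_group` and its parameters are hypothetical names, and the helper takes the describe function as an argument so it can be tried without an AWS session.

```python
import time


def wait_for_feature_group(describe, poll_seconds=5.0, timeout_seconds=600.0,
                           sleep=time.sleep):
    """Poll describe() until the feature group reports the 'Created' status."""
    waited = 0.0
    while True:
        status = describe().get("FeatureGroupStatus")
        if status == "Created":
            return status
        if status == "CreateFailed":
            raise RuntimeError("Feature group creation failed")
        if waited >= timeout_seconds:
            raise TimeoutError(f"Feature group still in status {status!r}")
        sleep(poll_seconds)
        waited += poll_seconds


# In the notebook this would be called as:
# wait_for_feature_group(feature_group.describe)
```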

### Example Test:

We can create a runtime client to retrieve a record as an example.
Feature Store returns the record as a JSON response, which boto3 parses into a dictionary. Its `Record` entry lists each feature name together with its value as a string.

In [14]:
runtime = session.boto_session.client(
  'sagemaker-featurestore-runtime',
  region_name=region
)

result = runtime.get_record(
    FeatureGroupName = "wine-features",
    RecordIdentifierValueAsString="0"
)

print(result)

{'ResponseMetadata': {'RequestId': 'fd0fdc9e-8844-41ee-a25f-465e983b55df', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'fd0fdc9e-8844-41ee-a25f-465e983b55df', 'content-type': 'application/json', 'content-length': '824', 'date': 'Sun, 28 Aug 2022 14:14:12 GMT'}, 'RetryAttempts': 0}, 'Record': [{'FeatureName': 'alcohol', 'ValueAsString': '14.23'}, {'FeatureName': 'malic_acid', 'ValueAsString': '1.71'}, {'FeatureName': 'ash', 'ValueAsString': '2.43'}, {'FeatureName': 'alcalinity_of_ash', 'ValueAsString': '15.6'}, {'FeatureName': 'magnesium', 'ValueAsString': '127.0'}, {'FeatureName': 'total_phenols', 'ValueAsString': '2.8'}, {'FeatureName': 'flavanoids', 'ValueAsString': '3.06'}, {'FeatureName': 'nonflavanoid_phenols', 'ValueAsString': '0.28'}, {'FeatureName': 'proanthocyanins', 'ValueAsString': '2.29'}, {'FeatureName': 'color_intensity', 'ValueAsString': '5.64'}, {'FeatureName': 'hue', 'ValueAsString': '1.04'}, {'FeatureName': 'od280_od315_of_diluted_wines', 'ValueAsString
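
The `Record` list in the response is easier to work with as a plain dictionary. The sketch below flattens it; `record_to_dict` is a hypothetical helper, and `sample_response` is a shortened, hand-copied version of the output above rather than a live call.

```python
def record_to_dict(response):
    """Map each FeatureName to its ValueAsString from a get_record response."""
    return {
        item["FeatureName"]: item["ValueAsString"]
        for item in response.get("Record", [])
    }


# Shortened copy of the response printed above
sample_response = {
    "Record": [
        {"FeatureName": "alcohol", "ValueAsString": "14.23"},
        {"FeatureName": "malic_acid", "ValueAsString": "1.71"},
    ]
}

features = record_to_dict(sample_response)
print(features)                    # {'alcohol': '14.23', 'malic_acid': '1.71'}
print(float(features["alcohol"]))  # 14.23
```

All values come back as strings, so numeric features need an explicit conversion such as `float(...)` before use.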

# Model Monitor

The goal here is to demonstrate model monitoring in AWS SageMaker. First, though, we need to create a model and deploy it so that there is something to monitor.

### Creating a predictor

As demonstrated before, we can build an XGBoost predictor for the wine dataset. The data is already loaded, but we need to add the target column to `df` (our dataframe) as the first column, as described in the [XGBoost documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html).

The target is one of three classes for each wine.

In [15]:
df.head()

Unnamed: 0,alcohol,malic_acid,ash,alcalinity_of_ash,magnesium,total_phenols,flavanoids,nonflavanoid_phenols,proanthocyanins,color_intensity,hue,od280_od315_of_diluted_wines,proline,EventTime,id
0,14.23,1.71,2.43,15.6,127.0,2.8,3.06,0.28,2.29,5.64,1.04,3.92,1065.0,1661696000.0,0
1,13.2,1.78,2.14,11.2,100.0,2.65,2.76,0.26,1.28,4.38,1.05,3.4,1050.0,1661696000.0,1
2,13.16,2.36,2.67,18.6,101.0,2.8,3.24,0.3,2.81,5.68,1.03,3.17,1185.0,1661696000.0,2
3,14.37,1.95,2.5,16.8,113.0,3.85,3.49,0.24,2.18,7.8,0.86,3.45,1480.0,1661696000.0,3
4,13.24,2.59,2.87,21.0,118.0,2.8,2.69,0.39,1.82,4.32,1.04,2.93,735.0,1661696000.0,4


In [16]:
print(data['target'])
df["TARGET"] = data['target']
# Move TARGET to the front: pop it into the index, then turn the index back into the first column
df.set_index(df.pop('TARGET'), inplace=True)
df.reset_index(inplace=True)
df.head()

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]


Unnamed: 0,TARGET,alcohol,malic_acid,ash,alcalinity_of_ash,magnesium,total_phenols,flavanoids,nonflavanoid_phenols,proanthocyanins,color_intensity,hue,od280_od315_of_diluted_wines,proline,EventTime,id
0,0,14.23,1.71,2.43,15.6,127.0,2.8,3.06,0.28,2.29,5.64,1.04,3.92,1065.0,1661696000.0,0
1,0,13.2,1.78,2.14,11.2,100.0,2.65,2.76,0.26,1.28,4.38,1.05,3.4,1050.0,1661696000.0,1
2,0,13.16,2.36,2.67,18.6,101.0,2.8,3.24,0.3,2.81,5.68,1.03,3.17,1185.0,1661696000.0,2
3,0,14.37,1.95,2.5,16.8,113.0,3.85,3.49,0.24,2.18,7.8,0.86,3.45,1480.0,1661696000.0,3
4,0,13.24,2.59,2.87,21.0,118.0,2.8,2.69,0.39,1.82,4.32,1.04,2.93,735.0,1661696000.0,4


We drop the EventTime and id that were added for the feature store.

In [17]:
df = df.drop(['EventTime', 'id'], axis = 1)
df.head()

Unnamed: 0,TARGET,alcohol,malic_acid,ash,alcalinity_of_ash,magnesium,total_phenols,flavanoids,nonflavanoid_phenols,proanthocyanins,color_intensity,hue,od280_od315_of_diluted_wines,proline
0,0,14.23,1.71,2.43,15.6,127.0,2.8,3.06,0.28,2.29,5.64,1.04,3.92,1065.0
1,0,13.2,1.78,2.14,11.2,100.0,2.65,2.76,0.26,1.28,4.38,1.05,3.4,1050.0
2,0,13.16,2.36,2.67,18.6,101.0,2.8,3.24,0.3,2.81,5.68,1.03,3.17,1185.0
3,0,14.37,1.95,2.5,16.8,113.0,3.85,3.49,0.24,2.18,7.8,0.86,3.45,1480.0
4,0,13.24,2.59,2.87,21.0,118.0,2.8,2.69,0.39,1.82,4.32,1.04,2.93,735.0


The rows are ordered by class, so they need to be shuffled before being split into train and test sets. `train_test_split` shuffles the rows by default.

In [18]:
from sklearn.model_selection import train_test_split
train, test = train_test_split(df, test_size = 0.3)
test.head()

Unnamed: 0,TARGET,alcohol,malic_acid,ash,alcalinity_of_ash,magnesium,total_phenols,flavanoids,nonflavanoid_phenols,proanthocyanins,color_intensity,hue,od280_od315_of_diluted_wines,proline
35,0,13.48,1.81,2.41,20.5,100.0,2.7,2.98,0.26,1.86,5.1,1.04,3.47,920.0
137,2,12.53,5.51,2.64,25.0,96.0,1.79,0.6,0.63,1.1,5.0,0.82,1.69,515.0
109,1,11.61,1.35,2.7,20.0,94.0,2.74,2.92,0.29,2.49,2.65,0.96,3.26,680.0
57,0,13.29,1.97,2.68,16.8,102.0,3.0,3.23,0.31,1.66,6.0,1.07,2.84,1270.0
49,0,13.94,1.73,2.27,17.4,108.0,2.88,3.54,0.32,2.08,8.9,1.12,3.1,1260.0


In [19]:
train.head()

Unnamed: 0,TARGET,alcohol,malic_acid,ash,alcalinity_of_ash,magnesium,total_phenols,flavanoids,nonflavanoid_phenols,proanthocyanins,color_intensity,hue,od280_od315_of_diluted_wines,proline
149,2,13.08,3.9,2.36,21.5,113.0,1.41,1.39,0.34,1.14,9.4,0.57,1.33,550.0
95,1,12.47,1.52,2.2,19.0,162.0,2.5,2.27,0.32,3.28,2.6,1.16,2.63,937.0
54,0,13.74,1.67,2.25,16.4,118.0,2.6,2.9,0.21,1.62,5.85,0.92,3.2,1060.0
138,2,13.49,3.59,2.19,19.5,88.0,1.62,0.48,0.58,0.88,5.7,0.81,1.82,580.0
168,2,13.58,2.58,2.69,24.5,105.0,1.55,0.84,0.39,1.54,8.66,0.74,1.8,750.0
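
To see why the shuffle matters, the sketch below compares an ordered cut with a shuffled one using only the class labels. It uses the class sizes visible in the target array printed earlier (59, 71 and 48 rows) and a test size of 54 rows (30% of 178, rounded up). It is an illustration with the standard library, not a replacement for `train_test_split`, and the seed is arbitrary.

```python
from collections import Counter
import random

# Labels in their original order: 59 rows of class 0, then 71 of class 1, then 48 of class 2
labels = [0] * 59 + [1] * 71 + [2] * 48
n_test = 54  # 30% of 178 rows, rounded up

# Cutting the ordered rows: the test set is simply the last 54 rows
ordered_test = Counter(labels[-n_test:])
print(ordered_test)  # Counter({2: 48, 1: 6})

# Shuffling first (seeded only to make the sketch repeatable)
shuffled = labels[:]
random.Random(0).shuffle(shuffled)
shuffled_test = Counter(shuffled[-n_test:])
print(shuffled_test)
```

Without shuffling, class 0 never appears in the test set and class 2 never appears in the training set.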


We need to upload the data to our S3 bucket and save the location for train and test files:

In [20]:
train.to_csv("train.csv", header = False, index = False)
test.to_csv("validation.csv", header = False, index = False)

val_location = session.upload_data('./validation.csv', key_prefix="data")
train_location = session.upload_data('./train.csv', key_prefix="data")

s3_input_train = sagemaker.inputs.TrainingInput(s3_data=train_location, content_type='csv')
s3_input_validation = sagemaker.inputs.TrainingInput(s3_data=val_location, content_type='csv')

In [21]:
print(val_location)

s3://sagemaker-us-east-1-286375333242/data/validation.csv


Next, we can load an image for XGBoost, create an estimator object, set the hyper-parameters and fit the model.

The training time is short for this dataset but setting up the instance for the training job may take a short while.

In [22]:
algo_image = sagemaker.image_uris.retrieve("xgboost", region, version='latest')
s3_output_location = f"s3://{bucket}/models/wine_model"

model=sagemaker.estimator.Estimator(
    image_uri=algo_image,
    role=role,
    instance_count=1,
    instance_type='ml.m4.xlarge',
    volume_size=5,
    output_path=s3_output_location,
    sagemaker_session=sagemaker.Session()
)

model.set_hyperparameters(
                        max_depth = "3",
                        eta = "0.1",
                        gamma = "4",
                        objective = 'multi:softmax',
                        num_class = "3",
                        eval_metric = "mlogloss",
                        num_round = "200"
                        )


model.fit({'train': s3_input_train, 'validation': s3_input_validation})

2022-08-28 14:15:13 Starting - Starting the training job...
2022-08-28 14:15:40 Starting - Preparing the instances for training
ProfilerReport-1661696113: InProgress
.........
2022-08-28 14:16:58 Downloading - Downloading input data...
2022-08-28 14:17:39 Training - Downloading the training image......
2022-08-28 14:18:39 Training - Training image download completed. Training in progress.
Arguments: train
[2022-08-28:14:18:38:INFO] Running standalone xgboost training.
[2022-08-28:14:18:38:INFO] File size need to be processed in the node: 0.01mb. Available memory size in the node: 8829.8mb
[2022-08-28:14:18:38:INFO] Determined delimiter of CSV input is ','
[14:18:38] S3DistributionType set as FullyReplicated
[14:18:38] 124x13 matrix with 1612 entries loaded from /opt/ml/input/data/train?format=csv&label_column=0&delimiter=,
[2022-08-28:14:18:38:INFO] Determined delimiter of CSV input is ','
[14:18:38] S3DistributionType s

# Defining Monitors

The AWS design pattern for monitors is a bit complicated. A simplified visualization of the objects to be defined is shown in the README of the GitHub repository. Based on that, we can start defining the monitors.

As the design pattern shows, a monitor computes its statistics from data captured at the endpoint, and the endpoint takes a Data Capture Config object. Deploying the model is what creates the endpoint, so we need to define the Data Capture Config before deploying.

In [23]:
from sagemaker.model_monitor import DataCaptureConfig

capture_uri = f's3://{bucket}/data-capture'

data_capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    destination_s3_uri=capture_uri
)
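
Once the endpoint is deployed with this config, each captured request and response is written under `capture_uri` as one JSON object per line. The sketch below shows how such a line could be unpacked. The sample line is hand-written for illustration, and the field names (`captureData`, `endpointInput`, `endpointOutput`, `data`) reflect our understanding of the capture format, so they should be checked against a real capture file. `parse_capture_line` is a hypothetical helper.

```python
import json


def parse_capture_line(line):
    """Return (request payload, response payload) from one captured JSON line."""
    event = json.loads(line)
    capture = event["captureData"]
    return capture["endpointInput"]["data"], capture["endpointOutput"]["data"]


# Hand-written sample in the assumed capture layout (not real captured data)
sample_line = json.dumps({
    "captureData": {
        "endpointInput": {"mode": "INPUT", "encoding": "CSV", "data": "0.1,0.2,0.3"},
        "endpointOutput": {"mode": "OUTPUT", "encoding": "CSV", "data": "1.0"},
    },
    "eventMetadata": {"eventId": "example-id"},
})

request_csv, response_csv = parse_capture_line(sample_line)
print(request_csv, "->", response_csv)  # 0.1,0.2,0.3 -> 1.0
```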

Now, we can send this Data Capture Config to create the endpoint and deploy the model:

Again, since we are creating an instance, this may take a short while...

In [24]:
xgb_predictor = model.deploy(
    initial_instance_count=1, instance_type='ml.m4.xlarge',
    data_capture_config=data_capture_config
)

-----------------!

### Testing the deployed model

In [27]:
xgb_predictor.serializer = sagemaker.serializers.CSVSerializer()

newData = test.copy()
testData = newData.sample(5)
TrueLabels = testData['TARGET'].values
inputs = testData.drop(['TARGET'], axis = 1).values
preds = xgb_predictor.predict(inputs).decode('utf-8')

print("True Labels: " + str(TrueLabels))
print("Predictions: " + str(preds))

True Labels: [1 1 0 2 1]
Predictions: 1.0,1.0,0.0,2.0,1.0
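
The endpoint returns the predictions as one comma-separated string, so comparing them with the true labels takes a small parsing step. The sketch below uses the values printed above; `parse_predictions` is a hypothetical helper that assumes the response is a single comma-separated line of class numbers, as in this output.

```python
def parse_predictions(payload):
    """Turn a response such as '1.0,1.0,0.0' into a list of integer class labels."""
    return [int(float(value)) for value in payload.strip().split(",") if value]


true_labels = [1, 1, 0, 2, 1]
preds = parse_predictions("1.0,1.0,0.0,2.0,1.0")

# Fraction of predictions that match the true labels
accuracy = sum(p == t for p, t in zip(preds, true_labels)) / len(true_labels)
print(preds, accuracy)  # [1, 1, 0, 2, 1] 1.0
```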


## Defining a Clarify Monitor

As shown in the design pattern, the endpoint can have different observer monitors. Here we define a Clarify monitor, which is a bit more complicated than the default monitor.

The default monitor could be enough and useful in situations where for example the output of the model is drifting and we want to alert or trigger a retraining.

As we can see from the design pattern, to make a Clarify monitor (Model Explainability Monitor) we need to define:
- A Cron Expression Generator, which defines the schedule
- An Explainability Analysis Config, which itself has:
    - A model config, which defines the data type that is accepted by the endpoint
    - An algorithm to run for the explanation. For example, the SHAP algorithm is used to calculate feature importance

First we define the algorithm config:

In [28]:
explainability_config = sagemaker.clarify.SHAPConfig(
    baseline = [train.mean().astype(int).to_list()[1:]],
    num_samples = int(train.size),
    agg_method = "mean_abs",
    save_local_shap_values = False,
)
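
To make the idea behind this config concrete, the sketch below computes exact Shapley values for a tiny hand-written linear model by averaging each feature's contribution over all feature orderings, starting from a baseline row. It then aggregates them with the mean of absolute values, which is what `agg_method = "mean_abs"` refers to. This is not what Clarify runs internally: Clarify uses a sampling-based approximation (Kernel SHAP), which is why it takes `num_samples`. The toy model, baseline and rows are made up for illustration.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to the baseline row."""
    n = len(x)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]  # switch feature i from its baseline value to its real value
            updated = predict(current)
            totals[i] += updated - previous
            previous = updated
    return [total / len(orderings) for total in totals]


def toy_model(features):
    a, b, c = features
    return 2.0 * a + 1.0 * b + 0.0 * c  # the third feature is ignored


baseline = [1.0, 1.0, 1.0]
rows = [[2.0, 3.0, 5.0], [0.0, 1.0, 9.0]]

local = [shapley_values(toy_model, row, baseline) for row in rows]
print(local)  # [[2.0, 2.0, 0.0], [-2.0, 0.0, 0.0]]

# "mean_abs": average the absolute local values per feature
global_importance = [sum(abs(v[i]) for v in local) / len(local) for i in range(3)]
print(global_importance)  # [2.0, 1.0, 0.0]
```

The ignored third feature gets an importance of zero, and the first feature ranks highest even though its local values have opposite signs, because the aggregation uses absolute values.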

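Then we define the model config. `model_name` must be the name of the SageMaker model that is deployed behind the endpoint:

In [29]:
model_config = sagemaker.clarify.ModelConfig(
    model_name="xgboost-2021-08-25-15-19-33-499", # replace with the name of your own deployed model
    instance_count=1,
    instance_type='ml.m4.xlarge',
    content_type="text/csv", # format of the data that the endpoint accepts
    accept_type="text/csv",  # format of the data that the endpoint returns
)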

Now, from the design pattern, we can define the Explainability Analysis Config, which will be later passed to the Clarify (Model Explainability Monitor)

In [30]:
analysis_config = sagemaker.model_monitor.ExplainabilityAnalysisConfig(
        explainability_config = explainability_config, #object for explainability
        model_config = model_config,         #object for configuration
        headers = train.columns.to_list()[1:],
    )

We first need to create the monitor object, and then pass the Cron Expression Generator (for scheduling) and our Analysis Config.

In [31]:
model_explainability_monitor = sagemaker.model_monitor.ModelExplainabilityMonitor(
    role = role,
    sagemaker_session = session,
    max_runtime_in_seconds = 1800,
)

Now everything is ready to create the monitor as an observer of the model endpoint. We need to pass:
- An S3 location to save the monitor data
- The Explainability Analysis Config that was created
- A schedule from the Cron Expression Generator
- The endpoint of the model to be monitored

In [32]:
from sagemaker.model_monitor import CronExpressionGenerator


explainability_uri = f"s3://{bucket}/model_explainability"

model_explainability_monitor.create_monitoring_schedule(
    output_s3_uri = explainability_uri,
    analysis_config = analysis_config,
    endpoint_input = xgb_predictor.endpoint_name,
    schedule_cron_expression = CronExpressionGenerator.hourly(),
)
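
`schedule_cron_expression` takes a cron string with six fields: minutes, hours, day of month, month, day of week and year. As far as we know, `CronExpressionGenerator.hourly()` returns `cron(0 * ? * * *)`. This can be confirmed by printing it in the notebook. The helpers below are hypothetical local stand-ins that build strings of that shape, only to show what the schedule looks like. They are not the SDK's implementation.

```python
def hourly_cron():
    """Run at minute 0 of every hour."""
    return "cron(0 * ? * * *)"


def daily_cron(hour=0):
    """Run once a day at minute 0 of the given hour (0-23)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    return f"cron(0 {hour} ? * * *)"


print(hourly_cron())  # cron(0 * ? * * *)
print(daily_cron(6))  # cron(0 6 ? * * *)
```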

That creates the monitor, which we can check under Sagemaker_Resources -> Endpoints.

We can also exercise the monitor by running the following cell. It calls the endpoint for inference several times, so that there is captured data for the scheduled monitor to analyze.


In [34]:
import time
import random
count = 10
while count > 0:
    count -= 1
    time.sleep(random.randint(1, 60))
    print("Now testing an inference:")
    xgb_predictor.predict(newData.sample(5).drop(['TARGET'], axis = 1).values).decode('utf-8')

Now testing an inference:
Now testing an inference:
Now testing an inference:
Now testing an inference:
Now testing an inference:
Now testing an inference:
Now testing an inference:
Now testing an inference:
Now testing an inference:
Now testing an inference:


To avoid costs we delete the monitors and endpoints after this demo:

In [35]:
monitors = xgb_predictor.list_monitors()
for monitor in monitors:
    monitor.delete_monitoring_schedule()
    
xgb_predictor.delete_endpoint()


Deleting Monitoring Schedule with name: monitoring-schedule-2022-08-28-14-30-14-095
