## Module 3: Training Custom Models Using Studio Notebook

You may choose to build a custom model when you need to tackle a unique problem or when no pre-built model meets your needs. Building a custom model typically involves selecting an appropriate algorithm, tuning its parameters, and optimizing its performance through iterative experimentation. In this module, we go through the following steps to build, track, deploy, and monitor custom models using Amazon SageMaker Studio notebooks.

- [Step 1: Pull Data from Offline Feature Store](#Pull-data-from-offline-feature-store)
- [Step 2: Train, Track, and Deploy an XGBoost Model](#Train-XGBoost-Model)
- [Step 3: Train, Track, and Deploy an Isolation Forest Model](#Train-Isolation-Forest-Model)
- [Step 4: Model Monitoring](#Model-Monitoring)
- [Step 5: Clean Up](#Clean-up)

**If you DID NOT run the previous modules, please run [0_setup.ipynb notebook](0_setup.ipynb) first before running this notebook**

**This demo is optimized for SageMaker Studio notebooks using the Data Science kernel.**



### Setup

Install and/or update the required libraries.

In [35]:
!pip install -Uq pip --quiet

!pip install -Uq awswrangler sagemaker boto3 --quiet

### Import & Global Parameters

In [36]:
import boto3
import sagemaker
import pandas as pd

sagemaker_session = sagemaker.Session()

region = sagemaker_session.boto_region_name
sagemaker_role = sagemaker.get_execution_role()

bucket = sagemaker_session.default_bucket()

s3_client = boto3.client("s3", region_name=region)
sagemaker_client = boto3.client("sagemaker", region_name=region)

prefix = "telco-5g-observabiltiy"

%store region
%store bucket
%store sagemaker_role
%store prefix

Stored 'region' (str)
Stored 'bucket' (str)
Stored 'sagemaker_role' (str)
Stored 'prefix' (str)


### Pull data from offline feature store
----
In Module 1 of this workshop, we prepared the raw data and uploaded the final dataset into an offline feature store, where it is cataloged in a central location for management and discovery. Now we extract that data to build our observability models. SageMaker Feature Store uses Amazon Athena to query the offline store, and the SDK can load the query results directly into a pandas DataFrame for further processing.

In [37]:
from sagemaker.feature_store.feature_group import FeatureGroup

%store -r fg_name

anomaly_features = FeatureGroup(name=fg_name, sagemaker_session=sagemaker_session)

query = anomaly_features.athena_query()

table_name = query.table_name
                       
query_string = f"""
SELECT * FROM "{table_name}"
"""

query.run(query_string=query_string, output_location=f"s3://{bucket}/{prefix}/data/query_results")
query.wait()

dataset = query.as_dataframe()

dataset

INFO:sagemaker:Query 30c12778-8193-4048-b179-cd87274b3c8c is being executed.
INFO:sagemaker:Query 30c12778-8193-4048-b179-cd87274b3c8c successfully executed.


| | health | accessibility | 5g_users | contention_rate | utilization | downlink_throughput | uplink_throughput | anomaly | location_id | eventtime | write_time | api_invocation_time | is_deleted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.00 | 1.00 | 0.112072 | 0.001814 | 0.263812 | 0.044078 | 0.042050 | 1 | 18EIGHTYR_401 | 1.679772e+09 | 2023-03-25 19:28:38.569 | 2023-03-25 19:23:40.000 | False |
| 1 | 0.96 | 1.00 | 0.014729 | 0.000000 | 0.151934 | 0.009691 | 0.003334 | 0 | AGUSTINMALR_401_4RFS | 1.679772e+09 | 2023-03-25 19:28:38.569 | 2023-03-25 19:23:40.000 | False |
| 2 | 1.00 | 0.99 | 0.109190 | 0.000680 | 0.360497 | 0.047669 | 0.096913 | 0 | 18EIGHTYR_401 | 1.679772e+09 | 2023-03-25 19:28:38.569 | 2023-03-25 19:23:40.000 | False |
| 3 | 1.00 | 1.00 | 0.001281 | 0.000000 | 0.276243 | 0.000590 | 0.000726 | 0 | AMPAROVILCALN_401_4RFS | 1.679772e+09 | 2023-03-25 19:28:38.569 | 2023-03-25 19:23:41.000 | False |
| 4 | 0.92 | 1.00 | 0.019212 | 0.000000 | 0.156077 | 0.006322 | 0.004083 | 0 | AGUSTINMALR_401_4RFS | 1.679772e+09 | 2023-03-25 19:28:38.569 | 2023-03-25 19:23:41.000 | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 84564 | 0.99 | 0.98 | 0.018252 | 0.000000 | 0.160221 | 0.015909 | 0.006789 | 0 | BUGALLONMKNAR_401_4RFS | 1.679772e+09 | 2023-03-25 19:28:50.319 | 2023-03-25 19:28:36.000 | False |
| 84565 | 1.00 | 1.00 | 0.125520 | 0.002948 | 0.393646 | 0.102649 | 0.134556 | 0 | BOLANOS2M_353_4RFS | 1.679772e+09 | 2023-03-25 19:28:50.319 | 2023-03-25 19:28:36.000 | False |
| 84566 | 1.00 | 0.99 | 0.028178 | 0.000227 | 0.296961 | 0.007017 | 0.013804 | 0 | BULACANWAWAPILILARZLN-403_4RFS_None | 1.679772e+09 | 2023-03-25 19:28:50.319 | 2023-03-25 19:28:36.000 | False |
| 84567 | 1.00 | 1.00 | 0.011207 | 0.000000 | 0.147790 | 0.003988 | 0.010599 | 0 | BUNGADQCR_402_4RFS | 1.679772e+09 | 2023-03-25 19:28:50.319 | 2023-03-25 19:28:36.000 | False |


### Train XGBoost Model
----

In the real world, data scientists go through many iterations, experimenting with different algorithms to find the best model for their ML use case. Here, you start with a supervised learning approach and use an XGBoost model for our problem.

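To get your features ready for XGBoost, move the target variable (`anomaly`) to the first column and drop the identifier and Feature Store metadata columns (`location_id`, `eventtime`, `write_time`, `api_invocation_time`, `is_deleted`). You also split the data into an 80/20 train and test set, keeping the test set as a holdout to validate model performance.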

In [38]:
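# Label first, then the feature columns (identifier and Feature Store metadata columns are dropped)
drop_cols = ["location_id", "anomaly", "eventtime", "write_time", "api_invocation_time", "is_deleted"]
col_order = ["anomaly"] + list(dataset.drop(drop_cols, axis=1).columns)

# 80/20 split: sample 80% of the rows for training and keep the rest as the holdout test set
train = dataset.sample(frac=0.80, random_state=0)[col_order]
test = dataset.drop(train.index)[col_order]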
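The column ordering and holdout split above can be sketched without pandas. This is a hypothetical, stdlib-only illustration: `label_first_columns` and `holdout_split` are names introduced here, and the split does not reproduce the exact rows that `dataset.sample(frac=0.80, random_state=0)` selects; it only shows the idea of a seeded, disjoint 80/20 split.

```python
import random

# Identifier and Feature Store metadata columns dropped in the cell above
DROP_COLUMNS = {"location_id", "anomaly", "eventtime", "write_time",
                "api_invocation_time", "is_deleted"}


def label_first_columns(columns, label="anomaly", drop=DROP_COLUMNS):
    """Return the label column followed by the remaining feature columns, in order."""
    return [label] + [c for c in columns if c not in drop]


def holdout_split(n_rows, frac=0.8, seed=0):
    """Return disjoint (train, test) row-index lists using a seeded random sample."""
    rng = random.Random(seed)
    train = sorted(rng.sample(range(n_rows), round(frac * n_rows)))
    train_set = set(train)
    test = [i for i in range(n_rows) if i not in train_set]
    return train, test


columns = ["health", "accessibility", "5g_users", "anomaly", "location_id",
           "eventtime", "write_time", "api_invocation_time", "is_deleted"]
print(label_first_columns(columns))  # ['anomaly', 'health', 'accessibility', '5g_users']

train_idx, test_idx = holdout_split(10)
print(len(train_idx), len(test_idx))  # 8 2
```

Seeding the sampler, like `random_state=0` in the notebook, keeps the holdout set stable across reruns so that experiment results stay comparable.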

Upload the training data to S3

In [39]:
train.to_csv("data/train.csv", index=False)
key = f"{prefix}/data/xgboost/train.csv"

s3_client.upload_file(
    Filename="data/train.csv",
    Bucket=bucket,
    Key=key,
)

train_s3_path = f"s3://{bucket}/{key}"
print(f"training data is uploaded to {train_s3_path}")

training data is uploaded to s3://sagemaker-us-west-2-376678947624/telco-5g-observabiltiy/data/xgboost/train.csv


#### Set the hyperparameters
These are the parameters passed to our training script to train the model; in this module they are defined in the `hyperparameters` dictionary inside the training cell. Although they are all called "hyperparameters" here, they can include XGBoost's [Learning Task Parameters](https://xgboost.readthedocs.io/en/latest/parameter.html#learning-task-parameters), [Tree Booster Parameters](https://xgboost.readthedocs.io/en/latest/parameter.html#parameters-for-tree-booster), or any other parameters you'd like to configure for XGBoost.
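In SageMaker script mode, the hyperparameters dictionary reaches the entry point as command-line arguments (for example `--max_depth 3`). The notebook does not show `xgboost_starter_script.py`, so the parser below is a hypothetical sketch of how such a script might read the values used in this module; the argument types and the separation of `num_round` from the booster parameters are assumptions.

```python
import argparse


def parse_hyperparameters(argv=None):
    """Parse SageMaker-style '--name value' hyperparameters (hypothetical script layout)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--max_depth", type=int, default=3)
    parser.add_argument("--eta", type=float, default=0.2)
    parser.add_argument("--objective", type=str, default="binary:logistic")
    parser.add_argument("--num_round", type=int, default=100)
    parser.add_argument("--region", type=str, default=None)
    # parse_known_args ignores any extra arguments instead of failing
    args, _unknown = parser.parse_known_args(argv)
    return args


def booster_params(args):
    """Booster parameters for xgboost.train(params, ...); num_round is passed separately."""
    return {"max_depth": args.max_depth, "eta": args.eta, "objective": args.objective}


args = parse_hyperparameters(["--max_depth", "3", "--eta", "0.2", "--num_round", "100"])
print(booster_params(args), args.num_round)
```

Because every value arrives as a string on the command line, declaring `type=int` or `type=float` is what turns `"3"` back into a number.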

#### Set Up Experiment Run Context
Amazon SageMaker Experiments lets you organize, track, compare, and evaluate experiments during model building and training. Experiment tracking matters because it lets you follow model performance and changes over time, which makes the model easier to debug and optimize. It also helps you reproduce and share results with others, leading to better collaboration and faster iteration. For more details, see the [SageMaker documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/experiments-create.html).
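The training cell names the experiment and run with `unique_name_from_base`, which is why names such as `xgboost-experiment-1679772647-ca49` appear in the logs. The helpers below are a hypothetical re-creation of that `<base>-<timestamp>-<4 hex chars>` pattern, inferred from the logged names rather than taken from the SDK's source. The practical point is that every rerun creates a new run, so keep `experiment_name` and `run_name` around if you want to reopen a run later with `load_run`.

```python
import re
import secrets
import time

# Pattern inferred from names such as "xgboost-experiment-1679772647-ca49"
NAME_PATTERN = re.compile(r"^(?P<base>.+)-(?P<ts>\d+)-(?P<rand>[0-9a-f]{4})$")


def unique_name(base, max_length=63):
    """Append '-<unix seconds>-<4 hex chars>' to base, truncating base to fit max_length."""
    suffix = f"-{int(time.time())}-{secrets.token_hex(2)}"
    return base[: max_length - len(suffix)] + suffix


def base_of(name):
    """Recover the base from a generated name, or None if the name does not match."""
    match = NAME_PATTERN.match(name)
    return match.group("base") if match else None


run_name = unique_name("xgboost-experiment")
print(run_name)  # e.g. xgboost-experiment-<timestamp>-<hex>
print(base_of("telco-5g-observabiltiy-1679772647-d8f7"))
```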

#### Train a custom model on SageMaker
To train a model on SageMaker, you specify the instance type, the framework container, and any hyperparameters you want to use. When you call `estimator.fit()`, you supply the location of your training data, and SageMaker spins up the specified instance and downloads the training data onto it. In the example below, we also supply a custom training script (`entry_point` within `source_dir`); SageMaker copies the script into the container and runs it, which makes it easy to iterate on your code.
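Inside the training container, SageMaker exposes each input channel and the model output directory through environment variables; the `"train"` key passed to `fit()` becomes the `SM_CHANNEL_TRAIN` channel. The sketch below shows how a script such as `xgboost_starter_script.py` might locate its data and read a CSV whose first column is the label. The script's real contents are not shown in this module, so the helper names here are hypothetical; the fallback paths follow the standard `/opt/ml` container layout.

```python
import csv
import io
import os


def channel_dir(name):
    """Local directory where SageMaker places the files for an input channel."""
    return os.environ.get(f"SM_CHANNEL_{name.upper()}", f"/opt/ml/input/data/{name}")


def model_dir():
    """Directory whose contents SageMaker uploads as the model artifact after training."""
    return os.environ.get("SM_MODEL_DIR", "/opt/ml/model")


def read_label_first_csv(text):
    """Split a CSV with a header row into (feature names, labels, feature rows)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    labels, rows = [], []
    for record in reader:
        labels.append(float(record[0]))
        rows.append([float(v) for v in record[1:]])
    return header[1:], labels, rows


sample = "anomaly,health,utilization\n1,1.0,0.26\n0,0.96,0.15\n"
names, labels, rows = read_label_first_csv(sample)
print(channel_dir("train"), names, labels)
```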

In [40]:
from sagemaker.xgboost.estimator import XGBoost
from sagemaker.experiments.run import Run, load_run
from sagemaker.utils import unique_name_from_base

train_instance_count=1
train_instance_type="ml.m5.xlarge" 

experiment_name = unique_name_from_base(prefix)

run_name = unique_name_from_base("xgboost-experiment")

with Run(experiment_name=experiment_name, run_name=run_name, 
         sagemaker_session=sagemaker_session) as run:
        
    run.log_file("data/train.csv", is_output=False)
    
    hyperparameters = {
        "max_depth": "3",
        "eta": "0.2",
        "objective": "binary:logistic",
        "num_round": "100",
        "region":region
    }

    xgb_estimator = XGBoost(
        entry_point="xgboost_starter_script.py",
        source_dir="code",
        hyperparameters=hyperparameters,
        role=sagemaker_role,
        instance_count=train_instance_count,
        instance_type=train_instance_type,        
        framework_version="1.5-1",
    )
    
    xgb_estimator.fit(inputs={"train": train_s3_path})

INFO:sagemaker.image_uris:Ignoring unnecessary Python version: py3.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: ml.m5.xlarge.
INFO:sagemaker:Creating training-job with name: sagemaker-xgboost-2023-03-25-19-30-48-305


2023-03-25 19:30:51 Starting - Starting the training job...
2023-03-25 19:31:05 Starting - Preparing the instances for training...
2023-03-25 19:31:50 Downloading - Downloading input data...
2023-03-25 19:32:20 Training - Downloading the training image...
2023-03-25 19:32:40 Training - Training image download completed. Training in progress.
[2023-03-25 19:32:48.425 ip-10-0-106-133.us-west-2.compute.internal:7 INFO utils.py:28] RULE_JOB_STOP_SIGNAL_FILENAME: None
[2023-03-25 19:32:48.507 ip-10-0-106-133.us-west-2.compute.internal:7 INFO profiler_config_parser.py:111] User has disabled profiler.
[2023-03-25:19:32:48:INFO] Imported framework sagemaker_xgboost_container.training
[2023-03-25:19:32:48:INFO] No GPUs detected (normal if no gpus installed)
[2023-03-25:19:32:48:INFO] Invoking user training script.
[2023-03-25:19:32:48:INFO] Module xgboost_starter_script does not provide a setup.py.
Generating setup.py
[2023-03-

#### Deploy model to an endpoint
We enable data capture on the endpoint so that its requests and responses can be used for model monitoring.
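With capture enabled, the endpoint writes request/response pairs under `destination_s3_uri` as JSON Lines files. The record below is a hand-written example of the documented layout (`captureData.endpointInput` and `endpointOutput`, each with `data` and `encoding`); verify the exact fields against a file from your own capture prefix. The parser is a hypothetical helper that decodes CSV or Base64 payloads and splits multi-row CSV requests into rows.

```python
import base64
import json


def decode_payload(part):
    """Return the captured payload as text, decoding Base64 when the record says so."""
    data = part["data"]
    if part.get("encoding") == "BASE64":
        data = base64.b64decode(data).decode("utf-8")
    return data


def parse_capture_line(line):
    """Return (input rows as lists of strings, raw output text) for one captured record."""
    capture = json.loads(line)["captureData"]
    request = decode_payload(capture["endpointInput"])
    inputs = [row.split(",") for row in request.splitlines() if row]
    return inputs, decode_payload(capture["endpointOutput"])


# Hand-written example record (illustrative values, not real captured data)
example = json.dumps({
    "captureData": {
        "endpointInput": {"observedContentType": "text/csv", "mode": "INPUT",
                          "data": "1.0,1.0,0.11\n0.96,1.0,0.01", "encoding": "CSV"},
        "endpointOutput": {"observedContentType": "text/csv", "mode": "OUTPUT",
                           "data": base64.b64encode(b"0.83,0.02").decode(),
                           "encoding": "BASE64"},
    },
    "eventMetadata": {"eventId": "example-id", "inferenceTime": "2023-03-25T19:40:00Z"},
    "eventVersion": "0",
})
rows, output = parse_capture_line(example)
print(rows, output)
```

Because `sampling_percentage=100`, every request is captured; on a busy endpoint, a lower percentage reduces storage cost at the expense of monitoring coverage.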

In [41]:
from sagemaker.serializers import CSVSerializer
from sagemaker.model_monitor import DataCaptureConfig

data_capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    destination_s3_uri=f"s3://{bucket}/{prefix}/monitoring/datacapture"
)


predictor = xgb_estimator.deploy(
    initial_instance_count=1, instance_type="ml.m5.xlarge", serializer=CSVSerializer(), data_capture_config=data_capture_config
)

INFO:sagemaker:Creating model with name: sagemaker-xgboost-2023-03-25-19-33-45-597
INFO:sagemaker:Creating endpoint-config with name sagemaker-xgboost-2023-03-25-19-33-45-597
INFO:sagemaker:Creating endpoint with name sagemaker-xgboost-2023-03-25-19-33-45-597


----!

#### Test inference on endpoint
The functions below call the SageMaker endpoint in mini-batches and convert the predicted probabilities into 0/1 labels using a cutoff; these predictions are then used to generate a confusion matrix.

In [42]:
import numpy as np

def predict(data, rows=500):
    # Split the records into roughly equal chunks of at most `rows` records per request
    split_array = np.array_split(data, int(data.shape[0] / float(rows) + 1))
    predictions = []
    for array in split_array:
        # The response is a list of rows; flatten it into a single list
        predictions = predictions + sum(predictor.predict(array), [])

    return [float(i) for i in predictions]

def calibrate(probabilities, cutoff=.2):
    # Label a record as an anomaly (1) only when its probability exceeds the cutoff
    return [0 if p <= cutoff else 1 for p in probabilities]
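To see how many requests `predict` sends, the stdlib sketch below reproduces the chunk sizes of `np.array_split(data, int(n / rows + 1))`: NumPy gives the first `n % k` chunks one extra row. `chunk_sizes` is a helper introduced only for this illustration.

```python
def chunk_sizes(n_rows, rows=500):
    """Sizes of the chunks predict() creates for n_rows records."""
    k = int(n_rows / float(rows) + 1)
    base, extra = divmod(n_rows, k)
    # np.array_split gives the first `extra` chunks one additional element
    return [base + 1] * extra + [base] * (k - extra)


# 16,914 rows is the total of the XGBoost confusion matrix (the size of the test set)
sizes = chunk_sizes(16914)
print(len(sizes), max(sizes), sum(sizes))  # 34 498 16914
```

So each call to `predict` on the test set issues 34 endpoint requests, and the 100-iteration traffic loop in the monitoring section sends 3,400 requests, all of which are captured with `sampling_percentage=100`.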

You can load a SageMaker Experiments run at any time and log more information to it. Here, you invoke the SageMaker endpoint in mini-batches on the test set, summarize the results in a simple [confusion matrix](https://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/), and log the chart to the experiment run.

After the cell below completes, you can open SageMaker Experiments in Studio to see all the information SageMaker captured for this experiment run.

<img src="statics/module_03_ex01.png"  width="75%" height="75%">

In [43]:
with load_run(experiment_name=experiment_name, run_name=run_name) as run:

    # run batch prediction
    probabilities = predict(test.to_numpy()[:, 1:])
    # run calibration and visualize the results
    predictions = np.asarray(calibrate(probabilities, 0.4))
    run.log_confusion_matrix(test["anomaly"], predictions, unique_name_from_base("Confusion-Matrix"))

print(f"Experiment Name: {experiment_name}\n")

print(f"Run Name: {run_name}\n")

pd.crosstab(
    index=test.iloc[:, 0],
    columns=predictions,
    rownames=["actual"],
    colnames=["predictions"],
)

INFO:sagemaker.experiments.run:The run (xgboost-experiment-1679772647-ca49) under experiment (telco-5g-observabiltiy-1679772647-d8f7) already exists. Loading it. Note: sagemaker.experiments.load_run is recommended to use when the desired run already exists.


Experiment Name: telco-5g-observabiltiy-1679772647-d8f7

Run Name: xgboost-experiment-1679772647-ca49



| actual | predictions: 0 | predictions: 1 |
|---|---|---|
| 0 | 13585 | 774 |
| 1 | 879 | 1676 |
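The confusion matrix can be summarized with the metrics mentioned in the monitoring section (precision, recall, F1, and accuracy). The helper below is a plain-Python sketch; the counts are copied from the XGBoost table above, treating `anomaly = 1` as the positive class.

```python
def classification_metrics(tn, fp, fn, tp):
    """Standard binary metrics from confusion-matrix counts (positive class = anomaly)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Rows are actual labels, columns are predictions (cutoff = 0.4)
metrics = classification_metrics(tn=13585, fp=774, fn=879, tp=1676)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, precision is about 0.684, recall about 0.656, F1 about 0.670, and accuracy about 0.902. Because only about 15% of the test rows are anomalies (2,555 of 16,914), accuracy alone overstates how well anomalies are caught; lowering the `cutoff` passed to `calibrate` generally trades precision for recall.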


### Train Isolation Forest Model
----
Now let's experiment with an unsupervised approach. This time you use the full dataset, without labels, to build an isolation forest model, which isolates anomalies based on how easily they can be separated from the rest of the data. Learn more about the [isolation forest algorithm](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html#sklearn.ensemble.IsolationForest).
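The intuition behind isolation forest is that anomalies need fewer random splits to isolate. In the original formulation, a point with average path length E[h(x)] across trees built on n samples gets the anomaly score s = 2^(-E[h(x)] / c(n)), where c(n) = 2H(n-1) - 2(n-1)/n is the average path length of an unsuccessful binary-search-tree lookup. The sketch below computes that score; it follows the formula, not the internals of `isolation_forest_script.py`, and uses n = 512 because the training cell sets `max_samples` to 512.

```python
import math

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """c(n): expected path length of an unsuccessful search in a BST built on n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length, n):
    """s = 2 ** (-E[h(x)] / c(n)); near 1 means anomalous, well below 0.5 means normal."""
    return 2.0 ** (-mean_path_length / average_path_length(n))


c = average_path_length(512)
print(round(c, 2))              # average path length for 512-sample trees
print(anomaly_score(2.0, 512))  # isolated after ~2 splits: score well above 0.5
print(anomaly_score(c, 512))    # path length equal to c(n): score exactly 0.5
```

scikit-learn's `IsolationForest` reports the negated version of this score from `score_samples`; with its default `contamination="auto"`, points scoring above 0.5 are predicted as outliers (`-1`). Whether the training script keeps that default is not shown here.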

Upload the feature-only dataset (the feature columns without the `anomaly` label) to S3.

In [44]:
iso_input = dataset.drop(["location_id", "anomaly", "eventtime", 
                          "write_time","api_invocation_time",'is_deleted'], axis=1)
iso_input.to_csv("data/iso_input.csv", index=False)
key = f"{prefix}/data/isoforest/iso_input.csv"

s3_client.upload_file(
    Filename="data/iso_input.csv",
    Bucket=bucket,
    Key=key,
)

input_s3_path = f"s3://{bucket}/{key}"
print(f"training data is uploaded to {input_s3_path}")

training data is uploaded to s3://sagemaker-us-west-2-376678947624/telco-5g-observabiltiy/data/isoforest/iso_input.csv


In [45]:
from sagemaker.sklearn.estimator import SKLearn

run_name = unique_name_from_base("isoforest-experiment")

with Run(experiment_name=experiment_name, run_name=run_name, 
         sagemaker_session=sagemaker_session) as run:
    
    run.log_file("data/iso_input.csv", is_output=False)
    FRAMEWORK_VERSION = "1.0-1"

    sklearn = SKLearn(
        entry_point="isolation_forest_script.py",
        source_dir="code",
        framework_version=FRAMEWORK_VERSION,
        instance_count=train_instance_count,
        instance_type=train_instance_type,
        role=sagemaker_role,
        sagemaker_session=sagemaker_session,
        hyperparameters={"max_samples": 512,
                        "random_state": 42,
                        "region":region},
    )
    sklearn.fit({"train": input_s3_path})

INFO:sagemaker:Creating training-job with name: sagemaker-scikit-learn-2023-03-25-19-39-53-333


2023-03-25 19:39:54 Starting - Starting the training job...
2023-03-25 19:40:09 Starting - Preparing the instances for training...
2023-03-25 19:40:48 Downloading - Downloading input data...
2023-03-25 19:41:33 Training - Training image download completed. Training in progress...
2023-03-25 19:41:41,100 sagemaker-containers INFO     Imported framework sagemaker_sklearn_container.training
2023-03-25 19:41:41,103 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2023-03-25 19:41:41,111 sagemaker_sklearn_container.training INFO     Invoking user training script.
2023-03-25 19:41:41,292 sagemaker-training-toolkit INFO     Installing dependencies from requirements.txt:
/miniconda3/bin/python -m pip install -r requirements.txt
Collecting sagemaker
  Downloading sagemaker-2.141.0.tar.gz (685 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 685.6/685.6 kB 18.9 MB/s eta 0:00:00
  Preparing metadata (setup.py): star

#### Deploy IsoForest Model to an endpoint

In [46]:
from sagemaker.deserializers import JSONDeserializer

isoforest_predictor = sklearn.deploy(
    initial_instance_count=1, instance_type="ml.m5.xlarge", serializer=CSVSerializer(), deserializer = JSONDeserializer()
)

INFO:sagemaker:Creating model with name: sagemaker-scikit-learn-2023-03-25-19-42-47-663
INFO:sagemaker:Creating endpoint-config with name sagemaker-scikit-learn-2023-03-25-19-42-47-663
INFO:sagemaker:Creating endpoint with name sagemaker-scikit-learn-2023-03-25-19-42-47-663


----!

#### Test inference on endpoint
Log the confusion matrix results in the experiment for historical reference.

In [47]:
with load_run(experiment_name=experiment_name, run_name=run_name) as run:

    results = isoforest_predictor.predict(test.to_numpy()[:, 1:])
    
    # Map non-positive outputs (such as -1) to 0 and keep positive outputs unchanged
    predictions = []
    for x in results:
        if x <= 0:
            predictions.append(0)
        else:
            predictions.append(x)
            
    predictions = np.asarray(predictions)
    run.log_confusion_matrix(test["anomaly"], predictions, unique_name_from_base("IsoForest-Confusion-Matrix"))

print(f"Experiment Name: {experiment_name}\n")

print(f"Run Name: {run_name}\n")

pd.crosstab(
    index=test.iloc[:, 0],
    columns=predictions,
    rownames=["actual"],
    colnames=["predictions"],
)

INFO:sagemaker.experiments.run:The run (isoforest-experiment-1679773192-70a6) under experiment (telco-5g-observabiltiy-1679772647-d8f7) already exists. Loading it. Note: sagemaker.experiments.load_run is recommended to use when the desired run already exists.


Experiment Name: telco-5g-observabiltiy-1679772647-d8f7

Run Name: isoforest-experiment-1679773192-70a6



| actual | predictions: 0 | predictions: 1 |
|---|---|---|
| 0 | 1541 | 12818 |
| 1 | 659 | 1896 |
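scikit-learn's `IsolationForest.predict` returns `-1` for outliers and `1` for inliers, whereas the `anomaly` column uses `1` for an anomaly. The cell above maps `-1` to `0`, so if `isolation_forest_script.py` returns the raw `predict` output, detected outliers are counted as normal records; the large actual-0/predicted-1 count in the table is consistent with that reading, but the script is not shown here. Under that assumption, a mapping aligned with the label convention could look like the hypothetical `to_anomaly_label` helper below.

```python
def to_anomaly_label(prediction):
    """Map IsolationForest.predict output (-1 = outlier, 1 = inlier) to the anomaly label."""
    if prediction == -1:
        return 1  # outlier -> anomaly
    if prediction == 1:
        return 0  # inlier -> normal
    raise ValueError(f"unexpected IsolationForest prediction: {prediction!r}")


raw = [1, -1, 1, 1, -1]
print([to_anomaly_label(p) for p in raw])  # [0, 1, 0, 0, 1]
```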


### Model Monitoring

<img src="statics/module_03_monitor01.png"  width="50%" height="50%">

Model monitoring is crucial to ensure that machine learning models continue to perform as expected after deployment. It involves tracking metrics such as accuracy, precision, recall, and F1 score, and watching the incoming data, to detect and diagnose performance degradation, data drift, and other issues that may arise over time.

Amazon SageMaker Model Monitor lets you automatically monitor deployed models against predefined rules, such as baseline constraints, and can alert you (for example, through Amazon CloudWatch metrics and alarms) when a model's inputs or performance deviate from the expected behavior. In this example, you manually set up data quality (data drift) monitoring for the XGBoost endpoint.
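Data drift means that the distribution of live inputs moves away from the training baseline. This notebook does not show which distance Model Monitor computes internally, so as a swapped-in illustration, the sketch below uses the two-sample Kolmogorov-Smirnov statistic (the largest gap between two empirical CDFs) and flags drift above a hypothetical threshold of 0.1. The sample values are made up.

```python
from bisect import bisect_right


def empirical_cdf(sorted_values, x):
    """Fraction of values that are <= x."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def ks_distance(baseline, current):
    """Largest absolute gap between the two empirical CDFs (two-sample KS statistic)."""
    b, c = sorted(baseline), sorted(current)
    return max(abs(empirical_cdf(b, x) - empirical_cdf(c, x)) for x in set(b) | set(c))


def has_drifted(baseline, current, threshold=0.1):
    """Flag drift when the KS distance exceeds the (hypothetical) threshold."""
    return ks_distance(baseline, current) > threshold


# Made-up utilization samples: live traffic shifted toward higher utilization
baseline_util = [0.14, 0.15, 0.25, 0.26, 0.28, 0.30, 0.36, 0.39]
live_util = [0.35, 0.40, 0.42, 0.45, 0.50, 0.52, 0.55, 0.60]
print(ks_distance(baseline_util, baseline_util))  # 0.0
print(has_drifted(baseline_util, live_util))      # True
```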

#### 1. Create a baselining job with training dataset
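Now that the training data is in Amazon S3, start a job to suggest a baseline. `DefaultModelMonitor.suggest_baseline(...)` starts a processing job that uses a SageMaker-provided Model Monitor container to generate baseline statistics and suggested constraints. Note that `train.csv` still contains the `anomaly` label as its first column, while requests to the endpoint contain only the feature columns; in practice you may want to baseline on the feature columns only so that the baseline schema matches the captured traffic.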

In [48]:
from sagemaker.model_monitor import DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

# this is our training dataset
baseline_data_uri = train_s3_path
baseline_results_prefix = f"{prefix}/monitoring/baselining/results"
baseline_results_uri = f"s3://{bucket}/{baseline_results_prefix}"


my_default_monitor = DefaultModelMonitor(
    role=sagemaker_role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    volume_size_in_gb=20,
    max_runtime_in_seconds=3600,
)

my_default_monitor.suggest_baseline(
    baseline_dataset=baseline_data_uri,
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri=baseline_results_uri,
    wait=True,
)

INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: .
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.
INFO:sagemaker:Creating processing-job with name baseline-suggestion-job-2023-03-25-19-46-50-316


..........................
2023-03-25 19:51:04,337 - matplotlib.font_manager - INFO - Generating new fontManager, this may take some time...
2023-03-25 19:51:04.861666: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-03-25 19:51:04.861694: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-03-25 19:51:06.396617: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2023-03-25 19:51:06.396648: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2023-03-25 19:51:06.396671: I tensorflow/stream_executor/cuda/cuda_diagnostic

<sagemaker.processing.ProcessingJob at 0x7f484c91a750>

The baselining job generates two files, `constraints.json` and `statistics.json`; the cells below preview them. Keep in mind that you can also supply your own.

In [49]:
result = s3_client.list_objects(Bucket=bucket, Prefix=baseline_results_prefix)
report_files = [report_file.get("Key") for report_file in result.get("Contents")]
print("Found Files:")
print("\n".join(report_files))

Found Files:
telco-5g-observabiltiy/monitoring/baselining/results/constraints.json
telco-5g-observabiltiy/monitoring/baselining/results/statistics.json


**Statistics** describe the expected statistical properties of the input data, such as the mean and standard deviation of each feature.

In [50]:
import pandas as pd

baseline_job = my_default_monitor.latest_baselining_job
schema_df = pd.json_normalize(baseline_job.baseline_statistics().body_dict["features"])
schema_df.head(10)



| | name | inferred_type | numerical_statistics.common.num_present | numerical_statistics.common.num_missing | numerical_statistics.mean | numerical_statistics.sum | numerical_statistics.std_dev | numerical_statistics.min | numerical_statistics.max | numerical_statistics.distribution.kll.buckets | numerical_statistics.distribution.kll.sketch.parameters.c | numerical_statistics.distribution.kll.sketch.parameters.k | numerical_statistics.distribution.kll.sketch.data |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | anomaly | Integral | 67655 | 0 | 0.146301 | 9898.0 | 0.353408 | 0.0 | 1.0 | [{'lower_bound': 0.0, 'upper_bound': 0.1, 'cou... | 0.64 | 2048.0 | [[0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,... |
| 1 | health | Fractional | 67655 | 0 | 0.966063 | 65358.96 | 0.110403 | 0.5 | 1.0 | [{'lower_bound': 0.5, 'upper_bound': 0.55, 'co... | 0.64 | 2048.0 | [[1.0, 0.93, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0... |
| 2 | accessibility | Fractional | 67655 | 0 | 0.872957 | 59059.9 | 0.31904 | 0.0 | 1.0 | [{'lower_bound': 0.0, 'upper_bound': 0.1, 'cou... | 0.64 | 2048.0 | [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,... |
| 3 | 5g_users | Fractional | 67655 | 0 | 0.034165 | 2311.429715 | 0.055447 | 0.0 | 1.0 | [{'lower_bound': 0.0, 'upper_bound': 0.1, 'cou... | 0.64 | 2048.0 | [[0.0028818443804034, 0.0089657380723663, 0.00... |
| 4 | contention_rate | Fractional | 67655 | 0 | 0.003183 | 215.332653 | 0.012855 | 0.0 | 1.0 | [{'lower_bound': 0.0, 'upper_bound': 0.1, 'cou... | 0.64 | 2048.0 | [[0.00022675736961451248, 0.0, 0.0, 0.00022675... |
| 5 | utilization | Fractional | 67655 | 0 | 0.251875 | 17040.611878 | 0.109768 | 0.0 | 1.0 | [{'lower_bound': 0.0, 'upper_bound': 0.1, 'cou... | 0.64 | 2048.0 | [[0.1408839779005525, 0.281767955801105, 0.151... |
| 6 | downlink_throughput | Fractional | 67655 | 0 | 0.020685 | 1399.434372 | 0.045512 | 0.0 | 1.0 | [{'lower_bound': 0.0, 'upper_bound': 0.1, 'cou... | 0.64 | 2048.0 | [[0.0001264799562459171, 0.0017395406966993, 0... |
| 7 | uplink_throughput | Fractional | 67655 | 0 | 0.027495 | 1860.168126 | 0.04546 | 0.0 | 1.0 | [{'lower_bound': 0.0, 'upper_bound': 0.1, 'cou... | 0.64 | 2048.0 | [[0.0009849585496016764, 0.0100781843310955, 0... |
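Each row of the statistics table summarizes one column of the baseline dataset. A simplified stdlib version of the numeric fields (ignoring the KLL distribution sketch) might look like the hypothetical `numeric_stats` helper below; whether Model Monitor reports the population or sample standard deviation is not shown here, so using the population version is an assumption.

```python
import math
import statistics


def numeric_stats(values):
    """num_present, num_missing, mean, sum, std_dev, min, max for one column (None = missing)."""
    present = [v for v in values if v is not None and not math.isnan(v)]
    return {
        "num_present": len(present),
        "num_missing": len(values) - len(present),
        "mean": statistics.fmean(present),
        "sum": math.fsum(present),
        "std_dev": statistics.pstdev(present),  # population std dev (assumption)
        "min": min(present),
        "max": max(present),
    }


stats = numeric_stats([1.0, None, 0.5, 1.0, 0.96])
print(stats)
```

The table above is internally consistent in this way: for `anomaly`, sum / num_present = 9898 / 67655 ≈ 0.1463, which is the reported mean and means that roughly 14.6% of the training rows are labeled anomalous.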


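**Constraints** are rules that incoming data is checked against, such as each feature's inferred type, its completeness, and whether it must be non-negative. Violations point to data quality issues that can degrade model performance.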

In [51]:
constraints_df = pd.json_normalize(
    baseline_job.suggested_constraints().body_dict["features"]
)
constraints_df.head(10)

  


| | name | inferred_type | completeness | num_constraints.is_non_negative |
|---|---|---|---|---|
| 0 | anomaly | Integral | 1.0 | True |
| 1 | health | Fractional | 1.0 | True |
| 2 | accessibility | Fractional | 1.0 | True |
| 3 | 5g_users | Fractional | 1.0 | True |
| 4 | contention_rate | Fractional | 1.0 | True |
| 5 | utilization | Fractional | 1.0 | True |
| 6 | downlink_throughput | Fractional | 1.0 | True |
| 7 | uplink_throughput | Fractional | 1.0 | True |
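The suggested constraints above record, per feature, the inferred type, the expected completeness, and whether values must be non-negative. The sketch below shows, in simplified form, how such constraints could be derived from baseline values and then checked against a new batch of captured values. It is an illustration under those assumptions, not Model Monitor's implementation, and `suggest` and `violations` are hypothetical names.

```python
def suggest(baseline_values):
    """Derive completeness and non-negativity constraints from baseline values (None = missing)."""
    present = [v for v in baseline_values if v is not None]
    return {
        "completeness": len(present) / len(baseline_values),
        "is_non_negative": min(present) >= 0,
    }


def violations(constraint, values):
    """Return human-readable constraint violations for a new batch of values."""
    problems = []
    present = [v for v in values if v is not None]
    completeness = len(present) / len(values)
    if completeness < constraint["completeness"]:
        problems.append(f"completeness {completeness:.2f} < {constraint['completeness']:.2f}")
    if constraint["is_non_negative"] and any(v < 0 for v in present):
        problems.append("negative values found")
    return problems


utilization_constraint = suggest([0.26, 0.15, 0.36, 0.28])
print(utilization_constraint)  # {'completeness': 1.0, 'is_non_negative': True}
print(violations(utilization_constraint, [0.30, None, -0.02, 0.25]))
```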


#### 2. Create a schedule to analyze collected data for data quality issues

In [52]:
from sagemaker.model_monitor import CronExpressionGenerator

mon_schedule_name = unique_name_from_base(f"{prefix}-monitoring-job")

s3_report_path = f"s3://{bucket}/{prefix}/monitoring/report"

my_default_monitor.create_monitoring_schedule(
    monitor_schedule_name=mon_schedule_name,
    endpoint_input=predictor.endpoint_name,  # name of the XGBoost endpoint
    output_s3_uri=s3_report_path,
    statistics=my_default_monitor.baseline_statistics(),
    constraints=my_default_monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),  # run every hour
    enable_cloudwatch_metrics=True,
)

INFO:sagemaker.model_monitor.model_monitoring:Creating Monitoring Schedule with name: telco-5g-observabiltiy-monitoring-job-1679773970-94ce


In [53]:
desc_schedule_result = my_default_monitor.describe_schedule()
print("Schedule status: {}".format(desc_schedule_result["MonitoringScheduleStatus"]))

Schedule status: Pending


#### 3. Generate some artificial traffic

In [54]:
for count in range(1, 101):
    # Each call sends the whole test set to the endpoint in mini-batches
    predict(test.to_numpy()[:, 1:])
    if count % 10 == 0:
        print(f"predicting artificial traffic batch {count} ...")

predicting artificial traffic batch 10 ...
predicting artificial traffic batch 20 ...
predicting artificial traffic batch 30 ...
predicting artificial traffic batch 40 ...
predicting artificial traffic batch 50 ...
predicting artificial traffic batch 60 ...
predicting artificial traffic batch 70 ...
predicting artificial traffic batch 80 ...
predicting artificial traffic batch 90 ...
predicting artificial traffic batch 100 ...


Depending on the cron schedule you defined, you may need to wait for the monitoring job to execute. Once it does, the execution will be listed below.

In [55]:
mon_executions = my_default_monitor.list_executions()
mon_executions

No executions found for schedule. monitoring_schedule_name: telco-5g-observabiltiy-monitoring-job-1679773970-94ce


[]
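`CronExpressionGenerator.hourly()` schedules the monitor at the top of every hour, so the first execution cannot start before the next full hour after the schedule is created, and SageMaker may start it some minutes later. The schedule name's timestamp, 1679773970, corresponds to 19:52:50 UTC, so the empty list above is expected. The hypothetical helper below computes the next top-of-the-hour time for a given moment.

```python
from datetime import datetime, timedelta, timezone


def next_hourly_run(now):
    """Next top-of-the-hour time strictly after `now` (for an hourly cron schedule)."""
    return now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)


created = datetime(2023, 3, 25, 19, 52, 50, tzinfo=timezone.utc)  # schedule creation time
print(next_hourly_run(created).isoformat())  # 2023-03-25T20:00:00+00:00
```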

### Clean up

Delete feature group

In [56]:
feature_group_name = fg_name
sagemaker_client.delete_feature_group(
    FeatureGroupName= feature_group_name
)

{'ResponseMetadata': {'RequestId': '7abc626a-9c77-4875-9798-1c263d759681',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '7abc626a-9c77-4875-9798-1c263d759681',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '0',
   'date': 'Sat, 25 Mar 2023 19:54:04 GMT'},
  'RetryAttempts': 0}}

Remove experiments

In [57]:
import time
from botocore.exceptions import ClientError

def remove_experiment(experiment_name):
    trials = sagemaker_client.list_trials(ExperimentName=experiment_name)['TrialSummaries']
    print('TrialNames:')
    for trial in trials:
        trial_name = trial['TrialName']
        print(f"\n{trial_name}")

        components_in_trial = sagemaker_client.list_trial_components(TrialName=trial_name)
        print('\tTrialComponentNames:')
        for component in components_in_trial['TrialComponentSummaries']:
            component_name = component['TrialComponentName']
            print(f"\t{component_name}")
            sagemaker_client.disassociate_trial_component(TrialComponentName=component_name, TrialName=trial_name)
            try:
                # Comment out to keep trial components
                sagemaker_client.delete_trial_component(TrialComponentName=component_name)
            except ClientError:
                # The component is still associated with another trial
                continue
            # Pause briefly to avoid API throttling
            time.sleep(.5)
        sagemaker_client.delete_trial(TrialName=trial_name)
    sagemaker_client.delete_experiment(ExperimentName=experiment_name)
    print(f"\nExperiment {experiment_name} deleted")

remove_experiment(experiment_name)

TrialNames:

Default-Run-Group-telco-5g-observabiltiy-1679772647-d8f7
	TrialComponentNames:
	sagemaker-scikit-learn-2023-03-25-19-39-53-333-aws-training-job
	telco-5g-observabiltiy-1679772647-d8f7-isoforest-experiment-1679773192-70a6
	sagemaker-xgboost-2023-03-25-19-30-48-305-aws-training-job
	telco-5g-observabiltiy-1679772647-d8f7-xgboost-experiment-1679772647-ca49

Experiment telco-5g-observabiltiy-1679772647-d8f7 deleted


Remove Model Monitor

In [58]:
my_default_monitor.delete_monitoring_schedule()


Deleting Monitoring Schedule with name: telco-5g-observabiltiy-monitoring-job-1679773970-94ce


INFO:sagemaker.model_monitor.model_monitoring:Deleting Data Quality Job Definition with name: data-quality-job-definition-2023-03-25-19-52-50-993


Remove endpoints

In [59]:
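def remove_endpoint(endpoint_name):
    # Delete any monitoring schedules still attached to the endpoint
    monitor_schedules = sagemaker_client.list_monitoring_schedules(EndpointName=endpoint_name)['MonitoringScheduleSummaries']
    print('Monitoring Schedule:')
    for ms in monitor_schedules:
        ms_name = ms['MonitoringScheduleName']
        print(f"\n{ms_name}")
        sagemaker_client.delete_monitoring_schedule(MonitoringScheduleName=ms_name)

    # Look up the endpoint config first so it can be deleted together with the endpoint
    config_name = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)['EndpointConfigName']
    sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
    sagemaker_client.delete_endpoint_config(EndpointConfigName=config_name)
    print(f"Endpoint {endpoint_name} deleted")

# XGBoost
remove_endpoint(predictor.endpoint_name)
# Isolation forest
remove_endpoint(isoforest_predictor.endpoint_name)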

Monitoring Schedule:
Endpoint sagemaker-xgboost-2023-03-25-19-33-45-597 deleted
Monitoring Schedule:
Endpoint sagemaker-scikit-learn-2023-03-25-19-42-47-663 deleted
