## Initial setup

In [1]:
import pandas as pd
import numpy as np

import google.cloud.aiplatform as aiplatform
from google.cloud.aiplatform import model_monitoring

In [2]:
PROJECT_ID = "ds-training-380514"
REGION = "us-central1"
BUCKET_NAME = "ds-training-380514"
BUCKET_URI = f"gs://{BUCKET_NAME}"

In [3]:
aiplatform.init(project=PROJECT_ID,
                location=REGION)

Copy and paste the endpoint resource name from the AutoML notebook. It looks like `projects/123/locations/us-central1/endpoints/456`.

In [4]:
endpoint = aiplatform.Endpoint('projects/354621994428/locations/us-central1/endpoints/1823759936992051200')

In [5]:
print(endpoint)

<google.cloud.aiplatform.models.Endpoint object at 0x7f7d45bd02d0> 
resource name: projects/354621994428/locations/us-central1/endpoints/1823759936992051200
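
A pasted resource name is easy to truncate or mistype, so it can help to check its shape before using it. The sketch below is a hypothetical helper (`parse_endpoint_name` is not part of the Vertex AI SDK) that splits the fully qualified form shown above into its parts and rejects anything else.

```python
import re

# Fully qualified form: "projects/<project>/locations/<region>/endpoints/<id>".
_ENDPOINT_PATTERN = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/endpoints/(?P<endpoint_id>[^/]+)$"
)


def parse_endpoint_name(resource_name: str) -> dict:
    """Split a fully qualified endpoint name into its parts (hypothetical helper)."""
    match = _ENDPOINT_PATTERN.match(resource_name.strip())
    if match is None:
        raise ValueError(f"Not a fully qualified endpoint name: {resource_name!r}")
    return match.groupdict()


# Placeholder name taken from the example text above, not a real endpoint.
parts = parse_endpoint_name("projects/123/locations/us-central1/endpoints/456")
print(parts["location"], parts["endpoint_id"])
```

Checking that `parts["location"]` matches `REGION` catches the common mistake of pasting an endpoint from a different region.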


## Configure alerting specification
First, you configure the `alerting_config` specification with the following settings:

- `user_emails`: A list of one or more email addresses to send alerts to.
- `enable_logging`: Streams detected anomalies to Cloud Logging. Default is False.

In [6]:
USER_EMAIL = "abc@domain.com"

alerting_config = model_monitoring.EmailAlertConfig(user_emails=[USER_EMAIL],
                                                    enable_logging=True)

### Configure the monitoring interval specification

Next, you configure the `schedule_config` specification with the following settings:

- `monitor_interval`:  Sets the model monitoring job scheduling interval in hours. Minimum time interval is 1 hour.

In [7]:
# Monitoring Interval
MONITOR_INTERVAL = 1  # in hours; the minimum is 1

# Create schedule configuration
schedule_config = model_monitoring.ScheduleConfig(monitor_interval=MONITOR_INTERVAL)

### Configure the sampling specification

Next, you configure the `logging_sampling_strategy` specification with the following settings:

- `sample_rate`: The fraction of prediction requests (between 0 and 1) to randomly sample for monitoring. Selected samples are logged to a BigQuery table.

In [8]:
SAMPLE_RATE = 0.5  # default value is 0.8 i.e. 80%

# Create sampling configuration
logging_sampling_strategy = model_monitoring.RandomSampleConfig(sample_rate=SAMPLE_RATE)
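
To get a feel for what a sample rate of 0.5 means in practice, the sketch below simulates it locally. It assumes each request is kept independently with probability `sample_rate`; the notebook does not describe the service's actual sampling mechanism, and `count_sampled` is a hypothetical helper.

```python
import random


def count_sampled(num_requests: int, sample_rate: float, seed: int = 0) -> int:
    """Count how many requests an independent per-request sampler would keep."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed so the illustration is repeatable
    return sum(rng.random() < sample_rate for _ in range(num_requests))


for n in (10, 1000):
    kept = count_sampled(n, sample_rate=0.5)
    print(f"{n} requests -> {kept} logged (expected about {n * 0.5:.0f})")
```

With only a few requests, the logged count can be far from the expected value, which matters when reading the row counts later in this notebook.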

### Configure the drift detection specification

Next, you configure the `drift_config` specification with the following settings:

- `drift_thresholds`: A dictionary of key/value pairs where the keys are the input features to monitor for drift. The value is the detection threshold. When not specified, the default drift threshold for a feature is 0.3 (30%).

*Note:* Enabling drift detection is optional.

In [9]:
DRIFT_THRESHOLD_VALUE = 0.05

# Set column-wise threshold values
DRIFT_THRESHOLDS = {"ABBA": DRIFT_THRESHOLD_VALUE,
                    }

drift_config = model_monitoring.DriftDetectionConfig(drift_thresholds=DRIFT_THRESHOLDS)
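
The cell above monitors a single feature. When several features need thresholds, building the dictionary programmatically avoids repetition. This is a minimal sketch: `build_thresholds` is a hypothetical helper, and the feature names are columns from this dataset chosen only for illustration.

```python
from typing import Dict, Iterable, Optional


def build_thresholds(
    features: Iterable[str],
    default: float,
    overrides: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Give every feature the default threshold, then apply per-feature overrides."""
    thresholds = {name: default for name in features}
    thresholds.update(overrides or {})
    return thresholds


example_thresholds = build_thresholds(
    ["ABBA", "ACDC", "Adele"],
    default=0.05,
    overrides={"Adele": 0.1},  # looser threshold for one feature
)
print(example_thresholds)
```

The resulting dictionary has the same shape as `DRIFT_THRESHOLDS` and `SKEW_THRESHOLDS`, so it could be passed to either configuration.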

### Configure the skew detection specification

Next, you configure the `skew_config` specification with the following settings:

- `data_source`: The source of the original training dataset. The format defaults to a BigQuery table. For any other format, the data must be in a Cloud Storage location and `data_format` must be set to one of the following values:
  - `csv`: comma-separated values
  - `jsonl`: JSON Lines
  - `tf-record`: TFRecord
- `skew_thresholds`: A dictionary of key/value pairs where the keys are the input features to monitor for skew. The value is the detection threshold. When not specified, the default skew threshold for a feature is 0.3 (30%).
- `target_field`: The target label for the training dataset.

*Note:* Enabling skew detection is optional.

In [10]:
TRAIN_DATA_GCS_URI = "gs://aaa-aca-ml-workshop/beatles/file_out_2485_tags.csv"  # source of training csv file
TARGET = "Like_The_Beatles"  # label column

SKEW_THRESHOLD_VALUE = 0.05

SKEW_THRESHOLDS = {"ABBA": SKEW_THRESHOLD_VALUE,
                   }

skew_config = model_monitoring.SkewDetectionConfig(data_source=TRAIN_DATA_GCS_URI,
                                                   skew_thresholds=SKEW_THRESHOLDS,
                                                   target_field=TARGET,
                                                   data_format="csv")

### Assemble the objective specification

Finally, you assemble the objective specification `objective_config` with the following settings:

- `skew_detection_config`: (Optional) The specification for the skew detection configuration.
- `drift_detection_config`: (Optional) The specification for the drift detection configuration.


In [11]:
objective_config = model_monitoring.ObjectiveConfig(
                                                    skew_detection_config=skew_config,
                                                    drift_detection_config=drift_config,
                                                   )

## Monitoring

### Create the input schema

The monitoring service needs to know the features and data types of the feature inputs to the model, which together are referred to as the `input schema`.

For `AutoML` models, the `input schema` is predefined and automatically loaded by the monitoring service.

### Create the monitoring job

You create a monitoring job, with your monitoring specifications, using the `aiplatform.ModelDeploymentMonitoringJob.create()` method, with the following parameters:

- `display_name`: The human readable name for the monitoring job.
- `project`: The project ID.
- `location`: The region, for example `us-central1`.
- `endpoint`: The fully qualified resource name of the `Vertex AI Endpoint` to enable monitoring.
- `logging_sampling_strategy`: The specification for the sampling configuration.
- `schedule_config`: The specification for the scheduling configuration.
- `alert_config`: The specification for the alerting configuration.
- `objective_configs`: The specification for the objectives configuration.

In [12]:
monitoring_job = aiplatform.ModelDeploymentMonitoringJob.create(
                                                                display_name="beatles_monitoring",  # for GCP console
                                                                project=PROJECT_ID,
                                                                location=REGION,
                                                                endpoint=endpoint,
                                                                logging_sampling_strategy=logging_sampling_strategy,
                                                                schedule_config=schedule_config,
                                                                alert_config=alerting_config,
                                                                objective_configs=objective_config,
                                                               )

print(monitoring_job)

Creating ModelDeploymentMonitoringJob
ModelDeploymentMonitoringJob created. Resource name: projects/354621994428/locations/us-central1/modelDeploymentMonitoringJobs/1365176039596097536
To use this ModelDeploymentMonitoringJob in another session:
mdm_job = aiplatform.ModelDeploymentMonitoringJob('projects/354621994428/locations/us-central1/modelDeploymentMonitoringJobs/1365176039596097536')
View Model Deployment Monitoring Job:
https://console.cloud.google.com/ai/platform/locations/us-central1/model-deployment-monitoring/1365176039596097536?project=354621994428
<google.cloud.aiplatform.jobs.ModelDeploymentMonitoringJob object at 0x7f7d44326790> 
resource name: projects/354621994428/locations/us-central1/modelDeploymentMonitoringJobs/1365176039596097536


Check current status

#### Email notification of the monitoring job.

An email notification is sent to the email address in the alerting configuration, notifying that the model monitoring job is now enabled.

The contents will appear like:

<blockquote>
Hello Vertex AI Customer,

You are receiving this mail because you are using the Vertex AI Model Monitoring service.
This mail is to inform you that we received your request to set up drift or skew detection for the Prediction Endpoint listed below. Starting from now, incoming prediction requests will be sampled and logged for analysis.
Raw requests and responses will be collected from prediction service and saved in bq://[your-project-id].model_deployment_monitoring_[endpoint-id].serving_predict .
</blockquote>

*Note:* You do not need to wait for the email notification to continue to the next step.

#### Monitoring Job State

After you start the `Vertex AI Model Monitoring` job, it stays in a `PENDING` state until the skew distribution baseline is calculated. The monitoring service initiates a batch job to generate the distribution baseline from the training data.

Once the baseline distribution is generated, the monitoring job enters the `OFFLINE` state. At each monitoring interval (for example, once an hour), the job enters the `RUNNING` state while it analyzes the sampled data. When the analysis completes, the job returns to the `OFFLINE` state and waits for the next scheduled analysis.

In [13]:
jobs = monitoring_job.list(filter="display_name=beatles_monitoring")  # same as in previous cell
job = jobs[0]
print(job.state)

JobState.JOB_STATE_PENDING


Wait for a few minutes and check again

In [14]:
print(job.state)

JobState.JOB_STATE_PENDING
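
Rather than re-running the cell by hand, the check can be wrapped in a polling loop with a timeout. The sketch below is a hypothetical helper, not part of the SDK. It takes the state lookup as a function argument so that it runs here with a fake state source; the state names are the ones printed in this notebook.

```python
import time
from typing import Callable, Iterable


def wait_until_not_pending(
    get_state: Callable[[], str],
    pending_states: Iterable[str] = ("JOB_STATE_PENDING",),
    poll_seconds: float = 60.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until it leaves the pending states or the timeout elapses."""
    pending = set(pending_states)
    waited = 0.0
    while True:
        state = get_state()
        if state not in pending:
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(f"still {state} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Fake state source so the sketch runs without a Vertex AI job.
fake_states = iter(["JOB_STATE_PENDING", "JOB_STATE_PENDING", "JOB_STATE_RUNNING"])
final_state = wait_until_not_pending(lambda: next(fake_states), sleep=lambda _: None)
print(final_state)
```

Against the real job, the lookup might be `lambda: job.state.name`, assuming `job.state` is an enum as the printed `JobState.JOB_STATE_PENDING` suggests.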


Generate synthetic data for prediction requests

### Make the prediction requests

In [15]:
import pandas as pd

inference_sample = pd.read_feather('test_data/inference_sample.feather')

In [16]:
import json

In [17]:
inference_sample

Unnamed: 0,user_name,30_Seconds_to_Mars,65daysofstatic,A_Perfect_Circle,A_Tribe_Called_Quest,ABBA,ACDC,Adele,Aerosmith,Air,...,tag_shoegazer,tag_hair_metal,tag_rapcore,tag_underground_hip_hop,tag_symphonic_black_metal,tag_darkwave,tag_world,tag_latin,tag_spanish,Like_The_Beatles
0,thegiant,1.0,,,,,,11.0,1.0,,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,True
1,nezter,,,,,,,,,3.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,False
2,augustohp,,52.0,502.0,,1.0,452.0,1.0,215.0,14.0,...,0.0,2.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,True
3,stalphonzo,,,,,,6.0,,,,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,True
4,davenall,,,,,,,,,,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,False
5,Andy_Greenwell,,,,,,,,,,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,True
6,lilyean,,,,,,,,,,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,False
7,absentbebnim,,,,,,,,,,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,False
8,adherr,,,,,,,,,,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,False
9,auserzz,,,,,,,25.0,,,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,False


In [18]:
from typing import List, Dict

def predict_tabular_classification(
    project: str,
    location: str,
    endpoint_name: str,
    instances: List[Dict],
):
    """
    Args
        project: Your project ID or project number.
        location: Region where Endpoint is located. For example, 'us-central1'.
        endpoint_name: A fully qualified endpoint name or endpoint ID. Example: "projects/123/locations/us-central1/endpoints/456" or
               "456" when project and location are initialized or passed.
        instances: A list of one or more instances (examples) to return a prediction for.
    """
    aiplatform.init(project=project, location=location)

    endpoint = aiplatform.Endpoint(endpoint_name)

    response = endpoint.predict(instances=instances)

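    predictions = list(response.predictions)
    for prediction_ in predictions:
        print(prediction_)
    # Return the first prediction; callers in this notebook send one instance per request.
    return predictions[0] if predictions else None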

In [20]:
inference_results = []
for index, row in inference_sample.iterrows():
    instance = json.loads(row.astype(str).to_json())
    results = predict_tabular_classification(PROJECT_ID, REGION, 'projects/354621994428/locations/us-central1/endpoints/1823759936992051200', [instance])
    inference_results.append(results)


{'scores': [0.2971574068069458, 0.7028425931930542], 'classes': ['True', 'False']}
{'scores': [0.4739128947257996, 0.5260871648788452], 'classes': ['True', 'False']}
{'scores': [0.9814208745956421, 0.01857918873429298], 'classes': ['True', 'False']}
{'scores': [0.4892224967479706, 0.510777473449707], 'classes': ['True', 'False']}
{'scores': [0.02431050315499306, 0.9756895899772644], 'classes': ['True', 'False']}
{'scores': [0.03823720291256905, 0.9617628455162048], 'classes': ['True', 'False']}
{'scores': [0.03098485246300697, 0.9690151214599609], 'classes': ['True', 'False']}
{'classes': ['True', 'False'], 'scores': [0.02387252263724804, 0.9761275053024292]}
{'scores': [0.5519230365753174, 0.4480769634246826], 'classes': ['True', 'False']}
{'scores': [0.04236870259046555, 0.9576313495635986], 'classes': ['True', 'False']}
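
Each prediction pairs a list of `classes` with a list of `scores` in the same order. A small helper can reduce that to the most likely label. This is a sketch with a hypothetical function name, `top_class`, applied to two of the predictions printed above.

```python
from typing import Dict, Tuple


def top_class(prediction: Dict) -> Tuple[str, float]:
    """Return the (class, score) pair with the highest score."""
    pairs = zip(prediction["classes"], prediction["scores"])
    return max(pairs, key=lambda pair: pair[1])


# Two of the predictions printed above, copied verbatim.
sample_predictions = [
    {"scores": [0.2971574068069458, 0.7028425931930542], "classes": ["True", "False"]},
    {"scores": [0.9814208745956421, 0.01857918873429298], "classes": ["True", "False"]},
]

for prediction in sample_predictions:
    label, score = top_class(prediction)
    print(label, round(score, 3))
```

Pairing by position rather than assuming `'True'` always comes first keeps the helper correct if the class order ever differs.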


### Logging sampled requests

Once the monitoring service has started, the sampled prediction requests will be logged to Cloud Storage. On the next monitoring interval, the sampled predictions are then copied over to the BigQuery logging table. Once the entries are in the BigQuery table, the monitoring service will analyze the sampled data.

Next, you wait for the first logged entries to appear in the BigQuery table used for logging prediction samples. This notebook sent 10 prediction requests with 50% sampling, so expect only a handful of entries. Because sampling is random, the exact count varies between runs.

In [30]:
from google.cloud import bigquery
bqclient = bigquery.Client(project=PROJECT_ID)

In [31]:
import time
while True:
    time.sleep(180)

    ENDPOINT_ID = endpoint.resource_name.split("/")[-1]

    table = bigquery.TableReference.from_string(
        f"{PROJECT_ID}.model_deployment_monitoring_{ENDPOINT_ID}.serving_predict"
    )
    rows = bqclient.list_rows(table)
    print(rows.total_rows)
    if rows.total_rows > 0:
        break

8


In [34]:
table = bigquery.TableReference.from_string(
        f"{PROJECT_ID}.model_deployment_monitoring_{ENDPOINT_ID}.serving_predict"
    )
rows = bqclient.list_rows(table)
print(rows.total_rows)

8
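
The table name is built twice in the cells above, and the second cell relies on `ENDPOINT_ID` having been set inside the loop. Putting the naming rule in one function avoids both issues. This is a sketch with a hypothetical helper name, `serving_table_id`; the naming pattern is the one used in the cells above.

```python
def serving_table_id(project_id: str, endpoint_resource_name: str) -> str:
    """Build the BigQuery table ID used above from an endpoint resource name."""
    endpoint_id = endpoint_resource_name.rstrip("/").split("/")[-1]
    if not endpoint_id:
        raise ValueError("endpoint resource name is empty")
    return f"{project_id}.model_deployment_monitoring_{endpoint_id}.serving_predict"


# Placeholder project and endpoint, for illustration only.
print(serving_table_id("my-project", "projects/123/locations/us-central1/endpoints/456"))
```

In this notebook, the call would be `serving_table_id(PROJECT_ID, endpoint.resource_name)`, and the result could be passed to `bigquery.TableReference.from_string`.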


In [35]:
print(job.state)

JobState.JOB_STATE_RUNNING


### Skew detection during monitoring

The feature input skew detection will occur at the next monitoring interval. In this tutorial, you set the monitoring interval to one hour. So, in about an hour your monitoring job will go from `OFFLINE` to `RUNNING`. While running, it will analyze the logged sampled tables from the predictions during this interval and compare them to the baseline distribution.

Once the analysis is complete, the monitoring job sends email notifications for any feature whose skew exceeds its threshold (in this notebook, `ABBA` is the only feature with a configured threshold), and then returns to the `OFFLINE` state until the next interval.
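
The sample alert email in this section reports an "Linfty distance" between training and serving data. The sketch below shows the textbook L-infinity distance between two categorical distributions: the largest absolute difference in relative frequency across all values. It is an illustration on made-up data, not the service's implementation, and numerical features may be scored with a different measure.

```python
from collections import Counter
from typing import Dict, Iterable


def value_frequencies(values: Iterable[str]) -> Dict[str, float]:
    """Return the relative frequency of each distinct value."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot build a distribution from no values")
    return {value: count / total for value, count in counts.items()}


def linf_distance(baseline: Dict[str, float], serving: Dict[str, float]) -> float:
    """Largest absolute frequency difference over values seen in either distribution."""
    keys = set(baseline) | set(serving)
    return max(abs(baseline.get(k, 0.0) - serving.get(k, 0.0)) for k in keys)


# Made-up categorical data, for illustration only.
training = value_frequencies(["a"] * 50 + ["b"] * 50)
serving = value_frequencies(["a"] * 90 + ["b"] * 10)

distance = linf_distance(training, serving)
THRESHOLD = 0.05  # same value as SKEW_THRESHOLD_VALUE in this notebook
print(f"distance={distance:.2f}, alert={distance > THRESHOLD}")
```

A low threshold such as 0.05 flags small shifts, which helps when demonstrating alerts but could be noisy on a production endpoint.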

#### Wait for monitoring interval

It can take upwards of 40 minutes from when the analysis runs at the monitoring interval until you receive an email alert.

The contents will look like the sample below, which uses different endpoint, job, and feature names than this notebook:

<blockquote>
   Hello Vertex AI Customer,

You are receiving this mail because you are subscribing to the Vertex AI Model Monitoring service.
This mail is just to inform you that there are some anomalies detected in your deployed models and may need your attention.


Basic Information:

Endpoint Name: projects/[your-project-id]/locations/us-central1/endpoints/3315907167046860800
Monitoring Job: projects/[your-project-id]/locations/us-central1/modelDeploymentMonitoringJobs/8672170640054157312
Statistics and Anomalies Root Path(Google Cloud Storage): gs://cloud-ai-platform-773884b1-2a32-48d6-8b83-c03cde416b68/model_monitoring/job-8672170640054157312
BigQuery Command: SELECT * FROM `bq://[your-project-id].model_deployment_monitoring_3315907167046860800.serving_predict`


Training Prediction Skew Anomalies (Raw Feature):

Anomalies Report Path(Google Cloud Storage): gs://cloud-ai-platform-773884b1-2a32-48d6-8b83-c03cde416b68/model_monitoring/job-8672170640054157312/serving/2022-08-25T00:00/stats_and_anomalies/<deployed-model-id>/anomalies/training_prediction_skew_anomalies

For more information about the alert, please visit the model monitoring alert page.

Deployed model id: <deployed-model-id>

Feature name	Anomaly short description	Anomaly long description
country	High Linfty distance between training and serving	The Linfty distance between training and serving is 0.947563 (up to six significant digits), above the threshold 0.5. The feature value with maximum difference is: Year
</blockquote>

### Delete the monitoring job

You can delete the monitoring job using the `delete()` method. The cell below calls `pause()` first so that the job is no longer active when it is deleted.

In [34]:
monitoring_job.pause()
monitoring_job.delete()