# Sample Multiclass Model for Domino Model Monitoring

A multiclass classification model monitored with Domino Model Monitoring.

In [1]:
import mlflow
import os

### Load and split the Iris dataset for a simple xgboost model

In [2]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data["data"], data["target"], test_size=0.2
)

### Register the Training Dataset

This training set is the reference baseline for drift detection. It is ingested automatically when Model Monitoring is
configured in the Model API.

In [3]:
from domino_data.training_sets import client, model
import pandas as pd
import os

target_column_name = "variety"

training_df = pd.DataFrame(data = X_train, columns = data.feature_names)
training_df[target_column_name] = [data.target_names[y] for y in y_train]

tsv = client.create_training_set_version(
    training_set_name="iris_python_multi_classification_{}".format(os.environ.get('DOMINO_PROJECT_NAME')),
    df=training_df,
    key_columns=[],
    target_columns=[target_column_name],
    exclude_columns=[],
    meta={"experiment_id": "0.1"},
    monitoring_meta=model.MonitoringMeta(**{
        "categorical_columns": [target_column_name],
        "timestamp_columns": [],
        "ordinal_columns": []
    })
)

print(f"TrainingSetVersion {tsv.training_set_name}:{tsv.number}")

TrainingSetVersion iris_python_multi_classification_DMM-Quickstart:2


### Train the Iris Model

In [4]:
from xgboost import XGBClassifier
from domino_data_capture.data_capture_client import DataCaptureClient
import uuid
import datetime

xgb_classifier = XGBClassifier(
    n_estimators=10,
    max_depth=3,
    learning_rate=1,
    objective="multi:softprob",  # Iris has three classes, so use a multiclass objective
    random_state=123,
)

# train model
xgb_classifier.fit(X_train, y_train)

data_capture_client = DataCaptureClient(data.feature_names, [target_column_name])

class IrisModel(mlflow.pyfunc.PythonModel):
    def __init__(self,model):
        self.model = model
    
    def predict(self, context, model_input, params=None):
        event_time = datetime.datetime.now(datetime.timezone.utc).isoformat()
        prediction = self.model.predict(model_input)
        
        for i in range(len(prediction)):
            # Record eventID and current time
            event_id = uuid.uuid4()
            # Convert np types to python builtin type to allow JSON serialization by prediction capture library
            model_input_value = [float(x) for x in model_input[i]]
            prediction_value = [data.target_names[prediction[i]]]
            
            # Capture this prediction event so Domino can keep track
            data_capture_client.capturePrediction(model_input_value, prediction_value, event_id=event_id,
                                timestamp=event_time)
        return prediction

model = IrisModel(xgb_classifier)
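
The per-row conversion inside `predict` can be tried outside Domino. The sketch below is a minimal stand-in, assuming a hypothetical `build_capture_events` helper and plain Python lists in place of NumPy arrays. It does not call `DataCaptureClient`; it only shows the shape of what gets recorded for each scored row.

```python
import datetime
import json
import uuid

# Iris class names in label order (0, 1, 2)
TARGET_NAMES = ["setosa", "versicolor", "virginica"]


def build_capture_events(rows, predictions):
    """Hypothetical helper: build one JSON-serializable record per scored row."""
    # One shared timestamp per request, as in IrisModel.predict
    event_time = datetime.datetime.now(datetime.timezone.utc).isoformat()
    events = []
    for row, label in zip(rows, predictions):
        events.append({
            "event_id": str(uuid.uuid4()),         # unique join key for ground truth
            "features": [float(x) for x in row],    # plain floats serialize cleanly
            "prediction": [TARGET_NAMES[int(label)]],
            "timestamp": event_time,
        })
    return events


events = build_capture_events([[4.3, 3.0, 1.1, 0.1], [6.7, 3.3, 5.7, 2.5]], [0, 2])
print(json.dumps(events[0], indent=2))
```

Each record gets its own event ID, which is what later lets ground truth labels be joined back to individual predictions.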

### Register your Model in the Model Catalog

In [5]:
with mlflow.start_run() as run:
    model_info = mlflow.pyfunc.log_model(
        registered_model_name="DMM-Quickstart-Model", # A unique name
        python_model=model,
        artifact_path="test-model"
    )
print(model_info)

Successfully registered model 'DMM-Quickstart-Model'.
2024/01/25 21:01:05 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: DMM-Quickstart-Model, version 1
Created version '1' of model 'DMM-Quickstart-Model'.


<mlflow.models.model.ModelInfo object at 0x7fe4773947c0>


### Create Model API from the Model Card

Once your model has been registered:

1) Navigate to the model registry, open the Model Card for "DMM-Quickstart-Model" (or whatever you called your model), and create a new Model API. 

2) For Model API Source, select "Choose Model From Model Registry" and select "DMM-Quickstart-Model"

3) Once the Model API is green and says "Running", navigate to the "Configure Model Monitoring" tab in the Model API. On the right, click "Configure Monitoring", and follow the instructions. Select your training set created above as the model baseline for drift, and set the model type to Classification.

4) Score some data, using the sample Python code below. Be sure to update your URL and auth token to point to your Model API. A sample specific to your model is available in the Model API Overview tab. Domino Prediction Data Capture will capture these predictions in the back end.

![alt text](readme_images/API_Request_Python.png)

5) Wait for a bit. If you navigate to Domino Model Monitoring, the new model will appear. If you click into your new monitored model, under "Overview" in the "Ingest History" tab, the training data should be shown as ingested and "Done". However, under "Data Drift", your model will still say "No Prediction Data Added" for about an hour. The Model API Monitoring tab will say "Waiting for Prediction Data." The prediction data from step 4 has been captured, but you have to wait for the first automated ingest before that drift data appears in the Model Monitoring UI and you can move on to the next steps.

6) Once data drift ingestion has happened, a new Domino Dataset called "prediction_data" will appear in your Project Domino Datasets list, and the Model Monitoring Data Drift section will populate.

In [14]:
# Update this to call your URL and auth token. A snippet is available from the Model API Overview tab.
# This reference example uses User Environment Variables to hold the Model API URL and auth token.
# If you choose to use environment variables (recommended with git-based projects), you'll need to save and restart the workspace once they are created.

model_url = os.environ.get('MODEL_URL')
model_auth_token = os.environ.get('MODEL_AUTH_TOKEN')

import requests
 
response = requests.post(model_url, # Update
    auth=(
            model_auth_token, # Update
            model_auth_token # Update
    ),
    json={
        "data":  [  [4.3, 3. , 1.1, 0.1],
        [5.8, 4. , 1.2, 0.2],
        [5.7, 4.4, 1.5, 0.4],
        [6.7, 3.3, 5.7, 2.5],
        [5.8, 4. , 1.2, 0.2],
        [5.7, 4.4, 1.5, 0.4],
        [6.7, 3.3, 5.7, 2.5],
        [6.7, 3. , 5.2, 2.3],
        [5.8, 4. , 1.2, 0.2],
        [5.7, 4.4, 1.5, 0.4],
        [6.7, 3.3, 5.7, 2.5],
        [5.8, 4. , 1.2, 0.2],
        [5.7, 4.4, 1.5, 0.4],
        [6.7, 3.3, 5.7, 2.5],
        [5.8, 4. , 1.2, 0.2],
        [5.7, 4.4, 1.5, 0.4],
        [6.7, 3.3, 5.7, 2.5],]
    }
)
 
print(response.status_code)
print(response.headers)
print(response.json())

200
{'Date': 'Thu, 25 Jan 2024 21:26:07 GMT', 'Content-Type': 'application/json', 'Content-Length': '240', 'Connection': 'keep-alive', 'X-Request-ID': 'UDGZIEM1A6VJKXWT', 'Domino-Server': 'nginx-ingress,model-api,', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Methods': 'POST', 'Access-Control-Allow-Headers': 'authorization,content-type', 'Content-Security-Policy': "frame-ancestors 'self' mltraining.domino-eval.com; ", 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains', 'X-Frame-Options': 'SAMEORIGIN always'}
{'model_time_in_ms': 8, 'release': {'harness_version': '0.1', 'registered_model_name': 'DMM-Quickstart-Model', 'registered_model_version': '1'}, 'request_id': 'UDGZIEM1A6VJKXWT', 'result': [0, 0, 0, 2, 0, 0, 2, 2, 0, 0, 2, 0, 0, 2, 0, 0, 2], 'timing': 7.59124755859375}
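
The `result` list in the response holds integer class indices, while the captured prediction data stores class names. The sketch below shows one way to decode the indices, assuming the standard Iris label order from `data.target_names`; `decode_result` is a hypothetical helper, not part of the Model API.

```python
# Iris class names in label order, matching data.target_names
TARGET_NAMES = ["setosa", "versicolor", "virginica"]


def decode_result(result):
    """Map integer class indices from the Model API response to class names."""
    return [TARGET_NAMES[int(i)] for i in result]


# "result" list copied from the sample response above
sample_result = [0, 0, 0, 2, 0, 0, 2, 2, 0, 0, 2, 0, 0, 2, 0, 0, 2]
labels = decode_result(sample_result)
print(labels[:4])  # ['setosa', 'setosa', 'setosa', 'virginica']
```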


### Create a Dummy Ground Truth Dataset

Typically for this step you would fetch real ground truth data (the actual outcomes for the records your model scored),
join those outcomes with your prediction data, and upload the result to a data source attached to Model Monitoring for Model Quality
analysis.

However, for a quick demo, we'll make up some fake ground truth data using the model predictions captured with Domino's
data capture client. These predictions are stored in an automatically generated Domino Dataset called "prediction_data".

Once the data has been ingested (roughly one hour), a "prediction_data" Domino Dataset will be added to the Project.

1) Navigate to the Domino Dataset Folder on the left (back from /mnt/ , then "data/prediction_data/...")
Copy the path to read in your registered model predictions.

2) Join the Predictions to make your ground truth dataset, shuffle some labels to simulate classification errors, and save the ground truth csv

3) Upload the csv to the s3 bucket attached as a Domino Model Monitoring Dataset


In [17]:
import pandas as pd

# Navigate to the most recent predictions and copy the file path to one of the parquet files in there. 
# This is where you can find data captured by the Data Capture Client in your Model API

# /mnt/data/prediction_data/{PREDICTION_DATA_ID}/{DATE}/{TIME}/predictions_{ID}.parquet

path = '/mnt/data/prediction_data/65b04f6b1266902edb95b260/$$date$$=2024-01-23Z/$$hour$$=23Z/predictions_60759ee6-b6d9-4645-8655-84e55af145ba.parquet'

predictions = pd.read_parquet(path)
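
Instead of copying a path by hand, the newest capture file can be located programmatically. This is a minimal sketch, assuming the dataset is mounted under `/mnt/data/prediction_data` as in the example path above; `latest_parquet` is a hypothetical helper and the mount point may differ in your Project.

```python
from pathlib import Path


def latest_parquet(root):
    """Return the most recently modified .parquet file under root, or None."""
    root = Path(root)
    if not root.is_dir():
        return None
    files = list(root.rglob("*.parquet"))
    if not files:
        return None
    return max(files, key=lambda p: p.stat().st_mtime)


# Assumed mount point; adjust to where your prediction_data dataset appears
newest = latest_parquet("/mnt/data/prediction_data")
print(newest)
```

If a file is found, it can be passed to `pd.read_parquet(newest)` in place of the hard-coded `path`.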

In [23]:
print(predictions.shape)
predictions.head()

(25, 8)


| | petal length (cm) | petal width (cm) | sepal length (cm) | sepal width (cm) | variety | timestamp | `__domino_timestamp` | event_id |
|---|---|---|---|---|---|---|---|---|
| 0 | 1.1 | 0.1 | 4.3 | 3.0 | setosa | 2024-01-23 23:53:57.398343+00:00 | 2024-01-23T23:53:57.403167+00:00 | 69f2291e-e15d-4372-aebe-302edf53f476 |
| 1 | 1.2 | 0.2 | 5.8 | 4.0 | setosa | 2024-01-23 23:53:57.398343+00:00 | 2024-01-23T23:53:57.403685+00:00 | 130b85c0-91b8-4d79-b530-baf31cba5860 |
| 2 | 1.5 | 0.4 | 5.7 | 4.4 | setosa | 2024-01-23 23:53:57.398343+00:00 | 2024-01-23T23:53:57.403968+00:00 | 1d41236d-c595-478b-b937-bbf451b4eb98 |
| 3 | 5.7 | 2.5 | 6.7 | 3.3 | virginica | 2024-01-23 23:53:57.398343+00:00 | 2024-01-23T23:53:57.404184+00:00 | 42f5c408-66dc-45f8-9db1-2d6c74f6438e |
| 4 | 5.2 | 2.3 | 6.7 | 3.0 | virginica | 2024-01-23 23:53:57.398343+00:00 | 2024-01-23T23:53:57.404386+00:00 | ca1712ba-9053-4cb8-9aa2-b57267273b44 |


The Ground Truth dataset needs 2 columns:

1) The existing event ID column from the model predictions.

    This column holds the join keys for joining ground truth labels to your model's predictions.

2) Your new column containing ground truth labels.


In [45]:
event_id = predictions['event_id']
iris_ground_truth = predictions['variety']

# Create a new dataframe
ground_truth = pd.DataFrame(columns=['event_id', 'iris_ground_truth'])
ground_truth['event_id'] = event_id
ground_truth['iris_ground_truth'] = iris_ground_truth

# These row positions help find some different iris types in our initial scoring data
end_index = predictions.shape[0]
mid_index = int(round(predictions.shape[0] / 2, 0))

# Simulate some classification errors. This makes our confusion matrix interesting.
ground_truth.iloc[0, 1] = 'virginica'
ground_truth.iloc[1, 1] = 'versicolor'
ground_truth.iloc[mid_index-1, 1] = 'versicolor'
ground_truth.iloc[mid_index, 1] = 'virginica'
ground_truth.iloc[end_index-2, 1] = 'setosa'
ground_truth.iloc[end_index-1, 1] = 'setosa'

# Save this example ground truth CSV to your Project files for reference.

date = datetime.datetime.today()
month = date.month
day = date.day
year = date.year

# Make sure the output folder exists before writing
os.makedirs('data', exist_ok=True)

ground_truth.to_csv('data/iris_ground_truth_{}_{}_{}.csv'.format(month, day, year), index=False)
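
Before uploading, it can help to check how many labels were actually changed. The sketch below is a small standard-library illustration of the tally a confusion matrix is built from; `confusion_counts` is a hypothetical helper and the four sample labels are made up for the example.

```python
from collections import Counter


def confusion_counts(predicted, actual):
    """Count (actual, predicted) label pairs, the cells of a confusion matrix."""
    return Counter(zip(actual, predicted))


# Made-up labels for illustration only
predicted = ["setosa", "setosa", "virginica", "virginica"]
actual = ["virginica", "setosa", "virginica", "setosa"]

counts = confusion_counts(predicted, actual)
accuracy = sum(n for (a, p), n in counts.items() if a == p) / len(predicted)

for (a, p), n in sorted(counts.items()):
    print("actual={:<10} predicted={:<10} count={}".format(a, p, n))
print("accuracy:", accuracy)
```

In the notebook, the same function could be called with `predictions['variety']` and `ground_truth['iris_ground_truth']`.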

### Upload the ground truth file to s3, an optional AWS model monitoring data source.

Ground truth labels must come from an external data source attached to Domino Model Monitoring. The Model API does not capture ground truth labels, since they typically become available after the prediction.

This example uses a Domino Data Source; you could also use boto3 or another method to upload data to s3.

In [47]:
# For this approach, add an s3 Domino Data Source bucket to your Project, then copy the first few lines of the automatically generated Python code.

from domino.data_sources import DataSourceClient

# instantiate a client and fetch the datasource instance
object_store = DataSourceClient().get_datasource("adlsdatasource")

object_store.upload_file("iris_ground_truth_{}_{}_{}.csv".format(month, day, year), "data/iris_ground_truth_{}_{}_{}.csv".format(month, day, year))
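
For the boto3 route mentioned above, a minimal sketch might look like the following. It assumes boto3 is installed and AWS credentials are already configured in the environment; the bucket name is hypothetical, and both helpers are illustrative rather than part of Domino.

```python
import datetime


def ground_truth_filename(day=None):
    """Build the CSV name used above: iris_ground_truth_{month}_{day}_{year}.csv"""
    day = day or datetime.date.today()
    return "iris_ground_truth_{}_{}_{}.csv".format(day.month, day.day, day.year)


def upload_with_boto3(local_path, bucket, key):
    """Upload one file to S3. Assumes boto3 is installed and credentials are configured."""
    import boto3  # imported here so the filename helper works without boto3

    boto3.client("s3").upload_file(local_path, bucket, key)


name = ground_truth_filename(datetime.date(2024, 1, 25))
print(name)  # iris_ground_truth_1_25_2024.csv

# Hypothetical bucket name; uncomment once credentials are in place
# upload_with_boto3("data/" + name, "my-monitoring-bucket", name)
```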

### Add a Domino Datasource to Domino Model Monitoring

In Domino Model Monitoring, if you have not already done so, add a Monitoring Data Source (in this example, an s3 bucket).




### First Time Registration of Ground Truth Labels via the API

The final step is to register Ground Truth Labels with Domino Model Monitoring.

This can be done in the Model Monitoring UI using the Ground Truth Config file, or using the Domino Model Monitoring API.

Documentation here: https://docs.dominodatalab.com/en/latest/api_guide/f31cde/model-monitoring-api-reference/#_registerDatasetConfig

You’ll need the following:

1) The name of the monitoring data source you registered in Domino Model Monitoring (the name in Domino, not the s3 bucket name if they’re different).

2) The Domino Model Monitoring Model ID, not the Model API model ID. This model ID can be found in the Overview tab of your monitored model, or in the URL for that model.

3) Your Domino API Key. Note that Domino Model Monitoring API keys have been deprecated; there is now only one API key for your whole Domino account.

     *If this is your first time using your Domino API key, go to the Domino Workbench, then open your User Account settings in the lower left. Regenerate your API key, save it securely, then also save it to your Domino account as a User Environment Variable. In this example, I’ve called it “MY_API_KEY”. Both are managed under “API Key” and “User Environment Variables” in your Account Settings.
    Your Workspace will not yet know about your new User Environment Variable. Save your notebook, then save and restart your Workspace to make it aware of the new environment variable.*
    
4) The path to your ground truth labels csv file in your monitoring data source (s3 in this case)

5) The column name of your new, ground truth labels 

6) Your original target (or prediction) column name

7) Your organization's Domino url to create the Domino Model Monitoring API endpoint.

    For example, change:
    
    “demo2.dominodatalab.com” to “my-domino-domain.dominodatalab.com”

In [49]:
# Only os and requests are needed by this cell
import os
import requests
 
# UPDATE: (1) The name of your monitoring data source in Domino Model Monitoring
data_source = 'se-demo-bucket'

# UPDATE: (2) Your Model Monitoring Model ID (NOT Model API model ID)
model_id='65b0525c54ac3acc8cb495d1'

# UPDATE: (3) Your Domino API key
API_key = os.environ['MY_API_KEY']
 
# UPDATE: (4) The name of the file uploaded to s3 above
gt_file_name = "iris_ground_truth_{}_{}_{}.csv".format(month, day, year)

# UPDATE: (5) Ground Truth column name
GT_column_name = 'iris_ground_truth'

# UPDATE: (6) Your original target column name
target_column_name = 'variety'

# UPDATE: (7) Your organization's Domino url
your_domino_url = 'demo2.dominodatalab.com'

ground_truth_url = "https://{}/model-monitor/v2/api/model/{}/register-dataset/ground_truth".format(your_domino_url, model_id)

print('Registering {} From S3 Bucket in DMM'.format(gt_file_name))
 
# create GT payload    
 
# Set up call headers
headers = {
           'X-Domino-Api-Key': API_key,
           'Content-Type': 'application/json'
          }

 
ground_truth_payload = """
{{
    "variables": [{{
    
            "valueType": "categorical",
            "variableType": "ground_truth",
            "name": "{2}", 
            "forPredictionOutput": "{3}"
        
    }}],
    "datasetDetails": {{
            "name": "{0}",
            "datasetType": "file",
            "datasetConfig": {{
                "path": "{0}",
                "fileFormat": "csv"
            }},
            "datasourceName": "{1}",
            "datasourceType": "s3"
        }}
}}
""".format(gt_file_name, data_source, GT_column_name, target_column_name)
 
# Make api call
ground_truth_response = requests.request("PUT", ground_truth_url, headers=headers, data = ground_truth_payload)
 
# Print response
print(ground_truth_response.text.encode('utf8'))
 
print('DONE!')


Registering iris_ground_truth_1_25_2024.csv From S3 Bucket in DMM
b''
DONE!
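
An empty body such as `b''` does not by itself show whether the registration succeeded, so it is worth looking at the HTTP status code as well. The sketch below assumes standard HTTP semantics (2xx means success); `summarize_response` is a hypothetical helper, and the specific codes the Model Monitoring API returns are not covered here.

```python
def summarize_response(status_code, body_text):
    """Turn an HTTP status code and body into a one-line summary."""
    detail = body_text or "(empty body)"
    if 200 <= status_code < 300:
        return "OK {}: {}".format(status_code, detail)
    return "FAILED {}: {}".format(status_code, detail)


print(summarize_response(200, ""))
print(summarize_response(401, "Unauthorized"))
```

In the notebook this could be called as `summarize_response(ground_truth_response.status_code, ground_truth_response.text)`. Alternatively, `ground_truth_response.raise_for_status()` raises an exception on 4xx and 5xx responses.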


### Next Steps

Going forward, Domino will automatically capture all prediction data going across your Model API. It will ingest these predictions for Drift detection once per day. You can set a schedule to determine when this ingest happens.

To periodically upload ground truth labels, repeat the previous step, but without the “variables” block in the ground truth payload (the variables only need to be registered once). As new ground truth labels are added, point Domino to the path of the new labels in the monitoring data source by calling the same Model Monitoring API endpoint:

ground_truth_payload = """
{{
    "datasetDetails": {{
            "name": "{0}",
            "datasetType": "file",
            "datasetConfig": {{
                "path": "{0}",
                "fileFormat": "csv"
            }},
            "datasourceName": "{1}",
            "datasourceType": "s3"
        }}
}}
""".format(gt_file_name, data_source)
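
Both payload variants can also be built from a dictionary and serialized with `json.dumps`, which avoids the doubled braces that string formatting needs. This is a sketch under the assumption that the field names shown in this notebook are the ones the API expects; `build_ground_truth_payload` is a hypothetical helper.

```python
import json


def build_ground_truth_payload(path, datasource_name, gt_column=None, target_column=None):
    """Build the registration payload; include "variables" only on first registration."""
    payload = {
        "datasetDetails": {
            "name": path,
            "datasetType": "file",
            "datasetConfig": {"path": path, "fileFormat": "csv"},
            "datasourceName": datasource_name,
            "datasourceType": "s3",
        }
    }
    # First-time registration also declares the ground truth column
    if gt_column and target_column:
        payload["variables"] = [{
            "valueType": "categorical",
            "variableType": "ground_truth",
            "name": gt_column,
            "forPredictionOutput": target_column,
        }]
    return json.dumps(payload)


first_time = build_ground_truth_payload(
    "iris_ground_truth_1_25_2024.csv", "se-demo-bucket", "iris_ground_truth", "variety"
)
later = build_ground_truth_payload("iris_ground_truth_1_26_2024.csv", "se-demo-bucket")
print(later)
```

Either string can be passed as `data=` to the same `requests.request("PUT", ...)` call used above.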



### Automation with Domino Jobs
To simulate Domino Model Monitoring over time, you can run the following two scripts as scheduled Domino Jobs:

**(1) daily_scoring.py**

Daily scoring simulates a daily batch scoring script. Data is read in, sent to the Domino Model API, and predictions are returned.
Domino's Prediction Capture Client captures this scoring data, and every 24 hours, it gets ingested into the Drift Monitoring dashboard.

**(2) daily_ground_truth.py**

Daily ground truth simulates uploading actual outcomes after the predictions have been made. A scheduled Domino Job writes the latest ground truth labels to an s3 bucket, then calls the Domino Model Monitoring API with the path to the file with the latest ground truth labels.

If you schedule these two jobs, be sure that ground truth runs after the predictions!
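
One way a ground truth job could guard against running too early is to confirm that every ground truth event ID already exists in the captured predictions. The sketch below is illustrative only: `unmatched_event_ids` is a hypothetical helper, the IDs are made up, and it is not taken from `daily_ground_truth.py`.

```python
def unmatched_event_ids(ground_truth_ids, prediction_ids):
    """Return ground truth event IDs that have no matching captured prediction."""
    return sorted(set(ground_truth_ids) - set(prediction_ids))


# Made-up event IDs for illustration
prediction_ids = ["a1", "b2", "c3"]
ground_truth_ids = ["a1", "c3", "d4"]

missing = unmatched_event_ids(ground_truth_ids, prediction_ids)
if missing:
    print("Ground truth has no matching prediction for:", missing)
else:
    print("All ground truth rows match a captured prediction")
```

With the dataframes from this notebook, the inputs would be `ground_truth['event_id']` and `predictions['event_id']`.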