# Sample Multiclass Model for External Domino Model Monitoring

Example notebook to set up external Domino Model Monitoring:
- Models hosted outside of Domino 
- Models scored using batch inference through Domino Jobs

## Background
The setup process for monitoring external models with Domino Model Monitoring (DMM) is listed below.

(1) The model does not need to be trained in Domino; it can be an existing model trained elsewhere.

(2) It does not matter where the external model is hosted. It could be on an edge device, on-prem, in your cloud hosting service, or hosted in Domino.

### Register a Monitoring Data Source

Domino requires an external data source to register an external model.

The external data source stores the:

(1) Training Dataset

(2) Inference data & model predictions

(3) Ground truth labels (optional)

One datasource can be used for multiple DMM models. The same datasource can also be used for both ground truth labels for integrated models and data used for external models.

The Domino Model Monitoring data sources are registered independently of the data sources used in Domino Workbench. Model monitoring can read in data from multiple cloud data sources or on-prem data sources. A list of available data sources is here:
https://docs.dominodatalab.com/en/latest/user_guide/8c7833/connect-a-data-source/

You can register your DMM datasource through the DMM UI or with DMM's API (see the example API call below).

In [4]:
# API Reference: https://docs.dominodatalab.com/en/latest/api_guide/f31cde/model-monitoring-api-reference/#_datasource
import os
import json
import requests

# UPDATE: (1) Your Domino API key
API_key = os.environ['DOMINO_USER_API_KEY']

# UPDATE: (2) Your organization's Domino url
your_domino_url = 'prod-field.cs.domino.tech'

# UPDATE: (3) Your new DMM datasource name
datasource_name = 'GSK_DataSource'

# UPDATE: (4) DMM Datasource Type & Attributes. These credentials will be different for each datasource.
datasource_type = "s3"
S3_Bucket_Name = "uday-samala-dmm-test-bucket"
S3_Region = "us-west-2"
AWS_Access_Key = os.environ.get("AWS_ACCESS_KEY_ID")
AWS_Secret_Key = os.environ.get("AWS_SECRET_ACCESS_KEY")
 
datasource_url = "https://{}/model-monitor/v2/api/datasource".format(your_domino_url)

# Set up call headers
headers = {
           'X-Domino-Api-Key': API_key,
           'Content-Type': 'application/json'
          }

data_source_request = {
    "name": datasource_name,
    "type": datasource_type,
    "config" : {
        "bucket": S3_Bucket_Name,
        "region": S3_Region,
        "instance_role" : False,
        "access_key": AWS_Access_Key,
        "secret_key": AWS_Secret_Key
    }
}
# format(datasource_name, datasource_type, S3_Bucket_Name, S3_Region, AWS_Access_Key, AWS_Secret_Key)

# Make api call
datasource_response = requests.request("PUT", datasource_url, headers=headers, data = json.dumps(data_source_request))
 
# Print response
print(datasource_response.text.encode('utf8'))
 
print('DONE!')

b'Datasource with name GSK_DataSource and type s3 is already registered.'
DONE!
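The cell above reads the AWS credentials with `os.environ.get`, which returns `None` when a variable is unset, so a request could go out with empty keys. The following is a minimal sketch of a guard against that. The helper name `build_s3_datasource_request` is hypothetical and not part of any Domino library; the payload fields simply mirror the request in the cell above.

```python
def build_s3_datasource_request(name, bucket, region, access_key, secret_key):
    """Build the S3 datasource payload shown above, failing early on missing values.

    Hypothetical helper: the field names mirror the request in the cell above.
    """
    values = {
        "name": name,
        "bucket": bucket,
        "region": region,
        "access_key": access_key,
        "secret_key": secret_key,
    }
    # Collect every setting that is None or an empty string.
    missing = [key for key, value in values.items() if not value]
    if missing:
        raise ValueError("Missing datasource settings: " + ", ".join(missing))
    return {
        "name": name,
        "type": "s3",
        "config": {
            "bucket": bucket,
            "region": region,
            "instance_role": False,
            "access_key": access_key,
            "secret_key": secret_key,
        },
    }
```

The returned dictionary can be passed to `json.dumps` in the same way as `data_source_request` in the cell above.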


### Register Your External Model

Once you have a data source registered:

(1) Upload the training dataset used for your model to that datasource, and note the path to your training dataset. DMM will need this to initiate the model.

(2) Prepare your model config file. In the UI, the config json looks like the example below.

It contains 3 components:

(A) **variables**: A list of variable names, data types, and variable types for each column that you want to monitor. This can include the target variable if you'd like to monitor drift in your model's predictions.

(B) **datasetDetails**: The location of your training dataset that you just uploaded into the DMM datasource

(C) **modelMetadata**: The name and description of your model to render in Domino Model Monitoring

Like with DMM Data Sources, models can be created in the UI or via APIs.

```
{
    "variables": [
        {
            "valueType": "numerical",
            "variableType": "feature",
            "name": "petal.length"
        },
        {
            "valueType": "numerical",
            "variableType": "feature",
            "name": "sepal.length"
        },
        {
            "valueType": "numerical",
            "variableType": "feature",
            "name": "petal.width"
        },
        {
            "valueType": "numerical",
            "variableType": "feature",
            "name": "sepal.width"
        },
        {
            "valueType": "categorical",
            "variableType": "prediction",
            "name": "variety"
        }
    ],
    "datasetDetails": {
        "name": "iris.csv",
        "datasetType": "file",
        "datasetConfig": {
            "path": "iris.csv",
            "fileFormat": "csv"
        },
        "datasourceName": "dmm-shared-bucket",
        "datasourceType": "s3"
    },
    "modelMetadata": {
        "name": "iris_model",
        "modelType": "classification",
        "version": "1.01",
        "description": "classification_iris_model",
        "author": "John Doe"
    }
}
```
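Writing the **variables** list by hand gets tedious for wide datasets. As a rough sketch, the list can be generated from one representative row of the training data. The helper `build_variables` is hypothetical, and it assumes that numeric values should map to `"numerical"` and everything else to `"categorical"`, which may not suit every column; review the result before registering a model.

```python
def build_variables(sample_row, prediction_column):
    """Build a "variables" list from one representative row of training data.

    Assumption: numbers map to "numerical", anything else to "categorical".
    """
    variables = []
    for name, value in sample_row.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        variables.append({
            "valueType": "numerical" if is_number else "categorical",
            "variableType": "prediction" if name == prediction_column else "feature",
            "name": name,
        })
    return variables


# Example row with the iris columns used in the config above.
iris_row = {
    "petal.length": 1.4,
    "sepal.length": 5.1,
    "petal.width": 0.2,
    "sepal.width": 3.5,
    "variety": "Setosa",
}
variables = build_variables(iris_row, prediction_column="variety")
```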

#### Example to register a model via the API

In [6]:
# API Reference: https://docs.dominodatalab.com/en/latest/user_guide/a94c1c/model-monitoring-apis/#_model

import os
import json
import requests

# UPDATE: (1) Your Domino API key
API_key = os.environ['DOMINO_USER_API_KEY']

# UPDATE: (2) Your organization's Domino url
your_domino_url = 'prod-field.cs.domino.tech'

# UPDATE: (3) Your DMM datasource name
datasource_name = 'GSK_DataSource'

# UPDATE: (4) Your DMM datasource type
datasource_type = 's3'

# UPDATE: (5) Training dataset name, path, and file format within your DMM datasource.
training_dataset_name = "ChurnTrainingDataPP.csv"
training_dataset_path = "ChurnTrainingDataPP.csv"
training_dataset_fileFormat = "csv"

datasource_url = "https://{}/model-monitor/v2/api/model".format(your_domino_url)

# Set up call headers
headers = {
           'X-Domino-Api-Key': API_key,
           'Content-Type': 'application/json'
          }

# Update each variable name, variableType and valueType for your model:

model_register_request = {
    "variables": [
        {
            "name": "custid",
            "valueType": "string",
            "variableType": "row_identifier"
        },
        {
            "name": "dropperc",
            "valueType": "numerical",
            "variableType": "feature",
            "featureImportance": 0.7
        },
        {
            "name": "mins",
            "valueType": "numerical",
            "variableType": "feature",
            "featureImportance": 0.9
        },
        {
            "name": "consecmonths",
            "valueType": "numerical",
            "variableType": "feature",
            "featureImportance": 0.1
        },
        {
            "name": "income",
            "valueType": "numerical",
            "variableType": "feature",
            "featureImportance": 0.3
        },
        {
            "name": "age",
            "valueType": "numerical",
            "variableType": "feature",
            "featureImportance": 0.5
        },
        {
            "name": "churn_Y",
            "valueType": "categorical",
            "variableType": "prediction"
        },
        {
            "name": "predictionProbability",
            "valueType": "numerical",
            "variableType": "prediction_probability",
            "forPredictionOutput": "churn_Y"
        }
    ],
    "datasetDetails": {
        "name": training_dataset_name,
        "datasetType": "file",
        "datasetConfig": {
            "path": training_dataset_path,
            "fileFormat": training_dataset_fileFormat
        },
        "datasourceName": datasource_name,
        "datasourceType": datasource_type
    },
    "modelMetadata": {
        "name": "gsk-customer-churn",
        "modelType": "classification",
        "version": "1.0",
        "description": "Classification model to predict customer churn",
        "author": "Uday Samala"
    }
}

# Make api call
model_response = requests.request("PUT", datasource_url, headers=headers, data = json.dumps(model_register_request))
 
# Print response
print(model_response.text.encode('utf8'))
 
print('DONE!')

b'{"id": "65ca754a69dd9289b62c0c32", "createdAt": 1707767114, "updatedAt": 1707767114, "name": "gsk-customer-churn", "description": "Classification model to predict customer churn", "modelType": "classification", "author": "Uday Samala", "version": "1.0", "userId": "0c3b0ed2-0255-449d-ac59-28d6d585f2d2", "isDeleted": false, "ingestionStatus": "created", "registrationStatus": "created", "sourceType": "standalone", "visibility": "public", "collaborators": []}'
DONE!
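The later cells need the model ID returned by this call. Rather than copying it by hand from the printed output, it can be read from the response body. This is a small sketch; `extract_model_id` is a hypothetical helper, and the sample string is a shortened copy of the response printed above.

```python
import json


def extract_model_id(response_text):
    """Return the "id" field from a model registration response body."""
    body = json.loads(response_text)
    if not isinstance(body, dict) or "id" not in body:
        raise ValueError("No model id in response: " + response_text)
    return body["id"]


# Shortened copy of the registration response shown above.
sample_response = (
    '{"id": "65ca754a69dd9289b62c0c32", "name": "gsk-customer-churn", '
    '"registrationStatus": "created"}'
)
model_id = extract_model_id(sample_response)
```

In the notebook, `model_response.text` would take the place of `sample_response`.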


### Register Prediction Data

Since this is an external model, Domino does not automatically capture prediction data.

Prediction data needs to be collected in a DMM datasource, then periodically ingested into your monitored model. You could trigger each ingestion by hand, but it is generally automated with scheduled API calls to DMM.

You could append prediction data to a single file in your monitoring data source, then have Domino ingest the prediction data on a schedule.

Alternatively, you can upload individual files with your prediction data to your monitoring data source, then call DMM's API to update the path to the file with the latest prediction data. This could be easily done with a scheduled Domino Job.

Below is an example for the second approach, updating the file and calling DMM's API.

Notes:
- Only register a column name once. If a column name is passed to DMM a second time, it will throw an error. For example, the example below adds a new column called "id" that identifies each request, so that we can later pair up requests with ground truth labels. Only add this column name the first time you upload prediction data to your registered model - for any subsequent uploads only update the dataset details. 
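
One way to respect the register-once rule is to build the payload with a helper that adds a "variables" entry only when asked. This is a sketch under assumptions: `build_prediction_request` is a hypothetical helper, the row identifier entry copies the shape used for `custid` in the model registration cell, and it assumes the prediction endpoint accepts a "variables" list in the same way the ground truth endpoint does.

```python
def build_prediction_request(file_path, datasource_name, datasource_type,
                             new_row_id_column=None):
    """Build a prediction dataset payload.

    Pass new_row_id_column only on the first upload, and only if that column
    was not already declared when the model was registered.
    """
    request = {
        "datasetDetails": {
            "name": file_path,
            "datasetType": "file",
            "datasetConfig": {"path": file_path, "fileFormat": "csv"},
            "datasourceName": datasource_name,
            "datasourceType": datasource_type,
        }
    }
    if new_row_id_column is not None:
        # Assumed shape, copied from the row_identifier entry used at registration.
        request["variables"] = [{
            "name": new_row_id_column,
            "valueType": "string",
            "variableType": "row_identifier",
        }]
    return request


first_upload = build_prediction_request("preds_day1.csv", "GSK_DataSource", "s3",
                                        new_row_id_column="id")
later_upload = build_prediction_request("preds_day2.csv", "GSK_DataSource", "s3")
```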

In [8]:
# API Reference: https://docs.dominodatalab.com/en/latest/user_guide/a94c1c/model-monitoring-apis/#_model

import os
import json
import requests

# UPDATE: (1) Your Domino API key
API_key = os.environ['DOMINO_USER_API_KEY']

# UPDATE: (2) Your Model Monitoring Model ID
model_id='65ca754a69dd9289b62c0c32'

# UPDATE: (3) Your organization's Domino url
your_domino_url = 'prod-field.cs.domino.tech'

# UPDATE: (4) Your DMM datasource name
datasource_name = 'GSK_DataSource'

# UPDATE: (5) Your DMM datasource type
datasource_type = 's3'

# UPDATE: (6) Your RowID Name (Optional, for model quality monitoring. Do this only once.)
Prediction_ID_name = 'custid'

# UPDATE: (7) Prediction dataset name, path, and file format within your DMM datasource.
prediction_dataset_name = "inputs_and_preds_2021-09-16.csv"
prediction_dataset_path = "inputs_and_preds_2021-09-16.csv"
prediction_dataset_fileFormat = "csv"

prediction_data_url = "https://{}/model-monitor/v2/api/model/{}/register-dataset/prediction".format(your_domino_url, model_id)


# Set up call headers
headers = {
           'X-Domino-Api-Key': API_key,
           'Content-Type': 'application/json'
          }

# Dataset details for the latest prediction data file:

prediction_registration_request = {
    "datasetDetails": {
        "name": prediction_dataset_name,
        "datasetType": "file",
        "datasetConfig": {
            "path": prediction_dataset_path,
            "fileFormat": prediction_dataset_fileFormat
        },
        "datasourceName": datasource_name,
        "datasourceType": datasource_type
    }
    
}

# Make api call
prediction_response = requests.request("PUT", prediction_data_url, headers=headers, data = json.dumps(prediction_registration_request))
 
# Print response
print(prediction_response.text.encode('utf8'))
 
print('DONE!')

b''
DONE!


### Ingest Ground Truth Dataset

Typically, for this step you would fetch the actual ground truth data (the real outcomes for the cases your model scored), join those outcomes with your prediction data, and upload the result to a datasource attached to Model Monitoring for Model Quality analysis.

However, for the purposes of a quick demo, we'll make up some fake ground truth data using the model predictions captured with Domino's data capture client. These predictions are stored in an automatically generated Domino Dataset called "prediction_data".

Once the data has been ingested (roughly one hour), a "prediction_data" Domino Dataset is added to the Project.

1) Navigate to the Domino Dataset folder on the left (back from /mnt/, then "data/prediction_data/..."). Copy the path to read in your registered model predictions.

2) Join the Predictions to make your ground truth dataset, shuffle some labels to simulate classification errors, and save the ground truth csv

3) Upload the csv to the s3 bucket attached as a Domino Model Monitoring Dataset
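
Step 2 can be sketched with the standard library alone. The helper names, the 20% error rate, and the fixed seed below are illustrative assumptions; the column names `custid`, `churn_Y`, and `y_gt` are the ones used elsewhere in this notebook. Writing the CSV text to a file and uploading it to the bucket (step 3) is left out.

```python
import csv
import io
import random


def simulate_ground_truth(prediction_rows, id_column, prediction_column,
                          label_column="y_gt", error_rate=0.1, seed=0):
    """Copy predictions into a ground truth column, flipping some labels."""
    rng = random.Random(seed)
    labels = sorted({row[prediction_column] for row in prediction_rows})
    ground_truth = []
    for row in prediction_rows:
        label = row[prediction_column]
        others = [candidate for candidate in labels if candidate != label]
        # Replace the label with a different class to simulate a model error.
        if others and rng.random() < error_rate:
            label = rng.choice(others)
        ground_truth.append({id_column: row[id_column], label_column: label})
    return ground_truth


def to_csv_text(rows, fieldnames):
    """Serialize a list of dictionaries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up predictions standing in for the captured prediction data.
predictions = [{"custid": str(i), "churn_Y": str(i % 2)} for i in range(100)]
ground_truth = simulate_ground_truth(predictions, "custid", "churn_Y", error_rate=0.2)
csv_text = to_csv_text(ground_truth, ["custid", "y_gt"])
```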


In [10]:
# API Reference: https://docs.dominodatalab.com/en/latest/user_guide/a94c1c/model-monitoring-apis/#_model

import os
import json
import requests

# UPDATE: (1) Your Domino API key
API_key = os.environ['DOMINO_USER_API_KEY']

# UPDATE: (2) Your Model Monitoring Model ID
model_id='65ca754a69dd9289b62c0c32'

# UPDATE: (3) Your organization's Domino url
your_domino_url = 'prod-field.cs.domino.tech'

# UPDATE: (4) Your DMM datasource name
datasource_name = 'GSK_DataSource'

# UPDATE: (5) Your DMM datasource type
datasource_type = 's3'

# UPDATE: (6) Your RowID Name (Optional, for model quality monitoring. Do this only once.)
groudtruth_ID_name = 'custid'

# UPDATE: (7) Ground truth dataset name, path, and file format within your DMM datasource.
groudtruth_dataset_name = "ground_truth_2021-09-16.csv"
groudtruth_dataset_path = "ground_truth_2021-09-16.csv"
groudtruth_dataset_fileFormat = "csv"

groudtruth_data_url = "https://{}/model-monitor/v2/api/model/{}/register-dataset/ground_truth".format(your_domino_url, model_id)


# Set up call headers
headers = {
           'X-Domino-Api-Key': API_key,
           'Content-Type': 'application/json'
          }

# Update each variable name, variableType and valueType for your model:

groudtruth_registration_request = {
    "variables": [
        {
            "valueType": "categorical",
            "variableType": "ground_truth",
            "name": "y_gt",
            "forPredictionOutput": "churn_Y"
        }
    ],
    "datasetDetails": {
        "name": groudtruth_dataset_name,
        "datasetType": "file",
        "datasetConfig": {
            "path": groudtruth_dataset_path,
            "fileFormat": groudtruth_dataset_fileFormat
        },
        "datasourceName": datasource_name,
        "datasourceType": datasource_type
    }
    
    
}

# Make api call
ground_truth_response = requests.request("PUT", groudtruth_data_url, headers=headers, data = json.dumps(groudtruth_registration_request))
 
# Print response
print(ground_truth_response.text.encode('utf8'))
 
print('DONE!')

b''
DONE!


The ground truth dataset needs 2 columns:

1) The existing event ID column from the model predictions.
   
    This column holds the join keys for joining ground truth labels to your model's predictions.

2) Your new column containing ground truth labels.


Registering iris_ground_truth_1_25_2024.csv From S3 Bucket in DMM
b'["Dataset already registered with the model."]'
DONE!


### Next Steps


To periodically upload ground truth labels, repeat the previous step, but without the “variables” in the ground truth payload (this only needs to be done once). As new ground truth labels are added, point Domino to the path to the new labels in the monitoring data source by pinging the same Model Monitoring API:

# gt_file_name: path to the new ground truth file in the monitoring data source
# data_source: name of the DMM datasource that holds the file
ground_truth_payload = """
{{
    "datasetDetails": {{
        "name": "{0}",
        "datasetType": "file",
        "datasetConfig": {{
            "path": "{0}",
            "fileFormat": "csv"
        }},
        "datasourceName": "{1}",
        "datasourceType": "s3"
    }}
}}""".format(gt_file_name, data_source)



### Automation with Domino Jobs
To simulate Domino Model Monitoring over time, you can try running the following two scripts as scheduled Domino Jobs:

**(1) daily_scoring.py**

Daily scoring simulates a daily batch scoring script. Data is read in, sent to the Domino Model API, and predictions are returned.
Domino's Prediction Capture Client captures this scoring data, and every 24 hours, it gets ingested into the Drift Monitoring dashboard.

**(2) daily_ground_truth.py**

Daily ground truth simulates uploading actual outcomes after the predictions have been made. A scheduled Domino Job writes the latest ground truth labels to an s3 bucket, then calls the Domino Model Monitoring API with the path to the file with the latest ground truth labels.

If you schedule these two jobs, be sure that ground truth runs after the predictions!
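
As a rough illustration of what a daily ground truth job might compute, the sketch below builds the date-stamped file name, the endpoint URL, and the payload for a given run date. The function name is hypothetical and this is not the contents of `daily_ground_truth.py`; the file naming pattern and URL are copied from the ground truth cell earlier in this notebook.

```python
import datetime


def daily_ground_truth_request(run_date, domino_url, model_id,
                               datasource_name, datasource_type="s3"):
    """Return the URL and payload that register one day's ground truth file."""
    # Assumed naming pattern, matching "ground_truth_2021-09-16.csv" above.
    file_name = "ground_truth_{}.csv".format(run_date.isoformat())
    url = "https://{}/model-monitor/v2/api/model/{}/register-dataset/ground_truth".format(
        domino_url, model_id
    )
    payload = {
        "datasetDetails": {
            "name": file_name,
            "datasetType": "file",
            "datasetConfig": {"path": file_name, "fileFormat": "csv"},
            "datasourceName": datasource_name,
            "datasourceType": datasource_type,
        }
    }
    return url, payload


url, payload = daily_ground_truth_request(
    datetime.date(2021, 9, 16),
    "prod-field.cs.domino.tech",
    "65ca754a69dd9289b62c0c32",
    "GSK_DataSource",
)
```

A scheduled job could pass `datetime.date.today()` as the run date and send the payload with the same `requests.request("PUT", ...)` call used in the cells above.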