# Sample External Model with Domino Model Monitoring

Example notebook to set up external Domino Model Monitoring (DMM).

## Background
The key difference between monitoring external models and Domino's integrated model monitoring is that with external models, Domino does not capture your model's training data using TrainingSets or prediction data through the DataCaptureClient. You will need to provide the model's training data, prediction data and, optionally, ground truth labels in a Monitoring Data Source, and then point your model in DMM to that data.

Other notes:

(1) The model does not need to be trained in Domino. It can be an existing model trained elsewhere.

(2) It does not matter where the external model is hosted. It could be on an edge device, on-prem, in your cloud hosting service, or run as a batch Job in Domino.

The steps below can be done via the DMM UI, or automated using DMM's API. To use the UI, follow the steps documented here:

https://docs.dominodatalab.com/en/latest/user_guide/679cc1/set-up-model-monitor/

The examples below demonstrate the setup & monitoring of external models using DMM's API. 

### Monitoring Data Source

Domino requires an external data source to register an external model. For integrated models, you only need a Monitoring Data Source if you are ingesting ground truth labels. 

This data source should already be connected and set up from the notebook '1_Initial_Setup.ipynb'.

For an external model, the external data source stores:

(1) The training dataset

(2) Prediction data & model predictions

(3) Ground truth labels (optional)

### 1. Upload the Training Data to Monitoring Data Source

Since we already connected the Monitoring Data Source to the Workbench in the setup notebook, we'll upload the training data using the `DataSourceClient`.

To upload the training dataset from the Workbench, add the same Monitoring Data Source registered above to this project. In a workspace, you can add it through the Data tab on the left.

The Iris training dataset is already saved in the "data" folder in the mnt directory as "iris_training_data.csv"

In [8]:
# Load the config file
import yaml

with open("/mnt/artifacts/DMM_config.yaml") as yamlfile:
    config = yaml.safe_load(yamlfile)

# Get your username so we don't overwrite other users' files!
import os

user_name = os.environ['DOMINO_USER_NAME']

In [18]:
# Upload the Training Data from Domino Workbench to the DMM data source
from domino.data_sources import DataSourceClient

# instantiate a client and fetch the data source instance
object_store = DataSourceClient().get_datasource("{}".format(config['workbench_datasource_name']))

# Upload the existing training data to your DMM data source from the Workbench
object_store.upload_file('{}_iris_training_data.csv'.format(user_name), '/mnt/code/data/iris_training_data.csv')

# Upload the first batch of scoring data to the DMM datasource.
object_store.upload_file('{}_external_model_scoring_data.csv'.format(user_name), '/mnt/code/data/external_model_scoring_data.csv')

### 2. Register an External Model in Domino Model Monitoring

The first step is to register your new model in Domino Model Monitoring. To register a new external model:

(1) Upload the training data used for your model to that Monitoring Data Source, and note the path to your training data file. DMM will need this to initiate the model.

(2) Prepare your **Monitoring Config JSON** file. In the UI, the config json looks like the example below.

It contains 3 components:

(A) **variables**: A list of variable names, data types, and variable types for each column that you want to monitor. This can include the target variable if you'd like to monitor drift in your model's predictions.

(B) **datasetDetails**: The name and location of your training dataset that you just uploaded into the DMM datasource

(C) **modelMetadata**: The name and description of your model to render in Domino Model Monitoring

Like with DMM Data Sources, Monitoring Config JSONs can be copied and pasted into the UI or automatically sent to Domino via APIs. Full documentation for Monitoring Config JSONs here:

https://docs.dominodatalab.com/en/latest/user_guide/bb88ca/monitoring-config-json/

**Pro Tip:** Domino recommends saving your config json as a file in your Project files for future reference and modification. See "Example_Model_Config.json"



```
{
    "variables": [
        {
            "valueType": "numerical",
            "variableType": "feature",
            "name": "petal.length"
        },
        {
            "valueType": "numerical",
            "variableType": "feature",
            "name": "sepal.length"
        },
        {
            "valueType": "numerical",
            "variableType": "feature",
            "name": "petal.width"
        },
        {
            "valueType": "numerical",
            "variableType": "feature",
            "name": "sepal.width"
        },
        {
            "valueType": "categorical",
            "variableType": "prediction",
            "name": "variety"
        }
    ],
    "datasetDetails": {
        "name": "iris.csv",
        "datasetType": "file",
        "datasetConfig": {
            "path": "iris.csv",
            "fileFormat": "csv"
        },
        "datasourceName": "dmm-shared-bucket",
        "datasourceType": "s3"
    },
    "modelMetadata": {
        "name": "iris_model",
        "modelType": "classification",
        "version": "1.01",
        "description": "classification_iris_model",
        "author": "John Doe"
    }
}
```
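The **variables** list is repetitive, so it can be generated rather than typed by hand. The following is a minimal sketch, not part of the DMM API: `build_variables` is a hypothetical helper, and it assumes every monitored column is either a feature or the single prediction column, as in the Iris example above.

```python
def build_variables(feature_types, prediction_name, prediction_type="categorical"):
    """Return the "variables" list for a Monitoring Config JSON.

    feature_types maps each feature column name to its valueType
    ("numerical" or "categorical"). This helper is hypothetical.
    """
    variables = [
        {"valueType": value_type, "variableType": "feature", "name": name}
        for name, value_type in feature_types.items()
    ]
    # The prediction column goes last, mirroring the example config above
    variables.append(
        {"valueType": prediction_type, "variableType": "prediction", "name": prediction_name}
    )
    return variables


iris_variables = build_variables(
    {
        "petal.length": "numerical",
        "sepal.length": "numerical",
        "petal.width": "numerical",
        "sepal.width": "numerical",
    },
    prediction_name="variety",
)
```

The resulting list can be dropped into the `"variables"` key of the config before it is serialized with `json.dumps`.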

In [19]:
#### Example to register a model via the API
# API Reference: https://docs.dominodatalab.com/en/latest/user_guide/a94c1c/model-monitoring-apis/#_model

import os
import json
import requests

# File names & format for the training data in your external datasource uploaded in Step 1.
training_dataset_name = '{}_iris_training_data.csv'.format(user_name)
training_dataset_path = '{}_iris_training_data.csv'.format(user_name)
training_dataset_fileFormat = "csv"

# Name for your model in DMM
model_name = "{} Example External Model".format(user_name)

with open("/mnt/artifacts/DMM_config.yaml") as yamlfile:
    config = yaml.safe_load(yamlfile)

datasource_url = "https://{}/model-monitor/v2/api/model".format(config['url'])

# Set up call headers
headers = {
           'X-Domino-Api-Key': os.environ['DOMINO_USER_API_KEY'],
           'Content-Type': 'application/json'
          }

# Update each variable name, variableType and valueType for your model:

model_register_request = {
    "variables": [
        {
            "valueType": "numerical",
            "variableType": "feature",
            "name": "petal length (cm)"
        },
        {
            "valueType": "numerical",
            "variableType": "feature",
            "name": "sepal length (cm)"
        },
        {
            "valueType": "numerical",
            "variableType": "feature",
            "name": "petal width (cm)"
        },
        {
            "valueType": "numerical",
            "variableType": "feature",
            "name": "sepal width (cm)"
        },
        {
            "valueType": "categorical",
            "variableType": "prediction",
            "name": "variety"
        }
    ],
    "datasetDetails": {
        "name": training_dataset_name,
        "datasetType": "file",
        "datasetConfig": {
            "path": training_dataset_path,
            "fileFormat": training_dataset_fileFormat
        },
        "datasourceName": config['DMM_datasource_name'],
        "datasourceType": config['datasource']['type'],
    },
    "modelMetadata": {
        "name": model_name,
        "modelType": "classification",
        "version": "1.01",
        "description": "classification_iris_model",
        "author": os.environ['DOMINO_USER_NAME']
    }
}

# Make api call
model_response = requests.put(datasource_url, headers=headers, data=json.dumps(model_register_request))

# Parse response
response = model_response.json()

print("New model id is: {}".format(response['id']))

# Save the model ID for next steps
new_model_id = response['id']

# Save the new model ID to the config file
config['external_model_id'] = new_model_id

with open("/mnt/artifacts/DMM_config.yaml", "w") as yamlfile:
    # yaml.dump returns None when given a stream, so do not reassign config
    yaml.dump(
        config, stream=yamlfile, default_flow_style=False, sort_keys=False
    )

print('DONE!')

New model id is: 66ebed56b7a424ef7c18d8ce
DONE!


*Don't forget to save and sync your workspace so that the config file has the new model ID!*
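The cells in this notebook print or parse the raw response body without checking the HTTP status. The sketch below shows one way to make failures explicit. `parse_dmm_response` is a hypothetical helper, and it assumes the DMM API signals errors with non-2xx status codes, which this notebook does not confirm.

```python
import json


def parse_dmm_response(status_code, body_text):
    """Return the decoded JSON body, or None when the body is empty.

    Raises RuntimeError for non-2xx status codes (assumed to mean failure)
    so that a failed call is not mistaken for success.
    """
    if not 200 <= status_code < 300:
        raise RuntimeError(
            "DMM API call failed ({}): {}".format(status_code, body_text)
        )
    # Several DMM calls in this notebook return an empty body on success
    return json.loads(body_text) if body_text.strip() else None


# Illustrative values only; a real call would pass resp.status_code and resp.text
example = parse_dmm_response(200, '{"id": "66ebed56b7a424ef7c18d8ce"}')
```

With a `requests` response object `resp`, the call would be `parse_dmm_response(resp.status_code, resp.text)`.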

### 3. Set up Drift Detection
https://docs.dominodatalab.com/en/latest/user_guide/86bc1f/set-up-drift-detection/


While integrated models can capture prediction data using the DataCaptureClient, external models need to ingest prediction data from a connected Monitoring Data Source. Just like with the initial model registration, information needed to ingest the prediction data is provided to Domino using a **Prediction Config JSON**.


There are two approaches to automating prediction data ingest from a Monitoring Data Source:


(1) Append new data to the same file in your Monitoring Data Source.


  -  Only register your Prediction Data config with the path to the prediction data file once. Domino will automatically retrieve new data every 24 hours from that file. You can schedule the daily check in the DMM UI.
  -  This approach requires registering a **timestamp** variable so that DMM knows which prediction rows are new.


(2) Upload prediction data as separate files to your Monitoring Data Source.


  - This requires updating the datasetDetails in the Prediction Data Config every time new prediction data is added. This is best automated through the API, using a Domino Job or some other scheduler.
  - When you update the Prediction Data Config, only update the "datasetDetails" with the new prediction data file path. Variables are only set the first time; if you re-register variable names, DMM will throw an error.
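The difference between the first registration and later updates under approach (2) can be sketched as a single payload builder that includes `"variables"` only when asked. `build_prediction_payload` is a hypothetical helper and the file names are made up; the data source name and type are taken from the example config earlier in this notebook.

```python
def build_prediction_payload(file_name, datasource_name, datasource_type,
                             variables=None, file_format="csv"):
    """Build a Prediction Data Config as a dict.

    Pass `variables` only for the first registration; later updates
    should send "datasetDetails" alone.
    """
    payload = {}
    if variables:
        payload["variables"] = variables
    payload["datasetDetails"] = {
        "name": file_name,
        "datasetType": "file",
        "datasetConfig": {"path": file_name, "fileFormat": file_format},
        "datasourceName": datasource_name,
        "datasourceType": datasource_type,
    }
    return payload


# First registration: declare the row identifier once
first = build_prediction_payload(
    "scoring_day_1.csv", "dmm-shared-bucket", "s3",
    variables=[{"valueType": "string", "variableType": "row_identifier", "name": "event_id"}],
)

# Later update: only the dataset location changes
update = build_prediction_payload("scoring_day_2.csv", "dmm-shared-bucket", "s3")
```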

#### Option 1

Below is an example of an initial Prediction Data Config file, which can be copied and pasted into the UI or automatically sent to Domino via APIs. Full docs here:


https://docs.dominodatalab.com/en/latest/user_guide/bb88ca/monitoring-config-json/


At the end is an example of only updating the "datasetDetails" via the API if you choose to follow approach #2.

**Pro Tip:** Domino recommends saving your config json as a file in your Project files for future reference and modification. See "Example_Prediction_Config.json"

In [20]:
### Example to register the initial Prediction Config via the API
# API Reference: https://docs.dominodatalab.com/en/latest/user_guide/a94c1c/model-monitoring-apis/#_model

# The RowID Name (Optional, used for model quality monitoring. Do this only once.)
Prediction_ID_name = 'event_id'

# File names & format for the prediction data in your external datasource uploaded in Step 1.
prediction_dataset_name = '{}_external_model_scoring_data.csv'.format(user_name)
prediction_dataset_path = '{}_external_model_scoring_data.csv'.format(user_name)
prediction_dataset_fileFormat = "csv"

with open("/mnt/artifacts/DMM_config.yaml") as yamlfile:
    config = yaml.safe_load(yamlfile)

prediction_data_url = "https://{}/model-monitor/v2/api/model/{}/register-dataset/prediction".format(config['url'], config['external_model_id'])


# Set up call headers
headers = {
           'X-Domino-Api-Key': os.environ['DOMINO_USER_API_KEY'],
           'Content-Type': 'application/json'
          }

# Update each variable name, variableType and valueType for your model:

prediction_registration_request = {
    "variables": [
        {
            "valueType": "string",
            "variableType": "row_identifier",
            "name": Prediction_ID_name
        }
    ],
    "datasetDetails": {
        "name": prediction_dataset_name,
        "datasetType": "file",
        "datasetConfig": {
            "path": prediction_dataset_path,
            "fileFormat": prediction_dataset_fileFormat
        },
        "datasourceName": config['DMM_datasource_name'],
        "datasourceType": config['datasource']['type']
    }
}

# Make api call
prediction_response = requests.put(prediction_data_url, headers=headers, data=json.dumps(prediction_registration_request))

# Print response
print(prediction_response.text.encode('utf8'))

print('DONE!')

b''
DONE!


#### Option 2: Upload additional prediction data as separate files to your Monitoring Data Source.

Next is an example for updating the prediction data file via the API if you choose option (2).

This step:
1) Uploads a file with new scoring data to the DMM datasource
2) Uses the Domino Model Monitoring API to update the path to this new data, and ingests the new scoring data for drift detection.

Example scripts to automate these steps using Domino Jobs are in the "external_model_scripts" folder, and are described in detail at the end of this notebook.

In [12]:
# Instantiate a client and fetch the data source instance
object_store = DataSourceClient().get_datasource("{}".format(config['workbench_datasource_name']))

# Upload to the data source (s3 in this case)
object_store.upload_file('{}_external_model_scoring_data_update.csv'.format(user_name), '/mnt/code/data/external_model_scoring_data_update.csv')

In [17]:
### Example to update Prediction Config if uploading prediction data as separate files to your Monitoring Data Source.

# File names & format for the new prediction data file.
prediction_dataset_name = "{}_external_model_scoring_data_update.csv".format(user_name)
prediction_dataset_path = "{}_external_model_scoring_data_update.csv".format(user_name)
prediction_dataset_fileFormat = "csv"

with open("/mnt/artifacts/DMM_config.yaml") as yamlfile:
    config = yaml.safe_load(yamlfile)

prediction_data_url = "https://{}/model-monitor/v2/api/model/{}/register-dataset/prediction".format(config['url'], config['external_model_id'])


# Set up call headers
headers = {
           'X-Domino-Api-Key': os.environ['DOMINO_USER_API_KEY'],
           'Content-Type': 'application/json'
          }

prediction_registration_request = {
    "datasetDetails": {
        "name": prediction_dataset_name,
        "datasetType": "file",
        "datasetConfig": {
            "path": prediction_dataset_path,
            "fileFormat": prediction_dataset_fileFormat
        },
        "datasourceName": config['DMM_datasource_name'],
        "datasourceType": config['datasource']['type']
    }
}

# Make api call
prediction_response = requests.put(prediction_data_url, headers=headers, data=json.dumps(prediction_registration_request))

# Print response
print(prediction_response.text.encode('utf8'))
 
print('DONE!')


b'["Dataset already registered with the model."]'
DONE!


### 4. Set up Model Quality Monitoring (Optional)

There is very little difference in setting up Model Quality Monitoring between Internal and External models, since Domino Model APIs cannot capture actual outcomes after-the-fact. The process is nearly the same as registering prediction data for external models.

For this example, we have a dummy ground truth dataset in the "data" folder: "external_model_ground_truth_data.csv".

1) Upload the sample ground truth data to your DMM data source.
2) Use the Domino Model Monitoring API to set your ground truth labels column name, initial ground truth label file name, and DMM datasource to reference.
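Prediction data and ground truth data are registered through the same endpoint shape, differing only in the final path segment. A small sketch of a URL builder follows. `register_dataset_url` is a hypothetical helper and `domino.example.com` is a placeholder host.

```python
def register_dataset_url(host, model_id, dataset_kind):
    """Build the register-dataset URL used by the cells in this notebook."""
    if dataset_kind not in ("prediction", "ground_truth"):
        raise ValueError("dataset_kind must be 'prediction' or 'ground_truth'")
    return "https://{}/model-monitor/v2/api/model/{}/register-dataset/{}".format(
        host, model_id, dataset_kind
    )


# Placeholder host; the model ID is the one printed in Step 2
gt_url = register_dataset_url(
    "domino.example.com", "66ebed56b7a424ef7c18d8ce", "ground_truth"
)
```

Validating `dataset_kind` up front catches a mistyped segment before any request is sent.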

In [14]:
# instantiate a client and fetch the datasource instance
object_store = DataSourceClient().get_datasource("{}".format(config['workbench_datasource_name']))

# Upload to the datasource (s3 in this case)
object_store.upload_file('{}_external_model_ground_truth_data.csv'.format(user_name), '/mnt/code/data/external_model_ground_truth_data.csv')

The ground truth dataset needs 2 columns:

1) The existing event ID column from the model predictions.

    This column has the join keys for joining ground truth labels to your model's predictions.

2) Your new column containing ground truth labels.
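Before uploading, it can help to confirm that the file actually has both columns. The sketch below uses only the standard library. `missing_ground_truth_columns` is a hypothetical helper, the sample rows are made up, and the column names `event_id` and `iris_ground_truth` are the ones used in this notebook.

```python
import csv
import io


def missing_ground_truth_columns(csv_text, id_column="event_id",
                                 label_column="iris_ground_truth"):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    return [name for name in (id_column, label_column) if name not in header]


# Made-up sample rows for illustration
sample = "event_id,iris_ground_truth\nrow-1,Setosa\nrow-2,Virginica\n"
print(missing_ground_truth_columns(sample))  # prints []
```

For a file on disk, pass the contents of `open(path).read()` as `csv_text`.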


In [15]:
# File names for the ground truth data in your external data source uploaded above.
gt_file_name = "{}_external_model_ground_truth_data.csv".format(user_name)

# Ground Truth column name
GT_column_name = 'iris_ground_truth'

# The original target column name
target_column_name = 'variety'

with open("/mnt/artifacts/DMM_config.yaml") as yamlfile:
    config = yaml.safe_load(yamlfile)

ground_truth_url = "https://{}/model-monitor/v2/api/model/{}/register-dataset/ground_truth".format(config['url'], config['external_model_id'])

print('Registering {} From S3 Bucket in DMM'.format(gt_file_name))
 
# create GT payload    
 
# Set up call headers
headers = {
           'X-Domino-Api-Key': os.environ['DOMINO_USER_API_KEY'],
           'Content-Type': 'application/json'
          }

 
ground_truth_payload = """
{{
    "variables": [{{
    
            "valueType": "categorical",
            "variableType": "ground_truth",
            "name": "{2}", 
            "forPredictionOutput": "{3}"
        
    }}],
    "datasetDetails": {{
            "name": "{0}",
            "datasetType": "file",
            "datasetConfig": {{
                "path": "{0}",
                "fileFormat": "csv"
            }},
            "datasourceName": "{1}",
            "datasourceType": "{4}"
        }}
}}
""".format(gt_file_name, config['DMM_datasource_name'], GT_column_name, target_column_name, config['datasource']['type'])
 
# Make api call
ground_truth_response = requests.request("PUT", ground_truth_url, headers=headers, data = ground_truth_payload)
 
# Print response
print(ground_truth_response.text.encode('utf8'))
 
print('DONE!')

Registering external_model_ground_truth_data.csv From S3 Bucket in DMM
b''
DONE!


### Next Steps

To periodically upload ground truth labels, repeat the previous step, but without the "variables" in the ground truth payload (the variables only need to be registered once). As new ground truth labels are added, point Domino to the path of the new labels in the Monitoring Data Source by calling the same Model Monitoring API:

    ground_truth_payload = """
    {{
        "datasetDetails": {{
            "name": "{0}",
            "datasetType": "file",
            "datasetConfig": {{
                "path": "{0}",
                "fileFormat": "csv"
            }},
            "datasourceName": "{1}",
            "datasourceType": "{2}"
        }}
    }}
    """.format(gt_file_name, config['DMM_datasource_name'], config['datasource']['type'])
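As an alternative to string templates with doubled braces, the same payload can be built as a dict and serialized with `json.dumps`. This is a sketch, not the notebook's own method. `build_ground_truth_payload` is a hypothetical helper and the file names and data source values in the example are illustrative.

```python
import json


def build_ground_truth_payload(file_name, datasource_name, datasource_type,
                               gt_column=None, target_column=None):
    """Return the ground truth payload as a JSON string.

    Pass gt_column and target_column only on the first registration;
    omit them for later uploads so "variables" is not re-sent.
    """
    payload = {}
    if gt_column and target_column:
        payload["variables"] = [{
            "valueType": "categorical",
            "variableType": "ground_truth",
            "name": gt_column,
            "forPredictionOutput": target_column,
        }]
    payload["datasetDetails"] = {
        "name": file_name,
        "datasetType": "file",
        "datasetConfig": {"path": file_name, "fileFormat": "csv"},
        "datasourceName": datasource_name,
        "datasourceType": datasource_type,
    }
    return json.dumps(payload)


# First registration declares the ground truth column
initial = build_ground_truth_payload(
    "gt_day_1.csv", "dmm-shared-bucket", "s3",
    gt_column="iris_ground_truth", target_column="variety",
)

# Later uploads only point to the new file
followup = build_ground_truth_payload("gt_day_2.csv", "dmm-shared-bucket", "s3")
```

Building the dict first avoids brace-escaping mistakes and guarantees valid JSON.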



### Automation with Domino Jobs
To simulate Domino Model Monitoring over time, you can try running the two scripts described below as scheduled Domino Jobs.

Just like with this notebook, be sure to update & commit your workspace to update the config file.

#### Your Model Parameters
- workbench_datasource_name
- datasourceType 
- DMM_datasource_name 
- domino_url 
- DMM_model_id 
- API_key 
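A quick check that the config file holds everything the scripts need can save a failed Job run. The sketch below checks the keys that the cells in this notebook read from `DMM_config.yaml`. How those keys map to the parameter names listed above is an assumption, `missing_config_keys` is a hypothetical helper, and the example values are placeholders. The API key is read from the `DOMINO_USER_API_KEY` environment variable in this notebook, so it is not checked here.

```python
# Keys read from DMM_config.yaml by the cells in this notebook
REQUIRED_KEYS = (
    "workbench_datasource_name",
    "DMM_datasource_name",
    "url",
    "external_model_id",
)


def missing_config_keys(config):
    """Return the names of required settings that are absent or empty."""
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    # The data source type is nested under config['datasource']['type']
    if not (config.get("datasource") or {}).get("type"):
        missing.append("datasource.type")
    return missing


# Placeholder values for illustration
example_config = {
    "workbench_datasource_name": "my-workbench-source",
    "DMM_datasource_name": "dmm-shared-bucket",
    "url": "domino.example.com",
    "datasource": {"type": "s3"},
}
print(missing_config_keys(example_config))  # prints ['external_model_id']
```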

**external_model_daily_batch_job.py** 

This script simulates the external model scoring step, using a Domino job for batch inference. Batch inference in Domino using Domino Jobs can be monitored the same way as an external model. For external models, the scoring data, external model predictions, and prediction ID must be captured manually.

1) First, the script scores the data using a local model, captures the model's predictions, and generates a row identifier used for model quality tracking (Step 4).
2) Second, the script uploads the scoring data, including the external model predictions and prediction ID, to the DMM data source for our external model.
3) Finally, it updates the file paths for the scoring data using the DMM API so that DMM can find the new data when it ingests data from the DMM data source.
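The first step above, capturing predictions and generating a row identifier, can be sketched as follows. This is not the script's actual code: `score_rows` and `toy_predict` are hypothetical, the threshold rule stands in for a real model, and using `uuid4` for the `event_id` is one possible choice of identifier.

```python
import uuid


def score_rows(rows, predict, id_column="event_id", prediction_column="variety"):
    """Return copies of the input rows with a prediction and a unique row ID."""
    scored = []
    for row in rows:
        scored_row = dict(row)  # copy so the input rows stay unchanged
        scored_row[prediction_column] = predict(row)
        scored_row[id_column] = str(uuid.uuid4())
        scored.append(scored_row)
    return scored


def toy_predict(row):
    """Hypothetical stand-in for a real model."""
    return "Setosa" if row["petal length (cm)"] < 2.5 else "Versicolor"


scored = score_rows(
    [{"petal length (cm)": 1.4}, {"petal length (cm)": 4.7}],
    toy_predict,
)
```

The `event_id` values written here are the join keys that the ground truth file must reuse later.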

**external_model_daily_ground_truth.py** 

This script simulates sending ground truth labels to the model after the predictions are made, for model quality monitoring.

1) First, it generates dummy ground truth labels (we unfortunately don't have a botanist to label the irises).
2) Second, the script uploads the ground truth data to the DMM data source for our external model.
3) Finally, it updates the file paths for the ground truth data using the DMM API so that DMM can find the new labels when it ingests ground truth data from the DMM data source.
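One way to generate dummy labels is to copy each prediction with some probability and otherwise pick a different class. The sketch below is an assumption about how such labels could be produced, not a description of the script. `make_dummy_ground_truth`, the `accuracy` parameter, and the class names are illustrative.

```python
import random


def make_dummy_ground_truth(scored_rows, classes, accuracy=0.9, seed=0,
                            id_column="event_id", prediction_column="variety",
                            label_column="iris_ground_truth"):
    """Return one ground truth row per prediction, keyed by the same row ID."""
    rng = random.Random(seed)  # seeded so repeated runs give the same labels
    labels = []
    for row in scored_rows:
        predicted = row[prediction_column]
        if rng.random() < accuracy:
            label = predicted
        else:
            # Simulate a wrong prediction by choosing any other class
            label = rng.choice([c for c in classes if c != predicted])
        labels.append({id_column: row[id_column], label_column: label})
    return labels


predictions = [
    {"event_id": "a", "variety": "Setosa"},
    {"event_id": "b", "variety": "Virginica"},
]
labels = make_dummy_ground_truth(
    predictions, ["Setosa", "Versicolor", "Virginica"], accuracy=1.0
)
```

Lowering `accuracy` makes the simulated model quality metrics in DMM degrade accordingly.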


To test these scripts, ensure the batch job runs before daily scoring, and that DMM is scheduled to ingest the scoring and ground truth data after daily_scoring_upload has finished.

If you schedule these two jobs, be sure that ground truth runs after the predictions!