## Custom Metrics with Domino Data Lab

This example generates alerts for bias in model predictions using Domino Model Monitoring and IBM's AI Fairness 360 (AIF360) package:
https://aif360.mybluemix.net/

(1) Use customizable Domino Environments to build an environment for AIF360.

(2) Connect to Domino's Model Monitoring Registry.

(3) Use AIF360 to apply the bias test of your choice, and set your alert thresholds with Domino Monitoring.

(4) Integrate your alert with Domino Model Monitoring, alongside drift and model quality monitoring.


In [1]:
# Import necessary packages

import sys
import os
sys.path.insert(1, "../")  

import numpy as np
import pandas as pd
np.random.seed(0)

from aif360.datasets import GermanDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing

from IPython.display import Markdown, display



### Connect to a registered model in Domino Model Monitoring

In [2]:
import domino

project = "{}/{}".format(os.environ['DOMINO_PROJECT_OWNER'],
                         os.environ['DOMINO_PROJECT_NAME'])

print("Project name: {}".format(project))

# Registered model for monitoring 
dmm_model_id = "644ffac7a9872366aac7065d"

# Initiate custom metric client
d = domino.Domino(project)
metrics_client = d.custom_metrics_client()

Project name: dave_heinicke/AI-Fairness-360
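
The cell above raises a bare `KeyError` if the notebook runs outside a Domino workspace. A minimal sketch of a more explicit check follows; `project_slug` is a hypothetical helper, not part of the Domino client, and it only assumes the two environment variables the notebook already reads.

```python
import os


def project_slug(env=None):
    """Build "<owner>/<name>" from the project environment variables."""
    env = os.environ if env is None else env
    required = ("DOMINO_PROJECT_OWNER", "DOMINO_PROJECT_NAME")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return "{}/{}".format(env["DOMINO_PROJECT_OWNER"], env["DOMINO_PROJECT_NAME"])


# Demonstration with a stand-in environment (values copied from the output above).
demo_slug = project_slug({
    "DOMINO_PROJECT_OWNER": "dave_heinicke",
    "DOMINO_PROJECT_NAME": "AI-Fairness-360",
})
print("Project name: {}".format(demo_slug))
```

Inside a Domino workspace, calling `project_slug()` with no arguments would read the real environment.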


### Example: Sample dataset from AIF360 to use for bias detection

In [3]:
# Load dataset from AIF360
dataset_orig = GermanDataset(
    protected_attribute_names=['age'],           # this dataset also contains protected
                                                 # attribute for "sex" which we do not
                                                 # consider in this evaluation
    privileged_classes=[lambda x: x >= 25],      # age >=25 is considered privileged
    features_to_drop=['personal_status', 'sex'] # ignore sex-related attributes
)


dataset_orig_train, dataset_orig_test = dataset_orig.split([0.7], shuffle=True)

# Define privileged groups
privileged_groups = [{'age': 1}]
unprivileged_groups = [{'age': 0}]
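
The group definitions `{'age': 1}` and `{'age': 0}` work because the `privileged_classes` predicate turns the numeric age into a binary flag. The sketch below illustrates that encoding on made-up ages; it mirrors the lambda from the cell above but is not AIF360's own code.

```python
# Same predicate as in the GermanDataset call: age >= 25 is privileged.
is_privileged = lambda age: age >= 25

# Made-up ages, for illustration only.
ages = [19, 24, 25, 40, 67]

# 1 marks the privileged group, 0 the unprivileged group.
encoded_age = [1 if is_privileged(age) else 0 for age in ages]
print(encoded_age)
```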

#### Calculate Mean Difference in outcomes on the original dataset

In [4]:
metric_orig_train = BinaryLabelDatasetMetric(dataset_orig_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_train.mean_difference())

original_mean_difference = abs(metric_orig_train.mean_difference())

Difference in mean outcomes between unprivileged and privileged groups = -0.169905
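
To make the number above concrete: the mean difference is the favorable-outcome rate of the unprivileged group minus that of the privileged group, so a negative value means the unprivileged group receives favorable outcomes less often. The following sketch recomputes it by hand on a toy dataset. The data and helper names are invented for illustration, and instance weights are ignored.

```python
# Toy data (made up): label 1 = favorable outcome, 0 = unfavorable.
# Group 1 = privileged (age >= 25), group 0 = unprivileged (age < 25).
labels = [1, 1, 1, 0, 1, 0, 0, 1]
groups = [1, 1, 1, 1, 0, 0, 0, 0]


def favorable_rate(labels, groups, group_value):
    """Fraction of favorable outcomes within one group."""
    selected = [y for y, g in zip(labels, groups) if g == group_value]
    return sum(selected) / len(selected)


def mean_difference(labels, groups):
    """Unprivileged favorable rate minus privileged favorable rate."""
    return favorable_rate(labels, groups, 0) - favorable_rate(labels, groups, 1)


toy_mean_difference = mean_difference(labels, groups)
print("Toy mean difference = %f" % toy_mean_difference)
```

Here the privileged group has a favorable rate of 3/4 and the unprivileged group 2/4, so the toy mean difference is -0.25.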


#### Reweigh the dataset to mitigate bias in the training data

In [5]:
RW = Reweighing(unprivileged_groups=unprivileged_groups,
                privileged_groups=privileged_groups)
dataset_transf_train = RW.fit_transform(dataset_orig_train)

metric_transf_train = BinaryLabelDatasetMetric(dataset_transf_train, 
                                               unprivileged_groups=unprivileged_groups,
                                               privileged_groups=privileged_groups)
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_transf_train.mean_difference())

# Use the metric computed on the reweighed dataset, not the original one
corrected_mean_difference = abs(metric_transf_train.mean_difference())

Difference in mean outcomes between unprivileged and privileged groups = 0.000000
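
Why does the difference drop to zero? The usual reweighing idea assigns each (group, label) combination the weight `P(group) * P(label) / P(group, label)`, which makes group and label statistically independent in the weighted data. The sketch below applies that formula to toy data. It illustrates the idea under that assumption and is not AIF360's implementation; the data and names are invented.

```python
from collections import Counter

# Toy data (made up): label 1 = favorable, group 1 = privileged.
labels = [1, 1, 1, 0, 1, 0, 0, 1]
groups = [1, 1, 1, 1, 0, 0, 0, 0]

n = len(labels)
group_counts = Counter(groups)
label_counts = Counter(labels)
joint_counts = Counter(zip(groups, labels))

# weight = P(group) * P(label) / P(group, label), expressed with counts
weights = [
    group_counts[g] * label_counts[y] / (n * joint_counts[(g, y)])
    for g, y in zip(groups, labels)
]


def weighted_favorable_rate(group_value):
    """Weighted share of favorable outcomes within one group."""
    favorable = sum(w for w, y, g in zip(weights, labels, groups)
                    if g == group_value and y == 1)
    total = sum(w for w, g in zip(weights, groups) if g == group_value)
    return favorable / total


weighted_mean_difference = weighted_favorable_rate(0) - weighted_favorable_rate(1)
print("Weighted mean difference = %f" % weighted_mean_difference)
```

Unweighted, this toy data has a mean difference of -0.25. After weighting, both groups have a favorable rate of 0.625, so the difference vanishes up to floating-point error.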


### Log as a Domino Model Monitoring custom metric

Using Domino's custom metrics client, log the difference in means measured on the sample training data, both before and after reweighing.

In [6]:
metrics_client.log_metrics([
    {
        "modelMonitoringId": dmm_model_id,
        "metric": "Age_Mean_Difference",
        "value": corrected_mean_difference,
        "timestamp": "2023-05-08T00:00:00Z",
        "tags": {"example_tag1": "value1", "example_tag2": "value2"},
    },
    {
        "modelMonitoringId": dmm_model_id,
        "metric": "Age_Mean_Difference",
        "value": original_mean_difference,
        "timestamp": "2023-05-08T00:00:10Z",
    },
])
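
The timestamps in the logging call are hard-coded. A small helper can build records with the current UTC time in the same `YYYY-MM-DDTHH:MM:SSZ` format instead. This is a sketch only: `build_metric_record` is a hypothetical helper, and the field names are copied from the call above rather than from a documented schema.

```python
from datetime import datetime, timezone


def build_metric_record(model_id, metric, value, tags=None, when=None):
    """Return one record shaped like the dictionaries passed to log_metrics."""
    when = when or datetime.now(timezone.utc)
    record = {
        "modelMonitoringId": model_id,
        "metric": metric,
        "value": float(value),
        # Normalise to UTC and format with a trailing "Z"
        "timestamp": when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    if tags:
        record["tags"] = dict(tags)
    return record


# Fixed time so the example is reproducible; omit `when` to use the current time.
example_record = build_metric_record(
    "644ffac7a9872366aac7065d",
    "Age_Mean_Difference",
    0.169905,
    when=datetime(2023, 5, 8, 0, 0, 10, tzinfo=timezone.utc),
)
print(example_record)
```

A list of such records could then be passed to `metrics_client.log_metrics(...)` in place of the literal dictionaries.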


#### Set the custom trigger alert for model monitoring

Tell the metrics client where to set the alert threshold for the difference in means in the sample training data.

Include a description that will be sent when this alert threshold is exceeded.

In [7]:
metrics_client.trigger_alert(dmm_model_id, "Age_Mean_Difference",
                             original_mean_difference,  # measured value reported with the alert
                             condition=metrics_client.GREATER_THAN,
                             lower_limit=0.1,
                             upper_limit=999,
                             description="AIF 360 has detected Age Difference Factor Greater than 0.1")
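
In a scheduled job you would probably want to raise the alert only when the measured value actually crosses the limit. The sketch below shows one way to express that check. `exceeds_threshold` is a hypothetical helper, the two values are the mean differences printed earlier in this notebook, and the commented-out call assumes the same `metrics_client` and `dmm_model_id` as above.

```python
ALERT_LOWER_LIMIT = 0.1  # same limit as passed to trigger_alert above


def exceeds_threshold(value, lower_limit=ALERT_LOWER_LIMIT):
    """True when the magnitude of the mean difference is above the limit."""
    return abs(value) > lower_limit


# Mean differences printed earlier: original data and reweighed data.
observed = {"original": -0.169905, "reweighed": 0.0}

alerts_needed = {name: exceeds_threshold(value) for name, value in observed.items()}
for name, needed in alerts_needed.items():
    print("%s: alert needed = %s" % (name, needed))
    # if needed:
    #     metrics_client.trigger_alert(dmm_model_id, "Age_Mean_Difference",
    #                                  abs(observed[name]),
    #                                  condition=metrics_client.GREATER_THAN,
    #                                  lower_limit=ALERT_LOWER_LIMIT)
```

Comparing the absolute value matches how the notebook stores `original_mean_difference` and `corrected_mean_difference`.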

#### To verify the metric was logged, print a history of past metrics logged in Domino Model Monitoring

In [13]:
res = metrics_client.read_metrics(dmm_model_id, "Age_Mean_Difference",
                                  "2023-05-08T00:00:00Z", "2023-05-08T00:00:10Z")

res_df = pd.DataFrame.from_dict(res['metricValues'])

res_df.head()

# res['metricValues']

|   | timestamp | value | tags |
|---|---|---|---|
| 0 | 2023-05-08T00:00:00Z | -0.169905 | {'example_tag1': 'value1', 'example_tag2': 'va... |
| 1 | 2023-05-08T00:00:00Z | 0.169905 | {'example_tag1': 'value1', 'example_tag2': 'va... |
| 2 | 2023-05-08T00:00:00Z | 0.169905 | {'example_tag1': 'value1', 'example_tag2': 'va... |
| 3 | 2023-05-08T00:00:00Z | 0.169905 | {'example_tag1': 'value1', 'example_tag2': 'va... |
| 4 | 2023-05-08T00:00:00Z | 0.169905 | {'example_tag1': 'value1', 'example_tag2': 'va... |
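
Beyond eyeballing the table, the returned history can be checked programmatically. The sketch below works on a stand-in response shaped like `res` above (a `metricValues` list of rows with `timestamp`, `value`, and `tags`). That shape is inferred from the output shown here, the sample rows are invented, and the helper names are hypothetical.

```python
from datetime import datetime

# Stand-in for the read_metrics response; rows are made up for illustration.
sample_res = {
    "metricValues": [
        {"timestamp": "2023-05-08T00:00:10Z", "value": 0.169905, "tags": {}},
        {"timestamp": "2023-05-08T00:00:00Z", "value": 0.0,
         "tags": {"example_tag1": "value1", "example_tag2": "value2"}},
    ]
}


def parse_timestamp(text):
    """Parse the 'YYYY-MM-DDTHH:MM:SSZ' timestamps used in this notebook."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")


def latest_value(metric_values):
    """Value of the most recently timestamped row."""
    newest = max(metric_values, key=lambda row: parse_timestamp(row["timestamp"]))
    return newest["value"]


def rows_over_limit(metric_values, limit):
    """Rows whose absolute value exceeds the limit, oldest first."""
    ordered = sorted(metric_values, key=lambda row: parse_timestamp(row["timestamp"]))
    return [row for row in ordered if abs(row["value"]) > limit]


print("Latest value:", latest_value(sample_res["metricValues"]))
print("Rows over 0.1:", len(rows_over_limit(sample_res["metricValues"], 0.1)))
```

With the real response, pass `res['metricValues']` to the same helpers.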
