In [None]:
!pip install trustyai

The purpose of this demo is to highlight the capabilities of the `displayMetrics` module in the TrustyAI-Python library. It shows how to get TrustyAI metrics from a Jupyter notebook, rather than from the OpenShift CLI, by using `displayMetrics`. It is the Python counterpart to a pre-existing demo, `odh-trusty-ai-demos/2-BiasMonitoring`, which explores how to use TrustyAI to monitor models for bias. Before getting started, make sure you have completed the `odh-trusty-ai-demos/1-Installation` demo and the `Deploy Models` section of the `odh-trusty-ai-demos/2-BiasMonitoring` demo. In short, you should have the following prerequisites in place:

* An ODH installation
* A TrustyAI Operator
* A model-namespace project containing an instance of the TrustyAI Service
* Deployed models 



# Bias Monitoring

In [5]:
# common 
import json
import os 
import pandas as pd
import requests
import subprocess

import warnings
warnings.filterwarnings("ignore") 

# local
from display_metrics import *
from send_data_batch import *

In [6]:
# instantiate the client 
TOKEN = (subprocess.check_output('oc whoami -t', shell=True)).decode().strip('\n')
trustyaiClient = displayMetrics(token=TOKEN)

Error from server (NotFound): routes.route.openshift.io "trustyai-service" not found


CalledProcessError: Command 'oc get route/trustyai-service --template={{.spec.host}}' returned non-zero exit status 1.

For the purposes of this tutorial, we will use the existing data in `2-BiasMonitoring/data/`, which is in `.json` format. We expect most users of the `displayMetrics` module to already have their data in a pandas DataFrame, in which case the following step of converting `.json` data files to a DataFrame can be skipped.

In [4]:
# convert training data in json format to pandas df
train_df = json_to_df(data_path='2-BiasMonitoring/data/training/', batch_list=list(range(0, 2251, 250)))
display(train_df.head())
train_df.shape

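|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 267750.0 | 2.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 20991.0 | 4727.0 |
| 1 | 0.0 | 315000.0 | 2.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 10450.0 | 2077.0 |
| 2 | 1.0 | 405000.0 | 3.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 11842.0 | 2016.0 |
| 3 | 0.0 | 229500.0 | 2.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 18325.0 | 1256.0 |
| 4 | 0.0 | 180000.0 | 2.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 14781.0 | 2443.0 |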


(2368, 11)

## Send Training Data to Models

Let's send the training data to our models in batches of 250. Since our number of samples (2368) is not perfectly divisible by 250, the last batch will have 118 samples. We send each batch to our models by first writing it to a `.json` file and then sending that `.json` file to the models. This step is redundant given that there are pre-existing `.json` files for our batched data; however, we assume that most users can read their data directly into a pandas DataFrame.

In [36]:
# define batch size
batch_size = 250

# create a list of batched dfs, each identified by the index of its first row
train_df_list = [train_df[i:i+batch_size] for i in range(0, len(train_df), batch_size)]
batch_ids = range(0, len(train_df), batch_size)

# iterate through each batch in the list of batches
for batch, batch_id in zip(train_df_list, batch_ids):
    json_path = f'2-BiasMonitoring/data/training/{batch_id}.json'
    # convert batched df to json format and write to json file
    json_data = df_to_json(batch, 'customer_data_input', json_path)
    # send data in json format to models
    send_data_batch(json_path)

Already on project "model-namespace" on server "https://api.trustyai-2024.hzud.p3.openshiftapps.com:443".


IndexError: list index out of range
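As a quick check of the batching arithmetic described for the training data (2368 rows in batches of 250), the sketch below computes the `[start, end)` row range of every batch in plain Python, with no cluster access needed. The helper name `batch_bounds` is hypothetical; it mirrors the `train_df[i:i+batch_size]` slicing used in the cell above.

```python
def batch_bounds(n_rows, batch_size):
    """Return (start, end) row ranges covering n_rows in chunks of batch_size.

    Hypothetical helper: mirrors train_df[i:i+batch_size]
    for i in range(0, n_rows, batch_size).
    """
    return [(start, min(start + batch_size, n_rows))
            for start in range(0, n_rows, batch_size)]


bounds = batch_bounds(2368, 250)
print(len(bounds))                    # → 10
print(bounds[-1])                     # → (2250, 2368)
print(bounds[-1][1] - bounds[-1][0])  # → 118
```

The start offsets (0, 250, ..., 2250) are the same values as the `batch_list=list(range(0, 2251, 250))` passed to `json_to_df` earlier, so naming each batch file by its start offset keeps the two in step.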

## Examining TrustyAI's Model Metadata

While we can observe that our models are receiving data through the OpenShift cluster metrics, we can check this more directly, and get more information, by retrieving the metadata that TrustyAI holds for our models.

In [59]:
model_metadata = trustyaiClient.get_model_metadata()
print(model_metadata)

[{'metrics': {'scheduledMetadata': {'metricCounts': {}}}, 'data': {'inputSchema': {'items': {'customer_data_input-6': {'type': 'DOUBLE', 'name': 'customer_data_input-6', 'values': [1.0, 0.0], 'index': 6}, 'customer_data_input-5': {'type': 'DOUBLE', 'name': 'customer_data_input-5', 'values': [0.0, 1.0], 'index': 5}, 'customer_data_input-4': {'type': 'DOUBLE', 'name': 'customer_data_input-4', 'values': [1.0, 0.0], 'index': 4}, 'customer_data_input-3': {'type': 'DOUBLE', 'name': 'customer_data_input-3', 'values': [1.0, 0.0], 'index': 3}, 'customer_data_input-2': {'type': 'DOUBLE', 'name': 'customer_data_input-2', 'values': [2.0, 1.0, 4.0, 3.0], 'index': 2}, 'customer_data_input-1': {'type': 'DOUBLE', 'name': 'customer_data_input-1', 'values': [135000.0, 180000.0, 267750.0, 216000.0, 225000.0, 234000.0, 252000.0, 270000.0, 360000.0, 157500.0, 337500.0, 166500.0, 202500.0, 211500.0, 427500.0, 229500.0, 247500.0, 67500.0, 279000.0, 297000.0, 315000.0, 90000.0, 405000.0, 112500.0, 126000.0, 1
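The metadata above is a list with one entry per model, and each input column sits under `data.inputSchema.items` with its `type`, `index`, and the distinct `values` observed so far. A minimal sketch for summarizing that structure, assuming the shape shown in the output; `summarize_inputs` is a hypothetical helper and `sample` is trimmed from the printed output (two columns only).

```python
def summarize_inputs(model_metadata):
    """Return one list of (index, name, type, n_distinct_values) tuples per model.

    Assumes each entry looks like the printed output:
    {'data': {'inputSchema': {'items': {name: {'type', 'name', 'values', 'index'}}}}}
    """
    summaries = []
    for model in model_metadata:
        items = model['data']['inputSchema']['items']
        rows = [(col['index'], col['name'], col['type'], len(col['values']))
                for col in items.values()]
        summaries.append(sorted(rows))  # order columns by their schema index
    return summaries


# Trimmed from the metadata output above
sample = [{
    'metrics': {'scheduledMetadata': {'metricCounts': {}}},
    'data': {'inputSchema': {'items': {
        'customer_data_input-6': {'type': 'DOUBLE', 'name': 'customer_data_input-6',
                                  'values': [1.0, 0.0], 'index': 6},
        'customer_data_input-2': {'type': 'DOUBLE', 'name': 'customer_data_input-2',
                                  'values': [2.0, 1.0, 4.0, 3.0], 'index': 2},
    }}},
}]

for row in summarize_inputs(sample)[0]:
    print(row)
# → (2, 'customer_data_input-2', 'DOUBLE', 4)
# → (6, 'customer_data_input-6', 'DOUBLE', 2)
```

If the live response has the same shape, `summarize_inputs(model_metadata)` would give the same view for every deployed model; a column with only two observed values, such as `customer_data_input-6`, is likely a yes/no flag.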



## Label Data Fields

Currently, our data does not have meaningful input and output names. Let's assign labels to them.

In [61]:
trustyaiClient.label_data_fields('2-BiasMonitoring/scripts/apply_name_mapping.sh')

Feature and output name mapping successfully applied.
Feature and output name mapping successfully applied.
customer_data_input-0 -> Number of Children
customer_data_input-1 -> Total Income
customer_data_input-2 -> Number of Total Family Members
customer_data_input-3 -> Is Male-Identifying?
customer_data_input-4 -> Owns Car?
customer_data_input-5 -> Owns Realty?
customer_data_input-6 -> Is Partnered?
customer_data_input-7 -> Is Employed?
customer_data_input-8 -> Live with Parents?
customer_data_input-9 -> Age
customer_data_input-10 -> Length of Employment?
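The script applies the mapping printed above on the TrustyAI side. Keeping the same mapping as a Python dict is handy for labeling data locally too. A minimal sketch: the labels are copied from the output above, the raw names follow the `customer_data_input-<index>` pattern seen there, and `FEATURE_NAMES` / `input_mapping` are names we introduce for illustration.

```python
import json

# Display names in input-index order, copied from the mapping printed above
FEATURE_NAMES = [
    "Number of Children",
    "Total Income",
    "Number of Total Family Members",
    "Is Male-Identifying?",
    "Owns Car?",
    "Owns Realty?",
    "Is Partnered?",
    "Is Employed?",
    "Live with Parents?",
    "Age",
    "Length of Employment?",
]

# Raw names in the output follow the pattern "customer_data_input-<index>"
input_mapping = {f"customer_data_input-{i}": label
                 for i, label in enumerate(FEATURE_NAMES)}

print(json.dumps(input_mapping, indent=2))
```

Because `train_df` has integer columns 0 through 10 in the same order, `train_df.columns = FEATURE_NAMES` would give the local DataFrame the same labels TrustyAI now uses, which makes local summaries easier to compare with the service's metrics.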


## Check Model Fairness

As a sanity check, let's ensure that our models are fair over the training data. 
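The metric requested below is Statistical Parity Difference (SPD): the difference between two groups' rates of receiving the favorable outcome (here `Will Default? = 0`). A minimal sketch of the arithmetic on plain lists with hypothetical toy predictions. Which group is subtracted from which is a convention; this sketch uses the common AIF360-style definition (unprivileged rate minus privileged rate), so check the sign convention in the TrustyAI documentation before comparing numbers. The ±0.1 bounds are the `thresholds` the service reports in its responses.

```python
def favorable_rate(outcomes, favorable):
    """Fraction of predictions in one group equal to the favorable outcome."""
    return sum(1 for y in outcomes if y == favorable) / len(outcomes)


def statistical_parity_difference(unprivileged, privileged, favorable=0):
    """SPD = P(favorable | unprivileged) - P(favorable | privileged).

    Sign convention assumed (AIF360-style); 0 means equal rates.
    """
    return favorable_rate(unprivileged, favorable) - favorable_rate(privileged, favorable)


def within_bounds(value, lower=-0.1, upper=0.1):
    """Fair by the +/-0.1 thresholds reported by the TrustyAI service."""
    return lower <= value <= upper


# Toy "Will Default?" predictions; 0 (will not default) is the favorable outcome
non_male = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]  # 6/10 favorable
male = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]      # 9/10 favorable

spd = statistical_parity_difference(non_male, male)
print(round(spd, 2))       # → -0.3
print(within_bounds(spd))  # → False
```

A value near 0, like the SPD of about 0.011 reported for the alpha model, means both groups receive the favorable outcome at nearly the same rate.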

In [63]:
alpha_data = {
        "modelId": "demo-loan-nn-onnx-alpha",
        "protectedAttribute": "Is Male-Identifying?",
        "privilegedAttribute": 1.0,
        "unprivilegedAttribute": 0.0,
        "outcomeName": "Will Default?",
        "favorableOutcome": 0,
        "batchSize": 5000
    }

beta_data = {
        "modelId": "demo-loan-nn-onnx-beta",
        "protectedAttribute": "Is Male-Identifying?",
        "privilegedAttribute": 1.0,
        "unprivilegedAttribute": 0.0,
        "outcomeName": "Will Default?",
        "favorableOutcome": 0,
        "batchSize": 5000
    }

alpha_fairness_metric = trustyaiClient.get_metric_request(
    data=alpha_data, 
    metric='group/fairness/spd', 
    reoccuring=False
)
beta_fairness_metric = trustyaiClient.get_metric_request(
    data=beta_data, 
    metric='group/fairness/spd',
    reoccuring=False
)
print(f'Cumulative fairness_metric for {alpha_data}: {alpha_fairness_metric}')
print(f'Cumulative fairness_metric for {beta_data}: {beta_fairness_metric}') 

Cumulative fairness_metric for {'modelId': 'demo-loan-nn-onnx-alpha', 'protectedAttribute': 'Is Male-Identifying?', 'privilegedAttribute': 1.0, 'unprivilegedAttribute': 0.0, 'outcomeName': 'Will Default?', 'favorableOutcome': 0, 'batchSize': 5000}: {"timestamp":"2024-02-07T00:04:22.301+00:00","type":"metric","value":0.010840108401084014,"namedValues":null,"specificDefinition":"The SPD of 0.010840 indicates that the likelihood of Group:Is Male-Identifying?=1.0 receiving Outcome:Will Default?=0 was 1.084011 percentage points higher than that of Group:Is Male-Identifying?=0.0.","name":"SPD","id":"7705c04d-1edd-44fe-ae3f-9a28bf94bf62","thresholds":{"lowerBound":-0.1,"upperBound":0.1,"outsideBounds":false}}
Cumulative fairness_metric for {'modelId': 'demo-loan-nn-onnx-beta', 'protectedAttribute': 'Is Male-Identifying?', 'privilegedAttribute': 1.0, 'unprivilegedAttribute': 0.0, 'outcomeName': 'Will Default?', 'favorableOutcome': 0, 'batchSize': 5000}: {"timestamp":"2024-02-07T00:04:22.458+00
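The client returns each metric as a JSON string. A sketch for pulling out the value and the service's own threshold verdict, assuming the response shape shown above; `parse_metric_response` is a hypothetical helper, and `alpha_response` is the alpha model's response from the output with `specificDefinition` trimmed.

```python
import json


def parse_metric_response(response_text):
    """Extract name, value and bound check from a TrustyAI metric response."""
    body = json.loads(response_text)
    thresholds = body["thresholds"]
    return {
        "name": body["name"],
        "value": body["value"],
        "lower": thresholds["lowerBound"],
        "upper": thresholds["upperBound"],
        "fair": not thresholds["outsideBounds"],  # the service's own verdict
    }


alpha_response = ('{"timestamp":"2024-02-07T00:04:22.301+00:00","type":"metric",'
                  '"value":0.010840108401084014,"namedValues":null,'
                  '"name":"SPD","id":"7705c04d-1edd-44fe-ae3f-9a28bf94bf62",'
                  '"thresholds":{"lowerBound":-0.1,"upperBound":0.1,"outsideBounds":false}}')

result = parse_metric_response(alpha_response)
print(f"{result['name']} = {result['value']:.4f}, fair: {result['fair']}")
# → SPD = 0.0108, fair: True
```

Relying on `outsideBounds` rather than re-deriving the check keeps the notebook consistent with whatever thresholds the service is configured with.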

## Schedule a Fairness Metric Request

While it's great that our models are fair over the training data, it's more critical that they remain fair over unseen real-world data. Let's schedule some metric requests, which will be computed at recurring intervals throughout deployment. The following cell returns request IDs, which can be used later to delete these scheduled requests.

In [None]:
alpha_request_id = trustyaiClient.get_metric_request(
    data=alpha_data, 
    metric='group/fairness/spd', 
    reoccuring=True
)
beta_request_id = trustyaiClient.get_metric_request(
    data=beta_data, 
    metric='group/fairness/spd',
    reoccuring=True
)
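Keeping the returned IDs makes it possible to delete the scheduled requests later. The scheduling response printed later in this notebook is a JSON string of the form `{"requestId": ..., "timestamp": ...}`. Assuming `get_metric_request(..., reoccuring=True)` returns the same shape (not confirmed here), a small hypothetical helper can extract the ID whether it arrives as a string or an already-parsed dict.

```python
import json


def extract_request_id(response):
    """Return the requestId from a scheduling response (JSON string or dict)."""
    body = json.loads(response) if isinstance(response, str) else response
    return body["requestId"]


# Response format as printed by schedule_metric_request later in the notebook
example = ('{"requestId":"d5a75959-9e69-48db-ad9e-4c68723d786a",'
           '"timestamp":"2024-02-01T16:43:38.092+00:00"}')
print(extract_request_id(example))  # → d5a75959-9e69-48db-ad9e-4c68723d786a
```

In the live notebook this would be `extract_request_id(alpha_request_id)`, provided that call returns the response shape shown.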

## Schedule an Identity Metric Request

Let's also monitor the average values of various data fields over time. Since `Will Default?` and `Is Male-Identifying?` take the values 0 and 1, their averages give the proportion of loan-default predictions and the proportion of male-identifying applicants.

In [None]:
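# the model IDs come from the fairness payloads defined above
for model in [alpha_data["modelId"], beta_data["modelId"]]:
    # a comma is required here; without it Python concatenates the two strings
    for field in ["Is Male-Identifying?", "Will Default?"]:
        data = {
            "columnName": field,
            "batchSize": 250,
            "modelId": model
        }

        trustyaiClient.get_metric_request(data=data, metric='identity', reoccuring=True)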

[{'metrics': {'scheduledMetadata': {'metricCounts': {}}}, 'data': {'inputSchema': {'items': {'customer_data_input-6': {'type': 'DOUBLE', 'name': 'customer_data_input-6', 'values': [1.0, 0.0], 'index': 6}, 'customer_data_input-5': {'type': 'DOUBLE', 'name': 'customer_data_input-5', 'values': [0.0, 1.0], 'index': 5}, 'customer_data_input-4': {'type': 'DOUBLE', 'name': 'customer_data_input-4', 'values': [1.0, 0.0], 'index': 4}, 'customer_data_input-3': {'type': 'DOUBLE', 'name': 'customer_data_input-3', 'values': [1.0, 0.0], 'index': 3}, 'customer_data_input-2': {'type': 'DOUBLE', 'name': 'customer_data_input-2', 'values': [2.0, 1.0, 4.0, 5.0, 3.0], 'index': 2}, 'customer_data_input-1': {'type': 'DOUBLE', 'name': 'customer_data_input-1', 'values': [135000.0, 292500.0, 180000.0, 267750.0, 216000.0, 225000.0, 234000.0, 252000.0, 270000.0, 360000.0, 450000.0, 148500.0, 157500.0, 337500.0, 166500.0, 202500.0, 211500.0, 427500.0, 229500.0, 247500.0, 67500.0, 279000.0, 297000.0, 315000.0, 90000
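Because both monitored fields are 0/1 columns, their average over a batch is simply the share of 1s. A sketch of that calculation over the most recent `batchSize` rows, using toy data. Our assumption (not confirmed by this notebook) is that the identity metric averages the last `batchSize` observations; the helper name `identity_metric` is hypothetical.

```python
def identity_metric(values, batch_size=250):
    """Mean of the most recent `batch_size` values (all of them if fewer)."""
    window = values[-batch_size:]
    if not window:
        raise ValueError("no observations to average")
    return sum(window) / len(window)


# Toy 'Will Default?' predictions (1 = predicted default), 350 rows in total
predictions = [0] * 200 + [1] * 50 + [0] * 100

# The last 250 rows contain the 50 predicted defaults
print(identity_metric(predictions))  # → 0.2
```

A drift in this value over time, for example a rising share of predicted defaults, is the kind of change the scheduled identity requests are meant to surface.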



## Run Model Inference

Now let's send some real-world data to our models and observe whether they remain fair. Again, for the purposes of this demo, we first convert data in `.json` format to a pandas DataFrame, but we expect most users to skip this step.

In [None]:
# convert real-world data in json format to pandas df
test_df = json_to_df(data_path='2-BiasMonitoring/data/', batch_list=list(range(1, 9)))
display(test_df.head())
test_df.shape

In [6]:
# split the real-world data into batches and send each one to the models
test_df_list = [test_df[i:i+batch_size] for i in range(0, len(test_df), batch_size)]
for batch_num, batch in enumerate(test_df_list):
    json_path = f'2-BiasMonitoring/data/{batch_num:02d}.json'
    # convert to json, write the file, then send it to the models
    json_data = df_to_json(batch, 'customer_data_input', json_path)
    send_data_batch(json_path)

{"timestamp":"2024-02-01T16:43:34.647+00:00","type":"metric","value":-0.0029676404469311524,"namedValues":null,"specificDefinition":"The SPD of -0.002968 indicates that the likelihood of Group:Is Male-Identifying?=1.0 receiving Outcome:Will Default?=0 was -0.296764 percentage points lower than that of Group:Is Male-Identifying?=0.0.","name":"SPD","id":"84716efd-273e-4c6b-917d-f83c2460363e","thresholds":{"lowerBound":-0.1,"upperBound":0.1,"outsideBounds":false}}


In [7]:
request_ids = trustyaiClient.schedule_metric_request(alpha_data, 'group/fairness/spd')
print(request_ids)

{"requestId":"d5a75959-9e69-48db-ad9e-4c68723d786a","timestamp":"2024-02-01T16:43:38.092+00:00"}


In [8]:
import pandas as pd

data = [
    [47.468246185717554, 575.6911203538863, 10.844143722475575, 14.81343667761101],
    [47.45380690750797, 478.6846214843319, 13.462184703540503, 20.764525303373535],
]
# Create a small DataFrame (two rows, four feature columns) and store it in df
df = pd.DataFrame(data=data)




In [17]:

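# Look up the TrustyAI service route (the same `oc get route` lookup displayMetrics performs)
TRUSTY_ROUTE = subprocess.check_output(
    'oc get route/trustyai-service --template={{.spec.host}}', shell=True
).decode().strip()

header = {'Content-Type': 'application/json', 'Authorization': f'Bearer {TOKEN}'}

data = {"columnName": "Is Male-Identifying?",
        "batchSize": 250,
        "modelId": "demo-loan-nn-onnx-beta"
        }
url = "https://" + TRUSTY_ROUTE + "/metrics/identity/request"

# json= sends the dict as a JSON body; data= would form-encode it instead
r = requests.post(url, headers=header, json=data, verify=False)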


In [None]:
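model_name = 'gaussian_model'
data_tag = 'TRAINING'
name = 'credit_inputs'

# V2-protocol datatype names for common pandas dtypes (FP64 if unrecognized)
DTYPE_TO_V2 = {'float64': 'FP64', 'float32': 'FP32', 'int64': 'INT64', 'int32': 'INT32'}

# Note: this shadows the imported df_to_json used earlier, which takes different arguments
def df_to_json(model_name, data_tag, name, df, path_to_json):
    """Write df as a JSON payload to path_to_json and return the payload."""
    # Use the dtype of the first column for the whole tensor
    data_type = DTYPE_TO_V2.get(str(df.dtypes.iloc[0]), 'FP64')
    inputs = [{'name': name, 'shape': list(df.shape), 'datatype': data_type, 'data': df.values.tolist()}]
    data_dict = {'model_name': model_name, 'data_tag': data_tag, 'requests': {'inputs': inputs}}

    with open(path_to_json, "w") as outfile:
        json.dump(data_dict, outfile)
    return data_dict

# 'gaussian_data.json' is a placeholder output path
df_to_json(model_name, data_tag, name, df, 'gaussian_data.json')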

{'model_name': 'gaussian_model', 'data_tag': 'TRAINING', 'requests': {'inputs': [{'name': 'credit_inputs', 'shape': [2, 4], 'datatype': 'FP64', 'data': [[47.468246185717554, 575.6911203538863, 10.844143722475575, 14.81343667761101], [47.45380690750797, 478.6846214843319, 13.462184703540503, 20.764525303373535]]}]}}


In [23]:
metric = 'trustyai_spd'
namespace = 'model-namespace'
# PromQL instant query for the SPD metric in our namespace
params = {"query": f"{metric}{{namespace='{namespace}'}}"}

header = {'Authorization': f'Bearer {TOKEN}'}

# Make the GET request against the Thanos Querier route
THANOS_ROUTE = 'https://thanos-querier-openshift-monitoring.apps.rosa.trustyai-2024.hzud.p3.openshiftapps.com'
url = THANOS_ROUTE + "/api/v1/query?"
print(url)
response = requests.get(url, params=params, headers=header, verify=False)
response

# # Check if the request was successful
# if response.status_code == 200:
#             # Parse the JSON response and extract the desired data
#             data = response.json()
#             protected_value = data['data']['result']
#             print(protected_value)
# else:
#     print(response.status_code)
#     print("Error: Failed to fetch data from the API")

https://thanos-querier-openshift-monitoring.apps.rosa.trustyai-2024.hzud.p3.openshiftapps.com/api/v1/query?




<Response [403]>

In [31]:
import datetime as dt

THANOS_QUERIER_HOST = "thanos-querier-openshift-monitoring.apps.rosa.trustyai-2024.hzud.p3.openshiftapps.com"
# Use the current `oc` session token rather than hard-coding a secret in the notebook
TOKEN = subprocess.check_output('oc whoami -t', shell=True).decode().strip()

url = "https://" + THANOS_QUERIER_HOST + "/api/v1/query"
# Range-vector query: all SPD samples from the last minute
params = {"query": "trustyai_spd{namespace='model-namespace'}[1m]"}
headers = {"Authorization": "Bearer " + TOKEN}

# Make the GET request and fail loudly on 4xx/5xx (e.g. the 403 seen above)
response = requests.get(url, headers=headers, params=params, verify=False)
response.raise_for_status()

spd_values = response.json()['data']['result'][0]['values']
spd_df = pd.DataFrame(spd_values, columns=['timestamp', 'spd'])
# Prometheus returns sample values as strings and timestamps as Unix epochs
spd_df['spd'] = spd_df['spd'].astype(float)
spd_df['timestamp'] = spd_df['timestamp'].apply(lambda epoch: dt.datetime.fromtimestamp(epoch).strftime('%Y-%m-%d %H:%M:%S'))

spd_df




b'{"status":"success","data":{"resultType":"matrix","result":[{"metric":{"__name__":"trustyai_spd","batch_size":"5000","favorable_value":"0","instance":"10.129.2.126:8080","job":"trustyai-service","metricName":"SPD","model":"demo-loan-nn-onnx-alpha","namespace":"model-namespace","outcome":"Will Default?","pod":"trustyai-service-84ffc86749-k4nf6","privileged":"1","prometheus":"openshift-user-workload-monitoring/user-workload","protected":"Is Male-Identifying?","request":"6b0c426e-5d88-450e-9d51-ba34a8d6926b","service":"trustyai-service","unprivileged":"0"},"values":[[1706807917.936,"-0.0029676404469311524"],[1706807921.936,"-0.0029676404469311524"],[1706807925.936,"-0.0029676404469311524"],[1706807929.936,"-0.0029676404469311524"],[1706807933.936,"-0.0029676404469311524"],[1706807937.936,"-0.0029676404469311524"],[1706807941.936,"-0.0029676404469311524"],[1706807945.936,"-0.0029676404469311524"],[1706807949.936,"-0.0029676404469311524"],[1706807953.936,"-0.0029676404469311524"],[1706807
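The raw Thanos response above is a Prometheus range-vector (`matrix`) result. Each series carries its label set under `metric` and a list of `[unix_timestamp, "value"]` pairs under `values`, with the sample value encoded as a string. A sketch of the conversion the previous cell performs, using only the standard library and a response trimmed from the output above (one series, two samples); `matrix_to_rows` is a hypothetical helper. Timestamps are formatted in UTC here, whereas `datetime.fromtimestamp` without a timezone, as in the cell, uses the notebook's local time.

```python
import json
from datetime import datetime, timezone


def matrix_to_rows(response_text):
    """Convert a Prometheus 'matrix' query result into (model, time, value) rows."""
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body}")
    rows = []
    for series in body["data"]["result"]:
        model = series["metric"].get("model", "")
        for epoch, value in series["values"]:
            ts = datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
            rows.append((model, ts, float(value)))  # values arrive as strings
    return rows


# Trimmed from the raw response above
raw = ('{"status":"success","data":{"resultType":"matrix","result":[{"metric":'
       '{"__name__":"trustyai_spd","model":"demo-loan-nn-onnx-alpha"},'
       '"values":[[1706807917.936,"-0.0029676404469311524"],'
       '[1706807921.936,"-0.0029676404469311524"]]}]}}')

for row in matrix_to_rows(raw):
    print(row)
```

Keeping the `model` label in each row matters once both the alpha and beta models report SPD, since the query returns one series per scheduled request.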

In [56]:
namespace = 'model-namespace'
metric = 'trustyai_spd'
time_range = '1m'  # avoid shadowing Python's built-in range()

spd_df = trustyaiClient.get_metric_data(namespace, metric, time_range)
spd_df

|   | timestamp | spd |
|---|---|---|
| 0 | 2024-02-01 12:18:37 | -0.0029676404469311 |
| 1 | 2024-02-01 12:18:41 | -0.0029676404469311 |
| 2 | 2024-02-01 12:18:45 | -0.0029676404469311 |
| 3 | 2024-02-01 12:18:49 | -0.0029676404469311 |
| 4 | 2024-02-01 12:18:53 | -0.0029676404469311 |
| 5 | 2024-02-01 12:18:57 | -0.0029676404469311 |
| 6 | 2024-02-01 12:19:01 | -0.0029676404469311 |
| 7 | 2024-02-01 12:19:05 | -0.0029676404469311 |
| 8 | 2024-02-01 12:19:09 | -0.0029676404469311 |
| 9 | 2024-02-01 12:19:13 | -0.0029676404469311 |
