# (Binary) Classification Fairness Template - Example
Use this template to compare fairness measures across a set of trained binary classification models.

In [1]:
from fairMLHealth.utils import model_comparison
from joblib import load
import numpy as np
import os
import pandas as pd

# Load (Generate) Data and Models

X (numpy array or similar pandas object): test data passed to the models to generate predictions. These data should be separate from the data used to train the models.

y (numpy array or similar pandas object): target data array corresponding to the test data. The target should not be present in the test data.

models (list or dict-like): the set of trained models to be evaluated. Dict keys are used as model names. If a list-like object is passed, model names are set from each model's index in the list.

protected_attr (numpy array or similar pandas object): protected attributes, which may or may not be present in the test data. Values must currently be binary or boolean.
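
The requirements above can be checked before running a comparison. The following is a minimal sketch, not part of fairMLHealth: `check_inputs` is a hypothetical helper, and the toy data are made up for illustration.

```python
import numpy as np
import pandas as pd


def check_inputs(X, y, protected_attr):
    """Hypothetical helper: sanity-check inputs before a fairness comparison."""
    # Test data, targets, and protected attributes must describe the same rows.
    if not (len(X) == len(y) == len(protected_attr)):
        raise ValueError("X, y, and protected_attr must have the same length")
    arr = np.asarray(protected_attr).ravel()
    if arr.dtype == bool:
        arr = arr.astype(int)  # treat booleans as 0/1
    values = set(np.unique(arr).tolist())
    # Only binary or boolean protected attributes are accepted.
    if not values <= {0, 1}:
        raise ValueError(f"protected_attr must be binary or boolean, got {values}")
    return True


# Toy inputs (illustrative only).
X_demo = pd.DataFrame({"age": [34, 51, 67]})
y_demo = np.array([0, 1, 0])
attr_demo = pd.Series([True, False, True])
ok = check_inputs(X_demo, y_demo, attr_demo)
```

A check like this fails early with a readable message instead of surfacing as an error inside the comparison itself.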

In [2]:
data_file = os.path.expanduser("~/data/fairness_and_bias/mimic_model_comparison/binary_classification.joblib")
input_data = load(data_file)
X = input_data.X
y = input_data.y
models = input_data.models

In [3]:
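# Generate a binary indicator for the protected attribute
lang_cols = [c for c in X.columns if c.startswith("LANGUAGE_")]  # all language columns
eng_cols = ['LANGUAGE_ENGL']  # columns that identify English speakers
X_lang = X.loc[:, lang_cols]  # language subset of the test data
# A row is flagged as English-speaking if any English column equals 1
english_speaking = X[eng_cols].eq(1).any(axis=1)
protected_attr = english_speaking.astype(int)
protected_attr.name = 'ENGLISH_SPEAKING'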
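
To see what the indicator logic does in isolation, here is a small sketch on toy data. The column names other than `LANGUAGE_ENGL` are hypothetical, and the group-size check at the end is a suggestion rather than something the template requires.

```python
import pandas as pd

# Toy data with one column per language; values are made up for illustration.
X_toy = pd.DataFrame({
    "LANGUAGE_ENGL": [1, 0, 0, 1],
    "LANGUAGE_SPAN": [0, 1, 0, 0],
    "LANGUAGE_RUSS": [0, 0, 1, 0],
})

# Any row with a 1 in one of the listed columns belongs to the flagged group.
group_cols = ["LANGUAGE_ENGL"]
indicator = X_toy[group_cols].eq(1).any(axis=1).astype(int)
indicator.name = "ENGLISH_SPEAKING"

# Group sizes are worth a look before comparing rates between groups.
group_sizes = indicator.value_counts().to_dict()
```

Because the group is defined through a list of columns, adding a second column name to `group_cols` would widen the flagged group without changing the rest of the code.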

In [4]:
models.keys()

dict_keys(['naive_bayes_model', 'decision_tree_model', 'random_forest_model', 'logit_regression_model', 'xgboost_model'])

## Generate Comparison


Comparisons can be called in one of two ways: through an object-oriented method, or through a wrapper function. The next cell demonstrates the object-oriented version.

In [9]:
comp = model_comparison.fairCompare(test_data=X, target_data=y, protected_attr_data=protected_attr, models=models)
comp.measure_model('naive_bayes_model')

| Measure | Value |
|---|---|
| ** Group Fairness ** | |
| Statistical Parity Difference | -0.0163 |
| Disparate Impact Ratio | 0.9564 |
| Average Odds Difference | -0.0482 |
| Equal Opportunity Difference | -0.0747 |
| Positive Predictive Parity Difference | 0.0518 |
| Between-Group AUC Difference | -0.0189 |
| Between-Group Balanced Accuracy Difference | -0.0266 |
| ** Individual Fairness ** | |
| Consistency Score | 0.6255 |
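
For intuition about the first two measures, the sketch below computes a statistical parity difference and a disparate impact ratio by hand on toy predictions. It assumes the common convention of comparing the unprivileged group (coded 0) against the privileged group (coded 1). Whether fairMLHealth uses the same orientation and coding is an assumption here, and the function names are illustrative rather than part of the library.

```python
import numpy as np


def selection_rate(y_pred, mask):
    """Share of positive predictions among the rows selected by mask."""
    return float(np.mean(y_pred[mask]))


def statistical_parity_difference(y_pred, protected):
    # Assumed convention: unprivileged (0) rate minus privileged (1) rate.
    return selection_rate(y_pred, protected == 0) - selection_rate(y_pred, protected == 1)


def disparate_impact_ratio(y_pred, protected):
    # Assumed convention: unprivileged (0) rate divided by privileged (1) rate.
    return selection_rate(y_pred, protected == 0) / selection_rate(y_pred, protected == 1)


# Toy predictions: group 0 is selected 1 time in 4, group 1 is selected 2 times in 4.
y_pred = np.array([1, 0, 0, 0, 1, 1, 0, 0])
protected = np.array([0, 0, 0, 0, 1, 1, 1, 1])

spd = statistical_parity_difference(y_pred, protected)  # 0.25 - 0.5
dir_ = disparate_impact_ratio(y_pred, protected)        # 0.25 / 0.5
```

Under this convention, a difference near 0 and a ratio near 1 indicate that both groups receive positive predictions at similar rates.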


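The following cell produces the comparison across all models in a single table. Note that, as written, it calls `compare_models()` on the `comp` object created above rather than a standalone wrapper function.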

In [8]:
comp.compare_models()



| Measure | naive_bayes_model | decision_tree_model | random_forest_model | logit_regression_model | xgboost_model |
|---|---|---|---|---|---|
| ** Group Fairness ** | | | | | |
| Statistical Parity Difference | -0.0163 | 0.0329 | 0.044 | 0.0 | 0.0499 |
| Disparate Impact Ratio | 0.9564 | 1.0899 | 1.1774 | 0.0 | 1.1689 |
| Average Odds Difference | -0.0482 | 0.0164 | 0.0197 | 0.0 | 0.0204 |
| Equal Opportunity Difference | -0.0747 | 0.0219 | 0.0272 | 0.0 | 0.0195 |
| Positive Predictive Parity Difference | 0.0518 | 0.0583 | 0.0271 | 0.0 | 0.0174 |
| Between-Group AUC Difference | -0.0189 | 0.0056 | 0.0058 | -0.0007 | 0.0012 |
| Between-Group Balanced Accuracy Difference | -0.0266 | 0.0056 | 0.0075 | 0.0 | -0.0009 |
| ** Individual Fairness ** | | | | | |
| Consistency Score | 0.6255 | 0.6241 | 0.6864 | 1.0 | 0.6516 |
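
Two of the measures in this table, Equal Opportunity Difference and Average Odds Difference, depend on the true labels as well as the predictions. The sketch below computes them by hand under their commonly used definitions: a between-group difference in true positive rates, and the mean of the true positive and false positive rate differences. The group coding (0 = unprivileged, 1 = privileged) and the function names are assumptions for illustration and are not taken from fairMLHealth.

```python
import numpy as np


def positive_rate(y_true, y_pred, mask, label):
    """Share of positive predictions among rows in mask whose true label equals label."""
    rows = mask & (y_true == label)
    return float(np.mean(y_pred[rows]))


def equal_opportunity_difference(y_true, y_pred, protected):
    # Difference in true positive rates: group 0 minus group 1 (assumed orientation).
    tpr_0 = positive_rate(y_true, y_pred, protected == 0, 1)
    tpr_1 = positive_rate(y_true, y_pred, protected == 1, 1)
    return tpr_0 - tpr_1


def average_odds_difference(y_true, y_pred, protected):
    # Mean of the TPR difference and the FPR difference between groups.
    tpr_diff = equal_opportunity_difference(y_true, y_pred, protected)
    fpr_0 = positive_rate(y_true, y_pred, protected == 0, 0)
    fpr_1 = positive_rate(y_true, y_pred, protected == 1, 0)
    return 0.5 * (tpr_diff + (fpr_0 - fpr_1))


# Toy data: group 0 has TPR 0.5 and FPR 0.0; group 1 has TPR 1.0 and FPR 0.5.
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 0, 1, 1, 1, 0])
protected = np.array([0, 0, 0, 0, 1, 1, 1, 1])

eod = equal_opportunity_difference(y_true, y_pred, protected)
aod = average_odds_difference(y_true, y_pred, protected)
```

Both helpers return NaN with a NumPy warning if a group contains no rows with the required true label, so small or imbalanced groups deserve a check before interpreting the values.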
