This notebook compares how Fairlearn and AnonFair overfit on a resampled version of the [myocardial infarction dataset](https://archive.ics.uci.edu/dataset/579/myocardial+infarction+complications).

We use sex as the protected attribute.

The initial dataset is balanced; to induce unfairness in the downstream classifier, we drop half the datapoints that satisfy sex=1 and target_label=0.
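The resampling step described above can be sketched in pandas. This is a minimal illustration on a toy frame, not the `dataset_loader.resample` implementation; the helper `drop_fraction` and the toy column names `SEX` and `target` are assumptions made for the example.

```python
import numpy as np
import pandas as pd

def drop_fraction(df, group_col, label_col, group, label, frac, seed=0):
    """Randomly drop `frac` of the rows with group_col == group and label_col == label."""
    rng = np.random.default_rng(seed)
    mask = (df[group_col] == group) & (df[label_col] == label)
    candidates = df.index[mask].to_numpy()
    n_drop = int(round(frac * len(candidates)))
    dropped = rng.choice(candidates, size=n_drop, replace=False)
    return df.drop(index=dropped)

# Toy data (hypothetical): four rows with SEX=1 and target=0, four other rows.
toy = pd.DataFrame({
    "SEX":    [1, 1, 1, 1, 0, 0, 0, 0],
    "target": [0, 0, 0, 0, 0, 0, 1, 1],
})
resampled = drop_fraction(toy, "SEX", "target", group=1, label=0, frac=0.5)
```

Only the targeted subgroup shrinks, so the label distribution now differs between the two groups, which is what lets a downstream classifier pick up a group-dependent bias.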

Because the dataset is relatively high-dimensional (around 100 features) with only around 1,000 training points, XGBoost overfits completely, reaching zero error on the training set.

In [1]:
import dataset_loader
from anonfair import FairPredictor, performance
from anonfair import group_metrics as gm
import xgboost
import pandas as pd
import numpy as np

In [2]:
# Resampler for the step described above (sex=1, target_label=0, fraction 0.5).
sampler = dataset_loader.resample(1, 0, 0.5)
train, val, test = dataset_loader.myocardial_infarction(resample=sampler, seed=0)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  X[X.isnull()] = -1


We now train XGBoost on the training set, and build a fair predictor on top of it using the validation set.

In [3]:
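# Baseline classifier, trained on the training split only.
classifier = xgboost.XGBClassifier().fit(X=train['data'], y=train['target'])
# Fair predictor wrapping the baseline, specified over the validation split.
fpred = FairPredictor(classifier, val)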

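We call fit to enforce equal opportunity: accuracy is the objective, equal opportunity is the constraint, and 0.02 is the threshold on its violation.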

In [4]:
fpred.fit(gm.accuracy, gm.equal_opportunity, 0.02)
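Equal opportunity asks that recall (the true positive rate) be the same across groups. As a rough illustration of the quantity being constrained, the following sketch computes the largest difference in recall between groups. `recall_gap` is a hypothetical helper written for this example, not part of AnonFair, and the labels below are made up.

```python
import numpy as np

def recall_gap(y_true, y_pred, groups):
    """Largest difference in recall (true positive rate) between any two groups."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    recalls = []
    for g in np.unique(groups):
        # Positives belonging to group g; assumes every group has at least one.
        positives = (groups == g) & (y_true == 1)
        recalls.append(y_pred[positives].mean())
    return max(recalls) - min(recalls)

# Made-up example: group 0 has recall 1.0, group 1 has recall 0.5.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
groups = [0, 0, 1, 1, 0, 1]
gap = recall_gap(y_true, y_pred, groups)
```

With two groups this is simply the absolute difference between the two recalls, which is the kind of quantity reported as "Average Group Difference in Recall" in the tables.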

We then evaluate fairness on the validation data.

In [5]:
fpred.evaluate_fairness()

Metric,original,updated
Class Imbalance,0.00125,0.00125
Demographic Parity,0.035417,0.0025
Disparate Impact,0.7875,0.982143
Average Group Difference in Accuracy,0.006667,0.000417
Average Group Difference in Recall,0.095238,0.006061
Average Group Difference in Conditional Acceptance Rate,0.346667,0.019481
Average Group Difference in Acceptance Rate,0.072381,0.002165
Average Group Difference in Specificity,0.017641,0.000547
Average Group Difference in Conditional Rejectance Rate,0.036719,0.00118
Average Group Difference in Rejection Rate,0.019914,0.000674


And on the test set.

In [6]:
fpred.evaluate_fairness(test)

Metric,original,updated
Class Imbalance,0.008494,0.008494
Demographic Parity,0.032728,0.052465
Disparate Impact,0.80848,0.692982
Average Group Difference in Accuracy,0.007911,0.001332
Average Group Difference in Recall,0.058824,0.117647
Average Group Difference in Conditional Acceptance Rate,0.359788,0.62963
Average Group Difference in Acceptance Rate,0.116402,0.148148
Average Group Difference in Specificity,0.031028,0.039502
Average Group Difference in Conditional Rejectance Rate,0.045802,0.065968
Average Group Difference in Rejection Rate,0.015267,0.027458
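To make the comparison between validation and test concrete, the snippet below takes the "Average Group Difference in Recall" values reported in the two tables above and computes how much each predictor's violation changes from validation to test. The helper `generalization_gap` is a name introduced here for illustration.

```python
# Recall differences copied from the validation and test tables above.
recall_difference = {
    "original": {"val": 0.095238, "test": 0.058824},
    "updated":  {"val": 0.006061, "test": 0.117647},
}

def generalization_gap(scores):
    """Test-set violation minus validation-set violation; positive means worse on test."""
    return scores["test"] - scores["val"]

gaps = {name: generalization_gap(s) for name, s in recall_difference.items()}
for name, gap in gaps.items():
    print(f"{name}: {gap:+.3f}")
# original: -0.036
# updated: +0.112
```

On these numbers, the updated predictor meets the 0.02 threshold on validation but its recall difference on the test set (0.118) is well above it, and larger than that of the original classifier.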


We now check validation performance.

In [7]:
fpred.evaluate()

Metric,original,updated
Accuracy,0.903226,0.906452
Balanced Accuracy,0.800559,0.797338
F1 score,0.736842,0.738739
MCC,0.699747,0.711956
Precision,0.913043,0.953488
Recall,0.617647,0.602941
ROC AUC,0.896512,0.87342


And on the test set.

In [8]:
fpred.evaluate(test)

Metric,original,updated
Accuracy,0.870968,0.867742
Balanced Accuracy,0.758751,0.746111
F1 score,0.655172,0.637168
MCC,0.591973,0.578221
Precision,0.791667,0.8
Recall,0.558824,0.529412
ROC AUC,0.886242,0.812105


We now run Fairlearn on the same data.

In [9]:
from fairlearn.reductions import TruePositiveRateParity, ExponentiatedGradient
# TruePositiveRateParity is Fairlearn's equal opportunity constraint.
mitigator = ExponentiatedGradient(xgboost.XGBClassifier(), TruePositiveRateParity())
mitigator.fit(X=train['data'], y=train['target'], sensitive_features=train['data']['SEX'])

To evaluate Fairlearn, we write a helper function that computes performance and fairness on a given split (train or test) and concatenates the two outputs.

In [10]:
def eval(data, classifier=mitigator):
    # Note: this name shadows Python's built-in eval; it is kept because later cells call it.
    # Predict once so both evaluations see the same predictions.
    predictions = classifier.predict(data['data'])
    return pd.concat((performance.evaluate(data['target'], predictions),
                      performance.evaluate_fairness(data['target'], predictions, data['groups'])), axis=0)

out = pd.concat((eval(train), eval(test)), axis=1)
out.columns = ['train', 'test']
out

Metric,train,test
Accuracy,1.0,0.870968
Balanced Accuracy,1.0,0.758751
F1 score,1.0,0.655172
MCC,1.0,0.591973
Precision,1.0,0.791667
Recall,1.0,0.558824
ROC AUC,1.0,0.758751
Class Imbalance,0.007343,0.008494
Demographic Parity,0.007343,0.032728
Disparate Impact,0.966901,0.80848


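Evaluating the initially trained baseline classifier gives identical numbers: as expected, Fairlearn did not alter the performance or unfairness of the classifier. Fairlearn enforces its constraint on the training data, where XGBoost already has zero error and therefore the same recall in every group, so there is nothing for it to correct.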

In [11]:
out = pd.concat((eval(train, classifier), eval(test, classifier)), axis=1)
out.columns = ['train', 'test']
out

Metric,train,test
Accuracy,1.0,0.870968
Balanced Accuracy,1.0,0.758751
F1 score,1.0,0.655172
MCC,1.0,0.591973
Precision,1.0,0.791667
Recall,1.0,0.558824
ROC AUC,1.0,0.758751
Class Imbalance,0.007343,0.008494
Demographic Parity,0.007343,0.032728
Disparate Impact,0.966901,0.80848
