This notebook compares how Fairlearn and AnonFair overfit on a resampled version of the [myocardial infarction dataset](https://archive.ics.uci.edu/dataset/579/myocardial+infarction+complications).

We use sex as the protected attribute.

The initial dataset is balanced. To induce unfairness in the downstream classifier, we drop half of the datapoints that satisfy sex=1 and target_label=0.
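The internals of `dataset_loader.resample` are not shown in this notebook, so the following is only a minimal sketch of the kind of resampling described above. The helper name `drop_half`, the `target` column name, and the toy data are all hypothetical.

```python
import pandas as pd

def drop_half(df, group_col, label_col, group=1, label=0, frac=0.5, seed=0):
    """Drop a fraction `frac` of the rows with group_col == group and label_col == label.

    Hypothetical helper: an illustration of the idea, not the loader's actual code.
    """
    mask = (df[group_col] == group) & (df[label_col] == label)
    # Sample the rows to discard from the matching subset only.
    to_drop = df[mask].sample(frac=frac, random_state=seed).index
    return df.drop(index=to_drop)

# Toy data: 8 rows, four of which have SEX=1 and target=0.
toy = pd.DataFrame({
    "SEX":    [1, 1, 1, 1, 0, 0, 1, 0],
    "target": [0, 0, 0, 0, 0, 1, 1, 1],
})
resampled = drop_half(toy, "SEX", "target")
print(resampled)
```

Removing negatives from one group only changes that group's base rate, which is what pushes a downstream classifier towards group-dependent error rates.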

Because the dataset is relatively high-dimensional (around 100 features) and has only around 1,000 training points, XGBoost overfits perfectly, obtaining zero error on the training set.

In [1]:
import dataset_loader
from anonfair import FairPredictor, performance
from anonfair import group_metrics as gm
import xgboost
import pandas as pd
import numpy as np
pd.options.mode.chained_assignment = None  # default='warn'

In [2]:
sampler=dataset_loader.resample(1,0,0.5)
train,val,test = dataset_loader.myocardial_infarction(resample=sampler,seed=0)

We now train XGBoost and define a fair predictor over the validation set.

In [3]:
classifier = xgboost.XGBClassifier().fit(X=train['data'], y=train['target'])
fpred=FairPredictor(classifier,val)

We call fit to maximize accuracy while enforcing equal opportunity, with the allowed violation set to 0.02.

In [4]:
fpred.fit(gm.accuracy,gm.equal_opportunity,0.02)
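For reference, equal opportunity compares recall (true positive rate) across groups, which is why it coincides with the group difference in false negative rate in the tables of this notebook. The sketch below computes that gap by hand with NumPy. The function name `equal_opportunity_gap` and the toy arrays are hypothetical, and this is not AnonFair's implementation.

```python
import numpy as np

def equal_opportunity_gap(y_true, y_pred, groups):
    """Largest difference in recall between groups (hypothetical helper).

    With two groups this is the absolute difference in true positive rate.
    A group without positive labels would yield NaN and is not handled here.
    """
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    recalls = []
    for g in np.unique(groups):
        positives = (groups == g) & (y_true == 1)
        # Recall in group g: share of its true positives predicted as positive.
        recalls.append(y_pred[positives].mean())
    return max(recalls) - min(recalls)

y_true = [1, 1, 1, 1, 0, 0, 1, 1]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0]
groups = [0, 0, 0, 0, 0, 1, 1, 1]
gap = equal_opportunity_gap(y_true, y_pred, groups)
print(gap)  # recall is 0.75 in group 0 and 0.5 in group 1
```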

We then evaluate fairness on the validation data.

In [5]:
fpred.evaluate_fairness()

Metric,original,updated
Statistical Parity,0.053529,0.020196
Predictive Parity,0.035982,0.16
Equal Opportunity,0.214719,0.006061
Average Group Difference in False Negative Rate,0.214719,0.006061
Equalized Odds,0.108903,0.01903
Conditional Use Accuracy,0.042487,0.080671
Average Group Difference in Accuracy,0.043367,0.024065
Treatment Equality,0.25,0.285714


We repeat the fairness evaluation on the test set.

In [6]:
fpred.evaluate_fairness(test)

Metric,original,updated
Statistical Parity,0.002239,0.069792
Predictive Parity,0.208333,0.24
Equal Opportunity,0.088235,0.176471
Average Group Difference in False Negative Rate,0.088235,0.176471
Equalized Odds,0.064279,0.112429
Conditional Use Accuracy,0.112137,0.141967
Average Group Difference in Accuracy,0.04495,0.009946
Treatment Equality,0.333333,0.4
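To make the overfitting explicit, the equal opportunity rows of the validation and test tables can be placed side by side. The snippet below is a small sketch that copies those four numbers by hand. The DataFrame layout and the `meets_bound` column are illustrative choices, not part of AnonFair.

```python
import pandas as pd

# Equal opportunity violations copied from the validation and test tables.
eo = pd.DataFrame(
    {"original": [0.214719, 0.088235], "updated": [0.006061, 0.176471]},
    index=["validation", "test"],
)

threshold = 0.02  # the bound passed to fpred.fit
eo["meets_bound"] = eo["updated"] <= threshold

# How much the updated classifier's violation grows from validation to test.
gap = eo.loc["test", "updated"] - eo.loc["validation", "updated"]
print(eo)
print(f"validation-to-test gap: {gap:.6f}")
```

The bound holds on the validation set, where the thresholds were chosen, but not on the test set, where the updated classifier's violation is also larger than the original's.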


We now check validation performance.

In [7]:
fpred.evaluate()

Metric,original,updated
Accuracy,0.895765,0.899023
Balanced Accuracy,0.806793,0.793102
F1 score,0.733333,0.725664
MCC,0.679293,0.688249
Precision,0.846154,0.911111
Recall,0.647059,0.602941
ROC AUC,0.904504,0.878107


We repeat the performance evaluation on the test set.

In [8]:
fpred.evaluate(test)

Metric,original,updated
Accuracy,0.895082,0.862295
Balanced Accuracy,0.790922,0.722636
F1 score,0.719298,0.603774
MCC,0.676716,0.561185
Precision,0.891304,0.842105
Recall,0.602941,0.470588
ROC AUC,0.908724,0.850583


We now run Fairlearn on the same data, using its true positive rate parity constraint.

In [9]:
from fairlearn.reductions import TruePositiveRateParity, ExponentiatedGradient
mitigator = ExponentiatedGradient(xgboost.XGBClassifier(),TruePositiveRateParity())
mitigator.fit(X=train['data'],y=train['target'],sensitive_features=train['data']['SEX'])

To evaluate Fairlearn, we write a helper function that evaluates performance and fairness on the train or test set and concatenates the outputs.

In [10]:
def eval(data, classifier=mitigator):
    # Note: this helper shadows Python's built-in eval for the rest of the notebook.
    # Predict once and reuse the predictions for both evaluations.
    predictions = classifier.predict(data['data'])
    return pd.concat((performance.evaluate(data['target'], predictions),
                      performance.evaluate_fairness(data['target'], predictions, data['groups'])),axis=0)

out = pd.concat((eval(train), eval(test)), axis=1)
out.columns = ['train', 'test']
out

Metric,train,test
Accuracy,1.0,0.895082
Balanced Accuracy,1.0,0.790922
F1 score,1.0,0.719298
MCC,1.0,0.676716
Precision,1.0,0.891304
Recall,1.0,0.602941
ROC AUC,1.0,0.790922
Statistical Parity,0.014158,0.002239
Predictive Parity,0.0,0.208333
Equal Opportunity,0.0,0.088235


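Evaluating the initially trained baseline classifier, we find that, as expected, Fairlearn did not alter the performance or unfairness of the classifier: the numbers below match the Fairlearn results exactly. Because the baseline already has zero error on the training set, its equal opportunity violation there is 0, so the constraint is satisfied on the data Fairlearn was fit on and there is nothing left to correct.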

In [11]:
out = pd.concat((eval(train, classifier), eval(test, classifier)), axis=1)
out.columns = ['train', 'test']
out

Metric,train,test
Accuracy,1.0,0.895082
Balanced Accuracy,1.0,0.790922
F1 score,1.0,0.719298
MCC,1.0,0.676716
Precision,1.0,0.891304
Recall,1.0,0.602941
ROC AUC,1.0,0.790922
Statistical Parity,0.014158,0.002239
Predictive Parity,0.0,0.208333
Equal Opportunity,0.0,0.088235
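One way to see why a method that enforces fairness on the training set has nothing to do here: when predictions match the training labels exactly, recall is 1 in every group, so the equal opportunity gap is 0 however the groups are assigned. The following sketch illustrates this on synthetic arrays. All names and data are hypothetical.

```python
import numpy as np

# Synthetic labels and group membership (hypothetical data).
y_true = np.tile([0, 1], 100)     # alternating 0/1 labels, 200 points
groups = np.repeat([0, 1], 100)   # first half is group 0, second half group 1
y_pred = y_true.copy()            # a classifier with zero training error

def recall_in_group(g):
    # Share of group g's true positives that are predicted as positive.
    positives = (groups == g) & (y_true == 1)
    return y_pred[positives].mean()

gap = abs(recall_in_group(0) - recall_in_group(1))
print(gap)
```

A constraint measured on held-out data, as in the validation-based approach earlier in the notebook, does not become vacuous in this way, although the test results show that it can still overfit to a small validation set.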
