This notebook compares how Fairlearn and AnonFair (imported below as oxonfair) overfit when used with random forests and decision trees on the adult dataset.

We use sex as the protected attribute.

Even on this low-dimensional data, the default parameters of scikit-learn cause both decision trees and random forests to overfit. 

This can be mitigated by specifying a low maximum tree depth. The examples in the Fairlearn documentation typically use a depth of 4 on adult.
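
As a rough illustration of the effect of capping depth, the sketch below fits a default tree and a depth-4 tree with scikit-learn. The synthetic data from make_classification is an assumption standing in for adult, so the numbers it prints say nothing about the results in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; this is not the adult dataset).
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

# Default parameters: the tree keeps splitting until the training leaves are pure.
deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# Capped depth, matching the depth of 4 mentioned above.
shallow = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)

for name, model in (("default", deep), ("max_depth=4", shallow)):
    print(name,
          "train accuracy:", model.score(X_train, y_train),
          "test accuracy:", model.score(X_test, y_test))
```

The default tree memorises the training split, which is the pattern the train and test columns later in this notebook show for adult.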

In [1]:
import dataset_loader
from oxonfair import FairPredictor, performance
from oxonfair import group_metrics as gm
import pandas as pd
import numpy as np

from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

In [2]:
train,val,test = dataset_loader.adult()
basetree = DecisionTreeClassifier().fit(X=train['data'], y=train['target'])
baseforest = RandomForestClassifier().fit(X=train['data'], y=train['target'])

We now define fair predictors over the validation set.

In [3]:
# The outputs of a decision tree are all 0 or 1, so we add Gaussian noise to allow thresholding to work
ftree=FairPredictor(basetree,val,add_noise=0.001)
fforest=FairPredictor(baseforest,val)
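
To see why hard 0/1 outputs get in the way of thresholding, here is a minimal NumPy sketch. It only illustrates the idea in the comment above; how FairPredictor applies add_noise internally is not shown in this notebook, and the scores below are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hard 0/1 outputs, as a fully grown decision tree would produce (made-up values).
hard_scores = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0])

# Only two distinct values: every threshold strictly between 0 and 1
# produces exactly the same decisions.
print("distinct hard scores:", np.unique(hard_scores).size)

# Small Gaussian jitter; the scale mirrors the add_noise=0.001 value above.
noisy_scores = hard_scores + rng.normal(scale=0.001, size=hard_scores.shape)

# The jitter breaks ties, so a threshold can now separate individual points,
# while being far too small to swap the order of a 0 and a 1.
print("distinct noisy scores:", np.unique(noisy_scores).size)
```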

We call fit to enforce equal opportunity.

In [4]:
ftree.fit(gm.accuracy,gm.equal_opportunity,0.02)
fforest.fit(gm.accuracy,gm.equal_opportunity,0.02)
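
For reference, equal opportunity compares true positive rates across groups. The sketch below computes that gap by hand for two groups; the helper names and toy arrays are hypothetical, and the library's own definition (in particular for more than two groups) may differ.

```python
import numpy as np

def true_positive_rate(y_true, y_pred):
    # Fraction of actual positives that were predicted positive.
    positives = y_true == 1
    return (y_pred[positives] == 1).mean()

def equal_opportunity_gap(y_true, y_pred, groups):
    # Largest difference in true positive rate between any two groups.
    rates = [true_positive_rate(y_true[groups == g], y_pred[groups == g])
             for g in np.unique(groups)]
    return max(rates) - min(rates)

# Toy data: six people in group "a", six in group "b".
y_true = np.array([1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0])
groups = np.array(["a"] * 6 + ["b"] * 6)

# Group "a" recovers 3 of 4 positives, group "b" recovers 2 of 4.
print(equal_opportunity_gap(y_true, y_pred, groups))
```

Because the false negative rate is one minus the true positive rate, the same gap appears under both names in the tables below.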

We now focus on trees only, and evaluate fairness on the validation data.

In [5]:
ftree.evaluate_fairness()

Metric,original,updated
Statistical Parity,0.180488,0.152779
Predictive Parity,0.115453,0.122845
Equal Opportunity,0.06236,0.007663
Average Group Difference in False Negative Rate,0.06236,0.007663
Equalized Odds,0.079554,0.04427
Conditional Use Accuracy,0.113697,0.125512
Average Group Difference in Accuracy,0.122871,0.127511
Treatment Equality,0.231433,0.46786


And on the test set.

In [6]:
ftree.evaluate_fairness(test)

Metric,original,updated
Statistical Parity,0.182991,0.157615
Predictive Parity,0.114898,0.126459
Equal Opportunity,0.056253,0.011652
Average Group Difference in False Negative Rate,0.056253,0.011652
Equalized Odds,0.07854,0.04774
Conditional Use Accuracy,0.113885,0.126001
Average Group Difference in Accuracy,0.125034,0.125877
Treatment Equality,0.269692,0.499027
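
The two tables above can be compared against the 0.02 passed to fit. The sketch below does that with values copied from the updated columns; reading the third argument of fit as an upper bound on the equal opportunity gap is an assumption.

```python
# Third argument of ftree.fit; assumed here to be an upper bound on the gap.
TOLERANCE = 0.02

# 'updated' Equal Opportunity values copied from the two tables above.
updated_equal_opportunity = {"validation": 0.007663, "test": 0.011652}

within_tolerance = {split: gap <= TOLERANCE
                    for split, gap in updated_equal_opportunity.items()}

for split, gap in updated_equal_opportunity.items():
    print(f"{split}: gap={gap:.6f}, within tolerance: {within_tolerance[split]}")
```

The gap grows on unseen data but, for the tree, stays under the tolerance.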


We now check validation performance.

In [7]:
ftree.evaluate()

Metric,original,updated
Accuracy,0.80991,0.806552
Balanced Accuracy,0.73924,0.723428
F1 score,0.60318,0.582538
MCC,0.478201,0.457241
Precision,0.602665,0.602339
Recall,0.603696,0.563997
ROC AUC,0.739219,0.697202


And on the test set.

In [8]:
ftree.evaluate(test)

Metric,original,updated
Accuracy,0.806732,0.804029
Balanced Accuracy,0.738907,0.723995
F1 score,0.601217,0.582155
MCC,0.473767,0.454384
Precision,0.593792,0.594296
Recall,0.60883,0.5705
ROC AUC,0.738907,0.695082


We now run Fairlearn on the same data.

In [9]:
from fairlearn.reductions import TruePositiveRateParity, ExponentiatedGradient
mitagator = ExponentiatedGradient(DecisionTreeClassifier(),TruePositiveRateParity())
mitagator.fit(X=train['data'],y=train['target'],sensitive_features=train['data']['sex'])

To evaluate Fairlearn, we write a helper function that evaluates performance and fairness on a given split (train or test) and concatenates the outputs.

In [10]:
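def eval(split, classifier=mitagator):
    # Note: this shadows the built-in eval; the name is kept because later cells call it.
    # Predict once so that the performance and fairness metrics describe the same predictions.
    predictions = classifier.predict(split['data'])
    return pd.concat((performance.evaluate(split['target'], predictions),
                      performance.evaluate_fairness(split['target'], predictions, split['groups'])), axis=0)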

out = pd.concat((eval(train), eval(test)), axis=1)
out.columns = ['train', 'test']
out

Metric,train,test
Accuracy,0.999959,0.807796
Balanced Accuracy,0.999973,0.738434
F1 score,0.999914,0.601189
MCC,0.999888,0.474606
Precision,0.999829,0.59703
Recall,1.0,0.605407
ROC AUC,0.999973,0.738434
Statistical Parity,0.194639,0.184861
Predictive Parity,0.000202,0.088625
Equal Opportunity,0.0,0.041558
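
The overfitting is easiest to read as the difference between the two columns. The sketch below computes it for two rows, with values copied from the table above; the dictionary layout is only for illustration.

```python
# (train, test) pairs copied from the table above.
fairlearn_tree = {
    "Accuracy": (0.999959, 0.807796),
    "Equal Opportunity": (0.0, 0.041558),
}

# Absolute train-test difference per metric, rounded to the table's precision.
gaps = {metric: round(abs(train - test), 6)
        for metric, (train, test) in fairlearn_tree.items()}

for metric, gap in gaps.items():
    print(f"{metric}: train-test gap = {gap}")
```

Accuracy drops by roughly 0.19 from train to test, and the equal opportunity gap that is zero on the training data reappears on the test set.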


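Evaluating the initially trained baseline classifier, we find that, as expected, Fairlearn did not substantially alter the performance or unfairness of the classifier (beyond altering the random seed of the tree). The unconstrained tree fits the training data almost perfectly, so the equal opportunity constraint is already close to satisfied on the training set and Fairlearn has little to correct there.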

In [11]:
out = pd.concat((eval(train, basetree), eval(test, basetree)), axis=1)
out.columns = ['train', 'test']
out

Metric,train,test
Accuracy,0.999959,0.806732
Balanced Accuracy,0.999914,0.738907
F1 score,0.999914,0.601217
MCC,0.999888,0.473767
Precision,1.0,0.593792
Recall,0.999829,0.60883
ROC AUC,0.999914,0.738907
Statistical Parity,0.194516,0.182991
Predictive Parity,0.0,0.114898
Equal Opportunity,0.000202,0.056253


We now do the same with the random forest classifier.

In [12]:
fforest.evaluate_fairness()

Metric,original,updated
Statistical Parity,0.171365,0.147358
Predictive Parity,0.018927,0.054233
Equal Opportunity,0.08308,0.005127
Average Group Difference in False Negative Rate,0.08308,0.005127
Equalized Odds,0.075989,0.029586
Conditional Use Accuracy,0.059969,0.08343
Average Group Difference in Accuracy,0.109057,0.109686
Treatment Equality,0.120955,0.168906


In [13]:
fforest.evaluate_fairness(test)

Metric,original,updated
Statistical Parity,0.176631,0.141287
Predictive Parity,0.028393,0.074746
Equal Opportunity,0.074845,0.020783
Average Group Difference in False Negative Rate,0.074845,0.020783
Equalized Odds,0.075552,0.035816
Conditional Use Accuracy,0.065604,0.095488
Average Group Difference in Accuracy,0.116183,0.109884
Treatment Equality,0.210769,0.280337


In [14]:
fforest.evaluate()

Metric,original,updated
Accuracy,0.853235,0.854791
Balanced Accuracy,0.76678,0.767685
F1 score,0.662142,0.664395
MCC,0.574487,0.578378
Precision,0.737196,0.743329
Recall,0.600958,0.600616
ROC AUC,0.904615,0.891409


In [15]:
fforest.evaluate(test)

Metric,original,updated
Accuracy,0.853657,0.851691
Balanced Accuracy,0.767404,0.763884
F1 score,0.663148,0.657721
MCC,0.575743,0.569434
Precision,0.738145,0.734487
Recall,0.601985,0.595483
ROC AUC,0.903594,0.890906


In [16]:
mitagator = ExponentiatedGradient(RandomForestClassifier(),TruePositiveRateParity())
mitagator.fit(X=train['data'],y=train['target'],sensitive_features=train['data']['sex'])

In [17]:
out = pd.concat((eval(train,mitagator), eval(test,mitagator)), axis=1)
out.columns = ['train', 'test']
out

Metric,train,test
Accuracy,0.999959,0.854967
Balanced Accuracy,0.999914,0.76979
F1 score,0.999914,0.666792
MCC,0.999888,0.57996
Precision,1.0,0.740493
Recall,0.999829,0.606434
ROC AUC,0.999914,0.76979
Statistical Parity,0.194516,0.178964
Predictive Parity,0.0,0.008129
Equal Opportunity,0.000202,0.098747
