# Bias Evaluation: AIF360
***

Quantification of model bias, in terms of fairness toward protected groups, before and after applying mitigation methods.

## Terminology
***

***Favorable label:*** A label whose value corresponds to an outcome that provides an advantage to the recipient (such as receiving a loan, being hired for a job, not being arrested)

***Protected attribute:*** An attribute that partitions a population into groups whose outcomes should have parity (such as race, gender, caste, and religion)

***Privileged value (of a protected attribute):*** A protected attribute value indicating a group that has historically been at a systemic advantage

***Fairness metric:*** A quantification of unwanted bias in training data or models

***Discrimination/unwanted bias:*** Although bias can refer to any form of preference, fair or unfair, our focus is on undesirable bias or discrimination, which is when specific privileged groups are placed at a systematic advantage and specific unprivileged groups are placed at a systematic disadvantage. This relates to attributes such as race, gender, age, and sexual orientation.


## Structure of Evaluation & Intervention
***
<img src="images/aif360_pipeline.png" width="700" height="500" align="center"/>

### Three Perspectives of Fairness in ML algorithms
***

[LinkedIn article](https://www.linkedin.com/pulse/whats-new-deep-learning-research-reducing-bias-models-jesus-rodriguez/)

***1. Data vs Model***

Fairness may be quantified in the training dataset or in the learned model

***2. Group vs Individual***

Group fairness partitions a population into groups defined by protected attributes and seeks for some statistical measure to be equal across all groups. Individual fairness seeks for similar individuals to be treated similarly.


***3. WAE vs WYSIWYG (We are all equal vs What you see is what you get)***

WAE holds that skills and opportunities are equally distributed among the participants in an ML task, so differences in outcome distributions are attributed to structural bias rather than to differences in ability. WYSIWYG holds that observations reflect ability with respect to the task.

> If the application follows the WAE worldview, then the demographic parity metrics should be used: disparate_impact and statistical_parity_difference.  If the application follows the WYSIWYG worldview, then the equality of odds metrics should be used: average_odds_difference and average_abs_odds_difference.  Other group fairness metrics (some are often labeled equality of opportunity) lie in-between the two worldviews and may be used appropriately: false_negative_rate_ratio, false_negative_rate_difference, false_positive_rate_ratio, false_positive_rate_difference, false_discovery_rate_ratio, false_discovery_rate_difference, false_omission_rate_ratio, false_omission_rate_difference, error_rate_ratio, and error_rate_difference. 
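The guidance quoted above can be captured as a small lookup, which may be convenient when the same evaluation is run under different worldviews. This is only a sketch: the metric names are copied from the quote, while the dictionary `WORLDVIEW_METRICS` and the helper `metrics_for` are hypothetical names introduced here and are not part of AIF360.

```python
# Metric names are taken verbatim from the guidance quoted above.
# WORLDVIEW_METRICS and metrics_for are illustrative, not AIF360 API.
WORLDVIEW_METRICS = {
    "WAE": ["disparate_impact", "statistical_parity_difference"],
    "WYSIWYG": ["average_odds_difference", "average_abs_odds_difference"],
}


def metrics_for(worldview):
    """Return the group fairness metrics suggested for a worldview."""
    try:
        return WORLDVIEW_METRICS[worldview.upper()]
    except KeyError:
        raise ValueError(f"unknown worldview: {worldview!r}") from None


print(metrics_for("wae"))
```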

### 2. Evaluation of Bias in Models

<center><b>Average Odds Difference:</b> $ \tfrac{1}{2}\left[(FPR_{D = \text{unprivileged}} - FPR_{D = \text{privileged}}) + (TPR_{D = \text{unprivileged}} - TPR_{D = \text{privileged}})\right] $ </center>


<br><br>


<center><b>Statistical Parity Difference:    </b>$ Pr(\hat{Y} = 1 | D = \text{unprivileged}) - Pr(\hat{Y} = 1 | D = \text{privileged}) $</center>


<br><br>


<center><b>Equal Opportunity Difference:    </b>$ TPR_{D = \text{unprivileged}} - TPR_{D = \text{privileged}} $</center>


<br><br>


<center><b>Theil Index:    </b>$ \frac{1}{n}\sum_{i=1}^n\frac{b_{i}}{\mu}\ln\frac{b_{i}}{\mu}, \quad \text{with } b_i = \hat{y}_i - y_i + 1 \text{ and } \mu = \frac{1}{n}\sum_{i=1}^n b_i $</center>


<br><br>


<center><b>Disparate Impact:    </b> $ \frac{Pr(\hat{Y} = 1 | D = \text{unprivileged})} {Pr(\hat{Y} = 1 | D = \text{privileged})}$</center>
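To make the arithmetic behind these metrics concrete, here is a minimal plain-Python sketch on toy data. It is not the AIF360 implementation. It assumes binary labels and predictions, a binary group indicator (1 = privileged, 0 = unprivileged), and that every difference is taken as unprivileged minus privileged, with average odds difference as the mean of the FPR gap and the TPR gap. The function names and the toy arrays are invented for illustration.

```python
import math


def rate(flags):
    """Fraction of truthy entries; 0.0 for an empty list."""
    return sum(flags) / len(flags) if flags else 0.0


def group_rates(y_true, y_pred, group, value):
    """Selection rate, TPR and FPR over the rows where group == value."""
    rows = [(t, p) for t, p, g in zip(y_true, y_pred, group) if g == value]
    selection = rate([p == 1 for _, p in rows])
    tpr = rate([p == 1 for t, p in rows if t == 1])
    fpr = rate([p == 1 for t, p in rows if t == 0])
    return selection, tpr, fpr


def fairness_summary(y_true, y_pred, group, privileged=1, unprivileged=0):
    """Group fairness metrics, each computed as unprivileged vs privileged."""
    sel_u, tpr_u, fpr_u = group_rates(y_true, y_pred, group, unprivileged)
    sel_p, tpr_p, fpr_p = group_rates(y_true, y_pred, group, privileged)
    return {
        "statistical_parity_difference": sel_u - sel_p,
        "disparate_impact": sel_u / sel_p if sel_p else float("nan"),
        "equal_opportunity_difference": tpr_u - tpr_p,
        "average_odds_difference": 0.5 * ((fpr_u - fpr_p) + (tpr_u - tpr_p)),
    }


def theil_index(y_true, y_pred):
    """Theil index with benefit b_i = y_pred_i - y_true_i + 1."""
    b = [p - t + 1 for t, p in zip(y_true, y_pred)]
    mu = sum(b) / len(b)
    # Terms with b_i == 0 contribute nothing (0 * ln 0 is taken as 0).
    return sum((bi / mu) * math.log(bi / mu) for bi in b if bi > 0) / len(b)


# Toy data: four privileged rows followed by four unprivileged rows.
group = [1, 1, 1, 1, 0, 0, 0, 0]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

summary = fairness_summary(y_true, y_pred, group)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
print(f"theil_index: {theil_index(y_true, y_pred):.4f}")
```

On this toy data the unprivileged group receives the favorable prediction at a rate of 0.25 against 0.75 for the privileged group, so the statistical parity difference is -0.5 and the disparate impact is one third.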

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
sns.set(context='talk', style='whitegrid')
from IPython.display import display, Markdown

from tensorflow import keras
from sklearn.metrics import RocCurveDisplay
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler, MaxAbsScaler, RobustScaler
from sklearn.metrics import accuracy_score

import tensorflow as tf
from tensorflow.keras.optimizers  import Adam, Adagrad, SGD, RMSprop

# from aif360.sklearn.metrics import mdss_bias_scan, mdss_bias_score
import aif360
import utilities
import global_variables as gv
import fairness_helpers as fh

In [2]:
from aif360.datasets import StandardDataset, BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric, ClassificationMetric
from aif360.sklearn import metrics as mt
from aif360.explainers import MetricTextExplainer, MetricJSONExplainer

### load data

#### configure datasets according to aif360 library
<ol>
    <li> Binarize the sensitive attribute: 1 for the privileged group, 0 for the unprivileged group</li>
    <li> Binarize the label column: 1 for the positive outcome, 0 otherwise</li>
    <li> Set the sensitive attribute as the index</li>
</ol>

In [3]:
df = pd.read_csv('data/binary_full.csv')
pd.set_option('display.max_columns', None)
df.drop('Unnamed: 0', axis=1, inplace=True)
df['sex-binary']=df['sex'].map(gv.binary_sex)

input_cols = df.iloc[:,:61].columns.to_list()
X1 = df.loc[:,input_cols+['CVD']+['sex-binary']].set_index('sex-binary')
X11 = df.loc[:,input_cols+['CVD']+['sex-binary']]
X2 = df.loc[:,input_cols+['CVD']+['race-binary']].set_index('race-binary')
X22 = df.loc[:,input_cols+['CVD']+['race-binary']]

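X3 = df.loc[:,input_cols+['CVD']].copy()  # explicit copy so the new column is not written to a view of df
X3['age-binary'] = np.where((df['age']>=50)&(df['age']<70),1,0)  # 1 for ages 50-69, 0 otherwise
X33 = X3  # keeps 'age-binary' as a regular column
X3 = X3.set_index('age-binary')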
X4 = df.loc[:,input_cols+['CVD']+['age']].set_index('age')

# X1.drop(columns=gv.protected_attributes[0], axis=1, inplace=True)
# X2.drop(columns=gv.protected_attributes[1], axis=1, inplace=True)
# X3.drop(columns=gv.protected_attributes[2], axis=1, inplace=True)
# X4.drop(columns=gv.protected_attributes[2], axis=1, inplace=True)

In [4]:
gv.protected_attributes # sex, race, age

['31-0.0', '21000-0.0', '21003-0.0']

In [5]:
X11.head()

Unnamed: 0,1319-0.0,1408-0.0,1329-0.0,1448-0.0,1538-0.0,6142-0.0,2050-0.0,1508-0.0,1339-0.0,30710-0.0,1349-0.0,30750-0.0,1468-0.0,20117-0.0,30740-0.0,1160-0.0,2090-0.0,31-0.0,1488-0.0,30850-0.0,4080-0.0,1369-0.0,21000-0.0,1200-0.0,1289-0.0,30790-0.0,845-0.0,48-0.0,30630-0.0,1299-0.0,1220-0.0,1548-0.0,1528-0.0,23099-0.0,49-0.0,30690-0.0,1389-0.0,2654-0.0,1249-0.0,1309-0.0,1379-0.0,1239-0.0,21003-0.0,30780-0.0,1438-0.0,30870-0.0,1359-0.0,30770-0.0,21001-0.0,1458-0.0,23100-0.0,6138-0.0,1418-0.0,1478-0.0,4079-0.0,30760-0.0,23101-0.0,2100-0.0,1428-0.0,30640-0.0,hypertension,CVD,sex-binary
0,0.0,1.0,2.0,3.0,2.0,1.0,2.0,3.0,2.0,0.34,1.0,34.937,3.0,2.0,5.622,7.0,1.0,0.0,6.0,0.508,110.0,1.0,1001.0,3.0,6.0,54.4035,20.9,74.0,1.593,10.0,0.0,2.0,2.0,35.6,102.0,6.477,1.0,6.0,1.0,2.0,1.0,0.0,54.0,3.888,10.0,0.977,2.0,26.339,24.579,3.86,25.0,1.0,3.0,1.0,77.0,1.706,45.2,1.0,0.0,1.211,0,1,0
1,0.0,3.0,2.0,1.0,0.0,1.0,1.0,2.0,2.0,3.94,4.0,40.9,5.0,2.0,5.052,9.0,0.0,1.0,2.0,13.088,166.0,2.0,1001.0,2.0,2.0,15.4,16.0,120.0,1.39,2.0,0.0,2.0,2.47,36.5,113.0,5.512,1.0,7.0,1.0,1.0,2.0,0.0,65.0,3.52,12.0,2.358,3.0,10.701,35.0861,7.0,42.9,3.0,2.0,1.0,91.0,1.173,74.6,0.0,1.0,1.019,1,0,1
2,0.0,3.0,3.0,2.0,1.0,2.0,1.0,2.0,2.0,0.55,1.0,40.0,1.0,0.0,5.31,5.0,0.0,0.0,0.0,0.515,132.0,1.0,1001.0,3.0,2.0,32.1,16.0,66.0,2.005,4.0,0.0,1.0,1.0,29.5,88.0,7.079,1.0,7.0,3.0,4.0,2.0,0.0,69.0,4.227,8.0,0.655,2.0,10.693,19.3835,7.0,15.2,3.0,2.0,1.0,67.0,2.49,36.3,0.0,1.0,1.097,0,0,0
3,3.0,3.0,3.0,3.0,0.0,2.0,1.0,2.0,2.0,0.45,2.0,37.3,4.0,2.0,4.449,7.0,0.0,1.0,5.0,4.675,178.0,2.0,1001.0,1.0,3.0,43.562,18.0,110.0,1.474,2.0,0.0,1.0,2.0,28.5,117.0,5.028,0.0,7.0,1.0,1.0,2.0,1.0,66.0,3.041,10.0,3.108,2.0,25.317,35.1281,7.0,31.7,3.0,2.0,1.0,84.0,1.169,79.6,0.0,3.0,0.923,0,0,1
4,0.0,3.0,2.0,1.0,0.0,5.0,2.0,2.0,2.0,0.75,2.0,32.2,1.0,2.0,4.616,6.0,0.0,1.0,3.04,20.162,178.0,1.0,1001.0,3.0,1.0,71.11,22.38,94.0,2.149,1.0,0.0,2.0,2.0,24.8,100.0,7.958,1.0,7.0,2.0,1.0,1.0,0.0,48.0,4.983,8.0,1.173,1.0,26.523,25.8866,1.0,20.1,1.0,2.0,1.0,88.0,2.053,61.0,0.0,3.0,1.443,0,0,1


In [6]:
# transform features

X111 = utilities.transform_features(X11, 'CVD') # excluding protected attribute cols
X222 = utilities.transform_features(X22, 'CVD')
X333 = utilities.transform_features(X33, 'CVD')

X111_full = pd.concat([X111, X11.loc[:,'sex-binary']], axis=1)
X222_full = pd.concat([X222, X22.loc[:,'race-binary']], axis=1)
X333_full = pd.concat([X333, X33.loc[:,'age-binary']], axis=1)

### DataFrames

#### Sex-binary

<ol>
    <li><b>X1</b>: df indexed by protected attribute, includes CVD output</li>
    <li><b>X11</b>: df with all inputs + output + protected attribute, normal indexing</li>
    <li><b>X111</b>: transformed X11, excludes protected attribute</li>
    <li><b>X111_full</b>: X111 + protected attribute</li>
    <li><b>dataset1</b>: aif360 dataset</li>
</ol>

#### Race-binary

<ol>
    <li><b>X2</b>: df indexed by protected attribute, includes CVD output</li>
    <li><b>X22</b>: df with all inputs + output + protected attribute, normal indexing</li>
    <li><b>X222</b>: transformed X22, excludes protected attribute</li>
    <li><b>X222_full</b>: X222 + protected attribute</li>
    <li><b>dataset2</b>: aif360 dataset</li>
</ol>

#### Age-binary

<ol>
    <li><b>X3</b>: df indexed by protected attribute, includes CVD output</li>
    <li><b>X33</b>: df with all inputs + output + protected attribute, normal indexing</li>
    <li><b>X333</b>: transformed X33, excludes protected attribute</li>
    <li><b>X333_full</b>: X333 + protected attribute</li>
    <li><b>dataset3</b>: aif360 dataset</li>
</ol>

In [7]:
# split into test train sets
# (X1_train, X1_test,
#  y1_train, y1_test) = train_test_split(X11, y11, train_size=0.7, random_state=42)

# (X2_train, X2_test,
#  y2_train, y2_test) = train_test_split(X22, y22, train_size=0.7, random_state=42)

# (X3_train, X3_test,
#  y3_train, y3_test) = train_test_split(X33, y33, train_size=0.7, random_state=42)

In [8]:
# print(X1_train.shape, X1_test.shape, y1_train.shape, y1_test.shape)
# print(y1_train.value_counts(), '\n', y1_test.value_counts())

In [9]:
# print(X1_train.index.value_counts())

### Evaluate bias in original dataset: Disparate impact ratio

#### Interpretation:

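<ul>
    <li> output range = [0, ∞); a value of 1 indicates parity between the groups</li>
    <li> the closer the value is to 1, the fairer the outcome with respect to the given protected attribute</li>
    <li> values of at least 0.8 (the four-fifths rule) are commonly considered acceptable</li>
</ul>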
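The 0.8 threshold can be expressed as a small check. The sketch below assumes the symmetric form of the four-fifths rule, in which a ratio is acceptable when it lies between 0.8 and 1/0.8 = 1.25, because a ratio above 1 means the nominally unprivileged group is the one being favored. The function name `within_four_fifths` and the sample ratios are illustrative only.

```python
def within_four_fifths(di, lower=0.8):
    """True if a disparate impact ratio lies in [lower, 1 / lower]."""
    return lower <= di <= 1.0 / lower


# Illustrative ratios, not results from this notebook.
for di in (0.31, 0.85, 1.0, 1.4):
    print(di, within_four_fifths(di))
```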

In [11]:
mt.disparate_impact_ratio(X1['CVD'], prot_attr='sex-binary')

0.5661578163243346

In [12]:
mt.disparate_impact_ratio(X2['CVD'], prot_attr='race-binary')

0.5438220418530659

In [13]:
mt.disparate_impact_ratio(X3['CVD'], prot_attr='age-binary')

0.31036008621940697

> All three disparate impact ratios fall well below 0.8, indicating substantial disparity in the original data with respect to each of the investigated protected attributes

### load model

In [14]:
model = keras.models.load_model('saved_models/mlp_binary_1.h5')
model.compile(loss='categorical_hinge',
              optimizer=SGD(learning_rate=0.0005),
              metrics=['acc',tf.keras.metrics.AUC(), tf.keras.metrics.Recall()])

### Protected Attribute: Sex

In [15]:
result = fh.get_fairness(X1, 'sex-binary')
result



| metric | sex-binary |
|---|---|
| average_odds_difference | -0.043569 |
| disparate_impact_ratio | 0.068356 |
| equal_opportunity_difference | -0.070846 |
| statistical_parity_difference | -0.023181 |


In [16]:
result = fh.get_fairness(X2, 'race-binary')
result



| metric | race-binary |
|---|---|
| average_odds_difference | -0.016038 |
| disparate_impact_ratio | 0.395272 |
| equal_opportunity_difference | -0.027353 |
| statistical_parity_difference | -0.007668 |


In [17]:
result = fh.get_fairness(X3, 'age-binary')
result



| metric | age-binary |
|---|---|
| average_odds_difference | -0.021265 |
| disparate_impact_ratio | 0.103662 |
| equal_opportunity_difference | -0.032315 |
| statistical_parity_difference | -0.014005 |


In [18]:
dataset1 = StandardDataset(X111_full, 
                          label_name='CVD', 
                          favorable_classes=[1], 
                          protected_attribute_names=['sex-binary'], 
                          privileged_classes=[[1]])
dataset2 = StandardDataset(X222_full, 
                          label_name='CVD', 
                          favorable_classes=[1], 
                          protected_attribute_names=['race-binary'], 
                          privileged_classes=[[1]])
dataset3 = StandardDataset(X333_full, 
                          label_name='CVD', 
                          favorable_classes=[1], 
                          protected_attribute_names=['age-binary'], 
                          privileged_classes=[[1]])

#### Sex

#### metrics for original training data

In [19]:
privileged_groups, unprivileged_groups = fh.get_att_privilege_groups('sex-binary')
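AIF360 metric classes expect `privileged_groups` and `unprivileged_groups` as lists of dictionaries that map a protected attribute name to a value. The source of `fh.get_att_privilege_groups` is not shown here, so the function below is only a hypothetical stand-in that produces that shape, assuming the 1 = privileged, 0 = unprivileged encoding used when the datasets were configured.

```python
def get_privilege_groups(attribute, privileged_value=1, unprivileged_value=0):
    """Hypothetical stand-in for fh.get_att_privilege_groups.

    Returns (privileged_groups, unprivileged_groups) in the
    list-of-dicts form accepted by AIF360 metric classes.
    """
    privileged = [{attribute: privileged_value}]
    unprivileged = [{attribute: unprivileged_value}]
    return privileged, unprivileged


priv, unpriv = get_privilege_groups("sex-binary")
print(priv, unpriv)
```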

In [20]:
# Split the dataset into train (70%), validation (15%) and test (15%)
dataset_orig_train, dataset_orig_vt = dataset1.split([0.7], shuffle=True)
dataset_orig_valid, dataset_orig_test = dataset_orig_vt.split([0.5], shuffle=True)
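The two chained splits yield a 70/15/15 partition: the first call keeps 70% for training, and the second halves the remaining 30%. The plain-Python sketch below mirrors only that index arithmetic and is not how AIF360 implements `split`. The helper `shuffled_split` is a hypothetical name. Because the calls above shuffle without a fixed seed, the partitions differ between runs. AIF360's `split` takes an optional `seed` argument in recent versions, which is worth confirming against the installed release if reproducible metrics are needed.

```python
import random


def shuffled_split(items, fraction, seed=None):
    """Shuffle a copy of items and cut it at round(fraction * len(items))."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = round(fraction * len(items))
    return items[:cut], items[cut:]


# 1000 row ids stand in for the dataset rows.
train, rest = shuffled_split(range(1000), 0.7, seed=42)
valid, test = shuffled_split(rest, 0.5, seed=42)
print(len(train), len(valid), len(test))
```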

In [21]:
dataset_orig_test

               instance weights  features                                      \
                                                                                
                                 1319-0.0 1408-0.0 1329-0.0 2050-0.0 1339-0.0   
instance names                                                                  
431818                      1.0  0.629210      1.0      2.0      1.0      1.0   
488036                      1.0 -0.510709      3.0      3.0      1.0      2.0   
433182                      1.0 -0.510709      3.0      1.0      1.0      2.0   
341291                      1.0  0.059250      2.0      2.0      1.0      2.0   
299464                      1.0  0.059250      3.0      1.0      1.0      3.0   
...                         ...       ...      ...      ...      ...      ...   
160626                      1.0  0.059250      2.0      3.0      1.0      2.0   
480087                      1.0  0.059250      2.0      2.0      2.0      4.0   
80509                       

In [22]:
metric_orig_train = BinaryLabelDatasetMetric(dataset_orig_train, 
                             unprivileged_groups=unprivileged_groups,
                             privileged_groups=privileged_groups)
display(Markdown("#### Original training dataset"))
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_train.mean_difference())

metric_orig_valid = BinaryLabelDatasetMetric(dataset_orig_valid, 
                             unprivileged_groups=unprivileged_groups,
                             privileged_groups=privileged_groups)
display(Markdown("#### Original validation dataset"))
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_valid.mean_difference())

metric_orig_test = BinaryLabelDatasetMetric(dataset_orig_test, 
                             unprivileged_groups=unprivileged_groups,
                             privileged_groups=privileged_groups)
display(Markdown("#### Original test dataset"))
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_test.mean_difference())

#### Original training dataset

Difference in mean outcomes between unprivileged and privileged groups = -0.052469


#### Original validation dataset

Difference in mean outcomes between unprivileged and privileged groups = -0.053736


#### Original test dataset

Difference in mean outcomes between unprivileged and privileged groups = -0.048971


In [24]:
print('1-min(DI, 1/DI):', fh.get_disparity_index(metric_orig_train.disparate_impact()).round(3))

1-min(DI, 1/DI): 0.434
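The quantity printed above folds the disparate impact ratio into a single score that is 0 at parity and approaches 1 as the disparity grows, regardless of which group is favored. The body of `fh.get_disparity_index` is not shown, so the function below is an assumed equivalent based only on the printed label `1-min(DI, 1/DI)`. The input 0.566 is the ratio implied by the 0.434 result above.

```python
def disparity_index(di):
    """1 - min(DI, 1/DI): 0 at parity, approaching 1 as disparity grows."""
    if di <= 0:
        raise ValueError("disparate impact must be positive")
    return 1.0 - min(di, 1.0 / di)


print(round(disparity_index(0.566), 3))      # 0.434
print(round(disparity_index(1 / 0.566), 3))  # same score when the ratio is inverted
```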


#### Race

In [25]:
privileged_groups2, unprivileged_groups2 = fh.get_att_privilege_groups('race-binary')

# Split the dataset into train (70%), validation (15%) and test (15%)
dataset_orig_train2, dataset_orig_vt2 = dataset2.split([0.7], shuffle=True)
dataset_orig_valid2, dataset_orig_test2 = dataset_orig_vt2.split([0.5], shuffle=True)

In [26]:
metric_orig_train2 = BinaryLabelDatasetMetric(dataset_orig_train2, 
                             unprivileged_groups=unprivileged_groups2,
                             privileged_groups=privileged_groups2)
display(Markdown("#### Original training dataset"))
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_train2.mean_difference())

metric_orig_valid2 = BinaryLabelDatasetMetric(dataset_orig_valid2, 
                             unprivileged_groups=unprivileged_groups2,
                             privileged_groups=privileged_groups2)
display(Markdown("#### Original validation dataset"))
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_valid2.mean_difference())

metric_orig_test2 = BinaryLabelDatasetMetric(dataset_orig_test2, 
                             unprivileged_groups=unprivileged_groups2,
                             privileged_groups=privileged_groups2)
display(Markdown("#### Original test dataset"))
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_test2.mean_difference())

#### Original training dataset

Difference in mean outcomes between unprivileged and privileged groups = -0.043406


#### Original validation dataset

Difference in mean outcomes between unprivileged and privileged groups = -0.044774


#### Original test dataset

Difference in mean outcomes between unprivileged and privileged groups = -0.038882


In [27]:
print('1-min(DI, 1/DI):', fh.get_disparity_index(metric_orig_train2.disparate_impact()).round(3))

1-min(DI, 1/DI): 0.461


#### Age

In [29]:
privileged_groups3, unprivileged_groups3 = fh.get_att_privilege_groups('age-binary')

# Split the dataset into train (70%), validation (15%) and test (15%)
dataset_orig_train3, dataset_orig_vt3 = dataset3.split([0.7], shuffle=True)
dataset_orig_valid3, dataset_orig_test3 = dataset_orig_vt3.split([0.5], shuffle=True)

In [30]:
metric_orig_train3 = BinaryLabelDatasetMetric(dataset_orig_train3, 
                             unprivileged_groups=unprivileged_groups3,
                             privileged_groups=privileged_groups3)
display(Markdown("#### Original training dataset"))
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_train3.mean_difference())

metric_orig_valid3 = BinaryLabelDatasetMetric(dataset_orig_valid3, 
                             unprivileged_groups=unprivileged_groups3,
                             privileged_groups=privileged_groups3)
display(Markdown("#### Original validation dataset"))
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_valid3.mean_difference())

metric_orig_test3 = BinaryLabelDatasetMetric(dataset_orig_test3, 
                             unprivileged_groups=unprivileged_groups3,
                             privileged_groups=privileged_groups3)
display(Markdown("#### Original test dataset"))
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_test3.mean_difference())

#### Original training dataset

Difference in mean outcomes between unprivileged and privileged groups = -0.075218


#### Original validation dataset

Difference in mean outcomes between unprivileged and privileged groups = -0.077493


#### Original test dataset

Difference in mean outcomes between unprivileged and privileged groups = -0.077040


In [31]:
print('1-min(DI, 1/DI):', fh.get_disparity_index(metric_orig_train3.disparate_impact()).round(3))

1-min(DI, 1/DI): 0.687
