## This notebook demonstrates the use of the adapted Adversarial Debiasing algorithm.

The code has been updated to Tensorflow 2.X and can now be used to learn a fair classifier with respect to multiple sensitive attributes simultaneously.

References:

[Mitigating UnwantedBiases with Adversarial Learning](https://arxiv.org/pdf/1801.07593.pdf)  
  
B. H. Zhang, B. Lemoine, and M. Mitchell,    
AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society, 2018.

In [1]:

import sys
sys.path.append("../")

from aif360.metrics import BinaryLabelDatasetMetric
from aif360.metrics import ClassificationMetric

from aif360.algorithms.preprocessing.optim_preproc_helpers.data_preproc_functions import load_preproc_data_adult, load_preproc_data_compas, load_preproc_data_german
from sklearn.preprocessing import MinMaxScaler
from IPython.display import Markdown, display
from sklearn.metrics import roc_auc_score

# import adapted AdversarialDebiasing code
from aif360.algorithms.inprocessing.multi_attribute_adversarial_debiasing import *



`load_boston` has been removed from scikit-learn since version 1.2.

The Boston housing prices dataset has an ethical problem: as
investigated in [1], the authors of this dataset engineered a
non-invertible variable "B" assuming that racial self-segregation had a
positive impact on house prices [2]. Furthermore the goal of the
research that led to the creation of this dataset was to study the
impact of air quality but it did not give adequate demonstration of the
validity of this assumption.

The scikit-learn maintainers therefore strongly discourage the use of
this dataset unless the purpose of the code is to study and educate
about ethical issues in data science and machine learning.

In this special case, you can fetch the dataset from the original
source::

    import pandas as pd
    import numpy as np

    data_url = "http://lib.stat.cmu.edu/datasets/boston"
    raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
    data = np.hstack([raw_df.values[::2, :], raw_df

### Helper functions

In [2]:
def accuracy(predicted_labels, true):
    return tf.reduce_mean(tf.cast(tf.equal(predicted_labels, true), dtype=tf.float32))

### 1 . Load dataset

In [3]:
# Get the dataset and split into train and test
dataset_orig = load_preproc_data_compas()#load_preproc_data_adult()#

privileged_groups = [{'sex': 1}, {'race': 1}]
unprivileged_groups = [{'sex': 0}, {'race': 0}]

dataset_orig_train, dataset_orig_test = dataset_orig.split([0.7], shuffle=True)

In [4]:
# print out some labels, names, etc.
display(Markdown("#### Training Dataset shape"))
print(dataset_orig_train.features.shape)
display(Markdown("#### Favorable and unfavorable labels"))
print(dataset_orig_train.favorable_label, dataset_orig_train.unfavorable_label)
display(Markdown("#### Protected attribute names"))
print(dataset_orig_train.protected_attribute_names)
display(Markdown("#### Privileged and unprivileged protected attribute values"))
print(dataset_orig_train.privileged_protected_attributes, 
      dataset_orig_train.unprivileged_protected_attributes)
display(Markdown("#### Dataset feature names"))
print(dataset_orig_train.feature_names)

#### Training Dataset shape

(3694, 10)


#### Favorable and unfavorable labels

0.0 1.0


#### Protected attribute names

['sex', 'race']


#### Privileged and unprivileged protected attribute values

[array([1.]), array([1.])] [array([0.]), array([0.])]


#### Dataset feature names

['sex', 'race', 'age_cat=25 to 45', 'age_cat=Greater than 45', 'age_cat=Less than 25', 'priors_count=0', 'priors_count=1 to 3', 'priors_count=More than 3', 'c_charge_degree=F', 'c_charge_degree=M']


#### Evaluate metric for original training data

Race and sex are the two sensitive attributes

In [5]:
for unprivileged_group, privileged_group in zip (unprivileged_groups, privileged_groups):
    print('Sensitive attribute : ', list(unprivileged_group.keys())[0], '\n')


    # Metric for the original dataset
    metric_orig_train = BinaryLabelDatasetMetric(dataset_orig_train, 
                                                unprivileged_groups=[unprivileged_group],
                                                privileged_groups=[privileged_group])
    
    print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_train.mean_difference())
    metric_orig_test = BinaryLabelDatasetMetric(dataset_orig_test, 
                                                unprivileged_groups=[unprivileged_group],
                                                privileged_groups=[privileged_group])
    print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_test.mean_difference())
    print('\n'*2)

Sensitive attribute :  sex 

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.116761
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.173738



Sensitive attribute :  race 

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.133658
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.129302





#### Scale features

In [6]:
min_max_scaler = MinMaxScaler()#MaxAbsScaler()
dataset_orig_train.features = min_max_scaler.fit_transform(dataset_orig_train.features)
dataset_orig_test.features = min_max_scaler.transform(dataset_orig_test.features)



### 2 . Debias multiple sensitive attributes 

3 performance metrics and 2 Fairness metrics to evaluate the implementation. 
> <b> Performance metrics:  </b>  
> - Accuracy  
> - Balanced Accuracy  
> - AUC score  

> <b> Fairness metrics:  </b>   
> - Equal opportunity difference  
> - Average odds Difference  

##### NOTE:
Current implementation supports an adversary that tries to enforce equality of Odds  
since it gets both yHat and y during training (see [Mitigating UnwantedBiases with Adversarial Learning](https://arxiv.org/pdf/1801.07593.pdf)  )

In [7]:
# Example dataset
features = dataset_orig_train.features
labels = dataset_orig_train.labels
protected_attributes = dataset_orig_train.protected_attributes

dataset_orig_train.protected_attribute_names

['sex', 'race']

#### Training the model

The model consists of a classifier and an adversary network. At each training iteration, the classifier learns to predict a target label, and the adversary takes the classifier's predictions as well as  
the true target label, and learns to predict multiple sensitive attribute values (e.g. race , sex) for that instance. Each model can be pretrained for several epochs before initiating adversarial learning.  
The adversary can be provided with weights for each sensitive attribute, which is used in the loss computation.

In [8]:
# Define training parameters
nb_pretrain = 10
batch_size = 64
total_epochs = 40

# weights for 'sex', 'race'
# for Adult, use [3, 2]
# for COMPAS, use [3, 5]

loss_weights  = [3, 5 ]

In [9]:
# Create a trainer
trainer = AdversarialDebiasor(loss_weights = loss_weights)

# Pretrain the classifier model
trainer.pretrain_classifier(features, labels, num_epochs=nb_pretrain, batch_size=batch_size)

Pretraining Classifier - Epoch 1, Loss: 0.5559677481651306
Pretraining Classifier - Epoch 2, Loss: 0.6410130262374878
Pretraining Classifier - Epoch 3, Loss: 0.5992349982261658
Pretraining Classifier - Epoch 4, Loss: 0.5575165748596191
Pretraining Classifier - Epoch 5, Loss: 0.5891270637512207
Pretraining Classifier - Epoch 6, Loss: 0.5690146088600159
Pretraining Classifier - Epoch 7, Loss: 0.6537361741065979
Pretraining Classifier - Epoch 8, Loss: 0.5903810858726501
Pretraining Classifier - Epoch 9, Loss: 0.7376126050949097
Pretraining Classifier - Epoch 10, Loss: 0.723280131816864


#### Evaluate test set classification metrics after pre-training classifier

In [10]:
# Evaluate metrics

metrics = trainer.get_classification_metrics(dataset_orig_test)

for key, classified_metric_debiasing_test in metrics.items():
    print('Sensitive attribute : ', key)

    eod = classified_metric_debiasing_test.equal_opportunity_difference()
    aod = classified_metric_debiasing_test.average_odds_difference()
    ti = classified_metric_debiasing_test.theil_index()
    print('\tEqual opportunity diff. :', eod)
    print('\tAv. Odds diff.          :', aod, '\n'*2)


predicted_labels = trainer.predict(dataset_orig_test.features)
true_labels = dataset_orig_test.labels

TPR = classified_metric_debiasing_test.true_positive_rate()
TNR = classified_metric_debiasing_test.true_negative_rate()
bal_acc_debiasing_test = 0.5*(TPR+TNR)

# Should not change since only trained adversary
print('\nAccuracy        : ', accuracy(predicted_labels, true_labels).numpy())
print('Balanced accuracy : ', bal_acc_debiasing_test)
print('AUC score         : ', roc_auc_score(true_labels, trainer.predict_proba(dataset_orig_test.features)))


Sensitive attribute :  sex
	Equal opportunity diff. : -0.07632051742723678
	Av. Odds diff.          : -0.097887890434921 


Sensitive attribute :  race
	Equal opportunity diff. : -0.184298473062518
	Av. Odds diff.          : -0.2472622246144233 



Accuracy        :  0.6546717
Balanced accuracy :  0.6529976095872341
AUC score         :  0.7046936833942261


#### Pretrain the adversary on trained classifier's outputs

In [11]:

trainer.pretrain_adversary(features, labels, protected_attributes, num_epochs=nb_pretrain, batch_size=batch_size)

Pretraining Adversary - Epoch 1, Loss: 5.051822662353516
Pretraining Adversary - Epoch 2, Loss: 4.579408645629883
Pretraining Adversary - Epoch 3, Loss: 4.724931716918945
Pretraining Adversary - Epoch 4, Loss: 4.734898090362549
Pretraining Adversary - Epoch 5, Loss: 4.242758274078369
Pretraining Adversary - Epoch 6, Loss: 5.133546829223633
Pretraining Adversary - Epoch 7, Loss: 4.510110855102539
Pretraining Adversary - Epoch 8, Loss: 3.8390564918518066
Pretraining Adversary - Epoch 9, Loss: 5.072484493255615
Pretraining Adversary - Epoch 10, Loss: 4.4203290939331055


#### Train both models together

In [12]:
# Train both models together
trainer.train(features, labels, protected_attributes, num_epochs=total_epochs, batch_size=batch_size)

predicted_labels = trainer.predict(dataset_orig_test.features)

Epoch 1, Classifier Loss: 0.8082079887390137, Adversary Loss: 5.030716896057129
Epoch 2, Classifier Loss: 0.7027792930603027, Adversary Loss: 4.895318984985352
Epoch 3, Classifier Loss: 0.7074283957481384, Adversary Loss: 4.274467468261719
Epoch 4, Classifier Loss: 0.6176442503929138, Adversary Loss: 4.480430603027344
Epoch 5, Classifier Loss: 0.6115831136703491, Adversary Loss: 4.599733352661133
Epoch 6, Classifier Loss: 0.6636387705802917, Adversary Loss: 4.595342636108398
Epoch 7, Classifier Loss: 0.6236449480056763, Adversary Loss: 4.6188178062438965
Epoch 8, Classifier Loss: 0.5074901580810547, Adversary Loss: 4.653450965881348
Epoch 9, Classifier Loss: 0.7306272387504578, Adversary Loss: 4.531330585479736
Epoch 10, Classifier Loss: 0.5685167908668518, Adversary Loss: 4.804622650146484
Epoch 11, Classifier Loss: 0.6385553479194641, Adversary Loss: 4.2814788818359375
Epoch 12, Classifier Loss: 0.6575136780738831, Adversary Loss: 4.911703109741211
Epoch 13, Classifier Loss: 0.597085

#### Evaluate test set classification metrics after debiasing

In [13]:
metrics = trainer.get_classification_metrics(dataset_orig_test)

for key, classified_metric_debiasing_test in metrics.items():
    print('Sensitive attribute : ', key, '\n')

    TPR = classified_metric_debiasing_test.true_positive_rate()
    TNR = classified_metric_debiasing_test.true_negative_rate()
    bal_acc_debiasing_test = 0.5*(TPR+TNR)
    eod = classified_metric_debiasing_test.equal_opportunity_difference()
    aod = classified_metric_debiasing_test.average_odds_difference()
    ti = classified_metric_debiasing_test.theil_index()
    print('\tEqual opportunity diff. :', eod)
    print('\tAv. Odds diff.          :', aod, '\n'*2)
TPR = classified_metric_debiasing_test.true_positive_rate()
TNR = classified_metric_debiasing_test.true_negative_rate()
bal_acc_debiasing_test = 0.5*(TPR+TNR)



print('Accuracy          : ', accuracy(predicted_labels, true_labels).numpy())
print('Balanced accuracy : ', bal_acc_debiasing_test)
print('AUC score         : ', roc_auc_score(true_labels, trainer.predict_proba(dataset_orig_test.features)))


Sensitive attribute :  sex 

	Equal opportunity diff. : 0.05910887531440889
	Av. Odds diff.          : 0.031224620498273375 


Sensitive attribute :  race 

	Equal opportunity diff. : -0.06104868913857686
	Av. Odds diff.          : -0.1180623770236495 


Accuracy          :  0.6483586
Balanced accuracy :  0.6434343595852354
AUC score         :  0.6887330812340606


In [14]:
dmetrics = trainer.get_dataset_metrics(dataset_orig_test)

for key, classified_metric_debiasing_test in metrics.items():
    print('Sensitive attribute : ', key)
    print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % classified_metric_debiasing_test.mean_difference())
    print('\n')

Sensitive attribute :  sex
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.012273


Sensitive attribute :  race
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.147379


