#### This notebook demonstrates the use of the adversarial debiasing algorithm to learn a fair classifier.
Adversarial debiasing [1] is an in-processing technique that trains a classifier to maximize prediction accuracy while reducing an adversary's ability to infer the protected attribute from the predictions. The resulting classifier is fairer because its predictions carry less group information for the adversary to exploit. This notebook applies the algorithm to the COMPAS dataset with two protected attributes, sex and race, and compares the classifier's accuracy and fairness metrics before and after adversarial training.
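
As a rough illustration of the idea, the classifier update proposed in [1] removes the component of the prediction-loss gradient that would help the adversary and then adds a term that actively hurts it. The sketch below applies that rule to plain Python lists; the function name `debiased_gradient` and the toy gradients are assumptions made for illustration, and the `AdversarialDebiasor` class used in this notebook may combine the losses differently.

```python
import math

def debiased_gradient(grad_pred, grad_adv, alpha=1.0):
    """Classifier update direction following the projection rule in [1].

    grad_pred: gradient of the prediction loss w.r.t. the classifier weights
    grad_adv:  gradient of the adversary loss w.r.t. the same weights
    alpha:     strength of the term that pushes against the adversary
    """
    norm = math.sqrt(sum(a * a for a in grad_adv))
    if norm == 0.0:
        # No adversary signal: fall back to the plain prediction gradient
        return list(grad_pred)
    unit = [a / norm for a in grad_adv]
    # Length of the projection of grad_pred onto the adversary direction
    dot = sum(p * u for p, u in zip(grad_pred, unit))
    # grad_pred - proj_{grad_adv}(grad_pred) - alpha * grad_adv
    return [p - dot * u - alpha * a for p, u, a in zip(grad_pred, unit, grad_adv)]

# Toy gradients (hypothetical values, not taken from the notebook)
update = debiased_gradient([1.0, 1.0], [1.0, 0.0], alpha=0.5)
print(update)
```

In the toy call, the first coordinate is the one the adversary benefits from, so the update cancels the classifier's movement along it and steps in the opposite direction, while the second coordinate is left untouched.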

In [1]:

# Load all necessary packages
import sys
sys.path.append("../")

from aif360.metrics import BinaryLabelDatasetMetric
from aif360.metrics import ClassificationMetric

from aif360.algorithms.preprocessing.optim_preproc_helpers.data_preproc_functions import load_preproc_data_adult, load_preproc_data_compas, load_preproc_data_german
from aif360.algorithms.inprocessing.multi_attribute_adversarial_debiasing import *

from sklearn.preprocessing import MinMaxScaler
from IPython.display import Markdown, display
from sklearn.metrics import roc_auc_score


### Helpers

In [None]:
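import tensorflow as tf  # used by the accuracy helper below

def accuracy(predicted_labels, true):
    # Fraction of predictions that match the true labels
    return tf.reduce_mean(tf.cast(tf.equal(predicted_labels, true), dtype=tf.float32))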

### Load dataset

In [3]:
# Get the dataset and split into train and test
# Alternative: load_preproc_data_adult()
dataset_orig = load_preproc_data_compas()

privileged_groups = [{'sex': 1}, {'race': 1}]
unprivileged_groups = [{'sex': 0}, {'race': 0}]

dataset_orig_train, dataset_orig_test = dataset_orig.split([0.7], shuffle=True)

In [4]:
# print out some labels, names, etc.
display(Markdown("#### Training Dataset shape"))
print(dataset_orig_train.features.shape)
display(Markdown("#### Favorable and unfavorable labels"))
print(dataset_orig_train.favorable_label, dataset_orig_train.unfavorable_label)
display(Markdown("#### Protected attribute names"))
print(dataset_orig_train.protected_attribute_names)
display(Markdown("#### Privileged and unprivileged protected attribute values"))
print(dataset_orig_train.privileged_protected_attributes, 
      dataset_orig_train.unprivileged_protected_attributes)
display(Markdown("#### Dataset feature names"))
print(dataset_orig_train.feature_names)

#### Training Dataset shape

(3694, 10)


#### Favorable and unfavorable labels

0.0 1.0


#### Protected attribute names

['sex', 'race']


#### Privileged and unprivileged protected attribute values

[array([1.]), array([1.])] [array([0.]), array([0.])]


#### Dataset feature names

['sex', 'race', 'age_cat=25 to 45', 'age_cat=Greater than 45', 'age_cat=Less than 25', 'priors_count=0', 'priors_count=1 to 3', 'priors_count=More than 3', 'c_charge_degree=F', 'c_charge_degree=M']


#### Evaluate metric for original training data

In [5]:
for unprivileged_group, privileged_group in zip(unprivileged_groups, privileged_groups):
    print('Sensitive attribute : ', list(unprivileged_group.keys())[0], '\n')


    # Metric for the original dataset
    metric_orig_train = BinaryLabelDatasetMetric(dataset_orig_train, 
                                                unprivileged_groups=[unprivileged_group],
                                                privileged_groups=[privileged_group])
    
    print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_train.mean_difference())
    metric_orig_test = BinaryLabelDatasetMetric(dataset_orig_test, 
                                                unprivileged_groups=[unprivileged_group],
                                                privileged_groups=[privileged_group])
    print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_test.mean_difference())
    print('\n'*2)

sex
Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.134150
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.136590


race
Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.158091
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.072146
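
For reference, the mean difference reported above is the rate of favorable labels in the unprivileged group minus the rate in the privileged group, so a negative value means the unprivileged group receives the favorable label less often. The following is a minimal sketch of that calculation on toy data; the function name `mean_difference` and the toy arrays are illustrative assumptions, and unlike `BinaryLabelDatasetMetric` it ignores instance weights.

```python
def mean_difference(labels, group, favorable_label=0.0):
    """P(label = favorable | unprivileged) - P(label = favorable | privileged).

    group holds 0 for unprivileged and 1 for privileged instances,
    matching the encoding used in this notebook.
    """
    def favorable_rate(group_value):
        members = [y for y, g in zip(labels, group) if g == group_value]
        return sum(y == favorable_label for y in members) / len(members)

    return favorable_rate(0) - favorable_rate(1)

# Toy data (hypothetical): four unprivileged and four privileged instances
toy_labels = [0, 0, 1, 1, 0, 1, 1, 1]
toy_group = [0, 0, 0, 0, 1, 1, 1, 1]
toy_result = mean_difference(toy_labels, toy_group)
print(toy_result)
```

The default `favorable_label=0.0` follows the favorable label printed for this dataset earlier in the notebook.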




In [6]:
# Fit the scaler on the training features only, then reuse it on the test features
min_max_scaler = MinMaxScaler()
dataset_orig_train.features = min_max_scaler.fit_transform(dataset_orig_train.features)
dataset_orig_test.features = min_max_scaler.transform(dataset_orig_test.features)



### Debias multiple sensitive attributes 

In [8]:
# Example dataset
features = dataset_orig_train.features
labels = dataset_orig_train.labels
protected_attributes = dataset_orig_train.protected_attributes

dataset_orig_train.protected_attribute_names

['sex', 'race']

In [2]:
# Define training parameters
nb_pretrain = 10
batch_size = 64
total_epochs = 30

In [25]:
# Create a trainer
trainer = AdversarialDebiasor(loss_weights=[2, 5])

# Pretrain the classifier model
trainer.pretrain_classifier(features, labels, num_epochs=nb_pretrain, batch_size=batch_size)

Pretraining Classifier - Epoch 1, Loss: 0.6997118592262268
Pretraining Classifier - Epoch 2, Loss: 0.7821968793869019
Pretraining Classifier - Epoch 3, Loss: 0.5729279518127441
Pretraining Classifier - Epoch 4, Loss: 0.6386002898216248
Pretraining Classifier - Epoch 5, Loss: 0.46196913719177246
Pretraining Classifier - Epoch 6, Loss: 0.6169740557670593
Pretraining Classifier - Epoch 7, Loss: 0.5328130722045898
Pretraining Classifier - Epoch 8, Loss: 0.5376180410385132
Pretraining Classifier - Epoch 9, Loss: 0.5474033951759338
Pretraining Classifier - Epoch 10, Loss: 0.6379567980766296


In [26]:
# Evaluate the pretrained classifier on the test set (before adversarial training)

metrics = trainer.get_classification_metrics(dataset_orig_test)

for key, classified_metric_debiasing_test in metrics.items():
    print('Sensitive attribute : ', key)

    eod = classified_metric_debiasing_test.equal_opportunity_difference()
    aod = classified_metric_debiasing_test.average_odds_difference()
    print('\tEqual opportunity diff. :', eod)
    print('\tAv. Odds diff.          :', aod, '\n'*2)


predicted_labels = trainer.predict(dataset_orig_test.features)
true_labels = dataset_orig_test.labels

# TPR and TNR over the whole test set, taken from the last metric object in the loop
TPR = classified_metric_debiasing_test.true_positive_rate()
TNR = classified_metric_debiasing_test.true_negative_rate()
bal_acc_debiasing_test = 0.5*(TPR+TNR)

# Overall performance of the classifier at this stage
print('\nAccuracy        : ', accuracy(predicted_labels, true_labels).numpy())
print('Balanced accuracy : ', bal_acc_debiasing_test)
print('AUC score         : ', roc_auc_score(true_labels, trainer.predict_proba(dataset_orig_test.features)))


Sensitive attribute :  sex
	Equal opportunity diff. : -0.1346140883046849
	Av. Odds diff.          : -0.14690302396562968 


Sensitive attribute :  race
	Equal opportunity diff. : -0.10410880506320597
	Av. Odds diff.          : -0.1449018645342491 



Accuracy        :  0.6439394
Balanced accuracy :  0.6373624259216502
AUC score         :  0.6721495869570795
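
The two fairness metrics printed above compare error rates between groups: the equal opportunity difference is the gap in true positive rates (unprivileged minus privileged), and the average odds difference is the mean of the gaps in false positive and true positive rates. The sketch below reproduces those definitions on toy data. The helper names `rates` and `group_fairness` and the toy arrays are assumptions for illustration; it treats the favorable label as the positive class and ignores instance weights.

```python
def rates(y_true, y_pred, favorable_label=0.0):
    """Return (TPR, FPR), treating the favorable label as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == favorable_label and p == favorable_label for t, p in pairs)
    fn = sum(t == favorable_label and p != favorable_label for t, p in pairs)
    fp = sum(t != favorable_label and p == favorable_label for t, p in pairs)
    tn = sum(t != favorable_label and p != favorable_label for t, p in pairs)
    return tp / (tp + fn), fp / (fp + tn)

def group_fairness(y_true, y_pred, group, favorable_label=0.0):
    """Return (equal opportunity difference, average odds difference)."""
    def subset(value):
        rows = [(t, p) for t, p, g in zip(y_true, y_pred, group) if g == value]
        return [t for t, _ in rows], [p for _, p in rows]

    tpr_u, fpr_u = rates(*subset(0), favorable_label)  # unprivileged group
    tpr_p, fpr_p = rates(*subset(1), favorable_label)  # privileged group
    eod = tpr_u - tpr_p
    aod = 0.5 * ((fpr_u - fpr_p) + (tpr_u - tpr_p))
    return eod, aod

# Toy data (hypothetical): the classifier misses one favorable case
# in the unprivileged group and is perfect on the privileged group
toy_true = [0, 0, 1, 1, 0, 0, 1, 1]
toy_pred = [0, 1, 1, 1, 0, 0, 1, 1]
toy_group = [0, 0, 0, 0, 1, 1, 1, 1]
toy_eod, toy_aod = group_fairness(toy_true, toy_pred, toy_group)
print(toy_eod, toy_aod)
```

Values close to zero indicate that both groups are treated similarly under the respective criterion.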


In [14]:
# Pretrain the adversary on the trained classifier's outputs
trainer.pretrain_adversary(features, labels, protected_attributes, num_epochs=nb_pretrain, batch_size=batch_size)

Pretraining Adversary - Epoch 1, Loss: 4.554468631744385
Pretraining Adversary - Epoch 2, Loss: 4.257065296173096
Pretraining Adversary - Epoch 3, Loss: 3.8502635955810547
Pretraining Adversary - Epoch 4, Loss: 4.026779651641846
Pretraining Adversary - Epoch 5, Loss: 4.1616621017456055
Pretraining Adversary - Epoch 6, Loss: 3.41196870803833
Pretraining Adversary - Epoch 7, Loss: 3.6854116916656494
Pretraining Adversary - Epoch 8, Loss: 3.7127201557159424
Pretraining Adversary - Epoch 9, Loss: 3.7646265029907227
Pretraining Adversary - Epoch 10, Loss: 3.9085896015167236
Epoch 1, Classifier Loss: 0.8034292459487915, Adversary Loss: 5.14768648147583
Epoch 2, Classifier Loss: 0.7351345419883728, Adversary Loss: 4.93751859664917
Epoch 3, Classifier Loss: 0.5497244596481323, Adversary Loss: 4.192628383636475
Epoch 4, Classifier Loss: 0.6802529096603394, Adversary Loss: 4.399204730987549
Epoch 5, Classifier Loss: 0.6360391974449158, Adversary Loss: 4.44249153137207
Epoch 6, Classifier Loss: 0

In [None]:
# Train both models together
trainer.train(features, labels, protected_attributes, num_epochs=total_epochs, batch_size=batch_size)

predicted_labels = trainer.predict(dataset_orig_test.features)

In [21]:
# Evaluate the jointly trained (debiased) classifier on the test set
metrics = trainer.get_classification_metrics(dataset_orig_test)

for key, classified_metric_debiasing_test in metrics.items():
    print('Sensitive attribute : ', key, '\n')

    eod = classified_metric_debiasing_test.equal_opportunity_difference()
    aod = classified_metric_debiasing_test.average_odds_difference()
    print('\tEqual opportunity diff. :', eod)
    print('\tAv. Odds diff.          :', aod, '\n'*2)

# TPR and TNR over the whole test set, taken from the last metric object in the loop
TPR = classified_metric_debiasing_test.true_positive_rate()
TNR = classified_metric_debiasing_test.true_negative_rate()
bal_acc_debiasing_test = 0.5*(TPR+TNR)

print('Accuracy          : ', accuracy(predicted_labels, true_labels).numpy())
print('Balanced accuracy : ', bal_acc_debiasing_test)
print('AUC score         : ', roc_auc_score(true_labels, trainer.predict_proba(dataset_orig_test.features)))


Sensitive attribute :  sex 

	Equal opportunity diff. : -0.0002471632400853352
	Av. Odds diff.          : 0.008075483386319304 


Sensitive attribute :  race 

	Equal opportunity diff. : 0.04186234408927947
	Av. Odds diff.          : 0.006450227193240843 


Accuracy          :  0.62752527
Balanced accuracy :  0.6203725082736858
AUC score         :  0.6634093037276483
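
Balanced accuracy is reported alongside plain accuracy because it averages the true positive and true negative rates, which makes it less sensitive to class imbalance. A minimal sketch of both scores on toy data follows; the function name `accuracy_scores` and the toy arrays are illustrative assumptions, with the favorable label treated as the positive class.

```python
def accuracy_scores(y_true, y_pred, favorable_label=0.0):
    """Return (plain accuracy, balanced accuracy) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == favorable_label and p == favorable_label for t, p in pairs)
    tn = sum(t != favorable_label and p != favorable_label for t, p in pairs)
    n_pos = sum(t == favorable_label for t in y_true)
    n_neg = len(y_true) - n_pos
    plain = (tp + tn) / len(y_true)
    # Balanced accuracy: mean of true positive rate and true negative rate
    balanced = 0.5 * (tp / n_pos + tn / n_neg)
    return plain, balanced

# Toy data (hypothetical): 8 favorable and 2 unfavorable cases,
# with a classifier that always predicts the favorable label
toy_true = [0] * 8 + [1] * 2
toy_pred = [0] * 10
toy_plain, toy_balanced = accuracy_scores(toy_true, toy_pred)
print(toy_plain, toy_balanced)
```

In this toy case the constant classifier looks good on plain accuracy but scores no better than chance on balanced accuracy, which is why both numbers are worth tracking when comparing the classifier before and after debiasing.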


In [17]:
dmetrics = trainer.get_dataset_metrics(dataset_orig_test)

# Iterate over the dataset metrics computed above
# (assumes a dict keyed by sensitive attribute, like get_classification_metrics)
for key, dataset_metric_debiasing_test in dmetrics.items():
    print('Sensitive attribute : ', key)
    print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % dataset_metric_debiasing_test.mean_difference())
    print('\n')

Sensitive attribute :  sex
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.026899


Sensitive attribute :  race
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.007242





    References:
    [1] B. H. Zhang, B. Lemoine, and M. Mitchell, "Mitigating Unwanted Biases with Adversarial Learning,"
    AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society, 2018.