#### This notebook demonstrates the use of the adversarial debiasing algorithm to learn a fair classifier.
Adversarial debiasing [1] is an in-processing technique that learns a classifier to maximize prediction accuracy while simultaneously reducing an adversary's ability to determine the protected attribute from the predictions. This leads to a fairer classifier, because the predictions carry little group-discrimination information that the adversary can exploit. We will see how to use this algorithm to learn models with and without fairness constraints and apply them to the Adult dataset.
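The idea can be illustrated with the weight update described in [1]: the classifier's gradient has its component along the adversary's gradient removed, and a scaled copy of the adversary's gradient is subtracted so that the classifier moves in a direction that hurts the adversary. The following is a minimal NumPy sketch of that rule, not the AIF360 implementation; the function name `debiased_gradient`, the parameters `alpha` and `eps`, and the toy vectors are illustrative assumptions.

```python
import numpy as np

def debiased_gradient(grad_pred, grad_adv, alpha=0.1, eps=1e-8):
    """Combine classifier and adversary gradients (sketch of the rule in [1]).

    grad_pred: gradient of the classifier loss w.r.t. the classifier weights
    grad_adv:  gradient of the adversary loss w.r.t. the same weights
    alpha:     weight of the adversarial term (hypothetical default)
    eps:       small constant that avoids division by zero
    """
    grad_pred = np.asarray(grad_pred, dtype=float)
    grad_adv = np.asarray(grad_adv, dtype=float)
    # Unit vector pointing along the adversary's gradient
    unit_adv = grad_adv / (np.linalg.norm(grad_adv) + eps)
    # Part of the classifier gradient that would also help the adversary
    projection = np.dot(grad_pred, unit_adv) * unit_adv
    # Remove that part, then push against the adversary
    return grad_pred - projection - alpha * grad_adv

# Toy example: the adversary's gradient points along the first axis
update = debiased_gradient([1.0, 1.0], [1.0, 0.0], alpha=0.5)
print(update)
```

With `alpha=0`, the returned vector is (up to `eps`) orthogonal to the adversary's gradient, so a step along it does not, to first order, change the adversary's loss.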

In [15]:
%matplotlib inline
# Load all necessary packages
import sys
sys.path.append("../")
from aif360.datasets import BinaryLabelDataset
from aif360.datasets import AdultDataset, GermanDataset, CompasDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.metrics import ClassificationMetric
from aif360.metrics.utils import compute_boolean_conditioning_vector

from aif360.algorithms.preprocessing.optim_preproc_helpers.data_preproc_functions import load_preproc_data_adult, load_preproc_data_compas, load_preproc_data_german

from aif360.algorithms.inprocessing.adversarial_debiasing import AdversarialDebiasing

from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler, MaxAbsScaler
from sklearn.metrics import accuracy_score

from IPython.display import Markdown, display
import matplotlib.pyplot as plt

import tensorflow.compat.v1 as tf  # tested with TensorFlow 2.8.0
tf.disable_eager_execution()  # needed under TensorFlow 2.x, where eager execution is on by default

#### Load dataset and set options

In [16]:
# Get the dataset and split into train and test
dataset_orig = load_preproc_data_adult()

privileged_groups = [{'sex': 1}]
unprivileged_groups = [{'sex': 0}]

dataset_orig_train, dataset_orig_test = dataset_orig.split([0.7], shuffle=True)



In [17]:
# print out some labels, names, etc.
display(Markdown("#### Training Dataset shape"))
print(dataset_orig_train.features.shape)
display(Markdown("#### Favorable and unfavorable labels"))
print(dataset_orig_train.favorable_label, dataset_orig_train.unfavorable_label)
display(Markdown("#### Protected attribute names"))
print(dataset_orig_train.protected_attribute_names)
display(Markdown("#### Privileged and unprivileged protected attribute values"))
print(dataset_orig_train.privileged_protected_attributes, 
      dataset_orig_train.unprivileged_protected_attributes)
display(Markdown("#### Dataset feature names"))
print(dataset_orig_train.feature_names)

#### Training Dataset shape

(34189, 18)

#### Favorable and unfavorable labels

1.0 0.0

#### Protected attribute names

['sex', 'race']

#### Privileged and unprivileged protected attribute values

[array([1.]), array([1.])] [array([0.]), array([0.])]

#### Dataset feature names

['race', 'sex', 'Age (decade)=10', 'Age (decade)=20', 'Age (decade)=30', 'Age (decade)=40', 'Age (decade)=50', 'Age (decade)=60', 'Age (decade)=>=70', 'Education Years=6', 'Education Years=7', 'Education Years=8', 'Education Years=9', 'Education Years=10', 'Education Years=11', 'Education Years=12', 'Education Years=<6', 'Education Years=>12']

#### Metric for original training data

In [18]:
# Metric for the original dataset
metric_orig_train = BinaryLabelDatasetMetric(dataset_orig_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)
display(Markdown("#### Original training dataset"))
print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_train.mean_difference())
metric_orig_test = BinaryLabelDatasetMetric(dataset_orig_test, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)
print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_test.mean_difference())

#### Original training dataset

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.194850
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.193752
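The "difference in mean outcomes" printed above is the favorable-outcome rate of the unprivileged group minus that of the privileged group, so a negative value means the unprivileged group receives the favorable label less often. The sketch below recomputes that quantity with plain NumPy on made-up data; the function name `mean_difference`, its parameters, and the toy arrays are illustrative assumptions, and every instance that is not privileged is treated as unprivileged.

```python
import numpy as np

def mean_difference(labels, protected, favorable=1.0, privileged=1.0):
    """Favorable rate of the unprivileged group minus that of the privileged group."""
    labels = np.asarray(labels)
    protected = np.asarray(protected)
    is_favorable = labels == favorable
    is_privileged = protected == privileged
    rate_unprivileged = is_favorable[~is_privileged].mean()
    rate_privileged = is_favorable[is_privileged].mean()
    return rate_unprivileged - rate_privileged

# Toy data (not the Adult dataset): four unprivileged, then four privileged instances
toy_labels = [1, 0, 0, 0, 1, 1, 1, 0]
toy_sex = [0, 0, 0, 0, 1, 1, 1, 1]
toy_difference = mean_difference(toy_labels, toy_sex)
print(toy_difference)
```

In the toy data the unprivileged group has a favorable rate of 0.25 and the privileged group 0.75, so the difference is negative, with the same sign convention as the values reported for the Adult dataset.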


In [19]:
# Scale each feature by its maximum absolute value (fit on the training set only)
max_abs_scaler = MaxAbsScaler()
dataset_orig_train.features = max_abs_scaler.fit_transform(dataset_orig_train.features)
dataset_orig_test.features = max_abs_scaler.transform(dataset_orig_test.features)
metric_scaled_train = BinaryLabelDatasetMetric(dataset_orig_train, 
                             unprivileged_groups=unprivileged_groups,
                             privileged_groups=privileged_groups)
display(Markdown("#### Scaled dataset - Verify that the scaling does not affect the group label statistics"))
print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_scaled_train.mean_difference())
metric_scaled_test = BinaryLabelDatasetMetric(dataset_orig_test, 
                             unprivileged_groups=unprivileged_groups,
                             privileged_groups=privileged_groups)
print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_scaled_test.mean_difference())


#### Scaled dataset - Verify that the scaling does not affect the group label statistics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.194850
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.193752
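The values are unchanged because the scaler only touches the feature matrix, while the metric depends on the labels and the protected attributes. In addition, judging by the feature names listed earlier, the features here appear to be 0/1 indicators, and max-abs scaling leaves such columns as they are. The sketch below is a hand-written NumPy stand-in for max-abs scaling, not scikit-learn's code; the name `max_abs_scale` and the toy matrices are assumptions for illustration.

```python
import numpy as np

def max_abs_scale(features):
    """Divide each column by its maximum absolute value (all-zero columns are left alone)."""
    features = np.asarray(features, dtype=float)
    scale = np.abs(features).max(axis=0)
    scale[scale == 0.0] = 1.0  # avoid dividing an all-zero column by zero
    return features / scale

# Columns of 0/1 indicators are unchanged by the scaling
binary = np.array([[1, 0, 1], [0, 0, 1], [1, 0, 0]])
print(np.array_equal(max_abs_scale(binary), binary))

# Columns with larger magnitudes are mapped into [-1, 1]
mixed = np.array([[2.0, -4.0], [1.0, 2.0]])
print(max_abs_scale(mixed))
```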


### Learn plain classifier without debiasing

In [20]:
# Load the in-processing adversarial debiasing algorithm
# Learn parameters with debias set to False
sess = tf.Session()  # tf is already tensorflow.compat.v1
plain_model = AdversarialDebiasing(privileged_groups = privileged_groups,
                          unprivileged_groups = unprivileged_groups,
                          scope_name='plain_classifier',
                          debias=False,
                          sess=sess)

In [21]:
plain_model.fit(dataset_orig_train)

Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
epoch 0; iter: 0; batch classifier loss: 0.715120
epoch 0; iter: 200; batch classifier loss: 0.393146
epoch 1; iter: 0; batch classifier loss: 0.440599
epoch 1; iter: 200; batch classifier loss: 0.451024
epoch 2; iter: 0; batch classifier loss: 0.473709
epoch 2; iter: 200; batch classifier loss: 0.408319
epoch 3; iter: 0; batch classifier loss: 0.501400
epoch 3; iter: 200; batch classifier loss: 0.406931
epoch 4; iter: 0; batch classifier loss: 0.367959
epoch 4; iter: 200; batch classifier loss: 0.441929
epoch 5; iter: 0; batch classifier loss: 0.537898
epoch 5; iter: 200; batch classifier loss: 0.392074
epoch 6; iter: 0; batch classifier loss: 0.499795
epoch 6; iter: 200; batch classifier loss: 0.421334
epoch 7; iter: 0; batch classifier loss: 0.450431
epoch 7; iter: 200; batch classifier loss: 0.353959
epoch 8; iter: 0; batch classifier loss: 0.365706
epoch 8; iter: 200;

<aif360.algorithms.inprocessing.adversarial_debiasing.AdversarialDebiasing at 0x2dc9c3472e0>

In [22]:
# Apply the plain model to the train and test data
dataset_nodebiasing_train = plain_model.predict(dataset_orig_train)
dataset_nodebiasing_test = plain_model.predict(dataset_orig_test)

In [23]:
# Metrics for the dataset from plain model (without debiasing)
display(Markdown("#### Plain model - without debiasing - dataset metrics"))
metric_dataset_nodebiasing_train = BinaryLabelDatasetMetric(dataset_nodebiasing_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_nodebiasing_train.mean_difference())

metric_dataset_nodebiasing_test = BinaryLabelDatasetMetric(dataset_nodebiasing_test, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_nodebiasing_test.mean_difference())

display(Markdown("#### Plain model - without debiasing - classification metrics"))
classified_metric_nodebiasing_test = ClassificationMetric(dataset_orig_test, 
                                                 dataset_nodebiasing_test,
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)
print("Test set: Classification accuracy = %f" % classified_metric_nodebiasing_test.accuracy())
TPR = classified_metric_nodebiasing_test.true_positive_rate()
TNR = classified_metric_nodebiasing_test.true_negative_rate()
bal_acc_nodebiasing_test = 0.5*(TPR+TNR)
print("Test set: Balanced classification accuracy = %f" % bal_acc_nodebiasing_test)
print("Test set: Disparate impact = %f" % classified_metric_nodebiasing_test.disparate_impact())
print("Test set: Equal opportunity difference = %f" % classified_metric_nodebiasing_test.equal_opportunity_difference())
print("Test set: Average odds difference = %f" % classified_metric_nodebiasing_test.average_odds_difference())
print("Test set: Theil_index = %f" % classified_metric_nodebiasing_test.theil_index())

#### Plain model - without debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.229997
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.232417

#### Plain model - without debiasing - classification metrics

Test set: Classification accuracy = 0.802703
Test set: Balanced classification accuracy = 0.668518
Test set: Disparate impact = 0.000000
Test set: Equal opportunity difference = -0.486013
Test set: Average odds difference = -0.303784
Test set: Theil_index = 0.173769
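To make the group fairness numbers easier to read, the sketch below recomputes three of them from their usual definitions: disparate impact is the ratio of selection rates (unprivileged over privileged), equal opportunity difference is the gap in true positive rates, and average odds difference is the mean of the gaps in false positive and true positive rates. It uses plain NumPy on made-up data rather than the `ClassificationMetric` class; the helper names `group_rates` and `fairness_report` and the toy arrays are illustrative assumptions.

```python
import numpy as np

def group_rates(y_true, y_pred, mask):
    """Return (TPR, FPR, selection rate) for the instances selected by mask."""
    y_true, y_pred = y_true[mask], y_pred[mask]
    tpr = np.mean(y_pred[y_true == 1] == 1)
    fpr = np.mean(y_pred[y_true == 0] == 1)
    selection_rate = np.mean(y_pred == 1)
    return tpr, fpr, selection_rate

def fairness_report(y_true, y_pred, protected, privileged=1):
    """Group fairness metrics, each written as unprivileged relative to privileged."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    is_privileged = np.asarray(protected) == privileged
    tpr_p, fpr_p, sel_p = group_rates(y_true, y_pred, is_privileged)
    tpr_u, fpr_u, sel_u = group_rates(y_true, y_pred, ~is_privileged)
    return {
        "disparate_impact": sel_u / sel_p,
        "equal_opportunity_difference": tpr_u - tpr_p,
        "average_odds_difference": 0.5 * ((fpr_u - fpr_p) + (tpr_u - tpr_p)),
    }

# Toy data (not the Adult dataset): four unprivileged, then four privileged instances
toy_true = [1, 1, 0, 0, 1, 1, 0, 0]
toy_pred = [1, 0, 0, 0, 1, 1, 1, 0]
toy_sex = [0, 0, 0, 0, 1, 1, 1, 1]
report = fairness_report(toy_true, toy_pred, toy_sex)
for name, value in report.items():
    print(f"{name} = {value:.6f}")
```

Under these definitions, a disparate impact of 0.000000, as reported for the plain model, means that (to the printed precision) essentially no unprivileged instance in the test set received the favorable prediction. A value of 1 would indicate equal selection rates, and the two difference metrics are 0 when the groups are treated alike.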

### Apply in-processing algorithm based on adversarial learning

In [24]:
sess.close()
tf.reset_default_graph()
sess = tf.Session()

In [25]:
# Learn parameters with debias set to True
debiased_model = AdversarialDebiasing(privileged_groups = privileged_groups,
                          unprivileged_groups = unprivileged_groups,
                          scope_name='debiased_classifier',
                          debias=True,
                          sess=sess)

In [26]:
debiased_model.fit(dataset_orig_train)

epoch 0; iter: 0; batch classifier loss: 0.726805; batch adversarial loss: 0.725726
epoch 0; iter: 200; batch classifier loss: 0.399297; batch adversarial loss: 0.663164
epoch 1; iter: 0; batch classifier loss: 0.466386; batch adversarial loss: 0.643637
epoch 1; iter: 200; batch classifier loss: 0.389937; batch adversarial loss: 0.617877
epoch 2; iter: 0; batch classifier loss: 0.393010; batch adversarial loss: 0.666465
epoch 2; iter: 200; batch classifier loss: 0.372852; batch adversarial loss: 0.618113
epoch 3; iter: 0; batch classifier loss: 0.516191; batch adversarial loss: 0.621010
epoch 3; iter: 200; batch classifier loss: 0.445630; batch adversarial loss: 0.653140
epoch 4; iter: 0; batch classifier loss: 0.474143; batch adversarial loss: 0.667452
epoch 4; iter: 200; batch classifier loss: 0.466743; batch adversarial loss: 0.594200
epoch 5; iter: 0; batch classifier loss: 0.492157; batch adversarial loss: 0.674340
epoch 5; iter: 200; batch classifier loss: 0.362627; batch adversa

<aif360.algorithms.inprocessing.adversarial_debiasing.AdversarialDebiasing at 0x2dc9c42e5e0>

In [27]:
# Apply the debiased model to the train and test data
dataset_debiasing_train = debiased_model.predict(dataset_orig_train)
dataset_debiasing_test = debiased_model.predict(dataset_orig_test)

In [28]:
# Metrics for the dataset from plain model (without debiasing)
display(Markdown("#### Plain model - without debiasing - dataset metrics"))
print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_nodebiasing_train.mean_difference())
print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_nodebiasing_test.mean_difference())

# Metrics for the dataset from model with debiasing
display(Markdown("#### Model - with debiasing - dataset metrics"))
metric_dataset_debiasing_train = BinaryLabelDatasetMetric(dataset_debiasing_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_debiasing_train.mean_difference())

metric_dataset_debiasing_test = BinaryLabelDatasetMetric(dataset_debiasing_test, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_debiasing_test.mean_difference())



display(Markdown("#### Plain model - without debiasing - classification metrics"))
print("Test set: Classification accuracy = %f" % classified_metric_nodebiasing_test.accuracy())
TPR = classified_metric_nodebiasing_test.true_positive_rate()
TNR = classified_metric_nodebiasing_test.true_negative_rate()
bal_acc_nodebiasing_test = 0.5*(TPR+TNR)
print("Test set: Balanced classification accuracy = %f" % bal_acc_nodebiasing_test)
print("Test set: Disparate impact = %f" % classified_metric_nodebiasing_test.disparate_impact())
print("Test set: Equal opportunity difference = %f" % classified_metric_nodebiasing_test.equal_opportunity_difference())
print("Test set: Average odds difference = %f" % classified_metric_nodebiasing_test.average_odds_difference())
print("Test set: Theil_index = %f" % classified_metric_nodebiasing_test.theil_index())



display(Markdown("#### Model - with debiasing - classification metrics"))
classified_metric_debiasing_test = ClassificationMetric(dataset_orig_test, 
                                                 dataset_debiasing_test,
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)
print("Test set: Classification accuracy = %f" % classified_metric_debiasing_test.accuracy())
TPR = classified_metric_debiasing_test.true_positive_rate()
TNR = classified_metric_debiasing_test.true_negative_rate()
bal_acc_debiasing_test = 0.5*(TPR+TNR)
print("Test set: Balanced classification accuracy = %f" % bal_acc_debiasing_test)
print("Test set: Disparate impact = %f" % classified_metric_debiasing_test.disparate_impact())
print("Test set: Equal opportunity difference = %f" % classified_metric_debiasing_test.equal_opportunity_difference())
print("Test set: Average odds difference = %f" % classified_metric_debiasing_test.average_odds_difference())
print("Test set: Theil_index = %f" % classified_metric_debiasing_test.theil_index())

#### Plain model - without debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.229997
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.232417

#### Model - with debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.086101
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.098312

#### Plain model - without debiasing - classification metrics

Test set: Classification accuracy = 0.802703
Test set: Balanced classification accuracy = 0.668518
Test set: Disparate impact = 0.000000
Test set: Equal opportunity difference = -0.486013
Test set: Average odds difference = -0.303784
Test set: Theil_index = 0.173769

#### Model - with debiasing - classification metrics

Test set: Classification accuracy = 0.793626
Test set: Balanced classification accuracy = 0.672513
Test set: Disparate impact = 0.533135
Test set: Equal opportunity difference = -0.072370
Test set: Average odds difference = -0.049277
Test set: Theil_index = 0.170557
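Unlike the group metrics, the Theil index measures inequality across individuals. The sketch below assumes the definition that AIF360 is understood to use: the generalized entropy index with alpha = 1 over per-instance benefits b = prediction - label + 1, so that a false negative has benefit 0, a correct prediction 1, and a false positive 2. The function name `theil_index` and the toy arrays are illustrative, and the code is not taken from the library.

```python
import numpy as np

def theil_index(y_true, y_pred):
    """Theil index of the benefits b = y_pred - y_true + 1 (assumed definition)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    benefit = y_pred - y_true + 1.0
    ratio = benefit / benefit.mean()
    # Use the convention 0 * log(0) = 0 for instances with zero benefit
    terms = np.zeros_like(ratio)
    positive = ratio > 0
    terms[positive] = ratio[positive] * np.log(ratio[positive])
    return terms.mean()

# Perfect predictions give every instance the same benefit, hence no inequality
perfect = theil_index([1, 0, 1, 0], [1, 0, 1, 0])
# One false negative and one false positive spread the benefits unevenly
uneven = theil_index([1, 0, 1, 0], [1, 0, 0, 1])
print(perfect, uneven)
```

Lower values indicate that benefits are spread more evenly across individuals, which is how the small change between the plain and debiased models above can be read.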


    References:
    [1] B. H. Zhang, B. Lemoine, and M. Mitchell, "Mitigating Unwanted Biases with Adversarial Learning," 
    AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society, 2018.