#### This notebook demonstrates the use of the adversarial debiasing algorithm to learn a fair classifier.
Adversarial debiasing [1] is an in-processing technique that trains a classifier to maximize prediction accuracy while simultaneously reducing an adversary's ability to infer the protected attribute from the predictions. The aim is a fairer classifier: if the adversary cannot recover the protected attribute, the predictions carry little group information that it could exploit. We will see how to use this algorithm to learn models with and without fairness constraints and apply them to the German credit dataset.
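The core of [1] is a modified gradient step for the classifier weights: the component of the prediction-loss gradient that would help the adversary is projected out, and an extra term pushes the weights in the direction that increases the adversary's loss. Below is a minimal pure-Python sketch of that combination on flattened gradient vectors. The function name `debiased_gradient` and the weight `alpha` are illustrative assumptions, not AIF360 API (AIF360 exposes a comparable weight as the `adversary_loss_weight` constructor argument).

```python
import math


def debiased_gradient(grad_pred, grad_adv, alpha):
    """Combine gradients following the update rule described in [1] (sketch).

    grad_pred: gradient of the predictor's loss w.r.t. the predictor weights.
    grad_adv:  gradient of the adversary's loss w.r.t. the same weights.
    alpha:     strength of the term that works against the adversary.
    Returns the direction to subtract from the weights (a descent step).
    """
    norm = math.sqrt(sum(a * a for a in grad_adv))
    if norm == 0.0:
        # No adversary signal: fall back to the plain predictor gradient.
        return list(grad_pred)
    unit = [a / norm for a in grad_adv]
    # Remove the part of grad_pred that lies along the adversary's gradient.
    along = sum(p * u for p, u in zip(grad_pred, unit))
    projected = [p - along * u for p, u in zip(grad_pred, unit)]
    # Subtracting alpha * grad_adv means the descent step increases the
    # adversary's loss, i.e. it actively hinders the adversary.
    return [q - alpha * a for q, a in zip(projected, grad_adv)]


print(debiased_gradient([1.0, 1.0], [1.0, 0.0], 0.5))  # [-0.5, 1.0]
```

With `alpha = 0` the step only removes the adversary-aligned component; a positive `alpha` additionally trades some predictor loss for a worse adversary.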

In [2]:
%matplotlib inline
# Load all necessary packages
import sys

sys.path.append("../")
import matplotlib.pyplot as plt
import tensorflow.compat.v1 as tf
from aif360.algorithms.inprocessing.adversarial_debiasing import AdversarialDebiasing
from aif360.algorithms.preprocessing.optim_preproc_helpers.data_preproc_functions import (
    load_preproc_data_adult,
    load_preproc_data_compas,
    load_preproc_data_german,
)
from aif360.datasets import (
    AdultDataset,
    BinaryLabelDataset,
    CompasDataset,
    GermanDataset,
)
from aif360.metrics import BinaryLabelDatasetMetric, ClassificationMetric
from aif360.metrics.utils import compute_boolean_conditioning_vector
from IPython.display import Markdown, display
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import MaxAbsScaler, StandardScaler

tf.disable_eager_execution()

#### Load dataset and set options

In [3]:
# Get the dataset and split into train and test
dataset_orig = load_preproc_data_german(["age", "sex"])

# privileged_groups = [{'sex': 1}]
# unprivileged_groups = [{'sex': 0}]
privileged_groups = [{"age": 1}]
unprivileged_groups = [{"age": 0}]


dataset_orig_train, dataset_orig_test = dataset_orig.split([0.7], shuffle=True)
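`dataset_orig.split([0.7], shuffle=True)` returns two datasets holding about 70% and 30% of the rows. The sketch below illustrates the idea on plain row indices; `shuffled_split` and its `seed` argument are hypothetical helpers, not the AIF360 implementation. Applied to the 1,000 rows of the German credit data, a 0.7 cut gives the 700 training rows reported in the shape printed further down. Because the cell above fixes no seed, the split membership (and therefore the metrics printed later) will likely differ between runs.

```python
import random


def shuffled_split(n_rows, fraction, seed=None):
    """Return (train_idx, test_idx) after shuffling row indices 0..n_rows-1."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)  # seeded RNG makes the split reproducible
    cut = int(round(fraction * n_rows))
    return idx[:cut], idx[cut:]


train_idx, test_idx = shuffled_split(1000, 0.7, seed=0)
print(len(train_idx), len(test_idx))  # 700 300
```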



In [3]:
# privileged_groups = [{"age": 1}]
# unprivileged_groups = [{"age": 0}]

# dataset_orig = GermanDataset(
#     protected_attribute_names=["age"],  # this dataset also contains protected
#     # attribute for "sex" which we do not
#     # consider in this evaluation
#     privileged_classes=[lambda x: x >= 25],  # age >=25 is considered privileged
#     features_to_drop=["personal_status", "sex"],  # ignore sex-related attributes
# )
# dataset_orig_train, dataset_orig_test = dataset_orig.split([0.7], shuffle=True)
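The commented-out alternative marks applicants aged 25 or older as privileged through `privileged_classes=[lambda x: x >= 25]`. A small sketch of how such a predicate turns a numeric column into the binary protected attribute (1.0 = privileged, 0.0 = unprivileged) that `privileged_groups = [{"age": 1}]` refers to; `binarize` is a hypothetical helper. The preprocessed loader in the active cell performs its own binarization of age, whose threshold is not shown here.

```python
def binarize(values, is_privileged):
    """Map each raw value to 1.0 if the predicate marks it privileged, else 0.0."""
    return [1.0 if is_privileged(v) else 0.0 for v in values]


ages = [19, 24, 25, 33, 67]
print(binarize(ages, lambda x: x >= 25))  # [0.0, 0.0, 1.0, 1.0, 1.0]
```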

In [4]:
# print out some labels, names, etc.
display(Markdown("#### Training Dataset shape"))
print(dataset_orig_train.features.shape)
display(Markdown("#### Favorable and unfavorable labels"))
print(dataset_orig_train.favorable_label, dataset_orig_train.unfavorable_label)
display(Markdown("#### Protected attribute names"))
print(dataset_orig_train.protected_attribute_names)
display(Markdown("#### Privileged and unprivileged protected attribute values"))
print(
    dataset_orig_train.privileged_protected_attributes,
    dataset_orig_train.unprivileged_protected_attributes,
)
display(Markdown("#### Dataset feature names"))
print(dataset_orig_train.feature_names)

#### Training Dataset shape

(700, 11)


#### Favorable and unfavorable labels

1.0 2.0


#### Protected attribute names

['age', 'sex']


#### Privileged and unprivileged protected attribute values

[array([1.]), array([1.])] [array([0.]), array([0.])]


#### Dataset feature names

['age', 'sex', 'credit_history=Delay', 'credit_history=None/Paid', 'credit_history=Other', 'savings=500+', 'savings=<500', 'savings=Unknown/None', 'employment=1-4 years', 'employment=4+ years', 'employment=Unemployed']
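The `column=value` feature names above are one-hot indicator columns for the categorical attributes. A minimal sketch of that naming scheme, assuming categories are ordered by sorting their string values; `one_hot` is a hypothetical helper, not AIF360 code.

```python
def one_hot(column, values):
    """Expand a categorical column into 0/1 indicator features named 'column=value'."""
    categories = sorted(set(values))
    names = [f"{column}={c}" for c in categories]
    rows = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return names, rows


names, rows = one_hot("savings", ["<500", "500+", "Unknown/None", "<500"])
print(names)    # ['savings=500+', 'savings=<500', 'savings=Unknown/None']
print(rows[0])  # [0.0, 1.0, 0.0]
```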


#### Metrics for the original data (full, training, and test sets)

In [5]:
# Metric for the original dataset
metric_orig = BinaryLabelDatasetMetric(
    dataset_orig,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)

display(Markdown("#### Original dataset"))
print(
    "Overall: Difference in mean outcomes between unprivileged and privileged groups = %f"
    % metric_orig.mean_difference()
)
# Metric for the training dataset

metric_orig_train = BinaryLabelDatasetMetric(
    dataset_orig_train,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)
display(Markdown("#### Original training dataset"))
print(
    "Train set: Difference in mean outcomes between unprivileged and privileged groups = %f"
    % metric_orig_train.mean_difference()
)
# Metric for the Test dataset

metric_orig_test = BinaryLabelDatasetMetric(
    dataset_orig_test,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)
print(
    "Test set: Difference in mean outcomes between unprivileged and privileged groups = %f"
    % metric_orig_test.mean_difference()
)

#### Original dataset

Overall: Difference in mean outcomes between unprivileged and privileged groups = -0.149448


#### Original training dataset

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.122178
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.209368
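`mean_difference()` is the statistical parity difference: the favorable-outcome rate of the unprivileged group (`age` = 0) minus that of the privileged group (`age` = 1). Negative values, as above, mean the unprivileged group receives the favorable label less often. A pure-Python sketch of the definition, using this dataset's label encoding (1.0 favorable, 2.0 unfavorable); the function is illustrative, not the AIF360 implementation.

```python
def mean_difference(labels, groups, favorable=1.0, privileged=1.0):
    """P(label == favorable | unprivileged) - P(label == favorable | privileged)."""
    def favorable_rate(in_group):
        sel = [y for y, g in zip(labels, groups) if in_group(g)]
        return sum(1 for y in sel if y == favorable) / len(sel)

    return (favorable_rate(lambda g: g != privileged)
            - favorable_rate(lambda g: g == privileged))


labels = [1.0, 2.0, 2.0, 1.0, 1.0, 1.0]  # 1.0 = favorable, 2.0 = unfavorable
groups = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]  # protected attribute: 1.0 = privileged
print(mean_difference(labels, groups))  # 1/3 - 3/3, about -0.667
```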


In [6]:
max_abs_scaler = MaxAbsScaler()
# Fit the scaling factors on the training features only, then reuse them for the test set
dataset_orig_train.features = max_abs_scaler.fit_transform(dataset_orig_train.features)
dataset_orig_test.features = max_abs_scaler.transform(dataset_orig_test.features)
metric_scaled_train = BinaryLabelDatasetMetric(
    dataset_orig_train,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)
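`MaxAbsScaler` divides each feature column by its largest absolute value on the training data and reuses those factors for the test set, which is why the cell calls `fit_transform` on the training features and `transform` on the test features. Since all 11 features here are 0/1 indicators, this scaling leaves them unchanged. A pure-Python sketch of the same fit/transform split (the helper names are hypothetical; like scikit-learn, all-zero columns get a scale of 1):

```python
def fit_max_abs(rows):
    """Per-column max |x| on the training rows; all-zero columns get scale 1."""
    n_cols = len(rows[0])
    scales = [max(abs(r[j]) for r in rows) for j in range(n_cols)]
    return [s if s != 0 else 1.0 for s in scales]


def transform(rows, scales):
    """Divide each column by its training-set scale."""
    return [[x / s for x, s in zip(r, scales)] for r in rows]


train = [[2.0, -4.0], [1.0, 2.0]]
test = [[4.0, 1.0]]
scales = fit_max_abs(train)      # [2.0, 4.0]
print(transform(train, scales))  # [[1.0, -1.0], [0.5, 0.5]]
print(transform(test, scales))   # [[2.0, 0.25]]  test values may exceed 1
```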

### Learn plain classifier without debiasing

In [7]:
# Create the adversarial debiasing (in-processing) model
# With debias=False only the classifier is trained, without the adversary
sess = tf.Session()
plain_model = AdversarialDebiasing(
    privileged_groups=privileged_groups,
    unprivileged_groups=unprivileged_groups,
    scope_name="plain_classifier",
    debias=False,
    sess=sess,
)



In [8]:
plain_model.fit(dataset_orig_train)

epoch 0; iter: 0; batch classifier loss: 0.713554
epoch 1; iter: 0; batch classifier loss: 0.658058
epoch 2; iter: 0; batch classifier loss: 0.645264
epoch 3; iter: 0; batch classifier loss: 0.638450
epoch 4; iter: 0; batch classifier loss: 0.601208
epoch 5; iter: 0; batch classifier loss: 0.598833
epoch 6; iter: 0; batch classifier loss: 0.561015
epoch 7; iter: 0; batch classifier loss: 0.651929
epoch 8; iter: 0; batch classifier loss: 0.538543
epoch 9; iter: 0; batch classifier loss: 0.530933
epoch 10; iter: 0; batch classifier loss: 0.558925
epoch 11; iter: 0; batch classifier loss: 0.610113
epoch 12; iter: 0; batch classifier loss: 0.497282
epoch 13; iter: 0; batch classifier loss: 0.545335
epoch 14; iter: 0; batch classifier loss: 0.565533
epoch 15; iter: 0; batch classifier loss: 0.545654
epoch 16; iter: 0; batch classifier loss: 0.539721
epoch 17; iter: 0; batch classifier loss: 0.527085
epoch 18; iter: 0; batch classifier loss: 0.530085
epoch 19; iter: 0; batch classifier loss: 0.557717
epoch 20; iter: 0; batch classifier loss: 0.565869
epoch 21; iter: 0; batch classifier los

<aif360.algorithms.inprocessing.adversarial_debiasing.AdversarialDebiasing at 0x7f3c86487ed0>
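The `batch classifier loss` in the log is, assuming the default AIF360 setup, the mean sigmoid cross-entropy between the classifier's logits and the binarized labels. A numerically stable sketch of that loss shows why the first logged values sit near 0.7: a classifier that outputs logit 0 for every example scores exactly ln 2 ≈ 0.693.

```python
import math


def sigmoid_cross_entropy(logits, targets):
    """Mean of -[t*log(sigmoid(z)) + (1-t)*log(1-sigmoid(z))], computed stably."""
    total = 0.0
    for z, t in zip(logits, targets):
        # Equivalent stable form: max(z, 0) - z*t + log(1 + exp(-|z|))
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


print(sigmoid_cross_entropy([0.0, 0.0, 0.0], [1.0, 0.0, 1.0]))  # ln 2, about 0.693147
```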

In [9]:
# Apply the plain model to the training and test data
dataset_nodebiasing_train = plain_model.predict(dataset_orig_train)
dataset_nodebiasing_test = plain_model.predict(dataset_orig_test)
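`predict` returns a copy of the dataset whose labels are derived from the classifier's output probabilities. The sketch below assumes a strict 0.5 threshold and maps the binary decision back to this dataset's label encoding (1.0 favorable, 2.0 unfavorable, as printed earlier); `to_dataset_labels` is a hypothetical helper, not AIF360 API.

```python
def to_dataset_labels(probs, favorable=1.0, unfavorable=2.0, threshold=0.5):
    """Threshold P(favorable) and map each decision to the dataset's label values."""
    return [favorable if p > threshold else unfavorable for p in probs]


print(to_dataset_labels([0.2, 0.5, 0.9]))  # [2.0, 2.0, 1.0]
```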

In [10]:
# Metrics for the dataset from plain model (without debiasing)
display(Markdown("#### Plain model - without debiasing - dataset metrics"))
metric_dataset_nodebiasing_train = BinaryLabelDatasetMetric(
    dataset_nodebiasing_train,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)

print(
    "Train set: Difference in mean outcomes between unprivileged and privileged groups = %f"
    % metric_dataset_nodebiasing_train.mean_difference()
)

metric_dataset_nodebiasing_test = BinaryLabelDatasetMetric(
    dataset_nodebiasing_test,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)

print(
    "Test set: Difference in mean outcomes between unprivileged and privileged groups = %f"
    % metric_dataset_nodebiasing_test.mean_difference()
)

display(Markdown("#### Plain model - without debiasing - classification metrics"))
classified_metric_nodebiasing_test = ClassificationMetric(
    dataset_orig_test,
    dataset_nodebiasing_test,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)
print(
    "Test set: Classification accuracy = %f"
    % classified_metric_nodebiasing_test.accuracy()
)
TPR = classified_metric_nodebiasing_test.true_positive_rate()
TNR = classified_metric_nodebiasing_test.true_negative_rate()
bal_acc_nodebiasing_test = 0.5 * (TPR + TNR)
print("Test set: Balanced classification accuracy = %f" % bal_acc_nodebiasing_test)
print(
    "Test set: Disparate impact = %f"
    % classified_metric_nodebiasing_test.disparate_impact()
)
print(
    "Test set: Equal opportunity difference = %f"
    % classified_metric_nodebiasing_test.equal_opportunity_difference()
)
print(
    "Test set: Average odds difference = %f"
    % classified_metric_nodebiasing_test.average_odds_difference()
)
print("Test set: Theil_index = %f" % classified_metric_nodebiasing_test.theil_index())

#### Plain model - without debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.259542
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.271186


#### Plain model - without debiasing - classification metrics

Test set: Classification accuracy = 0.703333
Test set: Balanced classification accuracy = 0.552003
Test set: Disparate impact = 0.728814
Test set: Equal opportunity difference = -0.133333
Test set: Average odds difference = -0.273563
Test set: Theil_index = 0.070261
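The classification metrics compare predictions with the true test labels, overall and per group. The sketch below spells out the definitions, assuming labels binarized to 1 = favorable, 0 = unfavorable and a protected attribute with 1 = privileged: balanced accuracy is the mean of TPR and TNR (as in the cell above), disparate impact is the ratio of favorable-prediction rates (unprivileged over privileged), equal opportunity difference is the TPR gap, and average odds difference averages the FPR and TPR gaps (unprivileged minus privileged). The Theil index is the generalized entropy index with α = 1 of the per-example benefit b_i = ŷ_i − y_i + 1. Function names are illustrative, not AIF360 API.

```python
import math


def rates(y_true, y_pred):
    """Return (TPR, TNR, FPR, favorable-prediction rate); needs both true classes."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return tpr, tnr, 1.0 - tnr, sum(y_pred) / len(y_pred)


def group_metrics(y_true, y_pred, priv):
    """Fairness metrics, unprivileged (priv == 0) relative to privileged (priv == 1)."""
    def subset(flag):
        idx = [i for i, g in enumerate(priv) if g == flag]
        return [y_true[i] for i in idx], [y_pred[i] for i in idx]

    tpr_u, _, fpr_u, sel_u = rates(*subset(0))
    tpr_p, _, fpr_p, sel_p = rates(*subset(1))
    tpr, tnr, _, _ = rates(y_true, y_pred)
    return {
        "balanced_accuracy": 0.5 * (tpr + tnr),
        "disparate_impact": sel_u / sel_p,
        "equal_opportunity_difference": tpr_u - tpr_p,
        "average_odds_difference": 0.5 * ((fpr_u - fpr_p) + (tpr_u - tpr_p)),
    }


def theil_index(y_true, y_pred):
    """Generalized entropy index (alpha = 1) of benefits b = y_pred - y_true + 1."""
    b = [p - t + 1 for t, p in zip(y_true, y_pred)]
    mu = sum(b) / len(b)
    # Terms with b == 0 contribute 0 (the limit of x * log(x) as x -> 0).
    return sum((x / mu) * math.log(x / mu) for x in b if x > 0) / len(b)


y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]
priv = [0, 0, 0, 0, 1, 1, 1, 1]
print(group_metrics(y_true, y_pred, priv))
# balanced_accuracy 0.625, disparate_impact ~0.667,
# equal_opportunity_difference -0.5, average_odds_difference -0.25
print(theil_index(y_true, y_pred))
```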


### Apply in-processing algorithm based on adversarial learning

In [12]:
# Close the plain model's session and clear the graph before building the debiased model
sess.close()
tf.reset_default_graph()
sess = tf.Session()

In [13]:
# Learn parameters with debias set to True
debiased_model = AdversarialDebiasing(
    privileged_groups=privileged_groups,
    unprivileged_groups=unprivileged_groups,
    scope_name="debiased_classifier",
    debias=True,
    sess=sess,
    classifier_num_hidden_units=100,
)

In [14]:
debiased_model.fit(dataset_orig_train)

epoch 0; iter: 0; batch classifier loss: 0.802375; batch adversarial loss: 0.717762
epoch 1; iter: 0; batch classifier loss: 0.748406; batch adversarial loss: 0.710581
epoch 2; iter: 0; batch classifier loss: 0.732363; batch adversarial loss: 0.696123
epoch 3; iter: 0; batch classifier loss: 0.687451; batch adversarial loss: 0.709399
epoch 4; iter: 0; batch classifier loss: 0.649584; batch adversarial loss: 0.702732
epoch 5; iter: 0; batch classifier loss: 0.615525; batch adversarial loss: 0.704097
epoch 6; iter: 0; batch classifier loss: 0.619977; batch adversarial loss: 0.707481
epoch 7; iter: 0; batch classifier loss: 0.616553; batch adversarial loss: 0.715477
epoch 8; iter: 0; batch classifier loss: 0.607726; batch adversarial loss: 0.707984
epoch 9; iter: 0; batch classifier loss: 0.584955; batch adversarial loss: 0.692472
epoch 10; iter: 0; batch classifier loss: 0.570408; batch adversarial loss: 0.686875
epoch 11; iter: 0; batch classifier loss: 0.577319; batch adversarial loss:

<aif360.algorithms.inprocessing.adversarial_debiasing.AdversarialDebiasing at 0x7f3a163292d0>

In [15]:
# Apply the debiased model to the training and test data
dataset_debiasing_train = debiased_model.predict(dataset_orig_train)
dataset_debiasing_test = debiased_model.predict(dataset_orig_test)

In [16]:
# Metrics for the dataset from plain model (without debiasing)
display(Markdown("#### Plain model - without debiasing - dataset metrics"))
print(
    "Train set: Difference in mean outcomes between unprivileged and privileged groups = %f"
    % metric_dataset_nodebiasing_train.mean_difference()
)
print(
    "Test set: Difference in mean outcomes between unprivileged and privileged groups = %f"
    % metric_dataset_nodebiasing_test.mean_difference()
)

# Metrics for the dataset from model with debiasing
display(Markdown("#### Model - with debiasing - dataset metrics"))
metric_dataset_debiasing_train = BinaryLabelDatasetMetric(
    dataset_debiasing_train,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)

print(
    "Train set: Difference in mean outcomes between unprivileged and privileged groups = %f"
    % metric_dataset_debiasing_train.mean_difference()
)

metric_dataset_debiasing_test = BinaryLabelDatasetMetric(
    dataset_debiasing_test,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)

print(
    "Test set: Difference in mean outcomes between unprivileged and privileged groups = %f"
    % metric_dataset_debiasing_test.mean_difference()
)


display(Markdown("#### Plain model - without debiasing - classification metrics"))
print(
    "Test set: Classification accuracy = %f"
    % classified_metric_nodebiasing_test.accuracy()
)
TPR = classified_metric_nodebiasing_test.true_positive_rate()
TNR = classified_metric_nodebiasing_test.true_negative_rate()
bal_acc_nodebiasing_test = 0.5 * (TPR + TNR)
print("Test set: Balanced classification accuracy = %f" % bal_acc_nodebiasing_test)
print(
    "Test set: Disparate impact = %f"
    % classified_metric_nodebiasing_test.disparate_impact()
)
print(
    "Test set: Equal opportunity difference = %f"
    % classified_metric_nodebiasing_test.equal_opportunity_difference()
)
print(
    "Test set: Average odds difference = %f"
    % classified_metric_nodebiasing_test.average_odds_difference()
)
print("Test set: Theil_index = %f" % classified_metric_nodebiasing_test.theil_index())


display(Markdown("#### Model - with debiasing - classification metrics"))
classified_metric_debiasing_test = ClassificationMetric(
    dataset_orig_test,
    dataset_debiasing_test,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)
print(
    "Test set: Classification accuracy = %f"
    % classified_metric_debiasing_test.accuracy()
)
TPR = classified_metric_debiasing_test.true_positive_rate()
TNR = classified_metric_debiasing_test.true_negative_rate()
bal_acc_debiasing_test = 0.5 * (TPR + TNR)
print("Test set: Balanced classification accuracy = %f" % bal_acc_debiasing_test)
print(
    "Test set: Disparate impact = %f"
    % classified_metric_debiasing_test.disparate_impact()
)
print(
    "Test set: Equal opportunity difference = %f"
    % classified_metric_debiasing_test.equal_opportunity_difference()
)
print(
    "Test set: Average odds difference = %f"
    % classified_metric_debiasing_test.average_odds_difference()
)
print("Test set: Theil_index = %f" % classified_metric_debiasing_test.theil_index())

#### Plain model - without debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.259542
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.271186


#### Model - with debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.236695
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.246290


#### Plain model - without debiasing - classification metrics

Test set: Classification accuracy = 0.703333
Test set: Balanced classification accuracy = 0.552003
Test set: Disparate impact = 0.728814
Test set: Equal opportunity difference = -0.133333
Test set: Average odds difference = -0.273563
Test set: Theil_index = 0.070261


#### Model - with debiasing - classification metrics

Test set: Classification accuracy = 0.710000
Test set: Balanced classification accuracy = 0.567696
Test set: Disparate impact = 0.747422
Test set: Equal opportunity difference = -0.121773
Test set: Average odds difference = -0.238371
Test set: Theil_index = 0.076296
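To read the two result sets side by side, the sketch below copies the printed test-set values into dictionaries and reports, for each fairness metric, which model sits closer to its parity target (1 for disparate impact, 0 for the differences). The numbers come from the single unseeded run shown above, so they may change on re-execution; `closer_to_parity` is a hypothetical helper.

```python
# Test-set values copied from the printed outputs above (one run).
plain = {"statistical_parity_difference": -0.271186,
         "disparate_impact": 0.728814,
         "equal_opportunity_difference": -0.133333,
         "average_odds_difference": -0.273563}
debiased = {"statistical_parity_difference": -0.246290,
            "disparate_impact": 0.747422,
            "equal_opportunity_difference": -0.121773,
            "average_odds_difference": -0.238371}
# Parity target for each metric: ratios aim for 1, differences aim for 0.
targets = {"statistical_parity_difference": 0.0,
           "disparate_impact": 1.0,
           "equal_opportunity_difference": 0.0,
           "average_odds_difference": 0.0}


def closer_to_parity(a, b, targets):
    """For each metric, name the model ('plain' or 'debiased') nearer its target."""
    return {m: "debiased" if abs(b[m] - t) < abs(a[m] - t) else "plain"
            for m, t in targets.items()}


print(closer_to_parity(plain, debiased, targets))
```

In this run the debiased model is closer to parity on all four metrics, while its Theil index is slightly higher (0.076296 vs 0.070261).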



    References:
    [1] B. H. Zhang, B. Lemoine, and M. Mitchell, "Mitigating Unwanted Biases with Adversarial Learning,"
    AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society, 2018.