#### This notebook demonstrates the use of the adversarial debiasing algorithm to learn a fair classifier.
Adversarial debiasing [1] is an in-processing technique that trains a classifier to maximize prediction accuracy while simultaneously reducing an adversary's ability to determine the protected attribute from the predictions. This pushes the classifier toward fairness, because its predictions carry less group information for the adversary to exploit. We will use this algorithm to learn models with and without fairness constraints and apply them to the Adult dataset.
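
To make the idea concrete, the sketch below shows one way to combine the classifier and adversary gradients, following the update rule described in [1]: remove the component of the classifier gradient that would help the adversary, then subtract a weighted adversary gradient. It is a NumPy illustration only, not AIF360's implementation; `debiased_gradient`, `alpha`, `eps`, and the toy vectors are hypothetical names and values chosen for this example.

```python
import numpy as np


def debiased_gradient(grad_pred, grad_adv, alpha=0.1, eps=1e-8):
    """Combine classifier and adversary gradients (illustrative sketch).

    grad_pred: gradient of the classifier loss w.r.t. the classifier weights
    grad_adv:  gradient of the adversary loss w.r.t. the same weights
    alpha:     hypothetical weight on the adversary term
    """
    grad_pred = np.asarray(grad_pred, dtype=float)
    grad_adv = np.asarray(grad_adv, dtype=float)

    # Unit vector along the adversary gradient (eps avoids division by zero).
    unit_adv = grad_adv / (np.linalg.norm(grad_adv) + eps)

    # Component of the classifier gradient that points along the adversary gradient.
    projection = np.dot(grad_pred, unit_adv) * unit_adv

    # Drop that component, then additionally push against the adversary.
    return grad_pred - projection - alpha * grad_adv


# Toy example: the adversary gradient lies entirely along the second weight.
g_pred = np.array([1.0, 2.0])
g_adv = np.array([0.0, 1.0])
update = debiased_gradient(g_pred, g_adv, alpha=0.1)
print(update)
```

In this sketch `alpha` presumably plays a role similar to the `adversary_loss_weight` argument used later in the notebook, though that correspondence is an assumption rather than something the notebook states.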

In [1]:
%matplotlib inline
# Load all necessary packages
import sys

sys.path.append("../")
from aif360.datasets import BinaryLabelDataset
from aif360.datasets import AdultDataset, GermanDataset, CompasDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.metrics import ClassificationMetric
from aif360.metrics.utils import compute_boolean_conditioning_vector

from aif360.algorithms.preprocessing.optim_preproc_helpers.data_preproc_functions import (
    load_preproc_data_adult,
    load_preproc_data_compas,
    load_preproc_data_german,
)

from aif360.algorithms.inprocessing.adversarial_debiasing import AdversarialDebiasing

from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler, MaxAbsScaler
from sklearn.metrics import accuracy_score

from IPython.display import Markdown, display
import matplotlib.pyplot as plt

import tensorflow.compat.v1 as tf

import numpy as np
import pandas as pd

tf.disable_eager_execution()

2024-12-31 00:15:25.084913: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-12-31 00:15:25.088849: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-12-31 00:15:25.098399: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1735604125.113653 2746050 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1735604125.118034 2746050 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-12-31 00:15:25.136161: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU ins

In [2]:
NUM_EPOCHS = 50
SEED = 1

np.random.seed(SEED)

#### Load dataset and set options

In [5]:
# Get the dataset and split into train and test
dataset_orig = load_preproc_data_adult()

privileged_groups = [{"sex": 1}]
unprivileged_groups = [{"sex": 0}]

dataset_orig_train, dataset_orig_test = dataset_orig.split([0.7], shuffle=True)



In [6]:
# print out some labels, names, etc.
display(Markdown("#### Training Dataset shape"))
print(dataset_orig_train.features.shape)
display(Markdown("#### Favorable and unfavorable labels"))
print(dataset_orig_train.favorable_label, dataset_orig_train.unfavorable_label)
display(Markdown("#### Protected attribute names"))
print(dataset_orig_train.protected_attribute_names)
display(Markdown("#### Privileged and unprivileged protected attribute values"))
print(
    dataset_orig_train.privileged_protected_attributes,
    dataset_orig_train.unprivileged_protected_attributes,
)
display(Markdown("#### Dataset feature names"))
print(dataset_orig_train.feature_names)

#### Training Dataset shape

(23419, 18)


#### Favorable and unfavorable labels

1.0 0.0


#### Protected attribute names

['sex', 'race']


#### Privileged and unprivileged protected attribute values

[array([1.]), array([1.])] [array([0.]), array([0.])]


#### Dataset feature names

['race', 'sex', 'Age (decade)=10', 'Age (decade)=20', 'Age (decade)=30', 'Age (decade)=40', 'Age (decade)=50', 'Age (decade)=60', 'Age (decade)=>=70', 'Education Years=6', 'Education Years=7', 'Education Years=8', 'Education Years=9', 'Education Years=10', 'Education Years=11', 'Education Years=12', 'Education Years=<6', 'Education Years=>12']


#### Metric for original training data

In [7]:
# Metric for the original dataset
metric_orig_train = BinaryLabelDatasetMetric(
    dataset_orig_train,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)
display(Markdown("#### Original training dataset"))
print(
    "Train set: Difference in mean outcomes between unprivileged and privileged groups = %f"
    % metric_orig_train.mean_difference()
)
metric_orig_test = BinaryLabelDatasetMetric(
    dataset_orig_test,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)
print(
    "Test set: Difference in mean outcomes between unprivileged and privileged groups = %f"
    % metric_orig_test.mean_difference()
)

#### Original training dataset

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.196751
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.191610
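
The mean difference reported above is the favorable-outcome rate of the unprivileged group minus that of the privileged group, so a negative value means the unprivileged group receives the favorable label less often. The sketch below recomputes the same quantity with plain NumPy on hypothetical toy arrays; the function name `mean_difference_sketch` and the data are assumptions made for illustration, not part of AIF360.

```python
import numpy as np


def mean_difference_sketch(labels, protected, favorable=1.0,
                           unprivileged=0.0, privileged=1.0):
    """Favorable rate of the unprivileged group minus that of the privileged group."""
    labels = np.asarray(labels, dtype=float)
    protected = np.asarray(protected, dtype=float)

    rate_unpriv = np.mean(labels[protected == unprivileged] == favorable)
    rate_priv = np.mean(labels[protected == privileged] == favorable)
    return rate_unpriv - rate_priv


# Toy data: 1 of 4 unprivileged rows and 3 of 4 privileged rows are favorable.
toy_labels = np.array([1, 0, 0, 0, 1, 1, 1, 0])
toy_sex = np.array([0, 0, 0, 0, 1, 1, 1, 1])
toy_diff = mean_difference_sketch(toy_labels, toy_sex)
print(toy_diff)
```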


In [8]:
# Scale each feature by its maximum absolute value (fit on train, reuse on test)
max_abs_scaler = MaxAbsScaler()
dataset_orig_train.features = max_abs_scaler.fit_transform(dataset_orig_train.features)
dataset_orig_test.features = max_abs_scaler.transform(dataset_orig_test.features)
metric_scaled_train = BinaryLabelDatasetMetric(
    dataset_orig_train,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)
display(
    Markdown(
        "#### Scaled dataset - Verify that the scaling does not affect the group label statistics"
    )
)
print(
    "Train set: Difference in mean outcomes between unprivileged and privileged groups = %f"
    % metric_scaled_train.mean_difference()
)
metric_scaled_test = BinaryLabelDatasetMetric(
    dataset_orig_test,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)
print(
    "Test set: Difference in mean outcomes between unprivileged and privileged groups = %f"
    % metric_scaled_test.mean_difference()
)

#### Scaled dataset - Verify that the scaling does not affect the group label statistics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.196751
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.191610
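
`MaxAbsScaler` divides each feature column by its largest absolute value and touches only the feature matrix, which is why the label-based statistic above is unchanged. The NumPy sketch below reproduces that per-column scaling on a hypothetical toy matrix; `max_abs_scale` is an illustrative helper, not the scikit-learn code.

```python
import numpy as np


def max_abs_scale(features):
    """Divide each column by its maximum absolute value (illustrative sketch)."""
    features = np.asarray(features, dtype=float)
    col_max = np.abs(features).max(axis=0)
    # Leave all-zero columns unchanged instead of dividing by zero.
    col_max[col_max == 0.0] = 1.0
    return features / col_max


toy_features = np.array([[0.0, 2.0],
                         [1.0, -4.0],
                         [1.0, 1.0]])
scaled = max_abs_scale(toy_features)
print(scaled)
```

A column that already holds only 0/1 values with at least one 1, like the first toy column, passes through unchanged.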


### Learn plain classifier without debiasing

In [9]:
# Instantiate the adversarial debiasing model as a plain baseline classifier
# Learn parameters with debias set to False
sess = tf.Session()
plain_model = AdversarialDebiasing(
    privileged_groups=privileged_groups,
    unprivileged_groups=unprivileged_groups,
    scope_name="plain_classifier",
    debias=False,
    sess=sess,
    num_epochs=NUM_EPOCHS,
    seed=SEED,
)

2024-12-31 00:25:42.021540: E external/local_xla/xla/stream_executor/cuda/cuda_driver.cc:152] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)


In [10]:
plain_model.fit(dataset_orig_train)

Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.



epoch 0; iter: 0; batch classifier loss: 0.761656


I0000 00:00:1735604744.598689 2746050 mlir_graph_optimization_pass.cc:401] MLIR V1 optimization pass is not enabled


epoch 1; iter: 0; batch classifier loss: 0.404561
epoch 2; iter: 0; batch classifier loss: 0.419484
epoch 3; iter: 0; batch classifier loss: 0.329299
epoch 4; iter: 0; batch classifier loss: 0.403706
epoch 5; iter: 0; batch classifier loss: 0.440959
epoch 6; iter: 0; batch classifier loss: 0.389690
epoch 7; iter: 0; batch classifier loss: 0.486165
epoch 8; iter: 0; batch classifier loss: 0.439088
epoch 9; iter: 0; batch classifier loss: 0.428520
epoch 10; iter: 0; batch classifier loss: 0.458773
epoch 11; iter: 0; batch classifier loss: 0.392172
epoch 12; iter: 0; batch classifier loss: 0.347009
epoch 13; iter: 0; batch classifier loss: 0.463961
epoch 14; iter: 0; batch classifier loss: 0.431903
epoch 15; iter: 0; batch classifier loss: 0.343224
epoch 16; iter: 0; batch classifier loss: 0.403243
epoch 17; iter: 0; batch classifier loss: 0.410553
epoch 18; iter: 0; batch classifier loss: 0.527680
epoch 19; iter: 0; batch classifier loss: 0.349001
epoch 20; iter: 0; batch classifier loss

<aif360.algorithms.inprocessing.adversarial_debiasing.AdversarialDebiasing at 0x7e1bbe557690>

In [11]:
# Apply the plain model to the training and test data
dataset_nodebiasing_train = plain_model.predict(dataset_orig_train)
dataset_nodebiasing_test = plain_model.predict(dataset_orig_test)

In [12]:
# Metrics for the dataset from plain model (without debiasing)
display(Markdown("#### Plain model - without debiasing - dataset metrics"))
metric_dataset_nodebiasing_train = BinaryLabelDatasetMetric(
    dataset_nodebiasing_train,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)

print(
    "Train set: Difference in mean outcomes between unprivileged and privileged groups = %f"
    % metric_dataset_nodebiasing_train.mean_difference()
)

metric_dataset_nodebiasing_test = BinaryLabelDatasetMetric(
    dataset_nodebiasing_test,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)

print(
    "Test set: Difference in mean outcomes between unprivileged and privileged groups = %f"
    % metric_dataset_nodebiasing_test.mean_difference()
)

display(Markdown("#### Plain model - without debiasing - classification metrics"))
classified_metric_nodebiasing_test = ClassificationMetric(
    dataset_orig_test,
    dataset_nodebiasing_test,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)
print(
    "Test set: Classification accuracy = %f"
    % classified_metric_nodebiasing_test.accuracy()
)
TPR = classified_metric_nodebiasing_test.true_positive_rate()
TNR = classified_metric_nodebiasing_test.true_negative_rate()
bal_acc_nodebiasing_test = 0.5 * (TPR + TNR)
print("Test set: Balanced classification accuracy = %f" % bal_acc_nodebiasing_test)
print(
    "Test set: Disparate impact = %f"
    % classified_metric_nodebiasing_test.disparate_impact()
)
print(
    "Test set: Equal opportunity difference = %f"
    % classified_metric_nodebiasing_test.equal_opportunity_difference()
)
print(
    "Test set: Average odds difference = %f"
    % classified_metric_nodebiasing_test.average_odds_difference()
)
print("Test set: Theil_index = %f" % classified_metric_nodebiasing_test.theil_index())

#### Plain model - without debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.213568
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.214702


#### Plain model - without debiasing - classification metrics

Test set: Classification accuracy = 0.807512
Test set: Balanced classification accuracy = 0.666667
Test set: Disparate impact = 0.000000
Test set: Equal opportunity difference = -0.469660
Test set: Average odds difference = -0.286802
Test set: Theil_index = 0.174843
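
The group fairness metrics printed above compare per-group rates: disparate impact is the ratio of favorable-prediction rates (unprivileged over privileged), equal opportunity difference is the gap in true positive rates, and average odds difference averages the gaps in false positive and true positive rates. The sketch below computes them with NumPy on hypothetical toy arrays; `group_rates` and `fairness_summary` are illustrative helpers and are not AIF360 functions.

```python
import numpy as np


def group_rates(y_true, y_pred, mask):
    """Return (selection rate, TPR, FPR) for the rows selected by mask."""
    y_true, y_pred = y_true[mask], y_pred[mask]
    selection = np.mean(y_pred == 1)
    tpr = np.mean(y_pred[y_true == 1] == 1)
    fpr = np.mean(y_pred[y_true == 0] == 1)
    return selection, tpr, fpr


def fairness_summary(y_true, y_pred, protected):
    """Group fairness metrics, unprivileged (0) relative to privileged (1)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    protected = np.asarray(protected)

    sel_u, tpr_u, fpr_u = group_rates(y_true, y_pred, protected == 0)
    sel_p, tpr_p, fpr_p = group_rates(y_true, y_pred, protected == 1)
    return {
        "disparate_impact": sel_u / sel_p,
        "equal_opportunity_difference": tpr_u - tpr_p,
        "average_odds_difference": 0.5 * ((fpr_u - fpr_p) + (tpr_u - tpr_p)),
    }


# Toy data: first four rows are unprivileged, last four are privileged.
toy_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
toy_pred = np.array([1, 0, 0, 0, 1, 1, 1, 0])
toy_group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
summary = fairness_summary(toy_true, toy_pred, toy_group)
print(summary)
```

A disparate impact of 1 and differences of 0 would indicate parity between the two groups under these definitions.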


### Apply in-processing algorithm based on adversarial learning

In [13]:
tf.reset_default_graph()
sess2 = tf.Session()

In [14]:
# Learn parameters with debias set to True
debiased_model = AdversarialDebiasing(
    privileged_groups=privileged_groups,
    unprivileged_groups=unprivileged_groups,
    scope_name="debiased_classifier",
    adversary_loss_weight=0.1,
    debias=True,
    sess=sess2,
    num_epochs=NUM_EPOCHS,
    seed=SEED,
)

In [15]:
debiased_model.fit(dataset_orig_train)

epoch 0; iter: 0; batch classifier loss: 0.761656; batch adversarial loss: 0.776818
epoch 1; iter: 0; batch classifier loss: 0.835015; batch adversarial loss: 0.772979
epoch 2; iter: 0; batch classifier loss: 0.558384; batch adversarial loss: 0.669674
epoch 3; iter: 0; batch classifier loss: 0.388132; batch adversarial loss: 0.677833
epoch 4; iter: 0; batch classifier loss: 0.433999; batch adversarial loss: 0.570862
epoch 5; iter: 0; batch classifier loss: 0.516740; batch adversarial loss: 0.634307
epoch 6; iter: 0; batch classifier loss: 0.410703; batch adversarial loss: 0.602223
epoch 7; iter: 0; batch classifier loss: 0.491627; batch adversarial loss: 0.619197
epoch 8; iter: 0; batch classifier loss: 0.465060; batch adversarial loss: 0.698741
epoch 9; iter: 0; batch classifier loss: 0.431493; batch adversarial loss: 0.635770
epoch 10; iter: 0; batch classifier loss: 0.466035; batch adversarial loss: 0.617725
epoch 11; iter: 0; batch classifier loss: 0.427084; batch adversarial loss:

<aif360.algorithms.inprocessing.adversarial_debiasing.AdversarialDebiasing at 0x7e1bbc116690>

In [16]:
# Apply the debiased model to the training and test data
dataset_debiasing_train = debiased_model.predict(dataset_orig_train)
dataset_debiasing_test = debiased_model.predict(dataset_orig_test)

In [17]:
# Metrics for the dataset from plain model (without debiasing)
display(Markdown("#### Plain model - without debiasing - dataset metrics"))
print(
    "Train set: Difference in mean outcomes between unprivileged and privileged groups = %f"
    % metric_dataset_nodebiasing_train.mean_difference()
)
print(
    "Test set: Difference in mean outcomes between unprivileged and privileged groups = %f"
    % metric_dataset_nodebiasing_test.mean_difference()
)

# Metrics for the dataset from model with debiasing
display(Markdown("#### Model - with debiasing - dataset metrics"))
metric_dataset_debiasing_train = BinaryLabelDatasetMetric(
    dataset_debiasing_train,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)

print(
    "Train set: Difference in mean outcomes between unprivileged and privileged groups = %f"
    % metric_dataset_debiasing_train.mean_difference()
)

metric_dataset_debiasing_test = BinaryLabelDatasetMetric(
    dataset_debiasing_test,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)

print(
    "Test set: Difference in mean outcomes between unprivileged and privileged groups = %f"
    % metric_dataset_debiasing_test.mean_difference()
)


display(Markdown("#### Plain model - without debiasing - classification metrics"))
print(
    "Test set: Classification accuracy = %f"
    % classified_metric_nodebiasing_test.accuracy()
)
TPR = classified_metric_nodebiasing_test.true_positive_rate()
TNR = classified_metric_nodebiasing_test.true_negative_rate()
bal_acc_nodebiasing_test = 0.5 * (TPR + TNR)
print("Test set: Balanced classification accuracy = %f" % bal_acc_nodebiasing_test)
print(
    "Test set: Disparate impact = %f"
    % classified_metric_nodebiasing_test.disparate_impact()
)
print(
    "Test set: Equal opportunity difference = %f"
    % classified_metric_nodebiasing_test.equal_opportunity_difference()
)
print(
    "Test set: Average odds difference = %f"
    % classified_metric_nodebiasing_test.average_odds_difference()
)
print("Test set: Theil_index = %f" % classified_metric_nodebiasing_test.theil_index())


display(Markdown("#### Model - with debiasing - classification metrics"))
classified_metric_debiasing_test = ClassificationMetric(
    dataset_orig_test,
    dataset_debiasing_test,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)
print(
    "Test set: Classification accuracy = %f"
    % classified_metric_debiasing_test.accuracy()
)
TPR = classified_metric_debiasing_test.true_positive_rate()
TNR = classified_metric_debiasing_test.true_negative_rate()
bal_acc_debiasing_test = 0.5 * (TPR + TNR)
print("Test set: Balanced classification accuracy = %f" % bal_acc_debiasing_test)
print(
    "Test set: Disparate impact = %f"
    % classified_metric_debiasing_test.disparate_impact()
)
print(
    "Test set: Equal opportunity difference = %f"
    % classified_metric_debiasing_test.equal_opportunity_difference()
)
print(
    "Test set: Average odds difference = %f"
    % classified_metric_debiasing_test.average_odds_difference()
)
print("Test set: Theil_index = %f" % classified_metric_debiasing_test.theil_index())

#### Plain model - without debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.213568
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.214702


#### Model - with debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.074779
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.081932


#### Plain model - without debiasing - classification metrics

Test set: Classification accuracy = 0.807512
Test set: Balanced classification accuracy = 0.666667
Test set: Disparate impact = 0.000000
Test set: Equal opportunity difference = -0.469660
Test set: Average odds difference = -0.286802
Test set: Theil_index = 0.174843


#### Model - with debiasing - classification metrics

Test set: Classification accuracy = 0.793365
Test set: Balanced classification accuracy = 0.673811
Test set: Disparate impact = 0.605204
Test set: Equal opportunity difference = -0.077800
Test set: Average odds difference = -0.041576
Test set: Theil_index = 0.169465
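
To read the trade-off at a glance, the sketch below places the test-set values printed above side by side and computes the change from the plain model to the debiased model. The numbers are copied from this run's output; the dictionary names are illustrative.

```python
# Test-set metrics copied from the output above.
plain = {
    "accuracy": 0.807512,
    "disparate_impact": 0.000000,
    "equal_opportunity_difference": -0.469660,
    "average_odds_difference": -0.286802,
}
debiased = {
    "accuracy": 0.793365,
    "disparate_impact": 0.605204,
    "equal_opportunity_difference": -0.077800,
    "average_odds_difference": -0.041576,
}

# Change in each metric when moving from the plain to the debiased model.
changes = {name: debiased[name] - plain[name] for name in plain}

for name, delta in changes.items():
    print(f"{name}: {plain[name]:+.6f} -> {debiased[name]:+.6f} (change {delta:+.6f})")
```

In this run, accuracy drops by roughly 1.4 percentage points while the equal opportunity and average odds differences both move much closer to zero.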



    References:
    [1] B. H. Zhang, B. Lemoine, and M. Mitchell, "Mitigating Unwanted Biases with Adversarial Learning,"
    AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society, 2018.

Count the predicted labels for the test set, first for the debiased model and then for the plain model.

In [17]:
pd.Series(dataset_debiasing_test.labels.reshape(-1)).value_counts()

0.0    12043
1.0     2610
Name: count, dtype: int64

In [18]:
pd.Series(dataset_nodebiasing_test.labels.reshape(-1)).value_counts()

0.0    12303
1.0     2350
Name: count, dtype: int64
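
The raw counts are easier to compare as the fraction of test rows that receive the favorable prediction. The sketch below derives that fraction from the counts printed above; the dictionary layout is an illustrative choice.

```python
# Predicted-label counts copied from the two outputs above.
counts = {
    "debiased": {0.0: 12043, 1.0: 2610},
    "plain": {0.0: 12303, 1.0: 2350},
}

# Fraction of test rows predicted as the favorable label (1.0).
favorable_rates = {
    model: c[1.0] / (c[0.0] + c[1.0]) for model, c in counts.items()
}

for model, rate in favorable_rates.items():
    print(f"{model}: {rate:.4f}")
```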