#### This notebook demonstrates the use of the fairness adjuster, using the same structure as the AIF360 adversarial debiasing example

The source notebook can be found here:
https://github.com/Trusted-AI/AIF360/blob/main/examples/demo_adversarial_debiasing.ipynb

In [2]:
%matplotlib inline
# Load all necessary packages
import sys

sys.path.append("../")
import matplotlib.pyplot as plt
import tensorflow.compat.v1 as tf
from aif360.algorithms.inprocessing.fairness_adjuster import FairnessAdjuster
from aif360.algorithms.preprocessing.optim_preproc_helpers.data_preproc_functions import (
    load_preproc_data_adult,
    load_preproc_data_compas,
    load_preproc_data_german,
)
from aif360.datasets import (
    AdultDataset,
    BinaryLabelDataset,
    CompasDataset,
    GermanDataset,
)
from aif360.metrics import BinaryLabelDatasetMetric, ClassificationMetric
from aif360.metrics.utils import compute_boolean_conditioning_vector
from IPython.display import Markdown, display
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import MaxAbsScaler, StandardScaler

tf.disable_eager_execution()

#### Load dataset and set options

In [3]:
# Get the dataset and split into train and test
dataset_orig = load_preproc_data_german(["age", "sex"])

# privileged_groups = [{'sex': 1}]
# unprivileged_groups = [{'sex': 0}]
privileged_groups = [{"age": 1}]
unprivileged_groups = [{"age": 0}]


dataset_orig_train, dataset_orig_test = dataset_orig.split([0.7], shuffle=True)


In [4]:
# print out some labels, names, etc.
display(Markdown("#### Training Dataset shape"))
print(dataset_orig_train.features.shape)
display(Markdown("#### Favorable and unfavorable labels"))
print(dataset_orig_train.favorable_label, dataset_orig_train.unfavorable_label)
display(Markdown("#### Protected attribute names"))
print(dataset_orig_train.protected_attribute_names)
display(Markdown("#### Privileged and unprivileged protected attribute values"))
print(
    dataset_orig_train.privileged_protected_attributes,
    dataset_orig_train.unprivileged_protected_attributes,
)
display(Markdown("#### Dataset feature names"))
print(dataset_orig_train.feature_names)

#### Training Dataset shape

(700, 11)


#### Favorable and unfavorable labels

1.0 2.0


#### Protected attribute names

['age', 'sex']


#### Privileged and unprivileged protected attribute values

[array([1.]), array([1.])] [array([0.]), array([0.])]


#### Dataset feature names

['age', 'sex', 'credit_history=Delay', 'credit_history=None/Paid', 'credit_history=Other', 'savings=500+', 'savings=<500', 'savings=Unknown/None', 'employment=1-4 years', 'employment=4+ years', 'employment=Unemployed']


#### Metric for original training data

In [5]:
# Metric for the original dataset
metric_orig_train = BinaryLabelDatasetMetric(
    dataset_orig_train,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)
display(Markdown("#### Original training dataset"))
print(
    "Train set: Difference in mean outcomes between unprivileged and privileged groups = %f"
    % metric_orig_train.mean_difference()
)
metric_orig_test = BinaryLabelDatasetMetric(
    dataset_orig_test,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)
print(
    "Test set: Difference in mean outcomes between unprivileged and privileged groups = %f"
    % metric_orig_test.mean_difference()
)

#### Original training dataset

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.155667
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.136994
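
For reference, the mean difference reported above is the favorable-outcome rate of the unprivileged group minus that of the privileged group. The cell below is a minimal pure-Python sketch of that calculation, not AIF360's implementation; the function name `mean_difference` and the toy labels and group flags are assumptions made up for illustration.

```python
def mean_difference(labels, protected, favorable_label=1.0, privileged_value=1.0):
    """Favorable rate of the unprivileged group minus that of the privileged group."""
    priv = [y for y, a in zip(labels, protected) if a == privileged_value]
    unpriv = [y for y, a in zip(labels, protected) if a != privileged_value]

    def favorable_rate(group):
        # Share of rows in the group that carry the favorable label.
        return sum(1 for y in group if y == favorable_label) / len(group)

    return favorable_rate(unpriv) - favorable_rate(priv)


# Toy data (hypothetical): 1.0 = favorable, 2.0 = unfavorable,
# mirroring the label encoding printed for the German credit data.
toy_labels = [1.0, 1.0, 2.0, 1.0, 2.0, 2.0, 1.0, 2.0]
toy_age = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
toy_gap = mean_difference(toy_labels, toy_age)
print("Toy mean difference = %f" % toy_gap)
```

A negative value, as in the outputs above, means the unprivileged group receives the favorable label less often than the privileged group.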


In [6]:
# Scale each feature by its maximum absolute value (fit on train, reuse on test)
max_abs_scaler = MaxAbsScaler()
dataset_orig_train.features = max_abs_scaler.fit_transform(dataset_orig_train.features)
dataset_orig_test.features = max_abs_scaler.transform(dataset_orig_test.features)
metric_scaled_train = BinaryLabelDatasetMetric(
    dataset_orig_train,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)
display(
    Markdown(
        "#### Scaled dataset - Verify that the scaling does not affect the group label statistics"
    )
)
print(
    "Train set: Difference in mean outcomes between unprivileged and privileged groups = %f"
    % metric_scaled_train.mean_difference()
)
metric_scaled_test = BinaryLabelDatasetMetric(
    dataset_orig_test,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)
print(
    "Test set: Difference in mean outcomes between unprivileged and privileged groups = %f"
    % metric_scaled_test.mean_difference()
)

#### Scaled dataset - Verify that the scaling does not affect the group label statistics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.155667
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.136994
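
The values are unchanged because scaling is applied only to `features`, while the mean difference depends on the labels and group membership. In addition, the feature names listed earlier all appear to be 0/1 indicators, and max-abs scaling leaves such a column untouched whenever it contains at least one 1. The sketch below illustrates this in plain Python; `max_abs_scale` is a hypothetical helper written for this illustration, not scikit-learn's `MaxAbsScaler`, and the toy rows are assumptions.

```python
def max_abs_scale(rows):
    """Divide every column by its largest absolute value (all-zero columns are left as-is)."""
    n_cols = len(rows[0])
    scales = []
    for j in range(n_cols):
        col_max = max(abs(row[j]) for row in rows)
        scales.append(col_max if col_max != 0 else 1.0)
    return [[row[j] / scales[j] for j in range(n_cols)] for row in rows]


# Toy 0/1 indicator features (hypothetical): scaling is a no-op here.
toy_binary = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
scaled_binary = max_abs_scale(toy_binary)
print(scaled_binary == toy_binary)

# A non-binary column, by contrast, is rescaled into [-1, 1].
toy_numeric = [[2.0], [-4.0], [1.0]]
scaled_numeric = max_abs_scale(toy_numeric)
print(scaled_numeric)
```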


### Learn plain classifier without debiasing

In [7]:
# Build the plain classifier as a baseline (no fairness adjustment)
# Learn parameters with debias set to False
sess = tf.Session()
plain_model = FairnessAdjuster(
    privileged_groups=privileged_groups,
    unprivileged_groups=unprivileged_groups,
    scope_name="plain_classifier",
    debias=False,
    sess=sess,
)

2024-12-05 21:00:33.819383: E external/local_xla/xla/stream_executor/cuda/cuda_driver.cc:152] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)


In [8]:
plain_model.fit(dataset_orig_train)

Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
epoch 0; iter: 0; batch classifier loss: 0.650745
epoch 1; iter: 0; batch classifier loss: 0.621140


I0000 00:00:1733432434.206422   30857 mlir_graph_optimization_pass.cc:401] MLIR V1 optimization pass is not enabled


epoch 2; iter: 0; batch classifier loss: 0.558319
epoch 3; iter: 0; batch classifier loss: 0.579147
epoch 4; iter: 0; batch classifier loss: 0.566931
epoch 5; iter: 0; batch classifier loss: 0.607225
epoch 6; iter: 0; batch classifier loss: 0.533261
epoch 7; iter: 0; batch classifier loss: 0.616294
epoch 8; iter: 0; batch classifier loss: 0.537298
epoch 9; iter: 0; batch classifier loss: 0.548235
epoch 10; iter: 0; batch classifier loss: 0.554410
epoch 11; iter: 0; batch classifier loss: 0.608081
epoch 12; iter: 0; batch classifier loss: 0.556615
epoch 13; iter: 0; batch classifier loss: 0.573949
epoch 14; iter: 0; batch classifier loss: 0.556723
epoch 15; iter: 0; batch classifier loss: 0.526735
epoch 16; iter: 0; batch classifier loss: 0.559974
epoch 17; iter: 0; batch classifier loss: 0.550279
epoch 18; iter: 0; batch classifier loss: 0.514289
epoch 19; iter: 0; batch classifier loss: 0.545309
epoch 20; iter: 0; batch classifier loss: 0.529533
epoch 21; iter: 0; batch classifier los

<aif360.algorithms.inprocessing.fairness_adjuster.FairnessAdjuster at 0x7f05498d2fd0>

In [9]:
# Apply the plain model to the train and test data
dataset_nodebiasing_train = plain_model.predict(dataset_orig_train)
dataset_nodebiasing_test = plain_model.predict(dataset_orig_test)

2024-12-05 21:00:35.097035: W tensorflow/c/c_api.cc:305] Operation '{name:'plain_classifier/plain_classifier/classifier_model/b2/Adam_1/Assign' id:265 op device:{requested: '', assigned: ''} def:{{{node plain_classifier/plain_classifier/classifier_model/b2/Adam_1/Assign}} = AssignVariableOp[_has_manual_control_dependencies=true, dtype=DT_FLOAT, validate_shape=false](plain_classifier/plain_classifier/classifier_model/b2/Adam_1, plain_classifier/plain_classifier/classifier_model/b2/Adam_1/Initializer/zeros)}}' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session.


In [10]:
# Metrics for the dataset from plain model (without debiasing)
display(Markdown("#### Plain model - without debiasing - dataset metrics"))
metric_dataset_nodebiasing_train = BinaryLabelDatasetMetric(
    dataset_nodebiasing_train,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)

print(
    "Train set: Difference in mean outcomes between unprivileged and privileged groups = %f"
    % metric_dataset_nodebiasing_train.mean_difference()
)

metric_dataset_nodebiasing_test = BinaryLabelDatasetMetric(
    dataset_nodebiasing_test,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)

print(
    "Test set: Difference in mean outcomes between unprivileged and privileged groups = %f"
    % metric_dataset_nodebiasing_test.mean_difference()
)

display(Markdown("#### Plain model - without debiasing - classification metrics"))
classified_metric_nodebiasing_test = ClassificationMetric(
    dataset_orig_test,
    dataset_nodebiasing_test,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)
print(
    "Test set: Classification accuracy = %f"
    % classified_metric_nodebiasing_test.accuracy()
)
TPR = classified_metric_nodebiasing_test.true_positive_rate()
TNR = classified_metric_nodebiasing_test.true_negative_rate()
bal_acc_nodebiasing_test = 0.5 * (TPR + TNR)
print("Test set: Balanced classification accuracy = %f" % bal_acc_nodebiasing_test)
print(
    "Test set: Disparate impact = %f"
    % classified_metric_nodebiasing_test.disparate_impact()
)
print(
    "Test set: Equal opportunity difference = %f"
    % classified_metric_nodebiasing_test.equal_opportunity_difference()
)
print(
    "Test set: Average odds difference = %f"
    % classified_metric_nodebiasing_test.average_odds_difference()
)
print("Test set: Theil_index = %f" % classified_metric_nodebiasing_test.theil_index())

#### Plain model - without debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.228974
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.361702


#### Plain model - without debiasing - classification metrics

Test set: Classification accuracy = 0.713333
Test set: Balanced classification accuracy = 0.552439
Test set: Disparate impact = 0.638298
Test set: Equal opportunity difference = -0.185185
Test set: Average odds difference = -0.392593
Test set: Theil_index = 0.072837
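
The classification metrics above can all be derived from per-group confusion-matrix counts. The following is a minimal sketch using the standard definitions (selection-rate ratio for disparate impact, true-positive-rate gap for equal opportunity difference, and the mean of the false-positive-rate and true-positive-rate gaps for average odds difference). The helper names `rates` and `fairness_summary` and the toy counts are assumptions for illustration, not values from this notebook.

```python
def rates(counts):
    """Return (true positive rate, false positive rate, selection rate) for one group."""
    tp, fp, tn, fn = counts["TP"], counts["FP"], counts["TN"], counts["FN"]
    total = tp + fp + tn + fn
    return tp / (tp + fn), fp / (fp + tn), (tp + fp) / total


def fairness_summary(priv, unpriv):
    """Compute group fairness metrics as unprivileged minus (or over) privileged."""
    tpr_p, fpr_p, sel_p = rates(priv)
    tpr_u, fpr_u, sel_u = rates(unpriv)
    # Pool both groups to get the overall balanced accuracy.
    pooled = {key: priv[key] + unpriv[key] for key in priv}
    tpr_all, fpr_all, _ = rates(pooled)
    return {
        "balanced_accuracy": 0.5 * (tpr_all + (1.0 - fpr_all)),
        "disparate_impact": sel_u / sel_p,
        "equal_opportunity_difference": tpr_u - tpr_p,
        "average_odds_difference": 0.5 * ((fpr_u - fpr_p) + (tpr_u - tpr_p)),
    }


# Toy confusion counts (hypothetical), where "positive" means the favorable label.
toy_priv = {"TP": 40, "FN": 10, "FP": 20, "TN": 30}
toy_unpriv = {"TP": 15, "FN": 10, "FP": 5, "TN": 20}
summary = fairness_summary(toy_priv, toy_unpriv)
for name, value in summary.items():
    print("%s = %f" % (name, value))
```

Under these definitions a disparate impact of 1 and difference metrics of 0 indicate parity between the two groups.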


### Apply in-processing algorithm based on adversarial learning

In [10]:
sess.close()
tf.reset_default_graph()
sess = tf.Session()

In [11]:
# Learn parameters with debias set to True
debiased_model = FairnessAdjuster(
    privileged_groups=privileged_groups,
    unprivileged_groups=unprivileged_groups,
    scope_name="debiased_classifier",
    adversary_loss_weight=0.01,
    debias=True,
    sess=sess,
    classifier_num_hidden_units=100,
)

In [12]:
debiased_model.fit(dataset_orig_train)

epoch 0; iter: 0; batch classifier loss: 0.640058
epoch 1; iter: 0; batch classifier loss: 0.623254
epoch 2; iter: 0; batch classifier loss: 0.601313
epoch 3; iter: 0; batch classifier loss: 0.601454
epoch 4; iter: 0; batch classifier loss: 0.588761
epoch 5; iter: 0; batch classifier loss: 0.536065
epoch 6; iter: 0; batch classifier loss: 0.567071
epoch 7; iter: 0; batch classifier loss: 0.555931
epoch 8; iter: 0; batch classifier loss: 0.559509
epoch 9; iter: 0; batch classifier loss: 0.545400
epoch 10; iter: 0; batch classifier loss: 0.565657
epoch 11; iter: 0; batch classifier loss: 0.587716
epoch 12; iter: 0; batch classifier loss: 0.503853
epoch 13; iter: 0; batch classifier loss: 0.605185
epoch 14; iter: 0; batch classifier loss: 0.597196
epoch 15; iter: 0; batch classifier loss: 0.573468
epoch 16; iter: 0; batch classifier loss: 0.581813
epoch 17; iter: 0; batch classifier loss: 0.573018
epoch 18; iter: 0; batch classifier loss: 0.496678
epoch 19; iter: 0; batch classifier loss:

2024-12-05 23:41:21.361115: W tensorflow/c/c_api.cc:305] Operation '{name:'plain_classifier/plain_classifier_1/adjuster_model/b2/Adam_1/Assign' id:583 op device:{requested: '', assigned: ''} def:{{{node plain_classifier/plain_classifier_1/adjuster_model/b2/Adam_1/Assign}} = AssignVariableOp[_has_manual_control_dependencies=true, dtype=DT_FLOAT, validate_shape=false](plain_classifier/plain_classifier_1/adjuster_model/b2/Adam_1, plain_classifier/plain_classifier_1/adjuster_model/b2/Adam_1/Initializer/zeros)}}' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session.


epoch 0; iter: 0; batch adjuster loss: 0.049534; batch classifier loss; 0.588066; batch adversarial loss: 1.138376
epoch 1; iter: 0; batch adjuster loss: 0.025235; batch classifier loss; 0.589828; batch adversarial loss: 1.100961
epoch 2; iter: 0; batch adjuster loss: 0.021321; batch classifier loss; 0.529649; batch adversarial loss: 1.120205
epoch 3; iter: 0; batch adjuster loss: 0.015487; batch classifier loss; 0.553760; batch adversarial loss: 1.127738
epoch 4; iter: 0; batch adjuster loss: 0.016768; batch classifier loss; 0.506626; batch adversarial loss: 1.081528
epoch 5; iter: 0; batch adjuster loss: 0.013977; batch classifier loss; 0.587664; batch adversarial loss: 1.090887
epoch 6; iter: 0; batch adjuster loss: 0.015725; batch classifier loss; 0.588134; batch adversarial loss: 1.109634
epoch 7; iter: 0; batch adjuster loss: 0.009863; batch classifier loss; 0.593729; batch adversarial loss: 1.152706
epoch 8; iter: 0; batch adjuster loss: 0.011342; batch classifier loss; 0.544848

<aif360.algorithms.inprocessing.fairness_adjuster.FairnessAdjuster at 0x7f0549471290>

In [13]:
# Apply the debiased model to the train and test data
dataset_debiasing_train = debiased_model.predict(dataset_orig_train)
dataset_debiasing_test = debiased_model.predict(dataset_orig_test)

2024-12-05 23:41:26.432710: W tensorflow/c/c_api.cc:305] Operation '{name:'debiased_classifier/debiased_classifier/classifier_model/b2/Adam_1/Assign' id:876 op device:{requested: '', assigned: ''} def:{{{node debiased_classifier/debiased_classifier/classifier_model/b2/Adam_1/Assign}} = AssignVariableOp[_has_manual_control_dependencies=true, dtype=DT_FLOAT, validate_shape=false](debiased_classifier/debiased_classifier/classifier_model/b2/Adam_1, debiased_classifier/debiased_classifier/classifier_model/b2/Adam_1/Initializer/zeros)}}' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session.


In [None]:
def get_metrics(
    dataset_orig_test,
    dataset_nodebiasing_train,
    dataset_nodebiasing_test,
    dataset_debiasing_train,
    dataset_debiasing_test,
):
    metric_dataset_nodebiasing_train = BinaryLabelDatasetMetric(
        dataset_nodebiasing_train,
        unprivileged_groups=unprivileged_groups,
        privileged_groups=privileged_groups,
    )
    metric_dataset_nodebiasing_test = BinaryLabelDatasetMetric(
        dataset_nodebiasing_test,
        unprivileged_groups=unprivileged_groups,
        privileged_groups=privileged_groups,
    )
    classified_metric_nodebiasing_test = ClassificationMetric(
        dataset_orig_test,
        dataset_nodebiasing_test,
        unprivileged_groups=unprivileged_groups,
        privileged_groups=privileged_groups,
    )

    metric_dataset_debiasing_train = BinaryLabelDatasetMetric(
        dataset_debiasing_train,
        unprivileged_groups=unprivileged_groups,
        privileged_groups=privileged_groups,
    )
    metric_dataset_debiasing_test = BinaryLabelDatasetMetric(
        dataset_debiasing_test,
        unprivileged_groups=unprivileged_groups,
        privileged_groups=privileged_groups,
    )
    classified_metric_debiasing_test = ClassificationMetric(
        dataset_orig_test,
        dataset_debiasing_test,
        unprivileged_groups=unprivileged_groups,
        privileged_groups=privileged_groups,
    )
    metrics_dict = {
        "No Debiasing: Train Set: mean outcomes difference": metric_dataset_nodebiasing_train.mean_difference(),
        "No Debiasing: Test Set: mean outcomes difference": metric_dataset_nodebiasing_test.mean_difference(),
        "No Debiasing: Test Set: Classification accuracy": classified_metric_nodebiasing_test.accuracy(),
        "No Debiasing: Test Set: Disparate impact": classified_metric_nodebiasing_test.disparate_impact(),
        "No Debiasing: Test Set: Average odds difference": classified_metric_nodebiasing_test.average_odds_difference(),
        "Debiasing: Train Set: mean outcomes difference": metric_dataset_debiasing_train.mean_difference(),
        "Debiasing: Test Set: mean outcomes difference": metric_dataset_debiasing_test.mean_difference(),
        "Debiasing: Test Set: Classification accuracy": classified_metric_debiasing_test.accuracy(),
        "Debiasing: Test Set: Disparate impact": classified_metric_debiasing_test.disparate_impact(),
        "Debiasing: Test Set: Average odds difference": classified_metric_debiasing_test.average_odds_difference(),
    }
    return metrics_dict

In [14]:
# Metrics for the dataset from plain model (without debiasing)
display(Markdown("#### Plain model - without debiasing - dataset metrics"))
print(
    "Train set: Difference in mean outcomes between unprivileged and privileged groups = %f"
    % metric_dataset_nodebiasing_train.mean_difference()
)
print(
    "Test set: Difference in mean outcomes between unprivileged and privileged groups = %f"
    % metric_dataset_nodebiasing_test.mean_difference()
)

# Metrics for the dataset from model with debiasing
display(Markdown("#### Model - with debiasing - dataset metrics"))
metric_dataset_debiasing_train = BinaryLabelDatasetMetric(
    dataset_debiasing_train,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)

print(
    "Train set: Difference in mean outcomes between unprivileged and privileged groups = %f"
    % metric_dataset_debiasing_train.mean_difference()
)

metric_dataset_debiasing_test = BinaryLabelDatasetMetric(
    dataset_debiasing_test,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)

print(
    "Test set: Difference in mean outcomes between unprivileged and privileged groups = %f"
    % metric_dataset_debiasing_test.mean_difference()
)


display(Markdown("#### Plain model - without debiasing - classification metrics"))
print(
    "Test set: Classification accuracy = %f"
    % classified_metric_nodebiasing_test.accuracy()
)
TPR = classified_metric_nodebiasing_test.true_positive_rate()
TNR = classified_metric_nodebiasing_test.true_negative_rate()
bal_acc_nodebiasing_test = 0.5 * (TPR + TNR)
print("Test set: Balanced classification accuracy = %f" % bal_acc_nodebiasing_test)
print(
    "Test set: Disparate impact = %f"
    % classified_metric_nodebiasing_test.disparate_impact()
)
print(
    "Test set: Equal opportunity difference = %f"
    % classified_metric_nodebiasing_test.equal_opportunity_difference()
)
print(
    "Test set: Average odds difference = %f"
    % classified_metric_nodebiasing_test.average_odds_difference()
)
print("Test set: Theil_index = %f" % classified_metric_nodebiasing_test.theil_index())


display(Markdown("#### Model - with debiasing - classification metrics"))
classified_metric_debiasing_test = ClassificationMetric(
    dataset_orig_test,
    dataset_debiasing_test,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups,
)
print(
    "Test set: Classification accuracy = %f"
    % classified_metric_debiasing_test.accuracy()
)
TPR = classified_metric_debiasing_test.true_positive_rate()
TNR = classified_metric_debiasing_test.true_negative_rate()
bal_acc_debiasing_test = 0.5 * (TPR + TNR)
print("Test set: Balanced classification accuracy = %f" % bal_acc_debiasing_test)
print(
    "Test set: Disparate impact = %f"
    % classified_metric_debiasing_test.disparate_impact()
)
print(
    "Test set: Equal opportunity difference = %f"
    % classified_metric_debiasing_test.equal_opportunity_difference()
)
print(
    "Test set: Average odds difference = %f"
    % classified_metric_debiasing_test.average_odds_difference()
)
print("Test set: Theil_index = %f" % classified_metric_debiasing_test.theil_index())

#### Plain model - without debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.228974
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.361702


#### Model - with debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.328671
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.574468


#### Plain model - without debiasing - classification metrics

Test set: Classification accuracy = 0.713333
Test set: Balanced classification accuracy = 0.552439
Test set: Disparate impact = 0.638298
Test set: Equal opportunity difference = -0.185185
Test set: Average odds difference = -0.392593
Test set: Theil_index = 0.072837


#### Model - with debiasing - classification metrics

Test set: Classification accuracy = 0.700000
Test set: Balanced classification accuracy = 0.551660
Test set: Disparate impact = 0.425532
Test set: Equal opportunity difference = -0.444444
Test set: Average odds difference = -0.597222
Test set: Theil_index = 0.096589
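
To read the two sets of classification metrics side by side, it can help to measure how far each fairness metric sits from its parity value (1 for disparate impact, 0 for the difference metrics). The sketch below does this for the test-set numbers printed above; the dictionary layout and the helper `distance_from_ideal` are assumptions introduced for illustration.

```python
# Test-set values copied from the outputs above.
plain = {
    "Disparate impact": 0.638298,
    "Equal opportunity difference": -0.185185,
    "Average odds difference": -0.392593,
}
debiased = {
    "Disparate impact": 0.425532,
    "Equal opportunity difference": -0.444444,
    "Average odds difference": -0.597222,
}
# Parity values: a ratio of 1 and differences of 0.
ideal = {
    "Disparate impact": 1.0,
    "Equal opportunity difference": 0.0,
    "Average odds difference": 0.0,
}


def distance_from_ideal(values):
    """Absolute gap between each metric and its parity value."""
    return {name: abs(values[name] - target) for name, target in ideal.items()}


gap_plain = distance_from_ideal(plain)
gap_debiased = distance_from_ideal(debiased)
moved_closer = {name: gap_debiased[name] < gap_plain[name] for name in ideal}

for name in ideal:
    print(
        "%s: gap %.6f -> %.6f (closer to parity: %s)"
        % (name, gap_plain[name], gap_debiased[name], moved_closer[name])
    )
```

In this particular run all three gaps grow after debiasing. The figures come from a single shuffled 70/30 split, so they are best read as one sample rather than a general result.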



References:

[1] B. H. Zhang, B. Lemoine, and M. Mitchell, "Mitigating Unwanted Biases with Adversarial Learning," AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society, 2018.