<a href="https://colab.research.google.com/github/arezzy17/AIF360/blob/master/examples/demo_pytorch_adversarial_debiasing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#### This notebook demonstrates the use of adversarial debiasing algorithm to learn a fair classifier.
Adversarial debiasing [1] is an in-processing technique that learns a classifier to maximize prediction accuracy and simultaneously reduce an adversary's ability to determine the protected attribute from the predictions. This approach leads to a fair classifier as the predictions cannot carry any group discrimination information that the adversary can exploit. We will see how to use this algorithm for learning models with and without fairness constraints and apply them on the Adult dataset.

In [1]:
%matplotlib inline
# Load all necessary packages
import sys
sys.path.append("../")
from aif360.datasets import BinaryLabelDataset
from aif360.datasets import AdultDataset, GermanDataset, CompasDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.metrics import ClassificationMetric
from aif360.metrics.utils import compute_boolean_conditioning_vector

from aif360.algorithms.preprocessing.optim_preproc_helpers.data_preproc_functions import load_preproc_data_adult, load_preproc_data_compas, load_preproc_data_german

from aif360.pytorch.inprocessing.adversarial_debiasing import AdversarialDebiasing, ClassifierModel, AdversaryModel, default_classifier_ann, StaircaseExponentialLR
from aif360.algorithms import Transformer

#sys.path.remove("../")

from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler, MaxAbsScaler
from sklearn.metrics import accuracy_score

from IPython.display import Markdown, display
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.nn.functional as F

pip install 'aif360[AdversarialDebiasing]'


#### Load dataset and set options

In [2]:
# Get the dataset and split into train and test
dataset_orig = load_preproc_data_adult()

privileged_groups = [{'sex': 1}]
unprivileged_groups = [{'sex': 0}]

dataset_orig_train, dataset_orig_test = dataset_orig.split([0.7], shuffle=True)

In [3]:
dataset_orig_train.features.shape

(34189, 18)

In [4]:
# print out some labels, names, etc.
display(Markdown("#### Training Dataset shape"))
print(dataset_orig_train.features.shape)
display(Markdown("#### Favorable and unfavorable labels"))
print(dataset_orig_train.favorable_label, dataset_orig_train.unfavorable_label)
display(Markdown("#### Protected attribute names"))
print(dataset_orig_train.protected_attribute_names)
display(Markdown("#### Privileged and unprivileged protected attribute values"))
print(dataset_orig_train.privileged_protected_attributes, 
      dataset_orig_train.unprivileged_protected_attributes)
display(Markdown("#### Dataset feature names"))
print(dataset_orig_train.feature_names)

#### Training Dataset shape

(34189, 18)


#### Favorable and unfavorable labels

1.0 0.0


#### Protected attribute names

['sex', 'race']


#### Privileged and unprivileged protected attribute values

[array([1.]), array([1.])] [array([0.]), array([0.])]


#### Dataset feature names

['race', 'sex', 'Age (decade)=10', 'Age (decade)=20', 'Age (decade)=30', 'Age (decade)=40', 'Age (decade)=50', 'Age (decade)=60', 'Age (decade)=>=70', 'Education Years=6', 'Education Years=7', 'Education Years=8', 'Education Years=9', 'Education Years=10', 'Education Years=11', 'Education Years=12', 'Education Years=<6', 'Education Years=>12']


#### Metric for original training data

In [5]:
# Metric for the original dataset
metric_orig_train = BinaryLabelDatasetMetric(dataset_orig_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)
display(Markdown("#### Original training dataset"))
print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_train.mean_difference())
metric_orig_test = BinaryLabelDatasetMetric(dataset_orig_test, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)
print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_test.mean_difference())

#### Original training dataset

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.193004
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.198022


In [6]:
min_max_scaler = MaxAbsScaler()
dataset_orig_train.features = min_max_scaler.fit_transform(dataset_orig_train.features)
dataset_orig_test.features = min_max_scaler.transform(dataset_orig_test.features)
metric_scaled_train = BinaryLabelDatasetMetric(dataset_orig_train, 
                             unprivileged_groups=unprivileged_groups,
                             privileged_groups=privileged_groups)
display(Markdown("#### Scaled dataset - Verify that the scaling does not affect the group label statistics"))
print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_scaled_train.mean_difference())
metric_scaled_test = BinaryLabelDatasetMetric(dataset_orig_test, 
                             unprivileged_groups=unprivileged_groups,
                             privileged_groups=privileged_groups)
print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_scaled_test.mean_difference())


#### Scaled dataset - Verify that the scaling does not affect the group label statistics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.193004
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.198022


### Learn plan classifier without debiasing

In [7]:
# Load post-processing algorithm that equalizes the odds
# Learn parameters with debias set to False
plain_model = AdversarialDebiasing(privileged_groups = privileged_groups,
                                   unprivileged_groups = unprivileged_groups,
                                   debias=False, verbose=True, seed=360)

In [8]:
plain_model.fit(dataset_orig_train)

Starting to train model(s) on cpu:
Learning rate of the classifier model is now set to 0.001
Epoch: [1/50] Batch: [1/268]	Classifier Loss: 0.7257	C(x): 0.7257
Epoch: [1/50] Batch: [201/268]	Classifier Loss: 0.5128	C(x): 0.4786
Epoch: [2/50] Batch: [1/268]	Classifier Loss: 0.4918	C(x): 0.4678
Epoch: [2/50] Batch: [201/268]	Classifier Loss: 0.4029	C(x): 0.4493
Epoch: [3/50] Batch: [1/268]	Classifier Loss: 0.4603	C(x): 0.4461
Epoch: [3/50] Batch: [201/268]	Classifier Loss: 0.4463	C(x): 0.4397
Epoch: [4/50] Batch: [1/268]	Classifier Loss: 0.3519	C(x): 0.4383
Learning rate of the classifier model is now set to 0.00096
Epoch: [4/50] Batch: [201/268]	Classifier Loss: 0.4918	C(x): 0.4346
Epoch: [5/50] Batch: [1/268]	Classifier Loss: 0.3501	C(x): 0.4344
Epoch: [5/50] Batch: [201/268]	Classifier Loss: 0.3706	C(x): 0.4320
Epoch: [6/50] Batch: [1/268]	Classifier Loss: 0.4850	C(x): 0.4317
Epoch: [6/50] Batch: [201/268]	Classifier Loss: 0.4925	C(x): 0.4302
Epoch: [7/50] Batch: [1/268]	Classifier Los

<aif360.pytorch.inprocessing.adversarial_debiasing.AdversarialDebiasing at 0x1bb8f7dde88>

In [9]:
# Apply the plain model to test data
dataset_nodebiasing_train = plain_model.predict(dataset_orig_train)
dataset_nodebiasing_test = plain_model.predict(dataset_orig_test)

In [10]:
# Metrics for the dataset from plain model (without debiasing)
display(Markdown("#### Plain model - without debiasing - dataset metrics"))
metric_dataset_nodebiasing_train = BinaryLabelDatasetMetric(dataset_nodebiasing_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_nodebiasing_train.mean_difference())

metric_dataset_nodebiasing_test = BinaryLabelDatasetMetric(dataset_nodebiasing_test, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_nodebiasing_test.mean_difference())

display(Markdown("#### Plain model - without debiasing - classification metrics"))
classified_metric_nodebiasing_test = ClassificationMetric(dataset_orig_test, 
                                                 dataset_nodebiasing_test,
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)
print("Test set: Classification accuracy = %f" % classified_metric_nodebiasing_test.accuracy())
TPR = classified_metric_nodebiasing_test.true_positive_rate()
TNR = classified_metric_nodebiasing_test.true_negative_rate()
bal_acc_nodebiasing_test = 0.5*(TPR+TNR)
print("Test set: Balanced classification accuracy = %f" % bal_acc_nodebiasing_test)
print("Test set: Disparate impact = %f" % classified_metric_nodebiasing_test.disparate_impact())
print("Test set: Equal opportunity difference = %f" % classified_metric_nodebiasing_test.equal_opportunity_difference())
print("Test set: Average odds difference = %f" % classified_metric_nodebiasing_test.average_odds_difference())
print("Test set: Theil_index = %f" % classified_metric_nodebiasing_test.theil_index())

#### Plain model - without debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.219438
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.223037


#### Plain model - without debiasing - classification metrics

Test set: Classification accuracy = 0.798744
Test set: Balanced classification accuracy = 0.661011
Test set: Disparate impact = 0.000000
Test set: Equal opportunity difference = -0.462479
Test set: Average odds difference = -0.289453
Test set: Theil_index = 0.180469


### Apply in-processing algorithm based on adversarial learning

In [11]:
# Learn parameters with debias set to True
debiased_model = AdversarialDebiasing(privileged_groups = privileged_groups,
                                      unprivileged_groups = unprivileged_groups,
                                      debias=True, verbose=True, seed=360)

In [12]:
debiased_model.fit(dataset_orig_train)

Starting to train model(s) on cpu:
Learning rate of the classifier model is now set to 0.001
Learning rate of the adversary model is now set to 0.001
Epoch: [1/50] Batch: [1/268]	Classifier_Loss: 0.6840	Adversary Loss: 0.6395	C(x): 0.6840	A(x, y): 0.6395
Epoch: [1/50] Batch: [201/268]	Classifier_Loss: 0.6269	Adversary Loss: 0.6589	C(x): 0.6485	A(x, y): 0.6224
Epoch: [2/50] Batch: [1/268]	Classifier_Loss: 0.6097	Adversary Loss: 0.6491	C(x): 0.6438	A(x, y): 0.6208
Epoch: [2/50] Batch: [201/268]	Classifier_Loss: 0.5658	Adversary Loss: 0.6252	C(x): 0.6283	A(x, y): 0.6179
Epoch: [3/50] Batch: [1/268]	Classifier_Loss: 0.5379	Adversary Loss: 0.6266	C(x): 0.6215	A(x, y): 0.6180
Epoch: [3/50] Batch: [201/268]	Classifier_Loss: 0.4897	Adversary Loss: 0.6622	C(x): 0.6022	A(x, y): 0.6170
Epoch: [4/50] Batch: [1/268]	Classifier_Loss: 0.5642	Adversary Loss: 0.6440	C(x): 0.5973	A(x, y): 0.6172
Learning rate of the classifier model is now set to 0.00096
Learning rate of the adversary model is now set t

<aif360.pytorch.inprocessing.adversarial_debiasing.AdversarialDebiasing at 0x1bb984a1c48>

In [13]:
# Apply the plain model to test data
dataset_debiasing_train = debiased_model.predict(dataset_orig_train)
dataset_debiasing_test = debiased_model.predict(dataset_orig_test)

In [14]:
# Metrics for the dataset from plain model (without debiasing)
display(Markdown("#### Plain model - without debiasing - dataset metrics"))
print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_nodebiasing_train.mean_difference())
print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_nodebiasing_test.mean_difference())

# Metrics for the dataset from model with debiasing
display(Markdown("#### Model - with debiasing - dataset metrics"))
metric_dataset_debiasing_train = BinaryLabelDatasetMetric(dataset_debiasing_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_debiasing_train.mean_difference())

metric_dataset_debiasing_test = BinaryLabelDatasetMetric(dataset_debiasing_test, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_debiasing_test.mean_difference())



display(Markdown("#### Plain model - without debiasing - classification metrics"))
print("Test set: Classification accuracy = %f" % classified_metric_nodebiasing_test.accuracy())
TPR = classified_metric_nodebiasing_test.true_positive_rate()
TNR = classified_metric_nodebiasing_test.true_negative_rate()
bal_acc_nodebiasing_test = 0.5*(TPR+TNR)
print("Test set: Balanced classification accuracy = %f" % bal_acc_nodebiasing_test)
print("Test set: Disparate impact = %f" % classified_metric_nodebiasing_test.disparate_impact())
print("Test set: Equal opportunity difference = %f" % classified_metric_nodebiasing_test.equal_opportunity_difference())
print("Test set: Average odds difference = %f" % classified_metric_nodebiasing_test.average_odds_difference())
print("Test set: Theil_index = %f" % classified_metric_nodebiasing_test.theil_index())



display(Markdown("#### Model - with debiasing - classification metrics"))
classified_metric_debiasing_test = ClassificationMetric(dataset_orig_test, 
                                                 dataset_debiasing_test,
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)
print("Test set: Classification accuracy = %f" % classified_metric_debiasing_test.accuracy())
TPR = classified_metric_debiasing_test.true_positive_rate()
TNR = classified_metric_debiasing_test.true_negative_rate()
bal_acc_debiasing_test = 0.5*(TPR+TNR)
print("Test set: Balanced classification accuracy = %f" % bal_acc_debiasing_test)
print("Test set: Disparate impact = %f" % classified_metric_debiasing_test.disparate_impact())
print("Test set: Equal opportunity difference = %f" % classified_metric_debiasing_test.equal_opportunity_difference())
print("Test set: Average odds difference = %f" % classified_metric_debiasing_test.average_odds_difference())
print("Test set: Theil_index = %f" % classified_metric_debiasing_test.theil_index())

#### Plain model - without debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.219438
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.223037


#### Model - with debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = 0.000000
Test set: Difference in mean outcomes between unprivileged and privileged groups = 0.000000


#### Plain model - without debiasing - classification metrics

Test set: Classification accuracy = 0.798744
Test set: Balanced classification accuracy = 0.661011
Test set: Disparate impact = 0.000000
Test set: Equal opportunity difference = -0.462479
Test set: Average odds difference = -0.289453
Test set: Theil_index = 0.180469


#### Model - with debiasing - classification metrics

Test set: Classification accuracy = 0.242749
Test set: Balanced classification accuracy = 0.500000
Test set: Disparate impact = 1.000000
Test set: Equal opportunity difference = 0.000000
Test set: Average odds difference = 0.000000
Test set: Theil_index = 0.033644


### Experimenting

In [15]:
def init_weights(layer):
    r"""Initialize layer weights and biases if it has any and the chosen initializer
    is valid. Can be applied on any layer and will only initialize parametric layers.
    """

    try:
        layer.__getattr__('weight')
        _has_weight = True
    except:
        _has_weight = False

    try:
        layer.__getattr__('bias')
        _has_bias = True
    except:
        _has_bias = False

    if _has_weight:
        nn.init.xavier_uniform_(layer.weight.data)
    if _has_bias:
        try:
            nn.init.xavier_uniform_(layer.bias.data)
        except:
            layer.bias.data.fill_(0.01)
    else:
        pass

In [16]:
num_epochs = 50

In [17]:
classifier = ClassifierModel(default_classifier_ann(18,[200],[0.5]), torch.sigmoid)
classifier.apply(init_weights)

ClassifierModel(
  (ann): Sequential(
    (0): Linear(in_features=18, out_features=200, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=200, out_features=1, bias=True)
  )
)

In [18]:
adversary = AdversaryModel(default_classifier_ann(18,[200],[0.5]))
adversary.apply(init_weights)

AdversaryModel(
  (ann): Sequential(
    (0): Linear(in_features=18, out_features=200, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=200, out_features=1, bias=True)
  )
  (s): Sigmoid()
  (encoder): Linear(in_features=3, out_features=1, bias=True)
)

In [19]:
protected_attribute_index = dataset_orig_train.protected_attribute_names.index('sex')
train_dataset = torch.utils.data.TensorDataset(
    torch.from_numpy(dataset_orig_train.features).float(),
    torch.from_numpy(dataset_orig_train.labels).float(),
    torch.from_numpy(dataset_orig_train.protected_attributes[:, protected_attribute_index].\
    reshape(dataset_orig_train.protected_attributes.shape[0], -1)).float()
)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=128, shuffle=True)

In [20]:
import numpy as np
from math import ceil
num_train_samples, features_dim = np.shape(dataset_orig_train.features)
global_steps = num_epochs * ceil(num_train_samples / 128)

In [21]:
classifier_optim = torch.optim.Adam([p for p in classifier.parameters() if p.requires_grad], lr=0.001)
classifier_lr_scheduler = StaircaseExponentialLR(classifier_optim, global_steps, 0.001, 100, 0.96, None, True, True)
adversary_optim = torch.optim.Adam([p for p in adversary.parameters() if p.requires_grad], lr=0.001)
adversary_lr_scheduler = StaircaseExponentialLR(adversary_optim, global_steps, 0.001, 100, 0.96, None, True, True)

In [22]:
normalize = lambda x: x / (torch.norm(x) + np.finfo(np.float32).tiny)
classifier_criterion = nn.BCELoss(reduction="mean")
adversary_criterion = nn.BCELoss(reduction="mean")

In [23]:
from copy import deepcopy
global_step, classifier_losses, adversary_losses = 0, [], []
torch.manual_seed(360)
adversary_loss_weight = 0.1
for epoch in range(num_epochs):
    for i, data in enumerate(train_loader, 0):
        # Update learning rate(s)
        classifier_lr_scheduler.step(global_step, classifier.__class__.__name__)
        adversary_lr_scheduler.step(global_step, adversary.__class__.__name__)
        # Train the classifier model
        classifier.zero_grad()
        batch_features = data[:][0]
        batch_labels = data[:][1]
        pred_labels, pred_logits = classifier(batch_features)
        classifier_error = classifier_criterion(pred_labels, batch_labels)
        classifier_losses.append(classifier_error.item())
        classifier_mean_error = np.mean(classifier_losses)
        # Adversary training
        classifier_error.backward()#retain_graph=True)
        # Update the parameters for the classifier layers within the adversary model
        c_params, a_params = dict(classifier.named_parameters()), dict(adversary.named_parameters())
        for (c_p, a_p) in zip(c_params.values(), a_params.values()):
            a_p.data = deepcopy(c_p.data)
        adversary.zero_grad()
        batch_protected_attributes = data[:][2]
        pred_protected_attributes_labels, pred_protected_attributes_logits = adversary(
        batch_features, batch_labels)
        adversary_error = adversary_criterion(pred_protected_attributes_labels, batch_protected_attributes)
        adversary_error.backward()#retain_graph=True)
        adversary_losses.append(adversary_error.item())
        adversary_mean_error = np.mean(adversary_losses)
        # Adjust the classifier's gradients according to the normnalized adversary gradients
        c_params, a_params = dict(classifier.named_parameters()), dict(adversary.named_parameters())
        for p in c_params:
            unit_adversary_grad = normalize(a_params[p].grad)
            c_params[p].grad -= torch.sum((c_params[p].grad * unit_adversary_grad))
            c_params[p].grad -= adversary_loss_weight * a_params[p].grad
        adversary_optim.step() # Update adversary model parameters
        classifier_optim.step() # Update classifier model parameters
        if i % 200 == 0:
            print("Epoch: [%d/%d] Batch: [%d/%d]\tClassifier_Loss: %.4f\tAdversary Loss: %.4f\tC(x): %.4f\tA(x, y): %.4f" % \
            (epoch + 1, num_epochs, i + 1, len(train_loader), classifier_error.item(),
             adversary_error.item(), classifier_mean_error, adversary_mean_error))
        global_step += 1
    classifier.training = False
    adversary.training = False

Learning rate of the classifier model is now set to 0.001
Learning rate of the adversary model is now set to 0.001
Epoch: [1/50] Batch: [1/268]	Classifier_Loss: 0.6964	Adversary Loss: 0.7403	C(x): 0.6964	A(x, y): 0.7403
Learning rate of the classifier model is now set to 0.00096
Learning rate of the adversary model is now set to 0.00096
Learning rate of the classifier model is now set to 0.0009216
Learning rate of the adversary model is now set to 0.0009216
Epoch: [1/50] Batch: [201/268]	Classifier_Loss: 0.5945	Adversary Loss: 0.6805	C(x): 0.7010	A(x, y): 0.6942
Epoch: [2/50] Batch: [1/268]	Classifier_Loss: 0.5238	Adversary Loss: 0.6873	C(x): 0.6703	A(x, y): 0.6921
Learning rate of the classifier model is now set to 0.0008847359999999999
Learning rate of the adversary model is now set to 0.0008847359999999999
Learning rate of the classifier model is now set to 0.0008493465599999999
Learning rate of the adversary model is now set to 0.0008493465599999999
Epoch: [2/50] Batch: [201/268]	C

In [24]:
dataset_debiasing_train = dataset_orig_train.copy(deepcopy = True)
dataset_debiasing_test = dataset_orig_test.copy(deepcopy = True)
train_pred_labels = classifier(torch.from_numpy(dataset_debiasing_train.features).float())[0].cpu().detach().numpy().tolist()
test_pred_labels = classifier(torch.from_numpy(dataset_debiasing_test.features).float())[0].cpu().detach().numpy().tolist()
# Mutated, fairer dataset with new labels
dataset_debiasing_train.scores = np.array(train_pred_labels, dtype=np.float64).reshape(-1, 1)
dataset_debiasing_train.labels = (np.array(train_pred_labels)>0.5).astype(np.float64).reshape(-1,1)
dataset_debiasing_test.scores = np.array(test_pred_labels, dtype=np.float64).reshape(-1, 1)
dataset_debiasing_test.labels = (np.array(test_pred_labels)>0.5).astype(np.float64).reshape(-1,1)
# Map the dataset labels to back to their original values.
train_temp_labels = dataset_debiasing_train.labels.copy()
train_temp_labels[(dataset_debiasing_train.labels == 1.0).ravel(), 0] = dataset_orig_train.favorable_label
train_temp_labels[(dataset_debiasing_train.labels == 0.0).ravel(), 0] = dataset_orig_train.unfavorable_label
dataset_debiasing_train.labels = train_temp_labels.copy()
test_temp_labels = dataset_debiasing_test.labels.copy()
test_temp_labels[(dataset_debiasing_test.labels == 1.0).ravel(), 0] = dataset_orig_test.favorable_label
test_temp_labels[(dataset_debiasing_test.labels == 0.0).ravel(), 0] = dataset_orig_test.unfavorable_label
dataset_debiasing_test.labels = test_temp_labels.copy()

In [25]:
# Metrics for the dataset from plain model (without debiasing)
display(Markdown("#### Plain model - without debiasing - dataset metrics"))
print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_nodebiasing_train.mean_difference())
print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_nodebiasing_test.mean_difference())

# Metrics for the dataset from model with debiasing
display(Markdown("#### Model - with debiasing - dataset metrics"))
metric_dataset_debiasing_train = BinaryLabelDatasetMetric(dataset_debiasing_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_debiasing_train.mean_difference())

metric_dataset_debiasing_test = BinaryLabelDatasetMetric(dataset_debiasing_test, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_debiasing_test.mean_difference())



display(Markdown("#### Plain model - without debiasing - classification metrics"))
print("Test set: Classification accuracy = %f" % classified_metric_nodebiasing_test.accuracy())
TPR = classified_metric_nodebiasing_test.true_positive_rate()
TNR = classified_metric_nodebiasing_test.true_negative_rate()
bal_acc_nodebiasing_test = 0.5*(TPR+TNR)
print("Test set: Balanced classification accuracy = %f" % bal_acc_nodebiasing_test)
print("Test set: Disparate impact = %f" % classified_metric_nodebiasing_test.disparate_impact())
print("Test set: Equal opportunity difference = %f" % classified_metric_nodebiasing_test.equal_opportunity_difference())
print("Test set: Average odds difference = %f" % classified_metric_nodebiasing_test.average_odds_difference())
print("Test set: Theil_index = %f" % classified_metric_nodebiasing_test.theil_index())



display(Markdown("#### Model - with debiasing - classification metrics"))
classified_metric_debiasing_test = ClassificationMetric(dataset_orig_test, 
                                                 dataset_debiasing_test,
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)
print("Test set: Classification accuracy = %f" % classified_metric_debiasing_test.accuracy())
TPR = classified_metric_debiasing_test.true_positive_rate()
TNR = classified_metric_debiasing_test.true_negative_rate()
bal_acc_debiasing_test = 0.5*(TPR+TNR)
print("Test set: Balanced classification accuracy = %f" % bal_acc_debiasing_test)
print("Test set: Disparate impact = %f" % classified_metric_debiasing_test.disparate_impact())
print("Test set: Equal opportunity difference = %f" % classified_metric_debiasing_test.equal_opportunity_difference())
print("Test set: Average odds difference = %f" % classified_metric_debiasing_test.average_odds_difference())
print("Test set: Theil_index = %f" % classified_metric_debiasing_test.theil_index())

#### Plain model - without debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.219438
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.223037


#### Model - with debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.089357
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.078366


#### Plain model - without debiasing - classification metrics

Test set: Classification accuracy = 0.798744
Test set: Balanced classification accuracy = 0.661011
Test set: Disparate impact = 0.000000
Test set: Equal opportunity difference = -0.462479
Test set: Average odds difference = -0.289453
Test set: Theil_index = 0.180469


#### Model - with debiasing - classification metrics

Test set: Classification accuracy = 0.697127
Test set: Balanced classification accuracy = 0.550077
Test set: Disparate impact = 0.634279
Test set: Equal opportunity difference = -0.045510
Test set: Average odds difference = -0.054838
Test set: Theil_index = 0.238033


# TensorFlow code

### Apply in-processing algorithm based on adversarial learning

In [None]:
sess.close()
tf.reset_default_graph()
sess = tf.Session()

In [None]:
# Learn parameters with debias set to True
debiased_model = AdversarialDebiasing(privileged_groups = privileged_groups,
                          unprivileged_groups = unprivileged_groups,
                          scope_name='debiased_classifier',
                          debias=True,
                          sess=sess)

In [None]:
debiased_model.fit(dataset_orig_train)

epoch 0; iter: 0; batch classifier loss: 0.721611; batch adversarial loss: 0.630777
epoch 0; iter: 200; batch classifier loss: 0.442980; batch adversarial loss: 0.656542
epoch 1; iter: 0; batch classifier loss: 0.453149; batch adversarial loss: 0.657557
epoch 1; iter: 200; batch classifier loss: 0.496931; batch adversarial loss: 0.617686
epoch 2; iter: 0; batch classifier loss: 0.547117; batch adversarial loss: 0.653103
epoch 2; iter: 200; batch classifier loss: 0.331452; batch adversarial loss: 0.617297
epoch 3; iter: 0; batch classifier loss: 0.407935; batch adversarial loss: 0.627860
epoch 3; iter: 200; batch classifier loss: 0.413469; batch adversarial loss: 0.616086
epoch 4; iter: 0; batch classifier loss: 0.370982; batch adversarial loss: 0.604738
epoch 4; iter: 200; batch classifier loss: 0.469453; batch adversarial loss: 0.617892
epoch 5; iter: 0; batch classifier loss: 0.502638; batch adversarial loss: 0.595247
epoch 5; iter: 200; batch classifier loss: 0.379807; batch adversa

<aif360.algorithms.inprocessing.adversarial_debiasing.AdversarialDebiasing at 0x1c32efcf10>

In [None]:
# Apply the plain model to test data
dataset_debiasing_train = debiased_model.predict(dataset_orig_train)
dataset_debiasing_test = debiased_model.predict(dataset_orig_test)

In [None]:
# Metrics for the dataset from plain model (without debiasing)
display(Markdown("#### Plain model - without debiasing - dataset metrics"))
print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_nodebiasing_train.mean_difference())
print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_nodebiasing_test.mean_difference())

# Metrics for the dataset from model with debiasing
display(Markdown("#### Model - with debiasing - dataset metrics"))
metric_dataset_debiasing_train = BinaryLabelDatasetMetric(dataset_debiasing_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_debiasing_train.mean_difference())

metric_dataset_debiasing_test = BinaryLabelDatasetMetric(dataset_debiasing_test, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_debiasing_test.mean_difference())



display(Markdown("#### Plain model - without debiasing - classification metrics"))
print("Test set: Classification accuracy = %f" % classified_metric_nodebiasing_test.accuracy())
TPR = classified_metric_nodebiasing_test.true_positive_rate()
TNR = classified_metric_nodebiasing_test.true_negative_rate()
bal_acc_nodebiasing_test = 0.5*(TPR+TNR)
print("Test set: Balanced classification accuracy = %f" % bal_acc_nodebiasing_test)
print("Test set: Disparate impact = %f" % classified_metric_nodebiasing_test.disparate_impact())
print("Test set: Equal opportunity difference = %f" % classified_metric_nodebiasing_test.equal_opportunity_difference())
print("Test set: Average odds difference = %f" % classified_metric_nodebiasing_test.average_odds_difference())
print("Test set: Theil_index = %f" % classified_metric_nodebiasing_test.theil_index())



display(Markdown("#### Model - with debiasing - classification metrics"))
classified_metric_debiasing_test = ClassificationMetric(dataset_orig_test, 
                                                 dataset_debiasing_test,
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)
print("Test set: Classification accuracy = %f" % classified_metric_debiasing_test.accuracy())
TPR = classified_metric_debiasing_test.true_positive_rate()
TNR = classified_metric_debiasing_test.true_negative_rate()
bal_acc_debiasing_test = 0.5*(TPR+TNR)
print("Test set: Balanced classification accuracy = %f" % bal_acc_debiasing_test)
print("Test set: Disparate impact = %f" % classified_metric_debiasing_test.disparate_impact())
print("Test set: Equal opportunity difference = %f" % classified_metric_debiasing_test.equal_opportunity_difference())
print("Test set: Average odds difference = %f" % classified_metric_debiasing_test.average_odds_difference())
print("Test set: Theil_index = %f" % classified_metric_debiasing_test.theil_index())

#### Plain model - without debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.217876
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.221187


#### Model - with debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.090157
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.094732


#### Plain model - without debiasing - classification metrics

Test set: Classification accuracy = 0.804955
Test set: Balanced classification accuracy = 0.666400
Test set: Disparate impact = 0.000000
Test set: Equal opportunity difference = -0.470687
Test set: Average odds difference = -0.291055
Test set: Theil_index = 0.175113


#### Model - with debiasing - classification metrics

Test set: Classification accuracy = 0.792056
Test set: Balanced classification accuracy = 0.672481
Test set: Disparate impact = 0.553746
Test set: Equal opportunity difference = -0.090716
Test set: Average odds difference = -0.053841
Test set: Theil_index = 0.170358



    References:
    [1] B. H. Zhang, B. Lemoine, and M. Mitchell, "Mitigating Unwanted Biases with Adversarial Learning," 
    AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society, 2018.