#### This notebook demonstrates the use of Reweighing pre-processing, Adversarial Debiasing in-processing and Reject Option Classification (ROC) post-processing algorithms for bias mitigation.
- Load imports
- Dataset
    * Load Adult, COMPAS, or German dataset and set privileged and unprivileged groups
    * Divide the dataset into training, validation, and testing partitions
    * Show dataset properties
- Pre-processing: Reweighing
    * Show difference in mean outcomes for original training data
    * Assign weights with reweighing
    * Show difference in mean outcomes for transformed training data
- In-processing: Adversarial Debiasing
    * Train model without debiasing, predict, and show metrics
    * Train model with debiasing, predict, and show metrics
- Post-processing: Reject Option Classification (ROC)
    * Show test-set metrics for the classifier trained without debiasing
    * Fit ROC model
    * Transform labels and show metrics

In [1]:
# Load all necessary packages
import sys
sys.path.append("../")
import numpy as np
import tensorflow as tf

# Avoid deprecation warnings
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

from aif360.datasets import BinaryLabelDataset
from aif360.datasets import AdultDataset, GermanDataset, CompasDataset
from aif360.metrics import ClassificationMetric, BinaryLabelDatasetMetric
from aif360.metrics.utils import compute_boolean_conditioning_vector
from aif360.algorithms.preprocessing.optim_preproc_helpers.data_preproc_functions\
        import load_preproc_data_adult, load_preproc_data_german, load_preproc_data_compas

from aif360.algorithms.preprocessing.reweighing import Reweighing
from aif360.algorithms.inprocessing.adversarial_debiasing import AdversarialDebiasing
from aif360.algorithms.postprocessing.reject_option_classification\
        import RejectOptionClassification

from common_utils import compute_metrics

from IPython.display import Markdown, display
from ipywidgets import interactive, FloatSlider

#### Load dataset and specify options

In [2]:
## import dataset
dataset_used = "german" # "adult", "german", "compas"
protected_attribute_used = 1 # 1, 2

if dataset_used == "adult":
#     dataset_orig = AdultDataset()
    if protected_attribute_used == 1:
        privileged_groups = [{'sex': 1}]
        unprivileged_groups = [{'sex': 0}]
        dataset_orig = load_preproc_data_adult(['sex'])
    else:
        privileged_groups = [{'race': 1}]
        unprivileged_groups = [{'race': 0}]
        dataset_orig = load_preproc_data_adult(['race'])
    
elif dataset_used == "german":
#     dataset_orig = GermanDataset()
    if protected_attribute_used == 1:
        privileged_groups = [{'sex': 1}]
        unprivileged_groups = [{'sex': 0}]
        dataset_orig = load_preproc_data_german(['sex'])
    else:
        privileged_groups = [{'age': 1}]
        unprivileged_groups = [{'age': 0}]
        dataset_orig = load_preproc_data_german(['age'])
    
elif dataset_used == "compas":
#     dataset_orig = CompasDataset()
    if protected_attribute_used == 1:
        privileged_groups = [{'sex': 0}]
        unprivileged_groups = [{'sex': 1}]
        dataset_orig = load_preproc_data_compas(['sex'])
    else:
        privileged_groups = [{'race': 1}]
        unprivileged_groups = [{'race': 0}]  
        dataset_orig = load_preproc_data_compas(['race'])

#### Split into train, test and validation

In [3]:
# Split the dataset 70/15/15 into training, validation, and test sets
dataset_orig_train, dataset_orig_vt = dataset_orig.split([0.7], shuffle=True)
dataset_orig_valid, dataset_orig_test = dataset_orig_vt.split([0.5], shuffle=True)
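
The two `split` calls above yield a 70/15/15 partition: 70% of the rows go to training, and the remaining 30% is halved into validation and test sets. A minimal sketch of the same arithmetic on plain row indices, assuming 1,000 rows (which matches the 700-row training set reported for the German data); `three_way_split` is a hypothetical helper, not part of AIF360:

```python
import random


def three_way_split(n_rows, train_frac=0.7, seed=0):
    """Shuffle row indices, then cut them into train/validation/test parts.

    Hypothetical helper for illustration only.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = round(train_frac * n_rows)
    train, rest = indices[:n_train], indices[n_train:]
    # The held-out remainder is halved into validation and test sets.
    n_valid = round(0.5 * len(rest))
    return train, rest[:n_valid], rest[n_valid:]


train_idx, valid_idx, test_idx = three_way_split(1000)
print(len(train_idx), len(valid_idx), len(test_idx))  # 700 150 150
```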

#### Display properties of the training data

In [4]:
# print out some labels, names, etc.
display(Markdown("#### Training Dataset shape"))
print(dataset_orig_train.features.shape)
display(Markdown("#### Favorable and unfavorable labels"))
print(dataset_orig_train.favorable_label, dataset_orig_train.unfavorable_label)
display(Markdown("#### Protected attribute names"))
print(dataset_orig_train.protected_attribute_names)
display(Markdown("#### Privileged and unprivileged protected attribute values"))
print(dataset_orig_train.privileged_protected_attributes, 
      dataset_orig_train.unprivileged_protected_attributes)
display(Markdown("#### Dataset feature names"))
print(dataset_orig_train.feature_names)

#### Training Dataset shape

(700, 11)


#### Favorable and unfavorable labels

1.0 2.0


#### Protected attribute names

['sex']


#### Privileged and unprivileged protected attribute values

[array([1.])] [array([0.])]


#### Dataset feature names

['age', 'sex', 'credit_history=Delay', 'credit_history=None/Paid', 'credit_history=Other', 'savings=500+', 'savings=<500', 'savings=Unknown/None', 'employment=1-4 years', 'employment=4+ years', 'employment=Unemployed']


## Pre-processing: Reweighing

#### Metric for original training data

In [5]:
metric_orig_train = BinaryLabelDatasetMetric(dataset_orig_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)
display(Markdown("#### Original training dataset"))
print("Weights = %f , %f, %f, ..." % (dataset_orig_train.instance_weights[1], dataset_orig_train.instance_weights[2], \
                                    dataset_orig_train.instance_weights[3]))
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_train.mean_difference())

#### Original training dataset

Weights = 1.000000 , 1.000000, 1.000000, ...
Difference in mean outcomes between unprivileged and privileged groups = -0.085684
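
The difference in mean outcomes is the weighted rate of favorable labels in the unprivileged group minus the same rate in the privileged group, so a negative value means the unprivileged group receives the favorable label less often. A minimal sketch on toy data, assuming labels coded 1 (favorable) and 0 (unfavorable); `weighted_favorable_rate` and `mean_difference` here are illustrative functions, not the AIF360 implementation:

```python
def weighted_favorable_rate(labels, groups, weights, group, favorable=1):
    """Weighted share of favorable labels within one group."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    fav = sum(w for y, g, w in zip(labels, groups, weights)
              if g == group and y == favorable)
    return fav / total


def mean_difference(labels, groups, weights, unprivileged=0, privileged=1):
    """Favorable rate of the unprivileged group minus that of the privileged group."""
    return (weighted_favorable_rate(labels, groups, weights, unprivileged)
            - weighted_favorable_rate(labels, groups, weights, privileged))


# Toy data: group 1 is privileged, group 0 is unprivileged.
labels = [1, 1, 1, 0, 1, 0, 0, 1]
groups = [1, 1, 1, 1, 0, 0, 0, 0]
weights = [1.0] * len(labels)  # unit weights, as in the original training data

print(mean_difference(labels, groups, weights))  # -0.25
```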


#### Reweighing

In [6]:
RW = Reweighing(unprivileged_groups=unprivileged_groups,
               privileged_groups=privileged_groups)
RW.fit(dataset_orig_train)
dataset_transf_train = RW.transform(dataset_orig_train)

#### Metric for reweighted training data

In [7]:
metric_transf_train = BinaryLabelDatasetMetric(dataset_transf_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)
display(Markdown("#### Transformed training dataset"))
print("Weights = %8f , %8f, %8f, ..." % (dataset_transf_train.instance_weights[1], dataset_transf_train.instance_weights[2], \
                                    dataset_transf_train.instance_weights[3]))
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_transf_train.mean_difference())

#### Transformed training dataset

Weights = 0.962556 , 1.107728, 1.107728, ...
Difference in mean outcomes between unprivileged and privileged groups = -0.000000
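
Reweighing follows Kamiran and Calders: each (group, label) combination receives the weight it would need for group membership and label to look statistically independent, that is, the expected joint frequency divided by the observed one. The sketch below assumes unit starting weights and toy data, and its function names are illustrative rather than AIF360's; it shows why the weighted difference in mean outcomes drops to zero.

```python
from collections import Counter


def reweighing_weights(labels, groups):
    """Weight for each row: (N_group * N_label) / (N * N_group_and_label)."""
    n = len(labels)
    n_group = Counter(groups)
    n_label = Counter(labels)
    n_joint = Counter(zip(groups, labels))
    return [n_group[g] * n_label[y] / (n * n_joint[(g, y)])
            for g, y in zip(groups, labels)]


def weighted_favorable_rate(labels, groups, weights, group, favorable=1):
    """Weighted share of favorable labels within one group."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    fav = sum(w for y, g, w in zip(labels, groups, weights)
              if g == group and y == favorable)
    return fav / total


# Toy data: group 1 is privileged, group 0 is unprivileged.
labels = [1, 1, 1, 0, 1, 0, 0, 1]
groups = [1, 1, 1, 1, 0, 0, 0, 0]

weights = reweighing_weights(labels, groups)
diff_after = (weighted_favorable_rate(labels, groups, weights, 0)
              - weighted_favorable_rate(labels, groups, weights, 1))

print([round(w, 4) for w in weights])
print(abs(diff_after) < 1e-12)  # True: the groups now have equal weighted rates
```

Over-represented combinations (here, privileged rows with a favorable label) are weighted below 1, and under-represented ones above 1.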


## In-processing: Adversarial Debiasing

### Without debiasing

#### Train without debiasing

In [8]:
# Learn parameters with debias set to False
sess = tf.Session() 
plain_model_nodebias = AdversarialDebiasing(privileged_groups = privileged_groups,
                          unprivileged_groups = unprivileged_groups,
                          scope_name='plain_classifier',
                          debias=False,
                           sess=sess)

In [9]:
plain_model_nodebias.fit(dataset_orig_train)

epoch 0; iter: 0; batch classifier loss: 0.652186
epoch 1; iter: 0; batch classifier loss: 0.625617
epoch 2; iter: 0; batch classifier loss: 0.585623
epoch 3; iter: 0; batch classifier loss: 0.539420
epoch 4; iter: 0; batch classifier loss: 0.579216
epoch 5; iter: 0; batch classifier loss: 0.531723
epoch 6; iter: 0; batch classifier loss: 0.537198
epoch 7; iter: 0; batch classifier loss: 0.605551
epoch 8; iter: 0; batch classifier loss: 0.565381
epoch 9; iter: 0; batch classifier loss: 0.551469
epoch 10; iter: 0; batch classifier loss: 0.481968
epoch 11; iter: 0; batch classifier loss: 0.564440
epoch 12; iter: 0; batch classifier loss: 0.553404
epoch 13; iter: 0; batch classifier loss: 0.554652
epoch 14; iter: 0; batch classifier loss: 0.577627
epoch 15; iter: 0; batch classifier loss: 0.547541
epoch 16; iter: 0; batch classifier loss: 0.532641
epoch 17; iter: 0; batch classifier loss: 0.550717
epoch 18; iter: 0; batch classifier loss: 0.528838
epoch 19; iter: 0; batch classifier loss:

<aif360.algorithms.inprocessing.adversarial_debiasing.AdversarialDebiasing at 0x25428db58c8>

#### Show metrics

In [10]:
# Apply the plain model to the training, validation, and test data
dataset_nodebiasing_train = plain_model_nodebias.predict(dataset_orig_train)
dataset_nodebiasing_valid = plain_model_nodebias.predict(dataset_orig_valid)
dataset_nodebiasing_test = plain_model_nodebias.predict(dataset_orig_test)

In [11]:
# Metrics for the dataset from plain model (without debiasing)
display(Markdown("#### Plain model - without debiasing - dataset metrics"))
metric_dataset_nodebiasing_train = BinaryLabelDatasetMetric(dataset_nodebiasing_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_nodebiasing_train.mean_difference())

metric_dataset_nodebiasing_test = BinaryLabelDatasetMetric(dataset_nodebiasing_test, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_nodebiasing_test.mean_difference())


#### Plain model - without debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.127753
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.111111


In [12]:
# Accuracy
display(Markdown("#### Plain model - without debiasing - classification metrics"))
classified_metric_nodebiasing_test = ClassificationMetric(dataset_orig_test, 
                                                 dataset_nodebiasing_test,
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)
print("Classification accuracy = %f" % classified_metric_nodebiasing_test.accuracy())

# Other metrics
metric_test_bef = compute_metrics(dataset_orig_test, dataset_nodebiasing_test, 
                unprivileged_groups, privileged_groups)

#### Plain model - without debiasing - classification metrics

Classification accuracy = 0.680000
Balanced accuracy = 0.5067
Statistical parity difference = -0.1111
Disparate impact = 0.8889
Average odds difference = -0.1167
Equal opportunity difference = -0.1000
Theil index = 0.0781
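
The group fairness metrics printed above compare per-group rates of the classifier. As a sketch of the standard definitions (not the code inside `compute_metrics`), assuming labels coded 1 for favorable and 0 for unfavorable and a hypothetical helper `group_rates`:

```python
def group_rates(y_true, y_pred, groups, group):
    """Selection rate, true positive rate, and false positive rate for one group."""
    rows = [(t, p) for t, p, g in zip(y_true, y_pred, groups) if g == group]
    positives = sum(1 for t, _ in rows if t == 1)
    negatives = len(rows) - positives
    true_pos = sum(1 for t, p in rows if t == 1 and p == 1)
    false_pos = sum(1 for t, p in rows if t == 0 and p == 1)
    return {
        "selection": (true_pos + false_pos) / len(rows),
        "tpr": true_pos / positives,
        "fpr": false_pos / negatives,
    }


# Toy data: group 1 is privileged, group 0 is unprivileged.
groups = [1, 1, 1, 1, 0, 0, 0, 0]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 1, 0]

priv = group_rates(y_true, y_pred, groups, 1)
unpriv = group_rates(y_true, y_pred, groups, 0)

statistical_parity_difference = unpriv["selection"] - priv["selection"]
disparate_impact = unpriv["selection"] / priv["selection"]
equal_opportunity_difference = unpriv["tpr"] - priv["tpr"]
average_odds_difference = 0.5 * ((unpriv["fpr"] - priv["fpr"])
                                 + (unpriv["tpr"] - priv["tpr"]))

print(statistical_parity_difference)       # -0.25
print(round(disparate_impact, 4))          # 0.6667
print(equal_opportunity_difference)        # -0.5
print(average_odds_difference)             # -0.25
```

For the difference metrics the ideal value is 0; for disparate impact it is 1.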


### With debiasing

#### Train with debiasing

In [13]:
sess.close()
tf.reset_default_graph()
sess = tf.Session()

In [14]:
# Learn parameters with debias set to True
plain_model_debias = AdversarialDebiasing(privileged_groups = privileged_groups,
                          unprivileged_groups = unprivileged_groups,
                          scope_name='plain_classifier',
                          debias=True,
                           sess=sess)
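
With `debias=True`, the classifier is trained against an adversary that tries to predict the protected attribute from the classifier's output. In Zhang et al., the classifier's gradient is modified in two ways: the component that would help the adversary is projected out, and a multiple of the adversary's gradient is subtracted. The sketch below illustrates that update rule on plain Python lists; `debiased_gradient` and `alpha` are hypothetical names, and this is not the TensorFlow code AIF360 runs.

```python
import math


def debiased_gradient(grad_pred, grad_adv, alpha):
    """Classifier update direction from Zhang et al. (illustrative sketch).

    grad_pred: gradient of the classifier loss w.r.t. the classifier weights
    grad_adv:  gradient of the adversary loss w.r.t. the same weights
    alpha:     weight on the adversarial term (hypothetical parameter name)
    """
    norm = math.sqrt(sum(a * a for a in grad_adv))
    unit_adv = [a / norm for a in grad_adv]
    # Component of the classifier gradient that points along the adversary gradient.
    projection = sum(p * u for p, u in zip(grad_pred, unit_adv))
    return [p - projection * u - alpha * a
            for p, u, a in zip(grad_pred, unit_adv, grad_adv)]


update = debiased_gradient(grad_pred=[1.0, 2.0], grad_adv=[0.0, 2.0], alpha=0.5)
print(update)  # [1.0, -1.0]
```

In the example, the second coordinate of the classifier gradient is removed entirely because it is parallel to the adversary's gradient, and the remaining `-alpha * grad_adv` term pushes the weights in the direction that increases the adversary's loss.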

In [15]:
plain_model_debias.fit(dataset_orig_train)

epoch 0; iter: 0; batch classifier loss: 0.772944; batch adversarial loss: 0.720876
epoch 1; iter: 0; batch classifier loss: 0.732564; batch adversarial loss: 0.724742
epoch 2; iter: 0; batch classifier loss: 0.702260; batch adversarial loss: 0.711225
epoch 3; iter: 0; batch classifier loss: 0.678974; batch adversarial loss: 0.719904
epoch 4; iter: 0; batch classifier loss: 0.660917; batch adversarial loss: 0.750913
epoch 5; iter: 0; batch classifier loss: 0.678752; batch adversarial loss: 0.728699
epoch 6; iter: 0; batch classifier loss: 0.617788; batch adversarial loss: 0.724927
epoch 7; iter: 0; batch classifier loss: 0.628121; batch adversarial loss: 0.733563
epoch 8; iter: 0; batch classifier loss: 0.678718; batch adversarial loss: 0.728306
epoch 9; iter: 0; batch classifier loss: 0.590913; batch adversarial loss: 0.735855
epoch 10; iter: 0; batch classifier loss: 0.568394; batch adversarial loss: 0.723743
epoch 11; iter: 0; batch classifier loss: 0.592545; batch adversarial loss:

<aif360.algorithms.inprocessing.adversarial_debiasing.AdversarialDebiasing at 0x2542a0aeec8>

#### Show metrics

In [16]:
# Apply the debiased model to the training and test data
dataset_debiasing_train = plain_model_debias.predict(dataset_orig_train)
dataset_debiasing_test = plain_model_debias.predict(dataset_orig_test)

In [17]:
# Metrics for the dataset from the model trained with debiasing
display(Markdown("#### Plain model - with debiasing - dataset metrics"))
metric_dataset_debiasing_train = BinaryLabelDatasetMetric(dataset_debiasing_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_debiasing_train.mean_difference())

metric_dataset_debiasing_test = BinaryLabelDatasetMetric(dataset_debiasing_test, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_debiasing_test.mean_difference())

#### Plain model - with debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.127753
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.066667


In [18]:
# Accuracy
display(Markdown("#### Plain model - with debiasing - classification metrics"))
classified_metric_debiasing_test = ClassificationMetric(dataset_orig_test, 
                                                 dataset_debiasing_test,
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)
print("Test set: Classification accuracy = %f" % classified_metric_debiasing_test.accuracy())

# Other metrics
metric_test_bef = compute_metrics(dataset_orig_test, dataset_debiasing_test, 
                unprivileged_groups, privileged_groups)

#### Plain model - with debiasing - classification metrics

Test set: Classification accuracy = 0.680000
Balanced accuracy = 0.5009
Statistical parity difference = -0.0667
Disparate impact = 0.9333
Average odds difference = -0.0667
Equal opportunity difference = -0.0667
Theil index = 0.0715
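
Statistical parity difference (u - p) and disparate impact (u / p) are both functions of the same two selection rates, so the rates themselves can be recovered from the reported pair. A short worked calculation using the four-decimal values printed for the two models; `rates_from_spd_and_di` is a hypothetical helper, and the results are only as precise as the rounded inputs:

```python
def rates_from_spd_and_di(spd, di):
    """Solve spd = u - p and di = u / p for the selection rates (u, p)."""
    p_priv = spd / (di - 1.0)
    u_unpriv = di * p_priv
    return u_unpriv, p_priv


reported = {
    "without debiasing": (-0.1111, 0.8889),
    "with debiasing": (-0.0667, 0.9333),
}

recovered = {}
for name, (spd, di) in reported.items():
    u, p = rates_from_spd_and_di(spd, di)
    recovered[name] = (u, p)
    print("%s: unprivileged rate = %.3f, privileged rate = %.3f" % (name, u, p))
```

Up to rounding, both runs give a privileged selection rate of about 1.0, which suggests that each classifier predicts the favorable label for essentially every privileged individual in this test split. That is consistent with the balanced accuracy staying close to 0.5.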


## Post-processing: Reject Option Classification

#### Show metrics for Test Set

In [19]:
# Metrics for the test set
display(Markdown("#### Test set"))
display(Markdown("##### Raw predictions - No fairness constraints"))
classified_metric_test = ClassificationMetric(dataset_orig_test, 
                                                 dataset_nodebiasing_test,
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)

print("Classification accuracy = %f" % classified_metric_test.accuracy())

metric_test_bef = compute_metrics(dataset_orig_test, dataset_nodebiasing_test, 
                unprivileged_groups, privileged_groups)

#### Test set

##### Raw predictions - No fairness constraints

Classification accuracy = 0.680000
Balanced accuracy = 0.5067
Statistical parity difference = -0.1111
Disparate impact = 0.8889
Average odds difference = -0.1167
Equal opportunity difference = -0.1000
Theil index = 0.0781


#### Estimate optimal parameters for the ROC method

In [20]:
# Metric used (should be one of allowed_metrics)
metric_name = "Statistical parity difference"

# Upper and lower bound on the fairness metric used
metric_ub = 0.05
metric_lb = -0.05
        
# Random seed for reproducibility
np.random.seed(1)

# Verify metric name
allowed_metrics = ["Statistical parity difference",
                   "Average odds difference",
                   "Equal opportunity difference"]
if metric_name not in allowed_metrics:
    raise ValueError("Metric name should be one of allowed metrics")

In [21]:
ROC = RejectOptionClassification(unprivileged_groups=unprivileged_groups, 
                                 privileged_groups=privileged_groups, 
                                 low_class_thresh=0.01, high_class_thresh=0.99,
                                  num_class_thresh=100, num_ROC_margin=50,
                                  metric_name=metric_name,
                                  metric_ub=metric_ub, metric_lb=metric_lb)
ROC = ROC.fit(dataset_orig_valid, dataset_nodebiasing_valid)

In [22]:
print("Optimal classification threshold (with fairness constraints) = %.4f" % ROC.classification_threshold)
print("Optimal ROC margin = %.4f" % ROC.ROC_margin)

Optimal classification threshold (with fairness constraints) = 0.7128
Optimal ROC margin = 0.0820
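
The fitted values come from a grid search over candidate thresholds and margins. As a rough check, the sketch below assumes an evenly spaced grid of `num_class_thresh=100` thresholds between `low_class_thresh` and `high_class_thresh`, and `num_ROC_margin=50` margins between 0 and the distance from the threshold to the nearer end of [0, 1]. Both the spacing and the margin cap are assumptions about the search grid, not something stated in this notebook; `linspace` is a small stand-in for `np.linspace`.

```python
def linspace(start, stop, num):
    """Evenly spaced values from start to stop inclusive (stand-in for np.linspace)."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


# Candidate thresholds under the assumed grid.
thresholds = linspace(0.01, 0.99, 100)
nearest_threshold = min(thresholds, key=lambda t: abs(t - 0.7128))

# Candidate margins for that threshold, capped so the critical region stays in [0, 1].
margin_cap = min(nearest_threshold, 1.0 - nearest_threshold)
margins = linspace(0.0, margin_cap, 50)
nearest_margin = min(margins, key=lambda m: abs(m - 0.0820))

print("%.4f" % nearest_threshold)  # 0.7128
print("%.4f" % nearest_margin)     # 0.0820
```

Under these assumptions the reported optimum lands exactly on grid points, which is what one would expect from an exhaustive search rather than a continuous optimizer.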


#### Show predictions from Test Set with ROC

In [23]:
# Metrics for the transformed test set
dataset_transf_test = ROC.predict(dataset_nodebiasing_test)

display(Markdown("#### Test set"))
display(Markdown("##### Transformed predictions - With fairness constraints"))
classified_metric_test = ClassificationMetric(dataset_orig_test, 
                                                 dataset_transf_test,
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)

print("Classification accuracy = %f" % classified_metric_test.accuracy()) 

metric_test_aft = compute_metrics(dataset_orig_test, dataset_transf_test, 
                unprivileged_groups, privileged_groups)

#### Test set

##### Transformed predictions - With fairness constraints

Classification accuracy = 0.493333
Balanced accuracy = 0.5617
Statistical parity difference = 0.0857
Disparate impact = 1.2727
Average odds difference = 0.0704
Equal opportunity difference = 0.1242
Theil index = 0.5954
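
Reject Option Classification, as described by Kamiran, Karim, and Zhang, changes only the predictions whose scores fall in a critical region around the classification threshold: inside that band, unprivileged individuals receive the favorable label and privileged individuals the unfavorable one. Outside the band, the usual threshold rule applies. A minimal sketch using the threshold and margin fitted above, with made-up scores; `reject_option_predict` is a hypothetical function, not the AIF360 method:

```python
def reject_option_predict(scores, groups, threshold, margin, unprivileged=0):
    """Label 1 (favorable) or 0 (unfavorable) with a reject-option band."""
    predictions = []
    for score, group in zip(scores, groups):
        if abs(score - threshold) <= margin:
            # Critical region: decide by group membership, not by score.
            predictions.append(1 if group == unprivileged else 0)
        else:
            predictions.append(1 if score > threshold else 0)
    return predictions


# Made-up scores; group 1 is privileged, group 0 is unprivileged.
scores = [0.90, 0.75, 0.75, 0.68, 0.68, 0.50]
groups = [1, 1, 0, 1, 0, 0]

predictions = reject_option_predict(scores, groups, threshold=0.7128, margin=0.0820)
print(predictions)  # [1, 0, 1, 0, 1, 0]
```

Because the threshold and margin are chosen on the validation set, the fairness bounds are not guaranteed to hold on the test set; in the run above the test-set statistical parity difference (0.0857) falls outside the requested range of -0.05 to 0.05.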


References:

F. Kamiran and T. Calders, "Data preprocessing techniques for classification without discrimination",
Knowledge and Information Systems, 33(1):1–33, 2012.

B. H. Zhang, B. Lemoine, and M. Mitchell, "Mitigating Unwanted Biases with Adversarial Learning",
AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society, 2018.

F. Kamiran, A. Karim, and X. Zhang, "Decision theory for discrimination-aware classification",
In IEEE International Conference on Data Mining, pp. 924–929, 2012.