#### This notebook demonstrates the Reweighing pre-processing, Adversarial Debiasing in-processing, and Reject Option Classification (ROC) post-processing algorithms for bias mitigation.
- For the ROC post-processing step, the debiasing function is implemented in the `RejectOptionClassification` class. The remaining items outline that procedure.
- Divide the dataset into training, validation, and testing partitions.
- Train classifier on original training data.
- Estimate the optimal classification threshold that maximizes balanced accuracy without fairness constraints.
- Using a validation set, estimate the optimal classification threshold and the critical region boundary (ROC margin) for the desired fairness constraint. The best parameters are those that maximize balanced accuracy while satisfying the fairness constraints.
- The constraints can be used on the following fairness measures:
    * Statistical parity difference on the predictions of the classifier
    * Average odds difference for the classifier
    * Equal opportunity difference for the classifier
- Determine the prediction scores for testing data. Using the estimated optimal classification threshold, compute accuracy and fairness metrics.
- Using the determined optimal classification threshold and the ROC margin, adjust the predictions. Report accuracy and fairness metrics on the new predictions.
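
The adjustment step in the list above can be illustrated with a minimal pure-Python sketch of the reject option rule. It assumes the critical region is the set of scores within `margin` of `threshold`; the function name `reject_option_predict` and the toy data are hypothetical, and this is not the `RejectOptionClassification` implementation itself.

```python
def reject_option_predict(scores, privileged, threshold, margin,
                          favorable=1.0, unfavorable=0.0):
    """Label each score; inside the critical region, decide by group."""
    labels = []
    for score, is_priv in zip(scores, privileged):
        if abs(score - threshold) <= margin:
            # Uncertain prediction: unprivileged instances get the favorable
            # label, privileged instances get the unfavorable label.
            labels.append(unfavorable if is_priv else favorable)
        else:
            # Confident prediction: plain thresholding.
            labels.append(favorable if score > threshold else unfavorable)
    return labels


# Toy scores (hypothetical); True marks a privileged instance.
scores = [0.05, 0.15, 0.20, 0.20, 0.60]
privileged = [False, True, True, False, True]
labels = reject_option_predict(scores, privileged, threshold=0.17, margin=0.08)
print(labels)  # [0.0, 0.0, 0.0, 1.0, 1.0]
```

The two instances with a score of 0.20 fall inside the critical region and receive different labels depending on group membership, while the scores 0.05 and 0.60 are labeled by the threshold alone.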

In [1]:
# Load all necessary packages
import sys
sys.path.append("../")
import numpy as np
import tensorflow as tf
#from tqdm import tqdm
from warnings import warn

from aif360.datasets import BinaryLabelDataset
from aif360.datasets import AdultDataset, GermanDataset, CompasDataset
from aif360.metrics import ClassificationMetric, BinaryLabelDatasetMetric
from aif360.metrics.utils import compute_boolean_conditioning_vector
from aif360.algorithms.preprocessing.optim_preproc_helpers.data_preproc_functions\
        import load_preproc_data_adult, load_preproc_data_german, load_preproc_data_compas

from aif360.algorithms.preprocessing.reweighing import Reweighing
from aif360.algorithms.inprocessing.adversarial_debiasing import AdversarialDebiasing
from aif360.algorithms.postprocessing.reject_option_classification\
        import RejectOptionClassification

from common_utils import compute_metrics

from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

from IPython.display import Markdown, display
from ipywidgets import interactive, FloatSlider

#### Load dataset and specify options

In [2]:
## import dataset
dataset_used = "adult" # "adult", "german", "compas"
protected_attribute_used = 1 # 1, 2

if dataset_used == "adult":
#     dataset_orig = AdultDataset()
    if protected_attribute_used == 1:
        privileged_groups = [{'sex': 1}]
        unprivileged_groups = [{'sex': 0}]
        dataset_orig = load_preproc_data_adult(['sex'])
    else:
        privileged_groups = [{'race': 1}]
        unprivileged_groups = [{'race': 0}]
        dataset_orig = load_preproc_data_adult(['race'])
    
elif dataset_used == "german":
#     dataset_orig = GermanDataset()
    if protected_attribute_used == 1:
        privileged_groups = [{'sex': 1}]
        unprivileged_groups = [{'sex': 0}]
        dataset_orig = load_preproc_data_german(['sex'])
    else:
        privileged_groups = [{'age': 1}]
        unprivileged_groups = [{'age': 0}]
        dataset_orig = load_preproc_data_german(['age'])
    
elif dataset_used == "compas":
#     dataset_orig = CompasDataset()
    if protected_attribute_used == 1:
        privileged_groups = [{'sex': 0}]
        unprivileged_groups = [{'sex': 1}]
        dataset_orig = load_preproc_data_compas(['sex'])
    else:
        privileged_groups = [{'race': 1}]
        unprivileged_groups = [{'race': 0}]  
        dataset_orig = load_preproc_data_compas(['race'])

        
# Metric used (should be one of allowed_metrics)
metric_name = "Statistical parity difference"

# Upper and lower bound on the fairness metric used
metric_ub = 0.1
metric_lb = -0.1
        
# Random seed for reproducibility
np.random.seed(1)

# Verify metric name
allowed_metrics = ["Statistical parity difference",
                   "Average odds difference",
                   "Equal opportunity difference"]
if metric_name not in allowed_metrics:
    raise ValueError("Metric name should be one of allowed metrics")
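
As a rough illustration of what the three allowed metrics measure, the sketch below computes them from plain Python lists, each as the unprivileged value minus the privileged value. The helper names and toy data are hypothetical; the notebook itself obtains these values from `ClassificationMetric`.

```python
def rate(values):
    """Mean of a list of 0/1 values (0.0 for an empty list)."""
    return sum(values) / len(values) if values else 0.0


def group_rates(y_true, y_pred, group_mask):
    """Selection rate, TPR and FPR for the rows where group_mask is True."""
    pairs = [(t, p) for t, p, g in zip(y_true, y_pred, group_mask) if g]
    selection = rate([p for _, p in pairs])
    tpr = rate([p for t, p in pairs if t == 1])
    fpr = rate([p for t, p in pairs if t == 0])
    return selection, tpr, fpr


def fairness_differences(y_true, y_pred, privileged):
    """Unprivileged-minus-privileged differences for the three metrics."""
    unprivileged = [not g for g in privileged]
    sel_u, tpr_u, fpr_u = group_rates(y_true, y_pred, unprivileged)
    sel_p, tpr_p, fpr_p = group_rates(y_true, y_pred, privileged)
    return {
        "Statistical parity difference": sel_u - sel_p,
        "Equal opportunity difference": tpr_u - tpr_p,
        "Average odds difference": 0.5 * ((fpr_u - fpr_p) + (tpr_u - tpr_p)),
    }


# Toy data (hypothetical): first four rows privileged, last four unprivileged.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
privileged = [True] * 4 + [False] * 4

diffs = fairness_differences(y_true, y_pred, privileged)
# Same bounds as metric_lb and metric_ub in the cell above.
within_bounds = {name: -0.1 <= value <= 0.1 for name, value in diffs.items()}
print(diffs)
print(within_bounds)
```

On this toy data all three differences equal -0.5, so none of them falls inside the [-0.1, 0.1] band.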

#### Split into train, test and validation

In [3]:
# Get the dataset and split into train (70%), validation (15%), and test (15%)
dataset_orig_train, dataset_orig_vt = dataset_orig.split([0.7], shuffle=True)
dataset_orig_valid, dataset_orig_test = dataset_orig_vt.split([0.5], shuffle=True)

#### Display properties of the training data

In [4]:
# print out some labels, names, etc.
display(Markdown("#### Training Dataset shape"))
print(dataset_orig_train.features.shape)
display(Markdown("#### Favorable and unfavorable labels"))
print(dataset_orig_train.favorable_label, dataset_orig_train.unfavorable_label)
display(Markdown("#### Protected attribute names"))
print(dataset_orig_train.protected_attribute_names)
display(Markdown("#### Privileged and unprivileged protected attribute values"))
print(dataset_orig_train.privileged_protected_attributes, 
      dataset_orig_train.unprivileged_protected_attributes)
display(Markdown("#### Dataset feature names"))
print(dataset_orig_train.feature_names)

#### Training Dataset shape

(34189, 18)


#### Favorable and unfavorable labels

1.0 0.0


#### Protected attribute names

['sex']


#### Privileged and unprivileged protected attribute values

[array([1.])] [array([0.])]


#### Dataset feature names

['race', 'sex', 'Age (decade)=10', 'Age (decade)=20', 'Age (decade)=30', 'Age (decade)=40', 'Age (decade)=50', 'Age (decade)=60', 'Age (decade)=>=70', 'Education Years=6', 'Education Years=7', 'Education Years=8', 'Education Years=9', 'Education Years=10', 'Education Years=11', 'Education Years=12', 'Education Years=<6', 'Education Years=>12']


## Pre-processing: Reweighing

#### Metric for original training data

In [5]:
metric_orig_train = BinaryLabelDatasetMetric(dataset_orig_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)
display(Markdown("#### Original training dataset"))
print("Weights:")
print(dataset_orig_train.instance_weights)
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_train.mean_difference())

#### Original training dataset

Weights:
[1. 1. 1. ... 1. 1. 1.]
Difference in mean outcomes between unprivileged and privileged groups = -0.190698


#### Reweighing

In [6]:
RW = Reweighing(unprivileged_groups=unprivileged_groups,
               privileged_groups=privileged_groups)
RW.fit(dataset_orig_train)
dataset_transf_train = RW.transform(dataset_orig_train)

#### Metric for transformed training data

In [7]:
metric_transf_train = BinaryLabelDatasetMetric(dataset_transf_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)
display(Markdown("#### Transformed training dataset"))
print("Weights:")
print(dataset_transf_train.instance_weights)
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_transf_train.mean_difference())

#### Transformed training dataset

Weights:
[1.09009788 1.09009788 0.85643005 ... 0.85643005 0.85643005 2.1573167 ]
Difference in mean outcomes between unprivileged and privileged groups = -0.000000
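
The zero mean difference after reweighing follows from how the weights are built. The sketch below assumes the weighting scheme of Kamiran and Calders, where an instance in group g with label y receives weight (N_g * N_y) / (N * N_gy). The function names and toy data are hypothetical, and this is not the `Reweighing` class itself.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Weight each instance by (N_g * N_y) / (N * N_gy)."""
    n = len(labels)
    n_group = Counter(groups)
    n_label = Counter(labels)
    n_joint = Counter(zip(groups, labels))
    return [n_group[g] * n_label[y] / (n * n_joint[(g, y)])
            for g, y in zip(groups, labels)]


def weighted_mean_difference(groups, labels, weights):
    """Weighted favorable rate of group 0 minus that of group 1."""
    def weighted_rate(group):
        num = sum(w * y for g, y, w in zip(groups, labels, weights) if g == group)
        den = sum(w for g, w in zip(groups, weights) if g == group)
        return num / den
    return weighted_rate(0) - weighted_rate(1)


# Toy data (hypothetical): 1 = privileged, 0 = unprivileged.
groups = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
labels = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]

uniform = [1.0] * len(labels)
weights = reweighing_weights(groups, labels)

before = weighted_mean_difference(groups, labels, uniform)
after = weighted_mean_difference(groups, labels, weights)
print(before, after)
```

With uniform weights the unprivileged group has a favorable rate of 0.25 against roughly 0.667 for the privileged group. After reweighing, both weighted rates equal 0.5, so the difference vanishes up to floating-point error.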


## In-processing: Adversarial Debiasing

### Without debiasing

#### Train without debiasing

In [8]:
# Learn parameters with debias set to False
sess = tf.Session() 
plain_model_nodebias = AdversarialDebiasing(privileged_groups = privileged_groups,
                          unprivileged_groups = unprivileged_groups,
                          scope_name='plain_classifier',
                          debias=False,
                           sess=sess)

In [9]:
plain_model_nodebias.fit(dataset_orig_train)




epoch 0; iter: 0; batch classifier loss: 0.749527
epoch 0; iter: 200; batch classifier loss: 0.395791
epoch 1; iter: 0; batch classifier loss: 0.451405
epoch 1; iter: 200; batch classifier loss: 0.489790
epoch 2; iter: 0; batch classifier loss: 0.466488
epoch 2; iter: 200; batch classifier loss: 0.440943
epoch 3; iter: 0; batch classifier loss: 0.398345
epoch 3; iter: 200; batch classifier loss: 0.459410
epoch 4; ite

<aif360.algorithms.inprocessing.adversarial_debiasing.AdversarialDebiasing at 0x1dd0cccc0c8>

#### Show metrics

In [10]:
# Apply the plain model to test data
dataset_nodebiasing_train = plain_model_nodebias.predict(dataset_orig_train)
dataset_nodebiasing_test = plain_model_nodebias.predict(dataset_orig_test)

In [11]:
# Metrics for the dataset from plain model (without debiasing)
display(Markdown("#### Plain model - without debiasing - dataset metrics"))
metric_dataset_nodebiasing_train = BinaryLabelDatasetMetric(dataset_nodebiasing_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_nodebiasing_train.mean_difference())

metric_dataset_nodebiasing_test = BinaryLabelDatasetMetric(dataset_nodebiasing_test, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_nodebiasing_test.mean_difference())

display(Markdown("#### Plain model - without debiasing - classification metrics"))
classified_metric_nodebiasing_test = ClassificationMetric(dataset_orig_test, 
                                                 dataset_nodebiasing_test,
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)
print("Test set: Classification accuracy = %f" % classified_metric_nodebiasing_test.accuracy())
TPR = classified_metric_nodebiasing_test.true_positive_rate()
TNR = classified_metric_nodebiasing_test.true_negative_rate()
bal_acc_nodebiasing_test = 0.5*(TPR+TNR)
print("Test set: Balanced classification accuracy = %f" % bal_acc_nodebiasing_test)
print("Test set: Disparate impact = %f" % classified_metric_nodebiasing_test.disparate_impact())
print("Test set: Equal opportunity difference = %f" % classified_metric_nodebiasing_test.equal_opportunity_difference())
print("Test set: Average odds difference = %f" % classified_metric_nodebiasing_test.average_odds_difference())
print("Test set: Theil_index = %f" % classified_metric_nodebiasing_test.theil_index())

#### Plain model - without debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.207083
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.206143


#### Plain model - without debiasing - classification metrics

Test set: Classification accuracy = 0.804422
Test set: Balanced classification accuracy = 0.656253
Test set: Disparate impact = 0.000000
Test set: Equal opportunity difference = -0.443989
Test set: Average odds difference = -0.273663
Test set: Theil_index = 0.178989
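
To help read these numbers, the sketch below shows how balanced accuracy and disparate impact can be computed from label lists, assuming the usual definitions: balanced accuracy is the mean of the true positive and true negative rates, and disparate impact is the ratio of favorable prediction rates (unprivileged over privileged). The helper names and toy data are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the true positive rate and the true negative rate."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    tpr = sum(pos) / len(pos)
    tnr = (len(neg) - sum(neg)) / len(neg)
    return 0.5 * (tpr + tnr)


def disparate_impact(y_pred, privileged):
    """Favorable prediction rate of the unprivileged over the privileged group."""
    priv = [p for p, g in zip(y_pred, privileged) if g]
    unpriv = [p for p, g in zip(y_pred, privileged) if not g]
    return (sum(unpriv) / len(unpriv)) / (sum(priv) / len(priv))


# Toy data (hypothetical): imbalanced labels, first five rows privileged.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
privileged = [True] * 5 + [False] * 5

acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
di = disparate_impact(y_pred, privileged)
print(acc, bal_acc, di)
```

On this toy data accuracy is 0.8 while balanced accuracy is only about 0.667, because most positives are missed. Disparate impact is 0.0 because no unprivileged instance receives the favorable prediction, which is the situation a value of exactly zero indicates.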


### With debiasing

#### Train with debiasing

In [12]:
sess.close()
tf.reset_default_graph()
sess = tf.Session()

In [13]:
# Learn parameters with debias set to True (reuses the session created in the previous cell)
plain_model_debias = AdversarialDebiasing(privileged_groups = privileged_groups,
                          unprivileged_groups = unprivileged_groups,
                          scope_name='plain_classifier',
                          debias=True,
                           sess=sess)

In [14]:
plain_model_debias.fit(dataset_orig_train)

epoch 0; iter: 0; batch classifier loss: 0.684884; batch adversarial loss: 0.621481
epoch 0; iter: 200; batch classifier loss: 0.444297; batch adversarial loss: 0.652825
epoch 1; iter: 0; batch classifier loss: 0.517512; batch adversarial loss: 0.676000
epoch 1; iter: 200; batch classifier loss: 0.549574; batch adversarial loss: 0.667661
epoch 2; iter: 0; batch classifier loss: 0.465986; batch adversarial loss: 0.653992
epoch 2; iter: 200; batch classifier loss: 0.441834; batch adversarial loss: 0.620508
epoch 3; iter: 0; batch classifier loss: 0.493553; batch adversarial loss: 0.629533
epoch 3; iter: 200; batch classifier loss: 0.419524; batch adversarial loss: 0.600496
epoch 4; iter: 0; batch classifier loss: 0.460217; batch adversarial loss: 0.676073
epoch 4; iter: 200; batch classifier loss: 0.384316; batch adversarial loss: 0.593867
epoch 5; iter: 0; batch classifier loss: 0.410312; batch adversarial loss: 0.624691
epoch 5; iter: 200; batch classifier loss: 0.390113; batch adversa

epoch 48; iter: 0; batch classifier loss: 0.423809; batch adversarial loss: 0.590944
epoch 48; iter: 200; batch classifier loss: 0.443803; batch adversarial loss: 0.558847
epoch 49; iter: 0; batch classifier loss: 0.574912; batch adversarial loss: 0.652849
epoch 49; iter: 200; batch classifier loss: 0.347987; batch adversarial loss: 0.587090


<aif360.algorithms.inprocessing.adversarial_debiasing.AdversarialDebiasing at 0x1dd10ec9588>

#### Show metrics

In [15]:
# Apply the debiased model to the train and test data
dataset_debiasing_train = plain_model_debias.predict(dataset_orig_train)
dataset_debiasing_test = plain_model_debias.predict(dataset_orig_test)

In [16]:
# Metrics for the dataset from the model trained with debiasing
display(Markdown("#### Model - with debiasing - dataset metrics"))
metric_dataset_debiasing_train = BinaryLabelDatasetMetric(dataset_debiasing_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_debiasing_train.mean_difference())

metric_dataset_debiasing_test = BinaryLabelDatasetMetric(dataset_debiasing_test, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_debiasing_test.mean_difference())

display(Markdown("#### Model - with debiasing - classification metrics"))
classified_metric_debiasing_test = ClassificationMetric(dataset_orig_test, 
                                                 dataset_debiasing_test,
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)
print("Test set: Classification accuracy = %f" % classified_metric_debiasing_test.accuracy())
#metric_test_bef = compute_metrics(dataset_orig_test, dataset_nodebiasing_test, 
#                unprivileged_groups, privileged_groups)
TPR = classified_metric_debiasing_test.true_positive_rate()
TNR = classified_metric_debiasing_test.true_negative_rate()
bal_acc_debiasing_test = 0.5*(TPR+TNR)
print("Test set: Balanced classification accuracy = %f" % bal_acc_debiasing_test)
print("Test set: Disparate impact = %f" % classified_metric_debiasing_test.disparate_impact())
print("Test set: Equal opportunity difference = %f" % classified_metric_debiasing_test.equal_opportunity_difference())
print("Test set: Average odds difference = %f" % classified_metric_debiasing_test.average_odds_difference())
print("Test set: Theil_index = %f" % classified_metric_debiasing_test.theil_index())

#### Model - with debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.087973
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.093916


#### Model - with debiasing - classification metrics

Test set: Classification accuracy = 0.792139
Test set: Balanced classification accuracy = 0.668719
Test set: Disparate impact = 0.549817
Test set: Equal opportunity difference = -0.100694
Test set: Average odds difference = -0.059413
Test set: Theil_index = 0.170749


## Post-processing: Reject Option Classification

#### Train logistic model

In [19]:
# Logistic regression classifier and predictions
scale_orig = StandardScaler()
X_train = scale_orig.fit_transform(dataset_orig_train.features)
y_train = dataset_orig_train.labels.ravel()

lmod = LogisticRegression()
lmod.fit(X_train, y_train)

# positive class index
pos_ind = np.where(lmod.classes_ == dataset_orig_train.favorable_label)[0][0]



In [20]:
dataset_orig_valid_pred = dataset_orig_valid.copy(deepcopy=True)
X_valid = scale_orig.transform(dataset_orig_valid_pred.features)
y_valid = dataset_orig_valid_pred.labels
dataset_orig_valid_pred.scores = lmod.predict_proba(X_valid)[:,pos_ind].reshape(-1,1)

fav_inds = dataset_orig_valid_pred.scores > 0.5
dataset_orig_valid_pred.labels[fav_inds] = dataset_orig_valid_pred.favorable_label
dataset_orig_valid_pred.labels[~fav_inds] = dataset_orig_valid_pred.unfavorable_label

dataset_orig_test_pred = dataset_orig_test.copy(deepcopy=True)
X_test = scale_orig.transform(dataset_orig_test_pred.features)
y_test = dataset_orig_test_pred.labels
dataset_orig_test_pred.scores = lmod.predict_proba(X_test)[:,pos_ind].reshape(-1,1)

fav_inds = dataset_orig_test_pred.scores > 0.5
dataset_orig_test_pred.labels[fav_inds] = dataset_orig_test_pred.favorable_label
dataset_orig_test_pred.labels[~fav_inds] = dataset_orig_test_pred.unfavorable_label

#### Show metrics for Test Set

In [27]:
# Metrics for the test set
display(Markdown("#### Test set"))
display(Markdown("##### Raw predictions - No fairness constraints"))
classified_metric_test = ClassificationMetric(dataset_orig_test, 
                                                 dataset_orig_test_pred,
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)

print("Classification accuracy = %f" % classified_metric_test.accuracy())

metric_test_bef = compute_metrics(dataset_orig_test, dataset_orig_test_pred, 
                unprivileged_groups, privileged_groups)

#### Test set

##### Raw predictions - No fairness constraints

Classification accuracy = 0.806742
Balanced accuracy = 0.6626
Statistical parity difference = -0.2125
Disparate impact = 0.0000
Average odds difference = -0.2829
Equal opportunity difference = -0.4604
Theil index = 0.1754


#### Estimate optimal parameters for the ROC method

In [22]:
ROC = RejectOptionClassification(unprivileged_groups=unprivileged_groups, 
                                 privileged_groups=privileged_groups, 
                                 low_class_thresh=0.01, high_class_thresh=0.99,
                                  num_class_thresh=100, num_ROC_margin=50,
                                  metric_name=metric_name,
                                  metric_ub=metric_ub, metric_lb=metric_lb)
ROC = ROC.fit(dataset_orig_valid,dataset_orig_valid_pred)

In [23]:
print("Optimal classification threshold (with fairness constraints) = %.4f" % ROC.classification_threshold)
print("Optimal ROC margin = %.4f" % ROC.ROC_margin)

Optimal classification threshold (with fairness constraints) = 0.1684
Optimal ROC margin = 0.0790
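
The parameter search behind these two values can be pictured as a grid search over thresholds and margins. The sketch below keeps the candidate with the highest balanced accuracy among those whose statistical parity difference lies within the bounds. It is a simplified stand-in under stated assumptions, not the library's `fit` method: the function names, grid sizes, toy data, and the cap of the margin at `min(threshold, 1 - threshold)` are all assumptions made for illustration.

```python
def roc_labels(scores, privileged, threshold, margin):
    """Reject option rule: decide by group inside the critical region."""
    out = []
    for s, is_priv in zip(scores, privileged):
        if abs(s - threshold) <= margin:
            out.append(0 if is_priv else 1)
        else:
            out.append(1 if s > threshold else 0)
    return out


def balanced_accuracy(y_true, y_pred):
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    return 0.5 * (sum(pos) / len(pos) + 1 - sum(neg) / len(neg))


def statistical_parity_difference(y_pred, privileged):
    priv = [p for p, g in zip(y_pred, privileged) if g]
    unpriv = [p for p, g in zip(y_pred, privileged) if not g]
    return sum(unpriv) / len(unpriv) - sum(priv) / len(priv)


def linspace(lo, hi, num):
    step = (hi - lo) / (num - 1)
    return [lo + i * step for i in range(num)]


def fit_roc(y_true, scores, privileged, lb=-0.1, ub=0.1,
            num_thresh=25, num_margin=10):
    """Return (balanced accuracy, threshold, margin, spd) or None."""
    best = None
    for threshold in linspace(0.01, 0.99, num_thresh):
        # Assumption: the critical region must stay inside [0, 1].
        max_margin = min(threshold, 1.0 - threshold)
        for margin in linspace(0.0, max_margin, num_margin):
            y_pred = roc_labels(scores, privileged, threshold, margin)
            spd = statistical_parity_difference(y_pred, privileged)
            if not lb <= spd <= ub:
                continue  # fairness constraint violated
            bal_acc = balanced_accuracy(y_true, y_pred)
            if best is None or bal_acc > best[0]:
                best = (bal_acc, threshold, margin, spd)
    return best


# Toy validation data (hypothetical): first six rows privileged.
y_true = [1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.7, 0.3, 0.2, 0.6, 0.4, 0.45, 0.3, 0.2, 0.1]
privileged = [True] * 6 + [False] * 6

best = fit_roc(y_true, scores, privileged)
print(best)
```

This sketch returns `None` when no candidate satisfies the constraint; a production implementation would need an explicit fallback for that case.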


#### Show predictions from Test Set with ROC

In [26]:
# Metrics for the transformed test set
dataset_transf_test = ROC.predict(dataset_orig_test_pred)

display(Markdown("#### Test set"))
display(Markdown("##### Transformed predictions - With fairness constraints"))
classified_metric_test = ClassificationMetric(dataset_orig_test, 
                                                 dataset_transf_test,
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)

print("Classification accuracy = %f" % classified_metric_test.accuracy()) 

metric_test_aft = compute_metrics(dataset_orig_test, dataset_transf_test, 
                unprivileged_groups, privileged_groups)

#### Test set

##### Transformed predictions - With fairness constraints

Classification accuracy = 0.676129
Balanced accuracy = 0.7180
Statistical parity difference = -0.0905
Disparate impact = 0.8173
Average odds difference = -0.0267
Equal opportunity difference = -0.0551
Theil index = 0.1061


References:

F. Kamiran and T. Calders, "Data preprocessing techniques for classification without discrimination",
Knowledge and Information Systems, 33(1):1–33, 2012.

B. H. Zhang, B. Lemoine, and M. Mitchell, "Mitigating Unwanted Biases with Adversarial Learning",
AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society, 2018.

F. Kamiran, A. Karim, and X. Zhang, "Decision theory for discrimination-aware classification",
IEEE International Conference on Data Mining, pp. 924–929, 2012.