<a href="https://colab.research.google.com/github/computas/responsible-ai-ws/blob/master/fairness/colab_workshop_adversarial_debiasing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Adversarial debiasing algorithm to learn a fair classifier

This is the second notebook in our [tutorial on fair ML pipelines](https://github.com/computas/responsible-ai-ws/blob/master/fairness/workshop.md). It is an adaptation of [demo_adversarial_debiasing.ipynb](https://github.com/IBM/AIF360/blob/master/examples/demo_adversarial_debiasing.ipynb) from [AIF360's repo](https://github.com/IBM/AIF360).

We have made some modifications so that the notebook runs in Google Colab. You can also run the original version locally by creating a virtual environment as described in [AIF360's installation instructions](https://github.com/IBM/AIF360#optional-create-a-virtual-environment).


## Colab instructions



In [1]:
# Verify that the environment has python 3
!python --version

Python 3.6.9


In [2]:
# This notebook runs in Tensorflow 1.x. We need this magic command to tell Colab about that.
%tensorflow_version 1.x

TensorFlow 1.x selected.


In [0]:
!pip install -q -U \
  aif360==0.2.3 \
  tqdm==4.46.0 \
  numpy==1.18.4 \
  matplotlib==3.2.1 \
  pandas==1.0.3 \
  scipy==1.4.1 \
  "cvxpy>=1.0" \
  scs==2.1.2 \
  "numba>=0.42.0" \
  tensorflow==1.15 \
  networkx  \
  imgaug \
  BlackBoxAuditing \
  adversarial-robustness-toolbox

### Patch for pandas

The `standard_dataset.py` shipped with AIF360 0.2.3 does not work with Pandas 1.x as is: a Pandas Series has to be converted to a NumPy array before it is passed to `np.equal.outer`. The patch cell below overwrites the installed file with a version that does this conversion with `.to_numpy()`. After applying the patch, restart the runtime (menu `Runtime > Restart runtime`, or the shortcut `Ctrl + M .`) and then continue running the cells below.
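The patched file builds its privileged and favorable masks with the membership pattern sketched below. The column and its values are made up for illustration: `np.equal.outer` compares every allowed value against every row (after the Series has been converted with `.to_numpy()`), and `np.logical_or.reduce` marks a row as matching if it equals any of the allowed values.

```python
import numpy as np
import pandas as pd

# Hypothetical column and privileged values, for illustration only
column = pd.Series(["Male", "Female", "Male", "Female"])
privileged_values = ["Male"]

# Compare every privileged value against every row; shape is (n_values, n_rows).
# The Series is converted to a NumPy array first, as in the patched file.
matches = np.equal.outer(privileged_values, column.to_numpy())

# A row is privileged if it matches any of the privileged values
is_privileged = np.logical_or.reduce(matches)
print(is_privileged)
```

For a single column, `column.isin(privileged_values)` yields the same mask.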

### Notes
- The pip command above was built from AIF360's [requirements.txt](https://github.com/josephineHonore/AIF360/blob/master/requirements.txt). The pinned versions are currently updated by hand.
- The original notebook uses `display(Markdown(...))` to show formatted text, which is currently [unsupported](https://github.com/googlecolab/colabtools/issues/322) in Colab. We print bold text with the `printb` helper instead and keep the original calls as comments.
- The TensorFlow dependency is not needed by the other notebooks in the tutorial.
- We set TensorFlow's logging level to `ERROR` right after importing the library, to limit the amount of log output.
- We have added code that fixes the random seeds, for reproducibility.

In [4]:
#@title Run this cell to patch `standard_dataset.py`.
%%writefile /usr/local/lib/python3.6/dist-packages/aif360/datasets/standard_dataset.py
from logging import warning

import numpy as np
import pandas as pd

from aif360.datasets import BinaryLabelDataset


class StandardDataset(BinaryLabelDataset):
    """Base class for every :obj:`BinaryLabelDataset` provided out of the box by
    aif360.

    It is not strictly necessary to inherit this class when adding custom
    datasets but it may be useful.

    This class is very loosely based on code from
    https://github.com/algofairness/fairness-comparison.
    """

    def __init__(self, df, label_name, favorable_classes,
                 protected_attribute_names, privileged_classes,
                 instance_weights_name='', scores_name='',
                 categorical_features=[], features_to_keep=[],
                 features_to_drop=[], na_values=[], custom_preprocessing=None,
                 metadata=None):
        """
        Subclasses of StandardDataset should perform the following before
        calling `super().__init__`:

            1. Load the dataframe from a raw file.

        Then, this class will go through a standard preprocessing routine which:

            2. (optional) Performs some dataset-specific preprocessing (e.g.
               renaming columns/values, handling missing data).

            3. Drops unrequested columns (see `features_to_keep` and
               `features_to_drop` for details).

            4. Drops rows with NA values.

            5. Creates a one-hot encoding of the categorical variables.

            6. Maps protected attributes to binary privileged/unprivileged
               values (1/0).

            7. Maps labels to binary favorable/unfavorable labels (1/0).

        Args:
            df (pandas.DataFrame): DataFrame on which to perform standard
                processing.
            label_name: Name of the label column in `df`.
            favorable_classes (list or function): Label values which are
                considered favorable or a boolean function which returns `True`
                if favorable. All others are unfavorable. Label values are
                mapped to 1 (favorable) and 0 (unfavorable) if they are not
                already binary and numerical.
            protected_attribute_names (list): List of names corresponding to
                protected attribute columns in `df`.
            privileged_classes (list(list or function)): Each element is
                a list of values which are considered privileged or a boolean
                function which return `True` if privileged for the corresponding
                column in `protected_attribute_names`. All others are
                unprivileged. Values are mapped to 1 (privileged) and 0
                (unprivileged) if they are not already numerical.
            instance_weights_name (optional): Name of the instance weights
                column in `df`.
            categorical_features (optional, list): List of column names in the
                DataFrame which are to be expanded into one-hot vectors.
            features_to_keep (optional, list): Column names to keep. All others
                are dropped except those present in `protected_attribute_names`,
                `categorical_features`, `label_name` or `instance_weights_name`.
                Defaults to all columns if not provided.
            features_to_drop (optional, list): Column names to drop. *Note: this
                overrides* `features_to_keep`.
            na_values (optional): Additional strings to recognize as NA. See
                :func:`pandas.read_csv` for details.
            custom_preprocessing (function): A function object which
                acts on and returns a DataFrame (f: DataFrame -> DataFrame). If
                `None`, no extra preprocessing is applied.
            metadata (optional): Additional metadata to append.
        """
        # 2. Perform dataset-specific preprocessing
        if custom_preprocessing:
            df = custom_preprocessing(df)

        # 3. Drop unrequested columns
        features_to_keep = features_to_keep or df.columns.tolist()
        keep = (set(features_to_keep) | set(protected_attribute_names)
              | set(categorical_features) | set([label_name]))
        if instance_weights_name:
            keep |= set([instance_weights_name])
        df = df[sorted(keep - set(features_to_drop), key=df.columns.get_loc)]
        categorical_features = sorted(set(categorical_features) - set(features_to_drop), key=df.columns.get_loc)

        # 4. Remove any rows that have missing data.
        dropped = df.dropna()
        count = df.shape[0] - dropped.shape[0]
        if count > 0:
            warning("Missing Data: {} rows removed from {}.".format(count,
                    type(self).__name__))
        df = dropped

        # 5. Create a one-hot encoding of the categorical variables.
        df = pd.get_dummies(df, columns=categorical_features, prefix_sep='=')

        # 6. Map protected attributes to privileged/unprivileged
        privileged_protected_attributes = []
        unprivileged_protected_attributes = []
        for attr, vals in zip(protected_attribute_names, privileged_classes):
            privileged_values = [1.]
            unprivileged_values = [0.]
            if callable(vals):
                df[attr] = df[attr].apply(vals)
            elif np.issubdtype(df[attr].dtype, np.number):
                # this attribute is numeric; no remapping needed
                privileged_values = vals
                unprivileged_values = list(set(df[attr]).difference(vals))
            else:
                # find all instances which match any of the attribute values
                priv = np.logical_or.reduce(np.equal.outer(vals, df[attr].to_numpy()))
                df.loc[priv, attr] = privileged_values[0]
                df.loc[~priv, attr] = unprivileged_values[0]

            privileged_protected_attributes.append(
                np.array(privileged_values, dtype=np.float64))
            unprivileged_protected_attributes.append(
                np.array(unprivileged_values, dtype=np.float64))

        # 7. Make labels binary
        favorable_label = 1.
        unfavorable_label = 0.
        if callable(favorable_classes):
            df[label_name] = df[label_name].apply(favorable_classes)
        elif np.issubdtype(df[label_name].dtype, np.number) and len(set(df[label_name])) == 2:
            # labels are already binary; don't change them
            favorable_label = favorable_classes[0]
            unfavorable_label = set(df[label_name]).difference(favorable_classes).pop()
        else:
            # find all instances which match any of the favorable classes
            pos = np.logical_or.reduce(np.equal.outer(favorable_classes,
                                                      df[label_name].to_numpy()))
            df.loc[pos, label_name] = favorable_label
            df.loc[~pos, label_name] = unfavorable_label

        super(StandardDataset, self).__init__(df=df, label_names=[label_name],
            protected_attribute_names=protected_attribute_names,
            privileged_protected_attributes=privileged_protected_attributes,
            unprivileged_protected_attributes=unprivileged_protected_attributes,
            instance_weights_name=instance_weights_name,
            scores_names=[scores_name] if scores_name else [],
            favorable_label=favorable_label,
            unfavorable_label=unfavorable_label, metadata=metadata)

Overwriting /usr/local/lib/python3.6/dist-packages/aif360/datasets/standard_dataset.py


In [0]:
def printb(text):
  """Auxiliary function to print text in bold.
    Works around a Colab bug that keeps display(Markdown('text')) from rendering.
  """
  print('\x1b[1;30m'+text+'\x1b[0m')

# Start of Original Notebook

#### This notebook demonstrates the use of adversarial debiasing algorithm to learn a fair classifier.
Adversarial debiasing [1] is an in-processing technique: it trains a classifier to maximize prediction accuracy while simultaneously reducing an adversary's ability to determine the protected attribute from the classifier's predictions. This leads to a fairer classifier, because its predictions carry little group information that the adversary can exploit. We will see how to use this algorithm to learn models with and without fairness constraints and apply them to the Adult dataset.


References:

[1] B. H. Zhang, B. Lemoine, and M. Mitchell, "Mitigating Unwanted Biases with Adversarial Learning," AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society, 2018. [arXiv:1801.07593](https://arxiv.org/abs/1801.07593)
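The core of [1] is the update rule for the classifier's weights $W$: the gradient of the predictor loss is stripped of its component along the gradient of the adversary loss, and then pushed against the adversary with a weight $\alpha$, i.e. $\nabla_W L_P - \mathrm{proj}_{\nabla_W L_A}\nabla_W L_P - \alpha \nabla_W L_A$. The NumPy sketch below applies that rule for one step on a flat weight vector. The function name, learning rate and $\alpha$ value are illustrative choices of ours; this is not AIF360's TensorFlow implementation.

```python
import numpy as np

def predictor_step(weights, grad_pred, grad_adv, lr=0.01, alpha=0.1):
    """One projected-gradient step for the predictor, following the rule in [1].

    weights:   flat (1-D) weight vector of the predictor
    grad_pred: gradient of the predictor loss L_P w.r.t. `weights`
    grad_adv:  gradient of the adversary loss L_A w.r.t. the same `weights`
    alpha:     strength of the term working against the adversary (illustrative)
    """
    norm = np.linalg.norm(grad_adv)
    unit_adv = grad_adv / norm if norm > 0 else np.zeros_like(grad_adv)
    # Remove the component of grad_pred along grad_adv, so this part of the
    # step leaves L_A unchanged to first order
    projection = np.dot(grad_pred, unit_adv) * unit_adv
    # Subtracting alpha * grad_adv makes the descent step increase L_A
    update = grad_pred - projection - alpha * grad_adv
    return weights - lr * update

w = predictor_step(np.zeros(2), np.array([1.0, 1.0]), np.array([1.0, 0.0]), lr=1.0)
```

With `alpha=0` the step is orthogonal to the adversary's gradient; a positive `alpha` additionally moves the weights in the direction that increases the adversary's loss.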

In [6]:
%matplotlib inline
# Load all necessary packages
import sys
sys.path.append("../")
from aif360.datasets import BinaryLabelDataset
from aif360.datasets import AdultDataset, GermanDataset, CompasDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.metrics import ClassificationMetric
from aif360.metrics.utils import compute_boolean_conditioning_vector

from aif360.algorithms.preprocessing.optim_preproc_helpers.data_preproc_functions import load_preproc_data_adult, load_preproc_data_compas, load_preproc_data_german

from aif360.algorithms.inprocessing.adversarial_debiasing import AdversarialDebiasing

from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler, MaxAbsScaler
from sklearn.metrics import accuracy_score

from IPython.display import Markdown, display
import matplotlib.pyplot as plt

import tensorflow as tf

Import of 'jit' requested from: 'numba.decorators', please update to use 'numba.core.decorators' or pin to Numba version 0.48.0. This alias will not be present in Numba version 0.50.0.
  from numba.decorators import jit


In [0]:
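# Only show TensorFlow messages at ERROR level or above
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
# Seed shared by the train/test split and both models, for reproducibility
SEED = 42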

#### Load dataset and set options

In [0]:
# Load the preprocessed Adult dataset
dataset_orig = load_preproc_data_adult()

# Use 'sex' as the protected attribute: 1 is privileged, 0 is unprivileged
privileged_groups = [{'sex': 1}]
unprivileged_groups = [{'sex': 0}]

# Split into 70% training and 30% test data
dataset_orig_train, dataset_orig_test = dataset_orig.split([0.7], shuffle=True, seed=SEED)
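AIF360 describes groups as a list of dicts. As far as we understand the library (for instance `compute_boolean_conditioning_vector`, imported above), the key/value pairs inside one dict are combined with AND, and the dicts in the list are combined with OR. The hypothetical `group_mask` helper below is a minimal NumPy sketch of that rule on a toy matrix, not AIF360's code.

```python
import numpy as np

def group_mask(features, feature_names, groups):
    """Hypothetical helper: boolean mask of the rows belonging to `groups`.

    Each dict is an AND over its key/value pairs; the list is an OR over dicts.
    """
    mask = np.zeros(features.shape[0], dtype=bool)
    for group in groups:
        member = np.ones(features.shape[0], dtype=bool)
        for name, value in group.items():
            member &= features[:, feature_names.index(name)] == value
        mask |= member
    return mask

# Toy matrix with columns ['race', 'sex']
X = np.array([[1, 1], [1, 0], [0, 1], [0, 0]], dtype=float)
names = ['race', 'sex']
privileged = group_mask(X, names, [{'sex': 1}])
```

Applied to this notebook, `group_mask(dataset_orig_train.protected_attributes, dataset_orig_train.protected_attribute_names, privileged_groups)` should select the rows with `sex == 1`.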

In [9]:
# print out some labels, names, etc.
#display(Markdown("#### Training Dataset shape"))
printb('#### Training Dataset shape')
print(dataset_orig_train.features.shape)
#display(Markdown("#### Favorable and unfavorable labels"))
printb("#### Favorable and unfavorable labels")
print(dataset_orig_train.favorable_label, dataset_orig_train.unfavorable_label)
#display(Markdown("#### Protected attribute names"))
printb("#### Protected attribute names")
print(dataset_orig_train.protected_attribute_names)
#display(Markdown("#### Privileged and unprivileged protected attribute values"))
printb("#### Privileged and unprivileged protected attribute values")
print(dataset_orig_train.privileged_protected_attributes, 
      dataset_orig_train.unprivileged_protected_attributes)
#display(Markdown("#### Dataset feature names"))
printb("#### Dataset feature names")
print(dataset_orig_train.feature_names)

#### Training Dataset shape
(34189, 18)
#### Favorable and unfavorable labels
1.0 0.0
#### Protected attribute names
['sex', 'race']
#### Privileged and unprivileged protected attribute values
[array([1.]), array([1.])] [array([0.]), array([0.])]
#### Dataset feature names
['race', 'sex', 'Age (decade)=10', 'Age (decade)=20', 'Age (decade)=30', 'Age (decade)=40', 'Age (decade)=50', 'Age (decade)=60', 'Age (decade)=>=70', 'Education Years=6', 'Education Years=7', 'Education Years=8', 'Education Years=9', 'Education Years=10', 'Education Years=11', 'Education Years=12', 'Education Years=<6', 'Education Years=>12']


#### Metric for original training data

In [10]:
# Metric for the original dataset
metric_orig_train = BinaryLabelDatasetMetric(dataset_orig_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)
#display(Markdown("#### Original training dataset"))
printb("#### Original training dataset")
print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_train.mean_difference())
metric_orig_test = BinaryLabelDatasetMetric(dataset_orig_test, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)
print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_test.mean_difference())

#### Original training dataset
Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.195979
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.191102
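`mean_difference()` is the statistical parity difference: the rate of favorable labels in the unprivileged group minus the rate in the privileged group. The negative values above therefore mean that the unprivileged group (`sex = 0`) receives the favorable label less often than the privileged group (`sex = 1`). A minimal NumPy sketch of that definition follows; `statistical_parity_difference` is a hypothetical helper and, unlike AIF360's metric, it assumes all instances have equal weight.

```python
import numpy as np

def statistical_parity_difference(labels, privileged, favorable_label=1.0):
    """P(favorable | unprivileged) - P(favorable | privileged), equal instance weights."""
    labels = np.asarray(labels, dtype=float)
    privileged = np.asarray(privileged, dtype=bool)
    rate_unprivileged = np.mean(labels[~privileged] == favorable_label)
    rate_privileged = np.mean(labels[privileged] == favorable_label)
    return rate_unprivileged - rate_privileged

# Toy example: 1 of 4 unprivileged vs 2 of 4 privileged instances are favorable
labels = [1, 0, 0, 0, 1, 1, 0, 0]
privileged = [0, 0, 0, 0, 1, 1, 1, 1]
spd = statistical_parity_difference(labels, privileged)
```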


In [11]:
# Scale each feature by its maximum absolute value: fit on train only, then apply to test
max_abs_scaler = MaxAbsScaler()
dataset_orig_train.features = max_abs_scaler.fit_transform(dataset_orig_train.features)
dataset_orig_test.features = max_abs_scaler.transform(dataset_orig_test.features)
metric_scaled_train = BinaryLabelDatasetMetric(dataset_orig_train, 
                             unprivileged_groups=unprivileged_groups,
                             privileged_groups=privileged_groups)
#display(Markdown("#### Scaled dataset - Verify that the scaling does not affect the group label statistics"))
printb("#### Scaled dataset - Verify that the scaling does not affect the group label statistics")
print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_scaled_train.mean_difference())
metric_scaled_test = BinaryLabelDatasetMetric(dataset_orig_test, 
                             unprivileged_groups=unprivileged_groups,
                             privileged_groups=privileged_groups)
print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_scaled_test.mean_difference())


#### Scaled dataset - Verify that the scaling does not affect the group label statistics
Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.195979
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.191102
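`MaxAbsScaler` divides each feature column by the largest absolute value that column takes in the data it was fitted on, here the training set. Scaling only changes the feature matrix, while, as we understand AIF360, `mean_difference()` is computed from the labels and the protected attributes stored alongside it, which is why the statistics above are unchanged. The sketch below reproduces the scaling arithmetic in NumPy on made-up data; `max_abs_scale` is a hypothetical helper that, like scikit-learn's scaler, leaves all-zero columns unchanged.

```python
import numpy as np

def max_abs_scale(train, test):
    """Scale both arrays column-wise by the max absolute value of `train`."""
    scale = np.abs(train).max(axis=0)
    scale[scale == 0] = 1.0  # avoid dividing all-zero columns by zero
    return train / scale, test / scale

train = np.array([[2.0, -4.0, 0.0],
                  [1.0,  2.0, 0.0]])
test = np.array([[4.0, 1.0, 3.0]])
train_scaled, test_scaled = max_abs_scale(train, test)
```

Because the scale comes from the training data only, scaled test values can fall outside $[-1, 1]$, as the first and last test columns do here.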


### Learn plain classifier without debiasing

In [0]:
# Load the in-processing AdversarialDebiasing algorithm
# With debias=False, only the classifier is trained (no adversary)
sess = tf.Session()
tf.set_random_seed(SEED)
plain_model = AdversarialDebiasing(privileged_groups = privileged_groups,
                          unprivileged_groups = unprivileged_groups,
                          scope_name='plain_classifier',
                          debias=False,
                          sess=sess,
                          seed=SEED)

In [13]:
#@title Train the plain classifier
plain_model.fit(dataset_orig_train)

epoch 0; iter: 0; batch classifier loss: 0.737970
epoch 0; iter: 200; batch classifier loss: 0.438434
epoch 1; iter: 0; batch classifier loss: 0.376381
epoch 1; iter: 200; batch classifier loss: 0.418228
epoch 2; iter: 0; batch classifier loss: 0.385003
epoch 2; iter: 200; batch classifier loss: 0.609906
epoch 3; iter: 0; batch classifier loss: 0.352963
epoch 3; iter: 200; batch classifier loss: 0.464395
epoch 4; iter: 0; batch classifier loss: 0.438074
epoch 4; iter: 200; batch classifier loss: 0.466126
epoch 5; iter: 0; batch classifier loss: 0.423859
epoch 5; iter: 200; batch classifier loss: 0.453185
epoch 6; iter: 0; batch classifier loss: 0.415533
epoch 6; iter: 200; batch classifier loss: 0.472421
epoch 7; iter: 0; batch classifier loss: 0.499586
epoch 7; iter: 200; batch classifier loss: 0.404337
epoch 8; iter: 0; batch classifier loss: 0.476573
epoch 8; iter: 200; batch classifier loss: 0.430814
epoch 9; iter: 0; batch classifier loss: 0.383572
epoch 9; iter: 200; batch classi

<aif360.algorithms.inprocessing.adversarial_debiasing.AdversarialDebiasing at 0x7f8f5e5521d0>

In [0]:
# Apply the plain model to the training and test data
dataset_nodebiasing_train = plain_model.predict(dataset_orig_train)
dataset_nodebiasing_test = plain_model.predict(dataset_orig_test)

In [15]:
# Metrics for the dataset from plain model (without debiasing)
#display(Markdown("#### Plain model - without debiasing - dataset metrics"))
printb("#### Plain model - without debiasing - dataset metrics")
metric_dataset_nodebiasing_train = BinaryLabelDatasetMetric(dataset_nodebiasing_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_nodebiasing_train.mean_difference())

metric_dataset_nodebiasing_test = BinaryLabelDatasetMetric(dataset_nodebiasing_test, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_nodebiasing_test.mean_difference())

#display(Markdown("#### Plain model - without debiasing - classification metrics"))
printb("#### Plain model - without debiasing - classification metrics")
classified_metric_nodebiasing_test = ClassificationMetric(dataset_orig_test, 
                                                 dataset_nodebiasing_test,
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)
print("Test set: Classification accuracy = %f" % classified_metric_nodebiasing_test.accuracy())
TPR = classified_metric_nodebiasing_test.true_positive_rate()
TNR = classified_metric_nodebiasing_test.true_negative_rate()
bal_acc_nodebiasing_test = 0.5*(TPR+TNR)
print("Test set: Balanced classification accuracy = %f" % bal_acc_nodebiasing_test)
print("Test set: Disparate impact = %f" % classified_metric_nodebiasing_test.disparate_impact())
print("Test set: Equal opportunity difference = %f" % classified_metric_nodebiasing_test.equal_opportunity_difference())
print("Test set: Average odds difference = %f" % classified_metric_nodebiasing_test.average_odds_difference())
print("Test set: Theil_index = %f" % classified_metric_nodebiasing_test.theil_index())

#### Plain model - without debiasing - dataset metrics
Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.230944
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.233361
#### Plain model - without debiasing - classification metrics
Test set: Classification accuracy = 0.801679
Test set: Balanced classification accuracy = 0.666963
Test set: Disparate impact = 0.000000
Test set: Equal opportunity difference = -0.484879
Test set: Average odds difference = -0.305113
Test set: Theil_index = 0.172990
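The classification metrics compare the predictions with the true test labels, per group. As we understand AIF360's definitions: disparate impact is the ratio of favorable-prediction rates (unprivileged / privileged, ideal 1), equal opportunity difference is the difference in true positive rates (unprivileged minus privileged, ideal 0), and average odds difference is the mean of the differences in false and true positive rates (ideal 0). A disparate impact of 0 for the plain model means that, in this run, it predicted the favorable label for no one in the unprivileged group. The hypothetical `group_metrics` helper below is a NumPy sketch of these formulas on toy arrays, not AIF360's implementation.

```python
import numpy as np

def _rates(y_true, y_pred):
    """True positive rate, false positive rate and favorable-prediction rate."""
    tpr = np.mean(y_pred[y_true == 1] == 1)
    fpr = np.mean(y_pred[y_true == 0] == 1)
    selection_rate = np.mean(y_pred == 1)
    return tpr, fpr, selection_rate

def group_metrics(y_true, y_pred, privileged):
    """Hypothetical helper: fairness metrics for binary 0/1 labels and predictions."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    privileged = np.asarray(privileged, dtype=bool)
    tpr, fpr, _ = _rates(y_true, y_pred)
    tpr_u, fpr_u, sel_u = _rates(y_true[~privileged], y_pred[~privileged])
    tpr_p, fpr_p, sel_p = _rates(y_true[privileged], y_pred[privileged])
    return {
        # TNR = 1 - FPR, so this matches 0.5 * (TPR + TNR) in the cell above
        "balanced_accuracy": 0.5 * (tpr + (1.0 - fpr)),
        "disparate_impact": sel_u / sel_p,
        "equal_opportunity_difference": tpr_u - tpr_p,
        "average_odds_difference": 0.5 * ((fpr_u - fpr_p) + (tpr_u - tpr_p)),
    }

# Toy data: the first four instances are privileged, the last four unprivileged
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
privileged = [1, 1, 1, 1, 0, 0, 0, 0]
metrics = group_metrics(y_true, y_pred, privileged)
```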


### Apply in-processing algorithm based on adversarial learning

In [0]:
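# Close the plain model's session and reset the default graph,
# so the debiased model is built in a fresh TensorFlow session
sess.close()
tf.reset_default_graph()
sess = tf.Session()
tf.set_random_seed(SEED)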

In [0]:
# Learn parameters with debias set to True
debiased_model = AdversarialDebiasing(privileged_groups = privileged_groups,
                          unprivileged_groups = unprivileged_groups,
                          scope_name='debiased_classifier',
                          debias=True,
                          sess=sess,
                          seed=SEED)

In [18]:
debiased_model.fit(dataset_orig_train)

epoch 0; iter: 0; batch classifier loss: 0.737970; batch adversarial loss: 0.666912
epoch 0; iter: 200; batch classifier loss: 0.439279; batch adversarial loss: 0.650546
epoch 1; iter: 0; batch classifier loss: 0.395545; batch adversarial loss: 0.653748
epoch 1; iter: 200; batch classifier loss: 0.398371; batch adversarial loss: 0.632805
epoch 2; iter: 0; batch classifier loss: 0.393818; batch adversarial loss: 0.595920
epoch 2; iter: 200; batch classifier loss: 0.648361; batch adversarial loss: 0.623091
epoch 3; iter: 0; batch classifier loss: 0.388254; batch adversarial loss: 0.640224
epoch 3; iter: 200; batch classifier loss: 0.557428; batch adversarial loss: 0.648691
epoch 4; iter: 0; batch classifier loss: 0.497622; batch adversarial loss: 0.605873
epoch 4; iter: 200; batch classifier loss: 0.535735; batch adversarial loss: 0.627681
epoch 5; iter: 0; batch classifier loss: 0.520497; batch adversarial loss: 0.671775
epoch 5; iter: 200; batch classifier loss: 0.516437; batch adversa

<aif360.algorithms.inprocessing.adversarial_debiasing.AdversarialDebiasing at 0x7f8f52299cc0>

In [0]:
# Apply the debiased model to the training and test data
dataset_debiasing_train = debiased_model.predict(dataset_orig_train)
dataset_debiasing_test = debiased_model.predict(dataset_orig_test)

In [20]:
# Metrics for the dataset from plain model (without debiasing)
#display(Markdown("#### Plain model - without debiasing - dataset metrics"))
printb("#### Plain model - without debiasing - dataset metrics")
print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_nodebiasing_train.mean_difference())
print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_nodebiasing_test.mean_difference())

# Metrics for the dataset from model with debiasing
#display(Markdown("#### Model - with debiasing - dataset metrics"))
printb("#### Model - with debiasing - dataset metrics")
metric_dataset_debiasing_train = BinaryLabelDatasetMetric(dataset_debiasing_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_debiasing_train.mean_difference())

metric_dataset_debiasing_test = BinaryLabelDatasetMetric(dataset_debiasing_test, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_debiasing_test.mean_difference())



#display(Markdown("#### Plain model - without debiasing - classification metrics"))
printb("#### Plain model - without debiasing - classification metrics")
print("Test set: Classification accuracy = %f" % classified_metric_nodebiasing_test.accuracy())
TPR = classified_metric_nodebiasing_test.true_positive_rate()
TNR = classified_metric_nodebiasing_test.true_negative_rate()
bal_acc_nodebiasing_test = 0.5*(TPR+TNR)
print("Test set: Balanced classification accuracy = %f" % bal_acc_nodebiasing_test)
print("Test set: Disparate impact = %f" % classified_metric_nodebiasing_test.disparate_impact())
print("Test set: Equal opportunity difference = %f" % classified_metric_nodebiasing_test.equal_opportunity_difference())
print("Test set: Average odds difference = %f" % classified_metric_nodebiasing_test.average_odds_difference())
print("Test set: Theil_index = %f" % classified_metric_nodebiasing_test.theil_index())



#display(Markdown("#### Model - with debiasing - classification metrics"))
printb("#### Model - with debiasing - classification metrics")
classified_metric_debiasing_test = ClassificationMetric(dataset_orig_test, 
                                                 dataset_debiasing_test,
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)
print("Test set: Classification accuracy = %f" % classified_metric_debiasing_test.accuracy())
TPR = classified_metric_debiasing_test.true_positive_rate()
TNR = classified_metric_debiasing_test.true_negative_rate()
bal_acc_debiasing_test = 0.5*(TPR+TNR)
print("Test set: Balanced classification accuracy = %f" % bal_acc_debiasing_test)
print("Test set: Disparate impact = %f" % classified_metric_debiasing_test.disparate_impact())
print("Test set: Equal opportunity difference = %f" % classified_metric_debiasing_test.equal_opportunity_difference())
print("Test set: Average odds difference = %f" % classified_metric_debiasing_test.average_odds_difference())
print("Test set: Theil_index = %f" % classified_metric_debiasing_test.theil_index())

#### Plain model - without debiasing - dataset metrics
Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.230944
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.233361
#### Model - with debiasing - dataset metrics
Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.086433
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.084841
#### Plain model - without debiasing - classification metrics
Test set: Classification accuracy = 0.801679
Test set: Balanced classification accuracy = 0.666963
Test set: Disparate impact = 0.000000
Test set: Equal opportunity difference = -0.484879
Test set: Average odds difference = -0.305113
Test set: Theil_index = 0.172990
#### Model - with debiasing - classification metrics
Test set: Classification accuracy = 0.790145
Test set: Balanced classification accuracy = 0.666546


# Exploring the results

Let's take a deeper look at the previous results.

In [0]:
#@title Run cell to define `print_table` function to show results in tabular format
from IPython.display import HTML, display

def print_table(headers,data,caption=""):
  """
  Prints a table given headers and data

  Inputs:
    - headers: a list of N headers
    - data: a list of N-element lists containing the data to display
    - caption: a string describing the data

  Outputs:
    - An HTML display of the table

  Example:
    caption = "A caption"
    headers = ["row","title 1", "title 2"]
    data = [["first row", 1, 2], ["second row", 2, 3]]

    print_table(headers,data,caption)
    

         A caption
    -----------------------------------
    | row         | title 1 | title 2 |
    -----------------------------------
    | first row   | 1       | 2       |
    -----------------------------------
    | second row  | 2       | 3       |
    -----------------------------------
  """
  display(HTML(
    '<table border="1"><caption>{0}</caption><tr>{1}</tr><tr>{2}</tr></table>'.format(
        caption,
        '<th>{}</th>'.format('</th><th>'.join(line for line in headers)),
        '</tr><tr>'.join(
            '<td>{}</td>'.format(
                '</td><td>'.join(
                    str(_) for _ in row)) for row in data))
  ))

In [22]:
t_00 = "{:.4f}".format(metric_dataset_nodebiasing_train.mean_difference())
t_01 = "{:.4f}".format(metric_dataset_debiasing_train.mean_difference())
t_10 = "{:.4f}".format(metric_dataset_nodebiasing_test.mean_difference())
t_11 = "{:.4f}".format(metric_dataset_debiasing_test.mean_difference())

table = [["Train set",t_00,t_01],
         ["Test set",t_10,t_11]]
headers = ['Statistical parity difference','Without debiasing','With debiasing']
caption = "Difference in mean outcomes between unprivileged and privileged groups"

print_table(headers,table,caption)

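Difference in mean outcomes between unprivileged and privileged groups

| Statistical parity difference | Without debiasing | With debiasing |
|---|---|---|
| Train set | -0.2309 | -0.0864 |
| Test set | -0.2334 | -0.0848 |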


Training with adversarial debiasing substantially reduces the statistical parity difference: from -0.2309 to -0.0864 on the training set and from -0.2334 to -0.0848 on the test set.

Let's also compare the two models on accuracy and on other fairness metrics.

In [23]:
acc_train_non_debias = "{:.4f}".format(classified_metric_nodebiasing_test.accuracy())
acc_train_debias = "{:.4f}".format(classified_metric_debiasing_test.accuracy())
acc_difference = "{:.4f}".format(classified_metric_nodebiasing_test.accuracy() - classified_metric_debiasing_test.accuracy())

bal_acc_non_debias = "{:.4f}".format(bal_acc_nodebiasing_test)
bal_acc_debias = "{:.4f}".format(bal_acc_debiasing_test)
bal_acc_difference = "{:.4f}".format(bal_acc_nodebiasing_test - bal_acc_debiasing_test)

di_non_debias = "{:.4f}".format(classified_metric_nodebiasing_test.disparate_impact())
di_debias = "{:.4f}".format(classified_metric_debiasing_test.disparate_impact())
# Disparate impact is best when it is closest to 1, so for this metric we report how far the debiased result is from 1.
di_difference = "{:.4f}".format(abs(1 - classified_metric_debiasing_test.disparate_impact()))

eq_op_non_debias = "{:.4f}".format(classified_metric_nodebiasing_test.equal_opportunity_difference())
eq_op_debias = "{:.4f}".format(classified_metric_debiasing_test.equal_opportunity_difference())
eq_op_difference = "{:.4f}".format(classified_metric_nodebiasing_test.equal_opportunity_difference() - classified_metric_debiasing_test.equal_opportunity_difference())

avg_odds_non_debias = "{:.4f}".format(classified_metric_nodebiasing_test.average_odds_difference())
avg_odds_debias = "{:.4f}".format(classified_metric_debiasing_test.average_odds_difference())
avg_odds_difference = "{:.4f}".format(classified_metric_nodebiasing_test.average_odds_difference() - classified_metric_debiasing_test.average_odds_difference())

theil_non_debias = "{:.4f}".format(classified_metric_nodebiasing_test.theil_index()) 
theil_debias = "{:.4f}".format(classified_metric_debiasing_test.theil_index())
theil_difference = "{:.4f}".format(classified_metric_nodebiasing_test.theil_index() - classified_metric_debiasing_test.theil_index())

metrics_final = [["Accuracy", acc_train_non_debias, acc_train_debias, acc_difference],
                ["Balanced classification accuracy", bal_acc_non_debias, bal_acc_debias, bal_acc_difference],
                ["Disparate impact", di_non_debias, di_debias, di_difference],
                ["Equal opportunity difference", eq_op_non_debias, eq_op_debias, eq_op_difference],
                ["Average odds difference", avg_odds_non_debias, avg_odds_debias, avg_odds_difference],
                ["Theil_index", theil_non_debias, theil_debias, theil_difference]]
headers_final = ["Classification metric", "Without debiasing","With debiasing", "Distance"]
caption_final = "Difference in model performance by using Adversarial Learning mitigation"

print_table(headers_final, metrics_final, caption_final)

Classification metric,Without debiasing,With debiasing,Distance
Accuracy,0.8017,0.7901,0.0115
Balanced classification accuracy,0.667,0.6665,0.0004
Disparate impact,0.0,0.5871,0.4129
Equal opportunity difference,-0.4849,-0.0536,-0.4313
Average odds difference,-0.3051,-0.0352,-0.2699
Theil_index,0.173,0.1723,0.0007
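The Theil index in the last row is the least intuitive metric in the table. AIF360's documentation describes it as the generalized entropy index with α = 1, computed over per-individual benefits b_i = ŷ_i - y_i + 1. Under this definition a false negative gets benefit 0, a correct prediction gets 1 and a false positive gets 2. Here is a minimal NumPy sketch of that definition. The function name and toy data are hypothetical:

```python
import numpy as np


def theil_index(y_true, y_pred):
    """Generalized entropy index with alpha = 1 over benefits b_i = y_pred_i - y_true_i + 1."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    b = y_pred - y_true + 1.0  # 0 = false negative, 1 = correct, 2 = false positive
    ratio = b / b.mean()
    # Use the convention 0 * log(0) = 0, so zero-benefit individuals add a zero term
    safe = np.where(ratio > 0, ratio, 1.0)
    return float(np.mean(ratio * np.log(safe)))


y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
print(round(theil_index(y_true, y_pred), 4))
```

With one false negative and one false positive among eight predictions, the mean benefit is 1 and the index is 2·ln 2 / 8 ≈ 0.1733. Unlike the group metrics, the Theil index measures inequality across all individuals. It is 0 when every individual receives the same benefit, for example when all predictions are correct.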


It is hard to remember the definition and the ideal value of each metric. AIF360's [explainers](https://aif360.readthedocs.io/en/latest/modules/explainers.html#) can explain each metric for us. They come in two flavours: text (`MetricTextExplainer`) and JSON (`MetricJSONExplainer`). The JSON explainers return structured explanations that can be used to present information to users. Here are some examples.
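The JSON explanations are plain JSON strings, so they are easy to consume in code. The sketch below parses the debiased model's accuracy explanation, copied verbatim from the output printed later in this notebook. It reads the `ideal` field and then recomputes the accuracy from the counts, following the formula in the `description` field. It uses only the standard-library `json` module, so it runs without AIF360:

```python
import json

# Copied from the MetricJSONExplainer output for the debiased model's accuracy
explanation = """{
  "metric": "Accuracy",
  "message": "Classification accuracy (ACC): 0.7901453627243568",
  "numTruePositives": 1499.0,
  "numTrueNegatives": 10079.0,
  "numPositives": 3474.0,
  "numNegatives": 11179.0,
  "description": "Computed as (true positive count + true negative count)/(positive_count + negative_count).",
  "ideal": "The ideal value of this metric is 1.0"
}"""

parsed = json.loads(explanation)

# Recompute accuracy as (TP + TN) / (P + N), as stated in the "description" field
acc = (parsed["numTruePositives"] + parsed["numTrueNegatives"]) / (
    parsed["numPositives"] + parsed["numNegatives"]
)

print(parsed["ideal"])
print(round(acc, 4))
```

The recomputed value, (1499 + 10079) / (3474 + 11179) = 11578 / 14653, matches the accuracy reported in the `message` field.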

In [0]:
import json
from collections import OrderedDict

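def format_json(json_str):
  # Pretty-print a JSON string while keeping its original key order
  return json.dumps(json.loads(json_str, object_pairs_hook=OrderedDict), indent=2)

def get_ideal_value(json_str):
  # Extract the human-readable "ideal" field from a JSON explanation
  return json.loads(json_str)["ideal"]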

In [0]:
from aif360.explainers import MetricJSONExplainer

In [0]:
# Define explainers for the metrics with and without debiasing
ex_nondebias_test = MetricJSONExplainer(classified_metric_nodebiasing_test)
ex_debias_test = MetricJSONExplainer(classified_metric_debiasing_test)

Now let's print the explanations for the metrics we used above. Make sure you read the whole text.

In [27]:
# Print the JSON explanation of each metric, without and with debiasing
explained_metrics = [("Accuracy", "accuracy"),
                     ("Disparate_Impact", "disparate_impact"),
                     ("Equal_opportunity", "equal_opportunity_difference"),
                     ("Average_odds", "average_odds_difference"),
                     ("Theil_index", "theil_index")]

for title, method_name in explained_metrics:
  printb("\n\n" + title + "\n\n")
  printb("Nondebiasing")
  print(format_json(getattr(ex_nondebias_test, method_name)()))
  printb("Debiasing")
  print(format_json(getattr(ex_debias_test, method_name)()))
  printb("\n\n################\n\n")


Accuracy

Nondebiasing
{
  "metric": "Accuracy",
  "message": "Classification accuracy (ACC): 0.8016788370982052",
  "numTruePositives": 1427.0,
  "numTrueNegatives": 10320.0,
  "numPositives": 3474.0,
  "numNegatives": 11179.0,
  "description": "Computed as (true positive count + true negative count)/(positive_count + negative_count).",
  "ideal": "The ideal value of this metric is 1.0"
}
[1;30mDebiasing[0m
{
  "metric": "Accuracy",
  "message": "Classification accuracy (ACC): 0.7901453627243568",
  "numTruePositives": 1499.0,
  "numTrueNegatives": 10079.0,
  "numPositives": 3474.0,
  "numNegatives": 11179.0,
  "description": "Computed as (true positive count + true negative count)/(positive_count + negative_count).",
  "ideal": "The ideal value of this metric is 1.0"
}
################

Disparate_Impact

Nondebiasing
{
  "metric": "Disparate Impact",
  "message": "Disparate impact (probability of favorable outcome for 

Now let's collect the `ideal` description for each metric in our previous table.

In [28]:
acc_ideal = get_ideal_value(ex_debias_test.accuracy())
# The explainer has no balanced-accuracy entry; its ideal value is also 1.0, so we reuse the accuracy text
bal_acc_ideal = acc_ideal
di_ideal = get_ideal_value(ex_debias_test.disparate_impact())
eq_op_ideal = get_ideal_value(ex_debias_test.equal_opportunity_difference())
avg_odds_ideal = get_ideal_value(ex_debias_test.average_odds_difference())
theil_ideal = get_ideal_value(ex_debias_test.theil_index())

metrics_final = [["Accuracy", acc_train_non_debias, acc_train_debias, acc_ideal],
                ["Balanced classification accuracy", bal_acc_non_debias, bal_acc_debias, bal_acc_ideal],
                ["Disparate impact", di_non_debias, di_debias, di_ideal],
                ["Equal opportunity difference", eq_op_non_debias, eq_op_debias, eq_op_ideal],
                ["Average odds difference", avg_odds_non_debias, avg_odds_debias, avg_odds_ideal],
                ["Theil_index", theil_non_debias, theil_debias, theil_ideal]]
headers_final = ["Classification metric", "Without debiasing","With debiasing", "Ideal value"]
caption_final = "Difference in model performance by using Adversarial Learning mitigation"

print_table(headers_final, metrics_final, caption_final)

Classification metric,Without debiasing,With debiasing,Ideal value
Accuracy,0.8017,0.7901,The ideal value of this metric is 1.0
Balanced classification accuracy,0.667,0.6665,The ideal value of this metric is 1.0
Disparate impact,0.0,0.5871,The ideal value of this metric is 1.0 A value < 1 implies higher benefit for the privileged group and a value >1 implies a higher benefit for the unprivileged group.
Equal opportunity difference,-0.4849,-0.0536,The ideal value is 0. A value of < 0 implies higher benefit for the privileged group and a value > 0 implies higher benefit for the unprivileged group.
Average odds difference,-0.3051,-0.0352,The ideal value of this metric is 0. A value of < 0 implies higher benefit for the privileged group and a value > 0 implies higher benefit for the unprivileged group.
Theil_index,0.173,0.1723,A value of 0 implies perfect fairness.


# Exercises and questions

Let's make sure you understand what you just did while working on this notebook.

1. Rerun this notebook with `race` as the protected attribute. How do the fairness metrics change?
2. What does the `Adversarial Debiasing` technique do?
3. What kind of classifier is this technique using? What hyperparameters could you tune?
4. Can I use the current implementation to optimize for several protected attributes?
