# Information Withholding by Pleiss et al. - Adult data

This notebook applies the post-processing algorithm introduced in [On fairness and calibration](https://dl.acm.org/doi/10.5555/3295222.3295319) by Pleiss et al. (2017), using the implementation that ships with the IBM AIF360 fairness toolbox (github.com/IBM/AIF360).

The mitigation method achieves a relaxed version of Equalised Odds while maintaining Calibration by withholding information. In particular, a randomly chosen proportion of the advantaged group receives a prediction equal to the group's base rate, without considering the model inputs. This preserves Calibration but allows us to bring the error rates for the two groups closer together.

This method is attractive in that it achieves one notion of fairness and approximately achieves another. However, like the intervention of Hardt et al., it introduces randomness into decision making that might not be compatible with individual notions of fairness. Furthermore, the method requires calibrated classifiers as input: it does not offer a way to achieve Calibration, only to preserve it.
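
To make the withholding step concrete, here is a minimal sketch of the idea using only NumPy. The function name `withhold_information` and the mixing rate `alpha` are illustrative assumptions, not part of AIF360, and the group's mean score stands in for its base rate (the two agree in expectation when the classifier is calibrated).

```python
import numpy as np


def withhold_information(scores, alpha, rng):
    """With probability alpha, replace a score by the group's mean score.

    Hypothetical helper for illustration only; not the AIF360 implementation.
    """
    scores = np.asarray(scores, dtype=float)
    # For a calibrated classifier the mean score matches the base rate in expectation.
    base_rate_prediction = scores.mean()
    # Draw an independent coin flip per individual.
    withhold = rng.random(scores.shape[0]) < alpha
    return np.where(withhold, base_rate_prediction, scores)


rng = np.random.default_rng(0)
scores = rng.uniform(0.0, 1.0, size=1000)  # toy scores for one group
mixed = withhold_information(scores, alpha=0.3, rng=rng)
print("mean before:", scores.mean(), "mean after:", mixed.mean())
```

The mean score is roughly unchanged by the mixing, which is the intuition behind Calibration being preserved, while the withheld individuals no longer benefit from their own features.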


In [None]:
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
import plotly.graph_objs as go
from aif360.datasets import StandardDataset
from aif360.algorithms.postprocessing.calibrated_eq_odds_postprocessing import (
    CalibratedEqOddsPostprocessing,
)
from helpers.fairness_measures import (
    accuracy,
    equalised_odds_d,
    equalised_odds_p,
    equal_opportunity_d,
    equal_opportunity_p,
)
from helpers.plot import group_roc_curves

## Load data

We have committed preprocessed data to the repository for reproducibility and we load it here. Check out the preprocessing notebook for details on how this data was obtained.

In [None]:
artifacts_dir = Path("../../../artifacts")

In [None]:
data_dir = artifacts_dir / "data" / "recruiting"

train = pd.read_csv(data_dir / "processed" / "train.csv")
val = pd.read_csv(data_dir / "processed" / "val.csv")
test = pd.read_csv(data_dir / "processed" / "test.csv")

In order to process data for our fairness intervention we need to define special dataset objects which are part of every intervention pipeline within the IBM AIF360 toolbox. These objects contain the original data as well as some useful further information, e.g., which feature is the protected attribute as well as which column corresponds to the label.

In [None]:
train_sds = StandardDataset(
    train,
    label_name="employed_yes",
    favorable_classes=[1],
    protected_attribute_names=["race_white"],
    privileged_classes=[[1]],
)
test_sds = StandardDataset(
    test,
    label_name="employed_yes",
    favorable_classes=[1],
    protected_attribute_names=["race_white"],
    privileged_classes=[[1]],
)
val_sds = StandardDataset(
    val,
    label_name="employed_yes",
    favorable_classes=[1],
    protected_attribute_names=["race_white"],
    privileged_classes=[[1]],
)
index = train_sds.feature_names.index("race_white")

Define which binary value of the protected attribute corresponds to the privileged and the unprivileged group.

In [None]:
privileged_groups = [{"race_white": 1.0}]
unprivileged_groups = [{"race_white": 0.0}]

## Train unfair model

For maximum reproducibility we load the baseline model from disk, but the code used to train can be found in the baseline model notebook.

In [None]:
bl_model = joblib.load(
    artifacts_dir / "models" / "recruiting" / "baseline.pkl"
)

Get predicted probabilities for the validation and test data and attach them as scores to copies of the dataset objects.

In [None]:
test_probs = bl_model.predict_proba(test.drop("employed_yes", axis=1))[:, 1]
test_sds_pred = test_sds.copy(deepcopy=True)
test_sds_pred.scores = test_probs.reshape(-1, 1)

val_probs = bl_model.predict_proba(val.drop("employed_yes", axis=1))[:, 1]
val_sds_pred = val_sds.copy(deepcopy=True)
val_sds_pred.scores = val_probs.reshape(-1, 1)
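
Because the intervention only preserves Calibration, it can be worth checking how well calibrated the scores are within each group before applying it. The following is a rough sketch on synthetic data; `calibration_table` and the toy arrays are assumptions made for illustration and are not part of the helpers used in this notebook.

```python
import numpy as np


def calibration_table(scores, labels, n_bins=5):
    """Per score bin, return (bin id, mean score, observed positive rate, count)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Using only the interior edges yields bin ids in 0..n_bins-1.
    bin_ids = np.digitize(scores, edges[1:-1])
    rows = []
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            rows.append(
                (b, scores[mask].mean(), labels[mask].mean(), int(mask.sum()))
            )
    return rows


# Toy data that is calibrated by construction: P(label = 1 | score) = score.
rng = np.random.default_rng(0)
toy_scores = rng.uniform(0.0, 1.0, size=5000)
toy_labels = (rng.uniform(0.0, 1.0, size=5000) < toy_scores).astype(int)
toy_group = rng.integers(0, 2, size=5000)

for g in (0, 1):
    in_group = toy_group == g
    for b, mean_score, pos_rate, n in calibration_table(
        toy_scores[in_group], toy_labels[in_group]
    ):
        print(f"group={g} bin={b} mean_score={mean_score:.2f} "
              f"positive_rate={pos_rate:.2f} n={n}")
```

For a well-calibrated classifier the mean score and the observed positive rate should be close in every bin and for both groups.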

## Equal opportunity

We first address equal opportunity, which is achieved by setting the cost_constraint parameter to "fnr" (false negative rate) when setting up the intervention. We then learn the intervention from the true labels and predicted scores of the validation data. Subsequently, we apply the learnt intervention to the predictions for the test data and analyse the outcomes for fairness and accuracy.
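
As a rough illustration of what a false negative rate constraint compares, the sketch below computes the false negative rate per group in two ways: on the scores themselves (the generalized rate in the sense of Pleiss et al.) and on thresholded decisions. The function names, the 0.5 threshold and the toy arrays are assumptions for this example and need not match the definitions in helpers.fairness_measures.

```python
import numpy as np


def generalized_fnr(scores, labels):
    """Average shortfall 1 - score over the truly positive examples."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    return float(np.mean(1.0 - scores[labels == 1]))


def thresholded_fnr(scores, labels, threshold=0.5):
    """Share of truly positive examples whose score falls below the threshold."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    return float(np.mean(scores[labels == 1] < threshold))


toy_scores = np.array([0.9, 0.6, 0.2, 0.8, 0.4, 0.3])
toy_labels = np.array([1, 1, 1, 1, 1, 0])
toy_group = np.array([1, 1, 1, 0, 0, 0])

for g in (1, 0):
    in_group = toy_group == g
    print(
        f"group={g}",
        "generalized FNR =", generalized_fnr(toy_scores[in_group], toy_labels[in_group]),
        "thresholded FNR =", thresholded_fnr(toy_scores[in_group], toy_labels[in_group]),
    )
```

Equal opportunity asks for these group-wise rates to be close; the two variants can disagree, which is one reason to report both a probability-level and a decision-level measure.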

In [None]:
cost_constraint = "fnr"

In [None]:
# Learn the intervention parameters for equal opportunity on the validation data
cpp = CalibratedEqOddsPostprocessing(
    privileged_groups=privileged_groups,
    unprivileged_groups=unprivileged_groups,
    cost_constraint=cost_constraint,
    seed=42,  # fixed for reproducibility; np.random.seed() returns None
)
cpp = cpp.fit(val_sds, val_sds_pred)

Apply intervention to testing data.

In [None]:
test_sds_pred_tranf = cpp.predict(test_sds_pred)

Analyse accuracy and fairness

In [None]:
print(
    "Accuracy =",
    accuracy(test_sds_pred_tranf.scores.flatten(), test.employed_yes),
)
print(
    "Equal opportunity on probability level =",
    equal_opportunity_p(
        test_sds_pred_tranf.scores.flatten(),
        test.race_white,
        test.employed_yes,
    ),
)
print(
    "Equal opportunity on decision level =",
    equal_opportunity_d(
        test_sds_pred_tranf.scores.flatten(),
        test.race_white,
        test.employed_yes,
    ),
)

In [None]:
eo_calib_bar = go.Figure(
    data=[
        go.Bar(
            x=[1],
            y=[
                test_sds_pred_tranf.scores[
                    (test.race_white == race) & (test.employed_yes == 1)
                ].mean()
            ],
            name="White" if race else "Black",
        )
        for race in range(2)
    ]
)
eo_calib_bar

## Equalised odds

We now repeat the process for equalised odds. This requires changing the cost constraint to "weighted", so that the intervention is based on a weighted combination of the false negative and false positive rates rather than on the false negative rate alone. There are no further parameter choices to be made.
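
The sketch below illustrates such a weighted cost as a linear combination of the generalized false positive and false negative rates, following the cost functions described by Pleiss et al. The weights `w_fp` and `w_fn`, the function names and the toy data are assumptions for illustration; the exact weighting used inside AIF360 may differ.

```python
import numpy as np


def generalized_rates(scores, labels):
    """Return (generalized FPR, generalized FNR) computed on raw scores."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    gen_fpr = float(np.mean(scores[labels == 0]))        # mean score on true negatives
    gen_fnr = float(np.mean(1.0 - scores[labels == 1]))  # mean shortfall on true positives
    return gen_fpr, gen_fnr


def weighted_cost(scores, labels, w_fp=1.0, w_fn=1.0):
    """Linear combination of the two generalized error rates (illustrative weights)."""
    gen_fpr, gen_fnr = generalized_rates(scores, labels)
    return w_fp * gen_fpr + w_fn * gen_fnr


toy_scores = np.array([0.9, 0.6, 0.2, 0.3, 0.8, 0.4, 0.3, 0.1])
toy_labels = np.array([1, 1, 1, 0, 1, 1, 0, 0])
toy_group = np.array([1, 1, 1, 1, 0, 0, 0, 0])

for g in (1, 0):
    in_group = toy_group == g
    print(f"group={g} cost =", weighted_cost(toy_scores[in_group], toy_labels[in_group]))
```

In this sketch, setting `w_fp=0` and `w_fn=1` reduces the cost to the generalized false negative rate, which corresponds to the equal opportunity setting; the relaxed fairness goal is for the cost to be equal across the two groups.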

In [None]:
cost_constraint = "weighted"

Learn intervention on validation data.

In [None]:
# Learn parameters to equalize odds and apply to create a new dataset
cpp = CalibratedEqOddsPostprocessing(
    privileged_groups=privileged_groups,
    unprivileged_groups=unprivileged_groups,
    cost_constraint=cost_constraint,
    seed=42,  # fixed for reproducibility; np.random.seed() returns None
)
# Fit on the validation data so that the test data stays held out
cpp = cpp.fit(val_sds, val_sds_pred)

Apply intervention on testing data.

In [None]:
test_sds_pred_tranf = cpp.predict(test_sds_pred)

Analyse fairness and accuracy

In [None]:
print(
    "Accuracy =",
    accuracy(test_sds_pred_tranf.scores.flatten(), test.employed_yes),
)
print(
    "Equalised odds on probability level = ",
    equalised_odds_p(
        test_sds_pred_tranf.scores.flatten(),
        test.race_white,
        test.employed_yes,
    ),
)
print(
    "Equalised odds on decision level = ",
    equalised_odds_d(
        test_sds_pred_tranf.scores.flatten(),
        test.race_white,
        test.employed_yes,
    ),
)

In [None]:
group_roc_curves(
    test.employed_yes, test_sds_pred_tranf.scores, test.race_white
)

In [None]:
eo_calib_bar = go.Figure(
    data=[
        go.Bar(
            x=[label],
            y=[
                test_sds_pred_tranf.scores[
                    (test.race_white == race) & (test.employed_yes == label)
                ].mean()
            ],
            name="White" if race else "Black",
        )
        for label in range(2)
        for race in range(2)
    ]
)
eo_calib_bar