# Certifying and Removing Disparate Impact

This notebook applies the algorithm described in [Certifying and removing disparate impact](https://dl.acm.org/doi/10.1145/2783258.2783311) by Feldman et al., as implemented by the [AI Fairness 360 library](https://aif360.readthedocs.io/) from IBM.

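This is a pre-processing algorithm: it adjusts the distribution of each feature, conditional on the protected attribute, so that the distributions are equal across groups. A model subsequently trained on the repaired features should then be unable to infer group membership from them, and so should be unable to discriminate on that basis.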
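
To make the idea concrete, here is a minimal sketch of a quantile-based repair for a single numeric feature. It is a simplified illustration under our own assumptions, not the code that `aif360` runs; the function name `repair_feature` and the toy data are hypothetical. Each value is replaced by the median, across groups, of the group-wise values at the same within-group quantile, and `repair_level` interpolates linearly between the original value and the fully repaired one.

```python
import numpy as np


def repair_feature(x, group, repair_level=1.0):
    """Illustrative quantile-based repair of one feature (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    group = np.asarray(group)
    labels = np.unique(group)

    # Within-group quantile of every value, strictly inside (0, 1).
    q = np.empty_like(x)
    for g in labels:
        mask = group == g
        ranks = x[mask].argsort().argsort()
        q[mask] = (ranks + 0.5) / mask.sum()

    # Evaluate each group's empirical quantile function at those quantiles,
    # then take the median across groups as the common target value.
    per_group = np.stack([np.quantile(x[group == g], q) for g in labels])
    target = np.median(per_group, axis=0)

    # repair_level=0 keeps the data unchanged, repair_level=1 repairs fully.
    return (1.0 - repair_level) * x + repair_level * target


# Toy data: the second group is shifted upwards by 10.
x = np.array([0.0, 1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 13.0])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
repaired = repair_feature(x, group, repair_level=1.0)
```

After a full repair the two groups in the toy data share the same set of values, so the feature no longer reveals group membership, while the ordering of individuals within each group is preserved.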

In [None]:
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from aif360.datasets import StandardDataset
from aif360.algorithms.preprocessing import DisparateImpactRemover
from helpers.fairness_measures import (
    accuracy,
    disparate_impact_d,
    disparate_impact_p,
)
from helpers.plot import group_box_plots
from sklearn.neural_network import MLPClassifier  # noqa

## Load data

We have committed preprocessed data to the repository for reproducibility and we load it here. Check out the preprocessing notebook for details on how this data was obtained.

In [None]:
artifacts_dir = Path("../../../artifacts")

In [None]:
data_dir = artifacts_dir / "data" / "recruiting"

train = pd.read_csv(data_dir / "processed" / "train.csv")
val = pd.read_csv(data_dir / "processed" / "val.csv")
test = pd.read_csv(data_dir / "processed" / "test.csv")

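`aif360` works with its own dataset objects, so we wrap each split in a `StandardDataset`, marking `employed_yes` as the label and `race_white` as the protected attribute: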

In [None]:
train_sds = StandardDataset(
    train,
    label_name="employed_yes",
    favorable_classes=[1],
    protected_attribute_names=["race_white"],
    privileged_classes=[[1]],
)
test_sds = StandardDataset(
    test,
    label_name="employed_yes",
    favorable_classes=[1],
    protected_attribute_names=["race_white"],
    privileged_classes=[[1]],
)
val_sds = StandardDataset(
    val,
    label_name="employed_yes",
    favorable_classes=[1],
    protected_attribute_names=["race_white"],
    privileged_classes=[[1]],
)
# Column position of the protected attribute, used later to drop it from the features
index = train_sds.feature_names.index("race_white")

## Train unfair model

For maximum reproducibility we load the baseline model from disk, but the code used to train can be found in the baseline model notebook.

In [None]:
bl_model = joblib.load(
    artifacts_dir / "models" / "recruiting" / "baseline.pkl"
)

bl_test_probs = bl_model.predict_proba(test_sds.features)[:, 1]

## Perform intervention

We repair the dataset using the `DisparateImpactRemover`.

In [None]:
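# repair_level=1.0 requests a full repair; 0.0 would leave the features unchanged
di = DisparateImpactRemover(repair_level=1.0)

train_repd = di.fit_transform(train_sds)
# Drop the protected attribute column so the model cannot read it directly
train_repd_X = np.delete(train_repd.features, index, axis=1)
train_repd_y = train_repd.labels.flatten()

# The test split is repaired separately, using its own feature distributions
test_repd = di.fit_transform(test_sds)
test_repd_X = np.delete(test_repd.features, index, axis=1)
test_repd_y = test_repd.labels.flatten()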
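
The column-dropping step can be seen in isolation with a small sketch. The array and feature names below are hypothetical stand-ins for `train_repd.features` and `train_sds.feature_names`; only `race_white` comes from this notebook.

```python
import numpy as np

# Hypothetical stand-ins for the repaired feature matrix and its column names.
feature_names = ["age", "race_white", "years_experience"]
features = np.array(
    [
        [30.0, 1.0, 5.0],
        [41.0, 0.0, 12.0],
    ]
)

# Look up the protected column by name, then remove it along the column axis.
protected_idx = feature_names.index("race_white")
X = np.delete(features, protected_idx, axis=1)
kept_names = [n for i, n in enumerate(feature_names) if i != protected_idx]
```

`np.delete` returns a new array and leaves `features` untouched, so the protected attribute stays available on the dataset object for the fairness measurements later on.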

## Train model on fair data

We use the same architecture as the baseline, but train on the repaired data. Once again we load a trained model for reproducibility, but the code used to train the model can be found below.

In [None]:
model = joblib.load(artifacts_dir / "models" / "recruiting" / "feldman.pkl")

In [None]:
# model = MLPClassifier(hidden_layer_sizes=(100, 100), early_stopping=True)
# model.fit(train_repd_X, train_repd_y)

test_probs = model.predict_proba(test_repd_X)[:, 1]

## Analyse unfairness and accuracy

We measure the accuracy and fairness of the baseline model and compare them with those of the model trained on the repaired data.

In [None]:
bl_test_acc = accuracy(bl_test_probs, test.employed_yes)
bl_test_did = disparate_impact_d(bl_test_probs, test.race_white)
bl_test_dip = disparate_impact_p(bl_test_probs, test.race_white)

test_acc = accuracy(test_probs, test.employed_yes)
test_did = disparate_impact_d(test_probs, test.race_white)
test_dip = disparate_impact_p(test_probs, test.race_white)

print(f"Baseline accuracy: {bl_test_acc:.3f}")
print(f"Accuracy: {test_acc:.3f}\n")

print(f"Baseline disparate impact (dist.): {bl_test_did:.3f}")
print(f"Disparate impact (dist.): {test_did:.3f}\n")

print(f"Baseline disparate impact (prob.): {bl_test_dip:.3f}")
print(f"Disparate impact (prob.): {test_dip:.3f}")
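
For orientation, one common definition of disparate impact is the ratio of favourable-outcome rates between the unprivileged and the privileged group, where a ratio below 0.8 is often treated as evidence of disparate impact (the "four-fifths rule"). The sketch below computes that ratio. It is not the implementation of the `disparate_impact_d` or `disparate_impact_p` helpers, whose definitions are not shown in this notebook; the function name `disparate_impact_ratio` and the 0.5 decision threshold are assumptions made for illustration.

```python
import numpy as np


def disparate_impact_ratio(scores, privileged, threshold=0.5):
    """Ratio of selection rates, unprivileged over privileged (illustrative)."""
    scores = np.asarray(scores, dtype=float)
    privileged = np.asarray(privileged).astype(bool)

    # Turn scores into binary decisions with an assumed threshold.
    selected = scores >= threshold

    rate_unprivileged = selected[~privileged].mean()
    rate_privileged = selected[privileged].mean()

    # Undefined (division by zero) if nobody in the privileged group is selected.
    return rate_unprivileged / rate_privileged


# Toy scores: half of the privileged group is selected, a quarter of the other.
toy_scores = np.array([0.9, 0.8, 0.2, 0.1, 0.9, 0.2, 0.3, 0.1])
toy_privileged = np.array([1, 1, 1, 1, 0, 0, 0, 0])
toy_ratio = disparate_impact_ratio(toy_scores, toy_privileged)
```

A ratio of 1 means both groups are selected at the same rate; values further below 1 indicate a larger disadvantage for the unprivileged group.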

We can visualise the disparity between the Black and White groups with a box plot of the scores.

In [None]:
dp_box = group_box_plots(
    np.concatenate([bl_test_probs, test_probs]),
    np.concatenate([np.zeros_like(bl_test_probs), np.ones_like(test_probs)]),
    np.tile(test.race_white.map(lambda x: "White" if x else "Black"), 2),
    group_names=["Baseline", "Feldman"],
)
dp_box