# Fairness Through Unawareness - Adult data

This notebook implements the common pre-processing intervention called Fairness Through Unawareness (FTU), in which the protected attribute is not included as a feature in the training data. Besides being an intervention, FTU can also be viewed as a fairness notion in its own right, one that is consistent with the concept of disparate treatment because the model never explicitly uses the protected attribute.

Although FTU is often applied by industry practitioners, its effect in terms of reducing unfairness is limited, since information about the protected attribute can still be contained elsewhere in the data. More precisely, there may be features that are highly correlated with the protected attribute and therefore act as proxies for it.
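One way to check for proxies is to see how well the protected attribute can be predicted from the remaining features. The sketch below is purely illustrative: it uses synthetic data rather than the Adult data, and the names `proxy`, `noise` and `proxy_accuracy` are hypothetical. An accuracy well above 0.5 on a balanced binary attribute suggests that dropping the attribute does not remove the information.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 2000

# Synthetic (hypothetical) protected attribute, roughly balanced.
sex = rng.integers(0, 2, size=n)

# A feature correlated with the protected attribute, and an unrelated one.
proxy = sex + rng.normal(scale=0.5, size=n)
noise = rng.normal(size=n)

# Feature matrix as seen by a model trained under FTU: no `sex` column.
X_without_sex = np.column_stack([proxy, noise])

# Cross-validated accuracy of recovering `sex` from the remaining features.
proxy_accuracy = cross_val_score(
    LogisticRegression(), X_without_sex, sex, cv=5
).mean()
print(f"Accuracy predicting sex without the sex column: {proxy_accuracy:.3f}")
```

The same check could be run on `train.drop(["sex", "salary"], axis=1)` once the data has been loaded.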

We examine the effect of applying FTU on a number of observational group fairness notions: demographic parity, equalised odds, equal opportunity and calibration.

In [35]:
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from fairlearn.metrics import (
    demographic_parity_difference,
    equalized_odds_difference,
)
from sklearn.neural_network import MLPClassifier
from helpers.metrics import accuracy
# group_bar_plots is used further down; assumed to live in helpers.plot
from helpers.plot import calibration_curves, group_bar_plots, group_box_plots
from sklearn.ensemble import RandomForestClassifier

In [3]:
from helpers import export_plot

## Load data
We have committed the preprocessed data to the repository for reproducibility, and we load it here. See the preprocessing notebook for details on how this data was obtained.

In [4]:
artifacts_dir = Path("../../../artifacts")

In [5]:
# override data_dir in source notebook
# this is stripped out for the hosted notebooks
artifacts_dir = Path("../../../../artifacts")

In [6]:
data_dir = artifacts_dir / "data" / "adult"

train = pd.read_csv(data_dir / "processed" / "train-one-hot.csv")
val = pd.read_csv(data_dir / "processed" / "val-one-hot.csv")
test = pd.read_csv(data_dir / "processed" / "test-one-hot.csv")

In [7]:
sex = train.drop("salary", axis=1)["sex"].apply(
    lambda sex: "female" if sex == 0 else "male"
)

## Load original model

For maximum reproducibility we also load the baseline model from disk. The code used to train it can be found in the baseline model notebook.

In [8]:
baseline_model = joblib.load(
    artifacts_dir / "models" / "finance" / "baseline.pkl"
)


Trying to unpickle estimator LabelBinarizer from version 0.23.1 when using version 0.22.1. This might lead to breaking code or invalid results. Use at your own risk.


Trying to unpickle estimator MLPClassifier from version 0.23.1 when using version 0.22.1. This might lead to breaking code or invalid results. Use at your own risk.



Get predictions on the test data

In [9]:
bl_test_probs = baseline_model.predict_proba(test.drop("salary", axis=1))[:, 1]
bl_test_labels = (bl_test_probs > 0.5).astype(float)

## Learn model under FTU

Generate FTU data sets

In [None]:
train_ftu = train.drop('sex', axis=1).copy()
val_ftu = val.drop('sex', axis=1).copy()
test_ftu = test.drop('sex', axis=1).copy()

Learn model on FTU training data

In [15]:
ftu_model = MLPClassifier(hidden_layer_sizes=(100, 100), early_stopping=True)
ftu_model.fit(train_ftu.drop(columns="salary"), train.salary)

MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=True, epsilon=1e-08,
              hidden_layer_sizes=(100, 100), learning_rate='constant',
              learning_rate_init=0.001, max_fun=15000, max_iter=200,
              momentum=0.9, n_iter_no_change=10, nesterovs_momentum=True,
              power_t=0.5, random_state=None, shuffle=True, solver='adam',
              tol=0.0001, validation_fraction=0.1, verbose=False,
              warm_start=False)

Generate predictions on the test data with the learnt FTU model

In [32]:
test_probs = ftu_model.predict_proba(test_ftu.drop("salary", axis=1))[:, 1]
test_pred_labels = (test_probs > 0.5).astype(float)

## Demographic parity

We first assess the effect of FTU on demographic parity.
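As a reminder of what is being measured, the demographic parity difference is the gap between the largest and smallest selection rates (proportion of positive predictions) across groups. The following is a minimal sketch on toy arrays, not the fairlearn implementation; the helper name `selection_rate_gap` is hypothetical.

```python
import numpy as np


def selection_rate_gap(y_pred, group):
    """Largest minus smallest proportion of positive predictions across groups."""
    y_pred = np.asarray(y_pred, dtype=float)
    group = np.asarray(group)
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)


# Toy example: group 0 is selected 1 time in 4, group 1 is selected 3 times in 4.
y_pred = np.array([1, 0, 0, 0, 1, 1, 1, 0])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])

gap = selection_rate_gap(y_pred, group)
print(f"Selection rate gap: {gap:.2f}")
```

A value of 0 means both groups receive positive predictions at the same rate. Note that the true labels play no part in this quantity.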

In [33]:
test_sex = test.sex.values
test_salary = test.salary.values
mask = test_sex == 1

# baseline metrics
bl_test_acc = accuracy(test_salary, bl_test_labels)
bl_test_dpd = demographic_parity_difference(
    test_salary, bl_test_labels, sensitive_features=test_sex,
)

# new model metrics
test_acc = accuracy(test_salary, test_pred_labels)
test_dpd = demographic_parity_difference(
    test_salary, test_pred_labels, sensitive_features=test_sex,
)

print(f"Baseline accuracy: {bl_test_acc:.3f}")
print(f"Accuracy: {test_acc:.3f}\n")

print(f"Baseline demographic parity: {bl_test_dpd:.3f}")
print(f"Demographic parity: {test_dpd:.3f}\n")

Baseline accuracy: 0.853
Accuracy: 0.851

Baseline demographic parity: 0.193
Demographic parity: 0.178



Compare the distribution of scores for the female and male subgroups under the baseline and FTU models

NOTE: use 

In [37]:
dp_box = group_box_plots(
    np.concatenate([bl_test_probs, test_probs]),
    np.tile(test.sex.map({0: "Female", 1: "Male"}), 2),
    groups=np.concatenate(
        [np.zeros_like(bl_test_probs), np.ones_like(test_probs)]
    ),
    group_names=["Baseline", "FTU"],
    title="Distribution of scores by sex",
    xlabel="Scores",
    ylabel="Method",
)
dp_box

In [26]:
export_plot(dp_box, "ftu-dp.json")

## Equalised odds

Let us now evaluate equalised odds for the FTU model on the test data.
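Equalised odds compares both the true positive rate and the false positive rate across groups. A common summary, and to our understanding the one reported by fairlearn's `equalized_odds_difference` by default, is the larger of the two gaps. The sketch below works on toy arrays; the helper names `rate_gap` and `equalised_odds_gap` are hypothetical.

```python
import numpy as np


def rate_gap(y_true, y_pred, group, label):
    """Gap across groups in P(prediction = 1 | true label = `label`)."""
    rates = []
    for g in np.unique(group):
        selected = (group == g) & (y_true == label)
        rates.append(y_pred[selected].mean())
    return max(rates) - min(rates)


def equalised_odds_gap(y_true, y_pred, group):
    """Larger of the true positive rate gap and the false positive rate gap."""
    tpr_gap = rate_gap(y_true, y_pred, group, label=1)
    fpr_gap = rate_gap(y_true, y_pred, group, label=0)
    return max(tpr_gap, fpr_gap)


y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 0, 1, 1, 0, 0])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])

tpr_gap = rate_gap(y_true, y_pred, group, label=1)
fpr_gap = rate_gap(y_true, y_pred, group, label=0)
eo_gap = equalised_odds_gap(y_true, y_pred, group)
print(f"TPR gap: {tpr_gap:.2f}, FPR gap: {fpr_gap:.2f}, EO gap: {eo_gap:.2f}")
```

In this toy example the groups differ only in their true positive rates (0.5 versus 1.0), so the summary is driven entirely by the TPR gap.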

In [27]:
test_sex = test.sex.values
test_salary = test.salary.values
mask = test_sex == 1

# baseline metrics
bl_test_acc = accuracy(test_salary, bl_test_labels)
bl_test_eod = equalized_odds_difference(
    test_salary, bl_test_labels, sensitive_features=test_sex,
)

# new model metrics
test_acc = accuracy(test_salary, test_pred_labels)
test_eod = equalized_odds_difference(
    test_salary, test_pred_labels, sensitive_features=test_sex,
)

print(f"Baseline accuracy: {bl_test_acc:.3f}")
print(f"Accuracy: {test_acc:.3f}\n")

print(f"Baseline equalised odds difference: {bl_test_eod:.3f}")
print(f"Equalised odds difference: {test_eod:.3f}\n")

Baseline accuracy: 0.853
Accuracy: 0.851

Baseline equalised odds difference: 0.128
Equalised odds difference: 0.084



In [42]:
bl_eo_bar = group_bar_plots(
    bl_test_probs,
    test.sex.map({0: "Female", 1: "Male"}),
    groups=test.salary,
    group_names=["Low earners", "High earners"],
    title="Baseline mean scores by sex",
    xlabel="Proportion predicted high earners",
    ylabel="Outcome",
)
bl_eo_bar

In [43]:
eo_bar = group_bar_plots(
    test_probs,
    test.sex.map({0: "Female", 1: "Male"}),
    groups=test.salary,
    group_names=["Low earners", "High earners"],
    title="FTU mean scores by sex",
    xlabel="Proportion predicted high earners",
    ylabel="Outcome",
)
eo_bar

In [44]:
export_plot(bl_eo_bar, "ftu-bl-eo.json")
export_plot(eo_bar, "ftu-eo.json")

## Equal opportunity

Let us now evaluate equal opportunity for the FTU model on the test data.
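Equal opportunity restricts attention to individuals whose true outcome is positive (here, high earners) and compares the rate of positive predictions across groups. The sketch below shows that computation with pandas on a small made-up table; the column name `pred` and the values are illustrative only.

```python
import pandas as pd

# Toy data: true outcome, predicted label and group membership.
df = pd.DataFrame(
    {
        "salary": [1, 1, 1, 0, 1, 1, 0, 1],
        "pred": [1, 0, 0, 0, 1, 1, 1, 0],
        "sex": ["Female"] * 4 + ["Male"] * 4,
    }
)

# Keep only true high earners, then average the predictions within each group.
tpr_by_sex = df[df["salary"] == 1].groupby("sex")["pred"].mean()
eopp_gap = tpr_by_sex.max() - tpr_by_sex.min()

print(tpr_by_sex)
print(f"Equal opportunity gap: {eopp_gap:.3f}")
```

This mirrors the masking by `test.salary == 1` used in the plot, where each bar is the mean predicted label among true high earners.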

In [40]:
mask = test.salary == 1

eopp_bar = group_bar_plots(
    np.concatenate([bl_test_labels[mask], test_pred_labels[mask]]),
    np.tile(test.sex[mask].map({0: "Female", 1: "Male"}), 2),
    groups=np.concatenate(
        [np.zeros_like(bl_test_probs[mask]), np.ones_like(test_probs[mask])]
    ),
    group_names=["Baseline", "FTU"],
    title="Mean prediction for high earners by sex",
    xlabel="Proportion predicted high earners",
    ylabel="Method",
)
eopp_bar

In [41]:
export_plot(eopp_bar, "ftu-eopp.json")

## Calibration

Let us now evaluate calibration for the FTU model on the test data.
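A model is calibrated within a group if, among individuals assigned a score near p, roughly a fraction p have a positive outcome. The sketch below illustrates a per-group check using scikit-learn's `calibration_curve` on synthetic scores. It is an assumption-laden illustration and does not use the repository's `calibration_curves` helper, whose signature is not shown here.

```python
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
n = 4000

# Synthetic data: labels are drawn with probability equal to the score,
# so the scores are well calibrated in both (hypothetical) groups.
group = rng.integers(0, 2, size=n)
scores = rng.uniform(size=n)
labels = (rng.uniform(size=n) < scores).astype(int)

curves = {}
for g in np.unique(group):
    selected = group == g
    # Fraction of positives and mean score within each of 5 score bins.
    frac_pos, mean_score = calibration_curve(
        labels[selected], scores[selected], n_bins=5
    )
    curves[int(g)] = (frac_pos, mean_score)
    max_gap = np.abs(frac_pos - mean_score).max()
    print(f"group {g}: largest calibration gap = {max_gap:.3f}")
```

On real data, `scores` would be replaced by `bl_test_probs` or `test_probs`, `labels` by `test_salary`, and `group` by `test_sex`.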