# **Post-processing technique: Equalised Odds Post-processing**

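The Equalised Odds post-processing approach (Hardt et al., 2016) randomly flips a fraction of a classifier's predictions, using probabilities that depend on group membership and on the predicted label, until the error rates (true positive and false positive rates) of the protected group match those of the rest of the sample. The steps we will take are outlined below.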

1. First, we will calculate Disparate Impact and Statistical Parity Difference metrics for a baseline model with no fairness intervention.
2. We will then apply the Equalised Odds Post-processing method to the predictions of the trained model and observe the results.
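
The random flipping described above can be pictured as a per-group randomised rule. The sketch below is only an illustration: the function `randomised_flip` and the mixing rates are hypothetical and hand-picked here, whereas the actual method derives the rates from data.

```python
import numpy as np

def randomised_flip(y_pred, group, p_pos_given_pos, p_pos_given_neg, rng):
    """Output 1 with probability p_pos_given_pos[g] when the base prediction
    is 1, and with probability p_pos_given_neg[g] when it is 0."""
    y_pred = np.asarray(y_pred)
    group = np.asarray(group)
    # per-sample probability of emitting a positive label
    p_one = np.where(y_pred == 1, p_pos_given_pos[group], p_pos_given_neg[group])
    return (rng.random(len(y_pred)) < p_one).astype(int)

rng = np.random.default_rng(0)
y_pred = np.array([1, 1, 0, 0, 1, 0, 1, 0])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 0 = group_a, 1 = group_b

# Hypothetical rates: group 0 is left untouched, while each positive
# prediction in group 1 is flipped to 0 with probability 0.2
p_pos_given_pos = np.array([1.0, 0.8])
p_pos_given_neg = np.array([0.0, 0.0])

y_new = randomised_flip(y_pred, group, p_pos_given_pos, p_pos_given_neg, rng)
```

With rates of 1.0 and 0.0 a group's predictions pass through unchanged, so only the group with intermediate rates is affected.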

# Install Libraries and load data

In [None]:
# install holisticai
!pip install holisticai

import pickle
import pandas as pd
import numpy as np
import seaborn as sns

from sklearn.model_selection import StratifiedKFold
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

from holisticai.bias.mitigation import EqualizedOdds
from holisticai.bias import metrics as bias_metrics

Load the data into a dataframe

In [None]:
# suppress warnings
import warnings
warnings.simplefilter("ignore")

# Load data
from sklearn.datasets import fetch_openml
bunch = fetch_openml(data_id=44270)
df = bunch['frame'].dropna()
df['Ethnicity_White'] = (df['Ethnicity'] == 'White')*1
df['Ethnicity_Black'] = (df['Ethnicity'] == 'Black')*1
df = df.drop(columns = ['Gender', 'Ethnicity'])
df

# Run a baseline predictive model without applying post-processing
First we will build a Logistic Regression classifier and observe some baseline results, using the original data without Post-processing.

Set up variables for the privileged and unprivileged groups. In this example we will treat 'Ethnicity_White' as the privileged group and 'Ethnicity_Black' as the unprivileged group.

Train a Logistic Regression model with 10-fold stratified cross-validation. Compute performance metrics (Accuracy, Precision, Recall and F1 Score) and fairness metrics (Equalized Odds Difference, False Negative Rate Difference, Statistical Parity Difference, Disparate Impact).
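
To make the fairness metrics concrete, here is a minimal hand-computed sketch on toy data. The definitions below follow common conventions (group_a minus group_b for differences, group_a over group_b for the ratio) and are an assumption; the exact sign conventions used by `holisticai` should be checked against its documentation.

```python
import numpy as np

def group_rates(y_true, y_pred, mask):
    """Selection rate, true positive rate and false positive rate for one group."""
    yt, yp = y_true[mask], y_pred[mask]
    return yp.mean(), yp[yt == 1].mean(), yp[yt == 0].mean()

# Toy data: first four samples belong to group_a, last four to group_b
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 0, 1, 1, 1, 0])
group_a = np.array([True] * 4 + [False] * 4)
group_b = ~group_a

sr_a, tpr_a, fpr_a = group_rates(y_true, y_pred, group_a)
sr_b, tpr_b, fpr_b = group_rates(y_true, y_pred, group_b)

stat_parity_diff = sr_a - sr_b                                 # ideal value 0
disparate_impact = sr_a / sr_b                                 # ideal value 1
avg_odds_diff = 0.5 * ((tpr_a - tpr_b) + (fpr_a - fpr_b))      # ideal value 0
fnr_diff = (1 - tpr_a) - (1 - tpr_b)                           # ideal value 0
```

In this toy example group_a is selected less often and has a higher false negative rate than group_b, so every metric moves away from its ideal value.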


In [None]:
# Instantiate the classifier (this code is ready to run, there are no gaps to fill)
# Note: scikit-learn >= 1.2 expects penalty=None; the string "none" is deprecated there
model = LogisticRegression(random_state=10, solver="lbfgs", penalty="none")

# instantiate the cross-validation scheme
mv = StratifiedKFold(n_splits=10, shuffle=True, random_state=10)

# setup the performance metrics to be computed
perf_metrics = {"Accuracy": metrics.accuracy_score, 
                "Precision": metrics.precision_score, 
                "Recall": metrics.recall_score,
                "F1-Score": metrics.f1_score, 
                }

In [None]:
# Train a logistic regression classifier on the dataset (this code is ready to run, there are no gaps to fill)
k, i = True, 1

# instantiating X
X = df.drop(columns=["Label"])

# instantiating the target variable
y = df['Label']

for (train, test) in mv.split(X, y):

    # instantiating X
    X_train = X.iloc[train].copy()

    # instantiating y
    y_train = y.iloc[train].copy()

    # fit model
    model.fit(X_train, y_train)

    # X_test 
    X_test = X.iloc[test]
    
    # set up vectors
    group_a = X_test.Ethnicity_Black == 1
    group_b = X_test.Ethnicity_White == 1
    y_pred = model.predict(X_test)
    y_true = y.iloc[test].values.ravel()
    params = [group_a, group_b, y_pred]

    # compute performance metrics
    metric_list = []
    for name, metric_fn in perf_metrics.items():
        metric_list += [[name, metric_fn(y_true, y_pred)]]
    
    # Compute fairness metrics
    metric_list += [['Statistical Parity Difference', bias_metrics.statistical_parity(group_a, group_b, y_pred)]]
    metric_list += [['Disparate Impact', bias_metrics.disparate_impact(group_a, group_b, y_pred)]]
    metric_list += [['Equalized Odds Difference', bias_metrics.average_odds_diff(group_a, group_b, y_pred,y_true)]]
    metric_list += [['False Negative Rate Difference', bias_metrics.false_negative_rate_diff(group_a, group_b, y_pred,y_true)]]

    # concatenate results
    df_m = pd.DataFrame(metric_list, columns=["Metric", "Value"])
    df_m["Fold"] = i
    i += 1
    if k:
        df_metrics_orig = df_m.copy()
        k=0
    else:
        df_metrics_orig = pd.concat([df_metrics_orig, df_m.copy()], axis=0, ignore_index=True)

df_metrics_orig

In [None]:
# TODO: Display metrics

# /TODO

| Metric | Value (mean) | Value (std) |
| --- | --- | --- |
| Accuracy | 0.712141 | 0.008402 |
| Equalized Odds Difference | -0.058946 | 0.012744 |
| F1-Score | 0.593431 | 0.011609 |
| False Negative Rate Difference | 0.072792 | 0.023473 |
| Precision | 0.648655 | 0.013874 |
| Recall | 0.546962 | 0.012234 |
| Statistical Parity Difference | -0.070907 | 0.010773 |
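
A per-metric mean and standard deviation summary like the one above can be obtained by grouping a long-format metrics frame by metric name. The sketch below uses a small made-up frame (the values are placeholders, not results from this notebook) with the same `Metric`, `Value` and `Fold` columns.

```python
import pandas as pd

# Placeholder values for illustration only
toy = pd.DataFrame({
    "Metric": ["Accuracy", "Accuracy", "Recall", "Recall"],
    "Value": [0.70, 0.72, 0.54, 0.56],
    "Fold": [1, 2, 1, 2],
})

# Aggregate across folds: one row per metric, columns for mean and std
summary = toy.groupby("Metric")["Value"].agg(["mean", "std"])
print(summary)
```

The same call applied to the fold-level frame built in the loop would give one row per metric.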


# Apply Equalised Odds Post-processing to the predictive model

Amend your Logistic Regression routine above to apply [Equalized Odds](https://holisticai.readthedocs.io/en/latest/generated/holisticai.bias.mitigation.EqualizedOdds.html#holisticai.bias.mitigation.EqualizedOdds) Post-Processing within each cross-validation fold: fit the post-processor on part of the fold's test predictions and evaluate it on the remainder. Compute performance metrics (Accuracy, Precision, Recall and F1 Score) and fairness metrics (Equalized Odds Difference, False Negative Rate Difference, Statistical Parity Difference, Disparate Impact).
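
For intuition about what such a post-processor fits, the sketch below solves a small linear program in the spirit of Hardt et al. (2016) with `scipy.optimize.linprog`. It is not the `holisticai` implementation; the helper names `counts_and_rates` and `fit_mixing_rates` and the synthetic data are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def counts_and_rates(y_true, y_pred, mask):
    """Confusion counts plus TPR and FPR of the base predictions for one group."""
    yt, yp = y_true[mask], y_pred[mask]
    tp = np.sum((yt == 1) & (yp == 1))
    fn = np.sum((yt == 1) & (yp == 0))
    fp = np.sum((yt == 0) & (yp == 1))
    tn = np.sum((yt == 0) & (yp == 0))
    return tp, fn, fp, tn, tp / (tp + fn), fp / (fp + tn)

def fit_mixing_rates(y_true, y_pred, group_a, group_b):
    """Solve for [a1, a0, b1, b0], where g1 = P(new=1 | pred=1, group g)
    and g0 = P(new=1 | pred=0, group g)."""
    cost, tpr_row, fpr_row = [], [], []
    for sign, mask in ((1.0, group_a), (-1.0, group_b)):
        tp, fn, fp, tn, tpr, fpr = counts_and_rates(y_true, y_pred, mask)
        # expected number of errors, up to a constant, as a function of the rates
        cost += [fp - tp, tn - fn]
        # adjusted TPR = g1 * tpr + g0 * (1 - tpr); same form for the FPR
        tpr_row += [sign * tpr, sign * (1 - tpr)]
        fpr_row += [sign * fpr, sign * (1 - fpr)]
    res = linprog(c=cost, A_eq=[tpr_row, fpr_row], b_eq=[0, 0],
                  bounds=[(0, 1)] * 4, method="highs")
    return res.x

# Synthetic data: the base classifier is less accurate for group_a
rng = np.random.default_rng(0)
n = 2000
group_a = rng.random(n) < 0.3
group_b = ~group_a
y_true = (rng.random(n) < 0.4).astype(int)
correct = rng.random(n) < np.where(group_a, 0.65, 0.85)
y_pred = np.where(correct, y_true, 1 - y_true)

a1, a0, b1, b0 = fit_mixing_rates(y_true, y_pred, group_a, group_b)

# Expected rates after applying the fitted mixing rates
*_, tpr_a, fpr_a = counts_and_rates(y_true, y_pred, group_a)
*_, tpr_b, fpr_b = counts_and_rates(y_true, y_pred, group_b)
new_tpr_a, new_fpr_a = a1 * tpr_a + a0 * (1 - tpr_a), a1 * fpr_a + a0 * (1 - fpr_a)
new_tpr_b, new_fpr_b = b1 * tpr_b + b0 * (1 - tpr_b), b1 * fpr_b + b0 * (1 - fpr_b)
```

The equality constraints force the expected true positive and false positive rates to coincide across the two groups, while the objective keeps the expected number of misclassifications as low as those constraints allow.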

In [None]:

k, i = True, 1

# instantiating X
X = df.drop(columns=["Label"])

# instantiating the target variable
y = df['Label']

for (train, test) in mv.split(X, y):

    # Set up the data
    X_train = X.iloc[train].copy()
    y_train = y.iloc[train].copy()

    # fit model
    model = model.fit(X_train, y_train)
    
    # get predictions in the test set
    X_test = X.iloc[test].copy()
    y_test = y.iloc[test].copy()
    ypred_prob = model.predict_proba(X_test)[:, 1] # probability of the positive class
    ypred_class = model.predict(X_test)

    # fit post-processing using results from 60% of the test set
    test_pct = 0.4
    n = int(len(y_test))
    n_2 = int(n* (1-test_pct))
    indices = np.random.permutation(n)
    pp_indices = indices[:n_2]
    test_indices = indices[n_2:]

    # set up data
    group_a = X_test.reset_index().Ethnicity_Black == 1
    group_b = X_test.reset_index().Ethnicity_White == 1
    y_train_pp = np.array(y_test)[pp_indices]
    y_pred_train = np.array(ypred_class)[pp_indices]
    group_a_train = np.array(group_a)[pp_indices]
    group_b_train = np.array(group_b)[pp_indices]
    y_pred_test = np.array(ypred_class)[test_indices]
    group_a_test = np.array(group_a)[test_indices]
    group_b_test = np.array(group_b)[test_indices]
    y_true_test = np.array(y_test)[test_indices]

    # TODO Use eq to post-process predictions on the other 40% of the test set

    # TODO fit it

    # TODO transform

    # TODO get new labels


    # compute performance metrics on the post-processed predictions (held-out 40%)
    metric_list = []
    for pf in perf_metrics.keys():
        metric_list += [[pf, perf_metrics[pf](y_true_test, y_pred)]]
    
    # Compute fairness metrics
    metric_list += [['Statistical Parity Difference', bias_metrics.statistical_parity(group_a_test, group_b_test, y_pred)]]
    metric_list += [['Disparate Impact', bias_metrics.disparate_impact(group_a_test, group_b_test, y_pred)]]
    metric_list += [['Equalized Odds Difference', bias_metrics.average_odds_diff(group_a_test, group_b_test, y_pred ,y_true_test)]]
    metric_list += [['False Negative Rate Difference', bias_metrics.false_negative_rate_diff(group_a_test, group_b_test, y_pred ,y_true_test)]]

    # concatenate results
    df_m = pd.DataFrame(metric_list, columns=["Metric", "Value"])
    df_m["Fold"] = i
    i += 1
    if k:
        df_metrics = df_m.copy()
        k=0
    else:
        df_metrics = pd.concat([df_metrics, df_m.copy()], axis=0, ignore_index=True)

In [None]:
# TODO: Display metrics

# /TODO

# Present results to show the effectiveness of the Post-processing method

Present graphs (bar charts work well) to show how each performance and fairness metric differs for the baseline model compared with the application of Post-processing. Show the target line for each metric on the graph.
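
One possible layout is a small panel per metric with a dashed target line. The sketch below uses made-up placeholder numbers (not results from this notebook) and assumes the conventional targets of 0 for difference metrics; Disparate Impact would conventionally use a target of 1.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder numbers for illustration only
summary = pd.DataFrame(
    {"Baseline": [0.10, -0.08], "Post-processed": [0.03, -0.02]},
    index=["False Negative Rate Difference", "Statistical Parity Difference"],
)
targets = {"False Negative Rate Difference": 0.0,
           "Statistical Parity Difference": 0.0}

fig, axes = plt.subplots(1, len(summary), figsize=(4 * len(summary), 3))
for ax, (metric, row) in zip(axes, summary.iterrows()):
    # one bar per model variant, plus the target value for this metric
    ax.bar(row.index, row.values, color=["tab:blue", "tab:orange"])
    ax.axhline(targets[metric], color="red", linestyle="--", label="target")
    ax.set_title(metric)
    ax.legend()
fig.tight_layout()
```

In practice the `summary` frame would be built from the per-metric means of the baseline and post-processed runs.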

In [None]:
# TODO: Present graphs to show each performance and fairness metrics

# /TODO
