[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://github.com/castillosebastian/castillosebastian_colabs/blob/main/adult_income_prediction/adult_income_prediction.ipynb)

# Fair AI: An Exploration

## EDA

We are going to test 'fair-ml' algorithms on the *Adult* dataset (Dua & Graff, 2017), also known as the "Census Income" dataset, which is used to predict whether income exceeds $50K/yr based on census data. The train dataset contains 13 features and 30178 observations; the test dataset contains 13 features and 15315 observations. The target column is "target": a binary factor where 1 means <=50K and 2 means >50K annual income. The column "sex" is set as the protected attribute.

### Libraries

In [1]:
import torch
import torch.nn as nn
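import torch.nn.functional as F
# Dataset and DataLoader are used by AdultDataset and the dataloaders below
from torch.utils.data import Dataset, DataLoader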
from tqdm.auto import tqdm
from inFairness.fairalgo import SenSeI
from inFairness import distances
from inFairness.auditor import SenSRAuditor, SenSeIAuditor
%load_ext autoreload
%autoreload 2
import metrics



In [11]:
import pandas as pd
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data'
names = [
        'age', 'workclass', 'fnlwgt', 'education', 
        'education-num', 'marital-status', 'occupation',
        'relationship', 'race', 'sex', 'capital-gain', 
        'capital-loss', 'hours-per-week', 'native-country',
        'annual-income'
    ]
data = pd.read_csv(url, sep=',', names=names)

In [12]:
data.head()

| | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | annual-income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | <=50K |
| 1 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | <=50K |
| 2 | 38 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 3 | 53 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K |
| 4 | 28 | Private | 338409 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | <=50K |


In [13]:
data['annual-income'].value_counts()

annual-income
 <=50K    24720
 >50K      7841
Name: count, dtype: int64

The dataset is imbalanced: about 24% of individuals (7841 of 32561) earn more than $50K per year. This imbalance also appears across *sex* and *race*, as shown here:

In [15]:
(imbal_sex := data.groupby(['annual-income', 'sex']).size() 
   .sort_values(ascending=False) 
   .reset_index(name='count')
   .assign(percentage = lambda df:100 * df['count']/df['count'].sum())   
   )

| | annual-income | sex | count | percentage |
|---|---|---|---|---|
| 0 | <=50K | Male | 15128 | 46.46049 |
| 1 | <=50K | Female | 9592 | 29.458555 |
| 2 | >50K | Male | 6662 | 20.46006 |
| 3 | >50K | Female | 1179 | 3.620896 |
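
The percentages above are shares of the whole dataset, so they mix group size with outcome. A minimal sketch, using only the counts from the table (the helper name `high_income_rate` is introduced here for illustration), computes the share of each sex that earns more than $50K:

```python
# Counts copied from the sex/income table above.
counts = {
    ("<=50K", "Male"): 15128,
    ("<=50K", "Female"): 9592,
    (">50K", "Male"): 6662,
    (">50K", "Female"): 1179,
}


def high_income_rate(counts, group):
    """Share of a group whose income is >50K (hypothetical helper)."""
    high = counts[(">50K", group)]
    total = high + counts[("<=50K", group)]
    return high / total


rates = {group: high_income_rate(counts, group) for group in ("Male", "Female")}
for group, rate in rates.items():
    print(f"{group}: {rate:.1%} earn >50K")
```

Within each group, roughly 31% of men and 11% of women are in the >50K class, which makes the gap easier to read than the joint percentages.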


In [16]:
(imbal_race := data.groupby(['annual-income', 'race']).size() 
   .sort_values(ascending=False) 
   .reset_index(name='count')
   .assign(percentage = lambda df:100 * df['count']/df['count'].sum())   
   )

| | annual-income | race | count | percentage |
|---|---|---|---|---|
| 0 | <=50K | White | 20699 | 63.569915 |
| 1 | >50K | White | 7117 | 21.857437 |
| 2 | <=50K | Black | 2737 | 8.405761 |
| 3 | <=50K | Asian-Pac-Islander | 763 | 2.343294 |
| 4 | >50K | Black | 387 | 1.188538 |
| 5 | >50K | Asian-Pac-Islander | 276 | 0.84764 |
| 6 | <=50K | Amer-Indian-Eskimo | 275 | 0.844569 |
| 7 | <=50K | Other | 246 | 0.755505 |
| 8 | >50K | Amer-Indian-Eskimo | 36 | 0.110562 |
| 9 | >50K | Other | 25 | 0.076779 |


# Simple Neural Network model following IBM Research

Source [inFairness](https://github.com/IBM/inFairness/blob/main/examples/adult-income-prediction/adult_income_prediction.ipynb)

In [17]:
class AdultDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __getitem__(self, idx):
        data = self.data[idx]
        label = self.labels[idx]
        return data, label
    
    def __len__(self):
        return len(self.labels)

Note that the categorical variables are transformed into one-hot variables.
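
To make the one-hot step concrete, here is a minimal pure-Python sketch. The helper `one_hot` and the toy `workclass` values are assumptions for illustration; the notebook's own `data.load_data()` is not shown, so this is not its implementation. The `prefix_category` naming mirrors the column names that appear in the processed frame.

```python
def one_hot(values, prefix):
    """Return one dict per row with a boolean column per category."""
    categories = sorted(set(values))
    return [
        {f"{prefix}_{category}": value == category for category in categories}
        for value in values
    ]


# Toy input, not taken from the dataset.
workclass = ["Private", "State-gov", "Private"]
encoded = one_hot(workclass, prefix="workclass")
for row in encoded:
    print(row)
```

In practice `pandas.get_dummies` does the same job in one call on a DataFrame.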

In [34]:
import data
train_df, test_df = data.load_data()
X_train_df, Y_train_df = train_df
X_test_df, Y_test_df = test_df

In [35]:
X_train_df.head(1)

| | age | capital-gain | capital-loss | education-num | hours-per-week | marital-status_Divorced | marital-status_Married-AF-spouse | marital-status_Married-civ-spouse | marital-status_Married-spouse-absent | marital-status_Never-married | ... | relationship_Unmarried | relationship_Wife | sex_Male | workclass_Federal-gov | workclass_Local-gov | workclass_Private | workclass_Self-emp-inc | workclass_Self-emp-not-inc | workclass_State-gov | workclass_Without-pay |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.409331 | -0.14652 | -0.218253 | -1.613806 | -0.49677 | False | False | False | False | True | ... | True | False | False | False | False | True | False | False | False | False |


In the IBM inFairness model [example](https://github.com/IBM/inFairness/blob/main/examples/adult-income-prediction/adult_income_prediction.ipynb), the protected attributes are dropped from the training and test data. That is usually the case in fairness-aware machine learning when we deal with features that we know are biased. The idea is to prevent the model from directly learning to make decisions based on these sensitive attributes, which could lead to discriminatory outcomes.

However, this approach has some limitations. Even if you remove the protected attribute, other features in the dataset might act as proxies for it, meaning they may retain a strong signal of the removed information. For example, certain occupations, neighborhoods, or education levels might be disproportionately associated with certain racial groups due to societal factors. So, even without explicit information about race, the model might still end up learning patterns that indirectly reflect racial biases.
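
One rough way to check whether a remaining feature acts as a proxy is to see how well it predicts the removed attribute on its own. The sketch below uses made-up toy records and a hypothetical helper, `proxy_accuracy`, which guesses the protected value by majority vote within each proxy value; it is an illustration of the idea, not part of the notebook's pipeline.

```python
from collections import Counter, defaultdict


def proxy_accuracy(proxy_values, protected_values):
    """Accuracy of guessing the protected attribute from the proxy alone,
    using the most common protected value for each proxy value."""
    by_value = defaultdict(Counter)
    for proxy, protected in zip(proxy_values, protected_values):
        by_value[proxy][protected] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_value.values())
    return correct / len(proxy_values)


# Toy records (made up for illustration, not sampled from the dataset).
relationship = ["Husband", "Wife", "Husband", "Not-in-family",
                "Wife", "Not-in-family", "Husband", "Own-child"]
sex = ["Male", "Female", "Male", "Male",
       "Female", "Female", "Male", "Female"]

baseline = Counter(sex).most_common(1)[0][1] / len(sex)  # guess without the proxy
with_proxy = proxy_accuracy(relationship, sex)
print(f"baseline: {baseline:.3f}, with proxy: {with_proxy:.3f}")
```

A large gap between the two numbers suggests the feature still carries much of the information that dropping the protected column was meant to hide.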

On the other hand, removing sensitive attributes makes it difficult to analyze the fairness of the model. If we don't know the race of the individuals in our dataset, we can't check whether our model is treating individuals of different races equally.

In some cases, it's important to consider sensitive attributes to ensure fairness. For example, in order to correct for historical biases or to achieve certain diversity and inclusion goals, it might be necessary to consider these attributes.

So, while removing sensitive attributes might seem like an easy fix, it doesn't necessarily solve the problem of bias and might introduce new problems. Instead, it's often better to use techniques that aim to ensure that the model treats similar individuals similarly (individual fairness), regardless of their sensitive attributes.
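
One common way to make "similar individuals are treated similarly" precise is a Lipschitz-style condition. The notation below is an assumption introduced for illustration ($f$ for the model, $d_x$ for a fair distance between individuals, $d_y$ for a distance between outputs, $L$ for a constant), not notation taken from this notebook:

$$
d_y\big(f(x_1), f(x_2)\big) \le L \, d_x(x_1, x_2) \quad \text{for all individuals } x_1, x_2 .
$$

Under this reading, two individuals who are close under $d_x$ must receive close predictions. The `distance_x_LR` and `distance_y` objects used later in the notebook play the roles of $d_x$ and $d_y$.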

In [36]:
protected_vars = ['race_White', 'sex_Male']
X_protected_df = X_train_df[protected_vars]
X_train_df = X_train_df.drop(columns=protected_vars)
X_test_df = X_test_df.drop(columns=protected_vars)

In the context of assessing individual fairness, the example we are working with implements a consistency measure using the 'spouse' attribute, encoded here as the one-hot column `relationship_Wife`. This involves flipping that column in the test set, which simulates comparing individuals with the same characteristics but different 'spouse' values. The goal is to ensure that the model's predictions are consistent for individuals who are similar except for their 'spouse' attribute, thereby upholding the principle of individual fairness. This approach provides a practical way to audit the model's fairness by checking whether similar individuals are treated similarly.

In [37]:
X_test_df.relationship_Wife.values.astype(int)

array([0, 1, 0, ..., 0, 0, 0])

In [38]:

X_test_df_spouse_flipped = X_test_df.copy()
X_test_df_spouse_flipped.relationship_Wife = 1 - X_test_df_spouse_flipped.relationship_Wife
X_test_df_spouse_flipped.relationship_Wife.values

array([1, 0, 1, ..., 1, 1, 1])
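
Given predictions on the original and the flipped test sets, a consistency score can be read as the fraction of individuals whose predicted class does not change. The sketch below assumes that reading; `flip_consistency` is a hypothetical helper, and the local `metrics.spouse_consistency` used later may be implemented differently.

```python
def flip_consistency(preds_original, preds_flipped):
    """Fraction of rows whose predicted class is unchanged by the flip."""
    if len(preds_original) != len(preds_flipped):
        raise ValueError("prediction lists must have the same length")
    unchanged = sum(a == b for a, b in zip(preds_original, preds_flipped))
    return unchanged / len(preds_original)


# Toy predictions: only the fourth individual changes class after the flip.
preds_original = [0, 1, 0, 0, 1, 0, 1, 0]
preds_flipped = [0, 1, 0, 1, 1, 0, 1, 0]
consistency = flip_consistency(preds_original, preds_flipped)
print(consistency)
```

A score of 1.0 would mean that flipping the attribute never changes a prediction.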

In [39]:
device = torch.device('cpu')

# Convert all pandas dataframes to PyTorch tensors
X_train, y_train = data.convert_df_to_tensor(X_train_df, Y_train_df)
X_test, y_test = data.convert_df_to_tensor(X_test_df, Y_test_df)
X_test_flip, y_test_flip = data.convert_df_to_tensor(X_test_df_spouse_flipped, Y_test_df)
X_protected = torch.tensor(X_protected_df.values).float()

# Create the training and testing dataset
train_ds = AdultDataset(X_train, y_train)
test_ds = AdultDataset(X_test, y_test)
test_ds_flip = AdultDataset(X_test_flip, y_test_flip)

# Create train and test dataloaders
train_dl = DataLoader(train_ds, batch_size=64, shuffle=True)
test_dl = DataLoader(test_ds, batch_size=1000, shuffle=False)
test_dl_flip = DataLoader(test_ds_flip, batch_size=1000, shuffle=False)

We test a multilayer neural network (two hidden layers of 100 units each with ReLU activations), as proposed in the IBM implementation example.

In [40]:
class Model(nn.Module):

    def __init__(self, input_size, output_size):

        super().__init__()
        self.fc1 = nn.Linear(input_size, 100)
        self.fc2 = nn.Linear(100, 100)
        self.fcout = nn.Linear(100, output_size)

    def forward(self, x):

        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fcout(x)
        return x

### Standard training

In [41]:
input_size = X_train.shape[1]
output_size = 2

network_standard = Model(input_size, output_size).to(device)
optimizer = torch.optim.Adam(network_standard.parameters(), lr=1e-3)
loss_fn = F.cross_entropy

EPOCHS = 10

In [42]:
network_standard.train()

for epoch in tqdm(range(EPOCHS)):

    for x, y in train_dl:

        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        y_pred = network_standard(x).squeeze()
        loss = loss_fn(y_pred, y)
        loss.backward()
        optimizer.step()

100%|██████████| 10/10 [00:08<00:00,  1.24it/s]


In [43]:
accuracy = metrics.accuracy(network_standard, test_dl, device)
balanced_acc = metrics.balanced_accuracy(network_standard, test_dl, device)
spouse_consistency = metrics.spouse_consistency(network_standard, test_dl, test_dl_flip, device)

print(f'Accuracy: {accuracy}')
print(f'Balanced accuracy: {balanced_acc}')
print(f'Spouse consistency: {spouse_consistency}')

Accuracy: 0.8555948734283447
Balanced accuracy: 0.7764129391420478
Spouse consistency: 0.9636222910216719


The simple NN achieves an accuracy of about 0.86. However, its spouse consistency of 0.964 suggests that the prediction changes for roughly 3.6% of test individuals when only the 'spouse' variable is flipped. The model is therefore not treating similar individuals consistently, which is a violation of individual fairness. This inconsistency could be due to the model learning to differentiate based on gender, despite the intention to avoid such bias.
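
The gap between accuracy and balanced accuracy comes from the class imbalance. A minimal sketch, assuming the usual definition of balanced accuracy as the mean of per-class recall (the local `metrics.balanced_accuracy` is not shown, so this may differ from it), with toy labels:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean recall over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Toy labels: 8 majority-class rows, 2 minority-class rows, one minority miss.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [1, 0]
acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
print(acc, bal_acc)
```

A single error on the minority class barely moves accuracy but halves that class's recall, which is why balanced accuracy is the more informative number on imbalanced data.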

## Individually fair training with LogReg fair metric

In the following section, a fair machine learning model is introduced. This model is said to be fair because its performance remains consistent under certain perturbations within a sensitive subspace, meaning it is robust to partial data variations.

To illustrate the authors' approach, let us consider the process of evaluating the fairness of a resume screening system. An auditor might alter the names on resumes of Caucasian applicants to those more commonly found among the African-American population. If the system's performance declines upon reviewing the altered resumes (i.e., the evaluations become less favorable), one could infer that the model exhibits bias against African-American applicants.

To address this issue algorithmically, the authors propose a method to instill individual fairness during the training of ML models. This is achieved through *distributionally robust optimization* (DRO), an optimization technique, inspired by adversarial robustness, that seeks the optimal solution while considering a fairness metric. The implementation is designed to ensure that the ML model maintains fairness not only on the training and testing data but also on new data from the same distribution.

## Learning fair metric from data and its hidden signals

The authors use Wasserstein distances to measure the similarity between individuals. Unlike Mahalanobis, Wasserstein distance can be used to compare two probability distributions and is defined as the minimum cost that must be paid to transform one distribution into the other.  The distances between data points are calculated in a way that takes into account protected attributes (in our example: gender or race). The goal is to ensure that similar individuals, as determined by Wasserstein distance, are treated similarly by the machine learning model.
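
As a small illustration of the "minimum cost to transform one distribution into the other" definition, the sketch below computes the 1-Wasserstein distance between two one-dimensional samples of equal size, where the optimal transport plan simply matches sorted values. The helper `wasserstein_1d` and the toy samples are assumptions for illustration; this is not how inFairness computes its distances.

```python
def wasserstein_1d(sample_a, sample_b):
    """1-Wasserstein distance between two equal-size 1-D samples.

    In one dimension the cheapest plan pairs the i-th smallest value of one
    sample with the i-th smallest value of the other.
    """
    if len(sample_a) != len(sample_b):
        raise ValueError("this sketch only handles equal-size samples")
    pairs = zip(sorted(sample_a), sorted(sample_b))
    return sum(abs(a - b) for a, b in pairs) / len(sample_a)


# Shifting every point by 1 costs exactly 1 per unit of mass.
shifted = wasserstein_1d([0.0, 1.0, 2.0], [3.0, 1.0, 2.0])
same = wasserstein_1d([0.0, 1.0, 2.0], [2.0, 0.0, 1.0])
print(shifted, same)
```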

To achieve this, the algorithm learns 'sensitive directions' in the data. These are directions in the feature space along which changes are likely to correspond to changes in protected attributes. This is a clever way to uncover hidden biases: it identifies subtle patterns that track protected attributes even when those attributes are not among the model inputs, so the model can account for biases that might otherwise go unnoticed.

For instance, to identify a sensitive direction associated with a particular attribute (e.g., gender), the algorithm uses a logistic regression classifier to distinguish between classes (such as men and women in the data). The coefficients from this logistic regression model define a direction within the feature space. The algorithm then disregards these 'sensitive directions' while calculating the fair metric. This ensures that differences along sensitive directions are not factored into the distance computation between two individuals. The purpose of this approach is to prevent the machine learning model from discriminating based on protected attributes.

So, by fitting the logistic regression model, the algorithm also learns which features are good predictors of the protected attribute (in this case, gender). This allows it to identify and avoid using features that could lead to unfair discrimination.
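
The idea can be sketched in a few lines of NumPy. This is a simplified illustration on synthetic data, not the internals of `distances.LogisticRegSensitiveSubspace`: the helper names, the gradient-descent fit, and the single-direction projection are all assumptions made for the example.

```python
import numpy as np


def fit_sensitive_direction(X, s, lr=0.1, steps=500):
    """Fit logistic regression by gradient descent; return the unit weight vector."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(s = 1)
        w -= lr * (X.T @ (p - s)) / len(s)
        b -= lr * np.mean(p - s)
    return w / np.linalg.norm(w)


def fair_distance(x1, x2, direction):
    """Squared distance after projecting out the sensitive direction."""
    projector = np.eye(len(direction)) - np.outer(direction, direction)
    diff = x1 - x2
    return float(diff @ projector @ diff)


# Synthetic data: feature 0 is a proxy for the protected attribute s,
# feature 1 is unrelated noise.
rng = np.random.default_rng(0)
n = 400
s = rng.integers(0, 2, size=n).astype(float)
X = np.column_stack([s + 0.1 * rng.normal(size=n), rng.normal(size=n)])
X = X - X.mean(axis=0)

direction = fit_sensitive_direction(X, s)
origin = np.zeros(2)
orthogonal = np.array([-direction[1], direction[0]])

d_sensitive = fair_distance(origin, origin + direction, direction)
d_other = fair_distance(origin, origin + orthogonal, direction)
print(direction, d_sensitive, d_other)
```

Two points that differ only along the learned direction end up at (numerically) zero distance, so a model trained to respect this metric is pushed to treat them alike, while differences in other directions still count in full.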

In [44]:
# Same architecture as the standard model
network_fair_LR = Model(input_size, output_size).to(device)
optimizer = torch.optim.Adam(network_fair_LR.parameters(), lr=1e-3)
lossfn = F.cross_entropy

# set the distance metrics used to detect similar instances
distance_x_LR = distances.LogisticRegSensitiveSubspace()
distance_y = distances.SquaredEuclideanDistance()

# train fair metric
distance_x_LR.fit(X_train, data_SensitiveAttrs=X_protected)
distance_y.fit(num_dims=output_size)

distance_x_LR.to(device)
distance_y.to(device)

In [45]:
rho = 5.0
eps = 0.1
auditor_nsteps = 100
auditor_lr = 1e-3

fairalgo_LR = SenSeI(network_fair_LR, distance_x_LR, distance_y, lossfn, rho, eps, auditor_nsteps, auditor_lr)

## A fair objective function

The objective function that is minimized during the training of a fair machine learning model as proposed in the inFairness package is composed of two parts: the loss function and the fair metric (see [SenSeI](https://ibm.github.io/inFairness/_modules/inFairness/fairalgo/sensei.html#SenSeI)): 

In [None]:
fair_loss = torch.mean(
            #--------1------------- + -----------------2-----------------------------# 
            self.loss_fn(Y_pred, Y) + self.rho * self.distance_y(Y_pred, Y_pred_worst)
        )

1. Loss Function: a classical loss function that measures how well the model's predictions match the actual data. Training adjusts the model's parameters to minimize this term, and
2. Fair Metric (DIF): the fairness term measures the difference between the model's predictions on the original data and its predictions on the worst-case examples, weighted by `rho`.

The model is trying to minimize this objective function, which means it's trying to make accurate and fair predictions.
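
Written out, the objective in the snippet above has roughly the following form. The symbols are assumptions introduced for illustration: $f_\theta$ is the network, $\ell$ is `loss_fn`, $d_y$ is `distance_y`, $\rho$ is `rho`, and $x_i'$ is the worst-case counterpart of $x_i$ found by the auditor, so that `Y_pred_worst` corresponds to $f_\theta(x_i')$:

$$
\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \Big[ \ell\big(f_\theta(x_i), y_i\big) + \rho \, d_y\big(f_\theta(x_i), f_\theta(x_i')\big) \Big]
$$

The auditor picks each $x_i'$ to make the second term large while keeping it close to $x_i$ under the fair metric $d_x$. How `eps` bounds that closeness is not shown in the snippet, so it is left out of the formula.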

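It's important to note that training becomes much more resource-intensive: for every batch, the auditor runs an inner optimization (`auditor_nsteps` steps) to find worst-case examples before the loss can be computed. In this notebook, 10 epochs took about 10 minutes, compared with about 8 seconds for standard training.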

In [47]:
fairalgo_LR.train()

for epoch in tqdm(range(EPOCHS)):
    for x, y in train_dl:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        result = fairalgo_LR(x, y)
        result.loss.backward()
        optimizer.step()

100%|██████████| 10/10 [10:09<00:00, 60.90s/it]


In [48]:
accuracy = metrics.accuracy(network_fair_LR, test_dl, device)
balanced_acc = metrics.balanced_accuracy(network_fair_LR, test_dl, device)
spouse_consistency = metrics.spouse_consistency(network_fair_LR, test_dl, test_dl_flip, device)

print(f'Accuracy: {accuracy}')
print(f'Balanced accuracy: {balanced_acc}')
print(f'Spouse consistency: {spouse_consistency}')

Accuracy: 0.8401150107383728
Balanced accuracy: 0.742399333699871
Spouse consistency: 0.9997788589119858


#### Let's now audit the models and check for their individual fairness compliance

In [50]:
# Auditing using the SenSR Auditor + LR metric

audit_nsteps = 1000
audit_lr = 0.1

auditor_LR = SenSRAuditor(loss_fn=loss_fn, distance_x=distance_x_LR, num_steps=audit_nsteps, lr=audit_lr, max_noise=0.5, min_noise=-0.5)

audit_result_stdmodel = auditor_LR.audit(network_standard, X_test, y_test, lambda_param=10.0, audit_threshold=1.15)
audit_result_fairmodel_LR = auditor_LR.audit(network_fair_LR, X_test, y_test, lambda_param=10.0, audit_threshold=1.15)
print("="*100)
print("LR metric")
print(f"Loss ratio (Standard model) : {audit_result_stdmodel.lower_bound}. Is model fair: {audit_result_stdmodel.is_model_fair}")
print(f"Loss ratio (fair model - LogReg metric) : {audit_result_fairmodel_LR.lower_bound}. Is model fair: {audit_result_fairmodel_LR.is_model_fair}")
print("-"*100)
print("\t As signified by these numbers, the fair models are fairer than the standard model")
print("="*100)



LR metric
Loss ratio (Standard model) : 2.1810670575586046. Is model fair: False
Loss ratio (fair model - LogReg metric) : 1.0531351204682995. Is model fair: True
----------------------------------------------------------------------------------------------------
	 As signified by these numbers, the fair models are fairer than the standard model
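
To see how a loss ratio and a threshold such as `audit_threshold=1.15` can be turned into a fair/unfair verdict, here is a simplified sketch. It assumes the audit compares a one-sided lower confidence bound on the mean per-sample ratio of adversarial to original loss against the threshold; the helper `audit_from_losses`, the `z` value, and the toy losses are assumptions, and the actual `SenSRAuditor.audit` logic may differ.

```python
import math
import statistics


def audit_from_losses(loss_original, loss_adversarial, threshold=1.15, z=1.645):
    """Return (lower_bound, is_fair) from per-sample losses (hypothetical helper)."""
    ratios = [adv / orig for orig, adv in zip(loss_original, loss_adversarial)]
    mean = statistics.fmean(ratios)
    std = statistics.stdev(ratios)
    # One-sided lower confidence bound on the mean loss ratio.
    lower_bound = mean - z * std / math.sqrt(len(ratios))
    return lower_bound, lower_bound <= threshold


# Toy per-sample losses (made up): worst-case inputs barely hurt the first
# model but roughly double the loss of the second.
loss_original = [0.5, 0.4, 0.6, 0.5]
robust_bound, robust_is_fair = audit_from_losses(loss_original, [0.52, 0.41, 0.63, 0.5])
fragile_bound, fragile_is_fair = audit_from_losses(loss_original, [1.1, 0.9, 1.5, 1.0])
print(robust_bound, robust_is_fair)
print(fragile_bound, fragile_is_fair)
```

A ratio near 1 means the worst-case perturbations along sensitive directions barely change the loss, which is the pattern the fair model shows in the output above.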


# Further explorations

### Individually fair training with EXPLORE metric

In [49]:
Y_gender = X_protected[:, -1]
X1, X2, Y_pairs = data.create_data_pairs(X_train, y_train, Y_gender)

distance_x_explore = distances.EXPLOREDistance()
distance_x_explore.fit(X1, X2, Y_pairs, iters=1000, batchsize=10000)
distance_x_explore.to(device)



In [32]:
network_fair_explore = Model(input_size, output_size).to(device)
optimizer = torch.optim.Adam(network_fair_explore.parameters(), lr=1e-3)
lossfn = F.cross_entropy

rho = 25.0
eps = 0.1
auditor_nsteps = 10
auditor_lr = 1e-2

fairalgo_explore = SenSeI(network_fair_explore, distance_x_explore, distance_y, lossfn, rho, eps, auditor_nsteps, auditor_lr)

In [33]:
fairalgo_explore.train()

for epoch in tqdm(range(EPOCHS)):
    for x, y in train_dl:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        result = fairalgo_explore(x, y)
        result.loss.backward()
        optimizer.step()

100%|██████████| 10/10 [01:17<00:00,  7.71s/it]


In [16]:
accuracy = metrics.accuracy(network_fair_explore, test_dl, device)
balanced_acc = metrics.balanced_accuracy(network_fair_explore, test_dl, device)
spouse_consistency = metrics.spouse_consistency(network_fair_explore, test_dl, test_dl_flip, device)

print(f'Accuracy: {accuracy}')
print(f'Balanced accuracy: {balanced_acc}')
print(f'Spouse consistency: {spouse_consistency}')

Accuracy: 0.8225342631340027
Balanced accuracy: 0.7034968962244807
Spouse consistency: 1.0


#### Let's now audit the three models and check for their individual fairness compliance

In [17]:
# Auditing using the SenSR Auditor + LR metric

audit_nsteps = 1000
audit_lr = 0.1

auditor_LR = SenSRAuditor(loss_fn=loss_fn, distance_x=distance_x_LR, num_steps=audit_nsteps, lr=audit_lr, max_noise=0.5, min_noise=-0.5)

audit_result_stdmodel = auditor_LR.audit(network_standard, X_test, y_test, lambda_param=10.0, audit_threshold=1.15)
audit_result_fairmodel_LR = auditor_LR.audit(network_fair_LR, X_test, y_test, lambda_param=10.0, audit_threshold=1.15)
audit_result_fairmodel_explore = auditor_LR.audit(network_fair_explore, X_test, y_test, lambda_param=10.0, audit_threshold=1.15)

print("="*100)
print("LR metric")
print(f"Loss ratio (Standard model) : {audit_result_stdmodel.lower_bound}. Is model fair: {audit_result_stdmodel.is_model_fair}")
print(f"Loss ratio (fair model - LogReg metric) : {audit_result_fairmodel_LR.lower_bound}. Is model fair: {audit_result_fairmodel_LR.is_model_fair}")
print(f"Loss ratio (fair model - EXPLORE metric) : {audit_result_fairmodel_explore.lower_bound}. Is model fair: {audit_result_fairmodel_explore.is_model_fair}")
print("-"*100)
print("\t As signified by these numbers, the fair models are fairer than the standard model")
print("="*100)



LR metric
Loss ratio (Standard model) : 2.660843321695338. Is model fair: False
Loss ratio (fair model - LogReg metric) : 1.0569381426130153. Is model fair: True
Loss ratio (fair model - EXPLORE metric) : 1.027237853086672. Is model fair: True
----------------------------------------------------------------------------------------------------
	 As signified by these numbers, the fair models are fairer than the standard model


In [18]:
# Auditing using the SenSR Auditor + EXPLORE metric

audit_nsteps = 1000
audit_lr = 0.1

auditor_explore = SenSRAuditor(loss_fn=loss_fn, distance_x=distance_x_explore, num_steps=audit_nsteps, lr=audit_lr, max_noise=0.5, min_noise=-0.5)

audit_result_stdmodel = auditor_explore.audit(network_standard, X_test, y_test, lambda_param=10.0, audit_threshold=1.15)
audit_result_fairmodel_LR = auditor_explore.audit(network_fair_LR, X_test, y_test, lambda_param=10.0, audit_threshold=1.15)
audit_result_fairmodel_explore = auditor_explore.audit(network_fair_explore, X_test, y_test, lambda_param=10.0, audit_threshold=1.15)

print("="*100)
print("EXPLORE metric")
print(f"Loss ratio (Standard model) : {audit_result_stdmodel.lower_bound}. Is model fair: {audit_result_stdmodel.is_model_fair}")
print(f"Loss ratio (fair model - LogReg metric) : {audit_result_fairmodel_LR.lower_bound}. Is model fair: {audit_result_fairmodel_LR.is_model_fair}")
print(f"Loss ratio (fair model - EXPLORE metric) : {audit_result_fairmodel_explore.lower_bound}. Is model fair: {audit_result_fairmodel_explore.is_model_fair}")
print("-"*100)
print("\t As signified by these numbers, the fair models are fairer than the standard model")
print("="*100)

EXPLORE metric
Loss ratio (Standard model) : 4.319292031489292. Is model fair: False
Loss ratio (fair model - LogReg metric) : 1.13404923721215. Is model fair: True
Loss ratio (fair model - EXPLORE metric) : 1.0724120239938144. Is model fair: True
----------------------------------------------------------------------------------------------------
	 As signified by these numbers, the fair models are fairer than the standard model
