## PIL TEST - IMPLEMENTATION

Here we will see the implementation of the [PIL Test](Fairness-New-Definitions.ipynb) on the [Adult Dataset](https://archive.ics.uci.edu/ml/datasets/adult).

### IMPORTNG THE NECESSARY LIBRARIES

We will be using the open-source [AIF360](https://github.com/Trusted-AI/AIF360) package to use several fairness based metrics.

In [7]:
import numpy as np
np.set_printoptions(suppress = True)
import pandas as pd

import matplotlib.pyplot as plt

# Importing the Dataset
from aif360.datasets import AdultDataset
from aif360.algorithms.preprocessing.optim_preproc_helpers.data_preproc_functions import load_preproc_data_adult

from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric, ClassificationMetric

from aif360.metrics.utils import compute_boolean_conditioning_vector
from common_utils import compute_metrics

from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

import torch
import torch.nn as nn

import pickle

### DATASET

In [2]:
priv_group = [{'sex':1}]
unpriv_group = [{'sex':0}]

In [3]:
data_adult = load_preproc_data_adult(['sex'])

### LOADING PRE-TRAINED MODELS

We trained two models simple logistic regression and equalized odds regularized logistic regression [here](Fairness-Regularization.ipynb). So, we load these models for testing our new definition of fairness.

In [123]:
class Log_Reg(nn.Module):
    def __init__(self, size_in):
        super().__init__()
        self.linear = nn.Linear(size_in, 1)
    def forward(self, x):
        prob_pred = torch.sigmoid(self.linear(x))
        return prob_pred

We load the test data on which the models are not trained so as to test out our new definition.

In [124]:
with open("raw-test-data.bin", "rb") as input:
    dset_raw_tst = pickle.load(input)
    
with open("trained-std-scaler.bin", "rb") as input:
    scaler = pickle.load(input)

In [125]:
M = Log_Reg(len(dset_raw_tst.feature_names)) # Non-Fairness Based Regularized Model
M_F = Log_Reg(len(dset_raw_tst.feature_names)) # Fairness Based Regularization Model

In [126]:
M.load_state_dict(torch.load("simple-logistic-regression.pt"))
M_F.load_state_dict(torch.load("equalized-odd-regualarized-logistic-regression.pt"))

M.eval()
M_F.eval()

Log_Reg(
  (linear): Linear(in_features=18, out_features=1, bias=True)
)

### PREDICTIONS OF THE MODELS

Here we get the predicted values of both the models. Now, we will proceed to use these predicted values as the feature for training the model $M'$ in both cases, as suggested by our new definition.

In [127]:
dset_tst_pred_M = dset_raw_tst.copy(deepcopy=True)
dset_tst_pred_M_F = dset_raw_tst.copy(deepcopy=True)
dset_tst = scaler.transform(dset_tst_pred.features)
dset_tst_pred_M.labels = (M(torch.from_numpy(dset_tst).float()) > 0.5).numpy().astype(float)
dset_tst_pred_M_F.labels = (M_F(torch.from_numpy(dset_tst).float()) > 0.5).numpy().astype(float)

We now prepare the data for training the **adversarial** models for predicting the protected group membership from the remaining features of the test data (and the predicted labels of the original model), as per the [PIL Test](Fairness-New-Definitions.ipynb).

In [128]:
# y is the target feature in this case i.e. protected group- "sex"
y = dset_tst_pred_M.features[:,1].reshape((-1, 1))
# replacing the protected "sex" column with model M's predicted values
dset_tst_pred_M.features[:,1] = dset_tst_pred_M.labels.ravel()
# replacing the protected "sex" column with model M_F's predicted values
dset_tst_pred_M_F.features[:,1] = dset_tst_pred_M_F.labels.ravel()
# Features except the predicted outcome and "sex"
dset_other_features = dset_raw_tst.features[:,[0]+list(range(2,len(dset_tst_pred_M.feature_names)))]

In [129]:
# scaling the new matrix formed by replacing the "sex" column by the outcome
# predicted by the respective models
scaler_M = StandardScaler()
dset_tst_M = scaler_M.fit_transform(dset_tst_pred_M.features)

scaler_M_F = StandardScaler()
dset_tst_M_F = scaler_M_F.fit_transform(dset_tst_pred_M_F.features)

In [130]:
dset_tst_M = torch.from_numpy(dset_tst_M).float()
dset_tst_M_F = torch.from_numpy(dset_tst_M_F).float()
dset_other_features = torch.from_numpy(dset_other_features).float()
y = torch.from_numpy(y).float()

Here we will train our model $M'$ for the case of $M$'s predicted value containing dataset and $M_F$'s predicted value containing dataset and $M''$.

In [131]:
adv_mod_pred_M = Log_Reg(len(dset_tst_pred_M.feature_names))
adv_mod_wo_pred = Log_Reg((len(dset_tst_pred_M.feature_names)-1))
adv_mod_pred_M_F = Log_Reg(len(dset_tst_pred_M_F.feature_names))

In [132]:
num_epochs = 5000 # Number of Epochs
learning_rate = 0.01 # Learning Rate

# Stochastic Gradient Descent Optimizers
optimizer_M = torch.optim.SGD(adv_mod_pred_M.parameters(), lr= learning_rate)
optimizer_M_F = torch.optim.SGD(adv_mod_pred_M_F.parameters(), lr= learning_rate)
optimizer_wo_pred = torch.optim.SGD(adv_mod_wo_pred.parameters(), lr= learning_rate)

# Binary Cross Entropy Loss Functions
criterion = nn.BCELoss()

In [133]:
# Training adv_mod_wo_pred
print("Training adv_mod_wo_pred:")
adv_mod_wo_pred.train()
for epoch in range(num_epochs):
    p_pred = adv_mod_wo_pred(dset_other_features)
    loss= criterion(p_pred, y)
    
    loss.backward()
    optimizer_wo_pred.step()
    
    optimizer_wo_pred.zero_grad()
    
    if (epoch+1) % 500== 0:
        print(f'epoch: {epoch+1}, loss = {loss.item():.4f}')
        
print("Trained adv_mod_wo_pred's Performance on Data:")        
with torch.no_grad():
    y_pred = (p_pred > 0.5).numpy().astype(float)
    accuracy_wo_pred = (np.sum(np.sum(np.array(y_pred.ravel()) == np.array(y.ravel())))/len(y.ravel()))*100
    print("Accuracy: ", accuracy_wo_pred)
    

Training adv_mod_wo_pred:
epoch: 500, loss = 0.3765
epoch: 1000, loss = 0.3181
epoch: 1500, loss = 0.2800
epoch: 2000, loss = 0.2534
epoch: 2500, loss = 0.2341
epoch: 3000, loss = 0.2198
epoch: 3500, loss = 0.2089
epoch: 4000, loss = 0.2002
epoch: 4500, loss = 0.1932
epoch: 5000, loss = 0.1873
Trained adv_mod_wo_pred's Performance on Data:
Accuracy:  90.95065856821128


We see above that the model $M''$ which is trained on all the other features of the test set except the predictions of $M$ or $M_F$ is achieving $90.5\%$ accuracy. The accuracy is high indicating the presence of some proxy variables for the protected group.

In [134]:
# Training adv_mod_pred_M
print("Training adv_mod_pred_M:")
adv_mod_pred_M.train()
for epoch in range(num_epochs):
    p_pred = adv_mod_pred_M(dset_tst_M)
    loss= criterion(p_pred, y)
    
    loss.backward()
    optimizer_M.step()
    
    optimizer_M.zero_grad()
    
    if (epoch+1) % 500== 0:
        print(f'epoch: {epoch+1}, loss = {loss.item():.4f}')
        
print("Trained adv_mod_pred_M's Performance on Data:")        
with torch.no_grad():
    y_pred = (p_pred > 0.5).numpy().astype(float)
    accuracy_m = (np.sum(np.sum(np.array(y_pred.ravel()) == np.array(y.ravel())))/len(y.ravel()))*100
    print("Accuracy: ", accuracy_m)
    print("PIL value: ", (accuracy_m - accuracy_wo_pred)/(100 - accuracy_wo_pred)*100)

Training adv_mod_pred_M:
epoch: 500, loss = 0.1874
epoch: 1000, loss = 0.1030
epoch: 1500, loss = 0.0702
epoch: 2000, loss = 0.0529
epoch: 2500, loss = 0.0424
epoch: 3000, loss = 0.0353
epoch: 3500, loss = 0.0302
epoch: 4000, loss = 0.0263
epoch: 4500, loss = 0.0234
epoch: 5000, loss = 0.0210
Trained adv_mod_pred_M's Performance on Data:
Accuracy:  100.0
PIL value:  100.0


We see above the trained $M'$ model for model $M$ and it is starking that we predicted back the gender with $100\%$ accuracy, whereas our initial accuracy was $90.5\%$. This suggests that the model's unfair behavior. The PIL value of perfect $100\%$ is a clear indication that our model is very unfair.

In [135]:
# Training adv_mod_pred_M_F
print("Training adv_mod_pred_M_F:")
adv_mod_pred_M_F.train()
for epoch in range(num_epochs):
    p_pred = adv_mod_pred_M(dset_tst_M_F)
    loss= criterion(p_pred, y)
    
    loss.backward()
    optimizer_M_F.step()
    
    optimizer_M_F.zero_grad()
    
    if (epoch+1) % 500== 0:
        print(f'epoch: {epoch+1}, loss = {loss.item():.4f}')
        
print("Trained adv_mod_pred_M_F's Performance on Data:")        
with torch.no_grad():
    y_pred = (p_pred > 0.5).numpy().astype(float)
    accuracy_m_f = (np.sum(np.sum(np.array(y_pred.ravel()) == np.array(y.ravel())))/len(y.ravel()))*100
    print("Accuracy: ", accuracy_m_f)
    print("PIL value: ", (accuracy_m_f - accuracy_wo_pred)/(100 - accuracy_wo_pred)*100)

Training adv_mod_pred_M_F:
epoch: 500, loss = 0.1885
epoch: 1000, loss = 0.1885
epoch: 1500, loss = 0.1885
epoch: 2000, loss = 0.1885
epoch: 2500, loss = 0.1885
epoch: 3000, loss = 0.1885
epoch: 3500, loss = 0.1885
epoch: 4000, loss = 0.1885
epoch: 4500, loss = 0.1885
epoch: 5000, loss = 0.1885
Trained adv_mod_pred_M_F's Performance on Data:
Accuracy:  94.91571691803726
PIL value:  43.81598793363503


We see above the trained $M'$ model for model $M_F$ and we predicted back the gender with $94.9\%$ accuracy, whereas our initial accuracy was $90.5\%$. This suggests that the model is much fairer. One interesting thing to notice here that regularization deliberately increases the loss a bit and prevents it from becoming very close to $0$ when predicting back protected variable, thereby it holds out from revealing or letting the protected variable from influencing the outcome way too much such that it does not becomes unfair. The PIL value of $\sim 43\%$ indicates that our model $M_F$ is fair as per our new definition.