# Goal: Fairness Metrics Computation
Given:
- a dataset D
- a target variable
- a sensitive attribute
- a legitimate attribute (needed only for one specific metric)

this notebook computes the statistical fairness metrics presented in the Verma and Rubin paper https://dl.acm.org/doi/pdf/10.1145/3194770.3194776 and the Aequitas metrics https://arxiv.org/pdf/1811.05577.pdf.

The running example is the Credit Score dataset https://www.kaggle.com/datasets/kabure/german-credit-data-with-risk



# Import Libraries and Requirements





In [1]:
try:
  from google.colab import drive
  drive.mount('/content/drive')
  import sys
  path_to_project = '/content/drive/MyDrive/FairAlgorithm'
  sys.path.append(path_to_project)
  !sudo apt install libcairo2-dev pkg-config python3-dev
  IN_COLAB = True
except ImportError:  # google.colab can only be imported inside Colab
  IN_COLAB = False

Mounted at /content/drive

In [2]:
#import libraries
import numpy as np
import pandas as pd
np.random.seed(0)
#from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_validate,cross_val_score,cross_val_predict,train_test_split,StratifiedKFold
from sklearn.metrics import classification_report
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier, AdaBoostClassifier, BaggingClassifier)
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

In [3]:
dataset_path = path_to_project + '/data/preprocessed/preprocessed-diabetes-women.csv' if IN_COLAB else 'data/preprocessed/preprocessed-diabetes-women.csv'
df=pd.read_csv(dataset_path)
ignore_cols = ['Age']
df = df.drop(columns=ignore_cols)

feature_cols= df.columns
target_variable = 'Outcome'
target_variable_labels= ['No Diabetes','Diabetes']
sensible_attribute = 'AgeCategory'
#0 is the discriminated group, 1 the privileged one
sensible_values = [0, 1]


## Load the dataset

In [4]:
df.head(10)

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Outcome,AgeCategory
0,6,148,72,35,0,33.6,0.627,1,1
1,1,85,66,29,0,26.6,0.351,0,1
2,8,183,64,0,0,23.3,0.672,1,1
3,1,89,66,23,94,28.1,0.167,0,0
4,0,137,40,35,168,43.1,2.288,1,1
5,5,116,74,0,0,25.6,0.201,0,1
6,3,78,50,32,88,31.0,0.248,1,1
7,4,110,92,0,0,37.6,0.191,0,1
8,10,168,74,0,0,38.0,0.537,1,1
9,10,139,80,0,0,27.1,1.441,0,1


In [5]:
len(df)

677

# Apply Machine Learning Classification Algorithms

## Cross validation approach

We first save the index labels of the two groups, privileged and discriminated, in two lists.

In [6]:
sensible_indexes=df[sensible_attribute]
#save the two groups indexes
discriminated = []
privileged = []
for idx, i in enumerate(sensible_indexes):
  if i==sensible_values[0]:
    discriminated.append(sensible_indexes.index[idx])
  else:
    privileged.append(sensible_indexes.index[idx])
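The same two index lists can also be built without an explicit loop. The sketch below uses boolean masks on a small made-up frame (`toy_df` and its values are hypothetical, not the notebook's data):

```python
import pandas as pd

# Toy stand-in for the notebook's df; the values are invented for illustration.
toy_df = pd.DataFrame(
    {"AgeCategory": [1, 0, 1, 0, 1], "Outcome": [1, 0, 0, 1, 1]},
    index=[10, 11, 12, 13, 14],
)
sensible_attribute = "AgeCategory"
sensible_values = [0, 1]  # 0 = discriminated group, 1 = privileged group

# Boolean mask: True where the row belongs to the discriminated group.
is_discriminated = toy_df[sensible_attribute] == sensible_values[0]

# Index labels of each group as plain lists, like the loop produces.
discriminated_idx = toy_df.index[is_discriminated].tolist()
privileged_idx = toy_df.index[~is_discriminated].tolist()

print(discriminated_idx, privileged_idx)
```

As in the loop, every row whose value is not `sensible_values[0]` ends up in the privileged list.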

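`y_privileged` is the part of the dataset where the sensitive attribute equals 1 (for example `AgeCategory` = 1); likewise, `y_discriminated` is the part of the dataset where it equals 0. Despite the `y_` prefix, both hold full rows; the target column is then extracted into `y_test_privileged` and `y_test_discriminated`.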

In [7]:
y_discriminated= df.loc[list(discriminated)]
y_privileged= df.loc[list(privileged)]
print(len(y_privileged), len(y_discriminated))

y_test_discriminated = y_discriminated[target_variable]
y_test_privileged = y_privileged[target_variable]
y_test_discriminated

441 236


3      0
21     0
26     0
38     1
40     0
      ..
652    0
660    1
662    0
669    0
676    0
Name: Outcome, Length: 236, dtype: int64

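Apply the ML models using cross-validation: `cross_val_predict` returns, for every row, the prediction of a model trained on the folds that do not contain that row, so the predictions cover the whole dataset.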

In [8]:
n_estimators = 30
random_seed = 1234

models = {'Logistic Regression':LogisticRegression(),
          'Decision Tree':DecisionTreeClassifier(max_depth=None),
          'Bagging':BaggingClassifier(DecisionTreeClassifier(max_depth=3),n_estimators=n_estimators),
          'Random Forest':RandomForestClassifier(n_estimators=n_estimators),
          'Extremely Randomized Trees':ExtraTreesClassifier(n_estimators=n_estimators),
          'Ada Boost':AdaBoostClassifier(DecisionTreeClassifier(max_depth=3),n_estimators=n_estimators)}

scores = {}
predicted_values = {}
# Features, target and fold definition are the same for every model, so build them once
X = df.drop(target_variable, axis=1)
y = df[target_variable].values
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=random_seed)

for model_name, clf in models.items():
    # Out-of-fold prediction for every row of the dataset
    predicted_values[model_name] = cross_val_predict(clf, X, y, cv=cv)

    #score = cross_val_score(clf,X,y,cv=StratifiedKFold(n_splits=10,shuffle=True,random_state=random_seed))
    #scores[model_name] = [round(np.average(score), 5), round(np.std(score), 5)]

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
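The warning above suggests either raising `max_iter` or scaling the data. A minimal sketch of both, assuming a scikit-learn pipeline is acceptable here; the synthetic data (`X_demo`, `y_demo`) is a stand-in and not the notebook's dataset:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data, used only to make the sketch runnable.
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

# The scaler is fitted inside each training fold, so no statistics
# from the held-out fold leak into training.
scaled_logreg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1234)
demo_pred = cross_val_predict(scaled_logreg, X_demo, y_demo, cv=cv)

print(demo_pred.shape)
```

Applying this to the notebook's models would change the predictions, and therefore the metric values, so it is shown only as an option.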

Build the confusion matrices (one for the privileged group, one for the discriminated group) for each model

In [9]:
confusion_matrices = {}
for model_name in predicted_values.keys():
  temp_dict = {}
  #print(model_name)
  #print(classification_report(df[target_variable], predicted_values[model_name]))

  # Align the predictions with df's index so that each group can be selected by label
  y_pred = pd.Series(predicted_values[model_name], index=df.index, name='y_pred')

  y_pred_discriminated = y_pred.loc[discriminated]
  y_pred_privileged = y_pred.loc[privileged]

  # labels=[0, 1] keeps each matrix 2x2 even if a group lacks one of the classes
  cm_discriminated = confusion_matrix(y_test_discriminated, y_pred_discriminated, labels=[0, 1])
  cm_privileged = confusion_matrix(y_test_privileged, y_pred_privileged, labels=[0, 1])
  temp_dict['discriminated'] = cm_discriminated
  temp_dict['privileged'] = cm_privileged
  confusion_matrices[model_name] = temp_dict

#confusion_matrices

In [10]:
#retrieve values from confusion matrix cm (scikit-learn layout: rows = actual class, columns = predicted class)
def retrieve_values(cm):
  TP = cm[1][1]
  TN = cm[0][0]
  FN = cm[1][0]
  FP = cm[0][1]
  return TP, TN, FP, FN
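The indexing in `retrieve_values` relies on scikit-learn's layout for binary problems, where rows are actual classes and columns are predicted classes. A small check on invented labels, using `ravel()` as an alternative way to unpack the four counts:

```python
from sklearn.metrics import confusion_matrix

# Invented labels: 3 actual negatives, 4 actual positives.
y_true = [0, 0, 0, 1, 1, 1, 1]
y_hat = [0, 1, 0, 1, 1, 0, 1]

cm = confusion_matrix(y_true, y_hat, labels=[0, 1])

# For a 2x2 matrix, ravel() yields the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = cm.ravel()

print(cm)
print(tn, fp, fn, tp)
```

Here `cm[1][1]` is the true-positive count and `cm[0][1]` the false-positive count, matching the positions used in `retrieve_values`.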

## Computation of fairness metrics

### Terminology:
- d is the predicted value
- Y is the actual value in the dataset
- G is the protected attribute (priv = privileged group, discr = discriminated group)
- L is the legitimate attribute (only for Conditional Statistical Parity)

### Fairness Metrics List:
##### 1. Group Fairness: (d=1|G=priv) = (d=1|G=discr)
##### 2. Predictive Parity: (Y=1|d=1,G=priv) = (Y=1|d=1,G=discr)
##### 3. Predictive Equality: (d=1|Y=0,G=priv) = (d=1|Y=0,G=discr)
##### 4. Equal Opportunity:  (d=0|Y=1,G=priv) = (d=0|Y=1,G=discr)
##### 5. Equalized Odds: (d=1|Y=i,G=priv) = (d=1|Y=i,G=discr), i ∈ {0,1}
##### 6. ConditionalUseAccuracyEquality: (Y=1|d=1, G=priv) = (Y=1|d=1,G=discr) and (Y=0|d=0,G=priv) = (Y=0|d=0,G=discr)
##### 7. Overall Accuracy Equality: (d=Y, G=priv) = (d=Y, G=discr)
##### 8. Treatment Equality: (Y=1, d=0, G=priv)/(Y=0, d=1, G=priv) = (Y=1, d=0, G=discr)/(Y=0, d=1, G=discr)
##### 9. FOR Parity: (Y=1|d=0, G=priv) = (Y=1|d=0,G=discr)
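Reading each expression in the list as a conditional probability within one group, the per-group quantities can be computed directly from that group's confusion-matrix counts. The sketch below is one such mapping; `group_rates` is a hypothetical helper and the counts are invented:

```python
def group_rates(tp, tn, fp, fn):
    """Per-group rates behind the metrics in the list, from confusion counts."""
    n = tp + tn + fp + fn
    return {
        "positive_rate": (tp + fp) / n,  # (d=1 | G)         Group Fairness
        "ppv": tp / (tp + fp),           # (Y=1 | d=1, G)    Predictive Parity
        "fpr": fp / (fp + tn),           # (d=1 | Y=0, G)    Predictive Equality
        "fnr": fn / (fn + tp),           # (d=0 | Y=1, G)    Equal Opportunity
        "npv": tn / (tn + fn),           # (Y=0 | d=0, G)    second part of metric 6
        "accuracy": (tp + tn) / n,       # (d=Y | G)         Overall Accuracy Equality
        "for": fn / (fn + tn),           # (Y=1 | d=0, G)    FOR Parity
    }

# Invented counts for a single group of 100 individuals.
rates = group_rates(tp=30, tn=50, fp=10, fn=10)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

Each fairness metric then compares the same rate between the privileged and the discriminated group, either as a ratio or as a difference.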

In [11]:
##fairness metrics computed using division operator
def fairness_metrics_division(confusion_matrix):

  TP_priv, TN_priv, FP_priv, FN_priv = retrieve_values(confusion_matrix['privileged'])
  TP_discr, TN_discr, FP_discr, FN_discr = retrieve_values(confusion_matrix['discriminated'])

  GroupFairness_discr = (TP_discr+FP_discr)/len(discriminated)
  GroupFairness_priv = (TP_priv+FP_priv)/len(privileged)
  GroupFairness = GroupFairness_discr/GroupFairness_priv

  PredictiveParity_discr = (TP_discr)/(TP_discr+FP_discr)
  PredictiveParity_priv = (TP_priv)/(TP_priv+FP_priv)
  PredictiveParity = PredictiveParity_discr/PredictiveParity_priv

  PredictiveEquality_discr = (FP_discr)/(TN_discr+FP_discr)
  PredictiveEquality_priv = (FP_priv)/(TN_priv+FP_priv)
  PredictiveEquality = PredictiveEquality_discr/PredictiveEquality_priv

  EqualOpportunity_discr = (FN_discr)/(TP_discr+FN_discr)
  EqualOpportunity_priv = (FN_priv)/(TP_priv+FN_priv)
  EqualOpportunity = EqualOpportunity_priv/EqualOpportunity_discr

  EqualizedOdds1 = ((TP_discr/(TP_discr+FN_discr)) / (TP_priv/(TP_priv+FN_priv))) #(1-equalOpportunity_discr)/(1-equalOpportunity_priv)
  EqualizedOdds2 = ((FP_discr/(TN_discr+FP_discr)) / (FP_priv/(TN_priv+FP_priv))) #= PredictiveEquality
  EqualizedOdds = (EqualizedOdds1*EqualizedOdds2)

  ConditionalUseAccuracyEquality1 = ((TP_discr/(TP_discr+FP_discr)) / (TP_priv/(TP_priv+FP_priv)))
  ConditionalUseAccuracyEquality2 = ((TN_discr/(TN_discr+FN_discr)) / (TN_priv/(TN_priv+FN_priv)))
  ConditionalUseAccuracyEquality = (ConditionalUseAccuracyEquality1*ConditionalUseAccuracyEquality2)

  OAE1 = TP_discr/TP_priv
  OAE2 = TN_discr/TN_priv
  OverallAccuracyEquality = (OAE1*OAE2)

  ## ATTENTION: FN/FP divides by zero when a group has no false positives (FP = 0)
  TreatmentEquality_discr = (FN_discr/FP_discr)*len(discriminated)
  TreatmentEquality_priv = (FN_priv/FP_priv)*len(privileged)

  TreatmentEquality = TreatmentEquality_priv/TreatmentEquality_discr

  FORParity_discr = (FN_discr)/(TN_discr+FN_discr)
  FORParity_priv = (FN_priv)/(TN_priv+FN_priv)
  FORParity = FORParity_priv/FORParity_discr

  FN_P_discr =  (FN_discr)/len(discriminated)
  FN_P_priv =  (FN_priv)/len(privileged)

  FP_P_discr = (FP_discr)/len(discriminated)
  FP_P_priv =  (FP_priv)/len(privileged)

  #RecallParity = (TP_discr/(TP_discr+FN_discr))/(TP_priv/(TP_priv+FN_priv))

  metrics = []
  metrics.append(('GroupFairness',  GroupFairness, GroupFairness_discr, GroupFairness_priv))
  metrics.append(('PredictiveParity', PredictiveParity, PredictiveParity_discr, PredictiveParity_priv))
  metrics.append(('PredictiveEquality', PredictiveEquality, PredictiveEquality_discr, PredictiveEquality_priv))
  metrics.append(('EqualOpportunity', EqualOpportunity, EqualOpportunity_discr, EqualOpportunity_priv))
  metrics.append(('EqualizedOdds', EqualizedOdds, EqualizedOdds1, EqualizedOdds2))
  metrics.append(('ConditionalUseAccuracyEquality', ConditionalUseAccuracyEquality, ConditionalUseAccuracyEquality1 , ConditionalUseAccuracyEquality2))
  metrics.append(('OverallAccuracyEquality', OverallAccuracyEquality, OAE1, OAE2))
  metrics.append(('TreatmentEquality', TreatmentEquality, TreatmentEquality_discr, TreatmentEquality_priv))
  metrics.append(('FORParity', FORParity, FORParity_discr, FORParity_priv))
  metrics.append(('FN', FN_P_priv/FN_P_discr, FN_P_discr, FN_P_priv))
  metrics.append(('FP', FP_P_discr/FP_P_priv, FP_P_discr, FP_P_priv))

  fairness_metrics = pd.DataFrame(metrics, columns = ['Metric', 'Value', 'Discr_group', 'Priv_group'])
  return fairness_metrics
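The `ATTENTION` comment in the function flags that some denominators can be zero, for instance when a group has no false positives. One way to make that case explicit is a small guard that returns NaN instead of dividing. `safe_ratio` is a hypothetical helper, not part of the notebook:

```python
import math

def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    if denominator == 0:
        return math.nan
    return numerator / denominator

# Invented counts: the second group has no false positives.
fn_a, fp_a = 12, 4
fn_b, fp_b = 7, 0

ratio_a = safe_ratio(fn_a, fp_a)  # 3.0
ratio_b = safe_ratio(fn_b, fp_b)  # NaN: undefined for this group

print(ratio_a, ratio_b)
```

A NaN in the resulting table signals that the metric is undefined for that group, which is easier to spot than an infinite value or a runtime warning.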

In [12]:
#fairness metrics computed using the subtraction operator
def fairness_metrics_substraction(confusion_matrix):

  TP_priv, TN_priv, FP_priv, FN_priv = retrieve_values(confusion_matrix['privileged'])
  TP_discr, TN_discr, FP_discr, FN_discr = retrieve_values(confusion_matrix['discriminated'])

  GroupFairness_discr = (TP_discr+FP_discr)/len(discriminated)
  GroupFairness_priv = (TP_priv+FP_priv)/len(privileged)
  GroupFairness = GroupFairness_priv-GroupFairness_discr

  PredictiveParity_discr = (TP_discr)/(TP_discr+FP_discr)
  PredictiveParity_priv = (TP_priv)/(TP_priv+FP_priv)
  PredictiveParity = PredictiveParity_priv-PredictiveParity_discr

  PredictiveEquality_discr = (FP_discr)/(TN_discr+FP_discr)
  PredictiveEquality_priv = (FP_priv)/(TN_priv+FP_priv)
  PredictiveEquality = PredictiveEquality_priv-PredictiveEquality_discr

  EqualOpportunity_discr = (FN_discr)/(TP_discr+FN_discr)
  EqualOpportunity_priv = (FN_priv)/(TP_priv+FN_priv)
  EqualOpportunity = EqualOpportunity_priv-EqualOpportunity_discr

  EqualizedOdds1 = (TP_priv/(TP_priv+FN_priv))-(TP_discr/(TP_discr+FN_discr)) #= (1-EqualOpportunity_priv)-(1-EqualOpportunity_discr)
  EqualizedOdds2 = (FP_priv/(TN_priv+FP_priv))-(FP_discr/(TN_discr+FP_discr)) #= PredictiveEquality
  EqualizedOdds = (EqualizedOdds1*EqualizedOdds2) ## TO DO: REASON ON THIS METRIC!!

  ConditionalUseAccuracyEquality1 = (TP_priv/(TP_priv+FP_priv))-(TP_discr/(TP_discr+FP_discr))
  ConditionalUseAccuracyEquality2 = (TN_priv/(TN_priv+FN_priv))- (TN_discr/(TN_discr+FN_discr))
  ConditionalUseAccuracyEquality = (ConditionalUseAccuracyEquality1*ConditionalUseAccuracyEquality2) ## TO DO: REASON ON THIS METRIC!!

  OAE1 = TP_priv-TP_discr
  OAE2 = TN_priv-TN_discr
  OverallAccuracyEquality = (OAE1*OAE2) ## TO DO: REASON ON THIS METRIC!!

  TreatmentEquality_discr = (FN_discr/FP_discr)*len(discriminated)
  TreatmentEquality_priv = (FN_priv/FP_priv)*len(privileged)

  TreatmentEquality = TreatmentEquality_priv-TreatmentEquality_discr

  FORParity_discr = (FN_discr)/(TN_discr+FN_discr)
  FORParity_priv = (FN_priv)/(TN_priv+FN_priv)
  FORParity = FORParity_priv-FORParity_discr

  FN_P_discr =  (FN_discr)/len(discriminated)
  FN_P_priv =  (FN_priv)/len(privileged)

  FP_P_discr = (FP_discr)/len(discriminated)
  FP_P_priv =  (FP_priv)/len(privileged)

  #RecallParity = (TP_discr/(TP_discr+FN_discr))/(TP_priv/(TP_priv+FN_priv))

  metrics = []
  metrics.append(('GroupFairness',  GroupFairness, GroupFairness_discr, GroupFairness_priv))
  metrics.append(('PredictiveParity', PredictiveParity, PredictiveParity_discr, PredictiveParity_priv))
  metrics.append(('PredictiveEquality', PredictiveEquality, PredictiveEquality_discr, PredictiveEquality_priv))
  metrics.append(('EqualOpportunity', EqualOpportunity, EqualOpportunity_discr, EqualOpportunity_priv))
  metrics.append(('EqualizedOdds', EqualizedOdds, EqualizedOdds1, EqualizedOdds2))
  metrics.append(('ConditionalUseAccuracyEquality', ConditionalUseAccuracyEquality, ConditionalUseAccuracyEquality1 , ConditionalUseAccuracyEquality2))
  metrics.append(('OverallAccuracyEquality', OverallAccuracyEquality, OAE1, OAE2))
  metrics.append(('TreatmentEquality', TreatmentEquality, TreatmentEquality_discr, TreatmentEquality_priv))
  metrics.append(('FORParity', FORParity, FORParity_discr, FORParity_priv))
  metrics.append(('FN', FN_P_priv-FN_P_discr, FN_P_discr, FN_P_priv))
  metrics.append(('FP', FP_P_discr-FP_P_priv, FP_P_discr, FP_P_priv))

  fairness_metrics = pd.DataFrame(metrics, columns = ['Metric', 'Value', 'Discr_group', 'Priv_group'])
  return fairness_metrics
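The `TO DO` comments in the function mark the combined metrics as open questions. For Overall Accuracy Equality, one possible reading of the definition in the metrics list (d=Y within each group) is to compare per-group accuracy rather than raw TP and TN counts, which depend on group size. The sketch below illustrates that reading on invented counts; it is an assumption, not the notebook's method:

```python
def accuracy(tp, tn, fp, fn):
    """Share of correct predictions within one group."""
    return (tp + tn) / (tp + tn + fp + fn)

# Invented confusion counts for two groups of 100 individuals each.
acc_priv = accuracy(tp=40, tn=40, fp=10, fn=10)   # 80 correct out of 100
acc_discr = accuracy(tp=10, tn=50, fp=5, fn=35)   # 60 correct out of 100

oae_difference = acc_priv - acc_discr  # subtraction variant: 0 means parity
oae_ratio = acc_discr / acc_priv       # division variant: 1 means parity

print(round(oae_difference, 3), round(oae_ratio, 3))
```

Because accuracy is already normalised by group size, both variants stay comparable when the two groups differ in size.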

## Final results

In [15]:
for model_name in predicted_values.keys():
  print(model_name)
  print("FAIRNESS DIVISION RESULTS")
  print(fairness_metrics_division(confusion_matrices[model_name]))
  print("FAIRNESS SUBSTRACTION RESULTS")
  print(fairness_metrics_substraction(confusion_matrices[model_name]))
  print("-----------------------------")

Logistic Regression
FAIRNESS DIVISION RESULTS
                            Metric     Value  Discr_group  Priv_group
0                    GroupFairness  0.183200     0.063559    0.346939
1                 PredictiveParity  1.048598     0.733333    0.699346
2               PredictiveEquality  0.110000     0.020000    0.181818
3                 EqualOpportunity  0.620426     0.694444    0.430851
4                    EqualizedOdds  0.059055     0.536864    0.110000
5   ConditionalUseAccuracyEquality  1.293883     1.048598    1.233917
6          OverallAccuracyEquality  0.097341     0.102804    0.946860
7                TreatmentEquality  0.526470  1475.000000  776.543478
8                        FORParity  2.486250     0.113122    0.281250
9                               FN  1.733878     0.105932    0.183673
10                              FP  0.162491     0.016949    0.104308
FAIRNESS SUBSTRACTION RESULTS
                            Metric        Value  Discr_group  Priv_group
0          

### How to evaluate the results?
Look at the value of each metric in the division results, where a value of 1 means parity between the two groups:

- If the value is between 0 and 1-t, the discriminated group is treated unfairly
- If the value is greater than 1+t, the privileged group is treated unfairly
- If the value is between 1-t and 1+t, both the privileged and the discriminated group are treated fairly

t is a threshold that should be chosen by the user according to the context and the goal of the task.
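The three rules can be written as a small function. `interpret_ratio` is a hypothetical helper, and t = 0.2 is chosen only for illustration. The input values are the Logistic Regression division results for GroupFairness, PredictiveParity and FORParity shown above:

```python
def interpret_ratio(value, t):
    """Classify a ratio-based fairness value against the threshold t."""
    if value < 1 - t:
        return "unfair to discriminated group"
    if value > 1 + t:
        return "unfair to privileged group"
    return "fair"

t = 0.2  # illustrative threshold; choose it according to the task

results = {
    "GroupFairness": 0.183200,
    "PredictiveParity": 1.048598,
    "FORParity": 2.486250,
}

verdicts = {name: interpret_ratio(value, t) for name, value in results.items()}
for name, verdict in verdicts.items():
    print(f"{name}: {verdict}")
```

With this threshold, GroupFairness falls below 1-t, PredictiveParity lies inside the fair band, and FORParity exceeds 1+t.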
