# Goal: Fairness Metrics Computation
Given:
- a dataset D
- a target variable
- a sensitive attribute (called `sensible_attribute` in the code)
- a legitimate attribute (needed only for one specific metric)

this notebook computes the statistical fairness metrics presented in the Verma and Rubin paper https://dl.acm.org/doi/pdf/10.1145/3194770.3194776 and the Aequitas metrics https://arxiv.org/pdf/1811.05577.pdf.

The running example is the Credit Score dataset https://www.kaggle.com/datasets/kabure/german-credit-data-with-risk



# Import Libraries





In [None]:
try:
  from google.colab import drive
  drive.mount('/content/drive')
  import sys
  path_to_project = '/content/drive/MyDrive/FairAlgorithm'
  sys.path.append(path_to_project)
  !sudo apt install libcairo2-dev pkg-config python3-dev
  IN_COLAB = True
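except Exception:  # not running in Colab, or Google Drive could not be mounted
  path_to_project = '.'  # fall back to paths relative to the working directory
  IN_COLAB = False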

Mounted at /content/drive

In [None]:
#import libraries
import numpy as np
import pandas as pd
import pickle
#from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_validate,cross_val_score,cross_val_predict,train_test_split,StratifiedKFold
from sklearn.metrics import classification_report
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier, AdaBoostClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix
from tqdm.notebook import tqdm

# Initialisation

In [None]:
dataset_name = "diabetes-women"
np.random.seed(1234)

In [None]:
#Parameters
if dataset_name == "diabetes-women":
  target_variable, target_variable_labels, sensible_attribute, ignore_cols = 'Outcome', ['No Diabetes','Diabetes'], 'AgeCategory', ['Age']
else:
  target_variable, target_variable_labels, sensible_attribute, ignore_cols = 'A', ['0','1'], 'B', ['X']

In [None]:
dataset_path = path_to_project + '/data/preprocessed/preprocessed-{}.csv'.format(dataset_name) if IN_COLAB else 'data/preprocessed/preprocessed-{}.csv'.format(dataset_name)

df = pd.read_csv(dataset_path)
df = df.drop(columns=ignore_cols)
feature_cols = df.columns
sensible_values = [0, 1]  # 0 is the discriminated group, 1 the privileged one

In [None]:
df.head(5)

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Outcome,AgeCategory
0,6,148,72,35,0,33.6,0.627,1,1
1,1,85,66,29,0,26.6,0.351,0,1
2,8,183,64,0,0,23.3,0.672,1,1
3,1,89,66,23,94,28.1,0.167,0,0
4,0,137,40,35,168,43.1,2.288,1,1


In [None]:
print("Number of entries: ", len(df))

Number of entries:  677


# Apply Classification Algorithm

Apply the ML models with 10-fold stratified cross-validation, so that every row of the dataset receives a prediction from a model that was not trained on it.
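
As a rough illustration of out-of-fold prediction, the sketch below uses only the standard library. The interleaved folds and the majority-class "model" are simplifying assumptions made for this example; the notebook itself relies on scikit-learn's `StratifiedKFold` and real classifiers.

```python
from collections import Counter

def out_of_fold_predictions(labels, n_splits=3):
    """Predict every item with a toy model fitted on the other folds only."""
    n = len(labels)
    predictions = [None] * n
    for fold in range(n_splits):
        # Interleaved folds: item i belongs to fold i % n_splits
        test_idx = [i for i in range(n) if i % n_splits == fold]
        train_labels = [labels[i] for i in range(n) if i % n_splits != fold]
        # Toy model: always predict the majority class of the training folds
        majority = Counter(train_labels).most_common(1)[0][0]
        for i in test_idx:
            predictions[i] = majority
    return predictions

labels = [0, 0, 1, 0, 1, 1, 0, 0, 1]  # hypothetical target values
oof = out_of_fold_predictions(labels)
# Every position holds exactly one prediction, made without seeing that row
```

Because each row appears in exactly one test fold, the resulting list has the same length as the dataset, which is what allows the predictions to be compared row by row with the true labels later on.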

In [None]:
n_estimators = 30
random_seed = 1234

models = {'Logistic Regression':LogisticRegression(max_iter=500),
          'Decision Tree':DecisionTreeClassifier(max_depth=None),
          'Bagging':BaggingClassifier(DecisionTreeClassifier(max_depth=3),n_estimators=n_estimators),
          'Random Forest':RandomForestClassifier(n_estimators=n_estimators),
          'Extremely Randomized Trees':ExtraTreesClassifier(n_estimators=n_estimators),
          'Ada Boost':AdaBoostClassifier(DecisionTreeClassifier(max_depth=3),n_estimators=n_estimators)}

scores = {}
predicted_values = {}
X = df.drop(target_variable, axis=1)
y = df[target_variable].values
# The same splitter (and seed) is used for the predictions and for the accuracy scores
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=random_seed)
for model_name in tqdm(models):
    clf = models[model_name]
    predicted_values[model_name] = cross_val_predict(clf, X, y, cv=cv)

    score = cross_val_score(clf, X, y, cv=cv)
    scores[model_name] = [round(np.average(score), 5), round(np.std(score), 5)]


Save the predictions and the scores in pickle format, which keeps the stored files compact.

In [None]:
save_path = path_to_project + '/data/predictions/predictions-{}-original.p'.format(dataset_name)
with open(save_path, 'wb') as fp:
    pickle.dump(predicted_values, fp, protocol=pickle.HIGHEST_PROTOCOL)

save_path = path_to_project + '/data/scores/scores-{}-original.p'.format(dataset_name)
with open(save_path, 'wb') as fp:
    pickle.dump(scores, fp, protocol=pickle.HIGHEST_PROTOCOL)
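
For completeness, a minimal sketch of the round trip: writing a dictionary shaped like `scores` and reading it back. The temporary directory and the file name `scores-example.p` are assumptions made for this example, not paths used by the notebook.

```python
import os
import pickle
import tempfile

# Dictionary in the same shape as `scores`: model name -> [mean, std]
example_scores = {'Logistic Regression': [0.75915, 0.03018]}

with tempfile.TemporaryDirectory() as tmp_dir:
    example_path = os.path.join(tmp_dir, 'scores-example.p')  # hypothetical file name
    with open(example_path, 'wb') as fp:
        pickle.dump(example_scores, fp, protocol=pickle.HIGHEST_PROTOCOL)
    # Reading uses binary mode and needs no protocol argument
    with open(example_path, 'rb') as fp:
        restored_scores = pickle.load(fp)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.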

We need to save the indexes of both groups `privileged` and `discriminated` in two lists.

In [None]:
sensible_indexes = df[sensible_attribute]
# Index labels of the two groups: rows equal to sensible_values[0] form the
# discriminated group, all other rows the privileged group
is_discriminated = sensible_indexes == sensible_values[0]
discriminated = sensible_indexes.index[is_discriminated].tolist()
privileged = sensible_indexes.index[~is_discriminated].tolist()

`y_privileged` is the part of the dataset where the sensible attribute equals 1 (for example `AgeCategory` = 1), and `y_discriminated` is the part of the dataset where it equals 0.

In [None]:
y_discriminated= df.loc[list(discriminated)]
y_privileged= df.loc[list(privileged)]

y_test_discriminated = y_discriminated[target_variable]
y_test_privileged = y_privileged[target_variable]

Build the confusion matrices (one for the privileged group, one for the discriminated group) for each model.

In [None]:
confusion_matrices = {}
for model_name in predicted_values.keys():
  # Align the out-of-fold predictions with the DataFrame index
  y_pred = pd.Series(predicted_values[model_name], index=df.index, name='y_pred')

  y_pred_discriminated = y_pred.loc[list(discriminated)]
  y_pred_privileged = y_pred.loc[list(privileged)]

  # One confusion matrix per group
  confusion_matrices[model_name] = {
    'discriminated': confusion_matrix(y_test_discriminated, y_pred_discriminated),
    'privileged': confusion_matrix(y_test_privileged, y_pred_privileged),
  }
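
To make the per-group split concrete, here is a small standard-library sketch that builds one confusion matrix per group in the same `[[TN, FP], [FN, TP]]` layout that scikit-learn's `confusion_matrix` returns for labels 0 and 1. The helper `confusion_counts` and the toy data are assumptions made for this example.

```python
def confusion_counts(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary 0/1 labels."""
    cm = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        cm[actual][predicted] += 1  # row = actual class, column = predicted class
    return cm

# Hypothetical toy data: group 0 = discriminated, group 1 = privileged
group  = [0, 0, 0, 1, 1, 1, 1, 0]
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [0, 0, 1, 1, 1, 1, 0, 1]

per_group = {}
for name, value in (('discriminated', 0), ('privileged', 1)):
    rows = [i for i, g in enumerate(group) if g == value]
    per_group[name] = confusion_counts([y_true[i] for i in rows],
                                       [y_pred[i] for i in rows])

# per_group['discriminated'] == [[1, 1], [1, 1]]
# per_group['privileged']    == [[1, 1], [0, 2]]
```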

# Fairness Metrics Functions

Terminology:

- d is the predicted value
- Y is the actual value in the dataset
- G is the protected attribute; priv = privileged group, discr = discriminated group
- L is the legitimate attribute (only for Conditional Statistical Parity)
- (A|B) denotes the probability of A given B

Fairness Metrics List:

1. Group Fairness: (d=1|G=priv) = (d=1|G=discr)
2. Predictive Parity: (Y=1|d=1,G=priv) = (Y=1|d=1,G=discr)
3. Predictive Equality: (d=1|Y=0,G=priv) = (d=1|Y=0,G=discr)
4. Equal Opportunity:  (d=0|Y=1,G=priv) = (d=0|Y=1,G=discr)
5. Equalized Odds: (d=1|Y=i,G=priv) = (d=1|Y=i,G=discr), i ∈ {0,1}
6. Conditional Use Accuracy Equality: (Y=1|d=1,G=priv) = (Y=1|d=1,G=discr) and (Y=0|d=0,G=priv) = (Y=0|d=0,G=discr)
7. Overall Accuracy Equality: (d=Y, G=priv) = (d=Y, G=discr)
8. Treatment Equality: (Y=1, d=0, G=priv)/(Y=0, d=1, G=priv) = (Y=1, d=0, G=discr)/(Y=0, d=1, G=discr)
9. FOR Parity: (Y=1|d=0, G=priv) = (Y=1|d=0,G=discr)
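
Each probability in the list can be read off the confusion matrix of a single group. Assuming TP, FP, TN and FN are counted inside one group G, the per-group quantities used by the functions in this notebook correspond to:

```latex
\begin{align*}
\text{Group Fairness:} \quad & P(d=1 \mid G) = \frac{TP + FP}{TP + FP + TN + FN} \\
\text{Predictive Parity:} \quad & P(Y=1 \mid d=1, G) = \frac{TP}{TP + FP} \\
\text{Predictive Equality:} \quad & P(d=1 \mid Y=0, G) = \frac{FP}{FP + TN} \\
\text{Equal Opportunity:} \quad & P(d=0 \mid Y=1, G) = \frac{FN}{TP + FN} \\
\text{Conditional Use Accuracy (negative part):} \quad & P(Y=0 \mid d=0, G) = \frac{TN}{TN + FN} \\
\text{Treatment Equality:} \quad & \frac{FN}{FP} \\
\text{FOR Parity:} \quad & P(Y=1 \mid d=0, G) = \frac{FN}{TN + FN}
\end{align*}
```

A metric is then satisfied when the value for the privileged group matches the value for the discriminated group, up to the chosen tolerance.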

In [None]:
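# Retrieve TP, TN, FP, FN values from a 2x2 confusion matrix.
# scikit-learn's layout is rows = actual class, columns = predicted class,
# i.e. [[TN, FP], [FN, TP]] for the labels 0 and 1.
def retrieve_values(cm):
  TN = cm[0][0]
  FP = cm[0][1]
  FN = cm[1][0]
  TP = cm[1][1]
  return TP, TN, FP, FN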

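# Shift a ratio so that the neutral value 1 becomes 0
def rescale(metric):
  metric = metric - 1
  return metric

# Clip the metric to the interval [-1, 1]
def standardization(metric):
  if metric > 1:
    metric = 1
  elif metric < -1:
    metric = -1
  return metric

# True when the metric lies strictly inside the fair band (1-th, 1+th)
def valid(metric, th):
  return 1-th < metric < 1+th

# Combine two partial metrics into one value: returns min(m1, m2) when both are
# below 1-th, or when one is inside the fair band and the other is below 1-th;
# returns max(m1, m2) in every other case.
def and_function(m1, m2, th):
  if m1 > 1+th and m2 > 1+th:
    return max(m1, m2)
  elif m1 < 1-th and m2 < 1-th:
    return min(m1, m2)
  elif valid(m1, th) and valid(m2, th):
    return max(m1, m2)
  elif (valid(m1, th) or valid(m2, th)) and (m1 > 1+th or m2 > 1+th):
    return max(m1, m2)
  elif (valid(m1, th) or valid(m2, th)) and (m1 < 1-th or m2 < 1-th):
    return min(m1, m2)
  else:
    return max(m1, m2)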
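
The effect of applying `rescale` and then `standardization` to a ratio can be seen on a few hand-picked values. The helper name `to_signed_score` is an assumption made for this sketch; it simply combines the two steps.

```python
def to_signed_score(ratio):
    """Shift a ratio so that 1 maps to 0, then clip the result to [-1, 1]."""
    return max(-1.0, min(1.0, ratio - 1.0))

# Ratios chosen to be exactly representable as floats
examples = {ratio: to_signed_score(ratio) for ratio in (0.5, 1.0, 1.25, 3.0)}
# 0.5 -> -0.5, 1.0 -> 0.0, 1.25 -> 0.25, 3.0 -> 1.0 (clipped)
```

A ratio of two non-negative rates cannot fall below 0, so the lower bound of -1 is reached only when the ratio is exactly 0, whereas every ratio of 2 or more is flattened to 1.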

In [None]:
# Fairness metrics computed using division operator
def fairness_metrics_division(confusion_matrix, discriminated, privileged, threshold = 0.15):

  TP_priv, TN_priv, FP_priv, FN_priv = retrieve_values(confusion_matrix['privileged'])
  TP_discr, TN_discr, FP_discr, FN_discr = retrieve_values(confusion_matrix['discriminated'])

  GroupFairness_discr = (TP_discr+FP_discr)/len(discriminated)
  GroupFairness_priv = (TP_priv+FP_priv)/len(privileged)
  GroupFairness = GroupFairness_discr/GroupFairness_priv

  PredictiveParity_discr = (TP_discr)/(TP_discr+FP_discr)
  PredictiveParity_priv = (TP_priv)/(TP_priv+FP_priv)
  PredictiveParity = PredictiveParity_discr/PredictiveParity_priv

  PredictiveEquality_discr = (FP_discr)/(TN_discr+FP_discr)
  PredictiveEquality_priv = (FP_priv)/(TN_priv+FP_priv)
  PredictiveEquality = PredictiveEquality_discr/PredictiveEquality_priv

  EqualOpportunity_discr = (FN_discr)/(TP_discr+FN_discr)
  EqualOpportunity_priv = (FN_priv)/(TP_priv+FN_priv)
  EqualOpportunity = EqualOpportunity_priv/EqualOpportunity_discr

  EqualizedOdds1 = ((TP_discr/(TP_discr+FN_discr)) / (TP_priv/(TP_priv+FN_priv))) # (1-equalOpportunity_discr)/(1-equalOpportunity_priv)
  EqualizedOdds2 = ((FP_discr/(TN_discr+FP_discr)) / (FP_priv/(TN_priv+FP_priv))) # = PredictiveEquality
  # EqualizedOdds = (EqualizedOdds1 * EqualizedOdds2)
  EqualizedOdds = and_function(EqualizedOdds1, EqualizedOdds2, threshold)

  ConditionalUseAccuracyEquality1 = ((TP_discr/(TP_discr+FP_discr)) / (TP_priv/(TP_priv+FP_priv)))
  ConditionalUseAccuracyEquality2 = ((TN_discr/(TN_discr+FN_discr)) / (TN_priv/(TN_priv+FN_priv)))
  # ConditionalUseAccuracyEquality = (ConditionalUseAccuracyEquality1 * ConditionalUseAccuracyEquality2)
  ConditionalUseAccuracyEquality = and_function(ConditionalUseAccuracyEquality1, ConditionalUseAccuracyEquality2, threshold)

  OAE1 = TP_discr/TP_priv
  OAE2 = TN_discr/TN_priv
  # OverallAccuracyEquality = (OAE1 * OAE2)
  OverallAccuracyEquality = and_function(OAE1, OAE2, threshold)

  # The counts are NumPy integers, so dividing by zero yields inf or nan instead of
  # raising ZeroDivisionError; the denominators are therefore tested explicitly.
  TreatmentEquality_discr = (FN_discr/FP_discr) if FP_discr != 0 else 2  #max value
  TreatmentEquality_priv = (FN_priv/FP_priv) if FP_priv != 0 else 2  #max value
  TreatmentEquality = TreatmentEquality_priv/TreatmentEquality_discr

  FORParity_discr = (FN_discr)/(TN_discr+FN_discr) if (TN_discr+FN_discr) != 0 else 2  #max value
  FORParity_priv = (FN_priv)/(TN_priv+FN_priv) if (TN_priv+FN_priv) != 0 else 2  #max value
  FORParity = FORParity_priv/FORParity_discr

  FN_P_discr = (FN_discr)/len(discriminated)
  FN_P_priv = (FN_priv)/len(privileged)

  FP_P_discr = (FP_discr)/len(discriminated)
  FP_P_priv = (FP_priv)/len(privileged)

  #RecallParity = (TP_discr/(TP_discr+FN_discr))/(TP_priv/(TP_priv+FN_priv))

  metrics = {}
  metrics['GroupFairness'] = [GroupFairness, GroupFairness_discr, GroupFairness_priv]
  metrics['PredictiveParity'] = [PredictiveParity, PredictiveParity_discr, PredictiveParity_priv]
  metrics['PredictiveEquality'] = [PredictiveEquality, PredictiveEquality_discr, PredictiveEquality_priv]
  metrics['EqualOpportunity'] = [EqualOpportunity, EqualOpportunity_discr, EqualOpportunity_priv]
  metrics['EqualizedOdds'] = [EqualizedOdds, EqualizedOdds1, EqualizedOdds2]
  metrics['ConditionalUseAccuracyEquality'] = [ConditionalUseAccuracyEquality, ConditionalUseAccuracyEquality1 , ConditionalUseAccuracyEquality2]
  metrics['OverallAccuracyEquality'] = [OverallAccuracyEquality, OAE1, OAE2]
  metrics['TreatmentEquality'] = [TreatmentEquality, TreatmentEquality_discr, TreatmentEquality_priv]
  metrics['FORParity'] = [FORParity, FORParity_discr, FORParity_priv]
  metrics['FN'] = [FN_P_priv/FN_P_discr, FN_P_discr, FN_P_priv]
  metrics['FP'] = [FP_P_discr/FP_P_priv, FP_P_discr, FP_P_priv]

  for k in metrics.keys():
    value = standardization(rescale(metrics[k][0]))
    discr = metrics[k][1]
    priv = metrics[k][2]
    metrics[k] = {'Value': value, 'Discr_group': discr, 'Priv_group': priv}

  return metrics


# Fairness metrics computed using subtraction operator
def fairness_metrics_subtraction(confusion_matrix, discriminated, privileged, threshold = 0.15):

  TP_priv, TN_priv, FP_priv, FN_priv = retrieve_values(confusion_matrix['privileged'])
  TP_discr, TN_discr, FP_discr, FN_discr = retrieve_values(confusion_matrix['discriminated'])

  GroupFairness_discr = (TP_discr+FP_discr)/len(discriminated)
  GroupFairness_priv = (TP_priv+FP_priv)/len(privileged)
  GroupFairness = GroupFairness_priv-GroupFairness_discr

  PredictiveParity_discr = (TP_discr)/(TP_discr+FP_discr)
  PredictiveParity_priv = (TP_priv)/(TP_priv+FP_priv)
  PredictiveParity = PredictiveParity_priv-PredictiveParity_discr

  PredictiveEquality_discr = (FP_discr)/(TN_discr+FP_discr)
  PredictiveEquality_priv = (FP_priv)/(TN_priv+FP_priv)
  PredictiveEquality = PredictiveEquality_priv-PredictiveEquality_discr

  EqualOpportunity_discr = (FN_discr)/(TP_discr+FN_discr)
  EqualOpportunity_priv = (FN_priv)/(TP_priv+FN_priv)
  EqualOpportunity = EqualOpportunity_priv-EqualOpportunity_discr

  EqualizedOdds1 = (TP_priv/(TP_priv+FN_priv))-(TP_discr/(TP_discr+FN_discr)) # = (1-EqualOpportunity_priv)-(1-EqualOpportunity_discr)
  EqualizedOdds2 = (FP_priv/(TN_priv+FP_priv))-(FP_discr/(TN_discr+FP_discr)) # = PredictiveEquality
  # Note: and_function tests its inputs against 1 +/- threshold (the neutral value of a ratio), while these differences are neutral at 0
  EqualizedOdds = and_function(EqualizedOdds1, EqualizedOdds2, threshold)

  ConditionalUseAccuracyEquality1 = (TP_priv/(TP_priv+FP_priv)) - (TP_discr/(TP_discr+FP_discr))
  ConditionalUseAccuracyEquality2 = (TN_priv/(TN_priv+FN_priv)) - (TN_discr/(TN_discr+FN_discr))
  ConditionalUseAccuracyEquality = and_function(ConditionalUseAccuracyEquality1, ConditionalUseAccuracyEquality2, threshold)

  OAE1 = TP_priv-TP_discr
  OAE2 = TN_priv-TN_discr
  OverallAccuracyEquality = and_function(OAE1, OAE2, threshold)

  # The counts are NumPy integers, so dividing by zero yields inf or nan instead of
  # raising ZeroDivisionError; the denominators are therefore tested explicitly.
  TreatmentEquality_discr = (FN_discr/FP_discr) if FP_discr != 0 else 2  #max value
  TreatmentEquality_priv = (FN_priv/FP_priv) if FP_priv != 0 else 2  #max value
  TreatmentEquality = TreatmentEquality_priv-TreatmentEquality_discr

  FORParity_discr = (FN_discr)/(TN_discr+FN_discr) if (TN_discr+FN_discr) != 0 else 2  #max value
  FORParity_priv = (FN_priv)/(TN_priv+FN_priv) if (TN_priv+FN_priv) != 0 else 2  #max value
  FORParity = FORParity_priv-FORParity_discr

  FN_P_discr =  (FN_discr)/len(discriminated)
  FN_P_priv =  (FN_priv)/len(privileged)

  FP_P_discr = (FP_discr)/len(discriminated)
  FP_P_priv =  (FP_priv)/len(privileged)

  #RecallParity = (TP_discr/(TP_discr+FN_discr))/(TP_priv/(TP_priv+FN_priv))

  metrics = {}
  metrics['GroupFairness'] = [GroupFairness, GroupFairness_discr, GroupFairness_priv]
  metrics['PredictiveParity'] = [PredictiveParity, PredictiveParity_discr, PredictiveParity_priv]
  metrics['PredictiveEquality'] = [PredictiveEquality, PredictiveEquality_discr, PredictiveEquality_priv]
  metrics['EqualOpportunity'] = [EqualOpportunity, EqualOpportunity_discr, EqualOpportunity_priv]
  metrics['EqualizedOdds'] = [EqualizedOdds, EqualizedOdds1, EqualizedOdds2]
  metrics['ConditionalUseAccuracyEquality'] = [ConditionalUseAccuracyEquality, ConditionalUseAccuracyEquality1 , ConditionalUseAccuracyEquality2]
  metrics['OverallAccuracyEquality'] = [OverallAccuracyEquality, OAE1, OAE2]
  metrics['TreatmentEquality'] = [TreatmentEquality, TreatmentEquality_discr, TreatmentEquality_priv]
  metrics['FORParity'] = [FORParity, FORParity_discr, FORParity_priv]
  metrics['FN'] = [FN_P_priv-FN_P_discr, FN_P_discr, FN_P_priv]
  metrics['FP'] = [FP_P_discr-FP_P_priv, FP_P_discr, FP_P_priv]

  for k in metrics.keys():
    value = standardization(metrics[k][0])
    discr = metrics[k][1]
    priv = metrics[k][2]
    metrics[k] = {'Value': value, 'Discr_group': discr, 'Priv_group': priv}

  return metrics
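
Both functions build `OverallAccuracyEquality` from the raw TP and TN counts of the two groups. A rate-based reading of definition 7, in which each group's accuracy is compared, might look like the sketch below. This is an alternative shown for comparison, not what the notebook computes; `group_accuracy` and the two matrices are assumptions made for this example.

```python
def group_accuracy(cm):
    """Accuracy of one group from a matrix laid out as [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    return (tp + tn) / (tn + fp + fn + tp)

# Hypothetical matrices: the privileged group is twice as large
cm_discr = [[30, 10], [5, 5]]     # 50 rows, 35 correct
cm_priv = [[40, 20], [10, 30]]    # 100 rows, 70 correct

accuracy_ratio = group_accuracy(cm_discr) / group_accuracy(cm_priv)
tp_count_ratio = cm_discr[1][1] / cm_priv[1][1]

# accuracy_ratio is 1.0 because both groups are classified correctly 70% of the time,
# while tp_count_ratio is 5/30 because raw counts also reflect group size
```

The comparison suggests that count-based ratios should be read with the group sizes in mind.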


# Print and Save Results

In [None]:
for model_name in scores.keys():
  print(model_name)
  print("Mean Acc.", scores[model_name][0], "+/-", scores[model_name][1])
  print("-----------------------------")

Logistic Regression
Mean Acc. 0.75915 +/- 0.03018
-----------------------------
Decision Tree
Mean Acc. 0.67373 +/- 0.04737
-----------------------------
Bagging
Mean Acc. 0.75485 +/- 0.03279
-----------------------------
Random Forest
Mean Acc. 0.75935 +/- 0.04268
-----------------------------
Extremely Randomized Trees
Mean Acc. 0.74295 +/- 0.05607
-----------------------------
Ada Boost
Mean Acc. 0.71194 +/- 0.04394
-----------------------------


In [None]:
# Compute all the fairness metrics for all models, both with division and subtraction
div_metrics = {}
sub_metrics = {}

for model_name in predicted_values.keys():
  div_dict = fairness_metrics_division(confusion_matrices[model_name], discriminated, privileged)
  div_metrics[model_name] = div_dict

  sub_dict = fairness_metrics_subtraction(confusion_matrices[model_name], discriminated, privileged)
  sub_metrics[model_name] = sub_dict

overall_metrics = {}
overall_metrics['division'] = div_metrics
overall_metrics['subtraction'] = sub_metrics

In [None]:
metrics = ['GroupFairness', 'PredictiveParity', 'PredictiveEquality', 'EqualOpportunity', 'EqualizedOdds', 'ConditionalUseAccuracyEquality', 'OverallAccuracyEquality', 'TreatmentEquality', 'FORParity', 'FN', 'FP']
model_to_print = "Logistic Regression"
round_value = 5

print("Division \n")
for m in metrics:
  print(m, np.round(overall_metrics["division"][model_to_print][m]["Value"], round_value))
print("\nSubtraction \n")
for m in metrics:
  print(m, np.round(overall_metrics["subtraction"][model_to_print][m]["Value"], round_value))

Division 

GroupFairness -0.83337
PredictiveParity 0.17483
PredictiveEquality -0.92702
EqualOpportunity -0.36426
EqualizedOdds -0.92702
ConditionalUseAccuracyEquality 0.25382
OverallAccuracyEquality -0.89524
TreatmentEquality -0.80846
FORParity 1
FN 0.77669
FP -0.89219

Subtraction 

GroupFairness 0.29669
PredictiveParity -0.11692
PredictiveEquality 0.19053
EqualOpportunity -0.25296
EqualizedOdds 0.19053
ConditionalUseAccuracyEquality -0.17964
OverallAccuracyEquality 1
TreatmentEquality -1
FORParity 0.17964
FN 0.08228
FP -0.1052


In [None]:
#Save the metrics results
save_path = path_to_project + '/measurements/metrics-{}-original.p'.format(dataset_name)
with open(save_path, 'wb') as fp:
    pickle.dump(overall_metrics, fp, protocol=pickle.HIGHEST_PROTOCOL)

### How to evaluate the results?
For each metric, look at the ratio computed by `fairness_metrics_division`:

- If the ratio is between 0 and 1-t, the discriminated group suffers from unfairness
- If the ratio is greater than 1+t, the privileged group suffers from unfairness
- If the ratio is between 1-t and 1+t, both the privileged and the discriminated group are treated fairly

The stored `Value` is this ratio minus 1, clipped to [-1, 1], so on the printed division values the fair band runs from -t to t.

t is a threshold that should be chosen by the user according to the context and the goal of the task; the functions above use 0.15 as the default.
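
Because `fairness_metrics_division` subtracts 1 from each ratio and clips the result to [-1, 1], these bands translate into -t and t on the printed division values. A minimal sketch of that reading follows; the function name `interpret` and the returned labels are assumptions made for this example, and boundary values are treated as unfair, in line with the strict inequalities in `valid`.

```python
def interpret(value, t=0.15):
    """Classify a rescaled division metric (0 = perfectly balanced)."""
    if -t < value < t:
        return 'fair'
    if value <= -t:
        return 'discriminated group disadvantaged'
    return 'privileged group disadvantaged'

# Two values taken from the Logistic Regression division output, plus one made-up value
verdicts = {
    'GroupFairness': interpret(-0.83337),
    'PredictiveParity': interpret(0.17483),
    'hypothetical metric': interpret(0.05),
}
```

This reading applies to the division-based values only; the subtraction-based values are differences and need their own interpretation.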
