# Bias Mitigation Experiments - COMPAS

## Table of Contents

<!-- To return to the table of contents, click on the number at any major section heading.

* [1. Prepare COMPAS data](#1.-Prepare-COMPAS-data)

* [2. Training Baseline Models](#2.-Training-Baseline-Models)
    * [2.1 Logistic Regression](#2.1-Learning-a-Logistic-Regression-(LR)-classifier-on-original-data)
    * [2.2 Random Forest](#2.2-Learning-a-Random-Forest-(RF)-classifier-on-original-data)

* [3. Bias mitigation using pre-processing technique](#3.-Bias-mitigation-using-pre-processing-technique)

   * [3.1 Reweighing](#3.1-Reweighing)
       * [3.1.1 RW on Logistic Regression](#3.1.1.-Learning-a-Logistic-Regression-(LR)-classifier-on-data-transformed-by-reweighing)
       * [3.1.2 RW on Random Forest](#3.1.2.-Learning-a-Random-Forest-(RF)-classifier-on-data-transformed-by-reweighing)
   * [3.2 Disparate impact remover ](#3.2-Disparate-impact-remover)
       * [3.2.1 DIR on Logistic Regression](#3.2.1-Learning-a-Logistic-Regression-(LR)-classifier-on-data-transformed-by-DIR)
       * [3.2.2 DIR on Random Forest](#3.2.2-Learning-a-Random-Forest-(RF)-classifier-on-data-transformed-by-DIR)
  
* [4. Bias mitigation using in-processing technique](#4.-Bias-mitigation-using-in-processing-technique)
    * [4.1 Prejudice Remover (PR)](#4.1-Prejudice-Remover-(PR))
    * [4.2 Adversarial Debiasing](#4.2-Adversarial-Debiasing)

* [5. Bias mitigation using post-processing technique](#5.-Bias-mitigation-using-post-processing-technique)
    * [5.1 Reject option classification (ROC)](#5.1-Reject-option-classification-(RejOpCl))
        * [5.1.1 ROC on Logistic Regression](#5.1.1-Reject-option-classification-on-Logistic-Regression)
        * [5.1.2 ROC on Random Forest](#5.1.2-Reject-option-classification-on-Random-Forest)
    * [5.2 Equalized odds processor (EOPP)](#5.2-Equalized-odds-processor-(EqOddsPr))
        * [5.2.1 EOPP on Logistic Regression](#5.2.1-Equalized-odds-processor-on-Logistic-Regression)
        * [5.2.2 EOPP on Random Forest](#5.2.2-Equalized-odds-processor-on-Random-Forest)

* [6. Summary of results](#6.-Summary-of-Model-Learning-Results)

* [7. Analyze rates of different mitigators](#7.-Analyze-rates-of-different-mitigators)

 -->

In [1]:
# to ignore warnings
import warnings
warnings.filterwarnings('ignore')

In [2]:
import helperfunctions as helpers
%load_ext autoreload
%autoreload 2
# allow automatic reloading of changes in helperfile

In [3]:
# Standard packages
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
pd.options.mode.chained_assignment = None  # default='warn'
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()
tf.logging.set_verbosity(tf.logging.ERROR)
import seaborn as sns
import random
from tqdm import tqdm
from numpy import mean
from numpy import std
from IPython.display import Markdown, display

# Plotting 
import matplotlib
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
from matplotlib.patches import Patch
from matplotlib.lines import Line2D

# Sklearn
from sklearn.cluster import KMeans
from sklearn.pipeline import Pipeline
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split, ParameterGrid
from sklearn.model_selection import cross_val_score
from sklearn import preprocessing
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import ShuffleSplit
from sklearn.model_selection import KFold
from sklearn.model_selection import RepeatedKFold
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import cross_validate
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score, roc_auc_score

# Fairness metrics
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.metrics import ClassificationMetric
from aif360.sklearn.metrics import statistical_parity_difference, average_odds_difference, equal_opportunity_difference, average_odds_error
from aif360.sklearn.metrics import disparate_impact_ratio

# Explainers
from aif360.explainers import MetricTextExplainer

# Scalers
from sklearn.preprocessing import StandardScaler
from aif360.sklearn.preprocessing import ReweighingMeta
from aif360.sklearn.inprocessing import AdversarialDebiasing
from aif360.sklearn.postprocessing import CalibratedEqualizedOdds, PostProcessingMeta

# Classifiers
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Bias mitigation techniques
# from aif360.algorithms.preprocessing import Reweighing
from aif360.algorithms.preprocessing import OptimPreproc
from aif360.algorithms.preprocessing import DisparateImpactRemover
from aif360.algorithms.preprocessing import LFR

# Set plot font
plt.rcParams.update({'font.family':'serif'})
plt.rcParams.update({'font.serif':'Times New Roman'})

pip install 'aif360[FairAdapt]'
pip install 'aif360[LFR]'


## [1.](#Table-of-Contents) Prepare COMPAS data

In [4]:
# read in data as pandas data frame
df = pd.read_csv("../data/Compas/compas-scores-two-years.csv")
# check columns
pd.set_option('display.max_columns', None) # expand all columns
df.head()

Unnamed: 0,id,name,first,last,compas_screening_date,sex,dob,age,age_cat,race,juv_fel_count,decile_score,juv_misd_count,juv_other_count,priors_count,days_b_screening_arrest,c_jail_in,c_jail_out,c_case_number,c_offense_date,c_arrest_date,c_days_from_compas,c_charge_degree,c_charge_desc,is_recid,r_case_number,r_charge_degree,r_days_from_arrest,r_offense_date,r_charge_desc,r_jail_in,r_jail_out,violent_recid,is_violent_recid,vr_case_number,vr_charge_degree,vr_offense_date,vr_charge_desc,type_of_assessment,decile_score.1,score_text,screening_date,v_type_of_assessment,v_decile_score,v_score_text,v_screening_date,in_custody,out_custody,priors_count.1,start,end,event,two_year_recid
0,1,miguel hernandez,miguel,hernandez,2013-08-14,Male,1947-04-18,69,Greater than 45,Other,0,1,0,0,0,-1.0,2013-08-13 06:03:42,2013-08-14 05:41:20,13011352CF10A,2013-08-13,,1.0,F,Aggravated Assault w/Firearm,0,,,,,,,,,0,,,,,Risk of Recidivism,1,Low,2013-08-14,Risk of Violence,1,Low,2013-08-14,2014-07-07,2014-07-14,0,0,327,0,0
1,3,kevon dixon,kevon,dixon,2013-01-27,Male,1982-01-22,34,25 - 45,African-American,0,3,0,0,0,-1.0,2013-01-26 03:45:27,2013-02-05 05:36:53,13001275CF10A,2013-01-26,,1.0,F,Felony Battery w/Prior Convict,1,13009779CF10A,(F3),,2013-07-05,Felony Battery (Dom Strang),,,,1,13009779CF10A,(F3),2013-07-05,Felony Battery (Dom Strang),Risk of Recidivism,3,Low,2013-01-27,Risk of Violence,1,Low,2013-01-27,2013-01-26,2013-02-05,0,9,159,1,1
2,4,ed philo,ed,philo,2013-04-14,Male,1991-05-14,24,Less than 25,African-American,0,4,0,1,4,-1.0,2013-04-13 04:58:34,2013-04-14 07:02:04,13005330CF10A,2013-04-13,,1.0,F,Possession of Cocaine,1,13011511MM10A,(M1),0.0,2013-06-16,Driving Under The Influence,2013-06-16,2013-06-16,,0,,,,,Risk of Recidivism,4,Low,2013-04-14,Risk of Violence,3,Low,2013-04-14,2013-06-16,2013-06-16,4,0,63,0,1
3,5,marcu brown,marcu,brown,2013-01-13,Male,1993-01-21,23,Less than 25,African-American,0,8,1,0,1,,,,13000570CF10A,2013-01-12,,1.0,F,Possession of Cannabis,0,,,,,,,,,0,,,,,Risk of Recidivism,8,High,2013-01-13,Risk of Violence,6,Medium,2013-01-13,,,1,0,1174,0,0
4,6,bouthy pierrelouis,bouthy,pierrelouis,2013-03-26,Male,1973-01-22,43,25 - 45,Other,0,1,0,0,2,,,,12014130CF10A,,2013-01-09,76.0,F,arrest case no charge,0,,,,,,,,,0,,,,,Risk of Recidivism,1,Low,2013-03-26,Risk of Violence,1,Low,2013-03-26,,,2,0,1102,0,0


In [5]:
# select features of interest
df = df[['sex','age','race','juv_fel_count','juv_misd_count','juv_other_count','priors_count','c_charge_degree', 'two_year_recid']]

# unique values of the protected feature race
df['race'].value_counts()

African-American    3696
Caucasian           2454
Hispanic             637
Other                377
Asian                 32
Native American       18
Name: race, dtype: int64

In [6]:
# encode protected features 

# make race (=protected attribute) a binary variable
array = ['Caucasian', 'African-American'] # only include samples corresponding to Caucasian or African-American
df = df.loc[df['race'].isin(array)]

def protected_race(row):
    if row['race'] == 'Caucasian':
        value = 1
    else:
        value = 0
    return value

df['race'] = df.apply(protected_race, axis=1)

# make sex a binary variable
def protected_sex(row):
    if row['sex'] == 'Female': # female is the privileged group in this case
        value = 1
    else:
        value = 0
    return value

df['sex'] = df.apply(protected_sex, axis=1)

In [7]:
# one hot encoding of object features
categorical_cols = list(df.select_dtypes(exclude=[np.number]).columns)
df = pd.get_dummies(df, columns = categorical_cols)

# encode binary feature: c_charge 
df = df.drop(['c_charge_degree_F'], axis=1)
# reset index
df = df.reset_index(drop=True)

In [8]:
# flip labels to ensure pos_label = 1, not becoming recidivist
df['two_year_recid'] = (~df['two_year_recid'].astype(bool)).astype(int)

# select descriptive features and target variable
X = df.loc[:, df.columns != 'two_year_recid'] # select all features but target feature
y = df[['race', 'two_year_recid']] # include protected feature in order to AIF360

In [9]:
# set protected attribute as index
X = X.set_index(['race'], append = True, drop = False)
y = y.set_index(['race'], append = True)

# make y data frames to 1d array to pass modeling, but keep index (protected attribute)
y = pd.Series(y['two_year_recid'], index=y.index)

## [2.](#Table-of-Contents) Proposed GridSearch Approach 

The GridSearch Approach includes hyperparameter, threshold, and Bias Mitigation

In [10]:
def store_metrics(y_test, y_pred, pred_prob, thres_dict, threshold):
    """Returns a dictionary with all interested accuracy and fairness metrics.
        Args:
            y_test (array-like): true labels from test set.
            y_pred (array-like): predicted labels for test set.
            thres_dict (dict): dictionary that stores all info.
            threshold (np.float): given threshold used to obtain y_pred.
        Returns:
            dict: `thres_dict`
    """
    
    # evaluate model performance for each split
    thres_dict[threshold]['acc_score'] += [accuracy_score(y_test, y_pred)]
    thres_dict[threshold]['bacc_score'] += [balanced_accuracy_score(y_test, y_pred)]
    thres_dict[threshold]['f1_score'] += [f1_score(y_test, y_pred)]
    thres_dict[threshold]['auc_score'] += [roc_auc_score(y_test, pred_prob)]
    thres_dict[threshold]['spd_score'] += [statistical_parity_difference(y_test, y_pred, prot_attr='race')]
    thres_dict[threshold]['aod_score'] += [average_odds_difference(y_test, y_pred, prot_attr='race')]

    return thres_dict

def get_avg_metrics(thres_dict):
    """Returns the average of all cv splits from the same model setting (hyperparameter and threshold).
    Args:
        thres_dict (dict): the dictionary with all info on each cv split.
    Returns:
        dict: `final_metrics`
    """ 
    import copy
    # calculate the average for each metrics from all splits
    avg_metrics = copy.deepcopy(thres_dict)
    for threshold in avg_metrics.keys(): 
        average_list = {}
        for metric in avg_metrics[threshold].keys():
            average_list['avg_%s'%metric] = mean(avg_metrics[threshold][metric])
        avg_metrics[threshold]['average'] = average_list
    return avg_metrics

def report_outcome(all_metrics, base_estimator_name):
    """Returns the output table from all param_grid.
    Args:
        all_metrics (dict): the final dictionary with info from all param_grid.
        base_estimator_name (str): the name of the base estimator that is shown in the output table.
    """ 

    output_table = pd.DataFrame()
    for model in all_metrics.keys():
        all_metrics[model]['parameters']['hparam'].pop('random_state', None)
        table_cv = pd.DataFrame(all_metrics[model]['metrics']['average'], index=[0])
        table_cv.insert(0, 'model', base_estimator_name)
        table_cv.insert(1, 'param', str(all_metrics[model]['parameters']['hparam']))
        table_cv.insert(2, 'Bias_Mitigation', str(all_metrics[model]['parameters']['Bias_Mitigation']))
        table_cv.insert(3, 'threshold', all_metrics[model]['parameters']['threshold'])
        output_table = pd.concat([output_table, table_cv]).reset_index(drop=True)
        
    output_table.Bias_Mitigation = output_table.Bias_Mitigation.replace({'no_BM':''})
    accuracy_metrics = ['avg_'+col for col in ['acc_score', 'bacc_score', 'f1_score', 'auc_score']]
    fairness_metrics = ['avg_'+col for col in ['spd_score', 'aod_score']]
    display(output_table.style.highlight_max(subset=accuracy_metrics,color='lightgreen')\
            .apply(lambda s: ['background: yellow' if abs(cell)==min(abs(s)) else '' for cell in s], subset=fairness_metrics))


In [11]:
# i = 0
# param_grid = {'hyperp_grid': {'solver':['lbfgs', 'sag']}, 
#               'threshold': np.linspace(0.3, 0.7, 2),
#               'Bias_Mitigation':[None]}
# thres_arr = param_grid['threshold']
# BM_arr = param_grid['Bias_Mitigation']
# param = list(ParameterGrid(param_grid['hyperp_grid']))[i]
# thres_BM_dict = {}
# # logistic regression classifier with stratified kfold cross validation
# skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=1234)
# thres_dict = {threshold: {'acc_score':[], 'bacc_score': [], 'f1_score': [], 'auc_score': [], 'spd_score': [], 'aod_score': []}\
#               for threshold in thres_arr}

# for train_index, test_index in skf.split(X, y):
#     X_train, X_test = X.iloc[train_index,:],X.iloc[test_index,:]
#     y_train , y_test = y[train_index] , y[test_index]

#     # normalize data features, fit on training set to avoid data leakage
#     scaler = StandardScaler()
#     scaler.fit(X_train[X_train.columns])
#     X_train[X_train.columns] = scaler.transform(X_train[X_train.columns])
#     X_test[X_test.columns] = scaler.transform(X_test[X_test.columns])

#     model = ReweighingMeta(estimator=LogisticRegression(**param))
#     model.fit(X_train,y_train)
#     for threshold in thres_arr:
#         pred_prob = model.predict_proba(X_test)[:, 1]
#         y_pred = (pred_prob >= threshold).astype('int') # set threshold

#         thres_dict = store_metrics(y_test, y_pred, thres_dict, threshold)
#     avg_metrics = get_avg_metrics(thres_dict)
#     # thres_BM_dict[BM_name] = avg_metrics


# y_pred

In [12]:
class skf_model():
    def __init__(self, cv, report, random_state):
        self.cv = cv
        self.report = report
        self.random_state = random_state
        
    # @methods
    def get_metrics(self, X, y, base='LR', param = {}, BM_arr=[None], thres_arr=[0.5]):
        # logistic regression classifier with stratified kfold cross validation
        skf = StratifiedKFold(n_splits=self.cv, shuffle=True, random_state=self.random_state)
        thres_dict = {threshold: {'acc_score':[], 'bacc_score': [], 'f1_score': [], 'auc_score': [],
                                  'spd_score': [], 'aod_score': []}\
                      for threshold in thres_arr}
        thres_BM_dict = {}
        
        for train_index, test_index in skf.split(X, y):
            X_train, X_test = X.iloc[train_index,:],X.iloc[test_index,:]
            y_train , y_test = y[train_index] , y[test_index]

            # normalize data features, fit on training set to avoid data leakage
            scaler = StandardScaler()
            scaler.fit(X_train[X_train.columns])
            X_train[X_train.columns] = scaler.transform(X_train[X_train.columns])
            X_test[X_test.columns] = scaler.transform(X_test[X_test.columns])

            # fit model
            param.update({'random_state': self.random_state})
            if base=='LR':
                base_estimator=LogisticRegression(**param)
            elif base=='RF':
                base_estimator=RandomForestClassifier(**param)
                
            for BM in BM_arr:
                BM_name = BM
                if BM == None:
                    BM_name = 'no_BM'
                    model = base_estimator
                elif BM == 'RW':
                    model = ReweighingMeta(estimator=base_estimator)
                elif BM == 'AD':
                    model = AdversarialDebiasing(prot_attr='race', random_state=self.random_state)

                model.fit(X_train,y_train)
                for threshold in thres_arr:
                    pred_prob = model.predict_proba(X_test)[:,1]
                    y_pred = (pred_prob >= threshold).astype('int') # set threshold

                    thres_dict = store_metrics(y_test, y_pred, pred_prob, thres_dict, threshold)
                avg_metrics = get_avg_metrics(thres_dict)
                thres_BM_dict[BM_name] = avg_metrics

        return thres_BM_dict

In [13]:
def GridsearchCV_LRC(param_grid, cv=10, X=X, y=y, report=True, random_state=1234):
    if 'hyperp_grid' in param_grid: 
        hyperp_grid = list(ParameterGrid(param_grid['hyperp_grid']))
    else: hyperp_grid = [{'penalty':'l2'}] # default setting    
    if 'Bias_Mitigation' not in param_grid: 
        param_grid['Bias_Mitigation'] = [None]        
    if 'threshold' not in param_grid: 
        param_grid['threshold'] = [0.5]
        
    all_metrics = {}
    for i, param in enumerate(tqdm(hyperp_grid)):
        if report:
            print(param)
        model = skf_model(cv, report, random_state)
        metrics = model.get_metrics(X=X, y=y, param=param, 
                                    BM_arr=param_grid['Bias_Mitigation'], thres_arr=param_grid['threshold'])
        # print(metrics)
        for j, BM in enumerate(metrics.keys()):
            for k,thres in enumerate(metrics[BM].keys()):
                all_param = {'hparam':param, 'Bias_Mitigation':BM, 'threshold':thres}
                all_metrics['LR_%s%s%s'%(i,j,k)] = {'parameters':all_param, 'metrics':metrics[BM][thres]}

           
    if report==True:
        report_outcome(all_metrics, 'LR')
    
    return all_metrics

In [14]:
param_grid = {'hyperp_grid': {'C':[1, 10],'solver':['liblinear', 'saga'],'penalty':['l1','l2']}, 
              'threshold': np.linspace(0.3, 0.7, 7),
              'Bias_Mitigation':[None,'RW','AD']}
# param_grid = {'hyperp_grid': {'C':[0.1, 1]},
#               'threshold': np.linspace(0.3, 0.7, 11),
#               'Bias_Mitigation':[None]}
all_metrics = GridsearchCV_LRC(param_grid=param_grid, X=X, y=y)

  0%|                                                            | 0/8 [00:00<?, ?it/s]

{'C': 1, 'penalty': 'l1', 'solver': 'liblinear'}
Metal device set to: Apple M1 Pro


2022-12-20 12:28:01.407839: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-12-20 12:28:01.407966: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-12-20 12:28:01.537859: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:354] MLIR V1 optimization pass is not enabled
2022-12-20 12:28:01.541017: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-12-20 12:28:01.541649: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 12:28:01.587015: I tensorflow/core/grappler/optimizers/custom_graph

{'C': 1, 'penalty': 'l1', 'solver': 'saga'}


2022-12-20 12:31:49.275125: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-12-20 12:31:49.275152: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-12-20 12:31:49.402381: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 12:31:49.432810: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 12:32:07.427758: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 12:32:12.07

{'C': 1, 'penalty': 'l2', 'solver': 'liblinear'}


2022-12-20 12:35:39.778556: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-12-20 12:35:39.778585: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-12-20 12:35:39.920084: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 12:35:39.960373: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 12:35:59.927933: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 12:36:04.70

{'C': 1, 'penalty': 'l2', 'solver': 'saga'}


2022-12-20 12:39:33.498592: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-12-20 12:39:33.498617: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-12-20 12:39:33.630427: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 12:39:33.667596: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 12:39:52.036600: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 12:39:56.71

{'C': 10, 'penalty': 'l1', 'solver': 'liblinear'}


2022-12-20 12:43:33.596872: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-12-20 12:43:33.596896: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-12-20 12:43:33.729603: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 12:43:33.767686: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 12:43:51.955881: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 12:43:56.64

{'C': 10, 'penalty': 'l1', 'solver': 'saga'}


2022-12-20 12:47:29.711498: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-12-20 12:47:29.711525: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-12-20 12:47:29.841845: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 12:47:29.887525: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 12:47:49.106197: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 12:47:53.81

{'C': 10, 'penalty': 'l2', 'solver': 'liblinear'}


2022-12-20 12:51:27.433693: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-12-20 12:51:27.433716: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-12-20 12:51:27.570849: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 12:51:27.617864: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 12:51:47.207184: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 12:51:51.88

{'C': 10, 'penalty': 'l2', 'solver': 'saga'}


2022-12-20 12:55:24.780145: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-12-20 12:55:24.780169: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-12-20 12:55:24.911528: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 12:55:24.962097: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 12:55:43.380676: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 12:55:48.10

Unnamed: 0,model,param,Bias_Mitigation,threshold,avg_acc_score,avg_bacc_score,avg_f1_score,avg_auc_score,avg_spd_score,avg_aod_score
0,LR,"{'C': 1, 'penalty': 'l1', 'solver': 'liblinear'}",,0.3,0.605633,0.580824,0.719836,0.721844,-0.095805,-0.082201
1,LR,"{'C': 1, 'penalty': 'l1', 'solver': 'liblinear'}",,0.366667,0.629675,0.60895,0.725409,0.721844,-0.115419,-0.095241
2,LR,"{'C': 1, 'penalty': 'l1', 'solver': 'liblinear'}",,0.433333,0.653949,0.638672,0.727177,0.721844,-0.145,-0.118815
3,LR,"{'C': 1, 'penalty': 'l1', 'solver': 'liblinear'}",,0.5,0.670848,0.663827,0.712861,0.721844,-0.176495,-0.144454
4,LR,"{'C': 1, 'penalty': 'l1', 'solver': 'liblinear'}",,0.566667,0.657724,0.660715,0.656707,0.721844,-0.193434,-0.160556
5,LR,"{'C': 1, 'penalty': 'l1', 'solver': 'liblinear'}",,0.633333,0.623113,0.635218,0.562493,0.721844,-0.168921,-0.138208
6,LR,"{'C': 1, 'penalty': 'l1', 'solver': 'liblinear'}",,0.7,0.581591,0.601336,0.439677,0.721844,-0.131978,-0.107169
7,LR,"{'C': 1, 'penalty': 'l1', 'solver': 'liblinear'}",RW,0.3,0.605999,0.581169,0.720078,0.722148,-0.097751,-0.084285
8,LR,"{'C': 1, 'penalty': 'l1', 'solver': 'liblinear'}",RW,0.366667,0.629941,0.609172,0.725669,0.722148,-0.118089,-0.098095
9,LR,"{'C': 1, 'penalty': 'l1', 'solver': 'liblinear'}",RW,0.433333,0.654444,0.639101,0.727725,0.722148,-0.147448,-0.121295


In [15]:
def GridsearchCV_RFC(param_grid, cv=10, X=X, y=y, report=True, random_state=1234):
    if 'hyperp_grid' in param_grid: 
        hyperp_grid = list(ParameterGrid(param_grid['hyperp_grid']))
    else: hyperp_grid = [{'penalty':'l2'}] # default setting    
    if 'Bias_Mitigation' not in param_grid: 
        param_grid['Bias_Mitigation'] = [None]        
    if 'threshold' not in param_grid: 
        param_grid['threshold'] = [0.5]
        
    all_metrics = {}
    for i, param in enumerate(tqdm(hyperp_grid)):
        if report:
            print(param)
        model = skf_model(cv, report, random_state)
        metrics = model.get_metrics(X=X, y=y, base='RF', param=param, 
                                    BM_arr=param_grid['Bias_Mitigation'], thres_arr=param_grid['threshold'])
        # print(metrics)
        for j, BM in enumerate(metrics.keys()):
            for k,thres in enumerate(metrics[BM].keys()):
                all_param = {'hparam':param, 'Bias_Mitigation':BM, 'threshold':thres}
                all_metrics['LR_%s%s%s'%(i,j,k)] = {'parameters':all_param, 'metrics':metrics[BM][thres]}

           
    if report==True:
        report_outcome(all_metrics, 'RF')
    
    return all_metrics

In [16]:
param_grid = {'hyperp_grid': {'n_estimators':[10, 100],'criterion':['gini', 'entropy'], 'max_depth':[8, None]}, 
              'threshold': np.linspace(0.3, 0.7, 7),
              'Bias_Mitigation':[None,'RW','AD']}
# param_grid = {'hyperp_grid': {'n_estimators':[10, 100]},
#               'threshold': np.linspace(0.3, 0.7, 11),
#               'Bias_Mitigation':[None]}
all_metrics = GridsearchCV_RFC(param_grid=param_grid, X=X, y=y)

  0%|                                                            | 0/8 [00:00<?, ?it/s]

{'criterion': 'gini', 'max_depth': 8, 'n_estimators': 10}


2022-12-20 12:59:23.582867: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-12-20 12:59:23.582888: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-12-20 12:59:23.719448: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 12:59:23.775232: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 12:59:42.360934: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 12:59:47.07

{'criterion': 'gini', 'max_depth': 8, 'n_estimators': 100}


2022-12-20 13:03:27.943271: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-12-20 13:03:27.943299: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-12-20 13:03:28.073452: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 13:03:28.123184: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 13:03:47.137316: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 13:03:52.11

{'criterion': 'gini', 'max_depth': None, 'n_estimators': 10}


2022-12-20 13:07:34.411538: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-12-20 13:07:34.411568: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-12-20 13:07:34.541936: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 13:07:34.587554: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 13:07:53.185967: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 13:07:57.94

{'criterion': 'gini', 'max_depth': None, 'n_estimators': 100}


2022-12-20 13:11:34.156720: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-12-20 13:11:34.156748: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-12-20 13:11:34.289509: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 13:11:34.339913: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 13:11:54.130434: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 13:11:59.48

{'criterion': 'entropy', 'max_depth': 8, 'n_estimators': 10}


2022-12-20 13:15:44.127623: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-12-20 13:15:44.127646: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-12-20 13:15:44.260897: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 13:15:44.323468: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 13:16:03.486304: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 13:16:08.22

{'criterion': 'entropy', 'max_depth': 8, 'n_estimators': 100}


2022-12-20 13:19:50.945700: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-12-20 13:19:50.945723: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-12-20 13:19:51.077495: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 13:19:51.142369: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 13:20:10.027506: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 13:20:15.10

{'criterion': 'entropy', 'max_depth': None, 'n_estimators': 10}


2022-12-20 13:24:03.655304: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-12-20 13:24:03.655329: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-12-20 13:24:03.794130: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 13:24:03.862384: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 13:24:22.823999: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 13:24:27.61

{'criterion': 'entropy', 'max_depth': None, 'n_estimators': 100}


2022-12-20 13:28:18.058721: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-12-20 13:28:18.058744: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-12-20 13:28:18.195860: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 13:28:18.274316: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 13:28:38.215752: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2022-12-20 13:28:43.83

Unnamed: 0,model,param,Bias_Mitigation,threshold,avg_acc_score,avg_bacc_score,avg_f1_score,avg_auc_score,avg_spd_score,avg_aod_score
0,RF,"{'criterion': 'gini', 'max_depth': 8, 'n_estimators': 10}",,0.3,0.616028,0.592623,0.723177,0.721702,-0.121669,-0.106901
1,RF,"{'criterion': 'gini', 'max_depth': 8, 'n_estimators': 10}",,0.366667,0.642683,0.624887,0.726366,0.721702,-0.154107,-0.131319
2,RF,"{'criterion': 'gini', 'max_depth': 8, 'n_estimators': 10}",,0.433333,0.664286,0.652055,0.725887,0.721702,-0.170584,-0.141052
3,RF,"{'criterion': 'gini', 'max_depth': 8, 'n_estimators': 10}",,0.5,0.670093,0.663957,0.709155,0.721702,-0.173752,-0.140196
4,RF,"{'criterion': 'gini', 'max_depth': 8, 'n_estimators': 10}",,0.566667,0.663995,0.664517,0.674686,0.721702,-0.187124,-0.15352
5,RF,"{'criterion': 'gini', 'max_depth': 8, 'n_estimators': 10}",,0.633333,0.643089,0.651232,0.611314,0.721702,-0.164344,-0.131685
6,RF,"{'criterion': 'gini', 'max_depth': 8, 'n_estimators': 10}",,0.7,0.594483,0.612151,0.477302,0.721702,-0.117613,-0.091472
7,RF,"{'criterion': 'gini', 'max_depth': 8, 'n_estimators': 10}",RW,0.3,0.616316,0.592879,0.723407,0.721557,-0.122571,-0.107864
8,RF,"{'criterion': 'gini', 'max_depth': 8, 'n_estimators': 10}",RW,0.366667,0.643117,0.625286,0.726754,0.721557,-0.155466,-0.13263
9,RF,"{'criterion': 'gini', 'max_depth': 8, 'n_estimators': 10}",RW,0.433333,0.664536,0.652238,0.72627,0.721557,-0.172083,-0.142469
