# Project 4：Machine Learning Fairness Algorithms Evaluation
# 1. Intro

Links: <br>
Dataset: https://www.propublica.org/datastore/dataset/compas-recidivism-risk-score-data-and-analysis <br>
Class slides with basic info about dataset: https://docs.google.com/presentation/d/1RQZZpXmt1i-DyEEAZTFiBvrpuePFMJh69tXEWDPJIO8/edit#slide=id.g11f80bb4f73_0_115 <br>
Dataset Github: https://github.com/propublica/compas-analysis <br>
A2 model github: https://github.com/mbilalzafar/fair-classification <br>

Papers: <br>
Maximizing accuracy under fairness constraints (C-SVM and C-LR) <br>
Handling Conditional Discrimination (LM and LPS) <br>

Team Members:
1. Arceneaux, Luke lpa2114@columbia.edu
2. Ren, Xiaoxue xr2159@columbia.edu
3. Wei, Jiahao jw4312@columbia.edu
4. Xia, Weijie wx2281@columbia.edu
5. Xu, Mingze mx2269@columbia.edu
6. Zhu, Yiming yz4336@columbia.edu

# 2. Preparation

        2.1 Packages

In [76]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC
from scipy.optimize import fmin_slsqp
import sys
sys.path.insert(0, '../lib/')
import utils as ut
import loss_funcs as lf
import csvm as csvm

        2.2 Data Cleaning and Wrangling

Directly observing the data, we see that there are 52 features. Some of them are useful, while some are not. <br>
As a result, we decided to extract the features that are useful in this project. <br>
An initial feature selection is about these: ['id','sex','age','age_cat','race','decile_score','score_text','priors_count', <br>
'c_charge_degree',  'is_recid','r_charge_degree','is_violent_recid', 'vr_charge_degree','v_decile_score', 'v_score_text','two_year_recid']

In [77]:
# Importing the raw dataset
raw_data = pd.read_csv("/Users/xu/Desktop/5243 project4/compas-scores-two-years.csv")
# raw_data.head()

# Create New Variable length_of_stay
length_of_stay = pd.to_datetime(raw_data["c_jail_out"]) - pd.to_datetime(raw_data["c_jail_in"])
length_of_stay_days = length_of_stay.astype('timedelta64[h]') / 24

# Filtering the features we need
filtered_df = raw_data.loc[
    (raw_data["race"].isin(["African-American", "Caucasian"])) &
    (length_of_stay_days > 0)
].assign(
    length_of_stay=length_of_stay_days
)[[
    "two_year_recid", "id", "sex", "race", "age", "age_cat", "decile_score", "score_text", "c_charge_degree", 
    "is_recid", "is_violent_recid", "v_decile_score", "v_score_text", "priors_count", "length_of_stay"
]]

# Drop missing values
filtered_df = filtered_df.dropna()

# Display filtered data
filtered_df.head(5)

Unnamed: 0,two_year_recid,id,sex,race,age,age_cat,decile_score,score_text,c_charge_degree,is_recid,is_violent_recid,v_decile_score,v_score_text,priors_count,length_of_stay
1,1,3,Male,African-American,34,25 - 45,3,Low,F,1,1,1,Low,0,10.041667
2,1,4,Male,African-American,24,Less than 25,4,Low,F,1,0,3,Low,4,1.083333
6,1,8,Male,Caucasian,41,25 - 45,6,Medium,F,1,0,2,Low,14,6.291667
8,0,10,Female,Caucasian,39,25 - 45,1,Low,M,0,0,1,Low,0,2.916667
9,1,13,Male,Caucasian,21,Less than 25,3,Low,F,1,1,5,Medium,1,0.958333


In [78]:
# Check if any of the length_of_stay is negative, which is unreasonable.
if (filtered_df['length_of_stay'] < 0).any():
    print("length_of_stay contains negative numbers")
else:
    print("length_of_stay does not contain negative numbers")

length_of_stay does not contain negative numbers


In [79]:
# Converting some of the features to be binary or categorical
filtered_df['sex'] = filtered_df['sex'].apply(lambda x: 1 if x == 'Male' else 0)
filtered_df['race'] = filtered_df['race'].apply(lambda x: 1 if x == 'Caucasian' else 0)

age_cat_map = {'Less than 25': 0, '25 - 45': 1, 'Greater than 45': 2}
filtered_df['age_cat'] = filtered_df['age_cat'].apply(lambda x: age_cat_map[x])

decile_score_map = {'Low': 0, 'Medium': 1, 'High': 2}
filtered_df['score_text'] = filtered_df['score_text'].apply(lambda x: decile_score_map[x])

filtered_df['c_charge_degree'] = filtered_df['c_charge_degree'].apply(lambda x: 1 if x == 'F' else 0)

v_score_map = {'Low': 0, 'Medium': 1, 'High': 2}
filtered_df['v_score_text'] = filtered_df['v_score_text'].apply(lambda x: v_score_map[x])

# filtered_df.head()

In [80]:
# Further features selection
cleaned_data = filtered_df.loc[:, ["two_year_recid", "sex", "race", "age_cat", "c_charge_degree", "v_score_text",
                                   "score_text", "priors_count", "length_of_stay"]]                       

# Normalize priors_count and length_of_stay
cleaned_data["priors_count"] = (cleaned_data["priors_count"] - cleaned_data["priors_count"].mean())/cleaned_data["priors_count"].std()
cleaned_data["length_of_stay"] = (cleaned_data["length_of_stay"] - cleaned_data["length_of_stay"].mean())/cleaned_data["length_of_stay"].std()

# cleaned_data is the final cleaned dataframe that we end up with
cleaned_data = cleaned_data.reset_index(drop=True)

# Display a few lines of the cleaned data
cleaned_data.head()

# len(cleaned_data)       5699rows

Unnamed: 0,two_year_recid,sex,race,age_cat,c_charge_degree,v_score_text,score_text,priors_count,length_of_stay
0,1,1,0,1,1,0,0,-0.738411,-0.187151
1,1,1,0,0,1,0,0,0.045203,-0.356541
2,1,1,1,1,1,0,1,2.00424,-0.258059
3,0,0,1,1,0,0,0,-0.738411,-0.321875
4,1,1,1,0,1,1,0,-0.542508,-0.358905


        2.3 Splitting Dataframe

In [81]:
# According to the instructure, the ratio should be 5:1:1
# It is roughly the same as 5:1:1
train_ratio = 0.7
val_ratio = 0.15
test_ratio = 0.15

# Spliting the dataframe into train, val, and test with roughly 5:1:1
train_val, test = train_test_split(cleaned_data, test_size=test_ratio, random_state=110)
train, val = train_test_split(train_val, test_size=val_ratio/(train_ratio+val_ratio), random_state=110)

# print the shapes of the resulting dataframes
print(f"Training set shape: {train.shape}")
print(f"Validation set shape: {val.shape}")
print(f"Testing set shape: {test.shape}")

Training set shape: (3989, 9)
Validation set shape: (855, 9)
Testing set shape: (855, 9)


In [82]:
label = "two_year_recid"
sensitive_feature = "race"
features = list(cleaned_data.columns)

X_train = train.drop(label, axis=1)
Y_train = train[label].to_numpy()

X_val = val.drop(label, axis=1)
Y_val = val[label].to_numpy()

X_test = test.drop(label, axis=1)
Y_test = test[label].to_numpy()

race_train = train[sensitive_feature]
race_val = val[sensitive_feature]
race_test = test[sensitive_feature]

# 3. Functions defined

In [83]:
# Calibration
# Computes the calibration difference between Caucasians and African-Americans groups in the predicted outcomes.
def calibrate_difference(sensitive_features, y_pred, y_true):
    caucasians_index = np.where(sensitive_features == 1)[0]
    aa_index = np.where(sensitive_features == 0)[0]
    y_pred_caucasians = y_pred[caucasians_index]
    y_true_caucasians = y_true[caucasians_index]
    y_pred_aa = y_pred[aa_index]
    y_true_aa = y_true[aa_index]
    accuracy_caucasians = sum(y_pred_caucasians == y_true_caucasians) / len(y_true_caucasians)
    accuracy_aa = sum(y_pred_aa == y_true_aa) / len(y_true_aa)
    calib_diff = abs(accuracy_caucasians - accuracy_aa)
    return calib_diff


In [84]:
# p_rule
def p_rule(sensitive_features, y_pred):
    caucasians_index = np.where(sensitive_features == 1)[0]
    aa_index = np.where(sensitive_features == 0)[0]
    caucasians_pred = np.where(y_pred[caucasians_index] == 1)
    aa_pred = np.where(y_pred[aa_index] == 1)
    caucasians_percent = caucasians_pred[0].shape[0]/caucasians_index.shape[0]
    aa_percent = aa_pred[0].shape[0]/aa_index.shape[0]
    ratio = min(caucasians_percent/aa_percent, aa_percent/caucasians_percent)
    return ratio, caucasians_percent, aa_percent

In [85]:
# Use hinge loss function to compute the loss
def hinge_loss(w, X, y, C):
    y_hat = y * np.dot(X,w)
    y_hat = np.maximum(np.zeros_like(y_hat), (1-y_hat)) # hinge function
    
    return C*sum(y_hat)

# 4. Baseline

        4.1 Logistic Regression

In [86]:
# Fit a logistic regression classifier
lr_base = LogisticRegression(random_state=110)
lr_base.fit(X_train, Y_train)

# On validation set
accuracy_lr_base_val = lr_base.score(X_val, Y_val)
print(f"Accuracy on the validation set: {accuracy_lr_base_val}")

# Testing 
accuracy_lr_base_test = lr_base.score(X_test, Y_test)
print(f"Test accuracy: {accuracy_lr_base_test:.4f}")

Accuracy on the validation set: 0.672514619883041
Test accuracy: 0.6573


        4.2 SVM

In [87]:
# Fit an SVM classifier
svm_base = SVC(random_state=110)
svm_base.fit(X_train, Y_train)

# On validation
accuracy_svm_base_val = svm_base.score(X_val, Y_val)
print(f"Accuracy on the validation set: {accuracy_svm_base_val}")

# Testing
accuracy_svm_base_test = svm_base.score(X_test, Y_test)
print(f"Test accuracy: {accuracy_svm_base_test:.4f}")


Accuracy on the validation set: 0.6678362573099416
Test accuracy: 0.6585


        4.3 Summary of Baseline Models

In [88]:
summary_baseline = {"Methods": ["LR", "LR", "SVM", "SVM"], 
              "Set": ["Val", "Test", "Val", "Test"],
              "Accuracy (%)": [accuracy_lr_base_val*100, accuracy_lr_base_test*100, accuracy_svm_base_val*100, accuracy_svm_base_test*100],
              "Calibration(%)": [calibrate_difference(race_val, lr_base.predict(X_val), Y_val)*100,
                                 calibrate_difference(race_test, lr_base.predict(X_test), Y_test)*100,
                                 calibrate_difference(race_val, svm_base.predict(X_val), Y_val)*100,
                                 calibrate_difference(race_test, svm_base.predict(X_test), Y_test)*100],
              "p_rule": [p_rule(race_val, lr_base.predict(X_val))[0],
                                 p_rule(race_test, lr_base.predict(X_test))[0],
                                 p_rule(race_val, svm_base.predict(X_val))[0],
                                 p_rule(race_test, svm_base.predict(X_test))[0]]}
pd.DataFrame(summary_baseline)

Unnamed: 0,Methods,Set,Accuracy (%),Calibration(%),p_rule
0,LR,Val,67.251462,1.973728,0.426691
1,LR,Test,65.730994,0.393226,0.544211
2,SVM,Val,66.783626,2.596923,0.491003
3,SVM,Test,65.847953,2.654994,0.634721


# 5. Optimization with Fairness Constraints (Paper A2)

        5.1 CLR

In [89]:
# Train model, use from utils
np.random.seed(100)
w = ut.train_model(X_train,
                   Y_train,
                   x_control = {'race': race_train},
                   loss_function = lf._logistic_loss,
                   apply_fairness_constraints = 1,
                   apply_accuracy_constraint = 0,
                   sep_constraint = 0,
                   sensitive_attrs = ['race'],
                   sensitive_attrs_to_cov_thresh = {'race': 0},
                   gamma = None)


# Fit coefficients/weights into logistic regression in sklearn
CLR = LogisticRegression()
CLR.coef_= w.reshape((1,-1))
CLR.intercept_ = 0
CLR.classes_ = np.array([0, 1])

In [90]:
# On validation set
accuracy_CLR_val = CLR.score(X_val, Y_val)
print(f"Accuracy on the validation set: {accuracy_CLR_val}")

# Testing 
accuracy_CLR_test = CLR.score(X_test, Y_test)
print(f"Test accuracy: {accuracy_CLR_test:.4f}")

Accuracy on the validation set: 0.48771929824561405
Test accuracy: 0.4982




        5.2 CSVM

In [91]:
CSVM = csvm.SVM()
w = CSVM.train_model(X_train, 
                     Y_train, 
                     x_control = {'race': race_train}, 
                     loss_function = hinge_loss, 
                     C =1, 
                     max_iter = -1, 
                     lamb = 1, 
                     epochs = 1000, 
                     lr = 0.1, 
                     apply_fairness_constraints = 1, 
                     sensitive_attrs = ['race'],
                     sensitive_attrs_to_cov_thresh = {'race': 0},
                     gamma = None)
csvm_predict_y_val = np.sign(np.dot(X_val, w))
csvm_predict_y_test = np.sign(np.dot(X_test, w))
csvm_val_accuracy = sum(csvm_predict_y_val == Y_val)/len(Y_val)
csvm_test_accuracy = sum(csvm_predict_y_test == Y_test)/len(Y_test)

Running custom model


In [92]:
summary_c_svm = {"Classifier": ["C-SVM", "C-SVM"],
                "Set": ["Val", "Test"],
                "Accuracy (%)": [csvm_val_accuracy*100, csvm_test_accuracy*100],
                "Calibration (%)": [calibrate_difference(race_val, csvm_predict_y_val, Y_val) * 100, 
                                    calibrate_difference(race_test, csvm_predict_y_test, Y_test) * 100],
                "p_rule": [p_rule(race_val, csvm_predict_y_val)[0],
                           p_rule(race_test, csvm_predict_y_test)[0]]}
pd.DataFrame(summary_c_svm)

Unnamed: 0,Classifier,Set,Accuracy (%),Calibration (%),p_rule
0,C-SVM,Val,47.017544,8.93973,0.941057
1,C-SVM,Test,47.48538,5.43628,0.938462


        5.3 Summary of A2

In [93]:
summary_A2 = {"Methods": ["CLR", "CLR", "CSVM", "CSVM"], 
              "Set": ["Val", "Test", "Val", "Test"],
              "Accuracy (%)": [accuracy_CLR_val*100, accuracy_CLR_test*100, csvm_val_accuracy*100, csvm_test_accuracy*100],
              "Calibration(%)": [calibrate_difference(race_val, CLR.predict(X_val), Y_val)*100,
                                 calibrate_difference(race_test, CLR.predict(X_test), Y_test)*100,
                                 calibrate_difference(race_val, csvm_predict_y_val, Y_val) * 100,
                                 calibrate_difference(race_test, csvm_predict_y_test, Y_test) * 100],
              "p_rule": [p_rule(race_val, CLR.predict(X_val))[0],
                                 p_rule(race_test, CLR.predict(X_test))[0],
                                 p_rule(race_val, csvm_predict_y_val)[0],
                                 p_rule(race_test, csvm_predict_y_test)[0]]}

pd.DataFrame(summary_A2)



Unnamed: 0,Methods,Set,Accuracy (%),Calibration(%),p_rule
0,CLR,Val,48.77193,11.98851,1.0
1,CLR,Test,49.824561,8.791619,0.998936
2,CSVM,Val,47.017544,8.93973,0.941057
3,CSVM,Test,47.48538,5.43628,0.938462


# 6. Local Massaging & Local Preferential Sampling (Paper A6)

        6.1 Local Massaging

In [94]:
#df is the dataframe with all columns to be partitioned
#e is the name of explainatory varaible
def PARTITION(df, e):
    groups = []
    uniques = np.unique(df[e])
    for u in uniques:
        groups.append(df[df[e]==u])
    return groups
    
#part is the dataframe with all columns with same value for the explainatory variable
#si is the current sensitive parameter value
def DELTA(df, part, si):
    isRace = part['race']==si
    Gi = sum(isRace)
    n = len(df)
    
    num = sum(part[isRace]['two_year_recid']==1)/n
    denom = len(part[isRace])/n
    P = num/denom
    
    isNotRace = part['race']!=si
    num = sum(part[isNotRace]['two_year_recid']==1)/n
    denom = len(part[isNotRace])/n

    P_star = 0.5*(P + num/denom) #explainable difference
    return np.floor(Gi*abs(P-P_star)).astype(np.int64)

In [95]:
#Local Massaging algorithm
LM_labels = []
for part in PARTITION(train, 'c_charge_degree'):
    #train model
    X_part = part.drop('two_year_recid', axis=1)
    y_part = part['two_year_recid']
    log_model = LogisticRegression(random_state=110)
    model=log_model.fit(X_part, y_part)
    
    #predict labels from model
    part1 = part[part['race']==1]
    part1.reset_index(drop=True, inplace=True)
    delta1 = DELTA(train, part, 1)
    X_part1 = part1.drop('two_year_recid', axis=1)
    y_part1 = part1['two_year_recid']
    rank = pd.DataFrame(model.decision_function(X_part1), columns = ['rank'])
    comb1 = pd.concat([part1, rank], axis=1)
    
    part0 = part[part['race']==0]
    part0.reset_index(drop=True, inplace=True)
    delta0 = DELTA(train, part, 0)
    X_part0 = part0.drop('two_year_recid', axis=1)
    y_part0 = part0['two_year_recid']
    rank = pd.DataFrame(model.decision_function(X_part0), columns = ['rank'])
    comb0 = pd.concat([part0, rank], axis=1)

    #for C, relabel closest delta datapoints from C to AA
    comb1 = comb1.sort_values(['rank'])
    comb1.reset_index(drop=True, inplace=True)
    
    t = sum(comb1['rank']>0)
    l = len(comb1)
    
    fix1 = np.full(l-t, False)
    relabel = np.full(delta1, True)
    fix2 = np.full(t-delta1, False)
    comb1.loc[np.concatenate([fix1, relabel, fix2]), 'two_year_recid'] = 0
    LM_labels.append(comb1)
    
    #for AA, relabel closest delta datapoints from AA to C
    comb0 = comb0.sort_values(['rank'])
    comb0.reset_index(drop=True, inplace=True)
    
    t = sum(comb0['rank']<0)
    l = len(comb0)
    
    fix1 = np.full(t-delta0, False)
    relabel = np.full(delta0, True)
    fix2 = np.full(l-t, False)
    comb0.loc[np.concatenate([fix1, relabel, fix2]), 'two_year_recid'] = 1
    LM_labels.append(comb0)
    
loc_mass = pd.concat(LM_labels, axis=0)

In [96]:
features=["sex", "race", "age_cat", "c_charge_degree", "v_score_text", "score_text", "priors_count", "length_of_stay"]
X_train_mass = loc_mass[features]
Y_train_mass = loc_mass['two_year_recid'].reset_index(drop = True)
log_model = LogisticRegression(random_state=110)
LM_model = log_model.fit(X_train_mass, Y_train_mass)
LM_acc_train = log_model.score(X_train_mass, Y_train_mass)
LM_acc_test = LM_model.score(X_test, Y_test)

        6.2 Local Preferential Sampling


In [97]:
#Local Preferential Sampling
LPS_labels = []
partition = PARTITION(train, 'c_charge_degree')
for part in partition:
    #train model
    X_part = part.drop("two_year_recid", axis=1)
    y_part = part["two_year_recid"]
    log_model = LogisticRegression(random_state=110)
    model = log_model.fit(X_part, y_part)
    
    #predict labels from model
    part1 = part[part["race"]==1]
    part1.reset_index(drop=True, inplace=True)
    delta1 = DELTA(train, part, 1)//2
    X_part1 = part1.drop("two_year_recid", axis=1)
    y_part1 = part1["two_year_recid"]
    rank = pd.DataFrame(model.decision_function(X_part1), columns = ['rank'])
    comb1 = pd.concat([part1, rank], axis=1)
    
    part0 = part[part["race"]==0]
    part0.reset_index(drop=True, inplace=True)
    delta0 = DELTA(train, part, 0)//2
    X_part0 = part0.drop("two_year_recid", axis=1)
    y_part0 = part0["two_year_recid"]
    rank = pd.DataFrame(model.decision_function(X_part0), columns = ['rank'])
    comb0 = pd.concat([part0, rank], axis=1)

    #for C, replace closest delta/2 data with duplicates from same number of AA 
    comb1 = comb1.sort_values(["rank"])
    comb1.reset_index(drop=True, inplace=True)
    
    t = sum(comb1["rank"]>0)
    l = len(comb1)
    
    keep1 = np.full(l-t, False)
    replace = np.full(delta1, True)
    keep2 = np.full(t-delta1, False)
    toKeep = np.invert(np.concatenate([keep1, replace, keep2]))
    dup1 = np.full(l-t-delta1, False)
    dup2 = np.full(t, False)
    toDup = np.concatenate([dup1, replace, dup2])
    duplicates = comb1[toDup]
    comb1 = comb1[toKeep]
    comb1 = pd.concat([comb1,duplicates], axis=0)
    LPS_labels.append(comb1)
    
    #for AA, replace closest delta/2 data with duplicates from same number of C 
    comb0 = comb0.sort_values(['rank'])
    comb0.reset_index(drop=True, inplace=True)
    
    t = sum(comb0['rank']<0)
    l = len(comb0)
    
    keep1 = np.full(t-delta0, False)
    replace = np.full(delta0, True)
    keep2 = np.full(l-t, False)
    toKeep = np.invert(np.concatenate([keep1, replace, keep2]))
    dup1 = np.full(t, False)
    dup2 = np.full(l-t-delta0, False)
    toDup = np.concatenate([dup1, replace, dup2])
    duplicates = comb0[toDup]
    comb0 = comb0[toKeep]
    comb0 = pd.concat([comb0,duplicates], axis=0)
    LPS_labels.append(comb0)

loc_lps = pd.concat(LPS_labels, axis=0)

In [98]:
features=["sex", "race", "age_cat", "c_charge_degree", "v_score_text", "score_text", "priors_count", "length_of_stay"]
X_train_lps = loc_lps[features]
Y_train_lps = loc_lps['two_year_recid'].reset_index(drop = True)
log_model = LogisticRegression(random_state=110)
LPS_model = log_model.fit(X_train_lps, Y_train_lps)
LPS_acc_train = log_model.score(X_train_lps, Y_train_lps)
LPS_acc_test = LPS_model.score(X_test, Y_test)

In [99]:
summary_A6 = {"Methods": ["Local Massaging (LR)", "Local Massaging (LR)", "Local Preferential Sampling (LR)", "Local Preferential Sampling (LR)"], 
              "Set": ["Train", "Test", "Train", "Test"],
              "Accuracy (%)": [LM_acc_train*100, LM_acc_test*100, LPS_acc_train*100, LPS_acc_test*100],
              "Calibration(%)": [calibrate_difference(race_train, LM_model.predict(X_train_mass), Y_train_mass)*100,
                                 calibrate_difference(race_test, LM_model.predict(X_test), Y_test)*100,
                                 calibrate_difference(race_train, LPS_model.predict(X_train_lps), Y_train_lps)*100,
                                 calibrate_difference(race_test, LPS_model.predict(X_test), Y_test)*100],
              "p_rule": [p_rule(race_train, LM_model.predict(X_train_mass))[0],
                                 p_rule(race_test, LM_model.predict(X_test))[0],
                                 p_rule(race_train, LPS_model.predict(X_train_lps))[0],
                                 p_rule(race_test, LPS_model.predict(X_test))[0]]}
pd.DataFrame(summary_A6)

Unnamed: 0,Methods,Set,Accuracy (%),Calibration(%),p_rule
0,Local Massaging (LR),Train,69.942341,0.849556,0.992513
1,Local Massaging (LR),Test,64.561404,1.334673,0.33768
2,Local Preferential Sampling (LR),Train,69.089997,0.354507,0.973
3,Local Preferential Sampling (LR),Test,65.614035,0.094719,0.571877


# 7. Summary Evaluation of Test Data

In [100]:
summary = {"Methods": ["Baseline LR", "Baseline SVM", 
                       "CLR", "CSVM",
                       "Local Massaging (LR)", "Local Preferential Sampling (LR)"], 
              "Accuracy (%)": [accuracy_lr_base_test*100, accuracy_svm_base_test*100, 
                               accuracy_CLR_test*100, csvm_test_accuracy*100,
                               LM_acc_test*100, LPS_acc_test*100],
              "Calibration(%)": [calibrate_difference(race_test, lr_base.predict(X_test), Y_test)*100,
                                 calibrate_difference(race_test, svm_base.predict(X_test), Y_test)*100,
                                 calibrate_difference(race_test, CLR.predict(X_test), Y_test)*100,
                                 calibrate_difference(race_test, csvm_predict_y_test, Y_test) * 100,
                                 calibrate_difference(race_test, LM_model.predict(X_test), Y_test)*100,
                                 calibrate_difference(race_test, LPS_model.predict(X_test), Y_test)*100],
              "p_rule": [p_rule(race_test, lr_base.predict(X_test))[0],
                                 p_rule(race_test, svm_base.predict(X_test))[0],
                                 p_rule(race_test, CLR.predict(X_test))[0],
                                 p_rule(race_test, csvm_predict_y_test)[0],
                                 p_rule(race_test, LM_model.predict(X_test))[0],
                                 p_rule(race_test, LPS_model.predict(X_test))[0]]}
pd.DataFrame(summary)



Unnamed: 0,Methods,Accuracy (%),Calibration(%),p_rule
0,Baseline LR,65.730994,0.393226,0.544211
1,Baseline SVM,65.847953,2.654994,0.634721
2,CLR,49.824561,8.791619,0.998936
3,CSVM,47.48538,5.43628,0.938462
4,Local Massaging (LR),64.561404,1.334673,0.33768
5,Local Preferential Sampling (LR),65.614035,0.094719,0.571877
