# Multi-Scoring Hyperparameter Tuning Using GCU (Generic Classifier Utility)

While fine-tuning a hyperparameter for one scoring strategy, there is a chance that other scores drop.
For example, accuracy might increase while precision, recall, or ROC AUC drops.
To balance the losses in the other metrics, we need to evaluate multiple scores in a single graph.

For this I developed a <b><u>Generic Classifier Utility</u></b> library, which displays multiple scores for different tree-based boosting techniques.
That is, a single library can accommodate multiple tree-boosting techniques along with multiple scoring metrics.
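
As a rough illustration of the idea (not the library's actual implementation), scikit-learn's `GridSearchCV` can already evaluate several scorers for one hyperparameter in a single run. The sketch below uses synthetic data from `make_classification`, and the helper name `sweep_single_hyperparameter` is hypothetical.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score, make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier


def sweep_single_hyperparameter(estimator, X, y, name, values, scoring, refit, cv=5):
    """Cross-validate one hyperparameter against several scorers at once.

    Hypothetical helper for illustration; returns one row per candidate value
    and one column of mean test scores per metric.
    """
    search = GridSearchCV(estimator, {name: list(values)}, scoring=scoring,
                          refit=refit, cv=cv)
    search.fit(X, y)
    return pd.DataFrame(
        {metric: search.cv_results_["mean_test_" + metric] for metric in scoring},
        index=pd.Index(list(values), name=name),
    )


# Synthetic stand-in data (assumption: not the Adult dataset)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
demo_scoring = {"F1": make_scorer(f1_score), "Accuracy": make_scorer(accuracy_score)}
demo_results = sweep_single_hyperparameter(
    DecisionTreeClassifier(random_state=0), X_demo, y_demo,
    name="max_depth", values=range(2, 8, 2),
    scoring=demo_scoring, refit="Accuracy",
)
print(demo_results)
```

Each column of `demo_results` can then be drawn as one line on a shared chart, which is the kind of view the utility is meant to provide.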

The source code is available [here](https://github.com/KeshavShetty/kesh-utils/tree/master/KUtils/classifier) and the PyPI package [here](https://pypi.org/project/kesh-utils/).

For this demo I used the Adult Census Income dataset.

In [2]:
# This Python 3 environment 
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory
import os
#print(os.listdir("../input"))

from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score, make_scorer, recall_score, precision_score, f1_score, roc_auc_score

from sklearn.model_selection import GridSearchCV

In [4]:
from aif360.datasets import StandardDataset
from aif360.metrics import BinaryLabelDatasetMetric, ClassificationMetric
import matplotlib.patches as patches
from aif360.algorithms.preprocessing import Reweighing
#from packages import *
#from ml_fairness import *
import matplotlib.pyplot as plt
import seaborn as sns



from IPython.display import Markdown, display

In [5]:
# statsmodels has a problem with factorial in the latest release, so pin an older version.
# The comment sits on its own line because pip rejects an inline '#' as an invalid requirement.
!pip install statsmodels==0.10.0rc2 --pre



In [6]:
# Install the Library (Refer: https://pypi.org/project/kesh-utils/ )
!pip install kesh-utils

Collecting kesh-utils
  Downloading https://files.pythonhosted.org/packages/d1/49/c98a0cd11386dddc8d0b7465e3e9aea45996f507be0f285b24a812484497/kesh-utils-0.4.9.tar.gz
Building wheels for collected packages: kesh-utils
  Building wheel for kesh-utils (setup.py): started
  Building wheel for kesh-utils (setup.py): finished with status 'done'
  Stored in directory: C:\Users\Usman Gohar\AppData\Local\pip\Cache\wheels\04\4f\12\161e1106e024583ba970104a2f13a0030907b5d380395ca112
Successfully built kesh-utils
Installing collected packages: kesh-utils
Successfully installed kesh-utils-0.4.9


DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.


In [7]:
# Ignore the warnings if any
import warnings  
warnings.filterwarnings('ignore')

In [8]:
# Load the dataset 
adult_income_df = pd.read_csv('../../Data/adult.csv')

In [9]:
adult_income_df.head()

Unnamed: 0,age,workclass,fnlwgt,education,education.num,marital.status,occupation,relationship,race,sex,capital.gain,capital.loss,hours.per.week,native.country,income
0,90,?,77053,HS-grad,9,Widowed,?,Not-in-family,White,Female,0,4356,40,United-States,<=50K
1,82,Private,132870,HS-grad,9,Widowed,Exec-managerial,Not-in-family,White,Female,0,4356,18,United-States,<=50K
2,66,?,186061,Some-college,10,Widowed,?,Unmarried,Black,Female,0,4356,40,United-States,<=50K
3,54,Private,140359,7th-8th,4,Divorced,Machine-op-inspct,Unmarried,White,Female,0,3900,40,United-States,<=50K
4,41,Private,264663,Some-college,10,Separated,Prof-specialty,Own-child,White,Female,0,3900,40,United-States,<=50K


In [10]:
# Quick known cleanup for this dataset
adult_income_df['workclass'] = adult_income_df['workclass'].replace('?', 'Unknown')  # Treat '?' workclass as Unknown
adult_income_df = adult_income_df[adult_income_df['occupation'] != '?'].copy()  # Remove rows with occupation '?'; copy to avoid chained-assignment warnings
adult_income_df['native.country'] = adult_income_df['native.country'].replace('?', adult_income_df['native.country'].mode()[0])  # Replace '?' with the mode
adult_income_df['fnlwgt'] = np.log(adult_income_df['fnlwgt'])  # Convert to natural log
adult_income_df.loc[adult_income_df['native.country'] != 'United-States', 'native.country'] = 'non_usa'  # Too many category levels; keep just US and non-US

In [11]:
# We will use Label encoder for all categorical variables
from sklearn import preprocessing

# encode categorical variables using Label Encoder
# select all categorical variables
df_categorical = adult_income_df.select_dtypes(include=['object'])
df_categorical.head()

# apply Label encoder to df_categorical
le = preprocessing.LabelEncoder()
df_categorical = df_categorical.apply(le.fit_transform)
df_categorical.head()

# concat df_categorical with original df
adult_income_df = adult_income_df.drop(df_categorical.columns, axis=1)
adult_income_df = pd.concat([adult_income_df, df_categorical], axis=1)
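
One caveat with `df_categorical.apply(le.fit_transform)`: the single encoder is refit for every column, so afterwards `le` only remembers the classes of the last column. If the integer codes need to be mapped back later (for example, to confirm which code `sex` uses for each group), a sketch along these lines keeps one encoder per column. The toy frame and the `encoders` dict are illustrative assumptions.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame standing in for the categorical columns (illustrative values only)
toy = pd.DataFrame({
    "sex": ["Female", "Male", "Female", "Male"],
    "income": ["<=50K", ">50K", "<=50K", "<=50K"],
})

# Keep one fitted encoder per column so every mapping can be inspected later
encoders = {}
encoded = toy.copy()
for column in toy.columns:
    encoders[column] = LabelEncoder()
    encoded[column] = encoders[column].fit_transform(toy[column])

# LabelEncoder assigns codes in sorted order of the class labels
sex_mapping = {str(label): code for code, label in enumerate(encoders["sex"].classes_)}
print(sex_mapping)
```

Because the codes follow sorted label order, `Female` maps to 0 and `Male` to 1 in this toy frame.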

In [12]:
# Scale the numerical features using StandardScaler
from sklearn.preprocessing import StandardScaler
numerical_column_names = ['age','fnlwgt','education.num', 'capital.gain', 'capital.loss', 'hours.per.week']
scaler = StandardScaler()

adult_income_df[numerical_column_names] = scaler.fit_transform(
    adult_income_df[numerical_column_names])
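
For reference, `StandardScaler` replaces each value with its z-score, (x - mean) / std, using the column mean and the population standard deviation. A small check on toy numbers (not taken from the dataset):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy column of ages (illustrative values)
ages = np.array([[20.0], [30.0], [40.0], [50.0]])

scaled = StandardScaler().fit_transform(ages)

# Same computation by hand: subtract the mean, divide by the population std (ddof=0)
manual = (ages - ages.mean(axis=0)) / ages.std(axis=0)

print(scaled.ravel())
print(np.allclose(scaled, manual))
```

This is why the scaled columns in the cleaned dataset are centred on 0 and why unusually large values, such as a high `capital.loss`, show up as large positive numbers.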

In [13]:
# Final cleaned dataset 
adult_income_df.head()

Unnamed: 0,age,fnlwgt,education.num,capital.gain,capital.loss,hours.per.week,workclass,education,marital.status,occupation,relationship,race,sex,native.country,income
1,3.320351,-0.298396,-0.441111,-0.147516,10.519126,-1.914806,2,11,6,3,1,4,0,0,0
3,1.185882,-0.211235,-2.392386,-0.147516,9.395006,-0.079207,2,5,0,6,4,4,0,0,0
4,0.194878,0.796972,-0.050856,-0.147516,9.395006,-0.079207,2,15,5,9,3,4,0,0,0
5,-0.338739,0.480345,-0.441111,-0.147516,9.074533,0.337974,2,11,0,7,4,4,0,0,0
6,-0.033815,-0.099279,-1.611876,-0.147516,9.074533,-0.079207,2,0,5,0,4,4,1,0,0


In [None]:
# Prepare the data for model building and evaluation
X = adult_income_df.drop('income', axis=1)
y = adult_income_df['income'] 
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, test_size=0.3, random_state=43)

# GCU in Action
### The method used is KUtils.classifier.generic_classifier_utils.single_hyperparameter_multiple_scoring_tuning()

In [None]:
# Load the custom library
from KUtils.classifier import generic_classifier_utils as gcu

### We will use
- DecisionTreeClassifier
- RandomForestClassifier
- XGBClassifier
- LGBMClassifier

For scoring we will use

    model_scoring = {'F1': make_scorer(f1_score),
        'AUC': make_scorer(roc_auc_score),
        'Accuracy': make_scorer(accuracy_score)
    }

Each call tunes a single hyperparameter against multiple scoring metrics.
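
A detail worth knowing about the `'AUC'` entry: `make_scorer(roc_auc_score)` with default arguments scores the hard class labels returned by `predict`, which for a binary problem equals balanced accuracy rather than the usual probability-based AUC. The sketch below, on synthetic data with a `LogisticRegression` chosen purely for illustration, compares it with scikit-learn's built-in `'roc_auc'` scorer, which ranks by predicted scores.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (balanced_accuracy_score, get_scorer, make_scorer,
                             roc_auc_score)

# Synthetic binary problem (assumption: stand-in for the real data)
X_demo, y_demo = make_classification(n_samples=400, n_features=6, random_state=1)
clf = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

# Scorer as used in the scoring dict: AUC computed from hard 0/1 predictions
auc_from_labels = make_scorer(roc_auc_score)(clf, X_demo, y_demo)

# Built-in scorer: AUC computed from continuous scores (probabilities or decision values)
auc_from_scores = get_scorer("roc_auc")(clf, X_demo, y_demo)

# With hard labels the ROC curve has a single interior point, so its area
# reduces to the mean of sensitivity and specificity (balanced accuracy)
balanced_acc = balanced_accuracy_score(y_demo, clf.predict(X_demo))

print(auc_from_labels, auc_from_scores, balanced_acc)
```

If the probability-based AUC is what is wanted, passing the string `'roc_auc'` as the dict value is one way to get it with plain scikit-learn. Whether the utility accepts string scorers is not confirmed here.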

# 1. DecisionTreeClassifier() and Hyperparameter 'max_depth' with range(3, 21, 3)

In [None]:
from sklearn.tree import DecisionTreeClassifier

scores = gcu.single_hyperparameter_multiple_scoring_tuning(
    X_train, y_train,
    cv_folds=5, 
    hyper_parameter_name='max_depth',
    hyper_parameter_range = range(3, 21, 3),
    model_scoring = {'F1': make_scorer(f1_score),
                     'AUC': make_scorer(roc_auc_score),
                     'Accuracy': make_scorer(accuracy_score)
                    },
    refit='AUC',
    classifier_algo=DecisionTreeClassifier())

### In a single chart you can see which scores are improving and which are deteriorating

In the chart above, all three scores (AUC, Accuracy, F1) are at their best at max_depth=9.

# 2. RandomForestClassifier() and Hyperparameter 'n_estimators' with range(5, 200, 25)

In [None]:
from sklearn.ensemble import RandomForestClassifier

scores = gcu.single_hyperparameter_multiple_scoring_tuning(
    X_train, y_train,
    cv_folds=10, 
    hyper_parameter_name='n_estimators',
    hyper_parameter_range =range(5, 200, 25),   
    model_scoring = {'F1': make_scorer(f1_score),
                     'AUC': make_scorer(roc_auc_score),
                     'Accuracy': make_scorer(accuracy_score)
                    },
    refit='AUC',
    classifier_algo=RandomForestClassifier(max_depth=4))

# 3. XGBClassifier() and Hyperparameter 'learning_rate' with values [0.1, 0.2, 0.3, 0.4, 0.5, 0.9]

In [None]:
from xgboost.sklearn import XGBClassifier

scores = gcu.single_hyperparameter_multiple_scoring_tuning(
    X_train, y_train,
    cv_folds=10, 
    hyper_parameter_name='learning_rate',
    hyper_parameter_range = [0.1, 0.2, 0.3, 0.4, 0.5, 0.9],
    model_scoring = {'F1': make_scorer(f1_score),
                     'AUC': make_scorer(roc_auc_score),
                     'Accuracy': make_scorer(accuracy_score)
                    },
    refit='AUC',
    classifier_algo=XGBClassifier(objective= 'binary:logistic'))

#### Accuracy is best at learning rate 0.3, whereas F1 is best at 0.4
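
When metrics disagree like this, one simple way to settle on a value is to rank every candidate under each metric and pick the best average rank. This is only one possible rule, and the scores below are made-up placeholders rather than the notebook's results.

```python
import pandas as pd

# Placeholder mean CV scores per learning rate (NOT real results from this notebook)
cv_scores = pd.DataFrame(
    {
        "Accuracy": [0.860, 0.864, 0.868, 0.866, 0.863],
        "F1": [0.690, 0.700, 0.706, 0.712, 0.705],
        "AUC": [0.780, 0.786, 0.790, 0.792, 0.789],
    },
    index=pd.Index([0.1, 0.2, 0.3, 0.4, 0.5], name="learning_rate"),
)

# Best value per metric taken in isolation
best_per_metric = cv_scores.idxmax()

# Rank candidates within each metric (1 = best), then average across metrics
mean_rank = cv_scores.rank(ascending=False).mean(axis=1)
compromise = mean_rank.idxmin()

print(best_per_metric)
print("compromise:", compromise)
```

With these placeholder numbers, accuracy alone would choose 0.3, while the average rank favours 0.4 because it is best on two metrics and second on the third.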

# 4. lightgbm - LGBMClassifier() and Hyperparameter 'num_leaves' with values [2, 5, 10, 50, 100, 200]

In [None]:
import lightgbm as lgb

scores = gcu.single_hyperparameter_multiple_scoring_tuning(
    X_train, y_train,
    cv_folds=10,
    hyper_parameter_name='num_leaves',
    hyper_parameter_range = [2, 5, 10, 50, 100, 200],
    model_scoring = {'F1': make_scorer(f1_score),
                     'AUC': make_scorer(roc_auc_score),
                     'Accuracy': make_scorer(accuracy_score)
                    },
    refit='Accuracy',
    classifier_algo=lgb.LGBMClassifier(n_jobs=-1))

#### I will stop here. This is just a demo of how to use the library with multiple classifiers and different scoring metrics.

#### You can try fine-tuning different classifiers with different scoring metrics and different hyperparameters.

### Check out the other methods in the library.


### Upvote if you liked the Kernel. Leave comments if any

## Fairness

In [14]:
# This DataFrame stores the different models and fairness metrics produced in this notebook
algo_metrics = pd.DataFrame(columns=['model', 'fair_metrics', 'prediction', 'probs'])

def add_to_df_algo_metrics(algo_metrics, model, fair_metrics, preds, probs, name):
    # DataFrame.append was removed in pandas 2.0; pd.concat works across versions
    new_row = pd.DataFrame(data=[[model, fair_metrics, preds, probs]], columns=['model', 'fair_metrics', 'prediction', 'probs'], index=[name])
    return pd.concat([algo_metrics, new_row])

In [15]:
def fair_metrics(dataset, pred, pred_is_dataset=False):
    if pred_is_dataset:
        dataset_pred = pred
    else:
        dataset_pred = dataset.copy()
        dataset_pred.labels = pred
    
    cols = ['statistical_parity_difference', 'equal_opportunity_difference', 'average_abs_odds_difference',  'disparate_impact', 'theil_index']
    obj_fairness = [[0,0,0,1,0]]
    
    fair_metrics = pd.DataFrame(data=obj_fairness, index=['objective'], columns=cols)
    
    for attr in dataset_pred.protected_attribute_names:
        idx = dataset_pred.protected_attribute_names.index(attr)
        privileged_groups =  [{attr:dataset_pred.privileged_protected_attributes[idx][0]}] 
        unprivileged_groups = [{attr:dataset_pred.unprivileged_protected_attributes[idx][0]}] 
        
        classified_metric = ClassificationMetric(dataset, 
                                                     dataset_pred,
                                                     unprivileged_groups=unprivileged_groups,
                                                     privileged_groups=privileged_groups)

        metric_pred = BinaryLabelDatasetMetric(dataset_pred,
                                                     unprivileged_groups=unprivileged_groups,
                                                     privileged_groups=privileged_groups)

        acc = classified_metric.accuracy()

        row = pd.DataFrame([[metric_pred.mean_difference(),
                                classified_metric.equal_opportunity_difference(),
                                classified_metric.average_abs_odds_difference(),
                                metric_pred.disparate_impact(),
                                classified_metric.theil_index()]],
                           columns  = cols,
                           index = [attr]
                          )
        fair_metrics = pd.concat([fair_metrics, row])  # DataFrame.append was removed in pandas 2.0
    
    fair_metrics = fair_metrics.replace([-np.inf, np.inf], 2)
        
    return fair_metrics

def plot_fair_metrics(fair_metrics):
    fig, ax = plt.subplots(figsize=(20,4), ncols=5, nrows=1)

    plt.subplots_adjust(
        left    =  0.125, 
        bottom  =  0.1, 
        right   =  0.9, 
        top     =  0.9, 
        wspace  =  .5, 
        hspace  =  1.1
    )

    y_title_margin = 1.2

    plt.suptitle("Fairness metrics", y = 1.09, fontsize=20)
    sns.set(style="dark")

    cols = fair_metrics.columns.values
    obj = fair_metrics.loc['objective']
    size_rect = [0.2,0.2,0.2,0.4,0.25]
    rect = [-0.1,-0.1,-0.1,0.8,0]
    bottom = [-1,-1,-1,0,0]
    top = [1,1,1,2,1]
    bound = [[-0.1,0.1],[-0.1,0.1],[-0.1,0.1],[0.8,1.2],[0,0.25]]

    display(Markdown("### Check bias metrics :"))
    display(Markdown("A model can be considered biased if just one of these five metrics shows that it is biased."))
    for attr in fair_metrics.index[1:len(fair_metrics)].values:
        display(Markdown("#### For the %s attribute :"%attr))
        check = [bound[i][0] < fair_metrics.loc[attr].iloc[i] < bound[i][1] for i in range(0,5)]  # .iloc makes the positional lookup explicit
        display(Markdown("With default thresholds, bias against unprivileged group detected in **%d** out of 5 metrics"%(5 - sum(check))))

    for i in range(0,5):
        plt.subplot(1, 5, i+1)
        ax = sns.barplot(x=fair_metrics.index[1:len(fair_metrics)], y=fair_metrics.iloc[1:len(fair_metrics)][cols[i]])
        
        for j in range(0,len(fair_metrics)-1):
            a, val = ax.patches[j], fair_metrics.iloc[j+1][cols[i]]
            marg = -0.2 if val < 0 else 0.1
            ax.text(a.get_x()+a.get_width()/5, a.get_y()+a.get_height()+marg, round(val, 3), fontsize=15,color='black')

        plt.ylim(bottom[i], top[i])
        plt.setp(ax.patches, linewidth=0)
        ax.add_patch(patches.Rectangle((-5,rect[i]), 10, size_rect[i], alpha=0.3, facecolor="green", linewidth=1, linestyle='solid'))
        plt.axhline(obj[i], color='black', alpha=0.3)
        plt.title(cols[i])
        ax.set_ylabel('')    
        ax.set_xlabel('')
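
To make the first four columns of `fair_metrics` concrete, the sketch below recomputes them with plain NumPy on a tiny hand-made example, following the usual definitions (unprivileged minus privileged, or unprivileged over privileged for the ratio). The arrays are illustrative and the helper `rate` is hypothetical; AIF360 itself is not used here.

```python
import numpy as np

# Hand-made example: 1 = privileged group, 0 = unprivileged group
group = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0])


def rate(mask):
    """Share of favorable (1) predictions among the rows selected by mask."""
    return y_pred[mask].mean()


priv, unpriv = group == 1, group == 0

# Statistical parity difference: gap in favorable-prediction rates
spd = rate(unpriv) - rate(priv)
# Disparate impact: ratio of the same two rates
di = rate(unpriv) / rate(priv)
# Gaps in true positive rate and false positive rate between the groups
tpr_gap = rate(unpriv & (y_true == 1)) - rate(priv & (y_true == 1))
fpr_gap = rate(unpriv & (y_true == 0)) - rate(priv & (y_true == 0))
# Equal opportunity difference: gap in true positive rates
eod = tpr_gap
# Average absolute odds difference: mean of |FPR gap| and |TPR gap|
aaod = 0.5 * (abs(fpr_gap) + abs(tpr_gap))

print(spd, di, eod, aaod)
```

Against the default bounds in `plot_fair_metrics`, all four of these toy values would fall outside the green band.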

In [16]:
def get_fair_metrics_and_plot(data, model, plot=False, model_aif=False):
    pred = model.predict(data).labels if model_aif else model.predict(data.features)
    # fair_metrics function available in the metrics.py file
    fair = fair_metrics(data, pred)

    if plot:
        # plot_fair_metrics function available in the visualisations.py file
        # The visualisation of this function is inspired by the dashboard on the demo of IBM aif360 
        plot_fair_metrics(fair)
        display(fair)
    
    return fair

In [17]:
adult_income_df

Unnamed: 0,age,fnlwgt,education.num,capital.gain,capital.loss,hours.per.week,workclass,education,marital.status,occupation,relationship,race,sex,native.country,income
1,3.320351,-0.298396,-0.441111,-0.147516,10.519126,-1.914806,2,11,6,3,1,4,0,0,0
3,1.185882,-0.211235,-2.392386,-0.147516,9.395006,-0.079207,2,5,0,6,4,4,0,0,0
4,0.194878,0.796972,-0.050856,-0.147516,9.395006,-0.079207,2,15,5,9,3,4,0,0,0
5,-0.338739,0.480345,-0.441111,-0.147516,9.074533,0.337974,2,11,0,7,4,4,0,0,0
6,-0.033815,-0.099279,-1.611876,-0.147516,9.074533,-0.079207,2,0,5,0,4,4,1,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
32556,-1.253512,1.049090,-0.050856,-0.147516,-0.219179,-0.079207,2,15,4,10,1,4,1,0,0
32557,-0.872356,0.752134,0.729654,-0.147516,-0.219179,-0.246080,2,7,2,12,5,4,0,0,0
32558,0.118647,-0.059946,-0.441111,-0.147516,-0.219179,-0.079207,2,11,2,6,0,4,1,0,1
32559,1.490806,-0.085522,-0.441111,-0.147516,-0.219179,-0.079207,2,11,6,0,4,4,0,0,0


In [18]:
privileged_groups = [{'sex': 1}]
unprivileged_groups = [{'sex': 0}]
dataset_orig = StandardDataset(adult_income_df,
                                  label_name='income',
                                  protected_attribute_names=['sex'],
                                  favorable_classes=[1],
                                  privileged_classes=[[1]])



In [19]:
metric_orig_train = BinaryLabelDatasetMetric(dataset_orig, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)
display(Markdown("#### Original training dataset"))
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_train.mean_difference())

#### Original training dataset

Difference in mean outcomes between unprivileged and privileged groups = -0.200292
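
The mean difference reported above is the favorable-outcome rate of the unprivileged group minus that of the privileged group, so a negative value means the unprivileged group receives the favorable label less often. On a plain DataFrame the same quantity can be sketched with a `groupby`; the toy rows below are illustrative, using the same coding as the `StandardDataset` call (`sex` 1 privileged, `income` 1 favorable).

```python
import pandas as pd

# Toy rows (illustrative only): sex 1 = privileged, income 1 = favorable label
toy = pd.DataFrame({
    "sex": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    "income": [1, 1, 0, 0, 0, 1, 0, 0, 0, 0],
})

# Base rate of the favorable label within each group
base_rates = toy.groupby("sex")["income"].mean()

# Unprivileged minus privileged, matching the sign convention of the output above
mean_difference = base_rates.loc[0] - base_rates.loc[1]
print(mean_difference)
```

In this toy frame the privileged group has a base rate of 0.4 and the unprivileged group 0.2, giving a mean difference of about -0.2.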


In [20]:
import ipynbname
nb_fname = ipynbname.name()
nb_path = ipynbname.path()

from xgboost import XGBClassifier
import pickle

data_orig_train, data_orig_test = dataset_orig.split([0.7], shuffle=True)
X_train = data_orig_train.features
y_train = data_orig_train.labels.ravel()

X_test = data_orig_test.features
y_test = data_orig_test.labels.ravel()
num_estimators = 100

model = XGBClassifier(n_estimators= 1,objective= 'binary:logistic', learning_rate =0.3)

mdl = model.fit(X_train, y_train)
with open('../../Results/XGBoost/' + nb_fname + '.pkl', 'wb') as f:
    pickle.dump(mdl, f)

with open('../../Results/XGBoost/' + nb_fname + '_Train' + '.pkl', 'wb') as f:
    pickle.dump(data_orig_train, f) 
    
with open('../../Results/XGBoost/' + nb_fname + '_Test' + '.pkl', 'wb') as f:
    pickle.dump(data_orig_test, f) 

In [21]:
from csv import writer
from sklearn.metrics import accuracy_score, f1_score

final_metrics = []
accuracy = []
f1= []

for i in range(1,num_estimators+1):
    
    model = XGBClassifier(n_estimators= i, objective= 'binary:logistic', learning_rate =0.3)
    mdl = model.fit(X_train, y_train)
    yy = mdl.predict(X_test)
    accuracy.append(accuracy_score(y_test, yy))
    f1.append(f1_score(y_test, yy))
    fair = get_fair_metrics_and_plot(data_orig_test, mdl)                           
    fair_list = fair.iloc[1].tolist()
    fair_list.insert(0, i)
    final_metrics.append(fair_list)
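
The loop above refits a fresh model for every tree count, which means training 1 + 2 + ... + 100 trees in total. Because boosting adds trees sequentially, a single fitted ensemble already contains every shorter ensemble as a prefix. The sketch below shows that idea with scikit-learn's `GradientBoostingClassifier` and its `staged_predict` method, swapped in here because its API is stable across versions. It is not XGBoost, and whether the installed XGBoost offers an equivalent (recent releases accept an `iteration_range` argument in `predict`) would need checking.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: not the Adult dataset)
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=0)

n_trees = 20
booster = GradientBoostingClassifier(n_estimators=n_trees, learning_rate=0.3,
                                     random_state=0).fit(X_tr, y_tr)

# staged_predict yields the test predictions after 1, 2, ..., n_trees trees,
# so every intermediate ensemble is scored from one training run
staged_accuracy, staged_f1 = [], []
for y_stage in booster.staged_predict(X_te):
    staged_accuracy.append(accuracy_score(y_te, y_stage))
    staged_f1.append(f1_score(y_te, y_stage))

print(len(staged_accuracy), staged_accuracy[-1])
```

The per-stage predictions could be passed to `fair_metrics` in the same way as the predictions of the refitted models.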


In [22]:
import numpy as np

# One row per tree count: [n_estimators, SPD, EOD, AAOD, DI, Theil]
final_result = pd.DataFrame(final_metrics)
final_result[4] = np.log(final_result[4])  # log of disparate impact, so parity sits at 0 instead of 1
final_result = final_result.transpose()
final_result.loc[0] = f1  # overwrite the tree-count row with the F1 scores
acc = pd.DataFrame(accuracy).transpose()
acc = acc.rename(index={0: 'accuracy'})
final_result = pd.concat([acc, final_result])  # put accuracy on top
final_result = final_result.rename(index={0: 'f1', 1: 'statistical_parity_difference', 2: 'equal_opportunity_difference', 3: 'average_abs_odds_difference', 4: 'disparate_impact', 5: 'theil_index'})
final_result.columns = ['T' + str(col) for col in final_result.columns]
final_result.insert(0, "classifier", final_result['T' + str(num_estimators - 1)])  # Add the final model's metrics at the beginning of the df
final_result.to_csv('../../Results/XGBoost/' + nb_fname + '.csv')
final_result

Unnamed: 0,classifier,T0,T1,T2,T3,T4,T5,T6,T7,T8,...,T90,T91,T92,T93,T94,T95,T96,T97,T98,T99
accuracy,0.866645,0.853733,0.85536,0.85612,0.857096,0.857856,0.860135,0.861003,0.861979,0.86556,...,0.86773,0.86773,0.867839,0.867405,0.867296,0.866862,0.866319,0.866536,0.866753,0.866645
f1,0.712649,0.657172,0.655467,0.658775,0.660829,0.663585,0.67109,0.673464,0.675013,0.698906,...,0.712974,0.713379,0.71395,0.712876,0.712573,0.711498,0.710798,0.712079,0.712547,0.712649
statistical_parity_difference,-0.188499,-0.113481,-0.151758,-0.137031,-0.147017,-0.138807,-0.147283,-0.142441,-0.14499,-0.179575,...,-0.184705,-0.185165,-0.185963,-0.185644,-0.186979,-0.188155,-0.188774,-0.189195,-0.189195,-0.188499
equal_opportunity_difference,-0.022432,0.128475,-0.047136,0.0241,-0.010563,0.025373,-0.001031,0.018016,0.010652,-0.056103,...,-0.013673,-0.015196,-0.016719,-0.018878,-0.025735,-0.038941,-0.039449,-0.031956,-0.028781,-0.022432
average_abs_odds_difference,0.048587,0.078961,0.047826,0.032606,0.028811,0.033695,0.023287,0.030016,0.027122,0.060549,...,0.042796,0.043601,0.044595,0.045486,0.04922,0.055634,0.05628,0.053116,0.051718,0.048587
disparate_impact,-1.144513,-0.750015,-1.165519,-0.993335,-1.098828,-1.004127,-1.067321,-1.015212,-1.048543,-1.207883,...,-1.135936,-1.134998,-1.137919,-1.136752,-1.14957,-1.161941,-1.161489,-1.154963,-1.154963,-1.144513
theil_index,0.105018,0.129112,0.130559,0.129274,0.128669,0.127639,0.124889,0.124074,0.123722,0.112659,...,0.105514,0.105259,0.10497,0.105366,0.105486,0.105883,0.105974,0.105309,0.105153,0.105018
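
Note that the `disparate_impact` row holds the natural log of the ratio (because of the `np.log` step), so parity sits at 0 rather than 1. Converting a value back, and expressing the 0.8 to 1.2 band from `plot_fair_metrics` on the same log scale, can be sketched as:

```python
import math

# Final-classifier value from the disparate_impact row of the table above
log_di = -1.144513

# Undo the log to recover the ratio of favorable-prediction rates
di = math.exp(log_di)

# The 0.8 to 1.2 band used in plot_fair_metrics, expressed on the log scale
log_lower, log_upper = math.log(0.8), math.log(1.2)

within_band = log_lower < log_di < log_upper
print(round(di, 3), round(log_lower, 3), round(log_upper, 3), within_band)
```

In other words, the unprivileged group receives favorable predictions at roughly 32% of the privileged group's rate, well outside the band.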
