# FairML: Demo<br>
Predict whether income is <=50K (label 0) or >50K (label 1).

# Contents

[FairML: Demo](#FairML:-Demo)<br>
[Contents](#Contents)<br>
* [1. Bias Mitigation: Bias Mitigation 01](#1.-Bias-Mitigation:-Bias-Mitigation-01)<br>
  * [1.1. Dataset Adult Dataset](#1.1.-Dataset-Adult-Dataset)<br>
      * [1.1.1. Original Dataset](#1.1.1.-Original-Dataset)<br>
          * [1.1.1.1. Classifier DecisionTreeClassifier, Parameters: criterion='gini', max_depth=4](#1.1.1.1.-Original-Dataset:-Classifier-DecisionTreeClassifier,-Parameters:-criterion='gini',-max_depth=4)<br>
              * [1.1.1.1.1. Bias Metrics](#1.1.1.1.1.-Original-Dataset:-Bias-Metrics)<br>
      * [1.1.2. Mitigate Bias using DisparateImpactRemover](#1.1.2.-Mitigate-Bias-using-DisparateImpactRemover)<br>
          * [1.1.2.1. Classifier DecisionTreeClassifier, Parameters: criterion='gini', max_depth=4](#1.1.2.1.-After-mitigation-Dataset:-Classifier-DecisionTreeClassifier,-Parameters:-criterion='gini',-max_depth=4)<br>
              * [1.1.2.1.1. Bias Metrics](#1.1.2.1.1.-After-mitigation:-Bias-Metrics)<br>
      * [1.1.3. Mitigate Bias using Reweighing](#1.1.3.-Mitigate-Bias-using-Reweighing)<br>
          * [1.1.3.1. Classifier DecisionTreeClassifier, Parameters: criterion='gini', max_depth=4](#1.1.3.1.-After-mitigation-Dataset:-Classifier-DecisionTreeClassifier,-Parameters:-criterion='gini',-max_depth=4)<br>
              * [1.1.3.1.1. Bias Metrics](#1.1.3.1.1.-After-mitigation:-Bias-Metrics)<br>
  * [1.2. Summary](#1.2.-Summary)

Load dependencies.

In [1]:
from fairml import *
import inspect
import numpy as np
import pandas as pd
from sklearn import tree
from sklearn.preprocessing import MaxAbsScaler
from sklearn.tree import DecisionTreeClassifier
# from sklearn.linear_model import LogisticRegression
# from sklearn.svm import LinearSVC
# from sklearn.neighbors import KNeighborsClassifier
# from sklearn.naive_bayes import GaussianNB
from sklearn import metrics
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.metrics import ClassificationMetric
from aif360.algorithms.preprocessing import *
from aif360.algorithms.inprocessing import *
from aif360.algorithms.postprocessing import *
from aif360.explainers import MetricTextExplainer
from aif360.datasets import StandardDataset
import matplotlib.pyplot as plt
from collections import defaultdict
from IPython.display import Markdown, display
from IPython import get_ipython

In [2]:
fairml = FairML()

In [3]:
print("========================")
print("FairML: Demo")
print("========================")
print("Description:")
print("Predict income <=50K: 0, >50K: 1")

FairML: Demo
Description:
Predict income <=50K: 0, >50K: 1


## [1.](#Contents) Bias Mitigation: Bias Mitigation 01

In [4]:
bm = fairml.add_bias_mitigation(BiasMitigation())

In [5]:
print("")
print("========================")
print("Bias Mitigation: Bias Mitigation 01")
print("------------------------")


Bias Mitigation: Bias Mitigation 01
------------------------


### [1.1.](#Contents) Dataset Adult Dataset

In [6]:
print("")
print("Dataset: Adult Dataset")
print("-------------")


Dataset: Adult Dataset
-------------


#### [1.1.1.](#Contents) Original Dataset

In [7]:
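bm.predicted_attribute = 'income-per-year'  # label column
bm.protected_attributes = ['sex', 'race']
bm.favorable_class = 1      # label value treated as the favorable outcome (>50K)
bm.privileged_class = 1     # protected-attribute value of the privileged group
bm.unprivileged_class = 0   # protected-attribute value of the unprivileged group
bm.dropped_attributes = []
bm.na_values = []
# Split sizes are relative ratios (train : test : validation = 7 : 2 : 3), not row counts.
bm.training_size = 7.0
bm.test_size = 2.0
bm.validation_size = 3.0
bm.total_size = bm.training_size + bm.test_size + bm.validation_size
bm.categorical_features = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'native-country']
bm.default_mappings = None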

Load dataset.

In [8]:
bm.resource = "data/adult.data.numeric.csv"
bm.data = pd.read_csv(bm.resource, header=0)
bm.dataset_original = StandardDataset(df=bm.data, label_name=bm.predicted_attribute, 
                favorable_classes=[bm.favorable_class],
                protected_attribute_names=bm.protected_attributes,
                privileged_classes=[[bm.privileged_class]],
                instance_weights_name=None,
                categorical_features=bm.categorical_features,
                features_to_keep=[],
                features_to_drop=bm.dropped_attributes,
                na_values=bm.na_values, 
                custom_preprocessing=None,
                metadata=bm.default_mappings)
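# split() takes cumulative fractions: training ends at train_end, validation ends at valid_end,
# and the remainder becomes the test set.
train_end = bm.training_size / bm.total_size
valid_end = train_end + bm.validation_size / bm.total_size
bm.dataset_original_train, bm.dataset_original_valid, bm.dataset_original_test = bm.dataset_original.split([train_end, valid_end], shuffle=True)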
bm.privileged_groups = [{bm.protected_attributes[0] : bm.privileged_class}]
bm.unprivileged_groups = [{bm.protected_attributes[0] : bm.unprivileged_class}]
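
The ratios 7 : 2 : 3 are converted into two cumulative fractions before they reach `split`. The sketch below reproduces that arithmetic in plain Python. The helper name `cumulative_split_points` is hypothetical, the row count 48842 is the data size reported in the summary, and truncating with `int` is an assumption about how fractions are mapped to row indices.

```python
def cumulative_split_points(train, valid, test):
    """Return the two cumulative fractions that separate train | valid | test."""
    total = train + valid + test
    train_end = train / total
    valid_end = train_end + valid / total
    return [train_end, valid_end]

# Ratios from the configuration cell: training 7.0, validation 3.0, test 2.0.
points = cumulative_split_points(7.0, 3.0, 2.0)

# Assumed conversion of fractions to row boundaries (truncation).
n_rows = 48842
bounds = [int(p * n_rows) for p in points]
sizes = [bounds[0], bounds[1] - bounds[0], n_rows - bounds[1]]

print("split points:", points)
print("train / valid / test rows:", sizes)
```

Note that the order of the resulting parts is train, validation, test, even though the configuration lists the test ratio before the validation ratio.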

##### [1.1.1.1.](#Contents) Original Dataset: Classifier DecisionTreeClassifier, Parameters: criterion='gini', max_depth=4

In [9]:
print("")
print("Original Dataset: Classifier DecisionTreeClassifier, Parameters: criterion='gini', max_depth=4")
print("-------------")


Original Dataset: Classifier DecisionTreeClassifier, Parameters: criterion='gini', max_depth=4
-------------


Train the model on the original training data.

In [10]:
classifier = DecisionTreeClassifier(criterion='gini', max_depth=4)
model_original_train = bm.train(bm.dataset_original_train,  classifier)

###### [1.1.1.1.1.](#Contents) Original Dataset: Bias Metrics

In [11]:
print("Original Bias Metrics")
dataset_original_train_pred = bm.create_predicted_dataset(bm.dataset_original_train, model_original_train)

Original Bias Metrics


In [12]:
bm.init_new_result("Original", "Adult Dataset", "DecisionTreeClassifier")

In [13]:
bm.measure_bias("accuracy", bm.dataset_original_train, dataset_original_train_pred, bm.privileged_groups, bm.unprivileged_groups)
bm.measure_bias("disparate_impact", bm.dataset_original_train, dataset_original_train_pred, bm.privileged_groups, bm.unprivileged_groups)
bm.measure_bias("statistical_parity_difference", bm.dataset_original_train, dataset_original_train_pred, bm.privileged_groups, bm.unprivileged_groups)
bm.measure_bias("average_odds_difference", bm.dataset_original_train, dataset_original_train_pred, bm.privileged_groups, bm.unprivileged_groups)
bm.measure_bias("generalized_entropy_index", bm.dataset_original_train, dataset_original_train_pred, bm.privileged_groups, bm.unprivileged_groups)
bm.measure_bias("theil_index", bm.dataset_original_train, dataset_original_train_pred, bm.privileged_groups, bm.unprivileged_groups)

After mitigation accuracy: 0.835562

After mitigation explainer: Classification accuracy (ACC): 0.83556210733214

After mitigation disparate_impact: 0.302107

After mitigation explainer: Disparate impact (probability of favorable outcome for unprivileged instances / probability of favorable outcome for privileged instances): 0.30210671187518606

After mitigation statistical_parity_difference: -0.132960

After mitigation explainer: Statistical parity difference (probability of favorable outcome for unprivileged instances - probability of favorable outcome for privileged instances): -0.1329600082096691

After mitigation average_odds_difference: -0.056464

After mitigation explainer: Average odds difference (average of TPR difference and FPR difference, 0 = equality of odds): -0.056464348862829444

After mitigation generalized_entropy_index: 0.094805

After mitigation explainer: Generalized entropy index (GE(alpha)): 0.0948052999611521

After mitigation theil_index: 0.152701

After mitigation explainer: Theil index (generalized entropy index with alpha = 1): 0.15270107148998197
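
The explainer text defines disparate impact as a ratio and statistical parity difference as a difference of favorable-outcome rates. As a minimal sketch of those two definitions, the following plain-Python functions compute both on a tiny made-up sample; the function names and the toy data are illustrative assumptions and are not part of FairML or aif360.

```python
def favorable_rate(predictions, groups, group_value, favorable=1):
    """Share of instances in one group that received the favorable prediction."""
    selected = [p for p, g in zip(predictions, groups) if g == group_value]
    return sum(1 for p in selected if p == favorable) / len(selected)

def disparate_impact(predictions, groups, privileged=1, unprivileged=0):
    """Unprivileged favorable rate divided by privileged favorable rate (1 = parity)."""
    return (favorable_rate(predictions, groups, unprivileged)
            / favorable_rate(predictions, groups, privileged))

def statistical_parity_difference(predictions, groups, privileged=1, unprivileged=0):
    """Unprivileged favorable rate minus privileged favorable rate (0 = parity)."""
    return (favorable_rate(predictions, groups, unprivileged)
            - favorable_rate(predictions, groups, privileged))

# Made-up toy data: 4 privileged (sex = 1) and 4 unprivileged (sex = 0) instances.
toy_predictions = [1, 0, 1, 1, 0, 0, 0, 1]
toy_sex = [1, 1, 1, 1, 0, 0, 0, 0]

di = disparate_impact(toy_predictions, toy_sex)
spd = statistical_parity_difference(toy_predictions, toy_sex)
print("disparate impact:", di)
print("statistical parity difference:", spd)
```

Read this way, the reported disparate impact of about 0.30 means the unprivileged group receives the favorable prediction at roughly 30% of the privileged group's rate.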

#### [1.1.2.](#Contents) Mitigate Bias using DisparateImpactRemover  

In [14]:
print("")
print("Mitigate Bias using DisparateImpactRemover")
print("-------------")
mitigation_method = bm.create_mitigation_method(DisparateImpactRemover)
# fit_transform is called separately on each split, so each split is repaired using only its own data.
dataset_mitigated_train = mitigation_method.fit_transform(bm.dataset_original_train)
dataset_mitigated_valid = mitigation_method.fit_transform(bm.dataset_original_valid)
dataset_mitigated_test = mitigation_method.fit_transform(bm.dataset_original_test)


Mitigate Bias using DisparateImpactRemover
-------------
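
DisparateImpactRemover edits feature values so that their distributions look alike across groups while preserving the rank order within each group. The sketch below is a simplified, single-feature illustration of that idea and is not the aif360 implementation: the function names, the use of the median of per-group quantiles as the target, and the toy values are all assumptions made for illustration. Ties are ranked in input order.

```python
from statistics import median

def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list, q in [0, 1]."""
    position = q * (len(sorted_values) - 1)
    low = int(position)
    high = min(low + 1, len(sorted_values) - 1)
    fraction = position - low
    return sorted_values[low] * (1 - fraction) + sorted_values[high] * fraction

def repair_feature(values, groups, repair_level=1.0):
    """Move each value toward the cross-group median at the same within-group rank."""
    sorted_by_group = {}
    for value, group in zip(values, groups):
        sorted_by_group.setdefault(group, []).append(value)
    for group in sorted_by_group:
        sorted_by_group[group].sort()

    repaired = [float(v) for v in values]
    for group in sorted_by_group:
        # Indices of this group's rows, ordered by feature value (their rank).
        indices = sorted((i for i, g in enumerate(groups) if g == group),
                         key=lambda i: values[i])
        n = len(indices)
        for rank, i in enumerate(indices):
            q = rank / (n - 1) if n > 1 else 0.5
            target = median(quantile(s, q) for s in sorted_by_group.values())
            repaired[i] = (1 - repair_level) * values[i] + repair_level * target
    return repaired

# Made-up feature with clearly different distributions in groups "a" and "b".
toy_values = [1, 2, 3, 4, 5, 11, 12, 13, 14, 15]
toy_groups = ["a"] * 5 + ["b"] * 5

print("full repair:   ", repair_feature(toy_values, toy_groups, repair_level=1.0))
print("partial repair:", repair_feature(toy_values, toy_groups, repair_level=0.5))
```

With full repair both groups end up with the same set of values, so the feature no longer reveals group membership, while the ordering inside each group is unchanged.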


##### [1.1.2.1.](#Contents) After-mitigation Dataset: Classifier DecisionTreeClassifier, Parameters: criterion='gini', max_depth=4

In [15]:
print("")
print("After-mitigation Training: DecisionTreeClassifier, Parameters: criterion='gini', max_depth=4")
print("-------------")


After-mitigation Training: DecisionTreeClassifier, Parameters: criterion='gini', max_depth=4
-------------


Train the model on the mitigated training data.

In [16]:
classifier = DecisionTreeClassifier(criterion='gini', max_depth=4)
model_mitigated_train = bm.train(dataset_mitigated_train,  classifier)

###### [1.1.2.1.1.](#Contents) After-mitigation: Bias Metrics

In [17]:
print("After-mitigation Metrics")
dataset_mitigated_train_pred = bm.create_predicted_dataset(dataset_mitigated_train, model_mitigated_train)
dataset_mitigated_valid_pred = bm.create_predicted_dataset(dataset_mitigated_valid, model_mitigated_train)
dataset_mitigated_test_pred = bm.create_predicted_dataset(dataset_mitigated_test, model_mitigated_train)

After-mitigation Metrics


In [18]:
bm.init_new_result("DisparateImpactRemover", "Adult Dataset", "DecisionTreeClassifier")

In [19]:
bm.measure_bias("accuracy", dataset_mitigated_test, dataset_mitigated_test_pred, bm.privileged_groups, bm.unprivileged_groups)
bm.measure_bias("disparate_impact", dataset_mitigated_test, dataset_mitigated_test_pred, bm.privileged_groups, bm.unprivileged_groups)
bm.measure_bias("statistical_parity_difference", dataset_mitigated_test, dataset_mitigated_test_pred, bm.privileged_groups, bm.unprivileged_groups)
bm.measure_bias("average_odds_difference", dataset_mitigated_test, dataset_mitigated_test_pred, bm.privileged_groups, bm.unprivileged_groups)
bm.measure_bias("generalized_entropy_index", dataset_mitigated_test, dataset_mitigated_test_pred, bm.privileged_groups, bm.unprivileged_groups)
bm.measure_bias("theil_index", dataset_mitigated_test, dataset_mitigated_test_pred, bm.privileged_groups, bm.unprivileged_groups)

After mitigation accuracy: 0.835892

After mitigation explainer: Classification accuracy (ACC): 0.8358923965114851

After mitigation disparate_impact: 0.352922

After mitigation explainer: Disparate impact (probability of favorable outcome for unprivileged instances / probability of favorable outcome for privileged instances): 0.35292183031984575

After mitigation statistical_parity_difference: -0.119268

After mitigation explainer: Statistical parity difference (probability of favorable outcome for unprivileged instances - probability of favorable outcome for privileged instances): -0.11926772906097308

After mitigation average_odds_difference: -0.025650

After mitigation explainer: Average odds difference (average of TPR difference and FPR difference, 0 = equality of odds): -0.02564973062259254

After mitigation generalized_entropy_index: 0.094544

After mitigation explainer: Generalized entropy index (GE(alpha)): 0.09454361691751112

After mitigation theil_index: 0.152194

After mitigation explainer: Theil index (generalized entropy index with alpha = 1): 0.15219364795664772
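
The explainer describes average odds difference as the average of the TPR difference and the FPR difference between groups. A minimal plain-Python sketch of that definition follows; the function names and the toy labels are illustrative assumptions, and the differences are taken as unprivileged minus privileged, matching the sign convention of the other metrics in the output.

```python
def positive_rate(y_true, y_pred, groups, group_value, actual):
    """Share of a group's instances with true label `actual` that were predicted 1."""
    preds = [p for t, p, g in zip(y_true, y_pred, groups)
             if g == group_value and t == actual]
    return sum(preds) / len(preds)

def average_odds_difference(y_true, y_pred, groups, privileged=1, unprivileged=0):
    """0.5 * [(TPR_unpriv - TPR_priv) + (FPR_unpriv - FPR_priv)]; 0 means equal odds."""
    tpr_diff = (positive_rate(y_true, y_pred, groups, unprivileged, actual=1)
                - positive_rate(y_true, y_pred, groups, privileged, actual=1))
    fpr_diff = (positive_rate(y_true, y_pred, groups, unprivileged, actual=0)
                - positive_rate(y_true, y_pred, groups, privileged, actual=0))
    return 0.5 * (tpr_diff + fpr_diff)

# Made-up toy data: first four rows privileged (1), last four unprivileged (0).
toy_true = [1, 1, 0, 0, 1, 1, 0, 0]
toy_pred = [1, 1, 1, 0, 1, 0, 0, 0]
toy_group = [1, 1, 1, 1, 0, 0, 0, 0]

aod = average_odds_difference(toy_true, toy_pred, toy_group)
print("average odds difference:", aod)
```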

#### [1.1.3.](#Contents) Mitigate Bias using Reweighing  

In [20]:
print("")
print("Mitigate Bias using Reweighing")
print("-------------")
mitigation_method = bm.create_mitigation_method(Reweighing)
# fit_transform is called separately on each split, so the weights for each split are computed from that split alone.
dataset_mitigated_train = mitigation_method.fit_transform(bm.dataset_original_train)
dataset_mitigated_valid = mitigation_method.fit_transform(bm.dataset_original_valid)
dataset_mitigated_test = mitigation_method.fit_transform(bm.dataset_original_test)


Mitigate Bias using Reweighing
-------------
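
Reweighing leaves features and labels untouched and instead assigns each instance a weight, chosen so that group membership and label become statistically independent in the weighted data. As a minimal sketch of the underlying formula, weight = P(group) * P(label) / P(group, label), the plain-Python code below computes those weights on a made-up sample; the function names and toy data are illustrative assumptions, not the aif360 code.

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Weight per instance: expected joint frequency divided by observed joint frequency."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [group_counts[g] * label_counts[y] / (n * joint_counts[(g, y)])
            for g, y in zip(groups, labels)]

def weighted_favorable_rate(groups, labels, weights, group_value, favorable=1):
    """Weighted share of favorable labels inside one group."""
    total = sum(w for g, w in zip(groups, weights) if g == group_value)
    favorable_weight = sum(w for g, y, w in zip(groups, labels, weights)
                           if g == group_value and y == favorable)
    return favorable_weight / total

# Made-up toy data: the privileged group (1) has 3 of 4 favorable labels,
# the unprivileged group (0) has 1 of 4.
toy_groups = [1, 1, 1, 1, 0, 0, 0, 0]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]

toy_weights = reweighing_weights(toy_groups, toy_labels)
print("weights:", toy_weights)
print("weighted favorable rate, privileged:  ",
      weighted_favorable_rate(toy_groups, toy_labels, toy_weights, 1))
print("weighted favorable rate, unprivileged:",
      weighted_favorable_rate(toy_groups, toy_labels, toy_weights, 0))
```

The weights only help if the downstream classifier actually uses them as sample weights during training; whether `bm.train` does so is not shown in this notebook.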


##### [1.1.3.1.](#Contents) After-mitigation Dataset: Classifier DecisionTreeClassifier, Parameters: criterion='gini', max_depth=4

In [21]:
print("")
print("After-mitigation Training: DecisionTreeClassifier, Parameters: criterion='gini', max_depth=4")
print("-------------")


After-mitigation Training: DecisionTreeClassifier, Parameters: criterion='gini', max_depth=4
-------------


Train the model on the mitigated training data.

In [22]:
classifier = DecisionTreeClassifier(criterion='gini', max_depth=4)
model_mitigated_train = bm.train(dataset_mitigated_train,  classifier)

###### [1.1.3.1.1.](#Contents) After-mitigation: Bias Metrics

In [23]:
print("After-mitigation Metrics")
dataset_mitigated_train_pred = bm.create_predicted_dataset(dataset_mitigated_train, model_mitigated_train)
dataset_mitigated_valid_pred = bm.create_predicted_dataset(dataset_mitigated_valid, model_mitigated_train)
dataset_mitigated_test_pred = bm.create_predicted_dataset(dataset_mitigated_test, model_mitigated_train)

After-mitigation Metrics


In [24]:
bm.init_new_result("Reweighing", "Adult Dataset", "DecisionTreeClassifier")

In [25]:
bm.measure_bias("accuracy", dataset_mitigated_test, dataset_mitigated_test_pred, bm.privileged_groups, bm.unprivileged_groups)
bm.measure_bias("disparate_impact", dataset_mitigated_test, dataset_mitigated_test_pred, bm.privileged_groups, bm.unprivileged_groups)
bm.measure_bias("statistical_parity_difference", dataset_mitigated_test, dataset_mitigated_test_pred, bm.privileged_groups, bm.unprivileged_groups)
bm.measure_bias("average_odds_difference", dataset_mitigated_test, dataset_mitigated_test_pred, bm.privileged_groups, bm.unprivileged_groups)
bm.measure_bias("generalized_entropy_index", dataset_mitigated_test, dataset_mitigated_test_pred, bm.privileged_groups, bm.unprivileged_groups)
bm.measure_bias("theil_index", dataset_mitigated_test, dataset_mitigated_test_pred, bm.privileged_groups, bm.unprivileged_groups)

After mitigation accuracy: 0.841921

After mitigation explainer: Classification accuracy (ACC): 0.8419207681207685

After mitigation disparate_impact: 3.396592

After mitigation explainer: Disparate impact (probability of favorable outcome for unprivileged instances / probability of favorable outcome for privileged instances): 3.3965917452433647

After mitigation statistical_parity_difference: 0.167686

After mitigation explainer: Statistical parity difference (probability of favorable outcome for unprivileged instances - probability of favorable outcome for privileged instances): 0.1676858511049902

After mitigation average_odds_difference: 0.271833

After mitigation explainer: Average odds difference (average of TPR difference and FPR difference, 0 = equality of odds): 0.2718326561484964

After mitigation generalized_entropy_index: 0.107999

After mitigation explainer: Generalized entropy index (GE(alpha)): 0.10799925107806518

After mitigation theil_index: 0.178945

After mitigation explainer: Theil index (generalized entropy index with alpha = 1): 0.17894546103655312
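
The last two metrics measure inequality in how "benefit" is distributed over individuals rather than between groups. The sketch below assumes the benefit definition b = prediction - label + 1 (so a false positive gets 2, a correct prediction 1, a false negative 0) and a default of alpha = 2 for the generalized entropy index; the function name and the toy data are illustrative assumptions. The Theil index is the alpha = 1 case, as the explainer states.

```python
import math

def generalized_entropy_index(y_true, y_pred, alpha=2):
    """Generalized entropy of per-instance benefits b = y_pred - y_true + 1."""
    benefits = [p - t + 1 for t, p in zip(y_true, y_pred)]
    n = len(benefits)
    mean_benefit = sum(benefits) / n
    if alpha == 1:
        # Theil index; terms with b = 0 contribute 0 (limit of x * log x).
        return sum((b / mean_benefit) * math.log(b / mean_benefit)
                   for b in benefits if b > 0) / n
    if alpha == 0:
        raise ValueError("alpha = 0 is not covered by this sketch")
    return sum((b / mean_benefit) ** alpha - 1 for b in benefits) / (n * alpha * (alpha - 1))

# Made-up toy data: one false positive, one false negative, two correct predictions.
toy_true = [1, 0, 1, 0]
toy_pred = [1, 1, 0, 0]

ge2 = generalized_entropy_index(toy_true, toy_pred, alpha=2)
theil = generalized_entropy_index(toy_true, toy_pred, alpha=1)
print("generalized entropy index (alpha = 2):", ge2)
print("Theil index (alpha = 1):", theil)
```

Both values are 0 when every instance receives the same benefit, for example when all predictions are correct.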

### [1.2.](#Contents) Summary

In [26]:
table = bm.display_summary()
table.style.apply(bm.highlight_fairest_values, axis=1)


0.16443789266786002 0.15807923187923145 0.16443789266786002
0.16410760348851494 0.15807923187923145 0.16443789266786002
0.15807923187923145 0.15807923187923145 0.16443789266786002
0.6978932881248139 0.6470781696801542 2.3965917452433647
0.6470781696801542 0.6470781696801542 2.3965917452433647
2.3965917452433647 0.6470781696801542 2.3965917452433647
0.1329600082096691 0.11926772906097308 0.1676858511049902
0.11926772906097308 0.11926772906097308 0.1676858511049902
0.1676858511049902 0.11926772906097308 0.1676858511049902


Original Data size: 48842<br>
Predicted attribute: income-per-year<br>
Protected attributes: sex, race<br>
Favourable classes: 1<br>
Dropped attributes:<br>
Training data size (ratio): 7.0<br>
Test data size (ratio): 2.0<br>
Validation data size (ratio): 3.0

| | Mitigation | Dataset | Classifier | accuracy | disparate_impact | statistical_parity_difference | average_odds_difference | generalized_entropy_index | theil_index |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Original | Adult Dataset(7.0:2.0:3.0) | DecisionTreeClassifier | 0.835562 | 0.302107 | -0.13296 | -0.056464 | 0.094805 | 0.152701 |
| 2 | DisparateImpactRemover | Adult Dataset(7.0:2.0:3.0) | DecisionTreeClassifier | 0.835892 | 0.352922 | -0.119268 | -0.02565 | 0.094544 | 0.152194 |
| 3 | Reweighing | Adult Dataset(7.0:2.0:3.0) | DecisionTreeClassifier | 0.841921 | 3.396592 | 0.167686 | 0.271833 | 0.107999 | 0.178945 |
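
The raw numbers printed before the summary appear to be each metric's absolute distance from an ideal value (1 for accuracy and disparate impact, 0 for statistical parity difference), which suggests that `highlight_fairest_values` picks the entry closest to that ideal. The sketch below applies that rule to the summary values. The rule itself, the ideal values for the remaining metrics, and the name `fairest_per_metric` are assumptions; the actual FairML implementation is not shown in this notebook.

```python
# Assumed ideal value per metric: 1 for ratios and accuracy, 0 for differences and indices.
IDEAL_VALUES = {
    "accuracy": 1.0,
    "disparate_impact": 1.0,
    "statistical_parity_difference": 0.0,
    "average_odds_difference": 0.0,
    "generalized_entropy_index": 0.0,
    "theil_index": 0.0,
}

# Values copied from the summary table.
results = {
    "Original": dict(zip(IDEAL_VALUES, [0.835562, 0.302107, -0.13296, -0.056464, 0.094805, 0.152701])),
    "DisparateImpactRemover": dict(zip(IDEAL_VALUES, [0.835892, 0.352922, -0.119268, -0.02565, 0.094544, 0.152194])),
    "Reweighing": dict(zip(IDEAL_VALUES, [0.841921, 3.396592, 0.167686, 0.271833, 0.107999, 0.178945])),
}

def fairest_per_metric(results, ideal_values):
    """For each metric, return the mitigation whose value is closest to the ideal."""
    fairest = {}
    for metric, ideal in ideal_values.items():
        fairest[metric] = min(results, key=lambda name: abs(results[name][metric] - ideal))
    return fairest

for metric, winner in fairest_per_metric(results, IDEAL_VALUES).items():
    print(f"{metric}: {winner}")
```

Under this rule Reweighing has the best accuracy, while its disparate impact of about 3.40 overshoots parity in the opposite direction and is further from 1 than the unmitigated baseline.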
