# Final Project: Automated solutions for algorithmic bias, Python Portion
##### by: Cheng-Yu (Ben) Chiang from MATH 157 Winter 2022

## Resource used: 
- https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G
- https://en.wikipedia.org/wiki/Disparate_impact#:~:text=Disparate%20impact%20in%20United%20States,or%20landlords%20are%20formally%20neutral
- https://github.com/Trusted-AI/AIF360
- https://dl.acm.org/doi/pdf/10.1145/3278721.3278779
- https://aif360.readthedocs.io/en/stable/modules/generated/aif360.datasets.BinaryLabelDataset.html#aif360.datasets.BinaryLabelDataset

## What is algorithmic bias? 
- Algorithmic bias refers error made by computer algorithms that create unfair outcomes, usually results in privileging one particular group of users over the other.
- It can emerge from intentional / unintentional factors such as the design of the algorithm, biased sampling of the data, etc.
- Algorithms have a strong social and political impact on our lives: presidential polls, search engines, credit score, advertisement, etc.

## Example: Amazon's AI recruiting tool is biased against women
- One of the examples of algorithmic bias is the famous case of amazon's AI recruiting tool biasing against women. Amazon, like many big tech companies, has been using automated tools to recruit top talents from the huge pool of candidates that they receive applications from. In an ideal situation, all candidates will be judged fairly based off their abilities.
- However, in 2015, Amazon discovered that its hiring algorithm for software developers are heavily biased against women. 
- This is because the machine learning algorithm were trained to observe pattern from previous successful hires and most of these data points come from men.
- Eventually, the team that was in charge of this project was disbanded.
- Highlights the impact in algorithmic bias and the difficulty in solving such problems. 

## The challenge
- Bias is a human nature
- How can we make sure our algorithms does not amplify our inherent bias?

### Data input and cleaning

In [1]:
import pandas as pd
from aif360.algorithms.preprocessing import DisparateImpactRemover
from aif360.algorithms.preprocessing.reweighing import Reweighing
from aif360.datasets import BinaryLabelDataset
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report, ConfusionMatrixDisplay
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import numpy as np

pd.set_option("display.precision", 5)


col_names = ["Age", "Workclass", "fnlwgt", "Education", "Education-Num", "Martial Status",
             "Occupation", "Relationship", "Race", "Sex", "Capital Gain", "Capital Loss",
             "Hours per week", "Country", "Target"]

train = pd.read_csv("raw_dataset/adult.data", skiprows=0, names=col_names, header=None)
test = pd.read_csv("raw_dataset/adult.test", skiprows=1, names=col_names, header=None)

df = pd.concat([train, test]).reset_index(drop=True)
df = df.drop(['Education', 'fnlwgt', 'Country'], axis=1)
df['Sex'] = df['Sex'].str.strip()
df['Target'] = df['Target'].str.strip()

df['Target'] = df['Target'].replace(['<=50K', '<=50K.'], 0)
df['Target'] = df['Target'].replace(['>50K', '>50K.'], 1)
df.to_csv("parsed_dataset/complete_data.csv")


pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.


2022-03-18 02:24:44.916791: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-03-18 02:24:44.916828: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.


In [2]:
def summarize(df, gender: str):
    gender_df = df[df['Sex'] == gender]
    print("="*30)
    print(gender, "<=50K")
    print("count:", sum(gender_df['Target'] == 0))
    print("prob: ",  sum(gender_df['Target'] == 0) / len(gender_df))
    print("="*30)
    print(gender, ">50K")
    print("count:", sum(gender_df['Target'] == 1))
    print("prob: ",  sum(gender_df['Target'] == 1) / len(gender_df))
    return gender_df
     
# key should be "Male" or "Female"
male = summarize(df, 'Male')
female = summarize(df, 'Female')

Male <=50K
count: 22732
prob:  0.6962327718223583
Male >50K
count: 9918
prob:  0.3037672281776417
Female <=50K
count: 14423
prob:  0.8907485177865613
Female >50K
count: 1769
prob:  0.10925148221343874


As you can see, there are more male represented in the >50k region. Based on the dataset, about 3 in 10 male adult earns more than 50K. On the other hand, only 1 in 10 female adults earn more than 50K. 
Clearly, something is wrong. Whether or not this is an accurate representation of male and female in the workforce is up for debate. However, as an engineer / scientist our job is to make sure that the algorithm that we designed isn't biased against either group for whatever reason.

## Measuring Bias

To measure bias, we will not introduce some common metrics. Including Statistical Parity Difference, Equal Opportunity Difference, Average Odds Difference, and Disparate Impact Ratio. Each metric can be interpreted individually or together to give us a insight on whether the dataset / model is biased and in what ways. 

The first 3 will be introduced in the julia portion of the project and the last metric, disparate impact ratio, will be introduced after this section. 

#### **If you are reading this presentation instead of being presented to, please head to the Julia portion of this presentation**

<br></br>
### Disparate impact metric
Now, in order to see if our method is working, we need a way to measure the bias in our dataset / population. One good way to do it is using **disparate impact** ratio.

Disparate impact ratio is a metric formally defined by the United States labor law by to evaluate fairness based on a predefined positive output. The general equation for the disparate impact is
$$
\frac{P(Y=1|\text{unprivileged})}{P(Y=1|\text{privileged})}
$$
We define the positive output as >50K income (Y = 1) and assume that the unprivileged group to be female based on its significantly higher <=50K income probability compared to male. Conversely, we assume that male is the privileged group.

So, the disparate impact metric becomes: 

$$
\frac{P(\text{income}>\text{50K}|\text{women})}{P(\text{income}>\text{50K}|\text{men})}
$$

In [3]:
(sum(female['Target'] == 1) / len(female)) / (sum(male['Target'] == 1) / len(male))

0.3596552625800337

## Solution 1: Reweighting (Preprocessing)
One way to mitigate the bias is by modifying the data before the training has taken place.
A good method that is commonly used is reweighting. This refers to giving weights to different row features based on their privilege status and ourcome. 

Below, we will layout the foundation of such method by hand calculating the weights that should be given to different classes / outcomes.

### The 4/5 Rule
The typical standard for an acceptable disparate impact rate that is the four-fifths rule:
- if the unprivileged group gets a positive outcome (income > 50K) less than 80% of their proportion of the privileged group, then this counts as a disparate impact violation
- this means that our calculation results shown above should be > 0.8 for it to not be a violation
- our disparate impact ratio of `0.358` means that there is clearly an bias against women in terms of income based on the sampling in this dataset

## Risk of Training with Biased data

The risk of training with biased data lies in the possibility of the model amplifying the mistake. 

As a machine learning model, it's goal is to maximum its accuracy. Therefore, it will be naturally intended to lean towards the privileged group. 

An example to understand this is: 
- say we have a dataset with 10000 samples, 1000 of them has gender = female and the rest is marked male.
- If our model were to predict with 0.5 probability (randomly predicting), then it's expected accuracy is only 0.9 * 0.5 + 0.1 * 0.5 = 0.5 (50%)
- However, if our model predicts male all the time, then it will be right 90% of the time. 
- Therefore, the model is inclined to be biased towards male simply because it is more well represented in the dataset. 

Below we provide a real example of how classifying using biased data can lead to worse biasing.

In [4]:
categorical_df = pd.get_dummies(df)
labels = categorical_df['Target']
features = categorical_df.drop(['Target'], axis=1)

scaler = StandardScaler()
scaled_df = pd.DataFrame(scaler.fit_transform(features), columns=features.columns)

x_train, x_test, y_train, y_test = train_test_split(scaled_df, labels, test_size=0.3, random_state=42)

In [5]:
reg = LogisticRegression()
reg.fit(x_train, y_train)

LogisticRegression()

In [6]:
def test_model(x, y, df):
    orig_matching_df = df.iloc[x.index, :]
    pred = reg.predict(x)
    probs = reg.predict_proba(x)
    print("The mean accuracy of the model is:", round(reg.score(x, y), 5))
    data = {'target': y, 'prediction': pred ,'class_0_prob': probs[:, 0], 'class_1_prob': probs[:, 1], 'gender': orig_matching_df['Sex']}
    result_df = pd.DataFrame(data)
    return result_df

result_df = test_model(x_test, y_test, df)
result_df.head()

The mean accuracy of the model is: 0.85006


Unnamed: 0,target,prediction,class_0_prob,class_1_prob,gender
7762,0,0,0.99531,0.00469,Male
23881,0,0,0.99826,0.00174,Female
30507,0,0,0.99665,0.00335,Male
28911,0,0,0.99555,0.00445,Female
19484,0,0,0.97298,0.02702,Male


In [7]:
female = result_df[result_df['gender'] == 'Female']
print('Model predicts', round(sum(female['prediction'] == 1) / len(female), 5) * 100, '% of female earn >50K')
male = result_df[result_df['gender'] == 'Male']
print('Model predicts', round(sum(male['prediction'] == 1) / len(male), 5) * 100, '% of male earn >50K')

print("Predicted disparate impact ratio is:", (sum(female['prediction'] == 1) / len(female)) / (sum(male['prediction'] == 1) / len(male)))

result_df.to_csv("parsed_dataset/biased_results.csv")

Model predicts 7.034 % of female earn >50K
Model predicts 25.176 % of male earn >50K
Predicted disparate impact ratio is: 0.27939670399997324


Clearly, the classifier has worsened the bias in the dataset. It will be critical that we mitigate this before using it in any way.

## Assigning weights to different classes
To rectify this bias what we can do is apply a weight for each class depending on whether it its unprivileged and privileged. Intuitively, we would want to give the privileged class (male) less weighting and the less privileged class (female) a heavier weight. 

This can be formulated as the following equation: 
$$
W = \frac{N_{\text{unprivileged}}N_{\text{positive}}}{N_{\text{all}}N_{\text{positive unprivileged}}}
$$

In [8]:
male = df[df['Sex'] == 'Male']
female = df[df['Sex'] == 'Female']

pos_unpri = (len(female) * sum(df['Target'] == 1)) / (len(df) * sum(female['Target'] == 1))
neg_unpri = (len(female) * sum(df['Target'] == 0)) / (len(df) * sum(female['Target'] == 0))

pos_pri = (len(male) * sum(df['Target'] == 1)) / (len(df) * sum(male['Target'] == 1))
neg_pri = (len(male) * sum(df['Target'] == 0)) / (len(df) * sum(male['Target'] == 0))

print('weight for positive underprivileged class:', round(pos_unpri, 5))
print('weight for negative underprivileged class:', round(neg_unpri, 5))
print('weight for position privileged class:', round(pos_pri, 5))
print('weight for negative privileged class:', round(neg_pri, 5))

weight for positive underprivileged class: 2.19019
weight for negative underprivileged class: 0.85402
weight for position privileged class: 0.78771
weight for negative privileged class: 1.09262


How can we explain these results intuitively?
- higher weight for positive underprivileged class: we want to boost the positive features of underprivileged class
- lower weight for negative underprivileged class: we want to reduce the negative features of underprivileged class
- lower weight for positive privileged class: we want to reduce the amount of positive for the privileged class
- slightly higher weight for negative privileged class: we want to slightly increase the effect of negative attributes to the privileged class

## Reweighing the dataset and retrain the model 
Now, after we obtained the weights, what we want to do is to apply the calculated weights to the entire dataset. To implementing this, we can simply use the provided function from `aif360` library `Reweighing`. 
A brief introduction to the `aif360` library: 
- Developed by IBM's research team Trusted AI, it is an opensource toolkit for examining, reporting, and mitigating discrimination and bias in machine learning models.
- The package includes comprehensive set of metrics for biases and models and a collection of algorithms to mitigate bias in dataset and machine learned models. 
- Using the package, we can load common datasets such as the Adult Census Income Dataset, Bank marketing Dataset, German credit Dataset, etc.
- The package splits the algorithms into 3 main types:
    1. Preprocessing: modifying the dataset before training
    2. Inprocessing: modifying the training process to ensure fairness
    3. Postprocessing: calibrate, reject, or equalize outcomes after training has been done
- In this example, I will only demonstrate a few examples of the functionality that it provides. To learn more, go to https://aif360.mybluemix.net/data for an interactive demo

Reweighting is an algorithm that belongs to the class of preprocessing algorithms because it modifies the dataset before training happens.

Below, we demonstrate how `Reweighing` can be used to mitigate bias in the dataset and make the model less prone to be biased.

In [9]:
# take out gender column so it doesn't get transformed
gender_df = df['Sex'].copy()
gender_df[gender_df == 'Female'] = 1
gender_df[gender_df == 'Male'] = 0

# drop the unnecessary columns
new_df = pd.get_dummies(df).drop(['Sex_Male', 'Sex_Female'], axis=1)
new_df['Gender'] = gender_df

In [10]:
# convert encoded data into required aif format
BLD = BinaryLabelDataset(favorable_label=1, unfavorable_label=0, df=new_df, label_names=['Target'], protected_attribute_names=['Gender'])

privileged_groups = [{'Gender': 1}]
unprivileged_groups = [{'Gender': 0}]

# reweight the dataset using built-in function from aif360
RW = Reweighing(unprivileged_groups=[{'Gender': 0}], privileged_groups=[{'Gender': 1}])
di = RW.fit_transform(BLD)
print(di.instance_weights, "\n", len(di.instance_weights))

data = {'gender': df['Sex'], 'target': new_df['Target'], 'weight': di.instance_weights}
pd.DataFrame(data).head(15)

[1.09262055 1.09262055 1.09262055 ... 1.09262055 1.09262055 0.78771422] 
 48842


Unnamed: 0,gender,target,weight
0,Male,0,1.09262
1,Male,0,1.09262
2,Male,0,1.09262
3,Male,0,1.09262
4,Female,0,0.85402
5,Female,0,0.85402
6,Female,0,0.85402
7,Male,1,0.78771
8,Female,1,2.19019
9,Male,1,0.78771


As we can see here, the previous weights that we calculated are now applied to the entire dataset based on which of the four classes that it is in. 

In [11]:
di_features = pd.DataFrame(di.features, columns=di.feature_names).reset_index(drop=True)

# scale the features
scaler = StandardScaler()
scaled_features = pd.DataFrame(scaler.fit_transform(di_features), columns=di_features.columns)

# split the dataset into train / test sets
x_train, x_test, y_train, y_test = train_test_split(scaled_features, di.labels.flatten(), test_size=0.3, random_state=42)

In [12]:
reg.fit(x_train, y_train, sample_weight=di.instance_weights[x_train.index])

LogisticRegression()

In [13]:
result_df = test_model(x_test, y_test, df)
result_df

The mean accuracy of the model is: 0.84324


Unnamed: 0,target,prediction,class_0_prob,class_1_prob,gender
7762,0.0,0.0,0.99651,0.00349,Male
23881,0.0,0.0,0.99573,0.00427,Female
30507,0.0,0.0,0.99773,0.00227,Male
28911,0.0,0.0,0.98880,0.01120,Female
19484,0.0,0.0,0.98053,0.01947,Male
...,...,...,...,...,...
15938,0.0,0.0,0.65240,0.34760,Male
27828,0.0,0.0,0.99038,0.00962,Female
28449,0.0,0.0,0.94207,0.05793,Female
5647,0.0,0.0,0.98255,0.01745,Male


In [14]:
female = result_df[result_df['gender'] == 'Female']
print('Model predicts', round(sum(female['prediction'] == 1) / len(female), 5) * 100, '% of female earn >50K')

male = result_df[result_df['gender'] == 'Male']
print('Model predicts', round(sum(male['prediction'] == 1) / len(male), 5) * 100, '% of male earn >50K')

print("Predicted disparate impact ratio is:", (sum(female['prediction'] == 1) / len(female)) / (sum(male['prediction'] == 1) / len(male)))

Model predicts 12.073 % of female earn >50K
Model predicts 20.488 % of male earn >50K
Predicted disparate impact ratio is: 0.5892766989037003


## Solution 2: Adversarial Debiasing
- An inprocessing technique which learns two classifiers in simultaneously to ensure that protected attributes such as gender and race cannot be used to produce biased models. 
- One classifier's job is to maximize prediction accuracy or minimize loss
- The other classifier tries to determine the protected attribute from the prediction produced by the first classifier. 
- Goal: to minimize the adversary's ability to correctly identify the protected attribute while reducing the error of the original classification problem. 
- Intuitively: if the difference in results between unprivileged and privileged group cannot be discerned, then we have achieved our goal of removing bias from the training process. 

In [15]:
from aif360.algorithms.inprocessing.adversarial_debiasing import AdversarialDebiasing
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()
sess = tf.Session()

2022-03-18 02:25:01.762683: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2022-03-18 02:25:01.763252: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-03-18 02:25:01.763786: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (project-38c1918c-f051-4428-8847-997d00b4b0c8): /proc/driver/nvidia/version does not exist
2022-03-18 02:25:01.771621: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


In [16]:
filtered_df = df[['Age', 'Workclass', 'Education-Num', 'Occupation', 'Sex', 'Capital Gain', 'Capital Loss',
       'Hours per week', 'Target']]

gender_df = filtered_df['Sex'].copy()
gender_df[gender_df == 'Female'] = 1
gender_df[gender_df == 'Male'] = 0

# drop the unnecessary columns
new_df = pd.get_dummies(filtered_df).drop(['Sex_Male', 'Sex_Female'], axis=1)
new_df['Gender'] = gender_df
new_df = new_df.drop("Workclass_ ?", axis=1)


BLD = BinaryLabelDataset(favorable_label=1, unfavorable_label=0, df=new_df, label_names=['Target'], protected_attribute_names=['Gender'])

In [17]:
privileged_groups = [{'Gender': 1}]
unprivileged_groups = [{'Gender': 0}]

debiased_model = AdversarialDebiasing(privileged_groups = privileged_groups,
                          unprivileged_groups = unprivileged_groups,
                          scope_name='debiased_classifier',
                          debias=True,
                          sess=sess,
                          num_epochs=100)

In [18]:
debiased_model.fit(BLD)

Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.


epoch 0; iter: 0; batch classifier loss: 26.520050; batch adversarial loss: 0.721980


epoch 0; iter: 200; batch classifier loss: 6.575389; batch adversarial loss: 0.672951


epoch 1; iter: 0; batch classifier loss: 15.754431; batch adversarial loss: 0.648825


epoch 1; iter: 200; batch classifier loss: 8.892447; batch adversarial loss: 0.636596


epoch 2; iter: 0; batch classifier loss: 8.706734; batch adversarial loss: 0.633119


epoch 2; iter: 200; batch classifier loss: 3.740513; batch adversarial loss: 0.678879


epoch 3; iter: 0; batch classifier loss: 0.936135; batch adversarial loss: 0.632686


epoch 3; iter: 200; batch classifier loss: 1.568327; batch adversarial loss: 0.635623


epoch 4; iter: 0; batch classifier loss: 0.520397; batch adversarial loss: 0.619828


epoch 4; iter: 200; batch classifier loss: 0.564673; batch adversarial loss: 0.565578


epoch 5; iter: 0; batch classifier loss: 1.424243; batch adversarial loss: 0.658711


epoch 5; iter: 200; batch classifier loss: 1.433200; batch adversarial loss: 0.639733


epoch 6; iter: 0; batch classifier loss: 0.442910; batch adversarial loss: 0.620846


epoch 6; iter: 200; batch classifier loss: 0.554492; batch adversarial loss: 0.654188


epoch 7; iter: 0; batch classifier loss: 0.827659; batch adversarial loss: 0.634674


epoch 7; iter: 200; batch classifier loss: 0.513500; batch adversarial loss: 0.610400


epoch 8; iter: 0; batch classifier loss: 0.491024; batch adversarial loss: 0.658965


epoch 8; iter: 200; batch classifier loss: 0.514090; batch adversarial loss: 0.622027


epoch 9; iter: 0; batch classifier loss: 0.550617; batch adversarial loss: 0.614941


epoch 9; iter: 200; batch classifier loss: 0.599395; batch adversarial loss: 0.628909


epoch 10; iter: 0; batch classifier loss: 0.478407; batch adversarial loss: 0.703508


epoch 10; iter: 200; batch classifier loss: 0.460160; batch adversarial loss: 0.582949


epoch 11; iter: 0; batch classifier loss: 0.444950; batch adversarial loss: 0.621784


epoch 11; iter: 200; batch classifier loss: 0.330203; batch adversarial loss: 0.613746


epoch 12; iter: 0; batch classifier loss: 0.493532; batch adversarial loss: 0.577208


epoch 12; iter: 200; batch classifier loss: 0.373200; batch adversarial loss: 0.654899


epoch 13; iter: 0; batch classifier loss: 0.480373; batch adversarial loss: 0.668941


epoch 13; iter: 200; batch classifier loss: 0.434231; batch adversarial loss: 0.653616


epoch 14; iter: 0; batch classifier loss: 0.457155; batch adversarial loss: 0.596422


epoch 14; iter: 200; batch classifier loss: 0.451609; batch adversarial loss: 0.619495


epoch 15; iter: 0; batch classifier loss: 0.441263; batch adversarial loss: 0.616872


epoch 15; iter: 200; batch classifier loss: 0.360725; batch adversarial loss: 0.591028


epoch 16; iter: 0; batch classifier loss: 0.478597; batch adversarial loss: 0.646450


epoch 16; iter: 200; batch classifier loss: 0.331256; batch adversarial loss: 0.633863


epoch 17; iter: 0; batch classifier loss: 0.373736; batch adversarial loss: 0.634879


epoch 17; iter: 200; batch classifier loss: 0.378821; batch adversarial loss: 0.629062


epoch 18; iter: 0; batch classifier loss: 0.483393; batch adversarial loss: 0.618329


epoch 18; iter: 200; batch classifier loss: 0.442795; batch adversarial loss: 0.558951


epoch 19; iter: 0; batch classifier loss: 0.493609; batch adversarial loss: 0.518423


epoch 19; iter: 200; batch classifier loss: 0.397547; batch adversarial loss: 0.555116


epoch 20; iter: 0; batch classifier loss: 0.337871; batch adversarial loss: 0.630228


epoch 20; iter: 200; batch classifier loss: 0.508213; batch adversarial loss: 0.559340


epoch 21; iter: 0; batch classifier loss: 0.434960; batch adversarial loss: 0.607187


epoch 21; iter: 200; batch classifier loss: 0.384430; batch adversarial loss: 0.602530


epoch 22; iter: 0; batch classifier loss: 0.318251; batch adversarial loss: 0.644566


epoch 22; iter: 200; batch classifier loss: 0.398346; batch adversarial loss: 0.638102


epoch 23; iter: 0; batch classifier loss: 0.457130; batch adversarial loss: 0.686689


epoch 23; iter: 200; batch classifier loss: 0.435053; batch adversarial loss: 0.637149


epoch 24; iter: 0; batch classifier loss: 0.359426; batch adversarial loss: 0.621873


epoch 24; iter: 200; batch classifier loss: 0.407546; batch adversarial loss: 0.620361


epoch 25; iter: 0; batch classifier loss: 0.444263; batch adversarial loss: 0.629434


epoch 25; iter: 200; batch classifier loss: 0.448907; batch adversarial loss: 0.609631


epoch 26; iter: 0; batch classifier loss: 0.465315; batch adversarial loss: 0.663206


epoch 26; iter: 200; batch classifier loss: 0.389895; batch adversarial loss: 0.650545


epoch 27; iter: 0; batch classifier loss: 0.363650; batch adversarial loss: 0.645354


epoch 27; iter: 200; batch classifier loss: 0.323745; batch adversarial loss: 0.648873


epoch 28; iter: 0; batch classifier loss: 0.402512; batch adversarial loss: 0.605798


epoch 28; iter: 200; batch classifier loss: 0.325530; batch adversarial loss: 0.548896


epoch 29; iter: 0; batch classifier loss: 0.398492; batch adversarial loss: 0.618267


epoch 29; iter: 200; batch classifier loss: 0.336571; batch adversarial loss: 0.598027


epoch 30; iter: 0; batch classifier loss: 0.412983; batch adversarial loss: 0.617439


epoch 30; iter: 200; batch classifier loss: 0.799667; batch adversarial loss: 0.586326


epoch 31; iter: 0; batch classifier loss: 0.355098; batch adversarial loss: 0.559492


epoch 31; iter: 200; batch classifier loss: 0.494161; batch adversarial loss: 0.619415


epoch 32; iter: 0; batch classifier loss: 0.378052; batch adversarial loss: 0.604950


epoch 32; iter: 200; batch classifier loss: 0.457935; batch adversarial loss: 0.616417


epoch 33; iter: 0; batch classifier loss: 0.480134; batch adversarial loss: 0.602581


epoch 33; iter: 200; batch classifier loss: 0.390843; batch adversarial loss: 0.633257


epoch 34; iter: 0; batch classifier loss: 0.400778; batch adversarial loss: 0.663468


epoch 34; iter: 200; batch classifier loss: 0.380937; batch adversarial loss: 0.602046


epoch 35; iter: 0; batch classifier loss: 0.371457; batch adversarial loss: 0.668501


epoch 35; iter: 200; batch classifier loss: 0.330504; batch adversarial loss: 0.568554


epoch 36; iter: 0; batch classifier loss: 0.468557; batch adversarial loss: 0.592391


epoch 36; iter: 200; batch classifier loss: 0.327116; batch adversarial loss: 0.696013


epoch 37; iter: 0; batch classifier loss: 0.479114; batch adversarial loss: 0.604080


epoch 37; iter: 200; batch classifier loss: 0.533110; batch adversarial loss: 0.603514


epoch 38; iter: 0; batch classifier loss: 0.424274; batch adversarial loss: 0.592869


epoch 38; iter: 200; batch classifier loss: 0.381855; batch adversarial loss: 0.562479


epoch 39; iter: 0; batch classifier loss: 0.393235; batch adversarial loss: 0.644126


epoch 39; iter: 200; batch classifier loss: 0.544316; batch adversarial loss: 0.609787


epoch 40; iter: 0; batch classifier loss: 0.411553; batch adversarial loss: 0.621930


epoch 40; iter: 200; batch classifier loss: 0.716560; batch adversarial loss: 0.600227


epoch 41; iter: 0; batch classifier loss: 0.357922; batch adversarial loss: 0.616588


epoch 41; iter: 200; batch classifier loss: 0.404680; batch adversarial loss: 0.606049


epoch 42; iter: 0; batch classifier loss: 0.582821; batch adversarial loss: 0.670913


epoch 42; iter: 200; batch classifier loss: 0.561658; batch adversarial loss: 0.606481


epoch 43; iter: 0; batch classifier loss: 0.408466; batch adversarial loss: 0.619658


epoch 43; iter: 200; batch classifier loss: 0.500393; batch adversarial loss: 0.654502


epoch 44; iter: 0; batch classifier loss: 0.426844; batch adversarial loss: 0.619151


epoch 44; iter: 200; batch classifier loss: 0.296007; batch adversarial loss: 0.632768


epoch 45; iter: 0; batch classifier loss: 0.359848; batch adversarial loss: 0.573729


epoch 45; iter: 200; batch classifier loss: 0.468365; batch adversarial loss: 0.676520


epoch 46; iter: 0; batch classifier loss: 0.565396; batch adversarial loss: 0.604404


epoch 46; iter: 200; batch classifier loss: 0.527489; batch adversarial loss: 0.621284


epoch 47; iter: 0; batch classifier loss: 0.437843; batch adversarial loss: 0.557883


epoch 47; iter: 200; batch classifier loss: 0.437752; batch adversarial loss: 0.557434


epoch 48; iter: 0; batch classifier loss: 0.521939; batch adversarial loss: 0.646541


epoch 48; iter: 200; batch classifier loss: 0.414165; batch adversarial loss: 0.668098


epoch 49; iter: 0; batch classifier loss: 0.382521; batch adversarial loss: 0.590477


epoch 49; iter: 200; batch classifier loss: 0.523422; batch adversarial loss: 0.623466


epoch 50; iter: 0; batch classifier loss: 0.379800; batch adversarial loss: 0.677158


epoch 50; iter: 200; batch classifier loss: 0.581437; batch adversarial loss: 0.609218


epoch 51; iter: 0; batch classifier loss: 0.460286; batch adversarial loss: 0.552050


epoch 51; iter: 200; batch classifier loss: 0.585025; batch adversarial loss: 0.640073


epoch 52; iter: 0; batch classifier loss: 0.440576; batch adversarial loss: 0.620362


epoch 52; iter: 200; batch classifier loss: 0.500468; batch adversarial loss: 0.615953


epoch 53; iter: 0; batch classifier loss: 0.548903; batch adversarial loss: 0.611425


epoch 53; iter: 200; batch classifier loss: 0.466510; batch adversarial loss: 0.576406


epoch 54; iter: 0; batch classifier loss: 0.506385; batch adversarial loss: 0.640013


epoch 54; iter: 200; batch classifier loss: 0.475267; batch adversarial loss: 0.560353


epoch 55; iter: 0; batch classifier loss: 0.477082; batch adversarial loss: 0.609644


epoch 55; iter: 200; batch classifier loss: 0.705489; batch adversarial loss: 0.585191


epoch 56; iter: 0; batch classifier loss: 0.344180; batch adversarial loss: 0.639899


epoch 56; iter: 200; batch classifier loss: 0.516345; batch adversarial loss: 0.605311


epoch 57; iter: 0; batch classifier loss: 1.051936; batch adversarial loss: 0.644423


epoch 57; iter: 200; batch classifier loss: 0.323015; batch adversarial loss: 0.627936


epoch 58; iter: 0; batch classifier loss: 0.433234; batch adversarial loss: 0.625571


epoch 58; iter: 200; batch classifier loss: 0.375474; batch adversarial loss: 0.658097


epoch 59; iter: 0; batch classifier loss: 0.628997; batch adversarial loss: 0.565198


epoch 59; iter: 200; batch classifier loss: 0.451645; batch adversarial loss: 0.617592


epoch 60; iter: 0; batch classifier loss: 0.404796; batch adversarial loss: 0.630170


epoch 60; iter: 200; batch classifier loss: 0.582468; batch adversarial loss: 0.619873


epoch 61; iter: 0; batch classifier loss: 0.411227; batch adversarial loss: 0.596337


epoch 61; iter: 200; batch classifier loss: 0.420566; batch adversarial loss: 0.619979


epoch 62; iter: 0; batch classifier loss: 0.410043; batch adversarial loss: 0.642399


epoch 62; iter: 200; batch classifier loss: 0.262852; batch adversarial loss: 0.587193


epoch 63; iter: 0; batch classifier loss: 0.573976; batch adversarial loss: 0.615368


epoch 63; iter: 200; batch classifier loss: 0.608135; batch adversarial loss: 0.613981


epoch 64; iter: 0; batch classifier loss: 0.538630; batch adversarial loss: 0.564292


epoch 64; iter: 200; batch classifier loss: 0.471639; batch adversarial loss: 0.592460


epoch 65; iter: 0; batch classifier loss: 0.454550; batch adversarial loss: 0.600444


epoch 65; iter: 200; batch classifier loss: 0.539123; batch adversarial loss: 0.618549


epoch 66; iter: 0; batch classifier loss: 0.548452; batch adversarial loss: 0.667517


epoch 66; iter: 200; batch classifier loss: 0.494669; batch adversarial loss: 0.627778


epoch 67; iter: 0; batch classifier loss: 0.479128; batch adversarial loss: 0.636439


epoch 67; iter: 200; batch classifier loss: 0.456680; batch adversarial loss: 0.635043


epoch 68; iter: 0; batch classifier loss: 0.392934; batch adversarial loss: 0.647577


epoch 68; iter: 200; batch classifier loss: 0.447730; batch adversarial loss: 0.583593


epoch 69; iter: 0; batch classifier loss: 0.313820; batch adversarial loss: 0.572590


epoch 69; iter: 200; batch classifier loss: 0.484785; batch adversarial loss: 0.621506


epoch 70; iter: 0; batch classifier loss: 0.441307; batch adversarial loss: 0.580799


epoch 70; iter: 200; batch classifier loss: 0.557256; batch adversarial loss: 0.595662


epoch 71; iter: 0; batch classifier loss: 0.400243; batch adversarial loss: 0.641795


epoch 71; iter: 200; batch classifier loss: 0.476855; batch adversarial loss: 0.611976


epoch 72; iter: 0; batch classifier loss: 0.488860; batch adversarial loss: 0.641287


epoch 72; iter: 200; batch classifier loss: 0.491863; batch adversarial loss: 0.602648


epoch 73; iter: 0; batch classifier loss: 0.492133; batch adversarial loss: 0.611998


epoch 73; iter: 200; batch classifier loss: 0.389102; batch adversarial loss: 0.603767


epoch 74; iter: 0; batch classifier loss: 0.598418; batch adversarial loss: 0.688751


epoch 74; iter: 200; batch classifier loss: 0.412226; batch adversarial loss: 0.630671


epoch 75; iter: 0; batch classifier loss: 0.463557; batch adversarial loss: 0.555454


epoch 75; iter: 200; batch classifier loss: 0.550895; batch adversarial loss: 0.650518


epoch 76; iter: 0; batch classifier loss: 0.620233; batch adversarial loss: 0.599568


epoch 76; iter: 200; batch classifier loss: 0.502021; batch adversarial loss: 0.580248


epoch 77; iter: 0; batch classifier loss: 0.509107; batch adversarial loss: 0.611209


epoch 77; iter: 200; batch classifier loss: 0.644740; batch adversarial loss: 0.640341


epoch 78; iter: 0; batch classifier loss: 0.615663; batch adversarial loss: 0.606928


epoch 78; iter: 200; batch classifier loss: 0.455263; batch adversarial loss: 0.622632


epoch 79; iter: 0; batch classifier loss: 0.485389; batch adversarial loss: 0.636164


epoch 79; iter: 200; batch classifier loss: 0.392409; batch adversarial loss: 0.582776


epoch 80; iter: 0; batch classifier loss: 0.436945; batch adversarial loss: 0.705351


epoch 80; iter: 200; batch classifier loss: 0.410058; batch adversarial loss: 0.602996


epoch 81; iter: 0; batch classifier loss: 0.572287; batch adversarial loss: 0.627186


epoch 81; iter: 200; batch classifier loss: 0.339898; batch adversarial loss: 0.581311


epoch 82; iter: 0; batch classifier loss: 0.488578; batch adversarial loss: 0.661467


epoch 82; iter: 200; batch classifier loss: 0.385276; batch adversarial loss: 0.633955


epoch 83; iter: 0; batch classifier loss: 0.447498; batch adversarial loss: 0.621476


epoch 83; iter: 200; batch classifier loss: 0.434678; batch adversarial loss: 0.616745


epoch 84; iter: 0; batch classifier loss: 0.590795; batch adversarial loss: 0.593546


epoch 84; iter: 200; batch classifier loss: 0.486707; batch adversarial loss: 0.547683


epoch 85; iter: 0; batch classifier loss: 0.480030; batch adversarial loss: 0.582969


epoch 85; iter: 200; batch classifier loss: 0.474519; batch adversarial loss: 0.658126


epoch 86; iter: 0; batch classifier loss: 0.457927; batch adversarial loss: 0.654213


epoch 86; iter: 200; batch classifier loss: 0.449076; batch adversarial loss: 0.596764


epoch 87; iter: 0; batch classifier loss: 0.377987; batch adversarial loss: 0.626256


epoch 87; iter: 200; batch classifier loss: 0.439079; batch adversarial loss: 0.629761


epoch 88; iter: 0; batch classifier loss: 0.554024; batch adversarial loss: 0.527594


epoch 88; iter: 200; batch classifier loss: 0.434524; batch adversarial loss: 0.661275


epoch 89; iter: 0; batch classifier loss: 0.556957; batch adversarial loss: 0.576559


epoch 89; iter: 200; batch classifier loss: 0.408172; batch adversarial loss: 0.653879


epoch 90; iter: 0; batch classifier loss: 0.602316; batch adversarial loss: 0.610464


epoch 90; iter: 200; batch classifier loss: 0.490156; batch adversarial loss: 0.588832


epoch 91; iter: 0; batch classifier loss: 0.391693; batch adversarial loss: 0.628557


epoch 91; iter: 200; batch classifier loss: 0.466361; batch adversarial loss: 0.586307


epoch 92; iter: 0; batch classifier loss: 0.554599; batch adversarial loss: 0.629932


epoch 92; iter: 200; batch classifier loss: 0.715748; batch adversarial loss: 0.578237


epoch 93; iter: 0; batch classifier loss: 0.429510; batch adversarial loss: 0.642800


epoch 93; iter: 200; batch classifier loss: 0.466947; batch adversarial loss: 0.634181


epoch 94; iter: 0; batch classifier loss: 0.433265; batch adversarial loss: 0.667963


epoch 94; iter: 200; batch classifier loss: 0.591680; batch adversarial loss: 0.625345


epoch 95; iter: 0; batch classifier loss: 0.547993; batch adversarial loss: 0.557559


epoch 95; iter: 200; batch classifier loss: 0.614790; batch adversarial loss: 0.561463


epoch 96; iter: 0; batch classifier loss: 0.472355; batch adversarial loss: 0.639440


epoch 96; iter: 200; batch classifier loss: 0.476184; batch adversarial loss: 0.635030


epoch 97; iter: 0; batch classifier loss: 0.701959; batch adversarial loss: 0.653899


epoch 97; iter: 200; batch classifier loss: 0.771629; batch adversarial loss: 0.591383


epoch 98; iter: 0; batch classifier loss: 0.501463; batch adversarial loss: 0.580412


epoch 98; iter: 200; batch classifier loss: 0.404070; batch adversarial loss: 0.653818


epoch 99; iter: 0; batch classifier loss: 0.693034; batch adversarial loss: 0.652979


epoch 99; iter: 200; batch classifier loss: 0.492253; batch adversarial loss: 0.604289


<aif360.algorithms.inprocessing.adversarial_debiasing.AdversarialDebiasing at 0x7f6a3c991ca0>

We can use the built-in functionality of aif360 to measure how our model is performing

In [19]:
from aif360.metrics import BinaryLabelDatasetMetric

debiased_ds = debiased_model.predict(BLD)
metric = BinaryLabelDatasetMetric(debiased_ds, unprivileged_groups=unprivileged_groups, privileged_groups=privileged_groups)
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric.mean_difference())

metric = BinaryLabelDatasetMetric(BLD, unprivileged_groups=unprivileged_groups, privileged_groups=privileged_groups)
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric.mean_difference())

Difference in mean outcomes between unprivileged and privileged groups = 0.093089
Difference in mean outcomes between unprivileged and privileged groups = 0.194516


We can calculate mean difference easily using pandas as well:  

In [21]:
sum(male['target']) / len(male) - sum(female['target']) / len(female)

0.19112980418681333

However, this is rather a crude metric, let's apply some of the metrics that we discussed earlier in the julia script to see how our model performs compared to our previous ones.

Thankfully, aif360 has built-in functionality of these metrics, so we don't have to hand write them like before. We can simply use the methods provided in `from aif360.metrics`, specifically `ClassificationMetric` directly. 

In [24]:
from aif360.metrics import ClassificationMetric

classification_metrics = ClassificationMetric(BLD, debiased_ds, unprivileged_groups=unprivileged_groups, privileged_groups=privileged_groups)
print(classification_metrics.accuracy())
print(classification_metrics.average_odds_difference())
print(classification_metrics.equal_opportunity_difference())

0.8312722656729864
0.044071753083786636
0.08035729043732376
