# Adversarial fairness on COMPAS with gradient reversal
This notebook demonstrates how to use the code in `gradient_reversal.py` to apply gradient reversal.
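
As a rough conceptual sketch (not the implementation in `gradient_reversal.py`, which is not shown in this notebook), a gradient reversal layer acts as the identity in the forward pass and multiplies the incoming gradient by $-\lambda$ in the backward pass. The class and method names below are hypothetical; only the parameter name `hp_lambda` is borrowed from the model-loading call used in this notebook.

```python
class GradientReversal:
    """Hypothetical, framework-free illustration of a gradient reversal layer."""

    def __init__(self, hp_lambda):
        # Strength of the reversal; 0 means no gradient flows back at all.
        self.hp_lambda = hp_lambda

    def forward(self, x):
        # Forward pass: the features pass through unchanged.
        return x

    def backward(self, grad_output):
        # Backward pass: flip the sign of the gradient and scale it by lambda.
        return -self.hp_lambda * grad_output


layer = GradientReversal(hp_lambda=2.0)
features = layer.forward(3.0)      # unchanged features
reversed_grad = layer.backward(0.5)  # gradient with flipped sign, scaled by 2
```

Placed between shared features and an adversary that predicts the sensitive attribute, such a layer pushes the shared features to become less informative about that attribute. With `hp_lambda=0` no adversary gradient reaches the shared features, which is presumably why the notebook treats that setting as the naive baseline.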

Use the `Pipfile` in the repo root if there are missing dependencies.

A pre-trained model debiased with this approach is also available, saved in `unbiased_model.h5`. The COMPAS data is loaded from this repo.

In [9]:
import pandas as pd
import os
from sklearn import preprocessing
from sklearn.model_selection import train_test_split

In [10]:
df = pd.read_csv(os.path.join("..", "data", "csv", "scikit", "compas_recidive_two_years_sanitize_age_category_jail_time_decile_score.csv"))
df.head()

Unnamed: 0,sex,age_cat,race,juv_fel_count,juv_misd_count,juv_other_count,priors_count,days_b_screening_arrest,c_jail_in,c_jail_out,c_jail_time (days),date_dif_in_jail,c_charge_degree,is_recid,decile_score,score_text,two_year_recid
0,Male,Greater than 45,Other,0,0,0,0,-1,2013-08-13 06:03:42,2013-08-14 05:41:20,0.984468,1,F,0,1,Low,0
1,Male,25 - 45,African-American,0,0,0,0,-1,2013-01-26 03:45:27,2013-02-05 05:36:53,10.077384,10,F,1,3,Low,1
2,Male,Less than 25,African-American,0,0,1,4,-1,2013-04-13 04:58:34,2013-04-14 07:02:04,1.085764,1,F,1,4,Low,1
3,Male,25 - 45,Other,0,0,0,0,0,2013-11-30 04:50:18,2013-12-01 12:28:56,1.318495,1,M,0,1,Low,0
4,Male,25 - 45,Caucasian,0,0,0,14,-1,2014-02-18 05:08:24,2014-02-24 12:18:30,6.298681,6,F,1,6,Medium,1


Keep only the rows whose race is `African-American` or `Caucasian`. This restriction might be lifted later on if we use a generative adversary.

In [11]:
df_binary = df[(df["race"] == "Caucasian") | (df["race"] == "African-American")]

In [12]:
del df_binary['c_jail_in']
del df_binary['c_jail_out']

# Separate the target columns from the rest of the features
Y = df_binary[['decile_score', 'two_year_recid', 'race', 'score_text']]
del df_binary['decile_score']
del df_binary['two_year_recid']
del df_binary['score_text']

S = df_binary['race']
del df_binary['race']

In [13]:
df_binary.head()

Unnamed: 0,sex,age_cat,juv_fel_count,juv_misd_count,juv_other_count,priors_count,days_b_screening_arrest,c_jail_time (days),date_dif_in_jail,c_charge_degree,is_recid
1,Male,25 - 45,0,0,0,0,-1,10.077384,10,F,1
2,Male,Less than 25,0,0,1,4,-1,1.085764,1,F,1
4,Male,25 - 45,0,0,0,14,-1,6.298681,6,F,1
6,Female,25 - 45,0,0,0,0,-1,2.953611,3,M,0
7,Male,25 - 45,0,0,0,0,-1,1.080451,1,F,0


In [14]:
encod = preprocessing.OrdinalEncoder()
encod.fit(df_binary)
X = encod.transform(df_binary)
X = pd.DataFrame(X)
X.columns = df_binary.columns
X.head()

Unnamed: 0,sex,age_cat,juv_fel_count,juv_misd_count,juv_other_count,priors_count,days_b_screening_arrest,c_jail_time (days),date_dif_in_jail,c_charge_degree,is_recid
0,1.0,0.0,0.0,0.0,0.0,0.0,29.0,4127.0,10.0,0.0,1.0
1,1.0,2.0,0.0,0.0,1.0,4.0,29.0,2204.0,1.0,0.0,1.0
2,1.0,0.0,0.0,0.0,0.0,14.0,29.0,3920.0,6.0,0.0,1.0
3,0.0,0.0,0.0,0.0,0.0,0.0,29.0,3503.0,3.0,1.0,0.0
4,1.0,0.0,0.0,0.0,0.0,0.0,29.0,2184.0,1.0,0.0,0.0


In [15]:
# The pre-trained model was trained on the X_train produced with random_state=42, so we keep that seed
X_train, X_test, Y_train, Y_test = train_test_split( X, Y, test_size=0.3, random_state=42)

In [22]:
print(X_train.shape)
print(len(Y_train))
Y.head()

(3694, 11)
3694


Unnamed: 0,decile_score,two_year_recid,race,score_text
1,3,1,African-American,Low
2,4,1,African-American,Low
4,6,1,Caucasian,Medium
6,1,0,Caucasian,Low
7,4,0,Caucasian,Low


## Load an unbiased pre-trained classifier
Here we'll load a pre-trained classifier for COMPAS, which can get you started.

In [17]:
from gradient_reversal import GradientReversalModel

Using TensorFlow backend.


In [18]:
gr = GradientReversalModel()

gr.load_trained_model(path="unbiased_model.h5", hp_lambda=100)
m = gr.get_model()
model_to_json=m.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_to_json)

In [19]:
Y_pred = gr.predict(X_test)




In [20]:
X_test

Unnamed: 0,sex,age_cat,juv_fel_count,juv_misd_count,juv_other_count,priors_count,days_b_screening_arrest,c_jail_time (days),date_dif_in_jail,c_charge_degree,is_recid
8,0.0,1.0,0.0,0.0,0.0,1.0,10.0,4268.0,14.0,0.0,1.0
4246,1.0,0.0,0.0,0.0,0.0,2.0,29.0,1110.0,1.0,1.0,1.0
544,0.0,2.0,0.0,0.0,0.0,0.0,29.0,1116.0,1.0,0.0,1.0
1780,1.0,0.0,0.0,0.0,0.0,1.0,29.0,825.0,1.0,0.0,1.0
3940,1.0,0.0,0.0,0.0,0.0,0.0,29.0,2787.0,1.0,0.0,1.0
1564,1.0,0.0,0.0,0.0,0.0,3.0,30.0,4662.0,34.0,0.0,1.0
4519,1.0,0.0,0.0,0.0,0.0,16.0,29.0,1885.0,1.0,0.0,1.0
2664,1.0,0.0,0.0,0.0,0.0,6.0,30.0,3081.0,2.0,0.0,1.0
167,1.0,0.0,0.0,0.0,2.0,11.0,29.0,2629.0,1.0,1.0,1.0
346,0.0,0.0,0.0,0.0,0.0,0.0,29.0,1931.0,1.0,1.0,0.0


In [23]:
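# First output: the predicted score (thresholded at 5 in the evaluation below)
Y_test['pred'] = Y_pred[0]
# Second output: two-column race prediction (presumably the adversary head); column 1 is thresholded at 0.5
Y_test['pred_race'] = Y_pred[1][:,1] > 0.5
# Note: without parentheses this displays the bound method's repr, not just the first rows
Y_test.head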

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.


<bound method NDFrame.head of       decile_score  two_year_recid              race score_text      pred  \
11               1               1         Caucasian        Low  6.031363   
4957             3               0  African-American        Low  4.009992   
648              7               0  African-American     Medium  5.157203   
2100             2               1         Caucasian        Low  3.764811   
4595             2               1  African-American        Low  3.681648   
1857            10               1  African-American       High  6.080746   
5287             9               1  African-American       High  6.174338   
3124             7               1  African-American     Medium  4.525392   
200              9               1  African-American       High  7.116177   
415              2               0         Caucasian        Low  3.115577   
2038             8               1  African-American       High  4.942677   
1877             7               1  African-Am

## Evaluation of the model
For this we use a simple counting-based probability model with a syntax similar to Bayesian networks.

We evaluate the model on demographic parity, both as an absolute difference and as a ratio (related to the 80% rule).
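
To make the two numbers concrete, here is a minimal, dependency-free sketch of the same computation on toy data. The helper names `positive_rate` and `demographic_parity` are hypothetical and not part of `bayesian_model`; the threshold of 5 mirrors the `x>5` condition used in the cells below.

```python
def positive_rate(preds, groups, group, threshold=5):
    """Share of rows in `group` whose prediction exceeds `threshold`."""
    selected = [p for p, g in zip(preds, groups) if g == group]
    if not selected:
        raise ValueError(f"no rows for group {group!r}")
    return sum(p > threshold for p in selected) / len(selected)


def demographic_parity(preds, groups, group_a, group_b, threshold=5):
    """Return (absolute difference, ratio) of the two groups' positive rates."""
    rate_a = positive_rate(preds, groups, group_a, threshold)
    rate_b = positive_rate(preds, groups, group_b, threshold)
    # The ratio is undefined (ZeroDivisionError) if group_b has no positives.
    return abs(rate_a - rate_b), rate_a / rate_b


# Toy data, invented purely for illustration.
toy_preds = [6.0, 3.0, 7.0, 4.0, 6.5, 2.0, 8.0, 5.5]
toy_groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

dp_diff, dp_ratio = demographic_parity(toy_preds, toy_groups, "A", "B")
```

In the toy data, group A has 2 of 4 predictions above the threshold and group B has 3 of 4, so the difference is 0.25 and the ratio is about 0.67, which would fail the 80% rule. The functions only iterate over their inputs, so pandas columns such as `Y_test["pred"]` and `Y_test["race"]` could be passed in directly.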


In [15]:
from bayesian_model import BayesianModel as bm

In [16]:
abs(bm(Y_test).P(pred = lambda x: x>5).given(race = "Caucasian") - bm(Y_test).P(pred = lambda x: x>5).given(race = "African-American"))

0.11489678184486118

In [17]:
bm(Y_test).P(pred = lambda x: x>5).given(race = "Caucasian") / bm(Y_test).P(pred = lambda x: x>5).given(race = "African-American")

0.7576631195149871

This ratio is below 0.8, so under the 80% rule we would conclude that the model is still biased. Let's look at another metric, called equality of opportunity:

$$P(\hat{Y} = 1 \mid A = 1, Y = 1) = P(\hat{Y} = 1 \mid A = 0, Y = 1)$$
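
The condition above compares true positive rates between the two groups. A minimal sketch of that comparison on toy data follows; the helper names are hypothetical, and treating a prediction above 5 as $\hat{Y} = 1$ is an assumption taken from the thresholds used elsewhere in this notebook.

```python
def true_positive_rate(preds, labels, groups, group, threshold=5):
    """P(pred > threshold | group, label == 1), estimated by counting."""
    hits = [
        p > threshold
        for p, y, g in zip(preds, labels, groups)
        if g == group and y == 1
    ]
    if not hits:
        raise ValueError(f"no positive-label rows for group {group!r}")
    return sum(hits) / len(hits)


def equal_opportunity_gap(preds, labels, groups, group_a, group_b, threshold=5):
    """Absolute difference in true positive rate between the two groups."""
    tpr_a = true_positive_rate(preds, labels, groups, group_a, threshold)
    tpr_b = true_positive_rate(preds, labels, groups, group_b, threshold)
    return abs(tpr_a - tpr_b)


# Toy data, invented purely for illustration.
toy_preds = [6.0, 3.0, 7.0, 4.0, 6.5, 2.0, 8.0, 5.5]
toy_labels = [1, 1, 0, 1, 1, 1, 1, 0]
toy_groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

eo_gap = equal_opportunity_gap(toy_preds, toy_labels, toy_groups, "A", "B")
```

Here group A has 1 of 3 true recidivists flagged and group B has 2 of 3, giving a gap of one third. A gap of 0 would mean the criterion holds exactly.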

In [18]:
abs(bm(Y_test).P(pred = lambda x: x>5).given(race = "Caucasian", two_year_recid = True) - bm(Y_test).P(pred = lambda x: x>5).given(race = "African-American", two_year_recid = True))

0.11477466626518118

This difference should be as close to 0 as possible, so a value of about 0.115 is not bad, but it is not 0 either. For comparison, let's evaluate a naive model with $\lambda = 0$.

## Naive model

In [19]:
gr_naive = GradientReversalModel()

gr_naive.load_trained_model(path="naive_model.h5", hp_lambda=0)

In [20]:
Y_pred_n = gr_naive.predict(X_test)

In [21]:
Y_test['pred_n'] = Y_pred_n

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.


In [22]:
abs(bm(Y_test).P(pred_n = lambda x: x>5).given(race = "Caucasian") - bm(Y_test).P(pred_n = lambda x: x>5).given(race = "African-American"))

0.20485838911334903

In [23]:
bm(Y_test).P(pred_n = lambda x: x>5).given(race = "Caucasian") / bm(Y_test).P(pred_n = lambda x: x>5).given(race = "African-American")

0.5816211334386995

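That is a lot worse than the unbiased model: the absolute difference rises from about 0.115 to 0.205, and the ratio drops from about 0.76 to 0.58. The same holds for equality of opportunity: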

In [24]:
abs(bm(Y_test).P(pred_n = lambda x: x>5).given(race = "Caucasian", two_year_recid = True) - bm(Y_test).P(pred_n = lambda x: x>5).given(race = "African-American", two_year_recid = True))

0.2239536284251732