COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) is a widely used commercial algorithm that scores a criminal defendant's likelihood of reoffending (recidivism). ProPublica audited this model and found that it assigns disproportionately high risk scores to black defendants, i.e. it discriminates against them.

In this section, we are going to download the [COMPAS modeling dataset as provided by ProPublica](https://github.com/propublica/compas-analysis). We will build a model in Python with the interpret package and balanced sampling, and check whether the model is biased by gender or against an ethnic minority. We will then apply black-box postprocessing for debiasing in order to obtain a fairer classifier, removing a significant portion of the bias.

For highly regulated applications such as recidivism prediction, it can be very important to be able to interpret models. While we have seen how this can be done for black-box models with packages like lime, it can be preferable to go beyond this and use models that are interpretable by design.

We will use:
* a [RuleFit algorithm classifier](https://github.com/christophM/rulefit.git)
* an [Explainable Boosting Machine](https://github.com/interpretml/interpret)

Some other models are a few clicks away. You can find many more interpretable models on [Tamas Madl's GitHub page](https://github.com/tmadl).

In [0]:
!wget https://raw.githubusercontent.com/propublica/compas-analysis/master/compas-scores-two-years.csv

--2019-12-20 23:06:34--  https://raw.githubusercontent.com/propublica/compas-analysis/master/compas-scores-two-years.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2546489 (2.4M) [text/plain]
Saving to: ‘compas-scores-two-years.csv’


2019-12-20 23:06:34 (49.9 MB/s) - ‘compas-scores-two-years.csv’ saved [2546489/2546489]



In [0]:
import pandas as pd

In [0]:
data = pd.read_csv('compas-scores-two-years.csv')
data.head()

Unnamed: 0,id,name,first,last,compas_screening_date,sex,dob,age,age_cat,race,juv_fel_count,decile_score,juv_misd_count,juv_other_count,priors_count,days_b_screening_arrest,c_jail_in,c_jail_out,c_case_number,c_offense_date,c_arrest_date,c_days_from_compas,c_charge_degree,c_charge_desc,is_recid,r_case_number,r_charge_degree,r_days_from_arrest,r_offense_date,r_charge_desc,r_jail_in,r_jail_out,violent_recid,is_violent_recid,vr_case_number,vr_charge_degree,vr_offense_date,vr_charge_desc,type_of_assessment,decile_score.1,score_text,screening_date,v_type_of_assessment,v_decile_score,v_score_text,v_screening_date,in_custody,out_custody,priors_count.1,start,end,event,two_year_recid
0,1,miguel hernandez,miguel,hernandez,2013-08-14,Male,1947-04-18,69,Greater than 45,Other,0,1,0,0,0,-1.0,2013-08-13 06:03:42,2013-08-14 05:41:20,13011352CF10A,2013-08-13,,1.0,F,Aggravated Assault w/Firearm,0,,,,,,,,,0,,,,,Risk of Recidivism,1,Low,2013-08-14,Risk of Violence,1,Low,2013-08-14,2014-07-07,2014-07-14,0,0,327,0,0
1,3,kevon dixon,kevon,dixon,2013-01-27,Male,1982-01-22,34,25 - 45,African-American,0,3,0,0,0,-1.0,2013-01-26 03:45:27,2013-02-05 05:36:53,13001275CF10A,2013-01-26,,1.0,F,Felony Battery w/Prior Convict,1,13009779CF10A,(F3),,2013-07-05,Felony Battery (Dom Strang),,,,1,13009779CF10A,(F3),2013-07-05,Felony Battery (Dom Strang),Risk of Recidivism,3,Low,2013-01-27,Risk of Violence,1,Low,2013-01-27,2013-01-26,2013-02-05,0,9,159,1,1
2,4,ed philo,ed,philo,2013-04-14,Male,1991-05-14,24,Less than 25,African-American,0,4,0,1,4,-1.0,2013-04-13 04:58:34,2013-04-14 07:02:04,13005330CF10A,2013-04-13,,1.0,F,Possession of Cocaine,1,13011511MM10A,(M1),0.0,2013-06-16,Driving Under The Influence,2013-06-16,2013-06-16,,0,,,,,Risk of Recidivism,4,Low,2013-04-14,Risk of Violence,3,Low,2013-04-14,2013-06-16,2013-06-16,4,0,63,0,1
3,5,marcu brown,marcu,brown,2013-01-13,Male,1993-01-21,23,Less than 25,African-American,0,8,1,0,1,,,,13000570CF10A,2013-01-12,,1.0,F,Possession of Cannabis,0,,,,,,,,,0,,,,,Risk of Recidivism,8,High,2013-01-13,Risk of Violence,6,Medium,2013-01-13,,,1,0,1174,0,0
4,6,bouthy pierrelouis,bouthy,pierrelouis,2013-03-26,Male,1973-01-22,43,25 - 45,Other,0,1,0,0,2,,,,12014130CF10A,,2013-01-09,76.0,F,arrest case no charge,0,,,,,,,,,0,,,,,Risk of Recidivism,1,Low,2013-03-26,Risk of Violence,1,Low,2013-03-26,,,2,0,1102,0,0


Each row holds the risk of violence and risk of recidivism scores for one inmate. The final column, *two_year_recid*, is our target.

We can already see a few issues:
1. The race column is a protected category. It should not be used as a feature for model training, but as a control.
2. There are full names in the dataset.
  i) We can only hope these are aliases.
  ii) They will not be useful, and they might even give away the ethnicity of the inmates.
3. There are case numbers in the dataset. These are unlikely to be useful for training a model, and they may leak information about the target: increasing case numbers could encode time, and the targets could drift over time.
4. There are missing values, so we will need to impute.
5. There are date stamps. These will probably not be useful as they are and might come with associated problems (see 3). However, we can convert these features into Unix epochs, i.e. seconds elapsed since 1970, and then repurpose the NumericDifferenceTransformer that we saw in an earlier recipe in order to get the time periods between these date stamps rather than the epochs themselves.
6. There are several categorical variables.
7. The charge description (*c_charge_desc*) might need to be cleaned up.
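
Several of the issues above can be handled in a few lines of pandas. The following is only a minimal sketch on made-up toy rows: the helper `prepare`, the derived column `jail_seconds`, and the choice of median imputation are assumptions for illustration, and the time period is computed directly on parsed datetimes rather than through epochs and the NumericDifferenceTransformer.

```python
import pandas as pd

# Toy rows shaped like the ProPublica file; all values are made up.
toy = pd.DataFrame({
    'name': ['a b', 'c d', 'e f'],
    'c_case_number': ['13011352CF10A', '13001275CF10A', '13005330CF10A'],
    'race': ['Other', 'African-American', 'African-American'],
    'c_jail_in': ['2013-08-13 06:03:42', '2013-01-26 03:45:27', None],
    'c_jail_out': ['2013-08-14 05:41:20', '2013-02-05 05:36:53', None],
    'days_b_screening_arrest': [-1.0, None, -1.0],
    'two_year_recid': [0, 1, 1],
})


def prepare(frame):
    """Hypothetical helper: split off target and control, clean the features."""
    frame = frame.copy()
    # Issue 1: keep the protected attribute aside as a control, not a feature.
    sensitive = frame.pop('race')
    target = frame.pop('two_year_recid')
    # Issues 2 and 3: drop names and case numbers.
    frame = frame.drop(columns=['name', 'c_case_number'])
    # Issue 5: replace the two date stamps by the period between them (seconds).
    jail_in = pd.to_datetime(frame.pop('c_jail_in'))
    jail_out = pd.to_datetime(frame.pop('c_jail_out'))
    frame['jail_seconds'] = (jail_out - jail_in).dt.total_seconds()
    # Issue 4: fill missing numeric values with the column median.
    frame = frame.fillna(frame.median(numeric_only=True))
    return frame, target, sensitive


X_toy, y_toy, sensitive_toy = prepare(toy)
print(X_toy)
```

Categorical variables (issue 6) and the charge description (issue 7) are left out of this sketch; they would need encoding and text cleaning respectively.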

In [0]:
pip install -q aix360 interpret fairlearn git+https://github.com/christophM/rulefit.git

     |████████████████████████████████| 10.7MB 9.0MB/s 
     |████████████████████████████████| 109.2MB 51kB/s 
     |████████████████████████████████| 3.2MB 44.2MB/s 
     |████████████████████████████████| 491kB 54.5MB/s 
  Building wheel for RuleFit (setup.py) ... done


In [0]:
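from interpret.glassbox import ExplainableBoostingClassifier
from sklearn.model_selection import train_test_split

# Assumptions: the seed value and this numeric feature subset are placeholders;
# race is deliberately left out of the features (see issue 1 above).
seed = 0
y = data['two_year_recid']
X = data[['age', 'priors_count', 'juv_fel_count', 'juv_misd_count', 'juv_other_count']]

ebm = ExplainableBoostingClassifier()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=seed)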
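
The balanced sampling mentioned in the introduction could look roughly like the following. This is a sketch under assumptions, not the method used in this recipe: the helper `balance_by_downsampling` is hypothetical, the toy values are made up, and downsampling the majority class is only one of several ways to balance classes.

```python
import pandas as pd


def balance_by_downsampling(frame, target, seed=0):
    """Hypothetical helper: downsample every class to the smallest class size."""
    smallest = frame[target].value_counts().min()
    parts = [
        group.sample(n=smallest, random_state=seed)
        for _, group in frame.groupby(target)
    ]
    # Shuffle so that the classes are not stored in contiguous blocks.
    return pd.concat(parts).sample(frac=1, random_state=seed)


# Made-up toy data with four negatives and two positives.
toy_train = pd.DataFrame({
    'priors_count': [0, 0, 4, 1, 2, 7],
    'two_year_recid': [0, 1, 1, 0, 0, 0],
})
balanced = balance_by_downsampling(toy_train, 'two_year_recid')
print(balanced['two_year_recid'].value_counts().to_dict())
```

Balancing should be applied to the training split only, so that the test split keeps the class proportions of the original data.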

In [0]:
# See also: https://github.com/benman1/People-Analytics-Project

Mitigation of model bias by transforming 
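
Both mitigators in the following cell are configured with a demographic parity constraint, which asks that the share of positive predictions be the same in every sensitive group. As a minimal sketch of how this can be checked, with hypothetical helper names (`selection_rates`, `parity_gap`) and made-up predictions:

```python
import pandas as pd


def selection_rates(y_pred, sensitive):
    """Share of positive predictions within each sensitive group."""
    frame = pd.DataFrame({'pred': list(y_pred), 'group': list(sensitive)})
    return frame.groupby('group')['pred'].mean()


def parity_gap(y_pred, sensitive):
    """Largest difference in selection rate between any two groups."""
    rates = selection_rates(y_pred, sensitive)
    return float(rates.max() - rates.min())


# Made-up binary predictions for two groups 'a' and 'b'.
y_pred_toy = [1, 1, 1, 0, 1, 0, 0, 0]
groups_toy = ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']

rates_toy = selection_rates(y_pred_toy, groups_toy)
gap_toy = parity_gap(y_pred_toy, groups_toy)
print(rates_toy)
print(gap_toy)
```

A gap of 0 means perfect demographic parity; comparing the gap before and after mitigation shows how much of the disparity was removed.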

In [0]:
from fairlearn.postprocessing import ThresholdOptimizer
from fairlearn.reductions import DemographicParity, GridSearch

# Avoid importing from the private module _threshold_optimizer;
# ThresholdOptimizer accepts the constraint name as a plain string.
DEMOGRAPHIC_PARITY = 'demographic_parity'


class MitigationWrapper:
    '''
    After
    https://github.com/fairlearn/fairlearn/blob/
    16729ddc80dc40d36a9e63c0baba875947628aea/test/perf/test_performance.py

    Two fairlearn mitigation methods are wrapped behind one interface:
    1. 'grid': GridSearch, a reduction that retrains the estimator
    2. 'threshold': ThresholdOptimizer, a postprocessing of the
       estimator's predictions

    Parameters:
    -----------
    estimator : a blackbox (sklearn-compatible) model
    method : either 'threshold' (default) or 'grid'
    '''
    def __init__(self, estimator, method='threshold'):
        assert method in ['threshold', 'grid']
        self.method = method
        self.estimator = estimator

    def fit(self, X, y, sensitive_features):
        '''
        Parameters:
        -----------
        X : training features (array-like)
        y : training targets
        sensitive_features : the sensitive data columns
        '''
        if self.method == 'threshold':
            self.mitigator = ThresholdOptimizer(
                unconstrained_predictor=self.estimator,
                constraints=DEMOGRAPHIC_PARITY
            )
        else:
            self.mitigator = GridSearch(
                estimator=self.estimator,
                constraints=DemographicParity()
            )

        self.mitigator.fit(
            X,
            y,
            sensitive_features=sensitive_features
        )
        # Return self so that calls can be chained, as in sklearn.
        return self

    def predict(self, X, sensitive_features):
        '''
        Parameters:
        -----------
        X : features (array-like)
        sensitive_features : the sensitive data columns
        '''
        if self.method == 'threshold':
            return self.mitigator.predict(
                X,
                sensitive_features=sensitive_features,
                random_state=1
            )
        else:
            return self.mitigator.predict(X)

In [0]:
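from treeinterpreter import treeinterpreter as ti

# treeinterpreter does not work with pipelines, so we pass the fitted tree
# model (classifiers[0]) and the already preprocessed, imputed features.
prediction, bias, contributions = ti.predict(classifiers[0], X_prepro_imp[0:2, :])

print('Bias (trainset mean): {}'.format(bias[0]))
print('Feature contributions:')
feature_names = resampling_pipeline_bclf.named_steps['preprocessor'].get_feature_names()
# contributions[0] holds one row per feature and one column per class for the
# first sample; rank the features by their total absolute contribution.
ranked = sorted(
    zip(contributions[0], feature_names),
    key=lambda x: sum(abs(x[0])), reverse=True
)
for c, feature in ranked[:21]:
    print('{} - {}'.format(feature, c))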



In [0]:
# Similarly the shap module doesn't work with pipelines either
# Exception: Model type not yet supported by TreeExplainer: <class 'sklearn.pipeline.Pipeline'>

https://blog.fastforwardlabs.com/2017/03/09/fairml-auditing-black-box-predictive-models.html