# Fairness in Recidivism Risk Scores

Adapted from [BPDM 2017 Tutorial by Caitlin Kuhlman et al.](https://github.com/caitlinkuhlman/bpdmtutorial)

__Tools:__ Analysis will be done in Python, using a number of open-source packages commonly used for data science tasks:
- __NumPy__: scientific computing, http://www.numpy.org/
- __Pandas__: data analysis and manipulation, http://pandas.pydata.org/
- __Scikit-learn__: machine learning, http://scikit-learn.org/stable/
- __Matplotlib__: plotting, https://matplotlib.org/

__Material:__ *Disclaimer*: The analysis presented here is directly inspired by the following references:

[1] ProPublica, *“Machine Bias,”* https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing, May 2016.

[2] A. Chouldechova, *“Fair prediction with disparate impact: A study of bias in recidivism prediction instruments,”* arXiv preprint arXiv:1703.00056 (2017).

[3] F. P. Calmon, D. Wei, K. Natesan Ramamurthy, and K. R. Varshney, *“Optimized Data Pre-Processing for Discrimination Prevention,”* arXiv preprint arXiv:1704.03354 (2017).

In [None]:
%load code/tools.py

In [None]:
%load code/loadcompas

Here is an explanation of the data:

* `age`: defendant's age
* `c_charge_degree`: degree charged (Misdemeanor or Felony)
* `race`: defendant's race
* `age_cat`: defendant's age quantized in "less than 25", "25-45", or "over 45"
* `score_text`: quantized COMPAS score: 'low' (1 to 4), 'medium' (5 to 7), or 'high' (8 to 10)
* `sex`: defendant's gender
* `priors_count`: number of prior charges
* `days_b_screening_arrest`: number of days between the defendant's arrest and the date the defendant was screened for the COMPAS score
* `decile_score`: COMPAS score from 1 to 10
* `is_recid`: whether the defendant recidivated
* `two_year_recid`: whether the defendant recidivated within two years
* `c_jail_in`: date the defendant was jailed
* `c_jail_out`: date the defendant was released from jail
* `length_of_stay`: length of jail stay
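To make the relationship between the raw and derived columns concrete, here is a minimal sketch on a few made-up rows (hypothetical values, not COMPAS records). It assumes that `score_text` bins `decile_score` as listed above and that `length_of_stay` is `c_jail_out - c_jail_in` measured in whole days; the notebook's own loading code may derive these columns differently.

```python
import pandas as pd

# Hypothetical toy rows, NOT taken from the COMPAS dataset
df = pd.DataFrame({
    "decile_score": [1, 4, 5, 7, 8, 10],
    "c_jail_in": pd.to_datetime(["2013-01-01", "2013-02-10", "2013-03-05",
                                 "2013-04-01", "2013-05-20", "2013-06-15"]),
    "c_jail_out": pd.to_datetime(["2013-01-02", "2013-02-10", "2013-03-15",
                                  "2013-04-03", "2013-06-20", "2013-06-16"]),
})

# Quantize decile scores: 1-4 -> Low, 5-7 -> Medium, 8-10 -> High.
# pd.cut intervals are right-inclusive, so (0, 4] covers scores 1..4.
df["score_text"] = pd.cut(df["decile_score"], bins=[0, 4, 7, 10],
                          labels=["Low", "Medium", "High"])

# Assumed definition: length of stay as a whole number of days
df["length_of_stay"] = (df["c_jail_out"] - df["c_jail_in"]).dt.days

print(df[["decile_score", "score_text", "length_of_stay"]])
```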

Next, we look at the first few rows of the dataset.

Since we want to look at race, we first look at the counts for each racial group.
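Both steps can be sketched with `head()` and `value_counts()` on a hypothetical `race` column (made-up rows; the category labels are modeled on the ProPublica file). The later comparisons focus on African-American and Caucasian defendants, so the sketch also keeps only those two groups with `isin`; that this is what the filtering step does is an assumption.

```python
import pandas as pd

# Hypothetical toy rows, NOT real COMPAS records
df = pd.DataFrame({
    "race": ["African-American", "Caucasian", "African-American", "Hispanic",
             "Caucasian", "African-American", "Other"],
    "age": [23, 41, 35, 29, 52, 19, 44],
})

print(df.head())                  # first five rows

counts = df["race"].value_counts()
print(counts)                     # number of defendants per group

# Keep only the two groups compared later on (assumed filtering step)
two_groups = df[df["race"].isin(["African-American", "Caucasian"])]
print(len(two_groups))
```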

In [None]:
%load code/filter

We can also look at the quantized scores and how they interact with race.
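One way to tabulate how the quantized scores are spread across groups is `pd.crosstab`, sketched here on made-up rows. With `normalize="index"`, each row shows the share of that group falling in each score category, which makes groups of different sizes comparable.

```python
import pandas as pd

# Hypothetical toy rows, NOT real COMPAS records
df = pd.DataFrame({
    "race": ["African-American"] * 4 + ["Caucasian"] * 4,
    "score_text": ["High", "Medium", "Low", "High",
                   "Low", "Low", "Medium", "High"],
})

# Raw counts of each quantized score within each group
print(pd.crosstab(df["race"], df["score_text"]))

# Row-normalized: fraction of each group in each score category
shares = pd.crosstab(df["race"], df["score_text"], normalize="index")
print(shares)
```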

In [None]:
%load code/quant.py

In [None]:
%load code/recidcorr

The correlation between the COMPAS score and recidivism is not that high. Next, let's measure the disparate impact of the quantized COMPAS score ($\leq 4$ is low, everything else is high) according to the EEOC's 80% rule: the proportions of each protected group receiving a "high" score should be within 80% of each other. The rule was designed for employment decisions, so its interpretation here is not the same, but it is a good starting point.

Reference: https://en.wikipedia.org/wiki/Disparate_impact#The_80.25_rule
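The 80% check can be sketched as follows on made-up rows. "High" means `decile_score > 4`, as in the text, and the ratio divides the smaller group rate by the larger one, so a value below 0.8 flags a potential disparate impact. The helper `disparate_impact_ratio` is hypothetical, not taken from the notebook's code.

```python
import pandas as pd

def disparate_impact_ratio(df, group_col, outcome_col, group_a, group_b):
    """Smaller-over-larger ratio of a binary outcome's rate between two groups."""
    rate_a = df.loc[df[group_col] == group_a, outcome_col].mean()
    rate_b = df.loc[df[group_col] == group_b, outcome_col].mean()
    return min(rate_a, rate_b) / max(rate_a, rate_b)

# Hypothetical toy rows, NOT real COMPAS records
df = pd.DataFrame({
    "race": ["African-American"] * 5 + ["Caucasian"] * 5,
    "decile_score": [9, 6, 5, 3, 8, 2, 7, 1, 4, 3],
})

# Quantize as in the text: <= 4 is low, everything else is high
df["high_score"] = (df["decile_score"] > 4).astype(int)

ratio = disparate_impact_ratio(df, "race", "high_score",
                               "African-American", "Caucasian")
print(ratio, ratio >= 0.8)   # below 0.8 fails the four-fifths rule
```

Taking the smaller rate over the larger keeps the ratio in (0, 1] regardless of which group is passed first.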

In [None]:
%load code/scoremeans.py

In [None]:
%load code/diff.py

In [None]:
%load code/recidmeans.py

There is a difference in recidivism rates between the groups, but it is smaller than the difference in the rates at which the COMPAS scores label them high risk.
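To put the two gaps side by side, one can compute each group's rate of high scores and of two-year recidivism and compare their ratios. The rows below are made up, so the numbers only illustrate the mechanics and say nothing about the real data.

```python
import pandas as pd

# Hypothetical toy rows, NOT real COMPAS records
df = pd.DataFrame({
    "race": ["African-American"] * 5 + ["Caucasian"] * 5,
    "decile_score":   [9, 6, 5, 3, 8, 2, 7, 1, 4, 3],
    "two_year_recid": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
})
df["high_score"] = (df["decile_score"] > 4).astype(int)

# Per-group rate of a high score vs. the observed recidivism rate
rates = df.groupby("race")[["high_score", "two_year_recid"]].mean()
print(rates)

# Caucasian rate divided by African-American rate, column by column
ratios = rates.loc["Caucasian"] / rates.loc["African-American"]
print(ratios)
```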

Now let's measure the difference in scores when we consider both the COMPAS output and true recidivism.

We will consider a few different metrics. Further explanation can be found in Northpointe's response to the ProPublica article (https://assets.documentcloud.org/documents/2998391/ProPublica-Commentary-Final-070616.pdf) and in Alexandra Chouldechova's paper [2]. Both documents discuss error rates and calibration.
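Here is a minimal sketch of per-group error rates at a single threshold, computed on made-up rows. It assumes the same quantization as above (`decile_score > 4` means predicted high risk) and reports the false positive rate (FPR), the false negative rate (FNR), and the positive predictive value (PPV). Equal PPV across groups is the predictive-parity form of calibration discussed by Chouldechova [2]. `group_error_rates` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

def group_error_rates(df, group_col="race", score_col="decile_score",
                      label_col="two_year_recid", threshold=4):
    """Per-group FPR, FNR and PPV when score > threshold predicts 'high risk'."""
    rows = {}
    for group, g in df.groupby(group_col):
        pred = g[score_col] > threshold
        actual = g[label_col] == 1
        tp = (pred & actual).sum()
        fp = (pred & ~actual).sum()
        fn = (~pred & actual).sum()
        tn = (~pred & ~actual).sum()
        rows[group] = {
            "FPR": fp / (fp + tn),  # non-recidivists labeled high risk
            "FNR": fn / (fn + tp),  # recidivists labeled low risk
            "PPV": tp / (tp + fp),  # P(recidivate | labeled high risk)
        }
    return pd.DataFrame(rows).T  # one row per group

# Hypothetical toy rows, NOT real COMPAS records
df = pd.DataFrame({
    "race": ["African-American"] * 5 + ["Caucasian"] * 5,
    "decile_score":   [9, 6, 5, 3, 8, 2, 7, 1, 4, 3],
    "two_year_recid": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
})

rates = group_error_rates(df)
print(rates)
```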

In [None]:
%load code/normalize.py

For each group, each point on the ROC curve corresponds to a $(\text{false positive rate}, \text{true positive rate})$ pair for a given threshold $s$. To capture the difference in error rates between groups, we map each threshold $s$ to the point
$$\left(\frac{\text{false positive rate}_{\text{Afr.-American}}(s)}{\text{false positive rate}_{\text{Cauc.}}(s)},\; s\right)$$
and similarly for *false negative* rates across the different thresholds $s$.
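The threshold sweep behind this mapping can be sketched as follows. This is an illustration on made-up rows, not the code in `code/fpr.py`. It assumes a defendant is predicted high risk when `decile_score > s`, and it reports the African-American rate divided by the Caucasian rate, returning NaN when the Caucasian rate is zero. The helpers `fpr_fnr` and `error_rate_ratios` are hypothetical.

```python
import numpy as np
import pandas as pd

def fpr_fnr(scores, labels, s):
    """FPR and FNR when 'score > s' is the positive (high-risk) prediction."""
    pred = scores > s
    neg, pos = labels == 0, labels == 1
    fpr = (pred & neg).sum() / neg.sum()
    fnr = (~pred & pos).sum() / pos.sum()
    return fpr, fnr

def error_rate_ratios(df, thresholds, group_a="African-American",
                      group_b="Caucasian"):
    """Map each threshold s to (FPR_a / FPR_b, s) and (FNR_a / FNR_b, s)."""
    a = df[df["race"] == group_a]
    b = df[df["race"] == group_b]
    out = []
    for s in thresholds:
        fpr_a, fnr_a = fpr_fnr(a["decile_score"], a["two_year_recid"], s)
        fpr_b, fnr_b = fpr_fnr(b["decile_score"], b["two_year_recid"], s)
        out.append({
            "s": s,
            # The ratio is undefined (NaN) when the reference group's rate is 0
            "fpr_ratio": fpr_a / fpr_b if fpr_b > 0 else np.nan,
            "fnr_ratio": fnr_a / fnr_b if fnr_b > 0 else np.nan,
        })
    return pd.DataFrame(out)

# Hypothetical toy rows, NOT real COMPAS records
df = pd.DataFrame({
    "race": ["African-American"] * 6 + ["Caucasian"] * 6,
    "decile_score":   [9, 6, 5, 3, 8, 7, 2, 7, 1, 4, 3, 6],
    "two_year_recid": [1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0],
})

res = error_rate_ratios(df, thresholds=[3, 5])
print(res)
```

Plotting `fpr_ratio` (or `fnr_ratio`) against `s` for every threshold from 1 to 9 gives curves of the kind described above. A ratio above 1 means the African-American group has the higher error rate at that threshold.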

In [None]:
%load code/fpr.py

Once again, the difference is stark. This graph is particularly concerning because the false positive rates for African Americans are substantially higher across all thresholds.

# What other differences are there?

In [None]:
%load code/decilesdist.py

In [None]:
%load code/decileplot.py

In [None]:
%load code/priors_dist.py
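Such differences can also be summarized numerically. A minimal sketch on made-up rows computes group means of `priors_count` and `decile_score`, then the row-normalized distribution of decile scores per group. It is not the code in the `%load`ed files, and the values are hypothetical.

```python
import pandas as pd

# Hypothetical toy rows, NOT real COMPAS records
df = pd.DataFrame({
    "race": ["African-American"] * 4 + ["Caucasian"] * 4,
    "priors_count": [0, 2, 5, 1, 0, 0, 3, 1],
    "decile_score": [3, 6, 8, 4, 1, 2, 5, 3],
})

# Mean number of prior charges and mean decile score per group
summary = df.groupby("race")[["priors_count", "decile_score"]].mean()
print(summary)

# Share of each group at each decile score (rows sum to 1)
print(pd.crosstab(df["race"], df["decile_score"], normalize="index"))
```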