## SLU17 - Ethics & Fairness - Exercise notebook

In [None]:
import warnings
warnings.filterwarnings('ignore')
import hashlib

import matplotlib.pyplot as plt
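try:
    plt.style.use('seaborn-v0_8-dark')  # style name in matplotlib >= 3.6
except OSError:
    plt.style.use('seaborn-dark')  # style name in older matplotlib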
%matplotlib inline
plt.rcParams["figure.figsize"]=(3.5,3.5)

import pandas as pd
from sklearn.metrics import confusion_matrix
import numpy as np

from utils.utils import make_data

## Criminal justice bias

Exercise adapted from the book [Fairness and Machine Learning by Solon Barocas, Moritz Hardt, and Arvind Narayanan](https://fairmlbook.org/pdf/fairmlbook.pdf).

It is based on ProPublica's article [Machine Bias](https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing) about COMPAS, a proprietary risk score algorithm used in the US.

This is the problem setting:

> Risk assessment is an important component of the criminal justice system. In the United States, judges set bail and decide pre-trial detention based on their assessment of the risk that a released defendant would fail to appear at trial or cause harm to the public.

These scores are intended to assess the risk that a defendant will re-offend, a task often called **recidivism prediction**.

We’ll use data obtained and released by ProPublica.

In [None]:
data = make_data()
data.head()

## Exercise 1 - Score distribution for Black and White defendants

In this exercise, we will compare the distributions of the predicted `decile_score` across races. This is the score predicted by the COMPAS algorithm; a high score means a high risk of reoffending.

In the three cells below, plot the histograms of the `decile_score` for all defendants, for Black defendants (`African-American` race), and for White defendants (`Caucasian` race). Suggestion: use the `histtype='step'` option.
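
As a minimal sketch of the `histtype='step'` suggestion, the cell below overlays two outline histograms of made-up integer scores. The arrays `toy_scores_a` and `toy_scores_b` and the bin edges are illustrative assumptions; they are not the COMPAS data and not the exercise solution.

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy data: random integer scores from 1 to 10 (NOT the COMPAS scores).
rng = np.random.default_rng(0)
toy_scores_a = rng.integers(1, 11, size=200)
toy_scores_b = rng.integers(1, 11, size=200)

# Bin edges at 0.5, 1.5, ..., 10.5 so each integer score gets its own bin.
bins = np.arange(0.5, 11.5, 1)

fig, ax = plt.subplots()
# histtype='step' draws only the outline, so overlaid histograms stay readable.
counts_a, _, _ = ax.hist(toy_scores_a, bins=bins, histtype='step', label='toy group A')
counts_b, _, _ = ax.hist(toy_scores_b, bins=bins, histtype='step', label='toy group B')
ax.set_xlabel('score')
ax.set_ylabel('count')
ax.legend()
```

Centering the bins on the integers avoids two score values being merged into one bar, which can happen with the default bin edges.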

In [None]:
# Plot here the decile_score histogram for all defendants.
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Plot here the decile_score histogram for the Black defendants.
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Plot here the decile_score histogram for the White defendants.
# YOUR CODE HERE
raise NotImplementedError()

### Interpretation

Based on these plots, what would you conclude from the distributions? Uncomment the correct answer.

In [None]:
# hypothesis_1 = 'The distributions of the scores are similar for the Black and White populations.'
# hypothesis_1 = 'Scores for White defendants are skewed toward lower-risk categories.'
# hypothesis_1 = 'Scores for Black defendants are skewed toward lower-risk categories.'

# YOUR CODE HERE
raise NotImplementedError()

In [None]:
assert hashlib.sha256(str(hypothesis_1).encode('utf-8')).hexdigest() == 'c9502425519f2a2ab1389f934ac42ccff095b92497a25ea8067e1eaffa704fbd', 'Not correct.'

## Exercise 2 - Error rates

In this exercise, we will compare, across races, the predicted `decile_score` of people who reoffended within a two-year period. Reoffending is indicated in the column `two_year_recid`, where 1 means that the person reoffended.

In the three cells below, plot the histograms of the `decile_score` for all reoffenders, for Black reoffenders (`African-American` race), and for White reoffenders (`Caucasian` race). Suggestion: use the `histtype='step'` option.
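
One way to restrict a DataFrame to a subgroup with a given outcome is to combine boolean masks with `&`. The sketch below uses a toy frame whose column names (`group`, `outcome`, `score`) are hypothetical and unrelated to the columns returned by `make_data`.

```python
import pandas as pd

# Toy frame with hypothetical columns (NOT the ProPublica data).
toy = pd.DataFrame({
    'group':   ['a', 'a', 'b', 'b', 'a', 'b'],
    'outcome': [1, 0, 1, 1, 1, 0],
    'score':   [3, 8, 5, 2, 9, 7],
})

# Each comparison yields a boolean Series; the parentheses are required
# because & binds more tightly than ==.
mask = (toy['outcome'] == 1) & (toy['group'] == 'a')

# Scores of the rows that satisfy both conditions.
selected = toy.loc[mask, 'score']
```

The resulting Series can be passed directly to a histogram call.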

In [None]:
# Plot here the decile_score histogram for all reoffenders.
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Plot here the decile_score histogram for Black reoffenders.
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Plot here the decile_score histogram for White reoffenders.
# YOUR CODE HERE
raise NotImplementedError()

### Interpretation

Overall, the risk score doesn't appear to be particularly good at identifying recidivists. If it were, the scores of actual reoffenders would concentrate in the high deciles; instead, the histogram for all reoffenders resembles a uniform distribution.

Based on these plots, uncomment the correct answer. (Remember, these histograms refer to **actual recidivists**.)

In [None]:
# hypothesis_2 = 'The distribution of recidivists scores is similar for both races groups.'
# hypothesis_2 = 'Scores for White recidivists are skewed toward lower-risk categories.'
# hypothesis_2 = 'Scores for Black recidivists are skewed toward lower-risk categories.'

# YOUR CODE HERE
raise NotImplementedError()

In [None]:
assert hashlib.sha256(str(hypothesis_2).encode('utf-8')).hexdigest() == 'c0c2fcf934d80486aa2f4a469c5ddaf8ee08784a4b37595eb64acd8039b217b7', 'Not correct.'

## Exercise 3 - When predictions fail differently

Defendants with a `decile_score` higher than 3 are classified as being at high risk of recidivism. In this exercise, we will compare the high-risk predictions across races.

In the cells below, calculate the false positive rate (FPR) of the high-risk prediction for all people, for Black people (`African-American` race), and for White people (`Caucasian` race).

Recall that the false positive rate, also known as the probability of false alarm, is given by:

$$FPR = \frac{FP}{FP + TN} = \frac{FP}{N}$$

where $FP$ is the number of false positives, $TN$ is the number of true negatives, and $N$ is the total number of actual negatives (here, people who did not reoffend). You can use the `confusion_matrix` function to calculate the true/false positives/negatives.
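
To make the formula concrete, here is a small sketch on made-up labels. The arrays `toy_true` and `toy_scores` and the cutoff `toy_threshold` are illustrative assumptions and are not taken from the dataset or from the exercise.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy ground truth (1 = positive) and toy scores; NOT the COMPAS data.
toy_true = np.array([0, 0, 0, 0, 1, 1, 1, 0])
toy_scores = np.array([2, 5, 7, 1, 8, 3, 9, 4])

# Hypothetical cutoff: scores strictly above it count as a positive prediction.
toy_threshold = 5
toy_pred = (toy_scores > toy_threshold).astype(int)

# For binary labels, confusion_matrix returns [[TN, FP], [FN, TP]],
# so ravel() unpacks the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(toy_true, toy_pred, labels=[0, 1]).ravel()

# FPR = FP / (FP + TN): the share of actual negatives flagged as positive.
toy_fpr = fp / (fp + tn)
print(toy_fpr)  # 0.2 (1 false positive among 5 actual negatives)
```

Passing `labels=[0, 1]` keeps the matrix 2x2 even if a subgroup happens to contain only one class.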

In [None]:
# Calculate here the FPR of the high-risk prediction for all people.
# fpr = ...
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Calculate here the FPR of the high-risk prediction for Black people.
# fpr_b = ...
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Calculate here the FPR of the high-risk prediction for White people.
# fpr_w = ...
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
assert np.isclose(fpr, 0.41, atol=0.01)
assert np.isclose(fpr_b, 0.54, atol=0.01)
assert np.isclose(fpr_w, 0.33, atol=0.01)

In [None]:
print(f'Overall FPR is {round(fpr,2)}, FPR for Black people is {round(fpr_b,2)}, and FPR for White people is {round(fpr_w,2)}.')

Black people who did not reoffend were falsely predicted to be at high risk of recidivism considerably more often than White people who did not reoffend (an FPR of about 0.54 versus 0.33).