# Fairness and Bias in Machine Learning

![image info](https://static.propublica.org/projects/algorithmic-bias/assets/img/generated/opener-b-crop-1200*675-00796e.jpg)


## Data analysis on the ProPublica Dataset

The goal of this exercise is to work with the **COMPAS dataset**: clean it for analysis, extract insights, visualize findings, and replicate part of **ProPublica's analysis**.

## Loading and surveying the data

* Load the dataset `compas-scores-two-years.csv` and select the columns below.

### Columns of Interest:
* `age` - Age of the defendant. It is numeric.
* `age_cat` - Age category. It can be 'Less than 25', '25 - 45', or 'Greater than 45'.
* `sex` - Sex of the defendant. It is either 'Male' or 'Female'.
* `race` - Race of the defendant. It can be 'African-American', 'Caucasian', 'Hispanic', 'Asian', 'Native American', or 'Other'.
* `c_charge_degree` - Degree of the crime. It is either M (Misdemeanor), F (Felony), or O (not causing jail time).
* `priors_count` - Count of prior crimes committed by the defendant. It is numeric.
* `days_b_screening_arrest` - Days between the arrest and COMPAS screening.
* `decile_score` - The COMPAS score predicted by the system. It is an integer from 1 to 10.
* `score_text` - Category of the decile score. It can be Low (1-4), Medium (5-7), or High (8-10).
* `is_recid` - Indicates whether the defendant recidivated. It can be 0 (no), 1 (yes), or -1.
* `two_year_recid` - Indicates whether the defendant recidivated within two years.
* `c_jail_in` - Time when the defendant was jailed.
* `c_jail_out` - Time when the defendant was released from the jail.
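
As a starting point, the loading step can be sketched as follows. This is a minimal sketch assuming pandas is installed and the CSV sits in the working directory; `COLUMNS` and `load_compas` are hypothetical helper names, not part of the dataset or of ProPublica's code.

```python
import pandas as pd

# Columns of interest, in the order listed in the exercise text.
COLUMNS = [
    "age", "age_cat", "sex", "race", "c_charge_degree", "priors_count",
    "days_b_screening_arrest", "decile_score", "score_text", "is_recid",
    "two_year_recid", "c_jail_in", "c_jail_out",
]


def load_compas(path="compas-scores-two-years.csv"):
    """Read the CSV and keep only the columns of interest (hypothetical helper)."""
    # usecols limits parsing to the named columns; parse_dates turns the two
    # jail timestamps into datetime values (missing entries become NaT).
    df = pd.read_csv(path, usecols=COLUMNS, parse_dates=["c_jail_in", "c_jail_out"])
    # read_csv keeps the file's column order, so reorder to match COLUMNS.
    return df[COLUMNS]


# Typical use in the notebook (not executed here):
# df = load_compas()
# df.head()
```

Calling `df.info()` afterwards is a quick way to check the column types and spot missing values before cleaning.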

In [None]:
## WRITE YOUR CODE HERE

## Data Cleaning
Now that we have surveyed the dataset, let's clean the data. The cleaning steps largely follow ProPublica's method:
1. Keep only cases where the COMPAS screening took place within 30 days before or after the arrest (`days_b_screening_arrest` between -30 and 30). Records where this value is missing should be removed.
2. Remove cases where `is_recid` is -1, since we only want binary values for our analysis (0 for no recidivism, 1 for recidivism).
3. Remove cases where `c_charge_degree` is "O", which denotes ordinary traffic offenses (less serious offenses that do not result in jail time).

The cleaned dataset has 6172 records and 13 features.
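
The three filters above can be sketched as one function. This is a minimal sketch, assuming the data is already in a pandas DataFrame with the columns listed earlier; `clean_compas` is a hypothetical name.

```python
import pandas as pd


def clean_compas(df):
    """Apply the three cleaning filters and return a new DataFrame."""
    # 1. Screening within 30 days of the arrest. between() is inclusive on
    #    both ends and evaluates to False for missing values, so rows with a
    #    missing days_b_screening_arrest are dropped too.
    within_30_days = df["days_b_screening_arrest"].between(-30, 30)
    # 2. Keep only binary recidivism labels (0 or 1).
    binary_outcome = df["is_recid"] != -1
    # 3. Drop ordinary traffic offenses.
    not_ordinary = df["c_charge_degree"] != "O"
    return df[within_30_days & binary_outcome & not_ordinary].reset_index(drop=True)
```

Checking `clean_compas(df).shape` against the expected record count is a simple sanity test for the filters.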

In [None]:
## WRITE YOUR CODE HERE

## Exploratory Data Analysis

Let's study some basic statistics of the dataset:
* Frequency of different attributes (such as race, age, sex, decile score)
* General descriptive statistics of the dataset
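
One possible way to get the frequencies is a small helper around `value_counts`. This is a sketch under the assumption that the cleaned data is a pandas DataFrame; `frequency_table` is a hypothetical name.

```python
import pandas as pd


def frequency_table(df, column):
    """Counts and percentage share of each value in `column`."""
    counts = df[column].value_counts()
    percent = (counts / counts.sum() * 100).round(1)
    return pd.DataFrame({"count": counts, "percent": percent})


# Typical use in the notebook (not executed here):
# for col in ["race", "age_cat", "sex", "decile_score"]:
#     print(frequency_table(df, col))
# df.describe()  # general descriptive statistics of the numeric columns
```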

In [None]:
## WRITE YOUR CODE HERE

## Bias Analysis

* Study the distribution of the recidivism score `decile_score` for different categories: does the score have the same distribution for different races? For different sexes?
    
* If it is not distributed in the same way, which biases do you identify in the input dataset that can lead to different distributions? 
    * List the ones you think are present and explain why.
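
To compare groups of different sizes, it may help to look at the share of each score within each group rather than raw counts. The sketch below assumes a pandas DataFrame with the columns above; `score_distribution` is a hypothetical name.

```python
import pandas as pd


def score_distribution(df, group_col, score_col="decile_score"):
    """Share of each score within each group (every row sums to 1)."""
    return pd.crosstab(df[group_col], df[score_col], normalize="index")


# Typical use in the notebook (plotting needs matplotlib, not executed here):
# score_distribution(df, "race").T.plot(kind="bar", subplots=True, figsize=(8, 12))
# score_distribution(df, "sex").T.plot(kind="bar")
```

Normalizing by row keeps small groups readable next to large ones. It does not say why the distributions differ, which is the question to answer above.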

In [None]:
## WRITE YOUR CODE HERE

## Replicating ProPublica Analysis

ProPublica treated a *COMPAS score* of 5 or higher as a prediction of recidivism and a score below 5 as a prediction of no recidivism.

This is not a complete analysis, since it relies solely on the decile score and applies a hard threshold, discarding all other information about the individual. But let's reproduce it anyway:

>
> * Compute the thresholded version of the predicted recidivism and call it `predicted_recid`.
> * Compute the accuracy of the COMPAS model.
> * Compute and visualize the confusion matrix for each of the races (i.e. TP, TN, FP, FN)
>     * **Hint:** you can visualize a confusion matrix as a [seaborn heatmap](https://seaborn.pydata.org/generated/seaborn.heatmap.html)
> * Compute and visualize the mean recidivism score, the false positive rate and false negative rate for each of the races
> * What do you conclude?
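
The thresholding and the per-group error rates can be sketched as below. This is a minimal sketch, not ProPublica's code: it assumes `two_year_recid` is the ground-truth label, and `add_prediction`, `rates`, and `rates_by_group` are hypothetical helper names.

```python
import pandas as pd


def add_prediction(df, threshold=5):
    """Return a copy with predicted_recid = 1 when decile_score >= threshold."""
    out = df.copy()
    out["predicted_recid"] = (out["decile_score"] >= threshold).astype(int)
    return out


def rates(df, truth="two_year_recid", pred="predicted_recid"):
    """Confusion counts plus accuracy, FPR and FNR for one set of rows."""
    y, p = df[truth], df[pred]
    tp = int(((y == 1) & (p == 1)).sum())
    tn = int(((y == 0) & (p == 0)).sum())
    fp = int(((y == 0) & (p == 1)).sum())
    fn = int(((y == 1) & (p == 0)).sum())
    return pd.Series({
        "TP": tp, "TN": tn, "FP": fp, "FN": fn,
        "accuracy": (tp + tn) / len(df),
        # FPR = FP / (FP + TN), FNR = FN / (FN + TP); NaN if undefined.
        "FPR": fp / (fp + tn) if fp + tn else float("nan"),
        "FNR": fn / (fn + tp) if fn + tp else float("nan"),
    })


def rates_by_group(df, group_col="race"):
    """One row of confusion counts and rates per value of group_col."""
    return pd.DataFrame({name: rates(g) for name, g in df.groupby(group_col)}).T
```

With `scored = add_prediction(df)`, `rates(scored)` gives the overall accuracy and `rates_by_group(scored)` the per-race table. For the plots, one option is to pass each group's counts as `[[TN, FP], [FN, TP]]` to `seaborn.heatmap` with `annot=True`, and to get the mean score with `scored.groupby("race")["decile_score"].mean()`.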

In [None]:
## WRITE YOUR CODE HERE

## Debiasing

We are now going to implement a function for **Statistical Parity** and apply it to the results of the COMPAS model (i.e. to the predicted recidivism `predicted_recid`). For this exercise, we focus only on the *African-American* and *Caucasian* groups, as they have the most samples in our data (however, you can repeat the steps for other demographics too!).
>
> * Select African-American and Caucasian samples from the data.
> * Compute the probability of predicting recidivism (`predicted_recid==1`) for the two populations, i.e. $P_{AA}$ for African-American and $P_{CA}$ for Caucasian. What do you observe?
> * To debias the output you need to:
>    * Compute the threshold $th = 1 - \frac{P_{CA}}{P_{AA}}$;
>    * Randomly flip positive predictions for the African-American group to negative (1s to 0s), i.e. for each positive prediction pick a random number $n\in\left[0,1\right)$ from a uniform distribution and flip the prediction if $n < th$.
> * Recompute the probabilities $P_{CA}$ and $P_{AA}$ (only now they are corrected). What do you observe?
> * Plot the confusion matrices with the corrected values. What do you observe?
> * Explain what could be the issue with this method.
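
The flipping procedure can be sketched as follows. This sketch makes two assumptions that the steps above leave implicit: that $P_{AA} > P_{CA}$ (so $th$ is positive), and that only African-American positive predictions are flipped, since that is what brings $P_{AA}$ down towards $P_{CA}$. `positive_rate`, `statistical_parity_flip`, and the output column `predicted_recid_fair` are hypothetical names.

```python
import numpy as np
import pandas as pd


def positive_rate(df, group, group_col="race", pred="predicted_recid"):
    """P(pred == 1) within one group."""
    return df.loc[df[group_col] == group, pred].mean()


def statistical_parity_flip(df, seed=0, group_col="race", pred="predicted_recid"):
    """Flip African-American positives at random with probability th."""
    # Keep only the two groups compared in this exercise.
    out = df[df[group_col].isin(["African-American", "Caucasian"])].copy()
    p_aa = positive_rate(out, "African-American", group_col, pred)
    p_ca = positive_rate(out, "Caucasian", group_col, pred)
    th = 1 - p_ca / p_aa  # assumes p_aa > p_ca, so 0 < th < 1

    rng = np.random.default_rng(seed)
    n = rng.random(len(out))  # one uniform draw in [0, 1) per row
    is_aa = (out[group_col] == "African-American").to_numpy()
    is_positive = (out[pred] == 1).to_numpy()
    flip = is_aa & is_positive & (n < th)

    # Flipped rows become 0; all other predictions are left unchanged.
    out[pred + "_fair"] = np.where(flip, 0, out[pred])
    return out, th
```

Each African-American positive survives with probability $1 - th = P_{CA}/P_{AA}$, so the corrected rate is $P_{AA} \cdot P_{CA}/P_{AA} = P_{CA}$ in expectation. The result varies with the random seed.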

In [None]:
## WRITE YOUR CODE HERE

## Optional (if you are curious and finish early)

To encourage transparent model use and reporting, Mitchell et al. designed a framework called Model Cards (see paper in resources). Model Cards are used to evaluate a model from different angles and take into account its purpose and ethical considerations.

The paper shows a summary of a model card and provides details on how to use one. Try to go through the paper and rethink the COMPAS model in terms of the aspects listed in the model card.

If you do not care about COMPAS, perhaps you could think about using model cards (or part of them) in your group project or in future projects!

## References
- https://github.com/propublica/compas-analysis/
- https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm