<font color="#de3023"><h1><b>REMINDER MAKE A COPY OF THIS NOTEBOOK, DO NOT EDIT</b></h1></font>

# Part 2: Fairness
*by Trenton Chang and Lindsay Sanneman*

In the second of this series of notebooks, we'll be exploring various definitions of fairness in AI. We'll continue exploring the COMPAS dataset. In the last notebook, we created a machine learning algorithm that seemed to impact certain groups disproportionately. Can we analyze the fairness of this model more formally, and use those insights to create a better model?

Again, we'll assume basic familiarity with Python, basic machine learning concepts, as well as (very) basic knowledge of Pandas (a useful DataFrame processing/data analysis tool). As you go through this notebook, please feel free to add cells for your exploration; however, do keep all the original cells.

This series will cover these general topics:

1. Building a competitive model on the COMPAS dataset. 
2. Analyzing bias and fairness on the COMPAS dataset.
3. Exploring and justifying fair models on the COMPAS dataset; discussing fairness in machine learning beyond COMPAS.

## Learning Goals

* Learn about probabilistic/mathematical definitions of fairness, and be comfortable evaluating models on these criteria.
* Gain skills in commenting on model bias from a statistical and a qualitative standpoint.
* Learn topics in model explainability, and gain skills in analyzing models retrospectively.

## Setup

Let's download the COMPAS dataset. 

In [None]:
import pandas as pd
import numpy as np
from matplotlib import pyplot
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

Run the below cell to download the dataset, in case it's been a while since you've worked on Notebook 1.

In [None]:
!wget 'https://storage.googleapis.com/inspirit-ai-data-bucket-1/Data/AI%20Scholars/Sessions%206%20-%2010%20(Projects)/Projects%20-%20AI%20and%20Ethics%20-%20Criminal%20Justice/compas-scores-two-years.csv'

--2021-11-07 21:03:44--  https://storage.googleapis.com/inspirit-ai-data-bucket-1/Data/AI%20Scholars/Sessions%206%20-%2010%20(Projects)/Projects%20-%20AI%20and%20Ethics%20-%20Criminal%20Justice/compas-scores-two-years.csv
Resolving storage.googleapis.com (storage.googleapis.com)... 173.194.215.128, 108.177.12.128, 172.217.193.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|173.194.215.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2546487 (2.4M) [text/csv]
Saving to: ‘compas-scores-two-years.csv’


2021-11-07 21:03:44 (52.5 MB/s) - ‘compas-scores-two-years.csv’ saved [2546487/2546487]



To make sure the download was successful, you should be able to see a file along the lines of `compas-scores-two-years.csv` in the File Explorer (folder-symbol on the left sidebar of the Colab interface).

Let's load the data to make sure that everything is in order.


In [None]:
data = pd.read_csv("compas-scores-two-years.csv", header=0)
data.head(n=20)

Unnamed: 0,id,name,first,last,compas_screening_date,sex,dob,age,age_cat,race,juv_fel_count,decile_score,juv_misd_count,juv_other_count,priors_count,days_b_screening_arrest,c_jail_in,c_jail_out,c_case_number,c_offense_date,c_arrest_date,c_days_from_compas,c_charge_degree,c_charge_desc,is_recid,r_case_number,r_charge_degree,r_days_from_arrest,r_offense_date,r_charge_desc,r_jail_in,r_jail_out,violent_recid,is_violent_recid,vr_case_number,vr_charge_degree,vr_offense_date,vr_charge_desc,type_of_assessment,decile_score.1,score_text,screening_date,v_type_of_assessment,v_decile_score,v_score_text,v_screening_date,in_custody,out_custody,priors_count.1,start,end,event,two_year_recid
0,1,miguel hernandez,miguel,hernandez,2013-08-14,Male,1947-04-18,69,Greater than 45,Other,0,1,0,0,0,-1.0,2013-08-13 06:03:42,2013-08-14 05:41:20,13011352CF10A,2013-08-13,,1.0,F,Aggravated Assault w/Firearm,0,,,,,,,,,0,,,,,Risk of Recidivism,1,Low,2013-08-14,Risk of Violence,1,Low,2013-08-14,2014-07-07,2014-07-14,0,0,327,0,0
1,3,kevon dixon,kevon,dixon,2013-01-27,Male,1982-01-22,34,25 - 45,African-American,0,3,0,0,0,-1.0,2013-01-26 03:45:27,2013-02-05 05:36:53,13001275CF10A,2013-01-26,,1.0,F,Felony Battery w/Prior Convict,1,13009779CF10A,(F3),,2013-07-05,Felony Battery (Dom Strang),,,,1,13009779CF10A,(F3),2013-07-05,Felony Battery (Dom Strang),Risk of Recidivism,3,Low,2013-01-27,Risk of Violence,1,Low,2013-01-27,2013-01-26,2013-02-05,0,9,159,1,1
2,4,ed philo,ed,philo,2013-04-14,Male,1991-05-14,24,Less than 25,African-American,0,4,0,1,4,-1.0,2013-04-13 04:58:34,2013-04-14 07:02:04,13005330CF10A,2013-04-13,,1.0,F,Possession of Cocaine,1,13011511MM10A,(M1),0.0,2013-06-16,Driving Under The Influence,2013-06-16,2013-06-16,,0,,,,,Risk of Recidivism,4,Low,2013-04-14,Risk of Violence,3,Low,2013-04-14,2013-06-16,2013-06-16,4,0,63,0,1
3,5,marcu brown,marcu,brown,2013-01-13,Male,1993-01-21,23,Less than 25,African-American,0,8,1,0,1,,,,13000570CF10A,2013-01-12,,1.0,F,Possession of Cannabis,0,,,,,,,,,0,,,,,Risk of Recidivism,8,High,2013-01-13,Risk of Violence,6,Medium,2013-01-13,,,1,0,1174,0,0
4,6,bouthy pierrelouis,bouthy,pierrelouis,2013-03-26,Male,1973-01-22,43,25 - 45,Other,0,1,0,0,2,,,,12014130CF10A,,2013-01-09,76.0,F,arrest case no charge,0,,,,,,,,,0,,,,,Risk of Recidivism,1,Low,2013-03-26,Risk of Violence,1,Low,2013-03-26,,,2,0,1102,0,0
5,7,marsha miles,marsha,miles,2013-11-30,Male,1971-08-22,44,25 - 45,Other,0,1,0,0,0,0.0,2013-11-30 04:50:18,2013-12-01 12:28:56,13022355MM10A,2013-11-30,,0.0,M,Battery,0,,,,,,,,,0,,,,,Risk of Recidivism,1,Low,2013-11-30,Risk of Violence,1,Low,2013-11-30,2013-11-30,2013-12-01,0,1,853,0,0
6,8,edward riddle,edward,riddle,2014-02-19,Male,1974-07-23,41,25 - 45,Caucasian,0,6,0,0,14,-1.0,2014-02-18 05:08:24,2014-02-24 12:18:30,14002304CF10A,2014-02-18,,1.0,F,Possession Burglary Tools,1,14004485CF10A,(F2),0.0,2014-03-31,Poss of Firearm by Convic Felo,2014-03-31,2014-04-18,,0,,,,,Risk of Recidivism,6,Medium,2014-02-19,Risk of Violence,2,Low,2014-02-19,2014-03-31,2014-04-18,14,5,40,1,1
7,9,steven stewart,steven,stewart,2013-08-30,Male,1973-02-25,43,25 - 45,Other,0,4,0,0,3,-1.0,2013-08-29 08:55:23,2013-08-30 08:42:13,13012216CF10A,,2013-08-29,1.0,F,arrest case no charge,0,,,,,,,,,0,,,,,Risk of Recidivism,4,Low,2013-08-30,Risk of Violence,3,Low,2013-08-30,2014-05-22,2014-06-03,3,0,265,0,0
8,10,elizabeth thieme,elizabeth,thieme,2014-03-16,Female,1976-06-03,39,25 - 45,Caucasian,0,1,0,0,0,-1.0,2014-03-15 05:35:34,2014-03-18 04:28:46,14004524MM10A,2014-03-15,,1.0,M,Battery,0,,,,,,,,,0,,,,,Risk of Recidivism,1,Low,2014-03-16,Risk of Violence,1,Low,2014-03-16,2014-03-15,2014-03-18,0,2,747,0,0
9,13,bo bradac,bo,bradac,2013-11-04,Male,1994-06-10,21,Less than 25,Caucasian,0,3,0,0,1,428.0,2015-01-06 03:55:34,2015-01-07 03:38:44,13000017CF10A,2012-12-31,,308.0,F,Insurance Fraud,1,15002891MM10A,(M1),0.0,2015-01-06,Battery,2015-01-06,2015-01-07,,1,15000258CF10A,(F2),2015-01-06,Aggrav Battery w/Deadly Weapon,Risk of Recidivism,3,Low,2013-11-04,Risk of Violence,5,Medium,2013-11-04,2015-01-06,2015-01-07,1,0,428,1,1


Again, we filter out columns that are irrelevant or redundant for our analysis (e.g. name, case number, date of arrest), and make the column names more intuitive.

Column descriptions:

* `sex`: Male or female.
* `age`: Age.
* `age_category`: Age group; i.e. under 25, 25 to 45, or over 45.
* `race`: Categorical variable; takes on values `African-American`,  `Caucasian`, `Asian`, `Native American`, `Hispanic`, and `Other`.
* `juvenile_felony_count`: Number of times defendant has previously been convicted of a juvenile felony.
* `juvenile_misdemeanor_count`: Number of times defendant has previously been convicted of a juvenile misdemeanor.
* `juvenile_other_count`: Number of times defendant has been convicted of another juvenile charge.
* `prior_convictions`: Number of prior convictions for a defendant.
* `current_charge`: Felony, misdemeanor, or other (`F`, `M`, or `O`).
* `charge_description`: The charge for which they were arrested, and took a risk assessment.
* `recidivated_last_two_years`: Recidivated within two years of completing the assessment. This variable is the focus of our task, and what we'll be predicting.

In [None]:
df = data.drop(labels=['id', 'name', 'first', 'last', 'compas_screening_date', 'dob', 'days_b_screening_arrest', 
                         'c_jail_in', 'c_jail_out', 'c_case_number', 'c_offense_date', 'c_arrest_date', 'c_days_from_compas',
                         'r_case_number', 'r_charge_degree', 'r_days_from_arrest', 'r_offense_date', 'r_charge_desc', 
                         'r_jail_in', 'r_jail_out', 'vr_case_number', 'vr_charge_degree', 'vr_offense_date', 'decile_score.1',
                         'violent_recid', 'vr_charge_desc', 'in_custody', 'out_custody', 'priors_count.1', 'start', 'end', 
                         'v_screening_date', 'event', 'type_of_assessment', 'v_type_of_assessment', 'screening_date',
                         'score_text', 'v_score_text', 'v_decile_score', 'decile_score', 'is_recid', 'is_violent_recid'], axis=1)
df.columns = ['sex', 'age', 'age_category', 'race', 'juvenile_felony_count', 'juvenile_misdemeanor_count', 'juvenile_other_count', 
              'prior_convictions', 'current_charge', 'charge_description', 'recidivated_last_two_years']
df.head()

Unnamed: 0,sex,age,age_category,race,juvenile_felony_count,juvenile_misdemeanor_count,juvenile_other_count,prior_convictions,current_charge,charge_description,recidivated_last_two_years
0,Male,69,Greater than 45,Other,0,0,0,0,F,Aggravated Assault w/Firearm,0
1,Male,34,25 - 45,African-American,0,0,0,0,F,Felony Battery w/Prior Convict,1
2,Male,24,Less than 25,African-American,0,0,1,4,F,Possession of Cocaine,1
3,Male,23,Less than 25,African-American,0,1,0,1,F,Possession of Cannabis,0
4,Male,43,25 - 45,Other,0,0,0,2,F,arrest case no charge,0


We also one-hot encode the textual data.

In [None]:
value_counts = df['charge_description'].value_counts() 
df = df[df['charge_description'].isin(value_counts[value_counts >= 70].index)].reset_index(drop=True) # drop rare charges
for colname in df.select_dtypes(include='object').columns: # repeatedly use get_dummies to one-hot encode categorical columns
  one_hot = pd.get_dummies(df[colname])
  df = df.drop(colname, axis=1)
  df = df.join(one_hot)
df

Unnamed: 0,age,juvenile_felony_count,juvenile_misdemeanor_count,juvenile_other_count,prior_convictions,recidivated_last_two_years,Female,Male,25 - 45,Greater than 45,Less than 25,African-American,Asian,Caucasian,Hispanic,Native American,Other,F,M,Battery,Burglary Conveyance Unoccup,Burglary Unoccupied Dwelling,DUI Property Damage/Injury,Driving Under The Influence,Driving While License Revoked,Felony Battery (Dom Strang),Felony Driving While Lic Suspd,Grand Theft (Motor Vehicle),Grand Theft in the 3rd Degree,Pos Cannabis W/Intent Sel/Del,"Poss3,4 Methylenedioxymethcath",Possess Cannabis/20 Grams Or Less,Possession of Cannabis,Possession of Cocaine,arrest case no charge
0,24,0,0,1,4,1,0,1,0,0,1,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
1,23,0,1,0,1,0,0,1,0,0,1,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
2,43,0,0,0,2,0,0,1,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
3,44,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,43,0,0,0,3,0,0,1,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4392,30,0,0,0,2,0,0,1,1,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4393,23,0,2,1,5,1,0,1,0,0,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
4394,21,0,0,0,0,1,0,1,0,0,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
4395,30,0,0,0,0,1,0,1,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0
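To see what the one-hot encoding loop does to a single column, here is a minimal sketch on a toy DataFrame. The column name `charge` and its values are made up for illustration, and `dtype=int` is passed only so the indicators print as 0/1 (newer pandas versions may otherwise show `True`/`False`).

```python
import pandas as pd

# Toy data; the column and category names are hypothetical.
toy = pd.DataFrame({
    "age": [24, 43, 30],
    "charge": ["F", "M", "F"],
})

# One new 0/1 indicator column per distinct category value.
one_hot = pd.get_dummies(toy["charge"], dtype=int)

# Swap the text column for its indicator columns, as in the loop above.
encoded = toy.drop("charge", axis=1).join(one_hot)
print(encoded)
```

Each row has exactly one `1` among the indicator columns created from the same original column.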


For your convenience, I've created per-group splits of the test data that will be useful when you analyze model fairness across two groups: Caucasian and African-American defendants.

In [None]:
y_column = 'recidivated_last_two_years'
X_all, y_all = df.drop(y_column, axis=1), df[y_column]
X_train, X_test, y_train, y_test = train_test_split(X_all, y_all, test_size=0.3)

X_caucasian = X_test[X_test['Caucasian'] == 1]
y_caucasian = y_test[X_test['Caucasian'] == 1]
X_african_american = X_test[X_test['African-American'] == 1]
y_african_american = y_test[X_test['African-American'] == 1]

# Question 1: Fairness

What is fairness? How do we define it formally? How can an AI system be understood as fair?

To explore this topic, we will draw on [Verma and Rubin 2018](https://fairware.cs.umass.edu/papers/Verma.pdf), a meta-analysis of 20 mathematical definitions of fairness previously proposed by researchers in the field. Familiarity with various statistical metrics in confusion matrices will be helpful here (false positive rate, false negative rate, etc.). Feel free to consult the [Wikipedia page](https://en.wikipedia.org/wiki/Confusion_matrix) for a refresher.

Before you begin, answer this preliminary question:

**Question 1.0.** What do you think fairness means, personally? Note the columns in the COMPAS dataset. Do you think it is "fair" to judge defendants based on these attributes? 

In [None]:
# Question 1.0 (any # of lines!)


In [None]:
#@title Instructor Solution
"""
  Actual answers may vary. Sample answer should not be taken to reflect the author's actual beliefs.

  To me, fairness in the justice system means equal standards applied to everyone, independent
  of any personal or identifying characteristics. It means speedy and thorough due process for
  all people and presumption of innocence.
"""

Let's retrain our model from the previous notebook. 

In [None]:
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print("Training accuracy:", model.score(X_train, y_train))
print("Testing accuracy:", model.score(X_test, y_test))

Training accuracy: 0.6977575560610985
Testing accuracy: 0.6742424242424242


Now, we'll analyze the fairness of this model according to various metrics. For the purposes of this notebook, we'll use the following notation:

* $X$: all input features
* $R$: "protected" features for which we want the model to be fair; i.e. race, gender, age, etc. Note these features are a subset of $X$.
* $y$: the ground-truth label.
* $\hat{y}$: the model's predicted label.

We will specifically analyze racial bias<sup>1</sup> in this model; hence $R$ is either `African-American` or `Caucasian`, which we abbreviate to $a$ and $c$, respectively. Consonant with U.S. legal terminology, we use the phrase "protected attribute" to generally discuss characteristics upon which we do not want to base our decision, e.g. gender, age, or in this case, race.

As a refresher, recall as well the following probability notation:

* $P(A)$: the probability of event $A$, $0 \leq P(A)\leq 1$
* $P(A \cap B)$ or $P(A \land B)$: the probability of events $A$ and $B$ co-occurring
* $P(A \cup B)$ or $P(A \lor B)$: the probability of events $A$ or $B$ occurring (either one, or both)
* $P(A \mid B)$: the probability of event $A$ given event $B$; $P(A \cap B) = P(A \mid B)P(B)$.
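As a quick illustration of this notation, the sketch below estimates each quantity from two made-up arrays of 0/1 indicators. The arrays `A` and `B` are hypothetical and are not drawn from the COMPAS data.

```python
import numpy as np

# Hypothetical indicators for 8 trials: 1 means the event occurred.
A = np.array([1, 0, 1, 1, 0, 0, 1, 0])
B = np.array([1, 1, 1, 0, 0, 1, 1, 0])

p_b = B.mean()                 # P(B)
p_a_and_b = (A & B).mean()     # P(A and B)
p_a_or_b = (A | B).mean()      # P(A or B)
p_a_given_b = p_a_and_b / p_b  # P(A | B) = P(A and B) / P(B)

# Equivalent view: keep only the trials where B occurred, then average A.
p_a_given_b_direct = A[B == 1].mean()

print(p_b, p_a_and_b, p_a_or_b, p_a_given_b, p_a_given_b_direct)
```

The last two values agree, which is the identity $P(A \cap B) = P(A \mid B)P(B)$ rearranged.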

If you want to brush up some more, feel free to *skim* Sections 2 and 3 of the "Probability for Computer Scientists" [course reader](https://web.stanford.edu/class/archive/cs/cs109/cs109.1198/courseReader.html), the reader for the introductory probability course in the computer science core curriculum at Stanford. Those two sections are *much more than enough* for this notebook and will cover everything you need to know.

Now we're ready to analyze model fairness formally!

<sup>1</sup> Here we're using *bias* in the societal sense; i.e. in terms of inter-group unfairness/prejudice, which is completely different from the statistical sense of bias. If this distinction means nothing to you, don't worry; if it does, hopefully this clarifies what we mean.

## Definition One: Group Fairness

The definition of **group fairness** is very intuitive. A classifier satisfies **group fairness** if subjects in every group (every value in the support of $R$) have equal probability of being assigned to the positive class. That is, each group is equally likely to be predicted as the positive class in a binary classification problem. **Group fairness** is also called **statistical parity**, **equal acceptance rate**, and **benchmarking.**

**Question 1.1.** Fill in the blank. Feel free to type `y_hat` to represent $\hat{y}$.

The mathematical definition of group fairness is: $$P(\_\_=1 \mid \_\_=\_\_) = P(\_\_=1\mid \_\_=\_\_)$$

**Question 1.2.** Is the above model fair on the testing data? Use a threshold of 25% to determine fairness (i.e. the probabilities must be within 25% of one another), and briefly comment on your findings. Report the ratio between the two probabilities (larger to smaller) in your decision on an otherwise blank line at the bottom of the cell.

**Question 1.3.** What are the limitations of this definition? Why might this definition be appropriate for the justice system, in your opinion? Why might it be inappropriate?

In [None]:
# X: all input features
# R: "protected" features
# y: ground truth label
# y_hat: the model's predicted label
# a = african american
# c = caucasian
# Question 1.1 (5 terms)
# P(y_hat=1|r=a)=P(y_hat=1|r=c)
# Question 1.2 (~4-8 lines)

# The ratio is: (~1 line/a float)

# Question 1.3


In [None]:
#@title Instructor Solution
# Question 1.1 (5 terms)
"""
  Answer
  
  y_hat, R, a, y_hat, R, c
"""
# Question 1.2 (~4-8 lines)
y_pred_caucasian = model.predict(X_caucasian)
print("Caucasian acceptance rate:", np.count_nonzero(y_pred_caucasian) / len(y_pred_caucasian))
y_pred_afam = model.predict(X_african_american)
print("African-American acceptance rate:", np.count_nonzero(y_pred_afam) / len(y_pred_afam))

"""
  Sample answer

  The model is unfair under the 25% threshold, because African-American defendants are over twice as
  likely (i.e. over 100% more likely) to be assigned to the positive class by this model. This is well
  over the requisite threshold for fairness.
"""
# The ratio is: (~1 line/a float)
2.069
# Question 1.3
"""
  Sample answer. Should not be taken to reflect the author's personal beliefs.

  This definition is a good start, but considers all positive predictions equally without paying
  attention to false positives. On one hand, this model is in line with my belief that justice should
  be blind, and to that end, measuring equality of selection seems reasonable. On the other hand,
  the seriousness of incarceration makes false positives weigh more heavily for me than 
  false negatives, which makes me think that we should somehow incorporate that value judgment
  into our definition.
"""

Hints: 

* 1.1. Reword the definition as "a classifier satisfies group fairness when  *given* a subject belongs to a particular group (i.e. $R=a, R=c$), they have the same chance of being assigned to class 1."
* 1.2. The function `np.count_nonzero` might be useful here. 
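To make the ratio and the 25% threshold concrete, here is a minimal sketch on made-up predictions. The helper `positive_rate` and both arrays are hypothetical, and treating "within 25%" as "ratio of at most 1.25" is one reasonable reading rather than the only one.

```python
import numpy as np

def positive_rate(y_hat):
    """Fraction of predictions equal to 1, an estimate of P(y_hat = 1)."""
    return np.count_nonzero(y_hat) / len(y_hat)

# Made-up predictions for two groups (not COMPAS results).
y_hat_group_a = np.array([1, 1, 0, 1, 0, 1, 0, 0, 1, 1])  # 6 of 10 positive
y_hat_group_c = np.array([1, 0, 0, 1, 0, 0, 1, 0, 1, 0])  # 4 of 10 positive

rate_a = positive_rate(y_hat_group_a)
rate_c = positive_rate(y_hat_group_c)

# Ratio of the larger rate to the smaller one.
ratio = max(rate_a, rate_c) / min(rate_a, rate_c)

# Assumed reading of the threshold: fair if the ratio does not exceed 1.25.
is_fair = ratio <= 1.25
print(rate_a, rate_c, ratio, is_fair)
```

Here the rates are 0.6 and 0.4, so the ratio is 1.5 and the toy predictions fail the 25% threshold.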

## Definition 2: Calibration

Also known as **test-fairness** or **matching conditional frequencies**, **calibration** requires that, given the model's prediction, the probability that the ground-truth label is the positive class is the same for every value of the protected attribute.

**Question 1.4.** The mathematical definition of calibration is $$P(\_\_ \mid \_\_=k, \_\_) = P(\_\_ \mid \_\_=k, \_\_)$$ for any class $k$.

**Question 1.5.** Under the same threshold as Question 1.2, with all else the same, is the model fair with respect to each class? Briefly comment on your findings and report the ratio of probabilities on an otherwise blank line in the cell.

**Question 1.6.** What are the limitations of this definition? Is it the same or different from the previous definition? Do you have any hypotheses for why this might be the case?

In [None]:
# Question 1.4 (4 events, 2 random variables)

# Question 1.5 (~6 - 12 lines)

# The ratio is: (~1 line/a float)


In [None]:
#@title Instructor Solution
# Question 1.4 (4 events, 2 random variables)
"""
  Answer

  y=1, y_hat, R=a, y=1, y_hat, R=c
"""
# Question 1.5 (~6 - 12 lines)
y_pred_caucasian = model.predict(X_caucasian)
y_pred_afam = model.predict(X_african_american)
print("Class 0 calibration, Caucasian:", np.count_nonzero(1 - y_pred_caucasian[(y_caucasian == 1)]) / np.count_nonzero(1 - y_pred_caucasian))
print("Class 1 calibration, Caucasian:", np.count_nonzero(y_pred_caucasian[y_caucasian == 1]) / np.count_nonzero(y_pred_caucasian))
print("Class 0 calibration, African-American:", np.count_nonzero(1 - y_pred_afam[y_african_american == 1]) / np.count_nonzero(1 - y_pred_afam))
print("Class 1 calibration, African-American:", np.count_nonzero(y_pred_afam[y_african_american == 1]) / np.count_nonzero(y_pred_afam))

"""
  Sample answer.

  The model is fair under the calibration definition, since conditioned on the model's prediction and 
  the protected attribute, African-American defendants in the dataset are only approximately 1.1
  times as likely to actually recidivate as Caucasian defendants, which, under the 1.25x (25%)
  threshold, is fair. 
"""
# The ratio is: (~1 line/a float)
1.12

Hints:

* 1.4. What is "given," i.e., what are we conditioning on?
* 1.5. Recall the definition of conditional probability: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$. You can use `np.count_nonzero` to count all the zeros by doing `np.count_nonzero(1 - a)` where `a` is the array of interest. This works since the array only contains 0s and 1s.
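The conditional probability in hint 1.5 can be sketched with boolean masks. The helper `calibration` and the arrays below are made up for illustration and cover a single group; in the notebook you would apply the same idea separately to each group's labels and predictions.

```python
import numpy as np

def calibration(y_true, y_hat, k):
    """Estimate P(y = 1 | y_hat = k) within one group."""
    predicted_k = (y_hat == k)  # subjects the model assigned to class k
    # Assumes at least one prediction equals k; otherwise this divides by zero.
    return np.count_nonzero(y_true[predicted_k]) / np.count_nonzero(predicted_k)

# Made-up labels and predictions for one group (not COMPAS results).
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_hat = np.array([1, 1, 1, 0, 0, 0, 1, 0])

cal_class_1 = calibration(y_true, y_hat, k=1)
cal_class_0 = calibration(y_true, y_hat, k=0)
print(cal_class_1, cal_class_0)
```

Comparing `cal_class_1` (or `cal_class_0`) between the two groups gives the ratio that Question 1.5 asks for.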

If you've done everything correctly up to this point, you will notice that one definition yields a fair classifier, yet the other does not. This is a surprising result -- and does nothing to ease the ethical quandary we're facing.

## Definition 3: Your Definition!

Here's where you come in. You have two choices:

* Select a fairness definition we haven't analyzed yet [from here](https://fairware.cs.umass.edu/papers/Verma.pdf),
* OR create your own (mathematical) definition that you want to test.

**Question 1.7.** Write down the probabilistic formulation of your fairness rule (whether selected from the paper or your own). Include a brief plain English description of what it is, and why you chose it.

**Question 1.8.** Test the fairness of the model on your chosen definition. Retain the 25% threshold, or alternately, choose a new threshold AND justify your choice of new threshold. Briefly comment on your findings.

**Question 1.9.**
Discuss what your new definition or threshold accomplishes that calibration and group fairness fail to do, and the limitations of your definition. 

**Question 1.10.**
Out of the three fairness criteria we've explored in this notebook, which one would you choose to evaluate a model, and why?


In [None]:
# Question 1.7.

# Question 1.8.

# Question 1.9. 

# Question 1.10.


It should be apparent by this point that these disparate definitions of fairness capture different nuances of the model predictions -- they're measuring entirely different things, and yield entirely different results. Thus, the point of this exercise is to get you thinking not only about the mathematical evaluations we can use to answer this ethical problem, but also the *various value systems and assumptions* encoded within each definition of fairness.

So it was probably not too easy to come up with a mathematically AND sociologically sound definition of fairness. In fact, it can be [mathematically proven](https://arxiv.org/pdf/1609.05807v1.pdf) that the two fairness criteria we've explored above are not simultaneously satisfiable except for two narrow cases: a classifier with perfect accuracy, or equal base rates. There is an inherent tradeoff between the two. That's pretty depressing -- is there anything we can do?

Clearly, we're quite far from perfect accuracy. But what if we thought about the base rates? For that, we'll have to take a deeper look at the data itself.
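Here, a group's base rate means the fraction of that group whose ground-truth label is positive, i.e. $P(y=1 \mid R=r)$. A minimal sketch of computing it per group, using a toy DataFrame whose column names (`group`, `label`) and values are made up for illustration:

```python
import pandas as pd

# Hypothetical data: four subjects in each of two groups.
toy = pd.DataFrame({
    "group": ["a", "a", "a", "a", "c", "c", "c", "c"],
    "label": [1, 1, 1, 0, 1, 0, 0, 0],
})

# Mean of a 0/1 label within each group estimates P(y = 1 | R = group).
base_rates = toy.groupby("group")["label"].mean()
print(base_rates)
```

In this toy data the base rates are 0.75 and 0.25, so they are far from equal.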

# Question 2: Explainability

Let's analyze this model from a different angle. Can we actually *explain*, from a human perspective, what features the model is looking at?

Another way we might think about whether a model is fair or not is by understanding how our classification model makes decisions. For example, if we can understand how our model maps inputs to its outputted classifications, we might be able to better assess if it is making decisions in a fair manner. 


Recently, the field of ***explainable AI (XAI)*** has explored how to make the complex decision-making models of machine learning algorithms more transparent and understandable to the humans who are interacting with them. Often, as is likely the case with our recidivism algorithm, the people using the output of the algorithm for decision-making are not machine learning experts. XAI aims to help people understand why the algorithm came to the conclusion it did (in this case, why it predicted re-incarceration or not).

<!-- To Do:
*   Idea of "black box" versus "white box"
*   GDPR - Right to an explanation
*   Explainable AI field
*   Interpretability vs. Accuracy tradeoff -->

We will start by putting on different hats and imagining the perspectives of the different stakeholders who interact with our machine learning algorithm. What information do they need, and what information should we as the coders aim to provide?

**Question 2.1** Imagine you are a judge who needs to use the algorithm's prediction to decide on a sentence for the defendant in a court case. What might you want to know about how the algorithm made its prediction if you want to be as fair as possible while keeping the community safe?

**Question 2.2** Now imagine that you're the defendant, and the decision about your sentence is being influenced by the output of the algorithm. What information would you want the judge to know about how the algorithm made its prediction?

**Question 2.3** Finally, imagine you're the coder who wrote the algorithm that predicts recidivism (as you did in the last notebook!). How would you describe to the judge how your logistic regression algorithm works? 






---




In [None]:
# Question 2.1.

# Question 2.2.

# Question 2.3.


In [None]:
#@title Instructor Solution
# Question 2.1.
"""
  Sample answer

  As a judge, I want to be sure that the algorithm makes the best prediction possible. I don't want people to go to
  jail who should not go, and I also don't want to let people who might pose a risk to the community free. I would 
  want to know how well the algorithm has done at predicting recidivism in the past and what kinds of errors it made
  most often when it made a mistake. 
"""
# Question 2.2.
"""
  Sample answer

  As a defendant, I want the judge to know which factors influenced the decision the algorithm made. For example, if
  it was heavily influenced by my race or sex, I would want the judge to know and consider those potential biases. I
  would also want the judge to realize that the algorithm is making predictions based on distributional information, 
  and I am an individual case and might fall anywhere in the distribution. 

"""
# Question 2.3.
"""
  Sample answer

  The algorithm considers different input instances and gives an output that predicts whether a person is likely to be reincarcerated. 
  The algorithm has trained on a number of previous real cases, and it uses what it has learned to make decisions about new cases.
  The algorithm considers a number of features of the inputs in order to make its predictions. Here, it considers factors such as the type of crime, a person's age, sex, and race, etc.
"""

Was it easy to describe your algorithm to the judge? I found it to be a little tricky. How might we explain what the algorithm was doing in a different way?

<!-- TODO: Add discussion of black box vs. white box. -->
You may have noticed that it's easier to understand and visualize some of the algorithms you've learned about so far than others. 

***White box*** models are models that have inherently understandable input-output relationships and whose inner workings are transparent. For example, think of a decision tree, which can be represented as a set of decision rules. 

On the other hand, ***black box*** models are models whose inner workings are not as clear and whose input-output relationships may be highly non-linear. 

***Question 2.4*** Think of all the algorithms you've learned about during our time together. Which ones do you think would be considered "black box", and which ones would be considered "white box"?

In [None]:
# Question 2.4.


In [None]:
#@title Instructor Solution
# Question 2.4.
"""
  Sample answer

  Examples of black box: random forests, neural networks

  Examples of white box: linear regression, logistic regression, SVMs, decision trees
"""

Researchers are working on techniques to explain both "black box" and "white box" algorithms to people in a clearer way. So far, we've worked with a number of both "white box" and "black box" algorithms. Here we're going to explore how readily explainable, or interpretable, these algorithms are. 

For "white box" algorithms, we can apply some basic explainability techniques to understand how the algorithm makes decisions. 

<!-- Luckily, in this project, so far we've used logistic regression, which is typically considered to be a "white box" technique, so we can use some basic explainability techniques to understand how our algorithm is making decisions. -->

<!-- Maybe discuss linear versus non-linear models. Check what they've learned in class. -->

<!-- TODO: Talk about feature importance. -->

One simple way of thinking about explainability is called ***feature importance***. Feature importance techniques describe which of the features the algorithm learns from have the most influence on its decisions. 

Now we're going to look at which input features are most important in predicting recidivism for different models!

## Model 1: SVM

First, we will consider linear SVMs. For linear models, such as linear SVMs, we can simply consider the coefficients that are learned during training to understand which features have the most influence on the output. As a reminder, a linear SVM model learns the line (in more than two dimensions, the hyperplane) that best splits up the data into separate classes. The line is defined through the linear function:
$$y = w^{T}x + b$$
where $w$ holds the coefficients associated with each feature in $x$ and $b$ is an offset; the line itself is where $y = 0$. If you haven't seen this math before, don't worry! For now, we can assume that the larger the absolute value of a coefficient is, the more influence the associated feature has on the result.

If you want to read more about the math behind SVMs, you can find more information [here](https://towardsdatascience.com/demystifying-maths-of-svm-13ccfe00091e).
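To connect the formula to code, the sketch below fits a linear SVM on a tiny made-up dataset and recomputes $w^{T}x + b$ by hand. The data and variable names are assumptions for illustration; `coef_`, `intercept_`, and `decision_function` are the scikit-learn attributes and method of a fitted `SVC` with a linear kernel.

```python
import numpy as np
from sklearn.svm import SVC

# Tiny, linearly separable toy dataset with two made-up features.
X = np.array([[0.0, 0.0], [1.0, 0.5], [0.5, 1.0],
              [3.0, 3.0], [4.0, 3.5], [3.5, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear").fit(X, y)

w = clf.coef_[0]       # one learned coefficient per feature
b = clf.intercept_[0]  # the offset term

# Recompute w^T x + b for every row and compare with scikit-learn's scores.
manual_scores = X @ w + b
scores_match = np.allclose(manual_scores, clf.decision_function(X))

# A positive score corresponds to a prediction of class 1.
manual_pred = (manual_scores > 0).astype(int)
print(w, b, scores_match, manual_pred)
```

Because the score is a weighted sum of the features, each coefficient tells you how much one unit of that feature pushes the score up or down.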

First, we will train a linear SVM on our training data and use the learned coefficients to understand which factors of the input data had the greatest bearing on the prediction.

***Question 2.5*** Run the code below to train a linear SVM on our training data.

***Question 2.6*** Once you have trained your SVM, uncomment the plotting code in order to plot the coefficients, and we can explore which features have the most impact. A larger positive coefficient means that the corresponding feature influences the algorithm towards making a positive classification (recidivism), and a more negative coefficient means that the corresponding feature influences the algorithm towards making a negative classification (no recidivism). 



<!-- TODO: Add discussion of coefficients. Add question about which features are the most important. Add question about whether any of these are protected. -->

In [None]:
# Question 2.5 
from sklearn import svm
from sklearn.svm import LinearSVC
pyplot.rcParams['figure.figsize'] = [15, 10]


#Create and train the SVM model
model_svm = svm.SVC(kernel='linear')
model_svm.fit(X_train, y_train)

#Print the training and test accuracies
print("SVM Training accuracy:", model_svm.score(X_train, y_train))
print("SVM Testing accuracy:", model_svm.score(X_test, y_test))



# Question 2.6.
#Get the model coefficients (feature importances)
importance_svm = model_svm.coef_[0]

#Plot the feature importances
features = X_all.columns
pyplot.xticks(rotation="vertical")
pyplot.gca().tick_params(axis='both', which='major', labelsize=20)

svm_importance_plot = pyplot.bar(features, importance_svm)
pyplot.xlabel("Feature", fontsize=20)
pyplot.ylabel("Coefficient Value", fontsize=20)
pyplot.show()


Which features did you find were most important? How do you feel about this result? Were any of them protected?
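
One way to answer this systematically is to sort the features by the absolute value of their coefficients. The sketch below uses hypothetical feature names and coefficient values; in the notebook you would pass `features` and `importance_svm` instead. The helper `rank_by_magnitude` is not part of any library, just an illustration. Keep in mind that comparing magnitudes is only meaningful when the features are on comparable scales.

```python
import numpy as np

def rank_by_magnitude(names, coefficients):
    """Return (name, coefficient) pairs sorted from largest to smallest |coefficient|."""
    coefficients = np.asarray(coefficients, dtype=float)
    order = np.argsort(-np.abs(coefficients))  # indices of the largest magnitudes first
    return [(names[i], float(coefficients[i])) for i in order]

# Hypothetical example values, not results from the COMPAS model.
example_names = ["feature_a", "feature_b", "feature_c", "feature_d"]
example_coefs = [0.10, -0.75, 0.40, -0.05]

ranking = rank_by_magnitude(example_names, example_coefs)
for name, coef in ranking:
    direction = "towards recidivism" if coef > 0 else "towards no recidivism"
    print(f"{name:>10}: {coef:+.2f} ({direction})")
```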

No matter what result you found, being able to understand how our algorithm works allows us to ask questions about whether we agree with how it is making decisions. This gives us another tool to be able to check if the algorithm is exhibiting a bias that we find to be problematic.

Now let's say that we hope to build a model with improved accuracy. Here, we will first try a random forest and then a neural network. What can we know about the inner workings of each of these models?

## Model 2: Random Forest

You don't need to fully understand how random forests work to implement one here, but if you would like to learn more, you can read about them [here](https://towardsdatascience.com/an-implementation-and-explanation-of-the-random-forest-in-python-77bf308a9b76).

***Question 2.7*** Fill in the code below to train a random forest on our training data. You can try tuning the maximum depth of each tree (fill in the blank in the code) in the forest to get the best accuracy possible. What training and test accuracies result, and how do these compare with the SVM?

In [None]:
# Question 2.7.
from sklearn.ensemble import RandomForestClassifier

#Create and train the random forest classifier
model_rf = RandomForestClassifier(max_depth=____) # fill this in
model_rf.fit(X_train, y_train)

#Print training and test accuracies
print("Random Forest Training accuracy:", model_rf.score(____, ____)) # fill this in
print("Random Forest Testing accuracy:", model_rf.score(____, ____)) # fill this in

In [None]:
#@title Instructor Solution
from sklearn.ensemble import RandomForestClassifier

#Create and train the random forest classifier
model_rf = RandomForestClassifier(max_depth=5)
model_rf.fit(X_train, y_train)

#Print training and test accuracies
print("Random Forest Training accuracy:", model_rf.score(X_train, y_train))
print("Random Forest Testing accuracy:", model_rf.score(X_test, y_test))

Were you able to get a better accuracy than the SVM? Hopefully you were able to do at least a little better.
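
If you would like to compare several depths at once, a loop like the one below can help. This is a sketch on synthetic stand-in data generated with `make_classification`, not the COMPAS dataset, and the list of depths is an arbitrary choice; in the notebook you would use `X_train`, `y_train`, `X_test`, and `y_test` instead of the `*_demo` variables.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; not the COMPAS dataset.
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

depth_scores = {}
for depth in [1, 3, 5, 10, None]:  # None means no limit on tree depth
    rf = RandomForestClassifier(max_depth=depth, random_state=0)
    rf.fit(Xd_train, yd_train)
    # Store (training accuracy, test accuracy) for this depth
    depth_scores[depth] = (rf.score(Xd_train, yd_train), rf.score(Xd_test, yd_test))

for depth, (train_acc, test_acc) in depth_scores.items():
    print(f"max_depth={depth}: train={train_acc:.3f}, test={test_acc:.3f}")
```

A training accuracy that keeps climbing while the test accuracy stalls or drops is a sign that the deeper trees are overfitting.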

Now let's see what we can understand about how our random forest is making decisions. It's not quite as straightforward to understand how the random forest is using our input data to come up with a classification as it is with the linear SVM. Luckily, the programmers who wrote the random forest class that we are using have defined an attribute called `feature_importances_`, which we can use to explore our model.

For a random forest, the feature importances are calculated based on a metric called *impurity*. You can read more on impurity here: [ADD LINK]
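
As a rough illustration of the idea, the sketch below computes Gini impurity, which is the default splitting criterion for scikit-learn's `RandomForestClassifier`. The labels and the split are made up for illustration, and the helper functions are hypothetical rather than library calls. Roughly speaking, a feature whose splits produce large drops in impurity across the trees in the forest ends up with a large importance.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 minus the sum over classes of (class fraction) squared."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def impurity_decrease(parent, left, right):
    """Drop in impurity from splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children

# Made-up labels: 1 = recidivated, 0 = did not.
parent = [1, 1, 1, 1, 0, 0, 0, 0]
left, right = [1, 1, 1, 0], [1, 0, 0, 0]

print("parent impurity:", gini_impurity(parent))
print("impurity decrease from the split:", impurity_decrease(parent, left, right))
```

A perfectly mixed node (half of each class) has impurity 0.5, and a node containing only one class has impurity 0.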

***Question 2.8*** Run the code below to find and plot the feature importances for our random forest. What do you find?

In [None]:
# Question 2.8.
#Get feature importances
rf_importances = model_rf.feature_importances_

#Plot feature importances
pyplot.xticks(rotation="vertical")
pyplot.gca().tick_params(axis='both', which='major', labelsize=20)
rf_importance_plot = pyplot.bar(features, rf_importances)
pyplot.xlabel("Feature", fontsize=20)
pyplot.ylabel("Feature Importance", fontsize=20)
pyplot.show()

Now let's zoom in on the set of features associated with race in particular. Run the code below to look at that part of the plot more closely. What do you see?

In [None]:
#Create a list of importances associated with only the race-related variables
race_importances = rf_importances[10:16]
race_features = features[10:16]

#Plot feature importances related to race
pyplot.gca().tick_params(axis='both', which='major', labelsize=18)
race_importances_plot = pyplot.bar(race_features, race_importances)
pyplot.xlabel("Feature", fontsize=20)
pyplot.ylabel("Feature Importance", fontsize=20)
pyplot.show()

Hm. Even though we have a higher accuracy with our new model, there seem to be some worrying trends in terms of which features the model considers to be important in making its classifications.
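
A practical note on the zoomed-in plot: the slice `[10:16]` depends on the exact column order of `X_all`, so it would silently pick the wrong features if the columns changed. One possible alternative is to select columns by name, sketched below. The `race_` prefix, the column names, and the importance values are assumptions for illustration only; check `X_all.columns` for the actual names before adapting this.

```python
def select_by_prefix(names, values, prefix):
    """Keep only the (name, value) pairs whose name starts with `prefix`."""
    kept = [
        (str(name), float(value))
        for name, value in zip(names, values)
        if str(name).startswith(prefix)
    ]
    kept_names = [name for name, _ in kept]
    kept_values = [value for _, value in kept]
    return kept_names, kept_values

# Hypothetical column names and importances, for illustration only.
example_features = ["age", "priors_count", "race_group_a", "race_group_b", "sex_Male"]
example_importances = [0.30, 0.45, 0.05, 0.10, 0.10]

race_names, race_values = select_by_prefix(example_features, example_importances, "race_")
print(race_names, race_values)
print("combined importance of race columns:", sum(race_values))
```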

## Model 3: Neural Network

The final model we will consider is a neural network.

***Question 2.9*** Fill in the code below to train a neural network on our training data. You can try tuning the number of layers and the size of the layers (left as blanks in the code below to start) to get the best accuracy possible, just as we have done in previous notebooks. What training and test accuracies can you get using a neural network? How do these compare to the other two models?

In [None]:
# Question 2.9.
from sklearn.neural_network import MLPClassifier

#Create and train the neural network
model_nn = MLPClassifier(hidden_layer_sizes=(____, ____, ____), random_state=1, max_iter=500) # fill this line in!
model_nn.fit(X_train, y_train)

#Print training and test accuracies
print("Neural Network Training accuracy:", model_nn.score(X_train, y_train))
print("Neural Network Testing accuracy:", model_nn.score(X_test, y_test))

In [None]:
#@title Instructor Solution
from sklearn.neural_network import MLPClassifier

#Create and train the neural network
model_nn = MLPClassifier(hidden_layer_sizes=(10, 10, 10), random_state=1, max_iter=500)
model_nn.fit(X_train, y_train)

#Print training and test accuracies
print("Neural Network Training accuracy:", model_nn.score(X_train, y_train))
print("Neural Network Testing accuracy:", model_nn.score(X_test, y_test))

Now, how can we go about understanding the neural network? We could try to look at the coefficients that the network has learned as we did with the linear SVM. But what do these coefficients mean in this case? It's less clear how they relate to the inputs.
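
To see why the raw weights are hard to read, it can help to look at their shapes. The sketch below trains a small `MLPClassifier` on synthetic stand-in data (8 features, generated with `make_classification`, not the COMPAS dataset) and inspects `coefs_`, the list of weight matrices the network has learned. The `*_demo` names are hypothetical.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data (8 features, binary labels); not the COMPAS dataset.
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

demo_nn = MLPClassifier(hidden_layer_sizes=(10, 10, 10), random_state=1, max_iter=200)
with warnings.catch_warnings():
    # The small demo network may stop before fully converging; that is fine here.
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    demo_nn.fit(X_demo, y_demo)

# coefs_ holds one weight matrix per layer-to-layer connection.
weight_shapes = [w.shape for w in demo_nn.coefs_]
n_weights = sum(w.size for w in demo_nn.coefs_)

print("weight matrix shapes:", weight_shapes)
print("total number of weights:", n_weights)
```

With 8 input features and three hidden layers of 10 units, the network has four weight matrices instead of one coefficient per feature, so no single number describes the effect of one input on the prediction.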

Machine learning researchers refer to the dilemma we just encountered as the **interpretability-accuracy trade-off**. Oftentimes, higher-performing machine learning models (in terms of accuracy) are more difficult to directly interpret. This is usually because they are more complex and can model the data better, but in a way that humans can't easily visualize or understand.

Just as researchers developed a way to define feature importance for random forests, researchers are working on ways to understand neural networks better. So far, we do not have a clear and unified way of thinking about how to interpret neural networks (and many other machine learning techniques!).

Now consider our algorithm again. The decision it is making influences whether someone goes to jail or not. Do you think it is more important to have an algorithm that is interpretable or accurate in this case? Would you be willing to give up the ability to easily understand what the algorithm is doing for better predictive performance?

# Conclusion

Well done! You've taken apart the biased COMPAS model we trained in the previous notebook and analyzed it along the lines of fairness and explainability. Hopefully, this has helped you develop a decent sense of how model bias works.

In the final notebook, we'll be using these insights to try to create a better model. We'll also encourage you to think about some other datasets in the wild and consider how biases might also creep into those models.