# **Counterfactual Fairness Replication** 
### By Potluri, Kai, Mingfei, and Ollie

There are many machine learning algorithms that can and do automate decisions on complex issues such as loan approval, insurance pricing, predictive policing, and prison sentencing. This paper, by Kusner, Loftus, Russell, and Silva, develops a way to assess whether a prediction is fair using tools from causal inference. Because algorithms make predictions from observed data, and that data may contain historical bias (e.g., racially biased stop-and-frisk policies or sexist, racist, or ageist hiring practices), they can produce unfair decisions. Causal modeling lets us see how and where this unfairness arises.

To evaluate fairness we must first define what it means. Below are the definitions presented by the authors:

- **Fairness through unawareness** – an algorithm does not use any protected attributes. If we have observations of a person’s race or sex we just ignore that information

- **Individual Fairness** – an algorithm gives similar predictions to similar individuals

- **Demographic Parity** – an algorithm's predictions have the same distribution no matter what the value of the protected attribute is

- **Equality of Opportunity** – among individuals with the same true outcome, an algorithm's predictions have the same distribution regardless of the protected attribute

- **Counterfactual Fairness** – Had any individual been of a different race, sex, etc. the prediction would not change
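
The two group-level definitions, demographic parity and equality of opportunity, can be checked directly from predictions and labels. The following is a minimal sketch for a binary predictor and a binary protected attribute; the function names and the toy arrays are hypothetical and are not taken from the paper.

```python
import numpy as np

def demographic_parity_gap(y_pred, a):
    """Absolute difference in positive-prediction rate between groups a=0 and a=1."""
    y_pred, a = np.asarray(y_pred), np.asarray(a)
    return abs(y_pred[a == 0].mean() - y_pred[a == 1].mean())

def equal_opportunity_gap(y_pred, y_true, a):
    """Absolute difference in true-positive rate between groups a=0 and a=1."""
    y_pred, y_true, a = map(np.asarray, (y_pred, y_true, a))
    pos = y_true == 1  # restrict to individuals whose true outcome is positive
    return abs(y_pred[pos & (a == 0)].mean() - y_pred[pos & (a == 1)].mean())

# Hypothetical toy data: 4 individuals per group
a      = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0])

dp_gap = demographic_parity_gap(y_pred, a)         # 0.75 vs 0.25 positive rate
eo_gap = equal_opportunity_gap(y_pred, y_true, a)  # 1.0 vs 0.5 true-positive rate
print(dp_gap, eo_gap)
```

A gap of zero means the corresponding definition holds on this sample; on the toy data both gaps are 0.5. Neither quantity says anything about what would have happened to a given individual under a different value of A, which is what counterfactual fairness asks.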

The authors believe that counterfactual fairness is the correct definition to use because it lets us make comparisons at the individual level. We are asking what would have happened if this same person were of a different race or sex.
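
For reference, the paper's formal statement can be written as follows. A predictor $\hat{Y}$ is counterfactually fair if, for any context $X = x$ and $A = a$,

$$
P\left(\hat{Y}_{A \leftarrow a}(U) = y \mid X = x, A = a\right) = P\left(\hat{Y}_{A \leftarrow a'}(U) = y \mid X = x, A = a\right)
$$

for all $y$ and any value $a'$ attainable by $A$. The subscript $A \leftarrow a'$ denotes the counterfactual world in which A is set to $a'$, while the latent variables U are still inferred from the evidence that was actually observed.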

Below is a simple example from the paper to show how counterfactual fairness works. We want to predict an individual's accident rate given observations of a protected attribute and the color of their car. The complete model is listed below.

- **A Protected attribute** – any attribute that should not be discriminated against
- **X Observable attribute** – Red Car
- **Y outcome of interest** – Accident Rate
- **U latent variable** – Aggressive Driving


    
![image alt ><](project_DAG_2.png)
    


In this example, members of some group A are more likely than others to drive a red car, but they are not more likely to get into an accident. Aggressive drivers, however, also tend to drive red cars. We are explicitly modeling a discriminatory effect as a causal effect. If we predicted the accident rate Y using X alone, the prediction would be counterfactually unfair: members of the protected group drive red cars more often than other groups, so they would receive higher predicted accident rates even though they do not get into more accidents.

One of the main implications of counterfactual fairness is that a prediction will be counterfactually fair if it is a function of the non-descendants of A. In the example above, we could not use the observable attribute, red car, to make a counterfactually fair prediction, because a change in A could change the observed variable. Instead, we would try to infer the latent variable U.
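
The red car example can be illustrated with a small simulation. This is only a sketch under assumed structural equations: the coefficients, the threshold, and the linear predictors are hypothetical choices, and U is treated as observed purely for illustration (in practice it is latent and has to be inferred).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

a = rng.integers(0, 2, size=n)   # protected attribute A
u = rng.normal(size=n)           # latent aggressiveness U
eps = rng.normal(size=n)         # individual noise in car choice

def red_car(a, u, eps):
    """Assumed structural equation: both A and U push towards owning a red car."""
    return (1.5 * a + 1.0 * u + eps > 1.0).astype(float)

x = red_car(a, u, eps)
y = 2.0 * u + rng.normal(size=n)  # accident score depends on U only, not on A

# Counterfactual world: flip A for everyone, hold U and the noise fixed
x_cf = red_car(1 - a, u, eps)

# Predictor built on X (a descendant of A)
beta_x = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)[0]
pred_x = beta_x[0] + beta_x[1] * x
pred_x_cf = beta_x[0] + beta_x[1] * x_cf

# Predictor built on U (a non-descendant of A); U is unchanged by the flip
beta_u = np.linalg.lstsq(np.column_stack([np.ones(n), u]), y, rcond=None)[0]
pred_u = beta_u[0] + beta_u[1] * u
pred_u_cf = beta_u[0] + beta_u[1] * u

changed_x = np.mean(pred_x != pred_x_cf)  # share of people whose prediction moves
changed_u = np.mean(pred_u != pred_u_cf)  # zero by construction
print(changed_x, changed_u)
```

Under these assumptions, a sizeable share of individuals receive a different prediction from the X-based model once A is flipped, while the U-based predictions are identical in both worlds.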

This is not an if-and-only-if statement, however: a prediction that is a function of a descendant of the protected attribute A can still be counterfactually fair, but only if the dependence of the prediction on A cancels out in the function.

Another benefit of using causal inference to evaluate fairness is the ability to deal with historical bias. An example given in the paper is whether a person defaults on a loan. A protected attribute may cause a person to default on a loan, but only through some past discrimination. For example, the effect of the protected attribute may be mediated by employment, which is in turn affected by the latent variable prejudice. Prejudice could mean that the hiring process was unfair towards certain groups. This makes a person less likely to be employed, which in turn increases the likelihood of default. Hence Y, a descendant of A, carries historical discrimination and should not be used in the prediction $\hat{Y}$.

We want to use variables that are not caused by A, the protected attribute, but are predictive of Y. Below is the algorithm given in the text.

### Fair Learning Algorithm

![](FairLearning Algorithm.png)

Essentially, the algorithm says: given the observed values of A and X, infer the values of the latent variable U, and then use those values in the predictive model.
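
The steps can be sketched for a toy linear-Gaussian model in which the posterior of U has a closed form. Everything here is an assumption made for illustration: the structural equation `X = w_a*A + w_u*U + noise`, the parameter values (treated as known rather than fitted), and the helper names `posterior_u` and `predict`. It is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 2_000, 20             # individuals, posterior draws per individual
w_a, w_u, s = 1.5, 1.0, 1.0  # hypothetical, assumed-known model parameters

# Simulate from the assumed model; Y depends on U only
a = rng.integers(0, 2, size=n).astype(float)
u_true = rng.normal(size=n)
noise = rng.normal(scale=s, size=n)
x = w_a * a + w_u * u_true + noise
y = 2.0 * u_true + rng.normal(size=n)

def posterior_u(x, a):
    """Mean and std of P(U | X=x, A=a) with prior U ~ N(0, 1)."""
    precision = 1.0 + w_u**2 / s**2
    mean = (w_u / s**2) * (x - w_a * a) / precision
    return mean, np.sqrt(1.0 / precision)

# Step 1: draw m samples of U for each individual from the posterior
mean_u, std_u = posterior_u(x, a)
u_draws = mean_u[:, None] + std_u * rng.normal(size=(n, m))

# Step 2: build the augmented dataset and fit a linear predictor of Y on U
design = np.column_stack([np.ones(n * m), u_draws.ravel()])
theta = np.linalg.lstsq(design, np.repeat(y, m), rcond=None)[0]

def predict(x, a):
    """Linear predictor averaged over the posterior of U (uses the posterior mean)."""
    mean, _ = posterior_u(x, a)
    return theta[0] + theta[1] * mean

# Check: flip A, regenerate X with the same U and noise, and compare predictions
a_cf = 1.0 - a
x_cf = w_a * a_cf + w_u * u_true + noise
max_change = np.max(np.abs(predict(x, a) - predict(x_cf, a_cf)))
print(max_change)
```

In this toy model the posterior of U depends on the data only through `x - w_a * a`, which is the same in the factual and counterfactual worlds, so the predictions agree up to floating-point error.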

There are three levels of assumptions under which counterfactual fairness can be achieved. The higher levels (3 being the highest) require stronger assumptions.

- Level 1 – Determine $\hat{Y}$ from observable non-descendants of A
- Level 2 – Determine $\hat{Y}$ through latent variables that act as non-deterministic causes of the observable variables
- Level 3 – Determine $\hat{Y}$ through latent variables in a fully deterministic model, such as one with additive error terms (not discussed in great detail)
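
As a rough illustration of Level 3, an additive error model lets us estimate the latent term as a residual. The sketch below assumes a hypothetical model `X = f(A) + U` with made-up coefficients; it shows the idea only and is not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000

a = rng.integers(0, 2, size=n).astype(float)
u = rng.normal(size=n)                 # latent term, used only to simulate data
x = 1.5 * a + u                        # assumed additive error model for X
y = 2.0 * u + rng.normal(size=n)

# Estimate E[X | A] by least squares and keep the residual as the estimate of U
design_a = np.column_stack([np.ones(n), a])
coef_x = np.linalg.lstsq(design_a, x, rcond=None)[0]
resid = x - design_a @ coef_x

# Fit the predictor on the residual only
design_r = np.column_stack([np.ones(n), resid])
theta = np.linalg.lstsq(design_r, y, rcond=None)[0]
y_hat = design_r @ theta

corr_resid_a = np.corrcoef(resid, a)[0, 1]
group_gap = abs(y_hat[a == 1].mean() - y_hat[a == 0].mean())
print(corr_resid_a, group_gap)
```

Because the residual is orthogonal to A by construction, the predictor built on it carries no linear dependence on the protected attribute, at the cost of the strong assumption that the additive model is correct.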

The authors recommend always modeling the causal structure to reach the goal of counterfactual fairness. They state, “we are essentially learning a projection of T into the space of fair decisions, removing historical biases as a by-product” (p. 5). Modeling in this way does not always give the most accurate predictions, but accuracy is not the goal of the procedure. The goal is to accurately model the real world, including any social biases that may arise, and to use causal modeling tools to address algorithmic unfairness. The following is another example of this in practice.

## **Law School Success** 

![](project_dag.png)
