# Introduction

import numpy as np
from sklearn.linear_model import LogisticRegression
from aif360.algorithms.inprocessing import PrejudiceRemover
from methods.data.individual_data_generator import DataGenerator as IndDataGen
from methods.data.individual_data_generator_simple import DataGenerator as IndDataGenSim
from methods.data.group_data_generator import DataGenerator as GrpDataGen
from methods.long_term_fairness import LongTermFairnessPlot
from methods.aif360.longterm_aif import AifLongTermMetric, AifLongTermPrediction

# 1. Fairness in Machine Learning

ML Background and Fairness in ML Setup...

## 1.1 Long Term Fairness Setup
The decision function $d(X^{(t)}, A)$ depends on the features $X^{(t)}$ at time step $t$ and on the protected attributes $A$, which are constant over time. The superscript $^{(t)}$ denotes the value of a quantity at time step $t$.

$$ Y^{(t)} \in \mathbb{R}^{m}$$
$$ \hat{Y}^{(t)} \in \mathbb{R}^{m}$$
$$A^{(t)} = A \in \mathbb{R}^{m}$$
$$X^{(t)} \in \mathbb{R}^{n \times m}$$

$$Y \in \mathbb{R}^{t \times m}$$
$$\hat{Y}\in \mathbb{R}^{t \times m}$$
$$X\in \mathbb{R}^{t \times n \times m}$$


## 1.2 Measuring Static Fairness
Static fairness refers to the decisions at one fixed time step $t$ in the above setup. Static fairness criteria are discussed in [fairmlBook](https://fairmlbook.org/pdf/classification.pdf).

### 1.2.1 Disparate Impact (or Demographic Parity)
This criterion compares only the acceptance rates of the different groups:

$$P(\hat{Y}^{(t)} = \hat{y} | A = a_i) = P(\hat{Y}^{(t)} = \hat{y} | A = a_j) ~~ \forall i, j $$ 
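
As a minimal sketch of how this criterion could be checked at a single time step, the snippet below estimates the acceptance rate $P(\hat{Y}^{(t)} = 1 | A = a)$ per group from binary decisions. The function names and the toy data are illustrative assumptions and are not part of the `methods` package imported above.

```python
from collections import defaultdict


def acceptance_rates(y_pred, groups):
    """Estimate P(Y_hat = 1 | A = a) for every group value a."""
    accepted = defaultdict(int)
    total = defaultdict(int)
    for decision, group in zip(y_pred, groups):
        total[group] += 1
        accepted[group] += decision
    return {g: accepted[g] / total[g] for g in total}


def demographic_parity_gap(y_pred, groups):
    """Largest difference in acceptance rate between any two groups."""
    rates = acceptance_rates(y_pred, groups)
    return max(rates.values()) - min(rates.values())


# Toy decisions at one time step t (illustrative values, not project data).
dp_y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
dp_groups = [0, 0, 0, 0, 1, 1, 1, 1]

dp_rates = acceptance_rates(dp_y_pred, dp_groups)
dp_gap = demographic_parity_gap(dp_y_pred, dp_groups)
print(dp_rates, dp_gap)
```

A gap of zero corresponds to the equality above holding exactly for $\hat{y} = 1$; in practice a small tolerance would usually be allowed.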

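### 1.2.2 Error based Metrics (Equal Opportunity or Separation)
These criteria additionally condition on the true label. Separation requires the equality below for every label value $y$; equal opportunity is the special case restricted to $y = 1$.

$$P(\hat{Y}^{(t)} = \hat{y} | Y^{(t)} = y, A = a_i ) = P(\hat{Y}^{(t)} = \hat{y} | Y^{(t)} = y, A = a_j) ~~ \forall i, j $$ 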
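
The error based criteria can be sketched in the same way by conditioning the acceptance rate on the true label as well. The helper below is a hypothetical illustration (its name and the toy data are assumptions): it estimates $P(\hat{Y}^{(t)} = 1 | Y^{(t)} = y, A = a)$ for every pair $(y, a)$, so that the $y = 1$ entries give the true positive rates and the $y = 0$ entries the false positive rates.

```python
from collections import defaultdict


def conditional_positive_rates(y_true, y_pred, groups):
    """Estimate P(Y_hat = 1 | Y = y, A = a), keyed by (y, a)."""
    positives = defaultdict(int)
    total = defaultdict(int)
    for label, decision, group in zip(y_true, y_pred, groups):
        total[(label, group)] += 1
        positives[(label, group)] += decision
    return {key: positives[key] / total[key] for key in total}


# Toy labels and decisions at one time step t (illustrative values).
sep_y_true = [1, 1, 0, 0, 1, 1, 0, 0]
sep_y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
sep_groups = [0, 0, 0, 0, 1, 1, 1, 1]

sep_rates = conditional_positive_rates(sep_y_true, sep_y_pred, sep_groups)

# Equal opportunity compares only the true positive rates (y = 1).
tpr_gap = abs(sep_rates[(1, 0)] - sep_rates[(1, 1)])
# Separation additionally compares the false positive rates (y = 0).
fpr_gap = abs(sep_rates[(0, 0)] - sep_rates[(0, 1)])
print(sep_rates, tpr_gap, fpr_gap)
```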

# 2. Limits of the above Metrics
The above two methods are referred to as observational in the literature. They can be expressed purely in terms of the joint probability distribution of the random variables $X^{(t)}, Y^{(t)}, \hat{Y}^{(t)}$ and $A^{(t)}$. Some limits are already discussed in the chapter [Inherent limitations of observational criteria](https://fairmlbook.org/pdf/causal.pdf) of the fair ML book.

The limitations discussed here take a different viewpoint, that of long term fairness.
The assumption is that a decision made by the decision maker at some time step $t$ has a future impact on the true label $y^{(t)}$ of each individual:

$$P(y_i^{(t)}=c) \sim \sum _ {j \in G} \sum _ {k=1} ^{n} \hat{y} _j ^{(t-k)} $$

Here $k$ runs over the $n$ preceding time steps.

The decisions made by the decision maker therefore impact the future pool of qualified individuals.
Under such an assumption, observational criteria might fail to yield optimal solutions. Although they provide accuracy in the short term (at the current decision step), they might not unlock the full potential of each individual after several time steps. It can be argued that the decision maker itself has an interest in increasing the pool of qualified individuals, in order to have a better chance of hiring someone. The effects of different static decision rules are discussed individually in the notebooks for each data generation process, because the effects change under different assumptions about the data generation.

This long term view of fairness also implies fairness notions beyond static criteria. The question shifts from *is an individual qualified for a positive label* to *could the individual be made qualified*. This view can be seen as related to, or as a subtype of, causal fairness criteria.

The $G$ in the equation is a not yet specified subset of individuals and can be chosen arbitrarily. Two examples are $G_s=\{i\}$ and $G_g=\{j | a_j=a_i\}$. $G_s$ describes the case where the positive label depends only on previous predictions for the individual itself, and $G_g$ the case where it depends on the positive predictions for all individuals of the same group. Another possible case would be $G_N =\{j | x_j \sim x_i\}$, meaning that the probability of a positive label depends on the predictions for individuals with similar features.
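
To make the role of $G$ concrete, the following sketch computes the unnormalized double sum of past decisions for an individual $i$ under the two choices $G_s$ and $G_g$. It is only an illustration under stated assumptions: the function names, the `kind` argument and the toy decision history are hypothetical, and the score is merely proportional to the probability in the equation, not a probability itself.

```python
def influence_group(i, attributes, kind):
    """Indices j in G for individual i: 'self' gives G_s, 'group' gives G_g."""
    if kind == "self":
        return [i]
    if kind == "group":
        return [j for j, a in enumerate(attributes) if a == attributes[i]]
    raise ValueError(f"unknown kind: {kind}")


def past_decision_score(i, t, decisions, attributes, kind, n):
    """Sum of decisions[t - k][j] over j in G and k = 1..n.

    decisions[s][j] is the decision for individual j at time step s.
    Steps before the start of the history (t - k < 0) are skipped.
    """
    members = influence_group(i, attributes, kind)
    return sum(
        decisions[t - k][j]
        for k in range(1, n + 1)
        if t - k >= 0
        for j in members
    )


# Toy history: 3 past time steps, 4 individuals (illustrative values).
decisions = [
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
]
attributes = [0, 0, 1, 1]  # protected attribute a_j of each individual

# Scores for the label at t = 3, looking back n = 2 steps.
score_self = past_decision_score(0, 3, decisions, attributes, "self", 2)
score_group = past_decision_score(0, 3, decisions, attributes, "group", 2)
print(score_self, score_group)
```

With $G_s$ only the individual's own history counts, whereas with $G_g$ every positive decision for a member of the same group raises the score.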



## 2.1 Causal Fairness
Causal fairness notions ask *why* individuals are labeled as they are [fairmlBook](https://fairmlbook.org/pdf/causal.pdf). A label is considered unfair if a protected attribute has an undesired effect on the decision...

The relationship between variables can be visualized with cause-effect graphs.

$$TODO$$


## 2.2 Related Work
Long term fairness has gained much interest recently...

[ml-fairness-gym](https://github.com/google/ml-fairness-gym): long term fairness as a Markov decision process

[Causal Modeling for Fairness in Dynamical Systems](https://arxiv.org/abs/1909.09141)

[Delayed Impact of Fair Machine Learning](https://arxiv.org/abs/1803.04383)

[On the Long-term Impact of Algorithmic Decision Policies](https://arxiv.org/abs/1903.01209)

[A Short-term Intervention for Long-term Fairness in the Labor Market](https://arxiv.org/abs/1712.00064)

## 2.3 Long Term Fairness Metric
The best way to conceptually define fairness in a long term decision process is through the number of qualified individuals after several decision steps, denoted: $$ \# (Y^{(t)}=1)$$

The goal of this project is to show the effects of static decision functions on long term fairness. Therefore, the long term metric is discussed further in the last chapter (next steps).

Long term fairness as described here and decision maker utility (usually accuracy) are to some extent parallel notions.
The decision maker must accept some incorrect decisions in order to achieve long term fairness. On the other hand, as discussed later, the decision maker also has an interest in increasing the number of qualified individuals.

## 2.4 Long Term Decision Function
The framework is used here to visualize the long term effects of different short term decision functions. It is also designed to allow training a decision function that learns the long term relationship.

A possible extension of the fairness in ML setup would be to extend the notion of the decision maker's utility by the number of qualified individuals in the future. That is, the decision maker wants not only to correctly label all qualified candidates in the current decision process but also to increase the number of qualified people in the future.

$$\max(U)$$

$$U \sim \mathbb{E}[Y^{(t)}=\hat{Y}^{(t)}] +  \sum_{i=1} ^N \mathbb{E}[\# (Y^{(t+i)}=1)]$$
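
One way to read this utility is sketched below, assuming the expectation over future labels is approximated by averaging over sampled trajectories. Everything here is a hypothetical illustration: `future_samples` stands in for trajectories that a data generator would have to provide, the function names are not part of the framework, and the two terms are simply added without any weighting, as in the formula.

```python
def accuracy(y_true, y_pred):
    """Fraction of individuals with Y^(t) equal to Y_hat^(t)."""
    return sum(int(y == p) for y, p in zip(y_true, y_pred)) / len(y_true)


def expected_qualified(future_samples):
    """Monte Carlo estimate of sum_{i=1..N} E[#(Y^(t+i) = 1)].

    future_samples is a list of sampled trajectories; each trajectory is a
    list of N label vectors, one per future time step t+1, ..., t+N.
    """
    totals = [sum(sum(labels) for labels in traj) for traj in future_samples]
    return sum(totals) / len(totals)


def utility(y_true, y_pred, future_samples):
    """Short term accuracy plus the expected number of qualified individuals."""
    return accuracy(y_true, y_pred) + expected_qualified(future_samples)


# Toy data: 4 individuals, N = 2 future steps, 2 sampled trajectories.
u_y_true = [1, 0, 1, 0]
u_y_pred = [1, 0, 0, 0]
future_samples = [
    [[1, 0, 1, 0], [1, 1, 1, 0]],  # trajectory 1: 2 + 3 qualified
    [[1, 0, 0, 0], [1, 0, 1, 0]],  # trajectory 2: 1 + 2 qualified
]

u_value = utility(u_y_true, u_y_pred, future_samples)
print(u_value)
```

Because accuracy lies in $[0, 1]$ while the second term is a count, a practical objective would probably need to rescale one of the two terms; the formula above leaves this open.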

