# Discussion
The goal of this project is to provide a framework for exploring the effects of different decision rules in dynamic fairness scenarios and for pointing out possible shortcomings. The main finding is that the same decision rule can have a different impact depending on the underlying data dynamics. When positive decisions affect the whole group, disparate impact converges towards the optimal solution. When decisions affect only the individual, disparate impact helps only those individuals who receive a positive decision and thereby stabilizes this inequality.

In this project, only positive decisions influenced the dynamics, because negative predictions were encoded as 0 and therefore did not contribute negatively to the sum. This is not realistic, and accounting for negative decisions would change the results. However, the current implementation of the data generators is imperfect: with negative labels, the models collapsed. An improved data generation scheme that takes more of these considerations into account is therefore discussed in the next part.
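
The encoding issue can be illustrated with a minimal sketch. It assumes a group-level update that is proportional to the sum of the decisions; the function name `group_shift` and the step size are hypothetical and not taken from the project code.

```python
import numpy as np

def group_shift(decisions, step=0.1):
    """Hypothetical group-level update: proportional to the sum of decisions."""
    return step * float(np.sum(decisions))

decisions_01 = np.array([1, 0, 0, 1, 0])   # negative decisions encoded as 0
decisions_pm = 2 * decisions_01 - 1        # same decisions, negatives encoded as -1

shift_01 = group_shift(decisions_01)   # only the two positives contribute
shift_pm = group_shift(decisions_pm)   # the three negatives now pull the sum down
```

With the 0/1 encoding the shift can never be negative, which matches the behavior described above. With the -1/+1 encoding the sign of the shift depends on the balance of decisions.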

# Advanced Data Generation
One assumption in the previous data generators was that every individual could benefit from positive decisions. In other words, every individual could eventually be labeled positive. This need not always be the case, and negative decisions should also have an impact on the dynamics. Finally, only whole groups or single individuals currently benefit from positive decisions. This should be extended to subpopulations of each group.

This could be modeled with a hidden variable $h$ representing the level of capability of each individual. Individuals would then benefit from positive decisions depending on their hidden capability $h$. Furthermore, the distance in feature space to individuals with a positive prediction should determine how much an individual benefits from the positive decisions of others.
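
One possible form of such an update is sketched below. It is only an illustration under stated assumptions: the Gaussian kernel, the multiplicative dependence on $h$, the optional upper bound `h_max`, and all names (`update_capabilities`, `gain`, `bandwidth`) are hypothetical choices, not part of the existing generators.

```python
import numpy as np

def update_capabilities(X, h, decisions, gain=0.1, bandwidth=1.0, h_max=1.0):
    """One hypothetical dynamics step for the hidden capabilities.

    X: (n, d) features, h: (n,) hidden capabilities,
    decisions: (n,) binary predictions (1 = positive decision).
    """
    X = np.asarray(X, dtype=float)
    h = np.asarray(h, dtype=float)
    decisions = np.asarray(decisions, dtype=float)

    # Pairwise squared distances in feature space.
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)

    # Gaussian kernel (assumption): close individuals get a weight near 1.
    kernel = np.exp(-sq_dist / (2.0 * bandwidth ** 2))

    # Exposure: kernel-weighted share of positive decisions around each
    # individual. The diagonal is 1, so the row sums are never zero.
    exposure = kernel @ decisions / kernel.sum(axis=1)

    # The benefit scales with the individual's own capability (assumption).
    # Pass h_max=None to leave capabilities unbounded from above.
    return np.clip(h + gain * h * exposure, 0.0, h_max)
```

In this sketch an individual far away from every positively predicted individual is left almost unchanged, while an individual next to an accepted one benefits nearly as much as the accepted individual itself.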

The goal of the decision maker in this case would be to achieve accuracy in short-term decision making and to learn the hidden variable $h$ in the long term, in order to maximize the pool of qualified individuals in the future.

The generator should be as extensible as possible, so that new dynamics can easily be added, and it should incorporate some kind of randomness.
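
A minimal sketch of what such an extensible generator could look like is given below, assuming that dynamics are plain functions applied in sequence and that randomness is added as seeded Gaussian noise. The class `DataGenerator` and the two example rules are hypothetical and only illustrate the idea.

```python
import numpy as np

def saturating_gain(h, decisions, rate=0.1):
    """Hypothetical rule: positive decisions move h towards 1."""
    return h + rate * decisions * (1.0 - h)

def decay_on_rejection(h, decisions, rate=0.05):
    """Hypothetical rule: negative decisions shrink h."""
    return h - rate * (1.0 - decisions) * h

class DataGenerator:
    """Applies a list of pluggable dynamics rules, then adds seeded noise."""

    def __init__(self, dynamics=(), seed=0):
        self.dynamics = list(dynamics)
        self.rng = np.random.default_rng(seed)

    def step(self, h, decisions, noise_scale=0.01):
        h = np.asarray(h, dtype=float)
        decisions = np.asarray(decisions, dtype=float)
        for rule in self.dynamics:
            h = rule(h, decisions)
        # Randomness: independent Gaussian noise per individual.
        return h + self.rng.normal(0.0, noise_scale, size=h.shape)

generator = DataGenerator([saturating_gain, decay_on_rejection], seed=42)
h_next = generator.step([0.2, 0.8], [1, 0])
```

New dynamics can be added by appending another function with the same signature to the list. Fixing the seed keeps runs reproducible.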

## Fairness Definition in the Advanced Setup
Measuring fairness in the dynamic setup is not straightforward. The method used in this project, counting the number of positively labeled individuals in the future, is not sufficient: the decision maker could again prefer certain groups and improve their capabilities before those of others. For instance, assume that capabilities increase continuously (i.e., there is no upper bound on qualification). In this case, individuals who entered the positive group earlier have an advantage.

The new notation using the hidden capabilities $h$ can be used to extend existing fairness notions to the dynamic setup.

Let $H$ be the vector of hidden capabilities. The following extensions are then possible:

Qualified Disparate Impact:
$$P(\hat{Y}_i=1|H_i=1) = P(\hat{Y}_j=1|H_j=1) \forall i, j$$
Here $i$ and $j$ are read as indexing individuals, i.e. entries of $H$. This definition is independent of the protected attribute $A$ but implicitly removes unfairness, as discussed in the next part.

Disparate Impact:
$$P(\hat{Y}=1|A=a_i, H=1) = P(\hat{Y}=1|A=a_j, H=1) \forall i, j$$

Error-based metrics:
$$P(\hat{Y}=1|A=a_i, H=1, Y=1) = P(\hat{Y}=1|A=a_j, H=1, Y=1) \forall i, j$$
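
The definitions above can be estimated from samples as in the following sketch. It assumes a binary hidden capability (so that $H=1$ is well defined) and that $H$ is known to the evaluator, which is only realistic for generated data. The helper names `positive_rate`, `capability_conditioned_rates` and `rate_gap` are hypothetical.

```python
import numpy as np

def positive_rate(y_hat, mask):
    """Empirical P(y_hat = 1 | mask); NaN if the condition never holds."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return float("nan")
    return float(np.mean(np.asarray(y_hat)[mask]))

def capability_conditioned_rates(y_hat, a, h, y=None):
    """Per-group estimate of P(y_hat=1 | A=a_i, H=1), or with Y=1 added if y is given."""
    y_hat, a, h = np.asarray(y_hat), np.asarray(a), np.asarray(h)
    cond = h == 1
    if y is not None:
        cond = cond & (np.asarray(y) == 1)
    return {g.item(): positive_rate(y_hat, cond & (a == g)) for g in np.unique(a)}

def rate_gap(rates):
    """Largest difference between group rates; 0 means the equality holds."""
    values = list(rates.values())
    return max(values) - min(values)

y_hat = [1, 0, 1, 1, 0, 1, 0, 0]   # decisions
a     = [0, 0, 0, 0, 1, 1, 1, 1]   # protected attribute
h     = [1, 1, 1, 0, 1, 1, 0, 1]   # hidden capability (binary here)
y     = [1, 1, 0, 0, 1, 1, 0, 0]   # observed labels

qualified_rate = positive_rate(y_hat, np.asarray(h) == 1)    # ignores A
di_rates = capability_conditioned_rates(y_hat, a, h)         # conditions on A and H
err_rates = capability_conditioned_rates(y_hat, a, h, y=y)   # adds Y = 1
```

In this toy data the capability-conditioned disparate impact is violated (the group rates differ), while the error-based variant is satisfied.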


## Long Term Decision Function
The framework is used here to visualize how different short-term decision functions affect long-term outcomes. It is also designed so that a decision function can be trained on the long-term data.

# Fairness without Protected Attributes
As mentioned in the beginning, the described dynamics imply another view on long-term fairness, asking who *could* be labeled positive. This question is also in line with causal fairness notions, which ask *why* a certain label was given.

To understand the main argument, first consider a perfect decision function $d^*$ that predicts the labels from the data without error. Such a function could be seen as fair from a static point of view (neglecting any long-term impact), since it does not make errors. For instance, all error-based metrics would consider it fair.

Only when viewed from a long-term perspective can it be considered unfair, since it...

The unfairness is only an effect of a possibly random initial correlation between the protected group and the ...

# Considerations for an Improved Data Generator
- The decision maker should only have access to data with a positive prediction (i.e., the decision maker only learns whether an individual is qualified if it previously accepted that individual).

- A separate classifier should be trained for the baseline data.

- Improvements of the generator discussed above.
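
The first consideration in the list above, that labels are observed only for accepted individuals, can be sketched as a masking step between the generator and the decision maker. This is a minimal sketch; the function names `observe_labels` and `training_subset` are hypothetical.

```python
import numpy as np

def observe_labels(y_true, decisions):
    """Hide the true label of every individual that was not accepted."""
    y_true = np.asarray(y_true, dtype=float)
    accepted = np.asarray(decisions).astype(bool)
    observed = np.where(accepted, y_true, np.nan)   # NaN marks "unknown"
    return observed, accepted

def training_subset(X, y_true, decisions):
    """Return only the rows the decision maker is allowed to learn from."""
    observed, accepted = observe_labels(y_true, decisions)
    return np.asarray(X)[accepted], observed[accepted]

X = np.array([[0.1], [0.4], [0.6], [0.9]])
y_true = [1, 0, 1, 0]
decisions = [1, 1, 0, 0]

observed, accepted = observe_labels(y_true, decisions)
X_train, y_train = training_subset(X, y_true, decisions)
```

The qualified but rejected third individual never enters the training data, so a decision maker trained this way cannot correct that rejection by itself.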