# 1 Contributions

import numpy as np

## 1.1 Data Generation
The data $X^{(t)}$, $A^{(t)}$ and $Y^{(t)}$ at time step $t$ are assumed to be distributed according to some distribution $\pi(X^{(t-1)}, A^{(t-1)}, Y^{(t-1)})$. The features $X^{(t-1)}$ and $X^{(t)}$ of two consecutive time steps should be similar, and the true label $Y^{(t)}$ must be predictable from $X^{(t)}$. In this setup there is only an initial correlation between features, labels and the protected attribute: during data generation more individuals from the protected group are sampled from the negatively labeled cluster, but there is no further correlation between the protected group and the qualification (...).

Individuals with a positive label are sampled from a bivariate Gaussian with mean $\mu_{pos}$, and negatively labeled individuals from a bivariate Gaussian with mean $\mu_{neg}$.
The overall setup for the data generation is:

1. $Y^{(t)}$ is sampled as positive with probability proportional to $\sum_{j \in G} \sum_{k=1}^{n} \hat{y}_j^{(t-k)}$
1. $X^{(t)}_{pos}$ (features for individuals with a positive label) are sampled from $\mathcal{N}(X^{(t)}; \mu_{pos}, \sigma_{pos})$
1. $X^{(t)}_{neg}$ (features for individuals with a negative label) are sampled from $\mathcal{N}(X^{(t)}; \mu_{neg}, \sigma_{pos})$
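The three steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the notebook's actual generator: the helper names `positive_rate` and `sample_features`, the normalization of the prediction sum by the number of entries, and the isotropic covariance $\sigma^2 I$ are all hypothetical choices made for the example.

```python
import numpy as np

def positive_rate(past_preds):
    """Turn past 0/1 predictions (steps x group members) into a probability.

    Assumption: the sum over members and steps is normalized by the number
    of entries so that the result lies in [0, 1].
    """
    past_preds = np.asarray(past_preds)
    return past_preds.sum() / past_preds.size

def sample_features(y, mu_pos, mu_neg, sigma_pos, sigma_neg, rng):
    """Draw one 2-D feature vector per label from the matching Gaussian.

    Assumption: each sigma is a scalar standard deviation (isotropic covariance).
    """
    y = np.asarray(y)
    n_pos = int(y.sum())
    x = np.empty((y.shape[0], 2))
    x[y == 1] = rng.multivariate_normal(mu_pos, sigma_pos**2 * np.eye(2), size=n_pos)
    x[y == 0] = rng.multivariate_normal(mu_neg, sigma_neg**2 * np.eye(2), size=y.shape[0] - n_pos)
    return x

rng = np.random.default_rng(0)
p_pos = positive_rate([[1, 0], [1, 1]])   # three of four past predictions were positive
y = rng.binomial(1, p_pos, size=200)      # step 1: labels
X = sample_features(y, [1.0, 1.0], [-1.0, -1.0], 0.5, 0.5, rng)  # steps 2 and 3: features
```

Passing the same value for both standard deviations mirrors the list above, where both clusters use $\sigma_{pos}$.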

### 1.1.1 Individual Data Generation
The probability of a positive label depends only on the predictions for that individual.

This is achieved by sampling $x_j^{(t)}$ from $\mathcal{N}(x_j^{(t)}; x_j^{(t-1)}, \sigma)$, i.e. a Gaussian centered at the individual's previous features.
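As a rough sketch of this per-individual update, each row of the previous feature matrix can serve as the mean of its own Gaussian. The function name `step_individuals` is hypothetical, and $\sigma$ is assumed to be a scalar standard deviation applied to every coordinate.

```python
import numpy as np

def step_individuals(x_prev, sigma, rng):
    """Sample x_j^(t) around x_j^(t-1), independently for each individual j.

    Assumption: sigma is a scalar standard deviation (isotropic noise).
    """
    x_prev = np.asarray(x_prev, dtype=float)
    return x_prev + sigma * rng.standard_normal(x_prev.shape)

rng = np.random.default_rng(0)
x_prev = np.array([[0.0, 0.0], [1.0, 2.0], [-1.0, 0.5]])
x_next = step_individuals(x_prev, sigma=0.1, rng=rng)
```

Because every individual keeps its own mean, a point that sits in the positive cluster tends to stay there from one step to the next.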

### 1.1.2 Group Data Generation
The whole group sharing the protected attribute benefits from positive decisions for individuals.

Again, points of the same generation are similar:
$$\mathcal{N}\left( x_i^{(t)}; \overline{x}_{i-1} + \alpha \cdot \vec{v} \cdot \sum \hat{y}_i, \sigma \right)$$
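One possible reading of this formula is sketched below. It assumes that $\overline{x}$ is the group's mean feature vector at the previous step, that the sum runs over the predictions $\hat{y}_i$ for the group's members, that $\vec{v}$ is a fixed direction in feature space, and that $\sigma$ is a scalar standard deviation. The function name `step_group` is hypothetical.

```python
import numpy as np

def step_group(x_prev, y_hat, v, alpha, sigma, rng):
    """Sample a new generation for one group around a shifted group mean.

    Assumptions: the mean is the previous group mean moved along v by
    alpha times the number of positive predictions; sigma is isotropic.
    """
    x_prev = np.asarray(x_prev, dtype=float)
    mean = x_prev.mean(axis=0) + alpha * np.asarray(v, dtype=float) * np.sum(y_hat)
    return mean + sigma * rng.standard_normal(x_prev.shape)

rng = np.random.default_rng(0)
x_prev = np.array([[0.0, 0.0], [2.0, 2.0]])
# With sigma = 0 the samples collapse onto the shifted mean.
x_next = step_group(x_prev, y_hat=[1, 1], v=[1.0, 0.0], alpha=0.5, sigma=0.0, rng=rng)
```

In this reading every positive decision moves the whole group by $\alpha \cdot \vec{v}$, which is how individual decisions benefit all members sharing the protected attribute.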

## 1.2 Metric and Prediction Function
To avoid re-implementing metrics and decision functions, wrappers for AIF360 functions and metrics are provided.
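To illustrate the kind of interface such a wrapper could expose, the sketch below computes a statistical parity difference by hand with the call signature `Metric(X, A, Y)` used by the plot generator. It deliberately does not call AIF360. The function name and the convention that `A == 1` marks the protected group are assumptions made for this example.

```python
import numpy as np

def statistical_parity_difference(X, A, Y):
    """P(Y = 1 | protected) - P(Y = 1 | unprotected).

    X is unused and kept only so that all metrics share one signature.
    Assumption: A == 1 marks the protected group.
    """
    A = np.asarray(A)
    Y = np.asarray(Y)
    return float(Y[A == 1].mean() - Y[A == 0].mean())

A = np.array([1, 1, 0, 0])
Y = np.array([1, 0, 1, 1])
spd = statistical_parity_difference(None, A, Y)
```

A value below zero means the protected group receives the positive label less often than the other group.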

## 1.3 Plot Generator
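Finally, the plot generator runs the long-term simulation by repeatedly sampling new data, running predictions and computing the metric. To estimate the impact of decision rules, a baseline data pipeline is simulated alongside the true one; in the baseline, the data generator is always given positive labels.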

Input: Metric, DecisionFunction, DataGenerator, Steps

X_old, A_old, Y_old <- DataGenerator.InitializeData
X_base_old, A_base_old, Y_base_old <- X_old, A_old, Y_old 

TrueMetric, BaselineMetric <- empty lists
TrueMetric.append(Metric(X_old, A_old, Y_old))
BaselineMetric.append(Metric(X_base_old, A_base_old, Y_base_old))

For s in Steps:
    X_new, A_new, Y_new <- DataGenerator.Sample(X_old, A_old, Y_old)
    TrueMetric.append(Metric(X_new, A_new, Y_new))
    
    Y_pos <- PositiveLabel of shape Y_old
    X_base_new, A_base_new, Y_base_new <- DataGenerator.Sample(X_base_old, A_base_old, Y_pos)
    BaselineMetric.append(Metric(X_base_new, A_base_new, Y_base_new))
    
    Append X_new, ..., X_base_new, Y_base_new to the old data
 
Output: Plot(TrueMetric, BaselineMetric)
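
The loop above could be written in Python roughly as follows. This is a sketch under assumptions, not the notebook's implementation: `ToyGenerator` is a hypothetical stand-in whose positive rate simply follows the labels it is given, the metric is any callable with signature `metric(X, A, Y)`, and the function returns the two metric series instead of plotting them. Like the pseudocode body, it does not use a decision function.

```python
import numpy as np

class ToyGenerator:
    """Hypothetical stand-in for the data generator (not the notebook's class)."""

    def __init__(self, n=100, seed=0):
        self.n = n
        self.rng = np.random.default_rng(seed)

    def _draw(self, p_pos):
        A = self.rng.integers(0, 2, size=self.n)       # protected attribute
        Y = self.rng.binomial(1, p_pos, size=self.n)   # labels
        X = self.rng.normal(loc=Y[:, None], scale=0.5, size=(self.n, 2))
        return X, A, Y

    def initialize(self):
        return self._draw(0.5)

    def sample(self, X_old, A_old, Y_old):
        # Toy feedback: the new positive rate is the mean of the labels passed in.
        return self._draw(float(np.mean(Y_old)))

def simulate(metric, generator, steps):
    X, A, Y = generator.initialize()
    Xb, Ab, Yb = X, A, Y                               # baseline starts from the same data
    true_metric = [metric(X, A, Y)]
    base_metric = [metric(Xb, Ab, Yb)]
    for _ in range(steps):
        X_new, A_new, Y_new = generator.sample(X, A, Y)
        true_metric.append(metric(X_new, A_new, Y_new))
        Y_pos = np.ones_like(Yb)                       # baseline: all labels positive
        Xb_new, Ab_new, Yb_new = generator.sample(Xb, Ab, Y_pos)
        base_metric.append(metric(Xb_new, Ab_new, Yb_new))
        # Append the new samples to the old data of each pipeline.
        X, A, Y = (np.concatenate(p) for p in ((X, X_new), (A, A_new), (Y, Y_new)))
        Xb, Ab, Yb = (np.concatenate(p) for p in ((Xb, Xb_new), (Ab, Ab_new), (Yb, Yb_new)))
    return true_metric, base_metric

true_metric, base_metric = simulate(lambda X, A, Y: float(np.mean(Y)), ToyGenerator(), steps=5)
```

The two returned lists have one entry for the initial data plus one per step and can be passed to any plotting routine to compare the true pipeline against the baseline.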