# **A fast and model agnostic machine learning fairness solution**
## Why do we talk about fairness in Machine Learning ? 

In its famous 2016 article on Machine Bias, ProPublica analysed the software used in US courts to score a defendant's likelihood of recidivism. They showed that the model was heavily biased against black defendants and, most of the time, failed to predict recidivism among white defendants. Indeed, black defendants were almost twice as likely as white defendants to be labelled higher risk, yet did not actually re-offend more.
&nbsp;

In 2015, Amazon realized that its recruiting algorithm was far from gender neutral, mostly because it had been trained on 10 years of hiring data dominated by men. The algorithm penalised women's profiles because it considered the male attribute to be a good predictor for hiring. Amazon tried different solutions to remove this gender bias but eventually dropped the project.
&nbsp;

This is a perfect example of how ML solutions reflect society's biases, since they learn from "human" data, and of why fairness in ML isn't just about removing the sensitive attribute from the data (hidden bias can remain). It raises the question of how transparent and explainable these solutions should be, especially in domains such as justice, employment or credit. As algorithms become more complex, it becomes harder to understand the pattern a model uses to make a decision (i.e. why it hired a specific person or predicted a high recidivism score) and, as the previous examples showed, discrimination can lead to lower performance in real-world applications.
&nbsp;

Furthermore, as laws across the world evolve to catch up with these models, businesses are even more incentivised to use fair algorithms. Indeed, the time and profit gained from using a recruiting or credit scoring algorithm are significant, but such a model must first comply with local laws to be usable.
Moreover, it is often profitable for a company to use fair algorithms, as they can help it target new customers. For instance, a bank with a historical majority of middle-aged and senior customers might want to target young adults for risk diversification. To do so, it needs its credit scoring algorithm to accept younger people, even though the model has only learned to recognize middle-aged and senior customers as profitable.
&nbsp;

This is why DreamQuark is adding an innovative method to Brain, its explainable auto ML software. The method can be plugged in after any machine learning model, for any task (classification or regression), to make it fairer with respect to sensitive groups and fairness definitions chosen by the user. Our Fairness Calibrator provides metrics and visualizations, and tackles the problem of hidden bias for tabular data.

Each "fairness problem" is highly domain-specific and thus requires its own definition of fairness. For instance, in a credit scoring situation, we would want equality of outcome (i.e. the predicted target should be distributed the same way) between women and men, whereas in a fraud detection setting we would rather look at the false positive and true positive rates and make sure they are not too different between sensitive groups.

## **The problem of hidden bias**
One of the major issues in fair AI is hidden bias. Also known as implicit bias, it refers to the attitudes or stereotypes that affect our understanding, actions, and decisions in an unconscious manner. In Machine Learning, it concerns the variables that can act as predictors of a protected attribute. For instance, in a resume, 'AS Roma fan' in the interests section might be a predictor of gender.

As its name indicates, hidden bias is really hard to find in a dataset, which makes removing it even harder. To tackle this issue, we remove the sensitive attribute column before training. This means that any bias the model learns can only come from the hidden part (i.e. from the other variables). After training, we 'tell' our calibrators the values of the sensitive attribute column so that they can optimize fairness across these groups. Our post-processing optimization therefore only has to remove the hidden biases, since the explicit ones are removed directly before training.

## **How to integrate fairness into business oriented auto ML tool ?**

### Auto ML constraints
Because DreamQuark's software, Brain, is a business-oriented auto ML platform, we had to solve several domain-specific issues. First, our solution must be agnostic to the model and the use case to fit the auto ML framework. Indeed, in auto ML, it is crucial to implement a general framework that can be used in most scenarios. This is also one of the greatest strengths of our solution, since it allows any user to apply it easily.

Second, we had a constraint in terms of computation time. The method shouldn't increase the duration of a usual auto ML task.

Third, since DreamQuark is focused on ethical and explainable AI, the explainability of the model must be preserved.

Finally, it was important to make the solution as understandable as possible for the end business user, meaning interpretable visualisations and explainable results.

### Existing Solutions

Research in fair machine learning has defined three ways of mitigating bias : pre-processing methods, in-training methods and post-processing methods. The main difference between these approaches is the time at which the bias is removed.
* Pre processing : (before training) a new representation of the dataset is learned in a hyperplane where the bias is not present. This can be seen as an embedding whose aim is to forget bias. This method loses the explainability of the model but is model agnostic (see https://arxiv.org/pdf/1802.04422.pdf).
* In training : during the training of the model, a new term accounting for fairness is added to the loss function. Other model-specific methods can also be derived (see https://arxiv.org/pdf/1810.05041.pdf). This approach is really efficient; however, it requires complete access to the model parameters (not possible for the xgboost and sklearn packages, for instance).
* Post processing : once the model has been trained, the output is transformed to forget the bias learned during training (see https://arxiv.org/pdf/1610.02413.pdf). This approach is model agnostic.

At DreamQuark, we chose to focus on the post-processing approach since it preserves explainability and is model agnostic, which fits our auto ML needs perfectly.

One of the most complete packages in fair machine learning is aif360 from IBM. The package contains a whole set of methods in the three categories presented above. Its models are among the best in terms of fairness and accuracy. However, its post-processing method only accepts the binary classification setting, and the number of sensitive groups is limited to 2. Furthermore, the computation time can be optimized, and no tradeoff between performance and fairness is presented.

## **Overview**
In this article, we propose a post-processing method to introduce fairness in binary classification, multiclass classification and regression tasks. The Fairness Calibrator can be plugged in after any model, as long as it has a `predict_proba` method for classification. Furthermore, our solution accepts up to 10 sensitive groups.

In the binary classification case, the calibrator object is fitted by learning a set of thresholds, one for each group, that are used for the final prediction. By allowing one threshold per group, the calibrator can differentiate the prediction between sensitive groups and thus improve the fairness metric.

<font size="3"><h1><center>$thresholds = \begin{pmatrix} t_1, \cdots, t_m \end{pmatrix}$</center></h1></font>

where m = number of sensitive groups

In the multiclass classification and regression settings, the calibrator object is fitted by learning a set of weights and biases that linearly transform the scores output by the model. In the classification setting, each (group, class) pair is assigned a weight and a bias, whereas in regression we only consider one (weight, bias) pair per sensitive group.

<font size="3"><h1><center>$fitted\:scores = weight * scores + bias$</center></h1></font>

<font size="3"><h1><center>$weight =  \begin{pmatrix} 
            w_{1,1} & w_{1,2} & \cdots & w_{1,n} \\ 
            w_{2,1} & w_{2,2} & \cdots & w_{2,n} \\
            \vdots  & \vdots  & \ddots & \vdots  \\
            w_{m,1} & w_{m,2} & \cdots & w_{m,n} 
         \end{pmatrix}
         \quad \quad
         bias = \begin{pmatrix} 
            b_{1,1} & b_{1,2} & \cdots & b_{1,n} \\ 
            b_{2,1} & b_{2,2} & \cdots & b_{2,n} \\
            \vdots  & \vdots  & \ddots & \vdots  \\
            b_{m,1} & b_{m,2} & \cdots & b_{m,n} 
         \end{pmatrix}$</center></h1></font>

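where n = number of classes and m = number of sensitive groups. Note that the product here is broadcasted: each sample's score vector is multiplied element-wise by the weight row of its group, and the matching bias row is added.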
The learning procedure is performed by optimizing a loss function that accounts for the fairness metric chosen by the user.

The core of our approach lies within this loss function. Indeed, for each definition, the corresponding loss is a continuous proxy (shown below) that allows us to use PyTorch's autograd to perform the gradient descent optimization very quickly.
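To make the linear transformation concrete, here is a minimal sketch in plain Python. It is an illustration only: the function name `calibrate_scores` and all numeric values are assumptions, not the Fairness Calibrator's actual API, and the weights and biases are set by hand rather than learned.

```python
def calibrate_scores(scores, groups, weights, biases):
    """Apply fitted = weight * score + bias, using the row of each sample's group."""
    calibrated = []
    for row, g in zip(scores, groups):
        calibrated.append([weights[g][c] * s + biases[g][c] for c, s in enumerate(row)])
    return calibrated


# Two samples, three classes, two sensitive groups (illustrative values only).
scores = [[0.2, 0.5, 0.3], [0.6, 0.3, 0.1]]
groups = [0, 1]
weights = [[1.0, 1.0, 1.0], [0.5, 2.0, 1.0]]   # shape (m, n)
biases = [[0.0, 0.0, 0.0], [0.1, 0.0, -0.1]]   # shape (m, n)

calibrated = calibrate_scores(scores, groups, weights, biases)
# The final prediction is the class with the highest calibrated score.
predictions = [row.index(max(row)) for row in calibrated]
```

Group 0 keeps its scores unchanged (identity weights, zero biases), while the second sample, which belongs to group 1, sees its predicted class move from class 0 to class 1.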

# **1. Binary Classification**

In the binary classification settings, the thresholds are learned by minimizing a loss function that measures the fairness metric specified by the user. More precisely, we allow the user to choose between two definitions :

## &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; **1.A) Demographic Parity**


<font size="3"><h1><center>$min \left(\frac{P(\hat{y} = c| a = g_i )}{P(\hat{y} = c| a = g_j)}, \frac{P(\hat{y} = c| a = g_j)}{P(\hat{y} = c | a= g_i)}\right) > p \quad  \forall g_i,g_j \in G \quad \forall c \in C$</center></h1></font>

where G is the set containing the sensitive groups and C the set containing the classes.

In this definition, we want the probability of success to be roughly the same across sensitive groups. For instance, when granting a loan, we would want women and men to have the same probability of getting one. The value of p measures how "close" we want the probabilities to be. To estimate these probabilities, we simply take the empirical means within each sensitive group.

To find the best thresholds we consider the loss function :
<font size="3"><h1><center>$- mean_{c}\left( \frac{\min_{g} P(\hat{y} = c| a = g)}{\max_{g} P(\hat{y} = c| a = g) + \epsilon} \right) \quad  c \in C, g \in G$</center></h1></font>

G is the set containing the sensitive groups and C the one containing the classes.

The difficulty is that a prediction made by hard thresholding is not differentiable with respect to the thresholds. To solve this issue, we shift the scores by the thresholds and multiply the result by a factor beta (~50).
This pushes the scores below the thresholds towards large negative values and those above towards large positive values. We then feed this modified vector to a sigmoid function, which rescales everything between 0 and 1.
<font size="3"><h1><center>$sigmoid(beta * (score-thresholds))$</center></h1></font>

Thanks to the beta factor, most values are large in absolute value, which makes the sigmoid function output a vector really close to the actual prediction.
Once we have our proxy of the predicted vector, we can compute the demographic parity metric. This final result is our loss value at each iteration. Note that it directly depends on the thresholds, which are the parameters to optimize.

<img src="Images/sigmoid.png">
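The sigmoid proxy described above can be sketched in a few lines of plain Python. The helper name `soft_predictions` and the sample values are assumptions for illustration, and scores are assumed to be probabilities in [0, 1].

```python
import math


def soft_predictions(scores, groups, thresholds, beta=50.0):
    """Smooth stand-in for the hard rule `score > threshold of the sample's group`."""
    return [
        1.0 / (1.0 + math.exp(-beta * (s - thresholds[g])))
        for s, g in zip(scores, groups)
    ]


scores = [0.9, 0.2, 0.55, 0.45]
groups = [0, 0, 1, 1]
thresholds = [0.5, 0.5]

soft = soft_predictions(scores, groups, thresholds)
hard = [1 if s > thresholds[g] else 0 for s, g in zip(scores, groups)]
```

Scores far from their threshold land very close to 0 or 1, while scores near it (0.55 and 0.45 here) stay visibly soft. This is the region where the gradient with respect to the thresholds is informative.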

The parameters to optimize are the thresholds (size n_groups), where n_groups is the number of sensitive groups.

Each thresholds vector leads to a different predicted vector that gives a specific demographic parity metric.
For each class c, we compute the demographic parity metric, then we take the mean over the classes. There is also an option to take the average weighted by the relative frequency of each class. Because the loss is differentiable, PyTorch's autograd lets us optimize it efficiently. The lower this loss, the higher the demographic parity.

<img src="Images/DP_binary_loss">
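As a rough illustration of the loss above, the following sketch evaluates it on the sigmoid proxy for two candidate threshold vectors. It is written in plain Python and is not the PyTorch implementation; the function name and data are hypothetical, and the unweighted mean over the two classes is used.

```python
import math


def demographic_parity_loss(scores, groups, thresholds, beta=50.0, eps=1e-8):
    """-mean over classes of min_g P(y_hat = c | g) / (max_g P(y_hat = c | g) + eps)."""
    soft = [
        1.0 / (1.0 + math.exp(-beta * (s - thresholds[g])))
        for s, g in zip(scores, groups)
    ]
    ratios = []
    for cls in (0, 1):
        rates = []
        for g in range(len(thresholds)):
            # Soft probability of predicting `cls`, averaged within group g.
            members = [p if cls == 1 else 1.0 - p for p, gi in zip(soft, groups) if gi == g]
            rates.append(sum(members) / len(members))
        ratios.append(min(rates) / (max(rates) + eps))
    return -sum(ratios) / len(ratios)


# Group 1 receives systematically lower scores than group 0 (made-up data).
scores = [0.9, 0.8, 0.3, 0.2, 0.45, 0.4, 0.3, 0.2]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

loss_equal = demographic_parity_loss(scores, groups, [0.5, 0.5])
loss_shifted = demographic_parity_loss(scores, groups, [0.5, 0.35])
```

With a common threshold of 0.5, almost nobody in group 1 is predicted positive and the loss is far from its minimum of -1. Lowering the threshold of group 1 to 0.35 brings the two positive rates together and the loss close to -1.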

At each gradient descent step, we get thresholds that lead to a modified prediction vector. From this prediction we can compute the fairness and performance metrics, which gives us a tradeoff between the two.

In this article we often mention the tradeoff between performance and fairness. It is important to understand that this tradeoff only exists within the dataset. By removing the bias, we necessarily lose some performance as measured on this specific dataset, but this could also lead to better performance in the real world, since the original bias has been removed from the decision process.


<table><tr>
<td><img src="Images/DP_binary_tradeoff">
<td><img src="Images/DP_binary_results">
</tr></table>

Early stopping is performed on the validation fairness metric and the best iteration on the validation set is kept. The user can of course choose which iteration offers the best tradeoff for their specific problem. Note that the performance metric can also be specified. With the optimal thresholds, we can now make our prediction and visualize the results.
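The fitting loop with best-iteration tracking can be sketched as follows. To stay dependency-free, this sketch swaps in central finite differences and plain gradient descent for PyTorch's autograd and its optimizer, and only considers the positive class. The `patience` rule, the learning rate and the synthetic data are all assumptions made for illustration.

```python
import math


def dp_loss(scores, groups, thresholds, beta=50.0, eps=1e-8):
    """Negative min/max ratio of soft positive rates across groups."""
    rates = []
    for g, t in enumerate(thresholds):
        soft = [1.0 / (1.0 + math.exp(-beta * (s - t)))
                for s, gi in zip(scores, groups) if gi == g]
        rates.append(sum(soft) / len(soft))
    return -min(rates) / (max(rates) + eps)


def numerical_grad(f, params, h=1e-4):
    """Central finite differences, standing in for autograd."""
    grad = []
    for i in range(len(params)):
        up = params[:i] + [params[i] + h] + params[i + 1:]
        down = params[:i] + [params[i] - h] + params[i + 1:]
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def fit_thresholds(train, valid, n_groups, lr=0.01, max_steps=200, patience=20):
    thresholds = [0.5] * n_groups
    best = {"loss": float("inf"), "thresholds": list(thresholds), "step": 0}
    history = []
    for step in range(max_steps):
        # Fairness is monitored on the validation split only.
        valid_loss = dp_loss(valid[0], valid[1], thresholds)
        history.append(valid_loss)
        if valid_loss < best["loss"]:
            best = {"loss": valid_loss, "thresholds": list(thresholds), "step": step}
        elif step - best["step"] >= patience:
            break  # early stop: no validation improvement for `patience` steps
        grad = numerical_grad(lambda t: dp_loss(train[0], train[1], t), thresholds)
        thresholds = [t - lr * g for t, g in zip(thresholds, grad)]
    return best, history


def make_split(offset):
    """Synthetic scores: group 1 is scored about half as high as group 0."""
    g0 = [(i + offset) / 20 for i in range(1, 19)]
    g1 = [(i + offset) / 40 for i in range(1, 19)]
    return g0 + g1, [0] * len(g0) + [1] * len(g1)


train = make_split(0.0)
valid = make_split(0.5)
best, history = fit_thresholds(train, valid, n_groups=2)
```

The thresholds returned are those of the best validation iteration, not of the last one. On this synthetic data, the threshold of the lower-scored group ends up below that of the other group.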

## &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; **1.B) Equalized odds**

<font size="3"><h1><center>$E[\hat{y}|g, Y=c] = E[\hat{y}|Y=c] \quad \forall g \in G,\quad  \forall \hat{y}, c \in C$</center></h1></font>

In this definition, we consider a model fair when the true positive rates and false positive rates of the groups are "close" to each other. In the multiclass setting, we consider the one-vs-all approach, where the false positive rate and true positive rate for class c are defined as follows:

<font size="3"><h1><center>$FPR_c = \frac{(\hat{y} = c \cap  y_{true} \ne c)}{y_{true} \ne c} \quad TPR_c = \frac{(\hat{y} = c \cap y_{true} =c)}{y_{true} = c}$</center></h1></font>

Similar to what we did for demographic parity, we minimize a differentiable loss function with Adam. Here we consider a linear combination of TPR fairness and FPR fairness, where lambda is a hyperparameter that measures the importance of TPR fairness over FPR fairness.

<font size="3"><h1><center>$- \left(\lambda * mean_{c}\left( \frac{\min_{g} TPR_{c,g}}{\max_{g} TPR_{c,g} }\right) + (1 - \lambda) * mean_{c} \left(\frac{\min_{g} FPR_{c,g}}{\max_{g} FPR_{c,g} }\right)\right)$</center></h1></font>

where c is the class and g the sensitive group.

We approximate the prediction in a differentiable fashion with the sigmoid method exactly as we did for Demographic Parity. However, instead of computing the demographic parity metric, we compute the equalized odds one on this proxy.
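A minimal sketch of this loss on the sigmoid proxy is given below, in plain Python. It is a simplified illustration: only the positive class is considered, a small `eps` is added to the denominators to avoid division by zero (the formula above has none), and the function names and data are hypothetical.

```python
import math


def soft_rate(soft, mask):
    """Mean soft prediction over the samples selected by `mask`."""
    selected = [p for p, keep in zip(soft, mask) if keep]
    return sum(selected) / len(selected)


def equalized_odds_loss(scores, labels, groups, thresholds, lam=0.5, beta=50.0, eps=1e-8):
    soft = [
        1.0 / (1.0 + math.exp(-beta * (s - thresholds[g])))
        for s, g in zip(scores, groups)
    ]
    tpr, fpr = [], []
    for g in range(len(thresholds)):
        # Soft TPR: mean soft prediction among true positives of group g.
        tpr.append(soft_rate(soft, [gi == g and y == 1 for gi, y in zip(groups, labels)]))
        # Soft FPR: mean soft prediction among true negatives of group g.
        fpr.append(soft_rate(soft, [gi == g and y == 0 for gi, y in zip(groups, labels)]))
    tpr_fairness = min(tpr) / (max(tpr) + eps)
    fpr_fairness = min(fpr) / (max(fpr) + eps)
    return -(lam * tpr_fairness + (1 - lam) * fpr_fairness)


scores = [0.9, 0.7, 0.6, 0.2, 0.8, 0.3, 0.4, 0.1]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

loss_equal = equalized_odds_loss(scores, labels, groups, [0.5, 0.5])
loss_shifted = equalized_odds_loss(scores, labels, groups, [0.65, 0.25])
```

The second pair of thresholds equalizes the true positive rates of the two groups, which lowers the loss, although the false positive rates remain far apart. Changing `lam` changes how much that remaining gap costs.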

We compute how 'close' the FPR and TPR of each group are to each other and we take the mean over the classes. Here again, TPR and FPR depend on the scores that are modified by the thresholds at each iteration. Minimizing this loss function leads directly to improving the equalized odds metric. Below you will find the tradeoff obtained during training between performance and fairness.

Similar to what we did for demographic parity, we can plot the results as a histogram to visualize what actually improved compared to the initial model.
<table><tr>
<td><img src="Images/EQ_binary_tradeoff">
<td><img src="Images/EQ_binary_results"> 
</tr></table>

It is important to note that the higher the number of classes and sensitive groups, the harder it is to reach a good fairness metric value without completely killing the performance.

# **2. Multiclass Classification**

In the multiclass classification setting, the weights and biases are learned by minimizing a loss function that measures the fairness metric specified by the user. More precisely, we allow the user to choose between two definitions :

## &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; **2.A) Demographic Parity**


<font size="3"><h1><center>$min \left(\frac{P(\hat{y} = c| a = g_i )}{P(\hat{y} = c| a = g_j)}, \frac{P(\hat{y} = c| a = g_j)}{P(\hat{y} = c | a= g_i)}\right) > p \quad  \forall g_i,g_j \in G \quad \forall c \in C$</center></h1></font>

where G is the set containing the sensitive groups and C the set containing the classes.

In this definition, we want the probability of success to be roughly the same across sensitive groups. For instance, when granting a loan, we would want women and men to have the same probability of getting one. The value of p measures how "close" we want the probabilities to be. To estimate these probabilities, we simply take the empirical means within each sensitive group.

To find the best couple (weights, biases) we consider the loss function :
<font size="3"><h1><center>$- mean_{c}\left( \frac{\min_{g} P(\hat{y} = c| a = g)}{\max_{g} P(\hat{y} = c| a = g) + \epsilon} \right) \quad  c \in C, g \in G$</center></h1></font>


G is the set containing the sensitive groups and C the one containing the classes.

Similar to what we did in the binary case, we made this loss function differentiable by approximating the prediction vector using the softmax function on the scores multiplied by a factor beta to obtain values as close to 0 and 1 as possible. We then compute the fairness metric on the proxy. 
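The softmax proxy can be sketched in plain Python as follows. The helper names `soft_one_hot` and `soft_class_rates` and the sample scores are assumptions for illustration; in the actual method the scores would first go through the per-group linear transformation.

```python
import math


def soft_one_hot(row, beta=50.0):
    """Softmax of beta * scores: a smooth approximation of the one-hot prediction."""
    top = max(row)  # subtracting the max keeps the exponentials numerically stable
    exps = [math.exp(beta * (s - top)) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def soft_class_rates(scores, groups, n_groups, beta=50.0):
    """Soft estimate of P(y_hat = c | a = g) for every group g and class c."""
    n_classes = len(scores[0])
    sums = [[0.0] * n_classes for _ in range(n_groups)]
    counts = [0] * n_groups
    for row, g in zip(scores, groups):
        for c, p in enumerate(soft_one_hot(row, beta)):
            sums[g][c] += p
        counts[g] += 1
    return [[s / counts[g] for s in sums[g]] for g in range(n_groups)]


scores = [[0.2, 0.5, 0.3], [0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.2, 0.6, 0.2]]
groups = [0, 0, 1, 1]

proxy = soft_one_hot(scores[0])
rates = soft_class_rates(scores, groups, n_groups=2)
```

With beta around 50, each proxy row is almost exactly one-hot, so the per-group class rates match what the hard predictions would give while remaining differentiable. These rates are the quantities that enter the demographic parity ratio.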

The parameters to optimize are the weights (n_groups x n_class) and biases (n_groups x n_class), where n_groups is the number of sensitive groups and n_class the number of classes.

Each pair (weights, biases) leads to a different predicted vector that gives a specific demographic parity metric.
For each class c, we compute the demographic parity metric, then we take the mean over the classes. There is also an option to take the average weighted by the relative frequency of each class. Because the loss is differentiable, PyTorch's autograd lets us optimize it efficiently. The lower this quantity, the higher the demographic parity.

<img src="Images/DP_loss">

At each gradient descent step, we get weights and biases that lead to modified scores (through the linear transformation explained above), which give the prediction. From this prediction we can compute the fairness and performance metrics, which gives us a tradeoff between the two.


<table><tr>
<td><img src="Images/DP">
<td><img src="Images/DP_results">
</tr></table>

Early stopping is performed on the validation fairness metric and the best iteration on the validation set is kept. The user can of course choose which iteration offers the best tradeoff for their specific problem. Note that the performance metric can also be specified. With the optimal weights and biases, we can now make our prediction and visualize the results.


The Calibrator object also has a `predict_proba` method to update existing scores and a `predict` method to make fair predictions once the classifier has been fitted. Lastly, the distribution of the scores of each class can be plotted.

<img src="Images/DP_distplot">

## &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; **2.B) Equalized odds**

<font size="3"><h1><center>$E[\hat{y}|g, Y=c] = E[\hat{y}|Y=c] \quad \forall g \in G,\quad  \forall \hat{y}, c \in C$</center></h1></font>

In this definition, we consider a model fair when the true positive rates and false positive rates of the groups are "close" to each other. In the multiclass setting, we consider the one-vs-all approach, where the false positive rate and true positive rate for class c are defined as follows:

<font size="3"><h1><center>$FPR_c = \frac{(\hat{y} = c \cap  y_{true} \ne c)}{y_{true} \ne c} \quad TPR_c = \frac{(\hat{y} = c \cap y_{true} =c)}{y_{true} = c}$</center></h1></font>

Similar to what we did for demographic parity, we minimize a differentiable loss function with Adam. Here we consider a linear combination of TPR fairness and FPR fairness, where lambda is a hyperparameter that measures the importance of TPR fairness over FPR fairness.

<font size="3"><h1><center>$- \left(\lambda * mean_{c}\left( \frac{\min_{g} TPR_{c,g}}{\max_{g} TPR_{c,g} }\right) + (1 - \lambda) * mean_{c} \left(\frac{\min_{g} FPR_{c,g}}{\max_{g} FPR_{c,g} }\right)\right)$</center></h1></font>

where c is the class and g the sensitive group.

We compute how 'close' the FPR and TPR of each group are to each other and we take the mean over the classes. Here again, TPR and FPR depend on the scores that are modified by the pair (weights, biases) at each iteration. Minimizing this loss function leads directly to improving the equalized odds metric. Below you will find the tradeoff obtained during training between performance and fairness.

<img src="Images/EQ_tradeoff">
 
Similar to what we did for demographic parity, we can plot the results as a histogram to visualize what actually improved compared to the initial model.

<table><tr>
<td><img src="Images/EQ_0"> 
<td><img src="Images/EQ_1"> 
<td><img src="Images/EQ_2"> 
</tr></table>

It is important to note that the higher the number of classes and sensitive groups, the harder it is to reach a good fairness metric value without completely killing the performance.

# **3. Regression**
Similar to what we did in the classification settings, we are modifying the output of a model using linear transformation. Once again, the weights and biases are learned via optimization of a loss function that accounts for the fairness metric. 

Since regression is not a common task in fair ML, we adapted demographic parity to fit this new setting. We consider a regression model to be fair when the output means of the sensitive groups are 'close' to each other. Thus we consider this loss function :

<font size="3"><h1><center>$\left(\max_g(mean_g(\hat{y}))- \min_g(mean_g(\hat{y})) \right)^2 \quad g \in G$</center></h1></font>
where G is the set containing the groups.

We compute the empirical mean of the predictions within each group and try to reduce the squared difference between the largest and the smallest of these means. In the following plots, reg_fairness refers to the negative of this squared difference.

<table><tr>
<td><img src="Images/reg_tradeoff">
<td><img src="Images/reg_results"> 
</tr></table>
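The regression loss above is simple enough to sketch directly in plain Python. The function name and the numbers are illustrative assumptions, and the (weight, bias) pairs are set by hand here instead of being learned.

```python
def regression_fairness_loss(predictions, groups, weights, biases):
    """Squared gap between the largest and smallest group means of the calibrated output."""
    sums, counts = {}, {}
    for y, g in zip(predictions, groups):
        adjusted = weights[g] * y + biases[g]  # one (weight, bias) pair per group
        sums[g] = sums.get(g, 0.0) + adjusted
        counts[g] = counts.get(g, 0) + 1
    means = [sums[g] / counts[g] for g in sums]
    return (max(means) - min(means)) ** 2


predictions = [10.0, 20.0, 30.0, 40.0]
groups = [0, 0, 1, 1]

# Identity transformation: group means are 15 and 35.
loss_before = regression_fairness_loss(predictions, groups, [1.0, 1.0], [0.0, 0.0])
# Shifting each group by its own bias brings both means to 25.
loss_after = regression_fairness_loss(predictions, groups, [1.0, 1.0], [10.0, -10.0])
```

Driving this loss to zero with biases alone shifts every prediction of a group by the same amount. The learned weights give the optimizer additional freedom, and the performance metric monitored alongside shows what the shift costs.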

# AIF360 Comparison 

We are going to compare IBM's postprocessing method, called Reject Option Classification, with DreamQuark's own postprocessing methods (weights/bias and thresholds). For more details on the comparison, see the notebook Score_Comparison.ipynb.

**Method** : IBM is using the Reject option classification method. 'Reject option classification is a postprocessing technique that gives favorable outcomes to unpriviliged groups and unfavorable outcomes to priviliged groups in a confidence band around the decision boundary with the highest uncertainty' https://aif360.readthedocs.io/en/v0.2.3/modules/postprocessing.html

DreamQuark is using PyTorch to learn a set of parameters. In the weights/biases method, the parameters are the (w,b) that linearly transform the initial scores. The scores of each sensitive group are transformed by the group's own (w,b), thus allowing differentiation and bias removal. The thresholds method works in the same fashion, each group receiving its own threshold.

**Comparison settings** : We reproduced IBM's calibrator by importing their module aif360 (pip install aif360) and used the demo_reject_option_classification method. We compared the results with our own method applied to the same data (using their metrics).

**Metrics** : We focused on two linked metrics used by IBM : Disparate impact and Statistical parity difference

<font size="3"><h1><center>$Disparate\ impact : \frac{P(\hat{y} = 1| a = unfavoured\ group )}{P(\hat{y} = 1| a = privileged\ group)}$</center></h1></font>

<font size="3"><h1><center>$Statistical\ parity\ difference : P(\hat{y} = 1| a = privileged\ group )-P(\hat{y} = 1| a = unfavoured\ group)$</center></h1></font>
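Both metrics can be computed directly from hard predictions. The sketch below follows the two formulas above, including their sign convention (privileged minus unfavoured). The function names and the toy data are assumptions for illustration and do not call aif360.

```python
def positive_rate(predictions, groups, group):
    """Empirical P(y_hat = 1 | a = group)."""
    members = [p for p, g in zip(predictions, groups) if g == group]
    return sum(members) / len(members)


def disparate_impact(predictions, groups, unfavoured, privileged):
    return positive_rate(predictions, groups, unfavoured) / positive_rate(predictions, groups, privileged)


def statistical_parity_difference(predictions, groups, unfavoured, privileged):
    return positive_rate(predictions, groups, privileged) - positive_rate(predictions, groups, unfavoured)


predictions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["m", "m", "m", "m", "f", "f", "f", "f"]

di = disparate_impact(predictions, groups, unfavoured="f", privileged="m")
spd = statistical_parity_difference(predictions, groups, unfavoured="f", privileged="m")
```

Here the positive rates are 0.75 and 0.25, so the disparate impact is one third and the statistical parity difference is 0.5. A perfectly fair model under these metrics would score 1 and 0 respectively.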

#### Datasets and problem description

The Adult Dataset contains data on roughly 50,000 individuals. The goal is to predict whether their annual income is above 50k or not. We are especially interested in the impact of the SEX and RACE features. Indeed, we will consider a model biased if we observe a difference in outcome between the sensitive groups created by these features.

The COMPAS dataset, as described on Kaggle :

COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) is a popular commercial algorithm used by judges and parole officers for scoring criminal defendant’s likelihood of reoffending (recidivism). It has been shown that the algorithm is biased in favor of white defendants, and against black inmates, based on a 2 year follow up study (i.e who actually committed crimes or violent crimes after 2 years).

Here too, we consider two sensitive attributes, SEX and RACE.

## Adult Dataset

##### Sex
| Method | Disparate Impact | Statistical parity difference | Accuracy | Balanced Accuracy | F1 score | Fit Time (seconds) |
| --- | --- | --- | --- | --- | --- | --- |
| Initial state | 0.2794 | 0.3580 | 0.7408 | 0.7437 | 0.5841 | |
| IBM | 0.9088 | 0.0402 | 0.6944 | 0.7140 | 0.5446 | 23.18 |
| DreamQuark (weights/bias) | 0.8063 | 0.0255 | 0.7782 | 0.6087 | 0.3795 | 1.36 |
| DreamQuark (thresholds) | 0.9389 | 0.0125 | 0.7817 | 0.6746 | 0.50935 | 1.3 |
| DreamQuark (thresholds) (at ~0.9088 DI) | 0.9083 | | | 0.6748 | | |

##### Race
| Method | Disparate Impact | Statistical parity difference | Accuracy | Balanced Accuracy | F1 score | Fit Time (seconds) |
| --- | --- | --- | --- | --- | --- | --- |
| Initial state | 0.4122 | 0.2434 | 0.7408 | 0.7437 | 0.5841 | |
| IBM | 0.9059 | 0.0390 | 0.7240 | 0.7412 | 0.5770 | 23.10 |
| DreamQuark (weights/bias) | 0.9183 | 0.008 | 0.7966 | 0.6222 | 0.4036 | 2.5 |
| DreamQuark (thresholds) | 0.999 | 0.000 | 0.8027 | 0.6698 | 0.5034 | 1.02 |
| DreamQuark (thresholds at ~ 0.9059 DI) | 0.9024 | | | 0.6724 | | |

## Compas Dataset

##### Race
| Method | Disparate Impact | Statistical parity difference | Accuracy | Balanced Accuracy | F1 score | Fit Time (seconds) |
| --- | --- | --- | --- | --- | --- | --- |
| Initial state | 0.4127 | 0.2724 | 0.6774 | 0.6776 | 0.6483 | |
| IBM | 0.8992 | 0.0578 | 0.6515 | 0.6512 | 0.6658 | 4.85 |
| DreamQuark (weights/bias) | 0.8669 | 0.05146 | 0.6527 | 0.6538 | 0.6008 | 0.25 |
| DreamQuark (thresholds) | 0.9065 | 0.0535 | 0.6489 | 0.6486 | 0.6642 | 0.15 |
| DreamQuark (weights/bias at ~ 0.8992 DI) | 0.8945 | | | 0.6538 | | |


##### Sex
| Method | Disparate Impact | Statistical parity difference | Accuracy | Balanced Accuracy | F1 score | Fit Time (seconds) |
| --- | --- | --- | --- | --- | --- | --- |
| Initial state | 0.5163 | 0.2493 | 0.6776 | 0.6774 | 0.6493 | |
| IBM | 0.9325 | 0.0312 | 0.6742 | 0.6745 | 0.6614 | 4.86 |
| DreamQuark (weights/bias) | 0.9008 | 0.0312 | 0.6338 | 0.6352 | 0.5496 | 0.28 |
| DreamQuark (thresholds) | 0.9686 | 0.0160 | 0.6704 | 0.6704 | 0.6708 | 0.20 |
| DreamQuark (thresholds) (at ~ 0.9325 DI) | 0.9265 | | | 0.6756 | | |

# Results Analysis

It is important to take both Disparate Impact and Statistical parity difference into account: the first measures the relative gap between groups (a ratio), the second the absolute gap (a difference).
We can see that IBM is almost always better in terms of balanced accuracy; however, DreamQuark can reach really high levels of fairness while maintaining a good amount of performance. Furthermore, since DreamQuark's approach is similar to a deep learning optimization, the user can choose exactly the tradeoff between fairness and performance.
One of DreamQuark's main advantages is computation time: in the tables above, its fit times are roughly 10 to 30 times lower than IBM's.

# Conclusion and further work

We proposed a package that is highly scalable by taking full advantage of PyTorch's autograd feature. We attained really good performance compared to the state of the art with a method that does not require access to the model's internals. This makes our approach easy to use, but also easy to tune thanks to its many hyperparameters and early stop options.



# Sources

* https://medium.com/@tonyxu_71807/ensuring-fairness-and-explainability-in-credit-default-risk-modeling-shap-ibm-tool-kit-aix-360-bfc519c191bf

* https://arxiv.org/pdf/1610.02413.pdf

* https://aif360.mybluemix.net/

* https://paulgoelz.de/papers/equalized.pdf

### Authors
* Quentin Raquet, Data Scientist at DreamQuark, https://www.linkedin.com/in/quentin-raquet/
* Hugo Paolini, Data Scientist Intern at DreamQuark, https://www.linkedin.com/in/hugo-paolini/

DreamQuark's website : https://www.dreamquark.com/