In [94]:
import sys
sys.path.append("../")

import numpy as np

from aif360.metrics import BinaryLabelDatasetMetric, SampleDistortionMetric, ClassificationMetric, DatasetMetric

from aif360.algorithms.preprocessing.optim_preproc import OptimPreproc
from aif360.algorithms.preprocessing.optim_preproc_helpers.data_preproc_functions\
            import load_preproc_data_adult
from aif360.algorithms.preprocessing.optim_preproc_helpers.distortion_functions\
            import get_distortion_adult
from aif360.algorithms.preprocessing.optim_preproc_helpers.opt_tools import OptTools

from IPython.display import Markdown, display

In [48]:
np.random.seed(1)

In [53]:
dataset_orig = load_preproc_data_adult(['race'])

dataset_orig_train, dataset_orig_test = dataset_orig.split([0.7], shuffle=True)

privileged_groups = [{'race': 1}] # White
unprivileged_groups = [{'race': 0}] # Not white

### BinaryLabelDatasetMetric

In [61]:
metric_orig_train = BinaryLabelDatasetMetric(dataset_orig_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_train.mean_difference())

Difference in mean outcomes between unprivileged and privileged groups = -0.097328
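`mean_difference()` is AIF360's alias for statistical parity difference: Pr(Y = 1 | unprivileged) minus Pr(Y = 1 | privileged), where Y = 1 is the favorable label. A negative value, as above, means the unprivileged group receives the favorable outcome less often. The following is a minimal NumPy sketch of that formula, not AIF360's implementation; the function name and the toy arrays are hypothetical.

```python
import numpy as np

def statistical_parity_difference(labels, protected, unprivileged=0, privileged=1):
    """Pr(Y=1 | unprivileged) - Pr(Y=1 | privileged) for binary 0/1 labels."""
    labels = np.asarray(labels)
    protected = np.asarray(protected)
    p_unpriv = labels[protected == unprivileged].mean()  # favorable rate, unprivileged
    p_priv = labels[protected == privileged].mean()      # favorable rate, privileged
    return p_unpriv - p_priv

# Hypothetical toy data: 1 = favorable outcome; race 1 = privileged, 0 = unprivileged
labels = np.array([1, 0, 0, 0, 1, 1, 0, 1])
race = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(statistical_parity_difference(labels, race))  # 0.25 - 0.75 = -0.5
```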


### ClassificationMetric

In [58]:
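# ClassificationMetric(dataset, classified_dataset, ...): here dataset_orig is passed as
# both the ground truth and the "predictions", so the predicted labels equal the true ones.
# It is also computed on the full dataset (48,842 rows) rather than the 70% training split,
# which is why mean_difference() differs slightly from the training-set value above.
metric_orig_train1 = ClassificationMetric(dataset_orig, dataset_orig,
                                          unprivileged_groups=unprivileged_groups,
                                          privileged_groups=privileged_groups)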

In [60]:
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_train1.mean_difference())

Difference in mean outcomes between unprivileged and privileged groups = -0.101445
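Unlike dataset-level metrics, ClassificationMetric compares predictions with ground truth. One example is `equal_opportunity_difference()`, the true positive rate of the unprivileged group minus that of the privileged group. The sketch below reimplements that definition in NumPy for illustration; the helper names and toy arrays are hypothetical. Passing the true labels as predictions, as in the cell above, makes both true positive rates 1, so the difference is 0.

```python
import numpy as np

def true_positive_rate(y_true, y_pred):
    """Fraction of actual positives (y_true == 1) that are predicted positive."""
    positives = y_true == 1
    return (y_pred[positives] == 1).mean()

def equal_opportunity_difference(y_true, y_pred, protected, unprivileged=0, privileged=1):
    """TPR(unprivileged) - TPR(privileged)."""
    u = protected == unprivileged
    p = protected == privileged
    return true_positive_rate(y_true[u], y_pred[u]) - true_positive_rate(y_true[p], y_pred[p])

# Hypothetical labels and predictions for eight people
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 1, 1, 0, 1])
race = np.array([0, 0, 0, 0, 1, 1, 1, 1])

print(equal_opportunity_difference(y_true, y_pred, race))  # 0.5 - 1.0 = -0.5
print(equal_opportunity_difference(y_true, y_true, race))  # perfect predictions: 0.0
```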


### DatasetMetric

In [68]:
metric_orig_train2 = DatasetMetric(dataset_orig,
                                   unprivileged_groups=unprivileged_groups,
                                   privileged_groups=privileged_groups)

In [87]:
metric_orig_train2.num_instances(privileged=None)

48842.0

### SampleDistortionMetric

In [92]:
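# SampleDistortionMetric compares two datasets sample by sample. Comparing dataset_orig
# with itself gives zero distortion; a more informative comparison is the original
# training set against the transformed one produced by OptimPreproc below.
metric_orig_train3 = SampleDistortionMetric(dataset_orig, dataset_orig,
                                            unprivileged_groups=unprivileged_groups,
                                            privileged_groups=privileged_groups)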
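To make "sample distortion" concrete, the sketch below measures it as the mean Euclidean distance between aligned rows of two feature matrices. This is an assumption chosen for illustration, not AIF360's code, and the function name and arrays are hypothetical. It also shows why comparing a dataset with itself, as in the cell above, yields zero.

```python
import numpy as np

def average_euclidean_distortion(X_orig, X_transf):
    """Mean per-sample Euclidean distance between aligned rows of two feature matrices."""
    X_orig = np.asarray(X_orig, dtype=float)
    X_transf = np.asarray(X_transf, dtype=float)
    if X_orig.shape != X_transf.shape:
        raise ValueError("datasets must have the same shape and row order")
    return np.linalg.norm(X_orig - X_transf, axis=1).mean()

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
X_changed = np.array([[0.0, 1.0], [0.0, 0.0], [1.0, 1.0]])  # one feature changed in row 2

print(average_euclidean_distortion(X, X))          # 0.0: identical datasets
print(average_euclidean_distortion(X, X_changed))  # one unit of change over 3 rows, about 0.333
```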

In [95]:
optim_options = {
    "distortion_fun": get_distortion_adult,
    "epsilon": 0.05,
    "clist": [0.99, 1.99, 2.99],
    "dlist": [.1, 0.05, 0]
}
    
OP = OptimPreproc(OptTools, optim_options)

OP = OP.fit(dataset_orig_train)
dataset_transf_train = OP.transform(dataset_orig_train, transform_Y=True)

dataset_transf_train = dataset_orig_train.align_datasets(dataset_transf_train)


This use of ``*`` has resulted in matrix multiplication.
Using ``*`` for matrix multiplication has been deprecated since CVXPY 1.1.
    Use ``*`` for matrix-scalar and vector-scalar multiplication.
    Use ``@`` for matrix-matrix and matrix-vector multiplication.
    Use ``multiply`` for elementwise multiplication.
This code path has been hit 1 times so far.



Optimized Preprocessing: Objective converged to 0.000000


In [96]:
metric_transf_train = BinaryLabelDatasetMetric(dataset_transf_train, 
                                               unprivileged_groups=unprivileged_groups,
                                               privileged_groups=privileged_groups)
display(Markdown("#### Transformed training dataset"))
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_transf_train.mean_difference())

#### Transformed training dataset

Difference in mean outcomes between unprivileged and privileged groups = -0.043280


### What types of bias mitigation algorithms are available?

Many bias mitigation strategies for machine learning have been proposed in recent years. They can be divided into three groups:

1. Pre-processing
Effective bias mitigation starts at the data acquisition and processing phase, because both the data source and the extraction methods can introduce unwanted bias. Considerable effort should therefore go into validating the integrity of the data source and ensuring that the data collection process uses appropriate and reliable measurement methods. Pre-processing algorithms act at this stage: they transform the training data so that it is more balanced before any model is trained. This can be done by suppressing the protected attributes, changing class labels in the data set, or reweighting or resampling the data. The OptimPreproc transformation used above is one example of this family.

2. In-processing
In-processing algorithms mitigate undesired bias directly during training. A straightforward way to achieve this is to add a fairness penalty to the loss function, so that the optimizer trades prediction accuracy against a fairness criterion.

3. Post-processing
Post-processing algorithms modify only the output of an already trained classifier. Their advantage is that a fair classifier is obtained without retraining the original model, which may be time-consuming or difficult in production environments. However, this approach may reduce accuracy or compromise the generalization learned by the original classifier.
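For the pre-processing family, one well-known reweighting scheme is reweighing (Kamiran and Calders), which AIF360 provides as the `Reweighing` class. Each (group, label) combination receives the weight P(A = a) · P(Y = y) / P(A = a, Y = y), so that group membership and label look statistically independent under the weights. The sketch below computes those weights directly in NumPy rather than calling AIF360; the helper names and toy arrays are hypothetical.

```python
import numpy as np

def reweighing_weights(labels, protected):
    """Instance weights w(a, y) = P(A=a) * P(Y=y) / P(A=a, Y=y)."""
    labels = np.asarray(labels)
    protected = np.asarray(protected)
    weights = np.empty(len(labels), dtype=float)
    for a in np.unique(protected):
        for y in np.unique(labels):
            mask = (protected == a) & (labels == y)
            if mask.any():
                expected = (protected == a).mean() * (labels == y).mean()  # if independent
                observed = mask.mean()                                     # actual joint rate
                weights[mask] = expected / observed
    return weights

# Hypothetical toy data: favorable rate 0.25 for race 0 and 0.75 for race 1
labels = np.array([1, 0, 0, 0, 1, 1, 0, 1])
race = np.array([0, 0, 0, 0, 1, 1, 1, 1])
w = reweighing_weights(labels, race)

def weighted_favorable_rate(mask):
    return np.average(labels[mask], weights=w[mask])

# After reweighing, both groups have the same weighted favorable rate (0.5 here)
print(weighted_favorable_rate(race == 0) - weighted_favorable_rate(race == 1))
```

Under these weights, a learner that supports sample weights sees no correlation between race and label, without any label being changed.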
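For the in-processing family, the fairness-penalty idea can be sketched as logistic regression trained by gradient descent on mean cross-entropy plus λ · gap², where gap is the difference in mean predicted score between the two groups (a demographic-parity style penalty). The penalty form, the learning rate, the value of λ, and the synthetic data are all assumptions for illustration; this is not one of AIF360's in-processing algorithms.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_penalized_logreg(X, y, group, lam, lr=0.5, n_iter=3000):
    """Gradient descent on mean cross-entropy + lam * gap**2, where
    gap = mean predicted score in group 0 - mean predicted score in group 1."""
    w = np.zeros(X.shape[1])
    g0, g1 = group == 0, group == 1
    for _ in range(n_iter):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) / len(y)                   # cross-entropy gradient
        dp = (p * (1.0 - p))[:, None] * X               # row i holds d p_i / d w
        gap = p[g0].mean() - p[g1].mean()
        dgap = dp[g0].mean(axis=0) - dp[g1].mean(axis=0)
        grad += 2.0 * lam * gap * dgap                  # fairness-penalty gradient
        w -= lr * grad
    return w

def score_gap(X, w, group):
    p = sigmoid(X @ w)
    return p[group == 0].mean() - p[group == 1].mean()

# Hypothetical synthetic data: the single feature is correlated with group membership
rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, size=n)
x = group + rng.normal(0.0, 0.5, size=n)
y = (x + rng.normal(0.0, 0.5, size=n) > 0.5).astype(float)
X = np.column_stack([np.ones(n), x])                    # intercept + feature

w_plain = train_penalized_logreg(X, y, group, lam=0.0)
w_fair = train_penalized_logreg(X, y, group, lam=10.0)
print("score gap without penalty:", score_gap(X, w_plain, group))
print("score gap with penalty:   ", score_gap(X, w_fair, group))
```

Raising λ shrinks the gap further at the cost of a higher cross-entropy, which is the accuracy/fairness trade-off the penalty encodes.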
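For the post-processing family, one simple scheme (chosen here for illustration, not one of AIF360's post-processing classes) keeps a trained model's scores and picks a separate decision threshold per group so that each group receives positive predictions at about the same target rate. The scores, the 0.3 target rate, and the helper names are hypothetical. Note that this requires the protected attribute at prediction time.

```python
import numpy as np

def group_thresholds(scores, group, target_rate):
    """Per-group cut-offs so that about target_rate of each group scores above its cut-off."""
    return {g: np.quantile(scores[group == g], 1.0 - target_rate) for g in np.unique(group)}

def predict_with_thresholds(scores, group, thresholds):
    cutoffs = np.array([thresholds[g] for g in group])
    return (scores > cutoffs).astype(int)

def positive_rate(pred, group, g):
    return pred[group == g].mean()

# Hypothetical scores from an already trained model; group 1 tends to score higher
rng = np.random.default_rng(1)
group = rng.integers(0, 2, size=2000)
scores = rng.normal(size=2000) + 0.8 * group

single = (scores > 0.5).astype(int)                     # one shared threshold
thresholds = group_thresholds(scores, group, target_rate=0.3)
adjusted = predict_with_thresholds(scores, group, thresholds)

print("shared threshold:", positive_rate(single, group, 0), positive_rate(single, group, 1))
print("per-group:       ", positive_rate(adjusted, group, 0), positive_rate(adjusted, group, 1))
```

The model itself is untouched, which is the advantage described above; the price is that some individuals in each group are classified differently than the original scores alone would suggest.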

### Other fairness tools

1. RI Toolkit (Microsoft Responsible Innovation Toolkit): a set of development practices for anticipating and addressing the potential negative impacts of technology on people. Several of its tools can be used to improve fairness.

1) Harms Modeling: a framework for product teams, grounded in four core pillars of responsible innovation, that examines how technology can negatively affect people's lives through:

* injuries
* denial of consequential services
* infringement on human rights
* erosion of democratic & societal structures
    
2) Community Jury: a technique, adapted from the citizen jury, that brings together diverse stakeholders affected by a technology. The stakeholders learn about a project from experts, deliberate together, and give feedback on use cases and product design. This responsible innovation technique lets project teams collaborate with researchers to identify stakeholder values and understand the perceptions and concerns of affected stakeholders.

* Used by the Center for New Democratic Processes, a non-profit organization

2. LIME (Local Interpretable Model-agnostic Explanations): developed by researchers at the University of Washington, it helps explain the predictions of black-box classifiers.

* It explains a single prediction by fitting a simple, interpretable model that approximates the black-box model in the neighborhood of that instance.

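3. SHAP (SHapley Additive exPlanations): it attributes a model's output to its input features using Shapley values from game theory. Applied to fairness, it can decompose a measure of fairness and allocate responsibility for an observed disparity among the model's input features.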

1) Demographic parity metric: the model's rate of positive outputs should be equal or comparable between the two groups.

* Compared with LIME, which focuses on single predictions, it is also effective for explaining the model's predictions as a whole.
* Visualization methods are also available.
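LIME, described above, explains one prediction by perturbing the instance, querying the black-box model, and fitting a proximity-weighted linear model. The sketch below reimplements that idea in plain NumPy rather than calling the `lime` package. The Gaussian perturbations, exponential kernel, kernel width, and the toy `black_box` function are assumptions chosen for illustration; the real library differs in its sampling and feature-selection details.

```python
import numpy as np

def lime_style_explanation(predict_fn, x, n_samples=5000, scale=0.5, kernel_width=0.75, seed=0):
    """Fit a locally weighted linear surrogate around x; return one coefficient per feature."""
    rng = np.random.default_rng(seed)
    # 1. Perturb the instance with Gaussian noise.
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    # 2. Query the black-box model on the perturbed points.
    preds = predict_fn(Z)
    # 3. Weight each perturbed point by its proximity to x (exponential kernel).
    dist = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-dist**2 / kernel_width**2)
    # 4. Weighted least squares with an intercept column.
    A = np.column_stack([np.ones(n_samples), Z - x])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], preds * sw, rcond=None)
    return coef[1:]

def black_box(Z):
    # Hypothetical model: depends strongly on feature 0 and not at all on feature 1.
    return 1.0 / (1.0 + np.exp(-3.0 * Z[:, 0]))

coef = lime_style_explanation(black_box, np.array([0.0, 0.0]))
print(coef)  # large positive weight for feature 0, near zero for feature 1
```

The surrogate's coefficients are only valid near the explained instance, which is why LIME is suited to single predictions.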
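SHAP, described above, builds on Shapley values. As a sketch of the underlying computation (not the `shap` library, which uses much faster approximations), the function below enumerates every coalition of features exactly, so its cost grows exponentially with the number of features. Filling "absent" features from a fixed baseline vector is an assumption; the linear model, instance, and baseline are hypothetical. For a linear model the Shapley value of feature i reduces to w_i · (x_i − baseline_i), and the values sum to f(x) − f(baseline).

```python
import itertools
import math
import numpy as np

def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)
    phi = np.zeros(n)

    def value(coalition):
        z = baseline.copy()
        idx = np.array(coalition, dtype=int)
        z[idx] = x[idx]  # features in the coalition take their actual values
        return f(z)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for coalition in itertools.combinations(others, size):
                phi[i] += weight * (value(coalition + (i,)) - value(coalition))
    return phi

# Hypothetical linear model, instance, and all-zero baseline
w = np.array([2.0, -1.0, 0.5])

def f(z):
    return float(w @ z)

x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)

phi = exact_shapley(f, x, baseline)
print(phi)                            # approximately [2, -2, 1.5], i.e. w * (x - baseline)
print(phi.sum(), f(x) - f(baseline))  # both approximately 1.5 (efficiency property)
```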