
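### Demonstrating the DisparateImpactRemover algorithm
This notebook demonstrates the DisparateImpactRemover pre-processing algorithm. It edits feature values so that their distributions become more similar across the unprivileged and privileged groups, while preserving the rank order of values within each group. A classifier trained on the repaired features should then show a smaller gap in selection rates between the groups. Following the procedure set out in [1], the notebook applies the repair at several levels, trains a classifier on each repaired dataset, and uses the Adult dataset (AdultDataset) as an example.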
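
For reference, the selection-rate quantities used throughout the notebook can be computed directly from a vector of predictions. The sketch below assumes predictions coded 1 = favorable (income >50K) and 0 = unfavorable, and a group array with 1 = privileged and 0 = unprivileged. The function names are hypothetical helpers for illustration, not part of AIF360.

```python
import numpy as np

def selection_rate(y_pred, group, value):
    """Fraction of favorable (label 1) predictions among members of one group."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    return float(np.mean(y_pred[group == value]))

def mean_difference(y_pred, group, unpriv=0, priv=1):
    """Selection rate of the unprivileged group minus that of the privileged group."""
    return selection_rate(y_pred, group, unpriv) - selection_rate(y_pred, group, priv)

def disparate_impact(y_pred, group, unpriv=0, priv=1):
    """Ratio of the unprivileged selection rate to the privileged selection rate."""
    return selection_rate(y_pred, group, unpriv) / selection_rate(y_pred, group, priv)

# Toy predictions (not from the Adult data): 1 of 4 unprivileged and 2 of 4 privileged are selected.
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
print(mean_difference(y_pred, group))   # -0.25
print(disparate_impact(y_pred, group))  # 0.5
```

In AIF360 the corresponding quantities are reported by `BinaryLabelDatasetMetric.mean_difference()` and `BinaryLabelDatasetMetric.disparate_impact()`, which the notebook calls on the classifier's predictions.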

In [None]:
!pip install 'aif360[all]'
!wget https://raw.githubusercontent.com/Ali-Alameer/AI_fairness/main/common_utils.py

In [None]:
import os
import urllib.request
import aif360

# Locate aif360/data/raw/adult/ in the installed package instead of hard-coding a Python version in the path.
adult_dir = os.path.join(os.path.dirname(aif360.__file__), 'data', 'raw', 'adult')
base_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/"
for name in ("adult.data", "adult.test", "adult.names"):  # raw UCI files expected by AdultDataset
    urllib.request.urlretrieve(base_url + name, os.path.join(adult_dir, name))


In [3]:
import sys
sys.path.append("../")

import matplotlib.pyplot as plt
import numpy as np
from tqdm import tqdm

from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler

from aif360.algorithms.preprocessing import DisparateImpactRemover
from aif360.datasets import AdultDataset
from aif360.metrics import BinaryLabelDatasetMetric
from common_utils import compute_metrics  # helper downloaded with wget above

In [4]:
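# Use 'sex' as the protected attribute ('Male' is privileged) and keep only five numerical features.
# The protected attribute is kept as a feature column as well.
protected = 'sex'
ad = AdultDataset(protected_attribute_names=[protected],
                  privileged_classes=[['Male']], categorical_features=[],
                  features_to_keep=['age', 'education-num', 'capital-gain',
                                    'capital-loss', 'hours-per-week'])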

In [5]:
scaler = MinMaxScaler(copy=False)

In [6]:
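# AdultDataset places the adult.test rows before the adult.data rows,
# so the first 16281 rows form the test set and the remainder the training set.
test, train = ad.split([16281])
# Fit the scaler on the training data only, then apply the same scaling to the test data.
train.features = scaler.fit_transform(train.features)
test.features = scaler.transform(test.features)

# Column of the protected attribute; it is dropped from the features before training.
index = train.feature_names.index(protected)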

The repair_level parameter of DisparateImpactRemover sets how strongly the feature values are repaired. It must lie between 0 and 1: a value of 0 leaves the data unchanged, while 1 fully repairs each feature so that its distribution is the same in the unprivileged and privileged groups. Intermediate values perform a partial repair, moving values only part of the way toward their fully repaired values. The repair level therefore controls a trade-off between fairness and accuracy: higher values usually reduce disparate impact further, but can lower accuracy because the repair removes information correlated with the protected attribute, some of which is also predictive of the label. Lower values keep more of that information and usually leave more disparate impact.
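
To make the role of repair_level concrete, the sketch below is a simplified, hypothetical version of the per-feature repair described in [1]. Each value is mapped to its quantile within its own group; the target at that quantile is the median, across groups, of the groups' quantile functions; and `repair_level` linearly interpolates between the original value and that target. It is an illustration in NumPy, not AIF360's implementation (DisparateImpactRemover relies on the BlackBoxAuditing package, whose details may differ).

```python
import numpy as np

def repair_feature(x, group, repair_level):
    """Simplified geometric repair of one feature (illustrative sketch after [1])."""
    if not 0.0 <= repair_level <= 1.0:
        raise ValueError("repair_level must be between 0 and 1")
    x = np.asarray(x, dtype=float)
    group = np.asarray(group)
    groups = np.unique(group)
    repaired = x.copy()
    for g in groups:
        mask = group == g
        xs = x[mask]
        # Rank-based quantile in [0, 1] of each value within its own group.
        ranks = xs.argsort().argsort()
        q = ranks / max(len(xs) - 1, 1)
        # Evaluate every group's quantile function at q and take the median across groups.
        per_group = np.stack([np.quantile(x[group == h], q) for h in groups])
        target = np.median(per_group, axis=0)
        # Partial repair: interpolate between the original value and the target.
        repaired[mask] = (1 - repair_level) * xs + repair_level * target
    return repaired

x = np.array([0, 1, 2, 3, 10, 11, 12, 13], dtype=float)  # toy feature values
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])                 # 0 = unprivileged, 1 = privileged
for level in (0.0, 0.5, 1.0):
    print(level, repair_feature(x, group, level))
```

At level 1.0 both toy groups end up with the same set of values (up to floating-point rounding), while the order of values inside each group is unchanged; at 0.5 each value moves halfway toward that shared target.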

In [None]:
# Groups are defined on the binary protected attribute: 1 = Male (privileged), 0 = Female.
p = [{protected: 1}]
u = [{protected: 0}]

DIs = []            # disparate impact from BinaryLabelDatasetMetric, one value per repair level
acc = []            # balanced accuracy
dis_impact = []     # disparate impact reported by compute_metrics
ave_odds_diff = []  # average odds difference
for level in tqdm(np.linspace(0., 1., 11)):
    di = DisparateImpactRemover(repair_level=level)
    # DisparateImpactRemover only provides fit_transform, so train and test are repaired separately.
    train_repd = di.fit_transform(train)
    test_repd = di.fit_transform(test)

    # Drop the protected attribute so the classifier cannot use it directly.
    X_tr = np.delete(train_repd.features, index, axis=1)
    X_te = np.delete(test_repd.features, index, axis=1)
    y_tr = train_repd.labels.ravel()

    lmod = LogisticRegression(class_weight='balanced', solver='liblinear')
    lmod.fit(X_tr, y_tr)

    # Copy the repaired test set and replace its labels with the model's predictions (column vector).
    test_repd_pred = test_repd.copy()
    test_repd_pred.labels = lmod.predict(X_te).reshape(-1, 1)

    cm = BinaryLabelDatasetMetric(test_repd_pred, privileged_groups=p, unprivileged_groups=u)
    metric_test = compute_metrics(test_repd, test_repd_pred,
                                  unprivileged_groups=u, privileged_groups=p,
                                  disp=False)

    print("Repair level = %.1f" % level)
    print("Difference in mean outcomes between unprivileged and privileged groups = %f" % cm.mean_difference())
    print("Balanced accuracy = %f" % metric_test["Balanced accuracy"])
    print("Average odds difference = %f" % metric_test["Average odds difference"])
    print("Disparate impact = %f" % metric_test["Disparate impact"])

    acc.append(metric_test["Balanced accuracy"])
    ave_odds_diff.append(metric_test["Average odds difference"])
    dis_impact.append(metric_test["Disparate impact"])
    DIs.append(cm.disparate_impact())
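
The loop reports metrics from the `compute_metrics` helper in common_utils.py. Assuming that helper follows the usual AIF360 definitions (balanced accuracy as the mean of the true positive and true negative rates; average odds difference as the mean of the unprivileged-minus-privileged gaps in false positive and true positive rates), the two quantities can be reproduced from plain label arrays as sketched below. The function names are hypothetical and the data are toy values.

```python
import numpy as np

def rates(y_true, y_pred):
    """Return (TPR, FPR, TNR) for binary labels coded 0/1."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    return float(tp / (tp + fn)), float(fp / (fp + tn)), float(tn / (tn + fp))

def balanced_accuracy(y_true, y_pred):
    """Mean of the true positive rate and the true negative rate."""
    tpr, _, tnr = rates(y_true, y_pred)
    return 0.5 * (tpr + tnr)

def average_odds_difference(y_true, y_pred, group, unpriv=0, priv=1):
    """Mean of the unprivileged-minus-privileged differences in FPR and TPR."""
    y_true, y_pred, group = (np.asarray(a) for a in (y_true, y_pred, group))
    tpr_u, fpr_u, _ = rates(y_true[group == unpriv], y_pred[group == unpriv])
    tpr_p, fpr_p, _ = rates(y_true[group == priv], y_pred[group == priv])
    return 0.5 * ((fpr_u - fpr_p) + (tpr_u - tpr_p))

y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
print(balanced_accuracy(y_true, y_pred))               # 0.75
print(average_odds_difference(y_true, y_pred, group))  # -0.5
```

An average odds difference near 0 means the two groups see similar error rates; in this toy case the negative value reflects lower true and false positive rates for the unprivileged group.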

In [None]:
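%matplotlib inline
levels = np.linspace(0, 1, 11)
plt.plot(levels, DIs, marker='o', label='Disparate impact of predictions')
plt.plot([0, 1], [1, 1], 'g', label='Parity (DI = 1)')
plt.plot([0, 1], [0.8, 0.8], 'r', label='Four-fifths threshold (DI = 0.8)')
plt.ylim([0.4, 1.2])
plt.ylabel('Disparate Impact (DI)')
plt.xlabel('Repair level')
plt.legend()
plt.show()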
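
In the plot, the green line at 1 marks equal selection rates, and the red line at 0.8 marks the threshold of the "four-fifths rule" commonly used to flag adverse impact. A small hypothetical helper like the one below can report the lowest repair level whose disparate impact reaches that threshold. It assumes the levels are listed in increasing order, as in the loop, and the numbers in the example are illustrative, not results from this notebook.

```python
def smallest_passing_level(levels, di_values, threshold=0.8):
    """Return the first repair level whose disparate impact is at least `threshold`.

    Returns None if no level reaches the threshold. Values above 1 (which favor
    the unprivileged group) also count as passing; a stricter check could
    additionally require di <= 1 / threshold.
    """
    for level, di in zip(levels, di_values):
        if di >= threshold:
            return float(level)
    return None

# Illustrative DI values only (not results from this notebook).
levels = [0.0, 0.25, 0.5, 0.75, 1.0]
toy_dis = [0.35, 0.5, 0.7, 0.85, 0.95]
print(smallest_passing_level(levels, toy_dis))  # 0.75
```

With the notebook's results it would be called as `smallest_passing_level(np.linspace(0, 1, 11), DIs)`.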

### References

[1] M. Feldman, S. A. Friedler, J. Moeller, C. Scheidegger, and S. Venkatasubramanian, "Certifying and removing disparate impact," ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015.