# Adversarial Debiasing (In-processing)
Adversarial debiasing is an in-processing technique that trains a classifier to maximize prediction accuracy while simultaneously reducing an adversary’s ability to infer the protected attribute from the classifier’s predictions. The less the adversary can recover, the less group-membership information the predictions carry, which pushes the classifier toward fairer outcomes.

**References**
* B. H. Zhang, B. Lemoine, and M. Mitchell, “Mitigating Unwanted Biases with Adversarial Learning,” AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society, 2018.
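
To make the idea concrete, the classifier update described in the paper above can be sketched with NumPy: the classifier follows its prediction-loss gradient, minus that gradient's projection onto the adversary-loss gradient, minus a scaled copy of the adversary-loss gradient. The function name `debiased_gradient`, the `alpha` weight, the `eps` stabilizer, and the toy vectors are assumptions for illustration only, not AIF360's internal API.

```python
import numpy as np

def debiased_gradient(grad_pred, grad_adv, alpha=0.1, eps=1e-8):
    """Update direction g_P - proj_{g_A}(g_P) - alpha * g_A (illustrative sketch)."""
    grad_pred = np.asarray(grad_pred, dtype=float)
    grad_adv = np.asarray(grad_adv, dtype=float)
    # Unit vector along the adversary gradient (eps avoids division by zero).
    unit_adv = grad_adv / (np.linalg.norm(grad_adv) + eps)
    # Component of the prediction gradient that would also help the adversary.
    projection = np.dot(grad_pred, unit_adv) * unit_adv
    # Remove that component, then actively push against the adversary.
    return grad_pred - projection - alpha * grad_adv

# Toy gradients (hypothetical values): the adversary gradient points along axis 0.
g = debiased_gradient([1.0, 1.0], [1.0, 0.0], alpha=0.5)
print(g)  # approximately [-0.5, 1.0]
```

With `alpha=0` the result is (up to `eps`) orthogonal to the adversary gradient, so a small step along it leaves the adversary's loss unchanged to first order; a positive `alpha` additionally moves the classifier in the direction that hurts the adversary.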

In [1]:
# import relevant dependencies
import numpy as np
import pandas as pd

from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

from aif360.datasets import StandardDataset
from aif360.algorithms.inprocessing import AdversarialDebiasing
from aif360.metrics import ClassificationMetric

In [2]:
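def create_dataset(
    X: pd.DataFrame,
    y,
    protected_attribute_name: str
) -> StandardDataset:
    """Wrap features and binary labels in an AIF360 StandardDataset."""
    # StandardDataset looks up the label column by name, so a bare array is
    # converted to a Series called "class"; a Series passed in must already
    # carry that name.
    if isinstance(y, np.ndarray):
        y = pd.Series(y.flatten(), index=X.index, name='class')
    return StandardDataset(
        df=pd.concat([X, y], axis=1),
        label_name="class",
        favorable_classes=[1],  # label 1 is the favorable outcome
        protected_attribute_names=[protected_attribute_name],
        privileged_classes=[[1]],  # protected attribute == 1 is the privileged group
    )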


In [3]:
# Fetch the Adult (census income) dataset from OpenML (data_id=1590)
raw_data = fetch_openml(data_id=1590, as_frame=True)

In [4]:
from sklearn.preprocessing import MinMaxScaler

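# One-hot encode categorical features, then rescale every column to [0, 1].
X_raw = pd.get_dummies(raw_data.data)
X_raw = pd.DataFrame(MinMaxScaler().fit_transform(X_raw), columns=X_raw.columns)
# Binary target: 1 if income is ">50K", else 0.
y = 1 * (raw_data.target == ">50K")

# 50% train; the remaining half is split evenly into 25% test and 25% validation.
X_train, X_test, y_train, y_test = train_test_split(X_raw, y, test_size=0.5, random_state=42)
X_test, X_val, y_test, y_val = train_test_split(X_test, y_test, test_size=0.5, random_state=42)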

In [5]:
protected_attribute_name = "sex_Male"

privileged_groups = [{protected_attribute_name: 1.0}]
unprivileged_groups = [{protected_attribute_name: 0.0}]

In [9]:
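# AdversarialDebiasing runs on TF1-style graphs and sessions, so use the
# compat.v1 API with eager execution disabled.
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()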

In [10]:
# Baseline: train the same network without the adversary (debias=False)
tf.reset_default_graph()
sess = tf.Session()

AB_PLAIN = AdversarialDebiasing(
    privileged_groups=privileged_groups,
    unprivileged_groups=unprivileged_groups,
    scope_name='plain_classifier',
    debias=False,
    sess=sess
)
dataset_train = create_dataset(X_train, y_train, protected_attribute_name)
AB_PLAIN.fit(dataset_train)

epoch 0; iter: 0; batch classifier loss: 0.648401
epoch 1; iter: 0; batch classifier loss: 0.385139
epoch 2; iter: 0; batch classifier loss: 0.306398
epoch 3; iter: 0; batch classifier loss: 0.361695
epoch 4; iter: 0; batch classifier loss: 0.286632
epoch 5; iter: 0; batch classifier loss: 0.438883
epoch 6; iter: 0; batch classifier loss: 0.289754
epoch 7; iter: 0; batch classifier loss: 0.346614
epoch 8; iter: 0; batch classifier loss: 0.327000
epoch 9; iter: 0; batch classifier loss: 0.320092
epoch 10; iter: 0; batch classifier loss: 0.371554
epoch 11; iter: 0; batch classifier loss: 0.324482
epoch 12; iter: 0; batch classifier loss: 0.319571
epoch 13; iter: 0; batch classifier loss: 0.340679
epoch 14; iter: 0; batch classifier loss: 0.396512
epoch 15; iter: 0; batch classifier loss: 0.343625
epoch 16; iter: 0; batch classifier loss: 0.332313
epoch 17; iter: 0; batch classifier loss: 0.285954
epoch 18; iter: 0; batch classifier loss: 0.336866
epoch 19; iter: 0; batch classifier loss:

<aif360.algorithms.inprocessing.adversarial_debiasing.AdversarialDebiasing at 0x24e26fdfd90>

In [11]:
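# Placeholder labels: predict() ignores them, but metrics that compare
# against ground truth need y_test here instead.
y_placeholder = np.zeros((X_test.shape[0], 1))
dataset_test = create_dataset(X_test, y_placeholder, protected_attribute_name)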

In [12]:
dataset_test_pred_plain = AB_PLAIN.predict(dataset_test)

In [13]:
dataset_test_pred_plain.labels.mean()

0.219000819000819

In [14]:
# Debiased model: train the classifier jointly with the adversary (debias=True)

AB_DEBIASED = AdversarialDebiasing(
    privileged_groups=privileged_groups,
    unprivileged_groups=unprivileged_groups,
    scope_name='debiased_classifier',
    debias=True,
    sess=sess
)
AB_DEBIASED.fit(dataset_train)
# Keep the session open: AB_DEBIASED.predict() below still needs it.

epoch 0; iter: 0; batch classifier loss: 0.782194; batch adversarial loss: 0.725723
epoch 1; iter: 0; batch classifier loss: 0.421570; batch adversarial loss: 0.637983
epoch 2; iter: 0; batch classifier loss: 0.405619; batch adversarial loss: 0.636180
epoch 3; iter: 0; batch classifier loss: 0.394683; batch adversarial loss: 0.634207
epoch 4; iter: 0; batch classifier loss: 0.372450; batch adversarial loss: 0.636455
epoch 5; iter: 0; batch classifier loss: 0.374253; batch adversarial loss: 0.643077
epoch 6; iter: 0; batch classifier loss: 0.388330; batch adversarial loss: 0.680611
epoch 7; iter: 0; batch classifier loss: 0.347921; batch adversarial loss: 0.623439
epoch 8; iter: 0; batch classifier loss: 0.391733; batch adversarial loss: 0.623459
epoch 9; iter: 0; batch classifier loss: 0.325650; batch adversarial loss: 0.622758
epoch 10; iter: 0; batch classifier loss: 0.346007; batch adversarial loss: 0.574849
epoch 11; iter: 0; batch classifier loss: 0.285128; batch adversarial loss:

<aif360.algorithms.inprocessing.adversarial_debiasing.AdversarialDebiasing at 0x24e296de4c0>

In [15]:
dataset_test_pred_debiased = AB_DEBIASED.predict(dataset_test)
sess.close()  # both models have produced their predictions; release the TF session

In [16]:
dataset_test_pred_debiased.labels.mean()

0.15331695331695333

In [17]:
metric_plain = ClassificationMetric(
    dataset_test,
    dataset_test_pred_plain,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups
)

metric_debiased = ClassificationMetric(
    dataset_test,
    dataset_test_pred_debiased,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups
)

In [22]:
metric_plain.disparate_impact()

0.3117628783513548

In [23]:
metric_debiased.disparate_impact()

0.9442368534700252
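
For intuition about the two numbers above, disparate impact on predictions can be read as the ratio of favorable-prediction rates, unprivileged over privileged, with 1.0 meaning parity. The following is a minimal NumPy sketch of that ratio under this reading; the helper name `disparate_impact` and the toy arrays are assumptions for illustration, not AIF360 code.

```python
import numpy as np

def disparate_impact(pred_labels, protected, favorable=1, privileged=1):
    """P(pred = favorable | unprivileged) / P(pred = favorable | privileged)."""
    pred_labels = np.asarray(pred_labels)
    protected = np.asarray(protected)
    is_priv = protected == privileged
    # Share of favorable predictions within each group.
    rate_unpriv = np.mean(pred_labels[~is_priv] == favorable)
    rate_priv = np.mean(pred_labels[is_priv] == favorable)
    return rate_unpriv / rate_priv

# Toy data (hypothetical): first four rows privileged, last four unprivileged.
toy_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0])
toy_sex = np.array([1, 1, 1, 1, 0, 0, 0, 0])
di = disparate_impact(toy_pred, toy_sex)
print(di)  # 0.25 / 0.75, i.e. one third
```

Because this ratio depends only on the predicted labels and the protected attribute, it does not require ground-truth labels.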

In [24]:
# Compared against the all-zero placeholder labels in dataset_test, not y_test.
metric_plain.accuracy()

0.780999180999181

In [25]:
# Compared against the all-zero placeholder labels in dataset_test, not y_test.
metric_debiased.accuracy()

0.8466830466830467
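
The two accuracy values above should be read with care. `dataset_test` was built from all-zero labels, so each value equals one minus the mean predicted label (0.7810 = 1 - 0.2190 and 0.8467 = 1 - 0.1533) rather than agreement with `y_test`. The sketch below reproduces that effect on toy data; the helper `accuracy` and the arrays are hypothetical and only illustrate the arithmetic.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

# Toy labels and predictions (hypothetical values).
toy_true = np.array([1, 1, 1, 0, 0])
toy_pred = np.array([1, 1, 0, 0, 1])
placeholder = np.zeros_like(toy_pred)

acc_true = accuracy(toy_true, toy_pred)            # 3 of 5 match
acc_placeholder = accuracy(placeholder, toy_pred)  # share of predicted 0s

print(acc_true, acc_placeholder, 1 - toy_pred.mean())
```

To measure real accuracy in this notebook, one option (untested here) is to pass `create_dataset(X_test, y_test, protected_attribute_name)` as the first argument of `ClassificationMetric` and keep the predicted datasets as the second.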