# Yoonhyuck WOO / Purdue University_Computer and Information Technology
# Assignment 4: Detecting and Mitigating Bias 
# Professor: Dr. Pradhan

## Date: 3.4 - 5:00pm 3/22/2024 (EST)

#### references
- class lectures


The goal of this tutorial is to introduce the basic functionality of AI Fairness 360 for detecting and mitigating bias. As before, we will work with the German Credit dataset. There are many metrics one can use to detect the presence of bias. Likewise, there are many different bias mitigation algorithms one can employ. AI Fairness 360 provides some of the most common metrics and algorithms.


### Bias mitigation techniques

In class, we learnt about three families of bias mitigation techniques: _pre-processing_, _in-processing_, and _post-processing_.


We will use AI Fairness 360 (`aif360`) to detect and mitigate bias. We will look for bias in the creation of a machine learning model that predicts whether an applicant should be given credit based on various features from a typical credit application. The protected attribute will be "Age", with "1" (older than or equal to 25) and "0" (younger than 25) being the values for the _privileged_ and _unprivileged_ groups, respectively.

In this notebook, we will:

1. Install and import packages and modules
2. Load dataset, split between train and test, and compute fairness metrics on original training dataset
3. Mitigate bias using a pre-processing algorithm (reweighing)
4. Mitigate bias using an in-processing algorithm (adversarial debiasing)
5. Mitigate bias using a post-processing algorithm (equalized odds post processing)


## 1. Import Statements

First, we install the necessary packages. Then we import several components from the `aif360` package. We are relying on `aif360` for this assignment, so please start early to make sure that the dependencies are resolved and that the packages load correctly.

In [2]:
!pip install numba==0.48

Collecting numba==0.48

ERROR: Could not install packages due to an OSError: [WinError 5] Access is denied: 'C:\\Users\\LG\\AppData\\Roaming\\Python\\Python38\\site-packages\\~lvmlite\\binding\\llvmlite.dll'
Consider using the `--user` option or check the permissions.




  Using cached numba-0.48.0-1-cp38-cp38-win_amd64.whl (2.1 MB)
Collecting llvmlite<0.32.0,>=0.31.0dev0
  Using cached llvmlite-0.31.0-cp38-cp38-win_amd64.whl (13.6 MB)
Installing collected packages: llvmlite, numba
  Attempting uninstall: llvmlite
    Found existing installation: llvmlite 0.41.1
    Uninstalling llvmlite-0.41.1:
      Successfully uninstalled llvmlite-0.41.1


In [2]:
!pip install tensorflow-macos

ERROR: Could not find a version that satisfies the requirement tensorflow-macos
ERROR: No matching distribution found for tensorflow-macos


In [3]:
import aif360

In [4]:
print(aif360.__version__)

0.2.2


In [2]:
!pip install numpy==1.20.3 --user

Collecting numpy==1.20.3


ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
umap-learn 0.5.5 requires numba>=0.51.2, but you have numba 0.48.0 which is incompatible.
torchvision 0.10.1 requires torch==1.9.1, but you have torch 2.2.1 which is incompatible.
torchtext 0.10.1 requires torch==1.9.1, but you have torch 2.2.1 which is incompatible.
pynndescent 0.5.11 requires numba>=0.51.2, but you have numba 0.48.0 which is incompatible.
infairness 0.2.3 requires numpy>=1.21.6, but you have numpy 1.20.3 which is incompatible.
fairlearn 0.10.0 requires numpy>=1.24.4, but you have numpy 1.20.3 which is incompatible.
allennlp 2.9.2 requires torch<1.12.0,>=1.6.0, but you have torch 2.2.1 which is incompatible.
allennlp 2.9.2 requires transformers<4.18,>=4.1, but you have transformers 4.18.0 which is incompatible.


  Using cached numpy-1.20.3-cp38-cp38-win_amd64.whl (13.7 MB)
Installing collected packages: numpy
Successfully installed numpy-1.20.3


In [1]:
import numpy

In [2]:
print(numpy.__version__)

1.20.3


In [5]:
# No need to re-install if you already did so in Assignment 2
!pip install aif360==0.2.2

Collecting aif360==0.2.2
  Using cached aif360-0.2.2-py2.py3-none-any.whl (56.4 MB)
Installing collected packages: aif360
  Attempting uninstall: aif360
    Found existing installation: aif360 0.6.0
    Uninstalling aif360-0.6.0:
      Successfully uninstalled aif360-0.6.0
Successfully installed aif360-0.2.2


In [5]:
!pip install aif360==0.6.0

Collecting aif360==0.6.0
  Using cached aif360-0.6.0-py3-none-any.whl (229 kB)
Installing collected packages: aif360
  Attempting uninstall: aif360
    Found existing installation: aif360 0.2.2
    Uninstalling aif360-0.2.2:
      Successfully uninstalled aif360-0.2.2
Successfully installed aif360-0.6.0


On my laptop, `aif360==0.2.2` initially did not work; for example, `gd2 = GermanDataset()` failed. I upgraded to v0.6.0, restarted the kernel, and ran the code again, but that version also raised an error. I then ran `!pip install aif360==0.2.2` again without restarting the kernel, and the code worked, including `gd2 = GermanDataset()`. I am still investigating the cause, but I completed this assignment by following the process above.

In [3]:
# import all necessary packages
import numpy as np
np.random.seed(0)

from numba import jit

from aif360.datasets import GermanDataset, BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric, DatasetMetric

from aif360.algorithms.preprocessing import Reweighing, LFR, DisparateImpactRemover
from aif360.algorithms.inprocessing import AdversarialDebiasing
from aif360.algorithms.postprocessing import EqOddsPostprocessing

from aif360.explainers import MetricTextExplainer, MetricJSONExplainer

from sklearn.linear_model import LogisticRegression

import tensorflow as tf
print(tf.__version__)

from IPython.display import Markdown, display

import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

import json
from collections import OrderedDict

  warn_deprecated('vmap', 'torch.vmap')


2.8.0


## 2. Load Data, Specify Protected Attribute, and Split Data

We will use the German Credit data, set the protected attribute to be age, create two variables to represent the privileged and unprivileged groups, and split the original dataset into training and test data subsets. Finally, we will build a typical machine learning workflow that involves training a machine learning model on the training dataset and use a test dataset to assess the model's efficacy (e.g., accuracy, fairness). For this dataset, we have a binary classification problem that predicts individuals as being a good or a bad credit risk.

In this dataset, we consider older applicants (`age >= 25`) as the privileged group and younger applicants (`age < 25`) as the unprivileged group. 

We will use the preprocessed GermanDataset with one-hot encoded data provided by the aif360 package. 

In [6]:
# note that we drop sex, which may also be a protected attribute
dataset_orig = GermanDataset(protected_attribute_names=['age'],
                             privileged_classes=[lambda x: x >= 25],
                             features_to_drop=['personal_status', 'sex'])

dataset_orig_train, dataset_orig_test = dataset_orig.split([0.7], shuffle=True)

privileged_groups = [{'age': 1}]
unprivileged_groups = [{'age': 0}]
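To make the encoding concrete, the following minimal sketch shows how a threshold function like the one passed to `privileged_classes` maps raw ages to the binary values used in `privileged_groups` and `unprivileged_groups`. The sample ages and the helper name `encode_age` are hypothetical; the sketch mirrors the idea only and is not aif360's internal code.

```python
# Hypothetical sample ages; 25 sits exactly on the threshold.
ages = [19, 24, 25, 40]

def encode_age(age, is_privileged=lambda x: x >= 25):
    """Return 1.0 for the privileged group and 0.0 for the unprivileged group."""
    return 1.0 if is_privileged(age) else 0.0

# With `>=`, a 25-year-old applicant is privileged.
encoded = [encode_age(a) for a in ages]
print(encoded)

# With a strict `>`, the same applicant falls into the unprivileged group.
strict = [encode_age(a, is_privileged=lambda x: x > 25) for a in ages]
print(strict)
```

The choice between `>=` and `>` only matters for applicants aged exactly 25, but it should be kept consistent across cells so that the groups being compared stay the same.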

In [7]:
print("Original data shape: ",dataset_orig.features.shape)
print("Train dataset shape: ", dataset_orig_train.features.shape)
print("Test dataset shape: ", dataset_orig_test.features.shape)

Original data shape:  (1000, 57)
Train dataset shape:  (700, 57)
Test dataset shape:  (300, 57)


The object ```dataset_orig``` is an aif360 dataset, which has some useful methods and attributes that you can explore. More documentation is available at https://aif360.readthedocs.io/en/latest/modules/datasets.html. 
For now, we'll just transform the data into a pandas dataframe:

In [8]:
df, dict_df = dataset_orig.convert_to_dataframe()
print("Shape: ", df.shape)
# print(df.columns)
# df.head(5)

Shape:  (1000, 58)


In [9]:
df.head(5)

Unnamed: 0,month,credit_amount,investment_as_income_percentage,residence_since,age,number_of_credits,people_liable_for,status=A11,status=A12,status=A13,...,housing=A153,skill_level=A171,skill_level=A172,skill_level=A173,skill_level=A174,telephone=A191,telephone=A192,foreign_worker=A201,foreign_worker=A202,credit
0,6.0,1169.0,4.0,4.0,1.0,2.0,1.0,1.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,1.0
1,48.0,5951.0,2.0,2.0,0.0,1.0,1.0,0.0,1.0,0.0,...,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,2.0
2,12.0,2096.0,2.0,3.0,1.0,1.0,2.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0
3,42.0,7882.0,2.0,4.0,1.0,1.0,2.0,1.0,0.0,0.0,...,1.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0
4,24.0,4870.0,3.0,4.0,1.0,2.0,2.0,1.0,0.0,0.0,...,1.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,2.0


## 3. Compute Fairness Metrics on Original Training Data
Now that we have identified the protected attribute "age" and defined privileged and unprivileged values, we can use aif360 to detect bias in the dataset.  

### Mean Outcomes

Compare the base rates (i.e., percentage of favorable results) for the privileged and unprivileged groups and report the difference (unprivileged base rate - privileged base rate). This is implemented in the ```mean_difference``` method on the BinaryLabelDatasetMetric class, as shown below:

In [10]:
metric_orig_train = BinaryLabelDatasetMetric(
     dataset_orig_train, 
     unprivileged_groups=unprivileged_groups,
     privileged_groups=privileged_groups
  )
print("Original training dataset")
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_train.mean_difference())

Original training dataset
Difference in mean outcomes between unprivileged and privileged groups = -0.169905


### Disparate Impact
We can calculate the ratio of (predicted) favorable outcomes for the unprivileged group compared to the privileged group as implemented in the ```disparate_impact``` method on the BinaryLabelDatasetMetric class:

In [11]:
print("Original training dataset")
print("Disparate Impact = %f" % metric_orig_train.disparate_impact())

Original training dataset
Disparate Impact = 0.766430


**Note:** The fairness metrics above will vary depending upon the train-test split. If the magnitude of mean difference is less than 10%, try another split.

### Built-In Explainers

```aif360``` has some useful explainers for the fairness metrics which can be used to interpret the fairness metric values:

In [12]:
json_expl = MetricJSONExplainer(metric_orig_train)
def format_json(json_str):
    return json.dumps(json.loads(json_str, object_pairs_hook=OrderedDict),
                      indent=2)

Let's print the mean difference explainer:

In [13]:
print(format_json(json_expl.mean_difference()))

{
  "metric": "Mean Difference",
  "message": "Mean difference (mean label value on unprivileged instances - mean label value on privileged instances): -0.1699054740619017",
  "numPositivesUnprivileged": 63.0,
  "numInstancesUnprivileged": 113.0,
  "numPositivesPrivileged": 427.0,
  "numInstancesPrivileged": 587.0,
  "description": "Computed as the difference of the rate of favorable outcomes received by the unprivileged group to the privileged group.",
  "ideal": "The ideal value of this metric is 0.0"
}


We can also print the disparate impact explainer:

In [14]:
print(format_json(json_expl.disparate_impact()))

{
  "metric": "Disparate Impact",
  "message": "Disparate impact (probability of favorable outcome for unprivileged instances / probability of favorable outcome for privileged instances): 0.7664297113013201",
  "numPositivePredictionsUnprivileged": 63.0,
  "numUnprivileged": 113.0,
  "numPositivePredictionsPrivileged": 427.0,
  "numPrivileged": 587.0,
  "description": "Computed as the ratio of rate of favorable outcome for the unprivileged group to that of the privileged group.",
  "ideal": "The ideal value of this metric is 1.0 A value < 1 implies higher benefit for the privileged group and a value >1 implies a higher benefit for the unprivileged group."
}
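The counts reported by the two explainers are enough to recompute both metrics by hand. The sketch below does so in plain Python; the helper name `fairness_from_counts` is hypothetical and is not part of aif360.

```python
def fairness_from_counts(pos_unpriv, n_unpriv, pos_priv, n_priv):
    """Return (mean_difference, disparate_impact) from favorable-outcome counts."""
    rate_unpriv = pos_unpriv / n_unpriv  # base rate of the unprivileged group
    rate_priv = pos_priv / n_priv        # base rate of the privileged group
    return rate_unpriv - rate_priv, rate_unpriv / rate_priv

# Counts reported by the explainers for the original training split.
mean_diff, disp_impact = fairness_from_counts(63, 113, 427, 587)
print(f"mean difference  = {mean_diff:.6f}")
print(f"disparate impact = {disp_impact:.6f}")
```

Both values agree with the numbers printed by `mean_difference()` and `disparate_impact()` above, which shows that the two metrics are a difference and a ratio of the same two base rates.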


**Q1:** Using the explainers above, interpret the difference in means and disparate impact in the German Credit data:

**Q1_Write your interpretation here**

- Mean difference (statistical parity difference): the value is -0.1699. A value below 0 means that the privileged group has a higher rate of favorable outcomes than the unprivileged group (427/587 ≈ 72.7% versus 63/113 ≈ 55.8%).

- Disparate impact: the value is 0.7664. A value below 1 has the same meaning as a mean difference below 0: the bias favors the privileged group.

Thus, both metrics lead to the same conclusion: the data favors the privileged group (age >= 25).

### Build a model on the training data

Let's build a logistic regression model on this training data, predict credit risk for test data and compute the same fairness metrics over the model predictions.

In [15]:
model = LogisticRegression(solver='liblinear', class_weight='balanced')

df_test, dict_df_test = dataset_orig_test.convert_to_dataframe()
df_train, dict_df_train = dataset_orig_train.convert_to_dataframe()

# Fit the model to the training data
x_train = df_train.drop(['credit'], axis=1)
y_train = df_train['credit']
model.fit(x_train, y_train)

x_test = df_test.drop(['credit'], axis=1)
y_test = df_test['credit']

y_pred = model.predict(x_test)

dataset_pred_test = dataset_orig_test.copy()
dataset_pred_test.labels = y_pred.copy()

metric_dataset_test = BinaryLabelDatasetMetric(
    dataset_pred_test, 
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups
  )

In [16]:
# write code here to compute fairness metrics
json_expl_q2 = MetricJSONExplainer(metric_dataset_test)

In [17]:
print(format_json(json_expl_q2.mean_difference()))

{
  "metric": "Mean Difference",
  "message": "Mean difference (mean label value on unprivileged instances - mean label value on privileged instances): -0.30303030303030304",
  "numPositivesUnprivileged": 12.0,
  "numInstancesUnprivileged": 36.0,
  "numPositivesPrivileged": 168.0,
  "numInstancesPrivileged": 264.0,
  "description": "Computed as the difference of the rate of favorable outcomes received by the unprivileged group to the privileged group.",
  "ideal": "The ideal value of this metric is 0.0"
}


In [18]:
print(format_json(json_expl_q2.disparate_impact()))

{
  "metric": "Disparate Impact",
  "message": "Disparate impact (probability of favorable outcome for unprivileged instances / probability of favorable outcome for privileged instances): 0.5238095238095238",
  "numPositivePredictionsUnprivileged": 12.0,
  "numUnprivileged": 36.0,
  "numPositivePredictionsPrivileged": 168.0,
  "numPrivileged": 264.0,
  "description": "Computed as the ratio of rate of favorable outcome for the unprivileged group to that of the privileged group.",
  "ideal": "The ideal value of this metric is 1.0 A value < 1 implies higher benefit for the privileged group and a value >1 implies a higher benefit for the unprivileged group."
}


**Q2:** Using the fairness metric functions as before, report the bias observed in the model's predictions over test data. What do these values indicate? Are the model's predictions more biased or less biased compared to the bias observed in the training data?

**Q2_Write your answer here**

As before, both metrics show that the privileged group has a higher proportion of predicted favorable outcomes than the unprivileged group, so the bias favors the privileged group.

|      | Original training data | Model's predictions over test data   |
| :---        |    :----:   |          ---: |
| **Mean difference**  | -0.1699       | -0.3030   |
| **Disparate Impact**   | 0.7664       | 0.5238      |

According to the table above, both values for the original training data are closer to the ideal values (0 for the mean difference and 1 for the disparate impact), so the model's predictions are more biased than the training data.

## 4. Bias Mitigation Techniques

We learnt in class that there are several bias mitigation techniques namely, pre-processing, in-processing, and post-processing algorithms.

_Pre-processing_ bias mitigation is performed at the data end, before the creation of the model. In other words, we transform the data such that a model learned on the transformed data produces less biased decisions.

_In-processing_ bias mitigation methods focus on the model training stage, as compared to pre-processing which focuses on transforming the data prior to model training. This suite of methods includes incorporating a fairness constraint during model training, tweaking the model's objective function, and adversarial learning.

_Post-processing_ bias mitigation focuses on adjusting the model's predictions after the model has been trained.



### 4.1 Bias Mitigation via Pre-Processing

AI Fairness 360 implements several pre-processing mitigation algorithms. We will use the **reweighing algorithm**, which is implemented in the `Reweighing` class in the `aif360.algorithms.preprocessing` package. As discussed in class, this algorithm will transform the dataset by assigning weights to instances in each (group, label) combination to change the base rates and ensure fairness before classification. The idea is to apply appropriate weights to different tuples in the training data to reduce discrimination with respect to the protected attributes.

You can find documentation for reweighing here:
https://aif360.readthedocs.io/en/latest/modules/generated/aif360.algorithms.preprocessing.Reweighing.html 

Call the fit and transform methods to perform the transformation, producing a newly transformed training dataset (```dataset_transf_train```):

In [19]:
RW = Reweighing(unprivileged_groups=unprivileged_groups,
                privileged_groups=privileged_groups)
dataset_transf_train = RW.fit_transform(dataset_orig_train)

We can print the weights. Each observation in the data should have a weight. For brevity, let's look at the weights for the first 10 rows:

In [20]:
len(dataset_transf_train.instance_weights)
dataset_transf_train.instance_weights[0:10]

array([0.96229508, 0.96229508, 0.96229508, 0.96229508, 0.96229508,
       0.96229508, 0.96229508, 0.96229508, 1.25555556, 0.678     ])

### Compute Fairness Metrics in Transformed Data

We can check how effective the transformed data was in removing bias by calculating the metrics used for the original training dataset.

In [65]:
metric_rw_train = BinaryLabelDatasetMetric(
    dataset_transf_train, 
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups
  )

Print the difference in mean outcomes and disparate impact in the transformed data:

In [66]:
# write your code here
# write code here to compute fairness metrics
json_expl_q3 = MetricJSONExplainer(metric_rw_train)

In [67]:
print(format_json(json_expl_q3.disparate_impact()))

{
  "metric": "Disparate Impact",
  "message": "Disparate impact (probability of favorable outcome for unprivileged instances / probability of favorable outcome for privileged instances): 1.0000000000000004",
  "numPositivePredictionsUnprivileged": 79.10000000000002,
  "numUnprivileged": 113.00000000000003,
  "numPositivePredictionsPrivileged": 410.89999999999986,
  "numPrivileged": 587.0,
  "description": "Computed as the ratio of rate of favorable outcome for the unprivileged group to that of the privileged group.",
  "ideal": "The ideal value of this metric is 1.0 A value < 1 implies higher benefit for the privileged group and a value >1 implies a higher benefit for the unprivileged group."
}


In [68]:
print(format_json(json_expl_q3.mean_difference()))

{
  "metric": "Mean Difference",
  "message": "Mean difference (mean label value on unprivileged instances - mean label value on privileged instances): 3.3306690738754696e-16",
  "numPositivesUnprivileged": 79.10000000000002,
  "numInstancesUnprivileged": 113.00000000000003,
  "numPositivesPrivileged": 410.89999999999986,
  "numInstancesPrivileged": 587.0,
  "description": "Computed as the difference of the rate of favorable outcomes received by the unprivileged group to the privileged group.",
  "ideal": "The ideal value of this metric is 0.0"
}
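The non-integer "counts" in the two outputs above are weighted sums rather than raw counts. As a rough check, the sketch below multiplies the raw favorable counts by the instance weights printed earlier (rounded to eight decimals) and divides by the weighted group sizes reported by the explainer; the variable names are illustrative.

```python
# Instance weights for favorable outcomes, as printed earlier (rounded).
w_unpriv_fav = 1.25555556
w_priv_fav = 0.96229508

# Weighted favorable counts divided by weighted group sizes (113 and 587).
rate_unpriv = 63 * w_unpriv_fav / 113.0
rate_priv = 427 * w_priv_fav / 587.0

print(round(rate_unpriv, 6), round(rate_priv, 6))
print("rates equal up to rounding:", abs(rate_unpriv - rate_priv) < 1e-6)
```

Both weighted base rates come out at about 0.7, so a mean difference on the order of 1e-16 is zero up to floating-point round-off, not a large positive value.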


**Q3:** How do these values compare to the difference in mean outcomes and disparate impact in the original data?

**Q3_Write your answer in this text cell:**

|      | Original Data | Reweighed Data   | Model's Predictions over test data   |
| :---        |    :----:   |          ---: |         ---: |
| **Mean difference**  | -0.1699       | 3.33e-16 (≈ 0)   | -0.3030   |
| **Disparate Impact**   | 0.7664       | 1      | 0.5238      |

It is fascinating. After reweighing, both metrics are essentially ideal: the disparate impact is 1, and the mean difference is 3.33e-16, which is zero up to floating-point round-off. The reweighed data is therefore much less biased than the original data on both metrics.

### Compute Fairness Metrics on Model Trained on Transformed Data

In the following, we will train a model on the transformed data and compute the metrics over predictions made on the test data.

**Q4:**  How do you expect the fairness metrics would be over a model trained on the transformed data?

**Q4_Write your answer in this text cell:**

Even though the fairness metrics on the reweighed data are close to ideal, I expect the predictions of a model trained on the transformed data to remain biased, much as before, because in Q2 the fairness metrics on the model's predictions were farther from the ideal values than those of the original data.

Since the instances now have weights, we will use a classifier that can incorporate instance weights. In this case, we will use a Naive Bayes classifier (more details here: https://scikit-learn.org/stable/modules/naive_bayes.html). 

In [25]:
df_train_rw, dict_df_train_rw = dataset_transf_train.convert_to_dataframe()

# Fit the model to the transformed training data
x_train_rw = df_train_rw.drop(['credit'], axis=1)
y_train_rw = df_train_rw['credit']

from sklearn.naive_bayes import GaussianNB
model__gnb = GaussianNB()
model__gnb.fit(x_train_rw, y_train_rw)

# Use the model to make predictions on the test data
y_pred_rw = model__gnb.predict(x_test)

dataset_pred_test_rw = dataset_orig_test.copy()
dataset_pred_test_rw.labels = y_pred_rw.copy()

# Construct the BinaryLabelDatasetMetric object over the test predictions
metric_dataset_test_rw = BinaryLabelDatasetMetric(
    dataset_pred_test_rw, 
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups
  )
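One caveat about the cell above: the DataFrame returned by `convert_to_dataframe` holds only the features and the label, and `fit` is called without the instance weights, so the classifier as written does not see them. scikit-learn's `GaussianNB.fit` accepts a `sample_weight` argument for this purpose. The sketch below uses synthetic data and hypothetical weights (not the German Credit data) to show that passing weights changes what the model learns.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Synthetic features and binary labels, for illustration only.
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# Hypothetical instance weights: down-weight class 1, up-weight class 0.
w = np.where(y == 1, 0.5, 2.0)

unweighted = GaussianNB().fit(X, y)
weighted = GaussianNB().fit(X, y, sample_weight=w)

# The learned class priors shift once the weights are passed in.
print("unweighted priors:", unweighted.class_prior_)
print("weighted priors:  ", weighted.class_prior_)
```

In this notebook, the analogous call would presumably be `model__gnb.fit(x_train_rw, y_train_rw, sample_weight=dataset_transf_train.instance_weights)`. The metrics reported below were produced without it.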

In [26]:
# Print fairness metrics computed over test predictions
# write code here
json_expl_q5 = MetricJSONExplainer(metric_dataset_test_rw)

In [27]:
print(format_json(json_expl_q5.disparate_impact()))

{
  "metric": "Disparate Impact",
  "message": "Disparate impact (probability of favorable outcome for unprivileged instances / probability of favorable outcome for privileged instances): 0.810077519379845",
  "numPositivePredictionsUnprivileged": 19.0,
  "numUnprivileged": 36.0,
  "numPositivePredictionsPrivileged": 172.0,
  "numPrivileged": 264.0,
  "description": "Computed as the ratio of rate of favorable outcome for the unprivileged group to that of the privileged group.",
  "ideal": "The ideal value of this metric is 1.0 A value < 1 implies higher benefit for the privileged group and a value >1 implies a higher benefit for the unprivileged group."
}


In [28]:
print(format_json(json_expl_q5.mean_difference()))

{
  "metric": "Mean Difference",
  "message": "Mean difference (mean label value on unprivileged instances - mean label value on privileged instances): -0.1237373737373737",
  "numPositivesUnprivileged": 19.0,
  "numInstancesUnprivileged": 36.0,
  "numPositivesPrivileged": 172.0,
  "numInstancesPrivileged": 264.0,
  "description": "Computed as the difference of the rate of favorable outcomes received by the unprivileged group to the privileged group.",
  "ideal": "The ideal value of this metric is 0.0"
}


**Q5:** Are your observations in line with what you expected in Q4 above? Why or why not?

**Q5_Write your answer in this text cell:**

|      | Original Data | Reweighed Data   | Model's Predictions over test data   | Predictions of model trained on RW data   |
| :---        |    :----:   |          ---: |         ---: |        ---: |
| **Mean difference**  | -0.1699       | 3.33e-16 (≈ 0)   | -0.3030   | -0.1237   |
| **Disparate Impact**   | 0.7664       | 1      | 0.5238      | 0.81    |

The predictions are still biased toward the privileged group, as I expected: the disparate impact (0.81) is below 1 and the mean difference (-0.1237) is below 0. However, the bias is smaller than I expected. Although the predictions of the model trained on the reweighed data are less fair than the reweighed data itself, they are closer to the ideal values on both metrics than the predictions of the model trained on the original data (-0.3030 and 0.5238), and also closer than the original training data (-0.1699 and 0.7664).

The observation therefore only partly aligns with what I expected. We changed both the data (reweighing) and the model (Naive Bayes instead of logistic regression), and either change could account for the different values.

**Q6:** Instead of reweighing, one could also apply techniques such as suppression, i.e. removing sensitive attributes. Write code below to train a model that does not use any information on the sensitive attribute, use this model to make predictions over the test data, and then compute the fairness metrics over the predictions.


-  Reference: https://www.oreilly.com/library/view/practical-fairness/9781492075721/ch04.html#callout_fairness_pre_processing_CO1-1

I referred to the suppression part of this reference, specifically the `build_logit_model_suppression` function.

In [30]:
label_map = {1.0: 'Good Credit', 0.0: 'Bad Credit'}
protected_attribute_maps = [{1.0: 'Male', 0.0: 'Female'}]
gd = GermanDataset(protected_attribute_names=['sex'],
                   privileged_classes=[['male']], 
                   metadata={'label_map': label_map, 'protected_attribute_maps': protected_attribute_maps})

In [135]:
aa=dataset_orig_sup.features[:,]
print(aa)

[[6.000e+00 1.169e+03 4.000e+00 ... 1.000e+00 1.000e+00 0.000e+00]
 [4.800e+01 5.951e+03 2.000e+00 ... 0.000e+00 1.000e+00 0.000e+00]
 [1.200e+01 2.096e+03 2.000e+00 ... 0.000e+00 1.000e+00 0.000e+00]
 ...
 [1.200e+01 8.040e+02 4.000e+00 ... 0.000e+00 1.000e+00 0.000e+00]
 [4.500e+01 1.845e+03 4.000e+00 ... 1.000e+00 1.000e+00 0.000e+00]
 [4.500e+01 4.576e+03 3.000e+00 ... 0.000e+00 1.000e+00 0.000e+00]]


In [140]:
print(aa[:,4][:4])
print(aa[:,7][:4])

[1. 0. 1. 1.]
[1. 0. 1. 1.]


In [141]:
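# Note: `4&7` is a bitwise AND that evaluates to 4, so only column 4 (age) is removed here.
# Removing both columns 4 and 7 would need a list of indices: np.delete(aa, [4, 7], axis=1)
aa2 = np.delete(aa, 4&7, 1)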
print(aa2)
print(aa2[:,4][:4])
print(aa2[:,7][:4])

[[6.000e+00 1.169e+03 4.000e+00 ... 1.000e+00 1.000e+00 0.000e+00]
 [4.800e+01 5.951e+03 2.000e+00 ... 0.000e+00 1.000e+00 0.000e+00]
 [1.200e+01 2.096e+03 2.000e+00 ... 0.000e+00 1.000e+00 0.000e+00]
 ...
 [1.200e+01 8.040e+02 4.000e+00 ... 0.000e+00 1.000e+00 0.000e+00]
 [4.500e+01 1.845e+03 4.000e+00 ... 1.000e+00 1.000e+00 0.000e+00]
 [4.500e+01 4.576e+03 3.000e+00 ... 0.000e+00 1.000e+00 0.000e+00]]
[2. 1. 1. 1.]
[1. 0. 0. 1.]
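The output above still has a populated column at index 7 because only one column was removed. A small sketch on a toy array (the array `a` is hypothetical) shows the difference between passing `4 & 7` and passing a list of indices to `np.delete`:

```python
import numpy as np

# Toy array with 12 columns, standing in for the feature matrix.
a = np.arange(24).reshape(2, 12)

# Bitwise AND of the two indices collapses to a single index.
print(4 & 7)

# Deletes only column 4.
one_removed = np.delete(a, 4 & 7, axis=1)

# Deletes columns 4 and 7.
both_removed = np.delete(a, [4, 7], axis=1)

print(one_removed.shape, both_removed.shape)
print(both_removed[0])
```

Passing a list (or array) of indices is the way to drop several columns in one call.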


In [123]:
bb=dataset_orig_sup.labels.ravel()
print(bb)

[1. 2. 1. 1. 2. 1. 1. 1. 1. 2. 2. 2. 1. 2. 1. 2. 1. 1. 2. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 2. 1. 1. 1. 1. 1. 2. 1. 2. 1. 1. 1. 1. 1. 1. 2. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 2. 1. 2. 1. 1. 2. 1. 1. 2. 2. 1. 1. 1. 1. 2. 1. 1. 1.
 1. 1. 2. 1. 2. 1. 1. 1. 2. 1. 1. 1. 1. 1. 1. 2. 1. 2. 1. 1. 2. 1. 1. 2.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 2. 2. 1. 1. 1. 1. 1. 1. 2. 1. 1. 2. 1. 2. 1.
 2. 1. 1. 1. 2. 1. 1. 2. 1. 2. 1. 2. 1. 1. 1. 1. 1. 2. 1. 1. 1. 1. 1. 2.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 2. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 2. 1.
 1. 2. 2. 1. 2. 1. 2. 2. 1. 1. 1. 1. 2. 2. 2. 1. 2. 1. 2. 1. 2. 1. 2. 2.
 2. 1. 2. 2. 1. 2. 1. 2. 1. 1. 1. 2. 1. 1. 1. 1. 1. 1. 1. 1. 2. 2. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 2. 2. 2. 1. 2. 1. 1. 1. 1. 2. 2. 2. 1. 1.
 2. 1. 2. 1. 1. 1. 1. 1. 1. 2. 1. 1. 2. 1. 1. 1. 1. 2. 1. 1. 1. 1. 1. 1.
 1. 2. 1. 1. 2. 1. 1. 1. 1. 2. 2. 1. 1. 1. 2. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 2. 1. 2. 1. 1. 1. 2. 1. 1. 1. 1. 1. 2. 2. 1. 2. 1. 1. 2. 2. 1. 1. 1.
 1. 2. 1. 2. 1. 1. 1. 1. 2. 2. 1. 1. 1. 1. 1. 1. 1.

In [125]:
cc=dataset_orig_sup.instance_weights.ravel()
print(cc)

[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.

In [36]:
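# Note: this cell uses `x > 25`, whereas Section 2 uses `x >= 25`,
# so applicants aged exactly 25 are grouped differently here.
dataset_orig_sup = GermanDataset(protected_attribute_names=['sex', 'age'],
                                 privileged_classes=[['male'], lambda x: x > 25],
                                 features_to_drop=['personal_status'])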
dataset_orig_train_sup, dataset_orig_test_sup = dataset_orig_sup.split([0.7], shuffle=True)


In [44]:
model2 = LogisticRegression(solver='liblinear', class_weight='balanced')

df_test_sup, dict_df_test_sup = dataset_orig_test_sup.convert_to_dataframe()
df_train_sup, dict_df_train_sup = dataset_orig_train_sup.convert_to_dataframe()

# Fit the model to the training data
x_train_sup = df_train_sup.drop(['credit','sex','age'], axis=1)
y_train_sup = df_train_sup['credit']
model2.fit(x_train_sup, y_train_sup)

x_test_sup = df_test_sup.drop(['credit','sex','age'], axis=1)
y_test_sup = df_test_sup['credit']

y_pred_sup = model2.predict(x_test_sup)

dataset_pred_test_sup = dataset_orig_test_sup.copy()
dataset_pred_test_sup.labels = y_pred_sup.copy()

In [56]:
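# Note: these groups are defined on both sex and age and differ from the age-only
# groups used in Sections 2 to 4.1 (age=1 privileged, age=0 unprivileged),
# so the metrics below are not directly comparable with the earlier ones.
up = [{'sex': 1, 'age': 1}, {'sex': 0}]
p = [{'sex': 1, 'age': 0}]
metric_dataset_test_sup = BinaryLabelDatasetMetric(
    dataset_pred_test_sup,
    privileged_groups=p,
    unprivileged_groups=up)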

In [61]:
print("Suppression")
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_test_sup.mean_difference())
print("Disparate Impact = %f" % metric_dataset_test_sup.disparate_impact())

Suppression
Difference in mean outcomes between unprivileged and privileged groups = 0.074949
Disparate Impact = 1.145214


In [58]:
df_sup, dict_df_sup = dataset_orig_sup.convert_to_dataframe()
print("Shape: ", df_sup.shape)
# print(df.columns)
df_sup.head(5)

Shape:  (1000, 59)


Unnamed: 0,month,credit_amount,investment_as_income_percentage,residence_since,age,number_of_credits,people_liable_for,sex,status=A11,status=A12,...,housing=A153,skill_level=A171,skill_level=A172,skill_level=A173,skill_level=A174,telephone=A191,telephone=A192,foreign_worker=A201,foreign_worker=A202,credit
0,6.0,1169.0,4.0,4.0,1.0,2.0,1.0,1.0,1.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,1.0
1,48.0,5951.0,2.0,2.0,0.0,1.0,1.0,0.0,0.0,1.0,...,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,2.0
2,12.0,2096.0,2.0,3.0,1.0,1.0,2.0,1.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0
3,42.0,7882.0,2.0,4.0,1.0,1.0,2.0,1.0,1.0,0.0,...,1.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0
4,24.0,4870.0,3.0,4.0,1.0,2.0,2.0,1.0,1.0,0.0,...,1.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,2.0


**Q7:** Interpret your results. How does the preprocessing technique in Q5 compare to the suppression technique? 

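|      | Original Data | Reweighed Data   | Model's Predictions over test data   | Predictions of model trained on RW data   | Suppression   |
| :---        |    :----:   |          ---: |         ---: |        ---: |        ---: |
| **Mean difference**  | -0.1699       | 3.33e-16 (≈ 0)   | -0.3030   | -0.1237   | 0.0749   |
| **Disparate Impact**   | 0.7664       | 1      | 0.5238      | 0.81    | 1.1452    |

**Write your answer in this text cell:**

Overall, the suppression model is closer to the ideal values than the model trained on the reweighed data: its mean difference is 0.0749 (versus -0.1237) and its disparate impact is 1.1452 (versus 0.81). The sign also flips: with suppression, the unprivileged group receives slightly more favorable predictions than the privileged group. Note that the suppression metrics use groups defined on both sex and age and a different train-test split, so the comparison is only approximate.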

### 4.2. Bias Mitigation via In-Processing

In-processing methods focus on the model training stage, as compared to pre-processing which focuses on transforming the data prior to model training. Broadly speaking, contemporary in-processing methods are stronger than pre-processing methods.

### Adversarial Debiasing

In this part of the notebook, we will use an in-processing algorithm, called _Adversarial Debiasing_, that we briefly discussed in class. From the aif360 documentation (https://aif360.readthedocs.io/en/v0.2.3/modules/inprocessing.html):

> Adversarial debiasing is an in-processing technique that learns a classifier to maximize prediction accuracy and simultaneously reduce an adversary’s ability to determine the protected attribute from the predictions. This approach leads to a fair classifier as the predictions cannot carry any group discrimination information that the adversary can exploit.

For intuition, you can think of adversarial debiasing as a model with two supervised learning tasks. The first task is to predict an outcome using the training data input. The second task, i.e. the adversary, is to predict a protected feature using these predictions and non-protected features in the training data input. The aim is to maximize the model's ability to carry out the first task (i.e. predict outcomes) while minimizing its ability to carry out the second task (i.e. predict protected features).
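To make the two competing tasks concrete, here is a minimal sketch of how the predictor's update direction can be formed from the two gradients. It assumes the gradient-projection rule described in the paper behind this technique (Zhang et al., "Mitigating Unwanted Biases with Adversarial Learning"); the function name `debiased_update_direction` and the weight `alpha` are hypothetical names used only for illustration, and this is not the `aif360` implementation itself.

```python
import numpy as np

def debiased_update_direction(grad_pred, grad_adv, alpha=1.0):
    """Combine predictor and adversary gradients for one predictor step.

    grad_pred: gradient of the predictor loss w.r.t. the predictor weights
    grad_adv:  gradient of the adversary loss w.r.t. the same weights
    alpha:     hypothetical weight on the adversary term (an assumption)
    """
    grad_pred = np.asarray(grad_pred, dtype=float)
    grad_adv = np.asarray(grad_adv, dtype=float)
    # Unit vector along the adversary gradient (epsilon avoids division by zero)
    unit_adv = grad_adv / (np.linalg.norm(grad_adv) + 1e-12)
    # Component of the predictor gradient that lies along the adversary gradient
    projection = np.dot(grad_pred, unit_adv) * unit_adv
    # Drop that component, then subtract alpha * grad_adv so that a
    # gradient-descent step along the result increases the adversary loss
    return grad_pred - projection - alpha * grad_adv

direction = debiased_update_direction([1.0, 1.0], [1.0, 0.0], alpha=0.5)
print(direction)
```

A gradient-descent step subtracts this direction from the predictor weights, so the predictor keeps improving on the outcome task only along directions that do not change the adversary loss, while also moving to make the adversary's task harder.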

We implement adversarial debiasing below:

In [63]:
import tensorflow as tf

print(tf.__version__)


2.8.0


In [69]:
# reset tensorflow graph
tf.compat.v1.reset_default_graph()

# start tensorflow session
sess = tf.compat.v1.Session()
tf.compat.v1.disable_eager_execution()

# create AdversarialDebiasing model
debiased_model = AdversarialDebiasing(
    privileged_groups = privileged_groups,
    unprivileged_groups = unprivileged_groups,
    scope_name = 'debiased_classifier',
    debias = True,
    sess = sess)

# fit the model to training data
debiased_model.fit(dataset_orig_train)

# make predictions on training and test data
dataset_debiasing_train = debiased_model.predict(dataset_orig_train)
dataset_debiasing_test = debiased_model.predict(dataset_orig_test)

# metrics
metric_dataset_debiasing_test = BinaryLabelDatasetMetric(
    dataset_debiasing_test, 
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups
  )

# Close session
sess.close()

Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.


epoch 0; iter: 0; batch classifier loss: 71.666153; batch adversarial loss: 0.660049
epoch 1; iter: 0; batch classifier loss: 50.631836; batch adversarial loss: 0.655333
epoch 2; iter: 0; batch classifier loss: 43.741997; batch adversarial loss: 0.666746
epoch 3; iter: 0; batch classifier loss: 52.799423; batch adversarial loss: 0.654768
epoch 4; iter: 0; batch classifier loss: 39.849537; batch adversarial loss: 0.650470
epoch 5; iter: 0; batch classifier loss: 60.623253; batch adversarial loss: 0.640732
epoch 6; iter: 0; batch classifier loss: 40.378685; batch adversarial loss: 0.635663
epoch 7; iter: 0; batch classifier loss: 56.493797; batch adversarial loss: 0.643048
epoch 8; iter: 0; batch classifier loss: 56.042507; batch adversarial loss: 0.640165
epoch 9; iter: 0; batch classifier loss: 48.317097; batch adversarial loss: 0.647052
epoch 10; iter: 0; batch classifier loss: 54.358109; batch adversarial loss: 0.622438
epoch 11; iter: 0; batch classifier loss: 61.225632; batch adver

### Fairness Metrics under Adversarial Debiasing

The `BinaryLabelDatasetMetric` object built from the debiased test predictions (```metric_dataset_debiasing_test```) has built-in methods for the difference in mean outcomes (called ```.mean_difference()```) and disparate impact (called ```.disparate_impact()```). Print these below:

In [70]:
# write your code here
print("Adversarial Debiasing")
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_debiasing_test.mean_difference())
print("Disparate Impact = %f" % metric_dataset_debiasing_test.disparate_impact())

Adversarial Debiasing
Difference in mean outcomes between unprivileged and privileged groups = 0.000000
Disparate Impact = 1.000000
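
For reference, both metrics can be reproduced by hand from the favorable-outcome rate in each group. The sketch below is an unweighted illustration on made-up toy arrays; the helper name `group_fairness_metrics` is hypothetical, and `aif360` additionally takes instance weights into account, which this sketch ignores.

```python
import numpy as np

def group_fairness_metrics(labels, protected, favorable_label=1, privileged_value=1):
    """Return (mean difference, disparate impact) for binary groups."""
    labels = np.asarray(labels)
    protected = np.asarray(protected)
    favorable = labels == favorable_label
    # Share of favorable outcomes within each group
    rate_priv = favorable[protected == privileged_value].mean()
    rate_unpriv = favorable[protected != privileged_value].mean()
    # Mean difference: unprivileged rate minus privileged rate (ideal: 0)
    mean_difference = rate_unpriv - rate_priv
    # Disparate impact: unprivileged rate divided by privileged rate (ideal: 1)
    disparate_impact = rate_unpriv / rate_priv
    return mean_difference, disparate_impact

# Toy data (not from the German Credit dataset): label 1 = good credit, 2 = bad credit
labels = [1, 1, 1, 2, 1, 2, 2, 2]
age = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = privileged, 0 = unprivileged
md, di = group_fairness_metrics(labels, age)
print("Mean difference = %f" % md)
print("Disparate impact = %f" % di)
```

In the toy data the privileged group receives the favorable label 75% of the time and the unprivileged group 25% of the time, so the mean difference is negative and the disparate impact is below 1.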


**Q8:** Interpret the difference in means and disparate impact for the predicted outcomes under adversarial debiasing. How do these compare to the metrics calculated in Q2 and Q5?

|      | Model's Predictions over test data   |Model's Predictions over RW data   | Adversarial Debiasing|
| :---        |        ---: |        ---: |        ---: |
| **Mean difference**  | -0.3030   | -0.1237   | 0   |
| **Disparate Impact**   | 0.5238      | 0.81    | 1   |

**Write your interpretation and comparison in this text cell:**

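Under adversarial debiasing, the predicted outcomes reach the ideal values of both metrics: a mean difference of 0 and a disparate impact of 1, compared with -0.3030 and 0.5238 for the model's predictions over the test data and -0.1237 and 0.81 for the model's predictions over the RW data. By these two metrics it is the fairest of the three. Exact values of 0 and 1 can also occur when a model assigns the same label to every applicant, so accuracy should be checked alongside these metrics.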

### 4.3. Bias Mitigation via Post-Processing

In this last section, we will use one of the post-processing algorithms in AI Fairness 360 called **equalized odds postprocessing**, which is implemented in the `EqOddsPostprocessing` class in the `aif360.algorithms.postprocessing` package. This technique solves a linear program to find the probabilities with which to change output labels in order to optimize equalized odds.
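
Equalized odds asks that the true positive rate (TPR) and false positive rate (FPR) be the same across groups. As a rough illustration of what the post-processor tries to equalize, the sketch below computes both rates per group on made-up toy arrays; the helper name `rates_by_group` is hypothetical and is not part of `aif360`.

```python
import numpy as np

def rates_by_group(y_true, y_pred, protected, group_value, favorable_label=1):
    """Return (TPR, FPR) for the rows whose protected attribute equals group_value."""
    y_true, y_pred, protected = map(np.asarray, (y_true, y_pred, protected))
    in_group = protected == group_value
    actual_pos = in_group & (y_true == favorable_label)
    actual_neg = in_group & (y_true != favorable_label)
    pred_pos = y_pred == favorable_label
    # TPR: share of actual positives predicted positive
    tpr = pred_pos[actual_pos].mean()
    # FPR: share of actual negatives predicted positive
    fpr = pred_pos[actual_neg].mean()
    return tpr, fpr

# Toy data (not from the German Credit dataset): label 1 = good credit, 2 = bad credit
y_true = [1, 1, 2, 2, 1, 1, 2, 2]
y_pred = [1, 1, 1, 2, 1, 2, 2, 2]
age = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = privileged, 0 = unprivileged

tpr_priv, fpr_priv = rates_by_group(y_true, y_pred, age, group_value=1)
tpr_unpriv, fpr_unpriv = rates_by_group(y_true, y_pred, age, group_value=0)
print("TPR gap (unprivileged - privileged) = %f" % (tpr_unpriv - tpr_priv))
print("FPR gap (unprivileged - privileged) = %f" % (fpr_unpriv - fpr_priv))
```

Equalized odds holds when both gaps are zero. Note that this is a different criterion from the mean difference and disparate impact reported in this notebook, so a post-processor that optimizes equalized odds does not necessarily improve those two metrics.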

You can find documentation for equalized odds postprocessing here:
https://aif360.readthedocs.io/en/latest/modules/generated/aif360.algorithms.postprocessing.EqOddsPostprocessing.html 

Call the `fit` and `predict` methods to apply the post-processing, producing newly transformed training and test datasets (```dataset_eop_train``` and ```dataset_eop_test```):

In [72]:
df_test, dict_df_test = dataset_orig_test.convert_to_dataframe()
df_train, dict_df_train = dataset_orig_train.convert_to_dataframe()

# Fit the model to the training data and predict for test data
# write code here
# dataset_pred_test -- dataset with predictions stored in labels

# create Equalized Odds Post processing object
eo_post = EqOddsPostprocessing(unprivileged_groups=unprivileged_groups,
                privileged_groups=privileged_groups)

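# fit the post-processor on the test set: true labels vs. the model's predictions
eo_post.fit(dataset_orig_test, dataset_pred_test)

# apply the post-processor
# write code here
# NOTE: predict() is normally applied to a dataset of model predictions
# (e.g. dataset_pred_test); here it is applied to the original datasets,
# which is what produced the output shown below
dataset_eop_train = eo_post.predict(dataset_orig_train)
dataset_eop_test = eo_post.predict(dataset_orig_test)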

# construct metrics object
# write code here
metric_dataset_eop_test = BinaryLabelDatasetMetric(
    dataset_eop_test, 
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups
  )

# compute fairness metrics
# write code here
print("Equalized odds postprocessing")
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_eop_test.mean_difference())
print("Disparate Impact = %f" % metric_dataset_eop_test.disparate_impact())

Equalized odds postprocessing
Difference in mean outcomes between unprivileged and privileged groups = 0.345960
Disparate Impact = 1.992754


**Q9:** Interpret the difference in fairness metrics for the predicted outcomes under this post-processing technique. How do these compare to the metrics calculated in Q2, Q5 and Q8?

| | Model's Predictions over test data | Model's Predictions over RW data | Adversarial Debiasing | Equalized Odds Postprocessing |
| :--- | ---: | ---: | ---: | ---: |
| **Mean difference** | -0.3030 | -0.1237 | 0 | 0.3459 |
| **Disparate Impact** | 0.5238 | 0.81 | 1 | 1.9927 |

**Write your interpretation and comparison in this text cell:**
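Equalized odds postprocessing (EOP) pushes both metrics past their ideal values rather than toward them: the mean difference is 0.3459 (ideal 0) and the disparate impact is 1.9927 (ideal 1). The positive sign means the disparity now favors the unprivileged group. In magnitude, the mean difference is even larger than for the model's predictions over the test data (-0.3030), so by these two metrics EOP is the least fair of the four approaches in the table.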


# Submitting this Assignment Notebook

Once complete, please submit your assignment notebook as an attachment under "Assignments > Assignment 4" on Brightspace. You can download a copy of your notebook using ```File > Download .ipynb```. Please ensure you submit the `.ipynb` file (and not a `.py` file).