# Yoonhyuck WOO / Purdue University_Computer and Information Technology
# Assignment 4: Detecting and Mitigating Bias 
# Professor: Dr. Pradhan

## Date: 3.4 - 5:00pm 3/22/2024 (EST)

#### references
- class lectures


The goal of this tutorial is to introduce the basic functionality of AI Fairness 360 for detecting and mitigating bias. As before, we will work with the German Credit dataset. There are many metrics one can use to detect the presence of bias. Likewise, there are many different bias mitigation algorithms one can employ. AI Fairness 360 provides some of the most common metrics and algorithms.


### Bias mitigation techniques

In class, we learnt about three families of bias mitigation techniques: _pre-processing_, _in-processing_, and _post-processing_.


We will use AI Fairness 360 (`aif360`) to detect and mitigate bias. We will look for bias in the creation of a machine learning model that predicts whether an applicant should be given credit based on various features from a typical credit application. The protected attribute will be "Age", with "1" (older than or equal to 25) and "0" (younger than 25) being the values for the _privileged_ and _unprivileged_ groups, respectively.

In this notebook, we will:

1. Install and import packages and modules
2. Load dataset, split between train and test, and compute fairness metrics on original training dataset
3. Mitigate bias using a pre-processing algorithm (reweighing)
4. Mitigate bias using an in-processing algorithm (adversarial debiasing)
5. Mitigate bias using a post-processing algorithm (equalized odds post processing)


## 1. Import Statements

First, we install the necessary packages. Then we import several components from the `aif360` package. We are relying on `aif360` for this assignment, so please start early to make sure that the dependencies are resolved and that the packages load correctly.

In [1]:
import numba

In [11]:
import aif360

In [12]:
print(aif360.__version__)

0.6.0


In [9]:
!pip install aif360[inFairness]

^C


In [5]:
!pip install tensorflow-macos

ERROR: Could not find a version that satisfies the requirement tensorflow-macos
ERROR: No matching distribution found for tensorflow-macos


In [12]:
# No need to re-install if you already did so in Assignment 2
!pip install aif360==0.2.2

Collecting aif360==0.2.2
  Using cached aif360-0.2.2-py2.py3-none-any.whl (56.4 MB)
Installing collected packages: aif360
  Attempting uninstall: aif360
    Found existing installation: aif360 0.6.0
    Uninstalling aif360-0.6.0:
      Successfully uninstalled aif360-0.6.0
Successfully installed aif360-0.2.2


In [5]:
!pip install aif360==0.6.0

Collecting aif360==0.6.0
  Using cached aif360-0.6.0-py3-none-any.whl (229 kB)
Installing collected packages: aif360
  Attempting uninstall: aif360
    Found existing installation: aif360 0.2.2
    Uninstalling aif360-0.2.2:
      Successfully uninstalled aif360-0.2.2
Successfully installed aif360-0.6.0


In [15]:
# import all necessary packages
import numpy as np
np.random.seed(0)

from numba import jit
from aif360.datasets import GermanDataset, BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric, DatasetMetric

from aif360.algorithms.preprocessing import Reweighing, LFR, DisparateImpactRemover
from aif360.algorithms.inprocessing import AdversarialDebiasing
from aif360.algorithms.postprocessing import EqOddsPostprocessing

from aif360.explainers import MetricTextExplainer, MetricJSONExplainer

from sklearn.linear_model import LogisticRegression

from IPython.display import Markdown, display

import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

import json
from collections import OrderedDict

Collecting skorch
  Using cached skorch-0.15.0-py3-none-any.whl (239 kB)
Collecting inFairness>=0.2.2
  Using cached inFairness-0.2.3-py3-none-any.whl (45 kB)
Collecting torch>=1.13.0
  Using cached torch-2.2.1-cp38-cp38-win_amd64.whl (198.6 MB)
Collecting POT>=0.8.0
  Using cached POT-0.9.3-cp38-cp38-win_amd64.whl (294 kB)
Collecting typing-extensions>=4.8.0
  Using cached typing_extensions-4.10.0-py3-none-any.whl (33 kB)
Collecting tabulate>=0.7.7
  Using cached tabulate-0.9.0-py3-none-any.whl (35 kB)
Installing collected packages: typing-extensions, torch, tabulate, POT, skorch, inFairness
  Attempting uninstall: typing-extensions
    Found existing installation: typing-extensions 4.1.1
    Uninstalling typing-extensions-4.1.1:
      Successfully uninstalled typing-extensions-4.1.1
  Attempting uninstall: torch
    Found existing installation: torch 1.9.1
    Uninstalling torch-1.9.1:
      Successfully uninstalled torch-1.9.1
Successfully installed POT-0.9.3 inFairness-0.2.3 skorch

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchvision 0.10.1 requires torch==1.9.1, but you have torch 2.2.1 which is incompatible.
torchtext 0.10.1 requires torch==1.9.1, but you have torch 2.2.1 which is incompatible.
torchaudio 0.9.1 requires torch==1.9.1, but you have torch 2.2.1 which is incompatible.
allennlp 2.9.2 requires torch<1.12.0,>=1.6.0, but you have torch 2.2.1 which is incompatible.
allennlp 2.9.2 requires transformers<4.18,>=4.1, but you have transformers 4.18.0 which is incompatible.


## 2. Load Data, Specify Protected Attribute, and Split Data

We will use the German Credit data, set the protected attribute to be age, create two variables to represent the privileged and unprivileged groups, and split the original dataset into training and test data subsets. Finally, we will build a typical machine learning workflow that involves training a machine learning model on the training dataset and use a test dataset to assess the model's efficacy (e.g., accuracy, fairness). For this dataset, we have a binary classification problem that predicts individuals as being a good or a bad credit risk.

In this dataset, we consider older applicants (`age >= 25`) as the privileged group and younger applicants (`age < 25`) as the unprivileged group. 
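
As a rough illustration of this encoding, the sketch below maps raw ages to the 1/0 values used for the privileged and unprivileged groups. The helper `encode_age` and the sample ages are hypothetical and are not taken from the German Credit data; the notebook itself relies on the `privileged_classes` argument of `GermanDataset` for this step.

```python
# Illustrative ages only; these are not rows from the German Credit data.
ages = [19, 24, 25, 31, 67]


def encode_age(age, threshold=25):
    """Return 1.0 for the privileged group (age >= threshold), else 0.0."""
    return 1.0 if age >= threshold else 0.0


encoded = [encode_age(a) for a in ages]
print(encoded)  # [0.0, 0.0, 1.0, 1.0, 1.0]
```

Note that the boundary case matters: an applicant aged exactly 25 falls in the privileged group under `>=`, but would be unprivileged under a strict `>` comparison.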

We will use the preprocessed GermanDataset with one-hot encoded data provided by the aif360 package. 

In [14]:
label_map = {1.0: 'Good Credit', 0.0: 'Bad Credit'}
protected_attribute_maps = [{1.0: 'Male', 0.0: 'Female'}]
gd = GermanDataset(protected_attribute_names=['sex'],
                   privileged_classes=[['male']], 
                   metadata={'label_map': label_map, 'protected_attribute_maps': protected_attribute_maps})

ERROR:root:Internal Python error in the inspect module.
Below is the traceback from this internal error.



IOError: [Errno 2] No such file or directory: 'C:\\Users\\LG\\anaconda3\\lib\\site-packages\\aif360\\datasets\\..\\data\\raw\\german\\german.data'
To use this class, please download the following files:

	https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data
	https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.doc

and place them, as-is, in the folder:

	C:\Users\LG\anaconda3\lib\site-packages\aif360\data\raw\german

Traceback (most recent call last):
  File "C:\Users\LG\anaconda3\lib\site-packages\aif360\datasets\german_dataset.py", line 78, in __init__
    df = pd.read_csv(filepath, sep=' ', header=None, names=column_names,
  File "C:\Users\LG\anaconda3\lib\site-packages\pandas\io\parsers\readers.py", line 912, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\LG\anaconda3\lib\site-packages\pandas\io\parsers\readers.py", line 577, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C

TypeError: object of type 'NoneType' has no len()

In [None]:
gd2 = GermanDataset()

In [14]:
print(gd2)

               instance weights features                \
                                                         
                                   month credit_amount   
instance names                                           
0                           1.0      6.0        1169.0   
1                           1.0     48.0        5951.0   
2                           1.0     12.0        2096.0   
3                           1.0     42.0        7882.0   
4                           1.0     24.0        4870.0   
...                         ...      ...           ...   
995                         1.0     12.0        1736.0   
996                         1.0     30.0        3857.0   
997                         1.0     12.0         804.0   
998                         1.0     45.0        1845.0   
999                         1.0     45.0        4576.0   

                                                                \
                                                               

In [8]:
# note that we drop sex, which may also be a protected attribute
dataset_orig_2 = GermanDataset(protected_attribute_names=['sex'],
                             privileged_classes=['male'])
# df['sex'] = df['personal_status'].replace(status_map)

dataset_orig_train, dataset_orig_test = dataset_orig_2.split([0.7], shuffle=True)

# privileged_groups = [{'age': 1}]
# unprivileged_groups = [{'age': 0}]

privileged_groups = [{'sex': 1}]
unprivileged_groups = [{'sex': 0}]

NotImplementedError: 

In [6]:
german_data = GermanDataset(protected_attribute_names=['age'],
    privileged_classes=[lambda x: x > 25],
    features_to_drop=['personal_status'])

# german_data.drop(['sex', 'age'], axis=1, inplace=True)

ValueError: could not convert string to float: 'male'


ValueError: DataFrame values must be numerical.

In [24]:
default_mappings = {
    'label_maps': [{1.0: 'Good Credit', 2.0: 'Bad Credit'}],
    'protected_attribute_maps': [{1.0: 'Male', 0.0: 'Female'},
                                 {1.0: 'Old', 0.0: 'Young'}],
}

In [13]:
# note that we drop sex, which may also be a protected attribute
dataset_orig = GermanDataset(protected_attribute_names=['age'],
                             privileged_classes=[lambda x: x >= 25],
                             features_to_drop=['personal_status', 'sex'])

dataset_orig_train, dataset_orig_test = dataset_orig.split([0.7], shuffle=True)

privileged_groups = [{'age': 1}]
unprivileged_groups = [{'age': 0}]

In [14]:
print("Original data shape: ",dataset_orig.features.shape)
print("Train dataset shape: ", dataset_orig_train.features.shape)
print("Test dataset shape: ", dataset_orig_test.features.shape)

Original data shape:  (1000, 57)
Train dataset shape:  (700, 57)
Test dataset shape:  (300, 57)


The object ```dataset_orig``` is an aif360 dataset, which has some useful methods and attributes that you can explore. More documentation is available at https://aif360.readthedocs.io/en/latest/modules/datasets.html. 
For now, we'll just transform the data into a pandas dataframe:

In [15]:
df, dict_df = dataset_orig.convert_to_dataframe()
print("Shape: ", df.shape)
# print(df.columns)
# df.head(5)

Shape:  (1000, 58)


In [16]:
df.head(5)

Unnamed: 0,month,credit_amount,investment_as_income_percentage,residence_since,age,number_of_credits,people_liable_for,status=A11,status=A12,status=A13,...,housing=A153,skill_level=A171,skill_level=A172,skill_level=A173,skill_level=A174,telephone=A191,telephone=A192,foreign_worker=A201,foreign_worker=A202,credit
0,6.0,1169.0,4.0,4.0,1.0,2.0,1.0,1.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,1.0
1,48.0,5951.0,2.0,2.0,0.0,1.0,1.0,0.0,1.0,0.0,...,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,2.0
2,12.0,2096.0,2.0,3.0,1.0,1.0,2.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0
3,42.0,7882.0,2.0,4.0,1.0,1.0,2.0,1.0,0.0,0.0,...,1.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0
4,24.0,4870.0,3.0,4.0,1.0,2.0,2.0,1.0,0.0,0.0,...,1.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,2.0


In [25]:
print(dict_df)

{'feature_names': ['month', 'credit_amount', 'investment_as_income_percentage', 'residence_since', 'age', 'number_of_credits', 'people_liable_for', 'status=A11', 'status=A12', 'status=A13', 'status=A14', 'credit_history=A30', 'credit_history=A31', 'credit_history=A32', 'credit_history=A33', 'credit_history=A34', 'purpose=A40', 'purpose=A41', 'purpose=A410', 'purpose=A42', 'purpose=A43', 'purpose=A44', 'purpose=A45', 'purpose=A46', 'purpose=A48', 'purpose=A49', 'savings=A61', 'savings=A62', 'savings=A63', 'savings=A64', 'savings=A65', 'employment=A71', 'employment=A72', 'employment=A73', 'employment=A74', 'employment=A75', 'other_debtors=A101', 'other_debtors=A102', 'other_debtors=A103', 'property=A121', 'property=A122', 'property=A123', 'property=A124', 'installment_plans=A141', 'installment_plans=A142', 'installment_plans=A143', 'housing=A151', 'housing=A152', 'housing=A153', 'skill_level=A171', 'skill_level=A172', 'skill_level=A173', 'skill_level=A174', 'telephone=A191', 'telephone=A

In [10]:
df['personal_status']

KeyError: 'personal_status'

## 3. Compute Fairness Metrics on Original Training Data
Now that we have identified the protected attribute "age" and defined privileged and unprivileged values, we can use aif360 to detect bias in the dataset.  

### Mean Outcomes

Compare the base rates (i.e., percentage of favorable results) for the privileged and unprivileged groups and report the difference (unprivileged base rate - privileged base rate). This is implemented in the ```mean_difference``` method on the BinaryLabelDatasetMetric class, as shown below:

In [11]:
metric_orig_train = BinaryLabelDatasetMetric(
     dataset_orig_train, 
     unprivileged_groups=unprivileged_groups,
     privileged_groups=privileged_groups
  )
print("Original training dataset")
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_train.mean_difference())

Original training dataset
Difference in mean outcomes between unprivileged and privileged groups = -0.169905


### Disparate Impact
We can calculate the ratio of the favorable-outcome rate for the unprivileged group to that of the privileged group, as implemented in the ```disparate_impact``` method on the BinaryLabelDatasetMetric class:

In [12]:
print("Original training dataset")
print("Disparate Impact = %f" % metric_orig_train.disparate_impact())

Original training dataset
Disparate Impact = 0.766430


**Note:** The fairness metrics above will vary depending upon the train-test split. If the magnitude of mean difference is less than 10%, try another split.

### Built-In Explainers

```aif360``` has some useful explainers for the fairness metrics which can be used to interpret the fairness metric values:

In [13]:
json_expl = MetricJSONExplainer(metric_orig_train)
def format_json(json_str):
    return json.dumps(json.loads(json_str, object_pairs_hook=OrderedDict),
                      indent=2)

Let's print the mean difference explainer:

In [14]:
print(format_json(json_expl.mean_difference()))

{
  "metric": "Mean Difference",
  "message": "Mean difference (mean label value on privileged instances - mean label value on unprivileged instances): -0.1699054740619017",
  "numPositivesUnprivileged": 63.0,
  "numInstancesUnprivileged": 113.0,
  "numPositivesPrivileged": 427.0,
  "numInstancesPrivileged": 587.0,
  "description": "Computed as the difference of the rate of favorable outcomes received by the unprivileged group to the privileged group.",
  "ideal": "The ideal value of this metric is 0.0"
}


We can also print the disparate impact explainer:

In [15]:
print(format_json(json_expl.disparate_impact()))

{
  "metric": "Disparate Impact",
  "message": "Disparate impact (probability of favorable outcome for unprivileged instances / probability of favorable outcome for privileged instances): 0.7664297113013201",
  "numPositivePredictionsUnprivileged": 63.0,
  "numUnprivileged": 113.0,
  "numPositivePredictionsPrivileged": 427.0,
  "numPrivileged": 587.0,
  "description": "Computed as the ratio of rate of favorable outcome for the unprivileged group to that of the privileged group.",
  "ideal": "The ideal value of this metric is 1.0 A value < 1 implies higher benefit for the privileged group and a value >1 implies a higher benefit for the unprivileged group."
}
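
Both metrics can be checked by hand from the counts in the explainer output. The sketch below is plain arithmetic, not the `aif360` implementation; `base_rate` is a hypothetical helper, and the four counts are copied from the JSON above.

```python
def base_rate(num_favorable, num_instances):
    """Fraction of instances in a group that received the favorable label."""
    return num_favorable / num_instances


# Counts copied from the explainer output above.
unpriv_rate = base_rate(63.0, 113.0)   # unprivileged group (younger applicants)
priv_rate = base_rate(427.0, 587.0)    # privileged group (older applicants)

# Mean difference: unprivileged base rate minus privileged base rate.
mean_difference = unpriv_rate - priv_rate
# Disparate impact: unprivileged base rate divided by privileged base rate.
disparate_impact = unpriv_rate / priv_rate

print(f"mean difference  = {mean_difference:.6f}")
print(f"disparate impact = {disparate_impact:.6f}")
```

The results agree with the values printed by `aif360`. The negative sign corresponds to unprivileged minus privileged, which matches the definition given in the Mean Outcomes section rather than the order stated in the explainer's message string.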


**Q1:** Using the explainers above, interpret the difference in means and disparate impact in the German Credit data:

**Q1_Write your interpretation here**

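- Statistical parity (mean difference): a value below 0 means the privileged group has a higher rate of favorable outcomes than the unprivileged group. Here the value is about -0.17 (63/113 ≈ 55.8% for the unprivileged group versus 427/587 ≈ 72.7% for the privileged group).

- Disparate impact: a value below 1 means the same thing, expressed as a ratio rather than a difference. Here the value is about 0.77, so the data is biased in favor of the privileged group.

Thus, both metrics point in the same direction: older applicants receive favorable outcomes more often than younger applicants.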

### Build a model on the training data

Let's build a logistic regression model on this training data, predict credit risk for test data and compute the same fairness metrics over the model predictions.

In [16]:
model = LogisticRegression(solver='liblinear', class_weight='balanced')

df_test, dict_df_test = dataset_orig_test.convert_to_dataframe()
df_train, dict_df_train = dataset_orig_train.convert_to_dataframe()

# Fit the model to the training data
x_train = df_train.drop(['credit'], axis=1)
y_train = df_train['credit']
model.fit(x_train, y_train)

x_test = df_test.drop(['credit'], axis=1)
y_test = df_test['credit']

y_pred = model.predict(x_test)

dataset_pred_test = dataset_orig_test.copy()
dataset_pred_test.labels = y_pred.copy()

metric_dataset_test = BinaryLabelDatasetMetric(
    dataset_pred_test, 
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups
  )

In [17]:
# write code here to compute fairness metrics
json_expl_q2 = MetricJSONExplainer(metric_dataset_test)

In [30]:
print(format_json(json_expl_q2.mean_difference()))

{
  "metric": "Mean Difference",
  "message": "Mean difference (mean label value on privileged instances - mean label value on unprivileged instances): -0.12521343198634033",
  "numPositivesUnprivileged": 21.0,
  "numInstancesUnprivileged": 49.0,
  "numPositivesPrivileged": 139.0,
  "numInstancesPrivileged": 251.0,
  "description": "Computed as the difference of the rate of favorable outcomes received by the unprivileged group to the privileged group.",
  "ideal": "The ideal value of this metric is 0.0"
}


In [16]:
print(format_json(json_expl_q2.disparate_impact()))

{
  "metric": "Disparate Impact",
  "message": "Disparate impact (probability of favorable outcome for unprivileged instances / probability of favorable outcome for privileged instances): 0.7738951695786228",
  "numPositivePredictionsUnprivileged": 21.0,
  "numUnprivileged": 49.0,
  "numPositivePredictionsPrivileged": 139.0,
  "numPrivileged": 251.0,
  "description": "Computed as the ratio of rate of favorable outcome for the unprivileged group to that of the privileged group.",
  "ideal": "The ideal value of this metric is 1.0 A value < 1 implies higher benefit for the privileged group and a value >1 implies a higher benefit for the unprivileged group."
}


**Q2:** Using the fairness metric functions as before, report the bias observed in the model's predictions over test data. What do these values indicate? Are the model's predictions more biased or less biased compared to the bias observed in the training data?

**Q2_Write your answer here**

As before, both metrics show that the privileged group has a higher proportion of predicted favorable outcomes than the unprivileged group, so the predictions are biased in favor of the privileged group.

|      | Original data | Model's Predictions over test data   |
| :---        |    :----:   |          ---: |
| **Mean difference**  | -0.115       | -0.1252   |
| **Disparate Impact**   | 0.8477       | 0.7738      |

According to the table above, both values for the original data are closer to their ideal values, so the model's predictions over the test data are more biased than the original data.

## 4. Bias Mitigation Techniques

We learnt in class that there are several families of bias mitigation techniques, namely pre-processing, in-processing, and post-processing algorithms.

_Pre-processing_ bias mitigation is performed at the data end, before the creation of the model. In other words, we transform the data such that a model learned on the transformed data produces less biased decisions.

_In-processing_ bias mitigation methods focus on the model training stage, as compared to pre-processing which focuses on transforming the data prior to model training. This suite of methods includes incorporating a fairness constraint during model training, tweaking the model's objective function, and adversarial learning.

_Post-processing_ bias mitigation focuses on the model predictions after the model has been trained.



### 4.1 Bias Mitigation via Pre-Processing

AI Fairness 360 implements several pre-processing mitigation algorithms. We will use the **reweighing algorithm**, which is implemented in the `Reweighing` class in the `aif360.algorithms.preprocessing` package. As discussed in class, this algorithm will transform the dataset by assigning weights to instances in each (group, label) combination to change the base rates and ensure fairness before classification. The idea is to apply appropriate weights to different tuples in the training data to reduce discrimination with respect to the protected attributes.

You can find documentation for reweighing here:
https://aif360.readthedocs.io/en/latest/modules/generated/aif360.algorithms.preprocessing.Reweighing.html 

Call the fit and transform methods to perform the transformation, producing a newly transformed training dataset (```dataset_transf_train```):

In [19]:
RW = Reweighing(unprivileged_groups=unprivileged_groups,
                privileged_groups=privileged_groups)
dataset_transf_train = RW.fit_transform(dataset_orig_train)

We can print the weights. Each observation in the data should have a weight. For brevity, let's look at the weights for the first 10 rows:

In [20]:
len(dataset_transf_train.instance_weights)
dataset_transf_train.instance_weights[0:10]

array([1.06705539, 0.9782403 , 0.9782403 , 0.9782403 , 0.9782403 ,
       0.9782403 , 1.06705539, 0.9782403 , 1.15401786, 0.9782403 ])

### Compute Fairness Metrics in Transformed Data

We can check how effective the transformation was at removing bias by calculating the same metrics we used for the original training dataset.

In [21]:
metric_rw_train = BinaryLabelDatasetMetric(
    dataset_transf_train, 
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups
  )

Print the difference in mean outcomes and disparate impact in the transformed data:

In [27]:
# write your code here
# write code here to compute fairness metrics
json_expl_q3 = MetricJSONExplainer(metric_rw_train)

In [28]:
print(format_json(json_expl_q3.disparate_impact()))

{
  "metric": "Disparate Impact",
  "message": "Disparate impact (probability of favorable outcome for unprivileged instances / probability of favorable outcome for privileged instances): 0.9999999999999999",
  "numPositivePredictionsUnprivileged": 73.85714285714288,
  "numUnprivileged": 100.00000000000003,
  "numPositivePredictionsPrivileged": 443.1428571428572,
  "numPrivileged": 600.0,
  "description": "Computed as the ratio of rate of favorable outcome for the unprivileged group to that of the privileged group.",
  "ideal": "The ideal value of this metric is 1.0 A value < 1 implies higher benefit for the privileged group and a value >1 implies a higher benefit for the unprivileged group."
}


In [25]:
print(format_json(json_expl_q3.mean_difference()))

{
  "metric": "Mean Difference",
  "message": "Mean difference (mean label value on privileged instances - mean label value on unprivileged instances): -1.1102230246251565e-16",
  "numPositivesUnprivileged": 73.85714285714288,
  "numInstancesUnprivileged": 100.00000000000003,
  "numPositivesPrivileged": 443.1428571428572,
  "numInstancesPrivileged": 600.0,
  "description": "Computed as the difference of the rate of favorable outcomes received by the unprivileged group to the privileged group.",
  "ideal": "The ideal value of this metric is 0.0"
}


**Q3:** How do these values compare to the difference in mean outcomes and disparate impact in the original data?

**Q3_Write your answer in this text cell:**

|      | Original Data | Reweighed Data   | Model's Predictions over test data   |
| :---        |    :----:   |          ---: |         ---: |
| **Mean difference**  | -0.115       | -1.11e-16 (≈ 0)   | -0.1252   |
| **Disparate Impact**   | 0.8477       | 0.9999      | 0.7738      |

Reweighing brings both metrics to their ideal values on the training data: the disparate impact is approximately 1, and the mean difference is about -1.11e-16, which is zero up to floating-point rounding. Both are much closer to ideal than in the original data.
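
The mean difference reported for the reweighed data is written in scientific notation (`-1.1102230246251565e-16`), which is easy to misread as -1.11. The short check below, a sketch using only the Python standard library, shows that the value is indistinguishable from zero at double precision.

```python
import math
import sys

# Value copied from the mean-difference explainer output for the reweighed data.
mean_difference_rw = -1.1102230246251565e-16

# Fixed-point formatting makes the magnitude obvious.
print(f"{mean_difference_rw:.4f}")

# The value is within machine epsilon of zero, i.e. pure rounding error.
print(abs(mean_difference_rw) <= sys.float_info.epsilon)
print(math.isclose(mean_difference_rw, 0.0, abs_tol=1e-12))
```

When comparing fairness metrics against their ideal values, a tolerance-based comparison such as `math.isclose` with an `abs_tol` is safer than reading the printed digits directly.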

### Compute Fairness Metrics on Model Trained on Transformed Data

In the following, we will train a model on the transformed data and compute the metrics over predictions made on the test data.

**Q4:**  How do you expect the fairness metrics would be over a model trained on the transformed data?

**Q4_Write your answer in this text cell:**

Although the reweighed data itself looks fair, I expect the predictions of a model trained on it to show bias similar to before because, as seen in Q2, the fairness metrics on the model's predictions were farther from the ideal values than those of the original data.

Since the instances now have weights, we will use a classifier that can incorporate instance weights. In this case, we will use a Naive Bayes classifier (more details here: https://scikit-learn.org/stable/modules/naive_bayes.html). 

In [35]:
df_train_rw, dict_df_train_rw = dataset_transf_train.convert_to_dataframe()

# Fit the model to the transformed training data
x_train_rw = df_train_rw.drop(['credit'], axis=1)
y_train_rw = df_train_rw['credit']

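from sklearn.naive_bayes import GaussianNB
model__gnb = GaussianNB()
# Note: no sample_weight is passed here, so this fit ignores the reweighing weights.
# To use them, pass sample_weight=dataset_transf_train.instance_weights to fit().
model__gnb.fit(x_train_rw, y_train_rw)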

# Use the model to make predictions on the test data
y_pred_rw = model__gnb.predict(x_test)

dataset_pred_test_rw = dataset_orig_test.copy()
dataset_pred_test_rw.labels = y_pred_rw.copy()

# Construct the BinaryLabelDatasetMetric object over the test predictions
metric_dataset_test_rw = BinaryLabelDatasetMetric(
    dataset_pred_test_rw, 
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups
  )

In [36]:
# Print fairness metrics computed over test predictions
# write code here
json_expl_q5 = MetricJSONExplainer(metric_dataset_test_rw)

In [37]:
print(format_json(json_expl_q5.disparate_impact()))

{
  "metric": "Disparate Impact",
  "message": "Disparate impact (probability of favorable outcome for unprivileged instances / probability of favorable outcome for privileged instances): 0.8597117168545739",
  "numPositivePredictionsUnprivileged": 24.0,
  "numUnprivileged": 49.0,
  "numPositivePredictionsPrivileged": 143.0,
  "numPrivileged": 251.0,
  "description": "Computed as the ratio of rate of favorable outcome for the unprivileged group to that of the privileged group.",
  "ideal": "The ideal value of this metric is 1.0 A value < 1 implies higher benefit for the privileged group and a value >1 implies a higher benefit for the unprivileged group."
}


In [38]:
print(format_json(json_expl_q5.mean_difference()))

{
  "metric": "Mean Difference",
  "message": "Mean difference (mean label value on privileged instances - mean label value on unprivileged instances): -0.07992519717050173",
  "numPositivesUnprivileged": 24.0,
  "numInstancesUnprivileged": 49.0,
  "numPositivesPrivileged": 143.0,
  "numInstancesPrivileged": 251.0,
  "description": "Computed as the difference of the rate of favorable outcomes received by the unprivileged group to the privileged group.",
  "ideal": "The ideal value of this metric is 0.0"
}


**Q5:** Are your observations in line with what you expected in Q4 above? Why or why not?

**Q5_Write your answer in this text cell:**

|      | Original Data | Model's Predictions over test data   | Reweighed Data   | Model's Predictions over RW data   |
| :---        |    :----:   |          ---: |         ---: |        ---: |
| **Mean difference**  | -0.115     | -0.1252   |  -1.11e-16 (≈ 0)   | -0.0799   |
| **Disparate Impact** | 0.8477     | 0.7738    |  0.9999    | 0.8597    |

Partly. I expected the model trained on the reweighed data to be about as biased as the earlier model. Its predictions are indeed still biased: the disparate impact (0.8597) remains below 1 and the mean difference (-0.0799) remains below 0, so neither matches the near-ideal values of the reweighed training data. However, both metrics improve relative to the first model's predictions: the mean difference moves from -0.1252 to -0.0799 and the disparate impact from 0.7738 to 0.8597.

My observations therefore do not fully match my expectation. Two things changed between the runs, the training data (reweighed) and the classifier (Gaussian Naive Bayes instead of logistic regression), and either or both could account for the different values.
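
One way to compare the four settings is to rank them by distance from the ideal values (0 for mean difference, 1 for disparate impact). The sketch below does this with the numbers reported in this notebook; the dictionary keys and the helper `gap_from_ideal` are hypothetical names introduced for illustration.

```python
# (mean difference, disparate impact) pairs as reported in this notebook.
results = {
    "Original data": (-0.115, 0.8477),
    "LR predictions (original data)": (-0.1252, 0.7738),
    "Reweighed data": (-1.1102230246251565e-16, 0.9999999999999999),
    "NB predictions (reweighed data)": (-0.0799, 0.8597),
}


def gap_from_ideal(mean_difference, disparate_impact):
    """Distance of each metric from its ideal value (0.0 and 1.0)."""
    return abs(mean_difference - 0.0), abs(disparate_impact - 1.0)


# Sort by mean-difference gap first, then by disparate-impact gap.
ranking = sorted(results, key=lambda name: gap_from_ideal(*results[name]))
for name in ranking:
    md_gap, di_gap = gap_from_ideal(*results[name])
    print(f"{name:<32} |MD| = {md_gap:.4f}   |DI - 1| = {di_gap:.4f}")
```

Both gaps produce the same ordering here: the reweighed data is closest to ideal, followed by the predictions of the model trained on it, then the original data, then the predictions of the first model.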

**Q6:** Instead of reweighing, one could also apply techniques such as suppression, i.e. removing sensitive attributes. Write code below to train a model that does not use any information on the sensitive attribute, use this model to make predictions over the test data, and then compute the fairness metrics over the predictions.


In [13]:
german_data = GermanDataset(features_to_keep)

# Remove sensitive attributes
german_data.drop(['age', 'sex'], axis=1, inplace=True)

NotImplementedError: 

In [19]:
from aif360.algorithms.preprocessing.optim_preproc_helpers.data_preproc_functions import load_preproc_data_german

preproc_gd   = load_preproc_data_german()

In [20]:
print(preproc_gd)

               instance weights            features                            \
                                protected attribute                             
                                                age  sex credit_history=Delay   
instance names                                                                  
0                           1.0                 1.0  1.0                  0.0   
1                           1.0                 0.0  0.0                  0.0   
2                           1.0                 1.0  1.0                  0.0   
3                           1.0                 1.0  1.0                  0.0   
4                           1.0                 1.0  1.0                  1.0   
...                         ...                 ...  ...                  ...   
995                         1.0                 1.0  0.0                  0.0   
996                         1.0                 1.0  1.0                  0.0   
997                         

In [None]:
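from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from aif360.metrics import BinaryLabelDatasetMetric


def build_logit_model_suppression(dset_trn,
                                  dset_tst,
                                  privileged_groups,
                                  unprivileged_groups):
    # Suppression: skip the first two feature columns, which hold the
    # protected attributes (age, sex) in the dataset printed above.
    scaler = StandardScaler()
    X_trn  = scaler.fit_transform(dset_trn.features[:, 2:])
    y_trn  = dset_trn.labels.ravel()
    w_trn  = dset_trn.instance_weights.ravel()

    # Train a logistic regression model on the remaining features
    lmod = LogisticRegression()
    lmod.fit(X_trn, y_trn,
             sample_weight=w_trn)

    # Predict on the test data using the same scaling and column selection
    dset_tst_pred = dset_tst.copy(deepcopy=True)
    X_tst = scaler.transform(dset_tst_pred.features[:, 2:])
    dset_tst_pred.labels = lmod.predict(X_tst)

    # Compute fairness metrics over the test predictions
    metric_tst = BinaryLabelDatasetMetric(dset_tst_pred,
                                          unprivileged_groups=unprivileged_groups,
                                          privileged_groups=privileged_groups)
    print("Disparate impact is %0.2f (closer to 1 is better)" %
           metric_tst.disparate_impact())
    print("Mean difference  is %0.2f (closer to 0 is better)" %
           metric_tst.mean_difference())

    return lmod, dset_tst_pred, metric_tst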
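
The function above removes the sensitive attributes by position (`features[:, 2:]`), which works only while `age` and `sex` are the first two columns, as in the printed `preproc_gd`. A name-based selection is less fragile. The sketch below is a minimal pure-Python version; `suppress_columns` is a hypothetical helper, and the two sample rows are the first two rows of the columns shown in the `preproc_gd` printout. With an `aif360` dataset, the names would presumably come from its `feature_names` attribute.

```python
def suppress_columns(rows, feature_names, sensitive):
    """Drop every column whose name is in `sensitive`.

    Returns the reduced rows and the list of remaining feature names.
    """
    keep = [i for i, name in enumerate(feature_names) if name not in sensitive]
    kept_names = [feature_names[i] for i in keep]
    kept_rows = [[row[i] for i in keep] for row in rows]
    return kept_rows, kept_names


feature_names = ["age", "sex", "credit_history=Delay"]
rows = [
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0],
]

kept_rows, kept_names = suppress_columns(rows, feature_names, {"age", "sex"})
print(kept_names)  # ['credit_history=Delay']
print(kept_rows)   # [[0.0], [0.0]]
```

Suppression only removes the explicit attribute columns. Other features correlated with age or sex can still carry that information, so the fairness metrics should be recomputed on the predictions rather than assumed to be ideal.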

In [15]:
# write code here to implement suppression
dataset_orig_sup = GermanDataset()


dataset_orig_train_sup, dataset_orig_test_sup = dataset_orig_sup.split([0.7], shuffle=True)

# metric_orig_train_sup = BinaryLabelDatasetMetric(
#      dataset_orig_train_sup, 
#      unprivileged_groups=unprivileged_groups,
#      privileged_groups=privileged_groups
#   )
# print("Original training dataset_suppressing")
# print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_train_sup.mean_difference())
# print("Original training dataset_suppressing")
# print("Disparate Impact = %f" % metric_orig_train_sup.disparate_impact())
# # json_expl_sup = MetricJSONExplainer(metric_orig_train_sup)

NotImplementedError: 

In [27]:
print(dataset_orig)

               instance weights features                \
                                                         
                                   month credit_amount   
instance names                                           
0                           1.0      6.0        1169.0   
1                           1.0     48.0        5951.0   
2                           1.0     12.0        2096.0   
3                           1.0     42.0        7882.0   
4                           1.0     24.0        4870.0   
...                         ...      ...           ...   
995                         1.0     12.0        1736.0   
996                         1.0     30.0        3857.0   
997                         1.0     12.0         804.0   
998                         1.0     45.0        1845.0   
999                         1.0     45.0        4576.0   

                                                                \
                                                               

In [45]:
# note that we drop sex, which may also be a protected attribute
dataset_orig = GermanDataset(protected_attribute_names=['age'],
                             privileged_classes=[lambda x: x >= 25],
                             features_to_drop=['personal_status', 'sex'])

dataset_orig_train, dataset_orig_test = dataset_orig.split([0.7], shuffle=True)

privileged_groups = [{'age': 1}]
unprivileged_groups = [{'age': 0}]

In [25]:
import os

import pandas as pd

from aif360.datasets import StandardDataset


default_mappings = {
    'label_maps': [{1.0: 'Good Credit', 2.0: 'Bad Credit'}],
    'protected_attribute_maps': [{1.0: 'Male', 0.0: 'Female'},
                                 {1.0: 'Old', 0.0: 'Young'}],
}

def default_preprocessing(df):
    """Adds a derived sex attribute based on personal_status."""
    # TODO: ignores the value of privileged_classes for 'sex'
    status_map = {'A91': 'male', 'A93': 'male', 'A94': 'male',
                  'A92': 'female', 'A95': 'female'}
    df['sex'] = df['personal_status'].replace(status_map)

    return df

class GermanDataset(StandardDataset):
    """German credit Dataset.

    See :file:`aif360/data/raw/german/README.md`.
    """

    def __init__(self, label_name='credit', favorable_classes=[1],
                 protected_attribute_names=['sex', 'age'],
                 privileged_classes=[['male'], lambda x: x > 25],
                 instance_weights_name=None,
                 categorical_features=['status', 'credit_history', 'purpose',
                     'savings', 'employment', 'other_debtors', 'property',
                     'installment_plans', 'housing', 'skill_level', 'telephone',
                     'foreign_worker'],
                 features_to_keep=[], features_to_drop=['personal_status'],
                 na_values=[], custom_preprocessing=default_preprocessing,
                 metadata=default_mappings):
        """See :obj:`StandardDataset` for a description of the arguments.

        By default, this code converts the 'age' attribute to a binary value
        where privileged is `age > 25` and unprivileged is `age <= 25` as
        proposed by Kamiran and Calders [1]_.

        References:
            .. [1] F. Kamiran and T. Calders, "Classifying without
               discriminating," 2nd International Conference on Computer,
               Control and Communication, 2009.

        Examples:
            In some cases, it may be useful to keep track of a mapping from
            `float -> str` for protected attributes and/or labels. If our use
            case differs from the default, we can modify the mapping stored in
            `metadata`:

            >>> label_map = {1.0: 'Good Credit', 0.0: 'Bad Credit'}
            >>> protected_attribute_maps = [{1.0: 'Male', 0.0: 'Female'}]
            >>> gd = GermanDataset(protected_attribute_names=['sex'],
            ... privileged_classes=[['male']], metadata={'label_map': label_map,
            ... 'protected_attribute_maps': protected_attribute_maps})

            Now this information will stay attached to the dataset and can be
            used for more descriptive visualizations.
        """

        filepath = os.path.join(os.path.dirname(os.path.abspath(__file__)),
                                '..', 'data', 'raw', 'german', 'german.data')
        # as given by german.doc
        column_names = ['status', 'month', 'credit_history',
            'purpose', 'credit_amount', 'savings', 'employment',
            'investment_as_income_percentage', 'personal_status',
            'other_debtors', 'residence_since', 'property', 'age',
            'installment_plans', 'housing', 'number_of_credits',
            'skill_level', 'people_liable_for', 'telephone',
            'foreign_worker', 'credit']
        try:
            df = pd.read_csv(filepath, sep=' ', header=None, names=column_names,
                             na_values=na_values)
        except IOError as err:
            print("IOError: {}".format(err))
            print("To use this class, please download the following files:")
            print("\n\thttps://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data")
            print("\thttps://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.doc")
            print("\nand place them, as-is, in the folder:")
            print("\n\t{}\n".format(os.path.abspath(os.path.join(
                os.path.abspath(__file__), '..', '..', 'data', 'raw', 'german'))))
            import sys
            sys.exit(1)

        super(GermanDataset, self).__init__(df=df, label_name=label_name,
            favorable_classes=favorable_classes,
            protected_attribute_names=protected_attribute_names,
            privileged_classes=privileged_classes,
            instance_weights_name=instance_weights_name,
            categorical_features=categorical_features,
            features_to_keep=features_to_keep,
            features_to_drop=features_to_drop, na_values=na_values,
            custom_preprocessing=custom_preprocessing, metadata=metadata)

In [28]:
dataset_orig_sss = GermanDataset(protected_attribute_names=[],
                             privileged_classes=[['male'], lambda x: x >= 25],
                             features_to_drop=[])


NameError: name '__file__' is not defined

In [48]:
dataset_orig_sss.drop(['sex', 'age'], axis=1, inplace=True)


AttributeError: 'GermanDataset' object has no attribute 'drop'

In [38]:
df_sup, dict_df_sup = dataset_orig_sup.convert_to_dataframe()
print("Shape: ", df_sup.shape)
# print(df.columns)
# df.head(5)

TypeError: convert_to_dataframe() missing 1 required positional argument: 'self'

In [None]:
model = LogisticRegression(solver='liblinear', class_weight='balanced')

df_test, dict_df_test = dataset_orig_test.convert_to_dataframe()
df_train, dict_df_train = dataset_orig_train.convert_to_dataframe()

# Fit the model to the training data
x_train = df_train.drop(['credit'], axis=1)
y_train = df_train['credit']
model.fit(x_train, y_train)

x_test = df_test.drop(['credit'], axis=1)
y_test = df_test['credit']

y_pred = model.predict(x_test)

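# store predictions in a deep copy so the original test set is untouched;
# aif360 expects labels as a column vector
dataset_pred_test = dataset_orig_test.copy(deepcopy=True)
dataset_pred_test.labels = y_pred.reshape(-1, 1)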

metric_dataset_test = BinaryLabelDatasetMetric(
    dataset_pred_test, 
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups
  )

**Q7:** Interpret your results. How does the preprocessing technique in Q5 compare to the suppression technique? 

**Write your answer in this text cell:**

### 4.2. Bias Mitigation via In-Processing

In-processing methods intervene at the model training stage, whereas pre-processing methods transform the data before training. Broadly speaking, contemporary in-processing methods tend to offer a better trade-off between accuracy and fairness than pre-processing methods, at the cost of being tied to a particular model.

### Adversarial Debiasing

In this part of the notebook, we will use an in-processing algorithm, called _Adversarial Debiasing_, that we briefly discussed in class. From the aif360 documentation (https://aif360.readthedocs.io/en/v0.2.3/modules/inprocessing.html):

> Adversarial debiasing is an in-processing technique that learns a classifier to maximize prediction accuracy and simultaneously reduce an adversary’s ability to determine the protected attribute from the predictions. This approach leads to a fair classifier as the predictions cannot carry any group discrimination information that the adversary can exploit.

For intuition, you can think of adversarial debiasing as a model with two supervised learning tasks. The first task is to predict an outcome using the training data input. The second task, i.e. the adversary, is to predict a protected feature using these predictions and non-protected features in the training data input. The aim is to maximize the model's ability to carry out the first task (i.e. predict outcomes) while minimizing its ability to carry out the second task (i.e. predict protected features).
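
To make the two-task intuition concrete, here is a toy sketch in plain NumPy. It is not the `aif360` implementation: both tasks are simple logistic models, the adversary sees only the predictor's logit (not the other features), and the synthetic data, the helper name `train_adversarial`, and the hyperparameters `alpha`, `lr`, and `epochs` are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(t):
    # Clip the argument so np.exp cannot overflow on extreme logits.
    return 1.0 / (1.0 + np.exp(-np.clip(t, -30.0, 30.0)))

def train_adversarial(X, y, z, alpha=1.0, lr=0.1, epochs=200):
    """Toy adversarial debiasing with two logistic models.

    X: non-protected features, y: binary outcome, z: binary protected attribute.
    alpha weights the adversary term; alpha=0 reduces to plain logistic regression.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0   # predictor parameters
    u, c = 0.0, 0.0           # adversary parameters (input: predictor logit)
    for _ in range(epochs):
        logit = X @ w + b
        y_hat = sigmoid(logit)           # task 1: predict the outcome
        z_hat = sigmoid(u * logit + c)   # task 2: predict the protected attribute

        # Gradients of both cross-entropy losses with respect to the logit.
        g_pred = y_hat - y
        g_adv = (z_hat - z) * u

        # Adversary gradients, computed before the predictor moves.
        grad_u = np.mean((z_hat - z) * logit)
        grad_c = np.mean(z_hat - z)

        # Predictor: descend its own loss, ascend the adversary's loss.
        g = g_pred - alpha * g_adv
        w = w - lr * (X.T @ g) / n
        b = b - lr * np.mean(g)

        # Adversary: descend its own loss.
        u -= lr * grad_u
        c -= lr * grad_c
    return w, b

# Synthetic data: the outcome depends on x0, which is correlated with z.
rng = np.random.default_rng(0)
z = rng.integers(0, 2, size=400)
x0 = rng.normal(loc=z - 0.5, scale=1.0)
x1 = rng.normal(size=400)
X = np.column_stack([x0, x1])
y = (x0 > 0).astype(float)

w_plain, b_plain = train_adversarial(X, y, z, alpha=0.0)
w_fair, b_fair = train_adversarial(X, y, z, alpha=1.0)

for name, (w, b) in {"plain": (w_plain, b_plain), "adversarial": (w_fair, b_fair)}.items():
    p = sigmoid(X @ w + b)
    print(name, "mean score gap (z=0 minus z=1):", p[z == 0].mean() - p[z == 1].mean())
```

Whether the gap actually shrinks depends on `alpha`, the learning rate, and the data. This kind of min-max training can oscillate, which is one reason results from adversarial debiasing vary between runs.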

We implement adversarial debiasing below:

In [21]:
import tensorflow as tf

print(tf.__version__)


2.8.0


In [28]:
# reset tensorflow graph
# tf.compat.v1.reset_default_graph()

# start tensorflow session
# sess = tf.compat.v1.Session()
# tf.compat.v1.disable_eager_execution()

# create AdversarialDebiasing model
debiased_model = AdversarialDebiasing(
    privileged_groups = privileged_groups,
    unprivileged_groups = unprivileged_groups,
    scope_name = 'debiased_classifier',
    debias = True,
    sess = sess)

# fit the model to training data
debiased_model.fit(dataset_orig_train)

# make predictions on training and test data
dataset_debiasing_train = debiased_model.predict(dataset_orig_train)
dataset_debiasing_test = debiased_model.predict(dataset_orig_test)

# metrics
metric_dataset_debiasing_test = BinaryLabelDatasetMetric(
    dataset_debiasing_test, 
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups
  )

# Close session
# sess.close()

NameError: name 'sess' is not defined

In [43]:
# use the TensorFlow 1.x compatibility API (needed under TensorFlow 2.x,
# where tf.reset_default_graph and tf.Session no longer exist at top level)
import tensorflow.compat.v1 as tf
from aif360.algorithms.inprocessing import AdversarialDebiasing

# AdversarialDebiasing builds a TF1-style graph, so eager execution must be off
tf.disable_eager_execution()

# reset tensorflow graph
tf.reset_default_graph()

# start tensorflow session
sess = tf.Session()

# create AdversarialDebiasing model
debiased_model = AdversarialDebiasing(
    privileged_groups = privileged_groups,
    unprivileged_groups = unprivileged_groups,
    scope_name = 'debiased_classifier',
    debias = True,
    sess = sess)

# fit the model to training data
debiased_model.fit(dataset_orig_train)

# make predictions on training and test data
dataset_debiasing_train = debiased_model.predict(dataset_orig_train)
dataset_debiasing_test = debiased_model.predict(dataset_orig_test)

# metrics
metric_dataset_debiasing_test = BinaryLabelDatasetMetric(
    dataset_debiasing_test, 
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups
  )

# Close session
sess.close()

### Fairness Metrics under Adversarial Debiasing

The `BinaryLabelDatasetMetric` object created above (`metric_dataset_debiasing_test`) has built-in methods for the difference in mean outcomes (```.mean_difference()```) and disparate impact (```.disparate_impact()```). Print these below:

In [None]:
# write your code here
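
As a reminder of what these two metrics measure, here is a from-scratch sketch on toy arrays. The function names and the toy predictions are assumptions for illustration. It follows the convention that the difference is the unprivileged rate minus the privileged rate and the ratio is the unprivileged rate over the privileged rate, which matches the "closer to 0" and "closer to 1" readings used earlier.

```python
import numpy as np

def favorable_rate(y_pred, group, value, favorable=1):
    """Share of favorable predictions within one group."""
    return float(np.mean(y_pred[group == value] == favorable))

def mean_difference(y_pred, group, unpriv=0, priv=1):
    """Favorable rate of the unprivileged group minus that of the privileged group."""
    return favorable_rate(y_pred, group, unpriv) - favorable_rate(y_pred, group, priv)

def disparate_impact(y_pred, group, unpriv=0, priv=1):
    """Favorable rate of the unprivileged group divided by that of the privileged group."""
    return favorable_rate(y_pred, group, unpriv) / favorable_rate(y_pred, group, priv)

# Toy predictions: 1 = favorable outcome; group 1 = privileged, 0 = unprivileged
toy_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0])
toy_group = np.array([1, 1, 1, 1, 0, 0, 0, 0])

print("Mean difference  = %0.2f" % mean_difference(toy_pred, toy_group))
print("Disparate impact = %0.2f" % disparate_impact(toy_pred, toy_group))
```

Here the privileged group receives the favorable outcome 75% of the time and the unprivileged group 25% of the time, so the mean difference is negative and the disparate impact is well below 1.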

**Q8:** Interpret the difference in means and disparate impact for the predicted outcomes under adversarial debiasing. How do these compare to the metrics calculated in Q2 and Q5?

**Write your interpretation and comparison in this text cell:**

### 4.3. Bias Mitigation via Post-Processing

In this last section, we will use one of the post-processing algorithms in AI Fairness 360 called **equalized odds postprocessing**, which is implemented in the `EqOddsPostprocessing` class in the `aif360.algorithms.postprocessing` package. This technique solves a linear program to find probabilities with which to change output labels to optimize equalized odds.
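
Equalized odds asks that the true positive rate (TPR) and false positive rate (FPR) match across groups. The sketch below only measures those per-group rates on toy arrays. It does not reproduce the linear program that `EqOddsPostprocessing` solves, and the helper name `group_rates` and the toy labels are assumptions for illustration.

```python
import numpy as np

def group_rates(y_true, y_pred, group):
    """Return {group value: (TPR, FPR)} for binary labels where 1 is favorable."""
    rates = {}
    for g in np.unique(group):
        mask = group == g
        yt, yp = y_true[mask], y_pred[mask]
        tpr = float(np.mean(yp[yt == 1] == 1))  # favorable predictions among true positives
        fpr = float(np.mean(yp[yt == 0] == 1))  # favorable predictions among true negatives
        rates[int(g)] = (tpr, fpr)
    return rates

# Toy data: group 1 = privileged, group 0 = unprivileged
toy_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
toy_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0])
toy_group = np.array([1, 1, 1, 1, 0, 0, 0, 0])

rates = group_rates(toy_true, toy_pred, toy_group)
tpr_gap = rates[0][0] - rates[1][0]
fpr_gap = rates[0][1] - rates[1][1]
print("TPR gap (unprivileged - privileged): %0.2f" % tpr_gap)
print("FPR gap (unprivileged - privileged): %0.2f" % fpr_gap)
```

Post-processing aims to drive both gaps toward zero by randomly flipping some predicted labels within each group, usually at some cost in accuracy.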

You can find documentation for `EqOddsPostprocessing` here:
https://aif360.readthedocs.io/en/latest/modules/generated/aif360.algorithms.postprocessing.EqOddsPostprocessing.html

Call the `fit` and `predict` methods to perform the transformation, producing a new dataset with post-processed labels (```dataset_post_train```):

In [None]:
df_test, dict_df_test = dataset_orig_test.convert_to_dataframe()
df_train, dict_df_train = dataset_orig_train.convert_to_dataframe()

# Fit the model to the training data and predict for test data
# write code here
# dataset_pred_test -- dataset with predictions stored in labels

# create Equalized Odds Post processing object
eo_post = EqOddsPostprocessing(unprivileged_groups=unprivileged_groups,
                privileged_groups=privileged_groups)

# fit the object using the true test labels and the predicted test labels
eo_post.fit(dataset_orig_test, dataset_pred_test)

# make predictions on test data
# write code here

# construct metrics object
# write code here

# compute fairness metrics
# write code here

**Q9:** Interpret the difference in fairness metrics for the predicted outcomes under this post-processing technique. How do these compare to the metrics calculated in Q2, Q5 and Q8?

**Write your interpretation and comparison in this text cell:**

# Submitting this Assignment Notebook

Once complete, please submit your assignment notebook as an attachment under "Assignments > Assignment 4" on Brightspace. You can download a copy of your notebook using ```File > Download .ipynb```. Please ensure you submit the `.ipynb` file (and not a `.py` file).