# FAIR IN-PROCESSING

This notebook implements the Adersarial Debiasing in-processor [(Zhang et al. 2018)](https://dl.acm.org/doi/abs/10.1145/3278721.3278779).

The modeling is performed separately for each combination of training folds. This is controlled with `use_fold` variable. To fit adervsarial debiasing on a different combination of training folds, set `use_fold` to a specific value and restar the kernel.

A further analysis of the processor outputs is performed in `code_05_inprocess3.R`.

The notebook loads the data exported in `code_00_partitinoing.ipynb` and applies pre-processors. The processor predictions are exported as CSV files.

## 1. Parameters and preparations

In [1]:
##### PARAMETERS

# working paths
%run code_00_working_paths.py

# sepcify data set
# one of ['bene', 'german', 'uk', 'taiwan', 'pkdd', 'gmsc', 'homecredit']
data = 'taiwan'

# partitioning
num_folds = 5
use_fold  = 0 # one of [0, 1, ..., num_folds-1]
seed      = 1

In [2]:
##### IN-PROCESSOR PARAMS

adversary_loss_weight = 0.1 # other options: [0.1, 0.01, 0.001]

In [3]:
import tensorflow as tf

In [4]:
##### PACKAGES

import sys
sys.path.append(func_path)

import pickle
import numpy as np
import time

from load_data import *

import tensorflow as tf

from aif360.metrics import BinaryLabelDatasetMetric
from aif360.metrics import ClassificationMetric
from aif360.metrics.utils import compute_boolean_conditioning_vector
from aif360.algorithms.inprocessing.adversarial_debiasing import AdversarialDebiasing

from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import MaxAbsScaler

import matplotlib.pyplot as plt

pip install 'aif360[LawSchoolGPA]'
pip install 'aif360[Reductions]'
pip install 'aif360[Reductions]'
pip install 'aif360[Reductions]'


## 2. Data import

In [5]:
##### RANDOM SEED

np.random.seed(seed)

In [6]:
##### LOAD PARTITIONING

dataset_orig_test = pickle.load(open(data_path + '\\prepared\\' + data + '_orig_test.pkl', 'rb'))
te                = dataset_orig_test.convert_to_dataframe()[0]

print(te.shape)

(7060, 77)


In [7]:
##### DATA PREP

# protected attribute
protected           = 'AGE'
privileged_groups   = [{'AGE': 1}] 
unprivileged_groups = [{'AGE': 0}]

## 3. Fair processing

In [8]:
##### MODELING

# timer
cv_start = time.time()
tf.reset_default_graph()
# loop through training folds
for fold in range(num_folds):
    
    ##### LOAD DATA
    
    # select fold combination
    if fold != use_fold:
        continue
    
    # feedback
    print('-'*30)
    print('- FOLD ' + str(fold) + '...')
    print('-'*30)

    # import data
    data_train = pickle.load(open(data_path + '\\prepared\\' + data + '_scaled_' + str(fold) + '_train.pkl', 'rb'))
    data_valid = pickle.load(open(data_path + '\\prepared\\' + data + '_scaled_' + str(fold) + '_valid.pkl', 'rb'))
    data_test  = pickle.load(open(data_path + '\\prepared\\' + data + '_scaled_' + str(fold) + '_test.pkl',  'rb'))
    

    ##### MODELING

    # start tensorflow session
    sess = tf.Session()

    # fit adversarial debiasing
    debiased_model = AdversarialDebiasing(privileged_groups     = privileged_groups,
                                          unprivileged_groups   = unprivileged_groups,
                                          debias                = True,
                                          adversary_loss_weight = adversary_loss_weight,
                                          scope_name            = 'debiased_classifier',
                                          sess                  = sess)
    debiased_model.fit(data_train)
    
    # apply the model to valid data
    scores_valid = debiased_model.predict(data_valid).scores
    advdebias_predictions = pd.DataFrame()
    advdebias_predictions['scores']  = scores_valid
    advdebias_predictions['targets'] = data_valid.labels.flatten()
    advdebias_predictions.to_csv(res_path + '\\intermediate\\' + data + '_' + str(fold) + '_AD_' + str(adversary_loss_weight) + '_predictions_valid.csv', 
                                 index  = None, 
                                 header = True)
    
    # apply the model to test data
    scores_test = debiased_model.predict(data_test).scores
    advdebias_predictions = pd.DataFrame()
    advdebias_predictions['scores'] = scores_test
    advdebias_predictions.to_csv(res_path + '\\intermediate\\' + data + '_' + str(fold) + '_AD_' + str(adversary_loss_weight) + '_predictions_test.csv', 
                                 index  = None, 
                                 header = True)
    
    # print performance
    print('')
    print('Finished in {:.2f} minutes'.format((time.time() - cv_start) / 60))







------------------------------
- FOLD 0...
------------------------------






Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.


Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.


Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where


Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where


epoch 0; iter: 0; batch classifier loss: 0.589463; batch adversarial loss: 0.893605
epoch 1; iter: 0; batch classifier loss: 0.416237; batch adversarial loss: 0.813566
epoch 2; iter: 0; batch classifier loss: 0.410241; batch adversarial loss: 0.721516
epoch 3; iter: 0; batch classifier loss: 0.426786; batch adversarial loss: 0.631660
epoch 4; iter: 0; batch classifier loss: 0.465434; batch adversarial loss: 0.579287
epoch 5; iter: 0; batch classifier loss: 0.443747; batch adversarial loss: 0.502342
epoch 6; iter: 0; batch classifier loss: 0.485760; batch adversarial loss: 0.529835
epoch 7; iter: 0; batch classifier loss: 0.464600; batch adversarial loss: 0.532830
epoch 8; iter: 0; batch classifier loss: 0.820010; batch adversarial loss: 0.573462
epoch 9; iter: 0; batch classifier loss: 1.219455; batch adversarial loss: 0.502670
epoch 10; iter: 0; batch classifier loss: 1.231727; batch adversarial loss: 0.518090
epoch 11; iter: 0; batch classifier loss: 1.108555; batch adversarial loss:

ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series