#### This notebook demonstrates the use of the SenSR algorithm to learn a fair classifier.
[SenSR](https://arxiv.org/pdf/1907.00020.pdf) is an in-processing technique that learns a classifier that is fair in the sense that its performance is invariant under certain perturbations to the features. For example, the performance of a resume screening system should be invariant under changes to the name of the applicant or switching the gender pronouns. This notebook reproduces the Adult experiments in [this paper](https://arxiv.org/pdf/1907.00020.pdf).

In [18]:
# Load all necessary packages
from aif360.datasets import BinaryLabelDataset, AdultDataset
from aif360.metrics import BinaryLabelDatasetMetric, ClassificationMetric

from sklearn.preprocessing import StandardScaler

from IPython.display import Markdown, display

from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.decomposition import TruncatedSVD

import utils
import SenSR
import numpy as np

#### Load dataset and set options

In [3]:
# Get the dataset and split into train and test
dataset_orig = AdultDataset()

# We do not use these features. Note, we use the continuous version of education, i.e. `education-num`, so we drop the categorical versions of education
drop_features = [
    'education=10th',
    'education=11th',
    'education=12th',
    'education=1st-4th',
    'education=5th-6th',
    'education=7th-8th',
    'education=9th',
    'education=Assoc-acdm',
    'education=Assoc-voc',
    'education=Bachelors',
    'education=Doctorate',
    'education=HS-grad',
    'education=Masters',
    'education=Preschool',
    'education=Prof-school',
    'education=Some-college', 
    'native-country=Cambodia',
    'native-country=Canada',
    'native-country=China',
    'native-country=Columbia',
    'native-country=Cuba',
    'native-country=Dominican-Republic',
    'native-country=Ecuador',
    'native-country=El-Salvador',
    'native-country=England',
    'native-country=France',
    'native-country=Germany',
    'native-country=Greece',
    'native-country=Guatemala',
    'native-country=Haiti',
    'native-country=Holand-Netherlands',
    'native-country=Honduras',
    'native-country=Hong',
    'native-country=Hungary',
    'native-country=India',
    'native-country=Iran',
    'native-country=Ireland',
    'native-country=Italy',
    'native-country=Jamaica',
    'native-country=Japan',
    'native-country=Laos',
    'native-country=Mexico',
    'native-country=Nicaragua',
    'native-country=Outlying-US(Guam-USVI-etc)',
    'native-country=Peru',
    'native-country=Philippines',
    'native-country=Poland',
    'native-country=Portugal',
    'native-country=Puerto-Rico',
    'native-country=Scotland',
    'native-country=South',
    'native-country=Taiwan',
    'native-country=Thailand',
    'native-country=Trinadad&Tobago',
    'native-country=United-States',
    'native-country=Vietnam',
    'native-country=Yugoslavia']

drop_features_indices = [dataset_orig.feature_names.index(feat) for feat in drop_features]

dataset_orig.features = np.delete(dataset_orig.features, drop_features_indices, axis = 1)
dataset_orig.feature_names = [feat for feat in dataset_orig.feature_names if feat not in drop_features]

# we will standardize continous features
continous_features = ['age', 'education-num', 'capital-gain', 'capital-loss', 'hours-per-week']
continous_features_indices = [dataset_orig.feature_names.index(feat) for feat in continous_features]

# get a 80%/20% train/test split
dataset_orig_train, dataset_orig_test = dataset_orig.split([0.8], shuffle=True)

X_train = dataset_orig_train.features
# normalize continuous features
SS = StandardScaler().fit(X_train[:, continous_features_indices])
X_train[:, continous_features_indices] = SS.transform(X_train[:, continous_features_indices])
# remove sex and race as predictive features
X_train = np.delete(X_train, [dataset_orig_train.feature_names.index(feat) for feat in ['sex', 'race']], axis = 1)

X_test = dataset_orig_test.features
# normalize continuous features
X_test[:, continous_features_indices] = SS.transform(X_test[:, continous_features_indices])
# remove sex and race as predictive features
X_test = np.delete(X_test, [dataset_orig_test.feature_names.index(feat) for feat in ['sex', 'race']], axis = 1)

y_train = dataset_orig_train.labels
y_test = dataset_orig_test.labels

one_hot = OneHotEncoder(sparse=False)
one_hot.fit(y_train.reshape(-1,1))
y_train = one_hot.transform(y_train.reshape(-1,1))
y_test = one_hot.transform(y_test.reshape(-1,1))

y_sex_train = dataset_orig_train.features[:, dataset_orig_train.feature_names.index('sex')]
y_sex_test = dataset_orig_test.features[:, dataset_orig_test.feature_names.index('sex')]

one_hot.fit(y_sex_train.reshape(-1,1))
y_sex_train = one_hot.transform(y_sex_train.reshape(-1,1))
y_sex_test = one_hot.transform(y_sex_test.reshape(-1,1))

privileged_groups = [{'sex': 1}]
unprivileged_groups = [{'sex': 0}]

  priv = np.logical_or.reduce(np.equal.outer(vals, df[attr]))
  df[label_name]))


In [4]:
# print out some labels, names, etc.
display(Markdown("#### Training Dataset shape"))
print(dataset_orig_train.features.shape)
display(Markdown("#### Favorable and unfavorable labels"))
print(dataset_orig_train.favorable_label, dataset_orig_train.unfavorable_label)
display(Markdown("#### Protected attribute names"))
print(dataset_orig_train.protected_attribute_names)
display(Markdown("#### Privileged and unprivileged protected attribute values"))
print(dataset_orig_train.privileged_protected_attributes, 
      dataset_orig_train.unprivileged_protected_attributes)
display(Markdown("#### Dataset feature names"))
print(dataset_orig_train.feature_names)

#### Training Dataset shape

(36177, 41)


#### Favorable and unfavorable labels

1.0 0.0


#### Protected attribute names

['race', 'sex']


#### Privileged and unprivileged protected attribute values

[array([1.]), array([1.])] [array([0.]), array([0.])]


#### Dataset feature names

['age', 'education-num', 'race', 'sex', 'capital-gain', 'capital-loss', 'hours-per-week', 'workclass=Federal-gov', 'workclass=Local-gov', 'workclass=Private', 'workclass=Self-emp-inc', 'workclass=Self-emp-not-inc', 'workclass=State-gov', 'workclass=Without-pay', 'marital-status=Divorced', 'marital-status=Married-AF-spouse', 'marital-status=Married-civ-spouse', 'marital-status=Married-spouse-absent', 'marital-status=Never-married', 'marital-status=Separated', 'marital-status=Widowed', 'occupation=Adm-clerical', 'occupation=Armed-Forces', 'occupation=Craft-repair', 'occupation=Exec-managerial', 'occupation=Farming-fishing', 'occupation=Handlers-cleaners', 'occupation=Machine-op-inspct', 'occupation=Other-service', 'occupation=Priv-house-serv', 'occupation=Prof-specialty', 'occupation=Protective-serv', 'occupation=Sales', 'occupation=Tech-support', 'occupation=Transport-moving', 'relationship=Husband', 'relationship=Not-in-family', 'relationship=Other-relative', 'relationship=Own-child', 

#### Metric for original training data

In [5]:
# Metric for the original dataset
metric_orig_train = BinaryLabelDatasetMetric(dataset_orig_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)
display(Markdown("#### Original training dataset"))
print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_train.mean_difference())
metric_orig_test = BinaryLabelDatasetMetric(dataset_orig_test, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)
print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_test.mean_difference())

#### Original training dataset

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.199162
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.197952


### Learn plain classifier

In [6]:
weights, train_logits, test_logits  = SenSR.train_nn(X_train, y_train, X_test = X_test, y_test = y_test, n_units=[], l2_reg=0., batch_size=1000, epoch=5000, verbose=True)
































Epoch 0 train accuracy 0.473201
Epoch 0 test accuracy 0.471752

Epoch 10 train accuracy 0.502612
Epoch 10 test accuracy 0.503261

Epoch 20 train accuracy 0.528596
Epoch 20 test accuracy 0.528579

Epoch 30 train accuracy 0.553860
Epoch 30 test accuracy 0.55644

Epoch 40 train accuracy 0.580037
Epoch 40 test accuracy 0.579768

Epoch 50 train accuracy 0.603643
Epoch 50 test accuracy 0.601879

Epoch 60 train accuracy 0.622744
Epoch 60 test accuracy 0.622333

Epoch 70 train accuracy 0.641347
Epoch 70 test accuracy 0.641791

Epoch 80 train accuracy 0.658429
Epoch 80 test accuracy 0.661249

Epoch 90 train accuracy 0.674213
Epoch 90 test accuracy 0.67728

Epoch 100 train accuracy 0.688421
Epoch 100 test accuracy 0.69309

Epoch 110 train accuracy 0.701910
Epoch 110 test accuracy 0.707352

Epoch 120 train accuracy 0.712884
Epoch 120 test accuracy 0.71885

Epoch 130 train accuracy 0.722393
Epoch 130 test accuracy 0.728027

Epoch 140 train accuracy 0.730436
Epoch 140 test accuracy 0.735876

Epoch

In [7]:
dataset_nodebiasing_train = dataset_orig_train.copy()
dataset_nodebiasing_train.labels = np.argmax(train_logits,axis = 1)

dataset_nodebiasing_test = dataset_orig_test.copy()
dataset_nodebiasing_test.labels = np.argmax(test_logits,axis = 1)

In [8]:
def compute_gap_RMS(data_set):
    TPR = -1*data_set.false_negative_rate_difference()
    TNR = -1*data_set.false_positive_rate_difference()    

    return np.sqrt(1/2*(TPR**2 + TNR**2)), max(np.abs(TPR), np.abs(TNR))

In [9]:
# Metrics for the dataset from plain model (without debiasing)
privileged_groups = [{'sex': 1}]
unprivileged_groups = [{'sex': 0}]

display(Markdown("#### Plain model - without debiasing - dataset metrics"))
metric_dataset_nodebiasing_train = BinaryLabelDatasetMetric(dataset_nodebiasing_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_nodebiasing_train.mean_difference())

metric_dataset_nodebiasing_test = BinaryLabelDatasetMetric(dataset_nodebiasing_test, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_nodebiasing_test.mean_difference())

display(Markdown("#### Plain model - without debiasing - classification metrics"))
classified_metric_nodebiasing_test = ClassificationMetric(dataset_orig_test, 
                                                 dataset_nodebiasing_test,
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)
print("Test set: Classification accuracy = %f" % classified_metric_nodebiasing_test.accuracy())
TPR = classified_metric_nodebiasing_test.true_positive_rate()
TNR = classified_metric_nodebiasing_test.true_negative_rate()
bal_acc_nodebiasing_test = 0.5*(TPR+TNR)

gap_rms, max_gap = compute_gap_RMS(classified_metric_nodebiasing_test)
print("Test set: gap rms sex = %f" % gap_rms)
print("Test set: max gap rms sex = %f" % max_gap)
print("Test set: Balanced TPR = %f" % bal_acc_nodebiasing_test)

privileged_groups = [{'race': 1}]
unprivileged_groups = [{'race': 0}]

classified_metric_nodebiasing_test = ClassificationMetric(dataset_orig_test, 
                                                 dataset_nodebiasing_test,
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)

gap_rms, max_gap = compute_gap_RMS(classified_metric_nodebiasing_test)
print("Test set: gap rms race = %f" % gap_rms)
print("Test set: max gap rms race = %f" % max_gap)


#### Plain model - without debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.307939
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.307783


#### Plain model - without debiasing - classification metrics

Test set: Classification accuracy = 0.809397
Test set: gap rms sex = 0.164555
Test set: max gap rms sex = 0.204364
Test set: Balanced TPR = 0.819975
Test set: gap rms race = 0.079848
Test set: max gap rms race = 0.102520


### Apply in-processing algorithm based on adversarial learning
#### SenSR$_0$

In [10]:
# get sensitive directions
weights, train_logits, test_logits  = SenSR.train_nn(X_train, y_sex_train, X_test = X_test, y_test = y_sex_test, n_units=[], l2_reg=1., batch_size=5000, epoch=5000, verbose=True)

sensitive_directions = []
sensitive_directions.append(weights[0].T)

sensitive_directions = np.vstack(sensitive_directions)
tSVD = TruncatedSVD(n_components=2)
tSVD.fit(sensitive_directions)
sensitive_directions = tSVD.components_


Epoch 0 train accuracy 0.384056
Epoch 0 test accuracy 0.383527

Epoch 10 train accuracy 0.384139
Epoch 10 test accuracy 0.383969

Epoch 20 train accuracy 0.384775
Epoch 20 test accuracy 0.383527

Epoch 30 train accuracy 0.386185
Epoch 30 test accuracy 0.38408

Epoch 40 train accuracy 0.388258
Epoch 40 test accuracy 0.38817

Epoch 50 train accuracy 0.391851
Epoch 50 test accuracy 0.390934

Epoch 60 train accuracy 0.394975
Epoch 60 test accuracy 0.395578

Epoch 70 train accuracy 0.400614
Epoch 70 test accuracy 0.401216

Epoch 80 train accuracy 0.407358
Epoch 80 test accuracy 0.406965

Epoch 90 train accuracy 0.413661
Epoch 90 test accuracy 0.411719

Epoch 100 train accuracy 0.422340
Epoch 100 test accuracy 0.420896

Epoch 110 train accuracy 0.431711
Epoch 110 test accuracy 0.430846

Epoch 120 train accuracy 0.441275
Epoch 120 test accuracy 0.441902

Epoch 130 train accuracy 0.453161
Epoch 130 test accuracy 0.454616

Epoch 140 train accuracy 0.467120
Epoch 140 test accuracy 0.472305

Epo

In [11]:
# apply SenSR_0
weights, train_logits, test_logits  = SenSR.train_fair_nn(
    X_train, 
    y_train, 
    sensitive_directions, 
    X_test=X_test, 
    y_test=y_test, 
    n_units = [], 
    lr=0.001, 
    batch_size=5000, 
    epoch=15000, 
    verbose=True, 
    l2_reg=0., 
    subspace_epoch=10, 
    subspace_step=.1, 
    eps=None, 
    full_step=-1)

The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.



The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.



Instructions for updating:
Please switch to tf.train.get_or_create_global_step


Instructions for updating:
Please switch to tf.train.get_or_create_global_step








Epoch 0 train accuracy 0.511706; lambda is 2.000000
Epoch 0 test accuracy 0.509895
Epoch 10 train accuracy 0.551732; lambda is 2.000000
Epoch 10 test accuracy 0.551465
Epoch 20 train accuracy 0.592752; lambda is 2.000000
Epoch 20 test accuracy 0.597236
Epoch 30 train accuracy 0.634464; lambda is 2.000000
Epoch 30 test accuracy 0.636484
Epoch 40 train accuracy 0.669707; lambda is 2.000000
Epoch 40 test accuracy 0.671531
Epoch 50 train accuracy 0.696658; lambda is 2.000000
Epoch 50 test accuracy 0.697844
Epoch 60 train accuracy 0.717417; lambda is 2.000000
Epoch 60 test accuracy 0.722609
Epoch 70 train accuracy 0.731376; lambda is 2.000000
Epoch 70 test accuracy 0.737866
Epoch 80 train accuracy 0.741853; lambda is 2.000000
Epoch 80 test accuracy 0.748148
Epoch 90 train accuracy 0.749896; lambda is 2.000000
Epoch 90 test accuracy 0.753565
Epoch 100 train accuracy 0.756226; lambda is 2.000000
Epoch 100 test accuracy 0.758872
Epoch 110 train accuracy 0.760981; lambda is 2.000000
Epoch 110 t

In [59]:
dataset_debiasing_train = dataset_orig_train.copy()
dataset_debiasing_train.labels = np.argmax(train_logits,axis = 1)

dataset_debiasing_test = dataset_orig_test.copy()
dataset_debiasing_test.labels = np.argmax(test_logits,axis = 1)

In [60]:
# Metrics for the dataset from plain model (without debiasing)
# parameters from paper but with more epochs
privileged_groups = [{'sex': 1}]
unprivileged_groups = [{'sex': 0}]

display(Markdown("#### Plain model - without debiasing - dataset metrics"))
print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_nodebiasing_train.mean_difference())
print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_nodebiasing_test.mean_difference())

# Metrics for the dataset from model with debiasing
display(Markdown("#### Model - with debiasing - dataset metrics"))
metric_dataset_debiasing_train = BinaryLabelDatasetMetric(dataset_debiasing_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_debiasing_train.mean_difference())

metric_dataset_debiasing_test = BinaryLabelDatasetMetric(dataset_debiasing_test, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_debiasing_test.mean_difference())



display(Markdown("#### Plain model - without debiasing - classification metrics"))
print("Test set: Classification accuracy = %f" % classified_metric_nodebiasing_test.accuracy())

privileged_groups = [{'sex': 1}]
unprivileged_groups = [{'sex': 0}]

classified_metric_nodebiasing_test = ClassificationMetric(dataset_orig_test, 
                                                 dataset_nodebiasing_test,
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)

TPR = classified_metric_nodebiasing_test.true_positive_rate()
TNR = classified_metric_nodebiasing_test.true_negative_rate()
bal_acc_nodebiasing_test = 0.5*(TPR+TNR)

gap_rms, max_gap = compute_gap_RMS(classified_metric_nodebiasing_test)
print("Test set: gap rms sex = %f" % gap_rms)
print("Test set: max gap rms sex = %f" % max_gap)
print("Test set: Balanced TPR = %f" % bal_acc_nodebiasing_test)

privileged_groups = [{'race': 1}]
unprivileged_groups = [{'race': 0}]

classified_metric_nodebiasing_test = ClassificationMetric(dataset_orig_test, 
                                                 dataset_nodebiasing_test,
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)

gap_rms, max_gap = compute_gap_RMS(classified_metric_nodebiasing_test)
print("Test set: gap rms race = %f" % gap_rms)
print("Test set: max gap rms race = %f" % max_gap)




display(Markdown("#### Model - with debiasing - classification metrics"))
privileged_groups = [{'sex': 1}]
unprivileged_groups = [{'sex': 0}]

classified_metric_debiasing_test = ClassificationMetric(dataset_orig_test, 
                                                 dataset_debiasing_test,
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)
print("Test set: Classification accuracy = %f" % classified_metric_debiasing_test.accuracy())
TPR = classified_metric_debiasing_test.true_positive_rate()
TNR = classified_metric_debiasing_test.true_negative_rate()
bal_acc_debiasing_test = 0.5*(TPR+TNR)

gap_rms, max_gap = compute_gap_RMS(classified_metric_debiasing_test)
print("Test set: gap rms sex = %f" % gap_rms)
print("Test set: max gap rms sex = %f" % max_gap)
print("Test set: Balanced TPR = %f" % bal_acc_debiasing_test)

privileged_groups = [{'race': 1}]
unprivileged_groups = [{'race': 0}]
classified_metric_debiasing_test = ClassificationMetric(dataset_orig_test, 
                                                 dataset_debiasing_test,
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)
gap_rms, max_gap = compute_gap_RMS(classified_metric_debiasing_test)
print("Test set: gap rms race = %f" % gap_rms)
print("Test set: max gap rms race = %f" % max_gap)

#### Plain model - without debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.307939
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.307783


#### Model - with debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.129583
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.128436


#### Plain model - without debiasing - classification metrics

Test set: Classification accuracy = 0.809397
Test set: gap rms sex = 0.164555
Test set: max gap rms sex = 0.204364
Test set: Balanced TPR = 0.819975
Test set: gap rms race = 0.079848
Test set: max gap rms race = 0.102520


#### Model - with debiasing - classification metrics

Test set: Classification accuracy = 0.797236
Test set: gap rms sex = 0.047743
Test set: max gap rms sex = 0.062512
Test set: Balanced TPR = 0.796080
Test set: gap rms race = 0.046531
Test set: max gap rms race = 0.055678


#### SenSR

In [14]:
#get sensitive directions
weights, train_logits, test_logits  = SenSR.train_nn(X_train, y_sex_train, X_test = X_test, y_test = y_sex_test, n_units=[], l2_reg=1., batch_size=5000, epoch=5000, verbose=True)

sensitive_directions = []
sensitive_directions.append(weights[0].T)

sensitive_directions = np.vstack(sensitive_directions)
tSVD = TruncatedSVD(n_components=2)
tSVD.fit(sensitive_directions)
sensitive_directions = tSVD.components_


Epoch 0 train accuracy 0.354701
Epoch 0 test accuracy 0.352018

Epoch 10 train accuracy 0.356138
Epoch 10 test accuracy 0.355998

Epoch 20 train accuracy 0.356912
Epoch 20 test accuracy 0.357656

Epoch 30 train accuracy 0.358460
Epoch 30 test accuracy 0.358209

Epoch 40 train accuracy 0.359731
Epoch 40 test accuracy 0.359536

Epoch 50 train accuracy 0.362136
Epoch 50 test accuracy 0.361747

Epoch 60 train accuracy 0.364265
Epoch 60 test accuracy 0.364179

Epoch 70 train accuracy 0.367582
Epoch 70 test accuracy 0.368049

Epoch 80 train accuracy 0.370816
Epoch 80 test accuracy 0.371144

Epoch 90 train accuracy 0.374077
Epoch 90 test accuracy 0.375567

Epoch 100 train accuracy 0.378804
Epoch 100 test accuracy 0.378662

Epoch 110 train accuracy 0.383946
Epoch 110 test accuracy 0.384411

Epoch 120 train accuracy 0.388728
Epoch 120 test accuracy 0.388944

Epoch 130 train accuracy 0.394947
Epoch 130 test accuracy 0.393809

Epoch 140 train accuracy 0.402604
Epoch 140 test accuracy 0.400663

E

In [15]:
# apply SenSR

weights, train_logits, test_logits  = SenSR.train_fair_nn(
    X_train, 
    y_train, 
    sensitive_directions, 
    X_test=X_test, 
    y_test=y_test, 
    n_units = [], 
    lr=0.001, 
    batch_size=5000, 
    epoch=15000, 
    verbose=True, 
    l2_reg=0., 
    lamb_init=2., 
    subspace_epoch=15, 
    subspace_step=1, 
    eps=.001, 
    full_step=.0001, 
    full_epoch=25)

Epoch 0 train accuracy 0.503884; lambda is 2.000000
Epoch 0 test accuracy 0.50901
Epoch 10 train accuracy 0.535368; lambda is 2.000000
Epoch 10 test accuracy 0.537092
Epoch 20 train accuracy 0.562678; lambda is 2.000000
Epoch 20 test accuracy 0.56628
Epoch 30 train accuracy 0.588744; lambda is 2.000000
Epoch 30 test accuracy 0.593256
Epoch 40 train accuracy 0.613014; lambda is 2.000000
Epoch 40 test accuracy 0.616915
Epoch 50 train accuracy 0.638583; lambda is 2.000000
Epoch 50 test accuracy 0.641238
Epoch 60 train accuracy 0.664981; lambda is 2.000000
Epoch 60 test accuracy 0.667993
Epoch 70 train accuracy 0.685297; lambda is 2.000000
Epoch 70 test accuracy 0.685462
Epoch 80 train accuracy 0.699312; lambda is 2.000000
Epoch 80 test accuracy 0.701935
Epoch 90 train accuracy 0.710590; lambda is 2.000000
Epoch 90 test accuracy 0.713433
Epoch 100 train accuracy 0.718633; lambda is 2.000000
Epoch 100 test accuracy 0.720619
Epoch 110 train accuracy 0.724936; lambda is 2.000000
Epoch 110 tes

In [16]:
dataset_debiasing_train = dataset_orig_train.copy()
dataset_debiasing_train.labels = np.argmax(train_logits,axis = 1)

dataset_debiasing_test = dataset_orig_test.copy()
dataset_debiasing_test.labels = np.argmax(test_logits,axis = 1)

In [17]:
# Metrics for the dataset from plain model (without debiasing)
privileged_groups = [{'sex': 1}]
unprivileged_groups = [{'sex': 0}]

display(Markdown("#### Plain model - without debiasing - dataset metrics"))
print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_nodebiasing_train.mean_difference())
print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_nodebiasing_test.mean_difference())

# Metrics for the dataset from model with debiasing
display(Markdown("#### Model - with debiasing - dataset metrics"))
metric_dataset_debiasing_train = BinaryLabelDatasetMetric(dataset_debiasing_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_debiasing_train.mean_difference())

metric_dataset_debiasing_test = BinaryLabelDatasetMetric(dataset_debiasing_test, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_debiasing_test.mean_difference())



display(Markdown("#### Plain model - without debiasing - classification metrics"))
print("Test set: Classification accuracy = %f" % classified_metric_nodebiasing_test.accuracy())
privileged_groups = [{'sex': 1}]
unprivileged_groups = [{'sex': 0}]

classified_metric_nodebiasing_test = ClassificationMetric(dataset_orig_test, 
                                                 dataset_nodebiasing_test,
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)

TPR = classified_metric_nodebiasing_test.true_positive_rate()
TNR = classified_metric_nodebiasing_test.true_negative_rate()
bal_acc_nodebiasing_test = 0.5*(TPR+TNR)

gap_rms, max_gap = compute_gap_RMS(classified_metric_nodebiasing_test)
print("Test set: gap rms sex = %f" % gap_rms)
print("Test set: max gap rms sex = %f" % max_gap)
print("Test set: Balanced TPR = %f" % bal_acc_nodebiasing_test)

privileged_groups = [{'race': 1}]
unprivileged_groups = [{'race': 0}]

classified_metric_nodebiasing_test = ClassificationMetric(dataset_orig_test, 
                                                 dataset_nodebiasing_test,
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)

gap_rms, max_gap = compute_gap_RMS(classified_metric_nodebiasing_test)
print("Test set: gap rms race = %f" % gap_rms)
print("Test set: max gap rms race = %f" % max_gap)



display(Markdown("#### Model - with debiasing - classification metrics"))
classified_metric_debiasing_test = ClassificationMetric(dataset_orig_test, 
                                                 dataset_debiasing_test,
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)
print("Test set: Classification accuracy = %f" % classified_metric_debiasing_test.accuracy())
TPR = classified_metric_debiasing_test.true_positive_rate()
TNR = classified_metric_debiasing_test.true_negative_rate()
bal_acc_debiasing_test = 0.5*(TPR+TNR)

gap_rms, max_gap = compute_gap_RMS(classified_metric_debiasing_test)
print("Test set: gap rms sex = %f" % gap_rms)
print("Test set: max gap rms sex = %f" % max_gap)
print("Test set: Balanced TPR = %f" % bal_acc_debiasing_test)

privileged_groups = [{'race': 1}]
unprivileged_groups = [{'race': 0}]
classified_metric_debiasing_test = ClassificationMetric(dataset_orig_test, 
                                                 dataset_debiasing_test,
                                                 unprivileged_groups=unprivileged_groups,
                                                 privileged_groups=privileged_groups)

print("Test set: gap rms race = %f" % gap_rms)
print("Test set: max gap rms race = %f" % max_gap)

#### Plain model - without debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.307939
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.307783


#### Model - with debiasing - dataset metrics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.129583
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.128436


#### Plain model - without debiasing - classification metrics

Test set: Classification accuracy = 0.809397
Test set: gap rms sex = 0.164555
Test set: max gap rms sex = 0.204364
Test set: Balanced TPR = 0.819975
Test set: gap rms race = 0.079848
Test set: max gap rms race = 0.102520


#### Model - with debiasing - classification metrics

Test set: Classification accuracy = 0.797236
Test set: gap rms sex = 0.046531
Test set: max gap rms sex = 0.055678
Test set: Balanced TPR = 0.796080
Test set: gap rms race = 0.046531
Test set: max gap rms race = 0.055678
