## Using Prejudice Remover to Test Impact of Eta

**Goals:**
* Which samples get different predictions under different classifiers?

* Which samples get different decisions (i.e., which decisions/samples are difficult)?

* How much does eta need to change before the classifier produces a different decision?

* Create a plot of how sample classifications change as eta changes.

**Output:**
* Output a CSV for the test set from PrejudiceRemover, with one row per sample and one column of predictions for each value of eta.

* Ability to pick 2 features, scatter-plot the samples on those features, and move a slider (eta) with the samples' classifications updating in real time.
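To make the CSV layout concrete, here is a minimal sketch using only the standard library. The function name `write_eta_predictions`, the column naming, and the prediction values are placeholders I made up for illustration; they are not real PrejudiceRemover output.

```python
import csv
import io


def write_eta_predictions(out, sample_ids, predictions_by_eta):
    """Write one row per sample and one prediction column per eta value."""
    etas = sorted(predictions_by_eta)
    writer = csv.writer(out, lineterminator="\n")
    # Header: sample id followed by one column per eta
    writer.writerow(["sample_id"] + [f"pred_eta_{eta}" for eta in etas])
    for row_idx, sample_id in enumerate(sample_ids):
        row = [predictions_by_eta[eta][row_idx] for eta in etas]
        writer.writerow([sample_id] + row)


# Placeholder predictions keyed by eta (illustrative values only)
example_predictions = {
    0.0: [1, 0, 1],
    1.0: [1, 0, 0],
    25.0: [0, 0, 0],
}

buffer = io.StringIO()
write_eta_predictions(buffer, ["a", "b", "c"], example_predictions)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing an open file handle instead of the `StringIO` buffer would write the same table to disk.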

**Notes:**
* When eta is 1, fairness and accuracy are weighted equally.
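For context, my reading of the Prejudice Remover paper (Kamishima et al., 2012) is that training minimizes an objective of roughly the following form. The notation is my paraphrase of the paper, not something taken from the aif360 source:

$$
\min_{\Theta} \; -\mathcal{L}(\mathcal{D}; \Theta) \;+\; \eta \, R(\mathcal{D}, \Theta) \;+\; \frac{\lambda}{2} \lVert \Theta \rVert_2^2
$$

Here $\mathcal{L}$ is the log-likelihood (the accuracy term), $R$ is the prejudice regularizer (the fairness term), and the last term is ordinary L2 regularization. Under this reading, eta = 0 switches the fairness penalty off, eta = 1 gives the two terms the same coefficient, and larger eta values push the model further toward fairness at the expense of fit.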


First, I will set up a Prejudice Remover model to make sure I understand how it works and that I can configure it correctly.

In [21]:
from aif360.algorithms.inprocessing import PrejudiceRemover
from aif360.algorithms.preprocessing.optim_preproc_helpers.data_preproc_functions import load_preproc_data_compas
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from aif360.metrics import BinaryLabelDatasetMetric

In [22]:
# Split data into test and train, label as necessary
data = load_preproc_data_compas()
# For protected attribute sex, Female is privileged, and Male is unprivileged
privileged_groups = [{'sex': 1}]
unprivileged_groups = [{'sex': 0}]
# Split at 70/30
data_train, data_test = data.split([0.7], shuffle=True)

In [23]:
# Looking at our dataset
# Shape of data
print(data_train.features.shape)
# Labels of data
print(data_train.favorable_label, data_train.unfavorable_label)
# Attribute names (protected)
print(data_train.protected_attribute_names)
# Attribute values (protected)
print(data_train.privileged_protected_attributes, 
      data_train.unprivileged_protected_attributes)
# Feature names
print(data_train.feature_names)

(3694, 10)
0.0 1.0
['sex', 'race']
[array([1.]), array([1.])] [array([0.]), array([0.])]
['sex', 'race', 'age_cat=25 to 45', 'age_cat=Greater than 45', 'age_cat=Less than 25', 'priors_count=0', 'priors_count=1 to 3', 'priors_count=More than 3', 'c_charge_degree=F', 'c_charge_degree=M']


Creating metric objects for the original training and test splits for later.

In [25]:
# Original training data
metric_orig_train = BinaryLabelDatasetMetric(data_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)

In [26]:
# Original testing data
metric_orig_test = BinaryLabelDatasetMetric(data_test, 
                                            unprivileged_groups=unprivileged_groups,
                                            privileged_groups=privileged_groups)
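The metric objects are created here but not queried yet. In aif360, `BinaryLabelDatasetMetric.mean_difference()` reports the statistical parity difference: the favorable-outcome rate of the unprivileged group minus that of the privileged group. As a hand-rolled sketch of that quantity (the function name and the toy arrays below are mine, not part of aif360):

```python
def statistical_parity_difference(labels, protected, favorable_label, privileged_value):
    """P(favorable | unprivileged) - P(favorable | privileged)."""
    priv = [y for y, s in zip(labels, protected) if s == privileged_value]
    unpriv = [y for y, s in zip(labels, protected) if s != privileged_value]
    if not priv or not unpriv:
        raise ValueError("both groups need at least one sample")

    def favorable_rate(group):
        return sum(1 for y in group if y == favorable_label) / len(group)

    return favorable_rate(unpriv) - favorable_rate(priv)


# Toy data: favorable label is 0.0 and sex == 1 is privileged, as in this dataset
toy_labels = [0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0]
toy_sex = [1, 1, 1, 1, 0, 0, 0, 0]

spd = statistical_parity_difference(
    toy_labels, toy_sex, favorable_label=0.0, privileged_value=1
)
print(spd)
```

A value of 0 means both groups receive the favorable label at the same rate; a negative value means the unprivileged group receives it less often.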

Preparing logistic regression model.

In [15]:
# Fit a logistic regression classifier on the standardized training features
scale_orig = StandardScaler()
X_train = scale_orig.fit_transform(data_train.features)
y_train = data_train.labels.ravel()
lmod = LogisticRegression()
lmod.fit(X_train, y_train)

LogisticRegression()

Preparing the prejudice remover model.

In [27]:
# Using prejudice remover; fit on the training split only to avoid leaking test data
pr = PrejudiceRemover(eta=1.0, sensitive_attr='sex')
pr.fit(data_train)

<aif360.algorithms.inprocessing.prejudice_remover.PrejudiceRemover at 0x2923d573e20>

In [30]:
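# Make predictions on the held-out test split
# (predict returns a dataset; the predicted labels are in y_pred.labels)
y_pred = pr.predict(data_test)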
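With a single model working, the next step toward the goals above is to repeat the fit for several eta values and compare predictions per sample. Below is a minimal sketch of that bookkeeping. The helper names (`sweep_eta`, `flipped_samples`, `first_flip_eta`) and the `fit_predict` callable are hypothetical, and `toy_fit_predict` is a stand-in threshold rule so the example runs without aif360.

```python
def sweep_eta(etas, fit_predict):
    """Return {eta: list of predicted labels}, calling fit_predict once per eta."""
    return {eta: list(fit_predict(eta)) for eta in etas}


def flipped_samples(predictions_by_eta):
    """Indices of samples whose predicted label is not identical for every eta."""
    columns = list(predictions_by_eta.values())
    return [i for i, preds in enumerate(zip(*columns)) if len(set(preds)) > 1]


def first_flip_eta(predictions_by_eta, index):
    """Smallest eta at which sample `index` differs from its label at the lowest eta."""
    etas = sorted(predictions_by_eta)
    baseline = predictions_by_eta[etas[0]][index]
    for eta in etas[1:]:
        if predictions_by_eta[eta][index] != baseline:
            return eta
    return None  # the sample never changes label in this sweep


# Stand-in for the real model: a made-up threshold rule on fixed scores
def toy_fit_predict(eta):
    scores = [0.2, 0.9, 1.5, 30.0]
    return [1.0 if score > eta else 0.0 for score in scores]


results = sweep_eta([0.0, 1.0, 25.0], toy_fit_predict)
print(flipped_samples(results))
print(first_flip_eta(results, 2))
```

For the real sweep, I would expect `fit_predict` to be something like `lambda eta: PrejudiceRemover(eta=eta, sensitive_attr='sex').fit(data_train).predict(data_test).labels.ravel()`. This assumes `fit` returns the fitted object, as the cell output above suggests, and that the predicted dataset exposes `.labels`.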