Hello! Thank you for checking out our tool.

The purpose of this demo is to demonstrate some of the basics by generating a flipset for one individual. Along the way, we'll show:

1. How to use the ActionSet interface to specify immutable variables and variables with custom ranges.
2. How to use a model to align an ActionSet.
3. How to use the RecourseBuilder interface to check whether recourse is feasible for one person.

We'll work using CPLEX. The problem is equivalent for CBC. To install either solver, see the [README](https://github.com/ustunb/actionable-recourse/blob/master/README.md).

In [1]:
import os
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
import sys
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)
from recourse.builder import RecourseBuilder
from recourse.builder import ActionSet
from recourse.flipset import Flipset

data_dir = "../data/2_1_experiment_1/"

# German Credit dataset

In [2]:
data_name = 'german_processed'
data_file = os.path.join(data_dir, '%s.csv' % data_name)
## load and process data
german_df = pd.read_csv(data_file).reset_index(drop=True)

german_df = (german_df
             .assign(isMale=lambda df: (df['Gender']=='Male').astype(int))
             .drop(['PurposeOfLoan', 'Gender', 'OtherLoansAtStore'], axis=1)
            )

german_y = german_df['GoodCustomer']
german_X = german_df.drop('GoodCustomer', axis=1)

german_categorical_features = [0, 1, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26]
columns = german_X.columns
german_categorical_names = [columns[i] for i in german_categorical_features] 

# COMPAS dataset

In [3]:
data_name = "compas-scores-two-years"
data_file = os.path.join(data_dir, '%s.csv' % data_name)
## load and process data
compas_df = pd.read_csv(data_file).reset_index(drop=True)

cols_with_missing_values = []
for col in compas_df.columns:
    if len(np.where(compas_df[col].values == '?')[0]) >= 1 or compas_df[col].isnull().values.any():
        cols_with_missing_values.append(col)    

# compas_df['length_of_stay'] = (pd.to_datetime(compas_df['c_jail_out']) - pd.to_datetime(compas_df['c_jail_in'])).dt.days
compas_df = compas_df.drop(cols_with_missing_values, axis=1)
compas_df = (compas_df
             .drop(['id', 'name', 'first', 'last', 'dob', 'compas_screening_date', 'type_of_assessment', \
                    'screening_date', 'v_type_of_assessment', 'v_screening_date', 'race'], axis=1)
            )

compas_df = pd.get_dummies(compas_df, columns=['sex']).drop(['sex_Female'], axis=1)
# compas_df = pd.get_dummies(compas_df, columns=['race'])
compas_df = pd.get_dummies(compas_df, columns=['age_cat'])
compas_df = pd.get_dummies(compas_df, columns=['score_text'])
compas_df = pd.get_dummies(compas_df, columns=['v_score_text'])
compas_df = pd.get_dummies(compas_df, columns=['c_charge_degree']).drop(['c_charge_degree_F'], axis=1)

compas_y = compas_df['two_year_recid'].replace(0, -1)
compas_X = compas_df.drop('two_year_recid', axis=1)

compas_categorical_features = [6, 7, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
columns = compas_X.columns
compas_categorical_names = [columns[i] for i in compas_categorical_features] 


  


# Adult dataset

In [4]:
data_name = "adult"
data_file = os.path.join(data_dir, '%s.csv' % data_name)
## load and process data
adult_df = pd.read_csv(data_file).reset_index(drop=True)
adult_df.columns = ['age','workclass','fnlwgt','education','education-num','marital-status','occupation','relationship','race','sex',\
                                          'capital-gain','capital-loss','hours-per-week','native-country','label']

cols_with_missing_values = []
for col in adult_df.columns:
    if len(np.where(adult_df[col].values == '?')[0]) >= 1 or adult_df[col].isnull().values.any():
        cols_with_missing_values.append(col)    

adult_df = adult_df.drop(cols_with_missing_values, axis=1)

adult_df['Married'] = adult_df.apply(lambda row: 1 if 'Married' in row['marital-status'] else 0, axis=1)
adult_df['Widowed'] = adult_df.apply(lambda row: 1 if 'Widowed' in row['marital-status'] else 0, axis=1)
adult_df['NeverMarried'] = adult_df.apply(lambda row: 1 if 'Never-married' in row['marital-status'] else 0, axis=1)

adult_df['workclass_gov'] = adult_df.apply(lambda row: 1 if 'gov' in row['workclass'] else 0, axis=1)
adult_df['workclass_private'] = adult_df.apply(lambda row: 1 if 'Private' in row['workclass'] else 0, axis=1)
adult_df['workclass_self-emp'] = adult_df.apply(lambda row: 1 if 'Self-emp' in row['workclass'] else 0, axis=1)
# adult_df['workclass_never-worked'] = adult_df.apply(lambda row: 1 if 'Never-worked' in row['workclass'] else 0, axis=1)

adult_df['White'] = adult_df.apply(lambda row: 1 if 'White' in row['race'] else 0, axis=1)

# adult_df = pd.get_dummies(adult_df, columns=['race'])
adult_df = pd.get_dummies(adult_df, columns=['sex'])

adult_df = adult_df.drop(['education', 'occupation', 'native-country', \
                          'relationship'], axis=1)
adult_df = adult_df.drop(['sex_ Female', 'race'], axis=1)
adult_df = adult_df.drop(['marital-status', 'workclass', 'fnlwgt'], axis=1)

adult_df.columns = adult_df.columns.str.replace(' ', '')

adult_X = adult_df.drop('label', axis=1)
adult_y = adult_df['label'].replace(' <=50K', -1)
adult_y = adult_y.replace(' >50K', 1)

for col in adult_X.columns:
    print(col)
    print(adult_X[col].value_counts())

adult_categorical_features = [5, 6, 7, 8, 9, 10, 11, 12]
columns = adult_X.columns
print(columns)
adult_categorical_names = [columns[i] for i in adult_categorical_features] 



age
36    898
31    888
34    886
23    877
35    876
     ... 
83      6
85      3
88      3
87      1
86      1
Name: age, Length: 73, dtype: int64
education-num
9     10501
10     7291
13     5354
14     1723
11     1382
7      1175
12     1067
6       933
4       646
15      576
5       514
8       433
16      413
3       333
2       168
1        51
Name: education-num, dtype: int64
capital-gain
0        29849
15024      347
7688       284
7298       246
99999      159
         ...  
4931         1
1455         1
6097         1
22040        1
1111         1
Name: capital-gain, Length: 119, dtype: int64
capital-loss
0       31041
1902      202
1977      168
1887      159
1848       51
        ...  
1411        1
1539        1
2472        1
1944        1
2201        1
Name: capital-loss, Length: 92, dtype: int64
hours-per-week
40    15216
50     2819
45     1824
60     1475
35     1297
      ...  
92        1
94        1
87        1
74        1
82        1
Name: hours-per-week, Lengt

Convert the data so that it is no longer one-hot encoded (OHE).
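
The commented-out cell below collapses pairs of indicator columns into a single categorical column. A minimal pure-Python sketch of that idea, using the `lt_1` / `geq_4` / `bet_1_4` labels from that cell. The helper name `collapse_years_at_job` and the ordinal codes in `ORDINAL_CODE` are assumptions made for illustration, not part of the notebook.

```python
# Ordinal codes are an assumption for this sketch; the notebook does not define them.
ORDINAL_CODE = {"lt_1": 0, "bet_1_4": 1, "geq_4": 2}


def collapse_years_at_job(lt_1, geq_4):
    """Collapse two mutually exclusive indicator flags into one label."""
    if lt_1 and geq_4:
        raise ValueError("lt_1 and geq_4 cannot both be set")
    if lt_1:
        return "lt_1"
    if geq_4:
        return "geq_4"
    # neither flag set: between 1 and 4 years
    return "bet_1_4"


# Made-up rows that mimic the two indicator columns of the German Credit data.
rows = [
    {"YearsAtCurrentJob_lt_1": 1, "YearsAtCurrentJob_geq_4": 0},
    {"YearsAtCurrentJob_lt_1": 0, "YearsAtCurrentJob_geq_4": 1},
    {"YearsAtCurrentJob_lt_1": 0, "YearsAtCurrentJob_geq_4": 0},
]

labels = [
    collapse_years_at_job(r["YearsAtCurrentJob_lt_1"], r["YearsAtCurrentJob_geq_4"])
    for r in rows
]
# A single numerical column, as the recourse and LIME steps expect.
codes = [ORDINAL_CODE[label] for label in labels]

print(labels)
print(codes)
```

Mapping the labels to ordered integers keeps the column numerical while removing the one-hot structure; whether an ordinal encoding is appropriate depends on the feature.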

In [5]:
# recourse and LIME need the data to be numerical and NOT one-hot encoded
# the classifier needs the data to be one-hot encoded

# german_df['YearsAtCurrentJob_lt_1'] = german_df['YearsAtCurrentJob_lt_1'].replace(1, 'lt_1')
# german_df['YearsAtCurrentJob'] = german_df['YearsAtCurrentJob_lt_1']
# german_df['YearsAtCurrentJob_geq_4'] = german_df['YearsAtCurrentJob_geq_4'].replace(1, 'geq_4')
# german_df['YearsAtCurrentJob'] = german_df.apply(lambda row: 'geq_4' if row['YearsAtCurrentJob_geq_4'] == 'geq_4' else row['YearsAtCurrentJob'], axis=1)
# german_df['YearsAtCurrentJob'] = german_df['YearsAtCurrentJob_lt_1'].replace(0, 'bet_1_4')
# german_df = german_df.drop(['YearsAtCurrentJob_lt_1', 'YearsAtCurrentJob_geq_4'], axis=1)

# german_df['CheckingAccountBalance_geq_0'] = german_df['CheckingAccountBalance_geq_0'].replace(1, 'geq_0')
# german_df['CheckingAccountBalance_geq_200'] = german_df['CheckingAccountBalance_geq_200'].replace(1, 'geq_200')
# german_df['CheckingAccountBalance'] = german_df['CheckingAccountBalance_geq_0']
# german_df['CheckingAccountBalance'] = german_df.apply(lambda row: 'geq_200' if row['CheckingAccountBalance_geq_200'] == 'geq_200' else row['CheckingAccountBalance'], axis=1)
# german_df['CheckingAccountBalance'] = german_df['CheckingAccountBalance'].replace('geq_0', '0_200')
# german_df = german_df.drop(['CheckingAccountBalance_geq_0', 'CheckingAccountBalance_geq_200'], axis=1)

# german_df['SavingsAccountBalance_geq_100'] = german_df['SavingsAccountBalance_geq_100'].replace(1, '100_500')
# german_df['SavingsAccountBalance_geq_500'] = german_df['SavingsAccountBalance_geq_500'].replace(1, 'geq_500')
# german_df['SavingsAccountBalance'] = german_df['SavingsAccountBalance_geq_100']
# german_df['SavingsAccountBalance'] = german_df.apply(lambda row: 'geq_500' if row['SavingsAccountBalance_geq_500'] == 'geq_500' else row['SavingsAccountBalance'], axis=1)
# german_df['SavingsAccountBalance'] = german_df['SavingsAccountBalance'].replace('0', 'lt_100')
# german_df = german_df.drop(['SavingsAccountBalance_geq_100', 'SavingsAccountBalance_geq_500'], axis=1)
# display(german_df)


In [6]:
pd.set_option('display.max_columns', None)
# display(X)
# display(y)

In [7]:
# msk = np.random.rand(len(X)) < 0.8
# train = X[msk]
# test = X[~msk]

# train_y = y[msk]
# test_y = y[~msk]

Currently, no immutable features.

# Train model

OK, great. Now let's get into the meat of it: let's train a model and see what recourse exists.

In [8]:
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier
import lime.explanation
import lime.lime_tabular

# (100,) is a one-element tuple: a single hidden layer with 100 units
nn = MLPClassifier(hidden_layer_sizes=(100,))
rf = RandomForestClassifier(n_estimators=40)

# nn.fit(train, train_y)
# print(nn.score(train, train_y))
# print(nn.score(test, test_y))


# Generate Recourse

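First, let's score everyone using our model. Suppose that we give loans to anyone with a greater than $80\%$ chance of paying it back. Note that the code below places the decision boundary at a probability of 0.5, by subtracting 0.5 from the intercept of the LIME linear approximation.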
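
The thresholding rule described above can be sketched in a few lines. This is only an illustration: the probabilities are made up, and the helper name `approve` is hypothetical.

```python
def approve(prob_repay, threshold=0.8):
    """Return +1 (approve) if the repayment probability exceeds the threshold, else -1."""
    return 1 if prob_repay > threshold else -1


# Made-up repayment probabilities for three applicants.
scores = [0.95, 0.80, 0.42]

decisions = [approve(p) for p in scores]
# Applicants who were denied are the ones who need recourse.
denied = [i for i, d in enumerate(decisions) if d == -1]

print(decisions)
print(denied)
```

Because the rule is "greater than" the threshold, an applicant scored at exactly 0.80 is denied in this sketch.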

In [10]:
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import Ridge
from IPython.display import HTML
import time

def convert_binary_categorical_coefficients(exp_list):
    cleaned_exp_dict = {}
    for (feat, coeff) in exp_list:
        if "=" in feat:
            original_feat, val = feat.split("=")
            int_val = int(val)
            if int_val == 1:
                cleaned_exp_dict[original_feat] = coeff
            else:
                cleaned_exp_dict[original_feat] = -1 * coeff
        else:
            cleaned_exp_dict[feat] = coeff
    return cleaned_exp_dict

# scaled_X = (X - explainer.scaler.mean_) / explainer.scaler.scale_

def get_nonzero_actions(feature_names, action):
    action_dict = {}
    for feat_idx, feat_name in enumerate(feature_names):
        action_for_feat = action[feat_idx]
        if action_for_feat != 0:
            action_dict[feat_name] = action_for_feat
    return action_dict
    

# assumes data is properly formatted
def calculate_recourse_accuracy(model, X, y, categorical_features, categorical_names, num_samples = 10000, kernel_width = 1):
    
    model.fit(X, y) 
    print("SCORE: ", model.score(X, y))
    
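    # evaluate on at most 1000 randomly chosen rows to keep the run time manageable
    if len(X) >= 1000:
        sampled = np.random.choice(np.arange(len(X)), 1000, False)
        sampled_X = X.iloc[sampled]
    else:
        sampled_X = X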
        
    for col in sampled_X.columns:
        print(col)
        print(sampled_X[col].value_counts())
    
    classes = model.classes_
    true_index = list(classes).index(1)
    
    scores = pd.Series(model.predict_proba(sampled_X)[:, true_index])
    recourses = [None] * len(scores)

    total_recourses = 0
    total_actual_recourses = 0
    error_instances = 0
    
    print("NUM SAMPLES: ", num_samples)
    print("KERNEL WIDTH: ", kernel_width)

    # class_names have to be ordered according to what the classifier is using
    # need to specify which features are categorical for lime
    explainer = lime.lime_tabular.LimeTabularExplainer(X.values, categorical_features=categorical_features, 
                                                       categorical_names=categorical_names, \
                                                       feature_names=X.columns, class_names=classes, \
                                                       discretize_continuous=False, kernel_width = kernel_width)
    
    action_set = ActionSet(X = X)
#     action_set['LoanDuration'].step_type ="absolute"
#     action_set['LoanDuration'].step_size = 6
    
    start_time = time.time()
    for i, dn in enumerate(scores):
        if i % 25 == 0:
            print("\n", i, " out of ", len(scores))
        if i % 100 == 0:
            print("time elapsed: ", (time.time() - start_time) / 60, " minutes")
            start_time = time.time()
        # scores were computed on sampled_X, so take the matching row from it
        x = sampled_X.values[i]

        num_features = len(x)
        exp = explainer.explain_instance(x, model.predict_proba, num_features = num_features, num_samples = num_samples)
        local_pred = exp.local_pred

        model_pred = (model.predict_proba([x])[0][true_index])

        cleaned_exp_dict = convert_binary_categorical_coefficients(exp.as_list())

        columns = X.columns
        coefficients = [None] * num_features
        # use a separate loop variable so the outer instance index `i` is not overwritten
        for j, col in enumerate(columns):
            coefficients[j] = cleaned_exp_dict[col]

        intercept = exp.intercept[1]

        x_shift = np.array(explainer.scaler.mean_)
        x_scale = np.array(explainer.scaler.scale_)
        w = coefficients / x_scale
        b = intercept - np.dot(w, x_shift) - 0.5

        action_set.align(coefficients=w)
        fb = Flipset(x = x, action_set = action_set, coefficients = w, intercept = b)
        
        try:
            print("populating")
            fb.populate(enumeration_type = 'distinct_subsets', total_items = 10)
            error = False

        except ValueError:
            error_instances += 1
            error = True

        recourse = {}
        recourse['idx'] = i
        recourse['instance'] = x
        recourse['model_prob'] = model_pred
        recourse['lime_prob'] = local_pred
        recourse['model_pred'] = 1 if model_pred >= 0.5 else -1
        recourse['lime_pred'] = 1 if local_pred >= 0.5 else -1

        recourse['lime_coeff'] = coefficients
        recourse['lime_intercept'] = intercept
        recourse['scaled_coeff'] = w
        recourse['scaled_intercept'] = b
        recourse['actions'] = [result['actions'] for result in fb.items]
        recourse['error_solving'] = error
        recourses[i] = recourse
        
        for action in recourse['actions']:
            new_x = (x + action)
            old_pred = recourse['model_pred']
            new_pred = model.predict(new_x.reshape(1, -1))[0]

            new_lime_pred = 1 if np.dot(w, new_x) + b >= 0.0 else -1
            total_recourses += 1

            if old_pred != new_pred:
                print(get_nonzero_actions(columns, action))
                total_actual_recourses += 1
                
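    # fraction of suggested actions that actually flip the model's prediction
    if total_recourses > 0:
        print(total_actual_recourses / total_recourses)
    else:
        print("no recourse actions were found")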
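
The function above divides the LIME coefficients by `explainer.scaler.scale_` and adjusts the intercept with `explainer.scaler.mean_` before building the flipset. A minimal sketch of that algebra, assuming the LIME surrogate is linear in standardized features $z = (x - \mu) / \sigma$; all numbers and helper names below are made up for illustration.

```python
def to_raw_space(coefs, intercept, mean, scale):
    """Rewrite intercept + coefs . ((x - mean) / scale) as b + w . x."""
    w = [c / s for c, s in zip(coefs, scale)]
    b = intercept - sum(wi * m for wi, m in zip(w, mean))
    return w, b


def predict_standardized(x, coefs, intercept, mean, scale):
    """Evaluate the linear model on standardized features."""
    return intercept + sum(
        c * (xi - m) / s for c, xi, m, s in zip(coefs, x, mean, scale)
    )


def predict_raw(x, w, b):
    """Evaluate the rescaled linear model directly on raw features."""
    return b + sum(wi * xi for wi, xi in zip(w, x))


# Made-up surrogate and scaler statistics for two features.
coefs = [0.2, -0.1]
intercept = 0.6
mean = [35.0, 20.0]
scale = [10.0, 12.0]
x = [45.0, 8.0]

w, b = to_raw_space(coefs, intercept, mean, scale)
p_std = predict_standardized(x, coefs, intercept, mean, scale)
p_raw = predict_raw(x, w, b)

# Both forms give the same prediction for the same point.
print(p_std, p_raw)
```

The notebook additionally subtracts 0.5 from the rescaled intercept, so that a surrogate probability of 0.5 maps to a score of 0.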
                

In [11]:
print("GERMAN")
calculate_recourse_accuracy(nn, german_X, german_y, german_categorical_features, german_categorical_names)

GERMAN
SCORE:  0.7
ForeignWorker
0    963
1     37
Name: ForeignWorker, dtype: int64
Single
1    548
0    452
Name: Single, dtype: int64
Age
27    51
26    50
23    48
24    44
28    43
25    41
35    40
30    40
36    39
31    38
29    37
32    34
33    33
34    32
37    29
22    27
40    25
38    24
42    22
39    21
46    18
41    17
43    17
44    17
47    17
45    15
21    14
49    14
20    14
48    12
50    12
54    10
52     9
57     9
55     8
63     8
51     8
53     7
61     7
60     6
58     5
64     5
65     5
66     5
74     4
67     3
68     3
56     3
59     3
75     2
62     2
19     2
70     1
Name: Age, dtype: int64
LoanDuration
24    184
12    179
18    113
36     83
6      75
15     64
9      49
48     48
30     40
21     30
10     28
27     13
60     13
42     11
11      9
20      8
8       7
4       6
39      5
45      5
7       5
14      4
13      4
33      3
28      3
22      2
16      2
54      2
26      1
40      1
47      1
5       1
72      1
Name: LoanDurat

done building mip
attached x
end RecourseBuilder
recovered all minimum-cost items
obtained 0 items in 0.1 seconds
populating
before RecourseBuilder
attaching x
building mip
actions_builder:  {'ForeignWorker': array([0., 1.]), 'Single': array([-1.,  0.]), 'Age': array([-25., -24., -23., -22., -21., -20., -19., -18., -17., -16., -15.,
       -14., -13., -12., -11., -10.,  -9.,  -8.,  -7.,  -6.,  -5.,  -4.,
        -3.,  -2.,  -1.,   0.]), 'LoanDuration': array([-36., -35., -34., -33., -32., -31., -30., -29., -28., -27., -26.,
       -25., -24., -23., -22., -21., -20., -19., -18., -17., -16., -15.,
       -14., -13., -12., -11., -10.,  -9.,  -8.,  -7.,  -6.,  -5.,  -4.,
        -3.,  -2.,  -1.,   0.]), 'LoanAmount': array([   0.,  133.,  271.,  409.,  547.,  685.,  823.,  961., 1099.,
       1237., 1375., 1513., 1651., 1789., 1927., 2065., 2203., 2341.,
       2479., 2617., 2755., 2893., 3031., 3169., 3307., 3445., 3583.,
       3721., 3859., 3997., 4135., 4273., 4411., 4549., 4687., 4825

KeyboardInterrupt: 

In [None]:
print("COMPAS")
calculate_recourse_accuracy(nn, compas_X, compas_y, compas_categorical_features, compas_categorical_names)


In [None]:
print("ADULT")
calculate_recourse_accuracy(nn, adult_X, adult_y, adult_categorical_features, adult_categorical_names)


In [None]:
print("GERMAN")
calculate_recourse_accuracy(rf, german_X, german_y, german_categorical_features, german_categorical_names)

In [None]:
print("COMPAS")
calculate_recourse_accuracy(rf, compas_X, compas_y, compas_categorical_features, compas_categorical_names)


In [None]:
print("ADULT")
display(adult_X)

calculate_recourse_accuracy(rf, adult_X, adult_y, adult_categorical_features, adult_categorical_names)


You can switch optimizers if you don't have CPLEX by setting `optimizer="cbc"`. 

A quick note: Our decision boundary is by default 0. We shift this by tweaking the intercept. Since we used Logistic Regression, we use the trick above to do that. In future iterations, we will provide a more elegant way of doing this.
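The intercept trick can be checked numerically. The sketch below uses made-up coefficients and points (hypothetical values, not the fitted model from this notebook) to show that subtracting $\log(p / (1 - p))$ from the intercept moves the zero-score boundary, so that a score of at least $0$ corresponds to a predicted probability of at least $p$.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

# Hypothetical linear model, for illustration only.
coefficients = [0.8, -0.5]
intercept = 0.2
p = 0.8  # desired probability threshold

# Same shift as used when constructing the RecourseBuilder.
shifted_intercept = intercept - logit(p)

def score(x, b):
    return sum(w * xi for w, xi in zip(coefficients, x)) + b

def approved_by_probability(x):
    # Decision rule stated on the probability scale.
    return sigmoid(score(x, intercept)) >= p

def approved_by_shifted_score(x):
    # Equivalent rule on the score scale, with the boundary at 0.
    return score(x, shifted_intercept) >= 0.0

points = [[0.0, 0.0], [1.0, 0.0], [3.0, 1.0], [2.0, -1.0], [0.5, 2.0]]
agreement = [approved_by_probability(x) == approved_by_shifted_score(x)
             for x in points]
print(agreement)
```

Both rules agree on every point here because the sigmoid is monotone, which is why shifting the intercept is enough for a logistic regression model.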

In [None]:
output_1 = rb.fit()
output_1

all_info = rb.populate()
print(all_info)

Ok, great, we have a solution! This individual has recourse. The first thing of interest to us is the total cost of all the actions needed to flip their prediction. It costs this person $.21$, meaning that the sum of percentile shifts across this person's features is $.21$. That's quite a lot: imagine having to shift that much relative to a population. Let's check out what this means in terms of actions:

In [None]:
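# pd.Series(output_1['actions'], index=X.columns).to_frame('Actions')
# One column per item returned by rb.populate(), one row per feature.
actions = [info['actions'] for info in all_info]
actions_df = pd.DataFrame(data=actions).transpose().set_index(X.columns)
# The individual's current feature values, for comparison.
person = pd.Series(x, index=X.columns)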
print(person)
display(actions_df)

Ok, so let's read this. 

* `SavingsAccountBalance_geq_100`$=1$, for example. This is a binary feature, so the action can only be $1$. This means that we're encouraging this person to increase their savings.
* `LoanDuration`$=20$. Recall that this is the loan duration in months. This means we're encouraging this person to reapply with a loan repayment period that is 20 months shorter.

Let's check if these two actions make sense in the context of this person:

In [None]:
X.loc[denied_individuals[0]].to_frame("Original Features")

Ok, this person originally applied with no savings and with a 4-year repayment period. So asking them to build up savings and decrease their loan repayment period by $20$ months makes sense.

(Let's leave aside the question of mutually exclusive features, e.g. `SavingsAccountBalance_geq_100` $=0$ and `SavingsAccountBalance_geq_500` $=1$. We'll get back to that in later releases.)

Let's close by noting some things:

* Immutable features are __not__ changed. That's good. That's recourse.
* The changes make sense, at least directionally. We'd encourage this person to get a guarantor, to decrease their loan amount, and to decrease their loan period, among other changes.

Yes, these might be hard for someone. They might have other reasons for immutability that we're not considering. Maybe they _need_ that amount and cannot change. Ok, let's express that:

In [None]:
action_set['LoanAmount'].mutable = False
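To make the effect of that flag concrete, here is a rough sketch of what immutability means for the set of allowed changes. The helper `feasible_actions` and its bounds are hypothetical and are not part of the `recourse` package; the point is only that an immutable feature is left with the single action $0$, which is consistent with entries such as `'ForeignWorker': array([0.])` in the `actions_builder` output printed earlier.

```python
def feasible_actions(value, lower, upper, mutable=True):
    """Return the changes a (hypothetical) integer-valued feature may take.

    An immutable feature can only stay where it is, so its sole action is 0.
    """
    if not mutable:
        return [0.0]
    return [float(v - value) for v in range(lower, upper + 1)]

# A mutable feature: any value between the bounds is reachable.
loan_duration_actions = feasible_actions(value=6, lower=0, upper=8)

# The same kind of feature once it is marked immutable.
loan_amount_actions = feasible_actions(value=5000, lower=0, upper=20000,
                                       mutable=False)

print(loan_duration_actions)
print(loan_amount_actions)
```

With the feature locked, any recourse found afterwards has to reach the threshold using the remaining features only.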

In [None]:
x = X.values[denied_individuals[0]]

# Shift the intercept by logit(p) so that a score of 0 corresponds to P(y=1) = p.
p = .8
rb = RecourseBuilder(
    optimizer="cbc",
    coefficients=coefficients,
    intercept=intercept - np.log(p / (1. - p)),
    action_set=action_set,
    x=x
)

In [None]:
output_2 = rb.fit()
output_2

Ok, so their total cost actually didn't change, which is nice. Let's take a look at their new action set:

In [None]:
pd.Series(output_2['actions'], index=X.columns).to_frame("New Actions")

Ok, by decreasing their repayment period by a bit more and changing some other features, this person can still ask for the same amount. That's good.

The magical thing about both of these action sets is that this person, if they do this, _will_ qualify for a loan. Let's check that:

In [None]:
clf.predict_proba([X.loc[denied_individuals[0]] + pd.Series(output_1['actions'], index=X.columns)])[:, 1]

In [None]:
clf.predict_proba([X.loc[denied_individuals[0]] + pd.Series(output_2['actions'], index=X.columns)])[:, 1]

And there we have it. By making these tweaks, this person has two ways to get over the $.8$ threshold that we've set. This person can now get approved under this model.
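The same check can be written out by hand for a linear model. The sketch below uses hypothetical coefficients, feature values, and actions (none of them taken from the fitted `clf` or from the flipset above) to show the pattern: add the actions to the original feature vector, recompute the predicted probability, and compare it with the threshold.

```python
import math

def predict_proba(x, coefficients, intercept):
    """Probability of the positive class under a logistic model."""
    z = sum(w * xi for w, xi in zip(coefficients, x)) + intercept
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical model: [savings flag, loan duration in months].
coefficients = [2.0, -0.1]
intercept = 3.0
threshold = 0.8

x = [0.0, 48.0]          # no savings flag, 48-month loan
actions = [1.0, -20.0]   # set the savings flag, shorten the loan by 20 months

# Apply the actions feature by feature.
x_new = [xi + ai for xi, ai in zip(x, actions)]

before = predict_proba(x, coefficients, intercept)
after = predict_proba(x_new, coefficients, intercept)

print(f"before: {before:.3f}, after: {after:.3f}")
print("flipped:", before < threshold <= after)
```

In the notebook itself, `clf.predict_proba` plays the role of `predict_proba`, and the actions come from `output_1['actions']` and `output_2['actions']`.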