Hello! Thank you for checking out our tool.

The purpose of this demo is demonstrate some of the basics. In doing so, we will generate a flipset for one individual. In doing so, we'll show:

1. How to use the ActionSet interface to specify immutable variables and variables with custom ranges.
2. How to use a model to align an ActionSet
3. How to use the RecourseBuilder interface to find the feasibility of one person.

We'll work using CPLEX. The problem is equivalent for CBC. To install either package, read [here](https://github.com/ustunb/actionable-recourse/blob/master/README.md).

In [59]:
import os
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
import sys
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)
from recourse.builder import RecourseBuilder
from recourse.builder import ActionSet
from recourse.flipset import Flipset


In [60]:
data_dir = "../data/2_1_experiment_1/"
data_name = 'german'
data_file = os.path.join(data_dir, '%s_processed.csv' % data_name)
## load and process data
german_df = pd.read_csv(data_file).reset_index(drop=True)

german_df = (german_df
             .assign(isMale=lambda df: (df['Gender']=='Male').astype(int))
             .drop(['PurposeOfLoan', 'Gender', 'OtherLoansAtStore'], axis=1)
            )

Make the data not ohe.

In [61]:
#need the data for recourse and lime to be NOT one-hot-encoded and to be numerical
#need the data for the classifier to be one hot encoded

# german_df['YearsAtCurrentJob_lt_1'] = german_df['YearsAtCurrentJob_lt_1'].replace(1, 'lt_1')
# german_df['YearsAtCurrentJob'] = german_df['YearsAtCurrentJob_lt_1']
# german_df['YearsAtCurrentJob_geq_4'] = german_df['YearsAtCurrentJob_geq_4'].replace(1, 'geq_4')
# german_df['YearsAtCurrentJob'] = german_df.apply(lambda row: 'geq_4' if row['YearsAtCurrentJob_geq_4'] == 'geq_4' else row['YearsAtCurrentJob'], axis=1)
# german_df['YearsAtCurrentJob'] = german_df['YearsAtCurrentJob_lt_1'].replace(0, 'bet_1_4')
# german_df = german_df.drop(['YearsAtCurrentJob_lt_1', 'YearsAtCurrentJob_geq_4'], axis=1)

# german_df['CheckingAccountBalance_geq_0'] = german_df['CheckingAccountBalance_geq_0'].replace(1, 'geq_0')
# german_df['CheckingAccountBalance_geq_200'] = german_df['CheckingAccountBalance_geq_200'].replace(1, 'geq_200')
# german_df['CheckingAccountBalance'] = german_df['CheckingAccountBalance_geq_0']
# german_df['CheckingAccountBalance'] = german_df.apply(lambda row: 'geq_200' if row['CheckingAccountBalance_geq_200'] == 'geq_200' else row['CheckingAccountBalance'], axis=1)
# german_df['CheckingAccountBalance'] = german_df['CheckingAccountBalance'].replace('geq_0', '0_200')
# german_df = german_df.drop(['CheckingAccountBalance_geq_0', 'CheckingAccountBalance_geq_200'], axis=1)

# german_df['SavingsAccountBalance_geq_100'] = german_df['SavingsAccountBalance_geq_100'].replace(1, '100_500')
# german_df['SavingsAccountBalance_geq_500'] = german_df['SavingsAccountBalance_geq_500'].replace(1, 'geq_500')
# german_df['SavingsAccountBalance'] = german_df['SavingsAccountBalance_geq_100']
# german_df['SavingsAccountBalance'] = german_df.apply(lambda row: 'geq_500' if row['SavingsAccountBalance_geq_500'] == 'geq_500' else row['SavingsAccountBalance'], axis=1)
# german_df['SavingsAccountBalance'] = german_df['SavingsAccountBalance'].replace('0', 'lt_100')
# german_df = german_df.drop(['SavingsAccountBalance_geq_100', 'SavingsAccountBalance_geq_500'], axis=1)
# display(german_df)


In [93]:
y = german_df['GoodCustomer']
X = german_df.drop('GoodCustomer', axis=1)

pd.set_option('display.max_columns', None)
# display(X)
# display(y)

In [94]:
# msk = np.random.rand(len(X)) < 0.8
# train = X[msk]
# test = X[~msk]

# train_y = y[msk]
# test_y = y[~msk]

Currently, no immutable features.

# Train model

Ok great, now let's get into the meat of it. Let's train up a model as see what recourse exists.

In [96]:
##### from sklearn.neural_network import MLPClassifier
import lime.explanation
import lime.lime_tabular

nn = MLPClassifier(hidden_layer_sizes=(100))

nn.fit(X, y) 
# nn.fit(train, train_y)
# print(nn.score(train, train_y))
# print(nn.score(test, test_y))
print(nn.score(X, y))

classes = nn.classes_
print(classes)

# need to specify which features are categorical for lime
categorical_features = [0, 1, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26]
columns = X.columns
categorical_names = [columns[i] for i in categorical_features]

print(len(columns))

# class_names have to be ordered according to what the classifier is using
explainer = lime.lime_tabular.LimeTabularExplainer(X.values, categorical_features=categorical_features, 
                                                   categorical_names=categorical_names, \
                                                   feature_names=X.columns, class_names=classes, \
                                                   discretize_continuous=False, kernel_width = 1)


0.585
[-1  1]
27


# Generate Recourse

First, let's score everyone using our model. Now, let's say that we will give loans to anyone with a greater than a $80\%$ chance of paying it back

In [111]:
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import Ridge
from IPython.display import HTML

def convert_binary_categorical_coefficients(exp_list):
    cleaned_exp_dict = {}
    for (feat, coeff) in exp_list:
        if "=" in feat:
            original_feat, val = feat.split("=")
            int_val = int(val)
            if int_val == 1:
                cleaned_exp_dict[original_feat] = coeff
            else:
                cleaned_exp_dict[original_feat] = -1 * coeff
        else:
            cleaned_exp_dict[feat] = coeff
    return cleaned_exp_dict

scaled_X = (X - explainer.scaler.mean_) / explainer.scaler.scale_
action_set = ActionSet(X = X)
action_set['LoanDuration'].step_type ="absolute"
action_set['LoanDuration'].step_size = 6

scores = pd.Series(nn.predict_proba(X)[:, 1])
recourses = [None] * len(scores)

total_recourses = 0
total_actual_recourses = 0
error_instances = 0

for i, dn in enumerate(scores):
    if i % 25 == 0:
        print("\n", i, " out of ", len(scores))
    x = X.values[i]
    
    num_features = len(x)
    exp = explainer.explain_instance(x, nn.predict_proba, num_features = num_features, num_samples = 10000)
    local_pred = exp.local_pred
    
#     print(local_pred[0])
    nn_pred = (nn.predict_proba([x])[0][1])
#     print(nn_pred)
    
#     print("SCORE: ", exp.score)
    
#     if local_pred >= 0.5:
#         continue
    
    cleaned_exp_dict = convert_binary_categorical_coefficients(exp.as_list())

    columns = X.columns
    coefficients = [None] * num_features
    for i, col in enumerate(columns):
        coefficients[i] = cleaned_exp_dict[col]
        
    intercept = exp.intercept[1]
    
    x_shift = np.array(explainer.scaler.mean_)
    x_scale = np.array(explainer.scaler.scale_)
    w = coefficients / x_scale
    b = intercept - np.dot(w, x_shift) - 0.5
        
    action_set.align(coefficients=w)
#     print(w)
    fb = Flipset(x = x, action_set = action_set, coefficients = w, intercept = b)
    
    try:
        fb.populate(enumeration_type = 'distinct_subsets', total_items = 10)
        error = False

    except ValueError:
        error_instances += 1
        error = True
    
    recourse = {}
    recourse['idx'] = i
    recourse['instance'] = x
    recourse['model_prob'] = nn_pred
    recourse['lime_prob'] = local_pred
    recourse['model_pred'] = 1 if nn_pred >= 0.5 else -1
    recourse['lime_pred'] = 1 if local_pred >= 0.5 else -1
    
    recourse['lime_coeff'] = coefficients
    recourse['lime_intercept'] = intercept
    recourse['scaled_coeff'] = w
    recourse['scaled_intercept'] = b
    recourse['actions'] = [result['actions'] for result in fb.items]
    recourse['error_solving'] = error
    recourses[i] = recourse
    
    
    
    for action in recourse['actions']:
        new_x = (x + action)
        old_pred = recourse['model_pred']
        new_pred = nn.predict(new_x.reshape(1, -1))[0]
        
        new_lime_pred = 1 if np.dot(w, new_x) + b >= 0.0 else -1
        total_recourses += 1
        
        if old_pred == new_pred:
            total_actual_recourses += 1
#         print(old_pred, new_pred)
#         print(old_pred, new_lime_pred)
    
#     if len(fb.items) == 0:
#         continue
        
        
#     print(fb)
#     display(fb)
#     display(HTML(fb.to_html()))



 0  out of  1000
recovered all minimum-cost items
obtained 0 items in 0.1 seconds
obtained 10 items in 0.5 seconds
recovered all minimum-cost items
obtained 0 items in 0.1 seconds
obtained 10 items in 0.5 seconds
recovered all minimum-cost items
obtained 0 items in 0.1 seconds
obtained 10 items in 0.5 seconds
recovered all minimum-cost items
obtained 0 items in 0.0 seconds
obtained 10 items in 0.5 seconds
recovered all minimum-cost items
obtained 0 items in 0.1 seconds
obtained 10 items in 0.7 seconds
obtained 10 items in 0.5 seconds
obtained 10 items in 0.5 seconds
obtained 10 items in 0.6 seconds
recovered all minimum-cost items
obtained 0 items in 0.1 seconds
obtained 10 items in 0.6 seconds
obtained 10 items in 0.4 seconds
obtained 10 items in 0.4 seconds
obtained 10 items in 0.6 seconds
obtained 10 items in 0.7 seconds
obtained 10 items in 0.5 seconds
recovered all minimum-cost items
obtained 0 items in 0.1 seconds
recovered all minimum-cost items
obtained 0 items in 0.1 seconds


obtained 10 items in 0.5 seconds
obtained 10 items in 0.4 seconds
recovered all minimum-cost items
obtained 0 items in 0.0 seconds
recovered all minimum-cost items
obtained 0 items in 0.0 seconds
obtained 10 items in 0.5 seconds
recovered all minimum-cost items
obtained 0 items in 0.1 seconds
recovered all minimum-cost items
obtained 0 items in 0.0 seconds
obtained 10 items in 0.5 seconds
obtained 10 items in 0.5 seconds
recovered all minimum-cost items
obtained 0 items in 0.1 seconds
obtained 10 items in 0.4 seconds
obtained 10 items in 0.4 seconds
obtained 10 items in 0.5 seconds
obtained 10 items in 0.4 seconds
obtained 10 items in 0.4 seconds
obtained 10 items in 0.5 seconds
    model=unknown;
        message from solver=<undefined>
obtained 10 items in 0.5 seconds
obtained 10 items in 0.5 seconds

 200  out of  1000
recovered all minimum-cost items
obtained 0 items in 0.1 seconds
obtained 10 items in 0.5 seconds
obtained 10 items in 0.5 seconds
obtained 10 items in 0.5 seconds
obt

obtained 10 items in 0.4 seconds
obtained 10 items in 0.4 seconds
obtained 10 items in 0.4 seconds
obtained 10 items in 0.4 seconds
obtained 10 items in 0.4 seconds
obtained 10 items in 0.4 seconds
obtained 10 items in 0.4 seconds
obtained 10 items in 0.4 seconds
obtained 10 items in 0.4 seconds
    model=unknown;
        message from solver=<undefined>
    model=unknown;
        message from solver=<undefined>

 375  out of  1000
obtained 10 items in 0.4 seconds
obtained 10 items in 0.4 seconds
recovered all minimum-cost items
obtained 0 items in 0.1 seconds
    model=unknown;
        message from solver=<undefined>
recovered all minimum-cost items
obtained 0 items in 0.0 seconds
obtained 10 items in 0.4 seconds
    model=unknown;
        message from solver=<undefined>
obtained 10 items in 0.4 seconds
obtained 10 items in 0.4 seconds
obtained 10 items in 0.4 seconds
obtained 10 items in 0.4 seconds
obtained 10 items in 0.4 seconds
obtained 10 items in 0.4 seconds
obtained 10 items in

ApplicationError: Solver (cbc) did not exit normally

In [112]:
print(total_actual_recourses/total_recourses)

0.43617021276595747


You can switch optimizers if you don't have CPLEX by setting `optimizer="cbc"`. 

A quick note: Our decision boundary is by default 0. We shift this by tweaking the intercept. Since we used Logistic Regression, we use the trick above to do that. In future iterations, we will provide a more elegant way of doing this.

In [None]:
output_1 = rb.fit()
output_1

all_info = rb.populate()
print(all_info)

Ok, great, we have a solution! This individual has recourse. The total cost of all the actions needed to flip their prediction is the first thing of interest to us. It costs this person $.21$, meaning that the sum of percentile shifts across this person's features is $.21$. That's quite a lot. Imagine having to shift that much relative to a population? Let's check out what this means in terms of actions:

In [None]:
# pd.Series(output_1['actions'], index=X.columns).to_frame('Actions')
actions = [x['actions'] for x in all_info]
actions_df = pd.DataFrame(data=actions).transpose().set_index(X.columns)
person = (pd.Series(x, index=X.columns))
print(person)
display(actions_df)

Ok, so let's read this. 

* `SavingsAccountBalance_geq_100`$=1$, for example. This was a binary feature, so it can only be $1$. This also means that we're enouraging this person to increase their savings. 
* `LoanDuration`$=20$. This, if we recall, was the number of months of loan. This means we're encouraging this person to reapply but specify that their loan repayment period is 20 months shorter.

Let's check if these two actions make sense in the context of this person:

In [None]:
X.loc[denied_individuals[0]].to_frame("Original Features")

Ok, this person originally applied with no savings and with a 4-year repayment period. So asking them to get savings and decrease their loan repayment period by $20$ months make sense as actions.

(Let's leave aside the question of mutually exclusive features (eg. `SavingsAccountBalance_geq_100` $=0$, `SavingsAccountBalance_geq_500`$=1$). We'll get back to that in later releases.)

Let's close by noting some things:

* Immutable features are __not__ changed. That's good. That's recourse.
* The changes make sense, at least directionally. We'd encourage this person to get a gaurantor, to decrease their loan amount, and to decrease their loan period, among other changes.

Yes, these might be hard for someone. They might have other reasons for immutability that we're not considering. Maybe they _need_ that amount and cannot change. Ok, let's express that:

In [None]:
action_set['LoanAmount'].mutable=False

In [None]:
x = X.values[denied_individuals[0]]

p = .8
rb = RecourseBuilder(
      optimizer="cbc",
      coefficients=coefficients,
      intercept=intercept- (np.log(p / (1. - p))),
      action_set=action_set,
      x=x
)

In [None]:
output_2 = rb.fit()
output_2

Ok, so their total cost actually didn't change, which is nice. Let's take a look at their new action set:

In [None]:
pd.Series(output_2['actions'], index=X.columns).to_frame("New Actions")

Ok, by decreasing their repayment period by a bit more and changing some other features, this person can still ask for the same amount. That's good.

The magical thing about both of these action sets is that this person, if they do this, _will_ qualify for a loan. Let's check that:

In [None]:
clf.predict_proba([X.loc[denied_individuals[0]] + pd.Series(output_1['actions'], index=X.columns)])[:, 1]

In [None]:
clf.predict_proba([X.loc[denied_individuals[0]] + pd.Series(output_2['actions'], index=X.columns)])[:, 1]

And there we have it. By making these tweaks, this person has two ways to get over the $.8$ threshold that we've set. This period can now get approved under this model.