Hello! Thank you for checking out our tool.

The purpose of this demo is to demonstrate some of the basics by generating a flipset for one individual. Along the way, we'll show:

1. How to use the ActionSet interface to specify immutable variables and variables with custom ranges.
2. How to align an ActionSet with a model.
3. How to use the RecourseBuilder interface to check whether recourse is feasible for one person.

We'll use CPLEX as the solver; the setup is the same for CBC. To install either one, see the [README](https://github.com/ustunb/actionable-recourse/blob/master/README.md).
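
To make the idea of a flipset concrete before touching the library, here is a toy sketch in plain Python. It brute-forces a small grid of feature values for a linear score `w·x + b`, leaves immutable features untouched, and keeps every set of changes that flips the score from negative to positive. The function `toy_flipset`, the three features, and all the numbers are made up for illustration; the package itself relies on a solver (CPLEX or CBC) rather than on enumeration like this.

```python
from itertools import product


def linear_score(w, b, x):
    """Score of a linear classifier; a positive score means the desired outcome."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def toy_flipset(x, w, b, grids, immutable=()):
    """Enumerate feature changes that give x a positive score.

    grids maps a feature index to the values that feature may take
    (a stand-in for custom ranges); indices in `immutable` never change.
    This is a hypothetical helper, not part of the recourse package.
    """
    options = []
    for j, xj in enumerate(x):
        if j in immutable or j not in grids:
            options.append([xj])  # feature is held fixed
        else:
            options.append(sorted(set(grids[j]) | {xj}))

    flips = []
    for candidate in product(*options):
        if linear_score(w, b, candidate) > 0:
            changes = {j: (x[j], v) for j, v in enumerate(candidate) if v != x[j]}
            if changes:
                flips.append(changes)
    flips.sort(key=len)  # fewest changed features first
    return flips


# Illustrative person: [missed payments, has guarantor, age]
x = [1, 0, 30]
w = [-1.0, 2.0, 0.0]
b = -0.5
grids = {0: [0, 1, 2], 1: [0, 1], 2: [30, 40]}

flips = toy_flipset(x, w, b, grids, immutable={2})
for changes in flips:
    print(changes)
```

Each printed dictionary maps a feature index to an `(old, new)` pair. Age never appears because it was marked immutable.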

In [1]:
import os
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
import sklearn
import sys
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)
from recourse.builder import RecourseBuilder
from recourse.builder import ActionSet
from recourse.flipset import Flipset
import traceback
import seaborn as sns

data_dir = "../data/2_1_experiment_1/"
pd.set_option('display.max_columns', None)


In [2]:
from sklearn.preprocessing import LabelEncoder

def ohe(data, categorical_names, encoder, columns = [], target_columns = []):
    if isinstance(data, pd.DataFrame):
        df_data = data.copy()
    else:
        if len(columns) == 0:
            raise ValueError('Need to supply columns to make a pandas dataframe')
        df_data = pd.DataFrame(data, columns = columns)        
    transformed = encoder.transform(df_data[categorical_names])
    df_transformed = (pd.DataFrame(transformed, columns = encoder.get_feature_names(input_features = categorical_names)))
    to_return = pd.concat([df_data.reset_index(drop = True), df_transformed], axis=1).drop(categorical_names, axis=1)
    if len(target_columns) != 0:
        to_return = to_return[target_columns]
    return to_return, to_return.columns

def un_ohe(ohe_data, categorical_names, encoder, columns):
    ohe_categorical_columns = encoder.get_feature_names(input_features = categorical_names)
    if len(columns) == 0:
        raise ValueError('Need to supply columns to make a pandas dataframe')
    if isinstance(ohe_data, pd.DataFrame):
        df_ohe_data = ohe_data.copy()
    else:
        df_ohe_data = pd.DataFrame(ohe_data, columns = columns)        
    untransformed = encoder.inverse_transform(df_ohe_data[ohe_categorical_columns])
    untransformed_df = pd.DataFrame(untransformed, columns = categorical_names)
    to_return = pd.concat([df_ohe_data.reset_index(drop = True), untransformed_df], axis=1).drop(ohe_categorical_columns, axis=1)
    return to_return[columns]

#NEED TO FINISH
# target_columns: columns of one-hot-encoded
def ohe_coefficients(w, starting_columns, x, enc, categorical_names, target_columns):
    new_coefficients = {}
    categories_mapping = {}
    ohe_columns = ohe([x], categorical_names, enc, columns = starting_columns)[0].columns
    
    for feat_idx, feat in enumerate(categorical_names):
        categories_mapping[feat] = enc.categories_[feat_idx]
    
    for coef, feat, val in zip(w, starting_columns, x):
        if feat in categorical_names:
            feat_ohe_columns = [col for col in target_columns if feat in col]
            all_values = categories_mapping[feat]
            for poss_val in all_values:
                if val == poss_val:
                    new_coefficients[feat + "_" + str(int(val))] = coef
                else:
                    new_coefficients[feat + "_" + str(int(poss_val))] = 0.000001
        else:
            new_coefficients[feat] = coef
        
    sorted_new_coefficients = []
    for col in target_columns:
        sorted_new_coefficients.append(new_coefficients[col])
        
    return sorted_new_coefficients


def get_label_encoders(categorical_names, data):
    return_data = data.copy()
    categorical_encoders = {}

    for cf in categorical_names:
        le = LabelEncoder()
        le.fit(data[cf])
        return_data[cf] = le.transform(return_data[cf])
        categorical_encoders[cf] = le
    
    return categorical_encoders, return_data

# COMPAS dataset

In [3]:
data_name = "compas-scores-two-years"
data_file = os.path.join(data_dir, '%s.csv' % data_name)
## load and process data
compas_df = pd.read_csv(data_file).reset_index(drop=True)

# filter according to https://github.com/propublica/compas-analysis/blob/master/Compas%20Analysis.ipynb and https://github.com/dylan-slack/Fooling-LIME-SHAP/blob/e763fdea8242f4f3a5955951161c69f573db624d/get_data.py#L5
# compas_df = compas_df.loc[(compas_df['days_b_screening_arrest'] <= 30) & \
#                           (compas_df['days_b_screening_arrest'] >= -30) & \
#                           (compas_df['is_recid'] != -1) & \
#                           (compas_df['c_charge_degree'] != "O") & \
#                           (compas_df['score_text'] != "NA")]

# cols_with_missing_values = []
# for col in compas_df.columns:
#     if len(np.where(compas_df[col].values == '?')[0]) >= 1 or compas_df[col].isnull().values.any():
#         cols_with_missing_values.append(col)    
# compas_df = compas_df.drop(cols_with_missing_values, axis=1)

compas_df['length_of_stay'] = (pd.to_datetime(compas_df['c_jail_out']) - pd.to_datetime(compas_df['c_jail_in'])).dt.days

compas_df = compas_df[['age', 'two_year_recid','juv_misd_count', 'juv_fel_count', 'juv_other_count', 'c_charge_degree', 'race', 'sex', 'priors_count', 'length_of_stay', 'score_text']]
compas_df = compas_df.dropna()

compas_X = compas_df.drop('score_text', axis=1)

compas_y = pd.Series(np.array([-1 if score == 'High' else 1 for score in compas_df['score_text']]))
# compas_X = compas_df.drop('score_text', axis=1)

# CATEGORICAL FEATURES
compas_categorical_features = [1, 5, 6, 7]

compas_X = compas_X.reset_index(drop = True)
compas_X.columns = compas_X.columns.str.replace("_", "-")
compas_y = compas_y.reset_index(drop = True)

columns = compas_X.columns
compas_categorical_names = [columns[i] for i in compas_categorical_features] 


# ENCODE NUMERICALLY
compas_label_encoders, compas_X = (get_label_encoders(compas_categorical_names, compas_X))

# CREATE ENCODER
from sklearn.preprocessing import OneHotEncoder
compas_enc = OneHotEncoder(sparse = False, handle_unknown='error')
compas_enc.fit(compas_X[compas_categorical_names])
print(compas_enc.categories_)

# display(compas_X)
ohe_compas_X, _ = ohe(compas_X, compas_categorical_names, compas_enc)
# display(ohe_compas_X)
# display(un_ohe(ohe_compas_X, compas_categorical_names, compas_enc, columns = compas_X.columns))
display(ohe_compas_X)


[array([0, 1]), array([0, 1]), array([0, 1, 2, 3, 4, 5]), array([0, 1])]


Unnamed: 0,age,juv-misd-count,juv-fel-count,juv-other-count,priors-count,length-of-stay,two-year-recid_0,two-year-recid_1,c-charge-degree_0,c-charge-degree_1,race_0,race_1,race_2,race_3,race_4,race_5,sex_0,sex_1
0,69,0,0,0,0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0
1,34,0,0,0,0,10.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
2,24,0,0,1,4,1.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
3,44,0,0,0,0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0
4,41,0,0,0,14,6.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6902,23,0,0,0,0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
6903,23,0,0,0,0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
6904,57,0,0,0,0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0
6905,33,0,0,0,3,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0


# German Credit dataset

In [4]:
data_name = 'german_processed'
data_file = os.path.join(data_dir, '%s.csv' % data_name)
## load and process data
german_df = pd.read_csv(data_file).reset_index(drop=True)

german_df = (german_df
             .assign(isMale=lambda df: (df['Gender']=='Male').astype(int))
             .drop(['PurposeOfLoan', 'Gender', 'OtherLoansAtStore', 'HasTelephone'], axis=1)
            )

german_y = german_df['GoodCustomer']
german_X = german_df.drop('GoodCustomer', axis=1)

# def label_CheckingAccountBalance(row):
#     if row['CheckingAccountBalance_geq_0'] == 1 :
#         return "geq0"
#     elif row['CheckingAccountBalance_geq_200'] == 1:
#         return "geq200"
    
# def label_SavingsAccountBalance(row):
#     if row['SavingsAccountBalance_geq_100'] == 1 :
#         return "geq100"
#     elif row['SavingsAccountBalance_geq_500'] == 1:
#         return "geq500"
    
# def label_YearsAtCurrentJob(row):
#     if row['YearsAtCurrentJob_lt_1'] == 1 :
#         return "lt1"
#     elif row['YearsAtCurrentJob_geq_4'] == 1:
#         return "geqr"


# german_X['CheckingAccountBalance'] = german_X.apply (lambda row: label_CheckingAccountBalance(row), axis=1)
# german_X['SavingsAccountBalance'] = german_X.apply (lambda row: label_SavingsAccountBalance(row), axis=1)
# german_X['YearsAtCurrentJob'] = german_X.apply (lambda row: label_YearsAtCurrentJob(row), axis=1)

german_X = (german_X
             .drop(['CheckingAccountBalance_geq_0', 'CheckingAccountBalance_geq_200', \
                    'SavingsAccountBalance_geq_100', 'SavingsAccountBalance_geq_500', \
                   'YearsAtCurrentJob_lt_1', 'YearsAtCurrentJob_geq_4', 'RentsHouse'], axis=1)
            )

display(german_X)

# CATEGORICAL FEATURES
german_categorical_features = [0, 1, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]

german_X.columns = german_X.columns.str.replace("_", "-")

columns = german_X.columns
german_categorical_names = [columns[i] for i in german_categorical_features] 


# ENCODE NUMERICALLY
german_label_encoders, german_X = (get_label_encoders(german_categorical_names, german_X))

# CREATE ENCODER
from sklearn.preprocessing import OneHotEncoder
german_enc = OneHotEncoder(sparse = False, handle_unknown='error')
german_enc.fit(german_X[german_categorical_names])
print(german_enc.categories_)

ohe_german_X, _ = ohe(german_X, german_categorical_names, german_enc)
# display(ohe_german_X)

display(german_y)
display(german_categorical_features)

Unnamed: 0,ForeignWorker,Single,Age,LoanDuration,LoanAmount,LoanRateAsPercentOfIncome,YearsAtCurrentHome,NumberOfOtherLoansAtBank,NumberOfLiableIndividuals,MissedPayments,NoCurrentLoan,CriticalAccountOrLoansElsewhere,OtherLoansAtBank,HasCoapplicant,HasGuarantor,OwnsHouse,Unemployed,JobClassIsSkilled,isMale
0,0,1,67,6,1169,4,4,2,1,1,0,1,0,0,0,1,0,1,1
1,0,0,22,48,5951,2,2,1,1,1,0,0,0,0,0,1,0,1,0
2,0,1,49,12,2096,2,3,1,2,1,0,1,0,0,0,1,0,0,1
3,0,1,45,42,7882,2,4,1,2,1,0,0,0,0,1,0,0,1,1
4,0,1,53,24,4870,3,4,2,2,0,0,0,0,0,0,0,0,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,0,0,31,12,1736,3,4,1,1,1,0,0,0,0,0,1,0,0,0
996,0,0,40,30,3857,4,4,1,1,1,0,0,0,0,0,1,0,1,1
997,0,1,38,12,804,4,4,1,1,1,0,0,0,0,0,1,0,1,1
998,0,1,23,45,1845,4,4,1,1,1,0,0,0,0,0,0,0,1,1


[array([0, 1]), array([0, 1]), array([0, 1]), array([0, 1]), array([0, 1]), array([0, 1]), array([0, 1]), array([0, 1]), array([0, 1]), array([0, 1]), array([0, 1]), array([0, 1])]


0      1
1     -1
2      1
3      1
4     -1
      ..
995    1
996    1
997    1
998   -1
999    1
Name: GoodCustomer, Length: 1000, dtype: int64

[0, 1, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]

# Adult dataset

In [5]:
data_name = "adult"
data_file = os.path.join(data_dir, '%s.csv' % data_name)
## load and process data
adult_df = pd.read_csv(data_file).reset_index(drop=True)
adult_df.columns = ['age','workclass','fnlwgt','education','education-num','marital-status','occupation','relationship','race','sex',\
                                          'capital-gain','capital-loss','hours-per-week','native-country','label']

# cols_with_missing_values = []
# for col in adult_df.columns:
#     if len(np.where(adult_df[col].values == '?')[0]) >= 1 or adult_df[col].isnull().values.any():
#         cols_with_missing_values.append(col)    
# display(adult_df)
# adult_df = adult_df.drop(cols_with_missing_values, axis=1)

adult_df = adult_df.dropna()
adult_df['native-country-United-States'] = adult_df.apply(lambda row: 1 if 'United-States' in row['native-country'] else 0, axis=1)
adult_df['marital-status-Married'] = adult_df.apply(lambda row: 1 if 'Married' in row['marital-status'] else 0, axis=1)
adult_df = adult_df.drop(['native-country', 'marital-status', 'relationship'], axis=1)

adult_df.columns = adult_df.columns.str.replace(' ', '')
adult_df = adult_df.drop(['fnlwgt', 'education', 'occupation'], axis=1)
adult_X = adult_df.drop('label', axis=1)
adult_y = adult_df['label'].replace(' <=50K', -1)
adult_y = adult_y.replace(' >50K', 1)
display(adult_X)
# CATEGORICAL FEATURES
adult_categorical_features = [1, 3, 4, 8, 9]

adult_X.columns = adult_X.columns.str.replace("_", "-")

columns = adult_X.columns
adult_categorical_names = [columns[i] for i in adult_categorical_features] 


# ENCODE NUMERICALLY
adult_label_encoders, adult_X = (get_label_encoders(adult_categorical_names, adult_X))

# CREATE ENCODER
from sklearn.preprocessing import OneHotEncoder
adult_enc = OneHotEncoder(sparse = False, handle_unknown='error')
adult_enc.fit(adult_X[adult_categorical_names])
print(adult_enc.categories_)

ohe_adult_X, _ = ohe(adult_X, adult_categorical_names, adult_enc)
display(ohe_adult_X)

Unnamed: 0,age,workclass,education-num,race,sex,capital-gain,capital-loss,hours-per-week,native-country-United-States,marital-status-Married
0,50,Self-emp-not-inc,13,White,Male,0,0,13,1,1
1,38,Private,9,White,Male,0,0,40,1,0
2,53,Private,7,Black,Male,0,0,40,1,1
3,28,Private,13,Black,Female,0,0,40,0,1
4,37,Private,14,White,Female,0,0,40,1,1
...,...,...,...,...,...,...,...,...,...,...
32555,27,Private,12,White,Female,0,0,38,1,1
32556,40,Private,9,White,Male,0,0,40,1,1
32557,58,Private,9,White,Female,0,0,40,1,0
32558,22,Private,9,White,Male,0,0,20,1,0


[array([0, 1, 2, 3, 4, 5, 6, 7, 8]), array([0, 1, 2, 3, 4]), array([0, 1]), array([0, 1]), array([0, 1])]


Unnamed: 0,age,education-num,capital-gain,capital-loss,hours-per-week,workclass_0,workclass_1,workclass_2,workclass_3,workclass_4,workclass_5,workclass_6,workclass_7,workclass_8,race_0,race_1,race_2,race_3,race_4,sex_0,sex_1,native-country-United-States_0,native-country-United-States_1,marital-status-Married_0,marital-status-Married_1
0,50,13,0,0,13,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0
1,38,9,0,0,40,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,0.0
2,53,7,0,0,40,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0
3,28,13,0,0,40,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0
4,37,14,0,0,40,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
32555,27,12,0,0,38,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0
32556,40,9,0,0,40,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0
32557,58,9,0,0,40,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0
32558,22,9,0,0,20,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,0.0


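Keep a version of the data that is not one-hot encoded. Recourse and LIME need numerically encoded features without one-hot columns, while the classifier needs the one-hot-encoded version.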
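
As a small illustration of the two representations, the sketch below label-encodes a categorical column into integer codes, expands the codes into one-hot rows, and maps them back. It is written in plain Python with hypothetical helper names (`fit_label_encoder`, `one_hot`, `un_one_hot`) so it runs without the datasets; in the notebook the same roles are played by `LabelEncoder`, `OneHotEncoder`, and the `ohe` / `un_ohe` helpers.

```python
def fit_label_encoder(values):
    """Map each distinct category to an integer code, in sorted order."""
    classes = sorted(set(values))
    return {label: code for code, label in enumerate(classes)}


def one_hot(code, n_classes):
    """Expand one integer code into a one-hot row."""
    return [1.0 if k == code else 0.0 for k in range(n_classes)]


def un_one_hot(row):
    """Recover the integer code from a one-hot row."""
    return row.index(max(row))


races = ["White", "Black", "White", "Asian"]  # illustrative values only

code_of = fit_label_encoder(races)
codes = [code_of[r] for r in races]                   # numeric, not one-hot
ohe_rows = [one_hot(c, len(code_of)) for c in codes]  # what the classifier sees
decoded = [un_one_hot(row) for row in ohe_rows]       # back to numeric codes

print(codes)
print(ohe_rows[0])
```

The numeric codes are the form handed to the explainers; the one-hot rows are the form handed to the model.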

In [6]:
#need the data for recourse and lime to be NOT one-hot-encoded and to be numerical
#need the data for the classifier to be one hot encoded

# german_df['YearsAtCurrentJob_lt_1'] = german_df['YearsAtCurrentJob_lt_1'].replace(1, 'lt_1')
# german_df['YearsAtCurrentJob'] = german_df['YearsAtCurrentJob_lt_1']
# german_df['YearsAtCurrentJob_geq_4'] = german_df['YearsAtCurrentJob_geq_4'].replace(1, 'geq_4')
# german_df['YearsAtCurrentJob'] = german_df.apply(lambda row: 'geq_4' if row['YearsAtCurrentJob_geq_4'] == 'geq_4' else row['YearsAtCurrentJob'], axis=1)
# german_df['YearsAtCurrentJob'] = german_df['YearsAtCurrentJob_lt_1'].replace(0, 'bet_1_4')
# german_df = german_df.drop(['YearsAtCurrentJob_lt_1', 'YearsAtCurrentJob_geq_4'], axis=1)

# german_df['CheckingAccountBalance_geq_0'] = german_df['CheckingAccountBalance_geq_0'].replace(1, 'geq_0')
# german_df['CheckingAccountBalance_geq_200'] = german_df['CheckingAccountBalance_geq_200'].replace(1, 'geq_200')
# german_df['CheckingAccountBalance'] = german_df['CheckingAccountBalance_geq_0']
# german_df['CheckingAccountBalance'] = german_df.apply(lambda row: 'geq_200' if row['CheckingAccountBalance_geq_200'] == 'geq_200' else row['CheckingAccountBalance'], axis=1)
# german_df['CheckingAccountBalance'] = german_df['CheckingAccountBalance'].replace('geq_0', '0_200')
# german_df = german_df.drop(['CheckingAccountBalance_geq_0', 'CheckingAccountBalance_geq_200'], axis=1)

# german_df['SavingsAccountBalance_geq_100'] = german_df['SavingsAccountBalance_geq_100'].replace(1, '100_500')
# german_df['SavingsAccountBalance_geq_500'] = german_df['SavingsAccountBalance_geq_500'].replace(1, 'geq_500')
# german_df['SavingsAccountBalance'] = german_df['SavingsAccountBalance_geq_100']
# german_df['SavingsAccountBalance'] = german_df.apply(lambda row: 'geq_500' if row['SavingsAccountBalance_geq_500'] == 'geq_500' else row['SavingsAccountBalance'], axis=1)
# german_df['SavingsAccountBalance'] = german_df['SavingsAccountBalance'].replace('0', 'lt_100')
# german_df = german_df.drop(['SavingsAccountBalance_geq_100', 'SavingsAccountBalance_geq_500'], axis=1)
# display(german_df)


In [7]:
pd.set_option('display.max_columns', None)
# display(X)
# display(y)

In [8]:
# msk = np.random.rand(len(X)) < 0.8
# train = X[msk]
# test = X[~msk]

# train_y = y[msk]
# test_y = y[~msk]

Currently, no immutable features.

# Train model

Ok great, now let's get into the meat of it. Let's train a model and see what recourse exists.

# Generate Recourse

First, let's score everyone using our model. Let's say that we will give a loan to anyone with a greater than $80\%$ chance of paying it back.
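
For a logistic model, a probability cutoff can be folded into the intercept. Since $p(x) = \sigma(w \cdot x + b)$, the condition $p(x) > t$ is the same as $w \cdot x + b - \log\frac{t}{1 - t} > 0$. The sketch below checks this on a few made-up applicants; the weights, intercept, and feature values are illustrative assumptions, not fitted values from this notebook.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


def shifted_intercept(b, threshold):
    """Intercept that turns 'probability > threshold' into 'score > 0'."""
    return b - logit(threshold)


w = [0.8, -0.5]   # illustrative coefficients
b = 0.2           # illustrative intercept
threshold = 0.8   # approve anyone with more than an 80% chance
b_shifted = shifted_intercept(b, threshold)

applicants = [[3.0, 1.0], [1.0, 1.0], [2.5, 0.0]]
decisions = []
for x in applicants:
    z = sum(wi * xi for wi, xi in zip(w, x))
    by_probability = sigmoid(z + b) > threshold
    by_sign = (z + b_shifted) > 0
    assert by_probability == by_sign  # the two rules agree
    decisions.append(by_sign)

print(decisions)
```

Working with the shifted intercept means a recourse search only has to ask for a positive score, whatever the probability cutoff is.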

In [9]:
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import Ridge
from IPython.display import HTML
import time

from sklearn.tree import DecisionTreeRegressor
sys.path.append(os.path.dirname(os.getcwd()))
sys.path.append("../..")
from mlinsights.mlinsights.mlmodel import PiecewiseRegressor


# start by randomly picking an action for each feature
def sample_with_actions(instance, actions, num_samples, ordered_feature_names):
    num_features = len(ordered_feature_names)
    sampled_data = np.zeros((num_samples, num_features))
    sampled_data[0, :] = instance
    
    len_actions = [len(actions[feat]) for feat in ordered_feature_names]
    ordered_actions = [actions[feat] for feat in ordered_feature_names]
    
#     print("instance: ", instance)
#     print("actions: ", ordered_actions)
#     print("len_actions; ", len_actions)
#     print("ordered_actions: ", ordered_actions)
    
    # max number of actions
    max_actions = len(actions[max(actions, key=lambda feat:len(actions[feat]))])
            
#     print(len_actions)    
        
    for s in range(1, num_samples):
        sampled_actions = [ordered_actions[i][np.random.choice(x)] for i, x in enumerate(len_actions)]
#         print("sampled_actions: ", sampled_actions)
        sampled_data[s, :] = instance + sampled_actions
#         print("sampled_actions: ", sampled_actions)
        
    return sampled_data

def convert_binary_categorical_coefficients(exp_list):
    cleaned_exp_dict = {}
    for (feat, coeff) in exp_list:
        if "=" in feat:
            original_feat, val = feat.split("=")
            int_val = int(val)
            if int_val == 1:
                cleaned_exp_dict[original_feat] = coeff
            else:
                cleaned_exp_dict[original_feat] = -1 * coeff
        else:
            cleaned_exp_dict[feat] = coeff
    return cleaned_exp_dict

# scaled_X = (X - explainer.scaler.mean_) / explainer.scaler.scale_
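
The helpers in this notebook convert coefficients fitted on standardized features back to the raw feature scale with `w = coefficients / scale` and `b = intercept - sum(w * mean) - 0.5`. A short sketch of why that works: a linear model on $z_j = (x_j - m_j)/s_j$ predicts $c_0 + \sum_j c_j z_j = \left(c_0 - \sum_j w_j m_j\right) + \sum_j w_j x_j$ with $w_j = c_j / s_j$, and subtracting $0.5$ makes a positive score correspond to a predicted probability above one half. The function name `unscale_linear_model` and the numbers are assumptions for illustration.

```python
def unscale_linear_model(coefs, intercept, mean, scale, cutoff=0.5):
    """Rewrite a linear model on standardized inputs in terms of raw inputs.

    Returns (w, b) such that w.x + b > 0 exactly when the standardized
    model predicts a value above `cutoff`.
    """
    w = [c / s for c, s in zip(coefs, scale)]
    b = intercept - sum(wj * mj for wj, mj in zip(w, mean)) - cutoff
    return w, b


# Illustrative surrogate model fitted on standardized features
coefs = [0.2, -0.1]
intercept = 0.6
mean = [40.0, 2.0]
scale = [10.0, 0.5]

w, b = unscale_linear_model(coefs, intercept, mean, scale)

x = [50.0, 1.0]
z = [(xj - mj) / sj for xj, mj, sj in zip(x, mean, scale)]

pred_scaled = intercept + sum(c * zj for c, zj in zip(coefs, z))
pred_raw = sum(wj * xj for wj, xj in zip(w, x)) + b + 0.5  # add the cutoff back

print(pred_scaled, pred_raw)
```

Both predictions agree up to floating-point error, so `w` and `b` can be applied to unscaled feature values directly.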

In [20]:
from functools import partial
from sklearn.model_selection import train_test_split
from pyutilib.common import ApplicationError
from sklearn.preprocessing import KBinsDiscretizer
from lime import lime_tabular  # needed for LimeTabularExplainer below

sys.path.append(os.path.dirname(os.getcwd()))
sys.path.append("../..")
    
# import MAPLE.MAPLE
from MAPLE.Code.Misc import load_normalize_data, unpack_coefs
from MAPLE.Code.MAPLE import MAPLE

def get_nonzero_actions(feature_names, action):
    action_dict = {}
    for feat_idx, feat_name in enumerate(feature_names):
        action_for_feat = action[feat_idx]
        if action_for_feat != 0:
            action_dict[feat_name] = action_for_feat
    return action_dict

# # X_train is unnormalized
# def find_testing_interventions(x, model, X_train, binary_categorical_features, num_divisions = 100):
#     interventions = [[]] * len(X_train.columns)
#     original_pred = model
#     for feat_idx, feat in X_train.columns:
#         if feat_idx not in binary_categorical_features:
#             feat_min = X_train[feat].min()
#             feat_max = X_train[feat].max()
#             increment = (feat_max - feat_min) / 100
#         else:
#             x[feat_idx] = 
        
# data is array-like of shape (n_samples, n_features)
def get_pred_function(model, categorical_names, enc, columns, target_columns):
    def new_predict_proba(data):
        transformed_data, _ = ohe(data, categorical_names, enc, columns = columns, target_columns = target_columns)
        return model.predict_proba(transformed_data)
    return new_predict_proba        
    
def get_lime_sampled(lime_explainer, x, new_predict_proba, num_features, num_samples, columns, true_index):
    exp = lime_explainer.explain_instance(x, new_predict_proba, num_features = num_features, num_samples = num_samples)
    return exp.inverse, exp.scaled_data, exp.all_sampled_preds, exp.weights, lime_explainer.scaler.mean_, lime_explainer.scaler.scale_
    
def get_lime_coefficients(lime_explainer, x, new_predict_proba, num_features, num_samples, columns, true_index):
    exp = lime_explainer.explain_instance(x, new_predict_proba, num_features = num_features, num_samples = num_samples)
    inverse_data = (pd.DataFrame(exp.inverse, columns = columns))
#     display(inverse_data)
#     display(pd.DataFrame([x], columns = columns))
#     display(inverse_data['race'].value_counts())
#     display(inverse_data['sex'].value_counts())
    local_pred = exp.local_pred

    coefficients = [None] * num_features

    local_exp = exp.local_exp[true_index]
        
    for (feat, coef) in local_exp:
        coefficients[feat] = coef
    
#     cleaned_exp_dict = convert_binary_categorical_coefficients(exp.as_list())
    
#     for j, col in enumerate(columns):
#         coefficients[j] = cleaned_exp_dict[col]

        
#     print("exp.local_exp:", local_exp)
        
    intercept = exp.intercept[true_index]

    x_shift = np.array(lime_explainer.scaler.mean_)
    x_scale = np.array(lime_explainer.scaler.scale_)
    w = coefficients / x_scale
    b = intercept - np.sum(w * x_shift) - 0.5 # subtract 0.5 bc using probs as labels
    
    discrete_yss = (exp.yss[:, true_index] > 0.5).astype(int)
    discrete_sampled_preds = (exp.all_sampled_preds > 0.5).astype(int)
    
    num_pos_yss = (np.count_nonzero(discrete_yss == 1))
    num_neg_yss = (np.count_nonzero(discrete_yss == 0))
    
    num_accurate_preds = np.count_nonzero(discrete_yss == discrete_sampled_preds)
    accuracy_sampled = num_accurate_preds/len(discrete_yss)
    
    return w, b, local_pred, accuracy_sampled

def get_maple_coefficients(maple_explainer, x, mean, std, lime_sampled = [], model_preds_sampled = [], use_distance_weights = True):
    if len(lime_sampled) > 0:  # works for both lists and numpy arrays
        e_maple = maple_explainer.explain(x, lime_sampled = lime_sampled, model_preds_sampled = model_preds_sampled, use_distance_weights = use_distance_weights)
    else:
        e_maple = maple_explainer.explain(x, use_distance_weights = use_distance_weights)
        
    coefs_maple = e_maple["coefs"][1:]
    intercept_maple = e_maple["coefs"][0]
    
    
    w = coefs_maple / std
    b = intercept_maple - np.sum(w * mean) - 0.5 # subtract 0.5 bc using probs as labels
        
    num_pos_yss = (np.count_nonzero(e_maple['selected_sampled_yss'] == 1))
    num_neg_yss = (np.count_nonzero(e_maple['selected_sampled_yss'] == 0))
    
    num_accurate_preds = np.count_nonzero(e_maple['selected_sampled_yss'] == e_maple['selected_sampled_preds'])   
    accuracy_sampled = num_accurate_preds/len(e_maple['selected_sampled_preds'])
    local_pred = e_maple['pred']    
    
#     print(e_maple['weights'])
#     print(len(np.nonzero(e_maple['weights'])[0]))
#     print(e_maple['weights'][np.nonzero(e_maple['weights'])[0]])
    
    return w, b, local_pred, accuracy_sampled

def get_piecewise_coefficients_with_maple(maple_explainer, x, lime_sampled = [], model_preds_sampled = [], use_distance_weights = False):
    e_maple = maple_explainer.explain(x, lime_sampled = lime_sampled, model_preds_sampled = model_preds_sampled, use_distance_weights = use_distance_weights)
    maple_weights = e_maple['weights']
    
    model = PiecewiseRegressor(verbose=False,
                           binner=DecisionTreeRegressor(min_samples_leaf=2500))
    model.fit(lime_sampled, model_preds_sampled, sample_weight=maple_weights)
    
    estimators = model.estimators_
    
    return estimators    

def get_piecewise_coefficients(x, lime_sampled, model_preds_sampled, lime_sampled_weights):
    model = PiecewiseRegressor(verbose=False,
                           binner=DecisionTreeRegressor(min_samples_leaf=2500))
    model.fit(lime_sampled, model_preds_sampled, sample_weight=lime_sampled_weights)
    
#     sampled_preds = model.predict()
    
#     accuracy_sampled = 
    
    estimators = model.estimators_
    
    return estimators

# def get_ohe_coefficients(w):
#     df_w = pd.DataFrame(w, columns = columns)        
#     transformed = encoder.transform(df_data[categorical_names])
#     df_transformed = (pd.DataFrame(transformed, columns = encoder.get_feature_names(input_features = categorical_names)))
#     return pd.concat([df_data.reset_index(drop = True), df_transformed], axis=1).drop(categorical_names, axis=1)

    

def get_recourse(x, action_set, w, b):
    action_set.align(coefficients=w)
    fb = Flipset(x = x, action_set = action_set, coefficients = w, intercept = b)
    
    try:
        print("populating...")
        fb = fb.populate(enumeration_type = 'distinct_subsets', total_items = 20)
        actions = fb._builder.actions

        error = False

        returned_actions = [result['actions'] for result in fb.items]

    except (ValueError, ApplicationError, AssertionError) as e:
        print("excepting...")
        print("error: ", e)
        traceback.print_exc()
#         print("coeffs from error: ", w)
        error = True

        returned_actions = []
    

    return returned_actions, error

# assumes data is properly formatted
def calculate_recourse_accuracy(model, data, enc, categorical_features, categorical_names, file_name, \
    num_samples = 5000, kernel_width = None, explanation_type = 'lime', lime_sample_around_instance = None, \
    use_lime_sampled_maple = None, maple_use_distance_weights = None, instances_subset = None, \
    min_samples_leaf = None):
    
    instances_with_recourses = []
    
    use_lime_inverse = False
    
    with open(file_name, "a") as f:   
        
        X_train = data['X_train']
        y_train = data['y_train']

        X_val = data['X_val']
        y_val = data['y_val']

        X_test = data['X_test']
        y_test = data['y_test']
        
        ohe_X_train, target_columns = ohe(X_train, categorical_names, enc)
        ohe_X_val, _ = ohe(X_val, categorical_names, enc, target_columns = target_columns)
        ohe_X_test, _ = ohe(X_test, categorical_names, enc, target_columns = target_columns)
        
        print("\n\nTRAIN LABEL SPLIT: ", file=f)
        print(y_train.value_counts(), file=f)

        print("validation score: ", model.score(ohe_X_val, y_val), file=f)

        
        new_predict_proba = get_pred_function(model, categorical_names, enc, X_train.columns, target_columns)
        
        if instances_subset is not None:
            calculate_subset_accuracy = True
            subset_total_recourses = 0
            subset_total_actual_recourses = 0
        else:
            instances_subset = []
            calculate_subset_accuracy = False
        
        classes = model.classes_
        true_index = list(classes).index(1)
        
        scores = pd.Series(new_predict_proba(X_test)[:, true_index])
        discrete_scores = pd.Series(model.predict(ohe_X_test))

        total_recourses = 0
        total_actual_recourses = 0
        
        total_instances_with_recourses = 0

        num_actiongrid_regressor_agree = 0
        num_lime_agree = 0
        num_sampled_total = 0
        

        print("NUM SAMPLES: ", num_samples, file=f)
        if kernel_width is None:
            calculated_kernel_width = float(np.sqrt(X_train.values.shape[1]) * .75)
        else:
            calculated_kernel_width = kernel_width
        print("CALCULATED KERNEL WIDTH: ", calculated_kernel_width, file=f)
        print("EXPLANATION TYPE: ", explanation_type, file=f)
        if lime_sample_around_instance != None:
            print("LIME SAMPLE AROUND INSTANCES: ", lime_sample_around_instance, file=f)
        if use_lime_sampled_maple != None:
            print("USE SAMPLED LIME FOR MAPLE: ", use_lime_sampled_maple, file=f)
        if maple_use_distance_weights != None:
            print("USE DISTANCE WEIGHTS FOR MAPLE: ", maple_use_distance_weights, file=f)
        if min_samples_leaf != None:
            print("min_samples_leaf fraction: ", min_samples_leaf, "; actual #: ", min_samples_leaf * X_train.values.shape[0], file=f)            
        print("num unique preds: ", np.unique(discrete_scores, axis=0).shape[0])


        print("TEST LABEL SPLIT: ", file=f)

        print(discrete_scores.value_counts(), file=f)

        # class_names have to be ordered according to what the classifier is using
        lime_explainer = lime_tabular.LimeTabularExplainer(X_train.values, categorical_features=categorical_features, 
                                                           categorical_names=categorical_names, \
                                                           feature_names=X_train.columns, class_names=classes, \
                                                           discretize_continuous=False, kernel_width = kernel_width, \
                                                           sample_around_instance = lime_sample_around_instance, feature_selection='none')

        train_stddev = X_train[X_train.columns[:]].std()
        train_mean = X_train[X_train.columns[:]].mean()
        
        for feat in categorical_features:
            train_mean[feat] = 0
            train_stddev[feat] = 1

        # Normalize to have mean 0 and variance 1
        norm_X_train = (X_train - train_mean) / train_stddev
        norm_X_val = (X_val - train_mean) / train_stddev
        norm_X_test = (X_test - train_mean) / train_stddev
        
        display(train_mean)

        pred_train = new_predict_proba(X_train)[:, true_index]
        pred_val = new_predict_proba(X_val)[:, true_index]

        maple_explainer = MAPLE(norm_X_train, pred_train, norm_X_val, pred_val)

#         action_set = ActionSet(X = ohe_X_train)
        action_set = ActionSet(X = ohe_X_train, custom_bounds={'race_1':(0, 1.0), 'race_4':(0, 1.0)})
#         action_set = ActionSet(X = ohe_X_train, custom_bounds={'workclass_3':(0, 1.0), 'workclass_8':(0, 1.0), \
#                                                                'marital-status_1':(0, 1.0), 'race_0':(0, 1.0), \
#                                                               'race_1':(0, 1.0), 'race_2':(0, 1.0), 'race_3':(0, 1.0), \
#                                                               'race_4':(0, 1.0)})
    
    
        display(action_set)

        start_time = time.time()

        # Indices of test instances that the model scores below 0.5 (negative predictions)
        negative_scores = np.nonzero(scores < 0.5)[0]
        recourses = [None] * len(negative_scores)
        num_neg_test_preds = len(negative_scores)

        columns = X_train.columns
        
        print(len(X_train))
        
        if explanation_type == "baseline":
            baseline_model = Ridge()
            baseline_model.fit(norm_X_train, pred_train)
            coefs = baseline_model.coef_
            intercept = baseline_model.intercept_
            baseline_w = coefs / train_stddev
            baseline_b = intercept - np.sum(baseline_w * train_mean) - 0.5 # subtract 0.5 bc using probs as labels       
                
        if "cluster" in explanation_type:
            cluster_model = PiecewiseRegressor(verbose=False,
                                   binner=DecisionTreeRegressor(min_samples_leaf = int(min_samples_leaf * X_train.values.shape[0])))
            if explanation_type == "cluster_train":
                cluster_model.fit(norm_X_train, pred_train)
#             else: NEED SAMPLED POINTS
#                 cluster_model.fit(lime_sampled, model_preds_sampled, sample_weight=lime_sampled_weights)
            cluster_estimators = cluster_model.estimators_
            print(len(cluster_estimators))
            bins = cluster_model.transform_bins(norm_X_train.values)
        
        for idx, i in enumerate(negative_scores): #scores is for X_test specifically
            if idx % 25 == 0:
                print("\n", idx, " out of ", len(negative_scores))
            if idx % 100 == 0:
                print("time elapsed: ", (time.time() - start_time) / 60, " minutes")
                start_time = time.time()
            
            x = X_test.values[i]
            ohe_x = (ohe(x.reshape(1, -1), categorical_names, enc, columns = columns, target_columns = target_columns)[0].values[0])
            print(ohe_x)
            
            num_features = len(x)

            print(explanation_type)
            
            if explanation_type == "baseline":
                ohe_w = ohe_coefficients(baseline_w, X_train.columns, x, enc, categorical_names, target_columns)
                b = baseline_b
                returned_actions, error = get_recourse(ohe_x, action_set, ohe_w, b)
            
            elif "cluster" in explanation_type:
                normalized_x = (x - train_mean) / train_stddev
                estimator_idx = int(cluster_model.transform_bins(np.array([normalized_x]))[0])
                print(estimator_idx)
                coefs = cluster_estimators[estimator_idx].coef_
                intercept = cluster_estimators[estimator_idx].intercept_
                
                w = coefs / train_stddev
                b = intercept - np.sum(w * train_mean) - 0.5 # subtract 0.5 bc using probs as labels
                ohe_w = ohe_coefficients(w, X_train.columns, x, enc, categorical_names, target_columns)

                returned_actions, error = get_recourse(ohe_x, action_set, ohe_w, b)                
            
            elif explanation_type == "lime":
                w, b, local_pred, accuracy_sampled = get_lime_coefficients(lime_explainer, x, new_predict_proba, num_features, num_samples, X_train.columns, true_index)
                
                ohe_w = ohe_coefficients(w, X_train.columns, x, enc, categorical_names, target_columns)

                returned_actions, error = get_recourse(ohe_x, action_set, ohe_w, b)
                
            elif explanation_type == "maple":
                if use_lime_sampled_maple:
                    inverse_lime_sampled, scaled_binary_lime_sampled, model_preds_sampled, lime_sampled_weights, mean, std = get_lime_sampled(lime_explainer, x, new_predict_proba, num_features, num_samples, X_train.columns, true_index)
#                     un_ohe_lime_sampled = un_ohe(lime_sampled, categorical_names, enc, columns = columns)
                    if use_lime_inverse:
                        lime_sampled = inverse_lime_sampled
                    else:
                        lime_sampled = scaled_binary_lime_sampled
                    w, b, local_pred, accuracy_sampled = get_maple_coefficients(maple_explainer, x, train_mean, train_stddev, lime_sampled = lime_sampled, model_preds_sampled = model_preds_sampled, use_distance_weights = maple_use_distance_weights)
                else:
                    w, b, local_pred, accuracy_sampled = get_maple_coefficients(maple_explainer, x, train_mean, train_stddev, use_distance_weights = maple_use_distance_weights)
                
                ohe_w = ohe_coefficients(w, X_train.columns, x, enc, categorical_names, target_columns)
                returned_actions, error = get_recourse(ohe_x, action_set, ohe_w, b)
                
            elif explanation_type == "piecewise" or explanation_type == "piecewise_maple":
                inverse_lime_sampled, scaled_binary_lime_sampled, model_preds_sampled, lime_sampled_weights, mean, std = get_lime_sampled(lime_explainer, x, new_predict_proba, num_features, num_samples, X_train.columns, true_index)
                if use_lime_inverse:
                    lime_sampled = inverse_lime_sampled
                else:
                    lime_sampled = scaled_binary_lime_sampled
                if explanation_type == "piecewise":
                    estimators = get_piecewise_coefficients(x, lime_sampled, model_preds_sampled, lime_sampled_weights)
                else:
                    estimators = get_piecewise_coefficients_with_maple(maple_explainer, x, lime_sampled = lime_sampled, model_preds_sampled = model_preds_sampled, use_distance_weights = False)
                returned_actions, error = [], []
                
                print("NUM ESTIMATORS: ", len(estimators))
                
                for estimator in estimators:
                    
                    coefs = estimator.coef_
                    intercept = estimator.intercept_
            
                    print(coefs)
                    print(intercept)
            
                    w = coefs / std
                    b = intercept - np.sum(w * mean) - 0.5 # subtract 0.5 bc using probs as labels                    
                    
                    ohe_w = ohe_coefficients(w, X_train.columns, x, enc, categorical_names, target_columns)
                    
                    ra, er = get_recourse(ohe_x, action_set, ohe_w, b)
                    for a in ra:
                        if not any((a == e).all() for e in returned_actions):
                            returned_actions.append(a)
                    error.extend([er])
                
            model_pred = (new_predict_proba([x])[0][true_index])

#             display(pd.DataFrame([x], columns=X_train.columns))
#             display(ohe_x)
#             print(w)
#             print(ohe_w)
                
            recourse = {}
            recourse['idx'] = i
            recourse['instance'] = x
            recourse['model_prob'] = model_pred
#             recourse['local_prob'] = local_pred
            recourse['model_pred'] = 1 if model_pred >= 0.5 else -1
#             recourse['local_pred'] = 1 if local_pred >= 0.5 else -1

#             recourse['scaled_coeff'] = w
#             recourse['scaled_intercept'] = b
            recourse['actions'] = returned_actions
            recourse['error_solving'] = error

            recourse['explanation_type'] = explanation_type

#             recourse['accurate_pred'] = 1 if (recourse['model_pred'] == recourse['local_pred']) else 0
#             recourse['sampled_accuracy'] = accuracy_sampled

            recourse['returned_actions'] = returned_actions
    
            recourses[idx] = recourse

            print_coefs = False

                                    
            no_changes = True
            
            if len(returned_actions) != 0:
                total_instances_with_recourses += 1
                instances_with_recourses.append(i) 
            
            for action in returned_actions:
#                 print("action: ", action)
                new_x = (ohe_x + action)
#                 ohe_new_x, _ = ohe(new_x.reshape(1, -1), categorical_names, enc, columns = columns, target_columns = target_columns)
                
                
                old_pred = recourse['model_pred']
                new_pred = model.predict([new_x])[0]

                new_lime_pred = 1 if np.dot(ohe_w, new_x) + b >= 0.0 else -1
                total_recourses += 1
                
                if i in instances_subset and calculate_subset_accuracy:
                    subset_total_recourses += 1
                
                if old_pred != new_pred:
                    print(get_nonzero_actions(target_columns, action))
                    total_actual_recourses += 1
                    no_changes = False
                    
                    if i in instances_subset and calculate_subset_accuracy:
                        subset_total_actual_recourses +=1
                
            if no_changes:
                print(x)
                    
            print("model_pred: ", recourse['model_pred']) 
#             print("local_pred: ", recourse['local_pred'])
            print("intercept: ", b)

        if explanation_type == "piecewise":
            total_errors = [1 for rec in recourses if (True not in rec['error_solving'])]            
        else:
            total_errors = [1 for rec in recourses if (rec['error_solving'] == True)]   
#         total_accurate_preds = [1 for rec in recourses if (rec['accurate_pred'] == True)]   
#         average_sampled_accuracy = np.mean([rec['sampled_accuracy'] for rec in recourses])
        average_recourses_per_all = np.mean([len(rec['returned_actions']) for rec in recourses])
        average_recourses_per_found = np.mean([len(rec['returned_actions']) for rec in recourses if len(rec['returned_actions']) != 0])

        num_errors = sum(total_errors)
        
        try:
        
            recourse_accuracy = round(total_actual_recourses/total_recourses, 2)
            recourse_accuracy_all = round(total_actual_recourses/(average_recourses_per_found * num_neg_test_preds), 2)
            error_percentage = round(sum(total_errors)/len(recourses), 2)
            
            print("num_neg_test_preds: ", num_neg_test_preds, " out of ", len(scores), " = ", round(num_neg_test_preds/len(scores), 2), file=f)
            print("recourse accuracy: ", recourse_accuracy, "; total instances with recourses found: ", total_instances_with_recourses, file=f)
            print("recourse accuracy (on all instances, assuming recourse and assuming avg per found instance would be found): ", recourse_accuracy_all, file=f)
            if instances_subset != []:
                print("subset recourse accuracy: ", round(subset_total_actual_recourses/subset_total_recourses, 2), "; total instances in subset: ", len(instances_subset), file=f)
                print("subset recourse accuracy (out of total potential): ", round(subset_total_actual_recourses/(len(instances_subset) * 20), 2), "; total instances in subset: ", len(instances_subset), file=f)
            print("number of errors: ", num_errors, "; percent of total instances: ", error_percentage, file=f)
            print("average number of recourses per instance: ", round(average_recourses_per_all, 2), file=f)
            print("average number of recourses per instance found: ", round(average_recourses_per_found, 2), file=f)
#             print("number accurate preds (original data): ", sum(total_accurate_preds), "; percent of total instances: ", round(sum(total_accurate_preds)/len(recourses), 2), file=f)
#             print("average accuracy of preds on sampled data: ", average_sampled_accuracy, file=f)

        except ZeroDivisionError as error_msg:
            print(error_msg)
            recourse_accuracy = 0
            recourse_accuracy_all = 0
            error_percentage = 0
            
        stats = {
            'recourse_accuracy': recourse_accuracy,
            'recourse_accuracy_all': recourse_accuracy_all,
            'num_errors': num_errors,
            'total_instances_with_recourses': total_instances_with_recourses,
            'error_percentage': error_percentage
        }
        return instances_with_recourses, stats
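
The lines `w = coefs / train_stddev` and `b = intercept - np.sum(w * train_mean) - 0.5` in the function above rewrite a linear surrogate fit on standardized features as a score on the raw features, shifted so that a score of zero corresponds to a predicted probability of 0.5. A minimal pure-Python sketch of that identity follows. The helper names `dot` and `to_raw_space` and every number in it are illustrative assumptions, not values from the experiments.

```python
def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def to_raw_space(coefs, intercept, mean, std, threshold=0.5):
    """Rewrite f(z) = coefs . z + intercept, where z = (x - mean) / std,
    as a raw-feature score w . x + b that equals f - threshold."""
    w = [c / s for c, s in zip(coefs, std)]
    b = intercept - dot(w, mean) - threshold
    return w, b


# Made-up surrogate coefficients and training statistics (assumptions).
coefs, intercept = [0.2, -0.1], 0.4
mean, std = [30.0, 3.0], [10.0, 2.0]
x = [45.0, 1.0]

# Prediction computed the long way, on standardized inputs.
z = [(xi - m) / s for xi, m, s in zip(x, mean, std)]
standardized_prob = dot(coefs, z) + intercept

# The same prediction expressed on raw inputs, minus the 0.5 threshold.
w, b = to_raw_space(coefs, intercept, mean, std)
raw_score = dot(w, x) + b

print(standardized_prob - 0.5, raw_score)
```

Because categorical features are given a mean of 0 and a standard deviation of 1 before normalization, their coefficients pass through this conversion unchanged.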


In [11]:
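def get_data(X, y, test_size = 0.5):
    """Split (X, y) into train, validation, and test sets.

    A fixed 33% of the data is held out first. `test_size` is the fraction of
    that held-out portion assigned to the test set; the rest is validation.
    """
    X_train, X_holdout, y_train, y_holdout = train_test_split(X, y, test_size=0.33)
    X_val, X_test, y_val, y_test = train_test_split(X_holdout, y_holdout, test_size=test_size)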
    
    data = {
        'X_train': X_train,
        'y_train': y_train,

        'X_val': X_val,
        'y_val': y_val,

        'X_test': X_test,
        'y_test': y_test
    }
    
    return data    
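
Since `get_data` splits twice, `test_size` is a fraction of the 33% holdout rather than of the full dataset. The sketch below works out the approximate share of the data that lands in each split. The helper `split_fractions` is a hypothetical name introduced here, and the real row counts can differ by one or two because `train_test_split` rounds to whole rows.

```python
def split_fractions(test_size=0.5, holdout=0.33):
    """Approximate share of the full dataset in each split produced by
    a holdout split followed by a val/test split of the holdout."""
    return {
        "train": 1.0 - holdout,
        "val": holdout * (1.0 - test_size),
        "test": holdout * test_size,
    }


fractions = split_fractions(test_size=0.3)
for name, share in fractions.items():
    print(f"{name}: {share:.3f}")
```

With `test_size = 0.3`, roughly 67% of the rows go to training, 23% to validation, and 10% to testing.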

# Experiments

In [12]:
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier
import sys
import os
# import lime.explanation
# import lime.lime_tabular

sys.path.append(os.path.dirname(os.getcwd()))
sys.path.append("../lime_experiments")
    
import explanation
import lime_tabular
import lime_base

import importlib
importlib.reload(explanation)
importlib.reload(lime_tabular)
importlib.reload(lime_base)

<module 'lime_base' from '../lime_experiments/lime_base.py'>

In [13]:
# def run_all_old(file_name, model, enc, data, categorical_features, categorical_names):
    
#     start_time = time.time()
    
#     exp1 = {'explanation_type': 'lime', 'use_lime_sampled_maple': None, 'lime_sample_around_instance': False, 'maple_use_distance_weights': None, 'instances_subset': None}

#     experiments = [exp1]

#     for exp in experiments:
#         lime_instances = calculate_recourse_accuracy(model, data, enc, categorical_features, categorical_names, \
#                                     file_name, explanation_type = exp["explanation_type"], \
#                                     use_lime_sampled_maple = exp['use_lime_sampled_maple'], \
#                                     lime_sample_around_instance = exp['lime_sample_around_instance'], \
#                                     maple_use_distance_weights = exp['maple_use_distance_weights'], instances_subset = exp['instances_subset'])

#     print("TIME FOR EXP1: ", (time.time() - start_time) / 60, " minutes")
#     cluster_exp = {'explanation_type': 'cluster_train', 'use_lime_sampled_maple': None, 'lime_sample_around_instance': False, 'maple_use_distance_weights': None, 'instances_subset': lime_instances}
    
#     exp2 = {'explanation_type': 'piecewise_maple', 'use_lime_sampled_maple': None, 'lime_sample_around_instance': False, 'maple_use_distance_weights': None, 'instances_subset': lime_instances}
#     exp3 = {'explanation_type': 'piecewise', 'use_lime_sampled_maple': None, 'lime_sample_around_instance': False, 'maple_use_distance_weights': None, 'instances_subset': lime_instances}

#     exp4 = {'explanation_type': 'maple', 'use_lime_sampled_maple': True, 'lime_sample_around_instance': False, 'maple_use_distance_weights': True, 'instances_subset': lime_instances}
#     exp5 = {'explanation_type': 'maple', 'use_lime_sampled_maple': True, 'lime_sample_around_instance': False, 'maple_use_distance_weights': False, 'instances_subset': lime_instances}

#     exp6 = {'explanation_type': 'maple', 'use_lime_sampled_maple': False, 'lime_sample_around_instance': False, 'maple_use_distance_weights': False, 'instances_subset': lime_instances}
#     exp7 = {'explanation_type': 'maple', 'use_lime_sampled_maple': False, 'lime_sample_around_instance': False, 'maple_use_distance_weights': True, 'instances_subset': lime_instances}

#     experiments = [cluster_exp, exp2, exp3, exp4, exp5, exp6, exp7]


#     for exp in experiments:
#         _ = calculate_recourse_accuracy(model, data, enc, categorical_features, categorical_names, \
#                                     file_name, explanation_type = exp["explanation_type"], \
#                                     use_lime_sampled_maple = exp['use_lime_sampled_maple'], \
#                                     lime_sample_around_instance = exp['lime_sample_around_instance'], \
#                                     maple_use_distance_weights = exp['maple_use_distance_weights'], instances_subset = exp['instances_subset'])

#     with open(file_name, "a") as f:
#         print("--------------------------------------------", file=f)
#         print("lime_sample_around_instance: TRUE", file=f)

#     exp8 = {'explanation_type': 'lime', 'use_lime_sampled_maple': None, 'lime_sample_around_instance': True, 'maple_use_distance_weights': None, 'instances_subset': None}
#     experiments = [exp7]

#     for exp in experiments:
#         lime_instances = calculate_recourse_accuracy(model, data, enc, categorical_features, categorical_names, \
#                                     file_name, explanation_type = exp["explanation_type"], \
#                                     use_lime_sampled_maple = exp['use_lime_sampled_maple'], \
#                                     lime_sample_around_instance = exp['lime_sample_around_instance'], \
#                                     maple_use_distance_weights = exp['maple_use_distance_weights'], instances_subset = exp['instances_subset'])

#     exp9 = {'explanation_type': 'piecewise', 'use_lime_sampled_maple': None, 'lime_sample_around_instance': True, 'maple_use_distance_weights': None, 'instances_subset': lime_instances}
#     exp10 = {'explanation_type': 'maple', 'use_lime_sampled_maple': True, 'lime_sample_around_instance': True, 'maple_use_distance_weights': True, 'instances_subset': lime_instances}
#     exp11 = {'explanation_type': 'maple', 'use_lime_sampled_maple': True, 'lime_sample_around_instance': True, 'maple_use_distance_weights': False, 'instances_subset': lime_instances}


#     experiments = [exp9, exp10, exp11]

#     for exp in experiments:
#         _ = calculate_recourse_accuracy(model, data, enc, categorical_features, categorical_names, \
#                                     file_name, explanation_type = exp["explanation_type"], \
#                                     use_lime_sampled_maple = exp['use_lime_sampled_maple'], \
#                                     lime_sample_around_instance = exp['lime_sample_around_instance'], \
#                                     maple_use_distance_weights = exp['maple_use_distance_weights'], instances_subset = exp['instances_subset'])

#     print("TOTAL TIME FOR ALL EXPERIMENTS: ", (time.time() - start_time) / 60, " minutes")


In [22]:
def run_all(file_name, model, enc, data, categorical_features, categorical_names):
    
    all_stats = {}
    
    lime = {'label': 'lime', 'explanation_type': 'lime', 'use_lime_sampled_maple': None, 'lime_sample_around_instance': False, 'maple_use_distance_weights': None, 'instances_subset': None, 'min_samples_leaf': None, 'kernel_width': None}
    lime_kernel_1 = {'label': 'lime_kernel_1', 'explanation_type': 'lime', 'use_lime_sampled_maple': None, 'lime_sample_around_instance': False, 'maple_use_distance_weights': None, 'instances_subset': None, 'min_samples_leaf': None, 'kernel_width': 1}
    lime_centersampled = {'label': 'lime_centersampled', 'explanation_type': 'lime', 'use_lime_sampled_maple': None, 'lime_sample_around_instance': True, 'maple_use_distance_weights': None, 'instances_subset': None, 'min_samples_leaf': None, 'kernel_width': None}

    baseline_exp = {'label': 'baseline', 'explanation_type': 'baseline', 'use_lime_sampled_maple': None, 'lime_sample_around_instance': None, 'maple_use_distance_weights': None, 'instances_subset': None, 'min_samples_leaf': None, 'kernel_width': None}
    cluster_exp0 = {'label': 'cluster_0.05', 'explanation_type': 'cluster_train', 'use_lime_sampled_maple': None, 'lime_sample_around_instance': None, 'maple_use_distance_weights': None, 'instances_subset': None, 'min_samples_leaf': 0.05, 'kernel_width': None}
    cluster_exp1 = {'label': 'cluster_0.1', 'explanation_type': 'cluster_train', 'use_lime_sampled_maple': None, 'lime_sample_around_instance': None, 'maple_use_distance_weights': None, 'instances_subset': None, 'min_samples_leaf': 0.1, 'kernel_width': None}
    cluster_exp2 = {'label': 'cluster_0.15', 'explanation_type': 'cluster_train', 'use_lime_sampled_maple': None, 'lime_sample_around_instance': None, 'maple_use_distance_weights': None, 'instances_subset': None, 'min_samples_leaf': 0.15, 'kernel_width': None}
    cluster_exp3 = {'label': 'cluster_0.2', 'explanation_type': 'cluster_train', 'use_lime_sampled_maple': None, 'lime_sample_around_instance': None, 'maple_use_distance_weights': None, 'instances_subset': None, 'min_samples_leaf': 0.2, 'kernel_width': None}
    cluster_exp4 = {'label': 'cluster_0.25', 'explanation_type': 'cluster_train', 'use_lime_sampled_maple': None, 'lime_sample_around_instance': None, 'maple_use_distance_weights': None, 'instances_subset': None, 'min_samples_leaf': 0.25, 'kernel_width': None}

    maple = {'label': 'maple', 'explanation_type': 'maple', 'use_lime_sampled_maple': False, 'lime_sample_around_instance': None, 'maple_use_distance_weights': False, 'instances_subset': None, 'min_samples_leaf': None, 'kernel_width': None}
    maple_centersampled = {'label': 'maple_centersampled', 'explanation_type': 'maple', 'use_lime_sampled_maple': True, 'lime_sample_around_instance': True, 'maple_use_distance_weights': False, 'instances_subset': None, 'min_samples_leaf': None, 'kernel_width': None}
    maple_sampled = {'label': 'maple_sampled', 'explanation_type': 'maple', 'use_lime_sampled_maple': True, 'lime_sample_around_instance': False, 'maple_use_distance_weights': False, 'instances_subset': None, 'min_samples_leaf': None, 'kernel_width': None}
    maple_distance = {'label': 'maple_distance', 'explanation_type': 'maple', 'use_lime_sampled_maple': False, 'lime_sample_around_instance': None, 'maple_use_distance_weights': True, 'instances_subset': None, 'min_samples_leaf': None, 'kernel_width': None}
    maple_centersampled_distance = {'label': 'maple_centersampled_distance', 'explanation_type': 'maple', 'use_lime_sampled_maple': True, 'lime_sample_around_instance': True, 'maple_use_distance_weights': True, 'instances_subset': None, 'min_samples_leaf': None, 'kernel_width': None}
    maple_sampled_distance = {'label': 'maple_sampled_distance', 'explanation_type': 'maple', 'use_lime_sampled_maple': True, 'lime_sample_around_instance': False, 'maple_use_distance_weights': True, 'instances_subset': None, 'min_samples_leaf': None, 'kernel_width': None}

    piecewise_maple = {'label': 'piecewise_maple', 'explanation_type': 'piecewise_maple', 'use_lime_sampled_maple': None, 'lime_sample_around_instance': False, 'maple_use_distance_weights': None, 'instances_subset': None, 'min_samples_leaf': None, 'kernel_width': None}
    piecewise = {'label': 'piecewise', 'explanation_type': 'piecewise', 'use_lime_sampled_maple': None, 'lime_sample_around_instance': False, 'maple_use_distance_weights': None, 'instances_subset': None, 'min_samples_leaf': None, 'kernel_width': None}
    piecewise_centersampled = {'label': 'piecewise_centersampled', 'explanation_type': 'piecewise', 'use_lime_sampled_maple': None, 'lime_sample_around_instance': True, 'maple_use_distance_weights': None, 'instances_subset': None, 'min_samples_leaf': None, 'kernel_width': None}
    
    start_time = time.time()
    
    experiments = [lime, lime_kernel_1, maple, maple_sampled]
    
    for exp in experiments:
        exp_start_time = time.time()
        lime_instances, stats = calculate_recourse_accuracy(model, data, enc, categorical_features, categorical_names, \
                                    file_name, explanation_type = exp["explanation_type"], \
                                    use_lime_sampled_maple = exp['use_lime_sampled_maple'], \
                                    lime_sample_around_instance = exp['lime_sample_around_instance'], \
                                    maple_use_distance_weights = exp['maple_use_distance_weights'], \
                                    instances_subset = exp['instances_subset'], min_samples_leaf = exp['min_samples_leaf'], \
                                    kernel_width = exp['kernel_width'])
        all_stats[exp['label']] = stats
        print("TIME FOR EXP: ", (time.time() - exp_start_time) / 60, " minutes")
    
    experiments = [baseline_exp, cluster_exp0, cluster_exp1, cluster_exp2]
    
    with open(file_name, "a") as f:
        print("--------------------------------------------", file=f)
        print("BASELINES", file=f)
        
    for exp in experiments:
        exp_start_time = time.time()
        lime_instances, stats = calculate_recourse_accuracy(model, data, enc, categorical_features, categorical_names, \
                                    file_name, explanation_type = exp["explanation_type"], \
                                    use_lime_sampled_maple = exp['use_lime_sampled_maple'], \
                                    lime_sample_around_instance = exp['lime_sample_around_instance'], \
                                    maple_use_distance_weights = exp['maple_use_distance_weights'], \
                                    instances_subset = exp['instances_subset'], min_samples_leaf = exp['min_samples_leaf'], \
                                    kernel_width = exp['kernel_width'])
        all_stats[exp['label']] = stats
        print("TIME FOR EXP: ", (time.time() - exp_start_time) / 60, " minutes")

#     experiments = [cluster_exp2, cluster_exp3, cluster_exp4, cluster_exp5, lime_centersampled, maple_centersampled, maple_distance, maple_centersampled_distance, maple_sampled_distance, piecewise_maple, piecewise, piecewise_centersampled]

#     with open(file_name, "a") as f:
#         print("--------------------------------------------", file=f)
#         print("OTHER EXPERIMENTS", file=f)

#     for exp in experiments:
#         lime_instances, stats = calculate_recourse_accuracy(model, data, enc, categorical_features, categorical_names, \
#                                     file_name, explanation_type = exp["explanation_type"], \
#                                     use_lime_sampled_maple = exp['use_lime_sampled_maple'], \
#                                     lime_sample_around_instance = exp['lime_sample_around_instance'], \
#                                     maple_use_distance_weights = exp['maple_use_distance_weights'], \
#                                     instances_subset = exp['instances_subset'], cluster_ratio = exp['cluster_ratio'])
#         all_stats[exp['label']] = stats
#         print("TIME FOR EXP: ", (time.time() - exp_start_time) / 60, " minutes")

    print("TOTAL TIME FOR ALL EXPERIMENTS: ", (time.time() - start_time) / 60, " minutes")
    
    return all_stats
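
The experiment dictionaries in `run_all` share the same eight keys and differ in only one or two values. Because `all_stats` is keyed by `label`, two experiments with the same label silently overwrite each other. One possible way to cut the repetition and catch duplicate labels is sketched below. `EXPERIMENT_DEFAULTS`, `make_experiment`, and `check_unique_labels` are hypothetical helpers, not part of the notebook.

```python
# Settings every experiment carries; None means "not used by this experiment".
EXPERIMENT_DEFAULTS = {
    'use_lime_sampled_maple': None,
    'lime_sample_around_instance': None,
    'maple_use_distance_weights': None,
    'instances_subset': None,
    'min_samples_leaf': None,
    'kernel_width': None,
}


def make_experiment(label, explanation_type, **overrides):
    """Build one experiment dict, overriding only the settings that differ."""
    unknown = set(overrides) - set(EXPERIMENT_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown experiment settings: {sorted(unknown)}")
    return {'label': label, 'explanation_type': explanation_type,
            **EXPERIMENT_DEFAULTS, **overrides}


def check_unique_labels(experiments):
    """Raise if two experiments would write to the same all_stats key."""
    labels = [exp['label'] for exp in experiments]
    duplicates = sorted({label for label in labels if labels.count(label) > 1})
    if duplicates:
        raise ValueError(f"duplicate experiment labels: {duplicates}")


lime = make_experiment('lime', 'lime', lime_sample_around_instance=False)
lime_kernel_1 = make_experiment('lime_kernel_1', 'lime',
                                lime_sample_around_instance=False, kernel_width=1)
cluster_exp0 = make_experiment('cluster_0.05', 'cluster_train', min_samples_leaf=0.05)

check_unique_labels([lime, lime_kernel_1, cluster_exp0])
```

Rejecting unknown keyword arguments catches a misspelled setting when the experiment is defined, not partway through a long run.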

In [15]:
import matplotlib.pyplot as plt

def plot_metric(all_stats, metric, order, title, xticks, colors = None):
    
    sns.set(rc={'figure.figsize':(8,6)})
    sns.set_style("whitegrid")
   
    plot_dict = {label: results_dict[metric] for label, results_dict in all_stats.items()}
    plot_df = pd.DataFrame.from_dict(plot_dict, orient='index', columns=[metric])
    plot_df = plot_df.loc[order]
    
    ax = sns.barplot(x=plot_df.index, y=plot_df[metric], order=order, palette = colors)
    plt.setp(ax.get_xticklabels(), rotation=90)
    plt.ylim(0, 1)
    plt.ylabel("Accuracy")
    plt.title(title, fontsize = 18)
    ax.set_xticklabels(xticks, fontsize=16)
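
`plot_metric` expects `all_stats` in the shape returned by `run_all`: a dictionary mapping each experiment label to its stats dictionary. Every label in `order` must be present as a key. The sketch below shows that shape and checks the labels up front. `metric_in_order` is a hypothetical helper, and the numbers are placeholders chosen for illustration, not experimental results.

```python
def metric_in_order(all_stats, metric, order):
    """Return (label, value) pairs for one metric, in plotting order,
    failing early if a requested label has no stats."""
    missing = [label for label in order if label not in all_stats]
    if missing:
        raise KeyError(f"no stats recorded for: {missing}")
    return [(label, all_stats[label][metric]) for label in order]


# Placeholder values only; real entries come from run_all.
example_stats = {
    'lime': {'recourse_accuracy': 0.25, 'num_errors': 0},
    'maple': {'recourse_accuracy': 0.5, 'num_errors': 0},
}

pairs = metric_in_order(example_stats, 'recourse_accuracy', ['maple', 'lime'])
print(pairs)
```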
    

## COMPAS

In [24]:
compas_data = get_data(compas_X, compas_y, test_size = 0.3)


In [None]:
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier
import lime.explanation
import lime.lime_tabular
from IPython.display import clear_output

# NEURAL NETWORK
for i in range(1, 10):

    compas_nn = MLPClassifier(max_iter = 1000)

    ohe_compas_X_train, _ = ohe(compas_data['X_train'], compas_categorical_names, compas_enc)
    ohe_compas_X_val, _ = ohe(compas_data['X_val'], compas_categorical_names, compas_enc)
    ohe_compas_X_test, _ = ohe(compas_data['X_test'], compas_categorical_names, compas_enc)

    compas_nn.fit(ohe_compas_X_train, compas_data['y_train']) 
    print("validation score: ", round(compas_nn.score(ohe_compas_X_val, compas_data['y_val']), 2))
    test_preds = pd.Series(compas_nn.predict(ohe_compas_X_test))
    print("test predictions split: ")
    print(test_preds.value_counts())

    compas_nn_stats = run_all("results/un_normalized_scale_sampled/" + str(i) + "-compas_nn.txt", compas_nn, compas_enc, compas_data, \
                          compas_categorical_features, compas_categorical_names)
    clear_output(wait=True)

validation score:  0.84
test predictions split: 
 1    618
-1     66
dtype: int64
num unique preds:  2
 1    618
-1     66
dtype: int64


age                34.622434
two-year-recid      0.000000
juv-misd-count      0.096175
juv-fel-count       0.074130
juv-other-count     0.113464
c-charge-degree     0.000000
race                0.000000
sex                 0.000000
priors-count        3.481522
length-of-stay     18.416685
dtype: float64

+-------------------+---------------+---------+------------+----------------+----------------+-----------+-----------+-----------+------+-------+
|              name | variable type | mutable | actionable | step direction | flip direction | grid size | step type | step size |   lb |    ub |
+-------------------+---------------+---------+------------+----------------+----------------+-----------+-----------+-----------+------+-------+
|               age | <class 'int'> |    True |       True |              0 |            nan |        48 |  relative |      0.01 | 20.0 |  67.0 |
|    juv-misd-count | <class 'int'> |    True |       True |              0 |            nan |         3 |  relative |      0.01 |  0.0 |   2.0 |
|     juv-fel-count | <class 'int'> |    True |       True |              0 |            nan |         3 |  relative |      0.01 |  0.0 |   2.0 |
|   juv-other-count | <class 'int'> |    True |       True |              0 |            nan |         3 |  relative |      

4627

 0  out of  66
time elapsed:  6.733338038126628e-05  minutes
[24.  0.  0.  0.  7.  0.  0.  1.  1.  0.  1.  0.  0.  0.  0.  0.  0.  1.]
lime
exp.local_exp: [(0, 0.12509742387176487), (8, -0.1079750374343984), (6, -0.09814296693434978), (1, -0.09189959364632884), (9, -0.037191436360755524), (3, -0.0345303901254351), (2, -0.028632477543561952), (4, -0.01342588096380436), (7, -0.010489174276334556), (5, -0.006011018338144834)]
populating...


  return bound(*args, **kwds)


recovered all minimum-cost items
obtained 0 items in 0.1 seconds
[24.  1.  0.  0.  0.  0.  0.  1.  7.  0.]
model_pred:  -1
intercept:  0.13409089215984804
[30.  1.  0.  0. 14.  2.  0.  1.  1.  0.  1.  0.  0.  0.  0.  0.  0.  1.]
lime
exp.local_exp: [(0, 0.13240426031104138), (8, -0.11154489004584724), (6, -0.10219457240334957), (1, -0.09007152922580451), (3, -0.040941036502158634), (9, -0.03303345025183523), (2, -0.029426135175847093), (4, -0.016535368630000758), (7, -0.0072384906020507775), (5, -0.006881946247523801)]
populating...
obtained 20 items in 1.0 seconds
{'age': 13.0}
{'age': 11.0, 'priors-count': -1.0}
{'age': 13.0, 'length-of-stay': -3.0}
{'age': 11.0, 'priors-count': -1.0, 'length-of-stay': -3.0}
{'priors-count': -7.0}
{'age': 13.0, 'race_1': 1.0}
{'age': 11.0, 'priors-count': -1.0, 'race_1': 1.0}
{'age': 13.0, 'length-of-stay': -3.0, 'race_1': 1.0}
model_pred:  -1
intercept:  0.1149104887883885
[ 21.   1.   0.   0.   1. 456.   0.   1.   1.   0.   1.   0.   0.   0.
   0. 

Traceback (most recent call last):
  File "<ipython-input-20-ac4635999753>", line 151, in get_recourse
    fb = fb.populate(enumeration_type = 'distinct_subsets', total_items = 20)
  File "/Users/alexisross/Desktop/Thesis/actionable-recourse/recourse/flipset.py", line 170, in populate
    items = self._builder.populate(total_items = total_items, enumeration_type = enumeration_type, time_limit = time_limit, node_limit = node_limit, display_flag = display_flag)
  File "/Users/alexisross/Desktop/Thesis/actionable-recourse/recourse/builder.py", line 585, in populate
    info = self.solution_info
  File "/Users/alexisross/Desktop/Thesis/actionable-recourse/recourse/builder.py", line 1083, in solution_info
    raise ValueError('solver status is not OK')
ValueError: solver status is not OK


obtained 20 items in 1.1 seconds
{'age': 18.0}
{'age': 16.0, 'priors-count': -1.0}
{'age': 12.0, 'juv-misd-count': -1.0}
{'age': 12.0, 'juv-misd-count': -1.0, 'length-of-stay': -3.0}
{'age': 10.0, 'juv-misd-count': -1.0, 'priors-count': -1.0}
{'age': 10.0, 'juv-misd-count': -1.0, 'priors-count': -1.0, 'length-of-stay': -3.0}
{'juv-misd-count': -1.0, 'priors-count': -5.0, 'length-of-stay': -27.0}
{'age': 13.0, 'juv-other-count': -1.0, 'length-of-stay': -6.0}
{'age': 14.0, 'juv-other-count': -1.0}
{'age': 11.0, 'juv-other-count': -1.0, 'priors-count': -1.0, 'length-of-stay': -6.0}
{'age': 12.0, 'juv-other-count': -1.0, 'priors-count': -1.0}
{'age': 17.0, 'length-of-stay': -6.0, 'race_4': 1.0}
{'age': 18.0, 'race_4': 1.0}
{'age': 15.0, 'priors-count': -1.0, 'length-of-stay': -6.0, 'race_4': 1.0}
{'age': 16.0, 'priors-count': -1.0, 'race_4': 1.0}
model_pred:  -1
intercept:  0.11331155716558383
[27.  0.  0.  0. 11.  2.  0.  1.  0.  1.  1.  0.  0.  0.  0.  0.  0.  1.]
lime
exp.local_exp: [(0

recovered all minimum-cost items
obtained 0 items in 0.0 seconds
[20.  1.  1.  0.  1.  0.  0.  1.  1.  5.]
model_pred:  -1
intercept:  0.13795504492386912
[28.  0.  0.  0. 25. 25.  0.  1.  1.  0.  1.  0.  0.  0.  0.  0.  0.  1.]
lime
exp.local_exp: [(0, 0.1419446542985778), (8, -0.1124310062160429), (6, -0.11029608797503963), (1, -0.10016221662409278), (3, -0.041211939391673585), (9, -0.028141806639134816), (2, -0.026390180044058662), (4, -0.019572032301363285), (5, -0.00951425522678364), (7, -0.00522888992096534)]
populating...
obtained 20 items in 1.1 seconds
{'age': 27.0, 'priors-count': -3.0}
{'age': 26.0, 'priors-count': -3.0, 'length-of-stay': -5.0}
{'age': 32.0}
{'age': 32.0, 'length-of-stay': -2.0}
{'priors-count': -17.0}
{'priors-count': -17.0, 'length-of-stay': -2.0}
{'age': 27.0, 'priors-count': -3.0, 'race_1': 1.0}
{'age': 26.0, 'priors-count': -3.0, 'length-of-stay': -5.0, 'race_1': 1.0}
{'age': 32.0, 'race_1': 1.0}
{'age': 32.0, 'length-of-stay': -2.0, 'race_1': 1.0}
{'ag

obtained 20 items in 1.1 seconds
{'age': 22.0}
{'age': 20.0, 'priors-count': -1.0}
{'age': 16.0, 'juv-misd-count': -1.0}
{'age': 14.0, 'juv-misd-count': -1.0, 'priors-count': -1.0}
{'age': 22.0, 'length-of-stay': -3.0}
{'age': 20.0, 'priors-count': -1.0, 'length-of-stay': -3.0}
{'juv-misd-count': -1.0, 'priors-count': -9.0}
{'age': 16.0, 'juv-misd-count': -1.0, 'length-of-stay': -3.0}
{'age': 14.0, 'juv-misd-count': -1.0, 'priors-count': -1.0, 'length-of-stay': -3.0}
{'juv-misd-count': -1.0, 'priors-count': -9.0, 'length-of-stay': -3.0}
{'priors-count': -10.0, 'length-of-stay': -45.0}
{'age': 22.0, 'race_4': 1.0}
{'age': 20.0, 'priors-count': -1.0, 'race_4': 1.0}
{'age': 22.0, 'race_1': 1.0}
{'age': 20.0, 'priors-count': -1.0, 'race_1': 1.0}
{'age': 16.0, 'juv-misd-count': -1.0, 'race_4': 1.0}
{'age': 14.0, 'juv-misd-count': -1.0, 'priors-count': -1.0, 'race_4': 1.0}
{'age': 22.0, 'length-of-stay': -3.0, 'race_4': 1.0}
{'age': 16.0, 'juv-misd-count': -1.0, 'race_1': 1.0}
{'age': 20.0, 



obtained 20 items in 0.8 seconds
{'age': 8.0}
{'age': 8.0, 'race_1': 1.0}
{'age': 6.0, 'priors-count': -1.0, 'race_1': 1.0}
model_pred:  -1
intercept:  0.13028149966339264
[57.  0.  0.  0. 28.  2.  0.  1.  1.  0.  1.  0.  0.  0.  0.  0.  0.  1.]
lime
exp.local_exp: [(0, 0.12825769478620255), (6, -0.1075949801857605), (8, -0.10591545795666735), (1, -0.09065183435623479), (3, -0.04172951168299823), (9, -0.022678011636234646), (2, -0.02135423302864184), (5, -0.01621687245847201), (4, -0.014413229778892464), (7, -0.0072378613658451605)]
populating...
obtained 20 items in 0.8 seconds
{'priors-count': -6.0}
{'priors-count': -6.0, 'length-of-stay': -3.0}
{'age': 1.0, 'priors-count': -6.0}
{'age': 1.0, 'priors-count': -6.0, 'length-of-stay': -3.0}
{'age': 9.0}
{'age': 9.0, 'length-of-stay': -3.0}
{'priors-count': -6.0, 'race_1': 1.0}
{'priors-count': -6.0, 'length-of-stay': -3.0, 'race_1': 1.0}
{'age': 1.0, 'priors-count': -6.0, 'race_1': 1.0}
{'age': 1.0, 'priors-count': -6.0, 'length-of-stay

obtained 20 items in 0.9 seconds
[27.  1.  0.  0.  0.  0.  0.  1.  9. 52.]
model_pred:  -1
intercept:  0.11471249846908482
[32.  0.  0.  0. 16.  0.  0.  1.  1.  0.  1.  0.  0.  0.  0.  0.  0.  1.]
lime
exp.local_exp: [(0, 0.13293206989162382), (8, -0.11152972851065285), (6, -0.1013192921254467), (1, -0.09367865974669742), (3, -0.041758724527351125), (9, -0.030904483929106233), (2, -0.029087047622508855), (7, -0.016292248518912752), (4, -0.01565033286222925), (5, -0.006792736966856002)]
populating...
obtained 20 items in 1.0 seconds
{'age': 10.0}
{'age': 10.0, 'length-of-stay': -1.0}
{'age': 8.0, 'priors-count': -1.0}
{'age': 8.0, 'priors-count': -1.0, 'length-of-stay': -1.0}
{'age': 10.0, 'race_1': 1.0}
{'age': 10.0, 'length-of-stay': -1.0, 'race_1': 1.0}
{'age': 8.0, 'priors-count': -1.0, 'race_1': 1.0}
{'age': 8.0, 'priors-count': -1.0, 'length-of-stay': -1.0, 'race_1': 1.0}
{'priors-count': -5.0, 'race_1': 1.0}
{'priors-count': -5.0, 'length-of-stay': -1.0, 'race_1': 1.0}
{'age': 10

obtained 20 items in 0.8 seconds
{'juv-other-count': -1.0, 'priors-count': -4.0}
{'juv-other-count': -1.0, 'length-of-stay': -106.0}
model_pred:  -1
intercept:  0.005173288379467622
[27.  0.  0.  0.  9.  0.  0.  1.  1.  0.  1.  0.  0.  0.  0.  0.  0.  1.]
lime
exp.local_exp: [(0, 0.1272357236643967), (8, -0.10602164367333058), (6, -0.09459207594721211), (1, -0.08900174044202565), (3, -0.03871227313398201), (9, -0.036871038116543), (2, -0.027965687369169927), (4, -0.013661970846044936), (5, -0.007497183544008985), (7, -0.003569665293873893)]
populating...
recovered all minimum-cost items
obtained 0 items in 0.0 seconds
[27.  1.  0.  0.  0.  0.  0.  1.  9.  0.]
model_pred:  -1
intercept:  0.12143128788740631
[20.  0.  0.  0.  3. 21.  0.  1.  1.  0.  1.  0.  0.  0.  0.  0.  0.  1.]
lime
exp.local_exp: [(0, 0.12432688689949019), (8, -0.10694194959609199), (6, -0.09800152030138647), (1, -0.09090630485473516), (3, -0.036306151057722565), (9, -0.036009269204464005), (2, -0.02541802630690416),

obtained 20 items in 0.8 seconds
{'age': 8.0, 'race_4': 1.0}
{'age': 8.0, 'length-of-stay': -2.0, 'race_4': 1.0}
{'age': 6.0, 'priors-count': -1.0, 'race_4': 1.0}
{'age': 6.0, 'priors-count': -1.0, 'length-of-stay': -2.0, 'race_4': 1.0}
{'priors-count': -4.0, 'race_4': 1.0}
{'priors-count': -4.0, 'length-of-stay': -2.0, 'race_4': 1.0}
{'age': 8.0, 'race_5': 1.0}
{'age': 8.0, 'length-of-stay': -2.0, 'race_5': 1.0}
model_pred:  -1
intercept:  0.12773929811802076
[22.  0.  1.  0. 12. 35.  0.  1.  1.  0.  1.  0.  0.  0.  0.  0.  0.  1.]
lime
exp.local_exp: [(0, 0.1366599979277575), (8, -0.114456201216834), (6, -0.09877462667788292), (1, -0.09255158066372968), (3, -0.04082001221645286), (2, -0.028068683665834978), (9, -0.027891564636244842), (7, -0.01531820050117478), (4, -0.014340794329848583), (5, -0.010251942500398164)]
populating...
obtained 20 items in 0.9 seconds
{'age': 20.0}
{'age': 18.0, 'priors-count': -1.0}
{'age': 20.0, 'length-of-stay': -3.0}
{'age': 18.0, 'priors-count': -1.0,

age                34.622434
two-year-recid      0.000000
juv-misd-count      0.096175
juv-fel-count       0.074130
juv-other-count     0.113464
c-charge-degree     0.000000
race                0.000000
sex                 0.000000
priors-count        3.481522
length-of-stay     18.416685
dtype: float64

+-------------------+---------------+---------+------------+----------------+----------------+-----------+-----------+-----------+------+-------+
|              name | variable type | mutable | actionable | step direction | flip direction | grid size | step type | step size |   lb |    ub |
+-------------------+---------------+---------+------------+----------------+----------------+-----------+-----------+-----------+------+-------+
|               age | <class 'int'> |    True |       True |              0 |            nan |        48 |  relative |      0.01 | 20.0 |  67.0 |
|    juv-misd-count | <class 'int'> |    True |       True |              0 |            nan |         3 |  relative |      0.01 |  0.0 |   2.0 |
|     juv-fel-count | <class 'int'> |    True |       True |              0 |            nan |         3 |  relative |      0.01 |  0.0 |   2.0 |
|   juv-other-count | <class 'int'> |    True |       True |              0 |            nan |         3 |  relative |      

4627

 0  out of  66
time elapsed:  1.5584627787272134e-05  minutes
[24.  0.  0.  0.  7.  0.  0.  1.  1.  0.  1.  0.  0.  0.  0.  0.  0.  1.]
lime
exp.local_exp: [(0, 0.15584266636220162), (8, -0.13084817634812185), (6, -0.11809095543378098), (1, -0.10389580085804832), (3, -0.05146492444818474), (9, -0.033671352289860845), (2, -0.0320888422446752), (5, -0.023273303097080176), (7, -0.01375074404058225), (4, -0.013488332939531772)]
populating...


  return bound(*args, **kwds)


obtained 20 items in 0.7 seconds
{'age': 1.0, 'priors-count': -1.0}
{'age': 1.0, 'priors-count': -1.0, 'length-of-stay': -1.0}
{'age': 1.0, 'race_1': 1.0}
{'age': 1.0, 'length-of-stay': -1.0, 'race_1': 1.0}
{'priors-count': -1.0, 'race_1': 1.0}
{'priors-count': -1.0, 'length-of-stay': -1.0, 'race_1': 1.0}
{'age': 1.0, 'priors-count': -1.0, 'race_1': 1.0}
{'age': 1.0, 'priors-count': -1.0, 'length-of-stay': -1.0, 'race_1': 1.0}
{'age': 1.0, 'race_5': 1.0}
{'age': 1.0, 'length-of-stay': -1.0, 'race_5': 1.0}
model_pred:  -1
intercept:  0.11770738715941231
[30.  1.  0.  0. 14.  2.  0.  1.  1.  0.  1.  0.  0.  0.  0.  0.  0.  1.]
lime
exp.local_exp: [(0, 0.17004860432055174), (6, -0.1268987543973381), (8, -0.12582783657598057), (1, -0.11770429484410848), (3, -0.05580259345004445), (2, -0.030597385604729806), (5, -0.027256403730685178), (4, -0.022766737519110514), (7, -0.021548907032735376), (9, 0.010390029580845481)]
populating...
obtained 20 items in 0.9 seconds
{'age': 15.0}
{'age': 13.0,



obtained 20 items in 0.9 seconds
{'age': 4.0}
{'age': 2.0, 'priors-count': -1.0}
{'age': 3.0, 'length-of-stay': 3.0}
{'priors-count': -2.0}
{'age': 1.0, 'priors-count': -1.0, 'length-of-stay': 3.0}
{'priors-count': -2.0, 'length-of-stay': 3.0}
{'age': 4.0, 'race_1': 1.0}
{'age': 2.0, 'priors-count': -1.0, 'race_1': 1.0}
{'age': 3.0, 'length-of-stay': 3.0, 'race_1': 1.0}
{'priors-count': -2.0, 'race_1': 1.0}
{'age': 1.0, 'priors-count': -1.0, 'length-of-stay': 3.0, 'race_1': 1.0}
{'priors-count': -2.0, 'length-of-stay': 3.0, 'race_1': 1.0}
{'age': 4.0, 'race_5': 1.0}
model_pred:  -1
intercept:  0.0007081626167260158
[26.  0.  0.  0.  8.  1.  0.  1.  1.  0.  1.  0.  0.  0.  0.  0.  0.  1.]
lime
exp.local_exp: [(0, 0.15333358143993828), (8, -0.12409055927326729), (6, -0.12092214145443328), (1, -0.10397736958859809), (3, -0.050201144984320545), (9, -0.03350228681867016), (2, -0.03055366886104044), (4, -0.014447828880348708), (5, -0.014187551478979191), (7, -0.007619175208043604)]
populatin



obtained 20 items in 0.8 seconds
{'age': 8.0}
{'age': 8.0, 'length-of-stay': -2.0}
{'age': 7.0, 'priors-count': -1.0}
{'age': 7.0, 'priors-count': -1.0, 'length-of-stay': -2.0}
{'priors-count': -5.0}
{'priors-count': -5.0, 'length-of-stay': -2.0}
{'age': 8.0, 'race_1': 1.0}
{'age': 8.0, 'length-of-stay': -2.0, 'race_1': 1.0}
{'age': 7.0, 'priors-count': -1.0, 'race_1': 1.0}
{'age': 7.0, 'priors-count': -1.0, 'length-of-stay': -2.0, 'race_1': 1.0}
{'priors-count': -5.0, 'race_1': 1.0}
{'priors-count': -5.0, 'length-of-stay': -2.0, 'race_1': 1.0}
{'age': 8.0, 'race_5': 1.0}
{'age': 8.0, 'length-of-stay': -2.0, 'race_5': 1.0}
model_pred:  -1
intercept:  0.04763422340498891
[ 21.   2.   0.   6.   8. 243.   0.   1.   1.   0.   1.   0.   0.   0.
   0.   0.   0.   1.]
lime
exp.local_exp: [(4, -2.4898031345293405e-24), (9, -9.671936329737957e-25), (2, -8.017821285715674e-25), (0, 3.024441535490536e-25), (8, 8.927425768423173e-26), (3, -4.3873748127367473e-26), (1, -1.831727030334632e-27), (5, 



obtained 20 items in 0.9 seconds
{'age': 29.0, 'juv-misd-count': -1.0, 'priors-count': -5.0, 'length-of-stay': -129.0}
{'age': 36.0, 'juv-misd-count': -1.0, 'length-of-stay': -144.0}
{'age': 27.0, 'juv-misd-count': -1.0, 'juv-fel-count': 1.0, 'priors-count': -4.0, 'length-of-stay': -132.0}
{'age': 34.0, 'juv-misd-count': -1.0, 'juv-fel-count': 1.0, 'length-of-stay': -138.0}
{'age': 39.0, 'juv-fel-count': 1.0, 'priors-count': -5.0, 'length-of-stay': -150.0}
{'age': 41.0, 'priors-count': -5.0, 'length-of-stay': -156.0}
{'age': 11.0, 'juv-other-count': -2.0}
{'age': 9.0, 'juv-other-count': -2.0, 'priors-count': -1.0}
{'juv-misd-count': -1.0, 'juv-other-count': -2.0}
{'juv-misd-count': -1.0, 'juv-other-count': -2.0, 'length-of-stay': -3.0}
{'age': 1.0, 'juv-misd-count': -1.0, 'juv-other-count': -2.0}
{'age': 1.0, 'juv-misd-count': -1.0, 'juv-other-count': -2.0, 'length-of-stay': -3.0}
{'juv-misd-count': -1.0, 'juv-other-count': -2.0, 'priors-count': -1.0}
{'juv-misd-count': -1.0, 'juv-othe

obtained 20 items in 0.9 seconds
{'age': 6.0, 'priors-count': -1.0}
{'age': 9.0}
{'age': 8.0, 'length-of-stay': -3.0}
{'age': 6.0, 'priors-count': -1.0, 'length-of-stay': -3.0}
{'priors-count': -4.0}
{'priors-count': -4.0, 'length-of-stay': -3.0}
{'juv-other-count': -1.0}
{'age': 1.0, 'juv-other-count': -1.0}
{'juv-other-count': -1.0, 'length-of-stay': -3.0}
{'juv-other-count': -1.0, 'priors-count': -1.0}
{'age': 1.0, 'juv-other-count': -1.0, 'length-of-stay': -3.0}
{'age': 1.0, 'juv-other-count': -1.0, 'priors-count': -1.0}
{'juv-other-count': -1.0, 'priors-count': -1.0, 'length-of-stay': -3.0}
{'age': 1.0, 'juv-other-count': -1.0, 'priors-count': -1.0, 'length-of-stay': -3.0}
{'age': 6.0, 'priors-count': -1.0, 'race_4': 1.0}
{'age': 9.0, 'race_4': 1.0}
{'age': 8.0, 'length-of-stay': -3.0, 'race_4': 1.0}
{'age': 6.0, 'priors-count': -1.0, 'length-of-stay': -3.0, 'race_4': 1.0}
{'age': 6.0, 'priors-count': -1.0, 'race_1': 1.0}
model_pred:  -1
intercept:  0.146014965537268
[20.  1.  0. 



obtained 20 items in 1.0 seconds
{'age': 1.0, 'priors-count': -21.0, 'length-of-stay': -5.0}
{'priors-count': -21.0, 'length-of-stay': -11.0}
{'priors-count': -22.0}
{'age': 3.0, 'priors-count': -21.0}
{'age': 1.0, 'priors-count': -21.0, 'length-of-stay': -5.0, 'race_4': 1.0}
{'priors-count': -21.0, 'length-of-stay': -11.0, 'race_4': 1.0}
{'priors-count': -22.0, 'race_4': 1.0}
{'age': 3.0, 'priors-count': -21.0, 'race_4': 1.0}
{'age': 1.0, 'priors-count': -21.0, 'length-of-stay': -5.0, 'race_1': 1.0}
{'priors-count': -21.0, 'length-of-stay': -11.0, 'race_1': 1.0}
{'priors-count': -22.0, 'race_1': 1.0}
{'age': 3.0, 'priors-count': -21.0, 'race_1': 1.0}
{'age': 1.0, 'priors-count': -21.0, 'length-of-stay': -5.0, 'race_5': 1.0}
{'priors-count': -21.0, 'length-of-stay': -11.0, 'race_5': 1.0}
{'priors-count': -22.0, 'race_5': 1.0}
{'age': 3.0, 'priors-count': -21.0, 'race_5': 1.0}
{'age': 1.0, 'priors-count': -21.0, 'length-of-stay': -5.0, 'race_3': 1.0}
{'priors-count': -21.0, 'length-of-s



obtained 20 items in 1.0 seconds
{'age': 10.0}
{'age': 8.0, 'priors-count': -1.0}
{'age': 10.0, 'length-of-stay': 2.0}
{'age': 8.0, 'priors-count': -1.0, 'length-of-stay': 2.0}
{'priors-count': -6.0}
{'priors-count': -5.0, 'length-of-stay': 5.0}
{'age': 10.0, 'race_1': 1.0}
{'age': 8.0, 'priors-count': -1.0, 'race_1': 1.0}
{'age': 10.0, 'length-of-stay': 2.0, 'race_1': 1.0}
{'age': 8.0, 'priors-count': -1.0, 'length-of-stay': 2.0, 'race_1': 1.0}
{'priors-count': -6.0, 'race_1': 1.0}
{'priors-count': -5.0, 'length-of-stay': 5.0, 'race_1': 1.0}
{'age': 10.0, 'race_5': 1.0}
{'age': 8.0, 'priors-count': -1.0, 'race_5': 1.0}
model_pred:  -1
intercept:  0.05163133990131019

 25  out of  66
[27.  1.  0.  0.  9.  1.  0.  1.  1.  0.  1.  0.  0.  0.  0.  0.  0.  1.]
lime
exp.local_exp: [(0, 0.16537319209405657), (8, -0.12147380064989464), (6, -0.11621099450050248), (1, -0.10914088766850456), (3, -0.05356898452487894), (9, -0.030980605611156387), (2, -0.030621189803337102), (4, -0.013794816796875



obtained 20 items in 0.9 seconds
{'priors-count': -12.0}
{'age': -2.0, 'priors-count': -11.0}
{'age': -1.0, 'priors-count': -11.0, 'length-of-stay': 3.0}
{'priors-count': -12.0, 'length-of-stay': 3.0}
{'juv-misd-count': 1.0, 'priors-count': -10.0}
{'age': -1.0, 'juv-misd-count': 1.0, 'priors-count': -10.0}
{'juv-misd-count': 1.0, 'priors-count': -10.0, 'length-of-stay': 3.0}
{'juv-other-count': 1.0, 'priors-count': -11.0}
{'age': -1.0, 'juv-other-count': 1.0, 'priors-count': -11.0}
{'juv-other-count': 1.0, 'priors-count': -11.0, 'length-of-stay': 3.0}
{'juv-misd-count': 1.0, 'juv-other-count': 1.0, 'priors-count': -10.0}
{'juv-misd-count': 1.0, 'juv-other-count': 1.0, 'priors-count': -10.0, 'length-of-stay': 3.0}
model_pred:  -1
intercept:  0.3185182055247988
[ 32.   6.   0.   5.  17. 112.   0.   1.   1.   0.   1.   0.   0.   0.
   0.   0.   0.   1.]
lime
exp.local_exp: [(2, -9.7670754955698e-36), (4, -8.223709309877351e-36), (8, -2.3537705614462588e-36), (0, 1.2172176762100138e-36), (



obtained 20 items in 0.8 seconds
{'age': 10.0}
{'age': 10.0, 'length-of-stay': -2.0}
{'age': 9.0, 'priors-count': -1.0}
{'age': 9.0, 'priors-count': -1.0, 'length-of-stay': -2.0}
{'priors-count': -6.0}
{'priors-count': -6.0, 'length-of-stay': -2.0}
{'age': 7.0, 'juv-misd-count': -1.0}
{'age': 5.0, 'juv-misd-count': -1.0, 'priors-count': -1.0}
{'age': 7.0, 'juv-misd-count': -1.0, 'length-of-stay': -2.0}
{'age': 5.0, 'juv-misd-count': -1.0, 'priors-count': -1.0, 'length-of-stay': -2.0}
{'juv-misd-count': -1.0, 'priors-count': -4.0}
{'juv-misd-count': -1.0, 'priors-count': -4.0, 'length-of-stay': -2.0}
{'age': 10.0, 'race_1': 1.0}
{'age': 10.0, 'length-of-stay': -2.0, 'race_1': 1.0}
{'age': 9.0, 'priors-count': -1.0, 'race_1': 1.0}
model_pred:  -1
intercept:  0.08569222029390366
[57.  0.  0.  0. 28.  2.  0.  1.  1.  0.  1.  0.  0.  0.  0.  0.  0.  1.]
lime
exp.local_exp: [(8, -0.06070244457417272), (1, -0.013909955749590194), (6, -0.011731482538960808), (9, 0.009433513115053345), (5, -0.0



obtained 20 items in 0.9 seconds
{'priors-count': -5.0}
{'age': 8.0, 'race_4': 1.0}
{'age': 8.0, 'length-of-stay': -2.0, 'race_4': 1.0}
{'age': 6.0, 'priors-count': -1.0, 'race_4': 1.0}
{'age': 6.0, 'priors-count': -1.0, 'length-of-stay': -2.0, 'race_4': 1.0}
{'priors-count': -4.0, 'length-of-stay': -14.0, 'race_4': 1.0}
{'priors-count': -5.0, 'race_4': 1.0}
{'age': 8.0, 'race_5': 1.0}
{'age': 8.0, 'length-of-stay': -2.0, 'race_5': 1.0}
model_pred:  -1
intercept:  0.06515032219756822
[41.  0.  2.  0. 18. 33.  0.  1.  0.  1.  1.  0.  0.  0.  0.  0.  0.  1.]
lime
exp.local_exp: [(8, -0.06193439063929131), (0, 0.038473350036572665), (3, -0.03761301233575655), (6, -0.03218362574896091), (2, -0.018529087805515015), (1, -0.015080768739102619), (4, -0.01428635105487572), (5, -0.00993389187500086), (9, 0.008694361668267642), (7, -0.004433029679862389)]
populating...
obtained 20 items in 0.9 seconds
{'age': 2.0, 'priors-count': -10.0}
{'age': 2.0, 'priors-count': -10.0, 'length-of-stay': 2.0}
{

    model=unknown;
        message from solver=<undefined>
excepting...
error:  solver status is not OK
None
[34.  0.  6.  0.  1.  1.  0.  1. 19.  1.]
model_pred:  -1
intercept:  -0.36548405067721623
[31.  0.  5.  0. 10.  4.  0.  1.  1.  0.  1.  0.  0.  0.  0.  0.  0.  1.]
lime
exp.local_exp: [(3, -3.287226329909546e-08), (7, -5.5010370849867366e-09), (2, 4.837810537438676e-09), (9, 3.6813336373168662e-09), (4, -3.6675765640545046e-09), (0, 2.299751178502859e-09), (8, -1.7030349677403606e-09), (6, -3.675552300003964e-10), (1, -3.4024203851765486e-10), (5, -3.021207279182969e-10)]
populating...




    model=unknown;
        message from solver=<undefined>
excepting...
error:  solver status is not OK
None
[31.  1.  0.  5.  0.  0.  0.  1. 10.  4.]
model_pred:  -1
intercept:  -0.44262314662972607

 50  out of  66
[34.  1.  3.  0.  9.  0.  0.  1.  1.  0.  1.  0.  0.  0.  0.  0.  0.  1.]
lime
exp.local_exp: [(3, -0.02715625456329618), (2, -0.008698655195451746), (8, -0.008135043346687567), (6, -0.005207284862971278), (1, -0.003882665686106557), (5, -0.00304634791199406), (9, 0.0024046647065826383), (7, -0.0019442806732810221), (4, -0.00114057725562802), (0, -0.00044112391728022327)]
populating...




    model=unknown;
        message from solver=<undefined>
excepting...
error:  solver status is not OK
None
[34.  1.  1.  3.  0.  0.  0.  1.  9.  0.]
model_pred:  -1
intercept:  -0.15376355522509794




[ 47.   0.   0.   0.  23. 403.   0.   1.   1.   0.   1.   0.   0.   0.
   0.   0.   0.   1.]
lime
exp.local_exp: [(9, -2.2668118365602113e-07), (8, -1.3869105350655698e-07), (1, -2.855044275915507e-08), (6, -2.6344396962754732e-08), (0, -1.320770122072987e-08), (5, -1.3134602817935242e-08), (7, -1.2028536724396929e-08), (4, 6.552464493428658e-09), (3, -4.417581756282682e-09), (2, 4.1708095260087135e-09)]
populating...
    model=unknown;
        message from solver=<undefined>
excepting...
error:  solver status is not OK
None
[ 47.   1.   0.   0.   0.   0.   0.   1.  23. 403.]
model_pred:  -1
intercept:  -0.10697071192483043
[25.  4.  1.  0. 10.  7.  0.  1.  1.  0.  0.  0.  0.  1.  0.  0.  0.  1.]
lime




exp.local_exp: [(2, -1.1105344728242875e-06), (8, -2.742587990446923e-07), (6, -2.1073042451106732e-07), (3, -1.9870334589471332e-07), (0, 1.864145755877166e-07), (1, -1.1235347989383945e-07), (5, -8.819741316876205e-08), (9, 1.3104963066884192e-08), (4, 9.826707884792976e-09), (7, -7.799681103892068e-09)]
populating...
    model=unknown;
        message from solver=<undefined>
excepting...
error:  solver status is not OK
None
[25.  1.  4.  1.  0.  0.  3.  1. 10.  7.]
model_pred:  -1
intercept:  -0.17653261120724434
[ 31.   0.   0.   1.  15. 135.   1.   0.   1.   0.   1.   0.   0.   0.
   0.   0.   0.   1.]
lime


Traceback (most recent call last):
  File "<ipython-input-20-ac4635999753>", line 151, in get_recourse
    fb = fb.populate(enumeration_type = 'distinct_subsets', total_items = 20)
  File "/Users/alexisross/Desktop/Thesis/actionable-recourse/recourse/flipset.py", line 170, in populate
    items = self._builder.populate(total_items = total_items, enumeration_type = enumeration_type, time_limit = time_limit, node_limit = node_limit, display_flag = display_flag)
  File "/Users/alexisross/Desktop/Thesis/actionable-recourse/recourse/builder.py", line 585, in populate
    info = self.solution_info
  File "/Users/alexisross/Desktop/Thesis/actionable-recourse/recourse/builder.py", line 1083, in solution_info
    raise ValueError('solver status is not OK')
ValueError: solver status is not OK


exp.local_exp: [(0, 0.14637916401748197), (8, -0.11260756662604836), (6, -0.0985589431822098), (1, 0.07797015646163653), (9, -0.046173202319159015), (4, -0.035309961455152146), (2, -0.029002851655540007), (3, -0.02513035681695895), (5, 0.010836726299809609), (7, -0.0003695405668314329)]
populating...
obtained 20 items in 0.9 seconds
{'age': 14.0}
{'age': 14.0, 'length-of-stay': -1.0}
{'age': 12.0, 'priors-count': -1.0}
{'age': 12.0, 'priors-count': -1.0, 'length-of-stay': -1.0}
{'priors-count': -7.0, 'length-of-stay': -7.0}
{'priors-count': -8.0}
{'age': 8.0, 'juv-other-count': -1.0}
{'age': 8.0, 'juv-other-count': -1.0, 'length-of-stay': -1.0}
{'age': 6.0, 'juv-other-count': -1.0, 'priors-count': -1.0}
{'age': 6.0, 'juv-other-count': -1.0, 'priors-count': -1.0, 'length-of-stay': -1.0}
{'juv-other-count': -1.0, 'priors-count': -4.0, 'length-of-stay': -1.0}
{'juv-other-count': -1.0, 'priors-count': -5.0}
{'juv-other-count': -1.0, 'length-of-stay': -106.0}
{'age': 14.0, 'race_4': 1.0}
{'

Traceback (most recent call last):
  File "<ipython-input-20-ac4635999753>", line 151, in get_recourse
    fb = fb.populate(enumeration_type = 'distinct_subsets', total_items = 20)
  File "/Users/alexisross/Desktop/Thesis/actionable-recourse/recourse/flipset.py", line 170, in populate
    items = self._builder.populate(total_items = total_items, enumeration_type = enumeration_type, time_limit = time_limit, node_limit = node_limit, display_flag = display_flag)
  File "/Users/alexisross/Desktop/Thesis/actionable-recourse/recourse/builder.py", line 585, in populate
    info = self.solution_info
  File "/Users/alexisross/Desktop/Thesis/actionable-recourse/recourse/builder.py", line 1083, in solution_info
    raise ValueError('solver status is not OK')
ValueError: solver status is not OK


obtained 20 items in 1.2 seconds
{'age': 17.0}
{'age': 17.0, 'length-of-stay': 3.0}
{'age': 16.0, 'priors-count': -1.0}
{'age': 16.0, 'priors-count': -1.0, 'length-of-stay': 3.0}
{'priors-count': -15.0}
{'age': 17.0, 'race_1': 1.0}
{'age': 17.0, 'length-of-stay': 3.0, 'race_1': 1.0}
{'age': 16.0, 'priors-count': -1.0, 'race_1': 1.0}
{'age': 16.0, 'priors-count': -1.0, 'length-of-stay': 3.0, 'race_1': 1.0}
{'age': 17.0, 'race_5': 1.0}
{'age': 17.0, 'length-of-stay': 3.0, 'race_5': 1.0}
{'age': 17.0, 'race_3': 1.0}
{'age': 17.0, 'length-of-stay': 3.0, 'race_3': 1.0}
{'age': 16.0, 'priors-count': -1.0, 'race_5': 1.0}
{'age': 17.0, 'sex_0': 1.0}
model_pred:  -1
intercept:  -0.09089902460925531
[24.  0.  0.  1.  3. 35.  0.  1.  1.  0.  1.  0.  0.  0.  0.  0.  0.  1.]
lime
exp.local_exp: [(0, 0.1531036369382588), (8, -0.12893633192955967), (1, -0.11257684546065169), (6, -0.10924249626604081), (9, -0.05623673952012948), (3, -0.03918043302199757), (2, -0.028970896493934653), (4, -0.02152478274

Traceback (most recent call last):
  File "<ipython-input-20-ac4635999753>", line 151, in get_recourse
    fb = fb.populate(enumeration_type = 'distinct_subsets', total_items = 20)
  File "/Users/alexisross/Desktop/Thesis/actionable-recourse/recourse/flipset.py", line 170, in populate
    items = self._builder.populate(total_items = total_items, enumeration_type = enumeration_type, time_limit = time_limit, node_limit = node_limit, display_flag = display_flag)
  File "/Users/alexisross/Desktop/Thesis/actionable-recourse/recourse/builder.py", line 585, in populate
    info = self.solution_info
  File "/Users/alexisross/Desktop/Thesis/actionable-recourse/recourse/builder.py", line 1083, in solution_info
    raise ValueError('solver status is not OK')
ValueError: solver status is not OK


obtained 20 items in 1.0 seconds
{'age': 9.0, 'race_4': 1.0}
{'age': 9.0, 'length-of-stay': -2.0, 'race_4': 1.0}
{'age': 7.0, 'priors-count': -1.0, 'race_4': 1.0}
{'age': 7.0, 'priors-count': -1.0, 'length-of-stay': -2.0, 'race_4': 1.0}
{'priors-count': -5.0, 'race_4': 1.0}
{'priors-count': -5.0, 'length-of-stay': -2.0, 'race_4': 1.0}
{'age': 9.0, 'race_5': 1.0}
{'age': 9.0, 'length-of-stay': -2.0, 'race_5': 1.0}
model_pred:  -1
intercept:  0.04025934066732928
[22.  0.  1.  0. 12. 35.  0.  1.  1.  0.  1.  0.  0.  0.  0.  0.  0.  1.]
lime
exp.local_exp: [(0, 0.17516072033788233), (6, -0.13237048409195196), (8, -0.1194529454074195), (1, -0.11805769755054285), (3, -0.04425725469556256), (2, -0.030572843440220952), (4, -0.026587225846236674), (9, 0.009098651199192802), (7, -0.006272924049996475), (5, -0.0015947875107497282)]
populating...
obtained 20 items in 1.3 seconds
{'age': 19.0}
{'age': 19.0, 'length-of-stay': 3.0}
{'age': 18.0, 'priors-count': -1.0}
{'age': 18.0, 'priors-count': -1.

age                34.622434
two-year-recid      0.000000
juv-misd-count      0.096175
juv-fel-count       0.074130
juv-other-count     0.113464
c-charge-degree     0.000000
race                0.000000
sex                 0.000000
priors-count        3.481522
length-of-stay     18.416685
dtype: float64

+-------------------+---------------+---------+------------+----------------+----------------+-----------+-----------+-----------+------+-------+
|              name | variable type | mutable | actionable | step direction | flip direction | grid size | step type | step size |   lb |    ub |
+-------------------+---------------+---------+------------+----------------+----------------+-----------+-----------+-----------+------+-------+
|               age | <class 'int'> |    True |       True |              0 |            nan |        48 |  relative |      0.01 | 20.0 |  67.0 |
|    juv-misd-count | <class 'int'> |    True |       True |              0 |            nan |         3 |  relative |      0.01 |  0.0 |   2.0 |
|     juv-fel-count | <class 'int'> |    True |       True |              0 |            nan |         3 |  relative |      0.01 |  0.0 |   2.0 |
|   juv-other-count | <class 'int'> |    True |       True |              0 |            nan |         3 |  relative |      

4627

 0  out of  66
time elapsed:  3.505150477091472e-05  minutes
[24.  0.  0.  0.  7.  0.  0.  1.  1.  0.  1.  0.  0.  0.  0.  0.  0.  1.]
maple
populating...


  return bound(*args, **kwds)


obtained 20 items in 1.2 seconds
{'age': 3.0}
{'age': 1.0, 'priors-count': -1.0}
{'age': 3.0, 'length-of-stay': 2.0}
{'age': 1.0, 'priors-count': -1.0, 'length-of-stay': 2.0}
{'priors-count': -2.0}
{'priors-count': -2.0, 'length-of-stay': 2.0}
{'age': 3.0, 'race_1': 1.0}
{'age': 1.0, 'priors-count': -1.0, 'race_1': 1.0}
{'age': 3.0, 'length-of-stay': 2.0, 'race_1': 1.0}
{'age': 1.0, 'priors-count': -1.0, 'length-of-stay': 2.0, 'race_1': 1.0}
{'priors-count': -2.0, 'race_1': 1.0}
{'priors-count': -2.0, 'length-of-stay': 2.0, 'race_1': 1.0}
{'age': 3.0, 'race_5': 1.0}
{'age': 1.0, 'priors-count': -1.0, 'race_5': 1.0}
model_pred:  -1
intercept:  -0.20888010839475762
[30.  1.  0.  0. 14.  2.  0.  1.  1.  0.  1.  0.  0.  0.  0.  0.  0.  1.]
maple
populating...
obtained 20 items in 1.0 seconds
{'priors-count': -8.0}
{'priors-count': -8.0, 'length-of-stay': -3.0}
{'age': 7.0, 'juv-misd-count': -1.0, 'priors-count': -1.0}
{'age': 8.0, 'juv-misd-count': -1.0, 'length-of-stay': -3.0}
{'age': 7.0

obtained 20 items in 0.9 seconds
{'age': 9.0}
{'age': 7.0, 'priors-count': -1.0}
{'age': 9.0, 'length-of-stay': -3.0}
{'age': 7.0, 'priors-count': -1.0, 'length-of-stay': -3.0}
{'priors-count': -7.0}
{'priors-count': -7.0, 'length-of-stay': -3.0}
{'age': 7.0, 'juv-misd-count': -1.0}
{'age': 6.0, 'juv-misd-count': -1.0, 'priors-count': -1.0}
{'age': 7.0, 'juv-misd-count': -1.0, 'length-of-stay': -3.0}
{'age': 6.0, 'juv-misd-count': -1.0, 'priors-count': -1.0, 'length-of-stay': -3.0}
{'juv-misd-count': -1.0, 'priors-count': -6.0}
{'juv-misd-count': -1.0, 'priors-count': -6.0, 'length-of-stay': -3.0}
{'age': 9.0, 'race_1': 1.0}
{'age': 7.0, 'priors-count': -1.0, 'race_1': 1.0}
{'age': 9.0, 'length-of-stay': -3.0, 'race_1': 1.0}
{'age': 7.0, 'priors-count': -1.0, 'length-of-stay': -3.0, 'race_1': 1.0}
model_pred:  -1
intercept:  -0.21382464391725953
[23.  0.  0.  0.  6.  1.  0.  1.  1.  0.  1.  0.  0.  0.  0.  0.  0.  1.]
maple
populating...
obtained 20 items in 0.9 seconds
{'age': 4.0}
{'

populating...
obtained 20 items in 0.9 seconds
{'age': 6.0}
{'age': 6.0, 'length-of-stay': -2.0}
{'age': 4.0, 'priors-count': -1.0}
{'age': 4.0, 'priors-count': -1.0, 'length-of-stay': -2.0}
{'juv-fel-count': -1.0}
{'age': 1.0, 'juv-fel-count': -1.0}
{'juv-fel-count': -1.0, 'length-of-stay': -2.0}
{'age': 1.0, 'juv-fel-count': -1.0, 'length-of-stay': -2.0}
{'juv-fel-count': -1.0, 'priors-count': -1.0}
{'age': 1.0, 'juv-fel-count': -1.0, 'priors-count': -1.0}
{'juv-fel-count': -1.0, 'priors-count': -1.0, 'length-of-stay': -2.0}
{'age': 1.0, 'juv-fel-count': -1.0, 'priors-count': -1.0, 'length-of-stay': -2.0}
model_pred:  -1
intercept:  -0.029876192774538457
[ 25.   1.   0.   0.  10. 260.   0.   1.   1.   0.   1.   0.   0.   0.
   0.   0.   0.   1.]
maple
populating...
obtained 20 items in 1.0 seconds
{'age': 10.0}
{'age': 9.0, 'priors-count': -1.0}
{'priors-count': -9.0}
{'age': 10.0, 'length-of-stay': -3.0}
{'age': 9.0, 'juv-misd-count': -1.0}
{'age': 9.0, 'priors-count': -1.0, 'length

In [None]:
import importlib
importlib.reload(explanation)
importlib.reload(lime_tabular)
importlib.reload(lime_base)

compas_nn_stats = run_all("V3_3_31_compas_nn_unnormalized-training-ALSO-maple-should-be-fixed.txt", compas_nn, compas_enc, compas_data, \
                          compas_categorical_features, compas_categorical_names)

order = ['baseline', 'cluster_0.005', 'cluster_0.01', 'cluster_0.05', 'lime', 'maple', 'maple_sampled']
names = ['Baseline', 'Cluster\n(0.005)', 'Cluster\n(0.01)', 'Cluster\n(0.05)', 'LIME', 'MAPLE', 'MAPLE + Sampling']

color_orders = ['#F7DC6F', '#f7b7d9', '#f98d94', '#E74C3C', '#7DCEA0', '#b5a8ea','#6a5acd']

plot_metric(compas_nn_stats, 'recourse_accuracy_all', order, "COMPAS: Projected Accuracy of Flipsets", names, colors = color_orders)


In [None]:
compas_nn2 = MLPClassifier()

compas_data2 = get_data(compas_X, compas_y, test_size = 0.3)

ohe_compas_X_train2, _ = ohe(compas_data2['X_train'], compas_categorical_names, compas_enc)
ohe_compas_X_val2, _ = ohe(compas_data2['X_val'], compas_categorical_names, compas_enc)
ohe_compas_X_test2, _ = ohe(compas_data2['X_test'], compas_categorical_names, compas_enc)

compas_nn2.fit(ohe_compas_X_train2, compas_data2['y_train']) 
print("validation score: ", round(compas_nn2.score(ohe_compas_X_val2, compas_data2['y_val']), 2))
test_preds2 = pd.Series(compas_nn2.predict(ohe_compas_X_test2))
print("test predictions split: ")
print(test_preds2.value_counts())

compas_nn_stats2 = run_all("V3_featselectionauto_3_30_ohe_final_compas_nn2.txt", compas_nn2, compas_enc, compas_data2, \
                          compas_categorical_features, compas_categorical_names)

order = ['baseline', 'cluster_0.001', 'cluster_0.05', 'cluster_0.01', 'lime', 'maple', 'maple_sampled']
names = ['Baseline', 'Cluster\n(0.001)', 'Cluster\n(0.05)', 'Cluster\n(0.01)', 'LIME', 'MAPLE', 'MAPLE + Sampling']

color_orders = ['#F7DC6F', '#f7b7d9', '#f98d94', '#E74C3C', '#7DCEA0', '#b5a8ea','#6a5acd']

plot_metric(compas_nn_stats2, 'recourse_accuracy_all', order, "COMPAS: Projected Accuracy of Flipsets", names, colors = color_orders)


In [None]:
# RANDOM FOREST

compas_rf = RandomForestClassifier()

compas_rf.fit(ohe_compas_X_train, compas_data['y_train']) 
print("validation score: ", round(compas_rf.score(ohe_compas_X_val, compas_data['y_val']), 2))
test_preds = pd.Series(compas_rf.predict(ohe_compas_X_test))
print("test predictions split: ")
print(test_preds.value_counts())

In [None]:
# compas_rf_stats = run_all("V23_29_ohe_final_compas_rf.txt", compas_rf, compas_enc, compas_data, \
#                           compas_categorical_features, compas_categorical_names)

order = ['baseline', 'cluster_0.01', 'cluster_0.05', 'cluster_0.1', 'cluster_0.15', \
             'cluster_0.2', 'lime', 'maple', 'maple_sampled']

plot_metric(compas_rf_stats, 'recourse_accuracy_all', order)


## German

In [None]:
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier
import lime.explanation
import lime.lime_tabular


# NEURAL NETWORK
 
german_nn = MLPClassifier(hidden_layer_sizes=(100,), max_iter = 1000)

german_data = get_data(german_X, german_y, test_size = 0.5)

ohe_german_X_train, _ = ohe(german_data['X_train'], german_categorical_names, german_enc)
ohe_german_X_val, _ = ohe(german_data['X_val'], german_categorical_names, german_enc)
ohe_german_X_test, _ = ohe(german_data['X_test'], german_categorical_names, german_enc)

train_stddev = ohe_german_X_train[ohe_german_X_train.columns[:]].std()
train_mean = ohe_german_X_train[ohe_german_X_train.columns[:]].mean()

for feat in german_categorical_features:
    train_mean[feat] = 0
    train_stddev[feat] = 1

# Normalize to have mean 0 and variance 1
norm_X_train = (ohe_german_X_train - train_mean) / train_stddev
norm_X_val = (ohe_german_X_val - train_mean) / train_stddev
norm_X_test = (ohe_german_X_test - train_mean) / train_stddev

german_nn.fit(norm_X_train, german_data['y_train']) 
print("validation score: ", round(german_nn.score(norm_X_val, german_data['y_val']), 2))
test_preds = pd.Series(german_nn.predict(norm_X_test))
print("test predictions split: ")
print(test_preds.value_counts())

In [None]:
run_all("3_28_ohe_final_german_nn.txt", german_nn, german_enc, german_data, german_categorical_features, \
        german_categorical_names)

In [None]:
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier

# RANDOM FOREST

german_rf = RandomForestClassifier()

german_rf.fit(ohe_german_X_train, german_data['y_train']) 
print("validation score: ", round(german_rf.score(ohe_german_X_val, german_data['y_val']), 2))
test_preds = pd.Series(german_rf.predict(ohe_german_X_test))
print("test predictions split: ")
print(test_preds.value_counts())

In [None]:
run_all("3_28_ohe_final_german_rf.txt", german_rf, german_enc, german_data, german_categorical_features, \
        german_categorical_names)

## ADULT

In [None]:
# NEURAL NETWORK

adult_nn = MLPClassifier()

adult_data = get_data(adult_X, adult_y, test_size = 0.1)

adult_nn.fit(adult_data['X_train'], adult_data['y_train']) 
print("validation score: ", round(adult_nn.score(adult_data['X_val'], adult_data['y_val']), 2))
test_preds = pd.Series(adult_nn.predict(adult_data['X_test']))
print("test predictions split: ")
print(test_preds.value_counts())

In [None]:
run_all("3_23_adult_nn.txt", adult_nn, adult_data, adult_categorical_features, \
        adult_categorical_names)

In [None]:
# RANDOM FOREST

adult_rf = RandomForestClassifier()

adult_rf.fit(adult_data['X_train'], adult_data['y_train']) 
print("validation score: ", round(adult_rf.score(adult_data['X_val'], adult_data['y_val']), 2))
test_preds = pd.Series(adult_rf.predict(adult_data['X_test']))
print("test predictions split: ")
print(test_preds.value_counts())

In [None]:
run_all("3_23_adult_rf.txt", adult_rf, adult_data, adult_categorical_features, \
        adult_categorical_names)

You can switch optimizers if you don't have CPLEX by setting `optimizer="cbc"`. 

A quick note: by default, the decision boundary is a score of 0. We shift it by adjusting the intercept. Since we used logistic regression, we can do this with the trick above: subtracting $\log(p / (1 - p))$ from the intercept, where $p$ is the probability threshold. In future iterations, we will provide a more elegant way of doing this.
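To make the intercept trick concrete, here is a minimal sketch in plain Python. It assumes a linear score $w^\top x + b$ passed through a sigmoid; the helper names (`logit`, `shifted_intercept`, `score`) and all numbers are illustrative and not part of the library. Subtracting $\log(p / (1 - p))$ from the intercept moves the zero crossing of the score to the point where the predicted probability equals $p$, so "score at least 0" and "probability at least $p$" become the same test.

```python
import math

def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Inverse of the sigmoid: the score at which the probability equals p."""
    return math.log(p / (1.0 - p))

def shifted_intercept(intercept, p):
    """Intercept adjusted so that a score of 0 corresponds to probability p."""
    return intercept - logit(p)

# Hypothetical model parameters and threshold, chosen only for illustration.
coefficients = [0.8, -0.5]
intercept = 0.3
p = 0.8

def score(x, b):
    """Linear score w . x + b for feature vector x and intercept b."""
    return sum(w * xi for w, xi in zip(coefficients, x)) + b

for x in ([1.0, 0.2], [3.0, 0.5], [0.0, 2.0]):
    approved_by_probability = sigmoid(score(x, intercept)) >= p
    approved_by_shifted_score = score(x, shifted_intercept(intercept, p)) >= 0.0
    # The two decisions agree for every point.
    print(x, approved_by_probability, approved_by_shifted_score)
```

The two boolean columns should match on every row, which is why passing the shifted intercept is enough to encode a threshold other than 0.5.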

In [None]:
output_1 = rb.fit()
output_1

all_info = rb.populate()
print(all_info)

Ok, great, we have a solution! This individual has recourse. The first thing of interest to us is the total cost of the actions needed to flip their prediction. It costs this person $.21$, meaning that the sum of percentile shifts across this person's features is $.21$. That's quite a lot: imagine having to shift that much relative to a population. Let's check out what this means in terms of actions:

In [None]:
# pd.Series(output_1['actions'], index=X.columns).to_frame('Actions')
actions = [x['actions'] for x in all_info]
actions_df = pd.DataFrame(data=actions).transpose().set_index(X.columns)
person = (pd.Series(x, index=X.columns))
print(person)
display(actions_df)

Ok, so let's read this. 

* `SavingsAccountBalance_geq_100`$=1$, for example. This is a binary feature, so the action can only be $1$. It means that we're encouraging this person to increase their savings.
* `LoanDuration`$=20$. This, if we recall, is the duration of the loan in months. It means we're encouraging this person to reapply with a loan repayment period that is 20 months shorter.

Let's check if these two actions make sense in the context of this person:

In [None]:
X.loc[denied_individuals[0]].to_frame("Original Features")

Ok, this person originally applied with no savings and with a 4-year repayment period. So asking them to build up savings and to decrease their loan repayment period by $20$ months makes sense.

(Let's leave aside the question of mutually exclusive features, e.g. `SavingsAccountBalance_geq_100` $=0$ together with `SavingsAccountBalance_geq_500` $=1$. We'll get back to that in later releases.)
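As a rough illustration of that issue, the sketch below checks whether a set of threshold indicators is internally consistent. It assumes that `SavingsAccountBalance_geq_500` $=1$ should imply `SavingsAccountBalance_geq_100` $=1$. The helper `is_consistent` is hypothetical and not part of the library, which (as noted) does not yet handle this case.

```python
def is_consistent(features, ordered_names):
    """Return True if no higher-threshold indicator is set while a lower one is unset.

    ordered_names lists the indicator columns from lowest to highest threshold.
    """
    values = [features[name] for name in ordered_names]
    # Valid patterns look like 1, 1, ..., 0, 0: once a 0 appears, no 1 may follow.
    return all(lower >= higher for lower, higher in zip(values, values[1:]))

# Column names taken from the text; the feature values are made up.
savings_columns = ["SavingsAccountBalance_geq_100", "SavingsAccountBalance_geq_500"]

valid_point = {"SavingsAccountBalance_geq_100": 1, "SavingsAccountBalance_geq_500": 0}
invalid_point = {"SavingsAccountBalance_geq_100": 0, "SavingsAccountBalance_geq_500": 1}

print(is_consistent(valid_point, savings_columns))
print(is_consistent(invalid_point, savings_columns))
```

A check like this could be run on `x` plus a proposed action vector to filter out actions that land on an impossible combination of indicators.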

Let's close by noting some things:

* Immutable features are __not__ changed. That's good. That's recourse.
* The changes make sense, at least directionally. We'd encourage this person to get a guarantor, to decrease their loan amount, and to decrease their loan period, among other changes.

Yes, these changes might be hard for someone. They might have other reasons for immutability that we're not considering. Maybe they _need_ that loan amount and cannot change it. Ok, let's express that:

In [None]:
action_set['LoanAmount'].mutable=False

In [None]:
x = X.values[denied_individuals[0]]

p = .8
rb = RecourseBuilder(
      optimizer="cbc",
      coefficients=coefficients,
      intercept=intercept- (np.log(p / (1. - p))),
      action_set=action_set,
      x=x
)

In [None]:
output_2 = rb.fit()
output_2

Ok, so their total cost actually didn't change, which is nice. Let's take a look at their new action set:

In [None]:
pd.Series(output_2['actions'], index=X.columns).to_frame("New Actions")

Ok, by decreasing their repayment period by a bit more and changing some other features, this person can still ask for the same amount. That's good.

The magical thing about both of these action sets is that this person, if they do this, _will_ qualify for a loan. Let's check that:

In [None]:
clf.predict_proba([X.loc[denied_individuals[0]] + pd.Series(output_1['actions'], index=X.columns)])[:, 1]

In [None]:
clf.predict_proba([X.loc[denied_individuals[0]] + pd.Series(output_2['actions'], index=X.columns)])[:, 1]

And there we have it. By making these tweaks, this person has two ways to get over the $.8$ threshold that we've set. This person can now get approved under this model.
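The same check can be sketched without the trained `clf`. The example below is a minimal, self-contained illustration that assumes a logistic model with two made-up features; the coefficients, intercept, feature values, and actions are all hypothetical and are not the ones fitted in this notebook. It adds the action vector to the original features and compares the predicted probability against the threshold before and after.

```python
import math

def predict_proba(coefficients, intercept, x):
    """Probability of the positive class under a logistic model."""
    z = sum(w * xi for w, xi in zip(coefficients, x)) + intercept
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical model over [has_savings, loan_duration_months].
coefficients = [1.5, -0.05]
intercept = 1.5
threshold = 0.8

x = [0.0, 48.0]          # original features: no savings, 48-month loan
actions = [1.0, -20.0]   # proposed changes: gain savings, shorten loan by 20 months

# Apply the actions element-wise to get the new feature vector.
x_new = [xi + ai for xi, ai in zip(x, actions)]

before = predict_proba(coefficients, intercept, x)
after = predict_proba(coefficients, intercept, x_new)

print("before:", round(before, 3), "approved:", before >= threshold)
print("after: ", round(after, 3), "approved:", after >= threshold)
```

With these illustrative numbers the probability starts below the threshold and ends above it, which is the property the two `predict_proba` cells above verify for the real model.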