# Project 1: Bank Credit
### adrianty & sondrewo

### We used the TestLending.py code as a baseline for the model development.

## Implicit assumptions of the data

The data was already labeled with some named columns in the provided TestLending script. We still decided to do some explorative testing, to see if we could use some heuristics to develop some additional understanding of the data. We found out that there were many more elements that were labelled with <em>2</em>, which is described in the documentation as <em>bad</em>, than there were those that were labelled with <em>1</em>, or <em>good</em>. Further inspection revealed that the average amount for the <em>bad</em> loans were quite a lot higher than the <em>good</em> ones. This could suggest that the data is taken from a position in time where the <em>bad</em> loans have already defaulted, and the <em>good</em> ones are in the process of being paid off.

This of course would make modelling extremely difficult, as we would have to normalize the amounts for the duration remaining. This would be extremely speculative and probably not make for good generalization, so we make the assumption that this is not the case, and that the duration and amount listed are initial values.

## Model development

We started our model development by inspecting the different columns in the data set, identifying both numerical and discrete  features. As is well known, the Naive Bayes classifier supports catergorical features natively and can be adjusted to use numerical ones as well. Thus, we formulated the following hypothesis:

H<sub>0</sub> : The Multinomial Naive Bayes classifier will provide a high accuracy

We then attempted to falsify this hypothesis (Exp 1) by testing out different models: Logistic regression, KNN, BernoulliNB, AdaBoost with Decision Tree and a simple multi-layered perceptron (MLP).

In [19]:
class NameBanker:
    def __init__(self, MODEL):
        self.model = MODEL
    
    def fit(self, X, y):
        self.data = [X, y]
        self.model.fit(X, y)

    def set_interest_rate(self, rate):
        self.rate = rate
        return

    def predict_proba(self, x):
        return self.model.predict_proba(np.array(x).reshape(1, -1))

    def expected_utility(self, x, action):
        utility = x["amount"] * ((1 + self.rate) ** x['duration'])
        U = np.matrix(f"0 0; {utility} {x['amount'] * -1}")

        probs = self.predict_proba(x)
        repaid = np.where(probs[0] == max(probs[0]))

        return U[action, repaid]

    def get_best_action(self, x):
        util = [self.expected_utility(x, a) for a in [0,1]]
        return util.index(max(util))

In [20]:
from sklearn.naive_bayes import MultinomialNB 
from sklearn.naive_bayes import BernoulliNB 
from sklearn.naive_bayes import GaussianNB 
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.neural_network import MLPClassifier
import pandas
import math
import matplotlib.pyplot as plt
import numpy as np
from sklearn.model_selection import train_test_split
import random_banker
import shap

In [21]:
PATH = "./data/credit/D_valid.csv"
features = ['checking account balance', 'duration', 'credit history',
            'purpose', 'amount', 'savings', 'employment', 'installment',
            'marital status', 'other debtors', 'residence time',
            'property', 'age', 'other installments', 'housing', 'credits',
            'job', 'persons', 'phone', 'foreign']
target = 'repaid'

df = pandas.read_csv(PATH, sep=' ',
                     names=features+[target])

In [22]:
numerical_features = ['duration', 'age', 'residence time', 'installment', 'amount', 'persons', 'credits']
quantitative_features = list(filter(lambda x: x not in numerical_features, features))
X = pandas.get_dummies(df, columns=quantitative_features, drop_first=True)
encoded_features = list(filter(lambda x: x != target, X.columns))

In [29]:
X

Unnamed: 0,duration,amount,installment,residence time,age,credits,persons,repaid,checking account balance_2,checking account balance_3,...,property_3,property_4,other installments_2,other installments_3,housing_2,housing_3,job_2,job_3,job_4,foreign_2
0,42,6522,1,33,20,0,1,1,0,0,...,0,0,0,1,0,0,0,0,1,1
1,36,29993,6,9,65,2,2,2,1,0,...,1,0,1,0,0,0,0,0,1,1
2,0,31259,2,23,121,0,4,2,0,0,...,0,0,0,1,0,0,0,0,0,0
3,54,17852,10,22,29,0,5,1,1,0,...,1,0,0,1,0,0,0,0,1,1
4,54,8292,3,15,64,2,2,1,0,0,...,0,1,0,1,0,0,0,0,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,6,39948,4,7,24,1,1,2,0,0,...,0,1,1,0,0,0,0,0,0,1
996,54,10930,2,14,33,1,3,1,1,0,...,0,0,0,0,1,0,0,1,0,1
997,102,16508,9,7,24,0,1,2,0,0,...,1,0,1,0,0,0,1,0,0,1
998,54,1814,4,7,23,0,0,1,0,0,...,0,1,0,1,0,0,0,0,1,1


In [23]:
def test_decision_maker(X_test, y_test, interest_rate, decision_maker):
    n_test_examples = len(X_test)
    utility = 0

    ## Example test function - this is only an unbiased test if the data has not been seen in training
    total_amount = 0
    total_utility = 0
    decision_maker.set_interest_rate(interest_rate)
    for t in range(n_test_examples):
        action = decision_maker.get_best_action(X_test.iloc[t])
        good_loan = y_test.iloc[t] # assume the labels are correct
        duration = X_test['duration'].iloc[t]
        amount = X_test['amount'].iloc[t]
        # If we don't grant the loan then nothing happens
        if (action==1):
            if (good_loan != 1):
                utility -= amount
            else:
                utility += amount*(pow(1 + interest_rate, duration) - 1)
        total_utility += utility
        total_amount += amount
    return utility, total_utility/total_amount

In [24]:
interest_rate = 0.017
n_tests = 100

### Do a number of preliminary tests by splitting the data in parts
def run_test(models):
    '''
    args:
        models (dict): dictionary of models to test on. key=str (name of model), value=model
    returns:
        results: dictionary of the total utility and avg investment return per n, per model
    '''
    results = {}
    for name, model in models.items():
        print(name)
        decision_maker = NameBanker(model)
        utility = 0
        investment_return = 0
        for iter in range(n_tests):
            X_train, X_test, y_train, y_test = train_test_split(X[encoded_features], X[target], test_size=0.2)
            decision_maker.set_interest_rate(interest_rate)
            decision_maker.fit(X_train, y_train)
            Ui, Ri = test_decision_maker(X_test, y_test, interest_rate, decision_maker)
            utility += Ui
            investment_return += Ri
        results[name] = [math.floor((utility / n_tests) * 100)/100.0, math.floor((investment_return / n_tests) * 100)/100.0]
    return results, decision_maker

## Exp 1: Comparing different classification models:

In [25]:
results_test, model = run_test({"MultinomialNB": MultinomialNB()})

MultinomialNB


In [31]:
results = run_test({"KNN": KNeighborsClassifier(n_neighbors=31),
                    "BernoulliNB": BernoulliNB(),
                    "MultinomialNB": MultinomialNB(),
                    "Log.regression": LogisticRegression(max_iter=1500),
                    "Neural Net": MLPClassifier(alpha=1, max_iter=1000),
                    "AdaBoost": AdaBoostClassifier()})

KNN
BernoulliNB
MultinomialNB
Log.regression
Neural Net
AdaBoost


In [15]:
pandas.DataFrame(results_test.items(), columns=["Model", "Total Utility, Avg Investment Return"])

Unnamed: 0,Model,"Total Utility, Avg Investment Return"
0,MultinomialNB,"[5321433.41, 136.65]"


### Results of Exp 1:

Based on these results, we chose to keep our hypothesis H<sub>0</sub> and continue the development using the Multinomial NB model. 

<b> Assumption 1 </b>: 
    Since the results for Multinomial NB was that much better than for KNN, we assumed that changing the amount of neighbours would not out-perform NB and decided to only test for `k=floor(sqrt(n))=31` (a common approach for K selection for KNN)

## Exp 2: Comparison with RandomBanker.py

In [36]:
def run_test_single_model(maker):
    res = []
    decision_maker = maker
    utility = 0
    investment_return = 0
    for iter in range(n_tests):
        X_train, X_test, y_train, y_test = train_test_split(X[encoded_features], X[target], test_size=0.2)
        decision_maker.set_interest_rate(interest_rate)
        decision_maker.fit(X_train, y_train)
        Ui, Ri = test_decision_maker(X_test, y_test, interest_rate, decision_maker)
        utility += Ui
        investment_return += Ri
    
    res.append(math.floor((utility / n_tests) * 100)/100.0)
    res.append(math.floor((investment_return / n_tests) * 100)/100.0)
    return res

In [37]:
comp_test = {}
comp_test["Random banker"] = run_test_single_model(random_banker.RandomBanker())
comp_test["Name banker (our model)"] = run_test_single_model(NameBanker(MultinomialNB()))

In [38]:
pandas.DataFrame(comp_test.items(), columns=["Model", "Total Utility, Avg Investment Return"])

Unnamed: 0,Model,"Total Utility, Avg Investment Return"
0,Random banker,"[918250.86, 7.29]"
1,Name banker (our model),"[4647505.22, 108.96]"


### Results of Exp 2

The table above shows that our model performed better than the random banker module

# PART 2

We want to measure fairness in our model. One way to do that is through the p-percent metric:

$$ p\%-score=min ( \frac{𝑃(𝑦̂ =1|𝑧=1)𝑃(𝑦̂ =1|𝑧=0)} {𝑃(𝑦̂ =1|𝑧=0)𝑃(𝑦̂ =1|𝑧=1)})$$ 

This allows us to measure for demographic parity. We do this because we want the rate of positive outcomes to be equal for some protected/sensitive feautres, like gender and ethinicity. To measure this, we used the `p_percent_score` function from the `sklego` package. As seen below, when checking for the foreign worker feature in our training set, we get consistent p%-scores over 80%. So what does this mean? According to Zafar et.al (2017, see source), we can say that the score satisifes the criterion of disparate treatment if the ratio between the percentage of applicants with the foreign worker attribute being granted a loan and the percentage of
non-foreign workers being granted a loan is no less than 80:100, which holds.




Source (https://arxiv.org/abs/1507.05259)


In [50]:
from sklego.metrics import p_percent_score
X_train, X_test, y_train, y_test = train_test_split(X[encoded_features], X[target], test_size=0.2)
p = ('p_percent_score:', p_percent_score(sensitive_column="foreign_2")(model.model, X_train))
p

('p_percent_score:', 0.8396797743487641)