### <font color="darkblue"><i>A predictive model that estimates whether a customer will accept a loan offer. The model uses only features that are known at the time an offer is sent. Once its performance metrics (e.g. accuracy) have been assessed and judged adequate, employees can consult the model before sending actual offers. If the model predicts with high probability that an offer will not be accepted (i.e. will be refused and/or cancelled), the employee can choose to change the offer, not send it, or send multiple offers. This can reduce the overhead and resources the company spends on the offer sending process.</i></font>

### <font color="green">Imports, preparation and configuration</font>

In [1]:
import tensorflow as tf
import pandas as pd
import numpy as np
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn import preprocessing, metrics
from sklearn.pipeline import Pipeline
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
from IPython.display import HTML
HTML('''<script>
code_show_err=false; 
function code_toggle_err() {
 if (code_show_err){
 $('div.output_stderr').hide();
 } else {
 $('div.output_stderr').show();
 }
 code_show_err = !code_show_err
} 
$( document ).ready(code_toggle_err);
</script>
To toggle on/off output_stderr, click <a href="javascript:code_toggle_err()">here</a>.''')

In [3]:
offers = pd.read_table("../../data/laraData/offer_events.csv", sep=";")
applications = pd.read_table("../../data/laraData/application_events.csv", sep=";")
# all_events = pd.read_table("../data/laraData/events.csv", sep=";") not needed at the moment
offers.head()

Unnamed: 0,A_ID,O_ID,O_SCENARIO,O_DT_CREATED,O_AMOUNT,O_CREDITSCORE,O_FIRST_WITHDRAWL_AMOUNT,NUMBER_OF_TERMS,O_MONTHLY_COST,O_ACCEPTED,...,NUMBER_OF_CHILDREN,MARITAL_STATUS,SEX,HOUSING_TYPE,INCOMEAMOUNT_YEAR,SCE_CREDITSCORE,SCE_ACCEPTED,SCE_ACCEPTED_OVERRIDE,SCE_MAX_AMOUNT,SCE_MAX_AMOUNT_OVERRIDE
0,719696,210708,1367858,"2/1/2017 10:38:26,536000",29000.0,784.0,26604.0,120,29529,1,...,0,Samenwonend,Vrouw,Koophuis,41088.0,784.0,1.0,1.0,49340.0,49340.0
1,719696,210708,1367858,"2/1/2017 10:38:26,536000",29000.0,784.0,26604.0,120,29529,1,...,0,Samenwonend,Vrouw,Koophuis,41088.0,784.0,1.0,1.0,49340.0,49340.0
2,719696,210708,1367858,"2/1/2017 10:38:26,536000",29000.0,784.0,26604.0,120,29529,1,...,0,Samenwonend,Vrouw,Koophuis,41088.0,784.0,1.0,1.0,49340.0,49340.0
3,719696,210708,1367858,"2/1/2017 10:38:26,536000",29000.0,784.0,26604.0,120,29529,1,...,0,Samenwonend,Vrouw,Koophuis,41088.0,784.0,1.0,1.0,49340.0,49340.0
4,719697,210793,1367861,"2/1/2017 16:08:07,608000",10000.0,,10000.0,44,24958,1,...,0,Alleenstaand,Man,Thuiswonend/ Inwonend,21600.0,708.0,1.0,1.0,235875.0,235875.0


O_SELECTED tells us whether an offer was accepted (1) or refused/cancelled (0).

In [4]:
made_offers = offers[pd.notnull(offers.O_SELECTED)].copy()  # copy, so the column assignments below do not write to a slice of offers
made_offers.shape

(118931, 30)

In [5]:
len(made_offers.O_ID.unique())

33401

In [6]:
made_offers['OS_DT_CREATED'] =  pd.to_datetime(made_offers['OS_DT_CREATED'], format = "%d/%m/%Y %H:%M:%S,%f")
made_offers['O_DT_CREATED'] =  pd.to_datetime(made_offers['O_DT_CREATED'], format = "%d/%m/%Y %H:%M:%S,%f")

In [7]:
made_offers["time_diff_since_created"] =  (made_offers['O_DT_CREATED'] -
                                             made_offers['OS_DT_CREATED']).abs()  / np.timedelta64(1, 'D')

In [8]:
made_offers = (made_offers.loc[
    (made_offers.event == "O_Accepted") | 
    (made_offers.event == "O_Refused") |
    ((made_offers.event == "O_Cancelled") & (made_offers.OS_USER_CREATED != 'USER_1'))])

In [9]:
len(made_offers.O_ID.unique()), made_offers.shape

(26310, (26310, 31))

Features that are going to be considered:
* Marital status (e.g. married, single, divorced, widowed)
* Housing type (e.g. owner occupied property, rental property etc)
* Yearly income amount (if married and/or living together: the partners' yearly income)
* Number of children 
* Sex (Male, Female)
* Age (2017 - birthyear)
* <b>O_Amount</b> (amount offered by the bank; it need not equal the amount requested by the applicant, although it often does)
* <b>Number of terms</b> (relevant to the applicant)
* <b>Monthly cost</b> (relevant to the applicant)
* Offered amount as a percentage of yearly income: (amount / year_income) * 100
* Yearly cost (12 * monthly cost) as a percentage of yearly income
* Difference between the offered amount and the initial requested amount

The attributes in bold are the ones the bank can vary from offer to offer (i.e. the variables of the offer). Some other attributes are related to them (e.g. the last three), while the rest are simply customer data. The hypothesis is that (groups of) customers differ in the cognitive mechanisms behind their decision making when faced with an offer.

Construct age attribute from the applicant's birthyear 

In [10]:
made_offers["age"]= 2017 - made_offers.BIRTHYEAR

Offered amount as a percentage of yearly income ((offered_amount / year_income) * 100)

In [11]:
made_offers["percentage_amount_income"] = (made_offers.O_AMOUNT / made_offers.INCOMEAMOUNT_YEAR) * 100

Yearly cost (12 * monthly cost) as a percentage of yearly income

In [12]:
made_offers["percentage_cost_income"] = ((made_offers.O_MONTHLY_COST * 12) / made_offers.INCOMEAMOUNT_YEAR) * 100
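Both ratio features divide by the yearly income, so a row with an income of 0 would produce `inf`, which `dropna` does not treat as missing. Whether the data contains such rows is not shown here. The sketch below, on a toy frame whose first two rows reuse the amounts and incomes printed by `offers.head()` and whose third row is a hypothetical zero-income case, shows one way to turn those ratios into `NaN` instead.

```python
import numpy as np
import pandas as pd

# Toy data: rows 0 and 1 reuse values from offers.head(); row 2 is hypothetical.
toy = pd.DataFrame({
    "O_AMOUNT": [29000.0, 10000.0, 5000.0],
    "INCOMEAMOUNT_YEAR": [41088.0, 21600.0, 0.0],
})

# Replace zero incomes by NaN so the ratio becomes NaN rather than inf.
safe_income = toy["INCOMEAMOUNT_YEAR"].replace(0, np.nan)
toy["percentage_amount_income"] = toy["O_AMOUNT"] / safe_income * 100

print(toy.round(2))
```

Rows whose ratio is `NaN` can then be handled by the same `dropna` call that is applied to the other features.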

Enrich offer data with application data (e.g. the applicant's initial requested amount, the application type (e.g. new credit, raise limit) and the application's loan goal (e.g. car, home improvements)).

In [13]:
unique_applications = applications.drop_duplicates(subset = 'A_ID')
unique_applications = unique_applications[['A_ID', 'A_INIT_REQ_AMT', 'A_APPLICATIONTYPE_DESC', 'A_LOANGOALTYPE_DESC']]

Enrich offer data with the initial requested amount (not necessarily equal to the offered amount), application type and loan goal

In [14]:
made_offers = made_offers.merge(unique_applications, on = "A_ID")
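`merge` defaults to an inner join, so offers whose `A_ID` has no matching application row are dropped silently. A minimal sketch of how this could be checked, using toy frames with made-up IDs and amounts: `how="left"` keeps every offer, `validate="many_to_one"` raises if an `A_ID` appears more than once on the application side, and `indicator=True` adds a `_merge` column that marks unmatched offers.

```python
import pandas as pd

# Toy frames (hypothetical values): offer 13 has no matching application.
offers_toy = pd.DataFrame({"A_ID": [1, 1, 2, 3], "O_ID": [10, 11, 12, 13]})
apps_toy = pd.DataFrame({"A_ID": [1, 2], "A_INIT_REQ_AMT": [5000.0, 12000.0]})

merged = offers_toy.merge(
    apps_toy,
    on="A_ID",
    how="left",              # keep all offers, matched or not
    validate="many_to_one",  # fail if A_ID is duplicated in apps_toy
    indicator=True,          # adds the _merge column
)

# Offers without an application row are flagged as "left_only".
unmatched = merged[merged["_merge"] == "left_only"]
print(len(offers_toy), len(merged), len(unmatched))
```

Comparing the row counts before and after the merge shows how many offers an inner join would have discarded.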

Calculate the difference between the offered amount and the initial requested amount

In [15]:
made_offers["diff_req_and_offered_amount"] = abs(made_offers.A_INIT_REQ_AMT - made_offers.O_AMOUNT)

### <font color="green">Configure classification model</font>

Target attribute (i.e. label): O_SELECTED

In [16]:
income = tf.contrib.layers.real_valued_column('INCOMEAMOUNT_YEAR')
age = tf.contrib.layers.real_valued_column("age")
o_amount = tf.contrib.layers.real_valued_column('O_AMOUNT')
number_of_terms = tf.contrib.layers.real_valued_column('NUMBER_OF_TERMS')
monthly_cost = tf.contrib.layers.real_valued_column('O_MONTHLY_COST')
diff_req_and_offered_amount = tf.contrib.layers.real_valued_column("diff_req_and_offered_amount")

In [17]:
income_buckets = tf.contrib.layers.bucketized_column(income, boundaries=[0, 19982, 33791, 67072, 1250000])

In [18]:
age_buckets = tf.contrib.layers.bucketized_column(age, boundaries=[18, 23, 28, 33, 38, 42, 47, 52, 57, 65, 75])
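To see which bucket an age falls into, the boundaries can be tried out with NumPy. This is only an illustration: it assumes the column uses left-inclusive buckets, i.e. `(-inf, 18)`, `[18, 23)`, ..., `[75, +inf)`, which is the convention documented for `tf.feature_column.bucketized_column`. The sample ages are made up.

```python
import numpy as np

# Same boundaries as age_buckets above; 11 boundaries give 12 buckets (ids 0..11).
age_boundaries = [18, 23, 28, 33, 38, 42, 47, 52, 57, 65, 75]

# Hypothetical ages chosen to hit the edges of the first and last buckets.
ages = np.array([17, 18, 22, 23, 80])

# np.digitize returns i such that boundaries[i-1] <= age < boundaries[i].
bucket_ids = np.digitize(ages, age_boundaries)
print(dict(zip(ages.tolist(), bucket_ids.tolist())))
```

Under that assumption, an applicant aged exactly 23 lands in the second age band rather than the first, and everyone aged 75 or older shares the last bucket.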

In [19]:
loanGoal = tf.contrib.layers.sparse_column_with_keys(column_name = 'A_LOANGOALTYPE_DESC',
                                                                       keys = ['Auto', 'Woningverbetering', 'Overname lopende leningen',
                                                                            'Tourcaravan / camper', 'Anders, zie toelichting', 'nan',
                                                                            'Extra Bestedingsruimte', 'Niet gespecificeerd',
                                                                            'Restschuld woning', 'Boot', 'Motor', 'Belastingbetalingen',
                                                                            'Zakelijk doel (no go)', 'Schuldsanering (no go)',
                                                                            'Verbouwing aan het huis', 'verbouwing meubels/interieur',
                                                                            'Verbouwing aan tuin', 'Zakelijk pand (no go)', 'Onderhandse lening'])
applicationType = tf.contrib.layers.sparse_column_with_keys(column_name = 'A_APPLICATIONTYPE_DESC',
                                                                        keys = ['Nieuw krediet', 'Limietverhoging'])
housingType = tf.contrib.layers.sparse_column_with_keys(column_name = 'HOUSING_TYPE',
                                                                       keys = ['Koophuis', 'Thuiswonend/ Inwonend', 'Huurhuis / Kamers'])
gender = tf.contrib.layers.sparse_column_with_keys(column_name = 'SEX', keys = ['Man', 'Vrouw'])
marital_status = tf.contrib.layers.sparse_column_with_keys(column_name = 'MARITAL_STATUS', keys = ['Samenwonend', 'Alleenstaand',
                                                                                              'Gehuwd', 'Geregistreerd partnerschap'])
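Each key list acts as a fixed vocabulary. According to the estimator repr printed further down (`num_oov_buckets=0`, `default_value=-1`), values that are not listed get no ID of their own. The feature list above mentions divorced and widowed applicants, yet the `MARITAL_STATUS` keys contain only four entries, so it may be worth checking which values in the data are not covered. A minimal sketch with a hypothetical helper `uncovered_values` and a toy series (the value `'Gescheiden'` is made up for illustration):

```python
import pandas as pd

def uncovered_values(series, keys):
    """Return the sorted distinct non-null values of `series` that are not in `keys`."""
    return sorted(set(series.dropna().unique()) - set(keys))

# Keys copied from the marital_status column definition above.
marital_keys = ['Samenwonend', 'Alleenstaand', 'Gehuwd', 'Geregistreerd partnerschap']

# Toy data: one value outside the vocabulary and one missing value.
toy_status = pd.Series(['Gehuwd', 'Alleenstaand', 'Gescheiden', None, 'Gehuwd'])

print(uncovered_values(toy_status, marital_keys))
```

The same call could be run per categorical column on `made_offers`, e.g. `uncovered_values(made_offers.MARITAL_STATUS, marital_keys)`, before training.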

In [20]:
wide_columns = [loanGoal, applicationType, housingType, gender, marital_status, age_buckets, income_buckets]
deep_columns = [
    tf.contrib.layers.embedding_column(loanGoal, dimension=8),
    tf.contrib.layers.embedding_column(applicationType, dimension=8),
    tf.contrib.layers.embedding_column(housingType, dimension=8),
    tf.contrib.layers.embedding_column(gender, dimension=8),
    tf.contrib.layers.embedding_column(marital_status, dimension=8),
    income, age, o_amount, number_of_terms, monthly_cost, diff_req_and_offered_amount
]



In [21]:
m = tf.contrib.learn.DNNLinearCombinedClassifier (linear_feature_columns = wide_columns,
                                                 dnn_feature_columns = deep_columns, dnn_hidden_units = [32, 16, 32])

Instructions for updating:
Please set fix_global_step_increment_bug=True and update training steps in your pipeline. See pydoc for details.
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_task_type': None, '_task_id': 0, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x0000025A586C69B0>, '_master': '', '_num_ps_replicas': 0, '_num_worker_replicas': 0, '_environment': 'local', '_is_chief': True, '_evaluation_master': '', '_tf_config': gpu_options {
  per_process_gpu_memory_fraction: 1
}
, '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_secs': 600, '_log_step_count_steps': 100, '_session_config': None, '_save_checkpoints_steps': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_model_dir': 'C:\\Users\\s158881\\AppData\\Local\\Temp\\tmp9hdipdjw'}


In [22]:
COLUMNS = ['INCOMEAMOUNT_YEAR', 'age', 'O_AMOUNT', 'NUMBER_OF_TERMS', 'O_MONTHLY_COST', 'diff_req_and_offered_amount', 
          'A_LOANGOALTYPE_DESC', 'A_APPLICATIONTYPE_DESC', 'HOUSING_TYPE', 'SEX', 'MARITAL_STATUS', 'O_SELECTED']
LABEL = ['O_SELECTED']
CATEGORICAL = ['A_LOANGOALTYPE_DESC', 'A_APPLICATIONTYPE_DESC', 'HOUSING_TYPE', 'SEX', 'MARITAL_STATUS']
CONTINUOUS = ['INCOMEAMOUNT_YEAR', 'age', 'O_AMOUNT', 'NUMBER_OF_TERMS', 'O_MONTHLY_COST', 'diff_req_and_offered_amount']

In [23]:
made_offers = made_offers[COLUMNS]  # same column list as defined in COLUMNS above
made_offers = made_offers.dropna(how = "any")
made_offers.shape

(24603, 12)

In [24]:
# made_offers = made_offers.apply(preprocessing.LabelEncoder().fit_transform)

In [25]:
train, test = train_test_split(made_offers, test_size = 0.2, random_state = 700)

In [26]:
def input_fn(df):
    # Convert a DataFrame into (dict of feature tensors, label tensor).
    # Continuous features become dense column vectors of shape [num_rows, 1].
    continuous_cols = {k: tf.constant(df[k].values,
                                     shape = [df[k].size, 1])
                      for k in CONTINUOUS}

    # Categorical features become sparse string tensors with one value per row
    # (df[k].size is the number of rows).
    categorical_cols = {
        k: tf.SparseTensor(
            indices=[[i,0] for i in range(df[k].size)],
            values=df[k].values,
            dense_shape=[df[k].size,1]) for k in CATEGORICAL}
    feature_cols = dict(continuous_cols)
    feature_cols.update(categorical_cols)
    label = tf.constant(df[LABEL].values)
    return feature_cols, label
def train_input_fn():
    return input_fn(train)
def test_input_fn():
    return input_fn(test)

In [27]:
m.fit(input_fn = train_input_fn, steps = 1000)

Instructions for updating:
Please switch to tf.summary.scalar. Note that tf.summary.scalar uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, passing a tensor or list of tags to a scalar summary op is no longer supported.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Saving checkpoints for 2 into C:\Users\s158881\AppData\Local\Temp\tmp9hdipdjw\model.ckpt.
INFO:tensorflow:loss = 618.522, step = 2
INFO:tensorflow:global_step/sec: 21.9629
INFO:tensorflow:global_step/sec: 23.0841
INFO:tensorflow:loss = 1.02839, step = 202 (8.881 sec)
INFO:tensorflow:global_step/sec: 25.0329
INFO:tensorflow:global_step/sec: 24.1263
INFO:tensorflow:loss = 0.658075, step = 402 (8.141 sec)
INFO:tensorflow:global_step/sec: 24.457
INFO:tensorflow:global_step/sec: 24.7477
INFO:tensorflow:loss = 0.624959, step = 602 (8.130 sec)
INFO:tensorflow:global_step/sec: 24.2081
INFO:tensorflow:global_step

DNNLinearCombinedClassifier(params={'head': <tensorflow.contrib.learn.python.learn.estimators.head._BinaryLogisticHead object at 0x0000025A5BEB6E10>, 'linear_feature_columns': (_SparseColumnKeys(column_name='A_LOANGOALTYPE_DESC', is_integerized=False, bucket_size=None, lookup_config=_SparseIdLookupConfig(vocabulary_file=None, keys=('Auto', 'Woningverbetering', 'Overname lopende leningen', 'Tourcaravan / camper', 'Anders, zie toelichting', 'nan', 'Extra Bestedingsruimte', 'Niet gespecificeerd', 'Restschuld woning', 'Boot', 'Motor', 'Belastingbetalingen', 'Zakelijk doel (no go)', 'Schuldsanering (no go)', 'Verbouwing aan het huis', 'verbouwing meubels/interieur', 'Verbouwing aan tuin', 'Zakelijk pand (no go)', 'Onderhandse lening'), num_oov_buckets=0, vocab_size=19, default_value=-1), combiner='sum', dtype=tf.string), _SparseColumnKeys(column_name='A_APPLICATIONTYPE_DESC', is_integerized=False, bucket_size=None, lookup_config=_SparseIdLookupConfig(vocabulary_file=None, keys=('Nieuw kredi

In [28]:
result = m.evaluate(input_fn=test_input_fn, steps=1)
for i, key in enumerate(sorted(result)):
    print ("Term%d, %s: %s" %(i,key, result[key]))

Instructions for updating:
Please switch to tf.summary.scalar. Note that tf.summary.scalar uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, passing a tensor or list of tags to a scalar summary op is no longer supported.
INFO:tensorflow:Starting evaluation at 2017-12-11-13:35:57
INFO:tensorflow:Restoring parameters from C:\Users\s158881\AppData\Local\Temp\tmp9hdipdjw\model.ckpt-1000
INFO:tensorflow:Evaluation [1/1]
INFO:tensorflow:Finished evaluation at 2017-12-11-13:35:59
INFO:tensorflow:Saving dict for global step 1000: accuracy = 0.678927, accuracy/baseline_label_mean = 0.681162, accuracy/threshold_0.500000_mean = 0.678927, auc = 0.547812, auc_precision_recall = 0.726101, global_step = 1000, labels/actual_label_mean = 0.681162, labels/prediction_mean = 0.683432, loss = 0.776906, precision/positive_threshold_0.500000_mean = 0.681186, recall/positive_threshold_0.500000_mean = 0.
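The logged accuracy is easier to judge next to a trivial baseline. The sketch below uses only the two numbers from the log above and compares the model with a classifier that always predicts the majority class; the variable names are illustrative.

```python
# Values copied from the evaluation log above.
accuracy = 0.678927
label_mean = 0.681162  # fraction of test offers with O_SELECTED == 1

# Accuracy obtained by always predicting the more frequent class.
majority_baseline = max(label_mean, 1.0 - label_mean)

# Positive lift means the model beats the trivial baseline.
lift = accuracy - majority_baseline
print("baseline=%.6f lift=%.6f" % (majority_baseline, lift))
```

Read this way, the model after 1000 steps does not yet beat the majority-class baseline, which is consistent with the logged AUC of 0.547812 being close to 0.5.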