## Jose Mijangos<br>CST 463<br>Oct 17, 2018
# Neural Net for Predicting Default on Credit Card Payment
Defaulting on credit card debt is very serious because both the credit provider and the client suffer financially as a result. Our motivation is to minimize the cost associated with clients who default by developing a neural network that predicts whether or not a client will default on their credit card payment the following month. If the neural network accurately classifies clients, then the bank can use the neural network's predictions to detect clients at high risk of defaulting before providing a loan or credit increase.
## Imported Modules

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
sns.set(style='darkgrid')

## Custom Transformers

In [2]:
# Selects columns from a dataframe
class DataFrameSelector(BaseEstimator, TransformerMixin):
    def __init__(self, columns):
        self.columns = columns
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        assert isinstance(X, pd.DataFrame)
        try:
            return X[self.columns]
        except KeyError:
            cols_error = list(set(self.columns) - set(X.columns))
            raise KeyError("The DataFrame does not include the columns: %s" % cols_error)

# Label encodes catagorical features
class MultiColumnLabelEncoder:
    def __init__(self, columns = None):
        self.columns = columns # list of column to encode
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        output = X.copy()        
        if self.columns is not None:
            for col in self.columns:
                output[col] = LabelEncoder().fit_transform(output[col])
        else:
            for colname, col in output.iteritems():
                output[colname] = LabelEncoder().fit_transform(col)        
        return output
    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)

## Load Data
The data contains no missing values, has 25 columns, and 25000 rows.

In [3]:
# Load data
data = pd.read_csv("C:/Users/Josemij39/Desktop/default_cc_train.csv")

## Derive Time Series Features
In order to get the best performance from our neural network, we derive new features from the time series data present in the data set. The code below fits a linear model to the time series data of each instance. The coefficients of every linear fit are used as new features. We will also use the mean and standard deviation of the clients pay/bill and bill*pay time series data as new features. 

In [4]:
# Select Time Series Features                   
bill_amt_subset = data.iloc[:,[12, 13, 14, 15, 16, 17]]
pay_amt_subset = data.iloc[:,[18, 19, 20, 21, 22, 23]]

# Placeholder for derived features
bill_predictors = np.array([])
pay_predictors = np.array([])

# Gets coefficients of linear bill time series fit
for index, row in bill_amt_subset.iterrows():
    fit = np.polyfit([i for i in range(1,7)], row, 1)
    
    if bill_predictors.size == 0:
        bill_predictors = fit
    else:
        bill_predictors = np.vstack([bill_predictors, fit])

# Gets coefficients of linear pay time series fit
for index, row in pay_amt_subset.iterrows():
    fit = np.polyfit([i for i in range(1,7)], row, 1)
    
    if pay_predictors.size == 0:
        pay_predictors = fit
    else:
        pay_predictors = np.vstack([pay_predictors, fit])
        
# Concatenate data and derived features
bill_amt_coeff = pd.DataFrame(bill_predictors)
pay_amt_coeff = pd.DataFrame(pay_predictors)
data = pd.concat([data, bill_amt_coeff, pay_amt_coeff], axis=1)

# Derive pay/bill features
data["pay/bill_amt_mean"] = np.mean(pay_amt_subset.T) / (np.mean(bill_amt_subset.T) + 1)
data["pay/bill_amt_sd"] = np.std(pay_amt_subset.T) / (np.std(bill_amt_subset.T) + 1)

# Derive bill*pay features
data["bill*pay_amt_mean"] = np.mean(pay_amt_subset.T) * np.mean(bill_amt_subset.T)
data["bill*bill_amt_sd"] = np.mean(pay_amt_subset.T) * np.std(bill_amt_subset.T)

## Preprocess Data
Before we use data for machine learning, we preprocess the data by imputing and scaling numeric features and label encoding catagorical features.

In [5]:
# Our pipelines need to know which features are numeric and which features are Categorical
num_features = data.columns[[1,5,12,13,14,15,16,17,18,19,20,21,22,23,25,26,27,28,29,30,31,32]]
cat_features = data.columns[[2,3,4,6,7,8,9,10,11]]

# Numeric pipeline
num_pipeline = Pipeline([
  ("selector", DataFrameSelector(num_features)),
  ("remove_nas", SimpleImputer(strategy="median")),
  ("z-scaling", StandardScaler())
])

# Categorical pipeline
cat_pipeline = Pipeline([
  ('selector', DataFrameSelector(cat_features)),
  ('labeler', MultiColumnLabelEncoder()),
  ('encoder', OneHotEncoder(sparse = False, categories='auto')),
])

# Transform and store data as X then store class labels as y
X = np.concatenate((num_pipeline.fit_transform(data), cat_pipeline.fit_transform(data)), 1)
y = data["default.payment.next.month"].values

## Split Data for Machine Learning
We split the data 50/50 to ensure a somewhat even distribution of Positive and Negative classes in the training and validation set.

In [27]:
# The neural network will learn from the training set and we will use the validation set to gauge the models performance
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.5)

# Displayes distribution of default in training set
print("\nDefault distribution in training set")
print("\tPositives: ", np.sum(y_train == 1))
print("\tNegatives: ", np.sum(y_train == 0), "\n")

# Displayes distribution of default in validation set
print("Default distribution in validation set")
print("\tPositives: ", np.sum(y_valid == 1))
print("\tNegatives: ", np.sum(y_valid == 0))


Default distribution in training set
	Positives:  2713
	Negatives:  9787 

Default distribution in validation set
	Positives:  2801
	Negatives:  9699


## Neural Network with plain TensorFlow and Batch Normalization
We begin by defining the shape of the neural network and placeholders for mini batch gradient descent. We also define a function that creates neuron layers. After constructing the neural network, we define the loss function, a performance measure, and then execute training and validation of the neural network.

In [29]:
tf.reset_default_graph()

# Input layer one neuron for each feature in X
n_inputs = X.shape[1]

# Our neural net has ten hidden layers with geometrically decaying number of neurons
n_hidden1 = 512
n_hidden2 = 512
n_hidden3 = 256
n_hidden4 = 256
n_hidden5 = 128
n_hidden6 = 128
n_hidden7 = 64
n_hidden8 = 64
n_hidden9 = 32
n_hidden10 = 32

# There are two possible classes so the output layer has two neurons
n_outputs = 2

# These placeholders will be used to store mini batches from the training set 
X_hold = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X_hold")
y_hold = tf.placeholder(tf.int32, shape=(None), name="y_hold")

# Defines a neuron layer by its input, number of neurons, name, and activation function
def neuron_layer(X, n_neurons, name, activation=None):
    with tf.name_scope(name):
        n_inputs = int(X.get_shape()[1])
        stddev = np.std(y) + 1
        init = tf.truncated_normal((n_inputs, n_neurons), stddev=stddev)
        W = tf.Variable(init, name="kernel")
        b = tf.Variable(tf.zeros([n_neurons]), name="bias")
        Z = tf.matmul(X, W) + b
        if activation is not None:
            Z = tf.layers.batch_normalization(Z, True)
            return activation(Z)
        else:
            return Z

# Construction of the neural network
with tf.name_scope("dnn"):
    hidden1 = neuron_layer(X_hold, n_hidden1, name="hidden1", activation=tf.nn.elu)
    hidden2 = neuron_layer(hidden1, n_hidden2, name="hidden2", activation=tf.nn.elu)
    hidden3 = neuron_layer(hidden2, n_hidden3, name="hidden3", activation=tf.nn.elu)
    hidden4 = neuron_layer(hidden3, n_hidden4, name="hidden4", activation=tf.nn.elu)
    hidden5 = neuron_layer(hidden4, n_hidden5, name="hidden5", activation=tf.nn.elu)
    hidden6 = neuron_layer(hidden5, n_hidden6, name="hidden6", activation=tf.nn.elu)
    hidden7 = neuron_layer(hidden6, n_hidden7, name="hidden7", activation=tf.nn.elu)
    hidden8 = neuron_layer(hidden7, n_hidden8, name="hidden8", activation=tf.nn.elu)
    hidden9 = neuron_layer(hidden8, n_hidden9, name="hidden9", activation=tf.nn.elu)
    hidden10 = neuron_layer(hidden9, n_hidden10, name="hidden10", activation=tf.nn.elu)
    logits = neuron_layer(hidden10, n_outputs, name="outputs", activation=tf.nn.elu)

# Define loss function to train the neural network
with tf.name_scope("loss"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y_hold, logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")
    
# Set up AdamOptimizer and learning rate
learning_rate = 0.00115
with tf.name_scope("train"):
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    training_op = optimizer.minimize(loss)

# Set up performance measure
with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y_hold, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))


init = tf.global_variables_initializer()

# Training hyperparameters
n_epochs = 11
batch_size = 50

# Returns mini batches from the training set
def shuffle_batch(X, y, batch_size):
    rnd_idx = np.random.permutation(len(X))
    n_batches = 10
    for batch_idx in np.array_split(rnd_idx, n_batches):
        X_batch, y_batch = X[batch_idx], y[batch_idx]
        yield X_batch, y_batch

# Execute training and validation
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
            sess.run(training_op, feed_dict={X_hold: X_batch, y_hold: y_batch})
        acc_batch = accuracy.eval(feed_dict={X_hold: X_batch, y_hold: y_batch})
        acc_val = accuracy.eval(feed_dict={X_hold: X_valid, y_hold: y_valid})
        print(epoch, "Batch accuracy:", acc_batch, "Val accuracy:", acc_val)

0 Batch accuracy: 0.8656 Val accuracy: 0.85192
1 Batch accuracy: 0.9872 Val accuracy: 0.99056
2 Batch accuracy: 1.0 Val accuracy: 0.99904
3 Batch accuracy: 1.0 Val accuracy: 0.99992
4 Batch accuracy: 1.0 Val accuracy: 0.99992
5 Batch accuracy: 1.0 Val accuracy: 1.0
6 Batch accuracy: 1.0 Val accuracy: 1.0
7 Batch accuracy: 1.0 Val accuracy: 1.0
8 Batch accuracy: 1.0 Val accuracy: 1.0
9 Batch accuracy: 1.0 Val accuracy: 1.0
10 Batch accuracy: 1.0 Val accuracy: 1.0


Amazingly the neural network defined in plain TensorFlow converges to 100% accuracy in less than 100 steps! The use of derived features and proper random initialization of weights must be the cause of the excellent performance.

## TF.Learn Neural Network with He Initialization and Dropout
Let us see if we can replicate the performance of the last neural network with a new neural network defined with TF.Learn. The setup is similar to before except we will use tf.layer.dense instead of our own function to create neuron layers.

In [44]:
tf.reset_default_graph()

# Input layer one neuron for each feature in X
n_inputs = X.shape[1]

# Our neural net has ten hidden layers with geometrically decaying number of neurons
n_hidden1 = 512
n_hidden2 = 512
n_hidden3 = 256
n_hidden4 = 256
n_hidden5 = 128
n_hidden6 = 128
n_hidden7 = 64
n_hidden8 = 64
n_hidden9 = 32
n_hidden10 = 32

# There are two possible classes so the output layer has two neurons
n_outputs = 2

# These placeholders will be used to store mini batches from the training set 
X_hold = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X_hold")
y_hold = tf.placeholder(tf.int32, shape=(None), name="y_hold")

training = tf.placeholder_with_default(False, shape=(), name='training')

# Define dropout rate
dropout_rate = 0.5
X_drop = tf.layers.dropout(X, dropout_rate, training=training)

# Construction of the neural network with he initalization and drop out
with tf.name_scope("dnn"):
    he_init = tf.variance_scaling_initializer()
    hidden1 = tf.layers.dense(X_hold, n_hidden1, name="hidden1", activation=tf.nn.elu, kernel_initializer=he_init)
    hidden1_drop = tf.layers.dropout(hidden1, dropout_rate, training=training)
    hidden2 = tf.layers.dense(hidden1, n_hidden2, name="hidden2", activation=tf.nn.elu, kernel_initializer=he_init)
    hidden2_drop = tf.layers.dropout(hidden2, dropout_rate, training=training)
    hidden3 = tf.layers.dense(hidden2, n_hidden3, name="hidden3", activation=tf.nn.elu, kernel_initializer=he_init)
    hidden3_drop = tf.layers.dropout(hidden3, dropout_rate, training=training)
    hidden4 = tf.layers.dense(hidden3, n_hidden4, name="hidden4", activation=tf.nn.elu, kernel_initializer=he_init)
    hidden4_drop = tf.layers.dropout(hidden4, dropout_rate, training=training)
    hidden5 = tf.layers.dense(hidden4, n_hidden5, name="hidden5", activation=tf.nn.elu, kernel_initializer=he_init)
    hidden5_drop = tf.layers.dropout(hidden5, dropout_rate, training=training)
    hidden6 = tf.layers.dense(hidden5, n_hidden6, name="hidden6", activation=tf.nn.elu, kernel_initializer=he_init)
    hidden6_drop = tf.layers.dropout(hidden6, dropout_rate, training=training)
    hidden7 = tf.layers.dense(hidden6, n_hidden7, name="hidden7", activation=tf.nn.elu, kernel_initializer=he_init)
    hidden7_drop = tf.layers.dropout(hidden7, dropout_rate, training=training)
    hidden8 = tf.layers.dense(hidden7, n_hidden8, name="hidden8", activation=tf.nn.elu, kernel_initializer=he_init)
    hidden8_drop = tf.layers.dropout(hidden8, dropout_rate, training=training)
    hidden9 = tf.layers.dense(hidden8, n_hidden9, name="hidden9", activation=tf.nn.elu, kernel_initializer=he_init)
    hidden9_drop = tf.layers.dropout(hidden9, dropout_rate, training=training)
    hidden10 = tf.layers.dense(hidden9, n_hidden10, name="hidden10", activation=tf.nn.elu, kernel_initializer=he_init)
    hidden10_drop = tf.layers.dropout(hidden10, dropout_rate, training=training)
    logits = tf.layers.dense(hidden10, n_outputs, name="outputs", activation=tf.nn.elu, kernel_initializer=he_init)
    y_proba = tf.nn.softmax(logits)

# Define loss function to train the neural network
with tf.name_scope("loss"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y_hold, logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")
    
# Set up AdamOptimizer and learning rate
learning_rate = 0.0115
with tf.name_scope("train"):
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    training_op = optimizer.minimize(loss)

# Set up performance measure
with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y_hold, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

# Intilize TensorFlow variables
init = tf.global_variables_initializer()

# Training hyperparameters
n_epochs = 11
batch_size = 50

# Returns mini batches from the training set
def shuffle_batch(X, y, batch_size):
    rnd_idx = np.random.permutation(len(X))
    n_batches = 10
    for batch_idx in np.array_split(rnd_idx, n_batches):
        X_batch, y_batch = X[batch_idx], y[batch_idx]
        yield X_batch, y_batch

# Execute training and validation
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
            sess.run(training_op, feed_dict={X_hold: X_batch, y_hold: y_batch, training: True})
        acc_batch = accuracy.eval(feed_dict={X_hold: X_batch, y_hold: y_batch})
        acc_val = accuracy.eval(feed_dict={X_hold: X_valid, y_hold: y_valid})
        print(epoch, "Batch accuracy:", acc_batch, "Val accuracy:", acc_val)

0 Batch accuracy: 0.8504 Val accuracy: 0.85128
1 Batch accuracy: 0.988 Val accuracy: 0.99168
2 Batch accuracy: 1.0 Val accuracy: 0.99968
3 Batch accuracy: 1.0 Val accuracy: 1.0
4 Batch accuracy: 1.0 Val accuracy: 1.0
5 Batch accuracy: 1.0 Val accuracy: 1.0
6 Batch accuracy: 1.0 Val accuracy: 1.0
7 Batch accuracy: 1.0 Val accuracy: 1.0
8 Batch accuracy: 1.0 Val accuracy: 1.0
9 Batch accuracy: 1.0 Val accuracy: 1.0
10 Batch accuracy: 1.0 Val accuracy: 1.0


Eureka! We did it again and in even fewer steps too! As we saw earlier there is a good distribution of clients in the Positive and Negative class within the validation set. So our neural network must be correctly classifying all the clients in order to score 100% accuracy on the validation set.