# Predicting Whether a Borrower Will Repay a Loan Using Neural Networks

## The Data

We will be using a subset of the LendingClub DataSet obtained from Kaggle: https://www.kaggle.com/wordsforthewise/lending-club


LendingClub is a US peer-to-peer lending company, headquartered in San Francisco, California.[3] It was the first peer-to-peer lender to register its offerings as securities with the Securities and Exchange Commission (SEC), and to offer loan trading on a secondary market. LendingClub is the world's largest peer-to-peer lending platform.

### Our Goal

Given historical data on loans, including whether or not the borrower defaulted (charge-off), can we build a model that predicts whether or not a borrower will pay back their loan? With such a model, we can assess how likely a new potential customer is to repay before lending to them. Keep classification metrics in mind when evaluating the performance of your model!

The "loan_status" column contains our label.
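The label is stored as text, so it has to be turned into a number before a network can train on it. A minimal sketch of that step, assuming the two status values are `Fully Paid` and `Charged Off` and that the binary column is named `loan_repaid` (the name used later in this notebook); the toy frame is made up for illustration:

```python
import pandas as pd

# Toy stand-in for the real data; only the 'loan_status' column matters here.
toy = pd.DataFrame({"loan_status": ["Fully Paid", "Charged Off", "Fully Paid"]})

# Map the text label to 1 (repaid) / 0 (charged off).
toy["loan_repaid"] = toy["loan_status"].map({"Fully Paid": 1, "Charged Off": 0})

# Share of each class; an uneven split affects how accuracy should be read later.
class_share = toy["loan_repaid"].value_counts(normalize=True)
print(class_share)
```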

### Data Overview

In [21]:
# importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# might be needed depending on your version of Jupyter
%matplotlib inline

In [31]:
# Data is already pre-processed (feature engineering and exploratory data analysis are done)
import pickle
with open('my.pickle', 'rb') as data:
    df1=pickle.load(data)

## Train Test Split

**Split the data with scikit-learn, then normalise the features with MinMaxScaler (fitted on the training set only).**

In [33]:
# X holds every column except loan_repaid, which is what we want to predict
X = df1.drop('loan_repaid',axis=1).values
# y is our target (label): 'loan_repaid'
y = df1['loan_repaid'].values
# split X and y into 80% training and 20% testing
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=101)

In [34]:
# scale the features; the scaler is fitted on the training data only to avoid test-data leakage
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()

In [35]:
X_train=scaler.fit_transform(X_train)

In [36]:
X_test=scaler.transform(X_test)
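To see why the scaler is fitted on the training split only, here is a small sketch with made-up one-feature data. The test rows are scaled with the training minimum and maximum, so they can land outside the 0 to 1 range; that is expected and means no information from the test set has leaked into preprocessing.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up one-feature data for illustration.
train = np.array([[10.0], [20.0], [30.0]])
test = np.array([[5.0], [40.0]])

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train)  # learns min=10 and max=30 from train only
test_scaled = scaler.transform(test)        # reuses those training statistics

print(train_scaled.ravel())  # spans 0 to 1
print(test_scaled.ravel())   # about -0.25 and 1.5, i.e. outside the training range
```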

<h2> Quick Note on our Neural Network Build </h2>
<br>We build a Keras Sequential model.
<br>The dense layers shrink step by step: 78 units --> 39 units --> 19 units --> 1 output neuron.
<br>A Dropout layer after each hidden layer randomly switches off a fraction of neurons during training, which helps prevent overfitting.
<br>The hidden layers use Rectified Linear Unit (ReLU) activation.
<br>The single output neuron uses sigmoid activation because this is a binary classification problem.
<br>We train with the Adam optimiser and binary cross-entropy loss, then visualise the training and validation losses.
<br>An EarlyStopping callback halts training once the validation loss stops improving, so we do not have to guess the number of epochs (iterations).
<br>Optional: TensorBoard can be used, but it is less useful here because this is not a convolutional neural network.
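As a rough illustration of what the sigmoid output and the binary cross-entropy loss compute, here is a NumPy sketch. It is not how Keras implements them internally; the function names and the sample scores are made up for the example.

```python
import numpy as np

def sigmoid(z):
    """Squash a raw score into (0, 1) so it can be read as P(loan repaid)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, p, eps=1e-7):
    """Mean negative log-likelihood of the 0/1 labels under probabilities p."""
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

y_true = np.array([1.0, 0.0, 1.0])
scores = np.array([2.0, -1.0, 0.0])  # made-up raw outputs of the last layer
probs = sigmoid(scores)
loss = binary_crossentropy(y_true, probs)
print(probs, loss)
```

The loss grows quickly when the model is confidently wrong, which is why it suits a probability output better than a plain error count.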

In [183]:
# importing necessary libraries
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping
# check the shape of the training data: the number of columns (features) guides the size of the first layer
X_train.shape
model = Sequential()
model.add(Dense(78, activation='relu'))  # first hidden layer: 78 units
model.add(Dropout(0.5))  # rate = fraction of units dropped during training (0 = none, 1 = all); 0.2-0.5 is typical
model.add(Dense(39, activation='relu'))  # half the previous layer
model.add(Dropout(0.5))
model.add(Dense(19, activation='relu'))  # roughly half again
model.add(Dropout(0.5))
# binary classification (loan repaid = 1, not repaid = 0), so a single sigmoid output unit
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')
# early stopping to limit overfitting: stop once val_loss has not improved for 25 consecutive epochs
# mode='min' because the monitored quantity is a loss (use 'max' for a metric such as accuracy)
# verbose=1 prints a message when training stops
early_stop = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=25)
# train the model; the test split doubles as validation data here
model.fit(x=X_train, y=y_train, epochs=600, validation_data=(X_test, y_test),
          callbacks=[early_stop])

Epoch 1/600
...
Epoch 52/600
Epoch 00052: early stopping


<tensorflow.python.keras.callbacks.History at 0x18d2b169508>
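
The patience rule behind the early stop can be sketched in plain Python. This is a simplified illustration of the idea, not the Keras implementation; `early_stopping_epoch` and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None if it never does."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # New best validation loss: reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Made-up validation losses: the best value is at epoch 3, then no improvement.
val_losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.38]
stop = early_stopping_epoch(val_losses, patience=3)
print(stop)
```

Under this rule, a stop at epoch 52 with `patience=25` suggests the best validation loss was seen around epoch 27. Note that Keras keeps the weights from the final epoch unless `restore_best_weights=True` is passed to `EarlyStopping`.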

**Save the model so we can instantly check new customers without training NN model again.**

In [39]:
# saving the trained model lets us predict loan repayment for new customers
# later by loading the .h5 file instead of retraining
from tensorflow.keras.models import load_model
model.save('loan_model.h5')
later_model=load_model('loan_model.h5')

NameError: name 'model' is not defined

# Evaluating Model Performance

**Plotting out the validation loss versus the training loss.**

In [None]:
model_loss= pd.DataFrame(model.history.history)
model_loss.plot()

**TASK: Create predictions from the X_test set and display a classification report and confusion matrix for the X_test set.**

In [186]:
# predict with Keras and evaluate with scikit-learn
# predict_classes is deprecated; threshold the sigmoid output at 0.5 instead
predictions = (model.predict(X_test) > 0.5).astype("int32")
from sklearn.metrics import classification_report, confusion_matrix
print(classification_report(y_test, predictions))
print(confusion_matrix(y_test, predictions))  # the classes are imbalanced

              precision    recall  f1-score   support

           0       1.00      0.43      0.60     15658
           1       0.88      1.00      0.93     63386

    accuracy                           0.89     79044
   macro avg       0.94      0.72      0.77     79044
weighted avg       0.90      0.89      0.87     79044

[[ 6739  8919]
 [   14 63372]]
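
The headline numbers in the report can be recomputed from the confusion matrix, which also makes the class imbalance visible. The sketch below uses the matrix printed above; the variable names are chosen for the example.

```python
import numpy as np

# Confusion matrix from the output above: rows = true class, columns = predicted class.
cm = np.array([[6739, 8919],
               [14, 63372]])

tn, fp = cm[0]  # true class 0 (charged off)
fn, tp = cm[1]  # true class 1 (repaid)

recall_0 = tn / (tn + fp)          # share of charged-off loans the model caught
precision_1 = tp / (tp + fp)       # share of predicted "repaid" that really repaid
accuracy = (tn + tp) / cm.sum()
baseline = cm[1].sum() / cm.sum()  # accuracy of always predicting "repaid"

print(f"recall (class 0):    {recall_0:.2f}")     # 0.43
print(f"precision (class 1): {precision_1:.2f}")  # 0.88
print(f"accuracy:            {accuracy:.2f}")     # 0.89
print(f"always-repaid base:  {baseline:.2f}")     # 0.80
```

Always predicting "repaid" would already reach about 80% accuracy, so the 89% figure is less impressive than it looks; the recall of 0.43 on charged-off loans is the number to watch for a lender.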


**Given the customer below, would you offer this person a loan?**

In [40]:
# pick a random customer whose repayment outcome we want to predict
import random
random.seed(101)
random_ind = random.randint(0, len(df) - 1)  # randint is inclusive at both ends
new_person = df.drop('loan_repaid', axis=1).iloc[random_ind]
new_person

loan_amnt                                    24000
term                                     60 months
int_rate                                     13.11
installment                                 547.43
grade                                            B
sub_grade                                       B4
emp_length                               10+ years
home_ownership                            MORTGAGE
annual_inc                                   85000
verification_status                Source Verified
issue_d                                   Jan-2013
loan_status                             Fully Paid
purpose                                credit_card
title                                         Debt
dti                                          10.98
earliest_cr_line                          Oct-1991
open_acc                                         6
pub_rec                                          0
revol_bal                                    35464
revol_util                     

In [193]:
df.iloc[random_ind]['loan_repaid']
# 1 means the loan was repaid, 0 means it was charged off (not repaid)

1.0