# OPM (Other People's Machine)
### ... learning libraries

No, this is not about Other People's Money. Many open source libraries are available, including deep learning libraries that you can call on for fairly complex ANN (artificial neural network) models. Why not use Other People's Machine (OPM) for ANNs?

In this notebook, I walk through a deep learning example using another popular neural network library, Keras, to predict credit default (or write-off) from sample credit payment data.

## Keras
> from [wikipedia](https://en.wikipedia.org/wiki/Keras)

Keras is an open source neural network library written in Python. It is capable of running on top of MXNet, Deeplearning4j, TensorFlow, CNTK or Theano. Designed to enable fast experimentation with deep neural networks, it focuses on being minimal, modular and extensible. It was developed as part of the research effort of project ONEIROS (Open-ended Neuro-Electronic Intelligent Robot Operating System), and its primary author and maintainer is François Chollet, a Google engineer.

In 2017, Google's TensorFlow team decided to support Keras in TensorFlow's core library. Chollet explained that Keras was conceived to be an interface rather than an end-to-end machine-learning framework. It presents a higher-level, more intuitive set of abstractions that make it easy to configure neural networks regardless of the backend scientific computing library. Microsoft has been working to add a CNTK backend to Keras as well, and the functionality is currently in beta release with CNTK v2.0.

In [1]:
# read a dataset of interest

import pandas as pd

url = 'https://raw.githubusercontent.com/YLEE200/MLFS/master/testdata/CRPMT_SAMPLE.csv'

df = pd.read_csv(url)
df.head(5)

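| | ACCT_NO | PROD | CURR_BAL | TENURE | CUST_INC | CUST_AGE | PMT_DUE | NO_DM_CNT | WRITE_OFF_IND | FICO_SCR |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1291 | 1.REG | 755.16 | 3.0 | 44212 | 46 | 60.41 | 5 | 1 | 651 |
| 1 | 1292 | 1.REG | 276.61 | 0.7 | 86249 | 34 | 22.13 | 10 | 0 | 702 |
| 2 | 1293 | 2.GOLD | 424.7 | 0.1 | 79474 | 45 | 21.23 | 22 | 0 | 753 |
| 3 | 1294 | 3.PLAT | 11683.23 | 10.8 | 81198 | 58 | 584.16 | 22 | 0 | 763 |
| 4 | 1295 | 1.REG | 246.34 | 5.5 | 63502 | 35 | 19.71 | 11 | 1 | 590 |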


In [2]:
df.describe().transpose()

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| ACCT_NO | 610 | 1595.5 | 176.236111 | 1291.0 | 1443.25 | 1595.5 | 1747.75 | 1900.0 |
| CURR_BAL | 610 | 1804.10777 | 2905.286132 | -25.0 | 331.7525 | 680.625 | 1049.4625 | 11996.61 |
| TENURE | 610 | 6.432459 | 4.634506 | 0.1 | 2.8 | 5.9 | 8.9 | 22.0 |
| CUST_INC | 610 | 83260.531148 | 42206.490438 | 25089.0 | 54015.5 | 73585.5 | 94804.25 | 217338.0 |
| CUST_AGE | 610 | 40.132787 | 11.691593 | 19.0 | 31.0 | 40.0 | 49.0 | 69.0 |
| PMT_DUE | 610 | 99.196262 | 142.255957 | 0.0 | 22.6625 | 46.4 | 78.4625 | 599.83 |
| NO_DM_CNT | 610 | 9.870492 | 6.714781 | 1.0 | 4.0 | 7.0 | 15.0 | 26.0 |
| WRITE_OFF_IND | 610 | 0.15082 | 0.358167 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| FICO_SCR | 610 | 701.32623 | 85.151137 | 551.0 | 625.0 | 693.5 | 780.0 | 849.0 |


## Data Labeling and Train/Test Split

In [3]:
from sklearn.preprocessing import LabelEncoder

# converting PROD to numerical (0: 1.REG, 1: 2.GOLD, 2: 3.PLAT)
lenc = LabelEncoder()
lenc.fit(df['PROD'])

df['PROD_NO'] = lenc.transform(df['PROD'])

df.head(5)

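| | ACCT_NO | PROD | CURR_BAL | TENURE | CUST_INC | CUST_AGE | PMT_DUE | NO_DM_CNT | WRITE_OFF_IND | FICO_SCR | PROD_NO |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1291 | 1.REG | 755.16 | 3.0 | 44212 | 46 | 60.41 | 5 | 1 | 651 | 0 |
| 1 | 1292 | 1.REG | 276.61 | 0.7 | 86249 | 34 | 22.13 | 10 | 0 | 702 | 0 |
| 2 | 1293 | 2.GOLD | 424.7 | 0.1 | 79474 | 45 | 21.23 | 22 | 0 | 753 | 1 |
| 3 | 1294 | 3.PLAT | 11683.23 | 10.8 | 81198 | 58 | 584.16 | 22 | 0 | 763 | 2 |
| 4 | 1295 | 1.REG | 246.34 | 5.5 | 63502 | 35 | 19.71 | 11 | 1 | 590 | 0 |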
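
`LabelEncoder` assigns integer codes in sorted order of the distinct labels, which is why `1.REG`, `2.GOLD` and `3.PLAT` map to 0, 1 and 2. The following is a minimal sketch of that mapping using only NumPy. The five `PROD` values are copied from the rows displayed above; `np.unique` stands in for the encoder here and is not what the notebook itself calls.

```python
import numpy as np

# PROD values from the five rows displayed above
prod = np.array(['1.REG', '1.REG', '2.GOLD', '3.PLAT', '1.REG'])

# np.unique returns the sorted distinct labels and, with return_inverse=True,
# the position of each original value within that sorted list
classes, codes = np.unique(prod, return_inverse=True)

mapping = {str(label): idx for idx, label in enumerate(classes)}
print(mapping)
print(codes.tolist())
```

Note that these codes imply an ordering (REG < GOLD < PLAT). That is plausible for product tiers, but for categories with no natural order a one-hot encoding is usually the safer choice.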


In [65]:
# all numeric feature variables 

feature_cols = [
    'CURR_BAL',
    'TENURE',
    'CUST_INC',
    'CUST_AGE',
    'PMT_DUE',
    'NO_DM_CNT',
    'FICO_SCR',
    'PROD_NO'
]

In [66]:
X = df[feature_cols]
y = df.WRITE_OFF_IND

X.dtypes

CURR_BAL     float64
TENURE       float64
CUST_INC       int64
CUST_AGE       int64
PMT_DUE      float64
NO_DM_CNT      int64
FICO_SCR       int64
PROD_NO        int64
dtype: object

In [67]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
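
Because only about 15% of the accounts are write-offs, a purely random split can leave the test set with a somewhat different write-off rate than the training set. One optional variation, not used in this notebook, is to pass `stratify` to `train_test_split`. The sketch below uses a synthetic target (`y_demo`, `X_demo` are hypothetical names) with the same size and approximate class balance as the sample data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the target: 610 rows, 92 of them write-offs
# (92 is about 0.15082 * 610, the mean reported by describe() above)
y_demo = np.array([1] * 92 + [0] * 518)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)  # placeholder feature column

# stratify=y_demo keeps the class proportions roughly equal in both splits
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(len(yd_train), len(yd_test))
print(round(yd_train.mean(), 3), round(yd_test.mean(), 3))
```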

## Data Preprocessing

A neural network may fail to converge within the allowed number of training iterations if the data is not normalized. Neural networks are sensitive to feature scaling, so scaling your data is highly recommended. Note that you must apply the same scaling to the test set for the results to be meaningful. There are many ways to normalize data; here we use scikit-learn's built-in StandardScaler for standardization.

In [68]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

# Fit only to the training data
scaler.fit(X_train)

# Now apply the transformations to the data:
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
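
To make the "fit on train, apply to both" rule concrete, here is a small NumPy sketch of the arithmetic that standardization performs: subtract the training mean and divide by the training standard deviation. The toy values and the names `X_tr` and `X_te` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Toy data (hypothetical values): two features on very different scales
X_tr = np.array([[500.0, 30.0], [1500.0, 40.0], [2500.0, 50.0]])
X_te = np.array([[1000.0, 45.0]])

# Statistics come from the training data only
mean = X_tr.mean(axis=0)
std = X_tr.std(axis=0)  # population standard deviation (ddof=0), as StandardScaler uses

# The same training statistics are applied to both sets
X_tr_scaled = (X_tr - mean) / std
X_te_scaled = (X_te - mean) / std

print(X_tr_scaled.mean(axis=0), X_tr_scaled.std(axis=0))
print(X_te_scaled)
```

After scaling, each training column has mean 0 and standard deviation 1. The test row is expressed in the same units without its own values influencing the statistics.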

## Model Training and Performance Check

In [69]:
import numpy as np

from keras.models import Sequential
from keras.layers import Dense, Dropout
# np_utils ships with the Keras version used here; the same helper is also
# exposed directly as keras.utils.to_categorical
from keras.utils import np_utils

In [70]:
# Build a simple model
model = Sequential()
model.add(Dense(64, activation='relu', input_dim=X_train.shape[1]))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(len(set(y)), activation='softmax'))

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
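
As a rough sanity check on the size of this network, the number of trainable parameters can be worked out by hand: each `Dense` layer has one weight per input-output pair plus one bias per output unit. The sketch below does that arithmetic in plain Python; the helper `dense_params` is a hypothetical name introduced for illustration.

```python
# Layer sizes taken from the model definition above
n_inputs = 8      # len(feature_cols)
hidden = 64
n_classes = 2     # WRITE_OFF_IND is 0 or 1

def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

layer_params = [
    dense_params(n_inputs, hidden),   # first hidden layer
    dense_params(hidden, hidden),     # second hidden layer
    dense_params(hidden, n_classes),  # softmax output layer
]
total_params = sum(layer_params)  # Dropout layers add no trainable parameters

print(layer_params, total_params)
```

Calling `model.summary()` in the notebook should report the same per-layer counts.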

The training process runs for a fixed number of passes through the dataset, called epochs, which we specify with the epochs argument. We can also set the number of instances evaluated before each weight update in the network, called the batch size, with the batch_size argument.
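
To see how these two settings interact, the number of weight updates can be estimated from the data size. This sketch assumes the values used in this notebook (610 rows, a 20% test split, 50 epochs, batch size 10); the variable names are illustrative only.

```python
import math

n_rows = 610                      # rows in the sample data (from describe())
n_test = n_rows * 20 // 100       # test_size=0.2 gives 122 test rows
n_train = n_rows - n_test

epochs = 50
batch_size = 10

# The last batch of an epoch may be smaller than batch_size, hence the ceiling
updates_per_epoch = math.ceil(n_train / batch_size)
total_updates = updates_per_epoch * epochs

print(n_train, updates_per_epoch, total_updates)
```

A smaller batch size means more, noisier updates per epoch; a larger one means fewer, smoother updates.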

In [71]:
# Set parameters
epoch = 50
batch_size = 10

In [72]:
# one-hot encode y (class 0: no write-off, class 1: write-off)
one_hot_label_y_train = np_utils.to_categorical(y_train)
one_hot_label_y_test = np_utils.to_categorical(y_test)

# model training and evaluation
model.fit(X_train, one_hot_label_y_train, epochs=epoch, batch_size=batch_size)
score = model.evaluate(X_test, one_hot_label_y_test, batch_size=batch_size)

print("\n{}: {:.2f}%".format(model.metrics_names[1], score[1]*100))

Epoch 1/50
...
Epoch 50/50
acc: 80.33%
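
Accuracy is easier to interpret next to a naive baseline. Since write-offs are rare, a model that always predicts "no write-off" already scores well. The sketch below estimates that baseline from the `describe()` output above. It is computed on the full data set rather than on the 122-row test split, so it is only an approximate point of comparison.

```python
n_rows = 610
write_off_rate = 0.15082   # mean of WRITE_OFF_IND from describe()

n_write_offs = round(n_rows * write_off_rate)
# Accuracy of a model that always predicts "no write-off"
baseline_accuracy = 1 - n_write_offs / n_rows

print(n_write_offs, "{:.2f}%".format(baseline_accuracy * 100))
```

With imbalanced classes like these, metrics such as precision, recall, or a confusion matrix on the test set are worth checking alongside accuracy.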


In [81]:
# second observation in the scaled X test data (index 1)
X_test[1]

array([-0.5003066 , -0.53785823, -0.18407372,  0.60715339, -0.57826074,
        0.47208358,  0.67581671,  0.44874886])

In [82]:
# one-hot encoded y label of the second test observation
one_hot_label_y_test[1]

array([ 1.,  0.])
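
The label `[1., 0.]` is the one-hot form of class 0 (no write-off). For integer class labels, the same kind of encoding can be sketched with NumPy alone, as below. The array `y_demo` is a hypothetical example, and this is an illustration of the idea rather than the Keras helper itself.

```python
import numpy as np

y_demo = np.array([0, 1, 0])   # hypothetical integer class labels
n_classes = 2

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(n_classes)[y_demo]
print(one_hot)
```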

In [83]:
# predict class probabilities for the second test observation
# reshape(1, -1) turns the 1-D feature vector into a batch of one row
x = X_test[1].reshape(1, -1)

predict = model.predict(x)

In [84]:
predict

array([[ 0.98823136,  0.01176866]], dtype=float32)

The predicted probabilities, roughly 0.988 for class 0 and 0.012 for class 1, point to "no write-off", which matches the actual y label of [1, 0].
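
Turning the softmax output into a class label is a matter of picking the position with the highest probability. A minimal sketch, using the probabilities printed above and hypothetical display names for the two classes:

```python
import numpy as np

# Predicted probabilities copied from the output above
probs = np.array([[0.98823136, 0.01176866]], dtype=np.float32)
class_names = ['no write-off', 'write-off']  # hypothetical names for classes 0 and 1

# argmax along axis 1 gives the most probable class for each row
predicted_class = int(np.argmax(probs, axis=1)[0])
print(predicted_class, class_names[predicted_class])
print(float(probs.sum()))
```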

## Summary
This illustrative Python notebook shows how to build and run a simple deep learning model with the Keras library. I hope it shows how easy it is to adopt deep learning for your data analytics and modeling needs.