# MACHINE LEARNING FOR FINANCIAL SERVICES

Welcome to IBM's Data Science Experience! This tool will make your life as a data scientist a lot easier. Below is a simple introductory example that shows how easy it is to load your data and run a deep learning technique.

## Credit Default Prediction with Deep Learning 
In this notebook, I use sample credit card data to predict write-offs with a TensorFlow-based deep learning technique. Deep learning is gaining momentum because it enables users to tackle modeling challenges, such as image recognition, that were previously hard to address. Many of these techniques also apply to financial services, and I hope you can see how easy it is to adopt deep learning for a real business problem.

## TensorFlow

> from [wikipedia](https://en.wikipedia.org/wiki/TensorFlow)

TensorFlow™ is an open source software library for numerical computation using data flow graphs. TensorFlow was developed by the Google Brain team for internal Google use. It was released under the Apache 2.0 open source license on 9 November 2015. The name of the library helps us understand how we work with it: tensors are multidimensional arrays that flow through the nodes of a graph.

In the data flow graphs, nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API. TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well.

In this notebook, a simple illustrative MLP (multi-layer perceptron) with two hidden layers is built with the TensorFlow library. With only two hidden layers, model estimation stays fairly simple.

### Background: Neural Network Architecture 

The neural network will have 2 hidden layers (choosing how many hidden layers the network has is part of the architecture design). The job of each hidden layer is to transform the inputs into something that the output layer can use. Each input to a node (neuron) is multiplied by a weight. During the training phase the neural network adjusts these weight values in order to produce a correct output. In addition to multiplying each input by a weight, the node also adds a bias.

In this architecture, after the inputs are multiplied by the weights and summed with the bias, the result passes through an activation function. The activation function defines the final output of each node. An analogy: imagine that each node is a lamp; the activation function decides whether the lamp lights up or not.
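The description above can be sketched for a single node in plain Python. This is only an illustration: the input values, weights and biases are made up, and ReLU is used as the activation because it is the one this notebook uses later.

```python
# Illustrative single neuron: weighted sum of inputs plus a bias, then an activation.
# All numbers are made up for illustration; none come from the notebook's data.

def relu(z):
    """Rectifier activation: passes positive values through, clamps negatives to 0."""
    return max(0.0, z)

def neuron(inputs, weights, bias):
    """Compute activation(sum_i inputs[i] * weights[i] + bias) for one node."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return relu(z)

inputs = [1.0, 2.0, 3.0]
weights = [0.5, -1.0, 0.25]

lit = neuron(inputs, weights, bias=1.0)    # 0.5 - 2.0 + 0.75 + 1.0 = 0.25, the lamp is "on"
dark = neuron(inputs, weights, bias=-1.0)  # 0.5 - 2.0 + 0.75 - 1.0 = -1.75, clamped to 0.0
print(lit, dark)
```

With the same inputs and weights, changing only the bias is enough to switch the node from active to inactive.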

### To load the data:

1. Load your local file into your notebook. Click the **Find and Add Data** icon on the notebook action bar. Drop the file into the box or browse to select the file. The file is loaded to your object storage and appears in the Data Assets section of the project. For more information, see <a href="https://datascience.ibm.com/docs/content/analyze-data/load-and-access-data.html" target="_blank" rel="noopener noreferrer">Load and access data</a>.
1. Click in the next code cell and select **Insert to code > pandas DataFrame** under the file name.

For this exercise, the above steps were done in advance, and a CSV file has been placed in this notebook's working directory.

### Pre-processing & Check the data dimensions
Deep learning techniques are computationally expensive and operate on numerical arrays, so the data must first be converted into numerical form with known dimensions before the model layers can map inputs to outputs.

In [2]:
import tensorflow as tf
import pandas as pd
import numpy as np
import itertools

In [3]:
# Read the input data, pre-process and test-check on the dimensions of X and y

df = pd.read_csv('CRPMT_SAMPLE.csv')
df['PROD_NO'] = df.PROD.map({'1.REG': 0, '2.GOLD':1, '3.PLAT':2})

feature_cols = [          
        'CURR_BAL',                                               
        'TENURE',                       
        'CUST_INC',                      
        'CUST_AGE',                                
        'PMT_DUE',                                               
        'NO_DM_CNT',               
        'FICO_SCR',
        'PROD_NO'
    ]

X = df[feature_cols]
y = df.WRITE_OFF_IND

print (X.shape)
print (y.shape)

(610, 8)
(610,)


In [4]:
# to check the model later on
X.head(3)

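|   | CURR_BAL | TENURE | CUST_INC | CUST_AGE | PMT_DUE | NO_DM_CNT | FICO_SCR | PROD_NO |
|---|----------|--------|----------|----------|---------|-----------|----------|---------|
| 0 | 755.16 | 3.0 | 44212 | 46 | 60.41 | 5 | 651 | 0 |
| 1 | 276.61 | 0.7 | 86249 | 34 | 22.13 | 10 | 702 | 0 |
| 2 | 424.7 | 0.1 | 79474 | 45 | 21.23 | 22 | 753 | 1 |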


In [5]:
# to check the model later on
y.head(3)

0    1
1    0
2    0
Name: WRITE_OFF_IND, dtype: int64

### Read and Format the data into dimensional matrices
As mentioned above, TensorFlow, like other neural network libraries, takes multi-dimensional data arrays (tensors) that flow through the nodes of a data flow graph. Hence it is important to define the layers (input, hidden and output) with the proper array dimensions. In this example, we have a (610, 8) input matrix and a 610-element target vector, which is re-arrayed below into a (610, 2) one-hot matrix.

In [16]:
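# Feature matrix as a NumPy array (the name x_labels is kept because later cells use it)
x_labels = np.array(X.values)
y_label = y.values

# Map each distinct target value to a class index in order of first appearance.
# The first row has WRITE_OFF_IND = 1, so write-off becomes class 0 and non write-off class 1.
y = []
value = []
for i in y_label:
    if i not in value:
        value.append(i)

for i in y_label:
    y.append(value.index(i))

# One-hot encode the class indices: one column per class
y_label = pd.get_dummies(y)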

print (x_labels)
print (y_label)

[[  7.55160000e+02   3.00000000e+00   4.42120000e+04 ...,   5.00000000e+00
    6.51000000e+02   0.00000000e+00]
 [  2.76610000e+02   7.00000000e-01   8.62490000e+04 ...,   1.00000000e+01
    7.02000000e+02   0.00000000e+00]
 [  4.24700000e+02   1.00000000e-01   7.94740000e+04 ...,   2.20000000e+01
    7.53000000e+02   1.00000000e+00]
 ..., 
 [  5.85830000e+02   1.90000000e+00   6.49210000e+04 ...,   4.00000000e+00
    6.50000000e+02   0.00000000e+00]
 [  2.69950000e+02   1.60000000e+00   5.73540000e+04 ...,   1.60000000e+01
    7.68000000e+02   1.00000000e+00]
 [  6.91900000e+02   6.00000000e+00   8.37460000e+04 ...,   6.00000000e+00
    6.56000000e+02   0.00000000e+00]]
     0  1
0    1  0
1    0  1
2    0  1
3    0  1
4    1  0
5    0  1
6    0  1
7    0  1
8    1  0
9    0  1
10   1  0
11   0  1
12   0  1
13   1  0
14   0  1
15   0  1
16   1  0
17   0  1
18   1  0
19   0  1
20   0  1
21   0  1
22   0  1
23   0  1
24   0  1
25   0  1
26   0  1
27   0  1
28   0  1
29   0  1
..  .. ..
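
The column order of the one-hot matrix above follows the order in which the target values first appear in the data. The sketch below reproduces that logic in plain Python on a tiny made-up label list (the list is hypothetical; only the fact that the first row is a write-off mirrors the sample shown earlier).

```python
# Sketch of the first-appearance encoding on a tiny made-up label list.
labels = [1, 0, 0, 0, 1]   # hypothetical WRITE_OFF_IND values; the first row is a write-off

seen = []                  # distinct labels in order of first appearance
for label in labels:
    if label not in seen:
        seen.append(label)

# Class index of each row, then a one-hot row with one column per class
class_index = [seen.index(label) for label in labels]
one_hot = [[1 if k == idx else 0 for k in range(len(seen))] for idx in class_index]

print(seen)         # [1, 0]: column 0 stands for label 1, column 1 for label 0
print(one_hot[0])   # [1, 0]: a write-off row
print(one_hot[1])   # [0, 1]: a non write-off row
```

Because the first row of this data set is a write-off, column 0 of the one-hot matrix marks write-offs and column 1 marks non write-offs. A data set that starts with a non write-off row would produce the opposite column order.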


### Model Setup and Execution

In [17]:
# setting parameters: you could change these based on your data structure
# you need to reduce dimensions as the process flows from input to output

n_input = 8 # input features
n_hidden_1 = 5 # 1st layer's number of features; pick a # that may be a mid-point between prior and next layers
n_hidden_2 = 3 # 2nd layer's number of features; pick a # that may be a mid-point between prior and next layers
n_classes = 2 # total output classes: one-hot array, write-off (1,0) vs. non write-off (0,1), in order of first appearance in the data

In [18]:
# tf Graph set up

# create a placeholder for the input data that will be fed at run time
# the number of rows is left undefined (None) to accept any batch size; 8 is the number of columns
X_input = tf.placeholder(tf.float32, [None, n_input])

# create a placeholder for the actual y data that will be fed at run time
# the number of rows is left undefined (None) to accept any batch size; 2 is the number of columns
y_output = tf.placeholder(tf.float32, [None, n_classes])

### Background: Weights & Biases

In neural networks, it is important to understand weights and biases (and not just for coding purposes). As the simplest starting point, think of a linear function Y = mX + c, where m is the slope and c is a constant. Then replace it with Y = wX + b, where w is a matrix of weights and b is a bias. Think of the weights as the importance assigned to each input feature. The bias is independent of the features: if the bias for a unit (or output category) is high, then the score for that unit will be higher regardless of the inputs.
                    
In TensorFlow, the weights are initialized here with tf.random_normal and given the shape of a 2-D tensor, where the first dimension is the number of units in the layer the weights connect from and the second dimension is the number of units in the layer they connect to. The tf.random_normal initializer draws values from a normal distribution with a given mean and standard deviation (0 and 1 by default). As you see below, the dimensions shrink as the data flows through the network (8 to 5 to 3 to 2). Often the features in the hidden layers are not easily interpretable by humans.

This is a type of dimensionality reduction (or feature extraction): each layer has fewer dimensions than the one before it, and its values are combinations of the previous layer's values. Put simply, the network takes the input data (a more complex matrix) and progressively reduces its complexity into a simpler output (the target vector), so that the process as a whole is modeled to predict the target value.

Then the biases are initialized with tf.zeros to ensure they start with all zero values, and their shape is simply the number of units in the layer to which they connect.
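
To make the shape rules concrete, here is a minimal NumPy sketch that follows only the dimensions through the three weight matrices. It is a stand-in for the TensorFlow code, not a copy of it: `rng.normal` and `np.zeros` play the roles of tf.random_normal and tf.zeros, the seed and the 4-row batch are arbitrary assumptions, and the activations are left out on purpose because they do not change any shape.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, chosen only so the sketch is repeatable

n_input, n_hidden_1, n_hidden_2, n_classes = 8, 5, 3, 2

# Weights: shape is (units in the layer they connect from, units in the layer they connect to)
w1 = rng.normal(size=(n_input, n_hidden_1))
w2 = rng.normal(size=(n_hidden_1, n_hidden_2))
out_w = rng.normal(size=(n_hidden_2, n_classes))

# Biases: one value per unit in the layer they feed, all starting at zero
b1 = np.zeros(n_hidden_1)
b2 = np.zeros(n_hidden_2)
out_b = np.zeros(n_classes)

batch = rng.normal(size=(4, n_input))   # 4 made-up rows standing in for customers

h1 = batch @ w1 + b1        # (4, 8) x (8, 5) -> (4, 5)
h2 = h1 @ w2 + b2           # (4, 5) x (5, 3) -> (4, 3)
out = h2 @ out_w + out_b    # (4, 3) x (3, 2) -> (4, 2)

print(h1.shape, h2.shape, out.shape)
```

The number of rows stays fixed at the batch size, while the number of columns shrinks from 8 to 5 to 3 to 2.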

In [19]:
# Setting layers' weights & biases

w1 = tf.Variable(tf.random_normal([n_input, n_hidden_1]))
w2 = tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2]))
out_w =  tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
 
b1 = tf.Variable(tf.zeros([n_hidden_1]))
b2 = tf.Variable(tf.zeros([n_hidden_2]))
out_b = tf.Variable(tf.zeros([n_classes]))

### Background: ReLU Activation

In the context of artificial neural networks, the rectifier is an activation function defined as:

f(x) = max (0,x), where x is the input to a neuron. 

The important thing to note is that it’s non-linear (as opposed to the xW+b part, which is linear.) Why do we need to add non-linearities? Because if not, the entire network could collapse to one layer.

Activation just means output. The linear activation in the last layer of this model means ‘return the output without doing anything more (like ReLU) to it’.
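
The claim that a network without non-linearities collapses to one layer can be checked numerically. The NumPy sketch below uses small made-up matrices (not the notebook's weights): two stacked linear layers give exactly the same result as one linear layer with combined weights, while inserting ReLU between them breaks that equivalence.

```python
import numpy as np

# Small made-up inputs and parameters, chosen so that some pre-activations are negative
x = np.array([[1.0, -2.0], [3.0, 0.5]])
w1 = np.array([[1.0, -1.0], [0.5, 2.0]])
b1 = np.array([0.0, 1.0])
w2 = np.array([[2.0], [-1.0]])
b2 = np.array([0.5])

def relu(z):
    """Element-wise max(0, z)."""
    return np.maximum(0.0, z)

# Two linear layers with no activation in between ...
two_linear = (x @ w1 + b1) @ w2 + b2

# ... are equivalent to a single linear layer with combined weights and bias
w_combined = w1 @ w2
b_combined = b1 @ w2 + b2
one_linear = x @ w_combined + b_combined

# With ReLU between the layers, no single linear layer reproduces the result
with_relu = relu(x @ w1 + b1) @ w2 + b2

print(np.allclose(two_linear, one_linear))   # True: the two layers collapsed into one
print(np.allclose(with_relu, one_linear))    # False: the non-linearity prevents the collapse
```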

In [20]:
# Hidden layer 1 with ReLU activation
layer_1 = tf.add(tf.matmul(X_input, w1), b1)
layer_1 = tf.nn.relu(layer_1)
    
# Hidden layer 2 with ReLU activation
layer_2 = tf.add(tf.matmul(layer_1, w2), b2)
layer_2 = tf.nn.relu(layer_2)
    
# Output layer with linear activation
out_layer = tf.matmul(layer_2, out_w ) + out_b

In [21]:
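# Mean squared error between the one-hot targets fed at run time (y_output) and the network output
losses = tf.losses.mean_squared_error(y_output, out_layer)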

#optimizer = tf.train.AdamOptimizer(0.001).minimize(losses) #using Adaptive Moment Estimation (Adam)
optimizer = tf.train.GradientDescentOptimizer(0.09).minimize(losses) #using Gradient Descent
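
For intuition about what the loss and the optimizer do, here is a minimal NumPy sketch of mean squared error and plain gradient descent. It is a simplification under stated assumptions: the data are made up, the model is a single linear layer rather than the MLP, and the gradients are written out by hand instead of being derived by TensorFlow. Only the learning rate of 0.09 is taken from the cell above.

```python
import numpy as np

# Made-up data: 4 rows, 2 features, one-hot targets for 2 classes
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
Y = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]])

W = np.zeros((2, 2))
b = np.zeros(2)
learning_rate = 0.09   # same step size as the GradientDescentOptimizer call above

def mse(pred, target):
    """Mean of the squared differences over all elements."""
    return np.mean((pred - target) ** 2)

loss_before = mse(X @ W + b, Y)

for _ in range(200):
    pred = X @ W + b
    grad_pred = 2.0 * (pred - Y) / pred.size      # d(loss)/d(pred)
    W -= learning_rate * (X.T @ grad_pred)        # chain rule through pred = XW + b
    b -= learning_rate * grad_pred.sum(axis=0)    # the bias is shared by every row

loss_after = mse(X @ W + b, Y)
print(loss_before, loss_after)
```

Each iteration moves the parameters a small step against the gradient, so the loss shrinks from its starting value of 0.5 toward zero on this toy problem.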

In [22]:
# Start the session and initialize the global variables

sess = tf.InteractiveSession()
init = tf.global_variables_initializer()
sess.run(init)
print("************************weights before training*********************************")
print("                                                                                ")

we1, we2, we3 = sess.run([w1, w2, out_w])
print("weight_1: ")
print(we1)
print("weight_2: ")
print(we2)
print("weight_3: ")
print(we3)

print("-------------------------Training Result------------------------------------------")
print("                                                                                 ")

step_size = 10000
feed = {X_input: x_labels, y_output: y_label}
for step in range(step_size):
    # One gradient descent step on the full data set; also fetch the current loss
    _, loss_value = sess.run([optimizer, losses], feed_dict=feed)

    if step % 2000 == 0:
        print("loss at step %d: %s" % (step, loss_value))

# Fetch the trained weights once training has finished
f, g, h = sess.run([w1, w2, out_w])

print("                                                                                         ")
# Note: accuracy is measured on the same rows that were used for training
correct_prediction = tf.equal(tf.argmax(out_layer, 1), tf.argmax(y_output, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print("-------------------------------------Accuracy--------------------------------------------")
print("                                                                                         ")

print("Accuracy on the model: %s" % accuracy.eval(feed_dict=feed))

# check the model with the first observation
check_1 = tf.add(tf.matmul([[755.16, 3, 44212, 46, 60.41, 5, 651, 0]], w1), b1)
check_1 = tf.nn.relu(check_1)

check_2 = tf.add(tf.matmul(check_1, w2), b2)
check_2 = tf.nn.relu(check_2)

# Linear output (no ReLU), matching out_layer in the model
output = tf.matmul(check_2, out_w) + out_b

print("***************************** weight after training *************************************")
print("weight_1 after training: ")
print(f)
print("weight_2 after training: ")
print(g)
print("weight_3 after training: ")
print(h)

a, b, c = sess.run([check_1, check_2, output])

print("-------------------------------------Results--------------------------------------------")
print("                                                                                         ")

convert_list = list(itertools.chain.from_iterable(c))
print(convert_list)

indx = sess.run(tf.argmax(convert_list))

# Class 0 is write-off and class 1 is non write-off, because the one-hot
# encoding follows the order of first appearance in WRITE_OFF_IND
if indx == 0:
    print("WRITE OFF")
elif indx == 1:
    print("NO WRITE OFF")
else:
    print("WHAT??")

************************weights before training*********************************
                                                                                
weight_1: 
[[ 0.19714087 -0.53576887 -1.6313076  -0.46538144 -0.24096622]
 [ 0.48001674  1.38334513  2.1973803  -1.50088358 -1.11959493]
 [-0.55318099  0.07201028 -2.67183828 -0.3964572   0.20036089]
 [-1.26047742  0.0318081  -1.26312864 -0.6359393   0.36029524]
 [-0.00574145  0.58239913 -1.10277545  0.13962074 -0.33975729]
 [-0.8128832  -0.33963752  0.03247813 -0.92922157 -0.29505524]
 [ 0.24585201 -0.51315165 -0.45331183 -0.88602358  0.20855016]
 [ 0.74723089 -1.01083338 -0.21508068  0.21001558 -1.73101401]]
weight_2: 
[[-1.63071394 -0.42717698 -2.45502782]
 [ 0.65831822 -0.1745652  -1.46276009]
 [ 0.02916694  0.36403146 -0.07637378]
 [ 1.35566914  0.5127328  -1.04606068]
 [-0.37508479  0.15286568 -1.00266933]]
weight_3: 
[[ 1.3991344   0.16665764]
 [-0.28473514  1.08020461]
 [ 0.57035786  0.02002707]]
----------------------
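
The accuracy calculation in the cell above compares, row by row, the column with the largest network output against the column marked in the one-hot target. A minimal NumPy sketch of the same idea follows; the output scores and targets are made up for illustration and are not results from this notebook.

```python
import numpy as np

# Made-up network outputs (one row per customer, one column per class)
outputs = np.array([[0.9, 0.1],
                    [0.2, 0.7],
                    [0.6, 0.4],
                    [0.3, 0.8]])
# Made-up one-hot targets in the same column order
targets = np.array([[1, 0],
                    [0, 1],
                    [0, 1],
                    [0, 1]])

predicted_class = outputs.argmax(axis=1)   # index of the largest score in each row
actual_class = targets.argmax(axis=1)      # index of the 1 in each one-hot row
accuracy = np.mean(predicted_class == actual_class)

print(predicted_class)   # [0 1 0 1]
print(accuracy)          # 3 of 4 rows match, so 0.75
```

Keep in mind that the notebook measures accuracy on the same rows it trained on, so the figure it reports is likely to be optimistic compared with accuracy on unseen data.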

## Summary

This illustrative Python notebook shows how to get started with basic deep learning using the TensorFlow library and an MLP. I hope you can see how easy it is to adopt IBM's Data Science Experience for your data analytics and modeling needs. For an overview and getting-started information, see the Data Science Experience documentation: https://datascience.ibm.com/docs/content/getting-started/welcome-main.html. To learn about Jupyter notebooks, which are used throughout this scenario, see the Data Science Experience documentation: https://datascience.ibm.com/docs/content/analyze-data/notebooks-parent.html
