# MACHINE LEARNING FOR FINANCIAL SERVICES

Welcome to IBM's Data Science Experience! This exciting tool will help your life a lot easier as a data scientist. Below is a simple introductory example of how easy for you to load your data and run deep learning algorithms to analyze and predict an outcome.

## Deep Learning for Text Analysis 
Approximately, 85% of the data is unstructured and those who can unleash the powerful insights from those will, no doubt, create a superior competitive advantage.   In this notebook, I chose a sample hypothetical SMS data to predict churn response by training simple Neural Network algorithms (MLP and 1-D CNN) using Keras library. Especially, CNN (Convolutional Neural Networks) is being employed in high dimensional computations such as image recognition , yet I'd like to introduce a basic/simple Natural Language Processing (NLP) for illustrative purpose. 

> Natural Language Processing (NLP): NLP is a way for computers to analyze, understand, and derive meaning from human language in a smart and useful way. By utilizing NLP, developers can organize and structure knowledge to perform tasks such as automatic summarization, translation, named entity recognition, relationship extraction, sentiment analysis, speech recognition, and topic segmentation.  

> Although NLP could get very complex, in a nutshell, the data (text or speech) needs to be "parameterized" (or making the data to "numbers") so that machine learning algorithms can kicks in and do the magics :-)

In [1]:
from io import StringIO
import requests
import json
import pandas as pd

# @hidden_cell
# This function accesses a file in your Object Storage. The definition contains your credentials.
# You might want to remove those credentials before you share your notebook.
def get_object_storage_file_with_credentials_fcf1c90868844bc1ab7d4fffe6063140(container, filename):
    """This functions returns a StringIO object containing
    the file content from Bluemix Object Storage."""

    url1 = ''.join(['https://identity.open.softlayer.com', '/v3/auth/tokens'])
    data = {'auth': {'identity': {'methods': ['password'],
            'password': {'user': {'name': 'member_cf2f28cf1cf37485c31a0c2d2443463e000ad9b0','domain': {'id': '20b49a5877434f9486aa2a1d2fcdd21c'},
            'password': 'tVP(ZWG8k9J._ZhB'}}}}}
    headers1 = {'Content-Type': 'application/json'}
    resp1 = requests.post(url=url1, data=json.dumps(data), headers=headers1)
    resp1_body = resp1.json()
    for e1 in resp1_body['token']['catalog']:
        if(e1['type']=='object-store'):
            for e2 in e1['endpoints']:
                        if(e2['interface']=='public'and e2['region']=='dallas'):
                            url2 = ''.join([e2['url'],'/', container, '/', filename])
    s_subject_token = resp1.headers['x-subject-token']
    headers2 = {'X-Auth-Token': s_subject_token, 'accept': 'application/json'}
    resp2 = requests.get(url=url2, headers=headers2)
    return StringIO(resp2.text)

data_1 = get_object_storage_file_with_credentials_fcf1c90868844bc1ab7d4fffe6063140('MachineLearningShowcase', 'churnSMS.csv')
sms = pd.read_table(data_1, header=None, sep = ",", names=['label', 'message'])

sms.shape

(403, 2)

In [2]:
# if you like to directly read in from my github...

#import pandas as pd
#url = 'https://raw.githubusercontent.com/YLEE200/ML-SHOWCASE/master/churnSMS.csv'

#sms = pd.read_csv(url, names =['label', 'message'], encoding='iso-8859-1')
#sms.shape

In [3]:
# examine the first 10 rows
sms.head(10)

Unnamed: 0,label,message
0,datactr1,cofidential and proprietary
1,churn,this is interesting� please cancel my policy n...
2,churn,would you call me ASAP? My number is 123-555-...
3,churn,I am not renewing my policy. This is sucks
4,churn,there has been a fraud on my account. please c...
5,churn,"OMG, I don't like this�. I am done w/ my account"
6,churn,"FRAUD, for god sake.. What is going on?"
7,churn,Too expensive! There should be some discount�.
8,churn,"My friend told me, there�s a better deal out t..."
9,churn,Can you believe it? Cancel my account


In [4]:
# debugging the first row
sms = sms.loc[1:,:]
sms.head(10)

Unnamed: 0,label,message
1,churn,this is interesting� please cancel my policy n...
2,churn,would you call me ASAP? My number is 123-555-...
3,churn,I am not renewing my policy. This is sucks
4,churn,there has been a fraud on my account. please c...
5,churn,"OMG, I don't like this�. I am done w/ my account"
6,churn,"FRAUD, for god sake.. What is going on?"
7,churn,Too expensive! There should be some discount�.
8,churn,"My friend told me, there�s a better deal out t..."
9,churn,Can you believe it? Cancel my account
10,churn,Totally unhappy.. How can I close my policy?


In [5]:
# convert label to a numerical variable
sms['label_num'] = sms.label.map({'retain':0, 'churn':1})

In [6]:
# check that the conversion worked
sms.tail(10)

Unnamed: 0,label,message,label_num
393,churn,I have enough.. cancelling my policy,1
394,churn,what???,1
395,retain,"I am happy with your service, please renew my ...",0
396,retain,Auto-renew my policy�.,0
397,retain,"Auto-renew my policy, please!!",0
398,retain,"Jack, what a guy!!!",0
399,retain,Would you renew my policy? My policy number i...,0
400,retain,Can I renew my policy for another two years? ...,0
401,retain,renew my account�. My name is Jane Doe and pho...,0
402,churn,"I am through, you guys..",1


In [7]:
# how to define X and y (from the SMS data) for use with COUNTVECTORIZER
X = sms.message
y = sms.label_num
print(X.shape)
print(y.shape)

(402,)
(402,)


In [8]:
# split X and y into training and testing sets for 50:50

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=1)

print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

(201,)
(201,)
(201,)
(201,)


In [9]:
# import and instantiate CountVectorizer (with the default parameters)
from sklearn.feature_extraction.text import CountVectorizer
vect = CountVectorizer()

In [10]:
# convert X to vectors
vect = CountVectorizer()

X_train = vect.fit_transform(X_train)
X_test = vect.fit_transform(X_test)

In [11]:
# convert the vectors to arrays
X_train = X_train.toarray()
X_test = X_test.toarray()

In [12]:
print (X_train.shape)
print (X_test.shape)

(201, 147)
(201, 139)


In [13]:
print (y_train.shape)
print (y_test.shape)

(201,)
(201,)


## Word Embedding

> A recent breakthrough in the field of natural language processing is called word embedding. This is a technique where words are encoded as real-valued vectors in a high-dimensional space, where the similarity between words in terms of meaning translates to closeness in the vector space. Discrete words are mapped to vectors of continuous numbers. This is useful when working with natural language problems with neural networks and deep learning models are we require numbers as input. Keras provides a convenient way to convert positive integer representations of words into a word embedding by an Embedding layer. The layer takes arguments that define the mapping including the maximum number of expected words also called the vocabulary size (e.g. the largest integer value that will be seen as an integer). The layer also allows you to specify the dimensionality for each word vector, called the output dimension.

# Multi-Layer Perceptron (MLP)
MLP is a rudimentary form of artificial neural network algorithm, where I use a simple MLP model with a single hidden layer for illustrative purpose

In [14]:
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers.embeddings import Embedding
from keras.preprocessing import sequence

Using TensorFlow backend.


> Now we can create our model. We will use an Embedding layer as the input layer, setting the vocabulary to 5,000, the word vector size to 32 dimensions and the input_length to 500. The output of this first layer will be a 32×500 sized matrix.

> We will flatten the Embedded layers output to one dimension, then use one dense hidden layer of 250 units with a rectifier activation function. The output layer has one neuron and will use a sigmoid activation to output values of 0 and 1 as predictions.

In [15]:
# top_words parameter is set to pick 5000 most used words in whole text dataset
top_words = 5000

In [16]:
# max_words parameter is set to choose the length of each observation (in this case, 500).  
# with pad_sequences function, each observation is either padded or truncated to the length of max_works (standardizing for model)

max_words = 500
X_train = sequence.pad_sequences(X_train, maxlen=max_words)
X_test = sequence.pad_sequences(X_test, maxlen=max_words)

In [17]:
print (X_train.shape)
print (X_test.shape)

(201, 500)
(201, 500)


In [18]:
print (y_train.shape)
print (y_test.shape)

(201,)
(201,)


In [None]:
# create the model
model = Sequential()
model.add(Embedding(top_words, 32, input_length=max_words))
model.add(Flatten())
model.add(Dense(250, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())

In [None]:
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=5, batch_size=128, verbose=2)

# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))

## CNN (Convolutional Neural Networks) 

Convolutional neural networks were designed to process the spatial structure in image data, while being robust to the position and orientation of learned objects in the scene.

This same principle can be used on sequences, such as the one-dimensional sequence of words in this SMS data. The same properties that make the CNN model attractive for learning to recognize objects in images can help to learn structure in paragraphs of words, namely the techniques invariance to the specific position of features.

Keras supports one-dimensional convolutions and pooling by the Conv1D and MaxPooling1D classes respectively.

In [71]:
# CNN 
import numpy

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers.convolutional import Conv1D
from keras.layers.convolutional import MaxPooling1D
from keras.layers.embeddings import Embedding
from keras.preprocessing import sequence

We can now define our convolutional neural network model. This time, after the Embedding input  layer, we insert a Conv1D layer. This convolutional layer has 32 feature maps and reads embedded word representations 3 vector elements of the word embedding at a time.
The convolutional layer is followed by a 1D max pooling layer with a length and stride of 2 that halves the size of the feature maps from the convolutional layer. The rest of the network is the same as the neural network above.

In [72]:

top_words = 5000

# pad dataset to a maximum review length in words
max_words = 500
X_train = sequence.pad_sequences(X_train, maxlen=max_words)
X_test = sequence.pad_sequences(X_test, maxlen=max_words)

In [73]:
# create the model
model = Sequential()
model.add(Embedding(top_words, 32, input_length=max_words))
model.add(Conv1D(filters=32, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(250, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_4 (Embedding)      (None, 500, 32)           160000    
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 500, 32)           3104      
_________________________________________________________________
max_pooling1d_2 (MaxPooling1 (None, 250, 32)           0         
_________________________________________________________________
flatten_4 (Flatten)          (None, 8000)              0         
_________________________________________________________________
dense_7 (Dense)              (None, 250)               2000250   
_________________________________________________________________
dense_8 (Dense)              (None, 1)                 251       
Total params: 2,163,605
Trainable params: 2,163,605
Non-trainable params: 0
_________________________________________________________________


In [74]:
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=5, batch_size=128, verbose=2)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))

Train on 201 samples, validate on 201 samples
Epoch 1/5
17s - loss: 0.6929 - acc: 0.4876 - val_loss: 0.6899 - val_acc: 0.5622
Epoch 2/5
0s - loss: 0.6924 - acc: 0.5174 - val_loss: 0.6871 - val_acc: 0.5622
Epoch 3/5
0s - loss: 0.6860 - acc: 0.5174 - val_loss: 0.6876 - val_acc: 0.6766
Epoch 4/5
0s - loss: 0.6806 - acc: 0.8209 - val_loss: 0.6847 - val_acc: 0.8358
Epoch 5/5
0s - loss: 0.6704 - acc: 0.8209 - val_loss: 0.6757 - val_acc: 0.5821
Accuracy: 58.21%


# Summary

This illustrative python notebook shows how to get started with a simple deep learning techniques utilizing MLP and 1-D CNN. I hope you to see how easy to adopt IBM's Data Science Experience for your data analytics and modeling needs. Please find overview and getting-started information in the Data Science Experience documentation: https://datascience.ibm.com/docs/content/getting-started/welcome-main.html. Learn about Jupyter notebooks, which are used throughout this scenario, in the Data Science Experience documentation: https://datascience.ibm.com/docs/content/analyze-data/notebooks-parent.html