<table style="border: none" align="left">
    <tr style="border: none">
       <th style="border: none"><img src="https://raw.githubusercontent.com/pmservice/cars-4-you/master/static/images/logo.png" width="200" alt="Icon"></th>
       <th style="border: none"><font face="verdana" size="5" color="black"><b>Customer Satisfaction Prediction</b></font></th>
   </tr>
</table>

<img align=left src="https://github.com/pmservice/cars-4-you/raw/master/static/images/ai_function.png" alt="Icon" width="664">

A Keras model and an AI function that determine whether a customer comment is a complaint.

Contents

- [0. Setup](#setup)
- [1. Introduction](#introduction)
- [2. Load and explore data](#load)
- [3. Create Keras model using TensorFlow backend](#model)
- [4. Store the model in the repository](#persistence)
- [5. Deploy the model](#deployment)
- [6. AI function](#function)
- [7. Payload logging for AI function](#ai_function)

<a id="setup"></a>
## 0. Setup

Install TensorFlow version 1.5 and version 1.0.260 of `watson-machine-learning-client`.

In [2]:
!pip install --upgrade tensorflow==1.5

Collecting tensorflow==1.5
  Downloading https://files.pythonhosted.org/packages/43/aa/fe3e9d0b48db4adde9781658b9354814b6cdf6acbfeaa7b2677c7b8002d6/tensorflow-1.5.0-cp35-cp35m-manylinux1_x86_64.whl (44.4MB)
    100% |████████████████████████████████| 44.4MB 22kB/s  eta 0:00:01
Collecting absl-py>=0.1.6 (from tensorflow==1.5)
  Downloading https://files.pythonhosted.org/packages/96/5d/18feb90462c8edaae71305716c7e8bac479fc9dface63221f808a6b95880/absl-py-0.3.0.tar.gz (84kB)
    100% |████████████████████████████████| 92kB 9.6MB/s eta 0:00:01
Requirement not upgraded as not directly required: six>=1.10.0 in /opt/conda/envs/DSX-Python35/lib/python3.5/site-packages (from tensorflow==1.5)
Requirement not upgraded as not directly required: protobuf>=3.4.0 in /opt/conda/envs/DSX-Python35/lib/python3.5/site-packages (from tensorflow==1.5)
Requirement not upgraded as not directly required: wheel>=0.26 in /opt/conda/envs/DSX-Python35/lib/python3.5/site-packages (from tensorflow==

In [4]:
!rm -rf $PIP_BUILD/watson-machine-learning-client
!pip install --upgrade watson-machine-learning-client==1.0.260

Requirement already up-to-date: watson-machine-learning-client==1.0.260 in /opt/conda/envs/DSX-Python35/lib/python3.5/site-packages
Requirement not upgraded as not directly required: ibm-cos-sdk in /opt/conda/envs/DSX-Python35/lib/python3.5/site-packages (from watson-machine-learning-client==1.0.260)
Requirement not upgraded as not directly required: lomond in /opt/conda/envs/DSX-Python35/lib/python3.5/site-packages (from watson-machine-learning-client==1.0.260)
Requirement not upgraded as not directly required: tabulate in /opt/conda/envs/DSX-Python35/lib/python3.5/site-packages (from watson-machine-learning-client==1.0.260)
Requirement not upgraded as not directly required: urllib3 in /opt/conda/envs/DSX-Python35/lib/python3.5/site-packages (from watson-machine-learning-client==1.0.260)
Requirement not upgraded as not directly required: certifi in /opt/conda/envs/DSX-Python35/lib/python3.5/site-packages (from watson-machine-learning-client==1.0.260)
Requirement not upgraded as not di

<a id="introduction"></a>
## 1. Introduction

This notebook trains a **Keras** (TensorFlow) model to predict customer satisfaction from the feedback text that customers provide. It also shows how to use an **AI Function** to perform the data preprocessing that the deep learning model requires before scoring.

<a id="load"></a>
## 2. Load and explore data

In this section the data is loaded into a pandas DataFrame.

In [5]:

from ibmdbpy import IdaDataBase, IdaDataFrame

# @hidden_cell
# This connection object is used to access your data and contains your credentials.
# You might want to remove those credentials before you share your notebook.
idadb_c166344e776040b39f477655199897f8 = IdaDataBase(dsn='DASHDB;Database=BLUDB;Hostname=dashdb-entry-yp-dal10-01.services.dal.bluemix.net;Port=50000;PROTOCOL=TCPIP;UID=dash5120;PWD=***')

data_df = IdaDataFrame(idadb_c166344e776040b39f477655199897f8, 'DASH5120.CAR_RENTAL_TRAINING').as_dataframe()
data_df.head()

# You can close the database connection with the following code. Please keep the comment line with the @hidden_cell tag,
# because the close function displays parts of the credentials.
# @hidden_cell
# idadb_c166344e776040b39f477655199897f8.close()
# To learn more about the ibmdbpy package, please read the documentation: http://pythonhosted.org/ibmdbpy/


| Unnamed: 0 | ID | Gender | Status | Children | Age | Customer_Status | Car_Owner | Customer_Service | Satisfaction | Business_Area | Action |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 74 | Male | M | 1 | 26.26 | Active | No | no wait for pick up and drop off was great, he... | 1 | Product: Information | |
| 1 | 83 | Female | M | 2 | 48.85 | Inactive | Yes | I thought the representative handled the initi... | 0 | Product: Availability/Variety/Size | Free Upgrade |
| 2 | 140 | Female | S | 0 | 36.92 | Inactive | No | Everyone was very cooperative. The auto was r... | 1 | Product: Functioning | |
| 3 | 191 | Male | M | 0 | 45.51 | Inactive | Yes | what customer service? It was a nightmare | 0 | Service: Knowledge | Voucher |
| 4 | 239 | Male | M | 1 | 46.0 | Inactive | Yes | They did not have the auto I wanted. upgraded... | 0 | Product: Availability/Variety/Size | Free Upgrade |


**Note:** 0 - not satisfied, 1 - satisfied

Extract the needed columns and count the number of records.

In [6]:
complain_data = data_df[['Customer_Service', 'Satisfaction']]

In [7]:
print(complain_data.count())

Customer_Service    482
Satisfaction        482
dtype: int64


<a id="model"></a>
## 3. Create Keras model using TensorFlow backend


In [8]:
import os

import numpy
from keras.layers import Dense, Embedding, LSTM, SpatialDropout1D
from keras.layers.convolutional import Conv1D, MaxPooling1D
from keras.models import Sequential
from keras.preprocessing import sequence
from keras.preprocessing.sequence import pad_sequences
from keras.preprocessing.text import Tokenizer
from sklearn.model_selection import train_test_split

Using TensorFlow backend.


### 3.1 Prepare data

In [9]:
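# Vocabulary size: the tokenizer keeps only the most frequent words (indices below max_fatures)
max_fatures = 500

# Note: iterrows() yields a copy of each row, so this assignment most likely
# leaves complain_data unchanged.
for idx,row in complain_data.iterrows():
    row[0] = row[0].replace('rt',' ')

# Map each comment to a sequence of integer word indices
tokenizer = Tokenizer(num_words=max_fatures, split=' ')
tokenizer.fit_on_texts(complain_data['Customer_Service'].values)
X = tokenizer.texts_to_sequences(complain_data['Customer_Service'].values)

# Pad (or truncate) every sequence to a fixed length of maxlen tokens
maxlen = 50

X = pad_sequences(X, maxlen=maxlen)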
print(X.shape)

(482, 50)
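To make the padding step concrete, here is a minimal pure-Python sketch of the default behavior of `pad_sequences` (zeros added at the front, overlong sequences cut from the front). The helper name `pad_pre` is hypothetical and is not part of Keras; it only illustrates the idea.

```python
def pad_pre(sequence, maxlen, value=0):
    """Mimic the pad_sequences defaults: pad and truncate at the front."""
    # Keep only the last `maxlen` items when the sequence is too long.
    trimmed = list(sequence)[-maxlen:]
    # Fill the front with `value` until the length is exactly `maxlen`.
    return [value] * (maxlen - len(trimmed)) + trimmed


short = pad_pre([40, 107, 131], maxlen=5)
long = pad_pre([1, 2, 3, 4, 5, 6, 7], maxlen=5)
print(short)
print(long)
```

This is why short comments show up as rows that start with a long run of zeros.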


Split into train and test datasets.

In [10]:
Y = complain_data['Satisfaction'].values
X_train, X_test, Y_train, Y_test = train_test_split(X,Y, test_size = 0.33, random_state = 42)

print(X_train.shape,Y_train.shape)
print(X_test.shape,Y_test.shape)

(322, 50) (322,)
(160, 50) (160,)
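The split sizes can be checked by hand. The sketch below assumes that the test share is rounded up to a whole number of samples and that the remainder goes to training, which is consistent with the shapes printed above.

```python
import math

n_samples = 482
test_size = 0.33

# 0.33 * 482 = 159.06, which rounds up to 160 test samples.
n_test = math.ceil(test_size * n_samples)
# Everything that is not in the test set is used for training.
n_train = n_samples - n_test

print(n_train, n_test)
```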


### 3.2 Design and train model

Create the network definition: an embedding layer, followed by a 1D convolution with max pooling, a Long Short-Term Memory (LSTM) layer, and a sigmoid output unit.

In [11]:
embedding_vector_length = 32

model = Sequential()
model.add(Embedding(max_fatures, embedding_vector_length, input_length=maxlen))
model.add(Conv1D(filters=32, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(LSTM(100))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 50, 32)            16000     
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 50, 32)            3104      
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 25, 32)            0         
_________________________________________________________________
lstm_1 (LSTM)                (None, 100)               53200     
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 101       
Total params: 72,405
Trainable params: 72,405
Non-trainable params: 0
_________________________________________________________________
None
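The parameter counts in the summary can be reproduced with a few lines of arithmetic. This is a sketch using the standard formulas for these layer types; the variable names are illustrative only.

```python
vocab_size, embed_dim = 500, 32      # Embedding(max_fatures, embedding_vector_length)
conv_filters, kernel_size = 32, 3    # Conv1D(filters=32, kernel_size=3)
lstm_units = 100                     # LSTM(100)

# Embedding: one vector of size embed_dim per vocabulary entry.
embedding_params = vocab_size * embed_dim
# Conv1D: kernel weights plus one bias per filter.
conv_params = kernel_size * embed_dim * conv_filters + conv_filters
# LSTM: four gates, each with input weights, recurrent weights, and a bias.
lstm_params = 4 * ((conv_filters + lstm_units) * lstm_units + lstm_units)
# Dense(1): one weight per LSTM unit plus a bias.
dense_params = lstm_units * 1 + 1

total_params = embedding_params + conv_params + lstm_params + dense_params
print(embedding_params, conv_params, lstm_params, dense_params, total_params)
```

`MaxPooling1D` has no trainable weights, which is why its row shows 0 parameters.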


Train the model.

In [12]:
history = model.fit(X_train, Y_train, validation_data=(X_test, Y_test), epochs=20, batch_size=64)

Train on 322 samples, validate on 160 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [13]:
print("Best accuracy on test: %3.3f" % numpy.array(history.history['val_acc']).max())

Best accuracy on test: 0.950


**Note:** For the purposes of this demo, model tuning has been skipped.

Evaluate the trained model, then store and archive it on the notebook filesystem.

In [14]:
# evaluate the model
scores = model.evaluate(X_test, Y_test, verbose=0)
print("Evaluation Accuracy: %.2f%%" % (scores[1]*100))

Evaluation Accuracy: 93.75%
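The two accuracy figures differ because they measure different things: the earlier cell reports the best validation accuracy over all epochs, while `model.evaluate` uses the weights left after the final epoch. The sketch below illustrates this with a hypothetical list of per-epoch values (not the values from this run) and converts both reported figures into counts of correctly classified test samples.

```python
# Hypothetical validation accuracies, one per epoch (illustration only).
val_acc = [0.80, 0.90, 0.95, 0.94]

best = max(val_acc)      # what the "best accuracy" cell reports
final = val_acc[-1]      # what evaluating the final weights would report

# Reported figures from this notebook, expressed as correct predictions out of 160.
n_val = 160
best_correct = round(0.950 * n_val)
final_correct = round(0.9375 * n_val)

print(best, final, best_correct, final_correct)
```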


In [15]:
filename = 'complain_model.h5'
model.save(filename)

# Compress the Keras model into a gzipped tar archive
tar_filename = filename + ".tgz"
cmdstring = "tar -zcvf " + tar_filename + " " + filename
os.system(cmdstring)
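As an alternative to shelling out to `tar`, the archive can be built with Python's standard `tarfile` module. The following is a minimal sketch; the helper `make_tgz` and the stand-in file `demo_model.h5` are hypothetical and only serve the demonstration.

```python
import os
import tarfile
import tempfile


def make_tgz(path):
    """Create <path>.tgz containing the single file at `path`."""
    tar_path = path + ".tgz"
    with tarfile.open(tar_path, "w:gz") as archive:
        # arcname stores only the file name, not the full directory path.
        archive.add(path, arcname=os.path.basename(path))
    return tar_path


# Demo with a stand-in file instead of the real complain_model.h5.
workdir = tempfile.mkdtemp()
demo_file = os.path.join(workdir, "demo_model.h5")
with open(demo_file, "wb") as handle:
    handle.write(b"not a real model")

archive_path = make_tgz(demo_file)
with tarfile.open(archive_path, "r:gz") as archive:
    members = archive.getnames()
print(members)
```

This avoids building a shell command from strings and works on systems without a `tar` binary.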

In [16]:
!ls -lat

total 1692
-rw-r----- 1 dsxuser dsxuser 815998 Jul 26 07:27 complain_model.h5.tgz
drwxr-x--- 2 dsxuser dsxuser   4096 Jul 26 07:27 .
-rw-r----- 1 dsxuser dsxuser 903944 Jul 26 07:27 complain_model.h5
drwx------ 1 dsxuser dsxuser   4096 Jul 26 07:27 ..


<a id="persistence"></a>
## 4. Store the model in the repository

In [17]:
from watson_machine_learning_client import WatsonMachineLearningAPIClient



In [18]:
# @hidden_cell

wml_credentials = {
  "apikey": "***",
  "iam_apikey_description": "Auto generated apikey during resource-key operation for Instance - crn:v1:bluemix:public:pm-20:us-south:a/e0f7ec3ac1b24ec9ae771efd772538a2:aaed6937-c0e7-4307-8a17-361aca257c7e::",
  "iam_apikey_name": "auto-generated-apikey-fb47bad6-4fd2-4d0c-9f65-958c383d6460",
  "iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Writer",
  "iam_serviceid_crn": "crn:v1:bluemix:public:iam-identity::a/e0f7ec3ac1b24ec9ae771efd772538a2::serviceid:ServiceId-d65f2cf0-84dd-47b7-86ed-d7a7b0c8e91c",
  "instance_id": "aaed6937-c0e7-4307-8a17-361aca257c7e",
  "password": "***",
  "url": "https://us-south.ml.cloud.ibm.com",
  "username": "fb47bad6-4fd2-4d0c-9f65-958c383d6460"
}

In [19]:
client = WatsonMachineLearningAPIClient(wml_credentials)

In [20]:
model_props = {
    client.repository.ModelMetaNames.NAME: "CARS4U - Satisfaction Prediction Model",
    client.repository.ModelMetaNames.FRAMEWORK_NAME: "tensorflow",
    client.repository.ModelMetaNames.FRAMEWORK_VERSION: "1.5",
    client.repository.ModelMetaNames.RUNTIME_NAME: "python",
    client.repository.ModelMetaNames.RUNTIME_VERSION: "3.5",
    client.repository.ModelMetaNames.FRAMEWORK_LIBRARIES: [{'name':'keras', 'version': '2.1.3'}]
}

published_model_details = client.repository.store_model(model=tar_filename, meta_props=model_props)       

In [21]:
model_uid = client.repository.get_model_uid(published_model_details)
print(model_uid)

5a937a28-0696-4bb0-bdf2-cb54322d28df


<a id="deployment"></a>
## 5. Deploy the model

### 5.1 Create deployment

In [22]:
deployment = client.deployments.create(model_uid, 'CARS4U - Satisfaction Prediction Model Deployment')



#######################################################################################

Synchronous deployment creation for uid: '5a937a28-0696-4bb0-bdf2-cb54322d28df' started

#######################################################################################


INITIALIZING
DEPLOY_IN_PROGRESS..
DEPLOY_SUCCESS


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='dca4156f-63ad-49dc-ae2c-82c0592daf65'
------------------------------------------------------------------------------------------------




In [23]:
client.deployments.list()

------------------------------------  -------------------------------------------------  ------  --------------  ------------------------  --------------  ----------
GUID                                  NAME                                               TYPE    STATE           CREATED                   FRAMEWORK       ASSET TYPE
dca4156f-63ad-49dc-ae2c-82c0592daf65  CARS4U - Satisfaction Prediction Model Deployment  online  DEPLOY_SUCCESS  2018-07-26T07:28:07.551Z  tensorflow-1.5  model
------------------------------------  -------------------------------------------------  ------  --------------  ------------------------  --------------  ----------


### 5.2 Score the model

Let's see if our deployment works.

In [24]:
scoring_endpoint = client.deployments.get_scoring_url(deployment)

In [25]:
print(scoring_endpoint)

https://us-south.ml.cloud.ibm.com/v3/wml_instances/aaed6937-c0e7-4307-8a17-361aca257c7e/deployments/dca4156f-63ad-49dc-ae2c-82c0592daf65/online


In [26]:
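index = 5

# Note: the payload row is taken from the full matrix X, while the printed
# row and label come from the test split, so they may describe different records.
scoring_data = X[index].tolist()
print(X_test[index])
print(Y_test[index])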

[  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0  40 107 131  19   1  22  77]
0


In [27]:
scoring_payload = {'values': [scoring_data]}
scores = client.deployments.score(scoring_endpoint, scoring_payload)

In [28]:
print(scoring_payload)

{'values': [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 266, 138, 139, 267, 207, 115, 12, 8]]}


In [29]:
len(scoring_payload['values'][0])

50

In [30]:
print({'values': [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 266, 138, 139, 267, 207, 115, 12, 8], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 2, 78, 5, 7, 13, 122, 3, 109, 51, 0, 0, 58, 15, 808, 31, 7, 23]]})

{'values': [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 266, 138, 139, 267, 207, 115, 12, 8], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 2, 78, 5, 7, 13, 122, 3, 109, 51, 0, 0, 58, 15, 808, 31, 7, 23]]}


Let's print the scoring results.

In [31]:
print(str(scores))

{'values': [[[0.024061590433120728], [0], [0.024061590433120728]]], 'fields': ['prediction', 'prediction_classes', 'probability']}
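The response pairs a list of `fields` with one row of `values` per scored record. A small helper can turn it into dictionaries that are easier to read. This is a sketch: `to_records` is a hypothetical name, and the response literal is copied from the output above.

```python
response = {
    'fields': ['prediction', 'prediction_classes', 'probability'],
    'values': [[[0.024061590433120728], [0], [0.024061590433120728]]],
}


def to_records(scoring_response):
    """Pair each field name with the matching entry of every result row."""
    fields = scoring_response['fields']
    return [dict(zip(fields, row)) for row in scoring_response['values']]


records = to_records(response)
# Class 0 means "not satisfied", class 1 means "satisfied".
predicted_class = records[0]['prediction_classes'][0]
label = "satisfied" if predicted_class == 1 else "not satisfied"
print(label)
```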


<a id="function"></a>
## 6. AI function

Let's define an AI function that does the data preprocessing and model scoring for us. As shown above, the model expects numerical input, so the text comment needs to be preprocessed first.

### 6.1 Definition

Define some generic parameters our function will use to score the model.

#### Parameters

In [32]:
ai_params = {
    'scoring_endpoint': scoring_endpoint,
    'wml_credentials': wml_credentials,
    'word_index': tokenizer.word_index
}

#### Function

In [33]:
def score_generator(params=ai_params):

    def score(payload):
        import re
        from watson_machine_learning_client import WatsonMachineLearningAPIClient
        client = WatsonMachineLearningAPIClient(params['wml_credentials'])
        
        max_fatures = 500
        maxlen = 50

        preprocessed_records = []
        complain_data = payload['values']
        word_index = params['word_index']

        for data in complain_data:
            comment = data[0]
            cleanString = re.sub(r"[!\"#$%&()*+,-./:;<=>?@[\]^_`{|}~]", "", comment)
            splitted_comment = cleanString.split()[:maxlen]
            hashed_tokens = []

            for token in splitted_comment:
                index = word_index.get(token, 0)
                # Keep only indices the Tokenizer would emit (1 .. max_fatures - 1)
                if 0 < index < max_fatures:
                    hashed_tokens.append(index)

            hashed_tokens_size = len(hashed_tokens)
            padded_tokens = [0]*(maxlen-hashed_tokens_size) + hashed_tokens
            preprocessed_records.append(padded_tokens)

        scoring_payload = {'values': preprocessed_records}
        print(str(scoring_payload))
        
        return client.deployments.score(params['scoring_endpoint'], scoring_payload)
        
        
    return score
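The preprocessing step can be isolated and tested without a deployment. The sketch below is one possible variant, not the function stored in this notebook: it lowercases the comment (the Keras `Tokenizer` lowercases by default, so a case-sensitive lookup would miss capitalized words such as "The"), replaces punctuation with spaces, and keeps the last `maxlen` known tokens. The names `preprocess`, `FILTERS`, `TO_SPACE`, and `toy_index` are hypothetical.

```python
# Characters the Keras Tokenizer filters out by default (the apostrophe is kept).
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'
TO_SPACE = str.maketrans({char: " " for char in FILTERS})


def preprocess(comment, word_index, num_words=500, maxlen=50):
    """Turn one comment into a left-padded list of `maxlen` word indices."""
    # Lowercase first: word_index keys are lowercase when the Tokenizer
    # was created with its default lower=True.
    tokens = comment.lower().translate(TO_SPACE).split()
    # Drop unknown words and words outside the num_words vocabulary.
    ids = [word_index[t] for t in tokens if 0 < word_index.get(t, 0) < num_words]
    # Keep the last `maxlen` ids and pad on the left with zeros.
    ids = ids[-maxlen:]
    return [0] * (maxlen - len(ids)) + ids


# Hypothetical vocabulary, for illustration only.
toy_index = {"the": 1, "car": 2, "was": 3, "great": 4, "rare": 600}
encoded = preprocess("The car was GREAT, rare!", toy_index, maxlen=6)
print(encoded)
```

Here "rare" is dropped because its index lies outside the 500-word vocabulary, and the result is padded to the requested length.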

#### Test locally

In [34]:
sample_data = {
    'fields': ['feedback'],
    'values': [
        ['delayed shuttle, almost missed flight, bad customer service'],
        ['The car was great and they were able to provide all features I wanted with limited time they had.']
    ]
}

In [35]:
score = score_generator()
score(sample_data)

{'values': [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 266, 138, 139, 267, 207, 115, 12, 8], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 2, 78, 5, 7, 13, 122, 3, 109, 51, 58, 15, 31, 7, 23]]}


{'fields': ['prediction', 'prediction_classes', 'probability'],
 'values': [[[0.024061603471636772], [0], [0.024061603471636772]],
  [[0.9951716065406799], [1], [0.9951716065406799]]]}

**Note:** 0 - not satisfied, 1 - satisfied

### 6.2 AI function storing

In [36]:
runtime_meta = {
            client.runtime_specs.ConfigurationMetaNames.NAME: "Runtime specification",
            client.runtime_specs.ConfigurationMetaNames.PLATFORM: {
               "name": "python",
               "version": "3.5"
            }
}

In [37]:
runtime_details = client.runtime_specs.create(meta_props=runtime_meta)
runtime_url = client.runtime_specs.get_url(runtime_details)
print(runtime_url)

https://us-south.ml.cloud.ibm.com/v4/runtimes/dc3ccdf3-c3a1-4da6-802f-e0406313f5e3


In [38]:
client.repository.FunctionMetaNames.show()

------------------  ----  --------  -------------
META_PROP NAME      TYPE  REQUIRED  DEFAULT_VALUE
NAME                str   Y
DESCRIPTION         str   N
TYPE                str   N         python
RUNTIME_URL         str   N
INPUT_DATA_SCHEMA   dict  N
OUTPUT_DATA_SCHEMA  dict  N
TAGS                list  N
------------------  ----  --------  -------------


In [39]:
meta_data = {
    client.repository.FunctionMetaNames.NAME: 'CARS4U - Satisfaction Prediction - AI Function',
    client.repository.FunctionMetaNames.RUNTIME_URL: runtime_url
}

function_details = client.repository.store_function(meta_props=meta_data, function=score_generator)

Recognized generator function.
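The "generator function" pattern used here is an ordinary Python closure: an outer function receives its parameters as default arguments and returns the inner function that does the scoring. The sketch below shows the pattern with a stand-in scoring rule instead of a call to the service; `make_scorer` and its `threshold` parameter are hypothetical, and the assumption that the default arguments travel with the stored function is inferred from how `score_generator` is written in this notebook.

```python
def make_scorer(params={"threshold": 0.5}):
    # `params` is captured by the inner function, so `score` does not
    # depend on any global variable of the notebook. The dict default
    # mirrors the notebook's pattern and is never mutated here.
    def score(payload):
        threshold = params["threshold"]
        # Each row holds a single probability; map it to class 0 or 1.
        classes = [[int(p >= threshold)] for (p,) in payload["values"]]
        return {"fields": ["prediction_classes"], "values": classes}

    return score


score_fn = make_scorer()
result = score_fn({"values": [[0.02], [0.99]]})
print(result)
```

Imports are placed inside the inner function in the notebook for the same reason: the deployed code has to be self-contained.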


In [40]:
client.repository.list_functions()

------------------------------------  -------------------------------------------  ------------------------  ------
GUID                                  NAME                                         CREATED                   TYPE
402d10f6-b7a4-434b-b76c-4a7069c9ab44  CARS4U - Sentiment Prediction - AI Function  2018-07-26T07:28:31.810Z  python
------------------------------------  -------------------------------------------  ------------------------  ------


### 6.3 AI function deployment

In [41]:
function_uid = client.repository.get_function_uid(function_details)

function_deployment_details = client.deployments.create(asset_uid=function_uid, name='CARS4U - Satisfaction Prediction - AI Function Deployment')



#######################################################################################

Synchronous deployment creation for uid: '402d10f6-b7a4-434b-b76c-4a7069c9ab44' started

#######################################################################################


INITIALIZING
DEPLOY_IN_PROGRESS....
DEPLOY_SUCCESS


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='58856df6-8a11-4e7f-aae9-bdba3dc51272'
------------------------------------------------------------------------------------------------




### 6.4 Score AI function

In [42]:
ai_function_scoring_endpoint = client.deployments.get_scoring_url(function_deployment_details)

print(ai_function_scoring_endpoint)

https://us-south.ml.cloud.ibm.com/v3/wml_instances/aaed6937-c0e7-4307-8a17-361aca257c7e/deployments/58856df6-8a11-4e7f-aae9-bdba3dc51272/online


In [43]:
response = client.deployments.score(ai_function_scoring_endpoint, sample_data)

In [44]:
print(response)

{'values': [[[0.024061603471636772], [0], [0.024061603471636772]], [[0.9951716065406799], [1], [0.9951716065406799]]], 'fields': ['prediction', 'prediction_classes', 'probability']}


<a id="ai_function"></a>
## 7. Payload logging for AI function

In [45]:
function_deployment_uid = client.deployments.get_uid(function_deployment_details)
print(function_deployment_uid)

58856df6-8a11-4e7f-aae9-bdba3dc51272


In [46]:
# @hidden_cell
postgres_connection = {
  'database':'compose',
  'password':"""***""",
  'port':'47860',
  'host':'sl-us-south-1-portal.28.dblayer.com',
  'username':'admin'
}

In [47]:
payload_data_reference = {
    "type": "postgresql",
    "location": {
        "tablename": "public.cars4u_satsisfaction_prediction_payload"
    },
    "connection": {
            "uri": "postgres://{username}:{password}@{host}:{port}/{database}".format(**postgres_connection)
        }
}

print(payload_data_reference)

{'location': {'tablename': 'public.cars4u_satsisfaction_prediction_payload'}, 'type': 'postgresql', 'connection': {'uri': 'postgres://admin:***@sl-us-south-1-portal.28.dblayer.com:47860/compose'}}
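When building a connection URI from separate fields, it helps to percent-encode the user name and password (so characters such as `@` or `/` do not break the URI) and to mask the password before printing. The following is a minimal sketch using only the standard library; `build_postgres_uri` and the credentials shown are hypothetical.

```python
from urllib.parse import quote


def build_postgres_uri(conn, mask_password=False):
    """Assemble a postgres:// URI from a dict of connection fields."""
    # Percent-encode the password unless a masked version is requested.
    password = "***" if mask_password else quote(conn["password"], safe="")
    return "postgres://{user}:{pw}@{host}:{port}/{db}".format(
        user=quote(conn["username"], safe=""),
        pw=password,
        host=conn["host"],
        port=conn["port"],
        db=conn["database"],
    )


# Hypothetical credentials, for illustration only.
demo_connection = {
    "database": "compose",
    "password": "p@ss/word",
    "port": "47860",
    "host": "db.example.com",
    "username": "admin",
}

real_uri = build_postgres_uri(demo_connection)
safe_uri = build_postgres_uri(demo_connection, mask_password=True)
print(safe_uri)
```

Only the masked form should be printed or logged; the real URI goes into the payload data reference.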


In [48]:
payload_metadata = {
    client.deployments.PayloadLoggingMetaNames.PAYLOAD_DATA_REFERENCE: payload_data_reference
}

In [50]:
config_details = client.deployments.setup_payload_logging(function_deployment_uid, meta_props=payload_metadata)

In [51]:
print(config_details)

{'payload_store': {'location': {'tablename': 'public.cars4u_satsisfaction_prediction_payload'}, 'type': 'postgresql', 'connection': {'db': 'compose', 'uri': 'postgres://admin:***@sl-us-south-1-portal.28.dblayer.com:47860/compose', 'host': 'sl-us-south-1-portal.28.dblayer.com:47860'}}, 'dynamic_schema_update': True}


#### Score AI function again to have some payload records

In [52]:
client.deployments.score(ai_function_scoring_endpoint, sample_data)

{'fields': ['prediction', 'prediction_classes', 'probability'],
 'values': [[[0.024061603471636772], [0], [0.024061603471636772]],
  [[0.9951716065406799], [1], [0.9951716065406799]]]}

---
