<h1 id="weather_station_clustering" align="center"> Clustering Ghanaian households based on their expenditure patterns</h1>
<hr>

### 1.7 Model Deployment

The results of this project will be deployed in two formats. First, the insights found in the data will be reported in a PDF file and shared with stakeholders. The report can be found at https://github.com/EmmanuelAmeyaw/IBM-capstone-prject
Second, to classify new households (new data) into the clusters found in the dataset, we will build a Keras deep learning classifier that learns the cluster assignments. The model will then be deployed on the IBM Watson Machine Learning platform.

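We first check the installed versions of Keras and TensorFlow against the versions expected by the deployment platform. As the outputs below show, the installed versions (Keras 2.1.5, TensorFlow 1.8.0) are newer than the expected ones (Keras 2.1.3, TensorFlow 1.5.0), so they do not match exactly.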

In [2]:
import keras
print('Current:\t', keras.__version__)
print('Expected:\t 2.1.3')

Using TensorFlow backend.


Current:	 2.1.5
Expected:	 2.1.3


In [3]:
import tensorflow as tf
print('Current:\t', tf.__version__)
print('Expected:\t 1.5.0')

Current:	 1.8.0
Expected:	 1.5.0
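The comparison above is done by eye. A minimal sketch of the same check in code, assuming plain dotted numeric version strings (the helper names `version_tuple` and `check_version` are illustrative and not part of Keras or TensorFlow):

```python
def version_tuple(version):
    """Turn a dotted version string such as '2.1.5' into a tuple of ints."""
    return tuple(int(part) for part in version.split('.'))


def check_version(name, current, expected):
    """Return a one-line report comparing an installed version with the expected one."""
    cur, exp = version_tuple(current), version_tuple(expected)
    if cur == exp:
        status = 'matches'
    elif cur > exp:
        status = 'is newer than'
    else:
        status = 'is older than'
    return '%s %s %s expected %s' % (name, current, status, expected)


# Version strings taken from the cell outputs above
print(check_version('keras', '2.1.5', '2.1.3'))
print(check_version('tensorflow', '1.8.0', '1.5.0'))
```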


Next, we will train a household classifier model. First, let's import the necessary libraries.

In [5]:
import keras
from keras.models import Model, Sequential, load_model
from keras.layers import Input, Dense, Dropout, Flatten, LeakyReLU
from keras.optimizers import RMSprop, SGD
from keras import backend as K
import numpy as np
import pandas as pd

Load normalized data

In [22]:
df = pd.read_csv('XX_dfl2.csv')
df.drop('Unnamed: 0', axis = 1, inplace = True)
df.head(2)

Unnamed: 0,totfood,totalch,totclth,tothous,totfurn,tothlth,tottrsp,totcmnq,totrcre,toteduc,totmisc,label
0,0.187262,0.154462,0.381787,0.160858,0.205355,0.123294,0.125713,0.208678,0.144971,0.28414,0.199687,0
1,0.187262,0.154462,0.381787,0.160858,0.205355,0.123294,0.125713,0.208678,0.144971,0.28414,0.199687,0


Remove noisy data points. Data points with a label of -1 are noise.

In [24]:
condition = df.label != -1
df = df[condition]

In [25]:
df.label.value_counts()

1    5626
0    1945
2    1281
Name: label, dtype: int64
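The three clusters are far from equal in size, which matters when reading an accuracy figure later. A short worked calculation, using only the counts printed above, gives each cluster's share and the accuracy of a trivial classifier that always predicts the largest cluster:

```python
# Cluster sizes copied from the value_counts() output above
counts = {1: 5626, 0: 1945, 2: 1281}
total = sum(counts.values())

# Share of each cluster among the remaining households
shares = {label: n / total for label, n in counts.items()}
for label in sorted(shares):
    print('cluster %d: %.1f%%' % (label, 100 * shares[label]))

# Accuracy of a trivial classifier that always predicts the largest cluster
majority_baseline = max(counts.values()) / total
print('majority baseline accuracy: %.3f' % majority_baseline)
```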

Create train and test sets

In [35]:
from sklearn.model_selection import train_test_split
train_set, test_set = train_test_split(df, test_size=0.3, random_state=42)
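With `test_size=0.3`, scikit-learn rounds the test fraction up to a whole number of rows and assigns the remaining rows to the training set. A short check, using the 8852 rows left after noise removal, reproduces the sample counts that Keras reports during training:

```python
import math

n_samples = 5626 + 1945 + 1281   # rows left after dropping label -1
test_size = 0.3

n_test = math.ceil(test_size * n_samples)   # test rows are rounded up
n_train = n_samples - n_test                # remaining rows go to training

print(n_train, n_test)
```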


Create input data

In [43]:
x_train = train_set.iloc[:,0:11].values

In [45]:
x_test = test_set.iloc[:,0:11].values

In [47]:
y_train = train_set.iloc[:,11:12].values

In [48]:
y_test = test_set.iloc[:,11:12].values

Checking data types

In [49]:
df.dtypes

totfood    float64
totalch    float64
totclth    float64
tothous    float64
totfurn    float64
tothlth    float64
tottrsp    float64
totcmnq    float64
totrcre    float64
toteduc    float64
totmisc    float64
label        int64
dtype: object

In [50]:
# convert class vectors (int) to binary class matrices
num_classes = 3
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
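To make the conversion concrete, here is a plain-Python illustration of one-hot encoding. It is a sketch of the idea only, not the Keras implementation, and the function name `one_hot` is made up for this example:

```python
def one_hot(labels, num_classes):
    """Encode integer class labels as rows with a single 1.0 at the label's index."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows


print(one_hot([0, 1, 2], 3))
```

A household with label 0 therefore becomes `[1.0, 0.0, 0.0]`, which is the form in which `y_test` rows are printed in the examples further down.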


Training a model

In [53]:
model = Sequential()
#Add layers
model.add(Dense(500, activation='relu', input_shape=(11,)))
model.add(Dense(500, activation='relu'))
model.add(Dense(2000, activation='relu'))
model.add(Dense(2000, activation='softmax'))
model.add(Dense(500, activation='relu'))
model.add(Dense(500, activation='relu'))
model.add(Dense(500, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))

#Compile model with loss and optimizer
model.compile(loss='categorical_crossentropy',
        optimizer='rmsprop',
        metrics=['accuracy'])

#Train network
batch_size = 128
epochs = 50
model.fit(x_train, y_train,
        batch_size=batch_size,
        epochs=epochs,
        verbose=1,
        validation_data=(x_test, y_test))

# Evaluate model        
score = model.evaluate(x_test, y_test, verbose=0)

print('\n')
print('Accuracy:',score[1])

Train on 6196 samples, validate on 2656 samples
Epoch 1/50
...
Epoch 50/50


Accuracy: 1.0


This accuracy level is not surprising, since the labels were generated through a deep learning autoencoder. Hence, a deep learning model using similar layers is able to identify the clusters perfectly.
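Because the clusters differ in size, a per-cluster view is a useful companion to overall accuracy. A minimal sketch in plain Python, with made-up label lists standing in for `np.argmax(y_test, axis=1)` and the model's predicted classes (the function names are illustrative):

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Count how often each true class (row) is predicted as each class (column)."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true, pred in zip(true_labels, predicted_labels):
        matrix[true][pred] += 1
    return matrix


def per_class_recall(matrix):
    """Fraction of each true class that was predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Illustrative labels only, not results from the model above
true_labels = [0, 0, 1, 1, 1, 2]
predicted_labels = [0, 0, 1, 1, 0, 2]
matrix = confusion_matrix(true_labels, predicted_labels, 3)
print(matrix)
print(per_class_recall(matrix))
```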

In [70]:
#some cleanup from the previous run
!rm -f ker_*
!rm -f kker_*
!rm -f my_best_model.tgz

We are satisfied with the model above, so we will save it.

In [72]:
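# Tags embedded in the file name
activation_function_layer_1 = 'softmax'
optimizer = 'rmsprop'
score = model.evaluate(x_test, y_test, verbose=0)
# The file name encodes the activation tag, the optimizer, and the test accuracy
save_path = "ker_func_mnist_model_2.%s.%s.%s.h5" % (activation_function_layer_1, optimizer, score[1])
model.save(save_path)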

Let's view the saved model

In [73]:
ls -ltr ker_*

-rw-r--r-- 1 jupyterlab resources 54151768 Jul  7 06:20 ker_func_mnist_model_2.softmax.rmsprop.1.0.h5
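The file size is roughly what the architecture predicts. A back-of-the-envelope check, assuming 32-bit floats and assuming the saved file also holds one RMSprop accumulator per weight (Keras saves optimizer state by default), lands close to the 54151768 bytes listed above:

```python
# Layer widths of the Sequential model above: 11 inputs, then each Dense layer
layer_sizes = [11, 500, 500, 2000, 2000, 500, 500, 500, 3]

# A Dense layer has (inputs * units) weights plus one bias per unit
total_params = sum(n_in * n_out + n_out
                   for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

bytes_per_float = 4                      # assumption: float32
weights_bytes = total_params * bytes_per_float
with_optimizer = 2 * weights_bytes       # assumption: one accumulator per weight

print(total_params, weights_bytes, with_optimizer)
```

The small remainder would be file-format overhead such as the stored model configuration, if these assumptions hold.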


Putting the model in a .tgz file

In [74]:
!tar -zcvf my_best_model.tgz ker_func_mnist_model_2.softmax.rmsprop.1.0.h5

ker_func_mnist_model_2.softmax.rmsprop.1.0.h5
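The same archive can be built without shelling out, which helps when `tar` is not available. A minimal sketch using Python's standard `tarfile` module, demonstrated on a small stand-in file rather than the real model (the helper name `make_tgz` and the stand-in file name are illustrative):

```python
import os
import tarfile
import tempfile


def make_tgz(archive_path, file_path):
    """Create a gzip-compressed tar archive containing a single file."""
    with tarfile.open(archive_path, 'w:gz') as archive:
        # arcname keeps only the file name, not the directory it came from
        archive.add(file_path, arcname=os.path.basename(file_path))


# Demonstration with a small stand-in file instead of the real .h5 model
workdir = tempfile.mkdtemp()
model_file = os.path.join(workdir, 'stand_in_model.h5')
with open(model_file, 'wb') as handle:
    handle.write(b'not a real model')

archive_file = os.path.join(workdir, 'my_best_model.tgz')
make_tgz(archive_file, model_file)

with tarfile.open(archive_file, 'r:gz') as archive:
    print(archive.getnames())
```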


#### Save the trained model to WML Repository

We will use the `watson_machine_learning_client` Python library to save the trained model to the WML Repository, deploy the saved model, and make predictions with the deployed model.

In [75]:
!pip install watson-machine-learning-client --upgrade

Collecting watson-machine-learning-client
  Downloading https://files.pythonhosted.org/packages/0e/a1/c503614455fb734b0989e8d6abaf24d0544d7370f7eb2b80ffbc99a40caf/watson_machine_learning_client-1.0.371-py3-none-any.whl (536kB)
Collecting lomond (from watson-machine-learning-client)
  Downloading https://files.pythonhosted.org/packages/0f/b1/02eebed49c754b01b17de7705caa8c4ceecfb4f926cdafc220c863584360/lomond-0.3.3-py2.py3-none-any.whl
Collecting tabulate (from watson-machine-learning-client)
  Downloading https://files.pythonhosted.org/packages/c2/fd/202954b3f0eb896c53b7b6f07390851b1fd2ca84aa95880d7ae4f434c4ac/tabulate-0.8.3.tar.gz (46kB)
Building wheels for collected packages: tabulate
  Building wheel for tabulate (setup.py) ... done
  Stored in directory: /home/jupyterlab/.cache/pip/wheels/2b/67/89/414471314a2d15de625

In [76]:
from watson_machine_learning_client import WatsonMachineLearningAPIClient

In [77]:
wml_credentials={
  "apikey": "JUmH8qP63ofgxBnWCQpTsaTo8p3gYVA87-QbccKvE54D",
  "iam_apikey_description": "Auto-generated for key 17679675-ec4a-4a29-9a78-44dd993c5c83",
  "iam_apikey_name": "wdp-writer",
  "iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Writer",
  "iam_serviceid_crn": "crn:v1:bluemix:public:iam-identity::a/6b12f48f115b4722bbe3cf1246ff6c67::serviceid:ServiceId-dc2fd81e-909a-4dff-b157-1c662ba49aac",
  "instance_id": "57e1ae13-3998-426b-a23b-fdd1b4d89ede",
  "password": "a316ecaa-8d75-4b0d-93b5-6805b568dabe",
  "url": "https://eu-gb.ml.cloud.ibm.com",
  "username": "17679675-ec4a-4a29-9a78-44dd993c5c83"
}
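Credentials are easier to rotate and to keep out of shared notebooks when they are read from the environment instead of being written inline. A minimal sketch, assuming hypothetical environment variable names (`WML_APIKEY`, `WML_INSTANCE_ID`, `WML_URL`) that are not defined by the Watson client itself. Which keys the client actually requires depends on the service instance, so the three below are only an example:

```python
import os


def load_wml_credentials(environ=os.environ):
    """Build a credentials dict from environment variables (names are hypothetical)."""
    required = {'apikey': 'WML_APIKEY',
                'instance_id': 'WML_INSTANCE_ID',
                'url': 'WML_URL'}
    missing = [var for var in required.values() if var not in environ]
    if missing:
        raise KeyError('missing environment variables: %s' % ', '.join(missing))
    return {key: environ[var] for key, var in required.items()}


# Example with a stand-in mapping in place of the real environment
example_env = {'WML_APIKEY': 'example-key',
               'WML_INSTANCE_ID': 'example-instance',
               'WML_URL': 'https://eu-gb.ml.cloud.ibm.com'}
print(sorted(load_wml_credentials(example_env)))
```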

In [78]:
client = WatsonMachineLearningAPIClient(wml_credentials)

In [79]:
model_props = {client.repository.ModelMetaNames.AUTHOR_NAME: "Emmanuel Ameyaw", 
               client.repository.ModelMetaNames.AUTHOR_EMAIL: "ameyawemmanuel@rocketmail.com", 
               client.repository.ModelMetaNames.NAME: "KK3_clt_keras_household_clustering_ghana",
               client.repository.ModelMetaNames.FRAMEWORK_NAME: "tensorflow",
               client.repository.ModelMetaNames.FRAMEWORK_VERSION: "1.5" ,
               client.repository.ModelMetaNames.FRAMEWORK_LIBRARIES: [{"name": "keras", "version": "2.1.3"}]
              }

In [80]:
published_model = client.repository.store_model(model="my_best_model.tgz", meta_props=model_props) #my_best_model.tgz already saved




In [81]:
published_model_uid = client.repository.get_model_uid(published_model)
model_details = client.repository.get_details(published_model_uid)

#### Deploy the Keras model

In [82]:
client.deployments.list()

----  ----  ----  -----  -------  ---------  -------------
GUID  NAME  TYPE  STATE  CREATED  FRAMEWORK  ARTIFACT TYPE
----  ----  ----  -----  -------  ---------  -------------


To keep your environment clean, delete any deployments left over from previous runs.

In [84]:
# client.deployments.delete("PASTE_YOUR_GUID_HERE_IF_APPLICABLE")

#### Test the model

In [86]:
created_deployment = client.deployments.create(published_model_uid, name="k1_keras_household_clustering_clt1")



#######################################################################################

Synchronous deployment creation for uid: 'aa4cb69a-0816-4a5e-bd46-48e46581c5be' started

#######################################################################################


INITIALIZING
DEPLOY_IN_PROGRESS
DEPLOY_SUCCESS


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='c5c96e45-964f-474b-a931-9442033f7c9e'
------------------------------------------------------------------------------------------------




In [87]:
# Alternative: scoring_endpoint = client.deployments.get_scoring_url(created_deployment)
# Read the scoring URL from the deployment details
scoring_endpoint = created_deployment['entity']['scoring_url']
print(scoring_endpoint)

https://eu-gb.ml.cloud.ibm.com/v3/wml_instances/57e1ae13-3998-426b-a23b-fdd1b4d89ede/deployments/c5c96e45-964f-474b-a931-9442033f7c9e/online
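The scoring URL embeds both the service instance and the deployment it points to. A small sketch that pulls these out with the standard library, assuming the path layout seen in the URL printed above (the helper name `parse_scoring_url` is illustrative):

```python
from urllib.parse import urlparse


def parse_scoring_url(url):
    """Split a scoring URL of the form shown above into its identifying parts."""
    parsed = urlparse(url)
    parts = parsed.path.strip('/').split('/')
    # Expected layout: v3 / wml_instances / <instance> / deployments / <deployment> / online
    return {'host': parsed.netloc,
            'instance_id': parts[parts.index('wml_instances') + 1],
            'deployment_id': parts[parts.index('deployments') + 1]}


url = ('https://eu-gb.ml.cloud.ibm.com/v3/wml_instances/'
       '57e1ae13-3998-426b-a23b-fdd1b4d89ede/deployments/'
       'c5c96e45-964f-474b-a931-9442033f7c9e/online')
print(parse_scoring_url(url))
```

The extracted deployment ID should equal the `deployment_uid` reported when the deployment was created.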


### Example 1: Get some input data and predict its label

We choose x_test[3], which we know is a middle-class household.

In [239]:
print(x_test[3].tolist()) # x input
print(y_test[3], 'IS MIDDLE CLASS')

[0.18726212, 0.15446213, 0.38178715, 0.16085844, 0.20535469, 0.12329403, 0.12571345, 0.20867807, 0.14497079, 0.28414047, 0.1996867]
[1. 0. 0.] IS MIDDLE CLASS


Now, let's predict it with our deployed model

In [240]:
x_score_1 = x_test[3].tolist()
# Expected cluster index: np.argmax(y_test[3])
scoring_payload = {'values': [x_score_1]}
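The payload is a dictionary with a `values` list holding one row of 11 features per household. A minimal sketch of a helper that builds it and rejects rows of the wrong length before they reach the service (the names `build_scoring_payload` and `NUM_FEATURES` are illustrative):

```python
NUM_FEATURES = 11  # totfood ... totmisc, as in the training data


def build_scoring_payload(rows):
    """Wrap one or more feature rows in the {'values': [...]} structure used above."""
    payload_rows = []
    for row in rows:
        if len(row) != NUM_FEATURES:
            raise ValueError('expected %d features, got %d' % (NUM_FEATURES, len(row)))
        payload_rows.append([float(value) for value in row])
    return {'values': payload_rows}


# The household from x_test[3], copied from the output above
household = [0.18726212, 0.15446213, 0.38178715, 0.16085844, 0.20535469,
             0.12329403, 0.12571345, 0.20867807, 0.14497079, 0.28414047, 0.1996867]
payload = build_scoring_payload([household])
print(len(payload['values']), len(payload['values'][0]))
```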

In [241]:
predictions = client.deployments.score(scoring_endpoint, scoring_payload)
predictions

{'fields': ['prediction', 'prediction_classes', 'probability'],
 'values': [[[1.0, 5.5607145554859017e-08, 3.4113877944719206e-08],
   0,
   [1.0, 5.5607145554859017e-08, 3.4113877944719206e-08]]]}
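The response lists field names once and then gives one row of values per scored household. Pairing the two makes the result easier to read than positional indexing. A small sketch using the response shown above (the helper name `rows_as_dicts` is illustrative):

```python
def rows_as_dicts(response):
    """Pair each row in response['values'] with the names in response['fields']."""
    return [dict(zip(response['fields'], row)) for row in response['values']]


# Response copied from the output above
response = {'fields': ['prediction', 'prediction_classes', 'probability'],
            'values': [[[1.0, 5.5607145554859017e-08, 3.4113877944719206e-08],
                        0,
                        [1.0, 5.5607145554859017e-08, 3.4113877944719206e-08]]]}

first = rows_as_dicts(response)[0]
print(first['prediction_classes'], first['probability'])
```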

In [242]:
# Round the class probabilities to a one-hot vector and match it to a cluster
xx = predictions['values'][0][2]
xxr = [round(x) for x in xx]
s = xxr
a = np.array([1, 0, 0])  # cluster 0
b = np.array([0, 1, 0])  # cluster 1
x = (s == a).all()
y = (s == b).all()
if x:
  print('cluster0: MIDDLE CLASS')
elif y:
  print('cluster1: LOWER CLASS')
else:
  print('cluster2: UPPER CLASS')

cluster0: MIDDLE CLASS
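Rounding works here because one probability is essentially 1. If no probability exceeded 0.5, the rounded vector would be all zeros and the code above would fall through to the last branch. Picking the index of the largest probability avoids that case. A minimal sketch, with the cluster names taken from the examples in this section and an illustrative function name:

```python
# Cluster names as used in the examples in this section
CLUSTER_NAMES = {0: 'cluster0: MIDDLE CLASS',
                 1: 'cluster1: LOWER CLASS',
                 2: 'cluster2: UPPER CLASS'}


def cluster_name(probabilities):
    """Return the name of the cluster with the highest predicted probability."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return CLUSTER_NAMES[best]


print(cluster_name([1.0, 5.5607145554859017e-08, 3.4113877944719206e-08]))
```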


Yeah!! It works.

### Example 2

In [243]:
x_score_1 = x_test[1].tolist()
# Expected cluster index: np.argmax(y_test[1])
scoring_payload = {'values': [x_score_1]}
predictions = client.deployments.score(scoring_endpoint, scoring_payload)
predictions

# Round the class probabilities to a one-hot vector and match it to a cluster
xx = predictions['values'][0][2]
xxr = [round(x) for x in xx]
s = xxr
a = np.array([1, 0, 0])  # cluster 0
b = np.array([0, 1, 0])  # cluster 1
x = (s == a).all()
y = (s == b).all()
if x:
  print('cluster0: MIDDLE CLASS')
elif y:
  print('cluster1: LOWER CLASS')
else:
  print('cluster2: UPPER CLASS')

cluster1: LOWER CLASS


With this deployed model, we can take new household data, normalize it in the same way as the training data, and predict its class.
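The steps for a new household can be sketched as a single function. This is an outline under assumptions: the normalization used for the training data is not shown in this section, so it is passed in as a callable, and the scorer here is a fake stand-in rather than the deployed service. All names are illustrative.

```python
def classify_household(raw_features, normalize, score):
    """Normalize one household's expenditures, score it, and return the cluster index.

    normalize: callable applying the same scaling used for the training data (assumed)
    score: callable taking a {'values': [...]} payload and returning the service response
    """
    payload = {'values': [normalize(raw_features)]}
    response = score(payload)
    probabilities = response['values'][0][2]   # 'probability' field, as in the examples
    return max(range(len(probabilities)), key=lambda i: probabilities[i])


# Stand-ins for illustration only: an identity "normalizer" and a fake scorer
def fake_score(payload):
    return {'fields': ['prediction', 'prediction_classes', 'probability'],
            'values': [[[0.1, 0.8, 0.1], 1, [0.1, 0.8, 0.1]]]}


print(classify_household([0.0] * 11, lambda row: list(row), fake_score))
```

With the real client, `score` could be `lambda p: client.deployments.score(scoring_endpoint, p)`, as used in the examples above.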