<table style="border: none" align="left">
   <tr style="border: none">
      <th style="border: none"><font face="verdana" size="5" color="black"><b>Lab: Build, Save and Deploy a Model to IBM Watson Machine Learning (WML)</b></font></th>
      <th style="border: none"><img src="https://github.com/pmservice/customer-satisfaction-prediction/blob/master/app/static/images/ml_icon_gray.png?raw=true" alt="Watson Machine Learning icon" height="40" width="40"></th>
   </tr>
</table>


This notebook walks you through these steps:
- Build a Spark ML model to predict customer churn
- Save the model in the WML repository
- Create a Deployment in WML
- Invoke the deployed model with a REST client to test it

### Step 1: Download the customer churn data

In [1]:
#Run once to install the wget package
!pip install wget



In [2]:
import wget
url_churn='https://raw.githubusercontent.com/yfphoon/dsx_demo/master/data/customer_churn/churn.csv'
url_customer='https://raw.githubusercontent.com/yfphoon/dsx_demo/master/data/customer_churn/customer.csv'

#remove existing files before downloading
!rm -f churn.csv
!rm -f customer.csv

churnFilename=wget.download(url_churn)
customerFilename=wget.download(url_customer)

!ls -l churn.csv
!ls -l customer.csv

-rw------- 1 s2ae-eb8c87e0d6bf41-9fb5ca908bcc users 20079 Aug 14 20:21 churn.csv
-rw------- 1 s2ae-eb8c87e0d6bf41-9fb5ca908bcc users 279541 Aug 14 20:21 customer.csv


### Step 2: Create DataFrames from the files

In [3]:
churn= sqlContext.read.format("csv").option("header", "true").option("inferSchema", "true").load(churnFilename)
customer= sqlContext.read.format("csv").option("header", "true").option("inferSchema", "true").load(customerFilename)

### Step 3: Merge the DataFrames

In [4]:
data=customer.join(churn,customer['ID']==churn['ID']).select(customer['*'],churn['CHURN'])
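The join keeps every customer column and adds the CHURN label. Because `DataFrame.join` defaults to an inner join, customers with no matching churn row are dropped. The plain-Python sketch below mimics that behavior on a few rows. It is not Spark code. The row with ID 7 is invented for illustration, and the sketch assumes each ID appears at most once in the churn file.

```python
# Hypothetical rows standing in for customer.csv and churn.csv.
customers = [
    {"ID": 1, "Gender": "F", "Est Income": 38000.0},
    {"ID": 6, "Gender": "M", "Est Income": 29616.0},
    {"ID": 7, "Gender": "M", "Est Income": 12000.0},  # invented: no churn record
]
churn = [
    {"ID": 1, "CHURN": "T"},
    {"ID": 6, "CHURN": "F"},
]


def inner_join_with_label(left, right, key="ID", label="CHURN"):
    """Return every column of `left` plus `label` from `right`, keeping only
    rows whose key appears in both (inner-join semantics).
    Assumes each key occurs at most once in `right`."""
    label_by_key = {row[key]: row[label] for row in right}
    merged = []
    for row in left:
        if row[key] in label_by_key:
            out = dict(row)
            out[label] = label_by_key[row[key]]
            merged.append(out)
    return merged


merged = inner_join_with_label(customers, churn)
for row in merged:
    print(row)
```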

### Step 4: Rename some columns
This step removes spaces from column names.

In [5]:
data = data.withColumnRenamed("Est Income", "EstIncome").withColumnRenamed("Car Owner","CarOwner")
data.toPandas().head()

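| | ID | Gender | Status | Children | EstIncome | CarOwner | Age | LongDistance | International | Local | Dropped | Paymethod | LocalBilltype | LongDistanceBilltype | Usage | RatePlan | CHURN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | F | S | 1 | 38000.0 | N | 24.393333 | 23.56 | 0 | 206.08 | 0 | CC | Budget | Intnl_discount | 229.64 | 3 | T |
| 1 | 6 | M | M | 2 | 29616.0 | N | 49.426667 | 29.78 | 0 | 45.5 | 0 | CH | FreeLocal | Standard | 75.29 | 2 | F |
| 2 | 8 | M | M | 0 | 19732.8 | N | 50.673333 | 24.81 | 0 | 22.44 | 0 | CC | FreeLocal | Standard | 47.25 | 3 | F |
| 3 | 11 | M | S | 2 | 96.33 | N | 56.473333 | 26.13 | 0 | 32.88 | 1 | CC | Budget | Standard | 59.01 | 1 | F |
| 4 | 14 | F | M | 2 | 52004.8 | N | 25.14 | 5.03 | 0 | 23.11 | 0 | CH | Budget | Intnl_discount | 28.14 | 1 | F |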
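Calling `withColumnRenamed` once per column works when only two columns need fixing. With many columns, a small sketch can derive all the new names at once. In PySpark, the resulting list can be passed to `DataFrame.toDF(*new_names)`, which renames every column by position. `strip_spaces` is a hypothetical helper.

```python
def strip_spaces(columns):
    """Return the column names with all spaces removed, in the same order."""
    return [name.replace(" ", "") for name in columns]


columns = ["ID", "Gender", "Est Income", "Car Owner", "CHURN"]
new_names = strip_spaces(columns)
print(new_names)  # ['ID', 'Gender', 'EstIncome', 'CarOwner', 'CHURN']
# In PySpark: data = data.toDF(*new_names)
```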


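### Step 5: Build the Spark pipeline and the Random Forest model
A Spark ML `Pipeline` chains a sequence of stages (feature transformers such as StringIndexer and OneHotEncoder, followed by an estimator such as RandomForestClassifier) into a single workflow that is fit and applied as one unit.
Additional information on Spark ML: https://spark.apache.org/docs/2.0.2/ml-guide.html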

In [6]:
from pyspark.ml.feature import OneHotEncoder, StringIndexer, VectorIndexer, IndexToString
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier

# StringIndexer encodes a string column of labels to a column of label indices. 
SI1 = StringIndexer(inputCol='Gender', outputCol='GenderEncoded')
SI2 = StringIndexer(inputCol='Status',outputCol='StatusEncoded')
SI3 = StringIndexer(inputCol='CarOwner',outputCol='CarOwnerEncoded')
SI4 = StringIndexer(inputCol='Paymethod',outputCol='PaymethodEncoded')
SI5 = StringIndexer(inputCol='LocalBilltype',outputCol='LocalBilltypeEncoded')
SI6 = StringIndexer(inputCol='LongDistanceBilltype',outputCol='LongDistanceBilltypeEncoded')

# Apply OneHotEncoder so the model does not treat category indices as ordered numeric values
# One-hot encoding maps a column of label indices to a column of binary vectors, with at most a single one-value. 
OH1 = OneHotEncoder(inputCol="GenderEncoded", outputCol="GenderEncoded"+"classVec")
OH2 = OneHotEncoder(inputCol="StatusEncoded", outputCol="StatusEncoded"+"classVec")
OH3 = OneHotEncoder(inputCol="CarOwnerEncoded", outputCol="CarOwnerEncoded"+"classVec")
OH4 = OneHotEncoder(inputCol="PaymethodEncoded", outputCol="PaymethodEncoded"+"classVec")
OH5 = OneHotEncoder(inputCol="LocalBilltypeEncoded", outputCol="LocalBilltypeEncoded"+"classVec")
OH6 = OneHotEncoder(inputCol="LongDistanceBilltypeEncoded", outputCol="LongDistanceBilltypeEncoded"+"classVec")


# VectorAssembler combines the one-hot vectors and numeric columns into the single
# "features" vector column that Spark ML estimators expect as input
assembler = VectorAssembler(inputCols=["GenderEncodedclassVec", "StatusEncodedclassVec", "CarOwnerEncodedclassVec", "PaymethodEncodedclassVec", "LocalBilltypeEncodedclassVec", \
                                       "LongDistanceBilltypeEncodedclassVec", "Children", "EstIncome", "Age", "LongDistance", "International", "Local",\
                                      "Dropped","Usage","RatePlan"], outputCol="features")

In [7]:
# encode the label column
labelIndexer = StringIndexer(inputCol='CHURN', outputCol='label').fit(data)
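To see what the indexer and encoder stages produce, here is a plain-Python sketch of their default behavior in Spark 2.0. StringIndexer gives index 0 to the most frequent value. OneHotEncoder, with its default `dropLast=True`, emits a vector with one fewer slot than there are categories, so the last category becomes all zeros. VectorAssembler then concatenates the vectors and numeric columns in `inputCols` order.

This explains the scoring response in Step 11. There, "Y" is encoded as `CarOwnerEncoded` 1.0, and its `CarOwnerEncodedclassVec` has size 1 and no active indices. The helper names and value counts below are hypothetical, and the alphabetical tie-breaking is an assumption of the sketch.

```python
from collections import Counter


def fit_string_indexer(values):
    """Order distinct values by descending frequency (index 0 = most frequent).
    Ties are broken alphabetically, an assumption of this sketch."""
    counts = Counter(values)
    return sorted(counts, key=lambda v: (-counts[v], v))


def one_hot(index, num_categories, drop_last=True):
    """Dense one-hot vector; with drop_last=True the last category is all zeros."""
    size = num_categories - 1 if drop_last else num_categories
    return [1.0 if i == index else 0.0 for i in range(size)]


def assemble(*parts):
    """Concatenate vectors and scalars, in order, into one flat feature list."""
    features = []
    for part in parts:
        if isinstance(part, list):
            features.extend(part)
        else:
            features.append(float(part))
    return features


# Hypothetical CarOwner column: "N" is more frequent, so it gets index 0.
car_owner_labels = fit_string_indexer(["N", "N", "Y", "N", "Y"])
print(car_owner_labels)  # ['N', 'Y']

y_vec = one_hot(car_owner_labels.index("Y"), len(car_owner_labels))
print(y_vec)  # [0.0]  (dropped last category: no active index)

# A simplified subset: CarOwner vector, then Children and EstIncome from the sample record.
print(assemble(y_vec, 2.0, 77551.1))  # [0.0, 2.0, 77551.1]
```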

In [8]:
# instantiate the algorithm, take the default settings
rf=RandomForestClassifier(labelCol="label", featuresCol="features")

In [9]:
# Convert indexed labels back to original labels.
labelConverter = IndexToString(inputCol="prediction", outputCol="predictedLabel", labels=labelIndexer.labels)
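IndexToString reverses the label encoding. A prediction of `i` becomes `labels[i]`, where `labels` is the list learned by the fitted label indexer. Below is a minimal sketch with a hypothetical label order. In the Step 11 response, CHURN "F" appears with label 0.0, which suggests "F" sits at index 0 for this data.

```python
# Hypothetical label order, as a fitted StringIndexer on CHURN might learn it.
churn_labels = ["F", "T"]


def index_to_string(prediction, labels):
    """Map a numeric prediction (a float such as 0.0) back to its original label."""
    return labels[int(prediction)]


print(index_to_string(0.0, churn_labels))  # F
print(index_to_string(1.0, churn_labels))  # T
```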

In [10]:
# build the pipeline
pipeline = Pipeline(stages=[SI1,SI2,SI3,SI4,SI5,SI6, labelIndexer, OH1, OH2, OH3, OH4, OH5, OH6, assembler, rf, labelConverter])

In [11]:
# Split data into train and test datasets
(trainingData, testingData) = data.randomSplit([0.7, 0.3],seed=9)
trainingData.cache()
testingData.cache()

DataFrame[ID: int, Gender: string, Status: string, Children: double, EstIncome: double, CarOwner: string, Age: double, LongDistance: double, International: double, Local: double, Dropped: double, Paymethod: string, LocalBilltype: string, LongDistanceBilltype: string, Usage: double, RatePlan: double, CHURN: string]
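`randomSplit` assigns rows to splits by random sampling, so the 70/30 proportions are approximate rather than exact. The fixed `seed` helps make the split repeatable. The stdlib sketch below illustrates the idea with one uniform draw per row. It is not Spark's implementation, and `random_split` is a hypothetical helper.

```python
import random


def random_split(rows, weights, seed):
    """Assign each row to one split using a single uniform draw per row.
    Weights are normalized; split sizes only approximate the weights."""
    total = float(sum(weights))
    upper_bounds, cumulative = [], 0.0
    for w in weights:
        cumulative += w / total
        upper_bounds.append(cumulative)
    rng = random.Random(seed)
    splits = [[] for _ in weights]
    for row in rows:
        u = rng.random()
        for i, upper in enumerate(upper_bounds):
            # The last split also catches any floating-point rounding gap below 1.0.
            if u < upper or i == len(upper_bounds) - 1:
                splits[i].append(row)
                break
    return splits


train_rows, test_rows = random_split(range(1000), [0.7, 0.3], seed=9)
print(len(train_rows), len(test_rows))  # roughly 700 and 300; exact counts depend on the seed
```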

In [12]:
# Build model. The fitted model from a Pipeline is a PipelineModel, which consists of fitted models and transformers, corresponding to the pipeline stages.
model = pipeline.fit(trainingData)

### Step 6: Score the test data set

In [13]:
results = model.transform(testingData)

### Step 7: Model Evaluation 

In [14]:
# Fraction of test rows where the predicted index equals the label (accuracy)
print('Accuracy = {:.2f}.'.format(results.filter(results.label == results.prediction).count() / float(results.count())))

Accuracy = 0.92.
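The cell above computes accuracy, the fraction of test rows whose predicted index equals the label. Precision is a different measure, computed only over the rows predicted positive. The sketch below computes both on hypothetical data. It treats 1.0 as the positive (churn) class, which is an assumption about how the label indexer ordered the classes.

```python
def accuracy(labels, predictions):
    """Fraction of rows whose prediction equals the label."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / float(len(labels))


def precision(labels, predictions, positive=1.0):
    """Among rows predicted `positive`, the fraction whose label is `positive`."""
    predicted_positive = [y for y, p in zip(labels, predictions) if p == positive]
    if not predicted_positive:
        return 0.0
    hits = sum(1 for y in predicted_positive if y == positive)
    return hits / float(len(predicted_positive))


# Hypothetical indexed labels and predictions; 1.0 is assumed to mean "churned".
labels = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0]
predictions = [0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0]
print(accuracy(labels, predictions))   # 0.75 (6 of 8 rows correct)
print(precision(labels, predictions))  # 2 of 3 predicted churners actually churned
```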


In [15]:
from pyspark.ml.evaluation import BinaryClassificationEvaluator

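# Evaluate model. Note: with rawPredictionCol="prediction" the ROC curve is built from
# hard 0/1 predictions; rawPredictionCol="rawPrediction" would rank rows by model scores.
evaluator = BinaryClassificationEvaluator(rawPredictionCol="prediction", labelCol="label", metricName="areaUnderROC")
print('Area under ROC curve = {:.2f}.'.format(evaluator.evaluate(results)))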

Area under ROC curve = 0.92.
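The evaluator above receives `rawPredictionCol="prediction"`, so it ranks rows by hard 0/1 predictions. With only two distinct scores, the ROC curve has a single interior point, and the area reduces to (TPR + TNR) / 2. Ranking by continuous scores, such as the `rawPrediction` or `probability` columns, lets the curve sweep every threshold.

The sketch below computes AUC as the probability that a random positive outranks a random negative. It runs on hypothetical data to show the difference between the two kinds of scores.

```python
def auc(labels, scores):
    """ROC AUC as the probability that a random positive (label 1.0) scores
    higher than a random negative (label 0.0); ties count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1.0]
    negatives = [s for y, s in zip(labels, scores) if y == 0.0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical data: the same six rows scored two ways.
labels = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
hard_predictions = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0]
probabilities = [0.1, 0.3, 0.6, 0.9, 0.7, 0.4]

print(auc(labels, hard_predictions))  # 6/9: equals (TPR + TNR) / 2 = (2/3 + 2/3) / 2
print(auc(labels, probabilities))     # 8/9
```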


### Step 8: Save Model in WML repository

In this section you will store your model in the Watson Machine Learning (WML) repository by using Python client libraries.
* <a href="https://console.ng.bluemix.net/docs/services/PredictiveModeling/index.html">WML Documentation</a>
* <a href="http://watson-ml-api.mybluemix.net/">WML REST API</a> 
* <a href="https://watson-ml-staging-libs.mybluemix.net/repository-python/">WML Repository API</a>
<br/>

First, import the client libraries.

In [16]:
from repository.mlrepositoryclient import MLRepositoryClient
from repository.mlrepositoryartifact import MLRepositoryArtifact

Enter the authentication details of your Watson Machine Learning service instance in the next cell. You can find them in the **Service Credentials** tab of your service instance in <a href="https://console.ng.bluemix.net/dashboard/apps/" target="_blank">Bluemix</a>.

![WML Credentials](https://raw.githubusercontent.com/yfphoon/IntroToWML/master/images/WML%20Credentials.png)

<span style="color:red">Replace the service_path and credentials with your own information</span>

service_path=[your url]<br/>
instance_id=[your instance_id]<br/>
username=[your username]<br/>
password=[your password]<br/>

In [17]:
# @hidden_cell
service_path = 'https://ibm-watson-ml.mybluemix.net'
instance_id = '[your instance_id]'
username = '[your username]'
password = '[your password]'

Authorize the repository client:

In [18]:
ml_repository_client = MLRepositoryClient(service_path)
ml_repository_client.authorize(username, password)

Create the model artifact.

<b>Tip:</b> MLRepositoryArtifact expects a trained model object, the training data, and a model name. The Watson Machine Learning service displays your model under this name.


In [19]:
model_artifact = MLRepositoryArtifact(model, training_data=trainingData, name="Predict Customer Churn")

Save the model artifact to your Watson Machine Learning instance:

In [20]:
saved_model = ml_repository_client.models.save(model_artifact)

In [21]:
# Print the saved model properties
print("modelType: " + saved_model.meta.prop("modelType"))
print("creationTime: " + str(saved_model.meta.prop("creationTime")))
print("modelVersionHref: " + saved_model.meta.prop("modelVersionHref"))
print("label: " + saved_model.meta.prop("label"))

modelType: sparkml-model-2.0
creationTime: 2017-08-15 01:24:48.629000+00:00
modelVersionHref: https://ibm-watson-ml.mybluemix.net/v2/artifacts/models/386aff71-85c9-4bf7-9909-c0076efd1b28/versions/d46a45e4-614b-4489-bc4a-0ef5829e5a41
label: CHURN


### Step 9: Generate the Authorization Token for Invoking the model

In [22]:
import urllib3, requests, json

headers = urllib3.util.make_headers(basic_auth='{}:{}'.format(username, password))
url = '{}/v2/identity/token'.format(service_path)
response = requests.get(url, headers=headers)
mltoken = json.loads(response.text).get('token')
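`urllib3.util.make_headers(basic_auth=...)` builds a standard HTTP Basic `authorization` header. It Base64-encodes the string `username:password` and prefixes the result with the word `Basic`. The token endpoint then returns JSON, and the cell reads its `token` field. Below is a stdlib sketch of the header construction, using dummy credentials. For ASCII credentials it produces the same value.

```python
import base64


def basic_auth_header(username, password):
    """Build an HTTP Basic authorization header for username:password."""
    credentials = "{}:{}".format(username, password).encode("utf-8")
    return {"authorization": "Basic " + base64.b64encode(credentials).decode("ascii")}


header = basic_auth_header("user", "pass")  # dummy credentials
print(header["authorization"])  # Basic dXNlcjpwYXNz
```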

### Step 10: Go to WML in Bluemix to create a Deployment Endpoint

* In your <a href="https://console.ng.bluemix.net/dashboard/apps/" target="_blank">Bluemix</a> dashboard, open your WML service and click the **Launch Dashboard** button under Watson Machine Learning.
![WML Launch Dashboard](https://raw.githubusercontent.com/yfphoon/dsx_demo/master/WML_Launch_Dashboard.png)

<br/>
* You should see your saved model in the **Models** tab.


* Under *Actions*, click the ellipsis (...) menu and select ***Create Deployment***. Give your deployment configuration a unique name, e.g. "Predict Customer Churn Deploy", select Type=Online and click **Save**.
<br/>
<br/>
* In the *Deployments tab*, under *Actions*, click **View Details**
<br/>
<br/>
* Scroll down to **API Details** and copy the value of the **Scoring Endpoint** into a text editor (e.g. https://ibm-watson-ml.mybluemix.net/v2/published_models/64fd0462-3f8a-4b42-820b-59a4da9b7dc6/deployments/7d9995ed-7daf-4cfd-b40f-37cb8ab3d88f/online)

### Step 11: Invoke the model through a REST API call

#### Create a sample JSON record for the model

In [23]:
sample_data = {
    "fields": [
    "ID",
    "Gender",
    "Status",
    "Children",
    "EstIncome",
    "CarOwner",
    "Age",
    "LongDistance",
    "International",
    "Local",
    "Dropped",
    "Paymethod",
    "LocalBilltype",
    "LongDistanceBilltype",
    "Usage",
    "RatePlan"
    ],
    "values": [ [999,"F","M",2.0,77551.100000,"Y",33.600000,20.530000,0.000000,41.890000,1.000000,"CC","Budget","Intnl_discount",62.420000,2.000000] ]
} 

sample_json = json.dumps(sample_data)
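Each inner list in `values` must follow the order of `fields`. When you score several records, building the payload from dictionaries keeps the two aligned. `to_scoring_payload` below is a hypothetical helper, not part of any WML client library. The example reuses the sample record above.

```python
import json

FIELDS = ["ID", "Gender", "Status", "Children", "EstIncome", "CarOwner", "Age",
          "LongDistance", "International", "Local", "Dropped", "Paymethod",
          "LocalBilltype", "LongDistanceBilltype", "Usage", "RatePlan"]


def to_scoring_payload(records, fields=FIELDS):
    """Build {"fields": [...], "values": [[...], ...]} from a list of dicts,
    raising ValueError if any record lacks one of the fields."""
    values = []
    for record in records:
        missing = [f for f in fields if f not in record]
        if missing:
            raise ValueError("record is missing fields: {}".format(missing))
        values.append([record[f] for f in fields])
    return {"fields": list(fields), "values": values}


# The sample record from the cell above, expressed as a dict.
record = dict(zip(FIELDS, [999, "F", "M", 2.0, 77551.1, "Y", 33.6, 20.53, 0.0, 41.89,
                           1.0, "CC", "Budget", "Intnl_discount", 62.42, 2.0]))
payload_json = json.dumps(to_scoring_payload([record]))
```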

#### Option 1: Call the REST API Programmatically

In [24]:
# Paste the Scoring Endpoint you copied in Step 10
churnModel_endpoint = 'https://ibm-watson-ml.mybluemix.net/v3/wml_instances/fd6f82de-d104-4e02-a328-73fd8adfed96/published_models/386aff71-85c9-4bf7-9909-c0076efd1b28/deployments/34c38b2d-d78f-44e9-8729-33b165d47bac/online'
header_online = {'Content-Type': 'application/json', 'Authorization': mltoken}

# API call here
response_scoring = requests.post(churnModel_endpoint, data=sample_json, headers=header_online)

print(response_scoring.text)

{
  "fields": ["ID", "Gender", "Status", "Children", "EstIncome", "CarOwner", "Age", "LongDistance", "International", "Local", "Dropped", "Paymethod", "LocalBilltype", "LongDistanceBilltype", "Usage", "RatePlan", "CHURN", "GenderEncoded", "StatusEncoded", "CarOwnerEncoded", "PaymethodEncoded", "LocalBilltypeEncoded", "LongDistanceBilltypeEncoded", "label", "GenderEncodedclassVec", "StatusEncodedclassVec", "CarOwnerEncodedclassVec", "PaymethodEncodedclassVec", "LocalBilltypeEncodedclassVec", "LongDistanceBilltypeEncodedclassVec", "features", "rawPrediction", "probability", "prediction", "predictedLabel"],
  "values": [[999, "F", "M", 2.0, 77551.1, "Y", 33.6, 20.53, 0.0, 41.89, 1.0, "CC", "Budget", "Intnl_discount", 62.42, 2.0, "F", 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, {
    "size": 1,
    "indices": [0],
    "values": [1.0]
  }, {
    "size": 2,
    "indices": [0],
    "values": [1.0]
  }, {
    "size": 1,
    "indices": [],
    "values": []
  }, {
    "size": 2,
    "indices": [0],
    "

#### Grab Predicted Value 

In [25]:
wml = json.loads(response_scoring.text)

# First zip the fields and values together
zipped_wml = zip(wml['fields'], wml['values'].pop())

# Next iterate through items and grab the prediction value
[v for (k,v) in zipped_wml if k == 'prediction'].pop()

0.0
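Calling `pop()` works for a single record, but it removes the row from the parsed response and returns only the last one. A sturdier sketch turns every row into a dictionary keyed by field name, which also makes `predictedLabel` easy to read alongside the numeric `prediction`. It runs here on a trimmed, hypothetical response whose probabilities are invented. With the live service you would pass `response_scoring.text` instead.

```python
import json

# Trimmed, hypothetical response with the same "fields"/"values" layout as above;
# the probability values are invented for illustration.
response_text = json.dumps({
    "fields": ["ID", "probability", "prediction", "predictedLabel"],
    "values": [[999, [0.9, 0.1], 0.0, "F"]],
})


def extract_predictions(text, columns=("prediction", "predictedLabel")):
    """Return one dict per scored row holding only the requested columns."""
    body = json.loads(text)
    rows = [dict(zip(body["fields"], row)) for row in body["values"]]
    return [{c: row[c] for c in columns} for row in rows]


for scored in extract_predictions(response_text):
    print(scored["prediction"], scored["predictedLabel"])
```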

#### Option 2: Call the REST API through a REST Client, e.g. https://client.restlet.com/

In [26]:
# Print the Authorization token
print(mltoken)

eyJhbGciOiJSUzUxMiIsInR5cCI6IkpXVCJ9.eyJ0ZW5hbnRJZCI6ImZkNmY4MmRlLWQxMDQtNGUwMi1hMzI4LTczZmQ4YWRmZWQ5NiIsImluc3RhbmNlSWQiOiJmZDZmODJkZS1kMTA0LTRlMDItYTMyOC03M2ZkOGFkZmVkOTYiLCJwbGFuSWQiOiIzZjZhY2Y0My1lZGU4LTQxM2EtYWM2OS1mOGFmM2JiMGNiZmUiLCJyZWdpb24iOiJ1cy1zb3V0aCIsInVzZXJJZCI6IjQ3N2EwNDhkLTVjZWMtNDA1YS1iNThlLTRjYTBjOTg0YWU0MiIsImlzcyI6Imh0dHA6Ly8xMjkuNDEuMjI5LjE4ODo4MDgwL3YyL2lkZW50aXR5IiwiaWF0IjoxNTAyNzYwMjk2LCJleHAiOjE1MDI3ODkwOTZ9.p76Tdr4CgqnKP8Xff3KMYA8cTSfjuW7jsrb0-nGeMa5Pplsxmc_5i-_4f6ebVrabDKPP-OXpW9PjFyK6ybK-D8h1UuhIxgEMHTelfIkHjayxBb161DzwzU9kw9P2IQBp1y26sUbEEv_PSUICIYxEjSP9T69Hnf_McTtahcf4suh9IkBXCFpfT9J9vfE8CDHooCxFPfcX8nivRciWXLXDMzJhFJz4iTOSKr3vdgdLld91-SL7F2hWR5DhWSHeskPT1P42FKtAGX_GZi7_ZTyXrfXRreRLkcyrmx6o0eHIO79nqnfL68hW1rip1SVOnl9ThtswEe_LkgzHjqeX0NGomQ


In the REST client interface enter the following information:

1. Protocol:  **HTTPS**
<br/>
<br/>

2. URI: **your scoring endpoint**  (Step 10)
<br/>
<br/>
3. Method: **POST**
<br/>
<br/>
4. Authorization:  **your generated token**. **Hint**: Add "Basic authorization" with a dummy value of 1 in the userid field. Then replace the value with the token. 
<br/>
<br/>
5. Content Type: **application/json**
<br/>
<br/>
6. JSON Body:<br/>{
  "fields": [
    "ID","Gender","Status","Children","EstIncome","CarOwner","Age","LongDistance","International","Local","Dropped","Paymethod","LocalBilltype","LongDistanceBilltype","Usage","RatePlan"
  ],
  "values": [
  [999,"F","M",2.0,77551.100000,"Y",33.600000,20.530000,0.000000,41.890000,1.000000,"CC","Budget","Intnl_discount",62.420000,2.000000]
  ]
}
<br/>
<br/>
7. Click **Send**

Scroll down to the **RESPONSE** section to see the scored results.

**Note:** The values in the JSON body do not include the label (CHURN).
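The REST client settings above correspond to a single HTTP POST. As a sketch, the same request can be built with Python 3's standard `urllib.request` module. The endpoint and token below are placeholders, and the payload is shortened for brevity (use the full JSON body from step 6). The `Authorization` header mirrors `header_online` from Option 1. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_scoring_request(endpoint, token, payload):
    """Build, without sending, a POST request carrying a JSON scoring payload."""
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": token},
        method="POST",
    )


req = build_scoring_request(
    "https://example.com/scoring/online",  # placeholder: your Scoring Endpoint from Step 10
    "YOUR_TOKEN",                          # placeholder: the token printed above
    {"fields": ["ID", "Gender"], "values": [[999, "F"]]},  # shortened example payload
)
print(req.get_method(), req.full_url)  # POST https://example.com/scoring/online
# To send it: urllib.request.urlopen(req)
```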


**Sample REST Client Input**
![Rest Client Input](https://github.com/ibm-cloud-architecture/refarch-data-science/blob/master/static/imgs/RestRequest.PNG?raw=true)

You have come to the end of this notebook.


**Sidney Phoon**
<br/>
yfphoon@us.ibm.com
<br/>
August 11, 2017