<table style="border: none" align="left">
   <tr style="border: none">
      <th style="border: none"><font face="verdana" size="5" color="black"><b>Use Spark and Scala to predict Equipment Purchase</b></th>
      <th style="border: none"><img src="https://github.com/pmservice/customer-satisfaction-prediction/blob/master/app/static/images/ml_icon_gray.png?raw=true" alt="Watson Machine Learning icon" height="40" width="40"></th>
   </tr>
   <tr style="border: none">
       <th style="border: none"><img src="https://github.com/pmservice/wml-sample-models/blob/master/spark/product-line-prediction/images/products_graphics.png?raw=true" alt="Icon" width="800"> </th>
   </tr>
</table>

This notebook how to perform data analysis on classification problem using <a href="http://spark.apache.org/docs/2.3.0/ml-guide.html" target="_blank" rel="noopener no referrer">Spark ML package</a>.

Some familiarity with Scala is helpful. This notebook uses Scala 2.11 and Apache® Spark 2.x.

You will use a publicly available data set, **GoSales Transactions for Naive Bayes Model**, which details anonymous outdoor equipment purchases. This data set will be used to predict clients' interests in terms of product line, such as golf accessories, camping equipment, and so forth.

**Note**: In this notebook, we use the GoSales data available to the <a href="http://spark.apache.org/docs/2.3.0/ml-guide.html" target="_blank" rel="noopener no referrer">Watson Studio Community</a>(https://apsportal.ibm.com/exchange-api/v1/entries/8044492073eb964f46597b4be06ff5ea/data?accessKey=9561295fa407698694b1e254d0099600).
 
## Learning goals

The learning goals of this notebook are:

-  Load a CSV file into an Apache® Spark DataFrame.
-  Explore data.
-  Prepare data for training and evaluation.
-  Create an Apache® Spark machine learning pipeline.
-  Train and evaluate a model.
-  Store a pipeline and model in the Watson Machine Learning (WML) repository.
-  Deploy a model for online scoring via the Watson Machine Learning (WML) API.
-  Score the model using sample data via the Watson Machine Learning (WML) API.


## Contents

This notebook contains the following parts:

1.	[Set up the environment](#setup)
2.	[Load and explore the data](#load)
3.	[Build an Apache® Spark machine learning model](#model)
4.	[Store the model in the WML repository](#persistence)
5.	[Deploy and score in the WML repository](#scoring)
6.	[Summary and next steps](#summary)

<a id="setup"></a>
## 1. Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

-  -  Create a <a href="https://console.ng.bluemix.net/catalog/services/ibm-watson-machine-learning/" target="_blank" rel="noopener no referrer">Watson Machine Learning (WML) Service</a> instance (a free plan is offered and information about how to create the instance can be found <a href="https://dataplatform.ibm.com/docs/content/analyze-data/wml-setup.html" target="_blank" rel="noopener no referrer">here</a>)
-  Make sure that you are using a Spark 2.x kernel.
-  Download **GoSales Transactions** from the Watson Studio Community (code provided below).

<a id="load"></a>
## 2. Load and explore data

In this section, you will load the data as an Apache® Spark DataFrame and perform basic exploratory data analysis.

Use `wget` to upload the data to the IBM General Parallel File System (GPFS), load the data to the Spark DataFrame, and use Spark `read` method to read the data. 

TThe csv file, **GoSales_Tx_NaiveBayes.csv**, is availble in the IBM General Parallel File System (GPFS) - your local file system - now. Load the file into an Apache® Spark DataFrame using the code below.

In [1]:
import com.ibm.ibmos2spark.CloudObjectStorage

// @hidden_cell
var credentials = scala.collection.mutable.HashMap[String, String](
    "endPoint"->"https://s3-api.us-geo.objectstorage.service.networklayer.com",
    "apiKey"->"4rZXPwAseXpl80lFyQjZ0rijjNCROuKrPbdOFclSy9gA",
    "serviceId"->"iam-ServiceId-c4184730-422e-478e-867b-a36c0903cdd2",
    "iamServiceEndpoint" -> "https://iam.ng.bluemix.net/oidc/token")

var configurationName = "os_656e9739af38478fa65a8fb59db4dda3_configs"
var cos = new CloudObjectStorage(sc, credentials, configurationName, "bluemix_cos")

import org.apache.spark.sql.SparkSession

val spark = SparkSession.
    builder().
    getOrCreate()
val dfData = spark.
    read.format("org.apache.spark.sql.execution.datasources.csv.CSVFileFormat").
    option("header", "true").
    option("inferSchema", "true").
    load(cos.url("watsonstudiosamplenotebooks-donotdelete-pr-rnj5eip2hhmmtq", "GoSales_Tx_NaiveBayes.csv"))

dfData.show(5)

+--------------------+------+---+--------------+------------+
|        PRODUCT_LINE|GENDER|AGE|MARITAL_STATUS|  PROFESSION|
+--------------------+------+---+--------------+------------+
|Personal Accessories|     M| 27|        Single|Professional|
|Personal Accessories|     F| 39|       Married|       Other|
|Mountaineering Eq...|     F| 39|       Married|       Other|
|Personal Accessories|     F| 56|   Unspecified| Hospitality|
|      Golf Equipment|     M| 45|       Married|     Retired|
+--------------------+------+---+--------------+------------+
only showing top 5 rows



credentials = Map(apiKey -> 4rZXPwAseXpl80lFyQjZ0rijjNCROuKrPbdOFclSy9gA, serviceId -> iam-ServiceId-c4184730-422e-478e-867b-a36c0903cdd2, endPoint -> https://s3-api.us-geo.objectstorage.service.networklayer.com, iamServiceEndpoint -> https://iam.ng.bluemix.net/oidc/token)
configurationName = os_656e9739af38478fa65a8fb59db4dda3_configs
cos = com.ibm.ibmos2spark.CloudObjectStorage@122cc8e7
spark = org.apache.spark.sql.SparkSession@59d62dc4
dfData = [PRODUCT_LINE: string, GENDER: string ... 3 more fields]


[PRODUCT_LINE: string, GENDER: string ... 3 more fields]

Explore the loaded data by using the following Apache® Spark DataFrame methods:
-  print schema
-  print top ten records
-  count all records

In [2]:
dfData.printSchema()

root
 |-- PRODUCT_LINE: string (nullable = true)
 |-- GENDER: string (nullable = true)
 |-- AGE: integer (nullable = true)
 |-- MARITAL_STATUS: string (nullable = true)
 |-- PROFESSION: string (nullable = true)



As you can see, the data contains five fields. PRODUCT_LINE field is the one you would like to predict (label).

In [3]:
dfData.show(10)

+--------------------+------+---+--------------+------------+
|        PRODUCT_LINE|GENDER|AGE|MARITAL_STATUS|  PROFESSION|
+--------------------+------+---+--------------+------------+
|Personal Accessories|     M| 27|        Single|Professional|
|Personal Accessories|     F| 39|       Married|       Other|
|Mountaineering Eq...|     F| 39|       Married|       Other|
|Personal Accessories|     F| 56|   Unspecified| Hospitality|
|      Golf Equipment|     M| 45|       Married|     Retired|
|      Golf Equipment|     M| 45|       Married|     Retired|
|   Camping Equipment|     F| 39|       Married|       Other|
|   Camping Equipment|     F| 49|       Married|       Other|
|  Outdoor Protection|     F| 49|       Married|       Other|
|      Golf Equipment|     M| 47|       Married|     Retired|
+--------------------+------+---+--------------+------------+
only showing top 10 rows



In [4]:
print("Total number of records: " + dfData.count())

Total number of records: 60252

As you can see, the data set contains 60252 records.

<a id="model"></a>
## 3. Build an Apache® Spark machine learning model

In this section, you will learn how to:

- [3.1 Split data](#prep)
- [3.2 Create an Apache® Spark machine learning pipeline](#pipe)
- [3.3 Train a model](#train)

### 3.1 Split data<a id="prep"></a>

In this subsection, you will split your data into: 
- Train data set
- Test data set
- Prediction data set

In [5]:
val splits = dfData.randomSplit(Array(0.8, 0.18, 0.02), seed = 24L)
val trainingData = splits(0).cache()
val testData = splits(1)
val predictionData = splits(2)

println("Number of training records: " + trainingData.count())
println("Number of testing records: " + testData.count())
println("Number of prediction records: " + predictionData.count())

Number of training records: 48176
Number of testing records: 10860
Number of prediction records: 1216


splits = Array([PRODUCT_LINE: string, GENDER: string ... 3 more fields], [PRODUCT_LINE: string, GENDER: string ... 3 more fields], [PRODUCT_LINE: string, GENDER: string ... 3 more fields])
trainingData = [PRODUCT_LINE: string, GENDER: string ... 3 more fields]
testData = [PRODUCT_LINE: string, GENDER: string ... 3 more fields]
predictionData = [PRODUCT_LINE: string, GENDER: string ... 3 more fields]


[PRODUCT_LINE: string, GENDER: string ... 3 more fields]

As you can see, your data has been successfully split into three data sets: 

-  The train data set which is the largest group is used for training.
-  The test data set will be used for model evaluation.
-  The predict data set will be used for prediction.

### 3.2 Create an Apache Spark machine learning pipeline<a id="pipe"></a>

In this section, you will create an Apache® Spark machine learning pipeline and train the model.

First, you need to import Apache® Spark machine learning packages that will be needed in the subsequent steps.

In [6]:
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer, IndexToString, VectorAssembler}
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.{Model, Pipeline, PipelineStage, PipelineModel}
import org.apache.spark.sql.SparkSession

In the following step, use the `StringIndexer` transformer to convert all the string fields to numeric ones.

In [7]:
val stringIndexerLabel = new StringIndexer().setInputCol("PRODUCT_LINE").setOutputCol("label").fit(dfData)
val stringIndexerProf = new StringIndexer().setInputCol("PROFESSION").setOutputCol("PROFESSION_IX")
val stringIndexerGend = new StringIndexer().setInputCol("GENDER").setOutputCol("GENDER_IX")
val stringIndexerMar = new StringIndexer().setInputCol("MARITAL_STATUS").setOutputCol("MARITAL_STATUS_IX")

stringIndexerLabel = strIdx_d2bd5876009b
stringIndexerProf = strIdx_68abd5d5ad04
stringIndexerGend = strIdx_d3abfcba5aef
stringIndexerMar = strIdx_7f5270668fb3


strIdx_7f5270668fb3

In the following step, create a feature vector by combining all the features together.

In [8]:
val vectorAssemblerFeatures = new VectorAssembler().setInputCols(Array("GENDER_IX", "AGE", "MARITAL_STATUS_IX", "PROFESSION_IX")).setOutputCol("features")

vectorAssemblerFeatures = vecAssembler_21c456a01984


vecAssembler_21c456a01984

Next, select the estimator you want to use for classification. `Random Forest` is used in the this example.

In [9]:
val rf = new RandomForestClassifier().setLabelCol("label").setFeaturesCol("features").setNumTrees(10)

rf = rfc_94f7237f87bd


rfc_94f7237f87bd

Finally, convert the indexed labels back to original labels.

In [10]:
val labelConverter = new IndexToString().setInputCol("prediction").setOutputCol("predictedLabel").setLabels(stringIndexerLabel.labels)

labelConverter = idxToStr_8f8cf30e4dd3


idxToStr_8f8cf30e4dd3

Now build the pipeline. A pipeline consists of transformers and an estimator.

In [11]:
val pipelineRf = new Pipeline().setStages(Array(stringIndexerLabel, stringIndexerProf, stringIndexerGend, stringIndexerMar, vectorAssemblerFeatures, rf, labelConverter))

pipelineRf = pipeline_9736d401be61


pipeline_9736d401be61

### 3.3 Train a model<a id="train"></a>

Now, you can train your Random Forest model by using the previously defined **pipeline** and **training data**.

In [12]:
trainingData.printSchema()

root
 |-- PRODUCT_LINE: string (nullable = true)
 |-- GENDER: string (nullable = true)
 |-- AGE: integer (nullable = true)
 |-- MARITAL_STATUS: string (nullable = true)
 |-- PROFESSION: string (nullable = true)



In [13]:
val modelRf = pipelineRf.fit(trainingData)

modelRf = pipeline_9736d401be61


pipeline_9736d401be61

You can check your **model accuracy** now. Use **test data** to evaluate the model.

In [14]:
val predictions = modelRf.transform(testData)
val evaluatorRF = new MulticlassClassificationEvaluator().setLabelCol("label").setPredictionCol("prediction").setMetricName("accuracy")
val accuracy = evaluatorRF.evaluate(predictions)

println(f"Accuracy = ${accuracy*100}%.2f%%")

Accuracy = 58.21%


predictions = [PRODUCT_LINE: string, GENDER: string ... 12 more fields]
evaluatorRF = mcEval_affc64abff81
accuracy = 0.5821362799263352


0.5821362799263352

You can tune your model now to achieve better accuracy. For simplicity, the tuning example is omitted in this example.

<a id="persistence"></a>
## 4. Store the model in the WML repository

In this section, you will learn how to use Scala libraries to store your pipeline and model in the WML repository and make predictions.

- [4.1 Import required libraries](#lib)
- [4.2 Save the pipeline and model](#save)
- [4.3 Load the model](#load)
- [4.4 Make predictions](#make)

### 4.1 Import required libraries<a id="lib"></a>

First, you must import required libraries.

**Note**: Apache® Spark 2.1 or higher is required.

In [15]:
// WML client library
import com.ibm.analytics.ngp.repository_v3._

// Helper libraries

import scalaj.http.{Http, HttpOptions}
import scala.util.{Success, Failure}
import java.util.Base64
import java.nio.charset.StandardCharsets
import play.api.libs.json._

Authenticate the Watson Machine Learning service on the IBM Cloud.

**Tip**: Authentication information (your credentials) can be found in the <a href="https://console.bluemix.net/docs/services/service_credentials.html#service_credentials" target="_blank" rel="noopener no referrer">Service credentials</a> tab of the service instance that you created on the IBM Cloud. <BR>If you cannot find the **instance_id** field in **Service Credentials**, click **New credential (+)** to generate new authentication information. 

**Action**: Enter your Watson Machine Learning service instance credentials here.

In [16]:
val wmlCredentials = """scala.collection.mutable.HashMap[String, String](
    "url"->"https://ibm-watson-ml.mybluemix.net",
    "username"->"***",
    "password"->"***",
    "instance_id"->"***"
)"""

wmlCredentials = 


scala.collection.mutable.HashMap[String, String](
    "url"->"https://ibm-watson-ml.mybluemix.net",
    "username"->"***",
    "password"->"***",
    "instance_id"->"***"
)


In [18]:
val wmlServicePath = wmlCredentials("url")
val wmlInstanceId = wmlCredentials("instance_id")
val wmlUsername = wmlCredentials("username")
val wmlPassword = wmlCredentials("password")

wmlServicePath = https://us-south.ml.cloud.ibm.com
wmlInstanceId = b4b6c696-172c-4164-8049-c0b621dbf3c9
wmlUsername = c7a34dfc-ac81-4d9e-93e5-61f30f0ceb78
wmlPassword = 16a762f1-03ed-4161-965a-7dbc58f8628a


16a762f1-03ed-4161-965a-7dbc58f8628a

In [19]:
val client = MLRepositoryClient(wmlServicePath)
client.authorize(wmlUsername, wmlPassword)

client = com.ibm.analytics.ngp.repository_v3.MLRepositoryClient@177b93b5


Success(())

Create the model artifact (abstraction layer).

In [20]:
val modelArtifact = MLRepositoryArtifact(modelRf, trainingData, "WML Product Line Prediction Model")

modelArtifact = com.ibm.analytics.ngp.repository_v3.SparkPipelineModelArtifact@5bf4af10


com.ibm.analytics.ngp.repository_v3.SparkPipelineModelArtifact@5bf4af10

**Tip**: The MLRepositoryArtifact method expects a trained model object, training data, and a model name. (It is this model name that is displayed by the WML service).

### 4.2 Save the pipeline and model<a id="save"></a>

In this subsection, you will learn how to save the pipeline and model artifacts in your WML repository.

In [21]:
val savedModel = client.models.save(modelArtifact).get

savedModel = com.ibm.analytics.ngp.repository_v3.MLRepositoryClient$ModelAdapter$$anon$14@45b1d53


com.ibm.analytics.ngp.repository_v3.MLRepositoryClient$ModelAdapter$$anon$14@45b1d53

Get the saved model metadata from WML.

**Tip**: Use *meta.availableProps* to get the list of available props.

In [22]:
savedModel.meta.availableProps

Vector(trainingDataSchema, modelUrl, trainingDefinitionVersionUrl, label, inputDataSchema, content_status, framework_runtimes, modelType, version, modelVersionUrl, artifactPath, modelInMemorySize, contentUrl, frameworkName, runtime, creationTime, frameworkVersion)

In [23]:
println("modelType: " + savedModel.meta.prop("modelType").get)
println("trainingDataSchema: " + savedModel.meta.prop("trainingDataSchema").get)
println("creationTime: " + savedModel.meta.prop("creationTime").get)
println("modelVersionUrl: " + savedModel.meta.prop("modelVersionUrl").get)
println("label: " + savedModel.meta.prop("label").get)

modelType: standard
trainingDataSchema: {"type":"struct","fields":[{"name":"PRODUCT_LINE","type":"string","nullable":true,"metadata":{"modeling_role":"target"}},{"name":"GENDER","type":"string","nullable":true,"metadata":{}},{"name":"AGE","type":"integer","nullable":true,"metadata":{}},{"name":"MARITAL_STATUS","type":"string","nullable":true,"metadata":{}},{"name":"PROFESSION","type":"string","nullable":true,"metadata":{}}]}
creationTime: 2019-02-27T13:50:04.628Z
modelVersionUrl: https://us-south.ml.cloud.ibm.com/v3/ml_assets/models/f2ea0716-7d1f-484a-8468-14f4b6ec7333/versions/8343488e-c304-419a-9bcc-23b3895c0446
label: PRODUCT_LINE


**Tip**: **modelVersionUrl** is the model unique indentifier in the WML repository.

### 4.3 Load the model<a id="load"></a>

In this subsection, you will learn how to load a saved model from a specified WML instance.

In [24]:
val modelVersionUrl = savedModel.meta.prop("modelVersionUrl").get
val loadedModelArtifact = client.models.version(modelVersionUrl).get

modelVersionUrl = https://us-south.ml.cloud.ibm.com/v3/ml_assets/models/f2ea0716-7d1f-484a-8468-14f4b6ec7333/versions/8343488e-c304-419a-9bcc-23b3895c0446
loadedModelArtifact = com.ibm.analytics.ngp.repository_v3.MLRepositoryClient$ModelAdapter$$anon$14@3c421f05


com.ibm.analytics.ngp.repository_v3.MLRepositoryClient$ModelAdapter$$anon$14@3c421f05

You can print the model name to make sure that model artifact has been loaded correctly.

In [25]:
loadedModelArtifact.name.mkString

WML Product Line Prediction Model

As you can see, the name is correct. 

### 4.4 Make predictions<a id="make"></a>

In [26]:
loadedModelArtifact match {
    case SparkPipelineModelLoader(Success(model)) => {
        val predictions = model.transform(predictionData)
    }
    case SparkPipelineModelLoader(Failure(e)) => "Loading failed."
    case _ => println(s"Unexpected artifact class: ${loadedModelArtifact.getClass}")
}

predictions.select("GENDER", "AGE", "MARITAL_STATUS", "PROFESSION", "predictedLabel").show(10)

+------+---+--------------+-----------+--------------------+
|GENDER|AGE|MARITAL_STATUS| PROFESSION|      predictedLabel|
+------+---+--------------+-----------+--------------------+
|     F| 18|        Single|      Other|Personal Accessories|
|     F| 18|        Single|     Retail|Personal Accessories|
|     F| 19|        Single|Hospitality|   Camping Equipment|
|     F| 19|        Single|Hospitality|   Camping Equipment|
|     F| 19|        Single|Hospitality|   Camping Equipment|
|     F| 19|        Single|Hospitality|   Camping Equipment|
|     F| 19|        Single|      Other|Personal Accessories|
|     F| 19|        Single|      Other|Personal Accessories|
|     F| 19|        Single|      Other|Personal Accessories|
|     F| 19|        Single|      Other|Personal Accessories|
+------+---+--------------+-----------+--------------------+
only showing top 10 rows



By tabulating the `predictedLabel` column and count the frequencies of `predictedLabel` classes, you can see which product line is the most popular.

In [27]:
predictions.select("predictedLabel").groupBy("predictedLabel").count().show()

+--------------------+-----+
|      predictedLabel|count|
+--------------------+-----+
|   Camping Equipment| 6356|
|      Golf Equipment|  631|
|Mountaineering Eq...|  699|
|Personal Accessories| 3174|
+--------------------+-----+



You have now learned how to save and load a model from the WML repository and make predictions.

<a id="scoring"></a>
## 5. Deploy and score in the WML repository

In this section, you will learn how to create online scoring and to score a test data record by using the WML REST API. 
For more information about REST APIs, see the [Swagger Documentation](http://watson-ml-api.mybluemix.net/).

To work with the WML REST API you must generate an access token. To do this, use the following code:

In [28]:
// Get WML service instance token

val wmlAuthHeader = "Basic " + Base64.getEncoder.encodeToString((wmlUsername + ":" + wmlPassword).getBytes(StandardCharsets.UTF_8))
val wmlUrl = wmlServicePath + "/v3/identity/token"
val wmlResponse = Http(wmlUrl).header("Authorization", wmlAuthHeader).asString
val wmlTokenJson:JsValue = Json.parse(wmlResponse.body)

val wmlToken = (wmlTokenJson \ "token").asOpt[String] match {
    case Some(x) => x
    case None => ""
}

wmlAuthHeader = Basic YzdhMzRkZmMtYWM4MS00ZDllLTkzZTUtNjFmMzBmMGNlYjc4OjE2YTc2MmYxLTAzZWQtNDE2MS05NjVhLTdkYmM1OGY4NjI4YQ==
wmlUrl = https://us-south.ml.cloud.ibm.com/v3/identity/token
wmlResponse = HttpResponse({"token":"eyJhbGciOiJSUzUxMiIsInR5cCI6IkpXVCJ9.eyJ0ZW5hbnRJZCI6ImI0YjZjNjk2LTE3MmMtNDE2NC04MDQ5LWMwYjYyMWRiZjNjOSIsImluc3RhbmNlSWQiOiJiNGI2YzY5Ni0xNzJjLTQxNjQtODA0OS1jMGI2MjFkYmYzYzkiLCJwbGFuSWQiOiIwZjJhM2MyYy00NTZiLTQwZjMtOWIxOS03MjZkMjc0MGIxMWMiLCJyZWdpb24iOiJ1cy1zb3V0aCIsInVzZXJJZCI6ImM3YTM0ZGZjLWFjODEtNGQ5ZS05M2U1LTYxZjMwZjBjZWI3OCIsImlzcyI6Imh0dHBzOi8vdXMtc291dGgubWwuY2xvdWQuaWJtLmNvbS92My9pZGVudGl0eSIsImlhdCI6MTU1MTI3NTQzNSwiZXhwIjoxNTUxMzA0MjM1LCJjcmVhdGVkVGltZSI6MTU1MTI3NTQzNX0.WW2BpipUc5XAnt0_J4YvGdAlbe9YRtgieYoKgio8arJYk...


HttpResponse({"token":"eyJhbGciOiJSUzUxMiIsInR5cCI6IkpXVCJ9.eyJ0ZW5hbnRJZCI6ImI0YjZjNjk2LTE3MmMtNDE2NC04MDQ5LWMwYjYyMWRiZjNjOSIsImluc3RhbmNlSWQiOiJiNGI2YzY5Ni0xNzJjLTQxNjQtODA0OS1jMGI2MjFkYmYzYzkiLCJwbGFuSWQiOiIwZjJhM2MyYy00NTZiLTQwZjMtOWIxOS03MjZkMjc0MGIxMWMiLCJyZWdpb24iOiJ1cy1zb3V0aCIsInVzZXJJZCI6ImM3YTM0ZGZjLWFjODEtNGQ5ZS05M2U1LTYxZjMwZjBjZWI3OCIsImlzcyI6Imh0dHBzOi8vdXMtc291dGgubWwuY2xvdWQuaWJtLmNvbS92My9pZGVudGl0eSIsImlhdCI6MTU1MTI3NTQzNSwiZXhwIjoxNTUxMzA0MjM1LCJjcmVhdGVkVGltZSI6MTU1MTI3NTQzNX0.WW2BpipUc5XAnt0_J4YvGdAlbe9YRtgieYoKgio8arJYk...

#### Get the published model urls from instance details.

In [29]:
val endpointInstance = wmlServicePath + "/v3/wml_instances/" + wmlInstanceId
val wmlResponseInstance = Http(endpointInstance).
                          header("Content-Type", "application/json").
                          header("Authorization", "Bearer " + wmlToken).
                          option(HttpOptions.connTimeout(10000)).
                          option(HttpOptions.readTimeout(50000)).asString

endpointInstance = https://us-south.ml.cloud.ibm.com/v3/wml_instances/b4b6c696-172c-4164-8049-c0b621dbf3c9
wmlResponseInstance = 


HttpResponse({
  "metadata": {
    "guid": "b4b6c696-172c-4164-8049-c0b621dbf3c9",
    "url": "https://us-south.ml.cloud.ibm.com/v3/wml_instances/b4b6c696-172c-4164-8049-c0b621dbf3c9",
    "created_at": "2018-12-10T08:05:26.901Z",
    "modified_at": "2019-02-27T13:50:09.304Z"
  },
  "entity": {
    "source": "Bluemix",
    "published_models": {
      "url": "https://us-south.ml.cloud.ibm.com/v3/wml_instances/b4b6c696-172c-4164-8049-c0b621dbf3c9/published_models"
    },
    "usage": {
      "expiration_date": "2019-03-01T00:00:00.000Z",
      "computation_time": {
        "current": 0
      },
      "gpu_count_k80": {
 ...


In [30]:
wmlResponseInstance

HttpResponse({
  "metadata": {
    "guid": "b4b6c696-172c-4164-8049-c0b621dbf3c9",
    "url": "https://us-south.ml.cloud.ibm.com/v3/wml_instances/b4b6c696-172c-4164-8049-c0b621dbf3c9",
    "created_at": "2018-12-10T08:05:26.901Z",
    "modified_at": "2019-02-27T13:50:09.304Z"
  },
  "entity": {
    "source": "Bluemix",
    "published_models": {
      "url": "https://us-south.ml.cloud.ibm.com/v3/wml_instances/b4b6c696-172c-4164-8049-c0b621dbf3c9/published_models"
    },
    "usage": {
      "expiration_date": "2019-03-01T00:00:00.000Z",
      "computation_time": {
        "current": 0
      },
      "gpu_count_k80": {
        "limit": 48,
        "current": 0
      },
      "model_count": {
        "limit": 1000,
        "current": 25
      },
  ...


####  Create an online scoring endpoint.
Run the following code that uses the `publishedModelId` value to create an online scoring endpoint ino the WML repository.

In [None]:
val publishedModelsJson: JsValue = Json.parse(wmlResponseInstance.body)
val publishedModelsUrl = (((publishedModelsJson \ "entity") \\ "published_models")(0) \ "url").as[JsString].value
publishedModelsUrl

#### Get the list of published models.

In [32]:
val wmlModels = Http(publishedModelsUrl).
                header("Content-Type", "application/json").
                header("Authorization", "Bearer " + wmlToken).
                option(HttpOptions.connTimeout(10000)).
                option(HttpOptions.readTimeout(50000)).asString
wmlModels

wmlModels = 


HttpResponse({
  "limit": 1000,
  "resources": [{
    "metadata": {
      "guid": "95bfca55-0c53-43d2-86ed-5c770f403ee6",
      "url": "https://us-south.ml.cloud.ibm.com/v3/wml_instances/b4b6c696-172c-4164-8049-c0b621dbf3c9/published_models/95bfca55-0c53-43d2-86ed-5c770f403ee6",
      "created_at": "2019-01-15T14:49:42.639Z",
      "modified_at": "2019-01-15T14:49:46.465Z"
    },
    "entity": {
      "runtime_environment": "python-3.5",
      "learning_configuration_url": "https://us-south.ml.cloud.ibm.com/v3/wml_instances/b4b6c696-172c-4164-8049-c0b621dbf3c9/published_models/95bfca55-0c53-43d2-86ed-5c770f403ee6/learning_configuration",
      "name": "MNIST - compressed keras model",
      "learning_iterations_url": "https://us-south.ml.clo...


In [None]:
var deploymentEndpoint: String = _

wmlModels.body.split("\"").map{ 
    s => {
        if ((s contains "deployments") & (s contains savedModel.uid.mkString)) {
            deploymentEndpoint = s
        }
    }
}

deploymentEndpoint

#### Create an online deployment for published model.

In [34]:
val payloadName = "Online scoring"
val payloadDataOnline = Json.stringify(Json.toJson(Map("type" -> "online", "name" -> payloadName)))

payloadName = Online scoring
payloadDataOnline = {"type":"online","name":"Online scoring"}


{"type":"online","name":"Online scoring"}

In [35]:
val responseOnline = Http(deploymentEndpoint).
                     postData(payloadDataOnline).
                     header("Content-Type", "application/json").
                     header("Authorization", "Bearer " + wmlToken).
                     option(HttpOptions.connTimeout(50000)).
                     option(HttpOptions.readTimeout(50000)).asString

responseOnline = 


HttpResponse({
  "metadata": {
    "guid": "304c671c-6a48-496d-b27b-3ad9888a0156",
    "url": "https://us-south.ml.cloud.ibm.com/v3/wml_instances/b4b6c696-172c-4164-8049-c0b621dbf3c9/published_models/f2ea0716-7d1f-484a-8468-14f4b6ec7333/deployments/304c671c-6a48-496d-b27b-3ad9888a0156",
    "created_at": "2019-02-27T13:50:39.730Z",
    "modified_at": "2019-02-27T13:50:42.946Z"
  },
  "entity": {
    "runtime_environment": "spark-2.3",
    "name": "Online scoring",
    "scoring_url": "https://us-south.ml.cloud.ibm.com/v3/wml_instances/b4b6c696-172c-4164-8049-c0b621dbf3c9/published_models/f2ea0716-7d1f-484a-8468-14f4b6ec7333/deployments/304c671c-6a48-496d-b27b-3ad9888a0156/online",
    "deployable_asset": {
      "name": "WML Product Line...


In [36]:
val scoringUrlJson: JsValue = Json.parse(responseOnline.body)
val scoringUrl = (scoringUrlJson \ "entity" \ "scoring_url").asOpt[String] match {
    case Some(x) => x
    case None => ""
}

scoringUrlJson = {"metadata":{"guid":"304c671c-6a48-496d-b27b-3ad9888a0156","url":"https://us-south.ml.cloud.ibm.com/v3/wml_instances/b4b6c696-172c-4164-8049-c0b621dbf3c9/published_models/f2ea0716-7d1f-484a-8468-14f4b6ec7333/deployments/304c671c-6a48-496d-b27b-3ad9888a0156","created_at":"2019-02-27T13:50:39.730Z","modified_at":"2019-02-27T13:50:42.946Z"},"entity":{"runtime_environment":"spark-2.3","name":"Online scoring","scoring_url":"https://us-south.ml.cloud.ibm.com/v3/wml_instances/b4b6c696-172c-4164-8049-c0b621dbf3c9/published_models/f2ea0716-7d1f-484a-8468-14f4b6ec7333/deployments/304c671c-6a48-496d-b27b-3ad9888a0156/online","deployable_asset":{"name":"WML Product Line Prediction Model","url":"https://us-south.ml.cloud.ibm.com/v3/wml_instances/b4b6c696-...


{"metadata":{"guid":"304c671c-6a48-496d-b27b-3ad9888a0156","url":"https://us-south.ml.cloud.ibm.com/v3/wml_instances/b4b6c696-172c-4164-8049-c0b621dbf3c9/published_models/f2ea0716-7d1f-484a-8468-14f4b6ec7333/deployments/304c671c-6a48-496d-b27b-3ad9888a0156","created_at":"2019-02-27T13:50:39.730Z","modified_at":"2019-02-27T13:50:42.946Z"},"entity":{"runtime_environment":"spark-2.3","name":"Online scoring","scoring_url":"https://us-south.ml.cloud.ibm.com/v3/wml_instances/b4b6c696-172c-4164-8049-c0b621dbf3c9/published_models/f2ea0716-7d1f-484a-8468-14f4b6ec7333/deployments/304c671c-6a48-496d-b27b-3ad9888a0156/online","deployable_asset":{"name":"WML Product Line Prediction Model","url":"https://us-south.ml.cloud.ibm.com/v3/wml_instances/b4b6c696-...

In [None]:
print(scoringUrl)

In [38]:
val payloadScoring = Json.stringify(
    Json.toJson(
        Map(
            "fields" -> Json.toJson(List(Json.toJson("GENDER"), Json.toJson("AGE"), Json.toJson("MARITAL_STATUS"), Json.toJson("PROFESSION"))),
            "values" -> Json.toJson(List(List(Json.toJson("M"), Json.toJson(55), Json.toJson("Single"), Json.toJson("Executive"))))
        )
    )
)

payloadScoring = {"fields":["GENDER","AGE","MARITAL_STATUS","PROFESSION"],"values":[["M",55,"Single","Executive"]]}


{"fields":["GENDER","AGE","MARITAL_STATUS","PROFESSION"],"values":[["M",55,"Single","Executive"]]}

In [39]:
payloadScoring

{"fields":["GENDER","AGE","MARITAL_STATUS","PROFESSION"],"values":[["M",55,"Single","Executive"]]}

Now, you can send (POST) new scoring records (new data) for predictions. To do that, run the following sample code: 

In [40]:
val responseScoring = Http(scoringUrl).
                      postData(payloadScoring).
                      header("Content-Type", "application/json").
                      header("Authorization", "Bearer " + wmlToken).
                      option(HttpOptions.method("POST")).
                      option(HttpOptions.connTimeout(10000)).
                      option(HttpOptions.readTimeout(50000)).asString

responseScoring = 


HttpResponse({
  "fields": ["GENDER", "AGE", "MARITAL_STATUS", "PROFESSION", "PROFESSION_IX", "GENDER_IX", "MARITAL_STATUS_IX", "features", "rawPrediction", "probability", "prediction", "predictedLabel"],
  "values": [["M", 55, "Single", "Executive", 3.0, 0.0, 1.0, [0.0, 55.0, 1.0, 3.0], [2.5050408694752604, 1.8771700964659068, 2.338808434620882, 3.1481331680893025, 0.13084743134864898], [0.250504086947526, 0.1877170096465907, 0.2338808434620882, 0.31481331680893027, 0.013084743134864898], 3.0, "Golf Equipment"]]
},200,Map(cache-control -> Vector(private, no-cache, no-store, must-revalidate), Connection -> Vector(keep-alive), Content-Length -> Vector(507), Content-Type -> Vector(application/json), Date -> Vector(Wed, 27 Feb 2019 13:50:...


In [41]:
print(responseScoring)

HttpResponse({
  "fields": ["GENDER", "AGE", "MARITAL_STATUS", "PROFESSION", "PROFESSION_IX", "GENDER_IX", "MARITAL_STATUS_IX", "features", "rawPrediction", "probability", "prediction", "predictedLabel"],
  "values": [["M", 55, "Single", "Executive", 3.0, 0.0, 1.0, [0.0, 55.0, 1.0, 3.0], [2.5050408694752604, 1.8771700964659068, 2.338808434620882, 3.1481331680893025, 0.13084743134864898], [0.250504086947526, 0.1877170096465907, 0.2338808434620882, 0.31481331680893027, 0.013084743134864898], 3.0, "Golf Equipment"]]
},200,Map(cache-control -> Vector(private, no-cache, no-store, must-revalidate), Connection -> Vector(keep-alive), Content-Length -> Vector(507), Content-Type -> Vector(application/json), Date -> Vector(Wed, 27 Feb 2019 13:50:44 GMT), pragma -> Vector(no-cache), Server -> Vector(nginx), Status -> Vector(HTTP/1.1 200 OK), Strict-Transport-Security -> Vector(max-age=31536000; includeSubDomains), x-content-type-options -> Vector(nosniff), x-envoy-upstream-service-time -> Vector(1

As you can see, you can predict that a 55-year-old single male executive is interested in Mountaineering Equipment (prediction: 2.0).

<a id="summary"></a>
## 6. Summary and next steps 

You successfully completed this notebook! 

You learned how to use Apache® Spark machine learning as well as Watson Machine Learning for model creation and deployment. 

Check out our <a href="https://dataplatform.ibm.com/docs/content/analyze-data/wml-setup.html" target="_blank" rel="noopener noreferrer">Online Documentation</a> for more samples, tutorials, documentation, how-tos, and blog posts. 
 
### Authors

**Umit Mert Cakmak** is a Data Scientist at IBM with a track record of developing enterprise-level applications that substantially increases clients' ability to turn data into actionable insights.  
**Jihyoung Kim**, Ph.D., is a Data Scientist at IBM who strives to make data science easy for everyone through Watson Studio.

Copyright © 2017-2019 IBM. This notebook and its source code are released under the terms of the MIT License.

<div style="background:#F5F7FA; height:110px; padding: 2em; font-size:14px;">
<span style="font-size:18px;color:#152935;">Love this notebook? </span>
<span style="font-size:15px;color:#152935;float:right;margin-right:40px;">Don't have an account yet?</span><br>
<span style="color:#5A6872;">Share it with your colleagues and help them discover the power of Watson Studio!</span>
<span style="border: 1px solid #3d70b2;padding:8px;float:right;margin-right:40px; color:#3d70b2;"><a href="https://ibm.co/wsnotebooks" target="_blank" style="color: #3d70b2;text-decoration: none;">Sign Up</a></span><br>
</div>