<table style="border: none" align="left">
   <tr style="border: none">
      <th style="border: none"><font face="verdana" size="5" color="black"><b>Use of Continuous Learning System to select the best heart drug with IBM Watson Machine Learning</b></font></th>
      <th style="border: none"><img src="https://github.com/pmservice/customer-satisfaction-prediction/blob/master/app/static/images/ml_icon_gray.png?raw=true" alt="Watson Machine Learning icon" height="40" width="40"></th>
   </tr> 
   <tr style="border: none">
       <td style="border: none"><img src="https://github.com/pmservice/wml-sample-models/raw/master/spark/drug-selection/images/learning_banner-05.png" width="600" alt="Icon"></td>
   </tr>
</table>

This notebook contains the steps and code to configure a **continuous learning system** and start scoring new data. It introduces commands for getting data, persisting a model to the Watson Machine Learning repository, deploying the model, configuring the continuous learning system, and scoring.

Some familiarity with Python is helpful. This notebook uses Python 2 and Apache® Spark 2.0.

You will use a data set published on GitHub, **drug_feedback_data.csv**, which contains anonymous patient records. Use this data set to predict the best drug for heart disease treatment.

## Learning goals

The learning goals of this notebook are:

-  Prepare a feedback data set in Db2 Warehouse on Cloud on Bluemix.
-  Publish a sample model in the Watson Machine Learning repository.
-  Configure the continuous learning system for the published model using the Watson Machine Learning API.
-  Deploy a model for online scoring using the Watson Machine Learning API.
-  Track model performance changes after a learning system iteration using the Watson Machine Learning API.
-  Explore and visualize model performance using the plotly package and the Watson Machine Learning API.



## Contents

This notebook contains the following parts:

1.	[Setup](#setup)
2.	[Create spark ml model](#model)
3.	[Persist model](#load)
4.	[Configure continuous learning system](#configuration)
5.	[Track model performance](#performance)
6.	[Visualization of model performance](#visualization)
7.	[Feedback records](#feedback)
8.	[Summary and next steps](#summary)

<a id="setup"></a>
## 1. Setup

Before you use the sample code in this notebook, you must perform the following setup tasks:

- Create a [Watson Machine Learning Service](https://console.ng.bluemix.net/catalog/services/ibm-watson-machine-learning/) instance (a free plan is offered).
- Create a [Spark Service](https://console.ng.bluemix.net/catalog/services/spark/) instance (an entry plan is offered).
- Create a [Db2 Warehouse on Cloud Service](https://console.bluemix.net/catalog/services/db2-warehouse-on-cloud/) instance (an entry plan is offered).
- Create the **DRUG_TRAIN_DATA_UPDATED** and **DRUG_FEEDBACK_DATA** tables in **Db2 Warehouse on Cloud**. 
  + Download the [drug_train_data_updated.csv](https://raw.githubusercontent.com/pmservice/wml-sample-models/master/spark/drug-selection/data/drug_train_data_updated.csv) and [drug_feedback_data.csv](https://raw.githubusercontent.com/pmservice/wml-sample-models/master/spark/drug-selection/data/drug_feedback_data.csv) files from the git repository.
  + Click the **Open the console** icon to get started with **Db2 Warehouse on Cloud**.
  + Select **Load Data** and the **Desktop** load type.
  + **Drag and drop** a previously downloaded file and press **Next**.
  + Select the **Schema** to import the data into and click **New Table**. 
  + Enter a name for the **new table**, then click **Next** to finish the data import.
  + Use `;` as the **field separator**.
  + Click **Next** to create the table with the uploaded data.
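Before uploading, it can help to confirm that a file really parses with `;` as the field separator. The following is a minimal sketch in plain Python; the helper name `parse_semicolon_rows` and the inline sample rows are illustrative assumptions, not part of the original notebook.

```python
# Illustrative sample only: an assumed header and two rows that use ';' as the field separator.
sample = (
    "AGE;SEX;BP;CHOLESTEROL;NA;K;DRUG\n"
    "43;M;HIGH;HIGH;0.656371;0.046979;drugA\n"
    "32;M;HIGH;NORMAL;0.529750;0.056087;drugA\n"
)

def parse_semicolon_rows(text):
    """Split text into a header and rows, checking that every row has as many fields as the header."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split(";")
    rows = [line.split(";") for line in lines[1:]]
    for index, row in enumerate(rows):
        if len(row) != len(header):
            raise ValueError("Row %d has %d fields, expected %d" % (index, len(row), len(header)))
    return header, rows

header, rows = parse_semicolon_rows(sample)
print(header)
print(len(rows))
```

If a file were separated by commas instead, the header would come back as a single field, which is an easy signal that the separator setting needs to change.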

<a id="model"></a>
## 2. Create spark ml model

In this section you will learn how to prepare data, create an Apache® Spark machine learning pipeline, and train a model.

### 2.1: Load training data from DashDB

The code below loads the contents of the DRUG_TRAIN_DATA_UPDATED table into a Spark DataFrame.

In [1]:
db2_credentials = {
    'jdbcurl': 'jdbc:db2://dashdb-entry-yp-dal09-08.services.dal.bluemix.net:50000/BLUDB',
    'user': 'dash14647',
    'password': 'a3803360760c'
}

In [2]:
tablename = "{schema}.{table}".format(schema=db2_credentials['user'], table='DRUG_TRAIN_DATA_UPDATED')
db2_properties = {x: db2_credentials.get(x) for x in db2_credentials.keys() if x in ['jdbcurl', 'user', 'password']}

In [3]:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
DRUG_TRAIN_DATA_UPDATED_data = spark.read.jdbc(db2_credentials['jdbcurl'], table=tablename, properties=db2_properties)

**Tip:** All required fields can be found on the **Service Credentials** tab of the Db2 Warehouse on Cloud service instance created in Bluemix. If you do not have any credentials, you can create them by clicking **New credential**.
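Instead of pasting credentials directly into a cell, you may prefer to read them from environment variables. This is a minimal sketch of that idea; the function name `load_db2_credentials` and the variable names `DB2_JDBCURL`, `DB2_USER` and `DB2_PASSWORD` are hypothetical and are not defined by the service.

```python
import os

def load_db2_credentials(env=os.environ):
    """Build the credentials dictionary from environment variables.

    The variable names used here are hypothetical; choose your own.
    """
    required = {"jdbcurl": "DB2_JDBCURL", "user": "DB2_USER", "password": "DB2_PASSWORD"}
    missing = [name for name in required.values() if name not in env]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(sorted(missing)))
    return dict((key, env[name]) for key, name in required.items())

# Example with an explicit mapping in place of the real environment
example_env = {
    "DB2_JDBCURL": "jdbc:db2://example-host.example.com:50000/BLUDB",
    "DB2_USER": "example_user",
    "DB2_PASSWORD": "example_password",
}
example_credentials = load_db2_credentials(example_env)
print(sorted(example_credentials.keys()))
```

The resulting dictionary has the same keys as `db2_credentials`, so it could be used in its place in the cells above.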

In [4]:
DRUG_TRAIN_DATA_UPDATED_data.show(5)

+---+---+----+-----------+--------+--------+-----+
|AGE|SEX|  BP|CHOLESTEROL|      NA|       K| DRUG|
+---+---+----+-----------+--------+--------+-----+
| 43|  M|HIGH|       HIGH|0.656371|0.046979|drugA|
| 32|  M|HIGH|     NORMAL|0.529750|0.056087|drugA|
| 37|  F|HIGH|       HIGH|0.559171|0.042713|drugA|
| 24|  M|HIGH|     NORMAL|0.613261|0.064726|drugA|
| 29|  M|HIGH|       HIGH|0.625272|0.048637|drugA|
+---+---+----+-----------+--------+--------+-----+
only showing top 5 rows



DRUG column is the target/label column.

### 2.2: Prepare data

In this subsection you will split your data into train and test data sets.

In [5]:
(train_data, test_data) = DRUG_TRAIN_DATA_UPDATED_data.randomSplit([0.8, 0.2], 24)

print("Number of records for training: " + str(train_data.count()))
print("Number of records for evaluation: " + str(test_data.count()))

Number of records for training: 150
Number of records for evaluation: 31


As you can see our data has been successfully split into two datasets:
 - The train data set, which is the largest group, is used for training.
 - The test data set will be used for model evaluation.
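The counts above (150 and 31 out of 181 records) are not an exact 80/20 split because each row is assigned to a split at random. The sketch below illustrates that idea in plain Python. It is only an illustration under that assumption: the helper `random_split` is hypothetical, it is not Spark's implementation, and it will not reproduce the same rows or counts as `randomSplit`.

```python
import random

def random_split(rows, weights, seed):
    """Assign each row to one split at random, in proportion to the given weights."""
    total = float(sum(weights))
    # Cumulative upper bounds of each split on the interval [0, 1)
    bounds = []
    running = 0.0
    for weight in weights:
        running += weight / total
        bounds.append(running)
    rng = random.Random(seed)
    splits = [[] for _ in weights]
    for row in rows:
        draw = rng.random()
        for index, bound in enumerate(bounds):
            # The last split also catches any rounding leftovers
            if draw < bound or index == len(bounds) - 1:
                splits[index].append(row)
                break
    return splits

train_rows, test_rows = random_split(list(range(181)), [0.8, 0.2], seed=24)
print("train: %d, test: %d" % (len(train_rows), len(test_rows)))
```

Fixing the seed makes the split repeatable, which is why the notebook passes `24` as the second argument.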

### 2.3: Create pipeline and train a model

In this section you will create an Apache® Spark machine learning pipeline and then train the model.

In the first step you need to import the Apache® Spark machine learning packages that will be needed in the subsequent steps.

In [6]:
from pyspark.ml.feature import OneHotEncoder, StringIndexer, IndexToString, VectorAssembler
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml import Pipeline, Model

In the following step, convert all the string fields to numeric ones by using the StringIndexer transformer.

In [7]:
stringIndexer_sex = StringIndexer(inputCol = 'SEX', outputCol = 'SEX_IX')
stringIndexer_bp = StringIndexer(inputCol = 'BP', outputCol = 'BP_IX')
stringIndexer_chol = StringIndexer(inputCol = 'CHOLESTEROL', outputCol = 'CHOL_IX')
stringIndexer_label = StringIndexer(inputCol="DRUG", outputCol="label").fit(DRUG_TRAIN_DATA_UPDATED_data)
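By default, `StringIndexer` orders labels by descending frequency, so the most frequent label receives index 0. The plain-Python sketch below mimics that mapping on made-up values; the helper `fit_string_index` is hypothetical, and its alphabetical tie-breaking is an assumption added only to keep the example deterministic.

```python
from collections import Counter

def fit_string_index(values):
    """Map each distinct label to a float index, most frequent label first."""
    counts = Counter(values)
    # Ties are broken alphabetically here; this is an assumption of the sketch
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return dict((label, float(index)) for index, label in enumerate(ordered))

# Made-up column values for illustration
bp_values = ["HIGH", "HIGH", "NORMAL", "HIGH", "LOW", "NORMAL"]
bp_index = fit_string_index(bp_values)
print(bp_index)
print([bp_index[value] for value in bp_values])
```

`IndexToString`, used later in the pipeline, performs the reverse lookup from index to label.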

In the following step, create a feature vector by combining all features together.

In [8]:
vectorAssembler_features = VectorAssembler(inputCols=["AGE", "SEX_IX", "BP_IX", "CHOL_IX", "NA", "K"], outputCol="features")

Next, define estimators you want to use for classification. Decision Tree is used in the following example.

In [9]:
dt = DecisionTreeClassifier(labelCol="label", featuresCol="features")

Finally, convert the indexed labels back to the original labels.

In [10]:
labelConverter = IndexToString(inputCol="prediction", outputCol="predictedLabel", labels=stringIndexer_label.labels)

Let's build the pipeline now. A pipeline consists of transformers and an estimator.

In [11]:
pipeline_dt = Pipeline(stages=[stringIndexer_label, stringIndexer_sex, stringIndexer_bp, stringIndexer_chol, vectorAssembler_features, dt, labelConverter])

Now, you can train your Decision Tree model by using the previously defined pipeline and train data.

In [12]:
model = pipeline_dt.fit(train_data)

You can check your model accuracy now. To evaluate the model, use test data.

In [13]:
predictions = model.transform(test_data)
evaluatorDT = MulticlassClassificationEvaluator(labelCol="label", predictionCol="prediction", metricName="accuracy")
accuracy = evaluatorDT.evaluate(predictions)

print("Accuracy = %g" % accuracy)

Accuracy = 0.870968
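Accuracy is the share of evaluation records whose predicted label matches the true label. The reported value is consistent with 27 correct predictions out of the 31 evaluation records. The sketch below shows the arithmetic on made-up labels; the helper `accuracy_score` is hypothetical and the label lists are not the notebook's actual predictions.

```python
def accuracy_score(labels, predictions):
    """Return the fraction of predictions that match their labels."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    correct = sum(1 for label, predicted in zip(labels, predictions) if label == predicted)
    return correct / float(len(labels))

# Made-up data: 31 records, 27 of them predicted correctly
example_labels = ["drugA"] * 31
example_predictions = ["drugA"] * 27 + ["drugB"] * 4

print("Accuracy = %g" % accuracy_score(example_labels, example_predictions))  # prints "Accuracy = 0.870968"
```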


You can now tune your model to achieve better accuracy. For simplicity, the tuning step is omitted from this example.

<a id="load"></a>
## 3. Persist model

In this section you will learn how to store the sample model in the Watson Machine Learning repository by using the client library.

First, you must import client libraries.

**Note**: Apache® Spark 2.0 is required.

In [14]:
from repository.mlrepositoryclient import MLRepositoryClient
from repository.mlrepositoryartifact import MLRepositoryArtifact
from repository.mlrepository import MetaProps, MetaNames
import json

Authenticate to Watson Machine Learning service on Bluemix.

**Action**: Put authentication information from your instance of Watson Machine Learning service here.

In [15]:
wml_credentials={
  "url": "https://ibm-watson-ml.mybluemix.net",
  "access_key": "Sgj8kPV1L8/9JUBGEc0APlrsNiY+v2p3IEY62gdGVvLvkSqIh6u5O+lOe53opmOGpxSFKe9cZoFYLlzgPf++qpWZYcc+6fawL9S0V+2V79Adc+zik+ZHJYrsBRl9GAcs",
  "username": "abb54eb2-091c-4041-a1d9-db0701f74481",
  "password": "23c110d7-b0c0-4ea7-b29a-3e70b7bf92fc",
  "instance_id": "81eb122f-a170-4efd-86b6-4a8afa16d30b"
}

**Tip**: `url`, `instance_id`, `username` and `password` can be found on the **Service Credentials** tab of the service instance created in Bluemix. If you cannot see the **instance_id** field in **Service Credentials**, generate new credentials by pressing the **New credential (+)** button.

Authorize the repository client with these credentials.

In [16]:
ml_repository_client = MLRepositoryClient(wml_credentials['url'])
ml_repository_client.authorize(wml_credentials['username'], wml_credentials['password'])

#### Metadata preparation

Prepare additional information to be saved as model's metadata:
* TRAINING_DATA_REF
* EVALUATION_METHOD: **multiclass**
* EVALUATION_METRICS name: **accuracy** (metric name used to evaluate the model)
* EVALUATION_METRICS value: **0.87** (accuracy value calculated a few steps above)
* EVALUATION_METRICS threshold: **0.8** (if the accuracy after evaluation using feedback data is below this threshold auto-retraining is triggered)

**Tip**: If the accuracy value falls below the threshold, a retraining action is required.

Prepare the training data reference that the continuous learning system requires to trigger the retraining action.

In [17]:
training_data_reference = """{
 "connection": {
  "db": "BLUDB",
  "host": "dashdb-entry-yp-dal09-08.services.dal.bluemix.net",
  "username": "dash14647",
  "password": "a3803360760c"
 },
 "source": {
  "tablename": "DRUG_TRAIN_DATA_UPDATED",
  "type": "dashdb"
 }
}"""

**Tip**: All required fields can be found on Service Credentials tab of Db2 Warehouse on Cloud service instance created in Bluemix.
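A hand-written JSON string is easy to break with a missing quote or comma. As an alternative, you could build the reference as a dictionary and serialize it with `json.dumps`. This is a minimal sketch; the helper `build_data_reference` and all connection values shown are placeholders.

```python
import json

def build_data_reference(host, db, username, password, tablename, source_type="dashdb"):
    """Return a data reference dictionary with the same layout as the one above."""
    return {
        "connection": {"db": db, "host": host, "username": username, "password": password},
        "source": {"tablename": tablename, "type": source_type},
    }

# Placeholder connection values for illustration
example_reference = build_data_reference(
    host="example-host.example.com",
    db="BLUDB",
    username="example_user",
    password="example_password",
    tablename="DRUG_TRAIN_DATA_UPDATED",
)
example_reference_json = json.dumps(example_reference, indent=1, sort_keys=True)
print(example_reference_json)
```

The serialized string can then be passed wherever the notebook uses `training_data_reference`.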

Add all information to model ```MetaProps```.

In [18]:
meta_props=MetaProps({
    MetaNames.TRAINING_DATA_REF: training_data_reference,
    MetaNames.EVALUATION_METHOD: "multiclass",
    MetaNames.EVALUATION_METRICS: json.dumps([{
        "name": "accuracy",
        "value": accuracy,
        "threshold": 0.8
    }])
})

Create model artifact (abstraction layer).

In [19]:
model_artifact = MLRepositoryArtifact(model, training_data=train_data, meta_props=meta_props, name="Best Heart Drug Selection")

**Tip**: `MLRepositoryArtifact` expects a trained model object, training data, and a model name. This model name is the one displayed by the Watson Machine Learning service.

In [20]:
saved_model = ml_repository_client.models.save(model_artifact)

Get saved model metadata from Watson Machine Learning.

**Tip**: Use meta.available_props() to get the list of available props.

In [21]:
saved_model.meta.available_props()

['inputDataSchema',
 'evaluationMetrics',
 'pipelineVersionHref',
 'modelVersionHref',
 'trainingDataRef',
 'pipelineType',
 'creationTime',
 'lastUpdated',
 'label',
 'authorEmail',
 'trainingDataSchema',
 'authorName',
 'version',
 'modelType',
 'runtime',
 'evaluationMethod']

In [22]:
print("modelType: " + saved_model.meta.prop("modelType"))
print("creationTime: " + str(saved_model.meta.prop("creationTime")))
print("modelVersionHref: " + saved_model.meta.prop("modelVersionHref"))
print("label: " + saved_model.meta.prop("label"))
print("evaluationMetrics: " + str(saved_model.meta.prop("evaluationMetrics")))
print("modelID: " + str(saved_model.uid))

modelType: sparkml-model-2.0
creationTime: 2017-11-02 08:23:50.929000+00:00
modelVersionHref: https://ibm-watson-ml-svt.stage1.mybluemix.net/v2/artifacts/models/60785887-5203-4be4-8448-cb33d914997b/versions/ef5c39df-fcd9-485a-a53d-cb0924d80394
label: DRUG
evaluationMetrics: [{'threshold': 0.8, 'name': 'accuracy', 'value': 0.8709677419354839}]
modelID: 60785887-5203-4be4-8448-cb33d914997b


**Tip**: `modelID` is the unique identifier of our model in the Watson Machine Learning repository.

<a id="configuration"></a>
## 4. Configure continuous learning system

In this section you will learn how to configure the continuous learning system with the Watson Machine Learning REST API.
For more information about REST APIs, see the [Swagger Documentation](http://watson-ml-api.mybluemix.net/).

The continuous learning system provides:
- monitoring of model quality
- model retraining if quality is below the specified threshold
- model redeployment if the retrained model performs better

To work with the Watson Machine Learning REST API you must generate an **access token**. To do that you can use the following sample code:

In [22]:
import urllib3, requests, json, base64

headers = urllib3.util.make_headers(basic_auth='{username}:{password}'.format(username=wml_credentials['username'], password=wml_credentials['password']))
token_endpoint = '{}/v3/identity/token'.format(wml_credentials['url'])
response = requests.get(token_endpoint, headers=headers)
mltoken = json.loads(response.text).get('token')
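The `basic_auth` option of `make_headers` produces a standard HTTP Basic `authorization` header, which is the Base64 encoding of `username:password`. The sketch below builds the same kind of header with the standard library only; the helper name `basic_auth_header` and the credentials are placeholders.

```python
import base64

def basic_auth_header(username, password):
    """Return an HTTP Basic authorization header for the given credentials."""
    raw = "{}:{}".format(username, password).encode("utf-8")
    return {"authorization": "Basic " + base64.b64encode(raw).decode("ascii")}

# Placeholder credentials for illustration
example_headers = basic_auth_header("example_user", "example_password")
print(example_headers)
```

Base64 is an encoding, not encryption, so a header like this should only be sent over HTTPS.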

In this subsection you will learn how to configure the learning system for your model.

### Prepare Authorization header that combines Watson Machine Learning token and Spark instance credentials.

In [23]:
spark_credentials = {
  "tenant_id": "s125-1e4c8afed91552-01f17dcd4c8c",
  "tenant_id_full": "fa2f233e-2675-4f1c-9125-1e4c8afed915_a2485898-8ed5-43df-8352-01f17dcd4c8c",
  "cluster_master_url": "https://spark.stage1.bluemix.net",
  "tenant_secret": "b3616c28-7463-4560-9be4-87b2b7a66a37",
  "instance_id": "fa2f233e-2675-4f1c-9125-1e4c8afed915",
  "plan": "ibm.SparkService.PayGoPersonal"
}

In [24]:
spark_instance = {
  "credentials": spark_credentials,
  "version": "2.0"
}

In [25]:
encoded_spark_instance_header = base64.b64encode(json.dumps(spark_instance))
header_learning_configuration = {'Content-Type': 'application/json', 'Authorization': "Bearer " + mltoken, 'X-Spark-Service-Instance': encoded_spark_instance_header}

**Tip**: All required fields can be found on Service Credentials tab of Spark service instance created in Bluemix.
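The `X-Spark-Service-Instance` header value is the Spark instance description serialized to JSON and then Base64-encoded. The sketch below shows an encode and decode round trip written so that it also runs on Python 3, where `b64encode` requires bytes. The helper names and the instance values are placeholders.

```python
import base64
import json

def encode_instance_header(instance):
    """Serialize a dictionary to JSON and Base64-encode it as an ASCII string."""
    raw = json.dumps(instance).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")

def decode_instance_header(value):
    """Reverse encode_instance_header, which is useful when debugging a header value."""
    return json.loads(base64.b64decode(value).decode("utf-8"))

# Placeholder instance description for illustration
example_instance = {
    "credentials": {"tenant_id": "example-tenant", "tenant_secret": "example-secret"},
    "version": "2.0",
}
example_header_value = encode_instance_header(example_instance)
print(example_header_value)
print(decode_instance_header(example_header_value) == example_instance)
```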

### Get published_models url from instance details

In [26]:
endpoint_instance =  "{url}/v3/wml_instances/{instance_id}".format(url=wml_credentials['url'], instance_id=wml_credentials['instance_id'])
header = {'Content-Type': 'application/json', 'Authorization': 'Bearer ' + mltoken}

response_get_instance = requests.get(endpoint_instance, headers=header)


print(response_get_instance)

<Response [200]>


In [27]:
endpoint_published_models = json.loads(response_get_instance.text).get('entity').get('published_models').get('url')

print(endpoint_published_models)

https://ibm-watson-ml.mybluemix.net/v3/wml_instances/81eb122f-a170-4efd-86b6-4a8afa16d30b/published_models


### Prepare the configuration payload.

In [28]:
published_model_ID = saved_model.uid
endpoint_learning_configuration = "{endpoint}/{model_id}/learning_configuration".format(endpoint=endpoint_published_models, model_id=published_model_ID)

Specify feedback data location that will be used to evaluate your model.

In [29]:
feedback_data_reference = {
    "connection": {
        "username": "dash14647",
        "host": "dashdb-entry-yp-dal09-08.services.dal.bluemix.net",
        "password": "a3803360760c",
        "db": "BLUDB"
    },
    "source": {
        "type": "dashdb",
         "tablename": "DRUG_FEEDBACK_DATA"
    }
}

**Tip**: Note that only ```tablename``` differs from ```training_data_reference```.
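Because the two references differ only in the table name, one could be derived from the other rather than typed twice. This is a minimal sketch of that approach; the helper `with_tablename` and the connection values are placeholders.

```python
import copy

def with_tablename(reference, tablename):
    """Return a deep copy of a data reference that points at a different table."""
    derived = copy.deepcopy(reference)
    derived["source"]["tablename"] = tablename
    return derived

# Placeholder training reference for illustration
example_training_reference = {
    "connection": {
        "db": "BLUDB",
        "host": "example-host.example.com",
        "username": "example_user",
        "password": "example_password",
    },
    "source": {"tablename": "DRUG_TRAIN_DATA_UPDATED", "type": "dashdb"},
}
example_feedback_reference = with_tablename(example_training_reference, "DRUG_FEEDBACK_DATA")
print(example_feedback_reference["source"])
```

The deep copy matters here: a shallow copy would share the nested `source` dictionary and change the training reference as well.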

Define values for the following fields to finalize the payload:
- ```min_feedback_data_size``` - the minimum number of records in the feedback data set required to start a continuous learning system iteration
- ```auto_retrain``` [never, always, conditionally] - specifies whether the retraining process should be triggered (`conditionally` triggers retraining when the evaluation result is below the specified threshold)
- ```auto_redeploy``` [never, always, conditionally] - specifies whether the retrained model should be deployed (`conditionally` triggers redeployment when the quality of the newly trained model is better)
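The three option values can be read as simple decision rules. The sketch below encodes that reading in plain Python. It is an interpretation of the descriptions above, not the service's implementation; the function names are hypothetical, and what exactly the service compares a retrained model against is an assumption.

```python
def should_retrain(auto_retrain, evaluated_value, threshold):
    """Decide whether retraining would be triggered for a given option value."""
    if auto_retrain == "never":
        return False
    if auto_retrain == "always":
        return True
    if auto_retrain == "conditionally":
        return evaluated_value < threshold
    raise ValueError("Unknown auto_retrain value: %r" % (auto_retrain,))

def should_redeploy(auto_redeploy, new_value, current_value):
    """Decide whether redeployment would be triggered (comparison baseline is assumed)."""
    if auto_redeploy == "never":
        return False
    if auto_redeploy == "always":
        return True
    if auto_redeploy == "conditionally":
        return new_value > current_value
    raise ValueError("Unknown auto_redeploy value: %r" % (auto_redeploy,))

# Illustrative numbers: an evaluation below a 0.8 threshold, then a better retrained model
print(should_retrain("conditionally", 0.76, 0.8))
print(should_redeploy("never", 0.92, 0.76))
print(should_redeploy("conditionally", 0.92, 0.76))
```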

In [30]:
payload_learning_configuration = {  
    "feedback_data_reference": feedback_data_reference,
    "min_feedback_data_size": 10,
    "auto_retrain": "conditionally",
    "auto_redeploy": "never"
}

### Set configuration for published model

In [31]:
response_put = requests.put(endpoint_learning_configuration, json=payload_learning_configuration, headers=header_learning_configuration)

print(response_put)

<Response [200]>


The learning configuration has been set successfully. You can check the details with the GET call below.

In [32]:
endpoint_model = "{endpoint}/{model_id}".format(endpoint=endpoint_published_models , model_id=published_model_ID)
response_get = requests.get(endpoint_model, headers=header_learning_configuration)

print(response_get)
print(json.dumps(response_get.json()["entity"]["learning_configuration"], sort_keys=True, indent=2))

<Response [200]>
{
  "auto_redeploy": "never", 
  "auto_retrain": "conditionally", 
  "evaluation_definition": {
    "method": "multiclass", 
    "metrics": [
      {
        "name": "accuracy", 
        "threshold": 0.8
      }
    ]
  }, 
  "feedback_data_reference": {
    "connection": {
      "db": "BLUDB", 
      "host": "dashdb-entry-yp-dal09-08.services.dal.bluemix.net", 
      "password": "a3803360760c", 
      "username": "dash14647"
    }, 
    "source": {
      "tablename": "DRUG_FEEDBACK_DATA", 
      "type": "dashdb"
    }
  }, 
  "min_feedback_data_size": 10, 
  "spark_service": {
    "credentials": {
      "cluster_master_url": "https://spark.stage1.bluemix.net", 
      "instance_id": "fa2f233e-2675-4f1c-9125-1e4c8afed915", 
      "plan": "ibm.SparkService.PayGoPersonal", 
      "tenant_id": "s125-1e4c8afed91552-01f17dcd4c8c", 
      "tenant_id_full": "fa2f233e-2675-4f1c-9125-1e4c8afed915_a2485898-8ed5-43df-8352-01f17dcd4c8c", 
      "tenant_secret": "b3616c28-7463-4560-

### Patch configuration for published model

To update the learning configuration, you can use a PATCH request as shown below.

In [33]:
patch_payload = [
  {
    "op": "replace",
    "path": "/feedback_data_reference/source/tablename",
    "value": "DRUG_FEEDBACK_DATA"
  }
] 

In [34]:
response_patch = requests.patch(endpoint_learning_configuration, json=patch_payload, headers=header_learning_configuration)

print(response_patch)

<Response [200]>
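The patch payload follows the JSON Patch style: `op` names the operation, `path` points at a field with `/`-separated keys, and `value` is the new content. To make the effect concrete, the sketch below applies a single `replace` operation to a local dictionary. It is a simplified illustration: the helper `apply_replace` is hypothetical, handles nested dictionaries only (no arrays, no escaped path characters), and says nothing about how the service processes the request.

```python
import copy

def apply_replace(document, path, value):
    """Return a copy of document with the field at path replaced by value."""
    keys = [key for key in path.split("/") if key]
    patched = copy.deepcopy(document)
    target = patched
    for key in keys[:-1]:
        target = target[key]
    # A 'replace' operation requires the target field to exist already
    if keys[-1] not in target:
        raise KeyError("Path does not exist: " + path)
    target[keys[-1]] = value
    return patched

# Placeholder configuration for illustration
example_config = {
    "feedback_data_reference": {"source": {"tablename": "OLD_TABLE", "type": "dashdb"}},
    "min_feedback_data_size": 10,
}
example_patched = apply_replace(
    example_config, "/feedback_data_reference/source/tablename", "DRUG_FEEDBACK_DATA"
)
print(example_patched["feedback_data_reference"]["source"])
```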


<a id="performance"></a>
## 5. Track model performance

To start an iteration of the learning system, use the REST API method below. Within the iteration, the published model is evaluated. If the evaluated accuracy is below the specified threshold, model retraining is triggered. Both data sets, training and feedback, are used for retraining and evaluation.

In [35]:
endpoint_learning_iteration =  "{endpoint}/{model_id}/learning_iterations".format(endpoint=endpoint_published_models, model_id=published_model_ID)
response_learning_iteration = requests.post(endpoint_learning_iteration, json={}, headers=header_learning_configuration)

print(response_learning_iteration)

<Response [201]>


**Tip**: This is an asynchronous action. You can use the GET request below to check the progress.

**Tip:** You can find the iteration URL in the `Location` header using the code below.

In [36]:
iteration_url = response_learning_iteration.headers.get('Location')

print(iteration_url)

https://ibm-watson-ml.mybluemix.net/v3/wml_instances/81eb122f-a170-4efd-86b6-4a8afa16d30b/published_models/9a627d32-8183-4ea1-911c-a716243fa44b/learning_iterations/c60f90bd-ccea-427b-a937-0e92abdc7290


#### Get iterations details

Using this GET request you can check the stage and status of the iteration. For example you can see: 
```
"stage":"PrepareEvaluationData",
"status":"INITIALIZED"
```
or
```
"stage":"EvaluateModel",
"status":"RUNNING"
```

**You can use a while statement to check whether the iteration has completed.**

In [39]:
import time

response_iterations = requests.get(iteration_url, headers=header_learning_configuration)
status = json.dumps(response_iterations.json()["entity"]["status"], sort_keys=True, indent=2)
state = response_iterations.json()["entity"]["status"]["state"]

print(status)

while state not in ["COMPLETED", "ERROR"]:
    time.sleep(4.0) 
    response_iterations = requests.get(iteration_url, headers=header_learning_configuration)
    status = json.dumps(response_iterations.json()["entity"]["status"], sort_keys=True, indent=2)
    state = response_iterations.json()["entity"]["status"]["state"]

    print(status)    

{
  "message": "New Version Not Deployed", 
  "result": {
    "model_version_guid": "955f576d-8bbe-460d-85fb-47f341ba0301", 
    "model_version_url": "https://ibm-watson-ml.mybluemix.net/v2/artifacts/models/9a627d32-8183-4ea1-911c-a716243fa44b/versions/955f576d-8bbe-460d-85fb-47f341ba0301"
  }, 
  "state": "COMPLETED"
}
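The loop above keeps polling until the iteration reaches a final state, so it would run indefinitely if that never happened. A variant with a timeout is sketched below. It takes any callable that returns the current state, so it can be tried without network access; the helper `wait_for_state` and its parameter names are hypothetical.

```python
import time

def wait_for_state(fetch_state, final_states=("COMPLETED", "ERROR"),
                   interval=4.0, timeout=600.0, sleep=time.sleep, clock=time.time):
    """Poll fetch_state() until it returns a final state or the timeout expires."""
    deadline = clock() + timeout
    while True:
        state = fetch_state()
        if state in final_states:
            return state
        if clock() >= deadline:
            raise RuntimeError("Timed out waiting for a final state; last state: %s" % state)
        sleep(interval)

# Simulated sequence of states in place of real GET requests
simulated_states = iter(["INITIALIZED", "RUNNING", "RUNNING", "COMPLETED"])
final_state = wait_for_state(lambda: next(simulated_states), sleep=lambda seconds: None)
print(final_state)
```

In the notebook, `fetch_state` could wrap the GET request, for example `lambda: requests.get(iteration_url, headers=header_learning_configuration).json()["entity"]["status"]["state"]`.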


#### Get evaluation values

In [40]:
metrics_href = "{endpoint}/{model_id}/evaluation_metrics".format(endpoint=endpoint_published_models, model_id=published_model_ID)
response_metrics = requests.get(metrics_href, headers=header_learning_configuration)

print(json.dumps(response_metrics.json(), indent=2))

{
  "count": 4, 
  "resources": [
    {
      "phase": "setup", 
      "timestamp": "2017-11-02T10:00:50.740Z", 
      "values": [
        {
          "threshold": 0.8, 
          "name": "accuracy", 
          "value": 0.8709677419354839
        }
      ], 
      "model_version_url": "https://ibm-watson-ml.mybluemix.net/v2/artifacts/models/9a627d32-8183-4ea1-911c-a716243fa44b/versions/c2f32858-183f-4b74-a8c7-8c89fa2a1367"
    }, 
    {
      "phase": "monitoring", 
      "timestamp": "2017-11-02T10:02:23.943Z", 
      "values": [
        {
          "threshold": 0.8, 
          "name": "accuracy", 
          "value": 0.7555555555555555
        }
      ], 
      "model_version_url": "https://ibm-watson-ml.mybluemix.net/v2/artifacts/models/9a627d32-8183-4ea1-911c-a716243fa44b/versions/c2f32858-183f-4b74-a8c7-8c89fa2a1367"
    }, 
    {
      "phase": "setup", 
      "timestamp": "2017-11-02T10:02:30.699Z", 
      "values": [
        {
          "threshold": 0.8, 
          "name": "accu

**Tip**: To see the evaluation result, you need to wait for the iteration to complete.

**Action**: To display evaluation details in the form of a table, you need to install the ```tabulate``` package.

In [41]:
!pip install tabulate --user

Collecting tabulate
  Downloading tabulate-0.8.1.tar.gz (45kB)
    100% |████████████████████████████████| 51kB 2.2MB/s eta 0:00:01
Building wheels for collected packages: tabulate
  Running setup.py bdist_wheel for tabulate ... done
  Stored in directory: /gpfs/fs01/user/sec7-2ac43fd194bc9a-8ef090487f81/.cache/pip/wheels/58/89/d5/15530bd5cb3729e1da8e9a9eb03ea81de30f94e44545400917
Successfully built tabulate
Installing collected packages: tabulate
Successfully installed tabulate-0.8.1


In [42]:
from tabulate import tabulate

metrics = response_metrics.json()['resources']
# Take the last 36 characters of the URL, which is the full version GUID
values = [(m["phase"], m["timestamp"][11:16], m["values"][0]["value"], m["values"][0]["threshold"], m["model_version_url"][-36:]) for m in metrics]

print(tabulate([["Phase", "Time", "Accuracy", "Threshold", "Version"]] + values))

----------  -----  --------------  ---------  ------------------------------------
Phase       Time   Accuracy        Threshold  Version
setup       10:00  0.870967741935  0.8        c2f32858-183f-4b74-a8c7-8c89fa2a1367
monitoring  10:02  0.755555555556  0.8        c2f32858-183f-4b74-a8c7-8c89fa2a1367
setup       10:02  0.870967741935  0.8        955f576d-8bbe-460d-85fb-47f341ba0301
training    10:02  0.92469031762   0.8        955f576d-8bbe-460d-85fb-47f341ba0301
----------  -----  --------------  ---------  ------------------------------------


You can see that this iteration of the continuous learning loop consists of the following phases:
- monitoring - the quality of the published model was checked (evaluated) using feedback data. 
- training - since the evaluation result (0.76) is below the specified threshold (0.8), model retraining was triggered. Evaluation of the retrained model shows an accuracy of 0.92.

**Tip**: If the `auto_redeploy` option is set to `conditionally`, the newly trained model is redeployed, since it shows better accuracy than the original one.

<a id="visualization"></a>
## 6. Visualization of model performance

In this section you will visualize iteration results with Plotly, which is an online analytics and data visualization tool.

**Example**:  First, you need to install the required packages by running the following code. Run it only once.

!pip install plotly --user

!pip install cufflinks --user

Import Plotly and other required packages.

In [43]:
import os
import sys
import pandas
import plotly.plotly as py
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import cufflinks as cf
import plotly.graph_objs as go

init_notebook_mode(connected=True)
sys.path.append("".join([os.environ["HOME"]])) 

#### Prepare data for plotly

In [44]:
phases = []
evaluation_values = []
threshold_values = []

for i,x in enumerate(values):
    phases.append(x[0] + '_' + str(i))
    evaluation_values.append(x[2])
    threshold_values.append(x[3])

#### Plot linear chart

In [45]:
trace1 = go.Scatter(
    x = phases,
    y = evaluation_values,
    mode = 'lines+markers',
    name = 'accuracy'
)

trace2 = go.Scatter(
    x = phases,
    y = threshold_values,
    mode = 'lines',
    name = 'threshold'
)

layout = dict(title = 'Model performance',
              xaxis = dict(title = 'Phase'),
              yaxis = dict(title = 'Evaluation result'),
              )

fig = dict(data=[trace1, trace2], layout=layout)
iplot(fig)

Within a single Continuous Learning System iteration, two phases can be seen:
* monitoring - in this phase the initial model is evaluated using feedback data
* training - in this phase the model is retrained using a combination of training and feedback data, and then evaluated
<BR><BR>
After retraining, the model accuracy increased to the desired level (above the specified threshold).

<a id="feedback"></a>
## 7. Feedback records   

You can use the feedback endpoint to send new records to the feedback data store.

Let's generate some records based on training data.

In [46]:
from pyspark.sql.functions import UserDefinedFunction, col, column
from pyspark.sql.types import IntegerType

col_name = 'AGE'
udf_add = UserDefinedFunction(lambda x: x + 1, IntegerType())
new_records_df = train_data.select(*[udf_add(column).alias(col_name) if column == col_name else column for column in train_data.columns])
new_records_df = new_records_df.withColumn("K", col("K").cast("double")).withColumn("NA", col("NA").cast("double"))
new_records_pdf = new_records_df.toPandas()

records=[]

for i in range(new_records_pdf.shape[0]):
    records.append(list(new_records_pdf.loc[i].values))

In the next step, the feedback payload is created.

In [47]:
endpoint_feedback =  "{endpoint}/{model_id}/feedback".format(endpoint=endpoint_published_models, model_id=published_model_ID)
feedback_data = {
  "fields": train_data.columns,
  "values": records
}
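The payload pairs a list of field names with a list of rows, so every row needs one value per field, in the same order. A small check before sending can catch mismatches early. This is a minimal sketch; the helper `build_feedback_payload` is hypothetical and the record is a made-up example.

```python
import json

def build_feedback_payload(fields, rows):
    """Return a feedback payload after checking that each row matches the field list."""
    fields = list(fields)
    for index, row in enumerate(rows):
        if len(row) != len(fields):
            raise ValueError("Row %d has %d values, expected %d" % (index, len(row), len(fields)))
    return {"fields": fields, "values": [list(row) for row in rows]}

# Made-up record for illustration
example_fields = ["AGE", "SEX", "BP", "CHOLESTEROL", "NA", "K", "DRUG"]
example_rows = [[44, "M", "HIGH", "HIGH", 0.656371, 0.046979, "drugA"]]
example_payload = build_feedback_payload(example_fields, example_rows)

# Serializing locally confirms that the payload contains only JSON-compatible values
print(json.dumps(example_payload))
```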

Using a POST request, add your records to the feedback data store.

In [288]:
response_feedback = requests.post(endpoint_feedback, json=feedback_data, headers=header_learning_configuration)

print(response_feedback)

<Response [200]>


**Tip:** You can now run another iteration of the learning system using the new feedback data.

<a id="summary"></a>
## 8. Summary and next steps     

You successfully completed this notebook! You learned how to use the Continuous Learning System of Watson Machine Learning. Check out our _[Online Documentation](https://console.ng.bluemix.net/docs/services/PredictiveModeling/index.html)_ for more samples, tutorials, documentation, how-tos, and blog posts. 

### Authors

**Lukasz Cmielowski**, PhD, is an Automation Architect and Data Scientist at IBM with a track record of developing enterprise-level applications that substantially increase clients' ability to turn data into actionable knowledge.

Copyright © 2017 IBM. This notebook and its source code are released under the terms of the MIT License.