<table style="border: none" align="left">
   <tr style="border: none">
      <th style="border: none"><font face="verdana" size="5" color="black"><b>Predict the best drug for heart treatment with IBM Watson Machine Learning (SPSS)</b></font></th>
      <th style="border: none"><img src="https://github.com/pmservice/customer-satisfaction-prediction/blob/master/app/static/images/ml_icon_gray.png?raw=true" alt="Watson Machine Learning icon" height="40" width="40"></th>
   </tr>
   <tr style="border: none">
       <th style="border: none"><img src="https://github.com/pmservice/drug-selection/raw/master/images/heart_banner.png" width="600" alt="Icon"> </th>
   </tr>
</table>

This notebook contains steps and code to load an IBM SPSS predictive model to IBM Bluemix and start scoring new data. It introduces commands for getting data, persisting the model to the Watson Machine Learning repository, deploying the model, and batch scoring.

Some familiarity with Python is helpful. This notebook uses Python 2.

You will use the **drug_batch_data** data set, which is published on GitHub and contains anonymous patient records. Use the details of this data set to predict the best drug for heart disease treatment.

## Learning goals

The learning goals of this notebook are:

-  Load a CSV file into Db2 on Cloud (formerly DashDB).
-  Persist the SPSS model in the Watson Machine Learning repository.
-  Deploy a model for batch scoring by using the Watson Machine Learning API.
-  Score sample scoring data by using the Watson Machine Learning API.
-  Explore and visualize prediction results by using the `plotly` package.


## Contents

This notebook contains the following parts:

1.	[Setup](#setup)
2.	[Persist the model](#persistence)
3.	[Score in the cloud](#scoring)
4.	[Explore predictions](#explore)
5.	[Summary and next steps](#summary)

<a id="setup"></a>
## 1. Setup

Before you use the sample code in this notebook, you must perform the following setup tasks:

-  Create a [Watson Machine Learning Service](https://console.ng.bluemix.net/catalog/services/ibm-watson-machine-learning/) instance (a free plan is offered). 
-  Create a [Db2 on Cloud](https://console.ng.bluemix.net/catalog/services/db2-on-cloud/) instance (an entry plan is offered). 
-  Upload the **drug_batch_data** data to Db2 on Cloud.


### Create the DRUG_BATCH_DATA table in Db2 on Cloud

1.  Download the [drug_batch_data.csv](https://github.com/pmservice/drug-selection/blob/master/data/drug_batch_data.csv) file from the GitHub repository.
2.  Click the **Open the console to get started with Db2 on Cloud** icon.
3.  Select **Load Data**, and then select the **Desktop** load type.
4.  Drag and drop the previously downloaded file and click **Next**.
5.  The **DRUG_BATCH_DATA** table with uploaded data should be created for you.

<a id="persistence"></a>
## 2. Persist the model

In this section you will learn how to store your model in the Watson Machine Learning repository by using the REST API.

**Action**: Put the authentication information (URL and access_key) from your instance of Watson Machine Learning service in the following code cell:

In [None]:
url = "https://ibm-watson-ml.mybluemix.net"
access_key = "***"

**Tip**: The URL and access_key can be found on the **Service Credentials** tab of the service instance you created in Bluemix.
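
Hard-coding the access key in a shared notebook makes it easy to leak. The following is a minimal sketch of one alternative, assuming you export the credentials as environment variables before starting the notebook. The variable names `WML_URL` and `WML_ACCESS_KEY` and the helper `load_wml_credentials` are hypothetical and are not defined by the service.

```python
import os


def load_wml_credentials(environ=os.environ):
    """Return (url, access_key) read from environment variables.

    WML_URL and WML_ACCESS_KEY are hypothetical names chosen for this
    sketch; use whatever names fit your environment.
    """
    url = environ.get("WML_URL", "https://ibm-watson-ml.mybluemix.net")
    access_key = environ.get("WML_ACCESS_KEY")
    if not access_key:
        raise ValueError("Set WML_ACCESS_KEY before running the notebook")
    # Strip a trailing slash so later string concatenation stays clean.
    return url.rstrip("/"), access_key
```

With both variables set, `url, access_key = load_wml_credentials()` could replace the assignments in the cell above.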

### 2.1: Download the sample SPSS stream

**Action**: Download the sample SPSS stream from the GitHub project by using the `wget` command.

**Example**: First, install the required package by running the following command. Run it only one time.

`!pip install wget --user`

In [None]:
!wget https://github.com/pmservice/drug-selection/raw/master/model/Drug1n_capitalized.str

**Tip**: If you are using your own stream, make sure that the column names that are used in the stream and the ones that are used in the database have the same capitalization, for example UPPER CASE letters.
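
One way to catch a capitalization mismatch early is to inspect the CSV header before you load the file. The sketch below assumes that the table columns take their names from the CSV header line. The helper name `non_uppercase_columns` is hypothetical, and the naive split does not handle quoted column names that contain the delimiter.

```python
def non_uppercase_columns(header_line, delimiter=","):
    """Return the column names in a CSV header line that are not fully upper case.

    Uses a plain split, so quoted names that contain the delimiter are
    not supported.
    """
    names = [name.strip() for name in header_line.split(delimiter)]
    return [name for name in names if name != name.upper()]


# Hypothetical header for illustration; the real file's columns may differ.
mixed_case = non_uppercase_columns("AGE,Sex,BP\n")
```

For a real file, you could pass the first line, for example `non_uppercase_columns(open("drug_batch_data.csv").readline())`, and rename any columns that are reported.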

### 2.2: Deploy the Drug1n_capitalized.str to the Watson Machine Learning service

In [None]:
import urllib3, requests, json

In [None]:
context_id = "drug_cap_stream"
upload_endpoint = url + "/pm/v1/file/" + context_id + "?accesskey=" + access_key
files = {'file': ('Drug1n_capitalized.str', open('Drug1n_capitalized.str', 'rb'))}

In [None]:
upload_response = requests.put(upload_endpoint, files=files)

print upload_response
print upload_response.text

As you can see, the model was deployed successfully to the Watson Machine Learning service in the cloud.

**Tip**: The `context_id` variable can be any string that describes your model.
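
Printing the response shows the outcome, but a failed upload would otherwise go unnoticed by later cells. The following sketch stops the notebook on any non-2xx HTTP status. The helper name `ensure_success` is hypothetical, and it relies only on the `status_code` and `text` attributes that `requests` response objects provide.

```python
def ensure_success(response, what="request"):
    """Return the response unchanged, or raise RuntimeError on a non-2xx status."""
    if not 200 <= response.status_code < 300:
        raise RuntimeError("%s failed with HTTP %d: %s"
                           % (what, response.status_code, response.text))
    return response
```

For example, `ensure_success(upload_response, "model upload")` could follow the upload cell. The built-in `upload_response.raise_for_status()` is a shorter alternative, although its error message does not include the response body.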

<a id="scoring"></a>
## 3. Score in the cloud by using a batch job

In this section you will learn how to create a batch job and score records that are stored in Db2 on Cloud by using the Watson Machine Learning REST API. 
For more information about REST APIs, see the [Bluemix Documentation](https://console.ng.bluemix.net/docs/services/PredictiveModeling/index.html).

### 3.1: Create a connection map to the Db2 table with data

Using your Db2 credentials from Bluemix, update the **host**, **port**, **db**, **username**, and **password** values in the following `dbDefinitions` dictionary.

In [None]:
dbDefinitions = {
    "db1": {
        "type": "DashDB",
        "host": "awh-yp-small02.services.dal.bluemix.net",
        "port": 50000,
        "db": "BLUDB",
        "username": "***",
        "password": "***"
    }
}

**Tip**: All the required fields can be found on the **Service Credentials** tab of the Db2 on Cloud service instance that you created in Bluemix.

If you use different names, update the input `table` name (`DRUG_BATCH_DATA` in this example) and the `node` names in the following dictionary to match your model's input and output nodes. You can also change the result `table` name in the `exports` section from `RESULTS_DRUG` to any custom string.

In [None]:
settings = {
    "inputs": [
        {
            "odbc": {
                "dbRef": "db1",
                "table": "DRUG_BATCH_DATA"
            },
            "node": "scoreInput",
            "attributes": []
        }
    ],
    "exports": [
        {
            "odbc": {
                "dbRef": "db1",
                "table": "RESULTS_DRUG",
                "insertMode": "Create"
            },
            "node": "Table",
            "attributes": []
        }
    ]
}

**Tip**: The database table name must match the one that is used in the SPSS Modeler stream file. Put the table that holds the data to score in the `table` field of the `inputs` section. The `node` values in the `inputs` and `exports` sections must match the input and output node names that are used in the SPSS Modeler stream file.
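
Every `dbRef` in the settings has to name an entry in the connection map, and a typo there only surfaces after the job is submitted. The sketch below checks that locally first. The helper name `missing_db_refs` is hypothetical, and the dictionary layout is assumed to be the one shown in the cells above.

```python
def missing_db_refs(db_definitions, settings):
    """Return the sorted dbRef names used in settings but absent from db_definitions."""
    missing = set()
    for section in ("inputs", "exports"):
        for entry in settings.get(section, []):
            ref = entry.get("odbc", {}).get("dbRef")
            if ref is not None and ref not in db_definitions:
                missing.add(ref)
    return sorted(missing)
```

Calling `missing_db_refs(dbDefinitions, settings)` should return an empty list before you submit the batch job.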

### 3.2: Submit the batch job

In [None]:
job_id = "drug_job5"
batch_endpoint = url + "/pm/v1/jobs/" + job_id + "?accesskey=" + access_key

batch_payload = {
    "action": "BATCH_SCORE",
    "model": {
        "id": "drug_cap_stream",
        "name": "Drug1n_capitalized.str"
    },
    "dbDefinitions": dbDefinitions,
    "setting": settings
}

batch_header = {"Content-Type": "application/json"}

In [None]:
batch_response = requests.post(batch_endpoint, json=batch_payload, headers=batch_header)

print batch_response
print batch_response.text

After the batch job has been submitted, you can check the status of your batch job by using the following REST API method:

In [None]:
batch_status_response = requests.get(batch_endpoint)

print batch_status_response
print batch_status_response.text
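
If the job has not finished yet, the status check can be repeated until it does. The following is a minimal polling sketch. The helper name `wait_for_status` is hypothetical, and it takes the status lookup as a function because the exact layout of the status response is not shown in this notebook.

```python
import time


def wait_for_status(fetch_status, done_states, interval=5.0,
                    max_attempts=60, sleep=time.sleep):
    """Call fetch_status() until it returns a value in done_states.

    Returns the final status, or raises RuntimeError after max_attempts
    unsuccessful checks. The sleep argument exists so that the wait can
    be replaced in tests.
    """
    for _ in range(max_attempts):
        status = fetch_status()
        if status in done_states:
            return status
        sleep(interval)
    raise RuntimeError("job did not finish after %d attempts" % max_attempts)
```

A possible call is `wait_for_status(lambda: requests.get(batch_endpoint).json().get("status"), {"SUCCESS", "FAILED"})`. The `status` field name and the `FAILED` state are assumptions here, so inspect the printed response text and adjust both to what the service actually returns.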

As we can see, our batch job status is SUCCESS. The prediction results are stored in the `RESULTS_DRUG` table. Let's connect to Db2 on Cloud and explore it.

<a id="explore"></a>
## 4. Explore predictions 

In this section we will connect to the `RESULTS_DRUG` table by using the Apache Spark `read` method to explore the prediction results.

### 4.1: Data exploration

Use the following code to read the prediction results into a Spark data frame:

In [None]:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

properties = {
    'jdbcurl': 'jdbc:db2://awh-yp-small02.services.dal.bluemix.net:50000/BLUDB',
    'user': '***',
    'password': '***'
}

data = spark.read.jdbc(properties['jdbcurl'], table='DASH111858.RESULTS_DRUG', properties=properties)
data.head()

The results show that data has been loaded correctly. Now we can check the schema of the prediction data by using the `printSchema()` method.

In [None]:
data.printSchema()
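
The JDBC URL that is passed to `spark.read.jdbc` earlier in this section repeats the host, port, and database that are already stored in the `dbDefinitions` dictionary. As a small sketch, the URL could be derived from that entry instead, so the connection details live in one place. The helper name `build_jdbc_url` is hypothetical.

```python
def build_jdbc_url(db_definition):
    """Build a Db2 JDBC URL from a dict that has 'host', 'port', and 'db' keys."""
    return "jdbc:db2://%s:%s/%s" % (
        db_definition["host"],
        db_definition["port"],
        db_definition["db"],
    )


# Same sample values as in the connection map used for the batch job.
example_url = build_jdbc_url({
    "host": "awh-yp-small02.services.dal.bluemix.net",
    "port": 50000,
    "db": "BLUDB",
})
```

In the notebook, `build_jdbc_url(dbDefinitions["db1"])` would produce the value for the `jdbcurl` entry.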

Next, to preview the prediction data, call the `show()` method.

In [None]:
data.show()

The resulting table includes two columns: the predicted drug (`$N-DRUG`) and the probability (`$NC-DRUG`).

In [None]:
data.select("$N-DRUG").groupBy("$N-DRUG").count().show()

The preceding cell calculates the drug distribution by using `select` together with `groupBy` and `count`.
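
Raw counts can be easier to compare as shares of the total. The sketch below shows that calculation in plain Python on a made-up list of predicted labels. The helper name `label_shares` and the sample values are hypothetical; the real labels come from the `$N-DRUG` column.

```python
from collections import Counter


def label_shares(labels):
    """Map each distinct label to its fraction of all labels."""
    counts = Counter(labels)
    total = float(sum(counts.values()))
    return dict((label, count / total) for label, count in counts.items())


# Made-up predictions for illustration only.
sample_predictions = ["drugY", "drugX", "drugY", "drugC"]
shares = label_shares(sample_predictions)
```

With the Spark data frame, the input list could be collected with `[row["$N-DRUG"] for row in data.select("$N-DRUG").collect()]`, which is reasonable only for result sets that fit in driver memory.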

### 4.2: Create a sample visualization of the data by using the `plotly` package

In this subsection you will explore prediction results with Plotly, which is an online analytics and data visualization tool.

**Example**: First, install the required packages by running the following code. Run it only one time.

In [None]:
!pip install plotly --user 
!pip install cufflinks --user

Import Plotly and other required packages.

In [None]:
import os
import sys
import pandas
import plotly.plotly as py
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import cufflinks as cf
import plotly.graph_objs as go

# Render Plotly charts inline in the notebook.
init_notebook_mode(connected=True)
# Make modules in the home directory importable.
sys.path.append(os.environ["HOME"])

Convert the Apache Spark data frame to a Pandas data frame.

In [None]:
data_pdf = data.toPandas()

Plot a pie chart that shows the drug distribution.

In [None]:
cumulative_stats = data_pdf.groupby(['$N-DRUG']).count()

drug_data = [go.Pie(
            labels=cumulative_stats.index,
            values=cumulative_stats['$NC-DRUG'],
    )]

drug_layout = go.Layout(
    title='Heart treatment drugs distribution',
)

fig = go.Figure(data=drug_data, layout=drug_layout)
iplot(fig)

With this data set, you might want to analyze the mean `K` value per drug type by using a bar chart.

In [None]:
age_data = [go.Bar(
            y=data_pdf.groupby(['$N-DRUG']).mean()["K"],
            x=cumulative_stats.index
            
    )]

age_layout = go.Layout(
    title='Mean K per recommended drug',
    xaxis=dict(
        title = "Drug",
        showline=False,),
    yaxis=dict(
        title = "Mean K",
        ),
)

fig = go.Figure(data=age_data, layout=age_layout)
iplot(fig)

Based on the bar plot you created, you might draw the following conclusion: drugC and drugX are recommended for patients with a high value of `K`.

<a id="summary"></a>
## 5. Summary and next steps

You successfully completed this notebook! You learned how to use Watson Machine Learning to persist an SPSS model, deploy it, and run batch scoring, and how to use Apache Spark and Plotly to explore the predictions. Check out our [Online Documentation](https://www.ibm.com) for more samples, tutorials, documentation, how-tos, and blog posts.

### Authors

**Lukasz Cmielowski**, PhD, is an Automation Architect and Data Scientist at IBM with a track record of developing enterprise-level applications that substantially increase clients' ability to turn data into actionable knowledge.

Copyright © 2017 IBM. This notebook and its source code are released under the terms of the MIT License.