<table style="border: none" align="left">
   <tr style="border: none">
      <th style="border: none"><font face="verdana" size="5" color="black"><b>Using an SPSS Modeler flow to predict car price</b></font></th>
      <th style="border: none"><img src="https://github.com/pmservice/customer-satisfaction-prediction/blob/master/app/static/images/ml_icon_gray.png?raw=true" alt="Watson Machine Learning icon" height="40" width="40"></th>
   </tr>
</table>

This notebook shows how to handle a regression problem using an SPSS Modeler flow and a notebook. In machine learning, regression is used when the target (the variable to be predicted) is numerical. Several regression algorithms can predict numerical data; this notebook uses a model previously created with the XGBoost Tree node in Watson Studio. To create your own model using SPSS Modeler in Watson Studio, you can follow the steps provided in this blog: <a href="https://medium.com/ibm-watson/predict-the-price-of-a-car-using-spss-modeler-on-watson-studio-c94472886f9d" target="_blank" rel="noopener noreferrer">Predict the Price of a Car using SPSS Modeler on Watson Studio</a>.

In this notebook, you'll use the <a href="https://archive.ics.uci.edu/ml/datasets/automobile" target="_blank" rel="noopener noreferrer">Auto Imports</a> data set to predict the price of a car. The data set has 201 records. After the data is prepared, an SPSS Modeler flow splits it into training and testing sets and trains the model with the XGBoost Tree node. You'll learn how to use the Watson Machine Learning client in a notebook to deploy and score the model created in Watson Studio.

This notebook runs on Python. 

## Learning goals
- Use an SPSS Modeler model in Watson Studio
    - Save the model to the WML repository
- Use the Watson Machine Learning Client package
    - Deploy and score the selected model

## Table of Contents
1. [Setting up your environment](#setup)
2. [Loading data](#loaddata)
3. [Retrieving the model from the WML Repository](#wml)
4. [Deploying the selected model and scoring data](#deploy)
5. [Summary and next steps](#summary)

<a id='setup'></a>
## 1. Setting up your environment

Before using the sample code in this notebook, you must:

- Create a <a href="https://cloud.ibm.com/catalog/services/machine-learning" target="_blank" rel="noopener noreferrer">Watson Machine Learning (WML) Service</a> instance. A free plan is offered, and information about how to create the instance can be found at <a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/ml-setup.html" target="_blank" rel="noopener noreferrer">https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/wml-setup.html</a>.


<a id='loaddata'></a>
## 2. Loading data

1. Download the <a href="https://dataplatform.cloud.ibm.com/exchange/public/entry/view/9704374ab42cdd449b6112a0981dfbe1" target="_blank" rel="noopener noreferrer">UCI: Automobile Data Set</a> from the Watson Studio Community. 
2. Load the .csv file into your notebook. Click the Data icon on the notebook action bar. Drop the file into the box or browse to select the file. The file is loaded to your object storage and appears in the Data Assets section of the project. 
3. To load the data into a DataFrame, click in the next code cell and select **Insert to code > Insert Pandas DataFrame** under the file name.
4. Rename `df_data_x` to `df_data_1`.
5. Run the cell.


In [1]:
# Import the data set as a pandas DataFrame.


| Unnamed: 0 | symboling | normalized-losses | make | fuel-type | aspiration | num-of-doors | body-style | drive-wheels | engine-location | wheel-base | ... | engine-size | fuel-system | bore | stroke | compression-ratio | horsepower | peak-rpm | city-mpg | highway-mpg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | | alfa-romero | gas | std | two | convertible | rwd | front | 88.6 | ... | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111.0 | 5000.0 | 21 | 27 | 13495 |
| 1 | 3 | | alfa-romero | gas | std | two | convertible | rwd | front | 88.6 | ... | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111.0 | 5000.0 | 21 | 27 | 16500 |
| 2 | 1 | | alfa-romero | gas | std | two | hatchback | rwd | front | 94.5 | ... | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154.0 | 5000.0 | 19 | 26 | 16500 |
| 3 | 2 | 164.0 | audi | gas | std | four | sedan | fwd | front | 99.8 | ... | 109 | mpfi | 3.19 | 3.4 | 10.0 | 102.0 | 5500.0 | 24 | 30 | 13950 |
| 4 | 2 | 164.0 | audi | gas | std | four | sedan | 4wd | front | 99.4 | ... | 136 | mpfi | 3.19 | 3.4 | 8.0 | 115.0 | 5500.0 | 18 | 22 | 17450 |


Here, you can see the first 5 rows of the DataFrame.

This data set has 201 rows and 26 columns. The target field to be predicted is the price of the car, which is numerical. The data set contains numerical and categorical variables that affect the price, such as engine location, engine size, and number of cylinders. The XGBoost Tree node is trained on the significant features in this data set, and the trained model is then used to predict the price of each car in the test set.

Next, import the `watson-machine-learning-client`, which is required to deploy and score saved models.

## 3. Retrieving the model from the WML Repository<a id='wml'></a>

In this step, you'll find the model saved from an SPSS Modeler flow in Watson Studio and load it from the WML Repository.
- 3.1 [Set up the Watson Machine Learning client](#instance)
- 3.2 [Retrieve the model](#model)

**Tip**: You can find more information about the `watson-machine-learning-client` at <a href="https://wml-api-pyclient.mybluemix.net" target="_blank" rel="noopener noreferrer">https://wml-api-pyclient.mybluemix.net</a>.

### 3.1 Set up the Watson Machine Learning client<a id='instance'></a>

First, import the Watson Machine Learning client library:

In [None]:
# Import the Watson Machine Learning client.
from watson_machine_learning_client import WatsonMachineLearningAPIClient

**Note**: The scikit-learn package might return a deprecation warning. The warning doesn't affect the Watson Machine Learning client functionality.

Now enter your Watson Machine Learning service instance credentials to authenticate to Watson Machine Learning.

- You can find your authentication information (your credentials) in the **Service Credentials** tab of the service instance that you created in IBM Cloud. For more information, see <a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/ml-get-wml-credentials.html" target="_blank" rel="noopener noreferrer">https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/ml-get-wml-credentials.html</a>. If you don't see the **apikey** field in **Service Credentials**, click **New credential (+)** to generate new authentication information. 

In [None]:
# Enter your Watson Machine Learning service instance credentials.
wml_credentials = {
         'url':  'https://ibm-watson-ml.mybluemix.net',
         'username':  '****',
         'password':  '****',
         'instance_id':  '****',
         'apikey' : '****'
 }
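Hard-coded credentials are easy to leak when a notebook is shared. As a minimal sketch, the dictionary above could instead be assembled from environment variables. The variable names `WML_URL`, `WML_APIKEY`, and `WML_INSTANCE_ID` and the helper `load_wml_credentials` are assumptions made for this illustration; they are not part of the client library. Only the dictionary keys come from the cell above.

```python
import os

# Hypothetical environment variable names; choose your own.
REQUIRED_VARS = {
    "url": "WML_URL",
    "apikey": "WML_APIKEY",
    "instance_id": "WML_INSTANCE_ID",
}


def load_wml_credentials(env=None):
    """Build a credentials dictionary from environment variables.

    Raises KeyError naming every variable that is missing or empty.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS.values() if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: env[name] for key, name in REQUIRED_VARS.items()}


# Demonstration with a stand-in mapping instead of the real environment.
example_env = {
    "WML_URL": "https://ibm-watson-ml.mybluemix.net",
    "WML_APIKEY": "****",
    "WML_INSTANCE_ID": "****",
}
example_credentials = load_wml_credentials(example_env)
print(sorted(example_credentials))
```

If your service instance also uses `username` and `password`, they could be added to `REQUIRED_VARS` in the same way.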

Create the API client:

In [4]:
client = WatsonMachineLearningAPIClient(wml_credentials)

Get the instance details:

In [5]:
instance_details = client.service_instance.get_details()

### 3.2 Retrieve the model<a id='model'></a>

You have **2 options** for retrieving the SPSS model: 
1. Select the model created and saved using an SPSS Modeler flow in Watson Studio
2. Download a sample flow and save the model

#### 3.2.1 Select the model created and saved using an SPSS Modeler flow in Watson Studio

List all the models that are saved in the Watson Machine Learning repository. To continue running the notebook, you'll need the GUID of the XGBoost Tree model that you saved from Watson Studio.

In [None]:
# List existing models in the Watson Machine Learning repository.
models_details = client.repository.list_models()

**Action**: Enter the GUID of the appropriate car price regression model in the cell below once you've found the GUID in the list of models above.

In [None]:
# SPSS regression model saved in Watson Machine Learning repository.
published_model = 'Enter_GUID-here' # Enter GUID here
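Forgetting to replace the placeholder is an easy mistake, so it may help to check the value before using it. The sketch below uses only the standard `uuid` module; the helper name `is_valid_guid` is an assumption for this illustration, and the check only confirms the format, not that the model exists in your repository.

```python
import uuid


def is_valid_guid(value):
    """Return True if value is a canonical, hyphenated GUID string."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False


print(is_valid_guid("422099cc-0ac5-4d89-a8e0-d08cb9151e14"))  # True
print(is_valid_guid("Enter_GUID-here"))                       # False
```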

#### 3.2.2 Download the sample flow and save the model

Download the sample model from <a href="https://github.com/pmservice/wml-sample-models" target="_blank" rel="noopener noreferrer">https://github.com/pmservice/wml-sample-models</a>.

Note: You might need to install the `wget` package. To install the wget package, run the following command:

In [None]:
!pip install --upgrade wget

In [7]:
# Download the car price SPSS model.
import os
import wget

sample_dir = 'spss_model'
if not os.path.isdir(sample_dir):
    os.mkdir(sample_dir)

filename = os.path.join(sample_dir, 'car-price-prediction.str')
if not os.path.isfile(filename):
    filename = wget.download(
        'https://github.com/pmservice/wml-sample-models/raw/master/spss/car-price-prediction/model/car-price-prediction.str',
        out=sample_dir)
print(filename)

spss_model/car-price-prediction.str
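If you would rather not install `wget`, the same download-once pattern can be sketched with the standard library. The helper name `download_if_missing` and its `fetch` parameter are assumptions for this illustration. The demonstration uses a stand-in fetch function and a placeholder URL so that it runs offline; in a notebook you would pass the model URL from the cell above and leave `fetch` at its default.

```python
import os
import tempfile
from urllib.request import urlretrieve


def download_if_missing(url, target_dir, fetch=urlretrieve):
    """Download url into target_dir unless the file is already there.

    fetch is any callable taking (url, destination_path). It defaults to
    urllib.request.urlretrieve and can be swapped out for offline testing.
    """
    os.makedirs(target_dir, exist_ok=True)
    destination = os.path.join(target_dir, os.path.basename(url))
    if not os.path.isfile(destination):
        fetch(url, destination)
    return destination


# Offline demonstration: a stand-in fetch that writes a placeholder file.
calls = []


def fake_fetch(url, destination):
    calls.append(url)
    with open(destination, "wb") as handle:
        handle.write(b"placeholder")


with tempfile.TemporaryDirectory() as tmp:
    first = download_if_missing("https://example.com/model.str", tmp, fetch=fake_fetch)
    second = download_if_missing("https://example.com/model.str", tmp, fetch=fake_fetch)
    print(os.path.basename(first), len(calls))
```

The second call returns the existing path without fetching again, which mirrors the `os.path.isfile` check in the cell above.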


Save the sample model to your Watson Machine Learning repository. First, you need to create the model metadata to store in the repository:

In [8]:
# Save the SPSS model to the Watson Machine Learning repository.
model_props = {
    client.repository.ModelMetaNames.NAME: 'SPSS Car Price Regression model',
    client.repository.ModelMetaNames.FRAMEWORK_NAME: 'spss-modeler', 
    client.repository.ModelMetaNames.FRAMEWORK_VERSION: '18.1'
}

model_details = client.repository.store_model(filename, model_props)

You can use the `list_models` method to list all stored models:

In [9]:
# List existing models in Watson Machine Learning repository.
client.repository.list_models()

------------------------------------  -------------------------------  ------------------------  -----------------
GUID                                  NAME                             CREATED                   FRAMEWORK
422099cc-0ac5-4d89-a8e0-d08cb9151e14  SPSS Car Price Regression model  2019-06-07T16:23:33.398Z  spss-modeler-18.1
------------------------------------  -------------------------------  ------------------------  -----------------


You need the model UID to create the deployment. You can extract the model UID from the saved model details and use it in the next section to create the deployment.

In [10]:
published_model = client.repository.get_model_uid(model_details)
print("Model UID = " + published_model)

Model UID = 422099cc-0ac5-4d89-a8e0-d08cb9151e14


## 4. Deploying the selected model and scoring data<a id='deploy'></a>

In this section, you'll learn how to use the Watson Machine Learning client to create an online deployment and score new data records.

- 4.1  [Create an online deployment for the published model](#create)
- 4.2  [Get the deployment](#get)
- 4.3  [Score data](#score)
- 4.4  [Delete the deployment and model](#delete)

### 4.1 Create an online deployment for the published model<a id='create'></a>

Deploy the model:

In [11]:
# Initialize deployment of the published SPSS regression model.
created_deployment = client.deployments.create(published_model, 'Deployment of regression model (SPSS)')



#######################################################################################

Synchronous deployment creation for uid: '422099cc-0ac5-4d89-a8e0-d08cb9151e14' started

#######################################################################################


INITIALIZING
DEPLOY_IN_PROGRESS.........
DEPLOY_SUCCESS


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='e6259aae-8f8c-4158-9ea7-59aab82f8cac'
------------------------------------------------------------------------------------------------




**Note:** Here, you use the scoring URL saved in the `created_deployment` object. The next section shows you how to retrieve the deployment URL from the Watson Machine Learning instance.

Retrieve the online scoring endpoint, which is required to score the deployed model:

In [None]:
# Get scoring endpoint.
scoring_endpoint = client.deployments.get_scoring_url(created_deployment)

print(scoring_endpoint)

### 4.2 Get the deployment<a id='get'></a>

In [13]:
# Get deployment details.
deployments = client.deployments.get_details()

You can get the deployment URL from the details of the deployment you just created:

In [None]:
# Retrieve deployment url.
deployment_url = client.deployments.get_url(created_deployment)

print(deployment_url)

List all the current deployments in the WML Repository to see the model you just deployed:

In [15]:
# List the existing deployments in the WML Repository.
client.deployments.list()

------------------------------------  -------------------------------------  ------  --------------  ------------------------  -----------------  -------------
GUID                                  NAME                                   TYPE    STATE           CREATED                   FRAMEWORK          ARTIFACT TYPE
e6259aae-8f8c-4158-9ea7-59aab82f8cac  Deployment of regression model (SPSS)  online  DEPLOY_SUCCESS  2019-06-07T16:23:50.112Z  spss-modeler-18.1  model
------------------------------------  -------------------------------------  ------  --------------  ------------------------  -----------------  -------------


### 4.3 Score data<a id='score'></a>

Use the following method to run a test scoring request against the deployed model.

**Action**: Prepare the scoring payload with the records to score.

Before you score the model, you need to make the scoring payload (the records to be scored) compatible with the scoring service. As you can see below, the numeric columns are of type `float64` and `int64`. These NumPy types are not compatible with scoring and must be converted to native Python types.

In [16]:
# List information about the columns of the data set.
df_data_1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 201 entries, 0 to 200
Data columns (total 26 columns):
symboling            201 non-null int64
normalized-losses    164 non-null float64
make                 201 non-null object
fuel-type            201 non-null object
aspiration           201 non-null object
num-of-doors         199 non-null object
body-style           201 non-null object
drive-wheels         201 non-null object
engine-location      201 non-null object
wheel-base           201 non-null float64
length               201 non-null float64
width                201 non-null float64
height               201 non-null float64
curb-weight          201 non-null int64
engine-type          201 non-null object
num-of-cylinders     201 non-null object
engine-size          201 non-null int64
fuel-system          201 non-null object
bore                 197 non-null float64
stroke               197 non-null float64
compression-ratio    201 non-null float64
horsepower           199 non-

Scoring doesn't take null values, so you can pick the scoring payload from a copy of the data where the `NaN` values are removed:

In [17]:
# Drop the rows of the data set that contain NaN.
df_score = df_data_1.dropna()
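For readers less familiar with pandas: called with its default arguments, `dropna()` keeps only the rows in which no value is missing. The plain-Python sketch below roughly mimics that behavior, treating `None` and `NaN` as missing. The helper names are assumptions for this illustration, and the three short records are illustrative rather than complete rows of the data set.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_incomplete(rows):
    """Return only the rows that contain no missing values."""
    return [row for row in rows if not any(is_missing(v) for v in row)]


# Illustrative records: [symboling, normalized-losses, make]
sample_rows = [
    [3, float("nan"), "alfa-romero"],
    [2, 164.0, "audi"],
    [1, None, "bmw"],
]
complete_rows = drop_incomplete(sample_rows)
print(complete_rows)
```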

Define the scoring payload by taking 5 records as lists and saving them in a dictionary:

In [18]:
# Define the scoring payload.
scoring_payload = {'fields': (df_data_1.columns).tolist(), 'values': [df_score.iloc[a].tolist() for a in range(5)]}
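The payload is a dictionary with a `fields` list of column names and a `values` list holding one list per record, in the same column order. As a minimal sketch of that shape without pandas, the hypothetical helper below builds the same structure from plain tuples and checks that every record has one value per field. The three-column records are a shortened illustration, not the full 26-column payload.

```python
def build_scoring_payload(fields, rows, limit=5):
    """Build a {'fields': [...], 'values': [[...], ...]} payload.

    Only the first `limit` rows are included.
    """
    selected = [list(row) for row in rows[:limit]]
    for row in selected:
        if len(row) != len(fields):
            raise ValueError("Row length does not match the number of fields")
    return {"fields": list(fields), "values": selected}


example_fields = ["make", "fuel-type", "price"]
example_rows = [("audi", "gas", 13950), ("audi", "gas", 17450)]
example_payload = build_scoring_payload(example_fields, example_rows)
print(example_payload)
```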

Convert the values in the scoring payload into the appropriate `int` data type:

In [19]:
# Cast the values at the selected numeric column positions of each record to a native Python int.
for instance in scoring_payload['values']:
    for i in [0, 1, 13, 16, 21, 22, 23, 24, 25]:
        instance[i] = int(instance[i])
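The cell above relies on a hand-maintained list of column positions. An alternative sketch, under the assumption that NumPy scalars pass the `numbers` abstract base class checks (NumPy registers its scalar types with them), converts any number-like value by type instead. The helper name `to_native` is hypothetical. Unlike the cell above, it keeps floats as floats rather than casting them to `int`.

```python
import json
import numbers


def to_native(value):
    """Convert a number-like value to a built-in int or float."""
    if isinstance(value, bool) or not isinstance(value, numbers.Number):
        return value  # strings, booleans, and None are left unchanged
    if isinstance(value, numbers.Integral):
        return int(value)
    if isinstance(value, numbers.Real):
        return float(value)
    return value  # complex numbers are left unchanged


example_record = [2, 164.0, "audi", 99.8, 2337]
native_record = [to_native(v) for v in example_record]

# json.dumps raises TypeError if any value is not JSON-serializable.
json.dumps(native_record)
print(native_record)
```

Running `json.dumps` on the payload before scoring is a cheap way to catch values that were not converted.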

You can see the scoring payload here:

In [20]:
scoring_payload

{'fields': ['symboling',
  'normalized-losses',
  'make',
  'fuel-type',
  'aspiration',
  'num-of-doors',
  'body-style',
  'drive-wheels',
  'engine-location',
  'wheel-base',
  'length',
  'width',
  'height',
  'curb-weight',
  'engine-type',
  'num-of-cylinders',
  'engine-size',
  'fuel-system',
  'bore',
  'stroke',
  'compression-ratio',
  'horsepower',
  'peak-rpm',
  'city-mpg',
  'highway-mpg',
  'price'],
 'values': [[2,
   164,
   'audi',
   'gas',
   'std',
   'four',
   'sedan',
   'fwd',
   'front',
   99.799999999999997,
   176.59999999999999,
   66.200000000000003,
   54.299999999999997,
   2337,
   'ohc',
   'four',
   109,
   'mpfi',
   3.1899999999999999,
   3.3999999999999999,
   10.0,
   102,
   5500,
   24,
   30,
   13950],
  [2,
   164,
   'audi',
   'gas',
   'std',
   'four',
   'sedan',
   '4wd',
   'front',
   99.400000000000006,
   176.59999999999999,
   66.400000000000006,
   54.299999999999997,
   2824,
   'ohc',
   'five',
   136,
   'mpfi',
   3.18999

Score the payload against the deployed model. You can see the predicted price and the real price as follows:

In [21]:
# Score the payload using the deployed model and print results.
import json
predictions = client.deployments.score(scoring_endpoint, scoring_payload)
print(json.dumps(predictions, indent=2))

{
  "values": [
    [
      "audi",
      "front",
      "four",
      "mpfi",
      13950,
      0.1479655443510484,
      -0.09287269099619345,
      -0.45891330077980236,
      -0.5257515189324719,
      -0.03121270473601937,
      10132
    ],
    [
      "audi",
      "front",
      "five",
      "mpfi",
      17450,
      0.24313698247765597,
      -1.3050001714348938,
      0.3290871612744488,
      -0.5257515189324719,
      0.325588711145538,
      13620
    ],
    [
      "audi",
      "front",
      "five",
      "mpfi",
      17710,
      2.6224229356428115,
      -0.8504523662703811,
      0.3290871612744488,
      -0.5257515189324719,
      0.18835739734493903,
      14824
    ],
    [
      "audi",
      "front",
      "five",
      "mpfi",
      23875,
      2.6224229356428115,
      -1.6080320415445688,
      0.1831611497829208,
      -0.749790714344187,
      1.011745280148533,
      18721
    ],
    [
      "bmw",
      "front",
      "four",
      "mpfi",
      1643

In [22]:
# Compare the predicted price with the actual value.
print(predictions['fields'][-1],':', predictions['values'][0][-1])
print(predictions['fields'][4],':', predictions['values'][0][4])

$XGT-price : 10132
price : 13950
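Comparing a single record gives only a rough impression of model quality. As a sketch, the mean absolute error over all scored records can be computed from the response by looking up the `price` and `$XGT-price` columns by name. The helper name and the reduced three-column response layout below are assumptions for this illustration; the price pairs are the four complete records visible in the scoring output above.

```python
def mean_absolute_error(response, actual_field="price", predicted_field="$XGT-price"):
    """Average absolute difference between actual and predicted values."""
    fields = response["fields"]
    actual_idx = fields.index(actual_field)
    predicted_idx = fields.index(predicted_field)
    errors = [abs(row[actual_idx] - row[predicted_idx]) for row in response["values"]]
    return sum(errors) / len(errors)


# Reduced, hypothetical response layout with the prices shown above.
sample_response = {
    "fields": ["make", "price", "$XGT-price"],
    "values": [
        ["audi", 13950, 10132],
        ["audi", 17450, 13620],
        ["audi", 17710, 14824],
        ["audi", 23875, 18721],
    ],
}
mae = mean_absolute_error(sample_response)
print(mae)  # 3922.0
```

Four records are far too few to judge the model; the same function could be applied to a response that covers the whole test set.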


### 4.4 Delete the deployment and model<a id='delete'></a>

Use the following method to delete the deployment:

In [23]:
# Delete the deployment.
client.deployments.delete(client.deployments.get_uid(created_deployment))

'SUCCESS'

In [24]:
# Check that the deployment was deleted.
client.deployments.list()

----  ----  ----  -----  -------  ---------  -------------
GUID  NAME  TYPE  STATE  CREATED  FRAMEWORK  ARTIFACT TYPE
----  ----  ----  -----  -------  ---------  -------------


## 5. Summary and next steps<a id='summary'></a>

You successfully completed this notebook! 

You learned how to use Watson Machine Learning to save, deploy, and score a model created in a Watson Studio SPSS Modeler flow.

Check out our online documentation at <a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/wml-ai.html" target="_blank" rel="noopener noreferrer">https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/wml-ai.html</a> for more samples, tutorials, and documentation. 

### Data citations

Jeffrey C. Schlimmer (1987), <a href="https://archive.ics.uci.edu/ml/datasets/automobile" target="_blank" rel="noopener noreferrer">UCI Machine Learning Repository</a>. Irvine, CA: University of California, School of Information and Computer Science.

### Author

**Ananya Kaushik** is a Data Scientist at IBM.

<hr>
Copyright © 2019 IBM. This notebook and its source code are released under the terms of the MIT License.
