# Import a model from PMML into IBM Watson Machine Learning

Importing a model into Watson Machine Learning means storing a trained model in your Watson Machine Learning repository and then deploying the stored model. This notebook demonstrates how to use the Watson Machine Learning Python client to import a model that has been saved in PMML format.

See also: <a href="https://dataplatform.cloud.ibm.com/docs/content/analyze-data/ml-import-pmml.html" target="_blank" rel="noopener noreferrer">Importing models from PMML</a>

This notebook runs on Spark Python 3.5.


### Notebook sections

[Setup](#setup)

**[Example 1](#ex1)**
1. [Load PMML file](#load1)
2. [Store and deploy the model](#store1)
3. [Test the deployment](#test1)

**[Example 2](#ex2)**
1. [Load PMML file](#load2)
2. [Store and deploy the model](#store2)
3. [Test the deployment](#test2)

## <a id="setup"></a> Set up
- Install packages
- Import libraries
- Instantiate a Watson Machine Learning client

In [None]:
!pip install wget # needed to download sample file

In [None]:
!pip install watson_machine_learning_client

Paste your Watson Machine Learning credentials in the following cell.

See: <a href="https://dataplatform.cloud.ibm.com/docs/content/analyze-data/ml-get-wml-credentials.html" target="_blank" rel="noopener noreferrer">Looking up credentials</a>

In [3]:
# Create a Watson Machine Learning client instance
# Replace each *** placeholder with the value from your own service credentials
from watson_machine_learning_client import WatsonMachineLearningAPIClient
wml_credentials = {
    "instance_id": "***",
    "password": "***",
    "url": "https://us-south.ml.cloud.ibm.com",
    "username": "***"
}
client = WatsonMachineLearningAPIClient( wml_credentials )
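
Pasting credentials directly into a notebook makes them easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming you have exported your credentials as environment variables; the variable names (`WML_INSTANCE_ID` and so on) and the helper `load_wml_credentials` are hypothetical choices, not part of the Watson Machine Learning client:

```python
import os

# Hypothetical environment-variable names; pick whatever names suit your setup.
WML_ENV_VARS = {
    "instance_id": "WML_INSTANCE_ID",
    "password": "WML_PASSWORD",
    "url": "WML_URL",
    "username": "WML_USERNAME",
}


def load_wml_credentials(env=None):
    """Build the credentials dict for WatsonMachineLearningAPIClient from
    environment variables instead of hard-coding secrets in the notebook."""
    if env is None:
        env = os.environ
    missing = [name for name in WML_ENV_VARS.values() if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: env[name] for key, name in WML_ENV_VARS.items()}


# Usage, once the variables are set:
# wml_credentials = load_wml_credentials()
# client = WatsonMachineLearningAPIClient( wml_credentials )
```

The returned dictionary has the same keys as `wml_credentials` above, so it can be passed to `WatsonMachineLearningAPIClient` unchanged.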



## <a id="ex1"></a> Example 1

**About the sample PMML file and model**

The sample PMML file was generated by this notebook: <a href="" target="_blank" rel="noopener noreferrer">Saving a Spark MLlib model in PMML format</a>

The sample model is a logistic regression model for predicting whether or not a customer will purchase a tent from a fictional outdoor equipment store, based on the customer's characteristics.

The data used to train the model is the "GoSales.csv" training data in the IBM Watson Studio community: <a href="https://dataplatform.cloud.ibm.com/exchange/public/entry/view/aa07a773f71cf1172a349f33e2028e4e" target="_blank" rel="noopener noreferrer">GoSales sample data</a>.

### <a id="load1"></a> 1. Load sample PMML file

In [4]:
# Download sample PMML file
import wget
pmml_file_url = 'https://raw.githubusercontent.com/pmservice/wml-sample-models/master/spark/import-pmml/spark-mllib-lr-model-pmml.xml'
pmml_filename = wget.download( pmml_file_url )
print( pmml_filename )

spark-mllib-lr-model-pmml.xml


In [5]:
# View the PMML
!cat spark-mllib-lr-model-pmml.xml

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
    <Header description="logistic regression">
        <Application name="Apache Spark MLlib" version="2.3.0"/>
        <Timestamp>2018-12-24T00:33:53</Timestamp>
    </Header>
    <DataDictionary numberOfFields="5">
        <DataField name="field_0" optype="continuous" dataType="double"/>
        <DataField name="field_1" optype="continuous" dataType="double"/>
        <DataField name="field_2" optype="continuous" dataType="double"/>
        <DataField name="field_3" optype="continuous" dataType="double"/>
        <DataField name="target" optype="categorical" dataType="string"/>
    </DataDictionary>
    <RegressionModel modelName="logistic regression" functionName="classification" normalizationMethod="logit">
        <MiningSchema>
            <MiningField name="field_0" usageType="active"/>
            <MiningField name="field_1" usageType="active"/>
    

### <a id="store1"></a> 2. Store and deploy the model

In [6]:
# Store the model in the Watson Machine Learning repository.
# Parameters:
# 1. The name of the PMML file
# 2. Metadata that includes a name you choose for the stored model and the framework
metadata = {
    client.repository.ModelMetaNames.NAME: "Model from PMML",
    client.repository.ModelMetaNames.FRAMEWORK_NAME: "pmml"
}
model_details = client.repository.store_model( model=pmml_filename, meta_props=metadata )

In [None]:
# Deploy the stored model as an online web service deployment
model_id = model_details["metadata"]["guid"]
deployment_details = client.deployments.create( artifact_uid=model_id, name="Deployment of model from PMML" )

### <a id="test1"></a> 3. Test the deployment

The sample notebook that builds the model and saves it in PMML format includes the following sample data:

Features of a customer who did not buy a tent:
```
| GENDER | AGE | MARITAL_STATUS | PROFESSION     | feature_vector           |
+--------+-----+----------------+----------------+--------------------------+
| "F"    | 35  | "Married"      | "Professional" | [ 1.0, 35.0 , 0.0, 1.0 ] |
```

Features of a customer who did buy a tent:
```
| GENDER | AGE | MARITAL_STATUS | PROFESSION     | feature_vector           |
+--------+-----+----------------+----------------+--------------------------+
| "M"    | 20  | "Single"       | "Sales"        | [ 0.0, 20.0, 1.0, 2.0 ]  |
```

The four numeric values in the `feature_vector` column are what is sent to the model to make a prediction.
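
The names `field_0` through `field_3` used in the scoring payload come from the PMML file itself. A minimal sketch, using only the standard-library `xml.etree.ElementTree`, that lists the active input fields declared in a PMML document's `MiningSchema`; the helper name `active_input_fields` is hypothetical and not part of the Watson Machine Learning client:

```python
import xml.etree.ElementTree as ET


def active_input_fields(pmml_text):
    """Return the names of the active input fields in a PMML document,
    in the order they appear in the model's MiningSchema."""
    root = ET.fromstring(pmml_text)
    # PMML elements live in a versioned namespace such as
    # "http://www.dmg.org/PMML-4_2"; read it from the root tag instead of hard-coding it.
    ns = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    fields = []
    for mining_field in root.iter(ns + "MiningField"):
        # In PMML, usageType defaults to "active" when the attribute is omitted
        if mining_field.get("usageType", "active") == "active":
            fields.append(mining_field.get("name"))
    return fields


# Usage with the downloaded sample file:
# with open( pmml_filename, "rb" ) as f:
#     print( active_input_fields( f.read() ) )
```

Fields whose `usageType` marks them as the target are skipped, so the result lists only the inputs a scoring payload has to supply.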

In `feature_vector`, the values for the "GENDER", "MARITAL_STATUS", and "PROFESSION" columns are generated by string indexers, which map each category to a numeric index:
```
+------+------------+      +--------------+--------------------+      +------------+----------------+
|GENDER|GENDER_index|      |MARITAL_STATUS|MARITAL_STATUS_index|      |  PROFESSION|PROFESSION_index|
+------+------------+      +--------------+--------------------+      +------------+----------------+
|     M|         0.0|      |       Married|                 0.0|      |       Other|             0.0|
|     F|         1.0|      |        Single|                 1.0|      |Professional|             1.0|
+------+------------+      |   Unspecified|                 2.0|      |       Sales|             2.0|
                           +--------------+--------------------+      |   Executive|             3.0|
                                                                      |      Trades|             4.0|
                                                                      | Hospitality|             5.0|
                                                                      |     Student|             6.0|
                                                                      |      Retail|             7.0|
                                                                      |     Retired|             8.0|
                                                                      +------------+----------------+
```
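
To make the encoding concrete, here is a plain-Python sketch that copies the index tables above into dictionaries and encodes a raw customer record in the same column order as `feature_vector` (GENDER, AGE, MARITAL_STATUS, PROFESSION). The dictionaries and `to_feature_vector` are illustrative stand-ins; a real pipeline would reuse the fitted Spark string indexers rather than hand-copied tables.

```python
# Index tables copied from the string indexer output above.
GENDER_INDEX = {"M": 0.0, "F": 1.0}
MARITAL_STATUS_INDEX = {"Married": 0.0, "Single": 1.0, "Unspecified": 2.0}
PROFESSION_INDEX = {
    "Other": 0.0, "Professional": 1.0, "Sales": 2.0, "Executive": 3.0,
    "Trades": 4.0, "Hospitality": 5.0, "Student": 6.0, "Retail": 7.0,
    "Retired": 8.0,
}


def to_feature_vector(gender, age, marital_status, profession):
    """Encode one customer as [GENDER_index, AGE, MARITAL_STATUS_index, PROFESSION_index].
    Raises KeyError for a category that is not in the tables."""
    return [
        GENDER_INDEX[gender],
        float(age),
        MARITAL_STATUS_INDEX[marital_status],
        PROFESSION_INDEX[profession],
    ]


print(to_feature_vector("F", 35, "Married", "Professional"))  # [1.0, 35.0, 0.0, 1.0]
print(to_feature_vector("M", 20, "Single", "Sales"))           # [0.0, 20.0, 1.0, 2.0]
```

Both results match the `feature_vector` values shown for the two sample customers.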

For details, refer to the sample notebook that builds the model and saves it in PMML format: <a href="" target="_blank" rel="noopener noreferrer">Saving a Spark MLlib model in PMML format</a>

In [8]:
negative_example_payload = { "fields" : [ "field_0", "field_1", "field_2", "field_3" ], "values" : [ [ 1.0, 35.0, 0.0, 1.0 ] ] }
positive_example_payload = { "fields" : [ "field_0", "field_1", "field_2", "field_3" ], "values" : [ [ 0.0, 20.0, 1.0, 2.0 ] ] }
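
Each payload is a dictionary with a `fields` list and a `values` list of rows, where every row lists its values in the same order as `fields`. Because `values` is a list, it is likely that several customers can be scored in one request. A minimal sketch of a hypothetical `build_payload` helper (not part of the Watson Machine Learning client) that assembles such a payload and catches rows of the wrong width before anything is sent:

```python
def build_payload(fields, rows):
    """Build a scoring payload {"fields": [...], "values": [[...], ...]},
    checking that every row has exactly one value per field."""
    fields = list(fields)
    values = []
    for i, row in enumerate(rows):
        row = list(row)
        if len(row) != len(fields):
            raise ValueError("row %d has %d values, expected %d" % (i, len(row), len(fields)))
        values.append(row)
    return {"fields": fields, "values": values}


FIELDS = ["field_0", "field_1", "field_2", "field_3"]

# Both sample customers in a single payload
both_examples_payload = build_payload(FIELDS, [[1.0, 35.0, 0.0, 1.0], [0.0, 20.0, 1.0, 2.0]])
```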

In [9]:
# Test the deployment
model_endpoint_url = client.deployments.get_scoring_url( deployment_details )
prediction_1 = client.deployments.score( model_endpoint_url, negative_example_payload )
prediction_2 = client.deployments.score( model_endpoint_url, positive_example_payload )
print( "Prediction for negative_example_payload: " + str( prediction_1["values"][0][0] ) )
print( "Prediction for positive_example_payload: " + str( prediction_2["values"][0][0] ) )

Prediction for negative_example_payload: 0
Prediction for positive_example_payload: 1


## <a id="ex2"></a> Example 2


**About the sample PMML file and model**

The sample PMML file was generated by this notebook: <a href="" target="_blank" rel="noopener noreferrer">Saving a scikit-learn model in PMML format</a>

The sample model is a logistic regression model for predicting whether or not a customer will purchase a tent from a fictional outdoor equipment store, based on the customer's characteristics.

The data used to train the model is the "GoSales.csv" training data in the IBM Watson Studio community: <a href="https://dataplatform.cloud.ibm.com/exchange/public/entry/view/aa07a773f71cf1172a349f33e2028e4e" target="_blank" rel="noopener noreferrer">GoSales sample data</a>.

### <a id="load2"></a> 1. Load sample PMML file

In [11]:
# Download sample PMML file
import wget
pmml_file_url2 = 'https://raw.githubusercontent.com/pmservice/wml-sample-models/master/scikit-learn/import-pmml/scikit-learn-lr-model-pmml.xml'
pmml_filename2 = wget.download( pmml_file_url2 )
print( pmml_filename2 )

scikit-learn-lr-model-pmml.xml


In [12]:
# View the PMML
!cat scikit-learn-lr-model-pmml.xml

</PMML>

### <a id="store2"></a> 2. Store and deploy the model

In [13]:
# Store the model in the Watson Machine Learning repository.
# Parameters:
# 1. The name of the PMML file
# 2. Metadata that includes a name you choose for the stored model and the framework
metadata2 = {
    client.repository.ModelMetaNames.NAME: "Model from PMML 2",
    client.repository.ModelMetaNames.FRAMEWORK_NAME: "pmml"
}
model_details2 = client.repository.store_model( model=pmml_filename2, meta_props=metadata2 )

In [None]:
# Deploy the stored model as an online web service deployment
model_id2 = model_details2["metadata"]["guid"]
deployment_details2 = client.deployments.create( artifact_uid=model_id2, name="Deployment of model from PMML 2" )

### <a id="test2"></a> 3. Test the deployment

The sample notebook that builds the model and saves it in PMML format includes the following sample data:

Features of a customer who did not buy a tent:
```
| GENDER | AGE | MARITAL_STATUS | PROFESSION     |
+--------+-----+----------------+----------------+
| "F"    | 35  | "Married"      | "Professional" |
```

Features of a customer who did buy a tent:
```
| GENDER | AGE | MARITAL_STATUS | PROFESSION     |
+--------+-----+----------------+----------------+
| "M"    | 20  | "Single"       | "Sales"        |
```

In this second example, numeric values for the "GENDER", "MARITAL_STATUS", and "PROFESSION" columns are generated by label encoders:
```
le_GENDER:      le_MARITAL_STATUS:        le_PROFESSION:
[[0 'F']        [[0 'Married']            [[0 'Executive']
 [1 'M']]        [1 'Single']              [1 'Hospitality']
                 [2 'Unspecified']]        [2 'Other']
                                           [3 'Professional']
                                           [4 'Retail']
                                           [5 'Retired']
                                           [6 'Sales']
                                           [7 'Student']
                                           [8 'Trades']]
```
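
scikit-learn's `LabelEncoder` numbers the distinct labels in sorted order, which is why the codes above are alphabetical. A plain-Python sketch that reproduces this with `sorted()` and encodes a customer in the field order of the Example 2 payload (AGE, GENDER_index, MARITAL_STATUS_index, PROFESSION_index); the dictionaries stand in for the fitted encoders, and `label_codes` and `to_row` are hypothetical helpers:

```python
# Raw category labels from the sample data, in no particular order.
GENDERS = ["M", "F"]
MARITAL_STATUSES = ["Single", "Married", "Unspecified"]
PROFESSIONS = ["Sales", "Other", "Professional", "Executive", "Trades",
               "Hospitality", "Student", "Retail", "Retired"]


def label_codes(labels):
    """Mimic LabelEncoder: map each distinct label to its position in sorted order."""
    return {label: code for code, label in enumerate(sorted(set(labels)))}


le_GENDER = label_codes(GENDERS)
le_MARITAL_STATUS = label_codes(MARITAL_STATUSES)
le_PROFESSION = label_codes(PROFESSIONS)


def to_row(gender, age, marital_status, profession):
    """Encode one customer as [AGE, GENDER_index, MARITAL_STATUS_index, PROFESSION_index]."""
    return [age, le_GENDER[gender], le_MARITAL_STATUS[marital_status], le_PROFESSION[profession]]


print(to_row("F", 35, "Married", "Professional"))  # [35, 0, 0, 3]
print(to_row("M", 20, "Single", "Sales"))          # [20, 1, 1, 6]
```

The two rows match the values in the Example 2 payloads that follow.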

For details, refer to the sample notebook that builds the model and saves it in PMML format: <a href="" target="_blank" rel="noopener noreferrer">Saving a scikit-learn model in PMML format</a>

In [15]:
negative_example_payload2 = { "fields" : [ "AGE", "GENDER_index", "MARITAL_STATUS_index", "PROFESSION_index" ], "values" : [ [ 35, 0, 0, 3 ] ] }
positive_example_payload2 = { "fields" : [ "AGE", "GENDER_index", "MARITAL_STATUS_index", "PROFESSION_index" ], "values" : [ [ 20, 1, 1, 6 ] ] }

In [18]:
# Test the deployment
model_endpoint_url2 = client.deployments.get_scoring_url( deployment_details2 )
prediction_3 = client.deployments.score( model_endpoint_url2, negative_example_payload2 )
prediction_4 = client.deployments.score( model_endpoint_url2, positive_example_payload2 )
print( "Prediction for negative_example_payload2: " + str( prediction_3 ) )
print( "Prediction for positive_example_payload2: " + str( prediction_4 ) )

Prediction for negative_example_payload2: {'values': [[0.9186550192609427, 0.08134498073905735]], 'fields': [['probability(0)', 'probability(1)']]}
Prediction for positive_example_payload2: {'values': [[0.09258759915566939, 0.9074124008443305]], 'fields': [['probability(0)', 'probability(1)']]}
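
Unlike Example 1, this deployment returns class probabilities rather than a single predicted label. A minimal sketch, assuming the response shape shown in the output above (field names `probability(0)` and `probability(1)`, wrapped in an extra list), that picks the most probable class; `predicted_class` is a hypothetical helper, and the response below is copied from the sample output rather than fetched from the service:

```python
PREFIX = "probability("


def predicted_class(response, row=0):
    """Return (class_label, probability) for one scored row by picking the
    largest probability(<label>) value in the scoring response."""
    fields = response["fields"]
    # In the sample output the field names are wrapped in an extra list; unwrap it.
    if fields and isinstance(fields[0], list):
        fields = fields[0]
    values = response["values"][row]
    # Keep only the probability(<label>) columns, keyed by class label.
    probs = {name[len(PREFIX):-1]: value
             for name, value in zip(fields, values)
             if name.startswith(PREFIX) and name.endswith(")")}
    label = max(probs, key=probs.get)
    return label, probs[label]


# Response copied from the negative example output above
negative_response = {'values': [[0.9186550192609427, 0.08134498073905735]],
                     'fields': [['probability(0)', 'probability(1)']]}
print(predicted_class(negative_response))  # ('0', 0.9186550192609427)
```

Class `0` corresponds to a customer who did not buy a tent, which agrees with the Example 1 prediction for the same customer.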


## Summary
In this notebook, you imported two models into Watson Machine Learning from sample PMML files using the Watson Machine Learning Python client.

### <a id="authors"></a>Authors

**Sarah Packowski** is a member of the IBM Watson Studio Content Design team in Canada.


<hr>
Copyright &copy; IBM Corp. 2019. This notebook and its source code are released under the terms of the MIT License.
