<img src="https://github.com/pmservice/customer-satisfaction-prediction/blob/master/app/static/images/ml_icon_gray.png?raw=true" align="center" alt="Watson Machine Learning icon" height="45" width="45"/>

# Using a custom-defined transformer in a scikit-learn model with Watson Machine Learning

Building models with standard components is straightforward, but to use any custom component with your models in Watson Machine Learning, you need to package your custom component code in a source distribution package and store that package in your Watson Machine Learning repository with your model.

Learn how to train a scikit-learn model that uses a custom-defined transformer and use it with the Watson Machine Learning service. After the model is trained, you persist the model and the custom-defined transformer to the Watson Machine Learning Repository, then deploy and score the model by using the Watson Machine Learning Python client.

In this notebook, you will use the GNFUV data set, which contains mobile sensor readings of humidity and temperature from Unmanned Surface Vehicles in a test bed in Athens, to train a scikit-learn model that predicts the temperature.

Some familiarity with Python is helpful. This notebook uses scikit-learn-0.19.1. Learn more about custom components <a href="https://dataplatform.cloud.ibm.com/docs/content/analyze-data/ml-custom_libs_overview.html" target="_blank" rel="noopener noreferrer">here</a>.
This notebook runs on Python.

## Learning goals

This notebook focuses on demonstrating how to use custom components in your model. You will learn how to:

- train a model with a custom-defined transformer
- persist the custom-defined transformer and the model in the Watson Machine Learning repository
- deploy the model using the Watson Machine Learning Service
- perform predictions using the deployed model

## Contents
1. [Set up](#setup)
2. [Install a sample custom package for a scikit-learn model](#install_lib)
3. [Prepare training data](#load)
4. [Train the scikit-learn model](#train)
5. [Save the model and library to Watson Machine Learning Repository](#persistence)
6. [Deploy and predict](#deploy)
7. [Summary and next steps](#summary)


<a id="setup"></a>
## 1. Set up

Before you use the sample code in this notebook, you must perform the following setup tasks:

-  Create a <a href="https://cloud.ibm.com/catalog/services/machine-learning" target="_blank" rel="noopener noreferrer">Watson Machine Learning (WML) Service</a> instance (a free plan is offered, and information about how to create the instance is <a href="https://dataplatform.ibm.com/docs/content/analyze-data/wml-setup.html" target="_blank" rel="noopener noreferrer">here</a>).

- Configure your local Python environment:
  + Latest version of Python
  + scikit-learn 0.19.1
  + watson-machine-learning-client

**Tip**: Run the cell below to install libraries from <a href="https://pypi.python.org/pypi" target="_blank" rel="noopener noreferrer">PyPI</a>.

In [1]:
!rm -rf $PIP_BUILD/watson-machine-learning-client

In [None]:
!pip install watson-machine-learning-client --upgrade

<a id="install_lib"></a>

## 2. Install a sample custom package for a scikit-learn model

The library `linalgnorm-0.1.zip` is a Python distributable package that contains the implementation of the user-defined scikit-learn transformer `LNormalizer`. <br>
Any third-party libraries that the custom transformer requires must be declared as dependencies of the library that contains the transformer implementation.
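
The notebook does not show the contents of `linalgnorm-0.1.zip`. As a rough illustration of how such a source distribution might be laid out, the sketch below writes a minimal package skeleton to a temporary directory. The package and module names follow the import used later in this notebook (`linalg_norm.sklearn_transformers`). The `install_requires` list and the helper name `write_package_skeleton` are assumptions made for this example, not the actual contents of the sample library.

```python
import tempfile
from pathlib import Path

# Hypothetical setup.py contents. The dependency list is an assumption;
# declare whatever third-party libraries your transformer really needs.
SETUP_PY = '''\
from setuptools import setup, find_packages

setup(
    name="linalgnorm",
    version="0.1",
    packages=find_packages(),
    install_requires=["numpy", "scikit-learn"],
)
'''


def write_package_skeleton(root):
    """Create a minimal source tree for a custom-transformer package."""
    root = Path(root)
    package_dir = root / "linalg_norm"
    package_dir.mkdir(parents=True, exist_ok=True)
    (root / "setup.py").write_text(SETUP_PY)
    (package_dir / "__init__.py").write_text("")
    # The transformer class itself would be implemented in this module.
    (package_dir / "sklearn_transformers.py").write_text("# LNormalizer goes here\n")
    # Return the created files as sorted, POSIX-style relative paths.
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


files = write_package_skeleton(tempfile.mkdtemp())
print(files)
```

With setuptools, running `python setup.py sdist --formats=zip` in a directory like this one would typically produce a zip archive under `dist/` that can be installed with `pip`.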

Download the library and install it in the current notebook environment. 

In [3]:
import os

# Create the library directory if it does not exist, then always switch to it
os.makedirs("data/libs", exist_ok=True)
os.chdir("data/libs")

In [None]:
# download the library
!wget https://github.com/pmservice/wml-sample-models/raw/master/scikit-learn/custom-transformer-temperature-prediction/libraries/linalgnorm-0.1.zip --output-document=linalgnorm-0.1.zip

In [5]:
ls -ltr

total 4
-rw-r----- 1 dsxuser dsxuser 2550 Jul  9 17:28 linalgnorm-0.1.zip


Install the library by using the `pip` command.

In [None]:
!pip install linalgnorm-0.1.zip

<a id="load"></a>
## 3. Download the training data set and prepare the training data

Download the GNFUV Unmanned Surface Vehicles Sensor Data into a directory called `dataset` using the `wget` command. You can also download the data set directly from the <a href="https://archive.ics.uci.edu/ml/machine-learning-databases/00452/GNFUV%20USV%20Dataset.zip" target="_blank" rel="noopener noreferrer">UCI repository</a>. More details about the GNFUV data set can be found <a href="https://archive.ics.uci.edu/ml/datasets/GNFUV+Unmanned+Surface+Vehicles+Sensor+Data" target="_blank" rel="noopener noreferrer">here</a>.

In [7]:
!rm -rf dataset
!mkdir dataset

In [None]:
!wget https://archive.ics.uci.edu/ml/machine-learning-databases/00452/GNFUV%20USV%20Dataset.zip --output-document=dataset/gnfuv_dataset.zip

In [9]:
cd dataset

/home/dsxuser/work/data/libs/dataset


In [10]:
!unzip gnfuv_dataset.zip

Archive:  gnfuv_dataset.zip
  inflating: pi2/gnfuv-temp-exp1-55d487b85b-5g2xh_1.0.csv  
  inflating: pi3/gnfuv-temp-exp1-55d487b85b-2bl8b_1.0.csv  
  inflating: pi4/gnfuv-temp-exp1-55d487b85b-xcl97_1.0.csv  
  inflating: pi5/gnfuv-temp-exp1-55d487b85b-5ztk8_1.0.csv  
  inflating: README.pdf              


Create a pandas DataFrame based on the downloaded data set.

In [11]:
import json
import pandas as pd
import numpy as np
import os
from datetime import datetime
from json import JSONDecodeError

In [12]:
## Get all the entries
home_dir = '.'
pi_dirs = os.listdir(home_dir)

data_list = []
base_time = None
columns = None

for pi_dir in pi_dirs:
    if 'pi' not in pi_dir:
        continue
    curr_dir = os.path.join(home_dir, pi_dir)
    data_file = os.path.join(curr_dir, os.listdir(curr_dir)[0])
    with open(data_file, 'r') as f:
        line = f.readline().strip().replace("'", '"')
        while line != '':
            try:
                input_json = json.loads(line)
                sensor_datetime = datetime.fromtimestamp(input_json['time'])
                if base_time is None:
                    base_time = datetime(sensor_datetime.year, sensor_datetime.month, sensor_datetime.day, 0, 0, 0, 0)
                input_json['time'] = (sensor_datetime - base_time).seconds
                data_list.append(list(input_json.values()))
                if columns is None:
                    columns = list(input_json.keys())
            except JSONDecodeError as je:
                pass
            line = f.readline().strip().replace("'", '"')

data_df = pd.DataFrame(data_list, columns=columns)
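
The loop above relies on each line of the data files being a dict-style string with single quotes, which becomes valid JSON once the quotes are swapped. The sketch below isolates that per-line step. The helper name `parse_sensor_line` and the sample line are hypothetical, and the sketch uses UTC for the timestamp conversion so that the result does not depend on the local time zone, whereas the cell above uses local time.

```python
import json
from datetime import datetime, timezone


def parse_sensor_line(line):
    """Parse one dict-style line into a dict, or return None if it is not valid JSON."""
    try:
        record = json.loads(line.strip().replace("'", '"'))
    except json.JSONDecodeError:
        return None
    # Replace the epoch timestamp with the number of seconds since midnight (UTC).
    stamp = datetime.fromtimestamp(record["time"], tz=timezone.utc)
    midnight = stamp.replace(hour=0, minute=0, second=0, microsecond=0)
    record["time"] = int((stamp - midnight).total_seconds())
    return record


# Hypothetical sample line in the same shape as the data files.
sample = ("{'device': 'demo-device', 'humidity': 59.0, 'temperature': 20.0, "
          "'experiment': 1.0, 'time': 1522944447.0}")
record = parse_sensor_line(sample)
print(record)
```

Lines that cannot be decoded return `None`, which mirrors how the loop above silently skips them.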

In [13]:
data_df.head()

|   | device | humidity | temperature | experiment | time |
|---|--------|----------|-------------|------------|------|
| 0 | gnfuv-temp-exp1-55d487b85b-xcl97 | 59.0 | 20.0 | 1.0 | 59247 |
| 1 | gnfuv-temp-exp1-55d487b85b-xcl97 | 58.0 | 21.0 | 1.0 | 59252 |
| 2 | gnfuv-temp-exp1-55d487b85b-xcl97 | 57.0 | 21.0 | 1.0 | 59259 |
| 3 | gnfuv-temp-exp1-55d487b85b-xcl97 | 57.0 | 21.0 | 1.0 | 59267 |
| 4 | gnfuv-temp-exp1-55d487b85b-xcl97 | 57.0 | 21.0 | 1.0 | 59271 |


Create the training and test data sets from the downloaded GNFUV-USV data set.

In [None]:
!pip install scikit-learn==0.19.1

In [15]:
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

Y = data_df['temperature']
X = data_df.drop('temperature', axis=1)

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.25, random_state=143)

<a id="train"></a>

## 4. Train a model

In this section, you will use the custom transformer as a stage in the Scikit-Learn `Pipeline` and train a model.

### 4.1 Import the custom transformer 
Import the custom transformer that is defined in `linalgnorm-0.1.zip` and create an instance of it. This instance will be used as a stage in `sklearn.pipeline.Pipeline`.

In [16]:
from linalg_norm.sklearn_transformers import LNormalizer

In [17]:
lnorm_transf = LNormalizer()
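
The source of `LNormalizer` is not shown in this notebook. To give a feel for what a custom scikit-learn transformer looks like, here is a minimal sketch of a stand-in class. The name `SketchLNormalizer` is hypothetical, and the behavior (dividing each row by its vector norm of order `norm_ord`) is an assumption based only on the class name and the `norm_ord=2` parameter that appears in the pipeline output. It may differ from the packaged implementation.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class SketchLNormalizer(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for LNormalizer: scales each row to unit norm."""

    def __init__(self, norm_ord=2):
        self.norm_ord = norm_ord

    def fit(self, X, y=None):
        # Stateless transformer: there is nothing to learn from the data.
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        norms = np.linalg.norm(X, ord=self.norm_ord, axis=1, keepdims=True)
        norms[norms == 0] = 1.0  # leave all-zero rows unchanged
        return X / norms


scaled = SketchLNormalizer().fit_transform([[3.0, 4.0], [0.0, 0.0]])
print(scaled)
```

Inheriting from `BaseEstimator` and `TransformerMixin` provides `get_params`, `set_params`, and `fit_transform`, which is what lets a class like this be used as a `Pipeline` stage.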

### 4.2 Import other objects required to train a model

In [18]:
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression

Now, you can create a `Pipeline` with the user-defined transformer as one of the stages and train the model.

In [19]:
skl_pipeline = Pipeline(steps=[('normalizer', lnorm_transf), ('regression_estimator', LinearRegression())])
skl_pipeline.fit(X_train.loc[:, ['time', 'humidity']].values, y_train)

Pipeline(memory=None,
     steps=[('normalizer', LNormalizer(norm_ord=2)), ('regression_estimator', LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False))])

In [20]:
y_pred = skl_pipeline.predict(X_test.loc[:, ['time', 'humidity']].values)
rmse = np.mean((np.round(y_pred) - y_test.values)**2)**0.5
print('RMSE: {}'.format(rmse))

RMSE: 2.226301072382404
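
Note that the cell above rounds the predictions before computing the root mean squared error. The sketch below, which uses made-up numbers rather than the GNFUV data, shows the same calculation with and without rounding so that the effect of that choice is visible. The helper name `rmse` and its `round_predictions` flag are introduced only for this illustration.

```python
import numpy as np


def rmse(y_true, y_pred, round_predictions=False):
    """Root mean squared error, optionally rounding predictions first."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if round_predictions:
        y_pred = np.round(y_pred)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


# Made-up values for illustration only.
y_true = [20.0, 21.0, 22.0]
y_pred = [20.4, 20.7, 24.2]

raw = rmse(y_true, y_pred)
rounded = rmse(y_true, y_pred, round_predictions=True)
print("RMSE (raw): {:.4f}".format(raw))
print("RMSE (rounded): {:.4f}".format(rounded))
```

Rounding can make sense when the sensor reports whole degrees, as the sample rows above suggest, but the two variants generally give different scores.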


<a id="persistence"></a>

## 5. Persist the model and custom library to Watson Machine Learning Repository

In this section, you will use the `watson_machine_learning_client` to:
- Save the library `linalgnorm-0.1.zip` in the Watson Machine Learning Repository by creating a Library resource.
- Create a runtime resource that will be used to configure the online deployment runtime environment for a model.
- Save the model to Watson Machine Learning Repository.

### 5.1 Work with the Watson Machine Learning service instance

In [21]:
from watson_machine_learning_client import WatsonMachineLearningAPIClient

Authenticate to the Watson Machine Learning service on IBM Cloud.

Authentication information (your credentials) can be found in the <a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/ml-get-wml-credentials.html" target="_blank" rel="noopener noreferrer">Service Credentials</a> tab of the service instance that you created on IBM Cloud. <BR>If you cannot see the **instance_id** field in **Service Credentials**, click **New credential (+)** to generate new authentication information. 

**Action**: Enter your Watson Machine Learning service instance credentials here.


In [22]:
wml_credentials = {
    "apikey": "...",
    "username": "...",
    "password": "...",
    "instance_id": "...",
    "url": "https://ibm-watson-ml.mybluemix.net"
}
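
If you prefer not to paste credentials into a notebook cell, one option is to read them from environment variables. The sketch below builds a dictionary with the same keys as the cell above. The environment variable names and the helper name `credentials_from_env` are hypothetical, so adapt them to your own setup.

```python
import os

# Hypothetical environment variable names mapped to the credential keys used above.
ENV_TO_FIELD = {
    "WML_APIKEY": "apikey",
    "WML_USERNAME": "username",
    "WML_PASSWORD": "password",
    "WML_INSTANCE_ID": "instance_id",
    "WML_URL": "url",
}


def credentials_from_env(environ=os.environ):
    """Build a credentials dict from environment variables, failing loudly if any is missing."""
    missing = [name for name in ENV_TO_FIELD if not environ.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {field: environ[name] for name, field in ENV_TO_FIELD.items()}


# Example with a stand-in mapping instead of the real environment.
fake_env = {name: "value-for-" + name for name in ENV_TO_FIELD}
example_credentials = credentials_from_env(fake_env)
print(sorted(example_credentials))
```

In a real session you would call `credentials_from_env()` without arguments and pass the result to the client in place of the hard-coded dictionary.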

Create the Watson Machine Learning API client.

In [24]:
client = WatsonMachineLearningAPIClient(wml_credentials)

### 5.2 Save the custom library in the Watson Machine Learning Repository

In [25]:
cd ~/work/data/libs 

/home/dsxuser/work/data/libs


Define the meta data required to create the library resource and save the library.

The value for `client.runtimes.LibraryMetaNames.FILEPATH` meta data contains the library file name that must be saved to the Watson Machine Learning Repository.

In [26]:
lib_meta = {
        client.runtimes.LibraryMetaNames.NAME: "K_Linag_norm_skl",
        client.runtimes.LibraryMetaNames.DESCRIPTION: "K_Linag_norm_skl",
        client.runtimes.LibraryMetaNames.FILEPATH: "linalgnorm-0.1.zip",
        client.runtimes.LibraryMetaNames.VERSION: "1.3",
        client.runtimes.LibraryMetaNames.PLATFORM: {"name": "python", "versions": ["3.5"]}
    }
custom_library_details = client.runtimes.store_library(lib_meta)
custom_library_uid = client.runtimes.get_library_uid(custom_library_details)
print("Custom Library UID: " + custom_library_uid)

Custom Library UID: 76fe4244-1d2c-46e9-b97b-30401543df33


Display the details of the library resource that was created in the previous cell.

In [27]:
custom_library_details

{'metadata': {'guid': '76fe4244-1d2c-46e9-b97b-30401543df33',
  'url': 'https://us-south.ml.cloud.ibm.com/v4/libraries/76fe4244-1d2c-46e9-b97b-30401543df33',
  'created_at': '2019-07-09T17:28:29.319Z',
  'modified_at': '2019-07-09T17:28:30.077Z'},
 'entity': {'name': 'K_Linag_norm_skl',
  'description': 'K_Linag_norm_skl',
  'version': '1.3',
  'platform': {'name': 'python', 'versions': ['3.5']}}}

### 5.3 Create the runtime resource

Define the meta data required to create the runtime resource and bind the library. This runtime resource will be used to configure the online deployment runtime environment for a model.

The `client.runtimes.ConfigurationMetaNames.LIBRARIES_UIDS` meta data property is used to specify the list of library resource GUIDs that need to be part of the runtime.

In [28]:
runtimes_meta = {
    client.runtimes.ConfigurationMetaNames.NAME: "K_linalg_gnfuv1", 
    client.runtimes.ConfigurationMetaNames.DESCRIPTION: "skl linalg gnfuv model", 
    client.runtimes.ConfigurationMetaNames.PLATFORM: { "name": "python", "version": "3.5" }, 
    client.runtimes.ConfigurationMetaNames.LIBRARIES_UIDS: [custom_library_uid]
}

**Alternate method:** Create library and runtime together by specifying the meta data property below:

```
client.runtimes.ConfigurationMetaNames.LIBRARIES_DEFINITIONS: [
    LibraryDefinition("my_lib_1", "1.0", "/home/user/my_lib_1.zip", description="t", platform={"name": "python", "versions": ["3.5"]}), 
    LibraryDefinition("my_lib_2", "1.1", "/home/user/my_lib_2.zip") ]
```

Create a runtime resource based on the meta data specified above and display the details.

In [29]:
runtime_details = client.runtimes.store(runtimes_meta)
runtime_details

{'metadata': {'guid': '883b5d39-6b10-4143-9064-e282d16b1888',
  'url': 'https://us-south.ml.cloud.ibm.com/v4/runtimes/883b5d39-6b10-4143-9064-e282d16b1888',
  'created_at': '2019-07-09T17:28:30.255Z'},
 'entity': {'name': 'K_linalg_gnfuv1',
  'description': 'skl linalg gnfuv model',
  'custom_libraries': [{'name': 'K_Linag_norm_skl',
    'version': '1.3',
    'url': 'https://us-south.ml.cloud.ibm.com/v4/libraries/76fe4244-1d2c-46e9-b97b-30401543df33'}],
  'content_url': 'https://us-south.ml.cloud.ibm.com/v4/runtimes/883b5d39-6b10-4143-9064-e282d16b1888/content',
  'platform': {'name': 'python', 'version': '3.5'}}}

Retrieve the URL and GUID information about the runtime resource you just created.

In [30]:
runtime_url = client.runtimes.get_url(runtime_details)
runtime_uid = client.runtimes.get_uid(runtime_details)
print("Runtimes URL: " + runtime_url)
print("Runtimes UID: " + runtime_uid)

Runtimes URL: https://us-south.ml.cloud.ibm.com/v4/runtimes/883b5d39-6b10-4143-9064-e282d16b1888
Runtimes UID: 883b5d39-6b10-4143-9064-e282d16b1888


### 5.4 Save the model

Define the meta data to save the trained model to the Watson Machine Learning Repository together with the information about the runtime resource required for the model. 

The `client.repository.ModelMetaNames.RUNTIME_UID` meta data property is used to specify the GUID of the runtime resource that needs to be associated with the model.

In [31]:
model_props = {client.repository.ModelMetaNames.NAME: "cust norm linalg_norm gnfuv1",
               client.repository.ModelMetaNames.RUNTIME_UID: runtime_uid
              }

Save the model to the Watson Machine Learning repository and display its saved meta data. 

In [32]:
published_model = client.repository.store_model(model=skl_pipeline, meta_props=model_props)

In [33]:
published_model_uid = client.repository.get_model_uid(published_model)
model_details = client.repository.get_details(published_model_uid)
print(json.dumps(model_details, indent=2))

{
  "metadata": {
    "guid": "e3031439-7cf2-4ebb-aca6-6d69962d703f",
    "url": "https://us-south.ml.cloud.ibm.com/v3/wml_instances/b4b6c696-172c-4164-8049-c0b621dbf3c9/published_models/e3031439-7cf2-4ebb-aca6-6d69962d703f",
    "created_at": "2019-07-09T17:28:31.174Z",
    "modified_at": "2019-07-09T17:28:31.246Z"
  },
  "entity": {
    "runtime_environment": "python-3.6",
    "learning_configuration_url": "https://us-south.ml.cloud.ibm.com/v3/wml_instances/b4b6c696-172c-4164-8049-c0b621dbf3c9/published_models/e3031439-7cf2-4ebb-aca6-6d69962d703f/learning_configuration",
    "name": "cust norm linalg_norm gnfuv1",
    "learning_iterations_url": "https://us-south.ml.cloud.ibm.com/v3/wml_instances/b4b6c696-172c-4164-8049-c0b621dbf3c9/published_models/e3031439-7cf2-4ebb-aca6-6d69962d703f/learning_iterations",
    "feedback_url": "https://us-south.ml.cloud.ibm.com/v3/wml_instances/b4b6c696-172c-4164-8049-c0b621dbf3c9/published_models/e3031439-7cf2-4ebb-aca6-6d69962d703f/feedback",
    "l

<a id="deploy"></a>

## 6. Deploy and predict new input

In this section, you will deploy the saved model that uses the custom transformer and perform predictions. You will use the Watson Machine Learning client to perform these tasks.

### 6.1 Deploy the model

In [34]:
created_deployment = client.deployments.create(published_model_uid, name="k_linalg_gnfuv1_skl")



#######################################################################################

Synchronous deployment creation for uid: 'e3031439-7cf2-4ebb-aca6-6d69962d703f' started

#######################################################################################


INITIALIZING
DEPLOY_IN_PROGRESS................
DEPLOY_SUCCESS


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='0ca6545c-a075-4281-a744-3257e3f62ad3'
------------------------------------------------------------------------------------------------




### 6.2 Predict using the deployed model

Get the URL that is to be used for prediction. The prediction URL is obtained from the deployment details of the deployment created above.

In [None]:
scoring_endpoint = client.deployments.get_scoring_url(created_deployment)
print(scoring_endpoint)

Prepare the payload for prediction. The payload contains the input records for which predictions are to be made.

In [36]:
scoring_payload = {'fields': ["time", "humidity"], 
                   'values': [[79863, 47]]}
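
To score several records at once, the same `fields`/`values` layout can hold one inner list per record. The sketch below assembles such a payload and checks that every row has one value per field. The helper name `build_scoring_payload` and the second sample row are made up for this illustration.

```python
def build_scoring_payload(fields, rows):
    """Return a payload dict in the fields/values layout used above."""
    fields = list(fields)
    values = [list(row) for row in rows]
    for row in values:
        if len(row) != len(fields):
            raise ValueError(
                "Expected {} values per row, got {}".format(len(fields), len(row))
            )
    return {"fields": fields, "values": values}


# The first row matches the cell above; the second row is a made-up example.
payload = build_scoring_payload(["time", "humidity"], [(79863, 47), (80000, 50)])
print(payload)
```

The order of the values in each row must match the order of the columns that the pipeline was trained on, which in this notebook is `time` followed by `humidity`.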

Execute the method to perform online predictions and display the prediction results.

In [37]:
predictions = client.deployments.score(scoring_endpoint, scoring_payload)

In [38]:
print(json.dumps(predictions, indent=2))

{
  "fields": [
    "prediction"
  ],
  "values": [
    [
      7.276982664417417
    ]
  ]
}
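
The response uses the same `fields`/`values` layout as the request. As a small convenience, the sketch below pairs each row of values with the field names so that the predictions are easier to read. The helper name `predictions_to_records` is hypothetical, and the sample response is copied from the output above.

```python
def predictions_to_records(response):
    """Turn a fields/values response into a list of dicts, one per scored record."""
    fields = response["fields"]
    return [dict(zip(fields, row)) for row in response["values"]]


# Sample response copied from the output shown above.
sample_response = {"fields": ["prediction"], "values": [[7.276982664417417]]}
records = predictions_to_records(sample_response)
print(records)
```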


### 6.3 Delete the deployments, libraries, models and runtimes

Use the following method to delete the deployment.

In [39]:
client.deployments.delete(client.deployments.get_uid(created_deployment))

'SUCCESS'

Check that the deployment you deleted no longer appears in the list of deployments.

In [40]:
client.deployments.list()

------------------------------------  -------------------------------------------------------  -------  ------------------  ------------------------  -----------------  -------------
GUID                                  NAME                                                     TYPE     STATE               CREATED                   FRAMEWORK          ARTIFACT TYPE
7985e484-4192-400e-82e5-3756d4600668  ARIMA model python function deployment                   online   DEPLOY_SUCCESS      2019-07-03T20:33:40.817Z  n/a                function
6aedb5b7-638a-4388-ab0d-45fecb3b7081  Customer Churn Prediction                                online   DEPLOY_SUCCESS      2019-07-03T15:51:34.166Z  mllib-2.3          model
cfcd5f9e-5b07-4bea-b57d-304c12254add  sklearn_pipeline_arima                                   online   DEPLOY_SUCCESS      2019-07-03T01:06:29.302Z  scikit-learn-0.19  model
7b045679-07c9-4225-9116-c153c6359588  Virtual deployment of Boston model                       virtual  DE

Delete the model, library, and runtime by passing in the appropriate GUID:

In [41]:
client.repository.delete(published_model_uid)

'SUCCESS'

In [42]:
client.repository.delete(custom_library_uid)

'SUCCESS'

In [43]:
client.repository.delete(runtime_uid)

'SUCCESS'

In [44]:
client.repository.list()

------------------------------------  -------------------------------------------  ------------------------  ------------------  ----------
GUID                                  NAME                                         CREATED                   FRAMEWORK           TYPE
0af2fb87-ab96-4493-9064-ea3f9b516600  RF_AttackDetection_PySpark                   2019-07-08T23:06:46.358Z  mllib               definition
e7bdf9ff-0ae5-42e1-a8fa-f65ef1297cfc  Product line model                           2019-07-08T21:39:37.805Z  mllib               definition
763445e7-ff63-456f-bb12-f235e520ece1  My definition name                           2019-07-08T21:33:49.284Z  tensorflow          definition
86cec0cd-1dd0-404d-98c9-aee5d6c859ca  Customer churn Spark model                   2019-07-03T15:47:47.262Z  mllib               definition
3417726e-bf1b-417a-813a-1bddab36fca9  Customer churn Spark model                   2019-06-28T18:53:02.483Z  mllib               definition
cbe6295f-651b-491a-81bb-3d

<a id="summary"></a>

## 7. Summary and next steps

You successfully completed this notebook! 
 
You learned how to train a scikit-learn model that uses a custom transformer, and how to deploy and score it with the Watson Machine Learning service.

Check out our <a href="https://dataplatform.ibm.com/docs/content/analyze-data/wml-setup.html" target="_blank" rel="noopener noreferrer">Online Documentation</a> for more samples, tutorials, documentation, how-tos, and blog posts. 

### Citations

Dua, D. and Karra Taniskidou, E. (2017). <a href="http://archive.ics.uci.edu/ml" target="_blank" rel="noopener noreferrer">UCI Machine Learning Repository</a>. Irvine, CA: University of California, School of Information and Computer Science.

Harth, N. and Anagnostopoulos, C. (2018). Edge-centric Efficient Regression Analytics. In: 2018 IEEE International Conference on Edge Computing (EDGE), San Francisco, CA, USA, 02-07 Jul 2018.

## Author

**Krishnamurthy Arthanarisamy** is a senior technical lead on the IBM Watson Machine Learning team. Krishna works on developing cloud services that cater to different stages of the machine learning and deep learning modeling life cycle.

<hr>
Copyright © 2018, 2019 IBM. This notebook and its source code are released under the terms of the MIT License.
