# Use scikit-learn and a custom library to predict temperature with `ibm-watson-machine-learning`

This notebook contains the steps and code to train a scikit-learn model that uses a custom-defined transformer and to use that model with the Watson Machine Learning service. After the model is trained, the notebook shows how to persist the model and the custom transformer to the Watson Machine Learning repository, deploy the model, and score it using the Watson Machine Learning Python client.

In this notebook, we use the GNFUV dataset, which contains humidity and temperature readings from mobile sensors on Unmanned Surface Vehicles in a test bed in Athens, to train a scikit-learn model that predicts temperature.

Some familiarity with Python is helpful. This notebook uses Python 3.7 and scikit-learn 0.23.

## Learning goals

The learning goals of this notebook are:

- Train a model with a custom-defined transformer.
- Persist the custom-defined transformer and the model in the Watson Machine Learning repository.
- Deploy the model using the Watson Machine Learning service.
- Perform predictions using the deployed model.

## Contents
1. [Set up the environment](#setup)
2. [Install the Python library containing the custom transformer implementation](#install_lib)
3. [Prepare training data](#load)
4. [Train the scikit-learn model](#train)
5. [Save the model and library to the WML Repository](#persistence)
6. [Deploy and score data in the IBM Cloud](#deploy)
7. [Clean up](#cleanup)
8. [Summary and next steps](#summary)


<a id="setup"></a>
## 1. Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

-  Create a <a href="https://console.ng.bluemix.net/catalog/services/ibm-watson-machine-learning/" target="_blank" rel="noopener noreferrer">Watson Machine Learning (WML) Service</a> instance. A free plan is offered, and instructions for creating the instance are available <a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/ml-service-instance.html?context=analytics" target="_blank" rel="noopener noreferrer">here</a>.


### Connection to WML

Authenticate with the Watson Machine Learning service on IBM Cloud. You need to provide the platform `api_key` and the instance `location`.

You can use the [IBM Cloud CLI](https://cloud.ibm.com/docs/cli/index.html) to retrieve the platform API key and the instance location.

An API key can be generated as follows:
```
ibmcloud login
ibmcloud iam api-key-create API_KEY_NAME
```

From the output, copy the value of `api_key`.


The location of your WML instance can be retrieved as follows:
```
ibmcloud login --apikey API_KEY -a https://cloud.ibm.com
ibmcloud resource service-instance WML_INSTANCE_NAME
```

From the output, copy the value of `location`.

**Tip**: You can generate your `Cloud API key` in the [**Users** section of the Cloud console](https://cloud.ibm.com/iam#/users). From that page, click your name, scroll down to the **API Keys** section, and click **Create an IBM Cloud API key**. Give your key a name and click **Create**, then copy the created key and paste it below. You can also find a service-specific URL in the [**Endpoint URLs** section of the Watson Machine Learning docs](https://cloud.ibm.com/apidocs/machine-learning). You can check your instance location in the details of your <a href="https://console.ng.bluemix.net/catalog/services/ibm-watson-machine-learning/" target="_blank" rel="noopener noreferrer">Watson Machine Learning (WML) Service</a> instance.

You can also get a service-specific API key in the [**Service IDs** section of the Cloud Console](https://cloud.ibm.com/iam/serviceids). From that page, click **Create**, then copy the created key and paste it below.

**Action**: Enter your `api_key` and `location` in the following cell.

In [3]:
api_key = 'PASTE YOUR PLATFORM API KEY HERE'
location = 'PASTE YOUR INSTANCE LOCATION HERE'

In [4]:
wml_credentials = {
    "apikey": api_key,
    "url": 'https://' + location + '.ml.cloud.ibm.com'
}
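
The URL in `wml_credentials` is derived from the instance location. As a minimal sketch, the same dictionary can be assembled by a small helper that rejects an unfilled placeholder before any call to the service is made. The helper name `build_wml_credentials` and its validation rules are illustrative assumptions and are not part of the `ibm-watson-machine-learning` package.

```python
def build_wml_credentials(api_key: str, location: str) -> dict:
    """Return the credentials dictionary in the shape used by this notebook.

    Hypothetical helper: it only assembles the dictionary and performs
    basic sanity checks; it does not contact the service.
    """
    api_key = api_key.strip()
    location = location.strip()
    if not api_key or api_key.startswith("PASTE"):
        raise ValueError("api_key is empty or still a placeholder")
    if not location or " " in location:
        raise ValueError("location must be a region identifier such as 'us-south'")
    return {
        "apikey": api_key,
        "url": "https://" + location + ".ml.cloud.ibm.com",
    }


# Example with made-up values (not real credentials).
example = build_wml_credentials("my-api-key", "us-south")
print(example["url"])
```

Failing early on a placeholder value gives a clearer error than a rejected authentication request later on.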

### Install and import the `ibm-watson-machine-learning` package
**Note:** `ibm-watson-machine-learning` documentation can be found <a href="http://ibm-wml-api-pyclient.mybluemix.net/" target="_blank" rel="noopener no referrer">here</a>.


In [None]:
!pip install -U ibm-watson-machine-learning

In [2]:
from ibm_watson_machine_learning import APIClient

client = APIClient(wml_credentials)

### Working with spaces

First, you need to create a space for your work. If you do not already have one, you can use the [Deployment Spaces Dashboard](https://dataplatform.cloud.ibm.com/ml-runtime/spaces?context=cpdaas) to create it.

- Click New Deployment Space
- Create an empty space
- Select Cloud Object Storage
- Select Watson Machine Learning instance and press Create
- Copy `space_id` and paste it below

**Tip**: You can also use the SDK to prepare the space for your work. More information can be found [here](https://github.com/IBM/watson-machine-learning-samples/blob/master/cloud/notebooks/python_sdk/instance-management/Space%20management.ipynb).

**Action**: Assign space ID below

In [3]:
space_id = 'PASTE YOUR SPACE ID HERE'

You can use the `list` method to print all existing spaces.

In [None]:
client.spaces.list(limit=10)

To interact with the resources available in Watson Machine Learning, you need to set the **space** that you will be using.

In [4]:
client.set.default_space(space_id)

'SUCCESS'

<a id="install_lib"></a>

## 2. Install the library containing custom transformer

The library `linalgnorm-0.1.zip` is a Python distributable package that contains the implementation of a user-defined scikit-learn transformer, `LNormalizer`. <br>
Any third-party libraries that the custom transformer requires must be declared as dependencies of the library that contains the transformer implementation.
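
To make the idea of a user-defined transformer concrete, here is a dependency-free sketch of the `fit`/`transform` contract that such a class follows. It is not the implementation shipped in `linalgnorm-0.1.zip`: the class name `ColumnL2Normalizer` and its behavior (dividing each column by its L2 norm) are assumptions chosen for illustration. A real scikit-learn transformer would typically subclass `sklearn.base.BaseEstimator` and `sklearn.base.TransformerMixin` and operate on NumPy arrays.

```python
import math


class ColumnL2Normalizer:
    """Hypothetical transformer: scales each column by its L2 norm."""

    def __init__(self):
        self.column_norms_ = None  # learned in fit()

    def fit(self, X, y=None):
        # Compute one L2 norm per column from the training rows.
        n_cols = len(X[0])
        self.column_norms_ = [
            math.sqrt(sum(row[j] ** 2 for row in X)) for j in range(n_cols)
        ]
        return self  # returning self allows fit(...).transform(...)

    def transform(self, X):
        if self.column_norms_ is None:
            raise RuntimeError("call fit() before transform()")
        # Columns whose norm is 0 are left unchanged to avoid division by zero.
        return [
            [v / n if n else v for v, n in zip(row, self.column_norms_)]
            for row in X
        ]


train_rows = [[3.0, 0.0], [4.0, 2.0]]
normalizer = ColumnL2Normalizer().fit(train_rows)
scaled = normalizer.transform(train_rows)
print(scaled)
```

The state learned in `fit` is what must travel with the model, which is why the library that defines the class has to be available wherever the model is later loaded.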


In this section, we download the library and install it in the current notebook environment. 

In [5]:
!wget https://github.com/IBM/watson-machine-learning-samples/raw/master/cloud/libraries/linalgnorm-0.1.zip --output-document=linalgnorm-0.1.zip

--2020-10-07 14:49:56--  https://github.com/IBM/watson-machine-learning-samples/raw/master/cloud/libraries/linalgnorm-0.1.zip
Resolving github.com... 140.82.121.3
Connecting to github.com|140.82.121.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/IBM/watson-machine-learning-samples/master/cloud/libraries/linalgnorm-0.1.zip [following]
--2020-10-07 14:49:57--  https://raw.githubusercontent.com/IBM/watson-machine-learning-samples/master/cloud/libraries/linalgnorm-0.1.zip
Resolving raw.githubusercontent.com... 151.101.64.133, 151.101.128.133, 151.101.192.133, ...
Connecting to raw.githubusercontent.com|151.101.64.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2550 (2.5K) [application/zip]
Saving to: 'linalgnorm-0.1.zip'


2020-10-07 14:49:58 (13.9 MB/s) - 'linalgnorm-0.1.zip' saved [2550/2550]



Install the downloaded library using the `pip` command.

In [6]:
!pip install linalgnorm-0.1.zip

Processing ./linalgnorm-0.1.zip
Building wheels for collected packages: linalgnorm
  Building wheel for linalgnorm (setup.py) ... done
  Created wheel for linalgnorm: filename=linalgnorm-0.1-py3-none-any.whl size=1670 sha256=49a7004c422c17b7b0cfb95da3296424ea79aca801d2144fd2b65009d0e64e1c
  Stored in directory: /Users/jansoltysik/Library/Caches/pip/wheels/f7/79/7a/03678a810012d80c7611cb545c23c9c562c8465eabcbd663dc
Successfully built linalgnorm
Installing collected packages: linalgnorm
Successfully installed linalgnorm-0.1


<a id="load"></a>

## 3. Download training dataset and prepare training data

Download the data from the UCI repository: https://archive.ics.uci.edu/ml/machine-learning-databases/00452/GNFUV%20USV%20Dataset.zip

In [7]:
!rm -rf dataset
!mkdir dataset

In [8]:
!wget https://archive.ics.uci.edu/ml/machine-learning-databases/00452/GNFUV%20USV%20Dataset.zip --output-document=dataset/gnfuv_dataset.zip

--2020-10-07 14:50:05--  https://archive.ics.uci.edu/ml/machine-learning-databases/00452/GNFUV%20USV%20Dataset.zip
Resolving archive.ics.uci.edu... 128.195.10.252
Connecting to archive.ics.uci.edu|128.195.10.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 501978 (490K) [application/x-httpd-php]
Saving to: 'dataset/gnfuv_dataset.zip'


2020-10-07 14:50:09 (224 KB/s) - 'dataset/gnfuv_dataset.zip' saved [501978/501978]



In [9]:
!unzip dataset/gnfuv_dataset.zip -d dataset

Archive:  dataset/gnfuv_dataset.zip
  inflating: dataset/pi2/gnfuv-temp-exp1-55d487b85b-5g2xh_1.0.csv  
  inflating: dataset/pi3/gnfuv-temp-exp1-55d487b85b-2bl8b_1.0.csv  
  inflating: dataset/pi4/gnfuv-temp-exp1-55d487b85b-xcl97_1.0.csv  
  inflating: dataset/pi5/gnfuv-temp-exp1-55d487b85b-5ztk8_1.0.csv  
  inflating: dataset/README.pdf      


Create a pandas DataFrame from the downloaded dataset.

In [10]:
import json
import pandas as pd
import numpy as np
import os
from datetime import datetime
from json import JSONDecodeError

In [11]:
home_dir = './dataset'
pi_dirs = os.listdir(home_dir)

data_list = []
base_time = None
columns = None

for pi_dir in pi_dirs:
    if 'pi' not in pi_dir:
        continue
    curr_dir = os.path.join(home_dir, pi_dir)
    data_file = os.path.join(curr_dir, os.listdir(curr_dir)[0])
    with open(data_file, 'r') as f:
        line = f.readline().strip().replace("'", '"')
        while line != '':
            try:
                input_json = json.loads(line)
                sensor_datetime = datetime.fromtimestamp(input_json['time'])
                if base_time is None:
                    base_time = datetime(sensor_datetime.year, sensor_datetime.month, sensor_datetime.day, 0, 0, 0, 0)
                input_json['time'] = (sensor_datetime - base_time).seconds
                data_list.append(list(input_json.values()))
                if columns is None:
                    columns = list(input_json.keys())
            except JSONDecodeError:
                # Skip lines that cannot be parsed as JSON.
                pass
            line = f.readline().strip().replace("'", '"')

data_df = pd.DataFrame(data_list, columns=columns)

In [12]:
data_df.head()

|   | device | humidity | temperature | experiment | time |
|---|---|---|---|---|---|
| 0 | gnfuv-temp-exp1-55d487b85b-5g2xh | 21.0 | 40.0 | 1.0 | 69557 |
| 1 | gnfuv-temp-exp1-55d487b85b-5g2xh | 21.0 | 40.0 | 1.0 | 69571 |
| 2 | gnfuv-temp-exp1-55d487b85b-5g2xh | 21.0 | 40.0 | 1.0 | 69577 |
| 3 | gnfuv-temp-exp1-55d487b85b-5g2xh | 21.0 | 40.0 | 1.0 | 69583 |
| 4 | gnfuv-temp-exp1-55d487b85b-5g2xh | 22.0 | 40.0 | 1.0 | 69589 |
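
The loading loop turns each single-quoted record into JSON and replaces the Unix timestamp with a number of seconds relative to midnight of a base day. The sketch below isolates those two steps on one made-up record. The function name `parse_reading` and the sample values are hypothetical, and UTC is used here (unlike the notebook cell, which uses local time) so that the result does not depend on the machine's time zone.

```python
import json
from datetime import datetime, timedelta, timezone


def parse_reading(line: str) -> dict:
    """Parse one single-quoted, JSON-like sensor record into a dict."""
    return json.loads(line.strip().replace("'", '"'))


# Made-up record in the same shape as the dataset rows shown above.
sample = (
    "{'device': 'demo-device', 'humidity': 21, 'temperature': 40, "
    "'experiment': 1, 'time': 1520000000}"
)
record = parse_reading(sample)

# Use UTC explicitly so the result does not depend on the local time zone.
reading_time = datetime.fromtimestamp(record["time"], tz=timezone.utc)
base_time = reading_time.replace(hour=0, minute=0, second=0, microsecond=0)
offset = reading_time - base_time
print(offset.seconds)  # seconds since midnight on the base day

# A reading one day later: .seconds wraps around, total_seconds() does not.
later = (reading_time + timedelta(days=1)) - base_time
print(later.seconds, later.total_seconds())
```

Two caveats follow from this: replacing every apostrophe would corrupt a record whose values themselves contain apostrophes, and `timedelta.seconds` discards whole days, so it behaves as a time-of-day feature rather than an elapsed-time feature when readings span more than one day.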


Create training and test datasets from the downloaded GNFUV-USV dataset.

In [13]:
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

Y = data_df['temperature']
X = data_df.drop('temperature', axis=1)

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.25, random_state=143)
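
For readers unfamiliar with `train_test_split`, the sketch below shows the underlying idea using only the standard library: shuffle the row indices with a fixed seed, then hold out a fraction of them for testing. The function `simple_train_test_split` is a hypothetical stand-in and does not reproduce the exact partition that scikit-learn produces for `random_state=143`.

```python
import math
import random


def simple_train_test_split(rows, test_size=0.25, seed=143):
    """Shuffle row indices with a fixed seed and split them into train/test.

    Illustrative only: this does not reproduce the exact partition that
    scikit-learn's train_test_split produces for the same seed.
    """
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(len(rows) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


rows = list(range(8))  # stand-in for eight data rows
train_part, test_part = simple_train_test_split(rows)
print(len(train_part), len(test_part))
```

Fixing the seed makes the split reproducible, which is what `random_state` provides in the cell above.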

<a id="train"></a>

## 4. Train a model

In this section, you will use the custom transformer as a stage in the Scikit-Learn `Pipeline` and train a model.

#### Import the custom transformer 
Import the custom transformer defined in `linalgnorm-0.1.zip` and create an instance of it, which will in turn be used as a stage in `sklearn.pipeline.Pipeline`.

In [14]:
from linalg_norm.sklearn_transformers import LNormalizer

In [15]:
lnorm_transf = LNormalizer()

Import other objects required to train a model

In [16]:
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression

Now you can create a `Pipeline` with the user-defined transformer as one of the stages and train the model.

In [17]:
skl_pipeline = Pipeline(steps=[('normalizer', lnorm_transf), ('regression_estimator', LinearRegression())])
skl_pipeline.fit(X_train.loc[:, ['time', 'humidity']].values, y_train)

Pipeline(steps=[('normalizer', LNormalizer()),
                ('regression_estimator', LinearRegression())])

In [18]:
y_pred = skl_pipeline.predict(X_test.loc[:, ['time', 'humidity']].values)
rmse = np.mean((np.round(y_pred) - y_test.values)**2)**0.5
print('RMSE: {}'.format(rmse))

RMSE: 2.213758431322581
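
The metric in the cell above is the root mean squared error computed on rounded predictions, that is, the square root of the mean of `(round(prediction) - actual)**2`. A minimal standard-library sketch of the same calculation follows; the function name `rmse` and the sample values are made up for illustration and are unrelated to the dataset.

```python
import math


def rmse(y_true, y_pred, round_predictions=False):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    if round_predictions:
        y_pred = [round(p) for p in y_pred]
    squared_errors = [(p - t) ** 2 for p, t in zip(y_pred, y_true)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values for illustration only.
actual = [40.0, 41.0, 39.0]
predicted = [40.4, 43.2, 38.7]
print(rmse(actual, predicted))
print(rmse(actual, predicted, round_predictions=True))
```

Rounding before scoring matches the integer-valued temperature readings, but it also hides small errors, so the two variants can differ noticeably.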


<a id="persistence"></a>

## 5. Persist the model and custom library

In this section, using the `ibm-watson-machine-learning` SDK, you will:
- save the library `linalgnorm-0.1.zip` in the WML Repository by creating a package extension resource
- create a Software Specification resource and bind the package extension resource to it. This Software Specification resource will be used to configure the online deployment runtime environment for a model
- bind the Software Specification resource to the model and save the model to the WML Repository

### Create package extension

Define the metadata required to create the package extension resource. <br>

The `file_path` argument of `client.package_extensions.store()` is the name of the library file to upload to WML.

**Note:** You can also use a conda environment configuration file (`yaml`) as the package extension input. In that case, set `TYPE` to `conda_yml` and `file_path` to the yaml file.
```
client.package_extensions.ConfigurationMetaNames.TYPE = "conda_yml"
```

In [19]:
meta_prop_pkg_extn = {
    client.package_extensions.ConfigurationMetaNames.NAME: "K_Linag_norm_skl",
    client.package_extensions.ConfigurationMetaNames.DESCRIPTION: "Pkg extension for custom lib",
    client.package_extensions.ConfigurationMetaNames.TYPE: "pip_zip"
}

pkg_extn_details = client.package_extensions.store(meta_props=meta_prop_pkg_extn, file_path="linalgnorm-0.1.zip")
pkg_extn_uid = client.package_extensions.get_uid(pkg_extn_details)
pkg_extn_url = client.package_extensions.get_href(pkg_extn_details)


Creating package extensions
SUCCESS


Retrieve the details of the package extension resource that was created in the cell above.

In [20]:
details = client.package_extensions.get_details(pkg_extn_uid)

### Create software specification and add custom library

Define the metadata required to create the software specification resource and bind the package extension to it. This software specification resource will be used to configure the online deployment runtime environment for a model.

In [21]:
client.software_specifications.ConfigurationMetaNames.show()

---------------------------  ----  --------  --------------------------------
META_PROP NAME               TYPE  REQUIRED  SCHEMA
NAME                         str   Y
DESCRIPTION                  str   N
PACKAGE_EXTENSIONS           list  N
SOFTWARE_CONFIGURATION       dict  N         {'platform(required)': 'string'}
BASE_SOFTWARE_SPECIFICATION  dict  Y
---------------------------  ----  --------  --------------------------------


#### List base software specifications

In [22]:
client.software_specifications.list()

-----------------------------  ------------------------------------  ----
NAME                           ASSET_ID                              TYPE
default_py3.6                  0062b8c9-8b7d-44a0-a9b9-46c416adcbd9  base
scikit-learn_0.20-py3.6        09c5a1d0-9c1e-4473-a344-eb7b665ff687  base
spark-mllib_3.0-scala_2.12     09f4cff0-90a7-5899-b9ed-1ef348aebdee  base
ai-function_0.1-py3.6          0cdb0f1e-5376-4f4d-92dd-da3b69aa9bda  base
shiny-r3.6                     0e6e79df-875e-4f24-8ae9-62dcc2148306  base
pytorch_1.1-py3.6              10ac12d6-6b30-4ccd-8392-3e922c096a92  base
scikit-learn_0.22-py3.6        154010fa-5b3b-4ac1-82af-4d5ee5abbc85  base
tensorflow_2.1-py3.6           160fcf8b-0f8e-4043-ba83-e3bf021ea4fa  base
default_r3.6                   1b70aec3-ab34-4b87-8aa0-a4a3c8296a36  base
tensorflow_1.15-py3.6          2b73a275-7cbf-420b-a912-eae7f436e0bc  base
pytorch_1.2-py3.6              2c8ef57d-2687-4b7d-acce-01f94976dac1  base
spark-mllib_2.3                2e51f70

#### Select base software specification to extend

In [25]:
base_sw_spec_uid = client.software_specifications.get_uid_by_name("default_py3.7")

#### Define new software specification based on base one and custom library

In [26]:
meta_prop_sw_spec = {
    client.software_specifications.ConfigurationMetaNames.NAME: "linalgnorm-0.1",
    client.software_specifications.ConfigurationMetaNames.DESCRIPTION: "Software specification for linalgnorm-0.1",
    client.software_specifications.ConfigurationMetaNames.BASE_SOFTWARE_SPECIFICATION: {"guid": base_sw_spec_uid}
}

sw_spec_details = client.software_specifications.store(meta_props=meta_prop_sw_spec)
sw_spec_uid = client.software_specifications.get_uid(sw_spec_details)


client.software_specifications.add_package_extension(sw_spec_uid, pkg_extn_uid)

SUCCESS


'SUCCESS'

### Save the model

Define the metadata to save the trained model to WML Repository along with the information about the software spec resource required for the model. 

The `client.repository.ModelMetaNames.SOFTWARE_SPEC_UID` metadata property is used to specify the GUID of the software spec resource that needs to be associated with the model.

In [27]:
model_props = {
    client.repository.ModelMetaNames.NAME: "Temp prediction model with custom lib",
    client.repository.ModelMetaNames.TYPE: 'scikit-learn_0.23',
    client.repository.ModelMetaNames.SOFTWARE_SPEC_UID: sw_spec_uid
    
}

Save the model to the WML Repository and display its saved metadata. 

In [28]:
published_model = client.repository.store_model(model=skl_pipeline, meta_props=model_props)

In [29]:
published_model_uid = client.repository.get_model_uid(published_model)
model_details = client.repository.get_details(published_model_uid)
print(json.dumps(model_details, indent=2))

{
  "entity": {
    "software_spec": {
      "id": "49f75adc-a156-45e4-802f-b3aed47472c9",
      "name": "linalgnorm-0.1"
    },
    "type": "scikit-learn_0.23"
  },
  "metadata": {
    "created_at": "2020-10-07T12:51:56.275Z",
    "id": "44ed59ec-8daa-40d6-87b0-34d824724133",
    "modified_at": "2020-10-07T12:51:59.081Z",
    "name": "Temp prediction model with custom lib",
    "owner": "IBMid-5500067NJD",
    "space_id": "dba54737-1397-4499-9e82-1c67360ba597"
  }
}


<a id="deploy"></a>

## 6. Deploy and score

In this section, you will deploy the saved model that uses the custom transformer and perform predictions. You will use WML client to perform these tasks.

### Deploy the model

In [30]:
metadata = {
    client.deployments.ConfigurationMetaNames.NAME: "Deployment of custom lib model",
    client.deployments.ConfigurationMetaNames.ONLINE: {}
}

created_deployment = client.deployments.create(published_model_uid, meta_props=metadata)



#######################################################################################

Synchronous deployment creation for uid: '44ed59ec-8daa-40d6-87b0-34d824724133' started

#######################################################################################


initializing..................................
ready


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='36bb7c32-4736-4f78-8d01-c0b0bf84d5fc'
------------------------------------------------------------------------------------------------




### Predict using the deployed model

**Note**: Here we use the deployment `uid` obtained from the `created_deployment` object. The following cells show how to retrieve the scoring endpoint URL of the deployment.

In [31]:
deployment_uid = client.deployments.get_uid(created_deployment)

Now you can print an online scoring endpoint. 

In [32]:
scoring_endpoint = client.deployments.get_scoring_href(created_deployment)
print(scoring_endpoint)

https://wml-fvt.ml.test.cloud.ibm.com/ml/v4/deployments/36bb7c32-4736-4f78-8d01-c0b0bf84d5fc/predictions


Prepare the payload for prediction. The payload contains the input records for which predictions are to be made.

In [33]:
scoring_payload = {
    "input_data": [{
        'fields': ["time", "humidity"],
        'values': [[79863, 47]]}]
}
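
The payload pairs a list of field names with one list of values per record. As a sketch, a small helper can build that structure for several records and check that every row has one value per field. The helper `build_scoring_payload` is hypothetical and only mirrors the dictionary shape shown in the cell above; the second record is a made-up value.

```python
def build_scoring_payload(fields, rows):
    """Build a payload in the `input_data` shape used in the cell above."""
    for i, row in enumerate(rows):
        if len(row) != len(fields):
            raise ValueError(
                "row {} has {} values, expected {}".format(i, len(row), len(fields))
            )
    return {
        "input_data": [
            {"fields": list(fields), "values": [list(r) for r in rows]}
        ]
    }


payload = build_scoring_payload(["time", "humidity"], [[79863, 47], [79900, 45]])
print(payload)
```

Because the pipeline was trained on a plain array with the columns ordered as `time`, `humidity`, the values in each row should follow that same order.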

Execute the method to perform online predictions and display the prediction results

In [34]:
predictions = client.deployments.score(deployment_uid, scoring_payload)

In [35]:
print(json.dumps(predictions, indent=2))

{
  "predictions": [
    {
      "fields": [
        "prediction"
      ],
      "values": [
        [
          14.629242312262988
        ]
      ]
    }
  ]
}
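
The response nests the predicted values under `predictions`, with `fields` naming the columns of each row in `values`. A minimal sketch for flattening it into one dictionary per scored record is shown below; `extract_predictions` is a hypothetical helper, and the sample response is copied from the output above.

```python
def extract_predictions(response):
    """Flatten a scoring response into a list of {field: value} dicts."""
    results = []
    for block in response.get("predictions", []):
        fields = block["fields"]
        for row in block["values"]:
            results.append(dict(zip(fields, row)))
    return results


# Response copied from the output above.
sample_response = {
    "predictions": [
        {"fields": ["prediction"], "values": [[14.629242312262988]]}
    ]
}
rows = extract_predictions(sample_response)
print(rows)
```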


<a id="cleanup"></a>
## 7. Clean up

If you want to clean up all created assets:
- experiments
- trainings
- pipelines
- model definitions
- models
- functions
- deployments

follow this sample [notebook](https://github.com/IBM/watson-machine-learning-samples/blob/master/cloud/notebooks/python_sdk/instance-management/Machine%20Learning%20artifacts%20management.ipynb).

<a id="summary"></a>

## 8. Summary

You successfully completed this notebook! 
 
You learned how to deploy and score a scikit-learn model with a custom transformer in the Watson Machine Learning service.

Check out our [Online Documentation](https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/ml-service-instance.html?context=analytics) for more samples, tutorials, documentation, how-tos, and blog posts. 

## Author

**Krishnamurthy Arthanarisamy** is a senior technical lead on the IBM Watson Machine Learning team. Krishna works on developing cloud services that cater to the different stages of the machine learning and deep learning modeling life cycle.

**Lukasz Cmielowski**, PhD, is a Software Architect and Data Scientist at IBM.

Copyright © 2020, 2021 IBM. This notebook and its source code are released under the terms of the MIT License.