<img src="https://github.com/pmservice/ai-openscale-tutorials/raw/master/notebooks/images/banner.png" align="left" alt="banner">

# Working with Watson OpenScale Multi-Output Regression Models

This notebook should be run with a **Python 3.9 or Python 3.10** runtime environment in **IBM Cloud Pak for Data 4.7.x**.

It requires service credentials for the following services:
  * Watson OpenScale
  

# OpenScale Headless Subscription for Multi-Output model


### Headless Subscription

Some customers are unwilling to expose the scoring endpoint of their machine learning model but still want to measure the performance of their multi-output models. In OpenScale, such customers can create a custom ML provider with an empty deployment URL and thereby configure a headless subscription: they describe the payload data, log the feedback data, and configure the monitors for the multi-output model.

### Multi-Output Model
A multi-output model predicts multiple outputs for each sample. In multi-output regression, the model returns two or more numeric outputs for a given input.
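
As a minimal sketch of the idea, the toy example below fits one least-squares line per output, which mirrors the one-regressor-per-target strategy that scikit-learn's `MultiOutputRegressor` uses. The data and the helper names `fit_line`, `fit_multi`, and `predict_multi` are illustrative assumptions and are not part of this notebook's workflow.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_multi(xs, targets):
    """Fit one independent line per target column."""
    return [fit_line(xs, column) for column in zip(*targets)]


def predict_multi(models, x):
    """Return one numeric prediction per output for a single input."""
    return [slope * x + intercept for slope, intercept in models]


# Toy data (hypothetical): one feature, two numeric targets per sample
xs = [1.0, 2.0, 3.0, 4.0]
targets = [[3.0, 10.0], [5.0, 8.0], [7.0, 6.0], [9.0, 4.0]]

models = fit_multi(xs, targets)
prediction = predict_multi(models, 5.0)
print(prediction)  # a list with two values, one per output
```

Each sample therefore yields a list of predictions rather than a single number, which is the shape the rest of this notebook works with.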


This notebook creates a headless subscription for a multi-output regression model trained on the `Boston house price` dataset. The model predicts two outputs from the given features: the `Price` of the house and `PTRatio` (the pupil-teacher ratio by town). The notebook then saves the Boston house price data together with the predictions to the OpenScale feedback table, configures the quality monitor, and evaluates the quality metrics for each output.


# Setup <a name="setup"></a>

## Package installation

In [None]:
!pip install --upgrade ibm-watson-machine-learning --user | tail -n 1
!pip install --upgrade ibm-watson-openscale --no-cache | tail -n 1

### Action: restart the kernel!

In [None]:
import warnings
warnings.filterwarnings('ignore')

## Configure credentials

In [None]:
############################################################################################
# Paste your Watson OpenScale credentials into the following section and then run this cell.
############################################################################################

WOS_CREDENTIALS = {
    "url": "<Cloud Pak for Data Host URL>",
    "username": "<User>",
    "apikey": "<User APIKey>"
}

### Enter your Watson OpenScale GUID.

For most systems, the default GUID is already entered for you. You would only need to update this particular entry if the GUID was changed from the default.



In [None]:
#Update your Watson OpenScale datamart id.
WOS_GUID="00000000-0000-0000-0000-000000000000"

# Generate Feedback data for the Boston Housing Price dataset

Download the Boston housing price dataset from <a href='https://www.kaggle.com/datasets/altavish/boston-housing-dataset'>Kaggle</a> or load it from the sklearn datasets, then reshape the multi-output columns into the format that OpenScale requires.


### Sample Multi-Output data

OpenScale expects the multiple output columns to be combined into a single array-valued output column. You need to merge the target columns into one array and provide the output column names in the `prediction_names` field (populated from `prediction_output_names` in this notebook) when creating the subscription.

In [None]:
# Sample record from the Boston housing price data set
#CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,B,LSTAT,PTRatio,Price
#0.04932,33.0,2.18,0.0,0.472,6.849,70.3,3.1827,7.0,222.0,396.9,7.53,18.4, 28.2

# OpenScale expected format
#CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,B,LSTAT,PTRatio_Price
#0.04932,33.0,2.18,0.0,0.472,6.849,70.3,3.1827,7.0,222.0,396.9,7.53,[18.4, 28.2]
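
The reshaping shown in the comments above can be sketched for a single record as follows. The helper name `combine_targets` and the shortened record are assumptions made for illustration; the notebook itself performs the same step on a pandas DataFrame further down.

```python
def combine_targets(row, target_names, label_name):
    """Return a copy of `row` with the target columns merged into one list column."""
    combined = {key: value for key, value in row.items() if key not in target_names}
    combined[label_name] = [row[name] for name in target_names]
    return combined


# Shortened record (only two of the feature columns are kept for brevity)
record = {"CRIM": 0.04932, "LSTAT": 7.53, "PTRatio": 18.4, "Price": 28.2}

combined_record = combine_targets(record, ["PTRatio", "Price"], "PTRatio_Price")
print(combined_record)
```

The order of `target_names` determines the order of values inside the array, so it should match the order of the output names given to the subscription.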

### Read the Boston Housing Price Dataset
This section reads the housing price data from the sklearn datasets, identifies the feature columns and the multiple output columns, creates a `MultiOutputRegressor` model to predict the outputs, and saves the actual data along with the predicted outputs to a CSV file.


In [None]:
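# Importing the Boston Housing dataset
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell needs scikit-learn < 1.2.
from sklearn.datasets import load_boston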
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.utils import shuffle
import pandas as pd
import numpy as np

In [None]:
# Loading the Boston Housing dataset
boston = load_boston()
# Initializing the dataframe
boston_housing_df = pd.DataFrame(boston.data)
#Adding the feature names to the dataframe
boston_housing_df.columns = boston.feature_names
#Adding target variable to dataframe
boston_housing_df['PRICE'] = boston.target
boston_housing_df.head()

In [None]:
# Split the data into train and test sets (50% train / 50% test)
boston_housing_df_train, boston_housing_df_test = train_test_split(boston_housing_df, test_size=0.5, random_state = 1)
boston_housing_df_train

In [None]:
# Get Y1 and Y2 as the 2 outputs and format them as np arrays
# PTRATIO - pupil-teacher ratio by town
y1 = boston_housing_df_train.pop('PTRATIO')
y1 = np.array(y1)
y2 = boston_housing_df_train.pop('PRICE')
y2 = np.array(y2)

Y = np.vstack((y1, y2)).T
X = np.array(boston_housing_df_train)


#### Create MultiOutputRegressor Model

In [None]:
model = MultiOutputRegressor(GradientBoostingRegressor(random_state=0)).fit(X, Y)
predictions = model.predict(X)
predictions[:5]

#### Input Data

Specify the `feature` fields and `target` columns of the dataset.

In [None]:
features_list = boston_housing_df_train.columns
target_columns_list = ["PTRATIO","PRICE"]
#Name of the prediction column in the Openscale feedback table
prediction_column_name = "_original_prediction"
#Name of the label field
label_column = "PTRatio_Price"

In [None]:
def format_predictions(predictions):
    # Convert the NumPy prediction array into a plain list of Python floats
    return [float(pred) for pred in predictions]


#### Predict the test data 

In [None]:
boston_housing_df_test[prediction_column_name] = boston_housing_df_test[features_list].apply(lambda x: model.predict([x])[0],axis=1)
boston_housing_df_test[prediction_column_name] = boston_housing_df_test[prediction_column_name].apply(lambda x: format_predictions(x))
boston_housing_df_test.head()

In [None]:
# Combine the multiple target columns `PTRATIO`, `PRICE` to single column `PTRatio_Price`

boston_housing_df_test[label_column] = boston_housing_df_test[target_columns_list].apply(list, axis=1)
boston_housing_df_test.head()
# Column order: features first, then the combined label, then the prediction
new_columns = list(features_list) + [label_column, prediction_column_name]

# Reorder the columns
boston_housing_df_test = boston_housing_df_test.reindex(columns=new_columns)
boston_housing_df_test.head()


#### Save the `Boston Housing Price` data along with the predictions data to the csv <a name="save_data"></a>

In [None]:
# Save the actual data along with the predictions data to the csv
boston_housing_df_test.to_csv("boston_housing_price_feedback.csv", sep=',', index=False)

# Configure OpenScale 

The notebook will now import the necessary libraries and set up a Python OpenScale client.

In [None]:
from ibm_watson_openscale import APIClient
from ibm_watson_openscale.utils import *
from ibm_watson_openscale.supporting_classes import *
from ibm_watson_openscale.supporting_classes.enums import *
from ibm_watson_openscale.base_classes.watson_open_scale_v2 import *
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

import json
import requests
import base64
from requests.auth import HTTPBasicAuth
import time

## Get an instance of the OpenScale SDK client

In [None]:
authenticator = CloudPakForDataAuthenticator(
        url=WOS_CREDENTIALS['url'],
        username=WOS_CREDENTIALS['username'],
        apikey=WOS_CREDENTIALS['apikey'],
        disable_ssl_verification=True
    )
wos_client = APIClient(service_url=WOS_CREDENTIALS['url'],authenticator=authenticator, service_instance_id=WOS_GUID)
wos_client.version


## OpenScale DataMart

Watson OpenScale uses a database to store payload and feedback logs and calculated metrics. This notebook uses a data mart that is already configured.

In [None]:
wos_client.data_marts.show()

In [None]:
data_marts = wos_client.data_marts.list().result.data_marts
data_mart_id=data_marts[0].metadata.id
print('Using existing datamart {}'.format(data_mart_id))

## Service Provider

In [None]:
#Show the existing providers

wos_client.service_providers.show()

## Remove existing service provider

Multiple service providers for the same engine instance can exist in Watson OpenScale. To avoid accumulating duplicate service providers from repeated runs of this tutorial notebook, the following code deletes any existing service provider with the same name and then adds a new one.

In [None]:
SERVICE_PROVIDER_NAME = "OpenScale Headless Service Provider"
SERVICE_PROVIDER_DESCRIPTION = "Added by tutorial WOS notebook to showcase Multi output models functionality."

In [None]:
service_providers = wos_client.service_providers.list().result.service_providers
for service_provider in service_providers:
    service_instance_name = service_provider.entity.name
    if service_instance_name == SERVICE_PROVIDER_NAME:
        service_provider_id = service_provider.metadata.id
        wos_client.service_providers.delete(service_provider_id)
        print("Deleted existing service_provider for WML instance: {}".format(service_provider_id))

## Add service provider

Watson OpenScale normally needs to be bound to the machine learning instance to capture payload data going into and out of the model.

Note: Here the service provider is created with empty credentials, meaning there is no endpoint. This demonstrates the use case where no actual endpoint serving requests is needed.

In [None]:
MLCredentials = {}
added_service_provider_result = wos_client.service_providers.add(
        name=SERVICE_PROVIDER_NAME,
        description=SERVICE_PROVIDER_DESCRIPTION,
        service_type=ServiceTypes.CUSTOM_MACHINE_LEARNING,
        operational_space_id = "production",
        credentials=MLCredentials,
        background_mode=False
    ).result
service_provider_id = added_service_provider_result.metadata.id

In [None]:
wos_client.service_providers.show()
print(wos_client.service_providers.get(service_provider_id).result)

## Subscriptions

Remove existing subscriptions

This code removes previous subscriptions to the model to refresh the monitors with the new model and new data.

In [None]:
wos_client.subscriptions.show()

## Remove the existing subscription

In [None]:
SUBSCRIPTION_NAME = "Multi-Output Regression Headless Subscription"

In [None]:
subscriptions = wos_client.subscriptions.list().result.subscriptions
for subscription in subscriptions:
    if subscription.entity.asset.name == '[asset] ' + SUBSCRIPTION_NAME:
        sub_model_id = subscription.metadata.id
        wos_client.subscriptions.delete(subscription.metadata.id)
        print('Deleted existing subscription for model', sub_model_id)

In [None]:
#List of the feature fields
feature_columns=list(features_list)

#Names of the multiple output columns in the Boston housing price dataset 
prediction_output_names = ["PTRatio","Price"]


This code creates the model subscription in OpenScale using the Python client API. Note that we need to provide the model's unique identifier and some information about the model itself.

In [None]:
print("Data Mart ID: " + data_mart_id)
print("Service Provider ID: " + service_provider_id)
import uuid
asset_id = str(uuid.uuid4())
asset_name = '[asset] ' + SUBSCRIPTION_NAME
url = None

asset_deployment_id = str(uuid.uuid4())
asset_deployment_name = asset_name

In [None]:
subscription_details = wos_client.subscriptions.add(data_mart_id,
    service_provider_id,
    asset=Asset(
        asset_id=asset_id,
        name=asset_name,
        url=url,
        asset_type=AssetTypes.MODEL,
        input_data_type=InputDataType.STRUCTURED,
        problem_type=ProblemType.REGRESSION
    ),
    deployment=AssetDeploymentRequest(name="deployment_"+asset_name,
                                     deployment_id=asset_id,
                                     deployment_type= DeploymentTypes.ONLINE),
    asset_properties=AssetPropertiesRequest(
        probability_fields=[ "probability" ],
        label_column=label_column,
        prediction_field="prediction",
        prediction_names= prediction_output_names,
        feature_fields = feature_columns,
        categorical_fields = None
    ),
    deployment_name=asset_name,
    background_mode=False
).result

subscription_id = subscription_details.metadata.id
print("Subscription id {}".format(subscription_id))



In [None]:
wos_client.subscriptions.get(subscription_id).result.to_dict()

### Fetch the ID of the data set against which payload logging is performed

In [None]:
import time

time.sleep(5)
# Look up the payload logging data set created for this subscription
payload_data_sets = wos_client.data_sets.list(type=DataSetTypes.PAYLOAD_LOGGING,
                                              target_target_id=subscription_id,
                                              target_target_type=TargetTypes.SUBSCRIPTION).result.data_sets
# Guard against an empty list so the "not found" branch is reachable
payload_data_set_id = payload_data_sets[0].metadata.id if payload_data_sets else None
if payload_data_set_id is None:
    print("Payload data set not found. Please check subscription status.")
else:
    print("Payload data set id:", payload_data_set_id)
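
The fixed five-second sleep may be too short on a busy cluster. One possible alternative, sketched here under assumptions, is to poll until the data set appears. `wait_for` is a hypothetical helper (not part of the OpenScale SDK) and `fake_lookup` is a stand-in for the real `wos_client.data_sets.list(...)` call.

```python
import time


def wait_for(fetch, timeout_seconds=60.0, interval_seconds=2.0):
    """Call `fetch` until it returns a non-None value or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        value = fetch()
        if value is not None:
            return value
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval_seconds)


# Stand-in for the real lookup: returns an id on the third call only
attempts = {"count": 0}


def fake_lookup():
    attempts["count"] += 1
    return "payload-data-set-id" if attempts["count"] >= 3 else None


found_id = wait_for(fake_lookup, timeout_seconds=5.0, interval_seconds=0.01)
print(found_id, attempts["count"])
```

In the notebook, `fetch` would wrap the data set lookup and return `None` while the list of data sets is still empty.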

## Push a payload record to set up the required schemas in the subscription

This is where you take the output of the scoring model and construct the payload in the OpenScale payload logging format.

Note: No scoring is done against the model. The `PayloadRecord` is constructed in the format required for multi-output models in OpenScale, using the request and response from the machine learning model.

## Scoring Request Payload

In [None]:
scoring_request =   {
        "fields": [
            "CRIM","ZN","INDUS","CHAS","NOX","RM","AGE","DIS","RAD","TAX","B","LSTAT"
           
        ],
        "values": [
            [6.44405,0.0,18.1,0.0,0.584,6.425,74.8,2.2004,24.0,666.0,97.95,12.03]
        ]
    }

## Scoring Response Payload

In [None]:
scoring_response = {
    "predictions": [
        {
            "fields": [
                "prediction"
            ],
            "values": [
                 [20.145297738866503, 17.068259536632187]
            ]
        }
    ]
}
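
The request and response dictionaries above are written by hand. As a hedged sketch, they could also be assembled from feature rows and prediction rows with a small helper. `build_scoring_payload` is a hypothetical name, and the helper only mirrors the dictionary shapes shown in this notebook; it makes no claim about other shapes OpenScale may accept.

```python
def build_scoring_payload(feature_names, feature_rows, prediction_rows,
                          prediction_field="prediction"):
    """Build request/response dicts in the same shape as the cells above."""
    if len(feature_rows) != len(prediction_rows):
        raise ValueError("Each feature row needs exactly one prediction row")
    request = {
        "fields": list(feature_names),
        "values": [list(row) for row in feature_rows],
    }
    response = {
        "predictions": [
            {
                "fields": [prediction_field],
                # One array of outputs per scored row, as in the example above
                "values": [list(row) for row in prediction_rows],
            }
        ]
    }
    return request, response


feature_names = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM",
                 "AGE", "DIS", "RAD", "TAX", "B", "LSTAT"]
feature_rows = [[6.44405, 0.0, 18.1, 0.0, 0.584, 6.425,
                 74.8, 2.2004, 24.0, 666.0, 97.95, 12.03]]
prediction_rows = [[20.145297738866503, 17.068259536632187]]

built_request, built_response = build_scoring_payload(
    feature_names, feature_rows, prediction_rows)
print(built_response)
```

The length check catches the most likely mistake, a mismatch between the number of scored rows and the number of prediction arrays, before anything is sent to OpenScale.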

### Construct the payload using the scoring_request and scoring_response and then log the records

In [None]:
from ibm_watson_openscale.supporting_classes.payload_record import PayloadRecord

records_list=[]
for x in range(2):
    pl_record = PayloadRecord(request=scoring_request, response=scoring_response)
    records_list.append(pl_record)

wos_client.data_sets.store_records(data_set_id=payload_data_set_id, request_body=records_list)

### Make sure the records reached the payload logging table inside the OpenScale DataMart.

In [None]:
time.sleep(5)
pl_records_count = wos_client.data_sets.get_records_count(payload_data_set_id)
print("Number of records in the payload logging table: {}".format(pl_records_count))
if pl_records_count == 0:
    raise Exception("Payload logging did not happen!")

## Fetch the subscription details to confirm output data schemas are set up

In [None]:
subscription = wos_client.subscriptions.get(subscription_id).result.to_dict()
subscription

# Quality monitoring and feedback logging

## Enable quality monitoring
The quality monitor evaluates the metrics for each target during the runtime evaluation and publishes the quality metrics for each target individually.

The code below first turns on the quality monitor and sets an alert threshold of 80%. OpenScale will show an alert on the dashboard if the model accuracy measurement (R squared score, in the case of a regression model) falls below this threshold.

The second parameter supplied, `min_feedback_data_size`, specifies the minimum number of feedback records OpenScale needs before it calculates a new measurement. The quality monitor runs hourly, but the accuracy reading in the dashboard will not change until an additional 10 feedback records have been added, via the user interface, the Python client, or the supplied feedback endpoint.
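
To make the per-target evaluation concrete, the sketch below computes an R squared score separately for each output column and checks it against the 0.80 lower limit. It only illustrates the metric and is not OpenScale's internal implementation; the helper names and the sample rows are assumptions.

```python
def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def r2_per_output(actual_rows, predicted_rows, output_names):
    """Score each output column independently."""
    actual_cols = zip(*actual_rows)
    predicted_cols = zip(*predicted_rows)
    return {
        name: r2_score(actual, predicted)
        for name, actual, predicted in zip(output_names, actual_cols, predicted_cols)
    }


# Hypothetical feedback rows: [PTRatio, Price] actuals and predictions
actual_rows = [[18.0, 20.0], [20.0, 30.0], [22.0, 40.0]]
predicted_rows = [[18.0, 22.0], [20.0, 30.0], [22.0, 38.0]]

scores = r2_per_output(actual_rows, predicted_rows, ["PTRatio", "Price"])
below_threshold = [name for name, score in scores.items() if score < 0.80]
print(scores, below_threshold)
```

Scoring each column on its own means one weak output can fall below the threshold even when the other outputs fit well.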



In [None]:
import time

target = Target(
        target_type=TargetTypes.SUBSCRIPTION,
        target_id=subscription_id
)
parameters = {
    "min_feedback_data_size": 10
}
thresholds = [
                {
                    "metric_id": "r2",
                    "type": "lower_limit",
                    "value": .80
                }
            ]
quality_monitor_details = wos_client.monitor_instances.create(
    data_mart_id=data_mart_id,
    background_mode=False,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.QUALITY.ID,
    target=target,
    parameters=parameters,
    thresholds=thresholds
).result

In [None]:
quality_monitor_instance_id = quality_monitor_details.metadata.id
quality_monitor_instance_id

## Get feedback logging dataset ID

In [None]:
feedback_dataset = wos_client.data_sets.list(type=DataSetTypes.FEEDBACK,
                                             target_target_id=subscription_id,
                                             target_target_type=TargetTypes.SUBSCRIPTION).result
# Guard against an empty list so the "not found" branch is reachable
feedback_dataset_id = feedback_dataset.data_sets[0].metadata.id if feedback_dataset.data_sets else None
if feedback_dataset_id is None:
    print("Feedback data set not found. Please check quality monitor status.")

In [None]:
feedback_dataset_id

### Store the feedback payload using OpenScale Python SDK

Read the CSV file that was generated in the [Save the Boston Housing Price data](#save_data) cell above and save its records to the feedback table.

In [None]:
# Open the CSV in binary mode; the with-block closes the file after the upload call
with open('boston_housing_price_feedback.csv', mode="rb") as csv_buffer_reader:
    store_record_info = wos_client.data_sets.store_records(
        request_body=csv_buffer_reader,
        delimiter=',',
        header=True,
        data_set_id=feedback_dataset_id,
        csv_max_line_length=8196
    )

### Wait for some time and make sure the records have reached the feedback table

In [None]:
time.sleep(5)
feedback_records_count = wos_client.data_sets.get_records_count(feedback_dataset_id)
print("Number of records in the feedback logging table: {}".format(feedback_records_count))
if feedback_records_count == 0:
    raise Exception("feedback logging did not happen!")

## Run Quality Monitor

In [None]:
run_details = wos_client.monitor_instances.run(monitor_instance_id=quality_monitor_instance_id, background_mode=False).result

### Show Metrics

The following cell shows the quality metrics for each target column.

In [None]:
wos_client.monitor_instances.show_metrics(monitor_instance_id=quality_monitor_instance_id,limit=100)

Congratulations!

You have finished configuring and evaluating the quality monitor for multi-output models in IBM Watson OpenScale.
