<img src="https://github.com/pmservice/ai-openscale-tutorials/raw/master/notebooks/images/banner.png" align="left" alt="banner">

# Working with Watson OpenScale Multi-Output Classification Models

This notebook should be run with a **Python 3.9 or Python 3.10** runtime environment in **IBM Cloud Pak for Data 4.7.x**.

It requires service credentials for the following services:
  * Watson OpenScale
  

# OpenScale Headless Subscription for Multi-Output model


### Headless Subscription

Some customers are unwilling to expose their machine learning model scoring endpoint but still want to measure the performance of their multi-output models. In OpenScale, these customers can create a custom ML provider with an empty deployment URL and thereby configure a headless subscription by describing the payload data, then logging the feedback data and configuring the monitors for multi-output models.

### Multi-Output Model
A multi-output model predicts multiple outputs for each sample. In multi-output classification, the model returns two or more outputs, and each output has more than two classes.

This notebook creates a headless subscription for a multi-output text classification model using the job type dataset, saves the feature fields, label fields, and prediction data of the dataset to the OpenScale feedback table, configures the quality monitor, and evaluates the quality metrics for each output.


Note: The job type dataset can be downloaded from <a href='https://www.kaggle.com/datasets/cactuscode7/job-descriptions-dataset'>Kaggle</a>.


# Setup <a name="setup"></a>

## Package installation

In [None]:
!pip install --upgrade ibm-watson-machine-learning --user | tail -n 1
!pip install --upgrade ibm-watson-openscale --no-cache | tail -n 1

### Action: restart the kernel!

In [None]:
import warnings
warnings.filterwarnings('ignore')

## Configure credentials

In [None]:
############################################################################################
# Paste your Watson OpenScale credentials into the following section and then run this cell.
############################################################################################

WOS_CREDENTIALS = {
    "url": "<Cloud Pak for Data Host URL>",
    "username": "<User>",
    "apikey": "<User APIKey>"
}

### Enter your Watson OpenScale GUID.

For most systems, the default GUID is already entered for you. You would only need to update this particular entry if the GUID was changed from the default.
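Before pasting a non-default GUID, a quick format check can catch copy errors. The sketch below uses only the Python standard library; the helper name `is_canonical_guid` is hypothetical and not part of the OpenScale SDK, and the check only validates the 8-4-4-4-12 text form, not whether the data mart exists.

```python
import uuid

def is_canonical_guid(value):
    """Return True if value is a GUID written in 8-4-4-4-12 hexadecimal form."""
    try:
        # uuid.UUID also accepts other spellings (for example, no hyphens),
        # so compare against the canonical string form to be strict.
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False

print(is_canonical_guid("00000000-0000-0000-0000-000000000000"))
print(is_canonical_guid("not-a-guid"))
```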



In [None]:
#Update your Watson OpenScale datamart id.
WOS_GUID="00000000-0000-0000-0000-000000000000"

###  Sample Multi-Output  data

OpenScale expects the multiple output columns to be combined into a single array-valued output column. You need to combine the target columns into a single array and provide the output column names in the `prediction_names` field (set from the `prediction_output_names` variable in this notebook) when creating the subscription.


The job type dataset has 2 output columns, `job_type` and `category`. We have generated the scored data for this dataset and will upload it to the OpenScale payload logging and feedback tables.
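As a minimal sketch of that column-combining step, the snippet below merges the two target columns into one array column using only the standard library. The two CSV rows are a shortened, hypothetical extract, and the helper name `combine_targets` is an assumption made for illustration; it is not part of the OpenScale SDK.

```python
import csv
import io

# Hypothetical two-row extract in the raw dataset layout (one column per target).
RAW_CSV = '''job_description,job_type,category
"Sales Specialist - Point of Care Diagnostics","Permanent","Pharmaceutical, Healthcare and Medical Sales"
"Quality Assurance Associate","Contract/Temp","Quality-assurance"
'''

def combine_targets(rows, target_columns, combined_name):
    """Return new rows where the target columns are merged into one list column."""
    combined_rows = []
    for row in rows:
        # Keep every non-target column unchanged.
        new_row = {k: v for k, v in row.items() if k not in target_columns}
        # Preserve the order of target_columns; it should match the output names
        # supplied when the subscription is created.
        new_row[combined_name] = [row[col] for col in target_columns]
        combined_rows.append(new_row)
    return combined_rows

raw_rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
combined = combine_targets(raw_rows, ["job_type", "category"], "job_type_category")
print(combined[0]["job_type_category"])
```

The order of the entries in the combined array matters: the first entry belongs to `job_type` and the second to `category`.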

In [None]:
#Sample record from Job type data set 
#job_description,job_type,category
#"Sales Specialist - Point of Care Diagnostics - South West/South Wales This well established and highly successful company is a leader in the supply of advanced point of \
#care diagnostic equipment to healthcare customers across the UK.With market leading products in POC clinical chemistry, haematology, blood gas, bowel cancer screening etc \
#it enjoys a great reputation with its customers and staff. The company is now looking for an enthusiastic and experienced sales specialist to work across the \
#South West of England, South Wales and Dorset promoting its high quality POC portfolio to rapid response and admission avoidance teams, primary care medical centres and hospitals.",
#"Permanent","Pharmaceutical, Healthcare and Medical Sales"

#Openscale Expected Format
#job_description,job_type_category
#"Sales Specialist - Point of Care Diagnostics - South West/South Wales This well established and highly successful company is a leader in the supply of advanced point of \
#["Permanent","Pharmaceutical, Healthcare and Medical Sales"]

### Input Data

Specify the `feature` fields and `label` columns of the dataset 

In [None]:
#List of the feature fields
feature_fields=["job_description"] 

#Name of the label field
label_field_name = "job_type_category"

#Names of the multiple output(target) columns in the dataset 
prediction_output_names = ["job_type","category"]

# Configure OpenScale 

The notebook will now import the necessary libraries and set up a Python OpenScale client.

In [None]:
from ibm_watson_openscale import APIClient
from ibm_watson_openscale.utils import *
from ibm_watson_openscale.supporting_classes import *
from ibm_watson_openscale.supporting_classes.enums import *
from ibm_watson_openscale.base_classes.watson_open_scale_v2 import *
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

import json
import requests
import base64
from requests.auth import HTTPBasicAuth
import time

## Get an instance of the OpenScale SDK client

In [None]:
authenticator = CloudPakForDataAuthenticator(
        url=WOS_CREDENTIALS['url'],
        username=WOS_CREDENTIALS['username'],
        apikey=WOS_CREDENTIALS['apikey'],
        disable_ssl_verification=True
    )
wos_client = APIClient(service_url=WOS_CREDENTIALS['url'],authenticator=authenticator, service_instance_id=WOS_GUID)
wos_client.version

## OpenScale DataMart

Watson OpenScale uses a database to store payload and feedback logs and calculated metrics. Here we use an already configured data mart.

In [None]:
wos_client.data_marts.show()

In [None]:
data_marts = wos_client.data_marts.list().result.data_marts
data_mart_id=data_marts[0].metadata.id
print('Using existing datamart {}'.format(data_mart_id))

## Service Provider

In [None]:
#Show the existing providers

wos_client.service_providers.show()

## Remove existing service provider

Multiple service providers for the same engine instance can exist in Watson OpenScale. To avoid creating duplicate service providers for this tutorial, the following code deletes any existing service provider with the same name and then adds a new one.

In [None]:
SERVICE_PROVIDER_NAME = "OpenScale Headless Service Provider"
SERVICE_PROVIDER_DESCRIPTION = "Added by tutorial WOS notebook to showcase Multi output models functionality."

In [None]:
service_providers = wos_client.service_providers.list().result.service_providers
for service_provider in service_providers:
    service_instance_name = service_provider.entity.name
    if service_instance_name == SERVICE_PROVIDER_NAME:
        service_provider_id = service_provider.metadata.id
        wos_client.service_providers.delete(service_provider_id)
        print("Deleted existing service_provider for WML instance: {}".format(service_provider_id))

## Add service provider

Watson OpenScale needs to be bound to the Watson Machine Learning instance to capture payload data into and out of the model.

Note: Here the service provider is created with empty credentials, meaning there is no scoring endpoint. This demonstrates the use case where no actual endpoint serves requests.

In [None]:
MLCredentials = {}
added_service_provider_result = wos_client.service_providers.add(
        name=SERVICE_PROVIDER_NAME,
        description=SERVICE_PROVIDER_DESCRIPTION,
        service_type=ServiceTypes.CUSTOM_MACHINE_LEARNING,
        operational_space_id = "production",
        credentials=MLCredentials,
        background_mode=False
    ).result
service_provider_id = added_service_provider_result.metadata.id

In [None]:
wos_client.service_providers.show()
print(wos_client.service_providers.get(service_provider_id).result)

## Subscriptions

Remove existing subscriptions

This code removes previous subscriptions to the model to refresh the monitors with the new model and new data.

In [None]:
wos_client.subscriptions.show()

## Remove the existing subscription

In [None]:
SUBSCRIPTION_NAME = "Multi-Output Classification Headless Subscription"

In [None]:
subscriptions = wos_client.subscriptions.list().result.subscriptions
for subscription in subscriptions:
    if subscription.entity.asset.name == '[asset] ' + SUBSCRIPTION_NAME:
        sub_model_id = subscription.metadata.id
        wos_client.subscriptions.delete(subscription.metadata.id)
        print('Deleted existing subscription for model', sub_model_id)

This code creates the model subscription in OpenScale using the Python client API. Note that we need to provide the model unique identifier, and some information about the model itself.

In [None]:
print("Data Mart ID: " + data_mart_id)
print("Service Provider ID: " + service_provider_id)
import uuid
asset_id = str(uuid.uuid4())
asset_name = '[asset] ' + SUBSCRIPTION_NAME
url = None

asset_deployment_id = str(uuid.uuid4())
asset_deployment_name = asset_name

In [None]:
subscription_details = wos_client.subscriptions.add(data_mart_id,
    service_provider_id,
    asset=Asset(
        asset_id=asset_id,
        name=asset_name,
        url=url,
        asset_type=AssetTypes.MODEL,
        input_data_type=InputDataType.UNSTRUCTURED_TEXT,
        problem_type=ProblemType.MULTICLASS_CLASSIFICATION
    ),
    deployment=AssetDeploymentRequest(name="deployment_"+asset_name,
                                     deployment_id=asset_id,
                                     deployment_type= DeploymentTypes.ONLINE),
    asset_properties=AssetPropertiesRequest(
        probability_fields=[ "probability" ],
        label_column=label_field_name,
        prediction_field="prediction",
        prediction_names= prediction_output_names,
        feature_fields = feature_fields,
        categorical_fields = None
    ),
    deployment_name = asset_name,
   background_mode=False
).result

subscription_id = subscription_details.metadata.id
print("Subscription id {}".format(subscription_id))



In [None]:
wos_client.subscriptions.get(subscription_id).result.to_dict()

### Fetch the ID of the data set used for payload logging

In [None]:
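import time

# Give OpenScale a moment to create the payload logging data set.
time.sleep(5)
payload_data_set_id = None
# Look up the payload logging data set that belongs to this subscription.
payload_data_sets = wos_client.data_sets.list(type=DataSetTypes.PAYLOAD_LOGGING,
                                              target_target_id=subscription_id,
                                              target_target_type=TargetTypes.SUBSCRIPTION).result.data_sets
# Only index into the result when it is non-empty, so the check below can report a missing data set.
if payload_data_sets:
    payload_data_set_id = payload_data_sets[0].metadata.id
if payload_data_set_id is None:
    print("Payload data set not found. Please check subscription status.")
else:
    print("Payload data set id:", payload_data_set_id)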

## Push a payload record to set up the required schemas in the subscription

This is where you take the output of the scoring model and construct the payload in the OpenScale payload logging format.

Note: No scoring is done against the model. The `PayloadRecord` is constructed in the format OpenScale requires for multi-output models, using the request and response of the machine learning model.
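The response layout used in this notebook pairs one array of predicted labels with one probability vector per output. The helper below is a rough sketch of assembling that structure from locally scored results; `build_scoring_response` is a hypothetical name, and the probabilities are rounded, illustrative values rather than real model output.

```python
def build_scoring_response(rows):
    """Build a multi-output scoring response dictionary.

    rows: list of (labels, probabilities) pairs, where labels has one entry per
    output and probabilities has one probability vector per output.
    """
    values = []
    for labels, probabilities in rows:
        if len(labels) != len(probabilities):
            raise ValueError("Expected one probability vector per output")
        # Each record is [array of predicted labels, array of probability vectors].
        values.append([list(labels), [list(p) for p in probabilities]])
    return {
        "predictions": [
            {"fields": ["prediction", "probability"], "values": values}
        ]
    }

# Illustrative values only: two outputs, four classes each.
response = build_scoring_response([
    (["Permanent", "Pharmaceutical, Healthcare and Medical Sales"],
     [[0.0004, 0.0001, 0.0001, 0.9994], [0.0002, 0.0064, 0.9933, 0.0001]]),
])
print(response["predictions"][0]["fields"])
```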

## Scoring Request Payload

In [None]:
scoring_request =   {
        "fields": [
            "job_description"
           
        ],
        "values": [
            ["Zest Scientific searching accomplished Scientific Sales professional Netherlands. territory play pivotal role company increases presence Europe, presenting wealth untapped potential. client internationally recognized providing 'best class' solutions. region provides excellent growth opportunities motivated entrepreneurial Technical Sales Specialist. highly autonomous role input encouraged provided platform implement selling methods / business plan. Candidate: * demonstrate successful track record selling academic research, clinical diagnostics / hospital laboratories biotech accounts. * Ability learn new scientific concepts required 'hands on' approach supporting customers stages sales process. * Independent comfortable working limited support. * Enterprising willingness exceed customer expectations. * Excellent relationship builder class communicator. Role: * Responsible delivering annual sales plan, developing existing key accounts strategically identifying new development opportunities Clinical Diagnostics Laboratories / Clinical Chemistry. * Targeting high value potential customers leveraging influence existing accounts. * High degree autonomy (limited direct management) freedom carve profitable territory Netherlands. Company: * Global presence, offering 'best class' solutions. * Provides extensive house training company believes Technical Sales Specialist provide class post-sales support maintain close working relationships end users. * Currently experiencing excellent levels growth new product lines added portfolio, providing comprehensive solution Clinical Labs. Remuneration: order secure services right quality candidate client offering highly attractive salary commission package. Additional benefits include Car, Healthcare Pension. Zest Scientific working strict deadline order considered opportunity apply application reviewed immediately. Apply Ref no: 338477-2989"]
    ]
    }

## Scoring Response Payload

In [None]:
scoring_response = {
    "predictions": [
        {
            "fields": [
                "prediction",
                "probability"
            ],
            "values": [
                 [['Permanent', 'Pharmaceutical, Healthcare and Medical Sales'],
                  [[0.00040183096472598453, 8.667440726762661e-05, 5.9507069275522565e-05, 0.9994519875587308], 
                   [0.000165036231357427, 0.006449028871314254, 0.9933080247261301, 7.79101711981272e-05]
                  ]
                 ]
            ]
        }
    ]
}

### Construct the payload using the scoring_request and scoring_response and then log the records

In [None]:
from ibm_watson_openscale.supporting_classes.payload_record import PayloadRecord

records_list=[]
for x in range(2):
    pl_record = PayloadRecord(request=scoring_request, response=scoring_response)
    records_list.append(pl_record)

wos_client.data_sets.store_records(data_set_id=payload_data_set_id, request_body=records_list)

### Make sure the records reached the payload logging table inside the OpenScale DataMart.

In [None]:
time.sleep(5)
pl_records_count = wos_client.data_sets.get_records_count(payload_data_set_id)
print("Number of records in the payload logging table: {}".format(pl_records_count))
if pl_records_count == 0:
    raise Exception("Payload logging did not happen!")

## Fetch the subscription details to confirm output data schemas are set up

In [None]:
subscription = wos_client.subscriptions.get(subscription_id).result.to_dict()
subscription

# Quality monitoring and feedback logging

## Enable quality monitoring

The quality monitor evaluates the metrics for each target during the runtime evaluation and publishes the quality metrics for each target individually.

The cell below turns on the quality monitor and sets an alert threshold of 80%. OpenScale will show an alert on the dashboard if the model accuracy measurement falls below this threshold.

The `min_feedback_data_size` parameter specifies the minimum number of feedback records OpenScale needs before it calculates a new measurement. The quality monitor runs hourly, but the accuracy reading in the dashboard will not change until an additional 10 feedback records have been added, via the user interface, the Python client, or the supplied feedback endpoint.
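To make "metrics for each target" concrete, here is a plain Python sketch that computes accuracy separately for each output and compares it with the 80% lower limit. It is an illustration only and does not claim to reproduce how OpenScale computes its quality metrics; the function name `per_output_accuracy` is hypothetical. The label and prediction arrays are the five pairs from the feedback payload in this notebook.

```python
def per_output_accuracy(labels, predictions, output_names):
    """Return {output_name: accuracy} for aligned lists of label/prediction arrays."""
    if not labels or len(labels) != len(predictions):
        raise ValueError("labels and predictions must be non-empty and the same length")
    correct = [0] * len(output_names)
    for label_row, prediction_row in zip(labels, predictions):
        for i in range(len(output_names)):
            if label_row[i] == prediction_row[i]:
                correct[i] += 1
    return {name: correct[i] / len(labels) for i, name in enumerate(output_names)}

labels = [
    ["Permanent", "Manufacturing & Operations"],
    ["Part-Time", "Pharmaceutical, Healthcare and Medical Sales"],
    ["Contract/Temp", "Quality-assurance"],
    ["Permanent", "Data Management and Statistics"],
    ["Contract/Interim", "Pharmaceutical, Healthcare and Medical Sales"],
]
predictions = [
    ["Permanent", "Manufacturing & Operations"],
    ["Part-Time", "Pharmaceutical, Healthcare and Medical Sales"],
    ["Contract/Temp", "Quality-assurance"],
    ["Permanent", "Data Management and Statistics"],
    ["Permanent", "Pharmaceutical, Healthcare and Medical Sales"],
]

accuracy = per_output_accuracy(labels, predictions, ["job_type", "category"])
# Outputs whose accuracy falls below the lower limit would trigger an alert.
breaches = [name for name, value in accuracy.items() if value < 0.80]
print(accuracy, breaches)
```

In this sample, one of the five `job_type` predictions differs from its label, so the two outputs get different accuracy values even though they come from the same records.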

In [None]:
import time

target = Target(
        target_type=TargetTypes.SUBSCRIPTION,
        target_id=subscription_id
)
parameters = {
    "min_feedback_data_size": 10
}
thresholds = [
                {
                    "metric_id": "accuracy",
                    "type": "lower_limit",
                    "value": .80
                }
            ]
quality_monitor_details = wos_client.monitor_instances.create(
    data_mart_id=data_mart_id,
    background_mode=False,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.QUALITY.ID,
    target=target,
    parameters=parameters,
    thresholds=thresholds
).result

In [None]:
quality_monitor_instance_id = quality_monitor_details.metadata.id
quality_monitor_instance_id

## Get feedback logging dataset ID

In [None]:
feedback_dataset_id = None
feedback_dataset = wos_client.data_sets.list(type=DataSetTypes.FEEDBACK,
                                             target_target_id=subscription_id,
                                             target_target_type=TargetTypes.SUBSCRIPTION).result
# Only index into the result when at least one feedback data set exists.
if feedback_dataset.data_sets:
    feedback_dataset_id = feedback_dataset.data_sets[0].metadata.id
if feedback_dataset_id is None:
    print("Feedback data set not found. Please check quality monitor status.")

In [None]:
feedback_dataset_id

### Store the feedback payload using OpenScale Python SDK

Save the features, target column, and scored data to the feedback table in the OpenScale multi-output payload format.
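Because each feedback row mixes a text feature with several array-valued fields, a small shape check before uploading can catch misaligned rows. The following is a sketch under stated assumptions: `validate_feedback_payload` is a hypothetical helper, and the one-row payload uses a shortened description and made-up probabilities.

```python
def validate_feedback_payload(payload, array_fields, n_outputs):
    """Check row widths and per-output array lengths; return the number of rows."""
    fields = payload["fields"]
    for row_number, row in enumerate(payload["values"]):
        if len(row) != len(fields):
            raise ValueError(
                f"Row {row_number}: expected {len(fields)} values, got {len(row)}"
            )
        for name in array_fields:
            value = row[fields.index(name)]
            if len(value) != n_outputs:
                raise ValueError(
                    f"Row {row_number}: field '{name}' has {len(value)} entries, "
                    f"expected {n_outputs}"
                )
    return len(payload["values"])

# Hypothetical one-row payload with a shortened description and made-up probabilities.
example_payload = {
    "fields": ["job_description", "job_type_category",
               "_original_prediction", "_original_probability"],
    "values": [
        ["Quality Assurance Associate, contract role",
         ["Contract/Temp", "Quality-assurance"],
         ["Contract/Temp", "Quality-assurance"],
         [[0.1, 0.7, 0.1, 0.1], [0.05, 0.05, 0.1, 0.8]]],
    ],
}

row_count = validate_feedback_payload(
    example_payload,
    array_fields=["job_type_category", "_original_prediction", "_original_probability"],
    n_outputs=2,
)
print(row_count)
```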

In [None]:
feedback_data = []
feedback_payload =  {
    "fields": [
        "job_description",
        "job_type_category", "_original_prediction", "_original_probability"

    ],
    "values": [
        ["Field Service Engineer - Life Science Cambridge & Surrounding Areas Reporting Life Science Supervisor, Field Service Engineer provides aspects service support main face company customers. successful applicant supporting Centrifugation & Automation products Cambridge area allow easy access customers region. position instrumental providing high level support, requiring close interaction customer colleagues maintain quality service. Adhering customer focused strategies achieve business goals, increase customer satisfaction position Service Support positive sales differentiator increasingly competitive market. position time, days week. working days include Saturday Sunday Company requires. position primarily supporting customers London area, necessary travel regions Company requires. Main Roles Responsibilities: Develop maintain good working relationship customer Achieve departmental goals objectives meet Company's operational plan Work closely internal departments build relationships results timely implementation, installation pro-live support Ensure Preventative Maintenance Inspections modifications completed specified time frames Ensure accurate completion paperwork receipts Contribute environment quality perceived culture Maintain accurate record boot stock Adhere Company policies procedures Requirements: minimum HNC (or equivalent) relevant technical subject Minimum 2 years Field Service experience Good interpersonal skills ability communicate effectively internally externally levels Proven track record technical service organisation",
            ["Permanent", "Manufacturing & Operations"],
            ["Permanent", "Manufacturing & Operations"],
            [
                [0.0008271883892251415, 0.0001905242239474885, 0.0001257367882981827, 0.9988565505985292],
                [0.0001412213500395188, 0.9953277971306047, 0.003789043421195452, 0.0007419380981605825]
            ]
        ],
        ["Company: Benecol Position: Account Manager Territory: Bristol/South Wales/Gloucester Vacancy Type: Part-time Salary: Competitive Therapy Area: Nutrition Ashfield delighted recruiting Account Managers client Benecol. Benecol number cholesterol-lowering brand. Benecol foods launched UK Ireland 1999, range includes spreads, yogurts mini drinks. Today, Benecol foods range products preferred trusted millions people world delicious, convenient proven way lower cholesterol UK, Benecol number cholesterol-lowering food brand, continuing innovate bring new products market year. looking Account Managers support growth established brand raise awareness benefits Benecol deliver patients struggling high cholesterol. Key Responsibilities: Develop implement agreed territory Business Plans Design implement business territory plan line Benecol objectives Achieve positive endorsement Benecol product range influence HCPs Upskill",
         ["Part-Time", "Pharmaceutical, Healthcare and Medical Sales"],
            ["Part-Time", "Pharmaceutical, Healthcare and Medical Sales"],
            [[0.00017962062635380117, 0.00032392943350308866, 0.9842118268793946, 0.01528462306074844], 
             [2.0332798233189208e-05, 0.0003142799288142808, 0.9996487536164178, 1.6633656534704604e-05]
            ]
        ],
        ["Jim Gleeson recruiting Quality Assurance Associate join successful, mid-sized international pharmaceutical company good portfolio products market strong pipeline development. contract role months. RESPONSIBILITIES Quality Assurance Associate support Head Quality Assurance wide range operational coordination support activities. Key duties include: - Maintenance Quality Management System, databases training records - Contribution development maintenance company SOPs - Taking minutes QA-led meetings - Acting company's authorised personnel Document Storage Room QUALIFICATIONS Quality Assurance Associate require following: - relevant science nursing degree - QA administration experience gained CRO pharmaceutical company environment - Excellent communication, organisational time management skills - high degree literacy APPLY information apply Quality Assurance Associate position contact Jim Gleeson",
             ["Contract/Temp", "Quality-assurance"],
             ["Contract/Temp", "Quality-assurance"],
             [[0.00480696003180402, 0.9818955745604192, 0.0005742707769595733, 0.012723194630817124], 
              [0.0014341676263572817, 0.013487925015221662, 0.0019171066525658928, 0.9831608007058551]
             ]

        ],
        ["looking advance career clinical trials space? want work leading multinational organisation, renown industry meteoric rise success dependability deliver global contracts multiple therapeutic areas phases? looking play integral role future medicine globe? so, opportunity you: excellent opportunity candidate scientific background exposure clinical trials pharmaceutical industry play implemental role driving organisation success. Package description Synexus great team professional, smart energetic people. focus continued professional development, engaging supporting team inorder reach common goals business. offer competitive benefits package consisting of, limited to, 25 days annual leave, Life Assurance, Private Health Insurance*, Target Bonus Scheme, Childcare Vouchers, Healthcare Cash Plan, Employees Assistance programme high street discount scheme. * Dependent role Main responsibilities Data Coordinator 2you provide support data coordinators project related duties including ensuring accuracy data, producing guidelines data entry process entering data electronic data capture systems. develop data entry data cleaning processes according timelines, track CRF query totals projects. key accountabilities be: * Coordinating multiple projects whilst ensuring study documentation accurate date correct versions * reporting tools provide accurate weekly figures project managers * Following guidelines data entry process studies * Maintaining database track flow CRFs queries",     
            ["Permanent", "Data Management and Statistics"],
            ["Permanent", "Data Management and Statistics"],
            [
                [0.0016243309797038675, 5.426366978393848e-05, 3.772544649173256e-05, 0.9982836799040206],
                [0.9931888338647511, 0.0033713906851491043, 0.00013840429061979067, 0.0033013711594797705]
            ]
        ],
        ["MEDICAL REPRESENTATIVE - SUFFOLK client renowned pharmaceutical company strong commitment health care. truly global company, products sold 100 countries world-wide. drive excellence do, coupled dynamic culture company makes exciting rewarding place work. People greatest resource invest great deal developing people maximum potential. new position Primary Secondary Care Diabetes business. Candidates demonstrate tangibly personal sales successes capabilities including outstanding teamwork, influencing skills ability problem solve. employer choice offers excellent basic salary, competitive bonus benefits package. CHASE [Phone number removed] apply line Apply Ref no: 248128-HO020209",
            ["Contract/Interim", "Pharmaceutical, Healthcare and Medical Sales"],
            ["Permanent", "Pharmaceutical, Healthcare and Medical Sales"],
            [[0.005345609796164804, 0.0004323382677841009, 0.0005000226300820948, 0.993722029305969], 
             [0.008910298536723994, 0.09269483553034599, 0.8894957286714953, 0.00889913726143468]
            ]
        ]
    ]
 }


In [None]:
for i in range(4):
    feedback_data.append(feedback_payload)
wos_client.data_sets.store_records(feedback_dataset_id, request_body=feedback_data, background_mode=False)

### Wait for some time, and make sure the records have reached the feedback table.

In [None]:
time.sleep(5)
feedback_records_count = wos_client.data_sets.get_records_count(feedback_dataset_id)
print("Number of records in the feedback logging table: {}".format(feedback_records_count))
if feedback_records_count == 0:
    raise Exception("feedback logging did not happen!")
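
A fixed `time.sleep(5)` may be too short on a busy cluster. One possible alternative is to poll the record count until it reaches a minimum or a timeout expires. The helper below is a generic sketch; `wait_for_count` and its parameters are hypothetical names, and the demonstration uses a fake counter instead of a live OpenScale client.

```python
import time

def wait_for_count(get_count, minimum=1, timeout=60.0, interval=5.0, sleep=time.sleep):
    """Poll get_count() until it returns at least `minimum` or `timeout` seconds pass."""
    waited = 0.0
    while True:
        count = get_count()
        if count >= minimum:
            return count
        if waited >= timeout:
            raise TimeoutError(f"Only {count} record(s) after {waited:.0f}s")
        sleep(interval)
        waited += interval

# Demonstration with a fake counter that reports records on the third poll.
fake_counts = iter([0, 0, 20])
result = wait_for_count(lambda: next(fake_counts), sleep=lambda seconds: None)
print(result)
```

In the notebook, the callable could wrap the existing call, for example `lambda: wos_client.data_sets.get_records_count(feedback_dataset_id)`.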


## Run Quality Monitor

In [None]:
run_details = wos_client.monitor_instances.run(monitor_instance_id=quality_monitor_instance_id, background_mode=False).result

### Show Metrics

Shows the quality metrics for each target column

In [None]:
wos_client.monitor_instances.show_metrics(monitor_instance_id=quality_monitor_instance_id,limit=100)

Congratulations!

You have finished configuring and evaluating the quality monitor for multi-output models in IBM Watson OpenScale.

