<img src="https://github.com/pmservice/ai-openscale-tutorials/raw/master/notebooks/images/banner.png" align="left" alt="banner">

# Working with Watson OpenScale - Custom Machine Learning Provider

This notebook works correctly with the kernel **`IBM Runtime 22.1 on Python 3.9 XS`** if you are using IBM Watson Studio; otherwise, use a standard Python 3.9 runtime. **Please update the runtime now.** It requires service credentials for the following services:
  * Watson OpenScale
  * A Custom ML provider hosted in a VM that is accessible from the CPD pods, specifically the OpenScale pods: ML Gateway, fairness, quality, drift, and explain.
  * DB2 - as part of this notebook, we make use of an existing data mart.

  
The notebook configures an OpenScale data mart subscription for a Custom ML Provider deployment, then configures and runs the fairness, explainability, quality, and drift monitors.

## Custom Machine Learning Provider Setup
The following code starts a gunicorn/Flask application that can be hosted in a VM so that it is accessible from the CPD system.
This code does the following:
* It wraps a Watson Machine Learning model that is deployed to a space.
* The hosting application URL must therefore contain the SPACE ID and the DEPLOYMENT ID, which the application uses to talk to the target WML model/deployment.
* This layout is only for the purposes of this tutorial. You can define your Custom ML provider endpoint in any way you want, as long as it wraps your own custom ML engine.
* The scoring request and response payloads must conform to the schema described at: https://dataplatform.cloud.ibm.com/docs/content/wsj/model/wos-frameworks-custom.html
* To start the application using the code below, install the following Python packages in your VM:

python -m pip install gunicorn
python -m pip install flask
python -m pip install numpy
python -m pip install pandas
python -m pip install requests
python -m pip install joblib==0.11
python -m pip install scipy==0.19.1
python -m pip install --user numpy scipy matplotlib ipython jupyter pandas sympy nose
python -m pip install ibm_watson_machine_learning

-----------------

```
from flask import Flask, request, abort, jsonify
from ibm_watson_machine_learning import APIClient

app = Flask(__name__)

# Credentials of the CPD cluster that hosts the wrapped WML deployment (masked).
WML_CREDENTIALS = {
    "url": "https://namespace1-cpd-namespace1.apps.xxxxx.os.fyre.ibm.com",
    "username": "admin",
    "password": "xxxx",
    "instance_id": "wml_local",
    "version": "3.5"
}

@app.route('/spaces/<space_id>/deployments/<deployment_id>/predictions', methods=['POST'])
def wml_scoring(space_id, deployment_id):
    # Reject requests that do not carry a JSON body.
    if not request.json:
        abort(400)

    # Wrap the incoming fields/values payload in the WML scoring format.
    payload_scoring = {
        "input_data": [
            request.json
        ]
    }

    wml_client = APIClient(WML_CREDENTIALS)
    wml_client.set.default_space(space_id)

    # Forward the request to the WML deployment and return its first prediction block.
    scoring_response = wml_client.deployments.score(deployment_id, payload_scoring)
    return jsonify(scoring_response["predictions"][0])

if __name__ == '__main__':
    app.run(host='xxxx.fyre.ibm.com', port=9443, debug=True)
```
-----------------
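The payloads in this notebook use a `fields`/`values` layout: one list of column names and one list of rows. The sketch below checks that shape before a request is sent or a response is logged. It is a minimal illustration, not the official schema validation; the helper name `validate_fields_values` and the two-column sample are hypothetical.

```python
from typing import Any, Dict, List


def validate_fields_values(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a {"fields": [...], "values": [[...]]} payload."""
    problems = []
    fields = payload.get("fields")
    values = payload.get("values")
    if not isinstance(fields, list) or not all(isinstance(f, str) for f in fields):
        problems.append("'fields' must be a list of column names")
        return problems
    if not isinstance(values, list):
        problems.append("'values' must be a list of rows")
        return problems
    # Every row must supply exactly one value per field.
    for index, row in enumerate(values):
        if not isinstance(row, list) or len(row) != len(fields):
            problems.append(f"row {index} does not have {len(fields)} entries")
    return problems


# Hypothetical two-column request, not the full German credit risk schema.
sample_request = {"fields": ["LoanDuration", "Sex"], "values": [[31, "female"], [18, "male"]]}
sample_problems = validate_fields_values(sample_request)
print(sample_problems)
```

An empty list means the payload has the expected shape. The same check can be applied to the block that the Flask wrapper returns.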

# Setup <a name="setup"></a>

## Package installation

In [None]:
import warnings
warnings.filterwarnings('ignore')

In [1]:
# If you are not using IBM Watson Studio, install the packages below
!pip install --upgrade pandas==1.3.4 --no-cache | tail -n 1
!pip install --upgrade requests==2.26.0 --no-cache | tail -n 1
!pip install numpy==1.20.3 --no-cache | tail -n 1
!pip install scikit-learn==1.0.2 --no-cache | tail -n 1
!pip install SciPy --no-cache | tail -n 1
!pip install --upgrade ibm-watson-machine-learning --user | tail -n 1
!pip install --upgrade "ibm-watson-openscale~=3.0.34" --no-cache | tail -n 1


In [None]:
!pip install lime --no-cache | tail -n 1
!pip install "ibm-wos-utils~=5.0.0" --no-cache | tail -n 1

### Action: restart the kernel!

## Configure credentials

- WOS_CREDENTIALS (CP4D)
- WML_CREDENTIALS (CP4D)
- DATABASE_CREDENTIALS (DB2 on CP4D or Cloud Object Storage (COS))
- SCHEMA_NAME

In [2]:
#masked
WOS_CREDENTIALS = {
    "url": "https://namespace1-cpd-namespace1.apps.xxxxx.os.fyre.ibm.com",
    "username": "admin",
    "password": "xxxxx",
    "version": "3.5"
}

In [3]:
ASSET_DEPLOYMENT_ID = "TO-BE-EDITED"
ASSET_DEPLOYMENT_NAME = "TO-BE-EDITED"
CUSTOM_ML_PROVIDER_SCORING_URL = 'https://xxxxx.fyre.ibm.com:9443/spaces/$SPACE_ID/deployments/$DEPLOYMENT_ID/predictions'
scoring_url = CUSTOM_ML_PROVIDER_SCORING_URL
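
The scoring URL above still contains the `$SPACE_ID` and `$DEPLOYMENT_ID` placeholders. One way to fill them in is Python's `string.Template`, sketched below. The IDs used here are hypothetical and the host stays masked as in the original.

```python
from string import Template

# Same placeholder layout as CUSTOM_ML_PROVIDER_SCORING_URL; the host is still masked.
url_template = Template(
    "https://xxxxx.fyre.ibm.com:9443/spaces/$SPACE_ID/deployments/$DEPLOYMENT_ID/predictions"
)

# Hypothetical IDs, used only to show the substitution.
example_scoring_url = url_template.substitute(
    SPACE_ID="my-space-id",
    DEPLOYMENT_ID="my-deployment-id",
)
print(example_scoring_url)
```

`substitute` raises a `KeyError` if a placeholder is left without a value, which helps catch a URL that was only partly edited.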

In [4]:
label_column="Risk"
model_type = "binary"

In [5]:
import os
import base64
import json
import requests
from requests.auth import HTTPBasicAuth

## Save training data to Cloud Object Storage

### Cloud object storage details

In the next cells, you will need to paste your Cloud Object Storage (COS) credentials. If you haven't worked with COS yet, please visit the getting started with COS tutorial. You can find the COS_API_KEY_ID and COS_RESOURCE_CRN values under Service Credentials in the menu of your COS instance. The COS service credentials must be created with the Role parameter set to Writer. Later, the training data file will be loaded into a bucket of your instance and used as the training reference in the subscription. The COS_ENDPOINT value can be found in the Endpoint field of the menu.

In [6]:
IAM_URL="https://iam.ng.bluemix.net/oidc/token"

In [7]:
# masked
COS_API_KEY_ID = "*****"
COS_RESOURCE_CRN = "*****"
COS_ENDPOINT = "https://s3.us.cloud-object-storage.appdomain.cloud" # Current list available at https://control.cloud-object-storage.cloud.ibm.com/v2/endpoints
BUCKET_NAME = "*****"
FILE_NAME = "german_credit_data_biased_training.csv"

# Load and explore data

In [8]:
!rm -f german_credit_data_biased_training.csv
!wget https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wml/german_credit_data_biased_training.csv

--2024-07-24 14:16:14--  https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wml/german_credit_data_biased_training.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8001::154, 2606:50c0:8003::154, 2606:50c0:8002::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8001::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 689622 (673K) [text/plain]
Saving to: ‘german_credit_data_biased_training.csv’


2024-07-24 14:16:15 (4.67 MB/s) - ‘german_credit_data_biased_training.csv’ saved [689622/689622]



## Explore data

In [9]:
training_data_references = [
                {
                    "id": "Credit Risk",
                    "type": "s3",
                    "connection": {
                        "access_key_id": COS_API_KEY_ID,
                        "endpoint_url": COS_ENDPOINT,
                        "resource_instance_id":COS_RESOURCE_CRN
                    },
                    "location": {
                        "bucket": BUCKET_NAME,
                        "path": FILE_NAME,
                    }
                }
            ]

## Construct the scoring payload

In [10]:
import pandas as pd

df = pd.read_csv("german_credit_data_biased_training.csv")
df.head()

| | CheckingStatus | LoanDuration | CreditHistory | LoanPurpose | LoanAmount | ExistingSavings | EmploymentDuration | InstallmentPercent | Sex | OthersOnLoan | ... | OwnsProperty | Age | InstallmentPlans | Housing | ExistingCreditsCount | Job | Dependents | Telephone | ForeignWorker | Risk |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0_to_200 | 31 | credits_paid_to_date | other | 1889 | 100_to_500 | less_1 | 3 | female | none | ... | savings_insurance | 32 | none | own | 1 | skilled | 1 | none | yes | No Risk |
| 1 | less_0 | 18 | credits_paid_to_date | car_new | 462 | less_100 | 1_to_4 | 2 | female | none | ... | savings_insurance | 37 | stores | own | 2 | skilled | 1 | none | yes | No Risk |
| 2 | less_0 | 15 | prior_payments_delayed | furniture | 250 | less_100 | 1_to_4 | 2 | male | none | ... | real_estate | 28 | none | own | 2 | skilled | 1 | yes | no | No Risk |
| 3 | 0_to_200 | 28 | credits_paid_to_date | retraining | 3693 | less_100 | greater_7 | 3 | male | none | ... | savings_insurance | 32 | none | own | 1 | skilled | 1 | none | yes | No Risk |
| 4 | no_checking | 28 | prior_payments_delayed | education | 6235 | 500_to_1000 | greater_7 | 3 | male | none | ... | unknown | 57 | none | own | 2 | skilled | 1 | none | yes | Risk |


In [11]:
cols_to_remove = [label_column]
def get_scoring_payload(no_of_records_to_score = 1):
    # Drop the label column(s) on a copy so the training dataframe itself is not modified.
    features_df = df.drop(columns=[col for col in cols_to_remove if col in df.columns])

    fields = features_df.columns.tolist()
    values = features_df.head(no_of_records_to_score).values.tolist()

    payload_scoring = {"fields": fields, "values": values}
    return payload_scoring

In [12]:
#debug
payload_scoring = get_scoring_payload(1)
payload_scoring

{'fields': ['CheckingStatus',
    'LoanDuration',
    'CreditHistory',
    'LoanPurpose',
    'LoanAmount',
    'ExistingSavings',
    'EmploymentDuration',
    'InstallmentPercent',
    'Sex',
    'OthersOnLoan',
    'CurrentResidenceDuration',
    'OwnsProperty',
    'Age',
    'InstallmentPlans',
    'Housing',
    'ExistingCreditsCount',
    'Job',
    'Dependents',
    'Telephone',
    'ForeignWorker'],
   'values': [['0_to_200',
     31,
     'credits_paid_to_date',
     'other',
     1889,
     '100_to_500',
     'less_1',
     3,
     'female',
     'none',
     3,
     'savings_insurance',
     32,
     'none',
     'own',
     1,
     'skilled',
     1,
     'none',
     'yes'],
    ['less_0',
     18,
     'credits_paid_to_date',
     'car_new',
     462,
     'less_100',
     '1_to_4',
     2,
     'female',
     'none',
     2,
     'savings_insurance',
     37,
     'stores',
     'own',
     2,
     'skilled',
     1,
     'none',
     'yes'],
    ['less_0',
     15,
   

## Method to perform scoring

In [13]:
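def custom_ml_scoring():
    # Scores the global payload_scoring against the Custom ML provider endpoint.
    header = {"Content-Type": "application/json", "x":"y"}
    
    print(scoring_url)
    # verify=False disables TLS certificate verification; use it only with trusted test hosts.
    scoring_response = requests.post(scoring_url, json=payload_scoring, headers=header, verify=False)
    jsonify_scoring_response = scoring_response.json()
    return jsonify_scoring_response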

## Method to perform payload logging

In [14]:
import uuid
scoring_id = None

In [15]:
import time
from ibm_watson_openscale.supporting_classes.payload_record import PayloadRecord

def payload_logging(payload_scoring, scoring_response):
    # Relies on wos_client and payload_data_set_id, which are defined in later cells.
    scoring_id = str(uuid.uuid4())
    records_list=[]
    
    # Manual payload logging for the Custom ML provider
    pl_record = PayloadRecord(scoring_id=scoring_id, request=payload_scoring, response=scoring_response, response_time=460)
    records_list.append(pl_record)
    wos_client.data_sets.store_records(data_set_id = payload_data_set_id, request_body=records_list)
    
    # Give OpenScale a moment to store the records before counting them.
    time.sleep(5)
    pl_records_count = wos_client.data_sets.get_records_count(payload_data_set_id)
    print("Number of records in the payload logging table: {}".format(pl_records_count))
    return scoring_id

## Score the model and print the scoring response
### Sample Scoring

In [16]:
custom_ml_scoring()

https://cpd-cpd-instance.apps.wos415nfs2672.cp.fyre.ibm.com/ml/v4/deployments/2b976af0-e4ab-4859-af7d-2f2287d864ad/predictions?version=2021-05-01


{'predictions': [{'fields': ['prediction', 'probability'],
   'values': [['No Risk', [0.8077278137207031, 0.1922721564769745]],
    ['No Risk', [0.8498722314834595, 0.15012776851654053]],
    ['No Risk', [0.8721302151679993, 0.12786978483200073]],
    ['No Risk', [0.7591236233711243, 0.24087636172771454]],
    ['Risk', [0.3296038508415222, 0.6703961491584778]],
    ['Risk', [0.11021614074707031, 0.8897838592529297]],
    ['No Risk', [0.6773909330368042, 0.3226090669631958]],
    ['No Risk', [0.7757553458213806, 0.2242446392774582]],
    ['No Risk', [0.863286018371582, 0.13671396672725677]],
    ['Risk', [0.101570725440979, 0.898429274559021]],
    ['No Risk', [0.9411729574203491, 0.05882701277732849]],
    ['No Risk', [0.8737754225730896, 0.1262245625257492]],
    ['Risk', [0.4776124358177185, 0.5223875641822815]],
    ['No Risk', [0.7123474478721619, 0.28765255212783813]],
    ['No Risk', [0.7879252433776855, 0.21207472681999207]],
    ['No Risk', [0.6823184490203857, 0.31768152117729
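
The response above pairs a `fields` list with rows of `values`. The sketch below turns such a response into one dictionary per record, which makes the prediction and probability easier to inspect. The helper name `predictions_to_rows` is hypothetical; the sample data is copied from the first two rows of the output above.

```python
from typing import Any, Dict, List


def predictions_to_rows(scoring_response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pair each value row with the field names of its prediction block."""
    rows = []
    for block in scoring_response.get("predictions", []):
        fields = block["fields"]
        for values in block["values"]:
            rows.append(dict(zip(fields, values)))
    return rows


# First two rows of the response printed above.
sample_response = {
    "predictions": [{
        "fields": ["prediction", "probability"],
        "values": [
            ["No Risk", [0.8077278137207031, 0.1922721564769745]],
            ["No Risk", [0.8498722314834595, 0.15012776851654053]],
        ],
    }]
}
rows = predictions_to_rows(sample_response)
print(rows[0]["prediction"], max(rows[0]["probability"]))
```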

# Configure OpenScale 

The notebook will now import the necessary libraries and set up a Python OpenScale client.

In [17]:
from ibm_watson_openscale import APIClient
from ibm_watson_openscale.utils import *
from ibm_watson_openscale.supporting_classes import *
from ibm_watson_openscale.supporting_classes.enums import *
from ibm_watson_openscale.base_classes.watson_open_scale_v2 import *
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

import json
import requests
import base64
from requests.auth import HTTPBasicAuth
import time

## Get an instance of the OpenScale SDK client

In [18]:
authenticator = CloudPakForDataAuthenticator(
        url=WOS_CREDENTIALS['url'],
        username=WOS_CREDENTIALS['username'],
        password=WOS_CREDENTIALS['password'],
        disable_ssl_verification=True
    )

wos_client = APIClient(service_url=WOS_CREDENTIALS['url'],authenticator=authenticator)
wos_client.version

'3.0.39'

## Set up datamart

Watson OpenScale uses a database to store payload logs and calculated metrics. This notebook uses an existing data mart: the cells below list the available data marts, select the first one, and raise an error if none is found. No data in the existing data mart is overwritten.

Prior instances of the model will be removed from OpenScale monitoring.

In [19]:
wos_client.data_marts.show()

0,1,2,3,4,5
AIOSFASTPATHICP-00000000-0000-0000-0000-000000000000,Data Mart created by OpenScale ExpressPath,False,active,2024-06-04 05:19:03.698000+00:00,00000000-0000-0000-0000-000000000000


In [20]:
data_marts = wos_client.data_marts.list().result.data_marts
if len(data_marts) == 0:
    raise Exception("Missing data mart.")
data_mart_id=data_marts[0].metadata.id
print('Using existing datamart {}'.format(data_mart_id))

Using existing datamart 00000000-0000-0000-0000-000000000000


In [21]:
data_mart_details = wos_client.data_marts.list().result.data_marts[0]
data_mart_details.to_dict()

{'metadata': {'id': '00000000-0000-0000-0000-000000000000',
  'crn': 'crn:v1:bluemix:public:aiopenscale:us-south:a/na:00000000-0000-0000-0000-000000000000:data_mart:00000000-0000-0000-0000-000000000000',
  'url': '/v2/data_marts/00000000-0000-0000-0000-000000000000',
  'created_at': '2024-06-04T05:19:03.698000Z',
  'created_by': 'cpadmin',
  'modified_at': '2024-06-04T05:19:11.402000Z',
  'modified_by': 'cpadmin'},
 'entity': {'name': 'AIOSFASTPATHICP-00000000-0000-0000-0000-000000000000',
  'description': 'Data Mart created by OpenScale ExpressPath',
  'service_instance_crn': 'N/A',
  'internal_database': False,
  'database_configuration': {'database_type': 'db2',
   'credentials': {'secret_id': '3bd1e5c0-6987-4719-a9f8-7dc94bfac74d'},
   'location': {'schema_name': 'AIOSFASTPATHICP-00000000-0000-0000-0000-000000000000'}},
  'status': {'state': 'active'}}}

In [22]:
wos_client.service_providers.show()

0,1,2,3,4,5
,active,Custom ML Provider Demo - All Monitors-2,custom_machine_learning,2024-07-24 08:21:21.834000+00:00,0b4a4621-3506-4b39-9817-e0bae38a11fa
00000000-0000-0000-0000-000000000000,active,WML_IAE6,watson_machine_learning,2024-07-17 06:01:27.589000+00:00,e7e9f4fe-5a9d-4ac8-b284-8bc3dabb749b
99999999-9999-9999-9999-999999999999,active,Image Binary WML V2_test,watson_machine_learning,2024-07-09 14:59:27.678000+00:00,0f363199-46fd-496e-8d92-e83129583ab4
,active,RC - OpenScale Headless Service Provider,custom_machine_learning,2024-07-09 13:44:40.294000+00:00,76b7ac24-760c-4682-b6c9-c2b20bafc39b
99999999-9999-9999-9999-999999999999,active,GCR Auto AI prod,watson_machine_learning,2024-07-04 06:42:11.028000+00:00,43063a84-b760-479c-a2ca-bc5d60f929fd
99999999-9999-9999-9999-999999999999,active,GCR AutoAI space Demo,watson_machine_learning,2024-07-03 14:31:13.702000+00:00,0dcfefc6-3b44-4387-a2e1-b1a693f38f62
,active,IAE7,custom_machine_learning,2024-07-02 09:08:06.273000+00:00,184e73a2-7fd8-4f3f-b994-bb648f6eb8ec
,active,IAE6,custom_machine_learning,2024-07-02 09:04:22.643000+00:00,644befcd-6d36-4f4d-a30a-cd51a28b63fe
,active,WML_IAE5,custom_machine_learning,2024-07-02 08:43:08.723000+00:00,e986a0d7-8187-4fed-ab2a-c614a9683cae
,active,WML_IAE4,custom_machine_learning,2024-07-02 07:00:41.816000+00:00,d90c6bf2-49c6-4179-9876-8b85b0247d95


Note: First 10 records were displayed.


## Remove the existing service provider created by this tutorial

Watson OpenScale allows multiple service providers for the same engine instance. To avoid accumulating duplicate service providers across runs of this tutorial notebook, the following code deletes any existing service provider with the tutorial's name and then adds a new one.

In [23]:
SERVICE_PROVIDER_NAME = "Custom ML Provider Demo - All Monitors"
SERVICE_PROVIDER_DESCRIPTION = "Added by tutorial WOS notebook to showcase monitoring Fairness, Quality, Drift and Explainability against a Custom ML provider."

In [24]:
service_providers = wos_client.service_providers.list().result.service_providers
for service_provider in service_providers:
    service_instance_name = service_provider.entity.name
    if service_instance_name == SERVICE_PROVIDER_NAME:
        service_provider_id = service_provider.metadata.id
        wos_client.service_providers.delete(service_provider_id)
        print("Deleted existing service_provider for WML instance: {}".format(service_provider_id))

Deleted existing service_provider for WML instance: 0b4a4621-3506-4b39-9817-e0bae38a11fa


## Add service provider

Watson OpenScale needs to be bound to the machine learning engine, here the Custom ML provider, to capture payload data into and out of the model.
Note: You can bind more than one engine instance if needed by calling the `wos_client.service_providers.add` method. You can then refer to a particular service provider using its `service_provider_id`.

In [25]:
request_headers = {"Content-Type": "application/json"}
MLCredentials = {}
added_service_provider_result = wos_client.service_providers.add(
        name=SERVICE_PROVIDER_NAME,
        description=SERVICE_PROVIDER_DESCRIPTION,
        service_type=ServiceTypes.CUSTOM_MACHINE_LEARNING,
        request_headers=request_headers,
        operational_space_id = "production",
        credentials=MLCredentials,
        background_mode=False
    ).result
service_provider_id = added_service_provider_result.metadata.id




 Waiting for end of adding service provider fd4b59aa-af9b-427a-90ba-ab243226df2c 




active

-----------------------------------------------
 Successfully finished adding service provider 
-----------------------------------------------




In [26]:
print(wos_client.service_providers.get(service_provider_id).result)

{
  "metadata": {
    "id": "fd4b59aa-af9b-427a-90ba-ab243226df2c",
    "crn": "crn:v1:bluemix:public:aiopenscale:us-south:a/na:00000000-0000-0000-0000-000000000000:service_provider:fd4b59aa-af9b-427a-90ba-ab243226df2c",
    "url": "/v2/service_providers/fd4b59aa-af9b-427a-90ba-ab243226df2c",
    "created_at": "2024-07-24T08:46:48.131000Z",
    "created_by": "cpadmin"
  },
  "entity": {
    "name": "Custom ML Provider Demo - All Monitors-2",
    "service_type": "custom_machine_learning",
    "credentials": {
      "secret_id": "e8bc75a0-c630-4d7a-8381-dee41f99c98e"
    },
    "request_headers": {
      "Authorization": "Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCIsImtpZCI6ImM1NG4yLWRFSU9NS0ZsNHUwZFZyaW5VcE1EazdRSFdPV2h2Y19oQnpleW8ifQ.eyJ1c2VybmFtZSI6ImNwYWRtaW4iLCJyb2xlIjoiQWRtaW4iLCJwZXJtaXNzaW9ucyI6WyJtYW5hZ2VfY2F0YWxvZyIsImFkZF9jYXRhbG9nX2Fzc2V0c190b19wcm9qZWN0cyIsImFkbWluaXN0cmF0b3IiLCJjYW5fcHJvdmlzaW9uIiwibW9uaXRvcl9wbGF0Zm9ybSIsImNvbmZpZ3VyZV9wbGF0Zm9ybSIsInZpZXdfcGxhdGZvcm1faGVhbH

In [27]:
print('Data Mart ID : ' + data_mart_id)
print('Service Provider ID : ' + service_provider_id)

Data Mart ID : 00000000-0000-0000-0000-000000000000
Service Provider ID : fd4b59aa-af9b-427a-90ba-ab243226df2c


## Subscriptions

### Remove existing credit risk subscriptions

This code removes previous subscriptions to the model to refresh the monitors with the new model and new data.

In [28]:
wos_client.subscriptions.show()

0,1,2,3,4,5,6,7,8,9
592b902d-3dc9-4e56-8bcb-86cbf1a6d8a9,model,gcr - P2 XGB Classifier - Model,00000000-0000-0000-0000-000000000000,2b976af0-e4ab-4859-af7d-2f2287d864ad,gcr model,4d2f2fb2-6b64-4d58-8f13-257166e468e9,active,2024-07-17 07:11:44.727000+00:00,e2df4ec7-6c75-416f-a444-8d21389f7513
438ca544-9bd1-48c2-8e8d-3de4ef4ca79b,model,WML_IAE4,00000000-0000-0000-0000-000000000000,78a0af9e-1014-4fb1-b22a-5e11f4fd70e7,WML_IAE4,d90c6bf2-49c6-4179-9876-8b85b0247d95,active,2024-07-02 07:03:31.504000+00:00,e34b9b87-b6e1-4c53-b92e-cb80dea042be
172ef5c8-096b-4133-b159-f3d3c5586e5b,model,Dog-Cat binary,00000000-0000-0000-0000-000000000000,c99073b2-daab-4bb0-9fa8-82d97fb47f89,Dog-Cat binary deployment,0f363199-46fd-496e-8d92-e83129583ab4,active,2024-07-09 15:00:41.072000+00:00,775afc50-ff8a-4846-9db2-6dcdc9916ae6
25a7e048-2e21-4aec-b035-1863853951ab,model,[asset] RC - GCR Headless Subscription,00000000-0000-0000-0000-000000000000,aa6d890c-9ee6-4d54-aa8e-bb94b54daab0,[asset] RC - GCR Headless Subscription,76b7ac24-760c-4682-b6c9-c2b20bafc39b,active,2024-07-09 13:46:10.011000+00:00,2d6b0008-6d7a-4830-9fb0-cb938dacbbb4
91ad45c3-0bda-4ea3-a35f-7c814f7987ad,model,GCR AutoAI Demo - P4 Snap Random Forest Classifier - Model,00000000-0000-0000-0000-000000000000,5382d5ef-41f5-44b4-994a-8bd1d948b35e,GCR Auto AI prod,43063a84-b760-479c-a2ca-bc5d60f929fd,active,2024-07-04 06:42:29.105000+00:00,dd9b03e6-fecd-47dc-b277-7d8545220c5d
19754024-651a-4eff-a3e6-044e6c61867a,model,GCR AutoAI Demo - P4 Snap Random Forest Classifier - Model,00000000-0000-0000-0000-000000000000,452fdeb6-7b59-4f8f-b3f7-4ef8421c92cf,GCR AutoAI Demo,0dcfefc6-3b44-4387-a2e1-b1a693f38f62,active,2024-07-03 15:19:39.096000+00:00,bbecb696-24aa-489b-9f4a-0fd70cd50214
e3ac9fc3-bccf-4a4e-b37b-490bfb93dd81,model,GCR AutoAI - P2 XGB Classifier - Model,00000000-0000-0000-0000-000000000000,6399e6e8-df5a-4370-9af4-34b2f2e76bc6,GCR Auto AI,a7ca157a-de07-457a-8c4c-b1a2e998699c,active,2024-07-03 10:37:20.487000+00:00,ce36911c-75f6-4f99-b4d2-b71e5d55a802
592b902d-3dc9-4e56-8bcb-86cbf1a6d8a9,model,gcr - P2 XGB Classifier - Model,00000000-0000-0000-0000-000000000000,2b976af0-e4ab-4859-af7d-2f2287d864ad,gcr model,4d2f2fb2-6b64-4d58-8f13-257166e468e9,active,2024-07-02 15:33:17.454000+00:00,e96278a6-7190-48f3-b8bd-945fa48cfe50
327d8aea-ecfc-4990-9bb9-601a1695094d,model,GCR AutoAI - P2 XGB Classifier - Model,00000000-0000-0000-0000-000000000000,755c3e75-24b5-4839-8a8f-3f85c07a40c9,GCR demo,a7ca157a-de07-457a-8c4c-b1a2e998699c,active,2024-07-02 12:03:14.224000+00:00,a0f86241-8bfc-4322-895a-2597512e1653
b21904ef-7478-4dae-b93f-c120e95c9200,model,My SDK Batch Subscription-db2,00000000-0000-0000-0000-000000000000,a10a121b-2394-4fe7-9a2d-f0520abd212c,My SDK Batch Subscription-db2,644befcd-6d36-4f4d-a30a-cd51a28b63fe,active,2024-07-02 09:04:38.883000+00:00,70c9c394-2530-4c0f-b0b2-6b81e44612d0


Note: First 10 records were displayed.


## Remove the existing subscription

In [29]:
SUBSCRIPTION_NAME = "Custom ML Subscription - All Monitors"

In [30]:
subscriptions = wos_client.subscriptions.list().result.subscriptions
for subscription in subscriptions:
    if subscription.entity.asset.name == "[asset] " + SUBSCRIPTION_NAME:
        sub_model_id = subscription.metadata.id
        wos_client.subscriptions.delete(subscription.metadata.id)
        print('Deleted existing subscription for model', sub_model_id)

This code creates the model subscription in OpenScale using the Python client API. Note that we need to provide the model unique identifier, and some information about the model itself.

In [31]:
feature_columns=["CheckingStatus","LoanDuration","CreditHistory","LoanPurpose","LoanAmount","ExistingSavings","EmploymentDuration","InstallmentPercent","Sex","OthersOnLoan","CurrentResidenceDuration","OwnsProperty","Age","InstallmentPlans","Housing","ExistingCreditsCount","Job","Dependents","Telephone","ForeignWorker"]
cat_features=["CheckingStatus","CreditHistory","LoanPurpose","ExistingSavings","EmploymentDuration","Sex","OthersOnLoan","OwnsProperty","InstallmentPlans","Housing","Job","Telephone","ForeignWorker"]
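
Before the subscription is created, it can help to check that the field lists are consistent with each other. The sketch below is a hypothetical helper, not part of the OpenScale SDK, and the lists it runs on are a shortened, illustrative subset of the ones above.

```python
from typing import List


def check_field_lists(feature_fields: List[str], categorical_fields: List[str], label: str) -> List[str]:
    """Return a list of inconsistencies between the field lists (empty if none)."""
    problems = []
    # Every categorical field must also be declared as a feature.
    unknown = [name for name in categorical_fields if name not in feature_fields]
    if unknown:
        problems.append("categorical fields not in feature fields: " + ", ".join(unknown))
    # The label is what the model predicts, so it should not appear among the features.
    if label in feature_fields:
        problems.append("label column " + label + " must not be listed as a feature")
    if len(set(feature_fields)) != len(feature_fields):
        problems.append("feature fields contain duplicates")
    return problems


# Shortened, illustrative subset of the lists defined above.
example_features = ["CheckingStatus", "LoanDuration", "Sex", "Age"]
example_categorical = ["CheckingStatus", "Sex"]
field_problems = check_field_lists(example_features, example_categorical, "Risk")
print(field_problems)
```

In the notebook, the same call could be made with `feature_columns`, `cat_features`, and `label_column`.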

In [32]:
import uuid
asset_id = str(uuid.uuid4())
url = ''


asset_name = '[asset] ' + SUBSCRIPTION_NAME
asset_deployment_id = ASSET_DEPLOYMENT_ID
asset_deployment_name = ASSET_DEPLOYMENT_NAME
asset_deployment_scoring_url = scoring_url

scoring_endpoint_url = scoring_url
scoring_request_headers = {
        "Content-Type": "application/json",
        "x":"y"
    }

In [33]:
subscription_details = wos_client.subscriptions.add(
        data_mart_id=data_mart_id,
        service_provider_id=service_provider_id,
        asset=Asset(
            asset_id=asset_id,
            name=asset_name,
            url=url,
            asset_type=AssetTypes.MODEL,
            input_data_type=InputDataType.STRUCTURED,
            problem_type=ProblemType.BINARY_CLASSIFICATION
        ),
        deployment=AssetDeploymentRequest(
            deployment_id=asset_deployment_id,
            name=asset_deployment_name,
            deployment_type= DeploymentTypes.ONLINE,
            scoring_endpoint=ScoringEndpointRequest(
                url=scoring_endpoint_url,
                request_headers=scoring_request_headers
            )
        ),
        asset_properties=AssetPropertiesRequest(
            label_column=label_column,
            # These names must match the columns returned in the scoring response.
            probability_fields=["probability"],
            prediction_field="predictedLabel",
            feature_fields = feature_columns,
            categorical_fields = cat_features,
            training_data_reference=TrainingDataReference(type="cos",
                                                          location=COSTrainingDataReferenceLocation(bucket = BUCKET_NAME,
                                                                                                    file_name = FILE_NAME),
                                                          connection=COSTrainingDataReferenceConnection.from_dict({
                                                                        "resource_instance_id": COS_RESOURCE_CRN,
                                                                        "url": COS_ENDPOINT,
                                                                        "api_key": COS_API_KEY_ID,
                                                                        "iam_url": IAM_URL}))
        )
    ).result
subscription_id = subscription_details.metadata.id

In [34]:
print('Subscription ID: ' + subscription_id)

Subscription ID: db717069-7f62-46a7-af34-6063b206e551


In [35]:
import time

time.sleep(5)
payload_data_set_id = None
# Look up the payload logging data set that was created for the subscription.
payload_data_sets = wos_client.data_sets.list(type=DataSetTypes.PAYLOAD_LOGGING, 
                                              target_target_id=subscription_id, 
                                              target_target_type=TargetTypes.SUBSCRIPTION).result.data_sets
if payload_data_sets:
    payload_data_set_id = payload_data_sets[0].metadata.id
if payload_data_set_id is None:
    print("Payload data set not found. Please check subscription status.")
else:
    print("Payload data set id:", payload_data_set_id)

### Before the payload logging

wos_client.subscriptions.get(subscription_id).result.to_dict()

# Score the model so we can configure monitors

Now that the service provider has been added and the subscription has been created, we need to send a request to the model before we configure the monitors. This allows OpenScale to create a payload log in the datamart with the correct schema, so it can capture data coming into and out of the model.

In [36]:
no_of_records_to_score = 100

### Construct the scoring payload

In [37]:
payload_scoring = get_scoring_payload(no_of_records_to_score)

### Perform the scoring against the Custom ML Provider

In [38]:
scoring_response = custom_ml_scoring()

https://cpd-cpd-instance.apps.wos415nfs2672.cp.fyre.ibm.com/ml/v4/deployments/2b976af0-e4ab-4859-af7d-2f2287d864ad/predictions?version=2021-05-01


### Perform payload logging by passing the scoring payload and scoring response

In [39]:
scoring_id = payload_logging(payload_scoring, scoring_response)

### The scoring ID, which is used later to explain randomly picked transactions

In [40]:
print('scoring_id: ' + str(scoring_id))

scoring_id: 2737144b-cd4e-457f-97da-717a9ba4287b


# Fairness configuration <a name="Fairness"></a>

The code below configures fairness monitoring for our model. It turns on monitoring for two features, Sex and Age. In each case, we must specify:

* Which model feature to monitor
* One or more majority groups, which are values of that feature that we expect to receive a higher percentage of favourable outcomes
* One or more minority groups, which are values of that feature that we expect to receive a higher percentage of unfavourable outcomes
* The threshold below which OpenScale displays an alert for the fairness measurement (in this case, a default lower limit of 80%, with a feature-specific value of 95% for both Sex and Age)

Additionally, we must specify which outcomes from the model are favourable and which are unfavourable. We must also provide the minimum number of records OpenScale uses to calculate the fairness score. In this case, OpenScale's fairness monitor will run hourly, but will not calculate a new fairness rating until at least 10 records have been added. Finally, to calculate fairness, OpenScale must perform some calculations on the training data, which is why the subscription includes a training data reference.
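
To make the majority/minority idea concrete, the sketch below computes a simple disparate impact ratio: the favourable-outcome rate of the minority group divided by that of the majority group, as a percentage. This is a common textbook definition used here for illustration only; OpenScale's own fairness computation may differ. All helper names and the toy records are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence


def in_any_range(value: float, ranges: Sequence[Sequence[float]]) -> bool:
    """True if value falls inside one of the inclusive [low, high] ranges."""
    return any(low <= value <= high for low, high in ranges)


def favourable_rate(records: List[Dict[str, Any]],
                    in_group: Callable[[Dict[str, Any]], bool],
                    favourable: List[str]) -> Optional[float]:
    """Share of the group's records whose prediction is a favourable class."""
    group = [r for r in records if in_group(r)]
    if not group:
        return None
    hits = sum(1 for r in group if r["prediction"] in favourable)
    return hits / len(group)


def disparate_impact_pct(records, is_minority, is_majority, favourable) -> Optional[float]:
    """Minority favourable rate relative to the majority favourable rate, in percent."""
    minority_rate = favourable_rate(records, is_minority, favourable)
    majority_rate = favourable_rate(records, is_majority, favourable)
    if minority_rate is None or not majority_rate:
        return None
    return 100.0 * minority_rate / majority_rate


# Made-up predictions: 4 of 4 male and 3 of 4 female applicants get "No Risk".
toy_records = (
    [{"Sex": "male", "Age": 40, "prediction": "No Risk"}] * 4
    + [{"Sex": "female", "Age": 22, "prediction": "No Risk"}] * 3
    + [{"Sex": "female", "Age": 30, "prediction": "Risk"}]
)
sex_score = disparate_impact_pct(
    toy_records,
    is_minority=lambda r: r["Sex"] in ["female"],
    is_majority=lambda r: r["Sex"] in ["male"],
    favourable=["No Risk"],
)
print(sex_score, sex_score < 80.0)
print(in_any_range(22, [[18, 25]]))
```

With these toy records the score for Sex is 75.0, which is below an 80% lower limit and would therefore count as an alert in this simplified picture. For a numeric feature such as Age, `in_any_range` can serve as the group test for range-style groups like `[[18, 25]]`.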

### Create Fairness Monitor Instance

In [41]:
target = Target(
    target_type=TargetTypes.SUBSCRIPTION,
    target_id=subscription_id

)
parameters = {
    "features": [
        {"feature": "Sex",
         "majority": ['male'],
         "minority": ['female']
         },
        {"feature": "Age",
         "majority": [[26, 75]],
         "minority": [[18, 25]]
         }
    ],
    "favourable_class": ["No Risk"],
    "unfavourable_class": ["Risk"],
    "min_records": 10
}
thresholds = [{
    "metric_id": "fairness_value",
    "specific_values": [{
            "applies_to": [{
                "key": "feature",
                "type": "tag",
                "value": "Age"
            }],
            "value": 95
        },
        {
            "applies_to": [{
                "key": "feature",
                "type": "tag",
                "value": "Sex"
            }],
            "value": 95
        }
    ],
    "type": "lower_limit",
    "value": 80.0
}]

fairness_monitor_details = wos_client.monitor_instances.create(
    data_mart_id=data_mart_id,
    background_mode=False,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.FAIRNESS.ID,
    target=target,
    parameters=parameters,
    thresholds=thresholds).result




 Waiting for end of monitor instance creation 26b26f38-9015-459e-a32c-1fc039ed400c 




active

---------------------------------------
 Monitor instance successfully created 
---------------------------------------
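
The `thresholds` structure passed above combines a default `value` with per-feature `specific_values`. The sketch below shows one way to read that structure and look up the limit that applies to a given feature. It assumes that a feature-specific entry takes precedence over the default, which is an assumption of this illustration rather than documented SDK behaviour; `effective_threshold` is a hypothetical helper.

```python
from typing import Any, Dict, List, Optional


def effective_threshold(thresholds: List[Dict[str, Any]], metric_id: str, feature: str) -> Optional[float]:
    """Return the feature-specific limit if one is tagged, else the metric's default."""
    for threshold in thresholds:
        if threshold["metric_id"] != metric_id:
            continue
        for specific in threshold.get("specific_values", []):
            for tag in specific.get("applies_to", []):
                if tag.get("key") == "feature" and tag.get("value") == feature:
                    return specific["value"]
        # No feature-specific entry matched, so fall back to the default value.
        return threshold["value"]
    return None


# Same layout as the thresholds list in the cell above.
example_thresholds = [{
    "metric_id": "fairness_value",
    "specific_values": [
        {"applies_to": [{"key": "feature", "type": "tag", "value": "Age"}], "value": 95},
        {"applies_to": [{"key": "feature", "type": "tag", "value": "Sex"}], "value": 95},
    ],
    "type": "lower_limit",
    "value": 80.0,
}]
print(effective_threshold(example_thresholds, "fairness_value", "Age"))
print(effective_threshold(example_thresholds, "fairness_value", "Housing"))
```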




In [42]:
fairness_monitor_instance_id = fairness_monitor_details.metadata.id

### Get Fairness Monitor Instance

In [43]:
wos_client.monitor_instances.show()

0,1,2,3,4,5,6
00000000-0000-0000-0000-000000000000,error,e1010aed-4c93-46c2-9523-82b724409806,subscription,fairness,2024-07-24 08:22:46.300000+00:00,7ab42364-ab61-4a16-91ab-21b44fe0b3ab
00000000-0000-0000-0000-000000000000,active,dec4e572-3ae2-47ed-b61c-38b9b95d2c24,subscription,drift,2024-06-13 11:20:05.738000+00:00,4a6e0c07-923d-4c97-a8b4-8a0de0af1683
00000000-0000-0000-0000-000000000000,active,e1010aed-4c93-46c2-9523-82b724409806,subscription,model_health,2024-07-24 08:21:46.561000+00:00,3d569ebc-482a-48a5-a310-f343d50c21a5
00000000-0000-0000-0000-000000000000,active,e1010aed-4c93-46c2-9523-82b724409806,subscription,performance,2024-07-24 08:21:47.381000+00:00,e2c4f206-193f-44cc-a1da-7c2698a42508
00000000-0000-0000-0000-000000000000,active,cc0a7062-56ef-4feb-91aa-33764217ad77,subscription,model_health,2024-06-20 14:21:30.390000+00:00,7dcdf7ac-7cba-4a07-914f-b6a58b499d6a
00000000-0000-0000-0000-000000000000,active,1b8ebb3c-e69a-4cc1-9a0a-57424546cab3,subscription,model_health,2024-07-01 17:17:45.382000+00:00,1d0924c1-7f1a-419c-b2bc-bff1f9a5d5d0
00000000-0000-0000-0000-000000000000,active,7c0c6db0-0c7f-415b-85f1-cae28daded5b,subscription,mrm,2024-06-14 03:16:37.493000+00:00,87fce793-9360-4782-8b02-1f83bd4b8b2f
00000000-0000-0000-0000-000000000000,active,7c0c6db0-0c7f-415b-85f1-cae28daded5b,subscription,model_health,2024-06-14 03:16:37.532000+00:00,bf46efe7-3db2-4627-92db-e4119cb9bbe0
00000000-0000-0000-0000-000000000000,active,89646855-3162-4b5a-aafe-0a1e8a2a3946,subscription,model_health,2024-06-20 14:12:19.861000+00:00,d52ee902-d6aa-4efb-9640-a9c50a5da6a7
00000000-0000-0000-0000-000000000000,active,a0f86241-8bfc-4322-895a-2597512e1653,subscription,mrm,2024-07-02 12:03:19.615000+00:00,fd149fb2-1685-4dca-badb-0c783a82c174


Note: First 10 records were displayed.


### Get run details
For a production subscription, the initial monitoring run is triggered internally. The cells below check its status.

In [44]:
runs = wos_client.monitor_instances.list_runs(fairness_monitor_instance_id, limit=1).result.to_dict()

In [45]:
fairness_monitoring_run_id = runs["runs"][0]["metadata"]["id"]
run_status = None
while(run_status not in ["finished", "error"]):
    run_details = wos_client.monitor_instances.get_run_details(fairness_monitor_instance_id, fairness_monitoring_run_id).result.to_dict()
    run_status = run_details["entity"]["status"]["state"]
    print('run_status: ', run_status)
    if run_status in ["finished", "error"]:
        break
    time.sleep(10)


run_status:  running
run_status:  running
run_status:  running
run_status:  running
run_status:  running
run_status:  finished
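
The loop above polls until the run reaches a terminal state, but it never gives up if the run stalls. The following is a minimal sketch of the same polling pattern with a timeout. The helper name `wait_for_state` and its parameters are illustrative assumptions and not part of the OpenScale SDK; `fetch_state` stands in for any callable that returns the current state string.

```python
import time


def wait_for_state(fetch_state, terminal_states=("finished", "error"),
                   interval=10.0, timeout=600.0, sleep=time.sleep):
    """Poll fetch_state() until it returns a terminal state or the timeout elapses.

    Hypothetical helper, not an OpenScale SDK function.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = fetch_state()
        print("run_status:", state)
        if state in terminal_states:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still in state {state!r} after {timeout} seconds")
        sleep(interval)


# In this notebook, fetch_state could wrap the call used in the loop above, e.g.
# lambda: wos_client.monitor_instances.get_run_details(
#     fairness_monitor_instance_id, fairness_monitoring_run_id
# ).result.to_dict()["entity"]["status"]["state"]
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real waiting.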


### Fairness run output

In [46]:
wos_client.monitor_instances.get_run_details(fairness_monitor_instance_id, fairness_monitoring_run_id).result.to_dict()

{'metadata': {'id': '461076a4-84a1-479d-bbc3-45aa20826a54',
  'crn': 'crn:v1:bluemix:public:aiopenscale:us-south:a/na:00000000-0000-0000-0000-000000000000:run:461076a4-84a1-479d-bbc3-45aa20826a54',
  'url': '/v2/monitor_instances/26b26f38-9015-459e-a32c-1fc039ed400c/runs/461076a4-84a1-479d-bbc3-45aa20826a54',
  'created_at': '2024-07-24T10:30:38.880000Z',
  'created_by': 'internal-service'},
 'entity': {'triggered_by': 'user',
  'parameters': {'favourable_class': ['No Risk'],
   'features': [{'feature': 'Sex',
     'majority': ['male'],
     'minority': ['female']},
    {'feature': 'Age', 'majority': [[26, 75]], 'minority': [[18, 25]]}],
   'min_records': 10,
   'spark_job_polling_schedule_id': 'e1d01414-c170-4fff-b8c0-3ab4cefe0ae5',
   'total_records_processed': 12,
   'unfavourable_class': ['Risk']},
  'status': {'state': 'finished',
   'queued_at': '2024-07-24T10:30:38.873000Z',
   'started_at': '2024-07-24T10:30:40.576000Z',
   'updated_at': '2024-07-24T10:35:45.485000Z',
   'compl

In [47]:
wos_client.monitor_instances.show_metrics(monitor_instance_id=fairness_monitor_instance_id)

0,1,2,3,4,5,6,7,8,9,10,11
2024-07-24 10:35:45.039309+00:00,fairness_value,fd763aea-3b87-435b-9aa9-31cdb77b2d14,175.00000000000003,95.0,,"['feature:Sex', 'fairness_metric_type:fairness', 'feature_value:female']",fairness,26b26f38-9015-459e-a32c-1fc039ed400c,461076a4-84a1-479d-bbc3-45aa20826a54,subscription,e34b9b87-b6e1-4c53-b92e-cb80dea042be
2024-07-24 10:35:45.039309+00:00,fairness_value,fd763aea-3b87-435b-9aa9-31cdb77b2d14,142.85714285714286,95.0,,"['feature:Age', 'fairness_metric_type:fairness', 'feature_value:18 - 25']",fairness,26b26f38-9015-459e-a32c-1fc039ed400c,461076a4-84a1-479d-bbc3-45aa20826a54,subscription,e34b9b87-b6e1-4c53-b92e-cb80dea042be
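
To help read the two rows above, here is a minimal sketch of how a fairness value and its threshold might be interpreted. It assumes that `fairness_value` is the disparate impact ratio expressed as a percentage (the favourable-outcome rate of the monitored group divided by that of the reference group), and that an entry in `specific_values` overrides the default threshold for the feature it tags. Both points are assumptions about the service's behaviour, the function names are hypothetical, and the counts are made up for illustration rather than taken from this run.

```python
def disparate_impact_percent(minority_favourable, minority_total,
                             majority_favourable, majority_total):
    """Favourable-outcome rate of the monitored group relative to the reference group, in percent."""
    minority_rate = minority_favourable / minority_total
    majority_rate = majority_favourable / majority_total
    return 100.0 * minority_rate / majority_rate


def effective_threshold(threshold, feature):
    """Return the per-feature threshold if one is tagged for the feature, else the default value."""
    for specific in threshold.get("specific_values", []):
        for tag in specific["applies_to"]:
            if tag["key"] == "feature" and tag["value"] == feature:
                return specific["value"]
    return threshold["value"]


# Same shape as the `thresholds` entry configured for the fairness monitor.
threshold = {
    "metric_id": "fairness_value",
    "specific_values": [
        {"applies_to": [{"key": "feature", "type": "tag", "value": "Age"}], "value": 95},
        {"applies_to": [{"key": "feature", "type": "tag", "value": "Sex"}], "value": 95},
    ],
    "type": "lower_limit",
    "value": 80.0,
}

# Illustrative counts only: 3 of 4 monitored records and 4 of 8 reference records are favourable.
score = disparate_impact_percent(3, 4, 4, 8)
limit = effective_threshold(threshold, "Sex")
print(score, limit, score < limit)
```

Under these assumptions, a value above 100 means the monitored group received the favourable outcome at a higher rate than the reference group, so neither row above would breach the 95 lower limit.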


# Configure Explainability <a name="explain"></a>
We provide OpenScale with the training data to enable and configure the explainability features.

In [48]:
target = Target(
    target_type=TargetTypes.SUBSCRIPTION,
    target_id=subscription_id
)
parameters = {
    # Comment out the lines below to disable LIME global explanation.
    # LIME global explanation is available from Cloud Pak for Data version 4.6.4 onwards.
    "global_explanation": {
        "enabled": True,  # Flag to enable global explanation 
        "explanation_method": "lime",
        # "sample_size": 1000, # [Optional] The sample size of records to be used for generating payload data global explanation. If not specified entire data in the payload window is used.
    }
}
explain_monitor_details = wos_client.monitor_instances.create(
    data_mart_id=data_mart_id,
    background_mode=False,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.EXPLAINABILITY.ID,
    target=target,
    parameters=parameters
).result

explain_monitor_details.metadata.id




 Waiting for end of monitor instance creation 57232063-65c8-4519-b96c-e8ab12bc471f 




preparing
active

---------------------------------------
 Monitor instance successfully created 
---------------------------------------




In [49]:
import random

sample_size = 2
# Pick distinct record numbers so that the same scoring ID is never requested twice.
scoring_ids = [scoring_id + '-' + str(n) for n in random.sample(range(1, 101), sample_size)]
print("Running explanations on scoring IDs: {}".format(scoring_ids))

Running explanations on scoring IDs: ['2737144b-cd4e-457f-97da-717a9ba4287b-53', '2737144b-cd4e-457f-97da-717a9ba4287b-24']


In [50]:
explanation_types = ["lime", "contrastive"]
result = wos_client.monitor_instances.explanation_tasks(scoring_ids=scoring_ids, explanation_types=explanation_types, subscription_id=subscription_id).result
print(result)

{
  "metadata": {
    "explanation_task_ids": [
      "d0430ad8-31f9-4088-a2a2-85bb93073d47",
      "37912fdb-8b1e-49bc-b90c-78c18f489bdc"
    ],
    "created_by": "1000331001",
    "created_at": "2024-07-26T05:53:05.086894Z"
  }
}


### Explanation tasks

In [51]:
explanation_task_ids=result.metadata.explanation_task_ids
explanation_task_ids

['d0430ad8-31f9-4088-a2a2-85bb93073d47',
 '37912fdb-8b1e-49bc-b90c-78c18f489bdc']

### Wait for the explanation tasks to complete - all of them

In [52]:
import time
def finish_explanation_tasks():
    finished_explanations = []
    finished_explanation_task_ids = []
    
    # Check whether each explanation task has reached a terminal state.
    # If some tasks are still in progress, sleep for a while and check again.
    # Repeat up to five times so that all tasks have a chance to finish.
    for i in range(0, 5):
        # for each explanation
        print('iteration ' + str(i))
        
        #check status for all explanation tasks
        for explanation_task_id in explanation_task_ids:
            if explanation_task_id not in finished_explanation_task_ids:
                result = wos_client.monitor_instances.get_explanation_tasks(explanation_task_id=explanation_task_id, subscription_id=subscription_id).result
                print(explanation_task_id + ' : ' + result.entity.status.state)
                if result.entity.status.state in ('finished', 'error'):
                    finished_explanation_task_ids.append(explanation_task_id)
                    finished_explanations.append(result)


        # If at least one explanation task has not completed yet, sleep for a while
        # and then re-check only the tasks that are still pending.

        if len(finished_explanation_task_ids) != len(explanation_task_ids):
            print('sleeping for some time..')
            time.sleep(10)
        else:
            break
                    
    return finished_explanations

### You may have to run the cell below several times until every explanation task has either finished or failed.

In [53]:
finished_explanations = finish_explanation_tasks()

iteration 0
d0430ad8-31f9-4088-a2a2-85bb93073d47 : in_progress
37912fdb-8b1e-49bc-b90c-78c18f489bdc : in_progress
sleeping for some time..
iteration 1
d0430ad8-31f9-4088-a2a2-85bb93073d47 : in_progress
37912fdb-8b1e-49bc-b90c-78c18f489bdc : in_progress
sleeping for some time..
iteration 2
d0430ad8-31f9-4088-a2a2-85bb93073d47 : finished
37912fdb-8b1e-49bc-b90c-78c18f489bdc : finished


In [54]:
for i in finished_explanations:
    print(i)

{
  "metadata": {
    "explanation_task_id": "d0430ad8-31f9-4088-a2a2-85bb93073d47",
    "created_by": "1000331001",
    "created_at": "2024-07-26T05:54:14.740694Z",
    "updated_at": "2024-07-26T05:54:14.740781Z"
  },
  "entity": {
    "status": {
      "state": "finished"
    },
    "asset": {
      "id": "438ca544-9bd1-48c2-8e8d-3de4ef4ca79b",
      "name": "WML_IAE4",
      "input_data_type": "structured",
      "problem_type": "binary",
      "deployment": {
        "id": "78a0af9e-1014-4fb1-b22a-5e11f4fd70e7",
        "name": "WML_IAE4"
      }
    },
    "scoring_id": "2737144b-cd4e-457f-97da-717a9ba4287b-53"
  }
}
{
  "metadata": {
    "explanation_task_id": "37912fdb-8b1e-49bc-b90c-78c18f489bdc",
    "created_by": "1000331001",
    "created_at": "2024-07-26T05:54:14.741106Z",
    "updated_at": "2024-07-26T05:54:14.741143Z"
  },
  "entity": {
    "status": {
      "state": "finished"
    },
    "asset": {
      "id": "438ca544-9bd1-48c2-8e8d-3de4ef4ca79b",
      "name": "WML_IA

In [55]:
len(finished_explanations)

2

In [56]:
def construct_explanation_features_map(feature_name, feature_weight):
    if feature_name in explanation_features_map:
        explanation_features_map[feature_name].append(feature_weight)
    else:
        explanation_features_map[feature_name] = [feature_weight]

In [57]:
explanation_features_map = {}
for result in finished_explanations:
    print('\n>>>>>>>>>>>>>>>>>>>>>>\n')
    print('explanation task: ' + str(result.metadata.explanation_task_id) + ', perturbed:' + str(result.entity.perturbed))
    if result.entity.explanations is not None:
        explanations = result.entity.explanations
        for explanation in explanations:
            if 'predictions' in explanation:
                predictions = explanation['predictions']
                for prediction in predictions:
                    predicted_value = prediction['value']
                    probability = prediction['probability']
                    print('prediction : ' + str(predicted_value) + ', probability : ' + str(probability))
                    if 'explanation_features' in prediction:
                        explanation_features = prediction['explanation_features']
                        for explanation_feature in explanation_features:
                            feature_name = explanation_feature['feature_name']
                            feature_weight = explanation_feature['weight']
                            # Only non-negative weights are collected and plotted.
                            if feature_weight >= 0:
                                feature_weight_percent = round(feature_weight * 100, 2)
                                print(str(feature_name) + ' : ' + str(feature_weight_percent))
                                construct_explanation_features_map(feature_name, feature_weight_percent)
        print('\n>>>>>>>>>>>>>>>>>>>>>>\n')
explanation_features_map


>>>>>>>>>>>>>>>>>>>>>>

explanation task: d0430ad8-31f9-4088-a2a2-85bb93073d47, perturbed:None

>>>>>>>>>>>>>>>>>>>>>>

explanation task: 37912fdb-8b1e-49bc-b90c-78c18f489bdc, perturbed:None


{}
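
The aggregation in the cell above can be factored into a function that takes plain dictionaries, which makes it possible to try the logic without a live explanation result. This is a minimal sketch: the function name and the sample payload are hypothetical, and only the keys that the loop above reads (`predictions`, `explanation_features`, `feature_name`, `weight`) are assumed.

```python
from collections import defaultdict


def collect_feature_weights(explanations):
    """Map each feature name to the list of its non-negative weights, in percent."""
    weights = defaultdict(list)
    for explanation in explanations or []:
        for prediction in explanation.get("predictions", []):
            for feature in prediction.get("explanation_features", []):
                if feature["weight"] >= 0:
                    weights[feature["feature_name"]].append(round(feature["weight"] * 100, 2))
    return dict(weights)


# Hypothetical payload shaped like the fields read by the loop above.
sample = [{
    "predictions": [{
        "value": "No Risk",
        "probability": 0.8,
        "explanation_features": [
            {"feature_name": "LoanDuration", "weight": 0.25},
            {"feature_name": "Age", "weight": -0.5},
            {"feature_name": "LoanDuration", "weight": 0.125},
        ],
    }],
}]
print(collect_feature_weights(sample))
```

Passing `None`, as happens when an explanation carries no `explanations` entry, yields an empty map, which matches the `{}` shown above.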

In [58]:
import matplotlib.pyplot as plt
for key in explanation_features_map.keys():
    #plot_graph(key, explanation_features_map[key])
    values = explanation_features_map[key]
    plt.title(key)
    plt.ylabel('Weight')
    plt.bar(range(len(values)), values)
    plt.show()

# Quality monitoring and feedback logging <a name="quality"></a>

## Enable quality monitoring

The code below turns on the quality (accuracy) monitor and sets an alert threshold of 80% (`0.80`) on the `area_under_roc` metric. OpenScale will show an alert on the dashboard if the model accuracy measurement (area under the ROC curve, in the case of a binary classifier) falls below this threshold. The ten-second wait that allows the payload logging table to be set up before monitors are enabled is commented out in the cell; uncomment `time.sleep(10)` if the table is not ready yet.

The parameter supplied, `min_feedback_data_size`, specifies the minimum number of feedback records OpenScale needs before it calculates a new measurement. The quality monitor runs hourly, but the accuracy reading in the dashboard will not change until the configured minimum of 90 feedback records has been added, via the user interface, the Python client, or the supplied feedback endpoint.

In [59]:
import time

#time.sleep(10)
target = Target(
        target_type=TargetTypes.SUBSCRIPTION,
        target_id=subscription_id
)
parameters = {
    "min_feedback_data_size": 90
}
thresholds = [
                {
                    "metric_id": "area_under_roc",
                    "type": "lower_limit",
                    "value": .80
                }
            ]
quality_monitor_details = wos_client.monitor_instances.create(
    data_mart_id=data_mart_id,
    background_mode=False,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.QUALITY.ID,
    target=target,
    parameters=parameters,
    thresholds=thresholds
).result




 Waiting for end of monitor instance creation e36dcc66-2407-6729-8e8c-5612398e46a7 




preparing
active

---------------------------------------
 Monitor instance successfully created 
---------------------------------------




In [60]:
quality_monitor_instance_id = quality_monitor_details.metadata.id
quality_monitor_instance_id

'e36dcc66-2407-6729-8e8c-5612398e46a7'

## Feedback logging

The code below downloads and stores enough feedback data to meet the minimum threshold so that OpenScale can calculate a new accuracy measurement. It then kicks off the accuracy monitor. The monitors run hourly, or can be initiated via the Python API, the REST API, or the graphical user interface.

In [61]:
!rm -f additional_feedback_data_v2.json
!wget https://raw.githubusercontent.com/IBM/watson-openscale-samples/main/IBM%20Cloud/WML/assets/data/credit_risk/additional_feedback_data_v2.json

--2024-08-01 06:16:00--  https://raw.githubusercontent.com/IBM/watson-openscale-samples/main/IBM%20Cloud/WML/assets/data/credit_risk/additional_feedback_data_v2.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8001::154, 2606:50c0:8002::154, 2606:50c0:8003::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8001::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 50890 (50K) [text/plain]
Saving to: ‘additional_feedback_data_v2.json’


2024-08-01 06:16:01 (1.64 MB/s) - ‘additional_feedback_data_v2.json’ saved [50890/50890]



## Get feedback logging dataset ID

In [62]:
feedback_dataset_id = None
feedback_dataset = wos_client.data_sets.list(type=DataSetTypes.FEEDBACK,
                                             target_target_id=subscription_id,
                                             target_target_type=TargetTypes.SUBSCRIPTION).result
# Guard against an empty list so that a missing data set is reported instead of raising IndexError.
if feedback_dataset.data_sets:
    feedback_dataset_id = feedback_dataset.data_sets[0].metadata.id
else:
    print("Feedback data set not found. Please check quality monitor status.")

In [63]:
with open('additional_feedback_data_v2.json') as feedback_file:
    additional_feedback_data = json.load(feedback_file)

In [64]:
wos_client.data_sets.store_records(feedback_dataset_id, request_body=additional_feedback_data, background_mode=False)




 Waiting for end of storing records with request id: 7a62e26f-1e79-4253-bf9c-857e1b54d00b 




active

---------------------------------------
 Successfully finished storing records 
---------------------------------------




In [65]:
wos_client.data_sets.get_records_count(data_set_id=feedback_dataset_id)

100

In [66]:
run_details = wos_client.monitor_instances.run(monitor_instance_id=quality_monitor_instance_id, background_mode=False).result




 Waiting for end of monitoring run c36d23ee-80d7-4409-be08-b8303f52cf05 




finished

---------------------------
 Successfully finished run 
---------------------------




In [67]:
wos_client.monitor_instances.show_metrics(monitor_instance_id=quality_monitor_instance_id)

0,1,2,3,4,5,6,7,8,9,10,11
2024-07-16 06:13:19.969000+00:00,true_positive_rate,1267b942-5fe0-423c-8263-b32816c30099,0.3636363636363636,,,['model_type:original'],quality,f16dff88-1305-4584-8c7d-3056728e79a4,92b130ed-fc31-43c2-8b0e-a5f8d3a91345,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:13:19.969000+00:00,area_under_roc,1267b942-5fe0-423c-8263-b32816c30099,0.6587412587412588,0.8,,['model_type:original'],quality,f16dff88-1305-4584-8c7d-3056728e79a4,92b130ed-fc31-43c2-8b0e-a5f8d3a91345,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:13:19.969000+00:00,precision,1267b942-5fe0-423c-8263-b32816c30099,0.8,,,['model_type:original'],quality,f16dff88-1305-4584-8c7d-3056728e79a4,92b130ed-fc31-43c2-8b0e-a5f8d3a91345,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:13:19.969000+00:00,matthews_correlation_coefficient,1267b942-5fe0-423c-8263-b32816c30099,0.4167242637192667,,,['model_type:original'],quality,f16dff88-1305-4584-8c7d-3056728e79a4,92b130ed-fc31-43c2-8b0e-a5f8d3a91345,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:13:19.969000+00:00,f1_measure,1267b942-5fe0-423c-8263-b32816c30099,0.5000000000000001,,,['model_type:original'],quality,f16dff88-1305-4584-8c7d-3056728e79a4,92b130ed-fc31-43c2-8b0e-a5f8d3a91345,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:13:19.969000+00:00,accuracy,1267b942-5fe0-423c-8263-b32816c30099,0.7551020408163265,,,['model_type:original'],quality,f16dff88-1305-4584-8c7d-3056728e79a4,92b130ed-fc31-43c2-8b0e-a5f8d3a91345,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:13:19.969000+00:00,label_skew,1267b942-5fe0-423c-8263-b32816c30099,0.6909336273400493,,,['model_type:original'],quality,f16dff88-1305-4584-8c7d-3056728e79a4,92b130ed-fc31-43c2-8b0e-a5f8d3a91345,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:13:19.969000+00:00,gini_coefficient,1267b942-5fe0-423c-8263-b32816c30099,0.3174825174825175,,,['model_type:original'],quality,f16dff88-1305-4584-8c7d-3056728e79a4,92b130ed-fc31-43c2-8b0e-a5f8d3a91345,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:13:19.969000+00:00,log_loss,1267b942-5fe0-423c-8263-b32816c30099,0.4493805793027406,,,['model_type:original'],quality,f16dff88-1305-4584-8c7d-3056728e79a4,92b130ed-fc31-43c2-8b0e-a5f8d3a91345,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:13:19.969000+00:00,false_positive_rate,1267b942-5fe0-423c-8263-b32816c30099,0.0461538461538461,,,['model_type:original'],quality,f16dff88-1305-4584-8c7d-3056728e79a4,92b130ed-fc31-43c2-8b0e-a5f8d3a91345,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68


Note: First 10 records were displayed.
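
To read the table above against the configured alert threshold, a small check is enough. The sketch below uses the `area_under_roc` value and the `0.80` lower limit shown in this notebook, and also checks the standard identity `gini = 2 * AUC - 1` against the reported `gini_coefficient`. The function name is an illustrative assumption, not an SDK call.

```python
import math


def violates_lower_limit(value, lower_limit):
    """A lower-limit threshold is violated when the metric falls below it."""
    return value < lower_limit


area_under_roc = 0.6587412587412588   # copied from the metrics output above
gini_reported = 0.3174825174825175    # copied from the metrics output above

print(violates_lower_limit(area_under_roc, 0.80))
print(math.isclose(2 * area_under_roc - 1, gini_reported, rel_tol=1e-9))
```

With these values the area under the ROC curve is below the limit, so this run would be expected to raise a quality alert.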


# Drift configuration <a name="drift"></a>

## Drift detection model generation

Please update the score function, which is used to generate the drift detection model that the drift monitor relies on. Generating the model can take some time, depending on the size of the training dataset. The score function must output two arrays: an array of probabilities and an array of model predictions. The `score` function in the cell below returns them in that order.

- Make sure that the data type of the selected "class label" column and of the prediction column are the same. For example, if the class label is numeric, the prediction array must also be numeric.

- Each entry of the probability array must contain the probabilities of all unique class labels.
  For example, if `model_type=multiclass` and the unique class labels are A, B, C and D, each entry in the probability array must be an array of size 4, e.g. `[[50,30,10,10], [40,20,30,10], ...]`.

**Note:**
- *Add a `score` method that outputs the prediction column array and the probability column array.*
- *The data type of the label column and of the prediction column must be the same. Make sure that the label column and the prediction column array contain the same unique class labels.*
- **Please update the score function below with the help of templates documented [here](https://github.com/IBM-Watson/aios-data-distribution/blob/master/Score%20function%20templates%20for%20drift%20detection.md)**
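
The requirements above can be checked before the function is handed to the drift trainer. The following is a minimal sketch of such a check; `validate_score_output` is a hypothetical helper and not part of `ibm_wos_utils`. It works on plain lists as well as NumPy arrays, and relies on a membership test to catch both unknown labels and mismatched data types.

```python
def validate_score_output(probabilities, predictions, class_labels, num_rows):
    """Raise ValueError if the score output breaks one of the requirements listed above."""
    if len(probabilities) != num_rows or len(predictions) != num_rows:
        raise ValueError("expected one probability entry and one prediction per input row")
    for row in probabilities:
        if len(row) != len(class_labels):
            raise ValueError("each probability entry needs one value per unique class label")
    for prediction in predictions:
        # A numeric prediction never equals a string label, so this also catches type mismatches.
        if prediction not in class_labels:
            raise ValueError(f"prediction {prediction!r} is not one of the class labels")
    return True


# Illustrative values for a binary model with string labels.
probabilities = [[0.6, 0.4], [0.2, 0.8]]
predictions = ["No Risk", "Risk"]
print(validate_score_output(probabilities, predictions, ["Risk", "No Risk"], num_rows=2))
```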

In [68]:
import pandas as pd

df = pd.read_csv("german_credit_data_biased_training.csv")
df.head()

Unnamed: 0,CheckingStatus,LoanDuration,CreditHistory,LoanPurpose,LoanAmount,ExistingSavings,EmploymentDuration,InstallmentPercent,Sex,OthersOnLoan,...,OwnsProperty,Age,InstallmentPlans,Housing,ExistingCreditsCount,Job,Dependents,Telephone,ForeignWorker,Risk
0,0_to_200,31,credits_paid_to_date,other,1889,100_to_500,less_1,3,female,none,...,savings_insurance,32,none,own,1,skilled,1,none,yes,No Risk
1,less_0,18,credits_paid_to_date,car_new,462,less_100,1_to_4,2,female,none,...,savings_insurance,37,stores,own,2,skilled,1,none,yes,No Risk
2,less_0,15,prior_payments_delayed,furniture,250,less_100,1_to_4,2,male,none,...,real_estate,28,none,own,2,skilled,1,yes,no,No Risk
3,0_to_200,28,credits_paid_to_date,retraining,3693,less_100,greater_7,3,male,none,...,savings_insurance,32,none,own,1,skilled,1,none,yes,No Risk
4,no_checking,28,prior_payments_delayed,education,6235,500_to_1000,greater_7,3,male,none,...,unknown,57,none,own,2,skilled,1,none,yes,Risk


In [69]:
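import numpy as np


def score(training_data_frame):
    # Placeholder score function: it returns random probabilities and random
    # predictions instead of calling the model. Replace the body with calls to
    # your own scoring endpoint, using the templates linked above.
    num = len(training_data_frame)
    output_classes = 2

    # One row of class probabilities per input record; each row sums to 1.
    # probability_array = np.array([[1/output_classes]*output_classes for _ in range(num)])
    # probability_array = np.array(list([0.6, 0.4] for _ in range(num)))
    probability_array = np.random.dirichlet(alpha=np.ones(output_classes), size=num)

    # One predicted class label per input record.
    # prediction_vector = np.array(["No Risk"]*num)
    prediction_vector = np.random.choice(np.array(["Risk", "No Risk"], dtype="str"), num)

    return probability_array, prediction_vector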

### Define the drift detection input

In [70]:
drift_detection_input = {
    "feature_columns": feature_columns,
    "categorical_columns": cat_features,
    "label_column": label_column,
    "problem_type": model_type
}
print(drift_detection_input)

{'feature_columns': ['CheckingStatus', 'LoanDuration', 'CreditHistory', 'LoanPurpose', 'LoanAmount', 'ExistingSavings', 'EmploymentDuration', 'InstallmentPercent', 'Sex', 'OthersOnLoan', 'CurrentResidenceDuration', 'OwnsProperty', 'Age', 'InstallmentPlans', 'Housing', 'ExistingCreditsCount', 'Job', 'Dependents', 'Telephone', 'ForeignWorker'], 'categorical_columns': ['CheckingStatus', 'CreditHistory', 'LoanPurpose', 'ExistingSavings', 'EmploymentDuration', 'Sex', 'OthersOnLoan', 'OwnsProperty', 'InstallmentPlans', 'Housing', 'Job', 'Telephone', 'ForeignWorker'], 'label_column': 'Risk', 'problem_type': 'binary'}


### Generate drift detection model

In [71]:
!rm drift_detection_model.tar.gz

rm: drift_detection_model.tar.gz: No such file or directory


In [72]:
from ibm_wos_utils.drift.drift_trainer import DriftTrainer
drift_trainer = DriftTrainer(df,drift_detection_input)
if model_type != "regression":
    #Note: batch_size can be customized by user as per the training data size
    drift_trainer.generate_drift_detection_model(score,batch_size=df.shape[0])

#Note: Two column constraints are not computed beyond two_column_learner_limit(default set to 200)
#User can adjust the value depending on the requirement
drift_trainer.learn_constraints(two_column_learner_limit=200)
drift_trainer.create_archive()

Scoring training dataframe...: 100%|██████████| 4000/4000 [00:00<00:00, 2348434.49rows/s]

Optimising Drift Detection Model...: 100%|██████████| 40/40 [00:27<00:00,  1.47models/s]
Scoring training dataframe...: 100%|██████████| 1000/1000 [00:00<00:00, 1562124.39rows/s]
Computing feature stats...: 100%|██████████| 20/20 [00:00<00:00, 665.60features/s]
Learning single feature constraints...: 100%|██████████| 21/21 [00:00<00:00, 1093.23constraints/s]
Learning two feature constraints...: 100%|██████████| 209/209 [00:01<00:00, 177.34constraints/s]


In [73]:
!ls -al

In [74]:
filename = 'drift_detection_model.tar.gz'

### Upload the drift detection model to OpenScale subscription

In [75]:
wos_client.monitor_instances.upload_drift_model(
        model_path=filename,
        archive_name=filename,
        data_mart_id=data_mart_id,
        subscription_id=subscription_id,
        enable_data_drift=True,
        enable_model_drift=True
     )

### Delete the existing drift monitor instance for the subscription

In [76]:
monitor_instances = wos_client.monitor_instances.list().result.monitor_instances
for monitor_instance in monitor_instances:
    monitor_def_id=monitor_instance.entity.monitor_definition_id
    if monitor_def_id == "drift" and monitor_instance.entity.target.target_id == subscription_id:
        wos_client.monitor_instances.delete(monitor_instance.metadata.id)
        print('Deleted existing drift monitor instance with id: ', monitor_instance.metadata.id)

In [77]:
target = Target(
    target_type=TargetTypes.SUBSCRIPTION,
    target_id=subscription_id

)
parameters = {
    "min_samples": 100,
    "drift_threshold": 0.1,
    "train_drift_model": False,
    "enable_model_drift": True,
    "enable_data_drift": True
}

drift_monitor_details = wos_client.monitor_instances.create(
    data_mart_id=data_mart_id,
    background_mode=False,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.DRIFT.ID,
    target=target,
    parameters=parameters
).result

drift_monitor_instance_id = drift_monitor_details.metadata.id
drift_monitor_instance_id




 Waiting for end of monitor instance creation b9a3b5d4-16bc-400c-9e6c-0e9f593101c7 




preparing
active

---------------------------------------
 Monitor instance successfully created 
---------------------------------------




### Drift run

In [78]:
drift_run_details = wos_client.monitor_instances.run(monitor_instance_id=drift_monitor_instance_id, background_mode=False)




 Waiting for end of monitoring run 31c96d8a-9d38-4695-825e-466e7265cb71 




finished

---------------------------
 Successfully finished run 
---------------------------




In [79]:
time.sleep(5)
wos_client.monitor_instances.show_metrics(monitor_instance_id=drift_monitor_instance_id)

0,1,2,3,4,5,6,7,8,9,10,11
2024-07-16 06:13:19.969000+00:00,true_positive_rate,1267b942-5fe0-423c-8263-b32816c30099,0.3636363636363636,,,['model_type:original'],quality,f16dff88-1305-4584-8c7d-3056728e79a4,92b130ed-fc31-43c2-8b0e-a5f8d3a91345,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:13:19.969000+00:00,area_under_roc,1267b942-5fe0-423c-8263-b32816c30099,0.6587412587412588,0.8,,['model_type:original'],quality,f16dff88-1305-4584-8c7d-3056728e79a4,92b130ed-fc31-43c2-8b0e-a5f8d3a91345,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:13:19.969000+00:00,precision,1267b942-5fe0-423c-8263-b32816c30099,0.8,,,['model_type:original'],quality,f16dff88-1305-4584-8c7d-3056728e79a4,92b130ed-fc31-43c2-8b0e-a5f8d3a91345,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:13:19.969000+00:00,matthews_correlation_coefficient,1267b942-5fe0-423c-8263-b32816c30099,0.4167242637192667,,,['model_type:original'],quality,f16dff88-1305-4584-8c7d-3056728e79a4,92b130ed-fc31-43c2-8b0e-a5f8d3a91345,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:13:19.969000+00:00,f1_measure,1267b942-5fe0-423c-8263-b32816c30099,0.5000000000000001,,,['model_type:original'],quality,f16dff88-1305-4584-8c7d-3056728e79a4,92b130ed-fc31-43c2-8b0e-a5f8d3a91345,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:13:19.969000+00:00,accuracy,1267b942-5fe0-423c-8263-b32816c30099,0.7551020408163265,,,['model_type:original'],quality,f16dff88-1305-4584-8c7d-3056728e79a4,92b130ed-fc31-43c2-8b0e-a5f8d3a91345,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:13:19.969000+00:00,label_skew,1267b942-5fe0-423c-8263-b32816c30099,0.6909336273400493,,,['model_type:original'],quality,f16dff88-1305-4584-8c7d-3056728e79a4,92b130ed-fc31-43c2-8b0e-a5f8d3a91345,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:13:19.969000+00:00,gini_coefficient,1267b942-5fe0-423c-8263-b32816c30099,0.3174825174825175,,,['model_type:original'],quality,f16dff88-1305-4584-8c7d-3056728e79a4,92b130ed-fc31-43c2-8b0e-a5f8d3a91345,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:13:19.969000+00:00,log_loss,1267b942-5fe0-423c-8263-b32816c30099,0.4493805793027406,,,['model_type:original'],quality,f16dff88-1305-4584-8c7d-3056728e79a4,92b130ed-fc31-43c2-8b0e-a5f8d3a91345,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:13:19.969000+00:00,false_positive_rate,1267b942-5fe0-423c-8263-b32816c30099,0.0461538461538461,,,['model_type:original'],quality,f16dff88-1305-4584-8c7d-3056728e79a4,92b130ed-fc31-43c2-8b0e-a5f8d3a91345,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68


Note: First 10 records were displayed.


## Summary

As part of this notebook, we performed the following:
* Created a subscription to a custom ML endpoint.
* Scored the custom ML provider with 100 records.
* Using the scoring request and the scoring response, called the DataSets SDK method to store the payload logging records in the data mart, setting the scoring_id attribute while doing so.
* Configured and executed the fairness monitor, and viewed the fairness metrics output.
* Configured the explainability monitor.
* Randomly selected transactions (`sample_size`, set to 2 in this notebook) for which to get the prediction explanation.
* Submitted explainability tasks for the selected scoring IDs, and waited for their completion.
* Finally, composed a map from each feature to its weights across transactions, and plotted it.
* For example:
```
{'ForeignWorker': [33.29, 5.23],
 'OthersOnLoan': [15.96, 19.97, 12.76],
 'OwnsProperty': [15.43, 3.92, 4.44, 10.36],
 'Dependents': [9.06],
 'InstallmentPercent': [9.05],
 'CurrentResidenceDuration': [8.74, 13.15, 12.1, 10.83],
 'Sex': [2.96, 12.76],
 'InstallmentPlans': [2.4, 5.67, 6.57],
 'Age': [2.28, 8.6, 11.26],
 'Job': [0.84],
 'LoanDuration': [15.02, 10.87, 18.91, 12.72],
 'EmploymentDuration': [14.02, 14.05, 12.1],
 'LoanAmount': [9.28, 12.42, 7.85],
 'Housing': [4.35],
 'CreditHistory': [6.5]}
 ```

The map above can be read as follows:
* LoanDuration, CurrentResidenceDuration and OwnsProperty appear most often across transactions (four weights each), so they contribute to the most predictions. Their weights for each prediction are listed in the map.
* CreditHistory, Housing, Job, InstallmentPercent and Dependents contribute least often (one weight each); their weights are listed in the map as well.
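
The reading above can be reproduced from the example map. The following is a minimal sketch that ranks features by how many weights were recorded for them, using the total weight as a tie-breaker. The ranking rule and the function name `rank_features` are assumptions made for illustration and are not something OpenScale computes.

```python
explanation_features_map = {
    'ForeignWorker': [33.29, 5.23],
    'OthersOnLoan': [15.96, 19.97, 12.76],
    'OwnsProperty': [15.43, 3.92, 4.44, 10.36],
    'Dependents': [9.06],
    'InstallmentPercent': [9.05],
    'CurrentResidenceDuration': [8.74, 13.15, 12.1, 10.83],
    'Sex': [2.96, 12.76],
    'InstallmentPlans': [2.4, 5.67, 6.57],
    'Age': [2.28, 8.6, 11.26],
    'Job': [0.84],
    'LoanDuration': [15.02, 10.87, 18.91, 12.72],
    'EmploymentDuration': [14.02, 14.05, 12.1],
    'LoanAmount': [9.28, 12.42, 7.85],
    'Housing': [4.35],
    'CreditHistory': [6.5],
}


def rank_features(features_map):
    """Sort feature names by number of recorded weights, then by total weight, descending."""
    return sorted(
        features_map,
        key=lambda name: (len(features_map[name]), sum(features_map[name])),
        reverse=True,
    )


ranked = rank_features(explanation_features_map)
most_frequent = ranked[:3]
single_weight = [name for name in ranked if len(explanation_features_map[name]) == 1]
print(most_frequent)
print(single_weight)
```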

* Configured the quality monitor, uploaded feedback data, and then ran the quality monitor.
* For drift monitoring, created the drift detection model and uploaded it to the OpenScale subscription.
* Executed the drift monitor.

Thank you for working through this tutorial notebook.

Author: Ravi Chamarthy (ravi.chamarthy@in.ibm.com)