<img src="https://github.com/pmservice/ai-openscale-tutorials/raw/master/notebooks/images/banner.png" align="left" alt="banner">

# Working with Watson Machine Learning

## This notebook should be run in a Watson Studio project, using the **IBM Runtime 23.1 on Python 3.10 XS** runtime environment. **If you are viewing this in Watson Studio and do not see the required runtime environment in the upper right corner of your screen, please update the runtime now.** It requires service credentials for the following Cloud services:
  * Watson OpenScale
  * Watson Machine Learning
  
If you have a paid Cloud account, you may also provision a **Databases for PostgreSQL** or **Db2 Warehouse** service to take full advantage of integration with Watson Studio. If you choose not to provision this paid service, you can use the free internal PostgreSQL storage with OpenScale, but will not be able to configure continuous learning for your model.

The notebook will train, create and deploy a House Price regression model, and configure OpenScale to monitor that deployment in the OpenScale Insights dashboard.

### Contents

- [Setup](#setup)
- [Model building and deployment](#model)
- [OpenScale configuration](#openscale)
- [Quality monitor and feedback logging](#quality)
- [Fairness, drift monitoring and explanations](#fairness)

# Setup <a name="setup"></a>

## Package installation

In [1]:
import warnings
warnings.filterwarnings('ignore')


In [None]:
# If you are not using IBM Watson Studio to run your notebook, install the packages below

# !pip install pandas --no-cache | tail -n 1
# !pip install requests --no-cache | tail -n 1
# !pip install numpy --user --no-cache | tail -n 1
# !pip install SciPy --no-cache | tail -n 1
# !pip install lime --no-cache | tail -n 1
# !pip install xgboost --no-cache | tail -n 1


!pip install --upgrade ibm-watson-machine-learning --user | tail -n 1
!pip install --upgrade "ibm-watson-openscale~=3.0.34" --no-cache | tail -n 1

## Provision services and configure credentials

If you have not already, provision an instance of IBM Watson OpenScale using the [OpenScale link in the Cloud catalog](https://cloud.ibm.com/catalog/services/watson-openscale).

Your Cloud API key can be generated by going to the [**Users** section of the Cloud console](https://cloud.ibm.com/iam#/users). From that page, click your name, scroll down to the **API Keys** section, and click **Create an IBM Cloud API key**. Give your key a name and click **Create**, then copy the created key and paste it below.

**NOTE:** You can also create an IBM Cloud API key using the IBM Cloud CLI.

How to install the IBM Cloud CLI (formerly Bluemix CLI): [instructions](https://console.bluemix.net/docs/cli/reference/ibmcloud/download_cli.html#install_use)

How to create an API key using the CLI:
```
ibmcloud login --sso
ibmcloud iam api-key-create my_key
```

In [2]:
CLOUD_API_KEY = "***"
IAM_URL="https://iam.ng.bluemix.net/oidc/token"

If you prefer to authenticate with an IAM token instead of the API key, generate the token from the Cloud API key created above and use it in the IAM token credentials example below.

### WML credentials example with API key

In [3]:
WML_CREDENTIALS = {
                   "url": "https://us-south.ml.cloud.ibm.com",
                   "apikey": CLOUD_API_KEY
}

### WML credentials example using an IAM token

**NOTE**: If IAM_TOKEN is used for authentication and you receive an unauthorized or expired-token error at any step, create a new token and re-initialize the client authentication.

In [4]:
# # Uncomment this cell if you want to use IAM_TOKEN
# import requests
# from requests.auth import HTTPBasicAuth
# def generate_access_token():
#     headers={}
#     headers["Content-Type"] = "application/x-www-form-urlencoded"
#     headers["Accept"] = "application/json"
#     auth = HTTPBasicAuth("bx", "bx")
#     data = {
#         "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
#         "apikey": CLOUD_API_KEY
#     }
#     response = requests.post(IAM_URL, data=data, headers=headers, auth=auth)
#     json_data = response.json()
#     iam_access_token = json_data['access_token']
#     return iam_access_token

In [5]:
# Uncomment this cell if you want to use IAM_TOKEN
# IAM_TOKEN = generate_access_token()
# WML_CREDENTIALS = {
#     "url": "https://us-south.ml.cloud.ibm.com",
#     "token": IAM_TOKEN
# }
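IAM access tokens expire (the IAM token response carries an `expires_in` value in seconds), which is why the note above suggests regenerating them. A minimal sketch of a cache that refreshes the token shortly before it expires, assuming a fetch function such as `generate_access_token` is adapted to return the full token JSON instead of only the token string. `TokenCache` and `refresh_margin` are hypothetical names, and the fake fetcher below stands in for the real IAM call.

```python
import time
from typing import Callable, Dict, Optional


class TokenCache:
    """Hypothetical helper: caches an IAM access token and refreshes it before expiry."""

    def __init__(self, fetch: Callable[[], Dict], refresh_margin: float = 60.0,
                 clock: Callable[[], float] = time.time):
        self._fetch = fetch                    # returns a dict with 'access_token' and 'expires_in'
        self._refresh_margin = refresh_margin  # refresh this many seconds before expiry
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        # Fetch a new token if none is cached or the cached one is about to expire
        if self._token is None or now >= self._expires_at - self._refresh_margin:
            data = self._fetch()
            self._token = data["access_token"]
            self._expires_at = now + float(data["expires_in"])
        return self._token


# Usage with a fake fetcher and a controllable clock (no network access)
calls = []
fake_now = [0.0]


def fake_fetch():
    calls.append(1)
    return {"access_token": "token-%d" % len(calls), "expires_in": 3600}


cache = TokenCache(fake_fetch, clock=lambda: fake_now[0])
first = cache.get()    # fetches token-1
fake_now[0] = 1000.0
second = cache.get()   # still valid, reuses token-1
fake_now[0] = 3590.0
third = cache.get()    # within 60 s of expiry, fetches token-2
```

Injecting the clock keeps the refresh logic testable without waiting for a real token to expire.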

### Cloud object storage details

In the next cells, you need to paste your Cloud Object Storage (COS) credentials. If you haven't worked with COS yet, please visit the [getting started with COS tutorial](https://cloud.ibm.com/docs/cloud-object-storage?topic=cloud-object-storage-getting-started).  
You can find the `COS_API_KEY_ID` and `COS_RESOURCE_CRN` values under **_Service Credentials_** in the menu of your COS instance. The COS service credentials you use must be created with the _Role_ parameter set to Writer. Later, the training data file is uploaded to a bucket of your instance and used as the training data reference in the OpenScale subscription.  
The `COS_ENDPOINT` value can be found in the **_Endpoint_** field of the menu.

In [6]:
COS_API_KEY_ID = "***"
COS_RESOURCE_CRN = "***" # eg "crn:v1:bluemix:public:cloud-object-storage:global:a/3bf0d9003abfb5d29761c3e97696b71c:d6f04d83-6c4f-4a62-a165-696756d63903::"
COS_ENDPOINT = "***" # Current list available at https://control.cloud-object-storage.cloud.ibm.com/v2/endpoints
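A common mistake is pasting the wrong value into `COS_RESOURCE_CRN`. IBM Cloud CRNs are colon-separated (`crn:version:cname:ctype:service-name:location:scope:service-instance:resource-type:resource`), so a quick structural check can catch a mis-paste before any API call. A minimal sketch; `parse_crn` is a hypothetical helper, and the sample CRN is the placeholder example from the comment in the cell above.

```python
CRN_FIELDS = ["crn", "version", "cname", "ctype", "service_name",
              "location", "scope", "service_instance", "resource_type", "resource"]


def parse_crn(crn: str) -> dict:
    """Split an IBM Cloud CRN into its ten named segments (hypothetical helper)."""
    parts = crn.split(":")
    if len(parts) != len(CRN_FIELDS) or parts[0] != "crn":
        raise ValueError("not a valid CRN: expected 10 ':'-separated segments starting with 'crn'")
    return dict(zip(CRN_FIELDS, parts))


example = ("crn:v1:bluemix:public:cloud-object-storage:global:"
           "a/3bf0d9003abfb5d29761c3e97696b71c:d6f04d83-6c4f-4a62-a165-696756d63903::")
info = parse_crn(example)
# A COS instance CRN should name the cloud-object-storage service
is_cos = info["service_name"] == "cloud-object-storage"
```

In the notebook you could run `parse_crn(COS_RESOURCE_CRN)` right after pasting your own value.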

In [7]:
BUCKET_NAME = "llm-test" 
training_data_file_name="house_price_regression.csv"

This tutorial can use Databases for PostgreSQL, Db2 Warehouse, or a free internal version of PostgreSQL to create a datamart for OpenScale.

If you have previously configured OpenScale, it will use your existing datamart, and not interfere with any models you are currently monitoring. Do not update the cell below.

If you do not have a paid Cloud account or would prefer not to provision this paid service, you may use the free internal PostgreSQL service with OpenScale. Do not update the cell below.

To provision a new instance of Db2 Warehouse, locate [Db2 Warehouse in the Cloud catalog](https://cloud.ibm.com/catalog/services/db2-warehouse), give your service a name, and click **Create**. Once your instance is created, click the **Service Credentials** link on the left side of the screen. Click the **New credential** button, give your credentials a name, and click **Add**. Your new credentials can be accessed by clicking the **View credentials** button. Copy and paste your Db2 Warehouse credentials into the cell below.

To provision a new instance of Databases for PostgreSQL, locate [Databases for PostgreSQL in the Cloud catalog](https://cloud.ibm.com/catalog/services/databases-for-postgresql), give your service a name, and click **Create**. Once your instance is created, click the **Service Credentials** link on the left side of the screen. Click the **New credential** button, give your credentials a name, and click **Add**. Your new credentials can be accessed by clicking the **View credentials** button. Copy and paste your Databases for PostgreSQL credentials into the cell below.

In [8]:
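DB_CREDENTIALS = None
#DB_CREDENTIALS= {"hostname":"","username":"","password":"","database":"","port":"","ssl":True,"sslmode":"","certificate_base64":""}
# Schema for the OpenScale datamart; required only when DB_CREDENTIALS is provided
SCHEMA_NAME = None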

In [9]:
KEEP_MY_INTERNAL_POSTGRES = True

## Run the notebook

At this point, the notebook is ready to run. You can either run the cells one at a time, or click the **Kernel** option above and select **Restart and Run All** to run all the cells.

# Model building and deployment <a name="model"></a>

In this section, you train a model and then deploy it as a web service using Watson Machine Learning.

## Load the training data from GitHub

In [10]:
!rm -f house_price_regression.csv
!wget https://raw.githubusercontent.com/IBM/watson-openscale-samples/main/IBM%20Cloud/WML/assets/data/house_price/house_price_regression.csv

--2024-08-06 05:23:03--  https://raw.githubusercontent.com/IBM/watson-openscale-samples/main/IBM%20Cloud/WML/assets/data/house_price/house_price_regression.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 185379 (181K) [text/plain]
Saving to: ‘house_price_regression.csv’


2024-08-06 05:23:03 (33.6 MB/s) - ‘house_price_regression.csv’ saved [185379/185379]



In [11]:
import pandas as pd
import numpy as np
pd_data = pd.read_csv("house_price_regression.csv")
pd_data.head()

Unnamed: 0,Id,MSSubClass,OverallQual,OverallCond,YearBuilt,YearRemodAdd,MasVnrArea,BsmtFinSF1,BsmtFinSF2,BsmtUnfSF,...,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleCondition,SalePrice
0,1,60,7,5,2003,2003,196.0,706,0,150,...,61,0,0,0,0,0,2,2008,Normal,208500
1,2,20,6,8,1976,1976,0.0,978,0,284,...,0,0,0,0,0,0,5,2007,Normal,181500
2,3,60,7,5,2001,2002,162.0,486,0,434,...,42,0,0,0,0,0,9,2008,Normal,223500
3,4,70,7,5,1915,1970,0.0,216,0,540,...,35,272,0,0,0,0,2,2006,Abnorml,140000
4,5,60,8,5,2000,2000,350.0,655,0,490,...,84,0,0,0,0,0,12,2008,Normal,250000


## Explore data
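A useful first check at this point is which columns contain missing values, since the model below fills them with column means. A minimal sketch; `summarize_missing` is a hypothetical helper, the small frame only reuses a few column names from the house price data, and in the notebook you would call it as `summarize_missing(pd_data)`.

```python
import numpy as np
import pandas as pd


def summarize_missing(df: pd.DataFrame) -> pd.Series:
    """Return missing-value counts for columns that have any, largest first."""
    counts = df.isna().sum()
    return counts[counts > 0].sort_values(ascending=False)


# Small illustrative frame using column names from the house price data
sample = pd.DataFrame({
    "OverallQual": [7, 6, 7],
    "MasVnrArea": [196.0, np.nan, 162.0],
    "GarageYrBlt": [2003.0, np.nan, np.nan],
})
missing = summarize_missing(sample)
```

`pd_data.describe()` is a natural next step for checking value ranges of the numeric features.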

## Save training data to Cloud Object Storage

In [12]:
import ibm_boto3
from ibm_botocore.client import Config, ClientError

cos_client = ibm_boto3.resource("s3",
    ibm_api_key_id=COS_API_KEY_ID,
    ibm_service_instance_id=COS_RESOURCE_CRN,
    ibm_auth_endpoint="https://iam.bluemix.net/oidc/token",
    config=Config(signature_version="oauth"),
    endpoint_url=COS_ENDPOINT
)

In [13]:
with open(training_data_file_name, "rb") as file_data:
    cos_client.Object(BUCKET_NAME, training_data_file_name).upload_fileobj(
        Fileobj=file_data
    )

## Create a model

In [14]:
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer

pd_data.dropna(axis=0, subset=['SalePrice'], inplace=True)
label = pd_data.SalePrice
feature_data = pd_data.drop(['SalePrice'], axis=1).select_dtypes(exclude=['object'])
train_X, test_X, train_y, test_y = train_test_split(feature_data.values, label.values, test_size=0.25)

my_imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
train_X = my_imputer.fit_transform(train_X)
test_X = my_imputer.transform(test_X)
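`SimpleImputer(strategy='mean')` learns each column's mean from the training split in `fit_transform` and reuses those means on the test split in `transform`, so no test information leaks into training. A NumPy sketch of the same idea, assuming every column has at least one observed training value (scikit-learn treats all-missing columns differently); `fit_mean_imputer` and `apply_mean_imputer` are hypothetical names.

```python
import numpy as np


def fit_mean_imputer(train: np.ndarray) -> np.ndarray:
    """Learn per-column means from the training data, ignoring NaNs."""
    return np.nanmean(train, axis=0)


def apply_mean_imputer(data: np.ndarray, means: np.ndarray) -> np.ndarray:
    """Return a copy with each NaN replaced by its column's training mean."""
    filled = data.copy()
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = means[cols]
    return filled


train = np.array([[1.0, 10.0], [3.0, np.nan], [5.0, 30.0]])
test = np.array([[np.nan, np.nan]])
means = fit_mean_imputer(train)                 # [3.0, 20.0]
train_filled = apply_mean_imputer(train, means)
test_filled = apply_mean_imputer(test, means)   # [[3.0, 20.0]], training means only
```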

In [15]:
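from xgboost import XGBRegressor

# 'mae' suits regression; 'error' is a binary-classification error rate.
# eval_metric only affects the reported evaluation, not the fitted model (no early stopping is used).
model = XGBRegressor(eval_metric='mae')
model.fit(train_X, train_y,
          eval_set=[(test_X, test_y)], verbose=False)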

In [16]:
from sklearn.metrics import mean_absolute_error

# make predictions on the held-out test split
predictions = model.predict(test_X)
print("Mean Absolute Error : " + str(mean_absolute_error(test_y, predictions)))

Mean Absolute Error : 17869.313366866438
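The mean absolute error is the average of |y_i - ŷ_i| over the test rows, expressed in the same units as `SalePrice`, so the printed value is the typical size of a prediction miss. A plain-Python sketch of the formula:

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between true and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


mae = mean_absolute_error([200000.0, 150000.0], [190000.0, 165000.0])  # (10000 + 15000) / 2 = 12500.0
```

Because the absolute value is symmetric, swapping the arguments gives the same number; scikit-learn's convention is still `(y_true, y_pred)`.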


### Wrap XGBoost in a scikit-learn pipeline

In [17]:
from sklearn.pipeline import Pipeline
xgb_model_imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
pipeline = Pipeline(steps=[('Imputer', xgb_model_imputer), ('xgb', model)])

In [18]:
model_xgb=pipeline.fit(train_X, train_y)
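Bundling the imputer with the regressor means the stored model fills missing values itself, so raw scoring requests containing NaNs are handled the same way as during training. A minimal sketch of that behaviour, with `LinearRegression` swapped in for `XGBRegressor` only to keep the example small; the toy data is illustrative.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

# Toy training data: y = 3 * x0, with one missing value in the second column
X_train = np.array([[1.0, 2.0], [2.0, np.nan], [3.0, 6.0], [4.0, 8.0]])
y_train = np.array([3.0, 6.0, 9.0, 12.0])

pipe = Pipeline(steps=[
    ("Imputer", SimpleImputer(missing_values=np.nan, strategy="mean")),
    ("reg", LinearRegression()),
])
pipe.fit(X_train, y_train)

# A raw request with a missing feature: the pipeline imputes before predicting
raw_request = np.array([[np.nan, 4.0]])
prediction = pipe.predict(raw_request)
```

The imputer's learned means are stored in `pipe.named_steps["Imputer"].statistics_`, which travel with the model when it is published.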

In [19]:
from sklearn.metrics import mean_absolute_error

# make predictions with the pipeline (imputer + XGBoost)
predictions = model_xgb.predict(test_X)
print("Mean Absolute Error : " + str(mean_absolute_error(test_y, predictions)))

Mean Absolute Error : 17869.313366866438


## Publish the model

In [20]:
import json
from ibm_watson_machine_learning import APIClient

wml_client = APIClient(WML_CREDENTIALS)
wml_client.version

'1.0.360'

### Listing all the available spaces

In [21]:
wml_client.spaces.list(limit=10)

Unnamed: 0,ID,NAME,CREATED
0,4021f1d9-c203-4e9f-97f6-4766dd48155b,prod-space,2024-08-05T04:42:04.665Z
1,be45ab4c-1fb7-440c-9b03-2909067e45e0,Automotive Demo - Quality report summarization,2024-05-27T13:13:27.233Z
2,e0ee6250-7ef6-42c3-8ffa-350d9b0df578,pre-prod-space,2024-02-28T07:35:30.368Z
3,63c5982f-7160-41c4-86f7-1310a8ab32cb,prompt-space,2024-01-15T19:29:21.535Z
4,f04e0e73-a1b7-4ae9-a08d-7e16add4fe08,llm. space,2023-11-24T13:57:05.167Z
5,d1afbea3-e899-4ed3-b9a6-0686751508c3,wml,2023-09-21T07:06:42.577Z
6,6f7c3969-6d3f-4f9a-b97a-b534f4e4fef3,AutoAIDemo,2023-08-25T02:50:27.113Z
7,0b7992c2-3991-4145-a5ba-d5b428261171,openscale-express-path-preprod-80e6093f-5acf-4...,2023-08-16T06:59:18.538Z
8,3226c381-5ae0-4bc4-b306-fc638c785e47,openscale-express-path-80e6093f-5acf-4eb7-9da6...,2023-08-16T06:58:57.332Z


In [22]:
WML_SPACE_ID='***' # use space id here

wml_client.set.default_space(WML_SPACE_ID)

'SUCCESS'

### Remove existing model and deployment

In [23]:
MODEL_NAME="house_price_xgbregression"
DEPLOYMENT_NAME="house_price_xgbregression_deployment"

In [25]:
deployments_list = wml_client.deployments.get_details()
for deployment in deployments_list["resources"]:
    model_id = deployment["entity"]["asset"]["id"]
    deployment_id = deployment["metadata"]["id"]
    if deployment["metadata"]["name"] == DEPLOYMENT_NAME:
        print("Deleting deployment id", deployment_id)
        wml_client.deployments.delete(deployment_id)
        print("Deleting model id", model_id)
        wml_client.repository.delete(model_id)
wml_client.repository.list_models()
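The loop above relies on the shape of the `get_details()` response: each item in `resources` has `metadata.id`, `metadata.name`, and `entity.asset.id`. Separating the matching logic from the delete calls makes it easy to preview what would be removed. A sketch; `find_deployments_by_name` is hypothetical, and the sample response is hand-made with only the keys the loop reads.

```python
from typing import Dict, List, Tuple


def find_deployments_by_name(details: Dict, name: str) -> List[Tuple[str, str]]:
    """Return (deployment_id, model_id) pairs for deployments whose name matches."""
    matches = []
    for deployment in details.get("resources", []):
        if deployment["metadata"]["name"] == name:
            # Read the asset ID only for matching deployments
            matches.append((deployment["metadata"]["id"],
                            deployment["entity"]["asset"]["id"]))
    return matches


sample_details = {"resources": [
    {"metadata": {"id": "dep-1", "name": "house_price_xgbregression_deployment"},
     "entity": {"asset": {"id": "model-1"}}},
    {"metadata": {"id": "dep-2", "name": "other_deployment"},
     "entity": {"asset": {"id": "model-2"}}},
]}
to_delete = find_deployments_by_name(sample_details, "house_price_xgbregression_deployment")
# In the notebook: find_deployments_by_name(wml_client.deployments.get_details(), DEPLOYMENT_NAME)
```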

In [26]:
datasource_type = wml_client.connections.get_datasource_type_uid_by_name('bluemixcloudobjectstorage')
conn_meta_props= {
    wml_client.connections.ConfigurationMetaNames.NAME: "Connection My COS ",
    wml_client.connections.ConfigurationMetaNames.DATASOURCE_TYPE: datasource_type,
    wml_client.connections.ConfigurationMetaNames.DESCRIPTION: "Connection to my COS",
    wml_client.connections.ConfigurationMetaNames.PROPERTIES: {
        'bucket': BUCKET_NAME,
        'api_key': COS_API_KEY_ID,
        'resource_instance_id': COS_RESOURCE_CRN,
        'iam_url': "https://iam.ng.bluemix.net/oidc/token",
        'url': COS_ENDPOINT
    }
}

conn_details = wml_client.connections.create(meta_props=conn_meta_props)
connection_id = wml_client.connections.get_uid(conn_details)

training_data_references = [
    {
        "id": "House Price Regression",
        "type": "connection_asset",
        "connection": {
            "id": connection_id,
            "href": "/v2/connections/" + connection_id + "?space_id=" + WML_SPACE_ID

        },
        "location": {
            "bucket": BUCKET_NAME,
            "file_name": training_data_file_name
        }
    }    
]


Creating connections...
SUCCESS
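The `training_data_references` entry ties the WML connection asset to the CSV file in the bucket, and its `href` embeds both the connection ID and the space ID. A sketch of a builder that keeps those pieces consistent; `make_training_data_reference` and the placeholder IDs are hypothetical.

```python
from typing import Dict, List


def make_training_data_reference(ref_id: str, connection_id: str, space_id: str,
                                 bucket: str, file_name: str) -> List[Dict]:
    """Build a connection_asset training data reference in the same shape as the cell above."""
    return [{
        "id": ref_id,
        "type": "connection_asset",
        "connection": {
            "id": connection_id,
            "href": "/v2/connections/" + connection_id + "?space_id=" + space_id,
        },
        "location": {"bucket": bucket, "file_name": file_name},
    }]


refs = make_training_data_reference("House Price Regression", "conn-123", "space-456",
                                    "llm-test", "house_price_regression.csv")
```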


In [27]:
# Note: if this lookup fails or does not return a valid ID, list the available specifications with
# wml_client.software_specifications.list() and choose a supported Python runtime instead.
software_spec_uid = wml_client.software_specifications.get_id_by_name("runtime-23.1-py3.10")
print("Software Specification ID: {}".format(software_spec_uid))
model_props = {
    wml_client.repository.ModelMetaNames.NAME: MODEL_NAME,
    wml_client.repository.ModelMetaNames.TYPE: "scikit-learn_1.1",
    wml_client.repository.ModelMetaNames.SOFTWARE_SPEC_UID: software_spec_uid,
    wml_client.repository.ModelMetaNames.LABEL_FIELD: "SalePrice",
}

Software Specification ID: 336b29df-e0e1-5e7d-b6a5-f6ab722625b2


In [28]:
print("Storing model ...")
published_model_details = wml_client.repository.store_model(
    model=model_xgb, 
    meta_props=model_props,
    training_data=feature_data, 
    training_target=label
)

model_uid = wml_client.repository.get_model_id(published_model_details)
print("Done")
print("Model ID: {}".format(model_uid))

Storing model ...
Done
Model ID: fcbeb140-dcd2-4ce2-a74e-fe613c252539


## Deploy the model

The next section of the notebook deploys the model as a RESTful web service in Watson Machine Learning. The deployed model will have a scoring URL you can use to send data to the model for predictions.

In [29]:
deployment_details = wml_client.deployments.create(
    model_uid, 
    meta_props={
        wml_client.deployments.ConfigurationMetaNames.NAME: "{}".format(DEPLOYMENT_NAME),
        wml_client.deployments.ConfigurationMetaNames.ONLINE: {}
    }
)
scoring_url = wml_client.deployments.get_scoring_href(deployment_details)
deployment_uid=wml_client.deployments.get_id(deployment_details)

print("Scoring URL:" + scoring_url)
print("Model id: {}".format(model_uid))
print("Deployment id: {}".format(deployment_uid))



#######################################################################################

Synchronous deployment creation for uid: 'fcbeb140-dcd2-4ce2-a74e-fe613c252539' started

#######################################################################################


initializing
Note: online_url and serving_urls are deprecated and will be removed in a future release. Use inference instead.

ready


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='e82b106e-0ce0-4bd2-8af7-60f78b46151d'
------------------------------------------------------------------------------------------------


Scoring URL:https://us-south.ml.cloud.ibm.com/ml/v4/deployments/e82b106e-0ce0-4bd2-8af7-60f78b46151d/predictions
Model id: fcbeb140-dcd2-4ce2-a74e-fe613c252539
Deployment id: e82b106e-0ce0-4bd2-8af7-60f78b46151d
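The SDK's `deployments.score` call wraps a plain REST request to the scoring URL printed above. A standard-library sketch of building that request, assuming a bearer IAM token and the WML v4 `version` query parameter; the date value is a placeholder to replace with one supported by your WML instance, and `build_scoring_request` is a hypothetical helper. The request is only constructed here, not sent.

```python
import json
import urllib.parse
import urllib.request


def build_scoring_request(scoring_url: str, iam_token: str, payload: dict,
                          version: str = "2021-05-01") -> urllib.request.Request:
    """Construct (but do not send) a POST request to a WML online deployment."""
    url = scoring_url + "?" + urllib.parse.urlencode({"version": version})
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": "Bearer " + iam_token,
                 "Content-Type": "application/json",
                 "Accept": "application/json"},
        method="POST",
    )


payload = {"input_data": [{"fields": ["OverallQual", "GrLivArea"], "values": [[5.0, 1276.0]]}]}
req = build_scoring_request(
    "https://us-south.ml.cloud.ibm.com/ml/v4/deployments/<deployment_id>/predictions",
    "<iam_token>", payload)
# To actually send it: urllib.request.urlopen(req), then json.load() the response
```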


## Sample scoring

In [30]:
fields = feature_data.columns.tolist()
values = [
            test_X[0].tolist()
        ]

scoring_payload = {"input_data": [{"fields": fields, "values": values}]}
scoring_payload

{'input_data': [{'fields': ['Id',
    'MSSubClass',
    'OverallQual',
    'OverallCond',
    'YearBuilt',
    'YearRemodAdd',
    'MasVnrArea',
    'BsmtFinSF1',
    'BsmtFinSF2',
    'BsmtUnfSF',
    'TotalBsmtSF',
    '1stFlrSF',
    '2ndFlrSF',
    'LowQualFinSF',
    'GrLivArea',
    'BsmtFullBath',
    'BsmtHalfBath',
    'FullBath',
    'HalfBath',
    'BedroomAbvGr',
    'KitchenAbvGr',
    'TotRmsAbvGrd',
    'Fireplaces',
    'GarageYrBlt',
    'GarageCars',
    'GarageArea',
    'WoodDeckSF',
    'OpenPorchSF',
    'EnclosedPorch',
    '3SsnPorch',
    'ScreenPorch',
    'PoolArea',
    'MiscVal',
    'MoSold',
    'YrSold'],
   'values': [[411.0,
     20.0,
     5.0,
     3.0,
     1958.0,
     1958.0,
     0.0,
     0.0,
     0.0,
     1276.0,
     1276.0,
     1276.0,
     0.0,
     0.0,
     1276.0,
     0.0,
     0.0,
     1.0,
     0.0,
     3.0,
     1.0,
     5.0,
     0.0,
     1958.0,
     1.0,
     350.0,
     0.0,
     0.0,
     0.0,
     0.0,
     0.0,
     0.0,

In [31]:
scoring_response = wml_client.deployments.score(deployment_uid, scoring_payload)
scoring_response

{'predictions': [{'fields': ['prediction'], 'values': [[96719.5703125]]}]}
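The request and response both use the fields/values layout: `input_data` lists the column names once plus one row per record, and `predictions` mirrors it with a single `prediction` field. A sketch of helpers that build a payload with a row-length check and pull the predicted prices out of a response; the helper names are hypothetical, and the sample response is the one shown above (in the notebook: `extract_predictions(scoring_response)`).

```python
from typing import Dict, List, Sequence


def build_payload(fields: Sequence[str], rows: Sequence[Sequence[float]]) -> Dict:
    """Build a WML scoring payload, checking that each row has one value per field."""
    for i, row in enumerate(rows):
        if len(row) != len(fields):
            raise ValueError("row %d has %d values, expected %d" % (i, len(row), len(fields)))
    return {"input_data": [{"fields": list(fields), "values": [list(r) for r in rows]}]}


def extract_predictions(response: Dict, field: str = "prediction") -> List[float]:
    """Return the value of `field` for every scored row."""
    block = response["predictions"][0]
    idx = block["fields"].index(field)
    return [row[idx] for row in block["values"]]


payload = build_payload(["OverallQual", "GrLivArea"], [[5.0, 1276.0], [7.0, 1710.0]])
response = {"predictions": [{"fields": ["prediction"], "values": [[96719.5703125]]}]}
prices = extract_predictions(response)  # [96719.5703125]
```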

# Configure OpenScale <a name="openscale"></a>

The notebook will now import the necessary libraries and set up a Python OpenScale client.

In [32]:
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator,BearerTokenAuthenticator

from ibm_watson_openscale import *
from ibm_watson_openscale.supporting_classes.enums import *
from ibm_watson_openscale.supporting_classes import *


authenticator = IAMAuthenticator(apikey=CLOUD_API_KEY)
wos_client = APIClient(authenticator=authenticator)
wos_client.version

'3.0.39'

## Create schema and datamart

### Set up datamart

Watson OpenScale uses a database to store payload logs and calculated metrics. If an OpenScale datamart already exists (in the internal database, Db2, or PostgreSQL), the cell below reuses it and no data is overwritten. Otherwise, the datamart is created in the database whose credentials you supplied above, or in the free internal lite database if no credentials were supplied. The `KEEP_MY_INTERNAL_POSTGRES` flag set earlier is not read by this cell.

Prior instances of the House price model will be removed from OpenScale monitoring.
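The branching in the datamart cell can be summarized as a small pure function: reuse an existing datamart if there is one, otherwise create it in the external database when credentials are supplied, and fall back to the internal database when they are not. A sketch; `choose_datamart_action` is hypothetical, only mirrors the decision without calling OpenScale, and treats a missing schema name as an error.

```python
from typing import Optional, Sequence


def choose_datamart_action(existing_ids: Sequence[str], db_credentials: Optional[dict],
                           schema_name: Optional[str]) -> str:
    """Return 'reuse:<id>', 'external', or 'internal', mirroring the datamart setup logic."""
    if existing_ids:
        # An existing datamart always wins, so no data is overwritten
        return "reuse:" + existing_ids[0]
    if db_credentials is not None:
        if schema_name is None:
            raise ValueError("SCHEMA_NAME is required when DB_CREDENTIALS is provided")
        return "external"
    return "internal"


action_existing = choose_datamart_action(["80e6093f"], None, None)  # 'reuse:80e6093f'
action_new = choose_datamart_action([], None, None)                  # 'internal'
```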

In [33]:
wos_client.data_marts.show()

0,1,2,3,4,5
AIOSFASTPATH-80E6093F-5ACF-4EB7-9DA6-7BA9BF56A929,,True,active,2024-05-16 06:09:07.089000+00:00,80e6093f-5acf-4eb7-9da6-7ba9bf56a929


In [34]:
data_marts = wos_client.data_marts.list().result.data_marts
if len(data_marts) == 0:
    if DB_CREDENTIALS is not None:
        if SCHEMA_NAME is None:
            raise ValueError("Please specify the SCHEMA_NAME and rerun the cell")

        print('Setting up external datamart')
        added_data_mart_result = wos_client.data_marts.add(
                background_mode=False,
                name="WOS Data Mart",
                description="Data Mart created by WOS tutorial notebook",
                database_configuration=DatabaseConfigurationRequest(
                  database_type=DatabaseType.POSTGRESQL,
                    credentials=PrimaryStorageCredentialsLong(
                        hostname=DB_CREDENTIALS['hostname'],
                        username=DB_CREDENTIALS['username'],
                        password=DB_CREDENTIALS['password'],
                        db=DB_CREDENTIALS['database'],
                        port=DB_CREDENTIALS['port'],
                        ssl=True,
                        sslmode=DB_CREDENTIALS['sslmode'],
                        certificate_base64=DB_CREDENTIALS['certificate_base64']
                    ),
                    location=LocationSchemaName(
                        schema_name= SCHEMA_NAME
                    )
                )
             ).result
    else:
        print('Setting up internal datamart')
        added_data_mart_result = wos_client.data_marts.add(
                background_mode=False,
                name="WOS Data Mart",
                description="Data Mart created by WOS tutorial notebook", 
                internal_database = True).result
        
    data_mart_id = added_data_mart_result.metadata.id
    
else:
    data_mart_id=data_marts[0].metadata.id
    print('Using existing datamart {}'.format(data_mart_id))
    

Using existing datamart 80e6093f-5acf-4eb7-9da6-7ba9bf56a929


### Remove existing service providers connected to the WML instance

Watson OpenScale allows multiple service providers for the same engine instance. To avoid duplicate service providers for the WML instance used in this tutorial, the following code deletes any existing service provider with the same name and then adds a new one.

In [35]:
SERVICE_PROVIDER_NAME = "xgboost_WML V2"
SERVICE_PROVIDER_DESCRIPTION = "Added by tutorial WOS notebook."

In [36]:
service_providers = wos_client.service_providers.list().result.service_providers
for service_provider in service_providers:
    service_instance_name = service_provider.entity.name
    if service_instance_name == SERVICE_PROVIDER_NAME:
        service_provider_id = service_provider.metadata.id
        wos_client.service_providers.delete(service_provider_id)
        print("Deleted existing service_provider for WML instance: {}".format(service_provider_id))
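The deletion loop matches on `entity.name` and deletes by `metadata.id`. Factoring the match into a function lets you preview which providers would be removed before deleting anything. A sketch that uses `types.SimpleNamespace` objects as stand-ins for the client's response objects; `find_service_providers_by_name` and `fake_provider` are hypothetical.

```python
from types import SimpleNamespace
from typing import Iterable, List


def find_service_providers_by_name(service_providers: Iterable, name: str) -> List[str]:
    """Return the IDs of service providers whose entity name equals `name`."""
    return [sp.metadata.id for sp in service_providers if sp.entity.name == name]


def fake_provider(provider_id: str, name: str) -> SimpleNamespace:
    # Mimics only the attributes the notebook reads: .metadata.id and .entity.name
    return SimpleNamespace(metadata=SimpleNamespace(id=provider_id),
                           entity=SimpleNamespace(name=name))


providers = [fake_provider("sp-1", "xgboost_WML V2"),
             fake_provider("sp-2", "another provider"),
             fake_provider("sp-3", "xgboost_WML V2")]
ids_to_delete = find_service_providers_by_name(providers, "xgboost_WML V2")  # ['sp-1', 'sp-3']
# In the notebook:
# find_service_providers_by_name(wos_client.service_providers.list().result.service_providers,
#                                SERVICE_PROVIDER_NAME)
```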

## Add service provider

Watson OpenScale needs to be bound to the Watson Machine Learning instance to capture payload data into and out of the model.

**Note:** You can bind more than one engine instance if needed by calling `wos_client.service_providers.add` method. Next, you can refer to particular service provider using `service_provider_id`.

In [37]:
added_service_provider_result = wos_client.service_providers.add(
        name=SERVICE_PROVIDER_NAME,
        description=SERVICE_PROVIDER_DESCRIPTION,
        service_type=ServiceTypes.WATSON_MACHINE_LEARNING,
        deployment_space_id = WML_SPACE_ID,
        operational_space_id = "production",
        credentials=WMLCredentialsCloud(
            apikey=CLOUD_API_KEY,      ## use `apikey=IAM_TOKEN` if using IAM_TOKEN to initiate client
            url=WML_CREDENTIALS["url"],
            instance_id=None
        ),
        background_mode=False
    ).result
service_provider_id = added_service_provider_result.metadata.id




 Waiting for end of adding service provider 57e9af4f-cc42-4a7e-9e7a-65c230c768ea 




active

-----------------------------------------------
 Successfully finished adding service provider 
-----------------------------------------------




In [38]:
wos_client.service_providers.show()

0,1,2,3,4,5
89cbf463-87d9-4032-9c0c-e371dd68d156,active,xgboost_WML V2,watson_machine_learning,2024-08-06 05:24:54.322000+00:00,57e9af4f-cc42-4a7e-9e7a-65c230c768ea


In [39]:
asset_deployment_details = wos_client.service_providers.list_assets(data_mart_id=data_mart_id, service_provider_id=service_provider_id,deployment_id=deployment_uid, deployment_space_id = WML_SPACE_ID).result['resources'][0]
asset_deployment_details

{'metadata': {'guid': 'e82b106e-0ce0-4bd2-8af7-60f78b46151d',
  'url': 'https://us-south.ml.cloud.ibm.com/ml/v4/deployments/e82b106e-0ce0-4bd2-8af7-60f78b46151d?space_id=d1afbea3-e899-4ed3-b9a6-0686751508c3',
  'created_at': '2024-08-06T05:23:57.950Z',
  'modified_at': '2024-08-06T05:23:57.950Z'},
 'entity': {'name': 'house_price_xgbregression_deployment',
  'type': 'online',
  'scoring_endpoint': {'url': 'https://us-south.ml.cloud.ibm.com/ml/v4/deployments/e82b106e-0ce0-4bd2-8af7-60f78b46151d/predictions'},
  'asset': {},
  'asset_properties': {}}}

In [40]:
model_asset_details_from_deployment=wos_client.service_providers.get_deployment_asset(data_mart_id=data_mart_id,service_provider_id=service_provider_id,deployment_id=deployment_uid,deployment_space_id=WML_SPACE_ID)
model_asset_details_from_deployment

{'metadata': {'guid': 'e82b106e-0ce0-4bd2-8af7-60f78b46151d',
  'url': 'https://us-south.ml.cloud.ibm.com/ml/v4/deployments/e82b106e-0ce0-4bd2-8af7-60f78b46151d?space_id=d1afbea3-e899-4ed3-b9a6-0686751508c3',
  'created_at': '2024-08-06T05:23:57.950Z',
  'modified_at': '2024-08-06T05:23:57.950Z'},
 'entity': {'name': 'house_price_xgbregression_deployment',
  'type': 'online',
  'scoring_endpoint': {'url': 'https://us-south.ml.cloud.ibm.com/ml/v4/deployments/e82b106e-0ce0-4bd2-8af7-60f78b46151d/predictions'},
  'asset': {'asset_id': 'fcbeb140-dcd2-4ce2-a74e-fe613c252539',
   'url': 'https://us-south.ml.cloud.ibm.com/ml/v4/models/fcbeb140-dcd2-4ce2-a74e-fe613c252539?space_id=d1afbea3-e899-4ed3-b9a6-0686751508c3&version=2020-06-12',
   'name': 'house_price_xgbregression',
   'asset_type': 'model',
   'created_at': '2024-08-06T05:23:46.813Z',
   'modified_at': '2024-08-06T05:23:52.000Z'},
  'asset_properties': {'model_type': 'scikit-learn_1.1',
   'runtime_environment': 'python-3.10',
   '

## Subscriptions

### Remove existing House price model subscriptions

This code removes previous subscriptions to the House price model to refresh the monitors with the new model and new data.

In [41]:
wos_client.subscriptions.show()

In [42]:
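subscriptions = wos_client.subscriptions.list().result.subscriptions
for subscription in subscriptions:
    # Match on the model name: the model ID changes each time the model is re-published,
    # so an ID match would never find subscriptions to the previously deleted model.
    if subscription.entity.asset.name == MODEL_NAME:
        wos_client.subscriptions.delete(subscription.metadata.id)
        print('Deleted existing subscription for model', subscription.entity.asset.asset_id)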

### Create the model subscription

This code creates the model subscription in OpenScale using the Python client API. Note that you need to provide the model's unique identifier and some information about the model itself.

In [43]:
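# feature_data keeps only numeric columns, so there are no categorical features to declare
feature_cols = feature_data.columns.tolist()
# categorical_cols = pd_data.drop(['SalePrice'], axis=1).select_dtypes(include=['object']).columns.tolist()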
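For a regression subscription, OpenScale needs the label column, the prediction field, and the feature list to line up with the training data in COS. A sketch of a pre-flight check, assuming the training CSV header is available as a list of column names; `check_subscription_columns` is hypothetical and not part of the OpenScale client. In the notebook: `check_subscription_columns(pd_data.columns.tolist(), 'SalePrice', feature_cols)`.

```python
from typing import List, Sequence


def check_subscription_columns(training_columns: Sequence[str], label: str,
                               features: Sequence[str]) -> List[str]:
    """Return a list of problems; an empty list means the columns look consistent."""
    problems = []
    if label not in training_columns:
        problems.append("label column '%s' not found in training data" % label)
    if label in features:
        problems.append("label column '%s' must not be a feature" % label)
    missing = [f for f in features if f not in training_columns]
    if missing:
        problems.append("features missing from training data: %s" % ", ".join(missing))
    return problems


columns = ["Id", "OverallQual", "GrLivArea", "SaleCondition", "SalePrice"]
ok = check_subscription_columns(columns, "SalePrice", ["Id", "OverallQual", "GrLivArea"])  # []
bad = check_subscription_columns(columns, "SalePrice", ["OverallQual", "SalePrice", "LotShape"])
```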

In [44]:
from ibm_watson_openscale.base_classes.watson_open_scale_v2 import ScoringEndpointRequest

In [45]:
subscription_details = wos_client.subscriptions.add(
        data_mart_id=data_mart_id,
        service_provider_id=service_provider_id,
        asset=Asset(
            asset_id=model_asset_details_from_deployment["entity"]["asset"]["asset_id"],
            name=model_asset_details_from_deployment["entity"]["asset"]["name"],
            url=model_asset_details_from_deployment["entity"]["asset"]["url"],
            asset_type=AssetTypes.MODEL,
            input_data_type=InputDataType.STRUCTURED,
            problem_type=ProblemType.REGRESSION
        ),
        deployment=AssetDeploymentRequest(
            deployment_id=asset_deployment_details['metadata']['guid'],
            name=asset_deployment_details['entity']['name'],
            deployment_type= DeploymentTypes.ONLINE,
            url=asset_deployment_details['metadata']['url'],
            scoring_endpoint=ScoringEndpointRequest(url=scoring_url) # scoring model without shadow deployment
        ),
        asset_properties=AssetPropertiesRequest(
            label_column='SalePrice',
            prediction_field='prediction',
            feature_fields = feature_cols,
            #categorical_fields = categorical_cols,
            training_data_reference=TrainingDataReference(type='cos',
                                                          location=COSTrainingDataReferenceLocation(bucket = BUCKET_NAME,
                                                                                                    file_name = training_data_file_name),
                                                          connection=COSTrainingDataReferenceConnection.from_dict({
                                                                        "resource_instance_id": COS_RESOURCE_CRN,
                                                                        "url": COS_ENDPOINT,
                                                                        "api_key": COS_API_KEY_ID,
                                                                        "iam_url": IAM_URL}))
        ),background_mode = False
    ).result
subscription_id = subscription_details.metadata.id
subscription_id




 Waiting for end of adding subscription 200c31a3-9a6f-452b-8512-1884e3e90c9e 




active

-------------------------------------------
 Successfully finished adding subscription 
-------------------------------------------




'200c31a3-9a6f-452b-8512-1884e3e90c9e'

In [46]:
import time

time.sleep(5)
payload_data_set_id = None
payload_data_sets = wos_client.data_sets.list(type=DataSetTypes.PAYLOAD_LOGGING,
                                              target_target_id=subscription_id,
                                              target_target_type=TargetTypes.SUBSCRIPTION).result.data_sets
if payload_data_sets:
    payload_data_set_id = payload_data_sets[0].metadata.id
if payload_data_set_id is None:
    print("Payload data set not found. Please check subscription status.")
else:
    print("Payload data set id: ", payload_data_set_id)

Payload data set id:  c4e039a6-bcba-448e-8499-97b525e52a85
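The cell above waits a fixed five seconds before looking up the payload data set, which can be too short right after the subscription is created. A generic polling helper is a sturdier sketch: it retries a lookup until it returns something or a timeout passes. `wait_for` is hypothetical; in the notebook the lookup would wrap the `wos_client.data_sets.list(...)` call and return `None` while the list is empty.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(lookup: Callable[[], Optional[T]], timeout: float = 60.0,
             interval: float = 5.0, sleep: Callable[[float], None] = time.sleep) -> Optional[T]:
    """Call `lookup` until it returns a non-None value or roughly `timeout` seconds pass."""
    waited = 0.0  # counted from the sleep intervals, not wall-clock time
    while True:
        result = lookup()
        if result is not None:
            return result
        if waited >= timeout:
            return None
        sleep(interval)
        waited += interval


attempts = []


def flaky_lookup():
    attempts.append(1)
    return "c4e039a6" if len(attempts) >= 3 else None


found = wait_for(flaky_lookup, timeout=30, interval=5, sleep=lambda s: None)  # found on the 3rd try
never = wait_for(lambda: None, timeout=10, interval=5, sleep=lambda s: None)   # gives up, None
```

Injecting `sleep` keeps the helper testable; in the notebook you would keep the default `time.sleep`.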


In [47]:
wos_client.data_sets.show()

0,1,2,3,4,5,6
80e6093f-5acf-4eb7-9da6-7ba9bf56a929,active,200c31a3-9a6f-452b-8512-1884e3e90c9e,subscription,model_health,2024-08-06 05:25:19.390000+00:00,a38c2164-5171-4953-8e34-1404cf59cd3b
80e6093f-5acf-4eb7-9da6-7ba9bf56a929,active,200c31a3-9a6f-452b-8512-1884e3e90c9e,subscription,payload_logging_error,2024-08-06 05:25:18.268000+00:00,4f366f4e-0922-4963-b56d-46ff3fe7546b
80e6093f-5acf-4eb7-9da6-7ba9bf56a929,active,200c31a3-9a6f-452b-8512-1884e3e90c9e,subscription,manual_labeling,2024-08-06 05:25:17.993000+00:00,13f9dec1-122b-4116-bc30-ebb09b3561ef
80e6093f-5acf-4eb7-9da6-7ba9bf56a929,active,200c31a3-9a6f-452b-8512-1884e3e90c9e,subscription,payload_logging,2024-08-06 05:25:17.805000+00:00,c4e039a6-bcba-448e-8499-97b525e52a85


Get subscription list

In [48]:
wos_client.subscriptions.show()

0,1,2,3,4,5,6,7,8,9
fcbeb140-dcd2-4ce2-a74e-fe613c252539,model,house_price_xgbregression,80e6093f-5acf-4eb7-9da6-7ba9bf56a929,e82b106e-0ce0-4bd2-8af7-60f78b46151d,house_price_xgbregression_deployment,57e9af4f-cc42-4a7e-9e7a-65c230c768ea,active,2024-08-06 05:25:16.743000+00:00,200c31a3-9a6f-452b-8512-1884e3e90c9e


### Score the model so we can configure monitors

In [49]:
import random


fields = feature_data.columns.tolist()
values = random.sample(test_X.tolist(), 2)
        
scoring_payload = {"input_data": [{"fields": fields, "values": values}]}
predictions = wml_client.deployments.score(deployment_uid, scoring_payload)

print("Single record scoring result:", "\n fields:", predictions["predictions"][0]["fields"], "\n values: ", predictions["predictions"][0]["values"][0])

Single record scoring result: 
 fields: ['prediction'] 
 values:  [158750.609375]
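The scoring request above has a fixed shape: an `input_data` list holding `fields` (column names) and `values` (one list per record), and the response returns `predictions` in the same fields/values layout. A minimal sketch of helpers that build and unpack that shape, assuming the layout seen in this notebook's request and printed output; `build_scoring_payload`, `first_prediction` and the example field names are hypothetical and are not part of the WML client:

```python
from typing import Any, Dict, Sequence


def build_scoring_payload(fields: Sequence[str], rows: Sequence[Sequence[Any]]) -> Dict[str, Any]:
    """Wrap column names and rows in the input_data layout used by the scoring call."""
    field_list = list(fields)
    for i, row in enumerate(rows):
        # Every record must supply exactly one value per field
        if len(row) != len(field_list):
            raise ValueError(f"row {i} has {len(row)} values but there are {len(field_list)} fields")
    return {"input_data": [{"fields": field_list, "values": [list(row) for row in rows]}]}


def first_prediction(response: Dict[str, Any]) -> Any:
    """Return the first value of the first scored record, e.g. the predicted price."""
    return response["predictions"][0]["values"][0][0]


# Hypothetical field names and values, for illustration only
payload = build_scoring_payload(["LotArea", "OverallQual"], [[8450, 7], [9600, 6]])
# A response shaped like the one printed above
response = {"predictions": [{"fields": ["prediction"], "values": [[158750.609375], [213135.703125]]}]}
print(first_prediction(response))
```

In the notebook, the payload would be passed as `wml_client.deployments.score(deployment_uid, build_scoring_payload(fields, values))`; validating row lengths up front gives a clearer error than a rejected scoring request.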


## Check whether WML payload logging worked; otherwise, store the payload records manually

In [50]:
import uuid
from ibm_watson_openscale.supporting_classes.payload_record import PayloadRecord
time.sleep(5)
pl_records_count = wos_client.data_sets.get_records_count(payload_data_set_id)
print("Number of records in the payload logging table: {}".format(pl_records_count))
if pl_records_count == 0:
    print("Payload logging did not happen, performing explicit payload logging.")
    wos_client.data_sets.store_records(data_set_id=payload_data_set_id, request_body=[PayloadRecord(
                   scoring_id=str(uuid.uuid4()),
                   request=scoring_payload,
                   response={"fields": predictions['predictions'][0]['fields'], "values":predictions['predictions'][0]['values']},
                   response_time=460
               )],background_mode=False)
    time.sleep(5)
    pl_records_count = wos_client.data_sets.get_records_count(payload_data_set_id)
    print("Number of records in the payload logging table: {}".format(pl_records_count))

Number of records in the payload logging table: 2


In [51]:
wos_client.data_sets.show_records(payload_data_set_id)

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39
,0.0,879.0,6.0,7c5adb552d94e50e34f2cc068b5b9e75-1,0.0,0.0,1.0,2024-08-06T05:25:31.520490Z,0.0,0.0,0.0,576.0,0.0,1995.0,3.0,1155.0,210.0,2.0,158750.609375,0.0,1961.0,1987.0,2010.0,1.0,1155.0,6.0,400.0,0.0,0.0,5.0,85.0,192.0,e82b106e-0ce0-4bd2-8af7-60f78b46151d,7.0,1109.0,0.0,1.0,899.0,0.0
,0.0,1017.0,6.0,7c5adb552d94e50e34f2cc068b5b9e75-2,0.0,0.0,2.0,2024-08-06T05:25:31.520490Z,0.0,0.0,0.0,478.0,196.0,1996.0,3.0,1504.0,814.0,2.0,213135.703125,66.0,1996.0,1996.0,2009.0,1.0,1504.0,6.0,0.0,0.0,0.0,7.0,20.0,115.0,e82b106e-0ce0-4bd2-8af7-60f78b46151d,5.0,1504.0,1.0,1.0,690.0,0.0


# Quality monitoring and feedback logging <a name="quality"></a>

## Enable quality monitoring

In [52]:
import time

time.sleep(10)
target = Target(
        target_type=TargetTypes.SUBSCRIPTION,
        target_id=subscription_id
)
parameters = {
    "min_feedback_data_size": 50
}
quality_monitor_details = wos_client.monitor_instances.create(
    data_mart_id=data_mart_id,
    background_mode=False,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.QUALITY.ID,
    target=target,
    parameters=parameters
).result




 Waiting for end of monitor instance creation de133709-934d-44a0-bf9d-ee3af30f0b84 




preparing
active

---------------------------------------
 Monitor instance successfully created 
---------------------------------------




In [53]:
quality_monitor_instance_id = quality_monitor_details.metadata.id
quality_monitor_instance_id

'de133709-934d-44a0-bf9d-ee3af30f0b84'

## Feedback logging

The code below downloads and stores enough feedback data to meet the minimum threshold so that OpenScale can calculate a new accuracy measurement. It then kicks off the accuracy monitor. The monitors run hourly, or can be initiated via the Python API, the REST API, or the graphical user interface.
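For a regression model like this one, quality is usually expressed as error measures rather than classification accuracy. As a rough sketch of what such measures mean, assuming the standard textbook definitions (not OpenScale's internal implementation), the helper below computes MAE, MSE, RMSE and R² from feedback labels and predictions, and checks a record count against the `min_feedback_data_size` of 50 set above. `regression_quality` and `has_enough_feedback` are hypothetical names:

```python
import math
from typing import Dict, Sequence

MIN_FEEDBACK_DATA_SIZE = 50  # same value as min_feedback_data_size in the quality monitor parameters


def has_enough_feedback(record_count: int, minimum: int = MIN_FEEDBACK_DATA_SIZE) -> bool:
    """True once enough feedback records exist for a quality evaluation."""
    return record_count >= minimum


def regression_quality(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Textbook regression error measures for labelled feedback vs model predictions."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    mse = ss_res / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R² is undefined when all labels are identical
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse), "r2": r2}


metrics = regression_quality([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
print({name: round(value, 4) for name, value in metrics.items()})
```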

### Get feedback logging dataset ID

In [54]:
# Look up the feedback data set that OpenScale created for this subscription
feedback_dataset = wos_client.data_sets.list(type=DataSetTypes.FEEDBACK,
                                             target_target_id=subscription_id,
                                             target_target_type=TargetTypes.SUBSCRIPTION).result
print(feedback_dataset)
# Guard against an empty list so a missing data set is reported instead of raising IndexError
feedback_dataset_id = feedback_dataset.data_sets[0].metadata.id if feedback_dataset.data_sets else None
if feedback_dataset_id is None:
    print("Feedback data set not found. Please check quality monitor status.")

{
  "data_sets": [
    {
      "metadata": {
        "id": "207a66f6-9eb7-4de9-bcbd-04a219342ac5",
        "crn": "crn:v1:bluemix:public:aiopenscale:us-south:a/181ed6cc388f47bd9d862fe066f9cfce:80e6093f-5acf-4eb7-9da6-7ba9bf56a929:data_set:207a66f6-9eb7-4de9-bcbd-04a219342ac5",
        "url": "/v2/data_sets/207a66f6-9eb7-4de9-bcbd-04a219342ac5",
        "created_at": "2024-08-06T05:25:50.235000Z",
        "created_by": "iam-ServiceId-2e5c9fda-38bf-4279-9712-cdb3b6f3a7ad",
        "modified_at": "2024-08-06T05:25:50.675000Z",
        "modified_by": "iam-ServiceId-2e5c9fda-38bf-4279-9712-cdb3b6f3a7ad"
      },
      "entity": {
        "data_mart_id": "80e6093f-5acf-4eb7-9da6-7ba9bf56a929",
        "name": "200c31a3-9a6f-452b-8512-1884e3e90c9e_feedback",
        "description": "200c31a3-9a6f-452b-8512-1884e3e90c9e_feedback",
        "type": "feedback",
        "target": {
          "target_type": "subscription",
          "target_id": "200c31a3-9a6f-452b-8512-1884e3e90c9e"
        },
    

In [55]:
!rm -f custom_feedback_50_regression.json
!wget https://raw.githubusercontent.com/IBM/watson-openscale-samples/main/IBM%20Cloud/WML/assets/data/house_price/custom_feedback_50_regression.json

--2024-08-06 05:26:03--  https://raw.githubusercontent.com/IBM/watson-openscale-samples/main/IBM%20Cloud/WML/assets/data/house_price/custom_feedback_50_regression.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.110.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11571 (11K) [text/plain]
Saving to: ‘custom_feedback_50_regression.json’


2024-08-06 05:26:03 (15.3 MB/s) - ‘custom_feedback_50_regression.json’ saved [11571/11571]



In [56]:
import json

with open('custom_feedback_50_regression.json') as file:
    feedback_data = json.load(file)

In [57]:
wos_client.data_sets.store_records(feedback_dataset_id, request_body=feedback_data, background_mode=False)




 Waiting for end of storing records with request id: 7a62e26f-1e79-4253-bf9c-857e1b54d00b 




active

---------------------------------------
 Successfully finished storing records 
---------------------------------------




<ibm_cloud_sdk_core.detailed_response.DetailedResponse at 0x1481bbf10>

In [58]:
wos_client.data_sets.get_records_count(data_set_id=feedback_dataset_id)

98

In [59]:
run_details = wos_client.monitor_instances.run(monitor_instance_id=quality_monitor_instance_id, background_mode=False).result




 Waiting for end of monitoring run 71790afc-05d2-4f4b-be08-f0afc85da02d 




finished

---------------------------
 Successfully finished run 
---------------------------




In [60]:
wos_client.monitor_instances.show_metrics(monitor_instance_id=quality_monitor_instance_id)

0,1,2,3,4,5,6,7,8,9,10,11
2024-07-16 06:13:19.969000+00:00,true_positive_rate,1267b942-5fe0-423c-8263-b32816c30099,0.3636363636363636,,,['model_type:original'],quality,f16dff88-1305-4584-8c7d-3056728e79a4,92b130ed-fc31-43c2-8b0e-a5f8d3a91345,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:13:19.969000+00:00,area_under_roc,1267b942-5fe0-423c-8263-b32816c30099,0.6587412587412588,0.8,,['model_type:original'],quality,f16dff88-1305-4584-8c7d-3056728e79a4,92b130ed-fc31-43c2-8b0e-a5f8d3a91345,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:13:19.969000+00:00,precision,1267b942-5fe0-423c-8263-b32816c30099,0.8,,,['model_type:original'],quality,f16dff88-1305-4584-8c7d-3056728e79a4,92b130ed-fc31-43c2-8b0e-a5f8d3a91345,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:13:19.969000+00:00,matthews_correlation_coefficient,1267b942-5fe0-423c-8263-b32816c30099,0.4167242637192667,,,['model_type:original'],quality,f16dff88-1305-4584-8c7d-3056728e79a4,92b130ed-fc31-43c2-8b0e-a5f8d3a91345,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:13:19.969000+00:00,f1_measure,1267b942-5fe0-423c-8263-b32816c30099,0.5000000000000001,,,['model_type:original'],quality,f16dff88-1305-4584-8c7d-3056728e79a4,92b130ed-fc31-43c2-8b0e-a5f8d3a91345,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:13:19.969000+00:00,accuracy,1267b942-5fe0-423c-8263-b32816c30099,0.7551020408163265,,,['model_type:original'],quality,f16dff88-1305-4584-8c7d-3056728e79a4,92b130ed-fc31-43c2-8b0e-a5f8d3a91345,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:13:19.969000+00:00,label_skew,1267b942-5fe0-423c-8263-b32816c30099,0.6909336273400493,,,['model_type:original'],quality,f16dff88-1305-4584-8c7d-3056728e79a4,92b130ed-fc31-43c2-8b0e-a5f8d3a91345,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:13:19.969000+00:00,gini_coefficient,1267b942-5fe0-423c-8263-b32816c30099,0.3174825174825175,,,['model_type:original'],quality,f16dff88-1305-4584-8c7d-3056728e79a4,92b130ed-fc31-43c2-8b0e-a5f8d3a91345,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:13:19.969000+00:00,log_loss,1267b942-5fe0-423c-8263-b32816c30099,0.4493805793027406,,,['model_type:original'],quality,f16dff88-1305-4584-8c7d-3056728e79a4,92b130ed-fc31-43c2-8b0e-a5f8d3a91345,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:13:19.969000+00:00,false_positive_rate,1267b942-5fe0-423c-8263-b32816c30099,0.0461538461538461,,,['model_type:original'],quality,f16dff88-1305-4584-8c7d-3056728e79a4,92b130ed-fc31-43c2-8b0e-a5f8d3a91345,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68


Note: First 10 records were displayed.


# Fairness, drift monitoring and explanations <a name="fairness"></a>

### Fairness configuration

The code below configures fairness monitoring for our model. It turns on monitoring for one feature, MSSubClass. For each monitored feature, we must specify:

  * Which model feature to monitor
  * One or more **majority** groups, which are values of that feature that we expect to receive a higher percentage of favorable outcomes
  * One or more **minority** groups, which are values of that feature that we expect to receive a higher percentage of unfavorable outcomes
  * The fairness threshold (in this case, 80%); OpenScale displays an alert if the fairness measurement falls below it

Additionally, we must specify which model outcomes are favorable and which are unfavorable; for this regression model, these are ranges of predicted sale price. We must also provide the minimum number of records OpenScale uses to calculate the fairness score. In this case, OpenScale's fairness monitor runs hourly but does not calculate a new fairness rating until at least 50 records have been added. Finally, to calculate fairness, OpenScale must perform some calculations on the training data; it uses the training data referenced in Cloud Object Storage when the subscription was created.

In [None]:
wos_client.monitor_instances.show()

In [None]:
#wos_client.monitor_instances.delete(drift_monitor_instance_id,background_mode=False)

In [61]:
target = Target(
    target_type=TargetTypes.SUBSCRIPTION,
    target_id=subscription_id
)
parameters = {
    "features": [
        {
            "feature": "MSSubClass",
            # MSSubClass values 50-70 form the majority group, 80-100 the minority group
            "majority": [[50, 70]],
            "threshold": 0.8,
            "minority": [[80, 100]]
        }
    ],
    # Predicted sale prices in these ranges count as favourable / unfavourable outcomes
    "favourable_class": [[200000, 500000]],
    "unfavourable_class": [[35000, 100000]],
    "min_records": 50
}

fairness_monitor_details = wos_client.monitor_instances.create(
    data_mart_id=data_mart_id,
    background_mode=False,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.FAIRNESS.ID,
    target=target,
    parameters=parameters).result
fairness_monitor_instance_id = fairness_monitor_details.metadata.id
fairness_monitor_instance_id




 Waiting for end of monitor instance creation f6acf84d-fd92-44f3-9cbb-82fef3a13575 




active

---------------------------------------
 Monitor instance successfully created 
---------------------------------------




'f6acf84d-fd92-44f3-9cbb-82fef3a13575'
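A fairness score of this kind is commonly described as a disparate impact ratio: the rate of favourable outcomes for the minority (monitored) group divided by the rate for the majority (reference) group, reported as a percentage in the fairness metrics output later in this notebook. A minimal sketch of that ratio using the range-based groups configured above, assuming inclusive ranges and that every record in a group counts toward its denominator. Predictions between 100,000 and 200,000 fall in neither class; this sketch simply counts them as not favourable, which may differ from OpenScale's handling. The helper names and sample records are hypothetical:

```python
from typing import List, Optional, Sequence, Tuple


def in_ranges(value: float, ranges: Sequence[Sequence[float]]) -> bool:
    """True if value lies inside any inclusive [low, high] range."""
    return any(low <= value <= high for low, high in ranges)


def favourable_rate(records: List[Tuple[float, float]],
                    group: Sequence[Sequence[float]],
                    favourable: Sequence[Sequence[float]]) -> Optional[float]:
    """Share of a group's predictions that fall inside a favourable range."""
    preds = [pred for feature_value, pred in records if in_ranges(feature_value, group)]
    if not preds:
        return None
    return sum(in_ranges(pred, favourable) for pred in preds) / len(preds)


def disparate_impact(records, majority, minority, favourable) -> Optional[float]:
    """Minority favourable rate divided by majority favourable rate."""
    minority_rate = favourable_rate(records, minority, favourable)
    majority_rate = favourable_rate(records, majority, favourable)
    if minority_rate is None or not majority_rate:
        return None
    return minority_rate / majority_rate


# Hypothetical (MSSubClass, predicted SalePrice) pairs, not real notebook data
records = [(60, 250000.0), (60, 210000.0), (50, 120000.0), (70, 90000.0),
           (90, 220000.0), (80, 150000.0), (100, 95000.0), (80, 180000.0)]
ratio = disparate_impact(records, majority=[[50, 70]], minority=[[80, 100]],
                         favourable=[[200000, 500000]])
print(f"disparate impact: {ratio:.2f}, below 0.8 threshold: {ratio < 0.8}")
```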

### Drift configuration

#### Note: you can enable or disable model drift and data drift independently by setting `enable_model_drift` and `enable_data_drift` to `True` or `False` in the parameters

In [62]:
monitor_instances = wos_client.monitor_instances.list().result.monitor_instances
for monitor_instance in monitor_instances:
    monitor_def_id=monitor_instance.entity.monitor_definition_id
    if monitor_def_id == "drift" and monitor_instance.entity.target.target_id == subscription_id:
        wos_client.monitor_instances.delete(monitor_instance.metadata.id)
        print('Deleted existing drift monitor instance with id: ', monitor_instance.metadata.id)

In [63]:
target = Target(
    target_type=TargetTypes.SUBSCRIPTION,
    target_id=subscription_id

)
parameters = {
    "min_samples": 50,
    "drift_threshold": 0.1,
    "train_drift_model": True,
    "enable_model_drift": True,
    "enable_data_drift": True
}

drift_monitor_details = wos_client.monitor_instances.create(
    data_mart_id=data_mart_id,
    background_mode=False,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.DRIFT.ID,
    target=target,
    parameters=parameters
).result

drift_monitor_instance_id = drift_monitor_details.metadata.id
drift_monitor_instance_id




 Waiting for end of monitor instance creation c5482035-2d6f-4454-af35-91868568b23c 




preparing.
active

---------------------------------------
 Monitor instance successfully created 
---------------------------------------




'c5482035-2d6f-4454-af35-91868568b23c'

#### Finding feature importances

We compute feature importances to identify the features that matter most to the model. Passing them to the Drift v2 monitor can narrow the drift analysis in the UI down to the most important features.

In [64]:
fields = [col for col in feature_data.columns if col != "SalePrice"]
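# Cast NumPy scalars to built-in floats so the importances can be serialized to JSON
feature_importance = {name: float(score) for name, score in zip(fields, model_xgb.named_steps['xgb'].feature_importances_)}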
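To see which features dominate, the importance dictionary can be sorted and truncated. A small sketch; `top_features` is a hypothetical helper, and the feature names and scores in the example are illustrative rather than taken from this model:

```python
from typing import Dict, List, Tuple


def top_features(importances: Dict[str, float], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k (feature, importance) pairs with the highest importance, largest first."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return sorted(importances.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical importances; in the notebook, pass feature_importance from the cell above
example = {"OverallQual": 0.41, "GrLivArea": 0.18, "GarageCars": 0.12,
           "YearBuilt": 0.05, "MSSubClass": 0.02}
for name, score in top_features(example, k=3):
    print(f"{name}: {score:.2f}")
```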

### Drift V2 configuration

In [65]:
monitor_instances = wos_client.monitor_instances.list().result.monitor_instances
for monitor_instance in monitor_instances:
    monitor_def_id=monitor_instance.entity.monitor_definition_id
    if monitor_def_id == "drift_v2" and monitor_instance.entity.target.target_id == subscription_id:
        wos_client.monitor_instances.delete(monitor_instance.metadata.id)
        print('Deleted existing drift v2 monitor instance with id: ', monitor_instance.metadata.id)

Deleted existing drift v2 monitor instance with id:  8d540372-c15c-4a1b-be45-90dace18da77


In [66]:
target = Target(
    target_type=TargetTypes.SUBSCRIPTION,
    target_id=subscription_id
)

parameters = {
    "min_samples": 10,
    "max_samples": 1000,
    "train_archive": True,
    # Uncomment to pass the feature importances computed above
    # "features": {
    #     "fields": fields,
    #     "importances": feature_importance
    # }
}
drift_v2_monitor_details = wos_client.monitor_instances.create(
    data_mart_id=data_mart_id,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.DRIFT_V2.ID,
    target=target,
    parameters=parameters
).result

drift_v2_monitor_instance_id = drift_v2_monitor_details.metadata.id
drift_v2_monitor_instance_id

'1f80ac13-dc71-423b-b54a-2f477ca5f860'

## Score the model again now that monitoring is configured

This next section downloads a prepared payload file and sends its 50 records to the model for scoring. Together with the records scored earlier, this meets the minimum record counts set in the previous section, which allows OpenScale to begin calculating fairness and drift.

In [67]:
!wget https://raw.githubusercontent.com/IBM/watson-openscale-samples/main/IBM%20Cloud/WML/assets/data/house_price/custom_scoring_payloads_50_regression.json

--2024-08-06 05:27:38--  https://raw.githubusercontent.com/IBM/watson-openscale-samples/main/IBM%20Cloud/WML/assets/data/house_price/custom_scoring_payloads_50_regression.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.110.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12527 (12K) [text/plain]
Saving to: ‘custom_scoring_payloads_50_regression.json’


2024-08-06 05:27:39 (11.1 MB/s) - ‘custom_scoring_payloads_50_regression.json’ saved [12527/12527]



In [68]:
with open('custom_scoring_payloads_50_regression.json', 'r') as scoring_file:
    scoring_data = json.load(scoring_file)

In [69]:
fields = scoring_data[0]['request']['fields']
values = scoring_data[0]['request']['values']
payload_scoring = {"input_data": [{"fields": fields, "values": values}]}

# Remember the payload record count before scoring so we can tell whether automatic logging worked
records_before = wos_client.data_sets.get_records_count(payload_data_set_id)

scoring_response = wml_client.deployments.score(deployment_uid, payload_scoring)
time.sleep(5)

pl_records_count = wos_client.data_sets.get_records_count(payload_data_set_id)
if pl_records_count == records_before:
    print("Payload logging did not happen, performing explicit payload logging.")
    wos_client.data_sets.store_records(data_set_id=payload_data_set_id, request_body=[PayloadRecord(
        scoring_id=str(uuid.uuid4()),
        request=payload_scoring,
        response={"fields": scoring_response['predictions'][0]['fields'],
                  "values": scoring_response['predictions'][0]['values']},
        response_time=460
    )], background_mode=False)
    time.sleep(5)
    pl_records_count = wos_client.data_sets.get_records_count(payload_data_set_id)
    print("Number of records in the payload logging table: {}".format(pl_records_count))

In [70]:
print('Number of records in payload table: ', wos_client.data_sets.get_records_count(data_set_id=payload_data_set_id))

Number of records in payload table:  52
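With the payload record count known, each monitor's minimum sample size can be checked before triggering runs. A small sketch, with the minimums copied from the monitor parameters configured earlier in this notebook; `ready_monitors` is a hypothetical helper, and in the notebook the count would come from `wos_client.data_sets.get_records_count(data_set_id=payload_data_set_id)`:

```python
from typing import Dict, List

# Minimum payload records per monitor, copied from the parameters configured above
MONITOR_MINIMUMS: Dict[str, int] = {
    "fairness": 50,   # min_records
    "drift": 50,      # min_samples
    "drift_v2": 10,   # min_samples
}


def ready_monitors(record_count: int, minimums: Dict[str, int] = MONITOR_MINIMUMS) -> List[str]:
    """Names of monitors whose minimum record count has been reached."""
    return [name for name, minimum in minimums.items() if record_count >= minimum]


print(ready_monitors(52))
print(ready_monitors(20))
```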


## Run fairness monitor

Kick off a fairness monitor run on current data. The monitor runs hourly, but can be manually initiated using the Python client, the REST API, or the graphical user interface.

In [71]:
run_details = wos_client.monitor_instances.run(monitor_instance_id=fairness_monitor_instance_id, background_mode=False)




 Waiting for end of monitoring run bfceb323-51ca-4055-80ea-25288a3a3919 




running
finished

---------------------------
 Successfully finished run 
---------------------------




In [72]:
time.sleep(10)

wos_client.monitor_instances.show_metrics(monitor_instance_id=fairness_monitor_instance_id)

0,1,2,3,4,5,6,7,8,9,10,11
2024-08-06 05:27:54.153189+00:00,fairness_value,7cba2434-ab77-4d4f-bdd7-46e17b731420,100.0,80.0,,"['feature:MSSubClass', 'fairness_metric_type:fairness', 'feature_value:80-100']",fairness,f6acf84d-fd92-44f3-9cbb-82fef3a13575,bfceb323-51ca-4055-80ea-25288a3a3919,subscription,200c31a3-9a6f-452b-8512-1884e3e90c9e


## Run drift monitor


Kick off a drift monitor run on current data. The monitor runs hourly but can also be initiated manually using the Python client or the REST API.

In [73]:
drift_run_details = wos_client.monitor_instances.run(monitor_instance_id=drift_monitor_instance_id, background_mode=False)




 Waiting for end of monitoring run cc657f33-7964-4448-a720-305da0b2f687 




finished

---------------------------
 Successfully finished run 
---------------------------




In [74]:
time.sleep(5)

wos_client.monitor_instances.show_metrics(monitor_instance_id=drift_monitor_instance_id)

0,1,2,3,4,5,6,7,8,9,10,11
2024-08-06 05:28:12.322844+00:00,data_drift_magnitude,15d45ed5-9c46-48eb-a3bb-374fac9c857a,1.0,,0.1,[],drift,c5482035-2d6f-4454-af35-91868568b23c,cc657f33-7964-4448-a720-305da0b2f687,subscription,200c31a3-9a6f-452b-8512-1884e3e90c9e


## Run drift V2 monitor


Kick off a drift v2 monitor run on current data. The monitor runs daily but can also be initiated manually using the Python client or the REST API.

In [None]:
drift_v2_run_details = wos_client.monitor_instances.run(monitor_instance_id=drift_v2_monitor_instance_id, background_mode=False)




 Waiting for end of monitoring run 0fd0b8bc-944d-49f0-a71d-7bf2ec7098c9 




running................................................
finished

---------------------------
 Successfully finished run 
---------------------------




In [84]:
time.sleep(15)

wos_client.monitor_instances.show_metrics(monitor_instance_id=drift_v2_monitor_instance_id)

0,1,2,3,4,5,6,7,8,9,10,11
2024-07-16 06:19:52.904904+00:00,records_processed,b7e32f25-be24-4583-95b3-bf0c52109ffb,208.0,,,"['algorithm_used:total_variation', 'computed_on:payload', 'field_type:class', 'field_name:Class Probability for No Risk']",drift_v2,c61a2e92-9218-430a-96b2-0d6262fba0ae,0fd0b8bc-944d-49f0-a71d-7bf2ec7098c9,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:19:52.904904+00:00,confidence_drift_score,b7e32f25-be24-4583-95b3-bf0c52109ffb,0.1862,,0.05,"['algorithm_used:total_variation', 'computed_on:payload', 'field_type:class', 'field_name:Class Probability for No Risk']",drift_v2,c61a2e92-9218-430a-96b2-0d6262fba0ae,0fd0b8bc-944d-49f0-a71d-7bf2ec7098c9,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:19:52.904904+00:00,records_processed,b7e32f25-be24-4583-95b3-bf0c52109ffb,208.0,,,"['algorithm_used:overlap_coefficient', 'computed_on:payload', 'field_type:class', 'field_name:Class Probability for No Risk']",drift_v2,c61a2e92-9218-430a-96b2-0d6262fba0ae,0fd0b8bc-944d-49f0-a71d-7bf2ec7098c9,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:19:52.904904+00:00,confidence_drift_score,b7e32f25-be24-4583-95b3-bf0c52109ffb,0.1868,,0.05,"['algorithm_used:overlap_coefficient', 'computed_on:payload', 'field_type:class', 'field_name:Class Probability for No Risk']",drift_v2,c61a2e92-9218-430a-96b2-0d6262fba0ae,0fd0b8bc-944d-49f0-a71d-7bf2ec7098c9,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:19:58.188821+00:00,records_processed,e314eb41-daf5-4bc4-bf92-c56cccba4eb7,208.0,,,"['algorithm_used:total_variation', 'computed_on:payload', 'field_type:class', 'field_name:Class Probability for Risk']",drift_v2,c61a2e92-9218-430a-96b2-0d6262fba0ae,0fd0b8bc-944d-49f0-a71d-7bf2ec7098c9,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:19:58.188821+00:00,confidence_drift_score,e314eb41-daf5-4bc4-bf92-c56cccba4eb7,0.1657,,0.05,"['algorithm_used:total_variation', 'computed_on:payload', 'field_type:class', 'field_name:Class Probability for Risk']",drift_v2,c61a2e92-9218-430a-96b2-0d6262fba0ae,0fd0b8bc-944d-49f0-a71d-7bf2ec7098c9,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:19:58.188821+00:00,records_processed,e314eb41-daf5-4bc4-bf92-c56cccba4eb7,208.0,,,"['algorithm_used:overlap_coefficient', 'computed_on:payload', 'field_type:class', 'field_name:Class Probability for Risk']",drift_v2,c61a2e92-9218-430a-96b2-0d6262fba0ae,0fd0b8bc-944d-49f0-a71d-7bf2ec7098c9,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:19:58.188821+00:00,confidence_drift_score,e314eb41-daf5-4bc4-bf92-c56cccba4eb7,0.1667,,0.05,"['algorithm_used:overlap_coefficient', 'computed_on:payload', 'field_type:class', 'field_name:Class Probability for Risk']",drift_v2,c61a2e92-9218-430a-96b2-0d6262fba0ae,0fd0b8bc-944d-49f0-a71d-7bf2ec7098c9,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:19:58.596595+00:00,records_processed,75c7061d-8b95-4f94-bfc7-9c4b8d354476,208.0,,,"['algorithm_used:jensen_shannon', 'computed_on:payload', 'field_type:class', 'field_name:predictedLabel']",drift_v2,c61a2e92-9218-430a-96b2-0d6262fba0ae,0fd0b8bc-944d-49f0-a71d-7bf2ec7098c9,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68
2024-07-16 06:19:58.596595+00:00,prediction_drift_score,75c7061d-8b95-4f94-bfc7-9c4b8d354476,0.0469,,0.05,"['algorithm_used:jensen_shannon', 'computed_on:payload', 'field_type:class', 'field_name:predictedLabel']",drift_v2,c61a2e92-9218-430a-96b2-0d6262fba0ae,0fd0b8bc-944d-49f0-a71d-7bf2ec7098c9,subscription,f1a960ef-b474-43f9-8621-d0b1f2a4ef68


Note: First 10 records were displayed.
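The drift v2 metrics output names the algorithms used (`total_variation`, `jensen_shannon`, `overlap_coefficient`). Two of these are standard distances between discrete distributions; the sketch below implements their textbook forms to show how a baseline distribution is compared with a payload distribution. It is not OpenScale's implementation, and the bucketed distributions in the example are hypothetical:

```python
import math
from typing import Sequence


def _check(p: Sequence[float], q: Sequence[float]) -> None:
    """Both inputs must be probability vectors over the same buckets."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of buckets")
    for dist in (p, q):
        if any(x < 0 for x in dist) or not math.isclose(sum(dist), 1.0, abs_tol=1e-9):
            raise ValueError("each distribution must be non-negative and sum to 1")


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Half the L1 distance: 0 for identical distributions, 1 for disjoint ones."""
    _check(p, q)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def jensen_shannon(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence with base-2 logs, so it lies in [0, 1]."""
    _check(p, q)
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x == 0 contribute nothing; y > 0 wherever x > 0 because y is the mixture
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical share of predictions per price bucket: training baseline vs recent payload
baseline = [0.2, 0.5, 0.3]
payload = [0.1, 0.5, 0.4]
print(round(total_variation(baseline, payload), 4), round(jensen_shannon(baseline, payload), 4))
```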


## Configure Explainability

Finally, we enable the explainability monitor. OpenScale uses the training data referenced in the subscription to configure the explainability features.

In [80]:
target = Target(
    target_type=TargetTypes.SUBSCRIPTION,
    target_id=subscription_id
)
parameters = {
    "enabled": True
}
explainability_details = wos_client.monitor_instances.create(
    data_mart_id=data_mart_id,
    background_mode=False,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.EXPLAINABILITY.ID,
    target=target,
    parameters=parameters
).result

explainability_monitor_id = explainability_details.metadata.id




 Waiting for end of monitor instance creation 0b1bf6c2-3adb-4c4d-ab74-b2cd0f236466 




preparing
active

---------------------------------------
 Monitor instance successfully created 
---------------------------------------




## Run explanation for sample record

In [81]:
pl_records_resp = wos_client.data_sets.get_list_of_records(data_set_id=payload_data_set_id, limit=1, offset=0).result
scoring_ids = [pl_records_resp["records"][0]["entity"]["values"]["scoring_id"]]
print("Running explanations on scoring IDs: {}".format(scoring_ids))
explanation_types = ["lime", "contrastive"]
result = wos_client.monitor_instances.explanation_tasks(scoring_ids=scoring_ids, explanation_types=explanation_types, subscription_id=subscription_id).result
print(result)

Running explanations on scoring IDs: ['952b8f735f2c3033561fe82bd84ed51b-1']
{
  "metadata": {
    "explanation_task_ids": [
      "71b7c9ff-e86e-4e8c-add0-3554237882b6"
    ],
    "created_by": "IBMid-662005298W",
    "created_at": "2024-08-06T05:34:04.730157Z"
  }
}


In [82]:
explanation_task_id=result.to_dict()['metadata']['explanation_task_ids'][0]
explanation_task_id

'71b7c9ff-e86e-4e8c-add0-3554237882b6'

In [83]:
wos_client.monitor_instances.get_explanation_tasks(explanation_task_id=explanation_task_id, subscription_id=subscription_id).result.to_dict()

{'metadata': {'explanation_task_id': '71b7c9ff-e86e-4e8c-add0-3554237882b6',
  'created_by': 'IBMid-662005298W',
  'created_at': '2024-08-06T05:34:04.730157Z'},
 'entity': {'status': {'state': 'in_progress'},
  'asset': {'id': 'fcbeb140-dcd2-4ce2-a74e-fe613c252539',
   'name': 'house_price_xgbregression',
   'input_data_type': 'structured',
   'problem_type': 'regression',
   'deployment': {'id': 'e82b106e-0ce0-4bd2-8af7-60f78b46151d',
    'name': 'house_price_xgbregression_deployment'}},
  'scoring_id': '952b8f735f2c3033561fe82bd84ed51b-1'}}
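The explanation task above is still `in_progress` when first fetched, so its result is not ready yet. A minimal polling sketch: `wait_for_state` is a hypothetical helper, and treating `finished` and `error` as the terminal states is an assumption rather than a documented list. The commented usage reads the state from the `entity.status.state` path shown in the output above:

```python
import time
from typing import Callable, Iterable


def wait_for_state(get_state: Callable[[], str],
                   terminal_states: Iterable[str] = ("finished", "error"),  # assumed terminal states
                   timeout_s: float = 300,
                   interval_s: float = 10) -> str:
    """Call get_state until it returns a terminal state or the timeout expires."""
    terminal = set(terminal_states)
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state in terminal:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still in state {state!r} after {timeout_s} seconds")
        time.sleep(interval_s)


# In the notebook (not run here):
# state = wait_for_state(lambda: wos_client.monitor_instances.get_explanation_tasks(
#     explanation_task_id=explanation_task_id, subscription_id=subscription_id
# ).result.to_dict()["entity"]["status"]["state"])
```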

## Additional data to help debugging

In [84]:
print('Datamart:', data_mart_id)
print('Model:', model_uid)
print('Deployment:', deployment_uid)

Datamart: 80e6093f-5acf-4eb7-9da6-7ba9bf56a929
Model: fcbeb140-dcd2-4ce2-a74e-fe613c252539
Deployment: e82b106e-0ce0-4bd2-8af7-60f78b46151d


## Identify transactions for Explainability

Transaction IDs identified by the cells below can be copied and pasted into the Explainability tab of the OpenScale dashboard.

In [85]:
wos_client.data_sets.show_records(payload_data_set_id, limit=5)

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39
,0.0,555.0,8.0,952b8f735f2c3033561fe82bd84ed51b-1,0.0,1040.0,2.0,2024-08-06T05:27:39.636966Z,0.0,1.0,0.0,871.0,292.0,2004.0,3.0,2046.0,132.0,3.0,283784.0625,62.0,2003.0,2003.0,2008.0,1.0,1006.0,8.0,0.0,0.0,0.0,7.0,60.0,320.0,e82b106e-0ce0-4bd2-8af7-60f78b46151d,5.0,998.0,1.0,1.0,866.0,0.0
,0.0,333.0,10.0,952b8f735f2c3033561fe82bd84ed51b-10,0.0,0.0,2.0,2024-08-06T05:27:39.636966Z,0.0,0.0,0.0,880.0,296.0,2004.0,3.0,1629.0,1603.0,3.0,283677.9375,0.0,2003.0,2003.0,2009.0,1.0,1629.0,7.0,0.0,479.0,0.0,8.0,20.0,0.0,e82b106e-0ce0-4bd2-8af7-60f78b46151d,5.0,3206.0,1.0,1.0,1124.0,0.0
,0.0,1181.0,4.0,952b8f735f2c3033561fe82bd84ed51b-11,0.0,1216.0,2.0,2024-08-06T05:27:39.636966Z,0.0,1.0,0.0,693.0,0.0,1991.0,4.0,2514.0,0.0,2.0,257960.984375,0.0,1990.0,1990.0,2006.0,1.0,1298.0,8.0,0.0,0.0,0.0,7.0,60.0,0.0,e82b106e-0ce0-4bd2-8af7-60f78b46151d,5.0,1216.0,0.0,0.0,1216.0,0.0
,0.0,399.0,11.0,952b8f735f2c3033561fe82bd84ed51b-12,0.0,0.0,1.0,2024-08-06T05:27:39.636966Z,0.0,0.0,0.0,338.0,0.0,1950.0,2.0,1077.0,961.0,1.0,68453.53125,0.0,1920.0,1920.0,2007.0,1.0,1077.0,6.0,0.0,0.0,0.0,5.0,30.0,0.0,e82b106e-0ce0-4bd2-8af7-60f78b46151d,2.0,961.0,0.0,0.0,0.0,0.0
,0.0,1411.0,6.0,952b8f735f2c3033561fe82bd84ed51b-13,0.0,896.0,2.0,2024-08-06T05:27:39.636966Z,0.0,1.0,0.0,622.0,0.0,2001.0,3.0,1840.0,278.0,2.0,229165.0625,45.0,2001.0,2001.0,2009.0,1.0,944.0,6.0,0.0,0.0,0.0,7.0,60.0,0.0,e82b106e-0ce0-4bd2-8af7-60f78b46151d,5.0,944.0,0.0,1.0,666.0,0.0
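Instead of copying transaction IDs from the table, they can also be pulled out of the records response used earlier for explanations, where each record keeps its ID at `entity.values.scoring_id`. A sketch assuming that layout; `extract_scoring_ids` is a hypothetical helper:

```python
from typing import Any, Dict, List


def extract_scoring_ids(records_response: Dict[str, Any]) -> List[str]:
    """Collect scoring IDs from a response shaped like {"records": [{"entity": {"values": {...}}}]}."""
    return [record["entity"]["values"]["scoring_id"]
            for record in records_response.get("records", [])]


# Hypothetical response fragment using the ID layout seen earlier in this notebook
sample = {"records": [
    {"entity": {"values": {"scoring_id": "952b8f735f2c3033561fe82bd84ed51b-1"}}},
    {"entity": {"values": {"scoring_id": "952b8f735f2c3033561fe82bd84ed51b-10"}}},
]}
print(extract_scoring_ids(sample))

# In the notebook (not run here):
# records = wos_client.data_sets.get_list_of_records(data_set_id=payload_data_set_id, limit=5, offset=0).result
# print(extract_scoring_ids(records))
```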


## Congratulations!

You have finished the hands-on lab for IBM Watson OpenScale. You can now view the [OpenScale Dashboard](https://aiopenscale.cloud.ibm.com/). Click on the tile for the House Price Regression model to see fairness, accuracy, and performance monitors. Click on the timeseries graph to get detailed information on transactions during a specific time window.
