<img src="https://github.com/pmservice/ai-openscale-tutorials/raw/master/notebooks/images/banner.png" align="left" alt="banner">

# IBM Watson OpenScale - Onboard models for monitoring using scored training data table and a sample csv

This notebook must be run in the Python 3.10 runtime environment. It requires Watson OpenScale service credentials.

The notebook demonstrates how to onboard a model (which stores its runtime data in a remote Hive database) for monitoring in IBM Watson OpenScale. Use the notebook to enable quality, drift, drift v2, fairness and explainability monitoring. Before you can run the notebook, you must have the following resources:

1. Sample CSV file
2. Scored training data table (existing) in Hive
3. Details of the Feedback, Payload, Drifted Transactions, Explanations Queue and Explanations Result tables (either existing or to be created) in Hive.

## Contents

1. [Setup](#setup)
2. [Provide path to sample csv file containing training data](#path-to-csv)
3. [Provide Storage Details](#backend-storage)
4. [Provide Table Details](#table-details)
5. [Provide model details](#model-details)
6. [Provide Spark Compute Engine Details](#spark)
7. [Connect to IBM Watson OpenScale Instance](#connect-openscale)
8. [Connect service provider in IBM Watson OpenScale Instance](#create-service-provider)
9. [Onboard model for monitoring in IBM Watson OpenScale Instance](#create-subscription)
10. [Enable services to monitor model](#enable-monitors)

## Setup <a name="setup"></a>

### Installing Required Libraries

First install the packages that the notebook needs. After the installation finishes, restart the kernel.

In [None]:
import warnings
warnings.filterwarnings("ignore")
%env PIP_DISABLE_PIP_VERSION_CHECK=1

# Note: Restart kernel after the dependencies are installed
!pip install --upgrade ibm-watson-openscale
!pip install "ibm_wos_utils~=5.2.0"
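Since the notebook is stated to require the Python 3.10 runtime, a quick check before continuing can save a failed run later. The following is a minimal sketch; the helper name `is_supported_runtime` is hypothetical and not part of any IBM package.

```python
import sys

# Runtime requirement taken from the introduction of this notebook.
REQUIRED_PYTHON = (3, 10)


def is_supported_runtime(version_info=None):
    """Return True when the major.minor version matches REQUIRED_PYTHON."""
    major, minor = (version_info or sys.version_info)[:2]
    return (major, minor) == REQUIRED_PYTHON


if not is_supported_runtime():
    print("Warning: this notebook expects Python {}.{}, found {}.{}".format(
        *REQUIRED_PYTHON, *sys.version_info[:2]))
```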

## Provide path to sample csv file containing model input and output including label column <a name="path-to-csv"></a>
This csv file is used to determine the model input and output columns and their data types. Provide the path to the csv file here.

If you are running this notebook in IBM Watson Studio, first upload the csv file to the project, then use the commented code snippet to download it to the local directory of this notebook.

In [13]:
# # Download "sample_csv" from project to local directory
# from ibm_watson_studio_lib import access_project_or_space
# wslib = access_project_or_space()
# wslib.download_file("sample_csv")
sample_csv = ""
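Before passing the csv file on, it can help to look at which columns and rough data types it contains. The sketch below uses only the standard library; `infer_column_types` is a hypothetical helper, and its int/float/string guess is a simplification, not the inference that IBM Watson OpenScale performs.

```python
import csv


def infer_column_types(csv_path, max_rows=100):
    """Guess a simple type (int, float or string) per column from the first rows."""

    def classify(value):
        # Try the narrowest type first, then fall back to string.
        for caster, name in ((int, "int"), (float, "float")):
            try:
                caster(value)
                return name
            except ValueError:
                continue
        return "string"

    # A column is widened (int -> float -> string) when a later value requires it.
    rank = {"int": 0, "float": 1, "string": 2}
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        types = {column: None for column in reader.fieldnames or []}
        for row_number, row in enumerate(reader):
            if row_number >= max_rows:
                break
            for column, value in row.items():
                # Skip malformed extra fields and empty cells.
                if column not in types or value in (None, ""):
                    continue
                guess = classify(value)
                current = types[column]
                if current is None or rank[guess] > rank[current]:
                    types[column] = guess
    # Columns with no observed values default to string.
    return {column: kind or "string" for column, kind in types.items()}
```

Once `sample_csv` points to a real file, calling `infer_column_types(sample_csv)` returns a dictionary that maps each column name to its guessed type, which is useful when filling in `label_column`, `prediction` and `feature_columns` later.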

## Provide Backend Storage Details <a name="backend-storage"></a>

IBM Watson OpenScale services monitor models by analyzing runtime data, i.e., the data the model is making predictions on. To do this analysis, most of the services require access to this runtime data (also called payload data). In addition, some of the services may require access to manually labelled runtime data (also called feedback data). You therefore need to store this data in a backend storage and connect that storage to IBM Watson OpenScale.

### Provide Hive database connection details

| Parameter | Description | Possible Value(s) |
| :- | :- | :- |
| type | Describes the type of storage being used. For hive, this must be set to `hive`. | `hive` |
| metastore_url | An optional string value specifying hive metastore url. Example: `thrift://localhost:9083` | |
| location_type | Identifies the type of location for connection to use. For hive, this must be set to `metastore`. | `metastore` |

#### Provide additional details related to the Hadoop delegation token if Hive is Kerberos secured and Spark in IBM Analytics Engine is used [Optional]
| Parameter | Description | Possible Value(s) |
| :- | :- | :- |
| kerberos_principal | The kerberos principal used to generate the delegation token. | |
| delegation_token_urn | The secret_urn of the CP4D vault where the delegation token is stored. | |
| delegation_token_endpoint | The REST endpoint which generates and returns the delegation token. | |

In [14]:
datawarehouse_details = {
    "type": "hive",
    "connection": {
        "location_type": "metastore",
        "metastore_url": ""
    },
    "credentials": {}
}

# Flag to indicate if the Hive is secured with Kerberos and Spark in IAE is used
kerberos_enabled = False

# Provide Hadoop delegation token details if kerberos_enabled is True
# Provide either secret_urn of the CP4D vault OR the delegation token endpoint. One of the two fields is mandatory to fetch the delegation token.
kerberos_principal = ""
delegation_token_urn = ""
delegation_token_endpoint = ""

if kerberos_enabled is True:
    datawarehouse_details["connection"]["kerberos_enabled"] = True
    datawarehouse_details["credentials"]["kerberos_principal"] = kerberos_principal
    if delegation_token_urn:
        datawarehouse_details["credentials"]["delegation_token_urn"] = delegation_token_urn
    if delegation_token_endpoint:
        datawarehouse_details["credentials"]["delegation_token_endpoint"] = delegation_token_endpoint
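The comments in the cell above say that either the vault `secret_urn` or the delegation token endpoint must be provided when Kerberos is enabled. A small pre-flight check along those lines could look like the sketch below. `validate_kerberos_details` is a hypothetical helper, and treating the principal as mandatory is an assumption based on the cell always setting it.

```python
def validate_kerberos_details(principal, token_urn="", token_endpoint=""):
    """Return a list of problems; an empty list means the details look complete."""
    problems = []
    # Assumption: the principal is always needed when Kerberos is enabled.
    if not principal:
        problems.append("kerberos_principal is empty")
    # Stated in the notebook: one of the two token sources is mandatory.
    if not (token_urn or token_endpoint):
        problems.append(
            "provide delegation_token_urn or delegation_token_endpoint")
    return problems
```

With the variables from the cell above, `validate_kerberos_details(kerberos_principal, delegation_token_urn, delegation_token_endpoint)` can be called when `kerberos_enabled` is `True`, and the returned messages printed before continuing.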

## Provide details of different tables<a name="table-details"></a>

IBM Watson OpenScale services require different tables to perform their analysis. Depending on which services you have enabled, provide details of the corresponding tables.
Tables are:

| Table | Description |
| :- | :- |
| Payload Table | Hosts the runtime data predicted by model. Required for detecting fairness and drift in runtime data. |
| Feedback Table | Hosts the manually labelled runtime data (also called feedback data) predicted by model. Required for tracking the quality of the model by analyzing feedback data. |
| Drifted Transactions Table | Hosts the data identified to be drifted.|
| Explain Queue Table | Hosts the data for which explanations are required to be generated. This can be same as payload table.|
| Explain Results Table | Hosts the explanations generated for records in explain queue table. |
| Scored Training Data Table | Contains the details of the table holding scored training data. If you don't have this table available, refer to [this notebook](https://github.ibm.com/aiopenscale/api-client-utils/blob/master/notebooks/batch/4.6/jdbc/common_configuration_notebook_simplified_jdbc.ipynb) to create it. The scored training data table must be available in the database named by `DATABASE_NAME`. |

For each table, the following information is required:

| Parameter | Description | Possible Value(s) |
| :- | :- | :- |
| database | Name of the database hosting the schema. | |
| table | Name of the table. | |
| auto_create | Boolean value identifying if the table already exists or has to be created via IBM Watson OpenScale. | `True` or `False`|
| hive_storage_format | Storage format to use for data in tables. Used only when tables are created using IBM Watson OpenScale. | `csv`, `parquet`, `orc` |

In [15]:
DATABASE_NAME= ""
# Scored training data table information 
scored_training_data_table = {
    "data": {
        "auto_create": False, #set it to False if table already exists
        "database": DATABASE_NAME,
        "table": ""
    },
    "parameters":{
        "hive_storage_format": "csv"
    }
}

In [16]:
DATABASE_NAME=""

# Payload table information
payload_table = {
    "data": {
        "auto_create": True, #set it to False if table already exists
        "database": DATABASE_NAME,
        "table": ""
    },
    "parameters":{
        "hive_storage_format": ""
    }
}

# Feedback table information
feedback_table = {
    "data": {
        "auto_create": True, #set it to False if table already exists
        "database": DATABASE_NAME,
        "table": ""
    },
    "parameters":{
        "hive_storage_format": ""
    }
}

# The below tables are required by monitors

#Drifted Transaction table. 
#Set this table information if drift is enabled
drifted_transaction_table = {
    "data": {
        "auto_create": True, #set it to False if table already exists
        "database": DATABASE_NAME,
        "table": ""
    },
    "parameters":{}
}

#Explanation Result table
#Set this table information if Explain is enabled
explain_result_table = {
    "data": {
        "auto_create": True, #set it to False if table already exists
        "database": DATABASE_NAME,
        "table": ""
    }
}

#Explanation Queue table
#Set this table information if Explain is enabled
explain_queue_table = {
    "data": {
        "auto_create": True, #set it to False if table already exists
        "database": DATABASE_NAME,
        "table": ""
    },
    "parameters":{
        "hive_storage_format": ""
    }
}
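The table dictionaries above all share the same shape, so they can also be produced by one small function that validates the storage format against the values listed in the parameter table. This is only a sketch: `make_table_details` is a hypothetical helper, and always emitting a `parameters` key (empty when no format is given, as in `drifted_transaction_table`) is an assumption about what the SDK accepts.

```python
# Storage formats listed in the parameter table of this notebook.
HIVE_STORAGE_FORMATS = ("csv", "parquet", "orc")


def make_table_details(database, table, auto_create=True,
                       hive_storage_format=None):
    """Build a table description dict in the shape used by this notebook."""
    if not database or not table:
        raise ValueError("Both database and table names are required.")
    details = {
        "data": {
            "auto_create": auto_create,  # False if the table already exists
            "database": database,
            "table": table,
        },
        "parameters": {},
    }
    if hive_storage_format is not None:
        if hive_storage_format not in HIVE_STORAGE_FORMATS:
            raise ValueError("hive_storage_format must be one of {}".format(
                ", ".join(HIVE_STORAGE_FORMATS)))
        details["parameters"]["hive_storage_format"] = hive_storage_format
    return details
```

For example, `make_table_details("my_db", "payload", hive_storage_format="parquet")` yields a dictionary equivalent to a filled-in `payload_table`; the database and table names here are placeholders.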

## Provide Model Details <a name="model-details"></a>

| Parameter | Description | Possible Value(s) |
| :- | :- | :- |
| label_column | The column which contains the target field (also known as label column or the class label). | |
| model_type | Enumeration classifying if your model is a binary or a multi-class classifier or a regressor. | `binary`, `multiclass`, `regression` |
| prediction | The column containing the model output. This should be of the same data type as the label column. | |
| probability | The column (of type array) containing the model probabilities for all the possible prediction outcomes. This is not required for regression models. | |
| url | scoring url for the deployed model. | |
| token | scoring token for the deployed model. This is required only for Azure ML studio model | |
| feature_columns | Columns identified as features by model. If not provided, they are inferred from the input csv file. | A list of column names, `None` |
| categorical_columns | Feature columns identified as categorical by model. If not provided, they are inferred from the input csv file. | A list of column names,  `None` |

## Select IBM Watson OpenScale services

| Parameter | Description | Possible Value(s) |
| :- | :- | :- |
| enable_quality | Boolean value to allow generation of common configuration details needed if quality alone is selected | `True` or `False` |
| enable_fairness | Boolean value to allow generation of fairness specific data distribution needed for configuration | `True` or `False` |
| enable_drift | Boolean value to allow generation of Drift Archive containing relevant information for Model and Data Drift. | `True` or `False` |
| enable_drift_v2 | Boolean value to allow generation of Drift v2 Archive. | `True` or `False` |
| enable_explainability | Boolean value to allow generation of explainability configuration and perturbations | `True` or `False` |
| parameters | Parameters for a monitor that is being enabled. | |
| thresholds | Thresholds for the fairness and quality monitors, if those monitors are being enabled. | |
| train_drift_model | Set to `True` to train the drift model and learn stats online. |`True` or `False` |
| enable_online_learning | Set to `True` to generate the stats and scored perturbations online. |`True` or `False` |

In [17]:
model_info = {
    "model_type": "",
    "label_column": "",
    "prediction": "",
    "probability": "",
    "feature_columns": [""],
    "categorical_columns": [""],
    "scoring":{
        "url":"",
        "token":""
    }
}

monitors_config = {
    "fairness_configuration": {
        "enabled": True,
        "parameters":{
        },
        "thresholds": [
        ]
    },
    "quality_configuration": {
        "enabled": True,
        "parameters" : {
        },
        "thresholds" : [
        ]
    },
    "drift_configuration": {
        "enabled": True,
        "parameters":{
            "train_drift_model": True
        }
    },
    "explainability_configuration":{
        "enabled": True,
        "parameters":{
            "enable_online_learning": True,
            # Set below params to enable global explanation. Available from Cloud Pak for Data 4.6.4 onwards.
            #"global_explanation": {
            #    "enabled": True,
            #    "explanation_method": "lime", # The explanation method
            #    "training_data_sample_size": 1000, # [Optional] The sample size of records to be used for generating training data global explanation. If not specified entire training data is used.
            #    "sample_size": 1000, # [Optional] The sample size of records to be used for generating payload data global explanation. If not specified entire data in the payload window is used.
            #}
        }
    },
    "drift_v2_configuration":{
        "enabled": True,
        "parameters": {
            "train_archive": True,
            "feature_importance": [], # required field
            "most_important_features":[],
            "important_input_metadata_columns": [] # <- Add this if input metadata drift to be calculated and meta columns are available
        }
    }
    
}
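The model details table constrains `model_type` to three values and notes that `probability` is not needed for regression models. A quick sanity check of `model_info` before it is sent to the service might look like the sketch below. `check_model_info` is a hypothetical helper; treating `label_column` and `prediction` as mandatory is an assumption drawn from the table, not a documented SDK rule.

```python
# Allowed values taken from the model details table in this notebook.
MODEL_TYPES = ("binary", "multiclass", "regression")


def check_model_info(model_info):
    """Return a list of problems found in a model_info dict (empty if none)."""
    problems = []
    model_type = model_info.get("model_type")
    if model_type not in MODEL_TYPES:
        problems.append("model_type must be one of {}".format(
            ", ".join(MODEL_TYPES)))
    # Assumption: these two columns are needed for every model type.
    for key in ("label_column", "prediction"):
        if not model_info.get(key):
            problems.append("{} is empty".format(key))
    # The probability column is not required for regression models.
    if model_type != "regression" and not model_info.get("probability"):
        problems.append("probability is empty")
    return problems
```

Running `check_model_info(model_info)` on the unfilled template above reports every empty field, which makes it easy to see what still has to be provided.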

## Provide Spark Connection Details <a name="spark"></a>

To generate the configuration for monitoring models in IBM Watson OpenScale, a Spark compute engine is required. It can be either IBM Analytics Engine or your own Spark cluster. Provide details of one of them in this section.

If you are using your own Spark cluster, check the IBM Watson OpenScale documentation on how to set up the Spark Manager API so that it can be used with IBM Watson OpenScale services.

### Parameters for IBM Analytics Engine
If your job is going to run on Spark cluster as part of an IBM Analytics Engine instance on IBM Cloud Pak for Data, enter the following details:

| Parameter | Description | Possible Value(s) |
| :- | :- | :- |
| display_name | Display Name of the Spark instance in IBM Analytics Engine | |
| location_type | Identifies if compute engine is IBM IAE or Remote Spark. For IBM IAE, this must be set to `cpd_iae`. | `cpd_iae` |
| endpoint | Spark Jobs Endpoint for IBM Analytics Engine | |
| volume | IBM Cloud Pak for Data storage volume name | |
| username | IBM Cloud Pak for Data username | |
| apikey | IBM Cloud Pak for Data API key | |

### Parameters for Remote Spark Cluster
If your job is going to run on Spark Cluster as part of a Remote Hadoop Ecosystem, enter the following details:

| Parameter | Description | Possible Value(s) |
| :- | :- | :- |
| location_type | Identifies if compute engine is IBM IAE or Remote Spark. For Remote Spark, this must be set to `custom`. | `custom` |
| endpoint | Endpoint URL where the Spark Manager Application is running | |
| username | Username to connect to Spark Manager Application | |
| password | Password to connect to Spark Manager Application | |


### Provide Spark Resource Settings [Optional]
Configure how much of your Spark cluster's resources this job can consume. Leave the variable `spark_settings` set to `{}` if no customisation is required.

| Parameter | Description |
| :- | :- |
| max_num_executors | Maximum Number of executors to launch for this session |
| min_executors | Minimum Number of executors to launch for this session |
| executor_cores | Number of cores to use for each executor |
| executor_memory | Amount of memory (in GBs) to use per executor process |
| driver_cores | Number of cores to use for the driver process |
| driver_memory | Amount of memory (in GBs) to use for the driver process |

In [18]:
spark_connection_info = {
    "connection": {
        "endpoint": "",
        "location_type": "",
        "display_name": "",
        "volume": "",
        "instance_id":""
    },
    "credentials": {
        "username": "",
        "password": "",
        "apikey": ""
    }
}

"""
Example:

spark_settings = {
    # max_num_executors: Maximum Number of executors to launch for this session
    "max_num_executors": "2",
    
    # min_executors: Minimum Number of executors to launch for this session
    "min_executors": "1",
    
    # executor_cores: Number of cores to use for each executor
    "executor_cores": "2",
    
    # executor_memory: Amount of memory (in GBs) to use per executor process
    "executor_memory": "2",
    
    # driver_cores: Number of cores to use for the driver process
    "driver_cores": "2",
    
    # driver_memory: Amount of memory (in GBs) to use for the driver process 
    "driver_memory": "1"
}
"""
spark_settings = {}

spark_connection_info["spark_settings"] = spark_settings
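The two parameter tables above list different fields for IBM Analytics Engine (`cpd_iae`) and a remote Spark cluster (`custom`). The sketch below checks a `spark_connection_info` dictionary against those lists. `missing_spark_fields` is a hypothetical helper, and treating every listed parameter as required is an assumption based on the tables.

```python
# Fields per location_type, following the parameter tables in this notebook.
REQUIRED_SPARK_FIELDS = {
    "cpd_iae": {
        "connection": ("display_name", "endpoint", "volume"),
        "credentials": ("username", "apikey"),
    },
    "custom": {
        "connection": ("endpoint",),
        "credentials": ("username", "password"),
    },
}


def missing_spark_fields(spark_connection_info):
    """Return dotted names of empty fields for the selected location_type."""
    location_type = spark_connection_info.get(
        "connection", {}).get("location_type")
    if location_type not in REQUIRED_SPARK_FIELDS:
        return ["connection.location_type"]
    missing = []
    for section, keys in REQUIRED_SPARK_FIELDS[location_type].items():
        values = spark_connection_info.get(section, {})
        missing.extend(
            "{}.{}".format(section, key) for key in keys if not values.get(key))
    return missing
```

An empty list from `missing_spark_fields(spark_connection_info)` means that all fields listed for the chosen `location_type` have a value.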

## Connect to IBM Watson OpenScale instance <a name="connect-openscale"></a>

Following information is required to connect to IBM Watson OpenScale instance:

| Parameter | Description |
| :- | :- |
| url | Base url of your Cloud Pak for Data cluster hosting IBM Watson OpenScale instance. |
| username | Username to connect to your IBM Watson OpenScale instance in Cloud Pak for Data cluster. |
| password | Password to connect to your IBM Watson OpenScale instance in  Cloud Pak for Data cluster. One of `password` or `api_key` must be provided. |
| api_key | API Key to connect to your IBM Watson OpenScale instance in Cloud Pak for Data cluster. One of `password` or `api_key` must be provided. |
| service_instance_id | Id of your IBM Watson OpenScale Instance |

In [19]:
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator
from ibm_watson_openscale import APIClient

import warnings
warnings.filterwarnings('ignore')

service_instance_id = "" #Default is 00000000-0000-0000-0000-000000000000
service_credentials = {
    "url": "",
    "username": "",
    "password": ""
    #     "apikey":""
}

authenticator = CloudPakForDataAuthenticator(
    url=service_credentials['url'],
    username=service_credentials['username'],
    password=service_credentials['password'],
    #     apikey=service_credentials['apikey'],
    disable_ssl_verification=True
)

client = APIClient(
    service_url=service_credentials['url'],
    service_instance_id=service_instance_id,
    authenticator=authenticator
)

print(client.version)

3.0.45.2
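The table above says that one of `password` or `api_key` must be provided. Rather than commenting lines in and out, the keyword arguments for the authenticator can be assembled from whichever credential is present. This is a sketch only: `build_authenticator_kwargs` is a hypothetical helper, and rejecting the case where both credentials are set is a stricter check assumed here, not something the notebook states.

```python
def build_authenticator_kwargs(credentials, disable_ssl_verification=False):
    """Build keyword arguments for CloudPakForDataAuthenticator.

    Exactly one of 'password' or 'apikey' must be non-empty in credentials.
    """
    password = credentials.get("password") or None
    apikey = credentials.get("apikey") or None
    # Assumption: supplying both is treated as an error, like supplying neither.
    if (password is None) == (apikey is None):
        raise ValueError("Provide exactly one of 'password' or 'apikey'.")
    kwargs = {
        "url": credentials["url"],
        "username": credentials["username"],
        "disable_ssl_verification": disable_ssl_verification,
    }
    if apikey is not None:
        kwargs["apikey"] = apikey
    else:
        kwargs["password"] = password
    return kwargs
```

The result can be passed as `CloudPakForDataAuthenticator(**build_authenticator_kwargs(service_credentials, disable_ssl_verification=True))`, which matches the arguments used in the cell above.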


## Configure Machine Learning Provider in IBM Watson OpenScale instance <a name="create-service-provider"></a>

Before configuring a model for monitoring in IBM Watson OpenScale, you need to connect your machine learning provider to the IBM Watson OpenScale instance. Because the model being configured keeps its runtime data in a location remote to IBM Watson OpenScale, we'll create a custom machine learning provider in the given instance.

Following details are required:

| Parameter | Description |
| :- | :- |
| name | Name of the machine learning provider being configured. This can be any string value. |
| description | Description for the machine learning provider being configured. |
| service_type | Identifies type of the machine learning provider. In this case, this value must be `ServiceTypes.CUSTOM_MACHINE_LEARNING` |
| credentials | Stores username and password to connect to machine learning provider. |
| deployment_space_id | Identifies the space where the model is deployed. |
| operational_space_id | Defines the classification of machine learning provider. Possible values are `pre-production` and `production`. |

## Onboard model for monitoring in IBM Watson OpenScale instance <a name="create-subscription"></a>

When you configure a model for monitoring in IBM Watson OpenScale instance, a corresponding subscription is created for this model. Following details are required:

| Parameter | Description |
| :- | :- |
| subscription_name | Name of the subscription to use. This can be any string value typically identifying model being monitored. |
| datamart_id | Same as id of IBM Watson OpenScale instance. |
| service_provider_id | Id of the machine learning provider instance created in IBM Watson OpenScale. |
| model_info | Details of the model to be monitored |
| sample_csv | Path to the csv file containing scored training data |
| spark_credentials | Connection details of Spark compute engine to use for analysis by different IBM Watson OpenScale services. |
| payload_table | Details of the payload table to be used with this subscription. |
| feedback_table | Details of the feedback table to be used with this subscription. |
| scored_training_data_table | Details of the scored training data table to be used with this subscription. |
| managed_by | Identifies whether the subscription is `system` managed (model transactions are stored in the OpenScale database and evaluated using OpenScale computing resources) or `self` managed (model transactions are stored in your own data warehouse and evaluated by your Spark analytics engine). This function does not currently support system managed subscriptions. |

In [20]:
# [OPTIONAL] Delete existing service provider with the same name as provided

# SERVICE_PROVIDER_NAME = ""
# service_providers = client.service_providers.list().result.service_providers
# for provider in service_providers:
#     if provider.entity.name == SERVICE_PROVIDER_NAME:
#         client.service_providers.delete(service_provider_id=provider.metadata.id)
#         break

# Add Service Provider
from ibm_watson_openscale.supporting_classes.enums import ServiceTypes
# from ibm_watson_openscale.base_classes.watson_open_scale_v2 import CustomCredentials

added_service_provider_result = client.service_providers.add(
        name="",
        description="",
        service_type=ServiceTypes.CUSTOM_MACHINE_LEARNING,
        credentials={},
        deployment_space_id = "",
        operational_space_id="",
        background_mode=False
    ).result

service_provider_id = added_service_provider_result.metadata.id

client.service_providers.show()




 Waiting for end of adding service provider 01963d9c-8b77-76d3-b647-aa75d5b73d16 




active

-----------------------------------------------
 Successfully finished adding service provider 
-----------------------------------------------




0,1,2,3,4,5
,active,Final-Batch-Sample-CSV-Hive,custom_machine_learning,2025-04-16 07:59:51.679000+00:00,01963d9c-8b77-76d3-b647-aa75d5b73d16
,active,GCR-Driftv2-Batch-Sample-CSV-DB2,custom_machine_learning,2025-04-15 09:58:01.562000+00:00,019638e2-5e3a-703e-a39f-fce0404f5fea
,active,GCR-Driftv2-Batch-Sample-CSV-JDBC,custom_machine_learning,2025-04-15 08:16:41.875000+00:00,01963885-9988-7328-9e2d-8c2f1aca6025
99999999-9999-9999-9999-999999999999,active,service-provider-space-f18c065b-6096-448c-b1aa-80bd23f1cecb,watson_machine_learning,2025-04-15 07:58:34.729000+00:00,01963875-0294-7954-90f5-2606051356c8
,active,CUSTOM_APIKEY_CLOUD_WITHOUTAPI_PREPROD,custom_machine_learning,2025-04-14 05:58:07.777000+00:00,019632e0-6096-7a89-8be6-628d39f2bef2
,active,CUSTOM_APIKEY_CLOUD_WITHOUTAPI,custom_machine_learning,2025-04-14 05:58:07.574000+00:00,019632e0-5fb8-7c3c-abc4-1a84f67a91c6
4ca7ec1c-6b35-48ef-a45c-0434c6985058,active,MRM_WMLV4_CLOUD_PREPROD,watson_machine_learning,2025-04-14 05:58:07.415000+00:00,019632e0-5e59-7b4d-b912-3a27e1e18391
4ca7ec1c-6b35-48ef-a45c-0434c6985058,active,MRM_WMLV4_CLOUD_PROD,watson_machine_learning,2025-04-14 05:58:04.460000+00:00,019632e0-50f0-7b96-b640-cddd7685e822
,active,CUSTOM_HLS_PREPROD,custom_machine_learning,2025-04-14 05:57:59.876000+00:00,019632e0-41ba-7b9c-8ee2-2ce5df750412
,active,CUSTOM_BATCH_PROD,custom_machine_learning,2025-04-14 05:57:59.780000+00:00,019632e0-4159-791d-b775-1035fe606be9


Note: First 10 records were displayed.




In [21]:
subscription_id = client.subscriptions.create_subscription_using_training_data(
    subscription_name="My SDK Batch Subscription-hive",
    datamart_id=service_instance_id,
    service_provider_id=service_provider_id,
    model_info=model_info,
    sample_csv = sample_csv,
    spark_credentials=spark_connection_info,
    data_warehouse_connection = datawarehouse_details,
    payload_table=payload_table,
    feedback_table=feedback_table,
    scored_training_data_table = scored_training_data_table,
    managed_by="self"
)

print("Subscription id is {}".format(subscription_id))

# Wait for the subscription to get in active state and to create the 
# required tables in the background before moving onto enabling monitors

# import time
# from datetime import datetime

# subscription_status = None
# while subscription_status not in ("active", "error"):
#     subscription_status = client.subscriptions.get(subscription_id).result.entity.status.state
#     if subscription_status not in ("active", "error"):
#         print(datetime.now().strftime("%H:%M:%S"), subscription_status)
#         time.sleep(15)
        
# print(datetime.now().strftime("%H:%M:%S"), subscription_status)


Creating integrated system for Spark
Integrated system 01963d9c-a87f-7588-911a-3e70d994d77f created 
Creating integrated system for Hive/DB2
Hive/Db2 Integrated system 01963d9c-a925-7878-ad2f-4ce47e94a3e8 created
Updating schemas ...
08:00:00 preparing
Schemas update completed.
Updating data-sources ...
Data-sources update complete.
Subscription is created. Id is : 01963d9c-a9d1-7f85-bf0f-5f7193460ee8
Subscription is being activated, please wait for state to be active before using it further.
Subscription id is 01963d9c-a9d1-7f85-bf0f-5f7193460ee8
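The commented-out loop in the cell above polls the subscription state until it becomes `active` or `error`. The same idea can be written as a reusable function with an upper bound on the number of checks, so that a stuck subscription does not block the notebook forever. This is a minimal sketch; `wait_for_state` and its parameters are hypothetical names, and the default limits are arbitrary.

```python
import time


def wait_for_state(get_state, terminal_states=("active", "error"),
                   interval_seconds=15, max_checks=40, sleep=time.sleep):
    """Poll get_state() until it returns a terminal state, then return it.

    Raises TimeoutError if no terminal state is seen within max_checks polls.
    """
    state = None
    for _ in range(max_checks):
        state = get_state()
        if state in terminal_states:
            return state
        print(time.strftime("%H:%M:%S"), state)
        sleep(interval_seconds)
    raise TimeoutError("No terminal state reached, last state: {!r}".format(state))
```

Following the commented snippet, the subscription could be awaited with `wait_for_state(lambda: client.subscriptions.get(subscription_id).result.entity.status.state)`. The `sleep` argument exists only so the function can be exercised without real waiting.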




## Enable different services to monitor model <a name="enable-monitors"></a>

Depending on the services enabled in `monitors_config`, different services are enabled in the given subscription. These services are called monitors.

Following details are required:

| Parameter | Description |
| :- | :- |
| datamart_id | Same as id of IBM Watson OpenScale instance. |
| subscription_id | Id of the subscription created for given model in IBM Watson OpenScale instance. |
| monitors_config | Details of the monitors that need to be configured. |
| drifted_transaction_table | Details of the drifted transactions table to be used with this subscription. |
| explain_queue_table | Details of the explain queue table to be used with this subscription. |
| explain_results_table | Details of the explain results table to be used with this subscription. |

In [23]:
instance_ids = client.monitor_instances.enable_monitor_using_training_data(
                  datamart_id = service_instance_id,
                  subscription_id = subscription_id,
                  monitors_config = monitors_config,
                  drifted_transaction_table = drifted_transaction_table,
                  explain_queue_table = explain_queue_table,
                  explain_results_table = explain_result_table
)

print(instance_ids)

## Track each monitor instance status
# for key, value in instance_ids.items():
#     monitor_instance_status = None

#     while monitor_instance_status not in ("active", "error"):
#         monitor_instance_details = client.monitor_instances.get(monitor_instance_id=value).result
#         monitor_instance_status = monitor_instance_details.entity.status.state
#         if monitor_instance_status not in ("active", "error"):
#             print(datetime.now().strftime("%H:%M:%S"), monitor_instance_status)
#             time.sleep(30)

#     print(key, monitor_instance_status)

Enabling Drift V2....
{'drift_v2': '01963d9d-71df-7349-bae3-cae49d2d5382'}
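The comments in the table definitions say that the drifted transactions table is needed when drift is enabled, and the explain queue and result tables when explainability is enabled. The sketch below derives that list from `monitors_config`, which helps to spot an unfilled table before enabling monitors. `required_monitor_tables` is a hypothetical helper, and the mapping covers only what those comments state; whether other monitors need additional tables is not addressed here.

```python
def required_monitor_tables(monitors_config):
    """List the monitor-specific tables needed for the enabled monitors."""

    def enabled(name):
        return bool(monitors_config.get(name, {}).get("enabled"))

    tables = []
    # Stated in this notebook: set this table if drift is enabled.
    if enabled("drift_configuration"):
        tables.append("drifted_transaction_table")
    # Stated in this notebook: set these tables if explainability is enabled.
    if enabled("explainability_configuration"):
        tables.extend(["explain_queue_table", "explain_results_table"])
    return tables
```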


## Congratulations!

All the monitors have been enabled. It will take some time for the monitors to reach the active state. You can track the status of each monitor separately by using the commented code snippet above.

Once all monitors are active, load data into the payload or feedback table and either run on-demand evaluations or wait for the scheduled evaluations to complete for each monitor. You can check more details in the [Watson OpenScale Dashboard](https://url-to-your-cp4d-cluster/aiopenscale).

## Helper Methods

### Cleanup subscription and its related artefacts
Crawls through subscription json and identifies entities to be deleted. Currently, following entities are identified and deleted:
- Analytics Engine integrated system
- Data Warehouse Connection integrated system(s)

In [None]:
# # Uncomment and update following if you are running this at a later point of time or 
# # separate from this notebook with no subscription id and wos client session

# from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator
# from ibm_watson_openscale import APIClient

# import warnings
# warnings.filterwarnings('ignore')

# service_instance_id = "<SERVICE_INSTANCE_ID>" #Default is 00000000-0000-0000-0000-000000000000
# service_credentials = {
#     "url": "<to_be_edited>",
#     "username": "<to_be_edited>",
#     "password": "<to_be_edited>",
# #     "apikey":"<to_be_edited>"
# }

# authenticator = CloudPakForDataAuthenticator(
#     url=service_credentials['url'],
#     username=service_credentials['username'],
#     password=service_credentials['password'],
# #     apikey=service_credentials['apikey'],
#     disable_ssl_verification=True
# )

# client = APIClient(
#     service_url=service_credentials['url'],
#     service_instance_id=service_instance_id,
#     authenticator=authenticator
# )

# print(client.version)

# subscription_id = "<to_be_edited>"

subscription_details = client.subscriptions.get(
    subscription_id=subscription_id).result.to_dict()
subscription_entity = subscription_details.get("entity", {})

integrated_systems_id = []

# add analytics engine integrated system id
analytics_engine = subscription_entity.get("analytics_engine", {})
if analytics_engine and analytics_engine.get("integrated_system_id"):
    print("Found integrated system for analytics engine with type: {}".format(
        analytics_engine.get("type")))
    integrated_systems_id.append(analytics_engine.get("integrated_system_id"))

# add data source integrated system ids
data_sources = subscription_entity.get("data_sources", [])
for data_source in data_sources:
    if not data_source.get("connection"):
        continue

    if not data_source.get("connection").get("integrated_system_id"):
        continue

    integrated_system_id = data_source.get("connection").get("integrated_system_id")
    if integrated_system_id in integrated_systems_id:
        continue

    print("Found integrated system for data source with type: {}".format(
        data_source.get("type")))
    integrated_systems_id.append(integrated_system_id)
    
print("Integrated Systems to delete: {}".format(integrated_systems_id))
    
# delete subscription
client.subscriptions.delete(
    subscription_id=subscription_id,
    background_mode=False)

# wait time for subscription delete to complete
import time
time.sleep(30)

# delete all integrated systems
for integrated_system_id in integrated_systems_id:
    print("Deleting integrated system with id: {}".format(integrated_system_id))
    client.integrated_systems.delete(integrated_system_id)
    
    # wait time for integrated system delete to complete
    time.sleep(10)
    
print("Cleanup Complete!!!")
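The part of the cleanup cell that crawls the subscription JSON does not need a live client, so it can be isolated as a pure function and tried on a plain dictionary before anything is deleted. The following sketch mirrors the logic of the cell above; `collect_integrated_system_ids` is a hypothetical name.

```python
def collect_integrated_system_ids(subscription_details):
    """Collect unique integrated system ids referenced by a subscription dict."""
    entity = subscription_details.get("entity") or {}
    ids = []

    # Analytics engine integrated system, if any.
    engine_id = (entity.get("analytics_engine") or {}).get("integrated_system_id")
    if engine_id:
        ids.append(engine_id)

    # Data source integrated systems, skipping duplicates and empty entries.
    for data_source in entity.get("data_sources") or []:
        system_id = (data_source.get("connection") or {}).get("integrated_system_id")
        if system_id and system_id not in ids:
            ids.append(system_id)
    return ids
```

Applied to `client.subscriptions.get(subscription_id=subscription_id).result.to_dict()`, it returns the same list that the cell above builds in `integrated_systems_id`.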