<img src="https://github.com/pmservice/ai-openscale-tutorials/raw/master/notebooks/images/banner.png" align="left" alt="banner">

# Working with a custom metrics provider

This notebook should be run in a Watson Studio project, using **IBM Runtime 24.1 on Python 3.11 XS** runtime environment. **If you are viewing this in Watson Studio and do not see the required runtime env in the upper right corner of your screen, please update the runtime now.** It requires service credentials for the following services:
  * Watson OpenScale
  * Watson Machine Learning

## Overview

This sample notebook demonstrates how to configure a custom monitor and compute metrics such as answer completeness, answer relevance, HAP, and PII for LLM subscriptions. Based on the values specified in the configuration cell, it automatically creates the custom monitor definition, WML batch deployment for the Python function, custom metrics provider with the deployment scoring endpoint, and a custom dataset for storing record-level metrics. Users must update the appropriate metric computation logic inside the Python function.

During each run, OpenScale invokes the custom metrics provider (the Python function) and sends inputs such as `data_mart_id`, `subscription_id`, `custom_monitor_id`, and other parameters. The provider then:

- Reads data from feedback, payload logging, or other datasets.
- Computes record-level metrics and saves them to the custom dataset.
- Computes and publishes aggregated metrics to the Measurements API.
- Updates the monitor run status to Finished.
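
The inputs arrive wrapped in the scoring envelope of the deployment. The sketch below unpacks that envelope using the key names that the provider function in this notebook reads; the helper name `parse_provider_input` and the sample IDs are illustrative assumptions, not part of the OpenScale SDK.

```python
# Keys the provider function in this notebook reads with payload[...]
REQUIRED_KEYS = (
    "data_mart_id",
    "subscription_id",
    "custom_monitor_id",
    "custom_monitor_instance_id",
    "custom_monitor_instance_params",
    "custom_monitor_run_id",
)
# Keys the provider function reads with payload.get(...), so they may be absent
OPTIONAL_KEYS = ("payload_dataset_id", "feedback_dataset_id", "custom_dataset_id")


def parse_provider_input(input_data):
    """Unpack the first payload from the scoring envelope and check required keys."""
    payload = input_data["input_data"][0]["values"][0]
    missing = [key for key in REQUIRED_KEYS if key not in payload]
    if missing:
        raise KeyError(f"Missing required inputs: {missing}")
    parsed = {key: payload[key] for key in REQUIRED_KEYS}
    # Dataset IDs default to None when OpenScale does not send them.
    parsed.update({key: payload.get(key) for key in OPTIONAL_KEYS})
    return parsed


# Hypothetical sample values, for illustration only.
sample_input = {
    "input_data": [{
        "values": [{
            "data_mart_id": "sample-data-mart",
            "subscription_id": "sample-subscription",
            "custom_monitor_id": "sample_monitor",
            "custom_monitor_instance_id": "sample-instance",
            "custom_monitor_instance_params": {},
            "custom_monitor_run_id": "sample-run",
            "custom_dataset_id": "sample-custom-dataset",
        }]
    }]
}

parsed = parse_provider_input(sample_input)
print(parsed["custom_monitor_run_id"], parsed["feedback_dataset_id"])
```

Checking the required keys up front gives a clear error message instead of a bare `KeyError` deep inside the metric computation.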



## Contents

This notebook contains the following parts:

  1. [Set up your environment](#setup)
  1. [Configure values for the custom monitor](#provider)
  1. [Create the custom metrics provider - python function](#deployment)
  1. [Configure Watson OpenScale](#config)
  1. [Set up the custom monitor](#custom_monitor)
  1. [Get the custom monitor configuration](#get_config)
  1. [Run the custom monitor](#run)
  1. [Risk evaluations for subscription](#evaluate_risk)


## 1. Set up your environment <a name="setup"></a>

Before you use the sample code in this notebook, you must perform the following setup tasks:

### Install the `ibm_watsonx_ai` and `ibm-watson-openscale` packages

In [None]:
!pip install --upgrade ibm_watsonx_ai   | tail -n 1
!pip install --upgrade ibm-watson-openscale --no-cache | tail -n 1

### Action: restart the kernel!

### Credentials for IBM Cloud
To authenticate, in the following code boxes, replace the sample data with your own credentials. Get the information from your system administrator or through the IBM Cloud dashboard.

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [1]:
############################################################################################
# Paste your credentials into the following section and then run this cell.
############################################################################################
CLOUD_API_KEY = "<Your Cloud IAM API Key>"

In [2]:
SPACE_ID = "<Your space id>"
DATAMART_ID =  "<DataMart Id>"
SUBSCRIPTION_ID= "<Subscription Id>"
#PROJECT_ID = "<Your project id>" #update the project id for pre-production subscription

OPENSCALE_API_URL = "https://api.aiopenscale.cloud.ibm.com"
IAM_URL = "https://iam.cloud.ibm.com/oidc/token"
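
Before continuing, it can help to confirm that none of the sample values were left in place. The following is a minimal sketch that assumes every placeholder is wrapped in angle brackets, as in the cells above; `find_unset_placeholders` is a hypothetical helper, and the values passed to it here are stand-ins rather than real credentials.

```python
def find_unset_placeholders(settings):
    """Return the names whose value is empty or still looks like '<...>'."""
    return sorted(
        name
        for name, value in settings.items()
        if not value or (value.startswith("<") and value.endswith(">"))
    )


# Stand-in values for illustration; in the notebook, pass the real variables.
unset = find_unset_placeholders({
    "CLOUD_API_KEY": "<Your Cloud IAM API Key>",
    "SPACE_ID": "my-space-id",
    "DATAMART_ID": "<DataMart Id>",
    "SUBSCRIPTION_ID": "my-subscription-id",
})
print(unset)
```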

## 2. Configure values for the custom monitor <a name="provider"></a>

Default values for the following custom monitor parameters are set. You can override them by specifying parameter values in the configuration cell.

| Parameter Name                                | Type           | Optional | Description                                                                 | Default Value                              |
|----------------------------------------------|----------------|----------|-----------------------------------------------------------------------------|-------------------------------------------|
| `DEPLOYMENT_NAME`                             | string         | Yes      | Name of the function deployment                                             | `"Custom Metrics Provider Deployment"`   |
| `PYTHON_FUNCTION_NAME`                        | string         | Yes      | Name of the Python function to be deployed                                 | `"Custom Metrics Provider Function"`      |
| `CUSTOM_METRICS_PROVIDER_NAME`                | string         | Yes      | Name for the Custom Metrics Provider                                       | `"Custom Metrics Provider"`               |
| `CUSTOM_MONITOR_NAME`                         | string         | Yes      | Name of the custom monitor                                                 | `"Sample Model Performance"`              |
| `DATAMART_ID`                                 | string         | Yes      | Watson OpenScale DataMart GUID                                             | `"00000000-0000-0000-0000-000000000000"`  |
| `SPACE_ID`                                    | string         | No       | Watson Machine Learning deployment space ID                                 | `"<Your Space ID>"`  |
| `RUNTIME_ENV`                                 | string         | Yes      | Runtime environment for the Python function                                | `"runtime-24.1-py3.11"`                   |
| `ENABLE_SCHEDULE`                             | boolean        | Yes      | Flag to enable scheduled runs of the monitor                               | `True`                                    |
| `START_TIME`                                  | string         | Yes      | Scheduled run start time (format: `HH:MM:SS`)                              | `"10:00:00"`                              |
| `CUSTOM_METRICS_WAIT_TIME`                    | integer        | Yes      | Time in seconds to check the run status                                    | `300`                                     |
| `DELETE_CUSTOM_MONITOR`                       | boolean        | Yes      | Flag to delete any existing monitor with the same name                     | `True`                                   |
| `DELETE_CUSTOM_MONITOR_INSTANCE`              | boolean        | Yes      | Flag to delete any existing monitor instance                               | `True`                                   |
| `DELETE_INTEGRATED_SYSTEM`                    | boolean        | Yes      | Flag to delete the existing python function and associated custom metric provider                         | `True`                                   |
| `ALGORITHM_TYPES`                              | list[string]   | Yes      | Types of algorithms used (`binary`, `regression`, etc.)                    | `["binary","multiclass","regression","question_answering","summarization","retrieval_augmented_generation","classification","generation","code_generation_and_conversion","extraction","translation"]`                              |
| `INPUT_DATA_TYPES`                              | list[string]   | Yes      | Type of input data (`structured`, `unstructured`)                          | `["structured","unstructured_text","unstructured_image"]`                          |
| `WOS_URL`                                      | string         | No       | URL of the Watson OpenScale instance                                       | `"https://api.aiopenscale.cloud.ibm.com"` |
| `WML_URL`                                      | string         | No       | URL of Watson Machine Learning instance                                    | `"https://us-south.ml.cloud.ibm.com"`     |
| `CLOUD_API_KEY`                                | string         | No       | IBM Cloud API Key for IAM authentication                                   | `"<Your Cloud IAM API Key>"`             |
| `IAM_URL`                                      | string         | No       | IAM authentication URL                                                     | `"https://iam.ng.bluemix.net/oidc/token"` |
| `SUBSCRIPTION_ID`                              | string         | Yes      | ID of the subscription to be monitored                                     | `"<Subscription Id>"`                    |
| `CUSTOM_MONITOR_METRICS`                       | list[dict]     | No       | List of metric definitions used in the custom monitor                      |                                           |
| └─ `name`                                      | string         | No       | Name of the custom metric (e.g., `sensitivity`)                            |                                           |
| └─ `description`                               | string         | No       | Human-readable description of the metric                                   |                                           |
| └─ `type`                                      | string         | No       | Data type of the metric value (e.g., `number`)                             |                                           |
| `CUSTOM_METRICS_PROVIDER_CREDENTIALS`          | dict           | No       | Dictionary with authentication method for custom metrics provider          |                                           |
| └─ `auth_type`                                 | string         | No       | Authentication method (e.g., `bearer`)                                     |                                           |
| └─ `token_info`                                | dict           | Yes      | Token generation details (used for bearer tokens)                          |                                           |
|     └─ `url`                                   | string         | No       | URL to request IAM token                                                   |                                           |
|     └─ `headers`                               | dict           | No       | HTTP headers for token request                                             |                                           |
| `SCHEDULE`                                    | list[dict]     | No       | Schedule settings for the monitor runs                                      |                                           |
| └─ `repeat_interval`                           | integer        | No       | Interval between scheduled executions                                      | `1`                                       |
| └─ `repeat_type`                               | string         | No       | Unit of repeat interval (`hour`, `day`, etc.)                              | `"hour"`                                  |
| └─ `delay_unit`                                | string         | No       | Unit of delay duration (`minute`, `second`, etc.)                          |`"minute"`                                 |
| └─ `delay_time`                                | integer        | No       | Delay duration before execution                                            |`5`                                        |
| `CPD_INFO`                                    | list[dict]     | No       | Connection details for a Cloud Pak for Data (CPD) instance                  |                                           |
| └─ `CPD_URL`                                   | string         | No       | CPD instance URL (if using CPD)                                            |                                           |
| └─ `USERNAME   `                               | string         | No       | CPD Username                                                               |                                           |
| └─ `PASSWORD`                                  | string         | No       | CPD User API Key                                                           |                                           |
| └─ `VERSION`                                   | integer        | No       | Version                                                                    |`5.0`                                      |
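
Conceptually, overriding a default means that a value in the configuration cell wins over the table value with the same key. The sketch below illustrates only that effect with a plain dictionary merge; how the SDK combines the values internally is not shown in this notebook, and `resolve_config` is a hypothetical helper.

```python
# A subset of the defaults listed in the table above.
DEFAULTS = {
    "DEPLOYMENT_NAME": "Custom Metrics Provider Deployment",
    "CUSTOM_MONITOR_NAME": "Sample Model Performance",
    "RUNTIME_ENV": "runtime-24.1-py3.11",
    "ENABLE_SCHEDULE": True,
    "START_TIME": "10:00:00",
    "CUSTOM_METRICS_WAIT_TIME": 300,
}


def resolve_config(overrides, defaults=DEFAULTS):
    """Return a new dict in which user overrides replace matching defaults."""
    return {**defaults, **overrides}


resolved = resolve_config({
    "CUSTOM_MONITOR_NAME": "RAG Quality Monitor",
    "CUSTOM_METRICS_WAIT_TIME": 120,
})
print(resolved["CUSTOM_MONITOR_NAME"], resolved["CUSTOM_METRICS_WAIT_TIME"])
print(resolved["RUNTIME_ENV"])
```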


### Configuration cell

In [11]:
config = {
  "CLOUD_API_KEY": CLOUD_API_KEY,
  "SPACE_ID": SPACE_ID,
  "DATAMART_ID": DATAMART_ID,
  "SUBSCRIPTION_ID": SUBSCRIPTION_ID,
  "CUSTOM_MONITOR_NAME":"RAG Quality Monitor",
  "DEPLOYMENT_TYPE": "wml_batch",
  "CUSTOM_METRICS_WAIT_TIME": 120,
  "MONITOR_METRICS": [
    {
      "name": "answer_completeness",
      "thresholds": {
        "lower_limit": 0.8
      }
    },
    {
      "name": "answer_relevance",
      "thresholds": {
        "lower_limit": 0.6,
        "upper_limit": 1.0
      }
    },
    {
      "name": "hap",
      "thresholds": {
        "lower_limit": 0.8
      }
    },
    {
      "name": "pii",
      "thresholds": {
        "lower_limit": 0.8
      }
    }
  ],
  "TAGS": [
      {
          "name": "region",
          "TAG_DESCRIPTION": "Custom metrics tag for monitoring"
      }
  ]
}
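
To see what the thresholds in the configuration mean in practice, the sketch below compares metric values against them. It assumes that a value below `lower_limit` or above `upper_limit` counts as a violation; OpenScale evaluates thresholds itself, so `find_threshold_violations` is only an illustrative helper. The sample values are the fallback metrics hard-coded in the provider function later in this notebook.

```python
metric_definitions = [
    {"name": "answer_completeness", "thresholds": {"lower_limit": 0.8}},
    {"name": "answer_relevance", "thresholds": {"lower_limit": 0.6, "upper_limit": 1.0}},
    {"name": "hap", "thresholds": {"lower_limit": 0.8}},
    {"name": "pii", "thresholds": {"lower_limit": 0.8}},
]


def find_threshold_violations(metric_values, definitions):
    """Map each violating metric name to a short explanation."""
    violations = {}
    for definition in definitions:
        name = definition["name"]
        if name not in metric_values:
            continue  # nothing was published for this metric
        value = metric_values[name]
        thresholds = definition.get("thresholds", {})
        lower = thresholds.get("lower_limit")
        upper = thresholds.get("upper_limit")
        if lower is not None and value < lower:
            violations[name] = f"{value} is below lower_limit {lower}"
        elif upper is not None and value > upper:
            violations[name] = f"{value} is above upper_limit {upper}"
    return violations


sample_values = {"answer_completeness": 0.6, "answer_relevance": 0.7, "hap": 0.9, "pii": 0.95}
violations = find_threshold_violations(sample_values, metric_definitions)
print(violations)
```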

## 3. Create the custom metrics provider - Python function <a name="deployment"></a>

When the custom monitor invokes the Python function, the Watson OpenScale service passes it the required values, such as `data_mart_id`, `subscription_id`, `custom_monitor_id`, `custom_monitor_instance_id`, `custom_monitor_instance_params`, and `custom_monitor_run_id`.

Within the Python function, implement your logic to compute the custom metrics in the `get_metrics` method and the record-level metrics in the `get_record_level_metrics` method, then publish the metrics to the Watson OpenScale service and update the monitor instance run status to finished.

Note: Metric names must exactly match the names defined in the configuration; otherwise, an error occurs while publishing the metrics.

The `record_id` column in the custom dataset is unique (primary key), so you cannot save record-level metrics for the same record more than once. If you need to store duplicate or repeated records, use the `reference_record_id` field instead.
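
A name mismatch is easier to catch before publishing than after. The following is a minimal sketch of such a check, assuming that tag names (such as `region`) are allowed alongside the configured metric names; `check_metric_names` is a hypothetical helper, and the misspelled `HAP` key is deliberate. Whether the service also rejects a payload with a missing metric is not stated in this notebook, so the helper only reports it.

```python
def check_metric_names(metrics, configured_metric_names, tag_names=()):
    """Return (unknown, missing) metric names relative to the configuration."""
    allowed = set(configured_metric_names) | set(tag_names)
    unknown = sorted(set(metrics) - allowed)
    missing = sorted(set(configured_metric_names) - set(metrics))
    return unknown, missing


configured = ["answer_completeness", "answer_relevance", "hap", "pii"]
# "HAP" does not match the configured name "hap" (names are case-sensitive here).
candidate = {
    "answer_completeness": 0.6,
    "answer_relevance": 0.7,
    "HAP": 0.9,
    "pii": 0.95,
    "region": "us-south",
}

unknown, missing = check_metric_names(candidate, configured, tag_names=["region"])
print("unknown:", unknown)
print("missing:", missing)
```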

In [12]:
#wml_python_function
parms = {
        "url": OPENSCALE_API_URL,
        "iam_url": IAM_URL,
        "apikey": CLOUD_API_KEY
    }
def custom_metrics_provider(parms = parms):
    
    import json
    import requests
    import base64
    from requests.auth import HTTPBasicAuth
    import time
    import uuid
    import datetime
    import random
    import pandas as pd
    
    headers = {}
    headers["Content-Type"] = "application/json"
    headers["Accept"] = "application/json"

    def get_access_token():
        token_headers={}
        token_headers["Content-Type"] = "application/x-www-form-urlencoded"
        token_headers["Accept"] = "application/json"
        auth = HTTPBasicAuth("bx", "bx")
        data = {
            "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
            "apikey": parms["apikey"]
        }
        response = requests.post(parms["iam_url"], data=data, headers=token_headers, auth=auth)
        json_data = response.json()
        access_token = json_data['access_token']
        return access_token    
    
    
    def get_feedback_data(access_token, data_mart_id, feedback_dataset_id):
        json_data = None
        if feedback_dataset_id is not None:
            headers["Authorization"] = "Bearer {}".format(access_token)
            DATASETS_STORE_RECORDS_URL = parms["url"] + "/openscale/{0}/v2/data_sets/{1}/records?format=list&limit={2}".format(data_mart_id, feedback_dataset_id, 10)
            response = requests.get(DATASETS_STORE_RECORDS_URL, headers=headers, verify=False)
            json_data = response.json()
        
        return json_data

    def save_record_level_metrics(base_url, access_token, data_mart_id, custom_dataset_id, run_id, record_level_metrics_df):
        
        if custom_dataset_id and record_level_metrics_df is not None and not record_level_metrics_df.empty:
            payload = [
            {
                "fields": list(record_level_metrics_df.columns),
                "values": record_level_metrics_df.values.tolist()
            }]
            headers["Authorization"] = "Bearer {}".format(access_token)
            headers["Content-Type"] = "application/json"
            # base_url already contains the data mart ID, so only the dataset ID is formatted in
            DATASETS_STORE_RECORDS_URL = base_url + "/v2/data_sets/{0}/records".format(custom_dataset_id)
            response = requests.post(DATASETS_STORE_RECORDS_URL, headers=headers, json = payload, verify=False)
            record_lev_metrics_resp = response.json()
            status_code = response.status_code
            if int(status_code) in [200, 201, 202]:
                print(f"Accepted request to save record level metrics to custom dataset {custom_dataset_id}. response {record_lev_metrics_resp}")
            else:
                print(f"Failed while saving record level metrics to custom dataset. Error {record_lev_metrics_resp}")

    
    #Update the run status to Finished in the Monitor Run
    def update_monitor_run_status(base_url, access_token, custom_monitor_instance_id, run_id, status, error_msg = None):
        monitor_run_url = base_url + '/v2/monitor_instances/' + custom_monitor_instance_id + '/runs/'+run_id
        completed_timestamp = datetime.datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%S.%fZ")
        patch_payload  = []
        base_path = "/status"
        
        patch_payload.append(get_patch_request_field(base_path, "state", status))
        patch_payload.append(get_patch_request_field(base_path, "completed_at", completed_timestamp))
        if error_msg is not None:
            error_json = get_error_json(error_msg)
            patch_payload.append(get_patch_request_field(base_path, "failure", error_json))
        
        headers["Authorization"] = "Bearer {}".format(access_token)
        response = requests.patch(monitor_run_url, headers=headers, json = patch_payload, verify=False)
        monitor_run_response = response.json()
        return response.status_code, monitor_run_response
    
    def get_error_json(error_message):
        trace = str(uuid.uuid4())
        error_json = {
            'trace': trace,
            'errors': [{
                'code': "custom_metrics_error_code",
                'message': str(error_message)
            }]
        }
        return error_json
    
    def get_patch_request_field(base_path, field_name, field_value, op_name="replace"):
        field_json = {
            "op": op_name,
            "path": "{0}/{1}".format(base_path, field_name),
            "value": field_value
        }
        return field_json
        
    def get_record_level_metrics(feedback_data_df, feedback_dataset_id, custom_monitor_run_id):
        # Add the computation logic here to compute the record level metrics
        #The record_id column in the custom dataset is unique (primary key), so you cannot save record-level metrics for the same record more than once. 
        #If you need to store duplicate or repeated records, use the reference_record_id field instead.
    
        record_level_metrics_df = pd.DataFrame({
            #"record_id": feedback_data_df["record_id"],
            "reference_record_id": feedback_data_df["record_id"],
            "record_timestamp": feedback_data_df["record_timestamp"],
            "run_id": custom_monitor_run_id,
            "computed_on": "feedback",
            "data_set_id": feedback_dataset_id,
            # generate float values between 0 and 1
            "hap": [round(random.random(), 2) for _ in range(len(feedback_data_df))],
            "pii": [round(random.random(), 2) for _ in range(len(feedback_data_df))],
            "answer_completeness": [round(random.random(), 2) for _ in range(len(feedback_data_df))],
            "answer_relevance": [round(random.random(), 2) for _ in range(len(feedback_data_df))]
            })

        return record_level_metrics_df
        
    #Add your code to compute the custom metrics. 
    def get_metrics(access_token, data_mart_id, subscription_id, feedback_dataset_id, custom_monitor_run_id, timestamp):
        #Add the logic here to compute the metrics. Use the below metric names while creating the custom monitor definition
        json_data = get_feedback_data(access_token, data_mart_id, feedback_dataset_id)
        metrics = None
        record_level_metrics_df = pd.DataFrame()
        if json_data is not None and len(json_data['records']) > 0:
            fields = json_data['records'][0]['fields']
            values = json_data['records'][0]['values']

            feedback_data_df = pd.DataFrame(values, columns = fields)
            record_level_metrics_df = get_record_level_metrics(feedback_data_df, feedback_dataset_id, custom_monitor_run_id)

        #Remove the tag("region": "us-south") in below metrics while publishing the metric values to Openscale Datamart 
        #if the custom monitor definition is not created with tags
        
        if not record_level_metrics_df.empty:
            #Aggregate the record level metrics
            metrics = {"answer_completeness": record_level_metrics_df["answer_completeness"].mean(), "answer_relevance": record_level_metrics_df["answer_relevance"].mean(),"hap": record_level_metrics_df["hap"].mean(), "pii": record_level_metrics_df["pii"].mean(), "region": "us-south"}
        else:
            metrics = {"answer_completeness": 0.6, "answer_relevance": 0.7, "hap": 0.9,"pii": 0.95, "region": "us-south"}
        
    
        return metrics, record_level_metrics_df
        
        
    # Publishes the Custom Metrics to OpenScale
    def publish_metrics(base_url, access_token, data_mart_id, subscription_id, custom_monitor_id, custom_monitor_instance_id, custom_monitor_run_id, feedback_dataset_id, custom_dataset_id, timestamp):
        # Compute the metrics, save the record-level metrics, then publish the aggregated metrics against the given monitoring run ID
        custom_metrics, record_level_metrics = get_metrics(access_token, data_mart_id, subscription_id, feedback_dataset_id, custom_monitor_run_id, timestamp)
        save_record_level_metrics(base_url, access_token, data_mart_id, custom_dataset_id, custom_monitor_run_id, record_level_metrics)
        measurements_payload = [
                  {
                    "timestamp": timestamp,
                    "run_id": custom_monitor_run_id,
                    "metrics": [custom_metrics]
                  }
                ]
        headers["Authorization"] = "Bearer {}".format(access_token)
        headers["Content-Type"] = "application/json"
        measurements_url = base_url + '/v2/monitor_instances/' + custom_monitor_instance_id + '/measurements'
        response = requests.post(measurements_url, headers=headers, json = measurements_payload, verify=False)
        published_measurement = response.json()
        return response.status_code, published_measurement
        
    
    def publish( input_data ):
        timestamp = datetime.datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%S.%fZ")
        payload_array = input_data.get("input_data")[0].get("values")
        payload = payload_array[0]
        data_mart_id = payload['data_mart_id']
        subscription_id = payload['subscription_id']
        custom_monitor_id = payload['custom_monitor_id']
        custom_monitor_instance_id = payload['custom_monitor_instance_id']
        custom_monitor_instance_params  = payload['custom_monitor_instance_params']
        custom_monitor_run_id = payload['custom_monitor_run_id']
        payload_dataset_id = payload.get('payload_dataset_id')
        feedback_dataset_id = payload.get('feedback_dataset_id')
        custom_dataset_id = payload.get('custom_dataset_id')

        base_url = parms['url'] + '/openscale' + '/' + data_mart_id
        access_token = get_access_token()
        
        published_measurements = []
        error_msgs = []
        run_status = "finished"
        error_msg = None
        
        try:
            status_code, published_measurement = publish_metrics(base_url, access_token, data_mart_id, subscription_id, custom_monitor_id, custom_monitor_instance_id, custom_monitor_run_id, feedback_dataset_id, custom_dataset_id, timestamp)
            if int(status_code) in [200, 201, 202]:
                published_measurements.append(published_measurement)
            else:
                run_status = "error"
                error_msg = published_measurement
                error_msgs.append(error_msg)
                
        except Exception as ex:
            run_status = "error"
            error_msg = str(ex)
            error_msgs.append(error_msg)
            
        finally:
            status_code, response = update_monitor_run_status(base_url, access_token, custom_monitor_instance_id, custom_monitor_run_id, run_status, error_msg)
            if not int(status_code) in [200, 201, 202]:
                error_msgs.append(response)
    
        if len(error_msgs) == 0:
            response_payload = {
                "predictions" : [{ 
                    "values" : published_measurements
                }]

            }
        else:
            response_payload = {
                "predictions":[{
                    "values":[{"errors": error_msgs}]
                }]
            }
        
        return response_payload
        
    return publish
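
To make the run-status update easier to inspect, the sketch below builds the same kind of JSON Patch document that `update_monitor_run_status` sends, without making any HTTP call. `build_run_status_patch` is a name invented for this illustration; the paths, the error code, and the timestamp format mirror the function above.

```python
import datetime
import uuid


def build_run_status_patch(state, error_message=None):
    """Build the JSON Patch operations that mark a monitor run as ended."""
    completed_at = datetime.datetime.now(datetime.timezone.utc).strftime(
        "%Y-%m-%dT%H:%M:%S.%fZ"
    )
    patch = [
        {"op": "replace", "path": "/status/state", "value": state},
        {"op": "replace", "path": "/status/completed_at", "value": completed_at},
    ]
    if error_message is not None:
        # The failure entry is only attached when the run ended with an error.
        patch.append({
            "op": "replace",
            "path": "/status/failure",
            "value": {
                "trace": str(uuid.uuid4()),
                "errors": [{
                    "code": "custom_metrics_error_code",
                    "message": str(error_message),
                }],
            },
        })
    return patch


finished_patch = build_run_status_patch("finished")
error_patch = build_run_status_patch("error", "metric computation failed")
print([op["path"] for op in finished_patch])
print([op["path"] for op in error_patch])
```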
    

## 4. Configure OpenScale. <a name="config"></a>

Import the required libraries and set up the Watson OpenScale Python client.

In [13]:
from ibm_watson_openscale import APIClient
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
authenticator = IAMAuthenticator(
    apikey=config["CLOUD_API_KEY"]
)
wos_client = APIClient(service_url=OPENSCALE_API_URL, authenticator=authenticator, service_instance_id = DATAMART_ID)
wos_client.version

'3.1.2.19'

## 5. Set up the custom monitor configuration. <a name="custom_monitor"></a>


This setup initializes the WML client, sets the default space, deletes existing resources, and recreates the python function deployment, custom metrics provider, custom monitor definition, and monitor instance.

In [14]:
wos_client.custom_monitor.setup_configuration(config,custom_metrics_provider)

Initialising  Watson Machine Learning (WML) client.
Initilising Cloud WML
Default space set to 77981bef-037d-421d-b60d-f5e7c0ee4689
Setting up an Integration System for Custom Metrics Provider
CUSTOM_METRICS_PROVIDER_NAME: Custom Metrics Provider_0199c390-f7f2-7f54-98cf-289b743c3218
delete_integrated_system is True
Cleaning up existing deployment Custom Metrics Provider Deployment.
Performing Batch deployment Cleanup for: Custom Metrics Provider Deployment_0199c390-f7f2-7f54-98cf-289b743c3218
Deleting Batch deployment: 0b1c46e6-2edc-418d-baf5-e63ab8407f99 
Deleting associated asset: 4f8b27c0-6064-4c3a-85d0-eecb70e327b4
Creating custom function.
Deploy function as BATCH : Custom Metrics Provider Deployment


######################################################################################

Synchronous deployment creation for id: 'cce46763-0f78-4402-bd5e-f7a251e3d313' started

######################################################################################


ready.


---------

{'function_id': 'cce46763-0f78-4402-bd5e-f7a251e3d313',
 'deployment_id': '7779cf78-0714-44c8-839d-37cab060a411',
 'scoring_url': 'https://yp-qa.ml.cloud.ibm.com/ml/v4/deployment_jobs?version=2025-11-24',
 'integrated_system_id': '019ab43b-13a6-78de-b045-276cec772386',
 'custom_metrics_provider_name': 'Custom Metrics Provider_0199c390-f7f2-7f54-98cf-289b743c3218',
 'custom_monitor_id': 'rag_quality_monitor',
 'custom_monitor_instance_id': '019ab43f-fbb8-787c-935a-5e20012c84b6',
 'custom_dataset_id': '019ab440-18fa-7217-8dbd-971758164051',
 'custom_dataset_table_name': 'rag_quality_monitor_0199c390-f7f2-7f54-98cf-289b743c3218'}

## 6. Get the custom monitor configuration <a name="get_config"></a>

In [15]:
result = wos_client.custom_monitor.get_custom_monitor_configuration(config=config)
result

{'function_id': 'cce46763-0f78-4402-bd5e-f7a251e3d313',
 'deployment_id': '7779cf78-0714-44c8-839d-37cab060a411',
 'scoring_url': 'https://yp-qa.ml.cloud.ibm.com/ml/v4/deployment_jobs?version=2025-11-24',
 'integrated_system_id': '019ab43b-13a6-78de-b045-276cec772386',
 'custom_metrics_provider_name': 'Custom Metrics Provider_0199c390-f7f2-7f54-98cf-289b743c3218',
 'custom_monitor_id': 'rag_quality_monitor',
 'custom_monitor_instance_id': '019ab43f-fbb8-787c-935a-5e20012c84b6',
 'custom_dataset_id': '019ab440-18fa-7217-8dbd-971758164051',
 'custom_dataset_table_name': 'rag_quality_monitor_0199c390-f7f2-7f54-98cf-289b743c3218'}

In [16]:
custom_monitor_instance_id = result["custom_monitor_instance_id"]
deployment_uid = result["deployment_id"]
custom_monitor_id = result["custom_monitor_id"]
custom_dataset_id = result["custom_dataset_id"]

## 7. Run the custom monitor <a name="run"></a>

In [17]:
#Execute the custom metrics provider deployment
monitor_instance_run_info = wos_client.monitor_instances.run(
        background_mode=False,
        monitor_instance_id=custom_monitor_instance_id
     ).result

monitor_instance_run_info
custom_monitor_run_id = monitor_instance_run_info.metadata.id




 Waiting for end of monitoring run e2fcb62f-349b-46df-9680-f5cd63393c24 




running...
finished

---------------------------
 Successfully finished run 
---------------------------
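
With `background_mode=False`, the call above blocks until the run ends. If a run were started in the background instead, its state would have to be polled. The sketch below shows one way to do that with a timeout. It deliberately takes the state lookup as an injected callable, because the SDK call for reading a run's state is not shown in this notebook, and it assumes `finished` and `error` are the terminal states, as used by the provider function.

```python
import time

TERMINAL_STATES = {"finished", "error"}  # assumption based on the provider function


def wait_for_run(get_state, timeout_seconds=300, poll_interval=5, sleep=time.sleep):
    """Poll get_state() until a terminal state is seen or the timeout elapses."""
    waited = 0
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(f"Run still '{state}' after {waited} seconds")
        sleep(poll_interval)
        waited += poll_interval


# Simulated run: two 'running' polls, then 'finished'. No real waiting happens
# because the sleep function is replaced with a no-op.
simulated_states = iter(["running", "running", "finished"])
final_state = wait_for_run(
    lambda: next(simulated_states),
    timeout_seconds=30,
    poll_interval=5,
    sleep=lambda seconds: None,
)
print(final_state)
```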




## 8. Risk evaluations for subscription <a name="evaluate_risk"></a>

The cell below triggers all configured monitors (out-of-the-box and custom) for the selected subscription. It assesses the test data, computes the metrics, and publishes the results to Watson OpenScale and Facts.

For risk assessment of a pre-production (development-type) subscription, an evaluation dataset must be provided. The risk evaluation function takes the dataset path as an input parameter and uses it to evaluate the configured metric dimensions.

Note: Disable Step 7 (Run the custom monitor) and uncomment the code in the following cell to run the custom monitor along with other monitors through MRM to evaluate the risk and publish the results to Facts. 

In [15]:
def get_mrm_monitor_instance():
    monitor_instances = wos_client.monitor_instances.list(data_mart_id = DATAMART_ID, monitor_definition_id = "mrm", target_target_id = SUBSCRIPTION_ID).result.monitor_instances
    if len(monitor_instances) == 1:
        return monitor_instances[0]
    return None

In [None]:
#mrm_monitor_instance = get_mrm_monitor_instance()
#mrm_monitor_instance_id = mrm_monitor_instance.metadata.id

###################################################################################
#Enable the below code for pre production flow
######################################################################################

#test_data_set_name = "test_data"
#body = {}
#test_data_path= "llm_data.csv"

#response = wos_client.monitor_instances.mrm.evaluate_risk(monitor_instance_id=mrm_monitor_instance_id, test_data_set_name=test_data_set_name,
#                                                     test_data_path=test_data_path, body=body, project_id=PROJECT_ID,
#                                                     includes_model_output=True, background_mode=False)
#response.result.to_dict()


#####################################################################################
#Enable the below code for production flow 
######################################################################################
#response  = wos_client.monitor_instances.mrm.evaluate_risk(monitor_instance_id=mrm_monitor_instance_id, 
#                                                      space_id = SPACE_ID, background_mode = False)
#response.result.to_dict()
############################################################################################



### Show custom metrics

In [18]:
wos_client.monitor_instances.show_metrics(monitor_instance_id=custom_monitor_instance_id)

0,1,2,3,4,5,6,7,8,9,10,11
2025-11-24 05:05:26.806091+00:00,answer_relevance,019ab440-a516-7bab-b72d-2c293916a308,0.502,0.6,1.0,['region:us-south'],rag_quality_monitor,019ab43f-fbb8-787c-935a-5e20012c84b6,e2fcb62f-349b-46df-9680-f5cd63393c24,subscription,0199c390-f7f2-7f54-98cf-289b743c3218
2025-11-24 05:05:26.806091+00:00,pii,019ab440-a516-7bab-b72d-2c293916a308,0.5,0.8,,['region:us-south'],rag_quality_monitor,019ab43f-fbb8-787c-935a-5e20012c84b6,e2fcb62f-349b-46df-9680-f5cd63393c24,subscription,0199c390-f7f2-7f54-98cf-289b743c3218
2025-11-24 05:05:26.806091+00:00,hap,019ab440-a516-7bab-b72d-2c293916a308,0.279,0.8,,['region:us-south'],rag_quality_monitor,019ab43f-fbb8-787c-935a-5e20012c84b6,e2fcb62f-349b-46df-9680-f5cd63393c24,subscription,0199c390-f7f2-7f54-98cf-289b743c3218
2025-11-24 05:05:26.806091+00:00,answer_completeness,019ab440-a516-7bab-b72d-2c293916a308,0.537,0.8,,['region:us-south'],rag_quality_monitor,019ab43f-fbb8-787c-935a-5e20012c84b6,e2fcb62f-349b-46df-9680-f5cd63393c24,subscription,0199c390-f7f2-7f54-98cf-289b743c3218


### Show record level metrics

In [None]:
wos_client.data_sets.show_records(data_set_id= custom_dataset_id)

## [OPTIONAL STEP] Invoke the custom metrics Python function deployment as part of this notebook

Run the cell below to validate the custom metrics provider python function by providing the correct parameters to generate the custom metrics.

In [20]:
def get_dataset_id(data_set_type: str):
    data_sets = wos_client.data_sets.list(target_target_id= config["SUBSCRIPTION_ID"], type = data_set_type).result.data_sets
    feedback_data_set_id = None
    if len(data_sets) > 0:
        feedback_data_set_id = data_sets[0].metadata.id
    return feedback_data_set_id

### Get the custom monitor instance configuration

In [21]:
res = wos_client.custom_monitor.get_monitor_instance_config(config=config)
monitor_instance_parameters = res["monitor_instances"][0]["entity"]["parameters"]

Monitor instance details picking for Subscription: 0199c390-f7f2-7f54-98cf-289b743c3218 ,monitor definition: rag_quality_monitor 


In [22]:
parameters = {
    "custom_metrics_provider_id": result["integrated_system_id"],
    "custom_metrics_wait_time": monitor_instance_parameters["custom_metrics_wait_time"],
    "custom_metrics_provider_type": monitor_instance_parameters["custom_metrics_provider_type"],
    "space_id": monitor_instance_parameters["space_id"],
    "deployment_id":monitor_instance_parameters["deployment_id"],
    "hardware_spec_id": monitor_instance_parameters["hardware_spec_id"]
}

payload = {
    "data_mart_id" : config["DATAMART_ID"],
    "subscription_id" : config["SUBSCRIPTION_ID"],
    "custom_monitor_id" : result["custom_monitor_id"],
    "custom_monitor_instance_id" : custom_monitor_instance_id,
    "custom_monitor_run_id": custom_monitor_run_id,
    "custom_monitor_instance_params": parameters,
    "feedback_dataset_id": get_dataset_id("feedback"),
    "custom_dataset_id": custom_dataset_id
    
}

input_data= { "input_data": [ {"values": [ payload ] } ]
            }
func_result = custom_metrics_provider()(input_data)
print(func_result)

Accepted request to save record level metrics to custom dataset 019ab440-18fa-7217-8dbd-971758164051. response {'state': 'preparing'}
{'predictions': [{'values': [[{'measurement_id': '019ab441-9176-7b25-9a1a-4cad02e75657', 'metrics': [{'answer_completeness': 0.507, 'answer_relevance': 0.34700000000000003, 'hap': 0.5850000000000001, 'pii': 0.398, 'region': 'us-south'}], 'run_id': 'e2fcb62f-349b-46df-9680-f5cd63393c24', 'timestamp': '2025-11-24T05:06:27.318053Z'}]]}]}
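
The response printed above has two possible shapes, both built by the `publish` function: a list of published measurements on success, or a single entry with an `errors` list on failure. The following sketch flattens either shape into a small summary; `summarize_provider_response` is a hypothetical helper, and the IDs in the sample are shortened placeholders.

```python
def summarize_provider_response(response):
    """Return {'ok': bool, 'metrics': [...]} or {'ok': False, 'errors': [...]}."""
    values = response["predictions"][0]["values"]
    errors = [
        error
        for item in values
        if isinstance(item, dict)
        for error in item.get("errors", [])
    ]
    if errors:
        return {"ok": False, "errors": errors}
    metrics = []
    for item in values:
        # Each successful entry is the measurements API response, a list of measurements.
        measurements = item if isinstance(item, list) else [item]
        for measurement in measurements:
            metrics.extend(measurement.get("metrics", []))
    return {"ok": True, "metrics": metrics}


success_response = {"predictions": [{"values": [[{
    "measurement_id": "sample-measurement",
    "metrics": [{"answer_completeness": 0.507, "hap": 0.585, "region": "us-south"}],
    "run_id": "sample-run",
}]]}]}
failure_response = {"predictions": [{"values": [{"errors": ["sample error"]}]}]}

success_summary = summarize_provider_response(success_response)
failure_summary = summarize_provider_response(failure_response)
print(success_summary)
print(failure_summary)
```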


## Congratulations

You have finished configuring the custom monitor definition and monitor instance, and running the custom monitor for your subscription. You can also run the custom monitor from the [Watson OpenScale dashboard](http://aiopenscale.cloud.ibm.com): click the tile of your model and select the `Evaluate Now` option from the `Actions` drop-down menu.