<img src="https://github.com/pmservice/ai-openscale-tutorials/raw/master/notebooks/images/banner.png" align="left" alt="banner">

# Working with a custom metrics provider for Detached PTA

This notebook should be run in a Watson Studio project, using the **IBM Runtime 24.1 on Python 3.11 XS** runtime environment. **If you are viewing this in Watson Studio and do not see the required runtime environment in the upper right corner of your screen, please update the runtime now.** It requires service credentials for the following services:
  * Watson OpenScale
  * Watson Machine Learning

## Overview

This sample notebook demonstrates how to create a detached prompt template asset (PTA), promote and deploy it in a space, and configure a custom monitor that computes metrics such as answer completeness, answer relevance, HAP, and PII for LLM subscriptions. Based on the values specified in the configuration cell, it automatically creates the custom monitor definition, the WML batch deployment for the Python function, the custom metrics provider with the deployment scoring endpoint, and a custom dataset for storing record-level metrics. Users must update the metric computation logic inside the Python function to suit their own metrics.

During each run, OpenScale invokes the custom metrics provider (a Python function) and sends inputs such as `data_mart_id`, `subscription_id`, `custom_monitor_id`, and other parameters. The provider then:

- Reads data from feedback, payload logging, or other datasets.
- Computes record-level metrics and saves them to the custom dataset.
- Computes and publishes aggregated metrics to the Measurements API.
- Updates the monitor run status to Finished.
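
The four steps above can be sketched in plain Python. This is only an illustration of the control flow, not the provider deployed later in this notebook: `run_provider` and its callback parameters (`score_record`, `save_records`, `publish`, `set_status`) are hypothetical names, and the in-memory lists stand in for the custom dataset and the Measurements API.

```python
from statistics import mean


def run_provider(records, score_record, save_records, publish, set_status):
    """Run one evaluation: score records, save them, publish aggregates, update status."""
    try:
        # 1. Compute record-level metrics for every record that was read.
        record_metrics = [score_record(record) for record in records]
        # 2. Save the record-level metrics (the custom dataset in the real flow).
        save_records(record_metrics)
        # 3. Aggregate each metric with a simple mean and publish the result.
        names = list(record_metrics[0]) if record_metrics else []
        aggregated = {name: mean(m[name] for m in record_metrics) for name in names}
        publish(aggregated)
    except Exception as exc:
        # Any failure marks the run as errored instead of finished.
        set_status("error", str(exc))
        return None
    # 4. Mark the monitor run as finished.
    set_status("finished", None)
    return aggregated


# Usage with in-memory stand-ins (all hypothetical).
saved_rows, published_rows, run_states = [], [], []
run_provider(
    [{"text": "a"}, {"text": "abc"}],
    score_record=lambda record: {"length": len(record["text"])},
    save_records=saved_rows.extend,
    publish=published_rows.append,
    set_status=lambda state, error: run_states.append((state, error)),
)
print(published_rows, run_states)
```

Passing the storage and publishing steps in as callables keeps the metric logic testable without a live OpenScale instance.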
  
## Contents

This notebook contains the following parts:

  1. [Set up your environment](#setup)
  2. [Create Prompt template](#prompt)
  3. [Prompt Setup](#ptatsetup)
  4. [Configure Values for the Custom Monitor](#provider)
  5. [Create the custom metrics provider - Python function](#deployment)
  6. [Set up the custom monitor configuration](#monitor)
  7. [Get custom monitor configuration](#run)
  8. [Risk evaluations for PTA subscription](#run)
  9. [Display the Custom metrics](#custom_metrics)


## 1. Set up your environment <a name="setup"></a>

Before you use the sample code in this notebook, you must perform the following setup tasks:

### Install the `ibm-watson-machine-learning` and `ibm-watson-openscale` packages.

In [None]:
!pip install --upgrade ibm-watson-machine-learning   | tail -n 1
!pip install --upgrade ibm-watson-openscale --no-cache | tail -n 1

Note: you may need to restart the kernel to use updated packages.

### Provision services and configure credentials

If you have not already, provision an instance of IBM Watson OpenScale using the [OpenScale link in the Cloud catalog](https://cloud.ibm.com/catalog/services/watson-openscale).

Your Cloud API key can be generated by going to the [**Users** section of the Cloud console](https://cloud.ibm.com/iam#/users). From that page, click your name, scroll down to the **API Keys** section, and click **Create an IBM Cloud API key**. Give your key a name and click **Create**, then copy the created key and paste it below.

**NOTE:** You can also get the OpenScale `API_KEY` using the IBM Cloud CLI.

How to install the IBM Cloud CLI: [instruction](https://console.bluemix.net/docs/cli/reference/ibmcloud/download_cli.html#install_use)

How to get an API key using the CLI:
```
ibmcloud login --sso
ibmcloud iam api-key-create 'my_key'
```

In [None]:

IAM_URL = "https://iam.cloud.ibm.com/oidc/token"
DATAPLATFORM_URL = "https://api.dataplatform.cloud.ibm.com"
SERVICE_URL = "https://api.aiopenscale.cloud.ibm.com"
CLOUD_API_KEY = "<Your Cloud IAM API Key>"
DATAMART_ID =  "<DataMart Id>"
WML_URL = "<WML URL>"


### Set the project ID

In order to set up a development type subscription, the PTA must be within the project. Please supply the project ID where the PTA needs to be created.

In [None]:
PROJECT_ID = "" # YOUR_PROJECT_ID


### Set the space ID

In [None]:
SPACE_ID = "" #YOUR_SPACE_ID

### Function to create the access token

This function generates an IAM access token using the provided credentials. The API calls for creating and scoring prompt template assets utilize the token generated by this function.

In [None]:
import requests, json


def generate_access_token():
    # Exchange the IBM Cloud API key for an IAM bearer token.
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Accept": "application/json"
    }
    data = {
        "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
        "apikey": CLOUD_API_KEY,
        "response_type": "cloud_iam"
    }
    response = requests.post(IAM_URL, data=data, headers=headers)
    # Surface HTTP errors here instead of a KeyError on "access_token".
    response.raise_for_status()
    return response.json()["access_token"]

iam_access_token = generate_access_token()
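
IAM tokens expire, so a long notebook session may need a fresh one. The sketch below shows one way to reuse a token until shortly before it expires. It assumes the token response carries an `expires_in` field in seconds alongside `access_token`. `TokenCache`, `fetch_token`, and `refresh_margin` are hypothetical names, and the fake fetcher stands in for a real IAM call.

```python
import time


class TokenCache:
    """Cache a bearer token and refresh it shortly before it expires."""

    def __init__(self, fetch_token, refresh_margin=300, clock=time.time):
        self._fetch_token = fetch_token      # callable returning the token response dict
        self._refresh_margin = refresh_margin  # seconds before expiry at which to refresh
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._refresh_margin:
            payload = self._fetch_token()
            self._token = payload["access_token"]
            # Assumption: "expires_in" is the lifetime in seconds; default to one hour.
            self._expires_at = now + float(payload.get("expires_in", 3600))
        return self._token


# Usage with a fake fetcher; a real one would post to the IAM URL and return response.json().
fetch_count = {"n": 0}

def fake_fetch():
    fetch_count["n"] += 1
    return {"access_token": "token-{}".format(fetch_count["n"]), "expires_in": 3600}

cache = TokenCache(fake_fetch)
cache.get()
cache.get()  # served from the cache, no second fetch
print(fetch_count["n"])
```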

### Hugging Face model

In [None]:
!pip install torch torchvision torchaudio transformers

In [None]:
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="google/flan-t5-base",
    tokenizer="google/flan-t5-base"
)

In [None]:
# Download summarisation data
!rm -f summarisation.csv
!wget https://raw.githubusercontent.com/IBM/watson-openscale-samples/main/IBM%20Cloud/WML/assets/data/summarization/summarisation.csv

In [None]:
import pandas as pd

test_data_path = "summarisation.csv"
llm_data = pd.read_csv(test_data_path)
llm_data.head()

### Set `generated_text` to the summary generated by the Hugging Face Flan-T5 model

In [None]:
def get_completion(prompt_text):
    # max_length is counted in tokens; the character count is only a loose upper bound.
    summary = summarizer(prompt_text, max_length=len(prompt_text), min_length=1)
    summary_text = summary[0]["summary_text"]
    return summary_text

llm_data["generated_text"] = llm_data["original_text"].apply(get_completion)
llm_data.head()

## 2. Create Prompt template <a name="prompt"></a>

Create a prompt template for a summarization task

In [18]:
from ibm_aigov_facts_client import AIGovFactsClient

facts_client = AIGovFactsClient(
    api_key=CLOUD_API_KEY,
    container_id=PROJECT_ID,
    container_type="project",
    disable_tracing=True
)


In [19]:
from ibm_aigov_facts_client import DetachedPromptTemplate, PromptTemplate

detached_information = DetachedPromptTemplate(
    prompt_id="detached_prompt",
    model_id="google/flan-t5-base",
    model_provider="Hugging Face",
    model_name="google/flan-t5-base",
    model_url="https://huggingface.co/google/flan-t5-base",
    prompt_url="prompt_url",
    prompt_additional_info={"model_owner": "huggingface"}
)

task_id = "summarization"
name = "Summarization-detached notebook"
description = "My first detached prompt"
model_id = "google/flan-t5-base"

# define parameters for PromptTemplate
prompt_variables = {"original_text": ""}
input = "{original_text}"
input_prefix = "Input:"
output_prefix = "Output:"

prompt_template = PromptTemplate(
    input=input,
    prompt_variables=prompt_variables,
    input_prefix=input_prefix,
    output_prefix=output_prefix
)

pta_details = facts_client.assets.create_detached_prompt(
    model_id=model_id,
    task_id=task_id,
    name=name,
    description=description,
    prompt_details=prompt_template,
    detached_information=detached_information
)
project_pta_id = pta_details.to_dict()["asset_id"]

2025/04/30 11:53:41 INFO : ------------------------------ Detached Prompt Creation Started ------------------------------
2025/04/30 11:53:44 INFO : The detached prompt with ID 40027bbd-9f13-4098-b314-a31e0c8f8700 was created successfully in container_id acf10f1c-58d6-449e-9b38-a996ab1c432d.


## 3. Prompt setup <a name="ptatsetup"></a>

### Configure OpenScale

In [None]:
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator, CloudPakForDataAuthenticator
from ibm_watson_openscale import APIClient
from ibm_watson_openscale import *
from ibm_watson_openscale.supporting_classes.enums import *
from ibm_watson_openscale.supporting_classes import *


authenticator = IAMAuthenticator(
    apikey=CLOUD_API_KEY
)
wos_client = APIClient(
    authenticator=authenticator,
    service_url=SERVICE_URL,
    service_instance_id=DATAMART_ID  # Data mart (service instance) to connect to
)
data_mart_id = wos_client.service_instance_id
wos_client.version

'3.0.46'

### Promote the asset to Space

In [None]:
headers={}
headers["Content-Type"] = "application/json"
headers["Accept"] = "*/*"
headers["Authorization"] = "Bearer {}".format(iam_access_token)
verify = True

url = "{}/v2/assets/{}/promote".format(DATAPLATFORM_URL ,project_pta_id)

params = {
    "project_id":PROJECT_ID
}

payload = {
    "space_id": SPACE_ID
}
response = requests.post(url, json=payload, headers=headers, params = params, verify = verify)
json_data = response.json()
space_pta_id = json_data["metadata"]["asset_id"]
space_pta_id

### Deployment of asset from space

In [23]:
DEPLOYMENTS_URL = WML_URL + "/ml/v4/deployments"

headers={}
headers["Content-Type"] = "application/json"
headers["Accept"] = "*/*"
headers["Authorization"] = "Bearer {}".format(iam_access_token)
verify = True
payload = {
    "prompt_template": {
      "id": space_pta_id
    },
    "detached": {
    },
    "base_model_id": "meta-llama/llama-3-70b-instruct",
    "description": "summarization-detached",
    "name": "summarization_detached",
    "space_id": SPACE_ID
}

version = "2023-07-07" # The version date for the API of the form YYYY-MM-DD. Example : 2023-07-07
params = {
    "version":version,
    "space_id":SPACE_ID
}

response = requests.post(DEPLOYMENTS_URL, json=payload, headers=headers, params = params, verify = verify)
json_data = response.json()


if "metadata" in json_data:
    deployment_id = json_data["metadata"]["id"]
    print(deployment_id)
else:
    print(json_data)

14598d71-1df4-4687-a583-23f4982841a1
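
The promote and deployment cells read IDs such as `json_data["metadata"]["asset_id"]` and `json_data["metadata"]["id"]` from the REST responses. When a request fails, those keys are missing and the error is easy to overlook. A minimal sketch of a helper that fails loudly with the full response follows; `extract_id` is a hypothetical name, and the sample dictionaries are made up for illustration.

```python
def extract_id(json_data, *path):
    """Follow nested keys in a response and return the value found there.

    Raises RuntimeError containing the whole response when a key is missing,
    so the service's error message is visible instead of a bare KeyError.
    """
    node = json_data
    for key in path:
        if not isinstance(node, dict) or key not in node:
            raise RuntimeError(
                "Expected '{}' in response, got: {}".format("/".join(path), json_data)
            )
        node = node[key]
    return node


# Usage with made-up responses.
print(extract_id({"metadata": {"id": "deployment-123"}}, "metadata", "id"))
try:
    extract_id({"errors": [{"message": "not authorized"}]}, "metadata", "id")
except RuntimeError as err:
    print(err)
```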


### Set up the prompt template asset in the space for evaluation with supported monitor dimensions

Prompt template assets in a space are supported with the `pre_production` or `production` operational space ID. Running the cell below creates a `pre_production` subscription from the prompt template asset that was promoted to the space.

The parameters that can be passed to the `execute_prompt_setup` function are:

 * `prompt_template_asset_id` : ID of the prompt template asset for which the subscription needs to be created.
 * `label_column` : The name of the column containing the ground truth or actual labels.
 * `project_id` : The GUID of the project.
 * `space_id` : The GUID of the space.
 * `deployment_id` : (optional) The GUID of the deployment.
 * `operational_space_id` : The rank of the environment in which the monitoring is happening. Accepted values are `development`, `pre_production`, `production`.
 * `problem_type` : (optional) The task type to monitor for the given prompt template asset.
 * `classification_type` : The classification type (`binary` or `multiclass`), applicable only for the `classification` problem (task) type.
 * `input_data_type` : The input data type.
 * `supporting_monitors` : Monitor configuration for the subscription to be created.
 * `background_mode` : When `True`, the prompt setup operation is executed in the background.
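
The parameter list can be turned into a small pre-flight check before calling the SDK. The sketch below only builds and validates a keyword-argument dictionary and does not call OpenScale. `build_prompt_setup_args` is a hypothetical helper, and the rule that `development` needs a project while the other environments need a space is taken from the descriptions in this notebook, not from the SDK itself.

```python
ALLOWED_OPERATIONAL_SPACES = ("development", "pre_production", "production")


def build_prompt_setup_args(prompt_template_asset_id, operational_space_id,
                            project_id=None, space_id=None, **optional):
    """Validate the container/environment combination and drop unset options."""
    if operational_space_id not in ALLOWED_OPERATIONAL_SPACES:
        raise ValueError("operational_space_id must be one of {}".format(
            ", ".join(ALLOWED_OPERATIONAL_SPACES)))
    # Assumption from the notebook text: development runs against a project,
    # pre_production and production run against a space.
    if operational_space_id == "development" and not project_id:
        raise ValueError("development subscriptions need project_id")
    if operational_space_id != "development" and not space_id:
        raise ValueError("{} subscriptions need space_id".format(operational_space_id))
    args = {
        "prompt_template_asset_id": prompt_template_asset_id,
        "operational_space_id": operational_space_id,
        "project_id": project_id,
        "space_id": space_id,
    }
    args.update(optional)
    # Leave out anything that was not provided.
    return {key: value for key, value in args.items() if value is not None}


# Usage with placeholder IDs.
print(build_prompt_setup_args(
    "pta-id", "pre_production", space_id="space-id",
    deployment_id="deployment-id", problem_type="summarization",
))
```

The resulting dictionary could then be unpacked into `execute_prompt_setup(**args)`.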

In [24]:
label_column = "reference_summary"
operational_space_id = "pre_production"
problem_type = "summarization"
input_data_type = "unstructured_text"

monitors = {
    "generative_ai_quality": {
        "parameters": {
            "min_sample_size": 10,
            "metrics_configuration": {
            }
        }
    }
}

response = wos_client.wos.execute_prompt_setup(
    prompt_template_asset_id=space_pta_id,
    space_id=SPACE_ID,
    deployment_id=deployment_id,
    label_column=label_column,
    operational_space_id=operational_space_id,
    problem_type=problem_type,
    input_data_type=input_data_type,
    supporting_monitors=monitors,
    background_mode=False
)

result = response.result
res_dict = result.to_dict()




 Waiting for end of adding prompt setup b8601bfe-06a4-4669-b7bd-eda9c6d1ce76 




running.
finished

---------------------------------------------------------------
 Successfully finished setting up prompt template subscription 
---------------------------------------------------------------




Use the cells below to read the prompt setup task and check its status.

In [None]:

SUBSCRIPTION_ID = result.to_dict()["subscription_id"]
SUBSCRIPTION_ID

'01968561-1b2a-7e33-be21-7664c95df22d'

In [26]:
response = wos_client.wos.get_prompt_setup(
    prompt_template_asset_id=space_pta_id,
    space_id=SPACE_ID,
    deployment_id=deployment_id
)

result = response.result
result_json = result.to_dict()

if result_json["status"]["state"] == "FINISHED":
    print("Finished prompt setup. The response is {}".format(result_json))
else:
    print("Prompt setup failed. The response is {}".format(result_json))

Finished prompt setup. The response is {'prompt_template_asset_id': 'b8601bfe-06a4-4669-b7bd-eda9c6d1ce76', 'space_id': 'f0941313-c25e-440b-9531-ad0106f8435d', 'deployment_id': '14598d71-1df4-4687-a583-23f4982841a1', 'service_provider_id': '01968561-1761-7274-a799-4a163bbb9d75', 'subscription_id': '01968561-1b2a-7e33-be21-7664c95df22d', 'mrm_monitor_instance_id': '01968561-3a93-7bfc-a331-3e84de3f177f', 'start_time': '2025-04-30T06:27:34.210928Z', 'end_time': '2025-04-30T06:27:50.975981Z', 'status': {'state': 'FINISHED'}}
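
When the setup is started with `background_mode=True`, the status has to be polled until it completes. A minimal polling sketch is shown below. `wait_for_state` is a hypothetical helper, and the failure state names (`ERROR`, `FAILED`) are assumptions, since this notebook only shows the `running` and `FINISHED` states.

```python
import time


def wait_for_state(get_state, done_states=("FINISHED",), failed_states=("ERROR", "FAILED"),
                   interval=5.0, timeout=300.0, sleep=time.sleep, clock=time.monotonic):
    """Call get_state() repeatedly until it reports a terminal state or the timeout passes."""
    deadline = clock() + timeout
    while True:
        state = str(get_state()).upper()
        if state in done_states:
            return state
        if state in failed_states:
            raise RuntimeError("Task ended in state {}".format(state))
        if clock() >= deadline:
            raise TimeoutError("Task still in state {} after {} seconds".format(state, timeout))
        sleep(interval)


# Usage with a scripted sequence of states instead of a live service.
scripted_states = iter(["running", "running", "finished"])
print(wait_for_state(lambda: next(scripted_states), interval=0.0))
```

In this notebook, `get_state` could wrap the `get_prompt_setup` call from the cell above and return `result.to_dict()["status"]["state"]`.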


### Read required IDs from prompt setup response

In [None]:
SUBSCRIPTION_ID = result_json["subscription_id"]
mrm_monitor_instance_id = result_json["mrm_monitor_instance_id"]

## Configure Custom Monitor for Detached PTA subscription

## 4. Default values for the custom monitor <a name="provider"></a>
Default values for the following custom monitor parameters are set. You can override them by specifying parameter values in the configuration cell.

| Parameter Name                                | Type           | Optional | Description                                                                 | Default Value                              |
|----------------------------------------------|----------------|----------|-----------------------------------------------------------------------------|-------------------------------------------|
| `DEPLOYMENT_NAME`                             | string         | Yes      | Name of the function deployment                                             | `"Custom Metrics Provider Deployment"`   |
| `PYTHON_FUNCTION_NAME`                        | string         | Yes      | Name of the Python function to be deployed                                 | `"Custom Metrics Provider Function"`      |
| `CUSTOM_METRICS_PROVIDER_NAME`                | string         | Yes      | Name for the Custom Metrics Provider                                       | `"Custom Metrics Provider"`               |
| `CUSTOM_MONITOR_NAME`                         | string         | Yes      | Name of the custom monitor                                                 | `"Sample Model Performance"`              |
| `DATAMART_ID`                                 | string         | Yes      | Watson OpenScale DataMart GUID                                             | `"00000000-0000-0000-0000-000000000000"`  |
| `SPACE_ID`                                    | string         | No      | Watson OpenScale Space ID                                                   | `"<Your Space ID>"`  |
| `RUNTIME_ENV`                                 | string         | Yes      | Runtime environment for the Python function                                | `"runtime-24.1-py3.11"`                   |
| `ENABLE_SCHEDULE`                             | boolean        | Yes      | Flag to enable scheduled runs of the monitor                               | `True`                                    |
| `START_TIME`                                  | string         | Yes      | Scheduled run start time (format: `HH:MM:SS`)                              | `"10:00:00"`                              |
| `CUSTOM_METRICS_WAIT_TIME`                    | integer        | Yes      | Time in seconds to check the run status                                    | `60`                                     |
| `DELETE_CUSTOM_MONITOR`                       | boolean        | Yes      | Flag to delete any existing monitor with the same name                     | `True`                                   |
| `DELETE_CUSTOM_MONITOR_INSTANCE`              | boolean        | Yes      | Flag to delete any existing monitor instance                               | `True`                                   |
| `ALGORITHM_TYPES`                              | list[string]   | Yes      | Types of algorithms used (`binary`, `regression`, etc.)                    | `["binary","multiclass","regression","question_answering","summarization","retrieval_augmented_generation","classification","generation","extraction"]`                              |
| `INPUT_DATA_TYPES`                              | list[string]   | Yes      | Type of input data (`structured`, `unstructured`)                          | `["structured","unstructured_text","unstructured_image"]`                          |
| `WOS_URL`                                      | string         | No       | URL of the Watson OpenScale instance                                       | `"https://api.aiopenscale.cloud.ibm.com"` |
| `WML_URL`                                      | string         | No       | URL of Watson Machine Learning instance                                    | `"https://us-south.ml.cloud.ibm.com"`     |
| `CLOUD_API_KEY`                                | string         | No       | IBM Cloud API Key for IAM authentication                                   | `"<Your Cloud IAM API Key>"`             |
| `IAM_URL`                                      | string         | No       | IAM authentication URL                                                     | `"https://iam.ng.bluemix.net/oidc/token"` |
| `SUBSCRIPTION_ID`                              | string         | Yes      | ID of the subscription to be monitored                                     | `"<Subscription Id>"`                    |
| `CUSTOM_MONITOR_METRICS`                       | list[dict]     | No       | List of metric definitions used in the custom monitor                      |                                           |
| └─ `name`                                      | string         | No       | Name of the custom metric (e.g., `sensitivity`)                            |                                           |
| └─ `description`                               | string         | No       | Human-readable description of the metric                                   |                                           |
| └─ `type`                                      | string         | No       | Data type of the metric value (e.g., `number`)                             |                                           |
| `CUSTOM_METRICS_PROVIDER_CREDENTIALS`          | dict           | No       | Dictionary with authentication method for custom metrics provider          |                                           |
| └─ `auth_type`                                 | string         | No       | Authentication method (e.g., `bearer`)                                     |                                           |
| └─ `token_info`                                | dict           | Yes      | Token generation details (used for bearer tokens)                          |                                           |
|     └─ `url`                                   | string         | No       | URL to request IAM token                                                   |                                           |
|     └─ `headers`                               | dict           | No       | HTTP headers for token request                                             |                                           |
| `SCHEDULE`                                    | list[dict]     | No       | Schedule settings for the custom monitor runs                              |                                           |
| └─ `repeat_interval`                           | integer        | No       | Interval between scheduled executions                                      | `1`                                       |
| └─ `repeat_type`                               | string         | No       | Unit of repeat interval (`hour`, `day`, etc.)                              | `"hour"`                                  |
| └─ `delay_unit`                                | string         | No       | Unit of delay duration (`minute`, `second`, etc.)                          |`"minute"`                                 |
| └─ `delay_time`                                | integer        | No       | Delay duration before execution                                            |`5`                                        |
| `CPD_INFO`                                    | list[dict]     | No       | Connection details for a Cloud Pak for Data (CPD) instance                 |                                           |
| └─ `CPD_URL`                                   | string         | No       | CPD instance URL (if using CPD)                                            |                                           |
| └─ `USERNAME`                                  | string         | No       | CPD Username                                                               |                                           |
| └─ `PASSWORD`                                  | string         | No       | CPD User API Key                                                           |                                           |
| └─ `VERSION`                                   | integer        | No       | CPD version                                                                |`5.0`                                      |
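
The way defaults and overrides combine can be illustrated with a plain dictionary merge. This is a sketch of the idea only, not the code that this notebook's setup utilities actually run. `resolve_config` is a hypothetical name, and `DEFAULTS` and `REQUIRED` contain just a subset of the rows in the table above.

```python
# A few defaults copied from the table above (subset, for illustration).
DEFAULTS = {
    "DEPLOYMENT_NAME": "Custom Metrics Provider Deployment",
    "CUSTOM_MONITOR_NAME": "Sample Model Performance",
    "RUNTIME_ENV": "runtime-24.1-py3.11",
    "ENABLE_SCHEDULE": True,
    "CUSTOM_METRICS_WAIT_TIME": 60,
}

# A few parameters the table marks as not optional (subset, for illustration).
REQUIRED = ("SPACE_ID", "WML_URL", "CLOUD_API_KEY")


def resolve_config(user_config):
    """Return the defaults overlaid with user values, after checking required keys."""
    missing = [key for key in REQUIRED if not user_config.get(key)]
    if missing:
        raise ValueError("Missing required configuration values: {}".format(", ".join(missing)))
    # Later keys win, so user values override the defaults.
    return {**DEFAULTS, **user_config}


# Usage with placeholder values.
resolved = resolve_config({
    "SPACE_ID": "space-id",
    "WML_URL": "https://us-south.ml.cloud.ibm.com",
    "CLOUD_API_KEY": "api-key",
    "CUSTOM_MONITOR_NAME": "RAG Quality Monitor",
})
print(resolved["CUSTOM_MONITOR_NAME"], resolved["RUNTIME_ENV"])
```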




### Configuration cell

In [None]:
config = {
  "CLOUD_API_KEY": CLOUD_API_KEY,
  "SPACE_ID": SPACE_ID,
  "DATAMART_ID": DATAMART_ID,
  "SUBSCRIPTION_ID": SUBSCRIPTION_ID,
  "CUSTOM_METRICS_WAIT_TIME": 60,
  "WML_URL": WML_URL,
  "CUSTOM_MONITOR_NAME": "RAG Quality Monitor",
  "MONITOR_METRICS": [
      {
          "name": "answer_completeness",
          "thresholds": {
              "lower_limit": 0.8
          }
      },
      {
          "name": "answer_relevance",
          "thresholds": {
              "lower_limit": 0.6,
              "upper_limit": 1
          }
      },
      {
          "name": "hap",
          "thresholds": {
              "lower_limit": 0.8
          }
      },
      {
          "name": "pii",
          "thresholds": {
              "lower_limit": 0.8
          }
      }
  ],
  "TAGS": [
      {
          "name": "region",
          "TAG_DESCRIPTION": "Custom metrics tag for monitoring"
      }
  ]
}
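
To see how the `thresholds` entries in `MONITOR_METRICS` relate to metric values, the sketch below compares a set of aggregated values against the limits. It is a local illustration only. OpenScale evaluates thresholds itself, and `find_violations` is a hypothetical helper. The sample values are the fallback metrics used by the provider function later in this notebook.

```python
def find_violations(metric_values, monitor_metrics):
    """Return (name, value, reason) tuples for metrics outside their thresholds."""
    violations = []
    for definition in monitor_metrics:
        name = definition["name"]
        if name not in metric_values:
            continue  # nothing was computed for this metric
        value = metric_values[name]
        limits = definition.get("thresholds", {})
        lower = limits.get("lower_limit")
        upper = limits.get("upper_limit")
        if lower is not None and value < lower:
            violations.append((name, value, "below lower_limit {}".format(lower)))
        elif upper is not None and value > upper:
            violations.append((name, value, "above upper_limit {}".format(upper)))
    return violations


monitor_metrics = [
    {"name": "answer_completeness", "thresholds": {"lower_limit": 0.8}},
    {"name": "answer_relevance", "thresholds": {"lower_limit": 0.6, "upper_limit": 1}},
    {"name": "hap", "thresholds": {"lower_limit": 0.8}},
    {"name": "pii", "thresholds": {"lower_limit": 0.8}},
]
sample_values = {"answer_completeness": 0.6, "answer_relevance": 0.7, "hap": 0.9, "pii": 0.95}
print(find_violations(sample_values, monitor_metrics))
```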

## 5. Create the custom metrics provider - Python function <a name="deployment"></a>

The Python function receives the required variables, such as `data_mart_id`, `custom_monitor_instance_id`, `custom_monitor_id`, `custom_monitor_instance_params`, and `subscription_id`, from the Watson OpenScale service when it is invoked by the custom monitor.

In the Python function, add your own logic to compute the custom metrics in the `get_metrics` method, publish the metrics to the Watson OpenScale service, and update the status of the run to the `finished` state in the custom monitor instance run.

Update the OpenScale credentials in the `parms` dictionary that is passed to the Python function.
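
The shape of the input that the provider reads can be sketched as follows. The key names are the ones the `publish` function below reads from `input_data["input_data"][0]["values"]`. `parse_provider_input` is a hypothetical helper, and all ID values in the sample are made-up placeholders.

```python
REQUIRED_KEYS = (
    "data_mart_id",
    "subscription_id",
    "custom_monitor_id",
    "custom_monitor_instance_id",
    "custom_monitor_instance_params",
    "custom_monitor_run_id",
)
OPTIONAL_KEYS = ("payload_dataset_id", "feedback_dataset_id", "custom_dataset_id")


def parse_provider_input(input_data):
    """Extract the monitor run parameters from the scoring payload."""
    values = input_data["input_data"][0]["values"]
    missing = [key for key in REQUIRED_KEYS if key not in values]
    if missing:
        raise KeyError("Missing keys in provider input: {}".format(", ".join(missing)))
    parsed = {key: values[key] for key in REQUIRED_KEYS}
    # Dataset IDs may be absent, so they default to None.
    parsed.update({key: values.get(key) for key in OPTIONAL_KEYS})
    return parsed


# Sample payload with placeholder IDs.
sample_input = {
    "input_data": [{
        "values": {
            "data_mart_id": "datamart-id",
            "subscription_id": "subscription-id",
            "custom_monitor_id": "monitor-id",
            "custom_monitor_instance_id": "monitor-instance-id",
            "custom_monitor_instance_params": {},
            "custom_monitor_run_id": "run-id",
            "feedback_dataset_id": "feedback-dataset-id",
        }
    }]
}
print(parse_provider_input(sample_input))
```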

In [None]:
#wml_python_function
parms = {
        "url": SERVICE_URL,
        "iam_url": IAM_URL,
        "apikey": CLOUD_API_KEY
    }
def custom_metrics_provider(parms = parms):
    
    import json
    import requests
    import base64
    from requests.auth import HTTPBasicAuth
    import time
    import uuid
    import datetime
    import random
    import pandas as pd
    
    headers = {}
    headers["Content-Type"] = "application/json"
    headers["Accept"] = "application/json"

    def get_access_token():
        token_headers={}
        token_headers["Content-Type"] = "application/x-www-form-urlencoded"
        token_headers["Accept"] = "application/json"
        auth = HTTPBasicAuth("bx", "bx")
        data = {
            "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
            "apikey": parms["apikey"]
        }
        response = requests.post(parms["iam_url"], data=data, headers=token_headers, auth=auth)
        json_data = response.json()
        access_token = json_data['access_token']
        return access_token    
    
    
    def get_feedback_data(access_token, data_mart_id, feedback_dataset_id):
        # Read up to 10 records from the feedback dataset; returns None when no dataset ID is given.
        json_data = None
        if feedback_dataset_id is not None:
            headers["Authorization"] = "Bearer {}".format(access_token)
            DATASETS_STORE_RECORDS_URL = parms["url"] + "/openscale/{0}/v2/data_sets/{1}/records?format=list&limit={2}".format(data_mart_id, feedback_dataset_id, 10)
            response = requests.get(DATASETS_STORE_RECORDS_URL, headers=headers, verify=False)
            json_data = response.json()

        return json_data

    def save_record_level_metrics(base_url, access_token, data_mart_id, custom_dataset_id, run_id, record_level_metrics_df):
        
        if custom_dataset_id and record_level_metrics_df is not None and not record_level_metrics_df.empty:
            payload = [
            {
                "fields": list(record_level_metrics_df.columns),
                "values": record_level_metrics_df.values.tolist()
            }]
            headers["Authorization"] = "Bearer {}".format(access_token)
            headers["Content-Type"] = "application/json"
            # base_url already contains the data mart ID, so only the dataset ID is formatted in.
            DATASETS_STORE_RECORDS_URL = base_url + "/v2/data_sets/{0}/records".format(custom_dataset_id)
            response = requests.post(DATASETS_STORE_RECORDS_URL, headers=headers, json = payload, verify=False)
            record_lev_metrics_resp = response.json()
            status_code = response.status_code
            if int(status_code) in [200, 201, 202]:
                print(f"Accepted request to save record level metrics to custom dataset {custom_dataset_id}. response {record_lev_metrics_resp}")
            else:
                print(f"Failed while saving record level metrics to custom dataset. Error {record_lev_metrics_resp}")

    
    #Update the run status to Finished in the Monitor Run
    def update_monitor_run_status(base_url, access_token, custom_monitor_instance_id, run_id, status, error_msg = None):
        monitor_run_url = base_url + '/v2/monitor_instances/' + custom_monitor_instance_id + '/runs/'+run_id
        completed_timestamp = datetime.datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%S.%fZ")
        patch_payload  = []
        base_path = "/status"
        
        patch_payload.append(get_patch_request_field(base_path, "state", status))
        patch_payload.append(get_patch_request_field(base_path, "completed_at", completed_timestamp))
        if error_msg is not None:
            error_json = get_error_json(error_msg)
            patch_payload.append(get_patch_request_field(base_path, "failure", error_json))
        
        headers["Authorization"] = "Bearer {}".format(access_token)
        response = requests.patch(monitor_run_url, headers=headers, json = patch_payload, verify=False)
        monitor_run_response = response.json()
        return response.status_code, monitor_run_response
    
    def get_error_json(error_message):
        trace = str(uuid.uuid4())
        error_json = {
            'trace': trace,
            'errors': [{
                'code': "custom_metrics_error_code",
                'message': str(error_message)
            }]
        }
        return error_json
    
    def get_patch_request_field(base_path, field_name, field_value, op_name="replace"):
        field_json = {
            "op": op_name,
            "path": "{0}/{1}".format(base_path, field_name),
            "value": field_value
        }
        return field_json
        
    def get_record_level_metrics(feedback_data_df, feedback_dataset_id, custom_monitor_run_id):
        # Add the computation logic here to compute the record level metrics
        #The record_id column in the custom dataset is unique (primary key), so you cannot save record-level metrics for the same record more than once. 
        #If you need to store duplicate or repeated records, use the reference_record_id field instead.
    
        record_level_metrics_df = pd.DataFrame({
            #"record_id": feedback_data_df["record_id"],
            "reference_record_id": feedback_data_df["record_id"],
            "record_timestamp": feedback_data_df["record_timestamp"],
            "run_id": custom_monitor_run_id,
            "computed_on": "feedback",
            "data_set_id": feedback_dataset_id,
            # Placeholder values: random floats between 0 and 1. Replace with the real metric computation.
            "hap": [round(random.random(), 2) for _ in range(len(feedback_data_df))],
            "pii": [round(random.random(), 2) for _ in range(len(feedback_data_df))],
            "answer_completeness": [round(random.random(), 2) for _ in range(len(feedback_data_df))],
            "answer_relevance": [round(random.random(), 2) for _ in range(len(feedback_data_df))]
            })

        return record_level_metrics_df
        
    #Add your code to compute the custom metrics. 
    def get_metrics(access_token, data_mart_id, subscription_id, feedback_dataset_id, custom_monitor_run_id, timestamp):
        #Add the logic here to compute the metrics. Use the below metric names while creating the custom monitor definition
        json_data = get_feedback_data(access_token, data_mart_id, feedback_dataset_id)
        metrics = None
        record_level_metrics_df = pd.DataFrame()
        if json_data is not None and len(json_data['records']) > 0:
            fields = json_data['records'][0]['fields']
            values = json_data['records'][0]['values']

            feedback_data_df = pd.DataFrame(values, columns = fields)
            record_level_metrics_df = get_record_level_metrics(feedback_data_df, feedback_dataset_id, custom_monitor_run_id)

        # Remove the "region": "us-south" tag from the metrics below before publishing the metric values
        # to the OpenScale data mart if the custom monitor definition is not created with tags.

        if not record_level_metrics_df.empty:
            # Aggregate the record-level metrics
            metrics = {
                "answer_completeness": record_level_metrics_df["answer_completeness"].mean(),
                "answer_relevance": record_level_metrics_df["answer_relevance"].mean(),
                "hap": record_level_metrics_df["hap"].mean(),
                "pii": record_level_metrics_df["pii"].mean(),
                "region": "us-south"
            }
        else:
            # Fallback sample values used when no feedback records are available
            metrics = {"answer_completeness": 0.6, "answer_relevance": 0.7, "hap": 0.9, "pii": 0.95, "region": "us-south"}
        
    
        return metrics, record_level_metrics_df
        
        
    # Publishes the Custom Metrics to OpenScale
    def publish_metrics(base_url, access_token, data_mart_id, subscription_id, custom_monitor_id, custom_monitor_instance_id, custom_monitor_run_id, feedback_dataset_id, custom_dataset_id, timestamp):
        # Compute the metrics and publish them against the monitor run ID supplied by OpenScale
        custom_metrics, record_level_metrics = get_metrics(access_token, data_mart_id, subscription_id, feedback_dataset_id, custom_monitor_run_id, timestamp)
        save_record_level_metrics(base_url, access_token, data_mart_id, custom_dataset_id, custom_monitor_run_id, record_level_metrics)
        measurements_payload = [
                  {
                    "timestamp": timestamp,
                    "run_id": custom_monitor_run_id,
                    "metrics": [custom_metrics]
                  }
                ]
        headers["Authorization"] = "Bearer {}".format(access_token)
        headers["Content-Type"] = "application/json"
        measurements_url = base_url + '/v2/monitor_instances/' + custom_monitor_instance_id + '/measurements'
        response = requests.post(measurements_url, headers=headers, json = measurements_payload, verify=False)
        published_measurement = response.json()
        return response.status_code, published_measurement
        
    
    def publish( input_data ):
        timestamp = datetime.datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%S.%fZ")
        payload = input_data.get("input_data")[0].get("values")
        data_mart_id = payload['data_mart_id']
        subscription_id = payload['subscription_id']
        custom_monitor_id = payload['custom_monitor_id']
        custom_monitor_instance_id = payload['custom_monitor_instance_id']
        custom_monitor_instance_params  = payload['custom_monitor_instance_params']
        custom_monitor_run_id = payload['custom_monitor_run_id']
        payload_dataset_id = payload.get('payload_dataset_id')
        feedback_dataset_id = payload.get('feedback_dataset_id')
        custom_dataset_id = payload.get('custom_dataset_id')

        base_url = parms['url'] + '/openscale' + '/' + data_mart_id
        access_token = get_access_token()
        
        published_measurements = []
        error_msgs = []
        run_status = "finished"
        error_msg = None
        
        try:
            status_code, published_measurement = publish_metrics(base_url, access_token, data_mart_id, subscription_id, custom_monitor_id, custom_monitor_instance_id, custom_monitor_run_id, feedback_dataset_id, custom_dataset_id, timestamp)
            if int(status_code) in [200, 201, 202]:
                published_measurements.append(published_measurement)
            else:
                run_status = "error"
                error_msg = published_measurement
                error_msgs.append(error_msg)
                
        except Exception as ex:
            run_status = "error"
            error_msg = str(ex)
            error_msgs.append(error_msg)
            
        finally:
            status_code, response = update_monitor_run_status(base_url, access_token, custom_monitor_instance_id, custom_monitor_run_id, run_status, error_msg)
            if int(status_code) not in [200, 201, 202]:
                error_msgs.append(response)
    
        if len(error_msgs) == 0:
            response_payload = {
                "predictions" : [{ 
                    "values" : published_measurements
                }]

            }
        else:
            response_payload = {
                "predictions":[{
                    "values":[{"errors": error_msgs}]
                }]
            }
        
        return response_payload
        
    return publish
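
The aggregation and payload-building steps above can be tried without OpenScale access. The following is a minimal sketch, assuming the record-level metrics arrive as a list of dicts rather than a DataFrame; `aggregate_metrics` and `build_measurements_payload` are hypothetical helper names, not part of the OpenScale SDK, and the run ID is a made-up value. It mirrors what `get_metrics` and `publish_metrics` do: average each metric, attach the `region` tag, and wrap the result in the measurements payload.

```python
from statistics import mean

# Metric names taken from the provider function above.
METRIC_NAMES = ["answer_completeness", "answer_relevance", "hap", "pii"]


def aggregate_metrics(records, tags=None):
    """Average each metric over the records and append optional tags."""
    if not records:
        raise ValueError("no record-level metrics to aggregate")
    metrics = {name: mean(record[name] for record in records) for name in METRIC_NAMES}
    metrics.update(tags or {})
    return metrics


def build_measurements_payload(metrics, run_id, timestamp):
    """Wrap aggregated metrics in the list-of-measurements shape used above."""
    return [{"timestamp": timestamp, "run_id": run_id, "metrics": [metrics]}]


# Illustrative record-level values (not real evaluation results).
records = [
    {"answer_completeness": 0.5, "answer_relevance": 1.0, "hap": 0.0, "pii": 0.0},
    {"answer_completeness": 1.0, "answer_relevance": 0.5, "hap": 0.0, "pii": 1.0},
]
aggregated = aggregate_metrics(records, tags={"region": "us-south"})
payload = build_measurements_payload(aggregated, "run-123", "2025-04-30T06:30:21.749253Z")
```

Unlike the provider function, which falls back to hard-coded values when there are no records, this sketch raises an error in that case. That is a design choice for the example only, so that placeholder numbers cannot be mistaken for computed ones.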
    

## 6. Set up the custom monitor configuration <a name="custom_monitor"></a>


This setup initializes the WML client, sets the default space, deletes existing resources, and recreates the python function deployment, custom metrics provider, custom monitor definition, and monitor instance.

In [None]:
wos_client.custom_monitor.setup_configuration(config,custom_metrics_provider)

## 7. Get custom monitor configuration <a name="get_config"></a>

In [31]:
result = wos_client.custom_monitor.get_custom_monitor_configuration(config=config)
result

{'function_id': '6d10299c-a396-4f05-9a57-98ea57d67b3d',
 'deployment_id': '13e3669b-dcd8-43e7-854a-f8843d159fb2',
 'scoring_url': 'https://us-south.ml.cloud.ibm.com/ml/v4/deployments/13e3669b-dcd8-43e7-854a-f8843d159fb2/predictions?version=2025-04-30',
 'integrated_system_id': '01968562-ca4c-7858-ab05-d92f5d31cb1f',
 'custom_monitor_id': 'sample_model_performance',
 'custom_monitor_instance_id': '01968562-f7a5-706e-9390-00b1aec736fd'}

In [None]:
custom_monitor_instance_id = result["custom_monitor_instance_id"]
custom_monitor_id = result["custom_monitor_id"]
custom_dataset_id = result["custom_dataset_id"]
custom_monitor_instance_id

'01968562-f7a5-706e-9390-00b1aec736fd'
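
The sample output above does not list a `custom_dataset_id` key, so indexing `result["custom_dataset_id"]` directly may raise a `KeyError`, depending on what the configuration call returns in your environment. A minimal sketch of a defensive lookup follows; `pick_required` is a hypothetical helper, not part of the OpenScale SDK, and the dictionary is a trimmed copy of the output shown above.

```python
def pick_required(result, keys):
    """Return the requested values, or raise one error naming every missing key."""
    missing = [key for key in keys if result.get(key) is None]
    if missing:
        raise KeyError("configuration is missing: " + ", ".join(missing))
    return tuple(result[key] for key in keys)


# Trimmed copy of the configuration output shown above.
sample_result = {
    "custom_monitor_id": "sample_model_performance",
    "custom_monitor_instance_id": "01968562-f7a5-706e-9390-00b1aec736fd",
}

monitor_id, instance_id = pick_required(
    sample_result, ["custom_monitor_id", "custom_monitor_instance_id"]
)

# A key that is absent from the sample output produces a descriptive error.
try:
    pick_required(sample_result, ["custom_dataset_id"])
    lookup_error = None
except KeyError as err:
    lookup_error = err.args[0]
```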

## Show all the monitor instances of the subscription
The following cell lists the monitors present in the development subscription along with their statuses and other details. Wait until all monitors are in the active state before proceeding.

In [None]:
wos_client.monitor_instances.show(target_target_id=SUBSCRIPTION_ID)

0,1,2,3,4,5,6
2e25b9e6-1dc2-4ea2-9c81-53904ab3d931,active,01968561-1b2a-7e33-be21-7664c95df22d,subscription,sample_model_performance,2025-04-30 06:29:39.002000+00:00,01968562-f7a5-706e-9390-00b1aec736fd
2e25b9e6-1dc2-4ea2-9c81-53904ab3d931,active,01968561-1b2a-7e33-be21-7664c95df22d,subscription,generative_ai_quality,2025-04-30 06:27:42.145000+00:00,01968561-3180-7a13-bcc8-b69c54b16302
2e25b9e6-1dc2-4ea2-9c81-53904ab3d931,active,01968561-1b2a-7e33-be21-7664c95df22d,subscription,model_health,2025-04-30 06:27:43.449000+00:00,01968561-34df-7d57-a12a-5520f520d403
2e25b9e6-1dc2-4ea2-9c81-53904ab3d931,active,01968561-1b2a-7e33-be21-7664c95df22d,subscription,mrm,2025-04-30 06:27:44.083000+00:00,01968561-3a93-7bfc-a331-3e84de3f177f
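
Waiting for the monitors to become active can be automated with a small polling loop. The sketch below is an assumption-laden illustration: `fetch_states` is a hypothetical callable that returns a `{monitor_id: state}` mapping (how you build it from the SDK is left open), and the intermediate state name `preparing` is made up for the demonstration.

```python
import time


def wait_until_active(fetch_states, timeout_s=300.0, interval_s=5.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_states() until every monitor reports 'active' or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        states = fetch_states()
        pending = {monitor: state for monitor, state in states.items() if state != "active"}
        if not pending:
            return states
        if clock() >= deadline:
            raise TimeoutError("monitors not active: {}".format(pending))
        sleep(interval_s)


# Simulated responses: the first poll shows one monitor still starting up.
_responses = iter([
    {"mrm": "preparing", "model_health": "active"},
    {"mrm": "active", "model_health": "active"},
])
final_states = wait_until_active(lambda: next(_responses), sleep=lambda _: None)
```

The `sleep` and `clock` parameters are injectable so the loop can be exercised without real delays.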


## 8. Risk evaluations for PTA subscription <a name="evaluate"></a>

### Evaluate the prompt template subscription

To assess risk for a development-type subscription, you need an evaluation dataset. The risk evaluation function takes the path to the evaluation dataset as a parameter and evaluates the configured metric dimensions. If the feature columns in the subscription do not match the column names in the uploaded CSV, you can supply a mapping JSON file that associates the CSV column names with the subscription's feature column names.
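
As an alternative illustration of the column-mapping idea, the sketch below renames the CSV header locally before upload. It does not use the mapping JSON option of the risk evaluation function, whose file format is not shown in this notebook; the helper name `rename_csv_columns`, the sample rows, and the column names are all hypothetical.

```python
import csv
import io


def rename_csv_columns(csv_text, column_mapping):
    """Rename header columns of CSV text; columns absent from the mapping are kept."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text
    rows[0] = [column_mapping.get(name, name) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical CSV whose header differs from the subscription's feature columns.
sample_csv = "query,reply\nWhat is PTA?,A prompt template asset\n"
mapping = {"query": "question", "reply": "generated_text"}
renamed = rename_csv_columns(sample_csv, mapping)
```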


**Note:** If you are running this notebook from Watson Studio, you may first need to upload your test data to Watson Studio and run a code snippet that downloads the feedback data file from the project to the local directory.

The following cell will assess the test data with the subscription of the prompt template asset and produce relevant measurements for the configured monitor.

In [34]:
test_data_set_name = "data"
content_type = "multipart/form-data"
body = {}

# Preparing the test data, removing extra columns
cols_to_remove = ["uid", "doc", "title", "id"]
for col in cols_to_remove:
    if col in llm_data:
        del llm_data[col]
llm_data.to_csv(test_data_path, index=False)

response = wos_client.monitor_instances.mrm.evaluate_risk(
    monitor_instance_id=mrm_monitor_instance_id,
    test_data_set_name=test_data_set_name,
    test_data_path=test_data_path,
    content_type=content_type,
    body=body,
    space_id=SPACE_ID,
    includes_model_output=True,
    background_mode=False
)

# -----------------------------------------------------------------------------
# For the production flow
# -----------------------------------------------------------------------------
# response  = wos_client.monitor_instances.mrm.evaluate_risk(monitor_instance_id=mrm_monitor_instance_id, 
#                                                     body = body,
#                                                     space_id = SPACE_ID,
#                                                     evaluation_tests = [custom_monitor_id, "model_health"],
#                                                     background_mode = False)


Waiting for risk evaluation of MRM monitor 01968561-3a93-7bfc-a331-3e84de3f177f

upload_in_progress.
running..
finished

---------------------------------------
 Successfully finished evaluating risk
---------------------------------------

### Read the risk evaluation response

Once the risk evaluation has finished, its results are available for review.

In [35]:
response = wos_client.monitor_instances.mrm.get_risk_evaluation(mrm_monitor_instance_id, space_id=SPACE_ID)
response.result.to_dict()

{'metadata': {'id': 'ea9f55b7-f7ab-4dc6-b00e-7d9ffa2ed8fb',
  'created_at': '2025-04-30T06:30:11.618Z',
  'created_by': 'iam-ServiceId-b317a8da-d926-496e-b0ca-6bcc57f556ae'},
 'entity': {'triggered_by': 'user',
  'parameters': {'deployment_id': '14598d71-1df4-4687-a583-23f4982841a1',
   'evaluation_start_time': '2025-04-30T06:29:58.414895Z',
   'evaluator_user_key': 'd347a831-fa73-43f1-a1c1-2fd620440e2a',
   'facts': {'state': 'finished'},
   'is_auto_evaluated': False,
   'measurement_id': '01968563-7be5-7522-beeb-a044ef60ce9a',
   'monitors_run_status': [{'monitor_id': 'generative_ai_quality',
     'status': {'state': 'finished'}},
    {'monitor_id': 'model_health', 'status': {'state': 'finished'}},
    {'monitor_id': 'sample_model_performance',
     'status': {'state': 'finished'}}],
   'prompt_template_asset_id': 'b8601bfe-06a4-4669-b7bd-eda9c6d1ce76',
   'prompt_template_details': {'pta_resource_key': '70b12f06c5dd777c296b066bccf35bdc535f5989df3171a5e1746d689d5d2079'},
   'space_i
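
Rather than reading the full response by eye, the per-monitor run states can be checked programmatically. This is a minimal sketch that assumes the response dictionary has the `entity.parameters.monitors_run_status` structure visible in the output above; `unfinished_monitors` is a hypothetical helper name, and the second dictionary with a `running` state is invented for contrast.

```python
def unfinished_monitors(evaluation):
    """List the monitor IDs whose run state is anything other than 'finished'."""
    statuses = evaluation["entity"]["parameters"]["monitors_run_status"]
    return [s["monitor_id"] for s in statuses if s["status"]["state"] != "finished"]


# Trimmed copy of the response shown above.
evaluation = {
    "entity": {
        "parameters": {
            "monitors_run_status": [
                {"monitor_id": "generative_ai_quality", "status": {"state": "finished"}},
                {"monitor_id": "model_health", "status": {"state": "finished"}},
                {"monitor_id": "sample_model_performance", "status": {"state": "finished"}},
            ]
        }
    }
}
pending = unfinished_monitors(evaluation)

# Invented variant in which one monitor has not finished yet.
in_progress = {
    "entity": {
        "parameters": {
            "monitors_run_status": [
                {"monitor_id": "model_health", "status": {"state": "finished"}},
                {"monitor_id": "sample_model_performance", "status": {"state": "running"}},
            ]
        }
    }
}
still_running = unfinished_monitors(in_progress)
```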

## 9. Display the custom metrics <a name="custom_metrics"></a>

Reading the custom monitor's metrics requires its monitor instance ID.

The following cell displays the custom monitor metrics generated by the risk evaluation.

In [37]:
wos_client.monitor_instances.show_metrics(monitor_instance_id=custom_monitor_instance_id, space_id=SPACE_ID)

0,1,2,3,4,5,6,7,8,9,10,11
2025-04-30 06:30:21.749253+00:00,sensitivity,01968563-a335-7a9f-91b2-b39b3c382721,0.85,0.6,1.0,['region:us-south'],sample_model_performance,01968562-f7a5-706e-9390-00b1aec736fd,e43e151e-6e1f-46cd-89b7-135a0e4c124b,subscription,01968561-1b2a-7e33-be21-7664c95df22d
2025-04-30 06:30:21.749253+00:00,gender_less40_fav_prediction_ratio,01968563-a335-7a9f-91b2-b39b3c382721,0.4,0.6,1.0,['region:us-south'],sample_model_performance,01968562-f7a5-706e-9390-00b1aec736fd,e43e151e-6e1f-46cd-89b7-135a0e4c124b,subscription,01968561-1b2a-7e33-be21-7664c95df22d
2025-04-30 06:30:21.749253+00:00,specificity,01968563-a335-7a9f-91b2-b39b3c382721,1.2,0.8,,['region:us-south'],sample_model_performance,01968562-f7a5-706e-9390-00b1aec736fd,e43e151e-6e1f-46cd-89b7-135a0e4c124b,subscription,01968561-1b2a-7e33-be21-7664c95df22d
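
The table's column headers are not preserved in the output above. Assuming that the three numeric columns are the metric value, the lower limit, and the upper limit (an interpretation, not something the output states), a threshold check could be sketched as follows. `threshold_violation` is a hypothetical helper, and the rows are copied from the output above.

```python
def threshold_violation(value, lower=None, upper=None):
    """Return a label if value falls outside the given limits, else None."""
    if lower is not None and value < lower:
        return "below_lower_limit"
    if upper is not None and value > upper:
        return "above_upper_limit"
    return None


# (metric, value, lower limit, upper limit) under the assumed column meaning;
# None stands for the empty upper-limit cell.
rows = [
    ("sensitivity", 0.85, 0.6, 1.0),
    ("gender_less40_fav_prediction_ratio", 0.4, 0.6, 1.0),
    ("specificity", 1.2, 0.8, None),
]
violations = {name: threshold_violation(value, lower, upper) for name, value, lower, upper in rows}
```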


### Show record level metrics

In [None]:
wos_client.data_sets.show_records(data_set_id = custom_dataset_id)

# [OPTIONAL STEP] Invoke the custom metrics python function deployment as part of this notebook.

Validate the custom metrics provider deployment by providing the correct set of parameters to generate the custom metrics.

In [38]:
def get_dataset_id(data_set_type: str):
    # Return the ID of the first data set of the given type for the subscription, or None if there is none
    data_sets = wos_client.data_sets.list(target_target_id=config["SUBSCRIPTION_ID"], type=data_set_type).result.data_sets
    data_set_id = None
    if len(data_sets) > 0:
        data_set_id = data_sets[0].metadata.id
    return data_set_id

In [42]:
monitor_runs = wos_client.monitor_instances.list_runs(monitor_instance_id=custom_monitor_instance_id).result
result_json = monitor_runs._to_dict()
latest_run = result_json["runs"][0]
print(latest_run)
custom_monitor_run_id = latest_run["metadata"]["id"]

{'metadata': {'id': 'e43e151e-6e1f-46cd-89b7-135a0e4c124b', 'crn': 'crn:v1:bluemix:public:aiopenscale:us-south:a/6c26e1f11ef745238913798de41a0653:2e25b9e6-1dc2-4ea2-9c81-53904ab3d931:run:e43e151e-6e1f-46cd-89b7-135a0e4c124b', 'url': '/v2/monitor_instances/01968562-f7a5-706e-9390-00b1aec736fd/runs/e43e151e-6e1f-46cd-89b7-135a0e4c124b', 'created_at': '2025-04-30T06:30:20.532000Z', 'created_by': 'iam-ServiceId-b317a8da-d926-496e-b0ca-6bcc57f556ae'}, 'entity': {'triggered_by': 'user', 'parameters': {'custom_metrics_provider_id': '01968562-ca4c-7858-ab05-d92f5d31cb1f', 'custom_metrics_wait_time': 60, 'enable_custom_metric_runs': True}, 'status': {'state': 'finished', 'queued_at': '2025-04-30T06:30:20.528000Z', 'started_at': '2025-04-30T06:30:20.532000Z', 'updated_at': '2025-04-30T06:30:28.149000Z', 'completed_at': '2025-04-30T06:30:27.721000Z', 'operators': []}}}
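
The cell above takes the first element of `runs` as the latest run. Whether the API returns runs newest-first is not stated in this notebook, so a more explicit selection by `created_at` may be safer. The sketch below assumes the run dictionaries have the `metadata.created_at` field shown in the output above; `latest_run_by_created_at` is a hypothetical helper, and the second run in the sample list is invented.

```python
from datetime import datetime

# Timestamp layout seen in the run metadata above, e.g. 2025-04-30T06:30:20.532000Z
CREATED_AT_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def latest_run_by_created_at(runs):
    """Return the run with the most recent metadata.created_at timestamp."""
    if not runs:
        raise ValueError("no monitor runs found")
    return max(
        runs,
        key=lambda run: datetime.strptime(run["metadata"]["created_at"], CREATED_AT_FORMAT),
    )


sample_runs = [
    {"metadata": {"id": "older-run", "created_at": "2025-04-29T10:00:00.000000Z"}},  # invented
    {"metadata": {"id": "e43e151e-6e1f-46cd-89b7-135a0e4c124b",
                  "created_at": "2025-04-30T06:30:20.532000Z"}},
]
newest = latest_run_by_created_at(sample_runs)
```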


In [None]:
import uuid
parameters = {
    "custom_metrics_provider_id": result["integrated_system_id"],
    "custom_metrics_wait_time":   config["CUSTOM_METRICS_WAIT_TIME"]
}

payload= {
    "data_mart_id" : config["DATAMART_ID"],
    "subscription_id" : config["SUBSCRIPTION_ID"],
    "custom_monitor_id" : custom_monitor_id,
    "custom_monitor_instance_id" : custom_monitor_instance_id,
    "custom_monitor_run_id":custom_monitor_run_id,
    "custom_monitor_instance_params": parameters,
    "feedback_dataset_id": get_dataset_id("feedback"),
    "custom_dataset_id": custom_dataset_id
}

input_data= { "input_data": [ { "values": payload } ]
            }


func_result = custom_metrics_provider()(input_data)
func_result
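
The provider function returns one of two shapes under `predictions[0].values`: a list of published measurements on success, or a single `{"errors": [...]}` entry on failure. A minimal sketch for telling the two apart follows; `summarize_provider_response` is a hypothetical helper, and both sample payloads are invented to match the shapes built by `publish`.

```python
def summarize_provider_response(response_payload):
    """Split the provider response into an ok flag, error messages, and measurements."""
    values = response_payload["predictions"][0]["values"]
    errors = []
    for value in values:
        if isinstance(value, dict) and "errors" in value:
            errors.extend(value["errors"])
    return {
        "ok": not errors,
        "errors": errors,
        "measurements": [] if errors else values,
    }


# Invented payloads matching the success and failure shapes returned by publish().
success_payload = {"predictions": [{"values": [[{"measurement_id": "m-1"}]]}]}
failure_payload = {"predictions": [{"values": [{"errors": ["token expired"]}]}]}

success_summary = summarize_provider_response(success_payload)
failure_summary = summarize_provider_response(failure_payload)
```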


You can view the published facts in the deployment space by opening the URL printed by the following cell.

In [None]:
factsheets_url = "https://dataplatform.cloud.ibm.com/ml-runtime/deployments/{}/details?space_id={}&context=wx&flush=true".format(deployment_id, SPACE_ID)
print("View the published facts in the space: {}".format(factsheets_url))


## Congratulations

You have finished configuring the GenAI Quality monitor, the custom monitor definition, and the monitor instance, and you have executed a custom monitor run for the summarization task type. You can now navigate to the prompt template asset in your project or space and click the Evaluate tab to visualize the results in the UI.