# Working with Prompt Template Assets for a Retrieval-Augmented Generation task in watsonx.governance

This notebook will create a retrieval-augmented generation (RAG) prompt template asset (PTA) in a given project, configure watsonx.governance to monitor that PTA and evaluate generative quality metrics and model health metrics, and then promote the prompt template asset to a space and do the same evaluation.

If you wish to execute this notebook for task types other than RAG, refer to [Evaluating prompt template asset of different task types](https://github.com/IBM/watson-openscale-samples/blob/main/IBM%20Cloud/WML/notebooks/watsonx/README.md) for guidance on evaluating prompt templates for other available task types.

This notebook should be run using a Python 3.10 or greater runtime environment. If you are viewing this notebook in Watson Studio and do not see Python 3.10.x or higher in the upper right corner of your screen, please update the runtime now. 

**Note**: Run your notebook on a Cloud Pak for Data (CPD) cluster using version 5.0.0 or above.

## Learning goals

- Create a prompt template asset in a CPD project
- Configure watsonx.governance to monitor the created prompt template asset 
- Evaluate generative quality metrics and model health metrics
- Promote the prompt template asset to a space
- Evaluate the prompt template asset in a space 

## Prerequisites

- Service credentials for IBM watsonx.governance are required
- Watson OpenScale (WOS) credentials are required
- Watson Machine Learning (WML) credentials are required
- A `.csv` file containing test data to be evaluated
- ID of the CPD project in which you want to create the PTA
- ID of the CPD space to which you want to promote the PTA

## Contents

[Evaluating a Prompt Template Asset from a project](#evaluateproject)
- [Step 1 - Setup](#settingup)
- [Step 2 - Create a Prompt template](#prompt)
- [Step 3 - Set up the prompt template](#ptatsetup)
- [Step 4 - Risk evaluations for the PTA subscription](#evaluate)
- [Step 5 - Display the Model Risk metrics](#mrmmetric)
- [Step 6 - Display the Generative AI Quality metrics](#genaimetrics)
- [Step 7 - Plot faithfulness and answer relevance metrics against records](#plotproject)
- [Step 8 - See factsheets information](#factsheetsspace)

[Evaluating a Prompt Template Asset from a space](#evaluatespace)
- [Step 9 - Promote a PTA to a space](#promottospace)
- [Step 10 - Create a deployment for a PTA in a space](#ptadeployment)
- [Step 11 - Set up the PTA in a space for evaluation with supported monitor parameters](#ptaspace)
- [Step 12 - Score the model and configure monitors](#score)
- [Step 13 - Display the source attributions for a record](#attributions)
- [Step 14 - Plot faithfulness and answer relevance metrics against records](#plotspace)
- [Step 15 - See factsheets information from a space](#factsheetsproject)

## Evaluating a Prompt Template Asset from a project <a name="evaluateproject"></a>

In the first section of this notebook, you will learn how to:

1. Create a PTA in a project
2. Create a `development`-type subscription for a PTA in OpenScale
3. Configure monitors supported by OpenScale for the subscription
4. Perform risk evaluations against the PTA subscription with a sample set of test data
5. Display the metrics generated with the risk evaluation
6. Display the factsheets information for the subscription

## Step 1 - Setup <a name="settingup"></a>

### Install the necessary packages

In [None]:
!pip install -U ibm-watson-openscale | tail -n 1
!pip install --upgrade ibm-watson-machine-learning | tail -n 1
!pip install matplotlib

**Note**: you may need to restart the kernel to use updated packages.

### Configure your credentials

Run your notebook on a CPD cluster using version 5.0.0 or above.

In [None]:
WOS_CREDENTIALS = {
     "url": "<PLATFORM_URL>",
     "username": "<YOUR_USERNAME>",
     "password": "<YOUR_PASSWORD>"
}

WML_CREDENTIALS = {
     "url": "<PLATFORM_URL>",
     "username": "<YOUR_USERNAME>",
     "password" : "<YOUR_PASSWORD>",
     "instance_id": "wml_local",
     "apikey": "<YOUR_APIKEY>",
     "version" : "4.8"
}

**Note**: Replace the `WOS_CREDENTIALS` with your Watson OpenScale credentials, and the `WML_CREDENTIALS` with your Watson Machine Learning credentials.

### Configure your project ID

To set up a development-type subscription in Watson OpenScale, the PTA must be within a CPD project. Supply the project ID where the PTA needs to be created.

In [None]:
project_id = "<YOUR_PROJECT_ID>"

### Configure your space ID

You can use an existing space, or you can create a new space. The PTA is promoted to this space later in the notebook.

#### (Optional) If you choose an existing space

Set variable for an existing space:

In [None]:
use_existing_space = True # Set as False to create a new space

Import WML client:

In [None]:
import json
from ibm_watson_machine_learning import APIClient

wml_client = APIClient(WML_CREDENTIALS)
wml_client.version

List the available spaces:

In [None]:
wml_client.spaces.list()

Add the ID of the existing space to the following cell:

In [None]:
existing_space_id = "<YOUR_SPACE_ID_NAME>"

#### (Optional) If you choose to create a new space

Set variable for a new space:

In [None]:
use_existing_space = False # Set as True to use an existing space

Create a name for your new space:

In [None]:
space_name = "<YOUR_NEW_SPACE_NAME>"

Set up your new space:

In [None]:
if use_existing_space:
    space_id = existing_space_id
else:
    space_meta_data = {
        wml_client.spaces.ConfigurationMetaNames.NAME : space_name,
        wml_client.spaces.ConfigurationMetaNames.DESCRIPTION : 'tutorial_space'
    }

    space_id = wml_client.spaces.store(
        meta_props=space_meta_data)["metadata"]["id"]
wml_client.set.default_space(space_id)
print(space_id)

### Create an access token

The following function generates an IAM access token using the provided credentials. The API calls for creating and scoring prompt template assets utilize the token generated by this function.

In [None]:
import json
import requests

def generate_access_token():
    # Request a bearer token from the CPD authorization endpoint
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json"
    }
    data = json.dumps({
        "username": WOS_CREDENTIALS["username"],
        "password": WOS_CREDENTIALS["password"]
    }).encode("utf-8")
    url = WOS_CREDENTIALS["url"] + "/icp4d-api/v1/authorize"
    # verify=False skips TLS certificate checks; enable verification if your cluster has a trusted certificate
    response = requests.post(url=url, data=data, headers=headers, verify=False)
    response.raise_for_status()  # Fail early if authentication is rejected
    return response.json()["token"]

iam_access_token = generate_access_token()
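
The token is later placed in an `Authorization` header for the REST calls that promote, deploy, and score the PTA. As a minimal sketch, the two helpers below show one way to validate the authorization response and build those headers in a single place. `extract_token` and `build_auth_headers` are hypothetical names introduced here for illustration; they are not part of any IBM SDK, and the sketch assumes the response body is a dictionary with a `token` key, as in the function above.

```python
def extract_token(json_data):
    """Return the bearer token from an authorization response body.

    Hypothetical helper: assumes the body is a dict with a "token" key.
    """
    token = json_data.get("token") if isinstance(json_data, dict) else None
    if not token:
        raise ValueError("No token in authorization response: {}".format(json_data))
    return token


def build_auth_headers(token):
    """Build the request headers used by the REST calls in this notebook."""
    return {
        "Content-Type": "application/json",
        "Accept": "*/*",
        "Authorization": "Bearer {}".format(token),
    }


# Example with a made-up token value
example_headers = build_auth_headers(extract_token({"token": "abc123"}))
print(example_headers["Authorization"])
```

Raising a clear error when the token is missing makes authentication problems visible at this step rather than as a `KeyError` or an HTTP 401 in a later cell.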

## Step 2 - Create a Prompt template <a name="prompt"></a>

Create a prompt template for a RAG task:

In [None]:
credentials={
     "apikey": WML_CREDENTIALS["apikey"],
     "url": WML_CREDENTIALS["url"],
     "instance_id": "openshift",
     "username": WML_CREDENTIALS["username"]
}

In [None]:
from ibm_watson_machine_learning.foundation_models.prompts import PromptTemplate, PromptTemplateManager
from ibm_watson_machine_learning.foundation_models.utils.enums import ModelTypes

prompt_mgr = PromptTemplateManager(
                credentials = credentials,
                project_id = project_id
                )

prompt_template = PromptTemplate(name="RAG QA",
                                 model_id=ModelTypes.GRANITE_13B_CHAT_V2,
                                 task_ids=["retrieval_augmented_generation"],
                                 input_prefix="",
                                 output_prefix="",
                                 input_text="Answer the below question from the given context only and do not use the knowledge outside the context.\n\nContext: {context1} {context2} {context3} {context4}\nQuestion: {question}\nAnswer:",
                                 input_variables=["context1", "context2", "context3", "context4", "question"])

stored_prompt_template = prompt_mgr.store_prompt(prompt_template)
project_pta_id = stored_prompt_template.prompt_id
project_pta_id
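
To see what the model receives for one test record, the template text can be filled in locally. This is only an illustration: the actual substitution is performed by the service when the prompt is scored, and Python's `str.format` is used here as an approximation of that step. The sample values are made up.

```python
# Same template text as in the PromptTemplate above
input_text = (
    "Answer the below question from the given context only and do not use "
    "the knowledge outside the context.\n\n"
    "Context: {context1} {context2} {context3} {context4}\n"
    "Question: {question}\nAnswer:"
)

# Made-up values, for illustration only
sample_variables = {
    "context1": "The bridge opened in 1932.",
    "context2": "It spans the harbour.",
    "context3": "It carries rail and road traffic.",
    "context4": "It is made of steel.",
    "question": "When did the bridge open?",
}

rendered = input_text.format(**sample_variables)
print(rendered)
```

Every name listed in `input_variables` must appear as a column in the test data, because each row supplies the values for these placeholders.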

## Step 3 - Set up the Prompt template <a name="ptatsetup"></a>

### Configure OpenScale

In [None]:
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

from ibm_watson_openscale import *
from ibm_watson_openscale.supporting_classes.enums import *
from ibm_watson_openscale.supporting_classes import *


authenticator = CloudPakForDataAuthenticator(
        url=WOS_CREDENTIALS['url'],
        username=WOS_CREDENTIALS['username'],
        password=WOS_CREDENTIALS['password'],
        disable_ssl_verification=True
    )

wos_client = APIClient(service_url=WOS_CREDENTIALS['url'],authenticator=authenticator)
print(wos_client.version)

### List available OpenScale datamarts and configure the datamart ID

In [None]:
wos_client.data_marts.show()

In [None]:
data_mart_id = "<YOUR_DATAMART_ID>"

### Map the project ID and space ID to an OpenScale instance

When authentication is on CPD, you must take the additional step of mapping the project_id and space_id to an OpenScale instance.

In [None]:
wos_client.wos.add_instance_mapping(                
            service_instance_id=data_mart_id,
            project_id=project_id
         )
wos_client.wos.add_instance_mapping(                
            service_instance_id=data_mart_id,
            space_id=space_id
         )

### Set up the PTA in the project for evaluation with supported monitor parameters

The PTAs from a project are only supported with a `development`-type operational space ID. Running the following cell will create a `development`-type subscription from the PTA created within your project.

The available parameters that can be passed for the `execute_prompt_setup` function are:

 * `prompt_template_asset_id`: ID of the PTA for which a subscription needs to be created
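 * `label_column`: The name of the column containing the ground truth or actual labels
 * `context_fields`: The names of the columns containing the retrieved context passages (used for RAG tasks)
 * `question_field`: The name of the column containing the question (used for RAG tasks)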
 * `project_id`: The ID of the project
 * `space_id`: The ID of the space
 * `deployment_id`: (optional) The ID of the deployment
 * `operational_space_id`: The rank of the environment in which the monitoring is happening. Accepted values are `development`, `pre_production`, `production`
 * `problem_type`: (optional) The task type to monitor for the given PTA
 * `classification_type`: The classification type (`binary`/`multiclass`) applicable only for the `classification` problem (task) type
 * `input_data_type`: The input data type
 * `supporting_monitors`: Monitor configuration for the subscription to be created
 * `background_mode`: When `True`, the prompt setup operation will be executed in the background

#### Faithfulness parameters
| Parameter | Description | Default Value |
|:-|:-|:-|
| `attributions_count` [Optional]| Source attributions are computed for each sentence in the generated answer. The source attribution for a sentence is the set of sentences in the context that contributed to the LLM generating that sentence in the answer. The `attributions_count` parameter specifies the number of sentences in the context to identify as attributions. For example, if the value is set to 2, the top 2 sentences from the context are returned as source attributions. | `3` |
| `ngrams` [Optional]| The number of sentences to group from the context when computing the faithfulness score. These grouped sentences are shown in the attributions. A very high value of `ngrams` might lower faithfulness scores because of dispersion of data and inclusion of unrelated sentences in the attributions. A very low value might increase metric computation time and produce attributions that do not capture all aspects of the answer. | `2` |

#### Unsuccessful requests parameters
| Parameter | Description | Default Value |
|:-|:-|:-|
| `unsuccessful_phrases` [Optional]| The list of phrases to be used for comparing the model output to determine whether the request is unsuccessful or not. | `["i don't know", "i do not know", "i'm not sure", "i am not sure", "i'm unsure", "i am unsure", "i'm uncertain", "i am uncertain", "i'm not certain", "i am not certain", "i can't fulfill", "i cannot fulfill"]` |
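
To make the `unsuccessful_phrases` parameter concrete, the sketch below flags a model output as unsuccessful when it contains any of the default phrases. The matching rule used here (case-insensitive substring search) is an assumption made for illustration; it is not taken from the watsonx.governance implementation, which may compare phrases differently. `looks_unsuccessful` is a hypothetical helper name.

```python
# Default phrases, copied from the table above
DEFAULT_UNSUCCESSFUL_PHRASES = [
    "i don't know", "i do not know", "i'm not sure", "i am not sure",
    "i'm unsure", "i am unsure", "i'm uncertain", "i am uncertain",
    "i'm not certain", "i am not certain", "i can't fulfill", "i cannot fulfill",
]


def looks_unsuccessful(model_output, phrases=DEFAULT_UNSUCCESSFUL_PHRASES):
    """Return True if the output contains any phrase (assumed rule: substring match)."""
    text = model_output.lower()
    return any(phrase in text for phrase in phrases)


print(looks_unsuccessful("I'm not sure the context covers that."))
print(looks_unsuccessful("The bridge opened in 1932."))
```

A custom list passed through `unsuccessful_phrases` can cover the refusal wording your own prompt tends to produce.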

In [None]:
# Update the label_column, context_fields, question_field values based on the prompt and test data used
label_column = "answer"
context_fields = ["context1", "context2", "context3", "context4"]
question_field = "question"

operational_space_id = "development"
problem_type= "retrieval_augmented_generation"
input_data_type= "unstructured_text"


monitors = {
    "generative_ai_quality": {
        "parameters": {
            "min_sample_size": 5,
            "metrics_configuration":{
                "faithfulness": {
                    #"attributions_count": 3,
                    #"ngrams": 2,
                },
                "answer_relevance": {},
                "rouge_score": {},
                "exact_match": {},
                "bleu": {},
                "unsuccessful_requests": {
                    #"unsuccessful_phrases": []
                },
                "hap_input_score": {},
                "hap_score": {},
                "pii": {},
                "pii_input": {}
            }
        }
    }
}

response = wos_client.wos.execute_prompt_setup(prompt_template_asset_id = project_pta_id, 
                                               project_id = project_id,
                                               context_fields = context_fields,
                                               question_field = question_field,
                                               label_column = label_column,
                                               operational_space_id = operational_space_id, 
                                               problem_type = problem_type,
                                               input_data_type = input_data_type, 
                                               supporting_monitors = monitors, 
                                               background_mode = False)

result = response.result
result._to_dict()
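
The optional faithfulness and unsuccessful-request parameters are commented out in the `monitors` dictionary above. As a sketch, a small builder function can make it easier to switch them on without editing the nested dictionary by hand. `build_genai_quality_monitor` is a hypothetical helper; the keys it emits are the ones used in the cell above.

```python
def build_genai_quality_monitor(min_sample_size=5, attributions_count=None,
                                ngrams=None, unsuccessful_phrases=None):
    """Build the generative_ai_quality monitor configuration.

    Optional parameters are only included when a value is supplied,
    so the service defaults apply otherwise.
    """
    faithfulness = {}
    if attributions_count is not None:
        faithfulness["attributions_count"] = attributions_count
    if ngrams is not None:
        faithfulness["ngrams"] = ngrams

    unsuccessful = {}
    if unsuccessful_phrases:
        unsuccessful["unsuccessful_phrases"] = list(unsuccessful_phrases)

    return {
        "generative_ai_quality": {
            "parameters": {
                "min_sample_size": min_sample_size,
                "metrics_configuration": {
                    "faithfulness": faithfulness,
                    "answer_relevance": {},
                    "rouge_score": {},
                    "exact_match": {},
                    "bleu": {},
                    "unsuccessful_requests": unsuccessful,
                    "hap_input_score": {},
                    "hap_score": {},
                    "pii": {},
                    "pii_input": {},
                },
            }
        }
    }


custom_monitors = build_genai_quality_monitor(attributions_count=2, ngrams=3)
print(custom_monitors["generative_ai_quality"]["parameters"]["metrics_configuration"]["faithfulness"])
```

The resulting dictionary can be passed as `supporting_monitors` in place of the hand-written one.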

With the following cell, you can read the prompt setup task and check its status:

In [None]:
response = wos_client.wos.get_prompt_setup(prompt_template_asset_id = project_pta_id,
                                                             project_id = project_id)

result = response.result
result_json = result._to_dict()

if result_json["status"]["state"] == "FINISHED":
    print("Finished prompt setup : The response is {}".format(result_json))
else:
    print("Prompt setup did not finish. The response is {}".format(result_json))
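
If the setup is started with `background_mode = True`, the status has to be polled until it reaches `FINISHED`. The sketch below is a generic polling loop that takes any function returning the current state. `wait_for_state` is a hypothetical helper, and the failure state names `ERROR` and `FAILED` are assumptions; check the states your service actually reports before relying on them.

```python
import time


def wait_for_state(fetch_state, done_states=("FINISHED",),
                   failed_states=("ERROR", "FAILED"),  # assumed names
                   interval=5, timeout=600, sleep=time.sleep):
    """Call fetch_state() until it returns a done state, fails, or times out."""
    waited = 0
    while True:
        state = fetch_state()
        if state in done_states:
            return state
        if state in failed_states:
            raise RuntimeError("Setup ended in state {}".format(state))
        if waited >= timeout:
            raise TimeoutError("Still in state {} after {}s".format(state, waited))
        sleep(interval)
        waited += interval


# Simulated sequence of states, for illustration only
simulated = iter(["RUNNING", "RUNNING", "FINISHED"])
print(wait_for_state(lambda: next(simulated), sleep=lambda seconds: None))

# In the notebook, fetch_state could wrap the call from the cell above:
# lambda: wos_client.wos.get_prompt_setup(
#     prompt_template_asset_id=project_pta_id,
#     project_id=project_id).result._to_dict()["status"]["state"]
```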

### Read the `subscription_id` from the prompt setup

Once the prompt setup status is `FINISHED`, read the subscription ID:

In [None]:
dev_subscription_id = result_json["subscription_id"]
dev_subscription_id

### Show all monitor instances in the development subscription
The following cell lists the monitors present in the development subscription, along with their respective statuses and other details. Please wait for all the monitors to be in an active state before proceeding further.

In [None]:
wos_client.monitor_instances.show(target_target_id = dev_subscription_id)

## Step 4 - Risk evaluations for the PTA subscription <a name="evaluate"></a>

### Evaluate the prompt template subscription

For risk assessment of a `development`-type subscription, you must have an evaluation dataset. The risk assessment function takes the evaluation dataset path as a parameter when evaluating the configured metrics. If there is a discrepancy between the feature columns in the subscription and the column names in the uploaded `.CSV` file, you have the option to supply a mapping JSON file to associate the `.CSV` column names with the feature column names in the subscription.

**Note**: If you are running this notebook from Watson Studio, you may first need to upload your test data to Watson Studio, then run the code snippet below to download the feedback data file from the project to a local directory.

In [None]:
# Download rag data
!rm -f rag_state_union.csv
!wget https://raw.githubusercontent.com/IBM/watson-openscale-samples/main/IBM%20Cloud/WML/assets/data/watsonx/rag_state_union.csv

In [None]:
test_data_path = "rag_state_union.csv"
body = None # Please update your mapping file path here if needed

# Download data from project to local directory
# Run the below code snippet only if you are running the notebook via Watson Studio
from ibm_watson_studio_lib import access_project_or_space
wslib = access_project_or_space()
wslib.download_file(test_data_path)
if body:
    wslib.download_file(body)
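
Before running the evaluation, it can help to confirm that the test data contains every column the subscription expects, since a mismatch is the case in which a mapping file is needed. The following is a minimal sketch using only the standard library; `missing_columns` is a hypothetical helper, and the in-memory sample stands in for the real file.

```python
import csv
import io


def missing_columns(csv_file, required):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    return [name for name in required if name not in header]


required = ["context1", "context2", "context3", "context4", "question", "answer"]

# In-memory sample with made-up content; use open(test_data_path, newline="") for the real file
sample = io.StringIO("context1,context2,context3,context4,question,answer\na,b,c,d,q,r\n")
print(missing_columns(sample, required))
```

An empty list means the file can be used as-is; any names returned are candidates for the column mapping.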

### Read the Model Risk metrics `instance_id` from OpenScale

Evaluating test data against the prompt template subscription requires the monitor instance ID for your OpenScale Model Risk metrics.

In [None]:
monitor_definition_id = "mrm"
target_target_id = dev_subscription_id
result = wos_client.monitor_instances.list(data_mart_id=data_mart_id,
                                           monitor_definition_id=monitor_definition_id,
                                           target_target_id=target_target_id,
                                           project_id=project_id).result
result_json = result._to_dict()
mrm_monitor_id = result_json["monitor_instances"][0]["metadata"]["id"]
mrm_monitor_id

The following cell will assess the test data with the subscription of the PTA and produce relevant measurements for the configured monitor.

In [None]:
test_data_set_name = "data"
content_type = "multipart/form-data"

response  = wos_client.monitor_instances.mrm.evaluate_risk(monitor_instance_id=mrm_monitor_id, 
                                                    test_data_set_name = test_data_set_name, 
                                                    test_data_path = test_data_path,
                                                    content_type = content_type,
                                                    body = body,
                                                    project_id = project_id,
                                                    background_mode = False)

### Read the risk evaluation response

After initiating the risk evaluation, the evaluation results are available for review:

In [None]:
response  = wos_client.monitor_instances.mrm.get_risk_evaluation(mrm_monitor_id, project_id = project_id)
response.result.to_dict()

## Step 5 - Display the Model Risk metrics <a name="mrmmetric"></a>

Having calculated the measurements for the Foundation Model subscription, the Model Risk metrics generated for this subscription are available for your review:

In [None]:
wos_client.monitor_instances.show_metrics(monitor_instance_id=mrm_monitor_id, project_id=project_id)

## Step 6 - Display the Generative AI quality metrics <a name="genaimetrics"></a>

The monitor instance ID is required for reading the Generative AI quality metrics.

In [None]:
monitor_definition_id = "generative_ai_quality"
result = wos_client.monitor_instances.list(data_mart_id = data_mart_id,
                                           monitor_definition_id = monitor_definition_id,
                                           target_target_id = target_target_id,
                                           project_id = project_id).result
result_json = result._to_dict()
genaiquality_monitor_id = result_json["monitor_instances"][0]["metadata"]["id"]
genaiquality_monitor_id

Display the Generative AI quality monitor metrics generated through the risk evaluation.

In [None]:
wos_client.monitor_instances.show_metrics(monitor_instance_id=genaiquality_monitor_id, project_id=project_id)

### Display record level metrics for Generative AI quality 

Get the dataset ID for the Generative AI quality dataset:

In [None]:
result = wos_client.data_sets.list(target_target_id = dev_subscription_id,
                                target_target_type = "subscription",
                                type = "gen_ai_quality_metrics").result

genaiq_dataset_id = result.data_sets[0].metadata.id
genaiq_dataset_id

Display record level metrics for Generative AI quality:

In [None]:
wos_client.data_sets.show_records(data_set_id = genaiq_dataset_id)

## Step 7 - Plot faithfulness and answer relevance metrics against records <a name="plotproject"></a>

Retrieve a list of records and extract the record IDs, faithfulness values, and answer relevance values:

In [None]:
result = wos_client.data_sets.get_list_of_records(data_set_id = genaiq_dataset_id).result
x = []
y_faithfulness = []
y_answer_relevance = []
for each in result["records"]:
    x.append(each["metadata"]["id"][-5:]) # Reading only last 5 characters to fit in the display
    y_faithfulness.append(each["entity"]["values"]["faithfulness"])
    y_answer_relevance.append(each["entity"]["values"]["answer_relevance"])
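
Alongside the plots, a short numeric summary can point to the records that deserve a closer look. The sketch below assumes records shaped like the ones read in the loop above (`metadata.id` and `entity.values.<metric>`); the sample records and the threshold are made up for illustration, and `summarize_metric` is a hypothetical helper.

```python
def summarize_metric(records, metric, threshold=0.5):
    """Return mean, minimum, and the IDs of records scoring below the threshold."""
    if not records:
        raise ValueError("No records to summarize")
    pairs = [(r["metadata"]["id"], r["entity"]["values"][metric]) for r in records]
    scores = [score for _, score in pairs]
    return {
        "mean": sum(scores) / len(scores),
        "min": min(scores),
        "below_threshold": [rid for rid, score in pairs if score < threshold],
    }


# Made-up records with the same nesting as result["records"]
sample_records = [
    {"metadata": {"id": "rec-00001"},
     "entity": {"values": {"faithfulness": 1.0, "answer_relevance": 0.75}}},
    {"metadata": {"id": "rec-00002"},
     "entity": {"values": {"faithfulness": 0.5, "answer_relevance": 0.25}}},
]

print(summarize_metric(sample_records, "faithfulness", threshold=0.6))
```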

Plot faithfulness metrics against the records:

In [None]:
import matplotlib.pyplot as plt
plt.scatter(x, y_faithfulness, marker='o')

# Adding labels and title
plt.xlabel('X-axis - Record id (last 5 characters)')
plt.ylabel('Y-axis - Faithfulness')
plt.title('faithfulness vs record id')

# Display the graph
plt.show()

Plot answer relevance metrics against the records:

In [None]:
import matplotlib.pyplot as plt
plt.scatter(x, y_answer_relevance, marker='o')

# Adding labels and title
plt.xlabel('X-axis - Record id (last 5 characters)')
plt.ylabel('Y-axis - Answer relevance')
plt.title('answer_relevance vs record id')

# Display the graph
plt.show()

## Step 8 - See factsheets information <a name="factsheetsspace"></a>

In [None]:
factsheets_url = "{}/wx/prompt-details/{}/factsheet?context=wx&project_id={}".format(WML_CREDENTIALS["url"], project_pta_id, project_id)
print("User can navigate to the published facts in project {}".format(factsheets_url))

## Evaluating a Prompt Template Asset from a space <a name="evaluatespace"></a>

So far, you have performed the following tasks:

1. Created a PTA in a project
2. Created a `development`-type subscription for a PTA in OpenScale
3. Configured monitors supported by OpenScale for the subscription
4. Performed risk evaluations against the PTA subscription with a sample set of test data
5. Displayed the metrics generated with the risk evaluation
6. Displayed the factsheets information for the subscription

Now, you will promote the created PTA to a space and perform similar actions.

## Step 9 - Promote a PTA to a space <a name="promottospace"></a> 

The following cell promotes the prompt template asset from your project to your space.

In [None]:
headers = {
    "Content-Type": "application/json",
    "Accept": "*/*",
    "Authorization": "Bearer {}".format(iam_access_token)
}

DATAPLATFORM_URL = WOS_CREDENTIALS["url"]
verify = False  # TLS certificate verification is disabled for the REST calls below
url = "{}/v2/assets/{}/promote".format(DATAPLATFORM_URL, project_pta_id)

params = {
    "project_id":project_id
}

payload = {
    "space_id": space_id
}
response = requests.post(url, json=payload, headers=headers, params = params, verify = verify)
json_data = response.json()
space_pta_id = json_data["metadata"]["asset_id"]
space_pta_id

## Step 10 - Create a deployment for a PTA in a space <a name="ptadeployment"></a>

To create a subscription from a space, it is necessary to create a deployment for a PTA in a space.

In [None]:
DEPLOYMENTS_URL = WML_CREDENTIALS["url"] + "/ml/v4/deployments"

serving_name = "rag_qa_deployment" # eg: summary_deployment

payload = {
    "prompt_template": {
      "id": space_pta_id
    },
    "online": {
       "parameters": {
         "serving_name": serving_name
       }
    },
    "base_model_id": "ibm/granite-13b-chat-v2",
    "description": "rag qa deployment",
    "name": "rag qa deployment",
    "space_id": space_id
}

version = "2024-05-05" # The version date for the API of the form YYYY-MM-DD. Example : 2023-07-07
params = {
    "version":version,
    "space_id":space_id
}

response = requests.post(DEPLOYMENTS_URL, json=payload, headers=headers, params = params, verify = verify)
json_data = response.json()


if "metadata" in json_data:
    deployment_id = json_data["metadata"]["id"]
    print(deployment_id)
else:
    print(json_data)

## Step 11 - Set up the PTA in a space for evaluation with supported monitor parameters <a name="ptaspace"></a>

Use of a PTA in a space is only supported with `pre_production` and `production` operational space IDs. Running the following cell will create a `production`-type subscription from the PTA promoted to the space. The `problem_type` value should depend on the task type specified in the PTA.

In [None]:
label_column = "answer"
context_fields = ["context1", "context2", "context3", "context4"]
question_field = "question"
operational_space_id = "production"
problem_type= "retrieval_augmented_generation"
input_data_type= "unstructured_text"

monitors = {
    "generative_ai_quality": {
        "parameters": {
            "min_sample_size": 5,
            "metrics_configuration":{
                "faithfulness": {
                    #"attributions_count": 3,
                    #"ngrams": 2,
                },
                "answer_relevance": {},
                "rouge_score": {},
                "exact_match": {},
                "bleu": {},
                "unsuccessful_requests": {
                    #"unsuccessful_phrases": []
                },
                "hap_input_score": {},
                "hap_score": {},
                "pii": {},
                "pii_input": {}
            }
        }
    },
    "drift_v2": {
        "thresholds": [
            {
                "metric_id": "confidence_drift_score",
                "type": "upper_limit",
                "value": 0.05
            },
            {
                "metric_id": "prediction_drift_score",
                "type": "upper_limit",
                "value": 0.05
            },
            {
                "metric_id": "input_metadata_drift_score",
                "specific_values": [
                    {
                        "applies_to": [
                            {
                                "type": "tag",
                                "value": "subscription",
                                "key": "field_type"
                            }
                        ],
                        "value": 0.05
                    }
                ],
                "type": "upper_limit"
            },
            {
                "metric_id": "output_metadata_drift_score",
                "specific_values": [
                    {
                        "applies_to": [
                            {
                                "type": "tag",
                                "value": "subscription",
                                "key": "field_type"
                            }
                        ],
                        "value": 0.05
                    }
                ],
                "type": "upper_limit"
            }
        ],
        "parameters": {
            "min_samples": 10,
            "train_archive": True
        }
    }
}


response = wos_client.wos.execute_prompt_setup(prompt_template_asset_id = space_pta_id, 
                                               space_id = space_id,
                                               deployment_id = deployment_id,
                                               context_fields=context_fields,
                                               question_field = question_field,
                                               label_column = label_column, 
                                               operational_space_id = operational_space_id, 
                                               problem_type = problem_type,
                                               input_data_type = input_data_type, 
                                               supporting_monitors = monitors, 
                                               background_mode = False)

result = response.result
result._to_dict()
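
The `drift_v2` thresholds above repeat two shapes: a plain upper limit, and an upper limit applied through a `field_type` tag. As a sketch, two small functions can generate both shapes so the limit is set in one place. `simple_threshold` and `tagged_threshold` are hypothetical helper names; the dictionary layout is copied from the cell above.

```python
def simple_threshold(metric_id, value=0.05):
    """Upper-limit threshold with a single value."""
    return {"metric_id": metric_id, "type": "upper_limit", "value": value}


def tagged_threshold(metric_id, value=0.05):
    """Upper-limit threshold applied to fields tagged field_type=subscription."""
    return {
        "metric_id": metric_id,
        "specific_values": [
            {
                "applies_to": [
                    {"type": "tag", "value": "subscription", "key": "field_type"}
                ],
                "value": value,
            }
        ],
        "type": "upper_limit",
    }


drift_thresholds = [
    simple_threshold("confidence_drift_score"),
    simple_threshold("prediction_drift_score"),
    tagged_threshold("input_metadata_drift_score"),
    tagged_threshold("output_metadata_drift_score"),
]
print(len(drift_thresholds))
```

The list can be assigned to `monitors["drift_v2"]["thresholds"]` before calling `execute_prompt_setup`.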

With the following cell, you can read the prompt setup task and check its status:

In [None]:
response = wos_client.wos.get_prompt_setup(prompt_template_asset_id = space_pta_id,
                                                             deployment_id = deployment_id,
                                                             space_id = space_id)

result = response.result
result_json = result._to_dict()
result_json

### Read the subscription ID from the prompt setup

Once the prompt setup status is `FINISHED`, get the subscription ID:

In [None]:
prod_subscription_id = result_json["subscription_id"]
prod_subscription_id

### Score the PTA deployment

Retrieve the scoring URL of the deployment from the subscription details.

In [None]:
sub_details = wos_client.subscriptions.get(prod_subscription_id).result
sub_details = sub_details._to_dict()
scoring_url = sub_details["entity"]["deployment"]["url"]
if "?version=" not in scoring_url:
    scoring_url = scoring_url.strip() + "?version=2024-05-05"

# The URL read from the subscription is replaced here by the text generation endpoint of the deployment
scoring_url = WML_CREDENTIALS["url"] + "/ml/v1/deployments/" + deployment_id + "/text/generation?version=2024-05-05"
print(scoring_url)
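
Appending `?version=...` by string concatenation works for the URLs in this notebook, but it would produce an invalid URL if other query parameters were already present. A minimal sketch of a more general approach with `urllib.parse` follows; `with_version` is a hypothetical helper and the host name is a placeholder.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_version(url, version="2024-05-05"):
    """Return the URL with a version query parameter, keeping any existing one."""
    parts = urlsplit(url.strip())
    query = dict(parse_qsl(parts.query))
    query.setdefault("version", version)
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder host and deployment ID, for illustration only
print(with_version("https://cpd.example.com/ml/v1/deployments/abc/text/generation"))
```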

## Step 12 - Score the model and configure monitors <a name="score"></a>

Once the WML service has been bound and the subscription has been created, you must score the PTA. Generate the test data content in JSON format from the previously-downloaded `.CSV` file. This is used to construct the payload for scoring the deployment.

In [None]:
test_data_path = "rag_state_union.csv"

In [None]:
import csv

feature_fields = context_fields + [question_field]
prediction = "generated_text"

headers={}
headers["Content-Type"] = "application/json"
headers["Accept"] = "*/*"
headers["Authorization"] = "Bearer {}".format(iam_access_token)

pl_data = []
prediction_list = []
with open(test_data_path, 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for row in csv_reader:
        request = {
            "parameters": {
                "template_variables": {
                }
            }
        }
        for each in feature_fields:
            request["parameters"]["template_variables"][each] = str(row[each])

        response = requests.post(scoring_url, json=request, headers=headers, verify=False).json()
        predicted_val = response["results"][0][prediction]
        prediction_list.append(predicted_val)
        record = {"request":request, "response":response}
        pl_data.append(record)
    
pl_data
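
The request body built inside the loop has a fixed shape: every feature column of a row becomes a template variable. As a sketch, that step can be pulled into a small function so it can be checked without calling the deployment. `build_scoring_request` is a hypothetical helper and the sample row is made up.

```python
def build_scoring_request(row, feature_fields):
    """Map the feature columns of one CSV row to template variables."""
    return {
        "parameters": {
            "template_variables": {field: str(row[field]) for field in feature_fields}
        }
    }


# Made-up row; the "answer" column is the label and is not sent to the model
sample_row = {
    "context1": "c1", "context2": "c2", "context3": "c3", "context4": "c4",
    "question": "q?", "answer": "a",
}
sample_fields = ["context1", "context2", "context3", "context4", "question"]
print(build_scoring_request(sample_row, sample_fields))
```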

### Generate additional payload data to enable drift

To enable drift, there should be a minimum of 100 records in the payload table. The following cell duplicates the scored records and creates another 100 records for adding to the payload table:

In [None]:
import copy

additional_pl_data = copy.copy(pl_data)
additional_pl_data *= 20
print("Generated {} additional payload data".format(len(additional_pl_data)))
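
The multiplier of 20 in the cell above yields 100 records only when the test file has exactly 5 rows. As a sketch, the number of copies can instead be derived from the record count and the required minimum. `replicate_records` is a hypothetical helper; it uses `copy.deepcopy` so that each duplicated record is an independent object, whereas `copy.copy` followed by `*=` repeats references to the same dictionaries.

```python
import copy
import math


def replicate_records(records, minimum=100):
    """Duplicate records in whole passes until at least `minimum` are produced."""
    if not records:
        raise ValueError("No scored records to replicate")
    passes = math.ceil(minimum / len(records))
    return [copy.deepcopy(record) for _ in range(passes) for record in records]


# Five made-up records standing in for pl_data
sample_pl_data = [{"request": {"id": i}, "response": {"id": i}} for i in range(5)]
replicated = replicate_records(sample_pl_data)
print(len(replicated))
```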

### Add payload data

The following cell reads the payload data set ID from the subscription:

In [None]:
import time
from ibm_watson_openscale.supporting_classes.enums import *

time.sleep(5)
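payload_data_set_id = None
# Check for an empty result before indexing, so the message below can be reached
payload_data_sets = wos_client.data_sets.list(type=DataSetTypes.PAYLOAD_LOGGING,
                                              target_target_id=prod_subscription_id,
                                              target_target_type=TargetTypes.SUBSCRIPTION).result.data_sets
if payload_data_sets:
    payload_data_set_id = payload_data_sets[0].metadata.id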
if payload_data_set_id is None:
    print("Payload data set not found. Please check subscription status.")
else:
    print("Payload data set id: ", payload_data_set_id)

Add the additional payload data to enable drift v2:

In [None]:
wos_client.data_sets.store_records(data_set_id=payload_data_set_id, request_body=additional_pl_data,background_mode=False)
time.sleep(5)
pl_records_count = wos_client.data_sets.get_records_count(payload_data_set_id)
print("Number of records in the payload logging table: {}".format(pl_records_count))

In [None]:
wos_client.data_sets.get_records_count(payload_data_set_id)

A total of 105 records should be available within the payload table. If auto payload logging fails to transmit the scored records to the payload logging table, the following code can be used to manually add payload data to the table:

In [None]:
import uuid
from ibm_watson_openscale.supporting_classes.payload_record import PayloadRecord
time.sleep(5)
pl_records_count = wos_client.data_sets.get_records_count(payload_data_set_id)
print("Number of records in the payload logging table: {}".format(pl_records_count))
if pl_records_count < 105:
    print("Payload logging did not happen, performing explicit payload logging.")
    wos_client.data_sets.store_records(data_set_id=payload_data_set_id, request_body=pl_data,background_mode=False)
    time.sleep(5)
    pl_records_count = wos_client.data_sets.get_records_count(payload_data_set_id)
    print("Number of records in the payload logging table: {}".format(pl_records_count))
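
The cells above rely on a fixed `time.sleep(5)` before reading the record count, which may be too short on a busy cluster. A minimal sketch of a polling alternative follows; `wait_for_record_count` is a hypothetical helper, and the counter used here is a stand-in that simulates records arriving over time:

```python
import time


def wait_for_record_count(get_count, expected, timeout=60.0, interval=5.0):
    """Poll `get_count()` until it returns at least `expected` or `timeout` elapses.

    Returns the last observed count. `get_count` is any zero-argument callable.
    """
    deadline = time.monotonic() + timeout
    count = get_count()
    while count < expected and time.monotonic() < deadline:
        time.sleep(interval)
        count = get_count()
    return count


# Stand-in counter that simulates records arriving over time (illustration only)
_observed = iter([0, 50, 105])
final_count = wait_for_record_count(lambda: next(_observed), expected=105,
                                    timeout=5.0, interval=0.01)
print(final_count)
```

In this notebook, `get_count` could be `lambda: wos_client.data_sets.get_records_count(payload_data_set_id)`, the same call used in the cells above.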

### Add feedback data

The following cell reads the feedback dataset ID from the subscription:

In [None]:
import time
from ibm_watson_openscale.supporting_classes.enums import *

time.sleep(5)
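# Guard against an empty result: indexing [0] on an empty list would raise IndexError
feedback_data_sets = wos_client.data_sets.list(type=DataSetTypes.FEEDBACK, 
                                               target_target_id=prod_subscription_id, 
                                               target_target_type=TargetTypes.SUBSCRIPTION).result.data_sets
feedback_data_set_id = feedback_data_sets[0].metadata.id if feedback_data_sets else None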
if feedback_data_set_id is None:
    print("Feedback data set not found. Please check subscription status.")
else:
    print("Feedback data set id: ", feedback_data_set_id)

The following cell generates feedback data from the downloaded `.csv` file and the scored responses:

In [None]:
import csv

test_data_content = []
csv_file_path = "rag_state_union.csv"

with open(csv_file_path, 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for row, prediction_val in zip(csv_reader, prediction_list):

        # Read each row from the CSV and add label and prediction values
        result_row = []
        result_row = [row[key] for key in feature_fields if key in row]
        result_row.append(row[label_column])
        result_row.append(prediction_val)

        test_data_content.append(result_row)
if len(test_data_content) == 5: # The downloaded CSV contains 5 records
    print("Generated feedback data from CSV")
else:
    print("Failed to generate feedback data from CSV. Please verify the CSV file content.")

In [None]:
# Copy the list so that feature_fields is not modified when this cell is re-run
fields = list(feature_fields)
fields.append(label_column)
fields.append("_original_prediction")
feedback_data = [
    {
        "fields": fields,
        "values": test_data_content
    }
]
feedback_data
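
Each row in `values` is expected to line up with the names in `fields`. As a sketch under that assumption, a hypothetical helper `build_feedback_payload` can check the row lengths before anything is stored; the column names and rows below are made up for illustration:

```python
def build_feedback_payload(fields, values):
    """Return a fields/values payload after checking that every row matches `fields`.

    Hypothetical helper for illustration; raises ValueError on a length mismatch.
    """
    for index, row in enumerate(values):
        if len(row) != len(fields):
            raise ValueError(
                "Row {} has {} values but {} fields are declared".format(
                    index, len(row), len(fields)
                )
            )
    return [{"fields": list(fields), "values": [list(row) for row in values]}]


# Made-up column names and rows
sample_fields = ["context", "question", "reference_answer", "_original_prediction"]
sample_values = [
    ["some context", "a question?", "expected answer", "model answer"],
    ["more context", "another question?", "expected answer 2", "model answer 2"],
]
feedback = build_feedback_payload(sample_fields, sample_values)
print(len(feedback[0]["values"]))
```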

The following code can be used to manually add feedback data to the table.

In [None]:
wos_client.data_sets.store_records(data_set_id=feedback_data_set_id, request_body=feedback_data,background_mode=False)
time.sleep(5)
fb_records_count = wos_client.data_sets.get_records_count(feedback_data_set_id)
# Adding time delay to enable drift
time.sleep(10)
print("Number of records in the feedback logging table: {}".format(fb_records_count))

### Show all the monitor instances in the production subscription
The following cell lists the monitors present in the production subscription, along with their respective statuses and other details. Please wait for all the monitors to be in an active state before proceeding further:

In [None]:
wos_client.monitor_instances.show(target_target_id = prod_subscription_id)

### Read the Model Risk Metrics monitor instance ID of a PTA subscription deployed in a space

Evaluating the test data against the prompt template subscription requires the monitor instance ID of the Model Risk Metrics monitor.

In [None]:
monitor_definition_id = "mrm"
target_target_id = prod_subscription_id
result = wos_client.monitor_instances.list(data_mart_id=data_mart_id,
                                           monitor_definition_id=monitor_definition_id,
                                           target_target_id=target_target_id,
                                           space_id=space_id).result
result_json = result._to_dict()
mrm_monitor_id = result_json["monitor_instances"][0]["metadata"]["id"]
mrm_monitor_id
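
Indexing `[0]` directly raises an `IndexError` when no monitor instance matches. A minimal sketch of a more defensive lookup follows, assuming the dictionary layout used in the cell above; `first_monitor_instance_id` is a hypothetical helper and the sample dictionaries are made up:

```python
def first_monitor_instance_id(result_json):
    """Return the ID of the first monitor instance, or None if there is none."""
    instances = result_json.get("monitor_instances") or []
    if not instances:
        return None
    return instances[0]["metadata"]["id"]


# Made-up responses that mirror the shape read in the cell above
sample_result = {"monitor_instances": [{"metadata": {"id": "example-mrm-id"}}]}
print(first_monitor_instance_id(sample_result))
print(first_monitor_instance_id({"monitor_instances": []}))
```

The same helper would apply to the lookups for the `generative_ai_quality` and `drift_v2` monitors, which read the response in the same way.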

### Evaluate the prompt template subscription from a space

The following cell evaluates the prompt template asset subscription and produces the measurements for the configured monitors. The data to be evaluated has already been uploaded to the payload and feedback tables.

In [None]:
response  = wos_client.monitor_instances.mrm.evaluate_risk(monitor_instance_id=mrm_monitor_id, 
                                                    body = body,
                                                    space_id = space_id,
                                                    evaluation_tests = ["model_health", "drift_v2", "generative_ai_quality"],
                                                    background_mode = False)

### Read the risk evaluation response

After initiating the risk evaluation, the evaluation results of the PTA from your space are now available for review:

In [None]:
response  = wos_client.monitor_instances.mrm.get_risk_evaluation(mrm_monitor_id, space_id = space_id)
response.result.to_dict()

### Display the Model Risk metrics

Now that the measurements for the Foundation Model subscription have been calculated, the Model Risk metrics generated for this subscription are available for review:

In [None]:
wos_client.monitor_instances.show_metrics(monitor_instance_id=mrm_monitor_id, space_id=space_id)

### Display the Generative AI quality metrics

The monitor instance ID for the Generative AI quality metrics is required for reading its metrics:

In [None]:
monitor_definition_id = "generative_ai_quality"
result = wos_client.monitor_instances.list(data_mart_id = data_mart_id,
                                           monitor_definition_id = monitor_definition_id,
                                           target_target_id = target_target_id,
                                           space_id = space_id).result
result_json = result._to_dict()
genaiquality_monitor_id = result_json["monitor_instances"][0]["metadata"]["id"]
genaiquality_monitor_id

Display the monitor metrics of the Generative AI quality metrics generated through the risk evaluation:

In [None]:
wos_client.monitor_instances.show_metrics(monitor_instance_id=genaiquality_monitor_id, space_id=space_id)

## Step 13 - Display the source attribution for a record

Read the dataset ID for the Generative AI quality dataset:

In [None]:
result = wos_client.data_sets.list(target_target_id = prod_subscription_id,
                                target_target_type = "subscription",
                                type = "gen_ai_quality_metrics").result

genaiq_dataset_id = result.data_sets[0].metadata.id
genaiq_dataset_id

Display record level metrics for Generative AI quality:

In [None]:
wos_client.data_sets.show_records(data_set_id = genaiq_dataset_id)

### Display source attributions for a record from payload or feedback data <a name="attributions"></a>

Get a record from the payload table. The same method can also read a record from the feedback table if you provide the feedback dataset ID instead:

In [None]:
result = wos_client.data_sets.get_list_of_records(data_set_id = payload_data_set_id, limit=1).result
record = result["records"][0]["entity"]["values"]
scoring_id = record.get("scoring_id")
scoring_id

Get the source attributions for that scoring ID from the Generative AI quality dataset:

In [None]:
import pandas as pd
metrics_result = wos_client.data_sets.get_list_of_records(data_set_id = genaiq_dataset_id, filter="scoring_id:eq:{}".format(scoring_id)).result
record_metrics = metrics_result["records"][0]["entity"]["values"]
attributions, attribution_scores = [], []
for i in record_metrics.get("faithfulness_attributions")["faithfulness_attributions"]:
    for attr in i["attributions"]:
        attributions.extend(attr.get("feature_values"))
        attribution_scores.extend(attr.get("faithfulness_scores"))

attributions_df = pd.DataFrame({"faithfulness attribution": attributions, "attribution score": attribution_scores})
pd.set_option("display.max_colwidth", 0)
attributions_df.sort_values(by=["attribution score"], inplace=True, ascending=False)
print("Question: {}".format(record.get("question")))
print("Answer: {}".format(record.get("generated_text")))
print("Attributions: ")
attributions_df
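
The nested loops above flatten the attribution structure into two parallel lists. The sketch below shows the same flattening and ranking in plain Python, without pandas. The field layout is inferred from the cell above, the sample values are made up, and `rank_attributions` is a hypothetical helper:

```python
def rank_attributions(record_metrics):
    """Flatten (text, score) pairs and sort them by score, highest first."""
    nested = record_metrics.get("faithfulness_attributions") or {}
    pairs = []
    for item in nested.get("faithfulness_attributions", []):
        for attr in item.get("attributions", []):
            # zip keeps each text aligned with its own score
            pairs.extend(zip(attr.get("feature_values", []),
                             attr.get("faithfulness_scores", [])))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Made-up metrics that mirror the layout read in the cell above
sample_metrics = {
    "faithfulness_attributions": {
        "faithfulness_attributions": [
            {"attributions": [
                {"feature_values": ["passage A", "passage B"],
                 "faithfulness_scores": [0.42, 0.91]}
            ]},
            {"attributions": [
                {"feature_values": ["passage C"], "faithfulness_scores": [0.67]}
            ]},
        ]
    }
}
for text, score in rank_attributions(sample_metrics):
    print(score, text)
```

This version also returns an empty list when a record has no attribution data, whereas the cell above would fail on a missing key.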

## Step 14 - Plot faithfulness and answer relevance metrics against records <a name="plotspace"></a>

Retrieve a list of records and extract the record IDs, faithfulness values, and answer relevance values:

In [None]:
result = wos_client.data_sets.get_list_of_records(data_set_id = genaiq_dataset_id).result
x = []
y_faithfulness = []
y_answer_relevance = []
for each in result["records"]:
    x.append(each["metadata"]["id"][-5:]) # Reading only last 5 characters to fit in the display
    y_faithfulness.append(each["entity"]["values"]["faithfulness"])
    y_answer_relevance.append(each["entity"]["values"]["answer_relevance"])
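
Before plotting, a quick numeric summary can help confirm that the metric values look plausible. The following sketch assumes the record layout used in the cell above and made-up sample values; `summarize_metric` is a hypothetical helper that skips records where the metric is missing:

```python
from statistics import mean


def summarize_metric(records, metric_name):
    """Return (count, minimum, mean, maximum) for a metric, skipping missing values."""
    values = [rec["entity"]["values"].get(metric_name) for rec in records]
    values = [v for v in values if v is not None]
    if not values:
        return (0, None, None, None)
    return (len(values), min(values), mean(values), max(values))


# Made-up records that mirror the layout read in the cell above
sample_metric_records = [
    {"metadata": {"id": "rec-00001"},
     "entity": {"values": {"faithfulness": 0.5, "answer_relevance": 0.75}}},
    {"metadata": {"id": "rec-00002"},
     "entity": {"values": {"faithfulness": 1.0, "answer_relevance": 0.25}}},
    {"metadata": {"id": "rec-00003"},
     "entity": {"values": {"faithfulness": None, "answer_relevance": 0.5}}},
]
print(summarize_metric(sample_metric_records, "faithfulness"))
print(summarize_metric(sample_metric_records, "answer_relevance"))
```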

Plot the faithfulness metrics against the records:

In [None]:
import matplotlib.pyplot as plt
plt.scatter(x, y_faithfulness, marker='o')

# Adding labels and title
plt.xlabel('X-axis - Record id (last 5 characters)')
plt.ylabel('Y-axis - Faithfulness')
plt.title('faithfulness vs record id')

# Display the graph
plt.show()

Plot the answer relevance metrics against the records:

In [None]:
import matplotlib.pyplot as plt
plt.scatter(x, y_answer_relevance, marker='o')

# Adding labels and title
plt.xlabel('X-axis - Record id (last 5 characters)')
plt.ylabel('Y-axis - Answer relevance')
plt.title('answer_relevance vs record id')

# Display the graph
plt.show()

### Display the Drift V2 metrics

### Read the Drift V2 monitor instance ID

The monitor instance ID of Drift V2 metrics is required for reading its metrics.

In [None]:
monitor_definition_id = "drift_v2"
result = wos_client.monitor_instances.list(data_mart_id = data_mart_id,
                                           monitor_definition_id = monitor_definition_id,
                                           target_target_id = target_target_id,
                                           space_id = space_id).result
result_json = result._to_dict()
drift_monitor_id = result_json["monitor_instances"][0]["metadata"]["id"]
drift_monitor_id

Display the monitor metrics of Drift V2 generated through the risk evaluation.

In [None]:
wos_client.monitor_instances.show_metrics(monitor_instance_id=drift_monitor_id, space_id=space_id)

## Step 15 - See factsheets information from a space <a name="factsheetsproject"></a>

In [None]:
factsheets_url = "{}/ml-runtime/deployments/{}/details?space_id={}&context=wx&flush=true".format(WML_CREDENTIALS["url"], deployment_id, space_id)
    
print("User can navigate to the published facts in space {}".format(factsheets_url))
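
The URL above is assembled with plain string formatting. As a hedged alternative, the sketch below builds the same path and query string with `urllib.parse`, so that IDs containing reserved characters are percent-encoded. `build_factsheets_url` is a hypothetical helper, and the host and IDs are placeholders:

```python
from urllib.parse import quote, urlencode


def build_factsheets_url(base_url, deployment_id, space_id):
    """Build the deployment details URL, percent-encoding the IDs."""
    query = urlencode({"space_id": space_id, "context": "wx", "flush": "true"})
    return "{}/ml-runtime/deployments/{}/details?{}".format(
        base_url.rstrip("/"), quote(deployment_id, safe=""), query
    )


# Placeholder values for illustration only
print(build_factsheets_url("https://cpd.example.com/", "dep-123", "space-456"))
```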

## Congratulations!

You have completed this notebook. You can now navigate to the prompt template asset in your project or space and click the `Evaluate` tab to view the results in the UI.

watsonx.governance

Copyright © 2024 IBM.