# Evaluating watsonx prompts with watsonx.governance

This notebook is part of the [watsonx.governance Level 4 Proof of Experience (PoX) hands-on lab](https://cp4d-outcomes.techzone.ibm.com/l4-pox/watsonx-governance). It will query a watsonx tuned prompt, and evaluate the output using the watsonx.governance (OpenScale) LLM SDK. Finally, it will push the evaluation metrics to the model use case in the watsonx governance console (OpenPages).

This notebook should be run in a Cloud Pak for Data 4.8.5 or higher software environment. It requires credentials for the Cloud Pak for Data install, which must be entered in the first code cell. It also requires credentials for the watsonx.ai SaaS environment.

Instructions for locating your credentials are contained in the relevant portions of the hands-on lab. The code in this notebook is based on the [GitHub sample code for Azure OpenAI monitoring](https://github.com/IBM/watson-openscale-samples/blob/main/IBM%20Cloud/WML/notebooks/watsonx/LLM%20Metrics%20Evals-Azure-OpenAI-OpenPages.ipynb) by [Ravi Chamarthy](mailto:ravi.chamarthy@in.ibm.com). If you receive errors caused by OpenPages API incompatibilities, you should be able to resolve them by updating this notebook with code from that sample.

In [None]:
CPD_URL = "https://cpd-cpd.apps._________.cloud.techzone.ibm.com"
CPD_USERNAME = "complianceofficer"
CPD_PASSWORD = "passw0rd"
API_KEY = "_____________"

WATSONX_API_KEY = "________________"
WATSONX_BASE_URL = "https://us-south.ml.cloud.ibm.com/ml/v1/deployments/_____________/text/generation?version=2021-05-01"
WATSONX_MODEL_TITLE = "watsonx Resume Summarization"

Once the keys have been entered in the cell above, you may run through the remainder of the notebook. It has been heavily commented to show what is occurring at each stage.
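
Before running the rest of the notebook, it can help to confirm that no blank was left unfilled. The following is a minimal sketch, not part of the lab: `find_unfilled_settings` is a hypothetical helper, and it assumes that an unfilled blank is marked by three or more consecutive underscores, as in the cell above.

```python
# Hypothetical helper: not part of the lab or of any IBM SDK.
def find_unfilled_settings(settings):
    """Return the names of settings that are empty or still contain a blank."""
    unfilled = []
    for name, value in settings.items():
        # Assumption: three or more consecutive underscores mark an unfilled blank.
        if not value or "___" in value:
            unfilled.append(name)
    return unfilled


# Example values copied from the unfilled template above.
example_settings = {
    "CPD_URL": "https://cpd-cpd.apps._________.cloud.techzone.ibm.com",
    "CPD_USERNAME": "complianceofficer",
    "API_KEY": "_____________",
}
print(find_unfilled_settings(example_settings))  # ['CPD_URL', 'API_KEY']
```

In the notebook you would pass a dict built from the variables defined in the first code cell.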

### Install the necessary libraries

**YOU MAY GET PIP DEPENDENCY RESOLVER ERRORS**. These can be safely ignored.

In [None]:
!pip install --upgrade ibm-watson-machine-learning   | tail -n 1
!pip install --upgrade ibm-watson-openscale --no-cache | tail -n 1
!pip install --upgrade ibm-metrics-plugin --no-cache | tail -n 1
!pip install --upgrade evaluate --no-cache | tail -n 1
!pip install --upgrade textstat --no-cache | tail -n 1
!pip install --upgrade sacrebleu --no-cache | tail -n 1
!pip install --upgrade sacremoses --no-cache | tail -n 1
!pip install --upgrade nltk --no-cache | tail -n 1

In [None]:
import nltk
nltk.download("punkt_tab")

### Read the test data into a dataframe

In [None]:
import pandas as pd
import numpy as np
llm_data_all = pd.read_csv("https://raw.githubusercontent.com/CloudPak-Outcomes/Outcomes-Projects/main/watsonx-governance-l4/data/resume_summarization_test_data.csv")
llm_data_all.head()

### Define the prompt evaluation



In [None]:
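def get_foundation(prompt_text):
    # Flatten real and escaped newlines so the resume is sent as a single line
    input_text = prompt_text.replace('\n', ' ').replace('\\n', ' ')
    # Let requests serialize the payload so that quotes and backslashes
    # in the resume text are escaped correctly
    payload = {"parameters": {"prompt_variables": {"input": input_text}}}

    # headers and params are defined in the cells below, before this function is called
    response = requests.post(
        WATSONX_BASE_URL,
        params=params,
        headers=headers,
        json=payload
    )
    # Fail with the HTTP status instead of a KeyError on an error response
    response.raise_for_status()
    return response.json()['results'][0]['generated_text']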
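
Resume text often contains double quotes or backslashes, which must be escaped before they are embedded in a JSON request body. The following is a minimal sketch of building the body with `json.dumps` from the standard library. `build_payload` is a hypothetical helper name, and the sample text is made up for illustration.

```python
import json


def build_payload(prompt_text):
    """Serialize one resume into a JSON request body (hypothetical helper)."""
    # Same newline flattening as in get_foundation.
    input_text = prompt_text.replace("\n", " ").replace("\\n", " ")
    body = {"parameters": {"prompt_variables": {"input": input_text}}}
    # json.dumps escapes quotes and backslashes, so the body is always valid JSON.
    return json.dumps(body).encode()


# Made-up sample containing double quotes, a newline, and a backslash.
sample = 'Led the "Apollo" team\nShipped C:\\tools'
decoded = json.loads(build_payload(sample))
print(decoded["parameters"]["prompt_variables"]["input"])
```

Parsing the body back with `json.loads` returns the flattened text unchanged, which is a quick way to confirm that nothing was lost in serialization.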

### Authenticate with watsonx

In [None]:
import requests
import json

headers = {
    'Content-Type': 'application/x-www-form-urlencoded',
    'Accept': 'application/json',
}

data = {
    'grant_type': 'urn:ibm:params:oauth:grant-type:apikey',
    'apikey': WATSONX_API_KEY,
}

response = requests.post('https://iam.cloud.ibm.com/identity/token', headers=headers, data=data, verify=False)
token = response.json()['access_token']
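
If the API key is wrong, the response body will not contain an `access_token` entry and the cell above fails with a bare `KeyError`. The following is a minimal sketch of a more readable failure. `extract_access_token` is a hypothetical helper, and it assumes only that a successful response carries the token under `access_token`, as the cell above does.

```python
def extract_access_token(token_response):
    """Return the access token, or raise a readable error if it is missing.

    `token_response` is the parsed JSON body of the token request.
    """
    token = token_response.get("access_token")
    if not token:
        # Include the body so the cause (for example a bad API key) is visible.
        raise ValueError("No access token returned: " + repr(token_response))
    return token
```

In the cell above, this would be called as `token = extract_access_token(response.json())`.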

### Set headers

In [None]:
headers = {
    'Content-Type': 'application/json',
    'Accept': 'application/json',
    'Authorization': 'Bearer ' + token,
}

params = {
    'version': '2021-05-01',
}

### Run the prompt evaluation

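The next cell runs the prompt evaluation by sending each resume in the test data to the deployed watsonx prompt, using the watsonx credentials and headers set above, and stores each generated summary in a new `watsonx_generated_summary` column. The cell has no fallback: if the credentials are blank or a request fails, it raises an error.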

In [None]:
llm_data_all['watsonx_generated_summary'] = llm_data_all['Resume'].apply(get_foundation)
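
The `apply` call above propagates any exception raised by `get_foundation`, so one failed request ends the run. The following is a minimal sketch, assuming you would rather record a placeholder for the failed row and continue. `with_fallback` is a hypothetical wrapper, and `flaky_generate` is a stand-in used only to demonstrate it.

```python
def with_fallback(generate, default=None):
    """Wrap a generation function so a failure yields `default` instead of raising."""
    def wrapped(text):
        try:
            return generate(text)
        except Exception as error:
            # Report the failure and keep going with the placeholder value.
            print("Generation failed: " + str(error))
            return default
    return wrapped


# Stand-in for get_foundation that fails on empty input.
def flaky_generate(text):
    if not text:
        raise ValueError("empty input")
    return text.upper()


safe_generate = with_fallback(flaky_generate, default="")
print(safe_generate("summary"))  # SUMMARY
print(repr(safe_generate("")))   # '' after the failure message is printed
```

In the notebook the wrapper would be used as `llm_data_all['Resume'].apply(with_fallback(get_foundation))`. Rows left at the placeholder should be dropped or regenerated before the metrics are computed.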

### Show the output from the evaluation

In [None]:
llm_data_all.head()

### Sample generated output

In [None]:
llm_data_all['watsonx_generated_summary'][0]

# Evaluate Metrics

The next section of the notebook will evaluate the output.

### IBM watsonx.governance authentication

In [None]:
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

from ibm_watson_openscale import *
from ibm_watson_openscale.supporting_classes.enums import *
from ibm_watson_openscale.supporting_classes import *

authenticator = CloudPakForDataAuthenticator(
    url=CPD_URL,
    username=CPD_USERNAME,
    password=CPD_PASSWORD,
    disable_ssl_verification=True
)
    
client = APIClient(service_url=CPD_URL,authenticator=authenticator)
print(client.version)

### Common Imports

In [None]:
from ibm_metrics_plugin.metrics.llm.utils.constants import LLMTextMetricGroup
from ibm_metrics_plugin.metrics.llm.utils.constants import LLMGenerationMetrics
from ibm_metrics_plugin.metrics.llm.utils.constants import LLMSummarizationMetrics
from ibm_metrics_plugin.metrics.llm.utils.constants import LLMQAMetrics
from ibm_metrics_plugin.metrics.llm.utils.constants import LLMClassificationMetrics
from ibm_metrics_plugin.metrics.llm.utils.constants import HAP_SCORE
from ibm_metrics_plugin.metrics.llm.utils.constants import PII_DETECTION

### Split the input, output, and reference data into different dataframes

In [None]:
df_input = llm_data_all[['Resume']].copy()
df_output = llm_data_all[['watsonx_generated_summary']].copy()
df_reference = llm_data_all[['Summarization']].copy()

### Configure the metrics for evaluation

In [None]:
metric_config = {   
    "configuration": {
        LLMTextMetricGroup.SUMMARIZATION.value: {
            LLMSummarizationMetrics.ROUGE_SCORE.value: {},
            LLMSummarizationMetrics.SARI.value: {},
            LLMSummarizationMetrics.METEOR.value: {},
            LLMSummarizationMetrics.NORMALIZED_RECALL.value: {},
            LLMSummarizationMetrics.NORMALIZED_PRECISION.value: {},
            LLMSummarizationMetrics.NORMALIZED_F1_SCORE.value: {},
            LLMSummarizationMetrics.COSINE_SIMILARITY.value: {},
            LLMSummarizationMetrics.JACCARD_SIMILARITY.value: {},
            LLMSummarizationMetrics.BLEU.value: {},
            LLMSummarizationMetrics.FLESCH.value: {}
        }
    }
}
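
Several of the configured metrics score two texts by how many words they share. As an intuition aid, the following sketch computes Jaccard similarity on lowercase whitespace tokens. It is an illustration only: the tokenization and normalization used by `ibm-metrics-plugin` are not shown in this notebook and may differ, and the function name is hypothetical.

```python
def jaccard_similarity(text_a, text_b):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    # Assumption: lowercase whitespace tokens; the plugin may tokenize differently.
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        # Two empty texts share nothing that can be measured.
        return 0.0
    return len(tokens_a & tokens_b) / len(union)


# Two of the four distinct words are shared, so the score is 2 / 4.
print(jaccard_similarity("a b c", "b c d"))  # 0.5
```

A score of 1.0 means both texts use exactly the same set of words, and 0.0 means they share none.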

### Compute the metrics

In [None]:
import json
result = client.llm_metrics.compute_metrics(metric_config,df_input,df_output, df_reference)

### Evaluated Metrics

In [None]:
print(json.dumps(result,indent=2))

### Construct a key/value dict of the metrics to be published to OpenPages

In [None]:
def get_metrics(result):
    metrics = {}
    metrics['rouge1'] = round(result['rouge_score']['rouge1'], 4)
    metrics['rouge2'] = round(result['rouge_score']['rouge2'], 4)
    metrics['rougeL'] = round(result['rouge_score']['rougeL'], 4)
    metrics['rougeLsum'] = round(result['rouge_score']['rougeLsum'], 4)
    metrics['meteor'] = round(result['meteor']['metric_value'], 4)
    metrics['sari'] = round(result['sari']['metric_value'], 4)
    metrics['cosine_similarity'] = round(result['cosine_similarity']['metric_value'], 4)
    metrics['jaccard_similarity'] = round(result['jaccard_similarity']['metric_value'], 4)
    return metrics

### IF THE FOLLOWING CELL RESULTS IN AN ERROR RELATED TO THE nltk LIBRARY, RESTART THE KERNEL AND RE-RUN THE NOTEBOOK.

In some cases, the next cell reports that the nltk library is missing even though it was installed in an earlier cell. To fix the error, click **Kernel** in the menu above, then click **Restart**. Then re-run the previous cells and continue.

In [None]:
metrics =  get_metrics(result)
metrics

# Publishing computed metrics to watsonx governance console

This section of the notebook publishes the metrics to a model that has been defined in the watsonx governance console.

### Import libraries for the REST API

In [None]:
import requests
import base64
import json
import http.client
import ssl

### Define functions to get authorization token for OpenPages

In [None]:
def get_token():
    from ibm_watson_studio_lib import access_project_or_space
    wslib = access_project_or_space()
    token = wslib.auth.get_current_token()
    return token

### Define a function to get the ID of the model from the title

In [None]:
def get_op_model_id(header, model_name):
    openpages_url = CPD_URL.rstrip("/") + "/openpages-openpagesinstance-cr-grc/api/query?q=SELECT [Model].[Resource ID] FROM [Model] WHERE [Model].[Title] IN ('{0}')".format(model_name)
    print(openpages_url)
    response = requests.get(openpages_url, headers=header, verify=False).json()
    
    model_id = None
    if response is not None:
        if response.get("rows") is not None:
            rows = response.get("rows")
            if len(rows) != 0:
                fields = rows[0].get("fields")
                if fields is not None:
                    field = fields.get("field")
                    if field:
                        model_id = field[0]["value"]

    if model_id is None:
        print("Model ID not found.")
    else:
        print("Model ID fetched: " + model_id)
    return model_id
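
The nested checks above walk a response of the form `rows` → `fields` → `field` → `value`. The following is a minimal sketch of the same lookup in a flatter style. `first_field_value` is a hypothetical helper, and the sample response is made up to mirror the shape that `get_op_model_id` expects.

```python
def first_field_value(response):
    """Return the first field value of the first row, or None if it is absent."""
    rows = (response or {}).get("rows") or []
    if not rows:
        return None
    fields = (rows[0].get("fields") or {}).get("field") or []
    if not fields:
        return None
    return fields[0].get("value")


# Made-up response with the shape the function above expects.
sample_response = {
    "rows": [
        {"fields": {"field": [{"name": "Resource ID", "value": "1234"}]}}
    ]
}
print(first_field_value(sample_response))  # 1234
print(first_field_value({"rows": []}))     # None
```

Replacing each missing level with an empty container keeps the function to a few lines while still returning `None` for any incomplete response.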

### For a given model id, get the corresponding OP metrics definitions - Map containing metric id and its name

In [None]:
def get_op_model_metrics_definitions(header, model_id):
    openpages_url = CPD_URL.rstrip("/") + "/openpages-openpagesinstance-cr-grc/api/query?q=SELECT [Metric].[Resource ID], [Metric].[Name] FROM [Model] JOIN [Metric] ON PARENT([Model]) WHERE [Model].[Resource ID]='{0}'".format(model_id)
    response = requests.get(openpages_url, headers=header, verify=False).json()
    
    metrics_map = []

    if response is not None:
        if response.get("rows") is not None:
            rows = response.get("rows")
            if len(rows) != 0:
                for i in range(len(rows)):
                    fields = rows[i].get("fields")
                    if fields is not None:
                        field = fields.get("field")
                        metric_id_name = {}
                        metric_id = None
                        metric_name = None
                        for row in field:
                            if row.get('name') == 'Resource ID':
                                metric_id = row.get('value')
                            if row.get('name') == 'Name':
                                metric_name = row.get('value')
                        metric_id_name['metric_name'] = metric_name
                        metric_id_name['metric_id'] = metric_id
                        metrics_map.append(metric_id_name)
    print("Completed fetching, if any, all metrics associated with the model.")
    # Return at function level so callers always receive a list, never None
    return metrics_map

### Construct the Metrics Object Payload for metrics creation

In [None]:
def get_metric_object_payload(primaryParentId, metric_name):
    metric_description = "watsonx.governance metric for '" + metric_name + "'"
    metric_object_payload = {
        "name": metric_name,
        "description": metric_description,
        "typeDefinitionId": "Metric",
        "primaryParentId": primaryParentId,
        "fields": {
            "field": [
                {
                    "name": "MRG-Metric:Data Source",
                    "dataType": "STRING_TYPE",
                    "value": "watsonx.governance"
                },
                {
                    "name": "MRG-Metric:Frequency",
                    "dataType": "ENUM_TYPE",
                    "enumValue": {
                        "name": "Multiple times a day"
                    }
                }
            ]
        }
    }
    return metric_object_payload

### Construct the Metrics Value Payload for creating and associating a metric value to a metric of a given model object

In [None]:
def get_metric_value_payload(primaryParentId, metric_name, metric_value):
    metric_description = "watsonx.governance metric for '" + metric_name + "'"
    metric_value_payload = {
        "typeDefinitionId": "MetricValue",
        "primaryParentId": primaryParentId,
        "description": metric_description,
        "fields": {
            "field": [
                {
                    "name": "MRG-Metric-Shared:Breach Status",
                    "dataType": "ENUM_TYPE",
                    "enumValue": {
                        "name": "Green"
                    }
                },
                {
                    "name": "MRG-Metric-Shared:Red Threshold",
                    "dataType": "FLOAT_TYPE",
                    "value": 0.5
                },
                {
                    "name": "MRG-MetricVal:Value",
                    "dataType": "FLOAT_TYPE",
                    "value": metric_value
                },
                {
                    "name": "MRG-Metric-Shared:Collection Status",
                    "dataType": "ENUM_TYPE",
                    "enumValue": {
                        "name": "Collected"
                    }
                }
            ]
        }
    }
    return metric_value_payload
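
The payload above always reports a breach status of `Green` against a red threshold of 0.5. The following is a minimal sketch of deriving the status from the value instead. It rests on two assumptions that the notebook does not confirm: that higher metric values are better, and that `Red` is a valid value of the Breach Status field in your OpenPages configuration. `breach_status` is a hypothetical helper.

```python
def breach_status(metric_value, red_threshold=0.5):
    """Return the status name to place in the Breach Status field."""
    # Assumption: higher is better, so values below the threshold are breaches.
    # Assumption: "Red" exists as an enum value; the notebook only shows "Green".
    if metric_value < red_threshold:
        return "Red"
    return "Green"


print(breach_status(0.3))  # Red
print(breach_status(0.7))  # Green
```

A value equal to the threshold is treated as `Green` here. Adjust the comparison if your governance policy defines the boundary differently.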

### Create Metrics Object

In [None]:
def create_metrics_object(metric_object_payload):
    openpages_metric_object_creation_url = CPD_URL.rstrip("/") + "/openpages-openpagesinstance-cr-grc/api/contents"
    response = requests.post(openpages_metric_object_creation_url, json=metric_object_payload, headers=header, verify=False).json()
    metric_id = response['id']
    return metric_id

### Add Metric Value to the Metric Object

In [None]:
def add_metric_value_to_metric_object(metric_value_payload):
    openpages_metric_value_creation_url = CPD_URL.rstrip("/") + "/openpages-openpagesinstance-cr-grc/api/contents"
    response = requests.post(openpages_metric_value_creation_url, json=metric_value_payload, headers=header, verify=False).json()
    metric_value_id = response['id']
    return metric_value_id

### Check for the metric existence in the metrics map

In [None]:
def get_existing_metric_id(metrics_map, metric_name):
    for item in metrics_map:
        if 'metric_name' in item and item['metric_name'] == metric_name:
            return item['metric_id']
    return None
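
The function above scans the whole list on every call. When many metrics are published, a one-time index by name is an alternative. The following is a minimal sketch: `index_metrics_by_name` is a hypothetical helper, and the sample entries are made up in the `metric_name` / `metric_id` shape produced by `get_op_model_metrics_definitions`.

```python
def index_metrics_by_name(metrics_map):
    """Build a name -> id dictionary, keeping the first id seen for each name."""
    index = {}
    for item in metrics_map:
        name = item.get("metric_name")
        if name is not None:
            # setdefault keeps the first match, like the linear scan above.
            index.setdefault(name, item.get("metric_id"))
    return index


# Made-up entries for illustration.
sample_map = [
    {"metric_name": "rouge1", "metric_id": "101"},
    {"metric_name": "meteor", "metric_id": "102"},
]
lookup = index_metrics_by_name(sample_map)
print(lookup.get("meteor"))  # 102
print(lookup.get("sari"))    # None
```

`lookup.get(name)` returns `None` for an unknown metric, so it can stand in for `get_existing_metric_id(metrics_map, name)`.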

### Create an OpenPages connection

In [None]:
token = get_token()
header = {
    "Content-Type": "application/json",
    "Accept": "application/json",
    "Authorization": "Bearer {0}".format(token)
}

### Fetch the Model Id for a given OP Model Name

In [None]:
model_id = get_op_model_id(header, WATSONX_MODEL_TITLE)
model_id

In [None]:
metrics

### Publish the metrics to the watsonx governance console

In [None]:
# Fetch the existing OP Model Metrics, if any, for the given OP Model ID
metrics_map = get_op_model_metrics_definitions(header, model_id)

print('\n')

# Iterate over the given metrics to be published..
for metric_name, metric_value in metrics.items():
    
    # check if a metric with the given name exists and, if so, get its metric_id
    metric_id = get_existing_metric_id(metrics_map, metric_name)

    # if the metric does not exist, then create it
    if metric_id is None:
        print(metric_name + ': Metric Object does not exist, creating it..')

        # construct the metric object to be published
        metric_object_payload = get_metric_object_payload(model_id, metric_name)

        # now, create the metric object
        metric_id = create_metrics_object(metric_object_payload)

    # Add the metric value to metric object

    # construct the metric value object to be published
    metric_value_payload = get_metric_value_payload(metric_id, metric_name, metric_value)

    # create the metric value - basically add the metric value to the metric object
    metric_value_id = add_metric_value_to_metric_object(metric_value_payload)
    
    print(str(metric_name) + ': Metric Object ID: ' + str(metric_id) + ', Metric Value Object ID: '+ str(metric_value_id) + '\n')
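
The loop above follows a create-or-reuse pattern: look up each metric by name, create the metric object only when it is missing, then attach the new value. The following is a minimal sketch of that flow with the two OpenPages calls passed in as functions, so it can be tried with in-memory stand-ins before publishing anything. `publish_metrics`, the stand-in functions, and all sample values are hypothetical.

```python
def publish_metrics(metrics, metrics_map, model_id, create_metric, add_value):
    """Create missing metric objects, then attach one value to each metric."""
    known = {item["metric_name"]: item["metric_id"] for item in metrics_map}
    published = {}
    for name, value in metrics.items():
        metric_id = known.get(name)
        if metric_id is None:
            # The metric does not exist under this model yet, so create it.
            metric_id = create_metric(model_id, name)
        published[name] = (metric_id, add_value(metric_id, name, value))
    return published


# In-memory stand-ins so the flow can be checked without calling OpenPages.
created = []


def fake_create_metric(model_id, name):
    created.append(name)
    return "new-" + name


def fake_add_value(metric_id, name, value):
    return metric_id + "-value"


# Made-up metric values and ids for the dry run.
dry_run = publish_metrics(
    {"rouge1": 0.41, "meteor": 0.33},
    [{"metric_name": "rouge1", "metric_id": "101"}],
    "model-1",
    fake_create_metric,
    fake_add_value,
)
print(dry_run)
print(created)  # ['meteor']
```

Only `meteor` is created in the dry run because `rouge1` already exists in the sample map. In the notebook, the stand-ins would be replaced by small wrappers around `create_metrics_object` and `add_metric_value_to_metric_object`.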