# Evaluate IBM watsonx.ai model prompt quality using the IBM watsonx.governance monitoring toolkit, and publish the evaluated metrics to the IBM watsonx.governance OpenPages GRC platform

In [None]:
!pip install --upgrade ibm-watson-machine-learning   | tail -n 1
!pip install --upgrade ibm-watson-openscale --no-cache | tail -n 1


If you are running this notebook in a Jupyter environment on Linux, first install the PostgreSQL development package with `yum install postgresql-devel`.

In [None]:
!pip install --upgrade ibm-metrics-plugin --no-cache | tail -n 1

In [None]:
!pip install --upgrade evaluate --no-cache | tail -n 1
!pip install --upgrade rouge_score --no-cache | tail -n 1
!pip install --upgrade textstat --no-cache | tail -n 1
!pip install --upgrade sacrebleu --no-cache | tail -n 1
!pip install --upgrade sacremoses --no-cache | tail -n 1
!pip install --upgrade datasets==2.10.0 --no-cache | tail -n 1

In [None]:
!pip install sqlalchemy==1.4.47
!pip install datasets==2.10.0

In [1]:
import warnings
warnings.filterwarnings('ignore')

## Provision services and configure credentials

If you have not already, provision an instance of IBM Watson OpenScale using the [OpenScale link in the Cloud catalog](https://cloud.ibm.com/catalog/services/watson-openscale).

Your Cloud API key can be generated by going to the [**Users** section of the Cloud console](https://cloud.ibm.com/iam#/users). From that page, click your name, scroll down to the **API Keys** section, and click **Create an IBM Cloud API key**. Give your key a name and click **Create**, then copy the created key and paste it below.

**NOTE:** You can also get the OpenScale `API_KEY` using the IBM Cloud CLI.

How to install the IBM Cloud CLI (formerly the Bluemix CLI): [instructions](https://console.bluemix.net/docs/cli/reference/ibmcloud/download_cli.html#install_use)

How to get an API key using the CLI:
```
ibmcloud login --sso
ibmcloud iam api-key-create 'my_key'
```

In [2]:
GEN_API_KEY = "<Your API Key>"
CLOUD_API_KEY = GEN_API_KEY
api_endpoint = "<IBM watsonx.ai model inferencing end point>"
project_id = "<Your IBM watsonx.ai project id>"
endpoint_url = "https://us-south.ml.cloud.ibm.com"
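
Hard-coding the API key in the notebook makes it easy to leak when the notebook is shared. As a minimal sketch, the key could instead be read from an environment variable, falling back to a hidden interactive prompt. The variable name `IBM_CLOUD_API_KEY` and the helper `load_api_key` are hypothetical and not part of any IBM SDK.

```python
import os
from getpass import getpass


def load_api_key(env_var="IBM_CLOUD_API_KEY"):
    """Return the API key from the environment, prompting only if it is unset."""
    key = os.environ.get(env_var)
    if not key:
        # Hidden prompt so the key is not echoed into the notebook output
        key = getpass(f"Enter value for {env_var}: ")
    return key.strip()
```

With this in place, the assignment above could become `GEN_API_KEY = load_api_key()`.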

## OpenPages Connection Details

In [3]:
OP_URL = "<OpenPages URL>"
OP_USERNAME = "<OpenPages username>"
OP_PASSWORD = "<OpenPages user password>"
model_name = '<Name of the OpenPages FM model to which the metrics are to be published>'

## Test data containing the summarization output from the IBM watsonx.ai model

In [4]:
!rm -fr llm_content.csv
!wget "https://raw.githubusercontent.com/ravichamarthy/custom_metrics/main/llm_content.csv"

--2024-01-23 05:48:09--  https://raw.githubusercontent.com/ravichamarthy/custom_metrics/main/llm_content.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 31230 (30K) [text/plain]
Saving to: ‘llm_content.csv’


2024-01-23 05:48:10 (22.0 MB/s) - ‘llm_content.csv’ saved [31230/31230]



In [5]:
import pandas as pd
import numpy as np
llm_data_all = pd.read_csv("llm_content.csv")
llm_data_all.head()

Unnamed: 0,input_text,generated_summary,reference_summary_1,reference_summary_2
0,Scientists have discovered a new species of de...,New bioluminescent fish species found in deep ...,Discovery of deep-sea fish emitting soothing l...,Scientists find new bioluminescent fish specie...
1,An international team of astronomers has ident...,Distant exoplanet\'s water vapor-filled atmosp...,Astronomers identify exoplanet with water vapo...,Discovery of exoplanet with water vapor in its...
2,Researchers have developed a novel nanotechnol...,New nanotechnology-based cancer treatment demo...,Researchers create cancer treatment using nano...,Innovative cancer treatment utilizing nanotech...
3,A new app is aiming to reduce food waste by co...,App connects local restaurants with customers ...,New sustainability-focused app facilitates sal...,Initiative to reduce food waste involves app c...
4,Archaeologists have uncovered an ancient city ...,"Ancient city dating back over 4,000 years disc...",Archaeological find in Iraq reveals ancient ci...,"Discovery of 4,000-year-old ancient city in Ku..."


In [6]:
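# Take an explicit copy so that a new column can be added later without a SettingWithCopyWarning
llm_data = llm_data_all.head(10).copy()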
llm_data.head()

Unnamed: 0,input_text,generated_summary,reference_summary_1,reference_summary_2
0,Scientists have discovered a new species of de...,New bioluminescent fish species found in deep ...,Discovery of deep-sea fish emitting soothing l...,Scientists find new bioluminescent fish specie...
1,An international team of astronomers has ident...,Distant exoplanet\'s water vapor-filled atmosp...,Astronomers identify exoplanet with water vapo...,Discovery of exoplanet with water vapor in its...
2,Researchers have developed a novel nanotechnol...,New nanotechnology-based cancer treatment demo...,Researchers create cancer treatment using nano...,Innovative cancer treatment utilizing nanotech...
3,A new app is aiming to reduce food waste by co...,App connects local restaurants with customers ...,New sustainability-focused app facilitates sal...,Initiative to reduce food waste involves app c...
4,Archaeologists have uncovered an ancient city ...,"Ancient city dating back over 4,000 years disc...",Archaeological find in Iraq reveals ancient ci...,"Discovery of 4,000-year-old ancient city in Ku..."


# Build and evaluate the prompt against IBM watsonx.ai model

In [7]:
from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams
from ibm_watson_machine_learning.foundation_models import Model
from ibm_watson_machine_learning.foundation_models.utils.enums import ModelTypes, DecodingMethods

In [8]:
generate_params = {
    GenParams.MAX_NEW_TOKENS: 75,
    GenParams.MIN_NEW_TOKENS: 10,
    GenParams.TEMPERATURE: 0.0
}

model = Model(
    model_id=ModelTypes.FLAN_UL2,
    params=generate_params,
    credentials={
        "apikey": GEN_API_KEY,
        "url": endpoint_url
    },
    project_id=project_id
)

# For each input text, construct and evaluate the prompt

In [9]:
def get_prompt(text):
    prompt = f"""Please provide a summary of the following text with maximum of 20 words.
    
{text}
    
Summary:"""
    return prompt

In [10]:
def get_completion(input_text):
    prompt_text = get_prompt(input_text)
    model_response = model.generate_text(prompt=prompt_text)
    return model_response

In [11]:
llm_data['watsonx_ai_generated_summary'] = llm_data['input_text'].apply(get_completion)

In [12]:
llm_data.head()

Unnamed: 0,input_text,generated_summary,reference_summary_1,reference_summary_2,watsonx_ai_generated_summary
0,Scientists have discovered a new species of de...,New bioluminescent fish species found in deep ...,Discovery of deep-sea fish emitting soothing l...,Scientists find new bioluminescent fish specie...,Scientists have discovered a new species of de...
1,An international team of astronomers has ident...,Distant exoplanet\'s water vapor-filled atmosp...,Astronomers identify exoplanet with water vapo...,Discovery of exoplanet with water vapor in its...,Astronomers have identified a distant exoplane...
2,Researchers have developed a novel nanotechnol...,New nanotechnology-based cancer treatment demo...,Researchers create cancer treatment using nano...,Innovative cancer treatment utilizing nanotech...,Researchers have developed a novel nanotechnol...
3,A new app is aiming to reduce food waste by co...,App connects local restaurants with customers ...,New sustainability-focused app facilitates sal...,Initiative to reduce food waste involves app c...,A new app is aiming to reduce food waste by co...
4,Archaeologists have uncovered an ancient city ...,"Ancient city dating back over 4,000 years disc...",Archaeological find in Iraq reveals ancient ci...,"Discovery of 4,000-year-old ancient city in Ku...",Archaeologists have uncovered an ancient city ...


In [13]:
llm_data['watsonx_ai_generated_summary'][0]

'Scientists have discovered a new species of deep-sea fish that emits a soft, soothing light.'
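
The prompt asks for at most 20 words, but the model is not guaranteed to comply. A minimal sketch of a check follows, assuming that whitespace-separated tokens are an acceptable definition of a word; the helper name `within_word_limit` is illustrative.

```python
def within_word_limit(summary, max_words=20):
    """Return True if the summary has at most max_words whitespace-separated words."""
    return len(summary.split()) <= max_words


# The first generated summary shown above
sample = ("Scientists have discovered a new species of deep-sea fish "
          "that emits a soft, soothing light.")
print(len(sample.split()), within_word_limit(sample))
```

The same helper could be applied to the whole column with `llm_data['watsonx_ai_generated_summary'].apply(within_word_limit)`.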

## IBM watsonx.governance authentication

In [14]:
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator,BearerTokenAuthenticator

from ibm_watson_openscale import *
from ibm_watson_openscale.supporting_classes.enums import *
from ibm_watson_openscale.supporting_classes import *


authenticator = IAMAuthenticator(apikey=CLOUD_API_KEY)
client = APIClient(authenticator=authenticator)
client.version

'3.0.34'

# Common Imports

In [15]:
from ibm_metrics_plugin.metrics.llm.utils.constants import LLMTextMetricGroup
from ibm_metrics_plugin.metrics.llm.utils.constants import LLMGenerationMetrics
from ibm_metrics_plugin.metrics.llm.utils.constants import LLMSummarizationMetrics
from ibm_metrics_plugin.metrics.llm.utils.constants import LLMQAMetrics
from ibm_metrics_plugin.metrics.llm.utils.constants import LLMClassificationMetrics
from ibm_metrics_plugin.metrics.llm.utils.constants import HAP_SCORE
from ibm_metrics_plugin.metrics.llm.utils.constants import PII_DETECTION

# Evaluating Summarization output from IBM watsonx.ai model

In [16]:
df_input = llm_data[['input_text']].copy()
df_output = llm_data[['watsonx_ai_generated_summary']].copy()
df_reference = llm_data[['reference_summary_2']].copy()

## Metrics configuration for evaluation

In [17]:
metric_config = {   
    "configuration": {
        LLMTextMetricGroup.SUMMARIZATION.value: {
            LLMSummarizationMetrics.ROUGE_SCORE.value: {},
            LLMSummarizationMetrics.SARI.value: {},
            LLMSummarizationMetrics.METEOR.value: {},
            LLMSummarizationMetrics.NORMALIZED_RECALL.value: {},
            LLMSummarizationMetrics.NORMALIZED_PRECISION.value: {},
            LLMSummarizationMetrics.NORMALIZED_F1_SCORE.value: {},
            LLMSummarizationMetrics.COSINE_SIMILARITY.value: {},
            LLMSummarizationMetrics.JACCARD_SIMILARITY.value: {},
            LLMSummarizationMetrics.BLEU.value: {},
            LLMSummarizationMetrics.FLESCH.value: {}
        }
    }
}
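
To give some intuition for one of the configured metrics: Jaccard similarity is the size of the intersection of two token sets divided by the size of their union. The sketch below assumes lowercased, whitespace-separated tokens, which may differ from the tokenization the toolkit uses internally, so its numbers are not expected to match the toolkit's output exactly.

```python
def jaccard_similarity(text_a, text_b):
    """Jaccard similarity between the token sets of two texts."""
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # convention chosen for this sketch: two empty texts are identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Made-up sentences: 3 shared tokens out of 6 distinct tokens
print(jaccard_similarity("new fish species found",
                         "scientists find new fish species"))
```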

## Summarization Metrics Evaluation

In [18]:
import json
result = client.llm_metrics.compute_metrics(metric_config, df_input, df_output, df_reference)

Please install adversarial-robustness-toolbox package
please install adversarial-robustness-toolbox package
please install adversarial-robustness-toolbox package
please install adversarial-robustness-toolbox package
please install adversarial-robustness-toolbox package


2024-01-23 05:48:34.673192: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


Please install watson_nlp package to compute HAP score
Please install watson_nlp package to compute HAP score
Please install watson_nlp package to compute HAP score
Please install watson_nlp package to detect PII information
Please install watson_nlp package to detect PII information
Please install watson_nlp package to detect PII information


[nltk_data] Downloading package wordnet to /home/hadoop/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package punkt to /home/hadoop/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /home/hadoop/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


## Evaluated Metrics

In [19]:
print(json.dumps(result,indent=2))

{
  "flesch": {
    "flesch_reading_ease": {
      "metric_value": 44.477,
      "mean": 44.477,
      "min": 18.35,
      "max": 81.63,
      "std": 19.491639771963772
    },
    "flesch_kincaid_grade": {
      "metric_value": 11.180000000000001,
      "mean": 11.180000000000001,
      "min": 5.6,
      "max": 15.4,
      "std": 3.079870127131987
    }
  },
  "bleu": {
    "precisions": [
      0.423728813559322,
      0.20359281437125748,
      0.12738853503184713,
      0.09523809523809523
    ],
    "brevity_penalty": 1.0,
    "length_ratio": 1.0114285714285713,
    "translation_length": 177,
    "reference_length": 175,
    "metric_value": 0.17986550102070845
  },
  "cosine_similarity": {
    "metric_value": 0.30749787927468736,
    "mean": 0.30749787927468736,
    "min": 0.12078567833746143,
    "max": 0.626031645509536,
    "std": 0.1429948129822306
  },
  "jaccard_similarity": {
    "metric_value": 0.24681884704223536,
    "mean": 0.24681884704223536,
    "min": 0.0740740740740

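The `bleu` block in the output above can be checked by hand: BLEU is the brevity penalty multiplied by the geometric mean of the n-gram precisions. The sketch below recomputes the score from the values printed in the output, assuming the standard uniform weighting over 1-grams to 4-grams; the result should land close to the reported `metric_value`.

```python
import math

# Values copied from the "bleu" block of the output above
precisions = [0.423728813559322, 0.20359281437125748,
              0.12738853503184713, 0.09523809523809523]
translation_length = 177
reference_length = 175

# Brevity penalty: 1 when the candidate is longer than the reference
if translation_length > reference_length:
    brevity_penalty = 1.0
else:
    brevity_penalty = math.exp(1 - reference_length / translation_length)

# Geometric mean of the four n-gram precisions
geometric_mean = math.exp(sum(math.log(p) for p in precisions) / len(precisions))
bleu = brevity_penalty * geometric_mean
print(round(bleu, 4))
```
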
## Construct a key/value dict of the metrics to be published to OpenPages

In [20]:
def get_metrics(result):
    metrics = {}
    metrics['rouge1'] = round(result['rouge_score']['rouge1']['metric_value'], 4)
    metrics['rouge2'] = round(result['rouge_score']['rouge2']['metric_value'], 4)
    metrics['rougeL'] = round(result['rouge_score']['rougeL']['metric_value'], 4)
    metrics['rougeLsum'] = round(result['rouge_score']['rougeLsum']['metric_value'], 4)
    metrics['meteor'] = round(result['meteor']['metric_value'], 4)
    metrics['sari'] = round(result['sari']['metric_value'], 4)
    metrics['cosine_similarity'] = round(result['cosine_similarity']['metric_value'], 4)
    metrics['jaccard_similarity'] = round(result['jaccard_similarity']['metric_value'], 4)
    return metrics

In [21]:
metrics = get_metrics(result)
metrics

{'rouge1': 0.4254,
 'rouge2': 0.2205,
 'rougeL': 0.3574,
 'rougeLsum': 0.3574,
 'meteor': 0.3941,
 'sari': 38.2816,
 'cosine_similarity': 0.3075,
 'jaccard_similarity': 0.2468}

# Publishing computed metrics to OpenPages Foundation Model

In [22]:
import requests
import base64
import json

In [23]:
## Get Auth Token for OpenPages
def get_basic_auth_token(username, password):
    token = base64.b64encode(bytes('{0}:{1}'.format(username, password), 'utf-8')).decode("ascii")
    return token
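
For reference, HTTP Basic authentication simply base64-encodes `username:password`, so the token is an encoding rather than encryption and should only travel over HTTPS. A minimal sketch with made-up credentials shows the round trip:

```python
import base64


def basic_auth_token(username, password):
    """Base64-encode 'username:password' as used in a Basic Authorization header."""
    return base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")


token = basic_auth_token("user", "pass")  # made-up credentials
decoded = base64.b64decode(token).decode("utf-8")  # trivially reversible
print(token, decoded)
```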

In [24]:
## For a given model name, get OP model id
def get_op_model_id(header, model_name):
    openpages_url = OP_URL.rstrip("/") + "/grc/api/query"
    # Prepare post payload
    get_id_payload = {
        "statement": "SELECT [Model].[Resource ID] FROM [Model] WHERE [Model].[Name] IN ('{0}')".format(model_name),
        "skipCount": 0
    }
    response = requests.post(openpages_url, json=get_id_payload, headers=header, verify=False).json()

    model_id = None
    if response is not None:
        if response.get("rows") is not None:
            rows = response.get("rows")
            if len(rows) != 0:
                fields = rows[0].get("fields")
                if fields is not None:
                    field = fields.get("field")
                    if len(field) != 0:
                        model_id = field[0]["value"]

    if model_id is None:
        print("Model ID not found.")
    else:
        print("Model ID fetched: " + model_id)
    return model_id
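
The nested `if` checks above can be flattened with early returns. The sketch below assumes the response shape implied by the function above (`rows`, then `fields`, then `field`, then `value`); the helper name `first_field_value` and the sample response are illustrative.

```python
def first_field_value(response):
    """Return the first field value of the first row, or None if anything is missing."""
    rows = (response or {}).get("rows") or []
    if not rows:
        return None
    field = (rows[0].get("fields") or {}).get("field") or []
    if not field:
        return None
    return field[0].get("value")


# Hand-written sample mirroring the shape the function above expects
sample_response = {
    "rows": [{"fields": {"field": [{"name": "Resource ID", "value": "12442"}]}}]
}
print(first_field_value(sample_response))
```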

In [25]:
## For a given model id, get the corresponding OP metrics definitions - Map containing metric id and its name
def get_op_model_metrics_definitions(header, model_id):
    openpages_url = OP_URL.rstrip("/") + "/grc/api/query"    
    get_metrics_payload = {
        "statement": "SELECT [Metric].[Resource ID], [Metric].[Name], [Metric].[Description] FROM [Model] JOIN [Metric] ON PARENT([Model]) WHERE [Model].[Resource ID]='{0}'".format(model_id),
        "skipCount": 0
    }
    print("Sending request to fetch all metrics associated with the model.")
    response = requests.post(openpages_url, json=get_metrics_payload, headers=header, verify=False).json()

    metrics_map = []

    if response is not None:
        if response.get("rows") is not None:
            rows = response.get("rows")
            if len(rows) != 0:
                for i in range(len(rows)):
                    fields = rows[i].get("fields")
                    if fields is not None:
                        field = fields.get("field")
                        metric_id_desc = {}
                        metric_id = None
                        metric_desc = None
                        for row in field:
                            if row.get('name') == 'Resource ID':
                                metric_id = row.get('value')
                            if row.get('name') == 'Description':
                                metric_desc = row.get('value')
                        metric_id_desc['metric_desc'] = metric_desc
                        metric_id_desc['metric_id'] = metric_id
                        metrics_map.append(metric_id_desc)
    print("Completed fetching, if any, all metrics associated with the model.")
    # Always return a list so that callers can iterate even when nothing was found
    return metrics_map

In [26]:
## Construct the Metrics Object Payload for metrics creation
def get_metric_object_payload(primaryParentId, metric_name):
    metric_description = "watsonx.governance metric for '" + metric_name + "'"
    metric_object_payload = {
        "name": metric_name,
        "description": metric_description,
        "typeDefinitionId": "Metric",
        "primaryParentId": primaryParentId,
        "fields": {
            "field": [
                {
                    "name": "MRG-Metric:Data Source",
                    "dataType": "STRING_TYPE",
                    "value": "watsonx.governance"
                },
                {
                    "name": "MRG-Metric:Frequency",
                    "dataType": "ENUM_TYPE",
                    "enumValue": {
                        "name": "Multiple times a day"
                    }
                }
            ]
        }
    }
    return metric_object_payload

In [27]:
## Construct the Metrics Value Payload for creating and associating a metric value to a metric of a given model object
def get_metric_value_payload(primaryParentId, metric_name, metric_value):
    metric_description = "watsonx.governance metric for '" + metric_name + "'"
    metric_value_payload = {
        "typeDefinitionId": "MetricValue",
        "primaryParentId": primaryParentId,
        "description": metric_description,
        "fields": {
            "field": [
                {
                    "name": "MRG-Metric-Shared:Breach Status",
                    "dataType": "ENUM_TYPE",
                    "enumValue": {
                        "name": "Green"
                    }
                },
                {
                    "name": "MRG-Metric-Shared:Red Threshold",
                    "dataType": "FLOAT_TYPE",
                    "value": 0.5
                },
                {
                    "name": "MRG-MetricVal:Value",
                    "dataType": "FLOAT_TYPE",
                    "value": metric_value
                }
            ]
        }
    }
    return metric_value_payload
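
The payload above always reports a `Green` breach status with a fixed red threshold of 0.5. One possible refinement derives the status from the value itself. This is a sketch under two assumptions that the notebook does not confirm: that higher values are better for the metric in question, and that the OpenPages enum also has a value named `Red`.

```python
def breach_status(metric_value, red_threshold=0.5):
    """Return 'Red' when the value falls below the threshold, else 'Green'.

    Assumes higher is better; 'Red' as an enum name is an assumption.
    """
    return "Red" if metric_value < red_threshold else "Green"


print(breach_status(0.3075), breach_status(38.2816))
```

Note that the metrics are not on a common scale (`sari` is reported on a 0 to 100 range while the others lie between 0 and 1), so a single threshold is unlikely to be meaningful for all of them; a per-metric threshold would be needed in practice.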

In [28]:
## Create Metrics Object
def create_metrics_object(metric_object_payload):
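    # Strip any trailing slash, as the query helpers do; `header` is the module-level dict defined below
    openpages_metric_object_creation_url = OP_URL.rstrip("/") + "/grc/api/contents"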
    response = requests.post(openpages_metric_object_creation_url, json=metric_object_payload, headers=header, verify=False).json()
    metric_id = response['id']
    return metric_id

In [29]:
## Add Metric Value to the Metric Object
def add_metric_value_to_metric_object(metric_value_payload):
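    # Strip any trailing slash, as the query helpers do; `header` is the module-level dict defined below
    openpages_metric_value_creation_url = OP_URL.rstrip("/") + "/grc/api/contents"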
    response = requests.post(openpages_metric_value_creation_url, json=metric_value_payload, headers=header, verify=False).json()
    metric_value_id = response['id']
    return metric_value_id
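
Both helpers above read `response['id']` directly, which raises a bare `KeyError` if OpenPages rejects the request. A minimal sketch of a more defensive extraction follows; `extract_created_id` is a hypothetical helper, and nothing is assumed about the error body beyond it being the parsed JSON.

```python
def extract_created_id(status_code, body):
    """Return body['id'] for a successful response, otherwise raise with context."""
    if not 200 <= status_code < 300:
        raise RuntimeError(f"OpenPages request failed with HTTP {status_code}: {body}")
    if not isinstance(body, dict) or "id" not in body:
        raise RuntimeError(f"OpenPages response has no 'id': {body}")
    return body["id"]
```

With `requests`, this could be called as `extract_created_id(resp.status_code, resp.json())`, where `resp` is the object returned by `requests.post(...)`.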

In [30]:
## Check for the metric existence in the metrics map
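def get_existing_metric_id(metrics_map, metric_name):
    # Match the quoted name so that 'rougeL' does not also match 'rougeLsum'
    quoted_name = "'" + metric_name + "'"
    for item in metrics_map:
        # metric_desc may be None when the metric has no description
        if quoted_name in (item.get('metric_desc') or ''):
            return item['metric_id']
    return None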

In [31]:
token = get_basic_auth_token(OP_USERNAME, OP_PASSWORD)
header = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Authorization": "Basic {0}".format(token)
    }

In [32]:
### Fetch the Model Id for a given OP Model Name
model_id = get_op_model_id(header, model_name)
model_id

Model ID fetched: 12442


'12442'

In [None]:
## Publish the metrics to OpenPages

In [33]:
### Fetch the existing, if any, OP Model Metrics for a given OP Model ID
metrics_map = get_op_model_metrics_definitions(header, model_id)
print(metrics_map)

print('\n')

# Iterate over the given metrics to be published..
for metric_name, metric_value in metrics.items():
    
    # check if a metric with the given name exists and, if so, get its metric_id
    metric_id = get_existing_metric_id(metrics_map, metric_name)

    # if the metric does not exist, then create it
    if metric_id is None:
        print(metric_name + ': Metric Object does not exist, hence creating it..')

        # construct the metric object to be published
        metric_object_payload = get_metric_object_payload(model_id, metric_name)

        # now, create the metric object
        metric_id = create_metrics_object(metric_object_payload)

    # Add the metric value to metric object

    # construct the metric value object to be published
    metric_value_payload = get_metric_value_payload(metric_id, metric_name, metric_value)

    # create the metric value - basically add the metric value to the metric object
    metric_value_id = add_metric_value_to_metric_object(metric_value_payload)
    
    print(str(metric_name) + ': Metric Object ID: ' + str(metric_id) + ', Metric Value Object ID: '+ str(metric_value_id) + '\n')

Sending request to fetch all metrics associated with the model.
Completed fetching, if any, all metrics associated with the model.
[{'metric_desc': "watsonx.governance metric for 'rouge1'", 'metric_id': '12443'}, {'metric_desc': "watsonx.governance metric for 'rouge2'", 'metric_id': '12445'}, {'metric_desc': "watsonx.governance metric for 'rougeL'", 'metric_id': '12447'}, {'metric_desc': "watsonx.governance metric for 'rougeLsum'", 'metric_id': '12449'}, {'metric_desc': "watsonx.governance metric for 'meteor'", 'metric_id': '12451'}, {'metric_desc': "watsonx.governance metric for 'sari'", 'metric_id': '12453'}, {'metric_desc': "watsonx.governance metric for 'cosine_similarity'", 'metric_id': '12455'}, {'metric_desc': "watsonx.governance metric for 'keyword_inclusions_f1_score'", 'metric_id': '12457'}, {'metric_desc': "watsonx.governance metric for 'jaccard_similarity'", 'metric_id': '12508'}]


rouge1: Metric Object ID: 12443, Metric Value Object ID: 12510

rouge2: Metric Object ID: 12

Author: ravi.chamarthy@in.ibm.com