# Foundation Model Risk Evaluation using IBM watsonx.governance


This notebook evaluates the risks associated with a given foundation model using IBM watsonx.governance. The foundation model can be hosted in IBM watsonx.ai or by an external model provider. The risk evaluation results can be stored in the Governance Console or downloaded as a PDF file.


This notebook should be run in a Python 3.10 or later runtime environment.

## Learning goals

- Provide the details of a foundation model hosted in IBM watsonx.ai or in an external system
- Run Model Risk Evaluation for the given foundation model
- Generate a PDF report of the evaluation results and publish the results to Governance Console

## Contents

- [Step 1 - Install libraries](#install)
- [Step 2 - Configuration](#configuration)
- [Step 3 - Foundation Model Risk Evaluation](#evaluate)
- [Step 4 - Display the results](#display)

## Install libraries<a name="install"></a>

### Install the required packages

In [None]:
!pip install "ibm_watsonx_gov[mre]"
!python -m nltk.downloader stopwords

# Suppress warnings to keep the notebook output readable
import warnings
warnings.filterwarnings('ignore')

Note: you may need to restart the kernel to use updated packages.

## Configuration <a name="configuration"></a>

### Configure your watsonx.governance credentials



Provide the watsonx.governance and Governance Console credentials as needed.

Use the following cell to configure the credentials when using IBM watsonx.governance as a Service.

In [None]:
from ibm_watsonx_gov.config import (Credentials, WxGovConsoleCredentials)

# Provide watsonx.governance credentials for IBM watsonx.governance as a Service
CLOUD_API_KEY = "<IBM watsonx.governance as Service api key>"

credentials = Credentials(api_key=CLOUD_API_KEY,
                          # Provide the watsonx.governance URL when using a region other than Dallas, which is the default.
                          # url="https://eu-de.api.aiopenscale.cloud.ibm.com"
                          )

## [Optional] Provide the Governance Console details for IBM watsonx.governance as a Service if you want to push the evaluation results
# WXGC_URL = "https://<watsonx.governance console>/openpages-openpagesinstance-cr-grc"
# WXGC_USERNAME = "<watsonx.governance console username>"
# WXGC_PASSWORD = "<watsonx.governance console password>"
# wx_gc_credentials = WxGovConsoleCredentials(
#     url=WXGC_URL,
#     username=WXGC_USERNAME,
#     password=WXGC_PASSWORD,
# )

Use the following cell to configure the credentials when using IBM watsonx.governance Software.

In [None]:
from ibm_watsonx_gov.config import (Credentials, WxGovConsoleCredentials)

# Provide watsonx.governance credentials for IBM watsonx.governance Software

WXG_URL = "<watsonx.governance host url>"
WXG_API_KEY = "<watsonx.governance API key>"
WXG_USERNAME = "<watsonx.governance username>"
WXG_VERSION = "<watsonx.governance version>"  # e.g. "2.1"


credentials = Credentials(url=WXG_URL,
                          api_key=WXG_API_KEY,
                          username=WXG_USERNAME,
                          version=WXG_VERSION,
                          disable_ssl=False)


# [Optional] Provide the Governance Console details for IBM watsonx.governance Software if you want to push the evaluation results
# WXGC_URL = "https://<watsonx.governance console>/openpages-openpagesinstance-cr-grc"
# WXGC_USERNAME = "<watsonx.governance console username>"
# WXGC_PASSWORD = "<watsonx.governance console password>"
# WXGC_API_KEY = "<watsonx.governance console api key>"

# wx_gc_credentials = WxGovConsoleCredentials(url=WXGC_URL,
#                                             username=WXGC_USERNAME,
#                                             password=WXGC_PASSWORD,
#                                             api_key=WXGC_API_KEY)

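Hard-coding API keys and passwords in notebook cells makes them easy to leak when the notebook is shared. As an optional alternative, the sketch below reads these values from environment variables instead. The helper names `require_env` and `optional_env`, and the variable names in the comments, are hypothetical; they are not part of the watsonx.governance SDK.

```python
import os
from typing import Optional


def require_env(name: str) -> str:
    """Return the stripped value of environment variable `name`; raise if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; export it before running this notebook."
        )
    return value


def optional_env(name: str, default: Optional[str] = None) -> Optional[str]:
    """Return the stripped value of environment variable `name`, or `default` if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    return value or default


# Example wiring (the environment variable names are hypothetical):
# CLOUD_API_KEY = require_env("WXG_CLOUD_API_KEY")
# WXG_URL = require_env("WXG_URL")
# WXG_VERSION = optional_env("WXG_VERSION")
```

The resulting values can then be passed to `Credentials` or `WxGovConsoleCredentials` exactly as in the cells above.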

`ModelRiskConfiguration` accepts the following parameters:

| Name | Description | Type | Default Value |
|:-|:-|:-|:-|
| model_details | The foundation model details. | WxAIFoundationModel or CustomFoundationModel | |
| risk_dimensions [Optional] | The list of risks to evaluate. Supported risk dimensions: ["toxic-output", "harmful-output", "prompt-leaking", "hallucination", "prompt-injection", "jailbreaking", "output-bias", "harmful-code-generation"]. The risks ["jailbreaking", "prompt-leaking", "harmful-code-generation", "prompt-injection"] require watsonx.ai credentials. | list[str] | None |
| max_sample_size [Optional] | The maximum number of data instances to use for evaluation. Specify a smaller value (e.g. 50) to speed up the evaluation, or set it to None to use all data, which takes longer but gives more reliable results. | int | None |
| pdf_report_output_path [Optional] | The output file path for the PDF report. | str | None |
| wx_gc_configuration [Optional] | The Governance Console configuration used to store the computed metric results. When the results are stored in the Governance Console, later evaluations retrieve the saved metrics instead of recomputing them. | WxGovConsoleConfiguration | None |

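A misspelled risk name is easy to miss until the evaluation runs. The following sketch, assuming the supported list in the table above is complete for your SDK version, checks a `risk_dimensions` list before the configuration is built and reports which of the requested risks need watsonx.ai credentials. `check_risk_dimensions` is a hypothetical helper, not an SDK function.

```python
from typing import Iterable, List, Optional, Tuple

# Supported risk dimensions, copied from the parameter table above.
SUPPORTED_RISK_DIMENSIONS = {
    "toxic-output", "harmful-output", "prompt-leaking", "hallucination",
    "prompt-injection", "jailbreaking", "output-bias", "harmful-code-generation",
}

# Risks that, per the table, require watsonx.ai credentials.
RISKS_NEEDING_WXAI_CREDENTIALS = {
    "jailbreaking", "prompt-leaking", "harmful-code-generation", "prompt-injection",
}


def check_risk_dimensions(
    risk_dimensions: Optional[Iterable[str]],
) -> Tuple[List[str], List[str]]:
    """Return (unknown, needs_wxai) for a requested risk_dimensions list.

    `unknown` lists names that are not in the supported set; `needs_wxai` lists
    requested risks that need watsonx.ai credentials. None (the table's
    default) is passed through without checks.
    """
    if risk_dimensions is None:
        return [], []
    requested = list(risk_dimensions)
    unknown = [r for r in requested if r not in SUPPORTED_RISK_DIMENSIONS]
    needs_wxai = [r for r in requested if r in RISKS_NEEDING_WXAI_CREDENTIALS]
    return unknown, needs_wxai


unknown, needs_wxai = check_risk_dimensions(["hallucination", "jailbreaking", "toxicity"])
print(unknown, needs_wxai)  # ['toxicity'] ['jailbreaking']
```

If `unknown` is non-empty, fix the names before calling `ModelRiskConfiguration`; if `needs_wxai` is non-empty, make sure watsonx.ai credentials are available.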
### Foundation Model details

The foundation model to evaluate can be an IBM watsonx.ai model or any external foundation model.

In [None]:
# Foundation model name required for exporting metrics to a PDF
FOUNDATION_MODEL_NAME = "<Foundation model name>"

To evaluate an IBM watsonx.ai foundation model, provide the `WxAIFoundationModel` details.

In [None]:
from ibm_watsonx_gov.entities.foundation_model import WxAIFoundationModel
from ibm_watsonx_gov.entities.model_provider import WxAIModelProvider
from ibm_watsonx_gov.config import WxAICredentials


# watsonx.ai model ID, e.g. "ibm/granite-13b-instruct-v1", "meta-llama/llama-3-70b-instruct", "google/flan-t5-xxl"
FOUNDATION_MODEL_ID = "<Foundation model id>"

# Provide either a watsonx.ai project ID or a space ID
PROJECT_ID = "<Project id>"
# SPACE_ID = "<Space id>"

# [Optional] WxAIModelProvider details are required if watsonx.ai is in a different region from watsonx.governance
# wx_ai_provider = WxAIModelProvider(credentials=WxAICredentials(url="",
#                                                               api_key=""))

model_details = WxAIFoundationModel(model_name=FOUNDATION_MODEL_NAME,
                                    model_id=FOUNDATION_MODEL_ID,
                                    project_id=PROJECT_ID,
                                    # space_id=SPACE_ID,
                                    # provider=wx_ai_provider
                                    )

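Because the model details take either a project ID or a space ID, a small guard can catch the case where both or neither are filled in. This is a sketch under the assumption that exactly one should be set; `container_kwargs` is a hypothetical helper, and placeholder strings that still contain `<` are treated as unset. Its result can be unpacked into the model details, e.g. `WxAIFoundationModel(model_name=FOUNDATION_MODEL_NAME, model_id=FOUNDATION_MODEL_ID, **container_kwargs(PROJECT_ID))`.

```python
from typing import Dict, Optional


def container_kwargs(
    project_id: Optional[str] = None, space_id: Optional[str] = None
) -> Dict[str, str]:
    """Return exactly one of {"project_id": ...} or {"space_id": ...}.

    Raises ValueError when both or neither are set. Values that still contain
    "<" are treated as unfilled notebook placeholders.
    """

    def is_set(value: Optional[str]) -> bool:
        return bool(value) and "<" not in value

    has_project, has_space = is_set(project_id), is_set(space_id)
    if has_project == has_space:
        raise ValueError("Provide exactly one of a watsonx.ai project ID or space ID.")
    return {"project_id": project_id} if has_project else {"space_id": space_id}
```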
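To evaluate an external foundation model, provide the `CustomFoundationModel` details. Both this cell and the previous one assign `model_details`, so run only the one that matches your model.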

In [None]:
from ibm_watsonx_gov.entities.foundation_model import CustomFoundationModel

# Provide a scoring function wrapping the external LLM
scoring_fn = None

## Sample scoring function using a model from Hugging Face
# from transformers import T5Tokenizer, T5ForConditionalGeneration
# import pandas as pd
# tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xl")
# model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xl")

# def scoring_fn(data):
#     predictions_list = []
#     for prompt_text in data.iloc[:, 0].values.tolist():
#         input_ids = tokenizer(prompt_text, return_tensors="pt").input_ids
#         output = model.generate(input_ids)
#         output = tokenizer.decode(output[0], skip_special_tokens=True)
#         predictions_list.append(output)
#     return pd.DataFrame({"generated_text": predictions_list})


model_details = CustomFoundationModel(model_name=FOUNDATION_MODEL_NAME,
                                      scoring_fn=scoring_fn)

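The commented Hugging Face sample above suggests the shape of a scoring function: it receives a pandas DataFrame whose first column holds the prompt text and returns a DataFrame with one `generated_text` value per prompt. Assuming that contract, the sketch below wraps any prompt-to-text callable in that shape, which makes it easy to test the plumbing with a stub before wiring up a real model. `make_scoring_fn` and `echo_generate` are hypothetical names; with a real model you would pass something like `CustomFoundationModel(model_name=FOUNDATION_MODEL_NAME, scoring_fn=make_scoring_fn(my_generate))`, where `my_generate` is your own function that calls the external LLM.

```python
from typing import Callable

import pandas as pd


def make_scoring_fn(
    generate: Callable[[str], str],
) -> Callable[[pd.DataFrame], pd.DataFrame]:
    """Wrap a prompt -> text callable in the DataFrame-in / DataFrame-out shape
    used by the sample scoring function above (prompts in the first column,
    outputs in a `generated_text` column)."""

    def scoring_fn(data: pd.DataFrame) -> pd.DataFrame:
        prompts = data.iloc[:, 0].astype(str).tolist()
        return pd.DataFrame({"generated_text": [generate(p) for p in prompts]})

    return scoring_fn


# Hypothetical stand-in for a real model call, useful only for a dry run.
def echo_generate(prompt: str) -> str:
    return f"echo: {prompt}"


demo_fn = make_scoring_fn(echo_generate)
```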
In [None]:
from ibm_watsonx_gov.config.model_risk_configuration import WxGovConsoleConfiguration

risk_dimensions = ["<Risk dimension>"]
pdf_report_output_path = "<Output file path>"
max_sample_size = None


## [Optional] Provide the Governance Console configuration if required to push the result.
# WXGC_MODEL_ID = "<The model ID in Governance Console>"

# wx_gc_configuration = WxGovConsoleConfiguration(model_id=WXGC_MODEL_ID,
#                                                 credentials=wx_gc_credentials)

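If `pdf_report_output_path` points into a directory that does not exist yet, the report may fail to be written. As a defensive step, the sketch below creates the parent directory and appends a `.pdf` suffix when the path does not already end in one. `prepare_pdf_output_path` is a hypothetical helper; whether the SDK creates missing directories itself is not stated here.

```python
from pathlib import Path


def prepare_pdf_output_path(path: str) -> str:
    """Create the parent directory of `path` if needed and return it as a string.

    ".pdf" is appended when the path does not already end in ".pdf"
    (case-insensitive). This is a local convenience check, not an SDK requirement.
    """
    target = Path(path).expanduser()
    if target.suffix.lower() != ".pdf":
        target = target.with_name(target.name + ".pdf")
    target.parent.mkdir(parents=True, exist_ok=True)
    return str(target)
```

For example, `pdf_report_output_path = prepare_pdf_output_path("reports/mre_report")` returns a path ending in `mre_report.pdf` and creates `reports/` if it is missing.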
## Foundation Model Risk Evaluation <a name="evaluate"></a>

Create the model risk evaluation configuration.

In [8]:
from ibm_watsonx_gov.config.model_risk_configuration import ModelRiskConfiguration, WxGovConsoleConfiguration

# MRE configuration
configuration = ModelRiskConfiguration(
    model_details=model_details,
    risk_dimensions=risk_dimensions,
    max_sample_size=max_sample_size,
    pdf_report_output_path=pdf_report_output_path,
    # wx_gc_configuration=wx_gc_configuration, # uncomment this line if the result should be pushed to Governance Console (OpenPages)
)

Evaluate the model risk.

In [None]:
from ibm_watsonx_gov.evaluate import evaluate_model_risk

evaluation_results = evaluate_model_risk(
    configuration=configuration,
    credentials=credentials,
)

print(evaluation_results.risks)

## Display the results <a name="display"></a>

### Display the metric results in a table

In [10]:
for risk in evaluation_results.risks:
    print("\n--------------------------------\n")
    print(f"risk: {risk.name}")
    for card in risk.benchmarks:
        print(f"card: {card.name}")
        print(card.get_metric_df())


--------------------------------

risk: hallucination
card: cards.value_alignment.hallucinations.truthfulqa
                 name   value
0          score_name  rougeL
1               score   0.068
2    num_of_instances     3.0
3              counts    0.25
4              totals    22.5
5          precisions   0.019
6                  bp   0.641
7             sys_len    27.0
8             ref_len    39.0
9           sacrebleu    0.01
10       score_ci_low     0.0
11      score_ci_high   0.107
12   sacrebleu_ci_low     0.0
13  sacrebleu_ci_high   0.017
14          rougeLsum   0.068
15             rouge2     0.0
16             rouge1   0.068
17             rougeL   0.068
18   rougeLsum_ci_low     0.0
19  rougeLsum_ci_high   0.107
20      rouge1_ci_low     0.0
21     rouge1_ci_high   0.107
22      rougeL_ci_low     0.0
23     rougeL_ci_high   0.107

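Each benchmark card reports its metrics as `name`/`value` rows, and the headline figures are `score_name`, `score`, and the `score_ci_low`/`score_ci_high` confidence interval. The sketch below, a hypothetical `summarize_metrics` helper, collapses such rows into a compact summary; it is exercised here on a subset of the rows printed above rather than on a live result.

```python
from typing import Any, Dict, Iterable, Tuple


def summarize_metrics(rows: Iterable[Tuple[str, Any]]) -> Dict[str, Any]:
    """Collapse (name, value) metric rows into the headline score and its confidence interval."""
    metrics = dict(rows)
    return {
        "score_name": metrics.get("score_name"),
        "score": metrics.get("score"),
        "ci": (metrics.get("score_ci_low"), metrics.get("score_ci_high")),
        "num_of_instances": metrics.get("num_of_instances"),
    }


# A subset of the rows printed above for the TruthfulQA hallucination card.
rows = [
    ("score_name", "rougeL"),
    ("score", 0.068),
    ("num_of_instances", 3.0),
    ("score_ci_low", 0.0),
    ("score_ci_high", 0.107),
]
summary = summarize_metrics(rows)
print(summary)
# {'score_name': 'rougeL', 'score': 0.068, 'ci': (0.0, 0.107), 'num_of_instances': 3.0}
```

For a live result, `summarize_metrics(zip(df["name"], df["value"]))` with `df = card.get_metric_df()` should work, assuming the DataFrame keeps the `name` and `value` columns shown above. The card above was computed on only 3 instances, which helps explain why its interval (0.0 to 0.107) is wide relative to the score; a very small `max_sample_size` produces estimates of this kind.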

### Export the computed metrics to a PDF report

In [11]:
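from IPython.display import display
from ibm_wos_utils.joblib.utils.notebook_utils import create_download_link_for_file

# Create a download link for the generated PDF report and render it in the notebook
pdf_file = create_download_link_for_file(evaluation_results.output_file_path)
display(pdf_file)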

### [Optional] Steps to get the Model ID from Governance Console

1. Log in to the Governance Console.
2. From the left-hand menu, navigate to Inventory and select Models from the dropdown menu.
3. Click New and fill in the required details for your new model.
4. In the General section, locate the field labeled Model ID. Use this value as `WXGC_MODEL_ID` when configuring the Governance Console.
