# Evaluate Content Generation using langchain and watsonx.governance

This notebook demonstrates a content generation workflow built with LangChain and watsonx.ai, and evaluates the application with the watsonx.governance callback handler.

## Learning goals

- Read data
- Initialize foundation model
- Generate the responses
- Configure and compute metrics


**Note:** Search for `<EDIT THIS>` and replace each occurrence with your own value.

**Run the notebook in an environment with more than 4 GB of memory.**

## Contents

- [Step 1 - Setup](#setup)
- [Step 2 - Read and store data](#data)
- [Step 3 - Initialize a foundation model using `watsonx.ai`](#model)
- [Step 4 - Create the prompt and inputs for the prompt template](#predict)
- [Step 5 - Configure the `watsonx.governance` metrics](#config)
- [Step 6 - Run the LLMChain to generate response and compute the watsonx.governance metrics using callback](#compute)
- [Step 7 - Display the results](#results)

## Step 1 - Setup <a id="setup"></a>

### Install the necessary libraries

In [None]:
!pip install wget 
!pip install nltk
!pip install -U chromadb
!pip install -qU langchain-ibm
!pip install -U ibm-watsonx-ai
!pip install -U ibm-watson-openscale
!pip install ibm-metrics-plugin~=5.0.3.0
!pip install nest_asyncio unitxt torch==2.1.0 
!pip install textstat pydantic-settings sentence-transformers
!pip install -U langchain langchain-core langchain-community

import warnings
warnings.filterwarnings("ignore")

In [None]:
import nltk
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to /home/wsuser/nltk_data...


True

**Note**: you may need to restart the kernel to use updated libraries.

### Configure your cloud credentials with IBM's APIClient object

In [None]:
from ibm_watsonx_ai import APIClient

api_client = APIClient(credentials = {
                                "url" : "<EDIT THIS>",
                                "apikey" : "<EDIT THIS>",
                                "project_id" : "<EDIT THIS>",
                            })
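
Hard-coding an API key in a notebook makes it easy to leak. The following is a minimal sketch, assuming the values are exported as environment variables. The variable names `WATSONX_URL`, `WATSONX_APIKEY`, and `WATSONX_PROJECT_ID` and the helper `load_credentials` are hypothetical and are not part of any IBM SDK.

```python
import os

# Hypothetical environment variable names; choose whatever fits your setup.
REQUIRED_VARS = {
    "url": "WATSONX_URL",
    "apikey": "WATSONX_APIKEY",
    "project_id": "WATSONX_PROJECT_ID",
}


def load_credentials(env=None):
    """Build the credentials dictionary from environment variables.

    Raises ValueError naming every missing variable, so a misconfigured
    environment fails before any request is sent.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS.values() if not env.get(name)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))
    return {key: env[name] for key, name in REQUIRED_VARS.items()}
```

The returned dictionary has the same keys as the one passed to `APIClient(credentials=...)` above, so it could be supplied in its place.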

## Step 2 - Read and store data <a id="data"></a>

### Read the data

Download the sample "LLM Content Generation" file.

In [None]:
import wget
import os

!wget "https://raw.githubusercontent.com/IBM/watson-openscale-samples/main/IBM%20Cloud/WML/assets/data/watsonx/llm_content_generation.csv"

--2024-11-19 09:17:52--  https://raw.githubusercontent.com/IBM/watson-openscale-samples/main/IBM%20Cloud/WML/assets/data/watsonx/llm_content_generation.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11794 (12K) [text/plain]
Saving to: ‘llm_content_generation.csv’


2024-11-19 09:17:52 (29.9 MB/s) - ‘llm_content_generation.csv’ saved [11794/11794]






In [None]:
import pandas as pd

data = pd.read_csv("llm_content_generation.csv",encoding="latin-1")
data

| Unnamed: 0 | question | generated_text | reference_text |
|:-|:-|:-|:-|
| 0 | What are the benefits of regular exercise? | Regular exercise has numerous benefits, includ... | Regular exercise has numerous benefits, includ... |
| 1 | What is the process of photosynthesis? | Photosynthesis is the process by which plants ... | Photosynthesis is the process by which plants ... |
| 2 | What are the key features of a smartphone? | A smartphone is a mobile device that typically... | A smartphone is a mobile device that typically... |
| 3 | How does the immune system work? | The immune system is a complex network of cell... | The immune system is a complex network of cell... |
| 4 | What is the capital of France? | The capital of France is Paris, which is known... | The capital of France is Paris, which is known... |
| 5 | What is climate change? | Climate change refers to long-term alterations... | Climate change refers to long-term alterations... |
| 6 | How does the water cycle work? | The water cycle, also known as the hydrologic ... | The water cycle, also known as the hydrologic ... |
| 7 | What are the main principles of democracy? | Democracy is a system of government where the ... | Democracy is a system of government where the ... |
| 8 | How do plants photosynthesize? | Plants photosynthesize by using sunlight, carb... | Plants photosynthesize by using sunlight, carb... |
| 9 | What are the key components of a computer system? | A computer system consists of several key comp... | A computer system consists of several key comp... |
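
Later steps read the `question`, `generated_text`, and `reference_text` columns, so it can help to confirm they are present right after loading. A small sketch, where the helper name `missing_columns` is hypothetical:

```python
REQUIRED_COLUMNS = ("question", "generated_text", "reference_text")


def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in required if name not in present]


# With the DataFrame loaded above, this would be called as:
#   missing_columns(data.columns)
print(missing_columns(["Unnamed: 0", "question", "generated_text", "reference_text"]))
```

An empty list means every column the notebook relies on is available.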


## Step 3 - Initialize a foundation model using `watsonx.ai`
<a id="model"></a>

IBM watsonx foundation models are among the <a href="https://python.langchain.com/docs/integrations/llms/watsonxllm" target="_blank" rel="noopener noreferrer">LLMs supported by LangChain</a>. This example shows how to communicate with <a href="https://newsroom.ibm.com/2023-09-28-IBM-Announces-Availability-of-watsonx-Granite-Model-Series,-Client-Protections-for-IBM-watsonx-Models" target="_blank" rel="noopener noreferrer">the Granite Model Series</a> using <a href="https://python.langchain.com/docs/get_started/introduction" target="_blank" rel="noopener noreferrer">LangChain</a>.

### Define the model parameters
Provide a set of model parameters that will influence the result:

In [None]:
parameters = {
    "decoding_method": "greedy",
    "max_new_tokens": 100,
    "min_new_tokens": 1,
    "temperature": 0.5,
    "top_k": 50,
    "top_p": 1,
}
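
The sampling controls (`temperature`, `top_k`, `top_p`) are generally only meaningful when `decoding_method` is `"sample"`. With `"greedy"` decoding the model always picks the most likely next token. The sketch below makes this explicit by dropping the sampling-only keys for greedy decoding. The helper `effective_parameters` is hypothetical, and the assumption that the service ignores these keys in greedy mode should be checked against the watsonx.ai documentation.

```python
# Keys assumed to matter only for sampling-based decoding.
SAMPLING_ONLY_KEYS = {"temperature", "top_k", "top_p"}


def effective_parameters(params):
    """Return a copy of `params` without sampling-only keys when decoding is greedy."""
    if params.get("decoding_method") == "greedy":
        return {k: v for k, v in params.items() if k not in SAMPLING_ONLY_KEYS}
    return dict(params)


parameters = {
    "decoding_method": "greedy",
    "max_new_tokens": 100,
    "min_new_tokens": 1,
    "temperature": 0.5,
    "top_k": 50,
    "top_p": 1,
}

print(effective_parameters(parameters))
```

To have `temperature`, `top_k`, and `top_p` take effect, `decoding_method` would be set to `"sample"` instead.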

### Create the watsonx model by passing IBM's APIClient object to the WatsonxLLM class
Initialize the `ibm/granite-13b-chat-v2` model from watsonx.ai with the parameters defined above.

In [None]:
from langchain_ibm import WatsonxLLM

watsonx_llm = WatsonxLLM(
    model_id = "ibm/granite-13b-chat-v2",
    watsonx_client = api_client,
    params = parameters,
    project_id = "<EDIT THIS>"
)

## Step 4 - Create the prompt and inputs for the prompt template
<a id="predict"></a>

### Construct a dataframe with question, generated text and reference text to be used for metrics computation

In [None]:
df_input = pd.DataFrame(data, columns=["question", "generated_text", "reference_text"])

sources = df_input.to_dict(orient='records')

### Create the prompt template and prompt variable

In [None]:
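# PromptTemplate lives in langchain_core; importing it from the top-level langchain package is deprecated
from langchain_core.prompts import PromptTemplate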

generation_prompt_text = """
Generate high-quality content based on the following question. Use the reference content to guide your generation, ensuring it aligns with the question's context and intent.

Question: {question}

Reference Content: {reference_text}

Generated Content:
"""

generation_prompt = PromptTemplate(
    input_variables=["question", "reference_text"],
    template=generation_prompt_text
)
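
Before calling the model, it can be useful to preview what a filled-in prompt looks like. `PromptTemplate` uses f-string style placeholders by default, so plain `str.format` gives an equivalent preview for this template. The row below is a shortened, hypothetical example rather than a record from the CSV.

```python
generation_prompt_text = """
Generate high-quality content based on the following question. Use the reference content to guide your generation, ensuring it aligns with the question's context and intent.

Question: {question}

Reference Content: {reference_text}

Generated Content:
"""

# Hypothetical row, shortened for illustration
row = {
    "question": "What is the capital of France?",
    "reference_text": "The capital of France is Paris.",
}

# Substitute the placeholders the same way the template would
preview = generation_prompt_text.format(**row)
print(preview)
```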

## Step 5 - Configure the `watsonx.governance` metrics
<a id="config"></a>

Configure the metrics to compute. `record_level` is set to `True` so that metric values are reported for each record.

In [None]:
from ibm_metrics_plugin.metrics.llm.utils.constants import LLMTextMetricGroup, LLMGenerationMetrics

config_json = {
    "configuration": {
        "record_level": True,
        LLMTextMetricGroup.GENERATION.value: {
            LLMGenerationMetrics.ROUGE_SCORE.value: {},
            LLMGenerationMetrics.METEOR.value: {},
            LLMGenerationMetrics.NORMALIZED_RECALL.value: {},
            LLMGenerationMetrics.NORMALIZED_PRECISION.value: {},
            LLMGenerationMetrics.NORMALIZED_F1_SCORE.value: {},
            LLMGenerationMetrics.BLEU.value: {},
            LLMGenerationMetrics.FLESCH.value: {}
        }
    }
}
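
The `FLESCH` metric reports `flesch_reading_ease` and `flesch_kincaid_grade`, as the results later in this notebook show. The standard published formulas are sketched below. How the metrics plugin counts words, sentences, and syllables is not shown in this notebook, so its values may differ from a naive count. The function names and the example counts are illustrative only.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Standard Flesch reading ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Standard Flesch-Kincaid grade level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for a short passage: 100 words, 5 sentences, 150 syllables
print(flesch_reading_ease(100, 5, 150))
print(flesch_kincaid_grade(100, 5, 150))
```

Longer sentences and more syllables per word lower the reading ease score and raise the grade level.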

### Create watsonx.governance client 

In [None]:
CLOUD_API_KEY = "<EDIT THIS>"

from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# This APIClient is the watsonx.governance (OpenScale) client.
# It shadows the ibm_watsonx_ai APIClient imported in Step 1; `api_client` created there is unaffected.
from ibm_watson_openscale import APIClient

authenticator = IAMAuthenticator(apikey=CLOUD_API_KEY, url="https://iam.cloud.ibm.com")
client = APIClient(authenticator=authenticator, service_url="https://aiopenscale.cloud.ibm.com")

print(client.version)

3.0.41


## Step 6 - Run the LLMChain to generate response and compute the watsonx.governance metrics using callback
<a id="compute"></a>

#### Initialize LLMChain

In [None]:
from langchain.chains import LLMChain
generation_chain = LLMChain(llm=watsonx_llm, prompt=generation_prompt)

#### WatsonxGovCallbackHandler parameters
| Parameter | Description | Type | Default value |
|:-|:-|:-|:-|
| configuration* | Configuration of the metrics to evaluate | dictionary |  |
| watsonxgov_client* | watsonx.governance client object |  |  |
| source | The context from which the model answers the question | dictionary |  |
| reference | The reference text for the response generated by the model | dictionary |  |
| record_id | ID of the record being evaluated | string |  |
| debug | Flag that enables debug output during execution | boolean | false |

In [None]:
from ibm_watson_openscale.callbacks.langchain import WatsonxGovCallbackHandler

answers = []
record_level_metrics = []

for input_row in sources:
    question = input_row["question"]
    reference_text = input_row["reference_text"]
    # Use a fresh handler for each record and collect its computed metrics
    handler = WatsonxGovCallbackHandler(
        configuration=config_json,
        watsonxgov_client=client,
        source=input_row,
        reference={"reference_text": reference_text},
    )
    result = generation_chain.run(
        {"question": question, "reference_text": reference_text},
        callbacks=[handler],
    )
    answers.append(result)
    record_level_metrics.append(handler.computed_metrics)

Evaluating for record eaf3f38b-6f04-4278-8428-b3ffb0c89996
Evaluating for record 508f7543-9722-42e2-9ca7-965a048056a5
Evaluating for record 7182b7f7-ab07-408e-abb0-6a9a2febb8ce
Evaluating for record 3c2d491e-7be1-4521-a626-a94edff93e2d
Evaluating for record 46c4ffde-d2cc-4f8e-9981-3da53567afac
Evaluating for record 9a6d49e2-0b78-4704-ac51-aa31adb9b390
Evaluating for record 39a174f6-4afa-4764-9394-c2e4265dcbfd
Evaluating for record b2d22562-53e4-4ee0-92ec-eee9d2962a56
Evaluating for record 86cc8033-e750-4ce8-9bce-e68921a3540e
Evaluating for record 9e605947-61e9-4d89-a997-0a56c2364250
Evaluating for record 6495f275-3df8-4b61-9626-5c755712d78b
Evaluating for record 4f1cf1a6-b561-4e99-a5a9-ac2e44a4720e
Evaluating for record 7639b7f4-10b1-4b1e-ab16-0e53688668df
Evaluating for record 9536f3a2-efd3-41ec-adab-c800496c91d8
Evaluating for record 423e6f62-117b-46c9-8bd5-f6381b0ee917
Evaluating for record 77c823e5-1ce1-48bc-9e20-bc6311263fc5
Evaluating for record 827bc0ca-2e57-4ac8-be10-047df15436

#### Run this cell to get the combined metrics results

In [None]:
import json
metric_result = WatsonxGovCallbackHandler.aggregate_result(record_level_metrics)
print(json.dumps(metric_result,indent=2))

{
  "flesch": {
    "record_level_metrics": [
      {
        "record_id": "eaf3f38b-6f04-4278-8428-b3ffb0c89996",
        "flesch_reading_ease": 35.27,
        "flesch_kincaid_grade": 13.1
      },
      {
        "record_id": "508f7543-9722-42e2-9ca7-965a048056a5",
        "flesch_reading_ease": 40.55,
        "flesch_kincaid_grade": 11.0
      },
      {
        "record_id": "7182b7f7-ab07-408e-abb0-6a9a2febb8ce",
        "flesch_reading_ease": 44.95,
        "flesch_kincaid_grade": 11.4
      },
      {
        "record_id": "3c2d491e-7be1-4521-a626-a94edff93e2d",
        "flesch_reading_ease": 59.33,
        "flesch_kincaid_grade": 10.0
      },
      {
        "record_id": "46c4ffde-d2cc-4f8e-9981-3da53567afac",
        "flesch_reading_ease": 36.63,
        "flesch_kincaid_grade": 14.6
      },
      {
        "record_id": "9a6d49e2-0b78-4704-ac51-aa31adb9b390",
        "flesch_reading_ease": 41.7,
        "flesch_kincaid_grade": 12.7
      },
      {
        "record_id": "39a174f

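The aggregated result groups values by metric, each with a `record_level_metrics` list. The sketch below averages one record-level value across records, using the first three `flesch` records from the output above as sample data. The helper `mean_record_metric` is hypothetical. The aggregated result may already contain summary statistics, but the truncated output does not show them.

```python
from statistics import mean


def mean_record_metric(metric_result, group, key):
    """Average the numeric values of `key` over one group's record-level metrics."""
    records = metric_result[group]["record_level_metrics"]
    values = [r[key] for r in records if isinstance(r.get(key), (int, float))]
    return mean(values)


# First three records copied from the output above
sample_result = {
    "flesch": {
        "record_level_metrics": [
            {"record_id": "eaf3f38b-6f04-4278-8428-b3ffb0c89996",
             "flesch_reading_ease": 35.27, "flesch_kincaid_grade": 13.1},
            {"record_id": "508f7543-9722-42e2-9ca7-965a048056a5",
             "flesch_reading_ease": 40.55, "flesch_kincaid_grade": 11.0},
            {"record_id": "7182b7f7-ab07-408e-abb0-6a9a2febb8ce",
             "flesch_reading_ease": 44.95, "flesch_kincaid_grade": 11.4},
        ]
    }
}

print(mean_record_metric(sample_result, "flesch", "flesch_reading_ease"))
```

`statistics.mean` raises `StatisticsError` when no numeric values are found, which surfaces a misspelled metric key early.
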
## Step 7 - Display the results <a id="results"></a>

### Metric results for all the records

In [None]:
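# Display results: one row per input record, one column per record-level metric value
results_df = data.copy()
results_df['answer'] = answers
for metric_name, metric_values in metric_result.items():
    records = metric_values.get("record_level_metrics", [])
    # Collect every value key reported for this metric, in order, skipping the record id
    metric_keys = dict.fromkeys(key for record in records for key in record if key != "record_id")
    for key in metric_keys:
        results_df[key] = [record.get(key) for record in records]
results_df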

| Unnamed: 0 | question | generated_text | reference_text | answer | flesch_reading_ease | flesch_kincaid_grade | bleu | precisions | brevity_penalty | length_ratio | ... | normalized_precision | normalized_recall | rouge1 | rouge2 | rougeL | rougeLsum | rouge1_recall | rouge2_recall | rougeL_recall | rougeLsum_recall |
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
| 0 | What are the benefits of regular exercise? | Regular exercise has numerous benefits, includ... | Regular exercise has numerous benefits, includ... | \n1. Improved cardiovascular health: Regular e... | 35.27 | 13.1 | 0.069222 | [0.24444444444444444, 0.0898876404494382, 0.04...] | 1.0 | 2.727273 | ... | 0.263889 | 0.703704 | 0.3846 | 0.2157 | 0.2692 | 0.3846 | 0.7143 | 0.4074 | 0.5 | 0.7143 |
| 1 | What is the process of photosynthesis? | Photosynthesis is the process by which plants ... | Photosynthesis is the process by which plants ... | \nPhotosynthesis is a vital process carried ou... | 40.55 | 11.0 | 0.145045 | [0.35, 0.189873417721519, 0.10256410256410256,...] | 1.0 | 2.424242 | ... | 0.389831 | 0.821429 | 0.5049 | 0.2772 | 0.3883 | 0.5049 | 0.8387 | 0.4667 | 0.6452 | 0.8387 |
| 2 | What are the key features of a smartphone? | A smartphone is a mobile device that typically... | A smartphone is a mobile device that typically... | \n1. Touchscreen: A smartphone's primary inter... | 44.95 | 11.4 | 0.0 | [0.2111111111111111, 0.0449438202247191, 0.011...] | 1.0 | 2.093023 | ... | 0.19403 | 0.40625 | 0.2936 | 0.0748 | 0.2385 | 0.2752 | 0.4324 | 0.1111 | 0.3514 | 0.4054 |
| 3 | How does the immune system work? | The immune system is a complex network of cell... | The immune system is a complex network of cell... | \nThe immune system is a complex network of ce... | 59.33 | 10.0 | 0.209733 | [0.3473684210526316, 0.2127659574468085, 0.172...] | 1.0 | 2.209302 | ... | 0.315068 | 0.69697 | 0.4538 | 0.2564 | 0.3697 | 0.4202 | 0.7297 | 0.4167 | 0.5946 | 0.6757 |
| 4 | What is the capital of France? | The capital of France is Paris, which is known... | The capital of France is Paris, which is known... | Paris is indeed the capital city of France, a ... | 36.63 | 14.6 | 0.0 | [0.3582089552238806, 0.12121212121212122, 0.03...] | 1.0 | 1.914286 | ... | 0.422222 | 0.655172 | 0.5238 | 0.2439 | 0.3333 | 0.4762 | 0.6875 | 0.3226 | 0.4375 | 0.625 |
| 5 | What is climate change? | Climate change refers to long-term alterations... | Climate change refers to long-term alterations... | Climate change is a long-term shift in tempera... | 41.7 | 12.7 | 0.07951 | [0.4148936170212766, 0.16129032258064516, 0.05...] | 1.0 | 1.843137 | ... | 0.405063 | 0.744186 | 0.5303 | 0.2 | 0.3788 | 0.4848 | 0.7609 | 0.2889 | 0.5435 | 0.6957 |
| 6 | How does the water cycle work? | The water cycle, also known as the hydrologic ... | The water cycle, also known as the hydrologic ... | \nThe water cycle, also known as the hydrologi... | 45.46 | 11.2 | 0.180496 | [0.375, 0.1724137931034483, 0.1395348837209302...] | 1.0 | 1.76 | ... | 0.366667 | 0.578947 | 0.5128 | 0.2261 | 0.359 | 0.4103 | 0.6522 | 0.2889 | 0.4565 | 0.5217 |
| 7 | What are the main principles of democracy? | Democracy is a system of government where the ... | Democracy is a system of government where the ... | \nDemocracy is a form of government in which p... | 24.98 | 14.9 | 0.099147 | [0.32608695652173914, 0.0989010989010989, 0.06...] | 1.0 | 1.916667 | ... | 0.25 | 0.527778 | 0.384 | 0.1301 | 0.288 | 0.384 | 0.5854 | 0.2 | 0.439 | 0.5854 |
| 8 | How do plants photosynthesize? | Plants photosynthesize by using sunlight, carb... | Plants photosynthesize by using sunlight, carb... | \nPlants photosynthesize through a process tha... | 51.89 | 10.8 | 0.104342 | [0.2823529411764706, 0.11904761904761904, 0.07...] | 1.0 | 1.931818 | ... | 0.223881 | 0.428571 | 0.3167 | 0.1356 | 0.2667 | 0.2667 | 0.475 | 0.2051 | 0.4 | 0.4 |
| 9 | What are the key components of a computer system? | A computer system consists of several key comp... | A computer system consists of several key comp... | \n1. Central Processing Unit (CPU): Often refe... | 45.25 | 11.3 | 0.0 | [0.22340425531914893, 0.043010752688172046, 0....] | 1.0 | 2.043478 | ... | 0.196721 | 0.352941 | 0.2909 | 0.0741 | 0.1636 | 0.2364 | 0.4324 | 0.1111 | 0.2432 | 0.3514 |
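
A common next step is to find the records that scored lowest on a metric so they can be reviewed by hand. The sketch below does this in plain Python with the `rougeL` values copied from the results above. The helper `lowest_scoring` is hypothetical.

```python
def lowest_scoring(rows, metric, n=3):
    """Return the n rows with the smallest value for `metric`, skipping missing values."""
    scored = [row for row in rows if row.get(metric) is not None]
    return sorted(scored, key=lambda row: row[metric])[:n]


# rougeL values copied from the results above, keyed by row index
rouge_l = [0.2692, 0.3883, 0.2385, 0.3697, 0.3333, 0.3788, 0.359, 0.288, 0.2667, 0.1636]
rows = [{"index": i, "rougeL": value} for i, value in enumerate(rouge_l)]

for row in lowest_scoring(rows, "rougeL"):
    print(row["index"], row["rougeL"])
```

On the DataFrame itself, `results_df.nsmallest(3, "rougeL")` should give the same selection.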
