# Evaluate Summarisation using LangChain and watsonx.governance

This notebook demonstrates a summarisation workflow built with LangChain and watsonx.ai, and evaluates the application's output using the watsonx.governance callback handler.

## Learning goals

- Read data
- Initialize foundation model
- Generate the responses
- Configure and compute metrics


**Note:** Search for `<EDIT THIS>` and provide the inputs.

**Please run the notebook in an environment with more than 4 GB of memory.**

## Contents

- [Step 1 - Setup](#setup)
- [Step 2 - Read and store data](#data)
- [Step 3 - Initialize a foundation model using `watsonx.ai`](#model)
- [Step 4 - Create the prompt and inputs for the prompt template](#predict)
- [Step 5 - Configure the `watsonx.governance` metrics](#config)
- [Step 6 - Run the LLMChain to generate response and compute the watsonx.governance metrics using callback](#compute)
- [Step 7 - Display the results](#results)

## Step 1 - Setup <a id="setup"></a>

### Install the necessary libraries

In [None]:
!pip install wget 
!pip install nltk
!pip install -U chromadb
!pip install -qU langchain-ibm
!pip install -U ibm-watsonx-ai
!pip install sacrebleu sacremoses
!pip install -U ibm-watson-openscale
!pip install ibm-metrics-plugin~=5.0.3.0
!pip install nest_asyncio unitxt torch==2.1.0 
!pip install textstat pydantic-settings sentence-transformers
!pip install -U langchain langchain-core langchain-community

import warnings
warnings.filterwarnings("ignore")

In [None]:
import nltk
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to /home/wsuser/nltk_data...


True

**Note**: you may need to restart the kernel to use updated libraries.
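
After a kernel restart it can be useful to confirm which versions were actually installed. The following is a minimal sketch using only the standard library; the `PACKAGES` list and the `installed_versions` helper are illustrative names, not part of any IBM or LangChain API.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the pip commands above (illustrative selection).
PACKAGES = [
    "langchain",
    "langchain-ibm",
    "ibm-watsonx-ai",
    "ibm-watson-openscale",
    "ibm-metrics-plugin",
]

def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` points to a failed `pip install` step above.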

### Configure your cloud credentials with IBM's APIClient object

In [None]:
from ibm_watsonx_ai import APIClient

api_client = APIClient(credentials={
    "url": "<EDIT THIS>",
    "apikey": "<EDIT THIS>",
    "project_id": "<EDIT THIS>",
})
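
To avoid pasting secrets into the notebook, the same dictionary could be built from environment variables. This is only a sketch: the variable names `WATSONX_URL`, `WATSONX_APIKEY` and `WATSONX_PROJECT_ID` and the `load_credentials` helper are assumptions made for illustration, not names defined by IBM.

```python
import os

def load_credentials(env=os.environ):
    """Build the credentials dict from environment variables (names are hypothetical)."""
    required = {
        "url": "WATSONX_URL",
        "apikey": "WATSONX_APIKEY",
        "project_id": "WATSONX_PROJECT_ID",
    }
    # Fail early with a clear message instead of sending empty values to the service.
    missing = [var for var in required.values() if not env.get(var)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {key: env[var] for key, var in required.items()}
```

The result can then be passed in place of the literal dictionary, for example `APIClient(credentials=load_credentials())`.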

## Step 2 - Read and store data <a id="data"></a>

### Read the data

Download the sample "LLM Content Generation" file.

In [None]:
# Download the sample CSV with the wget shell command
# (the Python wget and os modules are not needed for this step)
!wget "https://raw.githubusercontent.com/IBM/watson-openscale-samples/main/IBM%20Cloud/WML/assets/data/watsonx/llm_content.csv"

--2024-11-18 15:49:47--  https://raw.githubusercontent.com/IBM/watson-openscale-samples/main/IBM%20Cloud/WML/assets/data/watsonx/llm_content.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 42771 (42K) [application/octet-stream]
Saving to: ‘llm_content.csv.3’

2024-11-18 15:49:48 (12.0 MB/s) - ‘llm_content.csv.3’ saved [42771/42771]


In [None]:
import pandas as pd

data = pd.read_csv("llm_content.csv",encoding="latin-1")
data

Unnamed: 0,input_text,generated_summary,reference_summary_1,reference_summary_2
0,Scientists have discovered a new species of de...,New bioluminescent fish species found in deep ...,Discovery of deep-sea fish emitting soothing l...,Scientists find new bioluminescent fish specie...
1,An international team of astronomers has ident...,Distant exoplanet\'s water vapor-filled atmosp...,Astronomers identify exoplanet with water vapo...,Discovery of exoplanet with water vapor in its...
2,Researchers have developed a novel nanotechnol...,New nanotechnology-based cancer treatment demo...,Researchers create cancer treatment using nano...,Innovative cancer treatment utilizing nanotech...
3,A new app is aiming to reduce food waste by co...,App connects local restaurants with customers ...,New sustainability-focused app facilitates sal...,Initiative to reduce food waste involves app c...
4,Archaeologists have uncovered an ancient city ...,"Ancient city dating back over 4,000 years disc...",Archaeological find in Iraq reveals ancient ci...,"Discovery of 4,000-year-old ancient city in Ku..."
...,...,...,...,...
57,A new species of bioluminescent fish has been ...,A newly discovered bioluminescent fish species...,A recent deep-sea exploration has uncovered a ...,The discovery of a novel bioluminescent fish s...
58,The AI revolution is changing how we work and ...,AI revolution,The AI revolution is introducing advanced tech...,"As the AI revolution progresses, it is bringin..."
59,Visit httr://ibm.com for more information on I...,Visit httr://ibm.com for the latest informatio...,To explore IBMÕs cutting-edge technologies and...,For comprehensive details on IBMÕs offerings a...
60,Verify the IP address settings,Check the IP address configuration,Check the IP address settings,Inspect IP address configuration
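
Later steps rely on the `input_text`, `reference_summary_1` and `reference_summary_2` columns, so it may be worth checking the header before going further. The sketch below uses the standard-library `csv` module on a tiny in-memory stand-in for the file; `REQUIRED_COLUMNS` and `missing_columns` are illustrative names.

```python
import csv
import io

REQUIRED_COLUMNS = [
    "input_text",
    "generated_summary",
    "reference_summary_1",
    "reference_summary_2",
]

def missing_columns(csv_file, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header."""
    header = next(csv.reader(csv_file), [])
    return [name for name in required if name not in header]

# Tiny in-memory stand-in for llm_content.csv (same header layout as shown above).
sample = io.StringIO(
    ",input_text,generated_summary,reference_summary_1,reference_summary_2\n"
    "0,a,b,c,d\n"
)
print(missing_columns(sample))  # prints []
```

For the real file, the same function could be called on `open("llm_content.csv", encoding="latin-1", newline="")`.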


## Step 3 - Initialize a foundation model using `watsonx.ai`
<a id="model"></a>

IBM watsonx foundation models are among the <a href="https://python.langchain.com/docs/integrations/llms/watsonxllm" target="_blank" rel="noopener noreferrer">list of LLM models supported by LangChain</a>. This example shows how to communicate with <a href="https://newsroom.ibm.com/2023-09-28-IBM-Announces-Availability-of-watsonx-Granite-Model-Series,-Client-Protections-for-IBM-watsonx-Models" target="_blank" rel="noopener noreferrer">the Granite Model Series</a> using <a href="https://python.langchain.com/docs/get_started/introduction" target="_blank" rel="noopener noreferrer">LangChain</a>.

### Define the model parameters
Provide a set of model parameters that will influence the result:

In [None]:
parameters = {
    "decoding_method": "greedy",
    "max_new_tokens": 100,
    "min_new_tokens": 1,
    # The sampling settings below take effect only when decoding_method is "sample"
    "temperature": 0.5,
    "top_k": 50,
    "top_p": 1,
}
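
To make the effect of `decoding_method` concrete, here is a toy sketch of next-token selection in plain Python. It is not how watsonx.ai implements decoding; the `next_token` function and the example logits are made up for illustration. It shows why greedy decoding yields the same token regardless of `temperature` or `top_k`, whereas sampling uses both.

```python
import math
import random

def next_token(logits, decoding_method="greedy", temperature=1.0, top_k=None, rng=random):
    """Pick the next token from a {token: logit} mapping (toy illustration)."""
    if decoding_method == "greedy":
        # Greedy decoding ignores temperature and top_k: it always takes the argmax.
        return max(logits, key=logits.get)
    # Sampling: keep the top_k highest logits, rescale by temperature, then draw.
    ranked = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    weights = [math.exp(score / temperature) for _, score in ranked]
    return rng.choices([tok for tok, _ in ranked], weights=weights, k=1)[0]

toy_logits = {"the": 2.0, "a": 1.0, "an": 0.5}
print(next_token(toy_logits))                               # greedy: always "the"
print(next_token(toy_logits, "sample", temperature=0.5, top_k=2))  # "the" or "a"
```

Lower temperatures concentrate the sampling weights on the highest-scoring tokens; `top_k` limits how many candidates are considered at all.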

### Create the watsonx model by passing IBM's APIClient object to the WatsonxLLM class
Initialize the model from watsonx.ai with the required parameters, using `ibm/granite-13b-chat-v2`.

In [None]:
from langchain_ibm import WatsonxLLM

watsonx_llm = WatsonxLLM(
    model_id = "ibm/granite-13b-chat-v2",
    watsonx_client = api_client,
    params = parameters,
    project_id = "<EDIT THIS>"
)

## Step 4 - Create the prompt and inputs for the prompt template
<a id="predict"></a>

### Construct a dataframe with the input text, generated text and reference text to be used for metrics computation

In [None]:
df_input = pd.DataFrame(data, columns=["input_text", "generated_summary", "reference_summary_1", "reference_summary_2"])

sources = df_input.to_dict(orient='records')

### Create the prompt template and prompt variable

In [None]:
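# Import from langchain_core; recent LangChain releases no longer expose PromptTemplate at the package root
from langchain_core.prompts import PromptTemplate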

summarization_prompt_text = """
Summarize the following content concisely while retaining the key information:

Input Text: {input_text}

Summary:
"""

summarization_prompt = PromptTemplate(
    input_variables=["input_text"],
    template=summarization_prompt_text
)
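
To see what the model will actually receive, the template can be rendered for one input. The sketch below uses plain `str.format` as a stand-in for the template's own formatting, so it runs without LangChain; `render_prompt` is an illustrative helper, and the sample input is taken from the last row of the dataset.

```python
summarization_prompt_text = """
Summarize the following content concisely while retaining the key information:

Input Text: {input_text}

Summary:
"""

def render_prompt(template, **variables):
    """Fill the {placeholders} in a template string (stand-in for prompt formatting)."""
    return template.format(**variables)

prompt = render_prompt(summarization_prompt_text, input_text="Verify the IP address settings")
print(prompt)
```

The rendered text ends with `Summary:`, which cues the model to continue with the summary itself.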

## Step 5 - Configure the `watsonx.governance` metrics
<a id="config"></a>

Configure the required metrics

In [None]:
from ibm_metrics_plugin.metrics.llm.utils.constants import LLMTextMetricGroup, LLMSummarizationMetrics

config_json = {
    "configuration": {
        "record_level": True,
        LLMTextMetricGroup.SUMMARIZATION.value: {
            LLMSummarizationMetrics.ROUGE_SCORE.value: {},
            LLMSummarizationMetrics.SARI.value: {},
            LLMSummarizationMetrics.METEOR.value: {},
            LLMSummarizationMetrics.NORMALIZED_RECALL.value: {},
            LLMSummarizationMetrics.NORMALIZED_PRECISION.value: {},
            LLMSummarizationMetrics.NORMALIZED_F1_SCORE.value: {},
            LLMSummarizationMetrics.COSINE_SIMILARITY.value: {},
            LLMSummarizationMetrics.JACCARD_SIMILARITY.value: {},
            LLMSummarizationMetrics.BLEU.value: {},
            LLMSummarizationMetrics.FLESCH.value: {},
        },
    }
}
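
For intuition about what two of these metrics measure, the following sketch computes a word-level Jaccard similarity and a unigram recall (the idea behind ROUGE-1 recall) by hand. It is a simplified illustration with whitespace tokenisation, not the implementation used by `ibm-metrics-plugin`, so its numbers will not necessarily match the library's output. The example sentences come from the last row of the dataset.

```python
from collections import Counter

def tokens(text):
    """Lower-case whitespace tokenisation (deliberately simplistic)."""
    return text.lower().split()

def jaccard_similarity(a, b):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    set_a, set_b = set(tokens(a)), set(tokens(b))
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)

def unigram_recall(candidate, reference):
    """Fraction of reference words that also appear in the candidate."""
    cand, ref = Counter(tokens(candidate)), Counter(tokens(reference))
    if not ref:
        return 0.0
    overlap = sum((cand & ref).values())  # clipped word-count overlap
    return overlap / sum(ref.values())

generated = "Check the IP address configuration"
reference = "Check the IP address settings"
print(round(jaccard_similarity(generated, reference), 4))  # 4 shared / 6 total words
print(unigram_recall(generated, reference))                # 4 of 5 reference words
```

Both scores range from 0 to 1, with higher values indicating more word overlap with the reference.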

### Create watsonx.governance client 

In [None]:
CLOUD_API_KEY = "<EDIT THIS>"

from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

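# Note: this wildcard import rebinds APIClient to the watsonx.governance (OpenScale) client,
# shadowing the ibm_watsonx_ai APIClient imported in Step 1
from ibm_watson_openscale import *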
from ibm_watson_openscale.supporting_classes.enums import *
from ibm_watson_openscale.supporting_classes import *

authenticator = IAMAuthenticator(apikey=CLOUD_API_KEY, url="https://iam.cloud.ibm.com")
client = APIClient(authenticator=authenticator, service_url="https://aiopenscale.cloud.ibm.com")

print(client.version)

3.0.41


## Step 6 - Run the LLMChain to generate response and compute the watsonx.governance metrics using callback <a id="compute"></a>

#### Initialize LLMChain

In [None]:
from langchain.chains import LLMChain
summarization_chain = LLMChain(llm=watsonx_llm, prompt=summarization_prompt)

#### WatsonxGovCallbackHandler parameters
| Parameter | Description | Type | Default Value |
|:-|:-|:-|:-|
| configuration* | Configuration of the metrics to be evaluated | dictionary |  |
| watsonxgov_client* | watsonx.governance client object |  |  |
| source | The context from which the model answers the question | dictionary |  |
| reference | The reference text for the response generated by the model | dictionary |  |
| record_id | ID of the record being evaluated | string |  |
| debug | Flag that enables debug output during execution | boolean | false |

In [None]:
from ibm_watson_openscale.callbacks.langchain import WatsonxGovCallbackHandler

answers = []
record_level_metrics = []

for input_row in sources:
    input_text = input_row["input_text"]
    reference_summaries = {
        "reference_summary_1": input_row["reference_summary_1"],
        "reference_summary_2": input_row["reference_summary_2"],
    }
    # A fresh handler per record keeps each record's computed metrics separate
    handler = WatsonxGovCallbackHandler(
        configuration=config_json,
        watsonxgov_client=client,
        source={"input_text": input_text},
        reference=reference_summaries,
    )
    result = summarization_chain.run({"input_text": input_text}, callbacks=[handler])
    answers.append(result)
    record_level_metrics.append(handler.computed_metrics)

Evaluating for record c489081c-d20f-408f-b08c-483967df8104
Evaluating for record 01f097c0-d042-4b11-8998-dd3ee6a5d8a4
Evaluating for record 8256d92f-9c03-4b7d-be0a-140c99e9124c
Evaluating for record 1908322f-43f2-41be-b49d-089043e785bd
Evaluating for record 5a99ee1e-07b2-46af-9ca3-d0871b80732d
Evaluating for record 423257b7-b06c-4b06-babb-62e18d5683cd
Evaluating for record 985375ad-7a02-4452-b40a-94c2cce82b8b
Evaluating for record 4361e0e1-dfb8-4322-a030-4dc05fd44489
Evaluating for record fbcee7c9-7830-40eb-8938-268d4eb60b24
Evaluating for record a04f7859-04b7-45b2-b710-a67cf6153e17
Evaluating for record 7f2abdef-458e-43d8-8f17-ca6b15beac30
Evaluating for record bd75e003-9ffb-4f03-b1f7-dfadb1a00b38
Evaluating for record 71ae92a7-84b5-4abb-a6f9-ac4e5612ec36
Evaluating for record c8b2871e-2ab3-4dba-935d-8775a9c30abe
Evaluating for record 297c26e1-1918-4d07-88d7-baef1f029132
Evaluating for record 3bca07e8-cf36-4349-a979-bca2cddd3baa
Evaluating for record f11b65dc-e1c6-4e55-8e2d-7cb2e70306
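
The loop above stops at the first record that raises an error, for example on a network failure. One possible way to make a long run more robust is sketched below. `evaluate_all` and `fake_evaluate` are hypothetical helpers; in the notebook, the per-record function would wrap the handler creation and the chain call.

```python
def evaluate_all(rows, evaluate_record):
    """Run evaluate_record on every row, recording failures instead of stopping."""
    results, failures = [], []
    for index, row in enumerate(rows):
        try:
            results.append(evaluate_record(row))
        except Exception as exc:
            # Keep a placeholder so results stay aligned with the input rows.
            results.append(None)
            failures.append((index, repr(exc)))
    return results, failures

def fake_evaluate(row):
    """Stand-in for the real per-record call; fails on empty input."""
    if not row["input_text"]:
        raise ValueError("empty input_text")
    return {"answer": row["input_text"].upper()}

results, failures = evaluate_all(
    [{"input_text": "first"}, {"input_text": ""}, {"input_text": "third"}],
    fake_evaluate,
)
print(results)
print(failures)
```

Records with a `None` placeholder would need to be filtered out or retried before the metrics are aggregated.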

#### Run this cell to get the combined metrics results

In [None]:
import json
metric_result = WatsonxGovCallbackHandler.aggregate_result(record_level_metrics)
print(json.dumps(metric_result,indent=2))

{
  "flesch": {
    "record_level_metrics": [
      {
        "record_id": "c489081c-d20f-408f-b08c-483967df8104",
        "flesch_reading_ease": 60.65,
        "flesch_kincaid_grade": 9.5
      },
      {
        "record_id": "01f097c0-d042-4b11-8998-dd3ee6a5d8a4",
        "flesch_reading_ease": 44.44,
        "flesch_kincaid_grade": 11.6
      },
      {
        "record_id": "8256d92f-9c03-4b7d-be0a-140c99e9124c",
        "flesch_reading_ease": 32.73,
        "flesch_kincaid_grade": 14.0
      },
      {
        "record_id": "1908322f-43f2-41be-b49d-089043e785bd",
        "flesch_reading_ease": 45.76,
        "flesch_kincaid_grade": 11.1
      },
      {
        "record_id": "5a99ee1e-07b2-46af-9ca3-d0871b80732d",
        "flesch_reading_ease": 44.95,
        "flesch_kincaid_grade": 11.4
      },
      {
        "record_id": "423257b7-b06c-4b06-babb-62e18d5683cd",
        "flesch_reading_ease": 7.56,
        "flesch_kincaid_grade": 17.5
      },
      {
        "record_id": "985375ad

## Step 7 - Display the results <a id="results"></a>

### Metric results for all the records

In [None]:
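# Display results: add the generated answers and one column per record-level metric
results_df = data.copy()
results_df["answer"] = answers
for metric_values in metric_result.values():
    records = metric_values.get("record_level_metrics", [])
    # Collect metric names in first-seen order, skipping the record identifier
    metric_names = dict.fromkeys(k for r in records for k in r if k != "record_id")
    for name in metric_names:
        results_df[name] = [r.get(name) for r in records]
results_df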

Unnamed: 0,input_text,generated_summary,reference_summary_1,reference_summary_2,answer,flesch_reading_ease,flesch_kincaid_grade,bleu,precisions,brevity_penalty,...,normalized_recall,rouge1,rouge2,rougeL,rougeLsum,rouge1_recall,rouge2_recall,rougeL_recall,rougeLsum_recall,sari
0,Scientists have discovered a new species of de...,New bioluminescent fish species found in deep ...,Discovery of deep-sea fish emitting soothing l...,Scientists find new bioluminescent fish specie...,Scientists have found a new species of deep-se...,60.65,9.5,0.000000,"[0.11764705882352941, 0.047619047619047616, 0....",1.000000,...,0.714286,0.2340,0.1304,0.2128,0.2340,0.7333,0.4286,0.6667,0.7333,44.531388
1,An international team of astronomers has ident...,Distant exoplanet\'s water vapor-filled atmosp...,Astronomers identify exoplanet with water vapo...,Discovery of exoplanet with water vapor in its...,An international team of astronomers discovere...,44.44,11.6,0.000000,"[0.125, 0.02531645569620253, 0.012820512820512...",1.000000,...,0.642857,0.2174,0.0667,0.1522,0.1739,0.6667,0.2143,0.4667,0.5333,38.741661
2,Researchers have developed a novel nanotechnol...,New nanotechnology-based cancer treatment demo...,Researchers create cancer treatment using nano...,Innovative cancer treatment utilizing nanotech...,Researchers have created a nanotechnology-base...,32.73,14.0,0.000000,"[0.06451612903225806, 0.010869565217391304, 0....",1.000000,...,0.307692,0.1000,0.0204,0.0800,0.0800,0.3846,0.0833,0.3077,0.3077,36.318806
3,A new app is aiming to reduce food waste by co...,App connects local restaurants with customers ...,New sustainability-focused app facilitates sal...,Initiative to reduce food waste involves app c...,A new app is designed to combat food waste by ...,45.76,11.1,0.000000,"[0.06451612903225806, 0.010869565217391304, 0....",1.000000,...,0.416667,0.1200,0.0204,0.1200,0.1200,0.4615,0.0833,0.4615,0.4615,36.340257
4,Archaeologists have uncovered an ancient city ...,"Ancient city dating back over 4,000 years disc...",Archaeological find in Iraq reveals ancient ci...,"Discovery of 4,000-year-old ancient city in Ku...","Archaeologists have discovered a 4,000-year-ol...",44.95,11.4,0.061415,"[0.1875, 0.0759493670886076, 0.038461538461538...",1.000000,...,0.625000,0.3093,0.1474,0.2268,0.2474,0.7895,0.3889,0.5789,0.6316,34.106392
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
57,A new species of bioluminescent fish has been ...,A newly discovered bioluminescent fish species...,A recent deep-sea exploration has uncovered a ...,The discovery of a novel bioluminescent fish s...,A new bioluminescent fish species has been dis...,23.73,13.4,0.093217,"[0.4, 0.11363636363636363, 0.06976744186046512...",1.000000,...,0.419355,0.4304,0.1558,0.3291,0.3797,0.4474,0.1622,0.3421,0.3947,41.200166
58,The AI revolution is changing how we work and ...,AI revolution,The AI revolution is introducing advanced tech...,"As the AI revolution progresses, it is bringin...",The AI revolution is transforming the way we l...,28.64,13.5,0.044539,"[0.1702127659574468, 0.06451612903225806, 0.03...",1.000000,...,0.705882,0.2667,0.1165,0.2286,0.2095,0.7368,0.3333,0.6316,0.5789,54.199514
59,Visit httr://ibm.com for more information on I...,Visit httr://ibm.com for the latest informatio...,To explore IBMÕs cutting-edge technologies and...,For comprehensive details on IBMÕs offerings a...,The input text provides a link to IBM's websit...,38.32,11.9,0.000000,"[0.3142857142857143, 0.08823529411764706, 0.03...",1.000000,...,0.360000,0.3692,0.0952,0.2154,0.1846,0.3871,0.1000,0.2258,0.1935,30.666754
60,Verify the IP address settings,Check the IP address configuration,Check the IP address settings,Inspect IP address configuration,"\n1. To verify the IP address settings, open t...",67.15,7.0,0.026196,"[0.046511627906976744, 0.03529411764705882, 0....",1.000000,...,0.750000,0.1176,0.0909,0.1176,0.1176,0.8000,0.7500,0.8000,0.8000,60.105820
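
Beyond the per-record table, a single average per metric is often easier to compare between runs. The sketch below averages one metric over a dictionary shaped like the aggregated result printed in Step 6. `sample_result` holds only the first three Flesch records from that output, and `average_metric` is an illustrative helper rather than part of the watsonx.governance SDK.

```python
from statistics import mean

# Same shape as the aggregated result in Step 6; values copied from its first three records.
sample_result = {
    "flesch": {
        "record_level_metrics": [
            {"record_id": "c489081c-d20f-408f-b08c-483967df8104",
             "flesch_reading_ease": 60.65, "flesch_kincaid_grade": 9.5},
            {"record_id": "01f097c0-d042-4b11-8998-dd3ee6a5d8a4",
             "flesch_reading_ease": 44.44, "flesch_kincaid_grade": 11.6},
            {"record_id": "8256d92f-9c03-4b7d-be0a-140c99e9124c",
             "flesch_reading_ease": 32.73, "flesch_kincaid_grade": 14.0},
        ]
    }
}

def average_metric(result, group, name):
    """Mean of one record-level metric, rounded to two decimals."""
    records = result[group]["record_level_metrics"]
    values = [r[name] for r in records if name in r]
    return round(mean(values), 2)

print(average_metric(sample_result, "flesch", "flesch_reading_ease"))
print(average_metric(sample_result, "flesch", "flesch_kincaid_grade"))
```

With the full notebook output, the same function could be applied to `metric_result` directly.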
