# Evaluate RAG application using langchain and watsonx.governance

This notebook demonstrates how to build a Retrieval-Augmented Generation (RAG) application with LangChain and watsonx.ai, and how to evaluate that application with the watsonx.governance callback handler.

## Learning goals

- Read data into a vector database
- Initialize foundation model
- Generate RAG responses
- Configure and compute metrics


**Note:** Search for `<EDIT THIS>` and provide the inputs.

**Please run the notebook in an environment with memory greater than 4GB**

## Contents

- [Step 1 - Setup](#setup)
- [Step 2 - Read and store data in a vector database](#data)
- [Step 3 - Initialize a foundation model using `watsonx.ai`](#model)
- [Step 4 - Create the prompt and inputs for the prompt template](#predict)
- [Step 5 - Configure the `watsonx.governance` metrics](#config)
- [Step 6 - Run the LLMChain to generate response and compute the watsonx.governance metrics using callback](#compute)
- [Step 7 - Display the results](#results)

## Step 1 - Setup <a id="setup"></a>

### Install the necessary libraries

In [None]:
!pip install -U "ibm-metrics-plugin~=5.1.0" | tail -n 1
!pip install -U ibm-watson-openscale | tail -n 1
!pip install -U ibm-watsonx-ai | tail -n 1
!pip install nest_asyncio unitxt torch==2.1.0 | tail -n 1
!pip install langchain==0.3.4 | tail -n 1
!pip install langchain-huggingface==0.1.2 | tail -n 1
!pip install wget | tail -n 1
!pip install sentence-transformers | tail -n 1
!pip install chromadb==0.4.13 | tail -n 1
!pip install pydantic | tail -n 1
!pip install langchain-ibm | tail -n 1
!pip install nltk | tail -n 1

import warnings
warnings.filterwarnings("ignore")

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
autoai-libs 2.0.9 requires scikit-learn==1.3.*, but you have scikit-learn 1.5.1 which is incompatible.
autoai-ts-libs 4.0.11 requires scikit-learn==1.3.*, but you have scikit-learn 1.5.1 which is incompatible.
lale 0.8.4 requires scikit-learn<1.5.0,>=1.0.0, but you have scikit-learn 1.5.1 which is incompatible.
Successfully installed boto3-1.34.162 botocore-1.34.162 h5py-3.11.0 ibm-metrics-plugin-5.1.0.9 ibm-wos-utils-5.1.0.7 imageio-2.27.0 jenkspy-0.4.1 more-itertools-10.2.0 pyparsing-3.1.4 retrying-1.3.4 s3transfer-0.10.4 scikit-learn-1.5.1 service-locator-0.1.3 shap-0.45.1 slicer-0.0.8 spark-nlp-5.3.3 threadpoolctl-3.5.0 toolz-0.12.1 transformers-4.39.3
Successfully installed ibm-watsonx-ai-1.1.24
Successfully installed evaluate-0.4.3 ipadic-1.0.0 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-

**Note**: you may need to restart the kernel to use updated libraries.

### Configure your credentials

In [None]:
# CPD credentials
credentials = {
    "url": "<EDIT THIS>",
    "username": "<EDIT THIS>",
    "password" : "<EDIT THIS>",
    "instance_id": "openshift",
    "apikey": "<EDIT THIS>",
    "version" : "5.0"
}
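As an alternative to typing secrets directly into the cell, the same dictionary could be assembled from environment variables. The sketch below is only an illustration: the variable names (`CPD_URL`, `CPD_USERNAME`, `CPD_PASSWORD`, `CPD_APIKEY`) and both helper functions are hypothetical and are not part of any IBM library.

```python
import os

def load_credentials(env=os.environ):
    """Build the credentials dict from environment variables.

    The variable names are hypothetical; pick whatever your environment uses.
    """
    return {
        "url": env.get("CPD_URL"),
        "username": env.get("CPD_USERNAME"),
        "password": env.get("CPD_PASSWORD"),
        "instance_id": "openshift",
        "apikey": env.get("CPD_APIKEY"),
        "version": "5.0",
    }

def missing_fields(creds):
    """Return the keys that were not supplied."""
    return [key for key, value in creds.items() if value is None]

# Toy environment with only two of the four values set
example = load_credentials({"CPD_URL": "https://cpd.example.com", "CPD_USERNAME": "admin"})
print(missing_fields(example))
```

Checking for missing fields before creating any client tends to give a clearer error than a failed authentication call later in the notebook.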

### Configure your project id
Provide the project ID, which supplies the context needed to run inference against the watsonx.ai model.

***Hint***: You can find the `project_id` as follows. Open the prompt lab in watsonx.ai. At the very top of the UI, there will be "Projects / *project name* /". Click on the "*project name*" link, then get the `project_id` from the project's "Manage" tab ("Project -> Manage -> General -> Details").

In [None]:
project_id = "<EDIT THIS>"

## Step 2 - Read and store data in a vector database <a id="data"></a>

### Read the data

Download the sample "State of the Union" file.

In [3]:
import wget
import os

data = 'state_of_the_union.txt'
url = 'https://raw.github.com/IBM/watson-machine-learning-samples/master/cloud/data/foundation_models/state_of_the_union.txt'

if not os.path.isfile(data):
    wget.download(url, out=data)

### Prepare the data for the vector database

Take the `state_of_the_union.txt` speech content data and split it into chunks. 

In [4]:
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

loader = TextLoader(data)
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
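To build intuition for what `chunk_size` controls, here is a simplified, pure-Python sketch of separator-based chunking. It is not LangChain's implementation: `split_into_chunks` is a hypothetical helper that greedily packs paragraph-sized pieces into chunks and ignores overlap entirely, which corresponds to `chunk_overlap=0` above.

```python
def split_into_chunks(text, chunk_size=1000, separator="\n\n"):
    """Greedily pack separator-delimited pieces into chunks of at most
    chunk_size characters. A piece longer than chunk_size is kept whole
    as a single oversized chunk."""
    chunks, current = [], ""
    for piece in text.split(separator):
        if not piece:
            continue
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # Still fits, or this is the first piece of a new chunk
            current = candidate
        else:
            # Close the current chunk and start a new one with this piece
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks

sample = "aaaa\n\nbbbb\n\ncccc\n\ndddddddddddd"
chunks = split_into_chunks(sample, chunk_size=10)
print(chunks)
```

Smaller chunks give the retriever more precise passages to match against, at the cost of more embeddings to compute and store.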

### Create an embedding function to store the data in a vector database

Embed the chunked data using an open-source embedding model and load it into Chromadb, a vector database.

**Note**: You can also provide a custom embedding function to be used by Chromadb; the performance of Chromadb may differ depending on the embedding model used.

In [5]:
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings()
docsearch = Chroma.from_documents(texts, embeddings)
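Conceptually, a vector database stores one embedding per chunk and returns the chunks whose embeddings are closest to the query embedding. The toy sketch below uses cosine similarity over hand-written two-dimensional vectors purely for illustration; the distance function and indexing that Chroma actually uses may differ, and `cosine_similarity` and `top_k` are hypothetical helpers.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=4):
    """Return the indices of the k document vectors most similar to the query."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

# Three toy "document embeddings" and one toy "query embedding"
toy_docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(top_k([1.0, 0.1], toy_docs, k=2))
```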


## Step 3 - Initialize a foundation model using `watsonx.ai`
<a id="model"></a>

IBM watsonx foundation models are among the <a href="https://python.langchain.com/docs/integrations/llms/watsonxllm" target="_blank" rel="noopener no referrer">list of LLM models supported by Langchain</a>. This example shows how to communicate with <a href="https://newsroom.ibm.com/2023-09-28-IBM-Announces-Availability-of-watsonx-Granite-Model-Series,-Client-Protections-for-IBM-watsonx-Models" target="_blank" rel="noopener no referrer">the Granite Model Series</a> using <a href="https://python.langchain.com/docs/get_started/introduction" target="_blank" rel="noopener no referrer">Langchain</a>.

### Define a model
Specify a `model_id` that will be used for inferencing:

In [6]:
model_id = "ibm/granite-3-8b-instruct"

### Define the model parameters
Provide a set of model parameters that will influence the result:

In [7]:
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from ibm_watsonx_ai.foundation_models.utils.enums import DecodingMethods

parameters = {
    GenParams.DECODING_METHOD: DecodingMethods.GREEDY.value,
    GenParams.MIN_NEW_TOKENS: 1,
    GenParams.MAX_NEW_TOKENS: 100,
    GenParams.STOP_SEQUENCES: ["<|endoftext|>"]
}
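The decoding method determines how the next token is chosen from the model's scores. The toy sketch below contrasts greedy decoding with temperature sampling over three made-up logits; it only illustrates the idea and does not reflect how watsonx.ai implements decoding. All function names here are hypothetical.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]

def greedy_pick(logits):
    """Greedy decoding: always take the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])

def sample_pick(logits, temperature=1.0, rng=None):
    """Sampling: draw a token index at random, weighted by its probability."""
    rng = rng or random.Random()
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

toy_logits = [2.0, 1.0, 0.1]
print(greedy_pick(toy_logits))                      # index of the largest logit
print(sample_pick(toy_logits, temperature=0.7, rng=random.Random(0)))
```

Greedy decoding is deterministic, which generally makes evaluation runs easier to compare.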

### Create watsonx model
Initialize the model from watsonx.ai with the required parameters, using the `model_id` defined above.

In [8]:
from ibm_watsonx_ai.foundation_models import ModelInference

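watsonx_model = ModelInference(
    model_id=model_id,
    # Note: these inline settings are what the model actually uses;
    # the `parameters` dict defined earlier is not passed in here.
    params={
        "decoding_method": "sample",
        "max_new_tokens": 10,  # short cap, so longer answers are cut off
        "min_new_tokens": 0,
        "temperature": 0.0
    },
    credentials=credentials,
    project_id=project_id)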

## Step 4 - Create the prompt and inputs for the prompt template
<a id="predict"></a>

### Create questions and contexts data

In [9]:
query1 = "What is ARPA-H?"
query2 = "What is the investment of Ford and GM to build electric vehicles?"
query3 = "What is the proposed tax rate for corporations?"
query4 = "What is Intel going to build?"
query5 = "How many new manufacturing jobs are created last year?"
query6 = "How many electric vehicle charging stations are built?"

questions = [query1 , query2, query3, query4, query5, query6]

In [10]:
contexts = []
retriever = docsearch.as_retriever()
for query in questions:
    # Retrieve the relevant context for each question from the vector database
    docs = retriever.invoke(query)

    # Keep only the text of each retrieved document
    context = [doc.page_content for doc in docs]

    # Capture the context
    contexts.append(context)



### Construct a dataframe with the questions and contexts to be used for metrics computation

In [11]:
import pandas as pd
data = pd.DataFrame(contexts, columns=["context1", "context2", "context3", "context4"])
data["question"] = questions
data

|  | context1 | context2 | context3 | context4 | question |
|:-|:-|:-|:-|:-|:-|
| 0 | Last month, I announced our plan to supercharg... | We’re also ready with anti-viral treatments. I... | For that purpose we’ve mobilized American grou... | If you travel 20 miles east of Columbus, Ohio,... | What is ARPA-H? |
| 1 | So let’s not wait any longer. Send it to my de... | If you travel 20 miles east of Columbus, Ohio,... | When we use taxpayer dollars to rebuild Americ... | It is going to transform America and put us on... | What is the investment of Ford and GM to build... |
| 2 | My plan will cut the cost in half for most fam... | We got more than 130 countries to agree on a g... | And unlike the $2 Trillion tax cut passed in t... | We’re going after the criminals who stole bill... | What is the proposed tax rate for corporations? |
| 3 | If you travel 20 miles east of Columbus, Ohio,... | So let’s not wait any longer. Send it to my de... | When we use taxpayer dollars to rebuild Americ... | It is going to transform America and put us on... | What is Intel going to build? |
| 4 | So let’s not wait any longer. Send it to my de... | If you travel 20 miles east of Columbus, Ohio,... | When we use taxpayer dollars to rebuild Americ... | As Ohio Senator Sherrod Brown says, “It’s time... | How many new manufacturing jobs are created la... |
| 5 | So let’s not wait any longer. Send it to my de... | If you travel 20 miles east of Columbus, Ohio,... | It is going to transform America and put us on... | Vice President Harris and I ran for office wit... | How many electric vehicle charging stations ar... |


In [12]:
df_input = pd.DataFrame(data, columns=["context1", "context2", "context3", "context4", "question"])

sources = df_input[["context1", "context2", "context3", "context4", "question"]].to_dict(orient='records')
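Each entry of `sources` is a plain dictionary keyed by the dataframe's column names, one dictionary per question. The sketch below builds the same shape without pandas from toy data; `to_records` is a hypothetical helper shown only to make the structure explicit.

```python
def to_records(contexts, questions):
    """Pair each question with its retrieved contexts, naming the contexts
    context1, context2, ... in retrieval order."""
    records = []
    for context_list, question in zip(contexts, questions):
        record = {f"context{i}": text for i, text in enumerate(context_list, start=1)}
        record["question"] = question
        records.append(record)
    return records

toy_records = to_records([["first passage", "second passage"]], ["What is X?"])
print(toy_records)
```

These keys must line up with the variable names used in the prompt template and with the context and question column names given in the metrics configuration.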

### Create the prompt template and prompt variable

In [13]:
from langchain_core.prompts import PromptTemplate

rag_prompt_text = """
Based on the contexts provided, answer the question. Provide the answer in a complete sentence.

{context1}

{context2}

{context3}

{context4}

Question : {question}
Answer: 
"""

rag_prompt = PromptTemplate(
    input_variables=["context1","context2","context3","context4","question"],
    template=rag_prompt_text
)
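A prompt template of this kind is essentially a string with named placeholders that are filled in per record. The sketch below shows the same idea with Python's built-in string formatting on a shortened toy template; `template_variables` and `render` are hypothetical helpers, not part of LangChain.

```python
from string import Formatter

def template_variables(template):
    """List the placeholder names that appear in a format-style template."""
    return [name for _, name, _, _ in Formatter().parse(template) if name]

def render(template, values):
    """Fill the template, failing early if any placeholder has no value."""
    missing = [name for name in template_variables(template) if name not in values]
    if missing:
        raise KeyError(f"missing prompt variables: {missing}")
    return template.format(**values)

toy_template = "{context1}\n\n{context2}\n\nQuestion : {question}\nAnswer: "
print(template_variables(toy_template))
print(render(toy_template, {"context1": "A", "context2": "B", "question": "Q?"}))
```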

## Step 5 - Configure the `watsonx.governance` metrics
<a id="config"></a>

Configure the required metrics

In [14]:
from ibm_metrics_plugin.metrics.llm.utils.constants import LLMTextMetricGroup, LLMCommonMetrics, LLMQAMetrics, LLMRAGMetrics

# Edit below values based on the input data
context_columns = ["context1", "context2", "context3", "context4"]
question_column = "question"
answer_column = "answer"

config_json = {
    "configuration": {
        "context_columns": context_columns,
        "question_column": question_column,
        "record_level": True,
        LLMTextMetricGroup.RAG.value: {
            LLMCommonMetrics.FAITHFULNESS.value: {},
            LLMCommonMetrics.ANSWER_RELEVANCE.value: {},
            LLMQAMetrics.UNSUCCESSFUL_REQUESTS.value: {},
            LLMRAGMetrics.RETRIEVAL_QUALITY.value: {},
            LLMCommonMetrics.CONTENT_ANALYSIS.value: {},
            LLMCommonMetrics.UNSUCCESSFUL_REQUESTS.value: {
                # "unsuccessful_phrases": []
            }
        }
    }
}
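Because the configuration refers to columns by name, it can help to confirm that every input record actually carries those columns before running the evaluation. The check below is a minimal sketch; `missing_columns` is a hypothetical helper and not part of the metrics plugin.

```python
def missing_columns(record, context_columns, question_column):
    """Return the configured columns that are absent or empty in a record."""
    required = [*context_columns, question_column]
    missing = []
    for column in required:
        value = record.get(column)
        if value is None or not str(value).strip():
            missing.append(column)
    return missing

toy_record = {"context1": "some text", "context2": "", "question": "Q?"}
print(missing_columns(toy_record, ["context1", "context2", "context3"], "question"))
```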

### Create watsonx.governance client 

In [15]:
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

from ibm_watson_openscale import *
from ibm_watson_openscale.supporting_classes.enums import *
from ibm_watson_openscale.supporting_classes import *

authenticator = CloudPakForDataAuthenticator(
        url=credentials['url'],
        username=credentials['username'],
        apikey=credentials['apikey'],
        disable_ssl_verification=True,
    )

client = APIClient(service_url=credentials['url'], authenticator=authenticator)

print(client.version)

3.0.41


## Step 6 - Run the LLMChain to generate response and compute the watsonx.governance metrics using callback <a id="compute"></a>

#### Initialize LLMChain

In [16]:
from langchain.chains import LLMChain
rag_chain = LLMChain(llm=watsonx_model.to_langchain(), prompt=rag_prompt)

#### WatsonxGovCallbackHandler parameters
| Parameter | Description | Type | Default Value |
|:-|:-|:-|:-|
| configuration* | Configuration of the metrics to be evaluated | dictionary |  |
| watsonxgov_client* | watsonx.governance client object |  |  |
| source | The context from which the model answers the question | dictionary |  |
| reference | The reference for the response generated by the model | dictionary |  |
| record_id | ID of the record being evaluated | string |  |
| debug | Flag that enables debug output during execution | boolean | false |
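The general callback pattern can be pictured as follows: the chain produces an answer, then notifies each registered handler, and the handler stores whatever it computes on itself for later collection. The sketch below is a plain-Python stand-in with a toy word-overlap metric. `RecordingHandler` and `run_chain` are hypothetical and do not reproduce the `WatsonxGovCallbackHandler` or LangChain APIs.

```python
class RecordingHandler:
    """Hypothetical stand-in for a callback handler (not the real API)."""

    def __init__(self, source):
        self.source = source
        self.computed_metrics = None

    def on_chain_end(self, output):
        # Toy metric: fraction of answer words that also occur in the contexts
        context_text = " ".join(
            str(value) for key, value in self.source.items() if key != "question"
        )
        context_words = set(context_text.lower().split())
        answer_words = output.lower().split()
        overlap = sum(word in context_words for word in answer_words)
        score = overlap / len(answer_words) if answer_words else 0.0
        self.computed_metrics = {"word_overlap": score}

def run_chain(inputs, generate, callbacks=()):
    """Generate an answer, then notify every callback with the output."""
    output = generate(inputs)
    for callback in callbacks:
        callback.on_chain_end(output)
    return output

source = {"context1": "Intel will build a factory in Ohio",
          "question": "What is Intel going to build?"}
handler = RecordingHandler(source)
answer = run_chain(source, lambda _: "Intel will build a factory", callbacks=[handler])
print(answer, handler.computed_metrics)
```

Creating one handler per record, as the notebook does, keeps each record's metrics separate so they can be appended to a list and aggregated afterwards.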

In [17]:
from ibm_watson_openscale.callbacks.langchain import WatsonxGovCallbackHandler

answers=[]
record_level_metrics=[]

for input_text in sources:
    handler=WatsonxGovCallbackHandler(configuration=config_json, watsonxgov_client=client, source=input_text)
    result=rag_chain.run(input_text, callbacks=[handler])
    answers.append(result)
    record_level_metrics.append(handler.computed_metrics)

Evaluating for record fbde2013-a9a4-461b-b440-5c5a1c976fb6


[nltk_data] Downloading package punkt_tab to /home/wsuser/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.
2024-11-26T06:31:52.160362 [MDLMN:ERRR] <COR80419785E> exception raised: FileNotFoundError('Module load path `/opt/ibm/nlpmodels/classification_transformer_en_slate.125m.answer-relevance` does not contain a `config.yml` file.')
2024-11-26T06:31:52.162325 [MDLMN:ERRR] <COR80419785E> exception raised: FileNotFoundError('Module load path `/opt/ibm/nlpmodels/classification_transformer_en_slate.125m.groundedness` does not contain a `config.yml` file.')
2024-11-26T06:31:52.163498 [MDLMN:ERRR] <COR80419785E> exception raised: FileNotFoundError('Module load path `/opt/ibm/nlpmodels/classification_transformer_en_slate.125m.context-relevance` does not contain a `config.yml` file.')
2024-11-26T06:32:13.487677 [MDLMN:ERRR] <COR80419785E> exception raised: FileNotFoundError('Module load path `/opt/ibm/nlpmodels/classification_transformer_en_slate.125m.answer-relevance` does not c

Evaluating for record 9a2f9a56-907b-4be0-ad96-3bb9663f4d87


2024-11-26T06:32:34.581349 [MDLMN:ERRR] <COR80419785E> exception raised: FileNotFoundError('Module load path `/opt/ibm/nlpmodels/classification_transformer_en_slate.125m.answer-relevance` does not contain a `config.yml` file.')
2024-11-26T06:32:34.582918 [MDLMN:ERRR] <COR80419785E> exception raised: FileNotFoundError('Module load path `/opt/ibm/nlpmodels/classification_transformer_en_slate.125m.groundedness` does not contain a `config.yml` file.')
2024-11-26T06:32:34.583667 [MDLMN:ERRR] <COR80419785E> exception raised: FileNotFoundError('Module load path `/opt/ibm/nlpmodels/classification_transformer_en_slate.125m.context-relevance` does not contain a `config.yml` file.')


Evaluating for record fa22c572-ef60-45aa-acf7-2ebc6f8d6d1f


2024-11-26T06:32:55.696750 [MDLMN:ERRR] <COR80419785E> exception raised: FileNotFoundError('Module load path `/opt/ibm/nlpmodels/classification_transformer_en_slate.125m.answer-relevance` does not contain a `config.yml` file.')
2024-11-26T06:32:55.697724 [MDLMN:ERRR] <COR80419785E> exception raised: FileNotFoundError('Module load path `/opt/ibm/nlpmodels/classification_transformer_en_slate.125m.groundedness` does not contain a `config.yml` file.')
2024-11-26T06:32:55.698353 [MDLMN:ERRR] <COR80419785E> exception raised: FileNotFoundError('Module load path `/opt/ibm/nlpmodels/classification_transformer_en_slate.125m.context-relevance` does not contain a `config.yml` file.')


Evaluating for record 890fffaa-fc7a-46de-a16e-944e76d74407


2024-11-26T06:33:16.731228 [MDLMN:ERRR] <COR80419785E> exception raised: FileNotFoundError('Module load path `/opt/ibm/nlpmodels/classification_transformer_en_slate.125m.answer-relevance` does not contain a `config.yml` file.')
2024-11-26T06:33:16.732532 [MDLMN:ERRR] <COR80419785E> exception raised: FileNotFoundError('Module load path `/opt/ibm/nlpmodels/classification_transformer_en_slate.125m.groundedness` does not contain a `config.yml` file.')
2024-11-26T06:33:16.733309 [MDLMN:ERRR] <COR80419785E> exception raised: FileNotFoundError('Module load path `/opt/ibm/nlpmodels/classification_transformer_en_slate.125m.context-relevance` does not contain a `config.yml` file.')


Evaluating for record 5abbd438-516a-4c6a-b262-16767ba0fddc


2024-11-26T06:33:37.762762 [MDLMN:ERRR] <COR80419785E> exception raised: FileNotFoundError('Module load path `/opt/ibm/nlpmodels/classification_transformer_en_slate.125m.answer-relevance` does not contain a `config.yml` file.')
2024-11-26T06:33:37.764416 [MDLMN:ERRR] <COR80419785E> exception raised: FileNotFoundError('Module load path `/opt/ibm/nlpmodels/classification_transformer_en_slate.125m.groundedness` does not contain a `config.yml` file.')
2024-11-26T06:33:37.765236 [MDLMN:ERRR] <COR80419785E> exception raised: FileNotFoundError('Module load path `/opt/ibm/nlpmodels/classification_transformer_en_slate.125m.context-relevance` does not contain a `config.yml` file.')


Evaluating for record 8d0131af-054f-4222-b6b4-9a7e58e8faa1


#### Run this cell to get the combined metrics results

In [18]:
import json
metric_result = WatsonxGovCallbackHandler.aggregate_result(record_level_metrics)
print(json.dumps(metric_result,indent=2))

{
  "answer_relevance": {
    "record_level_metrics": [
      {
        "answer_relevance": 0.9937,
        "record_id": "af402d61-e77d-4985-b2dd-56f977a66c86"
      },
      {
        "answer_relevance": 0.2211,
        "record_id": "2c93eeea-7a11-4407-92c9-c264589beb8e"
      },
      {
        "answer_relevance": 0.9956,
        "record_id": "dee60774-90f6-4561-bff9-555a796fe7bb"
      },
      {
        "answer_relevance": 0.9903,
        "record_id": "8db499da-d915-4aaf-978f-172007624106"
      },
      {
        "answer_relevance": 0.0063,
        "record_id": "50f36916-01a8-42f5-8eec-af19e2e8d264"
      },
      {
        "answer_relevance": 0.0036,
        "record_id": "4b7ae67a-a69e-47ac-be4a-f2d0e2a64d3d"
      }
    ],
    "metric_value": 0.5351,
    "mean": 0.5351,
    "min": 0.0036,
    "max": 0.9956,
    "std": 0.4637346475446204
  },
  "faithfulness": {
    "record_level_metrics": [
      {
        "faithfulness": 0.7348,
        "faithfulness_attributions": [
          

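The `mean`, `min`, `max`, and `std` fields in the aggregated output appear to be ordinary summary statistics over the record-level values. As a sanity check, the sketch below recomputes them for `answer_relevance` from the six values printed above. It assumes a population standard deviation, which is the variant that reproduces the reported value of about 0.4637; this is an inference from the numbers, not a statement about the library's implementation.

```python
import statistics

def aggregate(values):
    """Summarise record-level metric values (population standard deviation)."""
    return {
        "mean": round(statistics.fmean(values), 4),
        "min": min(values),
        "max": max(values),
        "std": statistics.pstdev(values),
    }

# Record-level answer_relevance values taken from the output above
answer_relevance = [0.9937, 0.2211, 0.9956, 0.9903, 0.0063, 0.0036]
summary = aggregate(answer_relevance)
print(summary)
```

The large spread relative to the mean is a hint that the average alone hides a split between well-answered and poorly answered questions, so the record-level view is worth inspecting.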
## Step 7 - Display the results <a id="results"></a>

### Metric results for all the records

In [19]:
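# Display results: one column per record-level metric, alongside the inputs and answers
results_df = data.copy()
results_df['answer'] = answers
for metric_group in metric_result.values():
    records = metric_group.get("record_level_metrics") or []
    # Collect every per-record key except the identifier, in first-seen order
    metric_names = dict.fromkeys(k for r in records for k in r if k != "record_id")
    for name in metric_names:
        results_df[name] = [r.get(name) for r in records]
results_df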

|  | context1 | context2 | context3 | context4 | question | answer | answer_relevance | faithfulness | faithfulness_attributions | average_precision | context_relevance | context_relevances | hit_rate | ndcg | reciprocal_rank | retrieval_precision | coverage | density | abstractness | unsuccessful_requests |
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
| 0 | Last month, I announced our plan to supercharg... | We’re also ready with anti-viral treatments. I... | For that purpose we’ve mobilized American grou... | If you travel 20 miles east of Columbus, Ohio,... | What is ARPA-H? | \nARPA-H, or the Advanced Research | 0.9937 | 0.7348 | [{'attributions': [{'faithfulness_scores': [0.... | 1.0 | 0.8103 | {'context_columns': ['context1', 'context2', '... | 1 | 0.9926 | 1.0 | 0.25 | 0.1017 | 0.3889 | 0.2 | 0 |
| 1 | So let’s not wait any longer. Send it to my de... | If you travel 20 miles east of Columbus, Ohio,... | When we use taxpayer dollars to rebuild Americ... | It is going to transform America and put us on... | What is the investment of Ford and GM to build... | \nFord is investing $11 billion | 0.2211 | 0.9929 | [{'attributions': [{'faithfulness_scores': [0.... | 0.0 | 0.2005 | {'context_columns': ['context1', 'context2', '... | 0 | 1.0 | 0.0 | 0.0 | 0.0342 | 1.0 | 0.2 | 0 |
| 2 | My plan will cut the cost in half for most fam... | We got more than 130 countries to agree on a g... | And unlike the $2 Trillion tax cut passed in t... | We’re going after the criminals who stole bill... | What is the proposed tax rate for corporations? | \nThe proposed tax rate for corporations is | 0.9956 | 0.7354 | [{'attributions': [{'faithfulness_scores': [0.... | 0.0 | 0.1992 | {'context_columns': ['context1', 'context2', '... | 0 | 0.9997 | 0.0 | 0.0 | 0.0889 | 0.3878 | 0.0 | 0 |
| 3 | If you travel 20 miles east of Columbus, Ohio,... | So let’s not wait any longer. Send it to my de... | When we use taxpayer dollars to rebuild Americ... | It is going to transform America and put us on... | What is Intel going to build? | \nIntel is going to build eight state-of | 0.9903 | 0.0467 | [{'attributions': [{'faithfulness_scores': [0.... | 1.0 | 0.8737 | {'context_columns': ['context1', 'context2', '... | 1 | 1.0 | 1.0 | 0.25 | 0.0683 | 0.2857 | 0.0 | 0 |
| 4 | So let’s not wait any longer. Send it to my de... | If you travel 20 miles east of Columbus, Ohio,... | When we use taxpayer dollars to rebuild Americ... | As Ohio Senator Sherrod Brown says, “It’s time... | How many new manufacturing jobs are created la... | \nWe created 369,00 | 0.0063 | 0.2225 | [{'attributions': [{'faithfulness_scores': [0.... | 1.0 | 0.8313 | {'context_columns': ['context1', 'context2', '... | 1 | 0.9971 | 1.0 | 0.25 | 0.0115 | 0.4444 | 0.0 | 0 |
| 5 | So let’s not wait any longer. Send it to my de... | If you travel 20 miles east of Columbus, Ohio,... | It is going to transform America and put us on... | Vice President Harris and I ran for office wit... | How many electric vehicle charging stations ar... | \nThe answer is not explicitly stated in the p... | 0.0036 | 0.272 | [{'attributions': [{'faithfulness_scores': [0.... | 0.0 | 0.5984 | {'context_columns': ['context1', 'context2', '... | 0 | 0.6561 | 0.0 | 0.0 | 0.0814 | 0.1111 | 0.4444 | 0 |
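One way to act on the record-level results is to flag the questions whose scores fall below a chosen cut-off. The sketch below does this on two rows copied from the table above. The 0.5 thresholds are arbitrary values chosen for illustration, and `flag_low_scores` is a hypothetical helper.

```python
def flag_low_scores(rows, thresholds):
    """Return (question, failing_metrics) for rows scoring below any threshold."""
    flagged = []
    for row in rows:
        failing = [
            metric
            for metric, limit in thresholds.items()
            if row.get(metric) is not None and row[metric] < limit
        ]
        if failing:
            flagged.append((row["question"], failing))
    return flagged

rows = [
    {"question": "What is ARPA-H?", "answer_relevance": 0.9937, "faithfulness": 0.7348},
    {"question": "What is Intel going to build?", "answer_relevance": 0.9903, "faithfulness": 0.0467},
]
flagged = flag_low_scores(rows, {"answer_relevance": 0.5, "faithfulness": 0.5})
print(flagged)
```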
