# Evaluate a RAG application using LangChain and watsonx.governance

This notebook demonstrates how to build a Retrieval-Augmented Generation (RAG) application using LangChain and watsonx.ai, and how to evaluate that application using the watsonx.governance callback handler.

## Learning goals

- Read data into a vector database
- Initialize foundation model
- Generate RAG responses
- Configure and compute metrics


**Note:** Search for `<EDIT THIS>` and provide the inputs.

**Run the notebook in an environment with more than 4 GB of memory.**

## Contents

- [Step 1 - Setup](#setup)
- [Step 2 - Read and store data in a vector database](#data)
- [Step 3 - Initialize a foundation model using `watsonx.ai`](#model)
- [Step 4 - Create the prompt and inputs for the prompt template](#predict)
- [Step 5 - Configure the `watsonx.governance` metrics](#config)
- [Step 6 - Run the LLMChain to generate response and compute the watsonx.governance metrics using callback](#compute)
- [Step 7 - Display the results](#results)

## Step 1 - Setup <a id="setup"></a>

### Install the necessary libraries

In [None]:
!pip install ibm-metrics-plugin~=5.0.3.0
!pip install -U ibm-watson-openscale
!pip install -U ibm-watsonx-ai
!pip install nest_asyncio unitxt torch==2.1.0 
!pip install "langchain==0.0.345" 
!pip install wget 
!pip install sentence-transformers 
!pip install "chromadb==0.3.26" 
!pip install "pydantic==1.10.0" 

import warnings
warnings.filterwarnings("ignore")

**Note**: you may need to restart the kernel to use updated libraries.

### Configure your credentials

In [1]:
# CPD credentials
credentials = {
    "url": "<EDIT THIS>",
    "username": "<EDIT THIS>",
    "password" : "<EDIT THIS>",
    "instance_id": "openshift",
    "apikey": "<EDIT THIS>",
    "version" : "5.0"
}
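
Hard-coded secrets in a notebook are easy to leak when the notebook is shared. A minimal sketch, assuming the values are supplied through environment variables: the variable names `CPD_URL`, `CPD_USERNAME`, `CPD_PASSWORD`, and `CPD_APIKEY` and both helper functions are hypothetical and not part of any IBM SDK.

```python
import os


def load_credentials(env=os.environ):
    """Build the credentials dictionary from environment variables (hypothetical names)."""
    return {
        "url": env.get("CPD_URL", "<EDIT THIS>"),
        "username": env.get("CPD_USERNAME", "<EDIT THIS>"),
        "password": env.get("CPD_PASSWORD", "<EDIT THIS>"),
        "instance_id": "openshift",
        "apikey": env.get("CPD_APIKEY", "<EDIT THIS>"),
        "version": "5.0",
    }


def unset_fields(creds):
    """Return the keys that still hold the <EDIT THIS> placeholder."""
    return [key for key, value in creds.items() if value == "<EDIT THIS>"]


# A plain dict stands in for os.environ so the example runs anywhere
example = load_credentials({"CPD_URL": "https://cpd.example.com", "CPD_USERNAME": "admin"})
print(unset_fields(example))
```

Checking for leftover placeholders before creating any client gives an early, readable failure instead of an authentication error later in the notebook.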

### Configure your project id
Provide the project id, which supplies the context needed to run inference against the watsonx.ai model.

***Hint***: You can find the `project_id` as follows. Open the prompt lab in watsonx.ai. At the very top of the UI, there will be "Projects / *project name* /". Click on the "*project name*" link, then get the `project_id` from the project's "Manage" tab ("Project -> Manage -> General -> Details").

In [2]:
project_id = "<EDIT THIS>"

## Step 2 - Read and store data in a vector database <a id="data"></a>

### Read the data

Download the sample "State of the Union" file.

In [3]:
import wget
import os

data = 'state_of_the_union.txt'
url = 'https://raw.github.com/IBM/watson-machine-learning-samples/master/cloud/data/foundation_models/state_of_the_union.txt'

if not os.path.isfile(data):
    wget.download(url, out=data)

### Prepare the data for the vector database

Take the `state_of_the_union.txt` speech content data and split it into chunks. 

In [4]:
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

loader = TextLoader(data)
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
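
To make the chunking step concrete, the sketch below greedily packs paragraph-sized pieces into chunks of at most `chunk_size` characters with no overlap. It is a simplified illustration of the idea, not LangChain's `CharacterTextSplitter` implementation; `split_into_chunks` is a hypothetical helper.

```python
def split_into_chunks(text, chunk_size=1000, separator="\n\n"):
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters."""
    chunks = []
    current = ""
    for piece in text.split(separator):
        if not piece.strip():
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # The piece fits, or it is an oversized piece that has to stand alone
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
chunks = split_into_chunks(sample, chunk_size=40)
print(chunks)
```

Smaller chunks give the retriever more precise passages but less surrounding context per passage, which is the trade-off behind the `chunk_size` setting.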

### Create an embedding function to store the data in a vector database

Embed the chunked data using an open-source embedding model and load it into Chromadb, a vector database.

**Note**: You can also provide a custom embedding function to be used by Chromadb; the performance of Chromadb may differ depending on the embedding model used.

In [None]:
from langchain.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings()
docsearch = Chroma.from_documents(texts, embeddings)

## Step 3 - Initialize a foundation model using `watsonx.ai`
<a id="model"></a>

IBM watsonx foundation models are among the <a href="https://python.langchain.com/docs/integrations/llms/watsonxllm" target="_blank" rel="noopener noreferrer">list of LLM models supported by LangChain</a>. This example shows how to communicate with <a href="https://newsroom.ibm.com/2023-09-28-IBM-Announces-Availability-of-watsonx-Granite-Model-Series,-Client-Protections-for-IBM-watsonx-Models" target="_blank" rel="noopener noreferrer">the Granite Model Series</a> using <a href="https://python.langchain.com/docs/get_started/introduction" target="_blank" rel="noopener noreferrer">LangChain</a>.

### Define a model
Specify a `model_id` that will be used for inferencing:

In [6]:
from ibm_watsonx_ai.foundation_models.utils.enums import ModelTypes
model_id = ModelTypes.GRANITE_13B_CHAT_V2

### Define the model parameters
Provide a set of model parameters that will influence the result:

In [7]:
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from ibm_watsonx_ai.foundation_models.utils.enums import DecodingMethods

parameters = {
    GenParams.DECODING_METHOD: DecodingMethods.GREEDY.value,
    GenParams.MIN_NEW_TOKENS: 1,
    GenParams.MAX_NEW_TOKENS: 100,
    GenParams.STOP_SEQUENCES: ["<|endoftext|>"]
}

### Create watsonx model
Initialize the `ibm/granite-13b-chat-v2` model from watsonx.ai with the required parameters.

In [8]:
from ibm_watsonx_ai.foundation_models import Model

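# Note: the model uses the inline params below; the `parameters`
# dictionary defined in the previous cell is not passed here.
watsonx_model = Model(
    model_id=model_id,
    params={
        "decoding_method": "sample",
        "max_new_tokens": 10,
        "min_new_tokens": 0,
        "temperature": 0.0
    },
    # Use the `credentials` dictionary defined in Step 1
    credentials=credentials,
    project_id=project_id)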



## Step 4 - Create the prompt and inputs for the prompt template
<a id="predict"></a>

### Create questions and contexts data

In [9]:
query1 = "What is ARPA-H?"
query2 = "What is the investment of Ford and GM to build electric vehicles?"
query3 = "What is the proposed tax rate for corporations?"
query4 = "What is Intel going to build?"
query5 = "How many new manufacturing jobs are created last year?"
query6 = "How many electric vehicle charging stations are built?"

questions = [query1 , query2, query3, query4, query5, query6]

In [10]:
contexts = []
for query in questions:
    # Retrieve the relevant context for each question from the vector db
    docs = docsearch.as_retriever().get_relevant_documents(query)

    # Extract the text of each retrieved chunk
    context = [doc.page_content for doc in docs]

    # Capture the context
    contexts.append(context)
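
Conceptually, the retrieval in the loop above ranks stored chunk embeddings by their similarity to the query embedding and returns the top few. The following toy sketch uses hand-written three-dimensional vectors in place of real embeddings and assumes cosine similarity; the distance measure the vector database actually applies depends on its configuration, and `top_k` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=4):
    """Return the texts of the k stored chunks most similar to the query vector."""
    ranked = sorted(store, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy (text, embedding) pairs; real embeddings have hundreds of dimensions
toy_store = [
    ("chunk about chips", [0.9, 0.1, 0.0]),
    ("chunk about taxes", [0.1, 0.9, 0.0]),
    ("chunk about health", [0.0, 0.2, 0.9]),
]
nearest = top_k([1.0, 0.0, 0.1], toy_store, k=2)
print(nearest)
```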

### Construct a dataframe with the questions and contexts to be used for metrics computation

In [11]:
import pandas as pd
data = pd.DataFrame(contexts, columns=["context1", "context2", "context3", "context4"])
data["question"] = questions
data

| | context1 | context2 | context3 | context4 | question |
|:-|:-|:-|:-|:-|:-|
| 0 | Last month, I announced our plan to supercharg... | For that purpose we’ve mobilized American grou... | If you travel 20 miles east of Columbus, Ohio,... | But cancer from prolonged exposure to burn pit... | What is ARPA-H? |
| 1 | So let’s not wait any longer. Send it to my de... | If you travel 20 miles east of Columbus, Ohio,... | When we use taxpayer dollars to rebuild Americ... | It is going to transform America and put us on... | What is the investment of Ford and GM to build... |
| 2 | My plan will cut the cost in half for most fam... | We got more than 130 countries to agree on a g... | And unlike the $2 Trillion tax cut passed in t... | We’re going after the criminals who stole bill... | What is the proposed tax rate for corporations? |
| 3 | If you travel 20 miles east of Columbus, Ohio,... | So let’s not wait any longer. Send it to my de... | When we use taxpayer dollars to rebuild Americ... | It is going to transform America and put us on... | What is Intel going to build? |
| 4 | So let’s not wait any longer. Send it to my de... | If you travel 20 miles east of Columbus, Ohio,... | When we use taxpayer dollars to rebuild Americ... | As Ohio Senator Sherrod Brown says, “It’s time... | How many new manufacturing jobs are created la... |
| 5 | So let’s not wait any longer. Send it to my de... | If you travel 20 miles east of Columbus, Ohio,... | It is going to transform America and put us on... | Vice President Harris and I ran for office wit... | How many electric vehicle charging stations ar... |


In [12]:
df_input = pd.DataFrame(data, columns=["context1", "context2", "context3", "context4", "question"])

sources = df_input[["context1", "context2", "context3", "context4", "question"]].to_dict(orient='records')

### Create the prompt template and prompt variable

In [13]:
from langchain import PromptTemplate

rag_prompt_text = """
Based on the contexts provided, answer the question. Provide the answer in a complete sentence.

{context1}

{context2}

{context3}

{context4}

Question : {question}
Answer: 
"""

rag_prompt = PromptTemplate(
    input_variables=["context1","context2","context3","context4","question"],
    template=rag_prompt_text
)
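
To see what the model actually receives, the sketch below fills the same template text with plain `str.format`, which substitutes named placeholders in much the same way as the default f-string style of `PromptTemplate`. The record contents are made-up placeholders, not retrieved contexts.

```python
rag_prompt_text = """
Based on the contexts provided, answer the question. Provide the answer in a complete sentence.

{context1}

{context2}

{context3}

{context4}

Question : {question}
Answer: 
"""

# Made-up record with the five keys the template expects
record = {
    "context1": "First retrieved chunk.",
    "context2": "Second retrieved chunk.",
    "context3": "Third retrieved chunk.",
    "context4": "Fourth retrieved chunk.",
    "question": "What is ARPA-H?",
}

rendered = rag_prompt_text.format(**record)
print(rendered)
```

A missing key raises `KeyError` at format time, so every record passed to the chain must carry all five variables.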

## Step 5 - Configure the `watsonx.governance` metrics
<a id="config"></a>

Configure the required metrics

In [14]:
from ibm_metrics_plugin.metrics.llm.utils.constants import LLMTextMetricGroup, LLMCommonMetrics, LLMQAMetrics, LLMRAGMetrics

# Edit below values based on the input data
context_columns = ["context1", "context2", "context3", "context4"]
question_column = "question"
answer_column = "answer"

config_json = {
    "configuration": {
        "context_columns": context_columns,
        "question_column": question_column,
        "record_level": True,
        LLMTextMetricGroup.RAG.value: {
            LLMCommonMetrics.FAITHFULNESS.value: {},
            LLMCommonMetrics.ANSWER_RELEVANCE.value: {},
            LLMQAMetrics.UNSUCCESSFUL_REQUESTS.value: {},
            LLMRAGMetrics.RETRIEVAL_QUALITY.value: {},
            LLMCommonMetrics.CONTENT_ANALYSIS.value: {},
            LLMCommonMetrics.UNSUCCESSFUL_REQUESTS.value: {
                # "unsuccessful_phrases": []
            }
        }
    }
}
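
The column names in the configuration must match the keys of each input record, otherwise the metrics have nothing to read. A small sanity check along these lines can catch a mismatch before any model call is made; `missing_columns` is a hypothetical helper, not part of the watsonx.governance SDK.

```python
def missing_columns(record, context_columns, question_column):
    """Return the configured column names that are absent from one input record."""
    required = list(context_columns) + [question_column]
    return [name for name in required if name not in record]


context_columns = ["context1", "context2", "context3", "context4"]
question_column = "question"

# Hypothetical record that lacks context4
incomplete_record = {
    "context1": "a",
    "context2": "b",
    "context3": "c",
    "question": "What is ARPA-H?",
}
missing = missing_columns(incomplete_record, context_columns, question_column)
print(missing)
```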

### Create watsonx.governance client 

In [15]:
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

from ibm_watson_openscale import *
from ibm_watson_openscale.supporting_classes.enums import *
from ibm_watson_openscale.supporting_classes import *

authenticator = CloudPakForDataAuthenticator(
        url=credentials['url'],
        username=credentials['username'],
        apikey=credentials['apikey'],
        disable_ssl_verification=True,
    )

client = APIClient(service_url=credentials['url'],authenticator=authenticator)

print(client.version)

3.0.41


## Step 6 - Run the LLMChain to generate response and compute the watsonx.governance metrics using callback <a id="compute"></a>

#### Initialize LLMChain

In [16]:
from langchain.chains import LLMChain
rag_chain = LLMChain(llm=watsonx_model.to_langchain(), prompt=rag_prompt)

#### WatsonxGovCallbackHandler parameters
| Parameter | Description | Type | Default Value  |
|:-|:-|:-|:-|
| configuration* | Configuration of the metrics to be evaluated | dictionary |  |
| watsonxgov_client* | watsonx.governance client object |  |  |
| source | The context from which the model answers the question | dictionary |  |
| reference | The reference for the response generated by the model | dictionary |  |
| record_id | Record id of the record being evaluated | string |  |
| debug | Flag that enables debug output during execution | boolean | false |

In [17]:
from ibm_watson_openscale.callbacks.langchain import WatsonxGovCallbackHandler

answers=[]
record_level_metrics=[]

for input_text in sources:
    handler=WatsonxGovCallbackHandler(configuration=config_json, watsonxgov_client=client, source=input_text)
    result=rag_chain.run(input_text, callbacks=[handler])
    answers.append(result)
    record_level_metrics.append(handler.computed_metrics)

Evaluating for record 8a9dc4c0-4bde-43a8-b4e9-7809b264e18e


[nltk_data] Downloading package punkt_tab to /home/wsuser/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


Evaluating for record 195d226b-a4ea-47ee-a04c-16781c35231e
Evaluating for record a6862cef-131d-47f1-94b8-997034988785
Evaluating for record f94f253f-4b90-4fe7-a0c3-8f102e5ab344
Evaluating for record 956c43be-f7b1-42ff-a5e5-e1470c89bb5a
Evaluating for record 68c4793c-d2a0-45ae-a4c2-ba2a1e3d77c1


#### Run this cell to get the combined metrics results

In [18]:
import json
metric_result = WatsonxGovCallbackHandler.aggregate_result(record_level_metrics)
print(json.dumps(metric_result,indent=2))

{
  "answer_relevance": {
    "record_level_metrics": [
      {
        "answer_relevance": 0.3717,
        "record_id": "db482f2d-12b8-40f8-9df3-c0e35801fc90"
      },
      {
        "answer_relevance": 0.0108,
        "record_id": "abb973be-c3f5-4001-a4df-9d61b7ea3c2a"
      },
      {
        "answer_relevance": 0.5363,
        "record_id": "9f1b8cb8-9859-4a2a-8fc0-12c1dd311f0f"
      },
      {
        "answer_relevance": 0.993,
        "record_id": "a2237b85-c204-4b81-8934-d37c986db194"
      },
      {
        "answer_relevance": 0.9903,
        "record_id": "9371b36e-a165-4e54-9e14-25d7a8d53ec6"
      },
      {
        "answer_relevance": 0.0036,
        "record_id": "d79771d2-2e63-4bfa-affc-4c925b0b03dd"
      }
    ],
    "metric_value": 0.48428333333333334,
    "mean": 0.48428333333333334,
    "min": 0.0036,
    "max": 0.993,
    "std": 0.4052713470284104
  },
  "faithfulness": {
    "record_level_metrics": [
      {
        "faithfulness": 0.9847,
        "faithfulness_att
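
The aggregate fields in the output above can be checked by hand from the record-level values. The sketch below recomputes them for `answer_relevance` with the standard library. The printed `std` is consistent with a population standard deviation (dividing by n rather than n - 1); this is an observation about the numbers shown, not a statement about how `aggregate_result` is implemented.

```python
import statistics

# Record-level answer_relevance values copied from the output above
scores = [0.3717, 0.0108, 0.5363, 0.993, 0.9903, 0.0036]

summary = {
    "mean": statistics.fmean(scores),
    "min": min(scores),
    "max": max(scores),
    # Population standard deviation: divides by n, not n - 1
    "std": statistics.pstdev(scores),
}
print(summary)
```

With only six records and a spread this wide, the mean alone hides the fact that two answers scored close to zero, which is why the record-level view in the next step is worth reading.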

## Step 7 - Display the results <a id="results"></a>

### Metric results for all the records

In [19]:
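# Display results: add the answers and every record-level metric as columns
results_df = data.copy()
results_df['answer'] = answers
for k, v in metric_result.items():
    # Some entries may carry no record-level values; treat those as empty
    records = v.get("record_level_metrics") or []
    # Metric names in first-seen order, skipping the record id
    names = dict.fromkeys(m for rm in records for m in rm if m != "record_id")
    for m in names:
        results_df[m] = [r.get(m) for r in records]
results_df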

| | context1 | context2 | context3 | context4 | question | answer | answer_relevance | faithfulness | faithfulness_attributions | average_precision | context_relevance | context_relevances | hit_rate | ndcg | reciprocal_rank | retrieval_precision | coverage | density | abstractness | unsuccessful_requests |
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
| 0 | Last month, I announced our plan to supercharg... | For that purpose we’ve mobilized American grou... | If you travel 20 miles east of Columbus, Ohio,... | But cancer from prolonged exposure to burn pit... | What is ARPA-H? | \nA. The Advanced Research Projects Agency for... | 0.3717 | 0.9847 | [{'attributions': [{'faithfulness_scores': [0.... | 0.8 | 0.8103 | {'context_columns': ['context1', 'context2', '... | 1 | 0.9828 | 1.0 | 0.5 | 0.1245 | 0.5062 | 0.125 | 0 |
| 1 | So let’s not wait any longer. Send it to my de... | If you travel 20 miles east of Columbus, Ohio,... | When we use taxpayer dollars to rebuild Americ... | It is going to transform America and put us on... | What is the investment of Ford and GM to build... | \n$11 billion by Ford and $7 billion | 0.0108 | 0.4419 | [{'attributions': [{'faithfulness_scores': [0.... | 0.0 | 0.2005 | {'context_columns': ['context1', 'context2', '... | 0 | 1.0 | 0.0 | 0.0 | 0.0447 | 0.2593 | 0.1429 | 0 |
| 2 | My plan will cut the cost in half for most fam... | We got more than 130 countries to agree on a g... | And unlike the $2 Trillion tax cut passed in t... | We’re going after the criminals who stole bill... | What is the proposed tax rate for corporations? | \nA 15% minimum tax rate for corporations has | 0.5363 | 0.9056 | [{'attributions': [{'faithfulness_scores': [0.... | 0.0 | 0.1992 | {'context_columns': ['context1', 'context2', '... | 0 | 0.9997 | 0.0 | 0.0 | 0.0637 | 0.7901 | 0.0 | 0 |
| 3 | If you travel 20 miles east of Columbus, Ohio,... | So let’s not wait any longer. Send it to my de... | When we use taxpayer dollars to rebuild Americ... | It is going to transform America and put us on... | What is Intel going to build? | \nIntel is going to build a $20 | 0.993 | 0.3836 | [{'attributions': [{'faithfulness_scores': [0.... | 1.0 | 0.8737 | {'context_columns': ['context1', 'context2', '... | 1 | 1.0 | 1.0 | 0.25 | 0.0907 | 0.2812 | 0.0 | 0 |
| 4 | So let’s not wait any longer. Send it to my de... | If you travel 20 miles east of Columbus, Ohio,... | When we use taxpayer dollars to rebuild Americ... | As Ohio Senator Sherrod Brown says, “It’s time... | How many new manufacturing jobs are created la... | \n369,000 new manufacturing jobs are created last | 0.9903 | 0.7552 | [{'attributions': [{'faithfulness_scores': [0.... | 1.0 | 0.8313 | {'context_columns': ['context1', 'context2', '... | 1 | 0.9971 | 1.0 | 0.25 | 0.032 | 0.3878 | 0.1429 | 0 |
| 5 | So let’s not wait any longer. Send it to my de... | If you travel 20 miles east of Columbus, Ohio,... | It is going to transform America and put us on... | Vice President Harris and I ran for office wit... | How many electric vehicle charging stations ar... | \nThe response is not accurate based on the in... | 0.0036 | 0.042 | [{'attributions': [{'faithfulness_scores': [0.... | 0.0 | 0.5984 | {'context_columns': ['context1', 'context2', '... | 0 | 0.6561 | 0.0 | 0.0 | 0.0682 | 0.0864 | 0.4444 | 0 |
