# Retrieval Augmented Generation Evaluation with LangChain and KDB.AI

##### Note: This example requires KDB.AI server. Sign up for a free [KDB.AI account](https://kdb.ai/get-started).

This notebook serves as a guide to utilizing LangChain tooling for evaluating a basic Retrieval Augmented Generation (RAG) system. 

The evaluation process involves employing [LangChain's String Evaluators](https://python.langchain.com/docs/guides/evaluation/string/) to assess both conciseness and correctness. KDB.AI serves as the primary knowledge base, enabling the retrieval of semantically relevant content for the evaluation.

### Aim

In this tutorial, we build upon the retrieval augmented generation pipeline seen in our [retrieval_augmented_generation.ipynb](retrieval_augmented_generation.ipynb) notebook.
If you have not worked through that notebook yet, please do so first, as it covers the RAG setup steps in greater detail than we do here.

This notebook focuses on the evaluation of your retrieval augmented generation using KDB.AI as the vector store.
We will cover the following topics:

1. Load Text Data
1. Define OpenAI Text Embedding Model
1. Store Embeddings In KDB.AI
1. Perform Retrieval Augmented Generation
1. Evaluate Retrieval Augmented Generation
1. Delete the KDB.AI Table

---

## 0. Setup

### Install dependencies 

To run this sample successfully, follow the steps that match where you are running this notebook:

- ***Run Locally / Private Environment:*** The [Setup](https://github.com/KxSystems/kdbai-samples/blob/main/README.md#setup) steps in the repository's `README.md` will guide you on prerequisites and how to run this with Jupyter.

- ***Colab / Hosted Environment:*** Open this notebook in Colab and run through the cells.



In [None]:
!pip install kdbai_client langchain langchain_openai #langchain-community

import os
!git clone -b KDBAI_v1.4 https://github.com/KxSystems/langchain.git
os.chdir('langchain/libs/community')
!pip install .

In [None]:
### !!! Only run this cell if you need to download the data into your environment, for example in Colab
### This downloads State of the Union Speech data
import os

if not os.path.exists("./data/state_of_the_union.txt"):
    !mkdir ./data
    !wget -P ./data https://raw.githubusercontent.com/KxSystems/kdbai-samples/main/retrieval_augmented_generation/data/state_of_the_union.txt

### Import Packages

Load the various libraries that will be needed in this tutorial, including all the langchain libraries we will use.

In [3]:
# vector DB
from getpass import getpass
import kdbai_client as kdbai
import time

In [4]:
# langchain packages
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import KDBAI

In [5]:
# evaluation packages
from langchain.evaluation import load_evaluator

### Set API Keys

To follow this example you will need to request an [OpenAI API Key](https://platform.openai.com/apps). 

You can create one for free by registering via the link above.
Once you have the credentials you can add them below.

In [6]:
os.environ["OPENAI_API_KEY"] = (
    os.environ["OPENAI_API_KEY"]
    if "OPENAI_API_KEY" in os.environ
    else getpass("OpenAI API Key: ")
)

### Define Helper Functions

In [7]:
def print_dict(d: dict) -> None:
    for k, v in d.items():
        print(f"\n{k.capitalize()}\n---\n{v}".replace('\n\n', '\n'))

## 1. Load Text Data

### Read In Text Document

The document we will use for this example is a State of the Union message from the President of the United States to the United States Congress.

In the below code snippet, we read the text file in.

In [8]:
# Load the documents we want to prompt an LLM about
doc = TextLoader("data/state_of_the_union.txt").load()

### Split The Document Into Chunks

We then split this document into chunks.

In [9]:
# Chunk the documents into 500 character chunks using langchain's text splitter "RecursiveCharacterTextSplitter"
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)

In [10]:
# split_documents returns a list of all the chunks created; keep only the text content of each chunk
pages = [p.page_content for p in text_splitter.split_documents(doc)]
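
To build intuition for the `chunk_size` and `chunk_overlap` parameters, here is a minimal sketch of fixed-size character chunking in plain Python. It is a simplification for illustration only: `RecursiveCharacterTextSplitter` also tries to split on separators such as paragraph and line breaks, so its chunks will generally differ. The function name `naive_chunks` is hypothetical and not part of LangChain.

```python
def naive_chunks(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    # Each new window starts this many characters after the previous one
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


sample = "abcdefghij"
print(naive_chunks(sample, chunk_size=4))
print(naive_chunks(sample, chunk_size=4, chunk_overlap=2))
```

With `chunk_overlap=0`, as used in this notebook, no text is repeated between neighbouring chunks.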

## 2. Define OpenAI Text Embedding Model
 
We will use `OpenAIEmbeddings` to embed our document chunks into vectors suitable for the vector database. We select the `text-embedding-3-small` model, which produces 1536-dimensional vectors, for use in the next step.

In [11]:
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

## 3. Store Embeddings In KDB.AI

With the embeddings created, we need to store them in a vector database to enable efficient searching.

### Define KDB.AI Session
To use KDB.AI Server, you will need to download and run your own container.
To do this, you will first need to sign up for free [here](https://trykdb.kx.com/kdbaiserver/signup/).

You will receive an email with the required license file and bearer token needed to download your instance.
Follow instructions in the signup email to get your session up and running.

Once the [setup steps](https://code.kx.com/kdbai/gettingStarted/kdb-ai-server-setup.html) are complete you can then connect to your KDB.AI Server session using `kdbai.Session` and passing your local endpoint.


In [None]:
#Set up KDB.AI server endpoint 
KDBAI_ENDPOINT = (
    os.environ["KDBAI_ENDPOINT"]
    if "KDBAI_ENDPOINT" in os.environ
    else "http://localhost:8082"
)

#connect to KDB.AI Server, default mode is qipc
session = kdbai.Session(endpoint=KDBAI_ENDPOINT)


### Define Vector DB Table Schema

In [15]:
rag_eval_schema = [
    {"name": "id", "type": "str"},
    {"name": "text", "type": "bytes"},
    {"name": "embeddings", "type": "float32s"}
]
indexes = [{"name": "flat_index", "type": "flat", "column": "embeddings", "params": {"dims": 1536, "metric": "L2"}}]
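
A `flat` index with the `L2` metric compares the query vector against every stored vector and ranks the results by Euclidean distance. The sketch below illustrates that idea in plain Python with toy 3-dimensional vectors. It is not KDB.AI's implementation, and `l2_distance` and `flat_search` are hypothetical names chosen for this example.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flat_search(query: list[float], vectors: list[list[float]], k: int) -> list[int]:
    """Exhaustively rank all vectors by distance to the query; return the top-k indices."""
    order = sorted(range(len(vectors)), key=lambda i: l2_distance(query, vectors[i]))
    return order[:k]


# Toy data: the real table stores 1536-dimensional embeddings
stored = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
print(flat_search([1.0, 0.0, 0.0], stored, k=2))
```

The dimension check mirrors why `dims` in the index definition has to match the length of the vectors produced by the embedding model.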

### Create Vector DB Table

Use the KDB.AI `create_table` function to create a table that matches the defined schema in the vector database.

In [16]:
database = session.database("default")
# First ensure the table does not already exist
try:
    database.table("rag_eval").drop()
except kdbai.KDBAIException:
    pass

In [17]:
table = database.create_table("rag_eval", schema=rag_eval_schema, indexes=indexes)

### Add Embedded Data to KDB.AI Table

We can now store our data in KDB.AI by wrapping the table in LangChain's `KDBAI` vector store and calling `add_texts`:

- `table` the KDB.AI table we created above
- `embeddings` the embeddings model we have chosen
- `texts` the chunked document

In [18]:
# use KDBAI as vector store
vecdb_kdbai = KDBAI(table, embeddings)
vecdb_kdbai.add_texts(texts=pages)

['3a39deab-3ca1-457b-a725-192878e2ef3e',
 'b9e62100-9bf3-4d66-9066-e546505346a8',
 'baf3fbeb-1997-4eec-be6a-faec7431380e',
 'de7c3272-38e3-48e0-bbc6-1240cb639430',
 'ee1984d6-3be6-4e16-bb54-8a0d4ba80e36',
 'ac042a18-cb3e-4369-bd7a-114fd77b938a',
 'e4842ca7-d965-44d2-a20e-a23c8710c469',
 'd574a79a-2cdc-41e5-bc5b-56de4796a2da',
 '67f4e777-37e4-4cb6-a2b5-614218593a2a',
 'e3656ced-e86b-400d-8f5b-d706ece9dd70',
 '47e20998-cadc-4d5e-8514-0221821921f6',
 'cf8c9905-dd1b-48e0-8cfb-b3e6cfe41649',
 '9f58e0e9-bd90-4ae4-9961-798b312467c4',
 '2da5b147-45dd-4325-977b-26b9e9c825f0',
 '6744caee-09d6-4aee-a2d0-017c568cbbfd',
 '434d9d87-0e35-4aa7-8ffc-db4bfbc90f16',
 '4c2fc45d-c631-4856-b8ab-61af03d3c41a',
 '536d2df1-40f9-4dab-be85-28b4cb6f79af',
 '750b3adc-17d2-4d6f-836e-1a6382ad2ff2',
 '0b27e61e-638f-442a-9b92-752eb99cf67d',
 'fff6c3a1-94ee-4d36-90a3-42f5405f52ff',
 'a3779604-81bf-4267-aae7-e38c97577ed9',
 '6539774a-07c3-4529-a3e3-f71ee58d4667',
 '0db73c13-5b88-4a48-a8d8-7a94777c3470',
 '8b25f891-30d1-

Now that the vector embeddings are stored in KDB.AI, we are ready to query.

## 4. Perform Retrieval Augmented Generation

We will perform [question answering (QA) in LangChain](https://python.langchain.com/docs/use_cases/question_answering/#go-deeper-4) using `RetrievalQA`.

`RetrievalQA` retrieves the most relevant chunks of text and does QA on that subset.
We will use KDB.AI as the retriever of `RetrievalQA`.

### Define QA Bot

The code below defines a question-answering bot that combines OpenAI's GPT-4o-mini for generating responses and a retriever that accesses the KDB.AI vector database to find relevant information.

In [19]:
K = 10

In [20]:
qabot = RetrievalQA.from_chain_type(
    chain_type="stuff",
    llm=ChatOpenAI(model="gpt-4o-mini", temperature=0.0),
    retriever=vecdb_kdbai.as_retriever(search_kwargs=dict(k=K, index="flat_index")),
    return_source_documents=True,
)

`as_retriever` is a method that converts a vectorstore into a retriever. A retriever is an interface that returns documents given an unstructured query. By using `as_retriever`, we can create a retriever from a vectorstore and use it to retrieve relevant documents for a query. This allows us to perform question answering over the documents indexed by the vectorstore `vecdb_kdbai`.
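
Conceptually, `chain_type="stuff"` means that all retrieved chunks are placed ("stuffed") into a single prompt alongside the question. The sketch below shows the idea with a hypothetical `build_stuff_prompt` helper and a made-up prompt template. LangChain's actual template wording differs, so treat this as an illustration only.

```python
def build_stuff_prompt(question: str, chunks: list[str]) -> str:
    """Concatenate every retrieved chunk into one context block ahead of the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder chunks standing in for what the retriever would return
retrieved = ["Chunk about roads and bridges.", "Chunk about broadband access."]
prompt = build_stuff_prompt("What improvements could be made in infrastructure?", retrieved)
print(prompt)
```

Because every chunk ends up in one prompt, the choice of `k` and the chunk size together determine how much context the LLM receives.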

### Query The QA Bot

In [21]:
def query_qabot(qabot, query: str) -> str:
    query_res = qabot.invoke(dict(query=query))["result"]
    print(f"{query}\n---\n{query_res}")
    return query_res
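
Because the chain was created with `return_source_documents=True`, the response dictionary also carries the retrieved chunks under the `source_documents` key. Below is a sketch of a hypothetical helper, `summarize_sources`, that previews them. It is exercised here with stand-in objects so that it runs without a live chain; with the real chain you would pass in the output of `qabot.invoke(dict(query=query))`.

```python
from types import SimpleNamespace


def summarize_sources(response: dict, width: int = 60) -> list[str]:
    """Return a short preview of each retrieved chunk in a RetrievalQA response."""
    previews = []
    for doc in response.get("source_documents", []):
        text = " ".join(doc.page_content.split())  # collapse newlines and repeated spaces
        previews.append(text[:width])
    return previews


# Stand-in response; a real one comes from qabot.invoke(...)
fake_response = {
    "result": "An answer.",
    "source_documents": [
        SimpleNamespace(page_content="First retrieved\nchunk."),
        SimpleNamespace(page_content="Second retrieved chunk."),
    ],
}
for i, preview in enumerate(summarize_sources(fake_response), start=1):
    print(f"{i}. {preview}")
```

Inspecting the sources is a quick way to tell whether a poor answer stems from retrieval or from generation.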

##### Query 1

In [22]:
query1 = "What improvements could be made in infrastructure?"

In [23]:
res1 = query_qabot(qabot, query1)

What improvements could be made in infrastructure?
---
Improvements that could be made in infrastructure include:

1. Rebuilding and modernizing roads, highways, and bridges to ensure safety and efficiency.
2. Expanding and upgrading public transportation systems to provide better access and reduce congestion.
3. Developing a national network of electric vehicle charging stations to support the transition to electric vehicles.
4. Replacing lead pipes to ensure clean drinking water for all Americans.
5. Providing affordable high-speed internet access to urban, suburban, rural, and tribal communities.
6. Upgrading airports, ports, and waterways to enhance transportation and trade capabilities.
7. Implementing sustainable practices to withstand the effects of climate change and promote environmental justice. 

These improvements aim to enhance the overall infrastructure and support economic growth and competitiveness.


##### Query 2

In [24]:
query2 = "How many jobs were created in the country due to the electric vehicle manufacturing industry?"

In [25]:
res2 = query_qabot(qabot, query2)

How many jobs were created in the country due to the electric vehicle manufacturing industry?
---
Ford is creating 11,000 jobs and GM is creating 4,000 jobs in the electric vehicle manufacturing industry, which totals 15,000 jobs.


## 5. Evaluate Retrieval Augmented Generation

Here we will carry out two evaluation techniques against the results of our retrieval augmented generation pipeline.
We will measure the *Conciseness* and the *Correctness* of the answers.

### Evaluate Conciseness

We will evaluate the conciseness of the answers the QA bot returns using LangChain's `load_evaluator` function with the `criteria` set to `"conciseness"`.

In this example, we use GPT-4o as the LLM that performs the evaluation.

In [26]:
evaluation_llm = ChatOpenAI(model="gpt-4o")

In [27]:
concise_evaluator = load_evaluator(
    "criteria", criteria="conciseness", llm=evaluation_llm
)

In [28]:
concise_eval_res = concise_evaluator.evaluate_strings(prediction=res1, input=query1)

In [29]:
print_dict(concise_eval_res)


Reasoning
---
To determine if the submission meets the criterion of conciseness, we need to assess whether it is brief and to the point. Here is a step-by-step reasoning process:
1. **Identify Key Points**: The submission lists seven specific improvements that could be made in infrastructure:
   - Rebuilding and modernizing roads, highways, and bridges.
   - Expanding and upgrading public transportation systems.
   - Developing a national network of electric vehicle charging stations.
   - Replacing lead pipes.
   - Providing affordable high-speed internet access.
   - Upgrading airports, ports, and waterways.
   - Implementing sustainable practices.
2. **Examine Each Point for Brevity**:
   - Each point is presented in a single sentence.
   - The points are specific and avoid unnecessary elaboration.
3. **Overall Length and Focus**:
   - The list format helps in making the submission concise.
   - The concluding sentence summarizes the purpose of the improvements succinctly: "These i
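
The criteria evaluator returns a dictionary whose keys include `reasoning`, `value` and `score`. As a sketch, a hypothetical helper such as `verdict` below can condense that into a one-line summary. The example uses a hand-written stand-in dictionary rather than a real evaluator response.

```python
def verdict(eval_res: dict, label: str) -> str:
    """Summarize a criteria evaluation result as PASS or FAIL."""
    passed = eval_res.get("score") == 1
    return f"{label}: {'PASS' if passed else 'FAIL'} (value={eval_res.get('value')})"


# Stand-in result with the same keys as the evaluator output
example_res = {"reasoning": "...", "value": "Y", "score": 1}
print(verdict(example_res, "conciseness"))
```

With the real result you would call `verdict(concise_eval_res, "conciseness")`.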

### Evaluate Correctness

We can use the same `load_evaluator` function to calculate correctness by simply changing the `criteria` to `"correctness"`.

When using this option, we can pass a reference for the evaluator to check the correctness against.
Let's pass a reference that matches the information returned as well as one that doesn't.

For this evaluation, we will use the result of the second query we ran through our RAG pipeline.

In [30]:
correct_evaluator = load_evaluator(
    "labeled_criteria",
    criteria="correctness",
    llm=evaluation_llm,
    requires_reference=True,
)

##### Matching Reference

In [31]:
matching_ref = "15000 jobs were created due to manufacturing of electric vehicles."

In [32]:
correct_eval_res1 = correct_evaluator.evaluate_strings(
    prediction=res2, input=query2, reference=matching_ref
)

In [33]:
print_dict(correct_eval_res1)


Reasoning
---
Step-by-step reasoning:
1. **Correctness**: 
   - The submission states that Ford is creating 11,000 jobs and GM is creating 4,000 jobs, which totals 15,000 jobs.
   - The reference states that 15,000 jobs were created due to the manufacturing of electric vehicles.
2. **Accuracy**:
   - The total number of jobs mentioned in the submission (15,000) matches the reference number (15,000). 
   - The specific companies mentioned (Ford and GM) and their respective job creation numbers (11,000 and 4,000) add up correctly to 15,000 jobs.
3. **Factuality**:
   - There is no conflicting information between the submission and the reference.
   - The details provided about the companies (Ford and GM) are not disputed by the reference, and since the total number aligns, it can be considered factual.
Conclusion:
- Since the submission correctly totals 15,000 jobs, which matches the reference, and there are no inaccuracies or factual errors, the submission meets the criteria.
Y

Value


##### Contradictory Reference

In [34]:
contradictory_ref = "12000 jobs were created due to manufacturing of electric vehicles."

In [35]:
correct_eval_res2 = correct_evaluator.evaluate_strings(
    prediction=res2, input=query2, reference=contradictory_ref
)

In [36]:
print_dict(correct_eval_res2)


Reasoning
---
First, I will assess the submission based on the criterion of correctness, which includes accuracy and factuality. Here is the step-by-step reasoning:
1. **Correctness**:
   - The submission states that Ford is creating 11,000 jobs and GM is creating 4,000 jobs in the electric vehicle manufacturing industry, totaling 15,000 jobs.
   - The reference data indicates that 12,000 jobs were created due to the manufacturing of electric vehicles.
   - There is a discrepancy between the submission and the reference data. The submission claims a total of 15,000 jobs, whereas the reference data states 12,000 jobs.
   - Since the submission's total (15,000 jobs) does not match the reference data (12,000 jobs), it is not factually correct.
Given this analysis, the submission does not meet the criterion of correctness as it provides an inaccurate total number of jobs created.
Therefore, the answer is:
N

Value
---
N

Score
---
0
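
To evaluate more than one answer, the same evaluator can be applied over a list of examples and the scores averaged. The sketch below assumes only that the evaluator exposes `evaluate_strings(prediction=..., input=..., reference=...)` and returns a dictionary with a numeric `score`. `StubEvaluator` is a hypothetical offline stand-in (a naive substring check, not an LLM), and `mean_score` is likewise a name invented for this example.

```python
class StubEvaluator:
    """Offline stand-in that mimics the evaluate_strings interface."""

    def evaluate_strings(self, *, prediction: str, input: str, reference: str) -> dict:
        # Naive check: does the reference figure appear verbatim in the prediction?
        ok = reference in prediction
        return {"reasoning": "substring check", "value": "Y" if ok else "N", "score": int(ok)}


def mean_score(evaluator, examples: list[dict]) -> float:
    """Average the score over a list of prediction/input/reference examples."""
    scores = [evaluator.evaluate_strings(**ex)["score"] for ex in examples]
    return sum(scores) / len(scores) if scores else 0.0


examples = [
    {"prediction": "The total is 15,000 jobs.", "input": "How many jobs?", "reference": "15,000"},
    {"prediction": "The total is 15,000 jobs.", "input": "How many jobs?", "reference": "12,000"},
]
print(mean_score(StubEvaluator(), examples))
```

Swapping `StubEvaluator()` for `correct_evaluator` would run the same loop against the LLM-based evaluator, at the cost of one LLM call per example.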


## 6. Delete the KDB.AI Table

Once finished with the table, it is best practice to drop it.

In [37]:
table.drop()

## Take Our Survey

We hope you found this sample helpful! Your feedback is important to us, and we would appreciate it if you could take a moment to fill out our brief survey. Your input helps us improve our content.

[**Take the Survey**](https://delighted.com/t/dgCLUkdx)