# Retrieval Augmented Generation Evaluation with LangChain and KDB.AI

This notebook serves as a guide to utilizing LangChain tooling for evaluating a basic Retrieval Augmented Generation (RAG) system. 

The evaluation process involves employing [LangChain's String Evaluators](https://python.langchain.com/docs/guides/evaluation/string/) to assess both conciseness and correctness. KDB.AI serves as the primary knowledge base, enabling the retrieval of semantically relevant content for the evaluation.

### Aim

In this tutorial, we build upon the retrieval augmented generation pipeline from our [retrieval_augmented_generation.ipynb](retrieval_augmented_generation.ipynb) notebook.
If you have not seen it, please read that notebook first, as it covers the RAG setup steps in greater detail than we do here.

This notebook focuses on the evaluation of your retrieval augmented generation using KDB.AI as the vector store.
We will cover the following topics:

1. Load Text Data
1. Define OpenAI Text Embedding Model
1. Store Embeddings In KDB.AI
1. Perform Retrieval Augmented Generation
1. Evaluate Retrieval Augmented Generation
1. Delete the KDB.AI Table

---

## 0. Setup

### Install dependencies 

In order to successfully run this sample, the [Setup](https://github.com/KxSystems/kdbai-samples/blob/main/README.md#setup) steps in the repository's `README.md` file must be completed.
This will ensure that you have installed all of the relevant packages and versions needed for this sample.
If you have not completed these setup steps, please navigate to the repository's `README.md` file and follow the steps detailed there.

### Import Packages

Load the libraries needed in this tutorial, including the LangChain packages we will use.

In [None]:
!pip install kdbai_client langchain langchain_openai

In [None]:
### !!! Only run this cell if running the notebook in Colab
### This downloads state of the union speech into Colab
!mkdir -p ./data
!wget -P ./data https://raw.githubusercontent.com/KxSystems/kdbai-samples/main/retrieval_augmented_generation/data/state_of_the_union.txt

In [1]:
# vector DB
import os
from getpass import getpass
import kdbai_client as kdbai
import time

In [2]:
# langchain packages
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import KDBAI

In [3]:
# evaluation packages
from langchain.evaluation import load_evaluator

### Set API Keys

To follow this example you will need an [OpenAI API Key](https://platform.openai.com/apps).

You can create this for free by registering using the link above.
Once you have the key, you can add it below.

In [4]:
os.environ["OPENAI_API_KEY"] = (
    os.environ["OPENAI_API_KEY"]
    if "OPENAI_API_KEY" in os.environ
    else getpass("OpenAI API Key: ")
)
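The same "environment variable, else prompt" pattern recurs later for the KDB.AI credentials. A minimal sketch of a reusable helper is shown here; the name `env_or_prompt` and its `secret` parameter are hypothetical and not part of the original notebook or any library.

```python
import os
from getpass import getpass


def env_or_prompt(name: str, prompt: str, secret: bool = True) -> str:
    """Return the environment variable `name`, prompting only if it is unset.

    `env_or_prompt` is a hypothetical helper for illustration only.
    """
    value = os.environ.get(name)
    if value:
        return value
    # getpass hides what is typed; input echoes it (fine for non-secret values)
    return getpass(prompt) if secret else input(prompt)
```

With such a helper, the cell above could be written as `os.environ["OPENAI_API_KEY"] = env_or_prompt("OPENAI_API_KEY", "OpenAI API Key: ")`.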

### Define Helper Functions

In [5]:
def print_dict(d: dict) -> None:
    for k, v in d.items():
        print(f"\n{k.capitalize()}\n---\n{v}".replace('\n\n', '\n'))

## 1. Load Text Data

### Read In Text Document

The document we will use for this example is a State of the Union message from the President of the United States to the United States Congress.

In the below code snippet, we read the text file in.

In [6]:
# Load the documents we want to prompt an LLM about
doc = TextLoader("data/state_of_the_union.txt").load()

### Split The Document Into Chunks

We then split this document into chunks.

In [7]:
# Chunk the documents into 500 character chunks using langchain's text splitter "RecursiveCharacterTextSplitter"
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)

In [8]:
# split_documents returns a list of chunked documents; keep only the text of each chunk
pages = [p.page_content for p in text_splitter.split_documents(doc)]
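To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a simplified sketch of fixed-width character chunking. It is an illustration under stated assumptions, not the algorithm `RecursiveCharacterTextSplitter` uses: the real splitter prefers to break on separators such as paragraph breaks and newlines, so its chunks are at most `chunk_size` characters rather than exactly that long. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 0) -> list[str]:
    """Naive fixed-width chunking (hypothetical helper, for illustration only)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    # Each new chunk starts `step` characters after the previous one,
    # so consecutive chunks share `chunk_overlap` characters.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


sample = "a" * 1200
chunks = chunk_text(sample, chunk_size=500, chunk_overlap=0)
print([len(c) for c in chunks])  # prints [500, 500, 200]
```

With `chunk_overlap=0`, as in this notebook, no text is shared between neighbouring chunks.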

## 2. Define OpenAI Text Embedding Model
 
We will use `OpenAIEmbeddings` to embed our document into a format suitable for the vector database. We select `text-embedding-3-small`, which produces 1536-dimensional vectors, for use in the next step.

In [9]:
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

## 3. Store Embeddings In KDB.AI

With the embeddings created, we need to store them in a vector database to enable efficient searching.

### Define KDB.AI Session

KDB.AI comes in two offerings:

1. [KDB.AI Cloud](https://trykdb.kx.com/kdbai/signup/) - For experimenting with smaller generative AI projects with a vector database in our cloud.
2. [KDB.AI Server](https://trykdb.kx.com/kdbaiserver/signup/) - For evaluating large scale generative AI applications on-premises or on your own cloud provider.

Depending on which you use there will be different setup steps and connection details required.

##### Option 1. KDB.AI Cloud

To use KDB.AI Cloud, you will need two session details - a URL endpoint and an API key.
To get these you can sign up for free [here](https://trykdb.kx.com/kdbai/signup).

You can connect to a KDB.AI Cloud session using `kdbai.Session` and passing the session URL endpoint and API key details from your KDB.AI Cloud portal.

If the environment variables `KDBAI_ENDPOINT` and `KDBAI_API_KEY` exist on your system and contain your KDB.AI Cloud portal details, they will be used automatically to connect.
If they do not exist, the cell below will prompt you to enter your KDB.AI Cloud portal session URL endpoint and API key.

In [10]:
KDBAI_ENDPOINT = (
    os.environ["KDBAI_ENDPOINT"]
    if "KDBAI_ENDPOINT" in os.environ
    else input("KDB.AI endpoint: ")
)
KDBAI_API_KEY = (
    os.environ["KDBAI_API_KEY"]
    if "KDBAI_API_KEY" in os.environ
    else getpass("KDB.AI API key: ")
)

In [11]:
session = kdbai.Session(api_key=KDBAI_API_KEY, endpoint=KDBAI_ENDPOINT)

##### Option 2. KDB.AI Server

To use KDB.AI Server, you will need to download and run your own container.
To do this, you will first need to sign up for free [here](https://trykdb.kx.com/kdbaiserver/signup/). 

You will receive an email with the required license file and bearer token needed to download your instance.
Follow instructions in the signup email to get your session up and running.

Once the [setup steps](https://code.kx.com/kdbai/gettingStarted/kdb-ai-server-setup.html) are complete you can then connect to your KDB.AI Server session using `kdbai.Session` and passing your local endpoint.

In [12]:
# session = kdbai.Session(endpoint="http://localhost:8082")

### Define Vector DB Table Schema

In [13]:
rag_eval_schema = {
    "columns": [
        {"name": "id", "pytype": "str"},
        {"name": "text", "pytype": "bytes"},
        {
            "name": "embeddings",
            "pytype": "float32",
            "vectorIndex": {"dims": 1536, "metric": "L2", "type": "flat"},
        },
    ]
}
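The schema above requests a `flat` index with the `L2` metric. Conceptually, a flat index compares the query against every stored vector, and L2 is the Euclidean distance. The sketch below illustrates that idea in plain Python on toy 2-dimensional vectors; it is not KDB.AI's implementation, and the names `l2_distance` and `flat_search` are hypothetical.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flat_search(query: list[float], vectors: list[list[float]], k: int = 1):
    """Score every stored vector and return the k nearest as (index, distance)."""
    scored = [(i, l2_distance(query, v)) for i, v in enumerate(vectors)]
    return sorted(scored, key=lambda pair: pair[1])[:k]


stored = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]]
nearest = flat_search([0.9, 1.2], stored, k=2)
print([index for index, _ in nearest])  # prints [1, 0]
```

Exhaustive search is exact but scales linearly with the number of stored vectors, which is acceptable for a single chunked speech.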

### Create Vector DB Table

Use the KDB.AI `create_table` function to create a table that matches the defined schema in the vector database.

In [14]:
# First ensure the table does not already exist
try:
    session.table("rag_eval").drop()
    time.sleep(5)
except kdbai.KDBAIException:
    pass

In [15]:
table = session.create_table("rag_eval", rag_eval_schema)

### Add Embedded Data to KDB.AI Table

We can now store our data in KDB.AI by wrapping the table in LangChain's `KDBAI` vector store and calling `add_texts`:

- `table` the KDB.AI table we created above
- `embeddings` the embeddings model we have chosen
- `texts` the chunked document

In [16]:
# use KDBAI as vector store
vecdb_kdbai = KDBAI(table, embeddings)
vecdb_kdbai.add_texts(texts=pages)

Now that the vector embeddings are stored in KDB.AI, we are ready to query.

## 4. Perform Retrieval Augmented Generation

We will perform [question answering (QA) in LangChain](https://python.langchain.com/docs/use_cases/question_answering/#go-deeper-4) using `RetrievalQA`.

`RetrievalQA` retrieves the most relevant chunks of text (the top `K`, set below) and does QA on that subset.
We will use KDB.AI as the retriever of `RetrievalQA`.

### Define QA Bot

The code below defines a question-answering bot that combines OpenAI's GPT-3.5 Turbo for generating responses and a retriever that accesses the KDB.AI vector database to find relevant information.

In [17]:
K = 10

In [18]:
qabot = RetrievalQA.from_chain_type(
    chain_type="stuff",
    llm=ChatOpenAI(model="gpt-3.5-turbo-16k", temperature=0.0),
    retriever=vecdb_kdbai.as_retriever(search_kwargs=dict(k=K)),
    return_source_documents=True,
)

`as_retriever` is a method that converts a vectorstore into a retriever. A retriever is an interface that returns documents given an unstructured query. By using `as_retriever`, we can create a retriever from a vectorstore and use it to retrieve relevant documents for a query. This allows us to perform question answering over the documents indexed by the vectorstore `vecdb_kdbai`.
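The `chain_type="stuff"` setting means the retrieved chunks are all placed ("stuffed") into a single prompt that is sent to the LLM. The sketch below shows the idea in plain Python. The function name `build_stuff_prompt` and the prompt wording are assumptions made for illustration; they are not LangChain's actual template.

```python
def build_stuff_prompt(question: str, chunks: list[str]) -> str:
    """Concatenate retrieved chunks into one prompt (illustrative wording only)."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_stuff_prompt(
    "What improvements could be made in infrastructure?",
    ["First retrieved chunk.", "Second retrieved chunk."],
)
print(prompt)
```

Because every retrieved chunk ends up in one prompt, `K` and the chunk size together determine how much of the model's context window the retrieval step consumes.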

### Query The QA Bot

In [19]:
def query_qabot(qabot, query: str) -> str:
    query_res = qabot.invoke(dict(query=query))["result"]
    print(f"{query}\n---\n{query_res}")
    return query_res
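Since the bot is created with `return_source_documents=True`, the response dictionary also carries the retrieved chunks under `"source_documents"`, which `query_qabot` discards. A minimal sketch for inspecting them follows; `show_sources` is a hypothetical helper that assumes each document exposes a `page_content` attribute, as LangChain documents do.

```python
def show_sources(response: dict, max_chars: int = 80) -> list[str]:
    """Print and return a short preview of each retrieved source chunk."""
    previews = []
    for i, doc in enumerate(response.get("source_documents", []), start=1):
        # Collapse newlines and repeated spaces so each preview fits on one line
        text = " ".join(doc.page_content.split())
        preview = text[:max_chars]
        print(f"[{i}] {preview}")
        previews.append(preview)
    return previews
```

It could be called as `show_sources(qabot.invoke(dict(query="...")))` to check which chunks an answer was grounded in.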

##### Query 1

In [20]:
query1 = "What improvements could be made in infrastructure?"

In [21]:
res1 = query_qabot(qabot, query1)

What improvements could be made in infrastructure?
---
Some improvements that could be made in infrastructure include:

1. Rebuilding and repairing roads, bridges, and highways that are in disrepair.
2. Building a national network of 500,000 electric vehicle charging stations.
3. Replacing poisonous lead pipes to ensure clean water for every American.
4. Providing affordable high-speed internet access for all Americans, including urban, suburban, rural, and tribal communities.
5. Modernizing airports, ports, and waterways.
6. Investing in renewable energy production, such as solar and wind, to promote clean energy and reduce reliance on fossil fuels.
7. Weatherizing homes and businesses to improve energy efficiency and reduce costs.
8. Investing in emerging technologies and American manufacturing to compete with global competitors like China.
9. Ensuring that infrastructure projects are made in America, supporting domestic manufacturing and supply chains.
10. Increasing investments in 

##### Query 2

In [22]:
query2 = "How many jobs were created in the country due to the electric vehicle manufacturing industry?"

In [23]:
res2 = query_qabot(qabot, query2)

How many jobs were created in the country due to the electric vehicle manufacturing industry?
---
The passage states that Ford is investing $11 billion to build electric vehicles, creating 11,000 jobs across the country. Additionally, GM is making the largest investment in its history—$7 billion to build electric vehicles, creating 4,000 jobs in Michigan. Therefore, a total of 15,000 jobs were created in the country due to the electric vehicle manufacturing industry mentioned in the passage.


## 5. Evaluate Retrieval Augmented Generation

Here we will carry out two evaluation techniques against the results of our retrieval augmented generation pipeline.
We will measure the *Conciseness* and the *Correctness* of the answers.

### Evaluate Conciseness

We will evaluate the conciseness of the answers the QA bot returns using LangChain's `load_evaluator` function with the `criteria` set to `"conciseness"`.

In this example, we use GPT-4 as the LLM that performs the evaluation.

In [24]:
evaluation_llm = ChatOpenAI(model="gpt-4")

In [25]:
concise_evaluator = load_evaluator(
    "criteria", criteria="conciseness", llm=evaluation_llm
)

In [26]:
concise_eval_res = concise_evaluator.evaluate_strings(prediction=res1, input=query1)

In [27]:
print_dict(concise_eval_res)


Reasoning
---
The criterion for assessment is the conciseness of the submitted answer. 
The submission gives a list of ten potential improvements to infrastructure. Each suggestion is fairly concise, providing a brief explanation of the proposed improvement without unnecessary elaboration or tangents. 
However, the submission does include an introductory sentence and a concluding sentence that add some length. The conclusion, in particular, adds a bit of extra information about the potential for other improvements depending on regional needs.
This additional information could be seen as unnecessary, but it also provides context and acknowledges the complexity of infrastructure improvements, which could be seen as enhancing the quality of the response rather than detracting from its conciseness.
Overall, while not the briefest possible response, the submission is fairly concise and to the point. Each suggested improvement is described in a single, succinct sentence, and the overall res

### Evaluate Correctness

We can use the same `load_evaluator` function to assess correctness by switching the evaluator type to `"labeled_criteria"` and the `criteria` to `"correctness"`.

When using this option, we can pass a reference for the evaluator to check the correctness against.
Let's pass a reference that matches the information returned as well as one that doesn't.

For this evaluation, we will use the result of the second query we ran through our RAG pipeline.

In [28]:
correct_evaluator = load_evaluator(
    "labeled_criteria",
    criteria="correctness",
    llm=evaluation_llm,
    requires_reference=True,
)

##### Matching Reference

In [29]:
matching_ref = "15000 jobs were created due to manufacturing of electric vehicles."

In [30]:
correct_eval_res1 = correct_evaluator.evaluate_strings(
    prediction=res2, input=query2, reference=matching_ref
)

In [31]:
print_dict(correct_eval_res1)


Reasoning
---
First, we need to assess the correctness of the submission, according to the criteria.
The input asks about the number of jobs created in the country due to the electric vehicle manufacturing industry.
The submission provides a detailed answer, stating that Ford and GM's investments in electric vehicles have created a total of 15,000 jobs across the country.
Comparing this to the reference, which states that 15,000 jobs were created due to the manufacturing of electric vehicles, it's clear that the submission is accurate and factual.
Therefore, the submission meets the criteria.
Y

Value
---
Y

Score
---
1


##### Contradictory Reference

In [32]:
contradictory_ref = "12000 jobs were created due to manufacturing of electric vehicles."

In [33]:
correct_eval_res2 = correct_evaluator.evaluate_strings(
    prediction=res2, input=query2, reference=contradictory_ref
)

In [34]:
print_dict(correct_eval_res2)


Reasoning
---
The criteria for this task is correctness: Is the submission correct, accurate, and factual?
Looking at the submission, the answer provided is that 15,000 jobs were created due to the electric vehicle manufacturing industry. This is based on the data provided in the submission that Ford created 11,000 jobs and GM created 4,000 jobs.
The reference data, however, states that 12,000 jobs were created due to the manufacturing of electric vehicles.
Since the submission and the reference data do not match, it appears that the submission does not meet the criteria of correctness. The submission's answer is not accurate according to the reference data.
N

Value
---
N

Score
---
0
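Each evaluation returns a dictionary with `reasoning`, `value`, and `score` entries, so results from several references or queries can be collected and summarised. The sketch below assumes only the `evaluate_strings(prediction=..., input=..., reference=...)` call used above; the helper names `evaluate_references` and `pass_rate` are hypothetical.

```python
def evaluate_references(evaluator, prediction: str, query: str, references: list[str]) -> list[dict]:
    """Run one prediction against several references, one evaluation per reference."""
    return [
        evaluator.evaluate_strings(prediction=prediction, input=query, reference=ref)
        for ref in references
    ]


def pass_rate(results: list[dict]) -> float:
    """Mean of the 0/1 scores, ignoring results where no score was parsed."""
    scores = [r["score"] for r in results if r.get("score") is not None]
    return sum(scores) / len(scores) if scores else 0.0
```

Applied to the two results shown above (scores 1 and 0), `pass_rate` would return 0.5. Each evaluation is a separate call to the evaluation LLM, so cost grows with the number of references.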


## 6. Delete the KDB.AI Table

Once finished with the table, it is best practice to drop it.

In [35]:
table.drop()
