# Retrieval Augmented Generation (RAG) with LangChain

##### Note: This example requires a KDB.AI endpoint and API key. Sign up for a free [KDB.AI account](https://kdb.ai/get-started).

This example demonstrates how to use Retrieval Augmented Generation (RAG), an advanced prompt engineering technique, with hands-on examples using LangChain, KDB.AI and various LLMs.

### What is RAG and Why Do We Need it?

Large Language Models have remarkable capabilities in generating human-like text. These models are found in applications ranging from chatbots to content generation and translation. However, they face a significant challenge in staying up-to-date with recent world events, as they are essentially frozen in time, operating within the static knowledge snapshot captured during their training.

To bridge this gap and address the need for specialized, real-time information, "Retrieval Augmented Generation" (RAG) has emerged as a powerful solution. RAG enables language models to access relevant data from external knowledge bases, enriching their responses with current and contextually accurate information. For more content on RAG, check out our videos on [YouTube](https://www.youtube.com/@KxSystems/streams), where we discuss best practices for RAG, chunking strategies, the variety of approaches, and how to evaluate your RAG application.
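As a rough illustration of the retrieve-then-generate idea, the sketch below ranks a few hard-coded sentences by word overlap with a question and places the best match into a prompt. Everything here is a toy assumption: `KNOWLEDGE_BASE`, `retrieve` and `build_prompt` are hypothetical names, the word-overlap scoring stands in for the vector search used later in this tutorial, and no LLM is called.

```python
import re

# Hypothetical mini knowledge base standing in for an external data source.
KNOWLEDGE_BASE = [
    "KDB.AI is a vector database.",
    "LangChain is a framework for building LLM applications.",
    "RAG adds retrieved context to a prompt before generation.",
]


def tokenize(text: str) -> set[str]:
    """Lower-case the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents sharing the most words with the question (toy scoring)."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Augment the question with the retrieved context before it is sent to an LLM."""
    joined = "\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


question = "What is KDB.AI?"
context = retrieve(question, KNOWLEDGE_BASE)
prompt = build_prompt(question, context)
print(prompt)
```

In a real application the final prompt would be passed to an LLM; the rest of this tutorial replaces the toy scoring with embeddings stored in KDB.AI.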

### Aim

In this tutorial, we'll cover:

1. Load Text Data
1. Define OpenAI Text Embedding Model
1. Store Embeddings In KDB.AI
1. Search For Similar Documents To A Given Query
1. Perform Retrieval Augmented Generation
1. Delete the KDB.AI Table

---

## 0. Setup

### Install dependencies 

In order to successfully run this sample, note the following steps depending on where you are running this notebook:

- ***Run Locally / Private Environment:*** The [Setup](https://github.com/KxSystems/kdbai-samples/blob/main/README.md#setup) steps in the repository's `README.md` will guide you on prerequisites and how to run this with Jupyter.

- ***Colab / Hosted Environment:*** Open this notebook in Colab and run through the cells.

### Import Packages

Load the various libraries that will be needed in this tutorial, including all the langchain libraries we will use.

In [None]:
!pip install kdbai_client langchain langchain_openai langchain-huggingface #langchain-community 

import os
!git clone -b KDBAI_v1.4 https://github.com/KxSystems/langchain.git
os.chdir('langchain/libs/community')
!pip install .

In [None]:
### !!! Only run this cell if you need to download the data into your environment, for example in Colab
### This downloads State of the Union speech data
import os

if not os.path.exists("./data/state_of_the_union.txt"):
    !mkdir ./data
    !wget -P ./data https://raw.githubusercontent.com/KxSystems/kdbai-samples/main/retrieval_augmented_generation/data/state_of_the_union.txt

In [45]:
# vector DB
from getpass import getpass
import kdbai_client as kdbai
import time

In [46]:
# langchain packages
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import KDBAI
from langchain_openai import OpenAI
from langchain.chains.question_answering import load_qa_chain
from langchain_huggingface import HuggingFaceEndpoint

### Set API Keys

To follow this example you will need to request both an [OpenAI API Key](https://platform.openai.com/apps) and a [Hugging Face API Token](https://huggingface.co/docs/hub/security-tokens). 

You can create both for free by registering using the links provided. Once you have the credentials you can add them below.

In [48]:
os.environ["OPENAI_API_KEY"] = (
    os.environ["OPENAI_API_KEY"]
    if "OPENAI_API_KEY" in os.environ
    else getpass("OpenAI API Key: ")
)

In [49]:
os.environ["HUGGINGFACEHUB_API_TOKEN"] = (
    os.environ["HUGGINGFACEHUB_API_TOKEN"]
    if "HUGGINGFACEHUB_API_TOKEN" in os.environ
    else getpass("Hugging Face API Token: ")
)

## 1. Load Text Data

### Read In Text Document

The document we will use for this example is a State of the Union message from the President of the United States to the United States Congress.

In the below code snippet, we read the text file in.

In [50]:
# Load the documents we want to prompt an LLM about
doc = TextLoader("data/state_of_the_union.txt").load()

### Split The Document Into Chunks

We then split this document into chunks.

In [51]:
# Chunk the documents into 500 character chunks using LangChain's text splitter "RecursiveCharacterTextSplitter"
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)

In [52]:
# split_documents returns a list of all the chunks created; we keep only the text of each chunk
pages = [p.page_content for p in text_splitter.split_documents(doc)]

In [53]:
pages[0]

'Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  \n\nLast year COVID-19 kept us apart. This year we are finally together again. \n\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n\nWith a duty to one another to the American people to the Constitution. \n\nAnd with an unwavering resolve that freedom will always triumph over tyranny.'
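To make `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified sketch of fixed-size chunking. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraph and line breaks); `chunk_text` is a hypothetical helper that only illustrates what the two parameters control.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Cut text into windows of chunk_size characters, each sharing chunk_overlap
    characters with the previous window."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = "abcdefghij"
no_overlap = chunk_text(sample, chunk_size=4)
with_overlap = chunk_text(sample, chunk_size=4, chunk_overlap=2)
print(no_overlap)    # ['abcd', 'efgh', 'ij']
print(with_overlap)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

With `chunk_overlap=0`, as used in this tutorial, no text is duplicated between chunks; a non-zero overlap trades extra storage for a lower chance of cutting a relevant passage in half.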

## 2. Define OpenAI Text Embedding Model
 
We will use OpenAIEmbeddings to embed our document into a format suitable for the vector database. We select `text-embedding-3-small` for use in the next step.

In [54]:
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

## 3. Store Embeddings In KDB.AI

With the embedding model defined, we need a vector database to store the embeddings and enable efficient searching.

### Define KDB.AI Session

KDB.AI comes in two offerings:

1. [KDB.AI Cloud](https://trykdb.kx.com/kdbai/signup/) - For experimenting with smaller generative AI projects with a vector database in our cloud.
2. [KDB.AI Server](https://trykdb.kx.com/kdbaiserver/signup/) - For evaluating large scale generative AI applications on-premises or on your own cloud provider.

Depending on which you use there will be different setup steps and connection details required.

##### Option 1. KDB.AI Cloud

To use KDB.AI Cloud, you will need two session details - a URL endpoint and an API key.
To get these you can sign up for free [here](https://trykdb.kx.com/kdbai/signup).

You can connect to a KDB.AI Cloud session using `kdbai.Session` and passing the session URL endpoint and API key details from your KDB.AI Cloud portal.

If the environment variables `KDBAI_ENDPOINT` and `KDBAI_API_KEY` exist on your system and contain your KDB.AI Cloud portal details, they will be used to connect automatically.
If they do not exist, you will be prompted to enter your KDB.AI Cloud portal session URL endpoint and API key details.

In [55]:
KDBAI_ENDPOINT = (
    os.environ["KDBAI_ENDPOINT"]
    if "KDBAI_ENDPOINT" in os.environ
    else input("KDB.AI endpoint: ")
)
KDBAI_API_KEY = (
    os.environ["KDBAI_API_KEY"]
    if "KDBAI_API_KEY" in os.environ
    else getpass("KDB.AI API key: ")
)

In [None]:
session = kdbai.Session(api_key=KDBAI_API_KEY, endpoint=KDBAI_ENDPOINT)

##### Option 2. KDB.AI Server

To use KDB.AI Server, you will need to download and run your own container.
To do this, you will first need to sign up for free [here](https://trykdb.kx.com/kdbaiserver/signup/). 

You will receive an email with the required license file and bearer token needed to download your instance.
Follow instructions in the signup email to get your session up and running.

Once the [setup steps](https://code.kx.com/kdbai/gettingStarted/kdb-ai-server-setup.html) are complete you can then connect to your KDB.AI Server session using `kdbai.Session` and passing your local endpoint.

In [57]:
# session = kdbai.Session(endpoint="http://localhost:8082")

### Define Vector DB Table Schema

In [58]:
rag_schema = [
    {"name": "id", "type": "str"},
    {"name": "text", "type": "bytes"},
    {"name": "embeddings", "type": "float32s"},
]

indexes = [{'name': 'flat_index', 'column': 'embeddings', 'type': 'flat', 'params': {"dims": 1536, "metric": "L2"}}]
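The `dims` value in the index must match the length of the vectors produced by the embedding model; `text-embedding-3-small` returns 1536-dimensional vectors by default. The sketch below shows one way such a check could look. `dims_match` is a hypothetical helper, and `fake_vector` is a placeholder for a real embedding so that the example runs without calling OpenAI.

```python
# Same index definition as in the cell above, repeated so the sketch is self-contained.
indexes = [
    {
        "name": "flat_index",
        "column": "embeddings",
        "type": "flat",
        "params": {"dims": 1536, "metric": "L2"},
    }
]


def dims_match(index: dict, vector: list[float]) -> bool:
    """Return True when the index dimensionality equals the vector length."""
    return index["params"]["dims"] == len(vector)


# Placeholder vector; in the notebook this could come from embeddings.embed_query("some text").
fake_vector = [0.0] * 1536
print(dims_match(indexes[0], fake_vector))
```

If you swap in a different embedding model, update `dims` to that model's output size before creating the table.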

### Create Vector DB Table

Use the KDB.AI `create_table` function to create a table that matches the defined schema in the vector database.

In [59]:
database = session.database("default")

# First ensure the table does not already exist
try:
    database.table("rag_langchain").drop()
except kdbai.KDBAIException:
    pass

In [60]:
table = database.create_table("rag_langchain", schema=rag_schema, indexes=indexes)

### Add Embedded Data to KDB.AI Table

We can now store our data in KDB.AI. First we create a LangChain `KDBAI` vector store from two arguments:

- `table` our handle to the KDB.AI table
- `embeddings` the embeddings model we have chosen

We then pass the chunked document to `add_texts` as `texts`. This embeds each chunk, inserts it into the table and returns the generated IDs.

In [61]:
# use KDBAI as vector store
vecdb_kdbai = KDBAI(table, embeddings)
vecdb_kdbai.add_texts(texts=pages)

['1f71fa4d-b876-4ede-9c2f-a2feab358e0f',
 '617e8958-bd38-4a0c-b0da-c85e939a6d5a',
 '45745afe-596d-4b4a-b6f5-ca380cbb17bb',
 '005773e8-4410-4ac1-b703-d69a944673aa',
 'c8befd91-211b-4f03-88e3-ebca313a19a0',
 '51044021-f668-410c-848d-d587d124843f',
 '4ec0daf2-4a3c-4e6a-ac7b-7a7611dd80a6',
 'ee9fe9bc-10e6-4ed5-a10f-3b5400b1bd6f',
 'e62fd355-d776-4ba0-a94f-6fb757576e4a',
 '77bd54db-51e8-4c42-80fa-4961431b2c3a',
 '4955a466-2e4d-45b3-b349-b8cba5e0b958',
 'e8467e1e-c977-4096-8366-1e2104bdf8f2',
 'eac91746-237d-420e-9933-b2a7bc94e0ff',
 '05db44c8-4aef-4514-be16-17c08fa81691',
 '1b89fd3d-523e-4596-8386-8d75de457162',
 '5eedfebf-e420-4543-a3d5-b9b72987c372',
 '9821af3d-0e02-45ee-8431-4489b2680043',
 '76aa0e0c-bfc3-4a9c-9208-4304e649451d',
 '62578d27-54bf-4aaa-9cca-07e19a216710',
 '76bdc65e-f5d4-4ab3-9a41-5a30bd39b62e',
 '2f80e016-3962-45b8-85ed-4eadb1016eb6',
 'f25ff936-dba3-4c4b-9bdf-f8db4bb1889b',
 '27630a25-ef27-445a-a6da-4545e84a8f78',
 '10558419-924f-41b7-ac37-cc85fe55c04e',
 'ce28729c-1f8b-

Now that we have the vector embeddings stored in KDB.AI, we are ready to query.

## 4. Search For Similar Documents To A Given Query 

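Before we implement RAG, let's see an example of using similarity search directly on the KDB.AI vector store. The search uses Euclidean (L2) distance, the metric configured on `flat_index`, which measures the straight-line distance between two points in vector space; the smaller the distance, the more similar the chunk is to the query.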
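The idea behind a flat index with the L2 metric can be sketched in a few lines: compare the query vector with every stored vector and keep the nearest ones. This is an illustrative toy with made-up two-dimensional vectors and hypothetical helper names (`l2_distance`, `flat_search`), not KDB.AI's implementation.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flat_search(query_vec: list[float], stored: dict[str, list[float]], k: int = 1):
    """Exhaustively score every stored vector and return the k nearest as (name, distance)."""
    ranked = sorted(stored.items(), key=lambda item: l2_distance(query_vec, item[1]))
    return [(name, round(l2_distance(query_vec, vec), 3)) for name, vec in ranked[:k]]


# Made-up vectors; real embeddings in this tutorial have 1536 dimensions.
stored_vectors = {
    "chunk_a": [0.0, 0.0],
    "chunk_b": [3.0, 4.0],
    "chunk_c": [1.0, 1.0],
}
nearest = flat_search([1.0, 0.5], stored_vectors, k=2)
print(nearest)  # chunk_c is closest, followed by chunk_a
```

An exhaustive scan like this is exact but its cost grows linearly with the number of stored vectors.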

In [62]:
query = "what are the nations strengths?"

In [63]:
# query_sim holds results of the similarity search, the closest related chunks to the query.
query_sim = vecdb_kdbai.similarity_search(query, index='flat_index')

In [64]:
query_sim

[Document(page_content='We are the only nation on Earth that has always turned every crisis we have faced into an opportunity. \n\nThe only nation that can be defined by a single word: possibilities. \n\nSo on this night, in our 245th year as a nation, I have come to report on the State of the Union. \n\nAnd my report is this: the State of the Union is strong—because you, the American people, are strong. \n\nWe are stronger today than we were a year ago. \n\nAnd we will be stronger a year from now than we are today.', metadata={'id': '23b9b1af-ca12-41c4-b62c-eac095fe2b6e', 'embeddings': array([ 0.03127478,  0.0130375 ,  0.04139533, ..., -0.01439452,
        -0.01379844, -0.0240585 ], dtype=float32)})]

This result returns the most similar chunks of text to our query, which is an okay start but it is hard to read. It would be a lot better if we could summarize these findings and return a response that is more human readable - this is where RAG comes in!

## 5. Perform Retrieval Augmented Generation

There are four different ways to do [question answering (QA) in LangChain](https://python.langchain.com/docs/use_cases/question_answering/#go-deeper-4):
- `load_qa_chain` will do QA over all documents passed every time it is called. It is simple and comprehensive, but can be slower and less efficient than `RetrievalQA` as it may not focus on the most relevant parts of the texts for the question. In one example below, we will perform similarity search with KDB.AI before using `load_qa_chain` to act upon "all documents" being passed.
- `RetrievalQA` retrieves the most relevant chunks of text and does QA on that subset. It uses `load_qa_chain` under the hood on each chunk and is faster and more efficient than the vanilla `load_qa_chain`. These performance gains come at the risk of losing some information or context from the documents, as it may not always find the best text chunks for the question. In one example below, we will use KDB.AI as the retriever of `RetrievalQA`.
- `VectorstoreIndexCreator` is a higher-level wrapper for `RetrievalQA` that makes it easier to run in fewer lines of code.
- `ConversationalRetrievalChain` builds on `RetrievalQA` to provide a chat history component.

In this tutorial we will implement the first two.

### `load_qa_chain` with OpenAI and Hugging Face LLMs

We set up two question-answering chains, one for an OpenAI model and one for a Hugging Face model, using LangChain's `load_qa_chain` function. We reuse the results of the similarity search run earlier, run both chains on the query and the related chunks from the document, and compare the two models' responses to the query about the nation's strengths.

In [65]:
# select two LLMs (OpenAI gpt-4o, Hugging Face mistralai/Mistral-7B-Instruct-v0.2)
llm_openai = ChatOpenAI(model="gpt-4o", max_tokens=512)
llm_mistral = HuggingFaceEndpoint(
    repo_id="mistralai/Mistral-7B-Instruct-v0.2"
)


We chose `chain_type="stuff"`, the most straightforward of the document chains. It takes a list of documents, inserts them all into a single prompt and passes that prompt to an LLM.
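Conceptually, "stuffing" amounts to concatenating every document into one prompt. The sketch below illustrates this under stated assumptions: `stuff_prompt` is a hypothetical helper and the prompt wording is made up for illustration; it is not LangChain's actual template.

```python
def stuff_prompt(documents: list[str], question: str) -> str:
    """Place every document, in order, into a single prompt ahead of the question."""
    context = "\n\n".join(documents)
    return (
        "Use the following pieces of context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


docs = [
    "The State of the Union is strong.",
    "We are stronger today than we were a year ago.",
]
prompt = stuff_prompt(docs, "what are the nations strengths?")
print(prompt)
```

Because every document goes into one prompt, this approach only works while the combined text fits in the model's context window, which is why we retrieve a small number of relevant chunks first.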

In [66]:
# create the chain for each model using langchain load_qa_chain
chain_openAI = load_qa_chain(llm_openai, chain_type="stuff")
chain_HuggingFaceHub = load_qa_chain(llm_mistral, chain_type="stuff")

In [67]:
# Show the most related chunks to the query
query_sim

[Document(page_content='We are the only nation on Earth that has always turned every crisis we have faced into an opportunity. \n\nThe only nation that can be defined by a single word: possibilities. \n\nSo on this night, in our 245th year as a nation, I have come to report on the State of the Union. \n\nAnd my report is this: the State of the Union is strong—because you, the American people, are strong. \n\nWe are stronger today than we were a year ago. \n\nAnd we will be stronger a year from now than we are today.', metadata={'id': '23b9b1af-ca12-41c4-b62c-eac095fe2b6e', 'embeddings': array([ 0.03127478,  0.0130375 ,  0.04139533, ..., -0.01439452,
        -0.01379844, -0.0240585 ], dtype=float32)})]

In [68]:
# OpenAI - run the chain on the query and the related chunks from the document
chain_openAI.invoke({'input_documents':query_sim, 'question':query})['output_text']

"The nations' strengths as highlighted in the context include the ability to turn every crisis into an opportunity, being defined by possibilities, and the strength and resilience of the American people. The overall message is that the nation is strong and continues to grow stronger over time."

In [69]:
# Hugging Face - run the chain on the query and the related chunks from the document
chain_HuggingFaceHub.invoke({'input_documents':query_sim, 'question':query})['output_text']

' The nation is strong because the American people are strong and full of possibilities. The country has turned every crisis into an opportunity and continues to do so. It is defined by its ability to see and create possibilities. The nation is stronger today than it was a year ago and will be stronger a year from now.'

### RetrievalQA with GPT-4o

Let's try the second method using `RetrievalQA`. This time we use only GPT-4o as our LLM.

The code below defines a question-answering bot that combines OpenAI's GPT-4o for generating responses and a retriever that accesses the KDB.AI vector database to find relevant information.

In [70]:
K = 10

In [71]:
qabot = RetrievalQA.from_chain_type(
    chain_type="stuff",
    llm=ChatOpenAI(model="gpt-4o", temperature=0.0),
    retriever=vecdb_kdbai.as_retriever(search_kwargs=dict(k=K, index="flat_index")),
    return_source_documents=True,
)

`as_retriever` is a method that converts a vectorstore into a retriever. A retriever is an interface that returns documents given an unstructured query. By using `as_retriever`, we can create a retriever from a vectorstore and use it to retrieve relevant documents for a query. This allows us to perform question answering over the documents indexed by the vectorstore `vecdb_kdbai`.
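The relationship between a vector store and its retriever can be sketched as a thin adapter. The classes below are hypothetical toys (`ToyVectorStore`, `ToyRetriever`) that rank texts by shared words; they only mirror the shape of the pattern, with `search_kwargs` fixed once and a single query-in, documents-out entry point, and are not LangChain's classes.

```python
class ToyVectorStore:
    """Hypothetical stand-in for a vector store; ranks texts by shared words."""

    def __init__(self, texts):
        self.texts = list(texts)

    def similarity_search(self, query, k=4):
        query_words = set(query.lower().split())
        ranked = sorted(
            self.texts,
            key=lambda text: len(query_words & set(text.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def as_retriever(self, search_kwargs=None):
        # Fix the search options once so callers only need to supply a query.
        return ToyRetriever(self, search_kwargs or {})


class ToyRetriever:
    """Exposes a single entry point: a query string in, relevant texts out."""

    def __init__(self, store, search_kwargs):
        self.store = store
        self.search_kwargs = search_kwargs

    def invoke(self, query):
        return self.store.similarity_search(query, **self.search_kwargs)


store = ToyVectorStore(
    ["the economy is strong", "the border is secure", "jobs are growing"]
)
retriever = store.as_retriever(search_kwargs={"k": 1})
result = retriever.invoke("how strong is the economy")
print(result)
```

Because the chain only depends on the retriever interface, the underlying store can be swapped without changing the question-answering code.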

In [72]:
print(query)
print("-----")
print(qabot.invoke(dict(query=query))["result"])

what are the nations strengths?
-----
The strengths of the United States, as highlighted in the context, include:

1. **Resilience and Adaptability**: The nation has a history of turning crises into opportunities.
2. **Possibilities**: The country is defined by the concept of possibilities, indicating a forward-looking and optimistic outlook.
3. **Strength of the People**: The American people are described as strong, contributing to the overall strength of the nation.
4. **Diplomacy and Resolve**: American diplomacy and resolve are emphasized as important factors in international relations.
5. **Military Preparedness**: The U.S. has mobilized ground forces, air squadrons, and ship deployments to protect NATO allies.
6. **Economic Sanctions and Coalition Building**: The U.S. has built a coalition of nations to impose economic sanctions on Russia and support Ukraine.
7. **Historical Achievements**: The nation has a legacy of fighting for freedom, expanding liberty, and defeating totalita

Trying another query:

In [73]:
def query_qabot(qabot, query: str):
    # Print the query that was passed in, then return the chain's answer
    print(query)
    print("---")
    return qabot.invoke(dict(query=query))["result"]

In [74]:
new_query = "what are the things this country needs to protect?"
query_qabot(qabot, new_query)

what are the things this country needs to protect?
---


"This country needs to protect several key areas:\n\n1. **American Jobs and Businesses**: By ensuring taxpayer dollars support American jobs and businesses through initiatives like Buy American policies.\n2. **Safety and Security**: By investing in crime prevention, community policing, and measures to reduce gun violence, such as universal background checks and banning assault weapons and high-capacity magazines.\n3. **Immigration and Border Security**: By providing pathways to citizenship for certain groups, revising laws to meet labor needs, and securing borders with new technology and joint patrols.\n4. **Voting Rights**: By protecting the fundamental right to vote and ensuring that votes are counted, combating laws that suppress or subvert elections.\n5. **Liberty and Justice**: By advancing immigration reform, protecting women's rights, and holding law enforcement accountable.\n6. **National and International Security**: By maintaining strong American diplomacy and resolve, partic

Retrieval Augmented Generation is a valuable technique that combines the capabilities of language models such as GPT-4o with the power of information retrieval.
By enriching the input with contextually specific data, RAG enables language models to produce responses that are more precise and better suited to the context.
Particularly in enterprise scenarios where extensive fine-tuning may not be feasible, RAG presents an efficient and economical way to deliver personalized and informed interactions with users.

## 6. Delete the KDB.AI Table

Once finished with the table, it is best practice to drop it.

In [75]:
table.drop()

## Take Our Survey

We hope you found this sample helpful! Your feedback is important to us, and we would appreciate it if you could take a moment to fill out our brief survey. Your input helps us improve our content.

[**Take the Survey**](https://delighted.com/t/dgCLUkdx)