# Retrieval Augmented Generation (RAG) with LangChain

This example demonstrates Retrieval Augmented Generation (RAG), an advanced prompt engineering technique, with hands-on examples using LangChain, KDB.AI and various LLMs.

### What is RAG and Why Do We Need it?

Large Language Models have remarkable capabilities in generating human-like text. These models are found in applications ranging from chatbots to content generation and translation. However, they face a significant challenge in staying up-to-date with recent world events, as they are essentially frozen in time, operating within the static knowledge snapshot captured during their training.

To bridge this gap and address the need for specialized, real-time information, Retrieval Augmented Generation (RAG) has emerged as a powerful solution. RAG enables these language models to access relevant data from external knowledge bases, enriching their responses with current and contextually accurate information. For more content on RAG, check out our videos on [YouTube](https://www.youtube.com/@KxSystems/streams), where we discuss best practices for RAG, chunking strategies, the variety of approaches, and how to evaluate your RAG application.

### Aim

In this tutorial, we'll cover:

1. Load Text Data
1. Define OpenAI Text Embedding Model
1. Store Embeddings In KDB.AI
1. Search For Similar Documents To A Given Query
1. Perform Retrieval Augmented Generation
1. Delete the KDB.AI Table

---

## 0. Setup

### Import Packages

Load the various libraries that will be needed in this tutorial, including all the langchain libraries we will use.

In [1]:
# vector DB
import os
from getpass import getpass
import kdbai_client as kdbai
import time

In [2]:
# langchain packages
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import KDBAI
from langchain import HuggingFaceHub
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain

### Set API Keys

To follow this example you will need to request both an [OpenAI API Key](https://platform.openai.com/apps) and a [Hugging Face API Token](https://huggingface.co/docs/hub/security-tokens). 

You can create both for free by registering using the links provided. Once you have the credentials you can add them below.

In [3]:
os.environ["OPENAI_API_KEY"] = (
    os.environ["OPENAI_API_KEY"]
    if "OPENAI_API_KEY" in os.environ
    else getpass("OpenAI API Key: ")
)

In [4]:
os.environ["HUGGINGFACEHUB_API_TOKEN"] = (
    os.environ["HUGGINGFACEHUB_API_TOKEN"]
    if "HUGGINGFACEHUB_API_TOKEN" in os.environ
    else getpass("Hugging Face API Token: ")
)

Hugging Face API Token:  ········
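The two cells above repeat the same "use the environment variable, otherwise prompt" pattern. As a minimal sketch, the pattern could be factored into a helper. `ensure_env` and `EXAMPLE_API_KEY` are hypothetical names introduced here for illustration, and unlike the cells above this version also treats an empty value as unset.

```python
import os
from getpass import getpass


def ensure_env(name: str, prompt: str) -> str:
    """Return os.environ[name], prompting once and storing the value if it is unset or empty."""
    if not os.environ.get(name):
        os.environ[name] = getpass(prompt)
    return os.environ[name]


# Demo with a hypothetical variable that is already set, so no prompt appears.
os.environ["EXAMPLE_API_KEY"] = "dummy-value"
value = ensure_env("EXAMPLE_API_KEY", "Example API key: ")
print(value)
```

In the notebook, the same helper could be called with `"OPENAI_API_KEY"` and `"HUGGINGFACEHUB_API_TOKEN"` in place of the example variable.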


## 1. Load Text Data

### Read In Text Document

In the below code snippet, we read the text file and split the document into chunks. This document is a State of the Union message from the President of the United States to the United States Congress.

In [5]:
# Load the documents we want to prompt an LLM about
doc = TextLoader("data/state_of_the_union.txt").load()

### Split The Document Into Chunks

In [6]:
# Chunk the documents into 500 character chunks using langchain's text splitter "RecursiveCharacterTextSplitter"
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)

In [7]:
# split_documents produces a list of all the chunks created; keep only the text content of each chunk
pages = [p.page_content for p in text_splitter.split_documents(doc)]

In [8]:
pages[0]

'Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  \n\nLast year COVID-19 kept us apart. This year we are finally together again. \n\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n\nWith a duty to one another to the American people to the Constitution. \n\nAnd with an unwavering resolve that freedom will always triumph over tyranny.'
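To build intuition for `chunk_size` and `chunk_overlap`, here is a simplified sketch of fixed-width character chunking. It is not LangChain's algorithm: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph and line breaks, so its chunks are at most 500 characters rather than exactly 500. The `chunk_text` helper is hypothetical and exists only for this illustration.

```python
# Illustrative only: a fixed-width character chunker, not LangChain's algorithm.
def chunk_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


sample = "abcdefghij"
no_overlap = chunk_text(sample, chunk_size=4)
with_overlap = chunk_text(sample, chunk_size=4, chunk_overlap=2)
print(no_overlap)    # ['abcd', 'efgh', 'ij']
print(with_overlap)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

With `chunk_overlap=0`, as in this tutorial, no text is repeated between chunks. A non-zero overlap repeats some text so that a sentence cut at a boundary still appears whole in one of the neighbouring chunks, at the cost of storing more embeddings.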

## 2. Define OpenAI Text Embedding Model
 
We will use OpenAIEmbeddings to embed our document into a format suitable for the vector database. We select `text-embedding-ada-002` for use in the next step.

In [9]:
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

## 3. Store Embeddings In KDB.AI

With the embeddings created, we need to store them in a vector database to enable efficient searching. KDB.AI is perfect for this task.

KDB.AI comes in two offerings:

1. [KDB.AI Cloud](https://trykdb.kx.com/kdbai/signup/) - For experimenting with smaller generative AI projects with a vector database in our cloud.
2. [KDB.AI Server](https://trykdb.kx.com/kdbaiserver/signup/) - For evaluating large scale generative AI applications on-premises or on your own cloud provider.

Depending on which you use there will be different setup steps and connection details required.

### KDB.AI Cloud

To use KDB.AI Cloud, you will need two session details - a URL endpoint and an API key. To get these you can sign up for free [here](https://trykdb.kx.com/kdbai/signup).

You can connect to a KDB.AI Cloud session using `kdbai.Session` and passing the session URL endpoint and API key details from your KDB.AI Cloud portal.

If the environment variables `KDBAI_ENDPOINT` and `KDBAI_API_KEY` exist on your system and contain your KDB.AI Cloud portal details, they will be used automatically to connect.
If they do not exist, you will be prompted to enter your KDB.AI Cloud portal session URL endpoint and API key.

In [10]:
KDBAI_ENDPOINT = (
    os.environ["KDBAI_ENDPOINT"]
    if "KDBAI_ENDPOINT" in os.environ
    else input("KDB.AI endpoint: ")
)
KDBAI_API_KEY = (
    os.environ["KDBAI_API_KEY"]
    if "KDBAI_API_KEY" in os.environ
    else getpass("KDB.AI API key: ")
)

In [11]:
session = kdbai.Session(api_key=KDBAI_API_KEY, endpoint=KDBAI_ENDPOINT)

### KDB.AI Server

To use KDB.AI Server, you will need to download and run your own container. To do this, you will first need to sign up for free [here](https://trykdb.kx.com/kdbaiserver/signup/).

You will receive an email with the required license file and bearer token needed to download your instance. Follow instructions in the signup email to get your session up and running.

Once the [setup steps](https://code.kx.com/kdbai/gettingStarted/kdb-ai-server-setup.html) are complete you can then connect to your KDB.AI Server session using `kdbai.Session` and passing your local endpoint.

In [None]:
session = kdbai.Session(endpoint='http://localhost:8082')

### Define Vector DB Table Schema

In [12]:
rag_schema = {
    "columns": [
        {"name": "id", "pytype": "str"},
        {"name": "text", "pytype": "bytes"},
        {
            "name": "embeddings",
            "pytype": "float32",
            "vectorIndex": {"dims": 1536, "metric": "L2", "type": "flat"},
        },
    ]
}
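The `dims` value in the schema has to match the length of the vectors the embedding model produces. The sketch below checks this, on the assumption that `text-embedding-ada-002` returns 1536-dimensional vectors. The schema is repeated so the block runs on its own, and `vector_index_dims` is a hypothetical helper, not part of `kdbai_client`.

```python
# Assumption: text-embedding-ada-002 produces 1536-dimensional vectors.
EXPECTED_DIMS = 1536

# Same shape as rag_schema, repeated so this block is self-contained.
schema = {
    "columns": [
        {"name": "id", "pytype": "str"},
        {"name": "text", "pytype": "bytes"},
        {
            "name": "embeddings",
            "pytype": "float32",
            "vectorIndex": {"dims": 1536, "metric": "L2", "type": "flat"},
        },
    ]
}


def vector_index_dims(schema: dict) -> dict:
    """Map each column that declares a vectorIndex to its dims (hypothetical helper)."""
    return {
        col["name"]: col["vectorIndex"]["dims"]
        for col in schema["columns"]
        if "vectorIndex" in col
    }


dims = vector_index_dims(schema)
assert dims["embeddings"] == EXPECTED_DIMS, "index dims must match the embedding size"
print(dims)  # {'embeddings': 1536}
```

If you switch to a different embedding model, the `dims` entry would need to change to that model's output size.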

### Create Vector DB Table

Use the KDB.AI `create_table` function to create a table that matches the defined schema in the vector database.

In [13]:
# First ensure the table does not already exist
try:
    session.table("rag_langchain").drop()
    time.sleep(5)
except kdbai.KDBAIException:
    pass

In [14]:
table = session.create_table("rag_langchain", rag_schema)

### Add Embedded Data to KDB.AI Table

We can now store our data in KDB.AI by wrapping the table in LangChain's `KDBAI` vector store and calling `add_texts`:

- `table` the KDB.AI table created above
- `embeddings` the embeddings model we have chosen
- `texts` the chunked document

In [15]:
# use KDBAI as vector store
vecdb_kdbai = KDBAI(table, embeddings)
vecdb_kdbai.add_texts(texts=pages)

Now that the vector embeddings are stored in KDB.AI, we are ready to query.

## 4. Search For Similar Documents To A Given Query 

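Before we implement RAG, let's see an example of using similarity search directly on the KDB.AI vector store. The search uses Euclidean (L2) distance, matching the `L2` metric in the table schema, which measures the straight-line distance between two points in vector space. The smaller the distance, the more similar the texts.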
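As a small worked example of the distance being used, the sketch below computes the Euclidean distance between a query vector and a few stored vectors, then ranks them. The two-dimensional vectors and the chunk names are made up for illustration; real embeddings here have 1536 dimensions, and KDB.AI performs the search for you.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance: square root of the summed squared differences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Made-up 2-D vectors standing in for embedded chunks.
query_vec = [1.0, 0.0]
chunk_vecs = {
    "chunk_a": [0.9, 0.1],
    "chunk_b": [0.0, 1.0],
    "chunk_c": [4.0, 4.0],
}

# Smallest distance first, so the most similar chunk leads the list.
ranked = sorted(chunk_vecs, key=lambda name: l2_distance(query_vec, chunk_vecs[name]))
print(ranked)  # ['chunk_a', 'chunk_b', 'chunk_c']
```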

In [16]:
query = "what are the nations strengths?"

In [17]:
# query_sim holds results of the similarity search, the closest related chunks to the query.
query_sim = vecdb_kdbai.similarity_search(query)

In [18]:
query_sim

[Document(page_content='We are the only nation on Earth that has always turned every crisis we have faced into an opportunity. \n\nThe only nation that can be defined by a single word: possibilities. \n\nSo on this night, in our 245th year as a nation, I have come to report on the State of the Union. \n\nAnd my report is this: the State of the Union is strong—because you, the American people, are strong. \n\nWe are stronger today than we were a year ago. \n\nAnd we will be stronger a year from now than we are today.', metadata={'id': 'e19475e6-e0df-4bbb-95e6-388956795a38', 'embeddings': array([-0.00213418, -0.01874734,  0.0001179 , ..., -0.00102042,
         0.00753241, -0.02074311], dtype=float32)})]

This result returns the most similar chunks of text to our query, which is an okay start but it is hard to read. It would be a lot better if we could summarize these findings and return a response that is more human readable - this is where RAG comes in!

## 5. Perform Retrieval Augmented Generation

There are four different ways to do [question answering (QA) in LangChain](https://python.langchain.com/docs/use_cases/question_answering/#go-deeper-4):
- `load_qa_chain` will do QA over all documents passed every time it is called. It is simple and comprehensive, but can be slower and less efficient than `RetrievalQA` as it may not focus on the most relevant parts of the texts for the question. In one example below, we will perform similarity search with KDB.AI before using `load_qa_chain` to act upon "all documents" being passed.
- `RetrievalQA` retrieves the most relevant chunks of text and does QA on that subset. It uses `load_qa_chain` under the hood on each chunk and is faster and more efficient than the vanilla `load_qa_chain`. These performance gains come at the risk of losing some information or context from the documents, as it may not always find the best text chunks for the question. In one example below, we will use KDB.AI as the retriever of `RetrievalQA`.
- `VectorstoreIndexCreator` is a higher level wrapper for `RetrievalQA` that makes it easier to run in fewer lines of code.
- `ConversationalRetrievalChain` builds on `RetrievalQA` to provide a chat history component.

In this tutorial we will implement the first two.

### 'load_qa_chain' with OpenAI and HuggingFace LLMs

We set up two question-answering chains for different models, OpenAI and HuggingFaceHub, using LangChain's `load_qa_chain` function. To do this, we reuse the results of the similarity search run earlier and then run both chains on the query and the related chunks from the document, printing the responses from both models. We then compare how the OpenAI and HuggingFaceHub models answer the query about the nation's strengths.

In [19]:
# select two llm models (OpenAI text-davinci-003, HuggingFaceHub google/flan-t5-xxl(designed for short answers))
llm_openai = OpenAI(model="text-davinci-003", max_tokens=512)
llm_flan = HuggingFaceHub(
    repo_id="google/flan-t5-xxl", model_kwargs={"temperature": 0.5, "max_length": 512}
)


We chose `chain_type="stuff"`, which is the most straightforward of the document chains. It takes a list of documents, inserts them all into a single prompt and passes that prompt to an LLM.
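The idea behind the "stuff" approach can be sketched in a few lines of plain Python. The template text and the `build_stuff_prompt` helper below are hypothetical and only illustrate the mechanism. LangChain ships its own prompt, which differs from this one.

```python
# Hypothetical template for illustration; LangChain's built-in prompt differs.
STUFF_TEMPLATE = (
    "Use the following context to answer the question.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_stuff_prompt(chunks: list[str], question: str) -> str:
    """Join every chunk into one context block and fill in the template."""
    context = "\n\n".join(chunks)
    return STUFF_TEMPLATE.format(context=context, question=question)


prompt = build_stuff_prompt(
    [
        "The State of the Union is strong.",
        "We are stronger today than we were a year ago.",
    ],
    "what are the nations strengths?",
)
print(prompt)
```

Because every chunk ends up in one prompt, this approach only works while the combined text fits within the model's context window, which is why the number of retrieved chunks matters.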

In [20]:
# create the chain for each model using langchain load_qa_chain
chain_openAI = load_qa_chain(llm_openai, chain_type="stuff")
chain_HuggingFaceHub = load_qa_chain(llm_flan, chain_type="stuff")

In [21]:
# Show the most related chunks to the query
query_sim

[Document(page_content='We are the only nation on Earth that has always turned every crisis we have faced into an opportunity. \n\nThe only nation that can be defined by a single word: possibilities. \n\nSo on this night, in our 245th year as a nation, I have come to report on the State of the Union. \n\nAnd my report is this: the State of the Union is strong—because you, the American people, are strong. \n\nWe are stronger today than we were a year ago. \n\nAnd we will be stronger a year from now than we are today.', metadata={'id': 'e19475e6-e0df-4bbb-95e6-388956795a38', 'embeddings': array([-0.00213418, -0.01874734,  0.0001179 , ..., -0.00102042,
         0.00753241, -0.02074311], dtype=float32)})]

In [22]:
# OpenAI - run the chain on the query and the related chunks from the documentation
chain_openAI.run(input_documents=query_sim, question=query)

' The American people are strong. The nation is capable of turning crises into opportunities and is filled with possibilities.'

In [23]:
# HuggingFace - run the chain on the query and the related chunks from the documentation
chain_HuggingFaceHub.run(input_documents=query_sim, question=query)

'We are strong today than we were a year ago. And we will be stronger a year'

We can see the response from OpenAI is longer and more detailed and seems to have done a better job summarizing the nation's strengths from the document provided.

### RetrievalQA with GPT-3.5

Let's try the second method using `RetrievalQA`. This time let's use GPT-3.5 as our LLM of choice.

The code below defines a question-answering bot that combines OpenAI's GPT-3.5 Turbo for generating responses and a retriever that accesses the KDB.AI vector database to find relevant information.

In [24]:
K = 10

In [25]:
qabot = RetrievalQA.from_chain_type(
    chain_type="stuff",
    llm=ChatOpenAI(model="gpt-3.5-turbo-16k", temperature=0.0),
    retriever=vecdb_kdbai.as_retriever(search_kwargs=dict(k=K)),
    return_source_documents=True,
)

`as_retriever` is a method that converts a vectorstore into a retriever. A retriever is an interface that returns documents given an unstructured query. By using `as_retriever`, we can create a retriever from a vectorstore and use it to retrieve relevant documents for a query. This allows us to perform question answering over the documents indexed by the vectorstore `vecdb_kdbai`.
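To make the retriever idea concrete, here is a toy sketch of the interface: a text query goes in and the `k` closest stored texts come out. `ToyRetriever`, `toy_embed` and the three-word vocabulary are all hypothetical stand-ins. This is not how LangChain or KDB.AI implement retrieval, and a word-count vector is not a real embedding.

```python
import math

# Toy vocabulary and embedding, purely illustrative (not a real embedding model).
VOCAB = ["strong", "vote", "border"]


def toy_embed(text: str) -> list[float]:
    """Count how often each vocabulary word appears in the text."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


class ToyRetriever:
    """Hypothetical retriever: text query in, k nearest stored texts out."""

    def __init__(self, texts: list[str], k: int = 4):
        self.texts = texts
        self.vectors = [toy_embed(t) for t in texts]
        self.k = k

    def get_relevant(self, query: str) -> list[str]:
        query_vec = toy_embed(query)
        # Rank stored texts by Euclidean distance to the query vector.
        order = sorted(
            range(len(self.texts)),
            key=lambda i: math.dist(query_vec, self.vectors[i]),
        )
        return [self.texts[i] for i in order[: self.k]]


retriever = ToyRetriever(
    ["the union is strong", "protect the right to vote", "secure the border"],
    k=2,
)
results = retriever.get_relevant("how strong is the nation")
print(results)
```

The `k` argument plays the same role as `search_kwargs=dict(k=K)` in the cell above: it caps how many chunks are handed to the LLM for each question.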

In [26]:
print(query)
print("-----")
print(qabot(dict(query=query))["result"])

what are the nations strengths?
-----
The passage does not explicitly mention the specific strengths of the United States. However, it emphasizes the resilience and strength of the American people, their ability to turn crises into opportunities, and their commitment to protecting freedom, expanding liberty, and defending democracy. It also highlights the nation's history of debating important issues, fighting for freedom, and building a strong and prosperous nation. Additionally, it mentions the mobilization of American forces to defend NATO allies and the implementation of economic sanctions to support Ukraine.


Trying another query:

In [27]:
def query_qabot(qabot, query: str):
    # Use the function argument rather than the global new_query variable
    print(query)
    print("-----")
    return qabot(dict(query=query))["result"]

In [28]:
new_query = "what are the things this country needs to protect?"
query_qabot(qabot, new_query)

what are the things this country needs to protect?
-----
"Based on the provided context, the country needs to protect the following:\n\n1. The right to vote and ensure that every vote is counted.\n2. The torch of liberty that has led generations of immigrants to the United States.\n3. The pathway to citizenship for Dreamers, temporary status holders, farm workers, and essential workers.\n4. The border security by implementing new technology, joint patrols, and dedicated immigration judges.\n5. American industries and jobs by buying American-made products and leveling the playing field with competitors like China.\n6. Communities by investing in crime prevention, community police officers, and holding law enforcement accountable.\n7. Access to healthcare, including preserving a woman's right to choose and advancing maternal health care.\n8. LGBTQ+ rights by passing the bipartisan Equality Act.\n9. Democracy by protecting it from threats and ensuring fairness and opportunity for all.\n10

Retrieval Augmented Generation is a valuable technique that combines the capabilities of language models such as GPT-3.5 with the power of information retrieval.
By enhancing the input with contextually specific data, RAG enables language models to produce responses that are not only more precise but also well-suited to the context.
Particularly in enterprise scenarios where extensive fine-tuning may not be feasible, RAG presents an efficient and economically viable approach to deliver personalized and informed interactions with users.

## 6. Delete the KDB.AI Table

Once finished with the table, it is best practice to drop it.

In [29]:
table.drop()

True

We hope you found this sample helpful! Your feedback is important to us, and we would appreciate it if you could take a moment to fill out our brief survey. Your input helps us improve our content.

[**Take the Survey**](https://delighted.com/t/dgCLUkdx)