# Retrieval Augmented Generation (RAG) with LangChain and KDB.AI

This example demonstrates how to use an advanced prompt engineering technique called Retrieval Augmented Generation (RAG), with hands-on examples using LangChain, KDB.AI and various LLMs.

## What is RAG and Why Do We Need it?

Large Language Models (LLMs) have remarkable capabilities in generating human-like text, and are used in applications ranging from chatbots to content generation and translation. However, they struggle to stay up to date with recent world events: they are essentially frozen in time, limited to the static snapshot of knowledge captured during their training.

To bridge this gap and address the need for specialized, real-time information, the concept of "Retrieval Augmented Generation" (RAG) has emerged as a powerful solution. RAG enables these language models to access relevant data from external knowledge bases, enriching their responses with current and contextually accurate information.

## Aim

In this tutorial, we'll cover:
- Using LangChain to create OpenAI embeddings for a text document
- Storing these embeddings in a KDB.AI vector database
- Answering queries with various LLMs whose prompts are augmented with context retrieved from the vector database, i.e. RAG


## 1. Install Dependencies

We first need to install some libraries:

```python
%pip install ipython jupyter pandas pyarrow openai pypdf tiktoken kdbai-client git+https://github.com/KxSystems/langchain.git@KDB.AI#subdirectory=libs/langchain -q
```

Note: you may need to restart the kernel to use updated packages.


### Load Libraries

Load the libraries needed in this tutorial, including `getpass` to set API tokens, `kdbai_client` to connect to KDB.AI, and the LangChain modules we will use.

```python
from getpass import getpass
import os
import pandas as pd
import kdbai_client as kdbai
```

```python
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import KDBAI
from langchain import HuggingFaceHub
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain
```

## 2. Set API Keys

To follow this example, you will need both an [OpenAI API Key](https://platform.openai.com/apps) and a [Hugging Face API Token](https://huggingface.co/docs/hub/security-tokens).

You can create both for free by registering using the links provided. Once you have the credentials, enter them below.

```python
os.environ['OPENAI_API_KEY'] = getpass('OpenAI API Key: ')
os.environ['HUGGINGFACEHUB_API_TOKEN'] = getpass('Hugging Face API Token: ')
```
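If you re-run the notebook often, you may prefer not to be prompted for the keys every time. A minimal sketch, assuming you are willing to export the keys as environment variables before starting Jupyter: `get_secret` is a hypothetical helper (not part of LangChain or `getpass`) that reads the variable first and only prompts when it is unset.

```python
import os
from getpass import getpass


def get_secret(env_var: str, prompt: str) -> str:
    """Return the value of env_var, prompting with getpass only if it is unset or empty."""
    value = os.environ.get(env_var)
    if not value:
        value = getpass(prompt)
        # Store it so later cells (and LangChain) can read it from the environment
        os.environ[env_var] = value
    return value


# Usage with the variable names this tutorial sets:
# get_secret('OPENAI_API_KEY', 'OpenAI API Key: ')
# get_secret('HUGGINGFACEHUB_API_TOKEN', 'Hugging Face API Token: ')
```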

## 3. Data Preparation 

### Load TXT Document

The code below reads the TXT file and splits the document into chunks. The document is a State of the Union address from the President of the United States to the United States Congress.

```python
# Load the document we want to prompt an LLM about
loader = TextLoader('./state_of_the_union.txt')
doc = loader.load()

# Split the document into chunks of at most 500 characters using LangChain's RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=0
)

# split_documents returns a list of Document chunks; keep only their text
chunks = text_splitter.split_documents(doc)
texts = [p.page_content for p in chunks]
```
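`RecursiveCharacterTextSplitter` tries to break on natural boundaries (by default paragraph breaks, then newlines, then spaces, then individual characters) so chunks do not cut words in half. The sketch below ignores separators entirely and only illustrates how `chunk_size` and `chunk_overlap` interact; `fixed_size_chunks` is a hypothetical helper, not LangChain's algorithm.

```python
def fixed_size_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 0) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the previous one,
    so consecutive chunks share chunk_overlap characters.
    """
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks
```

With `chunk_overlap=0`, as in this tutorial, a 1200-character text becomes chunks of 500, 500 and 200 characters. A non-zero overlap repeats the tail of each chunk at the start of the next, which can help when an answer straddles a chunk boundary.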

### Embeddings
 
We will use `OpenAIEmbeddings` to convert each chunk into a vector suitable for the vector database. We select the `text-embedding-ada-002` model, which produces 1536-dimensional embeddings, matching the `dims` value of 1536 in the table schema below.

```python
embeddings = OpenAIEmbeddings(model='text-embedding-ada-002')
```

## 4. Use KDB.AI as Vector Database

### Connect to KDB.AI

To use KDB.AI, you will need two session details: a URL endpoint and an API key. You can get these by signing up for free [here](https://trykdb.kx.com/kdbai/signup).

You can connect to a KDB.AI session using `kdbai.Session`. Enter the session URL endpoint and API key details from your KDB.AI Cloud portal below.

```python
# Connect to KDB.AI
print('Connect KDB.AI session...')
KDBAI_ENDPOINT = input('KDB.AI endpoint: ')
KDBAI_API_KEY = getpass('KDB.AI API key: ')
session = kdbai.Session(api_key=KDBAI_API_KEY, endpoint=KDBAI_ENDPOINT)
```

```python
# Set up the table that will back the vector store
schema_rag = {'columns': [{'name': 'id', 'pytype': 'str'},
                          {'name': 'text', 'pytype': 'bytes'},
                          {'name': 'embeddings',
                           'pytype': 'float32',
                           # 1536 dims to match text-embedding-ada-002, L2 (Euclidean) distance, exhaustive 'flat' index
                           'vectorIndex': {'dims': 1536, 'metric': 'L2', 'type': 'flat'}}]}

print('Create table "rag_langchain"...')
table = session.create_table('rag_langchain', schema_rag)
```

```text
Create table "rag_langchain"...
```


### Store in KDB.AI

We can now store our data in KDB.AI by passing a few parameters to `KDBAI.from_texts`:

- `session`: our handle to talk to KDB.AI
- `table_name`: our KDB.AI table name
- `texts`: the chunked document
- `embedding`: the embedding model we have chosen
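Conceptually, storing the chunks means embedding every text and inserting one row per chunk whose columns match `schema_rag`. The sketch below is an assumption about the shape of that data, not the KDB.AI wrapper's actual implementation (which may batch inserts or generate ids differently); `build_rows` and the toy `embed` function are hypothetical. In the real pipeline, the `embed` slot corresponds to LangChain's `OpenAIEmbeddings.embed_documents`, which maps a list of texts to a list of vectors.

```python
import uuid
from typing import Callable, Dict, List


def build_rows(texts: List[str],
               embed: Callable[[List[str]], List[List[float]]]) -> List[Dict]:
    """Hypothetical sketch: turn text chunks into rows shaped like schema_rag."""
    vectors = embed(texts)  # one vector per chunk (1536 floats for text-embedding-ada-002)
    return [
        {
            'id': str(uuid.uuid4()),       # 'str' column: a random unique id per chunk
            'text': text.encode('utf-8'),  # 'bytes' column: the raw chunk text
            'embeddings': vector,          # 'float32' column carrying the vector index
        }
        for text, vector in zip(texts, vectors)
    ]


# Usage with a toy 2-dimensional embedding instead of a real model:
rows = build_rows(["hello", "hi"], lambda ts: [[float(len(t)), 0.0] for t in ts])
```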

```python
# Use KDB.AI as the vector store: embed each chunk and insert it into the 'rag_langchain' table
vecdb_kdbai = KDBAI.from_texts(session, 'rag_langchain', texts=texts, embedding=embeddings)
```

Now that the vector embeddings are stored in KDB.AI, we are ready to query them.

## 5. Similarity Search 

Before we implement RAG, let's run a similarity search directly on the KDB.AI vector store. Because the table's index uses the `L2` metric, the search ranks stored chunks by the Euclidean distance between their embeddings and the query's embedding: the smaller the distance, the more similar the chunk.

```python
query = "what are the nations strengths?"

# query_sim holds the results of the similarity search: the chunks closest to the query
query_sim = vecdb_kdbai.similarity_search(query)
print(query_sim)
```

```text
[Document(page_content='We are the only nation on Earth that has always turned every crisis we have faced into an opportunity. \n\nThe only nation that can be defined by a single word: possibilities. \n\nSo on this night, in our 245th year as a nation, I have come to report on the State of the Union. \n\nAnd my report is this: the State of the Union is strong—because you, the American people, are strong. \n\nWe are stronger today than we were a year ago. \n\nAnd we will be stronger a year from now than we are today.', metadata={'id': '958497bd-21f0-4d51-8934-68a6469a3d65', 'embeddings': array([-2.1954027e-03, -1.8670579e-02,  4.0691000e-05, ...,
       -9.3835755e-04,  7.5132987e-03, -2.0782286e-02], dtype=float32)})]
```
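Because the table was created with `'metric': 'L2'` and `'type': 'flat'`, the search compares the query embedding against every stored embedding using the Euclidean distance `sqrt(sum((q_i - v_i) ** 2))` and keeps the closest ones. A minimal pure-Python sketch of that exhaustive search, assuming toy 2-dimensional vectors in place of 1536-dimensional embeddings; `flat_l2_search` is a hypothetical helper, and KDB.AI's own implementation is not shown here.

```python
import math
from typing import List, Sequence, Tuple


def flat_l2_search(query: Sequence[float],
                   vectors: List[Sequence[float]],
                   k: int) -> List[Tuple[int, float]]:
    """Exhaustive ('flat') search: return (row index, L2 distance) for the k nearest vectors."""
    # Compute the Euclidean distance from the query to every stored vector
    distances = [(i, math.dist(query, v)) for i, v in enumerate(vectors)]
    # Smaller distance means more similar, so sort ascending and keep the first k
    distances.sort(key=lambda pair: pair[1])
    return distances[:k]


# Usage: the query [0, 0] is nearest to row 0 (distance 0.0), then row 2 (distance 1.0)
nearest = flat_l2_search([0.0, 0.0], [[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]], k=2)
```

A flat index gives exact results at the cost of scanning every row, which is fine for a single document's worth of chunks.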


The search returns the chunks of text most similar to our query. That is a reasonable start, but the raw output is hard to read. It would be much better to summarize these findings into a human-readable response, and this is where RAG comes in!

## 6. Retrieval Augmented Generation 

There are four different ways to do [question answering (QA) in LangChain](https://python.langchain.com/docs/use_cases/question_answering/#go-deeper-4):
- `load_qa_chain` performs QA over all the documents passed to it
- `RetrievalQA` retrieves the most relevant chunks of text and performs QA on that subset; it uses `load_qa_chain` under the hood
- `VectorstoreIndexCreator` is a higher-level wrapper around `RetrievalQA` that needs fewer lines of code
- `ConversationalRetrievalChain` builds on `RetrievalQA` to add a chat history component

In this tutorial we will implement the first two.

### load_qa_chain with OpenAI and HuggingFace LLMs

We set up two question-answering chains, one for an OpenAI model and one for a Hugging Face Hub model, using LangChain's `load_qa_chain` function. We then run the same similarity search as before and pass the query and the retrieved chunks to both chains, printing each model's response so we can compare how they answer the question about the nation's strengths.

```python
# Select two LLMs: OpenAI text-davinci-003, and Hugging Face Hub google/flan-t5-xxl (designed for short answers)
# Note: OpenAI has since retired text-davinci-003; a current completion model such as gpt-3.5-turbo-instruct can be substituted
llm_openai = OpenAI(model="text-davinci-003", max_tokens=512)
llm_flan = HuggingFaceHub(repo_id="google/flan-t5-xxl", model_kwargs={"temperature": 0.5, "max_length": 512})
```


We chose `chain_type="stuff"`, the most straightforward of the document chains: it takes a list of documents, inserts them all into a single prompt, and passes that prompt to the LLM.
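The "stuff" strategy amounts to string concatenation before a single model call. A minimal sketch, assuming an illustrative template that is not LangChain's actual default prompt; `TEMPLATE` and `stuff_prompt` are hypothetical names.

```python
from typing import List

# Illustrative prompt template (hypothetical wording, not LangChain's default)
TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def stuff_prompt(chunks: List[str], question: str) -> str:
    """Place every chunk into one prompt, as the 'stuff' chain type does."""
    context = "\n\n".join(chunks)  # all documents go into a single prompt, separated by blank lines
    return TEMPLATE.format(context=context, question=question)


prompt = stuff_prompt(["Chunk A", "Chunk B"], "What is it?")
```

Because everything goes into one prompt, this only works while the combined chunks fit in the model's context window. For larger inputs, `load_qa_chain` also offers the `map_reduce`, `refine` and `map_rerank` chain types.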

```python
# Create a QA chain for each model using LangChain's load_qa_chain
chain_openAI = load_qa_chain(llm_openai, chain_type="stuff")
chain_HuggingFaceHub = load_qa_chain(llm_flan, chain_type="stuff")

query = "what are the nations strengths?"

# Gather the chunks most related to the query
query_sim = vecdb_kdbai.similarity_search(query)

# Run each chain on the query and the related chunks from the document
print("OpenAI Response: ")
print(chain_openAI.run(input_documents=query_sim, question=query), '\n')
print("HuggingFaceHub Response: ")
print(chain_HuggingFaceHub.run(input_documents=query_sim, question=query))
```

```text
OpenAI Response: 
 The American people are strong and capable of turning any crisis into an opportunity. 

HuggingFaceHub Response: 
possibilities
```


The OpenAI response is longer and more detailed, and does a better job of summarizing the nation's strengths from the retrieved text. Flan-T5 answers with a single word, in line with a model designed for short answers.

### RetrievalQA with GPT-3.5 

Let's try the second method, `RetrievalQA`. This time we'll use GPT-3.5 as our LLM.

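The code below defines a question-answering bot that combines OpenAI's GPT-3.5 Turbo (16k-context variant), which generates the responses, with a retriever that fetches the `K` most relevant chunks from the KDB.AI vector store. Setting `return_source_documents=True` also returns the retrieved chunks alongside the answer.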
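`RetrievalQA` with `chain_type='stuff'` follows three steps: retrieve the nearest chunks, augment the prompt with them, and generate an answer. The sketch below wires these steps together in plain Python, assuming a toy `embed` function, an in-memory list of `(text, vector)` pairs standing in for the KDB.AI table, and any callable `llm`. `retrieval_qa` and all of its parameters are hypothetical, not LangChain's API.

```python
import math
from typing import Callable, List, Sequence, Tuple


def retrieval_qa(question: str,
                 embed: Callable[[str], Sequence[float]],
                 store: List[Tuple[str, Sequence[float]]],
                 llm: Callable[[str], str],
                 k: int = 10) -> str:
    """Hypothetical retrieve-augment-generate pipeline."""
    q = embed(question)
    # Retrieve: rank stored chunks by L2 distance to the question embedding, keep the k nearest
    nearest = sorted(store, key=lambda row: math.dist(q, row[1]))[:k]
    # Augment: "stuff" the retrieved chunks into a single prompt
    context = "\n\n".join(text for text, _ in nearest)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    # Generate: one LLM call on the augmented prompt
    return llm(prompt)


# Usage with a toy embedding (string length) and an "LLM" that echoes its prompt
store = [("short", [5.0, 0.0]), ("a much longer chunk", [19.0, 0.0])]
answer = retrieval_qa("hello", lambda s: [float(len(s)), 0.0], store, lambda p: p, k=1)
```

Only the retrieved subset reaches the model, which is the key difference from calling `load_qa_chain` on every document.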

```python
K = 10  # number of chunks the retriever returns for each query

qabot = RetrievalQA.from_chain_type(chain_type='stuff',
                                    llm=ChatOpenAI(model='gpt-3.5-turbo-16k', temperature=0.0),
                                    retriever=vecdb_kdbai.as_retriever(search_kwargs=dict(k=K)),
                                    return_source_documents=True)
```


```python
print(f'\n\n{query}\n')
print(qabot(dict(query=query))['result'])
```

```text
what are the nations strengths?

The passage mentions several strengths of the United States:

1. Resilience and ability to turn crises into opportunities.
2. Strong and united American people.
3. History of debating and addressing great questions and achieving great things.
4. Commitment to freedom, liberty, and democracy.
5. Strong military capabilities and commitment to defending NATO allies.
6. Mobilization of ground forces, air squadrons, and ship deployments to protect NATO countries.
7. Leadership in releasing petroleum reserves to stabilize gas prices.
8. Confidence in overcoming challenges and capacity to succeed.
9. Support for democracy and peace, with democracies rising to the occasion.
10. Diplomatic influence and resolve.
11. Preparedness and coordination with allies in responding to threats.
12. Imposing economic sanctions on Russia and isolating Putin from the world.
13. Unity and determination of the Ukrainian people in the face of aggression.
14. Support from fellow
```

Trying another query:

```python
query = "what are the things this country needs to protect?"
print(f'\n\n{query}\n')
print(qabot(dict(query=query))['result'])
```

```text
what are the things this country needs to protect?

Based on the provided context, the country needs to protect the following:

1. The right to vote and ensure that every vote is counted.
2. The torch of liberty and the values that have attracted generations of immigrants to the United States.
3. The pathway to citizenship for Dreamers, temporary status holders, farm workers, and essential workers.
4. The border security by implementing new technology, joint patrols, and dedicated immigration judges.
5. American industries and jobs by buying American products and leveling the playing field with competitors like China.
6. Communities by investing in crime prevention, community police officers, and holding law enforcement accountable.
7. Access to healthcare, including preserving a woman's right to choose and advancing maternal health care.
8. LGBTQ+ rights by passing the bipartisan Equality Act.
9. Democracy by protecting it from threats and ensuring fairness and opportunity for all.
```


Retrieval Augmented Generation is a valuable technique that combines the generative capabilities of language models such as GPT-3.5 with information retrieval.
By adding context-specific data to the prompt, RAG lets language models produce responses that are more accurate and better suited to the question at hand.
It is especially useful in enterprise scenarios where extensive fine-tuning may not be feasible: RAG offers an efficient, cost-effective way to deliver personalized and informed interactions with users.