# Retrieval-Augmented Generation (RAG) Using LLMs

In this notebook, we create several prototypes of a Retrieval-Augmented Generation (RAG) system. We start with a basic case where the input documents are small enough to fit in the LLM context, and then develop more advanced solutions that can handle large document collections.

### Data
We use small input documents that are available in the `tensor-house-data` repository.

## Environment Setup and Initialization

In [6]:
#
# Initialize LLM provider
# (google-cloud-aiplatform must be installed)
#
from google.cloud import aiplatform
aiplatform.init(
    project='<< specify your project name here >>',
    location='us-central1'
)

In [41]:
#
# Imports
#
from langchain.llms import VertexAI

from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA, ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

from langchain import PromptTemplate
from langchain.chains.question_answering import load_qa_chain

from langchain.document_loaders import UnstructuredHTMLLoader, TextLoader
from langchain.embeddings.vertexai import VertexAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter

## Question Answering Using a Single Prompt

The most basic scenario is querying a small document that fits entirely in the LLM prompt.
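
Whether a document is small enough can be estimated before the prompt is sent. The sketch below is a rough pre-check and is not part of the original notebook: the ratio of 4 characters per token, the 8,000-token limit, the 512-token answer reserve, and the function names `fits_in_context` and `build_prompt` are illustrative assumptions. Actual limits depend on the model and its tokenizer.

```python
def fits_in_context(document: str,
                    question: str,
                    max_tokens: int = 8000,          # assumed context limit
                    chars_per_token: float = 4.0,    # assumed rough ratio
                    reserved_for_answer: int = 512) -> bool:
    """Roughly estimate whether document + question fit into the prompt."""
    estimated_tokens = (len(document) + len(question)) / chars_per_token
    return estimated_tokens + reserved_for_answer <= max_tokens


def build_prompt(document: str, question: str) -> str:
    """Place the whole document and the question into a single prompt."""
    return (
        "Please read the following text and answer the question.\n\n"
        f"TEXT:\n{document}\n\n"
        f"QUESTION:\n{question}\n"
    )


short_doc = "Lemons and limes have particularly high concentrations of citric acid."
question = "Which fruits have the highest concentration of citric acid?"

if fits_in_context(short_doc, question):
    prompt = build_prompt(short_doc, question)
```

If the check fails, the document has to be split and processed with one of the approaches described in the following sections.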

In [68]:
query_prompt = """
Please read the following text and answer which fruits have the highest concentration of citric acid.

TEXT:
Citric acid occurs in a variety of fruits and vegetables, most notably citrus fruits. Lemons and limes have particularly 
high concentrations of the acid; it can constitute as much as 8% of the dry weight of these fruits (about 47 g/L in the juices).
The concentrations of citric acid in citrus fruits range from 0.005 mol/L for oranges and grapefruits to 0.30 mol/L in lemons 
and limes; these values vary within species depending upon the cultivar and the circumstances under which the fruit was grown.
"""

llm = VertexAI(temperature=0.7)
response = llm(query_prompt)
print(response)

The text states that "Lemons and limes have particularly high concentrations of the acid; it can constitute as much as 8% of the dry weight of these fruits (about 47 g/L in the juices)". So the correct answer is lemons and limes.


## Question Answering Using MapReduce

For large documents and collections of documents that do not fit the LLM context, we can apply the MapReduce pattern to independently extract relevant summaries from document parts, and then merge these summaries into the final answer. This approach is appropriate for summarization and summarization-like queries.
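
The pattern itself can be sketched without any framework. The example below is a minimal illustration, not LangChain's implementation: `map_reduce_answer` and `echo_llm` are hypothetical names, and `echo_llm` is a stand-in that records prompts instead of calling a real model.

```python
from typing import Callable, List


def map_reduce_answer(chunks: List[str],
                      question: str,
                      llm: Callable[[str], str]) -> str:
    # Map step: query each chunk independently for relevant bullet points
    notes = [
        llm(
            "Return bullet points from the text that help to answer the question.\n\n"
            f"{chunk}\n\nQuestion: {question}\nBullet points:"
        )
        for chunk in chunks
    ]

    # Reduce step: merge the per-chunk notes into one final answer
    summaries = "\n".join(notes)
    return llm(
        "Given the following bullet points and a question, create a final answer.\n"
        f"Question: {question}\n\n{summaries}\n\nFinal answer:"
    )


# Stand-in for a real model so that the flow can be run offline
calls: List[str] = []

def echo_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"response {len(calls)}"


answer = map_reduce_answer(["chunk A", "chunk B"], "What?", echo_llm)
```

With two chunks, the model is called three times: once per chunk in the map step and once in the reduce step. The number of map calls grows linearly with the number of chunks, which is the main cost of this approach.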

In [84]:
#
# Load the input document
#
loader = TextLoader("../../tensor-house-data/search/food-additives/citric-acid-applications.txt")
documents = loader.load()

#
# Splitting
#
text_splitter = CharacterTextSplitter(chunk_size=2000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
print(f'The input document has been split into {len(texts)} chunks\n')

#
# Querying
#
question_prompt = """
Use the following portion of a long document to see if any of the text is relevant to answer the question. 
Return bullet points that help to answer the question.

{context}

Question: {question}
Bullet points:
"""
question_prompt_template = PromptTemplate(template=question_prompt, input_variables=["context", "question"])

combine_prompt = """
Given the following bullet points extracted from a long document and a question, create a final answer.
Question: {question}

=========
{summaries}
=========

Final answer:
"""
combine_prompt_template = PromptTemplate(template=combine_prompt, input_variables=["summaries", "question"])

llm = VertexAI(temperature=0.7)
qa_chain = load_qa_chain(llm=llm,
                         chain_type='map_reduce',
                         question_prompt=question_prompt_template,
                         combine_prompt=combine_prompt_template,
                         verbose=False)

question = "What are the three most important applications of citric acid? Provide a short justification for each application."
response = qa_chain({"input_documents": texts, "question": question})

print(response['output_text'])

The input document has been split into 2 chunks

The three most important applications of citric acid are:
1. As a flavoring and preservative in food and beverages, especially soft drinks.
2. In soaps and laundry detergents to remove hard water stains.
3. In the biotechnology and pharmaceutical industry to passivate high purity process piping.


## Question Answering Using Vector Search

For large documents and point questions that can be answered from specific document parts, LLMs can be combined with traditional information retrieval techniques. The input documents are split into chunks, which are then indexed in a vector store. To answer a user question, the most relevant chunks are retrieved and passed to the LLM as context.
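
The retrieval step can be illustrated with a toy example. The sketch below replaces the learned embeddings and the vector store with word-count vectors and a linear scan, purely as an assumption made for readability; `embed`, `cosine`, and `retrieve` are hypothetical helper names, and the two sample chunks are made up for the demonstration.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words count vector (real systems use learned vectors)
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(chunks: List[str], question: str, k: int = 1) -> List[str]:
    # Rank all chunks by similarity to the question and keep the top k
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)
    return ranked[:k]


chunks = [
    "Citric acid is a weak organic acid with a melting point of about 153 °C.",
    "Aspartame is an artificial sweetener used as a sugar substitute.",
]
top = retrieve(chunks, "What is the melting point of citric acid?", k=1)
```

Only the chunks in `top` would then be placed into the LLM prompt, so the prompt size depends on `k` and the chunk size rather than on the size of the whole collection.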

In [17]:
#
# Load the input document
#
loader = TextLoader("../../tensor-house-data/search/food-additives/food-additives.txt")
documents = loader.load()

#
# Splitting
#
text_splitter = CharacterTextSplitter(chunk_size=3000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
print(f'The input document has been split into {len(texts)} chunks\n')

#
# Indexing and storing
#
embeddings = VertexAIEmbeddings()
docsearch = Chroma.from_documents(texts, embeddings)

#
# Querying
#
llm = VertexAI(temperature=0.7, verbose=True)
qa_chain = RetrievalQA.from_chain_type(llm=llm, 
                                       chain_type="stuff", 
                                       retriever=docsearch.as_retriever(search_kwargs={"k": 1}), 
                                       return_source_documents=True)

question = "What is the melting point of citric acid?"
response = qa_chain({"query": question})

print(response['result'])

The input document has been split into 5 chunks

The melting point of citric acid is approximately 153 °C (307 °F).


## Conversational Retrieval

In this section, we prototype a conversational retrieval system. The chain combines the chat history with the retrieved documents, so follow-up questions can refer back to earlier turns.
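
One common way to combine chat history with retrieval is to first rewrite a follow-up into a standalone question and then retrieve with that rewritten question. The sketch below illustrates this flow under stated assumptions: it is not LangChain's implementation, `condense_question` and `chat_turn` are hypothetical names, and the stub functions stand in for a real model and retriever.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (question, answer) pairs


def condense_question(history: History, follow_up: str,
                      llm: Callable[[str], str]) -> str:
    # Without history, the question is already standalone
    if not history:
        return follow_up
    transcript = "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in history)
    return llm(
        "Rephrase the follow-up question as a standalone question.\n\n"
        f"Chat history:\n{transcript}\n\n"
        f"Follow-up: {follow_up}\nStandalone question:"
    )


def chat_turn(history: History, question: str,
              retrieve: Callable[[str], List[str]],
              llm: Callable[[str], str]) -> str:
    standalone = condense_question(history, question, llm)
    context = "\n\n".join(retrieve(standalone))
    answer = llm(
        "Use the context to answer the question.\n\n"
        f"{context}\n\nQuestion: {standalone}\nAnswer:"
    )
    history.append((question, answer))  # remember the turn for later follow-ups
    return answer


# Stubs so that the flow can be run without a model or a vector store
prompts: List[str] = []

def stub_llm(prompt: str) -> str:
    prompts.append(prompt)
    return f"out{len(prompts)}"

def stub_retrieve(query: str) -> List[str]:
    return [f"doc for {query}"]


history: History = []
first = chat_turn(history, "Q1", stub_retrieve, stub_llm)
second = chat_turn(history, "Q2", stub_retrieve, stub_llm)
```

The first turn needs one model call, while each later turn needs two: one to resolve references such as "its" against the history, and one to answer from the retrieved context.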

In [65]:
#
# Load the input document
#
loader = TextLoader("../../tensor-house-data/search/food-additives/food-additives.txt")
documents = loader.load()

#
# Splitting
#
text_splitter = CharacterTextSplitter(chunk_size=3000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
print(f'The input document has been split into {len(texts)} chunks\n')

#
# Indexing and storing
#
embeddings = VertexAIEmbeddings()
docsearch = Chroma.from_documents(texts, embeddings)

#
# Initialize new chat
#
llm = VertexAI(verbose=True)
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
chat = ConversationalRetrievalChain.from_llm(llm, 
                                             retriever=docsearch.as_retriever(search_kwargs={"k": 1}), 
                                             memory=memory, 
                                             verbose=False, 
                                             return_generated_question=False)

#
# Ask questions with a continuous context
#
def print_last_chat_turn(chat_history):
    # The last two messages are the latest question and its answer
    print(chat_history[-2].content, '\n')
    print(chat_history[-1].content, '\n')

result = chat({"question": "How much times aspartam is sweeter than table sugar?"})
print_last_chat_turn(result['chat_history'])

result = chat({"question": "What is its caloric value?"})
print_last_chat_turn(result['chat_history'])

The input document has been split into 5 chunks

How much times aspartam is sweeter than table sugar? 

Aspartame is approximately 150 to 200 times sweeter than sucrose (table sugar), making it an intense sweetener. 

What is its caloric value? 

Aspartame is virtually calorie-free, as the human body does not metabolize it into energy like sugars or carbohydrates. 

