<a href="https://colab.research.google.com/github/datastax/ragstack-ai/blob/main/examples/notebooks/advancedRAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Advanced RAG

This notebook describes two different Advanced RAG techniques: MultiQueryRAG and ParentDocumentRAG.

In **MultiQueryRAG**, an LLM automates prompt tuning by generating multiple queries from 
different perspectives for a given user question.

In **ParentDocumentRAG**, documents are split first into larger "parent" chunks, and then into smaller "child"
chunks so that their embeddings can more accurately reflect their meaning. Then between retrieval and inference, 
each smaller "child" chunk is replaced with its larger "parent" chunk. This provides more context to the 
model to answer the question.

While both of these techniques can increase the response accuracy, they also can have some drawbacks. Often they
take longer to execute, and they can cost more due to increased LLM invocations and/or increased token usage.

## Prerequisites

You need a vector-enabled Astra database and an OpenAI account.

* Create an [Astra vector database](https://docs.datastax.com/en/astra-serverless/docs/getting-started/create-db-choices.html).
* Create an [OpenAI account](https://openai.com/).
* Within your database, create an [Astra DB Access Token](https://docs.datastax.com/en/astra-serverless/docs/manage/org/manage-tokens.html) with Database Administrator permissions.
* Get your Astra DB Endpoint:
    * `https://<ASTRA_DB_ID>-<ASTRA_DB_REGION>.apps.astra.datastax.com`
* See the [Prerequisites](https://docs.datastax.com/en/ragstack/docs/prerequisites.html) page for more details.
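
As a minimal sketch, the endpoint pattern above can be assembled from its two parts. The helper name `build_astra_endpoint` and the sample ID and region values are made up for illustration; use the values shown for your own database.

```python
def build_astra_endpoint(db_id: str, region: str) -> str:
    """Fill in the endpoint pattern listed in the prerequisites."""
    return f"https://{db_id}-{region}.apps.astra.datastax.com"


# Hypothetical placeholder values, not a real database
toy_endpoint = build_astra_endpoint("example-db-id", "us-east1")
print(toy_endpoint)
```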

## Setup

Install RAGStack, configure your secrets, and create some helper methods.

In [None]:
%pip install ragstack-ai

In [None]:
# Configure your secrets
import getpass
import os

os.environ["ASTRA_DB_API_ENDPOINT"] = input("Enter your Astra DB API Endpoint: ")
os.environ["ASTRA_DB_APPLICATION_TOKEN"] = getpass.getpass("Enter your Astra DB Token: ")
os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key: ")

In [None]:
# Create helper functions for printing docs and results
import textwrap


def pprint_docs(docs) -> None:
    print(
        f"\n{'-' * 70}\n".join(
            [
                f"Document {i+1}:\n\n" + "\n".join(textwrap.wrap(d.page_content))
                for i, d in enumerate(docs)
            ]
        )
    )


def pprint_result(result) -> None:
    print("Answer: " + "\n".join(textwrap.wrap(result)))

## Sample Data

Download and prepare the sample data. Set up the prompt and the questions you will ask.

In [None]:
# Retrieve the text of a short story that will be indexed in the vector store
# ruff: noqa: E501
! curl https://raw.githubusercontent.com/CassioML/cassio-website/main/docs/frameworks/langchain/texts/amontillado.txt --output amontillado.txt
SAMPLEDATA = ["amontillado.txt"]

In [None]:
# Alternatively, provide your own file.
# However, you will want to update your queries to match the content of your file.

# Upload sample file (Note: this cell assumes you are on Google Colab).
# In a local Jupyter notebook, you can instead provide the file path directly
# by uncommenting and running just the next line.
# SAMPLEDATA = ["<path_to_file>"]

from google.colab import files

print("Please upload your own sample file:")
uploaded = files.upload()
if uploaded:
    SAMPLEDATA = uploaded
else:
    raise ValueError("Cannot proceed without Sample Data. Please re-run the cell.")

print("Please make sure to change your queries to match the contents of your file!")

In [None]:
import os

from langchain.document_loaders import PyPDFLoader, TextLoader

# Loop through each file and load its contents into a list of documents
docs = []
for filename in SAMPLEDATA:
    path = os.path.join(os.getcwd(), filename)

    # Supported file types are pdf and txt
    if filename.endswith(".pdf"):
        loader = PyPDFLoader(path)
        new_docs = loader.load()
        print(f"Processed pdf file: {filename}")
    elif filename.endswith(".txt"):
        loader = TextLoader(
            path,
        )
        new_docs = loader.load()
        print(f"Processed txt file: {filename}")
    else:
        # Reset new_docs so an unsupported file never adds stale or undefined results
        print(f"Unsupported file type: {filename}")
        new_docs = []

    if len(new_docs) > 0:
        docs.extend(new_docs)

# Empty the list of file names in case this cell is run multiple times
SAMPLEDATA = []

print("\nProcessing done.")
len(docs)

In [None]:
# Build a simple prompt and a set of questions to ask
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough

prompt_template = """
Answer the question based only on the supplied context. If you don't know the answer, say you don't know the answer.
Context: {context}
Question: {question}
Your answer:
"""
prompt = ChatPromptTemplate.from_template(prompt_template)

questions = [
    "What motivates the narrator, Montresor, to seek revenge against Fortunato?",
    "What are the major themes in this story?",
    "What is the significance of the story taking place during the carnival season?",
    "How is vivid and descriptive language used in the story?",
    "Is there any foreshadowing in the story? If yes, how is it used in the story?",
]

In [None]:
# Create a helper to iterate over the questions
from langchain.callbacks import get_openai_callback


def do_retrieval(chain) -> None:
    for question in questions:
        print("-" * 40)
        print(f"Question: {question}\n")
        # Track the OpenAI token usage for each question
        with get_openai_callback() as cb:
            pprint_result(chain.invoke(question))
            print(f"\nTotal Tokens: {cb.total_tokens}\n")

## Document embedding and loading

Next, create embeddings for the documents and insert them into the Astra DB vector store.

For the purpose of this example, you will use a method that is compatible with 
ParentDocumentRAG. If you aren't going to use this technique, you can review the 
`astradb.ipynb` example for a simpler document insertion method.

You will create two splitters: a parent splitter and a child splitter. The parent splitter 
splits your documents into 512-token parent documents. The child splitter splits each parent 
document into 128-token child documents. 

Embeddings for the child documents are generated and stored in the vector store. 

For the demo, the parent documents will only be stored in-memory. In a production system, the
parent documents should be stored in a database.
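
To make the parent/child relationship concrete, here is a minimal, library-free sketch. It is not how `ParentDocumentRetriever` is implemented: whitespace-separated words stand in for model tokens, the chunk sizes are scaled down from 512/128, and the helper names (`split_words`, `build_parent_child_index`) are made up for illustration.

```python
import uuid


def split_words(text, chunk_size):
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [
        " ".join(words[i : i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


def build_parent_child_index(text, parent_size, child_size):
    """Return (parents, children).

    parents maps a generated parent id to the parent text (the "docstore").
    children is a list of (child_text, parent_id) pairs; only the child
    texts would be embedded and stored in the vector store.
    """
    parents = {}
    children = []
    for parent_text in split_words(text, parent_size):
        parent_id = str(uuid.uuid4())
        parents[parent_id] = parent_text
        for child_text in split_words(parent_text, child_size):
            children.append((child_text, parent_id))
    return parents, children


# 20 toy "tokens", split into parents of 8 and children of 4
toy_text = " ".join(f"w{i}" for i in range(20))
toy_parents, toy_children = build_parent_child_index(
    toy_text, parent_size=8, child_size=4
)
print(len(toy_parents), "parents,", len(toy_children), "children")
```

Each child keeps the id of the parent it came from, which is what later allows a retrieved child to be swapped for its larger parent.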

In [None]:
import os

from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain.text_splitter import TokenTextSplitter
from langchain_astradb import AstraDBVectorStore

# Initialize the models
model = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.1)
embedding = OpenAIEmbeddings()

# Initialize a vector store for storing the child chunks
vstore = AstraDBVectorStore(
    collection_name="advancedRAG",
    embedding=embedding,
    token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
    api_endpoint=os.environ["ASTRA_DB_API_ENDPOINT"],
)

# Initialize in-memory storage for the parent chunks
parent_store = InMemoryStore()

# Create a splitter for the parent documents
parent_splitter = TokenTextSplitter(chunk_size=512, chunk_overlap=0)

# Create a splitter for the child documents
# Note: child documents should be smaller than parent documents
child_splitter = TokenTextSplitter(chunk_size=128, chunk_overlap=0)

# Create a parent document retriever
parent_retriever = ParentDocumentRetriever(
    vectorstore=vstore,
    docstore=parent_store,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)

In [None]:
# Split and load the documents into the vector and parent stores
parent_retriever.add_documents(docs)

## Base Retriever

As a control, first make a standard RAG pipeline.

In [None]:
# Standard RAG, nothing fancy
base_retriever = vstore.as_retriever()

base_chain = (
    {"context": base_retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

In [None]:
# Run the queries on the base_chain
do_retrieval(base_chain)

----------------------------------------
Question: What motivates the narrator, Montresor, to seek revenge against Fortunato?

Answer: The narrator, Montresor, seeks revenge against Fortunato because of
the insults and injuries he has endured from him.

Total Tokens: 852

----------------------------------------
Question: What are the major themes in this story?

Answer: I don't know the answer.

Total Tokens: 818

----------------------------------------
Question: What is the significance of the story taking place during the carnival season?

Answer: The significance of the story taking place during the carnival season
is not clear from the given context.

Total Tokens: 832

----------------------------------------
Question: How is vivid and descriptive language used in the story?

Answer: Vivid and descriptive language is used in the story to create a sense
of atmosphere and to convey the intense emotions and actions of the
characters.

Total Tokens: 842

----------------------------

### Analysis

Some of the questions were answered well, and others were not. Note that 
`I don't know the answer.` should be considered a positive result. It is better than a
hallucination.

One nice thing with standard RAG is that the number of tokens used is quite low. This
keeps costs down. 

To dig deeper, you can examine the context used to answer the third question:

In [None]:
pprint_docs(base_retriever.get_relevant_documents(questions[2]))

Document 1:

 during the supreme madness of the carnival season, that I encountered
my friend.  He accosted me with excessive warmth, for he had been
drinking much.  The man wore motley. He had on a tight-fitting parti-
striped dress, and his head was surmounted by the conical cap and
bells.  I was so pleased to see him, that I thought I should never
have done wringing his hand.  I said to him--"My dear Fortunato, you
are luckily met.  How remarkably well you are looking to-day!
----------------------------------------------------------------------
Document 2:

ado."  Thus speaking, Fortunato possessed himself of my arm. Putting
on a mask of black silk, and drawing a _roquelaire_ closely about my
person, I suffered him to hurry me to my palazzo.  There were no
attendants at home; they had absconded to make merry in honour of the
time.  I had told them that I should not return until the morning, and
had given them explicit orders not to stir from the house. These
orders were sufficient,

There are 4 documents of size 128 tokens.

## MultiQueryRAG

Now try the first advanced technique. When the `MultiQueryRetriever` module is used, an additional
LLM call is made before retrieval. This call generates multiple versions of the initial question 
from different perspectives. Retrieval is then performed on this set of questions.
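
The flow described above can be sketched without any LLM or vector store. This is only an illustration under stated assumptions: `toy_generate_queries` is a made-up stand-in for the LLM rewriting step, `toy_search` ranks by word overlap instead of embeddings, and combining the per-query results into one de-duplicated list is an assumed merging strategy rather than a description of the library's internals.

```python
def toy_generate_queries(question):
    """Hypothetical stand-in for the LLM call that rephrases the question."""
    return [question, f"Explain: {question}", f"Background for: {question}"]


def toy_search(query, corpus, k=2):
    """Return the k corpus entries sharing the most lowercase words with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def toy_multi_query_retrieve(question, corpus, k=2):
    """Run one search per generated query and merge the unique results in order."""
    seen = set()
    unique_docs = []
    for query in toy_generate_queries(question):
        for doc in toy_search(query, corpus, k):
            if doc not in seen:
                seen.add(doc)
                unique_docs.append(doc)
    return unique_docs


toy_corpus = [
    "the carnival season madness",
    "a mask of black silk",
    "background for the revenge plot",
    "season of wine",
]
single_query_docs = toy_search("carnival season", toy_corpus)
multi_query_docs = toy_multi_query_retrieve("carnival season", toy_corpus)
print(len(single_query_docs), "documents from one query")
print(len(multi_query_docs), "documents from the generated queries")
```

Because each generated query can surface different chunks, the merged context is usually larger than what a single query returns, which is also why token usage grows.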

In [None]:
# Build the MultiQueryRAG chain
from langchain.retrievers.multi_query import MultiQueryRetriever

# Note that this retriever depends on the base_retriever
multi_retriever = MultiQueryRetriever.from_llm(retriever=base_retriever, llm=model)

multi_chain = (
    {"context": multi_retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

In [None]:
# Run the questions on the multi_chain
do_retrieval(multi_chain)

----------------------------------------
Question: What motivates the narrator, Montresor, to seek revenge against Fortunato?

Answer: The narrator, Montresor, seeks revenge against Fortunato because of
the insults and injuries he has endured from him.

Total Tokens: 1209

----------------------------------------
Question: What are the major themes in this story?

Answer: I don't know the answer.

Total Tokens: 1531

----------------------------------------
Question: What is the significance of the story taking place during the carnival season?

Answer: The significance of the story taking place during the carnival season
is not clear from the given context.

Total Tokens: 1176

----------------------------------------
Question: How is vivid and descriptive language used in the story?

Answer: Vivid and descriptive language is used in the story to create a
detailed and immersive atmosphere. It helps to paint a clear picture
of the setting, such as the crypt and the catacombs, and to co

### Analysis

The results using MultiQueryRAG are different. It is unclear whether they are better. 

The number of tokens used has increased. Response time has also increased because of the
extra LLM call.

To dig deeper, you can examine the context used to answer the third question:

In [None]:
pprint_docs(multi_retriever.get_relevant_documents(questions[2]))

Document 1:

 during the supreme madness of the carnival season, that I encountered
my friend.  He accosted me with excessive warmth, for he had been
drinking much.  The man wore motley. He had on a tight-fitting parti-
striped dress, and his head was surmounted by the conical cap and
bells.  I was so pleased to see him, that I thought I should never
have done wringing his hand.  I said to him--"My dear Fortunato, you
are luckily met.  How remarkably well you are looking to-day!
----------------------------------------------------------------------
Document 2:

ado."  Thus speaking, Fortunato possessed himself of my arm. Putting
on a mask of black silk, and drawing a _roquelaire_ closely about my
person, I suffered him to hurry me to my palazzo.  There were no
attendants at home; they had absconded to make merry in honour of the
time.  I had told them that I should not return until the morning, and
had given them explicit orders not to stir from the house. These
orders were sufficient,

There are 5 documents of size 128 tokens. The model might benefit from the extra context provided when answering the question.

## ParentDocumentRAG

The second advanced technique uses the `ParentDocumentRetriever` defined above. Remember
that this will perform a post-processing step between retrieval and inference to replace the
child documents with their parents. After this is done, any duplicate documents are removed.
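
The replace-and-deduplicate step can be sketched in a few lines of plain Python. This is a simplified illustration, not the retriever's actual code: the function name `toy_children_to_parents`, the ids, and the sample texts are all made up.

```python
def toy_children_to_parents(child_hits, parent_lookup):
    """Swap ranked child hits for their parents, keeping each parent once.

    child_hits is a ranked list of (child_text, parent_id) pairs.
    parent_lookup maps parent_id to the full parent text.
    """
    seen_ids = set()
    parent_docs = []
    for _child_text, parent_id in child_hits:
        if parent_id in seen_ids:
            # A sibling child already pulled in this parent
            continue
        seen_ids.add(parent_id)
        parent_docs.append(parent_lookup[parent_id])
    return parent_docs


# Hypothetical data: two of the three retrieved children share parent "p1"
toy_parent_lookup = {"p1": "full text of parent one", "p2": "full text of parent two"}
toy_child_hits = [("child a", "p1"), ("child b", "p2"), ("child c", "p1")]
toy_context = toy_children_to_parents(toy_child_hits, toy_parent_lookup)
print(toy_context)
```

Three retrieved children collapse to two parents here, so the context is larger per document but contains no repeated text.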

In [None]:
# Build the ParentDocumentRAG chain
parent_chain = (
    {"context": parent_retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

In [None]:
# Run it over the questions
do_retrieval(parent_chain)

----------------------------------------
Question: What motivates the narrator, Montresor, to seek revenge against Fortunato?

Answer: The narrator, Montresor, seeks revenge against Fortunato because
Fortunato insulted him.

Total Tokens: 1708

----------------------------------------
Question: What are the major themes in this story?

Answer: The major themes in this story are revenge, deception, and the power
of manipulation.

Total Tokens: 1695

----------------------------------------
Question: What is the significance of the story taking place during the carnival season?

Answer: The significance of the story taking place during the carnival season
is that it provides a chaotic and festive atmosphere, which allows the
narrator to carry out his revenge plot without arousing suspicion.

Total Tokens: 1719

----------------------------------------
Question: How is vivid and descriptive language used in the story?

Answer: Vivid and descriptive language is used in the story to create 

### Analysis

With ParentDocumentRAG, you get decent answers for all 5 questions.

The number of tokens used has gone up significantly, but the response time is similar to standard RAG.
The extra cost might be worth the improvement in results.

Again, you can dig deeper by looking at the context used to answer the third question:

In [None]:
pprint_docs(parent_retriever.get_relevant_documents(questions[2]))

Document 1:

The thousand injuries of Fortunato I had borne as I best could, but
when he ventured upon insult, I vowed revenge.  You, who so well know
the nature of my soul, will not suppose, however, that I gave
utterance to a threat.  _At length_ I would be avenged; this was a
point definitely settled--but the very definitiveness with which it
was resolved, precluded the idea of risk.  I must not only punish, but
punish with impunity.  A wrong is unredressed when retribution
overtakes its redresser.  It is equally unredressed when the avenger
fails to make himself felt as such to him who has done the wrong.  It
must be understood that neither by word nor deed had I given Fortunato
cause to doubt my good will.  I continued, as was my wont, to smile in
his face, and he did not perceive that my smile _now_ was at the
thought of his immolation.  He had a weak point--this Fortunato--
although in other regards he was a man to be respected and even
feared.  He prided himself on his connoiss

There are 3 documents that are 512 tokens in size. The additional context helps the LLM generate a good answer for the question.

## Combined techniques

Finally, you can combine both the MultiQuery and ParentDocument techniques, as shown below.

In [None]:
multi_parent_retriever = MultiQueryRetriever.from_llm(
    retriever=parent_retriever, llm=model
)

multi_parent_chain = (
    {"context": multi_parent_retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

In [None]:
do_retrieval(multi_parent_chain)

----------------------------------------
Question: What motivates the narrator, Montresor, to seek revenge against Fortunato?

Answer: The narrator, Montresor, seeks revenge against Fortunato because
Fortunato insulted him.

Total Tokens: 1880

----------------------------------------
Question: What are the major themes in this story?

Answer: The major themes in this story are revenge, deception, and the power
of manipulation.

Total Tokens: 2373

----------------------------------------
Question: What is the significance of the story taking place during the carnival season?

Answer: The significance of the story taking place during the carnival season
is that it provides a chaotic and festive atmosphere, which allows the
narrator to carry out his revenge plot without arousing suspicion.

Total Tokens: 2417

----------------------------------------
Question: How is vivid and descriptive language used in the story?

Answer: Vivid and descriptive language is used in the story to create 

### Analysis

This is by far the most expensive technique, but it may return the best results.
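
To put rough numbers on the cost difference, the sketch below averages the `Total Tokens` values printed above for the first three questions, which are the ones reported for all four chains. Treat the ratios as indicative only: they come from a single run on one short story, and the variable names are illustrative.

```python
from statistics import mean

# Total Tokens reported in the outputs above (questions 1 to 3 only)
token_counts = {
    "base": [852, 818, 832],
    "multi_query": [1209, 1531, 1176],
    "parent_document": [1708, 1695, 1719],
    "combined": [1880, 2373, 2417],
}

base_mean = mean(token_counts["base"])

# Ratio of each chain's mean token usage to the base chain's mean
ratios = {name: mean(counts) / base_mean for name, counts in token_counts.items()}

for name, counts in token_counts.items():
    print(f"{name}: mean tokens = {mean(counts):.0f}, x{ratios[name]:.2f} vs. base")
```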

## Cleanup

Astra has a maximum number of collections per DB. Delete the collection after you are done exploring this example.

In [None]:
# Sometimes this call returns a timeout. Usually it works on the second try.
vstore.delete_collection()