# Build an Advanced RAG App: Query Rewriting

This notebook gives a step-by-step example for the article "Build an Advanced RAG App: Query Rewriting", available at https://ruxu.dev. The purpose of this example is to showcase various query rewriting techniques in a RAG pipeline.

We will showcase different strategies to perform query rewriting on the input query of a RAG pipeline. These strategies range from the most basic to more advanced ones. Then, we will compare the results of a RAG pipeline with and without query rewriting.

First, we will install required dependencies.

In [None]:
!pip install langchain langchain-community pypdf sentence_transformers faiss-cpu langchain-anthropic

For these examples I'll be using Anthropic's Claude 3.5 Sonnet model.

> In order for it to work, remember to set the Secret Variable "ANTHROPIC_API_KEY" to your own Anthropic API Key, or change the model to any of your choice.

In [None]:
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate, FewShotChatMessagePromptTemplate
from langchain_community.embeddings import HuggingFaceBgeEmbeddings
from google.colab import userdata

api_key = userdata.get('ANTHROPIC_API_KEY')
model = ChatAnthropic(model='claude-3-5-sonnet-20240620', api_key=api_key)


embeddings_model_name = "BAAI/bge-small-en"
model_kwargs = {"device": "cpu"}
encode_kwargs = {"normalize_embeddings": True}
embeddings_model = HuggingFaceBgeEmbeddings(
    model_name=embeddings_model_name, model_kwargs=model_kwargs, encode_kwargs=encode_kwargs
)

## Zero-shot Query Rewriting
This is the simplest form of query rewriting. Zero-shot refers to the prompt engineering technique of giving the LLM only instructions for the task, with no examples of how to perform it.

In [None]:
system_rewrite = """You are a helpful assistant that generates multiple search queries based on a single input query.

Perform query expansion. If there are multiple common ways of phrasing a user question
or common synonyms for key words in the question, make sure to return multiple versions
of the query with the different phrasings.

If there are acronyms or words you are not familiar with, do not try to rephrase them.

Return 3 different versions of the question."""

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_rewrite),
        ("human", "{question}"),
    ]
)

chain = prompt | model

response = chain.invoke({
    "question": "Which food items does this recipe need?"
})

response
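
The `response` above is a chat message whose `content` holds the rewrites as free text. One lightweight way to turn that text into a list of queries is sketched below. It assumes the model returns one rewrite per line, optionally prefixed with a number or bullet, which is not guaranteed. The `sample_content` string is made up for illustration and `extract_queries` is a hypothetical helper, not part of LangChain.

```python
import re
from typing import List


def extract_queries(text: str) -> List[str]:
    """Return lines that end in '?', stripped of list markers such as '1.' or '-'."""
    queries = []
    for line in text.splitlines():
        # Remove a leading number ("1." / "1)") or bullet ("-", "*", "•"), if present.
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*•])\s*", "", line).strip()
        if cleaned.endswith("?"):
            queries.append(cleaned)
    return queries


# Made-up example of what the model's text output could look like.
sample_content = """Here are 3 versions of the question:

1. What ingredients are required for this recipe?
2. Which food items do I need to make this dish?
3. What are the necessary ingredients for this recipe?"""

print(extract_queries(sample_content))
```

In the notebook this would be called as `extract_queries(response.content)`. A structured-output parser, as used later in this notebook, is the more robust option because it does not depend on the text layout.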

## Few-shot Query Rewriting

For a slightly better result at the cost of using a few more tokens per rewrite, we can give some examples of how we want the rewrite to be done.

In [None]:
examples = [
    {
        "question": "How tall is the Eiffel Tower? It looked so high when I was there last year",
        "answer": "What is the height of the Eiffel Tower?"
    },
    {
        "question": "1 oz is 28 grams, how many cm is 1 inch?",
        "answer": "Convert 1 inch to cm."
    },
    {
        "question": "What's the main point of the article? What did the author try to convey?",
        "answer": "What is the main key point of this article?"
    }
]
example_prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{question}"),
        ("ai", "{answer}"),
    ]
)

few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples
)
final_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_rewrite),
        few_shot_prompt,
        ("human", "{question}"),
    ]
)

chain = final_prompt | model
response = chain.invoke({
    "question": "Which food items does this recipe need?"
})
response

## Trainable rewriter

We can fine-tune a pre-trained model to perform the query rewriting task. Instead of relying on examples in the prompt, we can teach it how query rewriting should be done to achieve the best results in context retrieval. We can also train it further using Reinforcement Learning so it learns to recognize problematic queries and avoid toxic and harmful phrases.

Alternatively, we can use an open-source model that somebody else has already trained on the task of query rewriting.

## Sub-queries

If the user query contains multiple questions, context retrieval can get tricky. Each question probably needs different information, and we are not going to get all of it if we use all the questions together as the basis for a single retrieval. To solve this problem, we can decompose the input into multiple sub-queries and perform retrieval for each of them.

In [None]:
system_decompose = """You are a helpful assistant that generates search queries based on a single input query.

Perform query decomposition. Given a user question, break it down into distinct sub questions that
you need to answer in order to answer the original question.

If there are acronyms or words you are not familiar with, do not try to rephrase them."""

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_decompose),
        ("human", "{question}"),
    ]
)

chain = prompt | model

response = chain.invoke({
    "question": """Which is the most popular programming language for machine learning and
is it the most popular programming language overall?"""
})

response
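
The cell above only produces the sub-questions. As a minimal sketch of the retrieval step described at the start of this section, the snippet below runs one search per sub-query and keeps the results grouped by sub-query. The `retrieve_per_subquery` helper, the toy corpus, and the keyword-overlap `toy_search` function are all hypothetical stand-ins so the example runs on its own. In a real pipeline the search function would be a vector store lookup.

```python
from typing import Callable, Dict, List


def retrieve_per_subquery(
    sub_queries: List[str],
    search: Callable[[str, int], List[str]],
    k: int = 1,
) -> Dict[str, List[str]]:
    """Run one search per sub-query and return the chunks grouped by sub-query."""
    return {sub_query: search(sub_query, k) for sub_query in sub_queries}


# Toy corpus, used only to make the sketch runnable.
corpus = [
    "Python is widely used for machine learning.",
    "JavaScript is often ranked among the most used languages overall.",
    "FAISS is a library for vector similarity search.",
]


def toy_search(query: str, k: int) -> List[str]:
    """Rank corpus chunks by the number of lowercase words shared with the query."""
    query_words = set(query.lower().replace("?", "").split())
    ranked = sorted(
        corpus,
        key=lambda chunk: len(query_words & set(chunk.lower().replace(".", "").split())),
        reverse=True,
    )
    return ranked[:k]


sub_queries = [
    "Which programming language is most popular for machine learning?",
    "Which programming language is most popular overall?",
]
results = retrieve_per_subquery(sub_queries, toy_search, k=1)
for sub_query, chunks in results.items():
    print(sub_query, "->", chunks)
```

Keeping the results grouped makes it possible to show the LLM which chunk was retrieved for which sub-question, although simply concatenating all chunks into one context is also a reasonable choice.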

## Step-back prompt

Many questions can be a bit too complex for the RAG pipeline’s retrieval to grasp the multiple levels of information needed to answer them. For these cases, it can be helpful to generate multiple additional queries to use for retrieval. These queries will be more generic than the original query. This will enable the RAG pipeline to retrieve relevant information on multiple levels.

In [None]:
system_step_back = """You are an expert at taking a specific question and extracting a more generic question that gets at
the underlying principles needed to answer the specific question.

Given a specific user question, write a more generic question that needs to be answered in order to answer the specific question.

If you don't recognize a word or acronym, do not try to rewrite it.

Write concise questions."""

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_step_back),
        ("human", "{question}"),
    ]
)

chain = prompt | model

response = chain.invoke({
    "question": """Which is the most popular programming language for machine learning?"""
})

response

## HyDE

Another method to improve how queries are matched with context chunks is Hypothetical Document Embeddings, or HyDE. Sometimes questions and their answers are not that semantically similar, which can cause the RAG pipeline to miss critical context chunks in the retrieval stage. However, even if the query is semantically different from its answer, a response to the query should be semantically similar to another response to the same query. The HyDE method consists of creating hypothetical context chunks that answer the query and using them, instead of the query itself, to match the real context that will help the LLM answer.

In [None]:
actual_document = """
Berkson's paradox, also known as Berkson's bias, collider bias, or Berkson's fallacy, is a result in conditional probability
and statistics which is often found to be counterintuitive, and hence a veridical paradox. It is a complicating factor arising in
statistical tests of proportions. Specifically, it arises when there is an ascertainment bias inherent in a study design. The effect is
related to the explaining away phenomenon in Bayesian networks, and conditioning on a collider in graphical models.

It is often described in the fields of medical statistics or biostatistics, as in the original description of the problem by Joseph Berkson.
"""

actual_document_emb = embeddings_model.embed_documents([actual_document])

In [None]:
system_hyde = """You are an expert at using a question to generate a document useful for answering the question.

Given a question, generate a paragraph of text that answers the question.
"""

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_hyde),
        ("human", "{question}"),
    ]
)

chain = prompt | model

hypothetical_document = chain.invoke({
    "question": """What does Berkson's paradox consist of?"""
})
hypothetical_document

In [None]:
from sklearn.metrics.pairwise import cosine_similarity

question_embeddings = embeddings_model.embed_documents(["What does Berkson's paradox consist of?"])
hypothetical_document_emb = embeddings_model.embed_documents([hypothetical_document.content])

print(f"Similarity without HyDE: {cosine_similarity(question_embeddings, actual_document_emb)}")
print(f"Similarity with HyDE: {cosine_similarity(hypothetical_document_emb, actual_document_emb)}")
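
For reference, the cosine similarity printed above is the dot product of two vectors divided by the product of their lengths. The sketch below spells this out in plain Python on made-up 2-D vectors, not real embeddings. The `cosine` helper is hypothetical and only mirrors the formula. Since the embeddings in this notebook are created with `normalize_embeddings=True`, they should have unit length, in which case the cosine similarity reduces to the plain dot product.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 2-D vectors chosen for illustration only.
same_direction = cosine([1.0, 0.0], [2.0, 0.0])  # parallel vectors
orthogonal = cosine([1.0, 0.0], [0.0, 1.0])      # perpendicular vectors
close = cosine([3.0, 4.0], [4.0, 3.0])           # similar but not identical direction
print(same_direction, orthogonal, close)
```

A higher value means the two texts point in a more similar direction in embedding space, which is why the HyDE comparison looks for the larger of the two printed scores.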

## Example: RAG with vs without Query Rewriting

Taking the RAG pipeline from the previous article, “How to build a basic RAG app”, we will add Query Rewriting to it. We will ask a slightly more advanced question than last time and observe whether the response improves when Query Rewriting is used. First, let's build the same RAG pipeline. This time, though, I'll use only the top document returned from the vector database, so the pipeline is less forgiving of missed documents.

In [None]:
from langchain_community.document_loaders import PyPDFLoader

document_url = "https://arxiv.org/pdf/2312.10997.pdf"
loader = PyPDFLoader(document_url)
pages = loader.load()

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=400,
    chunk_overlap=40,
    length_function=len,
    is_separator_regex=False,
)
chunks = text_splitter.split_documents(pages)

In [None]:
chunk_texts = list(map(lambda d: d.page_content, chunks))
embeddings = embeddings_model.embed_documents(chunk_texts)

In [None]:
from langchain_community.vectorstores import FAISS

text_embedding_pairs = zip(chunk_texts, embeddings)
db = FAISS.from_embeddings(text_embedding_pairs, embeddings_model)

In [None]:
query = "Which evaluation tools are useful for evaluating a RAG pipeline?"

contexts = db.similarity_search(query, k=1)

print(contexts[0])

In [None]:
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are an expert at answering questions based on a context extracted from a document. The context extracted from the document is: {context}"),
        ("human", "{question}"),
    ]
)
chain = prompt | model

response = chain.invoke({
    "context": '\n\n'.join(list(map(lambda c: c.page_content, contexts))),
    "question": query
})
response

The response is good and based on the context, but the pipeline got caught up in my asking about evaluation and missed that I was specifically asking for tools. As a result, the context used does have information on some benchmarks, but it misses the next chunk of information, which talks about tools.

Now, let's implement the same RAG pipeline, this time with Query Rewriting. In addition to the query rewriting prompts we have already seen in the previous examples, I'll be using a Pydantic parser to extract and iterate over the generated alternative queries.

In [None]:
from langchain_core.pydantic_v1 import BaseModel, Field

class ParaphrasedQuery(BaseModel):
    """You have performed query expansion to generate a paraphrasing of a question."""

    paraphrased_query: str = Field(
        description="A unique paraphrasing of the original question.",
    )

In [None]:
from langchain.output_parsers import PydanticToolsParser

rewrite_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_rewrite),
        ("human", "{question}"),
    ]
)
llm_with_tools = model.bind_tools([ParaphrasedQuery])
query_analyzer = rewrite_prompt | llm_with_tools | PydanticToolsParser(tools=[ParaphrasedQuery])

queries = query_analyzer.invoke({
    "question": query
})
queries

In [None]:
contexts = []
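# Use a distinct loop variable so `query` keeps the original question for the final prompt.
for paraphrased in queries:
    contexts = contexts + db.similarity_search(paraphrased.paraphrased_query, k=1)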
contexts
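
Since several paraphrases may retrieve the same top chunk, the loop above can produce duplicated contexts. Below is a minimal sketch of order-preserving deduplication. It assumes each retrieved item exposes a `page_content` attribute, as the retrieved documents in this notebook do. The `Doc` dataclass and the `dedupe_contexts` helper are hypothetical stand-ins so the snippet runs on its own.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document with a page_content field."""
    page_content: str


def dedupe_contexts(docs: Iterable[Doc]) -> List[Doc]:
    """Keep the first occurrence of each distinct page_content, preserving order."""
    seen = set()
    unique = []
    for doc in docs:
        if doc.page_content not in seen:
            seen.add(doc.page_content)
            unique.append(doc)
    return unique


# Made-up retrieval results in which two paraphrases hit the same chunk.
retrieved = [
    Doc("chunk about benchmarks"),
    Doc("chunk about tools"),
    Doc("chunk about benchmarks"),
]
unique_contexts = dedupe_contexts(retrieved)
print([doc.page_content for doc in unique_contexts])
```

In the notebook this could be applied as `contexts = dedupe_contexts(contexts)` before building the final prompt, which avoids spending tokens on repeated chunks.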

In [None]:
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are an expert at answering questions based on a context extracted from a document. The context extracted from the document is: {context}"),
        ("human", "{question}"),
    ]
)
chain = prompt | model

response = chain.invoke({
    "context": '\n\n'.join(list(map(lambda c: c.page_content, contexts))),
    "question": query
})
response

The rewritten queries now match the chunk of information I wanted my answer to come from, giving the LLM a better chance of producing a much better response to my question.