# Query Rewriting

In RAG, the user's query is often a source of problems: it may be inaccurate, ambiguous, or missing information. In general, it is common for the raw query to be poorly suited to the retrieval phase of RAG.

Imagine the following scenario: you're building an application that automatically answers support tickets, so the queries are emails from customers. Several issues can arise:

- Emails often contain noise, such as greetings and signatures, which is irrelevant to the question and can confuse the model.
- The user might have multiple questions in the same email, which makes it harder to retrieve the correct documents. 
- The question posed by the user might be ambiguous, or not specific enough.

A group of **Pre-retrieval** techniques, categorised as **Query Rewriting**, tries to remedy issues like these. This notebook explores these methods of rewriting queries to make them more suitable for the retrieval phase of RAG.

The following examples will be covered:

- **Rewrite-Retrieve-Read**: Explore a technique proposed in the paper [Query Rewriting for Retrieval-Augmented Large Language Models](https://arxiv.org/pdf/2305.14283.pdf)
- **Hypothetical Document Embeddings (HyDE)**: Generate hypothetical documents and retrieve with their embeddings instead of the query's, as proposed in the paper [Precise Zero-Shot Dense Retrieval without Relevance Labels](https://arxiv.org/pdf/2212.10496.pdf)
- **Step-Back Prompting**: A prompting technique that has the LLM abstract the question into high-level concepts, based on the paper [Take A Step Back: Evoking Reasoning via Abstraction in Large Language Models](https://arxiv.org/pdf/2310.06117)

### Setup libraries and environment

In [None]:
import os
from dotenv import load_dotenv
from util.helpers import get_wiki_pages, create_and_save_wiki_md_files
from util.query_engines import VerboseHyDEQueryTransform, VerboseStepBackQueryEngine, RewriteRetrieveReadQueryEngine

from llama_index.llms.openai import OpenAI
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    PromptTemplate,
)
from llama_index.core.query_engine import TransformQueryEngine

Add the following to a `.env` file in the root of the project if not already there.

```
OPENAI_API_KEY=<YOUR_KEY_HERE>
```

In [None]:
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
model = "gpt-4-turbo"
llm = OpenAI(api_key=OPENAI_API_KEY, model=model)

## Rewrite-Retrieve-Read

This technique improves on the baseline RAG setup by adding an intermediate step between the input and the retriever. This step uses an LLM to filter and rephrase the input query before passing it to the retriever. It also allows multiple queries to be generated (one for each question in the input), which can then be sent to the retriever separately.

In the following example, we have two knowledge sources that contain information about different subjects. For the sake of this example, each knowledge source is built from a single Wikipedia article. Because each rewritten question is sent to the knowledge source that matches its category, this can also be seen as a form of **Query Routing**.

Let's start by loading the data and building an index for each knowledge source.

In [None]:
van_gogh_pages = get_wiki_pages(["Vincent van Gogh"])
amsterdam_pages = get_wiki_pages(["Amsterdam"])

In [None]:
create_and_save_wiki_md_files(van_gogh_pages, path="./data/docs/vangogh/")
create_and_save_wiki_md_files(amsterdam_pages, path="./data/docs/amsterdam/")

In [None]:
van_gogh_docs = SimpleDirectoryReader("./data/docs/vangogh").load_data()
amsterdam_docs = SimpleDirectoryReader("./data/docs/amsterdam").load_data()

In [None]:
vg_index = VectorStoreIndex.from_documents(documents=van_gogh_docs, show_progress=True)
a_index = VectorStoreIndex.from_documents(documents=amsterdam_docs, show_progress=True)

Now we create a custom pipeline that uses the LLM to rewrite the query before passing it to the retriever. The pipeline has the following steps:
- **Rewriter**: Uses the LLM to split the query into separate questions and assign each question to the knowledge source that should answer it.
- **Retriever**: Retrieves the documents for each question from its assigned knowledge source.
- **Reader**: Answers each question using the documents retrieved for that question.
- **AnswerMerger**: Merges the reader's answers into a single response and validates them.
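The four steps above can be sketched as plain Python functions to show how data flows between them. This is a minimal sketch, not the notebook's `RewriteRetrieveReadQueryEngine`: the stage signatures, `rewrite_retrieve_read`, and the toy rewriter, retrievers, reader, and merger are all hypothetical stand-ins for the LLM and LlamaIndex calls.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stage signatures; in the notebook the LLM and the per-category
# LlamaIndex retrievers play these roles.
Rewriter = Callable[[str], List[Tuple[str, str]]]  # query -> [(category, question)]
Retriever = Callable[[str], List[str]]              # question -> document texts
Reader = Callable[[str, List[str]], str]            # (question, docs) -> answer
Merger = Callable[[str, List[str]], str]            # (query, answers) -> final answer


def rewrite_retrieve_read(query: str, rewrite: Rewriter, retrievers: Dict[str, Retriever],
                          read: Reader, merge: Merger) -> str:
    answers: List[str] = []
    # Rewriter: split the noisy query into (category, question) pairs.
    for category, question in rewrite(query):
        retriever = retrievers.get(category)
        if retriever is None:
            continue  # no knowledge source for this category; skip the question
        # Retriever + Reader: each question is answered from its own source.
        answers.append(read(question, retriever(question)))
    # AnswerMerger: combine the per-question answers into one response.
    return merge(query, answers)


# Toy stand-ins so the sketch runs without an LLM or a vector index.
KNOWLEDGE = {
    "VAN_GOGH": ["Vincent van Gogh was a Dutch Post-Impressionist painter."],
    "AMSTERDAM": ["Amsterdam is known for its canals and museums."],
}
# Each toy retriever ignores the question and returns its whole source.
toy_retrievers = {cat: (lambda q, docs=docs: docs) for cat, docs in KNOWLEDGE.items()}


def toy_rewrite(query: str) -> List[Tuple[str, str]]:
    # Hard-coded output an LLM rewriter might produce for the example email.
    return [("VAN_GOGH", "Who is Vincent van Gogh?"),
            ("AMSTERDAM", "What is Amsterdam famous for?")]


result = rewrite_retrieve_read(
    "Hello, ... Who is Vincent van Gogh and what is Amsterdam famous for? Kind regards, Billy",
    rewrite=toy_rewrite,
    retrievers=toy_retrievers,
    read=lambda question, docs: docs[0],
    merge=lambda query, answers: " ".join(answers),
)
print(result)
```

Routing happens in the `retrievers.get(category)` lookup, which is why the technique doubles as a simple form of query routing; questions the rewriter assigns to an unknown category are dropped rather than sent to the wrong source.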

The following are the prompts used for the rewriter and the merging step:

In [None]:
REWRITE_PROMPT = PromptTemplate(
    """You're a helpful AI assistant that helps people learn about different topics. 
Given the following query: 
-----------------------------------
{query_str},
-----------------------------------

Extract each question from the query and categorize it into one of the following categories separated into key and description pairs:
-----------------------------------
{categories_with_descriptions}
-----------------------------------

Your output should be a comma-separated list of questions, each prepended with its corresponding category in square brackets.
Example: 
-----------------------------------
"What is the capital of France? And who is Prince?" -> "[Geography]What is the capital of France?,[People]Who is Prince?"
-----------------------------------
Answer:
"""
)

VALIDATE_AND_MERGE = PromptTemplate(
    """You are a helpful AI assistant that validates, corrects and combines information in answers to a query
Query:
-----------------------------------
{query_str}
-----------------------------------

Answers:
-----------------------------------
{answers}
-----------------------------------

Validate, correct and combine the answers to provide a single coherent response.
Answer:    
"""
)
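The rewriter prompt asks for output such as `[VAN_GOGH]Who is Vincent van Gogh?,[AMSTERDAM]What is Amsterdam famous for?`. Splitting that on commas would break any question that itself contains a comma. Below is a minimal sketch of a more robust parser, assuming the model follows the requested format. `parse_rewritten_query` is a hypothetical helper, and the notebook's `RewriteRetrieveReadQueryEngine` may parse its output differently.

```python
import re
from typing import Iterable, List, Tuple

# "[CATEGORY]" followed by the question text up to the next "[" (or the end).
TAGGED_QUESTION = re.compile(r"\[([^\]]+)\]([^\[]*)")


def parse_rewritten_query(output: str, categories: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse '[CAT]Question?,[CAT]Question?' into (category, question) pairs.

    Splitting on the bracketed tags rather than on commas keeps questions that
    contain commas intact. Pairs with an unknown category are dropped.
    """
    allowed = set(categories)
    pairs: List[Tuple[str, str]] = []
    # Drop surrounding whitespace and the quotes used in the prompt's example.
    for category, question in TAGGED_QUESTION.findall(output.strip().strip('"')):
        # Remove the comma separator and any whitespace or quotes around each question.
        question = question.strip().rstrip(",").strip().strip('"')
        if category.strip() in allowed and question:
            pairs.append((category.strip(), question))
    return pairs


raw = '"[VAN_GOGH]Who is Vincent van Gogh?,[AMSTERDAM]What is Amsterdam, the capital of the Netherlands, famous for?"'
print(parse_rewritten_query(raw, ["VAN_GOGH", "AMSTERDAM"]))
```

The sketch still assumes questions never contain a `[` character; filtering against the allowed categories guards against the model inventing a category (such as the `[Geography]` tag from the prompt's example) that has no matching retriever.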

We now define the query engine. See the implementation of `RewriteRetrieveReadQueryEngine` in [query_engines.py](./util/query_engines.py#185).

In [None]:
categories = ["VAN_GOGH", "AMSTERDAM"]
descriptions = ["Questions about Vincent van Gogh", "Questions about Amsterdam"]
retrievers = {"VAN_GOGH": vg_index.as_retriever(similarity_top_k=2), "AMSTERDAM": a_index.as_retriever(similarity_top_k=2)}

query_engine = RewriteRetrieveReadQueryEngine(
    categories=categories,
    descriptions=descriptions,
    retrievers=retrievers,
    llm=llm,
    verbose=True,
)

In [None]:
query = """
Hello, I am a student who is interested in learning about art and history. I have two questions that I would like to know more about.
Who is Vincent van Gogh and what is Amsterdam famous for?

Kind regards, Billy
"""

In [None]:
response = query_engine.query(query)
print("Response:", response)

## Hypothetical Document Embeddings (HyDE)

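This technique uses an LLM to generate hypothetical documents that answer the query, and then retrieves with the embeddings of those documents instead of the embedding of the query itself. Since a generated answer usually resembles the relevant documents more closely than a short question does, this aligns the query with the semantic space of the documents and makes it better suited to the retrieval phase. It is based on the paper [Precise Zero-Shot Dense Retrieval without Relevance Labels](https://arxiv.org/pdf/2212.10496.pdf).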

This method can also be seen as a kind of **Query Expansion**, since it expands the original query with hypothetical documents.

The method is split into four steps:
1. Generate hypothetical documents based on the query using an LLM. These documents may contain incorrect information, but the idea is that they are likely to resemble the relevant documents that need to be retrieved.
2. Vectorize the hypothetical documents using an encoder. The paper argues that the encoder's dense bottleneck filters out noise, such as incorrect details, from the hypothetical documents.
3. Compute the average of the vectorized hypothetical documents to get the hypothetical document embedding.
    - Optionally, the average can also include the embedding of the original query.
4. Retrieve documents using the generated embedding.
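The four steps can be sketched end to end with a toy encoder. This is a minimal illustration under stated assumptions: `toy_embed` is a hypothetical L2-normalized bag-of-words stand-in for a dense encoder (so it does not filter noise the way a trained encoder does), and the hypothetical documents are hard-coded in place of an LLM call. Only the averaging and retrieval logic reflects the method; the `include_original` flag mirrors the optional query embedding in step 3.

```python
import math
import re
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in encoder: L2-normalized bag of words."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def average(vectors: List[Vector]) -> Vector:
    """Element-wise mean of sparse vectors."""
    total: Counter = Counter()
    for vec in vectors:
        total.update(vec)
    return {word: value / len(vectors) for word, value in total.items()}


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(value * b.get(word, 0.0) for word, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hyde_retrieve(query: str, hypothetical_docs: List[str], corpus: List[str],
                  top_k: int = 1, include_original: bool = True) -> List[str]:
    # Step 2: embed each hypothetical document.
    vectors = [toy_embed(doc) for doc in hypothetical_docs]
    # Optional part of step 3: add the original query's embedding to the average.
    if include_original:
        vectors.append(toy_embed(query))
    # Step 3: the HyDE vector is the mean of the embeddings.
    hyde_vector = average(vectors)
    # Step 4: rank the real documents by cosine similarity to the HyDE vector.
    ranked = sorted(corpus, key=lambda doc: cosine(hyde_vector, toy_embed(doc)), reverse=True)
    return ranked[:top_k]


QUERY = "What are the best places to visit in Amsterdam?"
# Step 1 (hard-coded here): passages an LLM might write for the query.
HYPOTHETICAL_DOCS = [
    "Visitors to Amsterdam often see the Rijksmuseum and walk along the canal ring.",
    "Amsterdam attracts visitors with its museums and canals.",
]
CORPUS = [
    "The Rijksmuseum and the canal ring attract many visitors to Amsterdam.",
    "Vincent van Gogh was a Dutch painter.",
]
print(hyde_retrieve(QUERY, HYPOTHETICAL_DOCS, CORPUS))
```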

The following is the default prompt for the HyDE method:


In [None]:
HYDE_TMPL = (
    "Please write a passage to answer the question\n"
    "Try to include as many key details as possible.\n"
    "-----------------------------------\n"
    "{context_str}\n"
    "-----------------------------------\n"
    'Passage:"""\n'
)

A benefit of this method is that the prompt can be adapted to the specific use case. For example, if retrieval targets a specific domain, the prompt can ask for a document from that domain:

- **Fetch from scientific papers**: *Write a scientific paper to answer the following question:*
- **Fetch from legal documents**: *Write a legal document to answer the following question:*
- **Fetch from different languages**: *Write passages in Danish to answer the following question:*
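Swapping the instruction while keeping the rest of the template can be sketched as follows. `DOMAIN_INSTRUCTIONS` and `make_hyde_template` are hypothetical names; the instructions are the ones listed above, and the layout copies `HYDE_TMPL`. The returned string keeps the `{context_str}` placeholder for the query.

```python
from typing import Dict

# Hypothetical mapping from target domain to the first line of the HyDE prompt.
DOMAIN_INSTRUCTIONS: Dict[str, str] = {
    "default": "Please write a passage to answer the question",
    "scientific": "Write a scientific paper to answer the following question:",
    "legal": "Write a legal document to answer the following question:",
    "danish": "Write passages in Danish to answer the following question:",
}


def make_hyde_template(domain: str = "default") -> str:
    """Build a HyDE template with the same layout as HYDE_TMPL.

    Only the first segment is an f-string, so the context_str placeholder in
    the later segments stays literal and is filled with the query later.
    """
    instruction = DOMAIN_INSTRUCTIONS[domain]
    return (
        f"{instruction}\n"
        "Try to include as many key details as possible.\n"
        "-----------------------------------\n"
        "{context_str}\n"
        "-----------------------------------\n"
        'Passage:"""\n'
    )


prompt = make_hyde_template("legal").format(context_str="Can my landlord keep my deposit?")
print(prompt)
```

In LlamaIndex, a string like this can be wrapped in a `PromptTemplate` and passed to `HyDEQueryTransform` through its `hyde_prompt` argument; check the signature in your installed version, and note that whether `VerboseHyDEQueryTransform` forwards this argument depends on its implementation in `query_engines.py`.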

HyDE is available in several frameworks; in LlamaIndex it is implemented as `HyDEQueryTransform`. `VerboseHyDEQueryTransform` is a version of it with added print statements that make the process easier to follow. See the implementation in [query_engines.py](./util/query_engines.py#30).

In [None]:
index = VectorStoreIndex.from_documents(documents=van_gogh_docs + amsterdam_docs)
query_engine = index.as_query_engine(llm=llm)

hyde = VerboseHyDEQueryTransform(llm=llm, include_original=True, verbose=True)
hyde_query_engine = TransformQueryEngine(query_engine, hyde)

base_response = query_engine.query("What are the best places to visit in Amsterdam?")

print("Base response:", base_response)
print("-" * 100)

hyde_response = hyde_query_engine.query("What are the best places to visit in Amsterdam?")
print("-" * 100)
print("HyDE response:", hyde_response)

## Step-Back Prompting

This technique lets the LLM use abstraction to derive high-level concepts. This is done by prompting the LLM to take a step back and think about the question in a more abstract way. It is based on the paper [Take A Step Back: Evoking Reasoning via Abstraction in Large Language Models](https://arxiv.org/pdf/2310.06117).
The idea is to derive a more abstract "step-back question" from the original question. The LLM benefits from this approach when, for example, the query contains a lot of details or is very specific.

Step-back prompting is a general prompt engineering technique and can be used in many different contexts.
Other prompt engineering techniques include:
- **[Few-shot Prompting](https://www.promptingguide.ai/techniques/fewshot)** - Prompts that provide a few examples to enable in-context learning
- **[Chain-of-Thought](https://www.promptingguide.ai/techniques/cot)** - Prompts that guide the LLM through a chain of thoughts to reach a conclusion.
- **[Self-Consistency](https://www.promptingguide.ai/techniques/consistency)** - Sampling several reasoning paths for the same prompt and selecting the most consistent answer
- [Among others](https://www.promptingguide.ai/techniques)... 


The following is an example of how it can be used in the context of RAG, combining step-back and few-shot prompting.
Notice how the few-shot examples show the LLM how to abstract the questions.

In [None]:
STEP_BACK_PROMPT_TMPL = PromptTemplate(
    """
    You are an expert at world knowledge. 
    Your task is to step back and paraphrase a question to a more generic step-back question, which is easier to answer. 
    
    Here are a few examples:
    
    Human: Could the members of The Police perform lawful arrests?
    AI: what can the members of The Police do?
    
    Human: Jan Sindel’s was born in what country?
    AI: what is Jan Sindel’s personal history? 
    
    Question: {query_str}
    Step-back question:"""
)
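One way a step-back query engine might use this prompt is sketched below. This is a hypothetical illustration, not the notebook's `VerboseStepBackQueryEngine`: `step_back_answer`, the `generate` and `retrieve` callables, and the answer prompt are assumptions. It follows a common variant in which context is retrieved for both the original and the step-back question before the LLM answers the original question. The template is a plain `str.format` string; `STEP_BACK_PROMPT_TMPL.template` holds the raw string of the prompt above.

```python
from typing import Callable, List


def step_back_answer(question: str, generate: Callable[[str], str],
                     retrieve: Callable[[str], List[str]], step_back_template: str) -> str:
    # Abstraction: ask the LLM for a more generic step-back question.
    step_back_question = generate(step_back_template.format(query_str=question)).strip()
    # Retrieval: gather context for both questions, dropping duplicate chunks
    # while keeping their order.
    context: List[str] = []
    for chunk in retrieve(question) + retrieve(step_back_question):
        if chunk not in context:
            context.append(chunk)
    # Reasoning: answer the ORIGINAL question grounded in the combined context.
    answer_prompt = (
        "Context:\n"
        + "\n".join(context)
        + f"\n\nStep-back question: {step_back_question}"
        + f"\nOriginal question: {question}\nAnswer:"
    )
    return generate(answer_prompt)


# Toy stand-ins so the sketch runs without an LLM or an index.
calls: List[str] = []


def toy_llm(prompt: str) -> str:
    calls.append(prompt)
    if prompt.endswith("Step-back question:"):
        return " What is Amsterdam known for? "
    return "Answer grounded in: " + prompt.split("\n\n")[0]


def toy_retrieve(query: str) -> List[str]:
    toy_index = {
        "What are the best places to visit in Amsterdam?": ["The canal ring"],
        "What is Amsterdam known for?": ["The Rijksmuseum", "The canal ring"],
    }
    return toy_index.get(query, [])


result = step_back_answer("What are the best places to visit in Amsterdam?", toy_llm,
                          toy_retrieve, "Question: {query_str}\nStep-back question:")
print(result)
```

Keeping the original question in the final prompt matters: the step-back question only broadens retrieval, while the answer should still address what the user asked.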


See the implementation of `VerboseStepBackQueryEngine` in [query_engines.py](./util/query_engines.py#265)

In [None]:
index = VectorStoreIndex.from_documents(documents=van_gogh_docs + amsterdam_docs)
query_engine = VerboseStepBackQueryEngine(retriever=index.as_retriever(similarity_top_k=2), llm=llm, verbose=True)

response = query_engine.query("What are the best places to visit in Amsterdam?")
print(str(response))