# Query Translation: Multi-Query

![multi-query](../images/images-mulit-query.png)

Basic RAG chains retrieve documents whose embeddings are close to the query embedding under a distance metric. However, retrieval can produce different results with subtle changes in query wording, or when the embeddings do not capture the semantics of the data well. Prompt engineering or tuning is sometimes used to address these problems manually, but it can be tedious.

The [MultiQueryRetriever](https://python.langchain.com/docs/how_to/MultiQueryRetriever/) automates the process of prompt tuning by using an LLM to generate multiple queries from different perspectives for a given user input query. For each query, it retrieves a set of relevant documents and takes the unique union across all queries to get a larger set of potentially relevant documents. By generating multiple perspectives on the same question, the MultiQueryRetriever can mitigate some of the limitations of distance-based retrieval and return a richer set of results.

![multi-query](../images/multi-query.png)
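The idea can be illustrated without an LLM or a vector store. The following is a minimal sketch in plain Python, not the MultiQueryRetriever implementation: `INDEX`, `toy_retrieve`, `fake_rewrite`, and `multi_query_retrieve` are hypothetical stand-ins for the vector store, the retriever, the LLM rewriting step, and the combined chain.

```python
from typing import Callable

# Hypothetical keyword index standing in for a vector store.
INDEX = {
    "planning": ["doc_planning"],
    "memory": ["doc_memory"],
    "components": ["doc_overview", "doc_planning"],
}


def toy_retrieve(query: str) -> list[str]:
    """Stand-in retriever: return doc ids for every keyword found in the query."""
    q = query.lower()
    hits: list[str] = []
    for keyword, doc_ids in INDEX.items():
        if keyword in q:
            hits.extend(doc_ids)
    return hits


def fake_rewrite(question: str) -> list[str]:
    """Stand-in for the LLM: return fixed rewordings of the question."""
    return [
        "Which components make up an agent system?",
        "How does planning work in an agent?",
        "What role does memory play in an agent?",
    ]


def multi_query_retrieve(
    question: str,
    rewrite: Callable[[str], list[str]],
    retrieve: Callable[[str], list[str]],
) -> list[str]:
    """Retrieve once per rewritten query and return the unique union."""
    seen: dict[str, None] = {}
    for query in rewrite(question):
        for doc_id in retrieve(query):
            seen.setdefault(doc_id, None)  # keeps first-seen order, drops repeats
    return list(seen)


question_example = "What are the main building blocks of an agent?"
single = toy_retrieve(question_example)
combined = multi_query_retrieve(question_example, fake_rewrite, toy_retrieve)
print(single)    # the original wording matches no keyword
print(combined)  # the rewordings reach three distinct documents
```

In this toy setup the original wording retrieves nothing, while the rewordings together reach three documents, which is the effect multi-query aims for with real embeddings.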

## Setup

In [1]:
%run "../Z - Common/setup.ipynb"

USER_AGENT environment variable not set, consider setting it to identify your requests.


In [5]:
docs = load_sample_data()
split_docs = split_sample_data(docs)
retriever = seed_sample_data(split_docs)

First, let's build a prompt that asks the LLM to generate several versions of the user's question.

In [2]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

template = """You are an AI language model assistant. Your task is to generate five 
different versions of the given user question to retrieve relevant documents from a vector 
database. By generating multiple perspectives on the user question, your goal is to help
the user overcome some of the limitations of the distance-based similarity search. 
Provide these alternative questions separated by newlines. Original question: {question}"""

prompt_perspectives = ChatPromptTemplate.from_template(template)

chain_generate_queries = (
    prompt_perspectives 
    | llm
    | StrOutputParser() 
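    # The prompt asks for newline-separated questions, so split on single
    # newlines and drop empty lines (this also handles blank-line separators).
    | (lambda x: [q.strip() for q in x.split("\n") if q.strip()])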
)

Once we have retrieved results for each variation of the original query, we need to take the union of those results.

In [3]:
from langchain.load import dumps, loads

def get_unique_union(documents: list[list]):
    """ Unique union of retrieved docs """
    # Flatten list of lists, and convert each Document to string
    flattened_docs = [dumps(doc) for sublist in documents for doc in sublist]
    # Get unique documents
    unique_docs = list(set(flattened_docs))
    # Return
    return [loads(doc) for doc in unique_docs]
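Because `set` does not preserve insertion order, the function above may return the documents in a different order from one run to the next. If a stable order matters, one option is to deduplicate with a `dict`, which keeps first-seen order. The sketch below is an assumption-laden illustration rather than a drop-in replacement: `Doc` is a hypothetical stand-in for a LangChain `Document`, and `doc_key` and `ordered_unique_union` are names introduced here.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def doc_key(doc: Doc) -> str:
    """Serialize content and metadata into a stable, hashable key."""
    return json.dumps(
        {"page_content": doc.page_content, "metadata": doc.metadata},
        sort_keys=True,
    )


def ordered_unique_union(documents: list[list[Doc]]) -> list[Doc]:
    """Unique union of retrieved docs, keeping first-seen order."""
    unique: dict[str, Doc] = {}
    for sublist in documents:
        for doc in sublist:
            # setdefault only stores the first document seen for each key
            unique.setdefault(doc_key(doc), doc)
    return list(unique.values())


a = Doc("Planning", {"source": "post"})
b = Doc("Memory", {"source": "post"})
results = ordered_unique_union([[a, b], [b, a], [Doc("Planning", {"source": "post"})]])
print([d.page_content for d in results])  # ['Planning', 'Memory']
```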

We can now build a second chain that joins the results.

In [11]:
chain_retrieval = chain_generate_queries | retriever.map() | get_unique_union

question = "What are the main components of an LLM-powered autonomous agent system?"
docs = chain_retrieval.invoke({"question":question})

print("No. of results: ", len(docs))
docs


No. of results:  4


[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview#\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refi

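Take a look at the [LangSmith tracing](https://smith.langchain.com/) for a request to this chain. The example trace below was captured for a different question ("What is task decomposition for LLM agents?"), but the prompt is the same. The LLM call received the following input: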

```
You are an AI language model assistant. Your task is to generate five 
different versions of the given user question to retrieve relevant documents from a vector 
database. By generating multiple perspectives on the user question, your goal is to help
the user overcome some of the limitations of the distance-based similarity search. 
Provide these alternative questions separated by newlines. Original question: What is task decomposition for LLM agents?
```

And it returned the following output:

```
Here are 5 alternative versions of the question to help retrieve relevant documents:

How do LLM agents break down complex tasks into smaller subtasks?

What are the methods and techniques used for decomposing tasks when working with language model agents?

Can you explain the process of splitting larger problems into manageable steps for AI agents?

What is the role of task planning and decomposition in LLM-based autonomous systems?

How do large language model agents analyze and structure tasks into hierarchical components?
```

The chain then sent each of the resulting strings to the vector store as a separate query and took the unique union of the results.
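Note that breaking the output above into non-empty chunks yields six strings, because the introductory sentence is kept alongside the five questions, so it is sent to the vector store as a query too. If that is undesirable, the split step can be made more selective. The following is a minimal sketch, assuming the model writes one question per line and that every generated question ends in a question mark; `parse_queries` is a hypothetical helper, not part of LangChain.

```python
import re


def parse_queries(llm_output: str) -> list[str]:
    """Extract question lines from raw LLM output (heuristic)."""
    queries = []
    for line in llm_output.splitlines():
        line = line.strip()
        # Remove list markers such as "1.", "2)" or "-" if the model added them.
        line = re.sub(r"^(\d+[.)]|[-*•])\s*", "", line)
        # Keep only lines that look like questions; this drops blank lines
        # and introductory sentences such as "Here are 5 alternative versions...".
        if line.endswith("?"):
            queries.append(line)
    return queries


sample_output = """Here are 5 alternative versions of the question to help retrieve relevant documents:

How do LLM agents break down complex tasks into smaller subtasks?

What are the methods and techniques used for decomposing tasks when working with language model agents?

Can you explain the process of splitting larger problems into manageable steps for AI agents?

What is the role of task planning and decomposition in LLM-based autonomous systems?

How do large language model agents analyze and structure tasks into hierarchical components?"""

queries = parse_queries(sample_output)
print(len(queries))  # 5
```

The question-mark check is a deliberately simple filter; it would also discard a valid query phrased as a statement, so treat it as a starting point.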



In the [Basic RAG notebook](../A%20-%20Basic/Basic%20RAG.ipynb) we defined the prompt manually. Instead of defining our own prompts, we can use prompt templates published in the [LangChain Hub](https://smith.langchain.com/hub). Let's replace our previous prompt with one from the hub and rebuild the chain.

In [12]:
from langchain import hub
from pprint import pprint

prompt = hub.pull("rlm/rag-prompt")
pprint(prompt)

ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, metadata={'lc_hub_owner': 'rlm', 'lc_hub_repo': 'rag-prompt', 'lc_hub_commit_hash': '50442af133e61576e74536c6556cefe1fac147cad032f4377b60c436e6cdcb6e'}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"), additional_kwargs={})])




Finally, we build a third chain that takes the documents retrieved by the previous chain and uses them as context to answer the original question:

In [13]:
from operator import itemgetter

chain_rag = (
    {"context": chain_retrieval, 
     "question": itemgetter("question")} 
    | prompt
    | llm
    | StrOutputParser()
)

result = chain_rag.invoke({"question":question})
result


'Based on the context, the main components of an LLM-powered autonomous agent system include: 1) Planning (which involves subgoal decomposition and reflection/refinement), 2) Memory (with retrieval models that consider relevance, recency, and importance), and 3) LLM as the core controller or "brain" of the system. These components work together to enable the agent to break down tasks, learn from past experiences, and make informed decisions.'
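The chain above passes the list of retrieved documents straight into the `{context}` slot of the prompt, so the model likely sees the string form of the whole list, metadata included. A common alternative is to join only the page contents into a single string first. The sketch below illustrates that step in plain Python; `Doc` is a hypothetical stand-in for a LangChain `Document`, and `format_docs` is a helper name introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs: list[Doc]) -> str:
    """Join page contents with blank lines, leaving metadata out of the prompt."""
    return "\n\n".join(doc.page_content for doc in docs)


docs_example = [
    Doc("Planning breaks tasks into subgoals.", {"source": "post"}),
    Doc("Memory stores past experience.", {"source": "post"}),
]
context = format_docs(docs_example)
print(context)
```

Assuming the same piping behaviour the notebook already relies on for `get_unique_union`, such a helper could be appended to the retrieval step, for example `"context": chain_retrieval | format_docs`.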

Save the results so we can compare later.

In [14]:
write_results("multi-query.txt", result)