# MultiQueryRetriever in LangChain

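`MultiQueryRetriever` uses an LLM to rewrite a single user question into several alternative phrasings, runs each phrasing against the same underlying retriever, and returns the unique union of the retrieved documents. This makes retrieval less sensitive to how the original question happens to be worded.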

### Use Cases:
- **Broader Querying**: Retrieve more relevant chunks by rephrasing one query from several perspectives.
- **Query Variability**: Get results for multiple variations of a query.
- **Complementary Results**: Combine results from multiple queries for a fuller response.

### Key Features:
- Generates several query variations with an LLM and runs each one against the base retriever.
- Combines the results from all queries and removes duplicates.
- Improves recall when relevant passages are worded differently from the question.

### Advantages

- **Flexibility**: Handle different angles of a topic with multiple queries.  
- **Enhanced Retrieval**: Capture more comprehensive results.  
- **Robustness**: Results depend less on the exact wording of a single query.  

### Considerations

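- **Query Management**: Each input triggers an extra LLM call plus one retrieval per generated query, which adds latency.  
- **Redundancy**: Different queries often return overlapping chunks, so results must be de-duplicated when they are combined.  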
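
To make the combining step concrete, here is a minimal pure-Python sketch of the "unique union" idea: run every query variation, then keep the first occurrence of each result. It is an illustration only, not LangChain's implementation; `unique_union`, `multi_query_search`, and the toy index are hypothetical names invented for this example.

```python
from typing import Callable, Iterable


def unique_union(result_lists: Iterable[list[str]]) -> list[str]:
    """Merge several result lists, keeping the first occurrence of each item."""
    seen: set[str] = set()
    merged: list[str] = []
    for results in result_lists:
        for doc in results:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


def multi_query_search(queries: Iterable[str],
                       search: Callable[[str], list[str]]) -> list[str]:
    """Run `search` once per query variation and de-duplicate the hits."""
    return unique_union(search(q) for q in queries)


# Toy "index" (hypothetical data): each query variation maps to its hits.
toy_index = {
    "task decomposition methods": ["chunk-1", "chunk-2"],
    "breaking tasks into subtasks": ["chunk-2", "chunk-3"],
}

hits = multi_query_search(toy_index, lambda q: toy_index[q])
print(hits)
```

The overlapping `chunk-2` appears once in the merged list, which is the behavior the Redundancy bullet asks for.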


### Step-1: Required Package Installation

These dependencies will set up a complete environment.

In [None]:
!pip install langchain langchain-ollama langchain-chroma langchain-community langchain-text-splitters beautifulsoup4


### Step-2: Imports
These imports bring in the Ollama chat and embedding models, the Chroma vector store, the retriever, a web page loader, and a text splitter.

In [2]:
from langchain_ollama import ChatOllama, OllamaEmbeddings
from langchain_chroma import Chroma
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

USER_AGENT environment variable not set, consider setting it to identify your requests.
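
The warning above is harmless; it appears when `langchain_community` is imported without a `USER_AGENT` environment variable. If you want to silence it, one option is to set the variable before the imports run. The value below is a made-up placeholder, not something the original notebook used.

```python
import os

# Run this before importing langchain_community.
# "multiquery-retriever-notebook/0.1" is a hypothetical identifier; use your own.
os.environ.setdefault("USER_AGENT", "multiquery-retriever-notebook/0.1")
```

`setdefault` leaves an existing `USER_AGENT` value untouched, so it is safe to keep in the notebook even when the variable is already configured.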


### Step-3: Configuration

Set the Ollama server URL and model names for chat and embeddings.

In [None]:
OLLAMA_URL = ""  # Replace with your Ollama server URL (a default local install listens on http://localhost:11434)
DEFAULT_MODEL = "llama3.1:8b"
EMBED_MODEL = "snowflake-arctic-embed2:latest"
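
Before going further it can help to confirm that the server is reachable. The sketch below is an optional check, not part of the original notebook: `ollama_is_reachable` is a hypothetical helper that sends a GET request to Ollama's `/api/tags` endpoint (the model-listing route), and the `http://localhost:11434` URL assumes a default local install.

```python
import urllib.error
import urllib.request


def ollama_is_reachable(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if GET {base_url}/api/tags answers with HTTP 200."""
    url = f"{base_url.rstrip('/')}/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, ValueError, OSError):
        # Connection refused, timeout, or a malformed/empty URL.
        return False


# Assumed default address; replace with your own OLLAMA_URL.
print(ollama_is_reachable("http://localhost:11434"))
```

An empty `OLLAMA_URL`, as in the placeholder above, makes the check return `False` rather than raise.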

### Step-4: Loading & Splitting
The following code loads a blog post from the web, then splits the content into manageable text chunks for embedding or retrieval.

In [5]:
loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
data = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
splits = text_splitter.split_documents(data)
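
To build intuition for `chunk_size` and `chunk_overlap`, here is a deliberately simplified fixed-width splitter. It is not how `RecursiveCharacterTextSplitter` works (that class prefers to break on separators such as paragraphs and sentences); `split_fixed` is a hypothetical helper that only shows how the two parameters interact.

```python
def split_fixed(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Cut `text` into windows of `chunk_size` characters.

    Consecutive windows share `chunk_overlap` characters.
    """
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


no_overlap = split_fixed("abcdefghij", chunk_size=4, chunk_overlap=0)
with_overlap = split_fixed("abcdefghij", chunk_size=4, chunk_overlap=2)
print(no_overlap)
print(with_overlap)
```

With `chunk_overlap=0`, as in the notebook, no text is shared between neighboring chunks; a nonzero overlap trades more chunks for less risk of cutting a relevant passage in half.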

### Step-5: Create Embeddings and Vector Store
The following code generates embeddings for the text chunks and stores them in a Chroma vector database.

In [6]:
embeddings = OllamaEmbeddings(model=EMBED_MODEL, base_url=OLLAMA_URL)
vectordb = Chroma.from_documents(documents=splits, embedding=embeddings)
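
Conceptually, the vector store answers a query by comparing the query's embedding with the stored chunk embeddings and returning the closest ones. The toy sketch below illustrates that idea with cosine similarity over hand-written 2-D vectors. It is an assumption-laden illustration: Chroma's actual index and distance metric are configurable and are not shown here, and `cosine`, `top_k`, and `toy_store` are hypothetical names.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], store: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k stored vectors most similar to `query_vec`."""
    ranked = sorted(store, key=lambda doc_id: cosine(query_vec, store[doc_id]), reverse=True)
    return ranked[:k]


# Hypothetical 2-D "embeddings"; real embeddings have hundreds of dimensions.
toy_store = {
    "chunk-a": [1.0, 0.0],
    "chunk-b": [0.0, 1.0],
    "chunk-c": [1.0, 1.0],
}

nearest = top_k([1.0, 0.1], toy_store, k=2)
print(nearest)
```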

### Step-6: Initialize Language Model (LLM)
Set up the chat LLM served by Ollama. The retriever will use it to generate the query variations.

In [7]:
llm = ChatOllama(model=DEFAULT_MODEL, base_url=OLLAMA_URL)

### Step-7: Create MultiQuery Retriever

Create a `MultiQueryRetriever` that uses the LLM to generate multiple variations of the input query and retrieves documents for each one. Capturing different interpretations of the query improves the diversity of the retrieved results.

In [8]:
retriever = MultiQueryRetriever.from_llm(retriever=vectordb.as_retriever(), llm=llm)
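
To see which queries the retriever generates each time it runs, you can enable INFO-level logging for its module. The logger name below follows the import path used in this notebook (`langchain.retrievers.multi_query`); treat it as an assumption, since the module path may differ in other LangChain versions.

```python
import logging

# Configure a basic handler so log records are actually printed.
logging.basicConfig()

# Logger name mirrors the module the retriever was imported from (assumed).
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)
```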

### Step-8: Running a Query with MultiQueryRetriever

Define the question to ask. The retriever will reformulate it into several versions before searching.


In [9]:
query = "What are the approaches to Task Decomposition?"

### Step-9: Generate Multiple Query Variations

In [10]:
# Run only the query-generation chain to inspect the LLM's reformulations
queries = retriever.llm_chain.invoke({"question": query})

In [11]:
print("🔍 Reformulated Queries:")
for q in queries:
    print("-", q)

🔍 Reformulated Queries:
- Here are three different versions of the original question to retrieve relevant documents from a vector database:
- What are the methods for breaking down complex tasks into smaller, manageable subtasks?
- Can you provide information on task decomposition techniques and their applications in various domains?
- What are the strategies used to divide complex tasks into more organized and structured components?
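
Note that the first item in the output is the LLM's introductory sentence, not a real query, yet it is treated as one of the reformulations. If that matters for your use case, one simple heuristic is to keep only lines that end with a question mark. `keep_questions` is a hypothetical helper written for this example; it is not a LangChain function, and the heuristic would drop valid queries that are not phrased as questions.

```python
def keep_questions(lines: list[str]) -> list[str]:
    """Keep only non-empty lines that end with '?' (heuristic preamble filter)."""
    cleaned = (line.strip() for line in lines)
    return [line for line in cleaned if line.endswith("?")]


# The four lines printed above, copied verbatim.
raw = [
    "Here are three different versions of the original question to retrieve relevant documents from a vector database:",
    "What are the methods for breaking down complex tasks into smaller, manageable subtasks?",
    "Can you provide information on task decomposition techniques and their applications in various domains?",
    "What are the strategies used to divide complex tasks into more organized and structured components?",
]

filtered = keep_questions(raw)
for q in filtered:
    print("-", q)
```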


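### Step-10: Retrieve Documents

Run the full retriever. It generates the query variations, searches the vector store with each one, and returns the unique union of the results.

In [12]: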
retrieved_docs = retriever.invoke(query)
print(f"\n📄 Retrieved {len(retrieved_docs)} documents.\n")


for i, doc in enumerate(retrieved_docs[:3]):
    print(f"--- Document {i+1} ---")
    print(doc.page_content[:500])
    print()


📄 Retrieved 20 documents.

--- Document 1 ---
to API search engine to find the right API to call and then uses the corresponding documentation to make a call.

--- Document 2 ---
[11] Nakano et al. “Webgpt: Browser-assisted question-answering with human feedback.” arXiv preprint arXiv:2112.09332 (2021).
[12] Parisi et al. “TALM: Tool Augmented Language Models”
[13] Schick et al. “Toolformer: Language Models Can Teach Themselves to Use Tools.” arXiv preprint arXiv:2302.04761 (2023).
[14] Weaviate Blog. Why is Vector Search so fast? Sep 13, 2022.
[15] Li et al. “API-Bank: A Benchmark for Tool-Augmented LLMs” arXiv preprint arXiv:2304.08244 (2023).

--- Document 3 ---
Maximum Inner Product Search (MIPS)#
The external memory can alleviate the restriction of finite attention span.  A standard practice is to save the embedding representation of information into a vector store database that can support fast maximum inner-product search (MIPS). To optimize the retrieval speed, the common choi