# Multi-Query Retriever Demo

Distance-based vector database retrieval embeds (represents) queries in a high-dimensional space and finds similar embedded documents using a distance metric. However, retrieval can return different results after subtle changes in query wording, or when the embeddings do not capture the semantics of the data well. Prompt engineering or tuning is sometimes used to address these problems manually, but it can be tedious.
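
To make the sensitivity to query wording concrete, here is a minimal sketch of distance-based retrieval using cosine similarity. The three-dimensional vectors and the document names are made up for illustration; real embeddings (for example from OpenAI) have many more dimensions, and a vector store may use a different metric.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return the ids of the k documents most similar to the query."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]

# Hypothetical toy "embeddings" (assumption: not produced by any real model)
doc_vecs = {
    "doc_planning": [0.9, 0.1, 0.0],
    "doc_memory": [0.1, 0.9, 0.0],
    "doc_tools": [0.0, 0.2, 0.9],
}

# Two phrasings of a similar question that land in different regions
query_a = [0.8, 0.3, 0.1]
query_b = [0.3, 0.8, 0.1]

print(top_k(query_a, doc_vecs, k=1))  # ['doc_planning']
print(top_k(query_b, doc_vecs, k=1))  # ['doc_memory']
```

With `k=1`, the two phrasings return different documents, which is the instability that multi-query retrieval tries to smooth out.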

The MultiQueryRetriever automates this tuning by using an LLM to generate multiple queries, each from a different perspective, for a given user question. For each generated query it retrieves a set of relevant documents, then takes the unique union across all queries to obtain a larger set of potentially relevant documents. By approaching the same question from several angles, the MultiQueryRetriever can mitigate some limitations of distance-based retrieval and return a richer set of results.
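
The retrieve-then-union idea can be sketched without any LLM or vector store. In the sketch below, `toy_retrieve` is a hypothetical keyword-overlap ranker standing in for vector search, and the query variants are written by hand rather than generated; only the merging logic is the point.

```python
def toy_retrieve(query, corpus, k=2):
    """Rank documents by the number of words shared with the query.

    Stand-in for a vector-store retriever (assumption for illustration).
    """
    query_words = set(query.lower().split())
    scored = []
    for doc_id, text in corpus.items():
        overlap = len(query_words & set(text.lower().split()))
        if overlap > 0:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties broken by document id for determinism
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]

def multi_query_retrieve(queries, corpus, k=2):
    """Retrieve for every query variant and keep each document once."""
    seen = set()
    merged = []
    for query in queries:
        for doc_id in toy_retrieve(query, corpus, k):
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged

corpus = {
    "d1": "agents split large tasks into subgoals",
    "d2": "planning breaks a problem into steps",
    "d3": "memory stores past observations",
}
queries = ["how do agents split tasks", "breaks problem into steps"]

print(toy_retrieve(queries[0], corpus))       # ['d1']
print(multi_query_retrieve(queries, corpus))  # ['d1', 'd2']
```

The single query finds only `d1`, while the union over both variants also surfaces `d2`.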

## What this notebook contains
- Generating multiple query variants from a single user question with a Chat LLM.
- Retrieving documents for each variant using a vector store (Chroma) and OpenAI embeddings.
- Deduplicating and unifying results across queries (unique union).
- Building a simple RAG pipeline that answers a question using the combined context.

## References
https://python.langchain.com/docs/how_to/MultiQueryRetriever/


In [8]:
from langchain.text_splitter import RecursiveCharacterTextSplitter  
from langchain_community.document_loaders import WebBaseLoader  
from langchain_community.vectorstores import Chroma  
from langchain_core.output_parsers import StrOutputParser  
from langchain_core.runnables import RunnablePassthrough  
from langchain_openai import ChatOpenAI, OpenAIEmbeddings 
from langchain.prompts import ChatPromptTemplate
from langchain.load import dumps, loads
from operator import itemgetter
import yaml
import bs4  
import os

In [9]:
# Get the current working directory
cwd = os.getcwd()

# Build the path to config.yaml
config_path = os.path.join(cwd, '..', 'configs', 'config.yaml')

# Normalize the path
config_path = os.path.abspath(config_path)

# Load credential from config file
with open(config_path, 'r') as file:
    config = yaml.safe_load(file)

# Set environment variables
os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
os.environ['LANGCHAIN_API_KEY'] = config['API']['LANGCHAIN']
os.environ['OPENAI_API_KEY'] = config['API']['OPENAI']

In [10]:
# Create a loader that fetches and parses the target web page
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),  # tuple of URLs to load
    bs_kwargs=dict(  # pass BeautifulSoup-specific kwargs to limit parsing
        parse_only=bs4.SoupStrainer(  # only parse these parts of the page to reduce noise
            class_=("post-content", "post-title", "post-header")
        )
    ),
)

# Fetch and return a list of Document objects
docs = loader.load()  

# Split long documents into smaller overlapping chunks suitable for embeddings
text_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
splits = text_splitter.split_documents(docs)  # list of smaller Document chunks

# Create embeddings and store them in a vector DB (Chroma)
vectorstore = Chroma.from_documents(documents=splits, 
                                    embedding=OpenAIEmbeddings())  # uses OpenAI embeddings under the hood

# Create a retriever to fetch relevant docs (return the top 2 results per query)
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

In [None]:
# Ask an LLM to rephrase the question into multiple variants (generate multiple queries)
template = """You are a helpful assistant that generates multiple search queries based on a single input query. \n
Generate multiple search queries related to:  {question} \n
Output (5 queries):
"""

# Alternatively, a decomposition prompt can be used
# template = """You are a helpful assistant that generates multiple sub-questions related to an input question. \n
# The goal is to break down the input into a set of sub-problems / sub-questions that can be answered in isolation. \n
# Generate multiple search queries related to: {question} \n
# Output (3 queries):"""

# Build the prompt object
prompt_perspectives = ChatPromptTemplate.from_template(template)

# Pipeline: prompt -> chat model -> parse -> split into separate queries
generate_queries = (
    prompt_perspectives 
    | ChatOpenAI(temperature=0) 
    | StrOutputParser() 
    | (lambda x: [q.strip() for q in x.split("\n") if q.strip()])  # split on newlines, dropping blank lines
)

In [12]:
# Example input question
question = "What is task decomposition for LLM agents?"

# Generate multiple query variants
result = generate_queries.invoke({"question":question})  

# Display the generated variants
display(result)

['1. How do LLM agents use task decomposition in problem-solving?',
 '2. Benefits of task decomposition for LLM agents in artificial intelligence.',
 '3. Examples of task decomposition techniques used by LLM agents.',
 '4. Challenges of implementing task decomposition for LLM agents.',
 '5. Comparison of task decomposition approaches for LLM agents in different AI systems.']
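
The generated variants above keep their list numbering (`1.`, `2.`, ...), which is then embedded along with the query text. As an optional cleanup step, a small helper could strip those prefixes and drop blank lines before retrieval. `clean_queries` is a hypothetical name, not part of LangChain, and the regular expression only covers common numbering and bullet styles.

```python
import re

def clean_queries(lines):
    """Strip leading list markers such as '1.', '2)', '-' or '*' and drop empty lines."""
    cleaned = []
    for line in lines:
        text = re.sub(r"^\s*(?:\d+[.)]|[-*])\s+", "", line).strip()
        if text:
            cleaned.append(text)
    return cleaned

raw = [
    "1. How do LLM agents use task decomposition in problem-solving?",
    "",
    "  2) Benefits of task decomposition",
    "- Examples of task decomposition techniques",
]
print(clean_queries(raw))
```

A helper like this could be piped after the split step in `generate_queries`, since a plain function can be composed into the chain in the same way as the existing lambda.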

In [13]:
# Deduplicate retrieved documents (only keep the unique documents across all the retrieved documents)
def get_unique_union(documents: list[list]):  
    """ Unique union of retrieved docs """
    # Flatten list of lists and serialize each Document for deduplication
    flattened_docs = [dumps(doc) for sublist in documents for doc in sublist]
    # Use a set to deduplicate serialized documents (note: a set does not preserve retrieval order)
    unique_docs = list(set(flattened_docs))
    # Deserialize unique documents and return them
    return [loads(doc) for doc in unique_docs]

# Compose the multi-query retrieval pipeline and run it (generate -> retrieve -> dedupe)
retrieval_chain = generate_queries | retriever.map() | get_unique_union 

# Retrieve unique docs
docs = retrieval_chain.invoke({"question":question}) 

# Show unique retrieved documents
display(docs)

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Subgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.'),
 Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='}\n]\nChallenges#\nAfter going through key ideas and demos of building LLM-centered agents, I start to see a couple common limitations:'),
 Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='The system comprises of 4 stages:\n(1) Task planning: LLM works as the brain and parses the user requests into multiple tasks. There are four attributes associated with each task: task type, ID, dependencies, and arguments. They use few-shot examples to guide LLM to do task parsing and planning.')]
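
Because `get_unique_union` goes through a `set`, the order of the returned documents is not guaranteed to match retrieval order. If order matters (for example, to keep the highest-ranked hits first in the prompt), an order-preserving variant is one option. The sketch below uses a small `Doc` dataclass as a stand-in for LangChain's `Document` so that it runs on its own, and it assumes metadata values are hashable.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (assumption for illustration)."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def unique_union_ordered(doc_lists):
    """Flatten a list of result lists, keeping the first occurrence of each document."""
    seen = set()
    unique = []
    for sublist in doc_lists:
        for doc in sublist:
            # Identify a document by its text plus its (sorted) metadata items
            key = (doc.page_content, tuple(sorted(doc.metadata.items())))
            if key not in seen:
                seen.add(key)
                unique.append(doc)
    return unique

a = Doc("Subgoal and decomposition", {"source": "x"})
b = Doc("Task planning", {"source": "x"})
result = unique_union_ordered([[a, b], [b, a], [Doc("Subgoal and decomposition", {"source": "x"})]])
print([d.page_content for d in result])  # ['Subgoal and decomposition', 'Task planning']
```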

In [14]:
# RAG - build and run final chain that answers the question using retrieved context
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

# Build the prompt object
prompt = ChatPromptTemplate.from_template(template)

# Configure chat LLM (deterministic)
llm = ChatOpenAI(temperature=0) 

# Compose pipeline: retrieval_chain -> prompt -> llm -> parse
final_rag_chain = (
    {"context": retrieval_chain, 
     "question": itemgetter("question")} 
    | prompt
    | llm
    | StrOutputParser()  # parse final output to string
)

# Execute the chain and get the answer
final_rag_chain.invoke({"question":question})

'Task decomposition for LLM agents involves breaking down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks. This process allows the agent to parse user requests into multiple tasks with attributes such as task type, ID, dependencies, and arguments, using few-shot examples to guide the LLM in task parsing and planning.'
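
In the chain above, the list of retrieved documents is passed into `{context}` as-is, so the prompt receives the string representation of the whole list, metadata included. One possible refinement is to join only the page text before formatting the prompt. The sketch below again uses a `Doc` dataclass as a stand-in for LangChain's `Document`, and `format_docs` is a hypothetical helper name.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (assumption for illustration)."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_docs(docs):
    """Join the text of each document, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)

context = format_docs([Doc("first chunk"), Doc("second chunk")])
print(context)
```

Under that assumption, the context entry of the final chain could become `retrieval_chain | format_docs`, leaving the rest of the pipeline unchanged.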