# How to handle multiple queries when doing query analysis

When using LangChain to analyze user input and generate search queries, our analysis might sometimes result in multiple queries being created. This can happen when the user's request is complex and requires breaking it down into several smaller, more specific searches.

In these cases, our LangChain workflow needs to be designed to handle multiple queries efficiently. We need to:

1. Execute each of the generated queries.
2. Combine the results from all the queries into a single, coherent response.

We'll demonstrate how to do this using simulated data. This will show how LangChain can be used to manage and process multiple queries, ensuring that all relevant information is retrieved and presented to the user.

Essentially, we're building LangChain workflows that are capable of managing and orchestrating multiple retrieval operations and combining the results into an accurate response.
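
Before bringing in LangChain, the two steps above can be pictured in plain Python. This is a minimal sketch under stated assumptions: `fake_retrieve` and `run_queries` are hypothetical names, and the word-overlap matching merely stands in for a real vector search.

```python
from typing import Callable, List

# Simulated corpus; the two sentences mirror the sample data used later in this guide.
CORPUS = ["Harrison worked at Kensho", "Ankush worked at Facebook"]


def fake_retrieve(query: str) -> List[str]:
    """Hypothetical retriever: return texts sharing at least one word with the query."""
    query_words = set(query.lower().split())
    return [text for text in CORPUS if query_words & set(text.lower().split())]


def run_queries(queries: List[str], retrieve: Callable[[str], List[str]]) -> List[str]:
    """Execute every query and concatenate the results in query order."""
    combined: List[str] = []
    for query in queries:
        combined.extend(retrieve(query))
    return combined


results = run_queries(["Harrison work history", "Ankush work history"], fake_retrieve)
print(results)
```

The rest of this guide follows the same shape, with an LLM producing the list of queries and a Chroma retriever answering each one.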


# 1. Install Dependencies

In [27]:
%pip install -qU langchain langchain-community langchain-openai langchain-chroma

In [28]:
import getpass
import os

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass()

# Optional, uncomment to trace runs with LangSmith. Sign up here: https://smith.langchain.com.
# os.environ["LANGSMITH_TRACING"] = "true"
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass()

# 2. Create Index

This code does the following:

1. Takes two short text strings.

2. Uses OpenAI's "text-embedding-3-small" model to generate a numerical representation (embedding) of each string.

3. Stores each string and its embedding in a Chroma vector database.

4. Creates a retriever that returns the single most similar text (k=1) for a given query, based on the embeddings.

In [29]:
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

texts = ["Harrison worked at Kensho", "Ankush worked at Facebook"]
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_texts(
    texts,
    embeddings,
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 1})
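
To make the idea of "finding similar texts based on their embeddings" concrete, here is a small standard-library sketch. It is an illustration only: `toy_embed` is a hypothetical bag-of-words stand-in for the OpenAI embedding model, and cosine similarity is used for simplicity; the distance metric Chroma actually applies depends on how the collection is configured.

```python
import math
from collections import Counter
from typing import Dict, List


def toy_embed(text: str) -> Dict[str, int]:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b.get(token, 0) for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, candidates: List[str], k: int = 1) -> List[str]:
    """Return the k candidates whose toy embeddings are most similar to the query."""
    query_vec = toy_embed(query)
    ranked = sorted(
        candidates,
        key=lambda text: cosine_similarity(query_vec, toy_embed(text)),
        reverse=True,
    )
    return ranked[:k]


sample_texts = ["Harrison worked at Kensho", "Ankush worked at Facebook"]
print(top_k("where did Harrison work", sample_texts, k=1))
```

Note that with k=1 a retriever of this kind always returns one text, even when the query is only weakly related to anything in the index.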

# 3. Query analysis

a. Import necessary dependencies

b. Search Class Definition

class Search(BaseModel):: Defines a new class named Search that inherits from BaseModel. This makes it a Pydantic model.

c. queries Field Definition:

queries: List[str] = Field(..., description="Distinct queries to search for"): Defines a field named queries within the Search model.

queries: List[str]: Specifies that the queries field should be a list of strings.

Field(...): Uses the Field function to configure the field.

...: The ellipsis (...) indicates that this field is required. If a Search object is created without a value for queries, Pydantic will raise a validation error.

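description="Distinct queries to search for": Sets the description of the queries field. The description becomes part of the schema generated from the model, so it serves both as documentation and as guidance to the language model on how to fill in the field.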



In [30]:
from typing import List

from pydantic import BaseModel, Field


class Search(BaseModel):
    """Search over a database of job records."""

    queries: List[str] = Field(
        ...,
        description="Distinct queries to search for",
    )
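
To illustrate what "required" and "list of strings" mean in practice, the sketch below re-creates the same checks by hand with the standard library. It is a simplified stand-in, not how Pydantic is implemented: `SearchSketch` and `parse_search` are hypothetical names, and Pydantic raises its own ValidationError rather than the ValueError used here.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class SearchSketch:
    """Hypothetical stand-in for the Search model, for illustration only."""

    queries: List[str]


def parse_search(payload: str) -> SearchSketch:
    """Parse a JSON payload and enforce the shape that Search declares."""
    data = json.loads(payload)
    if not isinstance(data, dict) or "queries" not in data:
        raise ValueError("field 'queries' is required")
    queries = data["queries"]
    if not isinstance(queries, list) or not all(isinstance(q, str) for q in queries):
        raise ValueError("field 'queries' must be a list of strings")
    return SearchSketch(queries=queries)


parsed = parse_search('{"queries": ["Harrison work history", "Ankush work history"]}')
print(parsed.queries)

try:
    parse_search("{}")  # the required field is missing
except ValueError as err:
    print(f"rejected: {err}")
```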

### Design the LangChain chain

1. Import necessary dependencies

2. Output Parser Creation:

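output_parser = PydanticToolsParser(tools=[Search]): Creates a PydanticToolsParser object. Note that the chain built below never uses this parser, because with_structured_output already returns a parsed Search object.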

tools=[Search]: Specifies that the parser should parse tool calls that match the Search Pydantic model (which, in this case, allows for a list of search queries).

3. System Prompt Definition:

system = """...""": Defines a system prompt for the language model.

It instructs the model that it can issue search queries.

Crucially, it explicitly allows the model to perform two distinct searches if needed.

4. Chat Prompt Template Creation:

prompt = ChatPromptTemplate.from_messages([...]): Creates a chat prompt template.

("system", system): Adds the system prompt.

("human", "{question}"): Adds a placeholder for the user's question.

5. Language Model Initialization:

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0): Initializes a ChatOpenAI object.

model="gpt-4o-mini": Specifies the OpenAI chat model.

temperature=0: Sets the temperature to 0, making the model's responses more deterministic.

6. Structured LLM Creation:

structured_llm = llm.with_structured_output(Search): Creates a version of the LLM that is configured to return structured output matching the Search Pydantic model.

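with_structured_output(Search): This LangChain method wraps the model so that invoking it returns a Search object directly. It replaces the manual approach of binding Search as a tool and parsing the tool calls yourself; depending on the provider and the method argument, it may still rely on tool calling under the hood.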

7. Query Analyzer Chain Creation:

query_analyzer = {"question": RunnablePassthrough()} | prompt | structured_llm: Creates a runnable chain.

{"question": RunnablePassthrough()}: Passes the user's question through.
| prompt: Formats the question with the system prompt.
| structured_llm: Sends the formatted prompt to the language model, which generates a structured Search object as output.

In essence:

This code sets up a pipeline that:

Takes a user question as input.

Formats the question with a system prompt that encourages the language model to use search queries, and to use multiple queries if necessary.

Sends the formatted prompt to the language model, which is configured to return a structured Search object (containing a list of search queries).

Returns a Pydantic object of type Search, whose queries field holds a list of strings.

This pipeline is designed to handle user questions that might require multiple searches to answer. Note that it obtains the Search object through with_structured_output rather than by binding Search as a tool and parsing the tool calls with output_parser.

In [31]:
from langchain_core.output_parsers.openai_tools import PydanticToolsParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

output_parser = PydanticToolsParser(tools=[Search])

system = """You have the ability to issue search queries to get information to help answer user information.

If you need to look up two distinct pieces of information, you are allowed to do that!"""
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "{question}"),
    ]
)
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
structured_llm = llm.with_structured_output(Search)
query_analyzer = {"question": RunnablePassthrough()} | prompt | structured_llm

In [32]:
query_analyzer.invoke("where did Harrison Work")

Search(queries=['Harrison Work history', 'Harrison employment records'])

In [33]:
query_analyzer.invoke("where did Harrison and ankush Work")

Search(queries=['Harrison work history', 'Ankush work history'])
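
The `|` composition used to build query_analyzer can be pictured with a toy class. This is only a sketch of the idea, not LangChain's implementation: `Step`, `to_inputs`, `format_prompt`, and `fake_model` are hypothetical names, and the fake model simply echoes the question back as a single query.

```python
from typing import Any, Callable


class Step:
    """Toy runnable: wraps a function and supports `|` chaining."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Hypothetical stand-ins for the three stages of query_analyzer.
to_inputs = Step(lambda question: {"question": question})
format_prompt = Step(
    lambda inputs: f"system: you may issue searches\nhuman: {inputs['question']}"
)
fake_model = Step(
    lambda prompt_text: {"queries": [prompt_text.split("human: ", 1)[1]]}
)

toy_analyzer = to_inputs | format_prompt | fake_model
print(toy_analyzer.invoke("where did Harrison Work"))
```

In the real chain, the leading dict is coerced into a runnable that maps the input to the "question" key, which is the role `to_inputs` plays here.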

# 4. Retrieval with query analysis

1. Import necessary dependencies

2. @chain Decorator:

@chain: This decorator makes the custom_chain function a LangChain runnable, allowing it to be used in LangChain's runnable framework.

3. custom_chain Asynchronous Function Definition:

async def custom_chain(question):: Defines an asynchronous function custom_chain.
async: Indicates that this function can use await to pause execution until asynchronous operations complete.
question: Takes a question (presumably a string) as input.

4. Asynchronous Query Analysis:

response = await query_analyzer.ainvoke(question): This line asynchronously invokes the query_analyzer chain (defined above) with the input question.
ainvoke(): This is the asynchronous version of invoke(), for use inside asynchronous code.
await: Pauses execution until the query_analyzer completes and returns a result.
The result (a Search object containing a list of queries) is stored in the response variable.

5. Document Retrieval Loop:

docs = []: Initializes an empty list named docs to store the retrieved documents.
for query in response.queries:: Iterates through the list of queries in the response.queries attribute.
new_docs = await retriever.ainvoke(query): Asynchronously invokes the retriever (created in the index section above) with the current query. The queries are processed one after another, not concurrently.
ainvoke(): Asynchronous invocation of the retriever.
await: Pauses execution until the retriever completes and returns a list of documents.
docs.extend(new_docs): Extends the docs list with the retrieved documents.

6. Document Reranking/Deduplication (Comment):

The comment "You probably want to think about reranking or deduplicating documents here" is a reminder that after retrieving documents from multiple queries, you might want to:
Rerank: Reorder the documents based on relevance, potentially considering factors like the original question and the combined content of the documents.
Deduplicate: Remove duplicate documents that might have been retrieved by different queries.
The comment "But that is a separate topic" indicates that reranking and deduplication are beyond the scope of this code snippet.

7. Return Documents:

return docs: Returns the list of retrieved documents.

In essence:

The custom_chain function does the following:

It takes a user question as input.

It asynchronously uses the query_analyzer to generate a list of search queries.

It iterates through the list of queries, asynchronously retrieving documents for each query using the retriever.

It combines the retrieved documents into a single list.

It returns the combined list of documents.

This chain is designed to handle user questions that might require multiple searches, retrieving documents for each search and combining the results. It also highlights the need for further processing (reranking and deduplication) in a real-world application.

In [35]:
from langchain_core.runnables import chain

@chain
async def custom_chain(question):
    response = await query_analyzer.ainvoke(question)
    docs = []
    for query in response.queries:
        new_docs = await retriever.ainvoke(query)
        docs.extend(new_docs)
    # You probably want to think about reranking or deduplicating documents here
    # But that is a separate topic
    return docs
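
The loop in custom_chain awaits each retrieval before starting the next. If the retrievals are independent, they could instead run concurrently with asyncio.gather. The sketch below shows the pattern under stated assumptions: `fake_aretrieve` is a hypothetical stand-in for retriever.ainvoke, and the short sleep simulates network latency.

```python
import asyncio
from typing import List


async def fake_aretrieve(query: str) -> List[str]:
    """Hypothetical async retriever; the sleep stands in for network latency."""
    await asyncio.sleep(0.01)
    corpus = ["Harrison worked at Kensho", "Ankush worked at Facebook"]
    first_word = query.split()[0].lower()
    return [text for text in corpus if text.lower().startswith(first_word)]


async def retrieve_all(queries: List[str]) -> List[str]:
    """Run all retrievals concurrently and flatten the results in query order."""
    per_query = await asyncio.gather(*(fake_aretrieve(q) for q in queries))
    return [doc for docs in per_query for doc in docs]


gathered_docs = asyncio.run(
    retrieve_all(["Harrison work history", "Ankush work history"])
)
print(gathered_docs)
```

asyncio.gather returns results in the order the awaitables were passed, so the combined list keeps the query order. Inside a notebook, where an event loop is already running, use `await retrieve_all(...)` rather than asyncio.run.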

In [36]:
await custom_chain.ainvoke("where did Harrison Work")

[Document(id='a5e84727-c834-49f9-981b-040705761fb2', metadata={}, page_content='Harrison worked at Kensho'),
 Document(id='a5e84727-c834-49f9-981b-040705761fb2', metadata={}, page_content='Harrison worked at Kensho')]
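
The output above contains the same document twice, because both generated queries matched it. One simple way to deduplicate is to keep the first occurrence of each document id. The following is a minimal sketch, assuming ids are stable across queries: `Doc` is a hypothetical stand-in for LangChain's Document, and `dedupe` is an illustrative helper, not a LangChain API.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Doc:
    """Minimal stand-in for a retrieved document (hypothetical)."""

    id: str
    page_content: str


def dedupe(docs: Iterable[Doc]) -> List[Doc]:
    """Drop repeated documents, keeping the first occurrence and the original order."""
    seen = set()
    unique: List[Doc] = []
    for doc in docs:
        key = doc.id or doc.page_content  # fall back to content when there is no id
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


retrieved = [
    Doc("a5e84727-c834-49f9-981b-040705761fb2", "Harrison worked at Kensho"),
    Doc("a5e84727-c834-49f9-981b-040705761fb2", "Harrison worked at Kensho"),
]
print(len(dedupe(retrieved)))
```

The same key-based approach could be applied to the docs list inside custom_chain before returning it.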

The next cell asynchronously calls custom_chain with the question "where did Harrison and ankush Work". As described above, the chain will:

1. Use the query_analyzer to generate a list of search queries (when invoked on this question earlier, it returned 'Harrison work history' and 'Ankush work history').
2. Asynchronously retrieve documents for each query using the retriever.
3. Combine the retrieved documents into a single list.
4. Return the list of documents.

In [37]:
await custom_chain.ainvoke("where did Harrison and ankush Work")

[Document(id='a5e84727-c834-49f9-981b-040705761fb2', metadata={}, page_content='Harrison worked at Kensho'),
 Document(id='2e01cd83-33c9-4b78-9802-f2787717c16f', metadata={}, page_content='Ankush worked at Facebook')]

### Output explanation

This is the list of documents returned by the custom_chain.

It contains two Document objects, indicating that the chain successfully retrieved two relevant documents.

Document(id='a5e84727-c834-49f9-981b-040705761fb2', metadata={}, page_content='Harrison worked at Kensho'):

This document's page_content is "Harrison worked at Kensho", which answers the first part of the question.

The id is a unique identifier assigned by Chroma.

metadata is empty.

Document(id='2e01cd83-33c9-4b78-9802-f2787717c16f', metadata={}, page_content='Ankush worked at Facebook'):

This document's page_content is "Ankush worked at Facebook", which answers the second part of the question.

The id is a unique identifier.

metadata is empty.

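The two cells below carry execution counts 25 and 26, so they were run before the cells above, presumably against an earlier definition of custom_chain. Their outputs are raw AIMessage objects rather than lists of documents: for these inputs the model answered directly instead of producing search queries. With the async custom_chain defined above, call the chain with await custom_chain.ainvoke(...) instead; it returns a list of documents.

In [25]: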
custom_chain.invoke("hi!")

AIMessage(content='Hello! How can I assist you today?', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 11, 'prompt_tokens': 93, 'total_tokens': 104, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_06737a9306', 'finish_reason': 'stop', 'logprobs': None}, id='run-33eada19-6f80-48eb-9e07-352003ba2b28-0', usage_metadata={'input_tokens': 93, 'output_tokens': 11, 'total_tokens': 104, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})

In [26]:
custom_chain.invoke("Who is david !")

AIMessage(content='Could you please provide more context about which David you are referring to? There are many notable individuals named David, including historical figures, celebrities, and fictional characters.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 34, 'prompt_tokens': 95, 'total_tokens': 129, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_06737a9306', 'finish_reason': 'stop', 'logprobs': None}, id='run-5eab6ba8-fa97-4abc-8dfe-fda4ec97ee41-0', usage_metadata={'input_tokens': 95, 'output_tokens': 34, 'total_tokens': 129, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})