# Query analysis

In any question answering application we need to search for, or retrieve, information based on a user question. In the simplest case, we can search on the user input directly. This approach has a few common failure modes:

* The data has multiple attributes that a user input could be referring to,
* The user input contains multiple distinct questions in it,
* Search quality is sensitive to phrasing.

To handle these, we can do **query analysis** to translate the raw user question into a query or queries optimized for our index. To illustrate, let's build a Q&A bot over the LangChain YouTube videos that performs query analysis.

**NOTE**: This guide assumes familiarity with the basic building blocks of a simple RAG application outlined in the [Quickstart](/docs/use_cases/question_answering/quickstart).

## Setup
### Install dependencies

In [None]:
# %pip install -qU langchain-community langchain-openai youtube-transcript-api pytube chromadb

### Set environment variables

We'll use OpenAI in this example:

In [1]:
import getpass
import os

# os.environ["OPENAI_API_KEY"] = getpass.getpass()

# Optional, uncomment to trace runs with LangSmith. Sign up here: https://smith.langchain.com.
# os.environ["LANGCHAIN_TRACING_V2"] = "true"
# os.environ["LANGCHAIN_API_KEY"] = getpass.getpass()

### Load documents

We can use the `YoutubeLoader` to load transcripts of a few LangChain videos:

In [2]:
from langchain_community.document_loaders import YoutubeLoader

urls = [
    "https://www.youtube.com/watch?v=pbAd8O1Lvm4",
    "https://www.youtube.com/watch?v=ylrew7qb8sQ",
    "https://www.youtube.com/watch?v=uRya4zRrRx4",
    "https://www.youtube.com/watch?v=hvAPnpSfSGo",
    "https://www.youtube.com/watch?v=ZcEMLz27sL4",
    "https://www.youtube.com/watch?v=3wAON0Lqviw",
    "https://www.youtube.com/watch?v=jx7xuHlfsEQ",
    "https://www.youtube.com/watch?v=xn1jEjRyJ2U",
    "https://www.youtube.com/watch?v=SaDzIVkYqyY",
    "https://www.youtube.com/watch?v=gqhlqdawHT4",
    "https://www.youtube.com/watch?v=Ce03oEotdPs",
    "https://www.youtube.com/watch?v=rZus0JtRqXE",
    "https://www.youtube.com/watch?v=HAn9vnJy6S4",
    "https://www.youtube.com/watch?v=dA1cHGACXCo",
    "https://www.youtube.com/watch?v=ZcEMLz27sL4",
    "https://www.youtube.com/watch?v=hvAPnpSfSGo",
    "https://www.youtube.com/watch?v=EhlPDL4QrWY",
    "https://www.youtube.com/watch?v=mmBo8nlu2j0",
    "https://www.youtube.com/watch?v=rQdibOsL1ps",
    "https://www.youtube.com/watch?v=28lC4fqukoc",
    "https://www.youtube.com/watch?v=es-9MgxB-uc",
    "https://www.youtube.com/watch?v=wLRHwKuKvOE",
    "https://www.youtube.com/watch?v=ObIltMaRJvY",
    "https://www.youtube.com/watch?v=DjuXACWYkkU",
    "https://www.youtube.com/watch?v=o7C9ld6Ln-M",
]
docs = []
for url in urls:
    docs.extend(YoutubeLoader.from_youtube_url(url, add_video_info=True).load())

In [3]:
[doc.metadata["title"] for doc in docs]

['Self-reflective RAG with LangGraph: Self-RAG and CRAG',
 'WebVoyager',
 'LangGraph: Planning Agents',
 'LangGraph: Multi-Agent Workflows',
 'Streaming Events: Introducing a new `stream_events` method',
 'LangSmith: In-Depth Platform Overview',
 'LangSmith in 10 Minutes',
 'RAG from scratch: Part 8 (Query Translation -- Step Back)',
 'RAG from scratch: Part 9 (Query Translation -- HyDE)',
 'RAG from scratch: Part 7 (Query Translation -- Decomposition - v1)',
 'LangChain Agents with Open Source Models!',
 'Gemini + Google Retrieval Agent from a LangChain Template',
 'OpenGPTs',
 'Building a web RAG chatbot: using LangChain, Exa (prev. Metaphor), LangSmith, and Hosted Langserve',
 'Streaming Events: Introducing a new `stream_events` method',
 'LangGraph: Multi-Agent Workflows',
 'Build and Deploy a RAG app with Pinecone Serverless',
 'Auto-Prompt Builder (with Hosted LangServe)',
 'Build a Full Stack RAG App With TypeScript',
 'Getting Started with Multi-Modal LLMs',
 'SQL Research Assi

In [4]:
docs[0].page_content[:500]

"hi this is Lance from Lang chain I'm going to be talking about using Lang graph to build a diverse and sophisticated rag flows so just to set the stage the basic rag flow you can see here starts with a question retrieval of relevant documents from an index which are passed into the context window of an llm for generation of an answer grounded in the ret documents so that's kind of the basic outline and we can see it's like a very linear path um in practice though you often encounter a few differ"

In [5]:
docs[0].metadata

{'source': 'pbAd8O1Lvm4',
 'title': 'Self-reflective RAG with LangGraph: Self-RAG and CRAG',
 'description': 'Unknown',
 'view_count': 7946,
 'thumbnail_url': 'https://i.ytimg.com/vi/pbAd8O1Lvm4/hq720.jpg',
 'publish_date': '2024-02-07 00:00:00',
 'length': 1058,
 'author': 'LangChain'}

We can see that along with a transcription each document also has a title, view count, publication date, and length.

### Indexing documents

We'll use a vector store to index our documents, and we'll chunk them first to make our retrievals more concise and precise:

In [6]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=4000, chunk_overlap=500, add_start_index=True
)
chunked_docs = text_splitter.split_documents(docs)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(
    chunked_docs, embeddings, collection_name="langchain_youtube"
)
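
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window chunker. It is only an illustration: `RecursiveCharacterTextSplitter` prefers to split on separators such as paragraph and sentence boundaries, which this sketch does not attempt, and `sliding_chunks` is a hypothetical helper name.

```python
from typing import List, Tuple


def sliding_chunks(
    text: str, chunk_size: int, chunk_overlap: int
) -> List[Tuple[int, str]]:
    """Split text into fixed-size windows and return (start_index, chunk) pairs."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start : start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


# Tiny example: windows of 4 characters that share 1 character with their neighbor.
chunks = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)
```

Each chunk repeats the tail of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.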

## Retrieval without query analysis

We can perform similarity search on a user question directly to find chunks relevant to the question:

In [7]:
search_results = vectorstore.similarity_search("how do I build a RAG agent")
print(search_results[0].metadata["title"])
print(search_results[0].page_content[:500])

OpenGPTs
it decides to use a tool and so importantly it lets us know that it's deciding to use a tool and then it also lets us know what the result of the tool is and then it starts streaming back the response so this is streaming not just tokens but also these intermediate steps which provide really good visibility into what is going on we can see here we can see the response that we got back from tavil um and then we can see um the response from the AI and so there's lots of dad jokes in here this is u


This works pretty well! Our first result is quite relevant to the question.

Now what if we remembered that there was a video series titled "RAG from scratch" and wanted to find that specifically?

In [8]:
search_results = vectorstore.similarity_search(
    "rag from scratch",
)
print(search_results[0].metadata["title"])
print(search_results[0].page_content[:500])

Self-reflective RAG with LangGraph: Self-RAG and CRAG
hi this is Lance from Lang chain I'm going to be talking about using Lang graph to build a diverse and sophisticated rag flows so just to set the stage the basic rag flow you can see here starts with a question retrieval of relevant documents from an index which are passed into the context window of an llm for generation of an answer grounded in the ret documents so that's kind of the basic outline and we can see it's like a very linear path um in practice though you often encounter a few differ


In [9]:
[res.metadata["title"] for res in search_results]

['Self-reflective RAG with LangGraph: Self-RAG and CRAG',
 'Build and Deploy a RAG app with Pinecone Serverless',
 'OpenGPTs',
 'LangServe and LangChain Templates Webinar']

Since we're only searching over the transcriptions, and not over titles, our search misses all of the relevant documents. 

What if we wanted to search for results from a specific time period?

In [10]:
search_results = vectorstore.similarity_search("videos on RAG published in 2023")
print(search_results[0].metadata["title"])
print(search_results[0].metadata["publish_date"])
print(search_results[0].page_content[:500])

OpenGPTs
2024-01-31 00:00:00
it decides to use a tool and so importantly it lets us know that it's deciding to use a tool and then it also lets us know what the result of the tool is and then it starts streaming back the response so this is streaming not just tokens but also these intermediate steps which provide really good visibility into what is going on we can see here we can see the response that we got back from tavil um and then we can see um the response from the AI and so there's lots of dad jokes in here this is u


Our first result is from 2024, and not very relevant to the input. Since we're just searching against document contents, there's no way for the results to be filtered on any document attributes.
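
One way to picture the missing step is a filter applied to document metadata alongside the similarity search. The sketch below uses plain dictionaries shaped like the `metadata` shown earlier; the helper `published_between` and the sample records are hypothetical, and a real vector store would usually apply such a filter inside the search call instead.

```python
import datetime
from typing import Dict, List, Optional


def published_between(
    records: List[Dict],
    earliest: Optional[datetime.date] = None,
    latest: Optional[datetime.date] = None,
) -> List[Dict]:
    """Keep records with earliest <= publish_date < latest (either bound optional)."""
    kept = []
    for record in records:
        # The metadata stores the date as a string such as '2024-02-07 00:00:00'.
        published = datetime.datetime.strptime(
            record["publish_date"], "%Y-%m-%d %H:%M:%S"
        ).date()
        if earliest is not None and published < earliest:
            continue
        if latest is not None and published >= latest:
            continue
        kept.append(record)
    return kept


# Made-up sample records, for illustration only.
sample = [
    {"title": "Video A", "publish_date": "2023-11-02 00:00:00"},
    {"title": "Video B", "publish_date": "2024-01-31 00:00:00"},
]
in_2023 = published_between(
    sample, earliest=datetime.date(2023, 1, 1), latest=datetime.date(2024, 1, 1)
)
print([record["title"] for record in in_2023])
```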

What if we wanted to know about deploying a LangChain chain as a REST API?

In [11]:
search_results = vectorstore.similarity_search("chain as rest api")
print(search_results[0].metadata["title"])
print(search_results[0].page_content[1500:2000])

Building a web RAG chatbot: using LangChain, Exa (prev. Metaphor), LangSmith, and Hosted Langserve
 further and see um that the inputs to each of those kind of Lambda steps is going to be one of those documents and then we're outputting um like the highlights DL segment and then formatting that with our prompt template uh if you recall from when we were constructing it um so now we have kind of our uh fully constructed chain uh for our uh search enable chatbot with XF um and now let's convert that to Lang serve um so to do that we'll go back to vs code um and here we're going to um start with


This brings up LangServe, the package for deploying chains as REST APIs, as desired.

But what if we added that we wanted a chain that made use of multi-modal models?

In [86]:
search_results = vectorstore.similarity_search(
    "how to use multi-modal models in a chain and turn chain into a rest api"
)
print(search_results[0].metadata["title"])
print(search_results[0].page_content[:500])

Streaming Events: Introducing a new `stream_events` method
streaming is uh an incredibly important ux consideration for building L Ms in a few ways first of all even if you're just working with a single llm call it can often take a while and you might want to stream individual tokens to the user so they can see what's happening as the llm responds second of all a lot of the things that we build in the laying chain are more complicated chains or agents and so being able to stream the intermediate steps what tool are being called what the input to those t


Our first result ends up not being about LangServe or multi-modal models. In reality "chains as rest API" and "using multi-modal models" are two fairly distinct questions that should be queried for separately.

## Query analysis

To handle these failure modes we can perform **query analysis**. Specifically, we can perform:

* **Query structuring**: If our documents have multiple searchable/filterable attributes, we can infer from any raw user question which specific attributes should be searched/filtered over. For example, when a user input specifies something about video publication date, that should become a filter on the `publish_date` attribute of each document.
* **Query decomposition**: If a user input contains multiple distinct questions, we can decompose the input into separate queries.
* **Query expansion**: If an index is sensitive to query phrasing, we can generate multiple paraphrased versions of the user question to increase our chances of retrieving a relevant result.

To do this we'll define a query schema and use a function-calling model to convert a user question into a structured query or queries. The structured nature of the query schema allows us to do query structuring and routing, and the fact that we can extract several such queries from a single question allows us to do decomposition and expansion.
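
When one question yields several queries, their results have to be combined somehow. The following sketch shows one possible approach, which is an assumption and not part of this guide's pipeline: run each query, then merge the result lists in query order while dropping duplicates. `merge_results` and the stand-in `fake_search` function are hypothetical names.

```python
from typing import Callable, Dict, Hashable, List


def merge_results(
    queries: List[str],
    search: Callable[[str], List[Dict]],
    key: Callable[[Dict], Hashable],
) -> List[Dict]:
    """Run every query and concatenate results, keeping the first copy of each hit."""
    seen = set()
    merged = []
    for query in queries:
        for hit in search(query):
            identity = key(hit)
            if identity not in seen:
                seen.add(identity)
                merged.append(hit)
    return merged


# Stand-in for a real vector store lookup, with invented results.
FAKE_INDEX = {
    "multi-modal models": [{"source": "a"}, {"source": "b"}],
    "chain as REST API": [{"source": "b"}, {"source": "c"}],
}


def fake_search(query: str) -> List[Dict]:
    return FAKE_INDEX.get(query, [])


merged = merge_results(
    ["multi-modal models", "chain as REST API"],
    fake_search,
    key=lambda hit: hit["source"],
)
print([hit["source"] for hit in merged])
```

Listing the queries in order of expected relevance means the most relevant hits come first in the merged list.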

### Query schema
In this case we'll have explicit min and max attributes for view count, publication date, and video length so that those can be filtered on. And we'll add separate attributes for searches against the transcript contents versus the video title. We'll also add some sorting attributes that we'll touch on later.

In [67]:
import datetime
from typing import Literal, Optional, Tuple

from langchain_core.pydantic_v1 import BaseModel, Field


class TutorialSearch(BaseModel):
    """Search over a database of tutorial videos about a software library."""

    content_search: str = Field(
        ...,
        description="Similarity search query applied to video transcripts.",
    )
    title_search: str = Field(
        ...,
        description=(
            "Alternate version of the content search query to apply to video titles. "
            "Should be succinct and only include key words that could be in a video "
            "title."
        ),
    )
    min_view_count: Optional[int] = Field(
        None, description="Minimum view count filter, inclusive."
    )
    max_view_count: Optional[int] = Field(
        None, description="Maximum view count filter, exclusive."
    )
    earliest_publish_date: Optional[datetime.date] = Field(
        None, description="Earliest publish date filter, inclusive."
    )
    latest_publish_date: Optional[datetime.date] = Field(
        None, description="Latest publish date filter, exclusive."
    )
    min_length_sec: Optional[int] = Field(
        None, description="Minimum video length in seconds, inclusive."
    )
    max_length_sec: Optional[int] = Field(
        None, description="Maximum video length in seconds, exclusive."
    )
    sort_by: Literal[
        "relevance",
        "view_count",
        "publish_date",
        "length",
    ] = Field("relevance", description="Attribute to sort by.")
    sort_order: Literal["ascending", "descending"] = Field(
        "descending", description="Whether to sort in ascending or descending order."
    )
    relevance_rank: int = Field(
        ...,
        description=(
            "The index of this search query when all generated queries are sorted by "
            "the expected relevance of their results to the original user question. "
            "Each query must have a distinct index. A lower rank indicates higher "
            "relevance to the user question."
        ),
    )

    def pretty_print(self) -> None:
        for field in self.__fields__:
            if getattr(self, field) is not None and getattr(self, field) != getattr(
                self.__fields__[field], "default", None
            ):
                print(f"{field}: {getattr(self, field)}")

### Query generation

To convert user questions to structured queries we'll make use of OpenAI's function-calling API. Since the latest OpenAI models can return multiple function invocations each turn, this approach automatically supports query expansion and decomposition.

In [72]:
from langchain.output_parsers import PydanticToolsParser
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

system = """You are an expert at converting user questions into database queries. \
You have access to a database of tutorial videos about a software library for building LLM-powered applications. \
Given a question, return a list of database queries optimized to retrieve the most relevant results.

Perform query expansion. If there are multiple common ways of phrasing a user question \
or common synonyms for key words in the question, make sure to return multiple versions \
of the query with the different phrasings.

Perform query decomposition. If the user input contains a multi-part question, make \
sure to return a separate query for each distinct sub-question.

If there are acronyms or words you are not familiar with, do not try to rephrase them."""
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        MessagesPlaceholder("examples", optional=True),
        MessagesPlaceholder("chat_history", optional=True),
        ("human", "{question}"),
    ]
)
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
llm_with_tools = llm.bind_tools([TutorialSearch])
query_analyzer = (
    {"question": RunnablePassthrough()}
    | prompt
    | llm_with_tools
    | PydanticToolsParser(tools=[TutorialSearch])
)
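
To see why multiple function invocations map naturally onto multiple queries, it helps to look at the shape of the data. The sketch below parses a hand-written list of tool calls in the OpenAI style (a `function` entry with a `name` and JSON-encoded `arguments`) into one dictionary per query. The payload is invented for illustration and `parse_tool_calls` is a hypothetical helper; in the chain above this work is done by `PydanticToolsParser`.

```python
import json
from typing import Dict, List


def parse_tool_calls(tool_calls: List[Dict], tool_name: str) -> List[Dict]:
    """Decode the JSON arguments of every call made to `tool_name`."""
    parsed = []
    for call in tool_calls:
        function = call["function"]
        if function["name"] == tool_name:
            parsed.append(json.loads(function["arguments"]))
    return parsed


# Invented payload: one model turn containing two calls to the same tool.
tool_calls = [
    {
        "id": "call_1",
        "type": "function",
        "function": {
            "name": "TutorialSearch",
            "arguments": json.dumps(
                {
                    "content_search": "multi-modal models chain",
                    "title_search": "multi-modal models",
                    "relevance_rank": 1,
                }
            ),
        },
    },
    {
        "id": "call_2",
        "type": "function",
        "function": {
            "name": "TutorialSearch",
            "arguments": json.dumps(
                {
                    "content_search": "chain into REST API",
                    "title_search": "chain REST API",
                    "relevance_rank": 2,
                }
            ),
        },
    },
]
queries = parse_tool_calls(tool_calls, "TutorialSearch")
print(len(queries))
```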

Let's see what queries our analyzer generates for the questions we searched earlier:

In [73]:
for query in query_analyzer.invoke("rag from scratch"):
    query.pretty_print()
    print()

content_search: RAG from scratch
title_search: RAG
relevance_rank: 1

content_search: Reactive Agile Governance from scratch
title_search: Reactive Agile Governance
relevance_rank: 2

content_search: How to build RAG applications from the beginning
title_search: RAG applications
relevance_rank: 3



In [74]:
for query in query_analyzer.invoke("videos on RAG published in 2023"):
    query.pretty_print()
    print()

content_search: RAG
title_search: RAG
earliest_publish_date: 2023-01-01
latest_publish_date: 2024-01-01
relevance_rank: 1



In [85]:
for query in query_analyzer.invoke(
    "how to use multi-modal models in a chain and turn chain into a rest api"
):
    query.pretty_print()
    print()

content_search: multi-modal models chain
title_search: multi-modal models chain
relevance_rank: 1

content_search: chain into REST API
title_search: chain REST API
relevance_rank: 2



### Improvements: Adding examples to prompt

To tune our results we can add some examples of input questions and gold-standard output queries to our prompt. We'll focus on examples that show how to route and expand queries against either titles or content, how to structure them with filters, and how to decompose them:

In [76]:
examples = []

In [77]:
question = "What is Web Voyager? How about Gemini?"
queries = [
    TutorialSearch(
        content_search="what is Web Voyager",
        title_search="Web Voyager",
        relevance_rank=1,
    ),
    TutorialSearch(
        content_search="What is Gemini", title_search="Gemini", relevance_rank=2
    ),
]
examples.append({"input": question, "tool_calls": queries})

In [78]:
question = "Have they released any chat langchain updates since 2024?"
queries = [
    TutorialSearch(
        title_search="chat langchain",
        content_search="chat langchain",
        earliest_publish_date=datetime.date(2024, 1, 1),
        relevance_rank=1,
    ),
]
examples.append({"input": question, "tool_calls": queries})

In [79]:
question = "How to build multi-agent system and stream intermediate steps from it"
queries = [
    TutorialSearch(
        content_search="How to build multi-agent system",
        title_search="multi-agent system",
        relevance_rank=1,
    ),
    TutorialSearch(
        content_search="how to stream intermediate steps from multi-agent system",
        title_search="stream intermediate steps multi-agent system",
        relevance_rank=2,
    ),
    TutorialSearch(
        content_search="how to stream intermediate steps",
        title_search="stream intermediate steps",
        relevance_rank=3,
    ),
]
examples.append({"input": question, "tool_calls": queries})

Now we need to update our prompt template and chain so that the examples are included in each prompt. Since we're working with OpenAI function-calling, we'll need to do a bit of extra structuring to send example inputs and outputs to the model. We'll create a `tool_example_to_messages` helper function to handle this for us:

In [42]:
import uuid
from typing import Dict, List

from langchain_core.messages import (
    AIMessage,
    BaseMessage,
    HumanMessage,
    SystemMessage,
    ToolMessage,
)


def tool_example_to_messages(example: Dict) -> List[BaseMessage]:
    messages: List[BaseMessage] = [HumanMessage(content=example["input"])]
    openai_tool_calls = []
    for tool_call in example["tool_calls"]:
        openai_tool_calls.append(
            {
                "id": str(uuid.uuid4()),
                "type": "function",
                "function": {
                    "name": tool_call.__class__.__name__,
                    "arguments": tool_call.json(),
                },
            }
        )
    messages.append(
        AIMessage(content="", additional_kwargs={"tool_calls": openai_tool_calls})
    )
    tool_outputs = example.get("tool_outputs") or [
        "This is an example of a correct usage of this tool. Well done. Make sure to continue using the tool this way."
    ] * len(openai_tool_calls)
    for output, tool_call in zip(tool_outputs, openai_tool_calls):
        messages.append(ToolMessage(content=output, tool_call_id=tool_call["id"]))
    return messages

example_msgs = [msg for ex in examples for msg in tool_example_to_messages(ex)]
query_analyzer_with_examples = (
    {"question": RunnablePassthrough()}
    | prompt.partial(examples=example_msgs)
    | llm_with_tools
    | PydanticToolsParser(tools=[TutorialSearch])
)

In [81]:
for query in query_analyzer_with_examples.invoke("rag from scratch"):
    query.pretty_print()
    print()

content_search: rag from scratch
title_search: rag from scratch
relevance_rank: 1



In [84]:
for query in query_analyzer_with_examples.invoke(
    "how to use multi-modal models in a chain and turn chain into a rest api"
):
    query.pretty_print()
    print()

content_search: how to use multi-modal models in a chain
title_search: multi-modal models chain
relevance_rank: 1

content_search: how to turn chain into a REST API
title_search: chain REST API
relevance_rank: 2



In [83]:
for query in query_analyzer_with_examples.invoke(
    "How to do extraction with agent? How to build agent with anthropic"
):
    query.pretty_print()
    print()

content_search: How to do extraction with agent
title_search: extraction with agent
relevance_rank: 1

content_search: How to build agent with anthropic
title_search: build agent anthropic
relevance_rank: 2



## Retrieval with query analysis

Our query analysis looks pretty good; now let's try using our generated queries to actually perform retrieval. We'll define a custom retrieval lambda that takes our output queries and correctly applies them to our indexes. 

Before we do that, we'll also need to create a separate index since we created an additional search field, `title_search`, for searching against video titles specifically.

In [None]:
from copy import deepcopy

from langchain_core.documents import Document
from langchain.storage import InMemoryStore

docstore = InMemoryStore()
docstore.mset([(doc.metadata["source"], doc) for doc in docs])

title_docs = []
for doc in docs:
    metadata = deepcopy(doc.metadata)
    title_docs.append(Document(metadata.pop("title"), metadata=metadata))
title_vectorstore = Chroma.from_documents(
    title_docs, embeddings, collection_name="langchain_youtube_titles"
)

In [None]:
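from typing import List, Optional

from langchain.chains.query_constructor.ir import Comparator, Comparison, Operation, Operator
from langchain.retrievers.self_query.chroma import ChromaTranslator


def query_to_chroma_filter(query: TutorialSearch) -> Optional[dict]:
    """Translate the numeric bounds of a query into a Chroma metadata filter."""
    # Minimums are inclusive and maximums exclusive, as described in TutorialSearch.
    comparisons = []
    if query.min_view_count is not None:
        comparisons.append(
            Comparison(comparator=Comparator.GTE, attribute="view_count", value=query.min_view_count)
        )
    if query.max_view_count is not None:
        comparisons.append(
            Comparison(comparator=Comparator.LT, attribute="view_count", value=query.max_view_count)
        )
    if query.min_length_sec is not None:
        comparisons.append(
            Comparison(comparator=Comparator.GTE, attribute="length", value=query.min_length_sec)
        )
    if query.max_length_sec is not None:
        comparisons.append(
            Comparison(comparator=Comparator.LT, attribute="length", value=query.max_length_sec)
        )
    # The date bounds are left unimplemented: `publish_date` is stored as a string in
    # the metadata above, which Chroma's range operators may not accept (assumption).
    if query.earliest_publish_date is not None:
        ...
    if query.latest_publish_date is not None:
        ...
    if not comparisons:
        return None
    translator = ChromaTranslator()
    if len(comparisons) == 1:
        # A single comparison is returned as-is; AND is only used for two or more.
        return comparisons[0].accept(translator)
    return Operation(operator=Operator.AND, arguments=comparisons).accept(translator)


def content_search(query: TutorialSearch) -> List[Document]:
    # Assumption: the same metadata filter applies to the transcript index.
    return vectorstore.similarity_search(query.content_search, filter=query_to_chroma_filter(query))


def title_search(query: TutorialSearch) -> List[Document]:
    # Assumption: the second argument in the sketch is the metadata filter.
    return title_vectorstore.similarity_search(query.title_search, filter=query_to_chroma_filter(query))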

In [None]:
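from langchain_core.runnables import RunnableLambda, RunnableMap

# Assumption: the analyzer returns a list of queries, so each search function is
# mapped over that list with `.map()`. The `title_docs` key is also an assumption.
retrieval = query_analyzer_with_examples | RunnableMap(
    content_docs=RunnableLambda(content_search).map(),
    title_docs=RunnableLambda(title_search).map(),
)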

## Routing: Working with multiple indexes



## Sorting: Going beyond similarity search



In [81]:
question = "What has LangChain released lately?"
queries = [
    LangChainYouTubeSearch(sort_by="publish_date", relevance_rank=1),
]
examples.append({"input": question, "tool_calls": queries})

In [82]:
question = "What are the most popular videos about RAG?"
queries = [
    LangChainYouTubeSearch(title_search="RAG", sort_by="view_count", relevance_rank=1),
    LangChainYouTubeSearch(
        content_search="RAG", sort_by="view_count", relevance_rank=2
    ),
    LangChainYouTubeSearch(
        content_search="retrieval augmented generation",
        sort_by="view_count",
        relevance_rank=3,
    ),
    LangChainYouTubeSearch(
        content_search="retrieval", sort_by="view_count", relevance_rank=4
    ),
]
examples.append({"input": question, "tool_calls": queries})

In [86]:
question = "What are some short videos about agentic systems with local models?"
queries = [
    LangChainYouTubeSearch(
        content_search="how to build an agent using a local model",
        sort_by="length",
        sort_order="ascending",
        relevance_rank=1,
    ),
    LangChainYouTubeSearch(
        title_search="agent local model",
        sort_by="length",
        sort_order="ascending",
        relevance_rank=2,
    ),
]
examples.append({"input": question, "tool_calls": queries})

In [90]:
query_analyzer.invoke("Recent talks about local llms")

[LangChainYouTubeSearch(content='local LLMs', title=None, min_view_count=None, max_view_count=None, earliest_publish_date=None, latest_publish_date=None, min_length_sec=None, max_length_sec=None, sort_by='publish_date', sort_order='descending', relevance_rank=1)]

In [212]:
query_analyzer.invoke("new videos about streaming state machines")

[LangChainYouTubeSearch(content='new videos about streaming state machines', title=None, min_view_count=None, max_view_count=None, earliest_publish_date=None, latest_publish_date=None, min_length_sec=None, max_length_sec=None, sort_by='relevance', sort_order='descending', relevance_rank=1)]
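
A vector store's similarity search generally returns results ordered by similarity, so a `sort_by` other than `relevance` would need to be applied after retrieval. As a rough sketch under that assumption, the hypothetical `sort_documents` helper below reorders metadata dictionaries according to `sort_by` and `sort_order`; the sample records are invented.

```python
from typing import Dict, List


def sort_documents(
    records: List[Dict], sort_by: str, sort_order: str = "descending"
) -> List[Dict]:
    """Order records by a metadata attribute; 'relevance' keeps retrieval order."""
    if sort_by == "relevance":
        return list(records)
    return sorted(
        records,
        key=lambda record: record[sort_by],
        reverse=(sort_order == "descending"),
    )


# Invented records. Dates in 'YYYY-MM-DD HH:MM:SS' form sort correctly as strings.
records = [
    {"title": "A", "view_count": 100, "length": 600, "publish_date": "2024-01-10 00:00:00"},
    {"title": "B", "view_count": 900, "length": 300, "publish_date": "2023-12-01 00:00:00"},
    {"title": "C", "view_count": 500, "length": 1200, "publish_date": "2024-02-07 00:00:00"},
]
newest_first = sort_documents(records, "publish_date")
shortest_first = sort_documents(records, "length", "ascending")
print([record["title"] for record in newest_first])
print([record["title"] for record in shortest_first])
```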