## Routing & Query Construction

In a real-world scenario, knowledge isn’t stored in a single, uniform library.

We often have multiple data sources: documentation for different programming languages, internal wikis, public websites, or databases with structured metadata.

Question -> LLM -> one of:

1. Graph database
2. Vector store

We’ll start with logical routing, where the LLM picks one destination from a fixed set. First we define the “contract” for the LLM’s output using a Pydantic model. This schema explicitly tells the LLM the possible destinations for a query.

In [1]:
from typing import Literal
from pydantic import BaseModel, Field


class RouteQuery(BaseModel):
    """
    A data model to route query to the required database
    """

    datasource: Literal["python_docs", "js_docs", "golang_docs"] = Field(
        ...,
        description="Given a user question, choose which datasource would be most relevant for answering the question."
    )

In [2]:
# initialize llm to use
import os
from dotenv import load_dotenv
from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace

load_dotenv()

model = HuggingFaceEndpoint(
    model="openai/gpt-oss-20b",
    max_new_tokens=1024,
    huggingfacehub_api_token=os.getenv("HUGGINGFACE_API_KEY")
)

llm = ChatHuggingFace(
    llm=model
)


In [42]:
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import JsonOutputParser

output_parser = JsonOutputParser(
    name="model_output",
    pydantic_object=RouteQuery
)

# The system prompt provides the core instruction for the LLM's task.
system = """You are an expert at routing a user question to the appropriate data source.

Based on the programming language the question is referring to, route it to the relevant data source.

{format_instructions}
"""

# The full prompt template combines the system message and the user's question.
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "{question}"),
    ]
).partial(format_instructions=output_parser.get_format_instructions())

# Define the complete router chain
router = prompt | llm | output_parser

In [43]:
prompt.messages

[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['format_instructions'], input_types={}, partial_variables={}, template='You are an expert at routing a user question to the appropriate data source.\n\nBased on the programming language the question is referring to, route it to the relevant data source.\n\n{format_instructions}\n'), additional_kwargs={}),
 HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['question'], input_types={}, partial_variables={}, template='{question}'), additional_kwargs={})]

In [44]:
question = """Why doesn't the following code work:

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(["human", "speak in {language}"])
prompt.invoke("french")
"""

# Invoke the router and check the result
result = router.invoke({"question": question})

print(result)

{'datasource': 'python_docs'}


In [46]:
type(result)

dict

In [47]:
def choose_result(result):
    """A function to determine the downstream logic based on the router's output."""
    if "python_docs" in result["datasource"].lower():
        return "chain for python_docs"
    elif "js_docs" in result["datasource"].lower():
        return "chain for js_docs"
    else:
        return "chain for golang_docs"
    
from langchain_core.runnables import RunnableLambda
full_chain = router | RunnableLambda(choose_result)

final_destination = full_chain.invoke({"question": question})
print(final_destination)

chain for python_docs
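The `if`/`elif` chain above sends any value it does not recognise to `golang_docs`, and the router's output is a plain `dict`, so nothing at this step checks it against the `Literal` in `RouteQuery`. A minimal sketch of a stricter alternative is a lookup table that raises on unknown destinations. The name `choose_route` and the string placeholders standing in for real chains are assumptions made for illustration.

```python
# Hypothetical stand-ins for the real downstream chains.
ROUTES = {
    "python_docs": "chain for python_docs",
    "js_docs": "chain for js_docs",
    "golang_docs": "chain for golang_docs",
}


def choose_route(result: dict) -> str:
    """Look up the downstream chain, failing loudly on an unknown datasource."""
    datasource = str(result.get("datasource", "")).strip().lower()
    if datasource not in ROUTES:
        raise ValueError(f"Unknown datasource: {datasource!r}")
    return ROUTES[datasource]


print(choose_route({"datasource": "python_docs"}))
```

Wrapped in `RunnableLambda`, a function like this could take the place of `choose_result`; whether to raise or fall back to a default destination is a design choice that depends on how costly a wrong route is.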


### Semantic Routing

Logical routing works well when you have clearly defined categories. But what if you want to route based on the style or domain of a question? For example, you might want to answer physics questions with a serious, academic tone and math questions with a step-by-step, pedagogical approach. This is where Semantic Routing comes in.


Instead of classifying the query, we define multiple expert prompts.

We then embed the user’s query and each of our prompt templates, and use cosine similarity to find the prompt that is most semantically aligned with the query.
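To make the selection step concrete, here is a small self-contained sketch of cosine similarity followed by an argmax, using only the standard library. The three-dimensional vectors are made up for illustration; a real embedding model returns much longer vectors, and the names `prompt_vectors` and `query_vector` are assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional "embeddings", one per expert prompt.
prompt_vectors = {
    "physics": [0.9, 0.1, 0.2],
    "math": [0.1, 0.9, 0.3],
}
# Made-up embedding of the user's query.
query_vector = [0.2, 0.8, 0.4]

# Score the query against every prompt and keep the best match.
scores = {name: cosine(query_vector, vec) for name, vec in prompt_vectors.items()}
best = max(scores, key=scores.get)
print(best)
```

Because cosine similarity depends only on the angle between vectors, prompt length by itself does not bias the choice as long as the embedding model captures the topic.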

In [48]:
from langchain_core.prompts import PromptTemplate

# A prompt for a physics expert
physics_template = """You are a very smart physics professor. \
You are great at answering questions about physics in a concise and easy to understand manner. \
When you don't know the answer to a question you admit that you don't know.

Here is a question:
{query}"""

# A prompt for a math expert
math_template = """You are a very good mathematician. You are great at answering math questions. \
You are so good because you are able to break down hard problems into their component parts, \
answer the component parts, and then put them together to answer the broader question.

Here is a question:
{query}"""

In [63]:
from langchain_huggingface import HuggingFaceEndpointEmbeddings
from langchain_community.utils.math import cosine_similarity

hf_embeddings = HuggingFaceEndpointEmbeddings(
    model="Qwen/Qwen3-Embedding-8B",
    task="feature-extraction",
    huggingfacehub_api_token=os.getenv("HUGGINGFACE_API_KEY")
)

prompt_templates = [physics_template, math_template]
prompt_embeddings = hf_embeddings.embed_documents(prompt_templates)

def prompt_router(input: dict) -> PromptTemplate:
    """Route the query to the prompt template most similar to it."""

    query_emb = hf_embeddings.embed_query(input["query"])
    
    similarity = cosine_similarity([query_emb], prompt_embeddings)[0]
    most_similar_index = similarity.argmax()
    chosen_template = prompt_templates[most_similar_index]

    print(f"DEBUG: Using {'MATH' if most_similar_index == 1 else 'PHYSICS'} template.")

    return PromptTemplate.from_template(chosen_template)


In [64]:
prompt_router({"query": "What is 2+2?"})

DEBUG: Using MATH template.


PromptTemplate(input_variables=['query'], input_types={}, partial_variables={}, template='You are a very good mathematician. You are great at answering math questions. You are so good because you are able to break down hard problems into their component parts, answer the component parts, and then put them together to answer the broader question.\n\nHere is a question:\n{query}')

In [65]:
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

chain = (
    {"query": RunnablePassthrough()}
    | RunnableLambda(prompt_router)
    | llm
    | StrOutputParser()
)

print(chain.invoke("What is a black hole?"))

DEBUG: Using PHYSICS template.
A **black hole** is a region of spacetime where gravity is so intense that nothing—matter, light, or even information—can escape once it crosses a particular boundary called the **event horizon**.

| Feature | What it means |
|---------|---------------|
| **Event horizon** | A spherical surface (for a non‑rotating hole) that marks the point of no return. Anything falling inside can never get out. |
| **Escape velocity > c** | The speed needed to escape from the horizon is greater than the speed of light, so even light is trapped. |
| **Singularity** | The center of a black hole, where the curvature of spacetime becomes infinite in classical GR. (Quantum gravity likely changes this picture.) |
| **Massive star collapse** | Most astrophysical black holes form when a massive star runs out of nuclear fuel and its core collapses under gravity. |
| **Observational evidence** | • Gravitational‑wave detections from black‑hole mergers (LIGO/Virgo). <br>• The 2019 

### Query Structuring


We’ve focused on retrieving from unstructured text. But most real-world data is semi-structured; it contains valuable metadata like dates, authors, view counts, or categories. A simple vector search can’t leverage this information.

Query Structuring is the technique of converting a natural language question into a structured query that can use these metadata filters for highly precise retrieval.

In [73]:
## The following does not seem to work anymore

# from langchain_community.document_loaders import YoutubeLoader

# # Load a YouTube transcript to inspect its metadata
# docs = YoutubeLoader.from_youtube_url(
#     "https://www.youtube.com/watch?v=pbAd8O1Lvm4", add_video_info=True
# ).load()

# # Print the metadata of the first document
# print(docs[0].metadata)

In [74]:
from langchain_community.document_loaders import YoutubeLoader
import yt_dlp

# Get video info with yt-dlp
ydl_opts = {'quiet': True}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    info = ydl.extract_info("https://www.youtube.com/watch?v=pbAd8O1Lvm4", download=False)
    
# Load transcript without video info
docs = YoutubeLoader.from_youtube_url(
    "https://www.youtube.com/watch?v=pbAd8O1Lvm4",
    add_video_info=False
).load()

# Manually add the metadata
docs[0].metadata.update(info)

print(docs[0].metadata)



{'source': 'pbAd8O1Lvm4', 'id': 'pbAd8O1Lvm4', 'title': 'Self-reflective RAG with LangGraph: Self-RAG and CRAG', 'formats': [{'format_id': 'sb3', 'format_note': 'storyboard', 'ext': 'mhtml', 'protocol': 'mhtml', 'acodec': 'none', 'vcodec': 'none', 'url': 'https://i.ytimg.com/sb/pbAd8O1Lvm4/storyboard3_L0/default.jpg?sqp=-oaymwENSDfyq4qpAwVwAcABBqLzl_8DBgjymIyuBg==&sigh=rs$AOn4CLCZSRMgkSsr7R3mQgKlyKscgQgrFg', 'width': 48, 'height': 27, 'fps': 0.0945179584120983, 'rows': 10, 'columns': 10, 'fragments': [{'url': 'https://i.ytimg.com/sb/pbAd8O1Lvm4/storyboard3_L0/default.jpg?sqp=-oaymwENSDfyq4qpAwVwAcABBqLzl_8DBgjymIyuBg==&sigh=rs$AOn4CLCZSRMgkSsr7R3mQgKlyKscgQgrFg', 'duration': 1058.0}], 'audio_ext': 'none', 'video_ext': 'none', 'vbr': 0, 'abr': 0, 'tbr': None, 'resolution': '48x27', 'aspect_ratio': 1.78, 'filesize_approx': None, 'http_headers': {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36', 'Accept': 'text
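Merging the whole `info` dict copies every field into the document metadata, including format lists and HTTP headers, as the output above shows. A possible refinement, sketched under assumptions, is to keep only the fields that are useful as filters. The key names `title`, `view_count`, `upload_date` and `duration` are assumed to be present in the yt-dlp result, `slim_metadata` is a hypothetical helper, and the view count and date in the sample input are made up.

```python
import datetime

# Keys assumed to be present in yt-dlp's info dict; adjust to what you actually see.
WANTED_KEYS = ("title", "view_count", "upload_date", "duration")


def slim_metadata(info: dict) -> dict:
    """Keep only the fields useful for filtering and normalise their types."""
    meta = {key: info[key] for key in WANTED_KEYS if info.get(key) is not None}
    if "upload_date" in meta:
        # Assumes the date arrives as a 'YYYYMMDD' string.
        meta["upload_date"] = datetime.datetime.strptime(
            meta["upload_date"], "%Y%m%d"
        ).date()
    return meta


# Illustrative input only: the view count and upload date below are made up.
sample_info = {
    "title": "Self-reflective RAG with LangGraph: Self-RAG and CRAG",
    "view_count": 1000,
    "upload_date": "20240101",
    "duration": 1058,
    "formats": [{"format_id": "sb3"}],
}
print(slim_metadata(sample_info))
```

In the cell above, this would be used as `docs[0].metadata.update(slim_metadata(info))`.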

In [75]:
import datetime
from typing import Optional

class TutorialSearch(BaseModel):
    """A data model for searching over a database of tutorial videos."""

    # The main query for a similarity search over the video's transcript.
    content_search: str = Field(..., description="Similarity search query applied to video transcripts.")
    
    # A more succinct query for searching just the video's title.
    title_search: str = Field(..., description="Alternate version of the content search query to apply to video titles.")
    
    # Optional metadata filters
    min_view_count: Optional[int] = Field(None, description="Minimum view count filter, inclusive.")
    max_view_count: Optional[int] = Field(None, description="Maximum view count filter, exclusive.")
    earliest_publish_date: Optional[datetime.date] = Field(None, description="Earliest publish date filter, inclusive.")
    latest_publish_date: Optional[datetime.date] = Field(None, description="Latest publish date filter, exclusive.")
    min_length_sec: Optional[int] = Field(None, description="Minimum video length in seconds, inclusive.")
    max_length_sec: Optional[int] = Field(None, description="Maximum video length in seconds, exclusive.")

    def pretty_print(self) -> None:
        """A helper function to print the populated fields of the model."""
        # Pydantic v2 exposes field names via model_fields; __fields__ is deprecated.
        for field in type(self).model_fields:
            if getattr(self, field) is not None:
                print(f"{field}: {getattr(self, field)}")


This schema is our target. We’ll now create a chain that takes a user question and fills out this model.

In [80]:
# System prompt for the query analyzer
system = """You are an expert at converting user questions into database queries. \
You have access to a database of tutorial videos about a software library for building LLM-powered applications. \
Given a question, return a database query optimized to retrieve the most relevant results.

If there are acronyms or words you are not familiar with, do not try to rephrase them.

{format_instructions}
"""

from langchain_core.output_parsers import JsonOutputParser

output_parser = JsonOutputParser(
    name="query_parser",
    pydantic_object=TutorialSearch
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "{question}")
    ]
).partial(format_instructions=output_parser.get_format_instructions())

query_analyzer = prompt | llm | output_parser


In [82]:
# Test 1: A simple query
query_analyzer.invoke({"question": "rag from scratch"})

{'content_search': 'rag from scratch', 'title_search': 'rag from scratch'}

In [85]:
# Test 2: A query with a date filter
query_analyzer.invoke(
    {"question": "videos on chat langchain published in 2023"}
)

{'content_search': 'chat langchain 2023',
 'title_search': 'chat langchain 2023',
 'min_view_count': None,
 'max_view_count': None,
 'earliest_publish_date': '2023-01-01',
 'latest_publish_date': '2024-01-01',
 'min_length_sec': None,
 'max_length_sec': None}

In [86]:
# Test 3: A query with a length filter
query_analyzer.invoke(
    {
        "question": "how to use multi-modal models in an agent, only videos under 5 minutes"
    }
)

{'content_search': 'multi-modal models in an agent',
 'title_search': 'multi-modal models in an agent',
 'min_view_count': None,
 'max_view_count': None,
 'earliest_publish_date': None,
 'latest_publish_date': None,
 'min_length_sec': None,
 'max_length_sec': 300}
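The analyzer returns a plain dict, with dates as ISO strings, so something downstream still has to turn it into an actual filter. The sketch below applies such a dict to in-memory metadata records, treating lower bounds as inclusive and upper bounds as exclusive, as the field descriptions in `TutorialSearch` state. The record field names (`view_count`, `publish_date`, `length_sec`), the `matches` helper and the sample videos are assumptions for illustration; a real vector store would express the same conditions in its own filter syntax.

```python
import datetime


def matches(video: dict, query: dict) -> bool:
    """Check one video's metadata against the filters in a structured query."""
    checks = [
        # (video field, lower-bound key, upper-bound key)
        ("view_count", "min_view_count", "max_view_count"),
        ("publish_date", "earliest_publish_date", "latest_publish_date"),
        ("length_sec", "min_length_sec", "max_length_sec"),
    ]
    for field, low_key, high_key in checks:
        low, high = query.get(low_key), query.get(high_key)
        if field == "publish_date":
            # The parser returns dates as ISO strings, so convert before comparing.
            low = datetime.date.fromisoformat(low) if low else None
            high = datetime.date.fromisoformat(high) if high else None
        value = video[field]
        if low is not None and value < low:  # lower bounds are inclusive
            return False
        if high is not None and value >= high:  # upper bounds are exclusive
            return False
    return True


# Hypothetical metadata records; the field names are assumptions for this sketch.
videos = [
    {"title": "short 2023 video", "view_count": 500,
     "publish_date": datetime.date(2023, 6, 1), "length_sec": 240},
    {"title": "long 2024 video", "view_count": 9000,
     "publish_date": datetime.date(2024, 1, 1), "length_sec": 1058},
]

query = {"earliest_publish_date": "2023-01-01", "latest_publish_date": "2024-01-01"}
print([v["title"] for v in videos if matches(v, query)])
```

The half-open date range is what lets "published in 2023" be written as `2023-01-01` to `2024-01-01` without accidentally including a video published on New Year's Day 2024.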