# RAG from Scratch: Routing

When using RAG, the model often receives the wrong context: it is too broad, too narrow, or not relevant.

Several things can cause this:
1. Not every question needs retrieval, yet the pipeline retrieves data in every case.
2. Knowledge sources come in different formats (documents, SQL, databases, code, APIs, etc.).
3. A single retriever cannot fit every task.

For this reason, we can add an extra step after query transformation: **Routing**. Routing chooses the most appropriate source for a query before retrieval, because the relevant data might be stored in a different database or even a different type of database (relational database, graph database, or vector store).

Without routing, we would send every question to the same retriever every time. With a routing step, the system can decide the following:
1. Whether the question needs retrieval at all.
2. If yes, which retriever or knowledge base should be used.
3. If no, whether the LLM should answer directly.
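
The three decisions above can be sketched as a small function that returns a routing decision. This is only an illustration: `RouteDecision`, `SOURCE_KEYWORDS`, and the source names are hypothetical, and the keyword lookup stands in for a real classifier such as an LLM or an embedding comparison.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RouteDecision:
    needs_retrieval: bool
    source: Optional[str] = None  # name of the retriever to use, if any


# Hypothetical keyword table standing in for a real classifier.
SOURCE_KEYWORDS = {
    "sql_database": ("revenue", "orders", "invoice"),
    "vector_store": ("policy", "handbook", "documentation"),
}


def route(question: str) -> RouteDecision:
    """Decide whether to retrieve and, if so, from which source."""
    text = question.lower()
    for source, keywords in SOURCE_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return RouteDecision(needs_retrieval=True, source=source)
    # No source matched: let the LLM answer directly, without retrieval.
    return RouteDecision(needs_retrieval=False)


print(route("What was the total revenue last quarter?"))
print(route("What does the handbook say about remote work?"))
print(route("Tell me a joke."))
```

The first two questions are sent to a retriever, while the last one falls through to a direct answer.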

## Method 1: Logical Routing

We give the LLM knowledge of the data sources at our disposal and let it reason about which one the question should be sent to.

Logical routing is based on rules or categories: it classifies the question and decides the action. A question like "Explain quantum computing" would be routed to a reference source such as Wikipedia, while a question like "Where is the nearest ATM?" would be routed to a real-time API retriever.

In [1]:
! pip install -q langchain_community tiktoken langchain-ollama langchainhub chromadb langchain youtube-transcript-api pytube pydantic


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.2.1[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [2]:
import os
from access import Access

os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
os.environ['LANGCHAIN_API_KEY'] = Access.LANGCHAIN_API_KEY

In [3]:
from typing import Literal

from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field
from langchain_ollama import ChatOllama

# Data model
class RouteQuery(BaseModel):
    """Route a user query to the most relevant datasource."""
    
    datasource: Literal["python_docs", "js_docs", "golang_docs"] = Field(
        ...,
        description="Given a user question choose which datasource would be the most relevant for answering their question",
    )
    
# LLM with Function call
llm = ChatOllama(model="llama3.1", temperature=0)
structured_llm = llm.with_structured_output(RouteQuery)

# Prompt
system = """You are an expert at routing a user question to the appropriate data source.

Based on the programming language the question is referring to, route it to the relevant data source."""

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "{question}"),
    ]
)

# Define router
router = prompt | structured_llm



In [4]:
question = """Why doesn't the following code work:

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(["human", "speak in {language}"])
prompt.invoke("french")
"""

result = router.invoke({"question": question})

In [5]:
result

RouteQuery(datasource='python_docs')

In [6]:
result.datasource

'python_docs'

In [13]:
# Prepare the chain
# ---- your answer chains (very simple example) ----
answer_llm = ChatOllama(model="llama3.1", temperature=0)

python_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an expert in Python. Answer clearly."),
    ("human", "{question}")
])
js_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an expert in JavaScript. Answer clearly."),
    ("human", "{question}")
])
golang_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an expert in Golang. Answer clearly."),
    ("human", "{question}")
])

python_chain = python_prompt | answer_llm
js_chain = js_prompt | answer_llm
golang_chain = golang_prompt | answer_llm

def choose_route(inputs):
    # `inputs` carries the router's decision under "route" and the original
    # question under "question". RunnableLambda invokes the returned chain
    # with this same dict, so the answer prompt receives the question text.
    datasource = inputs["route"].datasource.lower()
    if "python_docs" in datasource:
        # Chain for python docs
        return python_chain
    elif "js_docs" in datasource:
        # Chain for js docs
        return js_chain
    else:
        # Chain for golang docs
        return golang_chain

from operator import itemgetter

from langchain_core.runnables import RunnableLambda
from langchain_core.output_parsers import StrOutputParser

# Run the router and forward the question alongside its decision. Piping the
# router straight into choose_route would hand the RouteQuery object to the
# answer chain in place of the question.
full_chain = (
    {"route": router, "question": itemgetter("question")}
    | RunnableLambda(choose_route)
    | StrOutputParser()
)


In [14]:
full_chain.invoke({"question": question})

## Method 2: Semantic Routing

Semantic routing is based on embedding similarity. In this method, we don't define rules. The steps are as follows:
1. Convert the question into a vector (embedding).
2. Compare it to the vector "profile" of each retriever.
3. Pick the retriever with the closest semantic match.

With this approach, we can have a separate knowledge base for each domain, such as medical or finance. This method is more flexible and works even when the topic is vague or overlapping, which makes it a good fit for large or unstructured knowledge spaces. The disadvantage is that it requires embeddings and vector search, and it is harder to debug because the decision is not rule-based.
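
The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions: `embed` is a toy word-count embedding standing in for a real embedding model, and the `ROUTE_PROFILES` names and texts are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (word counts); a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical route profiles: one short description per knowledge base.
ROUTE_PROFILES = {
    "medical": embed("symptoms diagnosis treatment medication patient doctor disease"),
    "finance": embed("stocks interest loan investment bank budget tax"),
}


def semantic_route(question: str) -> str:
    """Return the name of the profile most similar to the question."""
    query_vector = embed(question)  # step 1: embed the question
    return max(                     # steps 2 and 3: compare, pick the closest
        ROUTE_PROFILES,
        key=lambda name: cosine_similarity(query_vector, ROUTE_PROFILES[name]),
    )


print(semantic_route("What medication helps with flu symptoms?"))
print(semantic_route("Should I take a loan or adjust my budget?"))
```

Note that a question with no overlap with any profile still gets assigned to the first route here. A real router would likely add a similarity threshold and fall back to a default when nothing matches well.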