# RAG Routing

This notebook covers the following **routing options**:

1. **Completion Routers** - LLM Completion Routers use an LLM completion call to return a single word that best describes the query from a list of word options provided in the prompt. This word is then used as part of an If/Else condition to control the application's flow.

2. **Function Calling Routers** - LLM Function Calling Routers leverage the function-calling ability of LLMs to pick a route to traverse. Routes are set up as functions with appropriate descriptions, and based on the query, the LLM returns the correct function to use.

3. **Semantic Routers** - Semantic Routers use embeddings and similarity searches to select the best route. Each route has associated example queries that are embedded and stored as vectors; the incoming query is embedded, and a similarity search determines the closest match.

4. **Zero Shot Classification Routers** - Zero Shot Classification Routers use a Zero-Shot Classification model to assign a label to a piece of text from a predefined set of labels. They can classify new examples from previously unseen classes, making them versatile for various queries.

5. **Language Classification Routers** - Language Classification Routers identify the language of the query and route it accordingly. They are useful for applications requiring multilingual parsing capabilities.

6. **Keyword Routers** - Keyword Routers select a route by matching keywords between the query and predefined route lists. They can be powered by LLMs or other keyword matching libraries.

7. **Logical Routers** - Logical Routers use logic checks against variables such as string lengths, file names, and value comparisons to handle query routing. They rely on existing and discrete variables rather than natural language understanding.

One day maybe I'll add some pretty graphics here ;)

In [20]:
# import modules
import os
from langchain_openai import AzureChatOpenAI
import pdfplumber
import faiss
import numpy as np
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import NLTKTextSplitter
from langchain_core.output_parsers import StrOutputParser
from langchain.prompts import PromptTemplate
from langchain_community.chat_models import ChatOllama
import json
from sklearn.metrics.pairwise import cosine_similarity
from transformers import AutoTokenizer, AutoModel
import torch
from sentence_transformers import SentenceTransformer
import re

## 1. Completion Router

In [2]:
# design prompt

prompt = PromptTemplate(
    template="""

    You are a brilliant assistant who's exceptional in classification tasks.
    Your main task is to classify the user's query below as being about `Coffee`, `Tea`, `Soft Drinks`, `Alcoholic Drinks` or `Other`.

    Do not respond with more than one word.

    <user query>
    {user_query}
    </user query>

    Classification:
    """,
    input_variables=["user_query"],
)

In [3]:
user_query = "Where can I find kenyan K7 or Ruiru 11 sorts?" # K7 and Ruiru 11 are popular Kenyan coffee varieties

In [4]:
# completion router

llama = ChatOllama(model="llama3", temperature=0)

completion_route_chain = prompt | llama | StrOutputParser()

input_data = {
    "user_query": user_query
}

route = completion_route_chain.invoke(input=input_data)
print(f"ROUTE: {route}")

ROUTE: Coffee
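
The returned word still has to drive the If/Else branch described in the overview. The following is a minimal sketch of that step, assuming the LLM may add whitespace, backticks or punctuation around the label. The handler names (`coffee_handler`, `fallback_handler`, and so on) are hypothetical placeholders, not part of this notebook.

```python
# Labels the prompt allows, lower-cased for comparison
VALID_ROUTES = {"coffee", "tea", "soft drinks", "alcoholic drinks", "other"}


def normalize_route(raw_route: str) -> str:
    """Map raw LLM output to a known label; anything unexpected becomes 'other'."""
    cleaned = raw_route.strip().strip("`'\".").lower()
    return cleaned if cleaned in VALID_ROUTES else "other"


def dispatch(raw_route: str) -> str:
    """If/Else dispatch on the normalized label (handler names are hypothetical)."""
    route = normalize_route(raw_route)
    if route == "coffee":
        return "coffee_handler"
    elif route == "tea":
        return "tea_handler"
    elif route in ("soft drinks", "alcoholic drinks"):
        return "drinks_handler"
    else:
        return "fallback_handler"


print(dispatch(" Coffee\n"))                    # coffee_handler
print(dispatch("I think this is about wine"))  # fallback_handler
```

Normalizing before branching keeps a chatty or oddly formatted completion from silently skipping every branch.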


## 2. Function Calling Router

In [5]:
# To be added soon
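
As a placeholder illustration (not the notebook's eventual implementation), here is a sketch of the mechanics behind a function calling router. Routes are plain functions whose docstrings act as descriptions, the tool schemas follow the JSON-schema style commonly used by chat-completion APIs, and the model's answer is simulated as a dict with a function name and JSON-encoded arguments. All function names and the `simulated_call` shape are assumptions made for this example, and no LLM is called.

```python
import json


def search_coffee_docs(query: str) -> str:
    """Answer questions about coffee beans, roasting and brewing."""
    return f"[coffee route] {query}"


def search_tea_docs(query: str) -> str:
    """Answer questions about tea varieties and preparation."""
    return f"[tea route] {query}"


# Route registry: function name -> callable
ROUTES = {fn.__name__: fn for fn in (search_coffee_docs, search_tea_docs)}


def build_tool_schemas(routes):
    """Describe each route as a tool; the LLM only sees names and descriptions."""
    return [
        {
            "type": "function",
            "function": {
                "name": name,
                "description": fn.__doc__,
                "parameters": {
                    "type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"],
                },
            },
        }
        for name, fn in routes.items()
    ]


def route_tool_call(tool_call, routes, default=None):
    """Run the function the model selected; return `default` for unknown names."""
    fn = routes.get(tool_call["name"])
    if fn is None:
        return default
    arguments = json.loads(tool_call["arguments"])
    return fn(**arguments)


# Simulated model output (assumption: name plus JSON-encoded arguments)
simulated_call = {
    "name": "search_coffee_docs",
    "arguments": json.dumps({"query": "Where can I buy Ruiru 11?"}),
}
print(route_tool_call(simulated_call, ROUTES))
```

In a real setup the schemas from `build_tool_schemas` would be sent with the chat request, and the tool call returned by the model would replace `simulated_call`.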

## 3. Semantic Router

In [6]:
# use either semantic_router library or create a custom Route class from the one below

emb_model = "sentence-transformers/all-MiniLM-L6-v2"

class Route:
    def __init__(self, name, utterances, embedding_model_name=emb_model):
        self.name = name
        self.utterances = utterances
        self.embedding_model_name = embedding_model_name
        self.tokenizer = AutoTokenizer.from_pretrained(embedding_model_name)
        self.model = AutoModel.from_pretrained(embedding_model_name)
        self.embeddings = self._embed_utterances(utterances)

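    def _embed_utterances(self, utterances):
        # tokenize utterances (padded to the longest one in the batch)
        tokens = self.tokenizer(utterances, padding=True, truncation=True, return_tensors="pt")
        # get embeddings: mean-pool token vectors, ignoring padding positions
        with torch.no_grad():
            hidden = self.model(**tokens).last_hidden_state
        mask = tokens["attention_mask"].unsqueeze(-1).float()
        embeddings = ((hidden * mask).sum(dim=1) / mask.sum(dim=1)).numpy()
        return embeddings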

def embed_query(query, embedding_model_name='sentence-transformers/all-MiniLM-L6-v2'):
    tokenizer = AutoTokenizer.from_pretrained(embedding_model_name)
    model = AutoModel.from_pretrained(embedding_model_name)
    tokens = tokenizer(query, return_tensors="pt")
    with torch.no_grad():
        embedding = model(**tokens).last_hidden_state.mean(dim=1).numpy()
    return embedding

def find_best_route(query, routes):
    query_embedding = embed_query(query)
    best_match_route = None
    highest_similarity = -1
    
    for route in routes:
        similarities = cosine_similarity(query_embedding, route.embeddings).flatten()
        max_similarity = np.max(similarities)
        
        if max_similarity > highest_similarity:
            highest_similarity = max_similarity
            best_match_route = route
            
    return best_match_route

# example routing
fishing = Route(
    name="fishing",
    utterances=[
        "What's the best bait for catching bass?",
        "Do you prefer freshwater or saltwater fishing?",
        "What's your favorite fishing spot?",
        "Have you ever caught a really big fish?",
        "Any tips for a beginner fisherman?",
    ],
)

hunting = Route(
    name="hunting",
    utterances=[
        "What's the best time of year for deer hunting?",
        "Do you use a bow or a rifle?",
        "What's your most memorable hunting trip?",
        "How do you track game in the wild?",
        "Any tips for staying safe while hunting?",
        "Ducks hunting tips"
    ],
)

camping = Route(
    name="camping",
    utterances=[
        "What's your favorite camping spot?",
        "Do you prefer tents or RVs for camping?",
        "How do you make a campfire?",
        "What's your go-to camping meal?",
        "Any tips for a first-time camper?",
    ],
)

routes = [fishing, hunting, camping]

query = "I am looking for a sea near-shore location for hunting ducks"
best_route = find_best_route(query, routes)
print(f"THE BEST ROUTE: {best_route.name}")

THE BEST ROUTE: hunting
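
A router that always picks the closest route will also assign completely unrelated queries to some route. One common guard is a similarity threshold with a fallback route. The sketch below shows the idea in pure Python with hand-made 3-dimensional toy vectors standing in for real embeddings. The `threshold` value, the `"default"` route name and the vectors are assumptions for illustration, and a real threshold would need tuning per embedding model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def find_route_with_threshold(query_vec, route_vecs, threshold=0.5, default="default"):
    """Return (route_name, score); fall back to `default` below the threshold."""
    best_name, best_score = default, -1.0
    for name, vectors in route_vecs.items():
        # a route's score is its best-matching example utterance
        score = max(cosine(query_vec, v) for v in vectors)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return default, best_score
    return best_name, best_score


# Toy vectors standing in for utterance embeddings (assumption)
toy_routes = {
    "fishing": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
    "hunting": [[0.1, 1.0, 0.0], [0.2, 0.9, 0.1]],
}

print(find_route_with_threshold([0.15, 0.95, 0.05], toy_routes)[0])  # hunting
print(find_route_with_threshold([0.0, 0.0, 1.0], toy_routes)[0])     # default
```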


## 4. Zero Shot Classification Router

The implementation can be found on the Haystack GitHub [here](https://github.com/deepset-ai/haystack/blob/main/haystack/components/routers/zero_shot_text_router.py#L130) 🙃
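
For a quick local experiment, the same idea can be sketched with the Hugging Face `zero-shot-classification` pipeline, which returns candidate labels sorted by score. This is not the Haystack implementation linked above. The model name `facebook/bart-large-mnli`, the `min_score` cut-off and the `default` label are assumptions, and `stub_classifier` is a hypothetical stand-in that mimics the pipeline's output shape so the routing logic runs without downloading a model.

```python
def load_zero_shot_classifier(model_name="facebook/bart-large-mnli"):
    """Load a Hugging Face zero-shot pipeline (downloads the model on first use)."""
    from transformers import pipeline  # lazy import: only needed for the real model
    return pipeline("zero-shot-classification", model=model_name)


def zero_shot_route(query, labels, classifier, min_score=0.0, default="other"):
    """Return the top-scoring label, or `default` if its score is below min_score."""
    result = classifier(query, candidate_labels=labels)
    top_label, top_score = result["labels"][0], result["scores"][0]
    return top_label if top_score >= min_score else default


def stub_classifier(query, candidate_labels):
    """Hypothetical stand-in with the pipeline's output shape (labels sorted by score)."""
    return {"sequence": query, "labels": ["coffee", "tea"], "scores": [0.9, 0.1]}


print(zero_shot_route("How do I brew an espresso?", ["coffee", "tea"], stub_classifier))
```

To use a real model, pass `load_zero_shot_classifier()` in place of `stub_classifier`.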

## 5. Language Classification Router

In practice, there are two ways to set up routing based on the query language.

- **Option 1**: Use an external service for language detection (e.g. Azure Speech)
- **Option 2**: Do the language identification and routing via prompt engineering (example below)

In [7]:
# design prompt

prompt = PromptTemplate(
    template="""

    You are a brilliant assistant who's exceptional in language identification tasks.
    Your main task is to identify the language of the user's query below and respond using one of the ISO 639 language codes.

    Do not respond with more than one word.

    <ISO codes>
    {iso_codes}
    </ISO codes>

    <user query>
    {user_query}
    </user query>

    Language:
    """,
    input_variables=["iso_codes", "user_query"],
)

iso_639_languages = {
    "English": "en",
    "Mandarin Chinese": "zh",
    "Hindi": "hi",
    "Spanish": "es",
    "French": "fr",
    "German": "de",
    "Standard Arabic": "ar",
    "Bengali": "bn",
    "Portuguese": "pt",
    "Russian": "ru",
    "Japanese": "ja"
}

In [8]:
# query

query = "Was macht man am Freitag Abend in Berlin?" # German

In [9]:
# language router

llama = ChatOllama(model="llama3", temperature=0)

completion_route_chain = prompt | llama | StrOutputParser()

input_data = {
    "iso_codes": iso_639_languages,
    "user_query": query   
}

route = completion_route_chain.invoke(input=input_data)
print(f"ROUTE: {route}")

ROUTE: de
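
Before the detected code selects a route, it is worth validating it, since an LLM can return extra words or an unsupported code. A minimal sketch, assuming the supported codes are the ones from `iso_639_languages` and that English is the fallback, is shown below. The retriever names in `LANGUAGE_ROUTES` are hypothetical.

```python
import re

# Codes from the ISO list passed to the prompt
SUPPORTED_CODES = {"en", "zh", "hi", "es", "fr", "de", "ar", "bn", "pt", "ru", "ja"}


def parse_language_code(raw_output, supported=SUPPORTED_CODES, default="en"):
    """Extract the first supported two-letter code from raw LLM output."""
    for candidate in re.findall(r"\b[a-z]{2}\b", raw_output.lower()):
        if candidate in supported:
            return candidate
    return default


# Hypothetical per-language handlers
LANGUAGE_ROUTES = {"de": "german_retriever", "en": "english_retriever"}


def route_by_language(raw_output):
    """Pick the handler for the detected language; unknown languages use English."""
    code = parse_language_code(raw_output)
    return LANGUAGE_ROUTES.get(code, LANGUAGE_ROUTES["en"])


print(route_by_language(" de\n"))          # german_retriever
print(route_by_language("Language: DE."))  # german_retriever
print(route_by_language("unknown"))        # english_retriever
```

Short English words such as "hi" also look like language codes, so this parser is only safe when the prompt keeps the completion to a single token.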


## 6. Keyword Router

A keyword router selects a route by matching **keywords** between the **user's query** and each route's **keyword list**. In some use cases, a couple of keywords are enough to route the query to a specific module or handler.

Why make extra LLM calls when a simple keyword match saves **latency** and **cost**?

In [10]:
# OPTION 1: simple keyword router

class KeywordRouter:
    def __init__(self, routes):
        self.routes = routes

    def find_keyword_route(self, query):
        query_lower = query.lower()
        for route, keywords in self.routes.items():
            if any(keyword in query_lower for keyword in keywords):
                return route
        return "default"

# define routes --> better keyword lists = better routing
routes = {
    "web": ["html", "css", "javascript", "web", "website", "frontend", "backend"],
    "blockchain": ["blockchain", "crypto", "bitcoin", "ethereum", "smart contract", "decentralized"],
    "opensource": ["open-source", "open source", "github", "git", "contribution", "license"],
}

user_query = "How to be a frontend developer?"

keyword_router = KeywordRouter(routes=routes)

route = keyword_router.find_keyword_route(query=user_query)
print(f"ROUTE: {route}")

ROUTE: web
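
One caveat of the substring check above is that short keywords also match inside longer words: `"git"` is a substring of `"digital"`, so a question about digital photos would be routed to `opensource`. A possible variant, sketched below under the assumption that whole-word matching is what is wanted, compiles one regular expression per route with word boundaries. The class name `WordBoundaryKeywordRouter` and the `demo_routes` variable are introduced only for this example.

```python
import re


class WordBoundaryKeywordRouter:
    def __init__(self, routes):
        # one precompiled pattern per route; \b prevents matches inside longer words
        self.patterns = {
            route: re.compile(
                r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
                re.IGNORECASE,
            )
            for route, keywords in routes.items()
        }

    def find_keyword_route(self, query, default="default"):
        # routes are checked in insertion order; the first match wins
        for route, pattern in self.patterns.items():
            if pattern.search(query):
                return route
        return default


demo_routes = {
    "web": ["html", "css", "javascript", "web", "website", "frontend", "backend"],
    "blockchain": ["blockchain", "crypto", "bitcoin", "ethereum", "smart contract", "decentralized"],
    "opensource": ["open-source", "open source", "github", "git", "contribution", "license"],
}

router = WordBoundaryKeywordRouter(demo_routes)
print(router.find_keyword_route("How to be a frontend developer?"))  # web
print(router.find_keyword_route("How do I store digital photos?"))   # default
```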


In [11]:
# OPTION 2: keyword router w/ retrieval --> 1st step is to create different retrievers (simulation of multiple routes --> web/blockchain/opensource)

emb_model = SentenceTransformerEmbeddings(model_name="thenlper/gte-large")

data_path = "../data/rag-routing"
pdf_files = [f for f in os.listdir(data_path) if f.endswith('.pdf')]
data = [PyPDFLoader(os.path.join(data_path, file)).load() for file in pdf_files]
docs_list = [item for sublist in data for item in sublist]
text_splitter = NLTKTextSplitter()
doc_chunks = text_splitter.split_documents(docs_list)

print("TOTAL NO. OF CHUNKS: ", len(doc_chunks))

TOTAL NO. OF CHUNKS:  99


In [12]:
# 1st retriever representing the first route

chroma_db = Chroma.from_documents(documents=doc_chunks, embedding=emb_model)
retriever = chroma_db.as_retriever(search_type="mmr")

In [13]:
# 2nd retriever representing the second route



## 7. Logical Router

A logical router takes conditions that you specify and routes your data through different paths down the pipeline.

Example conditions:
- Query length (number of words)
- Number of numeric values in the query
- Special characters in the query (e.g. ?!%$&)
- Number of occurrences of a specific word (e.g. "Hello")
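
Conditions like these can also be combined into a single router by evaluating them as an ordered list of rules, where the first matching rule decides the route. The sketch below assumes this first-match-wins ordering. The route names (`ask_for_more_detail`, `numeric_range_lookup`, and so on) are hypothetical, and no retriever is involved.

```python
import re


def has_min_words(query, n=3):
    """True if the query has at least n whitespace-separated words."""
    return len(query.split()) >= n


def count_numbers(query):
    """Number of digit sequences in the query."""
    return len(re.findall(r"\d+", query))


# Ordered (predicate, route) pairs; earlier rules take priority
RULES = [
    (lambda q: not has_min_words(q), "ask_for_more_detail"),
    (lambda q: count_numbers(q) >= 2, "numeric_range_lookup"),
    (lambda q: "?" in q, "question_answering"),
]


def logical_route(query, rules=RULES, default="general_retrieval"):
    """Return the route of the first rule whose predicate matches."""
    for predicate, route in rules:
        if predicate(query):
            return route
    return default


print(logical_route("Hi there"))                                           # ask_for_more_detail
print(logical_route("What was the bitcoin price between 2020 and 2024?"))  # numeric_range_lookup
```

Because no language model is involved, every decision can be traced back to a single rule.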

In [28]:
# query length

def query_length_router(query:str):
    if len(query.split()) > 2:
        response = retriever.invoke(input=query)
        output = {
            "sufficient_length": query,
            "response": response
        }
        return output
    else:
        return {"insufficient_length": "Your query is too short! Please explain it."}
    

# number of specific values from the query
def query_value_count_router(query: str):
    values = re.findall(r'\d+', query)  # find all digit sequences
    if len(values) >= 2:  # check if there are at least 2 numbers
        response = retriever.invoke(input=query)
        return {
            "sufficient_values": query,
            "response": response
        }
    else:
        return {"insufficient_values": "Please include more numerical values in your query."}

# special characters
def special_characters_router(query: str):
    if any(char in set('?!%$&') for char in query):  # check for special characters
        response = retriever.invoke(input=query)
        return {
            "contains_special_chars": query,
            "response": response
        }
    else:
        return {"no_special_chars": "Please include special characters for emphasis or clarification."}


# specific words
def specific_word_count_router(query: str, word: str = "bitcoin"):
    count = len(re.findall(rf"\b{re.escape(word.lower())}\b", query.lower()))  # whole-word matches, even next to punctuation
    if count > 0:
        response = retriever.invoke(input=query)
        return {
            "word_count_sufficient": query,
            "response": response
        }
    else:
        return {"word_count_insufficient": f"The word '{word}' does not appear in your query."}


user_query = "What was the bitcoin price between 2020 and 2024?"

response_query_length = query_length_router(user_query)
response_specific_values = query_value_count_router(user_query)
response_special_characters = special_characters_router(user_query)
response_specific_word_count = specific_word_count_router(user_query, "bitcoin")

print("Response for Query Length:", response_query_length)
print("Response for Specific Values:", response_specific_values)
print("Response for Special Characters:", response_special_characters)
print("Response for Specific Word Count:", response_specific_word_count)

Response for Query Length: {'sufficient_length': 'What was the bitcoin price between 2020 and 2024?', 'response': [Document(page_content='21 \nSutardja Center for Entrepreneurship & Technology Technical Report \n \nFigure 7.\n\nBitcoin price in 2015.\n\n9\nThis enthusiasm may be because of the large quantities of capital being injected into \nthe digital infrastructure.\n\nExcitement grows as Bitcoin and blockchain firms have \nreceived a record US$1 Billion in investments as the year comes to an end.\n\nAmerican \nExpress, Bain Capital, Deloitte, Goldman Sachs, MasterCard, the New York Life \nInsurance Company, the New York Stock Exchange -- all of them have poured \nmillions of dollars into Bitcoin firms recently.\n\nCorporate funding into Bitcoin & Blockchain infrastructure is growing and \ngenerating interest in several segments.\n\nNasdaq is tapping blockchain technology to \ncreate a more secure, efficient system to trade stocks.\n\nDocuSign, a company that \nspecializes in elect