# RAG System with Smart Routing

This notebook implements a Retrieval-Augmented Generation (RAG) system with the following components:

1. **Document Loading**: Loads three posts on AI/ML topics from Lilian Weng's blog
2. **Vector Storage**: Stores the split documents in an SKLearn vector store with Nomic embeddings
3. **Smart Router**: Decides, for each question, whether to use:
   - Vector store: For questions about agents, prompt engineering, and adversarial attacks
   - Web search: For current events and topics not in the knowledge base

The system demonstrates how to intelligently route questions to the appropriate data source.
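
As a rough illustration of the flow described above, the sketch below sends a question to whichever handler the router names. The names `answer_source`, `toy_route`, and the two lambda handlers are hypothetical stand-ins introduced only for this example; in the notebook the decision would come from the LLM router and the handlers would call the retriever or a web search tool.

```python
from typing import Callable, Dict


def answer_source(
    question: str,
    route: Callable[[str], Dict[str, str]],
    sources: Dict[str, Callable[[str], str]],
    default: str = "websearch",
) -> str:
    """Send the question to the handler named by the router's decision."""
    decision = route(question).get("datasource", default)
    # Unknown decisions fall back to the default handler
    handler = sources.get(decision, sources[default])
    return handler(question)


# Hypothetical keyword router standing in for the LLM-based router
def toy_route(question: str) -> Dict[str, str]:
    topics = ("agent", "prompt", "adversarial")
    hit = any(topic in question.lower() for topic in topics)
    return {"datasource": "vectorstore" if hit else "websearch"}


# Hypothetical handlers standing in for the retriever and the web search tool
sources = {
    "vectorstore": lambda q: f"[vectorstore] {q}",
    "websearch": lambda q: f"[websearch] {q}",
}

print(answer_source("What are the types of agent memory?", toy_route, sources))
```

Keeping the dispatch separate from the routing decision means either side can be swapped (a different router, an extra data source) without touching the other.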

In [None]:
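from langchain_openai import ChatOpenAI

# General-purpose chat model served by a local LM Studio instance
# (OpenAI-compatible API). The key is a placeholder; the OpenAI client
# only requires that one is set.
llm = ChatOpenAI(
    api_key="lm-studio",
    base_url="http://localhost:1234/v1",
    model="gemma-3-4b-it",
    temperature=0.7,
    max_tokens=4096,
)

# JSON-mode model used by the router.
# `format="json"` is an Ollama-style option, not a ChatOpenAI parameter; the
# OpenAI-compatible equivalent is `response_format`.
# Assumption: the local server accepts {"type": "json_object"}; some servers
# only accept a "json_schema" response format.
llm_json_mode = ChatOpenAI(
    api_key="lm-studio",
    base_url="http://localhost:1234/v1",
    model="gemma-3-4b-it",
    temperature=0,  # keep routing decisions repeatable
    max_tokens=4096,
    model_kwargs={"response_format": {"type": "json_object"}},
)
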

OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

In [4]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import SKLearnVectorStore
from langchain_nomic.embeddings import NomicEmbeddings

urls = [
    "https://lilianweng.github.io/posts/2023-06-23-agent/",
    "https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/",
    "https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/",
]

# Load documents
docs = [WebBaseLoader(url).load() for url in urls]
docs_list = [item for sublist in docs for item in sublist]

# Split documents
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=1000, chunk_overlap=200
)
doc_splits = text_splitter.split_documents(docs_list)

# Add to vectorDB
vectorstore = SKLearnVectorStore.from_documents(
    documents=doc_splits,
    embedding=NomicEmbeddings(model="nomic-embed-text-v1.5", inference_mode="local"),
)

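# Create a retriever that returns the top 3 chunks.
# search_kwargs is the documented way to set k for the underlying search.
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})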

USER_AGENT environment variable not set, consider setting it to identify your requests.
Failed to load libllamamodel-mainline-cuda-avxonly.so: dlopen: libcudart.so.11.0: cannot open shared object file: No such file or directory
Failed to load libllamamodel-mainline-cuda.so: dlopen: libcudart.so.11.0: cannot open shared object file: No such file or directory
Embedding texts: 100%|██████████| 47/47 [01:03<00:00,  1.35s/inputs]
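
To make the retrieval step less opaque, here is a minimal sketch of what a top-k retriever does conceptually: it scores every stored chunk embedding against the query embedding and keeps the `k` best. The functions `cosine` and `top_k` and the two-dimensional toy vectors are assumptions made for illustration; the actual distance metric and search code inside `SKLearnVectorStore` may differ, and real Nomic embeddings have far more dimensions.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (chunk index, similarity) for the k most similar chunks."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(chunk_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 2-D "embeddings" for four chunks (illustrative values only)
chunk_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
ranked = top_k([1.0, 0.0], chunk_vecs, k=3)
print([index for index, _ in ranked])  # [0, 1, 3]
```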


## Router Implementation

The router decides between two data sources:
1. **Vector Store**: For questions about content in our knowledge base
2. **Web Search**: For current events and topics outside our knowledge base

The router uses the LLM in JSON mode to return a structured decision.
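
Because the decision arrives as model-generated text, it can help to validate it before acting on it. The sketch below is one possible way to do that, assuming the reply is expected to be a JSON object with a `datasource` key set to `"websearch"` or `"vectorstore"`. The helper name `parse_route` is hypothetical and not part of the notebook.

```python
import json

VALID_SOURCES = ("websearch", "vectorstore")


def parse_route(raw: str, default: str = "websearch") -> str:
    """Extract a valid datasource from the model's reply, else return the default."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return default  # reply was not valid JSON
    if not isinstance(data, dict):
        return default  # valid JSON, but not an object
    choice = data.get("datasource")
    return choice if choice in VALID_SOURCES else default


print(parse_route('{"datasource": "vectorstore"}'))  # vectorstore
print(parse_route("not json at all"))                # websearch
print(parse_route('{"datasource": "database"}'))     # websearch
```

Falling back to web search mirrors the default used in the routing rules; a stricter variant could raise an error instead so that malformed replies are not mistaken for real decisions.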

In [None]:
import json
import traceback
from dataclasses import dataclass
from typing import Dict, Literal, Tuple

from langchain_core.messages import HumanMessage, SystemMessage


@dataclass
class RouterConfig:
    # Tuple default: immutable, so it is safe as a dataclass field default
    vectorstore_topics: Tuple[str, ...] = ("agents", "prompt engineering", "adversarial attacks")
    system_prompt: str = """You are an expert at routing questions to the appropriate data source.
    
    RULES:
    1. The vectorstore contains documents about: {topics}
    2. Use vectorstore ONLY for questions about these topics
    3. Use websearch for everything else, especially current events
    
    Return a JSON object with format: {{"datasource": "websearch"}} or {{"datasource": "vectorstore"}}
    """

    def get_prompt(self) -> str:
        # Doubled braces in the template come out of .format() as literal JSON braces
        return self.system_prompt.format(topics=", ".join(self.vectorstore_topics))


class QuestionRouter:
    def __init__(self, llm):
        self.llm = llm
        self.config = RouterConfig()

    def route(self, question: str) -> Dict[str, Literal["websearch", "vectorstore"]]:
        """Route a single question to the appropriate data source."""
        try:
            print(f"Sending request to LM Studio: {question}")
            response = self.llm.invoke([
                SystemMessage(content=self.config.get_prompt()),
                HumanMessage(content=question),
            ])
            print(f"Response received: {response.content}")
            return json.loads(response.content)
        except Exception as e:
            # Any failure (connection error, invalid JSON, ...) falls back to web search
            print(f"Warning: Routing failed - {e}")
            print("Full traceback:", traceback.format_exc())
            return {"datasource": "websearch"}

# Create router instance
router = QuestionRouter(llm_json_mode)

# Test cases
test_questions = [
    "Who is favored to win the NFC Championship game in the 2024 season?",  # Should be websearch
    "What are the models released today for llama3.2?",                     # Should be websearch
    "What are the types of agent memory?"                                   # Should be vectorstore
]

# Run tests and display results
print("Router Test Results:")
print("-" * 50)
for question in test_questions:
    result = router.route(question)
    print(f"Question: {question}")
    print(f"Routed to: {result['datasource']}\n")

Router Test Results:
--------------------------------------------------
Question: Who is favored to win the NFC Championship game in the 2024 season?
Routed to: websearch

Question: What are the models released today for llama3.2?
Routed to: websearch

Question: What are the types of agent memory?
Routed to: websearch
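
In the output above, the third question is routed to `websearch` even though the comment in the test list expects `vectorstore`. Since `route` also returns `websearch` whenever an exception occurs, a silent failure looks the same as a genuine decision. The sketch below shows one way to make such mismatches explicit by checking each result against its expected route. `evaluate_router` and the stub `always_websearch` are hypothetical helpers for illustration; with the real router, `router.route` would be passed instead of the stub.

```python
from typing import Callable, Dict, List, Tuple


def evaluate_router(
    route: Callable[[str], Dict[str, str]],
    cases: List[Tuple[str, str]],
) -> List[Tuple[str, str, str]]:
    """Return (question, expected, actual) for every case the router gets wrong."""
    mismatches = []
    for question, expected in cases:
        actual = route(question).get("datasource", "<missing>")
        if actual != expected:
            mismatches.append((question, expected, actual))
    return mismatches


# Expected routes taken from the comments in the test list
cases = [
    ("Who is favored to win the NFC Championship game in the 2024 season?", "websearch"),
    ("What are the models released today for llama3.2?", "websearch"),
    ("What are the types of agent memory?", "vectorstore"),
]


# Stub that always answers "websearch", reproducing the behavior seen in the output
def always_websearch(question: str) -> Dict[str, str]:
    return {"datasource": "websearch"}


for question, expected, actual in evaluate_router(always_websearch, cases):
    print(f"MISMATCH: {question!r} expected {expected}, got {actual}")
```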

