### **Implementing Hybrid Routing (Embedding-Based + Logical & Semantic Routing)**
We'll implement a **hybrid routing system** that first uses **embedding-based similarity search** to find the most relevant knowledge source, and then applies **logical & semantic routing** using an LLM to refine the selection.


### **Install Required Libraries**
Ensure you have the necessary libraries installed:

In [None]:
!pip install langchain langchain-community langchain-openai langchain-huggingface faiss-cpu sentence-transformers python-dotenv

In [None]:
# Load Environment Variables

import os
from dotenv import load_dotenv

load_dotenv()
HF_TOKEN = os.getenv("HF_TOKEN")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

### **Define Your Knowledge Sources**
We assume we have three document sources:  
- **Python Docs**  
- **JavaScript Docs**  
- **Golang Docs**  

Each source is assumed to be a **FAISS index** that was already built with the same embedding model and saved under `faiss_index/`.

In [None]:
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings

In [None]:
# Load embedding model
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")


In [None]:
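# Load vector databases for each knowledge source.
# Recent LangChain versions require allow_dangerous_deserialization=True because
# the saved index includes a pickle file; only enable it for indexes you built yourself.
vector_stores = {
    "python_docs": FAISS.load_local("faiss_index/python", embedding_model, allow_dangerous_deserialization=True),
    "js_docs": FAISS.load_local("faiss_index/javascript", embedding_model, allow_dangerous_deserialization=True),
    "golang_docs": FAISS.load_local("faiss_index/golang", embedding_model, allow_dangerous_deserialization=True),
}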

### **Step 1: Embedding-Based Routing**
This step **converts the query into an embedding** and finds the **closest** knowledge source.

In [None]:
def embedding_based_routing(query: str):
    """Find the most relevant knowledge source based on embedding similarity."""
    best_distance = float("inf")
    best_source = None

    # Compute the query embedding once and reuse it for every source
    query_embedding = embedding_model.embed_query(query)

    for source, db in vector_stores.items():
        # Returns (Document, score) pairs; with the default FAISS index the
        # score is an L2 distance, so lower means more similar
        results = db.similarity_search_with_score_by_vector(query_embedding, k=1)
        if not results:
            continue  # Skip empty indexes

        _, distance = results[0]
        if distance < best_distance:
            best_distance = distance
            best_source = source

    return best_source

**What the above code does:**  
- Converts the **query** into an embedding  
- Finds the **most similar document** from each **vector database**  
- Picks the source whose top document is **closest** to the query  
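
The selection logic in the list above can be tried without any model or index. The following is a minimal sketch that uses hand-written 3-dimensional vectors and cosine similarity (higher means closer) in place of real embeddings; `cosine_similarity`, `route_by_similarity`, and the toy vectors are hypothetical names and data introduced for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def route_by_similarity(query_vec, source_vecs):
    """Return (best_source, scores); scores maps each source to its best similarity."""
    scores = {
        source: max(cosine_similarity(query_vec, vec) for vec in vectors)
        for source, vectors in source_vecs.items()
    }
    best_source = max(scores, key=scores.get)
    return best_source, scores


# Hypothetical toy "embeddings"; real models produce hundreds of dimensions.
toy_sources = {
    "python_docs": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "js_docs": [[0.1, 0.9, 0.0]],
    "golang_docs": [[0.0, 0.1, 0.9]],
}
toy_query = [0.85, 0.15, 0.05]

best, scores = route_by_similarity(toy_query, toy_sources)
print(best, scores)
```

Keeping the per-source scores alongside the winner makes it easy to see how close the runner-up was, which is useful when deciding how much to trust the embedding suggestion.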

---

### **Step 2: Logical & Semantic Routing with an LLM**
Once we get the **best source** from the embeddings, we ask an **LLM** to route the same query on its own, which gives a second opinion to **validate the selection**.

In [None]:
from langchain_huggingface import HuggingFaceEndpoint
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field
from typing import Literal

In [None]:
# Define routing model output
class RouteQuery(BaseModel):
    """Route a user query to the most relevant datasource."""
    datasource: Literal["python_docs", "js_docs", "golang_docs"] = Field(...)


In [None]:
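# Load LLM (Mistral-7B here; an OpenAI chat model would also work)
from langchain_huggingface import ChatHuggingFace

llm = HuggingFaceEndpoint(
    repo_id="mistralai/Mistral-7B-Instruct-v0.2",
    huggingfacehub_api_token=HF_TOKEN,
    temperature=0.1,
    max_new_tokens=512,  # Generation limit; HuggingFaceEndpoint has no max_length field
)

# with_structured_output is a chat-model feature, so wrap the endpoint first.
# Assumption: the hosted endpoint supports tool calling for this model.
chat_llm = ChatHuggingFace(llm=llm)
structured_llm = chat_llm.with_structured_output(RouteQuery)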


In [None]:
# Define LLM routing prompt
system_prompt = """You are an expert at routing a user question to the appropriate data source.

Based on the programming language the question is referring to, choose the best source:
- "python_docs" for Python-related questions.
- "js_docs" for JavaScript-related questions.
- "golang_docs" for Golang-related questions.

If unsure, choose the closest match."""

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", "{question}")
])

# Define router
router = prompt | structured_llm

**What This Does:**  
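- Uses an **LLM** to route the query based on what the question actually says  
- If embedding-based routing picks **Python** but the LLM picks **JavaScript**, the LLM's choice overrides it in Step 3  
- Can catch cases where the nearest embedding match is misleading, for example a question that names its language explicitly  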
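
Structured output from smaller open models is not always reliable, so it can help to have a fallback that extracts a valid label from free-form text. The following is a minimal sketch under that assumption; `parse_route` and `ALLOWED_SOURCES` are hypothetical helpers, not part of LangChain.

```python
import re

ALLOWED_SOURCES = ("python_docs", "js_docs", "golang_docs")


def parse_route(raw_output, allowed=ALLOWED_SOURCES, default=None):
    """Return the allowed label that appears earliest in the text, or default if none does."""
    text = raw_output.lower()
    hits = []
    for label in allowed:
        # Word boundaries keep a label from matching inside a longer identifier
        match = re.search(r"\b" + re.escape(label) + r"\b", text)
        if match:
            hits.append((match.start(), label))
    if not hits:
        return default
    return min(hits)[1]


print(parse_route('{"datasource": "js_docs"}'))
print(parse_route("I would pick golang_docs, not python_docs."))
print(parse_route("no idea", default="python_docs"))
```

Taking the earliest mention is a simple heuristic; a stricter variant could return the default whenever more than one label appears.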

---

### **Step 3: Hybrid Routing Logic**
Now, we **combine both methods** into a single routing decision.

In [None]:
def hybrid_routing(query: str):
    """Combines embedding-based and LLM-based routing."""
    
    # Step 1: Use embedding-based routing
    embedding_source = embedding_based_routing(query)
    print(f"🔍 Embedding-based suggestion: {embedding_source}")

    # Step 2: Validate with LLM-based routing
    llm_source = router.invoke({"question": query}).datasource
    print(f"🤖 LLM-based suggestion: {llm_source}")

    # Step 3: Final decision. The LLM wins any disagreement, so its choice is
    # always the result; the embedding suggestion is logged for comparison only.
    final_source = llm_source
    print(f"✅ Final decision: {final_source}")

    return final_source
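
The rule above always defers to the LLM. One possible alternative, sketched here under assumptions rather than taken from this notebook, is to let the embeddings overrule the LLM only when their best source is clearly ahead of the runner-up. `resolve_route` and its `margin` threshold are hypothetical, and the distances are assumed to be "lower is closer" values, one per source.

```python
def resolve_route(distances, llm_source, margin=0.1):
    """Pick a final source from per-source distances and the LLM's choice.

    distances: dict mapping source name -> distance of its best match (lower is closer).
    llm_source: label chosen by the LLM, or None if the LLM call failed.
    margin: hypothetical threshold; the lead the closest source needs over the
            runner-up before the embeddings overrule the LLM.
    """
    ranked = sorted(distances, key=distances.get)
    embedding_source = ranked[0]

    # Fall back to embeddings when the LLM gave no usable label
    if llm_source is None or llm_source not in distances:
        return embedding_source
    if llm_source == embedding_source:
        return embedding_source

    # The two methods disagree, so at least two sources exist
    gap = distances[ranked[1]] - distances[ranked[0]]
    return embedding_source if gap >= margin else llm_source


clear_win = {"python_docs": 0.30, "js_docs": 0.90, "golang_docs": 1.10}
close_call = {"python_docs": 0.50, "js_docs": 0.52, "golang_docs": 1.10}

print(resolve_route(clear_win, "js_docs"))
print(resolve_route(close_call, "js_docs"))
```

The margin would need tuning against real queries; the value used here is only a placeholder.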

### **Step 4: Retrieve & Generate Answer**
Once we **route** the query, we retrieve **relevant documents** and use an **LLM** to generate a response.

In [None]:
def retrieve_and_answer(query: str):
    """Retrieve relevant documents and generate an answer."""
    
    # Step 1: Determine knowledge source
    best_source = hybrid_routing(query)

    # Step 2: Retrieve documents
    docs = vector_stores[best_source].similarity_search(query, k=3)
    
    # Step 3: Generate answer
    context = "\n\n".join([doc.page_content for doc in docs])
    
    answer_prompt = ChatPromptTemplate.from_template(
        "Answer the question based on this context:\n\n{context}\n\nQuestion: {question}"
    )
    answer_chain = answer_prompt | llm

    answer = answer_chain.invoke({"context": context, "question": query})
    
    return answer
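
Joining every retrieved chunk can produce a prompt longer than the model accepts. The following is a minimal sketch of a budgeted join, assuming a character count is an acceptable rough proxy for tokens; `build_context` and `max_chars` are hypothetical names, and the chunks would be the `page_content` strings of the retrieved documents.

```python
def build_context(chunks, max_chars=2000, separator="\n\n"):
    """Join chunks in order, stopping before the character budget would be exceeded."""
    selected = []
    used = 0
    for chunk in chunks:
        # Count the separator only when a chunk has already been added
        extra = len(chunk) + (len(separator) if selected else 0)
        if used + extra > max_chars:
            break
        selected.append(chunk)
        used += extra
    return separator.join(selected)


sample_chunks = ["a" * 10, "b" * 10, "c" * 10]
print(repr(build_context(sample_chunks, max_chars=25)))
```

Because retrieval returns the closest chunks first, stopping early drops the least relevant material.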

### **Test the System**
Now, let's **test** the hybrid routing and answer generation.

In [None]:
query = "How do I create a dictionary in Python?"
response = retrieve_and_answer(query)

print("\n💡 Final Answer:\n", response)

### **Summary of What We Built**
1. **Embedding-Based Routing**  
   - Uses **vector similarity** to find the best **knowledge source**.  

2. **Logical & Semantic Routing**  
   - Uses an **LLM** to **validate and refine** the decision.  

3. **Hybrid Routing**  
   - Uses **both** methods for more **accurate** and **robust** retrieval.  

4. **Final Answer Generation**  
   - Retrieves **relevant documents** and generates an answer using an LLM.  