In [None]:
!pip install -U langchain langgraph langchain-community faiss-cpu openai transformers accelerate sentence-transformers langchain_openai

Collecting langgraph
  Using cached langgraph-0.4.5-py3-none-any.whl.metadata (7.3 kB)
Collecting langchain-community
  Using cached langchain_community-0.3.24-py3-none-any.whl.metadata (2.5 kB)
Collecting faiss-cpu
  Using cached faiss_cpu-1.11.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (4.8 kB)
Collecting openai
  Using cached openai-1.79.0-py3-none-any.whl.metadata (25 kB)
Collecting accelerate
  Using cached accelerate-1.7.0-py3-none-any.whl.metadata (19 kB)
Collecting langchain_openai
  Using cached langchain_openai-0.3.17-py3-none-any.whl.metadata (2.3 kB)
Collecting langgraph-checkpoint<3.0.0,>=2.0.26 (from langgraph)
  Using cached langgraph_checkpoint-2.0.26-py3-none-any.whl.metadata (4.6 kB)
Collecting langgraph-prebuilt>=0.1.8 (from langgraph)
  Using cached langgraph_prebuilt-0.1.8-py3-none-any.whl.metadata (5.0 kB)
Collecting langgraph-sdk>=0.1.42 (from langgraph)
  Using cached langgraph_sdk-0.1.69-py3-none-any.whl.metadata (1.8 kB)
Collecting dataclasses-json<0.7,>

### *Building a RAG Bot for Specialized Question-Answering*

#### *Problem Statement:*

You are tasked with creating a *Retrieval-Augmented Generation (RAG) Bot* that can answer questions based on a set of predefined data dumps. Your system will leverage *specialized agents* to handle different types of questions, retrieve relevant data from the appropriate sources, and generate accurate answers.

### *Data Dump:*

You are provided with two imaginary data dumps, created by us, in the following categories:

1. *Car Data* (information about car models, brands, specifications, etc.)  \[Artificially curated\] \[Structured data\]  
2. *Country Data* (information about countries, capitals, population, language, etc.) \[Artificially curated\]\[Unstructured data\]

### *Task:*

You can complete this task in a Jupyter notebook.

* *Data Chunking & Storage*: You will need to chunk and organize the provided data dumps into structured formats that can be easily retrieved by your bot.  
* *Specialized Agents*: Create specialized *RAG agents* that can:  
  * Retrieve information from the *Car Data* dump when asked questions related to cars.  
  * Retrieve information from the *Country Data* dump when asked about countries.  
  * Retrieve or compute mathematical answers.  
* *Query Handling*: When a user asks a question, your system should route the query to the appropriate agent based on the nature of the question:  
  * *If the query is related to cars*: The Car Agent should fetch relevant car-related data and provide an answer.  
  * *If the query is related to countries*: The Country Agent should fetch relevant country-related data.  
  * *If the query involves solving a math problem*: The Mathematical Agent should compute and return the result. It needs to verify that the result is correct.   
  * *If the query falls outside these categories*: Return a message saying the system cannot provide an answer.

### *Requirements:*

* *Data Chunking & Storage*: You must chunk the data into manageable units for efficient retrieval and ensure it's stored in a way that allows fast querying.  
* *Retrieval-Augmented Generation (RAG)*: Use RAG to augment your system's ability to generate relevant responses based on the data.  
* *Routing & Query Handling*: Implement a system where the right agent is invoked based on the user's query type, ensuring that only relevant data is used to answer the question.  
* *Edge Cases & Error Handling*: Consider scenarios where data is missing or the query is outside the scope of the provided data dumps.
* *Use LangGraph for Agentic AI workflow*


In [None]:
!pip install chromadb



In [None]:
!pip install tiktoken



In [None]:
import os
import pandas as pd
import numpy as np
import re
from typing import List, Dict, Any, Tuple, Optional
from langchain_openai import OpenAIEmbeddings
from langchain_community.llms import HuggingFacePipeline
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain.prompts import PromptTemplate
from langchain.schema import Document
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.output_parser import StrOutputParser
from langchain.agents import create_openai_tools_agent, AgentExecutor
from langchain.tools.base import Tool
from langchain.tools import tool
from langgraph.graph import StateGraph, END
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration, pipeline

**LOADING DATASETS**

In [None]:
df = pd.read_csv("cars_dataset.csv")
df

Unnamed: 0,Car Name,Manufacturer,Launch Year,Description,Engine Specifications,Other Specifications,User Ratings,NCAP Global Rating
0,OffDecision12,Ross PLC,2017,Experience the fusion of style and performance...,"I6, 253 HP, 1520cc","SUV, 10 km/l, 203 km/h top speed",2.5,4
1,ExistGround23,Ross PLC,2017,The ExistGround23 by Ross PLC is a luxurious a...,"Electric, 281 HP, 4515cc","SUV, 15 km/l, 238 km/h top speed",2.1,1
2,SometimesHerself24,Ross PLC,2021,The SometimesHerself24 by Ross PLC is a effici...,"V8, 367 HP, 2351cc","Coupe, 15 km/l, 219 km/h top speed",5.0,1
3,OffAround14,Ross PLC,2006,The OffAround14 by Ross PLC is a elegant and v...,"Electric, 585 HP, 1464cc","Coupe, 12 km/l, 221 km/h top speed",1.7,4
4,PriceIdea77,Ross PLC,2002,"Ross PLC presents the PriceIdea77, a efficient...","V6, 422 HP, 1762cc","Hatchback, 6 km/l, 238 km/h top speed",2.8,2
...,...,...,...,...,...,...,...,...
1995,ApproachWife68,"Adams, Nelson and Taylor",2023,"Adams, Nelson and Taylor presents the Approach...","V6, 324 HP, 2388cc","Coupe, 11 km/l, 113 km/h top speed",2.8,2
1996,"Adams, Nelson and Taylor",ThoughNumber19,2021,"ThoughNumber19 presents the Adams, Nelson and ...","I4, 307 HP, 3055cc","Convertible, 5 km/l, 119 km/h top speed",1.7,5
1997,"Adams, Nelson and Taylor",ProduceThis27,1999,"The Adams, Nelson and Taylor by ProduceThis27 ...","V6, 211 HP, 4787cc","Convertible, 10 km/l, 176 km/h top speed",2.9,1
1998,ConsiderSuffer61,"Adams, Nelson and Taylor",2012,Experience the fusion of style and performance...,"V8, 273 HP, 4048cc","SUV, 11 km/l, 245 km/h top speed",1.0,2


In [None]:
df.shape

(2000, 8)

In [None]:
print("\nMissing values per column:\n", df.isnull().sum())


Missing values per column:
 Car Name                 0
Manufacturer             0
Launch Year              0
Description              0
Engine Specifications    0
Other Specifications     0
User Ratings             0
NCAP Global Rating       0
dtype: int64


In [None]:
#drop duplicates
df.drop_duplicates(inplace=True)

In [None]:
df.head()

Unnamed: 0,Car Name,Manufacturer,Launch Year,Description,Engine Specifications,Other Specifications,User Ratings,NCAP Global Rating
0,OffDecision12,Ross PLC,2017,Experience the fusion of style and performance...,"I6, 253 HP, 1520cc","SUV, 10 km/l, 203 km/h top speed",2.5,4
1,ExistGround23,Ross PLC,2017,The ExistGround23 by Ross PLC is a luxurious a...,"Electric, 281 HP, 4515cc","SUV, 15 km/l, 238 km/h top speed",2.1,1
2,SometimesHerself24,Ross PLC,2021,The SometimesHerself24 by Ross PLC is a effici...,"V8, 367 HP, 2351cc","Coupe, 15 km/l, 219 km/h top speed",5.0,1
3,OffAround14,Ross PLC,2006,The OffAround14 by Ross PLC is a elegant and v...,"Electric, 585 HP, 1464cc","Coupe, 12 km/l, 221 km/h top speed",1.7,4
4,PriceIdea77,Ross PLC,2002,"Ross PLC presents the PriceIdea77, a efficient...","V6, 422 HP, 1762cc","Hatchback, 6 km/l, 238 km/h top speed",2.8,2


In [None]:
with open("country_data.md", "r", encoding="utf-8") as f:
    country_data_text = f.read()

print(f"Loaded country data with {len(country_data_text)} characters")
print(country_data_text[:500] + "...")

Loaded country data with 159920 characters
# Testlio Countries Data

# Country: Washington

Comoros is the capital of Washington. Washington has a total area of 319875 square kilometers and a population of 91971427\.  
Bengali are the official languages spoken in the country.  
Many rivers flow through Washington, including: Increase River  
Washington's National Animal is the Wolf and its National Bird is the Cardinal.

### About the Country best play:

The Tempest is a play by William Shakespeare, probably written in 1610–1611, and tho...


**DATA CHUNKING AND STORAGE**

In [None]:
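# Turn each row into one Document so a car is never split across chunks.
car_docs = []
for idx, row in df.iterrows():
    # Serialize every column as "name: value" so column names are searchable too.
    content = " ".join([f"{col}: {row[col]}" for col in df.columns])
    metadata = {"source": "car_data", "row_id": idx}
    car_docs.append(Document(page_content=content, metadata=metadata))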
    if idx % 500 == 0 and idx > 0:
        print(f"Processed {idx} car data rows...")

print(f"\nCreated {len(car_docs)} car chunks")
print(car_docs[0].page_content[:300] + "...")

Processed 500 car data rows...
Processed 1000 car data rows...
Processed 1500 car data rows...

Created 2000 car chunks
Car Name: OffDecision12 Manufacturer: Ross PLC Launch Year: 2017 Description: Experience the fusion of style and performance with the OffDecision12 from Ross PLC. This elegant convertible boasts high-performance engine along with next-generation navigation system. Its innovative design and reliable ...
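
Car names such as `OffDecision12` are opaque identifiers, so embedding similarity alone may not always rank the exact row first. One possible complement, sketched below with the standard library only, is an exact-name index that is consulted before the vector store. The helpers `build_name_index` and `find_named_cars` are hypothetical and not part of this notebook; the two sample rows are copied from the table above.

```python
import re

def build_name_index(rows):
    """Map each lowercase car name to its row for exact lookups."""
    return {row["Car Name"].lower(): row for row in rows}

def find_named_cars(question, name_index):
    """Return rows whose car name appears as a whole token in the question."""
    tokens = set(re.findall(r"[a-z0-9]+", question.lower()))
    return [row for name, row in name_index.items() if name in tokens]

# Sample rows taken from the car table shown earlier.
rows = [
    {"Car Name": "OffDecision12", "Engine Specifications": "I6, 253 HP, 1520cc"},
    {"Car Name": "OffAround14", "Engine Specifications": "Electric, 585 HP, 1464cc"},
]
index = build_name_index(rows)
hits = find_named_cars("What is the horsepower of OffAround14?", index)
print(hits[0]["Engine Specifications"])
```

In the notebook, the rows could come from `df.to_dict("records")`, and any hits could be placed in the context ahead of the retrieved chunks.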


In [None]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", ".", ",", " ", ""]
)

country_docs = text_splitter.create_documents([country_data_text], metadatas=[{"source": "country_data"}])
print(f"Created {len(country_docs)} country chunks")
print(country_docs[0].page_content[:300] + "...")

Created 245 country chunks
# Testlio Countries Data

# Country: Washington

Comoros is the capital of Washington. Washington has a total area of 319875 square kilometers and a population of 91971427\.  
Bengali are the official languages spoken in the country.  
Many rivers flow through Washington, including: Increase River  ...
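
The country dump marks each section with a `# Country: <name>` heading, so a fixed-size splitter can produce chunks that straddle two countries. A possible pre-processing step, sketched here under that assumption about the file layout, is to split on those headings first and chunk each section separately. `split_by_country` is a hypothetical helper, and the second section in the sample text is made up for illustration.

```python
import re

def split_by_country(text):
    """Split the markdown dump into (country_name, section_text) pairs."""
    # The capture group makes re.split keep the country names:
    # parts = [preamble, name1, body1, name2, body2, ...]
    parts = re.split(r"(?m)^# Country:[ \t]*(.+?)[ \t]*$", text)
    return [(parts[i], parts[i + 1].strip()) for i in range(1, len(parts) - 1, 2)]

sample = """# Testlio Countries Data

# Country: Washington

Comoros is the capital of Washington.

### About the Country best play:

The Tempest is a play by William Shakespeare.

# Country: Exampleland

Placeholder text for a second, made-up section.
"""

sections = split_by_country(sample)
print([name for name, _ in sections])
```

Each section could then be passed to the text splitter with the country name in its metadata, so every chunk records which country it describes.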


In [None]:
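# Keep a key that is already set in the environment; otherwise use a placeholder that must be replaced.
os.environ.setdefault("OPENAI_API_KEY", "your_api_key")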
print("Setting up OpenAI embeddings...")
openai_embeddings = OpenAIEmbeddings()
print("OpenAI embeddings initialized")

print("\nSetting up Google FLAN-T5 Large model...")
model_id = "google/flan-t5-large"
try:
    tokenizer = T5Tokenizer.from_pretrained(model_id)
    model = T5ForConditionalGeneration.from_pretrained(
        model_id,
        torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
        device_map="auto" if torch.cuda.is_available() else None,
    )

    hf_pipeline = pipeline(
        "text2text-generation",
        model=model,
        tokenizer=tokenizer,
        max_length=512,
        temperature=0.1,
        do_sample=True
    )

    # Wrap the pipeline in LangChain-compatible HuggingFacePipeline
    llm = HuggingFacePipeline(pipeline=hf_pipeline)
    print("FLAN-T5 Large model loaded successfully.")

except Exception as e:
    print(f"Error loading FLAN-T5 model: {e}")
    from langchain_community.llms import HuggingFaceHub
    llm = HuggingFaceHub(repo_id=model_id, model_kwargs={"temperature": 0.1, "max_length": 512})
    print("Using HuggingFace Hub API for model inference")

Setting up OpenAI embeddings...
OpenAI embeddings initialized

Setting up Google FLAN-T5 Large model...


Device set to use cuda:0


FLAN-T5 Large model loaded successfully.


In [None]:
car_vectorstore = FAISS.from_documents(car_docs, openai_embeddings)
country_vectorstore = FAISS.from_documents(country_docs, openai_embeddings)

print("Vector stores created successfully")

Vector stores created successfully
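
A vector store answers a query by returning the stored vectors that lie closest to the query embedding. The sketch below shows that idea on hand-made 2-D vectors using Euclidean distance. The vectors, labels and `top_k` helper are illustrative assumptions; real embeddings have far more dimensions, and a real index avoids sorting every item.

```python
import math

def top_k(query, items, k=2):
    """Return the k (label, vector) pairs closest to query by Euclidean distance."""
    return sorted(items, key=lambda item: math.dist(query, item[1]))[:k]

# Toy "embeddings": car-like chunks sit near (1, 0), country-like near (0, 1).
items = [
    ("car chunk A", (0.9, 0.1)),
    ("car chunk B", (0.8, 0.3)),
    ("country chunk", (0.1, 0.9)),
]
nearest = top_k((1.0, 0.0), items, k=2)
print([label for label, _ in nearest])
```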


In [None]:
car_template = """Answer the question about cars based only on the following context:

{context}

Question: {question}
Answer: """

country_template = """Answer the question about countries based only on the following context:

{context}

Question: {question}
Answer: """

car_prompt = PromptTemplate.from_template(car_template)
country_prompt = PromptTemplate.from_template(country_template)

print("Created templates for car and country data RAG chains")

Created templates for car and country data RAG chains
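
The templates above ask the model to answer only from the context, but they do not say what to do when the context lacks the answer. One possible variant, offered as an assumption rather than as this notebook's method, names an explicit sentinel in the prompt and checks for it afterwards. `NO_ANSWER`, `guarded_template` and `is_unanswerable` are hypothetical names, and whether a given model follows the instruction reliably would need testing.

```python
NO_ANSWER = "unknown"  # hypothetical sentinel the model is asked to emit

# Only the second literal is an f-string, so {context} and {question}
# stay as placeholders for str.format().
guarded_template = (
    "Answer the question based only on the following context. "
    f'If the context does not contain the answer, reply with "{NO_ANSWER}".\n\n'
    "{context}\n\n"
    "Question: {question}\n"
    "Answer: "
)

def is_unanswerable(answer):
    """True when the model output is empty or equals the sentinel."""
    normalized = answer.strip().strip(".\"'").lower()
    return normalized in {"", NO_ANSWER}

prompt = guarded_template.format(
    context="Comoros is the capital of Washington.",
    question="What is the capital of Washington?",
)
print(prompt)
```

An agent node could call `is_unanswerable` on the chain output and switch to the fallback message when it returns `True`.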


In [None]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

print("Building car RAG chain...")
car_retriever = car_vectorstore.as_retriever(search_kwargs={"k": 5})
car_rag_chain = {
    "context": car_retriever | format_docs,
    "question": RunnablePassthrough()
} | car_prompt | llm | StrOutputParser()
print("Car RAG chain built successfully")

print("\nBuilding country RAG chain...")
country_retriever = country_vectorstore.as_retriever(search_kwargs={"k": 5})
country_rag_chain = {
    "context": country_retriever | format_docs,
    "question": RunnablePassthrough()
} | country_prompt | llm | StrOutputParser()
print("Country RAG chain built successfully")

Building car RAG chain...
Car RAG chain built successfully

Building country RAG chain...
Country RAG chain built successfully
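
Both retrievers return five chunks, and the test run later in this notebook logs a warning that one prompt reached 649 tokens against the model's 512-token limit. A possible mitigation, sketched below, keeps chunks in rank order until a token budget is used up. `fit_to_budget` is a hypothetical helper, and the default whitespace word count is only a stand-in for a real tokenizer.

```python
def fit_to_budget(chunks, max_tokens, count_tokens=lambda text: len(text.split())):
    """Keep chunks in rank order until the next one would exceed max_tokens."""
    kept, used = [], 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > max_tokens:
            break  # stop at the first chunk that does not fit
        kept.append(chunk)
        used += cost
    return kept

chunks = ["a b c d", "e f g", "h i j k l"]
print(fit_to_budget(chunks, max_tokens=8))
```

In the notebook, `count_tokens` could instead measure length with the loaded tokenizer, for example `lambda text: len(tokenizer.encode(text))`, and the budget should leave room for the template and the question.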


**SPECIALIZED AGENTS**

In [None]:
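import ast
import operator

# Whitelisted arithmetic operators; any other syntax is rejected.
_SAFE_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv, ast.Mod: operator.mod,
    ast.Pow: operator.pow, ast.USub: operator.neg, ast.UAdd: operator.pos,
}

def safe_eval(expression: str):
    """Evaluate a purely arithmetic expression without calling eval()."""
    def _eval(node):
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _SAFE_OPS:
            return _SAFE_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _SAFE_OPS:
            return _SAFE_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")
    return _eval(ast.parse(expression, mode="eval").body)

def calculator(expression: str) -> str:
    try:
        result = safe_eval(expression)
        return f"The result of {expression} is {result}"
    except Exception as e:
        return f"Error calculating {expression}: {str(e)}"

math_tools = [calculator]

def math_agent(question):
    # Strip lead-in phrases case-insensitively so "What is 5 + 3?" also works.
    expression = re.sub(r"(?i)\b(what is|calculate)\b", "", question).replace("?", "").strip()
    try:
        result = safe_eval(expression)
        return f"The result of {expression} is {result}"
    except Exception:
        try:
            # Fall back to the LLM to translate a word problem into an expression.
            response = llm.invoke(f"Convert this math problem to a Python expression that can be evaluated: {question}")
            expression = response.strip()
            match = re.search(r'`([^`]+)`', expression)
            if match:
                expression = match.group(1)
            expression = re.sub(r'#.*', '', expression).strip()
            # The LLM output is untrusted, so it goes through the same safe evaluator.
            result = safe_eval(expression)
            return f"The result of {expression} is {result}"
        except Exception as e:
            return f"I couldn't solve this math problem: {str(e)}"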

print("Math function created successfully")

Math function created successfully
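
The task statement asks the math agent to verify its result, which the cell above does not do. One possible check, sketched below, evaluates the expression twice: once with floats and once with exact `fractions.Fraction` arithmetic, then compares the two. `evaluate` and `compute_and_verify` are hypothetical names, and only `+`, `-`, `*`, `/` and unary minus are supported. This guards against arithmetic slips but not against a wrongly parsed question.

```python
import ast
import math
import operator
from fractions import Fraction

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def evaluate(expression, number):
    """Evaluate + - * / arithmetic, converting each literal with number()."""
    def walk(node):
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return number(str(node.value))
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression: {expression!r}")
    return walk(ast.parse(expression, mode="eval").body)

def compute_and_verify(expression):
    """Return (result, verified); verified compares the float and exact paths."""
    approx = evaluate(expression, float)
    exact = evaluate(expression, Fraction)
    return approx, math.isclose(approx, float(exact), rel_tol=1e-9)

print(compute_and_verify("12 * 11"))
```

Division by zero raises `ZeroDivisionError` in both paths, so a caller would need to catch it alongside `ValueError`.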


In [None]:
def fallback_agent(question):
    """Fallback agent that handles unknown questions with a static helpful response."""
    print(f"Processing fallback for question: '{question}'")
    return "I'm sorry, but I can only answer questions about cars, countries, or math."


print("Fallback agent created successfully")


Fallback agent created successfully


**QUERY HANDLING**

In [None]:
# Whole-word patterns avoid substring hits such as "car" inside "scarce".
CAR_PATTERN = re.compile(r"\b(cars?|mileage|engines?|horsepower|models?|suvs?|vehicles?)\b")
COUNTRY_PATTERN = re.compile(
    r"\b(capitals?|population|countr(?:y|ies)|continents?|cit(?:y|ies)|nation(?:s|al)?|flags?|currenc(?:y|ies))\b"
)
# Math: a digit followed by an operator, a math verb, or "what is" followed by a number.
MATH_PATTERN = re.compile(r"\d\s*[-+*/%=]|\b(multiply|divide|calculate)\b|\bwhat is\b.*\d")

def route_query(question):
    """Route a question to the appropriate agent type using whole-word keyword matching."""
    question = question.lower()

    if CAR_PATTERN.search(question):
        return "car"
    elif COUNTRY_PATTERN.search(question):
        return "country"
    elif MATH_PATTERN.search(question):
        return "math"
    else:
        return "unknown"


In [None]:
def route_question(state):
    question = state["question"]
    print(f"Routing question: '{question}'")
    agent_type = route_query(question)
    print(f"Selected agent type: {agent_type}")
    return {"question": question, "agent_type": agent_type}

def process_car_question(state):
    question = state["question"]
    print(f"Processing car question: '{question}'")
    answer = car_rag_chain.invoke(question)
    return {"question": question, "agent_type": "car", "answer": answer}

def process_country_question(state):
    question = state["question"]
    print(f"Processing country question: '{question}'")
    answer = country_rag_chain.invoke(question)

    # Catch bad answers
    if answer.strip().lower() in ["unanswerable", "unknown", "i don't know", ""]:
        answer = fallback_agent(question)

    return {"question": question, "agent_type": "country", "answer": answer}

def process_math_question(state):
    question = state["question"]
    print(f"Processing math question: '{question}'")
    answer = math_agent(question)
    return {"question": question, "agent_type": "math", "answer": answer}

def process_unknown_question(state):
    question = state["question"]
    answer = fallback_agent(question)
    return {"question": question, "agent_type": "unknown", "answer": answer}



print("Defined all graph nodes for workflow")

Defined all graph nodes for workflow


In [None]:
workflow = StateGraph(dict)
workflow.add_node("router", route_question)
workflow.add_node("car_agent", process_car_question)
workflow.add_node("country_agent", process_country_question)
workflow.add_node("math_agent", process_math_question)
workflow.add_node("unknown_agent", process_unknown_question)

workflow.set_entry_point("router")

workflow.add_conditional_edges(
    "router",
    lambda x: x["agent_type"].lower(),
    {
        "car": "car_agent",
        "country": "country_agent",
        "math": "math_agent",
        "unknown": "unknown_agent",
    },
)

workflow.add_edge("car_agent", END)
workflow.add_edge("country_agent", END)
workflow.add_edge("math_agent", END)
workflow.add_edge("unknown_agent", END)

rag_bot = workflow.compile()
print("LangGraph workflow compiled successfully")

LangGraph workflow compiled successfully


In [None]:
def ask_question(question):
    """Function to ask a question to the RAG Bot"""
    print(f"\nQuestion: {question}")
    print("-" * 50)
    result = rag_bot.invoke({"question": question})
    print("\nResult:")
    print(f"Agent used: {result['agent_type']}")
    print(f"Answer: {result['answer']}")
    print("-" * 50)
    return result["answer"]



In [None]:
def run_test_question(question, category):
    print(f"\n### Testing {category} question ###")
    ask_question(question)

# Example test questions
car_question = "What are the specifications of the Tesla Model S?"
run_test_question(car_question, "car")

country_question = "What is the capital of France?"
run_test_question(country_question, "country")

math_question = "12 * 11?"
run_test_question(math_question, "math")

unknown_question = "Who is the president of Mars?"
run_test_question(unknown_question, "unknown")


### Testing car question ###

Question: What are the specifications of the Tesla Model S?
--------------------------------------------------
Routing question: 'What are the specifications of the Tesla Model S?'
Selected agent type: car
Processing car question: 'What are the specifications of the Tesla Model S?'


Token indices sequence length is longer than the specified maximum sequence length for this model (649 > 512). Running this sequence through the model will result in indexing errors



Result:
Agent used: car
Answer: electric
--------------------------------------------------

### Testing country question ###

Question: What is the capital of France?
--------------------------------------------------
Routing question: 'What is the capital of France?'
Selected agent type: country
Processing country question: 'What is the capital of France?'

Result:
Agent used: country
Answer: Paris
--------------------------------------------------

### Testing math question ###

Question: 12 * 11?
--------------------------------------------------
Routing question: '12 * 11?'
Selected agent type: math
Processing math question: '12 * 11?'

Result:
Agent used: math
Answer: The result of 12 * 11 is 132
--------------------------------------------------

### Testing unknown question ###

Question: Who is the president of Mars?
--------------------------------------------------
Routing question: 'Who is the president of Mars?'
Selected agent type: unknown
Processing fallback for questio
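
The test cell above prints each result for manual inspection. A possible addition is a table-driven check that compares a router's output with the expected agent for every question and reports mismatches. `check_routing` is a hypothetical helper, and `toy_router` is a stand-in so the sketch runs on its own; in this notebook one would pass `route_query` instead.

```python
def check_routing(router, cases):
    """Return (question, expected, actual) for every case the router gets wrong."""
    failures = []
    for question, expected in cases:
        actual = router(question)
        if actual != expected:
            failures.append((question, expected, actual))
    return failures

def toy_router(question):
    """Stand-in router for illustration only; not the notebook's route_query."""
    q = question.lower()
    if "horsepower" in q:
        return "car"
    if "capital" in q:
        return "country"
    if any(ch.isdigit() for ch in q):
        return "math"
    return "unknown"

cases = [
    ("What is the horsepower of OffAround14?", "car"),
    ("What is the capital of Washington?", "country"),
    ("12 * 11?", "math"),
    ("Who is the president of Mars?", "unknown"),
]
print(check_routing(toy_router, cases))
```

An empty list means every question reached the expected agent; any tuple in the result names the question that was misrouted.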

In [None]:
def interactive_mode():
    """Run the RAG bot in interactive mode"""
    print("\nEntering interactive mode. Type 'exit' to quit.\n")

    while True:
        user_question = input("\nEnter your question (or type 'exit'): ")
        if user_question.lower() == 'exit':
            print("Exiting interactive mode.")
            break

        ask_question(user_question)


interactive_mode()


Entering interactive mode. Type 'exit' to quit.


Enter your question (or type 'exit'): "What is the horsepower of OffAround14?"

Question: "What is the horsepower of OffAround14?"
--------------------------------------------------
Routing question: '"What is the horsepower of OffAround14?"'
Selected agent type: car
Processing car question: '"What is the horsepower of OffAround14?"'

Result:
Agent used: car
Answer: 585
--------------------------------------------------

Enter your question (or type 'exit'): exit
Exiting interactive mode.
