
In [2]:
!pip install sentence-transformers transformers chromadb pandas --quiet

import pandas as pd
from sentence_transformers import SentenceTransformer
from transformers import pipeline
import chromadb

df = pd.read_csv("RAGChat.csv")
print("Sample rows:")
print(df.head())
# Expected Columns: ["Intent (Category)", "User Question Variation", "Official Bot Response", "Source URL", "Notes"]

embedder = SentenceTransformer('all-MiniLM-L6-v2')
chroma_client = chromadb.PersistentClient(path="chroma_db")

collection = chroma_client.get_or_create_collection(name="rag_collection")

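# Embed each question variation once; the official response and source URL travel as metadata.
questions = df["User Question Variation"].tolist()
# upsert (rather than add) lets this cell be re-run against the persistent "chroma_db" store
# without tripping over IDs that were already written by an earlier run.
collection.upsert(
    documents=questions,
    embeddings=embedder.encode(questions, normalize_embeddings=True).tolist(),
    metadatas=[{"response": r, "source": s} for r, s in zip(df["Official Bot Response"], df["Source URL"])],
    ids=[str(i) for i in range(len(df))]
)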

print(f"ChromaDB collection created with {collection.count()} entries.")

def retrieve_answer(query, top_k=2):
    """Return the stored responses for the top_k nearest questions, one bullet per line."""
    query_vec = embedder.encode([query], normalize_embeddings=True).tolist()[0]
    results = collection.query(
        query_embeddings=[query_vec],
        n_results=top_k
    )

    # Only one query was sent, so its hits are in results["metadatas"][0].
    retrieved_data = [
        f"- {meta['response']} (Source: {meta['source']})"
        for meta in results["metadatas"][0]
    ]

    return "\n".join(retrieved_data)

rag_llm = pipeline("text2text-generation", model="google/flan-t5-base", device=-1)

def respectful_bot(query):
    context = retrieve_answer(query, top_k=2)

    if not context.strip():
        return "I don’t know the answer. Please check the BBMP official website."

    prompt = f"""
You are a respectful municipal chatbot.
Answer the user question ONLY using the context below.
If the answer is not in the context, reply exactly: "I don’t know the answer."

User Question: {query}
Context: {context}
"""
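    # Greedy decoding with an explicit new-token budget. Passing temperature=0 and max_length
    # here is what triggers the "generation flags" and max_new_tokens/max_length warnings.
    result = rag_llm(prompt, max_new_tokens=256, do_sample=False)[0]['generated_text']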
    return result

test_queries = [
    "When does garbage get collected in my area?",
    "How can I pay my property tax online?",
    "Where do I complain about potholes?",
    "Who is the Prime Minister of India?"
]

for q in test_queries:
    print("User:", q)
    print("Bot:", respectful_bot(q))
    print("-" * 60)

def save_index():
    # PersistentClient writes to "chroma_db" on its own, so there is nothing to flush here.
    print("ChromaDB index saved.")

def load_index():
    global chroma_client, collection
    chroma_client = chromadb.PersistentClient(path="chroma_db")
    collection = chroma_client.get_collection("rag_collection")
    print("ChromaDB index loaded.")

save_index()

Sample rows:
      Intent (Category)                            User Question Variation  \
0  ask_garbage_schedule         What time is garbage collected in my area?   
1  ask_garbage_schedule      When does garbage pickup happen in my street?   
2   ask_collection_days        Which days is garbage collected in my area?   
3     ask_missed_pickup   My garbage was not picked up today—what do I do?   
4        ask_bulk_waste  How can I dispose of old furniture or bulky wa...   

                               Official Bot Response  \
0  As per the latest BBMP update effective from A...   
1  BBMP collects waste door-to-door beginning at ...   
2  BBMP arranges daily wet waste collection for m...   
3  We regret the inconvenience. Missed pickups mu...   
4  Bulk waste such as furniture, mattresses, and ...   

                                          Source URL  \
0  https://www.hindustantimes.com/cities/bengalur...   
1  https://www.goodreturns.in/news/bengaluru-wast...   
2  https://si

Device set to use cpu
The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


User: When does garbage get collected in my area?


Both `max_new_tokens` (=256) and `max_length`(=256) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Bot: Door-to-door garbage collection now starts at 5:30 a.m. daily in most wards. Residents should put their waste outside before 5:30 a.m. to ensure timely pickup. This change aims to prevent leftover garbage during the day and improve overall city cleanliness. Local ward timings may vary slightly, so check with local ward officials for specifics. (Source: https://www.hindustantimes.com/cities/bengaluru-news/bengaluru-sets-new-morning-schedule-for-waste-pickup-check-new-timings-here-101756175113381.html) - BBMP arranges daily wet waste collection for most households, typically early morning. Dry waste is collected two to three times a week, depending on the ward. Users should provide their ward or locality to BBMP for the exact dry waste schedule. These differentiated schedules assist in effective waste processing and recycling. (Source: https://site.bbmp.gov.in/documents/Schedule%20Yelahanka.pdf
------------------------------------------------------------
User: How can I pay my property tax online?

Bot: You can pay property tax online through the official BBMP property tax portal at https://bbmptax.karnataka.gov.in/. Enter your Property Identification Number (PID) or SAS Application Number, verify your property details, choose your payment mode (credit/debit card, UPI, net banking) and complete the transaction. An e-receipt and SMS confirmation will be generated within 24 hours upon successful payment. (Source: https://bbmptax.karnataka.gov.in) - Property tax payment forms (including Form IV, Form V for assessments and revisions) are available on the official BBMP portal. You can download, fill, and upload these forms during online payment or submit them offline at the revenue office.
------------------------------------------------------------
User: Where do I complain about potholes?


Bot: I don’t know the answer.
------------------------------------------------------------
User: Who is the Prime Minister of India?


Bot: I don’t know the answer.
------------------------------------------------------------
✅ ChromaDB index saved.
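
In the run above, the out-of-domain question about the Prime Minister is refused only because the model follows the prompt. `collection.query` returns `top_k` neighbours whenever the collection is non-empty, so the `if not context.strip()` fallback in `respectful_bot` is effectively unreachable. One possible way to make that branch reachable is to drop neighbours that are too far from the query before building the context. The sketch below uses a hypothetical helper, `filter_by_distance`, that is not part of the notebook. It assumes the result dict carries parallel `metadatas` and `distances` lists, and the `max_distance=0.8` cutoff is an arbitrary placeholder that would need tuning on real queries.

```python
def filter_by_distance(results, max_distance=0.8):
    """Format only the hits whose distance is at or below max_distance.

    `results` is assumed to look like the dict returned by `collection.query`
    for a single query: results["metadatas"][0] and results["distances"][0]
    are parallel lists. `max_distance` is a placeholder, not a tuned value.
    """
    kept = []
    for meta, dist in zip(results["metadatas"][0], results["distances"][0]):
        if dist <= max_distance:
            kept.append(f"- {meta['response']} (Source: {meta['source']})")
    # An empty string lets the caller's `if not context.strip()` fallback fire.
    return "\n".join(kept)


# Hand-written stand-in for a query result, used only to exercise the helper.
fake_results = {
    "metadatas": [[
        {"response": "Pay at the portal.", "source": "https://example.org/a"},
        {"response": "Unrelated entry.", "source": "https://example.org/b"},
    ]],
    "distances": [[0.35, 1.20]],
}

context = filter_by_distance(fake_results)
print(context)
```

Inside `retrieve_answer`, the return value of `collection.query` could be passed through such a helper in place of the unconditional loop over `results["metadatas"][0]`.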

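
The collection built in the code cell uses row positions (`"0"`, `"1"`, and so on) as IDs, so reordering or inserting rows in `RAGChat.csv` would attach an existing ID to different text on the next run. A possible alternative, sketched here under the assumption that every question variation is unique, derives the ID from the question text itself. `stable_id` is a hypothetical helper, and truncating the SHA-256 digest to 16 hex characters is an arbitrary choice.

```python
import hashlib


def stable_id(text):
    """Return a short, deterministic ID derived from whitespace- and case-normalised text."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()[:16]


# Two question variations taken from the sample rows printed above.
questions = [
    "What time is garbage collected in my area?",
    "When does garbage pickup happen in my street?",
]
ids = [stable_id(q) for q in questions]
print(ids)
```

These IDs could replace `[str(i) for i in range(len(df))]` when writing to the collection, at the cost of collapsing rows whose questions differ only in case or spacing.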