<a href="https://colab.research.google.com/github/GYVVishnu77/1M1B_Project_Urban_Mobility/blob/main/traffic_mgmt_service.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
!pip install -q sentence-transformers faiss-cpu transformers pypdf


[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m23.8/23.8 MB[0m [31m58.0 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m329.1/329.1 kB[0m [31m18.7 MB/s[0m eta [36m0:00:00[0m
[?25h

In [3]:
from google.colab import files
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer
from transformers import pipeline
from pypdf import PdfReader


print("✅ Libraries loaded successfully")


# =====================================
# Upload PDFs
# =====================================

print("\nUpload ONLY 2–3 PDFs for best performance.\n")

uploaded = files.upload()
pdf_files = list(uploaded.keys())

texts = []

for pdf in pdf_files:
    reader = PdfReader(pdf)

    for page in reader.pages:
        content = page.extract_text()
        if content:
            texts.append(content)

print("✅ Pages extracted:", len(texts))


# =====================================
# Smart Chunking
# =====================================

def chunk_text(text, size=700, overlap=150):
    chunks = []
    start = 0

    while start < len(text):
        end = start + size
        chunks.append(text[start:end])
        start += size - overlap

    return chunks


chunks = []

for t in texts:
    chunks.extend(chunk_text(t))

print("✅ Chunks created:", len(chunks))


# =====================================
# Embeddings (LOW RAM)
# =====================================

print("\nLoading embedding model...")

embed_model = SentenceTransformer(
    "sentence-transformers/paraphrase-MiniLM-L3-v2"
)

embeddings = embed_model.encode(
    chunks,
    show_progress_bar=True
)

dimension = embeddings.shape[1]

index = faiss.IndexFlatL2(dimension)
index.add(np.array(embeddings))

print("✅ Vector database ready!")


# =====================================
# Load Lightweight LLM
# =====================================

print("\nLoading lightweight LLM...")

generator = pipeline(
    "text2text-generation",
    model="google/flan-t5-small",
    max_length=512,
    temperature=0.1
)

print("✅ LLM ready!")


# =====================================
# Retrieval Function
# =====================================

def retrieve(query, k=3):

    q_embedding = embed_model.encode([query])

    distances, indices = index.search(
        np.array(q_embedding), k
    )

    results = [chunks[i] for i in indices[0]]

    return results


# =====================================
# Intelligent RAG Function
# =====================================

def ask(query):

    contexts = retrieve(query)

    context_text = "\n\n".join(contexts)

    prompt = f"""
You are an expert in urban planning and smart cities.

Using ONLY the context below, generate a structured and detailed answer.

RULES:
- Do NOT give one-word answers.
- Do NOT abbreviate.
- Explain clearly.
- If information is missing, say so.

Format your answer as:

Problem:
Solutions:
Expected Impact:

Context:
{context_text}

Question:
{query}

Answer:
"""

    result = generator(prompt)[0]["generated_text"]

    print("\n✅ ANSWER:\n")
    print(result)

    print("\n📚 Retrieved Context Preview:\n")
    print(context_text[:500])   # Debug preview

    print("\n" + "="*65 + "\n")


# =====================================
# Interactive Agent
# =====================================

print("\n🎉 URBAN INTELLIGENCE AGENT READY!")
print("Type 'exit' to stop.\n")

while True:

    q = input("Ask a question: ")

    if q.lower() == "exit":
        break

    ask(q)


✅ Libraries loaded successfully

Upload ONLY 2–3 PDFs for best performance.



Saving mobility.pdf to mobility.pdf
Saving sustainable.pdf to sustainable.pdf
✅ Pages extracted: 56
✅ Chunks created: 194

Loading embedding model...


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


modules.json:   0%|          | 0.00/229 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/122 [00:00<?, ?B/s]

README.md: 0.00B [00:00, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/629 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/69.6M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/314 [00:00<?, ?B/s]

vocab.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

Batches:   0%|          | 0/7 [00:00<?, ?it/s]

✅ Vector database ready!

Loading lightweight LLM...


config.json: 0.00B [00:00, ?B/s]

model.safetensors:   0%|          | 0.00/308M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]

tokenizer_config.json: 0.00B [00:00, ?B/s]

spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json: 0.00B [00:00, ?B/s]

Device set to use cuda:0
The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


✅ LLM ready!

🎉 URBAN INTELLIGENCE AGENT READY!
Type 'exit' to stop.

Ask a question: give traffic updates yet bangalore?


Token indices sequence length is longer than the specified maximum sequence length for this model (724 > 512). Running this sequence through the model will result in indexing errors



✅ ANSWER:

                                                                                                                                                                                                                                                               

📚 Retrieved Context Preview:

2019, https://www.hindustantimes.com/cities/
gurugram-draft-mobility-plan-must-address-regional-connectivity-issues/story-
Ag9Dyx97MaQWMZXJTakfyI.html?utm_source=chatgpt.com 
26 Ashish Verma, “Bengaluru’s Mobility Plan Has Major Drawbacks, Says IISc Review,” 
Citizen Matters, January 14, 2020, https://citizenmatters.in/bengaluru-comprehensive-
mobility-plan-drawbacks-iisc-review-mode-share-walking-road-capacity/?utm_
source=chatgpt.com 
27 Ved Ghulghule, “Rs. 18,585 Cr Mobility Masterplan Set to


Ask a question: exit
