<a href="https://colab.research.google.com/github/Tuevu110405/Agentic-RAG-Project/blob/main/Project_Datacom.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Install neccessary packages

In [1]:
txt_content = """
torch --index-url https://download.pytorch.org/whl/cu118
torchvision --index-url https://download.pytorch.org/whl/cu118
torchaudio --index-url https://download.pytorch.org/whl/cu118
transformers --prefer-binary --extra-index-url=https://download.pytorch.org/whl/cu118
numpy==1.26.4
accelerate --prefer-binary --extra-index-url=https://download.pytorch.org/whl/cu118
bitsandbytes --prefer-binary --extra-index-url=https://download.pytorch.org/whl/cu118
triton --prefer-binary --extra-index-url=https://download.pytorch.org/whl/cu118
huggingface-hub==0.34.0
langchain==0.1.14
langchain-core==0.1.43
langchain-community==0.0.31
pypdf==4.2.0
sentence-transformers==5.1.0
beautifulsoup4==4.12.3
langserve[all]
faiss-cpu
rapidocr-onnxruntime==1.4.1
unstructured==0.18.5
wget
rank_bm25
langchainhub
underthesea
langchain-experimental==0.0.56
gradio
"""

with open('/content/requirements.txt', 'w') as f:
    f.write(txt_content)

In [2]:
!pip install -r /content/requirements.txt

Collecting numpy==1.26.4 (from -r /content/requirements.txt (line 6))
  Downloading numpy-1.26.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m61.0/61.0 kB[0m [31m3.8 MB/s[0m eta [36m0:00:00[0m
Collecting bitsandbytes (from -r /content/requirements.txt (line 8))
  Downloading bitsandbytes-0.49.1-py3-none-manylinux_2_24_x86_64.whl.metadata (10 kB)
Collecting huggingface-hub==0.34.0 (from -r /content/requirements.txt (line 10))
  Downloading huggingface_hub-0.34.0-py3-none-any.whl.metadata (14 kB)
Collecting langchain==0.1.14 (from -r /content/requirements.txt (line 11))
  Downloading langchain-0.1.14-py3-none-any.whl.metadata (13 kB)
Collecting langchain-core==0.1.43 (from -r /content/requirements.txt (line 12))
  Downloading langchain_core-0.1.43-py3-none-any.whl.metadata (5.9 kB)
Collecting langchain-community==0.0.31 (from -r /content/requirements.txt (line 13))
  Downloading langchain_

In [1]:
# Create score data
import pandas as pd

# Create sample data
data = {
    "name": ["Alice", "Bob", "Charlie", "David", "Eve", "Christ"],
    "score": [85, 92, 78, 88, 95, 100]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Save to CSV (index=False prevents adding a separate index column)
df.to_csv("student_scores.csv", index=False)

print("✅ Created student_scores.csv successfully.")
print(df)

✅ Created student_scores.csv successfully.
      name  score
0    Alice     85
1      Bob     92
2  Charlie     78
3    David     88
4      Eve     95
5   Christ    100


# Config

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
import torch

class Config:
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    embed_model = "google/embeddinggemma-300m"
    router_model = "Qwen/Qwen2.5-1.5B-Instruct"
    # agent_model = "Qwen/Qwen3-8B"
    agent_model = "Qwen/Qwen3-4B"
    router_max_new_tokens = 128
    router_temperature = 0.1
    agent_max_new_tokens = 1024
    agent_temperature = 0.6
    agent_is_quantized = False
    study_path = "/content/drive/MyDrive/Project_Datacom/study.txt"
    article_path = "/content/drive/MyDrive/Project_Datacom/article.txt"
    csv_path = "/content/student_scores.csv"

config = Config()


# Models

In [4]:
import torch
from langchain_core.prompts import PromptTemplate
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline
from langchain_community.chat_models import ChatHuggingFace
from langchain_community.llms import HuggingFacePipeline
from langchain_community.embeddings import HuggingFaceEmbeddings


def get_router():
    print("Loading router model...")
    router_name = config.router_model
    tokenizer = AutoTokenizer.from_pretrained(router_name)
    model = AutoModelForCausalLM.from_pretrained(
        router_name,
        device_map="auto",
        trust_remote_code = True,
        quantization_config = None
    )
    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_new_tokens=config.router_max_new_tokens,
        temperature = config.router_temperature,
        repetition_penalty = 1.1,
        return_full_text = False
    )
    llm = HuggingFacePipeline(pipeline=pipe)

    return llm


def get_agent():
    print("Loading router model...")
    router_name = config.agent_model
    tokenizer = AutoTokenizer.from_pretrained(config.agent_model)
    bnb_config = None
    if config.agent_is_quantized:
        bnb_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_quant_type="nf4",
            # bnb_4bit_compute_dtype=torch.bfloat16
        )

    model = AutoModelForCausalLM.from_pretrained(
        router_name,
        device_map="auto",
        trust_remote_code = True,
        quantization_config = None
    )
    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_new_tokens=config.agent_max_new_tokens,
        temperature = config.agent_temperature,
        repetition_penalty = 1.2,
        return_full_text = False
    )
    llm = HuggingFacePipeline(pipeline=pipe)

    return llm




# Rag

## bm25 utils

In [5]:
import numpy as np
import re
import unicodedata

def normalize_score_bm25(score):
    s = np.array(score, dtype=float)
    mn, mx = s.min(), s.max()
    if mx - mn < 1e-9:
        return np.ones_like(s)
    return (s - mn) / (mx - mn)

def tokenize_bm25(text):
    # Unicode normalize
    text = unicodedata.normalize("NFC", text)
    # Lowercase
    text = text.lower()
    # Loại bỏ ký tự lạ, chỉ giữ chữ + số
    text = re.sub(r"[^0-9a-zA-Z\u00C0-\u1EF9\s]", " ", text)  # giữ tiếng Việt có dấu
    # Tách chữ và số rời
    text = re.sub(r"(\d)", r" \1 ", text)
    tokens = text.split()
    return tokens

In [6]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
from typing import Literal, List
from langchain_core.documents import Document
from sentence_transformers import SentenceTransformer
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from underthesea import sent_tokenize
import re

class TextSplitter:
    def __init__(
            self,
            separators: List[str] = ['\n\n','\n',' ',''],
            chunk_size = 300,
            chunk_overlap = 0,
            embed_model = None
    ):
        '''
            Args:
                This project mainly use semantic chunking, remember to pass HuggingFace embedding model
        '''
        self.splitter = RecursiveCharacterTextSplitter(
            separators= separators,
            chunk_size = chunk_size,
            chunk_overlap = chunk_overlap
        )
        self.embed_model = embed_model

    def recursive_splitter(self, documents):
        return self.splitter.split_documents(documents)


    def semantic_splitting_v2(self, sentences, threshold=0.6):
        '''
            Split sentences into chunks
            Args:
                sentences: list of sentences
                embed_model: embedding model
                threshold: cosine similarity threshold

        '''
        # 1. Handling input exception
        if not sentences:
            return []

        # 2. Encode all sentences once (Batch processing) -> increasing speed
        # convert_to_numpy=True assure that these embeddings are numpy
        embeddings = self.embed_model.encode(sentences, convert_to_tensor=False)
        print("Encode for chunking sucessfully")

        # 3. Initialize the chunks list
        chunks = [[sentences[0]]] # Bắt đầu chunk đầu tiên với câu đầu

        # 4. traversing the next sentences
        for i in range(1, len(sentences)):
            current_sentence = sentences[i]
            current_embedding = embeddings[i].reshape(1, -1)
            prev_embedding = embeddings[i-1].reshape(1, -1)

            # 5. Cosine Similarity Calculation
            # Returning matrix [[score]], get [0][0] to get real score
            sim_score = cosine_similarity(prev_embedding, current_embedding)[0][0]

            # 6. linking related sentences
            if sim_score >= threshold:
                # if semantic score is high, combining into current chunk
                chunks[-1].append(current_sentence)
            else:
                # Nếu khác nhau (score thấp), tạo chunk mới
                # if semantic score is low, creating new chunk
                chunks.append([current_sentence])

        # 7. Combining each chunk into a paragraph
        final_chunks = [' '.join(chunk) for chunk in chunks]

        return final_chunks

In [7]:
class Loader:
    def __init__(
            self,
            embed_model,
            split_kwargs = {
                     "chunk_size" : 300,
                     "chunk_overlap" : 20
                 }
    ):

        self.embed_model = embed_model

        self.doc_splitter = TextSplitter(embed_model=self.embed_model, **split_kwargs)

    def _clean_text(self, text):
        if not isinstance(text, str):
            return ""
        # 1
        text = re.sub(r'<[^>]+>','', text)
        # 2
        text = re.sub(r'http\S+|www\S+|https\S+', '', text, flags=re.MULTILINE)
        text = re.sub(r'\S*@\S*\s?', '', text)
        # 3
        vietnamese_chars = "a-zA-ZàáảãạăằắẳẵặâầấẩẫậèéẻẽẹêềếểễệìíỉĩịòóỏõọôồốổỗộơờớởỡợùúủũụưừứửữựỳýỷỹỵđÀÁẢÃẠĂẰẮẲẴẶÂẦẤẨẪẬÈÉẺẼẸÊỀẾỂỄỆÌÍỈĨỊÒÓỎÕỌÔỒỐỔỖỘƠỜỚỞỠỢÙÚỦŨỤƯỪỨỬỮỰỲÝỶỸỴĐ"
        pattern = f"[^{vietnamese_chars}\d\s.,?!:;]"
        text = re.sub(pattern, '', text)
        # 4
        text = re.sub(r'\s([.,?!:;])', r'\1', text)
        text = re.sub(r'\s+', ' ', text).strip()
        return text

    def semantic_load(self, txt_file_path, semantic_threshold):
        with open(txt_file_path, "r", encoding = 'utf-8') as f:
            sentences = f.read()
        sentences = self._clean_text(sentences)
        sentences = sent_tokenize(sentences)

        chunks = self.doc_splitter.semantic_splitting_v2(sentences=sentences, threshold=semantic_threshold)
        return chunks

  pattern = f"[^{vietnamese_chars}\d\s.,?!:;]"


In [8]:
import torch
import faiss
from langchain_community.retrievers import BM25Retriever
from sentence_transformers import SentenceTransformer
from rank_bm25 import BM25Okapi
import numpy as np
from typing import List
from langchain_core.documents import Document

class VectorDB:
    # build database with multiple collections
    def __init__(
            self,
            # documents: List[str],
            embed_model, # Pass vectordatabase instance
    ):
        self.semantic_index = {}
        self.bm25_index = {}
        self.bm25_id = {}
        self.embedding_model = embed_model
        self.documents = {}

    def _build_rv(self, documents):
        # Initialize BM25 index
        tokenized_docs = [tokenize_bm25(doc) for doc in documents]
        bm25_index = BM25Okapi(tokenized_docs)
        bm25_id = list(range(len(tokenized_docs)))
        return bm25_index, bm25_id

    def _build_db(self, documents):
        # 1. Gọi API để lấy embedding (trả về numpy array)
        # Class wrapper đã xử lý việc loop qua từng document
        embeddings = self.embedding_model.encode(documents)

        # 2. Chuẩn hóa vector (L2 Normalization) bằng Numpy
        # Để dùng Cosine Similarity với Faiss IndexFlatIP, vector phải được chuẩn hóa
        norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
        # Tránh lỗi chia cho 0
        embeddings = embeddings / np.maximum(norms, 1e-10)

        # 3. Chuyển sang float32 cho Faiss
        embeddings = embeddings.astype("float32")

        # 4. Tạo Index Faiss
        index = faiss.IndexFlatIP(embeddings.shape[1])
        index.add(embeddings)
        print(f"Initialize semantic index successfully. Size: {embeddings.shape}")
        return index

    def add_collection(self,
                       collection_name: str,
                       documents: List[str],
                       ):
        # add collection to database
        if collection_name not in self.semantic_index.keys():
            self.semantic_index[collection_name] = self._build_db(documents)
            self.bm25_index[collection_name], self.bm25_id[collection_name] = self._build_rv(documents)
            self.documents[collection_name] = documents
            print(f"Collection {collection_name} added successfully")
        else:
            print(f"Collection {collection_name} already exists")

    def semantic_search(self, query: str, top_k: int, collection_name):
        # 1. Encode query
        query_embedding = self.embedding_model.encode(query) # Trả về (1, dim) hoặc (dim,)

        # 2. Reshape nếu cần (đảm bảo là 2D array)
        if len(query_embedding.shape) == 1:
            query_embedding = query_embedding.reshape(1, -1)

        # 3. Chuẩn hóa query (L2 Norm)
        norm = np.linalg.norm(query_embedding, axis=1, keepdims=True)
        query_embedding = query_embedding / np.maximum(norm, 1e-10)

        # 4. Search
        distance, index = self.semantic_index[collection_name].search(query_embedding.astype("float32"), top_k)

        return distance[0], index[0]

    def text_search(self, query: str, top_k: int, collection_name: str):
        tokenized_query = tokenize_bm25(query)
        scores = self.bm25_index[collection_name].get_scores(tokenized_query)
        normalized_scores = normalize_score_bm25(scores)

        # Lấy top k index
        top_indices = np.argsort(normalized_scores)[::-1][:top_k]
        results = [(self.bm25_id[collection_name][idx], normalized_scores[idx]) for idx in top_indices]
        return results

    def hybrid_search(self, query_dense: str, query_sparse : str, top_k: int,collection_name: str, weights: list = [0.5, 0.5]):
        """
        Args:
            query: Câu truy vấn (String)
            top_k: Số lượng kết quả
            weights: [Semantic weight, Keyword weight]
        """
        # Cả semantic và keyword đều dùng chung string query đầu vào
        sem_distance, sem_index = self.semantic_search(query_dense, top_k, collection_name)
        text_results = self.text_search(query_sparse, top_k, collection_name)

        combined_scores = {}

        # Cộng điểm Semantic (weights[0])
        for idx, score in zip(sem_index, sem_distance):
            if idx != -1: # Faiss trả về -1 nếu không tìm thấy
                combined_scores[idx] = combined_scores.get(idx, 0.0) + weights[0] * score

        # Cộng điểm Keyword/BM25 (weights[1])
        for idx, score in text_results:
            combined_scores[idx] = combined_scores.get(idx, 0.0) + weights[1] * score

        # Sắp xếp kết quả cuối cùng
        sorted_results = sorted(list(combined_scores.items()), key=lambda x: x[1], reverse=True)[:top_k]

        final_docs = []
        for idx, score in sorted_results:
            doc = Document(
                page_content=self.documents[collection_name][idx],
                metadata={"id": int(idx), "score": float(score)}
            )
            final_docs.append(doc)

        return final_docs

In [21]:
from typing import List
import numpy as np

# 1. Tạo Adapter để LangChain hiểu được SentenceTransformer
class HelperEmbeddingsAdapter:
    def __init__(self, model):
        self.model = model

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        # Chuyển đổi output của SentenceTransformer (numpy) sang list (LangChain yêu cầu)
        embeddings = self.model.encode(texts)
        return embeddings.tolist()

    def embed_query(self, text: str) -> list[float]:
        # Tương tự cho câu query đơn lẻ
        embedding = self.model.encode(text)
        return embedding.tolist()

    # Hàm __call__ để dự phòng nếu LangChain gọi trực tiếp object
    def __call__(self, text: str) -> list[float]:
        return self.embed_query(text)

In [9]:
import re
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage
from langchain_experimental.utilities import PythonREPL
from langchain_community.vectorstores import FAISS, DistanceStrategy
import json

python_repl = PythonREPL()



class Router:
    def __init__(self, llm_mini, embed_model):
        self.llm_mini = llm_mini
        self.embed_model_lc = HelperEmbeddingsAdapter(embed_model)
        self.routes = {
            "toxic": [
                "How to make bom",
                "How to be a terrorist"
                "Articles insults the leaders",
                "Region discrimination"
                "Articles breach the sovereignty of Vietnamese territory"
            ],
            "study": [
                "Vietnamese Administrative Reform",
                "When did the Dien Bien Phu happen",
                "Summarizing Vietnamese history in the 20th century"

            ],
            "article": [
                "What is the price of gold today",
                "Which country is the host of World Cup"
            ],
            "score" : [
                "Analyzing the score of student 12A"
                "How many students is considered as excellent"
                "Visualizing the Literature score"
            ],
            "logic": [
                "Calculate the integral of x^2",
                "If A is taller than B and B is taller than C, who is tallest?",
                "Solve this equation: 2x + 5 = 15",
                "What is the probability of rolling a 6?",
                "Find the next number in sequence: 2, 4, 8, 16..."
            ],
            "greet": [
                "Hello",
                "How are you?",
                "What is your name?"
                "What is the meaning of life?"
            ]
        }
        self.vector_store = self._build_route_index()

        self.processor_prompt = ChatPromptTemplate.from_messages([
            ("system", """You are a strict query classifier. Classify the user query into exactly one of these categories.
Mission:
1. Classify the user query into exactly one of these genres:
- toxic: Questions that require guidance on illegal activities (tax evasion, weapons manufacturing, cyberattacks, etc.), and questions that are subversive to the Vietnamese state (violating sovereignty and the Party's policies).
- study: Academic questions about History, Geography
- article: finance, sport
- score: Questions about student scores, grades, statistics, or data analysis.
- logic: General math problems, logical puzzles, riddles, physics calculations, algebra, calculus.
- greet: basic communication, greetings
If the question belongs to study or article genre,proceed with the following tasks
1.1. Rewrite the query: Ensure clarity, complete subject and predicate, correct spelling errors, and return only strings (not lists).
1.2. Extract keywords: Proper nouns, place names, and important technical terms.
MANDATORY RULES:
- Return only a single valid JSON object.
- Do not include markdown ticks (```json ... ```).
- No further explanation.

Example 1:
User: Born Year of Ho Chi Minh?
Output: {{
    "genre": "study",
    "rewrite": "In what year was President Ho Chi Minh born?",
    "keywords": "Ho Chi Minh, born, year"
}}
Example 2:
User: Who is the student with the highest score?
Output: {{
    "genre": "score"
    }}
Example 3:
User: price gold?
Output: {{
    "genre": "article",
    "rewrite": "What is the price of gold today?",
    "keywords": "gold, price"
}}
Example 4:
User: How many students is considered as excellent?
Output: {{
    "genre": "score
    }}"""),
   ("user", """
   {query}
   """)

        ])
        self.llm_chain = self._build_llm_router()


    def _build_route_index(self):
        print("Building Index For Semantic Router... ")
        texts = []
        metadatas = []
        for agent, examples in self.routes.items():
            for example in examples:
                texts.append(example)
                metadatas.append({"agent": agent})

        return FAISS.from_texts(texts,
                                self.embed_model_lc,
                                metadatas=metadatas,
                                distance_strategy=DistanceStrategy.COSINE
                                )

    def _build_llm_router(self):
        chain = (
            self.processor_prompt
            | self.llm_mini
            | StrOutputParser()
        )

        return chain



    def _process_llm_result(self, response_text, query):
        text = response_text.strip()
        print(f"DEBUG Raw Output: {text}")

        try:
            # --- CHIẾN THUẬT 1: Thử parse chuẩn (Trường hợp lý tưởng) ---
            # Tìm JSON object bao trùm nhất (Greedy)
            match = re.search(r"\{.*\}", text, re.DOTALL)
            if match:
                json_str = match.group()
                # Fix lỗi quote
                if "'" in json_str and '"' not in json_str:
                     json_str = json_str.replace("'", '"')
                json_str = re.sub(r",\s*}", "}", json_str)

                # Thử parse
                data = json.loads(json_str)
                return data.get("rewrite", query), data.get("keywords", query), data.get("genre", "greet")

        except json.JSONDecodeError:
            # Nếu parse chuẩn thất bại (thường do lỗi "Extra data" như bạn gặp)
            print("   -> Lỗi JSON chuẩn. Chuyển sang chế độ Gộp (Merge)...")
            pass

        # --- CHIẾN THUẬT 2: Gộp nhiều JSON rời rạc (Trường hợp của bạn) ---
        try:
            # Tìm TẤT CẢ các đoạn {...} rời rạc (Non-greedy)
            # Regex này tìm từng cụm {} nhỏ nhất có thể
            matches = re.findall(r"\{.*?\}", text, re.DOTALL)

            if len(matches) > 1:
                merged_data = {}
                for m in matches:
                    try:
                        # Fix lỗi quote cho từng mảnh
                        if "'" in m and '"' not in m: m = m.replace("'", '"')
                        # Parse từng mảnh
                        partial_data = json.loads(m)
                        # Gộp vào dict tổng
                        merged_data.update(partial_data)
                    except:
                        continue # Bỏ qua mảnh lỗi

                print(f"   -> Đã gộp thành công: {merged_data}")
                return (
                    merged_data.get("rewrite", query),
                    merged_data.get("keywords", query),
                    merged_data.get("genre", "greet")
                )
        except Exception as e:
            print(f"   -> Lỗi khi gộp JSON: {e}")

        # --- FALLBACK ---
        print("Không tìm thấy JSON hợp lệ. Fallback về Greet.")
        return query, query, "greet"


    def decide_route(self, query: str):
        results = self.vector_store.similarity_search_with_score(query, k=1)
        doc, score = results[0]
        print(f"Semantic Score (Cosine): {score:.4f} -> {doc.metadata['agent']}")

        if score < 0.4:
            print(f"Fast-tracked to: {doc.metadata['agent']}")
            return query, query, doc.metadata["agent"]

        print("Score too low. Asking Qwen-Mini...")
        try:
            # Gọi invoke với dictionary input khớp với prompt {query}
            response_text = self.llm_chain.invoke({"query": query})
            return self._process_llm_result(response_text, query)
        except Exception as e:
            print(f"Lỗi LLM Router: {e}")
            return query, query, "greet"





class Main:
    def __init__(self, llm_mini, llm_large, vector_db, embed_model):
        self.llm_mini = llm_mini
        self.query_processor = Router(llm_mini, embed_model)
        self.llm_large = llm_large
        self.vector_db = vector_db
        self.python_repl = PythonREPL()
        self.answer_prompt_study = ChatPromptTemplate.from_messages([
            ("system", """You are a helpful AI assistant; your task is to answer question from user related to history and geography
SAFETY AND HONESTY RULES:
1. Only use the information in the "Context" section below to answer.
2. If you cannot find the information in the context, use your own knowledge, but prioritize the context.


OUTPUT FORMAT:
- Briefly step-by-step reasoning.

"""),
            ("user", """Context:
{context}

Query:
{question}
""")
        ])

        self.answer_prompt_article = ChatPromptTemplate.from_messages([
            ("system", """You are a helpful AI assistant; your task is to answer question from user related to sport and finance
SAFETY AND HONESTY RULES:
1. Only use the information in the "Context" section below to answer.
2. If you cannot find the information in the context, use your own knowledge, but prioritize the context.

OUTPUT FORMAT:
- Briefly step-by-step reasoning.

"""),
            ("user", """Context:
{context}

Query:
{question}
""")
        ])
        self.answer_prompt_greet = ChatPromptTemplate.from_messages([
            ("system", """You are a helpful AI assistant; your task is to greet users and introduce you as an assistant of answer question from user related to study, article, math, student score
            RULES: Remember to answer with positive thought
"""
)
        ])

    #rag direction methods
    #format docs resulted from retrieving(rag direction)
    def format_docs(self, docs):
        return "\n\n".join(f"{doc.page_content}" for doc in docs)

    #math direction methods
    #extract code generated from llm
    def extract_code(self, text):
        match = re.search(r"```python\n(.*?)\n```", text, re.DOTALL)
        return match.group(1) if match else None


    #study direction
    def get_answer_study(self, question_text, rewrite, keywords, top_k=5, weights=[0.7, 0.3]):
        if isinstance(rewrite, list):
            rewrite = " ".join(rewrite)

        print(f"DEBUG-Rewrite: {rewrite}")
        print(f"DEBUG-Keywords: {keywords}")

        docs = self.vector_db.hybrid_search(#query = rewritten_query,

                                            query_dense = rewrite,
                                            query_sparse = keywords,
                                            top_k=top_k,
                                            weights=weights,
                                            collection_name = 'study')
        context_text = self.format_docs(docs)

        chain = (
            {
                "context" : lambda x: context_text,
                "question": lambda x : question_text
            }
            | self.answer_prompt_study
            | self.llm_large
        )

        return chain.invoke(question_text)

    def get_answer_article(self, question_text, rewrite, keywords, top_k=5, weights=[0.7, 0.3]):
        if isinstance(rewrite, list):
            rewrite = " ".join(rewrite)

        print(f"DEBUG-Rewrite: {rewrite}")
        print(f"DEBUG-Keywords: {keywords}")

        docs = self.vector_db.hybrid_search(#query = rewritten_query,

                                            query_dense = rewrite,
                                            query_sparse = keywords,
                                            top_k=top_k,
                                            weights=weights,
                                            collection_name = 'article')
        context_text = self.format_docs(docs)

        chain = (
            {
                "context" : lambda x: context_text,
                "question": lambda x : question_text
            }
            | self.answer_prompt_article
            | self.llm_large
        )

        return chain.invoke(question_text)

    def get_answer_toxic(self, question_text):
        chain = (
            {
                "question": lambda x : question_text

            }
            | self.answer_prompt_toxic
            | self.llm_large
        )
        return chain.invoke(question_text)

    def get_answer_greet(self, question_text):
        chain = (
            {
                "question": lambda x : question_text,
            }
            | self.answer_prompt_greet
            | self.llm_mini
        )
        return chain.invoke(question_text)

    #math direction of TA
    def _clean_code(self, code):
        """Tự động thêm import hoặc sửa lỗi cú pháp cơ bản"""
        if "import math" not in code:
            code = "import math\nimport json\n" + code
        return code

    # --- MAIN METHOD: NÂNG CẤP ---
    def get_answer_math_logic(self, question_text, max_retries=3):
        # 1. Khởi tạo lịch sử hội thoại (Memory ngắn hạn)
        # Lưu ý: Chúng ta dùng list Message để duy trì ngữ cảnh cho việc sửa lỗi
        messages = [
            SystemMessage(content="""You are a helpful AI assistant; your task is to answer questions by writing Python code that accurately solves the user's question.

1. Write Python code to SOLVE the user's logic puzzle, math problem, or data request.
2. The code MUST print a friendly, natural language response explaining the result.
   - Bad: `print(x)`
   - Bad: `print("Answer: A")`
   - Good: `print(f"I calculated it, and the total is {total}. This is because...")`

Rules for Code:
1. The name of variable: Avoid using Python keywords (`lambda`, `class`, `return`, `min`, `max`, `sum`...). Should use `var_x`, `total_v`,...
Example:
Example A: The "Who is Tallest?" Logic Puzzle
User: "An is taller than Binh. Chi is taller than An. Who is the tallest?"

```python
# 1. Define relative heights (using an arbitrary base)
# Let Binh = 100 units
heights = {{}}
heights['Binh'] = 100
heights['An'] = heights['Binh'] + 10  # An is taller than Binh
heights['Chi'] = heights['An'] + 10    # Chi is taller than An

# 2. Sort to find the tallest
sorted_people = sorted(heights.items(), key=lambda x: x[1], reverse=True)
tallest_name = sorted_people[0][0]

# 3. Print a conversational explanation
print(f"Based on your description, **{{tallest_name}}** is the tallest.")
print("Here is the order from tallest to shortest:")
for name, height in sorted_people:
    print(f"- {{name}}")

```
Example B: The "Chicken and Rabbit" Problem (Algebra)
User: "A farm has 35 heads and 94 legs. How many chickens and rabbits?"
```python
from sympy import symbols, Eq, solve

# 1. Setup Symbols
c, r = symbols('c r') # c=chickens, r=rabbits

# 2. Equations
# Heads: c + r = 35
# Legs:  2c + 4r = 94
eq1 = Eq(c + r, 35)
eq2 = Eq(2*c + 4*r, 94)

# 3. Solve
sol = solve((eq1, eq2), (c, r))
chickens = sol[c]
rabbits = sol[r]

# 4. Conversational Output
print(f"I solved the system of equations and found the answer:")
print(f"- **{{rabbits}} Rabbits**")
print(f"- **{{chickens}} Chickens**")
print(f"Check: {{chickens}} + {{rabbits}} = 35 heads, and 2*{{chickens}} + 4*{{rabbits}} = 94 legs.")
```
Example C: General Math / Probability
User: "What is the probability of rolling a sum of 7 with two dice?"
```python
# 1. Calculate all outcomes
outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
total_combinations = len(outcomes)

# 2. Find winners (Sum = 7)
winners = [pair for pair in outcomes if sum(pair) == 7]
count = len(winners)

# 3. Calculate Stats
prob = count / total_combinations
percent = prob * 100

# 4. Conversational Output
print(f"There are {{total_combinations}} possible combinations for two dice.")
print(f"A sum of 7 appears {{count}} times: {{winners}}.")
print(f"So, the probability is **{{count}}/{{total_combinations}}** (approx **{{percent:.1f}}%**).")
```


"""

),
            HumanMessage(content=f"Câu hỏi: {question_text}")
        ]


        # 2. Vòng lặp Suy luận & Sửa lỗi (ReAct Loop)
        # 2. Vòng lặp Suy luận & Sửa lỗi (ReAct Loop)
        for attempt in range(max_retries):
            print(f"   [Logic] Attempt {attempt + 1}/{max_retries}...")

            # Bước A: Gọi LLM để sinh code
            try:
                ai_msg = self.llm_large.invoke(messages)
            except Exception as e:
                print(f"   [Logic] LLM Error: {e}")
                return "I apologize, but I encountered an error while processing your request."

            content = ai_msg.content if hasattr(ai_msg, 'content') else str(ai_msg)
            messages.append(ai_msg) # Lưu message của AI vào lịch sử

            # Bước B: Trích xuất Code
            code_block = self.extract_code(content)

            if not code_block:
                # Nếu không có code, có thể AI đã trả lời trực tiếp bằng lời
                # Nếu là lượt cuối cùng, trả về luôn nội dung đó
                if attempt == max_retries - 1:
                    return content

                # Nếu chưa phải lượt cuối, nhắc AI viết code (vì ta đang ở trong Logic Agent)
                messages.append(HumanMessage(content="You didn't provide any Python code. Please write the Python code to calculate the answer."))
                continue

            # Bước C: Thực thi Code
            code_block = self._clean_code(code_block)
            print(f"   [Logic] Executing Code...")

            try:
                # Chạy code và lấy output (print)
                exec_result = self.python_repl.run(code_block)
                exec_result = str(exec_result).strip()
                print(f"   [Logic] Output: {exec_result}")

                # --- THAY ĐỔI QUAN TRỌNG: XỬ LÝ KẾT QUẢ ---

                # 1. Nếu code chạy nhưng không in ra gì
                if not exec_result:
                    feedback = "The code executed successfully but printed nothing. Please rewrite the code to PRINT the final result."
                    messages.append(HumanMessage(content=feedback))
                    continue

                # 2. Nếu có kết quả: Đưa kết quả lại cho LLM để sinh câu trả lời tự nhiên
                # Đây là bước "Reasoning based on Tool Output"
                interpretation_prompt = (
                    f"Code executed successfully. Output:\n{exec_result}\n\n"
                    "Based on this output, please provide a clear, natural language answer to the user's question."
                )

                # Gọi LLM lần cuối để diễn giải kết quả
                messages.append(HumanMessage(content=interpretation_prompt))
                final_response_msg = self.llm_large.invoke(messages)

                return final_response_msg.content # Trả về câu trả lời tự nhiên

            except Exception as e:
                # Bước E: Xử lý lỗi (Self-Correction)
                error_msg = str(e)
                print(f"   [Logic] Runtime Error: {error_msg}")

                # Gửi lỗi lại cho LLM để nó tự sửa code
                fix_prompt = f"The code encountered an error: {error_msg}. Please rewrite the Python code to fix this."
                messages.append(HumanMessage(content=fix_prompt))
                continue

        # 3. Fallback (Nếu hết lượt mà vẫn lỗi)
        print("   [Logic] Retries exhausted.")
        return "I tried to run the calculation multiple times but encountered errors. Please check the logic."

    def get_answer_score(self, question_text, csv_path = config.csv_path):
        messages = [
        SystemMessage(content=f"""You are a Python Expert and Data Analyst.


RESOURCES:
- You have a CSV file at: '{csv_path}'
- Columns: "name", "score

TASK:
1. Write Python code to load the CSV using pandas.
2. Calculate the answer based on the user's query.
3. The code MUST print the final answer in a friendly, natural language format.
   - ❌ Bad: `print(5)`
   - ❌ Bad: `print("Answer: A")`
   - ✅ Good: `print(f"The average Math score is {{avg_score:.2f}}.")`
   - ✅ Good: `print(f"The student with the highest score is {{name}} with {{score}} points.")`

TIPS:
- Use `pd.read_csv('{csv_path}')` to load data.
- Handle potential empty results gracefully.
- If the user asks for a list, print it nicely (e.g., bullet points).

Return ONLY the Python code block.


"""

),
            HumanMessage(content=f"Câu hỏi: {question_text}")
        ]


        # 2. Vòng lặp Suy luận & Sửa lỗi (ReAct Loop)
        # 2. Vòng lặp Suy luận & Sửa lỗi (ReAct Loop)
        for attempt in range(3):
            print(f"   [Score Agent] Attempt {attempt+1}...")

            # 1. Get Code from LLM
            try:
                ai_msg = self.llm_large.invoke(messages)
                messages.append(ai_msg)
            except Exception as e:
                return f"System Error: {e}"
            content = ai_msg.content if hasattr(ai_msg, 'content') else str(ai_msg)
            # 2. Extract Code
            code = self.extract_code(content)
            if not code:
                # If LLM replied with text only, return it (it might be a simple refusal or answer)
                return content

            # 3. Execute Code
            try:
                # Add import if missing
                if "import pandas" not in code:
                    code = "import pandas as pd\n" + code

                output = self.python_repl.run(code)
                print(f"   [Output]: {output}")

                # Check if output is valid
                if output.strip():
                    return output.strip() # Return the print output directly
                else:
                    # Feedback loop: Code ran but printed nothing
                    messages.append(HumanMessage(content="The code ran successfully but printed nothing. Please modify the code to PRINT the final answer string."))

            except Exception as e:
                print(f"   [Error]: {e}")
                # Feedback loop: Code failed
                messages.append(HumanMessage(content=f"Runtime Error: {e}. Please fix the python code."))

        return "I tried to analyze the data but encountered technical errors."


    def flow(self, q_text):
        """
        Hàm chính nhận 1 dictionary từ file JSON và trả về câu trả lời.

        """
        final_answer = ""



        # 1. Format choices thành string A. ... B. ...


        # 2. Router & Processing (Gọi 1 lần duy nhất)

        rewrite, keywords, genre = self.query_processor.decide_route(q_text)
        print(f"ROUTER DECISION: {genre}")

        # 3. Routing


        try:
            if genre == "logic":
                final_answer = self.get_answer_math_logic(q_text)

            elif genre == "study":
                final_answer = self.get_answer_study(q_text, rewrite, keywords)
            elif genre == "article":
                final_answer = self.get_answer_article(q_text, rewrite, keywords)
            elif genre == "toxic":
                final_answer = self.get_answer_toxic(q_text)
            elif genre == "score":
                final_answer = self.get_answer_score(q_text)


            else: # default as greet
                final_answer = self.get_answer_greet(q_text)

        except Exception as e:
            print(f"CRITICAL ERROR in flow: {e}")

        return final_answer

# Interface

In [18]:
import gradio as gr
def ui(app_instance):
    def generate_response(message, history):
        try:
            response = app_instance.flow(message)
            return response
        except Exception as e:
            return f"Error {e}"

    demo = gr.ChatInterface(
        fn = generate_response,
        title = "Agentic RAG System",
        description = """Brilliant Agentic System Answering question in different domain
        - **Study**: History, Geography
        - **Article**: Sport, Gold Price
        - **Logic**: Math, Logical puzzle(coding support)
        - **Score**: Student score analysis
        """,
        theme= "soft",

        examples=[
            "Chiến thắng Điện Biên Phủ năm nào?",          # Study
            "Giá vàng hôm nay thế nào?",                  # Article
            "Tính tổng của 15 và 25?",                    # Logic
            "Học sinh nào có điểm cao nhất?",             # Score
            "Hello bot",                                  # Greet
            "Giải phương trình 2x + 5 = 15"               # Logic
        ],
        cache_examples=False,

    )
    return demo

# Main

In [14]:
# initialize vector store
embed_model = SentenceTransformer(Config.embed_model)


vector_db = VectorDB(embed_model=embed_model)
loader = Loader(embed_model=embed_model)

# Load dữ liệu vào VectorDB
print("   -> Indexing Study Data...")
study_chunks = loader.semantic_load(config.study_path, semantic_threshold=0.5)
vector_db.add_collection("study", study_chunks)

print("   -> Indexing Article Data...")
article_chunks = loader.semantic_load(config.article_path, semantic_threshold=0.5)
vector_db.add_collection("article", article_chunks)

modules.json:   0%|          | 0.00/573 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/997 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/18.7k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/58.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/1.49k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/1.21G [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/1.16M [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/4.69M [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/33.4M [00:00<?, ?B/s]

added_tokens.json:   0%|          | 0.00/35.0 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/662 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/312 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/134 [00:00<?, ?B/s]

2_Dense/model.safetensors:   0%|          | 0.00/9.44M [00:00<?, ?B/s]

config.json:   0%|          | 0.00/134 [00:00<?, ?B/s]

3_Dense/model.safetensors:   0%|          | 0.00/9.44M [00:00<?, ?B/s]

   -> Indexing Study Data...
Encode for chunking sucessfully
Initialize semantic index successfully. Size: (415, 768)
Collection study added successfully
   -> Indexing Article Data...
Encode for chunking sucessfully
Initialize semantic index successfully. Size: (66, 768)
Collection article added successfully


In [15]:
# initialize lm
llm_router = get_router()
llm_agent = get_agent()



Loading router model...


tokenizer_config.json: 0.00B [00:00, ?B/s]

vocab.json: 0.00B [00:00, ?B/s]

merges.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

config.json:   0%|          | 0.00/660 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/3.09G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/242 [00:00<?, ?B/s]

Device set to use cuda:0


Loading router model...


tokenizer_config.json: 0.00B [00:00, ?B/s]

vocab.json: 0.00B [00:00, ?B/s]

merges.txt: 0.00B [00:00, ?B/s]

tokenizer.json:   0%|          | 0.00/11.4M [00:00<?, ?B/s]

config.json:   0%|          | 0.00/726 [00:00<?, ?B/s]

model.safetensors.index.json: 0.00B [00:00, ?B/s]

Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

model-00002-of-00003.safetensors:   0%|          | 0.00/3.99G [00:00<?, ?B/s]

model-00001-of-00003.safetensors:   0%|          | 0.00/3.96G [00:00<?, ?B/s]

model-00003-of-00003.safetensors:   0%|          | 0.00/99.6M [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/239 [00:00<?, ?B/s]

Device set to use cuda:0


In [22]:
app = Main(llm_mini = llm_router, llm_large = llm_agent, vector_db = vector_db, embed_model=embed_model)



Building Index For Semantic Router... 


In [None]:
print("\n==========================================")
print("HỆ THỐNG ĐÃ SẴN SÀNG! (Gõ 'exit' để thoát)")
print("==========================================")

# List câu hỏi test để tự động chạy thử 1 vòng
test_queries = [
    # "Chiến thắng Điện Biên Phủ năm nào?",          # Study
    # "Giá vàng hôm nay thế nào?",                  # Article
    # "Tính tổng của 15 và 25?",                    # Logic
    "Học sinh nào có điểm cao nhất?",             # Score
    "Hello bot"                                   # Greet
]

# Chạy test tự động trước
print("\n--- Running Auto-Tests ---")
for q in test_queries:
    print(f"\nTest Query: {q}")
    result = app.flow(q)
    print(f"   [ANSWER]: {result}\n")

# Chạy vòng lặp chat
while True:
    user_input = input("\nBạn: ")
    if user_input.lower() in ["exit", "quit"]:
        break

    payload = {"qid": "user", "question": user_input, "choices": []}
    response = app.flow(payload)
    print(f"Bot: {response}")


HỆ THỐNG ĐÃ SẴN SÀNG! (Gõ 'exit' để thoát)

--- Running Auto-Tests ---

Test Query: Học sinh nào có điểm cao nhất?
Semantic Score (Cosine): 0.9709 -> score
Score too low. Asking Qwen-Mini...
DEBUG Raw Output: Output: {
        "genre": "score",
        "rewrite": "Find out who has the best grade among all students.",
        "keywords": "best grade, student"
    }
ROUTER DECISION: score
   [Score Agent] Attempt 1...
   [ANSWER]:  Điểm của họ là bao nhiêu?
Okay, let's see. The user is asking which student has the highest score and what that score is. So first, I need to load the CSV file using pandas. The file is called'student_scores.csv' and has columns 'name' and'score'. 

First step: Load the data. I'll use pd.read_csv to read the file. Then, I need to find the maximum score and the corresponding name. 

Wait, but how do I handle if there are multiple students with the same highest score? The question says "học sinh nào" which is singular, so maybe assume there's one unique top sco

In [23]:
ui = ui(app)
ui.launch(share=True, debug=True)

  self.chatbot = Chatbot(


Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://4fa04daf002b8fee7f.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


Semantic Score (Cosine): 0.2756 -> logic
Fast-tracked to: logic
ROUTER DECISION: logic
   [Logic] Attempt 1/3...




   [Logic] Executing Code...
   [Logic] Output: NameError("name 'x' is not defined")
   [Logic] Runtime Error: CUDA out of memory. Tried to allocate 116.00 MiB. GPU 0 has a total capacity of 22.16 GiB of which 29.38 MiB is free. Process 35333 has 22.13 GiB memory in use. Of the allocated memory 21.56 GiB is allocated by PyTorch, and 350.56 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
   [Logic] Attempt 2/3...
   [Logic] LLM Error: CUDA out of memory. Tried to allocate 138.00 MiB. GPU 0 has a total capacity of 22.16 GiB of which 35.38 MiB is free. Process 35333 has 22.12 GiB memory in use. Of the allocated memory 21.57 GiB is allocated by PyTorch, and 332.25 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try 

