# RAG Demo (Bilingual): Hybrid Retrieval + Evaluation (Precision@k, Recall@k, MRR)

This notebook is a **teaching-friendly** demo of key Retrieval-Augmented Generation (RAG) concepts.

## What you will build
- **Offline phase** (indexing):
  - documents → chunking → embeddings → **FAISS ANN index**
  - documents → tokenization → **BM25 index**
- **Online phase** (question answering):
  - query → hybrid retrieval (**vector + BM25**) → metadata filtering → context building → *grounded* answer
- **Evaluation**:
  - **Precision@k**, **Recall@k**, **MRR**
- **Auditability & latency**:
  - structured logs + timing breakdown

The examples and comments are bilingual: Albanian + English.

In [None]:
# =========================================================
# 0) Install dependencies (Colab)
# =========================================================
# faiss-cpu: vector search / ANN indexing
# sentence-transformers: embeddings for docs + queries
# rank-bm25: keyword search baseline
# pandas/numpy: data handling
!pip -q install faiss-cpu sentence-transformers rank-bm25 pandas numpy

In [None]:
# =========================================================
# 1) Imports + configuration
# =========================================================
import os
import time
import re
from dataclasses import dataclass

import numpy as np
import pandas as pd
import faiss

from sentence_transformers import SentenceTransformer
from rank_bm25 import BM25Okapi

# ---------------------------------------------------------
# OPTIONAL: Silence HF "HF_TOKEN missing" warning.
# Public models work without authentication; this keeps output clean.
# ---------------------------------------------------------
os.environ["HF_HUB_DISABLE_IMPLICIT_TOKEN"] = "1"

np.random.seed(42)

## 2) Toy corpus (bilingual) with metadata

We use a tiny corpus that intentionally includes:
- **new vs old versions** (year metadata)
- **official vs unofficial** sources
- different **authorities**
- Albanian + English text

This allows us to demonstrate why **filtering** and **audit logs** matter.

In [None]:
# =========================================================
# 2) Create a bilingual toy corpus with metadata
# =========================================================
docs = [
    {
        "doc_id": "PROC-001",
        "title": "Procedura zyrtare: Përmbytjet (2023) / Official Flood Procedure (2023)",
        "text": (
            "Në rast rreziku për përmbytje, Bashkia lëshon paralajmërime zyrtare "
            "dhe njofton popullatën përmes kanaleve të komunikimit."
        ),
        "source": "official",
        "authority": "municipality",
        "lang": "sq",
        "year": 2023,
        "version": "v2",
    },
    {
        "doc_id": "PROC-001-OLD",
        "title": "Procedura e vjetër: Përmbytjet (2019) / Old Flood Procedure (2019)",
        "text": (
            "Në rast përmbytjeje, njoftimet bëhen në mënyrë lokale nga vullnetarët. "
            "(Dokument i vjetër, jo i përditësuar)."
        ),
        "source": "official",
        "authority": "municipality",
        "lang": "sq",
        "year": 2019,
        "version": "v1",
    },
    {
        "doc_id": "EVAC-010",
        "title": "Protokoll evakuimi (2022) / Evacuation Protocol (2022)",
        "text": (
            "Evakuimi fillon kur Shërbimi i Emergjencave lëshon urdhër evakuimi "
            "dhe koordinohet me policinë dhe zjarrfikëset."
        ),
        "source": "official",
        "authority": "civil_protection",
        "lang": "sq",
        "year": 2022,
        "version": "v1",
    },
    {
        "doc_id": "NATO-ENG-010",
        "title": "NATO Flood Response SOP (2021)",
        "text": (
            "The incident commander issues warnings and coordinates evacuation "
            "with local authorities and emergency responders."
        ),
        "source": "official",
        "authority": "NATO",
        "lang": "en",
        "year": 2021,
        "version": "v1",
    },
    {
        "doc_id": "BLOG-777",
        "title": "Postim jozyrtar (blog) / Unofficial blog post",
        "text": (
            "Dikush mund të shkruajë online çfarëdo për përmbytjet; "
            "ky burim nuk është zyrtar dhe mund të jetë i pasaktë."
        ),
        "source": "unofficial",
        "authority": "blog",
        "lang": "sq",
        "year": 2020,
        "version": "v1",
    },
    {
        "doc_id": "LAW-100",
        "title": "Referencë ligjore (2020) / Legal Reference (2020)",
        "text": (
            "Neni 12: Autoriteti përgjegjës për menaxhimin e emergjencave "
            "përcaktohet nga ligji dhe aktet nënligjore përkatëse."
        ),
        "source": "official",
        "authority": "law",
        "lang": "sq",
        "year": 2020,
        "version": "v1",
    },
]

df = pd.DataFrame(docs)
df

Unnamed: 0,doc_id,title,text,source,authority,lang,year,version
0,PROC-001,Procedura zyrtare: Përmbytjet (2023) / Officia...,"Në rast rreziku për përmbytje, Bashkia lëshon ...",official,municipality,sq,2023,v2
1,PROC-001-OLD,Procedura e vjetër: Përmbytjet (2019) / Old Fl...,"Në rast përmbytjeje, njoftimet bëhen në mënyrë...",official,municipality,sq,2019,v1
2,EVAC-010,Protokoll evakuimi (2022) / Evacuation Protoco...,Evakuimi fillon kur Shërbimi i Emergjencave lë...,official,civil_protection,sq,2022,v1
3,NATO-ENG-010,NATO Flood Response SOP (2021),The incident commander issues warnings and coo...,official,NATO,en,2021,v1
4,BLOG-777,Postim jozyrtar (blog) / Unofficial blog post,Dikush mund të shkruajë online çfarëdo për për...,unofficial,blog,sq,2020,v1
5,LAW-100,Referencë ligjore (2020) / Legal Reference (2020),Neni 12: Autoriteti përgjegjës për menaxhimin ...,official,law,sq,2020,v1


## 3) Chunking (safe) + `chunks_df`

Chunking is often needed because:
- documents can be long
- retrieval works better at "passage" level
- LLM context windows are limited

We use a **simple character-based chunker** that is safe:
- no infinite loops
- handles None / NaN
- validates overlap < chunk_size

In [None]:
# =========================================================
# 3) Safe chunking (no infinite loops)
# =========================================================
def chunk_text(text, chunk_size=180, overlap=40):
    # Handle missing values safely
    if text is None or (isinstance(text, float) and pd.isna(text)):
        return []
    text = str(text)

    # Validate parameters to avoid infinite loops
    if chunk_size <= 0:
        raise ValueError("chunk_size must be > 0")
    if overlap < 0:
        raise ValueError("overlap must be >= 0")
    if overlap >= chunk_size:
        raise ValueError("overlap must be < chunk_size (otherwise start won't advance)")

    chunks = []
    start = 0
    step = chunk_size - overlap  # guarantees progress

    while start < len(text):
        end = min(len(text), start + chunk_size)
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)

        # Stop if we've reached the end
        if end == len(text):
            break

        # Always move forward
        start += step

    return chunks

# Build a chunk-level table from the document-level df
rows = []
for _, r in df.iterrows():
    chunks = chunk_text(r["text"], chunk_size=160, overlap=30)
    for j, ch in enumerate(chunks):
        rows.append({
            "chunk_id": f"{r['doc_id']}::c{j}",
            "doc_id": r["doc_id"],
            "title": r["title"],
            "chunk_text": ch,
            "source": r["source"],
            "authority": r["authority"],
            "lang": r["lang"],
            "year": int(r["year"]),
            "version": r["version"],
        })

chunks_df = pd.DataFrame(rows)
chunks_df

Unnamed: 0,chunk_id,doc_id,title,chunk_text,source,authority,lang,year,version
0,PROC-001::c0,PROC-001,Procedura zyrtare: Përmbytjet (2023) / Officia...,"Në rast rreziku për përmbytje, Bashkia lëshon ...",official,municipality,sq,2023,v2
1,PROC-001-OLD::c0,PROC-001-OLD,Procedura e vjetër: Përmbytjet (2019) / Old Fl...,"Në rast përmbytjeje, njoftimet bëhen në mënyrë...",official,municipality,sq,2019,v1
2,EVAC-010::c0,EVAC-010,Protokoll evakuimi (2022) / Evacuation Protoco...,Evakuimi fillon kur Shërbimi i Emergjencave lë...,official,civil_protection,sq,2022,v1
3,NATO-ENG-010::c0,NATO-ENG-010,NATO Flood Response SOP (2021),The incident commander issues warnings and coo...,official,NATO,en,2021,v1
4,BLOG-777::c0,BLOG-777,Postim jozyrtar (blog) / Unofficial blog post,Dikush mund të shkruajë online çfarëdo për për...,unofficial,blog,sq,2020,v1
5,LAW-100::c0,LAW-100,Referencë ligjore (2020) / Legal Reference (2020),Neni 12: Autoriteti përgjegjës për menaxhimin ...,official,law,sq,2020,v1
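
In this toy corpus every text is shorter than `chunk_size=160`, so each document yields a single chunk (`c0`). To see how the sliding window behaves on longer text, here is a minimal sketch that computes only the character spans. The helper name `chunk_spans` is an illustrative assumption, not part of the notebook; it mirrors the stepping rule used by `chunk_text`.

```python
def chunk_spans(length, chunk_size=160, overlap=30):
    """Return (start, end) character spans for a sliding window.

    Hypothetical helper for illustration: each window starts
    chunk_size - overlap characters after the previous one.
    """
    if chunk_size <= 0 or not (0 <= overlap < chunk_size):
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    spans, start = [], 0
    step = chunk_size - overlap  # always positive, so the loop terminates
    while start < length:
        end = min(length, start + chunk_size)
        spans.append((start, end))
        if end == length:
            break
        start += step
    return spans

spans = chunk_spans(400)
print(spans)  # [(0, 160), (130, 290), (260, 400)]
```

Consecutive spans share 30 characters, which is what lets a sentence that straddles a boundary appear whole in at least one chunk.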


## 4) Offline indexing: embeddings + FAISS ANN + BM25

We build **two indices**:
1) **FAISS HNSW** for semantic (vector) search
2) **BM25** for lexical (keyword) search

Later, we combine them in **hybrid retrieval**.
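
Because the embeddings are L2-normalized before indexing, dot product, cosine similarity, and squared Euclidean distance all induce the same ranking. The sketch below checks the identity `||a - b||^2 = 2 - 2 (a·b)` for unit vectors in plain Python; the two example vectors are made up for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sq_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Invented example vectors, normalized to unit length.
a = normalize([3.0, 4.0, 0.0])
b = normalize([1.0, 2.0, 2.0])

cosine = dot(a, b)   # for unit vectors, dot product == cosine similarity
dist2 = sq_l2(a, b)  # squared Euclidean distance
print(round(cosine, 4), round(dist2, 4), round(2 - 2 * cosine, 4))
# 0.7333 0.5333 0.5333
```

This matters when reading FAISS scores: an index using the L2 metric returns distances (smaller is better), while an inner-product index returns similarities (larger is better).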

In [None]:
# =========================================================
# 4) Embeddings + FAISS ANN + BM25 (offline)
# =========================================================

# Multilingual embeddings (works for Albanian + English)
embed_model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# Encode chunk texts; normalize so dot-product == cosine similarity
X = embed_model.encode(
    chunks_df["chunk_text"].tolist(),
    normalize_embeddings=True,
    show_progress_bar=True
).astype("float32")

d = X.shape[1]

# FAISS ANN index (HNSW)
index = faiss.IndexHNSWFlat(d, 32)   # M=32
index.hnsw.efSearch = 64             # higher => better recall, slower
index.add(X)

# Simple tokenizer (demo)
def tokenize(text):
    text = text.lower()
    # Keep basic latin chars + Albanian ë/ç + digits; replace others with spaces
    text = re.sub(r"[^a-zëç0-9\s]", " ", text)
    return [t for t in text.split() if t]

# BM25 index over chunk texts
bm25 = BM25Okapi([tokenize(t) for t in chunks_df["chunk_text"].tolist()])

print(f"Built indexes over {len(chunks_df)} chunks | Embedding dim = {d}")

Built indexes over 6 chunks | Embedding dim = 384
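
To make the BM25 side concrete, the sketch below shows what a tokenizer of this kind produces for Albanian text, and why an exact identifier such as "Neni 12" survives as matchable tokens. It re-declares a tokenizer with the same regular expression so the snippet runs on its own; the name `demo_tokenize` is an illustrative assumption.

```python
import re

def demo_tokenize(text):
    """Lowercase, replace everything except a-z, ë, ç, digits, and whitespace
    with spaces, then split on whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-zëç0-9\s]", " ", text)
    return text.split()

doc_tokens = demo_tokenize("Neni 12: Autoriteti përgjegjës")
query_tokens = demo_tokenize("Neni 12")
print(doc_tokens)  # ['neni', '12', 'autoriteti', 'përgjegjës']

# Both query tokens appear in the document tokens, so a lexical scorer can match them.
shared = set(query_tokens) & set(doc_tokens)
print(sorted(shared))  # ['12', 'neni']
```

Characters outside the allowed set, such as other accented letters, become spaces. That is acceptable for this demo but would split words in other languages.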


## 5) Retrieval functions (vector / BM25 / hybrid) + filtering

**Hybrid idea:** combine semantic similarity (vector) with lexical match (BM25).  
This is often stronger than either alone, especially for mixed query types:
- paraphrases (vector helps)
- exact identifiers / legal references (BM25 helps)

Then we apply metadata filters:
- only official sources
- year >= threshold (avoid outdated content)
- language whitelist
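
The weighted combination can be sketched on made-up scores. This is a minimal illustration, assuming min-max normalization per retriever and weights of 0.65 / 0.35; the candidate ids and raw scores are invented, and `fuse` is a hypothetical helper rather than the notebook's `hybrid_search`.

```python
def min_max(scores):
    """Rescale a dict of raw scores to [0, 1]; all-equal scores map to 0.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}

def fuse(vec_scores, kw_scores, w_vec=0.65, w_kw=0.35):
    """Weighted sum of normalized scores; a missing candidate contributes 0."""
    v, k = min_max(vec_scores), min_max(kw_scores)
    ids = set(v) | set(k)
    fused = {i: w_vec * v.get(i, 0.0) + w_kw * k.get(i, 0.0) for i in ids}
    return sorted(fused.items(), key=lambda kv: -kv[1])

# Invented example: cosine-like vector scores and BM25-like keyword scores.
vec_scores = {"A": 0.82, "B": 0.75, "C": 0.40}
kw_scores = {"B": 3.1, "C": 0.5, "D": 1.8}

ranked = fuse(vec_scores, kw_scores)
for doc, score in ranked:
    print(doc, round(score, 3))
```

Here "B" ranks first because it scores well with both retrievers, while "A" (vector only) and "D" (keyword only) are each limited by the weight of the single retriever that found them.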

In [None]:
# =========================================================
# 5) Retrieval (vector / BM25 / hybrid) + metadata filtering
# =========================================================

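def vector_search(query, k=10):
    # Embed the query and retrieve up to k neighbours from the FAISS HNSW index.
    q = embed_model.encode([query], normalize_embeddings=True).astype("float32")
    D, I = index.search(q, min(k, index.ntotal))
    # IndexHNSWFlat defaults to the L2 metric, so D holds squared L2 distances
    # (smaller = closer). For unit vectors, cosine = 1 - d/2, which converts them
    # to similarities (larger = better), as hybrid_search expects.
    # FAISS pads missing results with index -1; drop those entries.
    return [(int(i), 1.0 - float(s) / 2.0) for i, s in zip(I[0], D[0]) if i >= 0]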

def bm25_search(query, k=10):
    scores = bm25.get_scores(tokenize(query))
    top = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in top]

def hybrid_search(query, k=10, w_vec=0.65, w_kw=0.35, pool=30):
    # 1) Candidate pool from both retrievers
    vec = vector_search(query, k=pool)
    kw  = bm25_search(query, k=pool)
    cand = set([i for i,_ in vec] + [i for i,_ in kw])

    # 2) Score dictionaries
    vec_d = dict(vec)
    kw_d  = dict(kw)

    # 3) Normalize both score ranges to [0,1] so weighting is meaningful
    vec_vals = np.array(list(vec_d.values()) + [0.0], dtype="float32")
    kw_vals  = np.array(list(kw_d.values())  + [0.0], dtype="float32")
    vmin, vmax = float(vec_vals.min()), float(vec_vals.max())
    kmin, kmax = float(kw_vals.min()), float(kw_vals.max())

    def norm(x, mn, mx):
        return 0.0 if mx == mn else (x - mn) / (mx - mn)

    # 4) Weighted combination
    scored = []
    for i in cand:
        sv = norm(vec_d.get(i, 0.0), vmin, vmax)
        sk = norm(kw_d.get(i,  0.0), kmin, kmax)
        scored.append((i, float(w_vec*sv + w_kw*sk), float(sv), float(sk)))

    scored.sort(key=lambda x: -x[1])
    return scored[:k]

def apply_filters(items, only_official=True, year_gte=None, langs=None):
    out = []
    for item in items:
        i = item[0]
        r = chunks_df.iloc[i]
        if only_official and r["source"] != "official":
            continue
        if year_gte is not None and int(r["year"]) < int(year_gte):
            continue
        if langs is not None and r["lang"] not in langs:
            continue
        out.append(item)
    return out

## 6) Context building + simple grounded answer (no API keys)

To keep the demo fully runnable without external keys:
- we build a context string (what you'd feed to an LLM)
- we produce a *toy grounded answer* by returning the context line that shares the most tokens with the query

In a real RAG system, you'd replace this step with an LLM call.
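
As a rough sketch of what that replacement step could look like, the snippet below only assembles a prompt string from the retrieved context; it does not call any model. The function name `build_prompt` and the instruction wording are assumptions for illustration.

```python
def build_prompt(query, context):
    """Assemble a grounded-QA prompt (hypothetical template, no model call)."""
    if not context.strip():
        # Nothing retrieved: signal the caller to refuse instead of guessing.
        return None
    return (
        "Answer the question using ONLY the context below. "
        "Cite the chunk id in square brackets. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

# Context line in the same "[chunk_id | source | year | lang] text" layout.
example_context = (
    "[PROC-001::c0 | official | 2023 | sq] Në rast rreziku për përmbytje, "
    "Bashkia lëshon paralajmërime zyrtare dhe njofton popullatën përmes "
    "kanaleve të komunikimit."
)
prompt = build_prompt("Kush lëshon paralajmërime për përmbytje?", example_context)
print(prompt)
```

Keeping the chunk id and metadata inside the context gives the model something concrete to cite, which in turn makes the answer checkable against the audit log.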

In [None]:
# =========================================================
# 6) Context construction + toy grounded answer
# =========================================================
def build_context(top_items, max_chars=900):
    ctx_lines = []
    total = 0
    for item in top_items:
        i = item[0]
        row = chunks_df.iloc[i]
        line = f"[{row['chunk_id']} | {row['source']} | {row['year']} | {row['lang']}] {row['chunk_text']}"
        if total + len(line) > max_chars:
            break
        ctx_lines.append(line)
        total += len(line)
    return "\n".join(ctx_lines)

def grounded_answer(query, context):
    # If nothing retrieved, say we have no evidence (this is "grounded refusal")
    if not context.strip():
        return "Nuk gjeta evidencë në dokumentet e rikthyera. / I couldn't find evidence in the retrieved documents."

    # Token-overlap heuristic: choose the single most overlapping context line
    q_tokens = set(tokenize(query))
    best_line, best_score = None, -1
    for line in context.splitlines():
        l_tokens = set(tokenize(line))
        score = len(q_tokens & l_tokens)
        if score > best_score:
            best_score = score
            best_line = line

    return "Grounded answer (based on retrieved context):\n" + best_line

## 7) End-to-end RAG runner with logs (auditability) + timing (latency)

We log:
- query, retrieval mode, filters
- embedding time, search time, "generation" time
- exactly which chunks were returned (auditable evidence)
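
A log record is most useful for auditing when it can be written out and inspected later. The sketch below shows one way to serialize such a record as a single JSON line using only the standard library; `AuditRecord` and its fields are a simplified, hypothetical stand-in for the notebook's log structure, and the timing value is a placeholder.

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class AuditRecord:
    """Simplified, hypothetical log record (not the notebook's RAGLog)."""
    query: str
    retrieval_mode: str
    filters: dict
    t_search_ms: float
    retrieved_chunk_ids: list = field(default_factory=list)

def to_json_line(record):
    """Serialize one record as a single JSON line; keep non-ASCII readable."""
    return json.dumps(asdict(record), ensure_ascii=False, sort_keys=True)

rec = AuditRecord(
    query="Kur fillon evakuimi?",
    retrieval_mode="hybrid",
    filters={"only_official": True, "year_gte": 2020, "langs": ["en", "sq"]},
    t_search_ms=12.5,  # placeholder value, not a measurement
    retrieved_chunk_ids=["EVAC-010::c0"],
)
line = to_json_line(rec)
print(line)
```

Appending one such line per query to a file yields a log that can later be filtered by query, filter settings, or chunk id.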

In [None]:
# =========================================================
# 7) End-to-end RAG function with a structured log
# =========================================================
@dataclass
class RAGLog:
    query: str
    retrieval_mode: str
    filters: dict
    t_embed_ms: float
    t_search_ms: float
    t_generate_ms: float
    retrieved: list
    context: str
    answer: str

def rag_query(query, mode="hybrid", k=5, only_official=True, year_gte=2020, langs=None):
    # 1) Measure embedding time explicitly (for lecture/latency discussion)
    t0 = time.perf_counter()
    _ = embed_model.encode([query], normalize_embeddings=True).astype("float32")
    t1 = time.perf_counter()

    # 2) Retrieval time. Note: vector and hybrid modes embed the query again
    #    inside vector_search, so this figure includes a second embedding pass.
    t2 = time.perf_counter()
    if mode == "vector":
        raw = [(i, s, None, None) for i, s in vector_search(query, k=50)]
    elif mode == "bm25":
        raw = [(i, s, None, None) for i, s in bm25_search(query, k=50)]
    else:
        raw = hybrid_search(query, k=50)
    t3 = time.perf_counter()

    # 3) Filter + top-k
    top = apply_filters(raw, only_official=only_official, year_gte=year_gte, langs=langs)[:k]

    # 4) Build context
    context = build_context(top)

    # 5) "Generate" time (toy grounded answer)
    t4 = time.perf_counter()
    ans = grounded_answer(query, context)
    t5 = time.perf_counter()

    # 6) Build auditable retrieval evidence
    retrieved = []
    for item in top:
        i = item[0]
        row = chunks_df.iloc[i].to_dict()
        retrieved.append({
            "chunk_id": row["chunk_id"],
            "doc_id": row["doc_id"],
            "title": row["title"],
            "source": row["source"],
            "authority": row["authority"],
            "lang": row["lang"],
            "year": int(row["year"]),
            "text": row["chunk_text"],
            "scores": item[1:],  # hybrid details if present
        })

    return RAGLog(
        query=query,
        retrieval_mode=mode,
        filters={"only_official": only_official, "year_gte": year_gte, "langs": sorted(list(langs)) if langs else None},
        t_embed_ms=(t1 - t0) * 1000.0,
        t_search_ms=(t3 - t2) * 1000.0,
        t_generate_ms=(t5 - t4) * 1000.0,
        retrieved=retrieved,
        context=context,
        answer=ans,
    )

## 8) Run a bilingual hybrid demo

We run Albanian and English paraphrases of the same questions, plus a keyword-style query ("Neni 12").

In [None]:
# =========================================================
# 8) Interactive bilingual demo
# =========================================================
queries = [
    "Kush lëshon paralajmërime për përmbytje?",
    "Who issues flood warnings?",
    "Kur fillon evakuimi?",
    "What starts the evacuation process?",
    "Neni 12",
]

for q in queries:
    log = rag_query(q, mode="hybrid", k=3, only_official=True, year_gte=2020, langs={"sq","en"})
    print("="*90)
    print("QUERY:", q)
    print("LATENCY (ms): embed", round(log.t_embed_ms,2), "| search", round(log.t_search_ms,2), "| generate", round(log.t_generate_ms,2))
    print("\nCONTEXT:\n", log.context)
    print("\nANSWER:\n", log.answer)
    print("\nRETRIEVED (audit table):")
    display(pd.DataFrame(log.retrieved)[["chunk_id","doc_id","source","year","lang","scores"]])

QUERY: Kush lëshon paralajmërime për përmbytje?
LATENCY (ms): embed 58.87 | search 53.1 | generate 0.11

CONTEXT:
 [LAW-100::c0 | official | 2020 | sq] Neni 12: Autoriteti përgjegjës për menaxhimin e emergjencave përcaktohet nga ligji dhe aktet nënligjore përkatëse.
[PROC-001::c0 | official | 2023 | sq] Në rast rreziku për përmbytje, Bashkia lëshon paralajmërime zyrtare dhe njofton popullatën përmes kanaleve të komunikimit.
[EVAC-010::c0 | official | 2022 | sq] Evakuimi fillon kur Shërbimi i Emergjencave lëshon urdhër evakuimi dhe koordinohet me policinë dhe zjarrfikëset.

ANSWER:
 Grounded answer (based on retrieved context):
[PROC-001::c0 | official | 2023 | sq] Në rast rreziku për përmbytje, Bashkia lëshon paralajmërime zyrtare dhe njofton popullatën përmes kanaleve të komunikimit.

RETRIEVED (audit table):


Unnamed: 0,chunk_id,doc_id,source,year,lang,scores
0,LAW-100::c0,LAW-100,official,2020,sq,"(0.65, 1.0, 0.0)"
1,PROC-001::c0,PROC-001,official,2023,sq,"(0.35000000984007795, 1.9404954579595038e-39, ..."
2,EVAC-010::c0,EVAC-010,official,2022,sq,"(0.06644402953328594, 3.4322902163529306e-39, ..."


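QUERY: Who issues flood warnings?
LATENCY (ms): embed 45.48 | search 40.08 | generate 0.1

CONTEXT:
 [LAW-100::c0 | official | 2020 | sq] Neni 12: Autoriteti përgjegjës për menaxhimin e emergjencave përcaktohet nga ligji dhe aktet nënligjore përkatëse.
[NATO-ENG-010::c0 | official | 2021 | en] The incident commander issues warnings and coordinates evacuation with local authorities and emergency responders.
[LAW-100::c0 | official | 2020 | sq] Neni 12: Autoriteti përgjegjës për menaxhimin e emergjencave përcaktohet nga ligji dhe aktet nënligjore përkatëse.

ANSWER:
 Grounded answer (based on retrieved context):
[NATO-ENG-010::c0 | official | 2021 | en] The incident commander issues warnings and coordinates evacuation with local authorities and emergency responders.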

RETRIEVED (audit table):


Unnamed: 0,chunk_id,doc_id,source,year,lang,scores
0,LAW-100::c0,LAW-100,official,2020,sq,"(0.65, 1.0, 0.0)"
1,NATO-ENG-010::c0,NATO-ENG-010,official,2021,en,"(0.35000001367966477, 2.446731023627791e-39, 1..."
2,LAW-100::c0,LAW-100,official,2020,sq,"(2.482379323003325e-39, 3.8190451123128074e-39..."


QUERY: Kur fillon evakuimi?
LATENCY (ms): embed 43.05 | search 39.39 | generate 0.11

CONTEXT:
 [LAW-100::c0 | official | 2020 | sq] Neni 12: Autoriteti përgjegjës për menaxhimin e emergjencave përcaktohet nga ligji dhe aktet nënligjore përkatëse.
[EVAC-010::c0 | official | 2022 | sq] Evakuimi fillon kur Shërbimi i Emergjencave lëshon urdhër evakuimi dhe koordinohet me policinë dhe zjarrfikëset.
[LAW-100::c0 | official | 2020 | sq] Neni 12: Autoriteti përgjegjës për menaxhimin e emergjencave përcaktohet nga ligji dhe aktet nënligjore përkatëse.

ANSWER:
 Grounded answer (based on retrieved context):
[EVAC-010::c0 | official | 2022 | sq] Evakuimi fillon kur Shërbimi i Emergjencave lëshon urdhër evakuimi dhe koordinohet me policinë dhe zjarrfikëset.

RETRIEVED (audit table):


Unnamed: 0,chunk_id,doc_id,source,year,lang,scores
0,LAW-100::c0,LAW-100,official,2020,sq,"(0.65, 1.0, 0.0)"
1,EVAC-010::c0,EVAC-010,official,2022,sq,"(0.34999998587070613, 1.3663485856045059e-39, ..."
2,LAW-100::c0,LAW-100,official,2020,sq,"(2.6872479620788103e-39, 4.1342276339674e-39, ..."


QUERY: What starts the evacuation process?
LATENCY (ms): embed 45.6 | search 40.47 | generate 0.08

CONTEXT:
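 [LAW-100::c0 | official | 2020 | sq] Neni 12: Autoriteti përgjegjës për menaxhimin e emergjencave përcaktohet nga ligji dhe aktet nënligjore përkatëse.
[NATO-ENG-010::c0 | official | 2021 | en] The incident commander issues warnings and coordinates evacuation with local authorities and emergency responders.
[PROC-001::c0 | official | 2023 | sq] Në rast rreziku për përmbytje, Bashkia lëshon paralajmërime zyrtare dhe njofton popullatën përmes kanaleve të komunikimit.

ANSWER:
 Grounded answer (based on retrieved context):
[NATO-ENG-010::c0 | official | 2021 | en] The incident commander issues warnings and coordinates evacuation with local authorities and emergency responders.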

RETRIEVED (audit table):


Unnamed: 0,chunk_id,doc_id,source,year,lang,scores
0,LAW-100::c0,LAW-100,official,2020,sq,"(0.65, 1.0, 0.0)"
1,NATO-ENG-010::c0,NATO-ENG-010,official,2021,en,"(0.35000001367966477, 2.9325013246540214e-39, ..."
2,PROC-001::c0,PROC-001,official,2023,sq,"(2.4363639367110965e-39, 3.7482522103247636e-3..."


QUERY: Neni 12
LATENCY (ms): embed 41.56 | search 39.53 | generate 0.1

CONTEXT:
 [LAW-100::c0 | official | 2020 | sq] Neni 12: Autoriteti përgjegjës për menaxhimin e emergjencave përcaktohet nga ligji dhe aktet nënligjore përkatëse.
[LAW-100::c0 | official | 2020 | sq] Neni 12: Autoriteti përgjegjës për menaxhimin e emergjencave përcaktohet nga ligji dhe aktet nënligjore përkatëse.
[EVAC-010::c0 | official | 2022 | sq] Evakuimi fillon kur Shërbimi i Emergjencave lëshon urdhër evakuimi dhe koordinohet me policinë dhe zjarrfikëset.

ANSWER:
 Grounded answer (based on retrieved context):
[LAW-100::c0 | official | 2020 | sq] Neni 12: Autoriteti përgjegjës për menaxhimin e emergjencave përcaktohet nga ligji dhe aktet nënligjore përkatëse.

RETRIEVED (audit table):


Unnamed: 0,chunk_id,doc_id,source,year,lang,scores
0,LAW-100::c0,LAW-100,official,2020,sq,"(0.65, 1.0, 0.0)"
1,LAW-100::c0,LAW-100,official,2020,sq,"(0.3500000038261909, 4.9466252122131405e-39, 1..."
2,EVAC-010::c0,EVAC-010,official,2022,sq,"(3.784857403929928e-39, 5.822857544507582e-39,..."


## 9) Evaluation: Precision@k, Recall@k, MRR

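We define a tiny labeled evaluation set:
- each query has a set of relevant **doc_ids**
- for each query we compute Precision@k, Recall@k, and the reciprocal rank of the first relevant hit (the per-query "MRR" column; MRR proper is the mean of that column over all queries)

The same procedure applies to evaluating retrieval quality in a real RAG system, only with a much larger labeled dataset.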
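
A small worked example may help. Assume, hypothetically, that a retriever returns the ranked list `["LAW-100", "PROC-001", "NATO-ENG-010"]` for a query whose relevant set is `{"PROC-001", "NATO-ENG-010"}`. Then Precision@3 = 2/3, Recall@3 = 2/2, and the reciprocal rank is 1/2 because the first relevant hit sits at rank 2. The helper name and the ranked lists below are illustrative, not results from this notebook.

```python
def precision_recall_rr(ranked, relevant):
    """Return (precision@k, recall@k, reciprocal rank) for one ranked list."""
    hits = [d for d in ranked if d in relevant]
    precision = len(hits) / len(ranked) if ranked else 0.0
    recall = len(set(hits)) / len(relevant) if relevant else 0.0
    rr = 0.0
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            rr = 1.0 / rank  # only the first relevant hit counts
            break
    return precision, recall, rr

# Hypothetical ranked outputs for two queries.
runs = [
    (["LAW-100", "PROC-001", "NATO-ENG-010"], {"PROC-001", "NATO-ENG-010"}),
    (["EVAC-010", "PROC-001", "LAW-100"], {"EVAC-010"}),
]

per_query = [precision_recall_rr(r, rel) for r, rel in runs]
mrr_value = sum(rr for _, _, rr in per_query) / len(per_query)
print(per_query)
print("MRR:", mrr_value)  # (0.5 + 1.0) / 2 = 0.75
```

Note that with a single relevant document and k=3, Precision@3 cannot exceed 1/3, so a low precision value is not by itself a sign of poor retrieval.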

In [None]:
# =========================================================
# 9) Evaluation metrics: Precision@k, Recall@k, MRR
# =========================================================
eval_set = [
    {"q": "Kush lëshon paralajmërime për përmbytje?", "relevant_doc_ids": {"PROC-001"}},
    {"q": "Who issues flood warnings?", "relevant_doc_ids": {"PROC-001", "NATO-ENG-010"}},
    {"q": "Kur fillon evakuimi?", "relevant_doc_ids": {"EVAC-010"}},
    {"q": "Cili nen përcakton autoritetin përgjegjës?", "relevant_doc_ids": {"LAW-100"}},
    {"q": "Neni 12", "relevant_doc_ids": {"LAW-100"}},
]

def topk_docids(query, mode, k, only_official=True, year_gte=2020, langs={"sq","en"}):
    # Get top-k chunks, then return their doc_ids
    if mode == "vector":
        raw = [(i, s, None, None) for i, s in vector_search(query, k=50)]
    elif mode == "bm25":
        raw = [(i, s, None, None) for i, s in bm25_search(query, k=50)]
    else:
        raw = hybrid_search(query, k=50)

    top = apply_filters(raw, only_official=only_official, year_gte=year_gte, langs=langs)[:k]
    return [chunks_df.iloc[i]["doc_id"] for i, *_ in top]

def precision_at_k(retrieved, relevant):
    if not retrieved:
        return 0.0
    return sum(1 for d in retrieved if d in relevant) / len(retrieved)

def recall_at_k(retrieved, relevant):
    if not relevant:
        return 0.0
    return sum(1 for d in set(retrieved) if d in relevant) / len(relevant)

def mrr(retrieved, relevant):
    for rank, d in enumerate(retrieved, start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0

def evaluate(mode="hybrid", k=3):
    rows = []
    for ex in eval_set:
        ret = topk_docids(ex["q"], mode=mode, k=k, only_official=True, year_gte=2020, langs={"sq","en"})
        rel = ex["relevant_doc_ids"]
        rows.append({
            "query": ex["q"],
            "retrieved_doc_ids": ret,
            f"P@{k}": round(precision_at_k(ret, rel), 3),
            f"R@{k}": round(recall_at_k(ret, rel), 3),
            "MRR": round(mrr(ret, rel), 3),
        })
    return pd.DataFrame(rows)

display(evaluate(mode="vector", k=3))
display(evaluate(mode="bm25", k=3))
display(evaluate(mode="hybrid", k=3))

Unnamed: 0,query,retrieved_doc_ids,P@3,R@3,MRR
0,Kush lëshon paralajmërime për përmbytje?,"[PROC-001, NATO-ENG-010, EVAC-010]",0.333,1.0,1.0
1,Who issues flood warnings?,"[PROC-001, NATO-ENG-010, EVAC-010]",0.667,1.0,1.0
2,Kur fillon evakuimi?,"[EVAC-010, NATO-ENG-010, PROC-001]",0.333,1.0,1.0
3,Cili nen përcakton autoritetin përgjegjës?,"[LAW-100, NATO-ENG-010, PROC-001]",0.333,1.0,1.0
4,Neni 12,"[LAW-100, PROC-001, NATO-ENG-010]",0.333,1.0,1.0


Unnamed: 0,query,retrieved_doc_ids,P@3,R@3,MRR
0,Kush lëshon paralajmërime për përmbytje?,"[PROC-001, EVAC-010, NATO-ENG-010]",0.333,1.0,1.0
1,Who issues flood warnings?,"[NATO-ENG-010, PROC-001, EVAC-010]",0.667,1.0,1.0
2,Kur fillon evakuimi?,"[EVAC-010, PROC-001, NATO-ENG-010]",0.333,1.0,1.0
3,Cili nen përcakton autoritetin përgjegjës?,"[LAW-100, PROC-001, EVAC-010]",0.333,1.0,1.0
4,Neni 12,"[LAW-100, PROC-001, EVAC-010]",0.333,1.0,1.0


Unnamed: 0,query,retrieved_doc_ids,P@3,R@3,MRR
0,Kush lëshon paralajmërime për përmbytje?,"[LAW-100, PROC-001, EVAC-010]",0.333,1.0,0.5
1,Who issues flood warnings?,"[LAW-100, NATO-ENG-010, LAW-100]",0.333,0.5,0.5
2,Kur fillon evakuimi?,"[LAW-100, EVAC-010, LAW-100]",0.333,1.0,0.5
3,Cili nen përcakton autoritetin përgjegjës?,"[LAW-100, LAW-100, EVAC-010]",0.667,1.0,1.0
4,Neni 12,"[LAW-100, LAW-100, EVAC-010]",0.667,1.0,1.0


## 10) Teaching moment: filtering (outdated vs recent)

If you allow older documents, retrieval may pull **outdated procedures**.
A generator (toy or real LLM) will likely follow the context it is given.
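
The effect of the year threshold can be sketched on the corpus metadata alone. This minimal example uses `doc_id` and `year` values from the toy corpus; `passes_year_filter` is a hypothetical helper, not the notebook's `apply_filters`.

```python
def passes_year_filter(record, year_gte=None):
    """Keep a record if no threshold is set or its year meets the threshold."""
    return year_gte is None or record["year"] >= year_gte

# doc_id / year pairs copied from the toy corpus.
corpus = [
    {"doc_id": "PROC-001", "year": 2023},
    {"doc_id": "PROC-001-OLD", "year": 2019},
    {"doc_id": "EVAC-010", "year": 2022},
]

kept_by_threshold = {}
for threshold in (2018, 2020):
    kept_by_threshold[threshold] = [
        r["doc_id"] for r in corpus if passes_year_filter(r, threshold)
    ]
    print(threshold, kept_by_threshold[threshold])
```

With the threshold at 2018 the outdated 2019 procedure remains a candidate; raising it to 2020 removes that document before it can reach the context.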

In [None]:
# =========================================================
# 10) Filtering demonstration
# =========================================================
q = "Kush lëshon paralajmërime për përmbytje?"

log_old_ok = rag_query(q, mode="vector", k=3, only_official=True, year_gte=2018, langs={"sq","en"})
log_new_only = rag_query(q, mode="vector", k=3, only_official=True, year_gte=2020, langs={"sq","en"})

print("=== WITHOUT strict year filter (year>=2018) ===")
print(log_old_ok.context)
print("\nAnswer:\n", log_old_ok.answer)

print("\n\n=== WITH year>=2020 filter ===")
print(log_new_only.context)
print("\nAnswer:\n", log_new_only.answer)

=== WITHOUT strict year filter (year>=2018) ===
[PROC-001::c0 | official | 2023 | sq] Në rast rreziku për përmbytje, Bashkia lëshon paralajmërime zyrtare dhe njofton popullatën përmes kanaleve të komunikimit.
[PROC-001-OLD::c0 | official | 2019 | sq] Në rast përmbytjeje, njoftimet bëhen në mënyrë lokale nga vullnetarët. (Dokument i vjetër, jo i përditësuar).

Answer:
 Grounded answer (based on retrieved context):
[PROC-001::c0 | official | 2023 | sq] Në rast rreziku për përmbytje, Bashkia lëshon paralajmërime zyrtare dhe njofton popullatën përmes kanaleve të komunikimit.


=== WITH year>=2020 filter ===
[PROC-001::c0 | official | 2023 | sq] Në rast rreziku për përmbytje, Bashkia lëshon paralajmërime zyrtare dhe njofton popullatën përmes kanaleve të komunikimit.
[NATO-ENG-010::c0 | official | 2021 | en] The incident commander issues warnings and coordinates evacuation with local authorities and emergency responders.
[EVAC-010::c0 | official | 2022 | sq] Evakuimi fillon kur Shërbimi i Emergjencave lëshon urdhër evakuimi dhe koordinohet me policinë dhe zjarrfikëset.

Answer:
 Grounded answer (based on retrieved context):
[PROC-001::c0 | offic