# Lab 3 — Full-Stack Multimodal RAG (PDF + Image) with Ablations & Evaluation
**Generated:** 2026-02-05

This notebook implements the complete Lab 3 workflow:
- Ingest PDFs + images (OCR + optional captioning)
- Two chunking strategies (page-based vs fixed-size)
- Sparse retrieval (BM25)
- Dense retrieval (SentenceTransformers + FAISS)
- Hybrid retrieval + cross-encoder reranking
- Evidence-grounded answer generation (Gemini or HuggingFace) with strict citations
- Correct missing-evidence behavior
- Evaluation: Precision@5, Recall@10 + answer quality rubric
- Ablations: chunking / retrieval / text-only vs multimodal
- README template for GitHub submission

## 0) Setup & Install
**What this cell does:** installs required libraries.

**Why it matters:** gives Colab and local runs the same set of dependencies. Versions are unpinned, so pin them if you need exactly reproducible runs.

**Assumptions/tradeoffs:** OCR/captioning models can be heavy; we make them optional and fall back gracefully.

In [None]:
# If you're in Colab, run installs. If local, install these in your environment.
!pip -q install pymupdf pillow scikit-learn pandas numpy requests tabulate  # tabulate is needed by DataFrame.to_markdown (section 17)
!pip -q install sentence-transformers faiss-cpu rank-bm25
!pip -q install transformers accelerate
!pip -q install google-generativeai
!pip -q install easyocr

import os, re, glob, json, math, time, shutil
from pathlib import Path
from dataclasses import dataclass
from typing import List, Dict, Any, Tuple, Optional

import numpy as np
import pandas as pd

import fitz  # PyMuPDF
from PIL import Image

from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, CrossEncoder
import faiss

from transformers import pipeline

# Gemini import is optional; we guard it to avoid deployment crashes
try:
    import google.generativeai as genai
    GEMINI_LIB_AVAILABLE = True
except Exception:
    genai = None
    GEMINI_LIB_AVAILABLE = False

## 1) Lab Configuration (EDIT ME)
**What this cell does:** defines paths, knobs, and API keys.

**Why it matters:** you control chunking/retrieval and point to your files.

**Key assumptions:** you have multiple PDFs and multiple images; the notebook will copy them into a standard `project_data_mm/` workspace.

In [None]:
# =========================
# Lab Configuration (EDIT ME)
# =========================

# Put your files here (Colab examples)
PDF_FILES = [
    r"/content/1-s2.0-S221201731200326X-main.pdf",
    r"/content/Comparative_study_of_different_weather_forecasting_models.pdf",
]

IMAGE_FILES = [
    r"/content/Figure_8_Evidence_Based_Visualization.png",
    r"/content/image.png",
    r"/content/figure4_probabilistic_calibration_FINAL_NO_CLIP.png",
    r"/content/figure5_child_centered_risk_map.gif",
    r"/content/figure6_forecast_evaluation_performance.png"
]

# Workspace folders (auto-created)
DATA_DIR = "project_data_mm"
PDF_DIR  = os.path.join(DATA_DIR, "pdfs")
IMG_DIR  = os.path.join(DATA_DIR, "images")
os.makedirs(PDF_DIR, exist_ok=True)
os.makedirs(IMG_DIR, exist_ok=True)

# Retrieval knobs
TOP_K_TEXT     = 5
TOP_K_IMAGES   = 3
TOP_K_EVIDENCE = 8

# Fusion knob (sparse vs dense) used by hybrid retrieval; see fuse_scores in section 7
ALPHA = 0.5  # 0.0 = dense only, 1.0 = sparse (BM25) only

# Chunking knobs (fixed-size strategy)
CHUNK_SIZE    = 900
CHUNK_OVERLAP = 150

# Reproducibility
RANDOM_SEED = 0
np.random.seed(RANDOM_SEED)

# OCR + captioning toggles
USE_OCR = True
USE_CAPTIONING = False  # optional; can be slow

# Generator choice
USE_GEMINI = False  # if True, uses Gemini API; else uses a HuggingFace model
GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY", "").strip()  # or paste key here
GEMINI_MODEL = "gemini-1.5-flash"

HF_GENERATOR_MODEL = "google/flan-t5-base"  # small + works in free tiers

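# Missing evidence behavior
# Threshold on the min-max-normalized relevance score (0..1). Scores are normalized per query,
# so the top hit is usually 1.0; this check mainly fires when retrieval is empty or all scores tie.
MIN_RELEVANCE_FOR_ANSWER = 0.15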

print("✅ Config loaded")

## 2) Standardize Workspace (copy your PDFs/images into expected folders)
**What this cell does:** copies your provided files into `project_data_mm/pdfs` and `project_data_mm/images`.

**Why it matters:** ingestion expects a consistent folder layout.

**Assumptions/tradeoffs:** copies (not moves) so you don’t lose originals.

In [None]:
def setup_multimodal_workspace(pdf_files: List[str], image_files: List[str]) -> Tuple[List[str], List[str]]:
    os.makedirs(PDF_DIR, exist_ok=True)
    os.makedirs(IMG_DIR, exist_ok=True)

    copied_pdfs, copied_imgs = [], []

    for p in pdf_files:
        src = Path(p)
        if not src.exists():
            raise FileNotFoundError(f"PDF not found: {src}")
        dst = Path(PDF_DIR) / src.name
        shutil.copy2(src, dst)
        copied_pdfs.append(str(dst))

    for im in image_files:
        src = Path(im)
        if not src.exists():
            raise FileNotFoundError(f"Image not found: {src}")
        dst = Path(IMG_DIR) / src.name
        shutil.copy2(src, dst)
        copied_imgs.append(str(dst))

    return copied_pdfs, copied_imgs

PDFS_IN_WORKSPACE, IMAGES_IN_WORKSPACE = setup_multimodal_workspace(PDF_FILES, IMAGE_FILES)

print("✅ Workspace ready")
print("PDFs:", len(PDFS_IN_WORKSPACE))
print("Images:", len(IMAGES_IN_WORKSPACE))

## 3) Ingestion: PDFs (page text) + Images (OCR + optional caption)
**What this cell does:** loads PDFs (page text) and images (OCR; optional captioning).

**Why it matters:** retrieval can only use what you index.

**Assumptions/tradeoffs:** OCR quality varies; we store OCR text as the “image document text.” Captioning is optional.

In [None]:
@dataclass
class TextChunk:
    chunk_id: str
    modality: str  # "text" or "image"
    source: str    # filename
    page: Optional[int]
    text: str

def read_pdf_pages(pdf_path: str) -> List[TextChunk]:
    doc = fitz.open(pdf_path)
    chunks = []
    for i in range(doc.page_count):
        page = doc.load_page(i)
        txt = page.get_text("text") or ""
        txt = re.sub(r"\s+", " ", txt).strip()
        if txt:
            chunks.append(TextChunk(
                chunk_id=f"T::{Path(pdf_path).name}::p{i+1}",
                modality="text",
                source=Path(pdf_path).name,
                page=i+1,
                text=txt
            ))
    doc.close()
    return chunks

_OCR_READER = None  # cached so the EasyOCR models load once, not once per image

def ocr_image_easyocr(img_path: str) -> str:
    global _OCR_READER
    if _OCR_READER is None:
        import easyocr
        _OCR_READER = easyocr.Reader(['en'], gpu=False)
    res = _OCR_READER.readtext(img_path, detail=0)
    return " ".join(res).strip()

_BLIP = None  # cached (processor, model) pair for optional captioning

def caption_image_blip(img_path: str) -> str:
    # Optional captioning using Transformers BLIP
    global _BLIP
    if _BLIP is None:
        from transformers import BlipProcessor, BlipForConditionalGeneration
        name = "Salesforce/blip-image-captioning-base"
        _BLIP = (BlipProcessor.from_pretrained(name),
                 BlipForConditionalGeneration.from_pretrained(name))
    processor, model = _BLIP
    image = Image.open(img_path).convert("RGB")
    inputs = processor(image, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=30)
    return processor.decode(out[0], skip_special_tokens=True).strip()

def read_images(img_paths: List[str], use_ocr=True, use_captioning=False) -> List[TextChunk]:
    chunks = []
    for p in img_paths:
        base = Path(p).name
        ocr_txt = ""
        cap = ""
        if use_ocr:
            try:
                ocr_txt = ocr_image_easyocr(p)
            except Exception:
                ocr_txt = ""
        if use_captioning:
            try:
                cap = caption_image_blip(p)
            except Exception:
                cap = ""
        name_txt = re.sub(r"[_\-]+", " ", Path(p).stem)
        combined = " ".join([t for t in [name_txt, cap, ocr_txt] if t]).strip()
        if not combined:
            combined = name_txt
        chunks.append(TextChunk(
            chunk_id=f"I::{base}",
            modality="image",
            source=base,
            page=None,
            text=combined
        ))
    return chunks

text_page_chunks = []
for pdf in PDFS_IN_WORKSPACE:
    text_page_chunks.extend(read_pdf_pages(pdf))

image_chunks = read_images(IMAGES_IN_WORKSPACE, use_ocr=USE_OCR, use_captioning=USE_CAPTIONING)

print("✅ Ingested text pages:", len(text_page_chunks))
print("✅ Ingested images:", len(image_chunks))

## 4) Chunking Strategies: Page-based vs Fixed-size
We compare:
- **Page-based chunking**: each PDF page is a chunk (good for citation stability)
- **Fixed-size chunking**: character windows with overlap (can improve recall)

**Tradeoffs:** fixed-size may improve retrieval but can weaken citation specificity.

In [None]:
def fixed_size_chunking(page_chunks: List[TextChunk], chunk_size: int, overlap: int) -> List[TextChunk]:
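    # Guard: with overlap >= chunk_size the window never advances and the loop would not end.
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    out = []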
    for ch in page_chunks:
        txt = ch.text
        if len(txt) <= chunk_size:
            out.append(ch)
            continue
        start = 0
        part = 0
        while start < len(txt):
            end = min(len(txt), start + chunk_size)
            sub = txt[start:end].strip()
            if sub:
                out.append(TextChunk(
                    chunk_id=f"{ch.chunk_id}::c{part}",
                    modality=ch.modality,
                    source=ch.source,
                    page=ch.page,
                    text=sub
                ))
            if end == len(txt):
                break
            start = max(0, end - overlap)
            part += 1
    return out

text_fixed_chunks = fixed_size_chunking(text_page_chunks, CHUNK_SIZE, CHUNK_OVERLAP)

print("Page-based text chunks:", len(text_page_chunks))
print("Fixed-size text chunks:", len(text_fixed_chunks))
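
The window arithmetic behind fixed-size chunking is easier to check on character offsets alone. The sketch below is illustrative only: `window_spans` is a hypothetical helper (not part of the notebook) that reproduces the same start/end stepping and returns the spans instead of `TextChunk` objects.

```python
from typing import List, Tuple

def window_spans(text_len: int, chunk_size: int, overlap: int) -> List[Tuple[int, int]]:
    """Return (start, end) character spans for overlapping fixed-size windows."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    spans = []
    start = 0
    while start < text_len:
        end = min(text_len, start + chunk_size)
        spans.append((start, end))
        if end == text_len:
            break
        # The next window begins `overlap` characters before the previous end.
        start = end - overlap
    return spans

# A 2000-character page with the notebook defaults (900 chars, 150 overlap).
print(window_spans(2000, 900, 150))  # [(0, 900), (750, 1650), (1500, 2000)]
```

Each window after the first advances by `chunk_size - overlap` characters, so a larger overlap produces more chunks per page.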

## 5) Sparse Retrieval: BM25
BM25 is the required sparse keyword baseline (strong for exact terminology).

In [None]:
def tokenize(text: str) -> List[str]:
    return re.findall(r"[A-Za-z0-9]+", text.lower())

class BM25Index:
    def __init__(self, chunks: List[TextChunk]):
        self.chunks = chunks
        self.tokenized = [tokenize(c.text) for c in chunks]
        self.bm25 = BM25Okapi(self.tokenized)

    def search(self, query: str, top_k: int) -> List[Tuple[int, float]]:
        qtok = tokenize(query)
        scores = self.bm25.get_scores(qtok)
        idx = np.argsort(scores)[::-1][:top_k]
        return [(int(i), float(scores[i])) for i in idx]

def minmax_norm(scores: np.ndarray) -> np.ndarray:
    if len(scores) == 0:
        return scores
    mn, mx = float(scores.min()), float(scores.max())
    if mx - mn < 1e-9:
        return np.zeros_like(scores, dtype=float)
    return (scores - mn) / (mx - mn)
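
A small worked example may help when reading normalized scores later in the notebook. This is a pure-Python sketch of the same min-max rule as `minmax_norm` (the name `minmax` is a stand-in chosen here), showing two properties worth keeping in mind: the best hit always maps to 1.0, and a single hit or an all-equal list maps to 0.0.

```python
from typing import List

def minmax(scores: List[float]) -> List[float]:
    """Pure-Python mirror of the min-max rule used by minmax_norm."""
    if not scores:
        return []
    mn, mx = min(scores), max(scores)
    if mx - mn < 1e-9:
        # Degenerate case: all scores equal (including a single score).
        return [0.0] * len(scores)
    return [(s - mn) / (mx - mn) for s in scores]

print(minmax([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
print(minmax([3.2]))            # [0.0]
```

Because `MIN_RELEVANCE_FOR_ANSWER` is compared against scores normalized this way, it measures rank position within one query's results, not absolute relevance.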

## 6) Dense Retrieval: SentenceTransformers + FAISS
Dense retrieval is required for semantic matching (paraphrases).

In [None]:
class FaissIndex:
    def __init__(self, chunks: List[TextChunk], embedder_name="sentence-transformers/all-MiniLM-L6-v2"):
        self.chunks = chunks
        self.embedder = SentenceTransformer(embedder_name)
        embs = self.embedder.encode([c.text for c in chunks], convert_to_numpy=True, normalize_embeddings=True, show_progress_bar=True)
        self.embs = embs.astype(np.float32)
        dim = self.embs.shape[1]
        self.index = faiss.IndexFlatIP(dim)  # cosine via normalized vectors
        self.index.add(self.embs)

    def search(self, query: str, top_k: int) -> List[Tuple[int, float]]:
        q = self.embedder.encode([query], convert_to_numpy=True, normalize_embeddings=True).astype(np.float32)
        scores, idx = self.index.search(q, top_k)
        out = []
        for i, s in zip(idx[0], scores[0]):
            if i == -1:
                continue
            out.append((int(i), float(s)))
        return out
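
The comment "cosine via normalized vectors" relies on a simple identity: for unit-length vectors, the inner product equals cosine similarity. The sketch below checks this with plain Python on two toy vectors (the vectors and helper names are illustrative, not taken from the notebook).

```python
import math
from typing import List

def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def cosine(a: List[float], b: List[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a, b = [3.0, 4.0], [4.0, 3.0]
inner_product = dot(l2_normalize(a), l2_normalize(b))

# Both values are approximately 0.96 (24 / 25).
print(inner_product, cosine(a, b))
```

This is why the index can use an inner-product search while the embeddings are encoded with `normalize_embeddings=True`.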

## 7) Hybrid Retrieval + Cross-Encoder Reranking
Required methods:
- dense, sparse, hybrid, hybrid + rerank

In [None]:
class CrossEncoderReranker:
    def __init__(self, model_name="cross-encoder/ms-marco-MiniLM-L-6-v2"):
        self.ce = CrossEncoder(model_name)

    def rerank(self, query: str, chunks: List[TextChunk], candidates: List[Tuple[int, float]], top_k: int) -> List[Tuple[int, float]]:
        if not candidates:
            return []
        pairs = [(query, chunks[i].text) for i, _ in candidates]
        scores = self.ce.predict(pairs)
        reranked = sorted([(candidates[j][0], float(scores[j])) for j in range(len(candidates))], key=lambda x: x[1], reverse=True)
        return reranked[:top_k]

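def fuse_scores(sparse_hits: List[Tuple[int,float]], dense_hits: List[Tuple[int,float]], alpha: float, top_k: int) -> List[Tuple[int,float]]:
    # alpha weights the sparse (BM25) score and (1 - alpha) weights the dense score.
    # A chunk missing from one hit list contributes 0.0 for that list.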
    s_idx = [i for i,_ in sparse_hits]
    d_idx = [i for i,_ in dense_hits]
    all_idx = sorted(set(s_idx) | set(d_idx))

    s_map = {i:s for i,s in sparse_hits}
    d_map = {i:s for i,s in dense_hits}

    s_arr = np.array([s_map.get(i, 0.0) for i in all_idx], dtype=float)
    d_arr = np.array([d_map.get(i, 0.0) for i in all_idx], dtype=float)

    s_n = minmax_norm(s_arr)
    d_n = minmax_norm(d_arr)

    fused = alpha * s_n + (1.0 - alpha) * d_n
    pairs = list(zip(all_idx, fused.tolist()))
    pairs.sort(key=lambda x: x[1], reverse=True)
    return pairs[:top_k]
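
To see what `alpha` does in `fuse_scores`, the following sketch applies the same weighted sum to toy scores that are assumed to be already normalized. The `fuse` helper and the chunk indices are hypothetical and exist only for this illustration.

```python
from typing import Dict, List, Tuple

def fuse(sparse: Dict[int, float], dense: Dict[int, float], alpha: float) -> List[Tuple[int, float]]:
    """Weighted sum: alpha * sparse + (1 - alpha) * dense, missing scores count as 0.0."""
    ids = sorted(set(sparse) | set(dense))
    fused = [(i, alpha * sparse.get(i, 0.0) + (1.0 - alpha) * dense.get(i, 0.0)) for i in ids]
    return sorted(fused, key=lambda p: p[1], reverse=True)

sparse = {0: 1.0, 1: 0.5}   # chunk 0 is the best keyword match
dense = {1: 1.0, 2: 0.5}    # chunk 1 is the best semantic match

print(fuse(sparse, dense, 0.5))  # [(1, 0.75), (0, 0.5), (2, 0.25)]
print(fuse(sparse, dense, 1.0))  # [(0, 1.0), (1, 0.5), (2, 0.0)]  sparse only
print(fuse(sparse, dense, 0.0))  # [(1, 1.0), (2, 0.5), (0, 0.0)]  dense only
```

With `alpha = 0.5`, chunk 1 wins because it appears in both lists; moving `alpha` toward 1.0 or 0.0 reduces the ranking to the sparse or the dense ordering, respectively.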

## 8) Index Builder (chunking + modality)
Builds BM25 + FAISS + reranker for either:
- text-only RAG
- multimodal RAG (text + image OCR/captions)

In [None]:
def build_indexes(text_chunks: List[TextChunk], image_chunks: List[TextChunk], multimodal: bool):
    chunks = text_chunks + image_chunks if multimodal else text_chunks
    bm25 = BM25Index(chunks)
    faiss_idx = FaissIndex(chunks)
    reranker = CrossEncoderReranker()
    return chunks, bm25, faiss_idx, reranker

print("✅ Index builder ready")
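
Each call to `build_indexes` re-embeds every chunk and reloads the cross-encoder, and the evaluation cells later call it inside per-query loops. One possible way to avoid the repeated work is to cache the built indexes per configuration. The sketch below is an assumption about how that could look: `IndexCache` and `fake_builder` are hypothetical names, and the stand-in builder returns a string so the example runs without any models.

```python
from typing import Any, Callable, Dict, Tuple

class IndexCache:
    """Memoize built indexes per (chunking, multimodal) configuration."""

    def __init__(self, builder: Callable[[str, bool], Any]):
        self._builder = builder
        self._cache: Dict[Tuple[str, bool], Any] = {}
        self.builds = 0  # number of real builds, kept for inspection

    def get(self, chunking: str, multimodal: bool) -> Any:
        key = (chunking, multimodal)
        if key not in self._cache:
            self._cache[key] = self._builder(chunking, multimodal)
            self.builds += 1
        return self._cache[key]

# Stand-in builder for illustration; in the notebook it would wrap build_indexes(...).
def fake_builder(chunking: str, multimodal: bool) -> str:
    return f"indexes[{chunking}, multimodal={multimodal}]"

cache = IndexCache(fake_builder)
for _query in ["q1", "q2", "q3"]:
    for chunking in ["page", "fixed"]:
        for multimodal in [False, True]:
            cache.get(chunking, multimodal)

print(cache.builds)  # 4 builds instead of 12
```

The cache is only valid while the underlying chunks stay unchanged; after re-ingesting files it should be rebuilt.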

## 9) Retrieval API (dense/sparse/hybrid/hybrid_rerank)
Single function to support all ablations.

In [None]:
def retrieve(query: str, chunks: List[TextChunk], bm25: BM25Index, faiss_idx: FaissIndex,
             reranker: CrossEncoderReranker, method: str,
             top_k_evidence: int, alpha: float) -> List[Tuple[TextChunk, float]]:
    method = method.lower().strip()
    cand_k = max(30, top_k_evidence * 5)

    if method == "sparse":
        hits = bm25.search(query, cand_k)
        s = np.array([h[1] for h in hits], dtype=float)
        s = minmax_norm(s)
        hits = [(hits[i][0], float(s[i])) for i in range(len(hits))]
        return [(chunks[i], score) for i, score in hits[:top_k_evidence]]

    if method == "dense":
        hits = faiss_idx.search(query, cand_k)
        d = np.array([h[1] for h in hits], dtype=float)
        d = minmax_norm(d)
        hits = [(hits[i][0], float(d[i])) for i in range(len(hits))]
        return [(chunks[i], score) for i, score in hits[:top_k_evidence]]

    if method in ("hybrid", "hybrid_rerank"):
        sparse_hits = bm25.search(query, cand_k)
        dense_hits  = faiss_idx.search(query, cand_k)
        fused = fuse_scores(sparse_hits, dense_hits, alpha=alpha, top_k=cand_k)

        if method == "hybrid":
            return [(chunks[i], float(score)) for i, score in fused[:top_k_evidence]]

        reranked = reranker.rerank(query, chunks, fused[:min(80, len(fused))], top_k=top_k_evidence)
        rr_scores = np.array([s for _, s in reranked], dtype=float)
        rr_scores = minmax_norm(rr_scores)
        reranked = [(reranked[i][0], float(rr_scores[i])) for i in range(len(reranked))]
        return [(chunks[i], score) for i, score in reranked]

    raise ValueError(f"Unknown method: {method}")

## 10) Evidence Packaging + Strict Citations
Citations are traceable chunk ids. PDFs include page numbers.

In [None]:
def format_citation(ch: TextChunk) -> str:
    if ch.modality == "text":
        return f"[{ch.chunk_id}] ({ch.source}, page {ch.page})"
    return f"[{ch.chunk_id}] ({ch.source})"

def build_context(evidence: List[Tuple[TextChunk, float]], max_chars: int = 4500) -> Tuple[str, List[Dict[str,Any]]]:
    items = []
    ctx_parts = []
    used = 0
    for ch, score in evidence:
        cit = format_citation(ch)
        # Trim the snippet to the remaining budget instead of dropping the chunk:
        # a single PDF page can exceed max_chars, which would leave the context empty.
        room = max_chars - used - len(cit) - 2  # 2 = the two newlines in the block
        if room <= 0:
            break
        snippet = ch.text.strip()[:room]

        # Record only evidence that actually makes it into the context.
        items.append({
            "chunk_id": ch.chunk_id,
            "modality": ch.modality,
            "source": ch.source,
            "page": ch.page,
            "score": float(score),
            "citation": cit,
            "text": snippet
        })

        block = f"{cit}\n{snippet}\n"
        ctx_parts.append(block)
        used += len(block)

    return "\n---\n".join(ctx_parts), items

## 11) Grounded Answer Generation (Gemini or HuggingFace) + Missing-Evidence Rule
Required behavior: if evidence is insufficient, output:
**Not enough evidence in the retrieved context.**

In [None]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

def init_hf_generator(model_name: str):
    # Initialize tokenizer and model directly for sequence-to-sequence tasks like Flan-T5
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # Create a callable that mimics the pipeline's output format
    def custom_generator(prompt_text, max_new_tokens=200, do_sample=False):
        input_ids = tokenizer(prompt_text, return_tensors="pt").input_ids
        # max_new_tokens caps the output length; early_stopping is omitted because it only applies to beam search
        output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens, do_sample=do_sample)
        generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
        return [{"generated_text": generated_text}]
    return custom_generator

HF_GEN = None
if not USE_GEMINI:
    HF_GEN = init_hf_generator(HF_GENERATOR_MODEL)

def gemini_generate(prompt: str) -> str:
    if (not GEMINI_LIB_AVAILABLE) or (not GEMINI_API_KEY):
        return ""
    try:
        genai.configure(api_key=GEMINI_API_KEY)
        model = genai.GenerativeModel(GEMINI_MODEL)
        resp = model.generate_content(prompt)
        return (getattr(resp, "text", "") or "").strip()
    except Exception:
        return ""

def hf_generate(prompt: str) -> str:
    if HF_GEN is None:
        return ""
    out = HF_GEN(prompt, max_new_tokens=200, do_sample=False)
    return (out[0]["generated_text"] or "").strip()

def answer_question(question: str, evidence: List[Tuple[TextChunk,float]]) -> Dict[str,Any]:
    if (not evidence) or (max([s for _, s in evidence]) < MIN_RELEVANCE_FOR_ANSWER):
        return {"answer": "Not enough evidence in the retrieved context.",
                "citations": [], "used_evidence": []}

    context, items = build_context(evidence)

    prompt = (
        "You are a careful assistant. Answer ONLY using the evidence below.\n"
        "If the evidence does not support the answer, respond exactly:\n"
        "Not enough evidence in the retrieved context.\n\n"
        f"Question:\n{question}\n\n"
        f"Evidence:\n{context}\n\n"
        "Answer in 6-10 bullet points max. Include citations like [T::file::p1] inline.\n"
    )

    text = gemini_generate(prompt) if USE_GEMINI else hf_generate(prompt)
    if not text:
        text = "Not enough evidence in the retrieved context."

    # Safety: if no citations, refuse
    if not re.search(r"\[T::|\[I::", text):
        text = "Not enough evidence in the retrieved context."

    cited = sorted(set(re.findall(r"\[(T::[^\]]+|I::[^\]]+)\]", text)))
    return {"answer": text, "citations": cited, "used_evidence": items}
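
`answer_question` checks that the answer contains at least one citation-shaped token, but it does not check that the cited ids were actually in the retrieved evidence. A stricter check could look like the sketch below. It is an illustration under assumptions: `split_citations` is a hypothetical helper, and the file names in the example are made up.

```python
import re
from typing import List, Set, Tuple

# Matches ids such as [T::file.pdf::p2] or [I::figure.png].
CITATION_RE = re.compile(r"\[((?:T|I)::[^\]]+)\]")

def split_citations(answer: str, evidence_ids: Set[str]) -> Tuple[List[str], List[str]]:
    """Separate cited ids into those present in the evidence and those that are not."""
    cited = sorted(set(CITATION_RE.findall(answer)))
    supported = [c for c in cited if c in evidence_ids]
    unsupported = [c for c in cited if c not in evidence_ids]
    return supported, unsupported

answer = "Models use NWP [T::a.pdf::p2]. See the figure [I::fig.png] and [T::b.pdf::p9]."
evidence_ids = {"T::a.pdf::p2", "I::fig.png"}

supported, unsupported = split_citations(answer, evidence_ids)
print(supported)    # ['I::fig.png', 'T::a.pdf::p2']
print(unsupported)  # ['T::b.pdf::p9']
```

In the notebook, `evidence_ids` could be built from the `chunk_id` fields in `used_evidence`, and a non-empty `unsupported` list could be treated as a faithfulness failure.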

## 12) Quick Demo Run (needed for screenshots)
Run one query end-to-end and inspect top evidence + grounded answer.

In [None]:
QUERY = "Explain forecasting model"
CHUNKING = "page"          # "page" or "fixed"
MULTIMODAL = True          # True or False
METHOD = "hybrid_rerank"   # "sparse" | "dense" | "hybrid" | "hybrid_rerank"

text_chunks = text_page_chunks if CHUNKING == "page" else text_fixed_chunks
chunks, bm25, faiss_idx, reranker = build_indexes(text_chunks, image_chunks, multimodal=MULTIMODAL)

evidence = retrieve(QUERY, chunks, bm25, faiss_idx, reranker, method=METHOD, top_k_evidence=TOP_K_EVIDENCE, alpha=ALPHA)

print("Top evidence:")
for ch, s in evidence[:8]:
    print(f"- {ch.chunk_id} | {ch.modality} | {ch.source} | page={ch.page} | score={s:.3f}")

out = answer_question(QUERY, evidence)
print("\n=== Answer ===\n")
print(out["answer"])
print("\nCited chunk ids:", out["citations"])

## 13) Evaluation Setup (Queries + Gold Labels)
Fill `gold` with relevant chunk_ids after inspection.

This is required for Precision@5 and Recall@10.

In [None]:
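# Template with empty gold labels (fill in after inspecting the suggestions printed below):
# EVAL_QUERIES = [
#     {"query": "Explain forecasting model", "gold": []},
#     {"query": "What does the forecasting model architecture show?", "gold": []},
# ]
# NOTE: gold ids must match the chunk_id format produced at ingestion
# ("T::<pdf name>::p<page>" for PDF pages, "I::<image name>" for images).
# The placeholder ids below do not follow that format, so they never match a retrieved
# chunk and Precision@5 / Recall@10 stay at 0 until they are replaced.
EVAL_QUERIES = [
  {
    "query": "Explain forecasting model",
    "gold": ["pdf:stormcare:page:2", "img:stormcare:fig:arch:1"]
  },
  {
    "query": "What does the forecasting model architecture show?",
    "gold": ["img:stormcare:fig:arch:1"]
  }
]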

def suggest_gold_from_topk(query: str, chunking="page", multimodal=True, method="hybrid_rerank"):
    text_chunks = text_page_chunks if chunking == "page" else text_fixed_chunks
    chunks, bm25, faiss_idx, reranker = build_indexes(text_chunks, image_chunks, multimodal=multimodal)
    ev = retrieve(query, chunks, bm25, faiss_idx, reranker, method=method, top_k_evidence=10, alpha=ALPHA)
    return [(c.chunk_id, c.source, c.page, c.modality) for c,_ in ev]

for q in EVAL_QUERIES:
    print("\nQuery:", q["query"])
    for item in suggest_gold_from_topk(q["query"]):
        print(" ", item)
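
Gold labels only count as hits when they equal the `chunk_id` strings created at ingestion, so it can help to build them with small helpers instead of typing them by hand. The helpers below are hypothetical (they are not defined in the notebook) and simply mirror the id formats used by `read_pdf_pages`, `fixed_size_chunking`, and `read_images`; the file names are placeholders.

```python
from typing import Optional

def text_chunk_id(pdf_name: str, page: int, part: Optional[int] = None) -> str:
    """Id of a PDF page chunk, or of fixed-size part `part` within that page."""
    base = f"T::{pdf_name}::p{page}"
    return base if part is None else f"{base}::c{part}"

def image_chunk_id(image_name: str) -> str:
    """Id of an image chunk."""
    return f"I::{image_name}"

print(text_chunk_id("paper.pdf", 2))     # T::paper.pdf::p2
print(text_chunk_id("paper.pdf", 2, 1))  # T::paper.pdf::p2::c1
print(image_chunk_id("fig.png"))         # I::fig.png
```

Pages longer than `CHUNK_SIZE` get a `::c<part>` suffix under fixed-size chunking, so page-level gold ids do not match those chunks; the two chunking strategies need separate gold lists or a prefix match.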

## 14) Retrieval Metrics (Precision@5, Recall@10) + Ablations
Runs: (chunking × modality × method) for each query.

In [None]:
def compute_precision_recall(retrieved_ids: List[str], gold_ids: List[str], k_p=5, k_r=10) -> Tuple[float,float]:
    gold = set(gold_ids)
    if not gold:
        return (np.nan, np.nan)
    p_list = retrieved_ids[:k_p]
    r_list = retrieved_ids[:k_r]
    precision = len(set(p_list) & gold) / max(1, len(p_list))
    recall = len(set(r_list) & gold) / max(1, len(gold))
    return precision, recall

METHODS = ["dense", "sparse", "hybrid", "hybrid_rerank"]
CHUNKINGS = ["page", "fixed"]
MODALITIES = [("text_only", False), ("multimodal", True)]

rows = []
for cq in EVAL_QUERIES:
    query = cq["query"]
    gold = cq["gold"]

    for chunking in CHUNKINGS:
        for mod_name, multimodal in MODALITIES:
            text_chunks = text_page_chunks if chunking == "page" else text_fixed_chunks
            chunks, bm25, faiss_idx, reranker = build_indexes(text_chunks, image_chunks, multimodal=multimodal)

            for method in METHODS:
                ev = retrieve(query, chunks, bm25, faiss_idx, reranker, method=method, top_k_evidence=10, alpha=ALPHA)
                retrieved_ids = [c.chunk_id for c,_ in ev]
                p5, r10 = compute_precision_recall(retrieved_ids, gold, k_p=5, k_r=10)
                rows.append({"Query": query, "Chunking": chunking, "Modality": mod_name,
                             "Method": method, "Precision@5": p5, "Recall@10": r10})

results_df = pd.DataFrame(rows)
results_df
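
As a sanity check on the metric definitions, here is a worked example on toy ids. It is a pure-Python sketch that follows the same formulas as `compute_precision_recall`; the function name and the ids are illustrative.

```python
import math
from typing import List, Tuple

def precision_recall_at_k(retrieved: List[str], gold: List[str],
                          k_p: int = 5, k_r: int = 10) -> Tuple[float, float]:
    """Precision over the top k_p results and recall over the top k_r results."""
    gold_set = set(gold)
    if not gold_set:
        return (math.nan, math.nan)  # unlabeled query: metrics are undefined
    top_p = retrieved[:k_p]
    top_r = retrieved[:k_r]
    precision = len(set(top_p) & gold_set) / max(1, len(top_p))
    recall = len(set(top_r) & gold_set) / len(gold_set)
    return precision, recall

retrieved = ["a", "b", "c", "d", "e", "f", "g"]
gold = ["b", "f", "z"]

p5, r10 = precision_recall_at_k(retrieved, gold)
print(p5)   # 0.2  (only "b" is in the top 5)
print(r10)  # 2 of 3 gold ids ("b", "f") appear in the top 10
```

Note that precision divides by the number of results actually returned, so a query that returns fewer than five chunks is not penalized for the missing slots.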

## 15) Answer Grid (for qualitative scoring)
Stores generated answers across ablations so you can score faithfulness/coverage/missing-evidence behavior.

In [None]:
def run_answer_grid(queries: List[Dict[str,Any]]):
    rows = []
    for cq in queries:
        q = cq["query"]
        for chunking in CHUNKINGS:
            for mod_name, multimodal in MODALITIES:
                text_chunks = text_page_chunks if chunking == "page" else text_fixed_chunks
                chunks, bm25, faiss_idx, reranker = build_indexes(text_chunks, image_chunks, multimodal=multimodal)
                for method in METHODS:
                    ev = retrieve(q, chunks, bm25, faiss_idx, reranker, method=method, top_k_evidence=TOP_K_EVIDENCE, alpha=ALPHA)
                    ans = answer_question(q, ev)
                    rows.append({
                        "Query": q,
                        "Chunking": chunking,
                        "Modality": mod_name,
                        "Method": method,
                        "Answer": ans["answer"],
                        "Citations": ", ".join(ans["citations"]),
                        "MaxEvidenceScore": max([s for _,s in ev], default=0.0)
                    })
    return pd.DataFrame(rows)

answers_df = run_answer_grid(EVAL_QUERIES)
answers_df.head(8)

### Manual Rubric Columns
Add and fill:
- Faithfulness (0–2)
- Coverage (0–2)
- MissingEvidenceBehavior (0–2)

In [None]:
for col in ["Faithfulness","Coverage","MissingEvidenceBehavior"]:
    if col not in answers_df.columns:
        answers_df[col] = np.nan
answers_df

## 16) Summary Tables (for README)
Mean Precision@5 and Recall@10 across labeled queries.

In [None]:
summary = (results_df
           .groupby(["Chunking","Modality","Method"], dropna=False)[["Precision@5","Recall@10"]]
           .mean()
           .reset_index()
           .sort_values(["Modality","Chunking","Method"]))
summary

## 17) README Template Generator
Prints a ready-to-paste README for GitHub.

In [None]:
def df_to_md_table(df: pd.DataFrame, max_rows=60) -> str:
    d = df.copy().head(max_rows)
    return d.to_markdown(index=False)

readme = "\n".join([
"# Lab 3 Results — Multimodal RAG",
"",
"## Dataset",
f"- PDFs: {len(PDFS_IN_WORKSPACE)} file(s) in `{PDF_DIR}`",
f"- Images: {len(IMAGES_IN_WORKSPACE)} file(s) in `{IMG_DIR}`",
f"- Modalities: PDF text (page extraction), Image OCR (EasyOCR), optional captioning ({USE_CAPTIONING})",
"",
"## System Summary",
f"- Chunking: page-based + fixed-size ({CHUNK_SIZE} chars, overlap {CHUNK_OVERLAP})",
f"- Retrieval: Dense (SentenceTransformers+FAISS), Sparse (BM25), Hybrid (ALPHA={ALPHA}), Hybrid+Rerank (cross-encoder)",
f"- Generator: {'Gemini ('+GEMINI_MODEL+')' if USE_GEMINI else 'HuggingFace ('+HF_GENERATOR_MODEL+')'}",
f"- Missing evidence rule: threshold {MIN_RELEVANCE_FOR_ANSWER} -> outputs 'Not enough evidence in the retrieved context.'",
"",
"## Retrieval Results (Mean across labeled queries)",
df_to_md_table(summary),

])
print(readme)