# CS 5542: Lab 4 Notebook (Team Project)
## RAG Application Integration, Deployment, and Monitoring (Deadline: Feb. 12, 2026)

**Purpose:** This notebook is a **project-aligned template** for Lab 4. Your team should reuse your Lab-3 multimodal RAG pipeline and integrate it into a **deployable application** with **automatic logging** and **failure analysis**.

### Submission policy
- **Survey:** submitted **individually**
- **Deliverables (GitHub repo / notebook / report / deployment link):** submitted **as a team**

### Team-size requirement
- **1–2 students:** Base requirements + **1 extension**
- **3–4 students:** Base requirements + **2–3 extensions**

---

## What you will build (at minimum)
1. A **Streamlit app** that accepts a question and returns:
   - an **answer**
   - **retrieved evidence** with citations
   - a **metrics panel** (latency, P@5, R@10 if applicable)
2. An **automatic logger** that appends to: `logs/query_metrics.csv`
3. A **mini gold set** of **5 project queries** (Q1–Q5) for evaluation
4. **Two failure cases** with root cause + proposed fix

> **Important:** Lab 4 focuses on **application integration and deployment**, not on redesigning retrieval. Prefer reusing your Lab-3 modules.

---

## Recommended repository structure (for your team repo)
```
/app/              # Streamlit UI (required)
/rag/              # Retrieval + indexing modules (reuse from Lab 3)
/logs/             # query_metrics.csv (auto-created)
/data/             # your project-aligned PDFs/images (do NOT commit large/private data)
/api/              # optional FastAPI backend (extension)
/notebooks/        # this notebook
requirements.txt
README.md
```

---

## Contents of this notebook
1. Setup & environment checks  
2. Project dataset wiring (connect your Lab-3 ingestion)  
3. Mini gold set (Q1–Q5)  
4. Retrieval + answer function (reuse your Lab-3 pipeline)  
5. Evaluation + logging (required)  
6. Streamlit app skeleton (required)  
7. Optional extension: FastAPI backend  
8. Deployment checklist + failure analysis template


In [1]:
!rm data.zip

rm: cannot remove 'data.zip': No such file or directory


In [2]:
from google.colab import files
uploaded = files.upload()

Saving data.zip to data.zip


In [16]:
# Task: Load provided demo files (docs + images)
import os, zipfile, shutil
from pathlib import Path

ZIP_NAME = 'data.zip'

zip_candidates = [
    ZIP_NAME,
    f'/mnt/data/{ZIP_NAME}',
    f'./{ZIP_NAME}',
]

zip_path = None
for z in zip_candidates:
    if os.path.exists(z):
        zip_path = z
        break

if zip_path is None:
    raise FileNotFoundError("❌ data.zip not found. Please upload it.")

extract_dir = Path("./data")
extract_dir.mkdir(exist_ok=True)

with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(extract_dir)

print(f"✅ Extracted demo files from: {ZIP_NAME}")

data_dir = Path("./data")

# 1️⃣ docs
docs_dir = data_dir / "docs"
docs_dir.mkdir(exist_ok=True)

# Move any PDFs at the top level of ./data into ./data/docs
for f in data_dir.glob("*.pdf"):
    shutil.move(str(f), docs_dir / f.name)

# Delete macOS .DS_Store files left over from zipping
for f in data_dir.glob("*.DS_Store"):
    f.unlink()

# 2️⃣ figures → images
fig_dir = data_dir / "figures"
img_dir = data_dir / "images"

if fig_dir.exists() and not img_dir.exists():
    fig_dir.rename(img_dir)


print("\n📂 Final folder structure:")

if os.path.isdir('./data/docs'):
    print('Sample docs:', sorted(os.listdir('./data/docs'))[:6])

if os.path.isdir('./data/images'):
    print('Sample images:', sorted(os.listdir('./data/images'))[:6])

✅ Extracted demo files from: data.zip

📂 Final folder structure:
Sample docs: []


In [17]:
# Ensure a numeric/table-like demo file exists for Q4
import os
numeric_path = './data/docs/07_numeric_table.txt'
if not os.path.exists(numeric_path):
    with open(numeric_path, 'w', encoding='utf-8') as f:
        f.write(
            'Fusion Hyperparameters (Table 1)\n'
            'alpha = 0.50\n'
            'top_k = 5\n'
            'missing_evidence_score_threshold = 0.05\n'
            'latency_alert_ms = 2000\n'
        )
    print('✅ Created:', numeric_path)
else:
    print('✅ Numeric demo file already present:', numeric_path)


✅ Created: ./data/docs/07_numeric_table.txt


In [18]:
# Sanity checks: ensure demo docs are loaded
import os, glob
doc_files = glob.glob('./data/docs/*.txt')
print('Found .txt docs:', len(doc_files))
assert len(doc_files) > 0, 'No docs found. Ensure the demo ZIP was extracted and ./data/docs exists.'

# Preview one document
with open(doc_files[0], 'r', encoding='utf-8') as f:
    preview = f.read()[:600]
print('Preview:', os.path.basename(doc_files[0]))
print(preview)


Found .txt docs: 1
Preview: 07_numeric_table.txt
Fusion Hyperparameters (Table 1)
alpha = 0.50
top_k = 5
missing_evidence_score_threshold = 0.05
latency_alert_ms = 2000



In [11]:
!pip install PyPDF2

Collecting PyPDF2
  Downloading pypdf2-3.0.1-py3-none-any.whl.metadata (6.8 kB)
Downloading pypdf2-3.0.1-py3-none-any.whl (232 kB)
Installing collected packages: PyPDF2
Successfully installed PyPDF2-3.0.1


In [19]:
import os

for root, dirs, files in os.walk('.'):
    print(root)
    for f in files[:5]:
        print("   ", f)

.
    data.zip
./.config
    .last_survey_prompt.yaml
    .last_update_check.json
    gce
    .last_opt_in_prompt.yaml
    active_config
./.config/logs
./.config/logs/2026.01.16
    14.23.31.981136.log
    14.24.18.954466.log
    14.24.03.314209.log
    14.24.28.646070.log
    14.24.29.392089.log
./.config/configurations
    config_default
./.ipynb_checkpoints
./data
./data/__MACOSX
    ._data
./data/__MACOSX/data
    ._LLM-Powered Knowledge Graphs for Enterprise Intelligence and Analytics.pdf
    ._HyperGraphRAG- Retrieval-Augmented Generation via Hypergraph-Structured Knowledge Representation.pdf
    ._.DS_Store
./data/__MACOSX/data/figures
    ._Illustration of HyperGraphRAG.png
    ._Comparasion of Standard RAG, GraphRAG, HyperGraphRAG.png
    ._Overview of HyperGraphRAG.png
    ._Framework overview for unified knowledge.png
    ._Leveraging Embedding Models, Contextual Retrieval, and LLM-Based Mapping to Construct Graph Triples for Knowledge Graphs.png
./data/data
    LLM-Powered 

In [21]:
import glob, os
from PyPDF2 import PdfReader

# Actual location of your PDFs
DOC_DIR = "./data/data"

pdf_files = sorted(glob.glob(os.path.join(DOC_DIR, "*.pdf")))
print("find PDF:", pdf_files)

if len(pdf_files) == 0:
    raise RuntimeError(f"❌ No PDF files found in {DOC_DIR}")

documents = []

for p in pdf_files:
    reader = PdfReader(p)
    text = ""

    for page in reader.pages[:5]:
        t = page.extract_text()
        if t:
            text += t + "\n"

    documents.append({
        "doc_id": os.path.basename(p),
        "source": p,
        "text": text[:4000]
    })

print("✅ Loaded documents:", len(documents))
print("Example doc:", documents[0]["doc_id"])

find PDF: ['./data/data/HyperGraphRAG- Retrieval-Augmented Generation via Hypergraph-Structured Knowledge Representation.pdf', './data/data/LLM-Powered Knowledge Graphs for Enterprise Intelligence and Analytics.pdf']
✅ Loaded documents: 2
Example doc: HyperGraphRAG- Retrieval-Augmented Generation via Hypergraph-Structured Knowledge Representation.pdf


In [23]:
# Load demo images and create lightweight text surrogates (captions) for multimodal retrieval
import glob, os

IMG_DIR = './data/data/figures'
img_files = sorted(glob.glob(os.path.join(IMG_DIR, '*.*')))
img_files = [p for p in img_files if p.lower().endswith(('.png','.jpg','.jpeg','.webp'))]

# Minimal captions so images participate in retrieval without requiring a vision encoder
IMAGE_CAPTIONS = {
    'rag_pipeline.png': 'RAG pipeline diagram: ingest, chunk, index, retrieve top-k evidence, build context, generate grounded answer, log metrics for monitoring.',
    'retrieval_modes.png': 'Retrieval modes diagram: BM25 keyword, vector semantic, hybrid fusion, multi-hop hop-1 to hop-2 refinement.',
}

images = []
for p in img_files:
    fid = os.path.basename(p)
    cap = IMAGE_CAPTIONS.get(fid, fid.replace('_',' ').replace('.png','').replace('.jpg',''))
    images.append({'img_id': fid, 'source': p, 'text': cap})

print('✅ Loaded images:', len(images))
if images:
    print('Example image:', images[0]['img_id'])
    print('Caption:', images[0]['text'])

# Unified evidence store used by retrieval (text + images)
items = []
for d in documents:
    items.append({
        'evidence_id': d.get('doc_id') or os.path.basename(d.get('source','')),
        'modality': 'text',
        'source': d.get('source'),
        'text': d.get('text','')
    })
for im in images:
    items.append({
        'evidence_id': f"img::{im['img_id']}",
        'modality': 'image',
        'source': im.get('source'),
        'text': im.get('text','')
    })

assert len(items) > 0, 'Evidence store is empty.'
print('✅ Unified evidence items:', len(items), '(text:', len(documents), ', images:', len(images), ')')


✅ Loaded images: 5
Example image: Comparasion of Standard RAG, GraphRAG, HyperGraphRAG.png
Caption: Comparasion of Standard RAG, GraphRAG, HyperGraphRAG
✅ Unified evidence items: 7 (text: 2 , images: 5 )


# 1) Setup & environment checks

This notebook includes **safe defaults** and **lightweight code examples**.  
Replace the placeholder pieces with your Lab-3 implementation (PDF parsing, OCR, multimodal evidence, hybrid retrieval, reranking).

### Install dependencies (edit as needed)
- Core: `streamlit`, `pandas`, `numpy`, `requests`
- Optional: `fastapi`, `uvicorn` (if you do the FastAPI extension)
- Retrieval examples: `scikit-learn` (TF-IDF baseline), optionally `sentence-transformers` (dense embeddings)

> In your team repo, always keep a clean `requirements.txt` for reproducibility.
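
A quick way to confirm the environment before running the rest of the notebook is to probe for each package without importing it. The sketch below is illustrative only: the required/optional split mirrors the list above, and `missing_modules` is a hypothetical helper name, not part of the lab template. Note that `scikit-learn` is imported as `sklearn` and `sentence-transformers` as `sentence_transformers`.

```python
import importlib.util

# Import names to probe (assumed split, mirroring the dependency list above).
REQUIRED = ["streamlit", "pandas", "numpy", "requests"]
OPTIONAL = ["fastapi", "uvicorn", "sklearn", "sentence_transformers"]

def missing_modules(names):
    """Return the subset of `names` that cannot be found in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]

print("Missing required:", missing_modules(REQUIRED))
print("Missing optional:", missing_modules(OPTIONAL))
```

Anything reported as missing can be added to `requirements.txt` and installed before continuing.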


In [24]:
# If running in Colab or fresh environment, uncomment installs:
# !pip -q install streamlit pandas numpy requests scikit-learn
# # Optional (FastAPI extension):
# !pip -q install fastapi uvicorn pydantic
# # Optional (dense retrieval):
# !pip -q install sentence-transformers

import os, json, time
from pathlib import Path
import pandas as pd
import numpy as np

print("Python OK. Working directory:", os.getcwd())


Python OK. Working directory: /content


# 2) Project paths + configuration

Set your project data paths and key parameters here.

- Do **not** hardcode secrets (API keys) in notebooks or repos.
- If you use a hosted LLM, read from environment variables locally.

**Tip:** Keep these settings mirrored in `rag/config.py` so your Streamlit app uses the same config.
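
As one possible way to follow the no-hardcoded-secrets rule, the sketch below reads a key from the environment and degrades gracefully when it is absent. The variable name `LLM_API_KEY` and the helper `get_api_key` are assumptions for illustration; use whatever name your hosted LLM provider expects.

```python
import os

def get_api_key(var_name: str = "LLM_API_KEY"):
    """Return the key from the environment, or None if it is unset or blank."""
    value = os.environ.get(var_name, "").strip()
    return value or None

api_key = get_api_key()
if api_key is None:
    # No key available: keep using the local stub instead of failing.
    print("LLM_API_KEY is not set; using the local stub answer function.")
else:
    # Never print the key itself, only the fact that it exists.
    print("LLM_API_KEY found (value not printed).")
```

Locally, the variable can be set in the shell before launching the notebook or the Streamlit app, so the key never appears in committed files.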


In [25]:
from dataclasses import dataclass

@dataclass
class Lab4Config:
    project_name: str = "LexGuard – The Neuro-Symbolic Compliance Auditor"
    data_dir: str = "./data"        # where your PDFs/images live locally
    logs_dir: str = "./logs"
    log_file: str = "./logs/query_metrics.csv"
    top_k_default: int = 10
    eval_p_at: int = 5
    eval_r_at: int = 10

cfg = Lab4Config()
Path(cfg.logs_dir).mkdir(parents=True, exist_ok=True)
print(cfg)


Lab4Config(project_name='LexGuard – The Neuro-Symbolic Compliance Auditor', data_dir='./data', logs_dir='./logs', log_file='./logs/query_metrics.csv', top_k_default=10, eval_p_at=5, eval_r_at=10)


# 3) Dataset wiring (project-aligned)

For Lab 4, your **data, application UI, and models** must be aligned to your team project.

## Required (project-aligned)
- 2–6 PDFs
- 5–15 images/figures/tables (if your project is multimodal)

## In Lab 3 you likely had:
- PDF text extraction (PyMuPDF)
- OCR / captions for figures or scanned pages
- Chunking + indexing (dense/sparse/hybrid)
- Reranking (optional)
- Grounded answer generation with citations

### What to do here
1. Point this notebook to your dataset folder.
2. Load *already-prepared* chunks/evidence from Lab 3 (recommended), OR
3. Call your Lab-3 ingestion function to rebuild the index.

Below is a **minimal example** that loads plain text files as “documents” so the notebook is runnable even without PDFs.
Replace it with your Lab-3 ingestion code.
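
If your Lab-3 pipeline already saved its chunks, option 2 above could look roughly like the following. This is a sketch under assumptions: the JSONL layout (one object per line with `doc_id`, `text`, and optionally `source`) mirrors the fields used by this template's loader, and `load_chunks_jsonl` and the demo file name are hypothetical.

```python
import json
import tempfile
from pathlib import Path

def load_chunks_jsonl(path):
    """Read one JSON object per line, skip blank lines, require doc_id and text."""
    chunks = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, 1):
            if not line.strip():
                continue
            rec = json.loads(line)
            missing = {"doc_id", "text"} - rec.keys()
            if missing:
                raise ValueError(f"{path}:{line_no} is missing fields {sorted(missing)}")
            rec.setdefault("source", str(path))  # fall back to the chunk file itself
            chunks.append(rec)
    return chunks

# Demo with a throwaway file (made-up records, not real project data).
demo_path = Path(tempfile.mkdtemp()) / "chunks.jsonl"
demo_path.write_text(
    json.dumps({"doc_id": "demo_a", "text": "First chunk.", "source": "a.pdf"}) + "\n"
    "\n"
    + json.dumps({"doc_id": "demo_b", "text": "Second chunk."}) + "\n",
    encoding="utf-8",
)
chunks = load_chunks_jsonl(demo_path)
print(len(chunks), [c["doc_id"] for c in chunks])
```

Loading prepared chunks this way avoids re-running PDF parsing and OCR every time the app starts.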


In [26]:
# Minimal runnable loader (replace with your Lab-3 ingestion + chunking)
# Expected structure (example):
# ./data/
#   docs/
#     doc1.txt
#     doc2.txt
#
# For PDFs/images, reuse your Lab-3 ingestion + chunking and store chunks as JSONL/CSV.

from pathlib import Path

# Directory for text documents
docs_dir = Path(cfg.data_dir) / "docs"
docs_dir.mkdir(parents=True, exist_ok=True)

# Create a demo document if no txt files exist
demo_file = docs_dir / "demo_doc.txt"
if not any(docs_dir.glob("*.txt")):
    demo_file.write_text(
        "This is a demo document for Lab 4. Replace this with your PDF chunks.\n"
        "Key idea: retrieval quality drives grounded answers. Provide citations for all claims.\n"
        "If missing evidence, return: Not enough evidence in the retrieved context.\n",
        encoding="utf-8"
    )

def load_text_docs(docs_path: Path):
    items = []
    for p in sorted(docs_path.glob("*.txt")):
        items.append({
            "doc_id": p.stem,
            "source": str(p),
            # Read as UTF-8 explicitly so results do not depend on the platform default
            "text": p.read_text(encoding="utf-8", errors="ignore")
        })
    return items

# Combine PDF documents (already in 'documents') with txt documents
txt_docs = load_text_docs(docs_dir)
documents = documents + txt_docs

print("Loaded docs:", len(documents))
documents[0].keys(), documents[0]["doc_id"]

Loaded docs: 3


(dict_keys(['doc_id', 'source', 'text']),
 'HyperGraphRAG- Retrieval-Augmented Generation via Hypergraph-Structured Knowledge Representation.pdf')

# 4) Mini Gold Set, Q1–Q5 (Required)

Create **5 project-relevant queries** and define a simple evidence rubric.

- **Q1–Q3:** typical project queries (answerable using evidence)
- **Q4:** multimodal evidence query (table/figure heavy, OCR/captions should help)
- **Q5:** missing-evidence or ambiguous query (must trigger safe behavior)

For each query, define:
- `gold_evidence_ids`: list of evidence identifiers that are relevant (doc_id/page/fig id)
- `answer_criteria`: 1–2 bullets
- `citation_format`: how you will cite (e.g., `[Doc1 p3]`, `[fig2]`)

This enables **consistent evaluation** and makes logging meaningful.
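
Because evaluation compares IDs by exact string match, a gold ID that differs even slightly from what the retriever returns will silently score zero. The sketch below shows one way to catch that early; `check_gold_ids` is a hypothetical helper and the IDs are made-up example data.

```python
def check_gold_ids(gold_set, known_ids):
    """Map query_id -> gold IDs that do not appear in known_ids (exact match)."""
    known = set(known_ids)
    problems = {}
    for q in gold_set:
        # "N/A" marks a deliberately unanswerable query, so it is not checked.
        gold = [g for g in q.get("gold_evidence_ids", []) if g != "N/A"]
        unknown = [g for g in gold if g not in known]
        if unknown:
            problems[q["query_id"]] = unknown
    return problems

# Hypothetical example data
known_ids = ["paper_a.pdf", "paper_b.pdf"]
gold_set = [
    {"query_id": "Q1", "gold_evidence_ids": ["paper_a.pdf"]},
    {"query_id": "Q2", "gold_evidence_ids": ["paper_b"]},  # missing extension: no match
    {"query_id": "Q5", "gold_evidence_ids": ["N/A"]},
]
problems = check_gold_ids(gold_set, known_ids)
print(problems)
```

Running a check like this against the `doc_id` values of your loaded documents, before logging any metrics, helps ensure that a P@5 or R@10 of 0 reflects retrieval quality rather than a labeling mismatch.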


In [27]:
import pandas as pd

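# Task: Populate a mini gold set for monitoring and ablation
# Gold IDs must equal the loaded doc_ids exactly (here, the full PDF basenames).
# The short names below do not match those doc_ids, so P@5/R@10 score 0 until they are updated.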

mini_gold = [
    {
        "query_id": "Q1",
        "question": "What is the primary limitation of existing graph-based RAG methods that HyperGraphRAG aims to solve?",
        "gold_evidence_ids": ["HyperGraphRAG.pdf"],  # replace with your PDF basename if needed
        "answer_criteria": ["Identifies 'binary relations'", "Identifies 'representation sparsity'"],
        "citation_format": "[doc_id]"
    },
    {
        "query_id": "Q2",
        "question": "How does the Smart-Summarizer module process raw data in the enterprise framework?",
        "gold_evidence_ids": ["Enterprise_KG.pdf"],  # replace with your PDF basename if needed
        "answer_criteria": ["Explains extracting entities/relations", "Preserves integrity"],
        "citation_format": "[doc_id]"
    },
    {
        "query_id": "Q3",
        "question": "What specific graph structure does HyperGraphRAG use to store the knowledge hypergraph?",
        "gold_evidence_ids": ["HyperGraphRAG.pdf"],
        "answer_criteria": ["Identifies 'Bipartite Graph Storage'"],
        "citation_format": "[doc_id]"
    },
    {
        "query_id": "Q4",
        "question": "Based on Figure 2, how does HyperGraphRAG's knowledge representation differ from Standard RAG and GraphRAG?",
        "gold_evidence_ids": ["HyperGraphRAG.pdf"],
        "answer_criteria": ["Describes difference between Chunk-based, GraphRAG, and HyperGraphRAG"],
        "citation_format": "[doc_id]"
    },
    {
        "query_id": "Q5",
        "question": "What are the chemical properties of Hydrogen fuel cells?",
        "gold_evidence_ids": ["N/A"],
        "answer_criteria": ["Returns 'Not enough evidence in the retrieved context.'", "No hallucination"],
        "citation_format": ""
    },
]

# Display in DataFrame
pd.DataFrame(mini_gold)[["query_id", "question", "gold_evidence_ids"]]

|   | query_id | question | gold_evidence_ids |
|---|---|---|---|
| 0 | Q1 | What is the primary limitation of existing gra... | [HyperGraphRAG.pdf] |
| 1 | Q2 | How does the Smart-Summarizer module process r... | [Enterprise_KG.pdf] |
| 2 | Q3 | What specific graph structure does HyperGraphR... | [HyperGraphRAG.pdf] |
| 3 | Q4 | Based on Figure 2, how does HyperGraphRAG's kn... | [HyperGraphRAG.pdf] |
| 4 | Q5 | What are the chemical properties of Hydrogen f... | [N/A] |


In [28]:
import pandas as pd

# Task: Mini gold set (evidence IDs) for evaluation
# Evidence IDs refer to your documents under ./data/data (PDFs) or ./data/docs (txt). No image evidence included here.

mini_gold = [
    {
        'query_id': 'Q1',
        'question': 'What is the primary limitation of existing graph-based RAG methods that HyperGraphRAG aims to solve?',
        'gold_evidence_ids': ['HyperGraphRAG.pdf']
    },
    {
        'query_id': 'Q2',
        'question': 'How does the Smart-Summarizer module process raw data in the enterprise framework?',
        'gold_evidence_ids': ['Enterprise_KG.pdf']
    },
    {
        'query_id': 'Q3',
        'question': 'What specific graph structure does HyperGraphRAG use to store the knowledge hypergraph?',
        'gold_evidence_ids': ['HyperGraphRAG.pdf']
    },
    {
        'query_id': 'Q4',
        'question': 'Based on Figure 2, how does HyperGraphRAG\'s knowledge representation differ from Standard RAG and GraphRAG?',
        'gold_evidence_ids': ['HyperGraphRAG.pdf']
    },
    {
        'query_id': 'Q5',
        'question': 'What are the chemical properties of Hydrogen fuel cells?',
        'gold_evidence_ids': ['N/A']  # No evidence available
    }
]

# Display as a DataFrame
pd.DataFrame(mini_gold)[['query_id','question','gold_evidence_ids']]

|   | query_id | question | gold_evidence_ids |
|---|---|---|---|
| 0 | Q1 | What is the primary limitation of existing gra... | [HyperGraphRAG.pdf] |
| 1 | Q2 | How does the Smart-Summarizer module process r... | [Enterprise_KG.pdf] |
| 2 | Q3 | What specific graph structure does HyperGraphR... | [HyperGraphRAG.pdf] |
| 3 | Q4 | Based on Figure 2, how does HyperGraphRAG's kn... | [HyperGraphRAG.pdf] |
| 4 | Q5 | What are the chemical properties of Hydrogen f... | [N/A] |


# 5) Retrieval + Answer Function (Reuse Lab 3)

Below is a **baseline TF-IDF retriever** so this notebook is runnable.
Replace with your Lab-3 retrieval stack:
- dense (SentenceTransformers + FAISS/Chroma)
- sparse (BM25)
- hybrid fusion
- optional reranking

### Output contract (recommended)
Your retrieval function should return a list of evidence items:
- `chunk_id` or `doc_id`
- `source`
- `score`
- `citation_tag` (e.g., `[Doc1 p3]`, `[fig2]`)
- `text` (the evidence text shown to users)

Your answer function must enforce:
- **Citations for claims**
- If missing evidence: **return exactly**  
  `Not enough evidence in the retrieved context.`
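
The contract and the two answer rules above can be checked mechanically. The following is a minimal sketch, not the required implementation: `validate_evidence` and `guard_answer` are hypothetical helper names, the example evidence item is made up, and the `0.05` score threshold is simply the value this template's stub uses.

```python
REQUIRED_KEYS = {"source", "score", "citation_tag", "text"}
MISSING_MSG = "Not enough evidence in the retrieved context."

def validate_evidence(evidence):
    """Return a list of contract violations (empty if every item passes)."""
    errors = []
    for i, item in enumerate(evidence):
        if not ("chunk_id" in item or "doc_id" in item):
            errors.append(f"item {i}: needs chunk_id or doc_id")
        for key in sorted(REQUIRED_KEYS - item.keys()):
            errors.append(f"item {i}: missing {key}")
    return errors

def guard_answer(answer, evidence, min_score=0.05):
    """Pass the answer through only if evidence is strong enough and is cited."""
    if not evidence or max(e.get("score", 0.0) for e in evidence) < min_score:
        return MISSING_MSG
    if not any(e["citation_tag"] in answer for e in evidence):
        return MISSING_MSG
    return answer

# Made-up evidence item for illustration
evidence = [{
    "chunk_id": "doc1_p3", "source": "doc1.pdf", "score": 0.42,
    "citation_tag": "[Doc1 p3]", "text": "Example evidence text.",
}]
print(validate_evidence(evidence))
print(guard_answer("The method uses a bipartite store [Doc1 p3].", evidence))
print(guard_answer("An uncited claim.", evidence))
```

Wrapping the generator output in a guard like this keeps the exact missing-evidence string in one place, which makes the logged `missing_evidence_behavior` column easier to trust.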


In [29]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Build a simple TF-IDF index over documents (demo baseline)
corpus = [d["text"] for d in documents]
doc_ids = [d["doc_id"] for d in documents]
sources = [d["source"] for d in documents]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(corpus)

def retrieve_tfidf(question: str, top_k: int = 5):
    q = vectorizer.transform([question])
    sims = cosine_similarity(q, X).ravel()
    idxs = np.argsort(-sims)[:top_k]
    evidence = []
    for i in idxs:  # idxs is already sorted by descending similarity
        evidence.append({
            "chunk_id": doc_ids[i],
            "source": sources[i],
            "score": float(sims[i]),
            "citation_tag": f"[{doc_ids[i]}]",
            "text": corpus[i][:800]  # truncate for UI
        })
    return evidence

MISSING_EVIDENCE_MSG = "Not enough evidence in the retrieved context."

def generate_answer_stub(question: str, evidence: list):
    """Replace with your LLM/VLM generation.
    For this template we produce a simple grounded response.
    """
    if not evidence or max(e.get("score", 0.0) for e in evidence) < 0.05:
        return MISSING_EVIDENCE_MSG

    # Minimal grounded "answer" example: summarize top evidence
    top = evidence[0]
    answer = (
        f"Based on the retrieved evidence {top['citation_tag']}, "
        f"the system should ground its response in retrieved context and cite sources. "
        f"If evidence is missing, it must respond with: '{MISSING_EVIDENCE_MSG}'. "
        f"{top['citation_tag']}"
    )
    return answer

# Quick test
test_q = mini_gold[0]["question"]
ev = retrieve_tfidf(test_q, top_k=3)
print("Top evidence:", ev[0]["chunk_id"], ev[0]["score"])
print("Answer:", generate_answer_stub(test_q, ev))


Top evidence: HyperGraphRAG- Retrieval-Augmented Generation via Hypergraph-Structured Knowledge Representation.pdf 0.36144210973997126
Answer: Based on the retrieved evidence [HyperGraphRAG- Retrieval-Augmented Generation via Hypergraph-Structured Knowledge Representation.pdf], the system should ground its response in retrieved context and cite sources. If evidence is missing, it must respond with: 'Not enough evidence in the retrieved context.'. [HyperGraphRAG- Retrieval-Augmented Generation via Hypergraph-Structured Knowledge Representation.pdf]


# 6) Evaluation + Logging (Required)

Every query must append to: `logs/query_metrics.csv`

Required columns (minimum):
- timestamp
- query_id
- retrieval_mode
- top_k
- latency_ms
- Precision@5
- Recall@10
- evidence_ids_returned
- faithfulness_pass
- missing_evidence_behavior

> If your gold set is incomplete (common for Q4/Q5), compute P/R only for labeled queries and still log latency/evidence IDs.
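
One way to append rows with exactly these columns is `csv.DictWriter`, which writes the header once and leaves unlabeled metrics blank. This is a sketch under assumptions: `append_log_row` is a hypothetical helper, the row values are made up, and the demo writes to a temporary directory rather than the real `logs/` folder.

```python
import csv
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

COLUMNS = [
    "timestamp", "query_id", "retrieval_mode", "top_k", "latency_ms",
    "Precision@5", "Recall@10", "evidence_ids_returned",
    "faithfulness_pass", "missing_evidence_behavior",
]

def append_log_row(path, row):
    """Append one row; write the header first if the file is new or empty."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    is_new = not path.exists() or path.stat().st_size == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        # restval="" leaves columns blank when the row has no value for them
        writer = csv.DictWriter(f, fieldnames=COLUMNS, restval="")
        if is_new:
            writer.writeheader()
        writer.writerow(row)

# Demo in a temporary directory (made-up values).
demo_log = Path(tempfile.mkdtemp()) / "logs" / "query_metrics.csv"
base = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "retrieval_mode": "tfidf", "top_k": 10, "latency_ms": 12.3,
    "evidence_ids_returned": json.dumps(["doc_a", "doc_b"]),
    "faithfulness_pass": "Yes", "missing_evidence_behavior": "Pass",
}
append_log_row(demo_log, {**base, "query_id": "Q1", "Precision@5": 0.2, "Recall@10": 1.0})
append_log_row(demo_log, {**base, "query_id": "Q5"})  # unlabeled: P/R left blank

with open(demo_log, newline="", encoding="utf-8") as f:
    logged = list(csv.DictReader(f))
print([(r["query_id"], r["Precision@5"]) for r in logged])
```

Keying rows by column name rather than position means a reordered or extended header cannot silently shift values into the wrong column.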

## How we define metrics (simple)
- `Precision@K`: (# of top-K retrieved evidence IDs that are in the gold set) / K
- `Recall@K`: (# of top-K retrieved evidence IDs that are in the gold set) / (size of gold set)

**Faithfulness (Yes/No):**
- Yes if the answer **only** uses retrieved evidence and includes citations.
- For this template, we implement a simple heuristic. Replace with your rubric/judge if desired.
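
To make the two metric definitions concrete, here is a small worked example. The names `p_at_k` and `r_at_k` and the IDs are illustrative only (chosen so they do not clash with the notebook's own functions); the arithmetic follows the definitions above.

```python
def p_at_k(retrieved, gold, k):
    """Precision@K: hits in the top K divided by K (None if K <= 0)."""
    if k <= 0:
        return None
    hits = len(set(retrieved[:k]) & set(gold))
    return hits / k

def r_at_k(retrieved, gold, k):
    """Recall@K: hits in the top K divided by the gold-set size (None if no gold)."""
    if not gold:
        return None
    hits = len(set(retrieved[:k]) & set(gold))
    return hits / len(set(gold))

# Made-up ranking and gold labels
retrieved = ["d1", "d4", "d2", "d7", "d9"]
gold = ["d1", "d2", "d3"]

print(p_at_k(retrieved, gold, 5))    # 2 hits / 5 retrieved = 0.4
print(r_at_k(retrieved, gold, 5))    # 2 hits / 3 gold items
print(p_at_k(retrieved, ["d1"], 5))  # 1 hit / 5 = 0.2
```

Note the last line: when a query has a single gold item, Precision@5 cannot exceed 0.2 under this definition, so low precision on a small gold set is expected and Recall@K is the more informative number.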


In [30]:
import os
import csv
from datetime import datetime, timezone



def _canon_evidence_id(x: str) -> str:
    x = str(x).strip()
    # keep img:: prefix intact
    if x.startswith('img::'):
        return x
    # normalize file ids: allow with/without extension
    if x.endswith('.txt'):
        return x[:-4]
    return x

def _normalize_retrieved_ids(retrieved):
    """Normalize retrieved outputs into a list of evidence IDs.
    Returns canonical IDs (doc_id without .txt, or img::filename).

    Supports: list[dict], list[(idx,score)], list[str].
    """
    if retrieved is None:
        return []
    if len(retrieved) == 0:
        return []
    # list[str]
    if isinstance(retrieved[0], str):
        return [_canon_evidence_id(r) for r in retrieved]
    # list[dict]
    if isinstance(retrieved[0], dict):
        out=[]
        for r in retrieved:
            if 'evidence_id' in r and r['evidence_id']:
                out.append(_canon_evidence_id(r['evidence_id']))
            elif 'doc_id' in r and r['doc_id']:
                out.append(_canon_evidence_id(r['doc_id']))
            elif 'source' in r and r['source']:
                out.append(_canon_evidence_id(os.path.basename(str(r['source']))))
        return out
    # list[(idx, score)]
    if isinstance(retrieved[0], (tuple, list)) and len(retrieved[0]) >= 1:
        out=[]
        for item in retrieved:
            idx = int(item[0])
            if 'items' in globals() and 0 <= idx < len(items):
                out.append(_canon_evidence_id(items[idx].get('evidence_id')))
            elif 'documents' in globals() and 0 <= idx < len(documents):
                out.append(_canon_evidence_id(documents[idx].get('doc_id') or os.path.basename(documents[idx].get('source',''))))
        return out
    return []

def _normalize_gold_ids(gold_ids):
    if not gold_ids or gold_ids == ['N/A']:
        return None
    return [_canon_evidence_id(g) for g in gold_ids]

def precision_at_k(retrieved, gold_ids, k):
    gold = _normalize_gold_ids(gold_ids)
    if gold is None:
        return None
    retrieved_ids = _normalize_retrieved_ids(retrieved)[:k]
    if k == 0:
        return None
    return len(set(retrieved_ids) & set(gold)) / float(k)

def recall_at_k(retrieved, gold_ids, k):
    gold = _normalize_gold_ids(gold_ids)
    if gold is None:
        return None
    retrieved_ids = _normalize_retrieved_ids(retrieved)[:k]
    denom = float(len(set(gold)))
    return (len(set(retrieved_ids) & set(gold)) / denom) if denom > 0 else None



def faithfulness_heuristic(answer: str, evidence: list):
    # Simple heuristic: answer includes at least one citation tag from evidence OR is missing-evidence msg
    if answer.strip() == MISSING_EVIDENCE_MSG:
        return True
    tags = [e["citation_tag"] for e in evidence[:5]]
    return any(tag in answer for tag in tags)

def missing_evidence_behavior(answer: str, evidence: list):
    # Pass if either: evidence present and answer not missing-evidence; or evidence absent and answer is missing-evidence msg
    has_ev = bool(evidence) and max(e.get("score", 0.0) for e in evidence) >= 0.05
    if not has_ev:
        return "Pass" if answer.strip() == MISSING_EVIDENCE_MSG else "Fail"
    else:
        return "Pass" if answer.strip() != MISSING_EVIDENCE_MSG else "Fail"

def ensure_logfile(path: str, header: list):
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    if not p.exists():
        with open(p, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(header)

LOG_HEADER = [
    "timestamp", "query_id", "retrieval_mode", "top_k", "latency_ms",
    "Precision@5", "Recall@10",
    "evidence_ids_returned", "gold_evidence_ids",
    "faithfulness_pass", "missing_evidence_behavior"
]
ensure_logfile(cfg.log_file, LOG_HEADER)

def run_query_and_log(query_item, retrieval_mode = 'hybrid', top_k=10):
    question = query_item["question"]
    gold_ids = query_item.get("gold_evidence_ids", [])

    t0 = time.time()
    # NOTE: retrieval_mode is only written to the log; this demo always retrieves with TF-IDF.
    evidence = retrieve_tfidf(question, top_k=top_k)  # replace with your pipeline + modes
    answer = generate_answer_stub(question, evidence) # replace with LLM/VLM
    latency_ms = (time.time() - t0) * 1000.0

    retrieved_ids = [e["chunk_id"] for e in evidence]
    p5 = precision_at_k(retrieved_ids, gold_ids, cfg.eval_p_at) if gold_ids else np.nan
    r10 = recall_at_k(retrieved_ids, gold_ids, cfg.eval_r_at) if gold_ids else np.nan

    faithful = faithfulness_heuristic(answer, evidence)
    meb = missing_evidence_behavior(answer, evidence)

    row = [
        datetime.now(timezone.utc).isoformat(),
        query_item["query_id"],
        retrieval_mode,
        top_k,
        round(latency_ms, 2),
        p5,
        r10,
        json.dumps(retrieved_ids),
        json.dumps(gold_ids),
        "Yes" if faithful else "No",
        meb
    ]
    with open(cfg.log_file, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(row)

    return {"answer": answer, "evidence": evidence, "p5": p5, "r10": r10, "latency_ms": latency_ms, "faithful": faithful, "meb": meb}

# Run all five queries once (demo)
results = []
for qi in mini_gold:
    results.append(run_query_and_log(qi, retrieval_mode = 'hybrid', top_k=cfg.top_k_default))

pd.read_csv(cfg.log_file).tail(8)


|   | timestamp | query_id | retrieval_mode | top_k | latency_ms | Precision@5 | Recall@10 | evidence_ids_returned | gold_evidence_ids | faithfulness_pass | missing_evidence_behavior |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2026-02-13T05:39:49.613371+00:00 | Q1 | hybrid | 10 | 2.08 | 0.0 | 0.0 | ["HyperGraphRAG- Retrieval-Augmented Generatio... | ["HyperGraphRAG.pdf"] | Yes | Pass |
| 1 | 2026-02-13T05:39:49.615315+00:00 | Q2 | hybrid | 10 | 1.71 | 0.0 | 0.0 | ["LLM-Powered Knowledge Graphs for Enterprise ... | ["Enterprise_KG.pdf"] | Yes | Pass |
| 2 | 2026-02-13T05:39:49.617241+00:00 | Q3 | hybrid | 10 | 1.72 | 0.0 | 0.0 | ["HyperGraphRAG- Retrieval-Augmented Generatio... | ["HyperGraphRAG.pdf"] | Yes | Pass |
| 3 | 2026-02-13T05:39:49.619237+00:00 | Q4 | hybrid | 10 | 1.79 | 0.0 | 0.0 | ["HyperGraphRAG- Retrieval-Augmented Generatio... | ["HyperGraphRAG.pdf"] | Yes | Pass |
| 4 | 2026-02-13T05:39:49.621167+00:00 | Q5 | hybrid | 10 | 1.74 |  |  | ["HyperGraphRAG- Retrieval-Augmented Generatio... | ["N/A"] | Yes | Pass |


In [31]:
# Task: Run retrieval + answer generation for all mini-gold queries
# This cell is self-contained: if retrieval/indexing cells were skipped, it will bootstrap a TF-IDF retriever.
import os
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Build a local evidence list if not already present
if 'items' in globals():
    _evidence = items
elif 'documents' in globals():
    _evidence = []
    for d in documents:
        _evidence.append({
            'evidence_id': d.get('doc_id') or os.path.basename(d.get('source','')),
            'modality': 'text',
            'source': d.get('source'),
            'text': d.get('text','')
        })
else:
    raise NameError('Neither items nor documents are defined. Run the ZIP extraction + document loading cells first.')

assert len(_evidence) > 0, 'Evidence store is empty.'

# Canonicalize evidence ids for consistent evaluation
def _canon_evidence_id(x: str) -> str:
    x = str(x).strip()
    if x.startswith('img::'):
        return x
    return x[:-4] if x.endswith('.txt') else x

# Bootstrap TF-IDF retriever if no retriever exists
if 'retrieve_hybrid' not in globals() and 'retrieve_tfidf' not in globals() and 'retrieve' not in globals():
    _texts = [it.get('text','') for it in _evidence]
    _tfidf = TfidfVectorizer(stop_words=None, token_pattern=r'(?u)\b\w+\b')
    _tfidf_mat = _tfidf.fit_transform(_texts)

    def retrieve_tfidf(query, top_k=10):
        qv = _tfidf.transform([query])
        sims = cosine_similarity(qv, _tfidf_mat).ravel()
        idx = np.argsort(sims)[::-1][:top_k]
        return [(int(i), float(sims[i])) for i in idx]

# Define retrieve() wrapper if missing
if 'retrieve' not in globals():
    def retrieve(question, retrieval_mode='hybrid', top_k=10, alpha=0.6):
        # Prefer hybrid if available; otherwise TF-IDF
        if retrieval_mode == 'hybrid' and 'retrieve_hybrid' in globals():
            hits = retrieve_hybrid(question, top_k=top_k, alpha=alpha)
            return hits, {'mode':'hybrid'}
        if 'retrieve_tfidf' in globals():
            hits = retrieve_tfidf(question, top_k=top_k)
            return hits, {'mode':'tfidf'}
        raise NameError('No retriever available. Execute the retrieval/indexing section.')

# Ensure build_context exists
if 'build_context' not in globals():
    def build_context(hit_ids, max_chars=1400):
        parts=[]
        for i in hit_ids:
            parts.append(f"[{_evidence[i].get('evidence_id')}] {_evidence[i].get('text','')}")
        ctx='\n'.join(parts)
        return ctx[:max_chars]

# Ensure extractive_answer exists
if 'extractive_answer' not in globals():
    import re
    def extractive_answer(query, context):
        q=set(re.findall(r'[A-Za-z]+', query.lower()))
        sents=re.split(r'(?<=[.!?])\s+', (context or '').strip())
        scored=[]
        for s in sents:
            w=set(re.findall(r'[A-Za-z]+', s.lower()))
            scored.append((len(q & w), s.strip()))
        scored.sort(key=lambda x:x[0], reverse=True)
        best=[s for sc,s in scored[:3] if sc>0]
        return ' '.join(best) if best else 'Not enough information in the context.'

rows=[]
for ex in mini_gold:
    qid = ex.get('query_id')
    question = ex.get('question')
    gold = ex.get('gold_evidence_ids')

    if 'run_query_and_log' in globals():
        # Call run_query_and_log with the full query item dictionary 'ex'
        out = run_query_and_log(ex, retrieval_mode='hybrid', top_k=10)
        answer = out.get('answer')
        # The 'evidence' key from run_query_and_log output contains a list of dicts with 'chunk_id'
        evidence = [e['chunk_id'] for e in out.get('evidence', [])]
    else:
        hits, debug = retrieve(question, retrieval_mode='hybrid', top_k=10)
        hit_ids = [int(i) for i,_ in hits]
        context = build_context(hit_ids[:10])
        answer = extractive_answer(question, context)
        evidence = [_canon_evidence_id(_evidence[i].get('evidence_id')) for i in hit_ids[:10]]

    rows.append({
        'query_id': qid,
        'question': question,
        'answer': answer,
        'evidence_ids_returned(top10)': evidence,
        'gold_evidence_ids': gold,
    })

df_answers = pd.DataFrame(rows)
df_answers


| | query_id | question | answer | evidence_ids_returned(top10) | gold_evidence_ids |
|---|---|---|---|---|---|
| 0 | Q1 | What is the primary limitation of existing gra... | Based on the retrieved evidence [HyperGraphRAG... | [HyperGraphRAG- Retrieval-Augmented Generation... | [HyperGraphRAG.pdf] |
| 1 | Q2 | How does the Smart-Summarizer module process r... | Based on the retrieved evidence [LLM-Powered K... | [LLM-Powered Knowledge Graphs for Enterprise I... | [Enterprise_KG.pdf] |
| 2 | Q3 | What specific graph structure does HyperGraphR... | Based on the retrieved evidence [HyperGraphRAG... | [HyperGraphRAG- Retrieval-Augmented Generation... | [HyperGraphRAG.pdf] |
| 3 | Q4 | Based on Figure 2, how does HyperGraphRAG's kn... | Based on the retrieved evidence [HyperGraphRAG... | [HyperGraphRAG- Retrieval-Augmented Generation... | [HyperGraphRAG.pdf] |
| 4 | Q5 | What are the chemical properties of Hydrogen f... | Not enough evidence in the retrieved context. | [HyperGraphRAG- Retrieval-Augmented Generation... | [N/A] |


# 7) Streamlit App Skeleton (Required)

You will create a Streamlit app file in your repo, e.g.:

- `app/main.py`

This notebook can generate a starter `app/main.py` for your team.

### Required UI components
- Query input box
- Retrieval controls (mode, top_k, multimodal toggle if applicable)
- Answer panel
- Evidence panel (with citations)
- Metrics panel (latency, P@5, R@10 if available)
- Logging happens automatically on each query

> This skeleton calls functions in your Python modules. Prefer moving retrieval logic into `/rag/` and importing it.
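
For reference, the P@5 and R@10 values shown in the metrics panel follow the usual set-based definitions. The notation here is introduced only for illustration: $R_k$ is the list of the top-$k$ retrieved evidence IDs (assumed unique) and $G$ is the set of gold evidence IDs for the query. This matches what the `precision_at_k` and `recall_at_k` helpers in the starter app compute:

$$
P@k = \frac{|R_k \cap G|}{k}, \qquad R@k = \frac{|R_k \cap G|}{\max(1, |G|)}
$$

For example, if one of the top five retrieved IDs is in a gold set of size one, then $P@5 = 1/5 = 0.2$ and $R@10 = 1/1 = 1.0$. Note that $P@k$ divides by $k$ even when fewer than $k$ items are returned.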


In [33]:
!pip install streamlit --quiet


In [34]:
import json, time
from pathlib import Path
import streamlit as st
import pandas as pd

# --- Import your team pipeline here ---
# from your_notebook import run_query_and_log, MINI_GOLD, MISSING_EVIDENCE_MSG

MISSING_EVIDENCE_MSG = "Not enough evidence in the retrieved context."

st.set_page_config(page_title="CS5542 Lab 4: Project RAG App", layout="wide")
st.title("CS 5542 Lab 4: Project RAG Application")
st.caption("Project-aligned Streamlit UI + automatic logging + failure monitoring")

# Sidebar controls
st.sidebar.header("Retrieval Settings")
retrieval_mode = st.sidebar.selectbox("retrieval_mode", ["tfidf", "dense", "sparse", "hybrid", "hybrid_rerank"])
top_k = st.sidebar.slider("top_k", min_value=1, max_value=30, value=10, step=1)

st.sidebar.header("Logging")
log_path = st.sidebar.text_input("log file", value="logs/query_metrics.csv")

# --- Use your real mini gold set from notebook ---
MINI_GOLD = {
    "Q1": {"question": "What is the primary limitation of existing graph-based RAG methods that HyperGraphRAG aims to solve?", "gold_evidence_ids": ["HyperGraphRAG- Retrieval-Augmented Generation via Hypergraph-Structured Knowledge Representation.pdf"]},
    "Q2": {"question": "How does the Smart-Summarizer module process raw data in the enterprise framework?", "gold_evidence_ids": ["LLM-Powered Knowledge Graphs for Enterprise Intelligence and Analytics.pdf"]},
    "Q3": {"question": "What specific graph structure does HyperGraphRAG use to store the knowledge hypergraph?", "gold_evidence_ids": ["HyperGraphRAG- Retrieval-Augmented Generation via Hypergraph-Structured Knowledge Representation.pdf"]},
    "Q4": {"question": "Based on Figure 2, how does HyperGraphRAG's knowledge representation differ from Standard RAG and GraphRAG?", "gold_evidence_ids": ["HyperGraphRAG- Retrieval-Augmented Generation via Hypergraph-Structured Knowledge Representation.pdf"]},
    "Q5": {"question": "What are the chemical properties of Hydrogen fuel cells?", "gold_evidence_ids": ["N/A"]},
}

st.sidebar.header("Evaluation")
query_id = st.sidebar.selectbox("query_id (for logging)", list(MINI_GOLD.keys()))
use_gold_question = st.sidebar.checkbox("Use the gold-set question text", value=True)

# Main query
default_q = MINI_GOLD[query_id]["question"] if use_gold_question else ""
question = st.text_area("Enter your question", value=default_q, height=120)
run_btn = st.button("Run Query")

colA, colB = st.columns([2, 1])

def ensure_logfile(path: str):
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    if not p.exists():
        df = pd.DataFrame(columns=[
            "timestamp","query_id","retrieval_mode","top_k","latency_ms",
            "Precision@5","Recall@10","evidence_ids_returned","gold_evidence_ids",
            "faithfulness_pass","missing_evidence_behavior"
        ])
        df.to_csv(p, index=False)

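def precision_at_k(retrieved_ids, gold_ids, k=5):
    """Fraction of the top-k retrieved IDs that appear in the gold set."""
    if not gold_ids:
        return None
    gold = set(gold_ids)  # build the lookup set once
    hits = sum(1 for x in retrieved_ids[:k] if x in gold)
    return hits / k

def recall_at_k(retrieved_ids, gold_ids, k=10):
    """Gold hits among the top-k retrieved IDs, divided by the number of gold IDs."""
    if not gold_ids:
        return None
    gold = set(gold_ids)  # build the lookup set once
    hits = sum(1 for x in retrieved_ids[:k] if x in gold)
    return hits / max(1, len(gold_ids))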

def log_row(path: str, row: dict):
    ensure_logfile(path)
    df = pd.read_csv(path)
    df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)
    df.to_csv(path, index=False)

# ---- Main Streamlit query handling ----
if run_btn and question.strip():
    t0 = time.time()
    # Call your actual pipeline function from notebook
    out = run_query_and_log({"query_id": query_id, "question": question, "gold_evidence_ids": MINI_GOLD[query_id]["gold_evidence_ids"]},
                            retrieval_mode=retrieval_mode, top_k=top_k)
    answer = out["answer"]
    evidence = out["evidence"]
    latency_ms = round(out["latency_ms"], 2)

    retrieved_ids = [e["chunk_id"] for e in evidence]
    gold_ids = MINI_GOLD[query_id].get("gold_evidence_ids", [])

    p5 = precision_at_k(retrieved_ids, gold_ids, k=5)
    r10 = recall_at_k(retrieved_ids, gold_ids, k=10)

    with colA:
        st.subheader("Answer")
        st.write(answer)

        st.subheader("Evidence (Top-K)")
        st.json(evidence)

    with colB:
        st.subheader("Metrics")
        st.write({"latency_ms": latency_ms, "Precision@5": p5, "Recall@10": r10})

    # Log the query using CSV
    row = {
        "timestamp": pd.Timestamp.utcnow().isoformat(),
        "query_id": query_id,
        "retrieval_mode": retrieval_mode,
        "top_k": top_k,
        "latency_ms": latency_ms,
        "Precision@5": p5,
        "Recall@10": r10,
        "evidence_ids_returned": json.dumps(retrieved_ids),
        "gold_evidence_ids": json.dumps(gold_ids),
        # Placeholder: always "Yes" until a real grounding check is wired in.
        "faithfulness_pass": "Yes",
        # Queries with no gold evidence (gold IDs of ["N/A"], e.g. Q5) pass only if the app abstains.
        "missing_evidence_behavior": (
            "Pass" if gold_ids != ["N/A"] or answer == MISSING_EVIDENCE_MSG else "Fail"
        ),
    }
    log_row(log_path, row)
    st.success(f"Logged {query_id} to CSV.")

2026-02-13 05:42:07.278 
  command:

    streamlit run /usr/local/lib/python3.12/dist-packages/colab_kernel_launcher.py [ARGUMENTS]
2026-02-13 05:42:07.309 Session state does not function when running a script without `streamlit run`
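
The `log_row` helper in the app code above rereads and rewrites the whole CSV on every query. As a hedged alternative, here is a minimal append-only sketch using the standard library. The function name `append_log_row` and the `LOG_COLUMNS` constant are illustrative, not part of the template; the column names are copied from `ensure_logfile`.

```python
import csv
from pathlib import Path

# Column order copied from ensure_logfile in the app code.
LOG_COLUMNS = [
    "timestamp", "query_id", "retrieval_mode", "top_k", "latency_ms",
    "Precision@5", "Recall@10", "evidence_ids_returned", "gold_evidence_ids",
    "faithfulness_pass", "missing_evidence_behavior",
]

def append_log_row(path: str, row: dict) -> None:
    """Append one row to the CSV log, writing the header only for a new or empty file."""
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    write_header = not p.exists() or p.stat().st_size == 0
    with p.open("a", newline="", encoding="utf-8") as f:
        # extrasaction="ignore" drops keys that are not log columns;
        # missing keys are written as empty cells.
        writer = csv.DictWriter(f, fieldnames=LOG_COLUMNS, extrasaction="ignore")
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```

Appending keeps the cost per query constant as the log grows, at the price of not validating existing rows. Whether that trade-off suits your deployment is a team decision.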


# 8) Optional Extension: FastAPI Backend (Recommended for larger teams)

If your team selects the **FastAPI extension**, create:
- `api/server.py` with `POST /query`
- Streamlit UI calls the API using `requests.post(...)`

This separation mirrors real production systems:
UI (Streamlit) → API (FastAPI) → Retrieval + LLM services

Below is a minimal FastAPI starter you can generate.


In [35]:
from fastapi import FastAPI
from pydantic import BaseModel
from typing import List, Dict, Any
import time

app = FastAPI(title="CS5542 Lab 4 RAG Backend")

MISSING_EVIDENCE_MSG = "Not enough evidence in the retrieved context."

# --- Input model ---
class QueryIn(BaseModel):
    question: str
    top_k: int = 10
    retrieval_mode: str = "hybrid"
    use_multimodal: bool = True

# --- POST endpoint for querying RAG ---
@app.post("/query")
def query(q: QueryIn) -> Dict[str, Any]:
    """
    Run retrieval + answer generation for a single question.
    Replace the demo logic below with your pipeline:
        evidence = retrieve(q.question, top_k=q.top_k, retrieval_mode=q.retrieval_mode)
        answer = generate_answer(q.question, evidence)
    """
    start_time = time.time()

    # --- Demo retrieval (replace with real pipeline) ---
    evidence = [
        {
            "chunk_id": "demo_doc",
            "citation_tag": "[demo_doc]",
            "score": 0.9,
            "source": "data/docs/demo_doc.txt",
            "text": "This is demo evidence..."
        }
    ]

    # --- Demo answer generation (replace with LLM/VLM) ---
    answer = f"Grounded answer using {evidence[0]['citation_tag']}"

    latency_ms = round((time.time() - start_time) * 1000, 2)

    return {
        "answer": answer,
        "evidence": evidence,
        "metrics": {
            "top_k": q.top_k,
            "retrieval_mode": q.retrieval_mode,
            "latency_ms": latency_ms
        },
        "failure_flag": False
    }

# To run locally:
# 1. Install dependencies: pip install fastapi uvicorn pydantic
# 2. Run server: uvicorn api.server:app --reload --port 8000
# 3. Access API: POST http://127.0.0.1:8000/query
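
On the UI side, the Streamlit app could call the `POST /query` endpoint with `requests.post(...)`. The sketch below is one possible client helper, not a required design. The name `query_backend`, the `API_URL` constant, and the injectable `post` argument (added so the helper can be exercised without a running server) are all assumptions; the payload fields mirror the `QueryIn` model above.

```python
from typing import Any, Callable, Dict, Optional

# Assumed local address, taken from the run notes for the FastAPI starter.
API_URL = "http://127.0.0.1:8000/query"

def query_backend(
    question: str,
    top_k: int = 10,
    retrieval_mode: str = "hybrid",
    use_multimodal: bool = True,
    url: str = API_URL,
    post: Optional[Callable[..., Any]] = None,
    timeout: float = 30.0,
) -> Dict[str, Any]:
    """POST a question to the backend and return the parsed JSON response."""
    if post is None:
        # Imported lazily so the helper can be tested without the package installed.
        import requests
        post = requests.post
    payload = {
        "question": question,
        "top_k": top_k,
        "retrieval_mode": retrieval_mode,
        "use_multimodal": use_multimodal,
    }
    resp = post(url, json=payload, timeout=timeout)
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return resp.json()
```

In the Streamlit app, `out = query_backend(question, top_k=top_k, retrieval_mode=retrieval_mode)` would then stand in for the direct pipeline call, with `out["answer"]`, `out["evidence"]`, and `out["metrics"]` feeding the three panels.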

# 9) Deployment checklist (Required)

Choose **one** deployment route and publish the public link in your README:

- Hugging Face Spaces (Streamlit)
- Streamlit Cloud (GitHub-connected)
- Render / Railway (GitHub-connected)

## README must include
1. Public deployment link  
2. How to run locally:
   - `pip install -r requirements.txt`
   - `streamlit run app/main.py`
3. A screenshot of:
   - the UI
   - evidence panel
   - metrics panel
4. Results snapshot:
   - **5 queries × 2 retrieval modes**
5. Failure analysis:
   - 2 failure cases, root cause, proposed fix
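
One way to produce the results snapshot is to summarize `logs/query_metrics.csv` into a small markdown table. The sketch below is only an illustration: the function names are hypothetical, and it assumes the log uses the column names written by the starter app and that rows appear in append order, so the last row for a (query, mode) pair is the most recent run.

```python
import csv
from typing import Dict, Iterable, List, Tuple

def latest_by_query_and_mode(rows: Iterable[Dict[str, str]]) -> Dict[Tuple[str, str], Dict[str, str]]:
    """Keep the last logged row for each (query_id, retrieval_mode) pair."""
    latest: Dict[Tuple[str, str], Dict[str, str]] = {}
    for row in rows:
        # Later rows overwrite earlier ones because the log is append-only.
        latest[(row["query_id"], row["retrieval_mode"])] = row
    return latest

def snapshot_markdown(rows: Iterable[Dict[str, str]], modes: List[str]) -> str:
    """Render a query-by-mode table of P@5 / R@10 as markdown."""
    latest = latest_by_query_and_mode(rows)
    query_ids = sorted({qid for qid, _ in latest})
    header = "| query_id | " + " | ".join(f"{m} P@5 / R@10" for m in modes) + " |"
    divider = "|" + "---|" * (len(modes) + 1)
    lines = [header, divider]
    for qid in query_ids:
        cells = []
        for mode in modes:
            row = latest.get((qid, mode))
            if row is None:
                cells.append("not run")
            else:
                # Empty cells mean the metric was not computed for that query.
                cells.append(f"{row['Precision@5'] or 'n/a'} / {row['Recall@10'] or 'n/a'}")
        lines.append(f"| {qid} | " + " | ".join(cells) + " |")
    return "\n".join(lines)

# Example usage (reads the log written by the app):
# with open("logs/query_metrics.csv", newline="", encoding="utf-8") as f:
#     print(snapshot_markdown(list(csv.DictReader(f)), ["tfidf", "hybrid"]))
```

The printed table can be pasted directly into the README.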

---

# 10) Failure analysis template (Required)

Document:
1. **Retrieval failure** (wrong evidence or missed gold evidence)  
2. **Grounding / missing-evidence failure** (safe behavior or citation enforcement)

For each:
- What happened?
- Why did it happen (root cause)?
- What change will you implement next?

You can paste your analysis into your README under **Lab 4 Results**.
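
To find candidate failure cases, it may help to scan the query log for suspicious rows before writing the analysis by hand. The sketch below is a starting point under stated assumptions: the function name is hypothetical, the column names come from the starter app's log, and it assumes your logger records `"Fail"` in `missing_evidence_behavior` or `"No"` in `faithfulness_pass` when a check fails.

```python
import json
from typing import Dict, List

# Marker the mini gold set uses for queries with no supporting evidence (e.g. Q5).
NO_GOLD = ["N/A"]

def flag_failure_candidates(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return one entry per suspected failure found in the logged rows."""
    flagged = []
    for row in rows:
        base = {
            "query_id": row.get("query_id", ""),
            "retrieval_mode": row.get("retrieval_mode", ""),
        }
        gold = json.loads(row.get("gold_evidence_ids") or "[]")
        recall = row.get("Recall@10") or ""
        # Retrieval failure: gold evidence exists but none of it reached the top 10.
        if gold and gold != NO_GOLD and recall != "" and float(recall) == 0.0:
            flagged.append({**base, "failure_type": "retrieval"})
        # Grounding failure: the app answered when it should have abstained,
        # or a faithfulness check (assumed to log "No") did not pass.
        if row.get("missing_evidence_behavior") == "Fail" or row.get("faithfulness_pass") == "No":
            flagged.append({**base, "failure_type": "grounding"})
    return flagged
```

The flagged rows are only leads. The root cause and proposed fix still need to be written up by the team.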


# 11) Team checklist (quick)

Before submission, verify:

- [ ] Dataset, UI, and models are **project-aligned**
- [ ] Streamlit app runs locally and shows: answer + evidence + metrics
- [ ] `logs/query_metrics.csv` is auto-created and appended per query
- [ ] Mini gold set Q1–Q5 exists and P@5/R@10 computed when possible
- [ ] Deployed link is public and listed in README
- [ ] Two failure cases documented with fixes
- [ ] `requirements.txt` and run instructions are correct
- [ ] Individual survey submitted by each teammate

---

If you want to go beyond: add an evaluation dashboard, reranking integration, or FastAPI separation (extensions).


In [36]:
# Verification: Ensure each Mini Gold query returns non-empty retrieval results
import numpy as np

print("🔍 Running retrieval verification for Mini Gold Set...")

for ex in mini_gold:
    qid = ex["query_id"]
    question = ex["question"]

    try:
        # Attempt to call available retrieval functions
        if 'retrieve_tfidf' in globals():
            hits = retrieve_tfidf(question, top_k=5)
        elif 'retrieve' in globals():
            hits, _debug = retrieve(question, top_k=5)
        else:
            hits = []

        # Normalize hits count
        if hits is None:
            hits = []
        n_hits = len(hits) if hasattr(hits, '__len__') else 0

        # Print result
        print(f"{qid}: '{question[:50]}...' -> Retrieval hits: {n_hits}")

        # Assert non-empty retrieval
        assert n_hits > 0, f"Retrieval returned empty results for {qid}. Check indexing/corpus."

    except Exception as e:
        print(f"⚠️ Retrieval verification failed for {qid}. Reason:", type(e).__name__, str(e)[:180])

🔍 Running retrieval verification for Mini Gold Set...
Q1: 'What is the primary limitation of existing graph-b...' -> Retrieval hits: 3
Q2: 'How does the Smart-Summarizer module process raw d...' -> Retrieval hits: 3
Q3: 'What specific graph structure does HyperGraphRAG u...' -> Retrieval hits: 3
Q4: 'Based on Figure 2, how does HyperGraphRAG's knowle...' -> Retrieval hits: 3
Q5: 'What are the chemical properties of Hydrogen fuel ...' -> Retrieval hits: 3



## GitHub Deployment Example

### Step 1: Push to GitHub
```bash
git init
git add .
git commit -m "Lab4 deployment"
git branch -M main
git remote add origin https://github.com/<username>/<repo>.git
git push -u origin main
```

### Step 2: Deploy using Streamlit Cloud
1. Visit https://share.streamlit.io
2. Click **New App**
3. Select your GitHub repository
4. Branch: `main`
5. App path: `app/main.py`
6. Click **Deploy**

### Step 3: Add deployment link
Include the deployed application URL in your README.md file.
