# 03. RAG Generation (Orchestration Only)

This notebook orchestrates:

- Retrieval (vector / LLM rerank)
- Context construction
- LLM generation with citation validation + optional two-pass retry
- Optional run logging

All core logic lives in `src/`. This notebook only wires components together.


In [25]:
from pathlib import Path

# Project root convention: run notebooks from repo root (invest-rag/)
PROJECT_ROOT = Path.cwd()
assert (PROJECT_ROOT / "src").exists(), f"Run from project root. cwd={PROJECT_ROOT}"

# Artifacts
INDEX_DIR  = PROJECT_ROOT / "indexes" / "faiss"
INDEX_PATH = INDEX_DIR / "index.bin"
META_PATH  = INDEX_DIR / "meta.jsonl"

assert INDEX_PATH.exists(), f"Missing index: {INDEX_PATH}"
assert META_PATH.exists(),  f"Missing meta:  {META_PATH}"

# Retrieval modules
from src.llm.embedding import embed_query
from src.retrieval.vector_store import VectorStore
from src.eval.search_wrappers import make_vectorstore_search_fn, make_llm_rerank_search_fn

vs = VectorStore.load(index_path=INDEX_PATH, meta_path=META_PATH)

vector_search_fn = make_vectorstore_search_fn(vs, embed_query=embed_query, normalize=True)
rerank_search_fn = make_llm_rerank_search_fn(vector_search_fn, k_vec=10)

print("PROJECT_ROOT:", PROJECT_ROOT)
print("Artifacts:", INDEX_DIR)
print("Ready: vector_search_fn / rerank_search_fn")

PROJECT_ROOT: c:\Users\CG\Desktop\invest-rag
Artifacts: c:\Users\CG\Desktop\invest-rag\indexes\faiss
Ready: vector_search_fn / rerank_search_fn


## Context + Generation

We reuse the existing modules:

- `src/llm/context.py`: `build_context(results) -> str`
- `src/llm/generate.py`: `rag_generate_with_retry(query, context, retrieved) -> (answer, ok)`
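
As a rough illustration of what a context builder with this signature might do, here is a minimal sketch. It is not the code in `src/llm/context.py`: the name `build_context_sketch`, the `max_chars` parameter, and the block layout are assumptions made for this example. The `doc_id` and `content` keys are the same fields this notebook reads from retrieved results.

```python
def build_context_sketch(results: list[dict], max_chars: int = 1200) -> str:
    """Hypothetical stand-in for build_context: one tagged block per chunk."""
    blocks = []
    for r in results:
        doc_id = r.get("doc_id")
        text = (r.get("content") or "").strip()
        if not doc_id or not text:
            continue  # skip chunks that cannot be cited or carry no text
        # Prefix each chunk with its doc_id so the model can cite it as [doc_id].
        blocks.append(f"[{doc_id}]\n{text[:max_chars]}")
    return "\n\n".join(blocks)


# Toy input (made-up ids and text, not real index content).
demo = [
    {"doc_id": "doc_a", "content": "First chunk."},
    {"doc_id": None, "content": "No id, dropped."},
    {"doc_id": "doc_b", "content": "  Second chunk.  "},
]
print(build_context_sketch(demo))
```

Tagging every chunk with its `doc_id` is what makes the later citation check possible: the validator only has to compare the ids in the answer against the ids that were placed in the context.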


In [7]:
from src.llm.context import build_context
from src.llm.generate import rag_generate_with_retry

## Run Logging (Optional)

Each run is appended as one JSON line to `logs/run_logs.jsonl`.


In [None]:
import datetime
import json

RUNLOG_PATH = PROJECT_ROOT / "logs" / "run_logs.jsonl"

def log_run(payload: dict, path: Path = RUNLOG_PATH) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    row = {"ts": datetime.datetime.now().isoformat(timespec="seconds"), **payload}
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")
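
Because every row is a standalone JSON object, the log can be read back line by line for quick checks. The sketch below is one possible way to do that; the helpers `load_runs` and `citation_ok_rate` are hypothetical and not part of `src/`. It runs against a temporary file with made-up rows so the real log is left untouched.

```python
import json
import tempfile
from pathlib import Path


def load_runs(path: Path) -> list[dict]:
    """Read a JSONL run log; blank lines are skipped."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def citation_ok_rate(rows: list[dict]) -> float:
    """Share of runs whose citations validated (0.0 for an empty log)."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get("ok")) / len(rows)


# Demo on a throwaway file with made-up rows.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "run_logs.jsonl"
    demo_rows = [{"query": "q1", "ok": True}, {"query": "q2", "ok": False}]
    demo_path.write_text(
        "\n".join(json.dumps(r) for r in demo_rows) + "\n", encoding="utf-8"
    )
    loaded = load_runs(demo_path)
    rate = citation_ok_rate(loaded)

print(len(loaded), rate)
```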


## Orchestration

- Choose retrieval mode: vector vs rerank
- Retrieve top-k results
- Build context (from `src/llm/context.py`)
- Generate grounded answer with citations (from `src/llm/generate.py`)
- Return answer + sources list


In [9]:
USE_RERANK = False
K = 5

def retrieve(query: str, k: int = K, use_rerank: bool = USE_RERANK):
    fn = rerank_search_fn if use_rerank else vector_search_fn
    return fn(query, k)

def format_sources(results: list[dict], max_items: int = 8) -> list[str]:
    """Minimal helper (not retrieval logic): pretty-print sources."""
    seen = set()
    out = []
    for r in results:
        doc_id = r.get("doc_id")
        if not doc_id:
            continue
        title = (r.get("title") or "").strip()
        date  = (r.get("date") or "").strip()
        src   = (r.get("source") or "").strip()

        s = f"[{doc_id}] {title}".strip()
        tail = ", ".join([x for x in [date, src] if x])
        if tail:
            s += f" ({tail})"

        if s not in seen:
            seen.add(s)
            out.append(s)
        if len(out) >= max_items:
            break
    return out

def run_query(query: str, *, use_rerank: bool = USE_RERANK, k: int = K) -> dict:
    retrieved = retrieve(query, k=k, use_rerank=use_rerank)

    # ✅ context module (no duplicated logic)
    context = build_context(retrieved)

    if not context.strip():
        return {
            "query": query,
            "answer": "Not enough context. Retrieved empty context.",
            "ok": False,
            "sources": [],
            "retrieved": retrieved,
        }

    # ✅ generation module (includes citation validation + retry)
    answer, ok = rag_generate_with_retry(query=query, context=context, retrieved=retrieved)

    # optional logging
    log_run({
        "query": query,
        "k_ctx": k,
        "use_rerank": use_rerank,
        "ok": ok,
        "retrieved": [
            {
                "rank": r.get("rank"),
                "doc_id": r.get("doc_id"),
                "chunk_id": r.get("chunk_id"),
                "score": r.get("score"),
                "title": r.get("title"),
            }
            for r in retrieved
        ],
        "answer": answer,
    })

    return {
        "query": query,
        "answer": answer,
        "ok": ok,
        "sources": format_sources(retrieved),
        "retrieved": retrieved,
    }


## Grounding Strategy (Two-Pass)

`rag_generate_with_retry()` implements a two-pass strategy:

1. First pass: generate answer with `[doc_id]` citations.
2. Validate citations against retrieved doc_ids.
3. If invalid → retry while restricting allowed doc_ids.

This reduces hallucinated citations while keeping latency low: the second LLM call is made only when validation fails.
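
The three steps can be sketched as follows. This is an illustration under stated assumptions, not the code in `src/llm/generate.py`: `generate_fn` is a hypothetical callable standing in for the LLM call, the regex assumes doc_ids consist of letters, digits, underscores, and hyphens, and an answer without any citation is treated as valid here, which may differ from the real policy.

```python
import re
from typing import Callable, Optional

# Assumption: doc_ids look like "nvidia_2024_item_1_business".
CITATION_RE = re.compile(r"\[([A-Za-z0-9_\-]+)\]")


def extract_citations(answer: str) -> set[str]:
    """Collect every [doc_id]-style token found in the answer."""
    return set(CITATION_RE.findall(answer))


def citations_valid(answer: str, allowed: set[str]) -> bool:
    """True when every cited id was actually retrieved (vacuously true if none)."""
    return extract_citations(answer) <= allowed


def generate_with_retry_sketch(
    query: str,
    context: str,
    retrieved: list[dict],
    generate_fn: Callable[[str, str, Optional[set[str]]], str],
) -> tuple[str, bool]:
    allowed = {r["doc_id"] for r in retrieved if r.get("doc_id")}
    answer = generate_fn(query, context, None)      # first pass, unrestricted
    if citations_valid(answer, allowed):
        return answer, True
    answer = generate_fn(query, context, allowed)   # second pass, ids restricted
    return answer, citations_valid(answer, allowed)


def fake_generate(query: str, context: str, allowed: Optional[set[str]]) -> str:
    """Toy generator: cites a made-up id first, a permitted id on retry."""
    if allowed is None:
        return "CUDA is the base layer [made_up_doc]."
    return f"CUDA is the base layer [{sorted(allowed)[0]}]."


answer, ok = generate_with_retry_sketch(
    "q", "ctx", [{"doc_id": "doc_a"}], fake_generate
)
print(answer, ok)
```

In the toy run, the first pass cites an id that was never retrieved, so validation fails and the second pass is triggered with the allowed set.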


## Demo


In [16]:
q = "What is the foundational programming model that runs on all NVIDIA GPUs, and why does NVIDIA describe it as central to its full-stack offering?"

out = run_query(q, use_rerank=False, k=5)

print(out["answer"])
print("\nValid citations:", out["ok"])
print("\nSources:")
print("\n".join(out["sources"]))


The foundational programming model that runs on all NVIDIA GPUs is the CUDA programming model. NVIDIA describes it as central to its full-stack offering because it serves as the base for a large body of software, including hundreds of domain-specific software libraries, software development kits (SDKs), and Application Programming Interfaces (APIs). This comprehensive software stack accelerates performance and simplifies the deployment of NVIDIA accelerated computing for computationally intensive workloads such as AI model training and inference, data analytics, scientific computing, and 3D graphics. The CUDA model, along with the CUDA-X collection of acceleration libraries and domain-specific application frameworks, enables NVIDIA to provide a unified architecture that supports diverse computing requirements across various industries, making it a key element of their full-stack computing platform [nvidia_2024_item_1_business].

Valid citations: True

Sources:
[nvidia_2024_item_1_busin

When asked about a company that is not in the index (Tesla), the system abstains instead of hallucinating: it reports that the retrieved context contains insufficient evidence, and the retrieved sources are still listed for transparency.

In [17]:
unrelated_q = "Explain Risk Factors of the company Tesla"

out = run_query(unrelated_q, use_rerank=False, k=5)

print(out["answer"])
print("\nValid citations:", out["ok"])
print("\nSources:")
print("\n".join(out["sources"]))

I don't have enough information about the risk factors of Tesla in the provided context. The documents only include risk factors related to NVIDIA and Apple.

Valid citations: True

Sources:
[nvidia_2024_item_7_mda] (sec_10k_html)
[apple_2024_item_1a_risk_factors] (sec_10k_html)
[nvidia_2024_item_1_business] (sec_10k_html)


### Batch demo (optional)


In [14]:
demo_queries = [
    "What product does Apple describe as its first “spatial computer,” and which operating system is it based on?",
    "In Apple’s “Home” product description, which device is described as a media streaming and gaming device, and which operating system is it based on?",
    "As of September 28, 2024, which contractual obligation had the largest amount payable within 12 months, and what was that amount?",
]

for q in demo_queries:
    out = run_query(q, use_rerank=False, k=5)
    print("\n" + "="*80)
    print("Q:", q)
    print("OK:", out["ok"])
    print("A:", out["answer"])
    print("Sources:", " | ".join(out["sources"]))



Q: What product does Apple describe as its first “spatial computer,” and which operating system is it based on?
OK: True
A: Apple describes the Apple Vision Pro™ as its first "spatial computer," which is based on its visionOS™ operating system [apple_2024_item_1_business].
Sources: [apple_2024_item_1_business] (sec_10k_html) | [apple_2024_item_7_mda] (sec_10k_html) | [microsoft_2024_item_1_business] (sec_10k_html)

Q: In Apple’s “Home” product description, which device is described as a media streaming and gaming device, and which operating system is it based on?
OK: True
A: In Apple's "Home" product description, the device described as a media streaming and gaming device is Apple TV®, and it is based on the tvOS® operating system [apple_2024_item_1_business].
Sources: [apple_2024_item_1_business] (sec_10k_html) | [apple_2024_item_7_mda] (sec_10k_html)

Q: As of September 28, 2024, which contractual obligation had the largest amount payable within 12 months, and what w

## Vector vs Rerank comparison

This section runs the same query twice:

- **Vector baseline** (`use_rerank=False`)
- **LLM rerank** (`use_rerank=True`)

It then compares:
- retrieved top-k (rank / score / doc_id / title)
- answer + citation validity
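
For orientation, the rerank path can be pictured as a wrapper around the vector search: fetch a wider candidate pool, reorder it, and keep the top `k`. The sketch below is an assumption-laden illustration, not the implementation of `make_llm_rerank_search_fn`: `make_rerank_fn_sketch` and `relevance_fn` are hypothetical names, and a toy keyword-overlap scorer stands in for the LLM judgment. Each hit keeps its original vector score and only its rank is rewritten.

```python
from typing import Callable

SearchFn = Callable[[str, int], list[dict]]


def make_rerank_fn_sketch(
    base_search_fn: SearchFn,
    relevance_fn: Callable[[str, dict], float],
    k_vec: int = 10,
) -> SearchFn:
    """Fetch k_vec candidates, reorder by relevance_fn, keep the top k."""

    def search(query: str, k: int) -> list[dict]:
        candidates = base_search_fn(query, max(k, k_vec))
        ordered = sorted(
            candidates, key=lambda r: relevance_fn(query, r), reverse=True
        )
        # Copy each hit so the base results are not mutated; the vector
        # score is kept and only the rank is rewritten.
        return [{**r, "rank": i} for i, r in enumerate(ordered[:k], start=1)]

    return search


def toy_search(query: str, k: int) -> list[dict]:
    """Made-up vector results, already sorted by score."""
    hits = [
        {"rank": 1, "score": 0.44, "doc_id": "doc_a", "content": "cloud services and servers"},
        {"rank": 2, "score": 0.42, "doc_id": "doc_b", "content": "graphics processors"},
        {"rank": 3, "score": 0.39, "doc_id": "doc_c", "content": "first spatial computer"},
    ]
    return hits[:k]


def toy_relevance(query: str, hit: dict) -> float:
    """Stand-in for the LLM judgment: count of shared lowercase words."""
    q_words = set(query.lower().split())
    return float(len(q_words & set(hit["content"].lower().split())))


rerank = make_rerank_fn_sketch(toy_search, toy_relevance, k_vec=3)
top = rerank("first spatial computer", 2)
print([(r["rank"], r["doc_id"], r["score"]) for r in top])
```

With `k=2` and `k_vec=3`, the third vector hit can be promoted into the final results even though a plain top-2 vector search would never have returned it.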


In [23]:
# --- helpers for demo output (orchestration-only) ---
def preview_retrieved(retrieved, max_rows=5):
    rows = []
    for r in (retrieved or [])[:max_rows]:
        rows.append({
            "rank": r.get("rank"),
            "score": r.get("score"),
            "doc_id": r.get("doc_id"),
            "title": (r.get("title") or "")[:90],
        })
    return rows

def compare_top1(vec_retrieved, rr_retrieved):
    v1 = (vec_retrieved or [{}])[0].get("doc_id")
    r1 = (rr_retrieved  or [{}])[0].get("doc_id")
    return {"vector_top1": v1, "rerank_top1": r1, "changed": (v1 != r1)}

def preview_context_snippets(retrieved, k=3, max_chars=280):
    """Minimal 'eyes-on' context check: show a small snippet of top chunks."""
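    for r in (retrieved or [])[:k]:
        print("-" * 90)
        # Guard against a missing score: formatting None with :.4f raises TypeError.
        score = r.get("score")
        score_str = f"{score:.4f}" if score is not None else "n/a"
        print(f"[rank={r.get('rank')}] score={score_str}")
        print(f"{r.get('doc_id')} | {r.get('section')}")
        text = (r.get("content") or "")
        print(text[:max_chars].replace("\n", " "))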

In [24]:
# --- run comparison ---
QUERY = "Which product is described as the company’s first spatial computer, and what operating system is it based on?"

out_vec = run_query(QUERY, use_rerank=False, k=5)
out_rr  = run_query(QUERY, use_rerank=True,  k=5)

print("QUERY:", QUERY)

# (A) Compact table-like preview (metadata)
print("\n=== Retrieval top-5 (vector) [metadata preview] ===")
print(preview_retrieved(out_vec.get("retrieved", []), max_rows=5))

print("\n=== Retrieval top-5 (rerank) [metadata preview] ===")
print(preview_retrieved(out_rr.get("retrieved", []), max_rows=5))

print("\n=== Top-1 change ===")
print(compare_top1(out_vec.get("retrieved", []), out_rr.get("retrieved", [])))

# (B) Eyes-on context quality check (content snippets)
print("\n=== Context snippets (vector top-3) ===")
preview_context_snippets(out_vec.get("retrieved", []), k=3, max_chars=280)

print("\n=== Context snippets (rerank top-3) ===")
preview_context_snippets(out_rr.get("retrieved", []), k=3, max_chars=280)

# (C) Answers + citations
print("\n=== Answer (vector) ===")
print(out_vec.get("answer", ""))
print("Citations OK:", out_vec.get("ok"))
print("Sources:")
print("\n".join(out_vec.get("sources", [])))

print("\n=== Answer (rerank) ===")
print(out_rr.get("answer", ""))
print("Citations OK:", out_rr.get("ok"))
print("Sources:")
print("\n".join(out_rr.get("sources", [])))

QUERY: Which product is described as the company’s first spatial computer, and what operating system is it based on?

=== Retrieval top-5 (vector) [metadata preview] ===
[{'rank': 1, 'score': 0.4421590566635132, 'doc_id': 'microsoft_2024_item_1_business', 'title': ''}, {'rank': 2, 'score': 0.4192088842391968, 'doc_id': 'amd_2024_item_1_business', 'title': ''}, {'rank': 3, 'score': 0.41882139444351196, 'doc_id': 'microsoft_2024_item_1_business', 'title': ''}, {'rank': 4, 'score': 0.4184524714946747, 'doc_id': 'microsoft_2024_item_1_business', 'title': ''}, {'rank': 5, 'score': 0.40680205821990967, 'doc_id': 'amd_2024_item_1_business', 'title': ''}]

=== Retrieval top-5 (rerank) [metadata preview] ===
[{'rank': 1, 'score': 0.39269906282424927, 'doc_id': 'apple_2024_item_1_business', 'title': ''}, {'rank': 2, 'score': 0.4421590566635132, 'doc_id': 'microsoft_2024_item_1_business', 'title': ''}, {'rank': 3, 'score': 0.4192088842391968, 'doc_id': 'amd_2024_item_1_business', 'title': ''}, 

### 🔎 Result Analysis: Vector vs LLM Rerank

For this query, the company name was intentionally omitted to introduce cross-company ambiguity.

#### Vector Baseline

- Top-1 retrieved document: `microsoft_2024_item_1_business`
- The Apple business section did not appear anywhere in the vector top-5.
- Generated answer correctly abstained due to insufficient grounded context.

This suggests that the vector retriever struggled with generic phrasing such as "product" and "operating system," which occurs in the filings of several companies.

---

#### LLM Rerank

- Top-1 document changed to: `apple_2024_item_1_business`
- Ranking corrected from Microsoft → Apple.
- Generated answer correctly identified:
  - **Product:** Apple Vision Pro™
  - **Operating System:** visionOS™

The reranker promoted the Apple chunk to rank 1 even though its embedding score (0.393) was lower than that of the top Microsoft chunk (0.442), which enabled correct answer generation.

---

### ✅ Key Takeaway

- The correct document was already present in the reranker's candidate pool (`k_vec=10`), but outside the vector top-5.
- The performance difference came from reordering those candidates, not from retrieving new ones.
- This demonstrates the value of LLM-based reranking for ambiguous, cross-company queries.
- No hallucination occurred in either setting.