# Retrieval-Augmented Generation (RAG)

This notebook illustrates a minimal, self-contained example of Retrieval-Augmented Generation (RAG). The goal is to show how a model can ground its answers in external knowledge instead of relying only on its internal "prior" about the world. We will:

* Build a tiny in-memory documentation corpus for a fictional note-taking app called **LumaNote**.
* Implement a simple vector-based retriever using TF-IDF and cosine similarity.
* Implement two answering strategies:
  * a naive answerer that ignores the documentation (simulating a model with no retrieval), and
  * a RAG answerer that first retrieves relevant passages and then uses them as context.

To keep the notebook self-contained, we will not call any external LLM API.
Instead, we implement a small, deterministic answer generator that stitches together sentences from the retrieved passages. In a real system, you would replace that component with a call to a general-purpose language model.

In [1]:
# If you run this notebook on Colab or a fresh environment, you may need:
# !pip install scikit-learn numpy

from typing import List, Dict, Any
import re
import textwrap

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [2]:
documents: List[Dict[str, str]] = [
    {
        "id": "guide_sync",
        "title": "Syncing notes across devices",
        "text": """LumaNote automatically synchronizes your notes across all signed-in devices.
To enable sync, sign in with the same account on each device and keep the "Cloud sync" toggle turned on in Settings → Sync.
Sync uses end-to-end encryption. LumaNote servers cannot read your note contents.
Sync runs every few seconds when you are online. Large attachments may take longer to upload."""
    },
    {
        "id": "guide_offline",
        "title": "Offline mode and local cache",
        "text": """When you lose network connectivity, LumaNote switches to offline mode.
You can continue creating and editing notes while offline. Changes are stored in a local cache.
When the connection is restored, LumaNote reconciles offline changes and uploads them to the cloud.
If the same note was edited on two devices while offline, you will be asked which version to keep."""
    },
    {
        "id": "guide_sharing",
        "title": "Sharing notes with teammates",
        "text": """You can share a note with teammates by clicking the Share button in the top-right corner.
Add teammates by email address and choose whether they can view or edit.
Shared notes show an avatar for each active collaborator. Changes appear in real time for everyone."""
    },
]

len(documents), [doc["title"] for doc in documents]

(3,
 ['Syncing notes across devices',
  'Offline mode and local cache',
  'Sharing notes with teammates'])

In [3]:
# Build a TF-IDF vector index for the document texts
corpus = [doc["text"] for doc in documents]
vectorizer = TfidfVectorizer(stop_words="english")
doc_vectors = vectorizer.fit_transform(corpus)


def retrieve(query: str, k: int = 3) -> List[Dict[str, Any]]:
    """Retrieve the top-k most similar documents to the query.

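    This behaves like a very small vector store: it encodes the query with
    the fitted TF-IDF vectorizer and ranks the documents by cosine similarity.
    Documents are returned even when their score is 0, so callers may want
    to inspect the "score" field before trusting a result.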
    """
    query_vec = vectorizer.transform([query])
    similarities = cosine_similarity(query_vec, doc_vectors)[0]
    top_indices = np.argsort(similarities)[::-1][:k]

    results: List[Dict[str, Any]] = []
    for idx in top_indices:
        results.append(
            {
                "id": documents[idx]["id"],
                "title": documents[idx]["title"],
                "score": float(similarities[idx]),
                "text": documents[idx]["text"],
            }
        )
    return results


# Quick smoke test of retrieval
for r in retrieve("sync notes between devices"):
    print(f"{r['title']}  (score={r['score']:.3f})")

Syncing notes across devices  (score=0.575)
Offline mode and local cache  (score=0.105)
Sharing notes with teammates  (score=0.049)
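
Because `TfidfVectorizer` L2-normalizes every row by default (`norm="l2"`), the cosine similarity used in `retrieve` reduces to a plain dot product between the query vector and each document vector. The sketch below checks this on a toy corpus. The three short sentences, the query, and the `toy_*` names are made up for illustration and are not part of the LumaNote documentation.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy corpus, used only to illustrate the scoring.
toy_corpus = [
    "sync notes across devices",
    "edit notes while offline",
    "share notes with teammates",
]
toy_query = "sync devices"

toy_vectorizer = TfidfVectorizer()
toy_doc_vectors = toy_vectorizer.fit_transform(toy_corpus)
toy_query_vec = toy_vectorizer.transform([toy_query])

# Cosine similarity, computed the same way as in retrieve().
cos_scores = cosine_similarity(toy_query_vec, toy_doc_vectors)[0]

# Rows are unit length by default, so the dot product gives the same scores.
dot_scores = (toy_query_vec @ toy_doc_vectors.T).toarray()[0]

best = int(np.argmax(cos_scores))
print(np.round(cos_scores, 3), "-> best match:", toy_corpus[best])
```

Only the first toy document shares any term with the query, so the other two score exactly 0. This is the same effect that gives the unrelated LumaNote guides their low scores in the smoke test.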


In [4]:
def naive_answer(question: str) -> str:
    """Simulate a model that answers based only on general intuition.

    This function deliberately ignores the documentation. In a real scenario,
    this is similar to prompting an LLM without giving it any external context.
    """
    q = question.lower()
    if "sync" in q or "synchron" in q:
        return (
            "LumaNote can probably synchronize your notes by connecting to the cloud. "
            "Look for a sync option in the app's settings and make sure you are signed "
            "in on each device."
        )
    if "offline" in q:
        return (
            "Many note-taking apps allow you to keep working offline and upload your "
            "changes when the connection comes back. LumaNote likely does something similar."
        )
    if "share" in q or "teammate" in q or "collaborat" in q:
        return (
            "You can usually share notes with teammates via a share button and by "
            "adding their email addresses as collaborators."
        )
    return (
        "I do not have specific documentation here, but based on common product behavior "
        "I would expect LumaNote to support this in its settings or note options."
    )


# Example of the naive answerer
question = "How can I sync my notes between my phone and laptop?"
print("Question:", question)
print("\n[Naive answer without retrieval]")
print(naive_answer(question))

Question: How can I sync my notes between my phone and laptop?

[Naive answer without retrieval]
LumaNote can probably synchronize your notes by connecting to the cloud. Look for a sync option in the app's settings and make sure you are signed in on each device.


In [5]:
def generate_answer_from_context(question: str, context_docs: List[Dict[str, Any]]) -> str:
    """Create an answer by selecting sentences from retrieved documents.

    This is a stand-in for an LLM that has been given both the question and
    the retrieved passages as context.
    """
    if not context_docs:
        return "I could not find information about this question in the available documents."

    # Tokenize the question into lowercase words, keeping only terms longer than three characters
    query_terms = [w.lower() for w in re.findall(r"\w+", question) if len(w) > 3]
    query_term_set = set(query_terms)

    # Split documents into sentences
    sentences: List[str] = []
    for doc in context_docs:
        for sent in re.split(r"(?<=[.!?])\s+", doc["text"].strip()):
            sent = sent.strip()
            if sent:
                sentences.append(sent)

    # Score each sentence by overlap with query terms
    scored = []
    for sent in sentences:
        sent_terms = set(w.lower() for w in re.findall(r"\w+", sent))
        overlap = len(sent_terms & query_term_set)
        if overlap > 0:
            scored.append((overlap, sent))

    if not scored:
        return "I could not find information about this question in the available documents."

    # Highest overlap first; the sort is stable, so ties keep retrieval order
    scored.sort(reverse=True, key=lambda x: x[0])
    top_sentences = [s for _, s in scored[:3]]

    # Compose a short answer grounded in the retrieved sentences
    return " ".join(top_sentences)
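
In a real system, the sentence-stitching step would be replaced by a language-model call that receives the question together with the retrieved passages. The sketch below only assembles such a prompt as a string. `build_rag_prompt`, the prompt wording, and the shortened sample passage are illustrative assumptions, and no particular LLM API is implied.

```python
from typing import Any, Dict, List


def build_rag_prompt(question: str, context_docs: List[Dict[str, Any]]) -> str:
    """Hypothetical helper: format retrieved passages and a question as one prompt."""
    passages = []
    for i, doc in enumerate(context_docs, start=1):
        # Number each passage so the model can refer to it as [1], [2], ...
        passages.append(f"[{i}] {doc['title']}\n{doc['text'].strip()}")
    context_block = "\n\n".join(passages) if passages else "(no passages retrieved)"
    return (
        "Answer the question using only the passages below. "
        "If they do not contain the answer, say so.\n\n"
        f"Passages:\n{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Sample passage with the "title" and "text" keys that retrieve() returns
# (text shortened to a single sentence from the sync guide).
sample_docs = [
    {
        "id": "guide_sync",
        "title": "Syncing notes across devices",
        "text": "Sync uses end-to-end encryption.",
    },
]
prompt = build_rag_prompt("Is sync encrypted?", sample_docs)
print(prompt)
```

The resulting string would be sent to whichever model you use. Instructing the model to rely only on the passages is what keeps the answer grounded in the retrieved documentation.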

In [6]:
def rag_answer(question: str, k: int = 3, show_sources: bool = True) -> str:
    """Answer a question using a simple RAG pipeline.

    1. Retrieve the top-k relevant documents.
    2. Optionally print the retrieved passages and their scores.
    3. Generate an answer using those documents as context.
    """
    retrieved_docs = retrieve(question, k=k)

    if show_sources:
        print("[Retrieved passages]")
        for d in retrieved_docs:
            snippet = textwrap.shorten(d["text"], width=100, placeholder="…")
            print(f"- {d['title']} (score={d['score']:.3f}): {snippet}")
        print()

    answer = generate_answer_from_context(question, retrieved_docs)
    return answer
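
Because `retrieve` always returns `k` documents, weakly related passages (such as the one scoring 0.049 in the smoke test) are still passed to the generator. One common mitigation is to drop passages below a minimum similarity before generation. The sketch below shows the idea. `filter_by_score` and the `min_score=0.1` default are illustrative assumptions, not values tuned for this corpus.

```python
from typing import Any, Dict, List


def filter_by_score(
    results: List[Dict[str, Any]], min_score: float = 0.1
) -> List[Dict[str, Any]]:
    """Hypothetical helper: keep only results whose similarity reaches min_score."""
    return [r for r in results if r["score"] >= min_score]


# Titles and scores copied from the retrieval smoke test earlier in the notebook.
smoke_test_results = [
    {"title": "Syncing notes across devices", "score": 0.575},
    {"title": "Offline mode and local cache", "score": 0.105},
    {"title": "Sharing notes with teammates", "score": 0.049},
]

kept = filter_by_score(smoke_test_results, min_score=0.1)
print([r["title"] for r in kept])
# ['Syncing notes across devices', 'Offline mode and local cache']
```

In `rag_answer`, such a filter would sit between the `retrieve` call and `generate_answer_from_context`. If every passage is filtered out, the generator's existing "could not find information" fallback applies.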

In [7]:
def compare_answers(question: str) -> None:
    """Print the naive answer and the RAG answer to the same question, one after the other."""
    print("Question:", question)
    print("\n--- Naive answer (no retrieval) ---")
    print(naive_answer(question))

    print("\n--- RAG answer (with retrieval) ---")
    print(rag_answer(question))


# Try a few questions
compare_answers("How can I sync my notes between my phone and laptop?")

print("\n" + "="*80 + "\n")

compare_answers("What happens to my notes when I go offline?")

print("\n" + "="*80 + "\n")

compare_answers("How do I share a note with my teammates?")

Question: How can I sync my notes between my phone and laptop?

--- Naive answer (no retrieval) ---
LumaNote can probably synchronize your notes by connecting to the cloud. Look for a sync option in the app's settings and make sure you are signed in on each device.

--- RAG answer (with retrieval) ---
[Retrieved passages]
- Syncing notes across devices (score=0.620): LumaNote automatically synchronizes your notes across all signed-in devices. To enable sync, sign…
- Sharing notes with teammates (score=0.059): You can share a note with teammates by clicking the Share button in the top-right corner. Add…
- Offline mode and local cache (score=0.047): When you lose network connectivity, LumaNote switches to offline mode. You can continue creating…

LumaNote automatically synchronizes your notes across all signed-in devices. To enable sync, sign in with the same account on each device and keep the "Cloud sync" toggle turned on in Settings → Sync. Sync uses end-to-end encryption.


Question: