# Retrieval-augmented generation (RAG)

This notebook illustrates a basic example of retrieval-augmented generation (RAG) using an OpenAI model. The goal is to show, step by step, how a model can ground its answers in external knowledge instead of relying only on what it "remembers" from pre-training. We will:

* Build a tiny in-memory documentation corpus for a fictional note-taking app called *LumaNote*.
* Implement a simple vector-based retriever using TF-IDF and cosine similarity.
* Call an OpenAI model in two ways:
  * A _plain_ call, where the model does not see the documentation, and
  * A _RAG_ call, where the model first sees relevant passages retrieved from the documentation.

> **Note**: To run the notebook, you need an `OPENAI_API_KEY` environment variable. If it is not set, you will be prompted for your OpenAI API key (the key is not stored in the notebook).


In [None]:
# If you run this notebook in a fresh environment, you may need:
# !pip install openai scikit-learn numpy

import os
import textwrap
from getpass import getpass
from typing import Any, Dict, List

import numpy as np
from openai import OpenAI
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Ask for the API key interactively if not defined as env variable
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")

client = OpenAI()

Enter your OpenAI API key: ··········


In [None]:
# A small "product documentation" set for a fictional app called LumaNote.

documents: List[Dict[str, str]] = [
    {
        "id": "guide_sync",
        "title": "Syncing notes across devices",
        "text": """LumaNote automatically synchronizes your notes across all signed-in devices.
To enable sync, sign in with the same account on each device and keep the "Cloud sync" toggle turned on in Settings → Sync.
Sync uses end-to-end encryption. LumaNote servers cannot read your note contents.
Sync runs every few seconds when you are online. Large attachments may take longer to upload."""
    },
    {
        "id": "guide_offline",
        "title": "Offline mode and local cache",
        "text": """When you lose network connectivity, LumaNote switches to offline mode.
You can continue creating and editing notes while offline. Changes are stored in a local cache.
When the connection is restored, LumaNote reconciles offline changes and uploads them to the cloud.
If the same note was edited on two devices while offline, you will be asked which version to keep."""
    },
    {
        "id": "guide_sharing",
        "title": "Sharing notes with teammates",
        "text": """You can share a note with teammates by clicking the Share button in the top-right corner.
Add teammates by email address and choose whether they can view or edit.
Shared notes show an avatar for each active collaborator. Changes appear in real time for everyone."""
    },
]

len(documents), [doc["title"] for doc in documents]

(3,
 ['Syncing notes across devices',
  'Offline mode and local cache',
  'Sharing notes with teammates'])

In [None]:
# Build a TF-IDF vector index for the document texts.
# This plays the role of a very small vector store.

corpus = [doc["text"] for doc in documents]
vectorizer = TfidfVectorizer(stop_words="english")
doc_vectors = vectorizer.fit_transform(corpus)


def retrieve(query: str, k: int = 3) -> List[Dict[str, Any]]:
    """Retrieve the top-k most similar documents to the query.

    The query and documents are represented with TF-IDF vectors and ranked
    using cosine similarity.
    """
    query_vec = vectorizer.transform([query])
    similarities = cosine_similarity(query_vec, doc_vectors)[0]
    top_indices = np.argsort(similarities)[::-1][:k]

    results: List[Dict[str, Any]] = []
    for idx in top_indices:
        results.append(
            {
                "id": documents[idx]["id"],
                "title": documents[idx]["title"],
                "score": float(similarities[idx]),
                "text": documents[idx]["text"],
            }
        )
    return results


# Quick smoke test of retrieval
for r in retrieve("sync notes between devices"):
    print(f"{r['title']}  (score={r['score']:.3f})")

Syncing notes across devices  (score=0.575)
Offline mode and local cache  (score=0.105)
Sharing notes with teammates  (score=0.049)
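
To make the scores above less opaque, here is a minimal pure-Python sketch of the same idea: weight each term by TF-IDF, scale every vector to unit length, and rank by dot product. It assumes the smoothed IDF formula `ln((1 + n) / (1 + df)) + 1` and L2 normalization, which are scikit-learn's documented defaults for `TfidfVectorizer`. The tokenizer is a simplification (whitespace split, no stop-word removal), and the toy texts and helper names are illustrative, so the numbers will not match the notebook's.

```python
import math
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Simplification (assumption): lowercase and split on whitespace only.
    return text.lower().split()


def fit_idf(tokenized_docs: List[List[str]]) -> Dict[str, float]:
    """Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1."""
    n_docs = len(tokenized_docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized_docs for term in set(tokens))
    return {
        term: math.log((1 + n_docs) / (1 + count)) + 1.0
        for term, count in df.items()
    }


def tfidf_vector(tokens: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    """L2-normalized TF-IDF weights; terms unseen at fit time are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    weights = {term: tf * idf[term] for term, tf in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {term: w / norm for term, w in weights.items()} if norm else {}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    # Inputs are unit length, so the dot product equals the cosine similarity.
    return sum(w * b.get(term, 0.0) for term, w in a.items())


# Hypothetical toy corpus, much smaller than the LumaNote documents.
toy_docs = [
    "sync notes across devices",
    "edit notes while offline",
    "share notes with teammates",
]
tokenized = [tokenize(d) for d in toy_docs]
idf = fit_idf(tokenized)
doc_vecs = [tfidf_vector(t, idf) for t in tokenized]

# The query reuses the IDF weights learned from the corpus, as retrieve() does.
query_vec = tfidf_vector(tokenize("sync devices"), idf)
scores = [cosine(query_vec, v) for v in doc_vecs]

for doc, score in zip(toy_docs, scores):
    print(f"{score:.3f}  {doc}")
```

Only the first toy document shares terms with the query, so it is the only one with a non-zero score. The word "notes" appears in every document and therefore gets the lowest IDF weight, which is why it contributes little to any ranking.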


In [None]:
def build_context(docs: List[Dict[str, Any]]) -> str:
    """Format retrieved documents into a context block for the model."""
    parts = []
    for i, d in enumerate(docs, start=1):
        parts.append(f"[Document {i}: {d['title']}]\n{d['text'].strip()}")
    return "\n\n".join(parts)

In [None]:
def plain_llm_answer(question: str, model: str = "gpt-4.1-mini") -> str:
    """Call the model without giving it any explicit documentation.

    The prompt only tells the model that the question is about the LumaNote app,
    but does not include the actual product docs.
    """
    prompt = (
        "You are a helpful assistant answering questions about the note-taking app LumaNote.\n"
        "Answer the user's question as clearly as you can. If you are not sure, say that you do not know.\n\n"
        f"User question: {question}"
    )

    response = client.responses.create(
        model=model,
        input=prompt,
    )

    # output_text joins all text output, which is more robust than indexing into response.output.
    return response.output_text


# Example: run a plain model call
question = "How can I sync my notes between my phone and laptop?"
print("Question:", question)
print("\n[Plain model answer (no retrieval)]")
print(plain_llm_answer(question))

Question: How can I sync my notes between my phone and laptop?

[Plain model answer (no retrieval)]
To sync your notes between your phone and laptop in LumaNote, follow these steps:

1. **Create a LumaNote account** (if you haven't already):
   - Open LumaNote on your phone and laptop.
   - Sign up or log in using the same account credentials on both devices.

2. **Enable Sync**:
   - On each device, go to the app settings.
   - Look for the sync or cloud sync option.
   - Make sure syncing is turned on.

3. **Automatic Sync**:
   - LumaNote will automatically sync your notes to the cloud when connected to the internet.
   - Any changes made on one device will update on the other when both are online.

If you experience any syncing issues, ensure both devices are connected to the internet and signed into the same account.

If these steps do not work or the app has no built-in sync function, you may need to export/import notes manually or use a cloud storage service supported by LumaNot

In [None]:
def rag_answer(question: str, k: int = 3, model: str = "gpt-4.1-mini", show_sources: bool = True) -> str:
    """Answer a question using a simple RAG pipeline.

    1. Retrieve top-k relevant documents from the local knowledge base.
    2. Format them as context.
    3. Call the model with a prompt that includes both the context and the question.
    """
    retrieved_docs = retrieve(question, k=k)
    context = build_context(retrieved_docs)

    if show_sources:
        print("[Retrieved passages]")
        for d in retrieved_docs:
            snippet = textwrap.shorten(d["text"], width=100, placeholder="…")
            print(f"- {d['title']} (score={d['score']:.3f}): {snippet}")
        print()

    prompt = (
        "You are a helpful assistant answering questions about the note-taking app LumaNote.\n"
        "Use ONLY the information in the documentation below to answer the question.\n"
        "If the answer is not contained in the documentation, say that you do not know.\n\n"
        "Documentation:\n"
        f"{context}\n\n"
        f"User question: {question}"
    )

    response = client.responses.create(
        model=model,
        input=prompt,
    )

    # output_text joins all text output, which is more robust than indexing into response.output.
    return response.output_text
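
With only three documents, `k=3` always passes the whole corpus to the model, including passages that barely match the question. One possible refinement, sketched here under stated assumptions, is to drop low-scoring passages before building the prompt. The helpers `filter_by_score` and `build_rag_prompt` and the `min_score=0.1` cutoff are hypothetical and not part of the notebook; the scores are the ones printed by the retrieval smoke test, and the texts are abbreviated. Separating prompt construction from the API call also lets you inspect the prompt without a network request.

```python
from typing import Any, Dict, List


def filter_by_score(docs: List[Dict[str, Any]], min_score: float = 0.1) -> List[Dict[str, Any]]:
    """Keep only retrieved documents whose similarity reaches min_score."""
    return [d for d in docs if d["score"] >= min_score]


def build_rag_prompt(question: str, docs: List[Dict[str, Any]]) -> str:
    """Assemble the RAG prompt text without calling the model."""
    if docs:
        context = "\n\n".join(
            f"[Document {i}: {d['title']}]\n{d['text'].strip()}"
            for i, d in enumerate(docs, start=1)
        )
    else:
        # Explicit placeholder so the model can say it does not know.
        context = "(no relevant documentation found)"
    return (
        "You are a helpful assistant answering questions about the note-taking app LumaNote.\n"
        "Use ONLY the information in the documentation below to answer the question.\n"
        "If the answer is not contained in the documentation, say that you do not know.\n\n"
        "Documentation:\n"
        f"{context}\n\n"
        f"User question: {question}"
    )


# Scores taken from the retrieval smoke test; texts abbreviated for illustration.
retrieved = [
    {"title": "Syncing notes across devices", "score": 0.575,
     "text": "Sync runs every few seconds when you are online."},
    {"title": "Offline mode and local cache", "score": 0.105,
     "text": "Changes are stored in a local cache."},
    {"title": "Sharing notes with teammates", "score": 0.049,
     "text": "Add teammates by email address and choose whether they can view or edit."},
]

kept = filter_by_score(retrieved, min_score=0.1)
prompt = build_rag_prompt("sync notes between devices", kept)
print(prompt)
```

A sensible cutoff depends on the corpus and the vectorizer, so treat `0.1` as a placeholder to tune rather than a recommended value.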

In [None]:
def compare_plain_vs_rag(question: str) -> None:
    """Print the plain and RAG answers to the same question, one after the other."""
    print("Question:", question)
    print("\n--- Plain model (no retrieval) ---")
    print(plain_llm_answer(question))

    print("\n--- RAG (with retrieved documentation) ---")
    print(rag_answer(question))


# Try a few questions
compare_plain_vs_rag("How can I sync my notes between my phone and laptop?")

print("\n" + "="*80 + "\n")

compare_plain_vs_rag("What happens to my notes when I go offline?")

print("\n" + "="*80 + "\n")

compare_plain_vs_rag("How do I share a note with my teammates?")

Question: How can I sync my notes between my phone and laptop?

--- Plain model (no retrieval) ---
To sync your notes between your phone and laptop in LumaNote, you need to use the cloud sync feature. Here’s how you can do it:

1. **Create a LumaNote account or log in:**  
   Make sure you are logged into the same LumaNote account on both your phone and laptop.

2. **Enable cloud sync:**  
   In the app settings on both devices, enable the cloud sync option. This usually involves turning on synchronization to LumaNote's cloud server.

3. **Sync your notes:**  
   Once cloud sync is enabled, your notes will automatically upload from one device and download to the other, keeping them updated across your phone and laptop.

If you do not see an option for cloud sync or account login, ensure you have the latest version of LumaNote installed. If the app uses third-party services like Google Drive or Dropbox, connect the same cloud storage account to both devices.

If you need detailed steps 