# Simple RAG from CSV (No LangChain/LlamaIndex)

This notebook demonstrates a minimal Retrieval-Augmented Generation (RAG) pipeline using:
- **CSV** as the knowledge source
- **OpenAI embeddings** (`text-embedding-3-small`) for retrieval
- **OpenAI chat model** (`gpt-4o-mini`) for final answer generation

The implementation avoids orchestration frameworks and relies only on `pandas`, `numpy`, and the official `openai` client.

In [None]:
# If needed, uncomment and run:
# %pip install -q openai pandas numpy

In [1]:
import os
import numpy as np
import pandas as pd
from openai import OpenAI

# Make sure your API key is available:
# export OPENAI_API_KEY="your_key_here"
assert os.getenv("OPENAI_API_KEY"), "Please set OPENAI_API_KEY before running this notebook."

client = OpenAI()

## 1) Load CSV knowledge base

In [2]:
csv_path = "../data/sample_knowledge.csv"  # adjust if needed

kb_df = pd.read_csv(csv_path)
kb_df

|   | id | title | content |
|---|---|---|---|
| 0 | 1 | Refund Policy | Customers can request a refund within 30 days ... |
| 1 | 2 | Shipping Times | Standard shipping takes 3-5 business days in t... |
| 2 | 3 | Support Hours | Customer support is available Monday to Friday... |
| 3 | 4 | Account Security | Users should enable multi-factor authenticatio... |
| 4 | 5 | Subscription Cancellation | You can cancel your subscription any time from... |
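
The later cells read the `id`, `title`, and `content` columns, so a CSV missing any of them would fail partway through the notebook. A minimal check, assuming those three names are the full requirement, might look like the sketch below. `REQUIRED_COLUMNS` and `missing_columns` are illustrative names, not part of the notebook.

```python
# Hypothetical helper: column names assumed from the cells in this notebook.
REQUIRED_COLUMNS = ("id", "title", "content")


def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent, in declaration order."""
    present = set(columns)
    return [name for name in required if name not in present]


# With the notebook's DataFrame this would be: missing_columns(kb_df.columns)
print(missing_columns(["id", "title", "content"]))  # []
print(missing_columns(["id", "body"]))              # ['title', 'content']
```

An empty list means the CSV has everything the rest of the pipeline expects.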


## 2) Build text chunks to embed

For simplicity, each row becomes one chunk: its title and content joined into a single string.

In [3]:
kb_df["chunk_text"] = (
    "Title: " + kb_df["title"].astype(str) + "\n"
    + "Content: " + kb_df["content"].astype(str)
)

kb_df[["id", "chunk_text"]].head()

|   | id | chunk_text |
|---|---|---|
| 0 | 1 | Title: Refund Policy\nContent: Customers can r... |
| 1 | 2 | Title: Shipping Times\nContent: Standard shipp... |
| 2 | 3 | Title: Support Hours\nContent: Customer suppor... |
| 3 | 4 | Title: Account Security\nContent: Users should... |
| 4 | 5 | Title: Subscription Cancellation\nContent: You... |


## 3) Create embeddings for all chunks

In [6]:
embedding_model = "text-embedding-3-small"


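def get_embedding(text: str, model: str = embedding_model) -> np.ndarray:
    """Embed a single string and return its vector as a float32 array."""
    response = client.embeddings.create(model=model, input=text)
    return np.array(response.data[0].embedding, dtype=np.float32)

# One API request per row: fine for a small CSV like this one.
kb_df["embedding"] = kb_df["chunk_text"].apply(get_embedding)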
print(f"Created {len(kb_df)} embeddings. Vector size: {kb_df['embedding'].iloc[0].shape[0]}")
print(kb_df["embedding"])

Created 5 embeddings. Vector size: 1536
0    [-0.02742195, 0.051342327, -0.0062121265, -0.0...
1    [-0.025481526, 0.027597789, 0.0594281, 0.00504...
2    [-0.03864596, 0.008250742, 0.06006369, 0.03402...
3    [0.065883756, 0.026073243, 0.028646111, 0.0490...
4    [0.041768648, 0.022653492, -0.0022432345, 0.01...
Name: embedding, dtype: object
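
The cell above sends one request per row. The embeddings endpoint also accepts a list of strings, so a larger CSV could be embedded in fewer calls. The sketch below is one possible way to do that, not the notebook's method: `embed_texts` is a hypothetical helper, and `FakeEmbeddings` is a stand-in client that exists only so the example runs without an API key.

```python
from types import SimpleNamespace


def embed_texts(client, texts, model="text-embedding-3-small"):
    """Embed several texts in one request and return one vector per text."""
    response = client.embeddings.create(model=model, input=list(texts))
    # Each returned item carries the index of its input; sort so the
    # vectors line up with the order of `texts`.
    items = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in items]


class FakeEmbeddings:
    """Offline stand-in (an assumption for this demo) mimicking the response shape."""

    def create(self, model, input):
        data = [
            SimpleNamespace(index=i, embedding=[float(len(text)), 1.0])
            for i, text in enumerate(input)
        ]
        # Returned in reverse on purpose to exercise the sort above.
        return SimpleNamespace(data=list(reversed(data)))


fake_client = SimpleNamespace(embeddings=FakeEmbeddings())
vectors = embed_texts(fake_client, ["ab", "abcd"])
print(vectors)  # [[2.0, 1.0], [4.0, 1.0]]
```

With the real client, the result could be stored the same way as before, for example `kb_df["embedding"] = [np.array(v, dtype=np.float32) for v in embed_texts(client, kb_df["chunk_text"].tolist())]`.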


## 4) Retrieve top-k relevant chunks for a user query

In [7]:
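def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between a and b (1.0 means same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def retrieve(query: str, k: int = 3) -> pd.DataFrame:
    """Return the k rows of kb_df most similar to the query, best first."""
    query_emb = get_embedding(query)
    # Score a copy so the global kb_df keeps its original columns.
    scored = kb_df.copy()
    scored["score"] = scored["embedding"].apply(lambda emb: cosine_similarity(query_emb, emb))
    return scored.sort_values("score", ascending=False).head(k)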

user_query = "How long does international shipping take?"
retrieved = retrieve(user_query, k=3)
retrieved[["id", "title", "score"]]

|   | id | title | score |
|---|---|---|---|
| 1 | 2 | Shipping Times | 0.618575 |
| 2 | 3 | Support Hours | 0.217825 |
| 0 | 1 | Refund Policy | 0.205738 |
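
The retrieval cell scores each row with a separate Python call, which is fine for five rows. For a larger knowledge base, the same cosine ranking could be computed in one matrix operation. The sketch below uses toy 2-D vectors to show the idea. `top_k_cosine` is an illustrative name, not part of the notebook.

```python
import numpy as np


def top_k_cosine(query_vec, doc_matrix, k=3):
    """Return (indices, scores) of the k rows of doc_matrix most similar to query_vec."""
    doc_norms = np.linalg.norm(doc_matrix, axis=1)
    # Dot product of every row with the query, divided by the product of norms.
    scores = doc_matrix @ query_vec / (doc_norms * np.linalg.norm(query_vec))
    order = np.argsort(-scores)[:k]  # highest score first
    return order, scores[order]


docs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([1.0, 0.0])
idx, scores = top_k_cosine(query, docs, k=2)
print(idx.tolist())                  # [0, 2]
print(np.round(scores, 3).tolist())  # [1.0, 0.707]
```

In the notebook, the matrix could be built once with `np.vstack(kb_df["embedding"].tolist())`, and the returned indices passed to `kb_df.iloc[...]` to recover the matching rows.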


## 5) Build augmented prompt and generate answer

In [8]:
generation_model = "gpt-4o-mini"

context = "\n\n---\n\n".join(retrieved["chunk_text"].tolist())

system_prompt = (
    "You are a helpful assistant. Answer only from the provided context. "
    "If the answer is not in context, say you don't know."
)

user_prompt = f"""
Question: {user_query}

Context:
{context}
"""

response = client.chat.completions.create(
    model=generation_model,
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ],
    temperature=0,
)

print(response.choices[0].message.content)

International shipping takes 7-14 business days.
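
The context sent to the model includes all three retrieved chunks, although only the first scored well (about 0.62 versus about 0.21 for the other two). One optional refinement is to drop low-scoring chunks before building the prompt. The sketch below assumes a cutoff of 0.3, which is a made-up value that would need tuning on real queries. `filter_relevant` is a hypothetical helper.

```python
def filter_relevant(scored_chunks, min_score=0.3):
    """Keep (title, score) pairs whose score reaches min_score, best first."""
    kept = [pair for pair in scored_chunks if pair[1] >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Scores copied from the shipping query output in section 4.
shipping = [
    ("Shipping Times", 0.618575),
    ("Support Hours", 0.217825),
    ("Refund Policy", 0.205738),
]
print(filter_relevant(shipping))  # [('Shipping Times', 0.618575)]

# 0.12 is an invented score for an off-topic question.
print(filter_relevant([("Refund Policy", 0.12)]))  # []
```

When the filtered list is empty, a pipeline could return "I don't know." directly and skip the chat request.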


## 6) Optional: wrap into one function

Use this to ask multiple questions. The chunk embeddings are built once, so each call only embeds the question and makes one chat request.

In [9]:
def ask_rag(question: str, k: int = 3) -> str:
    """Retrieve the top-k chunks for the question and answer from them only."""
    top_docs = retrieve(question, k=k)
    context_text = "\n\n---\n\n".join(top_docs["chunk_text"].tolist())

    prompt = f"""
Question: {question}

Context:
{context_text}
"""

    answer = client.chat.completions.create(
        model=generation_model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt},
        ],
        temperature=0,
    )
    return answer.choices[0].message.content

ask_rag("When can I get a refund?")

'You can request a refund within 30 days of purchase if you provide a receipt.'

In [10]:
ask_rag("What are the support hours?")

'Customer support is available Monday to Friday, 9:00 AM to 6:00 PM Eastern Time.'

In [12]:
ask_rag("Where is the University of Oxford located?")

"I don't know."