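This is an implementation of a simple, "dumb" RAG (Retrieval-Augmented Generation) pipeline. It retrieves the chunks closest to the query and tells the LLM to answer only from them; if the docs don't contain the answer, it says "I don't know".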

Overview

 1. load docs
 2. embed (sentence-transformers)
 3. store vectors (FAISS)
 4. retrieve
 5. augment prompt
 6. generate
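The six steps can be traced without any model download or API key. The sketch below is a toy stand-in, not this notebook's implementation: "embedding" is replaced by a lowercase bag of words, retrieval ranks chunks by shared words instead of vector distance, and `fake_llm` is a hypothetical placeholder for the real LLM call.

```python
import re

def tokenize(text):
    # Lowercase word tokens; a crude stand-in for a real embedding model
    return set(re.findall(r"[a-z0-9]+", text.lower()))

# 1. load docs (toy corpus)
docs = [
    "FAISS is a library for efficient similarity search.",
    "Embeddings capture semantic meaning of text.",
]

# 2. "embed" + 3. store: keep each chunk next to its token set
store = [(doc, tokenize(doc)) for doc in docs]

# 4. retrieve: rank chunks by how many words they share with the query
def retrieve_toy(query, k=1):
    q = tokenize(query)
    ranked = sorted(store, key=lambda item: len(q & item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

# 5. augment: paste the retrieved chunks into the prompt
def augment(query, chunks):
    context = "\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion:\n{query}"

# 6. generate: hypothetical stand-in that echoes the first context line
def fake_llm(prompt):
    return prompt.splitlines()[1]

query = "What do embeddings capture?"
print(fake_llm(augment(query, retrieve_toy(query))))
# → Embeddings capture semantic meaning of text.
```

The real pipeline below swaps each stand-in for the real thing: MiniLM embeddings, a FAISS index, and an OpenRouter model.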

---Boring stuff

1. Install dependencies

pip install sentence-transformers faiss-cpu numpy python-dotenv openai

2. Load the .env

In [2]:
import os
from dotenv import load_dotenv

load_dotenv()
OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY")

if not OPENROUTER_API_KEY:
    raise ValueError("OPENROUTER_API_KEY missing in environment")

Docs

A small-scale toy corpus of six one-sentence documents:

In [3]:
documents = [
    "RAG stands for Retrieval-Augmented Generation.",
    "In RAG, documents are embedded into a vector space.",
    "FAISS is a library for efficient similarity search.",
    "Large Language Models can hallucinate without grounding.",
    "Embeddings capture semantic meaning of text.",
    "Ronaldo has played for Real Madrid, Manchester United, Juventus, Sporting Lisbon, Portugal and Al Nassr"
]

Chunking (no overlap)

In [4]:
def chunk(text, chunk_size=50):
    # Split into non-overlapping windows of `chunk_size` words
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]

chunks = []
for doc in documents:
    chunks.extend(chunk(doc))
print(chunks)

['RAG stands for Retrieval-Augmented Generation.', 'In RAG, documents are embedded into a vector space.', 'FAISS is a library for efficient similarity search.', 'Large Language Models can hallucinate without grounding.', 'Embeddings capture semantic meaning of text.', 'Ronaldo has played for Real Madrid, Manchester United, Juventus, Sporting Lisbon, Portugal and Al Nassr']
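Every document here is shorter than `chunk_size=50` words, so each one comes back as a single chunk and the output matches the input list. To see the splitting itself, the sketch below runs a copy of the same function on a hypothetical 120-word string: it yields two full 50-word chunks and a 20-word remainder.

```python
def chunk(text, chunk_size=50):
    # Non-overlapping windows of `chunk_size` words; the last one may be shorter
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]

# Hypothetical 120-word input: "w0 w1 ... w119"
long_text = " ".join(f"w{i}" for i in range(120))
pieces = chunk(long_text)
print([len(p.split()) for p in pieces])  # → [50, 50, 20]
```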


With overlap

In [6]:
def chunk_with_overlap(text, chunk_size=5, overlap=2):
    # Sliding window: consecutive chunks share `overlap` words
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    # Stop at len(words) - overlap so the last window still adds new words
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, max(len(words) - overlap, 1), step)
    ]

# Separate name, so the non-overlapping `chunks` used below stay unchanged
overlap_chunks = []
for doc in documents:
    overlap_chunks.extend(chunk_with_overlap(doc))

Embedding

In [7]:
from sentence_transformers import SentenceTransformer
import numpy as np

embedder = SentenceTransformer("all-MiniLM-L6-v2")

embeddings = embedder.encode(chunks)
embeddings = np.array(embeddings).astype("float32")

print(embeddings.shape)

(6, 384)
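Each of the 6 chunks is now a 384-dimensional vector, and "similar meaning" becomes "nearby vectors", commonly measured with cosine similarity. The sketch below uses hand-made 3-dimensional toy vectors (hypothetical, not real MiniLM outputs) to show the computation, and checks the identity that makes L2 search and cosine ranking agree for unit-length vectors: ‖a − b‖² = 2 − 2·cos(a, b). If you want that property for the FAISS index, `SentenceTransformer.encode` accepts `normalize_embeddings=True`.

```python
import math

def cosine(a, b):
    # cos(a, b) = a·b / (|a| |b|)
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def normalize(v):
    # Scale a vector to unit length
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical toy vectors standing in for sentence embeddings
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(round(cosine(cat, kitten), 3), round(cosine(cat, invoice), 3))  # → 0.984 0.012

# For unit vectors, squared L2 distance = 2 - 2 * cosine similarity
a, b = normalize(cat), normalize(kitten)
assert math.isclose(squared_l2(a, b), 2 - 2 * cosine(a, b))
```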


Vector Store (FAISS)

In [8]:
import faiss

dimension = embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)
index.add(embeddings)

print("total vectors:", index.ntotal)

total vectors: 6
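`IndexFlatL2` does exact, exhaustive search: the query is compared with every stored vector, and the returned `distances` are squared Euclidean (L2) distances. A pure-Python sketch of the same idea for a single query (no FAISS; `flat_l2_search` is a hypothetical name) makes that behavior explicit.

```python
def flat_l2_search(vectors, query, k):
    # Exhaustive search: score every stored vector, keep the k closest
    scored = [
        (sum((x - y) ** 2 for x, y in zip(v, query)), i)  # squared L2, as IndexFlatL2 reports
        for i, v in enumerate(vectors)
    ]
    scored.sort()
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]

stored = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]]
distances, indices = flat_l2_search(stored, [0.9, 0.1], k=2)
print(indices)  # → [1, 0]
```

The cost is one distance per stored vector per query, which is trivial for 6 vectors; FAISS also offers approximate index types (for example IVF- and HNSW-based ones) for corpora where a full scan is too slow.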


Retrieval

In [9]:
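def retrieve(query, k=2):
    # FAISS pads results with index -1 if k exceeds the number of stored vectors
    k = min(k, index.ntotal)
    # Embed the query with the same model used for the chunks
    query_embedding = embedder.encode([query]).astype("float32")
    distances, indices = index.search(query_embedding, k)
    # Nearest chunks first (smallest squared L2 distance)
    return [chunks[i] for i in indices[0]]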

query = "Why are embeddings useful?"
retrieved_chunks = retrieve(query)

for c in retrieved_chunks:
    print("-", c)

- Embeddings capture semantic meaning of text.
- In RAG, documents are embedded into a vector space.
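This retriever always returns `k` chunks, even for an off-topic query; in this notebook the "I don't know" comes from the LLM obeying the prompt, not from retrieval. A common addition is to drop hits whose distance exceeds a cutoff. The sketch below works on one row of the `(distances, indices)` pair that `index.search` returns; `filter_hits` and its cutoff `max_distance=1.0` are hypothetical and the value would need tuning on real queries.

```python
def filter_hits(distances, indices, chunks, max_distance=1.0):
    # distances/indices: one row from index.search (squared L2 for IndexFlatL2)
    # max_distance is a hypothetical cutoff; tune it on real queries
    return [
        chunks[i]
        for d, i in zip(distances, indices)
        if i >= 0 and d <= max_distance  # FAISS uses -1 for "no result"
    ]

toy_chunks = [
    "Embeddings capture semantic meaning of text.",
    "FAISS is a library for efficient similarity search.",
]
print(filter_hits([0.4, 1.7], [0, 1], toy_chunks))
# → ['Embeddings capture semantic meaning of text.']
print(filter_hits([2.3, 2.5], [1, 0], toy_chunks))  # off-topic query
# → []
```

With the real index this would be called as `filter_hits(distances[0], indices[0], chunks)`; an empty result lets the code answer "I don't know" without calling the LLM at all.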


LLM Call

In [None]:
# Answer a question that the docs DO cover

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=OPENROUTER_API_KEY,  # loaded from .env above; never hard-code keys
)

context = "\n".join(retrieved_chunks)

prompt = f"""
Answer the question using ONLY the context below.
If the answer is not in the context, say "I don't know".

Context:
{context}

Question:
{query}
"""

response = client.chat.completions.create(
    model="nvidia/nemotron-3-nano-30b-a3b:free",
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)

Embeddings capture the semantic meaning of text, allowing documents to be represented as vectors in a vector space and thus making it possible to compare and retrieve content based on meaning.
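Both LLM cells repeat the same prompt template. A small helper (hypothetical name `build_prompt`) keeps the wording in one place and keeps comments out of the f-string, where anything written after `#` would otherwise be sent to the model as part of the prompt.

```python
def build_prompt(context_chunks, question):
    # Join retrieved chunks one per line and wrap them in the grounding instructions
    context = "\n".join(context_chunks)
    return (
        "Answer the question using ONLY the context below.\n"
        'If the answer is not in the context, say "I don\'t know".\n'
        "\n"
        f"Context:\n{context}\n"
        "\n"
        f"Question:\n{question}\n"
    )

prompt = build_prompt(
    ["Embeddings capture semantic meaning of text."],
    "Why are embeddings useful?",
)
print(prompt)
```

In the cells above it would be used as `messages=[{"role": "user", "content": build_prompt(retrieved_chunks, query)}]`.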


In [13]:
# Off-topic question: the docs don't cover it, so the model should say "I don't know"

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=OPENROUTER_API_KEY,
)

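offtopic_query = "How to make pizza?"
# Retrieve for THIS query; FAISS still returns the k nearest chunks, however unrelated
context = "\n".join(retrieve(offtopic_query))

prompt = f"""
Answer the question using ONLY the context below.
If the answer is not in the context, say "I don't know".

Context:
{context}

Question:
{offtopic_query}
"""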

response = client.chat.completions.create(
    model="nvidia/nemotron-3-nano-30b-a3b:free",
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)

I don't know.
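To act on a refusal in code (for example, to log queries the corpus cannot answer), the reply has to be matched loosely: the model here answered "I don't know." with a trailing period, and other runs may differ in case, punctuation, or apostrophe style. `is_idk` is a hypothetical helper; it only recognizes the exact phrase the prompt asks for, so a paraphrased refusal would slip through.

```python
def is_idk(reply):
    # Map curly apostrophes to straight ones, lowercase, then trim
    # whitespace, periods, exclamation marks, and quotes from both ends
    text = reply.replace("\u2019", "'").lower().strip(" \t\n.!\"'")
    return text == "i don't know"

print(is_idk("I don't know."))  # → True
```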
