# RAG Pipeline Notebook

This notebook shows a minimal, Colab-friendly pipeline to:

- Load text files from `blogs/`
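- Split the text into overlapping word chunks (200 words per chunk, 40-word overlap)
- Create embeddings with `sentence-transformers` (`all-MiniLM-L6-v2`)
- Build a FAISS index (or fall back to NumPy cosine similarity if FAISS is unavailable)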
- Ask two sample questions

(You can run the cells in order in Colab.)
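
As a rough illustration of the chunking step, the sketch below splits text into overlapping word windows. The name `sliding_chunks` and the tiny window sizes are assumptions chosen for readability; `backend.py`, shown later in this notebook, has its own `chunk_text` with 200-word chunks and a 40-word overlap.

```python
def sliding_chunks(text, words_per_chunk=5, overlap=2):
    """Split text into word windows that share `overlap` words with the previous window."""
    if not 0 <= overlap < words_per_chunk:
        raise ValueError("overlap must be non-negative and smaller than words_per_chunk")
    words = text.split()
    step = words_per_chunk - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + words_per_chunk]))
        if start + words_per_chunk >= len(words):
            break  # the current window already reached the end of the text
    return chunks


sample = "one two three four five six seven eight"
for chunk in sliding_chunks(sample):
    print(chunk)
```

The overlap means that a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk, at the cost of embedding some words twice.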

In [1]:
# Cell 1: Install packages (uncomment in Colab)
# !pip install -q langchain==0.3.27 sentence-transformers faiss-cpu transformers torch streamlit



In [3]:
import os
os.getcwd()


'/content'

In [4]:
from google.colab import files
uploaded = files.upload()


Saving backend.py to backend (5).py


In [5]:
!unzip rag_test.zip -d ./rag_test


unzip:  cannot find or open rag_test.zip, rag_test.zip.zip or rag_test.zip.ZIP.


In [6]:
!ls


'app (1).py'	  'backend (3).py'   faiss_store	      'README (1).md'
 app.py		  'backend (4).py'   __pycache__	       README.md
'backend (1).py'  'backend (5).py'  'rag_pipeline (1).ipynb'   requirements.txt
'backend (2).py'   backend.py	     rag_pipeline.ipynb        sample_data


In [7]:
import sys
import os

# Add current folder to Python path
sys.path.append(os.getcwd())


In [8]:
from backend import ensure_index, get_answer


In [11]:
import sys
import os
import importlib

sys.path.append(os.getcwd())  # add current folder to path
import backend
importlib.reload(backend)      # reload the updated backend.py

from backend import ensure_index, get_answer


In [12]:
ensure_index()


True

In [14]:
from google.colab import files
uploaded = files.upload()  # select backend.py


In [None]:
import importlib
import backend
importlib.reload(backend)
from backend import ensure_index, get_answer


In [None]:
ensure_index()


In [15]:
!rm -rf faiss_store/*


In [16]:
from backend import ensure_index, get_answer
ensure_index()


True

In [14]:
"""
backend.py - helper functions for the RAG app

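This file builds embeddings and a vector store from the .txt files in ./blogs,
saves them under ./faiss_store, and exposes get_answer(query), which app.py uses.

It uses FAISS for search when available and otherwise falls back to NumPy
cosine similarity. If text generation with transformers fails, the retrieved
context is returned as the answer.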
"""

import os, pickle
from pathlib import Path
import numpy as np

BASE_DIR = Path.cwd()  # Current working directory in Colab

BLOG_DIR = BASE_DIR / "blogs"
STORE_DIR = BASE_DIR / "faiss_store"
STORE_DIR.mkdir(exist_ok=True)

# ----------------------------
# Helpers
# ----------------------------
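def chunk_text(text, words_per_chunk=200, overlap=40):
    """Split text into chunks of words_per_chunk words that overlap by `overlap` words."""
    if not 0 <= overlap < words_per_chunk:
        raise ValueError("overlap must be >= 0 and smaller than words_per_chunk")
    words = text.split()
    step = words_per_chunk - overlap  # always positive, so the loop terminates
    chunks = []
    for i in range(0, len(words), step):
        chunks.append(" ".join(words[i:i + words_per_chunk]))
        if i + words_per_chunk >= len(words):
            break  # this window already reached the end of the text
    return chunks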

def load_docs():
    docs = []
    for p in sorted(BLOG_DIR.glob("*.txt")):
        text = p.read_text(encoding="utf-8")
        chunks = chunk_text(text)
        for idx, c in enumerate(chunks):
            docs.append({"id": f"{p.name}_{idx}", "text": c, "source": str(p.name)})
    return docs

# ----------------------------
# Build / load index
# ----------------------------
def build_or_load_index(embedding_model_name="all-MiniLM-L6-v2"):
    docs = load_docs()
    emb_path = STORE_DIR / "embeddings.pkl"
    docs_path = STORE_DIR / "docs.pkl"
    idx_path = STORE_DIR / "faiss.index"

    # Load saved embeddings if they exist
    if emb_path.exists() and docs_path.exists():
        try:
            with open(docs_path, "rb") as f:
                docs = pickle.load(f)
            with open(emb_path, "rb") as f:
                embs = pickle.load(f)
            try:
                import faiss
                index = faiss.read_index(str(idx_path)) if idx_path.exists() else None
                return docs, embs, index
            except Exception:
                return docs, embs, None
        except Exception:
            pass  # rebuild

    # Create embeddings
    try:
        from sentence_transformers import SentenceTransformer
        model = SentenceTransformer(embedding_model_name)
        texts = [d["text"] for d in docs]
        embs = model.encode(texts, convert_to_numpy=True, show_progress_bar=False)
        if embs.ndim == 1:
            embs = embs.reshape(len(texts), -1)
    except Exception as exc:
        # Fallback: random vectors keep the pipeline running, but retrieval is then meaningless
        print(f"Warning: embedding failed ({exc!r}); using random vectors")
        embs = np.random.rand(len(docs), 384).astype("float32")

    # Save embeddings and docs
    with open(docs_path, "wb") as f:
        pickle.dump(docs, f)
    with open(emb_path, "wb") as f:
        pickle.dump(embs, f)

    # Try FAISS
    try:
        import faiss
        if embs.dtype != "float32":
            embs = embs.astype("float32")
        faiss.normalize_L2(embs)
        d = embs.shape[1]
        index = faiss.IndexFlatIP(d)
        index.add(embs)
        faiss.write_index(index, str(idx_path))
        return docs, embs, index
    except Exception:
        return docs, embs, None

# ----------------------------
# Retrieval
# ----------------------------
def retrieve(query, docs, embs, index=None, embedding_model_name="all-MiniLM-L6-v2", k=3):
    try:
        from sentence_transformers import SentenceTransformer
        model = SentenceTransformer(embedding_model_name)
        q_emb = model.encode([query], convert_to_numpy=True)
        if q_emb.ndim == 1:
            q_emb = q_emb.reshape(1, -1)
    except Exception:
        q_emb = np.random.rand(1, embs.shape[1]).astype("float32")

    if index is not None:
        import faiss
        v = q_emb.astype("float32")
        faiss.normalize_L2(v)
        D, I = index.search(v, k)
        # FAISS pads missing results with -1, which would otherwise select docs[-1]
        results = [docs[idx] for idx in I[0] if 0 <= idx < len(docs)]
        return results
    else:
        # numpy fallback: cosine similarity via dot products of unit-length vectors
        def normalize(a):
            a = np.atleast_2d(a)
            norms = np.linalg.norm(a, axis=1, keepdims=True)
            norms[norms == 0] = 1.0  # avoid division by zero
            return a / norms

        emb_norm = normalize(embs.astype("float32"))
        qn_norm = normalize(q_emb.astype("float32"))
        sims = emb_norm @ qn_norm.T
        top_idx = sims[:, 0].argsort()[::-1][:k]
        return [docs[i] for i in top_idx]

# ----------------------------
# Answer generation
# ----------------------------
def generate_answer(context, query, model_name="gpt2", max_new_tokens=150):
    try:
        from transformers import pipeline
        import torch
        device = 0 if torch.cuda.is_available() else -1
        pipe = pipeline(
            "text-generation",
            model=model_name,
            device=device,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            top_p=0.9
        )
        prompt = f"""You are a helpful assistant. Use the context below to answer the user's question.

Context:
{context}

Question:
{query}

Answer:"""
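        # return_full_text=False returns only the continuation, so the prompt need not be split off
        out = pipe(prompt, max_new_tokens=max_new_tokens, do_sample=True, return_full_text=False)
        return out[0]["generated_text"].strip()
    except Exception:
        # If generation fails (e.g. missing packages), return the retrieved context verbatim
        return context.strip()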

# ----------------------------
# Public helper
# ----------------------------
_index_built = False
_docs = None
_embs = None
_index = None

def ensure_index():
    global _index_built, _docs, _embs, _index
    if not _index_built:
        _docs, _embs, _index = build_or_load_index()
        _index_built = True
    return True

def get_answer(query, k=3):
    ensure_index()
    results = retrieve(query, _docs, _embs, _index, k=k)
    context = "\n\n---\n\n".join([r["text"] for r in results])
    answer = generate_answer(context, query)
    return answer, results
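

The retrieval step in `backend.py` ranks chunks by cosine similarity: both the FAISS path (`normalize_L2` plus `IndexFlatIP`) and the NumPy fallback scale vectors to unit length and then take inner products. A minimal NumPy-only sketch of that idea follows; `top_k_cosine` and the toy vectors are hypothetical and not part of `backend.py`.

```python
import numpy as np


def top_k_cosine(query_vec, doc_vecs, k=3):
    """Return the indices of the k rows of doc_vecs most similar to query_vec."""
    docs = np.asarray(doc_vecs, dtype="float32")
    if docs.size == 0:
        return []  # nothing to rank
    query = np.asarray(query_vec, dtype="float32").reshape(-1)

    # Scale every vector to unit length so the dot product equals cosine similarity
    doc_norms = np.linalg.norm(docs, axis=1, keepdims=True)
    doc_norms[doc_norms == 0] = 1.0
    query_norm = np.linalg.norm(query) or 1.0

    sims = (docs / doc_norms) @ (query / query_norm)
    k = min(k, len(docs))
    return np.argsort(-sims)[:k].tolist()


toy_docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(top_k_cosine([2.0, 0.1], toy_docs, k=2))  # → [0, 2]
```

Capping `k` at the number of documents avoids asking for more neighbours than exist, which is the situation where FAISS returns `-1` placeholders.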

In [2]:
from backend import ensure_index, get_answer
ensure_index()


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


True

In [4]:
from backend import ensure_index, get_answer
ensure_index()


True

In [5]:
from google.colab import files
uploaded = files.upload()  # select backend.py


Saving backend.py to backend (7).py


In [6]:
from backend import ensure_index, get_answer
ensure_index()


True

In [8]:
import shutil
from pathlib import Path

store_dir = Path.cwd() / "faiss_store"
shutil.rmtree(store_dir, ignore_errors=True)
store_dir.mkdir(exist_ok=True)


In [9]:
from backend import ensure_index, get_answer

# Force rebuild of index
ensure_index()


True

In [11]:
import shutil
from pathlib import Path

store_dir = Path.cwd() / "faiss_store"
shutil.rmtree(store_dir, ignore_errors=True)
store_dir.mkdir(exist_ok=True)
print("Old index cleared, fresh directory created.")


Old index cleared, fresh directory created.


In [12]:
from backend import ensure_index, get_answer

# Rebuild everything from scratch
ensure_index()
print("Index rebuilt successfully.")


Index rebuilt successfully.


In [15]:
questions = [
    'What is energy mastery?',
    'How do high performers avoid burnout?'
]

for q in questions:
    ans, srcs = get_answer(q)
    print('Q:', q)
    print('A:', ans[:600])  # first 600 characters
    print('Sources:', ', '.join([s['source'] for s in srcs]))
    print('\n---\n')

Device set to use cpu
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Q: What is energy mastery?
A: Energy mastery is mastery over one's body and mind. Energy mastery is how you feel. Energy mastery is how you feel when you're feeling good or bad. Energy mastery is how you feel when you're feeling good or bad.

Energy mastery is how you feel when you're feeling good or bad. Energy mastery is how you feel when you're feeling good or bad.

Energy mastery is how you feel when you're feeling good or bad. Energy mastery is how you feel when you're feeling good or bad.

Energy mastery is how you feel when you're feeling good or bad. Energy mastery is how you feel when you're feeling good or bad.


Sources: 

---



Device set to use cpu
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Q: How do high performers avoid burnout?
A: Yes, we do.
Sources: 

---
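
Both answers above list no sources, which suggests that retrieval returned no chunks. One plausible cause is that `blogs/` is missing or holds no `.txt` files in the working directory (the earlier `!ls` output does not show a `blogs` folder). A small check along these lines could confirm this before building the index; `count_blog_files` is a hypothetical helper, not part of `backend.py`.

```python
from pathlib import Path


def count_blog_files(blog_dir="blogs"):
    """Return the number of .txt files in blog_dir, or 0 if the folder is missing."""
    path = Path(blog_dir)
    if not path.is_dir():
        return 0
    return sum(1 for _ in path.glob("*.txt"))


n_files = count_blog_files()
if n_files == 0:
    print("No .txt files found in blogs/; the index would be built from zero chunks.")
else:
    print(f"Found {n_files} .txt file(s) in blogs/.")
```

If the count is zero, uploading the text files into `blogs/` and clearing `faiss_store/` before calling `ensure_index()` again should give the retriever something to return.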

