# 🧠 Study Buddy: Build Your Own RAG Chatbot with Gemini
Upload any PDF or text file (e.g., course notes, a Wikipedia export, or an article).

Ask questions like:
- “Summarize Chapter 2”
- “What is reinforcement learning?”
- “What’s the main takeaway from this section?”


```python
# 🧩 Step 1: Install dependencies
!pip install -q google-generativeai PyPDF2 faiss-cpu
```

```python
# 🧠 Step 2: Import libraries
import google.generativeai as genai
from getpass import getpass  # only needed for the manual API-key prompt in Step 3
import PyPDF2
import faiss
import numpy as np
import re
```

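```python
# ⚙️ Step 3: Configure the Gemini API
from google.colab import userdata

# Read the key from Colab's Secrets panel (saved under the name 'gemini_api_key').
# To type the key in manually instead, use:
# GEMINI_API_KEY = getpass("🔑 Enter your Gemini API key: ")
genai.configure(api_key=userdata.get('gemini_api_key'))
```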
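If you run this notebook outside Colab, `google.colab.userdata` is not available. Here is a minimal sketch, assuming you export your key in an environment variable named `GEMINI_API_KEY` (a name chosen for this example, not one the notebook defines), that falls back to the same `getpass` prompt shown in the commented-out line. `load_api_key` is a hypothetical helper.

```python
import os
from getpass import getpass

def load_api_key(env_var="GEMINI_API_KEY"):
    """Return the API key from an environment variable, or prompt for it.

    GEMINI_API_KEY is an assumed variable name for this sketch.
    """
    key = os.environ.get(env_var)
    if key:
        return key
    # Interactive fallback: getpass does not echo what you type
    return getpass("🔑 Enter your Gemini API key: ")
```

You would then call `genai.configure(api_key=load_api_key())` in place of the `userdata.get(...)` line. Keeping the key out of the notebook source lets you share the notebook without leaking it.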

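```python
# 🧾 Step 4: Upload your study material
from google.colab import files

uploaded = files.upload()  # opens a file picker; returns {filename: file bytes}

file_name = list(uploaded.keys())[0]  # use the first uploaded file
text = ""

if file_name.lower().endswith(".pdf"):  # case-insensitive, so "Notes.PDF" also works
    reader = PyPDF2.PdfReader(file_name)  # upload() also saves the file to the working directory
    for page in reader.pages:
        text += page.extract_text() or ""  # pages with no extractable text contribute ""
else:
    text = uploaded[file_name].decode("utf-8")

print(f"✅ Loaded {len(text)} characters from {file_name}")
```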

Saving supplement_set_theory.pdf to supplement_set_theory.pdf
✅ Loaded 9161 characters from supplement_set_theory.pdf
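`files.upload()` only works inside Colab. For a local run, a minimal sketch, assuming the file already sits on disk: `load_text` is a hypothetical helper that applies the same PDF-or-text branching as the cell above. One deliberate difference: it joins pages with a newline so the last word of one page is not glued to the first word of the next.

```python
from pathlib import Path

def load_text(path):
    """Read a PDF or plain-text file into a single string (hypothetical helper for local runs)."""
    path = Path(path)
    if path.suffix.lower() == ".pdf":
        import PyPDF2  # imported here so plain-text files work even without PyPDF2 installed
        reader = PyPDF2.PdfReader(str(path))
        # Join pages with a newline so words at page boundaries stay separate;
        # `or ""` guards against pages with no extractable text
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    return path.read_text(encoding="utf-8")
```

For example, `text = load_text("supplement_set_theory.pdf")` would replace the upload cell when the PDF is in the current directory.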


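```python
# 🪄 Step 5: Split text into overlapping chunks
def split_text(text, chunk_size=1000, overlap=200):
    """Split text into fixed-size character windows; consecutive windows share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")  # else the loop never advances
    text = re.sub(r'\s+', ' ', text)  # collapse newlines and runs of spaces left by PDF extraction
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        start += chunk_size - overlap  # step forward, keeping `overlap` characters of shared context
    return chunks

chunks = split_text(text)
print(f"📚 Split into {len(chunks)} chunks")
```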

📚 Split into 12 chunks
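Because each window starts `chunk_size - overlap` characters after the previous one, the loop in `split_text` runs once for every start position below the text length, so a text of `n` characters (after whitespace is collapsed) yields `ceil(n / (chunk_size - overlap))` chunks. With the defaults the step is 800, and `ceil(9161 / 800) = 12`, consistent with the output above. The self-contained sketch below re-implements the same loop under a different name (`sliding_window_chunks`, hypothetical, so it does not overwrite `split_text` or `chunks` in the notebook) and checks the count formula and the overlap on synthetic text.

```python
import math
import re

def sliding_window_chunks(text, chunk_size=1000, overlap=200):
    """Same sliding-window logic as split_text in Step 5 (renamed to avoid overwriting it)."""
    text = re.sub(r"\s+", " ", text)
    pieces = []
    start = 0
    while start < len(text):
        pieces.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return pieces

def expected_chunk_count(n_chars, chunk_size=1000, overlap=200):
    """One chunk per start position 0, step, 2*step, ... that is below n_chars."""
    step = chunk_size - overlap
    return math.ceil(n_chars / step)

# 9,161 letters with no whitespace, so collapsing whitespace leaves the length unchanged
demo_text = "".join(chr(ord("a") + i % 26) for i in range(9161))
demo_chunks = sliding_window_chunks(demo_text)
print(len(demo_chunks), expected_chunk_count(len(demo_text)))  # 12 12

# Each chunk's last 200 characters reappear as the next chunk's first 200
# (true here because the final chunk is longer than the overlap)
print(all(a[-200:] == b[:200] for a, b in zip(demo_chunks, demo_chunks[1:])))  # True
```

The overlap costs some redundancy (about 25% more characters to embed with these defaults) but keeps a sentence that straddles a boundary intact in at least one chunk.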


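```python
# 🧩 Step 6: Create embeddings and build the vector index
embed_model = "models/gemini-embedding-001"
embeddings = []

# One embedding request per chunk; each result is a dict with an "embedding" list
for chunk in chunks:
    result = genai.embed_content(model=embed_model, content=chunk)
    embeddings.append(result["embedding"])

embeddings = np.array(embeddings, dtype="float32")  # shape: (num_chunks, embedding_dim)

# Exact (brute-force) nearest-neighbour index using L2 distance
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)
print("✅ Vector index built!")
```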

✅ Vector index built!
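`faiss.IndexFlatL2` performs exact, brute-force search: it compares the query with every stored vector and returns the `k` with the smallest squared L2 distance. The pure-Python sketch below shows that computation on toy 2-D vectors. It illustrates the idea only and is not FAISS's actual implementation; `flat_l2_search`, `toy_vectors`, and `toy_query` are hypothetical names.

```python
def flat_l2_search(vectors, query, k):
    """Exact k-nearest-neighbour search by squared L2 distance, as a flat index does.

    Returns (distances, ids) sorted from closest to farthest.
    """
    scored = []
    for i, v in enumerate(vectors):
        dist = sum((a - b) ** 2 for a, b in zip(v, query))  # squared Euclidean distance
        scored.append((dist, i))
    scored.sort()  # ascending distance; ties go to the lower id
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]

# Toy unit-length "embeddings" standing in for the Gemini vectors
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
toy_query = [0.8, 0.6]

distances, ids = flat_l2_search(toy_vectors, toy_query, k=2)
print(ids)  # [2, 0]

# For unit-length vectors, squared L2 distance = 2 - 2 * (dot product),
# so ranking by L2 gives the same order as ranking by cosine similarity.
for v in toy_vectors:
    dot = sum(a * b for a, b in zip(v, toy_query))
    l2 = sum((a - b) ** 2 for a, b in zip(v, toy_query))
    assert abs(l2 - (2 - 2 * dot)) < 1e-9
```

The final loop is why an L2 index is a reasonable choice here: if the embeddings are unit-length, nearest by L2 and most similar by cosine pick the same chunks. If they are not, normalizing them before `index.add` would make the two rankings agree.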


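The next step retrieves the chunks nearest to the question and pastes them into a prompt for Gemini. To see that flow without calling the API, here is an offline sketch that swaps in a toy bag-of-words embedding (`toy_embed`, a hypothetical stand-in for `genai.embed_content` that works nothing like Gemini embeddings) and plain-Python nearest-neighbour search in place of FAISS. `top_k` clamps `k` to the number of documents so asking for more results than exist is safe, and `build_prompt` reproduces the prompt template used in `ask_study_buddy`.

```python
import math
import re

def toy_embed(text, vocab):
    """Stand-in for genai.embed_content: unit-length word-count vector over a fixed vocabulary."""
    words = re.findall(r"[a-z]+", text.lower())
    vec = [float(words.count(term)) for term in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero
    return [x / norm for x in vec]

def top_k(query_vec, doc_vecs, k):
    """Indices of the k closest vectors by squared L2 distance, with k clamped to the number of docs."""
    k = min(k, len(doc_vecs))
    dists = [sum((a - b) ** 2 for a, b in zip(v, query_vec)) for v in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: dists[i])[:k]

def build_prompt(query, docs):
    """Same prompt layout as ask_study_buddy: instructions, retrieved context, question."""
    context = "\n\n".join(docs)
    return (
        "You are Study Buddy, a helpful assistant for learning.\n"
        "Use the context below to answer the question concisely and clearly.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

demo_docs = [
    "A set is a collection of distinct objects called elements.",
    "Photosynthesis converts light energy into chemical energy.",
    "The union of two sets contains every element of either set.",
]
vocab = sorted({w for d in demo_docs for w in re.findall(r"[a-z]+", d.lower())})
doc_vecs = [toy_embed(d, vocab) for d in demo_docs]

question_text = "What is a set?"
hits = top_k(toy_embed(question_text, vocab), doc_vecs, k=2)
print(hits)  # [0, 2]
demo_prompt = build_prompt(question_text, [demo_docs[i] for i in hits])
print(demo_prompt)
```

Only the retrieved chunks reach the model, which keeps the prompt short and the answer grounded in your own material; the trade-off is that a question whose answer falls outside the top `k` chunks gets no supporting context.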
```python
# 💬 Step 7: Define the RAG query functions
model = genai.GenerativeModel("gemini-2.5-flash")  # create the model once and reuse it

def retrieve(query, k=3):
    """Return the k chunks whose embeddings are closest to the query's embedding."""
    q_embed = genai.embed_content(model=embed_model, content=query)["embedding"]
    k = min(k, index.ntotal)  # FAISS pads missing results with id -1 when k exceeds the index size
    _, idx = index.search(np.array([q_embed], dtype="float32"), k)
    return [chunks[i] for i in idx[0] if i != -1]

def ask_study_buddy(query):
    docs = retrieve(query)
    context = "\n\n".join(docs)
    prompt = (
        "You are Study Buddy, a helpful assistant for learning.\n"
        "Use the context below to answer the question concisely and clearly.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    response = model.generate_content(prompt)
    return response.text

# 🧪 Step 8: Try asking a question
question = "Explain set theory like I'm ten years old."
print(f"🤔 Q: {question}\n")
print("💡 A:", ask_study_buddy(question))
```

🤔 Q: Explain set theory like I'm ten years old.

💡 A: Imagine a **set** as a well-organized collection or group of things, like a special box where you put certain items. You know exactly what belongs in that box and what doesn't!

For example:
*   The set of all students in your class.
*   The set of all the numbers you count with (like 1, 2, 3, and so on).
*   The set whose only member is you!

The things *inside* the set are called its **members** or **elements**. If something, let's say 'x', is an element of a set 'A', we write it like this: x ∈ A.

Some important sets have special names:
*   **N** is the set of all natural numbers (1, 2, 3, etc.).
*   **Z** is the set of all integers (which includes 0, positive numbers like 1, 2, and negative numbers like -1, -2).

You can describe a set by listing its members, like `{1, 2, 3, 4, 5}`. For longer lists, you might use dots, like `{1, 2, 3, ..., n}`. Or, you can describe it with a rule, like "the set of all even natural numbe