# 📘 RAG-Based AI Tutor Project

---

## ✅ Final Goal

Build a Retrieval-Augmented Generation (RAG)-based AI Tutor Chatbot that can:

1. 📚 Let users select from a list of preloaded PDF textbooks and ask questions about them.
2. 📝 Allow users to upload their own notes/texts and receive:
   - Summaries
   - Explanations
   - Definitions
   - Custom query-based responses

---

## 🔧 Jupyter Notebook Development Plan

---

### 🔹 Day 1: Setup & PDF Ingestion

- 📁 Create project structure:
  - `project/`
    - `pdfs/`
    - `data/`
    - `chunks/`
    - `notebook.ipynb`

- ✅ Install & import necessary libraries:
  - `PyMuPDF` (`fitz`)
  - `langchain`
  - `sentence-transformers`
  - `faiss-cpu`
  - `openai` or `transformers` (depending on LLM)

- 📄 Write a function to:
  - Loop through all PDFs in `pdfs/`
  - Extract and clean text using `PyMuPDF`

- ✂️ Chunk the text:
  - Use LangChain’s `CharacterTextSplitter` or a custom sliding window
  - Store chunks as `.json` or `.txt` files in `chunks/`
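
The ingestion and chunking steps above could be sketched roughly as follows. This takes the custom sliding-window route rather than LangChain's splitter. The function names, the 1000/200 character window sizes, and the JSON record layout are illustrative assumptions, not fixed parts of the plan.

```python
import json
import re
from pathlib import Path


def clean_text(raw: str) -> str:
    """Re-join words hyphenated across line breaks and collapse whitespace."""
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)  # "exam-\nple" -> "example"
    return re.sub(r"\s+", " ", text).strip()


def extract_pdf_text(pdf_path: Path) -> str:
    """Extract and clean the text of one PDF using PyMuPDF."""
    import fitz  # PyMuPDF; imported lazily so the pure helpers work without it

    with fitz.open(str(pdf_path)) as doc:
        pages = [page.get_text() for page in doc]
    return clean_text("\n".join(pages))


def sliding_window_chunks(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into character windows of `size` that overlap by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once the previous window has already reached the end of the text.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def ingest_pdfs(pdf_dir: str = "pdfs", out_dir: str = "chunks") -> None:
    """Write one JSON file of chunk records per PDF found in `pdf_dir`."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
        chunks = sliding_window_chunks(extract_pdf_text(pdf_path))
        # Assumed record layout: source file name, running id, chunk text.
        records = [
            {"source": pdf_path.name, "chunk_id": i, "text": chunk}
            for i, chunk in enumerate(chunks)
        ]
        (out / f"{pdf_path.stem}.json").write_text(
            json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8"
        )
```

Calling `ingest_pdfs()` from the notebook would then populate `chunks/` with one file per textbook. Keeping the source file name in each record makes it possible to restrict retrieval to a single book later.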

---

### 🔹 Day 2: Embedding and Vector Store

- 📌 Load all chunked documents
- 🧹 Clean/normalize text (e.g., collapse extra whitespace, strip stray special characters)
- 🧠 Generate embeddings using:
  - `all-MiniLM-L6-v2` from `sentence-transformers`

- 💾 Store embeddings in FAISS vector index
- ✅ Save the index to disk using `faiss.write_index()`
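
A minimal sketch of the Day 2 steps, assuming the chunk texts are already loaded into a Python list. The helper names and the `data/textbooks.faiss` path are made up for illustration. The flat inner-product index over normalized vectors (cosine similarity) is one reasonable choice here, not something the plan prescribes.

```python
import re
import unicodedata
from pathlib import Path


def normalize_text(text: str) -> str:
    """Normalize Unicode forms, drop control characters, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    text = "".join(ch for ch in text if ch.isprintable() or ch.isspace())
    return re.sub(r"\s+", " ", text).strip()


def build_and_save_index(texts, index_path="data/textbooks.faiss",
                         model_name="all-MiniLM-L6-v2"):
    """Embed `texts` and store the vectors in a flat FAISS index on disk."""
    # Heavy dependencies are imported lazily so normalize_text works alone.
    import faiss
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    cleaned = [normalize_text(t) for t in texts]
    # Unit-length vectors make the inner product equal to cosine similarity.
    vectors = model.encode(cleaned, normalize_embeddings=True, convert_to_numpy=True)
    vectors = np.asarray(vectors, dtype="float32")  # FAISS expects float32

    index = faiss.IndexFlatIP(vectors.shape[1])
    index.add(vectors)

    Path(index_path).parent.mkdir(parents=True, exist_ok=True)
    faiss.write_index(index, index_path)
    return index
```

Row `i` of the index corresponds to `texts[i]`, so the chunk list should be saved in the same order as the vectors were added.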

---

### 🔹 Day 3: RAG Pipeline with PDF QA

- 🤖 Choose LLM:
  - Use a `transformers` pipeline (e.g., `mistralai/Mistral-7B-Instruct-v0.2`)
  - Or the OpenAI API for GPT-3.5 / GPT-4 (if available)

- 🔄 Retrieval flow:
  1. User question → embed question
  2. FAISS similarity search → retrieve top-K relevant chunks
  3. Concatenate retrieved chunks with prompt template
  4. Pass to LLM and return the answer

- ✅ Test with real textbook questions
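
The four-step retrieval flow above might look like the sketch below. It assumes `model` is a loaded `SentenceTransformer`, `index` is a FAISS index whose rows line up with `chunk_texts`, and `generate` is any callable that sends a prompt to the chosen LLM and returns its reply. All of these names, and the prompt wording, are placeholders.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Step 3: concatenate the retrieved chunks with a simple prompt template."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context is not sufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def retrieve(question, model, index, chunk_texts, k=4):
    """Steps 1-2: embed the question and fetch the top-k most similar chunks."""
    import numpy as np

    query = model.encode([question], normalize_embeddings=True, convert_to_numpy=True)
    _scores, ids = index.search(np.asarray(query, dtype="float32"), k)
    # FAISS pads the result with -1 when the index holds fewer than k vectors.
    return [chunk_texts[i] for i in ids[0] if i != -1]


def answer(question, model, index, chunk_texts, generate, k=4):
    """Step 4: pass the assembled prompt to the LLM and return its reply."""
    chunks = retrieve(question, model, index, chunk_texts, k)
    return generate(build_prompt(question, chunks))
```

Passing the LLM in as a plain callable keeps the pipeline the same whether the backend is a local `transformers` pipeline or a hosted API.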

---

### 🔹 Day 4: Student Interface via Notebook

- 💬 Ask user:
  > "Do you want to use a textbook or upload your own notes?"

- 📚 If "Textbook":
  - Show available PDFs from `pdfs/`
  - Let user pick one and ask questions

- 📤 If "Upload":
  - Add a file upload UI using `ipywidgets` (Jupyter widgets)
  - Process uploaded file → chunk → embed → store → retrieve → answer
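
A few small helpers could back this interface, sketched here with hypothetical names. `read_uploads` assumes the two `FileUpload.value` shapes used by ipywidgets: a tuple of dicts with `name` and `content` keys in 8.x, and a dict keyed by file name in 7.x. It is worth checking this against the installed version.

```python
from pathlib import Path


def list_textbooks(pdf_dir: str = "pdfs") -> list[str]:
    """Return the PDF file names available for the 'Textbook' option."""
    return sorted(p.name for p in Path(pdf_dir).glob("*.pdf"))


def parse_mode(answer: str) -> str:
    """Map the user's free-text reply to 'textbook' or 'upload'."""
    choice = answer.strip().lower()
    if choice in {"t", "textbook"}:
        return "textbook"
    if choice in {"u", "upload"}:
        return "upload"
    raise ValueError(f"Expected 'textbook' or 'upload', got {answer!r}")


def read_uploads(value) -> dict[str, bytes]:
    """Return {file name: raw bytes} from a FileUpload widget's `.value`."""
    if isinstance(value, dict):
        # Assumed ipywidgets 7.x shape: {name: {"metadata": ..., "content": bytes}}
        return {name: bytes(item["content"]) for name, item in value.items()}
    # Assumed ipywidgets 8.x shape: ({"name": ..., "content": memoryview, ...}, ...)
    return {item["name"]: bytes(item["content"]) for item in value}
```

In a notebook cell, something like `mode = parse_mode(input("Textbook or upload? "))` selects the branch. For uploads, an `ipywidgets.FileUpload(accept=".txt,.pdf")` widget can be displayed and its `.value` passed to `read_uploads` once the user has chosen a file.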

---

### 🔹 Day 5: Add Extra Features (Optional)

- 🔍 Add chapter-wise summary feature using LLM
- 🧾 Implement “explain this paragraph” with copy/paste input
- ❓ Build a simple MCQ generator:
  - Use prompt templates like:
    > "Generate a multiple choice question from this paragraph: [...]"
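
One way to turn the MCQ prompt template above into reusable code is sketched below. The instructions about option labels and the trailing `Answer:` line are additions assumed here so that the reply is easy to parse. They are not part of the original template.

```python
import re

MCQ_TEMPLATE = (
    "Generate a multiple choice question from this paragraph:\n\n"
    "{paragraph}\n\n"
    "Give exactly {n_options} options labelled A, B, C, ... and finish with "
    "a line of the form 'Answer: <letter>'."
)


def mcq_prompt(paragraph: str, n_options: int = 4) -> str:
    """Fill the template with the paragraph and the desired number of options."""
    return MCQ_TEMPLATE.format(paragraph=paragraph.strip(), n_options=n_options)


def parse_answer_letter(reply: str):
    """Pull the correct option letter out of the LLM reply, or None if absent."""
    match = re.search(r"^Answer:\s*([A-Z])\b", reply, flags=re.MULTILINE)
    return match.group(1) if match else None
```

The same pattern (a template string plus a small formatter) would also cover the chapter summary and "explain this paragraph" features.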

---

💡 Tip: Save intermediate results (chunks, embeddings, indexes) so you don’t have to recompute everything each time!
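
A small compute-or-load helper is one way to follow this tip for anything that serializes to JSON, such as chunk lists. `cached_json` is a hypothetical name, and the sketch assumes the cached value is JSON-serializable.

```python
import json
from pathlib import Path


def cached_json(path, compute):
    """Load the result from `path` if it exists; otherwise compute and save it."""
    cache = Path(path)
    if cache.exists():
        return json.loads(cache.read_text(encoding="utf-8"))
    result = compute()
    cache.parent.mkdir(parents=True, exist_ok=True)
    cache.write_text(json.dumps(result, ensure_ascii=False), encoding="utf-8")
    return result
```

For the FAISS index the same idea applies with `faiss.read_index()` and `faiss.write_index()` in place of the JSON calls.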
