A personal project I built to explore how RAG (Retrieval-Augmented Generation) pipelines work under the hood. The idea was simple — I wanted to be able to chat with my own PDF documents without sending everything to OpenAI and paying per token.
Upload any PDF through the UI or API, and DocuBrain will extract the text, chunk it intelligently, generate vector embeddings locally, and let you ask natural language questions about the document content.
I was curious about how tools like ChatPDF actually work internally. Instead of following a tutorial, I tried to build it from scratch — figuring out chunking strategies, why overlapping chunks matter, and how vector similarity search actually retrieves the right context. The biggest challenge was getting the embedding + retrieval pipeline to feel responsive without burning API credits.
- Backend: Node.js, Express.js
- Frontend: Vanilla JS/CSS (Dark Mode UI served from Express)
- File Handling: Multer, pdf-parse
- Embeddings: Xenova Transformers (all-MiniLM-L6-v2) — runs locally on CPU, zero API cost
- Vector Storage: MongoDB Atlas Vector Search
- Text Splitting: LangChain RecursiveCharacterTextSplitter
- LLM: Groq API (Llama-3.3-70B)
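The splitter in the stack above produces fixed-size chunks with a small overlap so that sentences cut at a boundary still appear whole in at least one chunk. As a rough sketch of the idea (a plain character-level version, not the actual LangChain `RecursiveCharacterTextSplitter`, which also prefers splitting on paragraph and sentence boundaries):

```javascript
// Simplified character-window chunker illustrating the 500/50 settings
// used in this project. Purely illustrative; the real splitter is
// LangChain's RecursiveCharacterTextSplitter.
function chunkText(text, size = 500, overlap = 50) {
  const chunks = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break;
    // step forward by (size - overlap) so each chunk repeats the
    // last `overlap` characters of the previous one
    start += size - overlap;
  }
  return chunks;
}
```

With the defaults, a 1,000-character document yields three chunks, and chunk 2 begins with the last 50 characters of chunk 1.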
- Upload a PDF via the UI or the `/upload` endpoint.
- Text is extracted and split into overlapping chunks (500 chars, 50 char overlap).
- Each chunk is embedded locally using Xenova — no external API needed.
- Embeddings are stored in MongoDB Atlas with a Vector Search index enabled.
- On chat, the top 3 most relevant chunks are retrieved via cosine similarity.
- Retrieved context is passed to Groq's Llama-3.3-70B model to generate the final answer.
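The retrieval step in the flow above boils down to cosine similarity between the query embedding and each stored chunk embedding. A minimal in-memory sketch of that ranking (Atlas Vector Search does the equivalent server-side; the `text`/`embedding` field names here are illustrative):

```javascript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Score every chunk against the query vector and keep the top k —
// the same "top 3 most relevant chunks" step the pipeline performs.
function topK(queryVec, chunks, k = 3) {
  return chunks
    .map(c => ({ ...c, score: cosineSimilarity(queryVec, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```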
DocuBrain includes a built-in Dark Mode UI served directly from the Express backend, but can also be consumed as a REST API.
POST /upload
- Form Data: `pdfFile` (PDF file)
- Returns: Total chunks created and a preview of the extracted text.
POST /chat
- Body: `{ "query": "your question here" }`
- Returns: AI-generated answer grounded in the retrieved document context, plus the source chunks.
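The two endpoints above can be called from any HTTP client. A hedged sketch using Node 18+'s built-in `fetch`/`FormData` (the `DOCUBRAIN_URL` variable and function names are assumptions for illustration, not part of the repo):

```javascript
// Minimal client sketch for the /upload and /chat endpoints.
// baseUrl is assumed; the real server defaults to http://localhost:5000.
const baseUrl = process.env.DOCUBRAIN_URL;

// Body builder for POST /chat, matching { "query": "..." }.
function buildChatBody(query) {
  return JSON.stringify({ query });
}

// POST /upload with the PDF under the "pdfFile" form field.
async function uploadPdf(pdfBuffer, filename = "doc.pdf") {
  const form = new FormData();
  form.append("pdfFile", new Blob([pdfBuffer], { type: "application/pdf" }), filename);
  const res = await fetch(`${baseUrl}/upload`, { method: "POST", body: form });
  return res.json();
}

// POST /chat with a JSON body.
async function chat(query) {
  const res = await fetch(`${baseUrl}/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: buildChatBody(query),
  });
  return res.json();
}

// Only fire a request when a server URL is actually configured.
if (baseUrl) {
  chat("What is this document about?").then(console.log);
}
```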
- Why chunk overlap matters for context preservation across splits
- How cosine similarity search works in practice with MongoDB Atlas
- The tradeoff between local embeddings (slow but free) vs API embeddings (fast but costly)
- How to keep LLM responses grounded using retrieved context instead of model memory
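On the Atlas side, the similarity search from the takeaways above runs as a `$vectorSearch` aggregation stage. A sketch of what that pipeline looks like (the index name `vector_index` and the `embedding`/`text` field names are assumptions, not taken from the repo; cosine similarity is configured on the index itself, not per query):

```javascript
// Builds an Atlas Vector Search aggregation pipeline for a query embedding.
// Index and field names are illustrative assumptions.
function buildVectorSearchPipeline(queryVector, k = 3) {
  return [
    {
      $vectorSearch: {
        index: "vector_index",   // assumed Atlas Vector Search index name
        path: "embedding",       // field holding the 384-dim MiniLM vector
        queryVector,             // embedding of the user's question
        numCandidates: 100,      // candidates considered before final ranking
        limit: k,                // top-k chunks to return
      },
    },
    {
      $project: {
        text: 1,
        score: { $meta: "vectorSearchScore" },
      },
    },
  ];
}
```

The resulting array would be passed to `collection.aggregate(...)`, and the `score` field exposes each chunk's similarity for debugging retrieval quality.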
git clone https://github.com/P-Suraj/docubrain.git
cd docubrain
npm install
# Create a .env file in the root directory
# Add: GROQ_API_KEY=your_key and MONGO_URI=your_mongodb_atlas_connection_string
npm run dev # Starts the server with nodemon
# Open http://localhost:5000 in your browser to view the UI

Built by Suraj — 2nd year CSE @ Amrita Vishwa Vidyapeetham