| title | RAG Chatbot |
|---|---|
| emoji | 🤖 |
| colorFrom | blue |
| colorTo | purple |
| sdk | docker |
| app_port | 7860 |
| pinned | false |
A question-answering system that lets you upload documents and ask questions about them. The system retrieves the relevant passages from your documents and generates answers grounded in that content.
1. Upload File (PDF/DOCX/TXT)
↓
2. Extract Text
↓
3. Split into Chunks (512 tokens each)
↓
4. Convert to Embeddings (384D vectors)
↓
5. Store in Vector Database (Qdrant)
↓
6. Save Metadata in MongoDB
What happens: Your document is broken into small chunks, each chunk is converted into a numerical vector that captures its meaning, and the vectors are stored in Qdrant for fast searching. Document metadata is saved in MongoDB.
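The chunking step above can be sketched as follows. This is a minimal illustration, not the project's actual splitter: it assumes whitespace-separated words as a stand-in for tokens and uses no overlap between chunks. `split_into_chunks` is a hypothetical name.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 512) -> List[str]:
    """Split text into chunks of at most `chunk_size` whitespace tokens.

    Assumption: words approximate tokens. A real tokenizer would count
    differently, so actual chunk boundaries may not match.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    # Step through the word list in fixed-size windows.
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]
```

Each returned chunk would then be passed to the embedder and stored as one vector.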
1. Type Your Question
↓
2. Check Cache (answered before?)
↓
3. Search Documents (if RAG is ON)
   - BM25: Find keyword matches
   - Vector: Find similar meanings
↓
4. Rerank Results (pick top 5 most relevant)
↓
5. Build Context from Chunks
↓
6. Generate Answer with LLM
↓
7. Stream Response to You
What happens: The system searches for relevant chunks from your documents, combines them as context, and uses an AI model to generate an answer based on that context.
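The query flow above could be wired together roughly like this. It is a sketch under stated assumptions: `answer_question` and its callable parameters are hypothetical stand-ins for the real components, the cache is a plain dictionary, and streaming is omitted for brevity.

```python
from typing import Callable, Dict, List, Tuple


def answer_question(
    question: str,
    rag_enabled: bool,
    cache: Dict[Tuple[bool, str], str],
    search: Callable[[str], List[str]],
    rerank: Callable[[str, List[str]], List[str]],
    generate: Callable[[str, str], str],
    top_k: int = 5,
) -> str:
    """Run the query steps in order: cache, search, rerank, context, generate."""
    key = (rag_enabled, question)
    if key in cache:  # step 2: answered before?
        return cache[key]

    context = ""
    if rag_enabled:
        candidates = search(question)                    # step 3: hybrid search
        best = rerank(question, candidates)[:top_k]      # step 4: keep top 5
        context = "\n\n".join(best)                      # step 5: build context

    answer = generate(question, context)                 # step 6: call the LLM
    cache[key] = answer
    return answer
```

Passing the components in as callables keeps the sketch testable without a running Qdrant, Redis, or Groq backend.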
DocumentProcessor - Main coordinator for document uploads
- Validates file type and size
- Calls the right loader for PDF, DOCX, or TXT files
- Manages the entire processing pipeline
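The validation step might look something like the sketch below. The accepted extensions come from the list above; the 10 MB limit and the name `validate_upload` are assumptions for illustration, since the actual limit is not stated here.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt"}
MAX_FILE_BYTES = 10 * 1024 * 1024  # assumed limit; the real value may differ


def validate_upload(filename: str, size_bytes: int) -> str:
    """Return the normalized extension, or raise ValueError if the file is rejected."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {ext or 'none'}")
    if size_bytes <= 0:
        raise ValueError("File is empty")
    if size_bytes > MAX_FILE_BYTES:
        raise ValueError(f"File exceeds {MAX_FILE_BYTES} bytes")
    return ext
```

The returned extension can then be used to pick the PDF, DOCX, or TXT loader.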
Embedder - Converts text to vectors
- Uses FastEmbed with BAAI/bge-small-en-v1.5 model
- Generates 384-dimensional vectors for semantic search
- Each chunk becomes a searchable vector
Qdrant Vector Store - Stores embeddings
- Fast similarity search across millions of vectors
- Returns most relevant chunks for any query
- Handles all vector operations
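To make "similarity search" concrete, here is a brute-force version in plain Python. It only illustrates the idea of ranking stored vectors by cosine similarity to a query vector; Qdrant itself uses an approximate index (HNSW) rather than scanning every vector, and the function names here are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(
    query: Sequence[float],
    vectors: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and return the k best."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

In the real system the vectors would have 384 dimensions; the same function works for any length as long as query and stored vectors match.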
HybridRetriever - Finds relevant information
- BM25: Traditional keyword search (good for exact matches)
- Vector Search: Semantic search (understands meaning)
- Combines both for better results
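How the two result lists are combined is not specified above. One common approach is reciprocal rank fusion (RRF), sketched below as an assumption rather than as the project's actual method. `reciprocal_rank_fusion` and the constant `k = 60` are illustrative choices.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of chunk IDs into one.

    Each list contributes 1 / (k + rank) to a chunk's score, so chunks that
    rank well in both BM25 and vector search rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by ID for a deterministic order.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))
```

For example, `reciprocal_rank_fusion([bm25_ids, vector_ids])` would take the ranked chunk IDs from each retriever and return a single merged ranking.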
Reranker - Improves search quality
- Uses FlashRank model to score relevance
- Filters the best 5 chunks from 20 candidates
- Ensures only the most relevant context is used
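The "best 5 of 20" selection can be sketched independently of the scoring model. In the snippet below, `word_overlap` is a toy scorer standing in for FlashRank (it is not FlashRank's API), and `rerank_top_k` is a hypothetical helper name.

```python
from typing import Callable, List


def word_overlap(query: str, chunk: str) -> float:
    """Toy relevance score: number of distinct words shared by query and chunk."""
    return float(len(set(query.lower().split()) & set(chunk.lower().split())))


def rerank_top_k(
    query: str,
    candidates: List[str],
    score: Callable[[str, str], float] = word_overlap,
    top_k: int = 5,
) -> List[str]:
    """Score each candidate against the query and keep the top_k highest."""
    scored = [(score(query, chunk), chunk) for chunk in candidates]
    # Stable sort: candidates with equal scores keep their retrieval order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]
```

Swapping `score` for a call into a real reranking model leaves the selection logic unchanged.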
Generator - Creates answers
- Uses Groq LLM (llama-3.1-70b)
- Streams responses in real-time
- Bases answers on retrieved context when RAG is ON
- Uses general knowledge when RAG is OFF
Semantic Cache - Speeds up responses
- Remembers previous questions and answers
- Returns the cached response if the same question is asked again
- Separate caches for RAG ON vs RAG OFF
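A minimal sketch of keeping separate caches per RAG mode is shown below. It is an in-memory stand-in, assuming questions match after lowercasing and whitespace normalization; the real cache lives in Redis and may match questions by embedding similarity instead. `ModeAwareCache` is a hypothetical name.

```python
from typing import Dict, Optional


class ModeAwareCache:
    """Answer cache with one independent store for RAG ON and one for RAG OFF."""

    def __init__(self) -> None:
        self._stores: Dict[bool, Dict[str, str]] = {True: {}, False: {}}

    @staticmethod
    def _normalize(question: str) -> str:
        # Assumption: case and extra whitespace do not change the question.
        return " ".join(question.lower().split())

    def get(self, question: str, rag_enabled: bool) -> Optional[str]:
        return self._stores[rag_enabled].get(self._normalize(question))

    def put(self, question: str, rag_enabled: bool, answer: str) -> None:
        self._stores[rag_enabled][self._normalize(question)] = answer
```

Keeping the stores apart prevents a general-knowledge answer from being returned when the user expects one grounded in their documents, and vice versa.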
Conversation Memory - Remembers chat history
- Stores last 10 messages in Redis
- Enables follow-up questions
- Each session has independent history
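The "last 10 messages per session" behavior can be illustrated with a bounded queue. This sketch keeps everything in process memory instead of Redis, and the class and method names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List

MAX_MESSAGES = 10  # matches the history length described above


class ConversationMemory:
    """Per-session chat history that silently drops the oldest messages."""

    def __init__(self, max_messages: int = MAX_MESSAGES) -> None:
        self._sessions: Dict[str, Deque[dict]] = defaultdict(
            lambda: deque(maxlen=max_messages)
        )

    def add(self, session_id: str, role: str, content: str) -> None:
        # deque(maxlen=...) evicts the oldest entry once the limit is reached.
        self._sessions[session_id].append({"role": role, "content": content})

    def history(self, session_id: str) -> List[dict]:
        return list(self._sessions[session_id])
```

The history returned for a session would be prepended to the prompt so the model can resolve follow-up questions.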
MongoDB - Document metadata
- Tracks uploaded documents
- Stores file info, upload time, chunk count
- Links to vectors in Qdrant
Redis - Fast caching
- Stores conversation history
- Caches LLM responses
- In-memory for instant access
- LangChain 0.3.13 - RAG framework
- Groq API - Fast LLM (llama-3.1-70b)
- FastEmbed - Embedding generation
- FlashRank - Result reranking
- Qdrant - Vector database
- MongoDB - Document storage
- Redis - Caching layer
- FastAPI - Web framework
# Clone and install
git clone https://github.com/Abeshith/RAG.git
cd RAG
pip install -r requirements.txt
Create .env file:
GROQ_API_KEY=your_groq_key
MONGODB_URI=your_mongodb_uri
REDIS_URL=your_redis_url
QDRANT_URL=your_qdrant_url
QDRANT_API_KEY=your_qdrant_key
JWT_SECRET_KEY=your_secret_key
Start the server:
uvicorn app.main:app --host 0.0.0.0 --port 7860
Open: http://localhost:7860
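If the server fails at startup, a quick check that every variable from the .env file is present can save some debugging. The helper below is a hypothetical sketch and not part of the repository; it assumes the variables have already been loaded into the process environment.

```python
import os
from typing import List, Mapping

REQUIRED_VARS = [
    "GROQ_API_KEY",
    "MONGODB_URI",
    "REDIS_URL",
    "QDRANT_URL",
    "QDRANT_API_KEY",
    "JWT_SECRET_KEY",
]


def missing_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All required settings are present.")
```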
- Upload Documents: Click upload, select PDF/DOCX/TXT file
- Ask Questions: Type question in chat box
- Toggle RAG:
  - ON = answers from your documents
  - OFF = general knowledge answers
- View Sources: See which document chunks were used
GET /health/ - Check system status
POST /chat/stream - Send question, get streaming answer
POST /documents/upload - Upload new document
GET /documents/ - List all documents
GET /documents/stats - Get document statistics
DELETE /documents/{id} - Delete specific document
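As a rough illustration of calling the chat endpoint from Python, the sketch below builds a request with the standard library. The JSON field names `question` and `rag_enabled` are assumptions; the actual request schema is not documented here, and the endpoint may also require an authorization header.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:7860"


def build_chat_request(
    question: str,
    rag_enabled: bool = True,
    base_url: str = BASE_URL,
) -> Request:
    """Build (but do not send) a POST request for /chat/stream."""
    # Field names are hypothetical; check the API schema for the real ones.
    body = json.dumps({"question": question, "rag_enabled": rag_enabled})
    return Request(
        url=f"{base_url}/chat/stream",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the server running, `urllib.request.urlopen(build_chat_request("..."))` would send the request, and the streamed response could be read from the returned object in pieces.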
docker build -t rag-chatbot .
docker run -p 7860:7860 --env-file .env rag-chatbot