A Retrieval-Augmented Generation (RAG) system for querying your documents with local or cloud-based language models. It supports both Ollama (local) and Google AI (cloud) backends for generating answers grounded in your indexed documents.
- Document Indexing: Scrape and index web pages with configurable crawl depth
- File Upload: Upload and index PDF, TXT, and Markdown files directly
- Hybrid Search: Combines vector similarity (60%) with keyword matching (40%) for better results
- Conversational Memory: Chat with your documents with follow-up question support
- Multiple Model Support:
- Local models via Ollama (qwen3:4b, llama3.1:8b, mistral:7b, gemma2:9b, phi3:medium, qwen2.5:7b)
- Cloud models via Google AI (Gemini 2.5 Flash)
- Modern Chat Interface: Ask questions in a natural conversation flow
- Real-time Scraping Progress: Live progress bar, ETA, and logs during web scraping
- Markdown Rendering: Answers and search results display with proper formatting
- Dark Mode: Full dark theme support
- Glassmorphism UI: Modern, premium design with smooth animations
- Toast Notifications: Elegant feedback for all actions
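The 60/40 hybrid weighting described above can be sketched as a simple score combination. The function names, the naive keyword metric, and the assumption that both inputs are normalized to [0, 1] are illustrative; the actual implementation in app.py may differ:

```python
def hybrid_score(vector_sim: float, keyword_score: float) -> float:
    """Blend vector similarity (60%) with keyword matching (40%).

    Both inputs are assumed normalized to [0, 1]; the weights mirror
    the feature description, not the exact app.py code.
    """
    return 0.6 * vector_sim + 0.4 * keyword_score


def keyword_overlap(query: str, chunk: str) -> float:
    """Naive keyword score: fraction of query terms found in the chunk."""
    terms = set(query.lower().split())
    if not terms:
        return 0.0
    text = chunk.lower()
    return sum(1 for t in terms if t in text) / len(terms)
```

A chunk that matches on both signals will always outrank one that matches on keywords alone, which is the point of the blend.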
- Python 3.8+
- Ollama installed and running (for local models)
- Google AI API key (for Gemini 2.5 Flash)
- Clone the repository:
  git clone https://github.com/yourusername/rag-system.git
  cd rag-system
- Install dependencies:
  pip install -r requirements.txt
- Start the application:
  python app.py
- Open your browser and navigate to http://localhost:5000
Web Scraping:
- Go to "Add Source" → "Web Scraper" tab
- Enter a URL, optional name, crawl depth, and max pages
- Click "Start Scraping" and watch real-time progress
- Content is automatically chunked and indexed
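The automatic chunking step above can be illustrated with a minimal overlapping-window splitter. The chunk size and overlap values are illustrative defaults, not necessarily what app.py uses:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows.

    Overlap keeps sentences that straddle a boundary retrievable from
    both neighboring chunks. Sizes here are illustrative defaults.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():
            chunks.append(chunk)
    return chunks
```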
File Upload:
- Go to "Add Source" → "File Upload" tab
- Drag & drop or click to upload PDF, TXT, or Markdown files
- Files are automatically processed and indexed
Semantic Search:
- Go to the "Search" section
- Enter your query and see results ranked by hybrid score (vector + keyword)
Chat with Documents:
- Use the Dashboard chat interface
- Ask questions and get AI-generated answers with source citations
- Ask follow-up questions; the system remembers conversation context
- Click "New Chat" to start a fresh conversation
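The conversational memory described above can be modeled as a rolling message list trimmed to the most recent turns before each model call. The class and the turn limit below are illustrative, not app.py's actual structures:

```python
class Conversation:
    """Rolling chat history; keeps only the last `max_turns` exchanges."""

    def __init__(self, max_turns: int = 5):
        self.max_turns = max_turns
        self.messages: list[dict] = []

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        # One turn = one user message + one assistant reply.
        self.messages = self.messages[-2 * self.max_turns:]

    def reset(self) -> None:
        """Equivalent to clicking "New Chat" in the UI."""
        self.messages.clear()
```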
Model Configuration:
- Choose between Ollama (local) and Google AI (cloud)
- For Google AI, enter your API key
- Test the connection before saving
Create a .env file in the project root:
GOOGLE_API_KEY=your_google_ai_api_key
EMBEDDING_MODEL=google/embeddinggemma-300m
LLM_MODEL=qwen3:4b

rag-system/
├── app.py # Main Flask application
├── requirements.txt # Python dependencies
├── static/
│ ├── css/
│ │ └── style.css # Custom styles
│ └── js/
│ └── main.js # Frontend logic
├── templates/
│ └── index.html # Main application UI
└── data/
└── vectors.db # SQLite database with vector storage
| Endpoint | Method | Description |
|---|---|---|
| /api/scrape | POST | Scrape a URL with SSE progress streaming |
| /api/upload | POST | Upload and index a file (PDF, TXT, MD) |
| /api/search | POST | Hybrid search (vector + keyword) |
| /api/answer | POST | Generate an answer with conversational context |
| /api/stats | GET | Get document statistics |
| /api/delete-source | POST | Remove a source and its chunks |
| /api/test-model | POST | Test model connection |
- Flask - Web framework
- sentence-transformers - Text embeddings
- sqlite-vec - Vector storage in SQLite
- ollama - Local LLM interface
- google-generativeai - Gemini API
- beautifulsoup4 - Web scraping
- PyMuPDF - PDF text extraction
- marked.js - Markdown rendering (frontend)
- Ollama for providing easy access to local LLMs
- Google AI for the Gemini models
- Sentence Transformers for text embeddings
- sqlite-vec for efficient vector storage