A complete, local-first Retrieval-Augmented Generation (RAG) system. This project allows you to chat with your documents privately using local LLMs (via Ollama) or optional cloud-based models (via Groq).
- Local & Private: Runs entirely on your machine using Ollama for inference and local embeddings.
- Modern UI: Responsive React frontend with streaming responses and citation support.
- Easy Document Management: Upload, view, and delete knowledge sources (PDFs) via the UI.
- Vector Search: Powered by LlamaIndex for efficient retrieval.
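Streaming responses in the UI mean tokens are consumed incrementally rather than as one final payload. As an illustration of the idea (assuming SSE-style `data:` framing, which may differ from this project's actual wire format), a minimal parser:

```python
def parse_sse(raw: str) -> list[str]:
    """Extract the payload of each `data:` line from an SSE-style stream.

    Hypothetical helper for illustration only; the project's actual
    streaming format may differ.
    """
    tokens = []
    for line in raw.splitlines():
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload and payload != "[DONE]":
                tokens.append(payload)
    return tokens

# Example: two streamed chunks followed by a terminator sentinel
stream = "data: Hel\n\ndata: lo!\n\ndata: [DONE]\n\n"
print("".join(parse_sse(stream)))  # prints "Hello!"
```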
The project is divided into two main components:
- Backend: FastAPI service using LlamaIndex to handle ingestion, indexing, and query processing.
- Frontend: React + Vite application providing the user interface.
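At query time, retrieval boils down to embedding the question and ranking stored document chunks by similarity. LlamaIndex handles this for the backend; the following dependency-free sketch shows only the underlying idea, using toy 3-dimensional vectors in place of real embedding-model output:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k chunks most similar to the query vector."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy "embeddings" standing in for real model output
chunks = [
    ("Ollama runs models locally.",   [0.9, 0.1, 0.0]),
    ("Vite serves the frontend.",     [0.0, 0.2, 0.9]),
    ("FastAPI exposes the REST API.", [0.1, 0.9, 0.1]),
]
print(top_k([1.0, 0.0, 0.0], chunks, k=1))  # most similar to the Ollama chunk
```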
Navigate to the backend folder and start the server:

```bash
cd backend

# Install dependencies
uv sync

# Make sure you have the Ollama model pulled
ollama pull qwen2.5:7b

# Start the API server
uv run fastapi dev main.py
```

The backend API will be available at http://localhost:8000.
Open a new terminal, navigate to the frontend folder, and start the UI:

```bash
cd frontend

# Install dependencies
bun install  # or npm install

# Start the development server
bun run dev  # or npm run dev
```

The frontend will typically run at http://localhost:5173.
- Open your browser to the frontend URL (e.g., http://localhost:5173).
- Use the Knowledge Base section to upload a PDF document.
- Click "Rebuild Index" to process the documents.
- Start chatting! The system will use your documents to answer questions.
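"Rebuild Index" typically means each uploaded document is split into overlapping chunks before being embedded and stored. A minimal sketch of that splitting step (the sizes here are illustrative, not the project's actual settings):

```python
def chunk_text(text: str, size: int = 20, overlap: int = 5) -> list[str]:
    """Split text into fixed-size character windows with overlap,
    so context spanning a chunk boundary still appears whole in one chunk.

    Illustrative only; real pipelines usually split on tokens or sentences.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks

doc = "Retrieval-Augmented Generation grounds answers in your own documents."
for chunk in chunk_text(doc):
    print(repr(chunk))
```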
```
.
├── backend/              # Python FastAPI application
│   ├── main.py           # Entry point
│   ├── src/              # Application source code
│   └── pyproject.toml    # Python dependencies
├── frontend/             # React application
│   ├── src/              # Frontend source code
│   ├── components/       # UI components
│   └── package.json      # Node dependencies
└── README.md             # This file
```
- Backend: Python, FastAPI, LlamaIndex, uv
- Frontend: React 19, TypeScript, Vite, Tailwind CSS, Bun
- AI/ML: Ollama (Local LLM), HuggingFace (Embeddings)

