A document Q&A application using RAG (retrieval-augmented generation). Upload documents, ask questions, and get answers grounded in the uploaded document content.
Built without LangChain or similar frameworks. All RAG components are implemented from scratch with direct API calls to demonstrate understanding of the core concepts:
- Custom document chunking logic
- Direct integration with Sentence Transformers for embeddings
- Manual vector similarity search with pgvector
- Direct LLM API calls (OpenAI, Ollama)
- Custom streaming response handling
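As a rough illustration (not the project's actual code), the custom chunking logic can be sketched with the chunk size and overlap values quoted later in this README (~1200 characters per chunk, 150-character overlap):

```python
def chunk_text(text: str, chunk_size: int = 1200, overlap: int = 150) -> list[str]:
    """Split text into fixed-size chunks, each sharing `overlap` chars with the next."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Advance by less than a full chunk so neighbouring chunks overlap.
        start += chunk_size - overlap
    return chunks
```

The overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk.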
- Switch between OpenAI (gpt-4o-mini) and Ollama (qwen2.5:7b) models
- Upload/delete documents (PDF, DOCX, TXT)
- Ask questions about uploaded documents
- Clear chat history
- Light/dark theme toggle
- Source citations for AI responses
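Model switching works because both backends expose a chat-completions-style HTTP endpoint that the app calls directly. A hedged sketch of how the request targets might be built (the function name is illustrative; the endpoint paths are the public OpenAI and Ollama ones, and Ollama listens on port 11434 by default):

```python
def build_request(provider: str, prompt: str) -> tuple[str, dict]:
    """Return (url, JSON payload) for a direct chat-completion call."""
    if provider == "openai":
        return ("https://api.openai.com/v1/chat/completions", {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": prompt}],
            "stream": True,
        })
    if provider == "ollama":
        # Ollama's local server exposes a compatible chat endpoint.
        return ("http://localhost:11434/api/chat", {
            "model": "qwen2.5:7b",
            "messages": [{"role": "user", "content": prompt}],
            "stream": True,
        })
    raise ValueError(f"unknown provider: {provider}")
```

Because both payloads share the same `messages`/`stream` shape, the rest of the pipeline stays provider-agnostic.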
Backend:
- FastAPI
- PostgreSQL + pgvector
- Sentence Transformers (embeddings)
- SQLAlchemy
Frontend:
- Vanilla JavaScript
- Server-Sent Events (SSE streaming responses)
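SSE frames are plain text: each event is a `data:` line terminated by a blank line. A minimal sketch of how the backend might wrap a token stream into SSE frames (with FastAPI, a generator like this would be passed to `StreamingResponse` with `media_type="text/event-stream"`; the `[DONE]` sentinel is an assumption, not necessarily what this app sends):

```python
from typing import Iterable, Iterator

def sse_frames(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap each token in an SSE 'data:' frame; a blank line ends each event."""
    for token in tokens:
        yield f"data: {token}\n\n"
    # Hypothetical end-of-stream sentinel so the client knows when to stop.
    yield "data: [DONE]\n\n"
```

On the vanilla-JS side, an `EventSource` (or a streamed `fetch`) receives one `message` event per frame.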
AI:
- Ollama (local LLM)
- OpenAI API (optional)
Prerequisites:
- Docker & Docker Compose
- 8GB RAM (for Ollama)
- OpenAI API key (optional, only for the OpenAI backend)
Install:
# Clone repo
git clone https://github.com/yourusername/rag-chatbot.git
cd rag-chatbot
# Create .env file
cp .env.example .env
# Edit .env and add your OpenAI API key
.env configuration:
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres
POSTGRES_DB=doc_rag_db
POSTGRES_HOST=db
POSTGRES_PORT=5432
OPENAI_MODEL=gpt-4o-mini
OPENAI_API_KEY=your-openai-api-key-here
Start services:
docker-compose build
docker-compose up
Access:
- UI: http://localhost:8000
- API docs: http://localhost:8000/docs
Note: First startup takes 10-15 minutes while Ollama downloads the model (~2.8GB)
Usage:
- Go to the Documents tab
- Upload your files
- Switch to Chat tab
- Ask questions
- View answers with source citations
View logs:
docker-compose logs -f api
Access database:
docker-compose exec db psql -U postgres -d doc_rag_db
How it works:
- Documents are split into chunks (~1200 chars, 150-char overlap)
- Each chunk converted to 384-dimensional vector
- Vectors stored in PostgreSQL with pgvector
- User question embedded and similar chunks retrieved
- LLM generates answer using retrieved chunks
- Response streamed back in real-time
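The retrieval step above relies on pgvector's distance operators. A hedged sketch of the SQL the app might issue (the `chunks` table and `embedding vector(384)` column names are assumptions; `<=>` is pgvector's cosine-distance operator, so ascending order returns the most similar chunks first):

```python
def build_similarity_query(top_k: int = 5) -> str:
    """SQL for nearest-chunk retrieval with pgvector (illustrative schema)."""
    return (
        "SELECT content, embedding <=> %(query_vec)s AS distance "
        "FROM chunks "
        "ORDER BY distance "
        f"LIMIT {int(top_k)}"
    )
```

At query time the user's question is embedded into a 384-dimensional vector and bound as `query_vec`; the returned chunks are then stuffed into the LLM prompt and cited back as sources.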