RAG chat over PDF/DOCX documents. The user uploads a file, the system extracts the text, indexes it in ChromaDB and answers questions through Groq using the most relevant excerpts as context.
Access online demo: https://demo.mydoc.outleep.com.br
| Screen | Description |
|---|---|
| Chat with the LLM | Main conversation interface with the model |
| Upload for analysis | Screen used to upload files and start the chat analysis |
- One chat per document — every conversation starts with a `.pdf` or `.docx` upload; the text is split into 4000-char chunks and stored in the vector database.
- RAG via Groq — every user message triggers a similarity search and the top 4 chunks are injected as context for the LLM.
- 10-message memory kept in Redis (rolling window) plus a pinned document summary generated when the chat is created.
- Per-visitor isolation — backend-issued HttpOnly `visitor_id` cookie (UUID v4); chats, messages and embeddings are scoped to the browser that owns them.
- 12-hour TTL — chats are auto-purged from SQLite, Chroma and Redis. A yellow banner at the top warns the user.
- Per-visitor rate limit — 5 uploads/min and 30 messages/min via Redis (fixed 60-second window, returns `429` with `Retry-After`).
- Basic CRUD — list chats, list messages (paginated), send message, delete chat (cascading purge: SQLite → Chroma → Redis).
- Retro UI — old-school OS window in soft purple/yellow/pink with beveled borders; pixelated rotating-square loader for "AI thinking" and "analyzing document" states.
| Layer | Tech | Role |
|---|---|---|
| API | FastAPI + Uvicorn | Routing, dependencies, middleware |
| LLM | Groq (llama-3.3-70b-versatile) | Replies + initial summarization |
| Vector store | ChromaDB (HTTP) | Similarity search |
| Embeddings | sentence-transformers | all-MiniLM-L6-v2 (runs locally) |
| Chat memory | Redis | Rolling list + pinned context |
| Metadata | SQLite + SQLAlchemy + Alembic | chats table indexed by visitor |
| Parsing | pypdf, python-docx | Text extraction |
| Frontend | Vue 3 + Vite | SPA with retro theme |
| HTTP client | Axios (withCredentials) | Sends the visitor cookie |
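The similarity search that ChromaDB performs over the stored chunk embeddings boils down to ranking vectors by similarity to the query embedding. A pure-Python sketch of that step, with hypothetical helper names and cosine similarity standing in for Chroma's distance metric:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, chunks, k=4):
    """Return the k chunk texts most similar to the query embedding.
    `chunks` is a list of (text, embedding) pairs, as produced by
    embedding each 4000-char chunk with all-MiniLM-L6-v2."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

In the real pipeline the top 4 texts returned here are what gets injected into the Groq prompt as context.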
All routes live under `/api`.
| Method | Path | Description |
|---|---|---|
| GET | /api/health | Liveness probe |
| GET | /api/chats | List the visitor's chats (sweeps expired ones) |
| POST | /api/chats | Create chat from upload (multipart/form) |
| GET | /api/chats/{id}/messages | List messages (limit, offset) |
| POST | /api/chats/{id}/messages | Send a new message (RAG + Groq) |
| DELETE | /api/chats/{id} | Delete chat + embeddings + history |
Interactive docs: http://localhost:8000/docs.
```
.
├── alembic/              # SQLite migrations
├── compose.yaml          # chroma + redis + api
├── Dockerfile.api        # API image (Python 3.11 + uv)
├── package.json          # frontend (kept at root to ease Docker)
├── pyproject.toml        # backend
├── scripts/
│   ├── run_backend.sh
│   ├── run_frontend.sh
│   ├── run_full_stack_app.sh
│   └── docker_api_entrypoint.sh
└── src/
    ├── backend/
    │   ├── api/                   # routers, schemas, deps, middleware
    │   ├── databases/             # engine, models, repos (sqlite + redis)
    │   ├── integrations/
    │   │   ├── llm_providers/     # Groq adapter (interface + impl)
    │   │   └── vector_providers/  # Chroma adapter
    │   ├── services/              # chat_service, document_parser, rate_limiter
    │   ├── app.py                 # build_app() + lifespan
    │   ├── main.py                # uvicorn entrypoint
    │   └── setup.py               # config from .env
    └── frontend/
        ├── components/    # AppShell, ChatWindow, NewChatForm, ...
        ├── composables/   # useApi, useChats, useMessages
        ├── styles/theme.scss
        └── vite.config.js
```
- Docker + Docker Compose
- Node 18+ and uv (only if running outside Docker)
- A Groq token in `GROQ_TOKEN`
```sh
cp .env.example .env
# edit .env and set GROQ_TOKEN
docker compose up -d --build
docker compose logs -f api   # follow boot + migrations
```

API available at http://localhost:8000.
For the frontend:

```sh
npm install
npm run dev   # http://localhost:5173
```

To run outside Docker:

```sh
docker compose up -d chroma redis   # infra only
uv sync
mkdir -p data && uv run alembic upgrade head
./scripts/run_full_stack_app.sh
```

To build the frontend for production:

```sh
npm run build     # outputs dist/frontend/
npm run preview   # serves dist/ locally
```

`VITE_API_BASE_URL` controls the API URL baked into the bundle (defaults to `/api`).
Every variable is documented in .env.example. Highlights:
- `GROQ_TOKEN`, `GROQ_MODEL`
- `CHROMA_HOST`, `CHROMA_PORT`, `CHROMA_COLLECTION`
- `REDIS_HOST`, `REDIS_PORT`
- `SQLITE_PATH`
- `CHUNK_SIZE` (4000), `MEMORY_LIMIT` (10), `CHAT_TTL_SECONDS` (43200)
- `COOKIE_NAME`, `COOKIE_SECURE`, `COOKIE_SAMESITE`, `COOKIE_MAX_AGE`
- `RATE_LIMIT_UPLOADS_PER_MIN` (5), `RATE_LIMIT_MESSAGES_PER_MIN` (30)
- `CORS_ORIGINS`
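A minimal sketch of how the numeric settings above might be read with their documented defaults — the `int_setting` helper is illustrative, not the project's actual `setup.py` code:

```python
import os

def int_setting(name: str, default: int) -> int:
    """Read an integer setting from the environment, falling back to a default."""
    raw = os.environ.get(name)
    return int(raw) if raw not in (None, "") else default

# Defaults taken from the list above.
CHUNK_SIZE = int_setting("CHUNK_SIZE", 4000)
MEMORY_LIMIT = int_setting("MEMORY_LIMIT", 10)
CHAT_TTL_SECONDS = int_setting("CHAT_TTL_SECONDS", 43200)  # 12 hours
```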
- The TTL is fixed from creation — it is not refreshed by activity. This is a deliberate choice so the "12 hours" promise stays predictable.
- Expired-chat cleanup is lazy: it runs on `GET /api/chats` and on individual message calls. No cron, no worker.
- The embedding model is downloaded by sentence-transformers on first run and cached in the `api_hf_cache` volume.


