A dark-first React dashboard paired with a FastAPI scaffold for OCR-driven dialogue capture, relay-model cleanup, and local persona-based TTS playback.
The app includes:
- snipping-tool-style drawable capture zones
- multi-display zone targeting
- per-game profile management
- character vault with editable archetypes
- mood and traits controls
- relay AI settings for local models and API keys
- live OCR preview, relay preview, and TTS status indicators
The browser client connects to `ws://localhost:8000/ws` by default; set `VITE_OVERLAY_SOCKET_URL` to override it.
For REST settings and profile calls, the default backend base URL is `http://localhost:8000` and can be overridden with `VITE_BACKEND_HTTP_URL`.
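The override logic is just "environment variable if set, else default". A sketch of that resolution in Python (the frontend does the equivalent via `import.meta.env`; the function name here is illustrative):

```python
import os

DEFAULT_SOCKET_URL = "ws://localhost:8000/ws"
DEFAULT_HTTP_URL = "http://localhost:8000"

def resolve_backend_urls(env=os.environ):
    """Return (socket_url, http_url), honoring the VITE_* overrides when present."""
    return (
        env.get("VITE_OVERLAY_SOCKET_URL", DEFAULT_SOCKET_URL),
        env.get("VITE_BACKEND_HTTP_URL", DEFAULT_HTTP_URL),
    )
```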
The backend/ folder contains a FastAPI service with:
- `GET /health` for service and dependency status
- `GET /screens` for multi-monitor capture metadata
- `GET /profiles`, `POST /profiles`, `DELETE /profiles/{name}` for profile storage
- `GET /settings`, `PUT /settings` for local-model and API-key configuration
- `POST /tts` for local TTS requests
- `WS /ws` for bidirectional zone updates, OCR text events, relay events, status updates, and audio bytes
The speech pipeline is now:
- OCR extracts nameplate + dialog text
- a relay AI cleans up OCR mistakes and rewrites the line into a more natural, speakable sentence
- the TTS engine receives that cleaned line along with persona prompt context
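The three stages above compose into a simple chain. A sketch with stub stages (all function names and payloads are illustrative, not the scaffold's API):

```python
from dataclasses import dataclass

@dataclass
class Line:
    speaker: str
    text: str

def ocr_stage(frame) -> Line:
    # A real implementation would run OCR on the captured zone;
    # this stub returns text with typical OCR confusions baked in.
    return Line(speaker="Aria", text="He1lo traveler , welcome t0 town")

def relay_stage(line: Line) -> Line:
    # Stand-in for the relay model: fix common 0/o and 1/l confusions
    # and stray spacing. A real relay would do this with a language model.
    fixed = line.text.replace("0", "o").replace("1", "l").replace(" ,", ",")
    return Line(speaker=line.speaker, text=fixed)

def tts_stage(line: Line, persona_prompt: str) -> bytes:
    # Stand-in for the TTS engine: would return synthesized audio bytes.
    return f"{persona_prompt}|{line.speaker}|{line.text}".encode()

def pipeline(frame, persona_prompt: str) -> bytes:
    return tts_stage(relay_stage(ocr_stage(frame)), persona_prompt)
```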
Relay providers currently supported:
- `heuristic`: fully local fallback cleanup without any external model
- `openai-compatible`: works with local or remote chat-completions endpoints, including local OpenAI-compatible model servers
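Provider selection can be a small dispatch. In this sketch the heuristic path is a real (if tiny) cleanup, while the OpenAI-compatible path is only stubbed; names are illustrative rather than the scaffold's actual API:

```python
import re

def heuristic_cleanup(text: str) -> str:
    """Local fallback: strip stray OCR artifacts and collapse whitespace."""
    text = re.sub(r"[|~_]+", "", text)        # characters OCR rarely means literally
    text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
    return text

def relay_clean(text: str, provider: str = "heuristic") -> str:
    if provider == "heuristic":
        return heuristic_cleanup(text)
    if provider == "openai-compatible":
        # Would POST to the configured chat-completions endpoint here.
        raise NotImplementedError("requires a configured endpoint")
    raise ValueError(f"unknown relay provider: {provider}")
```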
The TTS engine remains local Qwen3-TTS in the current scaffold, with saved settings for model name/path.
Install the packages listed in backend/requirements.txt:
- fastapi
- uvicorn[standard]
- websockets
- pydantic
- numpy
- mss
- paddleocr
- qwen3_tts
- Create a Python environment for the `backend/` folder.
- Install the requirements from `backend/requirements.txt`.
- Start the FastAPI app with Uvicorn, targeting `backend.main:app` on port `8000`.
If `mss`, PaddleOCR, Qwen3-TTS, or your relay-model endpoint is not ready yet, the scaffold still starts and falls back to demo OCR text, heuristic relay cleanup, and silent WAV output, so the UI flow can be tested safely.
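That fallback behavior amounts to guarded imports. A sketch of the pattern (the demo text is invented for illustration, and the OCR calls follow PaddleOCR's classic `ocr()` result layout, which may differ by version):

```python
# Guarded import: run real OCR when paddleocr is present, demo text otherwise.
try:
    from paddleocr import PaddleOCR  # heavy optional dependency
    HAVE_OCR = True
except ImportError:
    HAVE_OCR = False

DEMO_TEXT = "Demo NPC: This is placeholder dialogue."

def read_zone_text(image) -> str:
    if not HAVE_OCR:
        return DEMO_TEXT  # keeps the UI flow testable without OCR installed
    ocr = PaddleOCR(lang="en")
    result = ocr.ocr(image)
    # Classic PaddleOCR layout: per-image list of [box, (text, confidence)].
    return " ".join(line[1][0] for block in result for line in block)
```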
- Screen capture uses `mss` only.
- No memory reads or injected hooks are used in the backend scaffold.
- Zones are stored as percentages and converted to pixels per selected monitor at runtime.
- API keys are stored in `backend/settings.json` and are not returned to the frontend once saved; the UI only receives whether a key is already configured.
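The percentage-to-pixel conversion for zones is a pure scaling step. A sketch (field names are assumed, not the scaffold's actual schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ZonePercent:
    left: float   # all values in [0, 100]
    top: float
    width: float
    height: float

def zone_to_pixels(zone: ZonePercent, mon_w: int, mon_h: int) -> tuple:
    """Convert a percentage zone to a pixel rect on the selected monitor."""
    return (
        round(zone.left / 100 * mon_w),
        round(zone.top / 100 * mon_h),
        round(zone.width / 100 * mon_w),
        round(zone.height / 100 * mon_h),
    )
```

Storing percentages keeps a zone valid when the user switches between monitors of different resolutions.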
Archetypes can be added in two ways:
- through the Character Vault UI in the frontend
- by editing `backend/profiles.json` and adding a new `{ id, name, basePrompt }` entry inside a profile
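A sketch of what such entries look like once loaded, plus an exact-nameplate lookup (the surrounding profile structure and helper name are illustrative; only the `{ id, name, basePrompt }` shape comes from the scaffold):

```python
# Hypothetical loaded profile; real data comes from backend/profiles.json.
profile = {
    "game": "DemoQuest",
    "archetypes": [
        {"id": "gruff-smith", "name": "Borin", "basePrompt": "A gruff blacksmith."},
        {"id": "sly-merchant", "name": "Vela", "basePrompt": "A sly merchant."},
    ],
}

def find_archetype(profile: dict, nameplate: str):
    """Exact nameplate match; returns None when no archetype fits."""
    for arch in profile["archetypes"]:
        if arch["name"] == nameplate:
            return arch
    return None
```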
Implemented now:
- dark mode default theme
- multi-display overlay dashboard UI
- profile and archetype management
- relay AI settings with local model/API key support
- WebSocket client helper
- backend FastAPI scaffold with OCR -> relay -> TTS flow
- Vitest coverage for zone math and profile serialization
Still recommended next:
- tune PaddleOCR preprocessing for your target games
- connect the saved Qwen3-TTS model path directly into your exact local loader contract
- add alias-based fuzzy persona matching beyond exact nameplate matches
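For the alias-based fuzzy matching, Python's standard-library `difflib` is one low-cost starting point (the alias table and cutoff here are illustrative):

```python
from difflib import get_close_matches

def fuzzy_persona(nameplate: str, aliases: dict, cutoff: float = 0.75):
    """Map an OCR'd nameplate to a persona id, tolerating small OCR errors.

    aliases maps every known spelling or alias to a persona id.
    """
    hits = get_close_matches(nameplate, aliases.keys(), n=1, cutoff=cutoff)
    return aliases[hits[0]] if hits else None
```

Tuning the cutoff per game matters: too low and unrelated NPCs collide, too high and common OCR substitutions (1/l, 0/O) fall through to no match.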