VoxLens

A dark-first React dashboard paired with a FastAPI scaffold for OCR-driven dialogue capture, relay-model cleanup, and local persona-based TTS playback.

Frontend

The app includes:

  • snipping-tool-style drawable capture zones
  • multi-display zone targeting
  • per-game profile management
  • character vault with editable archetypes
  • mood and traits controls
  • relay AI settings for local models and API keys
  • live OCR preview, relay preview, and TTS status indicators

The browser client connects to ws://localhost:8000/ws by default. To override that, set VITE_OVERLAY_SOCKET_URL. For REST settings and profile calls, the default backend base URL is http://localhost:8000 and can be overridden with VITE_BACKEND_HTTP_URL.
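
As a sketch, the two overrides above could be set in a Vite `.env.local` file (the values shown are just the documented defaults; adjust them for a remote backend):

```
VITE_OVERLAY_SOCKET_URL=ws://localhost:8000/ws
VITE_BACKEND_HTTP_URL=http://localhost:8000
```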

Backend scaffold

The backend/ folder contains a FastAPI service with:

  • GET /health for service and dependency status
  • GET /screens for multi-monitor capture metadata
  • GET /profiles, POST /profiles, DELETE /profiles/{name} for profile storage
  • GET /settings, PUT /settings for local-model and API-key configuration
  • POST /tts for local TTS requests
  • WS /ws for bidirectional zone updates, OCR text events, relay events, status updates, and audio bytes
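
The /ws channel multiplexes several event kinds. As an illustrative sketch only (the scaffold's actual message schema is not documented here, so the `type`/`payload` envelope and field names below are assumptions), a zone-update event might be serialized like this:

```python
import json

# Hypothetical WS message envelope; the "type"/"payload" field names are
# assumptions, not the scaffold's actual schema.
def make_zone_update(zone_id: str, left_pct: float, top_pct: float,
                     width_pct: float, height_pct: float) -> str:
    """Serialize a zone-update event for the /ws channel."""
    return json.dumps({
        "type": "zone_update",
        "payload": {
            "id": zone_id,
            # zones are stored as percentages of the selected monitor
            "left": left_pct, "top": top_pct,
            "width": width_pct, "height": height_pct,
        },
    })

msg = make_zone_update("dialogue", 10.0, 75.0, 80.0, 20.0)
```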

Dual AI flow

The speech pipeline is now:

  1. OCR extracts nameplate + dialogue text
  2. a relay AI cleans up OCR mistakes and rewrites the line into a more natural, speakable sentence
  3. the TTS engine receives the cleaned line along with persona prompt context
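
As a sketch of step 2's fully local fallback, a heuristic cleanup pass might normalize common OCR artifacts before speech. This function is illustrative only, not the scaffold's actual implementation:

```python
import re

# Illustrative heuristic cleanup; the scaffold's actual rules may differ.
OCR_CHAR_FIXES = {
    "|": "I",  # vertical bar commonly misread for a capital I
}

def heuristic_clean(raw: str) -> str:
    """Normalize OCR'd dialogue text before handing it to TTS."""
    text = raw
    for bad, good in OCR_CHAR_FIXES.items():
        text = text.replace(bad, good)
    # drop short bracketed UI tokens such as [A] button prompts
    text = re.sub(r"\[[^\]]{1,3}\]", "", text)
    # collapse runs of whitespace left over from layout-aware OCR
    return re.sub(r"\s+", " ", text).strip()
```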

Relay providers currently supported:

  • heuristic — fully local fallback cleanup without any external model
  • openai-compatible — works with local or remote chat-completions endpoints, including local OpenAI-compatible model servers
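
For the openai-compatible provider, the relay request follows the standard chat-completions body shape. A minimal sketch (the model name and system prompt wording are placeholders, not values the scaffold ships with):

```python
import json

# Builds a chat-completions request body for an OpenAI-compatible relay
# endpoint (local or remote). Model name and prompt wording are placeholders.
def build_relay_request(ocr_line: str, model: str = "local-model") -> dict:
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Rewrite noisy OCR game dialogue into one clean, "
                        "natural, speakable sentence. Output only the line."},
            {"role": "user", "content": ocr_line},
        ],
        "temperature": 0.2,
    }

body = json.dumps(build_relay_request("Wh0 g0es there ?"))
```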

The TTS engine remains local Qwen3-TTS in the current scaffold, with saved settings for model name/path.

Python packages

Install the packages listed in backend/requirements.txt:

  • fastapi
  • uvicorn[standard]
  • websockets
  • pydantic
  • numpy
  • mss
  • paddleocr
  • qwen3_tts

Running the backend

  1. Create a Python environment for the backend/ folder.
  2. Install the requirements from backend/requirements.txt.
  3. Start the FastAPI app with Uvicorn, targeting backend.main:app on port 8000.
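
Concretely, steps 1-3 might look like the following (the venv name is just an example; on Windows, activate with .venv\Scripts\activate instead):

```shell
python -m venv .venv
source .venv/bin/activate
pip install -r backend/requirements.txt
uvicorn backend.main:app --host 127.0.0.1 --port 8000
```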

If mss, PaddleOCR, Qwen3-TTS, or your relay model endpoint is not ready yet, the scaffold still starts: it falls back to demo OCR text, heuristic relay cleanup, and silent WAV output so the UI flow can be tested safely.

Notes on capture and safety

  • Screen capture uses mss only.
  • No memory reads or injected hooks are used in the backend scaffold.
  • Zones are stored as percentages and converted to pixels per selected monitor at runtime.
  • API keys are stored in backend/settings.json and are not returned to the frontend once saved; the UI only receives whether a key is already configured.
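
The percentage-to-pixel conversion described above can be sketched as follows. The monitor dict mirrors the geometry shape mss reports ("left", "top", "width", "height"); the zone field names are assumptions, not the scaffold's exact schema:

```python
# Convert a zone stored as monitor-relative percentages into pixel
# coordinates on the selected monitor. The monitor dict mirrors the shape
# mss reports ("left", "top", "width", "height"); the zone field names
# are illustrative, not the scaffold's exact schema.
def zone_to_pixels(zone: dict, monitor: dict) -> dict:
    return {
        "left": monitor["left"] + round(monitor["width"] * zone["left"] / 100),
        "top": monitor["top"] + round(monitor["height"] * zone["top"] / 100),
        "width": round(monitor["width"] * zone["width"] / 100),
        "height": round(monitor["height"] * zone["height"] / 100),
    }

px = zone_to_pixels(
    {"left": 10.0, "top": 75.0, "width": 80.0, "height": 20.0},
    {"left": 0, "top": 0, "width": 1920, "height": 1080},
)
```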

Adding archetypes

Archetypes can be added in two ways:

  • through the Character Vault UI in the frontend
  • by editing backend/profiles.json and adding a new { id, name, basePrompt } entry inside a profile
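
A hand-added entry following the { id, name, basePrompt } shape could look like the fragment below (the keys come from the text above; the values are illustrative, and the surrounding profile structure is assumed):

```json
{
  "id": "gruff-guard",
  "name": "Gruff Guard",
  "basePrompt": "Speak in a low, weary, no-nonsense soldier's voice."
}
```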

Current status

Implemented now:

  • dark mode default theme
  • multi-display overlay dashboard UI
  • profile and archetype management
  • relay AI settings with local model/API key support
  • WebSocket client helper
  • backend FastAPI scaffold with OCR -> relay -> TTS flow
  • Vitest coverage for zone math and profile serialization

Still recommended next:

  • tune PaddleOCR preprocessing for your target games
  • connect the saved Qwen3-TTS model path directly into your exact local loader contract
  • add alias-based fuzzy persona matching beyond exact nameplate matches
