v4.0.0 — Full MLX LLM stack
Summary
The LLM and vision OCR now run in-process via mlx-lm + mlx-vlm on Apple Silicon, putting them under the same MLX framework as CSM-1B TTS. Ollama has been removed, so no external daemon is needed.
What changed
- `OllamaConfig` → `LLMConfig`; config key `ollama:` → `llm:` (old key auto-migrated)
- `ollama` dependency replaced by `mlx-lm` + `mlx-vlm`
- +25–30% generation speed compared with the Ollama backend
- Model discovery now scans the Hugging Face (HF) cache instead of querying the Ollama API
- Default model: `mlx-community/Qwen3.5-9B-4bit`
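The notes say the old `ollama:` key is auto-migrated to `llm:` but don't show how. A minimal sketch of one way to do it, assuming the config has already been parsed into a dict (e.g. from YAML). `migrate_llm_config` is a hypothetical helper, and the rule that an existing `llm:` section wins over a leftover `ollama:` section is an assumption.

```python
from typing import Any, Dict


def migrate_llm_config(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a parsed config with a legacy `ollama:` section renamed to `llm:`.

    Hypothetical helper; the project's real migration code is not shown in these notes.
    """
    migrated = dict(cfg)  # shallow copy so the caller's mapping is left untouched
    legacy = migrated.pop("ollama", None)
    # Assumption: if both keys are present, the new `llm:` section wins
    # and the legacy section is simply dropped.
    if legacy is not None and "llm" not in migrated:
        migrated["llm"] = legacy
    return migrated


# Usage with a hypothetical pre-4.0 config (the `temperature` field is illustrative only).
old_cfg = {"ollama": {"temperature": 0.3}}
print(migrate_llm_config(old_cfg))
# → {'llm': {'temperature': 0.3}}
```

Returning a copy rather than mutating in place keeps the original config available, for example to decide whether the migrated file should be written back to disk.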
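Model discovery now reads the HF cache instead of the Ollama API. The sketch below shows one plausible way to do that, assuming the standard `huggingface_hub` cache layout, where each downloaded model lives in a `models--{org}--{name}/snapshots/<revision>/` directory. `default_hf_cache` and `discover_cached_models` are hypothetical names, and the cache-path resolution is deliberately simplified. If `huggingface_hub` is already a dependency, its `scan_cache_dir()` returns the same information.

```python
import os
from pathlib import Path
from typing import List, Optional


def default_hf_cache() -> Path:
    """Resolve the Hugging Face hub cache directory (simplified).

    Honours HF_HUB_CACHE, then HF_HOME/hub, then ~/.cache/huggingface/hub.
    Unlike huggingface_hub, it ignores XDG_CACHE_HOME and legacy variables.
    """
    if os.environ.get("HF_HUB_CACHE"):
        return Path(os.environ["HF_HUB_CACHE"])
    hf_home = os.environ.get("HF_HOME") or os.path.join(
        os.path.expanduser("~"), ".cache", "huggingface"
    )
    return Path(hf_home) / "hub"


def discover_cached_models(cache_dir: Path, org: Optional[str] = None) -> List[str]:
    """List model repo ids (e.g. 'mlx-community/Qwen3.5-9B-4bit') present in the cache."""
    if not cache_dir.is_dir():
        return []
    repo_ids = []
    for entry in cache_dir.iterdir():
        # Model repos are cached as 'models--{org}--{name}'; datasets and spaces
        # use 'datasets--' and 'spaces--' prefixes and are skipped.
        if not entry.is_dir() or not entry.name.startswith("models--"):
            continue
        # Only count repos with at least one downloaded snapshot.
        snapshots = entry / "snapshots"
        if not snapshots.is_dir() or not any(snapshots.iterdir()):
            continue
        repo_id = "/".join(entry.name.split("--")[1:])
        if org is None or repo_id.startswith(org + "/"):
            repo_ids.append(repo_id)
    return sorted(repo_ids)


# Usage: list MLX-converted models already downloaded on this machine.
print(discover_cached_models(default_hf_cache(), org="mlx-community"))
```

Filtering on the `mlx-community` org is only a heuristic for "MLX-ready" models; other orgs also publish MLX weights, so a real implementation might inspect the snapshot contents instead.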