EchoTwin v0.1.0
First public release.
What works today
- One-on-one voice conversation in a cloned voice, with sub-second mouth-to-ear latency (best 361 ms, typically 0.6–1.1 s measured live)
- Streaming ASR (sherpa-onnx zipformer) with speculative ASR/LLM execution, pre-opened TTS sockets, cached fillers
- Barge-in, tool calls (time/date/weather), hot-swappable personas with per-persona Fish Audio TTS tuning
- Per-turn cost ledger with daily/monthly budget caps
- Bilingual prompt system: persona-level `language: zh|en` switches every LLM-facing prompt
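
As a rough illustration of the per-turn cost ledger with daily/monthly caps, here is a minimal sketch. It is not EchoTwin's actual implementation: the `CostLedger` class, its field names, and the USD unit are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CostLedger:
    """Hypothetical ledger: one entry per conversation turn."""
    daily_cap_usd: float
    monthly_cap_usd: float
    entries: list = field(default_factory=list)  # (day, turn_id, usd)

    def record(self, day: date, turn_id: str, usd: float) -> None:
        # Append the cost of a finished turn.
        self.entries.append((day, turn_id, usd))

    def spent_on(self, day: date) -> float:
        # Total spend for one calendar day.
        return sum(usd for d, _, usd in self.entries if d == day)

    def spent_in_month(self, day: date) -> float:
        # Total spend for the calendar month containing `day`.
        return sum(
            usd for d, _, usd in self.entries
            if (d.year, d.month) == (day.year, day.month)
        )

    def can_spend(self, day: date, estimate_usd: float) -> bool:
        # A turn is allowed only if it fits under both caps.
        return (
            self.spent_on(day) + estimate_usd <= self.daily_cap_usd
            and self.spent_in_month(day) + estimate_usd <= self.monthly_cap_usd
        )
```

In this sketch the caller would check `can_spend` with an estimated cost before starting a turn, then call `record` with the real cost afterwards.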
Experimental
- Organic multi-party mode: three-layer addressee pipeline (table-lookup reflexes → LLM arbiter with room context → golden-set-tested heuristics)
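
The three-layer addressee pipeline can be pictured as a chain of fallbacks, each layer either deciding or deferring to the next. The sketch below is illustrative only: the function names, the `REFLEXES` table contents, the heuristic rule, and the arbiter signature are hypothetical, and the LLM arbiter is stubbed as an injectable callable rather than a real model call.

```python
from typing import Callable, Optional, Sequence

# Hypothetical reflex table: utterance prefix -> "is the bot addressed?"
REFLEXES = {
    "hey echo": True,
    "never mind": False,
}

# Hypothetical arbiter type: returns True/False, or None when unsure.
Arbiter = Callable[[str, Sequence[str]], Optional[bool]]


def reflex_layer(utterance: str) -> Optional[bool]:
    # Layer 1: cheap table lookup, no model call.
    text = utterance.lower().strip()
    for phrase, addressed in REFLEXES.items():
        if text.startswith(phrase):
            return addressed
    return None


def heuristic_layer(utterance: str, bot_spoke_last: bool) -> bool:
    # Layer 3: a made-up rule; a question right after the bot spoke
    # is treated as addressed to the bot.
    return bot_spoke_last and utterance.strip().endswith("?")


def is_addressed(
    utterance: str,
    room_context: Sequence[str],
    bot_spoke_last: bool,
    arbiter: Optional[Arbiter] = None,
) -> bool:
    verdict = reflex_layer(utterance)
    if verdict is not None:
        return verdict
    if arbiter is not None:
        # Layer 2: the arbiter sees recent room context.
        verdict = arbiter(utterance, room_context)
        if verdict is not None:
            return verdict
    return heuristic_layer(utterance, bot_spoke_last)
```

Ordering the layers from cheapest to most expensive keeps the model call off the fast path for obvious cases, which is one plausible reason for a design like this.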
See docs/SETUP.md to go from zero to a talking bot in ~15 minutes.