Releases: AlanY1an/echotwin
EchoTwin v0.1.1
What's new
English deployments actually hear English now. The persona language field auto-selects the matching streaming-ASR model: the Chinese-first bilingual zipformer for zh, the English zipformer for en. Previously, fully English speech was garbled by the bilingual model. Switching personas rebuilds the recognizers on the fly. Leaving asr.sherpa_stream.repo empty means auto-select; an explicitly set repo still wins.
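The selection rule above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `resolve_asr_repo` and both repo identifiers are hypothetical placeholders, since these notes do not name the real model repos.

```python
from typing import Optional

# Hypothetical placeholder repo names; the real identifiers are not given
# in these release notes.
DEFAULT_ASR_REPOS = {
    "zh": "example/streaming-zipformer-bilingual-zh-en",
    "en": "example/streaming-zipformer-en",
}


def resolve_asr_repo(persona_language: str, configured_repo: Optional[str] = "") -> str:
    """Pick the streaming-ASR repo for a persona.

    An explicit, non-empty configured repo always wins; an empty value
    means "auto", which maps the persona language to a default model.
    """
    if configured_repo and configured_repo.strip():
        return configured_repo.strip()
    try:
        return DEFAULT_ASR_REPOS[persona_language]
    except KeyError:
        raise ValueError(f"unsupported persona language: {persona_language!r}")
```

Under this sketch, a persona switch would call the resolver again and rebuild the recognizer only when the returned repo differs from the one currently loaded.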
Admin commands work in guild channels. /persona-admin, /voice-admin and /admin no longer force you into a DM. They remain owner-only, and replies are ephemeral. (Command contexts re-sync on bot restart; global propagation can take up to 1 hour.)
Smoother first run. scripts/download_models.sh now prefetches both streaming ASR models (zh + en, ~200 MB) and drops a dead ~900 MB prefetch. The quick start gained the missing git clone step, and CONTRIBUTING.md was added.
Full diff: v0.1.0...v0.1.1
EchoTwin v0.1.0
First public release.
What works today
- One-on-one voice conversation in a cloned voice with sub-second mouth-to-ear latency (best 361 ms, typically 0.6–1.1 s, measured live)
- Streaming ASR (sherpa-onnx zipformer) with speculative ASR/LLM execution, pre-opened TTS sockets, cached fillers
- Barge-in, tool calls (time/date/weather), hot-swappable personas with per-persona Fish Audio TTS tuning
- Per-turn cost ledger with daily/monthly budget caps
- Bilingual prompt system: persona-level language: zh|en switches every LLM-facing prompt
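For readers wondering what a per-turn cost ledger with daily/monthly caps might look like, here is a minimal sketch. The class and method names (`CostLedger`, `record_turn`, `can_spend`) are hypothetical and the USD unit is an assumption; the real ledger's schema is not described in these notes.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CostLedger:
    """Hypothetical ledger: accumulates per-turn costs and checks budget caps."""

    daily_cap_usd: float
    monthly_cap_usd: float
    _by_day: dict = field(default_factory=lambda: defaultdict(float))

    def record_turn(self, day: date, cost_usd: float) -> None:
        # Each conversational turn adds its cost to that day's total.
        self._by_day[day] += cost_usd

    def spent_on(self, day: date) -> float:
        return self._by_day.get(day, 0.0)

    def spent_in_month(self, day: date) -> float:
        return sum(
            cost
            for d, cost in self._by_day.items()
            if (d.year, d.month) == (day.year, day.month)
        )

    def can_spend(self, day: date, estimated_cost_usd: float = 0.0) -> bool:
        # A turn is allowed only if it fits under both the daily and monthly cap.
        return (
            self.spent_on(day) + estimated_cost_usd <= self.daily_cap_usd
            and self.spent_in_month(day) + estimated_cost_usd <= self.monthly_cap_usd
        )
```

A caller would check `can_spend` before starting a turn and call `record_turn` once the turn's actual cost is known.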
Experimental
- Organic multi-party mode: three-layer addressee pipeline (table-lookup reflexes → LLM arbiter with room context → golden-set-tested heuristics)
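One possible reading of the arrows above is a fall-through chain, where each layer either returns a verdict or defers to the next. The sketch below assumes that reading. The reflex table, the heuristic and all function names are hypothetical, and the LLM arbiter is passed in as a plain callable rather than a real model call.

```python
from typing import Callable, Optional, Sequence

# True = the utterance is addressed to the bot, False = it is not,
# None = this layer cannot decide and defers to the next one.
Verdict = Optional[bool]
Layer = Callable[[str, Sequence[str]], Verdict]

# Hypothetical reflex table: exact phrases with a fixed answer.
REFLEXES = {"hey bot": True, "never mind": False}


def reflex_layer(utterance: str, room: Sequence[str]) -> Verdict:
    return REFLEXES.get(utterance.strip().lower())


def heuristic_layer(utterance: str, room: Sequence[str]) -> Verdict:
    # Hypothetical fallback: with at most one human in the room,
    # assume the speaker is talking to the bot.
    return len(room) <= 1


def decide_addressee(utterance: str, room: Sequence[str], arbiter: Layer) -> bool:
    """Run the layers in order and return the first definite verdict."""
    for layer in (reflex_layer, arbiter, heuristic_layer):
        verdict = layer(utterance, room)
        if verdict is not None:
            return verdict
    return False
```

Keeping the arbiter as an injected callable makes the chain easy to exercise against a fixed set of labelled utterances without calling a model.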
See docs/SETUP.md to go from zero to a talking bot in ~15 minutes.