0.1.3

Saganaki22 released this 13 Jun 00:42


643386b

ZONOS2 TTS ComfyUI v0.1.3

Performance

Added optimized single-token MoE expert dispatch.
Autoregressive generation now runs only the selected expert instead of scanning all 16 experts.
Measured approximately 1.6x–2x faster token generation on an RTX 5090.
Generated audio tokens remain identical to the previous implementation.
Multi-token prompt processing retains the original grouped dispatch path.
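
The difference between the two dispatch paths can be illustrated with a small, self-contained sketch. This is not the project's code: the toy linear experts, the `route` helper, the hidden size, and all function names are assumptions made for illustration. Only the expert count (16) and the top-1/top-2 routing come from these notes. The grouped path visits every expert and processes the tokens routed to it, while the single-token path runs only the experts the router selected, and both yield the same output for one token.

```python
import math
import random

NUM_EXPERTS = 16  # expert count taken from the release notes
DIM = 4           # toy hidden size, chosen only for illustration


def make_expert(seed):
    """Build a toy expert: a fixed random DIM x DIM linear map (hypothetical)."""
    rng = random.Random(seed)
    w = [[rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(DIM)]
    return lambda x: [sum(w[i][j] * x[j] for j in range(DIM)) for i in range(DIM)]


EXPERTS = [make_expert(seed) for seed in range(NUM_EXPERTS)]


def route(logits, k):
    """Return {expert_index: gate_weight} for the k highest router logits."""
    chosen = sorted(range(len(logits)), key=lambda i: (-logits[i], i))[:k]
    peak = max(logits[i] for i in chosen)
    exps = {i: math.exp(logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: v / total for i, v in exps.items()}


def add_scaled(acc, y, gate):
    """Accumulate gate * y into acc, element by element."""
    return [a + gate * b for a, b in zip(acc, y)]


def grouped_dispatch(tokens, router_logits, k):
    """Reference path: visit every expert and process the tokens routed to it."""
    routes = [route(logits, k) for logits in router_logits]
    out = [[0.0] * DIM for _ in tokens]
    for e in range(NUM_EXPERTS):
        for t, x in enumerate(tokens):
            if e in routes[t]:
                out[t] = add_scaled(out[t], EXPERTS[e](x), routes[t][e])
    return out


def single_token_dispatch(token, logits, k):
    """Fast path for one token: run only the selected experts."""
    gates = route(logits, k)
    out = [0.0] * DIM
    for e in sorted(gates):  # same accumulation order as the grouped path
        out = add_scaled(out, EXPERTS[e](token), gates[e])
    return out


# Equivalence check for top-1 and top-2 routing on one random token.
rng = random.Random(0)
token = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]
logits = [rng.uniform(-1.0, 1.0) for _ in range(NUM_EXPERTS)]
for k in (1, 2):
    fast = single_token_dispatch(token, logits, k)
    reference = grouped_dispatch([token], [logits], k)[0]
    assert fast == reference
```

In this sketch the fast path iterates over the selected experts in index order so that floating-point accumulation matches the grouped path exactly. A real implementation may compare outputs at the level of sampled tokens instead.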

Voice Cloning

Changed clean_speaker_background default to false, matching upstream ZONOS2.
Improved reference-audio and accurate-mode tooltips.
Added clearer guidance about voice identity, accent, cadence, emotion, and prosody limitations.
Expanded troubleshooting recommendations for improving clone similarity and accent retention.

Documentation

Updated English and Chinese documentation.
Updated version badges to 0.1.3.
Documented the optimized MoE decoding path.
Added detailed reference-audio and sampling guidance.

Testing

Added top-1 and top-2 MoE dispatch equivalence tests.
Added regression tests for upstream-compatible clone defaults.
Verified that the optimized and original paths produce identical audio tokens on the production model.
All 12 automated tests pass.
