0.1.3

Saganaki22 released this 13 Jun 00:42

ZONOS2 TTS ComfyUI v0.1.3

Performance

  • Added optimized single-token MoE expert dispatch.
  • Autoregressive generation now runs only the selected expert instead of scanning all 16 experts.
  • Measured approximately 1.6x–2x faster token generation on an RTX 5090.
  • Generated audio tokens remain identical to the previous implementation.
  • Multi-token prompt processing retains the original grouped dispatch path.
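
As a rough illustration of the difference between the two dispatch paths described above, here is a minimal pure-Python sketch. It is not the project's implementation: the names `route`, `grouped_dispatch`, `single_token_dispatch`, and `make_expert` are hypothetical, the experts are toy scaling functions rather than neural network layers, and router scores are assumed to be positive. The grouped path visits every expert and processes whichever tokens were routed to it, while the single-token path runs only the experts the router selected for the one token being decoded.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def route(scores: Sequence[float], top_k: int) -> List[Tuple[int, float]]:
    """Return (expert_index, weight) pairs for the top_k highest router scores.

    Assumes positive scores; weights are renormalized over the selected experts.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    total = sum(scores[i] for i in ranked)
    # Sort by expert index so both dispatch paths accumulate in the same order.
    return [(i, scores[i] / total) for i in sorted(ranked)]


def grouped_dispatch(tokens: List[Vector], scores: List[Sequence[float]],
                     experts: List[Expert], top_k: int) -> List[Vector]:
    """Prompt-style path: scan all experts, processing the tokens routed to each."""
    routes = [dict(route(s, top_k)) for s in scores]
    outputs = [[0.0] * len(tok) for tok in tokens]
    for e, expert in enumerate(experts):  # visits every expert, selected or not
        for t, tok in enumerate(tokens):
            weight = routes[t].get(e)
            if weight is None:
                continue
            y = expert(tok)
            outputs[t] = [o + weight * v for o, v in zip(outputs[t], y)]
    return outputs


def single_token_dispatch(token: Vector, score: Sequence[float],
                          experts: List[Expert], top_k: int) -> Vector:
    """Decode-style path: run only the experts selected for this one token."""
    output = [0.0] * len(token)
    for e, weight in route(score, top_k):
        y = experts[e](token)
        output = [o + weight * v for o, v in zip(output, y)]
    return output


def make_expert(scale: float) -> Expert:
    """Toy expert that scales its input; stands in for a real expert layer."""
    return lambda x: [scale * v for v in x]


# 16 toy experts; the router strongly prefers experts 3 and 11 for this token.
experts = [make_expert(float(i + 1)) for i in range(16)]
token = [0.5, -1.0, 2.0]
score = [0.01] * 16
score[3], score[11] = 0.6, 0.3

# Both paths should agree exactly for top-1 and top-2 routing.
for k in (1, 2):
    fast = single_token_dispatch(token, score, experts, k)
    reference = grouped_dispatch([token], [score], experts, k)[0]
    assert fast == reference
```

In this sketch the two paths accumulate expert outputs in the same order, which is what makes the results bit-for-bit equal rather than merely close. A tensor-based implementation would need comparable care to keep generated tokens identical.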

Voice Cloning

  • Changed clean_speaker_background default to false, matching upstream ZONOS2.
  • Improved reference-audio and accurate-mode tooltips.
  • Added clearer guidance about voice identity, accent, cadence, emotion, and prosody limitations.
  • Expanded troubleshooting recommendations for improving clone similarity and accent retention.

Documentation

  • Updated English and Chinese documentation.
  • Updated version badges to 0.1.3.
  • Documented the optimized MoE decoding path.
  • Added detailed reference-audio and sampling guidance.

Testing

  • Added top-1 and top-2 MoE dispatch equivalence tests.
  • Added regression tests for upstream-compatible clone defaults.
  • Verified optimized and original paths produce identical production-model audio tokens.
  • All 12 automated tests pass.