Releases: Saganaki22/Zonos2_TTS-ComfyUI

0.1.7

14 Jun 16:20

Zonos2 TTS ComfyUI v0.1.7

Fixed

  • Corrected the mixed FP8 Hugging Face repository:
    • drbaph/ZONOS-FP8 → drbaph/ZONOS2-FP8
  • Fixed automatic FP8 model and asset downloads.
  • Updated the FP8 model-loader dropdown entry.
  • Corrected FP8 links in the English and Chinese READMEs.
  • Corrected the project metadata and Hugging Face badge.
  • Updated the example workflow JSON.
  • Updated the workflow embedded inside the example PNG without changing its rendered image.

Verification

  • 35 automated tests passing.
  • Workflow JSON and embedded PNG workflow metadata validated.

0.1.6

14 Jun 10:29

Zonos2 TTS ComfyUI v0.1.6

Fixed

  • Removed redundant ComfyUI memory-manager resume calls during voice cloning.
  • Removed the second full bundle resume during speaker embedding extraction.
  • Cached model-loader executions no longer unnecessarily resume the active bundle.
  • Fixed duplicate Loaded Zonos2Model, Zonos2DAC, and Zonos2SpeakerEncoder log messages.
  • Load messages now appear only when a model is newly registered.

Improved

  • Main model, DAC, and speaker encoder patchers are resumed through one batched ComfyUI memory-management call.
  • Preserved correct ComfyUI/AIMDO usage and residency tracking.
  • Added regression coverage for batched resumes and accurate load logging.

Notes

  • No ZONOS2 generation math, MoE routing, attention, or FP8 execution behavior was changed.
  • A complete ComfyUI restart is required after updating.

Verification

  • 35 automated tests passing.

0.1.5

14 Jun 09:42

Zonos2 TTS ComfyUI v0.1.5

Added

  • Added mixed FP8 E4M3 checkpoint support.
  • Added the drbaph/ZONOS-FP8 model catalog entry.
  • Added automatic download support for zonos2-fp8-mixed.safetensors.
  • Added FP8 metadata, policy, tensor-shape, and runtime validation.
  • Added native ComfyUI quantized tensor integration.

FP8 Policy

  • MoE expert gate/up (w13) projections use FP8 E4M3.
  • Expert down (w2), attention, LM head, routers, embeddings, norms, and other sensitive paths remain BF16.
  • FP8 checkpoints support dtype: auto and dtype: bf16.
  • Unsupported FP16 execution is rejected with a clear error.
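
The policy above can be sketched as a small classifier plus a dtype check. This is a minimal illustration, not the project's code: the tensor-name patterns (".experts.", ".w13") and both function names are hypothetical, and resolving "auto" to BF16 is an assumption.

```python
def storage_dtype(tensor_name: str) -> str:
    """Return the storage dtype the mixed FP8 policy would assign.

    Assumption: expert gate/up projections are identifiable by a name that
    contains ".experts." and ends in ".w13". Real checkpoint keys may differ.
    """
    if ".experts." in tensor_name and tensor_name.endswith(".w13"):
        return "fp8_e4m3"
    # Expert down (w2), attention, LM head, routers, embeddings, norms, etc.
    return "bf16"


def resolve_compute_dtype(requested: str) -> str:
    """Accept only the dtype settings listed for FP8 checkpoints."""
    if requested in ("auto", "bf16"):
        return "bf16"  # assumption: "auto" resolves to BF16 for FP8 checkpoints
    raise ValueError(
        f"dtype {requested!r} is not supported for mixed FP8 checkpoints; "
        "use 'auto' or 'bf16'"
    )
```

Keeping the classification in one function makes it easy to check a whole checkpoint's key list against the policy before any weights are loaded.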

AIMDO

  • Added actual-size-aware static and DynamicVRAM selection for FP8 models.
  • Mixed FP8 automatically selects AIMDO VBAR below approximately 12.78 GiB total VRAM when DynamicVRAM is enabled.
  • FP8 w13 and BF16 w2 expert projections are independently pageable.
  • Routed expert projections are loaded into VRAM on demand.
  • VBAR residency, page faults, and eviction use real AIMDO operations.
  • Systems with sufficient VRAM continue using the faster static loading path.

Downloads

  • Main checkpoints, DAC assets, and speaker encoder assets are checked independently.
  • Existing complete asset directories are not downloaded again.
  • A missing selected checkpoint is still downloaded when shared assets already exist.
  • BF16 and FP8 presets download from their respective Hugging Face repositories.

Validation

  • Retired all-layer FP8 checkpoint layouts are rejected.
  • Malformed 3D expert tensors are rejected before inference.
  • Clear compatibility errors are provided for unsupported FP8 checkpoints.
  • Added native and loader regression coverage.

Documentation

  • Updated English and Chinese documentation.
  • Added FP8 installation, memory usage, dtype, AIMDO, model structure, and troubleshooting information.
  • Added the mixed FP8 Hugging Face badge and model link.

Verification

  • 33 automated tests passing.
  • Static FP8 generation verified.
  • Real AIMDO VBAR generation verified with all expert w13 and w2 projections registered.

0.1.4

13 Jun 20:02

Zonos2 TTS ComfyUI v0.1.4

Highlights

  • Added adaptive ComfyUI/AIMDO memory management.
  • Added genuine AIMDO VBAR paging for low-VRAM GPUs.
  • Improved model loading, unloading, visualization, and lifecycle handling.

AIMDO and VRAM Management

  • BF16 main model size is estimated at approximately 14.324 GiB.
  • A 3 GiB runtime reserve creates an automatic VBAR cutoff of approximately 17.324 GiB total VRAM.
  • 8 GiB, 12 GiB, and 16 GiB GPUs use the real AIMDO CoreModelPatcher and VBAR path when DynamicVRAM is enabled.
  • Larger GPUs use the faster static CUDA path when the model fits with the runtime reserve.
  • Dynamic loading pages only selected MoE experts into VRAM.
  • VBAR allocation, residency, page faults, and eviction are genuine AIMDO operations.
  • DAC and speaker encoder remain managed as smaller static modules.
  • Automatic path selection is based on total VRAM rather than currently free VRAM.
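
The cutoff arithmetic and path choice described above can be sketched as follows. The function names and the "vbar"/"static" labels are hypothetical, and falling back to the static path when DynamicVRAM is disabled is an assumption; only the 14.324 GiB estimate, the 3 GiB reserve, and the use of total VRAM come from the notes.

```python
def vbar_cutoff_gib(model_gib: float = 14.324, reserve_gib: float = 3.0) -> float:
    """Total-VRAM threshold: model estimate plus runtime reserve."""
    return model_gib + reserve_gib  # 14.324 + 3 = 17.324 GiB for BF16


def select_load_path(
    total_vram_gib: float,
    dynamic_vram_enabled: bool = True,
    model_gib: float = 14.324,
    reserve_gib: float = 3.0,
) -> str:
    """Pick a loading path from total (not currently free) VRAM."""
    cutoff = vbar_cutoff_gib(model_gib, reserve_gib)
    if dynamic_vram_enabled and total_vram_gib < cutoff:
        return "vbar"  # paged, low-VRAM path
    # Assumption: with DynamicVRAM disabled, the static path is always used.
    return "static"


for vram in (8.0, 12.0, 16.0, 24.0):
    print(vram, select_load_path(vram))
```

Under these assumptions the 8, 12, and 16 GiB cases select the paged path and 24 GiB selects the static path, matching the behavior listed above.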

Model Lifecycle

  • Integrated the main model, DAC, and speaker encoder with ComfyUI model management.
  • Reuses an existing bundle when model, dtype, and attention settings are unchanged.
  • Changing model, dtype, or attention now performs a complete hard unload.
  • Improved cleanup of tensors, AIMDO registrations, references, and accelerator caches.
  • Restored accurate tensor visibility in ComfyUI Memory Visualization.
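
The reuse-or-hard-unload rule can be illustrated with a single-slot cache keyed on the three settings. This is a hedged sketch: `BundleKey`, `BundleCache`, and the `load`/`unload` callbacks are hypothetical stand-ins, not the node pack's or ComfyUI's actual classes.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class BundleKey:
    """The settings that decide whether a loaded bundle can be reused."""
    model: str
    dtype: str
    attention: str


class BundleCache:
    """Holds at most one bundle; any key change unloads it completely."""

    def __init__(self, load: Callable[[BundleKey], Any], unload: Callable[[Any], None]):
        self._load = load
        self._unload = unload
        self._key: Optional[BundleKey] = None
        self._bundle: Any = None

    def get(self, key: BundleKey) -> Any:
        if self._bundle is not None and self._key == key:
            return self._bundle  # unchanged settings: reuse
        if self._bundle is not None:
            self._unload(self._bundle)  # changed settings: hard unload first
            self._bundle = None
        self._bundle = self._load(key)
        self._key = key
        return self._bundle
```

A frozen dataclass gives value equality for free, so any change to model, dtype, or attention produces a different key and triggers the unload.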

Performance

  • Optimized single-token MoE decoding to execute only routed experts.
  • Avoided scanning all experts during each autoregressive generation step.
  • Preserved direct CUDA execution on GPUs with sufficient VRAM.
  • Retained file-backed expert weights for dynamic low-VRAM execution.

Fixes

  • Progress bars now use the checkpoint's real tensor count.
  • Fixed resampler caching so it does not retain the speaker encoder indefinitely.
  • Cached resamplers now follow speaker encoder device changes.
  • Improved speaker encoder registration with ComfyUI model management.
  • Improved model unloading and reloading after configuration changes.

Documentation

  • Expanded English and Chinese AIMDO memory-management documentation.
  • Documented static and dynamic loading behavior.
  • Documented the 14.324 GiB model estimate, 3 GiB reserve, and 17.324 GiB cutoff.
  • Added the complete supported-language tier table.
  • Added the official Zyphra ZONOS2 blog badge.
  • Expanded VRAM and out-of-memory troubleshooting guidance.

Testing

  • Added runtime-management, AIMDO selection, lifecycle, progress, resampler, and MoE module tests.
  • All 24 tests pass.

0.1.3

13 Jun 00:42

ZONOS2 TTS ComfyUI v0.1.3

Performance

  • Added optimized single-token MoE expert dispatch.
  • Autoregressive generation now runs only the selected expert instead of scanning all 16 experts.
  • Measured approximately 1.6x–2x faster token generation on an RTX 5090.
  • Generated audio tokens remain identical to the previous implementation.
  • Multi-token prompt processing retains the original grouped dispatch path.
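
The difference between scanning all experts and running only the routed ones can be shown with a toy, framework-free example. Everything here is illustrative: the experts are plain Python functions on a scalar, the raw router scores are used as weights without normalization, and none of the names come from the project.

```python
from typing import Callable, List, Sequence

Expert = Callable[[float], float]


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest router scores."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def scan_all_dispatch(x: float, scores: Sequence[float], experts: Sequence[Expert], k: int) -> float:
    """Reference path: evaluate every expert, zero out the unselected ones."""
    chosen = set(top_k(scores, k))
    out = 0.0
    for i, expert in enumerate(experts):
        weight = scores[i] if i in chosen else 0.0
        out += weight * expert(x)  # every expert runs, even with weight 0
    return out


def routed_dispatch(x: float, scores: Sequence[float], experts: Sequence[Expert], k: int) -> float:
    """Single-token path: evaluate only the k routed experts."""
    return sum(scores[i] * experts[i](x) for i in top_k(scores, k))


def make_experts(n: int, calls: List[int]) -> List[Expert]:
    """Toy experts that record each invocation in `calls`."""
    def make(i: int) -> Expert:
        def expert(x: float) -> float:
            calls.append(i)
            return (i + 1) * x
        return expert
    return [make(i) for i in range(n)]
```

With 16 toy experts and k of 1 or 2, both functions return the same value, but `routed_dispatch` invokes only k experts per token while `scan_all_dispatch` invokes all 16.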

Voice Cloning

  • Changed clean_speaker_background default to false, matching upstream ZONOS2.
  • Improved reference-audio and accurate-mode tooltips.
  • Added clearer guidance about voice identity, accent, cadence, emotion, and prosody limitations.
  • Expanded troubleshooting recommendations for improving clone similarity and accent retention.

Documentation

  • Updated English and Chinese documentation.
  • Updated version badges to 0.1.3.
  • Documented the optimized MoE decoding path.
  • Added detailed reference-audio and sampling guidance.

Testing

  • Added top-1 and top-2 MoE dispatch equivalence tests.
  • Added regression tests for upstream-compatible clone defaults.
  • Verified optimized and original paths produce identical production-model audio tokens.
  • All 12 automated tests pass.

0.1.2

12 Jun 21:14

ZONOS2 TTS ComfyUI v0.1.2

Changed

  • Added uv support to install.py.
    • Uses uv pip install --python <active ComfyUI Python> when available.
    • Falls back to python -m pip.
  • Added uv installation instructions to both READMEs.
  • Added all workflow and ZONOS2 images to the Chinese README.
  • Moved the citation outside the expandable sections.
  • Updated package, README badge, and workflow metadata to 0.1.2.
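
The uv-with-pip-fallback behavior can be sketched as a command builder. This is an assumption-laden illustration rather than the contents of install.py: the function name and the way uv is detected (`shutil.which`) are hypothetical.

```python
import shutil
import sys
from typing import List, Optional, Sequence


def build_install_command(
    packages: Sequence[str],
    uv_path: Optional[str],
    python: str = sys.executable,
) -> List[str]:
    """Build the install command, preferring uv when a path to it is given."""
    if uv_path:
        # Target the active interpreter explicitly so packages land in
        # ComfyUI's environment rather than whatever uv would pick.
        return [uv_path, "pip", "install", "--python", python, *packages]
    return [python, "-m", "pip", "install", *packages]


if __name__ == "__main__":
    # Assumption: uv is detected by looking it up on PATH.
    cmd = build_install_command(["example-package"], shutil.which("uv"))
    print(" ".join(cmd))
```

Passing the detected path in as an argument keeps the builder deterministic, which is convenient for tests that cover both the uv and pip branches.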

Added

  • Added an informational requirements.txt.
    • Dependencies are commented out intentionally.
    • pyproject.toml and install.py remain authoritative.
  • Added tests covering uv and pip installer behavior.

Removed

  • Removed TODO.md.

Validation

  • All 9 automated tests pass.
  • Verified both README files contain the same static images and animated ZONOS2 GIF.