14 Jun 16:20

Saganaki22

0.1.7 Latest

Zonos2 TTS ComfyUI v0.1.7

Fixed

Corrected the mixed FP8 Hugging Face repository:
- drbaph/ZONOS-FP8 → drbaph/ZONOS2-FP8
Fixed automatic FP8 model and asset downloads.
Updated the FP8 model-loader dropdown entry.
Corrected FP8 links in the English and Chinese READMEs.
Corrected the project metadata and Hugging Face badge.
Updated the example workflow JSON.
Updated the workflow embedded inside the example PNG without changing its rendered image.

Verification

35 automated tests passing.
Workflow JSON and embedded PNG workflow metadata validated.

14 Jun 10:29

Saganaki22

0.1.6

Zonos2 TTS ComfyUI v0.1.6

Fixed

Removed redundant ComfyUI memory-manager resume calls during voice cloning.
Removed the second full bundle resume during speaker embedding extraction.
Cached model-loader executions no longer unnecessarily resume the active bundle.
Fixed duplicate "Loaded" log messages for Zonos2Model, Zonos2DAC, and Zonos2SpeakerEncoder.
Load messages now appear only when a model is newly registered.
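
A minimal sketch of register-once logging, assuming a simple keyed registry. ModelRegistry, register(), and the key format are hypothetical names used only for illustration, not the node's actual classes. A cached lookup returns the existing object without logging, so the "Loaded" message is emitted only on first registration.

```python
import logging

logger = logging.getLogger("zonos2.sketch")


class ModelRegistry:
    """Hypothetical registry that logs "Loaded ..." only on first registration."""

    def __init__(self):
        self._entries = {}

    def register(self, kind, key, factory):
        """Return (object, newly_registered)."""
        entry_key = (kind, key)
        if entry_key in self._entries:
            # Cached execution: reuse silently, no second log line.
            return self._entries[entry_key], False
        obj = factory()
        self._entries[entry_key] = obj
        logger.info("Loaded %s (%s)", kind, key)
        return obj, True
```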

Improved

Main model, DAC, and speaker encoder patchers are resumed through one batched ComfyUI memory-management call.
Preserved correct ComfyUI/AIMDO usage and residency tracking.
Added regression coverage for batched resumes and accurate load logging.

Notes

No ZONOS2 generation math, MoE routing, attention, or FP8 execution behavior was changed.
A complete ComfyUI restart is required after updating.

Verification

35 automated tests passing.

14 Jun 09:42

Saganaki22

0.1.5

Zonos2 TTS ComfyUI v0.1.5

Added

Added mixed FP8 E4M3 checkpoint support.
Added the drbaph/ZONOS-FP8 model catalog entry.
Added automatic download support for zonos2-fp8-mixed.safetensors.
Added FP8 metadata, policy, tensor-shape, and runtime validation.
Added native ComfyUI quantized tensor integration.

FP8 Policy

MoE expert gate/up (w13) projections use FP8 E4M3.
Expert down (w2), attention, LM head, routers, embeddings, norms, and other sensitive paths remain BF16.
FP8 checkpoints support dtype: auto and dtype: bf16.
Unsupported FP16 execution is rejected with a clear error.
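
The FP8 policy above can be expressed as a name-based rule. This is a sketch under assumptions: the parameter-name pattern (.experts.w13) and the dtype labels are hypothetical, because the checkpoint's real key names are not listed in these notes. Only expert w13 weights map to FP8 E4M3; everything else, including w2, stays BF16, and an FP16 request on an FP8 checkpoint raises an error.

```python
import re

# Hypothetical key pattern; the real checkpoint's parameter names may differ.
_FP8_KEY = re.compile(r"\.experts\.w13(\.weight)?$")

# dtype options accepted for mixed FP8 checkpoints, per the policy above.
FP8_DTYPE_OPTIONS = ("auto", "bf16")


def storage_dtype(param_name):
    """Only expert gate/up (w13) weights are FP8 E4M3; everything else stays BF16."""
    return "float8_e4m3fn" if _FP8_KEY.search(param_name) else "bfloat16"


def check_dtype_option(dtype_option, checkpoint_is_fp8):
    """Raise a clear error for dtype options that FP8 checkpoints do not support."""
    if checkpoint_is_fp8 and dtype_option not in FP8_DTYPE_OPTIONS:
        raise ValueError(
            f"dtype '{dtype_option}' is not supported for mixed FP8 checkpoints; "
            f"use one of {', '.join(FP8_DTYPE_OPTIONS)}"
        )
```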

AIMDO

Added actual-size-aware static and DynamicVRAM selection for FP8 models.
When DynamicVRAM is enabled, mixed FP8 automatically selects AIMDO VBAR on GPUs with less than approximately 12.78 GiB of total VRAM.
FP8 w13 and BF16 w2 expert projections are independently pageable.
Routed expert projections are loaded into VRAM on demand.
VBAR residency, page faults, and eviction use real AIMDO operations.
Systems with sufficient VRAM continue using the faster static loading path.
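
To illustrate what independently pageable w13 and w2 projections means, here is a generic LRU residency sketch. It is not AIMDO's API or its eviction policy; ExpertPager, the byte budget, and the LRU choice are all assumptions. Each (layer, expert, projection) key is faulted in on first use, and the least recently used keys are evicted when the budget would be exceeded.

```python
from collections import OrderedDict


class ExpertPager:
    """Generic LRU sketch of on-demand expert residency (not AIMDO's API).

    Keys are (layer, expert, projection), so an FP8 w13 and a BF16 w2
    projection of the same expert are paged in and evicted independently.
    """

    def __init__(self, budget_bytes, sizes):
        self.budget = budget_bytes
        self.sizes = sizes              # key -> size in bytes
        self.resident = OrderedDict()   # keys ordered from least to most recently used
        self.used = 0
        self.faults = 0

    def fetch(self, key):
        if key in self.resident:
            self.resident.move_to_end(key)  # hit: mark as most recently used
            return
        size = self.sizes[key]
        if size > self.budget:
            raise ValueError(f"{key} does not fit in the paging budget")
        self.faults += 1                    # miss: treat as a page fault
        while self.used + size > self.budget:
            evicted, _ = self.resident.popitem(last=False)  # evict least recently used
            self.used -= self.sizes[evicted]
        self.resident[key] = None
        self.used += size
```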

Downloads

Main checkpoints, DAC assets, and speaker encoder assets are checked independently.
Existing complete asset directories are not downloaded again.
A missing selected checkpoint is still downloaded when shared assets already exist.
BF16 and FP8 presets download from their respective Hugging Face repositories.
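
One way to express the independent checks is to test each asset group separately and return only what is missing. The directory layout (dac/, speaker_encoder/) and the file names are hypothetical; only zonos2-fp8-mixed.safetensors comes from these notes. Because each group is checked on its own, a missing checkpoint is still reported when the shared DAC and speaker encoder assets are already complete.

```python
from pathlib import Path


def missing_assets(root, checkpoint_name, dac_files, speaker_files):
    """Return which downloads are still needed, checking each asset group independently.

    Hypothetical layout: <root>/<checkpoint>, <root>/dac/*, <root>/speaker_encoder/*.
    """
    root = Path(root)
    needed = []
    if not (root / checkpoint_name).is_file():
        needed.append(checkpoint_name)
    for group, files in (("dac", dac_files), ("speaker_encoder", speaker_files)):
        # A complete directory is skipped; an incomplete one is fetched again.
        if not all((root / group / name).is_file() for name in files):
            needed.append(group)
    return needed
```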

Validation

Retired all-layer FP8 checkpoint layouts are rejected.
Malformed 3D expert tensors are rejected before inference.
Clear compatibility errors are provided for unsupported FP8 checkpoints.
Added native and loader regression coverage.
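
A shape check of the kind described above might look like the following sketch. It assumes stacked expert weights laid out as (num_experts, out_features, in_features); that layout and the function name are assumptions, and the 16-expert count is taken from the v0.1.3 notes. Raising before inference turns a malformed checkpoint into a clear error instead of a failure partway through generation.

```python
def validate_expert_tensor(name, shape, num_experts, per_expert_shape):
    """Reject a malformed stacked expert tensor before inference.

    Assumes a (num_experts, out_features, in_features) layout.
    """
    shape = tuple(shape)
    if len(shape) != 3:
        raise ValueError(f"{name}: expected a 3D expert tensor, got shape {shape}")
    if shape[0] != num_experts:
        raise ValueError(f"{name}: expected {num_experts} experts, got {shape[0]}")
    if shape[1:] != tuple(per_expert_shape):
        raise ValueError(
            f"{name}: expected per-expert shape {tuple(per_expert_shape)}, got {shape[1:]}"
        )
```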

Documentation

Updated English and Chinese documentation.
Added FP8 installation, memory usage, dtype, AIMDO, model structure, and troubleshooting information.
Added the mixed FP8 Hugging Face badge and model link.

Verification

33 automated tests passing.
Static FP8 generation verified.
Real AIMDO VBAR generation verified with all expert w13 and w2 projections registered.

13 Jun 20:02

Saganaki22

0.1.4

Zonos2 TTS ComfyUI v0.1.4

Highlights

Added adaptive ComfyUI/AIMDO memory management.
Added genuine AIMDO VBAR paging for low-VRAM GPUs.
Improved model loading, unloading, visualization, and lifecycle handling.

AIMDO and VRAM Management

BF16 main model size is estimated at approximately 14.324 GiB.
Adding a 3 GiB runtime reserve gives an automatic VBAR cutoff of approximately 17.324 GiB of total VRAM.
GPUs with 8 GiB, 12 GiB, or 16 GiB of VRAM use the real AIMDO CoreModelPatcher and VBAR path when DynamicVRAM is enabled.
Larger GPUs use the faster static CUDA path when the model fits with the runtime reserve.
Dynamic loading pages only selected MoE experts into VRAM.
VBAR allocation, residency, page faults, and eviction are genuine AIMDO operations.
DAC and speaker encoder remain managed as smaller static modules.
Automatic path selection is based on total VRAM rather than currently free VRAM.
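
The cutoff arithmetic (14.324 GiB + 3 GiB = 17.324 GiB) and the total-VRAM rule can be sketched as follows. select_load_path and its string results are hypothetical, and staying on the static path when DynamicVRAM is disabled is an assumption these notes do not state. Only the BF16 figures are used, since the reserve behind the FP8 cutoff of approximately 12.78 GiB is not given.

```python
GIB = 1024 ** 3


def select_load_path(total_vram_bytes, model_bytes, dynamic_vram_enabled,
                     reserve_bytes=3 * GIB):
    """Pick 'vbar' or 'static' from total (not free) VRAM."""
    cutoff = model_bytes + reserve_bytes
    if dynamic_vram_enabled and total_vram_bytes < cutoff:
        return "vbar"      # model + reserve does not fit: page experts on demand
    return "static"        # fits (or DynamicVRAM off, an assumption): direct CUDA path


BF16_MODEL_BYTES = int(14.324 * GIB)
```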

Model Lifecycle

Integrated the main model, DAC, and speaker encoder with ComfyUI model management.
Reuses an existing bundle when model, dtype, and attention settings are unchanged.
Changing model, dtype, or attention now performs a complete hard unload.
Improved cleanup of tensors, AIMDO registrations, references, and accelerator caches.
Restored accurate tensor visibility in ComfyUI Memory Visualization.
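
The reuse rule above amounts to caching on the (model, dtype, attention) triple. This is a minimal sketch; BundleCache and the loader and unloader callbacks are hypothetical stand-ins for the node's ComfyUI model-management integration. Any change to the triple unloads the previous bundle before the new one is loaded.

```python
class BundleCache:
    """Hypothetical sketch: reuse a bundle while (model, dtype, attention) is unchanged."""

    def __init__(self, loader, unloader):
        self._loader = loader
        self._unloader = unloader
        self._key = None
        self._bundle = None

    def get(self, model, dtype, attention):
        key = (model, dtype, attention)
        if key == self._key:
            return self._bundle               # unchanged settings: reuse as-is
        if self._bundle is not None:
            self._unloader(self._bundle)      # any change: hard unload first
        self._bundle = self._loader(*key)
        self._key = key
        return self._bundle
```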

Performance

Optimized single-token MoE decoding to execute only routed experts.
Avoided scanning all experts during each autoregressive generation step.
Preserved direct CUDA execution on GPUs with sufficient VRAM.
Retained file-backed expert weights for dynamic low-VRAM execution.

Fixes

Progress bars now use the checkpoint's real tensor count.
Fixed resampler caching so it does not retain the speaker encoder indefinitely.
Cached resamplers now follow speaker encoder device changes.
Improved speaker encoder registration with ComfyUI model management.
Improved model unloading and reloading after configuration changes.
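
A real tensor count can be read cheaply from a checkpoint's safetensors header without loading any weights. The safetensors format begins with an 8-byte little-endian header length, followed by a JSON header with one entry per tensor plus an optional __metadata__ entry. Whether the node counts tensors exactly this way is an assumption, and the function name is hypothetical.

```python
import json
import struct


def count_safetensors_tensors(path):
    """Count tensors in a .safetensors file by reading only its JSON header."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # u64 little-endian header size
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional string map, not a tensor entry.
    return sum(1 for key in header if key != "__metadata__")
```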

Documentation

Expanded English and Chinese AIMDO memory-management documentation.
Documented static and dynamic loading behavior.
Documented the 14.324 GiB model estimate, 3 GiB reserve, and 17.324 GiB cutoff.
Added the complete supported-language tier table.
Added the official Zyphra ZONOS2 blog badge.
Expanded VRAM and out-of-memory troubleshooting guidance.

Testing

Added runtime-management, AIMDO selection, lifecycle, progress, resampler, and MoE module tests.
All 24 tests pass.

13 Jun 00:42

Saganaki22

0.1.3

ZONOS2 TTS ComfyUI v0.1.3

Performance

Added optimized single-token MoE expert dispatch.
Autoregressive generation now runs only the selected expert instead of scanning all 16 experts.
Measured approximately 1.6x–2x faster token generation on an RTX 5090.
Generated audio tokens remain identical to the previous implementation.
Multi-token prompt processing retains the original grouped dispatch path.
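
The difference between scanning every expert and the single-token path can be illustrated in plain Python. This sketch assumes softmax routing with the top-k weights renormalized, which may not match ZONOS2's exact gating; moe_single_token and moe_dense_reference are hypothetical names. The fast path calls only the k routed experts, while the reference evaluates all experts and zeroes the unselected ones, mirroring the top-1 and top-2 equivalence tests described above.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_indices(probs, k):
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]


def moe_single_token(x, router_logits, experts, top_k=1):
    """Single-token path: call only the k routed experts."""
    probs = softmax(router_logits)
    chosen = top_k_indices(probs, top_k)
    norm = sum(probs[i] for i in chosen)   # assumed: renormalize over chosen experts
    out = [0.0] * len(x)
    for i in chosen:
        y = experts[i](x)
        w = probs[i] / norm
        out = [o + w * v for o, v in zip(out, y)]
    return out, chosen


def moe_dense_reference(x, router_logits, experts, top_k=1):
    """Reference path: evaluate every expert and zero out the unselected ones."""
    probs = softmax(router_logits)
    chosen = set(top_k_indices(probs, top_k))
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(x)
    for i, expert in enumerate(experts):
        y = expert(x)
        w = probs[i] / norm if i in chosen else 0.0
        out = [o + w * v for o, v in zip(out, y)]
    return out
```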

Voice Cloning

Changed clean_speaker_background default to false, matching upstream ZONOS2.
Improved reference-audio and accurate-mode tooltips.
Added clearer guidance about voice identity, accent, cadence, emotion, and prosody limitations.
Expanded troubleshooting recommendations for improving clone similarity and accent retention.

Documentation

Updated English and Chinese documentation.
Updated version badges to 0.1.3.
Documented the optimized MoE decoding path.
Added detailed reference-audio and sampling guidance.

Testing

Added top-1 and top-2 MoE dispatch equivalence tests.
Added regression tests for upstream-compatible clone defaults.
Verified optimized and original paths produce identical production-model audio tokens.
All 12 automated tests pass.

12 Jun 21:14

Saganaki22

0.1.2

ZONOS2 TTS ComfyUI v0.1.2

Changed

Added uv support to install.py.
- Uses uv pip install --python <active ComfyUI Python> when available.
- Falls back to python -m pip when uv is not available.
Added uv installation instructions to both READMEs.
Added all workflow and ZONOS2 images to the Chinese README.
Moved the citation outside the expandable sections.
Updated package, README badge, and workflow metadata to 0.1.2.
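
The uv-first install behavior can be sketched as below. build_install_command is a hypothetical helper, not the actual install.py code, and the which parameter exists only so the lookup can be replaced in tests. Passing sys.executable to --python makes uv install into the same interpreter that runs ComfyUI.

```python
import shutil
import sys


def build_install_command(packages, which=shutil.which):
    """Prefer uv targeting the running interpreter; otherwise fall back to pip."""
    uv = which("uv")
    if uv:
        # --python points uv at the active ComfyUI interpreter.
        return [uv, "pip", "install", "--python", sys.executable, *packages]
    return [sys.executable, "-m", "pip", "install", *packages]
```

The resulting list can be passed to subprocess.check_call().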

Added

Added an informational requirements.txt.
- Dependencies are commented out intentionally.
- pyproject.toml and install.py remain authoritative.
Added tests covering uv and pip installer behavior.

Removed

Removed TODO.md.

Validation

All 9 automated tests pass.
Verified that both README files contain the same static images and the animated ZONOS2 GIF.
