Releases: Saganaki22/Zonos2_TTS-ComfyUI
0.1.7
Zonos2 TTS ComfyUI v0.1.7
Fixed
- Corrected the mixed FP8 Hugging Face repository: `drbaph/ZONOS-FP8` → `drbaph/ZONOS2-FP8`
- Fixed automatic FP8 model and asset downloads.
- Updated the FP8 model-loader dropdown entry.
- Corrected FP8 links in the English and Chinese READMEs.
- Corrected the project metadata and Hugging Face badge.
- Updated the example workflow JSON.
- Updated the workflow embedded inside the example PNG without changing its rendered image.
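As an illustration of how a workflow embedded in a PNG can be updated without touching the rendered image, the sketch below rewrites a single text chunk and copies every other chunk byte for byte. It assumes the workflow is stored in a `tEXt` chunk with the keyword `workflow`; that keyword and the helper names are assumptions made for this example, not the project's actual tooling.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_chunks(png: bytes):
    """Yield (chunk_type, chunk_data) pairs from raw PNG bytes."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        ctype = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        yield ctype, data
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC


def build_chunk(ctype: bytes, data: bytes) -> bytes:
    """Serialize one chunk, recomputing its CRC over type + data."""
    crc = zlib.crc32(ctype + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)


def replace_text_chunk(png: bytes, keyword: str, new_text: str) -> bytes:
    """Rewrite the tEXt chunk named `keyword`; copy all other chunks as-is."""
    key = keyword.encode("latin-1")  # tEXt chunks are Latin-1 encoded
    out = [PNG_SIGNATURE]
    for ctype, data in iter_chunks(png):
        if ctype == b"tEXt" and data.split(b"\x00", 1)[0] == key:
            data = key + b"\x00" + new_text.encode("latin-1")
        out.append(build_chunk(ctype, data))
    return b"".join(out)


def pixel_chunks(png: bytes) -> list:
    """Return the IHDR and IDAT chunks, which determine the rendered image."""
    return [(t, d) for t, d in iter_chunks(png) if t in (b"IHDR", b"IDAT")]
```

Comparing `pixel_chunks(before)` with `pixel_chunks(after)` is one way to confirm that only the metadata changed.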
Verification
- 35 automated tests passing.
- Workflow JSON and embedded PNG workflow metadata validated.
0.1.6
Zonos2 TTS ComfyUI v0.1.6
Fixed
- Removed redundant ComfyUI memory-manager resume calls during voice cloning.
- Removed the second full bundle resume during speaker embedding extraction.
- Cached model-loader executions no longer unnecessarily resume the active bundle.
- Fixed duplicate `Loaded Zonos2Model`, `Zonos2DAC`, and `Zonos2SpeakerEncoder` log messages.
- Load messages now appear only when a model is newly registered.
Improved
- Main model, DAC, and speaker encoder patchers are resumed through one batched ComfyUI memory-management call.
- Preserved correct ComfyUI/AIMDO usage and residency tracking.
- Added regression coverage for batched resumes and accurate load logging.
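The behavior described for this release (one batched resume call, and load messages only for newly registered models) can be pictured with a toy registry. This is only a sketch: `PatcherRegistry`, `resume_batch`, and the logger name are hypothetical and do not correspond to ComfyUI's memory-management API.

```python
import logging

logger = logging.getLogger("zonos2_sketch")  # hypothetical logger name


class PatcherRegistry:
    """Toy registry: one batched resume call, load messages only when new."""

    def __init__(self):
        self.registered = set()
        self.resume_calls = 0
        self.messages = []

    def resume_batch(self, names):
        """Resume all patchers in one call; log only first-time registrations."""
        self.resume_calls += 1
        for name in names:
            if name not in self.registered:
                self.registered.add(name)
                message = f"Loaded {name}"
                self.messages.append(message)
                logger.info(message)
```

Calling `resume_batch` twice with the same three names counts two resume calls but records each load message only once.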
Notes
- No ZONOS2 generation math, MoE routing, attention, or FP8 execution behavior was changed.
- A complete ComfyUI restart is required after updating.
Verification
- 35 automated tests passing.
0.1.5
Zonos2 TTS ComfyUI v0.1.5
Added
- Added mixed FP8 E4M3 checkpoint support.
- Added the `drbaph/ZONOS-FP8` model catalog entry.
- Added automatic download support for `zonos2-fp8-mixed.safetensors`.
- Added FP8 metadata, policy, tensor-shape, and runtime validation.
- Added native ComfyUI quantized tensor integration.
FP8 Policy
- MoE expert gate/up (`w13`) projections use FP8 E4M3.
- Expert down (`w2`), attention, LM head, routers, embeddings, norms, and other sensitive paths remain BF16.
- FP8 checkpoints support `dtype: auto` and `dtype: bf16`.
- Unsupported FP16 execution is rejected with a clear error.
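The policy above amounts to a per-tensor dtype rule plus a check on the requested dtype. A minimal sketch follows, assuming that expert gate/up weights can be recognized by a `w13` component in the tensor name and that `auto` resolves to BF16 compute; the function names and the tensor names in the usage note are hypothetical.

```python
def storage_dtype(tensor_name: str) -> str:
    """Return the storage dtype for a tensor under the mixed FP8 policy."""
    # Assumption: expert gate/up weights carry a "w13" component in their name.
    parts = tensor_name.split(".")
    return "fp8_e4m3" if "w13" in parts else "bf16"


def resolve_compute_dtype(requested: str) -> str:
    """Map a loader dtype setting to the compute dtype for an FP8 checkpoint."""
    # Assumption: "auto" resolves to BF16 for mixed FP8 checkpoints.
    if requested in ("auto", "bf16"):
        return "bf16"
    raise ValueError(
        f"dtype {requested!r} is not supported for mixed FP8 checkpoints; "
        "use 'auto' or 'bf16'"
    )
```

Under these assumptions, `storage_dtype("layers.0.moe.experts.w13")` yields `"fp8_e4m3"`, `storage_dtype("layers.0.moe.experts.w2")` yields `"bf16"`, and `resolve_compute_dtype("fp16")` raises a `ValueError`.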
AIMDO
- Added actual-size-aware static and DynamicVRAM selection for FP8 models.
- Mixed FP8 automatically selects AIMDO VBAR below approximately 12.78 GiB total VRAM when DynamicVRAM is enabled.
- FP8 `w13` and BF16 `w2` expert projections are independently pageable.
- Routed expert projections are loaded into VRAM on demand.
- VBAR residency, page faults, and eviction use real AIMDO operations.
- Systems with sufficient VRAM continue using the faster static loading path.
Downloads
- Main checkpoints, DAC assets, and speaker encoder assets are checked independently.
- Existing complete asset directories are not downloaded again.
- A missing selected checkpoint is still downloaded when shared assets already exist.
- BF16 and FP8 presets download from their respective Hugging Face repositories.
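The independent checks described above could look roughly like the following sketch. The directory layout, the `shared` mapping, and the function name are assumptions for illustration; only the checkpoint filename `zonos2-fp8-mixed.safetensors` comes from these notes.

```python
from pathlib import Path


def missing_components(root: Path, checkpoint: str, shared: dict) -> list:
    """List components that still need downloading, each checked on its own."""
    missing = []
    # The selected checkpoint is checked even when shared assets are complete.
    if not (root / checkpoint).is_file():
        missing.append(checkpoint)
    # A shared asset directory counts as complete only if every file exists.
    for subdir, required_files in shared.items():
        if not all((root / subdir / name).is_file() for name in required_files):
            missing.append(subdir)
    return missing
```

With complete DAC and speaker encoder directories but no checkpoint file, the function returns only the checkpoint name, so nothing that already exists is fetched again.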
Validation
- Retired all-layer FP8 checkpoint layouts are rejected.
- Malformed 3D expert tensors are rejected before inference.
- Clear compatibility errors are provided for unsupported FP8 checkpoints.
- Added native and loader regression coverage.
Documentation
- Updated English and Chinese documentation.
- Added FP8 installation, memory usage, dtype, AIMDO, model structure, and troubleshooting information.
- Added the mixed FP8 Hugging Face badge and model link.
Verification
- 33 automated tests passing.
- Static FP8 generation verified.
- Real AIMDO VBAR generation verified with all expert `w13` and `w2` projections registered.
0.1.4
Zonos2 TTS ComfyUI v0.1.4
Highlights
- Added adaptive ComfyUI/AIMDO memory management.
- Added genuine AIMDO VBAR paging for low-VRAM GPUs.
- Improved model loading, unloading, visualization, and lifecycle handling.
AIMDO and VRAM Management
- BF16 main model size is estimated at approximately 14.324 GiB.
- A 3 GiB runtime reserve creates an automatic VBAR cutoff of approximately 17.324 GiB total VRAM.
- 8 GiB, 12 GiB, and 16 GiB GPUs use the real AIMDO `CoreModelPatcher` and VBAR path when DynamicVRAM is enabled.
- Larger GPUs use the faster static CUDA path when the model fits with the runtime reserve.
- Dynamic loading pages only selected MoE experts into VRAM.
- VBAR allocation, residency, page faults, and eviction are genuine AIMDO operations.
- DAC and speaker encoder remain managed as smaller static modules.
- Automatic path selection is based on total VRAM rather than currently free VRAM.
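The cutoff arithmetic and path selection above can be sketched as follows. The function names are hypothetical, the strict `<` comparison at the cutoff is an assumption, and so is the choice to fall back to the static path when DynamicVRAM is disabled.

```python
BF16_MODEL_GIB = 14.324    # estimated BF16 main model size (from these notes)
RUNTIME_RESERVE_GIB = 3.0  # runtime reserve (from these notes)


def vbar_cutoff_gib(model_gib: float = BF16_MODEL_GIB,
                    reserve_gib: float = RUNTIME_RESERVE_GIB) -> float:
    """Total VRAM below which the VBAR path is chosen automatically."""
    return model_gib + reserve_gib


def select_loading_path(total_vram_gib: float, dynamic_vram: bool) -> str:
    """Pick "vbar" or "static" from total (not currently free) VRAM."""
    # Assumption: with DynamicVRAM disabled, the static path is always used.
    if dynamic_vram and total_vram_gib < vbar_cutoff_gib():
        return "vbar"
    return "static"
```

With DynamicVRAM enabled, 8, 12, and 16 GiB cards fall below the roughly 17.324 GiB cutoff and select `"vbar"`, while a 24 GiB card selects `"static"`.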
Model Lifecycle
- Integrated the main model, DAC, and speaker encoder with ComfyUI model management.
- Reuses an existing bundle when model, dtype, and attention settings are unchanged.
- Changing model, dtype, or attention now performs a complete hard unload.
- Improved cleanup of tensors, AIMDO registrations, references, and accelerator caches.
- Restored accurate tensor visibility in ComfyUI Memory Visualization.
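The reuse-or-hard-unload rule in this section can be sketched as a small cache keyed on the three settings. `BundleCache` and its `loader`/`unloader` callbacks are hypothetical stand-ins, not the node pack's actual classes.

```python
class BundleCache:
    """Reuse a loaded bundle while (model, dtype, attention) stay the same."""

    def __init__(self, loader, unloader):
        self._loader = loader      # callable(model, dtype, attention) -> bundle
        self._unloader = unloader  # callable(bundle) that frees everything
        self._key = None
        self._bundle = None

    def get(self, model: str, dtype: str, attention: str):
        key = (model, dtype, attention)
        if self._bundle is not None and key == self._key:
            return self._bundle  # settings unchanged: reuse the bundle
        if self._bundle is not None:
            self._unloader(self._bundle)  # any changed setting: hard unload
            self._bundle = None
        self._bundle = self._loader(*key)
        self._key = key
        return self._bundle
```

Two consecutive `get` calls with identical arguments trigger one load; changing only the dtype triggers an unload followed by a fresh load.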
Performance
- Optimized single-token MoE decoding to execute only routed experts.
- Avoided scanning all experts during each autoregressive generation step.
- Preserved direct CUDA execution on GPUs with sufficient VRAM.
- Retained file-backed expert weights for dynamic low-VRAM execution.
Fixes
- Progress bars now use the checkpoint's real tensor count.
- Fixed resampler caching so it does not retain the speaker encoder indefinitely.
- Cached resamplers now follow speaker encoder device changes.
- Improved speaker encoder registration with ComfyUI model management.
- Improved model unloading and reloading after configuration changes.
Documentation
- Expanded English and Chinese AIMDO memory-management documentation.
- Documented static and dynamic loading behavior.
- Documented the 14.324 GiB model estimate, 3 GiB reserve, and 17.324 GiB cutoff.
- Added the complete supported-language tier table.
- Added the official Zyphra ZONOS2 blog badge.
- Expanded VRAM and out-of-memory troubleshooting guidance.
Testing
- Added runtime-management, AIMDO selection, lifecycle, progress, resampler, and MoE module tests.
- All 24 tests pass.
0.1.3
ZONOS2 TTS ComfyUI v0.1.3
Performance
- Added optimized single-token MoE expert dispatch.
- Autoregressive generation now runs only the selected expert instead of scanning all 16 experts.
- Measured approximately 1.6x–2x faster token generation on an RTX 5090.
- Generated audio tokens remain identical to the previous implementation.
- Multi-token prompt processing retains the original grouped dispatch path.
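To illustrate why a single-token path can skip the expert scan while producing the same result, here is a toy sketch in plain Python. Experts are simple scalar functions and gating weights are omitted for brevity; the function names and the toy experts are assumptions, not the project's implementation.

```python
def route(logits, k):
    """Return the k highest-scoring expert indices in ascending index order."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return sorted(ranked[:k])


def grouped_dispatch(x, experts, logits, k):
    """Baseline: scan all experts and run those the token was routed to."""
    routed = set(route(logits, k))
    out, visited = 0.0, 0
    for index, expert in enumerate(experts):
        visited += 1
        if index in routed:
            out += expert(x)
    return out, visited


def single_token_dispatch(x, experts, logits, k):
    """Optimized: index straight into the routed experts, skipping the scan."""
    out, visited = 0.0, 0
    for index in route(logits, k):
        visited += 1
        out += experts[index](x)
    return out, visited


# Sixteen toy experts; expert i multiplies its input by (i + 1).
experts = [lambda x, scale=i + 1: x * scale for i in range(16)]
logits = [0.0] * 16
logits[5], logits[11] = 3.0, 2.0  # expert 5 ranks first, expert 11 second
```

Both functions accumulate the routed experts in the same order, so their outputs match exactly for top-1 and top-2 routing; only the number of experts visited differs (16 versus `k`).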
Voice Cloning
- Changed `clean_speaker_background` default to `false`, matching upstream ZONOS2.
- Improved reference-audio and accurate-mode tooltips.
- Added clearer guidance about voice identity, accent, cadence, emotion, and prosody limitations.
- Expanded troubleshooting recommendations for improving clone similarity and accent retention.
Documentation
- Updated English and Chinese documentation.
- Updated version badges to `0.1.3`.
- Documented the optimized MoE decoding path.
- Added detailed reference-audio and sampling guidance.
Testing
- Added top-1 and top-2 MoE dispatch equivalence tests.
- Added regression tests for upstream-compatible clone defaults.
- Verified optimized and original paths produce identical production-model audio tokens.
- All 12 automated tests pass.
0.1.2
ZONOS2 TTS ComfyUI v0.1.2
Changed
- Added `uv` support to `install.py`.
  - Uses `uv pip install --python <active ComfyUI Python>` when available.
  - Falls back to `python -m pip`.
- Added uv installation instructions to both READMEs.
- Added all workflow and ZONOS2 images to the Chinese README.
- Moved the citation outside the expandable sections.
- Updated package, README badge, and workflow metadata to `0.1.2`.
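The uv-with-pip-fallback behavior listed above can be sketched as a function that only builds the command line. This is not the contents of `install.py`; `build_install_command`, its `find_uv` parameter, and the package name in the usage note are hypothetical.

```python
import shutil
import sys


def build_install_command(packages, python=sys.executable, find_uv=shutil.which):
    """Return the argv list that installs `packages` into `python`."""
    uv_path = find_uv("uv")
    if uv_path:
        # uv targets the active interpreter through its --python option.
        return [uv_path, "pip", "install", "--python", python, *packages]
    # Fallback: run pip as a module of the same interpreter.
    return [python, "-m", "pip", "install", *packages]
```

For example, `build_install_command(["example-package"])` returns a `uv pip install --python ...` command when `uv` is on `PATH` and a `python -m pip install ...` command otherwise; the result can be passed to `subprocess.run`.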
Added
- Added an informational `requirements.txt`.
  - Dependencies are commented out intentionally.
  - `pyproject.toml` and `install.py` remain authoritative.
- Added tests covering uv and pip installer behavior.
Removed
- Removed `TODO.md`.
Validation
- All 9 automated tests pass.
- Verified both README files contain the same static images and animated ZONOS2 GIF.