fix: make VAEDecodeAudio usable for LTX-2.x generated audio latents (CORE-157) #13716

Merged
comfyanonymous merged 1 commit into Comfy-Org:master from drozbay:barishozbay/core-157-drop-sample_rate-key-from-ltxvemptylatentaudio-output-dict
May 5, 2026


drozbay (Contributor) commented May 5, 2026

Summary

LTXVEmptyLatentAudio writes sample_rate = int(audio_vae.first_stage_model.sample_rate) into the empty latent dict, but that value is the encoder's internal mel rate (16000 Hz on LTXV-2), not the vocoder output rate (24000 Hz). vae_decode_audio prefers samples["sample_rate"] over vae.audio_sample_rate_output, so the standard VAEDecodeAudio tags audio at the encoder rate and playback runs ~33% slower with pitch shifted down a perfect fifth. LTXVAudioVAEDecode dodges this by reading first_stage_model.output_sample_rate directly.
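The arithmetic behind the "~33% slower, down a perfect fifth" claim can be checked with a small sketch. This is not the ComfyUI code: `pick_sample_rate` and `playback_error` are hypothetical helpers that only mimic the precedence described above (a `sample_rate` key in the latent dict wins over the VAE's output rate), and the two rates are the ones quoted in this description.

```python
import math

ENCODER_RATE = 16000  # rate written into the latent dict (per the description)
VOCODER_RATE = 24000  # rate the vocoder actually produces (per the description)


def pick_sample_rate(samples: dict, vae_output_rate: int) -> int:
    """Hypothetical stand-in for the described precedence:
    a "sample_rate" key on the latent dict overrides the VAE output rate."""
    return samples.get("sample_rate", vae_output_rate)


def playback_error(tagged_rate: int, true_rate: int) -> tuple[float, float]:
    """Return (speed factor, pitch shift in semitones) when audio produced
    at true_rate is tagged and played back at tagged_rate."""
    speed = tagged_rate / true_rate
    semitones = 12 * math.log2(speed)
    return speed, semitones


# Latent dict as it looked before the fix: it carries the encoder rate.
tagged = pick_sample_rate({"sample_rate": ENCODER_RATE}, VOCODER_RATE)
speed, semitones = playback_error(tagged, VOCODER_RATE)

print(f"tagged rate: {tagged} Hz")
print(f"speed factor: {speed:.3f}")         # 2/3 of real time
print(f"pitch shift: {semitones:.2f} st")   # about -7 semitones, a fifth down
```

A 2:3 frequency ratio is the just-intonation perfect fifth, which is why the pitch drop lands almost exactly on seven semitones.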

Fix is to drop the sample_rate key. EmptyLatentAudio (Stable Audio) already follows the convention of not setting this field on latent dicts, and vae.audio_sample_rate_output is the canonical source: comfy/sd.py:831 already populates it for the LTX Audio detection branch. LTX Audio is the only audio VAE in the codebase whose encoder rate differs from its vocoder output rate, which is why this only surfaces here.
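As a rough illustration of the change in shape, the sketch below builds the latent dict with and without the `sample_rate` key. The function names are hypothetical and the latent is a plain nested list rather than a tensor; only the dict keys (`samples`, `sample_rate`, `type`) and the two rates come from this PR.

```python
def empty_latent_before(latents, encoder_rate):
    """Shape of the output dict before the fix (hypothetical helper)."""
    return {"samples": latents, "sample_rate": encoder_rate, "type": "audio"}


def empty_latent_after(latents):
    """Shape of the output dict after the fix: no "sample_rate" key."""
    return {"samples": latents, "type": "audio"}


def resolve_output_rate(latent, vae_output_rate):
    """Assumed decode-side lookup: use the latent's rate if present,
    otherwise fall back to the VAE's output rate."""
    return latent.get("sample_rate", vae_output_rate)


latents = [[0.0] * 4]  # placeholder for the real latent tensor

before = empty_latent_before(latents, encoder_rate=16000)
after = empty_latent_after(latents)

rate_before = resolve_output_rate(before, vae_output_rate=24000)
rate_after = resolve_output_rate(after, vae_output_rate=24000)

print(rate_before)  # encoder rate leaks into the decoded audio
print(rate_after)   # falls back to the vocoder output rate
```

With the key gone, the decode side has only one source for the rate, so the encoder and vocoder rates can differ without affecting playback.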

Tests using VAE Decode Audio node:

Before fix (LTX-2.3 Text-To-Video):

audio_test_00020_.mp4

After fix (LTX-2.3 Text-To-Video):

audio_test_00021_.mp4

Confirm Stable-Audio still works fine:

https://github.com/user-attachments/files/27410472/ComfyUI_00003_.mp3

coderabbitai Bot commented May 5, 2026

Walkthrough

The change removes the extraction of sample_rate from the audio VAE model within the LTXVEmptyLatentAudio.execute method. Specifically, the code no longer reads sample_rate from audio_vae.first_stage_model.sample_rate and the returned node output dictionary no longer includes a "sample_rate" field. The audio latent tensor construction and its return under "samples" with "type": "audio" remain unchanged.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ⚠️ Warning | The PR title states 'make VAEDecodeAudio usable for LTX-2.x generated audio latents' but the actual change is removing the sample_rate key from LTXVEmptyLatentAudio to fix audio playback speed/pitch issues. | Update the PR title to accurately reflect the primary change, such as 'fix: drop sample_rate key from LTXVEmptyLatentAudio output' to match the actual objective of removing the incorrect sample_rate field. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The pull request description clearly explains the problem (encoder vs vocoder sample rate mismatch), the root cause, and the rationale for the fix with supporting evidence. |


drozbay closed this May 5, 2026
drozbay deleted the barishozbay/core-157-drop-sample_rate-key-from-ltxvemptylatentaudio-output-dict branch May 5, 2026 17:19
drozbay restored the barishozbay/core-157-drop-sample_rate-key-from-ltxvemptylatentaudio-output-dict branch May 5, 2026 17:21
drozbay reopened this May 5, 2026
drozbay changed the title from "fix(audio): drop sample_rate key from LTXVEmptyLatentAudio (CORE-157)" to "fix: make VAEDecodeAudio usable for LTX-2.x generated audio latents (CORE-157)" May 5, 2026
comfyanonymous merged commit 41d73ad into Comfy-Org:master May 5, 2026
24 of 25 checks passed