LTXVEmptyLatentAudio writes sample_rate = int(audio_vae.first_stage_model.sample_rate) into the empty latent dict, but that value is the encoder's internal mel rate (16000 Hz on LTXV-2), not the vocoder output rate (24000 Hz). vae_decode_audio prefers samples["sample_rate"] over vae.audio_sample_rate_output, so the standard VAEDecodeAudio tags audio at the encoder rate and playback runs ~33% slower with pitch shifted down a perfect fifth. LTXVAudioVAEDecode dodges this by reading first_stage_model.output_sample_rate directly.
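The "~33% slower" and "perfect fifth" figures follow directly from the ratio of the two rates. A minimal sketch of that arithmetic, using only the 16000 Hz and 24000 Hz values quoted above (the helper name `mislabel_effect` is made up for illustration and is not part of ComfyUI):

```python
import math

ENCODER_RATE_HZ = 16_000  # rate that was written into the latent dict
VOCODER_RATE_HZ = 24_000  # rate the decoded waveform is actually sampled at


def mislabel_effect(actual_hz: int, tagged_hz: int) -> tuple[float, float]:
    """Return (playback speed factor, pitch shift in semitones) when audio
    sampled at actual_hz is played back as if it were tagged_hz."""
    speed = tagged_hz / actual_hz
    semitones = 12 * math.log2(speed)
    return speed, semitones


speed, semitones = mislabel_effect(VOCODER_RATE_HZ, ENCODER_RATE_HZ)
print(f"playback speed: {speed:.3f}x")            # about two thirds of real time
print(f"clip duration:  {1 / speed:.2f}x longer")
print(f"pitch shift:    {semitones:.2f} semitones")
```

A frequency ratio of 2:3 is a just perfect fifth, roughly 7 semitones, which matches the pitch drop described.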
Fix is to drop the sample_rate key. EmptyLatentAudio (Stable Audio) already follows the convention of not setting this field on latent dicts, and vae.audio_sample_rate_output is the canonical source: comfy/sd.py:831 already populates it for the LTX Audio detection branch. LTX Audio is the only audio VAE in the codebase whose encoder rate differs from its vocoder output rate, which is why this only surfaces here.
The change removes the extraction of sample_rate from the audio VAE model within the LTXVEmptyLatentAudio.execute method. Specifically, the code no longer reads sample_rate from audio_vae.first_stage_model.sample_rate and the returned node output dictionary no longer includes a "sample_rate" field. The audio latent tensor construction and its return under "samples" with "type": "audio" remain unchanged.
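To illustrate why removing the key is sufficient, here is a simplified stand-in for the precedence described above. It is a sketch, not ComfyUI's actual `vae_decode_audio`: `FakeVAE` and `resolve_output_rate` are hypothetical names, and the only assumption carried over from the description is that a `"sample_rate"` key on the latent dict wins over `vae.audio_sample_rate_output`.

```python
from dataclasses import dataclass


@dataclass
class FakeVAE:
    # Hypothetical stand-in for the VAE wrapper; only the attribute
    # named in the description is modelled.
    audio_sample_rate_output: int


def resolve_output_rate(latent: dict, vae: FakeVAE) -> int:
    """Prefer the latent's own "sample_rate" when present, otherwise
    fall back to the VAE's output rate (the assumed precedence)."""
    return latent.get("sample_rate", vae.audio_sample_rate_output)


vae = FakeVAE(audio_sample_rate_output=24_000)

# Before the fix: the empty latent carries the encoder's 16000 Hz rate.
latent_before = {"samples": None, "type": "audio", "sample_rate": 16_000}
# After the fix: no "sample_rate" key, so the VAE's output rate is used.
latent_after = {"samples": None, "type": "audio"}

rate_before = resolve_output_rate(latent_before, vae)
rate_after = resolve_output_rate(latent_after, vae)
print(rate_before, rate_after)
```

With the key gone, the decode path has only one source for the rate, so the encoder value can no longer shadow the vocoder value.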
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ⚠️ Warning | The PR title states 'make VAEDecodeAudio usable for LTX-2.x generated audio latents' but the actual change is removing the sample_rate key from LTXVEmptyLatentAudio to fix audio playback speed/pitch issues. | Update the PR title to accurately reflect the primary change, such as 'fix: drop sample_rate key from LTXVEmptyLatentAudio output' to match the actual objective of removing the incorrect sample_rate field. |
✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The pull request description clearly explains the problem (encoder vs vocoder sample rate mismatch), the root cause, and the rationale for the fix with supporting evidence. |
drozbay changed the title from "fix(audio): drop sample_rate key from LTXVEmptyLatentAudio (CORE-157)" to "fix: make VAEDecodeAudio usable for LTX-2.x generated audio latents (CORE-157)" on May 5, 2026
Summary
LTXVEmptyLatentAudio writes sample_rate = int(audio_vae.first_stage_model.sample_rate) into the empty latent dict, but that value is the encoder's internal mel rate (16000 Hz on LTXV-2), not the vocoder output rate (24000 Hz). vae_decode_audio prefers samples["sample_rate"] over vae.audio_sample_rate_output, so the standard VAEDecodeAudio tags audio at the encoder rate and playback runs ~33% slower with pitch shifted down a perfect fifth. LTXVAudioVAEDecode dodges this by reading first_stage_model.output_sample_rate directly.
Fix is to drop the sample_rate key. EmptyLatentAudio (Stable Audio) already follows the convention of not setting this field on latent dicts, and vae.audio_sample_rate_output is the canonical source: comfy/sd.py:831 already populates it for the LTX Audio detection branch. LTX Audio is the only audio VAE in the codebase whose encoder rate differs from its vocoder output rate, which is why this only surfaces here.
Tests using VAE Decode Audio node:
Before fix (LTX-2.3 Text-To-Video):
audio_test_00020_.mp4
After fix (LTX-2.3 Text-To-Video):
audio_test_00021_.mp4
Confirm Stable-Audio still works fine:
https://github.com/user-attachments/files/27410472/ComfyUI_00003_.mp3