
Audio-video pipeline: concat/split DiT, WAV output, docs #2

Merged
lmangani merged 2 commits into main from audio-video on Mar 19, 2026

@lmangani (Contributor)

  • Add combined AV latent path: patchify video+audio, single DiT forward, split output, Euler step on both; audio latent [T,8,16], n_audio_tok = T_lat
  • Add patchify_audio/unpatchify_audio in ltx_dit.hpp
  • Add --av, --audio-vae (optional), and --out-wav; WAV output (16 kHz) uses a latent-to-waveform fallback while the full audio VAE decoder is not yet used
  • Add docs/AV_PIPELINE.md (design, shapes, CLI)
  • Update README, DEV.md, CLAUDE.md for audio-video branch and AV usage
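A minimal sketch of the token bookkeeping described above, assuming the audio latent is stored row-major as [T_lat, 8, 16] and that each latent time step becomes one token of 8 x 16 = 128 values (consistent with n_audio_tok = T_lat). The functions patchify_audio_sketch, unpatchify_audio_sketch, concat_av and split_av are hypothetical stand-ins, not the patchify_audio/unpatchify_audio signatures in ltx_dit.hpp, and the sketch ignores any projection the real DiT may apply to bring video and audio tokens to a common width.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Tokens = std::vector<std::vector<float>>;  // [n_tok][token_dim]

// Assumed audio latent layout: row-major [T_lat][8][16].
constexpr std::size_t kAudioChannels = 8;
constexpr std::size_t kAudioFreq = 16;
constexpr std::size_t kAudioTokDim = kAudioChannels * kAudioFreq;  // 128

// One token per latent time step, so the token count equals T_lat.
Tokens patchify_audio_sketch(const std::vector<float>& latent, std::size_t t_lat) {
    assert(latent.size() == t_lat * kAudioTokDim);
    Tokens tokens(t_lat);
    for (std::size_t t = 0; t < t_lat; ++t) {
        auto first = latent.begin() + static_cast<std::ptrdiff_t>(t * kAudioTokDim);
        tokens[t].assign(first, first + static_cast<std::ptrdiff_t>(kAudioTokDim));
    }
    return tokens;
}

// Inverse of patchify_audio_sketch: back to a flat [T_lat][8][16] buffer.
std::vector<float> unpatchify_audio_sketch(const Tokens& tokens) {
    std::vector<float> latent;
    latent.reserve(tokens.size() * kAudioTokDim);
    for (const auto& tok : tokens) {
        assert(tok.size() == kAudioTokDim);
        latent.insert(latent.end(), tok.begin(), tok.end());
    }
    return latent;
}

// Build one sequence for a single DiT forward: [video tokens | audio tokens].
Tokens concat_av(const Tokens& video, const Tokens& audio) {
    Tokens seq;
    seq.reserve(video.size() + audio.size());
    seq.insert(seq.end(), video.begin(), video.end());
    seq.insert(seq.end(), audio.begin(), audio.end());
    return seq;
}

// Split the DiT output back into its video and audio parts at n_video_tok.
std::pair<Tokens, Tokens> split_av(const Tokens& seq, std::size_t n_video_tok) {
    assert(n_video_tok <= seq.size());
    auto mid = seq.begin() + static_cast<std::ptrdiff_t>(n_video_tok);
    return {Tokens(seq.begin(), mid), Tokens(mid, seq.end())};
}
```

Because video tokens come first, splitting only needs the video token count; the audio part is whatever remains, which should be exactly T_lat tokens.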

Added 2 commits on March 19, 2026, 00:22
- Add combined AV latent path: patchify video+audio, single DiT forward, split
  output, Euler step on both; audio latent [T,8,16], n_audio_tok = T_lat
- Add patchify_audio/unpatchify_audio in ltx_dit.hpp
- Add --av, --audio-vae (optional), --out-wav; latent-to-waveform fallback
  for WAV (16 kHz) while the full audio VAE decoder is not yet used
- Add docs/AV_PIPELINE.md (design, shapes, CLI)
- Update README, DEV.md, CLAUDE.md for audio-video branch and AV usage

Made-with: Cursor
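After the single DiT forward, the output is split and an Euler step is applied to both latents. The sketch below assumes a flow-matching (rectified-flow) sampler in which the DiT predicts a velocity v and the update is x <- x + (sigma_next - sigma) * v; euler_step and euler_step_av are hypothetical helpers, not code from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One Euler step of an (assumed) flow-matching sampler:
// x <- x + (sigma_next - sigma) * v, with v the predicted velocity.
void euler_step(std::vector<float>& x, const std::vector<float>& v,
                float sigma, float sigma_next) {
    assert(x.size() == v.size());
    const float dt = sigma_next - sigma;
    for (std::size_t i = 0; i < x.size(); ++i) x[i] += dt * v[i];
}

// Apply the same step to the video and audio latents, using the velocities
// obtained by splitting one combined DiT output.
void euler_step_av(std::vector<float>& video, const std::vector<float>& v_video,
                   std::vector<float>& audio, const std::vector<float>& v_audio,
                   float sigma, float sigma_next) {
    euler_step(video, v_video, sigma, sigma_next);
    euler_step(audio, v_audio, sigma, sigma_next);
}
```

Using the same sigma and sigma_next for both calls keeps video and audio on one noise schedule in this sketch.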
- models.sh: --distilled downloads the DiT from distilled/ plus the matching
  distilled VAE (vae/) and text encoders (text_encoders/) from the same repo
- Document that the VAE and text_encoders come from unsloth/LTX-2.3-GGUF
  (vae/, text_encoders/); quick-start now uses the correct VAE for dev vs. distilled
- README, LTX_COMFY_REFERENCE, CLAUDE: single-repo layout and distilled
  file list

Made-with: Cursor
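For the --out-wav path, a 16 kHz file needs only a standard 44-byte RIFF/WAVE header followed by PCM samples. This sketch assumes mono 16-bit PCM and float samples in [-1, 1]; encode_wav_pcm16 is a hypothetical helper, and the latent-to-waveform fallback that produces the samples is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Append little-endian integers and 4-character tags to a byte buffer.
static void put_u32(std::vector<std::uint8_t>& b, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) b.push_back(static_cast<std::uint8_t>((v >> (8 * i)) & 0xFF));
}
static void put_u16(std::vector<std::uint8_t>& b, std::uint16_t v) {
    b.push_back(static_cast<std::uint8_t>(v & 0xFF));
    b.push_back(static_cast<std::uint8_t>((v >> 8) & 0xFF));
}
static void put_tag(std::vector<std::uint8_t>& b, const char* tag) {
    for (int i = 0; i < 4; ++i) b.push_back(static_cast<std::uint8_t>(tag[i]));
}

// Encode mono float samples as 16-bit PCM WAV: 44-byte header + sample data.
std::vector<std::uint8_t> encode_wav_pcm16(const std::vector<float>& samples,
                                           std::uint32_t sample_rate = 16000) {
    assert(sample_rate > 0);
    const std::uint16_t channels = 1, bits = 16;
    const std::uint32_t data_bytes =
        static_cast<std::uint32_t>(samples.size()) * (bits / 8) * channels;
    std::vector<std::uint8_t> b;
    b.reserve(44 + data_bytes);
    put_tag(b, "RIFF"); put_u32(b, 36 + data_bytes); put_tag(b, "WAVE");
    put_tag(b, "fmt "); put_u32(b, 16);                            // fmt chunk size
    put_u16(b, 1);                                                 // 1 = PCM
    put_u16(b, channels);
    put_u32(b, sample_rate);
    put_u32(b, sample_rate * channels * bits / 8);                 // byte rate
    put_u16(b, static_cast<std::uint16_t>(channels * bits / 8));   // block align
    put_u16(b, bits);
    put_tag(b, "data"); put_u32(b, data_bytes);
    for (float s : samples) {
        // Clamp to [-1, 1] and scale to the signed 16-bit range.
        const float c = std::clamp(s, -1.0f, 1.0f);
        const auto q = static_cast<std::int16_t>(std::lround(c * 32767.0f));
        put_u16(b, static_cast<std::uint16_t>(q));
    }
    return b;
}
```

Writing the returned bytes to disk (for example with a std::ofstream opened in binary mode) yields a playable .wav file.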
lmangani merged commit 9788e11 into main on Mar 19, 2026
2 checks passed