feat: Add latent_downscale_factor to LTXVAddGuide for IC-LoRA on small grids (CORE-102) #13565
drozbay wants to merge 1 commit into Comfy-Org:master
Conversation
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 warning
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@comfy_extras/nodes_lt.py`:
- Around line 227-235: Change the latent_downscale_factor input from
io.Float.Input to io.Int.Input (keeping the same default and step semantics) so
the value is always an integer; update the declaration of
latent_downscale_factor accordingly and then remove the now-redundant int(...)
casts where it’s used (in encode, dilate_latent, and append_keyframe) and the
divisibility check that currently casts to int, ensuring all places consume the
integer directly to avoid the encode/dilate mismatch.
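A minimal, library-free illustration of why the suggested type change matters (this is a sketch, not the node's actual code): the dilation path relies on strided indexing, and Python rejects a float as a slice step outright, which is exactly the kind of mismatch the scattered `int(...)` casts paper over.

```python
frames = list(range(16))

# An int stride works as a sparse-sampling step...
assert frames[::4] == [0, 4, 8, 12]

# ...but a float stride (what a Float input can deliver) raises:
try:
    frames[::4.0]
except TypeError as e:
    print("float step rejected:", e)

# Declaring the input as an integer up front removes the need
# for defensive int() casts at every use site.
```

With an `Int` input the value is guaranteed integral at the boundary, so the divisibility check and downstream calls can consume it directly.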
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: bae905cd-901e-4bf8-acef-31b59405121e
📒 Files selected for processing (1)
comfy_extras/nodes_lt.py
Actionable comments posted: 1
🧹 Nitpick comments (1)
comfy_extras/nodes_lt.py (1)
378-381: Minor: redundant `int(...)` casts after sanitization.
`max(1, round(latent_downscale_factor))` already yields a Python `int` (CPython's `round()` on a float returns `int`), so the `int(...)` wrappers on lines 380, 381 (and 418 below) are no-ops. Cleaning them up is purely cosmetic; feel free to ignore.

♻️ Optional cleanup

```diff
 latent_downscale_factor = max(1, round(latent_downscale_factor))
 if latent_downscale_factor > 1:
-    if latent_width % int(latent_downscale_factor) != 0 or latent_height % int(latent_downscale_factor) != 0:
-        raise ValueError(f"Latent spatial size {latent_width}x{latent_height} must be divisible by latent_downscale_factor {int(latent_downscale_factor)}")
+    if latent_width % latent_downscale_factor != 0 or latent_height % latent_downscale_factor != 0:
+        raise ValueError(f"Latent spatial size {latent_width}x{latent_height} must be divisible by latent_downscale_factor {latent_downscale_factor}")
```

…and on the `append_keyframe` call:

```diff
-    latent_downscale_factor=int(latent_downscale_factor),
+    latent_downscale_factor=latent_downscale_factor,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@comfy_extras/nodes_lt.py` around lines 378 - 381, The redundant int() casts around latent_downscale_factor in the divisibility check should be removed: latent_downscale_factor is already an int after latent_downscale_factor = max(1, round(latent_downscale_factor)), so update the conditional and ValueError to use latent_downscale_factor directly (refer to symbols latent_downscale_factor, latent_width, latent_height); also remove the other no-op int(...) usage mentioned near the append_keyframe call (and the similar occurrence around line 418) to clean up redundant casts.
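The claim that the casts are no-ops is easy to check in plain Python: `round()` on a float returns an `int` in CPython, so `max(1, round(x))` is already integral before any `int(...)` is applied.

```python
x = 2.0  # a value as it arrives from a float input widget
sanitized = max(1, round(x))

assert type(sanitized) is int          # round() on a float yields int
assert int(sanitized) == sanitized     # so the int(...) wrapper changes nothing
assert max(1, round(0.3)) == 1         # values below 1 clamp to 1 as intended
```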
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@comfy_extras/nodes_lt.py`:
- Around line 383-396: The non-causal branch currently prepends a throwaway
frame after computing num_frames_to_keep, which causes the encode truncation
inside cls.encode to drop the original last frame for inputs of length 8k+1
(variables: time_scale_factor, num_frames_to_keep, causal_fix; methods:
cls.encode, and the post-encode slices t = t[:, :, 1:, :, :] and image =
image[1:]). If the loss of the last frame is unintended, move the truncation to
happen before prepending: first compute/truncate image =
image[:num_frames_to_keep], then if not causal_fix prepend the throwaway frame,
call cls.encode, and keep the existing post-encode slicing; if the behavior is
intentional instead, update the tooltip/help text to state that inputs of length
8k+1 may yield fewer latents when frame_idx != 0.
---
Nitpick comments:
In `@comfy_extras/nodes_lt.py`:
- Around line 378-381: The redundant int() casts around latent_downscale_factor
in the divisibility check should be removed: latent_downscale_factor is already
an int after latent_downscale_factor = max(1, round(latent_downscale_factor)),
so update the conditional and ValueError to use latent_downscale_factor directly
(refer to symbols latent_downscale_factor, latent_width, latent_height); also
remove the other no-op int(...) usage mentioned near the append_keyframe call
(and the similar occurrence around line 418) to clean up redundant casts.
```python
# For mid-video multi-frame guides, prepend+strip a throwaway first frame so the VAE's "first latent = 1 pixel frame" asymmetry lands on the discarded slot
time_scale_factor = scale_factors[0]
num_frames_to_keep = ((image.shape[0] - 1) // time_scale_factor) * time_scale_factor + 1
causal_fix = frame_idx == 0 or num_frames_to_keep == 1

if not causal_fix:
    image = torch.cat([image[:1], image], dim=0)

image, t = cls.encode(vae, latent_width, latent_height, image, scale_factors, latent_downscale_factor)

if not causal_fix:
    t = t[:, :, 1:, :, :]
    image = image[1:]
```
🧩 Analysis chain
🏁 Script executed:

```shell
# First, let's find and read the relevant file
cd comfy_extras && wc -l nodes_lt.py && head -n 420 nodes_lt.py | tail -n 60
```

Repository: Comfy-Org/ComfyUI
Length of output: 2828

🏁 Script executed:

```shell
# Search for the encode method definition
rg -n "def encode" comfy_extras/nodes_lt.py -A 20
```

Repository: Comfy-Org/ComfyUI
Length of output: 2352

🏁 Script executed:

```shell
# Get broader context around lines 383-396
sed -n '360,420p' comfy_extras/nodes_lt.py | cat -n
```

Repository: Comfy-Org/ComfyUI
Length of output: 3336
Confirm: The non-causal path drops the last input frame for 8k+1-length guides.
Tracing through with a concrete example (17-frame input, time_scale_factor=8, frame_idx > 0):

- `num_frames_to_keep = 17`, `causal_fix = False`
- After prepend: 18 frames
- Encode's truncation `images[:(18-1)//8*8+1]` keeps only the first 17 → frame 18 (the original final frame) is excluded
- After strip, latents span frames 1–16 instead of 0–17

So a 17-frame guide yields 2 latents instead of 3 when frame_idx > 0.
Per the PR comment, this may be intentional ("one extra encoded frame overhead" to shift the VAE's asymmetry onto a discarded slot). If so, update the tooltip to clarify that 8k+1 frames input may yield fewer latents when frame_idx != 0. If unintended, truncate before prepending to preserve all original frames within the encode boundary.
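The arithmetic in the trace above can be checked directly in pure Python (no torch needed; `tsf` stands for `time_scale_factor`):

```python
tsf = 8
n_input = 17                                   # 8k+1 frames, k = 2

num_frames_to_keep = ((n_input - 1) // tsf) * tsf + 1
assert num_frames_to_keep == 17                # nothing would need truncating

n_after_prepend = n_input + 1                  # throwaway frame added -> 18
n_kept_by_encode = ((n_after_prepend - 1) // tsf) * tsf + 1
assert n_kept_by_encode == 17                  # encode truncates 18 -> 17,
                                               # dropping the ORIGINAL last frame

# causal VAE: 1 latent for the first frame, then 1 per tsf frames
n_latents = (n_kept_by_encode - 1) // tsf + 1
assert n_latents - 1 == 2                      # 2 latents after stripping the throwaway
assert (n_input - 1) // tsf + 1 == 3           # vs 3 latents when frame_idx == 0
```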
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@comfy_extras/nodes_lt.py` around lines 383 - 396, The non-causal branch
currently prepends a throwaway frame after computing num_frames_to_keep, which
causes the encode truncation inside cls.encode to drop the original last frame
for inputs of length 8k+1 (variables: time_scale_factor, num_frames_to_keep,
causal_fix; methods: cls.encode, and the post-encode slices t = t[:, :, 1:, :,
:] and image = image[1:]). If the loss of the last frame is unintended, move the
truncation to happen before prepending: first compute/truncate image =
image[:num_frames_to_keep], then if not causal_fix prepend the throwaway frame,
call cls.encode, and keep the existing post-encode slicing; if the behavior is
intentional instead, update the tooltip/help text to state that inputs of length
8k+1 may yield fewer latents when frame_idx != 0.
Force-pushed from 0189f36 to b8bd32c
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@comfy_extras/nodes_lt.py`:
- Around line 350-368: The dilated mask in dilate_latent currently uses -1.0/1.0
which, after append_keyframe's mask = guide_mask - strength, produces negative
values; change the mask to be in [0,1] by filling with 0.0 for unsampled cells
and setting sampled positions to 1.0 (i.e., replace torch.full(..., -1.0) with
0.0 and keep dilated_mask[..., ::scale, ::scale] = 1.0), and make the identical
change for the other similar dilated_mask creation block elsewhere in this file
so all sparse-guide paths produce a proper noise mask in [0,1].
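A small sketch of the proposed mask convention, with numpy standing in for torch (the `scale` stride and the `[::scale, ::scale]` assignment mirror the snippet the comment names; everything else here is illustrative):

```python
import numpy as np

scale = 2
latent_h, latent_w = 4, 6

# before (per the comment): a -1.0 fill leaves the mask outside [0, 1]
old_mask = np.full((latent_h, latent_w), -1.0, dtype=np.float32)
old_mask[::scale, ::scale] = 1.0
assert old_mask.min() < 0.0

# proposed: 0.0 for unsampled cells, 1.0 at the sparse sampled positions
new_mask = np.zeros((latent_h, latent_w), dtype=np.float32)
new_mask[::scale, ::scale] = 1.0
assert new_mask.min() >= 0.0 and new_mask.max() <= 1.0
assert new_mask[::scale, ::scale].all()  # sampled grid fully marked
```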
Adds `latent_downscale_factor` to the native `LTXVAddGuide` node so guide frames can be encoded at a fractional scale and dilated back to full size. This brings the node closer to parity with Lightricks' upstream `LTXAddVideoICLoRAGuide` from ComfyUI-LTXVideo and enables newer LTX-2 IC-LoRAs trained at fractional scales (less memory during inference).

Default is `1.0` (no scaling); behavior is byte-identical to master when the input is left at default.

Usage with the ltx-2.3-22b-ic-lora-union-control-ref0.5 control lora (note the matching `latent_downscale_factor=2`):

Example workflow: droz_LTX2_context_win_canny_union_control_v1.json
Expected output:
ComfyUI_00215_.mp4
Input assets:
https://github.com/user-attachments/assets/df01826a-9a28-43cd-9765-f802cd849b18
https://github.com/user-attachments/assets/52e78b8d-682c-4efd-8cde-8258272d03bf
Note: this PR previously also bundled a "causal fix" prepend/strip workaround for mid-video multi-frame guides. That has been split out to its own PR (#13625) per maintainer request.
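The overall idea behind the new input, sketched at the shape level (names like `dilate_latent` mirror the review comments above, but this is an illustrative stand-in using numpy, not the node's real code): with `latent_downscale_factor=2` the guide is encoded on a half-size latent grid, then dilated back by writing those latents into every 2nd cell of a full-size grid.

```python
import numpy as np

def dilate_latent(small, scale):
    """Place a (c, h, w) latent sparsely on a (c, h*scale, w*scale) grid."""
    c, h, w = small.shape
    full = np.zeros((c, h * scale, w * scale), dtype=small.dtype)
    full[:, ::scale, ::scale] = small
    return full

scale = 2
small = np.random.rand(4, 8, 8).astype(np.float32)  # latent encoded at half size
full = dilate_latent(small, scale)

assert full.shape == (4, 16, 16)
assert np.array_equal(full[:, ::scale, ::scale], small)
# with scale=1 (the default) the placement is a no-op, matching master
assert np.array_equal(dilate_latent(small, 1), small)
```

This is why the validation discussed above requires the full latent width and height to be divisible by the factor: otherwise the sparse grid does not tile the full grid evenly.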