feat: Add latent_downscale_factor to LTXVAddGuide for IC-LoRA on small grids (CORE-102) #13565

Open

drozbay wants to merge 1 commit into Comfy-Org:master from drozbay:20260424a_ltx_add_guide

Conversation


@drozbay drozbay commented Apr 26, 2026

Adds latent_downscale_factor to the native LTXVAddGuide node so guide frames can be encoded at a fractional scale and dilated back to full size. This brings the node closer to parity with Lightricks' upstream LTXAddVideoICLoRAGuide from ComfyUI-LTXVideo and enables newer LTX-2 IC-LoRAs trained at fractional scales (less memory during inference).

Default is 1.0 (no scaling) — behavior is byte-identical to master when the input is left at default.
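
For intuition, a minimal sketch of the encode-side half of this (the helper name, tensor layout, and interpolation mode are my assumptions for illustration, not the node's actual code):

```python
import torch.nn.functional as F

def encode_downscaled(vae, frames, latent_w, latent_h, pixels_per_latent, factor):
    # Hypothetical helper: encode guide frames at 1/factor of the target
    # latent grid, assuming frames arrive as [T, H, W, C] in [0, 1].
    small_w, small_h = latent_w // factor, latent_h // factor  # must divide evenly
    frames = F.interpolate(
        frames.movedim(-1, 1),                      # [T, H, W, C] -> [T, C, H, W]
        size=(small_h * pixels_per_latent, small_w * pixels_per_latent),
        mode="bilinear", align_corners=False,
    ).movedim(1, -1)
    return vae.encode(frames)                       # latent spatial size: small_h x small_w
```

The matching dilation back to full latent size is sketched after the bot's walkthrough below.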

Usage with the ltx-2.3-22b-ic-lora-union-control-ref0.5 control LoRA (note the matching latent_downscale_factor=2):

[image: workflow screenshot]

Example workflow:

droz_LTX2_context_win_canny_union_control_v1.json

Expected output:

ComfyUI_00215_.mp4

Input assets:
https://github.com/user-attachments/assets/df01826a-9a28-43cd-9765-f802cd849b18
https://github.com/user-attachments/assets/52e78b8d-682c-4efd-8cde-8258272d03bf

Note: this PR previously also bundled a "causal fix" prepend/strip workaround for mid-video multi-frame guides. That has been split out to its own PR (#13625) per maintainer request.


coderabbitai Bot commented Apr 26, 2026

📝 Walkthrough

Adds a latent_downscale_factor float input to LTXVAddGuide and updates method signatures (encode, execute) to accept it. encode can downscale target latent spatial size before encoding; a new dilate_latent method expands the encoded guide back to full latent spatial dimensions and produces a spatial guide_mask. execute now validates divisibility by the factor, reorders control flow to derive attention metadata from the dilated tensor, and forwards the dilated guide_mask and latent_downscale_factor into keyframe append logic (including the existing causal_fix handling).
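
As a rough sketch of the dilation step described above (shapes and the strided placement are assumptions; per the walkthrough, the actual dilate_latent also returns the spatial guide_mask discussed in the review further down):

```python
import torch

def dilate_latent(guide, factor):
    # Hypothetical sketch: place a (b, c, t, h, w) guide latent encoded at
    # 1/factor spatial size onto a strided grid of the full-size tensor.
    b, c, t, h, w = guide.shape
    dilated = torch.zeros(b, c, t, h * factor, w * factor,
                          dtype=guide.dtype, device=guide.device)
    dilated[..., ::factor, ::factor] = guide   # every factor-th row/column holds a real value
    return dilated
```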

🚥 Pre-merge checks (✅ 4 passed, ❌ 1 warning)

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The pull request description clearly explains the addition of latent_downscale_factor to the LTXVAddGuide node, its purpose for supporting fractional-scale IC-LoRAs, and includes concrete usage examples with workflows and expected outputs. |
| Title check | ✅ Passed | The title accurately describes the primary change (adding latent_downscale_factor to LTXVAddGuide for IC-LoRA support) and is concise and specific. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@comfy_extras/nodes_lt.py`:
- Around line 227-235: Change the latent_downscale_factor input from
io.Float.Input to io.Int.Input (keeping the same default and step semantics) so
the value is always an integer; update the declaration of
latent_downscale_factor accordingly and then remove the now-redundant int(...)
casts where it’s used (in encode, dilate_latent, and append_keyframe) and the
divisibility check that currently casts to int, ensuring all places consume the
integer directly to avoid the encode/dilate mismatch.
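
In sketch form, the suggested swap could look like the following; the import path and exact kwargs are assumptions based on typical ComfyUI v3 node declarations, not verified against this PR:

```python
from comfy_api.latest import io  # assumed import path for the v3 node schema

# Suggested integer declaration (default/min/max/tooltip values are illustrative):
io.Int.Input(
    "latent_downscale_factor",
    default=1, min=1, max=8,
    tooltip="Encode the guide at 1/N of the latent spatial size, then dilate back.",
)
```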
📥 Commits: reviewing files that changed from the base of the PR and between df22bcd and 984578b.
📒 Files selected for processing (1): comfy_extras/nodes_lt.py

@alexisrolland alexisrolland changed the title from "feat: Add latent_downscaler_factor and causal fix to LTXVAddGuide node" to "feat: Add latent_downscaler_factor and causal fix to LTXVAddGuide node (CORE-102)" on Apr 26, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
comfy_extras/nodes_lt.py (1)

378-381: Minor: redundant int(...) casts after sanitization.

max(1, round(latent_downscale_factor)) already yields a Python int (CPython's round() on a float returns int), so the int(...) wrappers on lines 380, 381 (and 418 below) are no-ops. Cleaning them up is purely cosmetic — feel free to ignore.

♻️ Optional cleanup

```diff
         latent_downscale_factor = max(1, round(latent_downscale_factor))
         if latent_downscale_factor > 1:
-            if latent_width % int(latent_downscale_factor) != 0 or latent_height % int(latent_downscale_factor) != 0:
-                raise ValueError(f"Latent spatial size {latent_width}x{latent_height} must be divisible by latent_downscale_factor {int(latent_downscale_factor)}")
+            if latent_width % latent_downscale_factor != 0 or latent_height % latent_downscale_factor != 0:
+                raise ValueError(f"Latent spatial size {latent_width}x{latent_height} must be divisible by latent_downscale_factor {latent_downscale_factor}")
```

…and on the append_keyframe call:

```diff
-            latent_downscale_factor=int(latent_downscale_factor),
+            latent_downscale_factor=latent_downscale_factor,
```
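
The round() claim is easy to check in isolation:

```python
print(type(round(2.0)))      # <class 'int'>: round() without ndigits returns an int
print(type(round(2.0, 0)))   # <class 'float'>: only the ndigits form keeps the float
```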
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy_extras/nodes_lt.py` around lines 378 - 381, The redundant int() casts
around latent_downscale_factor in the divisibility check should be removed:
latent_downscale_factor is already an int after latent_downscale_factor = max(1,
round(latent_downscale_factor)), so update the conditional and ValueError to use
latent_downscale_factor directly (refer to symbols latent_downscale_factor,
latent_width, latent_height); also remove the other no-op int(...) usage
mentioned near the append_keyframe call (and the similar occurrence around line
418) to clean up redundant casts.

📥 Commits: reviewing files that changed from the base of the PR and between 984578b and 0189f36.
📒 Files selected for processing (1): comfy_extras/nodes_lt.py

Comment thread on comfy_extras/nodes_lt.py (outdated)
Comment on lines +383 to +396
```python
# For mid-video multi-frame guides, prepend+strip a throwaway first frame so the VAE's "first latent = 1 pixel frame" asymmetry lands on the discarded slot
time_scale_factor = scale_factors[0]
num_frames_to_keep = ((image.shape[0] - 1) // time_scale_factor) * time_scale_factor + 1
causal_fix = frame_idx == 0 or num_frames_to_keep == 1

if not causal_fix:
    image = torch.cat([image[:1], image], dim=0)

image, t = cls.encode(vae, latent_width, latent_height, image, scale_factors, latent_downscale_factor)

if not causal_fix:
    t = t[:, :, 1:, :, :]
    image = image[1:]
```


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

```sh
# First, let's find and read the relevant file
cd comfy_extras && wc -l nodes_lt.py && head -n 420 nodes_lt.py | tail -n 60
```

Repository: Comfy-Org/ComfyUI
Length of output: 2828

🏁 Script executed:

```sh
# Search for the encode method definition
rg -n "def encode" comfy_extras/nodes_lt.py -A 20
```

Repository: Comfy-Org/ComfyUI
Length of output: 2352

🏁 Script executed:

```sh
# Get broader context around lines 383-396
sed -n '360,420p' comfy_extras/nodes_lt.py | cat -n
```

Repository: Comfy-Org/ComfyUI
Length of output: 3336


Confirm: The non-causal path drops the last input frame for 8k+1-length guides.

Tracing through with a concrete example (17-frame input, time_scale_factor=8, frame_idx > 0):

  1. num_frames_to_keep = 17, causal_fix = False
  2. After prepend: 18 frames
  3. Encode's truncation images[:(18-1)//8*8+1] keeps only the first 17 → frame 18 (the original final frame) is excluded
  4. After strip, latents span frames 1–16 instead of 0–17

So a 17-frame guide yields 2 latents instead of 3 when frame_idx > 0.

Per the PR comment, this may be intentional ("one extra encoded frame overhead" to shift the VAE's asymmetry onto a discarded slot). If so, update the tooltip to clarify that an 8k+1-frame input may yield fewer latents when frame_idx != 0. If unintended, truncate before prepending to preserve all original frames within the encode boundary.
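
The trace replays in a few lines of plain arithmetic (not the node's code):

```python
tsf = 8                                            # time_scale_factor
n = 17                                             # an 8k+1 guide with frame_idx > 0
num_frames_to_keep = ((n - 1) // tsf) * tsf + 1    # 17, so causal_fix stays False
kept = ((n + 1 - 1) // tsf) * tsf + 1              # 17 of the 18 prepended frames survive encode
latents_after_strip = (kept - 1) // tsf + 1 - 1    # 3 latents from encode, 2 after t[:, :, 1:, :, :]
assert (num_frames_to_keep, kept, latents_after_strip) == (17, 17, 2)
```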

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy_extras/nodes_lt.py` around lines 383 - 396, The non-causal branch
currently prepends a throwaway frame after computing num_frames_to_keep, which
causes the encode truncation inside cls.encode to drop the original last frame
for inputs of length 8k+1 (variables: time_scale_factor, num_frames_to_keep,
causal_fix; methods: cls.encode, and the post-encode slices t = t[:, :, 1:, :,
:] and image = image[1:]). If the loss of the last frame is unintended, move the
truncation to happen before prepending: first compute/truncate image =
image[:num_frames_to_keep], then if not causal_fix prepend the throwaway frame,
call cls.encode, and keep the existing post-encode slicing; if the behavior is
intentional instead, update the tooltip/help text to state that inputs of length
8k+1 may yield fewer latents when frame_idx != 0.

@drozbay drozbay changed the title from "feat: Add latent_downscaler_factor and causal fix to LTXVAddGuide node (CORE-102)" to "feat: Add latent_downscale_factor and causal fix to LTXVAddGuide node (CORE-102)" on Apr 26, 2026
@drozbay drozbay force-pushed the 20260424a_ltx_add_guide branch from 0189f36 to b8bd32c on April 29, 2026 at 22:51
@drozbay drozbay changed the title from "feat: Add latent_downscale_factor and causal fix to LTXVAddGuide node (CORE-102)" to "Add latent_downscale_factor to LTXVAddGuide for IC-LoRA on small grids" on Apr 29, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@comfy_extras/nodes_lt.py`:
- Around line 350-368: The dilated mask in dilate_latent currently uses -1.0/1.0
which, after append_keyframe's mask = guide_mask - strength, produces negative
values; change the mask to be in [0,1] by filling with 0.0 for unsampled cells
and setting sampled positions to 1.0 (i.e., replace torch.full(..., -1.0) with
0.0 and keep dilated_mask[..., ::scale, ::scale] = 1.0), and make the identical
change for the other similar dilated_mask creation block elsewhere in this file
so all sparse-guide paths produce a proper noise mask in [0,1].
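
Isolated to the fill value, the described change is small (a sketch with assumed shapes):

```python
import torch

b, t, h, w, scale = 1, 3, 16, 16, 2                 # illustrative shapes
dilated_mask = torch.full((b, 1, t, h, w), -1.0)    # current: unsampled cells sit at -1.0
dilated_mask = torch.zeros(b, 1, t, h, w)           # suggested: 0.0 fill keeps the mask in [0, 1]
dilated_mask[..., ::scale, ::scale] = 1.0           # sampled positions are 1.0 either way
```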
📥 Commits: reviewing files that changed from the base of the PR and between 0189f36 and b8bd32c.
📒 Files selected for processing (1): comfy_extras/nodes_lt.py

@drozbay drozbay changed the title from "Add latent_downscale_factor to LTXVAddGuide for IC-LoRA on small grids" to "Add latent_downscale_factor to LTXVAddGuide for IC-LoRA on small grids (CORE-102)" on Apr 29, 2026
@drozbay drozbay changed the title from "Add latent_downscale_factor to LTXVAddGuide for IC-LoRA on small grids (CORE-102)" to "feat: Add latent_downscale_factor to LTXVAddGuide for IC-LoRA on small grids (CORE-102)" on Apr 29, 2026