Reduce video tiny VAE peak VRAM and decode time (CORE-127) #13617
Kosinkadink merged 3 commits into Comfy-Org:master from
Conversation
📝 Walkthrough

The PR modifies comfy/taesd/taehv.py.

🚥 Pre-merge checks: 4 passed, 1 failed (1 warning)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@comfy/taesd/taehv.py`:
- Around line 199-203: The decode() method in TAEHV hardcodes output_device to
comfy.model_management.intermediate_device(), breaking callers that expect to
override it; add an optional parameter (e.g. output_device=None) to TAEHV.decode
(and any callers/constructors that forward args) that defaults to
comfy.model_management.intermediate_device() when None, and pass that variable
into the apply_model_with_memblocks call (replacing the current hardcoded
comfy.model_management.intermediate_device() usage) so external nodes can supply
a different device while preserving the current default behavior.
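The change the prompt describes is the common optional-argument pattern. Below is a hedged, self-contained sketch of that pattern only — `TAEHVSketch` and the stubbed `intermediate_device()` are stand-ins for illustration, not the actual `taehv.py` code:

```python
# Sketch of the suggested pattern: an optional output_device that falls back
# to the current default only when the caller omits it. intermediate_device()
# is stubbed here; in ComfyUI it lives in comfy.model_management.

def intermediate_device():
    return "cpu"  # stand-in for comfy.model_management.intermediate_device()

class TAEHVSketch:  # hypothetical stand-in for the TAEHV class
    def decode(self, x, output_device=None):
        if output_device is None:
            output_device = intermediate_device()  # preserve the old default
        # Real code would forward output_device to apply_model_with_memblocks;
        # here we just return it to show the resolution logic.
        return output_device

tae = TAEHVSketch()
print(tae.decode(None))                          # "cpu" (default preserved)
print(tae.decode(None, output_device="cuda:0"))  # caller override wins
```

Defaulting to `None` rather than calling `intermediate_device()` in the signature also avoids evaluating the device at import time.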
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f697a2fe-614a-4a9e-99a4-d7d79d21402e
📒 Files selected for processing (1)
comfy/taesd/taehv.py
rattus128
left a comment
LGTM. optional nits only.
Using a boolean for encode vs decode is slightly cleaner than the lambdas IMO as it consolidates all the logic together while solving your slicing problem. But either way.
Agreed and changed.
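The trade-off the reviewer raises can be sketched in a few lines. This is an illustrative toy (hypothetical names, not the actual `taehv.py` code): the lambda style makes each caller supply its own slicing hook, while the boolean style consolidates the mode-specific logic in one place:

```python
# Hedged sketch contrasting the two styles discussed in the review.

def run_with_hook(frames, slice_fn):
    # Lambda style: every caller must pass its own slicing logic.
    return [slice_fn(f) for f in frames]

def run_with_flag(frames, decode):
    # Boolean style: all mode-specific slicing lives inside the worker.
    trim = 1 if decode else 0  # assumed mode-specific trim, for illustration
    return [f[trim:] for f in frames]

frames = [[0, 1, 2], [3, 4, 5]]
print(run_with_hook(frames, lambda f: f[1:]))  # [[1, 2], [4, 5]]
print(run_with_flag(frames, decode=True))      # [[1, 2], [4, 5]]
```

Both produce the same result; the flag version simply keeps the encode/decode branching next to the loop that uses it.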
♻️ Duplicate comments (1)
comfy/taesd/taehv.py (1)
192-194: ⚠️ Potential issue | 🟠 Major

Expose `output_device` override in `decode()` to preserve caller flexibility.

Line 193 hardcodes `intermediate_device()`, so callers can no longer steer decode output placement through `kwargs`. That's a backward-compatibility regression for custom nodes that depended on device control.

♻️ Proposed fix

```diff
 def decode(self, x, **kwargs):
     x = x.unsqueeze(0) if x.ndim == 4 else x  # [T, C, H, W] -> [1, T, C, H, W]
     x = x.movedim(1, 2) if x.shape[1] != self.latent_channels else x  # [B, T, C, H, W] or [B, C, T, H, W]
     x = self.process_in(x).movedim(2, 1)  # [B, C, T, H, W] -> [B, T, C, H, W]
+    output_device = kwargs.get("output_device", comfy.model_management.intermediate_device())
     x = apply_model_with_memblocks(self.decoder, x, self.parallel, self.show_progress_bar,
-                                   output_device=comfy.model_management.intermediate_device(),
+                                   output_device=output_device,
                                    patch_size=self.patch_size, decode=True)
     return x[:, self.frames_to_trim:].movedim(2, 1)
```

As per coding guidelines: `comfy/**` changes should prioritize backward compatibility for custom nodes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@comfy/taesd/taehv.py` around lines 192-194: the call to apply_model_with_memblocks in decode() currently hardcodes output_device=comfy.model_management.intermediate_device(), which prevents callers from overriding output placement. Change decode() to accept and forward an output_device kwarg (defaulting to comfy.model_management.intermediate_device()) and pass that variable into apply_model_with_memblocks (the call that uses self.decoder, x, self.parallel, self.show_progress_bar, patch_size=self.patch_size, decode=True) so callers can override device placement while preserving the original default.
- …apply_model_with_memblocks, instead of accumulating the full output on GPU before returning.
- …hooks so they run on GPU one frame at a time, avoiding the full-video spike.
- …post-movedim) input; chunk along the time dim with views instead.
- Bit-exact vs. previous code (fp16).
- LTX2, 1024×1024, T_in=128:
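The streaming idea behind these points can be illustrated without a real model. The sketch below (a toy `decode_chunk` and `to_output_device`, all names hypothetical) offloads each decoded chunk to the output device as soon as it is produced, so peak residency on the compute device is one chunk rather than the whole video:

```python
# Hedged sketch of streaming decode: slice the time dimension with views,
# decode one chunk at a time, and move each result to the output device
# immediately instead of accumulating the full output first.

def decode_streaming(latents, decode_chunk, to_output_device, chunk=1):
    out = []
    peak = 0  # tracks the largest number of frames resident at once
    for i in range(0, len(latents), chunk):
        frames = decode_chunk(latents[i:i + chunk])  # slice, not a full copy
        peak = max(peak, len(frames))                # at most one chunk live
        out.extend(to_output_device(f) for f in frames)  # offload right away
    return out, peak

# Toy run: "decoding" doubles each latent; "offloading" is the identity here.
decoded, peak = decode_streaming(list(range(8)),
                                 lambda c: [x * 2 for x in c],
                                 lambda f: f, chunk=2)
print(decoded)  # [0, 2, 4, 6, 8, 10, 12, 14]
print(peak)     # 2 -> only one chunk ever resident before offload
```

In the real code the offload step would be a tensor transfer to `output_device`; the point of the pattern is that peak memory scales with the chunk size, not the video length.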