Fix: avoid loading SiglipVisionEncoder when not required in HunyuanVideo15Runner #651

gushiqiao merged 2 commits into ModelTC:main
Conversation
Code Review
This pull request correctly optimizes the HunyuanVideo15Runner to avoid loading the SiglipVisionEncoder for tasks that do not require it, such as t2v. This is a good improvement for memory usage and initialization time. My review includes a suggestion to refine the list of tasks that trigger the encoder loading to make the optimization more effective and prevent loading the encoder for other unused cases.
```python
            cpu_offload=siglip_offload,
        )
        image_encoder = None
        if self.config["task"] in ["i2v", "flf2v", "animate", "s2v"] and self.config.get("use_image_encoder", True):
```
While this change correctly prevents loading the image encoder for tasks like `t2v`, the list of tasks that trigger loading might be too broad, potentially causing the encoder to be loaded unnecessarily in other cases.

Based on the `DefaultRunner` implementation, it seems:

- The `animate` task (`_run_input_encoder_local_animate`) does not use the image encoder.
- The `s2v` task (`_run_input_encoder_local_s2v`) is not implemented.

Including these tasks in the condition will lead to the image encoder being loaded when it's not used, which is contrary to the goal of this PR.

For improved correctness and maintainability, I suggest:

- Using a more precise set of tasks, like `{"i2v", "flf2v"}`.
- Using a `set` for a more efficient and idiomatic membership check.
- Considering defining this set as a class-level constant for better readability if it's used elsewhere.
```diff
- if self.config["task"] in ["i2v", "flf2v", "animate", "s2v"] and self.config.get("use_image_encoder", True):
+ if self.config["task"] in {"i2v", "flf2v"} and self.config.get("use_image_encoder", True):
```
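The reviewer's last point, hoisting the task set into a class-level constant, could look roughly like the sketch below. This is a hedged illustration: `IMAGE_ENCODER_TASKS` and `needs_image_encoder` are hypothetical names, not identifiers from the actual repository.

```python
# Sketch of the reviewer's suggestion: a class-level frozenset constant.
# Names are illustrative, not the real repository code.
class HunyuanVideo15Runner:
    # Tasks whose input-encoding path actually consumes the image encoder.
    IMAGE_ENCODER_TASKS = frozenset({"i2v", "flf2v"})

    def __init__(self, config):
        self.config = config

    def needs_image_encoder(self):
        # Set membership is O(1) and states the intent in one place.
        return (
            self.config["task"] in self.IMAGE_ENCODER_TASKS
            and self.config.get("use_image_encoder", True)
        )
```

A `frozenset` also makes the constant immutable, so no caller can accidentally mutate the shared task list.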
I've already made the changes so that it doesn't activate in the other unnecessary cases either.
Fix: avoid loading SiglipVisionEncoder when not required in HunyuanVideo15Runner (#651)

When running the `HunyuanVideo-1.5` model, the image encoder (`SiglipVisionEncoder`) was always loaded during model initialization, even when it was not needed. This caused unnecessary memory usage and increased initialization time, especially for `t2v` workflows. This PR updates the image encoder loading logic to match the pattern already used in `WanRunner`.

### Problem

* `load_image_encoder()` in `HunyuanVideo15Runner` is called unconditionally during model initialization.
* The original implementation always instantiated `SiglipVisionEncoder`, regardless of:
  * the task type (e.g. `t2v`)
  * the `use_image_encoder` configuration flag
* The `use_image_encoder` flag was only checked later at runtime, not during model loading.

This created an inconsistency between the loading path and the execution path.

### Root Cause

The responsibility for checking whether the image encoder is needed was placed only in the runtime logic, while the loading logic ignored both the task type and the `use_image_encoder` flag. As a result, the image encoder was loaded even when it would never be used.

### Solution

Update `load_image_encoder()` to follow the same pattern used in `WanRunner`:

* Initialize `image_encoder` as `None`
* Only load `SiglipVisionEncoder` when:
  * the task requires an image encoder (`i2v`, `flf2v`, `animate`, `s2v`)
  * `use_image_encoder` is enabled

### Changes

#### Original implementation

```python
def load_image_encoder(self):
    siglip_offload = self.config.get("siglip_cpu_offload", self.config.get("cpu_offload"))
    if siglip_offload:
        siglip_device = torch.device("cpu")
    else:
        siglip_device = torch.device(AI_DEVICE)
    image_encoder = SiglipVisionEncoder(
        config=self.config,
        device=siglip_device,
        checkpoint_path=self.config["model_path"],
        cpu_offload=siglip_offload,
    )
```

#### Updated implementation

```python
def load_image_encoder(self):
    image_encoder = None
    if self.config["task"] in ["i2v", "flf2v", "animate", "s2v"] and self.config.get("use_image_encoder", True):
        siglip_offload = self.config.get("siglip_cpu_offload", self.config.get("cpu_offload"))
        if siglip_offload:
            siglip_device = torch.device("cpu")
        else:
            siglip_device = torch.device(AI_DEVICE)
        image_encoder = SiglipVisionEncoder(
            config=self.config,
            device=siglip_device,
            checkpoint_path=self.config["model_path"],
            cpu_offload=siglip_offload,
        )
    return image_encoder
```

### Notes

This is a minimal and safe change that only affects the loading logic. Runtime behavior remains unchanged, except that unused components are no longer loaded into memory.
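The load-time guard described above can be exercised in isolation. The sketch below is a minimal, self-contained model of the pattern: `DummyEncoder` stands in for the heavyweight `SiglipVisionEncoder`, and a plain dict stands in for the runner config, so these names are assumptions rather than the actual API.

```python
# Minimal standalone model of the conditional-loading pattern.
# DummyEncoder is a stand-in for SiglipVisionEncoder; the config is a dict.
IMAGE_ENCODER_TASKS = ["i2v", "flf2v", "animate", "s2v"]


class DummyEncoder:
    """Placeholder for the heavyweight image encoder."""


def load_image_encoder(config):
    image_encoder = None
    if config["task"] in IMAGE_ENCODER_TASKS and config.get("use_image_encoder", True):
        image_encoder = DummyEncoder()
    return image_encoder


# t2v never pays the loading cost; i2v does, unless explicitly disabled.
assert load_image_encoder({"task": "t2v"}) is None
assert isinstance(load_image_encoder({"task": "i2v"}), DummyEncoder)
assert load_image_encoder({"task": "i2v", "use_image_encoder": False}) is None
```

Because the function now always returns (possibly `None`), downstream code only needs the `image_encoder is not None` check it already performs at runtime.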