Fix: avoid loading SiglipVisionEncoder when not required in HunyuanVideo15Runner #651

Merged

gushiqiao merged 2 commits into ModelTC:main from FredyRivera-dev:main
Dec 23, 2025
Conversation

@FredyRivera-dev
Contributor

When running the HunyuanVideo-1.5 model, the image encoder (SiglipVisionEncoder) was always loaded during model initialization, even when it was not needed. This caused unnecessary memory usage and increased initialization time, especially for t2v workflows.

This PR updates the image encoder loading logic to match the pattern already used in WanRunner.

Problem

  • load_image_encoder() in HunyuanVideo15Runner is called unconditionally during model initialization.

  • The original implementation always instantiated SiglipVisionEncoder, regardless of:

    • The task type (e.g. t2v)
    • The use_image_encoder configuration flag
  • The use_image_encoder flag was only checked later at runtime, not during model loading.

This created an inconsistency between the loading path and the execution path.

Root Cause

The responsibility for checking whether the image encoder is needed was placed only in the runtime logic, while the loading logic ignored both the task type and the use_image_encoder flag. As a result, the image encoder was loaded even when it would never be used.

Solution

Update load_image_encoder() to follow the same pattern used in WanRunner:

  • Initialize image_encoder as None

  • Only load SiglipVisionEncoder when:

    • The task requires an image encoder (i2v, flf2v, animate, s2v)
    • use_image_encoder is enabled

Changes

Original implementation

```python
def load_image_encoder(self):
    siglip_offload = self.config.get("siglip_cpu_offload", self.config.get("cpu_offload"))
    if siglip_offload:
        siglip_device = torch.device("cpu")
    else:
        siglip_device = torch.device(AI_DEVICE)
    image_encoder = SiglipVisionEncoder(
        config=self.config,
        device=siglip_device,
        checkpoint_path=self.config["model_path"],
        cpu_offload=siglip_offload,
    )
```

Updated implementation

```python
def load_image_encoder(self):
    image_encoder = None
    if self.config["task"] in ["i2v", "flf2v", "animate", "s2v"] and self.config.get("use_image_encoder", True):
        siglip_offload = self.config.get("siglip_cpu_offload", self.config.get("cpu_offload"))
        if siglip_offload:
            siglip_device = torch.device("cpu")
        else:
            siglip_device = torch.device(AI_DEVICE)
        image_encoder = SiglipVisionEncoder(
            config=self.config,
            device=siglip_device,
            checkpoint_path=self.config["model_path"],
            cpu_offload=siglip_offload,
        )
    return image_encoder
```

Notes

This is a minimal and safe change that only affects the loading logic. Runtime behavior remains unchanged, except that unused components are no longer loaded into memory.
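As an illustrative sanity check (not part of the PR), the conditional-loading guard can be exercised with a stripped-down stand-in: `DummyEncoder` replaces the real `SiglipVisionEncoder`, and the torch device handling is omitted.

```python
# Simplified stand-in for the guard in load_image_encoder().
# DummyEncoder is a placeholder; the real code constructs SiglipVisionEncoder.
class DummyEncoder:
    def __init__(self, cpu_offload):
        self.cpu_offload = cpu_offload

IMAGE_ENCODER_TASKS = {"i2v", "flf2v", "animate", "s2v"}

def load_image_encoder(config):
    image_encoder = None
    # Only instantiate when the task needs it and the flag is not disabled.
    if config["task"] in IMAGE_ENCODER_TASKS and config.get("use_image_encoder", True):
        siglip_offload = config.get("siglip_cpu_offload", config.get("cpu_offload"))
        image_encoder = DummyEncoder(cpu_offload=siglip_offload)
    return image_encoder

# t2v never instantiates an encoder; i2v still honors the flag.
assert load_image_encoder({"task": "t2v"}) is None
assert load_image_encoder({"task": "i2v"}) is not None
assert load_image_encoder({"task": "i2v", "use_image_encoder": False}) is None
```

This mirrors the structure of the updated implementation above: the early `None` default means callers see the same return type whether or not the encoder was loaded.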

@gemini-code-assist
Contributor

Summary of Changes

Hello @FredyRivera-dev, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses an inefficiency in the HunyuanVideo15Runner where the SiglipVisionEncoder was always loaded during model initialization, regardless of whether it was actually needed for the specific task. By implementing conditional loading logic, similar to WanRunner, this change ensures that the image encoder is only instantiated when required, leading to reduced memory consumption and faster initialization times for workflows that do not utilize it.

Highlights

  • Unnecessary SiglipVisionEncoder loading: The SiglipVisionEncoder was previously loaded unconditionally during HunyuanVideo15Runner initialization, consuming memory and increasing startup time even when not required for certain tasks like t2v.
  • Conditional loading implementation: The load_image_encoder method has been updated to conditionally load the SiglipVisionEncoder only when the task type (i2v, flf2v, animate, s2v) and the use_image_encoder flag indicate its necessity, mirroring the logic in WanRunner.
  • Resource optimization: This change optimizes resource usage by preventing the loading of unused components, thereby reducing memory footprint and improving initialization efficiency without altering runtime behavior.



@gemini-code-assist (bot) left a comment
Code Review

This pull request correctly optimizes the HunyuanVideo15Runner to avoid loading the SiglipVisionEncoder for tasks that do not require it, such as t2v. This is a good improvement for memory usage and initialization time. My review includes a suggestion to refine the list of tasks that trigger the encoder loading to make the optimization more effective and prevent loading the encoder for other unused cases.

```python
        cpu_offload=siglip_offload,
    )
image_encoder = None
if self.config["task"] in ["i2v", "flf2v", "animate", "s2v"] and self.config.get("use_image_encoder", True):
```
Contributor


Severity: medium

While this change correctly prevents loading the image encoder for tasks like t2v, the list of tasks that trigger loading might be too broad, potentially causing the encoder to be loaded unnecessarily in other cases.

Based on the DefaultRunner implementation, it seems:

  • The animate task (_run_input_encoder_local_animate) does not use the image encoder.
  • The s2v task (_run_input_encoder_local_s2v) is not implemented.

Including these tasks in the condition will lead to the image encoder being loaded when it's not used, which is contrary to the goal of this PR.

For improved correctness and maintainability, I suggest:

  1. Using a more precise set of tasks, like {"i2v", "flf2v"}.
  2. Using a set for a more efficient and idiomatic membership check.
  3. Consider defining this set as a class-level constant for better readability if it's used elsewhere.
Suggested change

```diff
-if self.config["task"] in ["i2v", "flf2v", "animate", "s2v"] and self.config.get("use_image_encoder", True):
+if self.config["task"] in {"i2v", "flf2v"} and self.config.get("use_image_encoder", True):
```
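Points 2 and 3 of the suggestion above could be sketched as follows; the class name and `needs_image_encoder` helper are simplified stand-ins, not names from the codebase:

```python
class Runner:  # simplified stand-in for HunyuanVideo15Runner
    # Class-level frozenset constant: tasks that actually consume the
    # image encoder, per the reviewer's narrowed list.
    IMAGE_ENCODER_TASKS = frozenset({"i2v", "flf2v"})

    def __init__(self, config):
        self.config = config

    def needs_image_encoder(self):
        # Set membership is O(1) and reads as intent, not as an ad-hoc list.
        return (self.config["task"] in self.IMAGE_ENCODER_TASKS
                and self.config.get("use_image_encoder", True))

assert Runner({"task": "i2v"}).needs_image_encoder()
assert not Runner({"task": "animate"}).needs_image_encoder()
assert not Runner({"task": "i2v", "use_image_encoder": False}).needs_image_encoder()
```

A `frozenset` also guards against accidental mutation of the task list at runtime, which a plain list does not.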

@FredyRivera-dev
Contributor Author

I've already made the changes so that it doesn't activate in other unnecessary cases either.

@gushiqiao gushiqiao merged commit e139b07 into ModelTC:main Dec 23, 2025
helloyongyang pushed a commit that referenced this pull request Mar 6, 2026
…deo15Runner (#651)

