
Support seedvr2 for super resolution #906

Merged
helloyongyang merged 32 commits into ModelTC:main from Musisoul:seedvr_260209
Mar 5, 2026

Conversation

@Musisoul
Collaborator

No description provided.

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the LightX2V framework by adding full support for the SeedVR2 super-resolution model. It enables users to perform video and image super-resolution through a new dedicated task, integrating the model's complex architecture, including its unique attention mechanisms, rotary embeddings, and diffusion scheduling. The changes encompass new configuration options, command-line arguments, and API extensions, alongside helper scripts to facilitate the use of this new capability.

Highlights

  • SeedVR2 Model Integration: Introduced comprehensive support for the SeedVR2 super-resolution model, including its architecture, inference pipeline, and dedicated runner.
  • Super-Resolution Task Support: Added a new 'sr' task type, enabling video and image super-resolution capabilities within the LightX2V framework.
  • New Configuration and Arguments: Included a new configuration file for seedvr2_3b.json and new command-line arguments (--video_path, --sr_ratio) to support super-resolution tasks.
  • Advanced VAE and Diffusion Components: Integrated SeedVR-specific 3D VAE architecture with causal convolutions, rotary embeddings, and a custom diffusion scheduler tailored for super-resolution.
  • API and Scripting Updates: Updated the API schema to include video_path for SR tasks and provided new shell scripts for running and testing SeedVR2 super-resolution.
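The API-schema highlight above can be illustrated with a hedged sketch of an SR request payload. Only the `video_path` field is taken from this PR's schema change; the endpoint URL, port, and the remaining field names are illustrative assumptions.

```python
# Hypothetical sketch of submitting an SR task to a running LightX2V server.
# "video_path" comes from this PR's VideoTaskRequest schema change; the
# endpoint, port, and other field names are illustrative assumptions.
import json
import urllib.request

payload = {
    "task": "sr",                          # new super-resolution task type
    "video_path": "/data/input_360p.mp4",  # input video to upscale
    "sr_ratio": 2.0,                       # assumed upscaling factor
}

def submit(url="http://localhost:8000/v1/tasks"):
    # POST the JSON payload; call this only with a live server.
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)
```

The new scripts/server/post_sr.py presumably performs a request of this shape against the server started by scripts/server/start_server_sr.sh.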


Changelog
  • configs/seedvr/seedvr2_3b.json
    • Added a new configuration file for SeedVR2 3B model parameters.
  • lightx2v/infer.py
    • Imported the new SeedVRRunner.
    • Added 'seedvr2' to the list of available model classes.
    • Included 'sr' (super-resolution) as a new task choice.
    • Introduced '--video_path' argument for specifying input video for SR/V2V tasks.
    • Added '--sr_ratio' argument to define the super-resolution ratio.
  • lightx2v/models/networks/seedvr/__init__.py
    • Exported the new SeedVRNaDiTModel.
  • lightx2v/models/networks/seedvr/dit_v2/__init__.py
    • Exported the 'na' module.
  • lightx2v/models/networks/seedvr/dit_v2/attention.py
    • Added TorchAttention and FlashAttentionVarlen classes for attention mechanisms.
  • lightx2v/models/networks/seedvr/dit_v2/cache.py
    • Added a Cache class for efficient data caching during inference.
  • lightx2v/models/networks/seedvr/dit_v2/diffusion/__init__.py
    • Exported classifier-free guidance utility functions.
  • lightx2v/models/networks/seedvr/dit_v2/diffusion/utils.py
    • Added utility functions for tensor dimension expansion, schedule compatibility assertion, and classifier-free guidance.
  • lightx2v/models/networks/seedvr/dit_v2/na.py
    • Implemented utility functions for flattening, unflattening, concatenating, repeating, packing, unpacking, and windowing tensors.
  • lightx2v/models/networks/seedvr/dit_v2/ops.py
    • Added functions for slicing inputs, gathering outputs, and safe padding operations.
  • lightx2v/models/networks/seedvr/dit_v2/rope.py
    • Implemented various Rotary Embedding classes (RotaryEmbeddingBase, RotaryEmbedding3d, MMRotaryEmbeddingBase, NaMMRotaryEmbedding3d) and a factory function 'get_na_rope'.
  • lightx2v/models/networks/seedvr/dit_v2/rotary_embedding_torch.py
    • Added a comprehensive RotaryEmbedding module with helper functions for applying rotary embeddings.
  • lightx2v/models/networks/seedvr/dit_v2/window.py
    • Provided functions to get windowing operations and create 720P windows for different sizes and shifts.
  • lightx2v/models/networks/seedvr/infer/module_io.py
    • Defined the SeedVRPreInferOutput dataclass for pre-inference data.
  • lightx2v/models/networks/seedvr/infer/post_infer.py
    • Created the SeedVRPostInfer class for handling post-inference processing, including patch output.
  • lightx2v/models/networks/seedvr/infer/pre_infer.py
    • Developed the SeedVRPreInfer class for pre-inference steps like time embedding and patch input.
  • lightx2v/models/networks/seedvr/infer/transformer_infer.py
    • Implemented the SeedVRTransformerInfer class for the core transformer inference logic, including attention and MLP blocks.
  • lightx2v/models/networks/seedvr/infer/utils.py
    • Added utility functions for RMS and Layer normalization without weights, and for applying AdaLN.
  • lightx2v/models/networks/seedvr/model.py
    • Defined the SeedVRNaDiTModel, inheriting from BaseTransformerModel, with SeedVR-specific defaults, inference classes, and custom checkpoint loading.
  • lightx2v/models/networks/seedvr/weights/post_weights.py
    • Created the SeedVRPostWeights class to manage post-processing weights.
  • lightx2v/models/networks/seedvr/weights/pre_weights.py
    • Implemented the SeedVRPreWeights class for managing pre-processing weights, including text and time embeddings.
  • lightx2v/models/networks/seedvr/weights/transformer_weights.py
    • Developed SeedVRTransformerWeights and SeedVRTransformerBlockWeights classes for organizing transformer block weights.
  • lightx2v/models/runners/default_runner.py
    • Updated 'init_modules' to assign '_run_input_encoder_local_sr' for the 'sr' task.
  • lightx2v/models/runners/seedvr/__init__.py
    • Exported the SeedVRRunner.
  • lightx2v/models/runners/seedvr/seedvr_runner.py
    • Added the SeedVRRunner class, extending DefaultRunner, to handle SeedVR-specific model loading, VAE processing, text encoding, and input transformations for super-resolution.
  • lightx2v/models/schedulers/seedvr/__init__.py
    • Exported the SeedVRScheduler.
  • lightx2v/models/schedulers/seedvr/diffusion/__init__.py
    • Exported functions for creating diffusion sampler, timesteps, and schedule from configuration.
  • lightx2v/models/schedulers/seedvr/diffusion/config.py
    • Added functions to create schedule, sampler, and sampling timesteps based on configuration.
  • lightx2v/models/schedulers/seedvr/diffusion/samplers/base.py
    • Defined the abstract Sampler base class and SamplerModelArgs dataclass.
  • lightx2v/models/schedulers/seedvr/diffusion/samplers/euler.py
    • Implemented the EulerSampler for diffusion sampling.
  • lightx2v/models/schedulers/seedvr/diffusion/schedules/base.py
    • Defined the abstract Schedule base class.
  • lightx2v/models/schedulers/seedvr/diffusion/schedules/lerp.py
    • Implemented the LinearInterpolationSchedule for diffusion.
  • lightx2v/models/schedulers/seedvr/diffusion/timesteps/base.py
    • Defined abstract Timesteps and SamplingTimesteps base classes.
  • lightx2v/models/schedulers/seedvr/diffusion/timesteps/sampling/trailing.py
    • Implemented UniformTrailingSamplingTimesteps for sampling.
  • lightx2v/models/schedulers/seedvr/diffusion/types.py
    • Defined PredictionType and SamplingDirection enums.
  • lightx2v/models/schedulers/seedvr/diffusion/utils.py
    • Added utility functions for tensor manipulation and classifier-free guidance in diffusion.
  • lightx2v/models/schedulers/seedvr/scheduler.py
    • Created the SeedVRScheduler class, inheriting from BaseScheduler, with a linear interpolation schedule and Euler sampler.
  • lightx2v/models/video_encoders/hf/seedvr/__init__.py
    • Exported the 'attn_video_vae_v3_s8_c16_t4_inflation_sd3_init' function.
  • lightx2v/models/video_encoders/hf/seedvr/attn_video_vae.py
    • Implemented 3D versions of Upsample, Downsample, ResnetBlock, DownEncoderBlock, UpDecoderBlock, and UNetMidBlock for a 3D VAE architecture.
    • Defined Encoder3D and Decoder3D classes with causal convolution support.
    • Introduced VideoAutoencoderKL and VideoAutoencoderKLWrapper for SeedVR's VAE, including slicing and memory management.
  • lightx2v/models/video_encoders/hf/seedvr/causal_inflation_lib.py
    • Added InflatedCausalConv3d for causal 3D convolutions, along with functions for memory management, weight inflation, and head manipulation.
  • lightx2v/models/video_encoders/hf/seedvr/color_fix.py
    • Provided functions for color correction techniques like adaptive instance normalization and wavelet reconstruction.
  • lightx2v/models/video_encoders/hf/seedvr/common/cache.py
    • Added a Cache class for general-purpose caching.
  • lightx2v/models/video_encoders/hf/seedvr/common/distributed/advanced.py
    • Implemented advanced distributed functions for sequence parallel and model sharding groups.
  • lightx2v/models/video_encoders/hf/seedvr/common/distributed/basic.py
    • Added basic distributed utility functions like getting rank, world size, and device.
  • lightx2v/models/video_encoders/hf/seedvr/common/distributed/ops.py
    • Developed distributed operations for sequence parallel, including all-to-all, slicing, and gathering functions.
  • lightx2v/models/video_encoders/hf/seedvr/common/logger.py
    • Provided a utility function to get a logger with distributed rank information.
  • lightx2v/models/video_encoders/hf/seedvr/common/utils.py
    • Added safe padding and interpolation operations to handle half-precision tensors.
  • lightx2v/models/video_encoders/hf/seedvr/context_parallel_lib.py
    • Implemented functions for causal convolution input slicing and output gathering in a distributed context.
  • lightx2v/models/video_encoders/hf/seedvr/data/image/transforms/area_resize.py
    • Added AreaResize, AreaRandomCrop, and ScaleResize classes for image transformations.
  • lightx2v/models/video_encoders/hf/seedvr/data/image/transforms/divisible_crop.py
    • Implemented a DivisibleCrop class for cropping images to be divisible by a factor.
  • lightx2v/models/video_encoders/hf/seedvr/data/image/transforms/na_resize.py
    • Provided a NaResize function for various resizing modes.
  • lightx2v/models/video_encoders/hf/seedvr/data/image/transforms/side_resize.py
    • Added a SideResize class for resizing images based on their side length.
  • lightx2v/models/video_encoders/hf/seedvr/data/video/transforms/rearrange.py
    • Implemented a Rearrange class for tensor rearrangement using einops.
  • lightx2v/models/video_encoders/hf/seedvr/global_config.py
    • Added functions to get and set a global normalization memory limit.
  • lightx2v/models/video_encoders/hf/seedvr/types.py
    • Defined various types, enums, and named tuples for SeedVR components, including DiagonalGaussianDistribution and MemoryState.
  • lightx2v/pipeline.py
    • Imported SeedVRRunner.
    • Added 'video_path' and 'sr_ratio' parameters to the 'generate' method.
  • lightx2v/server/schema.py
    • Added 'video_path' field to the VideoTaskRequest schema for SR/V2V tasks.
  • lightx2v/server/services/generation/base.py
    • Removed the line setting 'task_data["video_path"]' to 'actual_save_path.name'.
  • lightx2v/server/services/inference/worker.py
    • Modified the return value for 'save_result_path' to directly use 'task_data["save_result_path"]'.
  • lightx2v/utils/input_info.py
    • Added the SRInputInfo dataclass for super-resolution task parameters.
    • Updated 'init_empty_input_info' to return SRInputInfo when the task is 'sr'.
  • scripts/seedvr2/run_seedvr2_3b_sr.sh
    • Added a new shell script to demonstrate running SeedVR2 super-resolution inference.
  • scripts/server/post_sr.py
    • Added a new Python script to post super-resolution requests to the LightX2V server.
  • scripts/server/start_server_sr.sh
    • Added a new shell script to start the LightX2V server configured for SeedVR2 super-resolution tasks.
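Putting the changelog's new CLI surface together, a hypothetical invocation like the one scripts/seedvr2/run_seedvr2_3b_sr.sh might perform can be sketched as follows. The --task, --video_path, and --sr_ratio flags are added by this PR; the entrypoint spelling, the --config_json flag, and the file paths are assumptions.

```python
# Hypothetical sketch of the command run_seedvr2_3b_sr.sh might assemble.
# --task, --video_path, and --sr_ratio come from this PR's infer.py changes;
# the entrypoint, --config_json flag spelling, and paths are assumptions.
import shlex

cmd = [
    "python", "lightx2v/infer.py",
    "--task", "sr",                                      # new SR task choice
    "--config_json", "configs/seedvr/seedvr2_3b.json",   # new config from this PR
    "--video_path", "input_360p.mp4",                    # input video to upscale
    "--sr_ratio", "2.0",                                 # upscaling factor
]
print(shlex.join(cmd))
```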

@gemini-code-assist bot left a comment

Code Review

This pull request adds support for the SeedVR2 model for super-resolution tasks. This is a substantial contribution that includes the model architecture, inference logic, a dedicated runner and scheduler, and example scripts. The overall structure is well-designed and integrates nicely with the existing framework. My review focuses on improving error handling, logging practices, and code clarity. I've also included suggestions for making the code more memory-efficient and maintainable.


# Create generator with specified parameters

pipe.create_generator(config_json="/path/to/LightX2V/configs/seedvr/seedvr2_3b.json")

medium

Using an absolute path with a placeholder is not very portable. It would be better to use a relative path, as was done in the commented-out line 22. This makes the example script easier to run in different environments without modification.

Suggested change
pipe.create_generator(config_json="/path/to/LightX2V/configs/seedvr/seedvr2_3b.json")
pipe.create_generator(config_json="../configs/seedvr/seedvr2_3b.json")


seed = 42
prompt = "A cinematic video of a sunset over the ocean with golden reflections"

medium

The prompt variable is defined and passed to pipe.generate, but for the SeedVR2 super-resolution task, the text embeddings are pre-computed and loaded from files (pos_emb.pt, neg_emb.pt) by the SeedVRRunner. The prompt provided here is not actually used to generate embeddings on the fly. This can be confusing for users. It would be clearer to either remove the unused prompt or add a comment explaining that pre-computed embeddings are used for this model and task.
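One way to address this is the clarifying comment the reviewer suggests, sketched below with the variable names from the example script (the comment text paraphrases the reviewer's explanation):

```python
# Sketch of the reviewer's suggested clarification, using the names from
# the example script above.
seed = 42
# NOTE: for the SeedVR2 SR task this prompt is not encoded at inference
# time; SeedVRRunner loads pre-computed text embeddings from pos_emb.pt
# and neg_emb.pt instead.
prompt = "A cinematic video of a sunset over the ocean with golden reflections"
```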

from lightx2v.models.runners.longcat_image.longcat_image_runner import LongCatImageRunner # noqa: F401
from lightx2v.models.runners.ltx2.ltx2_runner import LTX2Runner # noqa: F401
from lightx2v.models.runners.qwen_image.qwen_image_runner import QwenImageRunner # noqa: F401
from lightx2v.models.runners.seedvr.seedvr_runner import SeedVRRunner # noqa: F401

medium

For better code maintainability and readability, it's a good practice to keep imports sorted alphabetically. The new import for SeedVRRunner could be placed in alphabetical order among the other runner imports.

Comment on lines +296 to +303
    from torchvision.io import read_video

    video, _, _ = read_video(video_path, output_format="TCHW")
    if video.numel() == 0:
        raise ValueError(f"Failed to read video from {video_path}")
    img = (video / 255.0).to(self.init_device)
elif "image_path" in self.input_info.__dataclass_fields__ and self.input_info.image_path:
    from PIL import Image

medium

For better code readability and to adhere to common Python style guidelines (PEP 8), it's recommended to place all imports at the top of the file. Moving from torchvision.io import read_video and from PIL import Image to the top-level imports will make dependencies clearer and avoid repeated import attempts.
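A hedged sketch of the hoisting this comment asks for is shown below. The try/except guards are an added assumption, not part of the suggestion; they keep the module importable when the optional dependencies are absent.

```python
# Sketch of hoisting the local imports to module level, per the review
# comment. The ImportError guards are an added assumption that keeps the
# module importable without the optional dependencies installed.
try:
    from torchvision.io import read_video
except ImportError:  # torchvision treated as optional here
    read_video = None

try:
    from PIL import Image
except ImportError:  # Pillow treated as optional here
    Image = None
```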

# up
reversed_block_out_channels = list(reversed(block_out_channels))
output_channel = reversed_block_out_channels[0]
print(f"slicing_up_num: {slicing_up_num}")

medium

This print statement appears to be for debugging purposes. It should be removed or converted to a logger.debug call before merging to keep the codebase clean and avoid polluting the console output in production.

Suggested change
print(f"slicing_up_num: {slicing_up_num}")
logger.debug(f"slicing_up_num: {slicing_up_num}")

helloyongyang merged commit 78eed75 into ModelTC:main Mar 5, 2026
1 check passed
helloyongyang pushed a commit that referenced this pull request Mar 6, 2026