Support seedvr2 for super resolution#906
Conversation
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly expands the LightX2V framework by adding full support for the SeedVR2 super-resolution model. It enables users to perform video and image super-resolution through a new dedicated task, integrating the model's complex architecture, including its unique attention mechanisms, rotary embeddings, and diffusion scheduling. The changes encompass new configuration options, command-line arguments, and API extensions, alongside helper scripts to facilitate the use of this new capability.
Code Review
This pull request adds support for the SeedVR2 model for super-resolution tasks. This is a substantial contribution that includes the model architecture, inference logic, a dedicated runner and scheduler, and example scripts. The overall structure is well-designed and integrates nicely with the existing framework. My review focuses on improving error handling, logging practices, and code clarity. I've also included suggestions for making the code more memory-efficient and maintainable.
```python
# Create generator with specified parameters
pipe.create_generator(config_json="/path/to/LightX2V/configs/seedvr/seedvr2_3b.json")
```
Using an absolute path with a placeholder is not very portable. It would be better to use a relative path, as was done in the commented-out line 22. This makes the example script easier to run in different environments without modification.
```diff
- pipe.create_generator(config_json="/path/to/LightX2V/configs/seedvr/seedvr2_3b.json")
+ pipe.create_generator(config_json="../configs/seedvr/seedvr2_3b.json")
```
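The suggested relative path could also be assembled with `pathlib` rather than a string literal — a minimal sketch, assuming the layout implied by the suggestion (`configs/seedvr/` one level above the example script):

```python
from pathlib import Path

# Build the relative config path instead of hard-coding an absolute
# placeholder, so the example runs unmodified from a repository checkout.
# The directory layout is assumed from the suggested
# "../configs/seedvr/seedvr2_3b.json".
config_json = Path("..") / "configs" / "seedvr" / "seedvr2_3b.json"
print(config_json.as_posix())
```

`as_posix()` keeps the separator stable across platforms, which matters if the path string is later compared or logged.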
```python
pipe.create_generator(config_json="/path/to/LightX2V/configs/seedvr/seedvr2_3b.json")

seed = 42
prompt = "A cinematic video of a sunset over the ocean with golden reflections"
```
The prompt variable is defined and passed to pipe.generate, but for the SeedVR2 super-resolution task, the text embeddings are pre-computed and loaded from files (pos_emb.pt, neg_emb.pt) by the SeedVRRunner. The prompt provided here is not actually used to generate embeddings on the fly. This can be confusing for users. It would be clearer to either remove the unused prompt or add a comment explaining that pre-computed embeddings are used for this model and task.
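The second option could look like the sketch below. The file names (`pos_emb.pt`, `neg_emb.pt`) and the `SeedVRRunner` behavior come from the review comment above; the surrounding variables are those of the example script:

```python
seed = 42
# NOTE: for the SeedVR2 super-resolution task, text embeddings are
# pre-computed and loaded from files (pos_emb.pt, neg_emb.pt) by
# SeedVRRunner; this prompt is NOT used to generate embeddings on the fly.
prompt = "A cinematic video of a sunset over the ocean with golden reflections"
```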
```python
from lightx2v.models.runners.longcat_image.longcat_image_runner import LongCatImageRunner  # noqa: F401
from lightx2v.models.runners.ltx2.ltx2_runner import LTX2Runner  # noqa: F401
from lightx2v.models.runners.qwen_image.qwen_image_runner import QwenImageRunner  # noqa: F401
from lightx2v.models.runners.seedvr.seedvr_runner import SeedVRRunner  # noqa: F401
```
```python
    from torchvision.io import read_video

    video, _, _ = read_video(video_path, output_format="TCHW")
    if video.numel() == 0:
        raise ValueError(f"Failed to read video from {video_path}")
    img = (video / 255.0).to(self.init_device)
elif "image_path" in self.input_info.__dataclass_fields__ and self.input_info.image_path:
    from PIL import Image
```
For better code readability and to adhere to common Python style guidelines (PEP 8), it's recommended to place all imports at the top of the file. Moving from torchvision.io import read_video and from PIL import Image to the top-level imports will make dependencies clearer and avoid repeated import attempts.
```python
# up
reversed_block_out_channels = list(reversed(block_out_channels))
output_channel = reversed_block_out_channels[0]
print(f"slicing_up_num: {slicing_up_num}")
```
This print statement appears to be for debugging purposes. It should be removed or converted to a logger.debug call before merging to keep the codebase clean and avoid polluting the console output in production.
```diff
- print(f"slicing_up_num: {slicing_up_num}")
+ logger.debug(f"slicing_up_num: {slicing_up_num}")
```
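If the module does not already expose a `logger`, a standard-library sketch of the suggestion is below. Only the original `print` statement comes from the diff; the project may instead use its own logging facility (e.g. loguru), in which case only the import and logger setup differ:

```python
import logging

logger = logging.getLogger(__name__)

# Illustrative value; in the real decoder this is derived from the block
# configuration before the upsampling loop.
slicing_up_num = 3
# Silent under the default WARNING level; visible when DEBUG is enabled.
logger.debug(f"slicing_up_num: {slicing_up_num}")
```

This keeps the diagnostic available during development without polluting normal console output.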
No description provided.