
Stream vae #594

Merged

helloyongyang merged 3 commits into ModelTC:main from huochaitiantang:stream_vae on Dec 10, 2025

Conversation

@huochaitiantang (Collaborator)

No description provided.

@gemini-code-assist (Contributor)

Summary of Changes

Hello @huochaitiantang, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant enhancement by implementing a streaming Variational Autoencoder (VAE) decoding mechanism. This new approach allows for more efficient and potentially real-time processing of video and audio segments, which can be toggled via a configuration flag. The changes also include minor dependency adjustments to streamline the codebase and improve module loading.

Highlights

  • Streaming VAE Implementation: A new end_run_segment_stream method has been introduced in wan_audio_runner.py to enable streaming VAE decoding. This method processes video and audio segments incrementally, allowing for more efficient and potentially real-time output.
  • Configurable VAE Decoding: The run_main method in wan_audio_runner.py now includes a conditional check for a use_stream_vae configuration flag. This allows users to switch between the traditional VAE decoding and the newly implemented streaming VAE approach.
  • Dependency Optimization: The torch.distributed import and its associated get_rank_and_world_size method have been removed from wan_audio_runner.py. Additionally, OmniVAReader is now imported lazily inside the next_control and before_control methods in va_controller.py, which can avoid circular dependencies and improve module loading performance.
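The toggle described above can be sketched as follows. This is a hypothetical, minimal illustration based only on the PR description: the class, its constructor, and the segment payloads are stand-ins, and the real end_run_segment, end_run_segment_stream, and run_main methods in wan_audio_runner.py almost certainly have different signatures and bodies.

```python
class AudioRunner:
    """Minimal stand-in for the runner described in the PR (hypothetical)."""

    def __init__(self, config):
        # config is assumed to be a plain dict carrying the use_stream_vae flag.
        self.config = config

    def end_run_segment(self, latents):
        # Traditional path: decode all latent segments in one pass.
        return [f"decoded:{x}" for x in latents]

    def end_run_segment_stream(self, latents):
        # Streaming path: yield each decoded segment incrementally,
        # so downstream consumers can start before decoding finishes.
        for x in latents:
            yield f"decoded:{x}"

    def run_main(self, latents):
        # The PR adds a conditional on a use_stream_vae config flag to
        # select between the two decoding paths.
        if self.config.get("use_stream_vae", False):
            return list(self.end_run_segment_stream(latents))
        return self.end_run_segment(latents)
```

The design point is that both paths produce the same decoded segments; the streaming variant only changes when each segment becomes available to the caller.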
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces streaming VAE functionality, which is a significant feature for improving real-time performance. The overall implementation is good, but I've identified a few areas for improvement. My feedback focuses on moving local imports to the module level for better code structure and performance, adding a missing stop check in the new streaming loop for graceful termination, and correcting a potential memory leak and logic error in how video segments are aggregated. Addressing these points will enhance the robustness and efficiency of the new streaming feature.
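The "missing stop check" concern above can be illustrated with a short sketch. This is not the code from the PR: the function name, the stop_event parameter, and the segment payloads are hypothetical, and it only shows the general pattern of checking a termination signal at the top of each iteration of a streaming loop so the generator exits promptly instead of decoding the remaining segments.

```python
import threading


def stream_decode(segments, stop_event):
    """Hypothetical streaming decode loop with a graceful stop check."""
    for seg in segments:
        # Check the stop signal before doing work on each segment,
        # so a shutdown request terminates the stream promptly.
        if stop_event.is_set():
            break
        yield f"decoded:{seg}"
```

Without the check, a consumer that wants to shut down mid-stream has no way to stop the loop short of abandoning the generator, which is the kind of leak-prone termination path the review flags.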

@helloyongyang helloyongyang merged commit 5b902af into ModelTC:main Dec 10, 2025
1 check passed
@huochaitiantang huochaitiantang deleted the stream_vae branch December 16, 2025 02:24
helloyongyang pushed a commit that referenced this pull request Mar 6, 2026