Dev/omni by huochaitiantang · Pull Request #577 · ModelTC/LightX2V

huochaitiantang · 2025-12-08T05:18:44Z

Tidy VAReader & OmniVAReader
Tidy VARecorder & X264VARecorder
VARecorder with stream, use buffer stream
Tidy env WORKER_RANK, READER_RANK, RECORDER_RANK
Support choosing the voice type

gemini-code-assist · 2025-12-08T05:19:10Z

Summary of Changes

Hello @huochaitiantang, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the audio and video processing infrastructure by introducing a VAController to orchestrate reader and recorder components. It enhances streaming capabilities with a new OmniVAReader that integrates with an external chat server for real-time audio, and upgrades existing recorders with buffering and scheduling for smoother output. The changes also streamline distributed processing configurations and introduce a pause mechanism for model execution, leading to a more robust and responsive system for live audio-visual generation.
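To make the "buffering and scheduling" idea concrete, here is a minimal sketch of a buffer that releases frames at a fixed rate. It is not taken from the PR: the class name `PacedBuffer`, its methods, and the clock passed in by the caller are all hypothetical, and the real recorders may schedule output quite differently (for example from a dedicated thread).

```python
from collections import deque


class PacedBuffer:
    """Hypothetical buffer that releases frames at a fixed rate (illustration only)."""

    def __init__(self, fps, start_time=0.0):
        self.interval = 1.0 / fps      # seconds between two published frames
        self.next_due = start_time     # time at which the next frame may go out
        self.frames = deque()

    def push(self, frame):
        # The producer side (e.g. the model) appends frames as they are generated.
        self.frames.append(frame)

    def pop_due(self, now):
        """Return every buffered frame whose scheduled publish time is <= now."""
        out = []
        while self.frames and self.next_due <= now:
            out.append(self.frames.popleft())
            self.next_due += self.interval
        return out
```

A scheduling thread would call `pop_due(time.monotonic())` in a loop and publish whatever comes back. One simplification to be aware of: if the buffer runs dry, `next_due` falls behind and the next frames are released in a burst, so a real implementation would likely reset the schedule after an underrun.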

Highlights

Centralized VA Management: Introduced a new VAController class to centralize the initialization and control flow for VAReader and VARecorder instances, simplifying the audio/video processing pipeline.
Advanced Streaming Audio Input: Added OmniVAReader for sophisticated streaming audio input, integrating with an external seko-chatter process via ZeroMQ for real-time audio reception and processing, including immediate switching capabilities.
Enhanced Recorder Buffering and Scheduling: The VARecorder and X264VARecorder classes now feature stream buffering and a dedicated scheduling thread, allowing for more controlled and efficient real-time publishing of video and audio data.
Distributed Rank Configuration: Refactored the use of environment variables (WORKER_RANK, READER_RANK, RECORDER_RANK) to dynamically assign roles and target ranks in distributed processing, improving flexibility and clarity.
Model Pause Signal: Implemented a pause_signal mechanism within the BaseRunner to allow the model inference loop to be temporarily paused, particularly useful in dynamic streaming scenarios managed by the VAController.
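
The pause mechanism in the last highlight can be pictured with a small sketch. This is not the code from the PR: `PausableRunner`, `run_segments`, and the choice of `threading.Event` are assumptions made for illustration; only the idea of a `pause_signal` that the inference loop checks comes from the summary above.

```python
import threading
import time


class PausableRunner:
    """Hypothetical runner whose loop can be paused between segments."""

    def __init__(self):
        # When the event is set, the loop is paused; a controller toggles it.
        self.pause_signal = threading.Event()
        self.processed = []

    def run_segments(self, segments, poll_interval=0.01):
        for seg in segments:
            # Poll while paused so the loop resumes soon after the flag clears.
            while self.pause_signal.is_set():
                time.sleep(poll_interval)
            # Stand-in for one inference step on this segment.
            self.processed.append(seg)
        return self.processed
```

In this sketch a controller would call `runner.pause_signal.set()` when no audio is available and `runner.pause_signal.clear()` when input resumes. Checking the flag only between segments keeps each step atomic, at the cost of pausing no faster than one segment.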


¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a significant refactoring to support streaming audio and video by adding a VAController to manage VAReader and VARecorder. It also makes worker, reader, and recorder ranks configurable via environment variables, which increases flexibility. A new OmniVAReader is added for a specific streaming scenario, including logic for pausing and resuming inference.
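
As an illustration of rank configuration through environment variables, the sketch below reads `WORKER_RANK`, `READER_RANK`, and `RECORDER_RANK` and decides which roles the current process holds. The helper names `read_rank` and `resolve_roles`, the default of 0, and the returned dictionary are assumptions for this example; the PR's actual parsing and defaults are not shown on this page.

```python
import os


def read_rank(name, default=0, env=None):
    """Return the integer rank stored in `name`, or `default` if it is unset."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        return int(raw)
    except ValueError as exc:
        # Fail loudly with the variable name instead of a bare int() error.
        raise ValueError(f"{name} must be an integer, got {raw!r}") from exc


def resolve_roles(current_rank, env=None):
    """Map the current process rank to hypothetical worker/reader/recorder roles."""
    return {
        "is_worker": current_rank == read_rank("WORKER_RANK", 0, env),
        "is_reader": current_rank == read_rank("READER_RANK", 0, env),
        "is_recorder": current_rank == read_rank("RECORDER_RANK", 0, env),
    }
```

Passing `env` explicitly keeps the helpers easy to test; in production code the default `os.environ` would be used.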

My review focuses on the correctness and robustness of these new additions. I've found a few critical issues, including typos that would lead to runtime errors and a potential AttributeError due to incorrect object reset. I've also identified some high-severity issues related to logging configuration, environment variable handling, and the use of bare except clauses. Additionally, there are several medium-severity suggestions for improving code clarity and fixing minor bugs.
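
The concern about bare `except` clauses can be shown with a short, generic example. The function `pop_chunk` and the use of `queue.Queue` are hypothetical and not taken from the PR; the point is only that a bare `except:` also catches `KeyboardInterrupt` and `SystemExit`, whereas naming the expected exception lets everything else propagate.

```python
import queue


def pop_chunk(buffer, timeout=0.1):
    """Return the next chunk, or None if nothing arrives within `timeout`."""
    try:
        return buffer.get(timeout=timeout)
    except queue.Empty:
        # Expected condition: the stream simply has no data yet.
        # Any other exception (including Ctrl-C) is deliberately not caught.
        return None
```

With this shape, an empty buffer is handled quietly while genuine bugs still surface with a traceback.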

Overall, the direction of the changes is good, but the identified issues should be addressed to ensure the stability and correctness of the new streaming functionality.

lightx2v/deploy/common/va_reader_omni.py

lightx2v/models/runners/wan/wan_audio_runner.py

lightx2v/deploy/common/va_reader_omni.py

lightx2v/deploy/common/va_controller.py

lightx2v/deploy/common/va_reader_omni.py

lightx2v/deploy/common/va_recorder_x264.py


huochaitiantang added 14 commits October 31, 2025 16:07

Update with omni opt, immediate switch blank -> voice

ffb59b6

Fix trunc stream buffer failed & add huoshan tts args

db85978

Merge branch 'main' into dev/omni

008578a

Merge branch 'main' into dev/omni

89bacc4

Merge branch 'main' into dev/omni

04c11dd

Merge branch 'main' of https://github.com/ModelTC/LightX2V into dev/omni

ffa3ef0

Update seko-chatter version

b7e6167

Merge branch 'main' into dev/omni

184d728

support vsr stream

13bfb52

Fix old key valid

9b6f6b6

Merge branch 'main' into dev/omni

f2fb899

Unify omni & main with VAReader & VARecorder

71908be

fix stuck error

d25d0b7

Fix va_recorder_x264 whip

7a5985d

gemini-code-assist bot reviewed Dec 8, 2025


huochaitiantang added 3 commits December 8, 2025 13:33

Fix typo

e0ee9fa

Fix typo

ea1e4cc

Code lint

d6e9c76

helloyongyang approved these changes Dec 8, 2025


helloyongyang merged commit bc2828b into ModelTC:main Dec 8, 2025
1 check passed

huochaitiantang deleted the dev/omni branch December 16, 2025 02:24

helloyongyang pushed a commit that referenced this pull request Mar 6, 2026

Dev/omni (#577)

77f0e97


helloyongyang merged 17 commits into ModelTC:main from huochaitiantang:dev/omni
