Dev/omni#577

Merged
helloyongyang merged 17 commits into ModelTC:main from huochaitiantang:dev/omni
Dec 8, 2025

@huochaitiantang huochaitiantang commented Dec 8, 2025

  • Tidy VAReader & OmniVAReader
  • Tidy VARecorder & X264VARecorder
  • VARecorder with stream: use a buffered stream
  • Tidy env vars WORKER_RANK, READER_RANK, RECORDER_RANK
  • Support choosing the voice type
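
For reviewers unfamiliar with the rank variables, here is a minimal sketch of how the three environment variables could be read and validated in one place. The `RankConfig` and `load_rank_config` names and the default of rank 0 are assumptions made for illustration; they are not taken from this PR's diff.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RankConfig:
    """Hypothetical container for the three rank settings."""
    worker_rank: int
    reader_rank: int
    recorder_rank: int


def _read_rank(env, name, default):
    """Parse one rank variable, falling back to `default` when unset or empty."""
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        value = int(raw)
    except ValueError as exc:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from exc
    if value < 0:
        raise ValueError(f"{name} must be non-negative, got {value}")
    return value


def load_rank_config(env=None):
    """Build a RankConfig from a mapping (os.environ by default).

    The default of 0 for every rank is an assumption of this sketch.
    """
    env = os.environ if env is None else env
    return RankConfig(
        worker_rank=_read_rank(env, "WORKER_RANK", 0),
        reader_rank=_read_rank(env, "READER_RANK", 0),
        recorder_rank=_read_rank(env, "RECORDER_RANK", 0),
    )
```

Passing the mapping explicitly keeps the parser testable without mutating `os.environ`, and a malformed value fails at startup with the variable name in the message.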

Summary of Changes

Hello @huochaitiantang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the audio and video processing infrastructure by introducing a VAController to orchestrate reader and recorder components. It enhances streaming capabilities with a new OmniVAReader that integrates with an external chat server for real-time audio, and upgrades existing recorders with buffering and scheduling for smoother output. The changes also streamline distributed processing configurations and introduce a pause mechanism for model execution, leading to a more robust and responsive system for live audio-visual generation.
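
To make "buffering and scheduling" concrete, the following is a rough sketch of a recorder-style publisher that queues frames and lets a dedicated thread emit them at a steady rate. `BufferedPublisher`, its parameters, and the `publish` callback are hypothetical names chosen for this example; the real VARecorder and X264VARecorder code may be organized quite differently.

```python
import queue
import threading
import time


class BufferedPublisher:
    """Hypothetical sketch: buffer frames and publish them at a fixed rate."""

    def __init__(self, publish, fps=25.0, max_buffered=64):
        self._publish = publish            # callback that actually sends a frame
        self._interval = 1.0 / fps         # target spacing between frames
        self._buffer = queue.Queue(maxsize=max_buffered)
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._schedule, daemon=True)

    def start(self):
        self._thread.start()

    def push(self, frame):
        # Blocks when the buffer is full, which applies back-pressure upstream.
        self._buffer.put(frame)

    def close(self):
        # Ask the scheduler to exit once the buffer has been drained.
        self._stop.set()
        self._thread.join()

    def _schedule(self):
        next_deadline = time.monotonic()
        while True:
            try:
                frame = self._buffer.get(timeout=self._interval)
            except queue.Empty:
                if self._stop.is_set():
                    return
                next_deadline = time.monotonic()
                continue
            delay = next_deadline - time.monotonic()
            if delay > 0:
                time.sleep(delay)
            self._publish(frame)
            # Keep a steady cadence, but never schedule into the past.
            next_deadline = max(next_deadline + self._interval, time.monotonic())
```

A caller would construct it with a send function, call `start()`, `push()` frames as the model produces them, and call `close()` at shutdown. Decoupling production from publishing this way lets bursty model output reach the stream at an even pace.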

Highlights

  • Centralized VA Management: Introduced a new VAController class to centralize the initialization and control flow for VAReader and VARecorder instances, simplifying the audio/video processing pipeline.
  • Advanced Streaming Audio Input: Added OmniVAReader for sophisticated streaming audio input, integrating with an external seko-chatter process via ZeroMQ for real-time audio reception and processing, including immediate switching capabilities.
  • Enhanced Recorder Buffering and Scheduling: The VARecorder and X264VARecorder classes now feature stream buffering and a dedicated scheduling thread, allowing for more controlled and efficient real-time publishing of video and audio data.
  • Distributed Rank Configuration: Refactored the use of environment variables (WORKER_RANK, READER_RANK, RECORDER_RANK) to dynamically assign roles and target ranks in distributed processing, improving flexibility and clarity.
  • Model Pause Signal: Implemented a pause_signal mechanism within the BaseRunner to allow the model inference loop to be temporarily paused, particularly useful in dynamic streaming scenarios managed by the VAController.
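
The pause mechanism in the last highlight can be pictured with a small sketch built on `threading.Event`. `PausableRunner` and its methods are hypothetical stand-ins, not the BaseRunner API from this PR, and the sketch assumes that pausing takes effect between segments rather than in the middle of one.

```python
import threading


class PausableRunner:
    """Hypothetical sketch of an inference loop gated by a pause signal."""

    def __init__(self):
        self._resume = threading.Event()
        self._resume.set()                 # start in the running state
        self.finished_segments = []

    def pause(self):
        self._resume.clear()

    def resume(self):
        self._resume.set()

    def run(self, num_segments):
        for index in range(num_segments):
            # Block here while paused; continue as soon as resume() is called.
            self._resume.wait()
            # Placeholder for running the model on one segment.
            self.finished_segments.append(index)
```

In this sketch a controller thread would call `pause()` when no input audio is available and `resume()` when new audio arrives, so the loop never spins while idle.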

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant refactoring to support streaming audio and video by adding a VAController to manage VAReader and VARecorder. It also makes worker, reader, and recorder ranks configurable via environment variables, which increases flexibility. A new OmniVAReader is added for a specific streaming scenario, including logic for pausing and resuming inference.

My review focuses on the correctness and robustness of these new additions. I've found a few critical issues, including typos that would lead to runtime errors and a potential AttributeError due to incorrect object reset. I've also identified some high-severity issues related to logging configuration, environment variable handling, and the use of bare except clauses. Additionally, there are several medium-severity suggestions for improving code clarity and fixing minor bugs.
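
As an illustration of the bare `except` concern, here is a minimal sketch of catching only the failures a handler can actually deal with. The `parse_message` helper and the JSON payload format are assumptions made for this example; they do not reproduce the code under review.

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_message(raw: bytes):
    """Decode one JSON message, returning None when the payload is malformed.

    Only decoding errors are caught. A bare `except:` would also swallow
    KeyboardInterrupt and SystemExit and hide unrelated bugs.
    """
    try:
        return json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        logger.warning("dropping malformed message: %s", exc)
        return None
```

Naming the exceptions keeps shutdown signals and programming errors visible while still tolerating bad input from the network.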

Overall, the direction of the changes is good, but the identified issues should be addressed to ensure the stability and correctness of the new streaming functionality.

@helloyongyang helloyongyang merged commit bc2828b into ModelTC:main Dec 8, 2025
1 check passed
@huochaitiantang huochaitiantang deleted the dev/omni branch December 16, 2025 02:24
helloyongyang pushed a commit that referenced this pull request Mar 6, 2026