Serious Issues with Real-Time Streaming #147
Unanswered
mingheyuemankong asked this question in Q&A
Replies: 0 comments
In the official streaming inference implementation, audio is accumulated from the beginning of the stream and the whole buffer is used for inference at each step. See:
https://github.com/QwenLM/Qwen3-ASR/blob/main/qwen_asr/inference/qwen3_asr.py#L800-L807
As the conversation goes on, the accumulated audio keeps growing, which eventually causes stuttering or even unresponsiveness. The streaming is therefore effectively pseudo-streaming: it does not perform incremental inference, which seriously conflicts with what the project claims about streaming. Could true streaming with incremental inference be provided?
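To make the concern concrete, here is a rough back-of-the-envelope sketch (not the Qwen3-ASR code) that compares how much audio the model has to process in total when every step re-processes everything from the start versus when each step only sees a bounded window of recent audio. The function names, the 16 kHz sample rate, the 1-second chunk size, and the 30-chunk window are all my own assumptions for illustration. The bounded window is only one possible alternative and would lose long-range context.

```python
# Hypothetical illustration only: none of these names come from Qwen3-ASR.

def samples_processed_cumulative(num_chunks: int, chunk_samples: int) -> int:
    """Total samples fed to the model if step k re-processes all k chunks so far."""
    return sum(step * chunk_samples for step in range(1, num_chunks + 1))


def samples_processed_windowed(num_chunks: int, chunk_samples: int, window_chunks: int) -> int:
    """Total samples fed to the model if each step sees at most `window_chunks` recent chunks."""
    return sum(min(step, window_chunks) * chunk_samples for step in range(1, num_chunks + 1))


if __name__ == "__main__":
    SAMPLE_RATE = 16_000        # assumed sample rate (Hz)
    CHUNK_SAMPLES = SAMPLE_RATE  # assumed 1-second chunks
    WINDOW_CHUNKS = 30           # assumed 30-second bounded window

    for minutes in (1, 10, 60):
        n = minutes * 60  # number of 1-second chunks in the session
        full = samples_processed_cumulative(n, CHUNK_SAMPLES)
        bounded = samples_processed_windowed(n, CHUNK_SAMPLES, WINDOW_CHUNKS)
        print(f"{minutes:>3} min: cumulative={full:,} windowed={bounded:,} "
              f"ratio={full / bounded:.1f}x")
```

With accumulation from the beginning, the total work grows quadratically with session length, while with a bounded window it grows linearly, which matches the stuttering I see in long conversations.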