Serious Issues with Real-Time Streaming #147

mingheyuemankong · 2026-04-08T16:05:14Z


In the official streaming inference implementation, audio is accumulated from the beginning during inference. See:
https://github.com/QwenLM/Qwen3-ASR/blob/main/qwen_asr/inference/qwen3_asr.py#L800-L807

As the conversation progresses, the accumulated audio keeps growing, so each inference pass takes longer, eventually leading to stuttering or even unresponsiveness. This is effectively pseudo-streaming: no incremental inference is performed, which seriously conflicts with the project's real-time streaming claims. Can true streaming be provided?
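To make the concern concrete, here is a rough sketch of how per-step work grows when the whole history is re-fed on every step, compared with feeding only a bounded trailing window. It does not use or reproduce the Qwen3-ASR API: the function names, the 16 kHz sample rate, the 1-second chunk, and the 10-second window are all assumptions chosen for illustration. A bounded window is also not the same thing as true incremental inference, which would reuse cached model state; it only shows what constant per-step cost looks like.

```python
from typing import List

SAMPLE_RATE = 16_000  # assumed sample rate, for illustration only


def accumulated_costs(num_chunks: int, chunk_samples: int) -> List[int]:
    """Samples processed at each step when all audio so far is re-fed."""
    return [chunk_samples * (i + 1) for i in range(num_chunks)]


def windowed_costs(num_chunks: int, chunk_samples: int,
                   max_window_samples: int) -> List[int]:
    """Samples processed at each step when only a bounded trailing window is fed."""
    return [min(chunk_samples * (i + 1), max_window_samples)
            for i in range(num_chunks)]


if __name__ == "__main__":
    chunk = SAMPLE_RATE          # hypothetical 1-second chunks
    window = 10 * SAMPLE_RATE    # hypothetical 10-second cap
    steps = 60                   # one minute of audio

    acc = accumulated_costs(steps, chunk)
    win = windowed_costs(steps, chunk, window)

    # Per-step cost keeps rising in the accumulated case and plateaus in the windowed one.
    print("last step, accumulated:", acc[-1], "samples")
    print("last step, windowed:   ", win[-1], "samples")
    print("total, accumulated:", sum(acc), "samples")
    print("total, windowed:   ", sum(win), "samples")
```

Under these assumptions, total work in the accumulated case grows quadratically with session length, while the windowed case grows linearly, which matches the observed slowdown over a long conversation.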


Replies: 0 comments
