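While testing the qwen3-asr model, I connected through vLLM's v1/realtime WebSocket (WS) interface with the context KV cache reuse strategy enabled. After several rounds of dialogue, an anomaly occurs: the ASR transcription does not match what was actually spoken, and the recognition output incorrectly reuses transcription results from earlier rounds.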

The model output is shown in the figure:
Environment: vLLM 0.18.0 on an Ascend 910B chip.
The model launch script is as follows:
vllm serve /appdata/model/qwen3-asr/Qwen3-ASR-1.7B \
--host 0.0.0.0 --port 8001 \
--gpu-memory-utilization 0.9 \
--max-model-len 32768 \
--block-size 128 \
--allowed-local-media-path \
--no-async-scheduling \
--served-model-name qw3-asr \
--mm_processor_cache_type lru \
--mm_processor_cache_gb 0 \
--compilation-config '{"cudagraph_mode": "FULL"}' \
--hf-overrides '{"architectures":["Qwen3ASRRealtimeGeneration"]}' \
--max-num-batched-tokens 16384
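
To help pin down the symptom when reproducing it, here is a minimal sketch that flags rounds whose transcript is wrong and is identical to the output of an earlier round. It does not call vLLM at all: the function name `find_reused_rounds` and both input lists are hypothetical, and it assumes you have recorded, per round, what was actually spoken and what the service returned. Matching is exact (after trimming whitespace), so partial reuse would not be caught.

```python
from typing import List, Tuple


def find_reused_rounds(expected: List[str], transcribed: List[str]) -> List[Tuple[int, int]]:
    """Return (round, earlier_round) index pairs where a wrong transcript
    exactly repeats the transcript produced in an earlier round.

    Hypothetical helper for triage only; it is not part of vLLM.
    """
    if len(expected) != len(transcribed):
        raise ValueError("expected and transcribed must have the same length")

    hits: List[Tuple[int, int]] = []
    for i, (want, got) in enumerate(zip(expected, transcribed)):
        # A correct transcript is not a reuse problem, even if the user
        # happened to say the same thing twice.
        if got.strip() == want.strip():
            continue
        # Look for the first earlier round with the same output.
        for j in range(i):
            if transcribed[j].strip() == got.strip():
                hits.append((i, j))
                break
    return hits


if __name__ == "__main__":
    spoken = ["turn on the light", "what time is it", "play some music"]
    returned = ["turn on the light", "turn on the light", "play some music"]
    print(find_reused_rounds(spoken, returned))  # [(1, 0)]
```

If the flagged rounds disappear when the KV cache reuse strategy is turned off, that would point toward the cache reuse path rather than the audio input.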