Your current environment
🐛 Describe the bug
Running

```
vllm serve Qwen/Qwen3-30B-A3B --model-impl=transformers
```

results in the error:

```
File ".../vllm/model_executor/models/transformers.py", line 66, in vllm_flash_attention_forward
    self_attn = attention_instances[module.layer_idx]
TypeError: 'NoneType' object is not subscriptable
```
I believe this bug is specific to the MoE structure, since the dense Qwen3 models can be served normally with the Transformers backend.
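For illustration, here is a minimal sketch of where the failure occurs and how a defensive guard could surface a clearer error. Only the function name and the failing line come from the traceback above; the signature, the guard, and the error message are assumptions made for the sketch, not vLLM's actual implementation:

```python
# Hypothetical sketch, not vLLM's actual code: only the function name and
# the indexing line are taken from the traceback above.
def vllm_flash_attention_forward(module, query, key, value,
                                 attention_instances=None, **kwargs):
    # For MoE models like Qwen3-30B-A3B, attention_instances apparently
    # never gets populated, so indexing it raises:
    #   TypeError: 'NoneType' object is not subscriptable
    if attention_instances is None:
        raise RuntimeError(
            "attention_instances was never initialized; the Transformers "
            "backend may not wire up this MoE architecture yet"
        )
    self_attn = attention_instances[module.layer_idx]
    # The downstream call below is assumed for completeness of the sketch.
    return self_attn.forward(query, key, value, **kwargs)
```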
Before submitting a new issue...
- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.