Steps to Reproduce
For the Qwen3-Next model on the GPU runner, when the range from start_layer to end_layer includes an attention layer, the CUDA graph is not built correctly because of a shape error.
For example: start_layer = 0, end_layer = 4 on an RTX 4090D.
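This sketch shows which layers in a shard would be full-attention layers. It assumes the default Qwen3-Next hybrid layout from its Hugging Face config (`full_attention_interval` = 4, so 0-based layers 3, 7, 11 and so on are full attention and the rest are linear attention). It also assumes end_layer is exclusive. `attention_layers_in_range` is a hypothetical helper for illustration, not part of the runner.

```python
# Hypothetical helper (not part of the runner). Assumes Qwen3-Next's default
# hybrid layout: every `interval`-th layer (counting from 1) is full attention,
# all other layers are linear attention. Assumes end_layer is exclusive.
def attention_layers_in_range(start_layer: int, end_layer: int, interval: int = 4) -> list[int]:
    """Return 0-based indices of full-attention layers in [start_layer, end_layer)."""
    return [i for i in range(start_layer, end_layer) if (i + 1) % interval == 0]


print(attention_layers_in_range(0, 4))  # [3]: the failing example includes an attention layer
print(attention_layers_in_range(0, 3))  # []: this shard has no attention layer
```

Under these assumptions, the failing range start_layer = 0, end_layer = 4 contains layer 3, a full-attention layer.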

Expected Behavior
The model loads successfully and the server starts successfully.
Actual Behavior
The model loads successfully, but the CUDA graph is not built correctly because of a shape error.
Version
main
Environment & Context