[Bug]: Qwen3-Next cuda graph build error #101

@yuhao-zh

Description

Steps to Reproduce

For the Qwen3-Next model on the GPU runner, when the range from start_layer to end_layer includes an attention layer, the CUDA graph fails to build correctly because of a shape error.

For example: start_layer = 0, end_layer = 4 on a 4090D.
[Image attached to the original issue]
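
To check whether a given layer range hits this case, here is a minimal sketch. It assumes that every fourth layer of Qwen3-Next is a full-attention layer (with the remaining layers using linear attention) and that end_layer is exclusive. Neither assumption is stated in this issue, and the helper name `full_attention_layers` is hypothetical, not part of the project.

```python
def full_attention_layers(start_layer: int, end_layer: int, interval: int = 4) -> list[int]:
    """List the layer indices in [start_layer, end_layer) assumed to be full attention.

    Assumption: layer i is a full-attention layer when (i + 1) is a multiple
    of `interval`; all other layers are treated as linear attention.
    """
    return [i for i in range(start_layer, end_layer) if (i + 1) % interval == 0]


# The range from this report includes one attention layer under that assumption.
print(full_attention_layers(0, 4))
# A shorter range that would contain none, for comparison.
print(full_attention_layers(0, 3))
```

Under these assumptions, a range such as 0 to 3 contains no attention layer, which may be a useful comparison case when narrowing down the shape error.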

Expected Behavior

The model loads and the server starts successfully.

Actual Behavior

The model loads successfully, but the CUDA graph is not built correctly.

Version

main

Environment & Context

  • I'm using the latest version.
  • I have searched existing issues.
