
[Bug]: head_first parameter incompatible with flash-attn >=0.5.0 #344


Description

@z1ying

Is there an existing issue?

  • I have searched, and there is no existing issue.

Describe the bug

I am currently integrating MiniCPM-SALA into vLLM for high-performance inference.

During testing, I found that the chunk_simple_gla function in the Hugging Face modeling code is called with the head_first parameter.

flash-attn >= 0.5.0 no longer supports this parameter, which causes errors when the latest flash-attn is used for the vLLM integration.

To Reproduce

  1. Install flash-attn >=0.5.0.
  2. Load MiniCPM-SALA using HF Transformers with trust_remote_code=True.
  3. During vLLM integration, or when calling chunk_simple_gla manually, the model raises an error because head_first is no longer a valid parameter in flash-attn >= 0.5.0.

Actual: TypeError: unexpected keyword argument 'head_first'.
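
For context, a minimal standalone sketch of the failing call pattern. The import path, tensor shapes, and dtype below are assumptions chosen for illustration; the actual call is made inside the remote MiniCPM-SALA modeling file.

```python
import torch
# Assumed provider of chunk_simple_gla; the remote modeling code may resolve
# the kernel differently depending on which package is installed.
from fla.ops.simple_gla import chunk_simple_gla

B, H, T, D = 1, 8, 128, 64  # hypothetical batch / heads / seq-len / head-dim
q = torch.randn(B, H, T, D, device="cuda", dtype=torch.bfloat16)
k = torch.randn(B, H, T, D, device="cuda", dtype=torch.bfloat16)
v = torch.randn(B, H, T, D, device="cuda", dtype=torch.bfloat16)
g = torch.randn(B, H, T, device="cuda", dtype=torch.bfloat16)  # per-head log decay

# On kernel versions that have dropped the keyword, this raises:
#   TypeError: ... got an unexpected keyword argument 'head_first'
out, _ = chunk_simple_gla(q, k, v, g, head_first=True)
```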

Expected behavior

Would it be possible to update the Hugging Face modeling file to remove the head_first parameter (or make it optional) so the model is compatible with flash-attn >= 0.5.0?
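
As one possible shape for that change, here is a minimal sketch of a version-tolerant wrapper that forwards head_first only when the installed kernel still accepts it. The wrapper name is hypothetical, and the sketch only guards the keyword itself: if the modeling code relies on the head_first=True memory layout, the inputs may also need transposing to whatever layout newer kernels assume.

```python
import inspect

def call_chunk_simple_gla(chunk_simple_gla, *args, head_first=True, **kwargs):
    """Forward head_first only if the installed kernel still exposes it.

    `chunk_simple_gla` is whatever implementation the modeling file imports;
    this helper guards the keyword only, not any layout differences.
    """
    params = inspect.signature(chunk_simple_gla).parameters
    if "head_first" in params:
        kwargs["head_first"] = head_first
    return chunk_simple_gla(*args, **kwargs)
```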

Expected: model runs correctly.

Thank you in advance for your help!

Metadata

Labels: bug (Something isn't working), triage