Is there an existing issue ? / 是否已有相关的 issue ?
Describe the bug / 描述这个 bug
I am currently integrating MiniCPM-SALA into vLLM for high-performance inference.
During testing, I found that the Hugging Face modeling code calls the function `chunk_simple_gla` with the parameter `head_first`.
Flash-attn versions >= 0.5.0 no longer support this parameter, which causes errors when using the latest flash-attn for the vLLM integration.
To Reproduce / 如何复现
- Install flash-attn >= 0.5.0.
- Load MiniCPM-SALA using HF Transformers with `trust_remote_code=True`.
- During vLLM integration or a manual call to `chunk_simple_gla`, the model triggers an error because `head_first` is no longer a valid parameter in flash-attn >= 0.5.0 (see the sketch below).

Actual: `TypeError: unexpected keyword argument 'head_first'`.
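A minimal reproduction sketch, assuming the model loads via `AutoModelForCausalLM`; the repo ID below is a placeholder, not the confirmed checkpoint name:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo ID; substitute the actual MiniCPM-SALA checkpoint name.
model_id = "openbmb/MiniCPM-SALA"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# A single forward pass reaches chunk_simple_gla in the remote modeling code,
# which then fails with: TypeError: unexpected keyword argument 'head_first'
inputs = tokenizer("Hello", return_tensors="pt")
outputs = model(**inputs)
```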
Expected behavior / 期望的结果
Would it be possible to update the Hugging Face modeling file to remove the `head_first` parameter (or make it optional) so the model is compatible with flash-attn >= 0.5.0?
Expected: model runs correctly.
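One possible shape for such a fix is a small wrapper that only forwards `head_first` when the installed kernel still declares it, so the same modeling code works before and after the API change. The import path below is an assumption for illustration; the modeling file should use whatever it currently imports:

```python
import inspect

# Assumed import path; match whatever the modeling file actually imports.
from fla.ops.simple_gla import chunk_simple_gla


def call_chunk_simple_gla(*args, **kwargs):
    # Drop head_first when the installed version no longer accepts it,
    # keeping compatibility with both old and new releases.
    params = inspect.signature(chunk_simple_gla).parameters
    if "head_first" in kwargs and "head_first" not in params:
        kwargs.pop("head_first")
    return chunk_simple_gla(*args, **kwargs)
```

Alternatively, if only the newer flash-attn releases need to be supported, the call sites could simply drop the `head_first` argument.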
Thank you in advance for your help!