Qwen2_5_VL: support variable-length attention computation
Motivation
Hello, I am trying to run Qwen2.5-VL with packed samples, but it seems that this function only passes the attention_mask, not the position_ids: https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py#L908. When I passed the position_ids to this function myself, I hit an illegal memory access. I eventually found that the position_ids are expanded 3 times along dim 0. How should I use the position_ids if I want to use varlen flash attention? Would anyone be able to help me with this?
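For context, the position_ids returned by the model's get_rope_index helper carry an extra leading dimension of size 3 holding the temporal/height/width M-RoPE components, which is why the tensor looks "expanded 3 times in dim 0". A minimal illustration (text-only sample, variable names chosen for the example):

```python
import torch

# For a text-only sample all three M-RoPE components are identical;
# the tensor shape is (3, batch_size, seq_len).
batch_size, seq_len = 1, 8
text_positions = torch.arange(seq_len).view(1, 1, -1)               # (1, 1, seq_len)
mrope_position_ids = text_positions.expand(3, batch_size, seq_len)  # (3, 1, 8)
print(mrope_position_ids.shape)  # torch.Size([3, 1, 8])
```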
Your contribution
no
@yingtongxiong Qwen VL position ids are different from those of plain LLMs, so simply passing position_ids to FA2 for packing will not solve the issue. Probably we'll need to pass a different set of position_ids or infer it from the 3D ids. I will take a look at it.
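For anyone experimenting in the meantime, here is a rough sketch (not a transformers API) of one way the 3D M-RoPE ids could be collapsed to recover packed-sequence boundaries and build the cu_seqlens that flash_attn_varlen_func expects. It assumes each packed sub-sequence's positions restart at 0; build_cu_seqlens is a hypothetical helper name:

```python
import torch

def build_cu_seqlens(mrope_position_ids: torch.Tensor) -> torch.Tensor:
    """Infer packed-sequence boundaries from 3D M-RoPE position ids.

    mrope_position_ids: shape (3, 1, total_tokens) for one packed row.
    Returns cu_seqlens of shape (num_sequences + 1,) as int32, the format
    flash_attn_varlen_func expects.
    """
    # Text tokens share the same value across the three rope sections; vision
    # tokens differ, but the max over the sections is non-decreasing within a
    # sample and (by assumption) resets to 0 at the start of each new sample.
    flat_pos = mrope_position_ids.max(dim=0).values.flatten()   # (total_tokens,)
    seq_starts = (flat_pos == 0).nonzero(as_tuple=True)[0]      # sample boundaries
    total = flat_pos.numel()
    return torch.cat(
        [seq_starts, torch.tensor([total], device=flat_pos.device)]
    ).to(torch.int32)
```

The resulting cu_seqlens (together with the corresponding max_seqlen) could then be fed to flash_attn_varlen_func; whether this handles every vision token layout is untested.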