Qwen2_5_VL: support variable-length attention computation
Motivation
Hello, I am trying to run Qwen2.5-VL with packed samples, but it seems that this function only passes the attention_mask, not the position_ids: https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py#L908. When I passed the position_ids to this function myself, I hit an illegal memory access. I eventually found that the position_ids are expanded 3 times along dim 0. How should I use the position_ids if I want to use varlen flash attention? Would anyone be able to help me with this?
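For context, the position_ids returned by the model's get_rope_index helper carry an extra leading dimension of size 3 holding the temporal/height/width M-RoPE components, which is why the tensor looks "expanded 3 times in dim 0". A minimal illustration (text-only sample, variable names chosen for the example):

```python
import torch

# For a text-only sample all three M-RoPE components are identical;
# the tensor shape is (3, batch_size, seq_len).
batch_size, seq_len = 1, 8
text_positions = torch.arange(seq_len).view(1, 1, -1)               # (1, 1, seq_len)
mrope_position_ids = text_positions.expand(3, batch_size, seq_len)  # (3, 1, 8)
print(mrope_position_ids.shape)  # torch.Size([3, 1, 8])
```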
Your contribution
no
@yingtongxiong Qwen VL position ids are different from those of plain LLMs, so simply passing position_ids to FA2 for packing will not solve the issue. Probably we'll need to pass a different set of position_ids or infer it from the 3D ids. I will take a look at it.
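For anyone experimenting in the meantime, here is a rough sketch (not a transformers API) of one way the 3D M-RoPE ids could be collapsed to recover packed-sequence boundaries and build the cu_seqlens that flash_attn_varlen_func expects. It assumes each packed sub-sequence's positions restart at 0; build_cu_seqlens is a hypothetical helper name:

```python
import torch

def build_cu_seqlens(mrope_position_ids: torch.Tensor) -> torch.Tensor:
    """Infer packed-sequence boundaries from 3D M-RoPE position ids.

    mrope_position_ids: shape (3, 1, total_tokens) for one packed row.
    Returns cu_seqlens of shape (num_sequences + 1,) as int32, the format
    flash_attn_varlen_func expects.
    """
    # Text tokens share the same value across the three rope sections; vision
    # tokens differ, but the max over the sections is non-decreasing within a
    # sample and (by assumption) resets to 0 at the start of each new sample.
    flat_pos = mrope_position_ids.max(dim=0).values.flatten()   # (total_tokens,)
    seq_starts = (flat_pos == 0).nonzero(as_tuple=True)[0]      # sample boundaries
    total = flat_pos.numel()
    return torch.cat(
        [seq_starts, torch.tensor([total], device=flat_pos.device)]
    ).to(torch.int32)
```

The resulting cu_seqlens (together with the corresponding max_seqlen) could then be fed to flash_attn_varlen_func; whether this handles every vision token layout is untested.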