Hmm, right. I just checked the mask, and an image attends to all images, both earlier and later ones. I believe this needs a fix. I will check against the original implementation once more and make a fix if needed. Thanks for reporting!
System Info
According to the attention mask example in the Gemma3 blog (https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/gemma3/attention-ascii.png), attention is non-causal within an image and causal across images (i.e., an image does not attend to a future image). However, when running Gemma3 generation with transformers (v4.51.3), attention appears to be non-causal across images as well.
Who can help?
No response
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
Code attached
Expected behavior
Should attention across images be causal or non-causal?
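To make the question concrete, here is a minimal sketch in plain Python of the two candidate masks. It is not the transformers implementation and not the attached reproduction code. The helper names (`blockwise_mask`, `all_images_mask`, `render`) and the `image_ids` encoding (`None` for a text token, an integer index for the image a token belongs to) are assumptions made only for illustration.

```python
from typing import List, Optional

Mask = List[List[bool]]


def blockwise_mask(image_ids: List[Optional[int]]) -> Mask:
    """Causal mask, plus bidirectional attention inside each single image.

    mask[q][k] is True when query position q may attend to key position k.
    """
    n = len(image_ids)
    mask = [[k <= q for k in range(n)] for q in range(n)]  # plain causal mask
    for q in range(n):
        for k in range(n):
            same_image = image_ids[q] is not None and image_ids[q] == image_ids[k]
            if same_image:
                mask[q][k] = True  # tokens of one image see each other
    return mask


def all_images_mask(image_ids: List[Optional[int]]) -> Mask:
    """Causal mask, plus bidirectional attention between ALL image tokens.

    This mirrors the behavior described in this issue, where an image
    can also attend to a later image.
    """
    n = len(image_ids)
    return [
        [k <= q or (image_ids[q] is not None and image_ids[k] is not None)
         for k in range(n)]
        for q in range(n)
    ]


def render(mask: Mask) -> str:
    """Draw the mask as text: 'x' means attended, '.' means masked out."""
    return "\n".join("".join("x" if v else "." for v in row) for row in mask)


# Hypothetical layout: text, image 0 (2 tokens), text, image 1 (2 tokens), text
image_ids = [None, 0, 0, None, 1, 1, None]

expected = blockwise_mask(image_ids)   # what the blog figure suggests
observed = all_images_mask(image_ids)  # what this issue reports

print(render(expected))
print()
print(render(observed))
```

The two masks differ only in the entries where a token of an earlier image queries a token of a later image, which is exactly the behavior being asked about.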