Hi, thanks for your great work.
Would you be willing to support attn_mask in FlashAttention? Query denoising [1, 2] seems to be a common practice in computer vision tasks.
[1] Li F., Zhang H., Liu S., et al. DN-DETR: Accelerate DETR Training by Introducing Query DeNoising.
[2] Zhang H., Li F., Liu S., et al. DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection.
The Triton implementation in this repo supports attention bias.
However, it's an experimental feature: I sometimes see race conditions (due to the Triton compiler) in the backward pass when attention bias is used.
The Triton team has just rewritten their backend so things might be more stable, but I haven't tried.
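For reference, a boolean attn_mask can typically be expressed through an additive attention bias: masked positions get a large negative value before the softmax, so they receive (near-)zero attention weight. This is the same mechanism the block-diagonal mask in query denoising would need. A minimal PyTorch sketch (the helper name `mask_to_bias` is illustrative, not part of any flash-attention API):

```python
import torch

def mask_to_bias(attn_mask: torch.Tensor, dtype=torch.float32) -> torch.Tensor:
    """Convert a boolean attention mask (True = attend) into an additive bias.

    Positions where attn_mask is False are set to -inf, so the softmax
    assigns them ~zero probability. The resulting tensor can be added to
    the attention scores (QK^T) before the softmax.
    """
    bias = torch.zeros(attn_mask.shape, dtype=dtype)
    bias.masked_fill_(~attn_mask, float("-inf"))
    return bias

# Toy block-diagonal mask, like the one used in query denoising:
# regular queries and denoising queries may not attend to each other.
mask = torch.tensor([[True, True, False],
                     [True, True, False],
                     [False, False, True]])
bias = mask_to_bias(mask)

scores = torch.zeros(3, 3) + bias        # pretend QK^T is all zeros
probs = torch.softmax(scores, dim=-1)    # masked positions get zero weight
```

An implementation that accepts an attention-bias tensor (like the Triton one here) can therefore cover the attn_mask use case directly, at the cost of materializing the bias tensor.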
@tridao Also, performance-wise, how does the Triton implementation compare to your custom CUTLASS implementation? And is there any recent evidence that the backward-pass issues are resolved in more recent Triton releases?