Flash Attention only supports fp16 and bf16 inputs, not fp32. We should therefore make Flash Attention optional in our codebase so that it can be disabled during inference in exchange for higher precision.
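A minimal sketch of what such a toggle could look like, using NumPy to stand in for the real kernels. The `attention` helper and its `use_flash` flag are hypothetical names, not the actual codebase API: when the flag is set, inputs are cast to fp16 (standing in for dispatch to the flash-attn kernel, which requires fp16/bf16); when it is off, the computation stays in fp32.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax in the input's dtype.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, use_flash=True):
    # Hypothetical toggle: Flash Attention kernels only accept
    # fp16/bf16, so use_flash=False keeps the standard fp32 path
    # for higher precision at inference time.
    if use_flash:
        # In a real codebase this branch would call the flash-attn
        # kernel; here we just cast to fp16 to mimic its precision.
        q, k, v = (t.astype(np.float16) for t in (q, k, v))
    scale = q.shape[-1] ** -0.5  # Python float, preserves array dtype
    scores = softmax((q @ k.swapaxes(-1, -2)) * scale)
    return scores @ v

np.random.seed(0)
q = np.random.rand(2, 4, 8).astype(np.float32)
k = np.random.rand(2, 4, 8).astype(np.float32)
v = np.random.rand(2, 4, 8).astype(np.float32)

out_precise = attention(q, k, v, use_flash=False)  # fp32 output
out_fast = attention(q, k, v, use_flash=True)      # fp16 output
```

Disabling the flag trades kernel speed for full fp32 accumulation; the two outputs agree only up to fp16 precision.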