Add --xformers-flash-attention option to improve the reproducibility #6988
Conversation
I've been working on xformers reproducibility as well, so I'm looking forward to this being implemented. That being said, have you tried this with xformers 0.0.14.dev0, the version currently used by the WebUI?
If this does not work on SD1, the usefulness of this is going to be extremely limited.
I'm getting reproducible results on torch 1.13.1+cu117 with xformers 0.0.16rc396, using SD1, with no warnings or errors of any kind. Have we found a reason to upgrade?
In my short test of SD1, I've found that it seems only some of the attention calls can use flash attention. I'm not well versed enough to know what those 32 calls are, but it seems that a lot of SD1 becomes at least a bit more reproducible. For SD2 everything can use flash attention and greatly benefits from the reproducibility. My recommendation would be to extract the code into its own function:

```python
def get_supported_attention_op(q, k, v, attn_bias=None):
    op = None
    if shared.cmd_opts.xformers_flash_attention:
        op = xformers.ops.MemoryEfficientAttentionFlashAttentionOp
        fw, bw = op
        if not fw.supports(xformers.ops.fmha.Inputs(query=q, key=k, value=v, attn_bias=attn_bias)):
            op = None
    return op
```

and call it like this:

```python
op = get_supported_attention_op(q, k, v, attn_bias=None)
out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None, op=op)
```
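As a rough illustration of what determines which of those calls can use flash attention, one can probe the op's `supports()` check directly on candidate shapes. This is a sketch with made-up shapes and a helper name of my own (`flash_supported`), not code from this PR; which head dimensions pass depends on the installed xformers version and GPU.

```python
# Sketch: probe which attention shapes the flash op accepts.
# The shapes below are illustrative examples, not taken from the WebUI.
import torch
import xformers.ops

fw, _bw = xformers.ops.MemoryEfficientAttentionFlashAttentionOp

def flash_supported(seq_len, num_heads, head_dim,
                    dtype=torch.float16, device="cuda"):
    q = torch.empty(1, seq_len, num_heads, head_dim, dtype=dtype, device=device)
    return fw.supports(xformers.ops.fmha.Inputs(query=q, key=q, value=q, attn_bias=None))

# Compare a few head dimensions: SD2.x uses 64 everywhere,
# while SD1.x mixes several values across its attention layers.
for head_dim in (40, 64, 80, 160):
    print(head_dim, flash_supported(4096, 8, head_dim))
```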
And on torch 1.12.1+cu113 with xformers 0.0.14.dev, this just breaks image generation with an error.
@AUTOMATIC1111 I'm not sure why you merged the PR if it breaks when using the current dependencies? Shouldn't
because i included a commit by myself to make it not break |
Ah, okay. So you're keeping the requirements at 11.3 for now, but informing people that flash attention is not used.
What does this PR do?
Add the `--xformers-flash-attention` argument to enable the use of Flash Attention in xFormers. Using Flash Attention improves the reproducibility of SD image generation due to its deterministic behavior. xFormers' reproducibility problem has been discussed in several issues over the last year; I think the most exhaustive list is this comment by @0xdevalias in #4011.
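The core mechanism is xformers' `op=` parameter: when an explicit operator is passed to `memory_efficient_attention`, xformers uses that backend instead of choosing one itself. A minimal, self-contained illustration (the tensor shapes here are example values, not the WebUI's):

```python
# Sketch: force xformers to use the Flash Attention backend explicitly.
# Layout is (batch, seq_len, num_heads, head_dim); values are illustrative.
import torch
import xformers.ops

q = torch.randn(2, 4096, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

out = xformers.ops.memory_efficient_attention(
    q, k, v, attn_bias=None,
    op=xformers.ops.MemoryEfficientAttentionFlashAttentionOp,
)
```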
Limitation
Unfortunately, Flash Attention won't accept SD1.x attention shapes, but it will accept those of SD2.x and its variants.
Test
I have tested this PR using the following script on Ubuntu. I don't personally have a Windows environment, so I would appreciate it if someone could test it on Windows.
https://gist.github.com/takuma104/58fbd99a02006c67dbb9ff968c7417f2
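The linked gist is the actual test. As a rough idea of what a determinism check looks like, one can run the same attention twice on identical inputs and compare the outputs bit for bit. This is a sketch, not the gist itself; the shapes and dtype are example values, and whether the default backend is bit-exact across runs depends on hardware and xformers version.

```python
# Sketch of a determinism check: run the same attention twice and compare exactly.
import torch
import xformers.ops

torch.manual_seed(0)
q = torch.randn(2, 4096, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

def run(op):
    return xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None, op=op)

flash = xformers.ops.MemoryEfficientAttentionFlashAttentionOp
print("flash op bit-exact:", torch.equal(run(flash), run(flash)))

# op=None lets xformers pick its default backend, for comparison.
print("default op bit-exact:", torch.equal(run(None), run(None)))
```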
Test Environment
Discussion
Any suggestions are welcome.