Flash Attention doesn't support attention mask #109

Closed · clownrat6 opened this issue Mar 10, 2024 · 2 comments

clownrat6 (Contributor) commented Mar 10, 2024

Test code:

import time
import torch
import torch.nn.functional as F


N = 32
L = 2048
dims = 64
n_heads = 8
q = torch.randn(N, n_heads, L, dims, dtype=torch.float16).cuda()
k = torch.randn(N, n_heads, L, dims, dtype=torch.float16).cuda()
v = torch.randn(N, n_heads, L, dims, dtype=torch.float16).cuda()

dropout_rate = 0.2
num_trials = 10

# Force the flash kernel only; math and memory-efficient fallbacks disabled.
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=False
):
    # Dense additive bias; the flash kernel rejects any attn_mask argument.
    attn_bias = torch.zeros(N, n_heads, L, L, dtype=q.dtype, device=q.device)
    torch.cuda.synchronize()
    start = time.time()
    for i in range(num_trials):
        out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_bias, dropout_p=dropout_rate)
    torch.cuda.synchronize()
    end = time.time()
    print('Flash attention took {} seconds for {} trials'.format(end - start, num_trials))

Errors:

[screenshot: Dingtalk_20240310101951 — runtime error raised when attn_mask is passed to the flash kernel]

Related issues:
Dao-AILab/flash-attention#352
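
For anyone hitting the same error, a minimal sketch of two common workarounds, assuming the SDPA backend behavior described above (the flash kernel rejects attn_mask, while the memory-efficient kernel accepts it, and the flash kernel accepts is_causal). This uses torch.backends.cuda.sdp_kernel, the selector available at the time of this issue; newer PyTorch releases deprecate it in favor of torch.nn.attention.sdpa_kernel.

import torch
import torch.nn.functional as F

N, n_heads, L, dims = 32, 8, 2048, 64
q = torch.randn(N, n_heads, L, dims, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Workaround 1: keep the dense mask, but route the call to the
# memory-efficient kernel, which does accept attn_mask.
attn_bias = torch.zeros(N, n_heads, L, L, dtype=q.dtype, device=q.device)
with torch.backends.cuda.sdp_kernel(
    enable_flash=False, enable_math=False, enable_mem_efficient=True
):
    out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_bias)

# Workaround 2: if the mask is purely causal, drop it and pass
# is_causal=True, which the flash kernel supports natively.
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=False
):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
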

LinB203 (Member) commented Mar 10, 2024

clownrat6 (Contributor, Author) replied:

Bin God 🙏🙏
