feat: add bias mask #66

Merged
merged 9 commits into main from feat/bias-mask on Sep 30, 2022
Conversation

gaetansnl
Contributor

@gaetansnl gaetansnl commented Sep 28, 2022

  • Add mask support
  • Add mask broadcast support

PS: I didn't refactor the mask creation methods; I'll do that in a separate PR.

@gaetansnl gaetansnl marked this pull request as ready for review September 29, 2022 14:29
@pommedeterresautee pommedeterresautee added the enhancement (New feature or request) and model (Model scope, HF, etc.) labels Sep 29, 2022
Member

@pommedeterresautee pommedeterresautee left a comment


as discussed

@@ -9,13 +11,28 @@


# Similar to https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt2/modeling_gpt2.py#L213
-def attention_reference(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor, output: torch.Tensor, sm_scale: float, is_causal: bool):
+def attention_reference(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor, output: torch.Tensor, sm_scale: float,
+                        is_causal: bool, attention_mask: Union[torch.Tensor, None]):
Member

Not very important, but you could also type-hint the output: -> torch.Tensor
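For reference, a minimal sketch of how the hinted signature could look; the parameter list is copied from the diff above and the body is elided:

from typing import Union
import torch

def attention_reference(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor, output: torch.Tensor, sm_scale: float,
                        is_causal: bool, attention_mask: Union[torch.Tensor, None]) -> torch.Tensor:
    ...  # body unchanged from the PR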

TMP, # NOTE: TMP is a scratchpad buffer to workaround a compiler bug
output,
q_batch_stride, q_head_stride, q_m_stride, q_k_stride,
-k_batch_stride, k_head_stride, k_n_stride, k_k_stride,
+k_batch_stride, k_head_stride, k_n_stride, k_k_stride,  # We name n,k instead of k,n because of the transpose
Member

should we keep the comment?

Contributor Author

imo yes

Comment on lines +56 to +59
MASK_BATCH_SIZE: tl.constexpr,
MASK_HEAD_SIZE: tl.constexpr,
MASK_M_SIZE: tl.constexpr,
MASK_K_SIZE: tl.constexpr,
Member

why are they constant? What makes them different from mask_*_stride for instance?

Contributor Author

Because we have conditions on them in the kernel, so they need to be compile-time constants, unlike the mask_*_stride values, which are only used in address arithmetic.
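A toy sketch of the pattern (not the PR's kernel): a value declared as tl.constexpr is a compile-time constant, so a Python-level if on it is resolved when the kernel is specialized and only one branch is generated; strides stay as ordinary runtime arguments.

import triton
import triton.language as tl

@triton.jit
def _toy_add_mask(x_ptr, mask_ptr, out_ptr,
                  MASK_M_SIZE: tl.constexpr,  # 1 means the mask is broadcast along this axis
                  BLOCK: tl.constexpr):
    offs = tl.arange(0, BLOCK)
    x = tl.load(x_ptr + offs)
    if MASK_M_SIZE == 1:
        # branch resolved at compile time: a single mask value is reused for the whole block
        m = tl.load(mask_ptr)
    else:
        # general case: one mask value per element
        m = tl.load(mask_ptr + offs)
    tl.store(out_ptr + offs, x + m)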

@@ -197,7 +252,8 @@ class Attention(torch.autograd.Function):

@staticmethod
@custom_fwd(cast_inputs=torch.float16)
-def forward(ctx: FunctionCtx, q: torch.Tensor, k: torch.Tensor, v: torch.Tensor, output: torch.Tensor, sm_scale: float, is_causal: bool):
+def forward(ctx: FunctionCtx, q: torch.Tensor, k: torch.Tensor, v: torch.Tensor, output: torch.Tensor,
+            sm_scale: float, is_causal: bool, attention_mask: torch.Tensor = None):
Member

attention_mask: torch.Tensor = None -> the annotation should be Optional[torch.Tensor]
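A sketch of the Optional form being suggested; only the annotation changes, the rest of the signature comes from the diff above:

import torch
from typing import Optional
from torch.autograd.function import FunctionCtx

def forward(ctx: FunctionCtx, q: torch.Tensor, k: torch.Tensor, v: torch.Tensor, output: torch.Tensor,
            sm_scale: float, is_causal: bool, attention_mask: Optional[torch.Tensor] = None):
    ...  # body unchanged from the PR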

Comment on lines 287 to 290
assert attention_mask.size(0) == batch or attention_mask.size(0) == 1
assert attention_mask.size(1) == heads or attention_mask.size(1) == 1
assert attention_mask.size(2) == seq_length or attention_mask.size(2) == 1
assert attention_mask.size(3) == seq_length
Member

Can you add error messages that would serve as documentation? Basically something like "mask neither matches the QKt shape nor is broadcastable on its XXX axis".
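One possible phrasing, mirroring the asserts above with messages added; the exact wording is illustrative, not necessarily what landed in the PR:

assert attention_mask.size(0) == batch or attention_mask.size(0) == 1, \
    "mask batch size neither matches QKt batch size nor is broadcastable (size 1)"
assert attention_mask.size(1) == heads or attention_mask.size(1) == 1, \
    "mask head dim neither matches QKt head count nor is broadcastable (size 1)"
assert attention_mask.size(2) == seq_length or attention_mask.size(2) == 1, \
    "mask M dim neither matches QKt sequence length nor is broadcastable (size 1)"
assert attention_mask.size(3) == seq_length, \
    "mask last dim must match QKt sequence length (no broadcast on this axis)"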

assert attention_mask.size(2) == seq_length or attention_mask.size(2) == 1
assert attention_mask.size(3) == seq_length

# Move inside kernel ?
Member

No, keep it outside of Triton if it works.
If the trick comes from the HF library, can you add a link to the source code?

Comment on lines 188 to 192
if MASK_M_SIZE == 1:
    m = tl.load(mask + offs_mask)
else:
    offs_mask += offs_m[:, None] * mask_m_stride
    m = tl.load(mask + offs_mask, eviction_policy="evict_first")
Member

As discussed, we may want to add a comment here.
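One way such a comment could read (a sketch; the offsets and strides come from the diff above):

if MASK_M_SIZE == 1:
    # mask is broadcast over the query (M) axis: the same row is reused by every
    # row of the block, so the default cache behaviour is fine
    m = tl.load(mask + offs_mask)
else:
    # one mask row per query row: each value is read only once by this program,
    # so hint the cache to evict it first rather than keep streaming data around
    offs_mask += offs_m[:, None] * mask_m_stride
    m = tl.load(mask + offs_mask, eviction_policy="evict_first")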

Comment on lines 298 to 314
heads,
seq_length,
q,
k,
v,
sm_scale,
attention_mask,
tmp,
output,
q.stride(0), q.stride(1), q.stride(2), q.stride(3),
k.stride(0), k.stride(1), k.stride(2), k.stride(3),
v.stride(0), v.stride(1), v.stride(2), v.stride(3),
output.stride(0), output.stride(1), output.stride(2), output.stride(3),
attention_mask.stride(0) if HAS_MASK else 0,
attention_mask.stride(1) if HAS_MASK else 0,
attention_mask.stride(2) if HAS_MASK else 0,
attention_mask.stride(3) if HAS_MASK else 0,
Member

Can you use named arguments? The call is getting very long.
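One way to shorten the call, as an alternative if keyword arguments are not convenient for the Triton launcher: a small, hypothetical helper (not part of the PR) that expands the four strides of a tensor, or zeros when the mask is absent.

def strides_or_zeros(t, ndim=4):
    # hypothetical helper: returns t.stride(0..ndim-1), or zeros when t is None
    return tuple(t.stride(i) for i in range(ndim)) if t is not None else (0,) * ndim

# the launch above could then pass:
#   *strides_or_zeros(q), *strides_or_zeros(k), *strides_or_zeros(v),
#   *strides_or_zeros(output), *strides_or_zeros(attention_mask if HAS_MASK else None),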

batch, seq_length = shape
if implementation == "original" and (dtype == torch.bfloat16 or seq_length != 512):
    pytest.skip("Original Triton implementation only supports fp16 and seq_length=512")
if implementation == "original" and mask_fn != generate_none_mask:
Member

Use elif to highlight that we chain these checks.
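A sketch of the suggested chaining; the second skip message is illustrative, it is not shown in the hunk:

if implementation == "original" and (dtype == torch.bfloat16 or seq_length != 512):
    pytest.skip("Original Triton implementation only supports fp16 and seq_length=512")
elif implementation == "original" and mask_fn != generate_none_mask:
    pytest.skip("Original Triton implementation does not support attention masks")  # illustrative message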

@@ -63,7 +80,7 @@ def test_mixed_stride():
v = torch.rand_like(q)
sm_scale = 0.3

-expected = attention_reference(q=q, k=k, v=v, output=torch.empty_like(q), sm_scale=sm_scale, is_causal=False)
+expected = attention_reference(q=q, k=k, v=v, output=torch.empty_like(q), sm_scale=sm_scale, is_causal=False, attention_mask=None)
Member

Can you please add a dedicated test for the mask?
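A sketch of what such a test could look like, assuming attention_reference and the Attention autograd function from this PR are importable from the module under test and that the kernel writes into the output buffer; the shapes, the broadcast mask, and the tolerance are illustrative:

import torch

def test_attention_mask_broadcast():
    batch, heads, seq_length, d_head = 2, 4, 128, 64
    q = torch.rand((batch, heads, seq_length, d_head), dtype=torch.float16, device="cuda")
    k = torch.rand_like(q)
    v = torch.rand_like(q)
    sm_scale = 0.3
    # additive mask, broadcast over the batch and head axes, masking the second half of the keys
    attention_mask = torch.zeros((1, 1, seq_length, seq_length), dtype=torch.float16, device="cuda")
    attention_mask[..., seq_length // 2:] = torch.finfo(torch.float16).min

    expected = attention_reference(q=q, k=k, v=v, output=torch.empty_like(q), sm_scale=sm_scale,
                                   is_causal=False, attention_mask=attention_mask)
    output = torch.empty_like(q)
    Attention.apply(q, k, v, output, sm_scale, False, attention_mask)
    assert torch.allclose(expected, output, atol=1e-2)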

assert attention_mask.size(3) == seq_length

# Move inside kernel ?
attention_mask = attention_mask.clamp(min=torch.finfo(attention_mask.dtype).min,
Contributor Author

move inside kernel
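For context, a small illustration of what the clamp on the quoted line does, assuming the mask is additive and may contain -inf for fully masked positions: clamping to the dtype's minimum finite value presumably avoids -inf propagating into the softmax inside the kernel.

import torch

attention_mask = torch.tensor([[0.0, float("-inf")]], dtype=torch.float16)
clamped = attention_mask.clamp(min=torch.finfo(attention_mask.dtype).min)
print(clamped)  # tensor([[0., -65504.]], dtype=torch.float16): -inf becomes the smallest finite fp16 value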

@gaetansnl gaetansnl mentioned this pull request Sep 30, 2022
Member

@pommedeterresautee pommedeterresautee left a comment

lgtm

@pommedeterresautee
Member

================================================================================ 11 passed, 143 deselected, 598 warnings in 255.70s (0:04:15) =================================================================================

@pommedeterresautee pommedeterresautee merged commit bbb6e12 into main Sep 30, 2022
@pommedeterresautee pommedeterresautee deleted the feat/bias-mask branch September 30, 2022 15:59
@pommedeterresautee pommedeterresautee linked an issue Oct 1, 2022 that may be closed by this pull request
Labels
enhancement (New feature or request), model (Model scope, HF, etc.)

Successfully merging this pull request may close these issues.

Manage input mask in Flash Attention
2 participants