[Unity][CUTLASS] Fixed stacked attention offload when QKV reshape uses the same shape expression #14728

Merged
masahi merged 2 commits into apache:unity from masahi:cutlass-stacked-attention-fix on Apr 27, 2023
Conversation

masahi (Member) commented Apr 26, 2023

When a single expression, such as R.shape([2, 4096, 8, 40]), is used by all of the reshape ops for QKV, the generated composite function receives only one shape parameter, as in the example below. Codegen currently does not handle this case properly.

@R.function
def fused_relax_split_relax_reshape_relax_reshape_relax_reshape_relax_nn_attention_cutlass(qkv: R.Tensor((4, 8, 6144), dtype="float32"), param_0: R.Shape([4, 8, 32, 64])) -> R.Tensor((4, 8, 32, 64), dtype="float32"):
    R.func_attr({"Codegen": "cutlass", "global_symbol": "fused_relax_split_relax_reshape_relax_reshape_relax_reshape_relax_nn_attention_cutlass"})
    # from tvm.script import relax as R
    
    @R.function
    def gv_1(qkv_1: R.Tensor((4, 8, 6144), dtype="float32"), param_0_1: R.Shape([4, 8, 32, 64])) -> R.Tensor((4, 8, 32, 64), dtype="float32"):
        R.func_attr({"Composite": "cutlass.stacked_attention", "Primitive": 1})
        with R.dataflow():
            lv: R.Tuple(R.Tensor((4, 8, 2048), dtype="float32"), R.Tensor((4, 8, 2048), dtype="float32"), R.Tensor((4, 8, 2048), dtype="float32")) = R.split(qkv_1, indices_or_sections=[2048, 4096], axis=2)
            lv1: R.Tensor((4, 8, 2048), dtype="float32") = lv[0]
            lv2: R.Tensor((4, 8, 32, 64), dtype="float32") = R.reshape(lv1, param_0_1)
            lv3: R.Tensor((4, 8, 2048), dtype="float32") = lv[1]
            lv4: R.Tensor((4, 8, 32, 64), dtype="float32") = R.reshape(lv3, param_0_1)
            lv5: R.Tensor((4, 8, 2048), dtype="float32") = lv[2]
            lv6: R.Tensor((4, 8, 32, 64), dtype="float32") = R.reshape(lv5, param_0_1)
            gv_2: R.Tensor((4, 8, 32, 64), dtype="float32") = R.nn.attention(lv2, lv4, lv6, scale=None)
            R.output(gv_2)
        return gv_2

    gv1: R.Tensor((4, 8, 32, 64), dtype="float32") = gv_1(qkv, param_0)
    return gv1

Apparently, this is caused by the EliminateCommonSubexpr() pass that I'm using, which turns the original three reshape ops below into ones that share the same R.shape([2, 4096, 8, 40]).

lv_3: R.Tensor((2, 4096, 8, 40), dtype="float32") = R.reshape(lv_2, R.shape([2, 4096, 8, 40]))
lv1_2: R.Tensor((2, 4096, 8, 40), dtype="float32") = R.reshape(lv1_1, R.shape([2, 4096, 8, 40]))
lv2_3: R.Tensor((2, 4096, 8, 40), dtype="float32") = R.reshape(lv2_2, R.shape([2, 4096, 8, 40]))
lv_4: R.Tensor((2, 4096, 8, 40), dtype="float32") = cls.fused_relax_nn_attention_cutlass(lv_3, lv1_2, lv2_3)
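The effect of the pass can be sketched in plain Python. This is a hypothetical toy model, not TVM's actual implementation: the point is that deduplicating structurally equal shape expressions into one shared instance is exactly what leaves the fused composite function with a single shape parameter.

```python
# Toy model (not TVM code) of common-subexpression elimination over shape
# expressions. Structurally equal expressions are deduplicated through a
# hash table, so every reshape ends up referencing the same shared object.

class Shape:
    """A stand-in for an R.shape expression: just a tuple of dims."""
    def __init__(self, dims):
        self.dims = tuple(dims)

    def __eq__(self, other):
        return isinstance(other, Shape) and self.dims == other.dims

    def __hash__(self):
        return hash(self.dims)


def eliminate_common_subexpr(exprs):
    """Replace structurally equal expressions with one shared instance."""
    seen = {}
    return [seen.setdefault(e, e) for e in exprs]


# Three independent shape expressions, as written in the original program.
q_shape = Shape([2, 4096, 8, 40])
k_shape = Shape([2, 4096, 8, 40])
v_shape = Shape([2, 4096, 8, 40])
assert q_shape is not k_shape  # distinct objects before the pass

q2, k2, v2 = eliminate_common_subexpr([q_shape, k_shape, v_shape])
# After the pass all three reshapes share one expression, so pattern-based
# fusion lifts only a single shape parameter into the composite function.
assert q2 is k2 is v2
```

Since the three reshape call sites now point at one expression, the fusion pass has only one free shape value to lift, which is why the composite function above has a single param_0 instead of three shape parameters.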

tvm-bot (Collaborator) commented Apr 26, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

cyx-6 (Contributor) left a comment
Thanks for reporting and fixing!

@masahi masahi merged commit 94e3d51 into apache:unity Apr 27, 2023