add oss flash fmha and fmhca support #49438
Conversation
Your PR was submitted successfully. Thank you for your contribution to this open-source project!
Force-pushed 4cecf1d to 9c1e126
void TrtCrossMultiHeadMatmulFusePass::ApplyImpl(Graph* graph) const {
  FusePassBase::Init(name_scope_, graph);
#ifdef PADDLE_WITH_TENSORRT
Shouldn't this #ifdef macro be wrapped around the code after line 562?
My understanding is that the early stop here needs to happen before the fusion is built, so it is placed at the very beginning of ApplyImpl.
After discussion, we now perform the early stop via a runtime check. Please take another look.
paddle/fluid/framework/ir/trt_flash_multihead_matmul_fuse_pass.cc
Outdated
  FusePassBase::Init(name_scope_, graph);
  auto* scope = param_scope();

#ifdef PADDLE_WITH_TENSORRT
Same as above.
After discussion, we now perform the early stop via a runtime check. Please take another look.
paddle/fluid/inference/tensorrt/convert/flash_multihead_matmul_op.cc
Outdated
        std::get<2>(trt_version) * 10 <
    8520) {
  VLOG(3) << "Flash attention oss plugin only available for trt version >= "
             "8.5.2.2. Stop this pass";
This only prints a log message? Shouldn't it return here?
It looks like the pass is only registered for TRT 8.5.2.2, so this check should be unnecessary here.
After consideration, we still perform the runtime early stop here, so a return has now been added.
paddle/fluid/framework/ir/trt_flash_multihead_matmul_fuse_pass.cc
Outdated
paddle/fluid/framework/ir/trt_flash_multihead_matmul_fuse_pass.cc
Outdated
refine compile, fix compile
Force-pushed 9691d16 to 6a679e8
LGTM
PR types
Performance optimization
PR changes
OPs
Describe
Add NVIDIA TensorRT OSS plugin flash attention and cross attention support to accelerate the inference of Stable Diffusion and other models.
With the flash attention and cross attention plugins, Stable Diffusion latency drops from 1.52 s to 1.02 s.
TensorRT 8.5.2 is required to use these plugins.
Using nsys, we can see that the plugins are successfully invoked by the unit test in a TRT 8.5.2.2 environment.