
Fused_mt Branch Migration #64125

Merged: 27 commits into PaddlePaddle:develop, May 23, 2024
Conversation

@penPenf28 (Contributor) commented May 8, 2024

PR Category

Inference

PR Types

New features

Description

Ported the following content, with some adaptation:

The main change is to paddle/fluid/operators/fused/fused_multi_transformer_op.cu, adding GQA (grouped-query attention) support. Currently only the flash_attention_v2 backend is supported (float16/bfloat16 only).
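
For context, GQA lets a group of query heads share a single key/value head, which shrinks the KV cache during inference. Below is a minimal NumPy sketch of that head-grouping idea only; it is not the CUDA kernel in fused_multi_transformer_op.cu, and all names and shapes are illustrative assumptions.

```python
import numpy as np

def gqa_attention(q, k, v):
    # q: [num_q_heads, seq_len, head_dim]
    # k, v: [num_kv_heads, seq_len, head_dim], with num_kv_heads <= num_q_heads
    num_q_heads, seq_len, head_dim = q.shape
    num_kv_heads = k.shape[0]
    assert num_q_heads % num_kv_heads == 0
    group_size = num_q_heads // num_kv_heads  # query heads per shared KV head

    out = np.empty_like(q)
    for h in range(num_q_heads):
        kv_h = h // group_size  # map each query head to its shared KV head
        scores = q[h] @ k[kv_h].T / np.sqrt(head_dim)
        scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)
        out[h] = weights @ v[kv_h]
    return out
```

With num_kv_heads == num_q_heads this reduces to standard multi-head attention, and with num_kv_heads == 1 it becomes multi-query attention.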

Local unit tests: the test with fixed-length padded input has been removed; only variable-length input is supported (see the packing sketch after this list).

  • test_fused_multi_transformer_op.py passes
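
As a rough illustration of why padding can be dropped: FlashAttention-style variable-length kernels typically take the tokens of all samples packed back-to-back plus a cumulative-length offset array, instead of a padded [batch, max_len] tensor. The sketch below only shows that packing idea; the actual tensor names and layout used by the op may differ (cu_seqlens here is an assumed name).

```python
import numpy as np

# Hypothetical per-sample sequence lengths in one batch (illustrative only).
seq_lens = np.array([5, 3, 7], dtype=np.int32)

# Instead of padding every sample to max(seq_lens), concatenate all tokens.
total_tokens = int(seq_lens.sum())                                # 15 rows, no padding
cu_seqlens = np.concatenate(([0], np.cumsum(seq_lens))).astype(np.int32)
print(cu_seqlens)                                                 # [ 0  5  8 15]

# A variable-length kernel locates sample i as rows
# cu_seqlens[i]:cu_seqlens[i+1] of the packed token matrix.
for i in range(len(seq_lens)):
    start, end = int(cu_seqlens[i]), int(cu_seqlens[i + 1])
    print(f"sample {i}: packed token rows {start}..{end - 1}")
```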

paddle-bot bot commented May 8, 2024

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI first. See the Paddle CI Manual for details.

CLAassistant commented May 8, 2024

CLA assistant check
All committers have signed the CLA.


@paddle-bot added the contributor External developers label May 8, 2024
@penPenf28 marked this pull request as ready for review May 8, 2024 12:45
@penPenf28 changed the title from "Fused_mt branch Migration" to "Fused_mt Branch Migration" May 11, 2024
XieYunshen previously approved these changes May 17, 2024

@XieYunshen (Contributor) left a comment

LGTM for the unit test removal

@tianshuo78520a (Contributor) left a comment


LGTM for print

@XiaoguangHu01 (Contributor) left a comment


LGTM

@heavengate merged commit f8f9bfa into PaddlePaddle:develop May 23, 2024
32 checks passed
chen2016013 pushed a commit to chen2016013/Paddle that referenced this pull request May 26, 2024
* Merge fused_mt branch

* Adjusted fuse_mt_int8

* Revert attention_layer_norm.h

* Revert paddle/phi/kernels/fusion/gpu/fmha_ref.h

* Add win support and refine format.

* Reformat for win.

* Removed redundant files, now only supports flash_attn_v2 and variable length

* Refine static_fused_ft test

* Refine fused_mt related testcase

* Remove custom_adll_reduce

* Remove operator cublaslt and revert parallel test

* Refine empty seq_len

* Refine ft

* Refine ft_static test

* Remove float32 support and static parallel ft test

* Refine type static error.

* Fix doc type error

* Fuse_mt code format

* Remove some redundant code

* Remove redundant attention_layer_norm.h

* Remove redundant code in ft_op

* Remove Redundant code and skip fuse_mt doctest

* Remove redundant fmha_ref mmha_util and other code

* Remove redundant kernel

* Remove redundant file

* Refine fuse_mt code

* Refine cublaslt comment