[Inference] Transfer layout pass #63628

Merged
merged 47 commits into from
May 27, 2024

Contributor

@kangguangli kangguangli commented Apr 17, 2024

PR Category

Performance Optimization

PR Types

New features

Description

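The performance of some operators, such as Conv, is closely tied to the layout of their input data. This PR optimizes these layout-sensitive operators by implementing transfer_layout_pass. The main goal of the pass is to ensure that every operator known to perform significantly better under a particular layout is converted to that target layout. On top of that, this PR has two main contributions:

  1. It introduces LayoutTransformationInterface, which determines an operator's target layout and provides the information needed to perform the layout conversion. Note that we add a new CanBeModified method to it, which determines whether an operator's layout can be freely converted between NCHW and NHWC.
  2. While ensuring that all layout-sensitive operators are converted correctly, the pass also keeps the number of extra transpose operators it inserts as small as possible. Note, however, that this also depends on how many operators can have their layout freely converted. Currently, reshape/squeeze/unsqueeze operators prevent the pass from reaching a globally optimal result; we will optimize this in follow-up work.
    Also note that the pass assumes all inputs are initially NCHW. An input that is already NHWC may be rewritten incorrectly; see the Q&A section for details.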
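
As a rough illustration of how such an interface might be consumed, the sketch below sorts operators into pinned and free groups. Only the name CanBeModified comes from the description above; the class LayoutTransformationSketch, the PreferLayout method, the three example op classes, and the Classify helper are hypothetical and are not the actual Paddle API.

```cpp
#include <cassert>

// Hypothetical types for illustration only; not the real Paddle definitions.
enum class LayoutPreference { kNone, kNCHW, kNHWC };
enum class NodeKind { kPinnedNCHW, kPinnedNHWC, kFree };

// Simplified stand-in for an op-level layout interface.
class LayoutTransformationSketch {
 public:
  virtual ~LayoutTransformationSketch() = default;
  // Layout under which the op is known to be clearly faster, if any.
  virtual LayoutPreference PreferLayout() const {
    return LayoutPreference::kNone;
  }
  // Whether the op may be switched freely between NCHW and NHWC.
  virtual bool CanBeModified() const = 0;
};

// Conv-like op: layout can change and NHWC is assumed to be faster.
class ConvLikeOp : public LayoutTransformationSketch {
 public:
  LayoutPreference PreferLayout() const override {
    return LayoutPreference::kNHWC;
  }
  bool CanBeModified() const override { return true; }
};

// Elementwise op: works in either layout, no preference.
class ReluLikeOp : public LayoutTransformationSketch {
 public:
  bool CanBeModified() const override { return true; }
};

// Matmul-like op: inputs are usually 2-D, so its layout is never touched.
class MatmulLikeOp : public LayoutTransformationSketch {
 public:
  bool CanBeModified() const override { return false; }
};

// Assumed classification rule: ops that cannot be modified stay in the
// original layout (NCHW by the pass's assumption), ops that prefer NHWC
// are pinned to NHWC, and everything else is left for the pass to decide.
NodeKind Classify(const LayoutTransformationSketch& op) {
  if (!op.CanBeModified()) return NodeKind::kPinnedNCHW;
  if (op.PreferLayout() == LayoutPreference::kNHWC) {
    return NodeKind::kPinnedNHWC;
  }
  return NodeKind::kFree;
}
```

Under these assumptions, `Classify(ConvLikeOp())` yields `kPinnedNHWC`, `Classify(ReluLikeOp())` yields `kFree`, and `Classify(MatmulLikeOp())` yields `kPinnedNCHW`.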

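Approach

We first describe how the problem is modeled. The computation graph contains three kinds of nodes: the first kind must run in NCHW, such as user inputs and outputs; the second kind must run in NHWC, such as Conv/FusedConv; the third kind accepts either layout. The problem is then to color the nodes of the third kind so that, globally, all nodes of the second kind run in NHWC while the number of inserted transposes is minimized.
Naturally, we model this as a minimum-cut problem: each node of the first kind is connected to the source by an edge of infinite weight, and each node of the second kind is connected to the sink by an edge of infinite weight. The minimum-cut algorithm then partitions the nodes into two sets: the set containing the source runs in NCHW, and the set containing the sink runs in NHWC. The total weight of the cut edges corresponds to the number of transpose operators that will be inserted. The subtle point is making sure that, during graph construction, each edge weight correctly models the number of transpose operators that would be inserted if the two endpoints of that edge ended up with different layouts.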
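
The modeling above can be sketched with a small max-flow/min-cut routine. This is a minimal sketch, not the pass's implementation: it assumes every tensor edge costs exactly one transpose when its endpoints differ (the PR notes the real edge weights need more care), it uses a plain Edmonds-Karp search on an adjacency matrix, and the names AssignLayouts and LayoutAssignment are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <queue>
#include <utility>
#include <vector>

enum class Layout { kNCHW, kNHWC };

struct LayoutAssignment {
  std::vector<Layout> layouts;  // chosen layout for each node
  long long transposes;         // min-cut value = transposes inserted
};

// Nodes are numbered 0..num_nodes-1. `edges` are tensor connections;
// each is assumed to cost one transpose if its endpoints differ.
LayoutAssignment AssignLayouts(
    int num_nodes,
    const std::vector<int>& must_nchw,
    const std::vector<int>& must_nhwc,
    const std::vector<std::pair<int, int>>& edges) {
  const int source = num_nodes;
  const int sink = num_nodes + 1;
  const int total = num_nodes + 2;
  const long long kInf = 1LL << 40;  // stands in for "infinite" weight

  std::vector<std::vector<long long>> cap(
      total, std::vector<long long>(total, 0));
  for (int v : must_nchw) cap[source][v] = kInf;  // pin to source side
  for (int v : must_nhwc) cap[v][sink] = kInf;    // pin to sink side
  for (const auto& e : edges) {  // undirected unit-cost edge
    cap[e.first][e.second] += 1;
    cap[e.second][e.first] += 1;
  }

  long long flow = 0;
  std::vector<int> parent(total);
  while (true) {
    // BFS for a shortest augmenting path in the residual graph.
    std::fill(parent.begin(), parent.end(), -1);
    parent[source] = source;
    std::queue<int> q;
    q.push(source);
    while (!q.empty() && parent[sink] == -1) {
      int u = q.front();
      q.pop();
      for (int v = 0; v < total; ++v) {
        if (parent[v] == -1 && cap[u][v] > 0) {
          parent[v] = u;
          q.push(v);
        }
      }
    }
    if (parent[sink] == -1) break;  // no augmenting path: flow is maximal

    long long bottleneck = kInf;
    for (int v = sink; v != source; v = parent[v]) {
      bottleneck = std::min(bottleneck, cap[parent[v]][v]);
    }
    for (int v = sink; v != source; v = parent[v]) {
      cap[parent[v]][v] -= bottleneck;
      cap[v][parent[v]] += bottleneck;
    }
    flow += bottleneck;
  }

  // After the final BFS, parent[v] != -1 exactly for nodes still
  // reachable from the source; those form the NCHW side of the cut.
  LayoutAssignment result;
  result.transposes = flow;
  result.layouts.resize(num_nodes);
  for (int v = 0; v < num_nodes; ++v) {
    result.layouts[v] = parent[v] != -1 ? Layout::kNCHW : Layout::kNHWC;
  }
  return result;
}
```

For a chain input(0) - relu(1) - conv(2) - relu(3) - conv(4) - output(5) with nodes 0 and 5 pinned to NCHW and nodes 2 and 4 pinned to NHWC, the sketch cuts two edges, so the relu between the two convolutions stays in NHWC and only two transposes are needed.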

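Performance

On an A30, the average latency of the SD1.5 model drops from 3517.366 ms to 2561.959 ms (about a 27% reduction). More experiments will be added later.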

Q&A

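  1. Issues with variable layouts
    Because the InferMeta of many operators does not set the output layout correctly, the original layout we obtain is inaccurate. In addition, for an operator like matmul, whose inputs are generally two-dimensional, it is not meaningful to describe its input as either NCHW or NHWC. We therefore never change the layout of such operators.
  2. Further optimization of reshape-like operators
    The layout of reshape-like operators generally cannot be converted, but cases such as 1 -> 1x1x1x32, which appear in UNet, can be converted, saving a transpose. This optimization will be supported in a separate follow-up PR.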
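
One way to see why such a case can skip the transpose: when a tensor has at most one non-unit dimension, permuting its axes does not change the order of elements in memory, so the transpose degenerates into a reshape. The check below is a sketch under the assumption that the tensor is contiguous and row-major; the helper name TransposeIsPureReshape and the constant kNchwToNhwcPerm are illustrative and not taken from the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Axis permutation that turns an NCHW shape into an NHWC shape:
// output axis i takes input axis kNchwToNhwcPerm[i].
const std::vector<int> kNchwToNhwcPerm = {0, 2, 3, 1};

// Returns true when permuting `shape` by `perm` keeps the row-major
// element order unchanged, i.e. the transpose is just a reshape.
// This holds when the non-unit axes keep their relative order.
// Shapes containing a zero-sized axis are not treated specially.
bool TransposeIsPureReshape(const std::vector<int64_t>& shape,
                            const std::vector<int>& perm) {
  int last_input_axis = -1;
  for (int axis : perm) {
    if (shape[axis] == 1) continue;  // unit axes do not affect ordering
    if (axis < last_input_axis) return false;  // non-unit axes reordered
    last_input_axis = axis;
  }
  return true;
}
```

With this check, an NCHW shape of 1x32x1x1 (NHWC 1x1x1x32) reports true, while 1x32x4x4 reports false and would still need a real transpose.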

Others

Pcard-71500

commit ed4e432891074309b75aa852134fa4818a08dbfb
Author: kangguangli <kangguangli@hotmail.com>
Date:   Wed Apr 17 11:58:48 2024 +0000

    fix

commit ac82803f6d07a7fcb52545794fc4957f1fcd7bd9
Author: kangguangli <kangguangli@hotmail.com>
Date:   Thu Apr 11 06:40:35 2024 +0000

    finish graph

commit c934fc6cdb1629df9b8e048a528f880bc7a2cae9
Author: kangguangli <kangguangli@hotmail.com>
Date:   Wed Apr 10 12:55:21 2024 +0000

    add test

paddle-bot bot commented Apr 17, 2024

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

paddle-ci-bot bot commented Apr 25, 2024

Sorry to inform you that more than 7 days have passed since d26e874's CIs passed. To prevent PR conflicts, you need to re-run all CIs manually.

commit 579d9adfc6caf2bd30fcdfbaaaccc5b7b8fa8c50
Author: kangguangli <kangguangli@hotmail.com>
Date:   Mon Apr 29 09:22:52 2024 +0000

    similar precision

commit 9543ac8429b3f2e8a06c590bd4c7136e7292146f
Author: root <kangguangli@hotmail.com>
Date:   Fri May 10 07:25:56 2024 +0000

    complement ops in ch_PP-OCRv4_server_rec
@PaddlePaddle PaddlePaddle locked and limited conversation to collaborators May 22, 2024
@PaddlePaddle PaddlePaddle unlocked this conversation May 22, 2024
@kangguangli kangguangli marked this pull request as draft May 22, 2024 05:36
@kangguangli kangguangli marked this pull request as ready for review May 22, 2024 05:37
@kangguangli kangguangli reopened this May 22, 2024
@kangguangli kangguangli marked this pull request as draft May 22, 2024 05:40
@kangguangli kangguangli marked this pull request as ready for review May 22, 2024 05:41
@PaddlePaddle PaddlePaddle locked and limited conversation to collaborators May 22, 2024
@PaddlePaddle PaddlePaddle unlocked this conversation May 22, 2024
Comment on lines 556 to 559
    PADDLE_ENFORCE(
        op->isa<pir::ModuleOp>(),
        common::errors::InvalidArgument(
            "The target of TransferLayoutPass should be a Module Op"));
Contributor

Can this pass not be applied inside a while op?

Contributor

How is control flow handled?

Contributor Author

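Not at the moment. It can be supported later, but I suggest developing that only after transfer_layout_pass has proven reasonably stable in model testing.
If we do support it, the best option for now seems to be designing a separate scheme specifically for control-flow optimization and decoupling the functionality of this pass somewhat; otherwise the complexity becomes too high to develop and maintain.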

@kangguangli kangguangli changed the title from "[WIP][Inference] Transfer layout pass" to "[Inference] Transfer layout pass" May 27, 2024
Contributor

@XiaoguangHu01 XiaoguangHu01 left a comment

LGTM

@kangguangli kangguangli merged commit f03a06e into PaddlePaddle:develop May 27, 2024
32 checks passed
@kangguangli kangguangli deleted the transfer_layout_pass branch May 27, 2024 11:04
5 participants