[Inference] Transfer layout pass #63628
commit ed4e432891074309b75aa852134fa4818a08dbfb
Author: kangguangli <kangguangli@hotmail.com>
Date: Wed Apr 17 11:58:48 2024 +0000
    fix
commit ac82803f6d07a7fcb52545794fc4957f1fcd7bd9
Author: kangguangli <kangguangli@hotmail.com>
Date: Thu Apr 11 06:40:35 2024 +0000
    finish graph
commit c934fc6cdb1629df9b8e048a528f880bc7a2cae9
Author: kangguangli <kangguangli@hotmail.com>
Date: Wed Apr 10 12:55:21 2024 +0000
    add test
Your PR has been submitted successfully. Thank you for contributing to this open-source project!
Sorry to inform you that the CI runs for d26e874 passed more than 7 days ago. To prevent PR conflicts, you need to re-run all CIs manually.
commit 579d9adfc6caf2bd30fcdfbaaaccc5b7b8fa8c50
Author: kangguangli <kangguangli@hotmail.com>
Date: Mon Apr 29 09:22:52 2024 +0000
    similar precision
commit 9543ac8429b3f2e8a06c590bd4c7136e7292146f
Author: root <kangguangli@hotmail.com>
Date: Fri May 10 07:25:56 2024 +0000
    complement ops in ch_PP-OCRv4_server_rec
… transfer_layout_pass
… into transfer_layout_pass
    PADDLE_ENFORCE(
        op->isa<pir::ModuleOp>(),
        common::errors::InvalidArgument(
            "The target of TransferLayoutPass should be a Module Op"));
Can this pass not be applied inside a while op?
How is control flow handled?
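Not at the moment. It can be supported later, but I suggest developing it only after transfer_layout_pass has proven reasonably stable in model tests.
If we do support it, the best option for now seems to be a separately designed scheme for control-flow optimization that decouples some of this pass's functionality. Otherwise the complexity becomes too high and the pass is hard to develop and maintain.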
LGTM
PR Category
Performance Optimization
PR Types
New features
Description
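The performance of some operators, such as Conv, depends heavily on the layout of their input data. This PR optimizes these layout-sensitive operators by implementing transfer_layout_pass. The main goal of the pass is to ensure that every operator known to perform significantly better under a particular layout is converted to that target layout. Building on this, the PR has two main improvements and implementation points:
It should also be stressed that this pass assumes all inputs are initially NCHW. If an input is NHWC from the start, it may be rewritten incorrectly. See the Q&A section for details on this issue.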
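Approach
First, we describe how the problem is modeled. The computation graph contains three classes of nodes: the first class must run in NCHW, such as user inputs and outputs; the second class must run in NHWC, such as Conv/FusedConv; the third class can accept any layout. The problem is then to color the third-class nodes so that, globally, all second-class nodes run in NHWC while the number of inserted transposes is minimized.
Naturally, we model this as a minimum-cut problem: each first-class node is connected to the source by an edge of infinite weight, and each second-class node is connected to the sink by an edge of infinite weight. The min-cut algorithm then partitions the nodes into two sets. The set containing the source runs in NCHW, and the set containing the sink runs in NHWC. The total weight of the cut edges corresponds to the number of transpose operators that will be inserted. The subtle point is to make sure, while building the graph, that each edge weight correctly models the number of transpose operators that would be inserted if the edge's two endpoints had different layouts.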
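The modeling above can be sketched as a small standalone program. This is a minimal illustration and not the code in this PR: the names `Kind`, `LayoutCut`, `SolveLayout`, and `ExampleChain` are hypothetical, every data edge is assumed to cost exactly one transpose when cut, and max-flow is computed with a plain Edmonds-Karp search on a dense capacity matrix, which is only suitable for tiny graphs.

```cpp
#include <algorithm>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical labels for the three node classes described above.
enum class Kind { kMustNCHW, kMustNHWC, kFree };

struct LayoutCut {
  long long transposes;    // total weight of the cut edges
  std::vector<bool> nhwc;  // nhwc[i] is true when node i lands on the sink side
};

// Assumption: each (u, v) data edge costs one transpose if its ends differ.
LayoutCut SolveLayout(const std::vector<Kind>& kinds,
                      const std::vector<std::pair<int, int>>& edges) {
  const int n = static_cast<int>(kinds.size());
  const int src = n, sink = n + 1, total = n + 2;
  const long long kInf = std::numeric_limits<long long>::max() / 4;
  std::vector<std::vector<long long>> cap(total,
                                          std::vector<long long>(total, 0));
  for (int i = 0; i < n; ++i) {
    if (kinds[i] == Kind::kMustNCHW) cap[src][i] = kInf;   // pinned to source
    if (kinds[i] == Kind::kMustNHWC) cap[i][sink] = kInf;  // pinned to sink
  }
  for (const auto& e : edges) {  // undirected unit-weight edges
    cap[e.first][e.second] += 1;
    cap[e.second][e.first] += 1;
  }

  long long flow = 0;
  std::vector<int> parent;
  while (true) {
    // Breadth-first search for an augmenting path in the residual graph.
    parent.assign(total, -1);
    parent[src] = src;
    std::queue<int> q;
    q.push(src);
    while (!q.empty() && parent[sink] == -1) {
      const int u = q.front();
      q.pop();
      for (int v = 0; v < total; ++v) {
        if (parent[v] == -1 && cap[u][v] > 0) {
          parent[v] = u;
          q.push(v);
        }
      }
    }
    if (parent[sink] == -1) break;  // no path left: the flow is maximal
    long long f = kInf;
    for (int v = sink; v != src; v = parent[v])
      f = std::min(f, cap[parent[v]][v]);
    for (int v = sink; v != src; v = parent[v]) {
      cap[parent[v]][v] -= f;
      cap[v][parent[v]] += f;
    }
    flow += f;
  }

  // Nodes still reachable from the source stay NCHW; the rest become NHWC.
  LayoutCut result{flow, std::vector<bool>(n, false)};
  for (int i = 0; i < n; ++i) result.nhwc[i] = (parent[i] == -1);
  return result;
}

// input(0) -> conv(1) -> relu(2) -> conv(3) -> output(4)
LayoutCut ExampleChain() {
  const std::vector<Kind> kinds = {Kind::kMustNCHW, Kind::kMustNHWC,
                                   Kind::kFree, Kind::kMustNHWC,
                                   Kind::kMustNCHW};
  return SolveLayout(kinds, {{0, 1}, {1, 2}, {2, 3}, {3, 4}});
}
```

In `ExampleChain`, the free relu node sits between two convolutions, so the cut places it on the NHWC side and only the two edges next to the user input and output are cut. A real graph would need edge weights that reflect how many transposes a differing pair actually requires, which is the subtlety the description points out.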
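Performance testing
On an A30, the average latency of the SD1.5 model dropped from 3517.366 ms to 2561.959 ms, a reduction of about 27%. More experiments will be added later.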
Q&A
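Because the Infermeta of many operators does not set the output layout correctly, the original layout we obtain is inaccurate. In addition, for operators such as matmul, whose inputs are usually two-dimensional, it makes no sense to describe the input as either NCHW or NHWC. We therefore never change the layout of such operators.
The layout of reshape-like operators generally cannot be converted, but cases such as 1 -> 1x1x1x32, which appears in UNet, can be converted and thereby save a transpose. This optimization will be supported in a separate PR.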
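One way to read the reshape case, as an assumption since the description does not spell out the condition: when a 4-D tensor has at most one non-unit dimension, permuting NCHW to NHWC does not change the row-major element order, so the transpose can be folded into the reshape. More generally, a permutation moves no data when the non-unit axes keep their relative order. The helper below is a hypothetical sketch of that check, not a Paddle API.

```cpp
#include <array>
#include <cstdint>

// Output axis i of the permuted tensor reads input axis perm[i].
// NCHW -> NHWC corresponds to {0, 2, 3, 1}.
constexpr std::array<int, 4> kNchwToNhwc = {0, 2, 3, 1};

// Returns true when the permutation leaves the row-major element order
// unchanged, i.e. the non-unit axes appear in increasing order in perm.
bool PermuteIsNoOp(const std::array<std::int64_t, 4>& shape,
                   const std::array<int, 4>& perm) {
  int last = -1;
  for (int axis : perm) {
    if (shape[axis] == 1) continue;  // unit axes never move data
    if (axis < last) return false;   // two non-unit axes swapped order
    last = axis;
  }
  return true;
}
```

Under this check a 1x1x1x32 tensor passes, while an ordinary feature map such as 1x3x4x4 does not and still needs a real transpose.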
Others
Pcard-71500