
[PaddlePaddle Hackathon 4 No.44] Optimize the GPU performance of the logsumexp op in Paddle #52509

Merged
merged 2 commits into from Apr 10, 2023

@Asthestarsfalll (Contributor) commented Apr 4, 2023

PR types

Performance optimization

PR changes

OPs

Describe

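The GPU implementation of the logsumexp OP in Paddle currently calls Eigen. Its performance is poor and leaves considerable room for improvement.

Design document: https://github.com/PaddlePaddle/community/blob/master/rfcs/OPs-Perf/20220305_logsumexp_op_optimization.md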
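
For readers unfamiliar with the op, the following is a minimal pure-Python reference of what logsumexp computes, using the common max-subtraction trick for numerical stability. It is only a sketch of the math and not the CUDA kernel from this PR; the function names `logsumexp` and `logsumexp_rows`, and the choice to reduce along the last axis, are assumptions made for illustration.

```python
import math


def logsumexp(values):
    """Return log(sum(exp(v) for v in values)) without overflowing."""
    m = max(values)
    if math.isinf(m):
        # Either every input is -inf (result -inf) or some input is +inf (result +inf).
        return m
    # Shift by the maximum so every exponent is <= 0 and exp() cannot overflow.
    return m + math.log(sum(math.exp(v - m) for v in values))


def logsumexp_rows(matrix):
    """Reduce a 2-D list of rows along the last axis, one result per row."""
    return [logsumexp(row) for row in matrix]
```

A naive `math.log(sum(math.exp(v) for v in values))` would overflow for inputs such as `[1000.0, 1000.0]`, while the shifted form evaluates to `1000 + log(2)`.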

  • Development environment

    • Nvidia GTX 1060
    • CUDA 11.2, cuDNN 8
  • Test environment

    • Tesla V100
    • CUDA 11.2, cuDNN 8

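Forward performance of the existing logsumexp implementation (averaged over 1000 runs):

| Case No. | device | input_shape | input_type | Original Paddle Perf (ms) |
|---|---|---|---|---|
| 1 | Tesla V100 | [64L, 64L] | float32 | 0.05317 |
| 2 | Tesla V100 | [1024L, 512L] | float32 | 0.72666 |
| 3 | Tesla V100 | [64L, 64L] | float16 | 0.052309 |
| 4 | Tesla V100 | [1024L, 512L] | float16 | 0.721384 |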
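
The PR does not include the benchmark script, so the following is only a sketch of how an "average over 1000 runs" figure might be collected. The helper name `average_ms` and its `warmup` and `sync` parameters are hypothetical; the sketch assumes that asynchronous GPU work must be waited for before the clock is read.

```python
import time


def average_ms(fn, runs=1000, warmup=10, sync=None):
    """Average wall-clock time of fn() in milliseconds over `runs` calls."""
    # Warm-up calls keep one-time costs (allocation, caching) out of the measurement.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    if sync is not None:
        # Wait for queued asynchronous work (for example GPU kernels) to finish.
        sync()
    return (time.perf_counter() - start) * 1000.0 / runs
```

For a GPU op, `fn` would launch the op on a fixed input and `sync` would be the framework's device-synchronization call; without it, the timer may only capture kernel launch overhead.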

Forward performance of logsumexp in PyTorch (averaged over 1000 runs):

| Case No. | device | input_shape | input_type | PyTorch Perf (ms) | diff with original Paddle |
|---|---|---|---|---|---|
| 1 | Tesla V100 | [64L, 64L] | float32 | 0.030523 | 74.2% faster |
| 2 | Tesla V100 | [1024L, 512L] | float32 | 0.038930 | 1766.6% faster |
| 3 | Tesla V100 | [64L, 64L] | float16 | 0.031530 | 65.9% faster |
| 4 | Tesla V100 | [1024L, 512L] | float16 | 0.037380 | 1845.9% faster |

Forward performance after the optimization in this PR (averaged over 1000 runs):

| Case No. | device | input_shape | input_type | New Paddle Perf (ms) | diff with original Paddle | diff with PyTorch |
|---|---|---|---|---|---|---|
| 1 | Tesla V100 | [64L, 64L] | float32 | 0.016674 | 218.9% faster | 83.1% faster |
| 2 | Tesla V100 | [1024L, 512L] | float32 | 0.027571 | 2535.6% faster | 41.2% faster |
| 3 | Tesla V100 | [64L, 64L] | float16 | 0.017113 | 205.7% faster | 84.2% faster |
| 4 | Tesla V100 | [1024L, 512L] | float16 | 0.027820 | 2493.0% faster | 34.4% faster |
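
The percentage columns appear to follow `(t_baseline / t_new - 1) * 100`. This formula is inferred from the reported timings rather than stated in the PR, and the helper name `speedup_percent` is hypothetical. A small sketch that reproduces the Case 1 and Case 2 float32 figures from the timings listed in the tables:

```python
def speedup_percent(baseline_ms, new_ms):
    """Percent by which new_ms is faster than baseline_ms."""
    return (baseline_ms / new_ms - 1.0) * 100.0


# Case 1 (float32, [64, 64]): new Paddle vs. original Paddle and vs. PyTorch.
case1_vs_paddle = round(speedup_percent(0.05317, 0.016674), 1)
case1_vs_pytorch = round(speedup_percent(0.030523, 0.016674), 1)

# Case 2 (float32, [1024, 512]): new Paddle vs. original Paddle and vs. PyTorch.
case2_vs_paddle = round(speedup_percent(0.72666, 0.027571), 1)
case2_vs_pytorch = round(speedup_percent(0.038930, 0.027571), 1)

print(case1_vs_paddle, case1_vs_pytorch)  # 218.9 83.1
print(case2_vs_paddle, case2_vs_pytorch)  # 2535.6 41.2
```

Read this way, "218.9% faster" means the new kernel takes roughly 1/3.19 of the original time for that case.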

@paddle-bot (bot) commented Apr 4, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@Asthestarsfalll (Contributor, Author) commented

@JamesLim-sy CI has passed. Could you please take a look?

@JamesLim-sy (Contributor) left a comment

LGTM

@JamesLim-sy merged commit 0e77696 into PaddlePaddle:develop Apr 10, 2023
25 checks passed
@Asthestarsfalll deleted the logsumexp branch August 19, 2023 06:38
Labels: contributor, External developers