【Hackathon4 No.39】Optimize the GPU performance of the p_norm_grad op for Paddle #496

Closed
wants to merge 3 commits into from

zeroRains
Contributor

Optimize the GPU performance of the p_norm_grad op for Paddle
Task: PaddlePaddle/Paddle#50657 (comment)

@paddle-bot
paddle-bot bot commented Mar 31, 2023

Your PR has been submitted. Thanks for your contribution!
Please check its format and content. For this, you can refer to Template and Demo.

@zeroRains
Contributor Author

Sob... could you please take a look at this one too when you have a moment? It has been three weeks already (:з」∠) @JamesLim-sy

@zeroRains
Contributor Author

Could you please take a look when you have time, @JamesLim-sy


## 2.1 Key modules and performance improvement points

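The performance bottleneck of the `p_norm_grad` operator is that the entire computation is implemented with Eigen. After reviewing the relevant source code, I determined that `ElementWiseKernel` and `BroadcastKernel` can replace the Eigen implementation. Combined with some computation fusion, this reduces the number of `Kernel` calls and improves the performance of the `p_norm_grad` operator on GPU.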
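For context, the quantity being computed can be written out as follows. This is the textbook gradient of the p-norm for finite, nonzero $p$ along one reduced axis. It is an assumption that the operator follows this form; any special cases or numerical-stability terms in Paddle's implementation are not described in this thread.

$$
y = \Big(\sum_j |x_j|^p\Big)^{1/p}, \qquad
\frac{\partial L}{\partial x_i} = \frac{\partial L}{\partial y}\cdot\frac{\operatorname{sign}(x_i)\,|x_i|^{p-1}}{y^{p-1}}
$$

Here $y$ and $\partial L/\partial y$ have the reduced shape and must be broadcast back along the reduced axis, while $\operatorname{sign}(x_i)\,|x_i|^{p-1}$ is purely elementwise. That split is why both an elementwise and a broadcast step show up in the proposal.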
Contributor

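The Elementwise and Broadcast kernels can indeed improve the OP's overall performance, but would it be possible to implement a solution that handles everything in a single Kernel?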

Contributor Author

Uh, is this approach not feasible? In that case I will look into how to fuse everything into a single kernel.
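To make the single-kernel idea concrete, here is a minimal CPU reference sketch in C++ of the per-element work one fused GPU kernel could do. It is not Paddle code: `PNormGradFused` and `Sign` are hypothetical names, the input is assumed to be a row-major `[rows, cols]` tensor reduced over its last axis, and the `epsilon` term in the denominator is an assumed guard against division by zero. Each loop iteration corresponds to what one GPU thread would compute, reading `norm` and `dy` through a broadcast index instead of materializing broadcast tensors.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a Paddle API): returns -1, 0, or 1.
inline float Sign(float v) {
  return static_cast<float>((v > 0.0f) - (v < 0.0f));
}

// CPU reference for a fused p-norm gradient.
// x is [rows, cols] in row-major order; norm and dy are [rows],
// i.e. the reduction was taken over the cols axis.
std::vector<float> PNormGradFused(const std::vector<float>& x,
                                  const std::vector<float>& norm,
                                  const std::vector<float>& dy,
                                  std::size_t rows, std::size_t cols,
                                  float porder, float epsilon) {
  std::vector<float> dx(rows * cols);
  for (std::size_t idx = 0; idx < rows * cols; ++idx) {
    // Broadcast index: every element of a row shares one norm and one dy.
    const std::size_t row = idx / cols;
    const float xv = x[idx];
    // Broadcast part: dy / (norm^(p-1) + epsilon); epsilon is an assumption.
    const float scale =
        dy[row] / (std::pow(norm[row], porder - 1.0f) + epsilon);
    // Elementwise part: sign(x) * |x|^(p-1), fused with the scale above.
    dx[idx] = Sign(xv) * std::pow(std::fabs(xv), porder - 1.0f) * scale;
  }
  return dx;
}
```

For example, with `x = {3, -4}`, `norm = {5}`, `dy = {1}`, `porder = 2` and `epsilon = 0`, the result is approximately `{0.6, -0.8}`, which is `x / ||x||_2` as expected for the 2-norm. In a real GPU kernel the loop body would become the thread body, and the division and modulo used for index mapping would need to generalize to arbitrary reduction axes.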


4 participants