
【Hackathon4 No.33】Optimize the GPU computation performance of the Histogram op for Paddle #486

Merged: 4 commits, Apr 20, 2023

Conversation

zeroRains
Contributor

Optimize the GPU computation performance of the Histogram op for Paddle.
Task: PaddlePaddle/Paddle#50657 (comment)

@paddle-bot

paddle-bot bot commented Mar 28, 2023

Your PR has been submitted. Thanks for your contribution!
Please check its format and content. For this, you can refer to the Template and Demo.

@zeroRains
Contributor Author

@JamesLim-sy Could you please review this? It has been two weeks already. (:з」∠)


## 2.1 Key modules and performance improvement points

The key idea is to use `phi::funcs::ReduceKernel` to accelerate the part of `Histogram` that determines the histogram boundaries (the min and max of the input), thereby improving the GPU computation performance of the `Histogram` op. The expected average speedup is more than 2x.
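To make the optimization target concrete, here is a minimal host-side C++ sketch of the computation the PR accelerates: when no explicit range is given, `Histogram` first reduces the input to its min/max, then bins each value against that range. This is a hypothetical illustration, not Paddle's actual kernel code; the function name `histogram` and its signature are assumptions.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical sketch of the Histogram op's logic when bounds are
// derived from the data: first a min/max reduction (the part the PR
// speeds up on GPU), then the binning pass.
std::vector<int64_t> histogram(const std::vector<float>& x, int nbins) {
    // Bounds reduction: this is the step replaced by a GPU reduction.
    auto [lo_it, hi_it] = std::minmax_element(x.begin(), x.end());
    float lo = *lo_it, hi = *hi_it;

    std::vector<int64_t> bins(nbins, 0);
    for (float v : x) {
        // Map v from [lo, hi] into a bin index in [0, nbins).
        int b = static_cast<int>((v - lo) * nbins / (hi - lo));
        // The maximum value maps exactly to nbins; clamp it into the
        // last bin, matching the usual closed-upper-bound convention.
        bins[std::min(b, nbins - 1)] += 1;
    }
    return bins;
}
```

On GPU, the min/max reduction dominates for large inputs, which is why accelerating it yields the claimed speedup.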
Contributor


Paddle's built-in Reduce computation currently targets single-input, single-output scenarios, so it is not recommended here. Since Min and Max are two sides of the same computation, a single `__device__` kernel can produce both `max_value` and `min_value` at once. Furthermore, if you are familiar with `cooperative_groups` or similar memory-fence primitives, the entire computation can be completed inside one global kernel.
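The reviewer's point can be illustrated with a host-side C++ sketch: because min and max share the same comparison structure, one pass over the data can maintain both accumulators, analogous to a single `__device__` reduction producing both values instead of two separate `ReduceKernel` launches. The function name `min_max_one_pass` is a hypothetical example, not Paddle code.

```cpp
#include <cfloat>
#include <utility>
#include <vector>

// One fused pass computes min and max together, mirroring what a
// single CUDA reduction kernel would do per thread block before a
// final cross-block combine (e.g. via cooperative_groups).
std::pair<float, float> min_max_one_pass(const std::vector<float>& x) {
    float lo = FLT_MAX;
    float hi = -FLT_MAX;
    for (float v : x) {
        lo = v < lo ? v : lo;  // min accumulator
        hi = v > hi ? v : hi;  // max accumulator, same loop, no extra pass
    }
    return {lo, hi};
}
```

Fusing the two reductions halves the number of global-memory reads compared with launching separate min and max reductions, which is the cost that dominates on GPU.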

@JamesLim-sy JamesLim-sy merged commit 0a4b4f7 into PaddlePaddle:master Apr 20, 2023