[Hackathon4 No.44] Optimize the GPU compute performance of the logsumexp op for Paddle #413

Merged (5 commits) on Apr 4, 2023

Asthestarsfalll
Contributor

No description provided.

@paddle-bot

paddle-bot bot commented Mar 5, 2023

Your PR has been submitted. Thanks for your contribution!
Please check its format and content. For this, you can refer to Template and Demo.


## 2.1 Key modules and performance improvements

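In terms of implementation, shared memory can be used to fuse the kernels that involve Reduce computation, which reduces the number of global memory accesses. The fused logsumexp kernel first loads its input into shared memory: each block's shared memory holds the features of one instance, with shape `(1, c)`. All subsequent intermediate results are kept in shared memory, and only the final output `out` is written to global memory.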
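
As a rough illustration of the data flow described above, the following Python sketch mimics one block per instance on the CPU. It is not the CUDA kernel from the RFC: the function name `fused_logsumexp_rows` and the list `smem` (a stand-in for shared memory) are hypothetical, and the max-subtraction step is the usual numerically stable formulation, assumed here rather than taken from the proposal.

```python
import math


def fused_logsumexp_rows(x):
    """Row-wise logsumexp over a list of non-empty rows (n instances, c features).

    Each outer iteration plays the role of one block handling one instance.
    """
    out = []
    for row in x:
        # Stand-in for loading the (1, c) features into shared memory once.
        smem = list(row)

        # First reduction: the row maximum (assumed stabilization step).
        row_max = max(smem)
        if math.isinf(row_max):
            # All -inf gives -inf; any +inf gives +inf. Avoids inf - inf.
            out.append(row_max)
            continue

        # The intermediate exp(x - max) overwrites the buffer in place,
        # so nothing is written back to "global memory" here.
        for i in range(len(smem)):
            smem[i] = math.exp(smem[i] - row_max)

        # Second reduction: the sum of the shifted exponentials.
        total = sum(smem)

        # Only the final scalar per instance leaves the block.
        out.append(row_max + math.log(total))
    return out
```

For example, `fused_logsumexp_rows([[1000.0, 1000.0]])` evaluates `1000 + log(2)` without overflowing, because the exponent arguments are never positive.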

Just to confirm: are you planning to write the CUDA kernel yourself?

Contributor Author

Yes. Is that encouraged?

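It depends on the operator type. If an operator can be implemented with a single call to one of the framework's already well-optimized kernels (`ElementwiseKernel`, `BroadcastKernel`, `ReduceKernel`), we don't recommend reimplementing it; just call those kernels directly. For more complex operators like this one, writing your own kernel is encouraged. Please state explicitly in the RFC that you intend to write a new, optimized CUDA kernel.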

Contributor Author

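I tried writing the CUDA kernel myself and found it rather difficult 😢, so I implemented a version using the ReduceKernel and elementwise operators from kps instead. The performance meets expectations. Is this approach acceptable as well?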
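
For readers following the thread, a composition of reduce and elementwise passes like the one mentioned in this comment can be sketched in plain Python as below. This is only an illustration of the general pattern, not the code in the PR and not the Paddle kps API: `reduce_rows` and `logsumexp_composed` are hypothetical names, the max-subtraction step is an assumed stabilization, and inputs are assumed to be finite.

```python
import math
import operator


def reduce_rows(x, op, init):
    """Generic row-wise reduction, standing in for a framework reduce kernel."""
    out = []
    for row in x:
        acc = init
        for v in row:
            acc = op(acc, v)
        out.append(acc)
    return out


def logsumexp_composed(x):
    """Row-wise logsumexp built from separate reduce and elementwise passes.

    Assumes every value in x is finite.
    """
    # Pass 1 (reduce): the maximum of each row.
    row_max = reduce_rows(x, max, -math.inf)

    # Pass 2 (broadcast + elementwise): exp(x - max), materialized in full.
    shifted = [[math.exp(v - m) for v in row] for row, m in zip(x, row_max)]

    # Pass 3 (reduce): the sum of each shifted row.
    row_sum = reduce_rows(shifted, operator.add, 0.0)

    # Pass 4 (elementwise): max + log(sum).
    return [m + math.log(s) for m, s in zip(row_max, row_sum)]
```

Unlike a single fused kernel, each pass here produces a full intermediate result before the next one starts, which is the trade-off for reusing existing building blocks.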

Contributor Author

The performance results have been updated.

@Asthestarsfalll
Contributor Author

@Xreki Could you please take a look?
