optimization of max_pool3d grad #45934

Merged
merged 1 commit into PaddlePaddle:develop on Sep 20, 2022

Conversation

@s5u13b (Contributor) commented Sep 9, 2022

PR types

Performance optimization

PR changes

OPs

Describe

  • Environment:
    • V100-32G, CUDA 11.2, cuDNN 8
  • Feature:
    • Replace the div and mod operations with the fast_divmod operation (a sketch of the technique follows the table below).
    • Replace the 1D GPU launch with a 3D GPU launch.
    • Optimize the computation logic of the input-grad accumulation. Before the optimization, the GPU launch config was based on the input data, and the input grad was accumulated by traversing the corresponding indices of the output mask data, which introduced a large amount of output-index computation. After the optimization, the launch config is based on the output data, and the input grad is accumulated directly by looking up the max index of each output element; this removes the output-index computation but requires an atomic add (see the simplified kernel after the table).
    • (Config 0 is not optimized yet because Paddle calls the cuDNN kernel in config 0 of the max_pool3d benchmark.)
  • Performance (OP Benchmark):
| Paddle Kernel | Config ID | Perf Before | Perf After | Improvement | Perf of PyTorch |
| --- | --- | --- | --- | --- | --- |
| cudnn::pooling_bw_5d_kernel_max | 0 | 1779.7us | - | - | 725.72us |
| KernelMaxPool3DWithIdxGrad | 1 | 6128.1us | 677.62us | 804.3% | 725.83us |
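
For orientation, here is a minimal sketch of the fast_divmod technique referenced above: for a divisor fixed on the host (the pooling shape values known at launch time), a magic multiplier and shift are precomputed so the kernel can replace hardware integer division with a multiply-high and a shift. The struct below assumes the standard round-up magic-number scheme and is illustrative only, not Paddle's actual phi helper.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Fast division/modulo by a divisor known on the host.
// Illustrative sketch; precondition: d >= 1, n fits in 32 bits.
struct FastDivMod {
  uint32_t divisor, multiplier, shift;

  explicit FastDivMod(uint32_t d) : divisor(d) {
    // Smallest shift with 2^shift >= d.
    for (shift = 0; shift < 32; ++shift) {
      if ((1u << shift) >= d) break;
    }
    // Round-up magic number so that n / d == (umulhi(n, m) + n) >> shift.
    uint64_t one = 1;
    multiplier =
        static_cast<uint32_t>(((one << 32) * ((one << shift) - d)) / d + 1);
  }

  __device__ __forceinline__ uint32_t Div(uint32_t n) const {
    // __umulhi returns the high 32 bits of the 64-bit product n * multiplier.
    return (__umulhi(n, multiplier) + n) >> shift;
  }

  __device__ __forceinline__ uint2 DivMod(uint32_t n) const {
    uint32_t q = Div(n);
    return make_uint2(q, n - q * divisor);  // (quotient, remainder)
  }
};
```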
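
The accumulation change can be pictured with a simplified backward kernel: the grid is sized from the output tensor, and each thread reads one output-gradient element plus the argmax index saved by the forward pass, then scatters it into the input gradient with a single atomicAdd. Everything below (names, NCDHW layout, mask semantics) is a hedged sketch, not the PR's actual KernelMaxPool3DWithIdxGrad.

```cuda
#include <cuda_runtime.h>

// mask[i] is assumed to hold the flat argmax position inside one
// D_in*H_in*W_in input slice, as recorded by the forward pass.
__global__ void MaxPool3DGradWithIdxSketch(const float* out_grad,
                                           const int* mask,
                                           float* in_grad,   // pre-zeroed
                                           int ncd,          // N * C * D_out
                                           int out_d, int out_h, int out_w,
                                           int in_dhw) {     // D_in*H_in*W_in
  int w = blockIdx.x * blockDim.x + threadIdx.x;   // W_out position
  int h = blockIdx.y * blockDim.y + threadIdx.y;   // H_out position
  int zc = blockIdx.z;                             // fused (n, c, d_out)
  if (w >= out_w || h >= out_h || zc >= ncd) return;

  int nc = zc / out_d;  // (n, c) slice; fast_divmod would replace this div
  int out_idx = (zc * out_h + h) * out_w + w;

  // One output element feeds exactly one input position, so the scatter is
  // a single atomicAdd instead of a per-input gather over pooling windows.
  atomicAdd(&in_grad[nc * in_dhw + mask[out_idx]], out_grad[out_idx]);
}

// 3D launch config derived from the output shape (assumes N*C*D_out stays
// within the 65535 gridDim.z limit; otherwise fold a loop into z).
inline void LaunchSketch(const float* out_grad, const int* mask,
                         float* in_grad, int n, int c, int out_d, int out_h,
                         int out_w, int in_dhw, cudaStream_t stream) {
  dim3 block(32, 8, 1);
  dim3 grid((out_w + block.x - 1) / block.x,
            (out_h + block.y - 1) / block.y,
            n * c * out_d);
  MaxPool3DGradWithIdxSketch<<<grid, block, 0, stream>>>(
      out_grad, mask, in_grad, n * c * out_d, out_d, out_h, out_w, in_dhw);
}
```

Note that for non-overlapping pooling windows each input position is the argmax of at most one output element, so the atomicAdds rarely contend; contention arises only when windows overlap.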

@paddle-bot-old (bot) added the contributor (External developers) label Sep 9, 2022
@JamesLim-sy (Contributor) left a comment:

LGTM. Please also add performance data for the other configs that go through the same kernel.

@JamesLim-sy JamesLim-sy merged commit 0e563da into PaddlePaddle:develop Sep 20, 2022