Optimize batchnorm1d using 2D kernel #43530

Merged: 24 commits into PaddlePaddle:develop, Jul 14, 2022

EsdeathYZH (Contributor) commented Jun 15, 2022

PR types

Function optimization

PR changes

OPs

Describe

Research and design document

1 Motivation:

  • The BatchNorm1D operator performs worse than its PyTorch counterpart.

2 Design:

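  • Choose between the cuDNN library and the native kernel based on batch size: when the batch size exceeds a threshold, use the native kernel.
  • Partition the work into blocks using 2D tiles, which improves memory-access locality and increases resource utilization.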

3 Evaluation:
Test environment: NVIDIA V100 GPU

| Input shape | PyTorch | OneFlow | Paddle (before optimization) | Paddle (after optimization) |
|---|---|---|---|---|
| [126000, 16] | 0.014 | 0.082 | 0.773 | 0.023 |
| [136000, 32] | 0.012 | 0.217 | Error | 0.022 |
| [215857, 32] | 0.015 | 0.371 | Error | 0.022 |
| [215857, 64] | 0.025 | 0.900 | Error | 0.029 |
| [143042, 64] | 0.020 | 0.544 | Error | 0.020 |
| [62929, 64] | 0.012 | 0.219 | 0.419 | 0.020 |
| [136000, 16, 16] | 0.961 | 0.967 | 0.971 | 0.092 |
| [136000, 32, 32] | 1.015 | 1.019 | 1.023 | 0.312 |
| [1000000, 16, 16] | 3.956 | 7.153 | 7.081 | 0.866 |
| [1000000, 32, 32] | 4.859 | 7.487 | 7.42 | 2.457 |

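[N, C] inputs see up to a 33x speedup ([126000, 16]: 0.773 → 0.023), and [N, C, L] inputs see an 8.2x speedup ([1000000, 16, 16]: 7.081 → 0.866).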
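
The speedup figures can be rechecked from the table as the ratio of the pre-optimization time to the post-optimization time. The sketch below does that arithmetic for the rows where the pre-optimization run produced a number; rows reported as Error are left out because no ratio can be formed. The names `Timing`, `kTimings`, and `Speedup` are illustrative only.

```cpp
#include <cstddef>

// One table row: Paddle timing before and after the optimization.
struct Timing {
  const char* shape;
  double before;
  double after;
};

// Rows copied from the evaluation table (Error rows omitted).
constexpr Timing kTimings[] = {
    {"[126000, 16]", 0.773, 0.023},
    {"[62929, 64]", 0.419, 0.020},
    {"[136000, 16, 16]", 0.971, 0.092},
    {"[136000, 32, 32]", 1.023, 0.312},
    {"[1000000, 16, 16]", 7.081, 0.866},
    {"[1000000, 32, 32]", 7.42, 2.457},
};

// Speedup is the ratio of the old time to the new time.
constexpr double Speedup(const Timing& t) { return t.before / t.after; }

// Largest speedup among the first `count` rows of `rows`.
double MaxSpeedup(const Timing* rows, std::size_t count) {
  double best = 0.0;
  for (std::size_t i = 0; i < count; ++i) {
    const double s = Speedup(rows[i]);
    if (s > best) best = s;
  }
  return best;
}
```

For example, `Speedup(kTimings[0])` is 0.773 / 0.023, a little over 33, and `Speedup(kTimings[4])` is 7.081 / 0.866, roughly 8.2.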

@paddle-bot-old paddle-bot-old bot added contributor External developers status: proposed labels Jun 15, 2022
@paddle-bot-old

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@paddle-bot-old

Sorry to inform you that 938cde3's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@paddle-bot-old

paddle-bot-old bot commented Jul 7, 2022

Sorry to inform you that 44ad03e's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@@ -591,10 +591,12 @@ void BatchNormGradRawKernel(const Context &ctx,
// ctx.GetPlace()),
// epsilon, saved_mean_data, saved_var_data));
#else
// CUDNN PER_ACTIVATION mode only support small batch size
// CUDNN only support small batch size
Contributor

Is this related to the version?

Contributor

The version currently in use does not support very large batch sizes.
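
The batch-size dispatch discussed in this thread can be sketched as a small host-side helper. This is an illustration under stated assumptions, not the code in the PR: `BatchNormBackend` and `ChooseBackend` are hypothetical names, and the limit is passed in by the caller because the actual threshold is not given in this conversation.

```cpp
#include <cstdint>

// Which implementation handles a given call (hypothetical enum).
enum class BatchNormBackend { kCudnn, kNative };

// Pick the native kernel once the batch size exceeds the limit that the
// cuDNN path is assumed to support; otherwise stay on cuDNN.
// `cudnn_batch_limit` is a caller-supplied, assumed value.
constexpr BatchNormBackend ChooseBackend(std::int64_t batch_size,
                                         std::int64_t cudnn_batch_limit) {
  return batch_size > cudnn_batch_limit ? BatchNormBackend::kNative
                                        : BatchNormBackend::kCudnn;
}
```

Keeping the limit as a parameter rather than a constant makes it easy to adjust if a different cuDNN version accepts larger batches.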

@@ -137,6 +138,398 @@ static __global__ LAUNCH_BOUNDS(BlockDim) void BNForwardTraining(
}
}

template <typename T>
Contributor

Comments will be added in a follow-up.

@zkh2016 zkh2016 merged commit 1bc47c8 into PaddlePaddle:develop Jul 14, 2022
Labels
contributor External developers

4 participants