Optimize batchnorm1d using 2D kernel #43530

EsdeathYZH · 2022-06-15T03:18:13Z

PR types

Function optimization

PR changes

OPs

Describe

Research and design document

1 Motivation:

The BatchNorm1D operator performs poorly compared with PyTorch.

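2 Design:

Choose whether to compute with the cuDNN library based on the batch size: when the batch size exceeds a threshold, use the native kernel instead.
Partition the work into blocks using 2D tiles, which improves memory-access locality and increases resource utilization.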
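
The two design points above can be illustrated with a minimal host-side C++ sketch. This is not the PR's CUDA kernel: `UseCudnn`, `Stats`, and `ChannelStats2D` are hypothetical names, the threshold is passed in because its actual value is not stated in this description, and the tile loops only emulate on the CPU how a 2D block (`tile_n` rows by `tile_c` channels) would walk a row-major [N, C] input so that neighbouring lanes touch neighbouring channels.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Dispatch rule from the design: cuDNN for small batches, the native
// kernel once the batch size exceeds the threshold.
// (Hypothetical helper; the real threshold lives in the PR's source.)
bool UseCudnn(std::size_t batch_size, std::size_t threshold) {
  return batch_size <= threshold;
}

struct Stats {
  std::vector<double> mean;
  std::vector<double> var;  // biased (population) variance per channel
};

// CPU emulation of a 2D-tiled reduction over a row-major [N, C] input.
// Each (n0, c0) pair plays the role of one thread block's tile.
Stats ChannelStats2D(const std::vector<float>& x, std::size_t n,
                     std::size_t c, std::size_t tile_n, std::size_t tile_c) {
  std::vector<double> sum(c, 0.0), sq_sum(c, 0.0);
  for (std::size_t n0 = 0; n0 < n; n0 += tile_n) {
    for (std::size_t c0 = 0; c0 < c; c0 += tile_c) {
      const std::size_t n1 = std::min(n0 + tile_n, n);
      const std::size_t c1 = std::min(c0 + tile_c, c);
      for (std::size_t i = n0; i < n1; ++i) {
        // Inner loop runs along channels, i.e. contiguous memory.
        for (std::size_t j = c0; j < c1; ++j) {
          const double v = x[i * c + j];
          sum[j] += v;
          sq_sum[j] += v * v;
        }
      }
    }
  }
  Stats s{std::vector<double>(c), std::vector<double>(c)};
  for (std::size_t j = 0; j < c; ++j) {
    s.mean[j] = sum[j] / static_cast<double>(n);
    s.var[j] = sq_sum[j] / static_cast<double>(n) - s.mean[j] * s.mean[j];
  }
  return s;
}
```

On a GPU the partial sums of each tile would be combined through shared memory or atomics rather than a single shared accumulator. The sum-of-squares formula is used here only for brevity and is less numerically robust than a Welford-style update.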

3 Evaluation:
Test environment: NVIDIA V100 GPU

Input shape	PyTorch	OneFlow	Paddle (before optimization)	Paddle (after optimization)
[126000, 16]	0.014	0.082	0.773	0.023
[136000, 32]	0.012	0.217	Error	0.022
[215857, 32]	0.015	0.371	Error	0.022
[215857, 64]	0.025	0.900	Error	0.029
[143042, 64]	0.020	0.544	Error	0.020
[62929, 64]	0.012	0.219	0.419	0.020
[136000, 16, 16]	0.961	0.967	0.971	0.092
[136000, 32, 32]	1.015	1.019	1.023	0.312
[1000000, 16, 16]	3.956	7.153	7.081	0.866
[1000000, 32, 32]	4.859	7.487	7.42	2.457

[N, C] inputs achieved a 33x performance improvement, and [N, C, L] inputs achieved an 8.2x performance improvement.
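
The description does not say which rows the headline figures come from. As a rough check, and assuming they compare Paddle before and after optimization on the [126000, 16] and [1000000, 16, 16] rows, the ratios can be recomputed as follows (`Speedup`, `kSpeedupNC`, and `kSpeedupNCL` are illustrative names):

```cpp
// Speedup of the optimized kernel: baseline time divided by optimized time.
constexpr double Speedup(double before, double after) {
  return before / after;
}

// Values copied from the evaluation table (Paddle before / Paddle after).
constexpr double kSpeedupNC = Speedup(0.773, 0.023);   // row [126000, 16]
constexpr double kSpeedupNCL = Speedup(7.081, 0.866);  // row [1000000, 16, 16]
```

Under that assumption the two ratios come out at roughly 33.6 and 8.2, which matches the reported numbers. Rows marked Error have no baseline, so no ratio can be formed for them.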

paddle-bot-old · 2022-06-15T03:18:34Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

paddle-bot-old · 2022-06-26T02:44:39Z

Sorry to inform you that 938cde3's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

paddle/phi/kernels/gpu/batch_norm_kernel.cu

paddle-bot-old · 2022-07-07T02:47:21Z

Sorry to inform you that 44ad03e's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

xingfeng01 · 2022-07-13T02:05:52Z

paddle/phi/kernels/gpu/batch_norm_grad_kernel.cu

@@ -591,10 +591,12 @@ void BatchNormGradRawKernel(const Context &ctx,
 //             ctx.GetPlace()),
 //         epsilon, saved_mean_data, saved_var_data));
 #else
-      // CUDNN PER_ACTIVATION mode only support small batch size
+      // CUDNN only support small batch size


Is this related to the cuDNN version?

The version currently in use does not support very large batch sizes.

xingfeng01 · 2022-07-13T02:06:53Z

paddle/phi/kernels/gpu/batch_norm_kernel.cu

@@ -137,6 +138,398 @@ static __global__ LAUNCH_BOUNDS(BlockDim) void BNForwardTraining(
  }
 }

+template <typename T>


Please add comments in a follow-up.

EsdeathYZH added 15 commits May 29, 2022 18:00

refactor code structure

441da36

add native kernel usage

c48e076

add wellford impl

0a68ba3

add shmem impl

b3248c9

add dispatch logic

78349a2

add channel_last impl

98c66f0

refine the global space init

570dc55

impl 2d kernel

aaca04a

Merge remote-tracking branch 'paddle/develop' into optim_batchnorm1d

29ef723

rm wellford

74b792b

fix backward

a0bd5b6

add unit test for batchnorm1d

2433ebf

fix bug

90c27a6

impl channel last 2d

91d83e5

refine

6871dbf

paddle-bot-old bot added contributor External developers status: proposed labels Jun 15, 2022

EsdeathYZH added 5 commits June 15, 2022 16:01

fix memory thpt

0571ecc

fix threshold

804ba03

fix backward threshold

48c6344

refine unit test

6785f6f

refine test

e46ef54

EsdeathYZH mentioned this pull request Jun 17, 2022

Fix cudnn error for BatchNorm1D kernel #43072

Merged

EsdeathYZH added 2 commits June 17, 2022 14:32

delete pragma unroll

938cde3

Merge remote-tracking branch 'paddle/develop' into optim_batchnorm1d

a79206f

zkh2016 reviewed Jun 28, 2022

paddle/phi/kernels/gpu/batch_norm_kernel.cu (outdated, resolved)

paddle/phi/kernels/gpu/batch_norm_kernel.cu (resolved)

refine code

44ad03e

fix

0b5ca0e

zkh2016 approved these changes Jul 12, 2022

niuliling123 approved these changes Jul 13, 2022

View reviewed changes

xingfeng01 reviewed Jul 13, 2022

zkh2016 merged commit 1bc47c8 into PaddlePaddle:develop Jul 14, 2022

zkh2016 mentioned this pull request Aug 2, 2022

Optimize BatchNorm1D backward #44783

Merged
