
Add fused_scale_bias_relu_conv_bnstats OP #55026

Merged

Conversation

Tom-Zheng
Contributor

@Tom-Zheng Tom-Zheng commented Jun 30, 2023

PR types

New features

PR changes

OPs

Description

Please merge this PR after #54949

This PR adds fused_scale_bias_relu_conv_bnstats, which is needed for ResUnit fusion. It is implemented using the cuDNN Frontend API.
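For orientation, the forward semantics of the fused op can be sketched in NumPy. This is a reference sketch only: shapes are illustrative and the convolution is reduced to a 1x1 matmul, whereas the actual kernel runs NHWC FP16 tensors through the cuDNN Frontend API.

```python
import numpy as np

def fused_scale_bias_relu_conv_bnstats(x, scale, bias, w, fuse_prologue=True):
    """Reference semantics: y = conv(relu(x * scale + bias)), plus per-channel
    batch-norm statistics (sum and sum of squares) of the conv output.
    NHWC layout; the conv is simplified to 1x1, stride 1."""
    if fuse_prologue:
        x = np.maximum(x * scale + bias, 0.0)  # fused scale + bias + ReLU prologue
    # 1x1 convolution expressed as a matmul over the channel dim (illustrative)
    y = x @ w  # (N, H, W, Cin) @ (Cin, Cout) -> (N, H, W, Cout)
    sum_y = y.sum(axis=(0, 1, 2))          # per-channel sum
    sqsum_y = (y * y).sum(axis=(0, 1, 2))  # per-channel sum of squares
    return y, sum_y, sqsum_y
```

The sum and sum-of-squares outputs are what the subsequent batch-norm computation consumes, which is why the op is named `...bnstats`.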

@paddle-bot

paddle-bot bot commented Jun 30, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@paddle-bot paddle-bot bot added contributor External developers status: proposed labels Jun 30, 2023
@Tom-Zheng Tom-Zheng requested a review from Xreki June 30, 2023 04:08
@Tom-Zheng Tom-Zheng force-pushed the add_fused_scale_bias_relu_conv_bnstats branch from 6ea7229 to 6df2d66 Compare July 3, 2023 02:41
@paddle-ci-bot

paddle-ci-bot bot commented Jul 12, 2023

Sorry to inform you that 6df2d66's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@Tom-Zheng
Contributor Author

@Xreki For your information, here is the overview of this PR:

Review changes from last PR #54949:

  • paddle/phi/kernels/autotune/cache_cudnn_frontend.h
  • paddle/phi/kernels/gpudnn/conv_cudnn_frontend.h
  • paddle/phi/kernels/gpudnn/conv_kernel.cu

New OP implementation:

OP kernel
The implementation mostly follows CUDNN Frontend example: https://github.com/NVIDIA/cudnn-frontend/blob/12f35fa2be5994c1106367cac2fba21457b064f4/samples/fusion_sample.cpp#L86

  • paddle/phi/kernels/fusion/gpu/fused_scale_bias_relu_conv_bnstats_op.cu

OP registry

  • paddle/phi/api/yaml/fused_ops.yaml
  • paddle/phi/infermeta/fusion.cc
  • paddle/phi/infermeta/fusion.h
  • paddle/phi/kernels/CMakeLists.txt

Unittests

  • test/legacy_test/CMakeLists.txt
  • test/legacy_test/test_fused_scale_bias_relu_conv_bnstats_op.py

@Tom-Zheng Tom-Zheng force-pushed the add_fused_scale_bias_relu_conv_bnstats branch from 0675338 to 6f3ad2a Compare August 2, 2023 01:28
@Tom-Zheng
Contributor Author

The CI failures will be manually approved by @tianshuo78520a later. @Xreki please review when you have a chance.

float epsilon,
bool fuse_prologue,
bool exhaustive_search,
int64_t accumulation_count,
Contributor

What does this parameter do?

Contributor Author

@Tom-Zheng Tom-Zheng Aug 7, 2023

This is the number of elements BatchNorm normalizes over; on a single GPU (no SyncBatchNorm) it is N*H*W. See Table 42 for details.
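The reply above can be made concrete with a small helper (hypothetical name, for illustration only):

```python
def bn_accumulation_count(n, h, w, num_gpus=1):
    """Number of elements each channel's BN statistics are accumulated over.
    Single GPU (no SyncBatchNorm): N*H*W. With SyncBatchNorm, statistics are
    all-reduced across devices, so the count scales with the device count."""
    return n * h * w * num_gpus
```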

@@ -0,0 +1,635 @@
/* Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
Contributor

Copyright year: 2022 -> 2023.

Contributor Author

done

template <typename T>
using CudnnDataType = phi::backends::gpu::CudnnDataType<T>;

template <typename T, typename Context>
Contributor

Please add a comment before the function explaining what it does.

Contributor Author

done

.build();

std::array<cudnn_frontend::Operation const*, 1> ops = {&finalize_stat_op};
auto op_graph = cudnn_frontend::OperationGraphBuilder()
Contributor

Looking at L449 - L488, this part of the implementation is the same across the different combinations. Could it be factored out into a function?

Contributor Author

done

dev_ctx.GetComputeCapability()));
// attr
float exp_decay = 1. - momentum;
if (epsilon <= CUDNN_BN_MIN_EPSILON - FLT_EPSILON) {
Contributor

You could use PADDLE_ENFORCE_LE here.

Contributor Author

This is only a warning; it does not quit. It is consistent with the batch_norm implementation:

LOG(ERROR) << "Provided epsilon is smaller than "
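The warn-but-continue behavior described here can be sketched as follows. The constants and function name are illustrative; the real kernel uses `LOG(ERROR)` and cuDNN's `CUDNN_BN_MIN_EPSILON`.

```python
import warnings

CUDNN_BN_MIN_EPSILON = 1e-5   # cuDNN's minimum allowed BN epsilon
FLT_EPSILON = 1.1920929e-07   # C float epsilon, as in the kernel's check

def sanitize_bn_epsilon(epsilon):
    """Warn (rather than abort) when epsilon is below cuDNN's minimum,
    then clamp it, mirroring the batch_norm kernel's handling."""
    if epsilon <= CUDNN_BN_MIN_EPSILON - FLT_EPSILON:
        warnings.warn("Provided epsilon is smaller than CUDNN_BN_MIN_EPSILON; "
                      "setting it to CUDNN_BN_MIN_EPSILON instead.")
    return max(epsilon, CUDNN_BN_MIN_EPSILON)
```

This is why PADDLE_ENFORCE_LE (which would raise) is not a drop-in replacement here.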

using CudnnDataType = phi::backends::gpu::CudnnDataType<T>;

template <typename T, typename Context>
void _FusedScaleBiasReluConvBnstatsImpl(
Contributor

The leading _ in the function name can be removed.

Contributor Author

done

GPU,
ALL_LAYOUT,
phi::fusion::FusedScaleBiasReluConvBnstatsKernel,
phi::dtype::float16) {
Contributor

Is bfloat16 supported?

Contributor Author

Not supported yet.

@@ -0,0 +1,243 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
Contributor

2022 -> 2023

Contributor Author

done



@skip_check_grad_ci(reason="no grap op")
@unittest.skipIf(skip_unit_test(), skip_msg)
Contributor

Since the test class inherits from the base class, the subclass does not need the skip decorators.

Contributor Author

done

@@ -92,6 +92,7 @@

NO_FP16_COMPARED_WITH_FP32_OP_LIST = [
'fake_quantize_moving_average_abs_max',
'fused_scale_bias_relu_conv_bnstats',
Contributor

Why does this operator need to be added to this whitelist?

Contributor Author

If it is not added, check_output_with_place compares the FP16 result against an FP32 result; since this OP does not support FP32 input, the check would fail.
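To make the rationale concrete, the framework's default behavior can be sketched like this (a paraphrase of the check, not the actual eager_op_test internals):

```python
NO_FP16_COMPARED_WITH_FP32_OP_LIST = ['fused_scale_bias_relu_conv_bnstats']

def needs_fp32_reference(op_type, dtype):
    """An FP16 op result is normally validated against an FP32 re-run of the
    same op; whitelisted ops skip that comparison because they have no FP32
    kernel to run at all."""
    return dtype == 'float16' and op_type not in NO_FP16_COMPARED_WITH_FP32_OP_LIST
```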

@Tom-Zheng
Contributor Author

@Xreki The changes have been committed. Would you please take a look?

@Tom-Zheng
Contributor Author

The CI failures need to be manually approved and should not block the review process. @Xreki

@onecatcn
Contributor

2023-08-08 11:03:40 ****************
2023-08-08 11:03:41 0. You must have Dianhai or XiaoguangHu01 approval for change 20+ files or add than 1000+ lines of content.
2023-08-08 11:03:41 1. You must have one RD (XiaoguangHu01,chenwhql,zhiqiu,Xreki,luotao1,qili93,Aurelius84) approval for the usage of const_cast.
2023-08-08 11:03:41 2. Unittest is not allowed to be disabled.
2023-08-08 11:03:41 You must have one RD (kolinwei(Recommend), wanghuancoder, luotao1, QingshuChen, qili93 or ZzSean or Aurelius84) approval for the usage of @unittest.skip or @unittest.skipIf.
2023-08-08 11:03:41 +@unittest.skipIf(skip_unit_test(), skip_msg)
2023-08-08 11:03:41 3. Developers are not allowed to set the check_dygraph field directly, which is set to True by default. If you need to change the check_dygraph field, you must have one RD (phlrain (Recommend), fuyinno4, QingshuChen (Recommend for kunlun) or lanxianghit) review and approve.
2023-08-08 11:03:41 The code that do not meet the specification are as follows:
2023-08-08 11:03:41 test/legacy_test/test_fused_scale_bias_relu_conv_bnstats_op.py :
2023-08-08 11:03:41 + place, atol=self.atol, rtol=self.rtol, check_dygraph=False
2023-08-08 11:03:41 4. Please use the default precision parameters of 'atol, rtol, eps, max_relative_error'. If you don't use the default value, you must have one RD (Xreki (Recommend), fuyinno4, QingshuChen(Recommend for kunlun), zhiqiu or qili93 (Recommend for NPU) , luotao1, lanxianghit, phlrain or ZzSean or Aurelius84) approval for the usage of other values. The detailed information is in the link: https://github.com/PaddlePaddle/Paddle/wiki/OP-test-accuracy-requirements. The error line is
2023-08-08 11:03:41 + place, atol=self.atol, rtol=self.rtol, check_dygraph=False
2023-08-08 11:03:41 5. It is an Op accuracy problem, please take care of it. You must have one RD (zhangting2020 (Recommend), luotao1 or phlrain, qili93, QingshuChen or Aurelius84) approval for the usage (either add or delete) of @skip_check_grad_ci. For more information, please refer to: https://github.com/PaddlePaddle/Paddle/wiki/Gradient-Check-Is-Required-for-Op-Test. The corresponding lines are as follows:
2023-08-08 11:03:41 test/legacy_test/test_fused_scale_bias_relu_conv_bnstats_op.py
2023-08-08 11:03:41 + @skip_check_grad_ci(reason="no grap op")
2023-08-08 11:03:41 There are 6 approved errors.

@paddle-ci-bot

paddle-ci-bot bot commented Aug 16, 2023

Sorry to inform you that b4cb408's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@Tom-Zheng Tom-Zheng force-pushed the add_fused_scale_bias_relu_conv_bnstats branch from b4cb408 to bc6a1cd Compare August 17, 2023 07:17
@paddle-ci-bot

paddle-ci-bot bot commented Aug 25, 2023

Sorry to inform you that 5786863's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

Contributor

@Xreki Xreki left a comment

LGTM


std::vector<void*> data_ptrs;
std::vector<int64_t> uids;
int64_t uid = 100;
Contributor

What is uid, and why is it set to 100? Do different patterns need to use the same uid? Consider unifying uid management in a follow-up.

Contributor Author

uid is used to distinguish different tensors. They do not need to be the same across patterns; they only need to be conflict-free within a single cudnn operation graph.
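A minimal sketch of the uid scheme described above: uids are arbitrary integer tags on tensors, unique only within one operation graph, so each graph can start from the same base (the start value 100 mirrors the snippet under review; the allocator name is hypothetical):

```python
class UidAllocator:
    """Hands out distinct integer uids for tensors within one cuDNN frontend
    operation graph. Uniqueness is only required per graph, so independent
    graphs can safely reuse the same starting value."""
    def __init__(self, start=100):
        self.next_uid = start

    def __call__(self):
        uid = self.next_uid
        self.next_uid += 1
        return uid
```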

}

template <typename T, typename Context>
void FusedScaleBiasReluConvBnstatsKernel(
Contributor

FusedScaleBiasReluConvBnstatsKernel is FusedScaleBiasReluConvBnstatsImpl + BNFinalizeImpl, so the functionality is actually FusedScaleBiasReluConvBn.

Contributor Author

Yes, I will rename it in the next PR.
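For reference, the BNFinalizeImpl half of the kernel can be sketched in NumPy: it turns the per-channel sum and sum of squares from the bnstats stage into the saved mean, inverse standard deviation, and updated running statistics. Names and the exact running-stat update form are illustrative, not the kernel's actual API.

```python
import numpy as np

def bn_finalize(sum_y, sqsum_y, count, running_mean, running_var,
                momentum=0.9, epsilon=1e-5):
    """Finalize BN statistics from accumulated per-channel sums.
    `count` is the accumulation_count discussed above (N*H*W on one GPU)."""
    mean = sum_y / count
    var = sqsum_y / count - mean * mean        # E[y^2] - (E[y])^2
    inv_std = 1.0 / np.sqrt(var + epsilon)
    exp_decay = 1.0 - momentum                 # matches `exp_decay` in the kernel
    new_running_mean = (1.0 - exp_decay) * running_mean + exp_decay * mean
    new_running_var = (1.0 - exp_decay) * running_var + exp_decay * var
    return mean, inv_std, new_running_mean, new_running_var
```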

@onecatcn onecatcn removed the request for review from kolinwei August 30, 2023 09:18
@onecatcn onecatcn requested review from wanghuancoder and removed request for fuyinno4 August 30, 2023 09:19
Contributor

@zhangting2020 zhangting2020 left a comment

LGTM

Contributor

@lanxianghit lanxianghit left a comment

LGTM for new fused op

Contributor

@jzhang533 jzhang533 left a comment

LGTM

Contributor

@XiaoguangHu01 XiaoguangHu01 left a comment

LGTM

@Tom-Zheng
Contributor Author

@Xreki Would you please merge it?

@Xreki Xreki merged commit 71e28b1 into PaddlePaddle:develop Aug 31, 2023
25 of 26 checks passed
BeingGod pushed a commit to BeingGod/Paddle that referenced this pull request Sep 9, 2023
* Add fused_scale_bias_relu_conv_bnstats op

* Review changes

* Fix no CUDNN Frontend build

* Fix PADDLE_ENFORCE format

* Fix PADDLE_ENFORCE CI error

* Rename kernel filename

* Refactor unittest to use paddle eager_op_test

* Fix padding bugs

* Review changes

* test=cuda117

* test=cuda117
Labels
contributor External developers NVIDIA

9 participants