
Add fused_scale_bias_relu_conv_bnstats OP #55026

Merged

Conversation

Tom-Zheng
Contributor

@Tom-Zheng Tom-Zheng commented Jun 30, 2023

PR types

New features

PR changes

OPs

Description

Please merge this PR after #54949

This PR adds fused_scale_bias_relu_conv_bnstats, which is needed for ResUnit fusion. It is implemented using the cuDNN Frontend API.
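For orientation, the forward semantics of the fused op can be sketched in NumPy. This is a reference sketch only: shapes are illustrative and the convolution is reduced to a 1x1 matmul, whereas the actual kernel runs NHWC FP16 tensors through the cuDNN Frontend API.

```python
import numpy as np

def fused_scale_bias_relu_conv_bnstats(x, scale, bias, w, fuse_prologue=True):
    """Reference semantics: y = conv(relu(x * scale + bias)), plus per-channel
    batch-norm statistics (sum and sum of squares) of the conv output.
    NHWC layout; the conv is simplified to 1x1, stride 1."""
    if fuse_prologue:
        x = np.maximum(x * scale + bias, 0.0)  # fused scale + bias + ReLU prologue
    # 1x1 convolution expressed as a matmul over the channel dim (illustrative)
    y = x @ w  # (N, H, W, Cin) @ (Cin, Cout) -> (N, H, W, Cout)
    sum_y = y.sum(axis=(0, 1, 2))          # per-channel sum
    sqsum_y = (y * y).sum(axis=(0, 1, 2))  # per-channel sum of squares
    return y, sum_y, sqsum_y
```

The sum and sum-of-squares outputs are what the subsequent batch-norm computation consumes, which is why the op is named `...bnstats`.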

@paddle-bot

paddle-bot bot commented Jun 30, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@paddle-bot paddle-bot bot added contributor External developers status: proposed labels Jun 30, 2023
@Tom-Zheng Tom-Zheng requested a review from Xreki June 30, 2023 04:08
@Tom-Zheng Tom-Zheng force-pushed the add_fused_scale_bias_relu_conv_bnstats branch from 6ea7229 to 6df2d66 Compare July 3, 2023 02:41
@paddle-ci-bot

paddle-ci-bot bot commented Jul 12, 2023

Sorry to inform you that 6df2d66's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@Tom-Zheng
Contributor Author

@Xreki For your information, here is the overview of this PR:

Review changes from last PR #54949:

  • paddle/phi/kernels/autotune/cache_cudnn_frontend.h
  • paddle/phi/kernels/gpudnn/conv_cudnn_frontend.h
  • paddle/phi/kernels/gpudnn/conv_kernel.cu

New OP implementation:

OP kernel
The implementation mostly follows CUDNN Frontend example: https://github.com/NVIDIA/cudnn-frontend/blob/12f35fa2be5994c1106367cac2fba21457b064f4/samples/fusion_sample.cpp#L86

  • paddle/phi/kernels/fusion/gpu/fused_scale_bias_relu_conv_bnstats_op.cu

OP registry

  • paddle/phi/api/yaml/fused_ops.yaml
  • paddle/phi/infermeta/fusion.cc
  • paddle/phi/infermeta/fusion.h
  • paddle/phi/kernels/CMakeLists.txt

Unittests

  • test/legacy_test/CMakeLists.txt
  • test/legacy_test/test_fused_scale_bias_relu_conv_bnstats_op.py

@Tom-Zheng Tom-Zheng force-pushed the add_fused_scale_bias_relu_conv_bnstats branch from 0675338 to 6f3ad2a Compare August 2, 2023 01:28
@Tom-Zheng
Contributor Author

The CI failures will be manually approved by @tianshuo78520a later. @Xreki please review when you have a chance.

float epsilon,
bool fuse_prologue,
bool exhaustive_search,
int64_t accumulation_count,
Contributor

What does this parameter do?

Contributor Author

@Tom-Zheng Tom-Zheng Aug 7, 2023

This is the number of elements BatchNorm normalizes over; on a single GPU (no SyncBatchNorm) it is N*H*W. See Table 42 for details.
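The reply above can be made concrete with a small helper (hypothetical name, for illustration only):

```python
def bn_accumulation_count(n, h, w, num_gpus=1):
    """Number of elements each channel's BN statistics are accumulated over.
    Single GPU (no SyncBatchNorm): N*H*W. With SyncBatchNorm, statistics are
    all-reduced across devices, so the count scales with the device count."""
    return n * h * w * num_gpus
```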

@@ -0,0 +1,635 @@
/* Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
Contributor

Copyright year: 2022 -> 2023.

Contributor Author

done

template <typename T>
using CudnnDataType = phi::backends::gpu::CudnnDataType<T>;

template <typename T, typename Context>
Contributor

Please add a comment before the function explaining what it does.

Contributor Author

done

.build();

std::array<cudnn_frontend::Operation const*, 1> ops = {&finalize_stat_op};
auto op_graph = cudnn_frontend::OperationGraphBuilder()
Contributor

Looking at L449 - L488, this part of the implementation is the same across the different combinations. Could it be factored out into a function?

Contributor Author

done

dev_ctx.GetComputeCapability()));
// attr
float exp_decay = 1. - momentum;
if (epsilon <= CUDNN_BN_MIN_EPSILON - FLT_EPSILON) {
Contributor

You could use PADDLE_ENFORCE_LE here.

Contributor Author

This is only a warning; it does not quit. It is consistent with the batch_norm implementation:

LOG(ERROR) << "Provided epsilon is smaller than "
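The warn-but-continue behavior described here can be sketched as follows. The constants and function name are illustrative; the real kernel uses `LOG(ERROR)` and cuDNN's `CUDNN_BN_MIN_EPSILON`.

```python
import warnings

CUDNN_BN_MIN_EPSILON = 1e-5   # cuDNN's minimum allowed BN epsilon
FLT_EPSILON = 1.1920929e-07   # C float epsilon, as in the kernel's check

def sanitize_bn_epsilon(epsilon):
    """Warn (rather than abort) when epsilon is below cuDNN's minimum,
    then clamp it, mirroring the batch_norm kernel's handling."""
    if epsilon <= CUDNN_BN_MIN_EPSILON - FLT_EPSILON:
        warnings.warn("Provided epsilon is smaller than CUDNN_BN_MIN_EPSILON; "
                      "setting it to CUDNN_BN_MIN_EPSILON instead.")
    return max(epsilon, CUDNN_BN_MIN_EPSILON)
```

This is why PADDLE_ENFORCE_LE (which would raise) is not a drop-in replacement here.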

using CudnnDataType = phi::backends::gpu::CudnnDataType<T>;

template <typename T, typename Context>
void _FusedScaleBiasReluConvBnstatsImpl(
Contributor

The leading _ in the function name can be removed.

Contributor Author

done

GPU,
ALL_LAYOUT,
phi::fusion::FusedScaleBiasReluConvBnstatsKernel,
phi::dtype::float16) {
Contributor

Is bfloat16 supported?

Contributor Author

Not supported yet.

@@ -0,0 +1,243 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
Contributor

2022 -> 2023

Contributor Author

done



@skip_check_grad_ci(reason="no grap op")
@unittest.skipIf(skip_unit_test(), skip_msg)
Contributor

Since the test class inherits from the base class, the subclass does not need the skip decorators.

Contributor Author

done

@@ -92,6 +92,7 @@

NO_FP16_COMPARED_WITH_FP32_OP_LIST = [
'fake_quantize_moving_average_abs_max',
'fused_scale_bias_relu_conv_bnstats',
Contributor

Why does this operator need to be added to this whitelist?

Contributor Author

If it is not added, check_output_with_place compares the FP16 result against an FP32 result; since this OP does not support FP32 input, the check would fail.
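To make the rationale concrete, the framework's default behavior can be sketched like this (a paraphrase of the check, not the actual eager_op_test internals):

```python
NO_FP16_COMPARED_WITH_FP32_OP_LIST = ['fused_scale_bias_relu_conv_bnstats']

def needs_fp32_reference(op_type, dtype):
    """An FP16 op result is normally validated against an FP32 re-run of the
    same op; whitelisted ops skip that comparison because they have no FP32
    kernel to run at all."""
    return dtype == 'float16' and op_type not in NO_FP16_COMPARED_WITH_FP32_OP_LIST
```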

@Tom-Zheng
Contributor Author

@Xreki The changes have been committed. Would you please take a look?

@Tom-Zheng
Contributor Author

The CI failures need to be manually approved and should not block the review process. @Xreki

@onecatcn
Contributor

2023-08-08 11:03:40 ****************
2023-08-08 11:03:41 0. You must have Dianhai or XiaoguangHu01 approval for change 20+ files or add than 1000+ lines of content.
2023-08-08 11:03:41 1. You must have one RD (XiaoguangHu01,chenwhql,zhiqiu,Xreki,luotao1,qili93,Aurelius84) approval for the usage of const_cast.
2023-08-08 11:03:41 2. Unittest is not allowed to be disabled.
2023-08-08 11:03:41 You must have one RD (kolinwei(Recommend), wanghuancoder, luotao1, QingshuChen, qili93 or ZzSean or Aurelius84) approval for the usage of @unittest.skip or @unittest.skipIf.
2023-08-08 11:03:41 +@unittest.skipIf(skip_unit_test(), skip_msg)
2023-08-08 11:03:41 3. Developers are not allowed to set the check_dygraph field directly, which is set to True by default. If you need to change the check_dygraph field, you must have one RD (phlrain (Recommend), fuyinno4, QingshuChen (Recommend for kunlun) or lanxianghit) review and approve.
2023-08-08 11:03:41 The code that do not meet the specification are as follows:
2023-08-08 11:03:41 test/legacy_test/test_fused_scale_bias_relu_conv_bnstats_op.py :
2023-08-08 11:03:41 + place, atol=self.atol, rtol=self.rtol, check_dygraph=False
2023-08-08 11:03:41 4. Please use the default precision parameters of 'atol, rtol, eps, max_relative_error'. If you don't use the default value, you must have one RD (Xreki (Recommend), fuyinno4, QingshuChen(Recommend for kunlun), zhiqiu or qili93 (Recommend for NPU) , luotao1, lanxianghit, phlrain or ZzSean or Aurelius84) approval for the usage of other values. The detailed information is in the link: https://github.com/PaddlePaddle/Paddle/wiki/OP-test-accuracy-requirements. The error line is
2023-08-08 11:03:41 + place, atol=self.atol, rtol=self.rtol, check_dygraph=False
2023-08-08 11:03:41 5. It is an Op accuracy problem, please take care of it. You must have one RD (zhangting2020 (Recommend), luotao1 or phlrain, qili93, QingshuChen or Aurelius84) approval for the usage (either add or delete) of @skip_check_grad_ci. For more information, please refer to: https://github.com/PaddlePaddle/Paddle/wiki/Gradient-Check-Is-Required-for-Op-Test. The corresponding lines are as follows:
2023-08-08 11:03:41 test/legacy_test/test_fused_scale_bias_relu_conv_bnstats_op.py
2023-08-08 11:03:41 + @skip_check_grad_ci(reason="no grap op")
2023-08-08 11:03:41 There are 6 approved errors.

@paddle-ci-bot

paddle-ci-bot bot commented Aug 16, 2023

Sorry to inform you that b4cb408's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@Tom-Zheng Tom-Zheng force-pushed the add_fused_scale_bias_relu_conv_bnstats branch from b4cb408 to bc6a1cd Compare August 17, 2023 07:17
@paddle-ci-bot

paddle-ci-bot bot commented Aug 25, 2023

Sorry to inform you that 5786863's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

Contributor

@Xreki Xreki left a comment

LGTM


std::vector<void*> data_ptrs;
std::vector<int64_t> uids;
int64_t uid = 100;
Contributor

What is uid, and why is it set to 100? Do different patterns need to use the same uid? Consider unifying uid management in a follow-up.

Contributor Author

uid is used to distinguish different tensors. They do not need to be the same across patterns; they only need to be conflict-free within a single cudnn operation graph.
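A minimal sketch of the uid scheme described above: uids are arbitrary integer tags on tensors, unique only within one operation graph, so each graph can start from the same base (the start value 100 mirrors the snippet under review; the allocator name is hypothetical):

```python
class UidAllocator:
    """Hands out distinct integer uids for tensors within one cuDNN frontend
    operation graph. Uniqueness is only required per graph, so independent
    graphs can safely reuse the same starting value."""
    def __init__(self, start=100):
        self.next_uid = start

    def __call__(self):
        uid = self.next_uid
        self.next_uid += 1
        return uid
```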

}

template <typename T, typename Context>
void FusedScaleBiasReluConvBnstatsKernel(
Contributor

FusedScaleBiasReluConvBnstatsKernel is FusedScaleBiasReluConvBnstatsImpl + BNFinalizeImpl, so the functionality is actually FusedScaleBiasReluConvBn.

Contributor Author

Yes, I will rename it in the next PR.
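For reference, the BNFinalizeImpl half of the kernel can be sketched in NumPy: it turns the per-channel sum and sum of squares from the bnstats stage into the saved mean, inverse standard deviation, and updated running statistics. Names and the exact running-stat update form are illustrative, not the kernel's actual API.

```python
import numpy as np

def bn_finalize(sum_y, sqsum_y, count, running_mean, running_var,
                momentum=0.9, epsilon=1e-5):
    """Finalize BN statistics from accumulated per-channel sums.
    `count` is the accumulation_count discussed above (N*H*W on one GPU)."""
    mean = sum_y / count
    var = sqsum_y / count - mean * mean        # E[y^2] - (E[y])^2
    inv_std = 1.0 / np.sqrt(var + epsilon)
    exp_decay = 1.0 - momentum                 # matches `exp_decay` in the kernel
    new_running_mean = (1.0 - exp_decay) * running_mean + exp_decay * mean
    new_running_var = (1.0 - exp_decay) * running_var + exp_decay * var
    return mean, inv_std, new_running_mean, new_running_var
```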

@onecatcn onecatcn removed the request for review from kolinwei August 30, 2023 09:18
@onecatcn onecatcn requested review from wanghuancoder and removed request for fuyinno4 August 30, 2023 09:19
Contributor

@zhangting2020 zhangting2020 left a comment

LGTM

Contributor

@lanxianghit lanxianghit left a comment

LGTM for new fused op

Contributor

@jzhang533 jzhang533 left a comment

LGTM

Contributor

@XiaoguangHu01 XiaoguangHu01 left a comment

LGTM

@Tom-Zheng
Contributor Author

@Xreki Would you please merge it?

@Xreki Xreki merged commit 71e28b1 into PaddlePaddle:develop Aug 31, 2023
25 of 26 checks passed
BeingGod pushed a commit to BeingGod/Paddle that referenced this pull request Sep 9, 2023
* Add fused_scale_bias_relu_conv_bnstats op

* Review changes

* Fix no CUDNN Frontend build

* Fix PADDLE_ENFORCE format

* Fix PADDLE_ENFORCE CI error

* Rename kernel filename

* Refactor unittest to use paddle eager_op_test

* Fix padding bugs

* Review changes

* test=cuda117

* test=cuda117
Labels
contributor External developers NVIDIA

9 participants