
[Paddle-Inference] Add cutlass conv2d_depthwise #51792

Merged
merged 26 commits into from Apr 17, 2023

Conversation

@zhoutianzi666 (Contributor) commented Mar 17, 2023

PR types

Others

PR changes

Others

Describe

  • Previously, the logic for deciding whether CUTLASS can support a given conv was scattered across the individual passes:
    • e.g. conv_elementwise_add_act_fuse_pass contained a pile of checks for whether the conv + bias + act fusion could be computed by CUTLASS
    • conv2d_fusion_layout_transfer_pass also contained logic deciding whether a conv2d_fusion could be handed to CUTLASS
    • more passes (such as the upcoming conv + bias + act + elementwise + act) will need to decide whether their fused pattern can be computed by CUTLASS
    • this logic is now unified in cutlass_teller.h.
  • This PR also adds template-generation code for conv2d_depthwise.
    • Only 3x3 s1/s2 and 5x5 s1/s2 are added.
  • This PR also adds a sigmoid epilogue to conv2d.
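The centralized-teller idea above can be sketched as a minimal singleton that every pass queries instead of duplicating its own checks. The names below (SimpleCutlassTeller, DepthwiseCanSupport) are illustrative, not the actual cutlass_teller.h API:

```cpp
// Hypothetical sketch: one process-wide teller that all passes share,
// instead of each pass re-implementing CUTLASS support checks.
class SimpleCutlassTeller {
 public:
  static SimpleCutlassTeller *Instance() {
    static SimpleCutlassTeller teller;
    return &teller;
  }
  // Example check mirroring this PR's scope: depthwise kernels are only
  // generated for 3x3 and 5x5 filters with stride 1 or 2.
  bool DepthwiseCanSupport(int kh, int kw, int stride) const {
    bool shape_ok = (kh == kw) && (kh == 3 || kh == 5);
    bool stride_ok = (stride == 1 || stride == 2);
    return shape_ok && stride_ok;
  }

 private:
  SimpleCutlassTeller() = default;
};
```

A pass would then call `SimpleCutlassTeller::Instance()->DepthwiseCanSupport(...)` rather than carrying its own copy of the rule.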

@@ -202,3 +202,20 @@ def GenerateFunctionForPhi(
op_dicts["op_name"] = camel_names[epi_func]
generated_code += SubstituteTemplate(CommonWrapperForPhi, op_dicts)
return generated_code


# we modify some template parameters based on CommonCutlassConvKernelDeclare.
Contributor Author:

The conv2d_depthwise template differs only slightly from the conv2d template. I didn't want to maintain two templates, so I tweak the conv2d template instead.

@@ -152,14 +141,17 @@ void Conv2dFusionLayoutTransferPass::ApplyImpl(ir::Graph *graph) const {
std::string target_op_type = "conv2d_fusion";
std::unordered_set<ir::Node *> valid_ops;

// Determine if this conv2d_fusion can run in cuDNN's NHWC mode,
// will not set or change any attribute in op_desc
auto cuDNNIsValid = [&](ir::Node *op_node) -> bool {
Contributor Author:

cuDNNIsValid is now standalone logic, used only by cuDNN. It decides whether this conv2d_fusion can be run by cuDNN in NHWC mode.

}
}
return true;
return CutlassTeller::Instance()->Conv2dFusionCanSupport(
Contributor Author:

Here we call the function from CutlassTeller.


// Determine whether this NCHW conv2d_fusion can be computed by cutlass,
// will not set or change any attribute in op_desc
bool Conv2dFusionCanSupport(ir::Node *conv2d_fusion_node,
Contributor Author:

This function decides whether this conv2d_fusion op can be computed by CUTLASS.


// Determine whether this conv can be fused with the activation by cutlass
// backend.
bool Conv2dCanSupport(int oc,
Contributor Author:

This function decides whether this conv + bias + act can be fused and then handed to CUTLASS.

@@ -243,6 +254,11 @@ int ProfileToGetBestConfig(
auto func = all_func[i];
// When func has large diff, we will make it nullptr.
if (!func) continue;
cudaMemset(params.output,
Contributor Author:

Remember to clear the output before each run, so any diff is actually visible. Then do one trial run to see whether the kernel can run at all; if it can't, continue to the next candidate.
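The "clear, trial-run, skip" flow can be sketched host-side. KernelFunc and PickFirstRunnable are illustrative stand-ins; the real profiling code clears device memory with cudaMemset and also times each candidate:

```cpp
#include <cstring>
#include <functional>
#include <vector>

// A candidate kernel: returns 0 on success, nonzero when it cannot run.
using KernelFunc = std::function<int(float *out, int n)>;

int PickFirstRunnable(const std::vector<KernelFunc> &funcs, float *out, int n) {
  for (size_t i = 0; i < funcs.size(); ++i) {
    if (!funcs[i]) continue;                 // candidate was disabled earlier
    std::memset(out, 0, n * sizeof(float));  // clear so stale data can't hide a diff
    if (funcs[i](out, n) != 0) continue;     // trial run failed: try the next one
    return static_cast<int>(i);              // first candidate that can run
  }
  return -1;  // no candidate could run
}
```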

@@ -0,0 +1,130 @@
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
Contributor:

Is framework/ir the right directory for cutlass_teller.h?

Contributor Author:

Is framework/ir the right directory for cutlass_teller.h?

This class is only used during the pass phase, so this is the best place for it.

Contributor:

Putting the teller under ir still feels a bit odd to me.

paddle/phi/kernels/fusion/cutlass/conv2d/conv2d_decl.h (review comments outdated, resolved)
paddle/phi/kernels/fusion/cutlass/conv2d_fusion.cu (review comments outdated, resolved)
Comment on lines 138 to 140
"arch": "cutlass::arch::Sm70",
"Ishape": "1,1,1",
"stages": "4",
Contributor:

Why is the arch Sm70 here, and why is stages set to 4 instead of 2?

Contributor Author:

Why is the arch Sm70 here, and why is stages set to 4 instead of 2?

conv2d_depthwise runs on CUDA cores, so the SM setting doesn't really matter here.

stages has been changed back to 2 in the cutlass templates.

Comment on lines 180 to 185
CHECK_EQ(op_node->IsOp(), true);
if (cuDNNIsValid(op_node)) {
if (cuDNNIsValid(op_node) || CutlassIsValid(op_node)) {
valid_ops.insert(op_node);
auto *op_desc = op_node->Op();
op_desc->SetAttr("data_format", std::string{"NHWC"});
if (cutlass_enable && CutlassIsValid(op_node)) {

if (CutlassIsValid(op_node)) {
Contributor:

There is a problem when PADDLE_WITH_CUTLASS is OFF but enable_cutlass is true.

Contributor Author:

There is a problem when PADDLE_WITH_CUTLASS is OFF but enable_cutlass is true.

This logic is now handled uniformly in cutlass_teller.h: when PADDLE_WITH_CUTLASS is OFF, these functions all return false.
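The build-flag guard described here can be sketched as below. PADDLE_WITH_CUTLASS is the real flag; the function name and the alignment check are illustrative:

```cpp
// When the library is built without CUTLASS, the teller's checks
// degenerate to "return false", so use_cutlass=true at runtime simply
// selects nothing instead of reaching unavailable kernels.
#ifdef PADDLE_WITH_CUTLASS
bool CutlassTellerSketch(int oc, int ic) {
  return oc % 8 == 0 && ic % 8 == 0;  // example alignment requirement
}
#else
bool CutlassTellerSketch(int, int) { return false; }
#endif
```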

@@ -0,0 +1,130 @@
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
Contributor:

Putting the teller under ir still feels a bit odd to me.

if (op_node->Op()->Type() != target_op_type) {
continue;
}
auto filter_name = op_node->Op()->Input("Filter").front();
Contributor:

Why were lines 181-184 added?

Contributor Author:

Why were lines 181-184 added?

This guards against shared weights. It was moved out into a separate check, decoupled from the CutlassIsValid and cuDNNIsValid logic.

"cutlass::half_t",
"cutlass::half_t",
),
# (
Contributor:

Why was this commented out?

Contributor Author:

Why was this commented out?

For stability, fp32 is kept as the accumulator to prevent overflow.

continue;
}
auto filter_name = op_node->Op()->Input("Filter").front();
if (weights_shape_nhwc.count(filter_name)) {
Contributor Author:

This guards against shared weights. It was moved out into a separate check, decoupled from the CutlassIsValid and cuDNNIsValid logic.
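The shared-weight guard can be sketched as below. The helper name is illustrative; the real pass keeps a weights_shape_nhwc map keyed by filter name, as the diff above shows:

```cpp
#include <set>
#include <string>

// A filter shared by several convs must be converted to NHWC exactly once;
// later ops that reuse the same filter name must skip the conversion.
bool ShouldConvertFilter(std::set<std::string> *converted,
                         const std::string &filter_name) {
  if (converted->count(filter_name)) return false;  // already converted elsewhere
  converted->insert(filter_name);
  return true;
}
```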

VLOG(3) << OpType2String(op_type) << ": tactic " << i << " has max diff "
<< conv2d_diff_gpu(params, op_type) << " compared with baseline,"
<< "cost_time: " << elapsed_time << "ms.";
std::cout << OpType2String(op_type) << ": tactic " << i
Contributor:

Switch back to VLOG?

Contributor Author:

Switch back to VLOG?

ok!

# groups_per_cta: the number of groups each CTA processes
# warp_m: the tile size each warp processes along M
[8, 8, 16, 16],
# [8, 16, 16, 16],
Contributor:

Are these commented-out configs kept because they may be enabled later?

Contributor Author:

Are these commented-out configs kept because they may be enabled later?

Yes, these are valid configs too; they are commented out mainly to avoid generating too much code.

@@ -71,6 +71,7 @@
int ow = params.ow;
int dilation_h = params.dilation_h;
int dilation_w = params.dilation_w;
int split_k_slices = ${split_k_slices};
Contributor Author:

For group=1 convolutions this is always set to 1, but for depthwise_conv2d it has to be chosen flexibly based on the problem size.
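One way to read that comment as code. The heuristic below is purely illustrative (thresholds and function name invented here), not the rule Paddle actually uses:

```cpp
// For an ordinary (group=1) conv, split-k stays at 1. For a depthwise
// conv each per-channel filter is tiny, so splitting the reduction can
// recover parallelism when the spatial problem is small.
int ChooseSplitKSlices(int groups, int output_tiles) {
  if (groups == 1) return 1;          // group=1: always 1
  if (output_tiles >= 128) return 1;  // enough CTAs already
  return 4;                           // small depthwise problem: split k
}
```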

namespace framework {
namespace ir {

typedef enum {
Contributor Author:

cba means a conv+bias+act fusion; cbaa means a conv+bias+elementwise_add+act fusion. More pattern shapes will be supported in the future, which is why an enum type is defined here.
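A minimal sketch of such an enum. The real names live in cutlass_teller.h; the identifiers below are illustrative:

```cpp
// cba  = conv + bias + act
// cbaa = conv + bias + elementwise_add + act
// New kinds can be appended as more fusion patterns are supported.
typedef enum {
  fusion_none = 0,
  fusion_cba,
  fusion_cbaa,
} CutlassFusionTypeSketch;
```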

if (!cutlass_can_support) {
bool cudnn_can_support =
oc % CUDNN_ALIGNMENT == 0 && ic % CUDNN_ALIGNMENT == 0;
if (!cudnn_can_support) {
return false;
}
}
return true;
};

auto CutlassIsValid = [&](ir::Node *op_node) -> bool {
Contributor Author:

Decides whether this conv2d_fusion can be computed by the cutlass backend.

@@ -112,17 +112,6 @@ void Conv2dFusionLayoutTransferPass::ApplyImpl(ir::Graph *graph) const {
phi::DataType::FLOAT16 ||
Get<bool>("enable_gpu_mixed");
bool cutlass_enable = Get<bool>("use_cutlass");

Contributor Author:

All of the deleted logic has been moved into cutlass_teller.h.

@zhangjun (Contributor) left a comment

LGTM

@zhangjun zhangjun merged commit bd3b096 into PaddlePaddle:develop Apr 17, 2023
24 checks passed
jjyaoao pushed a commit to jjyaoao/Paddle that referenced this pull request Apr 19, 2023
* initial commit for cutlass_teller

* second commit for cutlass_teller

* add conv2d_depthwise python template

* add conv2d_depthwise cutlass template

* /zhoukangkang/paddle_cutlass/Paddle/paddle/fluid/framework/ir/cutlass_teller.h

* refine code in Conv2dFusionCanSupport

* add macro in cutlass_teller.h

* add 3x3 5x5 teller

* add groups not 1 or conv2d_depthwise teller

* only generate conv2d_depthwise kernels whose ic is a multiple of 8

* add EXPLICIT in cutlass_teller.h

* final commit

* add split_k_slices in conv2d_depthwise

* make stages == 2

* refactor part of the code

* add CutlassFusionType

* solve illegal memory

* make stride_h=stride_w && make dilation==1

* must check HasAttr(use_cutlass) before GetAttrIfExists

* add CONV2D_DEPTHWISE_BIAS_SILU to OpType2String

* modify decl.h and util.cu
lijialin03 pushed a commit to lijialin03/Paddle that referenced this pull request Apr 25, 2023
5 participants