[Paddle-Inference] Add cutlass conv2d_depthwise #51792
Conversation
@@ -202,3 +202,20 @@ def GenerateFunctionForPhi(
    op_dicts["op_name"] = camel_names[epi_func]
    generated_code += SubstituteTemplate(CommonWrapperForPhi, op_dicts)
    return generated_code

# we modify some template parameters based on CommonCutlassConvKernelDeclare.
The conv2d_depthwise template differs slightly from the conv2d template. I didn't want to maintain two copies, so I just tweaked the conv2d template!
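The generator stamps out kernels by substituting parameters into a shared template string, which is why one template can serve both conv2d and conv2d_depthwise. A minimal sketch of `${key}`-style substitution, modeled on the `SubstituteTemplate` call in the diff; the template text and helper name below are illustrative, not the actual Paddle templates:

```python
import re

def substitute_template(template, values):
    """Replace every ${key} in `template` with values[key]."""
    return re.sub(r"\$\{(\w+)\}", lambda m: values[m.group(1)], template)

# Hypothetical kernel declaration with two template parameters.
kernel_decl = "cutlass::conv::kernel::DefaultDepthwise<${arch}, ${stages}>"
print(substitute_template(kernel_decl,
                          {"arch": "cutlass::arch::Sm70", "stages": "2"}))
# -> cutlass::conv::kernel::DefaultDepthwise<cutlass::arch::Sm70, 2>
```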
@@ -152,14 +141,17 @@ void Conv2dFusionLayoutTransferPass::ApplyImpl(ir::Graph *graph) const {
  std::string target_op_type = "conv2d_fusion";
  std::unordered_set<ir::Node *> valid_ops;

  // Determine if this conv2d_fusion can run in cuDNN's NHWC mode,
  // will not set or change any attribute in op_desc
  auto cuDNNIsValid = [&](ir::Node *op_node) -> bool {
The cuDNNIsValid logic is split out on its own and used only by cuDNN!
It decides whether this conv2d_fusion can be handed to cuDNN to run in NHWC layout.
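The check itself is a pure predicate over the op's channel counts, with no side effects on op_desc. A hedged Python sketch, assuming an alignment constant like the CUDNN_ALIGNMENT used elsewhere in this pass (the value 8 here is an assumption):

```python
CUDNN_ALIGNMENT = 8  # illustrative; the real constant lives in the pass

def cudnn_is_valid(oc, ic):
    # A conv2d_fusion can run in cuDNN's NHWC mode only if both channel
    # counts are aligned; nothing in op_desc is modified by this check.
    return oc % CUDNN_ALIGNMENT == 0 and ic % CUDNN_ALIGNMENT == 0

print(cudnn_is_valid(64, 64), cudnn_is_valid(64, 3))  # True False
```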
  }
  }
  return true;
  return CutlassTeller::Instance()->Conv2dFusionCanSupport(
Here we call the function in CutlassTeller!
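CutlassTeller is reached through a singleton accessor. A Python sketch of the call shape, with method names transliterated from the diff; the body of the check is a placeholder, not Paddle's actual rules:

```python
class CutlassTeller:
    """Sketch of a singleton 'teller' answering can-cutlass-run-this queries."""
    _instance = None

    @classmethod
    def instance(cls):
        # One shared instance, mirroring CutlassTeller::Instance() in C++.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def conv2d_fusion_can_support(self, groups, oc, ic):
        # Placeholder rule: the real checks (kernel size, strides, act)
        # live in cutlass_teller.h; this only illustrates the call shape.
        return groups == 1 or groups == ic  # ordinary conv or depthwise

assert CutlassTeller.instance() is CutlassTeller.instance()
print(CutlassTeller.instance().conv2d_fusion_can_support(1, 64, 64))  # True
```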
  // Determine this NCHW conv2d_fusion can be computed by cutlass?
  // will not set or change any attribute in op_desc
  bool Conv2dFusionCanSupport(ir::Node *conv2d_fusion_node,
What this function does: decide whether this conv2d_fusion op can be computed by CUTLASS!
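Pulling the constraints from this PR's commit log (a 3x3/5x5 teller, groups == 1 or depthwise, ic a multiple of 8), the decision could be sketched roughly as follows. Treat every threshold as a reconstruction from the commit messages, not the authoritative rule set:

```python
def conv2d_fusion_can_support(ic, groups, kh, kw):
    # Reconstructed from the commit log: kernels are only generated for
    # 3x3 / 5x5 filters, for ordinary (groups == 1) or depthwise
    # (groups == ic) convolutions, and only when ic is a multiple of 8.
    if (kh, kw) not in {(3, 3), (5, 5)}:
        return False
    if groups != 1 and groups != ic:
        return False
    return ic % 8 == 0

print(conv2d_fusion_can_support(64, 64, 3, 3))  # depthwise 3x3 -> True
print(conv2d_fusion_can_support(64, 2, 3, 3))   # grouped conv  -> False
```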
  // Determine whether this conv can be fused with the activation by cutlass
  // backend.
  bool Conv2dCanSupport(int oc,
What this function does: decide whether this conv + bias + act can be fused and handed to CUTLASS for computation.
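In spirit this is a whitelist test on the activation plus alignment checks. A sketch, where the activation set is an assumption (the commit log only confirms silu, via CONV2D_DEPTHWISE_BIAS_SILU):

```python
# Illustrative activation whitelist; only silu is confirmed by the commit
# log, the other entries are assumptions for the sake of the sketch.
FUSABLE_ACTS = {"identity", "relu", "silu"}

def conv2d_can_support(oc, act_type):
    # Can conv + bias + `act_type` be fused and handed to CUTLASS?
    return act_type in FUSABLE_ACTS and oc % 8 == 0

print(conv2d_can_support(64, "silu"))  # True
print(conv2d_can_support(64, "tanh"))  # False
```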
@@ -243,6 +254,11 @@ int ProfileToGetBestConfig(
    auto func = all_func[i];
    // When func has large diff, we will make it nullptr.
    if (!func) continue;
    cudaMemset(params.output,
Remember to clear the output before each run, so you can actually tell whether there is a diff!
Then do a trial run to see whether the kernel can run at all; if it can't, continue right away!
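The profiling loop described here, zero the output buffer, trial-run each candidate, skip failures, then compare against the baseline, can be sketched with stand-in callables (all names here are hypothetical; timing is elided):

```python
def profile_best(kernels, run, clear_output, max_diff, threshold=1e-3):
    """Sketch of the profiling loop: clear output, trial-run each candidate,
    skip ones that fail or diverge from the baseline, keep a survivor."""
    best = None
    for k in kernels:
        if k is None:          # kernels with large diff were nulled earlier
            continue
        clear_output()         # analogous to cudaMemset(params.output, ...)
        if not run(k):         # trial run: if it can't run, skip immediately
            continue
        if max_diff(k) > threshold:
            continue
        best = k if best is None else best  # timing comparison elided
    return best

best = profile_best(["k0", None, "k1"],
                    run=lambda k: k != "k0",    # pretend k0 fails its trial
                    clear_output=lambda: None,  # stand-in for cudaMemset
                    max_diff=lambda k: 0.0)
print(best)  # k1
```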
@@ -0,0 +1,130 @@
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
Is it appropriate to put cutlass_teller.h under the framework/ir directory?
> Is it appropriate to put cutlass_teller.h under the framework/ir directory?

This class is only used during the pass stage, so this is the best place for it.
Putting the teller inside ir still feels a bit odd to me.
"arch": "cutlass::arch::Sm70",
"Ishape": "1,1,1",
"stages": "4",
Why is sm70 used here? And why is stages set to 4 instead of 2?
> Why is sm70 used here? And why is stages set to 4 instead of 2?

conv2d_depthwise runs on CUDA cores, so the sm setting doesn't really matter.
stages has been changed back to 2 in the cutlass code!
    CHECK_EQ(op_node->IsOp(), true);
    if (cuDNNIsValid(op_node)) {
    if (cuDNNIsValid(op_node) || CutlassIsValid(op_node)) {
      valid_ops.insert(op_node);
      auto *op_desc = op_node->Op();
      op_desc->SetAttr("data_format", std::string{"NHWC"});
      if (cutlass_enable && CutlassIsValid(op_node)) {

      if (CutlassIsValid(op_node)) {
There is a problem when PADDLE_WITH_CUTLASS is OFF but enable_cutlass is true.
> There is a problem when PADDLE_WITH_CUTLASS is OFF but enable_cutlass is true.

This logic has all been moved into cutlass_teller.h: when PADDLE_WITH_CUTLASS is OFF, the relevant functions simply return false.
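The compile-time guard can be pictured as a feature flag that short-circuits every teller query; in the real header this is a preprocessor `#ifdef PADDLE_WITH_CUTLASS` rather than a runtime flag, so the Python below is only an analogy:

```python
# Stand-in for the CMake/preprocessor flag; when it is off, every
# "can cutlass do this?" query answers False, so enabling use_cutlass at
# runtime can no longer reach unsupported code paths.
PADDLE_WITH_CUTLASS = False

def conv2d_fusion_can_support(*args):
    if not PADDLE_WITH_CUTLASS:
        return False
    # ... the real teller checks would go here ...
    return True

print(conv2d_fusion_can_support(64, 64, 3, 3))  # False while the flag is off
```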
@@ -0,0 +1,130 @@
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
Putting the teller inside ir still feels a bit odd to me.
    if (op_node->Op()->Type() != target_op_type) {
      continue;
    }
    auto filter_name = op_node->Op()->Input("Filter").front();
Why were lines 181-184 added?
> Why were lines 181-184 added?

This is the guard against weight sharing. It was moved out into its own check to decouple it from the CutlassIsValid and cuDNNIsValid logic.
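The guard amounts to tracking which filter names have already been claimed, so a weight shared by two convs is never layout-transformed twice. A sketch of that set-based check, modeled on the `weights_shape_nhwc.count(filter_name)` line in the diff (function and variable names are illustrative):

```python
def select_transformable(ops):
    """Skip ops whose filter weight was already claimed by an earlier op,
    so a shared weight is never layout-transformed twice."""
    seen_filters = set()
    chosen = []
    for op_name, filter_name in ops:
        if filter_name in seen_filters:
            continue  # weight shared with an earlier conv: leave it alone
        seen_filters.add(filter_name)
        chosen.append(op_name)
    return chosen

print(select_transformable([("conv_a", "w0"), ("conv_b", "w0"),
                            ("conv_c", "w1")]))
# -> ['conv_a', 'conv_c']
```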
"cutlass::half_t",
"cutlass::half_t",
),
# (
Why was this commented out?
> Why was this commented out?

For stability we still use fp32 as the accumulator, to prevent overflow.
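The overflow concern is concrete: fp16's largest finite value is 65504, so a long accumulation of products can run off the end of the half-precision range while an fp32 accumulator stays finite. A crude simulation of the overflow alone (real fp16 also loses precision well before this point):

```python
FP16_MAX = 65504.0  # largest finite half-precision value

def fp16_sim_add(a, b):
    # Overflow-only model of fp16 addition: any sum past the fp16
    # range becomes inf.
    s = a + b
    return float("inf") if s > FP16_MAX else s

acc_half = 0.0   # simulated fp16 accumulator
acc_float = 0.0  # fp32-style accumulator
for _ in range(1000):
    acc_half = fp16_sim_add(acc_half, 100.0)
    acc_float += 100.0

print(acc_half, acc_float)  # inf 100000.0
```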
      continue;
    }
    auto filter_name = op_node->Op()->Input("Filter").front();
    if (weights_shape_nhwc.count(filter_name)) {
This is the guard against weight sharing. It was moved out into its own check to decouple it from the CutlassIsValid and cuDNNIsValid logic.
VLOG(3) << OpType2String(op_type) << ": tactic " << i << " has max diff "
        << conv2d_diff_gpu(params, op_type) << " compared with baseline,"
        << "cost_time: " << elapsed_time << "ms.";
std::cout << OpType2String(op_type) << ": tactic " << i
Switch back to VLOG?
> Switch back to VLOG?

OK!
# groups_per_cta: per cta would process
# warp_m: per warp would process
[8, 8, 16, 16],
# [8, 16, 16, 16],
Are these commented-out configurations kept because they might be enabled later?
> Are these commented-out configurations kept because they might be enabled later?

Yes, these are valid configurations too. They were commented out mainly to avoid generating too much code.
@@ -71,6 +71,7 @@
  int ow = params.ow;
  int dilation_h = params.dilation_h;
  int dilation_w = params.dilation_w;
  int split_k_slices = ${split_k_slices};
For a convolution with groups=1 this is always set to 1.
But for depthwise_conv2d it has to be set flexibly according to the problem size.
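A possible shape for such a rule: keep 1 for ordinary convolutions and raise split_k_slices when the depthwise problem is too small to fill the GPU. The heuristic below is purely illustrative, not Paddle's actual policy:

```python
def pick_split_k_slices(groups, ic, oh, ow):
    # For ordinary convolutions (groups == 1) the template always uses 1.
    if groups == 1:
        return 1
    # Depthwise sketch (illustrative heuristic, not Paddle's actual rule):
    # small spatial outputs leave SMs idle, so split the reduction further.
    out_tiles = oh * ow
    if out_tiles >= 1024:
        return 1
    return 4 if out_tiles < 256 else 2

print(pick_split_k_slices(1, 64, 7, 7))   # ordinary conv -> 1
print(pick_split_k_slices(64, 64, 7, 7))  # small depthwise -> 4
```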
namespace framework {
namespace ir {

typedef enum {
cba means the conv + bias + act fusion pattern.
cbaa means the conv + bias + elementwise_add + act fusion pattern.
More pattern forms will be supported later on, which is why an enum type is defined here.
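The enum from the diff, transliterated into Python for illustration (member names taken from the comment above; the actual C++ typedef lives in cutlass_teller.h):

```python
from enum import Enum, auto

class CutlassFusionType(Enum):
    # Mirrors the typedef enum in the diff.
    cba = auto()   # conv + bias + act
    cbaa = auto()  # conv + bias + elementwise_add + act
    # future fusion patterns would be appended here

print(CutlassFusionType.cba.name, CutlassFusionType.cbaa.name)  # cba cbaa
```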
    if (!cutlass_can_support) {
      bool cudnn_can_support =
          oc % CUDNN_ALIGNMENT == 0 && ic % CUDNN_ALIGNMENT == 0;
      if (!cudnn_can_support) {
        return false;
      }
    }
    return true;
  };

  auto CutlassIsValid = [&](ir::Node *op_node) -> bool {
Decides whether this conv2d_fusion can be computed by the cutlass backend.
@@ -112,17 +112,6 @@ void Conv2dFusionLayoutTransferPass::ApplyImpl(ir::Graph *graph) const {
      phi::DataType::FLOAT16 ||
      Get<bool>("enable_gpu_mixed");
  bool cutlass_enable = Get<bool>("use_cutlass");
All of this deleted logic has been moved into cutlass_teller.h.
LGTM
* initial commit for cutlass_teller
* second commit for cutlass_teller
* add conv2d_depthwise python template
* add conv2d_depthwise cutlass template
* /zhoukangkang/paddle_cutlass/Paddle/paddle/fluid/framework/ir/cutlass_teller.h
* refine code in Conv2dFusionCanSupport
* add macro in cutlass_teller.h
* add 3x3 5x5 teller
* add groups not 1 or conv2d_depthwise teller
* only generate conv2d_depthwise kernels whose ic is a multiple of 8
* add EXPLICIT in cutlass_teller.h
* final commit
* add split_k_slices in conv2d_depthwise
* make stages == 2
* refactor part of the code
* add CutlassFusionType
* solve illegal memory
* make stride_h=stride_w && make dilation==1
* must check HasAttr(use_cutlass) before GetAttrIfExists
* add CONV2D_DEPTHWISE_BIAS_SILU to OpType2String
* modify decl.h and util.cu
PR types: Others

PR changes: Others

Describe