[Paddle-Inference] Add cutlass conv2d_depthwise #51792
Conversation
@@ -202,3 +202,20 @@ def GenerateFunctionForPhi(
    op_dicts["op_name"] = camel_names[epi_func]
    generated_code += SubstituteTemplate(CommonWrapperForPhi, op_dicts)
    return generated_code

# we modify some template parameters based on CommonCutlassConvKernelDeclare.
The conv2d_depthwise template differs slightly from the conv2d template. I didn't want to maintain two copies, so I just tweaked the conv2d template!
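The generator stamps out kernels by substituting parameters into a shared template string, which is why one template can serve both conv2d and conv2d_depthwise. A minimal sketch of `${key}`-style substitution, modeled on the `SubstituteTemplate` call in the diff; the template text and helper name below are illustrative, not the actual Paddle templates:

```python
import re

def substitute_template(template, values):
    """Replace every ${key} in `template` with values[key]."""
    return re.sub(r"\$\{(\w+)\}", lambda m: values[m.group(1)], template)

# Hypothetical kernel declaration with two template parameters.
kernel_decl = "cutlass::conv::kernel::DefaultDepthwise<${arch}, ${stages}>"
print(substitute_template(kernel_decl,
                          {"arch": "cutlass::arch::Sm70", "stages": "2"}))
# -> cutlass::conv::kernel::DefaultDepthwise<cutlass::arch::Sm70, 2>
```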
@@ -152,14 +141,17 @@ void Conv2dFusionLayoutTransferPass::ApplyImpl(ir::Graph *graph) const {
  std::string target_op_type = "conv2d_fusion";
  std::unordered_set<ir::Node *> valid_ops;

  // Determine if this conv2d_fusion can run in cuDNN's NHWC mode,
  // will not set or change any attribute in op_desc
  auto cuDNNIsValid = [&](ir::Node *op_node) -> bool {
The cuDNNIsValid logic is split out on its own and used only by cuDNN!
It decides whether this conv2d_fusion can be handed to cuDNN to run in NHWC layout.
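The check itself is a pure predicate over the op's channel counts, with no side effects on op_desc. A hedged Python sketch, assuming an alignment constant like the CUDNN_ALIGNMENT used elsewhere in this pass (the value 8 here is an assumption):

```python
CUDNN_ALIGNMENT = 8  # illustrative; the real constant lives in the pass

def cudnn_is_valid(oc, ic):
    # A conv2d_fusion can run in cuDNN's NHWC mode only if both channel
    # counts are aligned; nothing in op_desc is modified by this check.
    return oc % CUDNN_ALIGNMENT == 0 and ic % CUDNN_ALIGNMENT == 0

print(cudnn_is_valid(64, 64), cudnn_is_valid(64, 3))  # True False
```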
  }
  }
  return true;
  return CutlassTeller::Instance()->Conv2dFusionCanSupport(
Here we call the function in CutlassTeller!
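CutlassTeller is reached through a singleton accessor. A Python sketch of the call shape, with method names transliterated from the diff; the body of the check is a placeholder, not Paddle's actual rules:

```python
class CutlassTeller:
    """Sketch of a singleton 'teller' answering can-cutlass-run-this queries."""
    _instance = None

    @classmethod
    def instance(cls):
        # One shared instance, mirroring CutlassTeller::Instance() in C++.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def conv2d_fusion_can_support(self, groups, oc, ic):
        # Placeholder rule: the real checks (kernel size, strides, act)
        # live in cutlass_teller.h; this only illustrates the call shape.
        return groups == 1 or groups == ic  # ordinary conv or depthwise

assert CutlassTeller.instance() is CutlassTeller.instance()
print(CutlassTeller.instance().conv2d_fusion_can_support(1, 64, 64))  # True
```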
  // Determine this NCHW conv2d_fusion can be computed by cutlass?
  // will not set or change any attribute in op_desc
  bool Conv2dFusionCanSupport(ir::Node *conv2d_fusion_node,
What this function does: decide whether this conv2d_fusion op can be computed by CUTLASS!
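Pulling the constraints from this PR's commit log (a 3x3/5x5 teller, groups == 1 or depthwise, ic a multiple of 8), the decision could be sketched roughly as follows. Treat every threshold as a reconstruction from the commit messages, not the authoritative rule set:

```python
def conv2d_fusion_can_support(ic, groups, kh, kw):
    # Reconstructed from the commit log: kernels are only generated for
    # 3x3 / 5x5 filters, for ordinary (groups == 1) or depthwise
    # (groups == ic) convolutions, and only when ic is a multiple of 8.
    if (kh, kw) not in {(3, 3), (5, 5)}:
        return False
    if groups != 1 and groups != ic:
        return False
    return ic % 8 == 0

print(conv2d_fusion_can_support(64, 64, 3, 3))  # depthwise 3x3 -> True
print(conv2d_fusion_can_support(64, 2, 3, 3))   # grouped conv  -> False
```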
  // Determine whether this conv can be fused with the activation by cutlass
  // backend.
  bool Conv2dCanSupport(int oc,
What this function does: decide whether this conv + bias + act can be fused and handed to CUTLASS for computation.
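In spirit this is a whitelist test on the activation plus alignment checks. A sketch, where the activation set is an assumption (the commit log only confirms silu, via CONV2D_DEPTHWISE_BIAS_SILU):

```python
# Illustrative activation whitelist; only silu is confirmed by the commit
# log, the other entries are assumptions for the sake of the sketch.
FUSABLE_ACTS = {"identity", "relu", "silu"}

def conv2d_can_support(oc, act_type):
    # Can conv + bias + `act_type` be fused and handed to CUTLASS?
    return act_type in FUSABLE_ACTS and oc % 8 == 0

print(conv2d_can_support(64, "silu"))  # True
print(conv2d_can_support(64, "tanh"))  # False
```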
@@ -243,6 +254,11 @@ int ProfileToGetBestConfig(
    auto func = all_func[i];
    // When func has large diff, we will make it nullptr.
    if (!func) continue;
    cudaMemset(params.output,
Remember to clear the output before each run, so you can actually tell whether there is a diff!
Then do a trial run to see whether the kernel can run at all; if it can't, continue right away!
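The profiling loop described here, zero the output buffer, trial-run each candidate, skip failures, then compare against the baseline, can be sketched with stand-in callables (all names here are hypothetical; timing is elided):

```python
def profile_best(kernels, run, clear_output, max_diff, threshold=1e-3):
    """Sketch of the profiling loop: clear output, trial-run each candidate,
    skip ones that fail or diverge from the baseline, keep a survivor."""
    best = None
    for k in kernels:
        if k is None:          # kernels with large diff were nulled earlier
            continue
        clear_output()         # analogous to cudaMemset(params.output, ...)
        if not run(k):         # trial run: if it can't run, skip immediately
            continue
        if max_diff(k) > threshold:
            continue
        best = k if best is None else best  # timing comparison elided
    return best

best = profile_best(["k0", None, "k1"],
                    run=lambda k: k != "k0",    # pretend k0 fails its trial
                    clear_output=lambda: None,  # stand-in for cudaMemset
                    max_diff=lambda k: 0.0)
print(best)  # k1
```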
@@ -0,0 +1,130 @@
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
Is it appropriate to put cutlass_teller.h under the framework/ir directory?
> Is it appropriate to put cutlass_teller.h under the framework/ir directory?

This class is only used during the pass stage, so this is the best place for it.
Putting the teller inside ir still feels a bit odd to me.
"arch": "cutlass::arch::Sm70",
"Ishape": "1,1,1",
"stages": "4",
Why is sm70 used here? And why is stages set to 4 instead of 2?
> Why is sm70 used here? And why is stages set to 4 instead of 2?

conv2d_depthwise runs on CUDA cores, so the sm setting doesn't really matter.
stages has been changed back to 2 in the cutlass code!
    CHECK_EQ(op_node->IsOp(), true);
    if (cuDNNIsValid(op_node)) {
    if (cuDNNIsValid(op_node) || CutlassIsValid(op_node)) {
      valid_ops.insert(op_node);
      auto *op_desc = op_node->Op();
      op_desc->SetAttr("data_format", std::string{"NHWC"});
      if (cutlass_enable && CutlassIsValid(op_node)) {

      if (CutlassIsValid(op_node)) {
There is a problem when PADDLE_WITH_CUTLASS is OFF but enable_cutlass is true.
> There is a problem when PADDLE_WITH_CUTLASS is OFF but enable_cutlass is true.

This logic has all been moved into cutlass_teller.h: when PADDLE_WITH_CUTLASS is OFF, the relevant functions simply return false.
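The compile-time guard can be pictured as a feature flag that short-circuits every teller query; in the real header this is a preprocessor `#ifdef PADDLE_WITH_CUTLASS` rather than a runtime flag, so the Python below is only an analogy:

```python
# Stand-in for the CMake/preprocessor flag; when it is off, every
# "can cutlass do this?" query answers False, so enabling use_cutlass at
# runtime can no longer reach unsupported code paths.
PADDLE_WITH_CUTLASS = False

def conv2d_fusion_can_support(*args):
    if not PADDLE_WITH_CUTLASS:
        return False
    # ... the real teller checks would go here ...
    return True

print(conv2d_fusion_can_support(64, 64, 3, 3))  # False while the flag is off
```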
@@ -0,0 +1,130 @@
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
Putting the teller inside ir still feels a bit odd to me.
    if (op_node->Op()->Type() != target_op_type) {
      continue;
    }
    auto filter_name = op_node->Op()->Input("Filter").front();
Why were lines 181-184 added?
> Why were lines 181-184 added?

This is the guard against weight sharing. It was moved out into its own check to decouple it from the CutlassIsValid and cuDNNIsValid logic.
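The guard amounts to tracking which filter names have already been claimed, so a weight shared by two convs is never layout-transformed twice. A sketch of that set-based check, modeled on the `weights_shape_nhwc.count(filter_name)` line in the diff (function and variable names are illustrative):

```python
def select_transformable(ops):
    """Skip ops whose filter weight was already claimed by an earlier op,
    so a shared weight is never layout-transformed twice."""
    seen_filters = set()
    chosen = []
    for op_name, filter_name in ops:
        if filter_name in seen_filters:
            continue  # weight shared with an earlier conv: leave it alone
        seen_filters.add(filter_name)
        chosen.append(op_name)
    return chosen

print(select_transformable([("conv_a", "w0"), ("conv_b", "w0"),
                            ("conv_c", "w1")]))
# -> ['conv_a', 'conv_c']
```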
"cutlass::half_t",
"cutlass::half_t",
),
# (
Why was this commented out?
> Why was this commented out?

For stability we still use fp32 as the accumulator, to prevent overflow.
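The overflow concern is concrete: fp16's largest finite value is 65504, so a long accumulation of products can run off the end of the half-precision range while an fp32 accumulator stays finite. A crude simulation of the overflow alone (real fp16 also loses precision well before this point):

```python
FP16_MAX = 65504.0  # largest finite half-precision value

def fp16_sim_add(a, b):
    # Overflow-only model of fp16 addition: any sum past the fp16
    # range becomes inf.
    s = a + b
    return float("inf") if s > FP16_MAX else s

acc_half = 0.0   # simulated fp16 accumulator
acc_float = 0.0  # fp32-style accumulator
for _ in range(1000):
    acc_half = fp16_sim_add(acc_half, 100.0)
    acc_float += 100.0

print(acc_half, acc_float)  # inf 100000.0
```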
      continue;
    }
    auto filter_name = op_node->Op()->Input("Filter").front();
    if (weights_shape_nhwc.count(filter_name)) {
This is the guard against weight sharing. It was moved out into its own check to decouple it from the CutlassIsValid and cuDNNIsValid logic.
VLOG(3) << OpType2String(op_type) << ": tactic " << i << " has max diff "
        << conv2d_diff_gpu(params, op_type) << " compared with baseline,"
        << "cost_time: " << elapsed_time << "ms.";
std::cout << OpType2String(op_type) << ": tactic " << i
Switch back to VLOG?
> Switch back to VLOG?

OK!
# groups_per_cta: per cta would process
# warp_m: per warp would process
[8, 8, 16, 16],
# [8, 16, 16, 16],
Are these commented-out configurations kept because they might be enabled later?
> Are these commented-out configurations kept because they might be enabled later?

Yes, these are valid configurations too. They were commented out mainly to avoid generating too much code.
@@ -71,6 +71,7 @@
  int ow = params.ow;
  int dilation_h = params.dilation_h;
  int dilation_w = params.dilation_w;
  int split_k_slices = ${split_k_slices};
For a convolution with groups=1 this is always set to 1.
But for depthwise_conv2d it has to be set flexibly according to the problem size.
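A possible shape for such a rule: keep 1 for ordinary convolutions and raise split_k_slices when the depthwise problem is too small to fill the GPU. The heuristic below is purely illustrative, not Paddle's actual policy:

```python
def pick_split_k_slices(groups, ic, oh, ow):
    # For ordinary convolutions (groups == 1) the template always uses 1.
    if groups == 1:
        return 1
    # Depthwise sketch (illustrative heuristic, not Paddle's actual rule):
    # small spatial outputs leave SMs idle, so split the reduction further.
    out_tiles = oh * ow
    if out_tiles >= 1024:
        return 1
    return 4 if out_tiles < 256 else 2

print(pick_split_k_slices(1, 64, 7, 7))   # ordinary conv -> 1
print(pick_split_k_slices(64, 64, 7, 7))  # small depthwise -> 4
```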
namespace framework {
namespace ir {

typedef enum {
cba means the conv + bias + act fusion pattern.
cbaa means the conv + bias + elementwise_add + act fusion pattern.
More pattern forms will be supported later on, which is why an enum type is defined here.
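The enum from the diff, transliterated into Python for illustration (member names taken from the comment above; the actual C++ typedef lives in cutlass_teller.h):

```python
from enum import Enum, auto

class CutlassFusionType(Enum):
    # Mirrors the typedef enum in the diff.
    cba = auto()   # conv + bias + act
    cbaa = auto()  # conv + bias + elementwise_add + act
    # future fusion patterns would be appended here

print(CutlassFusionType.cba.name, CutlassFusionType.cbaa.name)  # cba cbaa
```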
    if (!cutlass_can_support) {
      bool cudnn_can_support =
          oc % CUDNN_ALIGNMENT == 0 && ic % CUDNN_ALIGNMENT == 0;
      if (!cudnn_can_support) {
        return false;
      }
    }
    return true;
  };

  auto CutlassIsValid = [&](ir::Node *op_node) -> bool {
Decides whether this conv2d_fusion can be computed by the cutlass backend.
@@ -112,17 +112,6 @@ void Conv2dFusionLayoutTransferPass::ApplyImpl(ir::Graph *graph) const {
      phi::DataType::FLOAT16 ||
      Get<bool>("enable_gpu_mixed");
  bool cutlass_enable = Get<bool>("use_cutlass");
All of this deleted logic has been moved into cutlass_teller.h.
LGTM
* initial commit for cutlass_teller
* second commit for cutlass_teller
* add conv2d_depthwise python template
* add conv2d_depthwise cutlass template
* /zhoukangkang/paddle_cutlass/Paddle/paddle/fluid/framework/ir/cutlass_teller.h
* refine code in Conv2dFusionCanSupport
* add macro in cutlass_teller.h
* add 3x3 5x5 teller
* add groups not 1 or conv2d_depthwise teller
* only generate conv2d_depthwise kernels whose ic is a multiple of 8
* add EXPLICIT in cutlass_teller.h
* final commit
* add split_k_slices in conv2d_depthwise
* make stages == 2
* refactor part of the code
* add CutlassFusionType
* solve illegal memory
* make stride_h=stride_w && make dilation==1
* must check HasAttr(use_cutlass) before GetAttrIfExists
* add CONV2D_DEPTHWISE_BIAS_SILU to OpType2String
* modify decl.h and util.cu
PR types: Others

PR changes: Others

Describe