
CUDNN v8 Implementation of Convolution Kernels #47454

Merged
Merged 15 commits into PaddlePaddle:develop on Nov 18, 2022

Conversation

Contributor

@Tom-Zheng Tom-Zheng commented Oct 28, 2022

PR types

New features

PR changes

OPs

Describe

We are transitioning from the legacy cuDNN v7 APIs to the latest cuDNN frontend API, which is recommended for cuDNN v8 and later. The cuDNN frontend API provides an easier programming interface along with modern functionality such as autotuning and an errata filter (which blocks certain engine configs via JSON files), offering much more flexibility.
As a first step, this PR implements the cuDNN v8 APIs for the convolution operator, both forward and backward.

  • To build Paddle with the cuDNN v8 APIs, set the build option WITH_CUDNN_FRONTEND to ON. Currently, this option defaults to OFF.
  • The legacy APIs are still used by default. To enable the cuDNN v8 APIs at runtime, set the environment variable FLAGS_enable_cudnn_frontend=1.
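For illustration, the two switches above might be combined as follows (the build directory layout and `train.py` are placeholders, not part of this PR):

```shell
# Build-time: compile Paddle with the cuDNN frontend path available
cmake .. -DWITH_CUDNN_FRONTEND=ON   # other build options as usual
make -j

# Run-time: opt in to the cuDNN v8 frontend implementation
FLAGS_enable_cudnn_frontend=1 python train.py   # train.py is a placeholder
```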

@paddle-bot-old paddle-bot-old bot added the contributor External developers label Oct 28, 2022
@Tom-Zheng Tom-Zheng force-pushed the cudnnv8_convolution branch 2 times, most recently from fe24aac to fe4c630 Compare November 3, 2022 09:31
@Tom-Zheng Tom-Zheng changed the title [WIP] CUDNN v8 Implementation of Convolution Kernels CUDNN v8 Implementation of Convolution Kernels Nov 3, 2022
@Tom-Zheng Tom-Zheng marked this pull request as ready for review November 3, 2022 11:00
@onecatcn onecatcn requested a review from Xreki November 3, 2022 12:45
paddle/phi/kernels/autotune/cache_cudnn_frontend.h (outdated; resolved)
namespace phi {
namespace autotune {

class CudnnFrontendPlanCache {
Contributor

In the current version, AlgorithmsCache and CudnnAlgorithmsCacheMap share a lot of common code. #47667 attempts to rewrite them via inheritance; this can be updated in a follow-up.

}

private:
static cudnn_frontend::feature_vector_t MakeKey(
Contributor

Doesn't the cudnn frontend API itself already provide a way to compute the cache key? What is the main difference from the current ConvCacheKey?

struct ConvCacheKey {

Contributor Author

Essentially there is no big difference; it is only a difference in API. In the cudnn v7 programming model you maintain the convolution-related descriptors yourself, build your own data structures to store the various parameters, and implement the key and cache yourself. The cuDNN frontend API provides more encapsulation and an object-oriented design. In v7, this ConvCacheKey is produced by ConvertToConvCacheKey in ConvArgs. In v8 we no longer use ConvArgs to manage descriptors, but use the v8 classes instead, so the most convenient approach is to use the key implementation they already provide.
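As a rough, self-contained sketch of the idea (names are illustrative, not Paddle's actual code): in the v8 path the cache key is the operation graph's feature vector plus any extra runtime flags, rather than a hand-written ConvCacheKey struct. In the cudnn-frontend library, `feature_vector_t` is a `std::vector<int64_t>`, which is what the stand-in below mimics.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Stand-in for cudnn_frontend::feature_vector_t (a std::vector<int64_t>).
using FeatureVector = std::vector<int64_t>;

// Build a cache key from the graph's feature vector plus flags that are
// not part of the graph itself (here: use_addto).
FeatureVector MakeKey(FeatureVector graph_features, bool use_addto) {
  graph_features.push_back(static_cast<int64_t>(use_addto));
  return graph_features;
}

// The cache then maps keys to saved engine configs; a string is used here
// as a stand-in for the opaque engine-config descriptor.
std::map<FeatureVector, std::string> plan_cache;
```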

std::make_pair(MakeKey(op_graph, use_addto), plan.GetEngineConfig()));
}

bool IsStable(const cudnn_frontend::OperationGraph& op_graph,
Contributor

I don't quite follow — what is this function for?

Contributor Author

The result of an exhaustive search may be unstable. For example, five searches might return the algos [1, 1, 0, 0, 0]. Rather than trusting the result of any single search, we provide an option to set a saturation count N: an algo is added to the cache only after it has been the best result in at least N searches. In the example above, with N = 3, algo 1 would not be added to the cache, while algo 0 would be deemed the fastest and cached. This function checks whether the saturation count has been reached; the plan is added to the cache only if it returns true.
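The saturation-count idea can be sketched as follows (a hypothetical illustration; the class and method names are not Paddle's actual implementation). Each time an exhaustive search finishes, the winning algo's count for that problem key is bumped, and the plan is cached only once some algo has won at least `saturation_count` searches.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical sketch of the saturation-count check described above.
class SaturationTracker {
 public:
  explicit SaturationTracker(int saturation_count)
      : saturation_count_(saturation_count) {}

  // Record one search result; return true once this algo has been the
  // best result in at least saturation_count searches for this key,
  // meaning it may now be inserted into the plan cache.
  bool RecordAndCheck(const std::string& problem_key, int best_algo) {
    int& wins = counts_[{problem_key, best_algo}];
    return ++wins >= saturation_count_;
  }

 private:
  int saturation_count_;
  // (problem key, algo) -> number of searches this algo has won.
  std::map<std::pair<std::string, int>, int> counts_;
};
```

With the [1, 1, 0, 0, 0] example and N = 3, algo 1 never saturates, while algo 0 saturates on the fifth search.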

paddle/phi/kernels/gpudnn/conv_cudnn_frontend.h (outdated; resolved)
paddle/phi/kernels/gpudnn/conv_cudnn_frontend.h (outdated; resolved)
paddle/phi/kernels/gpudnn/conv_grad_kernel.cu (outdated; resolved)
paddle/phi/kernels/gpudnn/conv_grad_kernel_impl_v8.h (outdated; resolved)
padding_common[i] = paddings[2 * i];
}
}
}
Contributor

This large block of code duplicates the v7 branch, which will make future maintenance very difficult. Please consider a finer-grained encapsulation scheme.

Contributor Author

Refactored.

@Xreki Xreki left a comment (Contributor)

Great work~

paddle/phi/kernels/autotune/cache_cudnn_frontend.h (outdated; resolved)
paddle/phi/kernels/autotune/cache_cudnn_frontend.h (outdated; resolved)
bool ret = false;
std::lock_guard<std::mutex> lock(*cache_mutex_);
auto key = op_graph.getFeatureVector();
if (map_.count(MakeKey(op_graph, use_addto)) > 0) {
Contributor

Does MakeKey need to be called repeatedly? Does that add overhead?

Contributor Author

Yes, it is called multiple times. The function definition is implicitly inline, so the overhead is acceptable.

paddle/phi/kernels/autotune/cache_cudnn_frontend.h (outdated; resolved)
paddle/phi/kernels/gpudnn/conv_cudnn_frontend.h (outdated; resolved)
paddle/phi/kernels/gpudnn/conv_grad_kernel_impl_v8.h (outdated; resolved)
paddle/phi/kernels/gpudnn/conv_grad_kernel_impl_v8.h (outdated; resolved)
paddle/phi/kernels/gpudnn/conv_grad_kernel_impl_v8.h (outdated; resolved)
paddle/phi/kernels/gpudnn/conv_grad_kernel_impl_v8.h (outdated; resolved)
paddle/phi/kernels/gpudnn/conv_grad_kernel.cu (resolved)
@Xreki (Contributor)

Xreki commented Nov 11, 2022

Also, please fix the compilation error on the ROCm platform:

[screenshot: ROCm build error]

- move functions in conv_kernel_impl_v8.h and conv_grad_kernel_impl_v8.h to conv_kernel.cu and conv_grad_kernel.cu
- add const specifier for input tensor
- add logging when plans fail to execute
- move CudnnConvBwdFilterV8 and CudnnConvBwdDataV8 to conv_cudnn_frontend.h
@Xreki Xreki left a comment (Contributor)

LGTM. I suggest further optimizing the related functionality and code in follow-ups: on one hand to keep the behavior consistent with the current cudnn v7 path, and on the other hand to make it easier to apply the Frontend API to more operators.


#include <vector>

#include "paddle/fluid/framework/convert_utils.h"
Contributor

The PHI operator library has recently been decoupling its Fluid dependencies, and convert_utils.h has just been cleaned up — see PR #48001. I suggest removing this header following the approach in that PR.

#include <vector>

#include "paddle/fluid/framework/convert_utils.h"
#include "paddle/fluid/platform/device/gpu/cuda/cudnn_desc.h"
Contributor

It is still not recommended to introduce additional Fluid headers into PHI. You could first create the same header under phi's backends/gpu/cuda directory and copy over the functions that are needed; from what I can see this file does not use many of them, so it should be easy to handle.

.setStrides(strides.size(), strides.data())
.setId(id)
.setAlignment(GetAlignment(tensor))
.setDataType(paddle::platform::ToCudnnDataType(
Contributor

Perhaps this could be changed as follows: add a new ToCudnnDataType under phi with the same logic as the Fluid one, except that the input datatype is phi's DataType rather than proto::VarType.
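A minimal sketch of that suggestion (hypothetical, not Paddle's actual code): the `cudnnDataType_t` values are taken from cudnn.h, but the enum is reproduced here so the sketch is self-contained; real code would `#include <cudnn.h>` and use phi's full `DataType` enum.

```cpp
// Minimal stand-ins so the sketch compiles without cuDNN / phi headers.
enum cudnnDataType_t {
  CUDNN_DATA_FLOAT = 0,   // values match cudnn.h
  CUDNN_DATA_DOUBLE = 1,
  CUDNN_DATA_HALF = 2,
};
enum class DataType { FLOAT32, FLOAT64, FLOAT16 };

// phi-side overload keyed on phi::DataType instead of proto::VarType.
cudnnDataType_t ToCudnnDataType(DataType t) {
  switch (t) {
    case DataType::FLOAT32: return CUDNN_DATA_FLOAT;
    case DataType::FLOAT64: return CUDNN_DATA_DOUBLE;
    case DataType::FLOAT16: return CUDNN_DATA_HALF;
  }
  return CUDNN_DATA_FLOAT;  // unreachable fallback
}
```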

@YuanRisheng (Contributor)

Let's merge this PR first; the Fluid headers will be cleaned up by the relevant colleagues.

@Xreki Xreki merged commit 14a6e67 into PaddlePaddle:develop Nov 18, 2022
@Tom-Zheng Tom-Zheng deleted the cudnnv8_convolution branch November 21, 2022 08:40
Labels: contributor (External developers), NVIDIA
5 participants