Depthwise Convolution Optimization #3718
https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/function/DepthwiseConvOp.cpp#L21 Can the functions in here be removed now? Also, shouldn't some checks be added, for example that the device must be GPU?
const float*, const float*, int, int, int, int, int, int, float*)>
DepthWiseConv;

if (filterWidth == 3 && strideW() == 1) {
I think the most naive implementation should be added here, and I think an implementation like https://github.com/NHZlX/Paddle/blob/mobilenet_neon/paddle/function/neon/DepthwiseConvCpu.h#L98 would be better.
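A naive implementation should not be added here. Adding it would mean that whenever the optimized implementation is not supported, execution falls back to the naive implementation; in practice, when the optimized implementation is not supported, it is better to run the GemmConv implementation instead. In addition, NaiveConv already has its own Function implementation, so ConvLayer can decide which branch to take.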
LGTM
This depthwise convolution optimization was discussed with @NHZlX. It is based on the ARM NEON instruction set and can also be extended to the x86 SSE and AVX instruction sets.
The optimized logic is: if the output size is greater than 4, then each step calculates four elements of the output.
For example, when the convolution filter is 3x3, 9 instructions are used to calculate four elements of the output.
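The four-outputs-per-step idea can be sketched in portable C++ as below. This is only an illustration under assumptions, not the code in this PR: `Vec4`, `load4`, `mla`, and `conv3x3Four` are hypothetical names, `Vec4` stands in for a NEON `float32x4_t` register, and the sketch assumes the 9 instructions are one multiply-accumulate per filter tap, with stride 1 and no padding.

```cpp
#include <array>
#include <cassert>

// Hypothetical 4-lane vector standing in for a NEON float32x4_t register.
using Vec4 = std::array<float, 4>;

// Load four consecutive floats starting at p.
static Vec4 load4(const float* p) { return {p[0], p[1], p[2], p[3]}; }

// acc[i] += v[i] * k for every lane; models one multiply-accumulate
// by a scalar filter weight.
static void mla(Vec4& acc, const Vec4& v, float k) {
  for (int i = 0; i < 4; ++i) acc[i] += v[i] * k;
}

// Computes four adjacent outputs of one output row for a 3x3 filter with
// stride 1. r0, r1, r2 point at the three input rows and must each have at
// least 6 readable floats; filter holds 9 weights in row-major order.
Vec4 conv3x3Four(const float* r0, const float* r1, const float* r2,
                 const float* filter) {
  assert(r0 && r1 && r2 && filter);
  const float* rows[3] = {r0, r1, r2};
  Vec4 acc = {0.0f, 0.0f, 0.0f, 0.0f};
  for (int ky = 0; ky < 3; ++ky) {
    for (int kx = 0; kx < 3; ++kx) {
      // Input window shifted by kx, scaled by one filter tap.
      // 3 x 3 taps give 9 multiply-accumulates for 4 outputs.
      mla(acc, load4(rows[ky] + kx), filter[ky * 3 + kx]);
    }
  }
  return acc;
}
```

A caller would step along each output row four elements at a time with `conv3x3Four` and handle any leftover elements separately.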
Another implementation requires 4 instructions to calculate one element of the output. This method is slower than the previous one but can be used to calculate the remaining output elements.
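One way the single-element path could look is sketched below in portable C++. The description does not say which four instructions are used, so this is an assumption: one multiply, two multiply-accumulates over the three filter rows, and one horizontal add. `Vec4`, `load3`, and `conv3x3One` are hypothetical names, and `Vec4` again stands in for a NEON `float32x4_t` register.

```cpp
#include <array>
#include <cassert>

// Hypothetical 4-lane vector standing in for a NEON float32x4_t register.
using Vec4 = std::array<float, 4>;

// Load three floats and zero the fourth lane so it contributes nothing.
static Vec4 load3(const float* p) { return {p[0], p[1], p[2], 0.0f}; }

// Computes one output element of a 3x3, stride-1 convolution.
// r0, r1, r2 point at the first input element of each of the three rows
// covered by the window; filter holds 9 weights in row-major order.
float conv3x3One(const float* r0, const float* r1, const float* r2,
                 const float* filter) {
  assert(r0 && r1 && r2 && filter);
  const Vec4 k0 = load3(filter), k1 = load3(filter + 3), k2 = load3(filter + 6);
  const Vec4 i0 = load3(r0), i1 = load3(r1), i2 = load3(r2);
  Vec4 acc;
  for (int i = 0; i < 4; ++i) acc[i] = i0[i] * k0[i];   // op 1: multiply
  for (int i = 0; i < 4; ++i) acc[i] += i1[i] * k1[i];  // op 2: multiply-accumulate
  for (int i = 0; i < 4; ++i) acc[i] += i2[i] * k2[i];  // op 3: multiply-accumulate
  return acc[0] + acc[1] + acc[2] + acc[3];             // op 4: horizontal add
}
```

Because each call yields a single element, this path costs more per output than the four-wide step, which is consistent with using it only for the leftover elements of a row.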