
Use EigenBlasGemm improve convolution computing performance in ARMv7 environment. #3549

Merged

hedaoyuan merged 5 commits into PaddlePaddle:develop on Aug 21, 2017

Conversation

hedaoyuan
Contributor

@hedaoyuan hedaoyuan commented Aug 17, 2017

In some environments (such as ARMv7), the performance of Eigen's matrix multiplication is higher than OpenBLAS's, so this PR adds an EigenBlasGemm.
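
For context only, here is a minimal sketch (not code from this PR) of the CBLAS-style call that an OpenBLAS-backed GEMM goes through; the PR's point is that the same C = alpha * A * B + beta * C operation can instead be computed with Eigen on ARMv7. Sizes and values below are illustrative.

// Hedged sketch: standard CBLAS interface (link with -lopenblas).
#include <cblas.h>
#include <vector>

int main() {
  const int M = 2, N = 2, K = 2;
  std::vector<float> A = {1, 2, 3, 4};  // M x K, row-major
  std::vector<float> B = {5, 6, 7, 8};  // K x N, row-major
  std::vector<float> C(M * N, 0.0f);    // M x N, row-major

  // C = 1.0 * A * B + 0.0 * C
  cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, M, N, K,
              1.0f, A.data(), K, B.data(), N, 0.0f, C.data(), N);
  return 0;
}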

@hedaoyuan hedaoyuan changed the title from "Add EigenBlasGemm" to "Use EigenBlasGemm improve convolution computing performance in ARMv7 environment." Aug 17, 2017
@hedaoyuan hedaoyuan requested a review from Xreki August 17, 2017 08:38
@@ -55,6 +55,7 @@ option(WITH_C_API "Compile PaddlePaddle with C-API(Prediction)" OFF)
option(WITH_GOLANG "Compile PaddlePaddle with GOLANG" OFF)
option(GLIDE_INSTALL "Download and install go dependencies " ON)
option(USE_NNPACK "Compile PaddlePaddle with NNPACK library" OFF)
option(USE_EIGEN_FOR_BLAS "Use matrix multiplication in Eigen" OFF)
Contributor

I suggest changing the default value of USE_EIGEN_FOR_BLAS to ON and running the unit tests once to see whether there are any problems. After confirming there are no issues, change the default back to OFF.

Contributor Author

Both ON and OFF have been tested locally (x86, armv7, armv8). CI only compiles, so even if the default were changed, the unit tests would not actually run there.

Contributor

If USE_EIGEN_FOR_BLAS is set, TeamCity will also build the version that uses Eigen to compute GEMM, so the unit tests can be run. Offline I only ran the mobilenet model, so I am not sure the testing is comprehensive.
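
For reference, a minimal sketch (not the PR's actual CMake; the compile-definition name below is an assumption) of how an option like USE_EIGEN_FOR_BLAS typically reaches the C++ code, and how it can be flipped for a one-off test build as suggested above.

# Hedged sketch: gate a compile definition on the option so the C++ code
# can select the Eigen GEMM path. The definition name is hypothetical.
option(USE_EIGEN_FOR_BLAS "Use matrix multiplication in Eigen" OFF)
if(USE_EIGEN_FOR_BLAS)
    add_definitions(-DPADDLE_USE_EIGEN_FOR_BLAS)
endif()

# A one-off build with the option ON, as proposed in the review:
#   cmake -DUSE_EIGEN_FOR_BLAS=ON ..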

@@ -4,6 +4,8 @@ file(GLOB cpp_files . *Op.cpp)
list(APPEND h_files Function.h)
list(APPEND cpp_files Function.cpp)
list(APPEND cpp_files BufferArg.cpp)
list(APPEND cpp_files GemmFunctor.cpp)
list(APPEND cpp_files EigenGemm.cpp)
Contributor

EigenGemm.cpp is also compiled even when USE_EIGEN_FOR_BLAS is OFF.

Contributor Author

Right, this can be removed.
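
For illustration, one way (a sketch, not necessarily what the PR ended up doing) to make the EigenGemm.cpp entry depend on the option being discussed:

# Hedged sketch: only append the Eigen-backed source when the option is ON.
if(USE_EIGEN_FOR_BLAS)
    list(APPEND cpp_files EigenGemm.cpp)
endif()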

Contributor Author

Done.

c.device(device) += a.contract(b, dims);
} else {
c.device(device) =
c.constant(alpha) * a.contract(b, dims) + c.constant(beta) * c;
Contributor

With the transposed sizes set, will contract() automatically do the transpose internally when it executes?

c.device(device) = c.constant(alpha) * a.contract(b, dims) + c.constant(beta) * c;

Can't this be written directly as c.device(device) = alpha * a.contract(b, dims) + beta * c; ?

Contributor Author

"Will it automatically transpose internally?"

Yes.

c.device(device) = alpha * a.contract(b, dims) + beta * c;

also works.
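
To make the two equivalent spellings and the transpose handling concrete, here is a small self-contained Eigen Tensor sketch; dimensions and values are illustrative and not taken from the PR.

// Hedged sketch of GEMM via Eigen tensor contraction: C = alpha * A * B + beta * C.
// The contraction index pair selects which dimensions are multiplied, so a
// "transposed" input is handled by choosing the pair, not by copying data.
#include <unsupported/Eigen/CXX11/Tensor>

int main() {
  Eigen::Tensor<float, 2> a(2, 3), b(3, 4), c(2, 4);
  a.setRandom();
  b.setRandom();
  c.setRandom();
  const float alpha = 1.5f, beta = 0.5f;

  // Contract a's dim 1 with b's dim 0: the ordinary product A * B.
  Eigen::array<Eigen::IndexPair<int>, 1> dims = {Eigen::IndexPair<int>(1, 0)};

  Eigen::Tensor<float, 2> c1 = c;
  Eigen::Tensor<float, 2> c2 = c;
  Eigen::DefaultDevice device;

  // The two forms discussed above are equivalent: a plain scalar is
  // broadcast the same way as c.constant(alpha).
  c1.device(device) =
      c1.constant(alpha) * a.contract(b, dims) + c1.constant(beta) * c1;
  c2.device(device) = alpha * a.contract(b, dims) + beta * c2;

  // For a transposed operand, only the index pair changes (e.g. IndexPair(0, 0)
  // contracts a's rows with b's rows, i.e. A^T * B, for suitably shaped inputs);
  // no explicit transpose of the data is needed.
  return 0;
}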

Contributor Author

Done.

T* C,
const int ldc);
struct BlasGemm {
static void compute(const bool transA,
Contributor

Why switch to this kind of static function definition?

Contributor Author

It saves one construction at the call site; you can call BlasGemm<Device, real>::compute directly.
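
As an illustration of the pattern described here, a self-contained sketch of a templated struct with a static compute that is called without constructing a functor object. Names such as CpuDevice and the naive loop body are hypothetical, not PaddlePaddle's actual implementation.

// Hedged sketch of static dispatch: BlasGemm<Device, T>::compute(...).
#include <cstddef>
#include <cstdio>
#include <vector>

struct CpuDevice {};  // hypothetical device tag

template <typename Device, typename T>
struct BlasGemm {
  // Naive reference GEMM: C = alpha * A * B + beta * C (row-major,
  // no transpose handling; for illustration only).
  static void compute(size_t M, size_t N, size_t K, T alpha, const T* A,
                      const T* B, T beta, T* C) {
    for (size_t i = 0; i < M; ++i) {
      for (size_t j = 0; j < N; ++j) {
        T sum = 0;
        for (size_t k = 0; k < K; ++k) sum += A[i * K + k] * B[k * N + j];
        C[i * N + j] = alpha * sum + beta * C[i * N + j];
      }
    }
  }
};

int main() {
  std::vector<float> A = {1, 2, 3, 4};
  std::vector<float> B = {5, 6, 7, 8};
  std::vector<float> C(4, 0.0f);
  // No functor object is constructed; the call goes straight to the static
  // member, which is the saving mentioned in the reply above.
  BlasGemm<CpuDevice, float>::compute(2, 2, 2, 1.0f, A.data(), B.data(), 0.0f,
                                      C.data());
  std::printf("%.0f %.0f %.0f %.0f\n", C[0], C[1], C[2], C[3]);  // 19 22 43 50
  return 0;
}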

Contributor

@Xreki Xreki left a comment

Overall LGTM, and we need this PR to speed up inference on the armv7a architecture.

@hedaoyuan hedaoyuan merged commit a683a56 into PaddlePaddle:develop Aug 21, 2017
@Xreki Xreki added this to Convolution Optimization in Embedded and Mobile Deployment Aug 31, 2017