Performance far below result with Raspberry Pi 3 B+ #238

Closed
iamliuyin opened this issue Mar 31, 2020 · 19 comments

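@iamliuyin

Hello Professor Yu,

I am testing on a Raspberry Pi 3 B+. According to the project README, this device reaches 8.1 FPS on a single core and 23.74 FPS on multiple cores for 320x240 images. In my own test at 320x240, however, the multi-core result is only 4.2 FPS, far below the published figures. If it is convenient, could you help me find the cause? Thank you!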

// Environment details

  1. os: 2020-02-13-raspbian-buster-lite
  2. gcc version: 8.3.0 (Raspbian 8.3.0-6+rpi1) (compiled natively on the device)
  3. cmake options:
    AVX512 = OFF
    AVX2 = OFF
    NEON = ON
    OpenMP = TRUE
    DEMO = ON
    add_compile_options(-mfpu=neon) (the build fails without it)
  4. opencv: 3.2.0

// cmake output
-- The C compiler identification is GNU 8.3.0
-- The CXX compiler identification is GNU 8.3.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
fatal: No names found, cannot describe anything.
BUILD_VERSION:v0.0.1
Using ENON
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY
-- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY - Success
-- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY
-- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY - Success
-- Performing Test COMPILER_HAS_DEPRECATED_ATTR
-- Performing Test COMPILER_HAS_DEPRECATED_ATTR - Success
-- Found OpenCV: /usr (found version "3.2.0")
AVX512 = OFF
AVX2 = OFF
NEON = ON
OpenMP = TRUE
DEMO = ON
-- Configuring done
-- Generating done
-- Build files have been written to: /home/pi/dev/libfacedetection/build

// benchmark output
pi@raspberrypi:~/dev/libfacedetection/build $ ./benchmark 320_0.jpg
There are 4 threads, 4 processors.
cnn facedetection average time = 238.36ms | 4.20 FPS

@hanStudy

hanStudy commented Apr 1, 2020

Hi, my Raspberry Pi system cannot find OpenMP, but yours can. What is the reason? Could you explain?

@iamliuyin
Author

Did you perhaps not install OpenCV?

@hanStudy

hanStudy commented Apr 1, 2020

Here is the error I get:
g++-6: error: unrecognized command line option ‘-mavx2’
g++-6: error: unrecognized command line option ‘-mfma’
CMakeFiles/facedetection.dir/build.make:62: recipe for target 'CMakeFiles/facedetection.dir/src/facedetectcnn-int8data.cpp.o' failed
make[2]: *** [CMakeFiles/facedetection.dir/src/facedetectcnn-int8data.cpp.o] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/facedetection.dir/all' failed
make[1]: *** [CMakeFiles/facedetection.dir/all] Error 2
Makefile:127: recipe for target 'all' failed
make: *** [all] Error 2

@hanStudy

hanStudy commented Apr 1, 2020

OpenMP has nothing to do with OpenCV, does it?

@iamliuyin
Author

iamliuyin commented Apr 1, 2020

AVX2 cannot be enabled on ARM. Edit CMakeLists.txt to turn AVX2 off and NEON on.
Also, the OpenCV package includes the OpenMP library.

@zhengchunxian-ai

Hello,
where should the line add_compile_options(-mfpu=neon) (the one without which the build fails) go?

@zhengchunxian-ai

Hello, I am testing on a Raspberry Pi 3 B+. The build output is as follows:
-- Performing Test COMPILER_HAS_DEPRECATED_ATTR
-- Performing Test COMPILER_HAS_DEPRECATED_ATTR - Success
AVX512 = OFF
AVX2 = OFF
NEON = ON
OpenMP = TRUE
DEMO = OFF
-- Configuring done
-- Generating done
-- Build files have been written to: /zcx/libfacedetection-master/build
root@raspberrypi:/zcx/libfacedetection-master/build#
root@raspberrypi:/zcx/libfacedetection-master/build# cmake --build . --config Release
Scanning dependencies of target facedetection
[ 25%] Building CXX object CMakeFiles/facedetection.dir/src/facedetectcnn-int8data.cpp.o
[ 50%] Building CXX object CMakeFiles/facedetection.dir/src/facedetectcnn-model.cpp.o
[ 75%] Building CXX object CMakeFiles/facedetection.dir/src/facedetectcnn.cpp.o
In file included from /zcx/libfacedetection-master/src/facedetectcnn.h:60,
from /zcx/libfacedetection-master/src/facedetectcnn.cpp:39:
/usr/lib/gcc/arm-linux-gnueabihf/8/include/arm_neon.h: In function ‘int dotProductUint8Int8(unsigned char*, signed char*, int)’:
/usr/lib/gcc/arm-linux-gnueabihf/8/include/arm_neon.h:6733:1: error: inlining failed in call to always_inline ‘int32x4_t vdupq_n_s32(int32_t)’: target specific option mismatch
vdupq_n_s32 (int32_t __a)
^~~~~~~~~~~
/zcx/libfacedetection-master/src/facedetectcnn.cpp:93:29: note: called from here
result_vec = vdupq_n_s32(0); //zeros
~~~~~~~~~~~^~~
In file included from /zcx/libfacedetection-master/src/facedetectcnn.h:60,
from /zcx/libfacedetection-master/src/facedetectcnn.cpp:39:
/usr/lib/gcc/arm-linux-gnueabihf/8/include/arm_neon.h:11211:1: error: inlining failed in call to always_inline ‘int8x8x2_t vld2_s8(const int8_t*)’: target specific option mismatch
vld2_s8 (const int8_t * __a)
^~~~~~~
/zcx/libfacedetection-master/src/facedetectcnn.cpp:97:41: note: called from here
a = vld2_s8((signed char*)p1 + i);
^
In file included from /zcx/libfacedetection-master/src/facedetectcnn.h:60,
from /zcx/libfacedetection-master/src/facedetectcnn.cpp:39:
/usr/lib/gcc/arm-linux-gnueabihf/8/include/arm_neon.h:11211:1: error: inlining failed in call to always_inline ‘int8x8x2_t vld2_s8(const int8_t*)’: target specific option mismatch
vld2_s8 (const int8_t * __a)
^~~~~~~
/zcx/libfacedetection-master/src/facedetectcnn.cpp:98:27: note: called from here
b = vld2_s8(p2 + i);
...

Could you tell me what causes these errors?

@iamliuyin
Author

@zhengchunxian-ai Add it to CMakeLists.txt.
First confirm that your errors are actually NEON-related; this option only fixes that problem.

@iamliuyin
Author

@zhengchunxian-ai Add one compile option to CMakeLists.txt:
add_compile_options(-mfpu=neon)

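@hanStudy

hanStudy commented Apr 1, 2020

Thanks a lot, it is solved now!
Summary:
1. If OpenMP is not found, you can fix it by adding the value -fopenmp to OpenMP_CXX_FLAGS in cmake-gui.
2. ARM cannot use AVX2; turn AVX2 off and use NEON.
3. If the build fails with errors, add one line to CMakeLists.txt: add_compile_options(-mfpu=neon)
4. Run the install step with sudo, since it installs into the system library directories; otherwise it fails.
With all of the above, the build succeeds.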

@iamliuyin
Author

Congratulations!
Could you also run the benchmark on a 320x240 image and report the detection speed?

@ShiqiYu
Owner

ShiqiYu commented Apr 1, 2020

@iamliuyin Did you pass -O3 to g++ when compiling?

@hanStudy

hanStudy commented Apr 1, 2020

There are 4 threads, 4 processors.
cnn facedetection average time = 128.56ms | 7.78 FPS

@iamliuyin
Author

I believe it was. To verify, I additionally added the following to CMakeLists.txt:

set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -O3")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O3")

After rebuilding, the result is the same; performance did not change.
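A related check, offered only as a sketch: with single-configuration generators such as Unix Makefiles, the `--config Release` argument to `cmake --build` has no effect, and the per-configuration flags come from `CMAKE_BUILD_TYPE` chosen at configure time. Whether this project's CMakeLists.txt sets a default build type is not established in this thread, so the snippet below simply reads the value back from `CMakeCache.txt`. The function name `parse_cmake_cache` and the sample cache text are hypothetical; flags appended with `set(CMAKE_CXX_FLAGS ...)` in CMakeLists.txt are not cached, so they will not appear here.

```python
def parse_cmake_cache(text: str) -> dict:
    """Parse CMakeCache.txt content (NAME:TYPE=VALUE lines) into {NAME: VALUE}."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' / '//' comment lines CMake writes.
        if not line or line.startswith(("#", "//")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        name = key.split(":", 1)[0]  # drop the ':TYPE' suffix
        entries[name] = value
    return entries


# Hypothetical sample; in practice read build/CMakeCache.txt instead.
sample = """\
# This is the CMakeCache file.
//Choose the type of build
CMAKE_BUILD_TYPE:STRING=
//Flags used by the CXX compiler during RELEASE builds.
CMAKE_CXX_FLAGS_RELEASE:STRING=-O3 -DNDEBUG
"""

cache = parse_cmake_cache(sample)
build_type = cache.get("CMAKE_BUILD_TYPE", "")
print("CMAKE_BUILD_TYPE:", build_type or "(empty)")
print("Release flags:", cache.get("CMAKE_CXX_FLAGS_RELEASE", "(not cached)"))
```

If the build type turns out to be empty, reconfiguring with `cmake -DCMAKE_BUILD_TYPE=Release ..` would apply the release flags. This would not explain the numbers here if -O3 is already in effect, but it rules out one variable.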

@ShiqiYu
Owner

ShiqiYu commented Apr 1, 2020 via email

What is the single-core time?

@iamliuyin
Author

Single-core result:
pi@raspberrypi:~/dev/libfacedetection/build $ ./benchmark 320_0.jpg
There is 1 thread.
cnn facedetection average time = 468.18ms | 2.14 FPS

Compared with the earlier 4-core result:
pi@raspberrypi:~/dev/libfacedetection/build $ ./benchmark 320_0.jpg
There are 4 threads, 4 processors.
cnn facedetection average time = 238.36ms | 4.20 FPS

The difference is roughly a factor of two.
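The two timings above can be turned into a speedup and a parallel efficiency, and set against the scaling implied by the README figures cited in the opening post. This is illustrative arithmetic only, using numbers posted in this thread; the helper names are made up for the example.

```python
# Benchmark figures posted in this thread (Raspberry Pi 3 B+, 320x240 input).
SINGLE_CORE_MS = 468.18  # average time reported with 1 thread
MULTI_CORE_MS = 238.36   # average time reported with 4 threads
CORES = 4

# README figures cited in the opening post.
README_SINGLE_FPS = 8.1
README_MULTI_FPS = 23.74


def fps(avg_ms: float) -> float:
    """Convert an average per-frame time in milliseconds to frames per second."""
    return 1000.0 / avg_ms


def speedup(single_ms: float, multi_ms: float) -> float:
    """Ratio of single-threaded time to multi-threaded time."""
    return single_ms / multi_ms


measured_speedup = speedup(SINGLE_CORE_MS, MULTI_CORE_MS)
efficiency = measured_speedup / CORES          # 1.0 would be perfect scaling
readme_scaling = README_MULTI_FPS / README_SINGLE_FPS

print(f"single-core: {fps(SINGLE_CORE_MS):.2f} FPS")
print(f"multi-core:  {fps(MULTI_CORE_MS):.2f} FPS")
print(f"measured speedup on {CORES} cores: {measured_speedup:.2f}x")
print(f"parallel efficiency: {efficiency:.0%}")
print(f"README multi/single ratio: {readme_scaling:.2f}x")
```

With these numbers the measured speedup is about 1.96x on 4 cores (roughly 49% efficiency), while the README figures imply about 2.93x, so both the absolute speed and the multi-core scaling fall short of the published results.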

@ShiqiYu
Owner

ShiqiYu commented Apr 1, 2020 via email

In theory, the running time should be twice that of the previous version, so 8.1 FPS / 2 ≈ 4 FPS. However, I have never tested it on ARM.

@iamliuyin
Author

OK, understood.
Could I also benchmark the previous version for comparison? This repository does not seem to have any other branches or tags.

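@weilan97

@iamliuyin My results are very close to yours: Raspberry Pi 3 B+, 320x240 image, NEON = ON,
OpenMP = TRUE, and both
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -O3")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O3")
are added.

Multi-core performance is only 5 FPS.

Have you found a solution since then? Any pointers would be appreciated.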
