
About arm platform (ncnn model) #3

Closed
hanson-young opened this issue May 28, 2019 · 16 comments

Comments

@hanson-young
Contributor

Do you have any plans to port this to ncnn on the ARM platform? I failed to convert the Caffe model you provide to ncnn.

:~/Documents/3rdpart/ncnn/build/tools/caffe$ ./caffe2ncnn ./mnet.prototxt ./mnet.prototxt.caffemodel ./retina.param ./retina.bin
Segmentation fault (core dumped)
@Charrin
Owner

Charrin commented May 28, 2019

It is caused by the deconv layer weights being empty: Caffe initializes new weights in that case, but ncnn does not.
I have fixed this problem; you can download the updated mnet model.
Could you provide the test speed on the ARM platform? Thank you!
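For context, Caffe's usual default for an empty deconvolution (upsampling) weight blob is a bilinear interpolation kernel. The sketch below is a hedged illustration with NumPy only; the channel counts and kernel size are made up for the example, not taken from mnet. A blob built this way could be written back into the caffemodel before running caffe2ncnn:

```python
import numpy as np

def bilinear_upsample_kernel(in_channels, out_channels, kernel_size):
    """Build an (in_channels, out_channels, k, k) deconv weight blob that
    performs bilinear upsampling -- the typical init for empty deconv weights."""
    factor = (kernel_size + 1) // 2
    center = factor - 1 if kernel_size % 2 == 1 else factor - 0.5
    og = np.ogrid[:kernel_size, :kernel_size]
    # 2-D bilinear filter: product of two triangle functions
    filt = (1 - abs(og[0] - center) / factor) * (1 - abs(og[1] - center) / factor)
    weight = np.zeros((in_channels, out_channels, kernel_size, kernel_size),
                      dtype=np.float32)
    # one bilinear filter per channel, no cross-channel mixing
    weight[range(in_channels), range(out_channels)] = filt
    return weight

w = bilinear_upsample_kernel(2, 2, 4)
print(w.shape)
```

This matches what Caffe's bilinear filler produces for channel-wise upsampling; ncnn simply reads whatever blob is stored, so the blob must be non-empty before conversion.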

@Charrin Charrin closed this as completed May 28, 2019
@hanson-young
Contributor Author

Thanks a lot, I will provide it!

@hanson-young
Contributor Author

@Charrin
I have got inference and post-processing working on ncnn! But multi-threading does not improve performance.
https://github.com/hanson-young/RetinaFace-Cpp/blob/master/retinaface_ncnn/images/result.jpg

Qualcomm 835, VGA (640*480), inference only:

130|greatqltechn:/data/local/tmp $ ./benchncnn 4 1 0 0                         
loop_count = 4
num_threads = 1
powersave = 0
gpu_device = 0
 retinaface-mnet0.25  min =  130.10  max =  131.61  avg =  130.94
       mobilefacenet  min =   48.79  max =   49.55  avg =   49.21
  mobilefacenet-int8  min =   47.21  max =   48.11  avg =   47.76
          squeezenet  min =   63.86  max =   65.61  avg =   64.63
     squeezenet-int8  min =   49.12  max =   49.65  avg =   49.36
           mobilenet  min =  110.70  max =  112.14  avg =  111.47
      mobilenet-int8  min =   88.56  max =   89.66  avg =   89.31
        mobilenet_v2  min =   80.85  max =   82.40  avg =   81.81

@Charrin
Owner

Charrin commented May 29, 2019

Thank you! I have added your test results to my README.

@nihui

nihui commented May 30, 2019

@hanson-young Hi, thanks a lot for your work!

I think the model graph is not optimal, so you can try this:
https://github.com/Tencent/ncnn/wiki/model-optimize
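For reference, the optimization pass on that wiki page is the ncnnoptimize tool built under ncnn's tools directory. A typical invocation is sketched below; the file names follow the retina.param/retina.bin names used earlier in this thread, and the trailing flag selects the storage type (0 for fp32, 65536 for fp16):

```shell
# fuses adjacent layers (e.g. Convolution + BatchNorm) and rewrites the graph;
# the last argument 0 keeps fp32 weight storage
./ncnnoptimize retina.param retina.bin retina-opt.param retina-opt.bin 0
```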

@Charrin
Owner

Charrin commented May 30, 2019

I have tried it; it gives about a 10% speedup on a Qualcomm 625:
1-thread 379ms
2-thread 244ms
4-thread 180ms

@hanson-young
Contributor Author

@nihui Thanks a lot, the problem is solved! CMake 3.9.2 has an issue with OpenMP when compiling with the NDK; downgrading to 3.5.1 fixed it. https://gitlab.kitware.com/cmake/cmake/issues/17351
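The workaround above amounts to configuring the ncnn Android build with the NDK's CMake toolchain file under an older CMake. A typical configure step is sketched below; the NDK path, ABI, and platform level are assumptions for illustration, not taken from this thread:

```shell
# run from ncnn/build with CMake <= 3.5.x, per the linked CMake issue;
# $ANDROID_NDK should point at an installed NDK
cmake -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
      -DANDROID_ABI=arm64-v8a \
      -DANDROID_PLATFORM=android-21 ..
make -j4
```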
@Charrin Here are my test results:
Qualcomm 835, VGA (640*480)

greatqltechn:/data/local/tmp $ ./benchncnn 4 4 0                                                                                                                                                                  
loop_count = 4
num_threads = 4
powersave = 0
gpu_device = -1
 retinaface-mnet0.25  min =   62.31  max =   63.49  avg =   62.79
retinaface-mnet0.25_opt  min =   67.09  max =   82.76  avg =   75.99
       mobilefacenet  min =   15.89  max =   16.32  avg =   16.09
   mobilefacenet_opt  min =   14.12  max =   14.59  avg =   14.42
  mobilefacenet_int8  min =   16.11  max =   16.45  avg =   16.26
          squeezenet  min =   22.76  max =   26.53  avg =   23.94
     squeezenet_int8  min =   18.77  max =   19.20  avg =   18.99
           mobilenet  min =   34.43  max =   34.91  avg =   34.66
      mobilenet_int8  min =   28.90  max =   31.59  avg =   30.00
130|greatqltechn:/data/local/tmp $ ./benchncnn 4 2 0                                                                                                                                                              
loop_count = 4
num_threads = 2
powersave = 0
gpu_device = -1
 retinaface-mnet0.25  min =   82.75  max =   83.10  avg =   82.97
retinaface-mnet0.25_opt  min =   73.44  max =   75.41  avg =   74.52
       mobilefacenet  min =   28.08  max =   30.48  avg =   28.97
   mobilefacenet_opt  min =   25.23  max =   25.98  avg =   25.54
  mobilefacenet_int8  min =   29.37  max =   29.91  avg =   29.69
          squeezenet  min =   35.18  max =   38.03  avg =   36.80
     squeezenet_int8  min =   29.45  max =   31.90  avg =   30.67
           mobilenet  min =   58.60  max =   59.68  avg =   59.17
      mobilenet_int8  min =   51.27  max =   52.94  avg =   51.73
130|greatqltechn:/data/local/tmp $ ./benchncnn 4 1 0                                                                                                                                                              
loop_count = 4
num_threads = 1
powersave = 0
gpu_device = -1
 retinaface-mnet0.25  min =  136.17  max =  138.68  avg =  137.37
retinaface-mnet0.25_opt  min =  123.71  max =  127.71  avg =  125.10
       mobilefacenet  min =   51.50  max =   53.77  avg =   52.40
   mobilefacenet_opt  min =   46.99  max =   47.81  avg =   47.52
  mobilefacenet_int8  min =   56.54  max =   58.16  avg =   57.55
          squeezenet  min =   64.10  max =   65.19  avg =   64.77
     squeezenet_int8  min =   51.01  max =   51.62  avg =   51.42
           mobilenet  min =  107.86  max =  111.64  avg =  109.71
      mobilenet_int8  min =   98.07  max =   98.55  avg =   98.30

@pineking

@hanson-young Have you compared ARM inference speed for RetinaFace vs. the MTCNN model?

@hanson-young
Contributor Author

@pineking It's hard to say; it depends on the specific platform and use case.

@hanjw123

@hanson-young My test time on the 835 is about 20 ms slower than your inference time. Could you share your ncnn lib and include files for Android? Thank you very much!

@hanson-young
Contributor Author

@hanjw123 I compiled it on May 29, but you can get the ncnn lib from here: https://github.com/Tencent/ncnn/releases.

@hanjw123

@hanson-young OK! I tried an older version and it really is faster, thank you very much!

@hanjw123

@hanson-young My inference results are wrong. What NDK version and ANDROID_PLATFORM are you using?

@pineking

@hanjw123 Hi, I am also testing the speed of the RetinaFace model; would you like to discuss this together?
My WeChat ID is pineking.

@hanson-young
Contributor Author

@pineking I ran it on ARM aarch64, not in an Android application.

@Linzaer

Linzaer commented Sep 25, 2019

I tested the speed of the Caffe mnet model on a Raspberry Pi 4B, using Alibaba's MNN as the inference framework. The Pi 4B's CPU is a BCM2711 (quad-core Cortex-A72, 1.5 GHz). Test resolution was VGA (640*480), averaged over 10 loops:

Threads   fp32 time (ms)   int8 (quantized) time (ms)
1         167              183
2         116              102
3         105               76
4          96               61
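As a quick check on the scaling in the table above, the fp32 column works out to the following speedups and per-thread efficiencies relative to a single thread:

```python
# fp32 timings (ms) from the table above, keyed by thread count
fp32_ms = {1: 167, 2: 116, 3: 105, 4: 96}

for threads, ms in fp32_ms.items():
    speedup = fp32_ms[1] / ms          # relative to the 1-thread time
    efficiency = speedup / threads     # ideal linear scaling would be 100%
    print(f"{threads} thread(s): {speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```

Four threads yield only about 1.7x, which is consistent with the sub-linear scaling small mobile CNNs often show when memory bandwidth, rather than compute, is the bottleneck.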
