Could you provide performance data for mainstream Android CPUs? Thanks #5

Closed
nkyle04 opened this issue Sep 25, 2017 · 3 comments

Comments

@nkyle04

nkyle04 commented Sep 25, 2017

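I've only seen that SqueezeNet runs in 30 ms on an iOS GPU. Could you provide Android performance numbers so we can compare against other frameworks?
Judging from the code, ncnn implements convolution directly with NEON instructions, which I'd expect to be somewhat faster than calling GEMM directly, as this project does.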
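For context on the direct-convolution versus GEMM comparison: a common way to run a convolution through GEMM is im2col, which unfolds every K×K input patch into a column so the whole layer becomes one matrix product. The sketch below is plain C++ (no NEON), assumes a single NCHW image, stride 1 and no padding, and uses hypothetical function names; it is not code from this project or from ncnn. It computes the same layer both ways so the equivalence can be checked.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Direct convolution, stride 1, no padding.
// in: Cin x H x W, w: Cout x Cin x K x K, result: Cout x Ho x Wo.
std::vector<float> conv_direct(const std::vector<float>& in, int Cin, int H, int W,
                               const std::vector<float>& w, int Cout, int K) {
    assert(K <= H && K <= W);
    const int Ho = H - K + 1, Wo = W - K + 1;
    std::vector<float> out(static_cast<std::size_t>(Cout) * Ho * Wo, 0.0f);
    for (int co = 0; co < Cout; ++co)
        for (int y = 0; y < Ho; ++y)
            for (int x = 0; x < Wo; ++x) {
                float s = 0.0f;
                for (int ci = 0; ci < Cin; ++ci)
                    for (int ky = 0; ky < K; ++ky)
                        for (int kx = 0; kx < K; ++kx)
                            s += w[((co * Cin + ci) * K + ky) * K + kx] *
                                 in[(ci * H + y + ky) * W + x + kx];
                out[(co * Ho + y) * Wo + x] = s;
            }
    return out;
}

// im2col: each (ci, ky, kx) becomes one row; each output pixel one column.
// Result is a (Cin*K*K) x (Ho*Wo) row-major matrix.
std::vector<float> im2col(const std::vector<float>& in, int Cin, int H, int W, int K) {
    const int Ho = H - K + 1, Wo = W - K + 1;
    const int cols = Ho * Wo;
    std::vector<float> col(static_cast<std::size_t>(Cin) * K * K * cols);
    for (int ci = 0; ci < Cin; ++ci)
        for (int ky = 0; ky < K; ++ky)
            for (int kx = 0; kx < K; ++kx) {
                const int r = (ci * K + ky) * K + kx;
                for (int y = 0; y < Ho; ++y)
                    for (int x = 0; x < Wo; ++x)
                        col[r * cols + y * Wo + x] = in[(ci * H + y + ky) * W + x + kx];
            }
    return col;
}

// Convolution as GEMM: out (Cout x Ho*Wo) = w (Cout x Cin*K*K) * col.
// A naive triple loop stands in for an optimized GEMM here.
std::vector<float> conv_gemm(const std::vector<float>& in, int Cin, int H, int W,
                             const std::vector<float>& w, int Cout, int K) {
    const int Ho = H - K + 1, Wo = W - K + 1;
    const int Kdim = Cin * K * K, N = Ho * Wo;
    const std::vector<float> col = im2col(in, Cin, H, W, K);
    std::vector<float> out(static_cast<std::size_t>(Cout) * N, 0.0f);
    for (int m = 0; m < Cout; ++m)
        for (int k = 0; k < Kdim; ++k) {
            const float a = w[m * Kdim + k];
            for (int n = 0; n < N; ++n)
                out[m * N + n] += a * col[k * N + n];
        }
    return out;
}
```

A direct convolution avoids allocating the unfolded matrix (Cin·K·K × Ho·Wo floats); the GEMM route pays that memory cost so that a single well-tuned GEMM kernel can serve every convolution layer.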

@cocodark
Contributor

We do use NEON instructions for the matrix computation, inside GEMM:
```cpp
void Gemmer::dgemm_micro_kernel(int kc, float alpha, const float *A, const float *B, float beta, float *C, int incRowC, int incColC) {
#ifndef MDL_MAC
    int i, j, l;
    // One NEON accumulator per column of the 4x4 (MR x NR) block AB = A * B.
    float32x4_t abv0 = vdupq_n_f32(0);
    float32x4_t abv1 = vdupq_n_f32(0);
    float32x4_t abv2 = vdupq_n_f32(0);
    float32x4_t abv3 = vdupq_n_f32(0);

    float32x4_t av;
    float32x4_t bv;

    float32x2_t bv01;
    float32x2_t bv23;

    for (l = 0; l < kc; ++l) {
        av = vld1q_f32(A);                          // 4 values from the packed A panel
        bv = vld1q_f32(B);                          // 4 values from the packed B panel
        bv01 = vget_low_f32(bv);
        abv0 = vmlaq_lane_f32(abv0, av, bv01, 0);   // abv0 += av * B[0]
        abv1 = vmlaq_lane_f32(abv1, av, bv01, 1);   // abv1 += av * B[1]
        bv23 = vget_high_f32(bv);
        abv2 = vmlaq_lane_f32(abv2, av, bv23, 0);   // abv2 += av * B[2]
        abv3 = vmlaq_lane_f32(abv3, av, bv23, 1);   // abv3 += av * B[3]
        A += MR;
        B += NR;
    }

    // Store the four accumulated columns of AB into the buffer AB_.
    vst1q_f32(AB_ + 0, abv0);
    vst1q_f32(AB_ + 4, abv1);
    vst1q_f32(AB_ + 8, abv2);
    vst1q_f32(AB_ + 12, abv3);
    // ... (the rest of the function was not included in this comment)
```
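The quoted kernel stops before C is written. Below is a minimal self-contained sketch of a complete 4×4 micro-kernel in the same style, assuming MR = NR = 4, A packed as kc consecutive columns of 4 floats and B as kc consecutive rows of 4 floats (the layout implied by `A += MR; B += NR`), and the standard GEMM update C ← beta·C + alpha·A·B. The names `micro_kernel_4x4`, `pack_A` and `pack_B` are hypothetical; this is not the project's actual code. On ARM with NEON it uses the same intrinsics as above; elsewhere it falls back to plain loops.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

constexpr int MR = 4;  // assumed register-tile height
constexpr int NR = 4;  // assumed register-tile width

// Pack an MR x kc block of row-major A so step l reads MR contiguous values.
std::vector<float> pack_A(const float* A, int kc, int lda) {
    assert(lda >= kc);
    std::vector<float> Ap(static_cast<std::size_t>(MR) * kc);
    for (int l = 0; l < kc; ++l)
        for (int i = 0; i < MR; ++i) Ap[l * MR + i] = A[i * lda + l];
    return Ap;
}

// Pack a kc x NR block of row-major B so step l reads NR contiguous values.
std::vector<float> pack_B(const float* B, int kc, int ldb) {
    assert(ldb >= NR);
    std::vector<float> Bp(static_cast<std::size_t>(NR) * kc);
    for (int l = 0; l < kc; ++l)
        for (int j = 0; j < NR; ++j) Bp[l * NR + j] = B[l * ldb + j];
    return Bp;
}

// C (MR x NR, strides incRowC / incColC) = beta * C + alpha * A_panel * B_panel.
void micro_kernel_4x4(int kc, float alpha, const float* A, const float* B,
                      float beta, float* C, int incRowC, int incColC) {
    float AB[MR * NR];  // column-major: AB[i + j * MR]
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x4_t ab0 = vdupq_n_f32(0.0f), ab1 = vdupq_n_f32(0.0f);
    float32x4_t ab2 = vdupq_n_f32(0.0f), ab3 = vdupq_n_f32(0.0f);
    for (int l = 0; l < kc; ++l) {
        const float32x4_t a = vld1q_f32(A);
        const float32x4_t b = vld1q_f32(B);
        const float32x2_t b01 = vget_low_f32(b), b23 = vget_high_f32(b);
        ab0 = vmlaq_lane_f32(ab0, a, b01, 0);  // column 0 += a * B[0]
        ab1 = vmlaq_lane_f32(ab1, a, b01, 1);  // column 1 += a * B[1]
        ab2 = vmlaq_lane_f32(ab2, a, b23, 0);  // column 2 += a * B[2]
        ab3 = vmlaq_lane_f32(ab3, a, b23, 1);  // column 3 += a * B[3]
        A += MR;
        B += NR;
    }
    vst1q_f32(AB + 0, ab0);
    vst1q_f32(AB + 4, ab1);
    vst1q_f32(AB + 8, ab2);
    vst1q_f32(AB + 12, ab3);
#else
    for (int k = 0; k < MR * NR; ++k) AB[k] = 0.0f;
    for (int l = 0; l < kc; ++l) {
        for (int j = 0; j < NR; ++j)
            for (int i = 0; i < MR; ++i) AB[i + j * MR] += A[i] * B[j];
        A += MR;
        B += NR;
    }
#endif
    // beta == 0 overwrites C, so garbage or NaN already in C is ignored.
    for (int j = 0; j < NR; ++j)
        for (int i = 0; i < MR; ++i) {
            float& c = C[i * incRowC + j * incColC];
            c = (beta == 0.0f ? 0.0f : beta * c) + alpha * AB[i + j * MR];
        }
}
```

Passing `incRowC = 4, incColC = 1` treats C as a row-major 4×4 tile. Swapping the strides gives column-major output without changing the kernel.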

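On a Xiaomi Mi 6, our performance is as follows:
googlenet: 360 ms on average
squeezenet: 98 ms on average
mobilenet: 360 ms on average
Since there are so many Android models, we can't cover them all. These numbers are for reference only. Thanks!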
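The figures above are averages, and the comment doesn't say how they were collected. When reproducing comparable numbers on another device, one common approach is a few untimed warm-up runs followed by the mean of many timed runs. The helper below is a hypothetical sketch of that approach using `std::chrono::steady_clock`; it is not this project's benchmark code.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Mean wall-clock latency in milliseconds of `run` over `iters` timed calls,
// after `warmup` untimed calls (to let caches and CPU frequency settle).
double mean_latency_ms(const std::function<void()>& run, int warmup, int iters) {
    assert(iters > 0);
    for (int i = 0; i < warmup; ++i) run();
    using clock = std::chrono::steady_clock;  // monotonic, unaffected by wall-clock changes
    double total_ms = 0.0;
    for (int i = 0; i < iters; ++i) {
        const auto t0 = clock::now();
        run();
        const auto t1 = clock::now();
        total_ms += std::chrono::duration<double, std::milli>(t1 - t0).count();
    }
    return total_ms / iters;
}
```

For example, `mean_latency_ms([&] { run_inference(input); }, 5, 50)`, where `run_inference` is a hypothetical stand-in for one forward pass on a 224×224 input.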

@nkyle04
Author

nkyle04 commented Sep 25, 2017

Thanks for sharing. One more question: what input image size did you use?

@allonli
Collaborator

allonli commented Sep 25, 2017

224*224
