Benchmark results #1
The daily benchmark results are available here:
I really appreciate the results, but I am curious why the DSP outcome is only available for inception_v3?
@DiamonJoy The benchmark is actually the CI result of the MACE Model Zoo project.
@llhe Amazing results!
Tuned means the OpenCL kernel is tuned for the specific type of device instead of using the general rule.
Is this tuning process done manually offline, or is it done automatically at run time?
@robertwgh It's offline now. We may consider improving the general rule or enabling incremental online tuning in the future. Incorporating more advanced rules, like ML-based models, is also a potential choice.
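As a rough illustration of that offline approach: a tuner could persist the best parameters per device and kernel, and the runtime would fall back to the general rule on untuned devices. Here is a minimal sketch; the table format, key scheme, and default workgroup size are assumptions, not MACE's actual implementation:

```cpp
// Hypothetical sketch of an offline tuning table; not MACE's actual code.
#include <cstddef>
#include <fstream>
#include <map>
#include <string>
#include <vector>

// Best local workgroup size found offline, keyed by "device|kernel".
using TuningTable = std::map<std::string, std::vector<size_t>>;

// Persist tuned parameters so the runtime can skip the search entirely.
void SaveTable(const TuningTable& table, const std::string& path) {
  std::ofstream out(path);
  for (const auto& entry : table) {
    out << entry.first;
    for (size_t v : entry.second) out << ' ' << v;
    out << '\n';
  }
}

// At run time: use the tuned result if present; otherwise fall back to a
// general rule (here, an arbitrary default workgroup size).
std::vector<size_t> LookupOrDefault(const TuningTable& table,
                                    const std::string& key) {
  auto it = table.find(key);
  return it != table.end() ? it->second : std::vector<size_t>{8, 8, 1};
}
```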
Yeah, that will be interesting. It would be extremely challenging given the large variety of Android devices and SoC chipsets.
From the code, I found that the CPU benchmark uses the OpenMP default thread number; it should be 2 threads.
@izp001
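For reference, a minimal standalone OpenMP example showing how the default thread count can be inspected and pinned to 2; this is independent of MACE's actual benchmark code:

```cpp
// Minimal OpenMP example (compile with -fopenmp): inspect the default
// thread count and pin it to 2, matching the question above.
#include <cstdio>
#include <omp.h>

int main() {
  // The default is implementation-defined (often one thread per core)
  // unless overridden by OMP_NUM_THREADS or omp_set_num_threads().
  std::printf("default max threads: %d\n", omp_get_max_threads());

  omp_set_num_threads(2);
  #pragma omp parallel
  {
    #pragma omp single
    std::printf("now running with %d threads\n", omp_get_num_threads());
  }
  return 0;
}
```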
It seems like CPU mode is much faster than GPU mode.
@ligonzheng Only for some low-end SoCs is the CPU faster than the GPU. Usually the GPU is faster, or even much faster, than CPU mode. And there are other benefits, including power efficiency and multitasking (when using the GPU, the CPU can be used for other computations like image processing algorithms).
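As a toy illustration of that multitasking point: GPU inference can be launched asynchronously while the CPU preprocesses the next frame. Both work functions below are hypothetical stand-ins, simulated with sleeps:

```cpp
// Toy sketch of overlapping GPU inference with CPU preprocessing.
// Both functions are hypothetical stand-ins, simulated with sleeps.
#include <chrono>
#include <future>
#include <thread>

void RunInferenceOnGpu() {         // stand-in for a GPU inference call
  std::this_thread::sleep_for(std::chrono::milliseconds(30));
}

void PreprocessNextFrameOnCpu() {  // stand-in for CPU image processing
  std::this_thread::sleep_for(std::chrono::milliseconds(20));
}

int main() {
  // Kick off GPU inference asynchronously...
  auto gpu_done = std::async(std::launch::async, RunInferenceOnGpu);
  // ...and use the otherwise-idle CPU for the next frame meanwhile.
  PreprocessNextFrameOnCpu();
  gpu_done.wait();  // total latency ~= max(30, 20) ms, not the sum
  return 0;
}
```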
Thank you for your reply! Some other questions about using MACE:
Happy to find out about this project, and thanks for sharing benchmark results! Wondering where your results would lie on the ReQuEST scoreboard? Specifically for MobileNets v1/v2, are you using the baseline models (
We have already added a CK package for MACE.
@llhe
|
@liyancas From your results, the speed is ordered like this: GPU > CPU quantized > CPU float. This is the expected result on middle- or high-end mobiles.
@llhe But for TFLite, CPU float > CPU quantized. I don't know what the reason is.
@liyancas Did you get the result from mobile-ai-bench? If so, could you please also report an issue in that project? We'll have a look.
Yes. I will double-check the results. If the issue still exists, I will open an issue. Thanks for your help.
@llhe I posted at XiaoMi/mobile-ai-bench#20
I am curious: what data type and number of iterations are used in these benchmarks?
@liyancas the run time of the quantized model
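As general context (not necessarily what this particular harness does), latency benchmarks usually run a few untimed warm-up iterations and then report the average over many timed runs; a generic sketch:

```cpp
// Generic latency benchmark sketch: a few untimed warm-up runs to
// stabilize caches and CPU/GPU clocks, then an average over N timed runs.
#include <chrono>
#include <functional>

double BenchmarkMs(const std::function<void()>& run_once,
                   int warmup_iters, int timed_iters) {
  for (int i = 0; i < warmup_iters; ++i) run_once();
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < timed_iters; ++i) run_once();
  const auto end = std::chrono::steady_clock::now();
  const std::chrono::duration<double, std::milli> total = end - start;
  return total.count() / timed_iters;  // average latency per run, in ms
}
```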
@llhe For example, the mobilenetsv2 benchmark runs at … Is it because Snapdragon is actually that much faster, or because of how MACE ops / kernels are implemented? If it's the latter, what would be a good place to start working on possible optimizations?
@achigeor Generally speaking, Adreno 630 is indeed faster than Mali G72-MP12 (depending on the exact configuration), and of course you can check whether the ARM Compute Library is better optimized for Mali GPUs, which is not covered by https://github.com/XiaoMi/mobile-ai-bench.
@llhe How is the list of workgroup options determined that is used to find the optimal tuning configuration (brute-force method) for different kernels?
@chiraggirdhar95 The candidate parameters to search (https://github.com/XiaoMi/mace/blob/master/mace/ops/opencl/helper.cc#L131, https://github.com/XiaoMi/mace/blob/master/mace/ops/opencl/image/conv_2d_1x1.cc#L143) are somewhat arbitrary, guided by heuristics about data locality (the access pattern can differ between kernels, but only in a limited number of ways), covering cache, cache line, and vector register sizes. However, as marked by a TODO, this brute-force search is naive and can be improved.
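A distilled sketch of that brute-force idea: enumerate heuristic candidate local workgroup sizes, time the kernel with each, and keep the fastest. The candidate list and timing callback below are illustrative assumptions, not the actual helper.cc logic:

```cpp
// Distilled brute-force tuner; the candidate list and timing callback are
// illustrative, not the actual logic in mace/ops/opencl/helper.cc.
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

using Lws = std::vector<size_t>;  // local workgroup size, e.g. {x, y, z}

// Heuristic candidate list: power-of-two tiles that tend to play well
// with cache-line and vector-register sizes on mobile GPUs.
std::vector<Lws> DefaultCandidates() {
  std::vector<Lws> out;
  for (size_t x : {4, 8, 16, 32})
    for (size_t y : {4, 8, 16})
      out.push_back({x, y, 1});
  return out;
}

// Time every candidate and keep the fastest one.
Lws BruteForceTune(const std::vector<Lws>& candidates,
                   const std::function<double(const Lws&)>& time_kernel_ms) {
  Lws best;
  double best_ms = std::numeric_limits<double>::max();
  for (const auto& lws : candidates) {
    const double ms = time_kernel_ms(lws);  // enqueue + profile the kernel
    if (ms < best_ms) { best_ms = ms; best = lws; }
  }
  return best;
}
```

Because the search is exhaustive over the candidate list, its cost grows with the number of candidates, which is one reason the TODO suggests something smarter than brute force.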
@llhe Thank you for the prompt reply! In our test app with our custom model, Adreno 630 is 3x faster than Mali G76 too, on GPU with MACE. Do these results sound normal? I didn't expect that big of a difference.
@achigeor Thanks for sharing the timing numbers for your custom model. Can you also share numbers for any open source model?
Benchmark results for a previous version are available here:
More recent results will be available on the GitLab mirror project's CI page soon.
A dedicated benchmark project for deep learning frameworks on mobile devices, MobileAIBench, is available here: https://github.com/XiaoMi/mobile-ai-bench