How to reduce CL target run time at initialisation stage (prepare time) #977
Comments
Dear team, could you let me know what the root cause could be? I need to review the usage of ARMNN further, but if prepare takes a lot of time then I need to understand why.
Time taken by ARMNN CPU to prepare: 992 ms
Time taken by open-source TFLite CPU plugin: 44 ms
Hi @abhajaswal
Could you please try with the latest release, 22.05? There have been some improvements in the startup time since 20.02.
In general, for both CPU and GPU, the first iteration is slower because during this run ACL performs various transformations on the tensors to make sure the memory is accessed in the best way possible. All this additional work is done by the operators in their corresponding [...]
For the OpenCL backend you also have to add the time to compile the OpenCL kernels at runtime, which occurs during configuration. To mitigate this problem you can save the compiled kernels to disk and restore them at runtime. For more information please see the example: https://github.com/ARM-software/ComputeLibrary/blob/main/examples/cl_cache.cpp
Please also be aware that the use of the OpenCL tuner in ACL can affect startup time too. For more information please see: https://arm-software.github.io/ComputeLibrary/latest/architecture.xhtml#architecture_opencl_tuner
It would be helpful if you could share the complete command you used to run the example.
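The save-and-restore idea described above can be sketched roughly as follows. This is only an illustration of the pattern, not the Compute Library implementation: it uses no ACL or OpenCL calls, and `ProgramCache`, `save_cache`, `restore_cache`, `compile_kernel` and `get_program` are all hypothetical names invented for the sketch. The linked cl_cache.cpp example remains the reference for the real API.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a compiled OpenCL program binary.
using Binary = std::vector<std::uint8_t>;
// Hypothetical in-memory cache: kernel name -> compiled binary.
using ProgramCache = std::map<std::string, Binary>;

// Write each entry as [name length][name][blob length][blob].
bool save_cache(const ProgramCache &cache, const std::string &path)
{
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    for (const auto &entry : cache)
    {
        const std::uint64_t name_len = entry.first.size();
        const std::uint64_t blob_len = entry.second.size();
        out.write(reinterpret_cast<const char *>(&name_len), sizeof(name_len));
        out.write(entry.first.data(), static_cast<std::streamsize>(name_len));
        out.write(reinterpret_cast<const char *>(&blob_len), sizeof(blob_len));
        out.write(reinterpret_cast<const char *>(entry.second.data()),
                  static_cast<std::streamsize>(blob_len));
    }
    out.flush();
    return static_cast<bool>(out);
}

// Read the file back. A missing file gives an empty cache; reading stops
// at the first incomplete record. Corrupt files are not validated here.
ProgramCache restore_cache(const std::string &path)
{
    ProgramCache cache;
    std::ifstream in(path, std::ios::binary);
    std::uint64_t name_len = 0;
    while (in.read(reinterpret_cast<char *>(&name_len), sizeof(name_len)))
    {
        std::string name(static_cast<std::size_t>(name_len), '\0');
        std::uint64_t blob_len = 0;
        if (!in.read(&name[0], static_cast<std::streamsize>(name_len))) break;
        if (!in.read(reinterpret_cast<char *>(&blob_len), sizeof(blob_len))) break;
        Binary blob(static_cast<std::size_t>(blob_len));
        if (!in.read(reinterpret_cast<char *>(blob.data()),
                     static_cast<std::streamsize>(blob_len))) break;
        cache[name] = std::move(blob);
    }
    return cache;
}

// Hypothetical expensive compile step; the counter records how often it runs.
Binary compile_kernel(const std::string &name, int &compile_count)
{
    ++compile_count;
    return Binary(name.begin(), name.end()); // placeholder bytes, not a real binary
}

// Return the cached binary if present, otherwise compile and remember it.
const Binary &get_program(ProgramCache &cache, const std::string &name, int &compile_count)
{
    auto it = cache.find(name);
    if (it == cache.end())
        it = cache.emplace(name, compile_kernel(name, compile_count)).first;
    return it->second;
}
```

On a cold start every `get_program` call goes through `compile_kernel`; once the cache has been saved and restored, the same calls are served from the file and the compile counter stays at zero.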
Thanks! Using cl_cache.bin I am able to reduce the time to load the model from 20612 ms to 1379 ms.
Init: 1379 ms after restoring cl_cache.bin. The initial 20612 ms was measured on the first run, when cl_cache.bin was saved.
-rwxr-xr-x 1 root root 2419612 Jan 2 18:24 armnn_clcahae.bin
I will have to generate this .bin file for each of N models, so won't it take up more memory?
Also, I tried the TFLite GPU delegate. Its load time is also low, and I didn't have to generate a cl_cache.bin for it.
Hi @abhajaswal Glad to hear you improved the load time using prebuilt opencl kernels.
Yes, you could easily implement deflating/inflating with something like zlib at runtime to reduce the size on disk if that's a concern.
Unfortunately not without a major rework of the library. At runtime the OpenCL kernels need to be compiled and that is what requires the additional time. Hope this helps.
Hello, I am trying to use ACL 20.02.
As you know, when we run any ACL example over several iterations, the first one is the slowest.
Example: MobileNet SSD v1 -> the 1st call to graph_run() takes about 1 min;
from the 2nd call onwards, graph_run() takes about 92 ms.
As I understand it, the 1st time ACL creates the pipeline, memory/buffers, etc., so it takes time, but is there any way I can reduce the
1st-time initialisation time?
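To report the one-off first-run cost separately from steady-state latency, a small timing helper along these lines may be useful. It is a sketch under assumptions: `RunTimings` and `time_runs` are hypothetical names, and `run` stands in for whatever call executes the graph.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

struct RunTimings
{
    double first_ms;  // first call, which includes any one-off preparation work
    double steady_ms; // mean of all remaining calls
};

// Call `run` the given number of times and time each call separately.
RunTimings time_runs(const std::function<void()> &run, std::size_t iterations)
{
    using clock = std::chrono::steady_clock;
    std::vector<double> ms;
    ms.reserve(iterations);
    for (std::size_t i = 0; i < iterations; ++i)
    {
        const auto start = clock::now();
        run();
        const std::chrono::duration<double, std::milli> elapsed = clock::now() - start;
        ms.push_back(elapsed.count());
    }

    RunTimings timings{0.0, 0.0};
    if (ms.empty()) return timings;
    timings.first_ms = ms.front();

    // Average every call after the first one, if there are any.
    double sum = 0.0;
    for (std::size_t i = 1; i < ms.size(); ++i) sum += ms[i];
    if (ms.size() > 1) timings.steady_ms = sum / static_cast<double>(ms.size() - 1);
    return timings;
}
```

Comparing `first_ms` with `steady_ms` before and after a change shows whether the change affected initialisation or the regular run time.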