OpenCL for OpenCV on Pi #29

Open
spinoza1791 opened this issue Apr 10, 2018 · 38 comments

@spinoza1791

Has anyone been able to run OpenCV on the Pi GPU using OpenCL? Is there an example somewhere demonstrating accessing the GPU with OpenCV using OpenCL?

@doe300
Owner

doe300 commented Apr 12, 2018

Being able to run OpenCV is one of the goals of this project, but at this stage it is neither tested nor expected to work.

@spinoza1791
Author

spinoza1791 commented Apr 12, 2018 via email

@masterchop

It's a shame. It seems the GPU would help the maker community a lot, but there are no results so far anywhere. Damn Nvidia taking all the AI. I purchased the Movidius stick while I wait for this to happen; let's see how it goes once I get it.

@soulslicer

What about Caffe? Caffe is currently the only deep learning library to properly support OpenCL 1.2. If this can be made to work with Caffe, it'll be really powerful for AI work.

@thortex

thortex commented Jul 22, 2018

An OpenCL-enabled OpenCV build is available on my GitHub:
https://github.com/thortex/rpi3-opencv/
https://github.com/thortex/rpi3-opencv/releases/tag/v3.4.2-opencl

I'm running the original test cases provided by OpenCV:
https://github.com/thortex/rpi3-vc4cl/
https://github.com/thortex/rpi3-vc4cl/releases
https://github.com/thortex/rpi3-vc4cl/tree/master/test/opencv

There are 16,182 tests from 132 test cases.
I've run 2,865 tests and got 1,555 NGs and 1,308 OKs.

opencv-opencl-test.zip

@doe300
Owner

doe300 commented Jul 22, 2018

> There are 16,182 tests from 132 test cases.
> I've run 2,865 tests and got 1,555 NGs and 1,308 OKs.

That sounds promising. For the failed tests, how good are the outputs? Are they useful for debugging issues in VC4CL?

@spinoza1791
Author

spinoza1791 commented Jul 22, 2018 via email

@thortex

thortex commented Jul 28, 2018

log.arith.zip

I attached the OpenCL arithmetic test results for OpenCV 3.4.2 above.

A summary of the warnings and errors:

  • Warning: OpenCV uses the clGetProgramInfo API to save its OpenCL program cache.
[ WARN:0] Can't save OpenCL binary into cache: /root/.cache/opencv/3.4.2/opencl_cache/32-bit--Broadcom--VideoCore_IV_GPU--0_4/core--lut_02217d060320fc126306ad16885be711.bin
OpenCV(3.4.2) /home/pi/rpi3-opencv/setup/opencv-3.4.2/modules/core/src/ocl.cpp:3752: error: (-220:Unknown error code -220) OpenCL error CL_INVALID_VALUE (-30) during call: clGetProgramInfo(handle, CL_PROGRAM_BINARIES, sizeof(ptr), &ptr, NULL) in function 'getProgramBinary'
  • Line#264: 64-bit operations are not supported by the VideoCore IV architecture, ...
  • Line#1103: ./modules/core/test/ocl/test_arithm.cpp:179: Failure (Add/Subtract/Multiply)
  • Line#7243: MinMax test.

If VC4CL supported clGetProgramInfo(), the program cache could be used to shorten the processing time of the OpenCL tests.

@thortex

thortex commented Jul 28, 2018

Hi spinoza1791, I think it won't work yet.
We have to debug VC4CL properly before OpenCV can run in OpenCL mode.

@doe300
Owner

doe300 commented Jul 28, 2018

Thanks @thortex for testing this and the logs.

A few quick comments on the log:

  • VC4CL only supports 64-bit types in a very limited scope (when statically convertible to 32-bit types). I don't know if you can deactivate 64-bit types in the OpenCV tests, but these tests will probably never pass.
  • I will have to look into the CL_INVALID_VALUE warning from clGetProgramInfo. It looks like VC4CL does something wrong there.
  • Even if it worked, I would not enable caching at the moment, since the compiler (and the code it generates) is the part of VC4CL that currently changes the most.
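
For anyone who wants to keep the cache out of the picture while testing: OpenCV reads its OpenCL program-cache settings from environment variables. A minimal sketch, assuming the `OPENCV_OPENCL_CACHE_ENABLE` variable behaves as in the OpenCV 3.4/4.x sources (please verify against your build); the helper name `disable_opencl_program_cache` is made up for illustration:

```python
import os

def disable_opencl_program_cache(env=None):
    """Ask OpenCV not to read or write cached OpenCL program binaries.

    Assumption: OpenCV consults OPENCV_OPENCL_CACHE_ENABLE when it first
    sets up OpenCL, so the variable has to be set before that happens.
    """
    env = os.environ if env is None else env
    env["OPENCV_OPENCL_CACHE_ENABLE"] = "0"
    return env

disable_opencl_program_cache()
# import cv2  # import OpenCV only after the variable is set
```

Setting the variable before `import cv2` is the safe choice, since the configuration is read inside the library and cannot be changed reliably afterwards.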

@thortex

thortex commented Aug 2, 2018

Thanks @doe300,

I checked the following failure:

Line#1103: ./modules/core/test/ocl/test_arithm.cpp:179: Failure (Add/Subtract/Multiply)

and found these failure patterns:

  • Signed 8-bit matrix operations are OK, but unsigned 8-bit operations fail.
  • Signed 16-bit matrix operations are OK, but unsigned 16-bit operations fail.
  • float64 matrix operations are reported OK, but float32 operations fail.

An example result of the unsigned 8-bit matrix Add operation:

[128  30   7     [ 56  32  55    [255 255 255
  20  15   4   +   89  55  12  =  255 255 255
  50  25   5]      11  89  98]    255 255 255]

The expected result is:

[128  30   7     [ 56  32  55    [184  62  62
  20  15   4   +   89  55  12  =  109  70  16
  50  25   5]      11  89  98]     61 114 103]

I don't know why, but VC4CL returns 0xFF for every element of the result.
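
For reference, the expected matrix follows from OpenCV's saturating arithmetic for unsigned 8-bit data: each sum is clamped to the range 0..255. A minimal pure-Python sketch (the helper names are illustrative, not OpenCV or VC4CL code) that recomputes the expected result and counts how many elements of the observed output differ:

```python
def saturate_u8(value):
    """Clamp an integer to the unsigned 8-bit range [0, 255]."""
    return max(0, min(255, value))

def add_u8(a, b):
    """Element-wise saturating add of two equally sized matrices (lists of rows)."""
    return [[saturate_u8(x + y) for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]

src1 = [[128, 30, 7], [20, 15, 4], [50, 25, 5]]
src2 = [[56, 32, 55], [89, 55, 12], [11, 89, 98]]
observed = [[255] * 3 for _ in range(3)]  # the all-0xFF output reported above

expected = add_u8(src1, src2)
mismatches = sum(e != o
                 for row_e, row_o in zip(expected, observed)
                 for e, o in zip(row_e, row_o))

print(expected)    # [[184, 62, 62], [109, 70, 16], [61, 114, 103]]
print(mismatches)  # 9
```

None of the nine sums exceeds 255, so saturation should leave every element of this example unchanged.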

@doe300
Owner

doe300 commented Aug 3, 2018

Those are some interesting results; I will have to look into them.
What is the output for unsigned 16-bit integers? Are all elements set to 0xFFFF?

The 64-bit floating-point test cannot pass; it is probably just skipped.

doe300 self-assigned this Aug 3, 2018
doe300 added the bug label Aug 3, 2018

@thortex

thortex commented Aug 4, 2018

mini.zip

I attached mini.zip, which includes six test results for the matrix add operation.

CPU only: Matrix src1 + Matrix src2 = Matrix dst1
OpenCL (VC4CL): Matrix usrc1 + Matrix usrc2 = Matrix udst1

> What is the output for unsigned 16-bit integers? Are all elements set to 0xFFFF?

Yes, it's 65535 (0xFFFF), as shown on line #722 of mini.log.

mini-debug.log includes VC4CL debugging outputs.

doe300 added a commit to doe300/VC4CLStdLib that referenced this issue Aug 11, 2018

@doe300
Owner

doe300 commented Aug 11, 2018

I found and fixed a bug in conversion with saturation and the failing tests now succeed.

@thortex

thortex commented Aug 11, 2018

Thanks doe300!!!

I'll also check the other tests with commit bedb33c8d6241bab60e9ca3954b20faf8fbf7af3.

@julled

julled commented Aug 29, 2018

So does this mean that it's possible to run some OpenCV functions with OpenCL acceleration?
Do you have any minimal benchmark of a single function with/without OpenCL acceleration to give us an idea of a possible speedup? I am very curious about it!
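
A rough way to measure this is to time the same call with OpenCV's OpenCL path switched on and off. The sketch below is an illustration under stated assumptions, not a benchmark from this thread: `median_seconds` and `compare_opencl` are made-up helpers, the image size and the choice of `cv2.GaussianBlur` are arbitrary, and the OpenCV part only runs if `cv2` is installed.

```python
import statistics
import time

def median_seconds(fn, repeats=5):
    """Run fn() `repeats` times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

def compare_opencl(repeats=5):
    """Time one OpenCV call without and with OpenCL; returns None if cv2 is missing."""
    try:
        import cv2
        import numpy as np
    except ImportError:
        return None
    image = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)
    results = {}
    for use_opencl in (False, True):
        cv2.ocl.setUseOpenCL(use_opencl)
        umat = cv2.UMat(image)  # UMat lets OpenCV dispatch to OpenCL when enabled
        # .get() copies the result back to host memory, so transfer cost is included
        results[use_opencl] = median_seconds(
            lambda: cv2.GaussianBlur(umat, (5, 5), 0).get(), repeats)
    return results
```

Calling `compare_opencl()` returns a dict keyed by `False`/`True` with the median time for each setting. Including the `.get()` copy matters on hardware where memory transfers dominate.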

@vb216

vb216 commented Sep 8, 2018

@doe300 and @thortex, is it plausible that results could be slower? I've been using opencv_perf_imgproc for comparisons between the 32-bit Pi and the 64-bit Pi, so I thought I'd fire it up again with OpenCL using the work here (which is really great and interesting, thanks a lot). I took OpenCV from @thortex's repo and rebuilt it to get the perf binaries (on the latest 32-bit Raspbian).

There are a few crashes, but on the test subsets that run OK, I so far see some times coming in similar and some considerably slower (e.g. by a factor of 10).

@doe300
Owner

doe300 commented Sep 9, 2018

Yes, it is possible. As mentioned in various other posts, memory access (esp. write access) is a bottleneck. There are also still some optimizations left to be done.

doe300 added a commit that referenced this issue Sep 30, 2018

@charlesrwest

If I may ask, what is the current status of OpenCV support? Do most operations work? Has anyone tried it with the DNN module? I've got MobileNetV2 running with ~0.2 s inference time using optimized CPU-based OpenCV. I would be interested to see if it could go faster using the GPU.

@doe300
Owner

doe300 commented Nov 28, 2018

I haven't made any progress testing OpenCV. The problem is that it is hard to test and not well suited to debugging issues with wrongly generated code.

@abhiTronix

@spinoza1791 @doe300 I've successfully compiled the latest OpenCV (4.0.1-dev) with VC4CL OpenCL without any compilation errors. I included VC4CL OpenCL in both the OpenCV compilation and the FFmpeg build. I'll update here with benchmark results.

@spinoza1791
Author

spinoza1791 commented Feb 5, 2019 via email

@thortex

thortex commented Feb 5, 2019

Also updated: https://github.com/thortex/rpi3-opencv/releases/tag/v4.0.1

@spinoza1791
Author

spinoza1791 commented Feb 5, 2019 via email

@abhiTronix

> Also updated: https://github.com/thortex/rpi3-opencv/releases/tag/v4.0.1

@thortex Looking at your build script, I think you're using OpenCV's inbuilt OpenCL module, not the one provided by this repo. Check your CMake output again to confirm, or check the output of print(cv2.getBuildInformation()).
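
One way to do that check from Python is to pull the OpenCL block out of the build information. This is a sketch that assumes the report lists OpenCL as a line beginning with `OpenCL:` followed by more deeply indented detail lines, which is how the OpenCV builds I have seen print it; `opencl_section` is a hypothetical helper, and the exact wording of the report varies between versions.

```python
def opencl_section(build_info):
    """Return the lines of the 'OpenCL:' block from cv2.getBuildInformation() text."""
    section = []
    header_indent = None
    for line in build_info.splitlines():
        stripped = line.strip()
        indent = len(line) - len(line.lstrip())
        if header_indent is None:
            if stripped.startswith("OpenCL:"):
                header_indent = indent
                section.append(stripped)
        elif stripped and indent > header_indent:
            section.append(stripped)  # detail line belonging to the block
        else:
            break  # blank line or next section: the block has ended
    return section

try:
    import cv2
    print("\n".join(opencl_section(cv2.getBuildInformation())))
except ImportError:
    pass  # OpenCV not installed; the helper still works on saved report text
```

The printed block should show whether OpenCL support was compiled in and how the OpenCL library is located.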

@thortex

thortex commented Feb 7, 2019

@abhiTronix I used the dynamic load feature of OpenCV.

@spinoza1791
Author

I successfully installed your OpenCL version of OpenCV on Pi 3B+ via https://github.com/thortex/rpi3-opencv/releases/tag/v4.0.1. What else is needed to install VC4CL and test?

@abhiTronix

abhiTronix commented Feb 8, 2019

@thortex I'm referring to the script in your GitHub repo. OpenCV prioritizes its inbuilt libraries over system libraries, so system libraries have to be linked with OpenCV manually to make them work. VC4CL can't be dynamically linked with OpenCV directly unless its path is specified (or it is linked) at runtime.

@thortex

thortex commented Feb 9, 2019

@abhiTronix, thanks for reviewing.
I added the ICD OpenCL library dependency in thortex/rpi3-opencv@d35998f.
This release depends on https://github.com/thortex/rpi3-vc4cl/

@vb216

vb216 commented Feb 11, 2019

Thanks for the updates to your repo. I compiled it and tried to run the performance test. Without sudo it won't use the OpenCL extensions, which gives me a handy way to test the difference.

Anyway, with sudo, I noticed that the first few tests of the perf_imgproc run go OK, but then I start getting these kernel messages and the process seems to hang.

I'm not pushing for a solution; I'm posting this just in case the info helps anyone's work or findings.
FYI, it's a Pi 3B+, 4.14.79-v7+ #1159 SMP Sun Nov 4 17:50:20 GMT 2018 armv7l GNU/Linux

[37353.721913] INFO: task kworker/0:2:1865 blocked for more than 120 seconds.
[37353.721921] Tainted: G C 4.14.79-v7+ #1159
[37353.721923] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[37353.721927] kworker/0:2 D 0 1865 2 0x00000000
[37353.721943] Workqueue: events get_throttled_poll
[37353.721961] [<8079ef70>] (__schedule) from [<8079f5d8>] (schedule+0x50/0xa8)
[37353.721969] [<8079f5d8>] (schedule) from [<8079fa50>] (schedule_preempt_disabled+0x18/0x1c)
[37353.721978] [<8079fa50>] (schedule_preempt_disabled) from [<807a1358>] (__mutex_lock.constprop.3+0x190/0x58c)
[37353.721986] [<807a1358>] (__mutex_lock.constprop.3) from [<807a1870>] (__mutex_lock_slowpath+0x1c/0x20)
[37353.721994] [<807a1870>] (__mutex_lock_slowpath) from [<807a18d0>] (mutex_lock+0x5c/0x60)
[37353.722002] [<807a18d0>] (mutex_lock) from [<8063cdd0>] (rpi_firmware_transaction+0x44/0xac)
[37353.722012] [<8063cdd0>] (rpi_firmware_transaction) from [<8063cf30>] (rpi_firmware_property_list+0xf8/0x208)
[37353.722019] [<8063cf30>] (rpi_firmware_property_list) from [<8063d0a4>] (rpi_firmware_property+0x64/0x84)
[37353.722027] [<8063d0a4>] (rpi_firmware_property) from [<8063d278>] (rpi_firmware_get_throttled+0x124/0x214)
[37353.722035] [<8063d278>] (rpi_firmware_get_throttled) from [<8063d3fc>] (get_throttled_poll+0x28/0x54)
[37353.722043] [<8063d3fc>] (get_throttled_poll) from [<801379b4>] (process_one_work+0x158/0x454)
[37353.722050] [<801379b4>] (process_one_work) from [<80137d14>] (worker_thread+0x64/0x5b8)
[37353.722057] [<80137d14>] (worker_thread) from [<8013dd98>] (kthread+0x13c/0x16c)
[37353.722066] [<8013dd98>] (kthread) from [<801080ac>] (ret_from_fork+0x14/0x28)

@thortex

thortex commented Feb 12, 2019

@vb216, VC4CL and OpenCL-enabled OpenCV are still under development, so the OpenCV performance tests are likely to fail.
Disabling TBB or any other threading library may help with the kernel thread problems.

@Favi0

Favi0 commented Feb 15, 2019

> I successfully installed your OpenCL version of OpenCV on Pi 3B+ via https://github.com/thortex/rpi3-opencv/releases/tag/v4.0.1. What else is needed to install VC4CL and test?

Did you manage to install it and run YOLO?

@spinoza1791
Author

> @spinoza1791 @doe300 I've successfully compiled the latest OpenCV (4.0.1-dev) with VC4CL OpenCL without any compilation errors. I included VC4CL OpenCL in both the OpenCV compilation and the FFmpeg build. I'll update here with benchmark results.

@abhiTronix,
How did you configure your cmake options to compile for VC4CL? -D BUILD_ ?

@abhiTronix

abhiTronix commented Feb 17, 2019

@spinoza1791

pi@raspberrypi:~/yolo $ python3 yolo.py --image 1.jpg --yolo coco
[INFO] loading YOLO from disk...
[VC4CL] can't open /dev/mem
[VC4CL] This program should be run as root. Try prefixing command with: sudo
[INFO] YOLO took 2.774960 seconds
pi@raspberrypi:~/yolo $ sudo python3 yolo.py --image 1.jpg --yolo coco
[INFO] loading YOLO from disk...
[INFO] YOLO took 1.662803 seconds

[Screenshot: image_screenshot_17 02 2019]

Here are benchmarks for tiny-YOLO v3 with 1440p images. We can clearly see an almost 1.5x performance boost with the latest OpenCL+TBB-enabled OpenCV binaries over the TBB-only OpenCV binaries. A detailed benchmark will follow soon.
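
For the single image in the terminal log above, the ratio can be worked out directly from the two timings. A small sketch of the arithmetic (one run each, so treat the result as indicative only; the variable names are illustrative):

```python
without_root_s = 2.774960  # run without sudo: VC4CL could not open /dev/mem
with_root_s = 1.662803     # run with sudo: VC4CL usable

speedup = without_root_s / with_root_s
time_saved = 1 - with_root_s / without_root_s

print(f"speedup: {speedup:.2f}x, time saved: {time_saved:.0%}")
# speedup: 1.67x, time saved: 40%
```

This covers only that one pair of runs; the comparison assumes the run without sudo fell back to the CPU path.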

> How did you configure your cmake options to compile for VC4CL? -D BUILD_

I don't remember exactly, but I used the default flags from the @thortex repo.

@spinoza1791
Author

What are the hardware specs here? Intel?

@abhiTronix

@spinoza1791, this library only works on the Raspberry Pi. But since you asked, it's a Raspberry Pi 3 Model B rev 1.2.

@spinoza1791
Author

Which darknet repo are you using? AlexeyAB?

@abhiTronix

abhiTronix commented Feb 18, 2019

I'm using the default pre-built binaries available on their official website.
