PyOpenCL: values only computed correctly for first worker in the group #54
Comments
Can you please give more details, e.g. expected vs. actual results? As for the work-group size: VC4CL only checks that the local size divides the global size and is at most 12. So if the global size is a power of two, the local size must be one too. Any other restriction would come from pyopencl.
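To make the two constraints above concrete, here is a minimal sketch that picks the largest local size satisfying them. The function name `largest_valid_local_size` is hypothetical and not part of VC4CL or pyopencl; the default limit of 12 is taken from the comment above, and a one-dimensional range is assumed.

```python
def largest_valid_local_size(global_size, max_local_size=12):
    """Return the largest local size that divides global_size
    and does not exceed max_local_size (hypothetical helper)."""
    if global_size < 1:
        raise ValueError("global_size must be positive")
    # Walk downwards from the largest allowed size to 1.
    for candidate in range(min(max_local_size, global_size), 0, -1):
        if global_size % candidate == 0:
            return candidate
    return 1  # not reached: 1 divides every positive integer


if __name__ == "__main__":
    for n in (1024, 1200, 4096, 36):
        print(n, largest_valid_local_size(n))
```

For a power-of-two global size such as 1024, the only divisors not exceeding 12 are 1, 2, 4 and 8, so the helper returns 8. A global size that is a multiple of 12, such as 1200, would allow the full 12.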
I just tested with a fresh code pull. The assignment is still an issue. Here's the minimal kernel to reproduce:
Expected result: array of [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, ...]
…now handles optimized implicit UNIFORMs correctly, see #27, doe300/VC4C#54
I fixed some host-side issues related to this. Can you re-test?
I can confirm this is now fixed. :-)
Very good :)
I'm testing out some basic PyOpenCL examples (PyOpenCL detects the library and parameters correctly). The demo benchmark looked like a nice simple kernel to try:
https://raw.githubusercontent.com/inducer/pyopencl/master/examples/benchmark.py
The value is only reliably assigned for the first worker in each group. Writing a constant value of 1 to the output results in 1s only at the first worker of each group; the rest of the values are 0. This might be a problem with the write itself, or with the call to get_global_id(0).
A secondary issue is that the OpenCL driver only allows work-group sizes that are powers of 2, so I cannot assign 12 workers here, only 8:
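One way to check this symptom on the host side is to count wrong values per position within a group. The sketch below is only an illustration: `mismatches_by_local_id` is a hypothetical helper, the data is simulated rather than read back from the device, and it assumes a one-dimensional range with no global offset, so that a worker's position in its group is `index % local_size`.

```python
def mismatches_by_local_id(output, local_size, expected=1.0):
    """Count values that differ from `expected`, grouped by the
    worker's position inside its group (index % local_size)."""
    counts = [0] * local_size
    for index, value in enumerate(output):
        if value != expected:
            counts[index % local_size] += 1
    return counts


if __name__ == "__main__":
    # Simulated output mimicking the reported symptom: only the first
    # worker of each group of 8 holds the expected value 1.0.
    local_size = 8
    simulated = [1.0 if i % local_size == 0 else 0.0 for i in range(32)]
    print(mismatches_by_local_id(simulated, local_size))
```

If only the first entry of the returned list is zero, every group has exactly one correct worker, which matches the behaviour described here. In a real run, the list passed in would be the contents of the destination buffer after copying it back to the host.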
Traceback (most recent call last):
  File "opencl_test.py", line 91, in <module>
    exec_evt = prg.sum(queue, global_size, local_size, a_buf, b_buf, dest_buf)
  File "/usr/local/lib/python2.7/dist-packages/pyopencl/cffi_cl.py", line 1766, in __call__
    return self._enqueue(self, queue, global_size, local_size, *args, **kwargs)
  File "", line 90, in enqueue_knl_sum
  File "/usr/local/lib/python2.7/dist-packages/pyopencl/cffi_cl.py", line 1952, in enqueue_nd_range_kernel
    global_work_size, local_work_size, c_wait_for, num_wait_for))
  File "/usr/local/lib/python2.7/dist-packages/pyopencl/cffi_cl.py", line 664, in _handle_error
    raise e
pyopencl.cffi_cl.LogicError: clEnqueueNDRangeKernel failed: INVALID_WORK_GROUP_SIZE
I'm unsure whether there's an incorrect assumption somewhere in the pyopencl code, but being limited to 8 of the 12 possible workers reduces total throughput by a third. With 8 workers the result is:
gpu: 2.32745s
cpu: 0.469361066818s
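The figures above can be put into proportion with a few lines of arithmetic. This is a rough sketch only: it assumes that throughput scales linearly with the number of workers per group, which this report implies but does not measure, and the variable names are illustrative.

```python
from fractions import Fraction

MAX_LOCAL_SIZE = 12   # upper limit on the local size mentioned in this thread
USED_LOCAL_SIZE = 8   # largest power of two that does not exceed 12

# Share of the possible workers per group left unused at a local size of 8.
unused_fraction = 1 - Fraction(USED_LOCAL_SIZE, MAX_LOCAL_SIZE)

# Timings reported above, in seconds.
gpu_seconds = 2.32745
cpu_seconds = 0.469361066818

# How many times longer the GPU run took compared with the CPU run.
slowdown = gpu_seconds / cpu_seconds

if __name__ == "__main__":
    print("unused fraction of workers:", unused_fraction)
    print("GPU time / CPU time: %.2f" % slowdown)
```

With 8 of 12 workers in use, one third of the possible workers stay idle, and the GPU run reported here took a little under five times as long as the CPU run.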