Errors using CL_DEVICE_TYPE_CPU on Mac OSX Yosemite #158
Can you share the source of |
Hi Denis, it's the one from your gist. Thanks,
|
In that case it's a known issue (see #92). That issue has never been resolved as far as I know. I would raise a ticket with Apple support, since the code in the gist is standard OpenCL and should work. Edit: Also, I don't have access to a Mac OS X machine, so there is not much I can do here. |
Ah, OK. I've looked at #92. I'll change the filter to CL_DEVICE_TYPE_GPU for the Mac platform for now. Not sure whether this applies to the newer Macs, though. Thanks for your help, Rajesh.
|
Closing this issue, as it seems to be a bug in Apple's OpenCL framework. FYI, the 10.10.2 update does not fix this either. |
Thank you for the info! If you are able to find a workaround, I would be glad to either introduce it into vexcl or accept a pull request. |
Looking into it… will keep you updated.
|
Denis, could it be related to workgroups / queues? Never mind, I'll check it myself without bothering you. |
I think Apple's OpenCL implementation does not support workgroups of more than one work-item on CPUs (vexcl uses this restriction for kernels on CPU devices), but in the gist the kernel is never launched, since it fails at the compilation step. |
Here's what works on the CPU thus far (taken from your examples, of course):

```cpp
#include <iostream>
#include <vector>
#include <string>
#include <stdexcept>

#define __CL_ENABLE_EXCEPTIONS
#include <vexcl/vexcl.hpp>

//---------------------------------------------------------------------------
int main()
{
    const size_t n = 1024 * 1024;

    // Use CPU devices only.
    vex::Context ctx( vex::Filter::Type(CL_DEVICE_TYPE_CPU) );
    if (!ctx) throw std::runtime_error("No CPU devices available.");

    std::vector<double> a(n, 1.0);
    std::vector<double> c(n, 0.5);
    std::vector<double> results(n);

    vex::vector<double> A(ctx.queue(), a);
    vex::vector<double> B(ctx.queue(), n);  // allocated but not initialized
    vex::vector<double> C(ctx.queue(), c);

    B = 2.0;  // B must hold defined values before it is read below

    // Vector expression.
    A = (B + C) / 5;
    vex::copy(A, results);
    std::cout << "Size: " << results.size() << std::endl;
    std::cout << "First 2: " << results[0] << "," << results[1] << std::endl;

    // User-defined function.
    VEX_FUNCTION(double, squared_radius, (double, x)(double, y),
        return x * x + y * y;
    );
    A = sqrt(squared_radius(B, C));
    vex::copy(A, results);
    std::cout << "Size: " << results.size() << std::endl;
    std::cout << "First 2: " << results[0] << "," << results[1] << std::endl;

    // Stencil operator: 3-wide window centered on the current element.
    VEX_STENCIL_OPERATOR(S, /*return type:*/double, /*window width:*/3, /*center:*/1,
        "return sin(X[0] - X[-1]) + sin(X[1] - X[0]);", ctx);
    A = S(A);
    vex::copy(A, results);
    std::cout << "Size: " << results.size() << std::endl;
    std::cout << "First 2: " << results[0] << "," << results[1] << std::endl;
}
```

I'm going to try writing a few custom kernels that use the stencil window to see if I can break anything. Let me know if this makes any sense. |
I think it would make more sense to run unit tests distributed with vexcl. You can do this with
After that the test log may be found at |
Hi Denis, here goes ... |
So the failing kernels I can see are:

```c
#if defined(cl_khr_fp64)
#  pragma OPENCL EXTENSION cl_khr_fp64: enable
#elif defined(cl_amd_fp64)
#  pragma OPENCL EXTENSION cl_amd_fp64: enable
#endif

kernel void vexcl_vector_kernel
(
    ulong n,
    global double * prm_1
)
{
    ulong chunk_size  = (n + get_global_size(0) - 1) / get_global_size(0);
    ulong chunk_start = get_global_id(0) * chunk_size;
    ulong chunk_end   = chunk_start + chunk_size;
    if (n < chunk_end) chunk_end = n;
    for(ulong idx = chunk_start; idx < chunk_end; ++idx)
    {
        prm_1[idx] = 42;
    }
}
```

This is the kernel from the gist above. Btw, I've had another idea worth testing about this kernel, see below.
About the failing assignment test: it occurred to me this could be due to a type mismatch: |
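For reference, the index arithmetic in the `vexcl_vector_kernel` quoted above (each work-item processes one contiguous chunk of the vector) can be checked on the host. This is a minimal C++ sketch, not vexcl code: `chunk_range` and `coverage` are hypothetical helper names, and `global_size` and `gid` stand in for `get_global_size(0)` and `get_global_id(0)`. It shows that every element is written exactly once, even when there are more work-items than elements.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Mirrors the index arithmetic of vexcl_vector_kernel: work-item `gid` out of
// `global_size` work-items handles the half-open range [first, second).
std::pair<std::size_t, std::size_t>
chunk_range(std::size_t n, std::size_t global_size, std::size_t gid) {
    assert(global_size > 0);  // a launched kernel always has a non-zero global size
    // Ceiling division, so that global_size * chunk_size >= n.
    std::size_t chunk_size  = (n + global_size - 1) / global_size;
    std::size_t chunk_start = gid * chunk_size;
    std::size_t chunk_end   = chunk_start + chunk_size;
    if (n < chunk_end) chunk_end = n;  // clip the last chunk(s) to n
    return {chunk_start, chunk_end};
}

// Counts how many work-items write each element; a correct partition
// yields 1 everywhere.
std::vector<int> coverage(std::size_t n, std::size_t global_size) {
    std::vector<int> hits(n, 0);
    for (std::size_t gid = 0; gid < global_size; ++gid) {
        std::pair<std::size_t, std::size_t> r = chunk_range(n, global_size, gid);
        // Same loop as the kernel; it runs zero times when first >= second.
        for (std::size_t idx = r.first; idx < r.second; ++idx) ++hits[idx];
    }
    return hits;
}
```

For example, `chunk_range(10, 4, 3)` returns `{9, 10}`: the last chunk is clipped to `n`. With more work-items than elements, trailing work-items get an empty range (`first >= second`), and the kernel's `for` loop simply does not run.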
Will do … will get back to you shortly…
|
After trying the updated compiler_bug.cpp from your gist: basically, it makes no difference. |
I don't see anything wrong with this kernel, and it does work on every other OpenCL platform I have access to. I believe there is nothing left to do here but open an issue with Apple support. Regarding the FFT issue: it does work correctly when the workgroup size is set to 1. So could you please check whether the fft tests pass for you with the branch issue-158-fft? Note, however, that (according to examples/fft_benchmark.cpp) VexCL's implementation of FFT is about two orders of magnitude slower than fftw on a CPU, so there is probably no reason to use it on a CPU anyway. |
You're right, there's an issue with the amount of data being transferred. See this gist (NDEnqueKernel issue with larger dataset): https://gist.github.com/lajash/1645b473676633b35d9e |
Checking fft now. You're right, it makes no sense to use vex::fft if no GPU is involved. Will you be working on optimizing it in the future? |
Re fft optimization: I don't think it makes sense when fftw is available. On a CPU one can just map the device memory to a host pointer and then use fftw (or any other host-side algorithm) on device vectors (see the example here). Also, the FFT implementation was provided by @neapel, so he could probably chime in here. |
New test log added here: https://gist.github.com/lajash/991c1bd6a1fc9d3ffa95
It doesn't look like it fixed anything, though. :( |
Quick update: the code at https://gist.github.com/lajash/1645b473676633b35d9e runs on my Mac now. I just tried running it multiple times and, voilà, it runs in 1 out of 4 tries, but it's extremely slow! So it looks like Apple's ICD may have an issue with the CPU. This works on all other platforms, I presume? |
Your stencil operators use a slow path when run on a CPU, which may be significant for wide stencils like yours. The function-based variant runs 1.5 times faster on my CPU than the stencil-based one, and is only slightly slower (about 15%) on a GPU. |
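When comparing CPU and GPU runs, or checking a function-based rewrite of the stencil `S` from the earlier example against the stencil version, a plain host-side reference can help. This is a minimal C++ sketch with hypothetical helper names (`stencil_reference`, `matches_reference`). It assumes that neighbours falling outside the vector are replaced by the nearest boundary element; that boundary rule is an assumption and should be checked against the vexcl stencil documentation before relying on it.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Host reference for the stencil body
//   return sin(X[0] - X[-1]) + sin(X[1] - X[0]);
// ASSUMPTION: neighbours outside [0, n) are clamped to the nearest element.
std::vector<double> stencil_reference(const std::vector<double>& x) {
    const std::size_t n = x.size();
    std::vector<double> y(n);
    for (std::size_t i = 0; i < n; ++i) {
        double left  = x[i == 0 ? 0 : i - 1];         // X[-1], clamped at the start
        double right = x[i + 1 < n ? i + 1 : n - 1];  // X[1], clamped at the end
        y[i] = std::sin(x[i] - left) + std::sin(right - x[i]);
    }
    return y;
}

// Compares a result copied back from the device (e.g. with vex::copy)
// against the host reference, element by element.
bool matches_reference(const std::vector<double>& input,
                       const std::vector<double>& device_result,
                       double tol = 1e-10) {
    std::vector<double> ref = stencil_reference(input);
    if (ref.size() != device_result.size()) return false;
    for (std::size_t i = 0; i < ref.size(); ++i)
        if (std::fabs(ref[i] - device_result[i]) > tol) return false;
    return true;
}
```

In the example program, one would copy `A` to the host before `A = S(A);` and again after it, then call `matches_reference(before, after)`. The default tolerance is an arbitrary choice that allows for small differences between device and host `sin`.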
Thanks, Denis. I'll use the function variants.
|
Running compiler_bug.cpp from the command line with the following:

```
g++ -o compiler_bug compiler_bug.cpp -std=c++0x -I OpenHeaders -framework OpenCL &&
./compiler_bug
```

produces:

```
Intel(R) Core(TM) i7-2820QM CPU @ 2.30GHz
OpenCL compilation error
CVMS_ERROR_COMPILER_FAILURE: CVMS compiler has crashed or hung building an element.
clBuildProgram
```
Running stencil operators on the CPU also causes crashes. The same code runs on the GPU.
I'm testing this on a MacBook Pro with an i7 CPU and an ATI Radeon HD 6750M GPU.