
Abstraction of CUDA and OpenCL on constant vector, probably const_array<T> #182

Open
byzhang opened this issue Sep 26, 2015 · 2 comments

@byzhang
Contributor

byzhang commented Sep 26, 2015

As discussed in README.md.

@ddemidov
Owner

ddemidov commented Nov 8, 2015

I've been thinking about this for some time, and I am not sure it's possible to provide a useful abstraction over constant memory that would work for both OpenCL and CUDA. The way constant memory works is too different between the two platforms.

OpenCL
A standard global-memory buffer is decorated with the __constant keyword when passed to a kernel. This enables use of the constant cache for that buffer.
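
For illustration, this is roughly what such a kernel signature looks like in OpenCL C (a minimal sketch, not the exact source VexCL generates; the kernel name and body are made up):

#pragma OPENCL EXTENSION cl_khr_fp64 : enable

kernel void scale(
    ulong n,
    global double *x,
    __constant double *coef  // ordinary buffer, passed with the __constant qualifier
    )
{
    for (size_t i = get_global_id(0); i < n; i += get_global_size(0))
        x[i] *= coef[i % 32];
}

On the host side the buffer is an ordinary cl_mem set with clSetKernelArg; no special allocation is required.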

CUDA
A __constant__ array is created at program scope (outside of any kernel) and initialized with a call to the cudaMemcpyToSymbol API. Any kernel in the program may then use the array; there is no need to pass it as a kernel parameter.
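
A minimal CUDA sketch of that pattern (illustrative only, not VexCL-generated code; all names are made up):

__constant__ double coef[32];        // program-scope constant array

__global__ void scale(size_t n, double *x) {
    for (size_t i = blockIdx.x * blockDim.x + threadIdx.x;
            i < n; i += (size_t)gridDim.x * blockDim.x)
        x[i] *= coef[i % 32];        // used directly, not a kernel parameter
}

// On the host, before the first launch of any kernel that reads coef:
// cudaMemcpyToSymbol(coef, host_coef, sizeof(coef));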

Now, VexCL creates a single-kernel program for each unique vector expression it encounters in the code. I see two options for implementing constant_array<T,N> for the CUDA backend:

  1. There are 'callbacks' that each new type can fill in to help VexCL generate the kernel source and pass the actual arguments to the kernel. I could add another callback that is invoked right after the kernel is compiled, but before its first use (somewhere around here). constant_array<T,N> would use that callback to copy its contents to constant memory with cudaMemcpyToSymbol; see the sketch after this list.
  2. I could call cudaMemcpyToSymbol each time a kernel that uses constant_array<T,N> is launched.
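
A rough sketch of option 1 for a single concrete instantiation; the struct and the hook method are hypothetical and do not correspond to any existing VexCL API:

#include <array>
#include <cuda_runtime.h>

__constant__ double dev_A[32];  // device-side storage for this instantiation

struct constant_array_d32 {     // hypothetical stand-in for constant_array<double, 32>
    std::array<double, 32> host_data;

    // Body of the proposed post-compile callback: would run once,
    // right after the kernel is compiled and before its first launch.
    void copy_to_device() const {
        cudaMemcpyToSymbol(dev_A, host_data.data(), sizeof(dev_A));
    }
};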

The first approach has the drawback that users would not be able to change the contents of a constant_array<T,N> (since its contents are copied to the GPU only once, when the kernel is compiled). Moreover, VexCL has no means to differentiate between expressions that differ only in the contents of their terminals, so the following would not work either:

constant_array<double, 32> A(...);
constant_array<double, 32> B(...);

x = func(A, ...); // Expression uses A, ok.
x = func(B, ...); // Expression uses B, but has same type as above.
                  // A was already copied to the kernel and will be used here as well.

The second approach has the noticeable overhead of a memory transfer (with cudaMemcpyToSymbol) on every kernel launch. Since the primary use of constant memory is speed optimization, this seems counterproductive.

@ddemidov
Owner

ddemidov commented Nov 8, 2015

It should still be possible to use a custom kernel with constant memory in CUDA.
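
For reference, a minimal self-contained CUDA program along those lines (plain CUDA, independent of VexCL; all names are made up). Feeding such a kernel from a vex::vector would additionally require extracting the raw device pointer from the backend:

#include <cuda_runtime.h>
#include <cstdio>

__constant__ double coef[4];

__global__ void scale(size_t n, double *x) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= coef[i % 4];
}

int main() {
    const size_t n = 16;
    double host_coef[4] = {1, 2, 3, 4};
    double host_x[n];
    for (size_t i = 0; i < n; ++i) host_x[i] = 1.0;

    double *dev_x;
    cudaMalloc(&dev_x, n * sizeof(double));
    cudaMemcpy(dev_x, host_x, n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(coef, host_coef, sizeof(host_coef));

    scale<<<1, 256>>>(n, dev_x);

    cudaMemcpy(host_x, dev_x, n * sizeof(double), cudaMemcpyDeviceToHost);
    for (size_t i = 0; i < n; ++i) printf("%g ", host_x[i]);
    printf("\n");

    cudaFree(dev_x);
}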
