Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CUDA Thrust cannot access Device #24

Closed
isohl opened this issue Aug 3, 2015 · 4 comments
Closed

CUDA Thrust cannot access Device #24

isohl opened this issue Aug 3, 2015 · 4 comments
Assignees

Comments

@isohl
Copy link

isohl commented Aug 3, 2015

CUDA Thrust calls such as the following:

thrust::exclusive_scan(thrust::device_ptr<uint2>(input), thrust::device_ptr<uint2>(input + numElements), thrust::device_ptr<uint2>(output), zero);

give the error (generated with -DREALM_BACKTRACE):

terminate called after throwing an instance of 'thrust::system::system_error'
  what():  function_attributes(): after cudaFuncGetAttributes: invalid device function
BACKTRACE (0, 7f3c38567700)
----------
./composite() [0xb21c75]
  /lib/x86_64-linux-gnu/libc.so.6 : ()+0x36d40
  /lib/x86_64-linux-gnu/libc.so.6 : gsignal()+0x39
  /lib/x86_64-linux-gnu/libc.so.6 : abort()+0x148
  /usr/lib/x86_64-linux-gnu/libstdc++.so.6 : __gnu_cxx::__verbose_terminate_handler()+0x155
  /usr/lib/x86_64-linux-gnu/libstdc++.so.6 : ()+0x5e836
  /usr/lib/x86_64-linux-gnu/libstdc++.so.6 : ()+0x5e863
  /usr/lib/x86_64-linux-gnu/libstdc++.so.6 : ()+0x5eaa2
  ./composite : thrust::system::cuda::detail::bulk_::detail::throw_on_error(cudaError, char const*)+0x50
  ./composite : thrust::system::cuda::detail::bulk_::detail::function_attributes_t thrust::system::cuda::detail::bulk_::detail::function_attributes<void (*)(thrust::system::cuda::detail::bulk_::detail::cuda_task<thrust::system::cuda::detail::bulk_::parallel_group<thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<9ul>, 128ul>, 0ul>, thrust::system::cuda::detail::bulk_::detail::closure<thrust::system::cuda::detail::scan_detail::accumulate_tiles, thrust::tuple<thrust::system::cuda::detail::bulk_::detail::cursor<1u>, thrust::device_ptr<uint2>, thrust::system::cuda::detail::aligned_decomposition<long>, thrust::detail::normal_iterator<thrust::pointer<uint2, thrust::system::cuda::detail::tag, thrust::use_default, thrust::use_default> >, thrust::plus<uint2>, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type> > >)>(void (*)(thrust::system::cuda::detail::bulk_::detail::cuda_task<thrust::system::cuda::detail::bulk_::parallel_group<thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<9ul>, 128ul>, 0ul>, thrust::system::cuda::detail::bulk_::detail::closure<thrust::system::cuda::detail::scan_detail::accumulate_tiles, thrust::tuple<thrust::system::cuda::detail::bulk_::detail::cursor<1u>, thrust::device_ptr<uint2>, thrust::system::cuda::detail::aligned_decomposition<long>, thrust::detail::normal_iterator<thrust::pointer<uint2, thrust::system::cuda::detail::tag, thrust
----------
BACKTRACE (0, 7f3d429f9700)
----------
./composite() [0xb21c75]
  /lib/x86_64-linux-gnu/libc.so.6 : ()+0x36d40
  /lib/x86_64-linux-gnu/libglib-2.0.so.0 : g_main_context_check()+0x134
  /lib/x86_64-linux-gnu/libglib-2.0.so.0 : ()+0x48f7b
  /lib/x86_64-linux-gnu/libglib-2.0.so.0 : g_main_context_iteration()+0x2c
  /usr/local/Qt/5.4/gcc_64/lib/libQt5Core.so.5 : QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>)+0xc3
  /usr/local/Qt/5.4/gcc_64/lib/libQt5Core.so.5 : QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>)+0xcb
  /usr/local/Qt/5.4/gcc_64/lib/libQt5Core.so.5 : QCoreApplication::exec()+0x85
  /home/xin/gitlab/legioncomposite/QtViewer/libQtViewer.so.1 : interactThread(void*)+0x6d
  /lib/x86_64-linux-gnu/libpthread.so.0 : ()+0x8182
  /lib/x86_64-linux-gnu/libc.so.6 : clone()+0x6d

----------

A simple test of this failure is:

thrust::host_vector<int> H(4);
H[0] = 14;
H[1] = 20;
H[2] = 38;
H[3] = 46;
thrust::device_vector<int> D = H;
thrust::inclusive_scan(D.begin(), D.end(), D.begin());
@lightsighter
Copy link
Contributor

In Legion, we are very particular about data is managed. In particular, we want all data to be stored in logical regions and not in any third-party data structures like thrust vectors. Once it's out of our control, we can't reason about it and that can lead to problems. In this specific case, there is another issue: we actually don't allow Legion GPU tasks to do anything other than launch kernels (all data movement to and from the GPU should be done via the placement of physical instances in GPU memories through the mapping interface). You may have noticed that the Legion build system only links against '-lcuda' and not '-lcudart'. We do this intentionally, so we can scope exactly the set of CUDA functions that users are permitted to invoke without interfering with the Legion programming model. I'm actually not sure how you managed to get thrust to link with our Legion build system since thrust requires '-lcudart'.

@isohl
Copy link
Author

isohl commented Aug 3, 2015

Does this mean that you do not plan to support Thrust and other similar libraries?

@lightsighter
Copy link
Contributor

We will support a specific subset of thrust that plays nice with Legion. For example, once you've got a physical instance of a logical region in framebuffer memory, Legion can provide you a raw device pointer to that instance. You can then wrap this device pointer in a thrust vector and use thrust calls to launch kernels on the vector (be sure the vector doesn't have an allocator associated with it so thrust doesn't try to reclaim the memory after the handle goes out of scope or bad things will happen). Legion will actually modify the kernel launches underneath of the CUDA runtime API to properly defer them on the right streams so you don't have to synchronize before exiting your GPU task. There is a good example of that here: https://github.com/StanfordLegion/legion/blob/stable/examples/full_circuit/circuit_gpu.cu#L390-L401 Notice we don't need to synchronize with the GPU after the kernel launch; Legion handles that automatically while allowing the GPU processor to continue executing and launching more kernels.

@lightsighter
Copy link
Contributor

I've added some missing CUDA functions that thrust needs for most of its kernel calls. There is now an example program showing how to use thrust with Legion.

https://github.com/StanfordLegion/legion/tree/master/examples/thrust_interop

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants