This implements the copy/copy_n algorithms for the CUDA backend.
* std::copy see https://en.cppreference.com/w/cpp/algorithm/copy.html
* std::copy_n see https://en.cppreference.com/w/cpp/algorithm/copy_n.html
It provides tests and benchmarks similar to Thrust's, plus some boilerplate for libcu++.
The functionality is not publicly available yet and is implemented in a private internal header.
Fixes NVIDIA#7366
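
For context, here is a minimal host-side sketch of the standard parallel overloads these algorithms correspond to, using std::execution::par, since the CUDA backend's own execution policy and entry points are still internal and not shown here:

```cpp
#include <algorithm>
#include <execution>
#include <vector>

int main()
{
  std::vector<int> src(1000, 42);
  std::vector<int> dst(1000);

  // Copy the whole range with an execution policy, see
  // https://en.cppreference.com/w/cpp/algorithm/copy
  std::copy(std::execution::par, src.begin(), src.end(), dst.begin());

  // Copy only the first 100 elements, see
  // https://en.cppreference.com/w/cpp/algorithm/copy_n
  std::copy_n(std::execution::par, src.begin(), 100, dst.begin());
}
```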
(Resolved review thread on libcudacxx/test/libcudacxx/std/algorithms/alg.modifying/alg.copy/pstl_copy.cu)
```cpp
}
catch (const ::cuda::cuda_error& __err)
{
  if (__err.status() == cudaErrorMemoryAllocation)
```
Just out of curiosity, why do we want to translate that exception here?
That is the standard requirement: the algorithm should either throw a bad_alloc or terminate. As an implementation extension, we also throw the CUDA error.
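
To illustrate the rule being discussed, here is a minimal hedged sketch of that translation, written against the raw CUDA runtime rather than the ::cuda::cuda_error wrapper shown in the hunk above; allocate_or_translate is a made-up helper name, not part of the PR:

```cpp
#include <cuda_runtime.h>

#include <cstddef>
#include <new>       // std::bad_alloc
#include <stdexcept> // std::runtime_error

// Hypothetical helper, illustrative only: a parallel algorithm that fails to
// allocate temporary storage must throw std::bad_alloc; other failures may be
// surfaced as an implementation-defined exception.
void* allocate_or_translate(std::size_t bytes)
{
  void* ptr               = nullptr;
  const cudaError_t error = cudaMalloc(&ptr, bytes);
  if (error == cudaErrorMemoryAllocation)
  {
    throw std::bad_alloc{}; // mandated translation for allocation failures
  }
  if (error != cudaSuccess)
  {
    throw std::runtime_error{cudaGetErrorString(error)}; // implementation extension
  }
  return ptr;
}
```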
suggestion: Since we decided to not automatically derive the memory space from iterators, like Thrust does, the PSTL's CUDA backend must assume that all memory is device accessible. Therefore, we cannot implement heterogeneous copies with …
Benchmarks look fine; there is some minor slowdown for small integers, but that could just be noise on my machine.
```cpp
{ // CUB requires a 32 or 64 bit offset type, so cast here
  using _OffsetType = ::cub::detail::choose_signed_offset_t<iter_difference_t<_InputIterator>>;
  return __par_impl(
    __policy,
    ::cuda::std::move(__first),
    static_cast<_OffsetType>(__count),
```
Q: I don't understand why we need to cast the iterator difference type. CUB should be able to handle any offset type at the public API; otherwise we have a bug. Why is this needed?
CUB has a static assert that it only accepts 32- and 64-bit signed integer offset types.
That might not always be what the iterator provides; for example, counting_iterator uses its payload type as the difference type if it is a signed integer.
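
To make the widening concrete, here is a minimal hand-rolled trait that mirrors the intent of that cast; it is illustrative only and not CUB's actual cub::detail::choose_signed_offset_t:

```cpp
#include <cstdint>
#include <type_traits>

// Illustrative only: map an arbitrary signed difference type to a 32- or
// 64-bit signed offset, mirroring the intent of the static_cast in the hunk above.
template <class DiffT>
using widen_to_signed_offset_t =
  std::conditional_t<sizeof(DiffT) <= sizeof(std::int32_t), std::int32_t, std::int64_t>;

// A 16-bit difference type (e.g. from a counting iterator over int16_t)
// is widened to int32_t before being handed to CUB.
static_assert(std::is_same_v<widen_to_signed_offset_t<std::int16_t>, std::int32_t>);
static_assert(std::is_same_v<widen_to_signed_offset_t<std::int64_t>, std::int64_t>);
```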
Ah, I get it now. You are calling a random internal dispatch function and not the public cub::DeviceTransform API. Why is this necessary? Why can't we just call the public API (which handles offset types accordingly)?
Oh, that was just something Thrust did, so I wasn't sure whether it's still necessary.
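
For reference, a minimal sketch of how a plain device-to-device copy could go through the public entry point, assuming cub::DeviceTransform together with cuda::std::identity is the right tool here; this is illustrative, not the PR's implementation:

```cpp
#include <cub/device/device_transform.cuh>

#include <cuda/std/functional> // cuda::std::identity
#include <cuda/std/tuple>      // cuda::std::make_tuple

#include <cuda_runtime.h>

#include <cstdio>

int main()
{
  constexpr int num_items = 1024;
  int* d_in               = nullptr;
  int* d_out              = nullptr;
  cudaMalloc(&d_in, num_items * sizeof(int));
  cudaMalloc(&d_out, num_items * sizeof(int));

  // The public API picks a suitable offset type internally,
  // so no cast of the difference type is needed at the call site.
  const cudaError_t status = cub::DeviceTransform::Transform(
    ::cuda::std::make_tuple(d_in), d_out, num_items, ::cuda::std::identity{});
  if (status != cudaSuccess)
  {
    std::printf("transform failed: %s\n", cudaGetErrorString(status));
  }

  cudaFree(d_in);
  cudaFree(d_out);
}
```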
😬 CI Workflow Results: 🟥 finished in 1h 03m. Pass: 94%/101 | Total: 16h 54m | Max: 38m 44s | Hits: 98%/250260