Description
Is this a duplicate?
- I confirmed there appear to be no duplicate issues for this bug and that I agree to the Code of Conduct
Type of Bug
Compile-time Error
Component
CUB
Describe the bug
This test fails to compile when targeting the sm_120 architecture:
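#include <cstdint>   // int32_t
#include <execution>
#include <numeric>
#include <vector>

int32_t main()
{
    std::vector<double> in(1000);
    std::vector<double> out(1000);
    // With -stdpar, the parallel inclusive_scan is offloaded and instantiates the CUB scan.
    auto orr = std::inclusive_scan(std::execution::par, in.begin(), in.end(), out.begin());
    return 0;
}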
It fails with:
cuda/std/__atomic/functions/host.h", line 85: error: static assertion failed with "atomic_ref<T> where sizeof(T) > 8 is not supported on this system."
_LIBCUDACXX_INT128_WARN(_Tp)
^
cuda/std/__atomic/functions/host.h", line 85: note: the final comparison was 16 < 16
_LIBCUDACXX_INT128_WARN(_Tp)
^
On sm_120, we use Policy1000, which includes the WarpspeedPolicy structure. For type double, scan_use_warpspeed evaluates to true, leading to the instantiation of atomic_ref<tile_state_t<double>>.
The function storeTileAggregate in cub/detail/warpspeed/look_ahead.cuh contains:
if constexpr (sizeof(tile_state_t<AccumT>) <= 16
Meanwhile, cuda/std/__atomic/functions/host.h enforces:
static_assert(sizeof(_Tp) < 16, "atomic_ref<T> where sizeof(T) > 8 is not supported on this system.");
Because storeTileAggregate allows sizeof(tile_state_t<AccumT>) == 16, the 16-byte tile_state_t<double> takes the atomic_ref path.
However, atomic_ref then requires sizeof(T) < 16, which is false for a 16-byte type, so the build fails.
The two checks are inconsistent (<= 16 vs < 16), which triggers the static assertion failure when building for sm_120.
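The mismatch can be reproduced in isolation with a minimal sketch. The names tile_state_sketch, takes_atomic_ref_path, atomic_ref_accepts and checks_disagree are hypothetical stand-ins, not CCCL identifiers, and the struct layout is only an assumption chosen to give a 16-byte type like the one in the error message:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for tile_state_t<double>: an 8-byte value plus an
// 8-byte status word. The real layout in CUB may differ; only the size matters.
struct tile_state_sketch {
    double value;
    std::uint64_t state;
};
static_assert(sizeof(tile_state_sketch) == 16,
              "assumed 16 bytes on common 64-bit ABIs");

// Mirrors the guard quoted from look_ahead.cuh: sizes up to and including 16
// select the atomic_ref path.
constexpr bool takes_atomic_ref_path(std::size_t size) { return size <= 16; }

// Mirrors the static_assert quoted from host.h: only sizes below 16 pass.
constexpr bool atomic_ref_accepts(std::size_t size) { return size < 16; }

// True when the caller selects atomic_ref but atomic_ref rejects the type.
constexpr bool checks_disagree(std::size_t size) {
    return takes_atomic_ref_path(size) && !atomic_ref_accepts(size);
}

// The two checks disagree only at exactly 16 bytes.
static_assert(checks_disagree(sizeof(tile_state_sketch)), "16 bytes: mismatch");
static_assert(!checks_disagree(8), "8 bytes: both checks agree");
static_assert(!checks_disagree(32), "32 bytes: atomic_ref path is not taken");
```

Under these assumptions, 16 is the only size for which the guard admits a type that the assertion then rejects.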
How to Reproduce
nvc++ -stdpar --c++17 test_scan.cpp
Expected behavior
Compilation should succeed, or the size conditions should be made consistent to prevent this mismatch.
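One way to keep the conditions consistent is to derive the dispatch from a single predicate. The following is only a sketch under that assumption: fits_atomic_ref_sketch, store_path and choose_store_path are hypothetical names, and whether the right limit is < 16 or <= 16 is for the maintainers to decide.

```cpp
#include <cstddef>

// Hypothetical single source of truth for the size limit (not a CCCL name).
// Here it follows the stricter bound quoted from host.h.
template <typename T>
inline constexpr bool fits_atomic_ref_sketch = sizeof(T) < 16;

enum class store_path { atomic_ref, fallback };

// The caller branches on the same predicate the assertion would use, so a
// type that fails the limit never reaches the atomic_ref instantiation.
template <typename T>
constexpr store_path choose_store_path() {
    if constexpr (fits_atomic_ref_sketch<T>) {
        return store_path::atomic_ref;
    } else {
        return store_path::fallback;
    }
}

struct eight_bytes { double v; };
struct sixteen_bytes { double v; unsigned long long s; };

static_assert(choose_store_path<eight_bytes>() == store_path::atomic_ref);
static_assert(choose_store_path<sixteen_bytes>() == store_path::fallback);
```

With a shared predicate, changing the limit in one place changes both the dispatch and the assertion together.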
Reproduction link
No response
Operating System
No response
nvidia-smi output
No response
NVCC version
No response