Symptom
tests/static_set/atomic_storage_test spuriously fails in Release builds on CUDA < 13.1, missing one or more inserts:
tests/static_set/atomic_storage_test.cu:74: FAILED:
REQUIRE( count == num_keys )
with expansion:
99999 (0x1869f) == 100000 (0x186a0)
Reproduces in CI on CUDA 12.0. Uses cuco::pair<uint32_t, int32_t> (8 B), hitting the packed_cas path.
Root cause
The test was added in #706 as a minimal reproducer for the CCCL __cuda_is_local / isspacep.local miscompile (originally rapidsai/cudf#18584, rapidsai/cudf#18587, NVIDIA/spark-rapids#12586). NVIDIA/cccl#4586 introduced _CCCL_ATOMIC_UNSAFE_AUTOMATIC_STORAGE to bypass the broken check, with this auto-define in libcudacxx/include/cuda/std/__internal/atomic.h:
// Enable bypassing automatic storage checks ... when using CTK 12.2 and below and if NDEBUG is defined.
#ifndef _CCCL_ATOMIC_UNSAFE_AUTOMATIC_STORAGE
# if _CCCL_CUDACC_BELOW(13, 1) && !defined(NDEBUG)
# define _CCCL_ATOMIC_UNSAFE_AUTOMATIC_STORAGE
# endif
#endif
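The comment says the bypass applies when NDEBUG is defined, but the condition tests !defined(NDEBUG), so with the current code the bypass fires only in debug builds. cuco CI builds with CMAKE_BUILD_TYPE=Release, which defines NDEBUG, so the bypass stays off and the miscompile shows up.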
Fix
Define the macro for cuco regardless of build mode, but gate it on the CTK version so it disappears automatically once the upstream nvcc bug is fixed in 13.1+. One option is to do this at the header level in include/cuco/detail/__config:
#if !defined(_CCCL_ATOMIC_UNSAFE_AUTOMATIC_STORAGE) \
&& defined(__CUDACC_VER_MAJOR__) \
&& ((__CUDACC_VER_MAJOR__ < 13) \
|| (__CUDACC_VER_MAJOR__ == 13 && __CUDACC_VER_MINOR__ < 1))
# define _CCCL_ATOMIC_UNSAFE_AUTOMATIC_STORAGE
#endif
Reproduction
./ci/build.sh -tests
ctest -R atomic_storage_test --output-on-failure # CTK < 13.1
Related