We read every piece of feedback, and take your input very seriously.
To see all available qualifiers, see our documentation.
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Silent Failure
CUB
One instance: https://github.com/NVIDIA/cccl/actions/runs/9606424817/job/26497169798?pr=1879
-- >> Running: compute-sanitizer --tool initcheck --check-device-heap yes --leak-check full --padding 512 --track-stream-ordered-races all --check-warpgroup-mma yes --require-cuda-init no --check-exit-code yes --error-exitcode 1 --nvtx true /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_adjacent_difference_substract_left.lid_0 ========= COMPUTE-SANITIZER ========= Uninitialized __global__ memory read of size 2 bytes ========= at std::iterator_traits<T2>::value_type cub::CUB_200500_600_700_800_NS::ThreadLoad<(cub::CUB_200500_600_700_800_NS::CacheLoadModifier)5, const unsigned long long *>(T2)+0x1fd0 in /home/coder/cccl/cub/cub/thread/thread_load.cuh:263 ========= by thread (32,0,0) in block (0,0,0) ========= Address 0x7fc918d21b00 ========= Device Frame:void cub::CUB_200500_600_700_800_NS::IterateThreadLoad<(int)0, (int)2>::Load<(cub::CUB_200500_600_700_800_NS::CacheLoadModifier)5, unsigned long long>(const T2 *, T2 *)+0x1fd0 in /home/coder/cccl/cub/cub/thread/thread_load.cuh:122 ========= Device Frame:T1 cub::CUB_200500_600_700_800_NS::ThreadLoad<thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, (int)5>(const T1 *, cub::CUB_200500_600_700_800_NS::Int2Type<T2>, cub::CUB_200500_600_700_800_NS::Int2Type<(int)1>)+0x1fd0 in /home/coder/cccl/cub/cub/thread/thread_load.cuh:341 ========= Device Frame:std::iterator_traits<T2>::value_type cub::CUB_200500_600_700_800_NS::ThreadLoad<(cub::CUB_200500_600_700_800_NS::CacheLoadModifier)5, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> *>(T2)+0x1fd0 in /home/coder/cccl/cub/cub/thread/thread_load.cuh:354 ========= Device Frame:thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> cub::CUB_200500_600_700_800_NS::CacheModifiedInputIterator<(cub::CUB_200500_600_700_800_NS::CacheLoadModifier)5, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, int>::operator []<int>(T1) const+0x1fd0 in /home/coder/cccl/cub/cub/iterator/cache_modified_input_iterator.cuh:217 ========= Device Frame:void cub::CUB_200500_600_700_800_NS::AgentReduce<cub::CUB_200500_600_700_800_NS::AgentReducePolicy<(int)256, (int)16, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, (int)4, (cub::CUB_200500_600_700_800_NS::BlockReduceAlgorithm)2, (cub::CUB_200500_600_700_800_NS::CacheLoadModifier)5, cub::CUB_200500_600_700_800_NS::MemBoundScaling<(int)256, (int)16, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>>>, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> *, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> *, int, thrust::THRUST_200500_600_700_800_NS::cuda_cub::__find_if::functor<thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>>, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, cuda::std::__4::__identity>::ConsumeTile<(int)1, (int)0>(thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> &, int, int, cub::CUB_200500_600_700_800_NS::Int2Type<(int)0>, cub::CUB_200500_600_700_800_NS::Int2Type<T2>)+0x1fb0 in /home/coder/cccl/cub/cub/agent/agent_reduce.cuh:329 ========= Device Frame:thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> cub::CUB_200500_600_700_800_NS::AgentReduce<cub::CUB_200500_600_700_800_NS::AgentReducePolicy<(int)256, (int)16, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, (int)4, (cub::CUB_200500_600_700_800_NS::BlockReduceAlgorithm)2, (cub::CUB_200500_600_700_800_NS::CacheLoadModifier)5, cub::CUB_200500_600_700_800_NS::MemBoundScaling<(int)256, (int)16, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>>>, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> *, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> *, int, thrust::THRUST_200500_600_700_800_NS::cuda_cub::__find_if::functor<thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>>, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, cuda::std::__4::__identity>::ConsumeRange<(int)0>(cub::CUB_200500_600_700_800_NS::GridEvenShare<int> &, cub::CUB_200500_600_700_800_NS::Int2Type<T1>)+0x1f40 in /home/coder/cccl/cub/cub/agent/agent_reduce.cuh:362 ========= Device Frame:cub::CUB_200500_600_700_800_NS::AgentReduce<cub::CUB_200500_600_700_800_NS::AgentReducePolicy<(int)256, (int)16, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, (int)4, (cub::CUB_200500_600_700_800_NS::BlockReduceAlgorithm)2, (cub::CUB_200500_600_700_800_NS::CacheLoadModifier)5, cub::CUB_200500_600_700_800_NS::MemBoundScaling<(int)256, (int)16, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>>>, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> *, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> *, int, thrust::THRUST_200500_600_700_800_NS::cuda_cub::__find_if::functor<thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>>, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, cuda::std::__4::__identity>::ConsumeRange(int, int)+0x40 in /home/coder/cccl/cub/cub/agent/agent_reduce.cuh:386 ========= Device Frame:void cub::CUB_200500_600_700_800_NS::DeviceReduceSingleTileKernel<cub::CUB_200500_600_700_800_NS::DeviceReducePolicy<thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, unsigned int, thrust::THRUST_200500_600_700_800_NS::cuda_cub::__find_if::functor<thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>>>::Policy600, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> *, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> *, int, thrust::THRUST_200500_600_700_800_NS::cuda_cub::__find_if::functor<thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>>, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, cuda::std::__4::__identity>(T2, T3, T4, T5, T6, T8)+0x40 in /home/coder/cccl/cub/cub/device/dispatch/dispatch_reduce.cuh:287 ========= Saved host backtrace up to driver entry point at kernel launch time ========= Host Frame: [0x2ef36f] ========= in /usr/lib/x86_64-linux-gnu/libcuda.so.1 ========= Host Frame:libcudart_static_4d8b33a106dceb3c07a56e26de61f2d53bb62a68 [0x12810d] ========= in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_adjacent_difference_substract_left.lid_0 ========= Host Frame:cudaLaunchKernel [0x18b84d] ========= in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_adjacent_difference_substract_left.lid_0 ========= Host Frame:void cub::CUB_200500_600_700_800_NS::DeviceReduceSingleTileKernel<cub::CUB_200500_600_700_800_NS::DeviceReducePolicy<thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, unsigned int, thrust::THRUST_200500_600_700_800_NS::cuda_cub::__find_if::functor<thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> > >::Policy600, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>*, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>*, int, thrust::THRUST_200500_600_700_800_NS::cuda_cub::__find_if::functor<thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> >, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, cuda::std::__4::__identity>(thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>*, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>*, int, thrust::THRUST_200500_600_700_800_NS::cuda_cub::__find_if::functor<thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> >, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, cuda::std::__4::__identity) [0xc787c] ========= in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_adjacent_difference_substract_left.lid_0 ========= Host Frame:cudaError thrust::THRUST_200500_600_700_800_NS::cuda_cub::launcher::triple_chevron::doit_host<void (*)(thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>*, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>*, int, thrust::THRUST_200500_600_700_800_NS::cuda_cub::__find_if::functor<thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> >, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, cuda::std::__4::__identity), thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>*, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>*, int, thrust::THRUST_200500_600_700_800_NS::cuda_cub::__find_if::functor<thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> >, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, cuda::std::__4::__identity>(void (*)(thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>*, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>*, int, thrust::THRUST_200500_600_700_800_NS::cuda_cub::__find_if::functor<thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> >, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, cuda::std::__4::__identity), thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>* const&, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>* const&, int const&, thrust::THRUST_200500_600_700_800_NS::cuda_cub::__find_if::functor<thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> > const&, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> const&, cuda::std::__4::__identity const&) const [clone .isra.0] [0x9957a] ========= in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_adjacent_difference_substract_left.lid_0 ========= Host Frame:cudaError cub::CUB_200500_600_700_800_NS::DeviceReduce::Reduce<thrust::THRUST_200500_600_700_800_NS::zip_iterator<thrust::THRUST_200500_600_700_800_NS::tuple<thrust::THRUST_200500_600_700_800_NS::cuda_cub::transform_input_iterator_t<bool, thrust::THRUST_200500_600_700_800_NS::cuda_cub::transform_pair_of_input_iterators_t<bool, thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::device_ptr<unsigned char const> >, thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::device_ptr<unsigned char const> >, thrust::THRUST_200500_600_700_800_NS::equal_to<unsigned char> >, thrust::THRUST_200500_600_700_800_NS::detail::unary_negate<thrust::THRUST_200500_600_700_800_NS::cuda_cub::identity> >, thrust::THRUST_200500_600_700_800_NS::cuda_cub::counting_iterator_t<long> > >, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>*, thrust::THRUST_200500_600_700_800_NS::cuda_cub::__find_if::functor<thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> >, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, int>(void*, unsigned long&, thrust::THRUST_200500_600_700_800_NS::zip_iterator<thrust::THRUST_200500_600_700_800_NS::tuple<thrust::THRUST_200500_600_700_800_NS::cuda_cub::transform_input_iterator_t<bool, thrust::THRUST_200500_600_700_800_NS::cuda_cub::transform_pair_of_input_iterators_t<bool, thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::device_ptr<unsigned char const> >, thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::device_ptr<unsigned char const> >, thrust::THRUST_200500_600_700_800_NS::equal_to<unsigned char> >, thrust::THRUST_200500_600_700_800_NS::detail::unary_negate<thrust::THRUST_200500_600_700_800_NS::cuda_cub::identity> >, thrust::THRUST_200500_600_700_800_NS::cuda_cub::counting_iterator_t<long> > >, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>*, int, thrust::THRUST_200500_600_700_800_NS::cuda_cub::__find_if::functor<thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> >, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, CUstream_st*) [clone .isra.0] [0xa3347] ========= in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_adjacent_difference_substract_left.lid_0 ========= Host Frame:void C_A_T_C_H_T_E_M_P_L_A_T_E_T_E_S_T_F_U_N_C_6<metal::list<unsigned char> >() [0xb609b] ========= in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_adjacent_difference_substract_left.lid_0 ========= Host Frame:Catch::RunContext::invokeActiveTestCase() [0x3f642] ========= in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_adjacent_difference_substract_left.lid_0 ========= Host Frame:Catch::RunContext::runCurrentTest(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) [0x57aa7] ========= in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_adjacent_difference_substract_left.lid_0 ========= Host Frame:Catch::RunContext::runTest(Catch::TestCase const&) [0x6078a] ========= in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_adjacent_difference_substract_left.lid_0 ========= Host Frame:Catch::Session::runInternal() [0x6ac0f] ========= in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_adjacent_difference_substract_left.lid_0 ========= Host Frame:Catch::Session::run() [0x6b1fd] ========= in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_adjacent_difference_substract_left.lid_0 ========= Host Frame:main [0x2f8c5] ========= in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_adjacent_difference_substract_left.lid_0 ========= Host Frame: [0x23a8f] ========= in /usr/lib/x86_64-linux-gnu/libc.so.6 ========= Host Frame:__libc_start_main [0x23b48] ========= in /usr/lib/x86_64-linux-gnu/libc.so.6 ========= Host Frame:_start [0x36394] ========= in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_adjacent_difference_substract_left.lid_0
compute-sanitizer --tool initcheck --check-device-heap yes --leak-check full --padding 512 --track-stream-ordered-races all --check-warpgroup-mma yes --require-cuda-init no --check-exit-code yes --error-exitcode 1 --nvtx true /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_adjacent_difference_substract_left.lid_0
Uninitialized memory is not read.
No response
The text was updated successfully, but these errors were encountered:
compute-sanitizer
No branches or pull requests
Is this a duplicate?
Type of Bug
Silent Failure
Component
CUB
Describe the bug
One instance: https://github.com/NVIDIA/cccl/actions/runs/9606424817/job/26497169798?pr=1879
How to Reproduce
Expected behavior
Uninitialized memory is not read.
Reproduction link
No response
Operating System
No response
nvidia-smi output
No response
NVCC version
No response
The text was updated successfully, but these errors were encountered: