Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: Uninitialized __global__ memory read of size 2 bytes in cub::DeviceReduce #1889

Open
1 task done
Tracked by #1618
alliepiper opened this issue Jun 21, 2024 · 0 comments
Open
1 task done
Tracked by #1618
Labels
bug Something isn't working right.

Comments

@alliepiper
Copy link
Collaborator

Is this a duplicate?

Type of Bug

Silent Failure

Component

CUB

Describe the bug

One instance: https://github.com/NVIDIA/cccl/actions/runs/9606424817/job/26497169798?pr=1879

-- >> Running:
	compute-sanitizer --tool initcheck --check-device-heap yes --leak-check full --padding 512 --track-stream-ordered-races all --check-warpgroup-mma yes --require-cuda-init no --check-exit-code yes --error-exitcode 1 --nvtx true /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_adjacent_difference_substract_left.lid_0
========= COMPUTE-SANITIZER
========= Uninitialized __global__ memory read of size 2 bytes
=========     at std::iterator_traits<T2>::value_type cub::CUB_200500_600_700_800_NS::ThreadLoad<(cub::CUB_200500_600_700_800_NS::CacheLoadModifier)5, const unsigned long long *>(T2)+0x1fd0 in /home/coder/cccl/cub/cub/thread/thread_load.cuh:263
=========     by thread (32,0,0) in block (0,0,0)
=========     Address 0x7fc918d21b00
=========     Device Frame:void cub::CUB_200500_600_700_800_NS::IterateThreadLoad<(int)0, (int)2>::Load<(cub::CUB_200500_600_700_800_NS::CacheLoadModifier)5, unsigned long long>(const T2 *, T2 *)+0x1fd0 in /home/coder/cccl/cub/cub/thread/thread_load.cuh:122
=========     Device Frame:T1 cub::CUB_200500_600_700_800_NS::ThreadLoad<thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, (int)5>(const T1 *, cub::CUB_200500_600_700_800_NS::Int2Type<T2>, cub::CUB_200500_600_700_800_NS::Int2Type<(int)1>)+0x1fd0 in /home/coder/cccl/cub/cub/thread/thread_load.cuh:341
=========     Device Frame:std::iterator_traits<T2>::value_type cub::CUB_200500_600_700_800_NS::ThreadLoad<(cub::CUB_200500_600_700_800_NS::CacheLoadModifier)5, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> *>(T2)+0x1fd0 in /home/coder/cccl/cub/cub/thread/thread_load.cuh:354
=========     Device Frame:thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> cub::CUB_200500_600_700_800_NS::CacheModifiedInputIterator<(cub::CUB_200500_600_700_800_NS::CacheLoadModifier)5, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, int>::operator []<int>(T1) const+0x1fd0 in /home/coder/cccl/cub/cub/iterator/cache_modified_input_iterator.cuh:217
=========     Device Frame:void cub::CUB_200500_600_700_800_NS::AgentReduce<cub::CUB_200500_600_700_800_NS::AgentReducePolicy<(int)256, (int)16, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, (int)4, (cub::CUB_200500_600_700_800_NS::BlockReduceAlgorithm)2, (cub::CUB_200500_600_700_800_NS::CacheLoadModifier)5, cub::CUB_200500_600_700_800_NS::MemBoundScaling<(int)256, (int)16, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>>>, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> *, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> *, int, thrust::THRUST_200500_600_700_800_NS::cuda_cub::__find_if::functor<thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>>, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, cuda::std::__4::__identity>::ConsumeTile<(int)1, (int)0>(thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> &, int, int, cub::CUB_200500_600_700_800_NS::Int2Type<(int)0>, cub::CUB_200500_600_700_800_NS::Int2Type<T2>)+0x1fb0 in /home/coder/cccl/cub/cub/agent/agent_reduce.cuh:329
=========     Device Frame:thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> cub::CUB_200500_600_700_800_NS::AgentReduce<cub::CUB_200500_600_700_800_NS::AgentReducePolicy<(int)256, (int)16, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, (int)4, (cub::CUB_200500_600_700_800_NS::BlockReduceAlgorithm)2, (cub::CUB_200500_600_700_800_NS::CacheLoadModifier)5, cub::CUB_200500_600_700_800_NS::MemBoundScaling<(int)256, (int)16, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>>>, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> *, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> *, int, thrust::THRUST_200500_600_700_800_NS::cuda_cub::__find_if::functor<thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>>, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, cuda::std::__4::__identity>::ConsumeRange<(int)0>(cub::CUB_200500_600_700_800_NS::GridEvenShare<int> &, cub::CUB_200500_600_700_800_NS::Int2Type<T1>)+0x1f40 in /home/coder/cccl/cub/cub/agent/agent_reduce.cuh:362
=========     Device Frame:cub::CUB_200500_600_700_800_NS::AgentReduce<cub::CUB_200500_600_700_800_NS::AgentReducePolicy<(int)256, (int)16, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, (int)4, (cub::CUB_200500_600_700_800_NS::BlockReduceAlgorithm)2, (cub::CUB_200500_600_700_800_NS::CacheLoadModifier)5, cub::CUB_200500_600_700_800_NS::MemBoundScaling<(int)256, (int)16, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>>>, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> *, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> *, int, thrust::THRUST_200500_600_700_800_NS::cuda_cub::__find_if::functor<thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>>, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, cuda::std::__4::__identity>::ConsumeRange(int, int)+0x40 in /home/coder/cccl/cub/cub/agent/agent_reduce.cuh:386
=========     Device Frame:void cub::CUB_200500_600_700_800_NS::DeviceReduceSingleTileKernel<cub::CUB_200500_600_700_800_NS::DeviceReducePolicy<thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, unsigned int, thrust::THRUST_200500_600_700_800_NS::cuda_cub::__find_if::functor<thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>>>::Policy600, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> *, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> *, int, thrust::THRUST_200500_600_700_800_NS::cuda_cub::__find_if::functor<thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>>, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, cuda::std::__4::__identity>(T2, T3, T4, T5, T6, T8)+0x40 in /home/coder/cccl/cub/cub/device/dispatch/dispatch_reduce.cuh:287
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame: [0x2ef36f]
=========                in /usr/lib/x86_64-linux-gnu/libcuda.so.1
=========     Host Frame:libcudart_static_4d8b33a106dceb3c07a56e26de61f2d53bb62a68 [0x12810d]
=========                in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_adjacent_difference_substract_left.lid_0
=========     Host Frame:cudaLaunchKernel [0x18b84d]
=========                in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_adjacent_difference_substract_left.lid_0
=========     Host Frame:void cub::CUB_200500_600_700_800_NS::DeviceReduceSingleTileKernel<cub::CUB_200500_600_700_800_NS::DeviceReducePolicy<thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, unsigned int, thrust::THRUST_200500_600_700_800_NS::cuda_cub::__find_if::functor<thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> > >::Policy600, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>*, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>*, int, thrust::THRUST_200500_600_700_800_NS::cuda_cub::__find_if::functor<thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> >, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, cuda::std::__4::__identity>(thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>*, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>*, int, thrust::THRUST_200500_600_700_800_NS::cuda_cub::__find_if::functor<thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> >, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, cuda::std::__4::__identity) [0xc787c]
=========                in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_adjacent_difference_substract_left.lid_0
=========     Host Frame:cudaError thrust::THRUST_200500_600_700_800_NS::cuda_cub::launcher::triple_chevron::doit_host<void (*)(thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>*, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>*, int, thrust::THRUST_200500_600_700_800_NS::cuda_cub::__find_if::functor<thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> >, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, cuda::std::__4::__identity), thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>*, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>*, int, thrust::THRUST_200500_600_700_800_NS::cuda_cub::__find_if::functor<thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> >, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, cuda::std::__4::__identity>(void (*)(thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>*, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>*, int, thrust::THRUST_200500_600_700_800_NS::cuda_cub::__find_if::functor<thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> >, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, cuda::std::__4::__identity), thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>* const&, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>* const&, int const&, thrust::THRUST_200500_600_700_800_NS::cuda_cub::__find_if::functor<thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> > const&, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> const&, cuda::std::__4::__identity const&) const [clone .isra.0] [0x9957a]
=========                in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_adjacent_difference_substract_left.lid_0
=========     Host Frame:cudaError cub::CUB_200500_600_700_800_NS::DeviceReduce::Reduce<thrust::THRUST_200500_600_700_800_NS::zip_iterator<thrust::THRUST_200500_600_700_800_NS::tuple<thrust::THRUST_200500_600_700_800_NS::cuda_cub::transform_input_iterator_t<bool, thrust::THRUST_200500_600_700_800_NS::cuda_cub::transform_pair_of_input_iterators_t<bool, thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::device_ptr<unsigned char const> >, thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::device_ptr<unsigned char const> >, thrust::THRUST_200500_600_700_800_NS::equal_to<unsigned char> >, thrust::THRUST_200500_600_700_800_NS::detail::unary_negate<thrust::THRUST_200500_600_700_800_NS::cuda_cub::identity> >, thrust::THRUST_200500_600_700_800_NS::cuda_cub::counting_iterator_t<long> > >, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>*, thrust::THRUST_200500_600_700_800_NS::cuda_cub::__find_if::functor<thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> >, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, int>(void*, unsigned long&, thrust::THRUST_200500_600_700_800_NS::zip_iterator<thrust::THRUST_200500_600_700_800_NS::tuple<thrust::THRUST_200500_600_700_800_NS::cuda_cub::transform_input_iterator_t<bool, thrust::THRUST_200500_600_700_800_NS::cuda_cub::transform_pair_of_input_iterators_t<bool, thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::device_ptr<unsigned char const> >, thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::device_ptr<unsigned char const> >, thrust::THRUST_200500_600_700_800_NS::equal_to<unsigned char> >, thrust::THRUST_200500_600_700_800_NS::detail::unary_negate<thrust::THRUST_200500_600_700_800_NS::cuda_cub::identity> >, thrust::THRUST_200500_600_700_800_NS::cuda_cub::counting_iterator_t<long> > >, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>*, int, thrust::THRUST_200500_600_700_800_NS::cuda_cub::__find_if::functor<thrust::THRUST_200500_600_700_800_NS::tuple<bool, long> >, thrust::THRUST_200500_600_700_800_NS::tuple<bool, long>, CUstream_st*) [clone .isra.0] [0xa3347]
=========                in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_adjacent_difference_substract_left.lid_0
=========     Host Frame:void C_A_T_C_H_T_E_M_P_L_A_T_E_T_E_S_T_F_U_N_C_6<metal::list<unsigned char> >() [0xb609b]
=========                in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_adjacent_difference_substract_left.lid_0
=========     Host Frame:Catch::RunContext::invokeActiveTestCase() [0x3f642]
=========                in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_adjacent_difference_substract_left.lid_0
=========     Host Frame:Catch::RunContext::runCurrentTest(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) [0x57aa7]
=========                in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_adjacent_difference_substract_left.lid_0
=========     Host Frame:Catch::RunContext::runTest(Catch::TestCase const&) [0x6078a]
=========                in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_adjacent_difference_substract_left.lid_0
=========     Host Frame:Catch::Session::runInternal() [0x6ac0f]
=========                in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_adjacent_difference_substract_left.lid_0
=========     Host Frame:Catch::Session::run() [0x6b1fd]
=========                in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_adjacent_difference_substract_left.lid_0
=========     Host Frame:main [0x2f8c5]
=========                in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_adjacent_difference_substract_left.lid_0
=========     Host Frame: [0x23a8f]
=========                in /usr/lib/x86_64-linux-gnu/libc.so.6
=========     Host Frame:__libc_start_main [0x23b48]
=========                in /usr/lib/x86_64-linux-gnu/libc.so.6
=========     Host Frame:_start [0x36394]
=========                in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_adjacent_difference_substract_left.lid_0

How to Reproduce

compute-sanitizer --tool initcheck --check-device-heap yes --leak-check full --padding 512 --track-stream-ordered-races all --check-warpgroup-mma yes --require-cuda-init no --check-exit-code yes --error-exitcode 1 --nvtx true /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_adjacent_difference_substract_left.lid_0

Expected behavior

Uninitialized memory is not read.

Reproduction link

No response

Operating System

No response

nvidia-smi output

No response

NVCC version

No response

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working right.
Projects
Status: Todo
Development

No branches or pull requests

1 participant