[BUG]: Uninitialized __global__ memory read of size 8 bytes in CUB sorting tests #1891

Open
1 task done
Tracked by #1618
alliepiper opened this issue Jun 21, 2024 · 1 comment
Labels
bug Something isn't working right.

Comments

@alliepiper
Collaborator

alliepiper commented Jun 21, 2024

Is this a duplicate?

Type of Bug

Silent Failure

Component

CUB

Describe the bug

The compute-sanitizer initcheck diagnostic below appears in several tests for different CUB sorting algorithms:

  • radix sort
  • segmented radix sort
  • merge sort
  • thread sort

https://github.com/NVIDIA/cccl/actions/runs/9606424817/job/26497169798?pr=1879

	compute-sanitizer --tool initcheck --check-device-heap yes --leak-check full --padding 512 --track-stream-ordered-races all --check-warpgroup-mma yes --require-cuda-init no --check-exit-code yes --error-exitcode 1 --nvtx true /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_merge_sort.lid_0
========= COMPUTE-SANITIZER
========= Uninitialized __global__ memory read of size 8 bytes
=========     at std::iterator_traits<T2>::value_type cub::CUB_200500_600_700_800_NS::ThreadLoad<(cub::CUB_200500_600_700_800_NS::CacheLoadModifier)2, const unsigned long long *>(T2)+0x8a70 in /home/coder/cccl/cub/cub/thread/thread_load.cuh:260
=========     by thread (16,0,0) in block (80,0,0)
=========     Address 0x7f615d40daf0
=========     Device Frame:void cub::CUB_200500_600_700_800_NS::IterateThreadLoad<(int)0, (int)2>::Load<(cub::CUB_200500_600_700_800_NS::CacheLoadModifier)2, unsigned long long>(const T2 *, T2 *)+0x8a70 in /home/coder/cccl/cub/cub/thread/thread_load.cuh:122
=========     Device Frame:T1 cub::CUB_200500_600_700_800_NS::ThreadLoad<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, (int)2>(const T1 *, cub::CUB_200500_600_700_800_NS::Int2Type<T2>, cub::CUB_200500_600_700_800_NS::Int2Type<(int)1>)+0x8a70 in /home/coder/cccl/cub/cub/thread/thread_load.cuh:341
=========     Device Frame:std::iterator_traits<T2>::value_type cub::CUB_200500_600_700_800_NS::ThreadLoad<(cub::CUB_200500_600_700_800_NS::CacheLoadModifier)2, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple *>(T2)+0x8a50 in /home/coder/cccl/cub/cub/thread/thread_load.cuh:354
=========     Device Frame:void cub::CUB_200500_600_700_800_NS::ScanTileState<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, (bool)0>::WaitForValid<cub::CUB_200500_600_700_800_NS::detail::no_delay_constructor_t<(unsigned int)450>::delay_t>(int, unsigned int &, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple &, T1)+0x8a50 in /home/coder/cccl/cub/cub/agent/single_pass_scan_operators.cuh:830
=========     Device Frame:void cub::CUB_200500_600_700_800_NS::TilePrefixCallbackOp<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_scan_op, cub::CUB_200500_600_700_800_NS::ScanTileState<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, (bool)0>, (int)0, cub::CUB_200500_600_700_800_NS::detail::no_delay_constructor_t<(unsigned int)450>>::ProcessWindow<cub::CUB_200500_600_700_800_NS::detail::no_delay_constructor_t<(unsigned int)450>::delay_t>(int, unsigned int &, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple &, T1)+0x8920 in /home/coder/cccl/cub/cub/agent/single_pass_scan_operators.cuh:1145
=========     Device Frame:cub::CUB_200500_600_700_800_NS::TilePrefixCallbackOp<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_scan_op, cub::CUB_200500_600_700_800_NS::ScanTileState<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, (bool)0>, (int)0, cub::CUB_200500_600_700_800_NS::detail::no_delay_constructor_t<(unsigned int)450>>::operator ()(thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple)+0x8920 in /home/coder/cccl/cub/cub/agent/single_pass_scan_operators.cuh:1183
=========     Device Frame:void cub::CUB_200500_600_700_800_NS::BlockScanWarpScans<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, (int)128, (int)1, (int)1, (int)0>::ExclusiveScan<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_scan_op, cub::CUB_200500_600_700_800_NS::TilePrefixCallbackOp<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_scan_op, cub::CUB_200500_600_700_800_NS::ScanTileState<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, (bool)0>, (int)0, cub::CUB_200500_600_700_800_NS::detail::no_delay_constructor_t<(unsigned int)450>>>(thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple &, T1, T2 &)+0x3580 in /home/coder/cccl/cub/cub/block/specializations/block_scan_warp_scans.cuh:415
=========     Device Frame:void cub::CUB_200500_600_700_800_NS::BlockScan<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, (int)128, (cub::CUB_200500_600_700_800_NS::BlockScanAlgorithm)2, (int)1, (int)1, (int)0>::ExclusiveScan<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_scan_op, cub::CUB_200500_600_700_800_NS::TilePrefixCallbackOp<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_scan_op, cub::CUB_200500_600_700_800_NS::ScanTileState<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, (bool)0>, (int)0, cub::CUB_200500_600_700_800_NS::detail::no_delay_constructor_t<(unsigned int)450>>>(thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple &, T1, T2 &)+0x3030 in /home/coder/cccl/cub/cub/block/block_scan.cuh:992
=========     Device Frame:void cub::CUB_200500_600_700_800_NS::BlockScan<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, (int)128, (cub::CUB_200500_600_700_800_NS::BlockScanAlgorithm)2, (int)1, (int)1, (int)0>::InclusiveScan<(int)3, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_scan_op, cub::CUB_200500_600_700_800_NS::TilePrefixCallbackOp<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_scan_op, cub::CUB_200500_600_700_800_NS::ScanTileState<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, (bool)0>, (int)0, cub::CUB_200500_600_700_800_NS::detail::no_delay_constructor_t<(unsigned int)450>>>(thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple (&)[T1], thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple (&)[T1], T2, T3 &)+0x3030 in /home/coder/cccl/cub/cub/block/block_scan.cuh:2576
=========     Device Frame:void cub::CUB_200500_600_700_800_NS::AgentScan<cub::CUB_200500_600_700_800_NS::AgentScanPolicy<(int)128, (int)15, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, (cub::CUB_200500_600_700_800_NS::BlockLoadAlgorithm)4, (cub::CUB_200500_600_700_800_NS::CacheLoadModifier)0, (cub::CUB_200500_600_700_800_NS::BlockStoreAlgorithm)4, (cub::CUB_200500_600_700_800_NS::BlockScanAlgorithm)2, cub::CUB_200500_600_700_800_NS::MemBoundScaling<(int)128, (int)15, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple>, cub::CUB_200500_600_700_800_NS::detail::no_delay_constructor_t<(unsigned int)450>>, thrust::THRUST_200500_600_700_800_NS::transform_iterator<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::construct_key_flag_op, thrust::THRUST_200500_600_700_800_NS::counting_iterator<unsigned long, thrust::THRUST_200500_600_700_800_NS::use_default, thrust::THRUST_200500_600_700_800_NS::use_default, thrust::THRUST_200500_600_700_800_NS::use_default>, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, thrust::THRUST_200500_600_700_800_NS::use_default>, thrust::THRUST_200500_600_700_800_NS::transform_output_iterator<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::write_output_op<thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::pointer<int, thrust::THRUST_200500_600_700_800_NS::detail::execute_with_allocator<c2h::checked_cuda_allocator<char>, thrust::THRUST_200500_600_700_800_NS::cuda_cub::execute_on_stream_base>, thrust::THRUST_200500_600_700_800_NS::use_default, thrust::THRUST_200500_600_700_800_NS::use_default>>, thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::device_ptr<int>>>, thrust::THRUST_200500_600_700_800_NS::discard_iterator<unsigned long>>, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_scan_op, 
cub::CUB_200500_600_700_800_NS::NullType, int, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple>::ScanTile<cub::CUB_200500_600_700_800_NS::TilePrefixCallbackOp<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_scan_op, cub::CUB_200500_600_700_800_NS::ScanTileState<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, (bool)0>, (int)0, cub::CUB_200500_600_700_800_NS::detail::no_delay_constructor_t<(unsigned int)450>>>(thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple (&)[3], thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_scan_op, T1 &, cub::CUB_200500_600_700_800_NS::Int2Type<(int)1>)+0x3030 in /home/coder/cccl/cub/cub/agent/agent_scan.cuh:280
=========     Device Frame:void cub::CUB_200500_600_700_800_NS::AgentScan<cub::CUB_200500_600_700_800_NS::AgentScanPolicy<(int)128, (int)15, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, (cub::CUB_200500_600_700_800_NS::BlockLoadAlgorithm)4, (cub::CUB_200500_600_700_800_NS::CacheLoadModifier)0, (cub::CUB_200500_600_700_800_NS::BlockStoreAlgorithm)4, (cub::CUB_200500_600_700_800_NS::BlockScanAlgorithm)2, cub::CUB_200500_600_700_800_NS::MemBoundScaling<(int)128, (int)15, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple>, cub::CUB_200500_600_700_800_NS::detail::no_delay_constructor_t<(unsigned int)450>>, thrust::THRUST_200500_600_700_800_NS::transform_iterator<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::construct_key_flag_op, thrust::THRUST_200500_600_700_800_NS::counting_iterator<unsigned long, thrust::THRUST_200500_600_700_800_NS::use_default, thrust::THRUST_200500_600_700_800_NS::use_default, thrust::THRUST_200500_600_700_800_NS::use_default>, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, thrust::THRUST_200500_600_700_800_NS::use_default>, thrust::THRUST_200500_600_700_800_NS::transform_output_iterator<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::write_output_op<thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::pointer<int, thrust::THRUST_200500_600_700_800_NS::detail::execute_with_allocator<c2h::checked_cuda_allocator<char>, thrust::THRUST_200500_600_700_800_NS::cuda_cub::execute_on_stream_base>, thrust::THRUST_200500_600_700_800_NS::use_default, thrust::THRUST_200500_600_700_800_NS::use_default>>, thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::device_ptr<int>>>, thrust::THRUST_200500_600_700_800_NS::discard_iterator<unsigned long>>, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_scan_op, 
cub::CUB_200500_600_700_800_NS::NullType, int, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple>::ConsumeTile<(bool)1>(int, int, int, cub::CUB_200500_600_700_800_NS::ScanTileState<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, (bool)0> &)+0x3030 in /home/coder/cccl/cub/cub/agent/agent_scan.cuh:369
=========     Device Frame:cub::CUB_200500_600_700_800_NS::AgentScan<cub::CUB_200500_600_700_800_NS::AgentScanPolicy<(int)128, (int)15, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, (cub::CUB_200500_600_700_800_NS::BlockLoadAlgorithm)4, (cub::CUB_200500_600_700_800_NS::CacheLoadModifier)0, (cub::CUB_200500_600_700_800_NS::BlockStoreAlgorithm)4, (cub::CUB_200500_600_700_800_NS::BlockScanAlgorithm)2, cub::CUB_200500_600_700_800_NS::MemBoundScaling<(int)128, (int)15, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple>, cub::CUB_200500_600_700_800_NS::detail::no_delay_constructor_t<(unsigned int)450>>, thrust::THRUST_200500_600_700_800_NS::transform_iterator<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::construct_key_flag_op, thrust::THRUST_200500_600_700_800_NS::counting_iterator<unsigned long, thrust::THRUST_200500_600_700_800_NS::use_default, thrust::THRUST_200500_600_700_800_NS::use_default, thrust::THRUST_200500_600_700_800_NS::use_default>, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, thrust::THRUST_200500_600_700_800_NS::use_default>, thrust::THRUST_200500_600_700_800_NS::transform_output_iterator<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::write_output_op<thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::pointer<int, thrust::THRUST_200500_600_700_800_NS::detail::execute_with_allocator<c2h::checked_cuda_allocator<char>, thrust::THRUST_200500_600_700_800_NS::cuda_cub::execute_on_stream_base>, thrust::THRUST_200500_600_700_800_NS::use_default, thrust::THRUST_200500_600_700_800_NS::use_default>>, thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::device_ptr<int>>>, thrust::THRUST_200500_600_700_800_NS::discard_iterator<unsigned long>>, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_scan_op, 
cub::CUB_200500_600_700_800_NS::NullType, int, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple>::ConsumeRange(int, cub::CUB_200500_600_700_800_NS::ScanTileState<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, (bool)0> &, int)+0xa0 in /home/coder/cccl/cub/cub/agent/agent_scan.cuh:419
=========     Device Frame:void cub::CUB_200500_600_700_800_NS::DeviceScanKernel<cub::CUB_200500_600_700_800_NS::DeviceScanPolicy<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_scan_op>::Policy900, thrust::THRUST_200500_600_700_800_NS::transform_iterator<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::construct_key_flag_op, thrust::THRUST_200500_600_700_800_NS::counting_iterator<unsigned long, thrust::THRUST_200500_600_700_800_NS::use_default, thrust::THRUST_200500_600_700_800_NS::use_default, thrust::THRUST_200500_600_700_800_NS::use_default>, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, thrust::THRUST_200500_600_700_800_NS::use_default>, thrust::THRUST_200500_600_700_800_NS::transform_output_iterator<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::write_output_op<thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::pointer<int, thrust::THRUST_200500_600_700_800_NS::detail::execute_with_allocator<c2h::checked_cuda_allocator<char>, thrust::THRUST_200500_600_700_800_NS::cuda_cub::execute_on_stream_base>, thrust::THRUST_200500_600_700_800_NS::use_default, thrust::THRUST_200500_600_700_800_NS::use_default>>, thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::device_ptr<int>>>, thrust::THRUST_200500_600_700_800_NS::discard_iterator<unsigned long>>, cub::CUB_200500_600_700_800_NS::ScanTileState<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, (bool)0>, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_scan_op, cub::CUB_200500_600_700_800_NS::NullType, int, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple>(T2, T3, T4, int, T5, T6, T7)+0x20 in /home/coder/cccl/cub/cub/device/dispatch/dispatch_scan.cuh:197
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame: [0x2ef36f]
=========                in /usr/lib/x86_64-linux-gnu/libcuda.so.1
=========     Host Frame:libcudart_static_4d8b33a106dceb3c07a56e26de61f2d53bb62a68 [0x1b8a7d]
=========                in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_merge_sort.lid_0
=========     Host Frame:cudaLaunchKernel [0x21c1bd]
=========                in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_merge_sort.lid_0
=========     Host Frame:void cub::CUB_200500_600_700_800_NS::DeviceScanKernel<cub::CUB_200500_600_700_800_NS::DeviceScanPolicy<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_scan_op>::Policy900, thrust::THRUST_200500_600_700_800_NS::transform_iterator<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::construct_key_flag_op, thrust::THRUST_200500_600_700_800_NS::counting_iterator<unsigned long, thrust::THRUST_200500_600_700_800_NS::use_default, thrust::THRUST_200500_600_700_800_NS::use_default, thrust::THRUST_200500_600_700_800_NS::use_default>, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, thrust::THRUST_200500_600_700_800_NS::use_default>, thrust::THRUST_200500_600_700_800_NS::transform_output_iterator<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::write_output_op<thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::pointer<int, thrust::THRUST_200500_600_700_800_NS::detail::execute_with_allocator<c2h::checked_cuda_allocator<char>, thrust::THRUST_200500_600_700_800_NS::cuda_cub::execute_on_stream_base>, thrust::THRUST_200500_600_700_800_NS::use_default, thrust::THRUST_200500_600_700_800_NS::use_default> >, thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::device_ptr<int> > >, thrust::THRUST_200500_600_700_800_NS::discard_iterator<unsigned long> >, cub::CUB_200500_600_700_800_NS::ScanTileState<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, false>, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_scan_op, cub::CUB_200500_600_700_800_NS::NullType, int, 
thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple>(thrust::THRUST_200500_600_700_800_NS::transform_iterator<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::construct_key_flag_op, thrust::THRUST_200500_600_700_800_NS::counting_iterator<unsigned long, thrust::THRUST_200500_600_700_800_NS::use_default, thrust::THRUST_200500_600_700_800_NS::use_default, thrust::THRUST_200500_600_700_800_NS::use_default>, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, thrust::THRUST_200500_600_700_800_NS::use_default>, thrust::THRUST_200500_600_700_800_NS::transform_output_iterator<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::write_output_op<thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::pointer<int, thrust::THRUST_200500_600_700_800_NS::detail::execute_with_allocator<c2h::checked_cuda_allocator<char>, thrust::THRUST_200500_600_700_800_NS::cuda_cub::execute_on_stream_base>, thrust::THRUST_200500_600_700_800_NS::use_default, thrust::THRUST_200500_600_700_800_NS::use_default> >, thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::device_ptr<int> > >, thrust::THRUST_200500_600_700_800_NS::discard_iterator<unsigned long> >, cub::CUB_200500_600_700_800_NS::ScanTileState<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, false>, int, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_scan_op, cub::CUB_200500_600_700_800_NS::NullType, int) [0x1246bf]
=========                in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_merge_sort.lid_0
=========     Host Frame:cudaError thrust::THRUST_200500_600_700_800_NS::cuda_cub::launcher::triple_chevron::doit_host<void (*)(thrust::THRUST_200500_600_700_800_NS::transform_iterator<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::construct_key_flag_op, thrust::THRUST_200500_600_700_800_NS::counting_iterator<unsigned long, thrust::THRUST_200500_600_700_800_NS::use_default, thrust::THRUST_200500_600_700_800_NS::use_default, thrust::THRUST_200500_600_700_800_NS::use_default>, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, thrust::THRUST_200500_600_700_800_NS::use_default>, thrust::THRUST_200500_600_700_800_NS::transform_output_iterator<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::write_output_op<thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::pointer<int, thrust::THRUST_200500_600_700_800_NS::detail::execute_with_allocator<c2h::checked_cuda_allocator<char>, thrust::THRUST_200500_600_700_800_NS::cuda_cub::execute_on_stream_base>, thrust::THRUST_200500_600_700_800_NS::use_default, thrust::THRUST_200500_600_700_800_NS::use_default> >, thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::device_ptr<int> > >, thrust::THRUST_200500_600_700_800_NS::discard_iterator<unsigned long> >, cub::CUB_200500_600_700_800_NS::ScanTileState<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, false>, int, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_scan_op, cub::CUB_200500_600_700_800_NS::NullType, int), thrust::THRUST_200500_600_700_800_NS::transform_iterator<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::construct_key_flag_op, thrust::THRUST_200500_600_700_800_NS::counting_iterator<unsigned long, thrust::THRUST_200500_600_700_800_NS::use_default, thrust::THRUST_200500_600_700_800_NS::use_default, thrust::THRUST_200500_600_700_800_NS::use_default>, 
thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, thrust::THRUST_200500_600_700_800_NS::use_default>, thrust::THRUST_200500_600_700_800_NS::transform_output_iterator<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::write_output_op<thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::pointer<int, thrust::THRUST_200500_600_700_800_NS::detail::execute_with_allocator<c2h::checked_cuda_allocator<char>, thrust::THRUST_200500_600_700_800_NS::cuda_cub::execute_on_stream_base>, thrust::THRUST_200500_600_700_800_NS::use_default, thrust::THRUST_200500_600_700_800_NS::use_default> >, thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::device_ptr<int> > >, thrust::THRUST_200500_600_700_800_NS::discard_iterator<unsigned long> >, cub::CUB_200500_600_700_800_NS::ScanTileState<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, false>, int, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_scan_op, cub::CUB_200500_600_700_800_NS::NullType, int>(void (*)(thrust::THRUST_200500_600_700_800_NS::transform_iterator<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::construct_key_flag_op, thrust::THRUST_200500_600_700_800_NS::counting_iterator<unsigned long, thrust::THRUST_200500_600_700_800_NS::use_default, thrust::THRUST_200500_600_700_800_NS::use_default, thrust::THRUST_200500_600_700_800_NS::use_default>, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, thrust::THRUST_200500_600_700_800_NS::use_default>, thrust::THRUST_200500_600_700_800_NS::transform_output_iterator<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::write_output_op<thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::pointer<int, thrust::THRUST_200500_600_700_800_NS::detail::execute_with_allocator<c2h::checked_cuda_allocator<char>, 
thrust::THRUST_200500_600_700_800_NS::cuda_cub::execute_on_stream_base>, thrust::THRUST_200500_600_700_800_NS::use_default, thrust::THRUST_200500_600_700_800_NS::use_default> >, thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::device_ptr<int> > >, thrust::THRUST_200500_600_700_800_NS::discard_iterator<unsigned long> >, cub::CUB_200500_600_700_800_NS::ScanTileState<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, false>, int, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_scan_op, cub::CUB_200500_600_700_800_NS::NullType, int), thrust::THRUST_200500_600_700_800_NS::transform_iterator<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::construct_key_flag_op, thrust::THRUST_200500_600_700_800_NS::counting_iterator<unsigned long, thrust::THRUST_200500_600_700_800_NS::use_default, thrust::THRUST_200500_600_700_800_NS::use_default, thrust::THRUST_200500_600_700_800_NS::use_default>, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, thrust::THRUST_200500_600_700_800_NS::use_default> const&, thrust::THRUST_200500_600_700_800_NS::transform_output_iterator<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::write_output_op<thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::pointer<int, thrust::THRUST_200500_600_700_800_NS::detail::execute_with_allocator<c2h::checked_cuda_allocator<char>, thrust::THRUST_200500_600_700_800_NS::cuda_cub::execute_on_stream_base>, thrust::THRUST_200500_600_700_800_NS::use_default, thrust::THRUST_200500_600_700_800_NS::use_default> >, thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::device_ptr<int> > >, thrust::THRUST_200500_600_700_800_NS::discard_iterator<unsigned long> > const&, 
cub::CUB_200500_600_700_800_NS::ScanTileState<thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_tuple, false> const&, int const&, thrust::THRUST_200500_600_700_800_NS::system::detail::generic::key_flag_scan_op const&, cub::CUB_200500_600_700_800_NS::NullType const&, int const&) const [clone .isra.0] [0x12e3f4]
=========                in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_merge_sort.lid_0
=========     Host Frame:void thrust::THRUST_200500_600_700_800_NS::system::detail::generic::shuffle_copy<thrust::THRUST_200500_600_700_800_NS::detail::execute_with_allocator<c2h::checked_cuda_allocator<char>, thrust::THRUST_200500_600_700_800_NS::cuda_cub::execute_on_stream_base>, thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::pointer<int, thrust::THRUST_200500_600_700_800_NS::detail::execute_with_allocator<c2h::checked_cuda_allocator<char>, thrust::THRUST_200500_600_700_800_NS::cuda_cub::execute_on_stream_base>, thrust::THRUST_200500_600_700_800_NS::use_default, thrust::THRUST_200500_600_700_800_NS::use_default> >, thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::device_ptr<int> >, thrust::THRUST_200500_600_700_800_NS::random::linear_congruential_engine<unsigned int, 48271u, 0u, 2147483647u>&>(thrust::THRUST_200500_600_700_800_NS::execution_policy<thrust::THRUST_200500_600_700_800_NS::detail::execute_with_allocator<c2h::checked_cuda_allocator<char>, thrust::THRUST_200500_600_700_800_NS::cuda_cub::execute_on_stream_base> >&, thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::pointer<int, thrust::THRUST_200500_600_700_800_NS::detail::execute_with_allocator<c2h::checked_cuda_allocator<char>, thrust::THRUST_200500_600_700_800_NS::cuda_cub::execute_on_stream_base>, thrust::THRUST_200500_600_700_800_NS::use_default, thrust::THRUST_200500_600_700_800_NS::use_default> >, thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::pointer<int, thrust::THRUST_200500_600_700_800_NS::detail::execute_with_allocator<c2h::checked_cuda_allocator<char>, thrust::THRUST_200500_600_700_800_NS::cuda_cub::execute_on_stream_base>, thrust::THRUST_200500_600_700_800_NS::use_default, thrust::THRUST_200500_600_700_800_NS::use_default> >, 
thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::device_ptr<int> >, thrust::THRUST_200500_600_700_800_NS::random::linear_congruential_engine<unsigned int, 48271u, 0u, 2147483647u>&) [0x14d83f]
=========                in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_merge_sort.lid_0
=========     Host Frame:void thrust::THRUST_200500_600_700_800_NS::system::detail::generic::shuffle<thrust::THRUST_200500_600_700_800_NS::detail::execute_with_allocator<c2h::checked_cuda_allocator<char>, thrust::THRUST_200500_600_700_800_NS::cuda_cub::execute_on_stream_base>, thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::device_ptr<int> >, thrust::THRUST_200500_600_700_800_NS::random::linear_congruential_engine<unsigned int, 48271u, 0u, 2147483647u>&>(thrust::THRUST_200500_600_700_800_NS::execution_policy<thrust::THRUST_200500_600_700_800_NS::detail::execute_with_allocator<c2h::checked_cuda_allocator<char>, thrust::THRUST_200500_600_700_800_NS::cuda_cub::execute_on_stream_base> >&, thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::device_ptr<int> >, thrust::THRUST_200500_600_700_800_NS::detail::normal_iterator<thrust::THRUST_200500_600_700_800_NS::device_ptr<int> >, thrust::THRUST_200500_600_700_800_NS::random::linear_congruential_engine<unsigned int, 48271u, 0u, 2147483647u>&) [0x14dabf]
=========                in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_merge_sort.lid_0
=========     Host Frame:thrust::THRUST_200500_600_700_800_NS::detail::vector_base<int, c2h::checked_cuda_allocator<int> > make_shuffled_key_ranks_vector<int>(int, c2h::seed_t) [0x163161]
=========                in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_merge_sort.lid_0
=========     Host Frame:void C_A_T_C_H_T_E_M_P_L_A_T_E_T_E_S_T_F_U_N_C_0<metal::list<unsigned int> >() [0x114c0c]
=========                in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_merge_sort.lid_0
=========     Host Frame:Catch::RunContext::invokeActiveTestCase() [0x440e2]
=========                in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_merge_sort.lid_0
=========     Host Frame:Catch::RunContext::runCurrentTest(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) [0x5c547]
=========                in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_merge_sort.lid_0
=========     Host Frame:Catch::RunContext::runTest(Catch::TestCase const&) [0x6522a]
=========                in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_merge_sort.lid_0
=========     Host Frame:Catch::Session::runInternal() [0x6f6af]
=========                in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_merge_sort.lid_0
=========     Host Frame:Catch::Session::run() [0x6fc9d]
=========                in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_merge_sort.lid_0
=========     Host Frame:main [0x33675]
=========                in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_merge_sort.lid_0
=========     Host Frame: [0x23a8f]
=========                in /usr/lib/x86_64-linux-gnu/libc.so.6
=========     Host Frame:__libc_start_main [0x23b48]
=========                in /usr/lib/x86_64-linux-gnu/libc.so.6
=========     Host Frame:_start [0x3ae34]
=========                in /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_merge_sort.lid_0

How to Reproduce

compute-sanitizer --tool initcheck --check-device-heap yes --leak-check full --padding 512 --track-stream-ordered-races all --check-warpgroup-mma yes --require-cuda-init no --check-exit-code yes --error-exitcode 1 --nvtx true /home/coder/cccl/build/cuda12.4-gcc13/cub-cpp17/bin/cub.cpp17.test.device_merge_sort.lid_0
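
The report above is dominated by long template names, so when rerunning the command it can help to reduce the output to the source locations of each frame. The following is a minimal sketch, assuming the sanitizer output has been captured as text. `source_locations` is a hypothetical helper name for illustration and is not part of CCCL or compute-sanitizer.

```python
import re

# A frame's source location is the trailing "in <path>:<line>" of a report line.
# Host frames end in a bare binary or library path with no ":<line>" suffix,
# so they are skipped by this pattern.
LOCATION = re.compile(r"\bin (\S+):(\d+)\s*$")


def source_locations(log_text):
    """Return (path, line_number) pairs for report lines ending in a source location."""
    found = []
    for line in log_text.splitlines():
        match = LOCATION.search(line)
        if match:
            found.append((match.group(1), int(match.group(2))))
    return found
```

Applied to the report in this issue, the first two entries would point at `thread_load.cuh:260` and `thread_load.cuh:122`, and the list would end at `dispatch_scan.cuh:197`, which makes the path from `DeviceScanKernel` down to `ThreadLoad` easier to follow.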

Expected behavior

No diagnostic emitted.

Reproduction link

No response

Operating System

No response

nvidia-smi output

No response

NVCC version

No response

@alliepiper added the bug label on Jun 21, 2024
@alliepiper changed the title from "[BUG]: Uninitialized __global__ memory read of size 8 bytes in CUB device merge sort test" to "[BUG]: Uninitialized __global__ memory read of size 8 bytes in CUB sorting test" on Jun 21, 2024
@alliepiper changed the title from "[BUG]: Uninitialized __global__ memory read of size 8 bytes in CUB sorting test" to "[BUG]: Uninitialized __global__ memory read of size 8 bytes in CUB sorting tests" on Jun 21, 2024
@lilohuang
might be related to #1790
