Skip to content

Commit

Permalink
Add warp sync for Cuda parallel reduce
Browse files Browse the repository at this point in the history
compute-sanitizer --tool=racecheck discovered a potential racecondition for Cuda parallel reductions (using range policy) where data was being updated on a single thread inside a warp, but the warp was not being synchronized before being read.
  • Loading branch information
tcclevenger committed Nov 28, 2023
1 parent 0d34280 commit 4d4a343
Showing 1 changed file with 6 additions and 0 deletions.
6 changes: 6 additions & 0 deletions core/src/Cuda/Kokkos_Cuda_Parallel_Range.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -243,6 +243,12 @@ class ParallelReduce<CombinedFunctorReducerType, Kokkos::RangePolicy<Traits...>,

if (CudaTraits::WarpSize < word_count.value) {
__syncthreads();
} else if (word_count.value > 1) {
// Inside cuda_single_inter_block_reduce_scan() above, shared[i] below
// might have been updated by a single thread within a warp without
// synchronization afterwards. Synchronize threads within warp to avoid
// potential racecondition.
__syncwarp(0xffffffff);
}

for (unsigned i = threadIdx.y; i < word_count.value; i += blockDim.y) {
Expand Down

0 comments on commit 4d4a343

Please sign in to comment.