This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Soundness bugfix for barrier<thread_scope_block> on sm_70 #300

Merged: 1 commit, Sep 15, 2022

10 changes: 6 additions & 4 deletions include/cuda/std/barrier
@@ -189,15 +189,17 @@ public:
                 : "r"(static_cast<std::uint32_t>(__cvta_generic_to_shared(&__barrier)))
                 : "memory");
 #else
-            unsigned int __activeA = __match_any_sync(__activemask(), __update);
-            unsigned int __activeB = __match_any_sync(__activemask(), reinterpret_cast<std::uintptr_t>(&__barrier));
+            unsigned int __mask = __activemask();
+            unsigned int __activeA = __match_any_sync(__mask, __update);
+            unsigned int __activeB = __match_any_sync(__mask, reinterpret_cast<std::uintptr_t>(&__barrier));
             unsigned int __active = __activeA & __activeB;
             int __inc = __popc(__active) * __update;

             unsigned __laneid;
-            asm volatile ("mov.u32 %0, %laneid;" : "=r"(__laneid));
+            asm ("mov.u32 %0, %laneid;" : "=r"(__laneid));
Collaborator:

Why are you removing the volatile here?

Collaborator Author:

Because it is not necessary. Two "calls" to this assembly instruction from this same thread always "return" the same value, so eliding one is a valid transformation for the compiler to do.

This assembly statement does not modify any memory, so it does not need a "memory" clobber either.

             int __leader = __ffs(__active) - 1;

+            // All threads in mask synchronize here, establishing cumulativity to the __leader:
+            __syncwarp(__mask);
             if(__leader == __laneid)
             {
                 __token = __barrier.arrive(__inc);
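The mask arithmetic in the hunk above can be followed on the host with a small model. This is only a sketch under stated assumptions: `Lane`, `match_any`, `lowest_lane`, and `aggregate_arrivals` are hypothetical names that do not appear in the PR, the warp is modelled as a plain array of 32 lanes, and `match_any` merely imitates what `__match_any_sync` returns for one lane. The model covers the grouping and leader election only. It cannot show the memory-ordering effect of `__syncwarp(__mask)`, which is the actual subject of the fix.

```cpp
#include <bitset>
#include <cstdint>
#include <map>
#include <vector>

// Host-side model of one 32-lane warp. All names are hypothetical.
struct Lane {
    bool active;             // does this lane execute the arrive path?
    int update;              // arrival count requested by this lane
    std::uintptr_t barrier;  // address of the barrier this lane arrives on
};

// Imitates __match_any_sync for a single lane: the set of lanes in `mask`
// whose key equals the key of `lane`.
template <class Key>
std::uint32_t match_any(const std::vector<Lane>& warp, std::uint32_t mask,
                        int lane, Key key) {
    std::uint32_t result = 0;
    for (int i = 0; i < 32; ++i) {
        if (((mask >> i) & 1u) && key(warp[i]) == key(warp[lane])) {
            result |= 1u << i;
        }
    }
    return result;
}

// Index of the lowest set bit, i.e. __ffs(x) - 1 for non-zero x.
int lowest_lane(std::uint32_t x) {
    int i = 0;
    while (!((x >> i) & 1u)) ++i;
    return i;
}

// Returns the total count each barrier receives; `arrive_calls` reports how
// many lanes acted as leader and actually called arrive.
std::map<std::uintptr_t, int> aggregate_arrivals(const std::vector<Lane>& warp,
                                                 int& arrive_calls) {
    // Sample the active mask once, as the patched code does with __mask.
    std::uint32_t mask = 0;
    for (int i = 0; i < 32; ++i) {
        if (warp[i].active) mask |= 1u << i;
    }

    std::map<std::uintptr_t, int> arrived;
    arrive_calls = 0;
    for (int lane = 0; lane < 32; ++lane) {
        if (!((mask >> lane) & 1u)) continue;
        std::uint32_t a = match_any(warp, mask, lane,
                                    [](const Lane& l) { return l.update; });
        std::uint32_t b = match_any(warp, mask, lane,
                                    [](const Lane& l) { return l.barrier; });
        std::uint32_t active = a & b;  // same update AND same barrier
        int inc = static_cast<int>(std::bitset<32>(active).count())
                  * warp[lane].update;
        if (lowest_lane(active) == lane) {  // only the group leader arrives
            arrived[warp[lane].barrier] += inc;
            ++arrive_calls;
        }
    }
    return arrived;
}
```

In this model, two lanes that each request an update of 1 on the same barrier form one group, and the lower-numbered lane arrives once with a count of 2. A lane with a different update value or a different barrier address falls into its own group and arrives separately.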