PTX: Add helper functions for dsmem #1336
- `__as_ptr_dsmem`
- `__from_ptr_{smem, dsmem, remote_dsmem, gmem}`
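The helpers in this commit translate between generic pointers and state-space-specific addresses. The sketch below illustrates the idea for the `.shared` window only. It assumes the CUDA address-space conversion intrinsics `__cvta_generic_to_shared` and `__cvta_shared_to_generic`. The names `as_ptr_smem_sketch`, `from_ptr_smem_sketch`, `round_trip_kernel`, and `run_round_trip` are hypothetical and are not the library's `__as_ptr_*`/`__from_ptr_*` implementations.

```cuda
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical helper: generic pointer -> 32-bit address in the .shared
// window, the form an inline-PTX shared-memory operand typically takes.
__device__ inline std::uint32_t as_ptr_smem_sketch(const void* ptr)
{
  return static_cast<std::uint32_t>(__cvta_generic_to_shared(ptr));
}

// Hypothetical inverse: 32-bit .shared address -> generic pointer.
__device__ inline void* from_ptr_smem_sketch(std::uint32_t addr)
{
  return __cvta_shared_to_generic(static_cast<std::size_t>(addr));
}

__global__ void round_trip_kernel(int* ok)
{
  __shared__ int buffer[4];
  const std::uint32_t smem_addr = as_ptr_smem_sketch(&buffer[2]);
  // Mapping the state-space address back should give the original pointer.
  *ok = (from_ptr_smem_sketch(smem_addr) == &buffer[2]) ? 1 : 0;
}

// Host wrapper: launches a single thread and returns the kernel's verdict.
int run_round_trip()
{
  int* d_ok = nullptr;
  int h_ok = 0;
  cudaError_t err = cudaMalloc(&d_ok, sizeof(int));
  assert(err == cudaSuccess);
  round_trip_kernel<<<1, 1>>>(d_ok);
  // cudaMemcpy waits for the kernel to finish before copying.
  err = cudaMemcpy(&h_ok, d_ok, sizeof(int), cudaMemcpyDeviceToHost);
  assert(err == cudaSuccess);
  (void)err;
  cudaFree(d_ok);
  return h_ok;
}
```

The round trip (generic pointer to 32-bit shared address and back) should reproduce the original pointer, which is the property a `__from_ptr_*` style helper relies on.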
Branch updated from c363edd to f60e347.
Some minor nits
Resolved review threads:
- `libcudacxx/include/cuda/std/detail/libcxx/include/__cuda/ptx/ptx_helper_functions.h` (three threads, outdated)
- `libcudacxx/test/libcudacxx/cuda/ptx/ptx.mbarrier.arrive.compile.pass.cpp` (outdated)
- `...libcxx/include/__cuda/ptx/parallel_synchronization_and_communication_instructions_mbarrier.h` (two threads, one outdated)
Thanks for the review. I have implemented the requested changes. I have also made a bigger update to the test code that removes the need for the `non_eliminated_false` function.
Further resolved review threads:
- `...libcxx/include/__cuda/ptx/parallel_synchronization_and_communication_instructions_mbarrier.h` (two threads, one outdated)
- `libcudacxx/include/cuda/std/detail/libcxx/include/__cuda/ptx/ptx_helper_functions.h` (two threads, outdated)
- `libcudacxx/test/libcudacxx/cuda/ptx/ptx.mbarrier.arrive.compile.pass.cpp` (three threads, two outdated)
- `libcudacxx/test/libcudacxx/cuda/ptx/ptx.mbarrier.wait.compile.pass.cpp` (outdated)
- `libcudacxx/test/libcudacxx/cuda/ptx/ptx.red.async.compile.pass.cpp` (outdated)
- `libcudacxx/test/libcudacxx/cuda/ptx/ptx.st.async.compile.pass.cpp` (outdated)
The newer mbarrier.arrive instructions accept .shared::cta instead of .shared. Updated for clarity.
The spurious newlines have been removed. One minor addition is the use of …
Is this one good to merge? I have added a follow-up PR: #1341
Description
This PR prepares the `cuda::ptx` code for the addition of more instructions. It adds:
- `cuda::ptx::n32`, an integral constant type that can be used with `"n"`-annotated inline PTX instructions;
- `__as_ptr_dsmem`, a function for pointers that may point to local or remote distributed shared memory;
- `__from_ptr_*` functions that map back from a state-space-specific pointer to a generic pointer.

In addition, the error functions that generate link-time errors in case of an architecture mismatch have been renamed, and all of them now return `void`. While writing new instruction wrappers, it proved unsustainable to have the error functions produce a return value, especially with templates.
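The `"n"` constraint in inline PTX accepts only an integer known at compile time. A minimal sketch of how an integral-constant type can carry such a value into an instruction follows. It assumes `n32_sketch` as a hypothetical alias for `std::integral_constant<int, Value>`; the actual `cuda::ptx::n32` definition may differ, and `add_immediate`, `add_kernel`, and `run_add` are illustrative names only.

```cuda
#include <cassert>
#include <type_traits>
#include <cuda_runtime.h>

// Hypothetical stand-in for cuda::ptx::n32: the integer is encoded in the
// type, so it is a compile-time constant wherever the type is known.
template <int Value>
using n32_sketch = std::integral_constant<int, Value>;

// Adds a compile-time immediate to x. The "n" constraint makes the compiler
// write Value into the PTX text as a literal instead of using a register.
template <int Value>
__device__ int add_immediate(int x, n32_sketch<Value>)
{
  int result;
  asm("add.s32 %0, %1, %2;" : "=r"(result) : "r"(x), "n"(Value));
  return result;
}

__global__ void add_kernel(int* out)
{
  // Value = 2 is deduced from the argument's type, not from a runtime value.
  *out = add_immediate(40, n32_sketch<2>{});
}

// Host wrapper: launches a single thread and returns the computed sum.
int run_add()
{
  int* d_out = nullptr;
  int h_out = 0;
  cudaError_t err = cudaMalloc(&d_out, sizeof(int));
  assert(err == cudaSuccess);
  add_kernel<<<1, 1>>>(d_out);
  err = cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
  assert(err == cudaSuccess);
  (void)err;
  cudaFree(d_out);
  return h_out;
}
```

Passing the constant as a typed argument lets a wrapper's signature enforce "this operand must be an immediate" at the call site, rather than relying on the caller to pass a literal.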
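A sketch of the error-function pattern, under stated assumptions: the error function is declared but never defined, so any build that still calls it fails because no definition exists. Because it returns `void`, a single declaration serves every wrapper regardless of the wrapper's return type; the wrapper supplies a dummy return value after the call. The names `sketch_error_requires_sm_90_or_newer`, `sketch_wrapper`, and `TargetSm` are hypothetical. `TargetSm` stands in for the real architecture check (real code would test the compilation target instead), and `if constexpr` requires C++17.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical error function: declared but deliberately never defined, so a
// call that survives compilation breaks the build and names the problem.
__device__ void sketch_error_requires_sm_90_or_newer();

// Hypothetical wrapper templated on its return type T.
template <int TargetSm, typename T>
__device__ T sketch_wrapper(T x)
{
  if constexpr (TargetSm >= 90)
  {
    return x + T(1);  // placeholder for the real instruction
  }
  else
  {
    // The void error function works for any T; the dummy value below only
    // keeps the function well-formed, since this branch cannot build.
    sketch_error_requires_sm_90_or_newer();
    return T{};
  }
}

__global__ void wrapper_kernel(int* out)
{
  // Only the supported branch is instantiated, so the undefined error
  // function is never referenced.
  *out = sketch_wrapper<90>(41);
}

// Host wrapper: launches a single thread and returns the result.
int run_wrapper()
{
  int* d_out = nullptr;
  int h_out = 0;
  cudaError_t err = cudaMalloc(&d_out, sizeof(int));
  assert(err == cudaSuccess);
  wrapper_kernel<<<1, 1>>>(d_out);
  err = cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
  assert(err == cudaSuccess);
  (void)err;
  cudaFree(d_out);
  return h_out;
}
```

Had the error function returned a value instead, each wrapper would need an error function matching its own return type, which multiplies quickly once wrappers are templated.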
Finally, there are some whitespace and comment changes that make it easier to contribute code in the future.
Checklist