Add PTX wrapping functions for TMA features #379

Merged
miscco merged 7 commits into NVIDIA:main from add-cp-async-ptx-wrappers on Sep 13, 2023

Conversation

ahendriksen
Contributor

@ahendriksen ahendriksen commented Aug 28, 2023

Description

closes #359

Add PTX wrapping functions for TMA features.

Checklist (still TODO)

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes. (There is a working draft of this functionality for the CUDA programming guide)

@ahendriksen ahendriksen force-pushed the add-cp-async-ptx-wrappers branch 3 times, most recently from d5a6e67 to 81fa05f, on August 31, 2023 19:52
@ahendriksen ahendriksen marked this pull request as ready for review on September 1, 2023 12:41
@ahendriksen ahendriksen requested review from a team as code owners on September 1, 2023 12:41
@ahendriksen ahendriksen requested review from alliepiper and miscco and removed request for a team on September 1, 2023 12:41
@ahendriksen ahendriksen force-pushed the add-cp-async-ptx-wrappers branch 3 times, most recently from e2f7979 to 70018f6, on September 5, 2023 15:08
@ahendriksen
Contributor Author

I have:

  • removed superfluous empty lines
  • replaced __trap by assert in the tests
  • replaced bar.arrive_tx by cuda::device::arrive_tx
  • made it possible to skip kernel execution in the NVRTC tests (189bffe)
  • made the experimental feature flag available on the host compilation trajectory as well, following this discussion

In terms of documentation, we can either rely on the CUDA programming guide or add an "Experimental API" section alongside the existing "Standard API" and "Extended API" sections.

This PR is now completely ready for (final?) review.

Collaborator

@jrhemstad jrhemstad left a comment

Merge after #358

@ahendriksen ahendriksen force-pushed the add-cp-async-ptx-wrappers branch 2 times, most recently from 98a7864 to 7028458, on September 12, 2023 13:48
Contributor Author

@ahendriksen ahendriksen left a comment

Thanks for the review @miscco. I have implemented your feedback.

@miscco miscco merged commit abb7cb7 into NVIDIA:main Sep 13, 2023
466 checks passed
Successfully merging this pull request may close these issues.

[FEA]: Experimental PTX wrappers to expose TMA
3 participants