PTX: Add helper functions for dsmem #1336

Merged: 17 commits, Feb 6, 2024

ahendriksen (Contributor)

Description

This PR prepares the cuda::ptx code for the addition of more instructions. It adds:

  • The cuda::ptx::n32 integral constant type that can be used with "n" annotated inline PTX instructions.
  • The __as_ptr_dsmem function for pointers that may point to local or remote distributed shared memory.
  • The __from_ptr_* functions that map back from a state-space specific pointer to a generic pointer.
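As a rough illustration of the first bullet, here is a minimal host-side sketch of how an integral constant type can carry a compile-time 32-bit value through a function argument, which is what an "n" (immediate) operand requires. The names n32_sketch_t and shift_left_by are hypothetical, the alias is an assumption about the general shape rather than the actual cuda::ptx definition, and a plain shift stands in for an inline PTX instruction.

```cpp
#include <cstdint>
#include <type_traits>

namespace sketch {

// Hypothetical stand-in for an n32-style type: the value lives in the type,
// so it is always available as a compile-time constant.
template <std::int32_t N>
using n32_sketch_t = std::integral_constant<std::int32_t, N>;

// Hypothetical wrapper. In device code, N could be fed to an "n" constraint
// of an inline PTX statement; here a plain shift stands in for that.
template <std::int32_t N>
constexpr std::uint32_t shift_left_by(std::uint32_t x, n32_sketch_t<N>) {
  static_assert(N >= 0 && N < 32, "shift amount must be in [0, 32)");
  return x << N;
}

} // namespace sketch
```

A call such as sketch::shift_left_by(3u, sketch::n32_sketch_t<4>{}) deduces N from the argument type, so an out-of-range constant is rejected at compile time rather than at run time.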

In addition, the error functions that are used to generate link-time errors in case of an architecture mismatch are now renamed and all return void. While writing new instruction wrappers, it turned out to be unsustainable to have the error functions generate a return value, especially with templates.
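The link-time error pattern can be sketched on the host as follows. This is an illustration under assumptions, not the code from this PR: the function name, the HYPOTHETICAL_TARGET_ARCH macro, and the load wrapper are all invented for the example. The error function is declared but never defined, so any call that survives compilation fails at link time with a symbol name that describes the problem. Because it returns void, the same declaration can be used from a template with any return type.

```cpp
// Hypothetical error function: declared, intentionally never defined.
// A surviving call produces an "undefined reference" error at link time.
void hypothetical_feature_is_not_supported_before_arch_900();

// Hypothetical architecture selector; real code would use the toolchain's
// own target macros instead.
#ifndef HYPOTHETICAL_TARGET_ARCH
#define HYPOTHETICAL_TARGET_ARCH 900
#endif

// Hypothetical wrapper template. The error call does not produce the return
// value, so the wrapper does not depend on the error function's type.
template <typename T>
T sketch_load(const T* p) {
#if HYPOTHETICAL_TARGET_ARCH >= 900
  return *p;  // supported path: the error function is never referenced
#else
  hypothetical_feature_is_not_supported_before_arch_900();
  return T();  // unreachable in practice; keeps the function well-formed
#endif
}
```

With the default value of the macro, the program links normally. Compiling with a lower value and calling sketch_load leaves an unresolved reference to the error function, which the linker reports by name.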

Finally, there are some whitespace and comment changes that make it easier to contribute code in the future.

Checklist

  • New or existing tests cover these changes.
  • No docs changes needed.

copy-pr-bot bot commented Feb 5, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

miscco (Collaborator) left a comment

Some minor nits

ahendriksen (Contributor Author) left a comment

Thanks for the review. I have implemented the requested changes. I have made a bigger update to the test code that removes the need for the non_eliminated_false function.

The newer mbarrier.arrive instructions accept .shared::cta instead of .shared. Updated for clarity.

ahendriksen (Contributor Author) commented

The spurious newlines have been removed. One minor addition is the use of .shared::cta in mbarrier.arrive for clarity.

ahendriksen mentioned this pull request on Feb 6, 2024

ahendriksen (Contributor Author) commented

Is this one good to merge? I have added a follow-up PR: #1341

miscco merged commit ac83b5f into NVIDIA:main on Feb 6, 2024
538 checks passed