Add cuda::ptx::st_async #1078

Merged
merged 4 commits into from
Nov 13, 2023


ahendriksen

Description

closes #1077

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Because the store size can be either 32 or 64 bits, requiring the value
type to match the buffer type can catch a lot of errors.

For instance:

uint64_t * remote_buffer;
uint64_t * remote_bar;
cuda::ptx::st_async(remote_buffer, 1, remote_bar);

would previously use the .b32 path, because the literal `1` is an `int`
and the type of the value argument determined the type resolution.
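
To make the cost of the silent .b32 path concrete, here is a minimal host-only sketch. It is not the real `cuda::ptx::st_async`: `store_like_old` and the two helper functions are hypothetical stand-ins that only model "the width of the store follows the value type", and they ignore the barrier and everything asynchronous.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical host-only stand-in for the old behaviour; NOT the real
// cuda::ptx::st_async. The store width follows the value type alone.
template <typename Value>
void store_like_old(void* addr, const Value& value) {
  static_assert(sizeof(Value) == 4 || sizeof(Value) == 8,
                "32- or 64-bit stores only");
  std::memcpy(addr, &value, sizeof(Value));  // 4 bytes for int, 8 for uint64_t
}

// What a 64-bit slot holds after storing the literal 1 (an int).
inline std::uint64_t slot_after_int_store() {
  std::uint64_t slot = ~std::uint64_t{0};  // all bits set, so stale bytes show
  store_like_old(&slot, 1);                // only 4 of the 8 bytes are written
  return slot;
}

// What the same slot holds after storing an explicit 64-bit 1.
inline std::uint64_t slot_after_u64_store() {
  std::uint64_t slot = ~std::uint64_t{0};
  store_like_old(&slot, std::uint64_t{1});  // all 8 bytes are written
  return slot;
}
```

In this model the first helper leaves half of the slot untouched, so the 64-bit buffer does not read back as 1, while the second helper does.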

Now, this will result in a compiler error.
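
One way such a compile-time rejection can arise is a template deduction conflict, sketched below under assumptions. `store_bits` and `call_compiles` are hypothetical host-only names, not the library's implementation: a single template parameter is shared by the buffer pointer and the value, so a mismatched pair cannot be deduced. The last check also assumes `int32_t` is an alias for `int`, as on mainstream platforms.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>

// Hypothetical host-only stand-in; NOT the real cuda::ptx::st_async.
// T is shared by the buffer pointer and the value, so both must agree.
template <typename T>
constexpr int store_bits(T* /*addr*/, const T& /*value*/,
                         std::uint64_t* /*bar*/) {
  static_assert(sizeof(T) == 4 || sizeof(T) == 8, "32- or 64-bit stores only");
  return static_cast<int>(sizeof(T)) * 8;
}

// Detects at compile time whether store_bits(Addr, Value, uint64_t*) is a
// well-formed call. The int overload is preferred whenever it is valid.
template <typename Addr, typename Value>
auto call_compiles(int)
    -> decltype(store_bits(std::declval<Addr>(), std::declval<Value>(),
                           std::declval<std::uint64_t*>()),
                std::true_type{});
template <typename Addr, typename Value>
std::false_type call_compiles(...);

// uint64_t buffer with an int value: T cannot be both, so the call is rejected.
static_assert(!decltype(call_compiles<std::uint64_t*, int>(0))::value,
              "mismatched buffer and value types do not compile");
// Matching 64-bit buffer and value.
static_assert(decltype(call_compiles<std::uint64_t*, std::uint64_t>(0))::value,
              "matching 64-bit types compile");
// Matching 32-bit buffer and value (assumes int32_t is int).
static_assert(decltype(call_compiles<std::int32_t*, int>(0))::value,
              "matching 32-bit types compile");
```

The design choice this illustrates is that the width is no longer picked by whichever argument happens to be deduced; an ambiguous call fails at compile time instead of silently storing fewer bytes.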

The resolution is to make the two types match: either (a) change the
value type, or (b) change the buffer type.

a)
uint64_t * remote_buffer;
cuda::ptx::st_async(remote_buffer, uint64_t(1), remote_bar);

b)
int32_t * remote_buffer;
cuda::ptx::st_async(remote_buffer, 1, remote_bar);

The type may be misleading on this one, so I added a note on alignment
of the destination address.

@ahendriksen ahendriksen requested review from a team as code owners November 9, 2023 17:35
@ahendriksen ahendriksen requested review from griwes and elstehle and removed request for a team November 9, 2023 17:35
@jrhemstad jrhemstad added the backport branch/2.3.x For backporting to the 2.3.x release branch label Nov 13, 2023
@miscco miscco merged commit 22a570d into NVIDIA:main Nov 13, 2023
517 checks passed

Successfully created backport PR for branch/2.3.x:

@jrhemstad jrhemstad removed the backport branch/2.3.x For backporting to the 2.3.x release branch label Nov 14, 2023
Successfully merging this pull request may close these issues.

[FEA]: Add PTX st.async
3 participants