[FEA]: Experimental PTX wrappers to expose TMA #359

Closed
ahendriksen opened this issue Aug 18, 2023 · 10 comments · Fixed by #379
Labels
feature request New feature or request.

Comments

@ahendriksen
Contributor

Is this a duplicate?

Area

libcu++

Is your feature request related to a problem? Please describe.

I would like to use TMA (Tensor Memory Accelerator) features, even if only in a somewhat rough form: functions with C-like signatures that simply wrap PTX instructions.

Describe the solution you'd like

Location

Provide the header <cuda/experimental/ctk_12_features> that exposes a number of new PTX instructions, especially around TMA.

PTX instructions

  • cp.async.bulk: both shared to global and global to shared

  • fence.proxy.async.shared::cta

  • cp.async.bulk.commit_group

  • cp.async.bulk.wait_group.read

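    • For initial exposure, the .read variant should be sufficient.
    • The wrapper takes a template int argument N, since PTX requires an immediate here; it waits until at most N of the most recent bulk async-groups are still pending.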
  • cp.async.bulk.tensor.2d: both global to shared and shared to global
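A minimal sketch of what such C-like wrappers could look like, assuming CUDA 12.0+ (PTX ISA 8.0) and an sm_90 GPU. The function names (`fence_proxy_async_shared_cta`, `cp_async_bulk_shared_to_global`, `cp_async_bulk_commit_group`, `cp_async_bulk_wait_group_read`) and the demo kernel are hypothetical illustrations, not the libcu++ API that was eventually merged. The demo stores a shared-memory buffer to global memory: every thread fences its shared writes toward the async proxy, one thread issues the bulk copy, commits it as a bulk async-group, and waits until the copy has finished reading shared memory before the block exits.

```cuda
#include <cassert>
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical C-like wrappers (illustrative names, not a libcu++ API).
// The inline PTX needs CUDA 12.0+ (PTX ISA 8.0) and an sm_90 target.

// Order this thread's prior generic-proxy shared-memory writes before later
// async-proxy accesses, such as cp.async.bulk reading the same memory.
__device__ inline void fence_proxy_async_shared_cta() {
  asm volatile("fence.proxy.async.shared::cta;" ::: "memory");
}

// Bulk copy `size` bytes from shared to global memory. Both addresses must be
// 16-byte aligned and `size` must be a multiple of 16.
__device__ inline void cp_async_bulk_shared_to_global(void* dst_gmem, const void* src_smem,
                                                      std::uint32_t size) {
  asm volatile("cp.async.bulk.global.shared::cta.bulk_group [%0], [%1], %2;"
               :
               : "l"(__cvta_generic_to_global(dst_gmem)),
                 "r"(static_cast<std::uint32_t>(__cvta_generic_to_shared(src_smem))),
                 "r"(size)
               : "memory");
}

// Gather this thread's uncommitted bulk copies into one bulk async-group.
__device__ inline void cp_async_bulk_commit_group() {
  asm volatile("cp.async.bulk.commit_group;" ::: "memory");
}

// Wait until at most N of the most recent bulk async-groups are still reading
// their source. N must be a compile-time constant: PTX requires an immediate.
template <int N>
__device__ inline void cp_async_bulk_wait_group_read() {
  static_assert(N >= 0, "N must be non-negative");
  asm volatile("cp.async.bulk.wait_group.read %0;" ::"n"(N) : "memory");
}

constexpr int kElems = 256;  // 256 ints = 1024 bytes, a multiple of 16

__global__ void bulk_store_kernel(int* out) {
  __shared__ alignas(16) int smem[kElems];
  for (int i = threadIdx.x; i < kElems; i += blockDim.x) smem[i] = 2 * i;
  fence_proxy_async_shared_cta();  // each writer fences its own smem writes
  __syncthreads();
  if (threadIdx.x == 0) {
    cp_async_bulk_shared_to_global(out, smem, static_cast<std::uint32_t>(sizeof(smem)));
    cp_async_bulk_commit_group();
    // smem is released when the block exits, so wait until it has been read.
    cp_async_bulk_wait_group_read<0>();
  }
}

// Host driver: returns true if global memory holds the expected values.
bool run_bulk_store_demo() {
  int* d_out = nullptr;
  if (cudaMalloc(&d_out, kElems * sizeof(int)) != cudaSuccess) return false;
  bulk_store_kernel<<<1, 128>>>(d_out);
  int h_out[kElems];
  bool ok = cudaGetLastError() == cudaSuccess && cudaDeviceSynchronize() == cudaSuccess &&
            cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost) == cudaSuccess;
  cudaFree(d_out);
  for (int i = 0; ok && i < kElems; ++i) ok = (h_out[i] == 2 * i);
  return ok;
}
```

The `.read` variant only guarantees that the shared-memory source may be reused or released; it does not make the global writes visible to other threads of the same kernel, which is why this sketch only inspects the result on the host after the kernel has finished. Compile with, for example, `nvcc -arch=sm_90`.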

Higher dimensions: not sure if we should expose 3d, 4d, and 5d. Open to suggestions.

mbarrier.arrive.expect_tx: This one has so many integration issues with the barrier::arrival_token that I have opened a separate PR #354 to get it included.
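For the global-to-shared direction, completion is tracked by an mbarrier through its transaction count, which is where `mbarrier.arrive.expect_tx` comes in. The sketch below sidesteps the `cuda::barrier` integration question by operating on a raw 64-bit mbarrier object in shared memory. All helper names are hypothetical, and it assumes CUDA 12.0+ and an sm_90 GPU. Thread 0 issues the bulk copy and arrives while announcing the expected byte count; the other threads arrive normally, and every thread spins on `mbarrier.try_wait` until both the arrivals and the bytes are accounted for.

```cuda
#include <cassert>
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical helpers on a raw mbarrier (a 64-bit object in shared memory).
// Requires CUDA 12.0+ (PTX ISA 8.0) and an sm_90 target.
__device__ inline std::uint32_t smem_addr(const void* p) {
  return static_cast<std::uint32_t>(__cvta_generic_to_shared(p));
}
__device__ inline void mbarrier_init(std::uint64_t* bar, std::uint32_t count) {
  asm volatile("mbarrier.init.shared::cta.b64 [%0], %1;" ::"r"(smem_addr(bar)), "r"(count)
               : "memory");
}
// Arrive once and raise the barrier's expected transaction count by tx_bytes.
__device__ inline std::uint64_t mbarrier_arrive_expect_tx(std::uint64_t* bar, std::uint32_t tx_bytes) {
  std::uint64_t state;
  asm volatile("mbarrier.arrive.expect_tx.shared::cta.b64 %0, [%1], %2;"
               : "=l"(state) : "r"(smem_addr(bar)), "r"(tx_bytes) : "memory");
  return state;
}
__device__ inline std::uint64_t mbarrier_arrive(std::uint64_t* bar) {
  std::uint64_t state;
  asm volatile("mbarrier.arrive.shared::cta.b64 %0, [%1];"
               : "=l"(state) : "r"(smem_addr(bar)) : "memory");
  return state;
}
// Returns true once the phase identified by `state` has completed.
__device__ inline bool mbarrier_try_wait(std::uint64_t* bar, std::uint64_t state) {
  std::uint32_t done;
  asm volatile("{\n\t.reg .pred p;\n\t"
               "mbarrier.try_wait.shared::cta.b64 p, [%1], %2;\n\t"
               "selp.u32 %0, 1, 0, p;\n\t}"
               : "=r"(done) : "r"(smem_addr(bar)), "l"(state) : "memory");
  return done != 0;
}
__device__ inline void fence_proxy_async_shared_cta() {
  asm volatile("fence.proxy.async.shared::cta;" ::: "memory");
}
// Bulk copy global -> shared; completion is reported as `size` transaction
// bytes on `bar`. Addresses 16-byte aligned, size a multiple of 16.
__device__ inline void cp_async_bulk_global_to_shared(void* dst_smem, const void* src_gmem,
                                                      std::uint32_t size, std::uint64_t* bar) {
  asm volatile(
      "cp.async.bulk.shared::cluster.global.mbarrier::complete_tx::bytes [%0], [%1], %2, [%3];"
      :: "r"(smem_addr(dst_smem)), "l"(__cvta_generic_to_global(src_gmem)), "r"(size),
         "r"(smem_addr(bar))
      : "memory");
}

constexpr int kElems = 256;  // 256 ints = 1024 bytes, a multiple of 16

__global__ void bulk_load_kernel(const int* in, int* out) {
  __shared__ alignas(16) int smem[kElems];
  __shared__ std::uint64_t bar;  // mbarrier object, naturally 8-byte aligned
  if (threadIdx.x == 0) {
    mbarrier_init(&bar, blockDim.x);  // one arrival expected per thread
    fence_proxy_async_shared_cta();   // make the initialized barrier visible to the async proxy
  }
  __syncthreads();
  const std::uint32_t bytes = static_cast<std::uint32_t>(sizeof(smem));
  std::uint64_t state;
  if (threadIdx.x == 0) {
    cp_async_bulk_global_to_shared(smem, in, bytes, &bar);
    state = mbarrier_arrive_expect_tx(&bar, bytes);
  } else {
    state = mbarrier_arrive(&bar);
  }
  // The phase completes once all arrivals happened and all bytes have landed.
  while (!mbarrier_try_wait(&bar, state)) {
  }
  for (int i = threadIdx.x; i < kElems; i += blockDim.x) out[i] = smem[i] + 1;
}

// Host driver: returns true if every element came back incremented by one.
bool run_bulk_load_demo() {
  int h_in[kElems], h_out[kElems];
  for (int i = 0; i < kElems; ++i) h_in[i] = i;
  int *d_in = nullptr, *d_out = nullptr;
  bool ok = cudaMalloc(&d_in, sizeof(h_in)) == cudaSuccess &&
            cudaMalloc(&d_out, sizeof(h_out)) == cudaSuccess &&
            cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice) == cudaSuccess;
  if (ok) {
    bulk_load_kernel<<<1, 128>>>(d_in, d_out);
    ok = cudaGetLastError() == cudaSuccess && cudaDeviceSynchronize() == cudaSuccess &&
         cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost) == cudaSuccess;
  }
  cudaFree(d_in);
  cudaFree(d_out);
  for (int i = 0; ok && i < kElems; ++i) ok = (h_out[i] == i + 1);
  return ok;
}
```

Because the raw PTX hands back a plain 64-bit barrier state rather than a `cuda::barrier` arrival token, a wrapper at this level stays C-like; bridging it to `cuda::barrier` is the part left to the separate PR.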

Deprecation policy

We can support these wrappers during the CTK 12 life-cycle, mark them deprecated during the CTK 13 cycle, and remove them in the CTK 14 cycle.
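One way a header could encode this policy, sketched under assumptions: the toolkit major version comes from nvcc's `__CUDACC_VER_MAJOR__` (falling back to 12 for a host-only build), and `SKETCH_CTK_MAJOR`, `SKETCH_CTK12_DEPRECATED`, `ctk12_wrapper_stub`, and `status_for` are hypothetical names. The wrappers are plain declarations in CTK 12, gain a C++14 `[[deprecated]]` attribute in CTK 13, and are not declared at all from CTK 14 on, so any lingering use becomes a compile error.

```cpp
#include <cassert>

// Toolkit major version: nvcc defines __CUDACC_VER_MAJOR__; a host-only build
// falls back to 12 so this sketch compiles with any C++14 compiler.
#if defined(__CUDACC_VER_MAJOR__)
#define SKETCH_CTK_MAJOR __CUDACC_VER_MAJOR__
#else
#define SKETCH_CTK_MAJOR 12
#endif

// CTK 13: keep the wrappers but emit a warning on every use.
#if SKETCH_CTK_MAJOR == 13
#define SKETCH_CTK12_DEPRECATED \
  [[deprecated("CTK 12 PTX wrapper: deprecated in CTK 13, removed in CTK 14")]]
#else
#define SKETCH_CTK12_DEPRECATED
#endif

namespace sketch {

// CTK 14 and later: the wrappers are simply no longer declared.
#if SKETCH_CTK_MAJOR < 14
// Stand-in for a wrapper; a real one would be a __device__ function around PTX.
SKETCH_CTK12_DEPRECATED inline int ctk12_wrapper_stub() { return 12; }
#endif

enum class wrapper_status { supported, deprecated, removed };

// The policy stated above, for toolkits that ship the wrappers (CTK 12+).
constexpr wrapper_status status_for(int ctk_major) {
  return ctk_major >= 14   ? wrapper_status::removed
         : ctk_major == 13 ? wrapper_status::deprecated
                           : wrapper_status::supported;
}

}  // namespace sketch
```

Tying the attribute to the toolkit version keeps CTK 12 users warning-free while giving CTK 13 users a full release cycle of compiler diagnostics before removal.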

Describe alternatives you've considered

Many alternatives were considered. It may not make sense to list them in this GitHub issue.

Additional context

No response

ahendriksen added the feature request label on Aug 18, 2023
@ahendriksen
Contributor Author

ahendriksen commented Aug 18, 2023

@jrhemstad @griwes @gonzalobg @vishalmehta1991: I have created a minimal proposal to get preliminary support for some TMA instructions into libcu++.

If you could provide feedback on the location of the header, the deprecation policy, and how to go about this in general, that would be great!

@miscco
Collaborator

miscco commented Aug 18, 2023

Personally, I am against experimental headers. We recently added an experimental feature by guarding it behind a feature flag (<cuda/memory_resource>).

That way, when the experimental phase is over a user has nothing to do as the only code change is removal of the feature flag guard.
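The feature-flag approach can be sketched as follows, assuming a hypothetical opt-in macro `SKETCH_ENABLE_EXPERIMENTAL_PTX_WRAPPERS` and a hypothetical host-side helper `sketch::ptx::is_valid_bulk_copy` standing in for the gated API (it checks the 16-byte alignment and size rules of `cp.async.bulk`). Because the API already lives at its final include path and namespace, ending the experimental phase means deleting the `#if`/`#endif` pair in the header; user code that defines the macro keeps compiling unchanged.

```cpp
#include <cassert>
#include <cstdint>

// --- user code: opt in to the experimental API before including the header ---
#define SKETCH_ENABLE_EXPERIMENTAL_PTX_WRAPPERS

// --- stand-in for the header body, already at its final location/namespace ---
#if defined(SKETCH_ENABLE_EXPERIMENTAL_PTX_WRAPPERS)
namespace sketch {
namespace ptx {

// Host-side precondition check for a bulk copy: cp.async.bulk requires
// 16-byte aligned addresses and a size that is a multiple of 16 bytes.
inline bool is_valid_bulk_copy(const void* dst, const void* src, std::uint32_t size) {
  auto aligned16 = [](const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
  };
  return size % 16 == 0 && aligned16(dst) && aligned16(src);
}

}  // namespace ptx
}  // namespace sketch
#endif  // SKETCH_ENABLE_EXPERIMENTAL_PTX_WRAPPERS
```

By contrast, an `experimental` header path or namespace forces every user to edit their includes and qualified names when the API graduates.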

@miscco
Collaborator

miscco commented Aug 18, 2023

Moreover, I see no benefit in experimental features unless it is clear that we can break them at any time (obviously with a clear justification and a replacement).

@gonzalobg
Collaborator

I think it makes sense to move the header out of experimental, since the plan is not to evolve these APIs, and therefore not to break them.

@jrhemstad
Collaborator

@miscco this is a stop-gap implementation that we fully intend to remove in the future.

@ahendriksen
Contributor Author

Each of the exposed instructions (except perhaps the 2D bulk tensor copy) corresponds 1:1 to the planned exposure in #38, so these wrappers should be painless to deprecate.

@miscco
Collaborator

miscco commented Aug 18, 2023

I would rather not start adding experimental headers; I would use a feature flag instead. That is much easier to remove.

@gonzalobg
Collaborator

> @miscco this is a stop-gap implementation that we fully intend to remove in the future.

@jrhemstad the reason this stop-gap implementation is being added is to teach users how to use it in the programming guide. We can never remove it; that would break users who read the docs and use these APIs.

@jrhemstad
Collaborator

It's no different than any other API in that we are free to deprecate and remove it at a major version change, right?

@gonzalobg
Collaborator

You can try, but given the time between these wrappers landing and their replacement landing, and the extremely high demand that has been expressed for these wrappers, I'm skeptical we will ever be able to remove them.
