Support for sm_90a in <nv/target> API #1270

Closed
3 tasks
jrhemstad opened this issue Jan 10, 2024 · 2 comments · Fixed by #1411

Comments

@jrhemstad
Collaborator

jrhemstad commented Jan 10, 2024

Summary:

Currently, <nv/target> does not support sm_90a. The problem with sm_90a is not its non-numeric nature but the fact that it includes features that might not be supported in future architectures, which breaks the assumptions <nv/target> was built around. Adding support for sm_90a would require a significant redesign of the API.

Requested Feature:

<nv/target> should support sm_90a.

It's important to differentiate between numerical architecture values and feature-specific checks.

Suggestions include introducing new macros like NV_HAS_FEATURE_SM90A or NV_HAS_FEATURE_SM100FOO for feature-specific checks.

The goal is to make it clear and meaningful when writing code for specific architectures and their features.

Next Steps:

  • Discuss the possibility of introducing a new API or mechanism for feature-specific checks.
  • Explore the implementation of NV_HAS_FEATURE_SM90A or similar macros.
  • Define the behavior of IS_EXACTLY and HAS_FEATURE in the context of feature-specific SMs.
@ahendriksen
Contributor

Define the behavior of IS_EXACTLY and HAS_FEATURE in the context of feature-specific SMs.

NV_IS_EXACTLY_SM_90 currently evaluates to true when compiling for SM90a (godbolt). To avoid breaking any existing code that relies on this, NV_IS_EXACTLY_SM_90 should continue to be true for SM90a.
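To illustrate why a purely numeric check cannot tell the two targets apart, here is a minimal sketch. It assumes that nvcc reports `__CUDA_ARCH__` as 900 for both sm_90 and sm_90a, and that it defines `__CUDA_ARCH_FEAT_SM90_ALL` only for sm_90a. The `SKETCH_*` macros and `sketch_*` functions are hypothetical names for illustration, not part of <nv/target>.

```cpp
#include <cassert>

// __CUDA_ARCH__ is only defined while nvcc compiles device code; a plain host
// compile takes the #else branch and the check is false.
#if defined(__CUDA_ARCH__)
#  define SKETCH_IS_EXACTLY_SM_90 (__CUDA_ARCH__ == 900)
#else
#  define SKETCH_IS_EXACTLY_SM_90 0
#endif

// Assumption: this macro is defined only when compiling for sm_90a.
#if defined(__CUDA_ARCH_FEAT_SM90_ALL)
#  define SKETCH_HAS_FEATURE_SM_90a 1
#else
#  define SKETCH_HAS_FEATURE_SM_90a 0
#endif

// Numeric check: would be true for sm_90 and, under the assumption above,
// for sm_90a as well, because both report the same architecture number.
constexpr bool sketch_is_exactly_sm_90() { return SKETCH_IS_EXACTLY_SM_90; }

// Feature check: true only for the architecture-specific target.
constexpr bool sketch_has_feature_sm_90a() { return SKETCH_HAS_FEATURE_SM_90a; }
```

Under these assumptions the feature check implies the numeric check but not the other way around, which is why the two cannot be treated as mutually exclusive alternatives.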

It would be confusing if, with -arch sm90a specified, NV_IS_EXACTLY_SM_90 and NV_IS_EXACTLY_SM_90a were both true at the same time. In the code example below, the SM90a-specific code would never run:

NV_DISPATCH_TARGET(
  NV_IS_EXACTLY_SM_90, (...),
  NV_IS_EXACTLY_SM_90a, (/* this will never run */)
)

The proposal by @dkolsen-pgi to change the syntax for checking architecture-specific features therefore makes a lot of sense. It would introduce an NV_HAS_FEATURE_SM90A macro that is true only when compiling for SM90a. This will not fix the code example above, but it makes it less confusing for users why the example doesn't work. We would get:

NV_DISPATCH_TARGET(
  NV_IS_EXACTLY_SM_90, (...),
  NV_HAS_FEATURE_SM_90a, (/* this will still not run */)
)
// The fix is to reorder the dispatch targets:
NV_DISPATCH_TARGET(
  NV_HAS_FEATURE_SM_90a, (/* this will run on -arch sm90a */),
  NV_IS_EXACTLY_SM_90, (/* this will run on -arch sm90 */)
)
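The ordering effect can be modeled in plain host C++ as a first-match-wins chain. This is only a sketch of the behavior described above, not how NV_DISPATCH_TARGET is implemented. `Target`, `dispatch`, `generic_first`, and `specific_first` are hypothetical names, and the predicates encode the assumption that sm_90a satisfies both checks.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a compilation target: the SM number plus a flag for
// the architecture-specific "a" variant.
struct Target {
  int sm;
  bool arch_specific;
};

using Predicate = bool (*)(Target);

// Assumed semantics: the numeric check ignores the "a" suffix.
bool is_exactly_sm_90(Target t) { return t.sm == 90; }
bool has_feature_sm_90a(Target t) { return t.sm == 90 && t.arch_specific; }

// Returns the label of the first branch whose predicate matches, like an
// if / else-if chain. Later branches are not considered once one matches.
std::string dispatch(Target t,
                     const std::vector<std::pair<Predicate, std::string>>& branches) {
  for (const auto& branch : branches) {
    if (branch.first(t)) return branch.second;
  }
  return "none";
}

// Generic check listed first: it shadows the sm_90a branch.
std::string generic_first(Target t) {
  return dispatch(t, {{is_exactly_sm_90, "generic"}, {has_feature_sm_90a, "sm_90a"}});
}

// Most specific check listed first: both branches are reachable.
std::string specific_first(Target t) {
  return dispatch(t, {{has_feature_sm_90a, "sm_90a"}, {is_exactly_sm_90, "generic"}});
}
```

In this model, `generic_first` never returns "sm_90a" for any target, while `specific_first` picks the sm_90a branch for `Target{90, true}` and falls back to the generic branch for `Target{90, false}`.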

As @wmaxey noted in internal discussion, NV_HAS_FEATURE_SM90A would behave exactly like a hypothetical NV_IS_EXACTLY_SM90a macro, just with different syntax.

Tagging @gonzalobg

@ahendriksen
Contributor

One thing I see coming up is a feature that is arch-specific on multiple architectures (I can name at least one PTX instruction). To use such a feature, I don't want to have to write:

NV_DISPATCH_TARGET(
  NV_HAS_FEATURE_SM_90a, (/* Code block X. This will run on -arch sm90a */),
  NV_HAS_FEATURE_SM_100a, (/* Repeat of code block X. This will run on -arch sm100a */),
  NV_PROVIDES_SM_90, (/* Code block Y. This will run on -arch sm90 and -arch sm100 */)
)

It would be great if we could have something like this:

NV_DISPATCH_TARGET(
  NV_HAS_FEATURE_SM_90a || NV_HAS_FEATURE_SM_100a, (
         /* Code block X. This will run on -arch sm90a and on -arch sm100a */
  ),
  NV_PROVIDES_SM_90, (/* Code block Y. This will run on -arch sm90 and -arch sm100 */)
)
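The intended selection logic can be written out as a small host-side model. This is a sketch under the same assumptions as the example above, not a proposal for the macro implementation. `Target`, `Block`, and `select_block` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical model of a compilation target.
struct Target {
  int sm;
  bool arch_specific;
};

constexpr bool has_feature_sm_90a(Target t) { return t.sm == 90 && t.arch_specific; }
constexpr bool has_feature_sm_100a(Target t) { return t.sm == 100 && t.arch_specific; }
constexpr bool provides_sm_90(Target t) { return t.sm >= 90; }

enum class Block { X, Y, None };

// A single branch covers both arch-specific targets, so code block X is
// written once. The generic branch comes second as the fallback.
constexpr Block select_block(Target t) {
  return (has_feature_sm_90a(t) || has_feature_sm_100a(t)) ? Block::X
         : provides_sm_90(t)                               ? Block::Y
                                                           : Block::None;
}

static_assert(select_block(Target{90, true}) == Block::X, "sm90a takes block X");
static_assert(select_block(Target{90, false}) == Block::Y, "sm90 takes block Y");
```

The point of the combined predicate is that block X appears once in the source while still being reachable from either arch-specific target.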


2 participants