[RFC] Handle sm_* which is no longer supported by CUDA / ptxas exec check or configure check? #30

Open
tob2 opened this issue Mar 3, 2022 · 2 comments · Fixed by #33

Comments

@tob2 (Contributor) commented Mar 3, 2022

@tschwinge @vries
From https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591154.html

In PR97348, we ran into the problem that recent CUDA dropped support for
sm_30, which inhibited the build when CUDA's bin directory was in the PATH,
because the nvptx-tools assembler uses CUDA's ptxas to do PTX verification.
...
Deal with PR97348 in the simplest way possible: when calling the assembler for
sm_30, specify --no-verify.

This has the unfortunate effect that after fixing PR104758 by building
libraries with sm_30, the libraries are no longer verified. This can be
improved upon by:

  • adding a configure test in gcc that tests if CUDA supports sm_30, and
    if so disabling this patch
  • dealing with this in nvptx-tools somehow, either:
    • detect at ptxas execution time that it doesn't support sm_30 (see the
      sketch after this list), or
    • detect this at nvptx-tools configure time.
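A hedged sketch of the exec-time variant (not what nvptx-tools currently does): the `as` wrapper could probe whether the ptxas it found accepts a given sm_* target by asking it to compile a trivial PTX module, and skip verification (or substitute a newer target) if the probe fails. The probe file path and function name below are made up for the sketch; the ptxas options used (--gpu-name, -o) are the documented ones.

```c
/* Sketch: probe at "as" execution time whether the ptxas in PATH still
   accepts a given sm_* target, by compiling a trivial PTX module.
   Returns 1 if ptxas accepted the target, 0 otherwise.  */
#include <stdio.h>
#include <stdlib.h>

static int
ptxas_supports_target (const char *sm)
{
  char ptx[128], cmd[256];
  FILE *f;

  /* An empty module is enough: it compiles iff the target is known.  */
  snprintf (ptx, sizeof ptx,
            ".version 3.1\n.target %s\n.address_size 64\n", sm);

  f = fopen ("/tmp/ptxas-probe.ptx", "w");  /* hypothetical temp file */
  if (f == NULL)
    return 0;
  fputs (ptx, f);
  fclose (f);

  snprintf (cmd, sizeof cmd,
            "ptxas --gpu-name %s -o /dev/null /tmp/ptxas-probe.ptx"
            " > /dev/null 2>&1", sm);
  return system (cmd) == 0;
}

int
main (void)
{
  printf ("sm_30: %s\n",
          ptxas_supports_target ("sm_30") ? "supported" : "not supported");
  return 0;
}
```

A ptxas that has dropped the target rejects the unknown --gpu-name value with a non-zero exit status, so the exit status alone is enough for the decision.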
xionghul pushed a commit to xionghul/gcc that referenced this issue Mar 31, 2022
Newer versions of CUDA no longer support sm_30, and the nvptx-tools 'as'
currently doesn't handle that gracefully when verifying
(SourceryTools/nvptx-tools#30).

There's a --no-verify work-around in place in ASM_SPEC, but that one doesn't
work when using -Wa,--verify on the command line.

Use a more robust workaround: verify using sm_35 when misa=sm_30 is specified
(either implicitly or explicitly).

Tested on nvptx.

gcc/ChangeLog:

2022-03-30  Tom de Vries  <tdevries@suse.de>

	* config/nvptx/nvptx.h (ASM_SPEC): Use "-m sm_35" for -misa=sm_30.
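For illustration only, a spec along these lines could implement that mapping; this is a sketch in GCC spec language, not the exact string in config/nvptx/nvptx.h:

```c
/* Sketch, not the committed nvptx.h definition: pass -misa=<value> through
   to the assembler as "-m <value>", except that sm_30 is rewritten to
   sm_35 so that ptxas-based verification keeps working with CUDA versions
   that have dropped sm_30.  */
#define ASM_SPEC "%{misa=sm_30:-m sm_35; misa=*:-m %*}"
```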
tschwinge added a commit that referenced this issue Apr 7, 2022
…itecture based products is dropped"

This resolves #30 "[RFC] Handle sm_* which is no longer supported by
CUDA / ptxas exec check or configure check?".

Suggested-by: Tom de Vries <tdevries@suse.de>
@tschwinge (Member) commented

Another thing that we could do: instead of invoking command-line ptxas for verification, we could dlopen the CUDA Driver (libcuda), and use that one to load and thus verify the PTX code (just like libgomp nvptx plugin and nvptx-tools nvptx-run are doing): https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html, etc.
That way, we could first check which .target values are actually supported by the current CUDA installation, and only adjust if actually necessary. (Well, or probably not at all, since all current versions of the CUDA Driver actually still do load .target sm_30 code?)
We'd avoid using the "narrow" ptxas command-line interface, and instead get the "full" API. (The latter hopefully doesn't bring its own set of compatibility concerns? Not expecting any, given that the libgomp nvptx plugin and nvptx-tools nvptx-run have been stable for a long time in that regard.)
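A hedged sketch of that idea, with minimal error handling; the symbols resolved are the documented CUDA Driver API entry points (cuInit, cuDeviceGet, cuCtxCreate, cuModuleLoadData), while the function name and return convention are made up for the sketch:

```c
/* Sketch: verify PTX by loading it through the CUDA Driver API, resolved
   at run time with dlopen -- roughly what the libgomp nvptx plugin and
   nvptx-tools' nvptx-run already do.  Unlike ptxas, this needs a working
   driver (libcuda) and a GPU to be present.  */
#include <dlfcn.h>
#include <stddef.h>

typedef int CUresult;           /* 0 == CUDA_SUCCESS */
typedef int CUdevice;
typedef void *CUcontext;
typedef void *CUmodule;

typedef CUresult (*cuInit_t) (unsigned);
typedef CUresult (*cuDeviceGet_t) (CUdevice *, int);
typedef CUresult (*cuCtxCreate_t) (CUcontext *, unsigned, CUdevice);
typedef CUresult (*cuModuleLoadData_t) (CUmodule *, const void *);

/* Return 0 if the PTX loads (and thus verifies), 1 if the driver rejects
   it, -1 if no usable driver/GPU is available.  */
int
verify_ptx_with_driver (const char *ptx)
{
  void *h = dlopen ("libcuda.so.1", RTLD_LAZY);
  if (h == NULL)
    return -1;

  cuInit_t init = (cuInit_t) dlsym (h, "cuInit");
  cuDeviceGet_t device_get = (cuDeviceGet_t) dlsym (h, "cuDeviceGet");
  cuCtxCreate_t ctx_create = (cuCtxCreate_t) dlsym (h, "cuCtxCreate_v2");
  cuModuleLoadData_t module_load
    = (cuModuleLoadData_t) dlsym (h, "cuModuleLoadData");
  if (!init || !device_get || !ctx_create || !module_load)
    return -1;

  CUdevice dev;
  CUcontext ctx;
  CUmodule mod;
  if (init (0) != 0 || device_get (&dev, 0) != 0
      || ctx_create (&ctx, 0, dev) != 0)
    return -1;

  /* The driver JIT-compiles the PTX here; a rejection means this driver
     does not accept the code (e.g. an unsupported .target).  */
  return module_load (&mod, ptx) == 0 ? 0 : 1;
}
```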

How does the CUDA Driver API differ/relate to the PTX Compiler APIs, https://docs.nvidia.com/cuda/ptx-compiler-api/?

@tob2, @vries

@vries (Contributor) commented Apr 12, 2022

Another thing that we could do: instead of invoking command-line ptxas for verification, we could dlopen the CUDA Driver (libcuda), and use that one to load and thus verify the PTX code (just like libgomp nvptx plugin and nvptx-tools nvptx-run are doing): https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html, etc.
That way, we could first check which .target values are actually supported by the current CUDA installation, and only adjust if actually necessary. (Well, or probably not at all, since all current versions of the CUDA Driver actually still do load .target sm_30 code?) We'd avoid using the "narrow" ptxas command-line interface, and instead get the "full" API. (The latter hopefully doesn't bring its own set of compatibility concerns? Not expecting any, given that the libgomp nvptx plugin and nvptx-tools nvptx-run have been stable for a long time in that regard.)

I think I follow your thinking: using the CUDA driver API means you're using a defined API rather than the de facto ptxas interface. Note that what you describe checks which .target values are supported by the CUDA driver API, not by the CUDA runtime API (in other words, ptxas and friends). Those are separate things.

How does the CUDA Driver API differ/relate to the PTX Compiler APIs, https://docs.nvidia.com/cuda/ptx-compiler-api/?

Ah, I didn't know that one. Well, after reading the introduction, this mainly looks like the CUDA runtime API equivalent of the part of the CUDA driver API that does the PTX compilation.

At first glance, it sounds like the part you could use instead of ptxas. But it's also a recent addition, meaning you'd also require a modern CUDA installation, which could somewhat defeat the purpose, or leave only a very narrow range of CUDA versions for which this setup would actually be useful.
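For reference, a hedged sketch of what verification through the PTX Compiler API might look like; it needs neither a driver nor a GPU, but only exists in recent CUDA releases, and the "--gpu-name=..." option string below is an assumption based on the ptxas command-line option:

```c
/* Sketch: verify PTX with the PTX Compiler API (nvPTXCompiler).  Build
   against a recent CUDA toolkit, e.g. linking -lnvptxcompiler_static.  */
#include <stdio.h>
#include <string.h>
#include <nvPTXCompiler.h>

/* Return 0 if the PTX compiles for target SM, 1 if it is rejected,
   -1 on API failure.  */
int
verify_ptx_with_compiler_api (const char *ptx, const char *sm)
{
  nvPTXCompilerHandle compiler;
  char option[64];
  snprintf (option, sizeof option, "--gpu-name=%s", sm);
  const char *options[] = { option };

  if (nvPTXCompilerCreate (&compiler, strlen (ptx), ptx)
      != NVPTXCOMPILE_SUCCESS)
    return -1;

  int ok
    = nvPTXCompilerCompile (compiler, 1, options) == NVPTXCOMPILE_SUCCESS;
  if (!ok)
    {
      /* Print the compiler's error log, if it fits a small buffer.  */
      size_t size;
      char log[1024];
      if (nvPTXCompilerGetErrorLogSize (compiler, &size)
            == NVPTXCOMPILE_SUCCESS
          && size > 0 && size < sizeof log
          && nvPTXCompilerGetErrorLog (compiler, log) == NVPTXCOMPILE_SUCCESS)
        fprintf (stderr, "%s\n", log);
    }
  nvPTXCompilerDestroy (&compiler);
  return ok ? 0 : 1;
}
```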

Note that it's not a good idea to rely on the driver API, because that requires having a driver installed, which doesn't work in a scenario where you use, say, a server-class machine to do heavy toolchain and/or application builds, and then execute on a separate machine with bulky video cards. We don't want to require (for verification purposes) installing the driver on the server-class machine.

tschwinge reopened this Apr 12, 2022
tschwinge added a commit that referenced this issue Sep 4, 2023
..., so that things keep working with CUDA 12.0+.