Open
Description
Something I wrote on Slack that might serve as input...
To explain this particular issue: if you've asked for --cuda-compute-capabilities=9.0, for example, the CUDA sanity check will check if
- device code is present for 9.0 cards. If not, depending on the sanity check configuration options, it will check if PTX code is available that would be compatible with 9.0 cards (i.e. PTX code for <= 9.0, which can be JIT-compiled into 9.0 device code). Having either device code for 9.0 or PTX code for <= 9.0 means the code will at least run (though having only PTX code that then gets JIT-compiled may have a performance impact compared to having actual precompiled 9.0 device code in the binary).
- PTX code is present for 9.0. Note that this is not essential to run the binary on a 9.0 card. It would be needed, though, if you ever want to run this binary on a newer (say 10.0) card in the future. I.e. it provides you with forward compatibility, without having to rebuild your stack for 10.0 (but at the cost of potentially losing some optimization compared to precompiled 10.0 code). Since having PTX code is not essential to running on a 9.0 card, you can tell the sanity check to accept the fact that this PTX code is missing with --cuda-sanity-check-accept-missing-ptx, as the hint suggests.
- there is no device code for architectures other than 9.0. Note that a binary with extra device code is bigger and there might be some extra selection overhead in jumping to the correct device code - though I would personally expect this to be negligible. Note that a binary with both 8.0 and 9.0 device code will run on a 9.0 card, no problem. This is just an optimization concern - and a very small one at that. But: strictly speaking the build system did something other than what you asked for (you asked for only 9.0 code), hence the sanity check tells you about that.
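To make the three checks above concrete, here is a rough Python sketch of the decision logic as I described it. This is not the actual EasyBuild implementation: the function check_binary, its arguments, and the allow_ptx_fallback option are hypothetical names made up for illustration. Only accept_missing_ptx is meant to mirror the --cuda-sanity-check-accept-missing-ptx option mentioned above. The sketch assumes plain "major.minor" compute capabilities and that you already know which device code and PTX code a binary contains.

```python
def parse_cc(cc):
    """Turn a compute capability string like '9.0' into a comparable tuple (9, 0)."""
    major, minor = cc.split(".")
    return int(major), int(minor)


def check_binary(requested, device_code, ptx_code,
                 allow_ptx_fallback=False, accept_missing_ptx=False):
    """Return a list of problems found for one binary (hypothetical helper).

    requested:   the compute capability that was asked for, e.g. '9.0'
    device_code: compute capabilities for which the binary holds device code
    ptx_code:    compute capabilities for which the binary holds PTX code
    """
    target = parse_cc(requested)
    device = {parse_cc(cc) for cc in device_code}
    ptx = {parse_cc(cc) for cc in ptx_code}
    problems = []

    # Check 1: device code for the requested capability, or (if allowed)
    # PTX code for <= requested that can be JIT-compiled for it.
    if target not in device:
        jit_possible = any(cc <= target for cc in ptx)
        if not (allow_ptx_fallback and jit_possible):
            problems.append(f"no device code for {requested}")

    # Check 2: PTX code for the requested capability (forward compatibility).
    if target not in ptx and not accept_missing_ptx:
        problems.append(f"no PTX code for {requested}")

    # Check 3: no device code for capabilities that were not requested.
    extra = sorted(device - {target})
    if extra:
        listed = ", ".join(f"{major}.{minor}" for major, minor in extra)
        problems.append(f"extra device code for {listed}")

    return problems


# A binary with 8.0 and 9.0 device code but no PTX runs fine on a 9.0 card,
# yet a strict check still reports two findings.
print(check_binary("9.0", ["8.0", "9.0"], []))
```

With accept_missing_ptx=True and only 9.0 device code in the binary, the same function returns an empty list, which corresponds to relaxing the check the way the hint suggests.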
This was in response to a question by Terje about how to deal with a failing CUDA sanity check when building on top of EESSI. In EESSI, we have configured the CUDA sanity check to be very strict and it is often reasonable to relax that, but indeed it's important to have docs clearly explaining the implications.