[Core] Report CUDA versions when NVRTC compilation fails #2842
When NVRTC kernel compilation fails, detect whether the linked NVRTC library and the CUDA headers used for compilation are from different CUDA versions, and if so emit an actionable note to stderr pointing the user toward NVTE_CUDA_INCLUDE_DIR / CUDA_HOME / LD_LIBRARY_PATH. The header version is obtained by compiling a tiny probe program that embeds CUDA_VERSION (from cuda.h) into a static_assert failure message, so the macro is resolved by the actual preprocessor rather than by parsing header text. All probe failures are silent; the check is purely informational and never causes a premature error. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Tim Moon <tmoon@nvidia.com>
Still buggy: include_directory_version returns the CUDA runtime version instead of the header version. Signed-off-by: Tim Moon <tmoon@nvidia.com>
The NVRTC probe approach was broken: NVRTC pre-defines CUDART_VERSION to its own version before processing any includes, so the probe always returned the NVRTC version regardless of the headers on the include path. Fix by reading cuda_runtime_api.h as text and parsing the "#define CUDART_VERSION <integer>" line directly. This is immune to NVRTC's internal macro management, and the format has been stable across all CUDA versions. Also decode raw CUDA version integers to "major.minor" strings in the error message for readability. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Tim Moon <tmoon@nvidia.com>
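For illustration, the text-parsing approach described in this commit message could look roughly like the sketch below. The function name `parse_cudart_version` and the choice to take the header contents as a string are assumptions made for the example, not the PR's actual code; the sketch only assumes that the header contains a line of the form `#define CUDART_VERSION <integer>`.

```cpp
#include <sstream>
#include <string>

// Illustrative helper (hypothetical name, not the PR's code): scan header text
// for "#define CUDART_VERSION <integer>" and return the integer, or -1 if no
// such line exists or the value is not an integer.
inline int parse_cudart_version(const std::string &header_text) {
  std::istringstream lines(header_text);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream tokens(line);
    std::string directive, name;
    int value = 0;
    // Whitespace between tokens is skipped, so extra spaces or tabs are fine.
    if (!(tokens >> directive) || directive != "#define") continue;
    // Exact match, so macros such as CUDART_VERSION_MAJOR are ignored.
    if (!(tokens >> name) || name != "CUDART_VERSION") continue;
    if (tokens >> value) return value;
  }
  return -1;
}
```

In practice the caller would read cuda_runtime_api.h from the include directory into a string (for example with std::ifstream) and pass it in. Because nothing is compiled, NVRTC's pre-defined macros cannot influence the result.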
Test that the CUDA include directory is found and that its version matches the compile-time CUDART_VERSION. Also export transformer_engine::cuda::* symbols and tighten the rtc export pattern in the version script. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Tim Moon <tmoon@nvidia.com>
Greptile Summary
This PR improves NVRTC compilation failure diagnostics by reporting three CUDA versions: compile-time (
Confidence Score: 5/5
Safe to merge: the new code only runs on the NVRTC compilation failure path, so it cannot affect the happy path or introduce regressions.
All changes are confined to the diagnostic/error-reporting path triggered only when NVRTC compilation already fails. The version encoding arithmetic (major*1000 + minor*10) is consistent with the standard CUDA convention and with CUDA_VERSION. Error cases (missing headers, unparseable version, failed nvrtcVersion call) are handled gracefully with -1 / not-found rather than crashing. There are no new memory safety issues and no changes to the happy-path kernel compilation flow. No files require special attention.
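As a concrete instance of the encoding mentioned above, CUDA 12.4 is represented as 12*1000 + 4*10 = 12040. A minimal sketch of encoding and decoding follows; the names `encode_cuda_version` and `format_cuda_version`, and the "<not found>" placeholder text, are hypothetical and are not taken from the PR.

```cpp
#include <string>

// Encode a version per the CUDA convention, e.g. (12, 4) -> 12040.
constexpr int encode_cuda_version(int major, int minor) {
  return major * 1000 + minor * 10;
}

// Decode an encoded version to "major.minor". Negative values stand for
// "version could not be determined" in this sketch.
inline std::string format_cuda_version(int version) {
  if (version < 0) return "<not found>";
  const int major = version / 1000;
  const int minor = (version % 1000) / 10;
  return std::to_string(major) + "." + std::to_string(minor);
}
```

With these helpers, `format_cuda_version(encode_cuda_version(11, 8))` yields "11.8", and the -1 sentinel used for probe failures maps to a readable placeholder instead of a misleading number.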
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[nvrtcCompileProgram fails] --> B[Collect CUDA versions]
B --> C1["build_version = CUDA_VERSION (compile-time)"]
B --> C2["nvrtcVersion() → nvrtc_version"]
B --> C3["include_directory_version() → header_version"]
C3 --> D{include_dir empty?}
D -- yes --> E[return -1]
D -- no --> F[Open cuda_runtime_api.h]
F --> G{Parse CUDART_VERSION?}
G -- success --> H[return version int]
G -- failure --> I[return -1]
C1 & C2 & C3 --> J[Log all three versions via version_string]
J --> K{nvrtc_version != header_version?}
K -- yes --> L[Append mismatch warning with include_directory path]
K -- no --> M[Skip warning]
L & M --> N[Append NVRTC compile log]
N --> O[Print to stderr and throw]
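The tail of the flow above (log all three versions, append a warning only on mismatch, then append the compile log) can be sketched as follows. This is an assumption-laden illustration rather than the PR's implementation: the function names, parameter order, and message wording are all hypothetical, and the real code prints to stderr and throws instead of returning a string.

```cpp
#include <sstream>
#include <string>

// Hypothetical formatter: 12040 -> "12.4", negative -> "unknown".
inline std::string show_version(int version) {
  if (version < 0) return "unknown";
  return std::to_string(version / 1000) + "." +
         std::to_string((version % 1000) / 10);
}

// Hypothetical message builder mirroring the flowchart's J -> K -> L/M -> N.
inline std::string compile_failure_message(int build_version, int nvrtc_version,
                                           int header_version,
                                           const std::string &include_dir,
                                           const std::string &nvrtc_log) {
  std::ostringstream msg;
  msg << "NVRTC compilation failed\n"
      << "  CUDA version at build time: " << show_version(build_version) << "\n"
      << "  NVRTC library version:      " << show_version(nvrtc_version) << "\n"
      << "  CUDA header version:        " << show_version(header_version) << "\n";
  // Only the NVRTC/header comparison triggers the warning, as in the flowchart.
  if (nvrtc_version != header_version) {
    msg << "  Warning: NVRTC and CUDA header versions differ (headers from "
        << (include_dir.empty() ? "<no include directory>" : include_dir)
        << ")\n";
  }
  msg << nvrtc_log;
  return msg.str();
}
```

Building the message separately from the code that prints and throws keeps the diagnostic logic testable without a CUDA installation.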
Reviews (4): Last reviewed commit: "Merge branch 'main' into tmoon/nvrtc-ver..."
  return -1;
}

// Parse CUDART_VERSION from cuda_runtime_api.h.
I don't really like how we parse the header as a text file. However, when I tried compiling a test program with NVRTC it would override the header's CUDART_VERSION macro with the CUDA Runtime version.
/te-ci
Suggestion from @ptrendx Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Test required exposing CUDA utility functions externally, which is beyond the scope of this work. Signed-off-by: Tim Moon <tmoon@nvidia.com>
/te-ci
/te-ci
Description
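NVRTC compilation involves three CUDA versions:
- The CUDA version that Transformer Engine was compiled against (CUDA_VERSION)
- The CUDA headers in the include directory used for NVRTC compilation
- The NVRTC library linked at runtime, libnvrtc.so

If the user's system is misconfigured, these CUDA versions may be inconsistent and cause strange errors (e.g. #1018). This PR reports each of the CUDA versions to help with debugging.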
Closes #1018.