[Core] Report CUDA versions when NVRTC compilation fails#2842

Merged
ptrendx merged 10 commits into
NVIDIA:mainfrom
timmoon10:tmoon/nvrtc-version-check
May 13, 2026
Conversation

@timmoon10
Collaborator

Description

NVRTC compilation involves three CUDA versions:

  • Compile-time CUDA: used to compile Transformer Engine
  • CUDA Runtime: linked during runtime and visible to libnvrtc.so
  • Run-time CUDA headers: included in run-time NVRTC compilations

If the user's system is misconfigured, these CUDA versions may be inconsistent and cause strange errors (e.g. #1018). This PR reports each of the CUDA versions to help with debugging.

Closes #1018.

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

  • Report CUDA versions when NVRTC compilation fails

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

timmoon10 and others added 4 commits April 4, 2026 01:54
When NVRTC kernel compilation fails, detect whether the linked NVRTC
library and the CUDA headers used for compilation are from different
CUDA versions, and if so emit an actionable note to stderr pointing
the user toward NVTE_CUDA_INCLUDE_DIR / CUDA_HOME / LD_LIBRARY_PATH.

The header version is obtained by compiling a tiny probe program that
embeds CUDA_VERSION (from cuda.h) into a static_assert failure message,
so the macro is resolved by the actual preprocessor rather than by
parsing header text.  All probe failures are silent; the check is
purely informational and never causes a premature error.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Still buggy, include_directory_version returns CUDA runtime version instead of header version.

Signed-off-by: Tim Moon <tmoon@nvidia.com>
The NVRTC probe approach was broken: NVRTC pre-defines CUDART_VERSION
to its own version before processing any includes, so the probe always
returned the NVRTC version regardless of the headers on the include path.

Fix by reading cuda_runtime_api.h as text and parsing the
"#define CUDART_VERSION <integer>" line directly. This is immune to
NVRTC's internal macro management, and the format has been stable across
all CUDA versions.

Also decode raw CUDA version integers to "major.minor" strings in the
error message for readability.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Test that the CUDA include directory is found and that its version
matches the compile-time CUDART_VERSION.

Also export transformer_engine::cuda::* symbols and tighten the rtc
export pattern in the version script.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
@timmoon10 timmoon10 requested a review from Oleg-Goncharov April 7, 2026 00:48
@timmoon10 timmoon10 added the enhancement New feature or request label Apr 7, 2026
@greptile-apps
Contributor

greptile-apps Bot commented Apr 7, 2026

Greptile Summary

This PR improves NVRTC compilation failure diagnostics by reporting three CUDA versions — compile-time (CUDA_VERSION), run-time NVRTC (nvrtcVersion), and run-time CUDA headers (include_directory_version) — when a compilation error occurs, along with a mismatch warning when the NVRTC and header versions differ.

  • Adds include_directory_version() to cuda_runtime.cpp/h, which parses CUDART_VERSION from cuda_runtime_api.h at the configured include path.
  • In rtc.cpp, the NVRTC failure handler now prepends version info and an optional mismatch warning to the diagnostic log before surfacing the compile error.
  • Version integers are decoded via a local version_string lambda that returns "<not found>" for -1 (undetected) and "major.minor" otherwise.
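
As an illustration of the decoding step described in the last bullet, here is a minimal sketch written as a free function rather than a lambda. The name `version_string` comes from the summary above, but the body is an assumption, not the code merged in this PR. It relies on the standard CUDA encoding of major * 1000 + minor * 10, and it treats any negative value (not only -1) as undetected.

```cpp
#include <string>

// Hypothetical sketch, not the PR's actual lambda.
// CUDA encodes versions as major * 1000 + minor * 10 (e.g. 12040 for 12.4).
// Negative values are treated as "version could not be detected".
std::string version_string(int version) {
  if (version < 0) {
    return "<not found>";
  }
  const int major = version / 1000;
  const int minor = (version % 1000) / 10;
  return std::to_string(major) + "." + std::to_string(minor);
}
```

With this sketch, `version_string(12040)` yields "12.4" and `version_string(-1)` yields "<not found>".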

Confidence Score: 5/5

Safe to merge — the new code only runs on the NVRTC compilation failure path, so it cannot affect the happy path or introduce regressions.

All changes are confined to the diagnostic/error-reporting path triggered only when NVRTC compilation already fails. The version encoding arithmetic (major * 1000 + minor * 10) is consistent with the standard CUDA convention and with CUDA_VERSION. Error cases (missing headers, unparseable version, failed nvrtcVersion call) are handled gracefully with -1 / not-found rather than crashing. No new memory safety issues, no changes to the happy-path kernel compilation flow.

No files require special attention.

Important Files Changed

Filename Overview
transformer_engine/common/util/cuda_runtime.cpp Adds include_directory_version() that reads cuda_runtime_api.h and parses CUDART_VERSION; correct error handling and edge cases for missing/unparseable headers.
transformer_engine/common/util/cuda_runtime.h Declaration of include_directory_version() with accurate doc-comment; no issues.
transformer_engine/common/util/rtc.cpp NVRTC failure handler extended to prepend version diagnostics and a conditional mismatch warning; version encoding arithmetic matches the standard CUDA convention.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[nvrtcCompileProgram fails] --> B[Collect CUDA versions]
    B --> C1["build_version = CUDA_VERSION (compile-time)"]
    B --> C2["nvrtcVersion() → nvrtc_version"]
    B --> C3["include_directory_version() → header_version"]
    C3 --> D{include_dir empty?}
    D -- yes --> E[return -1]
    D -- no --> F[Open cuda_runtime_api.h]
    F --> G{Parse CUDART_VERSION?}
    G -- success --> H[return version int]
    G -- failure --> I[return -1]
    C1 & C2 & C3 --> J[Log all three versions via version_string]
    J --> K{nvrtc_version != header_version?}
    K -- yes --> L[Append mismatch warning with include_directory path]
    K -- no --> M[Skip warning]
    L & M --> N[Append NVRTC compile log]
    N --> O[Print to stderr and throw]
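
The reporting branch of the flowchart can be sketched roughly as follows. This is an illustrative approximation only: the function name `cuda_version_report`, its parameters, and the message wording are all hypothetical and are not taken from rtc.cpp. It assumes the three versions have already been collected as integers in the major * 1000 + minor * 10 encoding, with -1 meaning not found.

```cpp
#include <sstream>
#include <string>

// Hypothetical sketch of the diagnostic preamble; names and wording are
// illustrative, not the text emitted by Transformer Engine.
std::string cuda_version_report(int build_version, int nvrtc_version,
                                int header_version,
                                const std::string& include_dir) {
  // Decode major * 1000 + minor * 10 into "major.minor"; negative means unknown.
  auto fmt = [](int v) -> std::string {
    if (v < 0) return "<not found>";
    return std::to_string(v / 1000) + "." + std::to_string((v % 1000) / 10);
  };
  std::ostringstream msg;
  msg << "Compile-time CUDA version: " << fmt(build_version) << "\n"
      << "Run-time NVRTC version: " << fmt(nvrtc_version) << "\n"
      << "Run-time CUDA header version: " << fmt(header_version) << "\n";
  // Warn only when the NVRTC library and the headers disagree.
  if (nvrtc_version != header_version) {
    msg << "Warning: NVRTC and CUDA header versions differ (include directory: "
        << include_dir << ")\n";
  }
  return msg.str();
}
```

In this sketch the caller would append the NVRTC compile log to the returned string before printing it to stderr and throwing, as in the last two flowchart nodes.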

Reviews (4): Last reviewed commit: "Merge branch 'main' into tmoon/nvrtc-ver..."

Comment thread transformer_engine/common/util/rtc.cpp
Comment thread transformer_engine/common/util/cuda_runtime.cpp
Comment thread transformer_engine/common/libtransformer_engine.version Outdated
return -1;
}

// Parse CUDART_VERSION from cuda_runtime_api.h.
Collaborator Author

I don't really like how we parse the header as a text file. However, when I tried compiling a test program with NVRTC it would override the header's CUDART_VERSION macro with the CUDA Runtime version.
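
For readers following along, the text-parsing approach discussed in this comment might look something like the sketch below. The function name `parse_cudart_version` and its stream-based signature are hypothetical and are not the PR's include_directory_version(). It assumes the macro appears on a single line as `#define CUDART_VERSION <integer>` with no space between `#` and `define`, and it returns -1 when no such line is found.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical sketch: scan header text for "#define CUDART_VERSION <integer>"
// and return the integer, or -1 if no matching line exists.
int parse_cudart_version(std::istream& header) {
  std::string line;
  while (std::getline(header, line)) {
    std::istringstream tokens(line);
    std::string directive, name;
    int value = 0;
    // Whitespace-separated tokens: "#define", macro name, integer value.
    if (tokens >> directive >> name >> value && directive == "#define" &&
        name == "CUDART_VERSION") {
      return value;
    }
  }
  return -1;
}
```

Taking a `std::istream&` keeps the sketch testable on in-memory strings; in practice a caller would pass a `std::ifstream` opened on cuda_runtime_api.h inside the configured include directory.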

@timmoon10
Collaborator Author

/te-ci

Comment thread transformer_engine/common/util/rtc.cpp Outdated
timmoon10 and others added 3 commits April 7, 2026 10:51
Suggestion from @ptrendx

Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Test required exposing CUDA utility functions externally, which is beyond the scope of this work.

Signed-off-by: Tim Moon <tmoon@nvidia.com>
@timmoon10
Collaborator Author

/te-ci

@ptrendx
Member

ptrendx commented May 12, 2026

/te-ci

@ptrendx ptrendx merged commit c3a1d30 into NVIDIA:main May 13, 2026
38 of 42 checks passed

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Error in NVRTC compilation

2 participants