[REFACTOR][RUNTIME] Relocate nvtx.h to tvm/support/cuda and make it header-only #19621
Merged
…eader-only

The NVTXScopedRange utility is a thin RAII wrapper over nvtxRangePush/Pop with a no-op fallback when NVTX is not enabled. The two function bodies and the conditional include of <nvtx3/nvToolsExt.h> fit naturally inline in the header, eliminating the separate translation unit and its TVM_RUNTIME_DLL export annotations.

- Move include/tvm/runtime/nvtx.h to include/tvm/support/cuda/nvtx.h under namespace tvm::support; delete src/runtime/nvtx.cc.
- Inline the constructor/destructor; gate the real-vs-stub split with TVM_NVTX_ENABLED in the header.
- Switch the CMake gate from a per-file COMPILE_DEFINITIONS on nvtx.cc to a global add_compile_definitions(TVM_NVTX_ENABLED=1) when USE_CUDA AND USE_NVTX, so every TU that includes the header agrees on the definition.
- Update the three call-site files (vm.cc, paged_kv_cache.cc, attn_utils.h) to the new include path and qualify NVTXScopedRange as support::NVTXScopedRange.
Code Review
This pull request refactors the NVTX scoped range utility to be header-only, moving it from the tvm::runtime namespace to tvm::support and relocating the header to include/tvm/support/cuda/nvtx.h. This change eliminates the need for a separate source file implementation and simplifies compilation. The review feedback points out a potential segmentation fault if a null pointer is passed to nvtxRangePush in the inline constructor, suggesting a safe fallback to an empty string.
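The null-name fallback suggested in the review could look roughly like the sketch below. It is not the code from the PR: `OpenRanges`, `FakeRangePush`, `FakeRangePop`, and `ScopedRangeSketch` are hypothetical stand-ins that record calls in memory, so the example runs without the NVTX library.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical in-memory record of currently open ranges (not part of NVTX).
inline std::vector<std::string>& OpenRanges() {
  static std::vector<std::string> ranges;
  return ranges;
}

// Hypothetical stand-in for nvtxRangePush. std::string cannot be built from
// a null pointer, so this stand-in also requires a non-null message.
inline int FakeRangePush(const char* message) {
  OpenRanges().emplace_back(message);
  return static_cast<int>(OpenRanges().size()) - 1;
}

// Hypothetical stand-in for nvtxRangePop.
inline int FakeRangePop() {
  assert(!OpenRanges().empty());
  OpenRanges().pop_back();
  return static_cast<int>(OpenRanges().size());
}

class ScopedRangeSketch {
 public:
  // Fall back to an empty string when the caller passes a null name.
  explicit ScopedRangeSketch(const char* name) {
    FakeRangePush(name != nullptr ? name : "");
  }
  ~ScopedRangeSketch() { FakeRangePop(); }

  // A scoped range owns exactly one push/pop pair, so copying is disabled.
  ScopedRangeSketch(const ScopedRangeSketch&) = delete;
  ScopedRangeSketch& operator=(const ScopedRangeSketch&) = delete;
};
```

Guarding in the constructor keeps the check in one place, so call sites do not each need to validate the name before opening a range.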
Per the in-tree convention:

- Member functions defined inside the class body are implicitly inline; drop the redundant `inline` keyword.
- For explicit inline at definition sites outside the class, use TVM_FFI_INLINE instead of the raw `inline` keyword.
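A small illustration of the two cases in this convention, written as a sketch rather than as TVM code. `SKETCH_INLINE` is a hypothetical stand-in for TVM_FFI_INLINE (whose definition is not shown in this thread), and `CounterSketch` and `IncrementTwice` are invented names.

```cpp
// Hypothetical stand-in for TVM_FFI_INLINE; the real macro's definition is
// not shown here, so this sketch simply maps it to the plain keyword.
#define SKETCH_INLINE inline

class CounterSketch {
 public:
  // Defined inside the class body: implicitly inline, no keyword needed.
  void Increment() { ++count_; }
  int count() const { return count_; }

 private:
  int count_ = 0;
};

// Defined in a header but outside any class: needs an explicit inline
// marker so several translation units can include the same definition.
SKETCH_INLINE int IncrementTwice(CounterSketch* counter) {
  counter->Increment();
  counter->Increment();
  return counter->count();
}
```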
… include

web/emcc/wasm_runtime.cc directly #includes individual .cc files to assemble a single TU for emcc. Now that src/runtime/nvtx.cc is gone (folded into the header-only include/tvm/support/cuda/nvtx.h), drop the stale include. The new inline header is already pulled in transitively through vm.cc and paged_kv_cache.cc, which the amalgamation also includes.
spectrometerHBH approved these changes May 27, 2026
Summary
The NVTXScopedRange utility is a thin RAII wrapper over nvtxRangePush/Pop
with a no-op fallback when NVTX is not enabled. The two function bodies
and the conditional include of `<nvtx3/nvToolsExt.h>` fit naturally `inline`
in the header, eliminating the separate translation unit and its
`TVM_RUNTIME_DLL` export annotations.

- Move `include/tvm/runtime/nvtx.h` to `include/tvm/support/cuda/nvtx.h` under namespace `tvm::support`; delete `src/runtime/nvtx.cc`.
- Inline the constructor/destructor; gate the real-vs-stub split with `TVM_NVTX_ENABLED` in the header.
- Switch the CMake gate from a per-file `COMPILE_DEFINITIONS` on `nvtx.cc` to a global `add_compile_definitions(TVM_NVTX_ENABLED=1)` when `USE_CUDA AND USE_NVTX`, so every TU that includes the header agrees on the definition.
- Update the three call-site files (`vm.cc`, `paged_kv_cache.cc`, `attn_utils.h`) to the new include path and qualify `NVTXScopedRange` as `support::NVTXScopedRange`.
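As a rough sketch of the header-only shape described above (not the actual contents of `include/tvm/support/cuda/nvtx.h`), the real-vs-stub split could be gated as follows. The `sketch` namespace and the `TimedDouble` helper are hypothetical, and the exact form of the `#if` test is an assumption; only `TVM_NVTX_ENABLED`, `nvtxRangePush`/`nvtxRangePop`, and the `<nvtx3/nvToolsExt.h>` include come from the description.

```cpp
// Only pull in NVTX when the build defines TVM_NVTX_ENABLED to a nonzero
// value (assumed here to be set globally by the build system).
#if defined(TVM_NVTX_ENABLED) && TVM_NVTX_ENABLED
#include <nvtx3/nvToolsExt.h>
#endif

namespace sketch {  // hypothetical namespace, chosen to avoid clashing with TVM

class NVTXScopedRange {
 public:
#if defined(TVM_NVTX_ENABLED) && TVM_NVTX_ENABLED
  // Real path: open a range on construction and close it on destruction.
  explicit NVTXScopedRange(const char* name) { nvtxRangePush(name); }
  ~NVTXScopedRange() { nvtxRangePop(); }
#else
  // Stub path: same interface, no NVTX dependency, no work done.
  explicit NVTXScopedRange(const char* /*name*/) {}
  ~NVTXScopedRange() {}
#endif

  NVTXScopedRange(const NVTXScopedRange&) = delete;
  NVTXScopedRange& operator=(const NVTXScopedRange&) = delete;
};

// Hypothetical call site: the range covers the whole function body.
inline int TimedDouble(int x) {
  NVTXScopedRange range("TimedDouble");
  return x * 2;
}

}  // namespace sketch
```

Because both variants of the class live in the header, two translation units compiled with different `TVM_NVTX_ENABLED` settings would see different definitions of the same class, which is why the description moves the definition to a global compile definition.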