Skip to content

[REFACTOR][RUNTIME] Relocate nvtx.h to tvm/support/cuda and make it header-only#19621

Merged
spectrometerHBH merged 3 commits into
apache:mainfrom
tqchen:tvm-nvtx-support-cuda
May 27, 2026
Merged

[REFACTOR][RUNTIME] Relocate nvtx.h to tvm/support/cuda and make it header-only#19621
spectrometerHBH merged 3 commits into
apache:mainfrom
tqchen:tvm-nvtx-support-cuda

Conversation

@tqchen

@tqchen tqchen commented May 27, 2026

Copy link
Copy Markdown
Member

Summary

The NVTXScopedRange utility is a thin RAII wrapper over nvtxRangePush/Pop
with a no-op fallback when NVTX is not enabled. The two function bodies
and the conditional include of <nvtx3/nvToolsExt.h> fit naturally inline
in the header, eliminating the separate translation unit and its
TVM_RUNTIME_DLL export annotations.

  • Move include/tvm/runtime/nvtx.h to include/tvm/support/cuda/nvtx.h
    under namespace tvm::support; delete src/runtime/nvtx.cc.
  • Inline the constructor/destructor; gate the real-vs-stub split with
    TVM_NVTX_ENABLED in the header.
  • Switch the CMake gate from a per-file COMPILE_DEFINITIONS on
    nvtx.cc to a global add_compile_definitions(TVM_NVTX_ENABLED=1)
    when USE_CUDA AND USE_NVTX, so every TU that includes the header
    agrees on the definition.
  • Update the three call-site files (vm.cc, paged_kv_cache.cc,
    attn_utils.h) to the new include path and qualify NVTXScopedRange
    as support::NVTXScopedRange.

…eader-only

The NVTXScopedRange utility is a thin RAII wrapper over
nvtxRangePush/Pop with a no-op fallback when NVTX is not enabled. The
two function bodies and the conditional include of <nvtx3/nvToolsExt.h>
fit naturally inline in the header, eliminating the separate translation
unit and its TVM_RUNTIME_DLL export annotations.

- Move include/tvm/runtime/nvtx.h to include/tvm/support/cuda/nvtx.h
  under namespace tvm::support; delete src/runtime/nvtx.cc.
- Inline the constructor/destructor; gate the real-vs-stub split with
  TVM_NVTX_ENABLED in the header.
- Switch the CMake gate from a per-file COMPILE_DEFINITIONS on nvtx.cc
  to a global add_compile_definitions(TVM_NVTX_ENABLED=1) when USE_CUDA
  AND USE_NVTX, so every TU that includes the header agrees on the
  definition.
- Update the three call-site files (vm.cc, paged_kv_cache.cc,
  attn_utils.h) to the new include path and qualify NVTXScopedRange as
  support::NVTXScopedRange.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the NVTX scoped range utility to be header-only, moving it from the tvm::runtime namespace to tvm::support and relocating the header to include/tvm/support/cuda/nvtx.h. This change eliminates the need for a separate source file implementation and simplifies compilation. The review feedback points out a potential segmentation fault if a null pointer is passed to nvtxRangePush in the inline constructor, suggesting a safe fallback to an empty string.

Comment thread include/tvm/support/cuda/nvtx.h
tqchen added 2 commits May 27, 2026 13:25
Per the in-tree convention:
- Member functions defined inside the class body are implicitly
  inline; drop the redundant `inline` keyword.
- For explicit inline at definition sites outside the class, use
  TVM_FFI_INLINE instead of the raw `inline` keyword.
… include

web/emcc/wasm_runtime.cc directly #includes individual .cc files to
assemble a single TU for emcc. Now that src/runtime/nvtx.cc is gone
(folded into the header-only include/tvm/support/cuda/nvtx.h), drop
the stale include. The new inline header is already pulled in
transitively through vm.cc and paged_kv_cache.cc which the amalgamation
also includes.
@spectrometerHBH spectrometerHBH merged commit 66a4bcd into apache:main May 27, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants