
[Contrib] Fix CUDA contrib build after FFI/header cleanups #19539

Merged
tlopex merged 1 commit into apache:main from MasterJH5574:tvm-dev/2026-05-11-contrib on May 12, 2026
Conversation

@MasterJH5574
Contributor

Six CUDA sources in src/runtime/contrib used LOG(FATAL) via transitive includes that #19483 trimmed; add the explicit <tvm/runtime/logging.h> include to thrust.cu, attention_kernels.cu, and the four cutlass kernel headers (fp16/fp8 sm90/sm100, gemm_runner, fp8_groupwise_scaled_gemm).
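The hazard being fixed — code that compiles only because some other header happens to pull in the logging macros transitively — can be illustrated with a self-contained toy. The `FatalStream` below is a simplified local stand-in, not TVM's implementation; in TVM the real macro comes from `<tvm/runtime/logging.h>` and the fix is simply to include that header explicitly in each file that uses `LOG(FATAL)`:

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Toy stand-in for a LOG(FATAL)-style macro: stream a message, then
// throw when the temporary is destroyed at the end of the statement.
// In TVM the real macro lives in <tvm/runtime/logging.h>; relying on a
// transitive include of that header is exactly what broke after #19483.
struct FatalStream {
  std::ostringstream msg;
  template <typename T>
  FatalStream& operator<<(const T& v) {
    msg << v;
    return *this;
  }
  ~FatalStream() noexcept(false) { throw std::runtime_error(msg.str()); }
};
#define TOY_LOG_FATAL FatalStream{}

int CheckedDiv(int a, int b) {
  if (b == 0) TOY_LOG_FATAL << "division by zero: " << a << " / " << b;
  return a / b;
}
```

If the header defining the macro stops arriving transitively, every such use site fails to compile, which is why each affected `.cu` file now includes it directly.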

cache_kernels.cu used the bare Array{...} alias that #19483 removed; switch to ffi::Array{...}.
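The `Array` change is purely a qualification fix at the call sites. A minimal sketch of the pattern, using a local toy container rather than TVM's actual `ffi::Array`:

```cpp
#include <cstddef>
#include <initializer_list>
#include <vector>

// Toy stand-in: once a bare `Array` alias is removed from the enclosing
// namespace, call sites must spell out the ffi namespace explicitly.
namespace ffi {
template <typename T>
class Array {
 public:
  Array(std::initializer_list<T> init) : data_(init) {}
  size_t size() const { return data_.size(); }
  const T& operator[](size_t i) const { return data_[i]; }

 private:
  std::vector<T> data_;
};
}  // namespace ffi

// Before: Array<int> pages{1, 2, 3};   // bare alias, no longer available
// After:
ffi::Array<int> MakePages() { return ffi::Array<int>{1, 2, 3}; }
```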

attention_kernels.cu registered FFI functions whose parameters were raw DLTensor*; the new reflection registry requires TypeSchema, so wrap both TVM_FFI_STATIC_INIT_BLOCK registrations to take Tensor and forward to the unchanged launchers via GetDLTensorPtr() (with const_cast for the output tensors, matching the mt_random_engine / cudnn pattern).


@gemini-code-assist (bot) left a comment


Code Review

This pull request adds logging headers to several CUTLASS and vLLM source files and refactors the vLLM kernel registrations to use the Tensor type. The review flags that the unqualified Tensor in attention_kernels.cu is ambiguous and will likely fail to compile, since tvm::runtime::Tensor lacks the GetDLTensorPtr() method; qualifying it explicitly as ffi::Tensor is recommended.

Comment threads (3) on src/runtime/contrib/vllm/attention_kernels.cu

@gemini-code-assist (bot) left a comment


Code Review

This pull request updates several CUDA contrib modules (CUTLASS, Thrust, and vLLM) by adding the tvm/runtime/logging.h header and migrating vLLM kernels to use the new FFI Tensor and ffi::Array types. The changes include updating function signatures and registration logic to handle the transition from raw DLTensor pointers. Feedback suggests investigating if the Tensor class provides a more direct way to access mutable DLTensor pointers to avoid the verbose const_cast currently used in the attention kernels.
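One possible shape for that feedback is a mutable accessor on the wrapper, so output parameters skip the `const_cast` entirely. The accessor name below is hypothetical and not TVM API as of this PR, and the classes are local toys:

```cpp
#include <cstdint>

// Hypothetical sketch: if the Tensor wrapper exposed a non-const
// accessor, wrappers could pass outputs without const_cast.
struct DLToy {
  float* data;
  int64_t size;
};

class MutableTensorToy {
 public:
  explicit MutableTensorToy(DLToy* t) : t_(t) {}
  const DLToy* GetDLTensorPtr() const { return t_; }
  // Hypothetical mutable accessor (not present in TVM as of this PR).
  DLToy* GetMutableDLTensorPtr() { return t_; }

 private:
  DLToy* t_;
};

// Unchanged raw-pointer launcher.
void FillLauncher(DLToy* out, float v) {
  for (int64_t i = 0; i < out->size; ++i) out->data[i] = v;
}

// Wrapper using the hypothetical accessor: no const_cast needed.
void FillFFI(MutableTensorToy& out, float v) {
  FillLauncher(out.GetMutableDLTensorPtr(), v);
}
```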

Comment thread src/runtime/contrib/vllm/attention_kernels.cu
@MasterJH5574 force-pushed the tvm-dev/2026-05-11-contrib branch from c0122a6 to 90683af on May 11, 2026 at 21:48
@tlopex merged commit dfc9fc0 into apache:main on May 12, 2026
9 checks passed

2 participants