[CUDA] FpA IntB Gemm Kernel Test #25109

tianleiwu · 2025-06-18T22:00:05Z

Enhance MatMulNBits CUDA kernel testing:
(1) Add kernel tests for the different CUDA kernels used in MatMulNBits.
(2) Refactor the gemm profiler to use the CUDA allocator.
(3) Add verbose logging macros.
(4) Adjust the build to speed up compilation when sm90 is excluded.

Example kernel test output:

add fpA intB gemm kernel test

d51f07f

tianleiwu marked this pull request as draft June 18, 2025 22:03

tianleiwu added 2 commits June 18, 2025 17:49

use size_t to avoid int overflow

339beb1
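
For context on the commit above, here is a minimal sketch of the kind of overflow that `size_t` avoids. It is not the code from this PR: `ElementCount` and `ProductOverflowsInt` are hypothetical helpers, and the sketch assumes a platform where `size_t` is 64 bits wide. When the dimensions of a large GEMM are multiplied as `int`, the product can exceed the `int` range, so each operand is widened before the multiplication.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper (not from the PR): element count of an n x k matrix.
// Each operand is cast to size_t BEFORE multiplying, so the product is
// computed in size_t arithmetic rather than in int.
inline std::size_t ElementCount(int n, int k) {
  return static_cast<std::size_t>(n) * static_cast<std::size_t>(k);
}

// Hypothetical helper: reports whether n * k would not fit in an int.
// The product is computed in 64-bit signed arithmetic for the comparison.
inline bool ProductOverflowsInt(int n, int k) {
  const std::int64_t product =
      static_cast<std::int64_t>(n) * static_cast<std::int64_t>(k);
  return product > std::numeric_limits<int>::max();
}
```

Note that writing `size_t count = n * k;` with `int` operands would not help, because the multiplication is still carried out in `int` and only the already-overflowed result is converted.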

minor change

902eaa6
