Support importing both nvfuser and nvfuser_direct modules #4722
jjsjann123
left a comment
LGTM.
OOC, I remember @wujingyue mentioning some singleton issues when we import both libs. Do we have a repro/issue on that?
The singleton issue was caused by cleaning up the communicator. I'm not sure what kind of errors might occur with other singletons. I'm fine with cherry-picking this PR for the inference demo and keeping the two modules separate.
Not suggesting we hold off on merging. If CI is passing and we have issues that we can work around, we should merge it as-is. It's always easier to keep things in main.
Yes. It's |
7b18ab6 to 2702f9c
2702f9c to abafc0a
for two enhancements: 1. #4722 2. #4837 cc @kshitij12345 for Lightning-AI/lightning-thunder#2345 (comment)
This PR modifies the `nvfuser` and `nvfuser_direct` extensions to allow both of them to be imported in the same script.

* Change assertion to warning
* Add `py::module_local()` to the `DataType` enum that is common between both extensions.

The `DataType` is now local to the individual extension rather than the global namespace.

PR Stack:
- NVIDIA#4722 **<<< This PR.**
- NVIDIA#4676
- NVIDIA#4662
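As a rough illustration of what this change enables, the sketch below tries to import both extensions in one script and reports which ones loaded. The helper name `try_import` is hypothetical. The sketch assumes both packages expose a top-level `DataType` attribute, and it falls back gracefully when either package is not installed.

```python
import importlib


def try_import(names=("nvfuser", "nvfuser_direct")):
    """Attempt to import each module by name; map name -> module or None."""
    loaded = {}
    for name in names:
        try:
            loaded[name] = importlib.import_module(name)
        except (ImportError, OSError):
            # Package not installed, or its shared libraries failed to load.
            loaded[name] = None
    return loaded


modules = try_import()
for name, mod in modules.items():
    print(name, "imported" if mod is not None else "not available")

nv, nvd = modules["nvfuser"], modules["nvfuser_direct"]
if nv is not None and nvd is not None:
    # Assumption: both extensions expose DataType at the top level.
    # With py::module_local(), each extension registers its own binding,
    # so the two classes are expected to be separate Python types.
    dt_a = getattr(nv, "DataType", None)
    dt_b = getattr(nvd, "DataType", None)
    print("same DataType class:", dt_a is dt_b)
```

Because the enum is module-local, a `DataType` value from one extension should not be assumed to be accepted by the other extension's API.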
This PR adds `cutlass_nvfp4_scaled_mm` to the `nvfuser_direct` python bindings, which supports nvfp4 gemm.

PR Stack:
- NVIDIA#4722
- NVIDIA#4676 **<<< This PR.**
- NVIDIA#4662
…4662)

The APIs for FP8 and NVFP4 are different in SGLang.

Example:
```python
>>> import nvfuser_direct
>>> from nvfuser_direct import nvf_cutlass
>>> help(nvf_cutlass.nvfp4_blockwise_scaled_grouped_mm)
# nvfp4_blockwise_scaled_grouped_mm(Tensor! output, Tensor a, Tensor b, Tensor a_blockscale,
#     Tensor b_blockscale, Tensor alphas, Tensor ab_strides, Tensor c_strides, Tensor
#     problem_sizes, Tensor expert_offsets, Tensor sf_offsets) -> ()
```

PR Stack:
- #4722
- #4676
- #4662 **<<< This PR.**

TODOs:
- [ ] Create unit test.

Co-authored-by: jjsjann123 <jiej@nvidia.com>