The change modifies the compute capability check from a range-based approach (>= 10.0 and < 12.0) to an exact match (== 10.0). This significantly narrows the supported architectures and may exclude valid compute capabilities like 10.1, 10.2, 11.0, 11.1 that were previously supported. The rationale for this restriction change should be validated, especially given the referenced error from nvfp4_scaled_mm.cu:255.
```python
if not microarchitecture_is(10, 0):
    pytest.skip(
        reason="Nvfp4 Requires compute capability 10.0, other arches are not tested.",
        allow_module_level=True,
    )
```
The PR description mentions an exception from runGemm at nvfp4_scaled_mm.cu:255, but doesn't provide details about the root cause. The change to restrict to only compute capability 10.0 may be a workaround rather than a proper fix. The underlying issue causing failures on other architectures should be investigated to determine if this restriction is appropriate or if a more comprehensive solution is needed.
```python
if not microarchitecture_is(10, 0):
    pytest.skip(
        reason="Nvfp4 Requires compute capability 10.0, other arches are not tested.",
        allow_module_level=True,
    )
```
This PR restricts nvfp4 GEMM tests to run only on compute capability 10.0 devices, preventing runtime errors on unsupported architectures. The change follows the same pattern as PR #5810 which fixed similar issues for mxfp8 GEMM tests.
Changes:
- Replaced the manual compute capability check with the `microarchitecture_is(10, 0)` helper function
- Updated the skip message to clarify that only 10.0 is tested
- Made the check more restrictive: the old logic allowed 10.0-11.x, the new logic allows only 10.0
Note: This is more restrictive than the previous implementation which allowed compute capabilities 10.x up to (but not including) 12.0. The commit message confirms this is intentional ("only test nvfp4_gemm on 10.0 device").
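The difference between the two checks can be sketched in plain Python. The function names below are illustrative only, not the actual test code; they model the old range-based condition and the new exact-match condition on the `(major, minor)` compute capability pair:

```python
def old_check(major: int, minor: int) -> bool:
    # Previous logic: any compute capability in [10.0, 12.0).
    # Tuple comparison orders (major, minor) lexicographically.
    return (10, 0) <= (major, minor) < (12, 0)

def new_check(major: int, minor: int) -> bool:
    # New logic: exact match on compute capability 10.0 only.
    return (major, minor) == (10, 0)

# Capabilities such as 10.1 or 11.0 passed the old check but fail the new one.
print(old_check(10, 1), new_check(10, 1))  # True False
print(old_check(10, 0), new_check(10, 0))  # True True
```

This makes the narrowing concrete: any 10.x or 11.x device other than exactly 10.0 is now skipped.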
Confidence Score: 5/5
This PR is safe to merge with minimal risk
The change is a straightforward test infrastructure update that prevents test failures on unsupported hardware. It follows an established pattern from PR #5810 ("skip test cutlass mxfp8_gemm on unsupported arches"), uses an existing utility function correctly, and only affects test execution (not production code). The more restrictive logic is intentional, as confirmed by the commit message.
No files require special attention
Important Files Changed
| Filename | Overview |
| --- | --- |
| tests/python/direct/test_cutlass_nvfp4_gemm.py | Updated architecture check to restrict testing to compute capability 10.0 only, consistent with the PR #5810 pattern |
Sequence Diagram
```mermaid
sequenceDiagram
    participant Test as test_cutlass_nvfp4_gemm.py
    participant Utils as direct_utils.microarchitecture_is
    participant CUDA as torch.cuda
    participant Pytest as pytest.skip
    Note over Test: Module import phase
    Test->>Utils: microarchitecture_is(10, 0)
    Utils->>CUDA: get_device_properties(current_device())
    CUDA-->>Utils: device properties (major, minor)
    Utils->>Utils: Check: major == 10 and minor == 0
    alt Not compute capability 10.0
        Utils-->>Test: False
        Test->>Pytest: skip(reason="...", allow_module_level=True)
        Note over Pytest: Tests skipped for this module
    else Compute capability 10.0
        Utils-->>Test: True
        Note over Test: Proceed with test execution
        Test->>Test: test_nvfp4_gemm()
        Test->>Test: test_nvfp4_gemm_epilogue()
        Test->>Test: test_nvfp4_grouped_mm()
    end
```
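Based on the diagram above, the helper presumably reduces to an exact comparison against the device's `(major, minor)` pair. The sketch below is an assumption, not nvfuser's actual `direct_utils` code; the `props` parameter is injected here for illustration, where the real helper would read `torch.cuda.get_device_properties(torch.cuda.current_device())`:

```python
from dataclasses import dataclass

@dataclass
class DeviceProps:
    """Stand-in for torch.cuda device properties (illustrative only)."""
    major: int
    minor: int

def microarchitecture_is(major: int, minor: int, props: DeviceProps) -> bool:
    # Exact match: both major and minor compute capability must agree.
    return props.major == major and props.minor == minor

print(microarchitecture_is(10, 0, DeviceProps(10, 0)))  # True
print(microarchitecture_is(10, 0, DeviceProps(11, 0)))  # False
```

An exact-match helper like this is why 10.1 or 11.x devices now fall through to the module-level `pytest.skip`.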
Same as #5810.

Error message:
```
Exception raised from runGemm at /opt/pytorch/nvfuser/cutlass/nvfp4_scaled_mm.cu:255
```