skip test cutlass mxfp8_gemm on unsupported arches #5810

Merged

liqiangxl merged 2 commits into main from llu/skip_unsupported_arches_mxfp8_gemm on Jan 13, 2026
Conversation

@liqiangxl (Collaborator) commented Jan 13, 2026

Error message on unsupported hardware: Exception raised from runGemm at /opt/pytorch/nvfuser/cutlass/mxfp8_scaled_mm.cu:262

@github-actions Bot commented

Description

  • Replace manual compute capability check with microarchitecture_is utility function

  • Add import for microarchitecture_is from python.direct_utils module

  • Maintain same test skipping behavior for unsupported architectures (compute capability < 10.0)

  • Simplify architecture detection logic using centralized utility
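
For reference, here is a minimal sketch of what the microarchitecture_is helper does, as inferred from the sequence diagrams in the reviews below (the actual implementation lives in python/direct_utils; this is an illustration, not the library source):

    import torch

    def microarchitecture_is(major: int, minor: int) -> bool:
        # Read the current device's properties and compare the compute
        # capability exactly (major == X and minor == Y).
        props = torch.cuda.get_device_properties(torch.cuda.current_device())
        return props.major == major and props.minor == minor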

Changes walkthrough

Relevant files

Enhancement: tests/python/direct/test_cutlass_mxfp8_gemm.py (+2/-2)
Replace compute capability check with utility function

  • Replace manual compute capability check with the microarchitecture_is
    utility function
  • Add import statement for microarchitecture_is from python.direct_utils
  • Maintain the same test-skipping behavior for unsupported architectures
    (compute capability < 10.0)
  • Simplify architecture detection logic using the centralized utility

    PR Reviewer Guide

    Here are some key observations to aid the review process:

    🧪 PR contains tests
    ⚡ Recommended focus areas for review

    Logic Equivalence

    Verify whether microarchitecture_is(10, 0) is equivalent to the original condition compute_cap < (10, 0) or compute_cap >= (12, 0). To be equivalent, the new condition would have to allow compute capabilities 10.x and 11.x, excluding < 10.0 and >= 12.0.

        if not microarchitecture_is(10, 0):

    @liqiangxl requested a review from jacobhinkle on January 13, 2026 14:41

    @greptile-apps Bot (Contributor) commented Jan 13, 2026

    Greptile Overview

    Greptile Summary

    Changes

    This PR refactors the architecture compatibility check in test_cutlass_mxfp8_gemm.py to use the microarchitecture_is utility function from direct_utils instead of directly calling torch.cuda.get_device_capability().

    Before:

    compute_cap = torch.cuda.get_device_capability()
    if compute_cap < (10, 0) or compute_cap >= (12, 0):
        pytest.skip(...)

    After:

    if not microarchitecture_is(10, 0):
        pytest.skip(...)

    Key Observations

    1. Logic Change: The original check allowed a range of compute capabilities [10.0, 12.0), which included architectures like 10.1, 10.2, 10.3, and 11.x. The new check using microarchitecture_is(10, 0) only allows exactly compute capability 10.0 (major==10 AND minor==0), making it more restrictive.

    2. Consistency with Codebase: The pattern matches test_cutlass_gemm.py which also uses microarchitecture_is(10, 0) to restrict tests to exactly Blackwell 10.0.

    3. Import Added: The change adds an import for microarchitecture_is from python.direct_utils, consistent with other test files in the repository.

    Impact

    If MxFP8 GEMM operations were intended to work on architectures 10.1-10.3 (e.g., B300/GB300 at 10.3) or 11.x, this change will skip those tests. If the restriction to exactly 10.0 is intentional for safety/testing reasons, this aligns with the conservative approach used in test_cutlass_gemm.py.

    Confidence Score: 3/5

    • Safe to merge with awareness of narrowed architecture support compared to original logic
    • The code change is syntactically correct and follows established patterns in the codebase. However, there's a semantic difference: the original logic supported a range [10.0, 12.0) while the new logic only supports exactly 10.0. This could be intentional (conservative approach matching test_cutlass_gemm.py) or unintentional (overlooking that the original allowed 10.x and 11.x). The change won't break existing tests on 10.0 hardware, but may skip tests on 10.1+ and 11.x that previously ran.
    • tests/python/direct/test_cutlass_mxfp8_gemm.py - Verify the narrowed architecture support (only 10.0) is intentional vs the original range [10.0, 12.0)

    Important Files Changed

    File Analysis

    File: tests/python/direct/test_cutlass_mxfp8_gemm.py (score: 3/5)
    Overview: Changes the architecture check from the range [10.0, 12.0) to exactly 10.0 only, which is more restrictive than the original logic and may skip tests on valid architectures like 10.1, 10.2, 10.3, and 11.x.

    Sequence Diagram

    sequenceDiagram
        participant Test as test_cutlass_mxfp8_gemm.py
        participant DirectUtils as python.direct_utils
        participant CUDA as torch.cuda
        participant Pytest as pytest
    
        Note over Test: Module load time
        Test->>DirectUtils: import microarchitecture_is
        Test->>DirectUtils: microarchitecture_is(10, 0)
        DirectUtils->>CUDA: get_device_properties(current_device())
        CUDA-->>DirectUtils: device properties (major, minor)
        DirectUtils->>DirectUtils: Check: major == 10 AND minor == 0
        alt Architecture is NOT exactly 10.0
            DirectUtils-->>Test: False
            Test->>Pytest: pytest.skip(allow_module_level=True)
            Note over Test,Pytest: Module skipped - tests won't run
        else Architecture is exactly 10.0
            DirectUtils-->>Test: True
            Note over Test: Continue loading module
            Test->>DirectUtils: import linear_to_swizzled_128_4, swizzled_to_linear_128_4
            Note over Test: Tests will run
        end
    

    @greptile-apps Bot (Contributor) left a comment

    1 file reviewed, 1 comment


    - compute_cap = torch.cuda.get_device_capability()
    - if compute_cap < (10, 0) or compute_cap >= (12, 0):
    + if not microarchitecture_is(10, 0):

    [P1] Logic change narrows supported architectures more than original

    The original condition compute_cap < (10, 0) or compute_cap >= (12, 0) would skip tests outside the range [10.0, 12.0), meaning it would allow compute capabilities like 10.1, 10.2, 10.3, 11.0, 11.5, etc.

    The new condition not microarchitecture_is(10, 0) only allows exactly 10.0 (major==10 AND minor==0), and skips everything else including 10.1, 10.2, 10.3, and all 11.x architectures.

    If the intent was to maintain the original behavior while using the utility function, consider using a range check or adding support for additional architectures that were previously supported (10.1-10.3, 11.x). If restricting to exactly 10.0 is intentional, this is fine but represents a significant behavior change from the original code.
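
    For illustration, a hedged sketch of a module-level skip that would preserve the original [10.0, 12.0) range, should that turn out to be the intent (the skip-reason wording here is an assumption):

        import pytest
        import torch

        compute_cap = torch.cuda.get_device_capability()
        # Skip unless the capability falls in the previously supported range [10.0, 12.0).
        if not ((10, 0) <= compute_cap < (12, 0)):
            pytest.skip(
                reason="MxFp8 requires compute capability in [10.0, 12.0).",
                allow_module_level=True,
            )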

    @jacobhinkle (Collaborator) left a comment

    LGTM

    @greptile-apps Bot (Contributor) commented Jan 13, 2026

    Greptile Overview

    Greptile Summary

    This PR updates the architecture compatibility check for MxFp8 GEMM tests, restricting execution to compute capability 10.0 only. This follows the pattern established in PR #5757 for test_cutlass_gemm.py.

    Key Changes

    • Replaced direct torch.cuda.get_device_capability() check with microarchitecture_is(10, 0) helper function from python.direct_utils
    • Changed from range-based check (10.0 ≤ cc < 12.0) to exact match (cc == 10.0)
    • This restricts the test to only Blackwell 10.0 architectures (B200, GB200), excluding untested variants like 10.3 (B300, GB300) and 12.x

    Behavioral Change

    Before: Test ran on any compute capability from 10.0 up to (but excluding) 12.0
    After: Test runs only on exactly compute capability 10.0

    This is a conservative approach that prevents potential failures on untested architectures, consistent with the test_cutlass_gemm.py pattern.

    Observations

    1. The error message could be more specific about requiring exactly 10.0 rather than just "10"
    2. Missing explanatory comments about tested/untested architectures (present in test_cutlass_gemm.py)
    3. test_cutlass_nvfp4_gemm.py still uses the old pattern and may benefit from similar updates for consistency

    Confidence Score: 4/5

    • This PR is safe to merge with minor documentation improvements recommended
    • The change follows the established pattern from PR #5757 (Skip un-supported arches test_grouped_mm) and correctly implements the architecture restriction. The logic is sound and the helper function is properly imported. The score of 4 (rather than 5) reflects that the error message could be clearer and documentation comments would improve maintainability, but these are style/clarity issues rather than functional problems.
    • No files require special attention - the single changed file is straightforward and follows established patterns

    Important Files Changed

    File Analysis

    File: tests/python/direct/test_cutlass_mxfp8_gemm.py (score: 4/5)
    Overview: Updated the architecture check to restrict the test to compute capability 10.0 only, following the pattern from test_cutlass_gemm.py. The error message could be more specific about the exact requirement.

    Sequence Diagram

    sequenceDiagram
        participant Test as test_mxfp8_gemm
        participant Utils as python.direct_utils
        participant PyTorch as torch.cuda
        participant Pytest as pytest
        
        Note over Test: Module Load Time
        Test->>Utils: import microarchitecture_is
        Test->>Utils: microarchitecture_is(10, 0)
        Utils->>PyTorch: get_device_properties(current_device())
        PyTorch-->>Utils: device_properties
        Utils->>Utils: Check if major == 10 and minor == 0
        alt Compute capability is exactly 10.0
            Utils-->>Test: True
            Note over Test: Test execution proceeds
        else Compute capability is NOT 10.0
            Utils-->>Test: False
            Test->>Pytest: pytest.skip(allow_module_level=True)
            Note over Test,Pytest: Entire test module skipped<br/>with reason message
        end
        
        Note over Test: Previous Behavior (before PR)
        Note over Test: Used torch.cuda.get_device_capability()<br/>directly and checked range [10.0, 12.0)
    

    @greptile-apps Bot (Contributor) left a comment

    3 files reviewed, 3 comments

    - if compute_cap < (10, 0) or compute_cap >= (12, 0):
    + if not microarchitecture_is(10, 0):
          pytest.skip(
              reason="MxFp8 Requires compute capability 10.",

    The error message "MxFp8 Requires compute capability 10." is misleading. The new logic using microarchitecture_is(10, 0) only runs on exactly compute capability 10.0, not all 10.x architectures.

    Previous behavior: Allowed 10.0, 10.1, 10.2, 10.3, 11.x (anything from 10.0 up to but excluding 12.0)
    New behavior: Only allows 10.0

    Consider updating the message to be more specific, e.g., "MxFp8 requires compute capability 10.0. Other architectures have not been tested." This matches the pattern used in test_cutlass_gemm.py which has detailed comments about tested vs untested architectures.

    Suggested change
    - reason="MxFp8 Requires compute capability 10.",
    + reason="MxFp8 requires compute capability 10.0. Other architectures have not been tested.",


    - compute_cap = torch.cuda.get_device_capability()
    - if compute_cap < (10, 0) or compute_cap >= (12, 0):
    + if not microarchitecture_is(10, 0):

    Consider adding explanatory comments about which architectures are supported/tested, similar to test_cutlass_gemm.py (lines 12-16). This would help future maintainers understand why the test is restricted to compute capability 10.0 only.

    Example from test_cutlass_gemm.py:

    # GPU Compute Capability: https://developer.nvidia.com/cuda/gpus
    # tested on blackwell compute 10.0 (B200 and GB200)
    # doesn't support 12.0 (RTX PRO 6000 and RTX 50XX)
    # Not tested on 10.3 (B300 and GB300)
    # Not tested on 12.1 (DGX Spark)
    if not microarchitecture_is(10, 0):

    This documentation helps clarify the intentional restriction and provides context for when the restriction might be relaxed in the future.


      import pytest
      import torch
      from nvfuser_direct import nvf_cutlass
    + from python.direct_utils import microarchitecture_is

    Note: test_cutlass_nvfp4_gemm.py still uses the old pattern with torch.cuda.get_device_capability() and allows all architectures in the range [10.0, 12.0). For consistency, consider updating that test file as well if NVFP4 should also be restricted to only tested architectures.

    Current state in test_cutlass_nvfp4_gemm.py:

    compute_cap = torch.cuda.get_device_capability()
    if compute_cap < (10, 0) or compute_cap >= (12, 0):
        pytest.skip(
            reason="Nvfp4 Requires compute capability 10.",
            allow_module_level=True,
        )

    If NVFP4 has the same testing limitations as MxFp8 and grouped_mm, it should follow the same pattern for maintainability and clarity.
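
    For concreteness, a sketch of what the analogous update to test_cutlass_nvfp4_gemm.py could look like (hypothetical and not part of this PR; the skip-reason wording is an assumption):

        import pytest
        from python.direct_utils import microarchitecture_is

        # Mirror the MxFp8 change: restrict the module to exactly compute capability 10.0.
        if not microarchitecture_is(10, 0):
            pytest.skip(
                reason="Nvfp4 requires compute capability 10.0. Other architectures have not been tested.",
                allow_module_level=True,
            )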


    @liqiangxl (Collaborator, Author) commented:

    !build

    @liqiangxl merged commit 3011961 into main on Jan 13, 2026
    20 checks passed
    @liqiangxl deleted the llu/skip_unsupported_arches_mxfp8_gemm branch on January 13, 2026 15:46

    github-actions Bot pushed a commit that referenced this pull request on Jan 13, 2026:
    Same as #5810. Error message: `Exception raised from runGemm at /opt/pytorch/nvfuser/cutlass/nvfp4_scaled_mm.cu:255`

    github-actions Bot pushed a commit that referenced this pull request on Jan 15, 2026:
    Same as #5810. Error message: `Exception raised from run_nvfp4_scaled_group_mm at /opt/pytorch/nvfuser/cutlass/nvfp4_scaled_group_mm.cu:518`

    liqiangxl added a commit that referenced this pull request on Jan 20, 2026:
    Same as #5810. Skip tests in `test_narrow_precision` that use scaled/grouped mm. Error message: `Exception raised from runGemm at /opt/pytorch/nvfuser/cutlass/nvfp4_scaled_mm.cu:255`