
[None][fix] Fix int4 awq for sm120/121#11561

Merged
pamelap-nvidia merged 6 commits into NVIDIA:main from pamelap-nvidia:sm120_int4
May 20, 2026

Conversation

@pamelap-nvidia
Collaborator

@pamelap-nvidia pamelap-nvidia commented Feb 18, 2026

Summary by CodeRabbit

  • Bug Fixes

    • Fixed weight preprocessing logic for SM120/Blackwell architectures to ensure correct weight layout handling
    • Improved SM120 candidate selection for Blackwell to properly distinguish between weight-only and non-weight-only configurations
    • Enhanced W4A16 quantization support with expanded GPU architecture compatibility beyond previous limits, while maintaining FP8 activation constraints
  • Tests

    • Updated architecture capability checks to reflect expanded GPU support for W4A16 quantization

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

@pamelap-nvidia
Collaborator Author

/bot --skip-test

@github-actions

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental) --high-priority]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

--high-priority (OPTIONAL) : Run the pipeline with high priority. This option is restricted to authorized users only and will route the job to a high-priority queue.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

@pamelap-nvidia
Collaborator Author

/bot run --skip-test

@tensorrt-cicd
Collaborator

PR_Github #36127 [ run ] triggered by Bot. Commit: 35b8b39

@tensorrt-cicd
Collaborator

PR_Github #36127 [ run ] completed with state SUCCESS. Commit: 35b8b39
/LLM/main/L0_MergeRequest_PR pipeline #27917 (Partly Tested) completed with status: 'SUCCESS'

@pamelap-nvidia pamelap-nvidia force-pushed the sm120_int4 branch 2 times, most recently from e749bc8 to 73b677c Compare April 30, 2026 17:39
@pamelap-nvidia pamelap-nvidia changed the title fix sm120 int4 [none][fix] Fix int4 awq for sm120/121 Apr 30, 2026
@pamelap-nvidia pamelap-nvidia marked this pull request as ready for review April 30, 2026 17:40
@pamelap-nvidia pamelap-nvidia requested a review from a team as a code owner April 30, 2026 17:40
@pamelap-nvidia pamelap-nvidia requested a review from liji-nv April 30, 2026 17:40
@pamelap-nvidia
Collaborator Author

/bot run

@coderabbitai
Contributor

coderabbitai Bot commented Apr 30, 2026

Walkthrough

This pull request differentiates SM architecture version support for quantized GEMM operations, introducing separate maximum SM versions for W4A8 (FP8-based) versus W4A16 (FP16/BF16-based) mixed-dtype operations. Additionally, weight preprocessing for SM >= 90 is refactored to apply consistently, and SM120/Blackwell candidate selection is gated on WEIGHT_ONLY flags.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **SM Architecture Gating**: `cpp/tensorrt_llm/kernels/cutlass_kernels/cutlass_heuristic.cpp`, `tensorrt_llm/_torch/custom_ops/torch_custom_ops.py` | SM120/Blackwell candidate selection is now conditional on WEIGHT_ONLY not being set. FinegrainedMixedDtypeGemm introduces MAX_SUPPORTED_SM_VERSION_W4A16 to allow W4A16 (FP16/BF16) operations on newer architectures beyond the FP8-limited W4A8. |
| **Weight Preprocessing**: `tensorrt_llm/quantization/functional.py` | The SM >= 90 remapping to SM 80 is moved outside the conditional shape-handling branches to ensure consistent weight layout transformations for Hopper/Blackwell paths. |
| **Test Updates**: `tests/unittest/_torch/thop/parallel/test_finegrained_mixed_dtype_gemm.py`, `tests/unittest/_torch/thop/parallel/test_w4a16_linear.py`, `tests/unittest/_torch/thop/parallel/test_w4a8_linear.py` | Updated SM version checks to use mode-specific constants: MAX_SUPPORTED_SM_VERSION for W4A8 and MAX_SUPPORTED_SM_VERSION_W4A16 for W4A16, with clarifying comments on FP8 dispatch dependencies. |
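
A minimal sketch of the mode-specific gate described in the table, for readers who want to see the shape of the check. The constant names come from the walkthrough; their numeric values (103 and 121) are taken from the commit message later in this thread, and the helper functions `max_supported_sm` and `is_supported` are hypothetical names used only for illustration, not the actual code in `torch_custom_ops.py`.

```python
# Illustrative values only: the commit message says W4A8 stays gated at SM103
# and W4A16 is enabled up through SM121.
MAX_SUPPORTED_SM_VERSION = 103        # W4A8 path (FP8 activations)
MAX_SUPPORTED_SM_VERSION_W4A16 = 121  # W4A16 path (FP16/BF16 activations)


def max_supported_sm(uses_fp8_activation: bool) -> int:
    """Hypothetical helper: pick the SM ceiling for the activation mode."""
    if uses_fp8_activation:
        return MAX_SUPPORTED_SM_VERSION
    return MAX_SUPPORTED_SM_VERSION_W4A16


def is_supported(sm_version: int, uses_fp8_activation: bool) -> bool:
    """Hypothetical helper: True when the mode may run on this SM version."""
    return sm_version <= max_supported_sm(uses_fp8_activation)


if __name__ == "__main__":
    for sm in (90, 103, 120, 121):
        print(sm, "W4A16:", is_supported(sm, False), "W4A8:", is_supported(sm, True))
```

Under these assumptions a test for W4A8 would skip on SM120/121, while a W4A16 test would run there.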

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ⚠️ Warning | The PR description is incomplete; it only contains the template sections without any actual content describing the issue, solution, or test coverage. | Fill in the Description, Test Coverage, and relevant PR Checklist sections with specific details about the fix and its validation. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The PR title clearly specifies the fix for int4 awq behavior on SM120/121, which directly relates to the main code changes across multiple files addressing SM120/121 support. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
tests/unittest/_torch/thop/parallel/test_w4a16_linear.py (1)

1-3: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add the required NVIDIA copyright/license header.

This modified Python source file is missing the required file header.

As per coding guidelines: "All source files (.cpp, .h, .cu, .py) should contain an NVIDIA copyright header with the year of latest modification and Apache 2.0 license notice."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unittest/_torch/thop/parallel/test_w4a16_linear.py` around lines 1 - 3,
Add the required NVIDIA copyright/license header at the top of the Python source
file (the module that currently starts with "import pytest" and "import torch"):
insert the standard NVIDIA header block including the year of latest
modification and the Apache 2.0 license notice as specified by project
guidelines, placing it above all imports so the file begins with the header
comment followed by the existing import lines.
cpp/tensorrt_llm/kernels/cutlass_kernels/cutlass_heuristic.cpp (1)

2-2: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Update the modified-file copyright year.

This file was changed in this PR, but the header still ends at 2023.

As per coding guidelines: "Include NVIDIA copyright header on all new files; update year on modified files."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tensorrt_llm/kernels/cutlass_kernels/cutlass_heuristic.cpp` at line 2,
Update the copyright header in the file to include the current year (extend the
year range on the copyright header line from 2023 to the current year) so the
modified-file header is correct; locate the copyright header at the top of
cpp/tensorrt_llm/kernels/cutlass_kernels/cutlass_heuristic.cpp and change the
year range in that header comment to include the current year (e.g., 2023-2026).
tests/unittest/_torch/thop/parallel/test_finegrained_mixed_dtype_gemm.py (1)

1-3: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add the required NVIDIA copyright/license header.

This modified Python source file is missing the required file header.

As per coding guidelines: "All source files (.cpp, .h, .cu, .py) should contain an NVIDIA copyright header with the year of latest modification and Apache 2.0 license notice."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unittest/_torch/thop/parallel/test_finegrained_mixed_dtype_gemm.py`
around lines 1 - 3, Add the required NVIDIA copyright and Apache-2.0 license
header as a top-of-file comment block in
tests/unittest/_torch/thop/parallel/test_finegrained_mixed_dtype_gemm.py (place
it before the existing imports: pytest, torch, and the utils imports), using the
latest modification year and the standard NVIDIA header text; ensure the header
is a commented Python block (each line beginning with #) so it does not affect
execution.
tests/unittest/_torch/thop/parallel/test_w4a8_linear.py (1)

1-3: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add the required NVIDIA copyright/license header.

This modified Python source file is missing the required file header.

As per coding guidelines: "All source files (.cpp, .h, .cu, .py) should contain an NVIDIA copyright header with the year of latest modification and Apache 2.0 license notice."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unittest/_torch/thop/parallel/test_w4a8_linear.py` around lines 1 - 3,
Add the required NVIDIA copyright/license header at the top of the file (above
the existing imports for pytest, torch and Parameter): insert the standard
NVIDIA header block including the correct latest modification year and the
Apache 2.0 license notice used across the codebase, ensuring it appears before
any code or imports so the file begins with the license header.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/quantization/functional.py`:
- Around line 968-971: The branch that sets do_weight_interleave based on sm_ ==
100 or sm_ == 103 is unreachable because sm_ is always remapped to 80 when sm_
>= 90; to fix, change the logic so the SM100/103 override is evaluated before
the remap or narrow the remap condition (e.g., only remap when 90 <= sm_ < 100),
ensuring the check against sm_ == 100 or sm_ == 103 (the code using variable sm_
and setting do_weight_interleave) can actually execute.
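
The finding above can be reproduced in a toy model. This is only a sketch: the function names are hypothetical, the real branch in `functional.py` sets `do_weight_interleave` to a value not shown in this thread, and here we only track whether the SM100/103 branch can be reached at all.

```python
def override_reached_current(sm: int) -> bool:
    """Order flagged by the review: remap first, then test for SM100/103."""
    if sm >= 90:
        sm = 80  # every SM >= 90 becomes 80 here
    return sm in (100, 103)  # can never be true after the remap


def override_reached_check_first(sm: int) -> bool:
    """Suggested fix 1: evaluate the SM100/103 test before the remap."""
    reached = sm in (100, 103)
    if sm >= 90:
        sm = 80
    return reached


def override_reached_narrow_remap(sm: int) -> bool:
    """Suggested fix 2: only remap when 90 <= sm < 100."""
    if 90 <= sm < 100:
        sm = 80
    return sm in (100, 103)


if __name__ == "__main__":
    for sm in (80, 90, 100, 103, 120):
        print(sm,
              override_reached_current(sm),
              override_reached_check_first(sm),
              override_reached_narrow_remap(sm))
```

Note that in this model the narrowed remap also leaves SM120 unmapped, so that variant would likely need an extra condition to keep the SM120/121 behavior this PR is after.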


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 71af6f98-55b2-4986-a61c-11676022e347

📥 Commits

Reviewing files that changed from the base of the PR and between b857ee8 and b9b9599.

📒 Files selected for processing (6)
  • cpp/tensorrt_llm/kernels/cutlass_kernels/cutlass_heuristic.cpp
  • tensorrt_llm/_torch/custom_ops/torch_custom_ops.py
  • tensorrt_llm/quantization/functional.py
  • tests/unittest/_torch/thop/parallel/test_finegrained_mixed_dtype_gemm.py
  • tests/unittest/_torch/thop/parallel/test_w4a16_linear.py
  • tests/unittest/_torch/thop/parallel/test_w4a8_linear.py

Comment thread tensorrt_llm/quantization/functional.py Outdated
@tensorrt-cicd
Collaborator

PR_Github #46416 [ run ] triggered by Bot. Commit: b9b9599 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46416 [ run ] completed with state FAILURE. Commit: b9b9599
/LLM/main/L0_MergeRequest_PR pipeline #36490 completed with status: 'ABORTED'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@pamelap-nvidia pamelap-nvidia changed the title [none][fix] Fix int4 awq for sm120/121 [None][fix] Fix int4 awq for sm120/121 May 3, 2026
@pamelap-nvidia
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #46578 [ run ] triggered by Bot. Commit: 68bc567 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46578 [ run ] completed with state SUCCESS. Commit: 68bc567
/LLM/main/L0_MergeRequest_PR pipeline #36628 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@pamelap-nvidia pamelap-nvidia requested a review from a team as a code owner May 5, 2026 18:46
@pamelap-nvidia pamelap-nvidia requested a review from yuxianq May 5, 2026 18:46
@pamelap-nvidia
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #46862 [ run ] triggered by Bot. Commit: a394664 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46862 [ run ] completed with state SUCCESS. Commit: a394664
/LLM/main/L0_MergeRequest_PR pipeline #36874 completed with status: 'SUCCESS'

CI Report

Link to invocation

Comment thread cpp/tensorrt_llm/kernels/cutlass_kernels/cutlass_heuristic.cpp Outdated
Comment thread tensorrt_llm/_torch/custom_ops/torch_custom_ops.py Outdated
Comment thread tests/unittest/_torch/thop/parallel/test_finegrained_mixed_dtype_gemm.py Outdated
Comment thread tensorrt_llm/quantization/functional.py Outdated
@pamelap-nvidia pamelap-nvidia requested a review from farazkh80 May 8, 2026 19:42
@farazkh80
Collaborator

I think @yuxianq covered most of my points. So other than those, LGTM!

Could we just run the l0_6000 or just GPT-OSS E2E test before merging?

Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
The Python preprocess_weights_for_mixed_gemm had its
sm_ >= 90 -> sm_ = 80 adjustment trapped inside an elif tied to the 2-D
shape check, so vanilla 2-D Linear weights on SM120/121 kept sm_ = 120
and skipped both row permutation (gated sm_ < 100) and column interleave
(gated sm_ < 90). The CUTLASS Sm80 finegrained kernel then read a
mismatched layout, producing 80%+ mismatch with ground truth.

Move the adjustment out of the elif so it applies to both 2-D and 3-D
inputs, mirroring the C++ preprocessor.

Also lift the FinegrainedMixedDtypeGemm activation-aware version gate so
W4A16 is enabled up through SM121 while W4A8 stays gated at SM103
(no FP8-activation dispatch on the SM120 Sm80 fallback). Tests skip
correctly under the new gate.

Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
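
The control-flow problem described in the commit message above can be sketched as follows. This is a simplified model, not the actual `preprocess_weights_for_mixed_gemm`: the function names are hypothetical, and only the two gates named in the message (row permutation for `sm_ < 100`, column interleave for `sm_ < 90`) are modeled.

```python
def effective_sm_before(sm: int, ndim: int) -> int:
    """Pre-fix model: the remap sits in an elif behind the 2-D shape check."""
    if ndim == 2:
        pass  # 2-D path handles the shape only; the remap below is skipped
    elif sm >= 90:
        sm = 80
    return sm


def effective_sm_after(sm: int, ndim: int) -> int:
    """Post-fix model: the remap applies to both 2-D and 3-D inputs."""
    if sm >= 90:
        sm = 80
    return sm


def layout_steps(sm: int) -> dict:
    """Which layout transforms run, using the gates quoted in the message."""
    return {"row_permutation": sm < 100, "column_interleave": sm < 90}


if __name__ == "__main__":
    # 2-D Linear weight on SM120: both transforms are skipped before the fix.
    print(layout_steps(effective_sm_before(120, ndim=2)))
    # After the fix the weight is prepared for the Sm80-style kernel.
    print(layout_steps(effective_sm_after(120, ndim=2)))
```

In this model a 3-D input on SM120 already took the remap before the fix, which matches the message's point that only the 2-D case was affected.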
@pamelap-nvidia
Collaborator Author

/bot run --extra-stage "GB10-PyTorch-Post-Merge-1,RTXPro6000D-PyTorch-Post-Merge-1,RTXPro6000D-4_GPUs-PyTorch-Post-Merge-1,RTXPro6000D-4_GPUs-PyTorch-Post-Merge-2"

@tensorrt-cicd
Collaborator

PR_Github #48029 [ run ] triggered by Bot. Commit: 2823247 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48029 [ run ] completed with state FAILURE. Commit: 2823247
/LLM/main/L0_MergeRequest_PR pipeline #37865 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@pamelap-nvidia
Collaborator Author

/bot run --extra-stage "GB10-PyTorch-Post-Merge-1,RTXPro6000D-PyTorch-Post-Merge-1,RTXPro6000D-4_GPUs-PyTorch-Post-Merge-1,RTXPro6000D-4_GPUs-PyTorch-Post-Merge-2"

@tensorrt-cicd
Collaborator

PR_Github #48218 [ run ] triggered by Bot. Commit: 2823247 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48218 [ run ] completed with state SUCCESS. Commit: 2823247
/LLM/main/L0_MergeRequest_PR pipeline #38036 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@pamelap-nvidia
Collaborator Author

/bot run --extra-stage "GB10-PyTorch-Post-Merge-1,RTXPro6000D-PyTorch-Post-Merge-1,RTXPro6000D-4_GPUs-PyTorch-Post-Merge-1,RTXPro6000D-4_GPUs-PyTorch-Post-Merge-2" --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #48392 [ run ] triggered by Bot. Commit: 2823247 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48392 [ run ] completed with state SUCCESS. Commit: 2823247
/LLM/main/L0_MergeRequest_PR pipeline #38195 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@pamelap-nvidia
Collaborator Author

/bot run --extra-stage "GB10-PyTorch-Post-Merge-1,RTXPro6000D-PyTorch-Post-Merge-1,RTXPro6000D-4_GPUs-PyTorch-Post-Merge-1,RTXPro6000D-4_GPUs-PyTorch-Post-Merge-2" --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #48607 [ run ] triggered by Bot. Commit: 2823247 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48607 [ run ] completed with state FAILURE. Commit: 2823247
/LLM/main/L0_MergeRequest_PR pipeline #38390 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@pamelap-nvidia
Collaborator Author

/bot run --extra-stage "GB10-PyTorch-Post-Merge-1,RTXPro6000D-PyTorch-Post-Merge-1,RTXPro6000D-4_GPUs-PyTorch-Post-Merge-1,RTXPro6000D-4_GPUs-PyTorch-Post-Merge-2" --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #48655 [ run ] triggered by Bot. Commit: 2823247 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48655 [ run ] completed with state SUCCESS. Commit: 2823247
/LLM/main/L0_MergeRequest_PR pipeline #38435 completed with status: 'SUCCESS'

CI Report

Link to invocation

@pamelap-nvidia pamelap-nvidia merged commit 77db08d into NVIDIA:main May 20, 2026
6 checks passed

5 participants