[None][fix] Raise clear error when GPT-OSS is used with non-TRTLLM attention backend#13166

Merged
karljang merged 5 commits into NVIDIA:main from ssam18:fix/gpt-oss-attn-sink-backend-validation
May 14, 2026

Conversation

Contributor

@ssam18 ssam18 commented Apr 17, 2026

GPT-OSS uses attention sinks, which are supported only by the TRTLLM backend. Previously this was enforced only during the first forward pass, deep inside executor warmup, which produced a confusing traceback. This PR moves the check into AttentionBlock.__init__ so users get a clear, actionable ValueError the moment the model is constructed with an incompatible backend. It also tightens the fallback assert in attention.py to include the actual backend name in the message. Fixes #13156
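The constructor-time check described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the PR: the `Attention` base class, the constructor signature, the `sinks_enabled` attribute, the `try_build` helper, and the exact error wording are all hypothetical stand-ins.

```python
class Attention:
    """Hypothetical stand-in for the base attention module."""

    def __init__(self, attn_backend: str = "TRTLLM"):
        self.attn_backend = attn_backend


class AttentionBlock(Attention):
    """Hypothetical stand-in for the GPT-OSS attention block."""

    def __init__(self, attn_backend: str = "TRTLLM"):
        super().__init__(attn_backend)
        # Validate at construction time, so a misconfiguration surfaces here
        # instead of during the first forward pass in executor warmup.
        if self.attn_backend != "TRTLLM":
            raise ValueError(
                "GPT-OSS uses attention sinks, which are only supported by "
                "the TRTLLM attention backend, but "
                f"attn_backend={self.attn_backend!r} was configured. "
                "Set attn_backend='TRTLLM'."
            )
        # Sliding window and attention sinks would be configured after this.
        self.sinks_enabled = True


def try_build(backend: str) -> str:
    """Build the block and report the outcome as a string (illustrative)."""
    try:
        AttentionBlock(backend)
        return "ok"
    except ValueError as exc:
        return f"ValueError: {exc}"
```

In this sketch, `try_build("TRTLLM")` succeeds, while `try_build("FLASHINFER")` reports a ValueError that names the offending backend before any forward pass runs.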

Summary by CodeRabbit

  • Bug Fixes
    • Improved error handling and validation for GPT-OSS attention configuration to provide clearer feedback when attention backend settings are incompatible.

Signed-off-by: Samaresh Kumar Singh <ssam3003@gmail.com>
@ssam18 ssam18 requested review from a team as code owners April 17, 2026 22:26
@ssam18 ssam18 changed the title [None][fix] Raise clear error when GPT-OSS is used with non-TRTLLM attention backend [fix] Raise clear error when GPT-OSS is used with non-TRTLLM attention backend Apr 17, 2026
@coderabbitai
Contributor

coderabbitai Bot commented Apr 17, 2026

Walkthrough

Two files are updated to enhance validation and error messaging for attention backend compatibility in GPT-OSS models. A runtime validation is added to prevent GPT-OSS initialization with unsupported attention backends, and an existing assertion error message is clarified to include the configured backend.

Changes

| Cohort / File(s) | Summary |
|---|---|
| GPT-OSS Model Initialization: tensorrt_llm/_torch/models/modeling_gpt_oss.py | Added runtime validation in AttentionBlock.__init__ that raises ValueError when attn_backend is not "TRTLLM", executed after base-class initialization and before setting sliding window and attention sinks. |
| Attention Module Error Handling: tensorrt_llm/_torch/modules/attention.py | Improved assertion error message in Attention.forward() to include the currently configured backend and explicitly specify that attn_backend='TRTLLM' is required when attention_sinks is used. |
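The improved assertion message summarized above might look something like the sketch below. The helper name `check_attention_sinks_backend`, its parameters, and the message text are assumptions made for illustration. The actual assert lives inside Attention.forward() and its wording is not shown in this conversation.

```python
from typing import Optional, Sequence


def check_attention_sinks_backend(
    attn_backend: str,
    attention_sinks: Optional[Sequence[float]],
) -> None:
    """Hypothetical helper mirroring the fallback assert in the forward pass."""
    if attention_sinks is not None:
        # Include the configured backend so the failure is self-explanatory.
        assert attn_backend == "TRTLLM", (
            "attention_sinks requires attn_backend='TRTLLM', "
            f"but the configured backend is {attn_backend!r}."
        )
```

Note that Python skips `assert` statements when run with `-O`, which is one reason a constructor-time ValueError is a more dependable user-facing check than an assert alone.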

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (1 warning, 2 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ❓ Inconclusive | The description explains the issue, solution, and references the fixed issue, but does not include the test coverage section required by the template. | Add the 'Test Coverage' section to clearly document what test cases validate this backend compatibility check. |
| Linked Issues check | ❓ Inconclusive | The PR partially addresses #13156 by moving the check earlier, but only implements one of the two desired outcomes (early error detection) rather than enabling FlashInfer support on SM_120. | Clarify whether the early error detection approach fully resolves #13156 or if further investigation is needed for enabling FlashInfer on SM_120. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Out of Scope Changes check | ✅ Passed | All changes are scoped to enforcing attention backend compatibility for GPT-OSS, with no out-of-scope modifications detected. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: moving a runtime validation earlier to raise a clear error when GPT-OSS is used with non-TRTLLM attention backend. |

Contributor

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tensorrt_llm/_torch/models/modeling_gpt_oss.py (1)

1-1: ⚠️ Potential issue | 🟠 Major

Missing required NVIDIA copyright header/year update.

This modified Python source file should include the standard NVIDIA copyright header (with year updated to latest meaningful modification).

As per coding guidelines, "Add NVIDIA copyright header on ALL new files and update year on modified files" and "All TensorRT-LLM source files must contain an NVIDIA copyright header with the year of latest meaningful modification".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/models/modeling_gpt_oss.py` at line 1, This file
(modeling_gpt_oss.py) is missing the required NVIDIA copyright header; add the
standard NVIDIA copyright header block at the very top of the file (above the
import statements) and update the year to the latest meaningful modification
year, ensuring the header matches the project's canonical header text and
formatting used across other TensorRT-LLM source files.
tensorrt_llm/_torch/modules/attention.py (1)

1-1: ⚠️ Potential issue | 🟠 Major

Missing required NVIDIA copyright header/year update.

This modified Python source file should include the standard NVIDIA copyright header (with year updated to latest meaningful modification).

As per coding guidelines, "Add NVIDIA copyright header on ALL new files and update year on modified files" and "All TensorRT-LLM source files must contain an NVIDIA copyright header with the year of latest meaningful modification".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/modules/attention.py` at line 1, Add the standard NVIDIA
copyright header (with the year updated to the latest meaningful modification)
at the very top of this source file so it precedes the existing "import
functools" line; ensure the header matches the project's canonical NVIDIA header
format and contains the correct year and copyright notice used across other
TensorRT-LLM files (so the file tensorrt_llm/_torch/modules/attention.py
conforms to licensing guidelines).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: d6593c92-5d2d-41ad-bcac-18ce0ecb2178

📥 Commits

Reviewing files that changed from the base of the PR and between 813d877 and 80d2a13.

📒 Files selected for processing (2)
  • tensorrt_llm/_torch/models/modeling_gpt_oss.py
  • tensorrt_llm/_torch/modules/attention.py

Signed-off-by: Samaresh Kumar Singh <ssam3003@gmail.com>
@ssam18 ssam18 changed the title [fix] Raise clear error when GPT-OSS is used with non-TRTLLM attention backend [None][fix] Raise clear error when GPT-OSS is used with non-TRTLLM attention backend Apr 17, 2026
@svc-trtllm-gh-bot svc-trtllm-gh-bot added the Community want to contribute PRs initiated from Community label Apr 17, 2026
Collaborator

@symphonylyh symphonylyh left a comment

LGTM

@karljang
Collaborator

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #48119 [ run ] triggered by Bot. Commit: 3286e12 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48119 [ run ] completed with state SUCCESS. Commit: 3286e12
/LLM/main/L0_MergeRequest_PR pipeline #37946 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again


@ssam18
Contributor Author

ssam18 commented May 13, 2026

/bot run --disable-fail-fast

@karljang Could you share which 6 tests failed in L0 #37946? The JUnit reports are redacted on my side. This PR only adds a ValueError in AttentionBlock.__init__ and improves an assert message, so it should not affect any test that wasn't already misconfigured. If the failures look unrelated, could you re-trigger CI? Thanks.

@karljang
Collaborator

The failed cases don't seem to be related to this PR~ I'll just rerun it.

@karljang
Collaborator

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #48231 [ run ] triggered by Bot. Commit: 7bfb10b Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48231 [ run ] completed with state SUCCESS. Commit: 7bfb10b
/LLM/main/L0_MergeRequest_PR pipeline #38049 completed with status: 'SUCCESS'

CI Report

@karljang karljang merged commit f5b0bde into NVIDIA:main May 14, 2026
6 checks passed

Labels

Community want to contribute PRs initiated from Community

Development

Successfully merging this pull request may close these issues.

[Bug]: GPT-OSS-20B fails on RTX Pro 6000 when using FlashInfer backend

6 participants