[None][fix] Raise clear error when GPT-OSS is used with non-TRTLLM attention backend by ssam18 · Pull Request #13166 · NVIDIA/TensorRT-LLM

ssam18 · 2026-04-17T22:26:11Z

GPT-OSS uses attention sinks, which are only supported by the TRTLLM backend, but this requirement was only enforced during the first forward pass, deep inside executor warmup, which produced a very confusing traceback. This moves the check into `AttentionBlock.__init__` so users get a clear, actionable `ValueError` the moment the model is constructed with an incompatible backend. Also tightened the fallback assert in `attention.py` to include the actual backend name in the message. Fixes #13156
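The fail-fast idea above can be sketched as follows. This is a minimal illustration under stated assumptions: `AttentionBase`, `SinkAttentionBlock`, `SUPPORTED_BACKEND`, `uses_attention_sinks`, and the message text are hypothetical stand-ins, not the code merged in this PR. It only shows where the check sits: right after base-class initialization and before any attention-sink setup.

```python
class AttentionBase:
    """Hypothetical stand-in for a generic attention module (not the real class)."""

    def __init__(self, attn_backend: str) -> None:
        self.attn_backend = attn_backend


class SinkAttentionBlock(AttentionBase):
    """Hypothetical GPT-OSS-style block that depends on attention sinks."""

    # Assumption: per the PR text, TRTLLM is the only backend with sink support.
    SUPPORTED_BACKEND = "TRTLLM"

    def __init__(self, attn_backend: str) -> None:
        super().__init__(attn_backend)
        # Validate right after base-class init and before any sink-specific
        # state is set up, so a bad config fails at construction time instead
        # of on the first forward pass.
        if self.attn_backend != self.SUPPORTED_BACKEND:
            raise ValueError(
                "GPT-OSS uses attention sinks, which require "
                f"attn_backend='{self.SUPPORTED_BACKEND}'; "
                f"got attn_backend='{self.attn_backend}'."
            )
        # Placeholder for the sliding-window / attention-sink setup that
        # follows the check in the real module.
        self.uses_attention_sinks = True


if __name__ == "__main__":
    try:
        # "FLASHINFER" is used here only as an example of a non-TRTLLM backend.
        SinkAttentionBlock("FLASHINFER")
    except ValueError as err:
        print(f"Rejected at construction time: {err}")
```

With an unsupported backend string the constructor raises immediately, so the user sees one targeted message instead of a traceback from executor warmup.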

Summary by CodeRabbit

Bug Fixes
- Improved error handling and validation for GPT-OSS attention configuration to provide clearer feedback when attention backend settings are incompatible.

Signed-off-by: Samaresh Kumar Singh <ssam3003@gmail.com>

coderabbitai · 2026-04-17T22:30:15Z

📝 Walkthrough

Two files are updated to enhance validation and error messaging for attention backend compatibility in GPT-OSS models. A runtime validation is added to prevent GPT-OSS initialization with unsupported attention backends, and an existing assertion error message is clarified to include the configured backend.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| GPT-OSS Model Initialization (`tensorrt_llm/_torch/models/modeling_gpt_oss.py`) | Added runtime validation in `AttentionBlock.__init__` that raises `ValueError` when `attn_backend` is not `"TRTLLM"`, executed after base-class initialization and before setting sliding window and attention sinks. |
| Attention Module Error Handling (`tensorrt_llm/_torch/modules/attention.py`) | Improved assertion error message in `Attention.forward()` to include the currently configured backend and explicitly specify that `attn_backend='TRTLLM'` is required when `attention_sinks` is used. |
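The `Attention.forward()` change amounts to naming the configured backend in the failure message. A minimal sketch, assuming a standalone helper: `check_attention_sink_backend` and its parameters are hypothetical and do not reproduce the actual `attention.py` code.

```python
from typing import Optional, Sequence


def check_attention_sink_backend(
    attn_backend: str, attention_sinks: Optional[Sequence[float]]
) -> None:
    """Hypothetical forward-time guard; the name and signature are illustrative."""
    # Only enforce the backend when sinks are actually passed in, and include
    # the configured backend so the failure explains itself.
    assert attention_sinks is None or attn_backend == "TRTLLM", (
        "attention_sinks requires attn_backend='TRTLLM', "
        f"but the configured attn_backend is '{attn_backend}'."
    )
```

Python strips `assert` statements when run with `-O`, so a check like this is best treated as a last-resort internal guard; the explicit `ValueError` raised at construction time is what users should rely on.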

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (1 warning, 2 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ❓ Inconclusive | The description explains the issue, solution, and references the fixed issue, but does not include the test coverage section required by the template. | Add the 'Test Coverage' section to clearly document what test cases validate this backend compatibility check. |
| Linked Issues check | ❓ Inconclusive | The PR partially addresses `#13156` by moving the check earlier, but only implements one of the two desired outcomes (early error detection) rather than enabling FlashInfer support on SM_120. | Clarify whether the early error detection approach fully resolves `#13156` or if further investigation is needed for enabling FlashInfer on SM_120. |
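For the missing Test Coverage section, a test for this change could look roughly like the sketch below. It assumes a hypothetical `validate_attn_backend` helper in place of constructing the real `AttentionBlock`, and the backend strings are illustrative. It uses the standard-library `unittest` so it runs standalone (for example with `python -m unittest`); pytest also collects `unittest.TestCase` classes.

```python
import unittest


def validate_attn_backend(attn_backend: str) -> None:
    """Hypothetical stand-in for the constructor-time check under test."""
    if attn_backend != "TRTLLM":
        raise ValueError(
            f"GPT-OSS requires attn_backend='TRTLLM'; got '{attn_backend}'."
        )


class TestGptOssBackendValidation(unittest.TestCase):
    def test_trtllm_backend_is_accepted(self):
        # The supported backend must not raise.
        validate_attn_backend("TRTLLM")

    def test_other_backends_are_rejected_early(self):
        # Example backend names; treat them as illustrative.
        for backend in ("VANILLA", "FLASHINFER"):
            with self.subTest(backend=backend):
                # The message should name the offending backend.
                with self.assertRaisesRegex(ValueError, backend):
                    validate_attn_backend(backend)
```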

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to enforcing attention backend compatibility for GPT-OSS, with no out-of-scope modifications detected. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: moving a runtime validation earlier to raise a clear error when GPT-OSS is used with non-TRTLLM attention backend. |

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

tensorrt_llm/_torch/models/modeling_gpt_oss.py (1)
1-1: ⚠️ Potential issue | 🟠 Major

Missing required NVIDIA copyright header/year update.

This modified Python source file should include the standard NVIDIA copyright header (with year updated to latest meaningful modification).

As per coding guidelines, "Add NVIDIA copyright header on ALL new files and update year on modified files" and "All TensorRT-LLM source files must contain an NVIDIA copyright header with the year of latest meaningful modification".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/models/modeling_gpt_oss.py` at line 1, this file
(modeling_gpt_oss.py) is missing the required NVIDIA copyright header; add the
standard NVIDIA copyright header block at the very top of the file (above the
import statements) and update the year to the latest meaningful modification
year, ensuring the header matches the project's canonical header text and
formatting used across other TensorRT-LLM source files.
tensorrt_llm/_torch/modules/attention.py (1)
1-1: ⚠️ Potential issue | 🟠 Major

Missing required NVIDIA copyright header/year update.

This modified Python source file should include the standard NVIDIA copyright header (with year updated to latest meaningful modification).

As per coding guidelines, "Add NVIDIA copyright header on ALL new files and update year on modified files" and "All TensorRT-LLM source files must contain an NVIDIA copyright header with the year of latest meaningful modification".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/modules/attention.py` at line 1, Add the standard NVIDIA
copyright header (with the year updated to the latest meaningful modification)
at the very top of this source file so it precedes the existing "import
functools" line; ensure the header matches the project's canonical NVIDIA header
format and contains the correct year and copyright notice used across other
TensorRT-LLM files (so the file tensorrt_llm/_torch/modules/attention.py
conforms to licensing guidelines).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: d6593c92-5d2d-41ad-bcac-18ce0ecb2178

📥 Commits

Reviewing files that changed from the base of the PR and between 813d877 and 80d2a13.

📒 Files selected for processing (2)

tensorrt_llm/_torch/models/modeling_gpt_oss.py
tensorrt_llm/_torch/modules/attention.py

symphonylyh

LGTM

karljang · 2026-05-13T06:15:00Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-13T06:20:24Z

PR_Github #48119 [ run ] triggered by Bot. Commit: 3286e12 Link to invocation

tensorrt-cicd · 2026-05-13T12:32:58Z

PR_Github #48119 [ run ] completed with state SUCCESS. Commit: 3286e12
/LLM/main/L0_MergeRequest_PR pipeline #37946 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

- Please check the failed tests and fix your PR.
- If you cannot view the failures, ask the CI triggerer to share details.
- Once fixed, request an NVIDIA team member to trigger CI again.

CI Agent Failure Analysis

Link to invocation

ssam18 · 2026-05-13T16:05:26Z

/bot run --disable-fail-fast

@karljang Could you share which 6 tests failed in L0 #37946? The JUnit reports are redacted on my side. This PR only adds a `ValueError` in `AttentionBlock.__init__` and improves an assert message, so it should not affect any test that wasn't already misconfigured. If the failures look unrelated, could you re-trigger CI? Thanks.

karljang · 2026-05-13T19:24:37Z

The failed cases don't seem to be related to this PR~ I'll just rerun it.

karljang · 2026-05-13T19:24:54Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-13T19:31:17Z

PR_Github #48231 [ run ] triggered by Bot. Commit: 7bfb10b Link to invocation

tensorrt-cicd · 2026-05-14T01:40:16Z

PR_Github #48231 [ run ] completed with state SUCCESS. Commit: 7bfb10b
/LLM/main/L0_MergeRequest_PR pipeline #38049 completed with status: 'SUCCESS'

CI Report

Link to invocation

[fix] Raise clear error when GPT-OSS is used with non-TRTLLM backend

80d2a13

Signed-off-by: Samaresh Kumar Singh <ssam3003@gmail.com>

ssam18 requested review from a team as code owners April 17, 2026 22:26

ssam18 requested review from pengbowang-nv and symphonylyh April 17, 2026 22:26

github-actions Bot assigned ssam18 Apr 17, 2026

ssam18 changed the title ~~[None][fix] Raise clear error when GPT-OSS is used with non-TRTLLM attention backend~~ [fix] Raise clear error when GPT-OSS is used with non-TRTLLM attention backend Apr 17, 2026

coderabbitai Bot reviewed Apr 17, 2026

[None][fix] Fix yapf formatting in attention.py

9ac17cd

Signed-off-by: Samaresh Kumar Singh <ssam3003@gmail.com>

ssam18 changed the title ~~[fix] Raise clear error when GPT-OSS is used with non-TRTLLM attention backend~~ [None][fix] Raise clear error when GPT-OSS is used with non-TRTLLM attention backend Apr 17, 2026

svc-trtllm-gh-bot added the "Community want to contribute" label (PRs initiated from Community) Apr 17, 2026

symphonylyh approved these changes May 12, 2026

Merge branch 'main' into fix/gpt-oss-attn-sink-backend-validation

3286e12

pengbowang-nv approved these changes May 13, 2026

Merge branch 'main' into fix/gpt-oss-attn-sink-backend-validation

578c4ed

Merge branch 'main' into fix/gpt-oss-attn-sink-backend-validation

7bfb10b

karljang merged commit f5b0bde into NVIDIA:main May 14, 2026
6 checks passed

6 participants