
[None][refactor] VisualGen attention backend refactor #12663

Merged
chang-l merged 3 commits into NVIDIA:main from NVShreyas:user/shreyasm/vg-attn-backend-refactor on Apr 2, 2026

Conversation

@NVShreyas (Collaborator) commented Apr 1, 2026

Summary by CodeRabbit

Release Notes

  • Refactor
    • Introduced a standardized AttentionBackend interface to unify behavior across attention implementations.
    • Simplified attention forward method signatures by automatically deriving batch size and sequence length from tensor shapes.
    • Updated FlashAttention, TRTLLM, Ulysses, and Vanilla attention implementations to adopt the new interface.
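Concretely, the simplified signatures derive dimensions from tensor shapes rather than taking them as arguments. A minimal sketch of that derivation (the `[batch, seq, heads, head_dim]` layout and the `derive_dims` helper name are illustrative assumptions, not the PR's actual code):

```python
import torch

def derive_dims(q: torch.Tensor, k: torch.Tensor) -> tuple[int, int, int]:
    """Return (batch_size, seq_len, seq_len_kv) derived from tensor shapes.

    Assumes a [batch, seq, num_heads, head_dim] layout; callers no longer
    pass these dimensions explicitly.
    """
    return q.shape[0], q.shape[1], k.shape[1]

q = torch.zeros(2, 128, 8, 64)  # batch=2, q_len=128, heads=8, head_dim=64
k = torch.zeros(2, 256, 8, 64)  # cross-attention KV with a longer sequence
print(derive_dims(q, k))  # (2, 128, 256)
```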

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR follows the TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update the tava architecture diagram if there is a significant design change in the PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@NVShreyas requested a review from a team as a code owner on April 1, 2026 16:38
@coderabbitai bot (Contributor) commented Apr 1, 2026

📝 Walkthrough

Walkthrough

This PR introduces a standardized AttentionBackend abstract base class and refactors all attention backend implementations to inherit from it. The changes remove explicit dimension parameters (batch_size, seq_len) from forward signatures, replacing them with shape-derived computations. Callers are updated to stop passing these dimensions.

Changes

Cohort / File(s) Summary
Interface Definition
tensorrt_llm/_torch/visual_gen/attention_backend/interface.py, tensorrt_llm/_torch/visual_gen/attention_backend/__init__.py
Introduced AttentionBackend as nn.Module + ABC with abstract forward(q, k, v, **kwargs) method and preferred_layout property. Added default support_fused_qkv() classmethod. Exported via __init__.py.
Backend Implementations
tensorrt_llm/_torch/visual_gen/attention_backend/flash_attn4.py, tensorrt_llm/_torch/visual_gen/attention_backend/vanilla.py, tensorrt_llm/_torch/visual_gen/attention_backend/trtllm.py, tensorrt_llm/_torch/visual_gen/attention_backend/parallel.py
Updated all backends to inherit AttentionBackend. Removed batch_size/seq_len/seq_len_kv parameters from forward() signatures and implemented shape-based dimension derivation (e.g., batch_size = q.shape[0]). Made attention_mask keyword-only.
Type Utilities
tensorrt_llm/_torch/visual_gen/attention_backend/utils.py
Replaced DiffusionAttentionBackend union type with concrete AttentionBackend in return type annotations. Updated docstrings to document "FA4" backend support.
Attention Module Wrapper
tensorrt_llm/_torch/visual_gen/modules/attention.py
Modified _attn_impl() to remove explicit batch_size, seq_len, kv_seq_len parameters. Backend calls now use shape-derived values via **kwargs forwarding.
Model Callers
tensorrt_llm/_torch/visual_gen/models/flux/attention.py, tensorrt_llm/_torch/visual_gen/models/wan/transformer_wan.py
Updated attention backend invocations to remove dimension arguments. Calls changed from _attn_impl(q, k, v, batch_size, seq_len) to _attn_impl(q, k, v).
Tests
tests/unittest/_torch/visual_gen/multi_gpu/test_ulysses_attention.py
Updated test backend class _FusedVanillaAttention to inherit AttentionBackend. Removed dimension arguments from all attention invocation sites in test assertions.
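Taken together, the interface and a concrete backend plausibly look like the sketch below. Only the names `AttentionBackend`, `forward`, `preferred_layout`, and `support_fused_qkv` come from the summary above; the layout string, the toy `VanillaAttention` body, and every other detail are illustrative assumptions, not the repository's actual code:

```python
from abc import ABC, abstractmethod
from typing import Optional

import torch
from torch import nn


class AttentionBackend(nn.Module, ABC):
    """Sketch of the standardized interface described in this PR."""

    @abstractmethod
    def forward(
        self,
        q: torch.Tensor,
        k: Optional[torch.Tensor] = None,
        v: Optional[torch.Tensor] = None,
        **kwargs,
    ) -> torch.Tensor:
        ...

    @property
    @abstractmethod
    def preferred_layout(self) -> str:
        ...

    @classmethod
    def support_fused_qkv(cls) -> bool:
        # Default provided by the base class; backends override to opt in.
        return False


class VanillaAttention(AttentionBackend):
    """Toy backend showing shape-derived dimensions (not the real vanilla.py)."""

    @property
    def preferred_layout(self) -> str:
        return "bshd"  # assumed [batch, seq, heads, head_dim] layout

    def forward(self, q, k=None, v=None, **kwargs) -> torch.Tensor:
        k = q if k is None else k  # self-attention fallback
        v = q if v is None else v
        batch_size = q.shape[0]  # derived from shape; seq_len would be q.shape[1]
        assert k.shape[0] == batch_size, "q/k batch mismatch"
        scores = torch.einsum("bqhd,bkhd->bhqk", q, k) / (q.shape[-1] ** 0.5)
        return torch.einsum("bhqk,bkhd->bqhd", scores.softmax(dim=-1), v)
```

A caller then invokes `backend(q, k, v)` with no dimension arguments, matching the updated `_attn_impl(q, k, v)` call sites.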

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Description check — ⚠️ Warning: The pull request description is essentially empty, containing only the template structure with no actual content filling the required sections. Resolution: provide a clear description of what is being refactored and why, include implementation details and test coverage information, and ensure all PR checklist items are explicitly addressed with actual content rather than template placeholders.

✅ Passed checks (2 passed)

  • Title check — ✅ Passed: The title accurately describes the main change: refactoring the VisualGen attention backend interface and implementations.
  • Docstring coverage — ✅ Passed: Docstring coverage is 83.33%, which meets the required threshold of 80.00%.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
tensorrt_llm/_torch/visual_gen/attention_backend/trtllm.py (2)

1-1: ⚠️ Potential issue | 🟡 Minor

Update copyright year to 2026.

Per coding guidelines, the copyright year should reflect the latest meaningful modification.

Proposed fix
-# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

As per coding guidelines: "All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the year of its latest meaningful modification."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/visual_gen/attention_backend/trtllm.py` at line 1, Update
the SPDX copyright header in the file by changing the year from 2025 to 2026:
locate the line beginning with "# SPDX-FileCopyrightText: Copyright (c) 2025
NVIDIA CORPORATION & AFFILIATES. All rights reserved." (the SPDX header in
trtllm.py) and revise the year to 2026 so it reads 2026.

242-245: ⚠️ Potential issue | 🟡 Minor

Handle mismatched k/v None states.

If only one of k or v is None (but not both), the code falls through to _concat_qkv which will fail when accessing .view() on None. Consider adding validation or explicitly handling this edge case.

Proposed fix
+        if (k is None) != (v is None):
+            raise ValueError("k and v must both be None (fused QKV) or both provided")
+
         if k is None and v is None:
             qkv = q.reshape(batch_size * seq_len, -1)
         else:
             qkv = self._concat_qkv(q, k, v, batch_size, seq_len, kv_seq_len)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/visual_gen/attention_backend/trtllm.py` around lines 242
- 245, The code currently only handles both k and v being None or both present;
add a guard in the block using q, k, v (around the qkv assignment) to detect
mismatched None states (when (k is None) != (v is None)) and handle it: either
raise a clear ValueError mentioning _concat_qkv and the mismatched k/v state, or
normalize by setting the missing tensor to the present one before calling
self._concat_qkv; ensure the check references k, v, self._concat_qkv, and
q.reshape so the intent and location are clear.
tensorrt_llm/_torch/visual_gen/attention_backend/vanilla.py (1)

1-1: ⚠️ Potential issue | 🟡 Minor

Update copyright year to 2026.

Per coding guidelines, the NVIDIA copyright header should include the year of the latest meaningful modification. Since this file is being modified in 2026, the copyright year should be updated.

Proposed fix
-# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

As per coding guidelines: "All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the year of its latest meaningful modification."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/visual_gen/attention_backend/vanilla.py` at line 1,
Update the copyright year in the SPDX header: locate the SPDX comment line that
currently reads "# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION
& AFFILIATES. All rights reserved." and change the year from 2025 to 2026 so the
header reflects the latest modification year.
🧹 Nitpick comments (3)
tensorrt_llm/_torch/visual_gen/modules/attention.py (1)

254-254: Call the backend module via __call__, not forward() directly.

Directly invoking self.attn.forward(...) bypasses nn.Module.__call__, so forward hooks and wrappers on the backend never run. Using self.attn(...) preserves the normal PyTorch dispatch path and still reaches the same implementation.

Suggested fix
-        out = self.attn.forward(q=q, k=k, v=v, **kwargs)
+        out = self.attn(q=q, k=k, v=v, **kwargs)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/visual_gen/modules/attention.py` at line 254, Replace the
direct backend forward invocation with a normal module call so PyTorch
hooks/wrappers run: change the call site that currently does out =
self.attn.forward(q=q, k=k, v=v, **kwargs) to use the module __call__ (e.g., out
= self.attn(q=q, k=k, v=v, **kwargs)) in the method where self.attn is used so
that nn.Module.__call__ dispatch executes; ensure any keyword/positional
arguments remain identical to preserve behavior.
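The hook-bypass behavior this comment describes is easy to reproduce in isolation; a minimal sketch (the `Inner` module is a stand-in, not the backend in question):

```python
import torch
from torch import nn

class Inner(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + 1

m = Inner()
calls = []
m.register_forward_hook(lambda mod, inp, out: calls.append("hook"))

m.forward(torch.zeros(1))  # bypasses nn.Module.__call__: hook does NOT run
m(torch.zeros(1))          # normal dispatch: hook runs
print(calls)  # ['hook']
```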
tests/unittest/_torch/visual_gen/multi_gpu/test_ulysses_attention.py (1)

455-466: Keep the new backend test double typed like the interface it is validating.

_FusedVanillaAttention.forward() drops the tensor and return annotations right where this class starts modeling the shared AttentionBackend contract. Adding them here makes interface drift visible to type checkers instead of silently collapsing to Any.

Suggested fix
-    def forward(self, q, k=None, v=None, **kwargs):
+    def forward(
+        self,
+        q: torch.Tensor,
+        k: torch.Tensor | None = None,
+        v: torch.Tensor | None = None,
+        **kwargs,
+    ) -> torch.Tensor:

As per coding guidelines, "Static type checking is opt-in by submodule PICs in Python. Always annotate functions with return types, and make the return type None if the function does not return anything."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unittest/_torch/visual_gen/multi_gpu/test_ulysses_attention.py` around
lines 455 - 466, The new test backend _FusedVanillaAttention must preserve the
same typed signature as the AttentionBackend interface: add explicit parameter
and return type annotations on _FusedVanillaAttention.forward (e.g. q:
torch.Tensor, k: Optional[torch.Tensor]=None, v: Optional[torch.Tensor]=None,
**kwargs) -> torch.Tensor (or -> None if the interface forward returns None) so
static type checkers catch drift; ensure you import Optional and torch.Tensor
and match the exact return type declared on AttentionBackend.forward.
tensorrt_llm/_torch/visual_gen/attention_backend/parallel.py (1)

166-168: support_fused_qkv() is broader than this wrapper's public API.

UlyssesAttention.forward() still requires separate q, k, and v and only builds fused QKV internally. Returning True here makes the wrapper look interchangeable with backends that actually accept fused inputs, which is not true today. Either document this as an internal optimization hint or make the wrapper accept fused inputs as well.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/visual_gen/attention_backend/parallel.py` around lines
166 - 168, The method support_fused_qkv() currently returns True but the
wrapper’s public API (UlyssesAttention.forward) still requires separate q, k, v;
change support_fused_qkv() to return False so the wrapper is not advertised as
accepting fused inputs, or alternatively implement fused-input handling in
UlyssesAttention.forward (accept a single fused qkv tensor, split it into q/k/v
before existing logic) and update any input checks; prefer the first option (set
support_fused_qkv() -> False) unless you also add fused-input parsing in
UlyssesAttention.forward.
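The reviewer's second option (accepting a fused tensor and splitting it) could look roughly like this. A hedged sketch only: the fused `[batch, seq, 3 * num_heads * head_dim]` layout and the `split_fused_qkv` name are assumptions about UlyssesAttention's internals, not its actual code:

```python
import torch

def split_fused_qkv(qkv: torch.Tensor, num_heads: int, head_dim: int):
    """Split a fused [batch, seq, 3 * num_heads * head_dim] tensor into q, k, v,
    so a wrapper that advertises support_fused_qkv() can accept fused input."""
    batch, seq, _ = qkv.shape
    return qkv.view(batch, seq, 3, num_heads, head_dim).unbind(dim=2)

qkv = torch.randn(2, 16, 3 * 8 * 64)
q, k, v = split_fused_qkv(qkv, num_heads=8, head_dim=64)
print(q.shape)  # torch.Size([2, 16, 8, 64])
```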
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/_torch/visual_gen/attention_backend/interface.py`:
- Around line 48-55: Update the abstract base class method signature for
AttentionBackend.forward to accept Optional[torch.Tensor] for k and v (matching
implementations like TRTLLMAttention and call sites such as
UlyssesAttention._forward_fused which pass k=None, v=None), and split the
one-line ellipsis stubs into their own lines (replace single-line "->
torch.Tensor: ..." with a proper signature line followed by a separate line
containing "..." for both forward and the preferred_layout property) so the ABC
matches implementations and fixes the E704 style violation.

In `@tensorrt_llm/_torch/visual_gen/attention_backend/trtllm.py`:
- Line 148: The MRO breaks because AttentionBackend.__init__ in the main backend
doesn't call super().__init__, so nn.Module.__init__ is never run when creating
TrtllmAttention (which inherits BaseTrtllmAttention, AttentionBackend); fix by
updating the main backend AttentionBackend.__init__ to call
super().__init__(**kwargs) (preserving existing logic) so the init chain reaches
visual_gen AttentionBackend -> nn.Module, or alternatively ensure
TrtllmAttention.__init__ explicitly invokes nn.Module.__init__(self) before
other inits; reference AttentionBackend.__init__, TrtllmAttention, and
BaseTrtllmAttention when making the change.
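The cooperative-init issue described above can be reproduced in isolation; in the sketch below the class names are stand-ins for the backend init chain, not the actual code:

```python
import torch
from torch import nn

class BrokenInit:
    def __init__(self):
        # Stops the init chain: super().__init__() is never called, so
        # nn.Module.__init__ (later in the MRO) never runs.
        self.configured = True

class CooperativeInit:
    def __init__(self, **kwargs):
        super().__init__(**kwargs)  # keeps the chain going until nn.Module
        self.configured = True

class BrokenBackend(BrokenInit, nn.Module):
    pass

class FixedBackend(CooperativeInit, nn.Module):
    pass

try:
    BrokenBackend().state_dict()  # nn.Module internals were never initialized
    print("broken: no error")
except AttributeError:
    print("broken: AttributeError")

FixedBackend().state_dict()  # works: nn.Module.__init__ ran via super()
print("fixed: ok")
```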


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 27942b7e-f67f-4d22-ba17-6501dcd38b6d

📥 Commits

Reviewing files that changed from the base of the PR and between 1a65a0d and 7411cfd.

📒 Files selected for processing (11)
  • tensorrt_llm/_torch/visual_gen/attention_backend/__init__.py
  • tensorrt_llm/_torch/visual_gen/attention_backend/flash_attn4.py
  • tensorrt_llm/_torch/visual_gen/attention_backend/interface.py
  • tensorrt_llm/_torch/visual_gen/attention_backend/parallel.py
  • tensorrt_llm/_torch/visual_gen/attention_backend/trtllm.py
  • tensorrt_llm/_torch/visual_gen/attention_backend/utils.py
  • tensorrt_llm/_torch/visual_gen/attention_backend/vanilla.py
  • tensorrt_llm/_torch/visual_gen/models/flux/attention.py
  • tensorrt_llm/_torch/visual_gen/models/wan/transformer_wan.py
  • tensorrt_llm/_torch/visual_gen/modules/attention.py
  • tests/unittest/_torch/visual_gen/multi_gpu/test_ulysses_attention.py

@NVShreyas (Collaborator, Author)

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #41238 [ run ] triggered by Bot. Commit: 7411cfd Link to invocation

@NVShreyas (Collaborator, Author)

/bot kill

@tensorrt-cicd (Collaborator)

PR_Github #41239 [ kill ] triggered by Bot. Commit: 4b48264 Link to invocation

@tensorrt-cicd (Collaborator)

PR_Github #41238 [ run ] completed with state ABORTED. Commit: 7411cfd

Link to invocation

@tensorrt-cicd (Collaborator)

PR_Github #41239 [ kill ] completed with state SUCCESS. Commit: 4b48264
Successfully killed previous jobs for commit 4b48264

Link to invocation

@NVShreyas (Collaborator, Author)

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #41242 [ run ] triggered by Bot. Commit: 4b48264 Link to invocation

@tensorrt-cicd (Collaborator)

PR_Github #41242 [ run ] completed with state SUCCESS. Commit: 4b48264
/LLM/main/L0_MergeRequest_PR pipeline #32201 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@NVShreyas (Collaborator, Author)

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #41290 [ run ] triggered by Bot. Commit: 4b48264 Link to invocation

@tensorrt-cicd (Collaborator)

PR_Github #41290 [ run ] completed with state SUCCESS. Commit: 4b48264
/LLM/main/L0_MergeRequest_PR pipeline #32247 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>
Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>
@NVShreyas force-pushed the user/shreyasm/vg-attn-backend-refactor branch from 4b48264 to 4ec41ad on April 2, 2026 14:52
@NVShreyas (Collaborator, Author)

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #41443 [ run ] triggered by Bot. Commit: 4ec41ad Link to invocation


@chang-l left a comment


In general, LGTM with minor comments

Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>
@NVShreyas (Collaborator, Author)

/bot kill

@tensorrt-cicd (Collaborator)

PR_Github #41467 [ kill ] triggered by Bot. Commit: 53312e6 Link to invocation

@tensorrt-cicd (Collaborator)

PR_Github #41467 [ kill ] completed with state SUCCESS. Commit: 53312e6
Successfully killed previous jobs for commit 53312e6

Link to invocation

@NVShreyas (Collaborator, Author)

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #41475 [ run ] triggered by Bot. Commit: 53312e6 Link to invocation

@tensorrt-cicd (Collaborator)

PR_Github #41475 [ run ] completed with state SUCCESS. Commit: 53312e6
/LLM/main/L0_MergeRequest_PR pipeline #32400 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@NVShreyas (Collaborator, Author)

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #41512 [ run ] triggered by Bot. Commit: 53312e6 Link to invocation

@tensorrt-cicd (Collaborator)

PR_Github #41512 [ run ] completed with state SUCCESS. Commit: 53312e6
/LLM/main/L0_MergeRequest_PR pipeline #32428 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@NVShreyas (Collaborator, Author)

/bot help

@github-actions bot commented Apr 2, 2026

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental) --high-priority]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

--high-priority (OPTIONAL) : Run the pipeline with high priority. This option is restricted to authorized users only and will route the job to a high-priority queue.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

@NVShreyas (Collaborator, Author)

/bot skip --comment "Failing autodeploy DS test is unrelated to this PRs changes"

@tensorrt-cicd (Collaborator)

PR_Github #41515 [ skip ] triggered by Bot. Commit: 53312e6 Link to invocation

@tensorrt-cicd (Collaborator)

PR_Github #41515 [ skip ] completed with state SUCCESS. Commit: 53312e6
Skipping testing for commit 53312e6

Link to invocation

@chang-l merged commit 45449ad into NVIDIA:main on Apr 2, 2026
5 checks passed
govind-ramnarayan pushed a commit to nv-auto-deploy/TensorRT-LLM that referenced this pull request Apr 6, 2026
Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>
karen-sy pushed a commit to karen-sy/TensorRT-LLM that referenced this pull request Apr 7, 2026
Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>

3 participants