
[TRTLLM-12127][fix] VisualGen metadata updates#12862

Merged
chang-l merged 2 commits into NVIDIA:main from o-stoner:user/o-stoner/visual-gen-pipeline-metadata
Apr 21, 2026

Conversation

@o-stoner
Collaborator

@o-stoner o-stoner commented Apr 8, 2026

Summary by CodeRabbit

  • Refactor

    • Restructured attention configuration handling to support shared metadata state across attention layers, improving resource management for TRTLLM backend operations.
  • Chores

    • Updated test configurations to properly initialize attention metadata state for backend-specific attention implementations.

Description

The reverted SageAttention PR #11718 by @xrq-phys included "pipeline-level metadata" updates (see example) that are not blocked by the new kernels or the JIT workflow. This PR adds that behavior back to VisualGen. It does not restore the test_attention_trtllm_sage.py AttentionOp unit test for SageAttention, since that test will only pass once SageAttention integration is complete. The remaining tests previously modified under unittest/_torch/visual_gen have all been updated to include the attention_metadata_state behavior, just without SageAttentionConfig integration.

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR follows the TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions).

  • Any new dependencies have been scanned for license and vulnerabilities.

  • CODEOWNERS updated if ownership changes.

  • Documentation updated as needed.

  • Tava architecture diagram updated if there is a significant design change in the PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@o-stoner o-stoner changed the title from "[none][fix] VisualGen metadata updates" to "[None][fix] VisualGen metadata updates" on Apr 8, 2026
@o-stoner
Collaborator Author

o-stoner commented Apr 8, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #42403 [ run ] triggered by Bot. Commit: 95f9d70 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42403 [ run ] completed with state SUCCESS. Commit: 95f9d70
/LLM/main/L0_MergeRequest_PR pipeline #33176 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@o-stoner o-stoner marked this pull request as ready for review April 9, 2026 23:32
@o-stoner o-stoner requested review from a team as code owners April 9, 2026 23:32
@coderabbitai
Contributor

coderabbitai Bot commented Apr 9, 2026

📝 Walkthrough

Walkthrough

This pull request introduces a shared attention metadata state mechanism for the TRTLLM attention backend. A new attention_metadata_state dictionary is created at the model configuration level and threaded through the attention creation pipeline, allowing a single metadata instance and capacity tracking to be shared across all attention layers in a model instance.
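The sharing mechanism described above can be sketched as follows. This is a hypothetical minimal illustration of the pattern, not the actual TensorRT-LLM implementation: the names `create_attention_metadata_state` and the `{"metadata": None, "capacity": (0, 0)}` shape follow the walkthrough, while `TrtllmAttentionSketch` and `ensure_capacity` are invented stand-ins for the real classes.

```python
def create_attention_metadata_state():
    # One dict per model instance: a single metadata object plus a capacity
    # tuple, shared by reference across all attention layers.
    return {"metadata": None, "capacity": (0, 0)}


class TrtllmAttentionSketch:
    """Illustrative stand-in for a layer holding shared metadata state."""

    def __init__(self, attention_metadata_state):
        # Store a reference, not a copy, so every layer sees the same state.
        self.state = attention_metadata_state

    def ensure_capacity(self, batch, seq_len):
        # (Re)build shared metadata only when the requested shape exceeds
        # what a previous layer has already allocated.
        if (batch, seq_len) > self.state["capacity"]:
            self.state["capacity"] = (batch, seq_len)
            self.state["metadata"] = {"shape": (batch, seq_len)}
        return self.state["metadata"]


state = create_attention_metadata_state()
layers = [TrtllmAttentionSketch(state) for _ in range(3)]
layers[0].ensure_capacity(2, 128)          # first layer allocates
md = layers[1].ensure_capacity(2, 64)      # later layer reuses, no realloc
assert md == {"shape": (2, 128)}
assert all(layer.state is state for layer in layers)
```

Because the dict is passed by reference, capacity bookkeeping done by any one layer is immediately visible to all others, which is the centralization the walkthrough describes.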

Changes

Cohort / File(s) Summary
Example Scripts
examples/visual_gen/visual_gen_wan_i2v.py, examples/visual_gen/visual_gen_wan_t2v.py
Refactored attention configuration construction by extracting attention_cfg as a standalone dictionary before passing it to kwargs, improving code organization and clarity.
TRTLLM Attention Backend
tensorrt_llm/_torch/visual_gen/attention_backend/trtllm.py
Modified TrtllmAttentionMetadata and TrtllmAttention to accept an external attention_metadata_state dict parameter. Metadata state is now shared across layers via this external dict instead of internal tracking, enabling capacity and metadata bookkeeping to be centralized across a model instance.
Attention Factory and Configuration
tensorrt_llm/_torch/visual_gen/attention_backend/utils.py, tensorrt_llm/_torch/visual_gen/config.py
Added create_attention_metadata_state() helper function and attention_metadata_state field to DiffusionModelConfig. Updated create_attention() factory to accept and validate the metadata state, requiring it for TRTLLM backend with a ValueError if missing.
Model and Module Integration
tensorrt_llm/_torch/visual_gen/models/ltx2/transformer_ltx2.py, tensorrt_llm/_torch/visual_gen/modules/attention.py
Updated model attention initialization to pass attention_config and attention_metadata_state through the attention creation pipeline, wiring the configuration metadata state into backend factories.
Test Configuration
tests/unittest/_torch/visual_gen/multi_gpu/test_flux_ulysses.py, tests/unittest/_torch/visual_gen/test_attention_integration.py, tests/unittest/_torch/visual_gen/test_attention_perf.py, tests/unittest/_torch/visual_gen/test_flux_attention.py, tests/unittest/_torch/visual_gen/test_ltx2_attention.py
Updated test configurations to initialize attention_metadata_state using create_attention_metadata_state() when using TRTLLM backend, ensuring tests properly set up the shared metadata state mechanism.
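The factory-side validation summarized above (create_attention() requiring a metadata state for the TRTLLM backend and raising ValueError when it is missing) can be sketched like this. The function body and return values here are illustrative assumptions; only the names `create_attention`, `create_attention_metadata_state`, and the ValueError behavior come from the change summary.

```python
def create_attention_metadata_state():
    # Shared per-model state, as described in the change summary.
    return {"metadata": None, "capacity": (0, 0)}


def create_attention(backend, attention_metadata_state=None):
    # Sketch of the validation: the TRTLLM backend must be handed the
    # shared state created at model-config level; other backends need none.
    if backend == "TRTLLM":
        if attention_metadata_state is None:
            raise ValueError(
                "TRTLLM backend requires attention_metadata_state; "
                "create it once per model via create_attention_metadata_state()"
            )
        return ("trtllm_attention", attention_metadata_state)
    return ("default_attention", None)


state = create_attention_metadata_state()
kind, shared = create_attention("TRTLLM", attention_metadata_state=state)
assert kind == "trtllm_attention" and shared is state
```

This mirrors how the updated tests set things up: build the state once with create_attention_metadata_state(), then thread it into every attention created for that model.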

Sequence Diagram

sequenceDiagram
    participant Config as DiffusionModelConfig
    participant Module as attention.py<br/>(Attention Module)
    participant Factory as utils.py<br/>(create_attention)
    participant Backend as trtllm.py<br/>(TrtllmAttention)

    Config->>Config: Create attention_metadata_state<br/>{"metadata": None, "capacity": (0,0)}
    
    Module->>Module: Extract attention_metadata_state<br/>from config
    
    Module->>Factory: Call create_attention(...)<br/>attention_metadata_state=state
    
    Factory->>Factory: Validate: TRTLLM backend<br/>requires metadata_state
    
    alt TRTLLM Backend Valid
        Factory->>Backend: Instantiate with<br/>attention_metadata_state dict
        Backend->>Backend: Store reference to<br/>shared metadata state
        Backend-->>Factory: Return configured attention
    else Missing metadata_state
        Factory->>Factory: Raise ValueError
    end
    
    Factory-->>Module: Attention instance ready

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 71.43%, below the required 80.00% threshold. Resolution: write docstrings for the functions that are missing them.
  • Description check — ❓ Inconclusive: the PR description gives context about reverting a SageAttention PR and re-adding pipeline-level metadata behavior, but the Description section is empty and the Test Coverage section lists no specific tests. Resolution: fill in the Description section with a brief explanation of the issue and solution, and list the specific tests that validate the metadata updates under Test Coverage.
✅ Passed checks (1 passed)

  • Title check — ✅ Passed: the title '[TRTLLM-12127][fix] VisualGen metadata updates' is specific, concise, and accurately summarizes the main change of adding pipeline-level metadata updates to VisualGen.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tensorrt_llm/_torch/visual_gen/config.py (1)

1-5: ⚠️ Potential issue | 🟡 Minor

Add NVIDIA copyright header with 2026 year.

The file is missing the required SPDX copyright header. Add the header at the top:

# SPDX-FileCopyrightText: Copyright (c) 2025–2026, NVIDIA CORPORATION & AFFILIATES
# SPDX-License-Identifier: Apache-2.0

All TensorRT-LLM source files must include this header with the latest modification year.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/visual_gen/config.py` around lines 1 - 5, The file is
missing the required SPDX copyright header; add the two-line header exactly as
specified ("# SPDX-FileCopyrightText: Copyright (c) 2025–2026, NVIDIA
CORPORATION & AFFILIATES" and "# SPDX-License-Identifier: Apache-2.0") at the
very top of tensorrt_llm/_torch/visual_gen/config.py before any imports (above
the existing imports such as json, Enum, Path) so every source file includes the
latest modification year.
🧹 Nitpick comments (1)
tensorrt_llm/_torch/visual_gen/attention_backend/utils.py (1)

29-29: Unused attention_config parameter and import.

The AttentionConfig is imported (line 29) and attention_config is accepted as a parameter (line 81), but neither is used in the function body. Only attention_metadata_state is actually consumed (lines 113-119).

If this is a placeholder for future backend-specific configuration (e.g., FA4), consider either:

  1. Adding a TODO comment explaining planned usage
  2. Removing it until needed to avoid dead code

Also applies to: 81-81, 103-103

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/visual_gen/attention_backend/utils.py` at line 29, The
import AttentionConfig and the parameter attention_config on the function that
consumes attention_metadata_state are unused; either remove the import and the
attention_config parameter from the function signature (and any call sites) to
eliminate dead code, or keep them and add a clear TODO comment explaining the
intended future use for backend-specific configuration (e.g., FA4) near the
AttentionConfig import and inside the function where attention_metadata_state is
used; update references to attention_config in this module accordingly (symbols:
AttentionConfig, attention_config, attention_metadata_state).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@tensorrt_llm/_torch/visual_gen/config.py`:
- Around line 1-5: The file is missing the required SPDX copyright header; add
the two-line header exactly as specified ("# SPDX-FileCopyrightText: Copyright
(c) 2025–2026, NVIDIA CORPORATION & AFFILIATES" and "# SPDX-License-Identifier:
Apache-2.0") at the very top of tensorrt_llm/_torch/visual_gen/config.py before
any imports (above the existing imports such as json, Enum, Path) so every
source file includes the latest modification year.

---

Nitpick comments:
In `@tensorrt_llm/_torch/visual_gen/attention_backend/utils.py`:
- Line 29: The import AttentionConfig and the parameter attention_config on the
function that consumes attention_metadata_state are unused; either remove the
import and the attention_config parameter from the function signature (and any
call sites) to eliminate dead code, or keep them and add a clear TODO comment
explaining the intended future use for backend-specific configuration (e.g.,
FA4) near the AttentionConfig import and inside the function where
attention_metadata_state is used; update references to attention_config in this
module accordingly (symbols: AttentionConfig, attention_config,
attention_metadata_state).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 504b39da-72dc-46d1-880a-a4f54d1bcdc2

📥 Commits

Reviewing files that changed from the base of the PR and between 2a0be45 and 4fbf4b1.

📒 Files selected for processing (12)
  • examples/visual_gen/visual_gen_wan_i2v.py
  • examples/visual_gen/visual_gen_wan_t2v.py
  • tensorrt_llm/_torch/visual_gen/attention_backend/trtllm.py
  • tensorrt_llm/_torch/visual_gen/attention_backend/utils.py
  • tensorrt_llm/_torch/visual_gen/config.py
  • tensorrt_llm/_torch/visual_gen/models/ltx2/transformer_ltx2.py
  • tensorrt_llm/_torch/visual_gen/modules/attention.py
  • tests/unittest/_torch/visual_gen/multi_gpu/test_flux_ulysses.py
  • tests/unittest/_torch/visual_gen/test_attention_integration.py
  • tests/unittest/_torch/visual_gen/test_attention_perf.py
  • tests/unittest/_torch/visual_gen/test_flux_attention.py
  • tests/unittest/_torch/visual_gen/test_ltx2_attention.py

@o-stoner
Collaborator Author

o-stoner commented Apr 9, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #42592 [ run ] triggered by Bot. Commit: 4fbf4b1 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42592 [ run ] completed with state SUCCESS. Commit: 4fbf4b1
/LLM/main/L0_MergeRequest_PR pipeline #33318 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@o-stoner
Collaborator Author

/bot run --disable-fail-fast

2 similar comments
@o-stoner
Collaborator Author

/bot run --disable-fail-fast

@o-stoner
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #43827 [ run ] triggered by Bot. Commit: 2d69ecb Link to invocation

@o-stoner
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #43833 [ run ] triggered by Bot. Commit: 8390580 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43827 [ run ] completed with state ABORTED. Commit: 2d69ecb

Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43833 [ run ] completed with state FAILURE. Commit: 8390580
/LLM/main/L0_MergeRequest_PR pipeline #34300 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@o-stoner
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #43948 [ run ] triggered by Bot. Commit: 8390580 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43948 [ run ] completed with state SUCCESS. Commit: 8390580
/LLM/main/L0_MergeRequest_PR pipeline #34394 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: Olivia Stoner <245287810+o-stoner@users.noreply.github.com>
@o-stoner o-stoner force-pushed the user/o-stoner/visual-gen-pipeline-metadata branch from 8390580 to b300b14 on April 17, 2026 19:26
@o-stoner
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #44067 [ run ] triggered by Bot. Commit: b300b14 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #44067 [ run ] completed with state SUCCESS. Commit: b300b14
/LLM/main/L0_MergeRequest_PR pipeline #34497 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@o-stoner
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #44511 [ run ] triggered by Bot. Commit: 0f772c6 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #44511 [ run ] completed with state SUCCESS. Commit: 0f772c6
/LLM/main/L0_MergeRequest_PR pipeline #34910 completed with status: 'SUCCESS'

CI Report

Link to invocation

@o-stoner o-stoner changed the title from "[None][fix] VisualGen metadata updates" to "[TRTLLM-12127][fix] VisualGen metadata updates" on Apr 21, 2026
@chang-l chang-l merged commit 6e5a339 into NVIDIA:main Apr 21, 2026
7 checks passed
