
[TRTLLM-12127][fix] VisualGen metadata updates#12862

Merged
chang-l merged 2 commits into NVIDIA:main from o-stoner:user/o-stoner/visual-gen-pipeline-metadata
Apr 21, 2026

Conversation

@o-stoner
Collaborator

@o-stoner o-stoner commented Apr 8, 2026

Summary by CodeRabbit

  • Refactor

    • Restructured attention configuration handling to support shared metadata state across attention layers, improving resource management for TRTLLM backend operations.
  • Chores

    • Updated test configurations to properly initialize attention metadata state for backend-specific attention implementations.

Description

The reverted SageAttention PR #11718 by @xrq-phys included "pipeline-level metadata" updates (see example) that are not blocked by the new kernels or the JIT workflow. This PR adds that behavior back to VisualGen. It does not restore the test_attention_trtllm_sage.py AttentionOp unit test for SageAttention, since that test will only pass once SageAttention integration is complete. The remaining tests previously modified under unittest/_torch/visual_gen have all been updated to include the attention_metadata_state behavior, just without SageAttentionConfig integration.

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR follows the TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions).

  • Any new dependencies have been scanned for license and vulnerabilities.

  • CODEOWNERS updated if ownership changes.

  • Documentation updated as needed.

  • Tava architecture diagram updated if there is a significant design change in the PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@o-stoner o-stoner changed the title from "[none][fix] VisualGen metadata updates" to "[None][fix] VisualGen metadata updates" on Apr 8, 2026
@o-stoner
Collaborator Author

o-stoner commented Apr 8, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #42403 [ run ] triggered by Bot. Commit: 95f9d70 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42403 [ run ] completed with state SUCCESS. Commit: 95f9d70
/LLM/main/L0_MergeRequest_PR pipeline #33176 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@o-stoner o-stoner marked this pull request as ready for review April 9, 2026 23:32
@o-stoner o-stoner requested review from a team as code owners April 9, 2026 23:32
@coderabbitai
Contributor

coderabbitai Bot commented Apr 9, 2026

📝 Walkthrough

Walkthrough

This pull request introduces a shared attention metadata state mechanism for the TRTLLM attention backend. A new attention_metadata_state dictionary is created at the model configuration level and threaded through the attention creation pipeline, allowing a single metadata instance and capacity tracking to be shared across all attention layers in a model instance.
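The sharing mechanism described above can be sketched as follows. This is a hypothetical minimal illustration of the pattern, not the actual TensorRT-LLM implementation: the names `create_attention_metadata_state` and the `{"metadata": None, "capacity": (0, 0)}` shape follow the walkthrough, while `TrtllmAttentionSketch` and `ensure_capacity` are invented stand-ins for the real classes.

```python
def create_attention_metadata_state():
    # One dict per model instance: a single metadata object plus a capacity
    # tuple, shared by reference across all attention layers.
    return {"metadata": None, "capacity": (0, 0)}


class TrtllmAttentionSketch:
    """Illustrative stand-in for a layer holding shared metadata state."""

    def __init__(self, attention_metadata_state):
        # Store a reference, not a copy, so every layer sees the same state.
        self.state = attention_metadata_state

    def ensure_capacity(self, batch, seq_len):
        # (Re)build shared metadata only when the requested shape exceeds
        # what a previous layer has already allocated.
        if (batch, seq_len) > self.state["capacity"]:
            self.state["capacity"] = (batch, seq_len)
            self.state["metadata"] = {"shape": (batch, seq_len)}
        return self.state["metadata"]


state = create_attention_metadata_state()
layers = [TrtllmAttentionSketch(state) for _ in range(3)]
layers[0].ensure_capacity(2, 128)          # first layer allocates
md = layers[1].ensure_capacity(2, 64)      # later layer reuses, no realloc
assert md == {"shape": (2, 128)}
assert all(layer.state is state for layer in layers)
```

Because the dict is passed by reference, capacity bookkeeping done by any one layer is immediately visible to all others, which is the centralization the walkthrough describes.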

Changes

Cohort / File(s) Summary
Example Scripts
examples/visual_gen/visual_gen_wan_i2v.py, examples/visual_gen/visual_gen_wan_t2v.py
Refactored attention configuration construction by extracting attention_cfg as a standalone dictionary before passing it to kwargs, improving code organization and clarity.
TRTLLM Attention Backend
tensorrt_llm/_torch/visual_gen/attention_backend/trtllm.py
Modified TrtllmAttentionMetadata and TrtllmAttention to accept an external attention_metadata_state dict parameter. Metadata state is now shared across layers via this external dict instead of internal tracking, enabling capacity and metadata bookkeeping to be centralized across a model instance.
Attention Factory and Configuration
tensorrt_llm/_torch/visual_gen/attention_backend/utils.py, tensorrt_llm/_torch/visual_gen/config.py
Added create_attention_metadata_state() helper function and attention_metadata_state field to DiffusionModelConfig. Updated create_attention() factory to accept and validate the metadata state, requiring it for TRTLLM backend with a ValueError if missing.
Model and Module Integration
tensorrt_llm/_torch/visual_gen/models/ltx2/transformer_ltx2.py, tensorrt_llm/_torch/visual_gen/modules/attention.py
Updated model attention initialization to pass attention_config and attention_metadata_state through the attention creation pipeline, wiring the configuration metadata state into backend factories.
Test Configuration
tests/unittest/_torch/visual_gen/multi_gpu/test_flux_ulysses.py, tests/unittest/_torch/visual_gen/test_attention_integration.py, tests/unittest/_torch/visual_gen/test_attention_perf.py, tests/unittest/_torch/visual_gen/test_flux_attention.py, tests/unittest/_torch/visual_gen/test_ltx2_attention.py
Updated test configurations to initialize attention_metadata_state using create_attention_metadata_state() when using TRTLLM backend, ensuring tests properly set up the shared metadata state mechanism.
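The factory-side validation summarized above (create_attention() requiring a metadata state for the TRTLLM backend and raising ValueError when it is missing) can be sketched like this. The function body and return values here are illustrative assumptions; only the names `create_attention`, `create_attention_metadata_state`, and the ValueError behavior come from the change summary.

```python
def create_attention_metadata_state():
    # Shared per-model state, as described in the change summary.
    return {"metadata": None, "capacity": (0, 0)}


def create_attention(backend, attention_metadata_state=None):
    # Sketch of the validation: the TRTLLM backend must be handed the
    # shared state created at model-config level; other backends need none.
    if backend == "TRTLLM":
        if attention_metadata_state is None:
            raise ValueError(
                "TRTLLM backend requires attention_metadata_state; "
                "create it once per model via create_attention_metadata_state()"
            )
        return ("trtllm_attention", attention_metadata_state)
    return ("default_attention", None)


state = create_attention_metadata_state()
kind, shared = create_attention("TRTLLM", attention_metadata_state=state)
assert kind == "trtllm_attention" and shared is state
```

This mirrors how the updated tests set things up: build the state once with create_attention_metadata_state(), then thread it into every attention created for that model.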

Sequence Diagram

sequenceDiagram
    participant Config as DiffusionModelConfig
    participant Module as attention.py<br/>(Attention Module)
    participant Factory as utils.py<br/>(create_attention)
    participant Backend as trtllm.py<br/>(TrtllmAttention)

    Config->>Config: Create attention_metadata_state<br/>{"metadata": None, "capacity": (0,0)}
    
    Module->>Module: Extract attention_metadata_state<br/>from config
    
    Module->>Factory: Call create_attention(...)<br/>attention_metadata_state=state
    
    Factory->>Factory: Validate: TRTLLM backend<br/>requires metadata_state
    
    alt TRTLLM Backend Valid
        Factory->>Backend: Instantiate with<br/>attention_metadata_state dict
        Backend->>Backend: Store reference to<br/>shared metadata state
        Backend-->>Factory: Return configured attention
    else Missing metadata_state
        Factory->>Factory: Raise ValueError
    end
    
    Factory-->>Module: Attention instance ready

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 71.43%, below the required 80.00% threshold. Resolution: write docstrings for the functions that are missing them.
  • Description check — ❓ Inconclusive: the PR description gives context about reverting a SageAttention PR and re-adding pipeline-level metadata behavior, but the Description section is empty and the Test Coverage section lists no specific tests. Resolution: fill in the Description section with a brief explanation of the issue and solution, and list the specific tests that validate the metadata updates under Test Coverage.
✅ Passed checks (1 passed)

  • Title check — ✅ Passed: the title '[TRTLLM-12127][fix] VisualGen metadata updates' is specific, concise, and accurately summarizes the main change of adding pipeline-level metadata updates to VisualGen.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tensorrt_llm/_torch/visual_gen/config.py (1)

1-5: ⚠️ Potential issue | 🟡 Minor

Add NVIDIA copyright header with 2026 year.

The file is missing the required SPDX copyright header. Add the header at the top:

# SPDX-FileCopyrightText: Copyright (c) 2025–2026, NVIDIA CORPORATION & AFFILIATES
# SPDX-License-Identifier: Apache-2.0

All TensorRT-LLM source files must include this header with the latest modification year.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/visual_gen/config.py` around lines 1 - 5, The file is
missing the required SPDX copyright header; add the two-line header exactly as
specified ("# SPDX-FileCopyrightText: Copyright (c) 2025–2026, NVIDIA
CORPORATION & AFFILIATES" and "# SPDX-License-Identifier: Apache-2.0") at the
very top of tensorrt_llm/_torch/visual_gen/config.py before any imports (above
the existing imports such as json, Enum, Path) so every source file includes the
latest modification year.
🧹 Nitpick comments (1)
tensorrt_llm/_torch/visual_gen/attention_backend/utils.py (1)

29-29: Unused attention_config parameter and import.

The AttentionConfig is imported (line 29) and attention_config is accepted as a parameter (line 81), but neither is used in the function body. Only attention_metadata_state is actually consumed (lines 113-119).

If this is a placeholder for future backend-specific configuration (e.g., FA4), consider either:

  1. Adding a TODO comment explaining planned usage
  2. Removing it until needed to avoid dead code

Also applies to: 81-81, 103-103

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/visual_gen/attention_backend/utils.py` at line 29, The
import AttentionConfig and the parameter attention_config on the function that
consumes attention_metadata_state are unused; either remove the import and the
attention_config parameter from the function signature (and any call sites) to
eliminate dead code, or keep them and add a clear TODO comment explaining the
intended future use for backend-specific configuration (e.g., FA4) near the
AttentionConfig import and inside the function where attention_metadata_state is
used; update references to attention_config in this module accordingly (symbols:
AttentionConfig, attention_config, attention_metadata_state).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@tensorrt_llm/_torch/visual_gen/config.py`:
- Around line 1-5: The file is missing the required SPDX copyright header; add
the two-line header exactly as specified ("# SPDX-FileCopyrightText: Copyright
(c) 2025–2026, NVIDIA CORPORATION & AFFILIATES" and "# SPDX-License-Identifier:
Apache-2.0") at the very top of tensorrt_llm/_torch/visual_gen/config.py before
any imports (above the existing imports such as json, Enum, Path) so every
source file includes the latest modification year.

---

Nitpick comments:
In `@tensorrt_llm/_torch/visual_gen/attention_backend/utils.py`:
- Line 29: The import AttentionConfig and the parameter attention_config on the
function that consumes attention_metadata_state are unused; either remove the
import and the attention_config parameter from the function signature (and any
call sites) to eliminate dead code, or keep them and add a clear TODO comment
explaining the intended future use for backend-specific configuration (e.g.,
FA4) near the AttentionConfig import and inside the function where
attention_metadata_state is used; update references to attention_config in this
module accordingly (symbols: AttentionConfig, attention_config,
attention_metadata_state).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 504b39da-72dc-46d1-880a-a4f54d1bcdc2

📥 Commits

Reviewing files that changed from the base of the PR and between 2a0be45 and 4fbf4b1.

📒 Files selected for processing (12)
  • examples/visual_gen/visual_gen_wan_i2v.py
  • examples/visual_gen/visual_gen_wan_t2v.py
  • tensorrt_llm/_torch/visual_gen/attention_backend/trtllm.py
  • tensorrt_llm/_torch/visual_gen/attention_backend/utils.py
  • tensorrt_llm/_torch/visual_gen/config.py
  • tensorrt_llm/_torch/visual_gen/models/ltx2/transformer_ltx2.py
  • tensorrt_llm/_torch/visual_gen/modules/attention.py
  • tests/unittest/_torch/visual_gen/multi_gpu/test_flux_ulysses.py
  • tests/unittest/_torch/visual_gen/test_attention_integration.py
  • tests/unittest/_torch/visual_gen/test_attention_perf.py
  • tests/unittest/_torch/visual_gen/test_flux_attention.py
  • tests/unittest/_torch/visual_gen/test_ltx2_attention.py

@o-stoner
Collaborator Author

o-stoner commented Apr 9, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #42592 [ run ] triggered by Bot. Commit: 4fbf4b1 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42592 [ run ] completed with state SUCCESS. Commit: 4fbf4b1
/LLM/main/L0_MergeRequest_PR pipeline #33318 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@o-stoner
Collaborator Author

/bot run --disable-fail-fast

2 similar comments
@o-stoner
Collaborator Author

/bot run --disable-fail-fast

@o-stoner
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #43827 [ run ] triggered by Bot. Commit: 2d69ecb Link to invocation

@o-stoner
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #43833 [ run ] triggered by Bot. Commit: 8390580 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43827 [ run ] completed with state ABORTED. Commit: 2d69ecb

Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43833 [ run ] completed with state FAILURE. Commit: 8390580
/LLM/main/L0_MergeRequest_PR pipeline #34300 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@o-stoner
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #43948 [ run ] triggered by Bot. Commit: 8390580 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43948 [ run ] completed with state SUCCESS. Commit: 8390580
/LLM/main/L0_MergeRequest_PR pipeline #34394 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: Olivia Stoner <245287810+o-stoner@users.noreply.github.com>
@o-stoner o-stoner force-pushed the user/o-stoner/visual-gen-pipeline-metadata branch from 8390580 to b300b14 on April 17, 2026 19:26
@o-stoner
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #44067 [ run ] triggered by Bot. Commit: b300b14 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #44067 [ run ] completed with state SUCCESS. Commit: b300b14
/LLM/main/L0_MergeRequest_PR pipeline #34497 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@o-stoner
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #44511 [ run ] triggered by Bot. Commit: 0f772c6 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #44511 [ run ] completed with state SUCCESS. Commit: 0f772c6
/LLM/main/L0_MergeRequest_PR pipeline #34910 completed with status: 'SUCCESS'

CI Report

Link to invocation

@o-stoner o-stoner changed the title from "[None][fix] VisualGen metadata updates" to "[TRTLLM-12127][fix] VisualGen metadata updates" on Apr 21, 2026
@chang-l chang-l merged commit 6e5a339 into NVIDIA:main Apr 21, 2026
7 checks passed
