[https://nvbugs/6244474][fix] AutoDeploy: skip explicit shape-prop after MLIR elementwise fusion#14795

Merged
MrGeva merged 2 commits into NVIDIA:main from tensorrt-cicd:repair-bot-bug6244474 on Jun 4, 2026

@tensorrt-cicd
Collaborator

@tensorrt-cicd tensorrt-cicd commented May 31, 2026

Summary

Fixes nvbugs/6244474: Llama-3.1-8B-Instruct-FP8 AutoDeploy pipeline crashes at attention output reshape (modeling_llama3.py:189) when both fuse_rope_into_trtllm_attention and mlir_elementwise_fusion are enabled.

Root cause

Both transforms run in the post_load_fusion stage. There, fuse_rope_into_trtllm_attention deliberately rewires Q/K/V to a single fused-QKV tensor of shape (B, S, 6144) and records _trtllm_fused_qkv in node.meta; the actual op swap to trtllm_mha_with_cache happens later, at cache_init. While the graph sat in this intermediate state, mlir_elementwise_fusion._apply unconditionally called run_shape_prop(new_gm) after FX reconstruction. FakeTensorProp then re-evaluated torch_attention.register_fake with query=key=value=fused_qkv and produced an output of shape (B, S, 6144), which cannot be reshaped by the downstream attn_output.reshape(B, S, num_heads * head_dim), where num_heads * head_dim = 4096.
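
The shape arithmetic behind the crash can be sketched in plain Python. This is an illustration only: it assumes the public Llama-3.1-8B attention dimensions (32 query heads, 8 key/value heads, head_dim 128), and fake_attention_shape and fake_reshape are hypothetical stand-ins, not the real torch_attention fake kernel or torch.reshape. The sketch also assumes the fake kernel's output shape simply mirrors the query shape, which matches the (B, S, 6144) output described above.

```python
from math import prod

# Assumed Llama-3.1-8B attention dimensions (from the public model config).
NUM_HEADS = 32      # query heads
NUM_KV_HEADS = 8    # grouped-query key/value heads
HEAD_DIM = 128

Q_WIDTH = NUM_HEADS * HEAD_DIM            # 4096
KV_WIDTH = NUM_KV_HEADS * HEAD_DIM        # 1024
FUSED_QKV_WIDTH = Q_WIDTH + 2 * KV_WIDTH  # 6144


def fake_attention_shape(query_shape, key_shape, value_shape):
    """Hypothetical fake kernel: the output shape mirrors the query shape."""
    return tuple(query_shape)


def fake_reshape(shape, new_shape):
    """Mimic the element-count check a reshape performs, on shapes only."""
    if prod(shape) != prod(new_shape):
        raise RuntimeError(
            f"cannot reshape {tuple(shape)} into {tuple(new_shape)}"
        )
    return tuple(new_shape)


B, S = 2, 16  # arbitrary batch size and sequence length

# Valid graph: attention sees separate Q, K and V projections.
out = fake_attention_shape((B, S, Q_WIDTH), (B, S, KV_WIDTH), (B, S, KV_WIDTH))
ok_shape = fake_reshape(out, (B, S, NUM_HEADS * HEAD_DIM))

# Intermediate graph: Q, K and V all point at the fused-QKV tensor.
fused = (B, S, FUSED_QKV_WIDTH)
bad_out = fake_attention_shape(fused, fused, fused)
try:
    fake_reshape(bad_out, (B, S, NUM_HEADS * HEAD_DIM))
    reshape_failed = False
except RuntimeError:
    reshape_failed = True
```

With separate projections the reshape target matches the attention output; with the fused tensor wired into all three inputs, the last dimension is 6144 and the reshape to 4096 is rejected.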

The transform's YAML config already declares run_shape_prop: false (see mlir/agent_learnings.md §6: "Prefer run_shape_prop: false unless the transform specifically needs re-propagated shapes"). The redundant inline call contradicted that intent; it was a stale leftover from before the YAML flag was added in 7a4752df2e. The combination became reachable only after #13859 moved mlir_elementwise_fusion to run after fuse_rope_into_trtllm_attention in the YAML ordering, and was first triggered by the Llama-3.1-8B FP8 config tuning in #14622.

Fix

  • Drop the inline canonicalize_graph + run_shape_prop calls in MLIRElementwiseFusion._apply.
  • Return is_clean=False, has_valid_shapes=False so that the framework's _run_cleanup runs canonicalization (run_graph_cleanup defaults to True) and downstream transforms with requires_shape_prop=True re-derive shapes via the framework's standard path. By that point insert_cached_attention has performed the real op swap and the graph is in a valid state.
  • Remove the stale waiver for perf/test_perf_sanity.py::test_e2e[aggr_upload-llama3_1_8b_fp8_ad_hopper-llama3_1_8b_ad_ws1_1k1k].
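
The deferral described in the list above can be sketched with a toy pipeline. Everything here is hypothetical and simplified: TransformInfo carries only the two flags named in this PR, and ToyGraph, Step, run_pipeline and the transform functions are illustrative names rather than the AutoDeploy framework's API. The sketch assumes the driver runs shape propagation only right before a step that declares requires_shape_prop=True, and runs canonicalization after any step that reports is_clean=False.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TransformInfo:
    """Hypothetical mirror of the two flags discussed in this PR."""
    is_clean: bool = False
    has_valid_shapes: bool = False


@dataclass
class ToyGraph:
    attention_swapped: bool = False  # True once the real op swap has happened
    log: List[str] = field(default_factory=list)


def canonicalize(g: ToyGraph) -> None:
    g.log.append("canonicalize")


def shape_prop(g: ToyGraph) -> None:
    # Shapes are only derivable once the graph has left the intermediate state.
    if not g.attention_swapped:
        raise RuntimeError("shape mismatch in intermediate graph")
    g.log.append("shape_prop")


def elementwise_fusion(g: ToyGraph) -> TransformInfo:
    g.log.append("fusion")
    # No inline canonicalize/shape_prop: report the graph as dirty instead.
    return TransformInfo(is_clean=False, has_valid_shapes=False)


def swap_attention_op(g: ToyGraph) -> TransformInfo:
    g.attention_swapped = True  # stands in for the later real op swap
    g.log.append("op_swap")
    return TransformInfo(is_clean=False, has_valid_shapes=False)


def shape_consumer(g: ToyGraph) -> TransformInfo:
    g.log.append("consumer")
    return TransformInfo(is_clean=True, has_valid_shapes=True)


@dataclass
class Step:
    fn: Callable[[ToyGraph], TransformInfo]
    requires_shape_prop: bool = False
    run_graph_cleanup: bool = True


def run_pipeline(g: ToyGraph, steps: List[Step]) -> List[str]:
    state = TransformInfo(is_clean=True, has_valid_shapes=True)
    for step in steps:
        # Re-derive shapes lazily, only for steps that ask for them.
        if step.requires_shape_prop and not state.has_valid_shapes:
            shape_prop(g)
        state = step.fn(g)
        if step.run_graph_cleanup and not state.is_clean:
            canonicalize(g)
            state.is_clean = True
    return g.log


log = run_pipeline(
    ToyGraph(),
    [
        Step(elementwise_fusion),
        Step(swap_attention_op),
        Step(shape_consumer, requires_shape_prop=True),
    ],
)
```

In this toy, shape propagation first runs after the op swap, so it never sees the intermediate graph. Calling shape_prop directly on a fresh ToyGraph raises, which corresponds to the crash that the inline call caused.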

Summary by CodeRabbit

  • Refactor
    • Deferred graph canonicalization and shape propagation in the MLIR elementwise fusion transform to later pipeline stages, so that they no longer run on the intermediate graph during model compilation.

@coderabbitai
Contributor

coderabbitai Bot commented May 31, 2026

Walkthrough

This pull request modifies the MLIR elementwise fusion transform to defer graph cleanup and shape propagation operations to later framework stages rather than performing them immediately. The returned TransformInfo now marks the intermediate graph as non-canonical and lacking valid shapes.

Changes

Deferred graph cleanup

Layer / File(s) | Summary
Step 6 deferred cleanup logic (tensorrt_llm/_torch/auto_deploy/transform/library/mlir_elementwise_fusion.py) | Step 6 no longer calls graph canonicalization and shape propagation inline; both are deferred to later stages. TransformInfo now reports is_clean=False and has_valid_shapes=False to indicate that the graph is in an intermediate state in which tensor shapes may be intentionally invalid after the earlier Q/K/V rewiring.

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name | Status | Explanation
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Title check | ✅ Passed | The title clearly and specifically describes the main change: skipping explicit shape propagation after MLIR elementwise fusion, which matches the file modified and the PR's core objective.
Description check | ✅ Passed | The PR description provides comprehensive context including root cause analysis, detailed explanation of the fix, and references to related issues and commits.


@tensorrt-cicd tensorrt-cicd force-pushed the repair-bot-bug6244474 branch from 4e2a6bc to ae001c3 Compare May 31, 2026 10:43
@MrGeva MrGeva changed the title from "[https://nvbugs/6244474][fix] Remove the inline run_shape_prop(new_gm) call and report has_valid_shapes=False" to "[https://nvbugs/6244474] [fix] Remove the inline run_shape_prop(new_gm) call and report has_valid_shapes=False" on May 31, 2026
@MrGeva
Collaborator

MrGeva commented May 31, 2026

/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1" --disable-fail-fast

@MrGeva MrGeva changed the title from "[https://nvbugs/6244474] [fix] Remove the inline run_shape_prop(new_gm) call and report has_valid_shapes=False" to "[https://nvbugs/6244474][fix] Remove the inline run_shape_prop(new_gm) call and report has_valid_shapes=False" on May 31, 2026
@tensorrt-cicd
Collaborator Author

PR_Github #51255 [ run ] triggered by Bot. Commit: ae001c3 Link to invocation

@MrGeva MrGeva changed the title from "[https://nvbugs/6244474][fix] Remove the inline run_shape_prop(new_gm) call and report has_valid_shapes=False" to "[https://nvbugs/6244474][fix] AutoDeploy: skip explicit shape-prop after MLIR elementwise fusion" on May 31, 2026
@tensorrt-cicd
Collaborator Author

PR_Github #51255 [ run ] completed with state FAILURE. Commit: ae001c3
/LLM/main/L0_MergeRequest_PR pipeline #40676 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@MrGeva
Collaborator

MrGeva commented Jun 1, 2026

/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1" --disable-fail-fast

@tensorrt-cicd
Collaborator Author

PR_Github #51302 [ run ] triggered by Bot. Commit: ae001c3 Link to invocation

@tensorrt-cicd
Collaborator Author

PR_Github #51302 [ run ] completed with state SUCCESS. Commit: ae001c3
/LLM/main/L0_MergeRequest_PR pipeline #40719 completed with status: 'FAILURE'


@MrGeva
Collaborator

MrGeva commented Jun 1, 2026

/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1" --disable-fail-fast

@MrGeva MrGeva enabled auto-merge (squash) June 1, 2026 12:41
@MrGeva
Collaborator

MrGeva commented Jun 2, 2026

/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1" --disable-fail-fast

@tensorrt-cicd
Collaborator Author

PR_Github #51628 [ run ] triggered by Bot. Commit: ae001c3 Link to invocation

@tensorrt-cicd
Collaborator Author

PR_Github #51628 [ run ] completed with state FAILURE. Commit: ae001c3
/LLM/main/L0_MergeRequest_PR pipeline #41013 completed with status: 'FAILURE'


@MrGeva
Collaborator

MrGeva commented Jun 3, 2026

/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1" --disable-fail-fast

@tensorrt-cicd
Collaborator Author

PR_Github #51756 [ run ] triggered by Bot. Commit: ae001c3 Link to invocation

… fusion

The MLIR elementwise fusion transform unconditionally ran fake-tensor
shape propagation on the FX-reconstructed graph. On Llama-3.1-8B FP8
(post-NVIDIA#14622 YAML), fuse_rope_into_trtllm_attention rewires Q/K/V to a
single fused-QKV tensor of shape (B, S, 6144) and stores _trtllm_fused_qkv
in node.meta; the actual op swap happens later at cache_init. While the
graph sits in this deliberately invalid intermediate state at
post_load_fusion, FakeTensorProp re-evaluates torch_attention.register_fake
with query=key=value=fused_qkv and produces (B, S, 6144), which then fails
the downstream attn_output.reshape(B, S, num_heads * head_dim = 4096) at
modeling_llama3.py:189.

The transform's YAML config already declares run_shape_prop: false (see
mlir/agent_learnings.md section 6: 'Prefer run_shape_prop: false unless
the transform specifically needs re-propagated shapes'). The redundant
inline call contradicts that intent. Drop it and return
has_valid_shapes=False so downstream transforms that require shape
metadata re-derive it via the framework's _run_cleanup, which handles
these intermediate states correctly.

Signed-off-by: tensorrt-cicd <90828364+tensorrt-cicd@users.noreply.github.com>
@MrGeva MrGeva force-pushed the repair-bot-bug6244474 branch from ae001c3 to 30979b5 Compare June 3, 2026 05:51
@MrGeva
Collaborator

MrGeva commented Jun 3, 2026

/bot run

@tensorrt-cicd
Collaborator Author

PR_Github #51768 [ run ] triggered by Bot. Commit: 30979b5 Link to invocation

@tensorrt-cicd
Collaborator Author

PR_Github #51756 [ run ] completed with state ABORTED. Commit: ae001c3

Link to invocation

@tensorrt-cicd
Collaborator Author

PR_Github #51768 [ run ] completed with state SUCCESS. Commit: 30979b5
/LLM/main/L0_MergeRequest_PR pipeline #41135 completed with status: 'FAILURE'


@MrGeva
Collaborator

MrGeva commented Jun 3, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator Author

PR_Github #51799 [ run ] triggered by Bot. Commit: 30979b5 Link to invocation

@tensorrt-cicd
Collaborator Author

PR_Github #51799 [ run ] completed with state SUCCESS. Commit: 30979b5
/LLM/main/L0_MergeRequest_PR pipeline #41163 completed with status: 'FAILURE'


@MrGeva
Collaborator

MrGeva commented Jun 3, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator Author

PR_Github #51860 [ run ] triggered by Bot. Commit: 30979b5 Link to invocation

@tensorrt-cicd
Collaborator Author

PR_Github #51860 [ run ] completed with state SUCCESS. Commit: 30979b5
/LLM/main/L0_MergeRequest_PR pipeline #41218 completed with status: 'FAILURE'


@MrGeva
Collaborator

MrGeva commented Jun 4, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator Author

PR_Github #51997 [ run ] triggered by Bot. Commit: 30979b5 Link to invocation

@tensorrt-cicd
Collaborator Author

PR_Github #51997 [ run ] completed with state SUCCESS. Commit: 30979b5
/LLM/main/L0_MergeRequest_PR pipeline #41340 completed with status: 'SUCCESS'

CI Report

Link to invocation

@MrGeva MrGeva merged commit 8b0eba9 into NVIDIA:main Jun 4, 2026
7 checks passed
yufeiwu-nv pushed a commit to yufeiwu-nv/TensorRT-LLM that referenced this pull request Jun 4, 2026
…ter MLIR elementwise fusion (NVIDIA#14795)

Signed-off-by: tensorrt-cicd <90828364+tensorrt-cicd@users.noreply.github.com>
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>

3 participants