[revert] Revert "[TRTLLM-11119][feat] Blackwell SageAttention, Integrate into …AttentionOp API (NVIDIA#11718)" #12679

Conversation

This reverts commit 1b66e96.
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
/bot run --disable-fail-fast
📝 Walkthrough

Comprehensive removal of SageAttention (per-block quantized attention) support from TensorRT-LLM, including deletion of SageQuant kernels, simplification of FMHA kernel metadata/loading logic, removal of Sage parameters from attention operators and Python APIs, and elimination of VisualGen-specific kernel infrastructure.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (2 warnings)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaRunner.h (1)
Lines 1-15: ⚠️ Potential issue | 🟡 Minor

Update copyright year to include 2025.
The file has been meaningfully modified (constructor signature simplified, SageAttention members removed) but the copyright year still shows 2020-2023. As per coding guidelines, the copyright header should include the year of its latest meaningful modification.
Suggested fix
- * Copyright (c) 2020-2023, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2020-2025, NVIDIA CORPORATION. All rights reserved.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaRunner.h` around lines 1 - 15, Update the top-of-file copyright header in fmhaRunner.h to include 2025 (e.g., change "2020-2023" to "2020-2025") to reflect the latest meaningful modifications; locate the header comment block at the top of the file and edit the year range accordingly so it matches current guidelines.
🧹 Nitpick comments (4)
tests/unittest/_torch/visual_gen/test_attention_perf.py (1)
Lines 1-2: Update copyright year to include 2026.

The copyright header shows only 2025, but this file is being modified in 2026. Per coding guidelines, the copyright year should reflect the latest meaningful modification.
Proposed fix
-# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unittest/_torch/visual_gen/test_attention_perf.py` around lines 1-2, update the copyright header at the top of this file to include 2026 by changing the existing year token "2025" to "2025-2026" (or another project-standard format that includes 2026); locate the header comment lines at the top of tests/unittest/_torch/visual_gen/test_attention_perf.py and replace the line containing "Copyright (c) 2025" so the file reflects the latest modification year.

cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaRunner.cpp (1)
Lines 1-15: Update copyright year to include 2026.

The copyright header shows 2020-2023, but this file is being modified in 2026.
Proposed fix
- * Copyright (c) 2020-2023, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2020-2026, NVIDIA CORPORATION. All rights reserved.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaRunner.cpp` around lines 1-15, update the copyright header block at the top of fmhaRunner.cpp to include 2026 (change "2020-2023" to "2020-2026") so the file's copyright range reflects the current modification year; edit the leading comment block containing the license text to update the year span accordingly.

cpp/tensorrt_llm/kernels/fmhaDispatcher.cpp (1)
Line 2: Update copyright year to include 2026.

The copyright header shows 2020-2024, but this file is being modified in 2026.
Proposed fix
- * Copyright (c) 2020-2024, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2020-2026, NVIDIA CORPORATION. All rights reserved.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tensorrt_llm/kernels/fmhaDispatcher.cpp` at line 2, the file header currently reads "2020-2024" and must be updated to include 2026; edit the top-of-file copyright line in fmhaDispatcher.cpp by replacing the string "2020-2024" with "2020-2026" (search for that exact substring in the header) so the copyright range reflects the 2026 modification.

tensorrt_llm/_torch/visual_gen/attention_backend/trtllm.py (1)
Lines 80-88: Track the last prepared batch size separately.
`_cached_seq_lens.shape[0]` is buffer capacity, not the last prepared batch size. After one larger batch, every later smaller batch still trips the shape check and reruns `BaseTrtllmAttentionMetadata.prepare()` even when `seq_lens` are unchanged.

♻️ Suggested fix
         self._cached_seq_lens: Optional[torch.Tensor] = None
         self._prepared = False
+        self._prepared_batch_size = 0
@@
     def _needs_prepare(self, batch_size: int, seq_lens: torch.Tensor) -> bool:
         """Check if we need to call prepare() (seq_lens changed)."""
         if not self._prepared:
             return True
         if self._cached_seq_lens is None:
             return True
-        if self._cached_seq_lens.shape[0] != batch_size:
+        if self._prepared_batch_size != batch_size:
             return True
         return not torch.equal(self._cached_seq_lens[:batch_size], seq_lens)
@@
         else:
             self._cached_seq_lens[:batch_size].copy_(seq_lens_tensor)
         self._prepared = True
+        self._prepared_batch_size = batch_size

Also applies to: 139-143
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/visual_gen/attention_backend/trtllm.py` around lines 80 - 88, The _needs_prepare method incorrectly uses self._cached_seq_lens.shape[0] (buffer capacity) to detect a changed batch size; add and use a separate attribute (e.g. self._last_prepared_batch_size) to record the batch size used by the last successful prepare(), and replace checks against _cached_seq_lens.shape[0] with this new attribute in _needs_prepare (and the analogous check around lines 139-143). Ensure prepare() updates self._last_prepared_batch_size when it completes so subsequent calls correctly detect whether a re-prepare is needed.
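The caching pattern suggested above can be sketched without torch; this is a minimal illustration of the reviewer's proposed fix, not the real metadata class, and names such as `SeqLenCache` and `_last_prepared_batch_size` are illustrative stand-ins:

```python
from typing import List

class SeqLenCache:
    """Torch-free sketch: record the batch size used by the last
    successful prepare() instead of inferring it from buffer capacity."""

    def __init__(self, max_batch_size: int = 8) -> None:
        self._cached_seq_lens: List[int] = [0] * max_batch_size
        self._prepared = False
        self._last_prepared_batch_size = 0  # the suggested new attribute

    def needs_prepare(self, batch_size: int, seq_lens: List[int]) -> bool:
        if not self._prepared:
            return True
        # Compare against the last *used* batch size, not buffer capacity:
        # after one large batch, a capacity-based shape check would force
        # every later smaller batch to re-prepare needlessly.
        if self._last_prepared_batch_size != batch_size:
            return True
        return self._cached_seq_lens[:batch_size] != seq_lens

    def prepare(self, batch_size: int, seq_lens: List[int]) -> None:
        self._cached_seq_lens[:batch_size] = seq_lens
        self._prepared = True
        self._last_prepared_batch_size = batch_size

cache = SeqLenCache()
cache.prepare(4, [5, 6, 7, 8])   # one large batch first
cache.prepare(2, [5, 6])         # then a smaller batch
# Repeating the smaller batch no longer triggers a spurious re-prepare.
print(cache.needs_prepare(2, [5, 6]))  # False
```

The key design point is that the cache buffer is sized for the maximum batch, so its length alone can never distinguish "same batch size as last time" from "smaller batch reusing a big buffer".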
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaKernels.h`:
- Line 2: Update the copyright year range in the header of fmhaKernels.h to
include 2026 (e.g., change "2020-2025" to "2020-2026") so the file's NVIDIA
copyright header reflects the modification year.
In `@tensorrt_llm/_torch/visual_gen/attention_backend/trtllm.py`:
- Line 1: The SPDX header at the top (the SPDX-FileCopyrightText line) lists the
copyright end year as 2025; update that year range to include 2026 (e.g., change
"2025" to "2025-2026" or the appropriate range) so the header reflects the file
modification year.
- Around line 244-255: The cross-attention fallback currently calls
self._concat_qkv(...) even when seq_len != kv_seq_len, causing mismatched row
counts and a failed torch.cat; guard that path by only using qkv concatenation
when k and v are None or when kv_seq_len == seq_len: if k is None and v is None
keep the existing flatten-to-(batch_size*seq_len) behavior, else if kv_seq_len
== seq_len use self._concat_qkv(q, k, v, batch_size, seq_len, kv_seq_len) and
pass the resulting qkv into super.forward(...), otherwise do not concatenate —
instead call super().forward with q=q, k=k, v=v (or otherwise choose an
appropriate unfused fallback) so the unequal-length cross-attention case avoids
torch.cat errors.
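The guard described in that prompt amounts to a three-way dispatch. A minimal sketch operating on tensor shapes only, under stated assumptions (`choose_fallback`, the shape tuples, and the path names are all illustrative, not the real trtllm.py API):

```python
from typing import Optional, Tuple

Shape = Tuple[int, int, int]  # (batch_size, seq_len, hidden)

def choose_fallback(q: Shape, k: Optional[Shape], v: Optional[Shape]) -> str:
    """Pick an attention fallback path. torch.cat([q, k, v]) needs
    matching row counts, so concatenation is only chosen when k/v are
    absent or kv_seq_len equals seq_len."""
    if k is None and v is None:
        return "fused-qkv"    # q already contains packed qkv rows
    seq_len, kv_seq_len = q[1], k[1]
    if kv_seq_len == seq_len:
        return "concat-qkv"   # safe to build qkv via a concat helper
    return "unfused"          # unequal lengths: pass q, k, v separately

# Cross-attention: 10 query rows vs 7 key/value rows per batch element.
print(choose_fallback((2, 10, 64), (2, 7, 64), (2, 7, 64)))    # unfused
print(choose_fallback((2, 10, 64), (2, 10, 64), (2, 10, 64)))  # concat-qkv
print(choose_fallback((2, 10, 64), None, None))                # fused-qkv
```

Routing the unequal-length case to the unfused path is what avoids the size-mismatch error that `torch.cat` would otherwise raise.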
---
Outside diff comments:
In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaRunner.h`:
- Around line 1-15: Update the top-of-file copyright header in fmhaRunner.h to
include 2025 (e.g., change "2020-2023" to "2020-2025") to reflect the latest
meaningful modifications; locate the header comment block at the top of the file
and edit the year range accordingly so it matches current guidelines.
---
Nitpick comments:
In `@cpp/tensorrt_llm/kernels/fmhaDispatcher.cpp`:
- Line 2: The file header currently reads "2020-2024" and must be updated to
include 2026; edit the top-of-file copyright line in fmhaDispatcher.cpp by
replacing the string "2020-2024" with "2020-2026" (search for that exact
substring in the header) so the copyright range reflects the 2026 modification.
In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaRunner.cpp`:
- Around line 1-15: Update the copyright header block at the top of
fmhaRunner.cpp to include 2026 (change "2020-2023" to "2020-2026") so the file's
copyright range reflects the current modification year; edit the leading comment
block containing the license text to update the year span accordingly.
In `@tensorrt_llm/_torch/visual_gen/attention_backend/trtllm.py`:
- Around line 80-88: The _needs_prepare method incorrectly uses
self._cached_seq_lens.shape[0] (buffer capacity) to detect a changed batch size;
add and use a separate attribute (e.g. self._last_prepared_batch_size) to record
the batch size used by the last successful prepare(), and replace checks against
_cached_seq_lens.shape[0] with this new attribute in _needs_prepare (and the
analogous check around lines 139-143). Ensure prepare() updates
self._last_prepared_batch_size when it completes so subsequent calls correctly
detect whether a re-prepare is needed.
In `@tests/unittest/_torch/visual_gen/test_attention_perf.py`:
- Around line 1-2: Update the copyright header at the top of this file to
include 2026 by changing the existing year token "2025" to "2025-2026" (or
another project-standard format that includes 2026); locate the header comment
lines at the top of tests/unittest/_torch/visual_gen/test_attention_perf.py and
replace the line containing "Copyright (c) 2025" so the file reflects the latest
modification year.
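Several of the prompts above describe the same mechanical fix: extending a copyright year range. As an illustration only (the 2026 end year and the regex are assumptions, not project tooling), the rewrite for range-style headers can be sketched as:

```python
import re

header = " * Copyright (c) 2020-2023, NVIDIA CORPORATION. All rights reserved."
# Extend the end year of a "YYYY-YYYY" range to 2026 (assumed current year).
updated = re.sub(r"(Copyright \(c\) 20\d{2})-20\d{2}", r"\1-2026", header)
print(updated)  # * Copyright (c) 2020-2026, NVIDIA CORPORATION. All rights reserved.
```

Single-year SPDX headers like "Copyright (c) 2025" need a separate pattern, since there is no existing range to extend.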
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1294d1d5-fcbd-40b0-b5ea-dfa5a570b5f6
📒 Files selected for processing (119)
- cpp/tensorrt_llm/common/attentionOp.cpp
- cpp/tensorrt_llm/common/attentionOp.h
- cpp/tensorrt_llm/common/sageQuant.cu
- cpp/tensorrt_llm/common/sageQuant.h
- cpp/tensorrt_llm/kernels/contextFusedMultiHeadAttention/fused_multihead_attention_common.h
- cpp/tensorrt_llm/kernels/fmhaDispatcher.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/ … (generated FmhaSm100a/FmhaSm103a kernel cubin .cpp files; see the deletion list below)
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/kernelMetaInfoVisualGen.h
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaKernels.h
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaRunner.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaRunner.h
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaRunnerParams.h
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/kernelParamsVisualGen.h
- cpp/tensorrt_llm/nanobind/thop/bindings.cpp
- cpp/tensorrt_llm/thop/attentionOp.cpp
- cpp/tensorrt_llm/thop/attentionOp.h
- examples/visual_gen/README.md
- examples/visual_gen/visual_gen_wan_i2v.py
- examples/visual_gen/visual_gen_wan_t2v.py
- tensorrt_llm/_torch/attention_backend/trtllm.py
- tensorrt_llm/_torch/visual_gen/attention_backend/trtllm.py
- tensorrt_llm/_torch/visual_gen/attention_backend/utils.py
- tensorrt_llm/_torch/visual_gen/config.py
- tensorrt_llm/_torch/visual_gen/models/ltx2/transformer_ltx2.py
- tensorrt_llm/_torch/visual_gen/modules/attention.py
- tests/integration/test_lists/test-db/l0_b200.yml
- tests/unittest/_torch/visual_gen/multi_gpu/test_flux_ulysses.py
- tests/unittest/_torch/visual_gen/test_attention_integration.py
- tests/unittest/_torch/visual_gen/test_attention_perf.py
- tests/unittest/_torch/visual_gen/test_attention_trtllm_sage.py
- tests/unittest/_torch/visual_gen/test_flux_attention.py
- tests/unittest/_torch/visual_gen/test_ltx2_attention.py
💤 Files with no reviewable changes (101)
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk128HV128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk256HV128SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk256HV128SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk256HV256SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk128HV128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK16SageV1StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk64HV64SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk128HV128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK16SageV1StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk256HV256SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk64HV64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkvE4m3OBfloat16H64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk128HV128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm103aKernel_QkvE4m3OBfloat16H128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkvE4m3OE4m3H128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk128HV128SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm103aKernel_QkInt8VE4m3OBfloat16HQk256HV128SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm103aKernel_QkInt8VE4m3OE4m3HQk256HV128SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm103aKernel_QkInt8VE4m3OE4m3HQk128HV128SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm103aKernel_QkvE4m3OE4m3H64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk128HV128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk64HV64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm103aKernel_QkInt8VE4m3OE4m3HQk256HV256SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk128HV128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm103aKernel_QkvE4m3OBfloat16H128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm103aKernel_QkvE4m3OBfloat16H128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm103aKernel_QkInt8VE4m3OBfloat16HQk128HV128SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm103aKernel_QkvE4m3OE4m3H64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkvE4m3OE4m3H128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkvE4m3OE4m3H64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk128HV128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk64HV64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm103aKernel_QkvE4m3OE4m3H128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm103aKernel_QkvE4m3OBfloat16H64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkvE4m3OBfloat16H128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkvE4m3OBfloat16H64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1StaticContext_cubin.cpp
- tensorrt_llm/_torch/visual_gen/modules/attention.py
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm103aKernel_QkInt8VE4m3OBfloat16HQk256HV256SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm103aKernel_QkvE4m3OE4m3H128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1PersistentContext_cubin.cpp
- tests/unittest/_torch/visual_gen/multi_gpu/test_flux_ulysses.py
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk64HV64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK16SageV1StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk64HV64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk128HV128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm103aKernel_QkInt8VE4m3OE4m3HQk64HV64SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp
- tests/integration/test_lists/test-db/l0_b200.yml
- examples/visual_gen/README.md
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkvE4m3OBfloat16H64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm103aKernel_QkInt8VE4m3OBfloat16HQk64HV64SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkvE4m3OE4m3H128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkvE4m3OBfloat16H128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm103aKernel_QkInt8VE4m3OE4m3HQk256HV256SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk256HV128SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk64HV64SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm103aKernel_QkInt8VE4m3OE4m3HQk64HV64SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm103aKernel_QkvE4m3OBfloat16H128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk256HV256SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm103aKernel_QkvE4m3OE4m3H128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm103aKernel_QkvE4m3OE4m3H64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1StaticContext_cubin.cpp
- cpp/tensorrt_llm/common/sageQuant.h
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk128HV128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1StaticContext_cubin.cpp
- tensorrt_llm/_torch/visual_gen/attention_backend/utils.py
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm103aKernel_QkInt8VE4m3OBfloat16HQk256HV128SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/contextFusedMultiHeadAttention/fused_multihead_attention_common.h
- tests/unittest/_torch/visual_gen/test_attention_trtllm_sage.py
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm103aKernel_QkInt8VE4m3OE4m3HQk128HV128SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp
- cpp/tensorrt_llm/common/sageQuant.cu
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk128HV128SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkvE4m3OE4m3H64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm103aKernel_QkvE4m3OE4m3H128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk128HV128SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk64HV64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK16SageV1StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk64HV64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk128HV128SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk64HV64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkvE4m3OE4m3H128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm103aKernel_QkInt8VE4m3OE4m3HQk256HV128SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/kernelMetaInfoVisualGen.h
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/kernelParamsVisualGen.h
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk128HV128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK16SageV1PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkvE4m3OE4m3H64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk128HV128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk256HV128SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm103aKernel_QkInt8VE4m3OBfloat16HQk256HV256SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk64HV64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk128HV128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK16SageV1PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk64HV64SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk64HV64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1PersistentContext_cubin.cpp
- tensorrt_llm/_torch/visual_gen/config.py
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OBfloat16HQk64HV64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK16SageV1PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk64HV64SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm103aKernel_QkInt8VE4m3OBfloat16HQk64HV64SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkvE4m3OBfloat16H128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp
- tensorrt_llm/_torch/visual_gen/models/ltx2/transformer_ltx2.py
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkvE4m3OBfloat16H128SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm103aKernel_QkvE4m3OBfloat16H64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm103aKernel_QkvE4m3OBfloat16H64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm103aKernel_QkvE4m3OBfloat16H64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk64HV64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK16SageV1PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkvE4m3OE4m3H64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1PersistentContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkvE4m3OBfloat16H64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK1SageV1StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm103aKernel_QkvE4m3OE4m3H64SeparateQkvDenseVarSeqQ128Kv128SageQ1SageK4SageV1StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm100aKernel_QkInt8VE4m3OE4m3HQk256HV256SeparateQkvDenseVarSeqQ128Kv128StaticContext_cubin.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin_visual_gen/FmhaSm103aKernel_QkInt8VE4m3OBfloat16HQk128HV128SeparateQkvDenseVarSeqQ128Kv128PersistentContext_cubin.cpp
|
PR_Github #41323 [ run ] triggered by Bot. Commit: |
|
/bot help |
GitHub Bot Help
Provide a user-friendly way for developers to interact with a Jenkins server. See details below for each supported subcommand.

run
Launch build/test pipelines. All previously running jobs will be killed.

kill
Kill all running builds associated with the pull request.

skip
Skip testing for the latest commit on the pull request.

reuse-pipeline
Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break. |
|
/bot run --disable-fail-fast |
|
/bot kill |
|
/bot run --disable-fail-fast |
|
PR_Github #41398 [ run ] triggered by Bot. Commit: |
|
PR_Github #41323 [ run ] completed with state |
|
PR_Github #41400 [ kill ] triggered by Bot. Commit: |
|
PR_Github #41398 [ run ] completed with state |
|
PR_Github #41400 [ kill ] completed with state |
|
PR_Github #41403 [ run ] triggered by Bot. Commit: |
|
PR_Github #41403 [ run ] completed with state |
…Integrate into AttentionOp API (NVIDIA#11718)" (NVIDIA#12679) Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
…AttentionOp API (#11718)"
This reverts commit 1b66e96.
Summary by CodeRabbit
Release Notes
Removed Features
Removed SageAttention support: dropped the `--enable_sage_attention` flag from example scripts and simplified attention backend configuration.

Simplified APIs
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
- [ ] PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
- [ ] PR follows the TRT-LLM CODING GUIDELINES to the best of your knowledge.
- [ ] Test cases are provided for new code paths (see test instructions).
- [ ] Any new dependencies have been scanned for license and vulnerabilities.
- [ ] CODEOWNERS updated if ownership changes.
- [ ] Documentation updated as needed.
- [ ] Update tava architecture diagram if there is a significant design change in the PR.
- [ ] The reviewers assigned automatically/manually are appropriate for the PR.
- [ ] Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment `/bot help`.