
[None][feat] Trtllm-gen FMHA JIT support#12612

Merged
pengbowang-nv merged 7 commits into NVIDIA:main from yunruis:user/yunruis/add_fmha_interface_rebased_cubin_2_perf
Apr 7, 2026

Conversation

@yunruis
Contributor

@yunruis yunruis commented Mar 31, 2026

Description

This PR integrates the trtllm-gen FMHA JIT (NVRTC) compilation path into TensorRT-LLM alongside the existing cubin (pre-compiled) path, enabling runtime kernel generation for FMHA configurations not covered by pre-compiled cubins.

Key Changes

1. Dual kernel dispatch: cubin + NVRTC JIT

The TllmGenFmhaKernel class now supports two dispatch paths:

  • Cubin path (default): loads pre-compiled kernels from embedded cubin data, used for generation-phase MLA, E2M1 KV cache, and head_dim=64 kernels.
  • NVRTC path (new): uses FmhaInterface from trtllm-gen to generate and compile kernels at runtime via NVRTC for configurations where cubins are not available (e.g., SwapsMmaAbForGeneration with non-MLA, non-E2M1, head_dim != 64).

2. Unified kernel selection via FmhaAutoTuner — consistent with trtllm-gen

This is the most significant architectural change. Previously, TensorRT-LLM maintained its own hand-written heuristic kernel selection logic (selectGqGenerationKernel, selectMlaGenerationKernel, selectTileSizeQForGqaGeneration, etc.), which diverged from trtllm-gen's selection logic and was a frequent source of bugs: kernel hash mismatches, "TRTLLM-GEN kernels not found" errors, and silent fallbacks to unfused MHA.

This PR replaces all of that with trtllm-gen's FmhaAutoTuner, which automatically determines the optimal tile sizes, kernel types (Swaps/Keeps MMA AB), CTA configurations, and multi-CTA KV modes. This removes ~300 lines of manual selection code and ensures that TensorRT-LLM and trtllm-gen always agree on which kernel to select, eliminating the class of bugs caused by selection logic divergence between the two repos.

3. Simplified TllmGenFmhaRunner constructor

Removed VisualGen-specific parameters (maxNumHeadsQPerKvInCta, sageAttnBlk*, dataTypeQkReinterpret) from the runner/kernel constructor and hash key. These are now handled internally by the auto-tuner and kernel options system, reducing the API surface from 10 parameters to 3 (dtypeQ, dtypeKv, dtypeOut).

4. trtllm-gen export headers, static libraries, and cubin artifacts

  • Added 36 export headers from trtllm-gen under trtllmGen_fmha_export/ providing FmhaInterface, FmhaAutoTuner, FmhaOptions, KernelParams, and device-side runtime code.
  • Added prebuilt static libraries (libTrtLlmGenFmhaLib.a, libTrtLlmGen.a) for both x86_64 and aarch64 architectures.
  • Added cuda_ptx/cuda_ptx.h containing PTX intrinsic wrappers needed by NVRTC-generated kernels.
  • Regenerated ~3000 cubin files with updated kernel configurations and refreshed kernelMetaInfo.h metadata.
  • Extended pre-commit global exclude patterns to cover trtllmGen_fmha_export/ and cuda_ptx/ to prevent formatting tools from modifying external/generated code.

@yunruis yunruis requested review from a team as code owners March 31, 2026 06:40
@yunruis yunruis requested review from mzweilz and niukuo March 31, 2026 06:40
@yunruis
Contributor Author

yunruis commented Mar 31, 2026

/bot run --disable-fail-fast --stage-list "DGX_B200-8_GPUs-PyTorch-PerfSanity-Post-Merge-1,DGX_B200-8_GPUs-PyTorch-PerfSanity-Post-Merge-2,DGX_B200-8_GPUs-PyTorch-PerfSanity-Post-Merge-3,DGX_B200-8_GPUs-PyTorch-PerfSanity-Post-Merge-4,GB200-4_GPUs-PyTorch-PerfSanity-1,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-1,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-2,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-3,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-4,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-5,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-6,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-7,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-4,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-5,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-6,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-7,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU1-GEN1-NODE1-GPU2-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU1-GEN1-NODE1-GPU2-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU1-GEN1-NODE1-GPU2-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU1-GEN1-NODE1-GPU4-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU1-GEN1-NODE1-GPU4-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU1-GEN1-NODE1-GPU4-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU1-GEN1-NODE1-GPU4-Post-Merge-4,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE1-GPU4-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE1-GPU4-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE1-GPU4-Post-Merge-3,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-
CTX1-NODE1-GPU1-GEN1-NODE2-GPU8-Post-Merge-1,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-1,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-2,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-3,GB200-16_GPUs-4_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE2-GPU8-GEN1-NODE2-GPU8-Post-Merge-1,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-1,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-2,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-3,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-4"

@tensorrt-cicd
Collaborator

PR_Github #40886 [ run ] triggered by Bot. Commit: 5e4f57b Link to invocation

@yunruis yunruis changed the title from "trtllm-gen attention JIT support" to "[None][feat] Trtllm-gen FMHA JIT support" on Mar 31, 2026
@coderabbitai
Contributor

coderabbitai bot commented Mar 31, 2026

📝 Walkthrough

Pre-commit configuration adds exclusion rules for FMHA kernel output directories. FmhaDispatcher populates additional runner parameters including processor count and layout-dependent settings. CMake now links prebuilt kernel archives. Hundreds of Git LFS pointers updated for cubin artifacts. Added floating-point header include.

Changes

Cohort / File(s) Summary
Pre-commit configuration
.pre-commit-config.yaml
Added per-hook exclude patterns for FMHA kernel output directories (trtllmGen_fmha_export/.*, cuda_ptx/.*, .*cubin\.(cpp|h)$) across multiple linting tools (isort, ruff, yapf, clang-format, cmake-format, codespell, autoflake, remove-crlf, end-of-file-fixer, trailing-whitespace).
FMHA dispatcher updates
cpp/tensorrt_llm/kernels/fmhaDispatcher.cpp
Populates additional TllmGenFmhaRunnerParams fields: mNumHeadsQ, mNumHeadsKv, mHeadDimQkNope, fixed sizing fields (mBatchSize, mMaxSeqLen*, mSumOfSeqLens*, mMaxNumPagesPerSeqKv), mMultiProcessorCount. Conditionally sets mNumTokensPerPage based on qkvLayout instead of always using fixed params.
CMake build configuration
cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/CMakeLists.txt
Creates build-time symlinks and copies kernel headers. Extends interface include directories with trtllmGen_fmha_export. Adds compile definitions including TRTLLM_FMHA_BUILD_DIR, TLLM_PUBLIC_RELEASE, TLLM_GEN_EXPORT_INTERFACE, TLLM_FMHA_TRTLLM_COMPAT. Links object library against prebuilt static archives (libTrtLlmGenFmhaLib.a, libTrtLlmGen.a) with fatal error checking.
Kernel include
cpp/tensorrt_llm/kernels/indexerTopK.cu
Added #include <cfloat> for floating-point constants.
FMHA cubin artifacts
cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/*_cubin.cpp (600+ files)
Updated Git LFS pointers for compiled CUDA kernels across multiple kernel variants (Sm100a, Sm100f with various configurations: head dimensions, qkv layouts, sequence handling modes). Each file's oid (SHA-256) and size metadata changed, indicating binary payloads were regenerated.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

The change comprises predominantly homogeneous updates (600+ Git LFS pointer changes with identical patterns), which require minimal individual review. The non-repetitive portions (CMakeLists.txt build configuration, FmhaDispatcher parameter additions, pre-commit exclusions) are localized and straightforward, so the large file count reflects pattern repetition rather than diverse logic complexity.

Suggested reviewers

  • niukuo
  • PerkzZheng
  • yuxianq
  • Wanli-Jiang
  • byshiue

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cpp/tensorrt_llm/kernels/fmhaDispatcher.cpp (1)

2-2: ⚠️ Potential issue | 🟡 Minor

Update copyright year in modified file.

This file is modified but still shows 2020-2024; it should include the latest modification year.

🛠️ Proposed fix
- * Copyright (c) 2020-2024, NVIDIA CORPORATION.  All rights reserved.
+ * Copyright (c) 2020-2026, NVIDIA CORPORATION.  All rights reserved.

As per coding guidelines, "Add NVIDIA copyright header on ALL new files, and update year on modified files".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tensorrt_llm/kernels/fmhaDispatcher.cpp` at line 2, Update the file
header in cpp/tensorrt_llm/kernels/fmhaDispatcher.cpp to reflect the latest
modification year by changing the copyright range from "2020-2024" to include
2026 (e.g., "2020-2026"); ensure the header format exactly matches other source
files' NVIDIA copyright header so the symbol fmhaDispatcher.cpp contains the
updated year.
🧹 Nitpick comments (4)
cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100fKernel_QkvBfloat16OBfloat16H256PagedKvSlidingOrChunkedCausalP32VarSeqQ128Kv128PersistentContext_cubin.cpp (1)

1-3: Treat *_cubin.cpp LFS pointers as artifacts in static-analysis jobs.

These lines are Git LFS pointer metadata, so parsing them as C++ creates false hard errors (oid, size syntax failures). Please make CI either hydrate LFS before C++ analysis or exclude these artifact-pointer files from Clang/Cppcheck scans.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100fKernel_QkvBfloat16OBfloat16H256PagedKvSlidingOrChunkedCausalP32VarSeqQ128Kv128PersistentContext_cubin.cpp`
around lines 1 - 3, The CI is treating Git LFS pointer files like real C++ and
failing; update the pipeline to either hydrate LFS before static analysis or
exclude these artifacts by glob. Specifically, in the static-analysis steps that
run Clang/Cppcheck/tidy, add a pre-step to run "git lfs pull" (or equivalent) so
files such as
FmhaSm100fKernel_QkvBfloat16OBfloat16H256PagedKvSlidingOrChunkedCausalP32VarSeqQ128Kv128PersistentContext_cubin.cpp
are real C++ blobs, or add an exclusion glob for "*_cubin.cpp" (and any other
LFS pointer patterns) in the Clang/Cppcheck invocation so these LFS pointer
files are skipped. Ensure the change touches the CI job(s) that run
Clang/Cppcheck/tidy.
.pre-commit-config.yaml (1)

1391-1517: Consider deduplicating the repeated FMHA exclude regex via a YAML anchor.

The same block is repeated across many hooks, which makes it easy for the patterns to drift apart over time.

♻️ Suggested DRY refactor
+fmha-generated-exclude: &fmha_generated_exclude |
+    (?x)^(
+        cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/trtllmGen_fmha_export/.* |
+        cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cuda_ptx/.*
+    )$
@@
-        exclude: |
-            (?x)^(
-                cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/trtllmGen_fmha_export/.* |
-                cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cuda_ptx/.*
-            )$
+        exclude: *fmha_generated_exclude
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.pre-commit-config.yaml around lines 1391 - 1517, Define a single YAML
anchor for the repeated FMHA exclude regex and replace the repeated multi-line
exclude values with a reference to that anchor; locate the top-level repeated
block (the long exclude: | (?x)^( ... )$ pattern) and extract it into a named
anchor (e.g. &fmha_exclude) and then use the alias (*fmha_exclude) in each
hook's exclude field (examples to update include hooks with id: remove-crlf,
yapf, end-of-file-fixer, trailing-whitespace, clang-format, cmake-format,
codespell, ruff, ruff-format, autoflake, etc.); ensure whitespace/indentation
matches YAML nesting so pre-commit still parses the exclude entries correctly.
cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100fKernel_QkvBfloat16OBfloat16H128PagedKvSlidingOrChunkedCausalP32VarSeqSkipsSoftmaxQ16Kv128StaticSwapsAbForGen_cubin.cpp (1)

1-3: Exclude Git LFS pointer stubs from C++ static-analysis passes.

This file is a valid Git LFS pointer update (oid/size), but clang/cppcheck will misparse it as C++ when LFS objects are not materialized. Please ensure these paths are excluded (or LFS pull is guaranteed) in static-analysis jobs to prevent false failures.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100fKernel_QkvBfloat16OBfloat16H128PagedKvSlidingOrChunkedCausalP32VarSeqSkipsSoftmaxQ16Kv128StaticSwapsAbForGen_cubin.cpp`
around lines 1 - 3, The CI/static-analysis is picking up Git LFS pointer stubs
(files starting with the line "version https://git-lfs.github.com/spec/v1" and
containing "oid sha256:"/ "size") as C++; update the
static-analysis/clang/cppcheck job configuration to exclude such files (or
ensure LFS objects are pulled) by adding a rule to skip files matching that
header or the specific cubin filename pattern (e.g.,
FmhaSm100fKernel_..._cubin.cpp) so the analyzer ignores pointer stubs; ensure
the exclusion references the pointer header string ("version
https://git-lfs.github.com/spec/v1") or the oid/size pattern to reliably detect
LFS pointer files.
cpp/tensorrt_llm/kernels/fmhaDispatcher.cpp (1)

117-123: Extract repeated non-zero literals to a named constant.

Using raw 1 repeatedly in assignments violates the literal-usage rule and makes intent less clear.

♻️ Proposed refactor
+        int32_t const kKERNEL_SELECTION_PROBE_VALUE = 1;
-        tllmRunnerParams.mBatchSize = 1;
-        tllmRunnerParams.mMaxSeqLenQ = 1;
-        tllmRunnerParams.mMaxSeqLenKv = 1;
-        tllmRunnerParams.mMaxSeqLenCacheKv = 1;
-        tllmRunnerParams.mSumOfSeqLensQ = 1;
-        tllmRunnerParams.mSumOfSeqLensKv = 1;
-        tllmRunnerParams.mMaxNumPagesPerSeqKv = 1;
+        tllmRunnerParams.mBatchSize = kKERNEL_SELECTION_PROBE_VALUE;
+        tllmRunnerParams.mMaxSeqLenQ = kKERNEL_SELECTION_PROBE_VALUE;
+        tllmRunnerParams.mMaxSeqLenKv = kKERNEL_SELECTION_PROBE_VALUE;
+        tllmRunnerParams.mMaxSeqLenCacheKv = kKERNEL_SELECTION_PROBE_VALUE;
+        tllmRunnerParams.mSumOfSeqLensQ = kKERNEL_SELECTION_PROBE_VALUE;
+        tllmRunnerParams.mSumOfSeqLensKv = kKERNEL_SELECTION_PROBE_VALUE;
+        tllmRunnerParams.mMaxNumPagesPerSeqKv = kKERNEL_SELECTION_PROBE_VALUE;

As per coding guidelines, "Except 0, nullptr, true, false, all other literals in C++ should only be used for variable initialization; extract other literal usages to named constants."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tensorrt_llm/kernels/fmhaDispatcher.cpp` around lines 117 - 123, Replace
the repeated magic literal 1 used to initialize fields on tllmRunnerParams with
a named constant: declare a constexpr (e.g., kDefaultOne = 1) and use it for
mBatchSize, mMaxSeqLenQ, mMaxSeqLenKv, mMaxSeqLenCacheKv, mSumOfSeqLensQ,
mSumOfSeqLensKv, and mMaxNumPagesPerSeqKv; update the assignments in
fmhaDispatcher.cpp so the intent is clear and the literal is not repeated
directly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/CMakeLists.txt`:
- Around line 24-27: The computed CUDA_TARGETS_INCLUDE_DIR uses
CMAKE_SYSTEM_PROCESSOR directly which yields "aarch64" on Arm SBSA systems but
the CUDA toolkit places headers under "targets/sbsa-linux"; update the logic
around get_filename_component/CUDA_TARGETS_INCLUDE_DIR to map
CMAKE_SYSTEM_PROCESSOR "aarch64" to "sbsa" (or otherwise detect ARM SBSA) before
assembling the path, then build the include path as
"${CUDA_TOOLKIT_ROOT}/targets/${MAPPED_PROCESSOR}-linux/include" so the symbolic
link and AArch64 JIT path resolve correctly (refer to get_filename_component,
CUDA_BIN_PATH, CUDA_TOOLKIT_ROOT, CUDA_TARGETS_INCLUDE_DIR and
CMAKE_SYSTEM_PROCESSOR).
- Around line 39-42: Replace the file(COPY ...) usage for the FMHA exported
headers with configure_file so the generated CUDA/NVRTC build dir gets updated
whenever KernelParams.h or KernelParamsDecl.h change: locate the two file(COPY
${CMAKE_CURRENT_SOURCE_DIR}/trtllmGen_fmha_export/KernelParams.h ...) and
file(COPY ${CMAKE_CURRENT_SOURCE_DIR}/trtllmGen_fmha_export/KernelParamsDecl.h
...) entries in CMakeLists.txt and change them to configure_file calls that copy
from ${CMAKE_CURRENT_SOURCE_DIR}/trtllmGen_fmha_export/KernelParams.h and
KernelParamsDecl.h to ${CMAKE_CURRENT_BINARY_DIR}/KernelParams.h and
KernelParamsDecl.h (using `@ONLY` if needed), ensuring the build system reruns and
NVRTC sees up-to-date headers referenced by TRTLLM_FMHA_BUILD_DIR.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QE4m3KvE2m1OE4m3H256PagedKvSlidingOrChunkedCausalP32VarSeqQ8Kv128PersistentSwapsAbForGen_cubin.cpp`:
- Around line 2-3: CMake currently globs all .cpp files with file(GLOB_RECURSE
SRC_CPP *.cpp) which pulls in Git LFS pointer .cpp files from the cubin
directory and causes build failures when LFS is not hydrated; update the
CMakeLists handling by excluding the cubin directory from the glob or by adding
an explicit exclusion filter for the cubin path when populating SRC_CPP (or
alternatively treat the cubin directory separately), ensuring the unique symbol
SRC_CPP and the existing file(GLOB_RECURSE ...) invocation are modified so
cubin/*.cpp files are not added to the compile list unless Git LFS has been
hydrated.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100aKernel_QE4m3KvE2m1OE4m3H64PagedKvDenseP32VarSeqQ16Kv128StaticSwapsAbForGen_cubin.cpp`:
- Around line 2-3: The CMake GLOB that populates SRC_CPP is including Git LFS
pointer files like the cubin.cpp artifact; update the CMakeLists handling around
the "file(GLOB_RECURSE SRC_CPP *.cpp)" and subsequent
"add_library(trtllm_gen_fmha OBJECT ${SRC_CPP} ${SRC_CU})" so those artifacts
are filtered out — e.g. after the glob, remove or filter entries matching the
LFS/binary patterns (like ".*cubin.cpp$" and ".*_cubin.h$") using list(FILTER
SRC_CPP EXCLUDE REGEX ...) or use a foreach loop to push_valid files into
SRC_CPP_CLEAN and use that in add_library; ensure the same exclusion is applied
to any SRC_CU or other globbed source lists so LFS pointer files never reach the
compiler.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100fKernel_QkvBfloat16OBfloat16H256PagedKvSlidingOrChunkedCausalP32MultiCtasKvCgaVarSeqSkipsSoftmaxQ128Kv128StaticKeepsAbForGen_cubin.cpp`:
- Around line 1-3: The CI workflows pr-check.yml and precommit-check.yml are
missing LFS hydration which causes the CMake glob (file(GLOB_RECURSE SRC_CPP
*.cpp)) to pick up LFS pointer files; update the actions/checkout@v6 steps in
both workflows to include lfs: 'true' (matching blossom-ci.yml) so large-file
pointers are hydrated before static analysis runs and compilation.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100fKernel_QkvBfloat16OBfloat16H256PagedKvSlidingOrChunkedCausalP32MultiCtasKvVarSeqSkipsSoftmaxQ128Kv128StaticKeepsAbForGen_cubin.cpp`:
- Around line 1-3: The CI is attempting to compile Git LFS pointer files like
FmhaSm100fKernel_QkvBfloat16OBfloat16H256PagedKvSlidingOrChunkedCausalP32MultiCtasKvVarSeqSkipsSoftmaxQ128Kv128StaticKeepsAbForGen_cubin.cpp
before the LFS objects are materialized; update the pipeline to run a Git LFS
materialization step (e.g., git lfs pull or enabling git lfs smudge) as a
pre-build/pre-analysis step so the actual .cubin.cpp binaries/sources are
present before invoking the C++ compiler or static analyzers, and ensure this
step runs before any job that references these files.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/cubin/FmhaSm100fKernel_QkvBfloat16OBfloat16H256PagedKvSlidingOrChunkedCausalP32VarSeqSkipsSoftmaxQ8Kv128PersistentSwapsAbForGen_cubin.cpp`:
- Around line 1-3: CI is running C++ analysis against LFS pointer stubs (e.g.,
files like
FmhaSm100fKernel_QkvBfloat16OBfloat16H256PagedKvSlidingOrChunkedCausalP32VarSeqSkipsSoftmaxQ8Kv128PersistentSwapsAbForGen_cubin.cpp);
ensure the pipeline materializes LFS objects early by adding a step that runs
git lfs pull --all (or git lfs checkout) before any C++
parsing/static-analysis/lint stages, and place this step at the start of the
job(s) that run the analyzers so the actual binary .cubin files are present
instead of pointer files.


@yunruis
Contributor Author

yunruis commented Mar 31, 2026

/bot run --disable-fail-fast --post-merge

@tensorrt-cicd
Collaborator

PR_Github #40902 [ run ] triggered by Bot. Commit: 5e4f57b Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40886 [ run ] completed with state ABORTED. Commit: 5e4f57b

Link to invocation

@yunruis
Contributor Author

yunruis commented Mar 31, 2026

/bot run --disable-fail-fast --post-merge

@tensorrt-cicd
Collaborator

PR_Github #40936 [ run ] triggered by Bot. Commit: 5dca29d Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40936 [ run ] completed with state FAILURE. Commit: 5dca29d
/LLM/main/L0_MergeRequest_PR pipeline #31929 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@yunruis
Contributor Author

yunruis commented Mar 31, 2026

/bot run --post-merge --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #40965 [ run ] triggered by Bot. Commit: 5dca29d Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40965 [ run ] completed with state FAILURE. Commit: 5dca29d
/LLM/main/L0_MergeRequest_PR pipeline #31951 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Collaborator

@niukuo niukuo left a comment


jenkins/: LGTM

@yunruis
Contributor Author

yunruis commented Apr 1, 2026

/bot run --disable-fail-fast --post-merge

@tensorrt-cicd
Collaborator

PR_Github #41127 [ run ] triggered by Bot. Commit: 5e4c72b Link to invocation

yunruis and others added 5 commits April 1, 2026 17:04
trtllm-gen tag1: trtllm gen pass, trtllm-llm fail

fix several bugs

Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>

fix chunk prefill accuracy error

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

ignore trtllm-gen fmha release check

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

drop SBSA CI

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

wrap nvrtc path error on CI

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

1. change lib to release and do not dump file; 2. fix bug of nvrtc include file

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

fix ci check test list error

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

add print on auto tuner

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

drop internal instructions

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

pre-commit check ignore cpp/tensorrt_llm/kernels/trtllmGenKernels

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

skip CI Release Check stage

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

waive nvfp4 kv cache case

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

fix the bug when trying to enable CGA reduction

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

cancel print debug in autotuner

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

fix mla and mha bugs for nvrtc path

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

fix startTokenIdxSfO field loss

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

unify format to trtllm-gen

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

fix spec decoding error, and sync file with trtllm-gen, wave gptoss hang
error

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

fix chunk window size assert error

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

force using cubin path

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

add fix for cubin path

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

fix sparse mla on nvrtc path

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

fix fmha cubin path kernel select, to more sync with main branch

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

fix masktype selection

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

disable A10 and RTX5090 PackageSanityCheck to unblock multi-GPU testing

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

refactor cubin path kernel params to use TMA-based KernelParams struct, and more struct alignment with trtllm-gen

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

fix tokens per page==0 for not paged kvcache case

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

trtllm-gen: struct align with main branch

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

fix fmhaDispatch isSupported

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

trtllm-gen: fix illegal resource handle on multi-GPU

Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>

cancel waive GPTOSS case and layer_wise_benchmark

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

trtllm-gen: enable build for aarch64

Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>

waive test_layer_wise_benchmarks.py::test_qwen3_next_gen_tep[1]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>

trtllm-gen:update exportCubin, and do not dump cu file for nvrtc path

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

fix generation phase TMA mismatch in cubin path

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

trtllm-gen: fix for qwen3_next and test rocky

Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>

fix wrong aarch64 lib

Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>

add cubin

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

trtllm-gen: add TLLM_FMHA_TRTLLM_COMPAT guards for TRT-LLM export compatibility. and adapt to rebased to trtllm-gen

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

fix debug build error

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

trtllm-gen: fix exportCubin on trtllm, and fix SmemTile.h

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

trtllm-gen: refine with perkz comment

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

fix sth

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

fix: allow E2M1 KV cache on cubin path by guarding with mIsTrtllmLayout in checkFmhaOptions

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
Made-with: Cursor

fix NemotronNanoV3 by setting correct kvlen in context, and trivial sync trtllm-gen lib

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

trtllm-gen: WAR hang for dpsk v3 lite on B300

Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>

trtllm-gen: WAR hang for dpsk v3 lite on B300, export Cubin

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

refinement

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

fix custom mask ut

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

update trtllm-gen lib

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

trtllm-gen: Fix confidential/public scan: internal-release markers, comment scrub, cHigh rename in CutlassUtils

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

cancel pre-commit war

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

fix license check

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

fix: NVRTC path sanity test header path issue

Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>

fix warning

Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>

fix fastmath.h

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

add path selection

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>

trtllm-gen: fix compile warning on trtllm, and add rocky lib

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
@yunruis force-pushed the user/yunruis/add_fmha_interface_rebased_cubin_2_perf branch from 4e7fd7b to ba085c2 on April 1, 2026 09:04
@yunruis (Contributor, Author) commented Apr 1, 2026

/bot run --add-multi-gpu-test --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #41166 [ run ] triggered by Bot. Commit: ba085c2

@tensorrt-cicd (Collaborator):

PR_Github #41166 [ run ] completed with state SUCCESS. Commit: ba085c2
/LLM/main/L0_MergeRequest_PR pipeline #32136 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

CI Report

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
@yunruis (Contributor, Author) commented Apr 2, 2026

/bot run --add-multi-gpu-test --disable-fail-fast

1 similar comment
@yunruis (Contributor, Author) commented Apr 2, 2026

/bot run --add-multi-gpu-test --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #41442 [ run ] triggered by Bot. Commit: 9bcc613

@tensorrt-cicd (Collaborator):

PR_Github #41442 [ run ] completed with state SUCCESS. Commit: 9bcc613
/LLM/main/L0_MergeRequest_PR pipeline #32373 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again


@yunruis (Contributor, Author) commented Apr 3, 2026

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #41543 [ run ] triggered by Bot. Commit: 9bcc613

@tensorrt-cicd (Collaborator):

PR_Github #41543 [ run ] completed with state SUCCESS. Commit: 9bcc613
/LLM/main/L0_MergeRequest_PR pipeline #32457 completed with status: 'SUCCESS'

CI Report

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
@yunruis (Contributor, Author) commented Apr 3, 2026

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #41634 [ run ] triggered by Bot. Commit: e9f29d7

@tensorrt-cicd (Collaborator):

PR_Github #41634 [ run ] completed with state SUCCESS. Commit: e9f29d7
/LLM/main/L0_MergeRequest_PR pipeline #32542 completed with status: 'SUCCESS'

CI Report

@pengbowang-nv pengbowang-nv merged commit 88bbb4d into NVIDIA:main Apr 7, 2026
5 checks passed
yufeiwu-nv pushed a commit to yufeiwu-nv/TensorRT-LLM that referenced this pull request Apr 7, 2026
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
Co-authored-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
karen-sy pushed a commit to karen-sy/TensorRT-LLM that referenced this pull request Apr 7, 2026
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
Co-authored-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
suyoggupta pushed a commit to nv-auto-deploy/TensorRT-LLM that referenced this pull request Apr 8, 2026
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
Co-authored-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
7 participants