[None][feat] Minimax RMS norm optimization #12163
Conversation
Force-pushed from 096ec75 to 201717c
Force-pushed from 049daba to d406d02
/bot help
GitHub Bot Help
Provide a user friendly way for developers to interact with a Jenkins server. See details below for each supported subcommand.

run
Launch build/test pipelines. All previously running jobs will be killed.

kill
Kill all running builds associated with the pull request.

skip
Skip testing for the latest commit on the pull request.

reuse-pipeline
Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
/bot run --disable-fail-fast

PR_Github #42503 [ run ] triggered by Bot. Commit:
📝 Walkthrough
This pull request introduces a new MiniMax collective all-reduce operation for RMS normalization using Lamport-style cross-rank synchronization. It adds CUDA kernels, PyTorch bindings, a distributed module wrapper, integration into the MiniMaxM2 attention layer, benchmarks, and unit tests to support both single-tensor and dual Q+K tensor paths.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (2 warnings)
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/tensorrt_llm/kernels/communicationKernels/MiniMaxReduceRMSKernel.cu`:
- Around line 1-7: This new CUDA source (MiniMaxReduceRMSKernel.cu) is missing
the required NVIDIA copyright/SPDX file header; add the standard NVIDIA header
block at the very top of MiniMaxReduceRMSKernel.cu (before any `#include`),
including the current year, "NVIDIA CORPORATION" copyright line and the
SPDX-License-Identifier (e.g., Apache-2.0) per the repo guideline so the file
matches other TensorRT-LLM sources.
In `@cpp/tensorrt_llm/kernels/communicationKernels/MiniMaxReduceRMSKernel.h`:
- Around line 1-68: This header (containing MiniMaxReduceRMSParams and
minimax_reduce_rms_op in namespace kernels::minimax_ar) is new and must include
the required NVIDIA OSS file header; add the NVIDIA copyright/SPDX header block
at the top of the file (with the correct latest modification year and SPDX
identifier) before the `#pragma once` directive so the file complies with the project coding guidelines.
In `@cpp/tensorrt_llm/thop/allreduceOp.cpp`:
- Around line 1837-1848: The kernel currently assumes rms_gamma is BF16 but
accepts any norm_weight dtype; add a runtime dtype guard that rejects non-BF16
norm_weight before constructing MiniMaxReduceRMSParams to avoid silent mis-typed
gamma (check norm_weight.scalar_type() and return/throw a clear error if not
torch::kBFloat16), and apply the same guard for the analogous assignment blocks
around the other allreduce params region (the block using
allreduce_params.rms_gamma/_k at ~1867-1898) so both entrypoints refuse non-BF16
gamma until the kernel supports other types.
In `@tensorrt_llm/_torch/distributed/__init__.py`:
- Around line 5-8: The export block in
tensorrt_llm/_torch/distributed/__init__.py is not sorted and fails pre-commit;
run isort (or manually sort alphabetically) on the from .ops import (...) line
so the imported names (AllReduce, AllReduceParams, AllReduceStrategy,
HelixAllToAllNative, MiniMaxAllReduceRMS, MoEAllReduce, MoEAllReduceParams,
all_to_all_4d, all_to_all_5d, allgather, alltoall_helix, cp_allgather,
reducescatter, userbuffers_allreduce_finalize) are in the linter-expected order
and update the single import line accordingly.
In `@tests/unittest/_torch/multi_gpu/test_allreduce.py`:
- Around line 900-903: The test test_minimax_allreduce_rms_qk currently forces
mpi_pool_executor=4 but lacks a guard; add a pytest skip condition to the test
so it only runs when at least 4 GPUs are visible (e.g., use
pytest.mark.skipif(torch.cuda.device_count() < 4, reason="requires 4 GPUs")) and
ensure torch is imported at top of the test file; apply this to the parametrized
decorator that sets mpi_pool_executor so fixture setup won't fail on smaller
runners.
- Around line 715-725: The current reference path computes rms_norm only over
the local hidden slice (after reshape to [total_tokens, tp_size, local_hidden])
so it misses the cross-rank reduction; change the reference computation to
perform normalization over the full hidden dimension (tp_size * local_hidden)
before slicing back to the per-rank view: reshape input to [total_tokens, -1]
(or compute squared-sum/mean across the combined hidden dimension using
tensor_parallel_size * local_hidden), run rms_norm (or equivalent manual rms
calculation using rms_weights and eps) on that full-hidden tensor to produce a
global ref_output, then reshape to [total_tokens, tensor_parallel_size, -1],
cast to origin_dtype and finally take the slice ref_output[:,
tensor_parallel_rank, :] so the reference includes the cross-rank reduction;
update uses of rms_norm, ref_output, input and rms_weights accordingly.
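The corrected reference path described in the last comment can be sketched in plain Python. This is a hedged illustration only: `full_hidden_reference` and its list-based "tensors" are stand-ins for the real torch code in the test, not the actual test symbols.

```python
import math

def full_hidden_reference(input_rows, rms_weights, eps, tp_size, tp_rank):
    """Sketch of the suggested reference: normalize over the FULL hidden
    dimension (tp_size * local_hidden) first, then slice back to the
    per-rank view. All names here are illustrative stand-ins."""
    out = []
    for row in input_rows:  # one full-hidden vector per token
        # Squared mean over the WHOLE hidden dim, not just a local slice.
        inv_rms = 1.0 / math.sqrt(sum(x * x for x in row) / len(row) + eps)
        normed = [x * inv_rms * w for x, w in zip(row, rms_weights)]
        local_hidden = len(row) // tp_size
        # Equivalent of reshape([tokens, tp_size, -1])[:, tp_rank, :].
        out.append(normed[tp_rank * local_hidden:(tp_rank + 1) * local_hidden])
    return out
```

Computing the squared sum only over the local slice, as the current reference does, produces a different `inv_rms` per rank and so misses exactly the cross-rank reduction the kernel performs.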
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a69d8bb5-91e6-4f32-815b-f00f4d262d37
📒 Files selected for processing (8)
- cpp/tensorrt_llm/kernels/communicationKernels/MiniMaxReduceRMSKernel.cu
- cpp/tensorrt_llm/kernels/communicationKernels/MiniMaxReduceRMSKernel.h
- cpp/tensorrt_llm/thop/allreduceOp.cpp
- tensorrt_llm/_torch/distributed/__init__.py
- tensorrt_llm/_torch/distributed/ops.py
- tensorrt_llm/_torch/models/modeling_minimaxm2.py
- tests/microbenchmarks/minimax_all_reduce.py
- tests/unittest/_torch/multi_gpu/test_allreduce.py
PR_Github #42503 [ run ] completed with state

/bot run --disable-fail-fast

PR_Github #42663 [ run ] triggered by Bot. Commit:

PR_Github #42663 [ run ] completed with state

/bot run --disable-fail-fast

PR_Github #42930 [ run ] triggered by Bot. Commit:

PR_Github #42930 [ run ] completed with state
Force-pushed from 328d56c to 8cf5272
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
Force-pushed from d1a546f to d912d1c
/bot run --disable-fail-fast

PR_Github #43710 [ run ] triggered by Bot. Commit:

/bot kill

/bot run --disable-fail-fast

PR_Github #43784 [ kill ] triggered by Bot. Commit:

PR_Github #43785 [ run ] triggered by Bot. Commit:

PR_Github #43784 [ kill ] completed with state

PR_Github #43710 [ run ] completed with state

/bot run --disable-fail-fast

PR_Github #43793 [ run ] triggered by Bot. Commit:

PR_Github #43793 [ run ] completed with state

/bot run --disable-fail-fast

PR_Github #43941 [ run ] triggered by Bot. Commit:

PR_Github #43941 [ run ] completed with state

/bot run --disable-fail-fast

PR_Github #44086 [ run ] triggered by Bot. Commit:

PR_Github #44086 [ run ] completed with state
This PR optimizes MiniMax M2 Q/K RMSNorm in tensor-parallel attention.

Previously, after qkv_proj, each rank only owned a local shard [N, D / tp]. To perform RMSNorm over the full Q/K hidden dimension, the implementation first all-gathered local shards into a full [N, D] tensor, applied RMSNorm, and then sliced the result back to each rank. This introduced unnecessary communication and temporary full-tensor materialization.

This PR adds a dedicated MiniMax allreduce RMS kernel that keeps computation on local shards. Each rank computes the local variance sum for its [N, D / tp] shard, reduces the variance across TP ranks, and then applies RMSNorm locally using the rank-local gamma shard. This reduces synchronization volume from full Q/K activations to per-token variance sums and removes the allgather -> full RMSNorm -> reshard path.
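The per-token math above can be illustrated with a small pure-Python sketch. Function and variable names are illustrative only; the real kernel fuses these steps on-GPU and uses Lamport-style synchronization for the cross-rank scalar exchange.

```python
import math

def sharded_rms_norm(shards, gamma_shards, eps=1e-6):
    """Per-token sketch: shards[r] is rank r's [D / tp] slice of one token's
    hidden vector, gamma_shards[r] the matching rank-local gamma slice."""
    hidden_size = sum(len(s) for s in shards)
    # Step 1: each rank computes the sum of squares over its local shard.
    local_sq_sums = [sum(x * x for x in shard) for shard in shards]
    # Step 2: only this per-token scalar crosses ranks (the reduction),
    # instead of the full [N, D] activation an allgather would move.
    global_sq_sum = sum(local_sq_sums)
    inv_rms = 1.0 / math.sqrt(global_sq_sum / hidden_size + eps)
    # Step 3: each rank normalizes its shard with its rank-local gamma.
    return [[x * inv_rms * g for x, g in zip(shard, gamma)]
            for shard, gamma in zip(shards, gamma_shards)]
```

Because the reduced quantity is mathematically the same squared sum the full-hidden RMSNorm would compute, each rank's output matches the corresponding slice of the old allgather path.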
Main changes:
Here are the benchmark results on B200 * 4, isl/osl 2k/256, concurrency 10:
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.