[None][feat] Add the invocation path for mamba2 mtp custom op by JadoTu · Pull Request #12787 · NVIDIA/TensorRT-LLM

JadoTu · 2026-04-07T02:21:47Z

Summary by CodeRabbit

Release Notes

New Features
- Added optional CUDA custom operator support for speculative decoding, controlled by an environment variable.

Tests
- Added integration tests that validate the custom operator on 4-GPU and 8-GPU configurations with acceptance-rate and accuracy benchmarks.

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>

JadoTu · 2026-04-07T02:22:19Z

/bot run

coderabbitai · 2026-04-07T02:27:16Z

Walkthrough

This PR adds optional use of a TRT-LLM CUDA custom op for MTP SSM cache updates in Mamba2 mixer through an environment-variable-controlled dispatch mechanism. Two integration tests validate this new code path on 4-GPU and 8-GPU configurations.

Changes

| Cohort / File(s) | Summary |
|---|---|
| MTP SSM Cache Update Optimization: `tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py` | Modified `is_target_verify` branch to conditionally dispatch between TRT-LLM custom op (`selective_state_update_mtp_ssm_cache_trtllm`) and native/flashinfer implementations based on `mamba2_mtp_use_custom_op` environment variable. Refactored to reshape inputs into MTP-shaped tensors and construct appropriate argument sets for each code path. |
| Integration Tests: `tests/integration/defs/accuracy/test_llm_api_pytorch.py` | Added two new `TestNemotronV3Super` methods: `test_nvfp4_4gpu_mtp_ar_custom_op()` (validates acceptance rate > 0.2) and `test_nvfp4_8gpus_mtp_custom_op()` (runs MMLU/GSM8K evaluation), both with custom-op path enabled via environment variable. |
| Test Matrix: `tests/integration/test_lists/test-db/l0_dgx_b200.yml` | Extended test configuration with two new `TIMEOUT (60)` test entries for the custom-op MTP tests on 4-GPU and 8-GPU clusters. |
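
The dispatch summarized above can be pictured with a minimal sketch. Only the environment variable name `mamba2_mtp_use_custom_op` comes from this review (a later commit renamed it, and the final name is not shown here). `MixerSketch`, `custom_op_update`, and `native_update` are hypothetical stand-ins for illustration and do not reflect the real signatures in `mamba2_mixer.py`.

```python
import os

# Name as quoted in the review; the helpers below are hypothetical stand-ins.
ENV_FLAG = "mamba2_mtp_use_custom_op"


def custom_op_update(state, delta):
    """Hypothetical stand-in for the TRT-LLM custom op path."""
    return [s + d for s, d in zip(state, delta)], "custom_op"


def native_update(state, delta):
    """Hypothetical stand-in for the native/flashinfer path."""
    return [s + d for s, d in zip(state, delta)], "native"


class MixerSketch:
    def __init__(self):
        # Read the flag once at construction; only the exact string "1" enables it.
        self._use_mtp_custom_op = os.environ.get(ENV_FLAG, "0") == "1"

    def update_ssm_cache(self, state, delta):
        # Both paths are expected to compute the same result; only the backend differs.
        if self._use_mtp_custom_op:
            return custom_op_update(state, delta)
        return native_update(state, delta)
```

Reading the flag in the constructor means that changing the environment after the module is built has no effect, which is one reason a test would set the variable before creating the model.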

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The PR description lacks substantive content required by the template; it contains only placeholders and a checklist without explaining what, why, or test coverage details. | Fill in the Description section with a clear explanation of the changes and their purpose. Complete the Test Coverage section by listing the newly added tests (test_nvfp4_4gpu_mtp_ar_custom_op and test_nvfp4_8gpus_mtp_custom_op). |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title clearly summarizes the main change: adding an invocation path for the mamba2 MTP custom operator, which is directly reflected in the code changes. |


coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py (1)
1-1: 🛠️ Refactor suggestion | 🟠 Major

Update the copyright year to 2026.

The copyright header shows "2022-2024" but this file is being modified in 2026. As per coding guidelines, the year should be updated on modified files.
-# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2022-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py` at line 1, Update the SPDX
copyright header line (the SPDX-FileCopyrightText comment at the top of the
file) to reflect the current modification year by changing "2022-2024" to
"2022-2026" so the header is accurate for 2026.
tests/integration/defs/accuracy/test_llm_api_pytorch.py (1)
1-1: ⚠️ Potential issue | 🟠 Major

Update SPDX copyright year for this modified file.

Line 1 still uses 2025, but this file has meaningful changes in this PR and should be updated to the latest modification year.
Proposed fix
-# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
As per coding guidelines: “Add NVIDIA copyright header to ALL new files; update year on modified files” and “All TensorRT-LLM Open Source Software code files should contain an NVIDIA copyright header with the year of latest meaningful modification”.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/defs/accuracy/test_llm_api_pytorch.py` at line 1, Update
the SPDX header year in the file's top-line comment: replace "Copyright (c) 2025
NVIDIA CORPORATION & AFFILIATES." in the SPDX-FileCopyrightText line with the
latest modification year (2026) so the header reads 2026.
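
Both header fixes proposed above follow the same rule and could be applied mechanically. The sketch below is only an illustration of that rule: `bump_spdx_year` is a hypothetical helper, not a tool from the TensorRT-LLM repository. It extends a year range and replaces a single year outright, matching the two proposed diffs.

```python
import re

# Matches "Copyright (c) 2022-2024" or "Copyright (c) 2025" in an SPDX header line.
_YEAR_RE = re.compile(r"(Copyright \(c\) )(\d{4})(?:-(\d{4}))?")


def bump_spdx_year(line: str, year: int) -> str:
    """Return the header line with its latest year set to `year`."""

    def _sub(match: re.Match) -> str:
        prefix, first, last = match.group(1), match.group(2), match.group(3)
        if last is None:
            # Single year: replace it outright, as in the second proposed fix.
            return f"{prefix}{year}"
        # Year range: keep the first year and move the end of the range.
        return f"{prefix}{first}-{year}"

    # Only the first match is rewritten; lines without a header are returned unchanged.
    return _YEAR_RE.sub(_sub, line, count=1)
```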

🧹 Nitpick comments (2)

tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py (1)

162-164: Consider using UPPER_SNAKE_CASE for the environment variable name.

Environment variables conventionally use UPPER_SNAKE_CASE (e.g., MAMBA2_MTP_USE_CUSTOM_OP). This aligns with common conventions and coding guidelines that specify constants should use UPPER_SNAKE_CASE.

-        self._use_mtp_custom_op = os.environ.get("mamba2_mtp_use_custom_op",
-                                                 "0") == "1"
+        self._use_mtp_custom_op = os.environ.get("MAMBA2_MTP_USE_CUSTOM_OP",
+                                                 "0") == "1"

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py` around lines 162 - 164,
The environment variable name mamba2_mtp_use_custom_op used to set
self._use_mtp_custom_op should follow UPPER_SNAKE_CASE conventions; update the
code that reads os.environ.get("mamba2_mtp_use_custom_op", "0") to use
"MAMBA2_MTP_USE_CUSTOM_OP" instead and ensure any related docs/tests or other
occurrences are updated to the new name so _use_mtp_custom_op continues to be
initialized from the consistent UPPER_SNAKE_CASE variable.
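
Because the flag is read from the process environment, a rename has to reach every place that sets it, tests included. A minimal sketch of how a test could enable the flag and restore the environment afterwards, using `unittest.mock.patch.dict` from the standard library. `read_flag` is a hypothetical helper mirroring the `os.environ.get(...) == "1"` check in the quoted diff, and `MAMBA2_MTP_USE_CUSTOM_OP` is the name proposed in this comment, not necessarily the one that was merged.

```python
import os
from unittest import mock

FLAG = "MAMBA2_MTP_USE_CUSTOM_OP"  # name proposed in the review comment


def read_flag(name: str = FLAG) -> bool:
    """Hypothetical helper: same check as the quoted diff, only "1" enables."""
    return os.environ.get(name, "0") == "1"


def flag_inside_and_after_patch():
    # patch.dict restores os.environ on exit, so the flag does not leak
    # into tests that run later in the same process.
    with mock.patch.dict(os.environ, {FLAG: "1"}):
        inside = read_flag()
    after = read_flag()
    return inside, after
```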

tests/integration/defs/accuracy/test_llm_api_pytorch.py (1)

6668-6727: Consider extracting shared MTP acceptance-rate test logic into a helper.

This block is nearly identical to the existing non-custom-op AR test. Pulling prompt prep + acceptance-rate computation into a shared helper would reduce maintenance drift between the two paths.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/integration/defs/accuracy/test_llm_api_pytorch.py` around lines 6668 -
6727, This test duplicates prompt preparation and MTP acceptance-rate logic;
extract a shared helper (e.g., create function
compute_accept_rate_or_run_mtp_test) that accepts an LLM spec (LLM),
raw_prompts, MTPDecodingConfig/sampling params (SamplingParams), and
max_draft_len, performs tokenizer.apply_chat_template + encode, iterates
generate_async to compute num_drafted/num_accepted and returns the acceptance
rate; then call that helper from this MTP test and the existing non-custom-op AR
test to replace the duplicated blocks (refer to symbols: MTPDecodingConfig, LLM,
tokenizer.apply_chat_template, tokenizer.encode, generate_async, SamplingParams,
max_draft_len).
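
The aggregation half of such a helper could look like the sketch below. It assumes the caller has already collected one `(num_drafted, num_accepted)` pair per request; how the real test derives those counts from the `generate_async` outputs is not shown in this review, so that part is left out. The function name and the example numbers are illustrative only.

```python
from typing import Iterable, Tuple


def acceptance_rate(counts: Iterable[Tuple[int, int]]) -> float:
    """Aggregate (num_drafted, num_accepted) pairs into one acceptance rate."""
    total_drafted = 0
    total_accepted = 0
    for num_drafted, num_accepted in counts:
        total_drafted += num_drafted
        total_accepted += num_accepted
    if total_drafted == 0:
        # Avoid a silent division by zero when speculation never ran.
        raise ValueError("no draft tokens were produced")
    return total_accepted / total_drafted


# Illustrative numbers only, not measurements from this PR.
example_counts = [(30, 12), (30, 9)]
rate = acceptance_rate(example_counts)
assert rate > 0.2  # threshold the walkthrough reports for the new AR test
```

Both the custom-op test and the existing non-custom-op test could then share this function and differ only in how the environment is prepared.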

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 81888ec5-e6e6-4c5a-b44b-75dc1433a394

📥 Commits

Reviewing files that changed from the base of the PR and between 4496e69 and ce7ea2c.

📒 Files selected for processing (3)

tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py
tests/integration/defs/accuracy/test_llm_api_pytorch.py
tests/integration/test_lists/test-db/l0_dgx_b200.yml

tensorrt-cicd · 2026-04-07T02:28:05Z

PR_Github #42032 [ run ] triggered by Bot. Commit: ce7ea2c Link to invocation

Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>

JadoTu · 2026-04-07T02:38:51Z

/bot run

tensorrt-cicd · 2026-04-07T02:45:02Z

PR_Github #42035 [ run ] triggered by Bot. Commit: 4030ea1 Link to invocation

tensorrt-cicd · 2026-04-07T04:17:13Z

PR_Github #42035 [ run ] completed with state SUCCESS. Commit: 4030ea1
/LLM/main/L0_MergeRequest_PR pipeline #32881 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

JadoTu · 2026-04-07T04:34:28Z

/bot run

tensorrt-cicd · 2026-04-07T04:39:55Z

PR_Github #42063 [ run ] triggered by Bot. Commit: 1bbf108 Link to invocation

tensorrt-cicd · 2026-04-07T07:28:50Z

PR_Github #42063 [ run ] completed with state SUCCESS. Commit: 1bbf108
/LLM/main/L0_MergeRequest_PR pipeline #32903 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

JadoTu · 2026-04-07T07:41:31Z

/bot run

tensorrt-cicd · 2026-04-07T07:50:15Z

PR_Github #42102 [ run ] triggered by Bot. Commit: 536e248 Link to invocation

tensorrt-cicd · 2026-04-07T08:42:53Z

PR_Github #42102 [ run ] completed with state FAILURE. Commit: 536e248
/LLM/main/L0_MergeRequest_PR pipeline #32941 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

JadoTu · 2026-04-07T09:04:19Z

/bot run

tensorrt-cicd · 2026-04-07T09:11:06Z

PR_Github #42118 [ run ] triggered by Bot. Commit: 536e248 Link to invocation

tensorrt-cicd · 2026-04-07T10:49:22Z

PR_Github #42118 [ run ] completed with state SUCCESS. Commit: 536e248
/LLM/main/L0_MergeRequest_PR pipeline #32956 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

JadoTu · 2026-04-07T11:23:42Z

/bot run

tensorrt-cicd · 2026-04-15T00:39:14Z

PR_Github #43312 [ run ] triggered by Bot. Commit: 77d0770 Link to invocation

tensorrt-cicd · 2026-04-15T03:05:20Z

PR_Github #43312 [ run ] completed with state SUCCESS. Commit: 77d0770
/LLM/main/L0_MergeRequest_PR pipeline #33854 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

JadoTu · 2026-04-15T03:08:17Z

/bot run

JadoTu · 2026-04-15T03:10:39Z

/bot kill

tensorrt-cicd · 2026-04-15T03:14:01Z

PR_Github #43349 [ run ] triggered by Bot. Commit: 1bded3a Link to invocation

tensorrt-cicd · 2026-04-15T03:16:19Z

PR_Github #43351 [ kill ] triggered by Bot. Commit: 1bded3a Link to invocation

tensorrt-cicd · 2026-04-15T03:16:22Z

PR_Github #43349 [ run ] completed with state ABORTED. Commit: 1bded3a

Link to invocation

tensorrt-cicd · 2026-04-15T03:16:52Z

PR_Github #43351 [ kill ] completed with state SUCCESS. Commit: 1bded3a
Successfully killed previous jobs for commit 1bded3a

Link to invocation

JadoTu · 2026-04-15T03:38:30Z

/bot run

tensorrt-cicd · 2026-04-15T03:45:31Z

PR_Github #43368 [ run ] triggered by Bot. Commit: d5bebf0 Link to invocation

tensorrt-cicd · 2026-04-15T05:35:18Z

PR_Github #43368 [ run ] completed with state FAILURE. Commit: d5bebf0
/LLM/main/L0_MergeRequest_PR pipeline #33903 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

JadoTu · 2026-04-15T06:49:27Z

/bot run

tensorrt-cicd · 2026-04-15T06:55:01Z

PR_Github #43414 [ run ] triggered by Bot. Commit: d5bebf0 Link to invocation

tensorrt-cicd · 2026-04-15T08:18:06Z

PR_Github #43414 [ run ] completed with state SUCCESS. Commit: d5bebf0
/LLM/main/L0_MergeRequest_PR pipeline #33946 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

JadoTu · 2026-04-15T08:22:42Z

/bot run

tensorrt-cicd · 2026-04-15T08:28:26Z

PR_Github #43451 [ run ] triggered by Bot. Commit: d5bebf0 Link to invocation

tensorrt-cicd · 2026-04-15T12:29:27Z

PR_Github #43451 [ run ] completed with state SUCCESS. Commit: d5bebf0
/LLM/main/L0_MergeRequest_PR pipeline #33977 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

JadoTu · 2026-04-16T01:36:58Z

/bot run

JadoTu · 2026-04-16T02:49:03Z

/bot run

tensorrt-cicd · 2026-04-16T02:55:52Z

PR_Github #43629 [ run ] triggered by Bot. Commit: d2e271c Link to invocation

tensorrt-cicd · 2026-04-16T04:59:18Z

PR_Github #43629 [ run ] completed with state SUCCESS. Commit: d2e271c
/LLM/main/L0_MergeRequest_PR pipeline #34121 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

JadoTu · 2026-04-16T08:55:45Z

/bot run

tensorrt-cicd · 2026-04-16T09:02:28Z

PR_Github #43741 [ run ] triggered by Bot. Commit: 403d49b Link to invocation

tensorrt-cicd · 2026-04-16T20:41:49Z

PR_Github #43741 [ run ] completed with state SUCCESS. Commit: 403d49b
/LLM/main/L0_MergeRequest_PR pipeline #34223 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

CI Report

Link to invocation

feat: add the invocation path for mamba2 mtp custom op

ce7ea2c

Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>

JadoTu requested review from a team as code owners April 7, 2026 02:21

JadoTu requested review from symphonylyh and tomeras91 April 7, 2026 02:21

github-actions Bot assigned JadoTu Apr 7, 2026

JadoTu requested review from nv-guomingz and sunnyqgg April 7, 2026 02:23

coderabbitai Bot reviewed Apr 7, 2026

nv-guomingz reviewed Apr 7, 2026

Comment thread tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py Outdated

change env name

4030ea1

Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>

Merge branch 'main' into mamba2_mtp_custom_op_invocation

1bbf108

Merge branch 'main' into mamba2_mtp_custom_op_invocation

536e248

Merge branch 'main' into mamba2_mtp_custom_op_invocation

1bded3a

Merge branch 'main' into mamba2_mtp_custom_op_invocation

d5bebf0

Merge branch 'main' into mamba2_mtp_custom_op_invocation

d2e271c

JadoTu added 2 commits April 16, 2026 13:01

Merge branch 'main' into mamba2_mtp_custom_op_invocation

2062453

Merge branch 'main' into mamba2_mtp_custom_op_invocation

403d49b

nv-guomingz merged commit d635e13 into NVIDIA:main Apr 17, 2026
5 checks passed
