[None][feat] Add the invocation path for mamba2 mtp custom op #12787

Merged
nv-guomingz merged 12 commits into NVIDIA:main from JadoTu:mamba2_mtp_custom_op_invocation
Apr 17, 2026

@JadoTu
Collaborator

@JadoTu JadoTu commented Apr 7, 2026

Summary by CodeRabbit

Release Notes

  • New Features

    • Added optional CUDA custom operator support for speculative decoding optimization, controllable via environment variable configuration.
  • Tests

    • Added integration tests validating custom operator functionality on 4-GPU and 8-GPU configurations with acceptance-rate and accuracy benchmarks.

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
@JadoTu JadoTu requested review from a team as code owners April 7, 2026 02:21
@JadoTu JadoTu requested review from symphonylyh and tomeras91 April 7, 2026 02:21
@JadoTu
Collaborator Author

JadoTu commented Apr 7, 2026

/bot run

@JadoTu JadoTu requested review from nv-guomingz and sunnyqgg April 7, 2026 02:23
@coderabbitai
Contributor

coderabbitai Bot commented Apr 7, 2026

📝 Walkthrough

This PR adds optional use of a TRT-LLM CUDA custom op for MTP SSM cache updates in the Mamba2 mixer, selected through an environment-variable-controlled dispatch mechanism. Two integration tests validate the new code path on 4-GPU and 8-GPU configurations.

Changes

| Cohort / File(s) | Summary |
|---|---|
| MTP SSM Cache Update Optimization: tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py | Modified is_target_verify branch to conditionally dispatch between TRT-LLM custom op (selective_state_update_mtp_ssm_cache_trtllm) and native/flashinfer implementations based on mamba2_mtp_use_custom_op environment variable. Refactored to reshape inputs into MTP-shaped tensors and construct appropriate argument sets for each code path. |
| Integration Tests: tests/integration/defs/accuracy/test_llm_api_pytorch.py | Added two new TestNemotronV3Super methods: test_nvfp4_4gpu_mtp_ar_custom_op() (validates acceptance rate > 0.2) and test_nvfp4_8gpus_mtp_custom_op() (runs MMLU/GSM8K evaluation), both with custom-op path enabled via environment variable. |
| Test Matrix: tests/integration/test_lists/test-db/l0_dgx_b200.yml | Extended test configuration with two new TIMEOUT (60) test entries for the custom-op MTP tests on 4-GPU and 8-GPU clusters. |
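
The dispatch described in the first row can be pictured with a small sketch. The variable name and the comparison against the string "1" come from the diff quoted later in this review; the function names `mtp_custom_op_enabled` and `pick_ssm_cache_update` are hypothetical, and the two update callables are stand-ins rather than the real kernels.

```python
import os
from typing import Callable, Mapping, Optional

# Name as it appears in the reviewed diff (lowercase at review time).
ENV_NAME = "mamba2_mtp_use_custom_op"


def mtp_custom_op_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when the flag is set to the exact string "1"."""
    env = os.environ if env is None else env
    return env.get(ENV_NAME, "0") == "1"


def pick_ssm_cache_update(use_custom_op: bool,
                          custom_op: Callable,
                          fallback: Callable) -> Callable:
    """Choose the update routine once, so the hot path has no env lookups."""
    return custom_op if use_custom_op else fallback


def custom_op_stub(state):
    # Stand-in for the TRT-LLM custom op; not the real signature.
    return ("custom_op", state)


def fallback_stub(state):
    # Stand-in for the native/flashinfer implementation.
    return ("fallback", state)
```

With the flag unset or set to anything other than "1" (for example "true"), this sketch falls back to the existing implementation, which matches the opt-in behaviour the summary describes.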

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The PR description lacks substantive content required by the template; it contains only placeholders and a checklist without explaining what, why, or test coverage details. | Fill in the Description section with a clear explanation of the changes and their purpose. Complete the Test Coverage section by listing the newly added tests (test_nvfp4_4gpu_mtp_ar_custom_op and test_nvfp4_8gpus_mtp_custom_op). |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (1 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title clearly summarizes the main change: adding an invocation path for the mamba2 MTP custom operator, which is directly reflected in the code changes. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py (1)

1-1: 🛠️ Refactor suggestion | 🟠 Major

Update the copyright year to 2026.

The copyright header shows "2022-2024" but this file is being modified in 2026. As per coding guidelines, the year should be updated on modified files.

-# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2022-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py` at line 1, Update the SPDX
copyright header line (the SPDX-FileCopyrightText comment at the top of the
file) to reflect the current modification year by changing "2022-2024" to
"2022-2026" so the header is accurate for 2026.
tests/integration/defs/accuracy/test_llm_api_pytorch.py (1)

1-1: ⚠️ Potential issue | 🟠 Major

Update SPDX copyright year for this modified file.

Line 1 still uses 2025, but this file has meaningful changes in this PR and should be updated to the latest modification year.

Proposed fix
-# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

As per coding guidelines: “Add NVIDIA copyright header to ALL new files; update year on modified files” and “All TensorRT-LLM Open Source Software code files should contain an NVIDIA copyright header with the year of latest meaningful modification”.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/defs/accuracy/test_llm_api_pytorch.py` at line 1, Update
the SPDX header year in the file's top-line comment: replace "Copyright (c) 2025
NVIDIA CORPORATION & AFFILIATES." in the SPDX-FileCopyrightText line with the
latest modification year (2026) so the header reads 2026.
🧹 Nitpick comments (2)
tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py (1)

162-164: Consider using UPPER_SNAKE_CASE for the environment variable name.

Environment variables conventionally use UPPER_SNAKE_CASE (e.g., MAMBA2_MTP_USE_CUSTOM_OP). This aligns with common conventions and coding guidelines that specify constants should use UPPER_SNAKE_CASE.

-        self._use_mtp_custom_op = os.environ.get("mamba2_mtp_use_custom_op",
-                                                 "0") == "1"
+        self._use_mtp_custom_op = os.environ.get("MAMBA2_MTP_USE_CUSTOM_OP",
+                                                 "0") == "1"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py` around lines 162 - 164,
The environment variable name mamba2_mtp_use_custom_op used to set
self._use_mtp_custom_op should follow UPPER_SNAKE_CASE conventions; update the
code that reads os.environ.get("mamba2_mtp_use_custom_op", "0") to use
"MAMBA2_MTP_USE_CUSTOM_OP" instead and ensure any related docs/tests or other
occurrences are updated to the new name so _use_mtp_custom_op continues to be
initialized from the consistent UPPER_SNAKE_CASE variable.
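
One way to keep such a rename contained is sketched below, under stated assumptions: the constant `MTP_CUSTOM_OP_ENV`, the class `MixerFlags`, and the helper `build_with_flag` are hypothetical and do not appear in the PR, which reads the literal string inline. The sketch assumes the flag is cached at construction time, as the `self._use_mtp_custom_op` assignment in the diff suggests, so a test has to set the variable before the object is built.

```python
import os
from unittest import mock

# Hypothetical module-level constant; a rename then touches a single line.
MTP_CUSTOM_OP_ENV = "MAMBA2_MTP_USE_CUSTOM_OP"


class MixerFlags:
    """Stand-in for the part of the mixer that caches the flag when built."""

    def __init__(self) -> None:
        self.use_mtp_custom_op = os.environ.get(MTP_CUSTOM_OP_ENV, "0") == "1"


def build_with_flag(value: str) -> MixerFlags:
    """Build the object with the flag set only for the duration of the call."""
    # mock.patch.dict restores os.environ when the block exits.
    with mock.patch.dict(os.environ, {MTP_CUSTOM_OP_ENV: value}):
        return MixerFlags()
```

Tests that import the constant instead of repeating the string would pick up a rename automatically, which is the drift the reviewer's note about "related docs/tests" is guarding against.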
tests/integration/defs/accuracy/test_llm_api_pytorch.py (1)

6668-6727: Consider extracting shared MTP acceptance-rate test logic into a helper.

This block is nearly identical to the existing non-custom-op AR test. Pulling prompt prep + acceptance-rate computation into a shared helper would reduce maintenance drift between the two paths.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/defs/accuracy/test_llm_api_pytorch.py` around lines 6668 -
6727, This test duplicates prompt preparation and MTP acceptance-rate logic;
extract a shared helper (e.g., create function
compute_accept_rate_or_run_mtp_test) that accepts an LLM spec (LLM),
raw_prompts, MTPDecodingConfig/sampling params (SamplingParams), and
max_draft_len, performs tokenizer.apply_chat_template + encode, iterates
generate_async to compute num_drafted/num_accepted and returns the acceptance
rate; then call that helper from this MTP test and the existing non-custom-op AR
test to replace the duplicated blocks (refer to symbols: MTPDecodingConfig, LLM,
tokenizer.apply_chat_template, tokenizer.encode, generate_async, SamplingParams,
max_draft_len).
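
The acceptance-rate part of such a helper could look roughly like the sketch below. The review names num_drafted, num_accepted, and max_draft_len but does not show how they are derived, so the counting rule here is an assumption: each streamed decoding step is taken to propose max_draft_len draft tokens, and every new token beyond the first in a step is counted as an accepted draft. The function name `acceptance_rate` and its input format (cumulative output-token counts per streaming step) are hypothetical.

```python
from typing import Iterable, Sequence


def acceptance_rate(streams: Iterable[Sequence[int]], max_draft_len: int) -> float:
    """Compute accepted / drafted over all prompts.

    Each element of `streams` lists the cumulative number of output tokens
    observed after each streaming step for one prompt.
    """
    num_drafted = 0
    num_accepted = 0
    for cumulative_counts in streams:
        seen = 0
        for total in cumulative_counts:
            new_tokens = total - seen
            seen = total
            # Assumption: every step proposes max_draft_len draft tokens.
            num_drafted += max_draft_len
            # Assumption: one token per step comes from the target model itself.
            num_accepted += max(new_tokens - 1, 0)
    if num_drafted == 0:
        raise ValueError("no decoding steps were observed")
    return num_accepted / num_drafted
```

For a single prompt whose cumulative counts are 3, 4, 7 with max_draft_len set to 3, the steps yield 2, 0, and 2 accepted tokens out of 9 drafted, giving 4/9, which would clear the 0.2 threshold the new test checks.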
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 81888ec5-e6e6-4c5a-b44b-75dc1433a394

📥 Commits

Reviewing files that changed from the base of the PR and between 4496e69 and ce7ea2c.

📒 Files selected for processing (3)
  • tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py
  • tests/integration/test_lists/test-db/l0_dgx_b200.yml

@tensorrt-cicd
Collaborator

PR_Github #42032 [ run ] triggered by Bot. Commit: ce7ea2c Link to invocation

Comment thread tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py Outdated
Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
@JadoTu
Collaborator Author

JadoTu commented Apr 7, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #42035 [ run ] triggered by Bot. Commit: 4030ea1 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42035 [ run ] completed with state SUCCESS. Commit: 4030ea1
/LLM/main/L0_MergeRequest_PR pipeline #32881 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@JadoTu
Collaborator Author

JadoTu commented Apr 7, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #42063 [ run ] triggered by Bot. Commit: 1bbf108 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42063 [ run ] completed with state SUCCESS. Commit: 1bbf108
/LLM/main/L0_MergeRequest_PR pipeline #32903 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@JadoTu
Collaborator Author

JadoTu commented Apr 7, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #42102 [ run ] triggered by Bot. Commit: 536e248 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42102 [ run ] completed with state FAILURE. Commit: 536e248
/LLM/main/L0_MergeRequest_PR pipeline #32941 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@JadoTu
Collaborator Author

JadoTu commented Apr 7, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #42118 [ run ] triggered by Bot. Commit: 536e248 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42118 [ run ] completed with state SUCCESS. Commit: 536e248
/LLM/main/L0_MergeRequest_PR pipeline #32956 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@JadoTu
Collaborator Author

JadoTu commented Apr 7, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #43312 [ run ] triggered by Bot. Commit: 77d0770 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43312 [ run ] completed with state SUCCESS. Commit: 77d0770
/LLM/main/L0_MergeRequest_PR pipeline #33854 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@JadoTu
Collaborator Author

JadoTu commented Apr 15, 2026

/bot run

@JadoTu
Collaborator Author

JadoTu commented Apr 15, 2026

/bot kill

@tensorrt-cicd
Collaborator

PR_Github #43349 [ run ] triggered by Bot. Commit: 1bded3a Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43351 [ kill ] triggered by Bot. Commit: 1bded3a Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43349 [ run ] completed with state ABORTED. Commit: 1bded3a

Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43351 [ kill ] completed with state SUCCESS. Commit: 1bded3a
Successfully killed previous jobs for commit 1bded3a

Link to invocation

@JadoTu
Collaborator Author

JadoTu commented Apr 15, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #43368 [ run ] triggered by Bot. Commit: d5bebf0 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43368 [ run ] completed with state FAILURE. Commit: d5bebf0
/LLM/main/L0_MergeRequest_PR pipeline #33903 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@JadoTu
Collaborator Author

JadoTu commented Apr 15, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #43414 [ run ] triggered by Bot. Commit: d5bebf0 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43414 [ run ] completed with state SUCCESS. Commit: d5bebf0
/LLM/main/L0_MergeRequest_PR pipeline #33946 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@JadoTu
Collaborator Author

JadoTu commented Apr 15, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #43451 [ run ] triggered by Bot. Commit: d5bebf0 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43451 [ run ] completed with state SUCCESS. Commit: d5bebf0
/LLM/main/L0_MergeRequest_PR pipeline #33977 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@JadoTu
Collaborator Author

JadoTu commented Apr 16, 2026

/bot run

@JadoTu
Collaborator Author

JadoTu commented Apr 16, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #43629 [ run ] triggered by Bot. Commit: d2e271c Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43629 [ run ] completed with state SUCCESS. Commit: d2e271c
/LLM/main/L0_MergeRequest_PR pipeline #34121 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@JadoTu
Collaborator Author

JadoTu commented Apr 16, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #43741 [ run ] triggered by Bot. Commit: 403d49b Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43741 [ run ] completed with state SUCCESS. Commit: 403d49b
/LLM/main/L0_MergeRequest_PR pipeline #34223 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

CI Report

Link to invocation

@nv-guomingz nv-guomingz merged commit d635e13 into NVIDIA:main Apr 17, 2026
5 checks passed
4 participants