[https://nvbugs/6185173][fix] Set mamba ssm cache to fp32 for NemotronV2 by tensorrt-cicd · Pull Request #14448 · NVIDIA/TensorRT-LLM

tensorrt-cicd · 2026-05-22T07:22:51Z

Summary

Root cause: Nemotron-Nano-9B-v2 has mamba_head_dim=80, which the FlashInfer SSM kernel does not support (it handles only {64, 128}), so the model falls back to the Triton SSM kernel. With the default bf16 SSM state cache on H20/Hopper, accumulating recurrent state across an entire prefill in a single forward pass produces catastrophic numerical drift (0% MMLU on full prefill, ~65% on chunked prefill).
Fix: Set kv_cache_config.mamba_ssm_cache_dtype: float32 in the model registry YAML for Nemotron-Nano-9B-v2, and remove the now-stale H20 waivers for the three TestNemotronV2 cases.
Automated fix generated by repair-bot
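
The fallback described in the root cause can be pictured with a minimal sketch. The function and constant names here are hypothetical and are not TensorRT-LLM or FlashInfer APIs; only the supported head dims {64, 128} and the value 80 come from this PR.

```python
# Hypothetical illustration only: these names are NOT real TensorRT-LLM APIs.
# Supported head dims for the FlashInfer SSM kernel, as stated in the PR summary.
FLASHINFER_SSM_HEAD_DIMS = frozenset({64, 128})


def pick_ssm_backend(mamba_head_dim: int) -> str:
    """Return the SSM backend a dispatcher of this shape would select."""
    if mamba_head_dim in FLASHINFER_SSM_HEAD_DIMS:
        return "flashinfer"
    # Any other head dim takes the fallback path.
    return "triton"


# Nemotron-Nano-9B-v2 uses mamba_head_dim=80, so it lands on the fallback.
print(pick_ssm_backend(80))
```

Under this assumption, head dims 64 and 128 stay on FlashInfer, while 80 is routed to Triton, which is where the bf16 state cache becomes a problem.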

Test plan

Verify fix on the same GPU type as the original failure
Check for regressions in related tests

Links

Bug: https://nvbugs/6185173

Summary by CodeRabbit

Improvements
- Updated Nemotron Nano 9B v2 model configuration settings.
- Improved test reliability with removal of test waivers.

…Nano-9B-v2

Nemotron-Nano-9B-v2 has mamba_head_dim=80, which the FlashInfer SSM kernel does not support (only {64, 128}), so AutoDeploy falls back to the Triton SSM backend. On H20/Hopper, accumulating bf16 state across an entire prefill in a single forward pass causes a catastrophic accuracy drop (0% MMLU for full prefill, ~65% for chunked prefill). Pinning the SSM state cache to float32 keeps the recurrence numerically stable. This is the same precision/perf trade-off that Nano-V3's config documents ("use float32 for accuracy and default (auto) for speed").

Signed-off-by: tensorrt-cicd <90828364+tensorrt-cicd@users.noreply.github.com>
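
To give a rough intuition for why the storage dtype of a recurrent state matters, here is a toy sketch in pure Python. It is an assumption-laden illustration: it does not model the Triton SSM kernel or reproduce the MMLU numbers above, and the scalar recurrence, decay, and update values are made up for the example. It emulates bf16 by rounding a float32 bit pattern to its upper 16 bits.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to bfloat16 precision (round-to-nearest-even); valid for finite values."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add the rounding bias, then clear the 16 low bits that bf16 does not store.
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def to_fp32(x: float) -> float:
    """Round x to float32 precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def run_recurrence(steps, decay, update, store):
    """Iterate h = decay * h + update, rounding the stored state with `store` each step."""
    h = 0.0
    for _ in range(steps):
        h = store(decay * h + update)
    return h


# Illustrative values only; the exact fixed point of this recurrence is 1.0.
bf16_state = run_recurrence(4096, 0.999, 0.001, to_bf16)
fp32_state = run_recurrence(4096, 0.999, 0.001, to_fp32)
print(f"bf16 state: {bf16_state}, fp32 state: {fp32_state}")
```

In this toy setup the bf16 state stops growing once the per-step increment drops below half of its rounding step, so it settles far from the float32 result. The real kernel's error behavior may differ, but the sketch shows the kind of drift a float32 state cache avoids.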

coderabbitai · 2026-05-22T07:24:33Z

Walkthrough

The Nemotron Nano 9B v2 model configuration is updated to specify float32 as the dtype for the Mamba SSM state cache, with documentation explaining kernel compatibility and long-context behavior requirements. Three corresponding test waivers are removed to enable previously skipped accuracy tests.

Changes

Nemotron SSM Cache Configuration

| Layer / File(s) | Summary |
| --- | --- |
| SSM cache dtype configuration and test waiver removal `examples/auto_deploy/model_registry/configs/nemotron-nano-9b-v2.yaml`, `tests/integration/test_lists/waives.txt` | `mamba_ssm_cache_dtype: float32` is added to the model's `kv_cache_config` with comments explaining the Triton kernel constraint and avoidance of bf16 underflow in long-context cases. Three test waivers for `TestNemotronV2` variants (`test_auto_dtype[False]`, `test_auto_dtype[True]`, `test_fp8[True]`) are removed, enabling the tests to run. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Suggested reviewers

crazydemo
mikeiovine

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description check | ✅ Passed | The description includes root cause analysis, the fix applied, test verification, and relevant bug links, meeting the essential information requirements despite missing some template sections. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The title clearly and specifically refers to the main change: setting the mamba SSM cache to fp32 for NemotronV2 in the model registry configuration. |


Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>

nvchenghaoz

Confirmed that the fix in this PR resolves the issue. Tested on an H20 from computelab.

nvchenghaoz · 2026-05-27T23:53:36Z

/bot run

tensorrt-cicd · 2026-05-27T23:59:06Z

PR_Github #50636 [ run ] triggered by Bot. Commit: d3e52b9

tensorrt-cicd · 2026-05-28T06:00:25Z

PR_Github #50636 [ run ] completed with state SUCCESS. Commit: d3e52b9
/LLM/main/L0_MergeRequest_PR pipeline #40129 completed with status: 'SUCCESS'


tensorrt-cicd requested a review from a team as a code owner May 22, 2026 07:22

tensorrt-cicd requested a review from QiJune May 22, 2026 07:22

tensorrt-cicd assigned suyoggupta May 22, 2026

github-actions Bot assigned tensorrt-cicd May 22, 2026

Update nemotron-nano-9b-v2.yaml

d3e52b9

Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>

nvchenghaoz approved these changes May 27, 2026


nvchenghaoz changed the title ~~[https://nvbugs/6185173][fix] Set kv_cache_config.mamba_ssm_cache_dtype: float32 in the model registry YAML~~ [https://nvbugs/6185173][fix] Set mamba ssm cache to fp32 for NemotronV2 May 27, 2026

nvchenghaoz enabled auto-merge (squash) May 27, 2026 23:53

nvchenghaoz merged commit fc3b69e into NVIDIA:main May 28, 2026
12 of 15 checks passed

nvchenghaoz merged 2 commits into NVIDIA:main from tensorrt-cicd:repair-bot-bug6185173


3 participants
