fix: set moe_permute_fusion default to true for deterministic MoE forward #2258
Merged
terrykong merged 3 commits on Apr 14, 2026
Conversation
fix: set moe_permute_fusion default to true for deterministic MoE forward

With moe_permute_fusion=false, MoE models produce non-deterministic forward pass results due to scatter_add_ in the unpermute operation, causing train/probs_ratio to deviate from 1.0 in on-policy GRPO.

Fixes NVIDIA-NeMo#2255

Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>
Contributor
Pull request overview
Updates example training configs to default moe_permute_fusion: true to avoid non-deterministic MoE forward passes (and resulting train/probs_ratio drift) caused by the unfused unpermute path.
Changes:
- Flip policy.megatron_cfg.moe_permute_fusion from false to true in several example configs.
- Apply the same default in both standard and Megatron variants of the GRPO/distillation configs.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| examples/configs/sft.yaml | Sets megatron_cfg.moe_permute_fusion: true in the SFT example config. |
| examples/configs/grpo_math_1B.yaml | Sets megatron_cfg.moe_permute_fusion: true in the GRPO math 1B baseline config. |
| examples/configs/grpo_math_1B_megatron.yaml | Sets megatron_cfg.moe_permute_fusion: true in the Megatron GRPO math 1B config. |
| examples/configs/dpo.yaml | Sets megatron_cfg.moe_permute_fusion: true in the DPO example config. |
| examples/configs/distillation_math.yaml | Sets megatron_cfg.moe_permute_fusion: true in the distillation math config. |
| examples/configs/distillation_math_megatron.yaml | Sets megatron_cfg.moe_permute_fusion: true in the Megatron distillation math config. |
Contributor
Author
/ok to test 48e920e

@zpqiu, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/
Contributor
Author
/ok to test 018ca26
These recipe configs previously overrode the base default (false) with true. Now that the base default is true, these overrides are redundant and fail the minimize-check lint.

Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>
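As an illustration (the recipe layout here is hypothetical; only the key path comes from this PR), the kind of override that the lint now flags looks like this, and the fix is simply deleting the line:

```yaml
# Hypothetical recipe config that inherits from a base config.
policy:
  megatron_cfg:
    # Previously needed to override the old base default of false;
    # now identical to the base default, so minimize-check flags it.
    moe_permute_fusion: true
```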
terrykong
approved these changes
Apr 13, 2026
yfw
approved these changes
Apr 14, 2026
Collaborator
/ok to test 4346565
terrykong
approved these changes
Apr 14, 2026
snivertynv
pushed a commit
to snivertynv/RL
that referenced
this pull request
May 5, 2026
fix: set moe_permute_fusion default to true for deterministic MoE forward (NVIDIA-NeMo#2258)

Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>
What does this PR do?
With moe_permute_fusion=false, MoE models produce non-deterministic forward pass results due to scatter_add_ in the unpermute operation, causing train/probs_ratio to deviate from 1.0 in on-policy GRPO.
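As a minimal standalone sketch of the root cause (this is illustrative, not the actual Megatron unpermute code): on CUDA, scatter_add_ accumulates with atomic floating-point adds, so when many rows scatter into the same destination, as in MoE unpermute, the summation order and therefore the low-order bits vary from run to run:

```python
# Repro sketch of scatter_add_ non-determinism on CUDA (illustrative only).
import torch

assert torch.cuda.is_available(), "the non-determinism only manifests on CUDA"

torch.manual_seed(0)
src = torch.randn(8192, 4096, device="cuda")
# Route many source rows to a few destination rows, mimicking tokens
# from different experts being summed back into the same position.
index = torch.randint(0, 16, (8192, 1), device="cuda").expand(-1, 4096).contiguous()

out_a = torch.zeros(16, 4096, device="cuda").scatter_add_(0, index, src)
out_b = torch.zeros(16, 4096, device="cuda").scatter_add_(0, index, src)

# Atomic-add ordering differs between the two calls, so the results
# typically disagree in the last bits even with identical inputs.
print("max |out_a - out_b| =", (out_a - out_b).abs().max().item())
```

Per this PR, the fused path (moe_permute_fusion=true) avoids that accumulation, so repeated forward passes over the same weights agree and train/probs_ratio stays at 1.0.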
Issues
Fixes #2255
Usage
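A sketch of the flag as it now appears in the changed example configs (excerpt only; surrounding keys elided), e.g. in examples/configs/grpo_math_1B.yaml:

```yaml
policy:
  megatron_cfg:
    moe_permute_fusion: true  # fused permute/unpermute keeps the MoE forward deterministic
```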
Before your PR is "Ready for review"
Pre checks:
Additional Information