test: mark TestFusedApplyMLARope::test_forward_backward_for_q flaky_in_dev by ko3n1g · Pull Request #4639 · NVIDIA/Megatron-LM

ko3n1g · 2026-05-06T08:13:27Z

Claude summary

Mark TestFusedApplyMLARope::test_forward_backward_for_q as flaky_in_dev.

The backward gradient assert_close intermittently exceeds bf16 tolerances
on the thd packed-sequence path. Observed failure in CI run
25408640123:

FAILED tests/unit_tests/fusions/test_mla_yarn_rope_apply.py::TestFusedApplyMLARope::test_forward_backward_for_q[thd]
E   AssertionError: Mismatch in bwd: Tensor-likes are not close!
E   Mismatched elements: 31 / 786432 (0.0%)
E   Greatest absolute difference: 3.015625 at index (104, 29, 170) (up to 0.05 allowed)
E   Greatest relative difference: 33.509803771972656 at index (104, 28, 166) (up to 0.02 allowed)

The same merge-queue commit passed the unit-test job on rerun (run
25415024224),
confirming the bwd mismatch is non-deterministic in the fused
fused_apply_mla_rope_for_q kernel.

@pytest.mark.experimental
@pytest.mark.internal
@pytest.mark.skipif(not is_torch_min_version("2.5.0"), reason="Requires PyTorch >= 2.5.0")
@pytest.mark.skipif(not torch.cuda.is_available(), reason="CUDA not available")
@pytest.mark.parametrize("input_format", ["sbhd", "thd"])
class TestFusedApplyMLARope:
    @pytest.mark.flaky_in_dev
    def test_forward_backward_for_q(self, input_format):
        _test_fused_apply_mla_rope_for_q(input_format)

Owning code: megatron/core/fusions/fused_mla_yarn_rope_apply.py.

…n_dev Backward gradient assertion intermittently exceeds bf16 tolerances on the thd packed-sequence path (e.g. run 25408640123: 31 / 786,432 elements out of tolerance, max abs diff 3.02 vs allowed 0.05). The same merge-queue commit passed on rerun, confirming non-determinism in the fused kernel backward. Signed-off-by: oliver könig <okoenig@nvidia.com>

ko3n1g · 2026-05-06T08:13:32Z

/ok to test b763c08

github-actions · 2026-05-06T08:13:46Z

This PR has been automatically converted to draft because all PRs must start as drafts.

When you are ready for review, click Ready for Review to begin the review process. This will:

Add the oncall reviewer (optional reviewer)
Add required review teams based on your changes

See the contribution guide for more details.

copy-pr-bot · 2026-05-06T08:13:48Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

svcnvidia-nemo-ci marked this pull request as draft May 6, 2026 08:13

ko3n1g mentioned this pull request May 6, 2026

Flaky: TestFusedApplyMLARope::test_forward_backward_for_q[thd] backward mismatch exceeds bf16 tolerances #4640

Open

copy-pr-bot Bot temporarily deployed to test May 6, 2026 08:14 Inactive

ko3n1g marked this pull request as ready for review May 6, 2026 08:14

ko3n1g merged commit c325855 into NVIDIA:main May 6, 2026
26 checks passed

svcnvidia-nemo-ci requested a review from a team May 6, 2026 08:14

svcnvidia-nemo-ci added the complexity: low label May 6, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: mark TestFusedApplyMLARope::test_forward_backward_for_q flaky_in_dev#4639

test: mark TestFusedApplyMLARope::test_forward_backward_for_q flaky_in_dev#4639
ko3n1g merged 1 commit intoNVIDIA:mainfrom
ko3n1g:ko3n1g/test/mark-mla-rope-q-flaky

ko3n1g commented May 6, 2026

Uh oh!

ko3n1g commented May 6, 2026

Uh oh!

github-actions Bot commented May 6, 2026

Uh oh!

copy-pr-bot Bot commented May 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ko3n1g commented May 6, 2026

Uh oh!

ko3n1g commented May 6, 2026

Uh oh!

github-actions Bot commented May 6, 2026

Uh oh!

copy-pr-bot Bot commented May 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants