Skip to content

test: mark TestFusedApplyMLARope::test_forward_backward_for_q flaky_in_dev#4639

Merged
ko3n1g merged 1 commit intoNVIDIA:mainfrom
ko3n1g:ko3n1g/test/mark-mla-rope-q-flaky
May 6, 2026
Merged

test: mark TestFusedApplyMLARope::test_forward_backward_for_q flaky_in_dev#4639
ko3n1g merged 1 commit intoNVIDIA:mainfrom
ko3n1g:ko3n1g/test/mark-mla-rope-q-flaky

Conversation

@ko3n1g
Copy link
Copy Markdown
Contributor

@ko3n1g ko3n1g commented May 6, 2026

Claude summary

Mark TestFusedApplyMLARope::test_forward_backward_for_q as flaky_in_dev.

The backward gradient assert_close intermittently exceeds bf16 tolerances
on the thd packed-sequence path. Observed failure in CI run
25408640123:

FAILED tests/unit_tests/fusions/test_mla_yarn_rope_apply.py::TestFusedApplyMLARope::test_forward_backward_for_q[thd]
E   AssertionError: Mismatch in bwd: Tensor-likes are not close!
E   Mismatched elements: 31 / 786432 (0.0%)
E   Greatest absolute difference: 3.015625 at index (104, 29, 170) (up to 0.05 allowed)
E   Greatest relative difference: 33.509803771972656 at index (104, 28, 166) (up to 0.02 allowed)

The same merge-queue commit passed the unit-test job on rerun (run
25415024224),
confirming the bwd mismatch is non-deterministic in the fused
fused_apply_mla_rope_for_q kernel.

@pytest.mark.experimental
@pytest.mark.internal
@pytest.mark.skipif(not is_torch_min_version("2.5.0"), reason="Requires PyTorch >= 2.5.0")
@pytest.mark.skipif(not torch.cuda.is_available(), reason="CUDA not available")
@pytest.mark.parametrize("input_format", ["sbhd", "thd"])
class TestFusedApplyMLARope:
    @pytest.mark.flaky_in_dev
    def test_forward_backward_for_q(self, input_format):
        _test_fused_apply_mla_rope_for_q(input_format)

Owning code: megatron/core/fusions/fused_mla_yarn_rope_apply.py.

…n_dev

Backward gradient assertion intermittently exceeds bf16 tolerances on the thd
packed-sequence path (e.g. run 25408640123: 31 / 786,432 elements out of
tolerance, max abs diff 3.02 vs allowed 0.05). The same merge-queue commit
passed on rerun, confirming non-determinism in the fused kernel backward.

Signed-off-by: oliver könig <okoenig@nvidia.com>
@ko3n1g
Copy link
Copy Markdown
Contributor Author

ko3n1g commented May 6, 2026

/ok to test b763c08

@svcnvidia-nemo-ci svcnvidia-nemo-ci marked this pull request as draft May 6, 2026 08:13
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 6, 2026

This PR has been automatically converted to draft because all PRs must start as drafts.

When you are ready for review, click Ready for Review to begin the review process. This will:

  1. Add the oncall reviewer (optional reviewer)
  2. Add required review teams based on your changes

See the contribution guide for more details.

@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 6, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@ko3n1g ko3n1g marked this pull request as ready for review May 6, 2026 08:14
@ko3n1g ko3n1g merged commit c325855 into NVIDIA:main May 6, 2026
26 checks passed
@svcnvidia-nemo-ci svcnvidia-nemo-ci requested a review from a team May 6, 2026 08:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants