Enable selective recompute for norm_out in GDN layers #4715

Draft
xuantengh wants to merge 1 commit into NVIDIA:main from xuantengh:xuantengh/gdn_recompute

Conversation

@xuantengh

This PR enables selective recompute for the gated norm step in GDN layers.

You can enable it by adding the following options to an MBridge recipe:

model.recompute_granularity=selective
model.recompute_modules=[gdn_norm_out]

The recompute module name may be subject to change.
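
For reference, a minimal sketch of the equivalent Megatron-Core TransformerConfig; only the two recompute fields come from this PR, while the import path and all other values are assumed placeholders:

from megatron.core.transformer import TransformerConfig

# Illustrative config: recompute_granularity/recompute_modules mirror the
# recipe options above; the model sizes are arbitrary placeholders.
config = TransformerConfig(
    num_layers=2,
    hidden_size=128,
    num_attention_heads=8,
    recompute_granularity="selective",
    recompute_modules=["gdn_norm_out"],
)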

@copy-pr-bot

copy-pr-bot Bot commented May 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

@xuantengh
Author

/claude review

Comment on lines +428 to +442
if self.recompute_norm_out:
    # Checkpoint the gated norm: run it now, saving only its inputs.
    self.norm_out_checkpoint = tensor_parallel.CheckpointWithoutOutput()
    norm_out = self.norm_out_checkpoint.checkpoint(
        self._run_gated_norm_and_a2a, core_attn_out, gate
    )
else:
    norm_out = self._run_gated_norm_and_a2a(core_attn_out, gate)

# Output projection
nvtx_range_push(suffix="out_proj")
out, out_bias = self.out_proj(norm_out)
nvtx_range_pop(suffix="out_proj")

if self.recompute_norm_out:
    # Free norm_out's activation memory now that out_proj has consumed it;
    # it will be recomputed from the saved inputs during backward.
    self.norm_out_checkpoint.discard_output_and_register_recompute(out)
Contributor

Nit: Other recompute modules (mla_up_proj, layernorm, moe_act) have unit test coverage for the forward/backward pass with recompute enabled. Consider adding a test case in tests/unit_tests/ssm/test_gated_delta_net.py that sets recompute_granularity="selective" and recompute_modules=["gdn_norm_out"] and verifies the forward+backward pass produces correct gradients. This would help catch regressions if the CheckpointWithoutOutput contract changes.
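
As a sketch, such a test might look like the following; build_gdn_layer is a hypothetical fixture standing in for whatever helper test_gated_delta_net.py provides, the layer is assumed to return a single tensor, and only the two config fields come from this PR:

import torch

def test_gdn_norm_out_recompute():
    # Build two identically initialized layers, one with selective recompute.
    torch.manual_seed(0)
    baseline = build_gdn_layer()  # hypothetical fixture
    torch.manual_seed(0)
    recomputed = build_gdn_layer(
        recompute_granularity="selective",
        recompute_modules=["gdn_norm_out"],
    )

    hidden = torch.randn(8, 2, baseline.config.hidden_size, device="cuda")
    ref_out = baseline(hidden)
    rc_out = recomputed(hidden)
    torch.testing.assert_close(rc_out, ref_out)

    # Recompute should change memory usage, not math: gradients must match.
    ref_out.sum().backward()
    rc_out.sum().backward()
    for p_ref, p_rc in zip(baseline.parameters(), recomputed.parameters()):
        torch.testing.assert_close(p_rc.grad, p_ref.grad)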

Contributor

@claude Bot left a comment

LGTM. The CheckpointWithoutOutput pattern matches the established usage in transformer_layer.py and multi_latent_attention.py. Config validation for gdn_norm_out is correct. Left a minor suggestion about adding unit test coverage.
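
For readers unfamiliar with the pattern, here is a minimal standalone sketch of the CheckpointWithoutOutput lifecycle this PR applies; gated_norm is an illustrative stand-in for _run_gated_norm_and_a2a, and only the checkpoint and discard_output_and_register_recompute calls are taken from the diff:

import torch
from megatron.core import tensor_parallel

def gated_norm(x, gate):
    # Stand-in for the real gated norm + all-to-all in the GDN layer.
    return torch.nn.functional.layer_norm(x, x.shape[-1:]) * torch.sigmoid(gate)

x = torch.randn(4, 8, requires_grad=True)
gate = torch.randn(4, 8, requires_grad=True)

ckpt = tensor_parallel.CheckpointWithoutOutput()
# 1) Run the function now, saving its inputs but not its output.
y = ckpt.checkpoint(gated_norm, x, gate)
# 2) Let the downstream op (out_proj in the PR) consume the output.
out = y.sum()
# 3) Free the checkpointed output and register a backward hook on `out`
#    so gated_norm is re-run just before gradients flow through it.
ckpt.discard_output_and_register_recompute(out)
out.backward()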

@xuantengh self-assigned this May 10, 2026
@xuantengh
Author

/ok to test cdaca66
