
chore: bumpup Megatron-Bridge submodule to main #2039

Merged
terrykong merged 13 commits into main from zhiyul/bump_up_mbridge
Mar 31, 2026


@ZhiyuLi-Nvidia
Contributor

@ZhiyuLi-Nvidia ZhiyuLi-Nvidia commented Mar 1, 2026

What does this PR do?

As part of on-call duty, this PR performs the regular bump of the Megatron-Bridge and Megatron-LM dependencies:

  • Megatron-Bridge bump 45dbf473 → a2bb70b9 (137 commits)
  • Megatron-LM bump 8318b809 → 9e2810417 (127 commits) — synced with what Megatron-Bridge expects
NeMo-RL
├── Megatron-LM           @ 9e2810417
└── Megatron-Bridge        @ a2bb70b9
    └── Megatron-LM        @ 9e2810417  (expected by Bridge)
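
The pin relationship in the tree above can be checked mechanically. The following is a minimal sketch, not part of this PR: the helper names `short_sha_match` and `check_pins` are hypothetical, and the SHAs are simply copied from the tree. It compares abbreviated SHAs by prefix, since the two sides may be abbreviated to different lengths.

```python
def short_sha_match(a: str, b: str) -> bool:
    """True if one abbreviated SHA is a prefix of the other."""
    a, b = a.lower(), b.lower()
    return a.startswith(b) or b.startswith(a)


def check_pins(top_level: dict[str, str], expected_by_bridge: dict[str, str]) -> list[str]:
    """Return the dependencies whose top-level pin disagrees with what Bridge expects."""
    return [
        name
        for name, sha in expected_by_bridge.items()
        if name not in top_level or not short_sha_match(top_level[name], sha)
    ]


# Values copied from the tree above (hypothetical data layout).
nemo_rl_pins = {"Megatron-LM": "9e2810417", "Megatron-Bridge": "a2bb70b9"}
bridge_expects = {"Megatron-LM": "9e2810417"}

mismatches = check_pins(nemo_rl_pins, bridge_expects)
print(mismatches or "pins are consistent")
```

An empty result means the Megatron-LM commit pinned by NeMo-RL is the one Megatron-Bridge expects, which is the "synced" state described in the bullet list.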

This PR also sets gradient_accumulation_fusion: false to work around the following error:

RuntimeError: ColumnParallelLinear was called with gradient_accumulation_fusion set to True but the custom CUDA extension fused_weight_gradient_mlp_cuda module is not found. To use gradient_accumulation_fusion you must install APEX with --cpp_ext and --cuda_ext. For example: pip install --global-option="--cpp_ext" --global-option="--cuda_ext ." Note that the extension requires CUDA>=11. Otherwise, you must turn off gradient accumulation fusion.

DeepSeek-V3-style (dsv3) models default gradient_accumulation_fusion to true in Megatron-Bridge (mbridge).
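
As an illustration of the workaround, here is a minimal sketch that turns the flag off only when the extension named in the error message cannot be found. This is an assumption-laden variation: the PR itself sets the value to false directly in configuration, and the flat `model_cfg` dict and the helper names `fused_ext_available` and `resolve_fusion` are hypothetical, not NeMo-RL or Megatron-Bridge APIs.

```python
import importlib.util

# Module name taken from the RuntimeError text above.
FUSED_EXT = "fused_weight_gradient_mlp_cuda"


def fused_ext_available() -> bool:
    """Check whether the APEX CUDA extension is importable in this environment."""
    return importlib.util.find_spec(FUSED_EXT) is not None


def resolve_fusion(cfg: dict) -> dict:
    """Return a copy of cfg with the fusion flag disabled if the extension is missing."""
    out = dict(cfg)
    if out.get("gradient_accumulation_fusion", False) and not fused_ext_available():
        out["gradient_accumulation_fusion"] = False
    return out


# Hypothetical flat config; the real config layout is not shown in this PR.
model_cfg = {"gradient_accumulation_fusion": True}
print(resolve_fusion(model_cfg))
```

Hard-coding false, as this PR does, is the simpler choice when the environment is known not to ship APEX with the CUDA extensions.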

Summary by CodeRabbit

  • Chores
    • Updated Megatron-Bridge submodule reference.

@ZhiyuLi-Nvidia ZhiyuLi-Nvidia requested a review from a team as a code owner March 1, 2026 10:57
@ZhiyuLi-Nvidia ZhiyuLi-Nvidia force-pushed the zhiyul/bump_up_mbridge branch from bedab04 to 3a26b67 Compare March 1, 2026 10:57
@github-actions

github-actions Bot commented Mar 1, 2026

✅ Submodule Fast-Forward Check Results

Check based on commit: bedab04 (PR #2039 from zhiyul/bump_up_mbridge)

✅ Submodules that are properly updated:

Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@ZhiyuLi-Nvidia ZhiyuLi-Nvidia added CI:L2 Run doctests, unit tests, functional tests, and convergence tests Run CICD labels Mar 1, 2026
@github-actions

github-actions Bot commented Mar 1, 2026

✅ Submodule Fast-Forward Check Results

Check based on commit: 3a26b67 (PR #2039 from zhiyul/bump_up_mbridge)

✅ Submodules that are properly updated:

Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@coderabbitai
Contributor

coderabbitai Bot commented Mar 1, 2026

📝 Walkthrough


Updates the Megatron-Bridge submodule pointer from commit f91542b90908ad08b7e13672feea03e27bedee27 to commit 383b610dca1fe553346b37fc0172f54917b02069. No functional code changes—only the submodule reference is updated.

Changes

Cohort: Submodule Update
File(s): 3rdparty/Megatron-Bridge-workspace/Megatron-Bridge
Summary: Submodule pointer updated to a newer commit reference.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Suggested labels

CI:L1

Suggested reviewers

  • terrykong
  • yaoyu-33
  • yfw
🚥 Pre-merge checks | ✅ 4 passed
  • Title check: ✅ Passed. The title clearly and concisely summarizes the main change: bumping up the Megatron-Bridge submodule, which is the sole purpose of this pull request.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Test Results For Major Changes: ✅ Passed. PR contains only a minor submodule pointer update with no code or logic changes, qualifying as a routine dependency bump.
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.


@github-actions

github-actions Bot commented Mar 1, 2026

✅ Submodule Fast-Forward Check Results

Check based on commit: ac9fa35 (PR #2039 from zhiyul/bump_up_mbridge)

✅ Submodules that are properly updated:

Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@ZhiyuLi-Nvidia ZhiyuLi-Nvidia force-pushed the zhiyul/bump_up_mbridge branch from ac9fa35 to dd5f479 Compare March 25, 2026 02:09
@copy-pr-bot

copy-pr-bot Bot commented Mar 25, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions

github-actions Bot commented Mar 25, 2026

❌ Submodule Fast-Forward Check Failed

Check based on commit: dd5f479 (PR #2039 from zhiyul/bump_up_mbridge)

✅ Submodules that are properly updated:

Megatron-LM: ✅ PR branch is ahead of main branch (fast-forward)

❌ Submodules that need attention:

Megatron-Bridge: ❌ Commits have DIVERGED from a common ancestor
TARGET (main branch): https://github.com/NVIDIA-NeMo/Megatron-Bridge/commits/45dbf4735ec3a2034be66ed4183604642a84af0d/
CURRENT (PR #2039 from zhiyul/bump_up_mbridge): https://github.com/NVIDIA-NeMo/Megatron-Bridge/commits/a2bb70b91b827bd6b085a77442c7cf60cfdb59fe/

Please ensure all submodule commits are fast-forwards of the main branch before merging.

This failure is expected, because the current branch uses cherry-picked fix commits.
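
To make the fast-forward versus diverged distinction concrete, here is a small sketch over a toy commit graph. It is not the bot's actual implementation, which is not shown in this conversation; the commit names and the `classify` helper are hypothetical. On a real repository the same question can be asked with `git merge-base --is-ancestor <target> <current>`.

```python
def ancestors(parents: dict[str, list[str]], commit: str) -> set[str]:
    """All commits reachable from `commit` (itself included) via parent links."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen


def classify(parents: dict[str, list[str]], target: str, current: str) -> str:
    """Describe how `current` (PR pin) relates to `target` (pin on main)."""
    if target == current:
        return "unchanged"
    if target in ancestors(parents, current):
        return "fast-forward"  # current contains everything in target
    if current in ancestors(parents, target):
        return "behind"
    if ancestors(parents, target) & ancestors(parents, current):
        return "diverged"  # shared history, but neither contains the other
    return "unrelated"


# Toy history: a fix branch cut from an older commit, with main moved on since.
history = {
    "base": [],
    "main_pin": ["base"],
    "cherry_pick": ["base"],
    "pr_pin": ["cherry_pick"],
}
print(classify(history, "main_pin", "pr_pin"))
print(classify(history, "base", "main_pin"))
```

A pin placed on a side branch carrying cherry-picked fixes is reported as diverged even when it is otherwise newer, which matches the explanation given for this failure.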

@ZhiyuLi-Nvidia
Contributor Author

/ok to test dd5f479

@ZhiyuLi-Nvidia
Contributor Author

/ok to test a522794

@github-actions

❌ Submodule Fast-Forward Check Failed

Check based on commit: a522794 (PR #2039 from zhiyul/bump_up_mbridge)

✅ Submodules that are properly updated:

Megatron-LM: ✅ PR branch is ahead of main branch (fast-forward)

❌ Submodules that need attention:

Megatron-Bridge: ❌ Commits have DIVERGED from a common ancestor
TARGET (main branch): https://github.com/NVIDIA-NeMo/Megatron-Bridge/commits/45dbf4735ec3a2034be66ed4183604642a84af0d/
CURRENT (PR #2039 from zhiyul/bump_up_mbridge): https://github.com/NVIDIA-NeMo/Megatron-Bridge/commits/a2bb70b91b827bd6b085a77442c7cf60cfdb59fe/

Please ensure all submodule commits are fast-forwards of the main branch before merging.

@github-actions

❌ Submodule Fast-Forward Check Failed

Check based on commit: a8fa0ff (PR #2039 from zhiyul/bump_up_mbridge)

✅ Submodules that are properly updated:

Megatron-LM: ✅ PR branch is ahead of main branch (fast-forward)

❌ Submodules that need attention:

Megatron-Bridge: ❌ Commits have DIVERGED from a common ancestor
TARGET (main branch): https://github.com/NVIDIA-NeMo/Megatron-Bridge/commits/45dbf4735ec3a2034be66ed4183604642a84af0d/
CURRENT (PR #2039 from zhiyul/bump_up_mbridge): https://github.com/NVIDIA-NeMo/Megatron-Bridge/commits/a2bb70b91b827bd6b085a77442c7cf60cfdb59fe/

Please ensure all submodule commits are fast-forwards of the main branch before merging.

@ZhiyuLi-Nvidia
Contributor Author

/ok to test a8fa0ff

@ZhiyuLi-Nvidia
Contributor Author

/ok to test 56b8508

@github-actions

❌ Submodule Fast-Forward Check Failed

Check based on commit: 56b8508 (PR #2039 from zhiyul/bump_up_mbridge)

✅ Submodules that are properly updated:

Megatron-LM: ✅ PR branch is ahead of main branch (fast-forward)

❌ Submodules that need attention:

Megatron-Bridge: ❌ Commits have DIVERGED from a common ancestor
TARGET (main branch): https://github.com/NVIDIA-NeMo/Megatron-Bridge/commits/45dbf4735ec3a2034be66ed4183604642a84af0d/
CURRENT (PR #2039 from zhiyul/bump_up_mbridge): https://github.com/NVIDIA-NeMo/Megatron-Bridge/commits/a2bb70b91b827bd6b085a77442c7cf60cfdb59fe/

Please ensure all submodule commits are fast-forwards of the main branch before merging.

@ZhiyuLi-Nvidia
Contributor Author

/ok to test ab73597

@github-actions

❌ Submodule Fast-Forward Check Failed

Check based on commit: ab73597 (PR #2039 from zhiyul/bump_up_mbridge)

✅ Submodules that are properly updated:

Megatron-LM: ✅ PR branch is ahead of main branch (fast-forward)

❌ Submodules that need attention:

Megatron-Bridge: ❌ Commits have DIVERGED from a common ancestor
TARGET (main branch): https://github.com/NVIDIA-NeMo/Megatron-Bridge/commits/45dbf4735ec3a2034be66ed4183604642a84af0d/
CURRENT (PR #2039 from zhiyul/bump_up_mbridge): https://github.com/NVIDIA-NeMo/Megatron-Bridge/commits/a2bb70b91b827bd6b085a77442c7cf60cfdb59fe/

Please ensure all submodule commits are fast-forwards of the main branch before merging.

ZhiyuLi-Nvidia and others added 9 commits March 31, 2026 08:27
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
…t-llama3.1-8b-1n8g-fsdp2tp4-dynamicbatch

Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
@yuki-97 yuki-97 force-pushed the zhiyul/bump_up_mbridge branch from 32220d2 to 77a4b2c Compare March 31, 2026 00:27
@github-actions

❌ Submodule Fast-Forward Check Failed

Check based on commit: 32220d2 (PR #2039 from zhiyul/bump_up_mbridge)

✅ Submodules that are properly updated:

Automodel: ✅ PR branch is ahead of main branch (fast-forward)
Megatron-LM: ✅ PR branch is ahead of main branch (fast-forward)

❌ Submodules that need attention:

Megatron-Bridge: ❌ Commits have DIVERGED from a common ancestor
TARGET (main branch): https://github.com/NVIDIA-NeMo/Megatron-Bridge/commits/45dbf4735ec3a2034be66ed4183604642a84af0d/
CURRENT (PR #2039 from zhiyul/bump_up_mbridge): https://github.com/NVIDIA-NeMo/Megatron-Bridge/commits/a2bb70b91b827bd6b085a77442c7cf60cfdb59fe/

Please ensure all submodule commits are fast-forwards of the main branch before merging.

@github-actions

❌ Submodule Fast-Forward Check Failed

Check based on commit: 77a4b2c (PR #2039 from zhiyul/bump_up_mbridge)

✅ Submodules that are properly updated:

Automodel: ✅ PR branch is ahead of main branch (fast-forward)
Megatron-LM: ✅ PR branch is ahead of main branch (fast-forward)

❌ Submodules that need attention:

Megatron-Bridge: ❌ Commits have DIVERGED from a common ancestor
TARGET (main branch): https://github.com/NVIDIA-NeMo/Megatron-Bridge/commits/45dbf4735ec3a2034be66ed4183604642a84af0d/
CURRENT (PR #2039 from zhiyul/bump_up_mbridge): https://github.com/NVIDIA-NeMo/Megatron-Bridge/commits/a2bb70b91b827bd6b085a77442c7cf60cfdb59fe/

Please ensure all submodule commits are fast-forwards of the main branch before merging.

@yuki-97
Contributor

yuki-97 commented Mar 31, 2026

/ok to test 77a4b2c

Signed-off-by: Yuki Huang <yukih@nvidia.com>
@yuki-97 yuki-97 added CI:docs Run doctest and removed CI:L1 Run doctests, unit tests, and functional tests Run CICD labels Mar 31, 2026
@yuki-97
Contributor

yuki-97 commented Mar 31, 2026

/ok to test b8f5647

@github-actions

❌ Submodule Fast-Forward Check Failed

Check based on commit: b8f5647 (PR #2039 from zhiyul/bump_up_mbridge)

✅ Submodules that are properly updated:

Automodel: ✅ PR branch is ahead of main branch (fast-forward)
Megatron-LM: ✅ PR branch is ahead of main branch (fast-forward)

❌ Submodules that need attention:

Megatron-Bridge: ❌ Commits have DIVERGED from a common ancestor
TARGET (main branch): https://github.com/NVIDIA-NeMo/Megatron-Bridge/commits/45dbf4735ec3a2034be66ed4183604642a84af0d/
CURRENT (PR #2039 from zhiyul/bump_up_mbridge): https://github.com/NVIDIA-NeMo/Megatron-Bridge/commits/a2bb70b91b827bd6b085a77442c7cf60cfdb59fe/

Please ensure all submodule commits are fast-forwards of the main branch before merging.

Contributor

@yuki-97 yuki-97 left a comment

All nightly tests passed, so this can be merged.

CI:L1 passed on 77a4b2c in https://github.com/NVIDIA-NeMo/RL/actions/runs/23774491454/job/69279367064.

Only script changes were made in b8f5647 after that, so it was run with CI:docs.

@terrykong terrykong merged commit bc8aa39 into main Mar 31, 2026
25 of 26 checks passed
@terrykong terrykong deleted the zhiyul/bump_up_mbridge branch March 31, 2026 05:36
isomap pushed a commit to isomap/RL that referenced this pull request Mar 31, 2026
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>

Labels

CI:docs Run doctest

4 participants