chore: nightly sync main into dev (09_05_2026) #4713

Closed
svcnvidia-nemo-ci wants to merge 120 commits into dev from main2dev/09_05_2026

@svcnvidia-nemo-ci

Summary

Nightly sync from main into dev.

  • Commits synced from main: 118
  • Total files changed: 492
  • Python lines: +36212 / -10810 across 269 files

Conflict resolution

Files taken from main (override per skill guidance)

These files have known semantic divergence where dev's versions referenced args/APIs that main removed or renamed:

  • megatron/training/training.py
  • megatron/training/initialize.py
  • megatron/training/utils.py
  • megatron/training/datasets/data_samplers.py
  • megatron/core/optimizer/layer_wise_optimizer.py

Files restored from main

  • megatron/core/pipeline_parallel/hybrid_cp_schedule.py — provides BalancedCPScheduler required by main's HybridCPDataLoaderWrapper (which is now appended to dev's data_schedule.py).

Files kept from dev

  • Dependency triple: pyproject.toml, uv.lock, docker/Dockerfile.ci.dev (dev-only deps including fast-hadamard-transform).
  • .github/CODEOWNERS (per skill: never modify).
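
The files kept from dev are the same ones covered by the pre-push equality checks listed under "Validations performed". A minimal sketch of how such a check could work is shown here. It assumes each branch is represented as a plain dict mapping path to file bytes (for example, collected with `git show <ref>:<path>`); `PROTECTED_PATHS` and `changed_protected_files` are hypothetical names for illustration, not the helpers the sync job actually uses.

```python
import hashlib

# Paths the PR description says must stay identical to dev.
PROTECTED_PATHS = (
    "pyproject.toml",
    "uv.lock",
    "docker/Dockerfile.ci.dev",
    ".github/CODEOWNERS",
)


def digest(data: bytes) -> str:
    """Return a stable content hash for one file."""
    return hashlib.sha256(data).hexdigest()


def changed_protected_files(dev_tree, merged_tree, paths=PROTECTED_PATHS):
    """Return the protected paths whose content differs between the trees.

    Both trees are dicts of path -> bytes (an assumption of this sketch).
    A path missing from either tree is reported as a difference.
    """
    mismatches = []
    for path in paths:
        dev_blob = dev_tree.get(path)
        merged_blob = merged_tree.get(path)
        if dev_blob is None or merged_blob is None:
            mismatches.append(path)
        elif digest(dev_blob) != digest(merged_blob):
            mismatches.append(path)
    return mismatches
```

An empty return value corresponds to a PASS; any listed path would indicate that the merge touched a file that was supposed to be kept from dev.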

Modify/delete conflicts resolved

Main wins (file deleted in main, modified in dev — -X theirs strategy):

  • megatron/legacy/model/__init__.py
  • megatron/legacy/model/transformer.py
  • tools/checkpoint/loader_legacy.py
  • tools/checkpoint/loader_llama_mistral.py

Dev wins (file deleted in dev, modified in main — kept dev's deletion):

  • .github/workflows/multi-approval-bot.yml
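
In both groups above, the branch that deleted the file wins, so every listed path ends up removed from the merged tree. The sketch below states that rule explicitly. `resolve_modify_delete` is a hypothetical helper written for illustration; it does not describe how the sync tooling is implemented.

```python
VALID_SIDES = ("main", "dev")


def resolve_modify_delete(conflicts):
    """Resolve modify/delete conflicts in favor of the deleting branch.

    `conflicts` is an iterable of (path, deleted_in) pairs, where
    `deleted_in` names the branch that deleted the file. The result maps
    each path to (winner, action); the action is always "delete" because
    the deleting side wins in every case listed in the description.
    """
    resolved = {}
    for path, deleted_in in conflicts:
        if deleted_in not in VALID_SIDES:
            raise ValueError(f"unknown branch for {path}: {deleted_in!r}")
        resolved[path] = (deleted_in, "delete")
    return resolved
```

For example, `resolve_modify_delete([("tools/checkpoint/loader_legacy.py", "main")])` reports that main wins and the file is deleted.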

Manual surgical edit

  • megatron/core/datasets/data_schedule.py — kept dev's classes (BasePackingScheduler, DpBalancedScheduler, DefaultDynamicCPScheduler, wrap_data_iterator, get_batch_on_this_rank_for_sequence_packing) and appended main's HybridCPDataLoaderWrapper class with the required imports (Any, List, BalancedCPScheduler).
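
One way to sanity-check a manual edit like this is to confirm that the merged module still defines every expected name. The sketch below does that with the standard `ast` module, without importing the file. It assumes all six names are top-level definitions (the two snake_case names are treated as functions, which is a guess from their naming); `EXPECTED_NAMES` and `missing_definitions` are hypothetical names.

```python
import ast

# Names the description says data_schedule.py should contain after the edit.
EXPECTED_NAMES = frozenset({
    "BasePackingScheduler",
    "DpBalancedScheduler",
    "DefaultDynamicCPScheduler",
    "wrap_data_iterator",
    "get_batch_on_this_rank_for_sequence_packing",
    "HybridCPDataLoaderWrapper",
})


def top_level_definitions(source):
    """Return the names of classes and functions defined at module level."""
    tree = ast.parse(source)
    kinds = (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    return {node.name for node in tree.body if isinstance(node, kinds)}


def missing_definitions(source, expected=EXPECTED_NAMES):
    """Return the expected names that the source does not define, sorted."""
    return sorted(set(expected) - top_level_definitions(source))
```

Passing the text of the merged file to `missing_definitions` should return an empty list if both dev's definitions and main's appended class survived the edit.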

Validations performed

  • Pre-push CODEOWNERS-equality check: PASS
  • Pre-push dependency-triple-equality check: PASS
  • Black formatter: clean (with black==24.4.2, --config pyproject.toml)
  • isort on megatron/core/* changes: clean (with isort==5.13.2)
  • pylint on megatron/core/* changes: 10.00/10
  • Python AST syntax validation: 520 files in megatron/, 336 files in tests/ — no syntax errors
  • API mismatch detection on overridden files: all imports and call signatures verified consistent
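
The AST syntax validation step can be approximated with the standard library alone. The following is a minimal sketch, not the script the sync job runs: `syntax_errors` is a hypothetical helper that parses every `.py` file under a directory and collects the failures. It assumes the files are UTF-8 encoded.

```python
import ast
from pathlib import Path


def syntax_errors(root):
    """Parse every .py file under `root` and return the ones that fail.

    Each failure is a (path, message) tuple. Files are only parsed,
    never imported or executed.
    """
    failures = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            source = path.read_text(encoding="utf-8")
            ast.parse(source, filename=str(path))
        except SyntaxError as exc:
            failures.append((str(path), f"line {exc.lineno}: {exc.msg}"))
    return failures
```

Running `syntax_errors("megatron")` and `syntax_errors("tests")` from the repository root and getting two empty lists would correspond to the "no syntax errors" result reported above.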

Test plan

  • Nemo_CICD_Test (aggregate gate) passes
  • All non-exempt unit tests pass on pull-request/<PR>
  • Integration tests pass (functional + MBridge labels are added on PR creation)

minitu and others added 30 commits April 22, 2026 18:02
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: john2 <john2@jrlogin01.jureca>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Co-authored-by: root <root@nvl72098-T17.cm.cluster>
Co-authored-by: William Dykas <wdykas@oci-hsg-cs-001-vscode-03.cm.cluster>
Co-authored-by: root <root@nvl72160-T13.cm.cluster>
…classmethod (#3812)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
#4403)

Co-authored-by: Antoni-Joan Solergibert <asolergibert@nvidia.com>
Co-authored-by: Philip Petrakian <ppetrakian@nvidia.com>
Signed-off-by: Maanu Grover <maanug@nvidia.com>
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
Signed-off-by: dimapihtar <dpykhtar@nvidia.com>
Signed-off-by: dimapihtar <dpykhtar@nvidia.com>
Co-authored-by: Siddharth Singh <sidsingh@nvidia.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ss curve gaps for latent MoE models (#4433)

Signed-off-by: root <jiemingz@nvidia.com>
…4158)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…4422)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: rprenger <rprenger@nvidia.com>
Signed-off-by: qiyuw <qiyuw@nvidia.com>
Co-authored-by: Antoni-Joan Solergibert <asolergibert@nvidia.com>
Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>
chtruong814 and others added 8 commits May 8, 2026 12:15
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
# Conflicts:
#	.github/workflows/multi-approval-bot.yml
#	megatron/legacy/model/__init__.py
#	megatron/legacy/model/transformer.py
#	tools/checkpoint/loader_legacy.py
#	tools/checkpoint/loader_llama_mistral.py
copy-pr-bot Bot commented May 9, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

svcnvidia-nemo-ci added the labels "Run functional tests" and "Run MBridge tests" (Attach this for testing this PR against MBridge main) on May 9, 2026
@svcnvidia-nemo-ci
Author

/ok to test d424efd

@svcnvidia-nemo-ci
Author

/ok to test d3ef2d7

@svcnvidia-nemo-ci
Author

/ok to test 36c4849

@svcnvidia-nemo-ci
Author

/ok to test 16e25df

@svcnvidia-nemo-ci
Author

/ok to test 1d16a3e

@svcnvidia-nemo-ci
Author

/ok to test 5d5a539

@svcnvidia-nemo-ci
Author

Superseded by today's nightly sync.
