
chore: nightly sync main into dev (08_05_2026)#4708

Closed
svcnvidia-nemo-ci wants to merge 118 commits into dev from main2dev/08_05_2026

Conversation

@svcnvidia-nemo-ci

Summary

Automated nightly sync of main into dev for 08_05_2026.

  • Commits synced from main: 116
  • Files changed: 482
  • Total lines: +68,337 / -87,809
  • Python lines: +35,086 / -10,517 across 266 files

Files where main's version was taken (intent-driven resolutions)

The merge used -X theirs to favor main on conflicts. The following are
notable conflict resolutions:

Files deleted from main (kept deleted)

  • megatron/legacy/model/__init__.py — removed on main in #4207 ("remove legacy transformer and modules")
  • megatron/legacy/model/transformer.py — same removal commit
  • tools/checkpoint/loader_legacy.py — same removal commit
  • tools/checkpoint/loader_llama_mistral.py — same removal commit
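The `-X theirs` behavior described above can be reproduced in a scratch repo. This is an illustrative sketch only; branch layout, file names, and commit messages are invented and do not come from the actual repository:

```shell
#!/bin/sh
# Scratch-repo sketch of how `git merge -X theirs` resolves content
# conflicts in favor of the branch being merged in (standing in for main).
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q -b dev
git config user.email "sync@example.com"
git config user.name "sync-bot"
echo "shared" > file.txt
git add file.txt && git commit -qm "base"
git checkout -qb main
echo "main version" > file.txt
git commit -qam "change on main"
git checkout -q dev
echo "dev version" > file.txt
git commit -qam "change on dev"
# Without -X theirs this merge stops on a content conflict; with it,
# main's side of each conflicting hunk is taken automatically.
git merge -q -X theirs main -m "chore: sync main into dev"
cat file.txt
```

After the merge, `file.txt` holds main's version; dev's conflicting change is silently dropped, which is exactly why the restore step below exists for protected files.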


Files restored from dev (NOT taken from main)

Per the sync skill, these are kept at dev's version:

  • pyproject.toml — preserves dev-only dependencies (fast-hadamard-transform, dev's pinned nvidia-resiliency-ext rev)
  • uv.lock — kept consistent with dev's pyproject.toml
  • docker/Dockerfile.ci.dev — kept consistent with dev's dependency-management triple
  • .github/CODEOWNERS — never touched by sync bot

No new git sources from main needed to be added (dev already has all required packages).
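The restore step can be sketched in a scratch repo as well (file contents and names are fake): after `-X theirs` takes main's version of a protected file, dev's pre-merge copy is checked out from `ORIG_HEAD` and folded back into the merge commit.

```shell
#!/bin/sh
# Sketch of "keep dev's version": merge with -X theirs, then restore the
# protected file from ORIG_HEAD (dev's tip before the merge) and amend.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q -b dev
git config user.email "sync@example.com"
git config user.name "sync-bot"
echo "base" > pyproject.toml
git add . && git commit -qm "base"
git checkout -qb main
echo "main-pins" > pyproject.toml
git commit -qam "main pins"
git checkout -q dev
echo "dev-pins" > pyproject.toml
git commit -qam "dev pins"
git merge -q -X theirs main -m "chore: sync main into dev"
# -X theirs took main's pyproject.toml; put dev's pinned version back
# and fold the restore into the merge commit itself.
git checkout ORIG_HEAD -- pyproject.toml
git commit -q --amend --no-edit
cat pyproject.toml
```

Amending a merge commit keeps both parents, so history still records the sync while the protected file stays at dev's content.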

API Mismatches Detected and Fixed

  • HybridCPDataLoaderWrapper: main's import of this class in training.py
    (line 203) was merged in, but the class no longer exists in dev's
    data_schedule.py (renamed/replaced by the wrap_data_iterator API in
    dev's "[Dev] feat: Dynamic CP (part 2)" rewrite). Removed the dead
    import line. No call sites to fix — dev's wrap_data_iterator is already
    used throughout the merged training.py.
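This dead-import class of mismatch is what ruff's unused-import rule (F401) catches in practice. A crude standalone sketch of the same check, using a throwaway file whose contents and names are hypothetical stand-ins for the case above:

```shell
#!/bin/sh
# Crude unused-import check: flag imported names that never appear
# outside their own import line. File content is illustrative only.
set -e
tmp=$(mktemp -d)
cat > "$tmp/training.py" <<'EOF'
from data_schedule import HybridCPDataLoaderWrapper  # stale after the rename
from data_schedule import wrap_data_iterator

data_iterator = wrap_data_iterator(range(3))
EOF
dead=""
for name in HybridCPDataLoaderWrapper wrap_data_iterator; do
    # drop import lines, then look for any remaining use of the name
    if ! grep -v '^from \|^import ' "$tmp/training.py" | grep -q "$name"; then
        dead="$dead $name"
    fi
done
echo "dead imports:$dead"
```

Here only `HybridCPDataLoaderWrapper` is flagged; `wrap_data_iterator` has a call site and passes.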

Pre-push Invariants

  • .github/CODEOWNERS unchanged from origin/dev
  • pyproject.toml, uv.lock, docker/Dockerfile.ci.dev unchanged from origin/dev
  • ✅ All 250 changed Python files pass black==24.4.2 --skip-magic-trailing-comma --skip-string-normalization --check
  • ✅ All 250 changed Python files pass isort==5.13.2 --check
  • ✅ 90 changed megatron/core/ files: pylint==3.2.6 rated 10.00/10
  • ✅ All changed files: ruff==0.9.10 check passed
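The file lists those checks run over can be derived with `git diff --name-only` against the merge base. A scratch-repo sketch (paths invented): every changed `*.py` file feeds black/isort/ruff, and the `megatron/core/` subset feeds pylint.

```shell
#!/bin/sh
# Deriving the two file lists behind the pre-push checks: all changed
# *.py files, plus the megatron/core/ subset. Scratch repo, fake paths;
# `dev...HEAD` diffs the sync branch against its merge base with dev.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q -b dev
git config user.email "sync@example.com"
git config user.name "sync-bot"
mkdir -p megatron/core tools
touch megatron/core/mlp.py tools/convert.py README.md
git add . && git commit -qm "base"
git checkout -qb sync
echo "# edited" >> megatron/core/mlp.py
echo "# edited" >> tools/convert.py
echo "edited" >> README.md
git commit -qam "sync changes"
CHANGED_PY=$(git diff --name-only dev...HEAD -- '*.py')
CHANGED_CORE=$(printf '%s\n' "$CHANGED_PY" | grep '^megatron/core/')
printf 'py:   %s\n' $CHANGED_PY
printf 'core: %s\n' $CHANGED_CORE
```

The real run would then pass `$CHANGED_PY` to the pinned black, isort, and ruff versions listed above, and `$CHANGED_CORE` to pylint; note README.md never enters either list.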

Remerge-diff (conflict resolutions only)

76 files had non-trivial merge conflicts resolved by -X theirs. Below is
the file-level stat from git show --remerge-diff --stat HEAD. The full
diff is omitted from this PR body for brevity (it is ~12,000 lines); use
git show --remerge-diff HEAD locally to inspect.

Conflict resolution stats (76 files)
 .github/workflows/cicd-main.yml                    |    5 -
 .github/workflows/multi-approval-bot.yml           |   74 -
 docker/Dockerfile.ci.dev                           |    4 -
 docs/conf.py                                       |   18 +-
 .../detxoify_lm/generate_samples_gpt.py            |   76 +-
 .../gpt/gpt_dynamic_inference_with_coordinator.py  |    6 +-
 examples/mimo/train.py                             |    6 +-
 examples/multimodal/layer_specs.py                 |    2 +-
 examples/multimodal/model.py                       |   85 +-
 examples/post_training/modelopt/convert_model.py   |   19 +-
 examples/post_training/modelopt/export.py          |    5 +-
 examples/post_training/modelopt/finetune.py        |   67 +-
 examples/post_training/modelopt/generate.py        |   27 +-
 examples/post_training/modelopt/mmlu.py            |   45 +-
 .../modelopt/offline_feature_extract.py            |   56 +-
 examples/post_training/modelopt/prune.py           |   13 +-
 examples/post_training/modelopt/quantize.py        |   55 +-
 examples/post_training/modelopt/validate.py        |   32 +-
 gpt_builders.py                                    |   77 +-
 hybrid_builders.py                                 |    4 +-
 megatron/core/datasets/readme.md                   |   64 -
 megatron/core/distributed/param_and_grad_buffer.py |    4 -
 megatron/core/transformer/mlp.py                   |    4 -
 megatron/core/transformer/moe/fused_a2a.py         |   13 -
 megatron/core/transformer/moe/moe_layer.py         |    8 -
 megatron/core/transformer/moe/token_dispatcher.py  |    4 -
 megatron/core/transformer/transformer_config.py    |   27 -
 megatron/core/transformer/transformer_layer.py     |   13 -
 megatron/elastification/arguments.py               |    6 +-
 megatron/elastification/flextron_utils.py          |   11 +-
 megatron/elastification/pretrain_hybrid_flex.py    |  136 +-
 .../elastification/router/hybrid_flex_router.py    |    7 +-
 megatron/legacy/model/__init__.py                  |    6 -
 megatron/legacy/model/transformer.py               | 1645 --------------------
 megatron/post_training/arguments.py                |    7 +-
 megatron/post_training/model_builder.py            |   55 +-
 megatron/training/activation_logging.py            |   37 +-
 megatron/training/argument_utils.py                |   90 +-
 megatron/training/arguments.py                     |  589 +------
 megatron/training/async_utils.py                   |    4 +-
 megatron/training/checkpointing.py                 |   33 +-
 megatron/training/config/__init__.py               |   27 +-
 megatron/training/config/container.py              |   40 +-
 megatron/training/config/instantiate_utils.py      |   46 +-
 megatron/training/config/training_config.py        |   24 +-
 megatron/training/config/utils.py                  |   13 +-
 megatron/training/config/yaml_utils.py             |   10 +-
 megatron/training/datasets/data_samplers.py        |    9 +-
 megatron/training/gpu_sniff_test.py                |   81 +-
 megatron/training/training.py                      |  153 +-
 megatron/training/utils.py                         |    4 -
 model_provider.py                                  |   12 +-
 pretrain_bert.py                                   |   32 +-
 pretrain_gpt.py                                    |   42 +-
 pretrain_hybrid.py                                 |   65 +-
 pretrain_mamba.py                                  |  363 -----
 pretrain_t5.py                                     |    2 +-
 pretrain_vlm.py                                    |   10 +-
 pyproject.toml                                     |   23 +-
 .../golden_values_dev_dgx_h100.json                |    8 -
 .../golden_values_dev_dgx_h100.json                |    8 -
 .../unit_tests/fusions/test_mla_yarn_rope_apply.py |   10 -
 tests/unit_tests/models/test_hybrid_moe_model.py   |   16 -
 tools/checkpoint/checkpoint_inspector.py           |    9 +-
 tools/checkpoint/convert.py                        |   62 +-
 tools/checkpoint/dist_checkpoint_io.py             |   45 +-
 tools/checkpoint/gpt_hybrid_conversion.py          |  171 +-
 tools/checkpoint/loader_legacy.py                  |  416 -----
 tools/checkpoint/loader_llama_mistral.py           |  751 ---------
 tools/checkpoint/loader_mixtral_hf.py              |   12 +-
 tools/checkpoint/remap_gpt_dsa_to_mamba.py         |    5 -
 tools/prepare_cache.py                             |    9 +-
 tools/preprocess_data.py                           |  217 ++-
 tools/preprocess_mmdata.py                         |  160 +-
 train_rl.py                                        |   20 +-
 uv.lock                                            | 1182 ++------------

minitu and others added 30 commits April 22, 2026 18:02
@copy-pr-bot

copy-pr-bot Bot commented May 8, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@svcnvidia-nemo-ci svcnvidia-nemo-ci added Run functional tests Run MBridge tests Attach this for testing this PR against MBridge main labels May 8, 2026
@svcnvidia-nemo-ci
Author

/ok to test d268c6f

@svcnvidia-nemo-ci
Author

/ok to test c776aad

@svcnvidia-nemo-ci
Author

/ok to test 9d98047

@svcnvidia-nemo-ci
Author

/ok to test dcf9563

@svcnvidia-nemo-ci
Author

/ok to test dcf9563

@svcnvidia-nemo-ci
Author

/ok to test 81ec280

@svcnvidia-nemo-ci
Author

/ok to test 6a17219

@svcnvidia-nemo-ci
Author

Superseded by today's nightly sync.

