feat: support override HF model name in convert_megatron_to_hf #2202

Merged
yuki-97 merged 1 commit into NVIDIA-NeMo:main from dhineshkumar-r:checkpoint-conversion-hf-model-override on Apr 4, 2026

Conversation

@dhineshkumar-r (Contributor)

What does this PR do?

Enables a way to override hf_model_name when converting checkpoints from Megatron to HF format. This is useful for models like GPT-OSS, whose base checkpoint precision (mxfp4) differs from the export precision (bfloat16) supported in Megatron-Bridge (Ref).
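
A minimal sketch of the idea (illustrative only; the actual wiring inside convert_megatron_to_hf.py may differ, and the config key shown is hypothetical): default to the HF model name recorded in the Megatron checkpoint's config, but let a CLI flag override it.

  import argparse

  import yaml

  parser = argparse.ArgumentParser()
  parser.add_argument("--config", required=True, help="Path to the Megatron checkpoint's config.yaml")
  parser.add_argument(
      "--hf-model-name",
      default=None,
      help="Optional override for the HF model name recorded in the config",
  )
  args = parser.parse_args()

  with open(args.config) as f:
      cfg = yaml.safe_load(f)

  # "policy.model_name" is a placeholder key for illustration; the real config.yaml
  # layout may differ. The override takes precedence when provided.
  hf_model_name = args.hf_model_name or cfg["policy"]["model_name"]
  print(f"Loading HF config/tokenizer from: {hf_model_name}")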

Issues

List issues that this PR closes (syntax):

closes #2124

Usage

If openai/gpt-oss-20b is fine-tuned in bfloat16 precision and the checkpoints are stored in Megatron format, the override argument can be used to pass the supported unsloth/gpt-oss-20b-BF16 HF model name, so that the config corresponding to bf16 precision is used.

uv run --extra mcore python examples/converters/convert_megatron_to_hf.py \
  --config <path_to_gpt_oss_20b_megatron_ckpt>/config.yaml \
  --hf-model-name unsloth/gpt-oss-20b-BF16 \
  --megatron-ckpt-path <path_to_gpt_oss_20b_megatron_ckpt>/policy/weights/iter_xxxxx \
  --hf-ckpt-path <path_to_save_hf_ckpt>
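
After conversion, the exported directory can be loaded like any other HF checkpoint. A quick, illustrative sanity check (not part of the converter; assumes enough host memory to hold the bf16 weights):

  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  ckpt_dir = "<path_to_save_hf_ckpt>"

  tokenizer = AutoTokenizer.from_pretrained(ckpt_dir)
  model = AutoModelForCausalLM.from_pretrained(ckpt_dir, torch_dtype=torch.bfloat16)

  inputs = tokenizer("Hello", return_tensors="pt")
  with torch.no_grad():
      out = model(**inputs)
  print(out.logits.shape)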

Before your PR is "Ready for review"

Pre checks:

  • [Y] Make sure you read and followed Contributor guidelines
  • [N] Did you write any new necessary tests?
  • [N] Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • [Y] Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • Although there are no unit tests, the change was verified to work as expected locally by running the above-mentioned command with a gpt-oss-20b checkpoint fine-tuned in bfloat16 precision. Did not notice any missing-layer warnings (see the key-diff sketch below for this kind of check).
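
A quick way to check for missing layers (hypothetical sanity check, not part of this PR): diff the parameter names recorded in the converted checkpoint's safetensors index against those of the reference HF model. This assumes both checkpoints are sharded safetensors with a model.safetensors.index.json file.

  import json

  from huggingface_hub import hf_hub_download

  converted_index = "<path_to_save_hf_ckpt>/model.safetensors.index.json"
  reference_index = hf_hub_download("unsloth/gpt-oss-20b-BF16", "model.safetensors.index.json")

  with open(converted_index) as f:
      converted_keys = set(json.load(f)["weight_map"])
  with open(reference_index) as f:
      reference_keys = set(json.load(f)["weight_map"])

  print("missing from converted ckpt  :", sorted(reference_keys - converted_keys))
  print("unexpected in converted ckpt :", sorted(converted_keys - reference_keys))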

@dhineshkumar-r requested a review from a team as a code owner on April 3, 2026 06:35
@copy-pr-bot (Bot) commented Apr 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@yuki-97 (Contributor) left a comment

thanks @dhineshkumar-r, changes lgtm.

could you help add what you mentioned in the PR description to docs/design-docs/checkpointing.md? this will help people who run into the same situation.

@dhineshkumar-r force-pushed the checkpoint-conversion-hf-model-override branch 2 times, most recently from 5951c90 to bed5b3b on April 3, 2026 22:38
@dhineshkumar-r requested a review from a team as a code owner on April 3, 2026 22:38
@github-actions (Bot) added the Documentation label (Improvements or additions to documentation) on Apr 3, 2026
@dhineshkumar-r force-pushed the checkpoint-conversion-hf-model-override branch 2 times, most recently from 4d22f5a to d162d13 on April 3, 2026 22:49
@dhineshkumar-r (Contributor, Author) commented Apr 3, 2026

Done. Please take a look.
cc: @yuki-97

@yuki-97 changed the title from "Argument to override HF model name when converting from megatron to H…" to "feat: support override HF model name in convert_megatron_to_hf" on Apr 4, 2026
@yuki-97 previously approved these changes on Apr 4, 2026
@yuki-97 added the CI:Lfast label (Runs a fast test suite and re-use nightly `main` container, but sync dependencies to PRs version) on Apr 4, 2026
@yuki-97 (Contributor) commented Apr 4, 2026

/ok to test d162d13

@yuki-97 enabled auto-merge (squash) on April 4, 2026 11:01
@yuki-97 (Contributor) commented Apr 4, 2026

hi @dhineshkumar-r, there's a lint check failure, could you use pre-commit run --all-files to fix it?

auto-merge was automatically disabled April 4, 2026 14:26

Head branch was pushed to by a user without write access

@dhineshkumar-r force-pushed the checkpoint-conversion-hf-model-override branch from d162d13 to 33bdeec on April 4, 2026 14:26
Argument to override HF model name when converting from megatron to HF format.

Signed-off-by: Dhineshkumar Ramasubbu <dhineshkumar.ramasubbu@gmail.com>
@dhineshkumar-r force-pushed the checkpoint-conversion-hf-model-override branch from 33bdeec to 3ea26b7 on April 4, 2026 14:29
@dhineshkumar-r (Contributor, Author)

Yes, I don't see it failing anymore. Please let me know if anything else is needed.

@yuki-97 (Contributor) left a comment

thanks, let me re-trigger CI.

@yuki-97 (Contributor) commented Apr 4, 2026

/ok to test 3ea26b7

@yuki-97 enabled auto-merge (squash) on April 4, 2026 14:36
@yuki-97 merged commit fe3c4fc into NVIDIA-NeMo:main on Apr 4, 2026
27 checks passed

Labels

  • CI:Lfast: Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version)
  • community-request
  • Documentation: Improvements or additions to documentation

Development

Successfully merging this pull request may close these issues.

Many layers missing when converting GPTOSS Megatron checkpoint to HF format

3 participants