
Fix unified_export_megatron for transformers 5.6 #1335

Merged
kevalmorabia97 merged 1 commit into main from kmorabia/tf-5.6-megatron-test-fix
Apr 23, 2026

Conversation


@kevalmorabia97 kevalmorabia97 commented Apr 23, 2026

What does this PR do?

Type of change: Bug fix

Broaden the exception handler around AutoTokenizer.from_pretrained in GPTModelExporter.save_pretrained to also catch ValueError and ImportError.

In transformers 4.x, attempting to load a tokenizer from a directory that contains only config.json (no tokenizer.json / tokenizer.model / tokenizer_config.json) raised OSError, which was already handled. In transformers 5.x the resolution path now reaches PreTrainedTokenizerFast.__init__ and raises a ValueError ("Couldn't instantiate the backend tokenizer from one of: ...") when none of the three backend sources are available. This caused export_mcore_gpt_to_hf to hard-fail for checkpoint directories that don't carry tokenizer files — including tests/gpu_megatron/torch/export/test_unified_export_megatron.py, which writes only a minimal config.json.

The broadened except (OSError, TypeError, ValueError, ImportError) mirrors the pattern already used just below for AutoProcessor.from_pretrained and keeps tokenizer export best-effort, as originally intended.
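
For illustration, the resulting pattern looks roughly like this (a minimal sketch; the helper name is hypothetical and the real logic lives inline in GPTModelExporter.save_pretrained):

from transformers import AutoTokenizer

def _export_tokenizer_best_effort(source_dir, export_dir):
    # Hypothetical helper sketching the broadened handler.
    try:
        tokenizer = AutoTokenizer.from_pretrained(source_dir)
        tokenizer.save_pretrained(export_dir)
    except (OSError, TypeError, ValueError, ImportError):
        # transformers 4.x raises OSError for a config-only directory;
        # transformers 5.x raises ValueError from PreTrainedTokenizerFast.
        # Either way, skip the tokenizer and export the weights anyway.
        pass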

Usage

No API change. Existing call sites continue to work:

import torch

from modelopt.torch.export import export_mcore_gpt_to_hf

export_mcore_gpt_to_hf(
    model,
    pretrained_model_name_or_path,  # may or may not contain tokenizer files
    dtype=torch.bfloat16,
    export_dir=export_dir,
)
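
With the fix, exporting from a directory that lacks tokenizer files simply omits the tokenizer artifacts from export_dir instead of raising; directories that do contain them are exported as before.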

Testing

  • Reproduced the failure on transformers==5.6 with:
    pytest tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_unified_export_megatron[llama-LlamaForCausalLM-medusa-None-None]
    
    which failed with ValueError: Couldn't instantiate the backend tokenizer... raised from unified_export_megatron.py:299 (a standalone repro sketch follows this list).
  • After the fix, the same parametrization passes, and the other llama / nemotron / eagle / medusa parametrizations in the same test file remain green.
  • No behavioral change on transformers 4.x: the OSError path is still caught.
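
A standalone repro of the failure mode (not part of the test suite; the minimal llama config is an assumption that mirrors what the test writes):

import json
import tempfile

from transformers import AutoTokenizer

with tempfile.TemporaryDirectory() as d:
    with open(f"{d}/config.json", "w") as f:
        json.dump({"model_type": "llama"}, f)  # config only, no tokenizer files
    try:
        AutoTokenizer.from_pretrained(d)
    except (OSError, ValueError) as e:
        print(f"{type(e).__name__}: {e}")  # OSError on 4.x, ValueError on 5.x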

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: N/A
  • Did you update Changelog?: N/A

Additional Information

Triggered by the upgrade to transformers 5.6 (used in nvcr.io/nvidia/nemo:26.04, which is the container for the gpu_megatron nox session). The error message from transformers, "You need to have sentencepiece or tiktoken installed...", is a misleading generic fallback: sentencepiece and tiktoken are already pulled in via the [hf] extras, and the real cause is the missing tokenizer files in the export source directory.

Summary by CodeRabbit

  • Refactor

    • Improved error handling in the export process to gracefully manage additional exception types.
  • Tests

    • Enhanced test validation for Megatron export by using actual tokenizer artifacts, ensuring model vocabulary size matches test tokenizer configuration.

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 requested a review from a team as a code owner April 23, 2026 17:43

coderabbitai Bot commented Apr 23, 2026

📝 Walkthrough

The pull request refactors exception handling in tokenizer saving to catch OSError, TypeError, ValueError, and ImportError together, and updates the Megatron export test to use a real tiny tokenizer instead of hardcoded vocabulary size values.

Changes

  • Exception handling refactoring (modelopt/torch/export/unified_export_megatron.py): Consolidated multiple exception handlers into a single catch block that silently ignores OSError, TypeError, ValueError, and ImportError during tokenizer saving.
  • Test tokenizer provisioning (tests/gpu_megatron/torch/export/test_unified_export_megatron.py): Updated the test to provision a tiny Hugging Face tokenizer via get_tiny_tokenizer() and save_pretrained(), replacing the hardcoded vocab_size = 64 with the dynamic vocab_size = tokenizer.vocab_size.
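
A sketch of what that test change amounts to (get_tiny_tokenizer() is the test helper named above; the surrounding variable names are assumptions):

tokenizer = get_tiny_tokenizer()           # tiny real HF tokenizer fixture
tokenizer.save_pretrained(pretrained_dir)  # writes tokenizer files next to config.json
vocab_size = tokenizer.vocab_size          # was: hardcoded vocab_size = 64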

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (5 passed)

  • Description Check: Passed (skipped because CodeRabbit's high-level summary is enabled).
  • Title Check: Passed. The title clearly describes the main change: fixing unified_export_megatron for transformers 5.6 compatibility, which aligns with the code changes addressing tokenizer exception handling and test setup.
  • Linked Issues Check: Passed (skipped because no linked issues were found for this pull request).
  • Out of Scope Changes Check: Passed (skipped because no linked issues were found for this pull request).
  • Security Anti-Patterns: Passed. No unsafe torch.load(), numpy.load(), hardcoded trust_remote_code=True, eval()/exec() calls, nosec comments, or non-permissive dependencies found.


@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
modelopt/torch/export/unified_export_megatron.py (1)

304-305: Add minimal diagnostics when tokenizer export is skipped.

Line 304 swallows multiple exceptions and immediately passes, which makes export failures hard to debug later. A lightweight log/print here would improve operability without changing behavior.

♻️ Suggested patch
-            except (OSError, TypeError, ValueError, ImportError):
-                pass
+            except (OSError, TypeError, ValueError, ImportError) as e:
+                print(
+                    f"Skipping tokenizer.save_pretrained due to "
+                    f"{type(e).__name__}: {e}"
+                )

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 62f4c0d5-01bd-4c5d-b50d-b2145f31ce29

📥 Commits

Reviewing files that changed from the base of the PR and between 8663678 and aa8cc8a.

📒 Files selected for processing (2)
  • modelopt/torch/export/unified_export_megatron.py
  • tests/gpu_megatron/torch/export/test_unified_export_megatron.py


github-actions Bot commented Apr 23, 2026

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-04-23 19:23 UTC

@kevalmorabia97 kevalmorabia97 enabled auto-merge (squash) April 23, 2026 17:47

codecov Bot commented Apr 23, 2026

Codecov Report

❌ Patch coverage is 0% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 75.74%. Comparing base (e4e3508) to head (aa8cc8a).
⚠️ Report is 4 commits behind head on main.

Files with missing lines:
  • modelopt/torch/export/unified_export_megatron.py: patch coverage 0.00%, 1 line missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1335      +/-   ##
==========================================
+ Coverage   74.67%   75.74%   +1.07%     
==========================================
  Files         468      468              
  Lines       50374    50372       -2     
==========================================
+ Hits        37615    38154     +539     
+ Misses      12759    12218     -541     
Flag        Coverage Δ
examples    41.58% <0.00%> (+5.97%) ⬆️
gpu         58.32% <0.00%> (-0.70%) ⬇️
regression  14.78% <0.00%> (+0.02%) ⬆️
unit        52.52% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@kevalmorabia97 kevalmorabia97 merged commit 0a1ca5d into main Apr 23, 2026
57 of 61 checks passed
@kevalmorabia97 kevalmorabia97 deleted the kmorabia/tf-5.6-megatron-test-fix branch April 23, 2026 19:23