
Fix unified_export_megatron for transformers 5.6 #1335

Merged
kevalmorabia97 merged 1 commit into main from kmorabia/tf-5.6-megatron-test-fix
Apr 23, 2026

Conversation


@kevalmorabia97 kevalmorabia97 commented Apr 23, 2026

What does this PR do?

Type of change: Bug fix

Broaden the exception handler around AutoTokenizer.from_pretrained in GPTModelExporter.save_pretrained to also catch ValueError and ImportError.

In transformers 4.x, attempting to load a tokenizer from a directory that contains only config.json (no tokenizer.json / tokenizer.model / tokenizer_config.json) raised OSError, which was already handled. In transformers 5.x the resolution path now reaches PreTrainedTokenizerFast.__init__ and raises a ValueError ("Couldn't instantiate the backend tokenizer from one of: ...") when none of the three backend sources are available. This caused export_mcore_gpt_to_hf to hard-fail for checkpoint directories that don't carry tokenizer files — including tests/gpu_megatron/torch/export/test_unified_export_megatron.py, which writes only a minimal config.json.

The broadened except (OSError, TypeError, ValueError, ImportError) mirrors the pattern already used just below for AutoProcessor.from_pretrained and keeps tokenizer export best-effort, as originally intended.
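
For illustration, the resulting pattern looks roughly like this (a minimal sketch; the helper name is hypothetical and the real logic lives inline in GPTModelExporter.save_pretrained):

from transformers import AutoTokenizer

def _export_tokenizer_best_effort(source_dir, export_dir):
    # Hypothetical helper sketching the broadened handler.
    try:
        tokenizer = AutoTokenizer.from_pretrained(source_dir)
        tokenizer.save_pretrained(export_dir)
    except (OSError, TypeError, ValueError, ImportError):
        # transformers 4.x raises OSError for a config-only directory;
        # transformers 5.x raises ValueError from PreTrainedTokenizerFast.
        # Either way, skip the tokenizer and export the weights anyway.
        pass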

Usage

No API change. Existing call sites continue to work:

import torch

from modelopt.torch.export import export_mcore_gpt_to_hf

export_mcore_gpt_to_hf(
    model,
    pretrained_model_name_or_path,  # may or may not contain tokenizer files
    dtype=torch.bfloat16,
    export_dir=export_dir,
)
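
With the fix, exporting from a directory that lacks tokenizer files simply omits the tokenizer artifacts from export_dir instead of raising; directories that do contain them are exported as before.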

Testing

  • Reproduced the failure on transformers==5.6 with:
    pytest tests/gpu_megatron/torch/export/test_unified_export_megatron.py::test_unified_export_megatron[llama-LlamaForCausalLM-medusa-None-None]
    
    which failed with ValueError: Couldn't instantiate the backend tokenizer... raised from unified_export_megatron.py:299 (a standalone repro sketch follows this list).
  • After the fix, the same parametrization passes, and the other llama / nemotron / eagle / medusa parametrizations in the same test file remain green.
  • No behavioral change on transformers 4.x: the OSError path is still caught.
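
A standalone repro of the failure mode (not part of the test suite; the minimal llama config is an assumption that mirrors what the test writes):

import json
import tempfile

from transformers import AutoTokenizer

with tempfile.TemporaryDirectory() as d:
    with open(f"{d}/config.json", "w") as f:
        json.dump({"model_type": "llama"}, f)  # config only, no tokenizer files
    try:
        AutoTokenizer.from_pretrained(d)
    except (OSError, ValueError) as e:
        print(f"{type(e).__name__}: {e}")  # OSError on 4.x, ValueError on 5.x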

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: N/A
  • Did you update Changelog?: N/A

Additional Information

Triggered by the upgrade to transformers 5.6 (used in nvcr.io/nvidia/nemo:26.04, which is the container for the gpu_megatron nox session). The error message from transformers, "You need to have sentencepiece or tiktoken installed...", is a misleading generic fallback: sentencepiece and tiktoken are already pulled in via the [hf] extras, and the real cause is the missing tokenizer files in the export source directory.

Summary by CodeRabbit

  • Refactor

    • Improved error handling in the export process to gracefully manage additional exception types.
  • Tests

    • Enhanced test validation for Megatron export by using actual tokenizer artifacts, ensuring model vocabulary size matches test tokenizer configuration.

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 requested a review from a team as a code owner April 23, 2026 17:43

coderabbitai Bot commented Apr 23, 2026

📝 Walkthrough

The pull request refactors exception handling in tokenizer saving to catch OSError, TypeError, ValueError, and ImportError together, and updates the Megatron export test to use a real tiny tokenizer instead of hardcoded vocabulary size values.

Changes

  • Exception handling refactoring (modelopt/torch/export/unified_export_megatron.py): Consolidated multiple exception handlers into a single catch block that silently ignores OSError, TypeError, ValueError, and ImportError during tokenizer saving.
  • Test tokenizer provisioning (tests/gpu_megatron/torch/export/test_unified_export_megatron.py): Updated the test to provision a tiny Hugging Face tokenizer via get_tiny_tokenizer() and save_pretrained(), replacing the hardcoded vocab_size = 64 with the dynamic vocab_size = tokenizer.vocab_size.
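
A sketch of what that test change amounts to (get_tiny_tokenizer() is the test helper named above; the surrounding variable names are assumptions):

tokenizer = get_tiny_tokenizer()           # tiny real HF tokenizer fixture
tokenizer.save_pretrained(pretrained_dir)  # writes tokenizer files next to config.json
vocab_size = tokenizer.vocab_size          # was: hardcoded vocab_size = 64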

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (5 passed)

  • Description Check: Passed (skipped because CodeRabbit's high-level summary is enabled).
  • Title Check: Passed. The title clearly describes the main change: fixing unified_export_megatron for transformers 5.6 compatibility, which aligns with the code changes addressing tokenizer exception handling and test setup.
  • Linked Issues Check: Passed (skipped because no linked issues were found for this pull request).
  • Out of Scope Changes Check: Passed (skipped because no linked issues were found for this pull request).
  • Security Anti-Patterns: Passed. No unsafe torch.load(), numpy.load(), hardcoded trust_remote_code=True, eval()/exec() calls, nosec comments, or non-permissive dependencies found.


@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
modelopt/torch/export/unified_export_megatron.py (1)

304-305: Add minimal diagnostics when tokenizer export is skipped.

Line 304 swallows multiple exceptions and immediately passes, which makes export failures hard to debug later. A lightweight log/print here would improve operability without changing behavior.

♻️ Suggested patch
-            except (OSError, TypeError, ValueError, ImportError):
-                pass
+            except (OSError, TypeError, ValueError, ImportError) as e:
+                print(
+                    f"Skipping tokenizer.save_pretrained due to "
+                    f"{type(e).__name__}: {e}"
+                )

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 62f4c0d5-01bd-4c5d-b50d-b2145f31ce29

📥 Commits

Reviewing files that changed from the base of the PR and between 8663678 and aa8cc8a.

📒 Files selected for processing (2)
  • modelopt/torch/export/unified_export_megatron.py
  • tests/gpu_megatron/torch/export/test_unified_export_megatron.py


github-actions Bot commented Apr 23, 2026

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-04-23 19:23 UTC

@kevalmorabia97 kevalmorabia97 enabled auto-merge (squash) April 23, 2026 17:47

codecov Bot commented Apr 23, 2026

Codecov Report

❌ Patch coverage is 0% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 75.74%. Comparing base (e4e3508) to head (aa8cc8a).
⚠️ Report is 4 commits behind head on main.

Files with missing lines:
  • modelopt/torch/export/unified_export_megatron.py: patch coverage 0.00%, 1 line missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1335      +/-   ##
==========================================
+ Coverage   74.67%   75.74%   +1.07%     
==========================================
  Files         468      468              
  Lines       50374    50372       -2     
==========================================
+ Hits        37615    38154     +539     
+ Misses      12759    12218     -541     
Flag        Coverage Δ
examples    41.58% <0.00%> (+5.97%) ⬆️
gpu         58.32% <0.00%> (-0.70%) ⬇️
regression  14.78% <0.00%> (+0.02%) ⬆️
unit        52.52% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@kevalmorabia97 kevalmorabia97 merged commit 0a1ca5d into main Apr 23, 2026
57 of 61 checks passed
@kevalmorabia97 kevalmorabia97 deleted the kmorabia/tf-5.6-megatron-test-fix branch April 23, 2026 19:23