Fix unified_export_megatron for transformers 5.6 (#1335)
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
📝 Walkthrough: The pull request refactors exception handling in tokenizer saving to also catch `ValueError` and `ImportError`.
🧹 Nitpick comments (1)
modelopt/torch/export/unified_export_megatron.py (1)
Lines 304-305: Add minimal diagnostics when tokenizer export is skipped. Line 304 swallows multiple exceptions and immediately passes, which makes export failures hard to debug later. A lightweight log/print here would improve operability without changing behavior.

♻️ Suggested patch:

```diff
- except (OSError, TypeError, ValueError, ImportError):
-     pass
+ except (OSError, TypeError, ValueError, ImportError) as e:
+     print(
+         f"Skipping tokenizer.save_pretrained due to "
+         f"{type(e).__name__}: {e}"
+     )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/export/unified_export_megatron.py` around lines 304 - 305, The except block that currently catches (OSError, TypeError, ValueError, ImportError) and simply passes during the tokenizer export should emit a lightweight diagnostic so failures are visible; locate the try/except around the tokenizer export (the block catching (OSError, TypeError, ValueError, ImportError) where the tokenizer is being exported/saved) and replace the bare pass with a warning log/print that states "tokenizer export skipped" plus the exception message (capture the exception as e and include e or use logger.exception/logging.warning with exc_info=False to avoid raising), then keep the same control flow (do not re-raise).
📒 Files selected for processing (2)

- modelopt/torch/export/unified_export_megatron.py
- tests/gpu_megatron/torch/export/test_unified_export_megatron.py
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #1335      +/-   ##
==========================================
+ Coverage   74.67%   75.74%   +1.07%
==========================================
  Files         468      468
  Lines       50374    50372       -2
==========================================
+ Hits        37615    38154     +539
+ Misses      12759    12218     -541
```

☔ View full report in Codecov by Sentry.
What does this PR do?
Type of change: Bug fix
Broaden the exception handler around `AutoTokenizer.from_pretrained` in `GPTModelExporter.save_pretrained` to also catch `ValueError` and `ImportError`.

In transformers 4.x, attempting to load a tokenizer from a directory that contains only `config.json` (no `tokenizer.json` / `tokenizer.model` / `tokenizer_config.json`) raised `OSError`, which was already handled. In transformers 5.x the resolution path now reaches `PreTrainedTokenizerFast.__init__` and raises a `ValueError` ("Couldn't instantiate the backend tokenizer from one of: ...") when none of the three backend sources are available. This caused `export_mcore_gpt_to_hf` to hard-fail for checkpoint directories that don't carry tokenizer files, including `tests/gpu_megatron/torch/export/test_unified_export_megatron.py`, which writes only a minimal `config.json`.

The broadened `except (OSError, TypeError, ValueError, ImportError)` mirrors the pattern already used just below for `AutoProcessor.from_pretrained` and keeps tokenizer export best-effort, as originally intended.

Usage
No API change. Existing call sites continue to work:
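To illustrate the best-effort behavior, here is a minimal, self-contained sketch of the pattern. `save_tokenizer_best_effort` and `load_tokenizer` are hypothetical stand-in names, not the module's API; the real code calls `AutoTokenizer.from_pretrained` inside `unified_export_megatron.py`.

```python
# Sketch (hypothetical names) of the best-effort tokenizer export pattern.
# `load_tokenizer` stands in for AutoTokenizer.from_pretrained.
def save_tokenizer_best_effort(load_tokenizer, source_dir, export_dir):
    try:
        tokenizer = load_tokenizer(source_dir)
    except (OSError, TypeError, ValueError, ImportError) as e:
        # transformers 4.x raises OSError when tokenizer files are missing;
        # transformers 5.x can surface a ValueError from
        # PreTrainedTokenizerFast.__init__ instead.
        print(f"Skipping tokenizer export due to {type(e).__name__}: {e}")
        return None
    tokenizer.save_pretrained(export_dir)
    return tokenizer
```

Any of the four exception types simply skips the tokenizer step; the rest of the export proceeds unchanged.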
Testing
- Reproduced on `transformers==5.6` with: `ValueError: Couldn't instantiate the backend tokenizer...` raised from `unified_export_megatron.py:299`.
- `llama`/`nemotron`/`eagle`/`medusa` parametrizations in the same test file remain green.
- `transformers` 4.x: the `OSError` path is still caught.

Before your PR is "Ready for review"
- Make sure you read and follow Contributor guidelines and your commits are signed (`git commit -s -S`).
- Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).
- CONTRIBUTING.md: N/A

Additional Information
Triggered by the upgrade to transformers 5.6 (used in `nvcr.io/nvidia/nemo:26.04`, which is the container for the `gpu_megatron` nox session). The error message from transformers, "You need to have sentencepiece or tiktoken installed...", is a misleading generic fallback; `sentencepiece` and `tiktoken` are already pulled in via the `[hf]` extras, and the real cause is the missing tokenizer files in the export source directory.
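The missing-files condition can be checked up front. The helper below is a hypothetical pre-check (not part of this PR) that tests whether an export source directory carries any of the three tokenizer backend sources named above:

```python
import os

# The three tokenizer backend sources transformers can instantiate from,
# per the description above.
TOKENIZER_FILES = ("tokenizer.json", "tokenizer.model", "tokenizer_config.json")

def has_tokenizer_files(checkpoint_dir):
    """Return True if the directory carries at least one tokenizer source."""
    return any(
        os.path.isfile(os.path.join(checkpoint_dir, name))
        for name in TOKENIZER_FILES
    )
```

A directory holding only `config.json`, like the one the test writes, fails this check and would hit the broadened exception handler.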