[https://nvbugs/6185234][fix] DeepSeek-V3.2 tokenizer load on transformers 5.x#14261
Conversation
|
/bot run --disable-fail-fast --extra-stage "DGX_B200-8_GPUs-PyTorch-1, DGX_B200-8_GPUs-PyTorch-2, DGX_B200-8_GPUs-AutoDeploy-Post-Merge-1" |
📝 WalkthroughWalkthroughThis PR adds Transformers 5.x compatibility to tokenizer loading by detecting ChangesTransformers 5.x AutoTokenizer Fallback
Sequence DiagramsequenceDiagram
participant Caller
participant TransformersTokenizer
participant AutoTokenizer
participant Fallback as _fallback_to_fast_tokenizer
participant ConfigFile as tokenizer_config.json
participant PreTrainedTokenizerFast
Caller->>TransformersTokenizer: from_pretrained(model_id, ...)
TransformersTokenizer->>AutoTokenizer: from_pretrained(model_id, ...)
AutoTokenizer-->>TransformersTokenizer: AttributeError (max_position_embeddings)
TransformersTokenizer->>Fallback: _fallback_to_fast_tokenizer(model_id, ...)
Fallback->>ConfigFile: read tokenizer_config.json
ConfigFile-->>Fallback: config data
Fallback->>Fallback: extract LlamaTokenizerFast flags
Fallback->>Fallback: merge with caller kwargs
Fallback->>PreTrainedTokenizerFast: construct tokenizer
PreTrainedTokenizerFast-->>Fallback: tokenizer instance
Fallback-->>TransformersTokenizer: return tokenizer
TransformersTokenizer-->>Caller: return tokenizer
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tensorrt_llm/tokenizer/tokenizer.py`:
- Around line 62-73: Run YAPF/pre-commit to reformat the
_TOKENIZER_CONFIG_INHERIT_KEYS tuple in tokenizer.py so it matches repository
style; specifically reflow the tuple elements and parentheses as YAPF expects
(the symbol to change is _TOKENIZER_CONFIG_INHERIT_KEYS) and stage the resulting
change. Ensure the tuple formatting matches the rest of the file and passes the
pre-commit hook.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 849e6637-0a9e-4aa2-a401-809561e3d116
📒 Files selected for processing (1)
tensorrt_llm/tokenizer/tokenizer.py
|
PR_Github #48904 [ run ] triggered by Bot. Commit: |
…rmers 5.x transformers 5.x converted PreTrainedConfig to a @dataclass_transform. For models whose model_type is not registered in CONFIG_MAPPING_NAMES (e.g. DeepSeek-V3.2's deepseek_v32), AutoTokenizer.from_pretrained falls back to a bare PreTrainedConfig whose __post_init__ runs RoPE standardization and reads self.max_position_embeddings, raising: AttributeError: 'PreTrainedConfig' object has no attribute 'max_position_embeddings' load_hf_tokenizer swallows this and returns None, after which the disagg orchestrator hangs waiting on a non-existent tokenizer. Surfaced in L0_PostMerge Build 2722 deepseek-v3.2 disagg. PreTrainedTokenizerFast.from_pretrained sidesteps AutoConfig (reads tokenizer.json via the Rust tokenizers library), so on the max_position_embeddings AttributeError we fall back to it. To match the behavior AutoTokenizer would have had pre-regression we forward LlamaTokenizerFast-style flags from tokenizer_config.json (add_bos_token, padding_side, ...) — without add_bos_token=True the bare fast tokenizer drops the leading BOS V3.2 expects. Verified on B200 against transformers 4.57.6: vocab_size, model_max_length, eos/bos, chat_template, all 818 added tokens, and 8 encode probes (incl. CJK, emoji, math, special tokens) are byte-identical. Llama-3.1-8B and other models whose model_type is registered take the normal AutoTokenizer path unchanged. Upstream context: deepseek-ai/DeepSeek-V3#1207, vllm-project/vllm#30933. Signed-off-by: tianruih <tianruih@nvidia.com>
73b9b4c to
669e852
Compare
|
/bot run --disable-fail-fast --extra-stage "DGX_B200-8_GPUs-PyTorch-1, DGX_B200-8_GPUs-PyTorch-2, DGX_B200-8_GPUs-AutoDeploy-Post-Merge-1" |
|
PR_Github #48908 [ run ] triggered by Bot. Commit: |
|
PR_Github #48904 [ run ] completed with state |
|
PR_Github #48908 [ run ] completed with state
|
|
/bot run --disable-fail-fast --extra-stage "DGX_B200-8_GPUs-PyTorch-1, DGX_B200-8_GPUs-PyTorch-2, DGX_B200-8_GPUs-AutoDeploy-Post-Merge-1" |
|
PR_Github #49036 [ run ] triggered by Bot. Commit: |
|
PR_Github #49036 [ run ] completed with state |
…rmers 5.x (NVIDIA#14261) Signed-off-by: tianruih <tianruih@nvidia.com>
Description
Fix
TransformersTokenizer.from_pretrainedfor models whosemodel_typeis not registered in HF transformers'
CONFIG_MAPPING_NAMES(notablyDeepSeek-V3.2-Exp,
model_type=deepseek_v32) under transformers >= 5.0.Root cause
On transformers 5.x,
PreTrainedConfigis@dataclass_transform-decorated.When
AutoConfig.from_pretrainedencounters an unknownmodel_type, itfalls back to the base
PreTrainedConfiginstead of a concrete subclass,and any field that subclasses normally provide is missing. Reading
max_position_embeddingson that base instance raises:This kills
AutoTokenizer.from_pretrainedeven though the tokenizer files(
tokenizer.json,tokenizer_config.json) are themselves loadable.Repro:
accuracy/test_disaggregated_serving.py::TestDeepSeekV32Exp::test_auto_dtype[False]
5.5.3, any backend.
Fix
In
tensorrt_llm/tokenizer/tokenizer.py:AutoTokenizer.from_pretrainedinTransformersTokenizer.from_pretrainedwith a fallback path thatcatches
AttributeError/KeyErrormentioningmax_position_embeddings.PreTrainedTokenizerFast.from_pretrained, which readstokenizer.jsondirectly via the Rust tokenizers backend and does nottouch
AutoConfig.add_bos_token,add_eos_token,clean_up_tokenization_spaces,model_max_lengthfromtokenizer_config.jsonso behavior matchesAutoTokenizeron the keys that affect prompt formatting.[TRT-LLM] [W] [tokenizr]warning naming the bypassed dataclassregression and listing inherited keys, so this stays debuggable.
The normal
AutoTokenizerpath is unchanged for all models whosemodel_typeis registered (Llama, Gemma, Qwen, ...).Summary by CodeRabbit