
[https://nvbugs/6185234][fix] DeepSeek-V3.2 tokenizer load on transformers 5.x #14261

Merged
Hudayday merged 1 commit into NVIDIA:main from Hudayday:tianruih/v32-tokenizer-fix-c
May 19, 2026

@Hudayday
Collaborator

@Hudayday Hudayday commented May 18, 2026

Description

Fix TransformersTokenizer.from_pretrained for models whose model_type
is not registered in HF transformers' CONFIG_MAPPING_NAMES (notably
DeepSeek-V3.2-Exp, model_type=deepseek_v32) under transformers >= 5.0.

Root cause

On transformers 5.x, PreTrainedConfig is @dataclass_transform-decorated.
When AutoConfig.from_pretrained encounters an unknown model_type, it
falls back to the base PreTrainedConfig instead of a concrete subclass,
and any field that subclasses normally provide is missing. Reading
max_position_embeddings on that base instance raises:

  AttributeError: 'PreTrainedConfig' object has no attribute
                  'max_position_embeddings'

This makes AutoTokenizer.from_pretrained fail even though the tokenizer
files (tokenizer.json, tokenizer_config.json) are themselves loadable.
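The failure mode can be reproduced in miniature without transformers. The sketch below is a hypothetical stand-in, not Hugging Face code: `BaseConfig` plays the role of `PreTrainedConfig`, `REGISTRY` plays the role of `CONFIG_MAPPING_NAMES`, and `__post_init__` only mimics the RoPE standardization step that reads `max_position_embeddings`.

```python
from dataclasses import dataclass


@dataclass
class BaseConfig:
    """Hypothetical stand-in for transformers' base PreTrainedConfig."""
    model_type: str = ""

    def __post_init__(self) -> None:
        # Mimics RoPE standardization: it reads a field that only
        # concrete (registered) subclasses declare.
        _ = self.max_position_embeddings


@dataclass
class LlamaLikeConfig(BaseConfig):
    """Hypothetical registered subclass that declares the field."""
    max_position_embeddings: int = 4096


# Hypothetical stand-in for CONFIG_MAPPING_NAMES: deepseek_v32 is absent.
REGISTRY = {"llama": LlamaLikeConfig}


def auto_config(model_type: str) -> BaseConfig:
    # Unknown model types fall back to the bare base class.
    cls = REGISTRY.get(model_type, BaseConfig)
    return cls(model_type=model_type)


if __name__ == "__main__":
    print(auto_config("llama"))  # registered type: constructs fine
    try:
        auto_config("deepseek_v32")
    except AttributeError as err:
        print(f"AttributeError: {err}")
```

The registered type constructs normally, while the unregistered one fails inside `__post_init__`, before any tokenizer file is touched.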

Repro:

  • L0_PostMerge Build 2722:
    accuracy/test_disaggregated_serving.py::TestDeepSeekV32Exp::test_auto_dtype[False]
  • Manual: trtllm-eval against DeepSeek-V3.2-Exp-FP4-v2 on transformers
    5.5.3, any backend.

Fix

In tensorrt_llm/tokenizer/tokenizer.py:

  1. Wrap AutoTokenizer.from_pretrained in
    TransformersTokenizer.from_pretrained with a fallback path that
    catches AttributeError/KeyError mentioning max_position_embeddings.
  2. Fall back to PreTrainedTokenizerFast.from_pretrained, which reads
    tokenizer.json directly via the Rust tokenizers backend and does not
    touch AutoConfig.
  3. Inherit add_bos_token, add_eos_token, clean_up_tokenization_spaces,
    model_max_length from tokenizer_config.json so behavior matches
    AutoTokenizer on the keys that affect prompt formatting.
  4. Emit a [TRT-LLM] [W] [tokenizer] warning naming the bypassed dataclass
    regression and listing the inherited keys, so the fallback stays debuggable.
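Steps 1, 2 and 4 amount to a narrowly scoped try/except. The sketch below is hypothetical and dependency-free: `load_tokenizer` is an illustrative name, `primary` and `fallback` are injected callables standing in for `AutoTokenizer.from_pretrained` and `PreTrainedTokenizerFast.from_pretrained`, and the standard `logging` module stands in for TRT-LLM's logger.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("tokenizer")

# Substring used to recognize the transformers 5.x config regression.
_REGRESSION_MARKER = "max_position_embeddings"


def load_tokenizer(
    model_dir: str,
    primary: Callable[..., Any],
    fallback: Callable[..., Any],
    **kwargs: Any,
) -> Any:
    """Try the primary loader; fall back only on the known regression."""
    try:
        return primary(model_dir, **kwargs)
    except (AttributeError, KeyError) as err:
        # Any other AttributeError/KeyError is a genuine bug: re-raise it.
        if _REGRESSION_MARKER not in str(err):
            raise
        # Hypothetical message text; not the PR's actual warning wording.
        logger.warning(
            "AutoTokenizer failed (%s); loading tokenizer.json directly "
            "via the fast-tokenizer fallback", err)
        return fallback(model_dir, **kwargs)
```

Matching on the error message keeps the fallback from masking unrelated failures, at the cost of depending on the exact wording transformers uses for the missing attribute.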

The normal AutoTokenizer path is unchanged for all models whose
model_type is registered (Llama, Gemma, Qwen, ...).

Summary by CodeRabbit

  • Bug Fixes
    • Improved compatibility with Transformers 5.x library versions for tokenizer initialization
    • Added fallback mechanism to handle tokenizer configuration edge cases
    • Enhanced handling of model types that previously caused initialization failures with newer library versions


@Hudayday
Collaborator Author

/bot run --disable-fail-fast --extra-stage "DGX_B200-8_GPUs-PyTorch-1, DGX_B200-8_GPUs-PyTorch-2, DGX_B200-8_GPUs-AutoDeploy-Post-Merge-1"

@Hudayday Hudayday enabled auto-merge (squash) May 18, 2026 12:11
@coderabbitai
Contributor

coderabbitai Bot commented May 18, 2026

Walkthrough

This PR adds Transformers 5.x compatibility to tokenizer loading by detecting AutoTokenizer.from_pretrained failures and falling back to constructing PreTrainedTokenizerFast directly. Helper functions read LlamaTokenizerFast config flags and merge them with caller kwargs when the fallback is triggered.

Changes

Transformers 5.x AutoTokenizer Fallback

  • Fallback helper functions and constants (tensorrt_llm/tokenizer/tokenizer.py):
    Constants define LlamaTokenizerFast config keys; _load_tokenizer_config_flags reads and extracts those flags from tokenizer_config.json; _fallback_to_fast_tokenizer merges loaded flags with caller kwargs and constructs PreTrainedTokenizerFast.
  • Exception-driven fallback in from_pretrained (tensorrt_llm/tokenizer/tokenizer.py):
    TransformersTokenizer.from_pretrained wraps AutoTokenizer.from_pretrained call in try/except; when AttributeError mentions max_position_embeddings, invokes fallback helper; other AttributeError cases are re-raised.
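The flag-reading helper described above can be sketched as follows. This is an illustration under assumptions, not the PR's `_load_tokenizer_config_flags`: the names `load_inherit_flags`, `merge_kwargs` and `INHERIT_KEYS` are hypothetical, the key tuple is taken from the four keys the PR description lists (the real tuple may differ), and the rule that explicit caller kwargs override file values is assumed.

```python
import json
import os
from typing import Any, Dict

# Hypothetical key tuple, using the keys named in the PR description.
INHERIT_KEYS = (
    "add_bos_token",
    "add_eos_token",
    "clean_up_tokenization_spaces",
    "model_max_length",
)


def load_inherit_flags(model_dir: str) -> Dict[str, Any]:
    """Return the inheritable keys present in tokenizer_config.json."""
    path = os.path.join(model_dir, "tokenizer_config.json")
    if not os.path.isfile(path):
        return {}
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    return {k: config[k] for k in INHERIT_KEYS if k in config}


def merge_kwargs(model_dir: str, **caller_kwargs: Any) -> Dict[str, Any]:
    """File flags first; explicit caller kwargs take precedence (assumed)."""
    merged = load_inherit_flags(model_dir)
    merged.update(caller_kwargs)
    return merged
```

Keys absent from the file are simply not forwarded, so the fast tokenizer keeps its own defaults for them.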

Sequence Diagram

sequenceDiagram
  participant Caller
  participant TransformersTokenizer
  participant AutoTokenizer
  participant Fallback as _fallback_to_fast_tokenizer
  participant ConfigFile as tokenizer_config.json
  participant PreTrainedTokenizerFast
  Caller->>TransformersTokenizer: from_pretrained(model_id, ...)
  TransformersTokenizer->>AutoTokenizer: from_pretrained(model_id, ...)
  AutoTokenizer-->>TransformersTokenizer: AttributeError (max_position_embeddings)
  TransformersTokenizer->>Fallback: _fallback_to_fast_tokenizer(model_id, ...)
  Fallback->>ConfigFile: read tokenizer_config.json
  ConfigFile-->>Fallback: config data
  Fallback->>Fallback: extract LlamaTokenizerFast flags
  Fallback->>Fallback: merge with caller kwargs
  Fallback->>PreTrainedTokenizerFast: construct tokenizer
  PreTrainedTokenizerFast-->>Fallback: tokenizer instance
  Fallback-->>TransformersTokenizer: return tokenizer
  TransformersTokenizer-->>Caller: return tokenizer

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Title check (✅ Passed): The title clearly identifies the issue (tokenizer load failure) and the root cause context (transformers 5.x compatibility problem) with a specific model reference (DeepSeek-V3.2).
  • Description check (✅ Passed): The description provides a thorough explanation of the root cause, reproduction steps, and the implemented fix, addressing all key aspects of the problem and solution.
  • Docstring Coverage (✅ Passed): No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Linked Issues check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check (✅ Passed): Check skipped because no linked issues were found for this pull request.


Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tensorrt_llm/tokenizer/tokenizer.py`:
- Around lines 62-73: Run YAPF/pre-commit to reformat the
_TOKENIZER_CONFIG_INHERIT_KEYS tuple in tokenizer.py so it matches repository
style; specifically reflow the tuple elements and parentheses as YAPF expects
(the symbol to change is _TOKENIZER_CONFIG_INHERIT_KEYS) and stage the resulting
change. Ensure the tuple formatting matches the rest of the file and passes the
pre-commit hook.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 849e6637-0a9e-4aa2-a401-809561e3d116

📥 Commits

Reviewing files that changed from the base of the PR and between a6fc155 and 73b9b4c.

📒 Files selected for processing (1)
  • tensorrt_llm/tokenizer/tokenizer.py

@tensorrt-cicd
Collaborator

PR_Github #48904 [ run ] triggered by Bot. Commit: 73b9b4c

…rmers 5.x

transformers 5.x converted PreTrainedConfig to a @dataclass_transform.
For models whose model_type is not registered in CONFIG_MAPPING_NAMES
(e.g. DeepSeek-V3.2's deepseek_v32), AutoTokenizer.from_pretrained
falls back to a bare PreTrainedConfig whose __post_init__ runs RoPE
standardization and reads self.max_position_embeddings, raising:

  AttributeError: 'PreTrainedConfig' object has no attribute
  'max_position_embeddings'

load_hf_tokenizer swallows this and returns None, after which the disagg
orchestrator hangs waiting on a non-existent tokenizer. Surfaced in
L0_PostMerge Build 2722 deepseek-v3.2 disagg.

PreTrainedTokenizerFast.from_pretrained sidesteps AutoConfig (reads
tokenizer.json via the Rust tokenizers library), so on the
max_position_embeddings AttributeError we fall back to it. To match the
behavior AutoTokenizer would have had pre-regression we forward
LlamaTokenizerFast-style flags from tokenizer_config.json
(add_bos_token, padding_side, ...) — without add_bos_token=True the
bare fast tokenizer drops the leading BOS V3.2 expects.
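To make the BOS point concrete, here is a toy illustration. `encode`, `BOS_ID` and the token ids are hypothetical and do not model V3.2's real vocabulary; the sketch only shows that the flag decides whether a leading BOS id is emitted, so a loader that loses the flag silently changes the first token of every prompt.

```python
from typing import List

BOS_ID = 0  # hypothetical BOS token id


def encode(body_ids: List[int], add_bos_token: bool) -> List[int]:
    """Toy post-processing step: optionally prepend BOS to the body ids."""
    return ([BOS_ID] if add_bos_token else []) + body_ids


body = [17, 42, 99]  # hypothetical ids for some prompt text
print(encode(body, add_bos_token=True))   # [0, 17, 42, 99]
print(encode(body, add_bos_token=False))  # [17, 42, 99]
```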

Verified on B200 against transformers 4.57.6: vocab_size,
model_max_length, eos/bos, chat_template, all 818 added tokens, and 8
encode probes (incl. CJK, emoji, math, special tokens) are
byte-identical. Llama-3.1-8B and other models whose model_type is
registered take the normal AutoTokenizer path unchanged.
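A parity check in the spirit of the verification above can be sketched as follows. `tokenizer_parity` is a hypothetical helper, not the script the author used; it is written against duck-typed objects and assumes each side exposes `vocab_size`, `model_max_length`, `bos_token`, `eos_token`, `chat_template`, `get_added_vocab()` and `encode()`, all of which Hugging Face tokenizers provide.

```python
from typing import Any, Iterable, List

# Attributes compared by value (assumed to exist on both tokenizers).
_ATTRS = ("vocab_size", "model_max_length", "bos_token", "eos_token",
          "chat_template")


def tokenizer_parity(a: Any, b: Any, probes: Iterable[str]) -> List[str]:
    """Return human-readable mismatches; an empty list means parity."""
    mismatches = []
    for name in _ATTRS:
        va, vb = getattr(a, name, None), getattr(b, name, None)
        if va != vb:
            mismatches.append(f"{name}: {va!r} != {vb!r}")
    if a.get_added_vocab() != b.get_added_vocab():
        mismatches.append("added tokens differ")
    for text in probes:
        ids_a, ids_b = a.encode(text), b.encode(text)
        if ids_a != ids_b:
            mismatches.append(f"encode({text!r}): {ids_a} != {ids_b}")
    return mismatches
```

Comparing across two transformers versions would require running each side in its own environment; the sketch covers only the comparison itself.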

Upstream context: deepseek-ai/DeepSeek-V3#1207,
vllm-project/vllm#30933.

Signed-off-by: tianruih <tianruih@nvidia.com>
@Hudayday Hudayday force-pushed the tianruih/v32-tokenizer-fix-c branch from 73b9b4c to 669e852 May 18, 2026 12:53
@Hudayday
Collaborator Author

/bot run --disable-fail-fast --extra-stage "DGX_B200-8_GPUs-PyTorch-1, DGX_B200-8_GPUs-PyTorch-2, DGX_B200-8_GPUs-AutoDeploy-Post-Merge-1"

@tensorrt-cicd
Collaborator

PR_Github #48908 [ run ] triggered by Bot. Commit: 669e852

@tensorrt-cicd
Collaborator

PR_Github #48904 [ run ] completed with state ABORTED. Commit: 73b9b4c

Collaborator

@VALLIS-NERIA VALLIS-NERIA left a comment

LGTM

@tensorrt-cicd
Collaborator

PR_Github #48908 [ run ] completed with state SUCCESS. Commit: 669e852
/LLM/main/L0_MergeRequest_PR pipeline #38657 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@Hudayday
Collaborator Author

/bot run --disable-fail-fast --extra-stage "DGX_B200-8_GPUs-PyTorch-1, DGX_B200-8_GPUs-PyTorch-2, DGX_B200-8_GPUs-AutoDeploy-Post-Merge-1"

@tensorrt-cicd
Collaborator

PR_Github #49036 [ run ] triggered by Bot. Commit: 669e852

@tensorrt-cicd
Collaborator

PR_Github #49036 [ run ] completed with state SUCCESS. Commit: 669e852
/LLM/main/L0_MergeRequest_PR pipeline #38773 completed with status: 'SUCCESS'

CI Report

@Hudayday Hudayday merged commit 2af9ba9 into NVIDIA:main May 19, 2026
7 checks passed
KleinBlueC pushed a commit to KleinBlueC/TensorRT-LLM that referenced this pull request May 19, 2026
…rmers 5.x (NVIDIA#14261)

Signed-off-by: tianruih <tianruih@nvidia.com>