[https://nvbugs/6185234][fix] DeepSeek-V3.2 tokenizer load on transformers 5.x by Hudayday · Pull Request #14261 · NVIDIA/TensorRT-LLM

Hudayday · 2026-05-18T12:09:39Z

Description

Fix TransformersTokenizer.from_pretrained for models whose model_type
is not registered in HF transformers' CONFIG_MAPPING_NAMES (notably
DeepSeek-V3.2-Exp, model_type=deepseek_v32) under transformers >= 5.0.

Root cause

On transformers 5.x, PreTrainedConfig is @dataclass_transform-decorated.
When AutoConfig.from_pretrained encounters an unknown model_type, it
falls back to the base PreTrainedConfig instead of a concrete subclass,
and any field that subclasses normally provide is missing. Reading
max_position_embeddings on that base instance raises:

  AttributeError: 'PreTrainedConfig' object has no attribute 'max_position_embeddings'

This kills AutoTokenizer.from_pretrained even though the tokenizer files
(tokenizer.json, tokenizer_config.json) are themselves loadable.
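
The failure mode can be illustrated without transformers installed. The sketch below uses hypothetical stand-in classes and helpers (`BaseConfig`, `LlamaLikeConfig`, `REGISTRY`, `load_config`, `read_context_length`), not the real Hugging Face classes. It only shows why reading a subclass-provided field on a base-class fallback raises AttributeError.

```python
class BaseConfig:
    """Stand-in for a base config that defines no model-specific fields."""

    def __init__(self, **kwargs):
        self.model_type = kwargs.get("model_type", "")


class LlamaLikeConfig(BaseConfig):
    """Stand-in for a registered subclass that supplies the field."""

    def __init__(self, max_position_embeddings=4096, **kwargs):
        super().__init__(**kwargs)
        self.max_position_embeddings = max_position_embeddings


# Hypothetical registry, playing the role of a model_type -> class mapping.
REGISTRY = {"llama_like": LlamaLikeConfig}


def load_config(model_type):
    # Unknown model types fall back to the base class.
    cls = REGISTRY.get(model_type, BaseConfig)
    return cls(model_type=model_type)


def read_context_length(config):
    # Raises AttributeError when the field is absent on the instance.
    return config.max_position_embeddings
```

With these stand-ins, `read_context_length(load_config("llama_like"))` succeeds, while the same call with an unregistered type such as `"deepseek_v32"` raises an AttributeError naming `max_position_embeddings`.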

Repro:

- L0_PostMerge Build 2722: accuracy/test_disaggregated_serving.py::TestDeepSeekV32Exp::test_auto_dtype[False]
- Manual: trtllm-eval against DeepSeek-V3.2-Exp-FP4-v2 on transformers 5.5.3, any backend.

Fix

In tensorrt_llm/tokenizer/tokenizer.py:

- Wrap AutoTokenizer.from_pretrained in TransformersTokenizer.from_pretrained with a fallback path that catches AttributeError/KeyError mentioning max_position_embeddings.
- Fall back to PreTrainedTokenizerFast.from_pretrained, which reads tokenizer.json directly via the Rust tokenizers backend and does not touch AutoConfig.
- Inherit add_bos_token, add_eos_token, clean_up_tokenization_spaces, and model_max_length from tokenizer_config.json so behavior matches AutoTokenizer on the keys that affect prompt formatting.
- Emit a [TRT-LLM] [W] [tokenizer] warning naming the bypassed dataclass regression and listing inherited keys, so this stays debuggable.

The normal AutoTokenizer path is unchanged for all models whose
model_type is registered (Llama, Gemma, Qwen, ...).
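
The exception-driven fallback described above can be sketched as follows. This is a minimal illustration, not the code merged in this PR: `load_with_fallback` and `_TRIGGER` are hypothetical names, and the two loaders are passed in as plain callables so the sketch runs without transformers installed.

```python
from typing import Any, Callable

# Substring that identifies the specific failure the fallback is meant for.
_TRIGGER = "max_position_embeddings"


def load_with_fallback(
    primary: Callable[..., Any],
    fallback: Callable[..., Any],
    path: str,
    **kwargs: Any,
) -> Any:
    """Try the primary loader; use the fallback only for the known failure."""
    try:
        return primary(path, **kwargs)
    except (AttributeError, KeyError) as exc:
        # Unrelated errors must stay visible, so re-raise them unchanged.
        if _TRIGGER not in str(exc):
            raise
        return fallback(path, **kwargs)
```

In the setting this PR describes, `primary` would correspond to AutoTokenizer.from_pretrained and `fallback` to PreTrainedTokenizerFast.from_pretrained. Matching on the message text keeps the fallback narrow, at the cost of depending on the wording of the upstream error.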

Summary by CodeRabbit

Bug Fixes
- Improved compatibility with Transformers 5.x library versions for tokenizer initialization
- Added fallback mechanism to handle tokenizer configuration edge cases
- Enhanced handling of model types that previously caused initialization failures with newer library versions

Hudayday · 2026-05-18T12:11:51Z

/bot run --disable-fail-fast --extra-stage "DGX_B200-8_GPUs-PyTorch-1, DGX_B200-8_GPUs-PyTorch-2, DGX_B200-8_GPUs-AutoDeploy-Post-Merge-1"

coderabbitai · 2026-05-18T12:14:17Z

📝 Walkthrough

Walkthrough

This PR adds Transformers 5.x compatibility to tokenizer loading by detecting AutoTokenizer.from_pretrained failures and falling back to constructing PreTrainedTokenizerFast directly. Helper functions read LlamaTokenizerFast config flags and merge them with caller kwargs when the fallback is triggered.

Changes

Transformers 5.x AutoTokenizer Fallback

Layer / File(s)	Summary
Fallback helper functions and constants `tensorrt_llm/tokenizer/tokenizer.py`	Constants define LlamaTokenizerFast config keys; `_load_tokenizer_config_flags` reads and extracts those flags from `tokenizer_config.json`; `_fallback_to_fast_tokenizer` merges loaded flags with caller kwargs and constructs `PreTrainedTokenizerFast`.
Exception-driven fallback in from_pretrained `tensorrt_llm/tokenizer/tokenizer.py`	`TransformersTokenizer.from_pretrained` wraps `AutoTokenizer.from_pretrained` call in try/except; when `AttributeError` mentions `max_position_embeddings`, invokes fallback helper; other `AttributeError` cases are re-raised.
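
A rough sketch of the flag inheritance summarized in the table, under stated assumptions: `INHERIT_KEYS`, `load_config_flags`, and `merge_kwargs` are hypothetical names, the key list contains only the four keys named in the PR description (the real constant may list more), only a local model directory is handled, and caller kwargs are assumed to take precedence over inherited flags, which the PR text does not state.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Keys named in the PR description; the real constant may differ.
INHERIT_KEYS = (
    "add_bos_token",
    "add_eos_token",
    "clean_up_tokenization_spaces",
    "model_max_length",
)


def load_config_flags(model_dir: str) -> Dict[str, Any]:
    """Read the inheritable flags from tokenizer_config.json, if present."""
    config_path = Path(model_dir) / "tokenizer_config.json"
    if not config_path.is_file():
        return {}
    with config_path.open(encoding="utf-8") as f:
        data = json.load(f)
    # Keep only the allow-listed keys that the file actually defines.
    return {key: data[key] for key in INHERIT_KEYS if key in data}


def merge_kwargs(flags: Dict[str, Any], caller_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Combine inherited flags with caller kwargs (assumed: caller wins)."""
    merged = dict(flags)
    merged.update(caller_kwargs)
    return merged
```

The merged dictionary would then be passed as keyword arguments to the fallback tokenizer constructor. An allow-list avoids forwarding unrelated entries such as the tokenizer class name.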

Sequence Diagram

sequenceDiagram
  participant Caller
  participant TransformersTokenizer
  participant AutoTokenizer
  participant Fallback as _fallback_to_fast_tokenizer
  participant ConfigFile as tokenizer_config.json
  participant PreTrainedTokenizerFast
  Caller->>TransformersTokenizer: from_pretrained(model_id, ...)
  TransformersTokenizer->>AutoTokenizer: from_pretrained(model_id, ...)
  AutoTokenizer-->>TransformersTokenizer: AttributeError (max_position_embeddings)
  TransformersTokenizer->>Fallback: _fallback_to_fast_tokenizer(model_id, ...)
  Fallback->>ConfigFile: read tokenizer_config.json
  ConfigFile-->>Fallback: config data
  Fallback->>Fallback: extract LlamaTokenizerFast flags
  Fallback->>Fallback: merge with caller kwargs
  Fallback->>PreTrainedTokenizerFast: construct tokenizer
  PreTrainedTokenizerFast-->>Fallback: tokenizer instance
  Fallback-->>TransformersTokenizer: return tokenizer
  TransformersTokenizer-->>Caller: return tokenizer

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly identifies the issue (tokenizer load failure) and the root cause context (transformers 5.x compatibility problem) with a specific model reference (DeepSeek-V3.2).
Description check	✅ Passed	The description provides a thorough explanation of the root cause, reproduction steps, and the implemented fix, addressing all key aspects of the problem and solution.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tensorrt_llm/tokenizer/tokenizer.py`:
- Around line 62-73: Run YAPF/pre-commit to reformat the
_TOKENIZER_CONFIG_INHERIT_KEYS tuple in tokenizer.py so it matches repository
style; specifically reflow the tuple elements and parentheses as YAPF expects
(the symbol to change is _TOKENIZER_CONFIG_INHERIT_KEYS) and stage the resulting
change. Ensure the tuple formatting matches the rest of the file and passes the
pre-commit hook.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 849e6637-0a9e-4aa2-a401-809561e3d116

📥 Commits

Reviewing files that changed from the base of the PR and between a6fc155 and 73b9b4c.

📒 Files selected for processing (1)

tensorrt_llm/tokenizer/tokenizer.py

tensorrt-cicd · 2026-05-18T12:17:21Z

PR_Github #48904 [ run ] triggered by Bot. Commit: 73b9b4c Link to invocation

…rmers 5.x

transformers 5.x converted PreTrainedConfig to a @dataclass_transform. For models whose model_type is not registered in CONFIG_MAPPING_NAMES (e.g. DeepSeek-V3.2's deepseek_v32), AutoTokenizer.from_pretrained falls back to a bare PreTrainedConfig whose __post_init__ runs RoPE standardization and reads self.max_position_embeddings, raising:

  AttributeError: 'PreTrainedConfig' object has no attribute 'max_position_embeddings'

load_hf_tokenizer swallows this and returns None, after which the disagg orchestrator hangs waiting on a non-existent tokenizer. Surfaced in L0_PostMerge Build 2722 deepseek-v3.2 disagg.

PreTrainedTokenizerFast.from_pretrained sidesteps AutoConfig (reads tokenizer.json via the Rust tokenizers library), so on the max_position_embeddings AttributeError we fall back to it. To match the behavior AutoTokenizer would have had pre-regression we forward LlamaTokenizerFast-style flags from tokenizer_config.json (add_bos_token, padding_side, ...). Without add_bos_token=True the bare fast tokenizer drops the leading BOS V3.2 expects.

Verified on B200 against transformers 4.57.6: vocab_size, model_max_length, eos/bos, chat_template, all 818 added tokens, and 8 encode probes (incl. CJK, emoji, math, special tokens) are byte-identical.

Llama-3.1-8B and other models whose model_type is registered take the normal AutoTokenizer path unchanged.

Upstream context: deepseek-ai/DeepSeek-V3#1207, vllm-project/vllm#30933.

Signed-off-by: tianruih <tianruih@nvidia.com>
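
A parity check of the kind described in the commit message could be structured roughly as below. This is a sketch, not the script the author ran: `find_encode_mismatches` is a hypothetical helper, and the encoders are passed as plain callables so the example does not depend on any tokenizer library.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Encoder = Callable[[str], Sequence[int]]


def find_encode_mismatches(
    encode_a: Encoder,
    encode_b: Encoder,
    probes: Iterable[str],
) -> List[Tuple[str, List[int], List[int]]]:
    """Return (text, ids_a, ids_b) for every probe the encoders disagree on."""
    mismatches = []
    for text in probes:
        ids_a = list(encode_a(text))
        ids_b = list(encode_b(text))
        if ids_a != ids_b:
            mismatches.append((text, ids_a, ids_b))
    return mismatches
```

An empty result means the two encoders agree on every probe. A missing leading BOS token, the case the commit message calls out, would show up as a mismatch on every probe.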

Hudayday · 2026-05-18T12:56:47Z

/bot run --disable-fail-fast --extra-stage "DGX_B200-8_GPUs-PyTorch-1, DGX_B200-8_GPUs-PyTorch-2, DGX_B200-8_GPUs-AutoDeploy-Post-Merge-1"

tensorrt-cicd · 2026-05-18T13:02:13Z

PR_Github #48908 [ run ] triggered by Bot. Commit: 669e852 Link to invocation

tensorrt-cicd · 2026-05-18T13:06:02Z

PR_Github #48904 [ run ] completed with state ABORTED. Commit: 73b9b4c

Link to invocation

VALLIS-NERIA

LGTM

tensorrt-cicd · 2026-05-18T17:36:41Z

PR_Github #48908 [ run ] completed with state SUCCESS. Commit: 669e852
/LLM/main/L0_MergeRequest_PR pipeline #38657 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

Hudayday · 2026-05-19T02:09:42Z

/bot run --disable-fail-fast --extra-stage "DGX_B200-8_GPUs-PyTorch-1, DGX_B200-8_GPUs-PyTorch-2, DGX_B200-8_GPUs-AutoDeploy-Post-Merge-1"

tensorrt-cicd · 2026-05-19T02:15:02Z

PR_Github #49036 [ run ] triggered by Bot. Commit: 669e852 Link to invocation

tensorrt-cicd · 2026-05-19T07:59:28Z

PR_Github #49036 [ run ] completed with state SUCCESS. Commit: 669e852
/LLM/main/L0_MergeRequest_PR pipeline #38773 completed with status: 'SUCCESS'

CI Report

Link to invocation

…rmers 5.x (NVIDIA#14261) Signed-off-by: tianruih <tianruih@nvidia.com>

github-actions Bot assigned Hudayday May 18, 2026

Hudayday enabled auto-merge (squash) May 18, 2026 12:11

coderabbitai Bot reviewed May 18, 2026


Comment thread tensorrt_llm/tokenizer/tokenizer.py

Hudayday force-pushed the tianruih/v32-tokenizer-fix-c branch from 73b9b4c to 669e852 Compare May 18, 2026 12:53

VALLIS-NERIA approved these changes May 18, 2026


Hudayday merged commit 2af9ba9 into NVIDIA:main May 19, 2026
7 checks passed

KleinBlueC pushed a commit to KleinBlueC/TensorRT-LLM that referenced this pull request May 19, 2026

[https://nvbugs/6185234][fix] DeepSeek-V3.2 tokenizer load on transfo…

6aa8b30

…rmers 5.x (NVIDIA#14261) Signed-off-by: tianruih <tianruih@nvidia.com>
