bench: fix v5 tokenizer fix when --model is an HF Hub repo id by trevor-m · Pull Request #1381 · SemiAnalysisAI/InferenceX

trevor-m · 2026-05-15T00:21:10Z

With transformers 5.6.0, a change in AutoTokenizer dispatch returns a broken LlamaTokenizer for ByteLevel-BPE models. Token counts are inflated by ~4x, which shows up as an apparent throughput regression; see sgl-project/sglang#25315.
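A quick way to spot this kind of mismatch is to compare the token count the benchmark client computes for a prompt against the count the server reports for the same prompt. The sketch below is only an illustration: the function names, the 5% tolerance, and the sample numbers are hypothetical and are not part of this PR.

```python
def token_count_ratio(client_tokens: int, server_tokens: int) -> float:
    """Return how far apart two token counts are, as a ratio >= 1.0."""
    if client_tokens <= 0 or server_tokens <= 0:
        raise ValueError("token counts must be positive")
    larger = max(client_tokens, server_tokens)
    smaller = min(client_tokens, server_tokens)
    return larger / smaller


def tokenizers_agree(client_tokens: int, server_tokens: int,
                     tolerance: float = 0.05) -> bool:
    """True if the two counts differ by at most `tolerance` (hypothetical 5% default)."""
    return token_count_ratio(client_tokens, server_tokens) <= 1.0 + tolerance


if __name__ == "__main__":
    # Hypothetical numbers: a healthy pair and a ~4x inflated pair.
    print(tokenizers_agree(1000, 1010))
    print(tokenizers_agree(4000, 1000))
```

A ratio far above 1.0 suggests the client and server are not tokenizing the same way, so throughput numbers derived from either count are not comparable.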

#937 has a patch for this, but it was not applied when an HF Hub model ID was passed to --model; it only applied when a local directory was passed. This PR applies the patch in both cases and also removes the exception hiding, so that future issues are more apparent.

_fix_tokenizer_for_sglang resolved tokenizer.json via Path(model_path), which only works for local model directories. For HF Hub repo ids (e.g. "nvidia/DeepSeek-R1-0528-FP4-V2") the path lookup silently no-ops and a bare `except Exception: pass` hid the failure.

Combined with the transformers 5.6.0 AutoTokenizer dispatch change that now returns a broken LlamaTokenizer for ByteLevel-BPE models, this produced a ~5x client/server tokenizer mismatch and a false throughput regression (DeepSeek-R1 FP4 on B200: 1970 -> 425 tok/s).

Fall back to huggingface_hub.hf_hub_download for repo ids, surface resolution failures as warnings, and add an info log when the fix actually rewires pre_tokenizer/decoder so client/server alignment is visible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
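The resolution logic described in the commit message above could look roughly like the sketch below. This is not the code from the PR: `resolve_tokenizer_json` is a hypothetical helper name, and the exact warning text and control flow are assumptions. The only external call used is `huggingface_hub.hf_hub_download(repo_id=..., filename=...)`, imported lazily so the local-directory path works without the package installed.

```python
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)


def resolve_tokenizer_json(model_path: str) -> Optional[Path]:
    """Return a local path to tokenizer.json for a local dir or an HF Hub repo id.

    Hypothetical helper; returns None (with a warning) if nothing can be resolved.
    """
    model_dir = Path(model_path)
    local_file = model_dir / "tokenizer.json"
    if local_file.is_file():
        return local_file

    if model_dir.is_dir():
        # A real local directory without tokenizer.json: warn instead of silently passing.
        logger.warning("No tokenizer.json found in local directory %s", model_dir)
        return None

    # Not a local directory, so treat model_path as a Hub repo id.
    try:
        from huggingface_hub import hf_hub_download

        return Path(hf_hub_download(repo_id=model_path, filename="tokenizer.json"))
    except Exception as exc:  # surfaced as a warning rather than swallowed
        logger.warning("Could not resolve tokenizer.json for %r: %s", model_path, exc)
        return None
```

Logging the failure instead of using a bare `except Exception: pass` is the point of the sketch: a missing or unreachable tokenizer.json becomes visible in the benchmark output.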

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

cquil11

seems reasonable. thx for the fix

…o-MTP/MTP) (#1502)

* perf-changelog: re-run DSR1 SGLang agg configs to pick up tokenizer fix

  Re-runs DSR1 SGLang agg configs on B200/B300 (FP8/FP4, no-MTP/MTP) to pick up the tokenizer fix from #1381.

* perf-changelog: set PR link to #1502

* launcher(b300-nv): drop nodelist pinning; restore perf-changelog entries

  - runners/launch_b300-nv.sh: remove --nodelist=b300-[001-006,008-012,017-020] from salloc so jobs can land on any healthy B300 node.
  - perf-changelog.yaml: restore ~18 entries that were unintentionally dropped during a prior rebase; net effect of this branch is now just the new DSR1 SGLang agg re-run entry.

* perf-changelog: sync to origin/main and append DSR1 re-run entry at end

* Update perf-changelog.yaml

---------

Co-authored-by: functionstackx <47992694+functionstackx@users.noreply.github.com>

vLLM's tokenizer wrapper can raise AttributeError on all_special_tokens_extended with newer transformers (e.g. Qwen3.5). Use backend_request_func.get_tokenizer on fallback so client tokenization keeps the sglang v5 fix (#1381); bare AutoTokenizer only if backend is unavailable.

Co-authored-by: cliu1004@amd.com <cliu1004@amd.com@mia1-p01-g18.mia.tensorwave.lan>
Co-authored-by: Cursor <cursoragent@cursor.com>
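The fallback order in that commit message (vLLM's wrapper first, then backend_request_func.get_tokenizer, then a bare AutoTokenizer) can be sketched as a generic "first loader that works" helper. This is an illustration only: `first_working_loader` is a hypothetical name, the loaders are passed in as plain callables, and the set of caught exception types is an assumption rather than what #1573 actually does.

```python
import logging
from typing import Callable, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def first_working_loader(loaders: Sequence[Tuple[str, Callable[[], T]]]) -> T:
    """Try each (name, loader) pair in order and return the first result.

    AttributeError and ImportError are logged as warnings and trigger the next
    loader; any other exception propagates so real bugs are not hidden.
    """
    last_exc: Optional[BaseException] = None
    for name, load in loaders:
        try:
            return load()
        except (AttributeError, ImportError) as exc:
            logger.warning("%s failed (%s); trying next tokenizer loader", name, exc)
            last_exc = exc
    raise RuntimeError("no tokenizer loader succeeded") from last_exc
```

In the benchmark client the list would be ordered as the commit describes: the vLLM wrapper, then backend_request_func.get_tokenizer (which keeps the v5 fix from #1381), and bare AutoTokenizer last.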

trevor-m requested a review from a team May 15, 2026 00:21

github-project-automation Bot added this to InferenceMAX Board May 15, 2026

claude Bot reviewed May 15, 2026

trevor-m mentioned this pull request May 15, 2026

[Bug] DeepSeek-R1 increased kv usage and performance regression due to transformers==5.6.0 sgl-project/sglang#25315 (Open, 5 tasks)

kedarpotdar-nv added the bug Something isn't working label May 15, 2026

cquil11 approved these changes May 15, 2026

cquil11 merged commit 5fe6d56 into SemiAnalysisAI:main May 15, 2026

github-project-automation Bot moved this to Done in InferenceMAX Board May 15, 2026

Ankur-singh mentioned this pull request May 18, 2026

perf-changelog: re-run DSR1 SGLang agg configs (B200/B300, FP8/FP4, no-MTP/MTP) #1502 (Merged)

ChangLiu0709 mentioned this pull request May 27, 2026

bench_serving: fallback when vLLM get_tokenizer hits removed API #1573 (Merged, 3 tasks)

cquil11 merged 1 commit into SemiAnalysisAI:main from trevor-m:fix/bench-tokenizer-hf-hub
