bench: fix v5 tokenizer fix when --model is an HF Hub repo id #1381

Merged: cquil11 merged 1 commit into SemiAnalysisAI:main from trevor-m:fix/bench-tokenizer-hf-hub on May 15, 2026

Conversation

@trevor-m
Contributor

With transformers 5.6.0, an AutoTokenizer dispatch change returns a broken LlamaTokenizer for ByteLevel-BPE models. This multiplies the token count by ~4x and shows up as an apparent throughput regression (sgl-project/sglang#25315).

#937 has a patch to fix this. However, it was not applied when an HF model card ID was passed to --model; it only applied when a local directory was passed. This PR applies it in both cases and also removes the exception hiding, so that future issues are more apparent.

_fix_tokenizer_for_sglang resolved tokenizer.json via Path(model_path),
which only works for local model directories. For HF Hub repo ids
(e.g. "nvidia/DeepSeek-R1-0528-FP4-V2") the path lookup silently no-ops
and a bare `except Exception: pass` hid the failure. Combined with the
transformers 5.6.0 AutoTokenizer dispatch change that now returns a
broken LlamaTokenizer for ByteLevel-BPE models, this produced a ~5x
client/server tokenizer mismatch and a false throughput regression
(DeepSeek-R1 FP4 on B200: 1970 -> 425 tok/s).
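
A quick way to spot this kind of mismatch is to compare token counts from the client and server tokenizers on the same text. The sketch below is illustrative only and is not part of this PR: `token_count_ratio` and the toy encoders are hypothetical names, and any callable that maps a string to a sequence of tokens can be passed in.

```python
from typing import Callable, Iterable, Sequence


def token_count_ratio(
    client_encode: Callable[[str], Sequence],
    server_encode: Callable[[str], Sequence],
    texts: Iterable[str],
) -> float:
    """Return total client tokens divided by total server tokens.

    A value far from 1.0 (in either direction) suggests the two sides
    are not tokenizing the same way. Hypothetical helper, not from the PR.
    """
    texts = list(texts)  # allow generators; we iterate twice
    client_total = sum(len(client_encode(t)) for t in texts)
    server_total = sum(len(server_encode(t)) for t in texts)
    if server_total == 0:
        raise ValueError("server tokenizer produced no tokens")
    return client_total / server_total


if __name__ == "__main__":
    # Toy stand-ins for real tokenizers: one token per non-space
    # character on the "client", one token per word on the "server".
    client = lambda text: list(text.replace(" ", ""))
    server = lambda text: text.split()
    print(token_count_ratio(client, server, ["abcd efgh"]))  # 8 / 2 = 4.0
```

For reference, the reported drop from 1970 to 425 tok/s is a factor of about 4.6, which sits between the ~4x and ~5x figures quoted in this PR.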

Fall back to huggingface_hub.hf_hub_download for repo ids, surface
resolution failures as warnings, and add an info log when the fix
actually rewires pre_tokenizer/decoder so client/server alignment is
visible.
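
The resolution logic described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code merged in this PR: `resolve_tokenizer_json`, the `download` hook, and the logger name are hypothetical. Only `huggingface_hub.hf_hub_download(repo_id=..., filename=...)` is a real library call.

```python
import logging
from pathlib import Path
from typing import Callable, Optional

logger = logging.getLogger("bench.tokenizer")  # hypothetical logger name


def resolve_tokenizer_json(
    model: str,
    download: Optional[Callable[[str, str], str]] = None,
) -> Optional[Path]:
    """Find tokenizer.json for a local model directory or an HF Hub repo id.

    `download` is a hypothetical test hook taking (repo_id, filename) and
    returning a local file path; by default hf_hub_download is used.
    """
    local = Path(model) / "tokenizer.json"
    if local.is_file():
        return local  # local model directory: no Hub access needed

    try:
        if download is None:
            # Imported lazily so local-only use does not need the package.
            from huggingface_hub import hf_hub_download

            return Path(hf_hub_download(repo_id=model, filename="tokenizer.json"))
        return Path(download(model, "tokenizer.json"))
    except Exception as exc:
        # Surface the failure as a warning instead of `except Exception: pass`.
        logger.warning("could not resolve tokenizer.json for %r: %s", model, exc)
        return None
```

Returning `None` after a warning keeps the benchmark running while making the skipped fix visible in the logs; a caller that prefers to fail fast could re-raise instead.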

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@trevor-m trevor-m requested a review from a team May 15, 2026 00:21
Contributor

@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

Collaborator

@cquil11 cquil11 left a comment

seems reasonable. thx for the fix

@cquil11 cquil11 merged commit 5fe6d56 into SemiAnalysisAI:main May 15, 2026
functionstackx added a commit that referenced this pull request May 20, 2026
…o-MTP/MTP) (#1502)

* perf-changelog: re-run DSR1 SGLang agg configs to pick up tokenizer fix

Re-runs DSR1 SGLang agg configs on B200/B300 (FP8/FP4, no-MTP/MTP) to
pick up the tokenizer fix from #1381.

* perf-changelog: set PR link to #1502

* launcher(b300-nv): drop nodelist pinning; restore perf-changelog entries

- runners/launch_b300-nv.sh: remove --nodelist=b300-[001-006,008-012,017-020] from salloc so jobs can land on any healthy B300 node.
- perf-changelog.yaml: restore ~18 entries that were unintentionally dropped during a prior rebase; net effect of this branch is now just the new DSR1 SGLang agg re-run entry.

* perf-changelog: sync to origin/main and append DSR1 re-run entry at end

* Update perf-changelog.yaml

---------

Co-authored-by: functionstackx <47992694+functionstackx@users.noreply.github.com>
functionstackx pushed a commit that referenced this pull request May 27, 2026
vLLM's tokenizer wrapper can raise AttributeError on
all_special_tokens_extended with newer transformers (e.g. Qwen3.5).
Use backend_request_func.get_tokenizer on fallback so client
tokenization keeps the sglang v5 fix (#1381); bare AutoTokenizer only
if backend is unavailable.

Co-authored-by: cliu1004@amd.com <cliu1004@amd.com@mia1-p01-g18.mia.tensorwave.lan>
Co-authored-by: Cursor <cursoragent@cursor.com>
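
The fallback order described in the commit above can be sketched as a small chain of loaders. This is an illustrative sketch, not the referenced commit's code: `load_tokenizer_with_fallback` and its three loader parameters are hypothetical, and treating "backend is unavailable" as an `ImportError` is an assumption.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("bench.tokenizer")  # hypothetical logger name

Loader = Callable[[str], Any]


def load_tokenizer_with_fallback(
    model: str,
    primary: Loader,   # e.g. the serving framework's tokenizer wrapper
    patched: Loader,   # e.g. a loader that applies the tokenizer fix
    bare: Loader,      # e.g. a plain AutoTokenizer-style loader
) -> Any:
    """Try `primary`, then `patched`, then `bare`, logging each fallback."""
    try:
        return primary(model)
    except AttributeError as exc:
        # Matches the failure mode described in the commit message.
        logger.warning("primary tokenizer loader failed: %s", exc)

    try:
        return patched(model)
    except ImportError as exc:
        # Assumption: an unavailable backend shows up as an ImportError.
        logger.warning("patched tokenizer loader unavailable: %s", exc)

    return bare(model)
```

Passing the loaders in as callables keeps the ordering explicit and lets each step be exercised with stubs, without installing any serving framework.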
Labels

bug Something isn't working
