[None][feat] Add --custom_tokenizer CLI option to trtllm-bench #12586
longlee0622 merged 7 commits into NVIDIA:main from
Conversation
Models like GLM-5 ship tokenizer_config.json with tokenizer_class set to a non-standard class (e.g., TokenizersBackend from transformers 5.x) that AutoTokenizer cannot load. trtllm-serve already supports a custom_tokenizer LlmArgs field with built-in aliases, but trtllm-bench calls AutoTokenizer.from_pretrained() directly before LlmArgs is constructed, so --extra_llm_api_options cannot help. Add --custom_tokenizer to both throughput and latency sub-commands. The option accepts a built-in alias (deepseek_v32, glm_moe_dsa) or a fully-qualified module.path.ClassName, reusing the same TOKENIZER_ALIASES registry from llm_args.py (now promoted to module level so it can be imported by the bench utilities). Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
/bot run
📝 Walkthrough
This change adds support for custom tokenizers to the benchmark suite by introducing a `--custom_tokenizer` option.
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant CLI
    participant BenchmarkCmd as latency/throughput Command
    participant InitTokenizer as initialize_tokenizer()
    participant Resolver as Tokenizer Resolver
    participant Tokenizer
    User->>CLI: --custom_tokenizer my.Module.Class
    CLI->>BenchmarkCmd: Parse options
    BenchmarkCmd->>BenchmarkCmd: Extract custom_tokenizer param
    BenchmarkCmd->>InitTokenizer: initialize_tokenizer(model_name, custom_tokenizer)
    alt custom_tokenizer provided
        InitTokenizer->>Resolver: Resolve via TOKENIZER_ALIASES<br/>or dynamic import
        Resolver->>Tokenizer: from_pretrained(model_name)
    else custom_tokenizer not provided
        InitTokenizer->>Tokenizer: AutoTokenizer.from_pretrained(model_name)
    end
    Tokenizer->>InitTokenizer: Return tokenizer instance
    InitTokenizer->>BenchmarkCmd: Return initialized tokenizer
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tensorrt_llm/bench/benchmark/low_latency.py`:
- Around line 100-108: The changed option block decorated with `@optgroup.option`
for "--custom_tokenizer" is failing YAPF formatting; reformat that block (and
the similar block around lines 212-215) to comply with the project's YAPF style
by running the formatter (e.g., pre-commit run yapf --files
tensorrt_llm/bench/benchmark/low_latency.py or yapf -i on that file) and commit
the rewritten/normalized code so the option declaration for "--custom_tokenizer"
(and the other affected option) matches the project's line-wrapping and
indentation rules.
In `@tensorrt_llm/bench/benchmark/throughput.py`:
- Around line 131-139: YAPF formatting failed for the updated optgroup.option
block that defines the "--custom_tokenizer" option; run the project's YAPF
formatter on tensorrt_llm/bench/benchmark/throughput.py (and the nearby affected
block around the optgroup.option at the other mentioned location) to rewrite the
option declaration to conform to style rules (wrap/align the multi-line help
string and arguments as the formatter expects) so the file passes CI; locate the
option by the optgroup.option decorator and the "--custom_tokenizer" option name
when applying the formatter.
In `@tensorrt_llm/bench/utils/data.py`:
- Around line 29-40: Wrap the custom tokenizer resolution and import logic (the
block that uses TOKENIZER_ALIASES, tokenizer_path, tokenizer_path.rsplit(...),
import_module, getattr and tokenizer_class.from_pretrained) in a try/except that
validates inputs and converts low-level errors into a clear, user-facing
exception; specifically check that custom_tokenizer is a non-empty string, use
rsplit safely and ensure module_path and class_name are present, catch
ImportError/ModuleNotFoundError/AttributeError/ValueError and re-raise a
ValueError (or a custom exception) with a message like "Invalid custom_tokenizer
'{custom_tokenizer}': could not resolve '<module>.<Class>' - <original error>"
so callers see a concise, actionable error instead of an opaque traceback.
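That suggestion could be implemented roughly as below; the function name and alias table are illustrative, not the actual `data.py` code:

```python
from importlib import import_module

TOKENIZER_ALIASES = {"glm_moe_dsa": "my_pkg.tokenizers.GlmMoeDsaTokenizer"}  # illustrative


def resolve_custom_tokenizer(custom_tokenizer):
    """Resolve an alias or dotted path to a class, converting low-level
    failures into one clear, user-facing ValueError."""
    if not isinstance(custom_tokenizer, str) or not custom_tokenizer:
        raise ValueError("custom_tokenizer must be a non-empty string")
    tokenizer_path = TOKENIZER_ALIASES.get(custom_tokenizer, custom_tokenizer)
    try:
        # Unpacking raises ValueError if there is no '.' to split on;
        # ModuleNotFoundError is a subclass of ImportError.
        module_path, class_name = tokenizer_path.rsplit(".", 1)
        return getattr(import_module(module_path), class_name)
    except (ImportError, AttributeError, ValueError) as e:
        raise ValueError(
            f"Invalid custom_tokenizer '{custom_tokenizer}': "
            f"could not resolve '{tokenizer_path}' - {e}") from e
```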
In `@tensorrt_llm/llmapi/llm_args.py`:
- Around line 2389-2395: YAPF formatting is failing CI for the TOKENIZER_ALIASES
block in tensorrt_llm.llmapi.llm_args (symbol TOKENIZER_ALIASES); run the
project's YAPF pre-commit or formatting command on this file, reformat the
dictionary to match the repo style (e.g., proper line breaks, indentation and
trailing comma rules), and re-commit the updated llm_args.py so pre-commit/yapf
no longer modifies it in CI.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c4532387-2696-46ce-8865-d1dcc9d89279
📒 Files selected for processing (4)
- tensorrt_llm/bench/benchmark/low_latency.py
- tensorrt_llm/bench/benchmark/throughput.py
- tensorrt_llm/bench/utils/data.py
- tensorrt_llm/llmapi/llm_args.py
PR_Github #40701 [ run ] triggered by Bot. Commit:
Add try/except around custom tokenizer loading in initialize_tokenizer for clear error messages. Run yapf on all changed files. Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
/bot run
PR_Github #40709 [ run ] triggered by Bot. Commit:
PR_Github #40701 [ run ] completed with state
PR_Github #40709 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #40809 [ run ] triggered by Bot. Commit:
Same TokenizersBackend issue as trtllm-bench: benchmark_serving.py calls AutoTokenizer.from_pretrained() directly via get_tokenizer(), which fails for models like GLM-5. Add --custom-tokenizer CLI arg and wire it through get_tokenizer() using the shared TOKENIZER_ALIASES registry. Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
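Wiring such a flag through `benchmark_serving.py` might look like this argparse sketch; the real script's option plumbing and the `get_tokenizer()` signature may differ:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="serving benchmark (sketch)")
    parser.add_argument("--model", required=True)
    parser.add_argument(
        "--custom-tokenizer",
        default=None,
        help="Built-in alias (e.g. glm_moe_dsa) or fully-qualified "
        "module.path.ClassName to use instead of AutoTokenizer.")
    return parser


args = build_parser().parse_args(
    ["--model", "GLM-5", "--custom-tokenizer", "glm_moe_dsa"])
# The parsed value would then be forwarded into get_tokenizer(...).
print(args.custom_tokenizer)  # glm_moe_dsa
```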
/bot run --disable-fail-fast
PR_Github #40821 [ run ] triggered by Bot. Commit:
PR_Github #40809 [ run ] completed with state
benchmark_serving.py accesses tokenizer.vocab_size for synthetic dataset generation. GlmMoeDsaTokenizer wraps PreTrainedTokenizerFast but does not delegate attribute access, so vocab_size was missing. Add a property that forwards to the inner tokenizer. Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
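The fix can be illustrated with a minimal wrapper; the stub below stands in for the wrapped `PreTrainedTokenizerFast`, and the vocab size is an arbitrary example value:

```python
class _InnerTokenizer:
    """Stand-in for the wrapped PreTrainedTokenizerFast."""
    vocab_size = 151552  # arbitrary example value


class GlmMoeDsaTokenizer:
    """Wrapper that holds an inner tokenizer but does not inherit from it,
    so attributes like vocab_size must be forwarded explicitly."""

    def __init__(self, inner):
        self._tokenizer = inner

    @property
    def vocab_size(self) -> int:
        # Forward to the inner tokenizer so callers such as
        # benchmark_serving.py's synthetic dataset generation keep working.
        return self._tokenizer.vocab_size


tok = GlmMoeDsaTokenizer(_InnerTokenizer())
print(tok.vocab_size)  # 151552
```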
/bot run --disable-fail-fast
PR_Github #40893 [ run ] triggered by Bot. Commit:
PR_Github #40893 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #41050 [ run ] triggered by Bot. Commit:
/bot run
PR_Github #41461 [ run ] triggered by Bot. Commit:
PR_Github #41461 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #41530 [ run ] triggered by Bot. Commit:
PR_Github #41530 [ run ] completed with state
/bot run --disable-fail-fast
/bot run --disable-fail-fast
PR_Github #41770 [ run ] triggered by Bot. Commit:
PR_Github #41771 [ run ] triggered by Bot. Commit:
/bot run --disable-fail-fast
PR_Github #41780 [ run ] triggered by Bot. Commit:
PR_Github #41780 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #41814 [ run ] triggered by Bot. Commit:
PR_Github #41814 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #41816 [ run ] triggered by Bot. Commit:
PR_Github #41816 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #41817 [ run ] triggered by Bot. Commit:
PR_Github #41817 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #41849 [ run ] triggered by Bot. Commit:
PR_Github #41849 [ run ] completed with state
…A#12586) Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
Summary
- Models like GLM-5 ship `tokenizer_config.json` with `tokenizer_class` set to `TokenizersBackend` (from transformers 5.x), which `AutoTokenizer` cannot load. `trtllm-serve` already supports `custom_tokenizer` via `LlmArgs`, but `trtllm-bench` calls `AutoTokenizer.from_pretrained()` directly for dataset preprocessing before `LlmArgs` is constructed, so `--extra_llm_api_options` with `custom_tokenizer` cannot help.
- Add `--custom_tokenizer` CLI option to both `throughput` and `latency` sub-commands, accepting built-in aliases (`deepseek_v32`, `glm_moe_dsa`) or fully-qualified `module.path.ClassName`.
- Promote `TOKENIZER_ALIASES` from a local variable in `BaseLlmArgs` to module level in `llm_args.py` so it can be shared with bench utilities.

Usage
Test plan
- `trtllm-bench throughput --custom_tokenizer glm_moe_dsa` loads GLM-5 tokenizer successfully
- `trtllm-bench throughput` without `--custom_tokenizer` still uses `AutoTokenizer` (no regression for DeepSeek V3.2, Llama, etc.)
- `trtllm-bench latency --custom_tokenizer glm_moe_dsa` works similarly

🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
- Added `--custom_tokenizer` CLI option to latency and throughput benchmark commands, enabling users to specify custom tokenizers by alias or fully-qualified class path.

Chores