
[None][feat] Add --custom_tokenizer CLI option to trtllm-bench#12586

Merged
longlee0622 merged 7 commits into NVIDIA:main from qiaoxj07:feat/bench-custom-tokenizer
Apr 5, 2026

Conversation

@qiaoxj07
Collaborator

@qiaoxj07 qiaoxj07 commented Mar 30, 2026

Summary

  • Models like GLM-5 ship a tokenizer_config.json whose tokenizer_class is set to TokenizersBackend (introduced in transformers 5.x), which AutoTokenizer cannot load.
  • trtllm-serve already supports custom_tokenizer via LlmArgs, but trtllm-bench calls AutoTokenizer.from_pretrained() directly for dataset preprocessing before LlmArgs is constructed, so passing custom_tokenizer through --extra_llm_api_options has no effect.
  • Add a --custom_tokenizer CLI option to both the throughput and latency sub-commands, accepting a built-in alias (deepseek_v32, glm_moe_dsa) or a fully-qualified module.path.ClassName.
  • Promote TOKENIZER_ALIASES from a local variable in BaseLlmArgs to a module-level constant in llm_args.py so it can be shared with the bench utilities.

Usage

trtllm-bench --model /path/to/GLM-5-NVFP4 throughput \
  --custom_tokenizer glm_moe_dsa \
  --dataset <path> ...
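The alias-or-dotted-path resolution described in the summary can be sketched as below. This is a minimal illustration, not the merged code: `resolve_tokenizer_class` and the `aliases` parameter are hypothetical names standing in for the logic in `tensorrt_llm/bench/utils/data.py` and the TOKENIZER_ALIASES registry.

```python
from importlib import import_module


def resolve_tokenizer_class(custom_tokenizer: str, aliases: dict):
    """Map a built-in alias to its dotted path, then import the class.

    `aliases` stands in for the TOKENIZER_ALIASES registry in llm_args.py.
    """
    # Unknown names fall through unchanged, so fully-qualified paths work too.
    dotted = aliases.get(custom_tokenizer, custom_tokenizer)
    module_path, _, class_name = dotted.rpartition(".")
    if not module_path:
        raise ValueError(
            f"Expected an alias or 'module.path.ClassName', got {custom_tokenizer!r}")
    return getattr(import_module(module_path), class_name)
```

The resolved class is then used in place of AutoTokenizer, e.g. `resolve_tokenizer_class(name, aliases).from_pretrained(model_path)`.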

Test plan

  • Verify trtllm-bench throughput --custom_tokenizer glm_moe_dsa loads GLM-5 tokenizer successfully
  • Verify trtllm-bench throughput without --custom_tokenizer still uses AutoTokenizer (no regression for DeepSeek V3.2, Llama, etc.)
  • Verify trtllm-bench latency --custom_tokenizer glm_moe_dsa works similarly

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added --custom_tokenizer CLI option to latency and throughput benchmark commands, enabling users to specify custom tokenizers by alias or fully-qualified class path.
  • Chores

    • Optimized tokenizer alias resolution to reduce redundant reinitialization during benchmark operations.

Models like GLM-5 ship tokenizer_config.json with tokenizer_class set
to a non-standard class (e.g., TokenizersBackend from transformers 5.x)
that AutoTokenizer cannot load. trtllm-serve already supports a
custom_tokenizer LlmArgs field with built-in aliases, but trtllm-bench
calls AutoTokenizer.from_pretrained() directly before LlmArgs is
constructed, so --extra_llm_api_options cannot help.

Add --custom_tokenizer to both throughput and latency sub-commands.
The option accepts a built-in alias (deepseek_v32, glm_moe_dsa) or a
fully-qualified module.path.ClassName, reusing the same
TOKENIZER_ALIASES registry from llm_args.py (now promoted to module
level so it can be imported by the bench utilities).

Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
@qiaoxj07 qiaoxj07 requested review from a team as code owners March 30, 2026 08:32
@qiaoxj07
Collaborator Author

/bot run

@qiaoxj07 qiaoxj07 requested a review from dc3671 March 30, 2026 08:33
@coderabbitai
Contributor

coderabbitai bot commented Mar 30, 2026

📝 Walkthrough

Walkthrough

This change adds support for custom tokenizers to the benchmark suite by introducing a --custom_tokenizer CLI option to both latency and throughput commands. The initialize_tokenizer() function is updated to resolve custom tokenizers via TOKENIZER_ALIASES or dynamic import, and TOKENIZER_ALIASES is refactored to a module-level constant to avoid recreation on each call.

Changes

Cohort / File(s) Summary
CLI Option Addition
tensorrt_llm/bench/benchmark/low_latency.py, tensorrt_llm/bench/benchmark/throughput.py
Added --custom_tokenizer CLI option accepting a tokenizer alias or fully-qualified class path. Commands now extract this parameter and pass it to initialize_tokenizer().
Tokenizer Initialization Logic
tensorrt_llm/bench/utils/data.py
Extended initialize_tokenizer() to accept optional custom_tokenizer parameter. When provided, resolves via TOKENIZER_ALIASES or dynamically imports and instantiates the specified tokenizer class. Otherwise, uses default AutoTokenizer.from_pretrained() behavior.
Module Constants Refactoring
tensorrt_llm/llmapi/llm_args.py
Moved TOKENIZER_ALIASES from function-local to module-level constant to prevent repeated recreation during validation calls. Alias resolution logic remains unchanged.
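The module-constant refactor amounts to building the alias table once at import time so validators no longer recreate it per call. A minimal sketch, with placeholder entries rather than the real registry contents:

```python
# Built once when the module is imported; previously a function-local dict
# recreated on every validation call.
TOKENIZER_ALIASES = {
    "deepseek_v32": "some.module.DeepseekV32Tokenizer",  # placeholder path
    "glm_moe_dsa": "some.module.GlmMoeDsaTokenizer",     # placeholder path
}


def resolve_alias(name: str) -> str:
    # Names not in the registry pass through unchanged, so a
    # fully-qualified module.path.ClassName still works.
    return TOKENIZER_ALIASES.get(name, name)
```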

Sequence Diagram

sequenceDiagram
    participant User
    participant CLI
    participant BenchmarkCmd as latency/throughput Command
    participant InitTokenizer as initialize_tokenizer()
    participant Resolver as Tokenizer Resolver
    participant Tokenizer

    User->>CLI: --custom_tokenizer my.Module.Class
    CLI->>BenchmarkCmd: Parse options
    BenchmarkCmd->>BenchmarkCmd: Extract custom_tokenizer param
    BenchmarkCmd->>InitTokenizer: initialize_tokenizer(model_name, custom_tokenizer)
    
    alt custom_tokenizer provided
        InitTokenizer->>Resolver: Resolve via TOKENIZER_ALIASES<br/>or dynamic import
        Resolver->>Tokenizer: from_pretrained(model_name)
    else custom_tokenizer not provided
        InitTokenizer->>Tokenizer: AutoTokenizer.from_pretrained(model_name)
    end
    
    Tokenizer->>InitTokenizer: Return tokenizer instance
    InitTokenizer->>BenchmarkCmd: Return initialized tokenizer

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the main change: adding a --custom_tokenizer CLI option to trtllm-bench, which aligns with the core modifications across multiple files.
Description check ✅ Passed The PR description covers the motivation, implementation approach, usage example, and test plan, meeting the template requirements for explaining what and why.
Docstring Coverage ✅ Passed Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/bench/benchmark/low_latency.py`:
- Around line 100-108: The changed option block decorated with `@optgroup.option`
for "--custom_tokenizer" is failing YAPF formatting; reformat that block (and
the similar block around lines 212-215) to comply with the project's YAPF style
by running the formatter (e.g., pre-commit run yapf --files
tensorrt_llm/bench/benchmark/low_latency.py or yapf -i on that file) and commit
the rewritten/normalized code so the option declaration for "--custom_tokenizer"
(and the other affected option) matches the project's line-wrapping and
indentation rules.

In `@tensorrt_llm/bench/benchmark/throughput.py`:
- Around line 131-139: YAPF formatting failed for the updated optgroup.option
block that defines the "--custom_tokenizer" option; run the project's YAPF
formatter on tensorrt_llm/bench/benchmark/throughput.py (and the nearby affected
block around the optgroup.option at the other mentioned location) to rewrite the
option declaration to conform to style rules (wrap/align the multi-line help
string and arguments as the formatter expects) so the file passes CI; locate the
option by the optgroup.option decorator and the "--custom_tokenizer" option name
when applying the formatter.

In `@tensorrt_llm/bench/utils/data.py`:
- Around line 29-40: Wrap the custom tokenizer resolution and import logic (the
block that uses TOKENIZER_ALIASES, tokenizer_path, tokenizer_path.rsplit(...),
import_module, getattr and tokenizer_class.from_pretrained) in a try/except that
validates inputs and converts low-level errors into a clear, user-facing
exception; specifically check that custom_tokenizer is a non-empty string, use
rsplit safely and ensure module_path and class_name are present, catch
ImportError/ModuleNotFoundError/AttributeError/ValueError and re-raise a
ValueError (or a custom exception) with a message like "Invalid custom_tokenizer
'{custom_tokenizer}': could not resolve '<module>.<Class>' - <original error>"
so callers see a concise, actionable error instead of an opaque traceback.
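The guard suggested above could look like the following sketch. The function name, `model_dir` parameter, and error wording are illustrative, not the exact code merged into data.py:

```python
from importlib import import_module


def load_custom_tokenizer(custom_tokenizer, model_dir):
    """Resolve and instantiate a tokenizer class, surfacing a clear error."""
    if not isinstance(custom_tokenizer, str) or not custom_tokenizer:
        raise ValueError("custom_tokenizer must be a non-empty string")
    module_path, _, class_name = custom_tokenizer.rpartition(".")
    if not module_path or not class_name:
        raise ValueError(f"Invalid custom_tokenizer {custom_tokenizer!r}: "
                         "expected 'module.path.ClassName'")
    try:
        tokenizer_class = getattr(import_module(module_path), class_name)
        return tokenizer_class.from_pretrained(model_dir)
    except (ImportError, AttributeError, ValueError) as err:
        # Convert low-level import/lookup failures into one actionable message.
        raise ValueError(
            f"Invalid custom_tokenizer {custom_tokenizer!r}: could not resolve "
            f"'{module_path}.{class_name}' - {err}") from err
```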

In `@tensorrt_llm/llmapi/llm_args.py`:
- Around line 2389-2395: YAPF formatting is failing CI for the TOKENIZER_ALIASES
block in tensorrt_llm.llmapi.llm_args (symbol TOKENIZER_ALIASES); run the
project's YAPF pre-commit or formatting command on this file, reformat the
dictionary to match the repo style (e.g., proper line breaks, indentation and
trailing comma rules), and re-commit the updated llm_args.py so pre-commit/yapf
no longer modifies it in CI.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c4532387-2696-46ce-8865-d1dcc9d89279

📥 Commits

Reviewing files that changed from the base of the PR and between 58d6975 and 873c4f0.

📒 Files selected for processing (4)
  • tensorrt_llm/bench/benchmark/low_latency.py
  • tensorrt_llm/bench/benchmark/throughput.py
  • tensorrt_llm/bench/utils/data.py
  • tensorrt_llm/llmapi/llm_args.py

@tensorrt-cicd
Collaborator

PR_Github #40701 [ run ] triggered by Bot. Commit: 873c4f0 Link to invocation

Collaborator

@dc3671 dc3671 left a comment


LGTM

Add try/except around custom tokenizer loading in initialize_tokenizer
for clear error messages. Run yapf on all changed files.

Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
@qiaoxj07
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #40709 [ run ] triggered by Bot. Commit: 7cb48c9 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40701 [ run ] completed with state ABORTED. Commit: 873c4f0

Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40709 [ run ] completed with state SUCCESS. Commit: 7cb48c9
/LLM/main/L0_MergeRequest_PR pipeline #31737 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@qiaoxj07
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #40809 [ run ] triggered by Bot. Commit: 7cb48c9 Link to invocation

Same TokenizersBackend issue as trtllm-bench: benchmark_serving.py
calls AutoTokenizer.from_pretrained() directly via get_tokenizer(),
which fails for models like GLM-5. Add --custom-tokenizer CLI arg
and wire it through get_tokenizer() using the shared
TOKENIZER_ALIASES registry.

Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
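The serving-benchmark wiring described in this commit can be sketched with argparse. This is a minimal illustration modeled on the commit message; the flag help text and defaults are assumptions, and `get_tokenizer` is the existing helper referenced above:

```python
import argparse

# Sketch of adding the flag to a serving-benchmark argument parser.
parser = argparse.ArgumentParser(description="serving benchmark (sketch)")
parser.add_argument("--model", required=True)
parser.add_argument(
    "--custom-tokenizer",
    default=None,
    help="Built-in alias (e.g. glm_moe_dsa) or fully-qualified "
         "module.path.ClassName; AutoTokenizer is used when omitted.")

args = parser.parse_args(
    ["--model", "/models/GLM-5", "--custom-tokenizer", "glm_moe_dsa"])
# The parsed value would then be forwarded, e.g.
#   get_tokenizer(args.model, custom_tokenizer=args.custom_tokenizer)
# which consults the shared TOKENIZER_ALIASES registry before falling back
# to AutoTokenizer.from_pretrained().
```

Note argparse exposes `--custom-tokenizer` as `args.custom_tokenizer`.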
@qiaoxj07
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #40821 [ run ] triggered by Bot. Commit: 5a8d753 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40809 [ run ] completed with state ABORTED. Commit: 7cb48c9

Link to invocation

benchmark_serving.py accesses tokenizer.vocab_size for synthetic dataset
generation. GlmMoeDsaTokenizer wraps PreTrainedTokenizerFast but does
not delegate attribute access, so vocab_size was missing. Add a
property that forwards to the inner tokenizer.

Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
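The delegation fix described in this commit is the classic composition pitfall: a wrapper holding an inner tokenizer does not inherit its attributes, so `vocab_size` must be forwarded explicitly. A minimal sketch with illustrative class names (not the actual GlmMoeDsaTokenizer code):

```python
class InnerFastTokenizer:
    """Stands in for the wrapped PreTrainedTokenizerFast instance."""
    vocab_size = 151552  # placeholder value


class GlmStyleWrapper:
    def __init__(self, tokenizer):
        self._tokenizer = tokenizer  # composed, not subclassed

    @property
    def vocab_size(self) -> int:
        # Forward to the inner tokenizer so callers like benchmark_serving.py
        # can read vocab_size directly off the wrapper.
        return self._tokenizer.vocab_size
```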
@qiaoxj07
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #40893 [ run ] triggered by Bot. Commit: f17b7f6 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40893 [ run ] completed with state SUCCESS. Commit: f17b7f6
/LLM/main/L0_MergeRequest_PR pipeline #31895 completed with status: 'FAILURE'

CI Report


Link to invocation

@qiaoxj07
Collaborator Author

qiaoxj07 commented Apr 1, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #41050 [ run ] triggered by Bot. Commit: f17b7f6 Link to invocation

@qiaoxj07
Collaborator Author

qiaoxj07 commented Apr 2, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #41461 [ run ] triggered by Bot. Commit: 2af36c2 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41461 [ run ] completed with state SUCCESS. Commit: 2af36c2
/LLM/main/L0_MergeRequest_PR pipeline #32392 completed with status: 'FAILURE'

CI Report


Link to invocation

@longlee0622
Collaborator

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #41530 [ run ] triggered by Bot. Commit: 2af36c2 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41530 [ run ] completed with state SUCCESS. Commit: 2af36c2
/LLM/main/L0_MergeRequest_PR pipeline #32446 completed with status: 'FAILURE'

CI Report


Link to invocation

@qiaoxj07
Collaborator Author

qiaoxj07 commented Apr 4, 2026

/bot run --disable-fail-fast

1 similar comment
@qiaoxj07
Collaborator Author

qiaoxj07 commented Apr 4, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #41770 [ run ] triggered by Bot. Commit: 2af36c2 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41771 [ run ] triggered by Bot. Commit: 2af36c2 Link to invocation

@qiaoxj07
Collaborator Author

qiaoxj07 commented Apr 4, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #41780 [ run ] triggered by Bot. Commit: 5baae27 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41780 [ run ] completed with state SUCCESS. Commit: 5baae27
/LLM/main/L0_MergeRequest_PR pipeline #32676 completed with status: 'FAILURE'

CI Report


Link to invocation

@qiaoxj07
Collaborator Author

qiaoxj07 commented Apr 4, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #41814 [ run ] triggered by Bot. Commit: 5baae27 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41814 [ run ] completed with state DISABLED
CI server is currently disabled for scheduled maintenance. Estimated completion time: 9 PM PST on 4/4.

Link to invocation

@qiaoxj07
Collaborator Author

qiaoxj07 commented Apr 4, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #41816 [ run ] triggered by Bot. Commit: 5baae27 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41816 [ run ] completed with state DISABLED
CI server is currently disabled for scheduled maintenance. Estimated completion time: 9 PM PST on 4/4.

Link to invocation

@qiaoxj07
Collaborator Author

qiaoxj07 commented Apr 4, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #41817 [ run ] triggered by Bot. Commit: 5baae27 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41817 [ run ] completed with state DISABLED
CI server is currently disabled for scheduled maintenance. Estimated completion time: 9 PM PST on 4/4.

Link to invocation

@qiaoxj07
Collaborator Author

qiaoxj07 commented Apr 5, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #41849 [ run ] triggered by Bot. Commit: 5baae27 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41849 [ run ] completed with state SUCCESS. Commit: 5baae27
/LLM/main/L0_MergeRequest_PR pipeline #32719 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

CI Report

Link to invocation

@longlee0622 longlee0622 merged commit 6d0a8b3 into NVIDIA:main Apr 5, 2026
5 checks passed
yufeiwu-nv pushed a commit to yufeiwu-nv/TensorRT-LLM that referenced this pull request Apr 7, 2026
…A#12586)

Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
karen-sy pushed a commit to karen-sy/TensorRT-LLM that referenced this pull request Apr 7, 2026
…A#12586)

Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>