
[None][feat] Add --custom_tokenizer CLI option to trtllm-bench#12586

Merged
longlee0622 merged 7 commits into NVIDIA:main from qiaoxj07:feat/bench-custom-tokenizer
Apr 5, 2026

Conversation

@qiaoxj07
Collaborator

@qiaoxj07 qiaoxj07 commented Mar 30, 2026

Summary

  • Models like GLM-5 ship a tokenizer_config.json whose tokenizer_class is set to TokenizersBackend (introduced in transformers 5.x), which AutoTokenizer cannot load.
  • trtllm-serve already supports custom_tokenizer via LlmArgs, but trtllm-bench calls AutoTokenizer.from_pretrained() directly for dataset preprocessing before LlmArgs is constructed, so passing custom_tokenizer through --extra_llm_api_options has no effect.
  • Add a --custom_tokenizer CLI option to both the throughput and latency sub-commands, accepting a built-in alias (deepseek_v32, glm_moe_dsa) or a fully-qualified module.path.ClassName.
  • Promote TOKENIZER_ALIASES from a local variable in BaseLlmArgs to a module-level constant in llm_args.py so it can be shared with the bench utilities.

Usage

trtllm-bench --model /path/to/GLM-5-NVFP4 throughput \
  --custom_tokenizer glm_moe_dsa \
  --dataset <path> ...
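The alias-or-dotted-path resolution described in the summary can be sketched as below. This is a minimal illustration, not the merged code: `resolve_tokenizer_class` and the `aliases` parameter are hypothetical names standing in for the logic in `tensorrt_llm/bench/utils/data.py` and the TOKENIZER_ALIASES registry.

```python
from importlib import import_module


def resolve_tokenizer_class(custom_tokenizer: str, aliases: dict):
    """Map a built-in alias to its dotted path, then import the class.

    `aliases` stands in for the TOKENIZER_ALIASES registry in llm_args.py.
    """
    # Unknown names fall through unchanged, so fully-qualified paths work too.
    dotted = aliases.get(custom_tokenizer, custom_tokenizer)
    module_path, _, class_name = dotted.rpartition(".")
    if not module_path:
        raise ValueError(
            f"Expected an alias or 'module.path.ClassName', got {custom_tokenizer!r}")
    return getattr(import_module(module_path), class_name)
```

The resolved class is then used in place of AutoTokenizer, e.g. `resolve_tokenizer_class(name, aliases).from_pretrained(model_path)`.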

Test plan

  • Verify trtllm-bench throughput --custom_tokenizer glm_moe_dsa loads GLM-5 tokenizer successfully
  • Verify trtllm-bench throughput without --custom_tokenizer still uses AutoTokenizer (no regression for DeepSeek V3.2, Llama, etc.)
  • Verify trtllm-bench latency --custom_tokenizer glm_moe_dsa works similarly

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added --custom_tokenizer CLI option to latency and throughput benchmark commands, enabling users to specify custom tokenizers by alias or fully-qualified class path.
  • Chores

    • Optimized tokenizer alias resolution to reduce redundant reinitialization during benchmark operations.

Models like GLM-5 ship tokenizer_config.json with tokenizer_class set
to a non-standard class (e.g., TokenizersBackend from transformers 5.x)
that AutoTokenizer cannot load. trtllm-serve already supports a
custom_tokenizer LlmArgs field with built-in aliases, but trtllm-bench
calls AutoTokenizer.from_pretrained() directly before LlmArgs is
constructed, so --extra_llm_api_options cannot help.

Add --custom_tokenizer to both throughput and latency sub-commands.
The option accepts a built-in alias (deepseek_v32, glm_moe_dsa) or a
fully-qualified module.path.ClassName, reusing the same
TOKENIZER_ALIASES registry from llm_args.py (now promoted to module
level so it can be imported by the bench utilities).

Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
@qiaoxj07 qiaoxj07 requested review from a team as code owners March 30, 2026 08:32
@qiaoxj07
Collaborator Author

/bot run

@qiaoxj07 qiaoxj07 requested a review from dc3671 March 30, 2026 08:33
@coderabbitai
Contributor

coderabbitai bot commented Mar 30, 2026

📝 Walkthrough

Walkthrough

This change adds support for custom tokenizers to the benchmark suite by introducing a --custom_tokenizer CLI option to both latency and throughput commands. The initialize_tokenizer() function is updated to resolve custom tokenizers via TOKENIZER_ALIASES or dynamic import, and TOKENIZER_ALIASES is refactored to a module-level constant to avoid recreation on each call.

Changes

Cohort / File(s) Summary
CLI Option Addition
tensorrt_llm/bench/benchmark/low_latency.py, tensorrt_llm/bench/benchmark/throughput.py
Added --custom_tokenizer CLI option accepting a tokenizer alias or fully-qualified class path. Commands now extract this parameter and pass it to initialize_tokenizer().
Tokenizer Initialization Logic
tensorrt_llm/bench/utils/data.py
Extended initialize_tokenizer() to accept optional custom_tokenizer parameter. When provided, resolves via TOKENIZER_ALIASES or dynamically imports and instantiates the specified tokenizer class. Otherwise, uses default AutoTokenizer.from_pretrained() behavior.
Module Constants Refactoring
tensorrt_llm/llmapi/llm_args.py
Moved TOKENIZER_ALIASES from function-local to module-level constant to prevent repeated recreation during validation calls. Alias resolution logic remains unchanged.
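The module-constant refactor amounts to building the alias table once at import time so validators no longer recreate it per call. A minimal sketch, with placeholder entries rather than the real registry contents:

```python
# Built once when the module is imported; previously a function-local dict
# recreated on every validation call.
TOKENIZER_ALIASES = {
    "deepseek_v32": "some.module.DeepseekV32Tokenizer",  # placeholder path
    "glm_moe_dsa": "some.module.GlmMoeDsaTokenizer",     # placeholder path
}


def resolve_alias(name: str) -> str:
    # Names not in the registry pass through unchanged, so a
    # fully-qualified module.path.ClassName still works.
    return TOKENIZER_ALIASES.get(name, name)
```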

Sequence Diagram

sequenceDiagram
    participant User
    participant CLI
    participant BenchmarkCmd as latency/throughput Command
    participant InitTokenizer as initialize_tokenizer()
    participant Resolver as Tokenizer Resolver
    participant Tokenizer

    User->>CLI: --custom_tokenizer my.Module.Class
    CLI->>BenchmarkCmd: Parse options
    BenchmarkCmd->>BenchmarkCmd: Extract custom_tokenizer param
    BenchmarkCmd->>InitTokenizer: initialize_tokenizer(model_name, custom_tokenizer)
    
    alt custom_tokenizer provided
        InitTokenizer->>Resolver: Resolve via TOKENIZER_ALIASES<br/>or dynamic import
        Resolver->>Tokenizer: from_pretrained(model_name)
    else custom_tokenizer not provided
        InitTokenizer->>Tokenizer: AutoTokenizer.from_pretrained(model_name)
    end
    
    Tokenizer->>InitTokenizer: Return tokenizer instance
    InitTokenizer->>BenchmarkCmd: Return initialized tokenizer

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the main change: adding a --custom_tokenizer CLI option to trtllm-bench, which aligns with the core modifications across multiple files.
Description check ✅ Passed The PR description covers the motivation, implementation approach, usage example, and test plan, meeting the template requirements for explaining what and why.
Docstring Coverage ✅ Passed Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/bench/benchmark/low_latency.py`:
- Around line 100-108: The changed option block decorated with `@optgroup.option`
for "--custom_tokenizer" is failing YAPF formatting; reformat that block (and
the similar block around lines 212-215) to comply with the project's YAPF style
by running the formatter (e.g., pre-commit run yapf --files
tensorrt_llm/bench/benchmark/low_latency.py or yapf -i on that file) and commit
the rewritten/normalized code so the option declaration for "--custom_tokenizer"
(and the other affected option) matches the project's line-wrapping and
indentation rules.

In `@tensorrt_llm/bench/benchmark/throughput.py`:
- Around line 131-139: YAPF formatting failed for the updated optgroup.option
block that defines the "--custom_tokenizer" option; run the project's YAPF
formatter on tensorrt_llm/bench/benchmark/throughput.py (and the nearby affected
block around the optgroup.option at the other mentioned location) to rewrite the
option declaration to conform to style rules (wrap/align the multi-line help
string and arguments as the formatter expects) so the file passes CI; locate the
option by the optgroup.option decorator and the "--custom_tokenizer" option name
when applying the formatter.

In `@tensorrt_llm/bench/utils/data.py`:
- Around line 29-40: Wrap the custom tokenizer resolution and import logic (the
block that uses TOKENIZER_ALIASES, tokenizer_path, tokenizer_path.rsplit(...),
import_module, getattr and tokenizer_class.from_pretrained) in a try/except that
validates inputs and converts low-level errors into a clear, user-facing
exception; specifically check that custom_tokenizer is a non-empty string, use
rsplit safely and ensure module_path and class_name are present, catch
ImportError/ModuleNotFoundError/AttributeError/ValueError and re-raise a
ValueError (or a custom exception) with a message like "Invalid custom_tokenizer
'{custom_tokenizer}': could not resolve '<module>.<Class>' - <original error>"
so callers see a concise, actionable error instead of an opaque traceback.
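The guard suggested above could look like the following sketch. The function name, `model_dir` parameter, and error wording are illustrative, not the exact code merged into data.py:

```python
from importlib import import_module


def load_custom_tokenizer(custom_tokenizer, model_dir):
    """Resolve and instantiate a tokenizer class, surfacing a clear error."""
    if not isinstance(custom_tokenizer, str) or not custom_tokenizer:
        raise ValueError("custom_tokenizer must be a non-empty string")
    module_path, _, class_name = custom_tokenizer.rpartition(".")
    if not module_path or not class_name:
        raise ValueError(f"Invalid custom_tokenizer {custom_tokenizer!r}: "
                         "expected 'module.path.ClassName'")
    try:
        tokenizer_class = getattr(import_module(module_path), class_name)
        return tokenizer_class.from_pretrained(model_dir)
    except (ImportError, AttributeError, ValueError) as err:
        # Convert low-level import/lookup failures into one actionable message.
        raise ValueError(
            f"Invalid custom_tokenizer {custom_tokenizer!r}: could not resolve "
            f"'{module_path}.{class_name}' - {err}") from err
```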

In `@tensorrt_llm/llmapi/llm_args.py`:
- Around line 2389-2395: YAPF formatting is failing CI for the TOKENIZER_ALIASES
block in tensorrt_llm.llmapi.llm_args (symbol TOKENIZER_ALIASES); run the
project's YAPF pre-commit or formatting command on this file, reformat the
dictionary to match the repo style (e.g., proper line breaks, indentation and
trailing comma rules), and re-commit the updated llm_args.py so pre-commit/yapf
no longer modifies it in CI.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c4532387-2696-46ce-8865-d1dcc9d89279

📥 Commits

Reviewing files that changed from the base of the PR and between 58d6975 and 873c4f0.

📒 Files selected for processing (4)
  • tensorrt_llm/bench/benchmark/low_latency.py
  • tensorrt_llm/bench/benchmark/throughput.py
  • tensorrt_llm/bench/utils/data.py
  • tensorrt_llm/llmapi/llm_args.py

@tensorrt-cicd
Collaborator

PR_Github #40701 [ run ] triggered by Bot. Commit: 873c4f0 Link to invocation

Collaborator

@dc3671 dc3671 left a comment


LGTM

Add try/except around custom tokenizer loading in initialize_tokenizer
for clear error messages. Run yapf on all changed files.

Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
@qiaoxj07
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #40709 [ run ] triggered by Bot. Commit: 7cb48c9 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40701 [ run ] completed with state ABORTED. Commit: 873c4f0

Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40709 [ run ] completed with state SUCCESS. Commit: 7cb48c9
/LLM/main/L0_MergeRequest_PR pipeline #31737 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@qiaoxj07
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #40809 [ run ] triggered by Bot. Commit: 7cb48c9 Link to invocation

Same TokenizersBackend issue as trtllm-bench: benchmark_serving.py
calls AutoTokenizer.from_pretrained() directly via get_tokenizer(),
which fails for models like GLM-5. Add --custom-tokenizer CLI arg
and wire it through get_tokenizer() using the shared
TOKENIZER_ALIASES registry.

Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
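The serving-benchmark wiring described in this commit can be sketched with argparse. This is a minimal illustration modeled on the commit message; the flag help text and defaults are assumptions, and `get_tokenizer` is the existing helper referenced above:

```python
import argparse

# Sketch of adding the flag to a serving-benchmark argument parser.
parser = argparse.ArgumentParser(description="serving benchmark (sketch)")
parser.add_argument("--model", required=True)
parser.add_argument(
    "--custom-tokenizer",
    default=None,
    help="Built-in alias (e.g. glm_moe_dsa) or fully-qualified "
         "module.path.ClassName; AutoTokenizer is used when omitted.")

args = parser.parse_args(
    ["--model", "/models/GLM-5", "--custom-tokenizer", "glm_moe_dsa"])
# The parsed value would then be forwarded, e.g.
#   get_tokenizer(args.model, custom_tokenizer=args.custom_tokenizer)
# which consults the shared TOKENIZER_ALIASES registry before falling back
# to AutoTokenizer.from_pretrained().
```

Note argparse exposes `--custom-tokenizer` as `args.custom_tokenizer`.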
@qiaoxj07
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #40821 [ run ] triggered by Bot. Commit: 5a8d753 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40809 [ run ] completed with state ABORTED. Commit: 7cb48c9

Link to invocation

benchmark_serving.py accesses tokenizer.vocab_size for synthetic dataset
generation. GlmMoeDsaTokenizer wraps PreTrainedTokenizerFast but does
not delegate attribute access, so vocab_size was missing. Add a
property that forwards to the inner tokenizer.

Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
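The delegation fix described in this commit is the classic composition pitfall: a wrapper holding an inner tokenizer does not inherit its attributes, so `vocab_size` must be forwarded explicitly. A minimal sketch with illustrative class names (not the actual GlmMoeDsaTokenizer code):

```python
class InnerFastTokenizer:
    """Stands in for the wrapped PreTrainedTokenizerFast instance."""
    vocab_size = 151552  # placeholder value


class GlmStyleWrapper:
    def __init__(self, tokenizer):
        self._tokenizer = tokenizer  # composed, not subclassed

    @property
    def vocab_size(self) -> int:
        # Forward to the inner tokenizer so callers like benchmark_serving.py
        # can read vocab_size directly off the wrapper.
        return self._tokenizer.vocab_size
```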
@qiaoxj07
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #40893 [ run ] triggered by Bot. Commit: f17b7f6 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40893 [ run ] completed with state SUCCESS. Commit: f17b7f6
/LLM/main/L0_MergeRequest_PR pipeline #31895 completed with status: 'FAILURE'

CI Report


Link to invocation

@qiaoxj07
Collaborator Author

qiaoxj07 commented Apr 1, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #41050 [ run ] triggered by Bot. Commit: f17b7f6 Link to invocation

@qiaoxj07
Collaborator Author

qiaoxj07 commented Apr 2, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #41461 [ run ] triggered by Bot. Commit: 2af36c2 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41461 [ run ] completed with state SUCCESS. Commit: 2af36c2
/LLM/main/L0_MergeRequest_PR pipeline #32392 completed with status: 'FAILURE'

CI Report


Link to invocation

@longlee0622
Collaborator

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #41530 [ run ] triggered by Bot. Commit: 2af36c2 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41530 [ run ] completed with state SUCCESS. Commit: 2af36c2
/LLM/main/L0_MergeRequest_PR pipeline #32446 completed with status: 'FAILURE'

CI Report


Link to invocation

@qiaoxj07
Collaborator Author

qiaoxj07 commented Apr 4, 2026

/bot run --disable-fail-fast

1 similar comment
@qiaoxj07
Collaborator Author

qiaoxj07 commented Apr 4, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #41770 [ run ] triggered by Bot. Commit: 2af36c2 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41771 [ run ] triggered by Bot. Commit: 2af36c2 Link to invocation

@qiaoxj07
Collaborator Author

qiaoxj07 commented Apr 4, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #41780 [ run ] triggered by Bot. Commit: 5baae27 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41780 [ run ] completed with state SUCCESS. Commit: 5baae27
/LLM/main/L0_MergeRequest_PR pipeline #32676 completed with status: 'FAILURE'

CI Report


Link to invocation

@qiaoxj07
Collaborator Author

qiaoxj07 commented Apr 4, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #41814 [ run ] triggered by Bot. Commit: 5baae27 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41814 [ run ] completed with state DISABLED
CI server is currently disabled for scheduled maintenance. Estimated completion time: 9 PM PST on 4/4.

Link to invocation

@qiaoxj07
Collaborator Author

qiaoxj07 commented Apr 4, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #41816 [ run ] triggered by Bot. Commit: 5baae27 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41816 [ run ] completed with state DISABLED
CI server is currently disabled for scheduled maintenance. Estimated completion time: 9 PM PST on 4/4.

Link to invocation

@qiaoxj07
Collaborator Author

qiaoxj07 commented Apr 4, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #41817 [ run ] triggered by Bot. Commit: 5baae27 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41817 [ run ] completed with state DISABLED
CI server is currently disabled for scheduled maintenance. Estimated completion time: 9 PM PST on 4/4.

Link to invocation

@qiaoxj07
Collaborator Author

qiaoxj07 commented Apr 5, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #41849 [ run ] triggered by Bot. Commit: 5baae27 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41849 [ run ] completed with state SUCCESS. Commit: 5baae27
/LLM/main/L0_MergeRequest_PR pipeline #32719 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

CI Report

Link to invocation

@longlee0622 longlee0622 merged commit 6d0a8b3 into NVIDIA:main Apr 5, 2026
5 checks passed
yufeiwu-nv pushed a commit to yufeiwu-nv/TensorRT-LLM that referenced this pull request Apr 7, 2026
…A#12586)

Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
karen-sy pushed a commit to karen-sy/TensorRT-LLM that referenced this pull request Apr 7, 2026
…A#12586)

Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>