[None][feat] Add multi-turn support for trtllm-bench #12468
cascade812 merged 5 commits into NVIDIA:main from
Conversation
Signed-off-by: Guiju Zhang <7135567+cascade812@users.noreply.github.com>
📝 Walkthrough

Multi-turn conversation benchmarking support is added to the throughput system. Dataset loading now detects multi-turn requests and extracts their associated metadata. When multi-turn requests are present, a tokenizer is passed through the benchmarking pipeline to enable sequential turn-by-turn processing within a single concurrency slot, with token counts aggregated across all turns per request.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Dataset
    participant Coordinator as Benchmark<br/>Coordinator
    participant AsyncMgr as Async<br/>Manager
    participant LlmMgr as LLM<br/>Manager
    participant Tokenizer

    Dataset->>Coordinator: Load requests (with is_multi_turn flag)
    Coordinator->>Coordinator: Detect multi-turn requests
    alt Has Multi-Turn Requests
        Coordinator->>Tokenizer: Initialize tokenizer
        Coordinator->>AsyncMgr: async_benchmark(tokenizer=multi_turn_tokenizer)
    else No Multi-Turn Requests
        Coordinator->>AsyncMgr: async_benchmark(tokenizer=None)
    end
    AsyncMgr->>LlmMgr: Store tokenizer on manager
    loop For Each Request
        AsyncMgr->>AsyncMgr: Check request.is_multi_turn
        alt is_multi_turn && tokenizer present
            AsyncMgr->>LlmMgr: process_multi_turn_request()
            loop For Each Turn
                LlmMgr->>LlmMgr: Build chat message
                LlmMgr->>LlmMgr: Generate with streaming=False
                LlmMgr->>Tokenizer: Decode assistant tokens
                LlmMgr->>LlmMgr: Accumulate token counts
            end
            LlmMgr->>AsyncMgr: Emit single PerfItemTuple(all turns)
        else Single-turn
            AsyncMgr->>LlmMgr: _process_single_request()
            LlmMgr->>AsyncMgr: Emit PerfItemTuple
        end
    end
```
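The turn-by-turn dispatch in the diagram can be sketched as follows. This is an illustrative stand-in only: `process_request`, `llm.generate`, and the request fields are hypothetical names, not the actual trtllm-bench API.

```python
# Hypothetical sketch of the per-request dispatch shown in the sequence
# diagram; real trtllm-bench manager classes and method names differ.
def process_request(request, tokenizer, llm):
    """Route a request to multi-turn or single-turn handling."""
    if getattr(request, "is_multi_turn", False) and tokenizer is not None:
        messages = []
        total_output_tokens = 0
        for user_turn in request.turns:
            messages.append({"role": "user", "content": user_turn})
            # Non-streaming generation for each turn (streaming=False above).
            output_tokens = llm.generate(messages)
            total_output_tokens += len(output_tokens)
            # Decode assistant tokens and feed them back as context
            # for the next turn.
            messages.append({"role": "assistant",
                             "content": tokenizer.decode(output_tokens)})
        # One aggregated count (one PerfItemTuple) for the whole request.
        return total_output_tokens
    # Single-turn path: one generation per request.
    return len(llm.generate([{"role": "user", "content": request.prompt}]))
```

The key property is that all turns of one request run sequentially inside the same concurrency slot, so a multi-turn request occupies exactly one slot for its full lifetime.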
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (2 warnings)
Actionable comments posted: 1
🧹 Nitpick comments (1)
tensorrt_llm/bench/utils/data.py (1)
166-168: Consider adding `strict=True` to `zip()` for safety.

All lists (`prompts`, `all_logits`, `all_osl`, etc.) are appended in lockstep within the same loop, so they should always have the same length. However, adding `strict=True` would catch any future bugs where the lists diverge.

♻️ Proposed fix

```diff
- for prompt, logits, osl, task_id, lora_request, turns, category, question_id in zip(
-         prompts, all_logits, all_osl, task_ids, lora_requests, all_turns,
-         all_categories, all_question_ids):
+ for prompt, logits, osl, task_id, lora_request, turns, category, question_id in zip(
+         prompts, all_logits, all_osl, task_ids, lora_requests, all_turns,
+         all_categories, all_question_ids, strict=True):
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/bench/utils/data.py` around lines 166 - 168, The zip over prompts, all_logits, all_osl, task_ids, lora_requests, all_turns, all_categories, all_question_ids should be made strict to catch any length mismatches at runtime; update the for-loop header that currently iterates with zip(prompts, all_logits, all_osl, task_ids, lora_requests, all_turns, all_categories, all_question_ids) to call zip(..., strict=True) so a ValueError is raised if any list lengths diverge, leaving the rest of the loop intact and running tests to ensure compatibility with your Python version.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tensorrt_llm/bench/utils/data.py`:
- Around line 118-128: The code can raise IndexError when turns is an empty list
because prompts.append uses turns[0]; change the conditional and append logic in
the block handling turns so you only access turns[0] when turns and len(turns)>0
(e.g., compute prompt_val = data.get("prompt") or (turns[0] if len(turns) > 0
else None)) and then append prompt_val to prompts; keep the other appends
(media_paths, all_logits, all_turns, all_categories, all_question_ids, all_osl,
task_ids) unchanged but ensure all_turns still appends turns (even if empty) as
currently intended.
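The guarded access the comment asks for can be sketched as a small helper; `extract_prompt` is a hypothetical name introduced here for illustration, since the real fix is inlined in the loop in `data.py`.

```python
def extract_prompt(data: dict, turns: list):
    """Return the request prompt, falling back to the first turn.

    Only touches turns[0] when turns is non-empty, so an empty
    turns list can no longer raise IndexError.
    """
    return data.get("prompt") or (turns[0] if len(turns) > 0 else None)
```

`all_turns` still receives the `turns` list unchanged (even when empty), matching the behavior the review asks to preserve.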
---
Nitpick comments:
In `@tensorrt_llm/bench/utils/data.py`:
- Around line 166-168: The zip over prompts, all_logits, all_osl, task_ids,
lora_requests, all_turns, all_categories, all_question_ids should be made strict
to catch any length mismatches at runtime; update the for-loop header that
currently iterates with zip(prompts, all_logits, all_osl, task_ids,
lora_requests, all_turns, all_categories, all_question_ids) to call zip(...,
strict=True) so a ValueError is raised if any list lengths diverge, leaving the
rest of the loop intact and running tests to ensure compatibility with your
Python version.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: fff7d13a-a90c-4b5e-b064-45c48a5ca1ad
📒 Files selected for processing (4)
tensorrt_llm/bench/benchmark/throughput.py
tensorrt_llm/bench/benchmark/utils/asynchronous.py
tensorrt_llm/bench/dataclasses/general.py
tensorrt_llm/bench/utils/data.py
/bot run

PR_Github #40171 [ run ] triggered by Bot. Commit:

PR_Github #40171 [ run ] completed with state
Signed-off-by: Guiju Zhang <7135567+cascade812@users.noreply.github.com>
/bot run

PR_Github #40362 [ run ] triggered by Bot. Commit:

PR_Github #40362 [ run ] completed with state

/bot run

PR_Github #40651 [ run ] triggered by Bot. Commit:

PR_Github #40651 [ run ] completed with state
@cascade812 -- Thanks for the contribution to

Question -- I noticed you added a new JSONL format, are you using a dataset you generated? I'm wondering because I noticed that the

Additionally, I noticed that the multi-turn benchmark has a
Signed-off-by: Guiju Zhang <7135567+cascade812@users.noreply.github.com>
@FrankD412, thanks for your review. For the second question, I've changed it by offloading both tokenizer.decode() and tokenizer.apply_chat_template() to the thread pool, so they no longer block the event loop.
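The offloading pattern the author describes can be sketched with `asyncio.to_thread`; the `tokenizer` object and `decode_off_loop` name here are stand-ins, not the actual trtllm-bench code.

```python
import asyncio

# Sketch of moving a blocking tokenizer call off the event loop, under the
# assumption that tokenizer.decode is synchronous and CPU-bound.
async def decode_off_loop(tokenizer, token_ids):
    # asyncio.to_thread runs the call in the default thread pool, so other
    # in-flight requests keep making progress while decoding happens.
    return await asyncio.to_thread(tokenizer.decode, token_ids)
```

The same wrapper applies to `tokenizer.apply_chat_template`, which is the other blocking call the author mentions offloading.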
FrankD412 left a comment
Awesome -- thanks for the updates @cascade812, lgtm!
/bot run

PR_Github #41589 [ run ] triggered by Bot. Commit:

PR_Github #41589 [ run ] completed with state

/bot run

PR_Github #41713 [ run ] triggered by Bot. Commit:

PR_Github #41713 [ run ] completed with state

/bot run

PR_Github #41733 [ run ] triggered by Bot. Commit:

PR_Github #41733 [ run ] completed with state
Signed-off-by: Guiju Zhang <7135567+cascade812@users.noreply.github.com>
Summary by CodeRabbit
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.