
[None][feat] Add multi-turn support for trtllm-bench #12468

Merged
cascade812 merged 5 commits into NVIDIA:main from cascade812:guiju/multi-turn on Apr 4, 2026
Conversation

Collaborator

@cascade812 cascade812 commented Mar 24, 2026

Summary by CodeRabbit

  • New Features
    • Benchmark throughput for multi-turn conversations with automatic detection and support for multi-turn request patterns
    • New JSONL dataset format supporting multi-turn conversation inputs alongside single-turn prompts, with additional metadata fields for categories and question tracking
    • Refined sequence-length and token-count calculations to accurately measure performance metrics across multi-turn conversation scenarios
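
As a rough illustration of the two JSONL line shapes the summary describes (the fields turns, category, and question_id come from this PR; the remaining field names here are assumptions, not the exact schema):

```python
import json

# Hypothetical examples of single-turn and multi-turn request lines;
# field names beyond turns, category, and question_id are assumptions.
single_turn = '{"task_id": 0, "prompt": "What is TensorRT-LLM?", "output_tokens": 256}'
multi_turn = (
    '{"task_id": 1, "turns": ["Write a short poem about GPUs.", '
    '"Now rewrite it as a haiku."], "category": "writing", '
    '"question_id": 81, "output_tokens": 256}'
)

for line in (single_turn, multi_turn):
    req = json.loads(line)
    # Automatic detection: a request with a non-empty "turns" list is multi-turn.
    kind = "multi-turn" if req.get("turns") else "single-turn"
    print(req["task_id"], kind)
```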

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update the tava architecture diagram if there is a significant design change in the PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: Guiju Zhang <7135567+cascade812@users.noreply.github.com>
@cascade812 cascade812 requested a review from a team as a code owner March 24, 2026 01:15
@cascade812 cascade812 requested a review from FrankD412 March 24, 2026 01:15
@cascade812 cascade812 changed the title Add multi-turn support for trtllm-bench [None][feat] Add multi-turn support for trtllm-bench Mar 24, 2026
Contributor

coderabbitai bot commented Mar 24, 2026

📝 Walkthrough

Multi-turn conversation benchmarking support is added to the throughput system. The dataset loading now detects multi-turn requests and extracts associated metadata. When multi-turn requests are present, a tokenizer is passed through the benchmarking pipeline to enable sequential turn-by-turn processing within single concurrency slots, with aggregated token counts across all turns per request.

Changes

  • Data Models and Structure (tensorrt_llm/bench/dataclasses/general.py):
    Added turns, category, and question_id optional fields to InferenceRequest; added an is_multi_turn property; updated validation to accept turns as an alternative to prompt and input_ids.
  • Dataset Loading (tensorrt_llm/bench/utils/data.py):
    Extended JSONL parsing to support a multi-turn format with turns, category, and question_id fields; adjusted sequence-length accounting to scale output tokens by turn count; fixed a docstring typo.
  • Async Benchmark Execution (tensorrt_llm/bench/benchmark/utils/asynchronous.py):
    Added an optional tokenizer parameter to async_benchmark(); implemented _process_multi_turn_request() for sequential turn-by-turn processing with accumulated token counts; refactored process_request() to route conditionally to the multi-turn or single-turn path.
  • Benchmark Orchestration (tensorrt_llm/bench/benchmark/throughput.py):
    Added detection of multi-turn requests in the dataset; passes multi_turn_tokenizer to both the warmup and main async_benchmark() calls when multi-turn requests are present.
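
A minimal sketch of the InferenceRequest changes described above, written as a plain dataclass for illustration; the real class lives in tensorrt_llm/bench/dataclasses/general.py and its base class and validation mechanism may differ:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class InferenceRequest:
    """Simplified sketch; field names follow the change summary above."""
    task_id: int
    prompt: Optional[str] = None
    input_ids: Optional[List[int]] = None
    turns: Optional[List[str]] = None        # new: multi-turn inputs
    category: Optional[str] = None           # new: metadata
    question_id: Optional[int] = None        # new: metadata

    @property
    def is_multi_turn(self) -> bool:
        # A request with a non-empty turns list is treated as multi-turn.
        return bool(self.turns)

    def __post_init__(self):
        # Validation now accepts turns as an alternative to prompt/input_ids.
        if not (self.prompt or self.input_ids or self.turns):
            raise ValueError("one of prompt, input_ids, or turns is required")
```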

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Dataset
    participant Coordinator as Benchmark<br/>Coordinator
    participant AsyncMgr as Async<br/>Manager
    participant LlmMgr as LLM<br/>Manager
    participant Tokenizer

    Dataset->>Coordinator: Load requests (with is_multi_turn flag)
    Coordinator->>Coordinator: Detect multi-turn requests
    alt Has Multi-Turn Requests
        Coordinator->>Tokenizer: Initialize tokenizer
        Coordinator->>AsyncMgr: async_benchmark(tokenizer=multi_turn_tokenizer)
    else No Multi-Turn Requests
        Coordinator->>AsyncMgr: async_benchmark(tokenizer=None)
    end

    AsyncMgr->>LlmMgr: Store tokenizer on manager

    loop For Each Request
        AsyncMgr->>AsyncMgr: Check request.is_multi_turn
        alt is_multi_turn && tokenizer present
            AsyncMgr->>LlmMgr: process_multi_turn_request()
            loop For Each Turn
                LlmMgr->>LlmMgr: Build chat message
                LlmMgr->>LlmMgr: Generate with streaming=False
                LlmMgr->>Tokenizer: Decode assistant tokens
                LlmMgr->>LlmMgr: Accumulate token counts
            end
            LlmMgr->>AsyncMgr: Emit single PerfItemTuple(all turns)
        else Single-turn
            AsyncMgr->>LlmMgr: _process_single_request()
            LlmMgr->>AsyncMgr: Emit PerfItemTuple
        end
    end
```
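The turn-by-turn loop in the diagram can be sketched roughly as follows. This is an illustrative stand-in, not the actual TRT-LLM implementation: process_multi_turn_request, the generate callable, and the whitespace-based token counting are all placeholders for the real _process_multi_turn_request(), LLM call, and tokenizer:

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple

async def process_multi_turn_request(
    turns: List[str],
    generate: Callable[[str], Awaitable[str]],  # placeholder for the real LLM call
) -> Tuple[int, int]:
    """Sketch of the per-request loop: turns run sequentially within one
    concurrency slot, each reply is appended to the conversation, and token
    counts accumulate across turns into a single result per request."""
    messages = []
    total_input = total_output = 0
    for turn in turns:
        messages.append({"role": "user", "content": turn})
        # Stand-in for tokenizer.apply_chat_template(messages, ...)
        prompt = " ".join(m["content"] for m in messages)
        reply = await generate(prompt)
        messages.append({"role": "assistant", "content": reply})
        total_input += len(prompt.split())    # stand-in for real token counts
        total_output += len(reply.split())
    return total_input, total_output

async def fake_generate(prompt: str) -> str:
    # Dummy model reply used only to exercise the loop.
    return "ok reply"

print(asyncio.run(process_multi_turn_request(["hi there", "and more"], fake_generate)))
```

Note how the second turn's prompt includes the first turn and its reply, so input token counts grow with conversation history.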

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 28.57%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
  • Description check: ⚠️ Warning. The PR description is incomplete, with empty Description and Test Coverage sections that do not explain the multi-turn implementation or testing approach. Resolution: fill in the Description section explaining what multi-turn support was added and why, and list the relevant tests under Test Coverage.

✅ Passed checks (1 passed)
  • Title check: ✅ Passed. The title clearly and concisely describes the main change: adding multi-turn support to trtllm-bench.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
tensorrt_llm/bench/utils/data.py (1)

166-168: Consider adding strict=True to zip() for safety.

All lists (prompts, all_logits, all_osl, etc.) are appended in lockstep within the same loop, so they should always have the same length. However, adding strict=True would catch any future bugs where the lists diverge.

♻️ Proposed fix

```diff
-    for prompt, logits, osl, task_id, lora_request, turns, category, question_id in zip(
-            prompts, all_logits, all_osl, task_ids, lora_requests, all_turns,
-            all_categories, all_question_ids):
+    for prompt, logits, osl, task_id, lora_request, turns, category, question_id in zip(
+            prompts, all_logits, all_osl, task_ids, lora_requests, all_turns,
+            all_categories, all_question_ids, strict=True):
```
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/bench/utils/data.py`:
- Around lines 118-128: the code can raise an IndexError when turns is an empty list, because prompts.append uses turns[0]. Change the conditional so turns[0] is only accessed when turns is non-empty (e.g., prompt_val = data.get("prompt") or (turns[0] if turns else None)), then append prompt_val to prompts. Keep the other appends (media_paths, all_logits, all_turns, all_categories, all_question_ids, all_osl, task_ids) unchanged, and ensure all_turns still appends turns (even if empty), as currently intended.
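
The guarded append described above might look like the following sketch; the variable names follow the review comment, not necessarily the actual code in data.py:

```python
import json

prompts, all_turns = [], []

def load_line(line: str) -> None:
    """Append the prompt for one JSONL request, guarding against empty turns."""
    data = json.loads(line)
    turns = data.get("turns", [])
    # Only index turns[0] when turns is non-empty; otherwise fall back to None
    # instead of raising IndexError.
    prompt_val = data.get("prompt") or (turns[0] if turns else None)
    prompts.append(prompt_val)
    all_turns.append(turns)  # still append turns, even if empty

load_line('{"turns": ["first question", "follow-up"]}')
load_line('{"turns": []}')  # no IndexError: prompt becomes None
```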


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fff7d13a-a90c-4b5e-b064-45c48a5ca1ad

📥 Commits

Reviewing files that changed from the base of the PR and between ffb1fed and e83b893.

📒 Files selected for processing (4)
  • tensorrt_llm/bench/benchmark/throughput.py
  • tensorrt_llm/bench/benchmark/utils/asynchronous.py
  • tensorrt_llm/bench/dataclasses/general.py
  • tensorrt_llm/bench/utils/data.py

@cascade812
Collaborator Author

/bot run

@cascade812 cascade812 requested a review from mikeiovine March 24, 2026 21:40
@tensorrt-cicd
Collaborator

PR_Github #40171 [ run ] triggered by Bot. Commit: e83b893 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40171 [ run ] completed with state SUCCESS. Commit: e83b893
/LLM/main/L0_MergeRequest_PR pipeline #31316 completed with status: 'SUCCESS'

CI Report

Link to invocation

Signed-off-by: Guiju Zhang <7135567+cascade812@users.noreply.github.com>
@cascade812
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #40362 [ run ] triggered by Bot. Commit: a2d0727 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40362 [ run ] completed with state FAILURE. Commit: a2d0727
/LLM/main/L0_MergeRequest_PR pipeline #31464 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@cascade812
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #40651 [ run ] triggered by Bot. Commit: a2d0727 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40651 [ run ] completed with state SUCCESS. Commit: a2d0727
/LLM/main/L0_MergeRequest_PR pipeline #31686 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@FrankD412
Collaborator

@cascade812 -- Thanks for the contribution to trtllm-bench. I skimmed over and things seem fine overall but I need to go over it once more in a little more depth.

Question -- I noticed you added a new JSONL format, are you using a dataset you generated? I'm wondering because I noticed that the dataset subcommand wasn't updated. If you've got some code that is generating synthetic multi-turn datasets, would you mind updating the subcommand to add the ability?

Additionally, I noticed that the multi-turn benchmark has a decode call in the async task that manages the request, which is a somewhat heavyweight operation to have in the loop. As far as I'm aware, it does not yield the event loop, so it will hold up other requests. Are you trying to measure multi-turn performance with this?

Signed-off-by: Guiju Zhang <7135567+cascade812@users.noreply.github.com>
@cascade812
Collaborator Author

@FrankD412, thanks for your review.
For the first question: it's not a dataset I generated. The multi-turn JSONL format was added to support the existing MT-Bench conversation benchmark (80 two-turn prompts across 8 categories).
I've now integrated multi-turn support into the existing prepare-dataset real-dataset subcommand.
Example usage:

```shell
trtllm-bench --model meta-llama/Llama-3.1-8B-Instruct prepare-dataset --output /tmp/mt_bench.jsonl real-dataset --dataset-name philschmid/mt-bench --dataset-split train --dataset-input-key turns --output-len-dist 256,0
```

For the second question: I've offloaded both tokenizer.decode() and tokenizer.apply_chat_template() to the thread pool, so they no longer block the event loop.
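
Offloading a blocking call to the default thread pool can be done with asyncio.to_thread, as in this sketch; blocking_decode here is a stand-in for the real tokenizer.decode(), not the actual benchmark code:

```python
import asyncio
import time

def blocking_decode(token_ids):
    """Stand-in for tokenizer.decode(): CPU-bound work that would otherwise
    block the event loop and stall concurrent requests."""
    time.sleep(0.01)  # simulate decoding work
    return " ".join(str(t) for t in token_ids)

async def decode_off_loop(token_ids):
    # asyncio.to_thread runs the blocking call in the default executor,
    # so other coroutines keep making progress while it runs.
    return await asyncio.to_thread(blocking_decode, token_ids)

async def main():
    # Several concurrent "requests" decoding at once without blocking each other.
    return await asyncio.gather(*(decode_off_loop([i, i + 1]) for i in range(3)))

print(asyncio.run(main()))  # → ['0 1', '1 2', '2 3']
```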

Collaborator

@FrankD412 FrankD412 left a comment


Awesome -- thanks for the updates @cascade812, lgtm!

@cascade812
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #41589 [ run ] triggered by Bot. Commit: fc1daca Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41589 [ run ] completed with state SUCCESS. Commit: fc1daca
/LLM/main/L0_MergeRequest_PR pipeline #32498 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@cascade812
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #41713 [ run ] triggered by Bot. Commit: fc1daca Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41713 [ run ] completed with state SUCCESS. Commit: fc1daca
/LLM/main/L0_MergeRequest_PR pipeline #32616 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@cascade812
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #41733 [ run ] triggered by Bot. Commit: a35668c Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41733 [ run ] completed with state SUCCESS. Commit: a35668c
/LLM/main/L0_MergeRequest_PR pipeline #32634 completed with status: 'SUCCESS'

CI Report

Link to invocation

@cascade812 cascade812 merged commit 13826d3 into NVIDIA:main Apr 4, 2026
5 checks passed
yufeiwu-nv pushed a commit to yufeiwu-nv/TensorRT-LLM that referenced this pull request Apr 7, 2026
Signed-off-by: Guiju Zhang <7135567+cascade812@users.noreply.github.com>
karen-sy pushed a commit to karen-sy/TensorRT-LLM that referenced this pull request Apr 7, 2026
Signed-off-by: Guiju Zhang <7135567+cascade812@users.noreply.github.com>

Labels: none yet · Projects: none yet · 4 participants