[None][feat] Add multi-turn support for trtllm-bench #12468
cascade812 merged 5 commits into NVIDIA:main from
Conversation
Signed-off-by: Guiju Zhang <7135567+cascade812@users.noreply.github.com>
📝 Walkthrough

Multi-turn conversation benchmarking support is added to the throughput system. Dataset loading now detects multi-turn requests and extracts their associated metadata. When multi-turn requests are present, a tokenizer is passed through the benchmarking pipeline to enable sequential turn-by-turn processing within a single concurrency slot, with token counts aggregated across all turns per request.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Dataset
    participant Coordinator as Benchmark<br/>Coordinator
    participant AsyncMgr as Async<br/>Manager
    participant LlmMgr as LLM<br/>Manager
    participant Tokenizer

    Dataset->>Coordinator: Load requests (with is_multi_turn flag)
    Coordinator->>Coordinator: Detect multi-turn requests
    alt Has Multi-Turn Requests
        Coordinator->>Tokenizer: Initialize tokenizer
        Coordinator->>AsyncMgr: async_benchmark(tokenizer=multi_turn_tokenizer)
    else No Multi-Turn Requests
        Coordinator->>AsyncMgr: async_benchmark(tokenizer=None)
    end
    AsyncMgr->>LlmMgr: Store tokenizer on manager
    loop For Each Request
        AsyncMgr->>AsyncMgr: Check request.is_multi_turn
        alt is_multi_turn && tokenizer present
            AsyncMgr->>LlmMgr: process_multi_turn_request()
            loop For Each Turn
                LlmMgr->>LlmMgr: Build chat message
                LlmMgr->>LlmMgr: Generate with streaming=False
                LlmMgr->>Tokenizer: Decode assistant tokens
                LlmMgr->>LlmMgr: Accumulate token counts
            end
            LlmMgr->>AsyncMgr: Emit single PerfItemTuple(all turns)
        else Single-turn
            AsyncMgr->>LlmMgr: _process_single_request()
            LlmMgr->>AsyncMgr: Emit PerfItemTuple
        end
    end
```
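The turn-by-turn dispatch in the diagram can be sketched as follows. This is an illustrative stand-in only: `process_request`, `llm.generate`, and the request fields are hypothetical names, not the actual trtllm-bench API.

```python
# Hypothetical sketch of the per-request dispatch shown in the sequence
# diagram; real trtllm-bench manager classes and method names differ.
def process_request(request, tokenizer, llm):
    """Route a request to multi-turn or single-turn handling."""
    if getattr(request, "is_multi_turn", False) and tokenizer is not None:
        messages = []
        total_output_tokens = 0
        for user_turn in request.turns:
            messages.append({"role": "user", "content": user_turn})
            # Non-streaming generation for each turn (streaming=False above).
            output_tokens = llm.generate(messages)
            total_output_tokens += len(output_tokens)
            # Decode assistant tokens and feed them back as context
            # for the next turn.
            messages.append({"role": "assistant",
                             "content": tokenizer.decode(output_tokens)})
        # One aggregated count (one PerfItemTuple) for the whole request.
        return total_output_tokens
    # Single-turn path: one generation per request.
    return len(llm.generate([{"role": "user", "content": request.prompt}]))
```

The key property is that all turns of one request run sequentially inside the same concurrency slot, so a multi-turn request occupies exactly one slot for its full lifetime.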
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (2 warnings)
Actionable comments posted: 1
🧹 Nitpick comments (1)
tensorrt_llm/bench/utils/data.py (1)
166-168: Consider adding `strict=True` to `zip()` for safety.

All lists (`prompts`, `all_logits`, `all_osl`, etc.) are appended in lockstep within the same loop, so they should always have the same length. However, adding `strict=True` would catch any future bugs where the lists diverge.

♻️ Proposed fix

```diff
- for prompt, logits, osl, task_id, lora_request, turns, category, question_id in zip(
-         prompts, all_logits, all_osl, task_ids, lora_requests, all_turns,
-         all_categories, all_question_ids):
+ for prompt, logits, osl, task_id, lora_request, turns, category, question_id in zip(
+         prompts, all_logits, all_osl, task_ids, lora_requests, all_turns,
+         all_categories, all_question_ids, strict=True):
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/bench/utils/data.py` around lines 166 - 168, The zip over prompts, all_logits, all_osl, task_ids, lora_requests, all_turns, all_categories, all_question_ids should be made strict to catch any length mismatches at runtime; update the for-loop header that currently iterates with zip(prompts, all_logits, all_osl, task_ids, lora_requests, all_turns, all_categories, all_question_ids) to call zip(..., strict=True) so a ValueError is raised if any list lengths diverge, leaving the rest of the loop intact and running tests to ensure compatibility with your Python version.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tensorrt_llm/bench/utils/data.py`:
- Around line 118-128: The code can raise IndexError when turns is an empty list
because prompts.append uses turns[0]; change the conditional and append logic in
the block handling turns so you only access turns[0] when turns and len(turns)>0
(e.g., compute prompt_val = data.get("prompt") or (turns[0] if len(turns) > 0
else None)) and then append prompt_val to prompts; keep the other appends
(media_paths, all_logits, all_turns, all_categories, all_question_ids, all_osl,
task_ids) unchanged but ensure all_turns still appends turns (even if empty) as
currently intended.
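The guarded access the comment asks for can be sketched as a small helper; `extract_prompt` is a hypothetical name introduced here for illustration, since the real fix is inlined in the loop in `data.py`.

```python
def extract_prompt(data: dict, turns: list):
    """Return the request prompt, falling back to the first turn.

    Only touches turns[0] when turns is non-empty, so an empty
    turns list can no longer raise IndexError.
    """
    return data.get("prompt") or (turns[0] if len(turns) > 0 else None)
```

`all_turns` still receives the `turns` list unchanged (even when empty), matching the behavior the review asks to preserve.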
---
Nitpick comments:
In `@tensorrt_llm/bench/utils/data.py`:
- Around line 166-168: The zip over prompts, all_logits, all_osl, task_ids,
lora_requests, all_turns, all_categories, all_question_ids should be made strict
to catch any length mismatches at runtime; update the for-loop header that
currently iterates with zip(prompts, all_logits, all_osl, task_ids,
lora_requests, all_turns, all_categories, all_question_ids) to call zip(...,
strict=True) so a ValueError is raised if any list lengths diverge, leaving the
rest of the loop intact and running tests to ensure compatibility with your
Python version.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: fff7d13a-a90c-4b5e-b064-45c48a5ca1ad
📒 Files selected for processing (4)
tensorrt_llm/bench/benchmark/throughput.py
tensorrt_llm/bench/benchmark/utils/asynchronous.py
tensorrt_llm/bench/dataclasses/general.py
tensorrt_llm/bench/utils/data.py
/bot run

PR_Github #40171 [ run ] triggered by Bot. Commit:

PR_Github #40171 [ run ] completed with state
Signed-off-by: Guiju Zhang <7135567+cascade812@users.noreply.github.com>
/bot run

PR_Github #40362 [ run ] triggered by Bot. Commit:

PR_Github #40362 [ run ] completed with state

/bot run

PR_Github #40651 [ run ] triggered by Bot. Commit:

PR_Github #40651 [ run ] completed with state
@cascade812 -- Thanks for the contribution to

Question -- I noticed you added a new JSONL format, are you using a dataset you generated? I'm wondering because I noticed that the

Additionally, I noticed that the multi-turn benchmark has a
Signed-off-by: Guiju Zhang <7135567+cascade812@users.noreply.github.com>
@FrankD412, thanks for your review. For the second question, I've changed it by offloading both tokenizer.decode() and tokenizer.apply_chat_template() to the thread pool, so they no longer block the event loop.
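The offloading pattern the author describes can be sketched with `asyncio.to_thread`; the `tokenizer` object and `decode_off_loop` name here are stand-ins, not the actual trtllm-bench code.

```python
import asyncio

# Sketch of moving a blocking tokenizer call off the event loop, under the
# assumption that tokenizer.decode is synchronous and CPU-bound.
async def decode_off_loop(tokenizer, token_ids):
    # asyncio.to_thread runs the call in the default thread pool, so other
    # in-flight requests keep making progress while decoding happens.
    return await asyncio.to_thread(tokenizer.decode, token_ids)
```

The same wrapper applies to `tokenizer.apply_chat_template`, which is the other blocking call the author mentions offloading.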
FrankD412 left a comment
Awesome -- thanks for the updates @cascade812, lgtm!
/bot run

PR_Github #41589 [ run ] triggered by Bot. Commit:

PR_Github #41589 [ run ] completed with state

/bot run

PR_Github #41713 [ run ] triggered by Bot. Commit:

PR_Github #41713 [ run ] completed with state

/bot run

PR_Github #41733 [ run ] triggered by Bot. Commit:

PR_Github #41733 [ run ] completed with state
Signed-off-by: Guiju Zhang <7135567+cascade812@users.noreply.github.com>
Summary by CodeRabbit
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.