[https://nvbugs/6204488][fix] Replace fixed disagg fill throttle with slow-start ramp by chienchunhung · Pull Request #14475 · NVIDIA/TensorRT-LLM

chienchunhung · 2026-05-22T23:59:15Z

Summary by CodeRabbit

Improvements
- Enhanced request admission logic during fill phases with a slow-start ramping mechanism for improved performance and throughput control.
Tests
- Added comprehensive test coverage for slow-start admission behavior, including ramp initialization, iteration scaling, and phase transitions.

Summary

PR #13347's fixed cap of tp_size admissions per iteration on benchmark disagg fill stretched the fill phase into a long ramp at high concurrency (con/tp_size iterations). This regressed output token throughput of post-merge perf-sanity gen-only configs by 7–13% across DeepSeek-R1, GPT-OSS, and Kimi-K2.5 1k1k configurations on B200 / GB200 / GB300.
This PR replaces the fixed cap with a doubling slow-start ramp:

- Iter 0 still admits at most tp_size requests. This is the load-bearing iter-0 burst protection against the insufficient-KV fail-fast from PR #12206 ([None][fix] return an explicit error if the requests can't be schedul…), which can trigger under transient ADP-router imbalance.
- Each subsequent iter doubles the cap.
- The cap saturates at total_max after ceil(log2(total_max / tp_size)) doublings (~10 iters / ~1 s for con=4096, tp=8).
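A back-of-the-envelope comparison of the two schedules, as a minimal sketch. It assumes (none of this is taken from the PR code) that every fill iteration admits its full cap, that enough requests are waiting, that KV capacity is never the limiting factor, and that total_max equals the target concurrency con. Both function names are hypothetical.

```python
def fixed_cap_fill_iters(con: int, tp_size: int) -> int:
    """Iterations to admit `con` requests with a fixed cap of tp_size per iter."""
    return (con + tp_size - 1) // tp_size  # integer ceil(con / tp_size)


def slow_start_fill_iters(con: int, tp_size: int, total_max: int) -> int:
    """Iterations to admit `con` requests when the per-iter cap starts at
    tp_size and doubles every iteration, saturating at total_max.

    Hypothetical model: assumes each iteration admits its full cap.
    """
    admitted, iters = 0, 0
    cap = min(tp_size, total_max)
    while admitted < con:
        admitted += min(cap, con - admitted)  # admit up to the current cap
        iters += 1
        cap = min(cap * 2, total_max)  # slow-start: double, then saturate
    return iters


print(fixed_cap_fill_iters(4096, 8))         # 512
print(slow_start_fill_iters(4096, 8, 4096))  # 10
```

Under these assumptions, the fixed cap needs con/tp_size = 512 iterations for con=4096, tp=8, while the doubling ramp admits everything in 10, consistent with the ~10-iteration estimate above. Real admission is also bounded by available slots and KV capacity, so measured counts can differ.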
Net effect:
- The original Kimi-K2-Thinking 8k1k ctx8/gen1 con=8192 repro continues to pass; iter 0 caps at tp_size exactly as before.
- High-concurrency 1k1k post-merge configs no longer pay the 7–13% wall-clock overhead.

Test Coverage

Unit tests — tests/unittest/_torch/executor/test_benchmark_disagg.py

TestBenchmarkFillAdmissionFlowControl (rewritten) — five tests pinning the ramp shape:
- test_first_iter_caps_at_tp_size (iter-0 burst protection invariant, parametrized tp_size ∈ {1, 4, 32}).
- test_ramp_doubles_each_iter_and_saturates_at_total_max (full ramp shape across iters, parametrized tp_size ∈ {1, 4, 8, 32}; also asserts O(log2) convergence).
- test_no_fill_phase_uses_full_available_capacity (throttle disengages once the gate opens).
- test_ramp_state_resets_when_fill_phase_ends (back-to-back benchmarks restart at tp_size).
- test_warmup_skips_throttle (warmup bypasses the ramp).
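The ramp shape these tests pin down can be written as a pure function. Below is a hedged sketch in the spirit of the `_expected_ramp()` helper mentioned in the walkthrough; that helper's real signature and body are not shown on this page, so `expected_ramp`, `saturation_iter`, and `check_ramp_shape` are hypothetical. The sketch assumes total_max >= tp_size and checks the iter-0 cap, the doubling, the saturation, and the O(log2) convergence for the tp_size values parametrized above.

```python
def expected_ramp(tp_size: int, total_max: int) -> list[int]:
    """Per-iteration admission caps until the ramp first reaches total_max."""
    caps = []
    cap = tp_size
    while True:
        caps.append(cap)
        if cap >= total_max:
            return caps
        cap = min(cap * 2, total_max)


def saturation_iter(tp_size: int, total_max: int) -> int:
    """ceil(log2(total_max / tp_size)) computed with integer arithmetic."""
    ratio = (total_max + tp_size - 1) // tp_size  # ceil(total_max / tp_size)
    return (ratio - 1).bit_length()  # smallest k with 2**k >= ratio


def check_ramp_shape(tp_size: int, total_max: int) -> None:
    ramp = expected_ramp(tp_size, total_max)
    assert ramp[0] == tp_size                   # iter-0 burst protection
    for prev, cur in zip(ramp, ramp[1:]):
        assert cur == min(prev * 2, total_max)  # doubles, then clamps
    assert ramp[-1] == total_max                # saturates at total_max
    # O(log2) convergence: index of the first saturated iteration.
    assert len(ramp) - 1 == saturation_iter(tp_size, total_max)


for tp in (1, 4, 8, 32):
    check_ramp_shape(tp, total_max=4096)
```

Computing the saturation index with `int.bit_length()` instead of `math.log2` keeps the check exact for any integer inputs, with no floating-point rounding at powers of two.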
TestFillPhaseEndToEnd::test_full_lifecycle (existing) — Phase 0 still asserts iter-0 admission caps at TP_SIZE, confirming the ramp's first iter matches the original fixed-cap behavior the lifecycle was written against.
Integration tests: tests/integration/defs/disaggregated/test_disaggregated.py::test_disaggregated_benchmark_gen_only_insufficient_kv. This test exercised the original repro for PR #13347 ([https://nvbugs/6093911][fix] Fix disagg gen-only benchmark hang under ADP router imbalance); the slow-start ramp preserves its iter-0 behavior, so it remains green.

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

chienchunhung · 2026-05-23T02:47:04Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-23T02:53:39Z

PR_Github #50018 [ run ] triggered by Bot. Commit: dddf296 Link to invocation

tensorrt-cicd · 2026-05-23T10:34:56Z

PR_Github #50018 [ run ] completed with state FAILURE. Commit: dddf296
/LLM/main/L0_MergeRequest_PR pipeline #39582 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

chienchunhung · 2026-05-23T15:59:14Z

/bot run --disable-fail-fast --add-multi-gpu-test

tensorrt-cicd · 2026-05-23T16:05:01Z

PR_Github #50058 [ run ] triggered by Bot. Commit: dddf296 Link to invocation

tensorrt-cicd · 2026-05-23T20:00:47Z

PR_Github #50058 [ run ] completed with state SUCCESS. Commit: dddf296
/LLM/main/L0_MergeRequest_PR pipeline #39616 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

chienchunhung · 2026-05-24T02:18:51Z

/bot run --disable-fail-fast --add-multi-gpu-test --skip-test "TestQwen3_5_35B_A3B::test_fp8[enable_block_reuse=True]"

tensorrt-cicd · 2026-05-24T02:24:16Z

PR_Github #50070 Bot args parsing error: usage: /bot [-h]
{run,kill,skip,submit,reviewers,reuse-pipeline,reuse-review} ...
/bot: error: unrecognized arguments: TestQwen3_5_35B_A3B::test_fp8[enable_block_reuse=True]

Link to invocation

chienchunhung · 2026-05-24T02:40:24Z

/bot run --disable-fail-fast --add-multi-gpu-test --skip-test "test_fp8[enable_block_reuse=True]"

tensorrt-cicd · 2026-05-24T02:46:30Z

PR_Github #50075 Bot args parsing error: usage: /bot [-h]
{run,kill,skip,submit,reviewers,reuse-pipeline,reuse-review} ...
/bot: error: unrecognized arguments: test_fp8[enable_block_reuse=True]

Link to invocation

chienchunhung · 2026-05-24T04:21:26Z

/bot run --disable-fail-fast --add-multi-gpu-test

tensorrt-cicd · 2026-05-24T04:26:48Z

PR_Github #50079 [ run ] triggered by Bot. Commit: dddf296 Link to invocation

tensorrt-cicd · 2026-05-24T06:45:36Z

PR_Github #50079 [ run ] completed with state SUCCESS. Commit: dddf296
/LLM/main/L0_MergeRequest_PR pipeline #39633 completed with status: 'SUCCESS'

CI Report

Link to invocation

chenfeiz0326 · 2026-05-27T02:20:13Z

/bot run --disable-fail-fast --stage-list "*B200*PerfSanity*,*B300*PerfSanity*,*GB200*PerfSanity*"

tensorrt-cicd · 2026-05-27T02:27:02Z

PR_Github #50426 [ run ] triggered by Bot. Commit: dddf296 Link to invocation

tensorrt-cicd · 2026-05-27T06:44:44Z

PR_Github #50426 [ run ] completed with state SUCCESS. Commit: dddf296
/LLM/main/L0_MergeRequest_PR pipeline #39949 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

chenfeiz0326 · 2026-05-27T09:02:55Z

With this PR, the regressed cases recover, for example:
https://tensorrt-llm.tensorrt-llm-perf-ci-report.sc2-paas.nvidia.com/pre-merge?selectedBranches=main&selectedTests=gen_only-gb200_gpt-oss-120b-fp4_1k1k_con64_ctx1_tp1_gen1_tp4_eplb0_mtp0_ccb-NIXL-con64_iter1_isl1024_osl1024&selectedCurve=__all__&pinnedSection=gen_only-gb200_gpt-oss-120b-fp4_1k1k_con64_ctx1_tp1_gen1_tp4_eplb0_mtp0_ccb-NIXL-con64_iter1_isl1024_osl1024%7Cmain%7Cgb200&in_build-id-input=39949&in_job-name-select=LLM%2Fmain%2FL0_MergeRequest_PR

https://tensorrt-llm.tensorrt-llm-perf-ci-report.sc2-paas.nvidia.com/pre-merge?selectedBranches=main&selectedTests=gen_only-gb200_kimi-k25-thinking-fp4_1k1k_con4096_ctx1_dep4_gen1_dep8_eplb0_mtp0_ccb-NIXL-con4096_iter1_isl1024_osl1024&selectedCurve=__all__&pinnedSection=gen_only-gb200_kimi-k25-thinking-fp4_1k1k_con4096_ctx1_dep4_gen1_dep8_eplb0_mtp0_ccb-NIXL-con4096_iter1_isl1024_osl1024%7Cmain%7Cgb200&in_build-id-input=39949&in_job-name-select=LLM%2Fmain%2FL0_MergeRequest_PR

https://tensorrt-llm.tensorrt-llm-perf-ci-report.sc2-paas.nvidia.com/pre-merge?selectedBranches=main&selectedTests=gen_only-gb200_deepseek-r1-fp4_1k1k_con1024_ctx1_dep4_gen1_dep8_eplb0_mtp0_ccb-NIXL-con1024_iter1_isl1024_osl1024&selectedCurve=__all__&pinnedSection=gen_only-gb200_deepseek-r1-fp4_1k1k_con1024_ctx1_dep4_gen1_dep8_eplb0_mtp0_ccb-NIXL-con1024_iter1_isl1024_osl1024%7Cmain%7Cgb200&in_build-id-input=39949&in_job-name-select=LLM%2Fmain%2FL0_MergeRequest_PR

chienchunhung · 2026-05-27T15:35:09Z

/bot run --disable-fail-fast --add-multi-gpu-test

tensorrt-cicd · 2026-05-27T15:41:18Z

PR_Github #50569 [ run ] triggered by Bot. Commit: dddf296 Link to invocation

coderabbitai · 2026-05-27T15:44:57Z

📝 Walkthrough

This PR implements a slow-start admission control algorithm for the benchmark disaggregated fill phase. Instead of using a fixed throttling cap, the executor now dynamically increases the per-iteration admission cap by doubling it each iteration until it saturates at a global maximum. The ramp state resets when the fill phase completes.

Changes

Slow-start admission throttle for benchmark disaggregated fill

Layer / File(s)	Summary
Slow-start admission ramp state and logic `tensorrt_llm/_torch/pyexecutor/py_executor.py`	`_fill_admit_cap` field initialized to `0` in `__init__`; reset to `0` when fill gate opens in `_check_benchmark_disagg_gate`; `_pop_from_waiting_queue` reworked to initialize cap at `tp_size` and double it each iteration up to `total_max`.
Test infrastructure and stub initialization for ramp state `tests/unittest/_torch/executor/test_benchmark_disagg.py`	Updated `TestBenchmarkFillAdmissionFlowControl` class docstring and added `_expected_ramp()` helper to compute expected admission-cap sequences; initialized `_fill_admit_cap` to `0` in multiple test executor stubs across `MockBenchmarkExecutor`, `TestPrepareAndScheduleBatchNoBlock`, `TestFailFastDuringBenchmarkFill`, and `TestFillPhaseEndToEnd`.
Test coverage for slow-start ramp behavior `tests/unittest/_torch/executor/test_benchmark_disagg.py`	Added test methods verifying first iteration caps at `tp_size`, subsequent iterations double and saturate at `total_max`, ramp state resets after fill phase completes, and warmup bypasses throttling while leaving ramp state unchanged; tightened non-fill-phase assertions.
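To make the state transitions in the table concrete, here is a minimal model of the ramp state. It is a sketch, not the code in `py_executor.py`: the class `SlowStartFillThrottle` and its method names are hypothetical, and it encodes only what the walkthrough states (the cap starts at 0, becomes tp_size on the first throttled fill iteration, doubles up to total_max, resets to 0 when the fill gate opens, and warmup bypasses the throttle without touching the state).

```python
class SlowStartFillThrottle:
    """Hypothetical stand-in for the executor's `_fill_admit_cap` bookkeeping."""

    def __init__(self, tp_size: int, total_max: int) -> None:
        self.tp_size = tp_size
        self.total_max = total_max
        self.fill_admit_cap = 0  # 0 means the ramp has not started yet

    def admit_limit(self, available: int, in_fill_phase: bool,
                    is_warmup: bool) -> int:
        """How many waiting requests may be admitted this iteration."""
        if is_warmup or not in_fill_phase:
            return available  # throttle disengaged; ramp state left as-is
        if self.fill_admit_cap == 0:
            self.fill_admit_cap = self.tp_size  # iter 0: burst protection
        else:
            self.fill_admit_cap = min(self.fill_admit_cap * 2, self.total_max)
        return min(self.fill_admit_cap, available)

    def on_fill_gate_open(self) -> None:
        """Fill phase is over: the next benchmark restarts at tp_size."""
        self.fill_admit_cap = 0


throttle = SlowStartFillThrottle(tp_size=8, total_max=4096)
print([throttle.admit_limit(10_000, True, False) for _ in range(4)])  # [8, 16, 32, 64]
throttle.on_fill_gate_open()
print(throttle.admit_limit(10_000, True, False))  # 8
```

In this model, using 0 as the "not ramping" sentinel makes the reset a single assignment, which is what lets back-to-back benchmarks restart at tp_size.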

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically describes the main change: replacing a fixed throttle with a slow-start ramp algorithm for disaggregated fill admission.
Description check	✅ Passed	The PR description is comprehensive and complete, covering the problem statement, rationale, solution details, test coverage, and all checklist items.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai

🧹 Nitpick comments (1)

tests/unittest/_torch/executor/test_benchmark_disagg.py (1)
488-1153: QA list update not needed for this PR scope.

These changes are unit-test-only (tests/unittest/...), so updating tests/integration/test_lists/qa/* is unnecessary in this PR.

As per coding guidelines: “If the PR only touches unittest/ or narrow unit scope, say explicitly whether QA list updates are unnecessary or optional.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unittest/_torch/executor/test_benchmark_disagg.py` around lines 488 -
1153, Add an explicit QA-list note to the PR (or the commit message) stating
that QA list updates are unnecessary because the changes only touch unit tests
(e.g., files under tests/unittest, such as classes
TestPrepareAndScheduleBatchNoBlock and TestBenchmarkFillAdmissionFlowControl in
test_benchmark_disagg.py); ensure the PR description includes a one-line
sentence like "QA list update not required: this PR only modifies unit tests" so
reviewers can quickly see QA gating is not needed.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f41ed635-2d62-4cce-9271-e2a729c12291

📥 Commits

Reviewing files that changed from the base of the PR and between f02cc22 and dddf296.

📒 Files selected for processing (2)

tensorrt_llm/_torch/pyexecutor/py_executor.py
tests/unittest/_torch/executor/test_benchmark_disagg.py

tensorrt-cicd · 2026-05-27T16:11:02Z

PR_Github #50569 [ run ] completed with state SUCCESS. Commit: dddf296
/LLM/main/L0_MergeRequest_PR pipeline #40069 completed with status: 'SUCCESS'

CI Report

Link to invocation

chenfeiz0326 · 2026-05-28T07:42:21Z

/bot run --disable-fail-fast --stage-list "DGX_B200-16_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE1-GPU8-Post-Merge-2,GB200-36_GPUs-9_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE8-GPU32-Post-Merge-2"

tensorrt-cicd · 2026-05-28T07:47:57Z

PR_Github #50738 [ run ] triggered by Bot. Commit: dddf296 Link to invocation

tensorrt-cicd · 2026-05-28T10:18:35Z

PR_Github #50738 [ run ] completed with state SUCCESS. Commit: dddf296
/LLM/main/L0_MergeRequest_PR pipeline #40218 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

chenfeiz0326 · 2026-05-29T01:33:27Z

/bot run --disable-fail-fast --stage-list "DGX_B200-16_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE1-GPU8-Post-Merge-2"

tensorrt-cicd · 2026-05-29T01:39:16Z

PR_Github #50925 [ run ] triggered by Bot. Commit: dddf296 Link to invocation

tensorrt-cicd · 2026-05-29T02:51:19Z

PR_Github #50925 [ run ] completed with state SUCCESS. Commit: dddf296
/LLM/main/L0_MergeRequest_PR pipeline #40386 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation

chienchunhung · 2026-05-29T17:24:57Z

/bot run --disable-fail-fast --add-multi-gpu-test

tensorrt-cicd · 2026-05-29T17:30:40Z

PR_Github #51078 [ run ] triggered by Bot. Commit: dddf296 Link to invocation

tensorrt-cicd · 2026-05-29T17:31:28Z

PR_Github #51078 [ run ] completed with state FAILURE. Commit: dddf296

Link to invocation

brb-nv

Logic looks good to me, but the comments will largely be out of context in the codebase once the MR lands. Let's clean them up.

chienchunhung · 2026-05-29T18:30:31Z

/bot run --disable-fail-fast --add-multi-gpu-test

tensorrt-cicd · 2026-05-29T18:36:52Z

PR_Github #51091 [ run ] triggered by Bot. Commit: 77072f5 Link to invocation

brb-nv

LGTM.

Tabrizian

Discussed with @chienchunhung offline; this looks like a false-positive bug. @chienchunhung will follow up with QA to update the bug condition so that we can remove this logic once the QA metrics are updated.

chienchunhung · 2026-05-30T01:58:07Z

PR #14438 (by @chenfeiz0326) addresses the metric miscalculation issue for the gen-only benchmark. cc: @Tabrizian

tensorrt-cicd · 2026-05-30T02:56:11Z

PR_Github #51091 [ run ] completed with state SUCCESS. Commit: 77072f5
/LLM/main/L0_MergeRequest_PR pipeline #40530 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

chienchunhung · 2026-05-30T03:59:26Z

/bot run --disable-fail-fast --add-multi-gpu-test

tensorrt-cicd · 2026-05-30T04:06:10Z

PR_Github #51154 [ run ] triggered by Bot. Commit: 77072f5 Link to invocation

tensorrt-cicd · 2026-05-30T09:31:00Z

PR_Github #51154 [ run ] completed with state SUCCESS. Commit: 77072f5
/LLM/main/L0_MergeRequest_PR pipeline #40588 completed with status: 'SUCCESS'

CI Report

Link to invocation

[https://nvbugs/6204488][fix] Replace fixed disagg fill throttle with slow-start ramp

dddf296

Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>

github-actions Bot assigned chienchunhung May 22, 2026

chienchunhung marked this pull request as ready for review May 27, 2026 15:40

chienchunhung requested a review from a team as a code owner May 27, 2026 15:40

chienchunhung requested review from Tabrizian and pcastonguay May 27, 2026 15:40

chienchunhung requested a review from brb-nv May 27, 2026 15:42

coderabbitai Bot reviewed May 27, 2026

brb-nv reviewed May 29, 2026

Comment thread tensorrt_llm/_torch/pyexecutor/py_executor.py

Comment thread tensorrt_llm/_torch/pyexecutor/py_executor.py Outdated

Comment thread tests/unittest/_torch/executor/test_benchmark_disagg.py Outdated

Comment thread tests/unittest/_torch/executor/test_benchmark_disagg.py Outdated

[https://nvbugs/6204488][chore] Address review comments on PR NVIDIA#14475

77072f5

Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>

brb-nv approved these changes May 29, 2026

Tabrizian approved these changes May 29, 2026

chienchunhung enabled auto-merge (squash) May 29, 2026 22:33

chienchunhung merged commit f20858c into NVIDIA:main May 30, 2026
8 checks passed
