[https://nvbugs/6204488][fix] Replace fixed disagg fill throttle with slow-start ramp #14475

Merged
chienchunhung merged 2 commits into NVIDIA:main from chienchunhung:fix/disagg-fill-throttle-slow-start
May 30, 2026
@chienchunhung
Collaborator

@chienchunhung chienchunhung commented May 22, 2026

Summary by CodeRabbit

  • Improvements

    • Enhanced request admission logic during fill phases with a slow-start ramping mechanism for improved performance and throughput control.
  • Tests

    • Added comprehensive test coverage for slow-start admission behavior, including ramp initialization, iteration scaling, and phase transitions.

Summary

PR #13347 capped benchmark disagg fill admission at a fixed tp_size requests per iteration. At high concurrency this stretched the fill phase into a long ramp of about con/tp_size iterations, which regressed output token throughput on post-merge perf-sanity gen-only configs by 7–13 % across DeepSeek-R1, GPT-OSS, and Kimi-K2.5 1k1k configurations on B200 / GB200 / GB300.
This PR replaces the fixed cap with a doubling slow-start ramp:

  • Iter 0 still admits at most tp_size requests. This is the load-bearing iter-0 burst protection against the insufficient-KV fail-fast from PR #12206 ([None][fix] return an explicit error if the requests can't be schedul…) under transient ADP-router imbalance.
  • Each subsequent iter doubles the cap.
  • The cap saturates at total_max within ceil(log2(total_max / tp_size)) iters (~10 iters / ~1 s for con=4096, tp=8).

Net effect:

  • The original Kimi-K2-Thinking 8k1k ctx8/gen1 con=8192 repro continues to pass, because iter 0 caps at tp_size exactly as before.
  • High-concurrency 1k1k post-merge configs no longer pay the 7–13 % wall-clock overhead.
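
The ramp shape described above can be illustrated with a small standalone sketch. This is not the code in py_executor.py: the function names `slow_start_caps` and `fixed_cap_fill_iters` are hypothetical, and the sketch assumes that total_max equals the benchmark concurrency and that the iter-0 cap is clamped to total_max when tp_size is larger.

```python
import math
from typing import List


def slow_start_caps(tp_size: int, total_max: int) -> List[int]:
    """Per-iteration admission caps for a doubling slow-start ramp.

    Hypothetical helper: starts at tp_size (clamped to total_max, an
    assumption), doubles each iteration, and stops once the cap has
    saturated at total_max.
    """
    if tp_size < 1 or total_max < 1:
        raise ValueError("tp_size and total_max must be positive")
    caps = []
    cap = min(tp_size, total_max)
    while True:
        caps.append(cap)
        if cap >= total_max:
            return caps
        cap = min(cap * 2, total_max)


def fixed_cap_fill_iters(concurrency: int, tp_size: int) -> int:
    """Iterations needed to admit `concurrency` requests at tp_size per iter."""
    return math.ceil(concurrency / tp_size)


caps = slow_start_caps(tp_size=8, total_max=4096)
print(caps)           # [8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096]
print(len(caps) - 1)  # 9 doublings, matching ceil(log2(4096 / 8))
print(fixed_cap_fill_iters(4096, 8))  # 512 iterations under the fixed cap
```

Under these assumptions, con=4096 with tp=8 reaches the full cap after 9 doublings (10 iterations counting iter 0), compared with 512 iterations at a fixed tp_size per iteration.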

Test Coverage

Unit tests: tests/unittest/_torch/executor/test_benchmark_disagg.py

  • TestBenchmarkFillAdmissionFlowControl (rewritten) — five tests pinning the ramp shape:
    • test_first_iter_caps_at_tp_size (iter-0 burst protection invariant, parametrized tp_size ∈ {1, 4, 32}).
    • test_ramp_doubles_each_iter_and_saturates_at_total_max (full ramp shape across iters, parametrized tp_size ∈ {1, 4, 8, 32}; also asserts O(log2) convergence).
    • test_no_fill_phase_uses_full_available_capacity (throttle disengages once the gate opens).
    • test_ramp_state_resets_when_fill_phase_ends (back-to-back benchmarks restart at tp_size).
    • test_warmup_skips_throttle (warmup bypasses the ramp).
  • TestFillPhaseEndToEnd::test_full_lifecycle (existing): Phase 0 still asserts that iter-0 admission caps at TP_SIZE, confirming that the ramp's first iter matches the original fixed-cap behavior the lifecycle was written against.

Integration tests: tests/integration/defs/disaggregated/test_disaggregated.py::test_disaggregated_benchmark_gen_only_insufficient_kv. This test exercised the original repro from PR #13347 ([https://nvbugs/6093911][fix] Fix disagg gen-only benchmark hang under ADP router imbalance); the slow-start ramp preserves its iter-0 behavior, so it remains green.

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

… slow-start ramp

Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>
@chienchunhung
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #50018 [ run ] triggered by Bot. Commit: dddf296 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #50018 [ run ] completed with state FAILURE. Commit: dddf296
/LLM/main/L0_MergeRequest_PR pipeline #39582 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@chienchunhung
Collaborator Author

/bot run --disable-fail-fast --add-multi-gpu-test

@tensorrt-cicd
Collaborator

PR_Github #50058 [ run ] triggered by Bot. Commit: dddf296 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #50058 [ run ] completed with state SUCCESS. Commit: dddf296
/LLM/main/L0_MergeRequest_PR pipeline #39616 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@chienchunhung
Collaborator Author

/bot run --disable-fail-fast --add-multi-gpu-test --skip-test "TestQwen3_5_35B_A3B::test_fp8[enable_block_reuse=True]"

@tensorrt-cicd
Collaborator

PR_Github #50070 Bot args parsing error: usage: /bot [-h]
{run,kill,skip,submit,reviewers,reuse-pipeline,reuse-review} ...
/bot: error: unrecognized arguments: TestQwen3_5_35B_A3B::test_fp8[enable_block_reuse=True]

Link to invocation

@chienchunhung
Collaborator Author

/bot run --disable-fail-fast --add-multi-gpu-test --skip-test "test_fp8[enable_block_reuse=True]"

@tensorrt-cicd
Collaborator

PR_Github #50075 Bot args parsing error: usage: /bot [-h]
{run,kill,skip,submit,reviewers,reuse-pipeline,reuse-review} ...
/bot: error: unrecognized arguments: test_fp8[enable_block_reuse=True]

Link to invocation

@chienchunhung
Collaborator Author

/bot run --disable-fail-fast --add-multi-gpu-test

@tensorrt-cicd
Collaborator

PR_Github #50079 [ run ] triggered by Bot. Commit: dddf296 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #50079 [ run ] completed with state SUCCESS. Commit: dddf296
/LLM/main/L0_MergeRequest_PR pipeline #39633 completed with status: 'SUCCESS'

CI Report

Link to invocation

@chenfeiz0326
Collaborator

/bot run --disable-fail-fast --stage-list "*B200*PerfSanity*,*B300*PerfSanity*,*GB200*PerfSanity*"

@tensorrt-cicd
Collaborator

PR_Github #50426 [ run ] triggered by Bot. Commit: dddf296 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #50426 [ run ] completed with state SUCCESS. Commit: dddf296
/LLM/main/L0_MergeRequest_PR pipeline #39949 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@chenfeiz0326
Collaborator

With this PR, the regressed cases recover, for example:
https://tensorrt-llm.tensorrt-llm-perf-ci-report.sc2-paas.nvidia.com/pre-merge?selectedBranches=main&selectedTests=gen_only-gb200_gpt-oss-120b-fp4_1k1k_con64_ctx1_tp1_gen1_tp4_eplb0_mtp0_ccb-NIXL-con64_iter1_isl1024_osl1024&selectedCurve=__all__&pinnedSection=gen_only-gb200_gpt-oss-120b-fp4_1k1k_con64_ctx1_tp1_gen1_tp4_eplb0_mtp0_ccb-NIXL-con64_iter1_isl1024_osl1024%7Cmain%7Cgb200&in_build-id-input=39949&in_job-name-select=LLM%2Fmain%2FL0_MergeRequest_PR

https://tensorrt-llm.tensorrt-llm-perf-ci-report.sc2-paas.nvidia.com/pre-merge?selectedBranches=main&selectedTests=gen_only-gb200_kimi-k25-thinking-fp4_1k1k_con4096_ctx1_dep4_gen1_dep8_eplb0_mtp0_ccb-NIXL-con4096_iter1_isl1024_osl1024&selectedCurve=__all__&pinnedSection=gen_only-gb200_kimi-k25-thinking-fp4_1k1k_con4096_ctx1_dep4_gen1_dep8_eplb0_mtp0_ccb-NIXL-con4096_iter1_isl1024_osl1024%7Cmain%7Cgb200&in_build-id-input=39949&in_job-name-select=LLM%2Fmain%2FL0_MergeRequest_PR

https://tensorrt-llm.tensorrt-llm-perf-ci-report.sc2-paas.nvidia.com/pre-merge?selectedBranches=main&selectedTests=gen_only-gb200_deepseek-r1-fp4_1k1k_con1024_ctx1_dep4_gen1_dep8_eplb0_mtp0_ccb-NIXL-con1024_iter1_isl1024_osl1024&selectedCurve=__all__&pinnedSection=gen_only-gb200_deepseek-r1-fp4_1k1k_con1024_ctx1_dep4_gen1_dep8_eplb0_mtp0_ccb-NIXL-con1024_iter1_isl1024_osl1024%7Cmain%7Cgb200&in_build-id-input=39949&in_job-name-select=LLM%2Fmain%2FL0_MergeRequest_PR

@chienchunhung
Collaborator Author

/bot run --disable-fail-fast --add-multi-gpu-test

@chienchunhung chienchunhung marked this pull request as ready for review May 27, 2026 15:40
@chienchunhung chienchunhung requested a review from a team as a code owner May 27, 2026 15:40
@tensorrt-cicd
Collaborator

PR_Github #50569 [ run ] triggered by Bot. Commit: dddf296 Link to invocation

@chienchunhung chienchunhung requested a review from brb-nv May 27, 2026 15:42
@coderabbitai
Contributor

coderabbitai Bot commented May 27, 2026

📝 Walkthrough

This PR implements a slow-start admission control algorithm for the benchmark disaggregated fill phase. Instead of using a fixed throttling cap, the executor now dynamically increases the per-iteration admission cap by doubling it each iteration until it saturates at a global maximum. The ramp state resets when the fill phase completes.

Changes

Slow-start admission throttle for benchmark disaggregated fill

| Layer / File(s) | Summary |
|---|---|
| Slow-start admission ramp state and logic (tensorrt_llm/_torch/pyexecutor/py_executor.py) | _fill_admit_cap field initialized to 0 in __init__; reset to 0 when the fill gate opens in _check_benchmark_disagg_gate; _pop_from_waiting_queue reworked to initialize the cap at tp_size and double it each iteration up to total_max. |
| Test infrastructure and stub initialization for ramp state (tests/unittest/_torch/executor/test_benchmark_disagg.py) | Updated the TestBenchmarkFillAdmissionFlowControl class docstring and added the _expected_ramp() helper to compute expected admission-cap sequences; initialized _fill_admit_cap to 0 in multiple test executor stubs across MockBenchmarkExecutor, TestPrepareAndScheduleBatchNoBlock, TestFailFastDuringBenchmarkFill, and TestFillPhaseEndToEnd. |
| Test coverage for slow-start ramp behavior (tests/unittest/_torch/executor/test_benchmark_disagg.py) | Added test methods verifying that the first iteration caps at tp_size, subsequent iterations double and saturate at total_max, ramp state resets after the fill phase completes, and warmup bypasses throttling while leaving ramp state unchanged; tightened non-fill-phase assertions. |
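
The ramp state described above can be modeled with a toy class. This is a simplified sketch, not the executor's implementation: the class `FillAdmissionRamp` and its methods `next_cap` and `on_fill_gate_open` are hypothetical names, only the field name `_fill_admit_cap` and its 0 sentinel come from the summary, and returning total_max outside the fill phase is an assumption standing in for "full available capacity".

```python
class FillAdmissionRamp:
    """Toy model of the slow-start ramp state (hypothetical, for illustration)."""

    def __init__(self, tp_size: int, total_max: int) -> None:
        self.tp_size = tp_size
        self.total_max = total_max
        # 0 means the ramp has not started yet.
        self._fill_admit_cap = 0

    def next_cap(self, in_fill_phase: bool, is_warmup: bool = False) -> int:
        """Return the admission cap for this iteration."""
        if is_warmup or not in_fill_phase:
            # Throttle bypassed; ramp state is left untouched.
            return self.total_max
        if self._fill_admit_cap == 0:
            # First fill iteration: start at tp_size.
            self._fill_admit_cap = min(self.tp_size, self.total_max)
        else:
            # Later iterations: double, saturating at total_max.
            self._fill_admit_cap = min(self._fill_admit_cap * 2, self.total_max)
        return self._fill_admit_cap

    def on_fill_gate_open(self) -> None:
        """Reset the ramp so the next fill phase restarts at tp_size."""
        self._fill_admit_cap = 0


ramp = FillAdmissionRamp(tp_size=4, total_max=32)
print([ramp.next_cap(in_fill_phase=True) for _ in range(5)])  # [4, 8, 16, 32, 32]
ramp.on_fill_gate_open()
print(ramp.next_cap(in_fill_phase=True))  # 4
```

Keeping the cap as a single integer with 0 as the "not started" value means a reset is one assignment, which is consistent with the stubs in the test file only needing to initialize the field to 0.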

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main change: replacing a fixed throttle with a slow-start ramp algorithm for disaggregated fill admission. |
| Description check | ✅ Passed | The PR description is comprehensive and complete, covering the problem statement, rationale, solution details, test coverage, and all checklist items. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
tests/unittest/_torch/executor/test_benchmark_disagg.py (1)

488-1153: QA list update not needed for this PR scope.

These changes are unit-test-only (tests/unittest/...), so updating tests/integration/test_lists/qa/* is unnecessary in this PR.

As per coding guidelines: “If the PR only touches unittest/ or narrow unit scope, say explicitly whether QA list updates are unnecessary or optional.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unittest/_torch/executor/test_benchmark_disagg.py` around lines 488 -
1153, Add an explicit QA-list note to the PR (or the commit message) stating
that QA list updates are unnecessary because the changes only touch unit tests
(e.g., files under tests/unittest, such as classes
TestPrepareAndScheduleBatchNoBlock and TestBenchmarkFillAdmissionFlowControl in
test_benchmark_disagg.py); ensure the PR description includes a one-line
sentence like "QA list update not required: this PR only modifies unit tests" so
reviewers can quickly see QA gating is not needed.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f41ed635-2d62-4cce-9271-e2a729c12291

📥 Commits

Reviewing files that changed from the base of the PR and between f02cc22 and dddf296.

📒 Files selected for processing (2)
  • tensorrt_llm/_torch/pyexecutor/py_executor.py
  • tests/unittest/_torch/executor/test_benchmark_disagg.py

@tensorrt-cicd
Collaborator

PR_Github #50569 [ run ] completed with state SUCCESS. Commit: dddf296
/LLM/main/L0_MergeRequest_PR pipeline #40069 completed with status: 'SUCCESS'

CI Report

Link to invocation

@chenfeiz0326
Collaborator

/bot run --disable-fail-fast --stage-list "DGX_B200-16_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE1-GPU8-Post-Merge-2,GB200-36_GPUs-9_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE8-GPU32-Post-Merge-2"

@tensorrt-cicd
Collaborator

PR_Github #50738 [ run ] triggered by Bot. Commit: dddf296 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #50738 [ run ] completed with state SUCCESS. Commit: dddf296
/LLM/main/L0_MergeRequest_PR pipeline #40218 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@chenfeiz0326
Collaborator

/bot run --disable-fail-fast --stage-list "DGX_B200-16_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE1-GPU8-Post-Merge-2"

@tensorrt-cicd
Collaborator

PR_Github #50925 [ run ] triggered by Bot. Commit: dddf296 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #50925 [ run ] completed with state SUCCESS. Commit: dddf296
/LLM/main/L0_MergeRequest_PR pipeline #40386 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation

@chienchunhung
Collaborator Author

/bot run --disable-fail-fast --add-multi-gpu-test

@tensorrt-cicd
Collaborator

PR_Github #51078 [ run ] triggered by Bot. Commit: dddf296 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #51078 [ run ] completed with state FAILURE. Commit: dddf296

Link to invocation

Collaborator

@brb-nv brb-nv left a comment

Logic looks good to me. But the comments will largely be out of context in the codebase once the MR lands. Let's clean them up.

Comment thread tensorrt_llm/_torch/pyexecutor/py_executor.py
Comment thread tensorrt_llm/_torch/pyexecutor/py_executor.py Outdated
Comment thread tests/unittest/_torch/executor/test_benchmark_disagg.py Outdated
Comment thread tests/unittest/_torch/executor/test_benchmark_disagg.py Outdated
…14475

Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>
@chienchunhung
Collaborator Author

/bot run --disable-fail-fast --add-multi-gpu-test

@tensorrt-cicd
Collaborator

PR_Github #51091 [ run ] triggered by Bot. Commit: 77072f5 Link to invocation

Collaborator

@brb-nv brb-nv left a comment

LGTM.

Member

@Tabrizian Tabrizian left a comment

Discussed with @chienchunhung offline, looks like this is a false positive bug. @chienchunhung will follow up with QA to update the bug condition so that we can remove this logic once QA metrics are updated.

@chienchunhung chienchunhung enabled auto-merge (squash) May 29, 2026 22:33
@chienchunhung
Collaborator Author

PR #14438 (by @chenfeiz0326) is addressing the metric miscalculation issue for the gen-only benchmark. cc: @Tabrizian

@tensorrt-cicd
Collaborator

PR_Github #51091 [ run ] completed with state SUCCESS. Commit: 77072f5
/LLM/main/L0_MergeRequest_PR pipeline #40530 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@chienchunhung
Collaborator Author

/bot run --disable-fail-fast --add-multi-gpu-test

@tensorrt-cicd
Collaborator

PR_Github #51154 [ run ] triggered by Bot. Commit: 77072f5 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #51154 [ run ] completed with state SUCCESS. Commit: 77072f5
/LLM/main/L0_MergeRequest_PR pipeline #40588 completed with status: 'SUCCESS'

CI Report

Link to invocation

@chienchunhung chienchunhung merged commit f20858c into NVIDIA:main May 30, 2026
8 checks passed
5 participants