[https://nvbugs/6204488][fix] Replace fixed disagg fill throttle with slow-start ramp#14475
Conversation
… slow-start ramp Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>
|
/bot run --disable-fail-fast |
|
PR_Github #50018 [ run ] triggered by Bot. Commit: |
|
PR_Github #50018 [ run ] completed with state
|
|
/bot run --disable-fail-fast --add-multi-gpu-test |
|
PR_Github #50058 [ run ] triggered by Bot. Commit: |
|
PR_Github #50058 [ run ] completed with state
|
|
/bot run --disable-fail-fast --add-multi-gpu-test --skip-test "TestQwen3_5_35B_A3B::test_fp8[enable_block_reuse=True]" |
|
PR_Github #50070 Bot args parsing error: usage: /bot [-h] |
|
/bot run --disable-fail-fast --add-multi-gpu-test --skip-test "test_fp8[enable_block_reuse=True]" |
|
PR_Github #50075 Bot args parsing error: usage: /bot [-h] |
|
/bot run --disable-fail-fast --add-multi-gpu-test |
|
PR_Github #50079 [ run ] triggered by Bot. Commit: |
|
PR_Github #50079 [ run ] completed with state |
|
/bot run --disable-fail-fast --stage-list "*B200*PerfSanity*,*B300*PerfSanity*,*GB200*PerfSanity*" |
|
PR_Github #50426 [ run ] triggered by Bot. Commit: |
|
PR_Github #50426 [ run ] completed with state
|
|
/bot run --disable-fail-fast --add-multi-gpu-test |
|
PR_Github #50569 [ run ] triggered by Bot. Commit: |
📝 WalkthroughWalkthroughThis PR implements a slow-start admission control algorithm for the benchmark disaggregated fill phase. Instead of using a fixed throttling cap, the executor now dynamically increases the per-iteration admission cap by doubling it each iteration until it saturates at a global maximum. The ramp state resets when the fill phase completes. ChangesSlow-start admission throttle for benchmark disaggregated fill
🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/unittest/_torch/executor/test_benchmark_disagg.py (1)
488-1153: QA list update not needed for this PR scope.These changes are unit-test-only (
tests/unittest/...), so updatingtests/integration/test_lists/qa/*is unnecessary in this PR.As per coding guidelines: “If the PR only touches unittest/ or narrow unit scope, say explicitly whether QA list updates are unnecessary or optional.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unittest/_torch/executor/test_benchmark_disagg.py` around lines 488 - 1153, Add an explicit QA-list note to the PR (or the commit message) stating that QA list updates are unnecessary because the changes only touch unit tests (e.g., files under tests/unittest, such as classes TestPrepareAndScheduleBatchNoBlock and TestBenchmarkFillAdmissionFlowControl in test_benchmark_disagg.py); ensure the PR description includes a one-line sentence like "QA list update not required: this PR only modifies unit tests" so reviewers can quickly see QA gating is not needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/unittest/_torch/executor/test_benchmark_disagg.py`:
- Around line 488-1153: Add an explicit QA-list note to the PR (or the commit
message) stating that QA list updates are unnecessary because the changes only
touch unit tests (e.g., files under tests/unittest, such as classes
TestPrepareAndScheduleBatchNoBlock and TestBenchmarkFillAdmissionFlowControl in
test_benchmark_disagg.py); ensure the PR description includes a one-line
sentence like "QA list update not required: this PR only modifies unit tests" so
reviewers can quickly see QA gating is not needed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: f41ed635-2d62-4cce-9271-e2a729c12291
📒 Files selected for processing (2)
tensorrt_llm/_torch/pyexecutor/py_executor.pytests/unittest/_torch/executor/test_benchmark_disagg.py
|
PR_Github #50569 [ run ] completed with state |
|
/bot run --disable-fail-fast --stage-list "DGX_B200-16_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE1-GPU8-Post-Merge-2,GB200-36_GPUs-9_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE8-GPU32-Post-Merge-2" |
|
PR_Github #50738 [ run ] triggered by Bot. Commit: |
|
PR_Github #50738 [ run ] completed with state
|
|
/bot run --disable-fail-fast --stage-list "DGX_B200-16_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE1-GPU8-Post-Merge-2" |
|
PR_Github #50925 [ run ] triggered by Bot. Commit: |
|
PR_Github #50925 [ run ] completed with state |
|
/bot run --disable-fail-fast --add-multi-gpu-test |
|
PR_Github #51078 [ run ] triggered by Bot. Commit: |
|
PR_Github #51078 [ run ] completed with state |
brb-nv
left a comment
There was a problem hiding this comment.
Logic looks good to me. But, the comments will largely be out of context in the codebase once MR lands. Let's clean them up.
…14475 Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>
|
/bot run --disable-fail-fast --add-multi-gpu-test |
|
PR_Github #51091 [ run ] triggered by Bot. Commit: |
Tabrizian
left a comment
There was a problem hiding this comment.
Discussed with @chienchunhung offline, looks like this is a false positive bug. @chienchunhung will follow up with QA to update the bug condition so that we can remove this logic once QA metrics are updated.
|
PR#14438 (by @chenfeiz0326 ) is addressing the metric mis-calculation issue for gen-only benchmark. cc: @Tabrizian |
|
PR_Github #51091 [ run ] completed with state
|
|
/bot run --disable-fail-fast --add-multi-gpu-test |
|
PR_Github #51154 [ run ] triggered by Bot. Commit: |
|
PR_Github #51154 [ run ] completed with state |
Summary by CodeRabbit
Improvements
Tests
Summary
PR #13347's fixed
tp_size/iter cap on benchmark disagg fill admission stretched fill into a long ramp at high concurrency (con/tp_sizeiterations), regressing post-merge perf-sanity gen-only configs by 7–13 % on output token throughput across DeepSeek-R1, GPT-OSS, and Kimi-K2.5 1k1k configurations on B200 / GB200 / GB300.This PR replaces the fixed cap with a doubling slow-start ramp:
tp_sizerequests — the load-bearing iter-0 burst protection against PR [None][fix] return an explicit error if the requests can't be schedul… #12206's insufficient-KV fail-fast under transient ADP-router imbalance.total_maxwithinceil(log2(total_max / tp_size))iters (~10 iters / ~1 s for con=4096, tp=8).Net effect:
tp_sizeexactly as before.Test Coverage
Unit tests —
tests/unittest/_torch/executor/test_benchmark_disagg.pyTestBenchmarkFillAdmissionFlowControl(rewritten) — five tests pinning the ramp shape:test_first_iter_caps_at_tp_size(iter-0 burst protection invariant, parametrizedtp_size ∈ {1, 4, 32}).test_ramp_doubles_each_iter_and_saturates_at_total_max(full ramp shape across iters, parametrizedtp_size ∈ {1, 4, 8, 32}; also assertsO(log2)convergence).test_no_fill_phase_uses_full_available_capacity(throttle disengages once the gate opens).test_ramp_state_resets_when_fill_phase_ends(back-to-back benchmarks restart attp_size).test_warmup_skips_throttle(warmup bypasses the ramp).TestFillPhaseEndToEnd::test_full_lifecycle(existing) — Phase 0 still asserts iter-0 admission caps atTP_SIZE, confirming the ramp's first iter matches the original fixed-cap behavior the lifecycle was written against.Integration tests —
tests/integration/defs/disaggregated/test_disaggregated.py::test_disaggregated_benchmark_gen_only_insufficient_kv. This was the test that exercised PR [https://nvbugs/6093911][fix] Fix disagg gen-only benchmark hang under ADP router imbalance #13347's original repro; the slow-start preserves its iter-0 behavior so it remains green.PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either
api-compatibleorapi-breaking. Forapi-breaking, includeBREAKINGin the PR title.Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.