[#13561][fix] AutoDeploy: forward garbage_collection_gen0_threshold to PyExecutor #14218
20d732a to a051c10
/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1" --disable-fail-fast
PR_Github #48753 [ run ] triggered by Bot. Commit:
PR_Github #48753 [ run ] completed with state
…PyExecutor
PT BE passes llm_args.garbage_collection_gen0_threshold (TorchLlmArgs default: 20000) into PyExecutor, which wraps the executor loop with customized_gc_thresholds and calls gc.set_threshold(gen0_threshold).
AD's create_autodeploy_executor() did not forward this field, so PyExecutor saw None and the context manager became a no-op. AD then ran at Python's default gen-0 threshold of 700 (~28x more frequent), and the resulting gen-2 collections periodically stalled decode for 0.5-1s on heavy models (FX graph + CUDA-graph wrappers + mamba/SSM caches + MoE state), producing large ITL spikes and substantial run-to-run throughput variance (e.g. ~286 vs ~342 toks/s/user on Nemotron-3-Nano-30B-A3B-FP8 TP=4).
Forward the field from ad_config; the existing TorchLlmArgs default is honored and users can still override it from YAML.
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
a051c10 to 9e2d0df
/bot run
/bot run --disable-fail-fast
PR_Github #48910 [ run ] triggered by Bot. Commit:
PR_Github #48912 [ run ] triggered by Bot. Commit:
PR_Github #48912 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #48973 [ run ] triggered by Bot. Commit:
PR_Github #48973 [ run ] completed with state
/bot run --disable-fail-fast --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1"
PR_Github #49084 [ run ] triggered by Bot. Commit:
PR_Github #49084 [ run ] completed with state
/bot skip --comment "test_aggregate_counters_match_expected was removed"
PR_Github #49253 [ skip ] triggered by Bot. Commit:
PR_Github #49253 [ skip ] completed with state
The PyTorch backend (PT BE) passes llm_args.garbage_collection_gen0_threshold (TorchLlmArgs default: 20000) into PyExecutor, which wraps the executor loop with customized_gc_thresholds and calls gc.set_threshold(gen0_threshold).
AutoDeploy's (AD) create_autodeploy_executor() did not forward this field, so PyExecutor saw None and the context manager became a no-op. AD therefore ran at Python's default gen-0 threshold of 700, so gen-0 collections triggered about 28x more often than with the 20000 default. The resulting gen-2 collections periodically stalled decode for 0.5-1 s on heavy models (FX graph + CUDA-graph wrappers + mamba/SSM caches + MoE state), producing large inter-token latency (ITL) spikes and substantial run-to-run throughput variance (e.g. ~286 vs ~342 toks/s/user on Nemotron-3-Nano-30B-A3B-FP8, TP=4).
This PR forwards the field from ad_config; the existing TorchLlmArgs default is honored and users can still override it from YAML.
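The no-op behaviour described above can be illustrated with a minimal sketch. This is not the TensorRT-LLM implementation of customized_gc_thresholds; `gen0_threshold_override` is a hypothetical stand-in, assuming only that the real context manager skips `gc.set_threshold` when it receives None.

```python
import gc
from contextlib import contextmanager
from typing import Iterator, Optional


@contextmanager
def gen0_threshold_override(gen0_threshold: Optional[int]) -> Iterator[None]:
    """Hypothetical stand-in for customized_gc_thresholds (not the library source)."""
    if gen0_threshold is None:
        # Nothing was forwarded: the interpreter keeps its default thresholds.
        yield
        return
    original = gc.get_threshold()
    # Passing a single argument changes only the gen-0 threshold.
    gc.set_threshold(gen0_threshold)
    try:
        yield
    finally:
        # Restore all three thresholds when the wrapped loop exits.
        gc.set_threshold(*original)


if __name__ == "__main__":
    with gen0_threshold_override(None):
        print("not forwarded:", gc.get_threshold())
    with gen0_threshold_override(20000):
        print("forwarded:", gc.get_threshold())
```

With None the thresholds inside the block are whatever the interpreter started with, which is the situation AutoDeploy was in before the field was forwarded.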
Perf improvement on Nano V3, 1k/1k, concurrency 1, TP4:
t/s/u: 295 -> 346
otps: 282 -> 336
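For reference, the figures quoted in this description work out as follows. This is a small arithmetic sketch using only the numbers stated above; `relative_gain` is a helper name introduced here for illustration.

```python
def relative_gain(before: float, after: float) -> float:
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    # Ratio between the TorchLlmArgs default (20000) and Python's default (700).
    print(f"gen-0 threshold ratio: {20000 / 700:.1f}x")  # 28.6x
    print(f"t/s/u gain: {relative_gain(295, 346):.1f}%")  # 17.3%
    print(f"otps gain: {relative_gain(282, 336):.1f}%")   # 19.1%
```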
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.