[#13561][fix] AutoDeploy: forward garbage_collection_gen0_threshold to PyExecutor by MrGeva · Pull Request #14218 · NVIDIA/TensorRT-LLM

MrGeva · 2026-05-17T12:21:30Z

The PyTorch backend (PT BE) passes llm_args.garbage_collection_gen0_threshold (TorchLlmArgs default: 20000) into PyExecutor. PyExecutor wraps the executor loop in the customized_gc_thresholds context manager, which calls gc.set_threshold(gen0_threshold).
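The mechanism can be sketched as a small context manager. This is a minimal sketch that assumes only the behavior stated above: set the gen-0 threshold for the duration of the loop, and do nothing when the value is None. `gc_thresholds_sketch` is a hypothetical name. It is not the actual customized_gc_thresholds implementation in TensorRT-LLM.

```python
import gc
from contextlib import contextmanager
from typing import Iterator, Optional


@contextmanager
def gc_thresholds_sketch(gen0_threshold: Optional[int]) -> Iterator[None]:
    """Hypothetical stand-in for customized_gc_thresholds (not the real code).

    Sets the gen-0 collection threshold for the duration of the block and
    restores the previous thresholds on exit. None leaves the interpreter's
    thresholds untouched, so the context manager is a no-op.
    """
    if gen0_threshold is None:
        # The path AutoDeploy hit before the fix: nothing is changed.
        yield
        return
    previous = gc.get_threshold()
    # Passing only threshold0 leaves the gen-1/gen-2 thresholds as they were.
    gc.set_threshold(gen0_threshold)
    try:
        yield
    finally:
        gc.set_threshold(*previous)
```

Wrapping a loop would look like `with gc_thresholds_sketch(20000): ...`. Passing None reproduces the pre-fix AutoDeploy behavior.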

AD's create_autodeploy_executor() did not forward this field, so PyExecutor received None and the context manager became a no-op. AD therefore ran at Python's default gen-0 threshold of 700, which means gen-0 collections were ~28x more frequent (20000 / 700 ≈ 28.6). The resulting gen-2 collections periodically stalled decode for 0.5-1 s on heavy models (FX graph + CUDA-graph wrappers + mamba/SSM caches + MoE state). This produced large inter-token latency (ITL) spikes and substantial run-to-run throughput variance (e.g. ~286 vs ~342 toks/s/user on Nemotron-3-Nano-30B-A3B-FP8, TP=4).
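To see why the gen-0 threshold matters, you can count how many GC passes the interpreter starts while allocating container objects under the two thresholds mentioned above. This sketch uses only the standard `gc.callbacks` hook. The allocation count is arbitrary and the absolute numbers vary with the CPython version, so only the ordering is meaningful. It does not model the heavy-model gen-2 stalls themselves.

```python
import gc


def count_collections(gen0_threshold: int, n_objects: int = 200_000) -> int:
    """Count GC passes started while n_objects GC-tracked containers are allocated."""
    started = 0

    def on_gc(phase: str, info: dict) -> None:
        nonlocal started
        if phase == "start":
            started += 1

    previous = gc.get_threshold()
    was_enabled = gc.isenabled()
    gc.collect()                      # full collection first so each run starts from the same state
    gc.enable()
    gc.set_threshold(gen0_threshold)  # only the gen-0 threshold changes
    gc.callbacks.append(on_gc)
    try:
        # Empty lists are GC-tracked; keeping them alive means every
        # allocation counts toward the gen-0 threshold.
        survivors = [[] for _ in range(n_objects)]
    finally:
        gc.callbacks.remove(on_gc)
        gc.set_threshold(*previous)
        if not was_enabled:
            gc.disable()
    return started
```

Calling `count_collections(700)` and `count_collections(20000)` should report far more passes for the lower threshold.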

Forward the field from ad_config so that the existing TorchLlmArgs default (20000) is honored. Users can still override it from YAML.
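The shape of the fix can be illustrated with stripped-down stand-ins. Everything here is hypothetical. `AdConfigSketch`, `PyExecutorSketch` and the two factory functions only mimic the relationship this PR describes between ad_config, create_autodeploy_executor() and PyExecutor. The real constructor signatures are not reproduced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdConfigSketch:
    """Hypothetical stand-in for the AutoDeploy config; 20000 mirrors the TorchLlmArgs default."""
    garbage_collection_gen0_threshold: Optional[int] = 20000


class PyExecutorSketch:
    """Hypothetical stand-in for PyExecutor; only the GC-related argument is modeled."""

    def __init__(self, garbage_collection_gen0_threshold: Optional[int] = None) -> None:
        # None means "leave the interpreter's GC thresholds alone".
        self.garbage_collection_gen0_threshold = garbage_collection_gen0_threshold


def create_executor_before(ad_config: AdConfigSketch) -> PyExecutorSketch:
    # Bug: the field is never passed, so the executor falls back to None.
    return PyExecutorSketch()


def create_executor_after(ad_config: AdConfigSketch) -> PyExecutorSketch:
    # Fix: forward the configured value (the default or a user override).
    return PyExecutorSketch(
        garbage_collection_gen0_threshold=ad_config.garbage_collection_gen0_threshold
    )
```

With the default config, the "before" factory yields None and the "after" factory yields 20000. An override such as `AdConfigSketch(garbage_collection_gen0_threshold=50000)` passes through unchanged.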

Perf improvement on Nano V3, 1k/1k, concurrency 1, TP=4:
- t/s/u: 295 -> 346
- otps: 282 -> 336
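For reference, you can compute the relative gains directly from the numbers above. This is plain arithmetic on the reported values, not an additional measurement.

```python
def pct_gain(before: float, after: float) -> float:
    """Relative improvement in percent."""
    return (after - before) / before * 100.0


print(f"t/s/u: {pct_gain(295, 346):.1f}%")  # t/s/u: 17.3%
print(f"otps: {pct_gain(282, 336):.1f}%")   # otps: 19.1%
```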

Summary by CodeRabbit

Release Notes

Chores
- Forward the AutoDeploy garbage_collection_gen0_threshold setting to PyExecutor so the configured Python GC gen-0 threshold is applied during execution.


MrGeva · 2026-05-17T12:30:07Z

/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1" --disable-fail-fast

coderabbitai · 2026-05-17T12:30:42Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 32471b04-95b3-422f-8ba9-d32e2f1290c6

📥 Commits

Reviewing files that changed from the base of the PR and between 667bec0 and a051c10.

📒 Files selected for processing (1)

tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py

Walkthrough

The PR wires garbage collection Gen0 threshold configuration through to the PyExecutor constructor initialization in the auto-deploy executor factory function, enabling configuration of garbage collection behavior during executor setup.

Changes

Garbage collection Gen0 threshold configuration wiring

Layer / File(s)	Summary
PyExecutor garbage collection Gen0 threshold initialization `tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py`	`create_autodeploy_executor` passes `ad_config.garbage_collection_gen0_threshold` as a constructor argument to `PyExecutor` initialization.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Possibly related PRs

NVIDIA/TensorRT-LLM#13532: Both PRs modify the create_autodeploy_executor and PyExecutor construction in ad_executor.py, with overlapping executor configuration changes.

Suggested reviewers

govind-ramnarayan
nvchenghaoz
taylor-yb-lee

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: forwarding garbage_collection_gen0_threshold to PyExecutor, which directly corresponds to the code change in the pull request.
Description check	✅ Passed	The PR description provides detailed context about the problem, the fix, and performance improvements, but does not follow the template structure with explicit sections.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


tensorrt-cicd · 2026-05-17T12:36:24Z

PR_Github #48753 [ run ] triggered by Bot. Commit: a051c10 Link to invocation

galagam

Good find!!

tensorrt-cicd · 2026-05-17T20:53:28Z

PR_Github #48753 [ run ] completed with state FAILURE. Commit: a051c10
/LLM/main/L0_MergeRequest_PR pipeline #38520 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>

MrGeva · 2026-05-18T13:01:41Z

/bot run

MrGeva · 2026-05-18T13:02:10Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-18T13:07:42Z

PR_Github #48910 [ run ] triggered by Bot. Commit: 9e2d0df Link to invocation

tensorrt-cicd · 2026-05-18T13:09:38Z

PR_Github #48912 [ run ] triggered by Bot. Commit: 9e2d0df Link to invocation

tensorrt-cicd · 2026-05-18T17:33:29Z

PR_Github #48912 [ run ] completed with state SUCCESS. Commit: 9e2d0df
/LLM/main/L0_MergeRequest_PR pipeline #38661 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

MrGeva · 2026-05-18T19:15:33Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-18T19:21:11Z

PR_Github #48973 [ run ] triggered by Bot. Commit: 9e2d0df Link to invocation

tensorrt-cicd · 2026-05-18T20:10:32Z

PR_Github #48973 [ run ] completed with state SUCCESS. Commit: 9e2d0df
/LLM/main/L0_MergeRequest_PR pipeline #38717 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

MrGeva · 2026-05-19T04:38:09Z

/bot run --disable-fail-fast --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1"

tensorrt-cicd · 2026-05-19T04:45:42Z

PR_Github #49084 [ run ] triggered by Bot. Commit: 9e2d0df Link to invocation

tensorrt-cicd · 2026-05-19T16:20:52Z

PR_Github #49084 [ run ] completed with state SUCCESS. Commit: 9e2d0df
/LLM/main/L0_MergeRequest_PR pipeline #38804 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

tburt-nv · 2026-05-19T18:48:11Z

/bot skip --comment "test_aggregate_counters_match_expected was removed"

tensorrt-cicd · 2026-05-19T18:55:04Z

PR_Github #49253 [ skip ] triggered by Bot. Commit: 9e2d0df Link to invocation

tensorrt-cicd · 2026-05-19T19:02:44Z

PR_Github #49253 [ skip ] completed with state SUCCESS. Commit: 9e2d0df
Skipping testing for commit 9e2d0df

Link to invocation

github-actions Bot assigned MrGeva May 17, 2026

MrGeva force-pushed the eg/ad-fix-gc-threshold branch 2 times, most recently from 20d732a to a051c10, May 17, 2026 12:26

MrGeva marked this pull request as ready for review May 17, 2026 12:28

MrGeva requested a review from a team as a code owner May 17, 2026 12:28

MrGeva requested a review from galagam May 17, 2026 12:28

MrGeva mentioned this pull request May 17, 2026

[AutoDeploy] BTK findings: Nano doesn't scale well for tp>1 #13561 (Closed, 1 task)

galagam approved these changes May 17, 2026

MrGeva enabled auto-merge (squash) May 17, 2026 15:11

MrGeva force-pushed the eg/ad-fix-gc-threshold branch from a051c10 to 9e2d0df, May 18, 2026 08:03

MrGeva merged commit 1c2c5c3 into NVIDIA:main May 19, 2026
7 checks passed

