[#13561][fix] AutoDeploy: forward garbage_collection_gen0_threshold to PyExecutor #14218

Merged
MrGeva merged 1 commit into
NVIDIA:main from
nv-auto-deploy:eg/ad-fix-gc-threshold
May 19, 2026

Conversation

@MrGeva
Collaborator

@MrGeva MrGeva commented May 17, 2026

The PyTorch backend (PT BE) passes llm_args.garbage_collection_gen0_threshold (TorchLlmArgs default: 20000) into PyExecutor, which wraps the executor loop in customized_gc_thresholds and calls gc.set_threshold(gen0_threshold).
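
The behaviour described above can be sketched as a small context manager. This is a minimal sketch, not the TensorRT-LLM implementation: the name `gc_thresholds_sketch` is hypothetical, and only the standard-library `gc.get_threshold` and `gc.set_threshold` calls are relied on. It raises the gen-0 threshold for the duration of a block and restores the previous thresholds afterwards; passing `None` leaves the collector untouched.

```python
import gc
from contextlib import contextmanager
from typing import Iterator, Optional


@contextmanager
def gc_thresholds_sketch(gen0_threshold: Optional[int]) -> Iterator[None]:
    """Hypothetical stand-in for a GC-threshold context manager."""
    if gen0_threshold is None:
        # Nothing configured: leave the interpreter's thresholds as they are.
        yield
        return
    previous = gc.get_threshold()
    # Only the gen-0 threshold is changed; gen-1 and gen-2 keep their values.
    gc.set_threshold(gen0_threshold)
    try:
        yield
    finally:
        # Restore all three thresholds, even if the wrapped block raised.
        gc.set_threshold(*previous)


if __name__ == "__main__":
    with gc_thresholds_sketch(20000):
        print("inside:", gc.get_threshold())
    print("after: ", gc.get_threshold())
```

Restoring the thresholds in a `finally` clause keeps the change scoped to the wrapped loop, which is presumably why a context manager is a natural fit here.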

AutoDeploy's (AD's) create_autodeploy_executor() did not forward this field, so PyExecutor saw None and the context manager became a no-op. AD therefore ran at Python's default gen-0 threshold of 700, which triggers collections ~28x more often than the 20000 default. The resulting gen-2 collections periodically stalled decode for 0.5-1 s on heavy models (FX graph + CUDA-graph wrappers + mamba/SSM caches + MoE state), producing large ITL spikes and substantial run-to-run throughput variance (e.g. ~286 vs ~342 toks/s/user on Nemotron-3-Nano-30B-A3B-FP8 TP=4).
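
To make the "~28x" figure concrete, the following sketch counts how many collector runs are triggered while allocating the same number of tracked objects under each threshold. It is an illustration only: `count_collections` is a hypothetical helper, the object count is arbitrary, and it measures collection frequency rather than the stall durations reported above.

```python
import gc


def count_collections(gen0_threshold: int, n_objects: int = 100_000) -> int:
    """Count GC runs triggered while allocating n_objects tracked containers."""
    runs = 0

    def on_gc(phase: str, info: dict) -> None:
        nonlocal runs
        if phase == "start":
            runs += 1

    previous = gc.get_threshold()
    was_enabled = gc.isenabled()
    gc.enable()
    gc.set_threshold(gen0_threshold)
    gc.callbacks.append(on_gc)
    try:
        # Lists are GC-tracked; keeping them alive makes allocations
        # outpace deallocations, which is what trips the gen-0 threshold.
        keep = [[] for _ in range(n_objects)]
    finally:
        gc.callbacks.remove(on_gc)
        gc.set_threshold(*previous)
        if not was_enabled:
            gc.disable()
    del keep
    return runs


if __name__ == "__main__":
    print("runs at threshold 700:  ", count_collections(700))
    print("runs at threshold 20000:", count_collections(20_000))
    print("threshold ratio:", round(20_000 / 700, 1))
```

The exact run counts depend on the interpreter version, but the lower threshold should always trigger noticeably more collections for the same allocation pattern.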

Forward the field from ad_config; the existing TorchLlmArgs default is honored and users can still override it from YAML.
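
The shape of the fix can be sketched with stand-in classes. Everything here is hypothetical (`ArgsSketch`, `ExecutorSketch`, and both factory functions are illustrative names, not the real TensorRT-LLM classes or signatures); the only details taken from the description are the field name and the 20000 default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArgsSketch:
    """Hypothetical config object; 20000 mirrors the default quoted above."""

    garbage_collection_gen0_threshold: int = 20000


class ExecutorSketch:
    """Hypothetical executor whose GC argument defaults to None."""

    def __init__(self, garbage_collection_gen0_threshold: Optional[int] = None):
        self.garbage_collection_gen0_threshold = garbage_collection_gen0_threshold


def create_executor_before(config: ArgsSketch) -> ExecutorSketch:
    # The field is never forwarded, so the executor falls back to None.
    return ExecutorSketch()


def create_executor_after(config: ArgsSketch) -> ExecutorSketch:
    # Forward the configured value so the default or a user override applies.
    return ExecutorSketch(
        garbage_collection_gen0_threshold=config.garbage_collection_gen0_threshold
    )


if __name__ == "__main__":
    print(create_executor_before(ArgsSketch()).garbage_collection_gen0_threshold)
    print(create_executor_after(ArgsSketch()).garbage_collection_gen0_threshold)
```

Because the value is read from the config object rather than hard-coded, a user-supplied override flows through the same path as the default.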

Perf improvement on Nano V3, 1k/1k, concurrency 1, TP4:
t/s/u: 295 -> 346
otps: 282 -> 336

Summary by CodeRabbit

Release Notes

  • Chores
    • Enhanced executor configuration flexibility for improved memory management optimization.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@MrGeva MrGeva force-pushed the eg/ad-fix-gc-threshold branch 2 times, most recently from 20d732a to a051c10 Compare May 17, 2026 12:26
@MrGeva MrGeva marked this pull request as ready for review May 17, 2026 12:28
@MrGeva MrGeva requested a review from a team as a code owner May 17, 2026 12:28
@MrGeva MrGeva requested a review from galagam May 17, 2026 12:28
@MrGeva
Collaborator Author

MrGeva commented May 17, 2026

/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1" --disable-fail-fast

@coderabbitai
Contributor

coderabbitai Bot commented May 17, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 32471b04-95b3-422f-8ba9-d32e2f1290c6

📥 Commits

Reviewing files that changed from the base of the PR and between 667bec0 and a051c10.

📒 Files selected for processing (1)
  • tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py

📝 Walkthrough

The PR wires the garbage-collection gen-0 threshold through to the PyExecutor constructor in the AutoDeploy executor factory function, so that garbage-collection behavior can be configured at executor setup.

Changes

Garbage collection Gen0 threshold configuration wiring

| Layer / File(s) | Summary |
|---|---|
| PyExecutor garbage collection Gen0 threshold initialization (tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py) | create_autodeploy_executor passes ad_config.garbage_collection_gen0_threshold as a constructor argument to PyExecutor initialization. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Possibly related PRs

  • NVIDIA/TensorRT-LLM#13532: Both PRs modify the create_autodeploy_executor and PyExecutor construction in ad_executor.py, with overlapping executor configuration changes.

Suggested reviewers

  • govind-ramnarayan
  • nvchenghaoz
  • taylor-yb-lee

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main change: forwarding garbage_collection_gen0_threshold to PyExecutor, which directly corresponds to the code change in the pull request. |
| Description check | ✅ Passed | The PR description provides detailed context about the problem, the fix, and performance improvements, but does not follow the template structure with explicit sections. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@tensorrt-cicd
Collaborator

PR_Github #48753 [ run ] triggered by Bot. Commit: a051c10 Link to invocation

Collaborator

@galagam galagam left a comment

Good find!!

@MrGeva MrGeva enabled auto-merge (squash) May 17, 2026 15:11
@tensorrt-cicd
Collaborator

PR_Github #48753 [ run ] completed with state FAILURE. Commit: a051c10
/LLM/main/L0_MergeRequest_PR pipeline #38520 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

…PyExecutor

Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
@MrGeva MrGeva force-pushed the eg/ad-fix-gc-threshold branch from a051c10 to 9e2d0df Compare May 18, 2026 08:03
@MrGeva
Collaborator Author

MrGeva commented May 18, 2026

/bot run

@MrGeva
Collaborator Author

MrGeva commented May 18, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #48910 [ run ] triggered by Bot. Commit: 9e2d0df Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48912 [ run ] triggered by Bot. Commit: 9e2d0df Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48912 [ run ] completed with state SUCCESS. Commit: 9e2d0df
/LLM/main/L0_MergeRequest_PR pipeline #38661 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@MrGeva
Collaborator Author

MrGeva commented May 18, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #48973 [ run ] triggered by Bot. Commit: 9e2d0df Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48973 [ run ] completed with state SUCCESS. Commit: 9e2d0df
/LLM/main/L0_MergeRequest_PR pipeline #38717 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@MrGeva
Collaborator Author

MrGeva commented May 19, 2026

/bot run --disable-fail-fast --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1"

@tensorrt-cicd
Collaborator

PR_Github #49084 [ run ] triggered by Bot. Commit: 9e2d0df Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #49084 [ run ] completed with state SUCCESS. Commit: 9e2d0df
/LLM/main/L0_MergeRequest_PR pipeline #38804 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@tburt-nv
Collaborator

/bot skip --comment "test_aggregate_counters_match_expected was removed"

@tensorrt-cicd
Collaborator

PR_Github #49253 [ skip ] triggered by Bot. Commit: 9e2d0df Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #49253 [ skip ] completed with state SUCCESS. Commit: 9e2d0df
Skipping testing for commit 9e2d0df

Link to invocation

@MrGeva MrGeva merged commit 1c2c5c3 into NVIDIA:main May 19, 2026
7 checks passed
4 participants