[https://nvbugs/5963665][refactor] Refactor warmup orchestration in ModelEngine #12407
liji-nv wants to merge 1 commit into NVIDIA:main
Conversation
/bot run
📝 Walkthrough: Refactored the model engine's warmup flow by adding two helper methods for deriving warmup request configurations and restructuring the warmup execution path.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks: ✅ 1 passed | ❌ 2 inconclusive
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tensorrt_llm/_torch/pyexecutor/model_engine.py`:
- Around lines 683-688: The curr_max_num_tokens calculation calls
kv_cache_manager.get_num_available_tokens with
max_num_draft_tokens=self.original_max_draft_len and no batch_size, which
differs from the constraints used by the later warmup path. Change the call to
pass the real constraints used for warmups — max_num_draft_tokens=self.max_total_draft_tokens
and batch_size=self.batch_size (keeping token_num_upper_bound as-is) — so that
curr_max_num_tokens reflects the actual KV-cache limits and the derived
"max-shape" configs remain achievable.
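A minimal sketch of the suggested call change. The class below is a fake stand-in for the KV-cache manager: its token-accounting rule, the attribute names, and the helper function are illustrative assumptions, not the real TRT-LLM API.

```python
# Hypothetical sketch: pass the same constraints the warmup path actually
# uses (max_total_draft_tokens, batch_size) when sizing the max-shape config.

class FakeKVCacheManager:
    """Stand-in mimicking get_num_available_tokens semantics (assumed)."""
    def __init__(self, free_tokens: int):
        self.free_tokens = free_tokens

    def get_num_available_tokens(self, max_num_draft_tokens: int = 0,
                                 batch_size: int = 1) -> int:
        # Assume each request in the batch reserves room for its draft tokens.
        return self.free_tokens - batch_size * max_num_draft_tokens

def compute_curr_max_num_tokens(kv_cache_manager, max_total_draft_tokens,
                                batch_size, token_num_upper_bound):
    # Use the real warmup constraints so the derived "max shape" is
    # achievable within the actual KV-cache budget.
    available = kv_cache_manager.get_num_available_tokens(
        max_num_draft_tokens=max_total_draft_tokens,
        batch_size=batch_size,
    )
    return min(available, token_num_upper_bound)

mgr = FakeKVCacheManager(free_tokens=4096)
print(compute_curr_max_num_tokens(mgr, max_total_draft_tokens=4,
                                  batch_size=8, token_num_upper_bound=8192))
# 4096 - 8*4 = 4064
```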
- Around lines 748-752: The new pre-specialization pass (the _general_warmup
call, via forward, over the requests from _get_full_general_warmup_requests)
currently runs while CUDA-graph capture may still be enabled, which can
prematurely populate the CUDA-graph cache. Run that pass with capture disabled
(for example, temporarily set cuda_graph_runner.enabled = False, or wrap the
self._general_warmup call in a non-capturing context), restore the original
runner state, and only then call self._run_cuda_graph_warmup(), so the
dedicated CUDA-graph warmup is the only path that populates the graph cache.
Keep forward() unchanged apart from observing the temporary non-capturing
runner state.
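One way to realize the "temporarily disable capture" suggestion is a small context manager. FakeCudaGraphRunner and its enabled flag are hypothetical stand-ins for the engine's cuda_graph_runner, sketching only the enable/restore pattern:

```python
from contextlib import contextmanager

class FakeCudaGraphRunner:
    """Hypothetical stand-in for the engine's cuda_graph_runner."""
    def __init__(self):
        self.enabled = True

@contextmanager
def no_cuda_graph_capture(runner):
    # Disable capture for the duration of the block, then restore the
    # previous state even if the warmed-up forward pass raises.
    prev = runner.enabled
    runner.enabled = False
    try:
        yield
    finally:
        runner.enabled = prev

runner = FakeCudaGraphRunner()
with no_cuda_graph_capture(runner):
    # The pre-specialization _general_warmup call would run here,
    # without populating the CUDA-graph cache.
    assert runner.enabled is False
assert runner.enabled is True  # original state restored
```

The try/finally guarantees the runner is re-enabled before the dedicated _run_cuda_graph_warmup() pass, even on error.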
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8861ba34-93f9-4691-9252-05fb84e1dd51
📒 Files selected for processing (1)
tensorrt_llm/_torch/pyexecutor/model_engine.py
PR_Github #39719 [ run ] triggered by Bot. Commit:
[https://nvbugs/5963665][refactor] Refactor warmup orchestration in ModelEngine

Refactor the warmup flow to improve clarity and structure:
- Extract _get_max_shape_warmup_requests and _get_full_general_warmup_requests to separate warmup config construction from execution
- Replace _run_torch_compile_warmup with a unified _general_warmup call that handles both torch.compile specialization and memory pool pre-population
- Consolidate the can_run_general_warmup condition into a named variable for readability
- Fix deduplication in warmup config list construction

Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
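The "fix deduplication in warmup config list construction" item can be illustrated with an order-preserving dedup; the (num_tokens, batch_size) tuple shape is an assumption for illustration, not the engine's actual config structure:

```python
def dedup_warmup_configs(configs):
    """Drop duplicate warmup configs while preserving first-seen order.

    Assumes configs are hashable tuples such as (num_tokens, batch_size).
    """
    seen = set()
    out = []
    for cfg in configs:
        if cfg not in seen:
            seen.add(cfg)
            out.append(cfg)
    return out

configs = [(64, 1), (128, 2), (64, 1), (256, 4)]
print(dedup_warmup_configs(configs))  # [(64, 1), (128, 2), (256, 4)]
```

Preserving order matters here because warmup passes are typically run from smallest to largest shape; a set-based dedup alone would scramble that ordering.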
Force-pushed ac89ebb to 850f28a
/bot run
PR_Github #39720 [ run ] triggered by Bot. Commit:
PR_Github #39720 [ run ] completed with state
Summary by CodeRabbit
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.