
[https://nvbugs/5963665][refactor] Refactor warmup orchestration in M…#12407

Open
liji-nv wants to merge 1 commit into NVIDIA:main from liji-nv:warmup-refactor

Conversation

@liji-nv
Collaborator

@liji-nv liji-nv commented Mar 20, 2026

…odelEngine

Refactor the warmup flow to improve clarity and structure:

  • Extract _get_max_shape_warmup_requests and _get_full_general_warmup_requests to separate warmup config construction from execution
  • Replace _run_torch_compile_warmup with a unified _general_warmup call that handles both torch.compile specialization and memory pool pre-population
  • Consolidate the can_run_general_warmup condition into a named variable for readability
  • Fix deduplication in warmup config list construction
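The bullets above describe splitting warmup-config construction from execution and deduplicating the config list. A minimal sketch of that structure is below; all class, method, and field names here are illustrative stand-ins, not the actual TensorRT-LLM `model_engine.py` API:

```python
# Hypothetical sketch: config construction is separated from execution, and a
# single _general_warmup consumes a pre-computed, ordered list of configs.
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen -> hashable, so configs can be deduplicated
class WarmupRequestConfig:
    num_tokens: int
    batch_size: int


class ModelEngineSketch:
    def __init__(self, max_num_tokens: int, max_batch_size: int):
        self.max_num_tokens = max_num_tokens
        self.max_batch_size = max_batch_size

    def _get_max_shape_warmup_requests(self) -> list:
        # Largest shapes only, used to pre-populate memory pools.
        return [WarmupRequestConfig(self.max_num_tokens, self.max_batch_size)]

    def _get_full_general_warmup_requests(self) -> list:
        # Broader sweep for torch.compile specialization.  The max-shape
        # config can recur, so deduplicate while preserving order
        # (dict.fromkeys keeps insertion order).
        configs = self._get_max_shape_warmup_requests()
        configs += [
            WarmupRequestConfig(1, 1),
            WarmupRequestConfig(self.max_num_tokens, self.max_batch_size),
        ]
        return list(dict.fromkeys(configs))

    def _general_warmup(self, warmup_requests_configs) -> None:
        # Pure execution: no shape derivation happens here anymore.
        for cfg in warmup_requests_configs:
            pass  # run one dummy forward pass with cfg's shapes

    def warmup(self) -> None:
        can_run_general_warmup = self.max_num_tokens > 0
        if can_run_general_warmup:
            self._general_warmup(self._get_full_general_warmup_requests())
```

The design point is that `warmup()` reads as a plain sequence of named steps, and deduplication lives in one place instead of being interleaved with execution.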

Summary by CodeRabbit

  • Refactor
    • Improved internal warmup configuration generation and specialization process to enhance model engine efficiency.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@liji-nv liji-nv requested a review from a team as a code owner March 20, 2026 08:09
@liji-nv liji-nv requested a review from shaharmor98 March 20, 2026 08:09
@liji-nv
Collaborator Author

liji-nv commented Mar 20, 2026

/bot run

@coderabbitai
Contributor

coderabbitai bot commented Mar 20, 2026

📝 Walkthrough

Walkthrough

Refactored the model engine's warmup flow by adding two helper methods for deriving warmup request configurations, restructuring the warmup() method to conditionally execute general warmup, and updating the _general_warmup() API to accept pre-computed warmup configurations instead of internally computing them.

Changes

Cohort / File(s) Summary
Warmup Logic Refactoring
tensorrt_llm/_torch/pyexecutor/model_engine.py
Added _get_max_shape_warmup_requests() and _get_full_general_warmup_requests() helper methods for computing warmup configurations. Refactored warmup() to conditionally run general warmup before autotuner/CUDA graph warmups. Updated _general_warmup() API from accepting a reverse flag to accepting an ordered list of warmup_requests_configs. Removed _run_torch_compile_warmup() method and related extra memory-pool warmup logic.
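The walkthrough notes that `_general_warmup()` changed from taking a `reverse` flag to taking an ordered list of configs. A small sketch of what that API shape might look like follows; the function names and the `(num_tokens, batch_size)` tuple form are assumptions for illustration, not the real signature:

```python
def run_dummy_forward(log, num_tokens, batch_size):
    # Stand-in for a real warmup forward pass; records what was run.
    log.append((num_tokens, batch_size))


def general_warmup(log, warmup_requests_configs):
    # Replaces the old `reverse: bool` flag: the caller passes an already
    # ordered list, so e.g. large-to-small ordering (useful for growing
    # memory pools up front) is expressed directly in the config list.
    for num_tokens, batch_size in warmup_requests_configs:
        run_dummy_forward(log, num_tokens, batch_size)


log = []
# Caller controls ordering explicitly instead of flipping a flag.
general_warmup(log, sorted([(1, 1), (8192, 64)], reverse=True))
```

Moving ordering to the caller means every warmup phase can state its exact set and order of shapes without `_general_warmup` needing phase-specific flags.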

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 inconclusive)

  • Title check ❓ Inconclusive: The title is partially related to the changeset; it references the main refactoring effort but is incomplete, ending with "in M…" and lacking the full summary. Resolution: complete the title so it clearly states the main change, e.g. "[https://nvbugs/5963665][refactor] Refactor warmup orchestration in ModelEngine", without truncation.
  • Description check ❓ Inconclusive: The PR description provides objectives but lacks the required template sections (Description details and Test Coverage specifics). Resolution: add a detailed explanation under "Description" of what was changed and why, and list the specific test cases under "Test Coverage" that validate the refactoring.

✅ Passed checks (1 passed)

  • Docstring Coverage ✅ Passed: Docstring coverage is 100.00%, which meets the required threshold of 80.00%.



Comment @coderabbitai help to get the list of available commands and usage tips.



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/_torch/pyexecutor/model_engine.py`:
- Around lines 683-688: The curr_max_num_tokens calculation calls
get_num_available_tokens with max_num_draft_tokens=self.original_max_draft_len
and no batch_size, which differs from the later warmup path. Change the call to
kv_cache_manager.get_num_available_tokens to pass the real constraints used for
warmups: use max_num_draft_tokens=self.max_total_draft_tokens and include
batch_size=self.batch_size (keeping token_num_upper_bound as-is), so
curr_max_num_tokens reflects the actual KV limits and avoids generating
impossible "max-shape" configs.
- Around lines 748-752: The new pre-specialization call currently invokes
_general_warmup (via forward) while CUDA-graph capture may still be enabled,
which can prematurely populate the CUDA-graph cache. Change the sequence so the
pre-specialization pass created by _get_full_general_warmup_requests runs with
CUDA-graph capture disabled (e.g., temporarily set cuda_graph_runner.enabled =
False, or wrap the call to self._general_warmup in a non-capturing
runner/context), then restore the original cuda_graph_runner state and
afterwards call self._run_cuda_graph_warmup(), so the dedicated CUDA-graph
warmup is the only path that populates the graph cache. Modify the runner
enable flag only around the calls to _general_warmup, and keep forward()
unchanged apart from observing the temporary non-capturing runner state.
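The disable-then-restore pattern the second comment suggests is easy to get wrong if the warmup forward raises before the flag is restored. A minimal, restore-safe sketch, assuming a runner object with an `enabled` flag (the stub class and attribute name are illustrative; the real attribute in model_engine.py may differ):

```python
# Hypothetical sketch of temporarily disabling CUDA-graph capture around a
# warmup pass, restoring the previous state even on exceptions.
from contextlib import contextmanager


class CudaGraphRunnerStub:
    """Stand-in for the engine's CUDA-graph runner."""

    def __init__(self):
        self.enabled = True


@contextmanager
def no_cuda_graph_capture(runner):
    prev = runner.enabled
    runner.enabled = False
    try:
        yield
    finally:
        # Restore even if the warmup forward raises mid-pass.
        runner.enabled = prev


runner = CudaGraphRunnerStub()
with no_cuda_graph_capture(runner):
    # _general_warmup-style forwards would run here without graph capture,
    # so only the dedicated CUDA-graph warmup populates the graph cache.
    assert runner.enabled is False
assert runner.enabled is True  # original state restored
```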

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8861ba34-93f9-4691-9252-05fb84e1dd51

📥 Commits

Reviewing files that changed from the base of the PR and between 4197e19 and ac89ebb.

📒 Files selected for processing (1)
  • tensorrt_llm/_torch/pyexecutor/model_engine.py

@tensorrt-cicd
Copy link
Collaborator

PR_Github #39719 [ run ] triggered by Bot. Commit: ac89ebb Link to invocation

…odelEngine

Refactor the warmup flow to improve clarity and structure:
- Extract _get_max_shape_warmup_requests and _get_full_general_warmup_requests
  to separate warmup config construction from execution
- Replace _run_torch_compile_warmup with a unified _general_warmup call that
  handles both torch.compile specialization and memory pool pre-population
- Consolidate the can_run_general_warmup condition into a named variable for
  readability
- Fix deduplication in warmup config list construction

Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
@liji-nv
Copy link
Collaborator Author

liji-nv commented Mar 20, 2026

/bot run

@tensorrt-cicd
Copy link
Collaborator

PR_Github #39720 [ run ] triggered by Bot. Commit: 850f28a Link to invocation

@tensorrt-cicd
Copy link
Collaborator

PR_Github #39720 [ run ] completed with state SUCCESS. Commit: 850f28a
/LLM/main/L0_MergeRequest_PR pipeline #30917 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

