[https://nvbugs/6015329][fix] Use model-level warmup cache key for visual gen pipelines#12516
Conversation
📝 Walkthrough
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (2 warnings)
🧹 Nitpick comments (1)
tensorrt_llm/_torch/visual_gen/pipeline.py (1)
443-448: Consider deduplicating warmup runs by warmup_cache_key to avoid redundant work. When multiple warmup shapes map to the same key (e.g., FLUX with different num_frames), startup may run duplicate warmups.

♻️ Suggested refactor

```diff
-        for height, width, num_frames in shapes:
-            logger.info(f"Warmup: {height}x{width}, {num_frames} frames, {steps} steps")
-            self._run_warmup(height, width, num_frames, steps)
-            torch.cuda.synchronize()
-
-        self._warmed_up_shapes = set(self.warmup_cache_key(h, w, f) for h, w, f in shapes)
+        warmed_keys = set()
+        for height, width, num_frames in shapes:
+            cache_key = self.warmup_cache_key(height, width, num_frames)
+            if cache_key in warmed_keys:
+                continue
+            logger.info(f"Warmup: {height}x{width}, {num_frames} frames, {steps} steps")
+            self._run_warmup(height, width, num_frames, steps)
+            torch.cuda.synchronize()
+            warmed_keys.add(cache_key)
+
+        self._warmed_up_shapes = warmed_keys
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/visual_gen/pipeline.py` around lines 443-448: multiple warmup shapes can map to the same warmup_cache_key, causing duplicate warmup runs. Deduplicate by computing warmup_cache_key(h, w, f) for each shape first, keeping one representative (e.g., the first) per key, then iterate only over those unique representatives to call self._run_warmup(height, width, num_frames, steps) and torch.cuda.synchronize(), and finally set self._warmed_up_shapes to the set of keys (via warmup_cache_key) that were actually run. Refer to warmup_cache_key, _run_warmup, and _warmed_up_shapes to locate where to perform the deduplication.
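A hedged, runnable sketch of the deduplication suggested above, using a stub pipeline in place of the real GPU warmup (the stub class and its recorded-runs list are invented for illustration; the real `_run_warmup` does GPU work):

```python
# Illustrative stub: deduplicate warmup runs by cache key, as the review
# suggests. StubPipeline and the `runs` list are stand-ins, not real code.

class StubPipeline:
    def __init__(self):
        self.runs = []  # record which shapes were actually warmed up

    def warmup_cache_key(self, height, width, num_frames):
        # FLUX-style: image-only, so the frame count is irrelevant.
        return (height, width)

    def _run_warmup(self, height, width, num_frames, steps):
        self.runs.append((height, width, num_frames))

    def warmup(self, shapes, steps):
        warmed_keys = set()
        for height, width, num_frames in shapes:
            cache_key = self.warmup_cache_key(height, width, num_frames)
            if cache_key in warmed_keys:
                continue  # same key already warmed; skip the redundant run
            self._run_warmup(height, width, num_frames, steps)
            warmed_keys.add(cache_key)
        self._warmed_up_shapes = warmed_keys

p = StubPipeline()
p.warmup([(512, 512, 1), (512, 512, 16), (768, 768, 1)], steps=4)
assert len(p.runs) == 2  # (512, 512, 16) was deduplicated
assert p._warmed_up_shapes == {(512, 512), (768, 768)}
```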
ℹ️ Review info
⚙️ Run configuration
Configuration file: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f27d8d6a-4409-48c8-afc8-b6192e4c5e88
📒 Files selected for processing (4)
- tensorrt_llm/_torch/visual_gen/executor.py
- tensorrt_llm/_torch/visual_gen/models/flux/pipeline_flux.py
- tensorrt_llm/_torch/visual_gen/models/flux/pipeline_flux2.py
- tensorrt_llm/_torch/visual_gen/pipeline.py
Force-pushed from eb5c714 to e1e0b9e (Compare)
/bot run
PR_Github #40177 [ run ] triggered by Bot. Commit: |
PR_Github #40177 [ run ] completed with state
Force-pushed from e1e0b9e to 7f2e2ea (Compare)
FLUX is image-only but the warmup cache checked (height, width, num_frames), causing false cache misses when num_frames differed from the default. Add a warmup_cache_key() method that each pipeline can override — FLUX returns (height, width), video models keep the default (height, width, num_frames). Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
FLUX overrides warmup_cache_key with **kwargs to ignore num_frames. Callers must pass num_frames as a keyword arg so **kwargs can absorb it — positional args fail with "takes 3 positional arguments but 4 were given". Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
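The calling convention described above can be shown with a minimal, self-contained sketch (these stub classes mirror the names in the PR but omit all surrounding pipeline code):

```python
# Sketch of the override pattern from the commit message: FLUX absorbs
# num_frames via **kwargs, so callers must pass it as a keyword argument.

class BasePipeline:
    def warmup_cache_key(self, height, width, num_frames):
        # Default: video models distinguish warmup shapes by frame count too.
        return (height, width, num_frames)

class FluxPipeline(BasePipeline):
    def warmup_cache_key(self, height, width, **kwargs):
        # Image-only: num_frames has no effect, so absorb and ignore it.
        return (height, width)

flux = FluxPipeline()

# A keyword call works for both the base class and the FLUX override:
assert flux.warmup_cache_key(512, 512, num_frames=16) == (512, 512)

# A positional call fails on the override:
try:
    flux.warmup_cache_key(512, 512, 16)
except TypeError:
    # "warmup_cache_key() takes 3 positional arguments but 4 were given"
    pass
```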
Force-pushed from 7f2e2ea to 0e917ab (Compare)
/bot run
PR_Github #40232 [ run ] triggered by Bot. Commit: |
PR_Github #40232 [ run ] completed with state
/bot help
GitHub Bot Help

Provide a user-friendly way for developers to interact with a Jenkins server. See details below for each supported subcommand.

run
Launch build/test pipelines. All previously running jobs will be killed.

kill
Kill all running builds associated with the pull request.

skip
Skip testing for the latest commit on the pull request.

reuse-pipeline
Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
/bot run --disable-fail-fast
PR_Github #40348 [ run ] triggered by Bot. Commit: |
PR_Github #40348 [ run ] completed with state
/bot run
PR_Github #40417 [ run ] triggered by Bot. Commit: |
PR_Github #40417 [ run ] completed with state
Summary
- Add a warmup_cache_key() method to BasePipeline so each model controls which dimensions matter for warmup cache matching
- FLUX overrides it to return (height, width), ignoring num_frames
- Video models keep the default (height, width, num_frames)
- Cache lookups use pipeline.warmup_cache_key() instead of the hardcoded (h, w, num_frames) tuple

Motivation
FLUX is image-only — num_frames has no effect on its computation. But the warmup cache key included num_frames, so requests with a non-default num_frames value at the same resolution would trigger a false "not warmed up" warning and potentially unnecessary recompilation.

Test plan
- Requesting the same resolution with different num_frames values no longer triggers the warning
- Video pipelines still key on num_frames

🤖 Generated with Claude Code
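A hedged sketch of the request-time check the summary describes: compare the request's cache key, not the raw (h, w, num_frames) tuple, against the set of warmed-up keys. Method names mirror the PR, but the surrounding class (`ImagePipeline`, `check_warmed_up`) is an invented stand-in:

```python
# Illustrative only: an image pipeline whose cache key ignores num_frames,
# so the same resolution with a different frame count is still a cache hit.
import warnings

class ImagePipeline:
    def __init__(self):
        # Pretend warmup already ran for 512x512.
        self._warmed_up_shapes = {(512, 512)}

    def warmup_cache_key(self, height, width, **kwargs):
        return (height, width)  # image-only: absorb and ignore num_frames

    def check_warmed_up(self, height, width, num_frames):
        key = self.warmup_cache_key(height, width, num_frames=num_frames)
        if key not in self._warmed_up_shapes:
            warnings.warn(f"Shape {key} was not warmed up; may recompile")
            return False
        return True

p = ImagePipeline()
# A different num_frames at the same resolution is no longer a cache miss:
assert p.check_warmed_up(512, 512, num_frames=16)
```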