[None][fix] Fix CUDA event crash with perf metrics #12639
jthomson04 merged 2 commits into NVIDIA:main
Conversation
/bot run --disable-fail-fast
📝 Walkthrough: Added synchronization checks in `tensorrt_llm/_torch/pyexecutor/perf_metrics_manager.py` so that CUDA event timings are read only after the recorded events have completed.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 inconclusive
🧹 Nitpick comments (1)
tensorrt_llm/_torch/pyexecutor/perf_metrics_manager.py (1)
166-169: Consider adding a regression test for incomplete-event timing. This bug was timing-dependent; a focused test that exercises compute_batch_gpu_times() when events are not yet complete would help prevent regressions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `tensorrt_llm/_torch/pyexecutor/perf_metrics_manager.py` around lines 166-169, add a regression test that exercises compute_batch_gpu_times() when CUDA events are still incomplete: create a PerfMetrics-like object with gpu_forward_end_event and gpu_sample_end_event that initially return False for query() (or use real CUDA events recorded without synchronization), and verify that compute_batch_gpu_times() calls synchronize() and returns correct timings without hanging. Specifically, assert that gpu_forward_end_event.synchronize() and gpu_sample_end_event.synchronize() are invoked (or that the returned times are finite and non-zero) to catch regressions in the code paths handling incomplete events.
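A sketch of such a regression test, using stub events in place of torch.cuda.Event so it runs without a GPU. The StubEvent class and the inlined guarded timing logic are hypothetical stand-ins that mirror the fix as described, not the repository's actual code:

```python
class StubEvent:
    """Stand-in for torch.cuda.Event that starts out incomplete."""

    def __init__(self):
        self.completed = False
        self.synchronize_calls = 0

    def query(self):
        return self.completed

    def synchronize(self):
        self.synchronize_calls += 1
        self.completed = True

    def elapsed_time(self, other):
        # Mimic CUDA's failure mode when an event has not completed.
        if not (self.completed and other.completed):
            raise RuntimeError("device not ready")
        return 2.5  # fixed dummy timing in milliseconds


def test_incomplete_events_are_synchronized():
    start, fwd_end, sample_end = StubEvent(), StubEvent(), StubEvent()

    # Guarded timing read, as the fix is described: synchronize any
    # event that has not yet completed before calling elapsed_time().
    for ev in (start, fwd_end, sample_end):
        if not ev.query():
            ev.synchronize()
    forward_ms = start.elapsed_time(fwd_end)
    sample_ms = fwd_end.elapsed_time(sample_end)

    assert fwd_end.synchronize_calls == 1
    assert sample_end.synchronize_calls == 1
    assert forward_ms == 2.5 and sample_ms == 2.5


test_incomplete_events_are_synchronized()
```

The same stubs could be wired into the real PerfMetrics object with unittest.mock to assert the production code path, rather than the inlined guard shown here.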
ℹ️ Review info
⚙️ Run configuration
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: cdcffe51-dad7-41ef-9d5b-e1e32cdfa9e9
📒 Files selected for processing (1)
tensorrt_llm/_torch/pyexecutor/perf_metrics_manager.py
PR_Github #41010 [ run ] triggered by Bot. Commit:
PR_Github #41010 [ run ] completed with state
cc973cf to 2cc23b0 (Compare)
/bot run --disable-fail-fast
PR_Github #41227 [ run ] triggered by Bot. Commit: |
|
PR_Github #41227 [ run ] completed with state
|
2cc23b0 to fdde119 (Compare)
/bot run --disable-fail-fast
PR_Github #41743 [ run ] triggered by Bot. Commit: |
|
PR_Github #41743 [ run ] completed with state
|
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
fdde119 to 4784555 (Compare)
/bot run --disable-fail-fast |
PR_Github #41845 [ run ] triggered by Bot. Commit:
PR_Github #41845 [ run ] completed with state
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Summary by CodeRabbit
Bug Fixes
When return_perf_metrics=True, compute_batch_gpu_times() in tensorrt_llm/_torch/pyexecutor/perf_metrics_manager.py can call torch.cuda.Event.elapsed_time() before the recorded CUDA events have actually completed, causing a crash.
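A minimal sketch of the guard pattern this fix describes (not the actual patch; the helper name and event arguments are hypothetical, and the events are duck-typed to match torch.cuda.Event's query()/synchronize()/elapsed_time() API so the sketch does not require a GPU):

```python
def safe_batch_gpu_times(start_event, forward_end_event, sample_end_event):
    """Read CUDA event timings only after the events have completed.

    torch.cuda.Event.elapsed_time() raises if either event has not
    finished, so any incomplete event is synchronized first.
    """
    for event in (start_event, forward_end_event, sample_end_event):
        if not event.query():    # query() is True once the event completed
            event.synchronize()  # block until the recorded work finishes
    forward_ms = start_event.elapsed_time(forward_end_event)
    sample_ms = forward_end_event.elapsed_time(sample_end_event)
    return forward_ms, sample_ms
```

With real torch.cuda.Event objects, the same calls apply unchanged; synchronize() on an already-completed event is a cheap no-op, so the guard costs nothing in the steady state.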
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.