[None][fix] Fix Cuda event crash with perf metrics#12639

Merged
jthomson04 merged 2 commits into NVIDIA:main from jthomson04:jthomson04/fix-crash
Apr 5, 2026

Conversation

@jthomson04
Collaborator

@jthomson04 jthomson04 commented Mar 31, 2026

Summary by CodeRabbit

Bug Fixes

  • Fixed GPU timing measurements to ensure accurate elapsed time calculations by properly synchronizing timing events before computation.

When return_perf_metrics=True, compute_batch_gpu_times() in tensorrt_llm/_torch/pyexecutor/perf_metrics_manager.py can call torch.cuda.Event.elapsed_time() before the recorded CUDA events have actually completed. This causes:

Exception in thread Thread-3 (_event_loop_wrapper):
Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/dynamo/venv/lib/python3.12/site-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 595, in _event_loop_wrapper
    raise e
  File "/opt/dynamo/venv/lib/python3.12/site-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 591, in _event_loop_wrapper
    self.event_loop()
  File "/opt/dynamo/venv/lib/python3.12/site-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 1982, in _executor_loop
    self.perf_manager.compute_batch_gpu_times(
  File "/opt/dynamo/venv/lib/python3.12/site-packages/tensorrt_llm/_torch/pyexecutor/perf_metrics_manager.py", line 166, in compute_batch_gpu_times
    batch_gpu_forward_time = perf.gpu_forward_start_event.elapsed_time(
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/dynamo/venv/lib/python3.12/site-packages/torch/cuda/streams.py", line 233, in elapsed_time
    return super().elapsed_time(end_event)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Both events must be completed before calculating elapsed time.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@jthomson04 jthomson04 requested a review from a team as a code owner March 31, 2026 20:25
@jthomson04 jthomson04 requested a review from achartier March 31, 2026 20:25
@jthomson04 jthomson04 changed the title from "[None][fix] Fix Cuda event crash" to "[None][fix] Fix Cuda event crash with perf metrics" on Mar 31, 2026
@jthomson04
Collaborator Author

/bot run --disable-fail-fast

@coderabbitai
Contributor

coderabbitai bot commented Mar 31, 2026

📝 Walkthrough

Walkthrough

Added synchronization checks in compute_batch_gpu_times to ensure GPU timing events are completed before computing elapsed times. The code now verifies event completion using query() and calls synchronize() on any event that has not yet finished before calculating GPU timings.

Changes

GPU Timing Synchronization (tensorrt_llm/_torch/pyexecutor/perf_metrics_manager.py): Added event completion checks and synchronization calls before computing GPU forward and sample elapsed times, preventing incomplete event data from being used in timing calculations.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Description check — ❓ Inconclusive: The PR description explains the issue well with a detailed error traceback, but the required template sections (Description, Test Coverage, PR Checklist) are incomplete or only partially filled. Resolution: complete the Description section explaining the solution, add Test Coverage details listing relevant tests, and verify all PR Checklist items are properly addressed.
✅ Passed checks (2 passed)
Title check — ✅ Passed: The title accurately describes the main change: fixing a CUDA event crash in the perf metrics manager when GPU timing events aren't completed before elapsed_time() is called.
Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, above the required threshold of 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
tensorrt_llm/_torch/pyexecutor/perf_metrics_manager.py (1)

166-169: Consider adding a regression test for incomplete-event timing.

This bug was timing-dependent; a focused test that exercises compute_batch_gpu_times() when events are not yet complete would help prevent regressions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/pyexecutor/perf_metrics_manager.py` around lines 166 -
169, Add a regression test that exercises compute_batch_gpu_times() when CUDA
events are still incomplete: create a PerfMetrics-like object with
gpu_forward_end_event and gpu_sample_end_event that initially return False for
query() (or use real CUDA events recorded without synchronization) and verify
compute_batch_gpu_times() calls synchronize() and returns correct timings
without hanging; specifically target the compute_batch_gpu_times function and
assert that gpu_forward_end_event.synchronize() and
gpu_sample_end_event.synchronize() are invoked (or that returned times are
finite/non-zero) to catch regressions in the code paths handling incomplete
events.
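The suggested regression test could be sketched as a self-contained unit that needs no GPU. Everything here is hypothetical illustration: `StubEvent` stands in for torch.cuda.Event, and `guarded_elapsed_time` stands in for the guarded timing path inside compute_batch_gpu_times(), whose real internals are not shown in this thread:

```python
class StubEvent:
    """Mimics the subset of torch.cuda.Event used by the timing code."""

    def __init__(self, timestamp_ms, completed=False):
        self.timestamp_ms = timestamp_ms
        self.completed = completed
        self.synchronize_calls = 0

    def query(self):
        return self.completed

    def synchronize(self):
        self.synchronize_calls += 1
        self.completed = True

    def elapsed_time(self, end):
        if not (self.completed and end.completed):
            raise RuntimeError(
                "Both events must be completed before calculating elapsed time.")
        return end.timestamp_ms - self.timestamp_ms


def guarded_elapsed_time(start, end):
    """Hypothetical guard mirroring the fix: sync incomplete events first."""
    for ev in (start, end):
        if not ev.query():
            ev.synchronize()
    return start.elapsed_time(end)


def test_incomplete_events_are_synchronized():
    start = StubEvent(0.0, completed=True)
    end = StubEvent(3.25)  # not yet complete, as in the reported crash
    # The guard must synchronize the pending event and return a valid timing
    # instead of raising RuntimeError.
    assert guarded_elapsed_time(start, end) == 3.25
    assert end.synchronize_calls == 1
```

A test against the real code would construct a PerfMetrics-like object whose end events report query() == False on first poll, then assert that compute_batch_gpu_times() completes without raising.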

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: cdcffe51-dad7-41ef-9d5b-e1e32cdfa9e9

📥 Commits

Reviewing files that changed from the base of the PR and between 6ac5c15 and cc973cf.

📒 Files selected for processing (1)
  • tensorrt_llm/_torch/pyexecutor/perf_metrics_manager.py

@tensorrt-cicd
Collaborator

PR_Github #41010 [ run ] triggered by Bot. Commit: cc973cf Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41010 [ run ] completed with state FAILURE. Commit: cc973cf
/LLM/main/L0_MergeRequest_PR pipeline #31990 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@jthomson04 jthomson04 force-pushed the jthomson04/fix-crash branch from cc973cf to 2cc23b0 on April 1, 2026 at 16:24
@jthomson04
Collaborator Author

/bot run --disable-fail-fast

@jthomson04 jthomson04 enabled auto-merge (squash) April 1, 2026 16:25
@tensorrt-cicd
Collaborator

PR_Github #41227 [ run ] triggered by Bot. Commit: 2cc23b0 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41227 [ run ] completed with state SUCCESS. Commit: 2cc23b0
/LLM/main/L0_MergeRequest_PR pipeline #32188 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@jthomson04 jthomson04 force-pushed the jthomson04/fix-crash branch from 2cc23b0 to fdde119 on April 3, 2026 at 01:56
@jthomson04
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #41743 [ run ] triggered by Bot. Commit: fdde119 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41743 [ run ] completed with state SUCCESS. Commit: fdde119
/LLM/main/L0_MergeRequest_PR pipeline #32644 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
@jthomson04 jthomson04 force-pushed the jthomson04/fix-crash branch from fdde119 to 4784555 on April 5, 2026 at 02:53
@jthomson04
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #41845 [ run ] triggered by Bot. Commit: 4784555 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41845 [ run ] completed with state SUCCESS. Commit: 4784555
/LLM/main/L0_MergeRequest_PR pipeline #32714 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

CI Report

Link to invocation

@jthomson04 jthomson04 merged commit 24d2340 into NVIDIA:main Apr 5, 2026
5 checks passed
yufeiwu-nv pushed a commit to yufeiwu-nv/TensorRT-LLM that referenced this pull request Apr 7, 2026
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
karen-sy pushed a commit to karen-sy/TensorRT-LLM that referenced this pull request Apr 7, 2026
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>