Commit Vulkan compute commands as soon as enough pending dispatches are collected, to avoid driver timeout #6541
Codecov Report
❌ Patch coverage is …

```diff
@@            Coverage Diff             @@
##           master    #6541      +/-   ##
==========================================
+ Coverage   93.16%   93.18%   +0.01%
==========================================
  Files         847      847
  Lines      266225   267341    +1116
==========================================
+ Hits       248040   249109    +1069
- Misses      18185    18232      +47
```
Pull request overview
This PR aims to reduce Vulkan driver timeouts (e.g., `vkWaitForFences failed -4`, where -4 is `VK_ERROR_DEVICE_LOST`) by proactively submitting compute command buffers once enough work has been queued, with thresholds scaled by a newly introduced GPU “rough performance score”.
Changes:
- Add `GpuInfo::rough_score()` and compute/log the score during Vulkan device initialization.
- Track queued compute work via `VkCompute::pending_dispatch_total()`.
- In Vulkan `NetPrivate::forward_layer`, submit and reset the command buffer early when pending dispatches exceed a score-based threshold.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/net.cpp | Adds early submit logic based on pending dispatch count and GPU rough score. |
| src/gpu.h | Exposes GpuInfo::rough_score() API with documentation. |
| src/gpu.cpp | Implements rough score evaluation and logs it during GPU instance creation. |
| src/command.h | Exposes VkCompute::pending_dispatch_total() for submit heuristics. |
| src/command.cpp | Tracks and resets pending dispatch totals during command recording/submit/reset. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Significantly reduces the `vkWaitForFences failed -4` error caused by insufficient GPU computing power.