commit vulkan compute command as soon as enough pending dispatches collected for avoiding driver timeout #6541

Merged
nihui merged 4 commits into Tencent:master from nihui:vkwaitforfences-4-hack on Feb 11, 2026

@nihui
Member

@nihui nihui commented Feb 10, 2026

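Significantly reduce the "vkWaitForFences failed -4" (VK_ERROR_DEVICE_LOST) errors that occur when the GPU does not have enough compute power to finish a large batch of queued work before the driver times out.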

@github-actions github-actions Bot added the core label Feb 10, 2026
@tencent-adm
Member

CLA assistant check
Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@codecov-commenter

codecov-commenter commented Feb 10, 2026

Codecov Report

❌ Patch coverage is 84.61538% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.18%. Comparing base (1b847b1) to head (6b3c54c).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/gpu.cpp | 85.29% | 5 Missing ⚠️ |
| src/net.cpp | 75.00% | 3 Missing ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master    #6541      +/-   ##
==========================================
+ Coverage   93.16%   93.18%   +0.01%     
==========================================
  Files         847      847              
  Lines      266225   267341    +1116     
==========================================
+ Hits       248040   249109    +1069     
- Misses      18185    18232      +47     
```

Contributor

Copilot AI left a comment

Pull request overview

This PR aims to reduce Vulkan driver timeouts (e.g., vkWaitForFences failed -4) by proactively submitting compute command buffers once enough work has been queued, with thresholds scaled by a newly introduced GPU “rough performance score”.

Changes:

  • Add GpuInfo::rough_score() and compute/log the score during Vulkan device initialization.
  • Track queued compute work via VkCompute::pending_dispatch_total().
  • In Vulkan NetPrivate::forward_layer, submit and reset the command buffer early when pending dispatches exceed a score-based threshold.
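
A score-based threshold of this kind could look roughly like the following C++ sketch. The function names max_pending_dispatches and should_submit_early, the integer score type, and all constants are hypothetical assumptions for illustration; the PR summary does not state the actual formula or values used in ncnn.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical: maps a GPU "rough score" to the number of pending dispatches
// allowed before an early submit. The constants are illustrative assumptions,
// not taken from ncnn.
int max_pending_dispatches(int rough_score)
{
    assert(rough_score >= 0);      // assumed: scores are non-negative
    const int per_point = 32;      // assumed dispatch budget per score point
    const int min_budget = 16;     // weak GPUs still batch at least this many
    const int max_budget = 4096;   // fast GPUs never batch more than this
    return std::clamp(rough_score * per_point, min_budget, max_budget);
}

// Hypothetical decision helper: submit early once the queued work reaches
// the budget for this GPU.
bool should_submit_early(int pending_dispatch_total, int rough_score)
{
    return pending_dispatch_total >= max_pending_dispatches(rough_score);
}
```

With these assumed constants, a score of 1 allows 32 pending dispatches before an early submit, while any score of 128 or more is capped at 4096. Clamping keeps a weak GPU from flushing after every single dispatch and keeps a fast GPU from batching without limit.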

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| src/net.cpp | Adds early submit logic based on pending dispatch count and GPU rough score. |
| src/gpu.h | Exposes GpuInfo::rough_score() API with documentation. |
| src/gpu.cpp | Implements rough score evaluation and logs it during GPU instance creation. |
| src/command.h | Exposes VkCompute::pending_dispatch_total() for submit heuristics. |
| src/command.cpp | Tracks and resets pending dispatch totals during command recording/submit/reset. |
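
The command.cpp row describes a counter that grows while work is recorded and returns to zero on submit and reset. A minimal mock of that lifecycle, driving an early-submit loop, might look like this. PendingRecorder, forward_all, and the assumption that each layer records exactly one dispatch are hypothetical; the code makes no Vulkan calls and is not ncnn's VkCompute.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a command recorder such as VkCompute.
// It only counts work; no GPU commands are recorded or submitted.
class PendingRecorder
{
public:
    // Recording a dispatch adds it to the pending total.
    void record_dispatch() { ++pending_; }

    // Submitting hands all recorded work to the GPU, so nothing stays pending.
    void submit_and_wait()
    {
        ++submits_;
        pending_ = 0;
    }

    // Resetting discards the recorded commands, so the total is cleared too.
    void reset() { pending_ = 0; }

    int pending_dispatch_total() const { return pending_; }
    int submit_count() const { return submits_; }

private:
    int pending_ = 0;
    int submits_ = 0;
};

// Records one dispatch per layer (assumed) and submits + resets whenever the
// pending total reaches `threshold`, so no single submission carries an
// unbounded amount of work.
void forward_all(PendingRecorder& cmd, std::size_t layer_count, int threshold)
{
    assert(threshold > 0);
    for (std::size_t i = 0; i < layer_count; i++)
    {
        cmd.record_dispatch();
        if (cmd.pending_dispatch_total() >= threshold)
        {
            cmd.submit_and_wait();
            cmd.reset();
        }
    }
    // Submit whatever is left after the last layer, if anything.
    if (cmd.pending_dispatch_total() > 0)
        cmd.submit_and_wait();
}
```

For example, forward_all over 10 layers with a threshold of 4 submits after layers 4 and 8, then once more for the remaining 2 dispatches, for 3 submissions in total.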

Comment thread src/command.cpp
Comment thread src/gpu.h Outdated
nihui and others added 2 commits February 11, 2026 15:07
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@nihui nihui merged commit 0ed9702 into Tencent:master Feb 11, 2026
109 of 110 checks passed

5 participants