
safely destroying semaphore used with WSI #152

Open
jjulianoatnv opened this issue Mar 22, 2016 · 7 comments

Comments

jjulianoatnv commented Mar 22, 2016

This issue seeks clarification on when it is safe to destroy a semaphore that has been used by WSI functions vkAcquireNextImageKHR and vkQueuePresentKHR.

A separate issue in the layer repo will address the validation layer detecting when a semaphore used by these functions is destroyed too soon. (I plan to edit this text with a cross-reference after the layer issue has been filed.)


vkAcquireNextImageKHR's semaphore is used to establish or extend a dependency chain with a write-after-read hazard, ensuring that the acquired image is modified only after the presentation engine is done reading from it.

Quoting from the spec:

If semaphore is not VK_NULL_HANDLE, the semaphore must be unsignaled and not have any uncompleted signal or wait operations pending. It will become signaled when the presentation engine has released ownership of the image, and the device may modify its contents. Queue operations that access the image contents must wait until the semaphore signals; typically applications should include the semaphore in the pWaitSemaphores list for the queue submission that transitions the image away from the VK_IMAGE_LAYOUT_PRESENT_SRC_KHR layout. Use of the semaphore allows rendering operations to be recorded and submitted before the presentation engine releases ownership.

and

A successful call to vkAcquireNextImageKHR counts as a signal operation on semaphore for the purposes of queue forward-progress requirements. The semaphore is guaranteed to signal, so a wait operation can be queued for the semaphore without risk of deadlock.

This seems to fully define when vkAcquireNextImageKHR's semaphore may be destroyed. I think it says that the semaphore may be destroyed after completion of all queue operations that included the semaphore in their pWaitSemaphores lists, and that destroying it any sooner results in undefined behavior. In other words, one must track completion of all consumers of the semaphore to know when it is safe to destroy it.

For the validation layer to detect an application destroying vkAcquireNextImageKHR's semaphore too soon, I think the burden is no different than for other semaphores that are not used by vkAcquireNextImageKHR.


Switching to vkQueuePresentKHR, pPresentInfo->pWaitSemaphores extends a dependency chain with a read-after-write hazard ensuring that the images are presented only after writes to the images have completed.

Quoting from the spec:

pWaitSemaphores, if non-NULL, is an array of VkSemaphore objects with waitSemaphoreCount entries, and specifies the semaphores to wait for before issuing the present request.

and

The transfer of ownership to the presentation engine happens in issue order with other queue operations, but semaphores have to be used to ensure that prior rendering and other commands in the specified queue complete before the presentation operation. The presentation operation itself does not delay processing of subsequent commands on the queue, however, presentation requests sent to a particular queue are always performed in order. Exact presentation timing is controlled by the semantics of the presentation engine and native platform in use.

Comparing these two excerpts, there might be an issue here with the specific spec language of "present request" and "present operation". I'm addressing that separately within Khronos. I will assume that vkQueuePresentKHR is the present request, and that the present operation is carried out by the presentation engine after the present request schedules the present operation on the presentation engine's own queue. (The presentation engine's queue is separate from the VkQueue.)

The spec excerpts do not seem to fully define when the semaphores pointed to by pWaitSemaphores may be destroyed. It is my interpretation of both the spec language and the intent of the spec that a non-blocking vkQueuePresentKHR request may schedule a present operation on the presentation engine's queue such that the present operation on the presentation engine's queue has a read-after-write hazard on pWaitSemaphores. This would mean that pWaitSemaphores may not be destroyed until after the presentation engine has completed the present operation.

However, there seems to be nothing in vkQueuePresentKHR to indicate to the app when the presentation engine has completed the present operation. The next-best thing is to assume that pWaitSemaphores may be destroyed after the swapchain is destroyed, or after a future vkAcquireNextImageKHR returns the same image. The latter imposes a book-keeping requirement on the application that is non-obvious and that had not occurred to me before now.

It might be possible to assume the semaphores may be destroyed after vkAcquireNextImageKHR returns some number of other images. How many? I don't see language in the spec in support of such an assumption. I'm not even entirely certain that the spec supports the earlier assumption of "the same image".


Finally, the purpose of this issue is to feel out and document discussion related to clarifying the spec language defining when the application may destroy the semaphores pointed to by vkQueuePresentKHR's pWaitSemaphores. If the existing spec language already covers this adequately that's fine, and this issue can document that interpretation.

jjulianoatnv (Author) commented Mar 22, 2016

However this is resolved, support in the validation layers is needed. One of the factors motivating creation of this issue is that I don't see how to add this to the validation layer without first clarifying when it is safe to destroy the semaphores pointed to by vkQueuePresentKHR's pWaitSemaphores.

krOoze (Contributor) commented Sep 4, 2018

I think the formalism to describe it is mostly there since the synchronization descriptions overhaul. Some final touches in #785.

krOoze (Contributor) commented Jan 22, 2019

Yea, I do somewhat keep finding loopholes and ambiguities.

What worries me most is this spec quote:

Also, in the case of vkQueueSubmit, the second synchronization scope additionally includes all commands that occur later in submission order.

(Same for Semaphore and Fence.) It implies that queue commands other than vkQueueSubmit do not include subsequent commands in submission order in their second synchronization scope. It somewhat makes sense that we do not want to formally stall the rest of the work for a present that is orthogonal to it, but it makes things much harder than it is worth.

I had some time to collect my thoughts making the PR, and I think the (hopefully complete) behavior loosely based on my current spec interpretation (while necessarily making some stuff up) is:

  • acquire with a Fence is hopefully straightforward. If you wait on the fence, then even if there is also a semaphore, this does not prevent destroying the semaphore or destroying the swapchain, and the image is already visible to whatever you want in the device domain.

  • acquire with only a Semaphore is finished when you wait on the Semaphore and know the waiting work has finished execution (by regular means), or if you call vkDeviceWaitIdle. vkQueueWaitIdle should not work no matter which Queue is used (or even if used on all Queues known to the user via the API).

    • it does not cover any work earlier in submission order, except in the case where the acquired image is a previously presented image. In that case the batch of the present is considered finished (the present semaphore is safe to destroy, and the work that signaled that semaphore has finished execution).
  • present with a semaphore does not cover subsequent submissions. We know the work is done (and the semaphore is destroyable, etc.) if we call vkDeviceWaitIdle or vkQueueWaitIdle, if a different semaphore from a subsequent vkQueueSubmit (which covers previous queue submissions) was signaled, or if we know a present following this present has finished execution on the same queue.

Does that make sense? I welcome any notes. It also feels awfully complex. I wonder if I am overthinking it, and whether there is room to make this much simpler. I am also waiting on some input from Khronos, because I assume too much about the behavior, and some things need official resolution.

cubanismo commented Oct 9, 2019

acquire with only a Semaphore is finished when you wait on the Semaphore and know the waiting work has finished execution (by regular means), or if you call **vkDeviceWaitIdle**. vkQueueWaitIdle should not work no matter which Queue is used (or even if used on all Queues known to the user via the API).

In the statement above (emphasis mine), I don't think the bold portion is true. vkDeviceWaitIdle does not cover the pending work that signals the semaphore, because that work happens external to Vulkan; it is presentation engine work. Hence, the only way to ensure the semaphore is idle is to first wait on it. After that, the usual synchronization logic falls out naturally, because at that point you have work related to the semaphore enqueued on a well-defined device queue.

From my understanding, the rest of the assertions in the above comment are correct.

krOoze (Contributor) commented Oct 16, 2019

@cubanismo If that is so, I feel it should be made an exception. The obvious expectation of vkDeviceWaitIdle is that it is the all-stop scram kill switch. If the Implementation has some internal secret Queues, it feels like it should be the Implementation's job to stop that work. Not the application's, which would have to weirdly submit an empty command buffer just to wait on that Semaphore, and signal and wait on some ad-hoc Fence, in addition to the vkDeviceWaitIdle.

cubanismo commented Oct 16, 2019

Think about how WSI would be/is implemented. It's essentially an under-the-covers external object situation. If you had a semaphore shared between two VkDevice objects, potentially in separate processes, would you expect calling vkDeviceWaitIdle() on one of those devices to idle any pending operations on the semaphore happening on the other device? That would be an impossible implementation requirement.

The caveat with this line of thinking is that in such a case, there would be two OS-level references to the semaphore payload, so you could safely destroy one instance of the semaphore without idling the other. This gets fuzzy with AcquireNextImage. It essentially binds the specified semaphore to an opaquely-defined semaphore payload whose lifetime is not discussed in the spec as far as I can tell. Without examining our driver more closely, I'm not even sure what I can say about our implementation's behavior if the device were destroyed while operations on such semaphores are pending. This ambiguity is unfortunate, but it doesn't change the awkwardness of asking the driver to idle things happening outside of the scope of its queues.

krOoze (Contributor) commented Oct 16, 2019

@cubanismo No, I would expect only the device that the user asked to stop to be stopped. VkSwapchainKHR is a child of VkDevice, therefore I (ideally) expect it to be stopped too.

I mean, you are talking about implementation details. You could say the same about vkDestroySemaphore. How do I know I am not destroying a semaphore that some other process is using? Simple: it is not my responsibility. If the Implementation uses it that way, the Implementation has to clean it up in whatever way is appropriate.

Anyway, I am gonna make an Issue (#1059) as it seems to be a separate can of worms that actually might impact compatibility.
