New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement VK_KHR_present_wait to instrument actual present timings #11473
base: master
Are you sure you want to change the base?
Conversation
Slight correction: present_wait will be exposed by default on x11 by the next mesa release if the application doesn't enable any extension that can't support it. Which means in practice that it'll be available if VK_KHR_wayland_surface isn't enabled. |
Not the behaviour I experienced. When I built the current mesa trunk from source, the extension wasn't listed until I created the following <driconf>
<device>
<application name="all">
<option name="vk_khr_present_wait" value="true" />
</application>
</device>
</driconf> |
If you used the linux version of |
Source/Core/Common/WorkQueueThread.h
Outdated
if (m_thread.joinable()) | ||
{ | ||
m_flush.Set(); | ||
Clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you explicitly clear the working queue? It should be empty after the wait call anyway.
This also deadlocks if you call it when the thread is idle as far as I can tell. (Flush(); Flush();)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes... This is hyer-specialised for my usecase, where I actually want "AbortAndFlush".
Because each work item takes 20ms minimum, I wanted the queue to empty as soon as possible. I should fix it to actually do what you expect a function named flush to do, and just manually call Clear()
in my code.
This also deadlocks if you call it when the thread is idle as far as I can tell. (Flush(); Flush();)
Yes... might be hard to trigger because the Clear function does a wakeup. But there is a race condition if the thread sleeps between Clear()
and m_flushed.Wait();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any thoughts on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Um....
It's missing the is_flushing()
feature that I need here. The code is about 20% longer. Harder to reason about correctness. And I can spot at least one bug that the Flag wrapper was designed to prevent.
I don't think it's an improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mhm, probably just me then. I find the 4 different Flags + Events much harder to reason about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know. Maybe I just find condition vars slightly harder to reason about.
The advantage of the Flag and Event wrapper is that they are idiot-proof. Near impossible to mess up. I can look at the calls and know it's going to do the right thing.
Your code essentially just replaced Flags with (atomic) bools and Events with condition vars. Which is what they both are internally.
Yes, your code is slightly more efficient. One you add a forth atomic bool to implement is_flushing()
it manages to use two less mutexes and one less atomic bool than the Event/Flag version.
Edit: The problem with this type of code is that it's always easier to write it correctly than it is to read and reason it's correctness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The advantage of the Flag and Event wrapper is that they are idiot-proof. Near impossible to mess up. I can look at the calls and know it's going to do the right thing.
With multiple flags and events you often rely on the exact order of those getting checked and cleared and waiting on an event. That isn't shared by a of that is protected by a lock. That seems far from idiot-proof to me.
I'm pretty sure it also relies on a strict 1:1 relationship between the thread Flushing and the worker thread. If you have multiple threads interacting with the WorkQueueThread, it will probably fall apart.
That's just my 2 cents though. At the end of the day, I'm perfectly fine with keeping the current implementation (and adding a proper FlushAll + Push).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though we are getting a little side tracked with personal preferences about what's easier to reason about.
The real question is what implementation we should go with, and you do make a good argument that your implementation is superior (I didn't even consider the multiple threads interacting usecase, because I wasn't planning to use it).
It might be worth going with your option (after adding the IsFlushing feature I need)
Oh... Right... Good point. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM aside from the linter complaints and that one line left over from debugging.
Metal has an API much closer to Would it make sense to try to make the VideoCommon API support that here (being given timestamps instead of calculating them as "the time you called this function") or should I leave that for when I add a Metal implementation for this? |
D3D12 also has something very similar to VK_KHR_present_wait as far as I know. So it would make sense to design something in VideoCommon. |
This adds to VideoCommon. Just a question of what API we want there. The current API is just a function you call that records the current CPU timestamp as the time of the present. |
Well. It's dubious if performance metrics should be considered to be part of video common or not. I want to move everything ImGUI out of VideoCommon into a DolphinImGui subproject that sits along side DolphinQt and DolphinNogui. The drawing parts of But we do want a VideoCommon API eventually (potentially not part of this PR). The wait on present behaviour is somewhat useful for the async present PR I'm about to work on. As part of my almost finished refactor that Kills Renderer, I'm thinking about adding a "VideoEvents" class that contains various frame life-cycle events to replace the current messy code in Most importantly, a clean It would also be nice to have a VideoCommon API that exposes historical present timings. When the driver/backend doesn't support VK_GOOGLE_display_timing or VK_EXT_present_timing, it can fall back to CPU time of present wait (or worse). |
and rename the existing Flush to FlushOne.
Otherwise we will end up with a dozen threads named "WorkQueueThread"
- Cancel doesn't shut down anymore. Allowing it to be used multiple times thoughout the life of the WorkQueue - Remove Clear, so we only have Cancel semantics - Add IsCancelling so work items can abort early if cancelling - Replace m_cancelled and m_thread.joinable() guars with m_shutdown. - Rename Flush to WaitForCompletion (As it's ambiguous if a function called flush should be blocking or not) - Add documentation
And VK_KHR_present_id, which it depends on
Also, reimplmented as WorkQueueThread.
@TellowKrinkle My intended API ended up in the massive KillRenderer PR here Which probably means this should wait for #11522 (and also #11539) before merging. |
VkPhysicalDevicePresentWaitFeaturesKHR present_wait = { | ||
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PRESENT_WAIT_FEATURES_KHR}; | ||
|
||
DolphinFeatures() : VkPhysicalDeviceFeatures2(), m_tail(&pNext) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might want to delete the move constructor since you hold interior pointers
I'd recommend using #11532 to record timings with something like If you begin recording something like frame latency, it would be like There is more information about what the boolean does in the PR, but it should give you a nice way to get some fast graphs. I spent a lot of time optimizing the |
I'm going to mark this as draft until I finish my current work refactoring the whole vulkan backend. |
Very few GPUs currently support the VK_KHR_present_wait extension.
But it has better support on desktop GPUs than VK_GOOGLE_display_timing, and we can abuse it to get presentation timings, which we can use for pretty graphs of when the user actually sees the a new frame.
Why present timings?
Because the frame timings don't actually represent what the user experiences. For example, here is a 30 FPS game running vsyned on my 100hz monitor. Because 30 doesn't divide into 100, the frame pacing is bad.
For another example, this sometimes happens to a 59.97 fps game when vsynced to my monitor at 60hz. It only happens if dolphin manages to run without a single stutter for several min. I'm guessing my monitor is actually slightly lower than 59.97hz and eventually dolphin runs ahead and is forced to wait for vsync.
Supported GPUs/Drivers
Currently only Nvidia on both Linux and Windows.
But the next release of Mesa will add present_wait for AMD and Intel GPUs. But only if you use X11 and set a flag to force it on
Limitations:
Currently doesn't support wayland. And X11 will only show true present timings when running in full screen. Otherwise you get the timing that the compositor accepted the frame. Not the true present timing. This is could probably be considered to be a bug in Xorg.