Implement VK_KHR_present_wait to instrument actual present timings #11473

phire · 2023-01-23T07:15:27Z

Very few GPUs currently support the VK_KHR_present_wait extension.

But it has better support on desktop GPUs than VK_GOOGLE_display_timing, and we can abuse it to get presentation timings, which we can use for pretty graphs of when the user actually sees a new frame.

Why present timings?

Because the frame timings don't actually represent what the user experiences. For example, here is a 30 FPS game running vsynced on my 100 Hz monitor. Because 30 doesn't divide evenly into 100, the frame pacing is bad.
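To make the arithmetic behind that concrete, here is a minimal sketch of an idealised vsync model (no jitter; a frame goes on screen at the first vblank at or after it is ready). The function name `RefreshesPerFrame` is made up for illustration and is not part of Dolphin.

```cpp
#include <vector>

// Hypothetical helper: for each of the first `frames` frames, returns how many
// refresh cycles that frame stays on screen when an `fps` source is vsynced to
// a `refresh_hz` display. Idealised: no jitter, no dropped frames.
std::vector<int> RefreshesPerFrame(int fps, int refresh_hz, int frames)
{
  // Index of the first vblank at or after frame k is ready:
  // ceil(k * refresh_hz / fps), computed with integer arithmetic.
  auto first_vblank = [&](int k) { return (k * refresh_hz + fps - 1) / fps; };

  std::vector<int> result;
  for (int k = 0; k < frames; ++k)
    result.push_back(first_vblank(k + 1) - first_vblank(k));
  return result;
}
```

Under this model, `RefreshesPerFrame(30, 100, 6)` gives 4, 3, 3, 4, 3, 3. At 100 Hz each refresh lasts 10 ms, so frames are shown for 40, 30, 30 ms and so on rather than a steady 33.3 ms. With a 60 Hz display every frame gets exactly 2 refreshes.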

For another example, this sometimes happens to a 59.97 fps game when vsynced to my monitor at 60 Hz. It only happens if Dolphin manages to run without a single stutter for several minutes. I'm guessing my monitor actually runs at slightly less than 59.97 Hz, and eventually Dolphin runs ahead and is forced to wait for vsync.

Supported GPUs/Drivers

Currently only Nvidia on both Linux and Windows.
The next release of Mesa will add present_wait for AMD and Intel GPUs. But only if you use X11 and set a flag to force it on.

Limitations:

Currently doesn't support Wayland. And X11 will only show true present timings when running in full screen. Otherwise you get the time at which the compositor accepted the frame, not the true present timing. This could probably be considered a bug in Xorg.

mbriar · 2023-01-23T11:15:51Z

But only if you use X11 and set a flag to force it on

Slight correction: present_wait will be exposed by default on x11 by the next mesa release if the application doesn't enable any extension that can't support it. Which means in practice that it'll be available if VK_KHR_wayland_surface isn't enabled.

phire · 2023-01-24T02:51:44Z

Not the behaviour I experienced. When I built the current mesa trunk from source, the extension wasn't listed until I created the following /etc/drirc:

<driconf>
  <device>
    <application name="all">
      <option name="vk_khr_present_wait" value="true" />
    </application>
  </device>
</driconf>

mbriar · 2023-01-24T08:57:27Z

If you used the linux version of vulkaninfo then this is expected, because it enables VK_KHR_wayland_surface and thus won't list present_wait by default. If you e.g. run the vulkaninfo.exe from the windows vulkan sdk on a recent version of wine, it will, however, list it by default, because wine only supports x11 at the moment and doesn't enable VK_KHR_wayland_surface.
As a side note, you don't need to edit drirc to force it on, vk_khr_present_wait=true also works as an environment variable, like all drirc options.
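The rule described in this exchange can be written down as a tiny predicate. This is only a simplification of the behaviour as stated in the comments above, not Mesa's actual logic; the function name and the `forced_on` parameter (standing in for the `vk_khr_present_wait=true` drirc option or environment variable) are assumptions for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical model of the rule from the thread: present_wait is listed when
// it is forced on, or when the application has not enabled
// VK_KHR_wayland_surface on the instance.
bool PresentWaitLikelyExposed(const std::vector<std::string>& enabled_instance_extensions,
                              bool forced_on)
{
  if (forced_on)
    return true;

  return std::find(enabled_instance_extensions.begin(), enabled_instance_extensions.end(),
                   "VK_KHR_wayland_surface") == enabled_instance_extensions.end();
}
```

This matches the two observations above: the Linux vulkaninfo enables VK_KHR_wayland_surface and so doesn't list the extension unless it is forced, while an application that only enables X11 surface extensions sees it by default.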

K0bin · 2023-01-24T20:44:08Z

Source/Core/Common/WorkQueueThread.h

+    if (m_thread.joinable())
+    {
+      m_flush.Set();
+      Clear();


Why do you explicitly clear the working queue? It should be empty after the wait call anyway.

This also deadlocks if you call it when the thread is idle as far as I can tell. (Flush(); Flush();)

Yes... This is hyper-specialised for my use case, where I actually want "AbortAndFlush".

Because each work item takes 20ms minimum, I wanted the queue to empty as soon as possible. I should fix it to actually do what you expect a function named flush to do, and just manually call Clear() in my code.

This also deadlocks if you call it when the thread is idle as far as I can tell. (Flush(); Flush();)

Yes... might be hard to trigger because the Clear function does a wakeup. But there is a race condition if the thread sleeps between Clear() and m_flushed.Wait();

5d86c88

Any thoughts on this?

Um....
It's missing the is_flushing() feature that I need here. The code is about 20% longer. Harder to reason about correctness. And I can spot at least one bug that the Flag wrapper was designed to prevent.

I don't think it's an improvement.

Mhm, probably just me then. I find the 4 different Flags + Events much harder to reason about.

I don't know. Maybe I just find condition vars slightly harder to reason about.
The advantage of the Flag and Event wrapper is that they are idiot-proof. Near impossible to mess up. I can look at the calls and know it's going to do the right thing.

Your code essentially just replaced Flags with (atomic) bools and Events with condition vars. Which is what they both are internally.

Yes, your code is slightly more efficient. Once you add a fourth atomic bool to implement is_flushing(), it manages to use two fewer mutexes and one fewer atomic bool than the Event/Flag version.

Edit: The problem with this type of code is that it's always easier to write it correctly than it is to read it and reason about its correctness.

The advantage of the Flag and Event wrapper is that they are idiot-proof. Near impossible to mess up. I can look at the calls and know it's going to do the right thing.

With multiple flags and events you often rely on the exact order in which they get checked and cleared, and in which an event is waited on. None of that is protected by a lock. That seems far from idiot-proof to me.

I'm pretty sure it also relies on a strict 1:1 relationship between the thread Flushing and the worker thread. If you have multiple threads interacting with the WorkQueueThread, it will probably fall apart.

That's just my 2 cents though. At the end of the day, I'm perfectly fine with keeping the current implementation (and adding a proper FlushAll + Push).

Though we are getting a little side tracked with personal preferences about what's easier to reason about.

The real question is what implementation we should go with, and you do make a good argument that your implementation is superior (I didn't even consider the multiple threads interacting usecase, because I wasn't planning to use it).

It might be worth going with your option (after adding the IsFlushing feature I need)
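For readers following the Flush discussion, here is a minimal sketch of the condition-variable style being argued for: one mutex guards all state, so a blocking wait returns immediately when the worker is already idle, and several threads can wait at once. It is not the code from either commit; the class name `SketchWorkQueue` and its members are hypothetical.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical sketch, not Dolphin's Common::WorkQueueThread.
template <typename T>
class SketchWorkQueue
{
public:
  explicit SketchWorkQueue(std::function<void(T)> func)
      : m_func(std::move(func)), m_thread([this] { Run(); })
  {
  }

  // Stops the worker; items still queued at this point are dropped.
  ~SketchWorkQueue()
  {
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      m_shutdown = true;
    }
    m_wake.notify_one();
    m_thread.join();
  }

  void Push(T item)
  {
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      m_items.push(std::move(item));
    }
    m_wake.notify_one();
  }

  // Blocks until the queue is empty and the worker is idle. Because the
  // predicate is checked under the lock, calling this on an idle queue
  // (even twice in a row) returns immediately.
  void WaitForCompletion()
  {
    std::unique_lock<std::mutex> lock(m_mutex);
    m_idle.wait(lock, [this] { return m_items.empty() && !m_busy; });
  }

private:
  void Run()
  {
    std::unique_lock<std::mutex> lock(m_mutex);
    while (true)
    {
      m_wake.wait(lock, [this] { return m_shutdown || !m_items.empty(); });
      if (m_shutdown)
        return;

      T item = std::move(m_items.front());
      m_items.pop();
      m_busy = true;

      lock.unlock();  // run the work item without holding the lock
      m_func(std::move(item));
      lock.lock();

      m_busy = false;
      if (m_items.empty())
        m_idle.notify_all();  // wake every thread blocked in WaitForCompletion
    }
  }

  std::function<void(T)> m_func;
  std::mutex m_mutex;
  std::condition_variable m_wake;  // signalled on Push and on shutdown
  std::condition_variable m_idle;  // signalled when the queue drains
  std::queue<T> m_items;
  bool m_busy = false;
  bool m_shutdown = false;
  std::thread m_thread;  // declared last so it starts after the other members exist
};
```

An IsFlushing/IsCancelling style query, as requested above, would be one more bool read and written under the same mutex.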

phire · 2023-01-27T01:28:45Z

If you used the linux version of vulkaninfo then this is expected

Oh... Right... Good point.

K0bin

LGTM aside from the linter complaints and that one line left over from debugging.

Source/Core/VideoBackends/Vulkan/CommandBufferManager.cpp

Source/Core/VideoBackends/Vulkan/PresentWait.cpp

TellowKrinkle · 2023-01-29T22:03:17Z

Metal has an API much closer to VK_GOOGLE_display_timing (where you can ask for the actual time of the present, instead of just getting a callback some time after it happened)

Would it make sense to try to make the VideoCommon API support that here (being given timestamps instead of calculating them as "the time you called this function") or should I leave that for when I add a Metal implementation for this?

K0bin · 2023-01-29T22:05:18Z

D3D12 also has something very similar to VK_KHR_present_wait as far as I know. So it would make sense to design something in VideoCommon.

TellowKrinkle · 2023-01-29T22:21:55Z

This adds to VideoCommon. Just a question of what API we want there.
Metal and VK_GOOGLE_display_timing can provide their own timestamp for the best accuracy
VK_KHR_present_wait can't, and needs to rely on a CPU timestamp

The current API is just a function you call that records the current CPU timestamp as the time of the present.
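One possible shape for an API that covers both cases is sketched below: the backend may pass its own timestamp, and if it doesn't, the call falls back to the CPU time at which it was invoked. The class `PresentTimingLog` and its methods are hypothetical, and the sketch assumes backend timestamps have already been converted to the same clock.

```cpp
#include <chrono>
#include <optional>
#include <vector>

using PresentClock = std::chrono::steady_clock;

// Hypothetical sketch, not the existing VideoCommon API.
class PresentTimingLog
{
public:
  // backend_time: supplied when the graphics API reports the real present time
  // (the Metal / VK_GOOGLE_display_timing case). When omitted, the time of this
  // call is recorded instead (the VK_KHR_present_wait case).
  void RecordPresent(std::optional<PresentClock::time_point> backend_time = std::nullopt)
  {
    m_times.push_back(backend_time.value_or(PresentClock::now()));
  }

  const std::vector<PresentClock::time_point>& Times() const { return m_times; }

private:
  std::vector<PresentClock::time_point> m_times;
};
```

A present_wait backend would call `RecordPresent()` right after its wait returns, while a backend with real timestamps would call `RecordPresent(timestamp)` whenever the timestamp becomes available.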

phire · 2023-01-30T02:00:12Z

This adds to VideoCommon

Well. It's dubious whether performance metrics should be considered part of VideoCommon or not. I want to move everything ImGUI out of VideoCommon into a DolphinImGui subproject that sits alongside DolphinQt and DolphinNogui. The drawing parts of PerformanceMetrics would obviously move, and maybe the collection parts too?

But we do want a VideoCommon API eventually (potentially not part of this PR).

The wait on present behaviour is somewhat useful for the async present PR I'm about to work on. As part of my almost finished refactor that Kills Renderer, I'm thinking about adding a "VideoEvents" class that contains various frame life-cycle events to replace the current messy code in RenderBase::Swap.

Most importantly, a clean FrameEnd event that various things can hook into. But it would also be useful to have BeforePresent and AfterPresent events. That AfterPresent event could be defined to happen after present wait (if the backend/driver have support for it, otherwise you might just get instant or post-vsync timings).
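As a rough illustration of the idea, frame life-cycle events could be as simple as lists of callbacks that the presentation path triggers in order. Everything here (`HookList`, `VideoEventsSketch`, `PresentWithEvents`) is a hypothetical sketch and not the design that ended up in the refactor.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical: a list of callbacks that can be triggered together.
class HookList
{
public:
  using Callback = std::function<void()>;

  void Register(Callback callback) { m_callbacks.push_back(std::move(callback)); }

  void Trigger() const
  {
    for (const auto& callback : m_callbacks)
      callback();
  }

private:
  std::vector<Callback> m_callbacks;
};

// Hypothetical: the three events named in the comment above.
struct VideoEventsSketch
{
  HookList frame_end;
  HookList before_present;
  HookList after_present;
};

// `present_and_wait` stands in for the backend's present call plus, where the
// driver supports it, the wait for that present to reach the screen.
template <typename PresentFn>
void PresentWithEvents(const VideoEventsSketch& events, PresentFn&& present_and_wait)
{
  events.before_present.Trigger();
  present_and_wait();
  events.after_present.Trigger();
}
```

A timing graph would then register a callback on `after_present` and record a timestamp there; on backends without present wait, that callback simply fires earlier than the frame is actually shown.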

It would also be nice to have a VideoCommon API that exposes historical present timings. When the driver/backend doesn't support VK_GOOGLE_display_timing or VK_EXT_present_timing, it can fall back to CPU time of present wait (or worse).

- Cancel doesn't shut down anymore, allowing it to be used multiple times throughout the life of the WorkQueue
- Remove Clear, so we only have Cancel semantics
- Add IsCancelling so work items can abort early if cancelling
- Replace m_cancelled and m_thread.joinable() guards with m_shutdown
- Rename Flush to WaitForCompletion (as it's ambiguous if a function called flush should be blocking or not)
- Add documentation

phire · 2023-02-04T03:51:49Z

This adds to VideoCommon. Just a question of what API we want there.

@TellowKrinkle My intended API ended up in the massive KillRenderer PR here

Which probably means this should wait for #11522 (and also #11539) before merging.

TellowKrinkle · 2023-02-07T06:40:49Z

Source/Core/VideoBackends/Vulkan/VulkanContext.h

+  VkPhysicalDevicePresentWaitFeaturesKHR present_wait = {
+      VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PRESENT_WAIT_FEATURES_KHR};
+
+  DolphinFeatures() : VkPhysicalDeviceFeatures2(), m_tail(&pNext) {}


Might want to delete the move constructor since you hold interior pointers
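To illustrate the concern: a struct that stores a pointer into itself is left pointing at the old object if it is copied or moved. The sketch below uses plain stand-in types instead of the Vulkan structs so it compiles on its own; `ChainHeader` and `FeatureChainSketch` are hypothetical names, with `next` playing the role of `pNext`.

```cpp
#include <type_traits>

// Stand-in for the sType/pNext header shared by Vulkan feature structs.
struct ChainHeader
{
  int type = 0;
  void* next = nullptr;
};

// Hypothetical sketch of a feature chain that keeps a pointer to its own tail.
struct FeatureChainSketch : ChainHeader
{
  ChainHeader present_wait{1, nullptr};

  FeatureChainSketch() : m_tail(&next) { Append(&present_wait); }

  // m_tail and the chained `next` pointers refer to this object's own members,
  // so a copied or moved instance would still point into the original.
  FeatureChainSketch(const FeatureChainSketch&) = delete;
  FeatureChainSketch& operator=(const FeatureChainSketch&) = delete;
  FeatureChainSketch(FeatureChainSketch&&) = delete;
  FeatureChainSketch& operator=(FeatureChainSketch&&) = delete;

  // Links `link` onto the end of the chain and advances the tail pointer.
  void Append(ChainHeader* link)
  {
    *m_tail = link;
    m_tail = &link->next;
  }

private:
  void** m_tail;
};

static_assert(!std::is_copy_constructible_v<FeatureChainSketch>);
static_assert(!std::is_move_constructible_v<FeatureChainSketch>);
```

Deleting the copy operations as well as the move operations is a choice made for this sketch; the review comment only mentions the move constructor.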

Sam-Belliveau · 2023-02-10T19:52:23Z

I'd recommend using #11532 to record timings with something like m_present_counter.Count(present_times, true);, which should give you a really good graph.

If you begin recording something like frame latency, it would be like m_frame_latency.Count(latency, false);

There is more information about what the boolean does in the PR, but it should give you a nice way to get some fast graphs. I spent a lot of time optimizing the PerformanceTracker class so there shouldn't be too much overhead to adding more counters.

phire · 2023-02-11T06:01:23Z

I'm going to mark this as draft until I finish my current work refactoring the whole vulkan backend.

phire mentioned this pull request Jan 24, 2023

VideoBackends:Vulkan: Clean up submission thread using WorkQueueThread #11417

Merged

phire and others added 4 commits February 4, 2023 14:31

WorkQueueThread: Add flush capability

512273a

WorkQueueThread: Add Push

9badcc6

WorkQueueThread: Implement proper Flush

9affbfe

and rename the existing Flush to FlushOne.

WorkQueueThread: Rework without Flags/Events

94a0c50

phire mentioned this pull request Feb 4, 2023

Various WorkQueueThread improvements #11539

Merged

phire added 9 commits February 4, 2023 14:58

WorkQueueThread: Implement thread name

acdb0c5

Otherwise we will end up with a dozen threads named "WorkQueueThread"

WorkQueueThread: provide name and function at same time

7c4fcc3

Rework app_info.apiVersion to signal Vulkan 1.2

ccfaa65

Save Vulkan Instance api version in context

6705527

Implement support for VkPhysicalDeviceFeatures2

e7c5e4c

Enable VK_KHR_present_wait

e14d29c

And VK_KHR_present_id, which it depends on

Implement Present Wait

2f86e9f

PresentWait: Correctly handle swapchain destruction

34cb386

Also, reimplmented as WorkQueueThread.

phire force-pushed the present_wait branch from 8fcf83e to 34cb386 Compare February 4, 2023 03:38

phire marked this pull request as draft February 11, 2023 06:00

nyanpasu64 mentioned this pull request Jul 6, 2023

Add closed-loop latency control for Vulkan vsync #12035

Draft

2 tasks
