Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement VK_KHR_present_wait to instrument actual present timings #11473

Draft
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

phire
Copy link
Member

@phire phire commented Jan 23, 2023

Very few GPUs currently support the VK_KHR_present_wait extension.

But it has better support on desktop GPUs than VK_GOOGLE_display_timing, and we can abuse it to get presentation timings, which we can use for pretty graphs of when the user actually sees the a new frame.

Why present timings?

Because the frame timings don't actually represent what the user experiences. For example, here is a 30 FPS game running vsyned on my 100hz monitor. Because 30 doesn't divide into 100, the frame pacing is bad.

image

For another example, this sometimes happens to a 59.97 fps game when vsynced to my monitor at 60hz. It only happens if dolphin manages to run without a single stutter for several min. I'm guessing my monitor is actually slightly lower than 59.97hz and eventually dolphin runs ahead and is forced to wait for vsync.

image

Supported GPUs/Drivers

Currently only Nvidia on both Linux and Windows.
But the next release of Mesa will add present_wait for AMD and Intel GPUs. But only if you use X11 and set a flag to force it on

Limitations:

Currently doesn't support wayland. And X11 will only show true present timings when running in full screen. Otherwise you get the timing that the compositor accepted the frame. Not the true present timing. This is could probably be considered to be a bug in Xorg.

@mbriar
Copy link

mbriar commented Jan 23, 2023

But only if you use X11 and set a flag to force it on

Slight correction: present_wait will be exposed by default on x11 by the next mesa release if the application doesn't enable any extension that can't support it. Which means in practice that it'll be available if VK_KHR_wayland_surface isn't enabled.

@phire
Copy link
Member Author

phire commented Jan 24, 2023

Not the behaviour I experienced. When I built the current mesa trunk from source, the extension wasn't listed until I created the following /etc/drirc:

<driconf>
  <device>
    <application name="all">
      <option name="vk_khr_present_wait" value="true" />
    </application>
  </device>
</driconf>

@mbriar
Copy link

mbriar commented Jan 24, 2023

If you used the linux version of vulkaninfo then this is expected, because it enables VK_KHR_wayland_surface and thus won't list present_wait by default. If you e.g. run the vulkaninfo.exe from the windows vulkan sdk on a recent version of wine, it will, however, list it by default, because wine only supports x11 at the moment and doesn't enable VK_KHR_wayland_surface.
As a side note, you don't need to edit drirc to force it on, vk_khr_present_wait=true also works as an environment variable, like all drirc options.

if (m_thread.joinable())
{
m_flush.Set();
Clear();
Copy link
Contributor

@K0bin K0bin Jan 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you explicitly clear the working queue? It should be empty after the wait call anyway.

This also deadlocks if you call it when the thread is idle as far as I can tell. (Flush(); Flush();)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes... This is hyer-specialised for my usecase, where I actually want "AbortAndFlush".

Because each work item takes 20ms minimum, I wanted the queue to empty as soon as possible. I should fix it to actually do what you expect a function named flush to do, and just manually call Clear() in my code.

This also deadlocks if you call it when the thread is idle as far as I can tell. (Flush(); Flush();)

Yes... might be hard to trigger because the Clear function does a wakeup. But there is a race condition if the thread sleeps between Clear() and m_flushed.Wait();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5d86c88

Any thoughts on this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Um....
It's missing the is_flushing() feature that I need here. The code is about 20% longer. Harder to reason about correctness. And I can spot at least one bug that the Flag wrapper was designed to prevent.

I don't think it's an improvement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mhm, probably just me then. I find the 4 different Flags + Events much harder to reason about.

Copy link
Member Author

@phire phire Jan 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. Maybe I just find condition vars slightly harder to reason about.
The advantage of the Flag and Event wrapper is that they are idiot-proof. Near impossible to mess up. I can look at the calls and know it's going to do the right thing.

Your code essentially just replaced Flags with (atomic) bools and Events with condition vars. Which is what they both are internally.

Yes, your code is slightly more efficient. One you add a forth atomic bool to implement is_flushing() it manages to use two less mutexes and one less atomic bool than the Event/Flag version.

Edit: The problem with this type of code is that it's always easier to write it correctly than it is to read and reason it's correctness.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The advantage of the Flag and Event wrapper is that they are idiot-proof. Near impossible to mess up. I can look at the calls and know it's going to do the right thing.

With multiple flags and events you often rely on the exact order of those getting checked and cleared and waiting on an event. That isn't shared by a of that is protected by a lock. That seems far from idiot-proof to me.

I'm pretty sure it also relies on a strict 1:1 relationship between the thread Flushing and the worker thread. If you have multiple threads interacting with the WorkQueueThread, it will probably fall apart.

That's just my 2 cents though. At the end of the day, I'm perfectly fine with keeping the current implementation (and adding a proper FlushAll + Push).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though we are getting a little side tracked with personal preferences about what's easier to reason about.

The real question is what implementation we should go with, and you do make a good argument that your implementation is superior (I didn't even consider the multiple threads interacting usecase, because I wasn't planning to use it).

It might be worth going with your option (after adding the IsFlushing feature I need)

@phire
Copy link
Member Author

phire commented Jan 27, 2023

If you used the linux version of vulkaninfo then this is expected

Oh... Right... Good point.

Copy link
Contributor

@K0bin K0bin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM aside from the linter complaints and that one line left over from debugging.

Source/Core/VideoBackends/Vulkan/CommandBufferManager.cpp Outdated Show resolved Hide resolved
Source/Core/VideoBackends/Vulkan/PresentWait.cpp Outdated Show resolved Hide resolved
@TellowKrinkle
Copy link
Contributor

Metal has an API much closer to VK_GOOGLE_display_timing (where you can ask for the actual time of the present, instead of just getting a callback some time after it happened)

Would it make sense to try to make the VideoCommon API support that here (being given timestamps instead of calculating them as "the time you called this function") or should I leave that for when I add a Metal implementation for this?

@K0bin
Copy link
Contributor

K0bin commented Jan 29, 2023

D3D12 also has something very similar to VK_KHR_present_wait as far as I know. So it would make sense to design something in VideoCommon.

@TellowKrinkle
Copy link
Contributor

This adds to VideoCommon. Just a question of what API we want there.
Metal and VK_GOOGLE_display_timing can provide their own timestamp for the best accuracy
VK_KHR_present_wait can't, and needs to rely on a CPU timestamp

The current API is just a function you call that records the current CPU timestamp as the time of the present.

@phire
Copy link
Member Author

phire commented Jan 30, 2023

This adds to VideoCommon

Well. It's dubious if performance metrics should be considered to be part of video common or not. I want to move everything ImGUI out of VideoCommon into a DolphinImGui subproject that sits along side DolphinQt and DolphinNogui. The drawing parts of PerformanceMetrics would obviously move, and maybe the collection parts too?

But we do want a VideoCommon API eventually (potentially not part of this PR).

The wait on present behaviour is somewhat useful for the async present PR I'm about to work on. As part of my almost finished refactor that Kills Renderer, I'm thinking about adding a "VideoEvents" class that contains various frame life-cycle events to replace the current messy code in RenderBase::Swap.

Most importantly, a clean FrameEnd event that various things can hook into. But it would also be useful to have BeforePresent and AfterPresent events. That AfterPresent event could be defined to happen after present wait (if the backend/driver have support for it, otherwise you might just get instant or post-vsync timings).

It would also be nice to have a VideoCommon API that exposes historical present timings. When the driver/backend doesn't support VK_GOOGLE_display_timing or VK_EXT_present_timing, it can fall back to CPU time of present wait (or worse).

Otherwise we will end up with a dozen threads named "WorkQueueThread"
 - Cancel doesn't shut down anymore.
   Allowing it to be used multiple times thoughout the life of
   the WorkQueue
 - Remove Clear, so we only have Cancel semantics
 - Add IsCancelling so work items can abort early if cancelling
 - Replace m_cancelled and m_thread.joinable() guars with m_shutdown.
 - Rename Flush to WaitForCompletion (As it's ambiguous if a function
   called flush should be blocking or not)
 - Add documentation
And VK_KHR_present_id, which it depends on
Also, reimplmented as WorkQueueThread.
@phire
Copy link
Member Author

phire commented Feb 4, 2023

This adds to VideoCommon. Just a question of what API we want there.

@TellowKrinkle My intended API ended up in the massive KillRenderer PR here

Which probably means this should wait for #11522 (and also #11539) before merging.

VkPhysicalDevicePresentWaitFeaturesKHR present_wait = {
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PRESENT_WAIT_FEATURES_KHR};

DolphinFeatures() : VkPhysicalDeviceFeatures2(), m_tail(&pNext) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to delete the move constructor since you hold interior pointers

@Sam-Belliveau
Copy link
Contributor

I'd recommend using #11532 to record timings with something like m_present_counter.Count(present_times, true);, which should give you a really good graph.

If you begin recording something like frame latency, it would be like m_frame_latency.Count(latency, false);

There is more information about what the boolean does in the PR, but it should give you a nice way to get some fast graphs. I spent a lot of time optimizing the PerformanceTracker class so there shouldn't be too much overhead to adding more counters.

@phire phire marked this pull request as draft February 11, 2023 06:00
@phire
Copy link
Member Author

phire commented Feb 11, 2023

I'm going to mark this as draft until I finish my current work refactoring the whole vulkan backend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
5 participants