WIP: Add the VK_EXT_present_timing extension #1364

Open
wants to merge 8 commits into
base: main
from

Conversation

cubanismo
@cubanismo cubanismo commented Sep 14, 2020

This extension allows an application that uses the VK_KHR_swapchain extension to obtain information about the presentation engine's display, to obtain timing information about each present operation, and to schedule a present to happen at a specific time. Applications can use this to minimize various visual anomalies (e.g., stuttering).

Contributing

Because of the nature of the problem space this specification covers, the Vulkan System Integration working group has decided to move development of this specification to a public forum to make it easier to gather feedback and collaborate with developers across the various graphics ecosystems. However, to preserve the option of promoting the resulting extension to a Khronos ratified extension in the future, some guidelines (in addition to the usual contributing guidelines) must be followed when providing feedback or making contributions to the specification language.

  • If you have high-level feedback that does not include sensitive IP or implementation details, feel free to comment on the merge request directly.

  • If you have specific wording changes or functionality you wish to add to the specification, please provide it as a pull request targeting the VK_EXT_present_timing branch of the KhronosGroup/Vulkan-Docs repository. This will require accepting the Khronos CLA before the pull request can be merged.

This is a relatively new process for Khronos, so feel free to provide meta-suggestions on how best to facilitate dialogue and foster engagement as well.

@CLAassistant
CLAassistant commented Sep 14, 2020

CLA assistant check
All committers have signed the CLA.

@afrantzis
afrantzis commented Sep 15, 2020

Hi, I have created a WIP/RFC proposal for a Wayland protocol which is modeled after this extension (here). It aims to provide enhanced presentation timing requests and events for Wayland, and to provide support for this VK extension in particular. Given the early stages of the public discussion at both the Wayland protocol and the Vulkan extension level, (possibly significant) changes should be expected. Thanks!

[NOTE]
.Note
====
The pname:presentSlop is used to avoid unintentionally missing a vertical
Contributor

@emersion emersion Sep 16, 2020

Is there a reason why the presentation engine needs both targetPresentTime and presentSlop? Why can't the application just provide a target of targetPresentTime - presentSlop?

I was wondering if the reason was VRR, but there's already idealPresentTime that could be used in this case. Or are there situations where idealPresentTime is not suitable?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked a similar question on the discussion for the new wayland present-timing protocol which i think is supposed to provide the foundation for VK_EXT_present_timing for Wayland/WSI.

My understanding would be that presentSlop is only meant for FRR displays, to avoid skipped frames if targetPresentTime happens to lie inside or just after the end of a vblank, due to numerical inaccuracies, or slightly noisy presentation feedback timestamps used as basis for calculating a new targetPresentTime. E.g., given that actualPresentTime reported by the driver would mark the end of a vblank/start of a new active scanout period, doing an animation that sets targetPresentTime = actualPresentTime + some_multiple_of_a_refresh_duration could easily miss the present for the wanted target vblank if actualPresentTime is a few microseconds too late or some_multiple_of_a_refresh_duration is slightly too large due to accumulated numerical roundoff error.

So my understanding of presentSlop for a FRR display would be that iff the time interval [targetPresentTime - presentSlop ; targetPresentTime] intersects with a vblank, then the presentation engine should target that intersected vblank for presenting. But if the interval does not intersect with a vblank, then the semantic should be that a present must not happen before targetPresentTime. For a sufficiently large presentSlop, e.g., 1 msec, that would avoid accidentally missing a target vblank and thereby a skipped frame, while still preserving which vblank the application intended to target for a present.

E.g., on Linux with the open-source display stack, a display server could compare vblank timestamps against such an interval and in the case of intersection it would schedule a page-flip for the refresh interval anywhere before the start of the intersecting vblank interval and leave it to the kms drivers and gpu to wait for completion of all needed fences and lock to the right vsync to trigger the page-flip in hardware.
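The interpretation above can be sketched in C. This is only an illustration of this comment's reading of presentSlop on an FRR display, not spec language. The function name `pick_vblank_ns`, the model of vblanks as instants at `first_vblank_ns + k * refresh_ns`, and the choice of the earliest vblank when several fall inside the window are all assumptions made for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: time of the vblank a present would target on an FRR
 * display if any vblank inside [target - slop, target] is accepted, and
 * otherwise the first vblank at or after target is used.
 * Both cases reduce to "first vblank at or after target - slop". */
uint64_t pick_vblank_ns(uint64_t first_vblank_ns, uint64_t refresh_ns,
                        uint64_t target_ns, uint64_t slop_ns)
{
    assert(refresh_ns > 0);
    /* Start of the acceptance window, clamped at zero. */
    uint64_t earliest = (target_ns > slop_ns) ? target_ns - slop_ns : 0;
    if (earliest <= first_vblank_ns)
        return first_vblank_ns;
    /* Round up to the first vblank at or after `earliest`. */
    uint64_t since = earliest - first_vblank_ns;
    uint64_t k = (since + refresh_ns - 1) / refresh_ns;
    return first_vblank_ns + k * refresh_ns;
}
```

With a 16666667 ns refresh and a target that lands 6 ns after the intended vblank, a slop of 1 msec still selects the intended vblank, while a slop of zero slips to the next one.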

Right understanding? Or a misunderstanding of the use case for presentSlop?

My own software implements a similar mechanism for FRR mode by advising to use this formula to calculate a targetPresentTime, based on actualPresentTime from a previous present, if it wants to present every 'waitFrames' video refresh cycles:

targetPresentTime = actualPresentTime + waitFrames * refreshDuration - 0.5 * refreshDuration

presentSlop would be the equivalent of the shift of "-0.5 * refreshDuration", but more elegant and robust at the driver level and wrt. interfacing with the application. Assuming i understand the intended use of presentSlop?
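As a rough illustration of the half-refresh shift in the formula above, here is a minimal C sketch using integer nanoseconds. The function name and the choice of integer arithmetic are assumptions for the example; it is not taken from the extension or from any toolkit source.

```c
#include <assert.h>
#include <stdint.h>

/* targetPresentTime = actualPresentTime + waitFrames * refreshDuration
 *                     - 0.5 * refreshDuration, all in nanoseconds.
 * The half-refresh shift places the target midway before the wanted vblank,
 * so errors of up to half a refresh cycle still select that vblank. */
uint64_t half_refresh_target_ns(uint64_t actual_present_ns,
                                uint32_t wait_frames,
                                uint64_t refresh_ns)
{
    assert(wait_frames >= 1);
    return actual_present_ns + (uint64_t)wait_frames * refresh_ns
           - refresh_ns / 2;
}
```

For example, with a previous present at 1000000000 ns, `wait_frames = 2`, and a 16666667 ns refresh, the target falls between the first and second vblank after the previous present.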

[open,refpage='VkRelativePresentTimeEXT',desc='Specifying the minimum duration an image should be presented',type='structs']
--

The sname:VkRelativePresentTimeEXT structure is defined as:
Contributor

@emersion emersion Sep 16, 2020

Maybe it would be a good idea to state explicitly the motivation for a separate relative timing target.

Applications submitting a single frame at a time wouldn't have any use for this struct I think (apart from maybe a slightly simpler interface), because they could create a suitable absolute target instead (last actualPresentTime + minPresentDuration).

However, applications queueing multiple frames at once don't know in advance when the presentation engine will choose to display their image. VkRelativePresentTimeEXT allows the application to specify a target relative to the previous actual present time, without having to manually adjust the absolute timing (n+1) depending on feedback n.
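The difference can be illustrated with a small simulation, written under simplifying assumptions: an FRR display with vblanks at `k * refresh_ns`, all images already rendered, and a presentation engine that shows each image at the first vblank at least `min_duration_ns` after the previous image's actual present time. The function name is hypothetical, and this models only the relative-timing idea, not the actual VkRelativePresentTimeEXT semantics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fills actual_ns[i] with the simulated present time of queued image i.
 * The application supplies only a minimum duration; it never needs the
 * feedback for image n to compute a target for image n+1. */
void simulate_relative_presents(uint64_t refresh_ns, uint64_t min_duration_ns,
                                size_t count, uint64_t *actual_ns)
{
    assert(refresh_ns > 0);
    uint64_t prev = 0;
    for (size_t i = 0; i < count; ++i) {
        /* The first image goes out at time 0; later ones wait out the
         * minimum duration of their predecessor. */
        uint64_t earliest = (i == 0) ? 0 : prev + min_duration_ns;
        /* Round up to the next vblank. */
        uint64_t k = (earliest + refresh_ns - 1) / refresh_ns;
        prev = k * refresh_ns;
        actual_ns[i] = prev;
    }
}
```

With a 16666667 ns refresh (60 Hz) and a 33333333 ns minimum duration, every image lands two refresh cycles after the previous one, which is the stable 30 fps case.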

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be great for stuff like presenting at a stable 30fps on a 60hz display.

@nvlduc nvlduc Nov 30, 2022

I think spec language should refrain from making recommendations or suggestions about usage and intent. I know this PR has quite a few of these, which I might clean up, but I'd rather not add more here. The appendix, which could probably be fleshed out a bit, is a better place for these remarks. Alternatively, one of the large existing .NOTE sections may also be appropriate; I'll look into adding a comment to that effect somewhere.

The pname:presentSlop is used to avoid unintentionally missing a vertical
blanking period on FRR displays due to rounding errors or drift between
clocks.
A suggested value for pname:presentSlop is half the _Refresh Rate_.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rounding behavior feels a little inelegant. For most FRR scenarios, what is the motivation to use nanoseconds instead of a frame counter timebase?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the simple "glXSwapInterval" use case for example, the actual present times might not be that important, but it would be useful to just spam a refresh cycle of 2 frames for example without having to compute and keep track of exact refresh durations, etc. Perhaps it would be possible to have a "frame count" timebase somehow?

Contributor

@emersion emersion Sep 18, 2020

Relevant discussion in wayland-protocols: https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/45#note_627138

having to compute and keep track of exact refresh durations

This doesn't sound like a lot of work tbh. Omitting the "frame count" timebase allows reducing the API surface and simplifying the extension.

visual anomalies.
Adjustments to a larger IPD because of late images should happen quickly,
but adjustments to a smaller IPD should only happen if the
pname:optimalPresentTime member of the

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does same mean here? Exactly the same value, or +/- some delta?


The sname:VkPastPresentationTimingEXT structure is defined as:

include::{generated}/api/structs/VkPastPresentationTimingEXT.txt[]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm missing the "headroom" field from GOOGLE_display_timing. That feedback is useful for low-latency situations where you'd be able to time the main loop such that there is little time before GPU is done rendering and display unit scans out.

I suppose calibrated timestamps might work around that niche, but that would rely on swapchain and GPU working on the same timebase, and it wouldn't be able to take into account any compositing work happening outside the knowledge of Vulkan.

At each vertical blanking period, the presentation engine dequeues
successive-queued images for which their associated wait semaphores have
signaled.
The last image dequeued is presented.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this the same as MAILBOX, but more precisely specified?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When does this dequeue operation happen? In the vblank handler itself, or some unspecified time before vblank for next frame (i.e. when the compositor might want to do some work)?

@nvlduc nvlduc Nov 30, 2022

As per previous internal discussions, this new present mode will be the subject of a separate extension. The key difference with MAILBOX, I think, is that the dequeue is done at vblank time, instead of when the wait semaphores have been signaled, which is difficult to implement in some situations. I'll make sure these questions are addressed when a proposal comes up.
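To make the dequeue-at-vblank behavior discussed in this thread concrete, here is a small C model. It is a sketch under assumptions: `PresentQueue` and `dequeue_at_vblank` are hypothetical names, each queued image carries the time its wait semaphores signal, and the dequeue happens exactly at the vblank instant (the thread leaves open when it really happens).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint64_t *ready_ns; /* when each queued image's wait semaphores signal */
    size_t count;             /* number of images ever queued */
    size_t head;              /* index of the oldest image still queued */
} PresentQueue;               /* hypothetical model, not a Vulkan type */

/* At a vblank, dequeue successive images from the head of the queue whose
 * semaphores have signaled; the last one dequeued is presented.
 * Returns the index of the presented image, or -1 if none was dequeued. */
long dequeue_at_vblank(PresentQueue *q, uint64_t vblank_ns)
{
    assert(q != NULL && q->head <= q->count);
    long presented = -1;
    while (q->head < q->count && q->ready_ns[q->head] <= vblank_ns) {
        presented = (long)q->head; /* earlier dequeued images are skipped */
        q->head++;
    }
    return presented;
}
```

In this model an image that is ready but queued behind an image that is not ready stays queued, because only successive ready images are dequeued.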

user.
A value of zero specifies that the presentation engine may: display the
image for any duration.
* pname:idealPresentDuration provides an indication to the presentation

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exact semantics of idealPresentDuration seem a bit vague to me. I don't quite understand how it works.

There seem to be three distinct cases for it:

target == ideal:

Can optimalPresent still be smaller than target in case where we could go from 33.3ms to 16.67ms duration? If so, why use ideal at all?

target < ideal:

Does this mean anything? Will the presentation engine try to aim for present at ideal instead, rather than target? If so, can't the app just use target = ideal?

target > ideal:

Same question as the first case with equality. Are optimalPresent values returned somehow directly equal to ideal here? I.e., will driver look at reported present time, and snap it to idealTime if it's close enough? If so, what is close enough for snapping to happen?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/45/diffs#f852003dd5091841baf96e797d61195b312f3395_0_22 seems more precisely specified, if it's intended that it should work like the Wayland RFC.

If the presentation engine's pname:refreshDuration is a fixed value,
the application's image present duration (IPD) must be a multiple of
pname:refreshDuration.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This language may be valid for other presentation modes, but is not valid for immediate present mode.

Image Present Duration::
The amount of time the application intends for each
newly-presented image to be visible to the user.
This value should: be a multiple of the refresh cycle duration.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This language may be valid for other presentation modes, but not for immediate present mode.

Image Present Rate::
The number of newly-presented images the application intends to present
each second (a.k.a. frame rate).
This value should: be a multiple of the refresh rate.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this language is not what is intended.
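One possible reading of this objection (an assumption, since the comment does not elaborate) is that on an FRR display the image present duration is a whole multiple of the refresh duration, which makes the image present rate the refresh rate divided by a whole number rather than a multiple of it. A minimal C sketch of that arithmetic, with hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>

/* Rounds a desired image present duration up to a whole number of refresh
 * cycles (at least one). */
uint64_t ipd_round_up_ns(uint64_t desired_ns, uint64_t refresh_ns)
{
    assert(refresh_ns > 0);
    uint64_t cycles = (desired_ns + refresh_ns - 1) / refresh_ns;
    if (cycles == 0)
        cycles = 1;
    return cycles * refresh_ns;
}

/* Image present rate in millihertz when each image is shown for `cycles`
 * refresh cycles: the refresh rate divided by the cycle count. */
uint64_t ipr_millihertz(uint64_t refresh_rate_millihertz, uint64_t cycles)
{
    assert(cycles > 0);
    return refresh_rate_millihertz / cycles;
}
```

On a 60 Hz display, an IPD of two refresh cycles gives 30 fps and three cycles gives 20 fps.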

@timo-suoranta-varjo
timo-suoranta-varjo commented Oct 21, 2020

How does this extension interact with immediate present mode?

Can we also make it possible to query vertical blanking duration, which is a fraction of refresh cycle duration?

2) Do we return min/max Values for Refresh Duration for VRR?

*PROPOSED*: return only the minimum value of refreshDuration for a VRR.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please reconsider this. I would like to ask for providing (at least as an optionally reported value) both minimum and maximum value for refreshDuration for VRR. One could always have special values like 0 to signal that specific info is not available/not yet available/not applicable to a given display setup, e.g., if a driver vendor doesn't want to implement additional info fields in initial implementations of the extension. Ideally it would be great if even more detailed info would be provided, e.g.,

  • minimum duration (== maximum refresh rate), corresponding to what the display can do at current video mode/display link bandwidth/other limitations.

  • maximum duration (== minimum refresh rate) of what the display can physically do without tricks like low framerate compensation (lfc). Typical values for current G-Sync or FreeSync/DP-adaptive sync displays are around 30 Hz - 48 Hz as far as i know.

  • maximum duration (== minimum refresh rate) of what can be done on the display + gpu combo + driver via tricks like low framerate compensation, panel self-refresh etc., e.g., the solutions implemented by NVidia and AMD atm. for going below ~ 30 Hz refresh rate / above present intervals of 33 msecs.

  • Ideally for VRR modes we'd even have some reporting of the "good quality range" in which duration can be varied quickly without causing strong perceptible visual artifacts like significant flicker of the image.

In my experience, some (many? most?) VRR panels do show a perceptible dimming of the image if the time-gap between successive presents is substantially increased beyond minimum duration, e.g., if min duration would be 6.944 msecs (on a max 144 Hz display), and max duration supported by the display itself would be 33.333 msecs (on a min 30 Hz display), then switching between presents with a spacing of 7 msecs and 17 msecs and back to 7 msecs may work without any user-perceptible flicker, but making bigger jumps from 7 msecs to 33 msecs and back will cause very perceptible and irritating flicker.

Similarly, going beyond a 33.333 msecs interval between presents would work, as gpus or panels would trigger an automatic refresh cycle (once maximum front-porch duration of the vblank is exceeded), but then block the app from presenting for a period of time (ie. at least one active scanout duration). Similarly, exceeding the physical maximum duration will trigger low framerate compensation, which changes the timing characteristics wrt. flicker and blocking the application from presenting.

I am the core developer and maintainer of the popular open-source toolkit Psychtoolbox-3 which is used for basic research in neuroscience, vision-science, cognitive science, psychology and related basic medical research applications related to visual perception, eye care etc. The toolkit is also used for basic perception research in industry for future VR/AR applications and display technologies. Cf. this VESA press release for how the software will be used for research into new HDR displays, display stream compression methods, VR and AR. Without going into too many details here, be assured that precise visual presentation timing and precise/trustworthy visual presentation timestamping matter critically for many research applications. VRR is one of the most useful tools developed in the last decade for these paradigms, to allow fine-grained timing and framerate control. But we need presentation to be high quality, e.g., minimizing flicker when transitioning between different presentation intervals between presents. So having as much info about the timing properties as possible is very valuable to achieve that goal.

Since January 2020 Psychtoolbox implements a special VRR based presentation mode for research applications, which tries to make as good use as possible of AMD FreeSync with Displayport adaptive sync etc. to give fine-grained presentation timing control. This is currently done under Linux with classic native X11/GLX + OpenGL on top of AMD's open-source display drivers. A demo and test of VRR scheduling is part of Psychtoolbox under this link VRRTest.m.

I contributed some bug fixes and enhancements to AMD's Linux kernel driver to make this more robust and trustworthy for apps like mine. It works reasonably well, given the constraint of doing all scheduling of bufferswaps/flips in the user-space application, but one annoyance of the current approach is that my algorithm needs to know both minimum and maximum physical refresh duration of the display at a minimum. Ideally also the other properties mentioned above. Lacking api support means users have to figure out minimum refresh rates themselves, e.g., by finding their display's specs online or in printed manuals, or by manually decoding EDID data or accessing root-only-readable privileged files, deep in the Linux debug filesystem, just to tell the application what these values are. Other useful properties like the ones mentioned above need to be guessed via heuristics. That's a quite tedious and error-prone user experience that could be easily avoided in future presentation timing extensions like this one, by providing more details about display system timing characteristics rather than fewer.

Since recently we also support HDR display for research, using Vulkan's HDR extensions. However, reasonable timing can only be achieved under Vulkan atm. via rather gross hacks - and often not at all. Therefore my hope is that this extension can contribute to great progress for applications like mine wrt. research use of VRR + HDR, as well as for VR/AR.

Some of the issues requiring more detailed info about presentation engine properties could be solved in the Vulkan drivers or display drivers themselves, and i intend to contribute to the Linux FOSS drivers related to this extension where it is helpful, once the extension is finalized. But assuming not all problems can be solved on all target platforms perfectly, it will be necessary for applications like mine to adapt or compensate for shortcomings or limitations of drivers, so more information and control is always good for use cases like research.

That would be the most-likely use-case that I can come up with for why a query
might be useful.
Is that compelling enough?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In neuroscience/vision-science testing it is not unusual to have paradigms where an application may prerender dozens or hundreds of frames - as many as it can fit into VRAM, and then wants to present them in a fast "burst", ie. queuing up many of them at once, specifying a present duration or target present time for each of the images in the sequence. Think about trying to present a sequence of complex images at 240 fps or more without skipping a frame. A variation of this is to render ahead at least a few frames, like with triple-buffering etc. to maximize parallelism between gpu and cpu when complex rendering at high fps has to push the system hard, to avoid or minimize dropped frames.

In any of these scenarios it would be a big advantage to be able to render/queue ahead a lot and only collect the presentation timestamps at the end of such a burst. Atm. Psychtoolbox-3 on Linux + X11 + GLX windowing system interface + OpenGL takes advantage of the GLX extension INTEL_swap_events to allow for that. Wayland allows for the same with the current presentation_time extension. So for research applications, having a deep feedback queue would be good to retain functionality we already have under Linux/X11 or Linux/Wayland.

@nvlduc nvlduc Nov 30, 2022

I think the size of the internal timing feedback queue should be defined by the application. As it may involve some non-trivial allocations from the implementation (such as internal synchronization primitives), my proposal would be to add a pNext structure extending VkSwapchainCreateInfoKHR and specify it there, e.g.:

     typedef struct VkSwapchainPresentTimingCreateInfoEXT {
         VkStructureType            sType;
         const void*                pNext;
         uint32_t                   timingQueueSize;
         // ...
     } VkSwapchainPresentTimingCreateInfoEXT;

The question then is how to specify the behavior when that queue is full of entries not yet queried by vkGetPastPresentationTimingEXT. From previous internal discussions, my understanding is that not all implementations would be able to support either a ring-buffer behavior (overwrite oldest entries) or easily drop new entries. Making that case invalid by adding a Valid Usage entry to vkQueuePresentKHR seems like a reasonable compromise.
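The full-queue question can be explored with a small stand-alone model. This is a sketch under assumptions, not an implementation: `TimingQueue` and its functions are hypothetical, `capacity` plays the role of the proposed `timingQueueSize`, and a rejected push stands in for the present that the proposed Valid Usage entry would forbid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TIMING_QUEUE_MAX 64 /* arbitrary storage cap for this sketch */

typedef struct {
    uint64_t present_ids[TIMING_QUEUE_MAX];
    uint32_t capacity; /* plays the role of timingQueueSize */
    uint32_t head;     /* index of the oldest unqueried entry */
    uint32_t count;    /* number of unqueried entries */
} TimingQueue;         /* hypothetical model, not a Vulkan type */

void timing_queue_init(TimingQueue *q, uint32_t timing_queue_size)
{
    assert(timing_queue_size > 0 && timing_queue_size <= TIMING_QUEUE_MAX);
    q->capacity = timing_queue_size;
    q->head = 0;
    q->count = 0;
}

/* Models a present that requests timing feedback. Returns false when the
 * queue is full: neither overwriting the oldest entry nor dropping the new
 * one silently, the caller must drain first. */
bool timing_queue_push(TimingQueue *q, uint64_t present_id)
{
    if (q->count == q->capacity)
        return false;
    q->present_ids[(q->head + q->count) % q->capacity] = present_id;
    q->count++;
    return true;
}

/* Models a query that retrieves up to max_out of the oldest entries. */
uint32_t timing_queue_drain(TimingQueue *q, uint64_t *out, uint32_t max_out)
{
    uint32_t n = 0;
    while (n < max_out && q->count > 0) {
        out[n++] = q->present_ids[q->head];
        q->head = (q->head + 1) % q->capacity;
        q->count--;
    }
    return n;
}
```

An application that wants to collect timestamps only at the end of a long burst would, in this model, pick a capacity at least as large as the burst.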

* pname:pNext is `NULL` or a pointer to an extension-specific structure.
* pname:timeDomain is a elink:VkTimeDomainEXT value representing the time
domain that should be used with the swapchain.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that i would like to see - not sure if this would be the right place, but given it is related to presentation timing - would be a way for the application to optionally request or forbid the use of VRR mode, similar to the fullscreen_exclusive extension, where there could be three settings "Enable VRR", "Disable VRR" and "Let driver decide between VRR or FRR". Again, research applications would appreciate control about VRR vs. FRR on VRR capable systems, as VRR is very well suited and beneficial for some research paradigms, but detrimental to others. My toolkit currently implements such a switch on Linux+X11+Mesa via some hacks to manipulate X-Atoms "behind the back" of the Mesa graphics library, based on detailed knowledge about how this works on XOrg's X11 + Mesa DRI3/Present implementation. It works, but it is ugly and only works on FOSS drivers on Linux, not on other drivers or operating systems. A proper on/off/auto switch as part of this Vulkan extension could make this so much more sane, future-proof and portable.

zero; otherwise it is ename:VK_FALSE if the presentation engine is
operating as a FRR display, or ename:VK_TRUE if the presentation
engine is operating as a VRR display.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be a good place to also report if the swapchain's associated presentation engine is in principle capable of VRR, via a VkBool variableRefreshSupported? I'd like to have a way to find out if VRR is in principle supported on a swapchain's display (ie. driver + gpu + display support it and it is enabled by compositor/user). "variableRefresh" describes if VRR mode is active at time of query.

E.g., if connected display or gpu or display server/driver is incapable of VRR or if VRR is disabled by the user in some display setting, then both variableRefreshSupported and variableRefresh would be VK_FALSE. But if the setup is capable of VRR and just at this specific time of query VRR mode is not active, maybe because the window is not a fullscreen window or partially obscured, then variableRefreshSupported would be VK_TRUE, but variableRefresh would be VK_FALSE.

This would be very useful for research applications but presumably also games and other apps to provide users with more accurate feedback/warnings or some troubleshooting instructions if the given use case requires VRR and VRR is not active, e.g., to figure out if it is because of obscured window / desktop composition kicking in, or instead because of unsuitable display/gpu or user error (=user disabled the required VRR feature in some display setting GUI or config file). My toolkit does this on Linux/X11, involving lots of non-portable hacks, but a proper and portable reporting mechanism in this extension would be splendid.
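A sketch of how an application could turn the two proposed booleans into user-facing diagnostics follows. `variableRefreshSupported` is only a suggestion made in this comment, and the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    VRR_STATUS_ACTIVE,                 /* VRR supported and currently in use */
    VRR_STATUS_SUPPORTED_BUT_INACTIVE, /* e.g. window obscured or composited */
    VRR_STATUS_UNSUPPORTED             /* display, gpu, driver, or user setting */
} VrrStatus;

/* Maps the proposed variableRefreshSupported / variableRefresh pair to a
 * troubleshooting category. */
VrrStatus classify_vrr(bool variable_refresh_supported, bool variable_refresh)
{
    /* Active VRR without support would be contradictory. */
    assert(variable_refresh_supported || !variable_refresh);
    if (variable_refresh)
        return VRR_STATUS_ACTIVE;
    return variable_refresh_supported ? VRR_STATUS_SUPPORTED_BUT_INACTIVE
                                      : VRR_STATUS_UNSUPPORTED;
}
```

The middle category is the one that would let an application suggest checking window state or compositor settings rather than hardware.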

pname:actualPresentTime and pname:optimalPresentTime with any
other values, as the pname:presentation engine may not be able to
provide accurate values.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another feature request for this structure: I'd like to have an indicator of how the present was performed and how trustworthy/reliable the reported actualPresentTime is. Given how absolutely critical proper timing and timestamping is for research applications like the ones i mentioned, it is important to be able to assess how likely it is that presentation happened in a reliable way and that the timestamps reflect reality. A research application may want to warn the user or even abort operation if it detects unreliable operating conditions. Or it wants to guide the user in troubleshooting common problems.

E.g., in my experience, on current desktop operating systems like Linux, Windows, macOS, a minimum requirement for being reasonably certain that reported/computed presentation timestamps are of sufficient accuracy and that an image was presented in a pixel exact fashion without any artifacts is that a window is fullscreen exclusive, ie. unobscured, top-level, decorationless, with the desktop compositor mostly out of the way and using gpu page-flipping for presenting images. A research application wants to be certain that its window was not obscured by other windows (or popup notification messages), the pixels were displayed as specified (ie. not altered due to alpha-blending, soft-shadows from other windows, or modified by some desktop effects), without tearing and at the time represented by the timestamps. Anything else would be considered a critical failure in this scenario - something that needs to be reported to the user and fixed by the user immediately.

On Linux/X11/GLX on the XOrg X-Server with FOSS graphics stack, i can use the GLX extension INTEL_swap_events to find out if pageflipping was used for presentation (good!) or if framebuffer copies were used (bad!). With a proprietary graphics driver like NVidia's i can still use more painful and non-portable hacks like evaluating return values from GLX_ext_buffer_age and low-level hacks like reading certain gpu MMIO hardware registers directly to try to figure out if pageflipping was used.

On Linux+Wayland, the presentation_time protocol extension provides feedback about how a present happened, at least on the few Wayland compositors that implement the extension. We have flags like
WP_PRESENTATION_FEEDBACK_KIND_HW_CLOCK, WP_PRESENTATION_FEEDBACK_KIND_HW_COMPLETION,
WP_PRESENTATION_FEEDBACK_KIND_ZERO_COPY and
WP_PRESENTATION_FEEDBACK_KIND_VSYNC
to tell us if the present was synchronized to the refresh (KIND_VSYNC) and thereby tear-free, done with a pageflip or equivalent into some hardware overlay plane (KIND_ZERO_COPY), if present completion was detected and signalled by some robust and trustworthy gpu hardware mechanism like some pageflip completion interrupt or similar (KIND_HW_COMPLETION), and timestamped with a high-precision clock (e.g., a gpu hardware timestamp or some mechanism of equivalent reliability and precision == KIND_HW_CLOCK). The Wayland protocol extension is an improvement over the X11 INTEL_swap_events extension, and both are huge improvements over what other operating systems and windowing systems have to offer.

For serious research applications and similar "production" use cases that go beyond pure entertainment like games or video players, it would be great if the VkPastPresentationTimingEXT feedback could improve on these mechanisms, or at least retain some similar information to fulfill the same purpose of reliability assessment or aid in troubleshooting.
Even if some vendor implementations would just return some VK_DONT_KNOW flag to bail on such reporting if the burden of implementation is deemed too high. I would certainly try to contribute at least to Linux FOSS driver implementations of the extension to make sure that such feedback flags would be implemented in a useful way.

images of the swapchain are presented, an
slink:VkRelativePresentTimeEXT can: be provided to specify the
minimum duration they should: be displayed.
endif::VK_EXT_present_timing[]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrt. these new swapchain create flags: Maybe i missed it, but will there be corresponding capability bits somewhere, to signal to the application if VK_SWAPCHAIN_CREATE_ABSOLUTE_TIME_BIT_EXT and/or VK_SWAPCHAIN_CREATE_RELATIVE_TIME_BIT_EXT is supported by a given surface to which the new swapchain is going to be attached? So the app could, e.g., only request the timestamp feedback bits, but not presentation time scheduling?

Ideally one wants to have both presentation time scheduling support as provided by these bits, and feedback about actual presentation timing via vkGetPastPresentationTimingEXT(). But some windowing system implementations on some platforms may only provide the infrastructure to support a partial implementation of the VK_EXT_present_timing extension at the moment, at least for a while until windowing system implementations catch up. Even partial enablement of this extension, e.g., only the timing feedback part, would be valuable for some use cases, so it would be nice if apps could query what is supported and make use of that and adapt for missing bits.

E.g., for FRR display modes the Linux DRM/KMS FOSS display stack has all that is needed to support timing and timestamping in fullscreen exclusive direct display mode (leased DRM/KMS outputs on X11 or future Wayland). For VRR mode, the current kernels have the timestamping part in excellent shape, but nothing yet for scheduling of VRR flips at a specific target time. Would be nice to make use of the extension as far as possible on already shipping Linux kernels.

Similarly, X11 with DRI3/Present has all that is needed for FRR timing and FRR/VRR timestamping on the FOSS stack, although NVidia's proprietary driver is severely lacking. Nothing for VRR scheduling though. The current X-Server can be put to good use for a partial extension.

On Wayland we have had a stable protocol extension named presentation_time for more than 5 years, which provides timing feedback on the subset of compositors that implement it (afaik Weston does it well - i was involved in testing/debugging/improvements, wlroots and Sway seem to support it? Some KDE/KWin fork seems to support it? Google's Chromium seems to have support? may imply ChromeOS? / Android?). There isn't any support for scheduling presentation at a specific target time yet. Would be good to be able to at least use the timestamp feedback part on already shipping Wayland compositors.

On macOS 10.15.7 MoltenVK seems to implement the VK_GOOGLE_display_timing extension at least partially on top of some Metal api timing mechanisms.

On Windows-10, i don't know the status, but some DXGKI structs in public docs suggest there may be at least something somewhat usable for timestamp feedback?

@nvlduc nvlduc Nov 30, 2022

A few people have requested this. I added a new structure extending VkSurfaceCapabilities2KHR which looks like this:

    // Provided by VK_EXT_present_timing
    typedef struct VkPresentTimingSurfaceCapabilitiesEXT {
        VkStructureType    sType;
        void*              pNext;
        VkBool32           presentTimingSupported;
        VkBool32           presentAtAbsoluteTimeSupported;
        VkBool32           presentAtRelativeTimeSupported;
    } VkPresentTimingSurfaceCapabilitiesEXT;

I think we might also want to plug into vkGetPhysicalDeviceSurfacePresentModes2EXT, but I'm slightly annoyed that it is provided by VK_EXT_full_screen_exclusive which I'm reluctant to add as a requirement to this extension, so I will leave this out for now. We need to better specify interaction with each present mode first anyway.
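
As a rough illustration of how an application might act on these three booleans, here is a minimal C sketch. It assumes the capabilities were already obtained by chaining the struct into the pNext of VkSurfaceCapabilities2KHR as described above. `PresentTimingCaps`, `PresentTimingPlan` and `plan_present_timing` are hypothetical stand-ins invented for this example so that it compiles without the WIP headers; they are not part of the proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the three booleans of the proposed
 * VkPresentTimingSurfaceCapabilitiesEXT (the real struct also carries
 * sType/pNext and uses VkBool32). Hypothetical type for illustration. */
typedef struct {
    bool presentTimingSupported;
    bool presentAtAbsoluteTimeSupported;
    bool presentAtRelativeTimeSupported;
} PresentTimingCaps;

/* Hypothetical application-side decision, not part of the extension. */
typedef struct {
    bool collectTimingFeedback; /* use vkGetPastPresentationTimingEXT   */
    bool scheduleAbsolute;      /* request the absolute-time create bit */
    bool scheduleRelative;      /* request the relative-time create bit */
} PresentTimingPlan;

/* Pick the subset of features to use on one surface. Scheduling is only
 * enabled when the app wants it and the surface reports support; absolute
 * time is preferred over relative time when both are available. */
PresentTimingPlan plan_present_timing(const PresentTimingCaps *caps,
                                      bool wantScheduling)
{
    assert(caps != NULL);
    PresentTimingPlan plan = { false, false, false };
    plan.collectTimingFeedback = caps->presentTimingSupported;
    if (wantScheduling) {
        if (caps->presentAtAbsoluteTimeSupported) {
            plan.scheduleAbsolute = true;
        } else if (caps->presentAtRelativeTimeSupported) {
            plan.scheduleRelative = true;
        }
    }
    return plan;
}
```

On a surface that only reports presentTimingSupported (for example a Wayland compositor with timing feedback but no scheduling protocol), the plan degrades to feedback only instead of failing swapchain creation.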

Author

@cubanismo cubanismo Nov 30, 2022

I think the correct way to pull in a bit of functionality from an unrelated EXT is to add it to both extensions. Then you get OR ifdef logic in the spec for that function, and I believe there are existing samples of a way to handle this in the XML as well.

* pname:swapchain is the swapchain to obtain timing properties for.
* pname:pSwapchainTimingProperties is a pointer to an instance of the
sname:VkSwapchainTimingPropertiesEXT structure.

How could an application do an efficient blocking wait for present completion? Ie. without busy-waiting in a tight loop?
Research applications like mine often need to synchronize certain actions with present completion. E.g., starting playback of a sound at the time a certain image starts displaying, or starting audio capture, or programming some digital i/o interface card / serial port / to send out some digital TTL trigger signal to research equipment, or sampling some sensor, or sending a trigger packet over the network. Some official way of having a blocking wait for a present completion with a specific target "presentID" would be useful. Or maybe a VkFence that could be associated with such a present completion and signalled?

Are you asking for a way for the app to perform a CPU-wait that blocks until a queued present has executed in the PE? While this can be valuable info, I feel like it would be feature creep for this extension, and therefore would better be added in a separate extension. I am guessing here, but I'm not convinced that all currently-relevant platforms that could implement this proposed extension are necessarily also capable of unblocking a CPU waiter at the event you request.

Yes, that's what i want. Busy-waiting in the app is not good, especially not if you have to poll at a very high rate, because one needs the lowest possible delay between present execution completion and responding to that. It drains battery on laptops, causes overhead, and on some systems the app may get penalized for such behaviour by having its scheduling priority reduced, or even by being demoted out of realtime scheduling for hogging a cpu too much, e.g., macOS, Windows iirc, or some of the schedulers on Linux. E.g., i already implement VK_GOOGLE_display_timing support in my app, and i'd like to cut this 100 lines+ mess... https://github.com/kleinerm/Psychtoolbox-3/blob/master/PsychSourceGL/Source/Common/PsychVulkanCore/PsychVulkan.c#L2602 ... down to a handful of lines, if we had block-for-presentID support in this extension.

If you only have polling and a frame is skipped, and even more so for variable refresh displays, you can end up with polling loops that run for dozens of milliseconds, burning cpu needlessly.

@jjulianoatnv @keith-packard Following up on this after some recent "play-time" with NVidia's recent proprietary Vulkan driver on Windows-10 and Linux:

Seems the VK_KHR_present_id and VK_KHR_present_wait extensions are the ones that implement this block-until-present-complete behavior. That's what i hoped for with my comments above.

I also saw that @keith-packard's GitHub repo has some wip branch for Mesa on this. Great! Any ETA on when that will get integrated into Mesa?

However, the NVidia proprietary implementation seems to be quite broken atm., to the point of being useless. I couldn't get any spec compliant or recommended behavior on Windows-10 at all. No meaningfully accurate relationship between return from vkWaitForPresentKHR() and actual image presentation. On Linux it didn't add anything in precision for the single-gpu case, and was totally broken for the Optimus / Prime render offload case as far as i could tell.

Another problem i realized is that the extension does not allow querying for support per VkSurfaceKHR, only whether the implementation in general supports it. So we have the same problem as mentioned in these comments wrt. querying support. Some windowing systems on Linux can easily support vkWaitForPresentKHR(), e.g., DRM/KMS direct display on a DRM master or leased DRM output, and X11 via DRI3 + Present extension. But on Wayland, only some compositors implement the required presentation_feedback extension, others don't, so per-WSI capability reporting would be needed for correct behavior. I guess an implementation of the extension will simply have to lie to clients as it stands, and claim support on WSI/Wayland even if the underlying Wayland compositor may not support it.

allenwp commented Jul 16, 2021

I believe this extension would be useful in developing tools to measure video latency of sink devices, such as displays connected via HDMI, DisplayPort, etc. This purpose has similar requirements to the neuroscience/vision-science testing requirements presented by @kleinerm.

In order to develop such tools, timing information for when specific pixels of an image are transported is required and this timing information will be dependent on blanking interval durations. I believe this extension would be appropriate for reporting blanking interval durations, which would allow calculation of exactly when each pixel of an image is transported.

In terms of how this information could be presented in this extension, two fields would be relevant: exactly when an image begins to be presented and when image presentation has completed. It can be assumed that the remaining time is a blanking interval between images. Currently, I believe only the time when an image begins to be presented is available in these structs. I don't see any way to determine when an image completes presentation and between-image (vertical) blanking begins. With modes such as HDMI Quick Frame Transport, the blanking region can become quite large.
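
To make the "when is each pixel transported" calculation concrete, here is a small C sketch. It is an illustration under stated assumptions, not something the extension provides: it assumes a conventional raster scan with a constant pixel clock, that the reported start time corresponds to the first active pixel, and that the total line length (active pixels plus horizontal blanking) is known. `RasterTiming` and `pixel_offset_ns` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a fixed raster video mode. */
typedef struct {
    uint64_t pixel_clock_hz; /* pixels transmitted per second             */
    uint32_t h_total;        /* pixels per line, incl. horizontal blanking */
} RasterTiming;

/* Offset in nanoseconds (truncated) from the first active pixel
 * (row 0, col 0) to the active pixel at (row, col). Every line before
 * `row` costs a full h_total pixel periods, blanking included. */
uint64_t pixel_offset_ns(const RasterTiming *t, uint32_t row, uint32_t col)
{
    assert(t != NULL && t->pixel_clock_hz > 0);
    uint64_t pixels = (uint64_t)row * t->h_total + col;
    return pixels * 1000000000ull / t->pixel_clock_hz;
}
```

For the common 1920x1080 at 60 Hz timing (2200 total pixels per line, 148.5 MHz pixel clock), the center pixel at row 540, column 960 leaves the source roughly 8.006 ms after the first active pixel. With a larger vertical blanking region at the same refresh rate, the pixel clock rises and this offset shrinks, which is why the blanking durations matter for the measurement.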

Background:

Video latency is relevant for audio/video synchronization (lip sync) between different sink devices, such as a video display and audio amplifier. It is also relevant for a user's selection of a display/mode for purposes such as esports, virtual reality, and neuroscience/vision-science testing. "Video latency" of a sink device, for these purposes, should be measured as the average time between when a pixel’s complete data arrives at a device and when that pixel is presented to the user as light.

This definition is effectively what is stated in the HDMI Specification 1.3 and higher, which is used in the HDMI "Auto Lipsync" feature (Spec 1.3, Section 8.9.1 “EDID Latency Info”):

“The latency values within these fields indicate the amount of time between the video or audio entering the HDMI input to the actual presentation to the user (on a display or speakers), whether that presentation is performed by the device itself or by another device downstream.”

Video Latency Measurement Tool:

An easy trick to measure the average video latency across a display's surface is to place a photodiode at the center of a display, which will account for different display scanout methods such as black frame insertion/low persistence, presenting the entire image at once rather than raster scanning, HDMI "Quick Frame Transport", variable refresh rate mode, etc. The time between when pixel information for the center of the image arrives at the display's input and when the photodiode detects this change in light is the average video latency for the display.

Thanks for taking the time to consider this purpose! I'd love to hear your thoughts.

Allen

pasikarkkainen commented Jul 21, 2021

Vulkan 1.2.185 spec has been released, including the VK_KHR_present_id and VK_KHR_present_wait extensions, which are relevant to VK_EXT_present_timing as well.

swick commented Jul 21, 2021

I've come to believe that the idealPresentTime/optimalPresentTime mechanism doesn't work properly when the buffer readiness deadline (or even the commit deadline) is earlier than the presentation. This can happen when the presentation engine is compositing but also in other cases.

Consider the following scenario: the display is refreshing at a 16ms interval, and the PE requires the buffer to be ready 8ms before the presentation. A client uses VK_KHR_present_wait to start drawing at a present, the GPU finishes with the buffer 10ms later and the targetPresentTime is set to T+32. The client notices that it only takes 10ms so it sets the idealPresentTime to T+10 and the PE correctly answers with optimalPresentTime T+16. For the next frame the targetPresentTime is set to T+16, and the client again finishes the buffer at T+10, but it won't reach the present at T+16 because that present has a buffer readiness deadline of T+8.
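
The arithmetic of this scenario can be sketched in a few lines of C. This is only a model of the example, assuming refreshes happen at exact multiples of the interval (measured from T) and a fixed readiness lead time; `earliest_present` is a hypothetical helper, not an API of the extension.

```c
#include <assert.h>
#include <stdint.h>

/* Refreshes occur at 0, interval, 2*interval, ... (all times relative
 * to T, in the same unit). A buffer that becomes ready at `ready` can
 * only be shown at a refresh R with R - lead >= ready. Returns the
 * earliest such refresh time. */
uint64_t earliest_present(uint64_t ready, uint64_t interval, uint64_t lead)
{
    assert(interval > 0);
    uint64_t need = ready + lead;                  /* R must be >= need */
    uint64_t n = (need + interval - 1) / interval; /* round up          */
    return n * interval;
}
```

With the numbers above, `earliest_present(10, 16, 8)` yields 32, not the 16 that the PE reported as optimal, whereas with no readiness lead (`lead` = 0) the same buffer would make the refresh at 16.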

swick commented Jul 21, 2021

Adding to that: when VK_KHR_present_wait gets called and the display is refreshing at a 16ms interval one would expect to have a budget of 16ms to finish the frame. If the PE requires the buffer to be ready earlier than the present this is not the case. In the example above the frame budget is 8ms. So while the problem above results in frames with an actualPresentTime != targetPresentTime the VK_KHR_present_wait extension is also flawed and reduces the frame budget. Waiting for the buffer readiness deadline of a present instead would solve this issue.
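
The difference between waking at the present and waking at the readiness deadline can be checked with a short C sketch. It uses the same simplified model as the example (refreshes at exact multiples of the interval, a fixed lead time smaller than the interval); `frame_budget` is a hypothetical helper for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Refreshes occur at 0, interval, 2*interval, ...; the buffer for the
 * refresh at R must be ready by R - lead. Returns how much time a client
 * that wakes up at `wake` has until the next readiness deadline that is
 * strictly later than `wake`. */
uint64_t frame_budget(uint64_t wake, uint64_t interval, uint64_t lead)
{
    assert(interval > 0 && lead < interval);
    uint64_t k = (wake + lead) / interval + 1; /* first refresh whose     */
    uint64_t deadline = k * interval - lead;   /* deadline is after wake  */
    return deadline - wake;
}
```

Waking at a present (time 16, interval 16, lead 8) leaves a budget of 8, while waking at that present's readiness deadline (time 8) leaves the full 16.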

* pname:actualPresentTime is the time when the image of the
pname:swapchain was actually displayed.

@aleiby aleiby Aug 30, 2021

In addition to @kleinerm's suggestion to provide timing variability, I would like to see actualPresentTime a bit more rigorously defined. Does "displayed" refer to when the display is illuminated after scanout completes? Is this the precise start, end or midpoint of that period? What about rolling displays which illuminate a band of pixels as they are being scanned out?

I propose reporting the start of vsync instead. Then with the ability to query more detailed timing info (e.g. vblank, front porch, clock rate) any other specific required relative points of time can be calculated. This would also enable reporting the time prior to when actual illumination takes place.
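
As a sketch of how another reference point could be derived from a reported vsync start, the following C snippet computes the delay from the start of the vertical sync pulse to the first active line. It assumes conventional raster timing (sync pulse, then back porch, then active video) and that the listed timing parameters are available from somewhere; the proposed extension does not currently expose them, and `VerticalTiming` and `vsync_start_to_scanout_ns` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of a video mode's timing parameters. */
typedef struct {
    uint64_t pixel_clock_hz;     /* pixels transmitted per second          */
    uint32_t h_total;            /* pixels per line, incl. h-blanking      */
    uint32_t v_sync_lines;       /* height of the vertical sync pulse      */
    uint32_t v_back_porch_lines; /* lines between sync end and active video */
} VerticalTiming;

/* Nanoseconds (truncated) from the start of the vertical sync pulse to
 * the start of the first active line, i.e. the start of scanout. */
uint64_t vsync_start_to_scanout_ns(const VerticalTiming *t)
{
    assert(t != NULL && t->pixel_clock_hz > 0);
    uint64_t lines = (uint64_t)t->v_sync_lines + t->v_back_porch_lines;
    return lines * t->h_total * 1000000000ull / t->pixel_clock_hz;
}
```

For the common 1920x1080 at 60 Hz timing (5 sync lines, 36 back porch lines, 2200 total pixels per line, 148.5 MHz pixel clock) this gives about 607 microseconds between vsync start and start of scanout.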

@swick swick Sep 8, 2021

I agree with this. While it would be nice to know when a present hits the user's eyeballs, the delay from the start of vsync to that point in time is usually ~constant, and if the system knows it, it could be communicated as a separate field.

@kleinerm kleinerm Sep 9, 2021

I propose end of vblank, aka start of scanout, i.e. when the first top-left pixel of the new image leaves the video output of the gpu, as the point in time, because that is what is used by every other extension i am aware of (e.g., on Linux VDPAU, GLX/OML_sync_control, X11/DRI3/Present, X11/INTEL_swap_events, Wayland presentation_feedback, kernel DRM/KMS vblank and pageflip events; on Apple macOS/iOS Metal drawable presentation timestamps), and also by the VK_GOOGLE_display_timing extension.

This is consistent with all past efforts, and easy to implement correctly. Especially useful if one wants to implement other past extensions on top of this one.

Should also work with VRR displays, at least as far as our current Linux/DRM/KMS implementation for AMD's amdgpu-kms driver goes, and some experiments i did on NVidia G-Sync hardware against the proprietary driver suggest the same approach works there.

pasikarkkainen commented Sep 28, 2021

Any updates about the VK_EXT_present_timing extension? .. for the general public not participating in the khronos working groups? :)

Author

cubanismo commented Oct 1, 2021

> Any updates about the VK_EXT_present_timing extension? .. for the general public not participating in the khronos working groups? :)

Unfortunately, the general public is as up to date on progress as the working groups are at the moment.

ishitatsuyuki commented Dec 3, 2021

I'm currently working on a latency reduction solution and I would like to use the present timing extension to predict the next present time as accurately as possible (in particular on VRR displays; on FRR the prediction can be done mostly just with an initial timing).

One annoyance currently is that both the EXT and GOOGLE present timing extensions require external synchronization on swapchains, and many of the swapchain functions can block for an extended period. This makes it hard to acquire timing information "as soon as possible".

Please consider adding some synchronization-free way to obtain the timing, just like how timestamp queries can be accessed without synchronization.

pasikarkkainen commented May 29, 2022

Any progress recently with the VK_EXT_present_timing extension ?

ianelliottus and others added 3 commits Oct 7, 2022
This extension allows an application that uses the
VK_KHR_swapchain extension to obtain information
about the presentation engine's display, to obtain
timing information about each present operation,
and to schedule a present to happen at a specific
time.  Applications can use this to minimize
various visual anomalies (e.g., stuttering).
Fix some trivial spec build and validation issues:
- Update some includes from .txt to .adoc.
- Reserve different bits for VkSwapchainCreateFlagsKHR.
- Remove disallowed contractions.
- Mark pNext members in present timing structs as optional.
- Typo fixes.
- Update copyright years.
- Remove conflicting VUIDs.
This change removes the 'presentID' member from
VkAbsolutePresentTimeEXT and VkRelativePresentTimeEXT to rely on
VkPresentIdKHR instead. If no present id is supplied, timing
collection will reference id 0.

Other minor changes:
- Rename VkPastPresentationTimingEXT::presentID to 'presentId'
  for consistency with VK_KHR_present_id.
- Change VkPastPresentationTimingEXT::presentId to uint64_t to
  match VkPresentIdKHR.
- Add dependency on VK_KHR_swapchain and VK_KHR_present_id in the
  spec xml and appendix.
@nvlduc nvlduc force-pushed the VK_EXT_present_timing branch from 0341f2b to f6fe88d Compare Nov 14, 2022
nvlduc added 2 commits Nov 14, 2022
Add a new VkPhysicalDevicePresentTimingFeaturesEXT struct that exposes 3 features:
- presentTiming: corresponds to the ability to collect present timing information via
  vkGetPastPresentationTimingEXT. This feature is required when VK_EXT_present_timing
  is supported
- presentAtAbsoluteTime / presentAtRelativeTime: these are optional and correspond to
  the "present-at" capabilities of the extension, i.e. adding a VkPresentTimesInfoEXT
  in the VkPresentInfoKHR pNext chain.
As per previous discussions, VK_PRESENT_MODE_FIFO_LATEST_READY_EXT isn't
directly related to present timing. Split it out so we can roll it into
its own extension.

@kleinerm kleinerm left a comment

Hi, maybe my comments in the general discussion were overlooked, or there is still something in the making, but looking at this commit, I'm worried my issues are not addressed:

How can a client app find out if/which present timing extension features are supported for a given VkSurfaceKHR if surfaces can be created from different windowing system WSI implementations for a given OS platform? E.g., on desktop Linux there exist at least 3 different WSI backends from which a single application instance could simultaneously create and use different VkSurfaceKHRs:

  • Direct output to DRM/KMS via a VkDisplayKHR handle acquired from X11 via RandR output leasing or the corresponding Wayland output leasing protocol.
  • X11
  • Wayland.

Implementing the present timing extension on top of DRM/KMS kernel api / VkDisplayKHR, or native X11/XOrg on top of DRI3/Present is rather straightforward and possible now. @keith-packard already had suitable prototype implementation ready for this years ago.

Implementing the extension at all, or fully, on top of Wayland is currently impossible, and I don't have much hope this will change in the foreseeable future. There's a presentation_feedback protocol for vkGetPastPresentationTimingEXT, but it is only implemented in a subset of existing Wayland compositors, e.g., Weston, GNOME/Mutter, wlroots/sway. And there isn't any protocol for the presentAtAbsoluteTime / presentAtRelativeTime functionality, with protocol specification ongoing for over a decade (!) without signs of this concluding anytime soon.

@keith-packard's implementation of the predecessor VK_GOOGLE_display_timing extension, while working very well under both DRM/KMS and X11 according to my own testing, so far hasn't been merged into Mesa, and my impression is that this is mostly due to lack of any way forward with Wayland. That extension also had no way of reporting capabilities / support on a per surface basis.

Maybe there are similar problems on other platforms, depending on WSI backend used?

So I think the inability to query on a per individual VkSurfaceKHR basis whether this extension is supported, and which subset of features, will be a real problem at least on Linux, and I for one have been desperately waiting for years now for this extension to become available on at least Linux.

Couldn't availability of these and future similar features instead be queried by some extension struct to, e.g., vkGetPhysicalDeviceSurfaceCapabilities2KHR, ie. a struct put into the .pNext chain of the VkSurfaceCapabilities2KHR struct, where reported capabilities are individual per VkSurfaceKHR?

Or if you are going forward with this approach, what will be the way to go forward for client apps and Vulkan implementations like Mesa's Vulkan/WSI on systems like Linux with multiple WSI backends of wildly varying capabilities wrt. presentation timing? I really wouldn't want to wait for availability of this extension on desktop Linux until Wayland the protocol and all shipping Wayland compositors get their act together.

nvlduc commented Nov 22, 2022

@kleinerm Thank you for the feedback. Your comments have not been overlooked; this is definitely still a work-in-progress. I'm going after the low-hanging fruit first to reduce the noise when going after the bigger comments / issues raised in this PR later on. This also leaves a bit of time for people to react to the smaller changes. I will reply to all the existing comments as I address them.

I do not have the expertise to comment on the specific Wayland issues you raised, but adding more queries such as the one you describe is planned.

nvlduc added 2 commits Nov 30, 2022
- Remove remaining reference to VK_PRESENT_MODE_FIFO_LATEST_READY_EXT.
- Add missing headings in appendix
- Add missing description for VkPresentTimesInfoEXT
- Reword VkSwapchainTimingInfoEXT::timeDomain xml comment
- Change pNext wording to match current style
- Specify order of elements in vkGetPastPresentationTimingEXT
A VkPresentTimingSurfaceCapabilitiesEXT can be returned in the pNext chain of
VkSurfaceCapabilities2KHR.
@nvlduc nvlduc self-assigned this Nov 30, 2022