Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

platform/windows: add DwmFlush() call to improve Windows capture #164

Merged
merged 5 commits into from
Jun 15, 2022

Conversation

psyke83
Copy link
Collaborator

@psyke83 psyke83 commented May 18, 2022

Description

Adds DwmFlush() call before acquiring a new frame to resolve framerate stuttering observed when moving the mouse. For more context, see (obsolete) PR and issue from upstream that describes the problem in detail:

loki-47-6F-64/sunshine#227
loki-47-6F-64/sunshine#286

Please note that I had to include a very minor but unrelated fix for VideoToolbox variables missing from the video_t struct, as this needed to be fixed in order to accommodate the new dwmflush entry correctly.

Issues Fixed or Closed

Type of Change

  • Breaking* change (fix or feature that would cause existing functionality to not work as expected)

Potential breakage only in specific circumstances when enabled: DwmFlush() constrains the framerate to the host computer's monitor refresh rate, so it may cause issues if the client is streaming at an FPS higher than supported by the host's connected monitor. I would personally prefer for this fix to be enabled by default, as this scenario seems like a rare edge case compared to the stutter that's experienced on all Windows clients. Feedback on whether I should change the default would be appreciated.
Breakage should no longer occur, as DwmFlush() is now called only if enabled and the host monitor refresh rate meets or exceeds the client requested rate.

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated the documentation blocks for new or existing components

@psyke83
Copy link
Collaborator Author

psyke83 commented May 19, 2022

I notice the webui entry is not working correctly. Clicking on Audio/Video will show DwmFlush as Disabled, but navigating to the Advanced tab and back results in the DwmFlush entry being blank, if the option was set to disabled. Saving configuration on the Advanced page will clobber the current DwmFlush setting as well.

I won't be have the opportunity to check this until later tonight, so I'll set this as a draft.

@psyke83 psyke83 marked this pull request as draft May 19, 2022 07:19
@psyke83
Copy link
Collaborator Author

psyke83 commented May 19, 2022

Perhaps I'm missing something, but it seems that the webui is not behaving correctly independently of this commit? I downloaded the latest Windows binary release and launched a fresh session. After only setting up the username/pass and logging in, simply clicking randomly between Advanced and the three encoder tabs will result in entries randomly showing a blank modal selection in the encoder tabs.

@ReenigneArcher
Copy link
Member

Perhaps I'm missing something, but it seems that the webui is not behaving correctly independently of this commit? I downloaded the latest Windows binary release and launched a fresh session. After only setting up the username/pass and logging in, simply clicking randomly between Advanced and the three encoder tabs will result in entries randomly showing a blank modal selection in the encoder tabs.

Can you share a screenshot of what you're experiencing? I will see if I can replicate it.

@TheElixZammuto
Copy link
Contributor

I've tested this and don't see any regressions, so I guess that the fix works. @psyke83 if the problem happens outside of your changes it's problably better to file an issue and merge this anyway

@psyke83
Copy link
Collaborator Author

psyke83 commented May 19, 2022

@TheElixZammuto @ReenigneArcher

The problem that was happening for me: when you randomly switched tabs on the webui - particularly when moving from Advanced to another tab, random select drop-down boxes were showing no label instead of the correct setting. This was actually happening no matter whether or not the configuration was sourced from sunshine.conf or filled in by the fallback defaults -- and has nothing to do with this PR. I narrowed down the issue to the colon prefixes used for a handful of v-model directives. I'm no expert on HTML or Vue, but I understand is supposed to signify that they are v-bind elements, so they refer to named variables rather than strings. Their usage didn't seem correct, and when I removed them from the directives, everything started working correctly.

I've updated the PR with two small fixes for the webui that are split out from the DwmFlush commit.

I'd like a little more time to test the DwmFlush change itself, as I noticed a strange performance regression in a game (Dying Light) when this is enabled, but in the meantime, if you would like me to split out the unrelated webui/video_t struct commits into a new PR, let me know and I'll do that.

@ReenigneArcher
Copy link
Member

@psyke83 if there are game dependent performance issues, would it be possible (or even make sense) to enable this depending on the application that is launched?

@psyke83
Copy link
Collaborator Author

psyke83 commented May 20, 2022

I've tested about a dozen games using different graphics APIs with this patch. For all games that were affected by stuttering issues, DwmFlush seems to work as intended, and none exhibit any negative performance issues similar to what I see with Dying Light.

Keeping in mind that Dying Light was never a game in which I ever noticed the type of stutter DwmFlush is aimed at fixing, some notes:

  • Steam in-game counter shows a constant 60fps in all test cases.
  • Streaming looks subjectively very choppy (constant dips down to 30 fps) in all test cases.
  • Without DwmFlush, Moonlight's encoder statistics show a constant 60fps, matching Steam. But I definitely see major visual stuttering.
  • With DwmFlush, Moonlight's encoder statistics accurately reflect the slowdown/stuttering observed, constantly dipping from 60-40-30.

I checked Dying Light's Steam forums, and there's a recent thread of users complaining of major FPS drops. Although Steam's FPS counter showed a locked 60fps for me, I decided to try the workaround in the thread of manually selecting the beta branch that holds the previous released version, and Sunshine now works perfectly fine with or without DwmFlush enabled. So, I'm chalking this up to some kind of issue with the latest Dying Light that happens to negatively impact Sunshine.

I'll remove the draft status and let this go for review now. If any other regressions in other applications or games are noticed when DwmFlush is enabled, please let me know. Thanks.

@psyke83 psyke83 marked this pull request as ready for review May 20, 2022 00:09
@psyke83
Copy link
Collaborator Author

psyke83 commented May 20, 2022

I've made additional changes that would enable DwmFlush to be enabled by default: psyke83/SunshineStream@DwmFlush...psyke83:DwmFlush_auto

Should I include that commit into this PR?

I also found a very useful resource to help exhibit stuttering issue: https://www.vsynctester.com/ - specifically, theVSYNC synchronized indicator box. Without DwmFlush, the text appears as solid red/cyan whenever I move the mouse. With DwmFlush (or my previous attempt to fix the issue), you can only see occasional red/cyan flickers when the network or encoder misses a frame.

@ReenigneArcher
Copy link
Member

@TheElixZammuto I'll leave it up to your judgement if you think this option should be applied as the default.

@TheElixZammuto
Copy link
Contributor

@TheElixZammuto I'll leave it up to your judgement if you think this option should be applied as the default.

Given that @psyke83 has given a very toughtful comparison of both situations, and also made an option for disabling it using the Web UI, IMHO we should make it enabled-by-default.

@psyke83 psyke83 changed the title platform/windows: add optional DwmFlush() call to improve Windows capture platform/windows: add DwmFlush() call to improve Windows capture May 21, 2022
@mirh
Copy link

mirh commented May 22, 2022

I don't want to be that guy, but AFAICT dwmflush is always just kind of a hack to workaround issues elsewhere.
In pcsx2, it did use to work and be helpful to avoid stuttering, but long story short that was just because of having some messed up code in wxwidgets with multiple stacked windows somehow confusing drivers syncing with the compositor. So the UI stalling on every refresh cycle was giving it room to breath.
In openfl/lime#390 you can see it supposedly was about drivers bugs.
.. and even the people that wants to explictily vsync disdain it anyway.

In this case, you aren't really "rendering" stuff yourself (and the entire thing has to inherently depend and wait on another process anyway), but it still feels like having thrown something at the wall and appreciating that it sticked.
How does OBS, ffmpeg, VLC, magpie, webrtc or even just the DXGIDesktopDuplication sample handle the same task?

@psyke83
Copy link
Collaborator Author

psyke83 commented May 22, 2022

@mirh

Keeping in mind that I'm quite unfamiliar with programming with DirectX/DXGI or Windows in general, I did investigate some alternative solutions before DwmFlush. My previous PR (mentioned in first post) changed the logic to release the frame immediately after capturing the frame, instead of doing it after waiting for the next frame interval. It worked, but it goes against Microsoft's guidance:

For performance reasons, we recommend that you release the frame just before you call the IDXGIOutputDuplication::AcquireNextFrame method to acquire the next frame.

Another alternative I tried was flushing the command queue and blocking until complete using an event query, as per the official documentation: psyke83/SunshineStream@DwmFlush...psyke83:DwmFlush_alternative - this didn't help.

I also tried replacing the frame wait logic with WaitForVBlank() to see if it could at least resolve stutter when the host refresh rate and client framerate are matched, but this also failed to help alleviate the stutter issue.

I'm not familiar with all of the projects you mentioned, but OBS invokes ReleaseFrame directly after AcquireNextFrame. This is similar to my previous PR's approach, but again, goes against Microsoft's performance guidelines.

Would appreciate any suggestions, but I can say that I haven't noticed increased capture latency when DwmFlush is used.

@mirh
Copy link

mirh commented May 22, 2022

I really have no technical insight on the matter (my skill is spitballing), but it's just that it looks particularly odd here for a random vista function to be necessary for good performance despite not being mentioned anywhere.

https://stackoverflow.com/questions/48278207/dxgi-desktop-duplication-screen-capture-speed/#48279602
https://ko-fi.com/post/Massive-DXGI-Performance-Boost-W7W2W6YN
https://codereview.webrtc.org/2712353002/
Can this give you some extra clue?
https://stackoverflow.com/questions/60198426/invalid-call-on-releaseframe-in-desktopduplication-api/#60527805
Are we using a separate thread for acquisition?

obsproject/obs-studio#2604 (comment)
And last but not least, could it be that different games are having different results because of using software cursors rather than hardware ones?

@psyke83
Copy link
Collaborator Author

psyke83 commented May 22, 2022

I really have no technical insight on the matter (my skill is spitballing), but it's just that it looks particularly odd here for a random vista function to be necessary for good performance despite not being mentioned anywhere.

https://stackoverflow.com/questions/48278207/dxgi-desktop-duplication-screen-capture-speed/#48279602

The problem doesn't seem to be that we're not receiving frames fast enough. When the stutter issue occurs, frame_info.AccumulatedFrames always increases to 2, but Moonlight's encoder statistics never show a framerate dip below 60. If you change the client to display 90fps and stream from a host with a 60Hz monitor, running a VSync-locked game will show a constant 60FPS in Moonlight's statistics, but as soon as you move the mouse, the incoming framerate increases up to 90, along with the visual stutter (which subjectively looks like 30fps). So it seems to be some kind of quirk with the Desktop Duplication API by which mouse updates will exceed the host's vertical blank period.

https://ko-fi.com/post/Massive-DXGI-Performance-Boost-W7W2W6YN

That's Looking-Glass, which was listed in the Credits section of this project by the original author. If you check their upstream, they use DwmFlush to resolve the same issue via a commit that post-dates their blog entry you linked: gnif/LookingGlass@92f27cc

https://codereview.webrtc.org/2712353002/
Can this give you some extra clue? https://stackoverflow.com/questions/60198426/invalid-call-on-releaseframe-in-desktopduplication-api/#60527805 Are we using a separate thread for acquisition?

duplication_t::next_frame does release and acquisition, and display_ram_t::capture is what waits between frames. The current implementation means that ReleaseFrame is being called immediately before AcquireNextFrame, but the frame is not released until the next frame is scheduled to be captured (in the case of 60FPS, 16.6ms minus the overhead that was incurred to process the last frame).

obsproject/obs-studio#2604 (comment) And last but not least, could it be that different games are having different results because of using software cursors rather than hardware ones?

No. I tested this against The Witcher 3 (it has the stutter issue when scrolling the in-game map); there is an in-game option to enable the hardware cursor, and it made no difference.

@mirh
Copy link

mirh commented May 23, 2022

Sigh... thank you for putting up with my own set of "throwing at the wall".

psieg/Lightpack#199
https://stackoverflow.com/questions/56449934/dxgi-desktop-duplication-causes-stuttering-with-freesync-g-sync
Anyhow, assuming you don't have VRR/VSR/DSR enabled

https://old.reddit.com/r/VFIO/comments/f22k6u/windows_capture_performance_dxgi_bug/
https://forum.level1techs.com/t/news-dxgi-capture-performance/160459
gnif/LookingGlass#98 (comment)
https://github.com/gnif/LookingGlass/search?q=priority&type=commits
Are we requesting higher priorities than the base programs? Because I suspect this was the actual fix for our issue (they also talk about changing the way FPS targets are handled but idk). Having to limit framerate is explicitly excluded from being the big deal.

Instead what I believe LS uses DwmFlush for (and not even by default then) is actually.. just letting the system breath more.
Putting yourself as the most important program on the system is the proper thing to do I guess, when the expectation is that the actual user isn't interacting directly with it but can only act through you - but in some situations you might still want to strike a balance with the source application necessities.

@psyke83
Copy link
Collaborator Author

psyke83 commented May 23, 2022

Sigh... thank you for putting up with my own set of "throwing at the wall".

Happy to have more eyes on the problem.

psieg/Lightpack#199 https://stackoverflow.com/questions/56449934/dxgi-desktop-duplication-causes-stuttering-with-freesync-g-sync Anyhow, assuming you don't have VRR/VSR/DSR enabled

No. I'm testing on a 60Hz non-Freesync laptop LCD via Linux, usually at 1366x768. Shouldn't be too taxing.

https://old.reddit.com/r/VFIO/comments/f22k6u/windows_capture_performance_dxgi_bug/

This is describing our problem exactly, yes. Unfortunately I can't see the MS link, but I suspect that DwmFlush was the solution that gnif settled on?

https://forum.level1techs.com/t/news-dxgi-capture-performance/160459 gnif/LookingGlass#98 (comment) https://github.com/gnif/LookingGlass/search?q=priority&type=commits Are we requesting higher priorities than the base programs? Because I suspect this was the actual fix for our issue (they also talk about changing the way FPS targets are handled but idk). Having to limit framerate is explicitly excluded from being the big deal.

I'm aware of this problem. The symptoms was that (only) in high GPU load scenarios, Sunshine's encoder would throttle to below the displayed render rate while the intensive application/game was in the foreground, but would shoot up to full speed if the application was backgrounded (even if mostly still on-screen).

I solved that in Sunshine some time ago: 0a883ab

As a troubleshooting measure, I tried disabling this, as well as disabling other measures to reduce latency. None made a difference with this specific mouse-induced stutter issue.

Instead what I believe LS uses DwmFlush for (and not even by default then) is actually.. just letting the system breath more. Putting yourself as the most important program on the system is the proper thing to do I guess, when the expectation is that the actual user isn't interacting directly with it but can only act through you - but in some situations you might still want to strike a balance with the source application necessities.

I'm not convinced about that. DwmFlush is invoked within next_frame, which is called after the frame wait logic. If it were introducing extra delay, that would cause the overall framerate to be lower or out of sync, but that's not happening. If you put in an arbitrary sleep or output->WaitForVBlank() in place of where I invoked DwmFlush(), you will probably see the overall framerate be halved.

Edit: also, keep in mind that DwmFlush is not enabled by default in LookingGlass due to the potential to throttle the client to the host's framerate. One of my commits in this PR works around the issue by comparing the host framerate to the client requested rate, and automatically disables if the client rate exceeds the host. So, theoretically, there should be no issue, as long as the function always reports the correct rate and doesn't screw up with multi-monitor configs, etc. I can't really test that as my host only has one connected monitor.

@mirh
Copy link

mirh commented May 23, 2022

I solved that in Sunshine some time ago: 0a883ab

Don't you need to run as a service/SYSTEM for that to apply?
Also, please notice that there are like at least 3 priorities that I could count (process, gpu and MCSS).

I'm not convinced about that. DwmFlush is invoked within next_frame

I'm not making a technical argument there, lol. I'm just comparing "times", if I can explain.
I'm saying that the guy seemed far too happy for the entire year 2021, to still have had such a nasty problem with the mouse go 100% unmentioned.

One of my commits in this PR works around the issue by comparing the host framerate to the client requested rate, and automatically disables if the client rate exceeds the host.

Uhm, speaking of which, now that I think to it.. what happens to the entire house of cards is you are running in exclusive fullscreen mode?

@psyke83
Copy link
Collaborator Author

psyke83 commented May 23, 2022

I solved that in Sunshine some time ago: 0a883ab

Don't you need to run as a service/SYSTEM for that to apply? Also, please notice that there are like at least 3 priorities that I could count (process, gpu and MCSS).

Yes, but it can work as administrator account due to the invocation of SE_INC_BASE_PRIORITY_NAME. That doesn't really matter, as Windows installations are usually configured to run as a service.

We don't touch MCSS scheduling in the code, but I did do some tests turning it on/off, and it made no difference.

As for process priority: the Windows service defaults to "above normal", but I've tried everything from lowest to realtime. None of that impacts this issue. I've even gone as far as fiddling with process affinities, but it makes absolutely no difference. My CPU is getting old (Ryzen 2700 @ 3.9Ghz), but when configured to use software encoding with all 16 threads, it uses very little CPU. The AMD AMF encoder also has the exact same mouse stuttering irrespective of all the above.

I'm not convinced about that. DwmFlush is invoked within next_frame

I'm not making a technical argument there, lol. I'm just comparing "times", if I can explain. I'm saying that the guy seemed far too happy for the entire year 2021, to still have had such a nasty problem with the mouse go 100% unmentioned.

Unfortunately I can't test LookingGlass, as I've only got a single-GPU system.

One of my commits in this PR works around the issue by comparing the host framerate to the client requested rate, and automatically disables if the client rate exceeds the host.

Uhm, speaking of which, now that I think to it.. what happens to the entire house of cards is you are running in exclusive fullscreen mode?

I don't see any issues with exclusive fullscreen mode, but that may be because Windows 10/11 uses fake exclusive fullscreen unless you manually change the compatibility property for an executable.

To re-iterate, an alternative solution to this is to reorder the ReleaseFrame() to occur directly after AcquireNextFrame and the subsequent CopyResource, but this goes against Microsoft's guidelines. If we (and LookingGlass) are implementing the Desktop Duplication API incorrectly for other reasons, it's beyond my knowledge/skill level to fix properly.

@ReenigneArcher
Copy link
Member

I've made additional changes that would enable DwmFlush to be enabled by default

@psyke83 mind including these in this PR?

@psyke83
Copy link
Collaborator Author

psyke83 commented Jun 15, 2022

@ReenigneArcher

It's already present in the current PR set via commit: 37d86401712adaddf5d24ce9c4654b2c6607e44

@ReenigneArcher
Copy link
Member

Can you resolve the conflict? I believe the only conflict is that the assets/web/config.html was relocated to src_assets/common/assets/web/config.html

These were causing unusual behaviour (select dialogs displaying a blank label
when a value should be selected, and values randomly setting themselves to
undefined when switching tabs).
…ture

Invoke DwmFlush() before acquiring the next frame to alleviate visual stutter
during mouse movement at the cost of constraining the capture rate to the host's
monitor refresh.

Disabled by default; enable via "dwmflush" boolean configuration parameter.
On each re/init, query the active monitor refresh rate via
DwmGetCompositionTimingInfo. If the client requested framerate exceeds
the host monitor refresh, automatically disable DwmFlush.

This avoids the problem by which DwmFlush would constrain the client
FPS if the host monitor runs at a lower refresh rate, thus allowing
the feature to be enabled by default.

If there are other issues caused by DwmFlush for certain systems,
it can still be disabled via configuration.
@ReenigneArcher ReenigneArcher merged commit b5fd69a into LizardByte:nightly Jun 15, 2022
@mirh
Copy link

mirh commented Aug 22, 2023

The saga follows in #826 and then #1288

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants