Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VideoBackend/OGL: Prefer KHR_shader_subgroup over NV_shader_thread. #11523

Merged
merged 1 commit into from Feb 10, 2023

Conversation

degasus
Copy link
Member

@degasus degasus commented Jan 31, 2023

While the NV extension is totally fine, the KHR extension should be able to support more hardware.

For NVIDIA, the hardware either supports both or neither, it just needs a driver from the last two years.
For AMD, the drivers from late 2022-12 seems to bring support for the KHR extension.
For Intel, the KHR is also supported for some years.

This needs testing of BBOX on ALL GPU vendors, both behavior and performance.
And we need reporting if the two optional features are really supported on AMD.

Testing results:

  • NVIDIA
    • Fermi (391.35): NOT AFFECTED. No support for GL_NV_shader_thread_shuffle nor GL_KHR_shader_subgroup
    • Ampere (528.24): MATCH. Supports both, same performance
  • AMD
    • MISSING
  • INTEL
    • Haswell (20.19.15.4624): NOT AFFECTED. No support for GL_NV_shader_thread_shuffle nor GL_KHR_shader_subgroup
    • Tiger Lake (31.0.101.4090): WIN. Supports only GL_KHR_shader_subgroup. All needed sub-features are supported. Small speedup as expected

@mbc07
Copy link
Contributor

mbc07 commented Jan 31, 2023

Gave it a try on Windows 11, with a NVIDIA RTX 3060 (laptop version, 130W TGP, driver 528.24, with Optimus enabled) and a Intel UHD Graphics (Gen12, Tiger Lake-H, driver 31.0.101.4090), using Lord Crump's "PUNISH HIM" cutscene from Paper Mario TTYD prologue. All settings were kept at default, except for disabling Dual Core mode since TTYD is sensitive to it. Here are the results:

  • On master (5.0-18485)
    RTX 3060: 148 FPS, up to 10x IR, performance starts decreasing at 11x IR. Maximum it can go while still maintaining 60 FPS is 19x IR.
    UHD Graphics: 118 FPS, up to 2x IR, performance starts decreasing at 3x IR. Maximum it can go while still maintaining 60 FPS is 4x IR.

  • This PR
    RTX 3060: 176 FPS, up to 10x IR, performance starts decreasing at 11x IR. Maximum it can go while still maintaining 60 FPS is 19x IR.
    UHD Graphics: 118 FPS, up to 3x IR, performance starts decreasing at 4x IR. Maximum it can go while still maintaining 60 FPS is 5x IR.

Haven't noticed anything wrong as far as rendering goes. Will test this on some old hardware soon (NVIDIA Fermi, Intel Haswell)...

Edit: forgot to mention the CPU, it's a Core i7-11800H

@mbc07
Copy link
Contributor

mbc07 commented Feb 1, 2023

Second batch of tests, this time on a Pentium G3258 at 4.2 GHz, Windows 10, with a NVIDIA GT 440 (Fermi, driver 391.35) and a Intel HD Graphics (Gen7.5, Haswell GT1, driver 15.40.34.4624), same scene and settings as before:

  • On master (5.0-18485)
    GT 440: 118 FPS, up to 2x IR, performance starts decreasing at 3x IR, which is also the maximum it can go while still maintaining 60 FPS.
    HD Graphics: 80 FPS at 1x IR, performance starts decreasing at 2x IR, which is also the maximum it can go while still maintaining 60 FPS.

  • This PR
    GT 440: 114 FPS, up to 2x IR, performance starts decreasing at 3x IR, which is also the maximum it can go while still maintaining 60 FPS.
    HD Graphics: 80 FPS at 1x IR, performance starts decreasing at 2x IR, which is also the maximum it can go while still maintaining 60 FPS.

I noticed that some bounding box effects has clipping issues with the Intel HD Graphics, but this also happens on master, not exclusive to this PR:


@mbc07
Copy link
Contributor

mbc07 commented Feb 1, 2023

This might be useful, it contains XML reports from GLview, listing supported extensions and other related info of the GPUs I tested:

GLview Reports.zip

@degasus
Copy link
Member Author

degasus commented Feb 1, 2023

@mbc07 Yes, this was very helpful. The XML files showed that the newer Intel driver supports everything we need and that this feature was enabled and so tested. It also shows that Fermi wasn't supported before and won't be afterwards.

However, I'm a bit doubtful about the speedup on your NVIDIA RTX 3060, it should be neglegtable as we "just" switch the GLSL extension to generate the same intrinsic.

@degasus degasus changed the title VideoBackend/OGL: Prefer KHR_subgroup over NV_shader_thread. VideoBackend/OGL: Prefer KHR_shader_subgroup over NV_shader_thread. Feb 1, 2023
@mbc07
Copy link
Contributor

mbc07 commented Feb 1, 2023

However, I'm a bit doubtful about the speedup on your NVIDIA RTX 3060, it should be neglegtable as we "just" switch the GLSL extension to generate the same intrinsic.

I'll try retesting it later, this time with Optimus disabled. In my initial test I forgot to disable Optimus, so whatever was rendered on it passed through the iGPU before reaching the screen. The iGPU hasn't been particularly stable since the driver merge Intel did early January (Gen12 iGPUs and their Arc dGPUs now shares the same driver)...

@mbc07
Copy link
Contributor

mbc07 commented Feb 1, 2023

@degasus you were right. Retested the RTX 3060 with NVIDIA Optimus disabled and got 178 FPS with this PR and 176 FPS on master (min/max IR results are the same as before)...

@degasus degasus marked this pull request as ready for review February 6, 2023 13:26
Copy link
Member

@phire phire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from the GlslVersion uglyness, it looks fine to me.

While the NV extension is totally fine, the KHR extension should be able to support more hardware.

For NVIDIA, the hardware either supports both or neither, it just needs a driver from the last two years.
For AMD, the drivers from late 2022-12 seems to bring support for the KHR extension.
For Intel, the KHR is also supported for some years.
@JMC47 JMC47 merged commit 258151f into dolphin-emu:master Feb 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
4 participants