Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Various Metal renderer improvements #11028

Merged
merged 7 commits into from Oct 24, 2022
Merged

Conversation

TellowKrinkle
Copy link
Contributor

  • Disables subgroup ops on Intel GPUs (for both MoltenVK and Metal). Looks like they were returning garbage data when running at > 1x resolution, which was crashing Paper Mario.
    • Renames BUG_BROKEN_SUBGROUP_INVOCATION_ID to BUG_BROKEN_SUBGROUP_OPS because the various GPUs in the list have all sorts of reasons they can't use subgroup ops, not just because the invocation id is broken.
  • Fixes a bug where I forgot to set vsync settings when creating a Metal renderer, meaning that you'd have vsync on no matter what unless you messed with the checkbox while a game was running.
  • Force unroll the ubershader lighting loop. AMD's Metal compiler backend didn't like unrolling this by default, which made it use 1/3 the instructions but run much slower. 2-3x performance increase for exclusive ubershaders with per-pixel lighting, small increase otherwise. (Their DX backend on Windows does unroll the loop, so no problems there)
    • I couldn't see anywhere in GLSL to annotate this, so I added it as a post-MSL-conversion fixup. If anyone knows of an annotation we could add to the GLSL (or SPIRV), that would allow us to get this for MoltenVK as well.
  • Bring back manual buffer uploads. I previously removed it due to it not offering much benefit, while slowing down bbox. Further testing indicates that it does have benefit (esp. on ubershaders), and modifications were made to disable it on the command buffer directly after a CPU-GPU sync, which fixes the bbox performance issue, allowing it to be enabled by default.
  • Explicitly disable ARC for the videometal target. Prevents Metal backend from failing to compile if you globally enable ARC, which you might do if you were working on an iOS build.
  • Add a config option to use [MTLCommandBuffer presentDrawable:] instead of [MTLDrawable present]. This is mostly for me trying to take GPU frame captures in Xcode, where it has a much higher success rate (it helps Xcode detect frame boundaries better, and capturing too much breaks the frame capture because it doesn't notice when our circular upload buffers wrap around and we start overwriting data that it needed). We can't use it all the time because it breaks fast forwarding.

@dvessel
Copy link
Contributor

dvessel commented Oct 11, 2022

Been running a build with this PR for weeks and found no issues on an M1.

static constexpr std::pair<std::string_view, std::string_view> MSL_FIXUPS[] = {
// Force-unroll the lighting loop in ubershaders, which greatly reduces register pressure on AMD
{"for (uint chan = 0u; chan < 2u; chan++)",
"_Pragma(\"unroll\") for (uint chan = 0u; chan < 2u; chan++)"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why you're using this system instead of just putting _Pragma("unroll") in the ubershader generation code? I think that might be more simpler.

Or, do you plan to add more of these fixups?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does GLSL have _Pragma("unroll")? I searched spirv-cross's codebase and couldn't find anything that would generate one, but maybe there's something I'm missing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I misread what you're doing here. I thought this was patching the GLSL. Never mind!

@JMC47 JMC47 merged commit b667931 into dolphin-emu:master Oct 24, 2022
11 checks passed
@pizuz
Copy link

pizuz commented Oct 24, 2022

Glad to report that this PR (and #10979) managed to put your Metal renderer into the lead, again. Very impressive performance increase. Frame rate basically doubled on my Kepler machine with Monterey. I haven't tested this on Windows, yet, but it looks very promising and might finally close the performance gap.

Thank you guys for the hard work.

@TellowKrinkle TellowKrinkle deleted the MetalFixes branch October 24, 2022 20:59
@TellowKrinkle
Copy link
Contributor Author

TellowKrinkle commented Oct 24, 2022

Interesting, I wouldn't have expected this PR to make much of a difference to performance outside of ubershaders

If you could test with --config=Graphics.Settings.ManuallyUploadBuffers=0 or by adding ManuallyUploadBuffers=0 to GFX.ini (which will disable one of the new features added here) and see if that reverts the performance boost, that would be helpful to know. (A similar thing could be implemented in Vulkan, but hasn't yet)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
5 participants