Various Metal renderer improvements #11028
Turns out it was helpful (most of the improvement is in ubershaders). This time with a much better auto mode.
Makes it easier to enable ARC elsewhere without breaking the Metal backend.
Been running a build with this PR for weeks and found no issues on an M1.
```cpp
static constexpr std::pair<std::string_view, std::string_view> MSL_FIXUPS[] = {
    // Force-unroll the lighting loop in ubershaders, which greatly reduces register pressure on AMD
    {"for (uint chan = 0u; chan < 2u; chan++)",
     "_Pragma(\"unroll\") for (uint chan = 0u; chan < 2u; chan++)"},
};
```
Is there a reason why you're using this system instead of just putting `_Pragma("unroll")` in the ubershader generation code? I think that might be simpler.
Or, do you plan to add more of these fixups?
Does GLSL have _Pragma("unroll")? I searched spirv-cross's codebase and couldn't find anything that would generate one, but maybe there's something I'm missing.
Ah, I misread what you're doing here. I thought this was patching the GLSL. Never mind!
Glad to report that this PR (and #10979) managed to put your Metal renderer back into the lead. Very impressive performance increase: the frame rate basically doubled on my Kepler machine with Monterey. I haven't tested this on Windows yet, but it looks very promising and might finally close the performance gap. Thank you guys for the hard work.
Interesting, I wouldn't have expected this PR to make much of a difference to performance outside of ubershaders. If you could test with …
- … `BUG_BROKEN_SUBGROUP_INVOCATION_ID` to `BUG_BROKEN_SUBGROUP_OPS`, because the various GPUs in the list have all sorts of reasons they can't use subgroup ops, not just a broken invocation ID.
- … `videometal` target. Prevents the Metal backend from failing to compile if you globally enable ARC, which you might do if you were working on an iOS build.
- … `[MTLCommandBuffer presentDrawable:]` instead of `[MTLDrawable present]`. This is mostly for me trying to take GPU frame captures in Xcode, where it has a much higher success rate. It helps Xcode detect frame boundaries better, and capturing too much breaks the frame capture, because Xcode doesn't notice when our circular upload buffers wrap around and we start overwriting data it still needed. We can't use it all the time because it breaks fast forwarding.