Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a post-cache shader UID fixup pass #10747

Merged
merged 3 commits into from Jul 17, 2022

Conversation

TellowKrinkle
Copy link
Contributor

@TellowKrinkle TellowKrinkle commented Jun 14, 2022

Adds a fixup pass that runs between the pipeline uid cache and actual pipeline generation that applies fixups based on the current backend's supported features.

A number of bugs (in particular BUG_BROKEN_DUAL_SOURCE_BLENDING) found themselves spread across VideoCommon and the backends, with each place having to guess at what the other would do, risking a failed pipeline compile if any disagreed. With the new implementation, a single pass runs over the entire pipeline at a point where all other decisions have been finalized, so it can make decisions based on what is actually about to happen, rather than trying to guess.
I'm not super happy with the new location either though. Other recommendations welcome.

Reasons for the new location:

  • The UID cache is shared among all backends and GPUs, so placing it any earlier would mean that other GPUs' and backends' UIDs would end up getting compiled upon cache load, requiring some sort of validation of the UIDs in shader generation to make sure that you're not generating a shader that will fail to compile (even if the shader will never actually be used).
  • Moving it any later will lose the ability to see the whole pipeline, which brings back the guesswork that was all over the old implementation.
  • The current location is the only one currently in the emulator that satisfies both of the above needs
  • Reduces the frequency that differences in bugs (which aren't serialized into cache IDs) result in bad cache hits in the shader binary cache when switching between two different GPUs in Vulkan. We should really have a dedicated solution to this though. Preferably one that takes actual shader source into account, so it notices when you edit shader generation functions.

Drawbacks of the new location:

  • There's now two types of UIDs, one with and without fixups, with no differentiation in the type system to catch mixups
    • Maybe we should make the post-fixup UID a different type instead?
  • If two UIDs map to the same post-fixup UID, the pipeline cache will still generate separate pipelines for them (since the cache is done on pre-fixup UIDs), which could result in unnecessary compilation stutter.
    • We could just leave this up to the backend/driver, as is done currently in the PR.
    • We could move the pipeline-reducing changes to a separate pre-UID-cache pass. This would mean that when loading a uid cache touched by a GPU without fbfetch, the extra pipelines would still be unnecessarily created. Both the "use fbfetch to remove separate pipeline blend modes" one included here and a future one I have planned wouldn't cause issues in any drivers just by compiling the additional pipelines, it would just be a bit wasteful.
    • We could add a second pipeline cache that isn't saved to disk and handles post-fixup UIDs

I also deleted dstalpha from the BlendingState struct, since none of the backends use it anymore. I assume this is fine? Also, should I adjust the positions of all the other bitfields to fill in the gap?

Testing needed

  • I changed the meaning of BUG_BROKEN_DUAL_SOURCE_BLENDING to "dual source breaks when the shader outputs src1 but the blending configuration doesn't use it". I can confirm (from testing for PCSX2) that this is the actual cause for the Intel graphics listed (and it means less brokenness for dstalpha, which previously still enabled DSB), but need verification that I didn't break things for the AMD GPUs with the flag. If it does, maybe we should split it for the two different bug types. I tried running on OpenGL+AMD+macOS and didn't notice anything weird, but maybe I didn't try the right game.

    • Mario Kart Double Dash uses dstalpha without blending, and is improved by this PR on Intel+MVK

      Images

      image
      image

  • I reduced BUG_BROKEN_DISCARD_WITH_EARLY_Z to only apply fbfetch when early z was in use. I assume this won't break anything but you never know. Please test on M1.

  • I fully enabled all fbfetch things at all times on ubershaders when fbfetch is supported, which should reduce the number of different pipelines needed, but may slow down the shader a bit. (It was also a lazy way for me to get around a bug in the unofficial Intel Metal fbfetch support, where fbfetch + dual source blend freezes the GPU.) I assume this is fine since reducing compilation time is kind of the point of ubershaders, but it would be good to know others' opinions. For reference, my UHD 630 goes from 24 to 20 fps at 2x resolution on the Wind Waker title screen flyover to enable fbfetch with exclusive ubershaders.

  • I haven't actually tested this outside of macOS. I don't see why it would break, but just in case.

@TellowKrinkle TellowKrinkle force-pushed the LateUIDFixup branch 2 times, most recently from 09925bb to a26f79c Compare June 14, 2022 05:03
@TellowKrinkle
Copy link
Contributor Author

TellowKrinkle commented Jun 15, 2022

Oh yeah, is there something I'm supposed to do when I change the layout of uids? The uid cache doesn't seem to be invalidating properly

@Pokechu22
Copy link
Contributor

Yeah, you need to bump this:

// This version number must be incremented whenever any of the shader UID structures change.
// As pipelines encompass both shader UIDs and render states, changes to either of these should
// also increment the pipeline UID version. Incrementing the UID version will cause all UID
// caches to be invalidated.
constexpr u32 GX_PIPELINE_UID_VERSION = 4; // Last changed in PR 10215

@TellowKrinkle
Copy link
Contributor Author

Yeah, you need to bump this:

// This version number must be incremented whenever any of the shader UID structures change.
// As pipelines encompass both shader UIDs and render states, changes to either of these should
// also increment the pipeline UID version. Incrementing the UID version will cause all UID
// caches to be invalidated.
constexpr u32 GX_PIPELINE_UID_VERSION = 4; // Last changed in PR 10215

Ahh okay, should be fixed up now

@Simonx22
Copy link
Member

I tested games affected by BUG_BROKEN_DISCARD_WITH_EARLY_Z, and they still work fine on my M1 Pro with this PR.

image

image

@JMC47
Copy link
Contributor

JMC47 commented Jul 13, 2022

The point of Ubershaders is to remove stuttering, so whatever performance down there is from fbfetch being used shouldn't be a problem.

Adds a pass to process driver deficiencies between UID caching and use, allowing a full view of the whole pipeline, since some bugs/workarounds involve interactions between blend modes and the pixel shader
Reduce the number of different pipelines needed.  Also works around drivers that break when you combine fbfetch with dual source blending
It's not supported by any PC graphics API, and therefore completely unused
Copy link
Contributor

@iwubcode iwubcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code wise LGTM. I did test a handful of games on my AMD card.

@JMC47 JMC47 merged commit 70b0b03 into dolphin-emu:master Jul 17, 2022
11 checks passed
@TellowKrinkle TellowKrinkle deleted the LateUIDFixup branch July 17, 2022 04:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
5 participants