New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace constant buffer access on shader with new Load instruction #4646
Conversation
Download the artifacts for this pull request: Experimental GUI (Avalonia)GUI-less (SDL2)Only for Developers
|
Tested BOTW, particles were fine and working as intended. |
Tested about a dozen titles (will test more and edit later if i get time) : Looked for any performance changes compared to master or any visual oddities that can be reliably spotted in 15 mins of playtime for each of the following titles Key observations :
|
I think it might be better to keep the constant buffer operand type. While it can be a bit redundant when there is another way to load data from it, there are a lot of passes that needs to check if data is coming from specific constant buffer ranges, and having a constant buffer operand makes those checks a lot simpler and more efficient. Plus it does not really hinder the core goal of those changes which is allowing data to be loaded from any constant buffer binding, not just guest buffers, and it would reduce the amount of diffs, so it seems tempting... |
I changed the approach a bit keeping the Will require re-testing. |
The BOTW regression seems to be FIXED now, re tested on both setups and the results are attached. Setup 1 : Setup 2 : |
Re-Tested about 20 titles : Looked for any performance changes compared to master or any visual oddities that can be reliably spotted in limited playtime for each of the following titles. Key observations :
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good, the only unknown might be testing the vector indexing bug workaround for AMD/Adreno, since I don't think anyone has done that yet.
@@ -1,6 +1,6 @@ | |||
ivec2 Helper_TexelFetchScale(ivec2 inputVec, int samplerIndex) | |||
{ | |||
float scale = s_render_scale[samplerIndex]; | |||
float scale = support_buffer.s_render_scale[1 + samplerIndex]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has the graphics render scale reserved now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before GLSL was using different structs for graphics and compute. The compute struct had a s_reserved
field before the s_render_scale
array to account for the render target scale. Now they both use the same struct, so adding 1 here is necessary to account for that.
Overview
This is a continuation of #4565 (and depends on it).
Like attributes, the shader IR currently has two way to represent constant buffer access, either by using the
ConstantBuffer
operand type, or using theLoadConstant
instruction. The instruction is supposed to be used for non-constant offsets (or slot) access, since the operand can only have constant values.Having two ways to represent the same operation is undesirable since it requires handling both cases which increases the complexity of the implementation. So this PR replaces both the constant buffer operand and the
LoadConstant
instruction with the newLoad
instruction introduced on the linked PR, with aStorageKind
ofConstantBuffer
.There are other notable changes:
Load
instruction has the host binding number rather than the guest constant buffer slot now. This allow using this method to also access the support buffer, which is always at binding 0. With the previous approach, accessing it would require some special slot number which again, requires special handling on various places and more complexity. The approach on this PR does not require any special handling on the backend, it is just a buffer like any other.Load
has a field index, vector index and element index as operands. Before it would just have a word offset of the data to access. Like the above, this is required to support loading from the support buffer with this instruction. If a offset was used, it would need to look up the struct field and element from that offset, which is more complicated, plus unaligned offsets would be "invalid" and I'm not sure how those would be handled.CbIndexable
feature, where it was possible to index the constant buffer slots was removed. Instead, it now compares the slot value with each bound constant buffer, and selects the correct one with a chain ofConditionalSelect
operations. This makes the backend code simpler (since there's no need for a special case forCbIndexable
now). This is also potentially a fix since the descriptor set was not really correct for this case (it would not set it was a uniform array, but as separate uniform buffer bindings, although I don't think this was causing problems in practice). It might help implementations where the indexing never worked (like AMD and MoltenVK), but it will require testing.Downsides:
The main downside is that any pattern recognition that needs to find specific constant buffer usage becomes more complex. We no longer need to check just the constant buffer operand, but instead we need to check if the operand is the result of a
Load
instruction, and if all the load parameters matches what we expect (plus, we need to do a reverse look up of the slot from the binding number). This could also make the process a bit slower since there's more to check.What to test
The only thing that I expect to be fixed here is the exploded models on Super Mario 3D All Stars games (Super Mario Sunshine and Super Mario Galaxy) with AMD GPUs. It worth testing.
Other than that it just needs general testing to ensure there's no regressions. It might be worth testing if the particles are still fine on BOTW since it depends on the constant buffer replacement which I did not test.