Replace constant buffer access on shader with new Load instruction #4646

gdkchan · 2023-04-06T04:01:19Z

Overview

This is a continuation of #4565 (and depends on it).

Like attributes, the shader IR currently has two ways to represent constant buffer access: either the ConstantBuffer operand type or the LoadConstant instruction. The instruction is meant for accesses with a non-constant offset (or slot), since the operand can only hold constant values.

Having two ways to represent the same operation is undesirable, since both cases must be handled, which increases the complexity of the implementation. So this PR replaces both the constant buffer operand and the LoadConstant instruction with the new Load instruction introduced in the linked PR, with a StorageKind of ConstantBuffer.

There are other notable changes:

- The Load instruction now takes the host binding number rather than the guest constant buffer slot. This allows the same instruction to also access the support buffer, which is always at binding 0. With the previous approach, accessing it would require a special slot number, which again requires special handling in various places and adds complexity. The approach in this PR does not require any special handling on the backend; it is just a buffer like any other.
- The Load has a field index, a vector index and an element index as operands. Before, it just had a word offset of the data to access. Like the above, this is required to support loading from the support buffer with this instruction. If an offset was used, the struct field and element would have to be looked up from that offset, which is more complicated; plus, unaligned offsets would be "invalid" and I'm not sure how those would be handled.
- The CbIndexable feature, which made it possible to index the constant buffer slots, was removed. Instead, the slot value is now compared with each bound constant buffer, and the correct one is selected with a chain of ConditionalSelect operations. This makes the backend code simpler (since there's no need for a special case for CbIndexable now). This is also potentially a fix, since the descriptor set was not really correct for this case (it would not declare a uniform array, but separate uniform buffer bindings, although I don't think this was causing problems in practice). It might help implementations where the indexing never worked (like AMD and MoltenVK), but it will require testing.
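
To illustrate the vector/element indexing and the ConditionalSelect chain described above, here is a rough Python sketch. It is not Ryujinx code (the project is C#). It assumes a constant buffer can be viewed as an array of 4-component vectors, so that a word offset splits into a vector index and an element index, and it assumes a fallback value of 0 when no bound slot matches. The names `split_word_offset`, `load` and `load_dynamic_slot` are hypothetical.

```python
from typing import Dict, List, Tuple

Vec4 = List[float]


def split_word_offset(word_offset: int) -> Tuple[int, int]:
    """Split a 32-bit word offset into (vector index, element index).

    Assumption: the buffer is laid out as an array of 4-component vectors.
    """
    return word_offset >> 2, word_offset & 3


def load(cbuf: List[Vec4], vec_index: int, elem_index: int) -> float:
    """Toy model of a Load from a constant buffer."""
    return cbuf[vec_index][elem_index]


def load_dynamic_slot(bound_cbufs: Dict[int, List[Vec4]], slot: int, word_offset: int) -> float:
    """Emulate slot indexing without an indexable array of buffers.

    Every bound buffer is loaded, and the value whose slot matches is kept,
    mirroring a chain of ConditionalSelect(slot == s, value_s, previous).
    """
    vec_index, elem_index = split_word_offset(word_offset)
    result = 0.0  # assumed fallback when no bound slot matches
    for s, cbuf in bound_cbufs.items():
        value = load(cbuf, vec_index, elem_index)
        result = value if slot == s else result  # one ConditionalSelect
    return result


bound = {
    1: [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]],
    3: [[10.0, 20.0, 30.0, 40.0], [50.0, 60.0, 70.0, 80.0]],
}
print(load_dynamic_slot(bound, 3, 6))  # word 6 -> vector 1, element 2 -> 70.0
```

The trade-off is that every bound buffer is read and compared, in exchange for not needing an array of uniform buffers on the host.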

Downsides:

The main downside is that any pattern recognition that needs to find specific constant buffer usage becomes more complex. We can no longer check just the constant buffer operand; instead we need to check whether the operand is the result of a Load instruction, and whether all the load parameters match what we expect (plus, we need to do a reverse lookup of the slot from the binding number). This could also make the process a bit slower, since there's more to check.
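
A rough Python sketch of what such a check could involve, under simplified assumptions: the IR node types (`Const`, `Load`), the `binding_to_slot` map and the `match_cbuf_load` helper are all hypothetical, and the buffer is assumed to be an array of 4-component vectors. It is only meant to show the extra steps compared with inspecting an operand directly.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Load:
    storage_kind: str   # e.g. "ConstantBuffer"
    binding: int        # host binding number, not the guest slot
    vec_index: object   # Const, or any other (non-constant) node
    elem_index: object  # Const, or any other (non-constant) node


def match_cbuf_load(node: object, binding_to_slot: Dict[int, int]) -> Optional[Tuple[int, int]]:
    """Return (guest slot, word offset) if node is a constant-indexed cbuf load."""
    # Step 1: the value must come from a Load of the right storage kind.
    if not isinstance(node, Load) or node.storage_kind != "ConstantBuffer":
        return None
    # Step 2: all index operands must be constants.
    if not isinstance(node.vec_index, Const) or not isinstance(node.elem_index, Const):
        return None
    # Step 3: reverse lookup of the guest slot from the host binding.
    slot = binding_to_slot.get(node.binding)
    if slot is None:
        return None
    return slot, node.vec_index.value * 4 + node.elem_index.value


binding_to_slot = {5: 1}  # hypothetical: host binding 5 holds guest slot 1
node = Load("ConstantBuffer", 5, Const(2), Const(1))
print(match_cbuf_load(node, binding_to_slot))  # (1, 9)
```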

What to test

The only thing that I expect to be fixed here is the exploded models in the Super Mario 3D All-Stars games (Super Mario Sunshine and Super Mario Galaxy) on AMD GPUs. It is worth testing.

Other than that, it just needs general testing to ensure there are no regressions. It might be worth testing whether the particles are still fine in BOTW, since that depends on the constant buffer replacement, which I did not test.

Bjorn29512 · 2023-04-06T17:51:50Z

Tested BOTW, particles were fine and working as intended.

Bjorn29512 · 2023-05-04T15:30:26Z

Tested about a dozen titles (will test more and edit later if I get time):

Looked for any performance changes compared to master, or any visual oddities that can be reliably spotted in 15 minutes of playtime, for each of the following titles.

Key observations:

- Reduced performance compared to master in The Legend of Zelda: Breath of the Wild. Tested across 2 different setups with multiple samples, and both showed comparatively lower performance each time. It's recommended for a bunch of people to test this and get more samples.
  - Setup 1: i5-12400F, 1660 Super (531.79), 32 GB RAM
    Top: PR, Bottom: Master
    ![image](https://user-images.githubusercontent.com/110204265/236253516-22065e53-7f1b-484c-8d4c-9f4588f85fd5.png)
  - Setup 2: i7-12700H, 3060 laptop, 16 GB RAM
    Top: Master, Bottom: PR
    ![image](https://user-images.githubusercontent.com/110204265/236254033-c3af737e-21fa-49d8-93ad-21e2c960a004.png)
- Other tested titles where no change/regression/issue was noticed:
  - Bayonetta 3
  - FE: Engage
  - Advance Wars 1+2 Re-Boot Camp
  - Pokemon Violet
  - Pokemon Sword
  - SMT 5
  - Super Mario Odyssey
  - Xenoblade Chronicles: DE
  - Xenoblade Chronicles 2
  - Xenoblade Chronicles 3
  - TLoZ: Skyward Sword HD

gdkchan · 2023-05-06T02:45:47Z

I think it might be better to keep the constant buffer operand type. While it can be a bit redundant when there is another way to load data from it, there are a lot of passes that need to check whether data is coming from specific constant buffer ranges, and having a constant buffer operand makes those checks a lot simpler and more efficient. Plus, it does not really hinder the core goal of these changes, which is allowing data to be loaded from any constant buffer binding, not just guest buffers, and it would reduce the size of the diff, so it seems tempting...

gdkchan · 2023-05-07T04:05:48Z

I changed the approach a bit, keeping the ConstantBuffer operand type. It is now transformed into a Load operation with StorageKind.ConstantBuffer when creating the "StructuredIr" that the backends will consume. This removed 138 lines from the diff.
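
As a rough illustration of that lowering step, here is a Python sketch rather than the actual C# change. It assumes the operand carries a guest slot and a word offset, that a `slot_to_binding` map is available at that point, that the buffer is an array of 4-component vectors, and that the data lives in field 0 of the buffer struct. `CbufOperand`, `LoadOp` and `lower_operand` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CbufOperand:
    slot: int         # guest constant buffer slot
    word_offset: int  # constant offset, in 32-bit words


@dataclass(frozen=True)
class LoadOp:
    storage_kind: str
    binding: int      # host binding number
    field_index: int
    vec_index: int
    elem_index: int


def lower_operand(op: CbufOperand, slot_to_binding: Dict[int, int]) -> LoadOp:
    """Turn a constant buffer operand into an explicit load for the backend."""
    binding = slot_to_binding[op.slot]
    # Field 0 is an assumption about where the data array sits in the struct.
    return LoadOp("ConstantBuffer", binding, 0, op.word_offset >> 2, op.word_offset & 3)


print(lower_operand(CbufOperand(slot=1, word_offset=9), {1: 5}))
```

With this split, earlier passes can keep matching on the operand, and only the backends see the Load form.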

Will require re-testing.

Bjorn29512 · 2023-05-07T07:36:00Z

The BOTW regression seems to be FIXED now. Re-tested on both setups and the results are attached.

- Setup 1: Top: Master, Bottom: PR
- Setup 2: Top: Master, Bottom: PR

Bjorn29512 · 2023-05-08T10:37:01Z

Re-Tested about 20 titles :

Looked for any performance changes compared to master or any visual oddities that can be reliably spotted in limited playtime for each of the following titles.

Key observations :

No Regression/change was noted compared to master in any of the titles tested :
- Bayonetta 3
- FE: Three Hopes
- FE: Engage
- FE: Three Houses
- Advance Wars 1+2 Re-Boot Camp
- Hyrule Warriors : Age of Calamity
- Mario + Rabbids: Sparks of Hope
- Metroid Dread
- Metroid Prime Remastered
- Pokemon Violet
- Pokemon Sword
- SMT 5
- Super Mario Odyssey
- TloZ: Link's Awakening
- TloZ: Skyward Sword HD
- TloZ: Breath of The Wild
- Xenoblade Chronicles DE
- Xenoblade Chronicles 2
- Xenoblade Chronicles 3
- Yo Kai Watch 4

riperiperi

I think this looks good. The only unknown might be testing the vector indexing bug workaround for AMD/Adreno, since I don't think anyone has done that yet.

riperiperi · 2023-05-18T12:52:59Z

src/Ryujinx.Graphics.Shader/CodeGen/Glsl/HelperFunctions/TexelFetchScale_cp.glsl

@@ -1,6 +1,6 @@
 ivec2 Helper_TexelFetchScale(ivec2 inputVec, int samplerIndex)
 {
-    float scale = s_render_scale[samplerIndex];
+    float scale = support_buffer.s_render_scale[1 + samplerIndex];


This has the graphics render scale reserved now?

gdkchan · 2023-05-20

Before, GLSL was using different structs for graphics and compute. The compute struct had an s_reserved field before the s_render_scale array to account for the render target scale. Now they both use the same struct, so adding 1 here is necessary to account for that.
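
A tiny Python sketch of the indexing implied by that explanation. It assumes the shared array keeps one leading entry for the render target scale, followed by the per-sampler scales; the constant and function names are hypothetical and the values are made up.

```python
# Assumption: one leading entry is taken by the render target scale.
RENDER_TARGET_SCALE_ENTRIES = 1


def sampler_scale(s_render_scale, sampler_index):
    """Look up a sampler's scale, skipping the leading render target entry."""
    return s_render_scale[RENDER_TARGET_SCALE_ENTRIES + sampler_index]


# Illustrative contents: [render target scale, sampler 0, sampler 1, sampler 2]
scales = [2.0, 1.0, 1.5, 3.0]
print(sampler_scale(scales, 1))  # 1.5
```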

gdkchan added the enhancement (New feature or request) and gpu (Related to Ryujinx.Graphics) labels Apr 6, 2023

gdkchan force-pushed the load-cbuf branch from 4cdcf56 to 8c0a065 Compare April 6, 2023 16:51

gdkchan mentioned this pull request Apr 13, 2023

Fix incorrect fragment origin when YNegate is enabled #4673

Merged

gdkchan mentioned this pull request Apr 23, 2023

Generate scaling helper functions on IR #4714

Merged

gdkchan force-pushed the load-cbuf branch from 8c0a065 to b08e400 Compare April 26, 2023 02:06

gdkchan marked this pull request as ready for review April 26, 2023 02:11

MutantAura requested review from marysaka, riperiperi and a team April 26, 2023 21:30

marysaka approved these changes Apr 27, 2023

gdkchan force-pushed the load-cbuf branch from b08e400 to 3c84483 Compare May 3, 2023 21:06

gdkchan force-pushed the load-cbuf branch from 3c84483 to 0260fe6 Compare May 7, 2023 04:02

gdkchan mentioned this pull request May 17, 2023

Implement shader storage buffer operations using new Load/Store instructions #4993

Merged

riperiperi approved these changes May 18, 2023

Replace constant buffer access on shader with new Load instruction

3945407

gdkchan force-pushed the load-cbuf branch from b3cc513 to 3945407 Compare May 20, 2023 17:33

gdkchan merged commit 402f05b into Ryujinx:master May 20, 2023
8 checks passed

gdkchan deleted the load-cbuf branch May 20, 2023 19:19

gdkchan mentioned this pull request May 31, 2023

Share ResourceManager between vertex A and B shaders #5181

Merged
