Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace constant buffer access on shader with new Load instruction #4646

Merged
merged 1 commit into from May 20, 2023

Conversation

gdkchan
Copy link
Member

@gdkchan gdkchan commented Apr 6, 2023

Overview

This is a continuation of #4565 (and depends on it).

Like attributes, the shader IR currently has two way to represent constant buffer access, either by using the ConstantBuffer operand type, or using the LoadConstant instruction. The instruction is supposed to be used for non-constant offsets (or slot) access, since the operand can only have constant values.

Having two ways to represent the same operation is undesirable since it requires handling both cases which increases the complexity of the implementation. So this PR replaces both the constant buffer operand and the LoadConstant instruction with the new Load instruction introduced on the linked PR, with a StorageKind of ConstantBuffer.

There are other notable changes:

  • The Load instruction has the host binding number rather than the guest constant buffer slot now. This allow using this method to also access the support buffer, which is always at binding 0. With the previous approach, accessing it would require some special slot number which again, requires special handling on various places and more complexity. The approach on this PR does not require any special handling on the backend, it is just a buffer like any other.
  • The Load has a field index, vector index and element index as operands. Before it would just have a word offset of the data to access. Like the above, this is required to support loading from the support buffer with this instruction. If a offset was used, it would need to look up the struct field and element from that offset, which is more complicated, plus unaligned offsets would be "invalid" and I'm not sure how those would be handled.
  • The CbIndexable feature, where it was possible to index the constant buffer slots was removed. Instead, it now compares the slot value with each bound constant buffer, and selects the correct one with a chain of ConditionalSelect operations. This makes the backend code simpler (since there's no need for a special case for CbIndexable now). This is also potentially a fix since the descriptor set was not really correct for this case (it would not set it was a uniform array, but as separate uniform buffer bindings, although I don't think this was causing problems in practice). It might help implementations where the indexing never worked (like AMD and MoltenVK), but it will require testing.

Downsides:

The main downside is that any pattern recognition that needs to find specific constant buffer usage becomes more complex. We no longer need to check just the constant buffer operand, but instead we need to check if the operand is the result of a Load instruction, and if all the load parameters matches what we expect (plus, we need to do a reverse look up of the slot from the binding number). This could also make the process a bit slower since there's more to check.

What to test

The only thing that I expect to be fixed here is the exploded models on Super Mario 3D All Stars games (Super Mario Sunshine and Super Mario Galaxy) with AMD GPUs. It worth testing.

Other than that it just needs general testing to ensure there's no regressions. It might be worth testing if the particles are still fine on BOTW since it depends on the constant buffer replacement which I did not test.

@gdkchan gdkchan added enhancement New feature or request gpu Related to Ryujinx.Graphics labels Apr 6, 2023
@Bjorn29512
Copy link

Tested BOTW, particles were fine and working as intended.

@Bjorn29512
Copy link

Bjorn29512 commented May 4, 2023

Tested about a dozen titles (will test more and edit later if i get time) :

Looked for any performance changes compared to master or any visual oddities that can be reliably spotted in 15 mins of playtime for each of the following titles

Key observations :

  • Reduced performance compared to master in The legend of Zelda : Breath of the Wild - Tested across 2 different setups and took multiple samples and both noticed comparatively lower performance each time. It's recommended for a bunch of ppl to test this and get more samples.

    • Setup 1 : i5-12400f, 1660Super (531.79), 32Gigs Ram

      Top : PR , Bottom : Master
      image

    • Setup 2 : i7 12700h, 3060 laptop, 16Gigs Ram

      Top : Master, Bottom : PR
      image

  • Other tested titles where no Change/Regression/Issue was noticed :

    • Bayonetta 3
    • FE: Engage
    • Advance Wars 1+2 Re-Boot Camp
    • Pokemon Violet
    • Pokemon Sword
    • SMT 5
    • Super Mario Odyssey
    • Xenoblade Chronicles : DE
    • Xenoblade Chronicles 2
    • Xenoblade Chronicles 3
    • TloZ: Skyward Sword HD

@gdkchan
Copy link
Member Author

gdkchan commented May 6, 2023

I think it might be better to keep the constant buffer operand type. While it can be a bit redundant when there is another way to load data from it, there are a lot of passes that needs to check if data is coming from specific constant buffer ranges, and having a constant buffer operand makes those checks a lot simpler and more efficient. Plus it does not really hinder the core goal of those changes which is allowing data to be loaded from any constant buffer binding, not just guest buffers, and it would reduce the amount of diffs, so it seems tempting...

@gdkchan
Copy link
Member Author

gdkchan commented May 7, 2023

I changed the approach a bit keeping the ConstantBuffer operand type. It now transforms it into the Load operation with StorageKind.ConstantBuffer when creating the "StructuredIr" that the backends will consume. This reduced 138 lines from the diff.

Will require re-testing.

@Bjorn29512
Copy link

Tested about a dozen titles (will test more and edit later if i get time) :

Looked for any performance changes compared to master or any visual oddities that can be reliably spotted in 15 mins of playtime for each of the following titles

Key observations :

* Reduced performance compared to master in The legend of Zelda : Breath of the Wild - Tested across 2 different setups and took multiple samples and both noticed comparatively lower performance each time. It's recommended for a bunch of ppl to test this and get more samples.
  
  * Setup 1 : i5-12400f, 1660Super (531.79), 32Gigs Ram
    Top : PR , Bottom : Master
    ![image](https://user-images.githubusercontent.com/110204265/236253516-22065e53-7f1b-484c-8d4c-9f4588f85fd5.png)
  * Setup 2 : i7 12700h, 3060 laptop, 16Gigs Ram
    Top : Master, Bottom : PR
    ![image](https://user-images.githubusercontent.com/110204265/236254033-c3af737e-21fa-49d8-93ad-21e2c960a004.png)

* Other tested titles where no Change/Regression/Issue was noticed :
  
  * Bayonetta 3
  * FE: Engage
  * Advance Wars 1+2 Re-Boot Camp
  * Pokemon Violet
  * Pokemon Sword
  * SMT 5
  * Super Mario Odyssey
  * Xenoblade Chronicles : DE
  * Xenoblade Chronicles 2
  * Xenoblade Chronicles 3
  * TloZ: Skyward Sword HD

The BOTW regression seems to be FIXED now, re tested on both setups and the results are attached.

Setup 1 :

Top: Master, Bottom: PR
image

Setup 2 :

Top: Master, Bottom: PR
image

@Bjorn29512
Copy link

Re-Tested about 20 titles :

Looked for any performance changes compared to master or any visual oddities that can be reliably spotted in limited playtime for each of the following titles.

Key observations :

  • No Regression/change was noted compared to master in any of the titles tested :

    • Bayonetta 3
    • FE: Three Hopes
    • FE: Engage
    • FE: Three Houses
    • Advance Wars 1+2 Re-Boot Camp
    • Hyrule Warriors : Age of Calamity
    • Mario + Rabbids: Sparks of Hope
    • Metroid Dread
    • Metroid Prime Remastered
    • Pokemon Violet
    • Pokemon Sword
    • SMT 5
    • Super Mario Odyssey
    • TloZ: Link's Awakening
    • TloZ: Skyward Sword HD
    • TloZ: Breath of The Wild
    • Xenoblade Chronicles DE
    • Xenoblade Chronicles 2
    • Xenoblade Chronicles 3
    • Yo Kai Watch 4

Copy link
Member

@riperiperi riperiperi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good, the only unknown might be testing the vector indexing bug workaround for AMD/Adreno, since I don't think anyone has done that yet.

@@ -1,6 +1,6 @@
ivec2 Helper_TexelFetchScale(ivec2 inputVec, int samplerIndex)
{
float scale = s_render_scale[samplerIndex];
float scale = support_buffer.s_render_scale[1 + samplerIndex];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has the graphics render scale reserved now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before GLSL was using different structs for graphics and compute. The compute struct had a s_reserved field before the s_render_scale array to account for the render target scale. Now they both use the same struct, so adding 1 here is necessary to account for that.

@gdkchan gdkchan merged commit 402f05b into Ryujinx:master May 20, 2023
8 checks passed
@gdkchan gdkchan deleted the load-cbuf branch May 20, 2023 19:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request gpu Related to Ryujinx.Graphics
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants