Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New shader cache implementation #3194

Merged
merged 40 commits into from
Apr 10, 2022
Merged

New shader cache implementation #3194

merged 40 commits into from
Apr 10, 2022

Conversation

gdkchan
Copy link
Member

@gdkchan gdkchan commented Mar 13, 2022

This change has the goal of fixing issues related to the shader cache implementation. Here, shader cache refers to two different caches: the memory shader cache and the disk shader cache. Below I divide shader cache problems in categories, and explain how each one is addressed here.

Performance

Shader lists per address are inefficient

The current implementation keeps a list of all shaders that were ever located at a given GPU address. This list keeps growing forever, which is already a problem, but even worse is that every time there is a shader at that address, it will check every single shader in the list to try finding a match. That means performance can degrade over time as the list grows larger. It was done this way because at the time there was no disk cache, and I did not want to implement hash based lookup.

The new cache removed the list and now keeps only one entry per address, which means that it always do a single check for shader match. If that fails, it then does a hash based lookup which can find any shader that was already compiled. The implementation on master can do hash based lookup, but only when the shader cache is enabled, while the new one can also do it with shader cache disabled. Furthermore, it no longer needs to know the shader size to do the lookup.

The new lookup uses a sorted list of sizes, with the sizes of all shaders in the cache. For each size, it also has a hash table with all shaders of that size (hash and associated data). It starts by binary searching for the correct size. First, it starts at the middle of the list, and hashes that amount of bytes in memory of the shader GPU address. It then checks on the hash table, if there is any shader with this size whose code matches the hash. If there is a match, it can be either full or partial. Partial means that there is a shader in the cache that matches the hash, but the hash is only for part of the code, and in this case it should keep searching on shaders of greater size, going right on the sorted size list. If the match is full, then the entire code matches the hash and it found the shader. If there is no match, then it should try shaders with lower sizes, going left on the sorted list.

This approach avoids the need of decoding the shader code to find the size, which can save some time.

Packing when closing

The old cache uses a ZIP archive where all shader files are packed and compressed. This has a cost that is paid every time the emulation is stopped or the emulator closed, since it has to pack everything back, making the shutdown process slower. The new cache solves this by not using a ZIP archive at all. Now it uses 6 files, which are 3 .toc and .data file pairs. They are "guest", which stores the guest code and constant buffer data required to translate the shader, "shared", which contains GPU state needed to translate the shader, in addition to metadata produced by the translator, and "host", which contains the host shader binary, retrieved from the driver. The host file pair is also per API and per vendor. The .toc file contains offsets into the .data file in addition to a small header. The .data file contains the actual header. This is because those files are append-only, and each entry has variable length, so that allows easy random access of entries, while only needing to write at the end of both files.

Memory usage

Sometimes, the GLSL code is kept in memory

On master, the GLSL code of shaders that are compiled (not from the disk cache) is kept in memory. After compilation, it is not really needed so keeping them in memory is just a waste. The new cache solves this by storing the ShaderProgramInfo rather ShaderProgram in the cahce. The GLSL code is located on ShaderProgram. To allow this, some state was moved from ShaderProgram to ShaderProgramInfo. A copy of the guest code is also kept in memory for comparison purposes. On master, if there are different combinations of the same shader stages, it would need to compile new shaders, and would also create new arrays to store the guest code. The new cache re-uses the existing arrays to avoid wasting memory with the same data.

Correctness

Lack of shader specialization

Since shader translation depends on GPU state, the correct thing to do would be creating a new version of the shader if this state changes. But on master, this is not done, and the old cache does not support storing different versions of the same shader. This is addresses here by adding a SpecializationState where the state for a version of the shader is stored. In addition to that, it checks if the state recorded there matches the current GPU state for every use. If it doesn't match, a new version of the shader is compiled for the new state.

This fixes graphical glitches on Yokai Watch 1 and maybe other games.
Before:

Uk0BpudCts.mp4

After:

P1jnBrgaR4.mp4

Fixes #3004.

Other

Bindless textures were not supported.

The old cache did not support shaders using bindless textures. This seems to be because it does not save the constant buffer slot where the texture handle is located on the cache. The new cache stores all information required to recompile those shaders, so games using bindless textures (UE4 games and a few others) are now supported.

Fixes #2045.

Closing while the shader cache loads

It was not possible to close the emulator clealy, or stop emulation while the shader cache was loaded. This is very annoying when one opens a game by mistake, and then finds they now need to wait until the shader cache finishes loading to close. This has been addressed now, the shader cache Initialize method takes a cancellation token that can be used by the UI to cancel the load process at any time.

TODO:

  • Ensure that it works with multiple emulator instances.
  • Ensure it can handle corrupted cache files (either discard or repair).
  • XML docs.
  • Migration from old cache format.
  • Make purge shader cache option also delete the new cache.
  • Investigate driver crash on NVIDIA when rebuilding after migrating from old cache.

@gdkchan gdkchan added enhancement New feature or request gpu Related to Ryujinx.Graphics fix Fix something labels Mar 13, 2022
Copy link
Contributor

@marysaka marysaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a quick dive into it.

Great work this does address the shortcomings of my initial implementation 👍
I'm not entirely sure if we would be able to carry over the old cache tho...

Ryujinx.Common/Cache/HashState.cs Outdated Show resolved Hide resolved
Ryujinx.Graphics.Gpu/Shader/ShaderSpecializationState.cs Outdated Show resolved Hide resolved
@MutantAura
Copy link
Collaborator

Putting these here from discord. Seems to have 2 core issues so far (error codes below):

On 2nd boot of a game directly after the cache initialising:

00:00:01.547 |E| GPU.MainThread Application : Unhandled exception caught: System.IndexOutOfRangeException: Index was outside the bounds of the array. at Ryujinx.Graphics.Gpu.Shader.DiskCacheHostStorage.LoadShaders(GpuContext context, ShaderCacheHashTable graphicsCache, ComputeShaderCacheHashTable computeCache, ParallelDiskCacheLoader loader) in C:\Users\domw0\Documents\GitHub\Ryujinx\Ryujinx.Graphics.Gpu\Shader\DiskCacheHostStorage.cs:line 126

When running in portable mode I could consistently get this error:

00:00:10.645 |E| Gpu.BackgroundDiskCacheWriter Application : Unhandled exception caught: System.IO.DirectoryNotFoundException: Could not find a part of the path 'C:\stuff\ryujinx-Release-1.0.0+afcc705-win_x64\portable\games\01007ef00011e000\cache\shader\shared.toc'. at Microsoft.Win32.SafeHandles.SafeFileHandle.CreateFile(String , FileMode , FileAccess , FileShare , FileOptions )

Other users reported the 2nd error even without portable mode.
Log error 1:
Ryujinx_1.0.0-dirty_2022-03-13_11-47-26.log

Log error 2:
Ryujinx_1.1.0afcc705_2022-03-13_13-18-59.log

@LukeWarnut

This comment was marked as outdated.

@LukeWarnut
Copy link
Contributor

This fixes graphical glitches on Yokai Watch 1, Charge Kid, and maybe other games.

I am not finding any difference in Charge Kid.

Master:
image
image

PR:
image
image

@LukeWarnut

This comment was marked as outdated.

@gdkchan
Copy link
Member Author

gdkchan commented Mar 13, 2022

All reported issues should be fixed now.

@gdkchan
Copy link
Member Author

gdkchan commented Mar 13, 2022

This fixes graphical glitches on Yokai Watch 1, Charge Kid, and maybe other games.

I am not finding any difference in Charge Kid.

The graphics used to look blocky ingame due to it using non-normalized coordinates. But I can't reproduce it now on master, maybe it's random, maybe something else fixed it, who knows. I will just pretend it never happened I guess.

@GamerzHell9137
Copy link

Atelier Sophie 2 on master was showing

00:07:49.861 |E| GUI.RenderLoop Gpu : GL Sync Object 56236 failed to signal within 1000ms. Continuing...

and didn't let players progress in the game but with the shader cache rewrite it doesn't freeze in the Near Forest and the game runs properly!

image

@riperiperi
Copy link
Member

riperiperi commented Mar 14, 2022

Great to finally have shaders using bindless textures covered by the shader cache. Been doing some early testing and Unreal Engine 4 games are much smoother, even when master has a driver shader cache present. (which inevitably ends up wiped, whereas this does not)

One quick example is in Mario Party Superstars when the character select comes up. In master, it noticably freezes while it translates the shaders for all the characters, whereas with this PR it finds the shaders much faster and probably spends more time loading their textures.

Master (driver shader cache present):

Shadercache-old.mp4

PR:

Shadercache-new.mp4

I'll make some more comparisons shortly.

@Shahil-Ayato
Copy link
Contributor

Closes #2045

@Shahil-Ayato
Copy link
Contributor

Closes #864

@Shahil-Ayato
Copy link
Contributor

Potentially closes #1768

@Shahil-Ayato
Copy link
Contributor

Potentially closes #2603

@Shahil-Ayato
Copy link
Contributor

Shahil-Ayato commented Mar 15, 2022

Potentially closes #2851

@gdkchan
Copy link
Member Author

gdkchan commented Mar 15, 2022

Closes #864

This solves the second issue, but not the first one described there. I will think about whenever it should be closed or not.

@A-w-x
Copy link
Contributor

A-w-x commented Mar 16, 2022

fixes minor shield glitch on SSBU on intel (linux)

@gdkchan gdkchan marked this pull request as ready for review March 29, 2022 01:07
@gdkchan
Copy link
Member Author

gdkchan commented Mar 29, 2022

This is ready for review/testing now.
A few things that I changed and are worth noting:

  • CompileShader was removed from IRenderer. A new ShaderSource structure was added. It has the shader code (with a string for text shaders and a byte array for binary shaders), the shader stage and language enums. The structure is now passed to CreateProgram directly. This was done not only to simplify the GAL interface, but also to allow OpenGL shader objects to be deleted early (they are now deleted after checking the link status). This might allow reducing the driver memory usage.
  • The QueryIsTextureRectangle method was changed to QueryTextureCoordNormalized. It now just returns the coordinate normalized state from the texture pool descritor, rather than "is it a 2D texture and coordinates are not normalized" (rectangle texture), since this is the correct thing to do here (as coordinate normalization is not limited to 2D texture, although NVN only has rectangle textures). This is less problematic now with specialization state, since if for some reason that state changes, it will just recompile the shader.

Things to test:

  • Performance. Since it now checks if the shader state matches current GPU state before each draw, it could have some performance impact.
  • For regressions in general.

@MutantAura
Copy link
Collaborator

Works very well on my system. All caches were migrated without issue other than Breath of the Wild which had a lot of invalid texture descriptor errors and then hung for around 10 mins with a full translation bar. It did eventually resolve itself so perhaps some patience will be required for those larger caches.

Performance was almost identical so I don't think that's an issue. The only game I could consistency measure any kind of loss was PLA and it was minor. May need some lower end system tests here though.

Master:
image
image

New:
image
image

gdkchan and others added 26 commits April 8, 2022 10:59
…w ShaderSource struct passed to CreateProgram directly
…existing new caches (file format version was incremented)
…ad of crashing

This is required if we, for example, support new texture instruction to the shader translator, and then they allow access to textures that were not accessed before. In this scenario, the old cache entry is no longer usable
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request fix Fix something gpu Related to Ryujinx.Graphics
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Yo-Kai Watch 1 for Nintendo Switch: Grafical issue Bindless textures break the shader cache