VideoBackends: GPU Texture Decoding #4467
Conversation
Review status: 0 of 27 files reviewed at latest revision, 4 unresolved discussions.

Source/Core/VideoBackends/OGL/GPUTimer.h, line 1 at r2 (raw file): Should be a newline after …

Source/Core/VideoBackends/OGL/GPUTimer.h, line 2 at r2 (raw file): This class doesn't seem to use …

Source/Core/VideoBackends/OGL/ProgramShaderCache.cpp, line 365 at r2 (raw file): This can likely be either a std::string (assuming GLchar will always be …

Source/Core/VideoCommon/TextureConversionShader.cpp, line 1254 at r2 (raw file): This can be a ranged-for loop or a …

Comments from Reviewable
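The ProgramShaderCache suggestion (swapping a raw GLchar buffer for a std::string) can be sketched without a GL context. `FakeGetInfoLog` below is a hypothetical stand-in for `glGetShaderInfoLog` with the same write-into-caller-buffer shape; only the std::string pattern itself is the point:

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical stand-in for glGetShaderInfoLog (no GL context here), matching
// its shape: write at most max_length bytes including the terminator, and
// report the number of characters written via out_length.
static void FakeGetInfoLog(int max_length, int* out_length, char* buffer)
{
  static const char kLog[] = "0:1(1): error: syntax error";
  int n = static_cast<int>(std::strlen(kLog));
  if (n > max_length - 1)
    n = max_length - 1;
  std::memcpy(buffer, kLog, n);
  buffer[n] = '\0';
  *out_length = n;
}

// The reviewer's suggestion: size a std::string up front, let the API write
// into its buffer, then shrink it to the reported length -- no manual
// new[]/delete[] of a GLchar array.
std::string GetInfoLog(int log_length)
{
  std::string info_log(static_cast<size_t>(log_length), '\0');
  int written = 0;
  FakeGetInfoLog(log_length, &written, &info_log[0]);
  info_log.resize(static_cast<size_t>(written));
  return info_log;
}
```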
ERROR_LOG(VIDEO, "%s Shader info log:\n%s",
          type == GL_VERTEX_SHADER ? "VS" : type == GL_FRAGMENT_SHADER ? "PS" : "GS", infoLog);

const char* prefix = "";
You should probably tell testers that there's a new GUI option for this under the texture cache slider. I knew there had to be an option, but I only found it on the third look. Anyway, for me it increases the average performance in the Smash Bros. Melee video from sub-300 to 300+ FPS (something like 260-290 changed to 300-340), raising the displayed minimum from 200 to 240 FPS.

FYI, the checkbox is missing in the Vulkan dialog.
@linkmauve This is intended; I have a branch with Vulkan GPU texture decoding, but it requires #4462, which would add a lot to this PR. If you want to build it yourself, the code is here: https://github.com/stenzek/dolphin/tree/vulkan-compute (though I doubt the speedup % would be significantly different from GL's; it has the advantage of grouping all the dispatches together, but at the cost of an extra copy).
Works in Vulkan. Works in OpenGL. About 5% faster in the current worst-case scenario on a PC that wasn't likely to be affected much by this. Likely much greater gains on slower PCs and Android. LGTM.

Is something blocking this?
This PR has made a huge difference on the Shield TV. Wind Waker is now full speed with occasional sub-29 FPS slowdowns; very playable.
@robofunk: I don't think this PR should provide a general performance boost in Wind Waker. Texture decoding should only happen when new textures are loaded, or when Dolphin can't properly cache the textures, so it should only reduce the stutter when loading new levels or objects in Wind Waker. There are other games where the texture cache "fails" and Dolphin constantly has to decode textures; that's where I would expect big gains. Examples would be Harvest Moon (at night or something, ask JMC47) or Metroid Prime 3(?) when it's raining on your face. @stenzek: Can you confirm this? ^^

Can you tell me where you are testing this, so I can check if the texture cache is failing? On PC (most likely not implemented on Android, at least not the GUI part) you could check for yourself: in the graphics settings under Advanced, there's "Show Statistics". Enable that and watch the "textures uploaded" counter. When that counter is standing still, the texture cache is working and there shouldn't be any texture decoding going on. On the other hand, when it rises by at least 1 per frame, this PR should provide a noticeable speedup, if GPU decoding is faster on your system than CPU decoding. But when that happens, it might also be possible that the texture cache could be optimised to catch this, and that would be faster than GPU decoding.
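The cache behaviour described here can be sketched in a few lines. This is a hypothetical model (the map key, hash type, and names are mine, not Dolphin's TextureCacheBase): the "textures uploaded" counter only rises on a cache miss, i.e. when a texture actually has to be decoded and uploaded again.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Toy model of the texture cache: decode/upload only happens when the
// cached content hash for an address no longer matches.
struct TextureCacheSketch
{
  std::unordered_map<uint32_t, uint64_t> cache;  // address -> content hash
  int textures_uploaded = 0;                     // the statistic to watch

  void Load(uint32_t address, uint64_t hash)
  {
    auto it = cache.find(address);
    if (it != cache.end() && it->second == hash)
      return;  // cache hit: counter stands still, no decoding
    cache[address] = hash;  // miss: decode + upload would happen here
    ++textures_uploaded;
  }
};
```

Loading the same unchanged texture every frame leaves the counter flat; a game that invalidates its textures every frame makes it climb, which is exactly the case where GPU decoding can pay off.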
@robofunk @mimimi085181 is correct here; this PR may reduce peak frame times (-> stutter when moving the camera and such), but if you're standing in one place without moving the camera, performance should be the same, unless the texture cache isn't working correctly. Furthermore, if you're on Android, I didn't add a UI option for enabling GPU decoding, so unless you edited the INI manually it's likely still using CPU decoding, and the performance changes are likely due to something else (perhaps the recent JIT changes?).
@@ -119,12 +141,19 @@ TextureCache::TCacheEntryBase* TextureCache::CreateTexture(const TCacheEntryConf
  glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_MAX_LEVEL, config.levels - 1);
  if (g_ogl_config.bSupportsTextureStorage)
    glTexStorage3D(GL_TEXTURE_2D_ARRAY, config.levels, GL_RGBA8, config.width, config.height, 1);
@@ -267,7 +304,7 @@ TextureCache::TextureCache()
  if (g_ActiveConfig.backend_info.bSupportsPaletteConversion)
  {
-   s32 buffer_size = 1024 * 1024;
+   s32 buffer_size = 32 * 1024 * 1024;
// determine whether we can use compute shaders, this check alone is sufficient.
g_Config.backend_info.bSupportsGPUTextureDecoding =
    g_Config.backend_info.bSupportsPaletteConversion &&
    g_Config.backend_info.bSupportsComputeShaders;
@@ -526,6 +528,7 @@ TextureCacheBase::TCacheEntryBase* TextureCacheBase::Load(const u32 stage)
  const u32 texture_size =
      TexDecoder_GetTextureSizeInBytes(expandedWidth, expandedHeight, texformat);
  u32 bytes_per_block = ((bsw * bsh * TexDecoder_GetTexelSizeInNibbles(texformat)) / 2);
{
  const u8* tlut = &texMem[tlutaddr];
  u32 row_stride = bytes_per_block * (expandedWidth / bsw);
  g_texture_cache->DecodeTextureOnGPU(
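For concreteness, the bytes-per-block and row-stride arithmetic from these hunks can be worked through for one format. The constants below are my stand-ins for what `TexDecoder_GetTexelSizeInNibbles` and the block dimensions would give for I4 (8x8 texel blocks, 4 bits = 1 nibble per texel, per the GameCube texture format layout):

```cpp
#include <cassert>
#include <cstdint>

// Assumed values for the I4 format (not queried from Dolphin here):
constexpr uint32_t bsw = 8;                // block width in texels
constexpr uint32_t bsh = 8;                // block height in texels
constexpr uint32_t nibbles_per_texel = 1;  // I4: 4 bits per texel

// Same formula as in the diff: texels per block * nibbles per texel,
// divided by 2 to convert nibbles to bytes.
constexpr uint32_t BytesPerBlock()
{
  return (bsw * bsh * nibbles_per_texel) / 2;
}

// Bytes covered by one row of blocks across the expanded width.
constexpr uint32_t RowStride(uint32_t expanded_width)
{
  return BytesPerBlock() * (expanded_width / bsw);
}
```

So an I4 block is 32 bytes, and a 64-texel-wide image advances 256 bytes per block row; these are the values the GPU decoder needs to address the source data.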
The loop was allocating one-too-many levels, as well as incorrect sizes for each level. Probably not an issue as mipmapped render targets aren't used, but the logic should be correct anyway.
This allows us to use glTexStorage on GL3.3 implementations that support the extension.
Allows the usage of glBindImageTexture and glMemoryBarrier from GLES (requires GLES 3.1).
Allows the usage of glDispatchCompute from GLES (requires GLES 3.1).
This ensures that they are complete textures by decoding time, as when using compute shaders we write directly to the destination texture.
This also changes bSupportsEarlyFragmentTests to bSupportsImageLoadStore, as it is used for both.
LGTM, and sorry for missing the 31st deadline.

This causes graphical breakage when using custom textures on macOS: https://bugs.dolphin-emu.org/issues/10183
Fixes bug dolphin-emu#10183 [0] introduced by 3bd184a / PR dolphin-emu#4467 [1]. TextureCacheBase was no longer calling `entry->Load` for custom textures since the compute shader decoding logic was added. This adds it back in.

It also slightly restructures the decoding if-group to match the one below, which I think makes the logic more obvious. (recommend viewing with `git diff -b` to ignore the indentation changes)

[0]: https://bugs.dolphin-emu.org/issues/10183
[1]: dolphin-emu#4467
Hi. I can't see a difference using this with Intel HD Graphics 3000 on my laptop. Is there something I'm missing?

@PizzicatoWolf I don't believe the HD3000 supports compute shaders/image load store, which are required for this feature.

@stenzek Fair enough. It's 6 years old. Would you know if AMD and/or Nvidia released laptop GPUs the same year as the HD3000 that support it?
This PR adds support for decoding textures on the GPU using compute shaders.
It is not aimed at high-end PC users; those users will likely see little to no benefit from this option, as texture decoding does not happen many times per frame in most games. It is aimed at those with slower CPUs, to alleviate some of the overhead by moving this work to the GPU (thereby possibly reducing some stutter), and at ARM users, as there is no optimized texture decoding path for ARM.
Currently there are OpenGL and Vulkan implementations; no support for either D3D backend is planned. Porting the shaders to HLSL would not be impossible, I just don't see much value in doing it (but there's no reason it couldn't be done if this is ever merged and someone else wants to do it, or, even better, compile to SPIR-V and emit HLSL from that).
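To give a feel for the per-texel work this PR moves onto the GPU, here is a minimal CPU-side sketch of decoding one RGB5A3 texel (one of the GameCube texture formats: if the top bit is set the texel is opaque RGB555, otherwise 3-bit alpha with 4-bit RGB). The helper names are mine, not Dolphin's, and the real decoders of course process whole blocks, not single texels:

```cpp
#include <cassert>
#include <cstdint>

struct RGBA8 { uint8_t r, g, b, a; };

// Expand an n-bit channel to 8 bits by replicating the high bits.
static uint8_t Expand5(uint8_t v) { return (v << 3) | (v >> 2); }
static uint8_t Expand4(uint8_t v) { return (v << 4) | v; }
static uint8_t Expand3(uint8_t v) { return (v << 5) | (v << 2) | (v >> 1); }

RGBA8 DecodeRGB5A3(uint16_t val)
{
  RGBA8 out;
  if (val & 0x8000)  // opaque: 1:5:5:5 (flag, R, G, B)
  {
    out.r = Expand5((val >> 10) & 0x1F);
    out.g = Expand5((val >> 5) & 0x1F);
    out.b = Expand5(val & 0x1F);
    out.a = 0xFF;
  }
  else  // translucent: 0:3:4:4:4 (flag, A, R, G, B)
  {
    out.a = Expand3((val >> 12) & 0x07);
    out.r = Expand4((val >> 8) & 0x0F);
    out.g = Expand4((val >> 4) & 0x0F);
    out.b = Expand4(val & 0x0F);
  }
  return out;
}
```

Branchy per-texel conversions like this are cheap individually but add up when the texture cache misses every frame, which is why dispatching them as a compute shader can help slower CPUs.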