
VideoBackends: GPU Texture Decoding #4467

Merged
merged 15 commits into dolphin-emu:master from stenzek:gpu-texture-decoding Apr 3, 2017

Conversation

stenzek
Contributor

@stenzek stenzek commented Nov 22, 2016

This PR adds support for decoding textures on the GPU using compute shaders.

It is not aimed at high-end PC users; those users will likely see little to no benefit from this option, since texture decoding is not something that happens a large number of times each frame in most games. It is aimed at those with slower CPUs, to alleviate some of the overhead by moving this work to the GPU (thereby possibly reducing some stutter), and at ARM users, as there is no optimized texture decoding path for ARM.

Currently there are OpenGL and Vulkan implementations; no support for either D3D backend is planned. Porting the shaders to HLSL would not be impossible, I just don't see much value in doing it (but there's no reason it couldn't be done if this is ever merged and someone else wants to do it, or even better, compile to SPIR-V and emit HLSL from that).

@stenzek stenzek added the WIP / do not merge Work in progress (do not merge) label Nov 22, 2016
@lioncash
Member

Review status: 0 of 27 files reviewed at latest revision, 4 unresolved discussions.


Source/Core/VideoBackends/OGL/GPUTimer.h, line 1 at r2 (raw file):

#pragma once

Should be a newline after #pragma once. Also missing the comment that usually goes at the top of source files.


Source/Core/VideoBackends/OGL/GPUTimer.h, line 2 at r2 (raw file):

#pragma once
#include <array>

This class doesn't seem to use <array> anywhere.
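Taken together, the two comments above suggest a preamble like the following for GPUTimer.h. This is only a sketch: the license-header wording follows Dolphin's usual file-header convention, and the exact includes the class needs are an assumption.

```cpp
// Copyright 2016 Dolphin Emulator Project
// Licensed under GPLv2+
// Refer to the license.txt file included.

#pragma once

// Only include what the class actually uses; the unused <array>
// include flagged in review would be dropped here.
```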


Source/Core/VideoBackends/OGL/ProgramShaderCache.cpp, line 365 at r2 (raw file):

  {
    GLsizei charsWritten;
    GLchar* infoLog = new GLchar[length];

This can likely be either a std::string (assuming GLchar will always be char), or a std::vector of GLchar, to get rid of the new/delete.


Source/Core/VideoCommon/TextureConversionShader.cpp, line 1254 at r2 (raw file):

const TextureDecodingShaderInfo* GetTextureDecodingShaderInfo(u32 format)
{
  for (size_t i = 0; i < ArraySize(decoding_shader_info); i++)

This can be a ranged-for loop or a std::find_if



ERROR_LOG(VIDEO, "%s Shader info log:\n%s",
type == GL_VERTEX_SHADER ? "VS" : type == GL_FRAGMENT_SHADER ? "PS" : "GS", infoLog);

const char* prefix = "";

This comment was marked as off-topic.

@stenzek stenzek force-pushed the gpu-texture-decoding branch 3 times, most recently from 98ea27d to 403de18 Compare November 22, 2016 16:08
@JMC47
Contributor

JMC47 commented Nov 23, 2016

Harvest Moon is still broken. Reported it earlier, before the PR, but it must have gotten forgotten.


@stenzek stenzek force-pushed the gpu-texture-decoding branch 2 times, most recently from 7948a58 to 6ec8c7f Compare November 23, 2016 08:46
@mimimi085181
Contributor

You should probably tell testers that there's a new GUI option for this under the texture cache slider. I knew there had to be an option, but I only found it on the third look.

Anyway, for me it increases the average performance in the Smash Bros. Melee video from sub-300 to 300+ FPS (something like 260-290 changed to 300-340), changing the displayed minimum from 200 to 240 FPS.

@linkmauve
Member

FYI, the checkbox is missing in the Vulkan dialog.

@stenzek
Contributor Author

stenzek commented Nov 26, 2016

@linkmauve This is intended, I have a branch with Vulkan GPU texture decoding but it requires #4462 which would add a lot to this PR. If you want to build it yourself, the code is here: https://github.com/stenzek/dolphin/tree/vulkan-compute (but I doubt the speedup % would be significantly different to GL, it does have the advantage of grouping all the dispatches together, but at the cost of an extra copy).

@stenzek stenzek force-pushed the gpu-texture-decoding branch 3 times, most recently from 1c3d70e to 556353b Compare November 28, 2016 05:58
@stenzek stenzek force-pushed the gpu-texture-decoding branch 2 times, most recently from 49f6f8f to 81b78e3 Compare December 9, 2016 12:30
@JMC47
Contributor

JMC47 commented Dec 9, 2016

Works in Vulkan. Works in OpenGL. About 5% faster in the current worst case scenario on a PC that wasn't likely to be affected much by this. Likely much greater gains on slower PCs and Android.

LGTM.

@bb010g
Contributor

bb010g commented Dec 19, 2016

Is something blocking this?

@robofunk

This PR has made a huge difference on the Shield TV. Wind Waker is now full speed, with occasional slowdowns below 29 FPS; very playable.

@mimimi085181
Contributor

@robofunk: I don't think this PR should provide a general performance boost in Wind Waker. Texture decoding should only happen when new textures are loaded, or when Dolphin can't properly cache the textures. So it should only reduce the stutter when loading new levels or objects in Wind Waker. There are other games where the texture cache "fails" and Dolphin constantly has to decode textures; that's where I would expect big gains. Examples would be Harvest Moon (at night or something, ask JMC47) or Metroid Prime 3(?) when it's raining on your face.

@stenzek : Can you confirm this? ^^

Can you tell me where you are testing this, so I can check if the texture cache is failing? On PC (this is most likely not implemented on Android, at least not the GUI part) you can check for yourself: in the graphics settings under Advanced, there's "Show Statistics". Enable that and watch the "textures uploaded" counter. When that counter is standing still, the texture cache is working, and there shouldn't be any texture decoding going on. On the other hand, when it rises by at least 1 per frame, this PR should provide a noticeable speedup, if GPU decoding is faster on your system than CPU decoding. But when that happens, it might also be possible that the texture cache could be optimised to catch this, and that would be faster than GPU decoding.

@stenzek
Contributor Author

stenzek commented Jan 29, 2017

@robofunk @mimimi085181 is correct here: this PR may reduce peak frame times (i.e. stutter when moving the camera and such), but if you're standing in one place without moving the camera, performance should be the same, unless the texture cache isn't working correctly.

Furthermore, if you're on Android, I didn't add a UI option for enabling GPU decoding, so unless you edited the INI manually it's likely still using CPU decoding and the performance changes are likely due to something else (perhaps the recent Jit changes?).

@@ -119,12 +141,19 @@ TextureCache::TCacheEntryBase* TextureCache::CreateTexture(const TCacheEntryConf

glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_MAX_LEVEL, config.levels - 1);

if (g_ogl_config.bSupportsTextureStorage)
glTexStorage3D(GL_TEXTURE_2D_ARRAY, config.levels, GL_RGBA8, config.width, config.height, 1);

This comment was marked as off-topic.

@@ -267,7 +304,7 @@ TextureCache::TextureCache()

if (g_ActiveConfig.backend_info.bSupportsPaletteConversion)
{
s32 buffer_size = 1024 * 1024;
s32 buffer_size = 32 * 1024 * 1024;

This comment was marked as off-topic.

// determine whether we can use compute shaders, this check alone is sufficient.
g_Config.backend_info.bSupportsGPUTextureDecoding =
g_Config.backend_info.bSupportsPaletteConversion &&
g_Config.backend_info.bSupportsComputeShaders;

This comment was marked as off-topic.

@@ -526,6 +528,7 @@ TextureCacheBase::TCacheEntryBase* TextureCacheBase::Load(const u32 stage)

const u32 texture_size =
TexDecoder_GetTextureSizeInBytes(expandedWidth, expandedHeight, texformat);
u32 bytes_per_block = ((bsw * bsh * TexDecoder_GetTexelSizeInNibbles(texformat)) / 2);

This comment was marked as off-topic.

{
const u8* tlut = &texMem[tlutaddr];
u32 row_stride = bytes_per_block * (expandedWidth / bsw);
g_texture_cache->DecodeTextureOnGPU(

This comment was marked as off-topic.

The loop was allocating one-too-many levels, as well as incorrect sizes
for each level. Probably not an issue as mipmapped render targets aren't
used, but the logic should be correct anyway.

This allows us to use glTexStorage on GL3.3 implementations that support
the extension.

Allows the usage of glBindImageTexture and glMemoryBarrier from GLES
(requires GLES 3.1).

Allows the usage of glDispatchCompute from GLES (requires GLES 3.1).

This ensures that they are complete textures by decoding time, as when
using compute shaders we write directly to the destination texture.

This also changes bSupportsEarlyFragmentTests to
bSupportsImageLoadStore, as it is used for both.
@degasus
Member

degasus commented Apr 2, 2017

LGTM, and sorry for missing the 31st deadline.

@degasus degasus merged commit 3bd184a into dolphin-emu:master Apr 3, 2017
@ligfx
Contributor

ligfx commented Apr 4, 2017

This causes graphical breakage when using custom textures on macOS: https://bugs.dolphin-emu.org/issues/10183

ligfx added a commit to ligfx/dolphin that referenced this pull request Apr 4, 2017
Fixes bug dolphin-emu#10183 [0] introduced by 3bd184a / PR dolphin-emu#4467 [1].

TextureCacheBase was no longer calling `entry->Load` for custom textures
since the compute shader decoding logic was added. This adds it back in.
It also slightly restructures the decoding if-group to match the one
below, which I think makes the logic more obvious.

(recommend viewing with `git diff -b` to ignore the indentation changes)

[0]: https://bugs.dolphin-emu.org/issues/10183
[1]: dolphin-emu#4467
@EchoUrandom404

Hi. I can't see a difference using this with Intel HD Graphics 3000 on my laptop. Is there something I'm missing?

@stenzek
Contributor Author

stenzek commented Apr 8, 2017

@PizzicatoWolf I don't believe the HD3000 supports compute shaders/image load store, which are required for this feature.

@EchoUrandom404

@stenzek Fair enough. It's 6 years old. Would you know whether AMD and/or Nvidia laptop GPUs released the same year as the HD3000 support it?

@degasus degasus mentioned this pull request Apr 23, 2017
mahdihijazi pushed a commit to mahdihijazi/dolphin that referenced this pull request Apr 27, 2017
@stenzek stenzek deleted the gpu-texture-decoding branch June 13, 2017 15:03