PixelShaderGen: Use subgroup reduction for bounding box #7904

stenzek · 2019-03-17T03:03:56Z

Currently, we perform 4 atomic operations for every fragment being shaded when bounding box is enabled (assuming they aren't elided by the branch).

This branch reduces the number of atomic operations by up to a factor of 32 (or whatever the GPU's warp/wave size is) by doing a warp-wise min/max reduction and only performing the atomic operations on the first active thread.
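A rough CPU-side model of the difference, offered only as a sketch: it simulates one wave as a vector of lanes, where an `active` flag stands in for lanes that were masked off or skipped by the branch. `Fragment`, `BBox`, `UpdatePerFragment` and `UpdatePerWave` are hypothetical names. On the GPU, the loop in `UpdatePerWave` is what a subgroup min/max (e.g. `subgroupMin`/`subgroupMax` from GL_KHR_shader_subgroup) computes in hardware, and the single write corresponds to the first-active-thread check.

```cpp
#include <algorithm>
#include <atomic>
#include <limits>
#include <vector>

// One simulated lane (fragment) in a wave. Hypothetical type for illustration.
struct Fragment {
  int x;
  int y;
  bool active;  // false = lane is inactive (masked off, discarded, or skipped by the branch)
};

// The shared bounding box; every field is updated with an atomic min/max.
struct BBox {
  std::atomic<int> min_x{std::numeric_limits<int>::max()};
  std::atomic<int> min_y{std::numeric_limits<int>::max()};
  std::atomic<int> max_x{std::numeric_limits<int>::min()};
  std::atomic<int> max_y{std::numeric_limits<int>::min()};
  int atomic_ops = 0;  // how many atomic min/max operations were issued
};

// Atomic min/max as CAS loops (std::atomic<int> has no fetch_min/fetch_max before C++26).
void AtomicMin(std::atomic<int>& target, int value) {
  int current = target.load();
  while (value < current && !target.compare_exchange_weak(current, value)) {
  }
}

void AtomicMax(std::atomic<int>& target, int value) {
  int current = target.load();
  while (value > current && !target.compare_exchange_weak(current, value)) {
  }
}

// Old behavior: every active fragment issues 4 atomics.
void UpdatePerFragment(BBox& box, const std::vector<Fragment>& wave) {
  for (const Fragment& f : wave) {
    if (!f.active)
      continue;
    AtomicMin(box.min_x, f.x);
    AtomicMin(box.min_y, f.y);
    AtomicMax(box.max_x, f.x);
    AtomicMax(box.max_y, f.y);
    box.atomic_ops += 4;
  }
}

// New behavior: reduce across the active lanes first, then only one lane
// (the first active one) issues the 4 atomics for the whole wave.
void UpdatePerWave(BBox& box, const std::vector<Fragment>& wave) {
  int min_x = std::numeric_limits<int>::max();
  int min_y = std::numeric_limits<int>::max();
  int max_x = std::numeric_limits<int>::min();
  int max_y = std::numeric_limits<int>::min();
  bool any_active = false;
  for (const Fragment& f : wave) {
    if (!f.active)
      continue;
    any_active = true;
    min_x = std::min(min_x, f.x);
    min_y = std::min(min_y, f.y);
    max_x = std::max(max_x, f.x);
    max_y = std::max(max_y, f.y);
  }
  if (!any_active)
    return;  // no active lane, so nothing to write
  AtomicMin(box.min_x, min_x);
  AtomicMin(box.min_y, min_y);
  AtomicMax(box.max_x, max_x);
  AtomicMax(box.max_y, max_y);
  box.atomic_ops += 4;
}
```

With a fully active 32-lane wave, `UpdatePerFragment` issues 128 atomics and `UpdatePerWave` issues 4, which is where the "up to 32x" figure comes from; partially active waves see a smaller reduction.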

Unfortunately for GL, this is NVIDIA-only, since as far as I can tell there's no vendor-neutral extension for doing shuffles or subgroup reduction operations. Vulkan support depends on the vendor implementing Vulkan 1.1 and supporting the GroupNonUniformArithmetic capability.
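On the Vulkan side, that requirement boils down to a device capability check roughly like the following. This is a hedged sketch: `SubgroupProps` and `SupportsFragmentSubgroupReduction` are hypothetical names, the struct mirrors only the relevant fields of `VkPhysicalDeviceSubgroupProperties` (which a real backend would fill in through `vkGetPhysicalDeviceProperties2`), and the constants are copied from `vulkan_core.h` so the snippet builds without the SDK. The fragment-stage check is included as an assumption, since the spec only requires the compute stage to be reported.

```cpp
#include <cstdint>

// Values copied from vulkan_core.h so this sketch compiles without the Vulkan SDK.
constexpr std::uint32_t kApiVersion1_1 = (1u << 22) | (1u << 12);  // VK_API_VERSION_1_1
constexpr std::uint32_t kSubgroupFeatureBasic = 0x00000001;        // VK_SUBGROUP_FEATURE_BASIC_BIT
constexpr std::uint32_t kSubgroupFeatureArithmetic = 0x00000004;   // VK_SUBGROUP_FEATURE_ARITHMETIC_BIT
constexpr std::uint32_t kShaderStageFragment = 0x00000010;         // VK_SHADER_STAGE_FRAGMENT_BIT

// Hypothetical mirror of the relevant VkPhysicalDeviceSubgroupProperties fields.
struct SubgroupProps {
  std::uint32_t supportedStages;      // VkShaderStageFlags
  std::uint32_t supportedOperations;  // VkSubgroupFeatureFlags
};

// True if the device can run subgroup min/max (arithmetic) and subgroupElect
// (basic) in fragment shaders.
bool SupportsFragmentSubgroupReduction(std::uint32_t device_api_version,
                                       const SubgroupProps& props) {
  // Subgroup properties and the GroupNonUniform* SPIR-V capabilities are core in Vulkan 1.1.
  if (device_api_version < kApiVersion1_1)
    return false;
  constexpr std::uint32_t required = kSubgroupFeatureBasic | kSubgroupFeatureArithmetic;
  if ((props.supportedOperations & required) != required)
    return false;
  // Only the compute stage is required to be reported, so check fragment explicitly.
  return (props.supportedStages & kShaderStageFragment) != 0;
}
```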

Source/Core/VideoBackends/OGL/ProgramShaderCache.cpp

Tilka · 2019-03-17T15:16:31Z

Source/Core/VideoCommon/PixelShaderGen.cpp

+#ifdef SUPPORTS_WARP_REDUCTION
+  WARP_MIN(minpos);
+  WARP_MAX(maxpos);
+  if (IS_FIRST_ACTIVE_WARP)


It's possible that this new condition will prevent the loads for the bbox_* values from being scheduled near the beginning of the shader, so it may actually end up being slower. Maybe help the compiler a bit by moving them there manually.

Degerz · 2019-03-18T01:45:12Z

Is it possible to do this on D3D as well?

For D3D11, both AMD and Nvidia have driver extensions that might be of some interest to you, via AGS and NVAPI respectively. For D3D12, is Shader Model 6 another viable option for you?

stenzek · 2019-03-18T01:50:38Z

@Degerz I don't really have any desire to implement it in D3D11 with vendor-specific stuff, since it's kinda messy.

SM6 could be an option with the D3D12 backend, but we have to merge that first. The only concern I would have is bloating the download size with DXCompiler, as it doesn't seem to be available anywhere on the system (only in the SDK, AFAICT).

JMC47 · 2019-03-26T03:57:39Z

Works now and doesn't crash immediately on a new game anymore.

It's ~20% faster in OpenGL and 2% faster in Vulkan.

Degerz · 2019-03-27T01:06:10Z

I checked the latest Metal Shading Language spec and found out that it supports SIMD-group functions, which look pretty similar to Vulkan's subgroup operations. Is there any chance that SPIRV-Cross can handle this for Metal?

stenzek · 2019-03-29T10:19:19Z

@Degerz I don't see why SPIRV-Cross couldn't implement the extension, assuming the semantics are the same. There may be extra steps required to ensure the same behavior (e.g. helper or discarded threads/invocations).

degasus · 2023-01-31T14:06:16Z

As KHR_shader_subgroup seems to be usable on AMD now, shall we switch the OGL implementation to use KHR_shader_subgroup instead of NV_shader_thread_group?

If I interpret the numbers here correctly (https://opengl.gpuinfo.org/listreports.php?extension=GL_NV_shader_thread_group vs. https://opengl.gpuinfo.org/listreports.php?extension=GL_KHR_shader_subgroup), AMD very recently gained support for the newer extension, and Intel has had it for a while already. Neither supports the NV extension, though.

However, I'm unsure how many Nvidia users we might lose by switching to KHR_shader_subgroup on OGL.

What is your opinion?

If you want to try it: #11523
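One way to address the concern about Nvidia users would be to prefer the vendor-neutral extension and keep the NV path as a fallback. The sketch below is hypothetical (`SubgroupPath` and `PickSubgroupPath` are not Dolphin code), and it assumes the NV path needs both GL_NV_shader_thread_group and GL_NV_shader_thread_shuffle. With GL_KHR_shader_subgroup the extension string alone is not enough, so the caller is assumed to have queried `GL_SUBGROUP_SUPPORTED_FEATURES_KHR` and `GL_SUBGROUP_SUPPORTED_STAGES_KHR` and to pass the result in as `khr_fragment_arithmetic`.

```cpp
#include <set>
#include <string>

enum class SubgroupPath { None, KHR, NV };

// Hypothetical selection helper. `extensions` holds the GL extension strings;
// `khr_fragment_arithmetic` is true when the KHR queries report arithmetic
// operations as supported in the fragment stage.
SubgroupPath PickSubgroupPath(const std::set<std::string>& extensions,
                              bool khr_fragment_arithmetic) {
  // Prefer the vendor-neutral extension when it is usable.
  if (extensions.count("GL_KHR_shader_subgroup") != 0 && khr_fragment_arithmetic)
    return SubgroupPath::KHR;
  // Fall back to the NVIDIA-specific extensions so those drivers keep the fast path.
  if (extensions.count("GL_NV_shader_thread_group") != 0 &&
      extensions.count("GL_NV_shader_thread_shuffle") != 0)
    return SubgroupPath::NV;
  // Otherwise: plain per-fragment atomics.
  return SubgroupPath::None;
}
```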

Source/Core/VideoBackends/Vulkan/ShaderCompiler.cpp

Tilka reviewed Mar 17, 2019

Source/Core/VideoBackends/OGL/ProgramShaderCache.cpp (outdated)

Tilka reviewed Mar 17, 2019

stenzek changed the title from "PixelShaderGen: Use warp reduction operations for bounding box" to "PixelShaderGen: Use subgroup reduction for bounding box" on Mar 22, 2019

stenzek added 3 commits March 29, 2019 20:06

OGL: Support subgroup reduction operations via GL_NV_shader_thread_shuffle

86da282

Vulkan: Support subgroup reduction operations via GL_KHR_shader_subgroup

6561850

PixelShaderGen: Use subgroup reduction operations for bounding box

d66d778

stenzek merged commit a50a34b into dolphin-emu:master Mar 29, 2019

stenzek deleted the do-the-atomic-shuffle branch March 29, 2019 10:29

degasus reviewed Feb 1, 2023

Source/Core/VideoBackends/Vulkan/ShaderCompiler.cpp
