Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PixelShaderGen: Use subgroup reduction for bounding box #7904

Merged
merged 3 commits into from Mar 29, 2019
Merged

PixelShaderGen: Use subgroup reduction for bounding box #7904

merged 3 commits into from Mar 29, 2019

Conversation

stenzek
Copy link
Contributor

@stenzek stenzek commented Mar 17, 2019

Currently, we perform 4 atomic operations for every fragment being shaded when bounding box is enabled (assuming they aren't elided by the branch).

This branch reduces the number of atomic operations by up to a factor of 32 (or the GPU's warp/wave size), by doing a warp-wise min/max reduction, and only performing the atomic operations on the first active thread.

Unfortunately for GL, this is NVIDIA-only, since as far as I can tell there's no vendor-neutral extension for doing shuffles or subgroup reduction operations. Vulkan support is dependent on the vendor implementing Vulkan 1.1 and supporting GroupNonUniformArithmetic?

#ifdef SUPPORTS_WARP_REDUCTION
WARP_MIN(minpos);
WARP_MAX(maxpos);
if (IS_FIRST_ACTIVE_WARP)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible that this new condition will prevent the loads for the bbox_* values from being scheduled near the beginning of the shader, so it may actually end up being slower. Maybe help the compiler a bit by moving them there manually.

@Degerz
Copy link

Degerz commented Mar 18, 2019

Is it possible to do this on D3D as well ?

For D3D11, both AMD and Nvidia have driver extensions that might be of some interest to you via AGS or NVAPI. For D3D12, is shader model 6 another viable option to you ?

@stenzek
Copy link
Contributor Author

stenzek commented Mar 18, 2019

@Degerz I don't really have any desire to implement it in D3D11 with vendor-specific stuff, since it's kinda messy.

SM6 could be an option with the D3D12 backend, but we have to merge that first. The only concern I would have is bloating the download size with DXCompiler, as it doesn't seem to be available anywhere in the system (only in the SDK AFAICT).

@stenzek stenzek changed the title PixelShaderGen: Use warp reduction operations for bounding box PixelShaderGen: Use subgroup reduction for bounding box Mar 22, 2019
@JMC47
Copy link
Contributor

JMC47 commented Mar 26, 2019

Works now and doesn't crash immediately on a new game anymore.

Is ~20% faster in OpenGL and 2% faster in Vulkan.

@Degerz
Copy link

Degerz commented Mar 27, 2019

I checked the latest Metal Shading Language spec and found out that it supports SIMD-group functions which looks pretty similar to Vulkan's subgroup operations. Is there any chance that SPIRV-Cross can handle this for Metal ?

@stenzek
Copy link
Contributor Author

stenzek commented Mar 29, 2019

@Degerz I don't see why SPIRV-Cross couldn't implement the extension, assuming the semantics are the same. There may be extra steps required to ensure the same behavior (e.g. helper or discarded threads/invocations).

@stenzek stenzek merged commit a50a34b into dolphin-emu:master Mar 29, 2019
@stenzek stenzek deleted the do-the-atomic-shuffle branch March 29, 2019 10:29
@degasus
Copy link
Member

degasus commented Jan 31, 2023

As KHR_shader_subgroup seems to be usable on AMD now, shall we switch the OGL implementation to use KHR_shader_subgroup instead of NV_shader_thread_group?

If I interpret the numbers here correct https://opengl.gpuinfo.org/listreports.php?extension=GL_NV_shader_thread_group vs https://opengl.gpuinfo.org/listreports.php?extension=GL_KHR_shader_subgroup , AMD very recently gained support for the newer extension and INTEL has it for a while already. Both don't support the NV extention through.

However I'm unsure how many Nvidia users we might loose by switching to KHR_shader_subgroup on OGL.

What is your opinion?

If you want to try it: #11523

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
5 participants