APPLE: GPU frustum culling #2053

Closed


creijon
Contributor

@creijon creijon commented Sep 30, 2022

Description of Change(s)

Implementation of GPU culling using compute shaders in the PipelineDrawBatch. The vertex-shader transform feedback approach that was used previously in the IndirectDrawBatch has not been supported in HGI since it would require multiple command buffers for synchronisation of the stages. The compute shader approach does not require a separate reset pass for instance culling since it can expand out the instances in a loop within the compute shader and populate the drawCommand buffers directly.
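To illustrate the per-thread instance loop described above, here is a minimal CPU-side sketch in C++. It is not the Storm kernel. The types `Plane`, `Sphere`, `DrawItem` and `DrawCommand`, the helper `UnitCubeFrustum()`, and the bounding-sphere test are all assumptions made for the example. One call to `CullDrawItem()` stands in for one compute thread: it loops over the instances of its draw item, writes the surviving instance indices, and stores the visible count straight into the draw command, so no separate reset pass is needed.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// All types below are hypothetical stand-ins, not Storm or HGI structures.
struct Plane  { float nx, ny, nz, d; };       // a point is inside when n.p + d >= 0
struct Sphere { float cx, cy, cz, radius; };  // per-instance bounds (assumed)
struct DrawItem    { uint32_t firstInstance; uint32_t instanceCount; };
struct DrawCommand { uint32_t instanceCount; };

// Frustum describing the cube [-1, 1]^3, convenient for testing.
inline std::array<Plane, 6> UnitCubeFrustum() {
    std::array<Plane, 6> f = {{
        { 1.f, 0.f, 0.f, 1.f}, {-1.f, 0.f, 0.f, 1.f},
        { 0.f, 1.f, 0.f, 1.f}, { 0.f,-1.f, 0.f, 1.f},
        { 0.f, 0.f, 1.f, 1.f}, { 0.f, 0.f,-1.f, 1.f}}};
    return f;
}

// Conservative sphere-vs-frustum test: reject only when the sphere lies
// entirely behind at least one plane.
inline bool IsVisible(const std::array<Plane, 6>& frustum, const Sphere& s) {
    for (const Plane& p : frustum) {
        const float dist = p.nx * s.cx + p.ny * s.cy + p.nz * s.cz + p.d;
        if (dist < -s.radius) return false;
    }
    return true;
}

// Stands in for one compute thread; threadId selects the draw item.
inline void CullDrawItem(uint32_t threadId,
                         const std::array<Plane, 6>& frustum,
                         const std::vector<DrawItem>& items,
                         const std::vector<Sphere>& instanceBounds,
                         std::vector<DrawCommand>& commands,
                         std::vector<uint32_t>& visibleInstances) {
    assert(threadId < items.size() && threadId < commands.size());
    const DrawItem& item = items[threadId];
    uint32_t visible = 0;
    for (uint32_t i = 0; i < item.instanceCount; ++i) {
        if (IsVisible(frustum, instanceBounds[item.firstInstance + i])) {
            // Compact surviving instance indices into this item's range.
            visibleInstances[item.firstInstance + visible] = i;
            ++visible;
        }
    }
    // The count is written directly, so it never needs to be zeroed first.
    commands[threadId].instanceCount = visible;
}
```

A non-instanced draw item is simply the `instanceCount == 1` case of the same loop.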

Indexing into the drawCommands in the shader is done through a small emulation function in the codegen, GetDrawingCoordField(), which has to be implemented in the kernel. This uses the thread ID to offset into the drawCommand buffer and access the encoded instance count, transform, etc. directly.

Also, the g_instanceID parameter needs to be set in the culling kernel so that the codegen can access it in GetInstanceIndexCoord().

For performance reasons, I've added a flag to specify that the compute encoder is created with concurrent dispatch.

To use the PipelineDrawBatch on OpenGL, the option HDST_ENABLE_HGI_RESOURCE_GENERATION has been changed to default to true.

There was also an issue around how the garbage collector was releasing resources outside of the HGI BeginFrame::EndFrame pair. This was causing a crash on OpenGL when picking a prim in the viewport. I've put a potential fix for this in HdxPickTask.

Since the ICB changes have been merged into dev, this PR has been rebased on top of them. ICBs need to run in a separate batch, so they have been moved to the new Hd_DrawBatch::BeforeDraw method.

Fixed the Windows build by using an #undef on the MemoryBarrier define. I suggest we change the name of this function to avoid future issues.
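For reference, the workaround pattern looks roughly like the C++ sketch below. The `#define` at the top only simulates a platform header that defines `MemoryBarrier` as a macro, and `SketchPlatformBarrier` and `SketchComputeCmds` are made-up names. In the real build the macro comes from the Windows SDK headers, and the affected method is the one mentioned above.

```cpp
#include <cassert>

// Simulated platform macro (assumption for this sketch only).
#define MemoryBarrier SketchPlatformBarrier

// The workaround: drop the macro before declaring anything with that name.
#ifdef MemoryBarrier
#undef MemoryBarrier
#endif

// Hypothetical command-recording class with a method named MemoryBarrier.
struct SketchComputeCmds {
    int barrierCount = 0;

    // With the macro still defined, the preprocessor would rewrite this name.
    void MemoryBarrier(unsigned barrierMask) {
        assert(barrierMask != 0u);
        ++barrierCount;
    }
};
```

Renaming the method avoids having to repeat the `#undef` in every file that includes both headers.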

Fixes Issue(s)

Culling on instanced and non-instanced primitives on the GPU.

Tested on:

MacBook Pro M1 Max, with macOS Ventura
MacBook Pro i7 Radeon 560, with macOS Ventura
HP Xeon NVIDIA 3080, with Ubuntu 20.02

Update

This was originally submitted as PR: #2045
but was accidentally closed with a force-push. Reopened.

  • I have submitted a signed Contributor License Agreement

@creijon creijon changed the title GPU frustum culling APPLE: GPU frustum culling Sep 30, 2022
@creijon creijon force-pushed the jon/dev/culling_without_icb branch 4 times, most recently from fa557df to 2bec208 Compare October 3, 2022 13:33
@tallytalwar
Contributor

Filed as internal issue #USD-7669

@creijon creijon force-pushed the jon/dev/culling_without_icb branch 2 times, most recently from 304bd1f to cc5a187 Compare December 12, 2022 05:47
pixar-oss pushed a commit that referenced this pull request Jan 4, 2023
This is important for Hgi garbage collection.

Contribution: Jon Creighton

Fixes #2053

(Internal change: 2258046)
pixar-oss pushed a commit that referenced this pull request Jan 4, 2023
Moved CPU view frustum culling to be an internal
aspect of command buffer instead of split between
command buffer and renderPass. Also cleaned up some
out of date header includes.

Contribution: Jon Creighton

Fixes #2053

(Internal change: 2258054)
(Internal change: 2258057)
pixar-oss pushed a commit that referenced this pull request Jan 4, 2023
Added support for generating pipeline shaders for CS in
preparation for supporting GPU view frustum culling
using compute.

Instead of receiving drawing coord values as attribute
inputs, generated CS shaders are set up to fetch drawing
coordinate values directly from a dispatch buffer.
The client CS needs to set the current draw index and
current instance index before using any primvar accessors.

Contribution: Jon Creighton, David G Yu

Fixes #2053

(Internal change: 2258393)
pixar-oss pushed a commit that referenced this pull request Jan 4, 2023
Implementation of GPU culling using compute shaders in the
PipelineDrawBatch. The vertex-shader transform feedback
approach that was used previously in the IndirectDrawBatch
has not been supported in HGI since it would require multiple
command buffers for synchronisation of the stages.

The compute shader approach does not require a separate
reset pass for instance culling since it can expand out
the instances in a loop within the compute shader and
populate the drawCommand buffers directly.

Since the ICB changes have been merged into dev, this
PR has been rebased on top of this. ICBs need to run
in a separate batch, so they have been moved to the
new Hd_DrawBatch::EncodeDraw method.

Contribution: Jon Creighton

Fixes #2053

(Internal change: 2258394)
pixar-oss pushed a commit that referenced this pull request Jan 4, 2023
Currently, this affects the Storm Metal and Vulkan backends only and
is guarded by HDST_ENABLE_PIPELINE_DRAW_BATCH_GPU_FRUSTUM_CULLING
and the default is now set to true. View frustum culling for GL
remains enabled by default using the MDI implementation.

Contribution: Jon Creighton

Fixes #2053

(Internal change: 2258398)
pixar-oss pushed a commit that referenced this pull request Jan 4, 2023
This provides a significant performance improvement for dispatching
GPU frustum culling compute commands on Metal.

Contributions: Jon Creighton

Fixes #2053

(Internal change: 2258585)
@creijon creijon closed this Jan 6, 2023
@creijon creijon deleted the jon/dev/culling_without_icb branch January 6, 2023 13:30