Implement minimal GPU culling for cameras. #12673

Closed
wants to merge 1 commit into from

pcwalton commented Mar 23, 2024

This commit introduces a new component, `GpuCulling`, which, when present on a camera, skips the CPU visibility check in favor of doing the frustum culling on the GPU. This accepts potentially increased CPU work and draw calls in exchange for cheaper culling, and it doesn't improve the performance of any workload that I know of today. However, it opens the door to significant future optimizations by taking the necessary first step toward *GPU-driven rendering*.

Enabling GPU culling for a view puts the rendering for that view into *indirect mode*. In indirect mode, CPU-level visibility checks are skipped, and all renderable entities are considered potentially visible. Bevy's batching logic still runs as usual, but it doesn't directly generate mesh instance indices. Instead, it generates *instance handles*, which are indices into an array of real instance indices. Before any rendering is done, for each view, a compute shader, `cull.wgsl`, maps instance handles to instance indices, discarding any instance handles that represent meshes outside the visible frustum. Draws are then issued using the *indirect draw* feature of `wgpu`, which instructs the GPU to read the actual number of instances from the output of that compute shader.
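The per-instance frustum test that `cull.wgsl` performs isn't spelled out here. As a rough, CPU-side illustration of one common approach, the sketch below tests a bounding sphere against inward-facing frustum planes. The `Plane` and `Sphere` types, the plane convention, the box-shaped stand-in frustum, and the use of spheres (rather than whatever bounds the shader actually reads) are all assumptions of this sketch.

```rust
/// A plane `normal · p + d = 0`; in this sketch `normal` points into the frustum.
#[derive(Clone, Copy, Debug)]
struct Plane {
    normal: [f32; 3],
    d: f32,
}

/// A world-space bounding sphere.
#[derive(Clone, Copy, Debug)]
struct Sphere {
    center: [f32; 3],
    radius: f32,
}

fn dot(a: [f32; 3], b: [f32; 3]) -> f32 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

/// Culls the sphere only if it lies entirely outside at least one plane.
/// The test is conservative: a sphere near a frustum corner may be kept
/// even though it is outside, but a visible sphere is never culled.
fn sphere_in_frustum(sphere: Sphere, planes: &[Plane]) -> bool {
    planes
        .iter()
        .all(|p| dot(p.normal, sphere.center) + p.d >= -sphere.radius)
}

/// Stand-in "frustum": the box [-1, 1]^3, as six inward-facing planes.
fn unit_box_frustum() -> [Plane; 6] {
    let axes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]];
    let mut planes = [Plane { normal: [0.0; 3], d: 0.0 }; 6];
    for (i, axis) in axes.iter().enumerate() {
        // coordinate >= -1
        planes[2 * i] = Plane { normal: *axis, d: 1.0 };
        // coordinate <= 1
        planes[2 * i + 1] = Plane { normal: [-axis[0], -axis[1], -axis[2]], d: 1.0 };
    }
    planes
}

fn main() {
    let planes = unit_box_frustum();
    let cases = [
        ("inside", [0.0, 0.0, 0.0]),
        ("straddling", [1.2, 0.0, 0.0]),
        ("outside", [5.0, 0.0, 0.0]),
    ];
    for (name, center) in cases {
        let sphere = Sphere { center, radius: 0.5 };
        println!("{name}: visible = {}", sphere_in_frustum(sphere, &planes));
    }
}
```

A sphere that straddles a plane is kept, which is the safe direction to err in: a false positive only costs a wasted draw, while a false negative would make a visible mesh disappear.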

Essentially, GPU culling works by adding a new level of indirection between the CPU's notion of instances (known as instance handles) and the GPU's notion of instances.
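The handle-to-index indirection and the resulting indirect draw arguments can be modeled sequentially on the CPU. This is a minimal sketch under stated assumptions, not the contents of `cull.wgsl`: `cull_batch` and `handle_to_instance` are hypothetical names, visibility is passed in as a closure, and the local `DrawIndexedIndirectArgs` struct only mirrors the five fields, in order, that WebGPU reads for an indexed indirect draw.

```rust
/// Local mirror of the five fields WebGPU reads for an indexed indirect draw,
/// in order. Defined here so the sketch has no dependencies.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct DrawIndexedIndirectArgs {
    index_count: u32,
    instance_count: u32,
    first_index: u32,
    base_vertex: i32,
    first_instance: u32,
}

/// Sequential model of a cull pass for one batch. `handles` are the
/// instance handles produced by CPU batching; `handle_to_instance` maps
/// each handle to a real instance index. Returns the compacted visible
/// instance indices and the draw arguments whose `instance_count` the
/// GPU would read.
fn cull_batch(
    handles: &[u32],
    handle_to_instance: &[u32],
    is_visible: impl Fn(u32) -> bool,
    index_count: u32,
) -> (Vec<u32>, DrawIndexedIndirectArgs) {
    let mut visible = Vec::with_capacity(handles.len());
    for &handle in handles {
        // The extra level of indirection: handle -> real instance index.
        let instance = handle_to_instance[handle as usize];
        if is_visible(instance) {
            visible.push(instance);
        }
    }
    let args = DrawIndexedIndirectArgs {
        index_count,
        instance_count: visible.len() as u32,
        first_index: 0,
        base_vertex: 0,
        first_instance: 0,
    };
    (visible, args)
}

fn main() {
    // Four handles pointing at instances 10..=13; pretend only even
    // instance indices survive the frustum test.
    let handle_to_instance = [10, 11, 12, 13];
    let (visible, args) = cull_batch(&[0, 1, 2, 3], &handle_to_instance, |i| i % 2 == 0, 36);
    println!("visible = {visible:?}, instance_count = {}", args.instance_count);
}
```

On the GPU, each invocation would presumably handle one instance handle and bump the instance count with an atomic add, so the order of the surviving indices need not match the sequential loop here.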

A new `--gpu-culling` flag has been added to the `many_foxes`, `many_cubes`, and `3d_shapes` examples.
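As a hypothetical sketch of how an example could detect the flag using only the standard library (the real examples may parse arguments differently, and `gpu_culling_requested` is an illustrative name):

```rust
/// Hypothetical helper: true if `--gpu-culling` appears among the
/// arguments after the program name.
fn gpu_culling_requested<I, S>(args: I) -> bool
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    args.into_iter()
        .skip(1) // skip the program name
        .any(|arg| arg.as_ref() == "--gpu-culling")
}

fn main() {
    let enabled = gpu_culling_requested(std::env::args());
    println!("GPU culling {}", if enabled { "enabled" } else { "disabled" });
}
```

In an example, a check like this would decide whether the `GpuCulling` component is added to the camera entity. With Cargo, the flag would presumably be passed after `--`, e.g. `cargo run --release --example many_cubes -- --gpu-culling`.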

Potential follow-ups include:

  • Split up `RenderMeshInstances` into CPU-driven and GPU-driven parts. The former, which contain fields like the transform, won't be initialized at all when GPU culling is enabled. Instead, the transform will be written directly to the GPU in `extract_meshes`, like `extract_skins` does for joint matrices.

  • Implement GPU culling for shadow maps.

    • Following that, we can treat all cascades as one as far as the CPU is concerned, simply replaying the final draw commands with different view uniforms, which should reduce the CPU overhead considerably.
  • Retain bins from frame to frame so that they don't have to be rebuilt. This is a longer-term project that will build on top of #12453 ("Improve performance by binning together opaque items instead of sorting them") and several of the tasks in the renderer optimization tracking issue, #12590, such as main-world pipeline specialization.

  • Implement two-phase occlusion culling on top of the new indirect mode. This would let us move beyond simple frustum culling.
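The cascade follow-up can be pictured as building the draw list once and replaying it per cascade, changing only the view binding. In this hypothetical sketch, `Command`, `replay_for_cascades`, and the assumption that each cascade's cull pass writes its own contiguous block of 20-byte indexed indirect records are illustrative only, not Bevy's design.

```rust
/// Size in bytes of one indexed indirect draw record: five 32-bit fields.
const ARGS_SIZE: u64 = 5 * 4;

/// Hypothetical render commands for this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Command {
    /// Bind the view uniforms at this dynamic offset.
    SetView { uniform_offset: u32 },
    /// Indirect draw that reads its arguments at this byte offset.
    DrawIndirect { args_offset: u64 },
}

/// Emits the same `batches` draws once per cascade. Only the view binding
/// and the indirect-args offset change between cascades; the CPU-side
/// binning that produced `batches` draws happened once.
/// Assumption: cascade `c`'s cull pass wrote its args starting at
/// `c * batches * ARGS_SIZE`.
fn replay_for_cascades(batches: u64, view_uniform_offsets: &[u32]) -> Vec<Command> {
    let mut commands = Vec::with_capacity(view_uniform_offsets.len() * (batches as usize + 1));
    for (cascade, &uniform_offset) in view_uniform_offsets.iter().enumerate() {
        commands.push(Command::SetView { uniform_offset });
        let base = cascade as u64 * batches * ARGS_SIZE;
        for batch in 0..batches {
            commands.push(Command::DrawIndirect { args_offset: base + batch * ARGS_SIZE });
        }
    }
    commands
}

fn main() {
    // Four cascades, two binned draws each.
    for command in replay_for_cascades(2, &[0, 256, 512, 768]) {
        println!("{command:?}");
    }
}
```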

This PR needs a bit more polish before it's ready to go, so I'm marking it as a draft. Everything seems to work, though.

alice-i-cecile added the C-Enhancement, A-Rendering, C-Performance, and C-Needs-Release-Note labels, then removed C-Enhancement, on Mar 24, 2024.
pcwalton commented Apr 6, 2024

Closing in favor of #12889.

pcwalton closed this Apr 6, 2024
Labels: A-Rendering (Drawing game state to the screen), C-Needs-Release-Note (Work that should be called out in the blog due to impact), C-Performance (A change motivated by improving speed, memory usage or compile times)