Improve performance by binning together opaque items instead of sorting them. (#12453)

Today, we sort all entities added to all phases, even the phases that
don't strictly need sorting, such as the opaque and shadow phases. This
results in a performance loss because our `PhaseItem`s are rather large
in memory, so sorting is slow. Additionally, determining the boundaries
of batches is an O(n) process.

This commit instead makes Bevy place applicable phase items into *bins*
keyed by *bin keys*, which have the invariant that everything in the
same bin is potentially batchable. This makes determining batch
boundaries O(1), because everything in the same bin can be batched.
Instead of sorting each entity, we now sort only the bin keys. This
drops the sorting time to near-zero on workloads with few bins like
`many_cubes --no-frustum-culling`. Memory usage is improved too, with
batch boundaries and dynamic indices now implicit instead of explicit.
The improved memory usage results in a significant win even on
unbatchable workloads like `many_cubes --no-frustum-culling
--vary-material-data-per-instance`, presumably due to cache effects.
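
The binning idea can be sketched as follows. This is a minimal, self-contained illustration and not the code from this commit: `BinKey`, its fields, `EntityId`, `BinnedPhase`, and `sorted_batches` are all hypothetical names chosen for the example. It assumes that items sharing a key can be drawn as one batch, so only the keys need sorting and every bin is already a complete batch.

```rust
use std::collections::HashMap;

/// Hypothetical bin key: items that share a key are assumed to be batchable.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct BinKey {
    pipeline_id: u32,
    material_id: u32,
}

/// Hypothetical stand-in for an entity identifier.
type EntityId = u32;

/// A phase that groups items into bins instead of keeping one big sortable list.
#[derive(Default)]
struct BinnedPhase {
    bins: HashMap<BinKey, Vec<EntityId>>,
}

impl BinnedPhase {
    /// Adding an item is a hash lookup plus a push; no per-item sort key is stored.
    fn add(&mut self, key: BinKey, entity: EntityId) {
        self.bins.entry(key).or_default().push(entity);
    }

    /// Sorts only the bin keys, then returns one batch per bin.
    /// Batch boundaries are implicit: each bin is exactly one batch.
    fn sorted_batches(&self) -> Vec<(BinKey, &[EntityId])> {
        let mut keys: Vec<BinKey> = self.bins.keys().copied().collect();
        keys.sort_unstable();

        let mut batches = Vec::with_capacity(keys.len());
        for key in keys {
            batches.push((key, self.bins[&key].as_slice()));
        }
        batches
    }
}
```

With few distinct keys, as in `many_cubes --no-frustum-culling`, the sort in this sketch touches only a handful of keys no matter how many entities were added.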

Not all phases can be binned; some, such as transparent and transmissive
phases, must still be sorted. To handle this, this commit splits
`PhaseItem` into `BinnedPhaseItem` and `SortedPhaseItem`. Most of the
logic that today deals with `PhaseItem`s has been moved to
`SortedPhaseItem`. `BinnedPhaseItem` has the new logic.

Frame time results (in ms/frame) are as follows:

| Benchmark              | `binning` | `main`  | Speedup |
| ---------------------- | --------- | ------- | ------- |
| `many_cubes -nfc -vpi` | 232.179   | 312.123 | 34.43%  |
| `many_cubes -nfc`      | 25.874    | 30.117  | 16.40%  |
| `many_foxes`           | 3.276     | 3.515   | 7.30%   |

(`-nfc` is short for `--no-frustum-culling`; `-vpi` is short for
`--vary-per-instance`.)
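
The Speedup column is consistent with computing `main / binning - 1` on the frame times. The commit message does not state the formula, so that reading is an assumption. A small sketch of the arithmetic, using the hypothetical helper name `speedup_percent`:

```rust
/// Relative speedup of the `binning` branch over `main`, in percent.
/// Assumes speedup is defined as (main / binning - 1) * 100, which is
/// inferred from the table rather than stated in the commit message.
fn speedup_percent(main_ms: f64, binning_ms: f64) -> f64 {
    (main_ms / binning_ms - 1.0) * 100.0
}
```

For example, `speedup_percent(312.123, 232.179)` agrees with the first row of the table when rounded to two decimal places.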

---

## Changelog

### Changed

* Render phases have been split into binned and sorted phases. Binned
phases, such as the common opaque phase, achieve improved CPU
performance by avoiding the sorting step.

## Migration Guide

- `PhaseItem` has been split into `BinnedPhaseItem` and
`SortedPhaseItem`. If your code has custom `PhaseItem`s, you will need
to migrate them to one of these two types. `SortedPhaseItem` requires
the fewest code changes, but you may want to pick `BinnedPhaseItem` if
your phase doesn't require sorting, as that enables higher performance.
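
To illustrate the choice, here is a simplified sketch of the split. The traits below are stand-ins written for this example, not the real `bevy_render` definitions: the actual traits have more methods and different signatures, and `bin_key`, `TransparentItem`, `OpaqueItem`, and the integer keys are hypothetical.

```rust
/// Stand-in for the shared base trait (the real one has more methods).
trait PhaseItem: Sized {
    fn entity(&self) -> u32;
}

/// Stand-in for phases whose draw order matters, e.g. transparency.
trait SortedPhaseItem: PhaseItem {
    type SortKey: Ord;
    fn sort_key(&self) -> Self::SortKey;

    /// Default sort by key; an implementation may override it.
    fn sort(items: &mut [Self]) {
        items.sort_by_key(|item| item.sort_key());
    }
}

/// Stand-in for phases where order within a batch does not matter.
trait BinnedPhaseItem: PhaseItem {
    type BinKey: Clone + Eq + Ord + std::hash::Hash;
    /// Hypothetical accessor; items with equal keys may be batched together.
    fn bin_key(&self) -> Self::BinKey;
}

/// Hypothetical transparent item: must stay sorted by distance.
struct TransparentItem {
    entity: u32,
    distance_mm: u32,
}

impl PhaseItem for TransparentItem {
    fn entity(&self) -> u32 {
        self.entity
    }
}

impl SortedPhaseItem for TransparentItem {
    type SortKey = u32;
    fn sort_key(&self) -> u32 {
        self.distance_mm
    }
}

/// Hypothetical opaque item: only the pipeline decides what can be batched.
struct OpaqueItem {
    entity: u32,
    pipeline_id: u32,
}

impl PhaseItem for OpaqueItem {
    fn entity(&self) -> u32 {
        self.entity
    }
}

impl BinnedPhaseItem for OpaqueItem {
    type BinKey = u32;
    fn bin_key(&self) -> u32 {
        self.pipeline_id
    }
}
```

In this sketch, a custom item that previously carried a sort key maps most directly onto the sorted trait, while an item whose draw order is irrelevant only needs to expose a key that identifies its bin.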

## Tracy graphs

`many_cubes --no-frustum-culling`, `main` branch:
<img width="1064" alt="Screenshot 2024-03-12 180037"
src="https://github.com/bevyengine/bevy/assets/157897/e1180ce8-8e89-46d2-85e3-f59f72109a55">

`many_cubes --no-frustum-culling`, this branch:
<img width="1064" alt="Screenshot 2024-03-12 180011"
src="https://github.com/bevyengine/bevy/assets/157897/0899f036-6075-44c5-a972-44d95895f46c">

You can see that `batch_and_prepare_binned_render_phase` is a much
smaller fraction of the time. Zooming in on that function, with yellow
being this branch and red being `main`, we see:

<img width="1064" alt="Screenshot 2024-03-12 175832"
src="https://github.com/bevyengine/bevy/assets/157897/0dfc8d3f-49f4-496e-8825-a66e64d356d0">

The binning happens in `queue_material_meshes`. Again with yellow being
this branch and red being `main`:
<img width="1064" alt="Screenshot 2024-03-12 175755"
src="https://github.com/bevyengine/bevy/assets/157897/b9b20dc1-11c8-400c-a6cc-1c2e09c1bb96">

We can see that there is a small regression in `queue_material_meshes`
performance, but it's not nearly enough to outweigh the large gains in
`batch_and_prepare_binned_render_phase`.

---------

Co-authored-by: James Liu <contact@jamessliu.com>
pcwalton and james7132 committed Mar 30, 2024
1 parent df76fd4 commit 4dadebd
Showing 31 changed files with 1,059 additions and 418 deletions.
4 changes: 2 additions & 2 deletions crates/bevy_core_pipeline/src/core_2d/main_pass_2d_node.rs
@@ -4,7 +4,7 @@ use bevy_render::{
 camera::ExtractedCamera,
 diagnostic::RecordDiagnostics,
 render_graph::{Node, NodeRunError, RenderGraphContext},
-render_phase::RenderPhase,
+render_phase::SortedRenderPhase,
 render_resource::RenderPassDescriptor,
 renderer::RenderContext,
 view::{ExtractedView, ViewTarget},
@@ -16,7 +16,7 @@ pub struct MainPass2dNode {
 query: QueryState<
 (
 &'static ExtractedCamera,
-&'static RenderPhase<Transparent2d>,
+&'static SortedRenderPhase<Transparent2d>,
 &'static ViewTarget,
 ),
 With<ExtractedView>,
32 changes: 17 additions & 15 deletions crates/bevy_core_pipeline/src/core_2d/mod.rs
@@ -38,7 +38,7 @@ use bevy_render::{
 render_graph::{EmptyNode, RenderGraphApp, ViewNodeRunner},
 render_phase::{
 sort_phase_system, CachedRenderPipelinePhaseItem, DrawFunctionId, DrawFunctions, PhaseItem,
-RenderPhase,
+SortedPhaseItem, SortedRenderPhase,
 },
 render_resource::CachedRenderPipelineId,
 Extract, ExtractSchedule, Render, RenderApp, RenderSet,
@@ -96,29 +96,16 @@ pub struct Transparent2d {
 }

 impl PhaseItem for Transparent2d {
-type SortKey = FloatOrd;
-
 #[inline]
 fn entity(&self) -> Entity {
 self.entity
 }

-#[inline]
-fn sort_key(&self) -> Self::SortKey {
-self.sort_key
-}
-
 #[inline]
 fn draw_function(&self) -> DrawFunctionId {
 self.draw_function
 }

-#[inline]
-fn sort(items: &mut [Self]) {
-// radsort is a stable radix sort that performed better than `slice::sort_by_key` or `slice::sort_unstable_by_key`.
-radsort::sort_by_key(items, |item| item.sort_key().0);
-}
-
 #[inline]
 fn batch_range(&self) -> &Range<u32> {
 &self.batch_range
@@ -140,6 +127,21 @@ impl PhaseItem for Transparent2d {
 }
 }

+impl SortedPhaseItem for Transparent2d {
+type SortKey = FloatOrd;
+
+#[inline]
+fn sort_key(&self) -> Self::SortKey {
+self.sort_key
+}
+
+#[inline]
+fn sort(items: &mut [Self]) {
+// radsort is a stable radix sort that performed better than `slice::sort_by_key` or `slice::sort_unstable_by_key`.
+radsort::sort_by_key(items, |item| item.sort_key().0);
+}
+}
+
 impl CachedRenderPipelinePhaseItem for Transparent2d {
 #[inline]
 fn cached_pipeline(&self) -> CachedRenderPipelineId {
@@ -155,7 +157,7 @@ pub fn extract_core_2d_camera_phases(
 if camera.is_active {
 commands
 .get_or_spawn(entity)
-.insert(RenderPhase::<Transparent2d>::default());
+.insert(SortedRenderPhase::<Transparent2d>::default());
 }
 }
 }
@@ -7,7 +7,7 @@ use bevy_render::{
 camera::ExtractedCamera,
 diagnostic::RecordDiagnostics,
 render_graph::{NodeRunError, RenderGraphContext, ViewNode},
-render_phase::{RenderPhase, TrackedRenderPass},
+render_phase::{BinnedRenderPhase, TrackedRenderPass},
 render_resource::{CommandEncoderDescriptor, PipelineCache, RenderPassDescriptor, StoreOp},
 renderer::RenderContext,
 view::{ViewDepthTexture, ViewTarget, ViewUniformOffset},
@@ -17,14 +17,16 @@ use bevy_utils::tracing::info_span;

 use super::AlphaMask3d;

-/// A [`bevy_render::render_graph::Node`] that runs the [`Opaque3d`] and [`AlphaMask3d`] [`RenderPhase`].
+/// A [`bevy_render::render_graph::Node`] that runs the [`Opaque3d`]
+/// [`BinnedRenderPhase`] and [`AlphaMask3d`]
+/// [`bevy_render::render_phase::SortedRenderPhase`]s.
 #[derive(Default)]
 pub struct MainOpaquePass3dNode;
 impl ViewNode for MainOpaquePass3dNode {
 type ViewQuery = (
 &'static ExtractedCamera,
-&'static RenderPhase<Opaque3d>,
-&'static RenderPhase<AlphaMask3d>,
+&'static BinnedRenderPhase<Opaque3d>,
+&'static BinnedRenderPhase<AlphaMask3d>,
 &'static ViewTarget,
 &'static ViewDepthTexture,
 Option<&'static SkyboxPipelineId>,
@@ -80,14 +82,14 @@ impl ViewNode for MainOpaquePass3dNode {
 }

 // Opaque draws
-if !opaque_phase.items.is_empty() {
+if !opaque_phase.is_empty() {
 #[cfg(feature = "trace")]
 let _opaque_main_pass_3d_span = info_span!("opaque_main_pass_3d").entered();
 opaque_phase.render(&mut render_pass, world, view_entity);
 }

 // Alpha draws
-if !alpha_mask_phase.items.is_empty() {
+if !alpha_mask_phase.is_empty() {
 #[cfg(feature = "trace")]
 let _alpha_mask_main_pass_3d_span = info_span!("alpha_mask_main_pass_3d").entered();
 alpha_mask_phase.render(&mut render_pass, world, view_entity);
@@ -4,7 +4,7 @@ use bevy_ecs::{prelude::*, query::QueryItem};
 use bevy_render::{
 camera::ExtractedCamera,
 render_graph::{NodeRunError, RenderGraphContext, ViewNode},
-render_phase::RenderPhase,
+render_phase::SortedRenderPhase,
 render_resource::{Extent3d, RenderPassDescriptor, StoreOp},
 renderer::RenderContext,
 view::{ViewDepthTexture, ViewTarget},
@@ -13,15 +13,16 @@ use bevy_render::{
 use bevy_utils::tracing::info_span;
 use std::ops::Range;

-/// A [`bevy_render::render_graph::Node`] that runs the [`Transmissive3d`] [`RenderPhase`].
+/// A [`bevy_render::render_graph::Node`] that runs the [`Transmissive3d`]
+/// [`SortedRenderPhase`].
 #[derive(Default)]
 pub struct MainTransmissivePass3dNode;

 impl ViewNode for MainTransmissivePass3dNode {
 type ViewQuery = (
 &'static ExtractedCamera,
 &'static Camera3d,
-&'static RenderPhase<Transmissive3d>,
+&'static SortedRenderPhase<Transmissive3d>,
 &'static ViewTarget,
 Option<&'static ViewTransmissionTexture>,
 &'static ViewDepthTexture,
@@ -4,22 +4,23 @@ use bevy_render::{
 camera::ExtractedCamera,
 diagnostic::RecordDiagnostics,
 render_graph::{NodeRunError, RenderGraphContext, ViewNode},
-render_phase::RenderPhase,
+render_phase::SortedRenderPhase,
 render_resource::{RenderPassDescriptor, StoreOp},
 renderer::RenderContext,
 view::{ViewDepthTexture, ViewTarget},
 };
 #[cfg(feature = "trace")]
 use bevy_utils::tracing::info_span;

-/// A [`bevy_render::render_graph::Node`] that runs the [`Transparent3d`] [`RenderPhase`].
+/// A [`bevy_render::render_graph::Node`] that runs the [`Transparent3d`]
+/// [`SortedRenderPhase`].
 #[derive(Default)]
 pub struct MainTransparentPass3dNode;

 impl ViewNode for MainTransparentPass3dNode {
 type ViewQuery = (
 &'static ExtractedCamera,
-&'static RenderPhase<Transparent3d>,
+&'static SortedRenderPhase<Transparent3d>,
 &'static ViewTarget,
 &'static ViewDepthTexture,
 );
