Multithreaded render command encoding #9172

JMS55 · 2023-07-16T03:56:34Z

Objective

Encoding many GPU commands onto a wgpu::CommandEncoder (for example, in a render pass with many draws, such as the main opaque pass) is expensive and takes a long time.
To improve performance, we want to perform the command encoding for these heavy passes in parallel.

Solution

RenderContext can now queue up "command buffer generation tasks", which are closures that generate a command buffer when called.
When the render context is finalized to produce the final list of command buffers, these tasks are run in parallel on the ComputeTaskPool to produce their corresponding command buffers.
The general idea is that the node graph still runs serially, but within a node, instead of doing rendering work directly, you can add a task that does the rendering work later. These tasks run at the end of graph execution, in parallel with the tasks queued by other nodes.
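A minimal sketch of the queue-then-finish idea, using only the standard library. It assumes scoped `std::thread` threads as a stand-in for the ComputeTaskPool, and a hypothetical `CommandBuffer` newtype in place of `wgpu::CommandBuffer`; the `finish` function here is illustrative and is not the actual `RenderContext::finish()` implementation. The point it shows is that already-encoded buffers and deferred tasks share one queue, and the original queue order is restored after the parallel work completes.

```rust
use std::thread;

/// Hypothetical stand-in for `wgpu::CommandBuffer`.
#[derive(Debug, PartialEq)]
pub struct CommandBuffer(pub String);

/// A queued entry is either already encoded, or a closure that encodes later.
pub enum QueuedCommandBuffer<'a> {
    Ready(CommandBuffer),
    Task(Box<dyn FnOnce() -> CommandBuffer + Send + 'a>),
}

/// Runs every queued task on its own scoped thread, then restores queue order.
pub fn finish(queue: Vec<QueuedCommandBuffer<'_>>) -> Vec<CommandBuffer> {
    let mut indexed: Vec<(usize, CommandBuffer)> = Vec::with_capacity(queue.len());
    thread::scope(|scope| {
        let mut handles = Vec::new();
        for (i, queued) in queue.into_iter().enumerate() {
            match queued {
                // Serial work was encoded during graph execution; keep its slot.
                QueuedCommandBuffer::Ready(buffer) => indexed.push((i, buffer)),
                // Deferred work is started only now, all tasks in one scope.
                QueuedCommandBuffer::Task(task) => {
                    handles.push((i, scope.spawn(move || task())));
                }
            }
        }
        for (i, handle) in handles {
            indexed.push((i, handle.join().expect("encoding task panicked")));
        }
    });
    // Submission order must match the order in which entries were queued.
    indexed.sort_by_key(|(i, _)| *i);
    indexed.into_iter().map(|(_, buffer)| buffer).collect()
}
```

Because the threads are scoped, a task may borrow data that lives only as long as the graph run, which is one reason to start all tasks together at the end rather than as they are queued.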

Nodes Parallelized

MainOpaquePass3dNode
PrepassNode
DeferredGBufferPrepassNode
ShadowPassNode (One task per view)

Future Work

For large numbers of draw calls, it might be worth further subdividing passes into 2+ tasks.
Extend this to UI, 2d, transparent, and transmissive nodes?
Needs testing: small command buffers are inefficient, so it may be worth reverting to serial command encoder usage for render phases with few items.
All "serial" (traditional) rendering work must finish before parallel rendering tasks (the new stuff) can start to run.
There is still only one submission to the graphics queue at the end of the graph execution. There is still no ability to submit work earlier.
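The first and third points above could be combined into one planning step. The following is only a sketch of the partitioning logic under stated assumptions: `MIN_ITEMS_FOR_PARALLEL`, `EncodingPlan`, and `plan_encoding` are hypothetical names, the cutoff value is arbitrary and would need profiling, and nothing like this exists in the PR. It also ignores what splitting a pass would require on the wgpu side (each chunk would need its own render pass).

```rust
use std::ops::Range;

/// Hypothetical cutoff below which a phase is encoded serially.
/// The PR does not define one; the value here is arbitrary.
pub const MIN_ITEMS_FOR_PARALLEL: usize = 256;

#[derive(Debug, PartialEq)]
pub enum EncodingPlan {
    /// Encode on the shared serial command encoder.
    Serial,
    /// Split the phase into tasks, each covering a half-open range of items.
    Parallel(Vec<Range<usize>>),
}

/// Decides how to encode a phase with `item_count` items using at most
/// `max_tasks` parallel tasks.
pub fn plan_encoding(item_count: usize, max_tasks: usize) -> EncodingPlan {
    if item_count < MIN_ITEMS_FOR_PARALLEL || max_tasks <= 1 {
        return EncodingPlan::Serial;
    }
    // Cap the task count so that no chunk falls below the cutoff.
    let tasks = max_tasks.min(item_count / MIN_ITEMS_FOR_PARALLEL);
    let chunk = item_count.div_ceil(tasks);
    let ranges = (0..item_count)
        .step_by(chunk)
        .map(|start| start..(start + chunk).min(item_count))
        .collect();
    EncodingPlan::Parallel(ranges)
}
```

With these assumed values, a phase of 10 items stays serial, while a phase of 1000 items with up to 4 tasks is split into 3 ranges, since a fourth task would push chunks below the cutoff.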

Performance Improvement

Thanks to @Elabajaba for testing on Bistro.

TLDR: Without shadow mapping, this PR has no impact. With shadow mapping, this PR gives ~40 more fps than main.

Changelog

MainOpaquePass3dNode, PrepassNode, DeferredGBufferPrepassNode, and each shadow map within ShadowPassNode are now encoded in parallel, giving greatly increased CPU performance, mainly when shadow mapping is enabled.
- Does not work on WASM or AMD+Windows+Vulkan.
Added RenderContext::add_command_buffer_generation_task().
RenderContext::new() now takes adapter info
Some render graph and Node related types and methods now have additional lifetime constraints.

Migration Guide

RenderContext::new() now takes adapter info

Some render graph and Node related types and methods now have additional lifetime constraints.

dmyyy · 2023-07-16T13:50:41Z

Base implementation looks good to me - interested in seeing what benchmarks will show.

JMS55 · 2023-07-16T19:59:56Z

Not sure if I should mark this as a breaking change or not. Technically there's some extra lifetime constraints now, but 99.999% of users won't need to change their code. I think the only thing you would need to change is if you're using bevy_render's RenderContext for your own stuff, independent from the RenderGraph.

JMS55 · 2023-07-16T20:02:36Z

Note that this PR currently depends on gfx-rs/wgpu#3626

james7132 · 2023-07-17T03:00:42Z

Marking this as blocked on a potential wgpu release, since this otherwise would add complexity without any gains.

cart · 2023-07-20T22:00:28Z

Does the benchmark use arcanized wgpu for both single and multi threaded impls?
Seems like arcanized wgpu on its own might give us a perf boost.

@MiniaczQ

# Objective

Keep core dependencies up to date.

## Solution

Update the dependencies.

wgpu 0.19 only supports raw-window-handle (rwh) 0.6, so bumping that was included in this.

The rwh 0.6 version bump is just the simplest way of doing it. There might be a way we can take advantage of wgpu's new safe surface creation api, but I'm not familiar enough with bevy's window management to untangle it and my attempt ended up being a mess of lifetimes and rustc complaining about missing trait impls (that were implemented). Thanks to @MiniaczQ for the (much simpler) rwh 0.6 version bump code.

Unblocks bevyengine#9172 and bevyengine#10812

~~This might be blocked on cpal and oboe updating their ndk versions to 0.8, as they both currently target ndk 0.7 which uses rwh 0.5.2~~ Tested on android, and everything seems to work correctly (audio properly stops when minimized, and plays when re-focusing the app).

---

## Changelog

- `wgpu` has been updated to 0.19! The long awaited arcanization has been merged (for more info, see https://gfx-rs.github.io/2023/11/24/arcanization.html), and Vulkan should now be working again on Intel GPUs.
- Targeting WebGPU now requires that you add the new `webgpu` feature (setting the `RUSTFLAGS` environment variable to `--cfg=web_sys_unstable_apis` is still required). This feature currently overrides the `webgl2` feature if you have both enabled (the `webgl2` feature is enabled by default), so it is not recommended to add it as a default feature to libraries without putting it behind a flag that allows library users to opt out of it! In the future we plan on supporting wasm binaries that can target both webgl2 and webgpu now that wgpu added support for doing so (see bevyengine#11505).
- `raw-window-handle` has been updated to version 0.6.

## Migration Guide

- `bevy_render::instance_index::get_instance_index()` has been removed as the webgl2 workaround is no longer required as it was fixed upstream in wgpu. The `BASE_INSTANCE_WORKAROUND` shaderdef has also been removed.
- WebGPU now requires the new `webgpu` feature to be enabled. The `webgpu` feature currently overrides the `webgl2` feature so you no longer need to disable all default features and re-add them all when targeting `webgpu`, but binaries built with both the `webgpu` and `webgl2` features will only target the webgpu backend, and will only work on browsers that support WebGPU.
- Places where you conditionally compiled things for webgl2 need to be updated because of this change, eg:
  - `#[cfg(any(not(feature = "webgl"), not(target_arch = "wasm32")))]` becomes `#[cfg(any(not(feature = "webgl"), not(target_arch = "wasm32"), feature = "webgpu"))]`
  - `#[cfg(all(feature = "webgl", target_arch = "wasm32"))]` becomes `#[cfg(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))]`
  - `if cfg!(all(feature = "webgl", target_arch = "wasm32"))` becomes `if cfg!(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))`
- `create_texture_with_data` now also takes a `TextureDataOrder`. You can probably just set this to `TextureDataOrder::default()`
- `TextureFormat`'s `block_size` has been renamed to `block_copy_size`
- See the `wgpu` changelog for anything I might've missed: https://github.com/gfx-rs/wgpu/blob/trunk/CHANGELOG.md

---------

Co-authored-by: François <mockersf@gmail.com>
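As an illustration of the conditional-compilation change in the migration guide, the two updated predicates can be wrapped in small helper functions. The function names below are hypothetical and exist only for this sketch; the `cfg!` expressions are the ones quoted in the migration guide. The two predicates are logical complements of each other, so exactly one is true for any combination of features and target.

```rust
/// Hypothetical helper: true only when building for wasm32 with the `webgl`
/// feature enabled and the `webgpu` feature disabled.
pub fn uses_webgl2_backend() -> bool {
    cfg!(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))
}

/// Hypothetical helper: true in every other configuration, including builds
/// where both `webgl` and `webgpu` are enabled (`webgpu` takes precedence).
pub fn uses_non_webgl2_path() -> bool {
    cfg!(any(not(feature = "webgl"), not(target_arch = "wasm32"), feature = "webgpu"))
}
```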

NthTensor · 2024-02-06T16:55:09Z

crates/bevy_core_pipeline/src/core_3d/main_opaque_pass_3d_node.rs

+            // Opaque draws
+            if !opaque_phase.items.is_empty() {
+                #[cfg(feature = "trace")]
+                let _opaque_main_pass_3d_span = info_span!("opaque_main_pass_3d").entered();


Smallest of all nits: I can see that this span has a much narrower scope than the one on line 56, but they have super similar names. Is there a good reason for them to be this similar? I assume it's to maintain compatibility with the previous version?

I think I just copy pasted what the previous version did, yeah

NthTensor

LGTM. Noted performance increase. No flickering shadows.

james7132 · 2024-02-07T20:46:15Z

crates/bevy_render/src/renderer/mod.rs

+                        QueuedCommandBuffer::Ready(command_buffer) => {
+                            command_buffers.push((i, command_buffer));
+                        }
+                        QueuedCommandBuffer::Task(command_buffer_generation_task) => {


Is there a reason why we can't immediately launch the task on queuing it, and just collect the results in finish? As this stands, if we have a large blocking part of the graph, the actual parallel tasks won't get started until that part of the graph is complete.

The tasks need to be scoped, which isn't easy to do unless we start them all at the end. For now, this should be fine. I don't foresee it really being a problem.

james7132 · 2024-02-07T20:48:08Z

crates/bevy_render/src/renderer/mod.rs

-        self.command_buffers
+
+        if self.command_buffer_queue.is_empty() {
+            self.command_buffer_queue


This is being done, even though the queue is empty? Shouldn't we just return Vec::new()?

I don't know why the code was like that. I removed all the redundant stuff.

I actually looked at the generated assembly of this when I did my review, and for whatever reason the original compiled into fewer operations. I should try it again, maybe I did something wrong.

Shouldn't have any actual impact, but I imagine it was because it early-outs if the queue is empty.

github-actions · 2024-02-07T20:58:10Z

It looks like your PR is a breaking change, but you didn't provide a migration guide.

Could you add some context on what users should update when this change gets released in a new version of Bevy?
It will be used to help write the migration guide for the version. Putting it after a ## Migration Guide will help it get automatically picked up by our tooling.

james7132 · 2024-02-07T20:58:32Z

The changes to the APIs are technically breaking changes. Could you document them in a Migration Guide section?

JMS55 · 2024-02-07T21:39:51Z

It's not a breaking change I don't think, as the lifetimes should be inferred for most use cases.

github-actions · 2024-02-08T07:48:59Z

It looks like your PR is a breaking change, but you didn't provide a migration guide.

Could you add some context on what users should update when this change gets released in a new version of Bevy?
It will be used to help write the migration guide for the version. Putting it after a ## Migration Guide will help it get automatically picked up by our tooling.

james7132 · 2024-02-08T07:49:39Z

It's not a breaking change I don't think, as the lifetimes should be inferred for most use cases.

This adds an argument to RenderGraphRunner::run. I don't expect many people to be running their own render graph, but it's worth documenting.

james7132

Tested this on a few of the stress tests and this does seem to deliver on the performance improvements. many_foxes goes from 6.33ms (157 FPS) to 4.68ms (213 FPS).

Overall this looks good to me. Just a few final points that need to be addressed before it can be merged.

crates/bevy_pbr/src/render/light.rs

james7132 · 2024-02-08T08:32:33Z

crates/bevy_render/src/renderer/mod.rs

+    /// Append a function that will generate a [`CommandBuffer`] to the
+    /// command buffer queue, to be ran later.
+    ///
+    /// If present, this will flush the currently unflushed [`CommandEncoder`]


Does this still need addressing?

# Objective

While profiling around to validate the results of #9172, I noticed that `present_frames` can take a significant amount of time. Digging into the cause, it seems like we're creating a new `QueryState` from scratch every frame. This involves scanning the entire World's metadata instead of just updating its view of the world.

## Solution

Use a `SystemState` argument to cache the `QueryState` to avoid this construction cost.

## Performance

Against `many_foxes`, this seems to cut the time spent in `present_frames` by nearly 2x. Yellow is this PR, red is main.

![image](https://github.com/bevyengine/bevy/assets/3137680/2b02bbe0-6219-4255-958d-b690e37e7fba)

JMS55 · 2024-02-09T07:02:16Z

This adds an argument to RenderGraphRunner::run. I don't expect many people to be running their own render graph, but it's worth documenting.

No changes were made to the argument count, just adding some lifetimes that should be inferred for all existing use cases.

JMS55 · 2024-02-09T07:05:52Z

Does this still need addressing?

Github won't let me respond to it, but idk what this comment is in reference to.

james7132 · 2024-02-09T07:35:32Z

Ship it!

JMS55 added 7 commits July 15, 2023 18:16

Add RenderContext::add_async_command_buffer()

11af143

WIP

a64583c

More WIP

66d2b83

Misc

8b391d2

Complete RenderContext::finish()

6bbbe41

Misc

a4f2ab6

More lifetime WIP

d5112d6

JMS55 added A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times labels Jul 16, 2023

JMS55 added 7 commits July 15, 2023 21:15

Fix lifetime issues

f002065

Misc lifetime rename

ad04cd5

Move trace into command gen task

6f2fbfa

Misc formatting

6e04259

Avoid unnecessary work when no command buffer generation tasks are queued

2b447f4

Revert rename

0c830b9

Pass RenderDevice as task parameter

ef4a411

james7132 requested review from james7132 and superdump July 16, 2023 07:46

JMS55 added 2 commits July 16, 2023 10:11

Use command buffer generation tasks in more nodes

5446d28

Use arcanized wgpu

d52bcba

JMS55 marked this pull request as ready for review July 16, 2023 20:01

JMS55 added this to the 0.12 milestone Jul 16, 2023

james7132 added the S-Blocked This cannot move forward until something else changes label Jul 17, 2023

Fix bug with wrong color attachment

585ed5c

Elabajaba mentioned this pull request Jul 18, 2023

Arcanization of wgpu core resources gfx-rs/wgpu#3626

Merged

JMS55 mentioned this pull request Feb 5, 2024

Parallelize the core pipeline passes with wgpu's RenderBundles #5042

Closed

NthTensor reviewed Feb 6, 2024

NthTensor approved these changes Feb 6, 2024

JMS55 added the S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it label Feb 7, 2024

james7132 reviewed Feb 7, 2024

james7132 added the M-Needs-Migration-Guide A breaking change to Bevy's public API that needs to be noted in a migration guide label Feb 7, 2024

JMS55 added 2 commits February 7, 2024 13:19

Remove redundant code

b66799f

Merge commit 'ab16f5ed6afad717896c278381a0692531eedf44' into multi-threaded-command-encoding

ecb810f

james7132 added M-Needs-Migration-Guide A breaking change to Bevy's public API that needs to be noted in a migration guide and removed M-Needs-Migration-Guide A breaking change to Bevy's public API that needs to be noted in a migration guide labels Feb 8, 2024

james7132 approved these changes Feb 8, 2024

This was referenced Feb 8, 2024

Cache the QueryState used to drop swapchain TextureViews #11781

Merged

Render Graph as Systems #5062

Closed

Merge commit '5313730534684408b6281fb2727144d6890524b0' into multi-threaded-command-encoding

14a2822

Add trace to shadow pass

7d1f6f7

james7132 added this pull request to the merge queue Feb 9, 2024

Merged via the queue into bevyengine:main with commit f4dab8a Feb 9, 2024
23 checks passed

james7132 mentioned this pull request Feb 12, 2024

0.13 announcement post bevyengine/bevy-website#891

Merged
