-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multithreaded render command encoding #9172
Multithreaded render command encoding #9172
Conversation
Base implementation looks good to me - interested in seeing what benchmarks will show. |
Not sure if I should mark this as a breaking change or not. Technically there's some extra lifetime constraints now, but 99.999% of users won't need to change their code. I think the only thing you would need to change is if you're using bevy_render's RenderContext for your own stuff, independent from the RenderGraph. |
Note that this PR currently depends on gfx-rs/wgpu#3626 |
Marking this as blocked on a potential wgpu release, since this otherwise would add complexity without any gains. |
Does the benchmark use arcanized wgpu for both single and multi threaded impls? |
# Objective Keep core dependencies up to date. ## Solution Update the dependencies. wgpu 0.19 only supports raw-window-handle (rwh) 0.6, so bumping that was included in this. The rwh 0.6 version bump is just the simplest way of doing it. There might be a way we can take advantage of wgpu's new safe surface creation api, but I'm not familiar enough with bevy's window management to untangle it and my attempt ended up being a mess of lifetimes and rustc complaining about missing trait impls (that were implemented). Thanks to @MiniaczQ for the (much simpler) rwh 0.6 version bump code. Unblocks bevyengine#9172 and bevyengine#10812 ~~This might be blocked on cpal and oboe updating their ndk versions to 0.8, as they both currently target ndk 0.7 which uses rwh 0.5.2~~ Tested on android, and everything seems to work correctly (audio properly stops when minimized, and plays when re-focusing the app). --- ## Changelog - `wgpu` has been updated to 0.19! The long awaited arcanization has been merged (for more info, see https://gfx-rs.github.io/2023/11/24/arcanization.html), and Vulkan should now be working again on Intel GPUs. - Targeting WebGPU now requires that you add the new `webgpu` feature (setting the `RUSTFLAGS` environment variable to `--cfg=web_sys_unstable_apis` is still required). This feature currently overrides the `webgl2` feature if you have both enabled (the `webgl2` feature is enabled by default), so it is not recommended to add it as a default feature to libraries without putting it behind a flag that allows library users to opt out of it! In the future we plan on supporting wasm binaries that can target both webgl2 and webgpu now that wgpu added support for doing so (see bevyengine#11505). - `raw-window-handle` has been updated to version 0.6. ## Migration Guide - `bevy_render::instance_index::get_instance_index()` has been removed as the webgl2 workaround is no longer required as it was fixed upstream in wgpu. The `BASE_INSTANCE_WORKAROUND` shaderdef has also been removed. - WebGPU now requires the new `webgpu` feature to be enabled. The `webgpu` feature currently overrides the `webgl2` feature so you no longer need to disable all default features and re-add them all when targeting `webgpu`, but binaries built with both the `webgpu` and `webgl2` features will only target the webgpu backend, and will only work on browsers that support WebGPU. - Places where you conditionally compiled things for webgl2 need to be updated because of this change, eg: - `#[cfg(any(not(feature = "webgl"), not(target_arch = "wasm32")))]` becomes `#[cfg(any(not(feature = "webgl") ,not(target_arch = "wasm32"), feature = "webgpu"))]` - `#[cfg(all(feature = "webgl", target_arch = "wasm32"))]` becomes `#[cfg(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))]` - `if cfg!(all(feature = "webgl", target_arch = "wasm32"))` becomes `if cfg!(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))` - `create_texture_with_data` now also takes a `TextureDataOrder`. You can probably just set this to `TextureDataOrder::default()` - `TextureFormat`'s `block_size` has been renamed to `block_copy_size` - See the `wgpu` changelog for anything I might've missed: https://github.com/gfx-rs/wgpu/blob/trunk/CHANGELOG.md --------- Co-authored-by: François <mockersf@gmail.com>
// Opaque draws | ||
if !opaque_phase.items.is_empty() { | ||
#[cfg(feature = "trace")] | ||
let _opaque_main_pass_3d_span = info_span!("opaque_main_pass_3d").entered(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Smallest of all nits: I can see that this span has a much narrower scope than the one on line 56, but they have super similar names. Is there a good reason for them to be this similar? I assume it's to maintain compatibility with the previous version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I just copy pasted what the previous version did, yeah
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Noted performance increase. No flickering shadows.
QueuedCommandBuffer::Ready(command_buffer) => { | ||
command_buffers.push((i, command_buffer)); | ||
} | ||
QueuedCommandBuffer::Task(command_buffer_generation_task) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why we can't immediately launch the task on queuing it, and just collect the results in finish
? As this stands, if we have a large blocking part of the graph, the actual parallel tasks won't get started until that part of the graph is complete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tasks need to be scoped, which isn't easy to do unless we start them all at the end. For now, this should be fine. I don't foresee it really being a problem.
self.command_buffers | ||
|
||
if self.command_buffer_queue.is_empty() { | ||
self.command_buffer_queue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is being done, even though the queue is empty? Shouldn't we just return Vec::new()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know why the code was like that. I removed all the redundant stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually looked at the generated assembly of this when I did my review, and for whatever reason the original compiled into fewer operations. I should try it again, maybe I did something wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't have any actual impact, but I imagine it was because it early-outs if the queue is empty.
It looks like your PR is a breaking change, but you didn't provide a migration guide. Could you add some context on what users should update when this change get released in a new version of Bevy? |
The changes to the APIs are technically breaking changes. Could you document them in a Migration Guide section? |
…readed-command-encoding
It's not a breaking change I don't think, as the lifetimes should be inferred for most use cases. |
It looks like your PR is a breaking change, but you didn't provide a migration guide. Could you add some context on what users should update when this change get released in a new version of Bevy? |
This adds an argument to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested this on a few of the stress tests and this does seem to deliver on the performance improvements. many_foxes
goes from 6.33ms (157 FPS) to 4.68ms (213 FPS).
Overall this looks good to me. Just a few final points that need to be addressed before it can be merged.
/// Append a function that will generate a [`CommandBuffer`] to the | ||
/// command buffer queue, to be ran later. | ||
/// | ||
/// If present, this will flush the currently unflushed [`CommandEncoder`] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this still need addressing?
# Objective While profiling around to validate the results of #9172, I noticed that `present_frames` can take a significant amount of time. Digging into the cause, it seems like we're creating a new `QueryState` from scratch every frame. This involves scanning the entire World's metadata instead of just updating its view of the world. ## Solution Use a `SystemState` argument to cache the `QueryState` to avoid this construction cost. ## Performance Against `many_foxes`, this seems to cut the time spent in `present_frames` by nearly almost 2x. Yellow is this PR, red is main. ![image](https://github.com/bevyengine/bevy/assets/3137680/2b02bbe0-6219-4255-958d-b690e37e7fba)
…readed-command-encoding
No changes were made to the argument count, just adding some lifetimes that should be inferred for all existing use cases. |
Github won't let me respond to it, but idk what this comment is in reference to. |
Ship it! |
# Objective Keep core dependencies up to date. ## Solution Update the dependencies. wgpu 0.19 only supports raw-window-handle (rwh) 0.6, so bumping that was included in this. The rwh 0.6 version bump is just the simplest way of doing it. There might be a way we can take advantage of wgpu's new safe surface creation api, but I'm not familiar enough with bevy's window management to untangle it and my attempt ended up being a mess of lifetimes and rustc complaining about missing trait impls (that were implemented). Thanks to @MiniaczQ for the (much simpler) rwh 0.6 version bump code. Unblocks bevyengine/bevy#9172 and bevyengine/bevy#10812 ~~This might be blocked on cpal and oboe updating their ndk versions to 0.8, as they both currently target ndk 0.7 which uses rwh 0.5.2~~ Tested on android, and everything seems to work correctly (audio properly stops when minimized, and plays when re-focusing the app). --- ## Changelog - `wgpu` has been updated to 0.19! The long awaited arcanization has been merged (for more info, see https://gfx-rs.github.io/2023/11/24/arcanization.html), and Vulkan should now be working again on Intel GPUs. - Targeting WebGPU now requires that you add the new `webgpu` feature (setting the `RUSTFLAGS` environment variable to `--cfg=web_sys_unstable_apis` is still required). This feature currently overrides the `webgl2` feature if you have both enabled (the `webgl2` feature is enabled by default), so it is not recommended to add it as a default feature to libraries without putting it behind a flag that allows library users to opt out of it! In the future we plan on supporting wasm binaries that can target both webgl2 and webgpu now that wgpu added support for doing so (see bevyengine/bevy#11505). - `raw-window-handle` has been updated to version 0.6. ## Migration Guide - `bevy_render::instance_index::get_instance_index()` has been removed as the webgl2 workaround is no longer required as it was fixed upstream in wgpu. The `BASE_INSTANCE_WORKAROUND` shaderdef has also been removed. - WebGPU now requires the new `webgpu` feature to be enabled. The `webgpu` feature currently overrides the `webgl2` feature so you no longer need to disable all default features and re-add them all when targeting `webgpu`, but binaries built with both the `webgpu` and `webgl2` features will only target the webgpu backend, and will only work on browsers that support WebGPU. - Places where you conditionally compiled things for webgl2 need to be updated because of this change, eg: - `#[cfg(any(not(feature = "webgl"), not(target_arch = "wasm32")))]` becomes `#[cfg(any(not(feature = "webgl") ,not(target_arch = "wasm32"), feature = "webgpu"))]` - `#[cfg(all(feature = "webgl", target_arch = "wasm32"))]` becomes `#[cfg(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))]` - `if cfg!(all(feature = "webgl", target_arch = "wasm32"))` becomes `if cfg!(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))` - `create_texture_with_data` now also takes a `TextureDataOrder`. You can probably just set this to `TextureDataOrder::default()` - `TextureFormat`'s `block_size` has been renamed to `block_copy_size` - See the `wgpu` changelog for anything I might've missed: https://github.com/gfx-rs/wgpu/blob/trunk/CHANGELOG.md --------- Co-authored-by: François <mockersf@gmail.com>
Objective
wgpu::CommandEncoder
is very expensive, and takes a long time.Solution
RenderContext
can now queue up "command buffer generation tasks" which are closures that will generate a command buffer when called.ComputeTaskPool
to produce their corresponding command buffers.Nodes Parallelized
MainOpaquePass3dNode
PrepassNode
DeferredGBufferPrepassNode
ShadowPassNode
(One task per view)Future Work
Performance Improvement
Thanks to @Elabajaba for testing on Bistro.
TLDR: Without shadow mapping, this PR has no impact. With shadow mapping, this PR gives ~40 more fps than main.
Changelog
MainOpaquePass3dNode
,PrepassNode
,DeferredGBufferPrepassNode
, and each shadow map withinShadowPassNode
are now encoded in parallel, giving greatly increased CPU performance, mainly when shadow mapping is enabled.RenderContext::add_command_buffer_generation_task()
.RenderContext::new()
now takes adapter infoMigration Guide
RenderContext::new()
now takes adapter info