
more parallel check visibility #7310

Closed
wants to merge 1 commit

Conversation

@hymm (Contributor) commented on Jan 21, 2023

Objective

  • The current `check_visibility` function first checks the visibility of entities that have an `Aabb`, and then checks those without one. These two passes can run in parallel.

Solution

  • Use a task-pool scope: spawn one pass as a task and run the other directly on the scope's thread (see the sketch below).
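
A minimal sketch of the idea, using the `ComputeTaskPool` scope from `bevy_tasks` (the two pass functions are hypothetical stand-ins; the real `check_visibility` system does its work through `par_iter_mut`/`par_for_each` over the actual visibility queries):

```rust
use bevy_tasks::ComputeTaskPool;

// Hypothetical stand-ins for the two passes inside `check_visibility`.
fn check_entities_with_aabbs() { /* frustum-cull entities that have an Aabb */ }
fn check_entities_without_aabbs() { /* mark entities without an Aabb as visible */ }

fn check_visibility_parallel() {
    ComputeTaskPool::get().scope(|scope| {
        // Spawn the "without Aabb" pass as a task on the compute pool...
        scope.spawn(async {
            check_entities_without_aabbs();
        });
        // ...while the "with Aabb" pass runs directly on the scope's thread.
        // The scope doesn't return until both passes have finished.
        check_entities_with_aabbs();
    });
}
```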

Many Foxes

[benchmark screenshot]

many cubes

[benchmark screenshot]

3d scene (this change is small enough it could just be noise)

[benchmark screenshot]

Changelog

  • Make `check_visibility` more parallel.

@hymm force-pushed the more-parallel-check-visibility branch from acbb224 to f0ab36c on January 21, 2023 at 00:34
@james7132 (Member) left a comment

Does this deadlock on machines with less than 2 cores or on WASM?

@hymm (Contributor, Author) commented on Jan 21, 2023

> Does this deadlock on machines with less than 2 cores or on WASM?

It shouldn't, but I haven't checked. One of the passes is pushed into a task, so if there aren't enough threads to run it, it'll be picked up by the scope's executor after the first pass finishes.

I'll try running on wasm and see what happens.

@Weibye added the A-Rendering (Drawing game state to the screen) and C-Performance (A change motivated by improving speed, memory usage or compile times) labels on Jan 21, 2023
@superdump (Contributor) commented:
I'm curious why the `par_iter_mut` doesn't already saturate the available threads, such that running two `par_iter_mut`s in parallel produces a speed-up.

@hymm (Contributor, Author) commented on Jan 21, 2023

> I'm curious why the `par_iter_mut` doesn't already saturate the available threads, such that running two `par_iter_mut`s in parallel produces a speed-up.

The reason I did this PR is that, looking at Many Foxes, the trace looks like this:

Many Foxes, main branch:

[trace screenshot]

So we gain something here from doing the without-aabb pass in parallel, but there really isn't much parallelism in the with-aabb `par_for_each`. The other thing is that we pay a ~15 µs cost to wake the scope thread back up if it has gone to sleep. (I need to do a write-up of this.) By running the scopes in parallel we pay this cost at most once, but when they're serial the overhead doubles. We should also gain something by keeping things hotter, since the threads can potentially go to sleep if we run the `par_for_each`es serially.

There are also downsides to doing this in parallel. There will be increased contention on the global task queue, and while the scope threads are spawning tasks they can't actually run any, which could decrease parallelism.

I think the benefits will usually outweigh the costs, but there are also other changes we should make that could leave this PR less impactful. We should probably set minimum batch sizes for all uses of `par_for_each`; the batches for the with-aabb pass above are probably too small to be worth it. If we don't reach the minimum batch size, we should just run a normal `for_each` (see the sketch below).
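
For example, something along these lines — a rough sketch only, assuming a recent `bevy_ecs` `QueryParIter`/`BatchingStrategy` API rather than the exact one from this era; the components, the `MIN_BATCH_SIZE` value, and the visibility test are placeholders, not code from this PR:

```rust
use bevy_ecs::batching::BatchingStrategy;
use bevy_ecs::prelude::*;

// Placeholder components standing in for bevy_render's visibility and Aabb types.
#[derive(Component)]
struct Culled(bool);
#[derive(Component)]
struct BoundingBox;

// Placeholder threshold; the right value would have to come from benchmarks.
const MIN_BATCH_SIZE: usize = 1024;

fn check_entities_with_aabbs(mut query: Query<(&mut Culled, &BoundingBox)>) {
    if query.iter().count() < MIN_BATCH_SIZE {
        // Too little work to amortize task-spawning overhead: run serially.
        for (mut culled, _aabb) in query.iter_mut() {
            culled.0 = false; // stand-in for the real frustum test
        }
    } else {
        // Enough work to split up: parallel iteration with a minimum batch size,
        // so tiny batches don't get spawned as individual tasks.
        query
            .par_iter_mut()
            .batching_strategy(BatchingStrategy::new().min_batch_size(MIN_BATCH_SIZE))
            .for_each(|(mut culled, _aabb)| {
                culled.0 = false; // stand-in for the real frustum test
            });
    }
}
```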

I'm also investigating a significant rework of how `par_for_each` splits the batches and spawns tasks, so that it'll work more like rayon does. Not sure if that'll land, though, as I'm running into some issues.

@@ -375,71 +376,76 @@ pub fn check_visibility(
let view_mask = maybe_view_mask.copied().unwrap_or_default();

visible_entities.entities.clear();
A reviewer (Member) commented:

Wondering if we could just combine the two queries together instead of splitting them like this.
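
For reference, the combined version would look roughly like this — a simplified sketch with placeholder component names, not the actual code from #10196 — where an `Option<&Aabb>`-style term in a single query replaces the separate with/without-`Aabb` queries:

```rust
use bevy_ecs::prelude::*;

// Placeholder components standing in for bevy_render's visibility and Aabb types.
#[derive(Component)]
struct Visible(bool);
#[derive(Component)]
struct BoundingBox;

// One query over both kinds of entities; the `Option<&BoundingBox>` term
// replaces the separate "with Aabb" and "without Aabb" queries.
fn check_visibility_combined(mut query: Query<(&mut Visible, Option<&BoundingBox>)>) {
    query.par_iter_mut().for_each(|(mut visible, maybe_aabb)| {
        visible.0 = match maybe_aabb {
            Some(_aabb) => true, // stand-in for the real frustum test against the Aabb
            None => true,        // entities without an Aabb are always treated as visible
        };
    });
}
```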

A Contributor replied:
Tested this over here: #10196

github-merge-queue bot pushed a commit that referenced this pull request Nov 2, 2023
# Objective

Alternative to #7310

## Solution

Implemented the suggestion from
#7310 (comment)

I am guessing that these were originally split as an optimization, but I
am not sure since I believe the original author of the code is the one
speculating about combining them up there.

## Benchmarks

I ran three benchmarks to compare main, this PR, and the approach from
#7310
([updated](https://github.com/rparrett/bevy/commits/rebased-parallel-check-visibility)
to the same commit on main).

This seems to perform slightly better than main in scenarios where most
entities have AABBs, and a bit worse when they don't (`many_lights`).
That seems to make sense to me.

Either way, the difference is ~-20 microseconds in the more common
scenarios or ~+100 microseconds in the less common scenario. I would
speculate that this might perform **very slightly** worse in
single-threaded scenarios.

Benches were run in release mode for 2000 frames while capturing a trace
with tracy.

| bench | commit | check_visibility_system mean μs |
| -- | -- | -- |
| many_cubes | main | 929.5 |
| many_cubes | this | 914.0 |
| many_cubes | 7310 | 1003.5 |
| | |
| many_foxes | main | 191.6 |
| many_foxes | this | 173.2 |
| many_foxes | 7310 | 167.9 |
| | |
| many_lights | main | 619.3 |
| many_lights | this | 703.7 |
| many_lights | 7310 | 842.5 |

## Notes

Technically this behaves slightly differently -- prior to this PR, view
visibility was determined even for entities without `GlobalTransform`. I
don't think this has any practical impact though.

IMO, I don't think we need to do this. But I opened a PR because it
seemed like the handiest way to share the code / benchmarks.

## TODO

I have done some rudimentary testing with the examples above, but I can
do some screenshot diffing if it seems like we want to do this.
aevyrie added a commit to aevyrie/bevy that referenced this pull request Nov 4, 2023
@superdump (Contributor) commented:
@hymm now that #10196 is merged, can we close this?

@hymm closed this on Nov 5, 2023
ameknite pushed a commit to ameknite/bevy that referenced this pull request Nov 6, 2023
rdrpenguin04 pushed a commit to rdrpenguin04/bevy that referenced this pull request Jan 9, 2024
Labels: A-Rendering (Drawing game state to the screen), C-Performance (A change motivated by improving speed, memory usage or compile times)
Projects: None yet
Linked issues: None yet
5 participants