Consolidate together Bevy's TaskPools #12090

Open · wants to merge 20 commits into base: main

Conversation

@james7132 (Member) commented Feb 24, 2024

Objective

Fixes #1907. Spiritual successor to #4740.

Bevy currently creates a 50%/25%/25% split between the Compute, AsyncCompute, and IO task pools, meaning that any given operation can only be scheduled onto that subset of threads. This is suboptimal when an app isn't using any IO or async compute work (or vice versa): available parallelism goes underutilized because the split doesn't reflect the actual use case. This PR aims to fix that underutilization.
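To make the cost concrete, here is the arithmetic of the current split on a hypothetical 16-thread machine (illustrative numbers only, not taken from this PR):

fn main() {
    // Illustrative only: the 50%/25%/25% split on a hypothetical 16-thread CPU.
    let total_threads: usize = 16;
    let compute = total_threads / 2;       // 8 threads: ComputeTaskPool
    let async_compute = total_threads / 4; // 4 threads: AsyncComputeTaskPool
    let io = total_threads / 4;            // 4 threads: IoTaskPool
    assert_eq!(compute + async_compute + io, total_threads);
    // A pure ECS workload can only ever run on the 8 compute threads,
    // leaving the other 8 idle.
}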

Solution

  • Do away with the IO and AsyncCompute task pools and allocate all of the threads to the ComputeTaskPool.
  • Move all of the non-blocking IO tasks to the Compute task pool.
  • Add TaskPool::spawn_blocking as an alternative for blocking IO operations and any task that would previously be spawned on the AsyncCompute task pool. This is backed by blocking's dynamically scaled thread pool, which spins threads up and down depending on need (see the sketch just below this list).
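For illustration, here is a minimal sketch of how a spawn_blocking-style API can be layered over the blocking crate. This is an assumed shape, not the PR's actual implementation; the free function spawn_blocking here stands in for the real TaskPool method:

use std::future::Future;

// Minimal sketch (not this PR's actual code): hand a blocking closure to
// `blocking`'s shared, demand-scaled thread pool and get back a future
// that resolves with the closure's result.
fn spawn_blocking<T, F>(f: F) -> impl Future<Output = T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    // `blocking::unblock` spins threads up and down with demand, so idle
    // blocking threads don't permanently reserve CPU cores.
    blocking::unblock(f)
}

fn main() {
    // Usage: read a file without stalling any async executor thread.
    let task = spawn_blocking(|| std::fs::read_to_string("Cargo.toml"));
    let contents = futures_lite::future::block_on(task).unwrap();
    println!("read {} bytes", contents.len());
}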

This allows ECS systems and parallel iterators to be scheduled onto all CPU cores instead of artificially constraining them to half of the available logical cores, and lets typical async IO tasks interleave onto any thread.

The use of spawn_blocking to perform CPU-bound operations will have to rely on the OS's preemptive scheduler to properly schedule the threads when the main task pool's threads are sitting idle. This comes with potential context-switching costs, but is generally preferable to choking out the entire app due to a task not cooperatively yielding available threads, or to artificially limiting available parallelism.

Note: We're already using blocking through async-fs for loading assets from disk. This shift primarily moves all of the other blocking IO tasks and the async compute tasks into blocking's thread pool.


Changelog

Added: TaskPool::spawn_blocking
Added: TaskPool::spawn_blocking_async
Removed: IoTaskPool
Removed: AsyncComputeTaskPool
Changed: ComputeTaskPool by default now spawns a thread for every available logical CPU core.

Migration Guide

IoTaskPool and AsyncComputeTaskPool have been removed and merged into ComputeTaskPool. Replace uses of IoTaskPool::get and AsyncComputeTaskPool::get with ComputeTaskPool::get.

If you were spawning futures that are reliant on non-blocking IO (as in, the task spends most of its time yielding via await), just spawn the task onto the ComputeTaskPool:

// in 0.13
IoTaskPool::get().spawn(async move {
    // assuming `stream` is a futures `Stream`, whose `next()` yields `Option<Item>`
    while let Some(item) = stream.next().await {
        // process the item here
    }
});
// in 0.14
ComputeTaskPool::get().spawn(async move {
    while let Some(item) = stream.next().await {
        // process the item here
    }
});

If you were spawning futures that are reliant on blocking IO (e.g. std::fs::File::read_to_string), use ComputeTaskPool::spawn_blocking instead. This will spawn or reuse one of the dynamically scaled threads explicitly made for blocking operations.

// in 0.13
IoTaskPool::get().spawn(async move {
    // `read_to_string` (from `std::io::Read`) fills a caller-provided buffer
    let mut contents = String::new();
    File::open(path).unwrap().read_to_string(&mut contents).unwrap();
    // process the file contents here
});
// in 0.14
ComputeTaskPool::get().spawn_blocking(move || {
    let mut contents = String::new();
    File::open(path).unwrap().read_to_string(&mut contents).unwrap();
    // process the file contents here
});

If you were spawning futures for async compute work, use ComputeTaskPool::spawn_blocking instead. This will likewise spawn or reuse one of the dynamically scaled threads explicitly made for blocking operations.

// in 0.13
AsyncComputeTaskPool::get().spawn(async move {
     solve_traveling_salesman();
});
// in 0.14
ComputeTaskPool::get().spawn_blocking(move || {
     solve_traveling_salesman();
});

If you were spawning futures for long-running tasks that mix blocking and non-blocking work, use ComputeTaskPool::spawn_blocking_async instead.

// in 0.13
AsyncComputeTaskPool::get().spawn(async move {
    // `read_to_string` (from futures' `AsyncReadExt`) fills a caller-provided buffer
    let mut contents = String::new();
    async_fs::File::open("assets/traveling_salesman.json")
        .await
        .unwrap()
        .read_to_string(&mut contents)
        .await
        .unwrap();
    solve_traveling_salesman(contents);
});
// in 0.14
ComputeTaskPool::get().spawn_blocking_async(async move {
    let mut contents = String::new();
    async_fs::File::open("assets/traveling_salesman.json")
        .await
        .unwrap()
        .read_to_string(&mut contents)
        .await
        .unwrap();
    solve_traveling_salesman(contents);
});
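For context, here is a rough sketch of what spawn_blocking_async could amount to on native targets: the whole future is moved onto one of blocking's threads and driven to completion there. This is an assumption for illustration, not code from this PR's diff; on wasm it would presumably defer to a normal spawn instead:

use std::future::Future;

// Hypothetical sketch (not this PR's actual code): run an entire future to
// completion on one of `blocking`'s threads, so the blocking stretches
// inside it can't starve the main compute threads.
fn spawn_blocking_async<T, Fut>(future: Fut) -> impl Future<Output = T>
where
    Fut: Future<Output = T> + Send + 'static,
    T: Send + 'static,
{
    blocking::unblock(move || futures_lite::future::block_on(future))
}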

@james7132 james7132 added C-Performance A change motivated by improving speed, memory usage or compile times A-Tasks Tools for parallel and async work C-Breaking-Change A breaking change to Bevy's public API that needs to be noted in a migration guide X-Controversial There is active debate or serious implications around merging this PR labels Feb 24, 2024
@james7132 james7132 requested a review from hymm February 24, 2024 13:27
@AxiomaticSemantics (Contributor) left a comment

From a read-over, this addresses a long-standing gripe I've had with bevy_tasks WRT thread assignment policies; as a bonus, it makes bevy_tasks a bit easier to understand. I can't comment on the rendering side of the changes myself, but this looks to be on the right track.

@hymm (Contributor) commented Feb 24, 2024

There should probably be a spawn_blocking_async method too for convenience.

@afonsolage (Contributor) left a comment

I understand the reasoning behind the change, and the blocking crate also seems well suited for this.

As someone who often uses AsyncComputeTaskPool for procedural content generation, I think this change is fine, since I prefer that systems have priority when all available cores are in use, to make sure FPS doesn't drop too often.

@alice-i-cecile alice-i-cecile added S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it A-ECS Entities, components, systems, and events labels Feb 25, 2024
@hymm hymm added this to High Priority PRs in ECS Feb 25, 2024
@hymm (Contributor) left a comment

I'm in favor of this; I argued for more or less the same thing in the linked PR.

@AxiomaticSemantics (Contributor) commented Feb 25, 2024

Is there an impending fix for the wasm build failure?

Would you consider renaming ComputeTaskPool to just TaskPool or ThreadPool or such?

@james7132 (Member, Author) commented

> Is there an impending fix for the wasm build failure?

Fixed it with spawn_blocking_async, which now falls back to a normal spawn on wasm targets.

> Would you consider renaming ComputeTaskPool to just TaskPool or ThreadPool or such?

I don't want to push that kind of breakage in this PR. The focus here is to get the functionality in place.

@james7132 james7132 closed this Feb 26, 2024
@james7132 james7132 reopened this Feb 26, 2024
@alice-i-cecile alice-i-cecile added the C-Needs-Release-Note Work that should be called out in the blog due to impact label Feb 26, 2024
@james7132 (Member, Author) commented

Tried to provide a more comprehensive changelog and migration guide.

@superdump (Contributor) commented

Just noting that I have previously argued for starting with the ‘allow use of all hardware threads/cores’ approach, rather than artificially limiting them and leaving lots of performance on the table, and dealing with contention problems in other ways. So, I support this.

@james7132 (Member, Author) commented Feb 27, 2024

Note that this has a significant performance regression as-is during app startup: this change appears to block rendering until the entire scene is loaded. Discussion: https://discord.com/channels/691052431525675048/749335865876021248/1211935807099895888

@cart (Member) commented Feb 27, 2024

I'm on board for this general idea provided we have the numbers to back up the move in the scenarios we care about. I'd like to see comparisons for things like "heavy scene startup", "loading big scene assets while also doing heavy / highly parallel compute tasks in systems", etc.

@james7132 james7132 added S-Needs-Benchmarking This set of changes needs performance benchmarking to double-check that they help and removed S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it labels Mar 4, 2024
@Elabajaba (Contributor) commented Mar 14, 2024

Posted these traces in Discord a few weeks ago; posting them here so they don't get buried.

This was loading a ~1-2 GB glTF scene (I don't remember which one specifically; it was either Bistro with compressed textures or the Synty fantasy castle scene, but both traces are of the same scene, and IIRC Bevy main was at the time at the same commit this PR was branched off of).

With this PR: [trace screenshot]

Bevy main: [trace screenshot]

This PR takes significantly longer to load all the assets and get to a playable state (~7s vs ~4.7s), and doesn't let asset loading happen asynchronously while the game is running.

Had to zip the traces together because GitHub only accepts a few different file types: traces.zip

@james7132 (Member, Author) commented

@Elabajaba with #12988 merged, it should now show the spans for IO tasks like loading/preprocessing assets. Could you try collecting another trace? If you don't have time, I'll go ahead and check against the Bistro load that I tried before.

@Elabajaba (Contributor) commented

Bistro with compressed textures: [trace screenshot]

trace:
taskpoolconsolidation-IOspans-bistro.zip

@hymm (Contributor) commented Apr 18, 2024

The glTF loader is still using a scope to spawn some tasks. This is necessary since it's borrowing data for some of the tasks. We probably need a scope_blocking that spawns tasks on the blocking threads for this use case. Another possibility is to somehow throttle the number of tasks the asset loader is allowed to spawn at once.

edit: or maybe a Scope::spawn_blocking

@james7132 (Member, Author) commented

Some care needs to be taken to ensure that blocking tasks or scopes are not used from the main task pool, or it'll be trivially easy to block it, as seen here.

@JMS55 JMS55 self-requested a review April 27, 2024 06:41
@alice-i-cecile alice-i-cecile added D-Complex Quite challenging from either a design or technical perspective. Ask for help! X-Contentious There are nontrivial implications that should be thought through and removed X-Controversial There is active debate or serious implications around merging this PR labels May 16, 2024
@alice-i-cecile (Member) commented

@james7132 I don't think this makes sense for the 0.14 milestone in its current state. Removing, but feel free to argue with me if you disagree :)

@alice-i-cecile alice-i-cecile removed this from the 0.14 milestone May 19, 2024
Development

Successfully merging this pull request may close these issues.

Using ComputeTaskPool for a par_for_each query only uses half of available logical cores