
Dynamic Schedule stages #73

Closed · Anders429 opened this issue Mar 20, 2022 · 2 comments · Fixed by #167
Labels
A - Scheduling (Area: Parallel scheduling of systems.) · C - Enhancement (Category: New feature or request.) · C - Performance (Category: Related to performance.) · P - Low (Priority: Not particularly urgent.) · S - Needs Investigation (Status: Further investigation is needed.)

Comments

@Anders429 (Owner)

Currently, Schedule stages are defined at creation by simply partitioning systems based on their mutable and immutable claims. However, this is not optimal for, say, the schedule benchmark in the ecs_bench_suite, whose three systems mutably borrow (A, B), (C, D), and (C, E) respectively. The current implementation divides this into two stages, but it doesn't have to: the benchmark's input doesn't actually contain any table with all three of the C, D, and E components.
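To make this concrete, here is a minimal, self-contained sketch (in no way the library's actual API; Table, matches, and conflict are hypothetical stand-ins) showing that two claims which conflict at the component level may never actually touch the same table:

use std::collections::HashSet;

// Hypothetical: identify a table by the set of component names it stores.
type Table = HashSet<&'static str>;

// A query matches a table if the table contains every claimed component.
fn matches(claim: &[&'static str], table: &Table) -> bool {
    claim.iter().all(|component| table.contains(component))
}

// Two mutable claims conflict only if some table is matched by both queries.
fn conflict(a: &[&'static str], b: &[&'static str], tables: &[Table]) -> bool {
    tables.iter().any(|table| matches(a, table) && matches(b, table))
}

fn main() {
    // The tables from the benchmark input: none contains C, D, and E together.
    let tables: Vec<Table> = vec![
        ["A", "B"].into_iter().collect(),
        ["C", "D"].into_iter().collect(),
        ["C", "E"].into_iter().collect(),
    ];
    // (C, D) and (C, E) share a mutable claim on C, but no table is matched
    // by both queries, so the two systems never touch the same data.
    assert!(!conflict(&["C", "D"], &["C", "E"], &tables));
}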

Looking at the example, it's clear that we could run all three systems in parallel. Making generic rules for this is trickier, however. We need to keep a few things in mind when proceeding:

  • Query result order is not guaranteed.
  • System execution order is guaranteed.

Therefore, I think something like the following might work, at an extremely high level: for each system in the schedule, find all tables that can be processed during the current stage, meaning tables whose borrows don't overlap with a pending borrow from a previous stage in the schedule. The system is executed over exactly those tables during this stage, since we know there is no pending work on them that must complete before they can be run right now. We then proceed to the next stage and again attempt to run the systems, in order, over the remaining tables, repeating until no tables remain.
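As a rough sketch of that loop (hypothetical types throughout; all borrows are treated as mutable for simplicity, so immutable borrows that could safely share a table within a stage are not modeled):

use std::collections::HashSet;

type TableId = usize;

// `matched` holds, for each system in schedule order, the set of tables its
// query matches. Returns, per stage, the (system, table) pairs that can run
// in parallel during that stage.
fn build_stages(matched: &[HashSet<TableId>]) -> Vec<Vec<(usize, TableId)>> {
    let mut remaining = matched.to_vec();
    let mut stages = Vec::new();
    while remaining.iter().any(|tables| !tables.is_empty()) {
        let mut stage = Vec::new();
        // Tables already claimed during this stage. Visiting systems in
        // schedule order means an earlier system always wins a contested
        // table, preserving system execution order on any given table.
        let mut claimed = HashSet::new();
        for (system, tables) in remaining.iter_mut().enumerate() {
            let ready: Vec<TableId> = tables
                .iter()
                .copied()
                .filter(|table| !claimed.contains(table))
                .collect();
            for table in ready {
                tables.remove(&table);
                claimed.insert(table);
                stage.push((system, table));
            }
        }
        stages.push(stage);
    }
    stages
}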

This shifts the execution scheduling from being system-based to being table-based. I believe this should allow for faster execution, as it parallelizes as much computation as possible. In the ecs_bench_suite example, it would run all three systems at once, which should cut down execution time significantly.

I think it will be worth it: the overhead introduced by scheduling dynamically should be minimal compared to the actual iteration, so in most cases where Schedules are actually needed this should be a net performance gain, the scheduling cost being nothing compared to the saved iteration time.

More specific details still need to be determined, particularly how mutable and immutable borrows will differ and how the schedule will keep track of which tables have been modified. Additionally, consideration will need to be given to how post-processing of systems is handled, and whether it needs to be reworked.

@Anders429 added the C - Enhancement (Category: New feature or request.) label Mar 20, 2022
@Anders429 (Owner, Author)

More detailed overview:

Creating a schedule

Rather than creating a schedule with a Builder, a schedule can be created with a simple heterogeneous list creation macro. This actually cuts out a few modules and makes schedules match the patterns commonly used elsewhere in the library, which can simplify things. A schedule will now simply be defined as a heterogeneous list of tasks.

Tasks

Tasks will be an enum defined as follows:

pub enum Task<P, S> {
    System(S),
    ParSystem(P),
    Flush,
}

This is similar to how tasks are already defined. However, it should be noted that RawTasks are removed as an intermediate step due to the removal of the Builder struct.

Stages

Rather than stages being defined when a schedule is built, they are now defined at run-time, and can change between runs depending on what entity signatures are stored within the internal archetype tables. Stages are now defined as sets of tables to iterate over, rather than sets of systems.

The algorithm for creating stages is similar to the current stage-creation algorithm: starting at the first uncompleted task, add to the table queue every table that has not yet been processed and whose borrows do not conflict with borrows from previous tasks already added to the stage. If a flush task is reached, it marks the end of the stages before it and the beginning of the stages after it, since a flush is a pause for pre- and post-processing.
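A minimal sketch of the flush handling, repeating the Task enum above for self-containment (run_group is a hypothetical placeholder for the table-queue algorithm just described):

pub enum Task<P, S> {
    System(S),
    ParSystem(P),
    Flush,
}

// Hypothetical: build table-based stages for one flush-delimited group of
// tasks and execute them.
fn run_group<P, S>(_group: &[Task<P, S>]) {}

fn run_schedule<P, S>(tasks: &[Task<P, S>]) {
    // Each Flush hard-partitions the task list: everything before it,
    // including post-processing, completes before anything after it starts.
    for group in tasks.split(|task| matches!(task, Task::Flush)) {
        run_group(group);
    }
}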

Systems

There is one glaring issue, however: systems are not structured to run on a per-table basis. A system takes a result::Iter, which doesn't work well when we're iterating over only a subset of the data at a time.

To resolve this, systems could be changed to instead have three methods: pre_processing(), run(), and post_processing(), where run() now takes a single result!() instead of an iterator of results. The pre_processing() and post_processing() methods will be run before any calls to run() are made and after all calls to run() are made, respectively.
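A hedged sketch of what that trait could look like; the method names follow the outline above, but the associated Data type (standing in for a single result!() row) and the default method bodies are illustrative only:

pub trait System {
    // Stand-in for the single `result!()` a call to `run` receives.
    type Data<'a>;

    // Run once before any calls to `run` in a schedule pass.
    fn pre_processing(&mut self) {}

    // Run once per query result, in order, possibly spread across stages.
    fn run(&mut self, result: Self::Data<'_>);

    // Run once after every call to `run` in a schedule pass has completed.
    fn post_processing(&mut self) {}
}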

An open question is whether this limits what Systems can do. I believe almost every possible usage would fall within that outline, unless there is some desire not to iterate over every value sequentially.

An alternative would be to keep systems iterator-based rather than per-table, while still basing the scheduling on the tables in the world at the time of execution. This would save us from needing to redo the entire system interface, while still improving efficiency and cleaning up the API.

Example

// Assuming `System` `Foo` and `ParSystem` `Bar` and a `World` `world`. 

// Creates a heterogeneous list.
let schedule = schedule!(Task::System(Foo), Task::ParSystem(Bar));

// The stages are determined at run-time
world.run_schedule(schedule);

Future work: stage caching

Something to consider is that stages will often be the same between runs, even though they are created at run-time on every iteration. Some solution should be investigated for caching the stages created for each schedule run so that they can be reused if the table make-up of the world has not changed since the last run. That cache could be stored directly within the world and keyed on the type id of the schedule, since each schedule's type will be unique.
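A sketch of that cache under two assumptions not in the library today: a hypothetical Stages value representing a computed stage layout, and a world-side "table generation" counter that increments whenever a table is added or removed:

use std::any::TypeId;
use std::collections::{hash_map::Entry, HashMap};

struct Stages; // Hypothetical: a computed stage layout.

struct StageCache {
    // Keyed by the schedule's unique type; the u64 is the table generation
    // at which the cached stages were built.
    cached: HashMap<TypeId, (u64, Stages)>,
}

impl StageCache {
    fn get_or_build<Schedule: 'static>(
        &mut self,
        table_generation: u64,
        build: impl FnOnce() -> Stages,
    ) -> &Stages {
        match self.cached.entry(TypeId::of::<Schedule>()) {
            Entry::Occupied(mut entry) => {
                // Rebuild only if the world's table make-up has changed.
                if entry.get().0 != table_generation {
                    *entry.get_mut() = (table_generation, build());
                }
                &entry.into_mut().1
            }
            Entry::Vacant(entry) => &entry.insert((table_generation, build())).1,
        }
    }
}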

@Anders429 added the S - Needs Investigation (Status: Further investigation is needed.), C - Performance (Category: Related to performance.), P - Low (Priority: Not particularly urgent.), and A - Scheduling (Area: Parallel scheduling of systems.) labels Sep 15, 2022
@Anders429 (Owner, Author)

A possible alternative idea is to work at the stage level to determine whether stages can be parallelized at run-time. When a stage is run, the archetype ids of the archetypes touched by that stage could be stored in a set. The next stage could then be queried for its archetypes, and if there is no overlap, that stage could be run in parallel, or "promoted".
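The promotion check itself is a simple disjointness test over archetype-id sets; a minimal sketch, with ArchetypeId as a hypothetical stand-in:

use std::collections::HashSet;

type ArchetypeId = u64;

// A stage can be "promoted" to run in parallel with the stage before it when
// the two stages touch disjoint sets of archetypes.
fn can_promote(previous: &HashSet<ArchetypeId>, next: &HashSet<ArchetypeId>) -> bool {
    previous.is_disjoint(next)
}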

This would only work at the stage level, which means that individual tasks wouldn't themselves be promoted if other tasks in their stage cause overlap. Perhaps that could be future work: allowing promotion of individual tasks within a stage.

The other thing to consider is how this affects flushing at the end of a stage. If a stage runs in parallel with the one before it, then the flushing will occur only after both stages have run, along with all post-processing. This could create some surprising behavior for users who expect post-processing to happen immediately after the stage runs (although maybe the whole post-processing setup should be revisited altogether, and it should be determined what is actually guaranteed about system post-processing in the first place?).
