New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement thread-safe ECS #10

Closed
ebkalderon opened this Issue Jan 19, 2016 · 84 comments

Comments

Projects
@ebkalderon
Member

ebkalderon commented Jan 19, 2016

With game state management mostly stable, the next step is to implement an entity-component-system framework in the engine. This framework must follow the pure ECS model, in which entities are just IDs and components are POD structs.

Though the ECS model is extremely flexible and easy to data-drive, it has its shortcomings:

  • Difficult to cache important data
  • Difficult to parallelize, heavy reliance on mutable state

In a useful Reddit post, tomaka describes an alternative model similar to MVC with some parallelism. I propose a combination of the two designs resulting in a thread-safe ECS.

(Quoted from the other announcements section in TWIA Issue 1)

Clearly, the World struct containing our entities and components is analogous to the model in MVC. But systems typically require mutable state even for strictly read-only operations, so they don’t fit into either views or controllers, nor are they easy to parallelize.

However, there are two possible ways to remedy this issue.

  1. Systems are split into two separate traits: Updaters and Viewers. The Viewers only require read access to the World, so they can run in parallel. Updaters request mutable access to the World and submit a list of changes they want to make to the content. These changes are applied serially according to a predetermined priority.
  2. Systems are kept as one unit (perhaps renamed to “Processors” for clarity), and they all read the World struct in parallel but can optionally submit changes to the World, which are applied serially once everyone has finished reading.

Experiments are ongoing on the ecs branch.

@ebkalderon ebkalderon added this to the 1.0 milestone Jan 19, 2016

@ebkalderon ebkalderon referenced this issue Jan 19, 2016

Closed

Redesign GPU resource management #7

3 of 16 tasks complete
@rphmeier

This comment has been minimized.

Show comment
Hide comment
@rphmeier

rphmeier Jan 19, 2016

You may be interested in my (very early stages, very experimental) similar project here:
https://github.com/rphmeier/rust_toy_engine/tree/master/src/ecs

I've done quite a bit of thinking about parallelizing processors (as I call them) over the past few months, and I've arrived at the following conclusions:

Processors will often have a hierarchy of priority, where certain processors must be run before others.
Processors which do not conflict (i.e. do not access any of the same data) can be run concurrently with no issue. Most general processing can be split up into a "read" phase and a "write" phase.

From this I came up with the following design:

  • Internal game state (entity manager, component data blobs) is wrapped in an RwLock.
  • Every processor is passed a WorldHandle, which consists of a reference to the locked state as well as a handle to a thread pool.
  • The WorldHandle can be used to iterate over all entities with a given set of components either asynchronously or synchronously, with only read access to their data.
  • This iteration will internally produce a queue of actions, which can be consumed during the processor's write phase.

I am still in the middle of implementing the parallel iteration, as well as the MPSC queue, but the intent
is to have a standard processor look like this:

// The particle component this uses.
struct Particle {
    pos: Position,
    effect: Effect,
    frames_left: u32,
}

impl Component for Particle {}

struct ParticleProcessor {
    graphics: Graphics,
}
impl Processor for ParticleProcessor {
    fn process<'a, 'b, C: Components + 'a>(&'a mut self, world: WorldHandle<'a, 'b, C>) {
        enum Action {
            // draw the effect and decrement the frame count
            Decrement(Entity),
            // destroy the entity.
            Destroy(Entity)
        }

        // read phase is here -- no writes will occur while this is processing.
        let actions = world.all_with::<Particle>().for_each_async(|e| {
            // entities here are guaranteed to have the particle component.
            let part = world.get_component::<Particle>(e).unwrap();
            if part.frames_left != 0 {
                Action::Decrement(e),
            } else {
                Action::Destroy(e),
            }
        });

        // write phase is here. get exclusive access to the world.
        let mut write = world.write();

        for action in actions {
            match action {
                Action::Decrement(e) => {
                    let mut part = write.get_mut_component::<Particle>(e).unwrap();
                    part.frames -= 1;
                    draw_particle(&mut self.graphics, part.pos, part.effect);
                }
                Action::Destroy(e) => {
                    write.destroy_entity(e);
                }
            }
        }
    }
}

From this example we can see that there are a few different degrees of parallelism to processors.
Some can perform their reading phase asynchronously and then are required to do some work on a specific thread (graphics processors generally come to mind). Some will be able to do neither phase in parallel. Hopefully most will be capable of running purely in parallel with other processors, locking only to get exclusive access to the state.

Although the example here is somewhat contrived, the main idea is that as much work should be done
during the read phase as possible, deferring updates until the later write phase. Once a processor is ready to enter its write phase, it should be very quick to simply apply its updates to the state and finish up.
Actual execution of processors looks something like this:

let mut my_world = ...;

my_world.process(|ctxt| {
    // all processors executed in this group will execute purely asynchonously, except for
    // contention during their write phases.
    ctxt.process_group(|group| {
        group.process(&mut A);
        group.process(&mut B);
        group.process(&mut C);
    });
    // execution returns here only once all grouped processors have completely finished,
    // guaranteeing an order of events.

    // these processors may use some parallelism internally,
    // but are not able to be sent to other threads to execute.
    ctxt.process_sync(&mut D);
    ctxt.process_sync(&mut E);

    // E is only run once D is fully complete.
});

All this being said, it seems like your initial reaction has similar elements to mine, but it rather focuses on a priority of events rather than a priority of processors.

rphmeier commented Jan 19, 2016

You may be interested in my (very early stages, very experimental) similar project here:
https://github.com/rphmeier/rust_toy_engine/tree/master/src/ecs

I've done quite a bit of thinking about parallelizing processors (as I call them) over the past few months, and I've arrived at the following conclusions:

Processors will often have a hierarchy of priority, where certain processors must be run before others.
Processors which do not conflict (i.e. do not access any of the same data) can be run concurrently with no issue. Most general processing can be split up into a "read" phase and a "write" phase.

From this I came up with the following design:

  • Internal game state (entity manager, component data blobs) is wrapped in an RwLock.
  • Every processor is passed a WorldHandle, which consists of a reference to the locked state as well as a handle to a thread pool.
  • The WorldHandle can be used to iterate over all entities with a given set of components either asynchronously or synchronously, with only read access to their data.
  • This iteration will internally produce a queue of actions, which can be consumed during the processor's write phase.

I am still in the middle of implementing the parallel iteration, as well as the MPSC queue, but the intent
is to have a standard processor look like this:

// The particle component this uses.
struct Particle {
    pos: Position,
    effect: Effect,
    frames_left: u32,
}

impl Component for Particle {}

struct ParticleProcessor {
    graphics: Graphics,
}
impl Processor for ParticleProcessor {
    fn process<'a, 'b, C: Components + 'a>(&'a mut self, world: WorldHandle<'a, 'b, C>) {
        enum Action {
            // draw the effect and decrement the frame count
            Decrement(Entity),
            // destroy the entity.
            Destroy(Entity)
        }

        // read phase is here -- no writes will occur while this is processing.
        let actions = world.all_with::<Particle>().for_each_async(|e| {
            // entities here are guaranteed to have the particle component.
            let part = world.get_component::<Particle>(e).unwrap();
            if part.frames_left != 0 {
                Action::Decrement(e),
            } else {
                Action::Destroy(e),
            }
        });

        // write phase is here. get exclusive access to the world.
        let mut write = world.write();

        for action in actions {
            match action {
                Action::Decrement(e) => {
                    let mut part = write.get_mut_component::<Particle>(e).unwrap();
                    part.frames -= 1;
                    draw_particle(&mut self.graphics, part.pos, part.effect);
                }
                Action::Destroy(e) => {
                    write.destroy_entity(e);
                }
            }
        }
    }
}

From this example we can see that there are a few different degrees of parallelism to processors.
Some can perform their reading phase asynchronously and then are required to do some work on a specific thread (graphics processors generally come to mind). Some will be able to do neither phase in parallel. Hopefully most will be capable of running purely in parallel with other processors, locking only to get exclusive access to the state.

Although the example here is somewhat contrived, the main idea is that as much work should be done
during the read phase as possible, deferring updates until the later write phase. Once a processor is ready to enter its write phase, it should be very quick to simply apply its updates to the state and finish up.
Actual execution of processors looks something like this:

let mut my_world = ...;

my_world.process(|ctxt| {
    // all processors executed in this group will execute purely asynchonously, except for
    // contention during their write phases.
    ctxt.process_group(|group| {
        group.process(&mut A);
        group.process(&mut B);
        group.process(&mut C);
    });
    // execution returns here only once all grouped processors have completely finished,
    // guaranteeing an order of events.

    // these processors may use some parallelism internally,
    // but are not able to be sent to other threads to execute.
    ctxt.process_sync(&mut D);
    ctxt.process_sync(&mut E);

    // E is only run once D is fully complete.
});

All this being said, it seems like your initial reaction has similar elements to mine, but it rather focuses on a priority of events rather than a priority of processors.

@ebkalderon

This comment has been minimized.

Show comment
Hide comment
@ebkalderon

ebkalderon Jan 20, 2016

Member

Great information! Seems like our approaches are indeed pretty similar, for the most part. For further comparison, here is an academic paper called A Concurrent Component-Based Entity Architecture for Game Development (2015). It arrives at a similar conclusion to ours:

  1. Process: Equivalent to our "Processors".
  2. Past State: Refers to the read-only world state from the previous frame read by Processes.
  3. Future State: Refers to the modified world state being generated by the Processes during the current frame.

If possible, I prefer a lockless approach to parallelism, taking advantage of the Rust ownership system to catch data races at compile time, rather than at runtime with a mutex or similar locking data structure. However, I am open to using locks if concrete benchmarks prove better performance.

Member

ebkalderon commented Jan 20, 2016

Great information! Seems like our approaches are indeed pretty similar, for the most part. For further comparison, here is an academic paper called A Concurrent Component-Based Entity Architecture for Game Development (2015). It arrives at a similar conclusion to ours:

  1. Process: Equivalent to our "Processors".
  2. Past State: Refers to the read-only world state from the previous frame read by Processes.
  3. Future State: Refers to the modified world state being generated by the Processes during the current frame.

If possible, I prefer a lockless approach to parallelism, taking advantage of the Rust ownership system to catch data races at compile time, rather than at runtime with a mutex or similar locking data structure. However, I am open to using locks if concrete benchmarks prove better performance.

@rphmeier

This comment has been minimized.

Show comment
Hide comment
@rphmeier

rphmeier Jan 20, 2016

@ebkalderon Thanks for the paper. I haven't seen it yet, but it seems promising. I am wary of the idea of using N "past" states and a "future" state -- this necessarily means copying the game state N times per frame to different points in memory. Of course, there are situations like networking where copying game state, or part of it, is necessary.

One issue I have with the paper is the scheduling aspect of processors. The author chooses to assign each processor purely to one thread, using a task-parallel model rather than a data-parallel one. A large portion of the scheduling section is concessions that processor time is unpredictable, and there is likely to be a decent amount of wasted CPU time due to that unpredictability. I favor a work-stealing approach, where parallelizable processors simply decide which data they need, how to process it, and let all the threads work as fast as they can in order to do that. There is no excess time spent on scheduling or waiting for overloaded threads to finish.

I also am not completely convinced that purely lock-free is even viable here: using your model, at some the threads must contend for access to the MPSC queue which produces component updates. It might be possible to implement such a queue in a lock-free way, but consider that the costs of locking an unlocked mutex are vastly overestimated (often not much more than an atomic compare-and-swap).

The key to any multithreaded ECS is idempotence. The same state S0 passed through the same processors should always lead to an identical output state S1. My initial approach was similar to yours, where every processor produced some amount of changes to the state, to be applied later. I toyed with a concept of "commutative" and "non-commutative" changes which could be made to component data -- these are strangely similar to Rust's borrowing system:

  • you could either make N commutative updates to an instance of a component, or
  • you could have one single non-commutative update to the instance.

To break these rules would lead to a runtime error. The implementation of this design led a few glaring issues:

  • defining updates to components at the type level was unreasonably verbose
  • it was difficult to defer changes until post-processing efficiently

One data model I toyed with for complete concurrency was giving every component type its own data buffer, rather than storing all an entity's component data together in a block. If you protect each of these buffers with a mutex of its own, processors which do not access the same component data can interleave with zero contention. The caveat is that the entity manager is a resource which many processors will contend for access to, so it became the bottleneck.

I am curious what your mental model for the types of updates is currently. How does the user specify which component types the world manages and what order updates should be applied in? What type do updates have? Where are they stored? How are they iterated over to be applied?

rphmeier commented Jan 20, 2016

@ebkalderon Thanks for the paper. I haven't seen it yet, but it seems promising. I am wary of the idea of using N "past" states and a "future" state -- this necessarily means copying the game state N times per frame to different points in memory. Of course, there are situations like networking where copying game state, or part of it, is necessary.

One issue I have with the paper is the scheduling aspect of processors. The author chooses to assign each processor purely to one thread, using a task-parallel model rather than a data-parallel one. A large portion of the scheduling section is concessions that processor time is unpredictable, and there is likely to be a decent amount of wasted CPU time due to that unpredictability. I favor a work-stealing approach, where parallelizable processors simply decide which data they need, how to process it, and let all the threads work as fast as they can in order to do that. There is no excess time spent on scheduling or waiting for overloaded threads to finish.

I also am not completely convinced that purely lock-free is even viable here: using your model, at some the threads must contend for access to the MPSC queue which produces component updates. It might be possible to implement such a queue in a lock-free way, but consider that the costs of locking an unlocked mutex are vastly overestimated (often not much more than an atomic compare-and-swap).

The key to any multithreaded ECS is idempotence. The same state S0 passed through the same processors should always lead to an identical output state S1. My initial approach was similar to yours, where every processor produced some amount of changes to the state, to be applied later. I toyed with a concept of "commutative" and "non-commutative" changes which could be made to component data -- these are strangely similar to Rust's borrowing system:

  • you could either make N commutative updates to an instance of a component, or
  • you could have one single non-commutative update to the instance.

To break these rules would lead to a runtime error. The implementation of this design led a few glaring issues:

  • defining updates to components at the type level was unreasonably verbose
  • it was difficult to defer changes until post-processing efficiently

One data model I toyed with for complete concurrency was giving every component type its own data buffer, rather than storing all an entity's component data together in a block. If you protect each of these buffers with a mutex of its own, processors which do not access the same component data can interleave with zero contention. The caveat is that the entity manager is a resource which many processors will contend for access to, so it became the bottleneck.

I am curious what your mental model for the types of updates is currently. How does the user specify which component types the world manages and what order updates should be applied in? What type do updates have? Where are they stored? How are they iterated over to be applied?

@ebkalderon

This comment has been minimized.

Show comment
Hide comment
@ebkalderon

ebkalderon Jan 20, 2016

Member

Thanks for the response! Just to clarify, there is no copying of the world being done. There is only the existing world state and the future world state being computed by the processors. The reason why I contend that my proposal could be made lock-free is because it is task-parallel (AFAIK, a performance disadvantage) rather than data-parallel, with discrete parallel-read and serial-write stages, where the processors' writing priority is determined by the order in which they were passed into the Application through a builder pattern.

That said, I appreciate you explaining your position. The data-parallel design you described seems sound and probably more efficient (concrete benchmarks would have to be made to compare both to be sure). I welcome more feedback on both approaches, or other alternatives, from other participants before the final decision is made.

Member

ebkalderon commented Jan 20, 2016

Thanks for the response! Just to clarify, there is no copying of the world being done. There is only the existing world state and the future world state being computed by the processors. The reason why I contend that my proposal could be made lock-free is because it is task-parallel (AFAIK, a performance disadvantage) rather than data-parallel, with discrete parallel-read and serial-write stages, where the processors' writing priority is determined by the order in which they were passed into the Application through a builder pattern.

That said, I appreciate you explaining your position. The data-parallel design you described seems sound and probably more efficient (concrete benchmarks would have to be made to compare both to be sure). I welcome more feedback on both approaches, or other alternatives, from other participants before the final decision is made.

@HeroesGrave

This comment has been minimized.

Show comment
Hide comment
@HeroesGrave

HeroesGrave Feb 2, 2016

Avoid trying to handle process scheduling. If any kind of scheduling is required by the final design, provide the tools to allow the user of the library to do it.

They'll be the ones that are able to properly benchmark different schedules in order to get the best performance for their particular game.

HeroesGrave commented Feb 2, 2016

Avoid trying to handle process scheduling. If any kind of scheduling is required by the final design, provide the tools to allow the user of the library to do it.

They'll be the ones that are able to properly benchmark different schedules in order to get the best performance for their particular game.

@kvark

This comment has been minimized.

Show comment
Hide comment
@kvark

kvark Feb 2, 2016

Member

@HeroesGrave nice to see you here!
I don't think your advice applies 100% in this context. I understand why you'd want to avoid scheduling in ecs-rs, since it's purely an ECS library. Amethyst, on the other hand, is a game engine, so it might as well take the responsibility of scheduling the systems.

Member

kvark commented Feb 2, 2016

@HeroesGrave nice to see you here!
I don't think your advice applies 100% in this context. I understand why you'd want to avoid scheduling in ecs-rs, since it's purely an ECS library. Amethyst, on the other hand, is a game engine, so it might as well take the responsibility of scheduling the systems.

@mangecoeur

This comment has been minimized.

Show comment
Hide comment
@mangecoeur

mangecoeur Feb 9, 2016

Some thoughts from my attempts to parallelize a Flocking simulation:

  • if two systems affect the same entity properties (e.g. position) they must run in a deterministic order, and so cannot be run in parallel on the same entity. If one system needs access to all entities to do its thing, that precludes any other system from running in parallel, and limits the amount of paralism the current system can do, because it must avoid changing the world that it is using as an input to calculate it's next update.
  • In the flocking example, acceleration vectors for each entitiy depends on the locations, velocities of neighbours - if you were to iterate-and-write you change positions, which messes with the simulation. You need to access all entites in order to calculate the update for each entitiy. One crude solution is to have two copies of the entities and switch between them - one is read-only and the other write-only, and you 'blit' them for each update. You can parallelize as much as you like, since you know you only ever write to one boid at a time and you can have shared reads. You could even simulate several frames in advance if you like and invalidate-on-input. You can optimize using spatial data structures, which could also use parallel builders.
  • From the flocking example, I feel that parallelism should be left up to the System implementations, because they know much more about what they are trying to do (flocking, physics, etc) and therefore can be much smarter about how (or if) to run it in parallel - including doing fancy things like pushing physics work to the GPU.
  • The job of the ECS would be to make writing parallel systems easier and more memory efficient. For example, for flocking you only need access to the position and velocity vectors. The ECS could let you "borrow" these from the World and they would have much lower memory footprint than copies of the whole world would.
  • In addition, this would enable parallelism for Systems that are guaranteed not to affect the same entity components. For instance one system could affect only the "Color" components of the World's Entities, while another the "Position" components. Using Rust's borrowing system, it could guarantee that nothing is trying write to both in parallel.
  • End result: ECS runs Systems in parallel that can be statically guaranteed not to tread on each others toes. Systems that cannot run in parallel run one after the other, but internally can use memory-efficient access to borrowed properties to parallelize their internal workings.

mangecoeur commented Feb 9, 2016

Some thoughts from my attempts to parallelize a Flocking simulation:

  • if two systems affect the same entity properties (e.g. position) they must run in a deterministic order, and so cannot be run in parallel on the same entity. If one system needs access to all entities to do its thing, that precludes any other system from running in parallel, and limits the amount of paralism the current system can do, because it must avoid changing the world that it is using as an input to calculate it's next update.
  • In the flocking example, acceleration vectors for each entitiy depends on the locations, velocities of neighbours - if you were to iterate-and-write you change positions, which messes with the simulation. You need to access all entites in order to calculate the update for each entitiy. One crude solution is to have two copies of the entities and switch between them - one is read-only and the other write-only, and you 'blit' them for each update. You can parallelize as much as you like, since you know you only ever write to one boid at a time and you can have shared reads. You could even simulate several frames in advance if you like and invalidate-on-input. You can optimize using spatial data structures, which could also use parallel builders.
  • From the flocking example, I feel that parallelism should be left up to the System implementations, because they know much more about what they are trying to do (flocking, physics, etc) and therefore can be much smarter about how (or if) to run it in parallel - including doing fancy things like pushing physics work to the GPU.
  • The job of the ECS would be to make writing parallel systems easier and more memory efficient. For example, for flocking you only need access to the position and velocity vectors. The ECS could let you "borrow" these from the World and they would have much lower memory footprint than copies of the whole world would.
  • In addition, this would enable parallelism for Systems that are guaranteed not to affect the same entity components. For instance one system could affect only the "Color" components of the World's Entities, while another the "Position" components. Using Rust's borrowing system, it could guarantee that nothing is trying write to both in parallel.
  • End result: ECS runs Systems in parallel that can be statically guaranteed not to tread on each others toes. Systems that cannot run in parallel run one after the other, but internally can use memory-efficient access to borrowed properties to parallelize their internal workings.
@Oflor

This comment has been minimized.

Show comment
Hide comment
@Oflor

Oflor Feb 9, 2016

Contributor

Isn't there a way to implement systems which do not affect (write to) the same components?

Contributor

Oflor commented Feb 9, 2016

Isn't there a way to implement systems which do not affect (write to) the same components?

@ebkalderon

This comment has been minimized.

Show comment
Hide comment
@ebkalderon

ebkalderon Feb 10, 2016

Member

@Oflor Honestly, that is going to be difficult. Most notably: the physics processor, the particle processor, and the animation processor all modify entities' transform data. One could break down the components into ever-more granular and specific pieces, but this introduces a host of unnecessary complexities (having multiple transform components, syncing duplicate data between them, etc).

I personally prefer to establish and enforce strict order dependencies between processors, either similar to @mangecoeur's or @rphmeier's approaches, or according to my original proposal.

Member

ebkalderon commented Feb 10, 2016

@Oflor Honestly, that is going to be difficult. Most notably: the physics processor, the particle processor, and the animation processor all modify entities' transform data. One could break down the components into ever-more granular and specific pieces, but this introduces a host of unnecessary complexities (having multiple transform components, syncing duplicate data between them, etc).

I personally prefer to establish and enforce strict order dependencies between processors, either similar to @mangecoeur's or @rphmeier's approaches, or according to my original proposal.

@ebkalderon

This comment has been minimized.

Show comment
Hide comment
@ebkalderon

ebkalderon Feb 19, 2016

Member

The user-facing API design of the ECS has been drafted on the ECS Design page on the wiki. Please leave your feedback here or on Gitter!

Edit: Updated link.

Member

ebkalderon commented Feb 19, 2016

The user-facing API design of the ECS has been drafted on the ECS Design page on the wiki. Please leave your feedback here or on Gitter!

Edit: Updated link.

@mangecoeur

This comment has been minimized.

Show comment
Hide comment
@mangecoeur

mangecoeur Feb 25, 2016

Just had a look at the proposed usage. One thing I'm not keen on is that the proposed way of adding Processors doesn't make it sufficiently explicit that they will run in the order they are added, so that something like this:

.with(Input::new())
.with(Physics::new())

will not necessarily have the same result as this:

.with(Physics::new())
.with(Input::new())

Since there was some emphasis on making everything predictable and deterministic I think the its important that this is really clear.

Some alternative suggestions

  • More explicit naming might help, like .append_processor(Physics::new()).
  • Create a vector of processors or processor references and add them, something like:
processors = vec!{Physics::new(), Input::new()}
world.add_processors(processors)
  • Create a list of processors using a macro (since vec might not be work as desired)
processors = processors_group!{Physics::new(), Input::new()}
world.add_processors(processors)

mangecoeur commented Feb 25, 2016

Just had a look at the proposed usage. One thing I'm not keen on is that the proposed way of adding Processors doesn't make it sufficiently explicit that they will run in the order they are added, so that something like this:

.with(Input::new())
.with(Physics::new())

will not necessarily have the same result as this:

.with(Physics::new())
.with(Input::new())

Since there was some emphasis on making everything predictable and deterministic I think the its important that this is really clear.

Some alternative suggestions

  • More explicit naming might help, like .append_processor(Physics::new()).
  • Create a vector of processors or processor references and add them, something like:
processors = vec!{Physics::new(), Input::new()}
world.add_processors(processors)
  • Create a list of processors using a macro (since vec might not be work as desired)
processors = processors_group!{Physics::new(), Input::new()}
world.add_processors(processors)
@mangecoeur

This comment has been minimized.

Show comment
Hide comment
@mangecoeur

mangecoeur Feb 25, 2016

Also a question: how are Processors associated with the Components that they need to process?

mangecoeur commented Feb 25, 2016

Also a question: how are Processors associated with the Components that they need to process?

@lschmierer

This comment has been minimized.

Show comment
Hide comment
@lschmierer

lschmierer Mar 10, 2016

Member

Since there was some emphasis on making everything predictable and deterministic I think the its important that this is really clear.

Since we want to run our ECS multi-threaded, will this order be used for the sequential writes to world and reading is allways simultaneously?

Or do we want to control which systems are run in parallel?
Therefore I would suggest something like this:

SimulationBuilder()::new()
    .append(Rendering::new())
    .append_parallel(Physics::new(), Ui::new())
    .finalize()

Also a question: how are Processors associated with the Components that they need to process?

I think processors simply get a reference to the world and query the components they need to access.

Proposal for the API of the world:

impl World {
    /// Creates a new empty world.
    pub fn new() -> World;

    /// Creates a new entity in the world and returns a handle to it.
    pub fn create_entity(&mut self) -> Entity;

    /// Destroys a given entity and removes its components.
    pub fn destroy_entity(&mut self, entity: Entity);

    /// Attaches a component to an entity.
    pub fn insert_component<T: Any>(&mut self, entity: Entity, comp: T);

    /// Remove a component from an entity.
    pub fn remove_component<T: Any>(&mut self, entity: Entity);

    /// Get component by entity.
    pub fn component<T: Any>(&mut self, entity: Entity) -> Option<&T>;

    pub fn component_mut<T: Any>(&mut self, entity: Entity) -> Option<&mut T>;

    /// Iterate over components by type.
    pub fn components1<A: Any>(&self)-> Iterator<(Entity, &A)>;
    pub fn components2<A: Any, B: Any>(&self)-> Iterator<(Entity, &A, &B)>;
    pub fn components3<A: Any, B: Any, C: Any>(&self)-> Iterator<(Entity, &A, &B, &C)>;
    // some more...

    pub fn components1_mut<A: Any>(&self)-> Iterator<(Entity, &mut A)>;
    pub fn components2_mut<A: Any, B: Any>(&self)-> Iterator<(Entity, &mut A, &mut B)>;
    pub fn components3_mut<A: Any, B: Any, C: Any>(&self)-> Iterator<(Entity, &mut A, &mut B, &mut C)>;
    // some more...

}
Member

lschmierer commented Mar 10, 2016

Since there was some emphasis on making everything predictable and deterministic I think the its important that this is really clear.

Since we want to run our ECS multi-threaded, will this order be used for the sequential writes to world and reading is allways simultaneously?

Or do we want to control which systems are run in parallel?
Therefore I would suggest something like this:

SimulationBuilder()::new()
    .append(Rendering::new())
    .append_parallel(Physics::new(), Ui::new())
    .finalize()

Also a question: how are Processors associated with the Components that they need to process?

I think processors simply get a reference to the world and query the components they need to access.

Proposal for the API of the world:

impl World {
    /// Creates a new empty world.
    pub fn new() -> World;

    /// Creates a new entity in the world and returns a handle to it.
    pub fn create_entity(&mut self) -> Entity;

    /// Destroys a given entity and removes its components.
    pub fn destroy_entity(&mut self, entity: Entity);

    /// Attaches a component to an entity.
    pub fn insert_component<T: Any>(&mut self, entity: Entity, comp: T);

    /// Remove a component from an entity.
    pub fn remove_component<T: Any>(&mut self, entity: Entity);

    /// Get component by entity.
    pub fn component<T: Any>(&mut self, entity: Entity) -> Option<&T>;

    pub fn component_mut<T: Any>(&mut self, entity: Entity) -> Option<&mut T>;

    /// Iterate over components by type.
    pub fn components1<A: Any>(&self)-> Iterator<(Entity, &A)>;
    pub fn components2<A: Any, B: Any>(&self)-> Iterator<(Entity, &A, &B)>;
    pub fn components3<A: Any, B: Any, C: Any>(&self)-> Iterator<(Entity, &A, &B, &C)>;
    // some more...

    pub fn components1_mut<A: Any>(&self)-> Iterator<(Entity, &mut A)>;
    pub fn components2_mut<A: Any, B: Any>(&self)-> Iterator<(Entity, &mut A, &mut B)>;
    pub fn components3_mut<A: Any, B: Any, C: Any>(&self)-> Iterator<(Entity, &mut A, &mut B, &mut C)>;
    // some more...

}
@kvark

This comment has been minimized.

Show comment
Hide comment
@kvark

kvark Mar 10, 2016

Member

(sorry if I missed something in this thread, not studied it through)

ctxt.process_group(|group| {
        group.process(&mut A);
        group.process(&mut B);
        group.process(&mut C);
    });

and:

    .append_parallel(Physics::new(), Ui::new())

Both of these approaches enforce a fork-join model on the API level, which imposes a tight upper bound on the performance you can squeeze out of a parallel system, since chooses a rather tiny subset of all possible execution graphs. For example, what if Ui doesn't need to wait for the previous operation while Physics does?

Optimally, each system just specifies the dependencies for it to start, and then some sort of a scheduler figures out when to start it. We did some experiments with @csherratt on fibe-rs, you could use its example for inspiration:

    let ha = task(move |_| {print!("Hello, ")}).start(&mut front);
    task(move |_| {println!("World!")}).after(ha.signal()).start(&mut front);

I think processors simply get a reference to the world and query the components they need to access.

If you don't explicitly expose which components are being read, and which are modified per system, you don't provide the scheduler an opportunity to figure out the proper timeline for their execution.

One could specify the systems directly as dependencies, or build the dependency graph judging by which components are getting modified and read. The latter would work well if you could actually enforce the fact that these and only these components are what each system receives for processing.

Member

kvark commented Mar 10, 2016

(sorry if I missed something in this thread, not studied it through)

ctxt.process_group(|group| {
        group.process(&mut A);
        group.process(&mut B);
        group.process(&mut C);
    });

and:

    .append_parallel(Physics::new(), Ui::new())

Both of these approaches enforce a fork-join model on the API level, which imposes a tight upper bound on the performance you can squeeze out of a parallel system, since chooses a rather tiny subset of all possible execution graphs. For example, what if Ui doesn't need to wait for the previous operation while Physics does?

Optimally, each system just specifies the dependencies for it to start, and then some sort of a scheduler figures out when to start it. We did some experiments with @csherratt on fibe-rs, you could use its example for inspiration:

    let ha = task(move |_| {print!("Hello, ")}).start(&mut front);
    task(move |_| {println!("World!")}).after(ha.signal()).start(&mut front);

I think processors simply get a reference to the world and query the components they need to access.

If you don't explicitly expose which components are being read, and which are modified per system, you don't provide the scheduler an opportunity to figure out the proper timeline for their execution.

One could specify the systems directly as dependencies, or build the dependency graph judging by which components are getting modified and read. The latter would work well if you could actually enforce the fact that these and only these components are what each system receives for processing.

@Oflor

This comment has been minimized.

Show comment
Hide comment
@Oflor

Oflor Mar 11, 2016

Contributor

Paralleling over systems might be an overkill if we already parallel over entities. There are going to be more entities than systems (if not, we should be able run it in a single thread anyway).

Specifying accessed components requires dealing with trait-objects and TypeIds (I think), and that is already less efficient than compile-time system managers.

Contributor

Oflor commented Mar 11, 2016

Paralleling over systems might be an overkill if we already parallel over entities. There are going to be more entities than systems (if not, we should be able run it in a single thread anyway).

Specifying accessed components requires dealing with trait-objects and TypeIds (I think), and that is already less efficient than compile-time system managers.

@mangecoeur

This comment has been minimized.

Show comment
Hide comment
@mangecoeur

mangecoeur Mar 11, 2016

Specifying accessed components requires dealing with trait-objects and TypeIds (I think), and that is already less efficient than compile-time system managers.

Wouldn't this require that Components should be defined by implementing a trait? That would make it difficult to create game objects by loading their definitions from declarative data files (would would be forced to define all entities in rust source code).

If you don't explicitly expose which components are being read, and which are modified per system, you don't provide the scheduler an opportunity to figure out the proper timeline for their execution.

I think this is important - even if it does mean messing with TypeIds, it opens up a lot of doors for speeding up execution, which should offset the extra complexity.

Another thought - it may be possible to perform these checks at compile time using compiler plugins. Declarative game object definitions could be read in at compile time (instead of at runtime) to allow these checks to be made. It may even be possible to build re-usable machinery that can either perform the checks at runtime or at compile time, so that while you prototype you can edit entities dynamically without waiting for a re-compile every time, but when you move to release you could bake everything in. Just an idea...

mangecoeur commented Mar 11, 2016

Specifying accessed components requires dealing with trait-objects and TypeIds (I think), and that is already less efficient than compile-time system managers.

Wouldn't this require that Components should be defined by implementing a trait? That would make it difficult to create game objects by loading their definitions from declarative data files (would would be forced to define all entities in rust source code).

If you don't explicitly expose which components are being read, and which are modified per system, you don't provide the scheduler an opportunity to figure out the proper timeline for their execution.

I think this is important - even if it does mean messing with TypeIds, it opens up a lot of doors for speeding up execution, which should offset the extra complexity.

Another thought - it may be possible to perform these checks at compile time using compiler plugins. Declarative game object definitions could be read in at compile time (instead of at runtime) to allow these checks to be made. It may even be possible to build re-usable machinery that can either perform the checks at runtime or at compile time, so that while you prototype you can edit entities dynamically without waiting for a re-compile every time, but when you move to release you could bake everything in. Just an idea...

@kvark

This comment has been minimized.

Show comment
Hide comment
@kvark

kvark Mar 11, 2016

Member

@Oflor

Paralleling over systems might be an overkill if we already parallel over entities. There are going to be more entities than systems (if not, we should be able run it in a single thread anyway).

I haven't considered that. At the first sight, parallel entities processing would involve a lot of synchronization overhead. Like, you'd need some kind of dynamic sync before accessing each entity, as opposed to just starting a system. Another major problem would be the cache contention - if components are close to each other in memory, and one of them gets modified by one thread, access to all other components in this cache line will be delayed significantly for other threads... In other words, I'm not convinced it's worth it. Parallelizing systems just seems like an easier task, and it gives rather good performance - that's what Bungie and Naughty Dog are doing, IIRC.

Specifying accessed components requires dealing with trait-objects and TypeIds (I think), and that is already less efficient than compile-time system managers.

I'm pretty sure it should be possible to specify at compile time too. @mangecoeur proposed one way, via compiler plugins, but I'd also like to explore non-plugin approaches.

Member

kvark commented Mar 11, 2016

@Oflor

Paralleling over systems might be an overkill if we already parallel over entities. There are going to be more entities than systems (if not, we should be able run it in a single thread anyway).

I haven't considered that. At the first sight, parallel entities processing would involve a lot of synchronization overhead. Like, you'd need some kind of dynamic sync before accessing each entity, as opposed to just starting a system. Another major problem would be the cache contention - if components are close to each other in memory, and one of them gets modified by one thread, access to all other components in this cache line will be delayed significantly for other threads... In other words, I'm not convinced it's worth it. Parallelizing systems just seems like an easier task, and it gives rather good performance - that's what Bungie and Naughty Dog are doing, IIRC.

Specifying accessed components requires dealing with trait-objects and TypeIds (I think), and that is already less efficient than compile-time system managers.

I'm pretty sure it should be possible to specify at compile time too. @mangecoeur proposed one way, via compiler plugins, but I'd also like to explore non-plugin approaches.

@ghost

This comment has been minimized.

Show comment
Hide comment
@ghost

ghost Mar 14, 2016

// Create the entity eid with the compoents Foo{}/Bar{}
let eid = ENTITY.with(Foo{}).with(Bar{}).create(&mut world);

// Update the component Baz
eid.with(Baz{}).update(&mut world);

The advantage of the builder API like I have shown above is that it does not lock the world object while you are building the list of components that you are modifying. This can let functions like this exist

// do some update 
fn update(eid: Entity) -> EntityWith<(Foo,)> { eid.with(Foo{}) }

// in parallel update all entities and write the results into `world`
// not sure if this would actually work this elegantly (haven't got to trying it yet)
entities.parallel_map(update).write(&mut world);

Accessing the data could be done with queries on the world object.

// select every entity from world that has a Foo attached to it
for (eid, foo) in world.select::<(Foo,)>() {

}

// select every entity from world where Foo & Bar are attached to it
for (eid, (foo, bar)) in world.select::<(Foo, Bar)>() {

}

// select every entity from world where Foo & Bar are attached and 
// it is a member of the entity set
for (eid, (foo, bar)) in world.select::<(Foo, Bar)>().only(&entity_set) {

}

The biggest problem with these queries (or any iterator for that matter) is that you can't actually modify the world object since it is immutably borrowed by the select.

The solution I have always reverted to is making it cheap to clone world and then modify the cloned copy. I did this through copy-on-write, but I'm not sure it is a great general solution as it has performance penalties.

Alternatively systems could work in parallel with past copies, and write their updates to a changelist. This can be annoying as you can't see the writes you have made until they are committed to the world. But I know can be made to work.

ghost commented Mar 14, 2016

// Create the entity eid with the compoents Foo{}/Bar{}
let eid = ENTITY.with(Foo{}).with(Bar{}).create(&mut world);

// Update the component Baz
eid.with(Baz{}).update(&mut world);

The advantage of the builder API like I have shown above is that it does not lock the world object while you are building the list of components that you are modifying. This can let functions like this exist

// do some update 
fn update(eid: Entity) -> EntityWith<(Foo,)> { eid.with(Foo{}) }

// in parallel update all entities and write the results into `world`
// not sure if this would actually work this elegantly (haven't got to trying it yet)
entities.parallel_map(update).write(&mut world);

Accessing the data could be done with queries on the world object.

// select every entity from world that has a Foo attached to it
for (eid, foo) in world.select::<(Foo,)>() {

}

// select every entity from world where Foo & Bar are attached to it
for (eid, (foo, bar)) in world.select::<(Foo, Bar)>() {

}

// select every entity from world where Foo & Bar are attached and 
// it is a member of the entity set
for (eid, (foo, bar)) in world.select::<(Foo, Bar)>().only(&entity_set) {

}

The biggest problem with these queries (or any iterator for that matter) is that you can't actually modify the world object since it is immutably borrowed by the select.

The solution I have always reverted to is making it cheap to clone world and then modify the cloned copy. I did this through copy-on-write, but I'm not sure it is a great general solution as it has performance penalties.

Alternatively systems could work in parallel with past copies, and write their updates to a changelist. This can be annoying as you can't see the writes you have made until they are committed to the world. But I know can be made to work.

@ebkalderon

This comment has been minimized.

Show comment
Hide comment
@ebkalderon

ebkalderon Mar 14, 2016

Member

@csherratt We were initially going to go with the change list model, but eventually decided against it because of suspected latency. However, we haven't done any concrete profiling to confirm this. Have you tested this and/or the locking method before? If so, what were your conclusions?

Member

ebkalderon commented Mar 14, 2016

@csherratt We were initially going to go with the change list model, but eventually decided against it because of suspected latency. However, we haven't done any concrete profiling to confirm this. Have you tested this and/or the locking method before? If so, what were your conclusions?

@rphmeier

This comment has been minimized.

Show comment
Hide comment
@rphmeier

rphmeier Mar 14, 2016

@csherratt's approach sounds really similar to the basic "queries" I've been imagining while experimenting. I can try and work a little on the implementation, to see how feasible they really are. If user-specified storage of component data is allowed (e.g. positional data structures), they can also compose with custom filters associated with that storage to get the entities desired with maximum efficiency.

The main issue with cloning the world is that it can get extremely expensive when you've got a lot of entities, and a lot of data to go with them. Even if it were as simple as a raw memcpy, you'd probably end up with each thread making one copy for itself. With a lot of data, you'd end up wasting a lot of time just copying it each frame. This would probably dwarf the cost of having a change list, walking it, and applying changes. I would strongly advocate taking a lock-based approach, since mutexes without contention are practically free. This does mean that whatever way we schedule processors, it should prioritize minimizing contention over data.

@kvark I do think you're right that having an implicitly fork-join execution model limits the space of execution graphs drastically. However, it also works simply to minimize contention over components, and the subset of execution graphs which it does allow are generally efficient. The alternate approach is what you said above: have a signal-based approach where processors wait to be "kickstarted" by their dependencies. This does mean that each processor needs to be aware of not only its dependencies, but also its successors. Can we easily encode this information without much boilerplate?

rphmeier commented Mar 14, 2016

@csherratt's approach sounds really similar to the basic "queries" I've been imagining while experimenting. I can try and work a little on the implementation, to see how feasible they really are. If user-specified storage of component data is allowed (e.g. positional data structures), they can also compose with custom filters associated with that storage to get the entities desired with maximum efficiency.

The main issue with cloning the world is that it can get extremely expensive when you've got a lot of entities, and a lot of data to go with them. Even if it were as simple as a raw memcpy, you'd probably end up with each thread making one copy for itself. With a lot of data, you'd end up wasting a lot of time just copying it each frame. This would probably dwarf the cost of having a change list, walking it, and applying changes. I would strongly advocate taking a lock-based approach, since mutexes without contention are practically free. This does mean that whatever way we schedule processors, it should prioritize minimizing contention over data.

@kvark I do think you're right that having an implicitly fork-join execution model limits the space of execution graphs drastically. However, it also works simply to minimize contention over components, and the subset of execution graphs which it does allow are generally efficient. The alternate approach is what you said above: have a signal-based approach where processors wait to be "kickstarted" by their dependencies. This does mean that each processor needs to be aware of not only its dependencies, but also its successors. Can we easily encode this information without much boilerplate?

@ghost

This comment has been minimized.

Show comment
Hide comment
@ghost

ghost Mar 15, 2016

You could define a system based on it's inputs and what it modifies. The scheduler knowing this knowledge is useful. as @kvark has said.

So, where does this lead? If a system is defined by what it's input is, the system is pending on those inputs to be stable before it can process. If a system modifies one or more components it is both waiting on that component to be stable, and it is produce a new version of said components.

A Future is an exact match for this model. A system is run when all the components futures are ready, and when it is complete a number of futures are now marked as ready and other systems that depend on this component are run. If two system are run back to back that don't depend on output from one another they are run in parallel. If a system does depend on a component from another, it implicitly waits until an upstream system is finished running before it runs itself.

The ordering follows the order in software in the case of a conflict.

let mut world = World{};
loop {
  // run is a magical function that in practice would probably take more boiler plate then this.
  // I think it is possible to write, but have not attempted to yet.
  world = run(world /* world is moved here */, system_a);
  world = run(world, system_b);
  world = run(world, system_c);
  world = run(world, system_d);
}

@rphmeier about the copying, I said it would be copy-on-write. Meaning that when you do a insert or delete the data-structure only copies the nodes that are shared with a read-only copy. It's not free as the overhead is pushed to the writer (in the form of checks), but it is very convenient to work with since cloning is just bumping an atomic reference count.

I'm not a fan of locking. It's not because the locking itself is expensive. There are two problems

  1. Multiple writers wanting to update the same component have none-deterministic ordering.
  2. Writers wait behind readers, stalling their progress. If Foo is shared with 10 systems, If a system needs to modify Foo it needs to wait for every other with a readlock on Foo to release it before you can make progress. It just takes one system running slow to stall everything. With the futures only writers can stall readers which is still a problem, but there are less writers then readers.

ghost commented Mar 15, 2016

You could define a system based on it's inputs and what it modifies. The scheduler knowing this knowledge is useful. as @kvark has said.

So, where does this lead? If a system is defined by what it's input is, the system is pending on those inputs to be stable before it can process. If a system modifies one or more components it is both waiting on that component to be stable, and it is produce a new version of said components.

A Future is an exact match for this model. A system is run when all the components futures are ready, and when it is complete a number of futures are now marked as ready and other systems that depend on this component are run. If two system are run back to back that don't depend on output from one another they are run in parallel. If a system does depend on a component from another, it implicitly waits until an upstream system is finished running before it runs itself.

The ordering follows the order in software in the case of a conflict.

let mut world = World{};
loop {
  // run is a magical function that in practice would probably take more boiler plate then this.
  // I think it is possible to write, but have not attempted to yet.
  world = run(world /* world is moved here */, system_a);
  world = run(world, system_b);
  world = run(world, system_c);
  world = run(world, system_d);
}

@rphmeier about the copying, I said it would be copy-on-write. Meaning that when you do a insert or delete the data-structure only copies the nodes that are shared with a read-only copy. It's not free as the overhead is pushed to the writer (in the form of checks), but it is very convenient to work with since cloning is just bumping an atomic reference count.

I'm not a fan of locking. It's not because the locking itself is expensive. There are two problems

  1. Multiple writers wanting to update the same component have none-deterministic ordering.
  2. Writers wait behind readers, stalling their progress. If Foo is shared with 10 systems, If a system needs to modify Foo it needs to wait for every other with a readlock on Foo to release it before you can make progress. It just takes one system running slow to stall everything. With the futures only writers can stall readers which is still a problem, but there are less writers then readers.
@kvark

This comment has been minimized.

Show comment
Hide comment
@kvark

kvark Mar 15, 2016

Member

@csherratt Are you thinking about the Future as a concept (based on Pulse), or usable as is from the standard library? So somewhere we'd have each component storage associated with a Future, which gets passed to the next system that reads/writes this component. If a system modifies it, it replaces the Future with the new one, so that the next system will need to wait. Do I understand your idea correctly?

I also realized that the scheduling problem can be very much simplified. Sorry if this is obvious for someone, but I feel it being important to mention explicitly - we don't have to implement a full-featured scheduler for the ECS systems. Instead, we follow the order in which user specifies the systems and components they modify. Each system either waits or executes in parallel, depending on whether or not the input components are locked for writing, and the output components are still being read.

This is very simple for the user: they don't have to think about it, but they can still fine-tune the execution order according to their specific requirements that the scheduler wouldn't know about anyway ;)

Going back to @csherratt 's run function. Why does it need to receive the world by the move? Also, I'd expect it to not be a freestanding function, and instead be called as scheduler.run(...).

So, given that the scheduling problem is addressed by relaxing the order and using Future for synchronization, I see only one big fat problem left: how do we specify the in/out components for systems such that this information is both passed at run-time to the scheduler, and 2) - directly enforces the system implementation to only work with these components (at compile time, going back to the trait System topic)?

Member

kvark commented Mar 15, 2016

@csherratt Are you thinking about the Future as a concept (based on Pulse), or usable as is from the standard library? So somewhere we'd have each component storage associated with a Future, which gets passed to the next system that reads/writes this component. If a system modifies it, it replaces the Future with the new one, so that the next system will need to wait. Do I understand your idea correctly?

I also realized that the scheduling problem can be very much simplified. Sorry if this is obvious for someone, but I feel it being important to mention explicitly - we don't have to implement a full-featured scheduler for the ECS systems. Instead, we follow the order in which user specifies the systems and components they modify. Each system either waits or executes in parallel, depending on whether or not the input components are locked for writing, and the output components are still being read.

This is very simple for the user: they don't have to think about it, but they can still fine-tune the execution order according to their specific requirements that the scheduler wouldn't know about anyway ;)

Going back to @csherratt 's run function. Why does it need to receive the world by the move? Also, I'd expect it to not be a freestanding function, and instead be called as scheduler.run(...).

So, given that the scheduling problem is addressed by relaxing the order and using Future for synchronization, I see only one big fat problem left: how do we specify the in/out components for systems such that this information is both passed at run-time to the scheduler, and 2) - directly enforces the system implementation to only work with these components (at compile time, going back to the trait System topic)?

@ghost

This comment has been minimized.

Show comment
Hide comment
@ghost

ghost Mar 15, 2016

Yes, I was thinking of something like Future via Pulse specificity something I called a SharedFuture. Basically a future that gives a read-only view of the data once it is ready, so multiple people can wait on it.

@kvark there is no problem with passing in &mut World and having the system modify it by swapping the systems it cares about. Just posted an example of the first thing that came to mind.

The one tricky area I foresee is enumeration of entities that are being created. It might be good enough to just use a mutex protected object. We could do the same thing as the futures of course.

ghost commented Mar 15, 2016

Yes, I was thinking of something like Future via Pulse specificity something I called a SharedFuture. Basically a future that gives a read-only view of the data once it is ready, so multiple people can wait on it.

@kvark there is no problem with passing in &mut World and having the system modify it by swapping the systems it cares about. Just posted an example of the first thing that came to mind.

The one tricky area I foresee is enumeration of entities that are being created. It might be good enough to just use a mutex protected object. We could do the same thing as the futures of course.

@OvermindDL1

This comment has been minimized.

Show comment
Hide comment
@OvermindDL1

OvermindDL1 Mar 21, 2016

Just coming into this to look at rust game ECS engines but I thought it might be interesting to speak of the multi-threaded ECS design I used in C++.

The 'world' (Context in mine).

An Entity is just a 64 bit integer, where the low-order 32-bits is the index and the high order 32-bits is a counter (incremented every time an index is reused to 'nullify' old entities and make them invalid).

A component, which is conceptually POD, although it was defined more of something like:

class PositionComponent: public Component<PositionComponent> {
    Position& getPosition(Entity entity);
   // etc...
}

As in internally it could store it via, say, a packed array if it is an often-used component (like the above Position would be), or store it via a treemap, hashmap, set (if just an empty component), etc.., whatever is efficient for its use. I even have a component that changes the way it stores based on how many exists. This does require a bit of extra coding over a usual POD component, but it is not bad and there are plenty of abstractions that can be done to shorten it. I've found a Rust ECS system someone made at https://github.com/HeroesGrave/ecs-rs and it has an interesting style of making components (defining as either 'hot' or 'cold' to define whether it is a packed array or a hashmap, not quite as flexible, but interesting style). However, these component stores also have other functionality for holding 'pending' changes (hence the required subclassing in this design). This will be more explained shortly.

And Systems, these are fairly normal, except they are non-mutating. During processing of systems, which are run completely parallel (every system can be run on a different thread, a system can even branch further and let batches of entities be processed on different threads, most systems can do this, few that require checking all watched entities generally cannot, but these are few). What they do is if a change needs to be made to a component, like there was an entity has a DamageComponent of some kind added to it and there is a DamageSystem that watches for entities that have it and a HealthComponent, what it will instead do to apply the change is do:

// Do tests to see if the DamageComponent will usefully apply damage to this HealthComponent based on other things, if not it will do a changeset of just remove
context.changeset(HealthComponent::And<DamageComponent>(), [=]() {
    const auto damage = DamageComponent.getDamage(entity); // Think of this as a Result type return in Rust, good to test that here since some other changeset before this one could have pre-deleted it before this changeset applies, even though it existed during system run.
    if(damage) {
        HealthComponent.alterHealth(damage.value); 
        DamageComponent.remove(entity);
    }
};

So while the systems are all running in parallel the changesets are accumulated per thread, organized into a tree based on the first argument passed into the context.changeset so that, for example, anything that accesses HealthComponent will be run serially. After the systems are finished running then the tree-per-thread is merged into a single tree that is then branched out on across threads to apply the changesets to the components. I currently do not start running the changesets until all systems are finished but I left it open that I could do that change later by starting to run changesets of components that have no remaining systems touching them. Like https://github.com/HeroesGrave/ecs-rs I also have something Aspect'ish to advise the context of required components and 'potentially touched' components by a given system, this is also what builds up the first argument to the context.changeset function.

I have not really optimized it at all and there are obvious places in code that I could do so, but I have yet to have a need, it saturates the threads pretty well as it is as each thread just churns through the system processing lists and when empty it gets another from a non-locking/non-blocking atomic queue. It does take some programming discipline (I.E. Do not mutate components within systems, only within changesets, but a better design couple have preventing such possibilities as well if I were to redesign it today, and it would be easier so from what I see of Rust too). And thus far it far exceeds entityx in speed from any of the simple benchmarks I've done.

OvermindDL1 commented Mar 21, 2016

Just coming into this to look at rust game ECS engines but I thought it might be interesting to speak of the multi-threaded ECS design I used in C++.

The 'world' (Context in mine).

An Entity is just a 64 bit integer, where the low-order 32-bits is the index and the high order 32-bits is a counter (incremented every time an index is reused to 'nullify' old entities and make them invalid).

A component, which is conceptually POD, although it was defined more of something like:

class PositionComponent: public Component<PositionComponent> {
    Position& getPosition(Entity entity);
   // etc...
}

As in internally it could store it via, say, a packed array if it is an often-used component (like the above Position would be), or store it via a treemap, hashmap, set (if just an empty component), etc.., whatever is efficient for its use. I even have a component that changes the way it stores based on how many exists. This does require a bit of extra coding over a usual POD component, but it is not bad and there are plenty of abstractions that can be done to shorten it. I've found a Rust ECS system someone made at https://github.com/HeroesGrave/ecs-rs and it has an interesting style of making components (defining as either 'hot' or 'cold' to define whether it is a packed array or a hashmap, not quite as flexible, but interesting style). However, these component stores also have other functionality for holding 'pending' changes (hence the required subclassing in this design). This will be more explained shortly.

And Systems, these are fairly normal, except they are non-mutating. During processing of systems, which are run completely parallel (every system can be run on a different thread, a system can even branch further and let batches of entities be processed on different threads, most systems can do this, few that require checking all watched entities generally cannot, but these are few). What they do is if a change needs to be made to a component, like there was an entity has a DamageComponent of some kind added to it and there is a DamageSystem that watches for entities that have it and a HealthComponent, what it will instead do to apply the change is do:

// Do tests to see if the DamageComponent will usefully apply damage to this HealthComponent based on other things, if not it will do a changeset of just remove
context.changeset(HealthComponent::And<DamageComponent>(), [=]() {
    const auto damage = DamageComponent.getDamage(entity); // Think of this as a Result type return in Rust, good to test that here since some other changeset before this one could have pre-deleted it before this changeset applies, even though it existed during system run.
    if(damage) {
        HealthComponent.alterHealth(damage.value); 
        DamageComponent.remove(entity);
    }
};

So while the systems are all running in parallel the changesets are accumulated per thread, organized into a tree based on the first argument passed into the context.changeset so that, for example, anything that accesses HealthComponent will be run serially. After the systems are finished running then the tree-per-thread is merged into a single tree that is then branched out on across threads to apply the changesets to the components. I currently do not start running the changesets until all systems are finished but I left it open that I could do that change later by starting to run changesets of components that have no remaining systems touching them. Like https://github.com/HeroesGrave/ecs-rs I also have something Aspect'ish to advise the context of required components and 'potentially touched' components by a given system, this is also what builds up the first argument to the context.changeset function.

I have not really optimized it at all and there are obvious places in code that I could do so, but I have yet to have a need, it saturates the threads pretty well as it is as each thread just churns through the system processing lists and when empty it gets another from a non-locking/non-blocking atomic queue. It does take some programming discipline (I.E. Do not mutate components within systems, only within changesets, but a better design couple have preventing such possibilities as well if I were to redesign it today, and it would be easier so from what I see of Rust too). And thus far it far exceeds entityx in speed from any of the simple benchmarks I've done.

@kvark

This comment has been minimized.

Show comment
Hide comment
@kvark

kvark Mar 21, 2016

Member

@OvermindDL1 thank you for sharing! So basically you got an ECS with:

  • generational recyclable entity IDs
  • changeset-based component updates, produced by fully parallel systems

I don't know if you read the previous discussion here, where I'm being an asshole opponent of the changeset-based system processing. My main gripe is how do you deal with increased latency? E.g. If component A depends on B, and B depends on C, and C on D, your system would only propagate changes from A to D in 3 update cycles.

Member

kvark commented Mar 21, 2016

@OvermindDL1 thank you for sharing! So basically you got an ECS with:

  • generational recyclable entity IDs
  • changeset-based component updates, produced by fully parallel systems

I don't know if you read the previous discussion here, where I'm being an asshole opponent of the changeset-based system processing. My main gripe is how do you deal with increased latency? E.g. If component A depends on B, and B depends on C, and C on D, your system would only propagate changes from A to D in 3 update cycles.

@OvermindDL1

This comment has been minimized.

Show comment
Hide comment
@OvermindDL1

OvermindDL1 Mar 22, 2016

@kvark Oh please be an opponent, any bugs or design flaws must be talked about or they may not be resolved. :-)

Honestly I never experienced such an issue, but that may just be due to the way I designed the things on top of the ECS so it never came about. Could you give an example? If I have hard dependencies so that Component A depends on a Component B always existing then when something adds Component A an event is immediately fired (only in the transaction processing step) that tells stored Aspects about the added Component A, but also anything else listening to the event that Component A has been added to something, of which there are often passive systems that will add such dependencies immediately as so:

context.registerSystem(new DependencySystem<AComponent, BComponent>());

I use events, I am not really happy about events, but they solve a lot of issues very easily regardless of the bit of a hate about the design... I try to minimize their use in the 'Game Code', but they are used pretty heavily internally, especially with Aspect receiving updates entity lists (although there are optimizations on them to prevent indirect function calls on the event unlike more generic events).

OvermindDL1 commented Mar 22, 2016

@kvark Oh please be an opponent, any bugs or design flaws must be talked about or they may not be resolved. :-)

Honestly I never experienced such an issue, but that may just be due to the way I designed the things on top of the ECS so it never came about. Could you give an example? If I have hard dependencies so that Component A depends on a Component B always existing then when something adds Component A an event is immediately fired (only in the transaction processing step) that tells stored Aspects about the added Component A, but also anything else listening to the event that Component A has been added to something, of which there are often passive systems that will add such dependencies immediately as so:

context.registerSystem(new DependencySystem<AComponent, BComponent>());

I use events, I am not really happy about events, but they solve a lot of issues very easily regardless of the bit of a hate about the design... I try to minimize their use in the 'Game Code', but they are used pretty heavily internally, especially with Aspect receiving updates entity lists (although there are optimizations on them to prevent indirect function calls on the event unlike more generic events).

@Oflor

This comment has been minimized.

Show comment
Hide comment
@Oflor

Oflor Mar 22, 2016

Contributor

@OvermindDL1 Why do you need the recycling part of the entity id? If we make the id a plain u64 which increments with every created entity, we don't need to store dead ids to recycle and the id creation is an easier process.

Contributor

Oflor commented Mar 22, 2016

@OvermindDL1 Why do you need the recycling part of the entity id? If we make the id a plain u64 which increments with every created entity, we don't need to store dead ids to recycle and the id creation is an easier process.

@kvark

This comment has been minimized.

Show comment
Hide comment
@kvark

kvark Mar 22, 2016

Member

@Oflor you are thinking too much in terms of HashMap-type access. Having the generation ID allows one to have linear access (as in Vec) to components, which is a big fat performance win for processing (as I already mentioned here).

Member

kvark commented Mar 22, 2016

@Oflor you are thinking too much in terms of HashMap-type access. Having the generation ID allows one to have linear access (as in Vec) to components, which is a big fat performance win for processing (as I already mentioned here).

@White-Oak

This comment has been minimized.

Show comment
Hide comment
@White-Oak

White-Oak Mar 22, 2016

Contributor

@kvark once again I don't believe in ID recycling in deterministic engine.

Contributor

White-Oak commented Mar 22, 2016

@kvark once again I don't believe in ID recycling in deterministic engine.

@Oflor

This comment has been minimized.

Show comment
Hide comment
@Oflor

Oflor Mar 22, 2016

Contributor

@kvark (generation, id) pair is a simplified Entity { id: u32, vec_idx: u32 } struct, which doesn't really follow the "Entity = ID" approach. There are other ways to implement fast linear access than this gen+id.

Contributor

Oflor commented Mar 22, 2016

@kvark (generation, id) pair is a simplified Entity { id: u32, vec_idx: u32 } struct, which doesn't really follow the "Entity = ID" approach. There are other ways to implement fast linear access than this gen+id.

@kvark

This comment has been minimized.

Show comment
Hide comment
@kvark

kvark Mar 22, 2016

Member

Please remind me what does ID recycling has to do with determinism? From the system's point of view a recycled ID is as new as a non-recycled one. The semantic is the same, the difference is only the implementation.

On Mar 22, 2016, at 7:52, Oak notifications@github.com wrote:

@kvark once again I don't believe in ID recycling in deterministic engine.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub

Member

kvark commented Mar 22, 2016

Please remind me what does ID recycling has to do with determinism? From the system's point of view a recycled ID is as new as a non-recycled one. The semantic is the same, the difference is only the implementation.

On Mar 22, 2016, at 7:52, Oak notifications@github.com wrote:

@kvark once again I don't believe in ID recycling in deterministic engine.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub

@White-Oak

This comment has been minimized.

Show comment
Hide comment
@White-Oak

White-Oak Mar 22, 2016

Contributor

@kvark yeah, you're right, sorry. I was thinking about network races, but it is the same even w/o recycling, so it is not the problem.

Contributor

White-Oak commented Mar 22, 2016

@kvark yeah, you're right, sorry. I was thinking about network races, but it is the same even w/o recycling, so it is not the problem.

@kvark

This comment has been minimized.

Show comment
Hide comment
@kvark

kvark Mar 22, 2016

Member

@Oflor Don't take "Entity = ID" approach as a dogma. The idea is that an entity can be just a simple identifier of a collection of components, as opposed to actually containing the component data or even storing separate identifiers for components (which is the approach I took initially, still not figured why it's bad). In that sense, (Id, Generation) is a perfect identifier - it's small, hashable, comparable, etc.

Member

kvark commented Mar 22, 2016

@Oflor Don't take "Entity = ID" approach as a dogma. The idea is that an entity can be just a simple identifier of a collection of components, as opposed to actually containing the component data or even storing separate identifiers for components (which is the approach I took initially, still not figured why it's bad). In that sense, (Id, Generation) is a perfect identifier - it's small, hashable, comparable, etc.

@doppioslash

This comment has been minimized.

Show comment
Hide comment
@doppioslash

doppioslash Mar 25, 2016

@OvermindDL1 cool project! :)
Is it open source anywhere?

doppioslash commented Mar 25, 2016

@OvermindDL1 cool project! :)
Is it open source anywhere?

@OvermindDL1

This comment has been minimized.

Show comment
Hide comment
@OvermindDL1

OvermindDL1 Mar 25, 2016

@doppioslash On my old svn server (pre-git-even-existing), which is probably still running now that I think about it... though sadly not public, perhaps once I take the time to get everything off of it someday. Was a fun project though. Not really ECS as I would define it 'now', and the server was not at all.

Basically in the Erlang server-side each 'entity' was an Erlang process (for those not familiar with Actor design a 'Process' is not an OS process but rather can be thought of as a very very lightweight green-thread, has its own polling loop and spends most of its time frozen until it receives a message, even if it wants to be awoken in, say, 50 milliseconds or every 50 milliseconds or whatever it sends a message to the Time system to have it send a message back to it of a 'timeout' (with a custom data ID so you can have multiple timeouts pending and distinguish them)). An 'Entity' process was built from a little data description that defined other Erlang modules with data for them to start with (kind of a unity'ish fake ECS instead of actual ECS now that I think about it...) to callback into those modules, and they could register their own message handlers for incoming messages that were delegated out from the main loop. They would send messages out to other things that were listening (such as an AI, events, a player socket, etc...) to tell about updates, so nothing knew anything up to date about anything else without registering to receive updates (or sending a message asking for whatever of the active state to be sent back).

Theoretically such a server could scale 'infinitely', so would be good for an MMO or so I guess, if you had more servers to distribute the load.

The client however would connect to a server, receive a bulk load of versioned LUA, that LUA would load, register message handlers (such as something nearby enough for the player to start getting updates from), the LUA would handle interpolation, local GUI, etc. Not really fully featured, but it was fun to basically just program/mod the server-side and the client was just always up-to-date and would see changes in real-time as I would hotload code into the Erlang system, so it was fun to show off with. :-)

I did everything in Erlang on the server, even things that I should have sent off to a C++ process (Erlang is great at binding with languages in a variety of ways, a fantastic glue language), so things like the procedural level generations could be slowish, but that is easily fixable by just slaving the work out to the correct languages if I continued such a project.

OvermindDL1 commented Mar 25, 2016

@doppioslash On my old svn server (pre-git-even-existing), which is probably still running now that I think about it... though sadly not public, perhaps once I take the time to get everything off of it someday. Was a fun project though. Not really ECS as I would define it 'now', and the server was not at all.

Basically in the Erlang server-side each 'entity' was an Erlang process (for those not familiar with Actor design a 'Process' is not an OS process but rather can be thought of as a very very lightweight green-thread, has its own polling loop and spends most of its time frozen until it receives a message, even if it wants to be awoken in, say, 50 milliseconds or every 50 milliseconds or whatever it sends a message to the Time system to have it send a message back to it of a 'timeout' (with a custom data ID so you can have multiple timeouts pending and distinguish them)). An 'Entity' process was built from a little data description that defined other Erlang modules with data for them to start with (kind of a unity'ish fake ECS instead of actual ECS now that I think about it...) to callback into those modules, and they could register their own message handlers for incoming messages that were delegated out from the main loop. They would send messages out to other things that were listening (such as an AI, events, a player socket, etc...) to tell about updates, so nothing knew anything up to date about anything else without registering to receive updates (or sending a message asking for whatever of the active state to be sent back).

Theoretically such a server could scale 'infinitely', so would be good for an MMO or so I guess, if you had more servers to distribute the load.

The client however would connect to a server, receive a bulk load of versioned LUA, that LUA would load, register message handlers (such as something nearby enough for the player to start getting updates from), the LUA would handle interpolation, local GUI, etc. Not really fully featured, but it was fun to basically just program/mod the server-side and the client was just always up-to-date and would see changes in real-time as I would hotload code into the Erlang system, so it was fun to show off with. :-)

I did everything in Erlang on the server, even things that I should have sent off to a C++ process (Erlang is great at binding with languages in a variety of ways, a fantastic glue language), so things like the procedural level generations could be slowish, but that is easily fixable by just slaving the work out to the correct languages if I continued such a project.

@doppioslash

This comment has been minimized.

Show comment
Hide comment
@doppioslash

doppioslash Mar 25, 2016

@OvermindDL1 hot loading always makes for a good party trick :)
Sounds very interesting, if you dig up the code do give me a shout.
I'm doing FRP with Elm, and doing some Erlang as well, I think it would be very interesting to figure out a way to have an FRP, actor-based Entity Component system.
I think Rust could be very helpful in interop with Erlang, or even a BEAM reimplementation, maybe.

doppioslash commented Mar 25, 2016

@OvermindDL1 hot loading always makes for a good party trick :)
Sounds very interesting, if you dig up the code do give me a shout.
I'm doing FRP with Elm, and doing some Erlang as well, I think it would be very interesting to figure out a way to have an FRP, actor-based Entity Component system.
I think Rust could be very helpful in interop with Erlang, or even a BEAM reimplementation, maybe.

@OvermindDL1

This comment has been minimized.

Show comment
Hide comment
@OvermindDL1

OvermindDL1 Mar 25, 2016

Honestly I really wish the BEAM was rebuilt with LLVM. There was a little project started by a couple once but it never went anywhere. BEAM is blazing fast for message passing, faster than any other language that I have ever tested for the reliability that BEAM allows for, however the thing it is slow with is math and such similar things, the things that could easily be fixed by an LLVM backend especially with how the memory is typed in the Erlang backend and how you can infer a lot of typed information at compilation, though adding types to Erlang (not just dialyzer) would be awesome, haskall'ish perhaps but not so pure. Rust interop would work well with Erlang, but really Erlang can interop with any language well, however Rust, like C and C++ would be capable of embedding code 'inside' the BEAM for max speed if you state your code is uncrashable enough for that (as that can crash BEAM and everything in it, it may be fast, but not safe).

I've been looking at Elm, nice web language, wish it bound well with Elixir, they are two sides of the same coin and would work so fantastically together!

OvermindDL1 commented Mar 25, 2016

Honestly I really wish the BEAM was rebuilt with LLVM. There was a little project started by a couple once but it never went anywhere. BEAM is blazing fast for message passing, faster than any other language that I have ever tested for the reliability that BEAM allows for, however the thing it is slow with is math and such similar things, the things that could easily be fixed by an LLVM backend especially with how the memory is typed in the Erlang backend and how you can infer a lot of typed information at compilation, though adding types to Erlang (not just dialyzer) would be awesome, haskall'ish perhaps but not so pure. Rust interop would work well with Erlang, but really Erlang can interop with any language well, however Rust, like C and C++ would be capable of embedding code 'inside' the BEAM for max speed if you state your code is uncrashable enough for that (as that can crash BEAM and everything in it, it may be fast, but not safe).

I've been looking at Elm, nice web language, wish it bound well with Elixir, they are two sides of the same coin and would work so fantastically together!

@doppioslash

This comment has been minimized.

Show comment
Hide comment
@doppioslash

doppioslash Mar 25, 2016

Yeah, NIF use is certainly possible.
Apparently that was merged in OTP at one point erlang/otp@9d46875
At the SF Erlang Factory there was a joint talk about Elm&Elixir with Elm's author, and the community survey showed that Elixir is the favourite backend language used with Elm.
Elm is just on js now, but the idea is to keep it a language that can be compiled to other VMs.

doppioslash commented Mar 25, 2016

Yeah, NIF use is certainly possible.
Apparently that was merged in OTP at one point erlang/otp@9d46875
At the SF Erlang Factory there was a joint talk about Elm&Elixir with Elm's author, and the community survey showed that Elixir is the favourite backend language used with Elm.
Elm is just on js now, but the idea is to keep it a language that can be compiled to other VMs.

@OvermindDL1

This comment has been minimized.

Show comment
Hide comment
@OvermindDL1

OvermindDL1 Mar 25, 2016

@doppioslash Huh, fascinating, did not know that they ever actually put anything like that in, though it is a bit past my time (my big Erlang learning heyday was during version 13-14, so a while ago, ever since it has just been programming in it or Elixir once it came out instead of staying more up-to-date as I should be...).

Heh, I could see that, Elixir and Elm would work very well together, it would be very possible to have them compiled via the same mix ecosystem too! Or maybe combine Elm and Elixir into some super TypedElixir? ^.^

/me loves types, that is the big irritant with Erlang/Elixir...

OvermindDL1 commented Mar 25, 2016

@doppioslash Huh, fascinating, did not know that they ever actually put anything like that in, though it is a bit past my time (my big Erlang learning heyday was during version 13-14, so a while ago, ever since it has just been programming in it or Elixir once it came out instead of staying more up-to-date as I should be...).

Heh, I could see that, Elixir and Elm would work very well together, it would be very possible to have them compiled via the same mix ecosystem too! Or maybe combine Elm and Elixir into some super TypedElixir? ^.^

/me loves types, that is the big irritant with Erlang/Elixir...

@ebkalderon

This comment has been minimized.

Show comment
Hide comment
@ebkalderon

ebkalderon Mar 25, 2016

Member

@doppioslash @OvermindDL1 Just a heads-up, please remain on topic. This thread is crowded enough as it is. I agree with @kvark that something needs to be done so everyone's words don't get drowned out by all the noise.

Member

ebkalderon commented Mar 25, 2016

@doppioslash @OvermindDL1 Just a heads-up, please remain on topic. This thread is crowded enough as it is. I agree with @kvark that something needs to be done so everyone's words don't get drowned out by all the noise.

@OvermindDL1

This comment has been minimized.

Show comment
Hide comment
@OvermindDL1

OvermindDL1 Mar 25, 2016

@ebkalderon I think it is on topic actually, another idea popped up based on the prior. What about an ECS running inside Actor programming Actors/Processes? They have tightly packed memory, everything communicates via message passing, could even go far enough so that the rendering parts are disjoint from them and they are not aware of even their location without asking something, like real-life objects, could be fascinating... and scale very very well.

OvermindDL1 commented Mar 25, 2016

@ebkalderon I think it is on topic actually, another idea popped up based on the prior. What about an ECS running inside Actor programming Actors/Processes? They have tightly packed memory, everything communicates via message passing, could even go far enough so that the rendering parts are disjoint from them and they are not aware of even their location without asking something, like real-life objects, could be fascinating... and scale very very well.

@kvark

This comment has been minimized.

Show comment
Hide comment
@kvark

kvark Mar 25, 2016

Member

@OvermindDL1 With me being not familiar with Actor/Processes model, could you give some pointers, like existing solutions with this architecture, or some introductory stuff to see what you have in mind?

Member

kvark commented Mar 25, 2016

@OvermindDL1 With me being not familiar with Actor/Processes model, could you give some pointers, like existing solutions with this architecture, or some introductory stuff to see what you have in mind?

@OvermindDL1

This comment has been minimized.

Show comment
Hide comment
@OvermindDL1

OvermindDL1 Mar 25, 2016

The link @doppioslash gave earlier of https://www.youtube.com/watch?v=eHTePEtLV-c is a game system built on Actor programming (pure message passing and such). But Actor Oriented Programming, a brief history:

Long ago a style of programming was thought up, and it was called Object Oriented Programming as it completely hid all data within Objects, only exposing communication points, message areas. Shortly after C++, LISP, and others made a programming model that used objects that exposed data via methods and called it Object Oriented Programming, the original definition was mostly forgotten for a decade before revived with the new name of Actor Oriented Programming. There were a few languages built with it, none truly high performing except maybe the LISP version, but Erlang is the model of it in modern times (that language is only about 25 years old give-or-take a few years).

Actor Oriented Programming is modeled by a set of distinct no-share 'processes', each with their own event/message loop, in a mostly functional pseudo-language see this:

def doSomething(blueprintID) 
  doSomething(blueprintID, load(blueprint))
end

def doSomething(blueprintID, state)
    receive
        {:save, pid} -> pid ! {:state, self(), blueprintID, state};
        # Handle other specific messages, usually would have these in a parent 'behaviour'
        msg -> handleMsgToSelfECS(state, msg)
    after # Can omit this if not wanted, '10' can also be a variable
        10 -> doSomethingOnTimeout(state) # if no message received in this period then do something else
    end
    doSomething(blueprintID, state)
end

You can have more than one receive block too, such as when an internal system receives a message it could 'receive' a very specific one that it is waiting on, maybe more information or something, ignoring any others in the 'mailbox' for now until the above receive processes it again and catches all messages via the 'msg' generic capture.

You could spawn, say, many of these actors via:

1 .. 10 |> map(spawn &doSomething/1)

spawn is a very low level thing in erlang/elixir, you usually build up supervision trees instead, which should always be done, it makes the system extremely reliable and fault-tolerant, it is the 'OTP Principles' in Elixir/Erlang design, makes for an amazingly fault-tolerant and scaleable system.

But in general imagine an Actor being like an OS process, completely distinct memory, (it technically has mutable memory internally to it, but the language does not expose that except via a couple of ways, immutable otherwise), copying messages (sounds slow, but pay the cost then you can scale arbitrarily).

It was designed after real life, each Actor is an Object that is distinct, cannot access the memory or methods of anything else, they communicate via messages (like a sound taking a second to reach your ears, even light takes time in real life), and the messages are not immediate, they can take a few milliseconds, a few seconds even if the receiving actor is on the other side of the world, and it may even not reach it (though Erlang makes assurances that a message will reach the receiver as long as the receiver is alive, though it could die between when you sent the message and it received it). Internally you could easily have optimizations (as Erlang does many of) to enhance various things, but the programming layer on top for the 'users' to use should always be exposed as Actors.

This is a wonderful programming pattern that Erlang/Elixir have quite well solved at this point, you should take everything from Erlang/Elixir that you can, rock solid systems for decades now. I can go into great detail about the OTP patterns, Actor abstractions, efficient Messaging and reliability if you are curious, but it did make for a very nice to program in game world.

OvermindDL1 commented Mar 25, 2016

The link @doppioslash gave earlier of https://www.youtube.com/watch?v=eHTePEtLV-c is a game system built on Actor programming (pure message passing and such). But Actor Oriented Programming, a brief history:

Long ago a style of programming was thought up, and it was called Object Oriented Programming as it completely hid all data within Objects, only exposing communication points, message areas. Shortly after C++, LISP, and others made a programming model that used objects that exposed data via methods and called it Object Oriented Programming, the original definition was mostly forgotten for a decade before revived with the new name of Actor Oriented Programming. There were a few languages built with it, none truly high performing except maybe the LISP version, but Erlang is the model of it in modern times (that language is only about 25 years old give-or-take a few years).

Actor Oriented Programming is modeled by a set of distinct no-share 'processes', each with their own event/message loop, in a mostly functional pseudo-language see this:

def doSomething(blueprintID) 
  doSomething(blueprintID, load(blueprint))
end

def doSomething(blueprintID, state)
    receive
        {:save, pid} -> pid ! {:state, self(), blueprintID, state};
        # Handle other specific messages, usually would have these in a parent 'behaviour'
        msg -> handleMsgToSelfECS(state, msg)
    after # Can omit this if not wanted, '10' can also be a variable
        10 -> doSomethingOnTimeout(state) # if no message received in this period then do something else
    end
    doSomething(blueprintID, state)
end

You can have more than one receive block too, such as when an internal system receives a message it could 'receive' a very specific one that it is waiting on, maybe more information or something, ignoring any others in the 'mailbox' for now until the above receive processes it again and catches all messages via the 'msg' generic capture.

You could spawn, say, many of these actors via:

1 .. 10 |> map(spawn &doSomething/1)

spawn is a very low level thing in erlang/elixir, you usually build up supervision trees instead, which should always be done, it makes the system extremely reliable and fault-tolerant, it is the 'OTP Principles' in Elixir/Erlang design, makes for an amazingly fault-tolerant and scaleable system.

But in general imagine an Actor being like an OS process, completely distinct memory, (it technically has mutable memory internally to it, but the language does not expose that except via a couple of ways, immutable otherwise), copying messages (sounds slow, but pay the cost then you can scale arbitrarily).

It was designed after real life, each Actor is an Object that is distinct, cannot access the memory or methods of anything else, they communicate via messages (like a sound taking a second to reach your ears, even light takes time in real life), and the messages are not immediate, they can take a few milliseconds, a few seconds even if the receiving actor is on the other side of the world, and it may even not reach it (though Erlang makes assurances that a message will reach the receiver as long as the receiver is alive, though it could die between when you sent the message and it received it). Internally you could easily have optimizations (as Erlang does many of) to enhance various things, but the programming layer on top for the 'users' to use should always be exposed as Actors.

This is a wonderful programming pattern that Erlang/Elixir have quite well solved at this point, you should take everything from Erlang/Elixir that you can, rock solid systems for decades now. I can go into great detail about the OTP patterns, Actor abstractions, efficient Messaging and reliability if you are curious, but it did make for a very nice to program in game world.

@OvermindDL1

This comment has been minimized.

Show comment
Hide comment
@OvermindDL1

OvermindDL1 Mar 25, 2016

For note, Actor programming though nice is not terribly useful on its own, the OTP patterns that have built up around it is what makes it amazing (hence why I linked to Erlangs OTP Principles, I can link to far more if you are curious, especially using a better language than the Prolog'y Erlang).

OvermindDL1 commented Mar 25, 2016

For note, Actor programming though nice is not terribly useful on its own, the OTP patterns that have built up around it is what makes it amazing (hence why I linked to Erlangs OTP Principles, I can link to far more if you are curious, especially using a better language than the Prolog'y Erlang).

@kvark

This comment has been minimized.

Show comment
Hide comment
@kvark

kvark Mar 28, 2016

Member

@OvermindDL1 and @HeroesGrave - thank you for the feedback on the gist!
@OvermindDL1 thanks for describing the actor paradigm. What I struggle to understand is how it may relate to ECS. Actors seem to be a fairly high-level abstraction with non-trivial implementation. ECS we are targeting should be lightweight and close to the metal.

Member

kvark commented Mar 28, 2016

@OvermindDL1 and @HeroesGrave - thank you for the feedback on the gist!
@OvermindDL1 thanks for describing the actor paradigm. What I struggle to understand is how it may relate to ECS. Actors seem to be a fairly high-level abstraction with non-trivial implementation. ECS we are targeting should be lightweight and close to the metal.

@doppioslash

This comment has been minimized.

Show comment
Hide comment
@doppioslash

doppioslash Mar 28, 2016

@kvark Actors are about getting as parallel and distributed as possible. The principles to achieve maximum parallelisation are founding to Erlang and OTP.

"Any sufficiently complicated concurrent program in another language contains an ad hoc informally-specified bug-ridden slow implementation of half of Erlang."

Basically following those principles is a way to get code to naturally run on as many cpus as you have available. Some of them are already compatible with Rust's principles, like immutability by default.
It may not be trivial to devise an Entity Component system compatible to those principles, but it's likely to give to whoever manages that, big parallelisation advantages compared to everyone else.
The difficulty after you have Actors is synchronising them, which is what that Virding's talk I linked is about.

doppioslash commented Mar 28, 2016

@kvark Actors are about getting as parallel and distributed as possible. The principles to achieve maximum parallelisation are founding to Erlang and OTP.

"Any sufficiently complicated concurrent program in another language contains an ad hoc informally-specified bug-ridden slow implementation of half of Erlang."

Basically following those principles is a way to get code to naturally run on as many cpus as you have available. Some of them are already compatible with Rust's principles, like immutability by default.
It may not be trivial to devise an Entity Component system compatible to those principles, but it's likely to give to whoever manages that, big parallelisation advantages compared to everyone else.
The difficulty after you have Actors is synchronising them, which is what that Virding's talk I linked is about.

@kvark

This comment has been minimized.

Show comment
Hide comment
@kvark

kvark Mar 28, 2016

Member

@doppioslash so if we try to categorize the actors approach into the ECS wiki, will it go as "message passing" solution for the component updates? Or somewhere else?

Member

kvark commented Mar 28, 2016

@doppioslash so if we try to categorize the actors approach into the ECS wiki, will it go as "message passing" solution for the component updates? Or somewhere else?

@doppioslash

This comment has been minimized.

Show comment
Hide comment
@doppioslash

doppioslash Mar 28, 2016

@kvark I think "message passing" is correct.
What do you think, @OvermindDL1 ?

doppioslash commented Mar 28, 2016

@kvark I think "message passing" is correct.
What do you think, @OvermindDL1 ?

@OvermindDL1

This comment has been minimized.

Show comment
Hide comment
@OvermindDL1

OvermindDL1 Mar 28, 2016

@kvark @doppioslash Correct, ECS and Actor are two different paradigms, however they can be used in tandem, they are not incompatible. Specifically what I would do is have an Actor system for all Entities, but have the Entities themselves manage their own components. By keeping the components internally then you can have the actors process things themselves, fully async, and you can pack them with the entity definition in memory as well (which is what Erlang does for an Actor, it allows the Erlang garbage collector to be blazing fast as it either needs to collect within a single Actor (very rare as it holds everything on the 'stack' so as functions are popped or TCO is used then things are just unseen even though still exist, if the GC runs it is generally only because the Actor got a very deep stack and is not using it so deep anymore so it will free up some of the tail memory, which you can force via the hibernate command to fully compact the Actor), or the GC 'runs' when an Actor dies by just freeing up the entire Actor memory (technically holding on to it to allocate another Actor within). The Erlang VM (BEAM) is blazing fast for Actors due to its design (which I know quite well in the JIT HiPE level so I can describe anything if you need), and I've yet to see a language implement it so it runs as fast or as safe, however the BEAM is fully dynamic, I.E. not typed, so that harms its efficiency (about on par with LUA for math, not great, but not bad, just not 'C' speed, although its interaction points of PORTS, NIFs, and others make up for that for needed areas) but the overall programming design is a great thing to model, and based on what I have been learning about Rust I think an Actor model could actually be implemented in it properly unlike in Java and C++ where the holes utterly cause it to fail in the corner cases.

'Within' the Actor you would allocate on its 'stack' the component data, it could have a cache on that same Actor 'stack' of which systems need to be run within it. Sadly this means that the function calls are inverted if the Actor itself runs it, however you could invert it internally so that the systems run on the internal Actor data. If the Actors remain tightly packed (say an on-created component allocation internally with externally linked dynamic components, or rebuild it on the fly, whichever makes more sense) than you could still get most of the cache efficiency as well, though honestly I would say do not even bother with this pre-optimization, the point of Actors is to run comparatively rarely by only responding to messages, just have them send messages. Basically say you have an Actor, it does not know where it is, it does not know its position, the Actor model is modeled after real life, if you are floating in space you have no clue where you are, not even on earth, you have to use reference points, you see the light coming from the reflected planetary surface of Earth to see your location, same with Actors, they do not hold their position themselves, they would query (send a message to) the 'World' that they are in (an octree perhaps, or however it wants to partition them internally, maybe it queries the physics engine for where it itself is). They only hold state data that is internal to them, their Position is not internal data, just like it is not to you, it is all relative, you need to figure out your location based on prior knowledge, looking at your surroundings, etc.. The nice thing about this design is say you have a game server, your Actors could, for example, be running across a whole bank of servers. You do have to design the game a little different, you have no such thing as lockstep (which I personally think is an anti-design anyway), messages take non-trivial time (usually microseconds, practically immeasurable, but the message will be received when the receiver is next scheduled, might be immediate, might be a second away if it is on a busy server across the world), and it is very easy to program for once you learn it, but it has such great advantages.

So for this, probably 'message passing' is the most accurate, but still not accurate, as the internal ECS system could still be any of the others as they run 'internally to' an Actor, but the Actors communicate via messages.

I just got home after a long day so I hope the above is understandable, I think I need a nap. ^.^

OvermindDL1 commented Mar 28, 2016

@kvark @doppioslash Correct, ECS and Actor are two different paradigms, however they can be used in tandem, they are not incompatible. Specifically what I would do is have an Actor system for all Entities, but have the Entities themselves manage their own components. By keeping the components internally then you can have the actors process things themselves, fully async, and you can pack them with the entity definition in memory as well (which is what Erlang does for an Actor, it allows the Erlang garbage collector to be blazing fast as it either needs to collect within a single Actor (very rare as it holds everything on the 'stack' so as functions are popped or TCO is used then things are just unseen even though still exist, if the GC runs it is generally only because the Actor got a very deep stack and is not using it so deep anymore so it will free up some of the tail memory, which you can force via the hibernate command to fully compact the Actor), or the GC 'runs' when an Actor dies by just freeing up the entire Actor memory (technically holding on to it to allocate another Actor within). The Erlang VM (BEAM) is blazing fast for Actors due to its design (which I know quite well in the JIT HiPE level so I can describe anything if you need), and I've yet to see a language implement it so it runs as fast or as safe, however the BEAM is fully dynamic, I.E. not typed, so that harms its efficiency (about on par with LUA for math, not great, but not bad, just not 'C' speed, although its interaction points of PORTS, NIFs, and others make up for that for needed areas) but the overall programming design is a great thing to model, and based on what I have been learning about Rust I think an Actor model could actually be implemented in it properly unlike in Java and C++ where the holes utterly cause it to fail in the corner cases.

'Within' the Actor you would allocate on its 'stack' the component data, it could have a cache on that same Actor 'stack' of which systems need to be run within it. Sadly this means that the function calls are inverted if the Actor itself runs it, however you could invert it internally so that the systems run on the internal Actor data. If the Actors remain tightly packed (say an on-created component allocation internally with externally linked dynamic components, or rebuild it on the fly, whichever makes more sense) than you could still get most of the cache efficiency as well, though honestly I would say do not even bother with this pre-optimization, the point of Actors is to run comparatively rarely by only responding to messages, just have them send messages. Basically say you have an Actor, it does not know where it is, it does not know its position, the Actor model is modeled after real life, if you are floating in space you have no clue where you are, not even on earth, you have to use reference points, you see the light coming from the reflected planetary surface of Earth to see your location, same with Actors, they do not hold their position themselves, they would query (send a message to) the 'World' that they are in (an octree perhaps, or however it wants to partition them internally, maybe it queries the physics engine for where it itself is). They only hold state data that is internal to them, their Position is not internal data, just like it is not to you, it is all relative, you need to figure out your location based on prior knowledge, looking at your surroundings, etc.. The nice thing about this design is say you have a game server, your Actors could, for example, be running across a whole bank of servers. You do have to design the game a little different, you have no such thing as lockstep (which I personally think is an anti-design anyway), messages take non-trivial time (usually microseconds, practically immeasurable, but the message will be received when the receiver is next scheduled, might be immediate, might be a second away if it is on a busy server across the world), and it is very easy to program for once you learn it, but it has such great advantages.

So for this, probably 'message passing' is the most accurate, but still not accurate, as the internal ECS system could still be any of the others as they run 'internally to' an Actor, but the Actors communicate via messages.

I just got home after a long day so I hope the above is understandable, I think I need a nap. ^.^

@OvermindDL1

This comment has been minimized.

Show comment
Hide comment
@OvermindDL1

OvermindDL1 Mar 28, 2016

I do want to make clear, imagine an Actor being implemented underneath via something kind of like this (in pseudo-C++):

struct ActorData {
    ActorPid id;
    AtomicQueue messageStack;
    GrowableStack stack;
}

Erlangs have some (a lot of) extra fields for optimization purposes, but that is the basics of it.

The ActorPid would be say a struct of 2 64 bit integers, where there is the 'node' id and the 'self' id, the node being the machine ID that it is running on and the self id being the actual ID of the node, in Erlang they work kind of like the Handles I mentioned previously, the index ID's are reused but there is a counter of use so it cannot be re-accessed again.

If you send a message to an Actor pid then it appends the message onto the AtomicQueue end and is done with it. Every Erlang function takes an internal hidden pointer to its position in its stack, which is just untyped memory, the variables in Erlang are untyped, but the memory itself is typed, so doing something like anInt = 42 makes a name (not really a variable, it is immutable, just points to a point in its stack, at the end of the 'active' part when the name was defined, unless the JIT optimizes it out, which it often does), but the memory it points to is typed, an integer like the above will be a 32bit/64bit (depending on platform width) with the first few bits defining the type (only a few built-in types, sorted in such a way for efficiency, like a 0 integer is actually a fully 32/64bit 0 integer in memory). Obviously in Rust I would not recommend that as it is actually typed and that provides a great boon.

So the scheduler is running (one per CPU core) and it sees that the Actor that received the message has a non-empty message queue, so it then calls the frozen function (in Erlang a receive call) and the Actor grabs a message on the front of the stack and processes it until the Actor pauses again by calling receive. The OTP framework wraps up the raw receive calls (which the JIT splits up into distinct function calls internally anyway) into a thing with callback, the most basic of which is the GenServer, which defines an 'init' callback (on Actor creation, so it can setup initial state or load itself from a database or whatever), a set of handle_call methods (which are called when a message is received that wants a return message), a set of handle_cast methods (which are called with a message but does not send a reply, or it can do it later if it wants), and a set of handle_info methods (which handle messages that were passed to the Actor not via the OTP framework (which is just a wrapper message around the base message, the OTP style is very simple, yet very powerful), as well as a terminate method (which is called when one of the other functions request its death). As Erlang/BEAM/Elixir is functional and immutable it carries around the state in the argument and returns the mutated state (which will then be passed in other methods later), and if it wants to die it returns a value to stop itself, which then calls the terminate.

If an Actor terminates uncleanly (crashes, divides by zero, whatever) the terminate will not be called but the Actor will be outright killed (all others keep going) and the Supervisor that it was created under will see the reason of why it died and usually recreate it if you have it do so, if too many die too fast then the Supervisor will (if your settings tell it to) then kill itself too, also killing all Actors under its supervision, to where 'its' Supervisor can recreate it and it can recreate the others with fresh good state from the DB or whatever. That tree of Supervisors that go down to basic GenServers (which have other things built on them like GenEvents, GenFSM's, etc...) defines the Supervision Tree of the entire Application (of which there can be multiple running Applications within an entire project). When a Supervisor creates an Actor it can create it locally on the same Node (by default), on another Node, ephemerally, or in a few other ways, but it knows about them, watches them, sets up Monitors with the Schedulers so it knows when they die (or an entire Node dies, like hardware failure), etc...

Basically, a simple Elixir Actor without OTP would be (:something is an Atom in Elixir, like my Atoms in C++ though a bit more powerful, think of them as a global Enum for quick integral matching):

def getOtherThing():
  receive
    {:wait, anything} -> # wait for anything and return it!
      anything
  after 5000 -> # Nothing received after 5 seconds, how sad...  I could leave this out to wait forever...
    :nothing
  end
end

def myLoop(num):
  receive
    {:get, toPid} -> # Someone requested what my number is, lets tell them!
      toPid ! {:theNum, num} # Sending them a message of a tuple with the atom :theNum and my number
      myLoop(num) # Tail call, no stack growth

    {:set, newNum} -> # Someone wants to set my number, sure!
      myLoop(newNum) # Tail call, no stack growth

    {:set_get_old, toPid, newNum} -> # Someone wants to set my number and get the old one sent back to them, sure!
      toPid ! {:theNum, num}
      myLoop(newNum) # Tail call, no stack growth

    :wait_for_another_thing -> # Ooo, someone says they will send me something else, then ignore it
      getOtherThing() # This has an embedded receive that will return whatever the next message receive of a :wait type
      myLoop(num) # Tail call, no stack growth

    :stop -> # Aww, someone wants me to die...  Oh well...  :-(
      # just return, do nothing... you have to return 'something' in Elixir, so I'll just do the atom :stop
      :stop # Tail, exiting, this Actor will die

  after 1000 ->
    IO.puts "Nothing happened for a second, I'm bored..." # The IO system can even redirect output if you want to another Node, say you connected a shell into the system to introspect it
    myLoop(num) # Tail Call, no stack growth
end

You could use it like:

boop = spawn myLoop(5) # Spawn a new Actor with the initial arg of 5, returns the PID
boop ! {:get, self()} # Sends a message to boop, boop should respond fairly immediately, can dump all pending messages on self() in the shell by doing:
flush() # returns the number that boop sent back to us
boop ! {:set, 10} # Have boop use the number 10 now!
boop ! :erghblergh # boop will now have this message stuck in its message inbox since there is never a receive that handles any message, should have added a case to receive to handle any message and log and dump it, OTP does for you!
boop ! :stop # boop will now die when it processes this message a few microseconds later
boop ! :anything # This goes no where, you get no respond, no confirmation, if I set up a Monitor on boop then I would have got a Monitor :down message when boop died

With OTP the above as a GenServer it would be more like this:

defmodule Boop do
  use GenServer # this brings in a set of methods and such that you can override below, technically optional if you define them all, but good for documentation purposes anyway

  def start_link(startNum) do # The 'start_link' function can be named whatever, this is just normal OTP style
    GenServer.start_link(__MODULE__, startNum) # This calls the start_link on GenServer, what it does is that whatever process called this method will get an automatic monitor
      # setup for the process that GenServer is also creating (and returning the pid of), it will use this module 'Boop' as the callback module
      # When the GenServer code of this Actor (it has its own receive loop) receives a message then it will call this callback module
      # The callback functions are resolved at JIT/Actor creation time so no indirect calls are done that would waste time
  end

  def init(startNum) do # The argument is the second argument given to GenServer.start_link, can be whatever you want
    {:ok, startNum} # Return a tuple of {:ok, state} to start your GenServer actor, or return an {:error, msg} to die, few other options too
  end

  def handle_call(:get, _from, num) do
    {:reply, num, num} # return the number to the caller (second tuple entry) and as the state again (third)
  end

  def handle_call(handle more of these...) do
    # etc...
  end

  def handle_cast({:set, newNum}, _num) do
    {:noreply, newNum} # change the state to the new number
  end

  def handle_cast(more of these...) do
    # etc...
  end

  def handle_cast(:stop, _num) do
    {:stop, :normal, terminationStateIfAny} # Stop this GenServer with a :normal reason (can give anything else for a failure message) and pass in any data as the state for a terminate function if it exists
  end
end

And more (though for simple Actor Data containers like the above Elixir has a GenServer sub-type called an Agent, the above is similar to how an Agent is implemented). The raw BEAM 'receive' stuff would not be so easy in Rust as I've not seen a way that Rust can break up a single function into multiple callable parts, but the OTP style is doable by Rust perfectly, which is good as raw receive should almost never be used.

And an example of a BEAM webserver, take Phoenix, they've got it to over 2 million simultaneous websocket connections, and if they sent a message to those connections (a wikipedia article they used due to its size) took 2-4 seconds on a single machine. BEAM is, for example, what WhatsApp uses to host their messaging system on a single server for all users. This is not really a show-off of BEAM itself, as BEAM still has plenty more optimizations that can be done to it, but is rather showing off what the OTP style is capable of. Even if they did not have a single 24 core machine with 96 gigs of ram (which is still 40% unloaded at their most busy times) they could have scaled it out onto more computers, spool up more Phoenix nodes and create a Mesh where everything can communicate out transparently. BEAM scales not only up on more cores but also out onto more machines (great in case of hardware failure!), not because the VM is so good (although it is very well made), but because of the OTP programming principles.

If you want arbitrary scaling abilities among an arbitrary amount of connections with an arbitrary amount of running data containers (like a Game Entity) that can each be running different code, the OTP style is really the way to go.

Think of it this way, each Actor in an Actor system is like its own little running system process (just a lot more lightweight), you can program using OOP or ECS or whatever 'inside' of it, but all communication between Actors is done via messaging. Actors is just a higher level system view, you can still use ECS within them, or whatever is appropriate for the problem domain.

Also sorry for another long post. I get a bit rambly when tired. If you have specific questions I am much shorter at answering those. ^.^

OvermindDL1 commented Mar 28, 2016

I do want to make clear, imagine an Actor being implemented underneath via something kind of like this (in pseudo-C++):

struct ActorData {
    ActorPid id;
    AtomicQueue messageStack;
    GrowableStack stack;
}

Erlangs have some (a lot of) extra fields for optimization purposes, but that is the basics of it.

The ActorPid would be say a struct of 2 64 bit integers, where there is the 'node' id and the 'self' id, the node being the machine ID that it is running on and the self id being the actual ID of the node, in Erlang they work kind of like the Handles I mentioned previously, the index ID's are reused but there is a counter of use so it cannot be re-accessed again.

If you send a message to an Actor pid then it appends the message onto the AtomicQueue end and is done with it. Every Erlang function takes an internal hidden pointer to its position in its stack, which is just untyped memory, the variables in Erlang are untyped, but the memory itself is typed, so doing something like anInt = 42 makes a name (not really a variable, it is immutable, just points to a point in its stack, at the end of the 'active' part when the name was defined, unless the JIT optimizes it out, which it often does), but the memory it points to is typed, an integer like the above will be a 32bit/64bit (depending on platform width) with the first few bits defining the type (only a few built-in types, sorted in such a way for efficiency, like a 0 integer is actually a fully 32/64bit 0 integer in memory). Obviously in Rust I would not recommend that as it is actually typed and that provides a great boon.

So the scheduler is running (one per CPU core) and it sees that the Actor that received the message has a non-empty message queue, so it then calls the frozen function (in Erlang a receive call) and the Actor grabs a message on the front of the stack and processes it until the Actor pauses again by calling receive. The OTP framework wraps up the raw receive calls (which the JIT splits up into distinct function calls internally anyway) into a thing with callback, the most basic of which is the GenServer, which defines an 'init' callback (on Actor creation, so it can setup initial state or load itself from a database or whatever), a set of handle_call methods (which are called when a message is received that wants a return message), a set of handle_cast methods (which are called with a message but does not send a reply, or it can do it later if it wants), and a set of handle_info methods (which handle messages that were passed to the Actor not via the OTP framework (which is just a wrapper message around the base message, the OTP style is very simple, yet very powerful), as well as a terminate method (which is called when one of the other functions request its death). As Erlang/BEAM/Elixir is functional and immutable it carries around the state in the argument and returns the mutated state (which will then be passed in other methods later), and if it wants to die it returns a value to stop itself, which then calls the terminate.

If an Actor terminates uncleanly (crashes, divides by zero, whatever) the terminate will not be called but the Actor will be outright killed (all others keep going) and the Supervisor that it was created under will see the reason of why it died and usually recreate it if you have it do so, if too many die too fast then the Supervisor will (if your settings tell it to) then kill itself too, also killing all Actors under its supervision, to where 'its' Supervisor can recreate it and it can recreate the others with fresh good state from the DB or whatever. That tree of Supervisors that go down to basic GenServers (which have other things built on them like GenEvents, GenFSM's, etc...) defines the Supervision Tree of the entire Application (of which there can be multiple running Applications within an entire project). When a Supervisor creates an Actor it can create it locally on the same Node (by default), on another Node, ephemerally, or in a few other ways, but it knows about them, watches them, sets up Monitors with the Schedulers so it knows when they die (or an entire Node dies, like hardware failure), etc...

Basically, a simple Elixir Actor without OTP would be (:something is an Atom in Elixir, like my Atoms in C++ though a bit more powerful, think of them as a global Enum for quick integral matching):

def getOtherThing():
  receive
    {:wait, anything} -> # wait for anything and return it!
      anything
  after 5000 -> # Nothing received after 5 seconds, how sad...  I could leave this out to wait forever...
    :nothing
  end
end

def myLoop(num):
  receive
    {:get, toPid} -> # Someone requested what my number is, lets tell them!
      toPid ! {:theNum, num} # Sending them a message of a tuple with the atom :theNum and my number
      myLoop(num) # Tail call, no stack growth

    {:set, newNum} -> # Someone wants to set my number, sure!
      myLoop(newNum) # Tail call, no stack growth

    {:set_get_old, toPid, newNum} -> # Someone wants to set my number and get the old one sent back to them, sure!
      toPid ! {:theNum, num}
      myLoop(newNum) # Tail call, no stack growth

    :wait_for_another_thing -> # Ooo, someone says they will send me something else, then ignore it
      getOtherThing() # This has an embedded receive that will return whatever the next message receive of a :wait type
      myLoop(num) # Tail call, no stack growth

    :stop -> # Aww, someone wants me to die...  Oh well...  :-(
      # just return, do nothing... you have to return 'something' in Elixir, so I'll just do the atom :stop
      :stop # Tail, exiting, this Actor will die

  after 1000 ->
    IO.puts "Nothing happened for a second, I'm bored..." # The IO system can even redirect output if you want to another Node, say you connected a shell into the system to introspect it
    myLoop(num) # Tail Call, no stack growth
end

You could use it like:

boop = spawn myLoop(5) # Spawn a new Actor with the initial arg of 5, returns the PID
boop ! {:get, self()} # Sends a message to boop, boop should respond fairly immediately, can dump all pending messages on self() in the shell by doing:
flush() # returns the number that boop sent back to us
boop ! {:set, 10} # Have boop use the number 10 now!
boop ! :erghblergh # boop will now have this message stuck in its message inbox since there is never a receive that handles any message, should have added a case to receive to handle any message and log and dump it, OTP does for you!
boop ! :stop # boop will now die when it processes this message a few microseconds later
boop ! :anything # This goes no where, you get no respond, no confirmation, if I set up a Monitor on boop then I would have got a Monitor :down message when boop died

With OTP the above as a GenServer it would be more like this:

defmodule Boop do
  use GenServer # this brings in a set of methods and such that you can override below, technically optional if you define them all, but good for documentation purposes anyway

  def start_link(startNum) do # The 'start_link' function can be named whatever, this is just normal OTP style
    GenServer.start_link(__MODULE__, startNum) # This calls the start_link on GenServer, what it does is that whatever process called this method will get an automatic monitor
      # setup for the process that GenServer is also creating (and returning the pid of), it will use this module 'Boop' as the callback module
      # When the GenServer code of this Actor (it has its own receive loop) receives a message then it will call this callback module
      # The callback functions are resolved at JIT/Actor creation time so no indirect calls are done that would waste time
  end

  def init(startNum) do # The argument is the second argument given to GenServer.start_link, can be whatever you want
    {:ok, startNum} # Return a tuple of {:ok, state} to start your GenServer actor, or return an {:error, msg} to die, few other options too
  end

  def handle_call(:get, _from, num) do
    {:reply, num, num} # return the number to the caller (second tuple entry) and as the state again (third)
  end

  def handle_call(handle more of these...) do
    # etc...
  end

  def handle_cast({:set, newNum}, _num) do
    {:noreply, newNum} # change the state to the new number
  end

  def handle_cast(more of these...) do
    # etc...
  end

  def handle_cast(:stop, _num) do
    {:stop, :normal, terminationStateIfAny} # Stop this GenServer with a :normal reason (can give anything else for a failure message) and pass in any data as the state for a terminate function if it exists
  end
end

And more (though for simple Actor Data containers like the above Elixir has a GenServer sub-type called an Agent, the above is similar to how an Agent is implemented). The raw BEAM 'receive' stuff would not be so easy in Rust as I've not seen a way that Rust can break up a single function into multiple callable parts, but the OTP style is doable by Rust perfectly, which is good as raw receive should almost never be used.

And an example of a BEAM webserver, take Phoenix, they've got it to over 2 million simultaneous websocket connections, and if they sent a message to those connections (a wikipedia article they used due to its size) took 2-4 seconds on a single machine. BEAM is, for example, what WhatsApp uses to host their messaging system on a single server for all users. This is not really a show-off of BEAM itself, as BEAM still has plenty more optimizations that can be done to it, but is rather showing off what the OTP style is capable of. Even if they did not have a single 24 core machine with 96 gigs of ram (which is still 40% unloaded at their most busy times) they could have scaled it out onto more computers, spool up more Phoenix nodes and create a Mesh where everything can communicate out transparently. BEAM scales not only up on more cores but also out onto more machines (great in case of hardware failure!), not because the VM is so good (although it is very well made), but because of the OTP programming principles.

If you want arbitrary scaling abilities among an arbitrary amount of connections with an arbitrary amount of running data containers (like a Game Entity) that can each be running different code, the OTP style is really the way to go.

Think of it this way, each Actor in an Actor system is like its own little running system process (just a lot more lightweight), you can program using OOP or ECS or whatever 'inside' of it, but all communication between Actors is done via messaging. Actors is just a higher level system view, you can still use ECS within them, or whatever is appropriate for the problem domain.

Also sorry for another long post. I get a bit rambly when tired. If you have specific questions I am much shorter at answering those. ^.^

@HeroesGrave

This comment has been minimized.

Show comment
Hide comment
@HeroesGrave

HeroesGrave Mar 29, 2016

I've been messing around with an approach using clone-on-write and simple DAGs for scheduling.

First of all, data is clone-on-write. Component state is not mutated. This allows all components from the previous cycle to be visible in order to create the components for the next cycle.

Second of all, each system outputs one component (or intermediate calculation. see below).

The main problem with this approach of course is the number of frames it may take a change that depends on multiple components to propagate. The way I've solved this is allowing systems to depend on certain components or intermediate calculations already being processed, and to use those up-to-date values. This way, all neccessary changes can be propagated in 1 frame, as well as allowing systems to reuse intermediate calculations (I suspect this will be quite rare, but it's a useful feature nonetheless).

Therefore the signature for a system that requires 3 systems to be completed will look like this:

fn process(&mut self, in: (&ComponentList<A>, &ComponentList<B>, &ComponentList<C>), previous_cycle: &Components) -> ComponentList<Out>;

There is then a macro that builds up a sort-of DAG representing the systems that need to be completed before a certain system can be run. Then, using a scoped thread pool and some RwLocks, all the systems are queued up (or as many as the thread pool allows at once) and executed as their dependencies are met. When everything is finished, a new Components structure is returned with the updated components, and intermediate calculations are discarded.

It may even be possible to use critical path analysis to optimise the scheduling of systems at runtime based on some simple profiling, but that's not something I have the time to look into right now.

HeroesGrave commented Mar 29, 2016

I've been messing around with an approach using clone-on-write and simple DAGs for scheduling.

First of all, data is clone-on-write. Component state is not mutated. This allows all components from the previous cycle to be visible in order to create the components for the next cycle.

Second of all, each system outputs one component (or intermediate calculation. see below).

The main problem with this approach of course is the number of frames it may take a change that depends on multiple components to propagate. The way I've solved this is allowing systems to depend on certain components or intermediate calculations already being processed, and to use those up-to-date values. This way, all neccessary changes can be propagated in 1 frame, as well as allowing systems to reuse intermediate calculations (I suspect this will be quite rare, but it's a useful feature nonetheless).

Therefore the signature for a system that requires 3 systems to be completed will look like this:

fn process(&mut self, in: (&ComponentList<A>, &ComponentList<B>, &ComponentList<C>), previous_cycle: &Components) -> ComponentList<Out>;

There is then a macro that builds up a sort-of DAG representing the systems that need to be completed before a certain system can be run. Then, using a scoped thread pool and some RwLocks, all the systems are queued up (or as many as the thread pool allows at once) and executed as their dependencies are met. When everything is finished, a new Components structure is returned with the updated components, and intermediate calculations are discarded.

It may even be possible to use critical path analysis to optimise the scheduling of systems at runtime based on some simple profiling, but that's not something I have the time to look into right now.

@kvark

This comment has been minimized.

Show comment
Hide comment
@kvark

kvark Mar 29, 2016

Member

@HeroesGrave very interesting, thanks for sharing! Is this something you got working, or just toying around? I'd peek into the source ;)

Also, @csherratt mentioned that the cost of copy-on-write appears to be high. Cloning the world on each iteration can be expensive.

Member

kvark commented Mar 29, 2016

@HeroesGrave very interesting, thanks for sharing! Is this something you got working, or just toying around? I'd peek into the source ;)

Also, @csherratt mentioned that the cost of copy-on-write appears to be high. Cloning the world on each iteration can be expensive.

@ghost

This comment has been minimized.

Show comment
Hide comment
@ghost

ghost Mar 29, 2016

@kvark there is always overhead in doing copy-on-write. The worst part about it is that it has a bunch of extra code being generated to handle the aliased cases. But I didn't find it to be significantly higher then a none cow data-structure.

The nice thing about copy-on-write is that it makes it really easy to get around the borrow checker. It's essentially free for the readers to clone data-structures, so if you want to do an update and need a copy of some of the old data you just call clone and you don't have to worry about who owns the tree. You end up with almost no RefCell<T>, Mutex<T>, Arc<T> because the data-structures behave well when shared by readers.

The big disadvantage of cow is that you limit yourself in what data-structures you can select. Trees map to copy-on-write extremely well. So BTree's and Radix trees work well with them. Hashmaps, and VecMap do not since they require cloning the entire data-structure (not just the parts changed).

ghost commented Mar 29, 2016

@kvark there is always overhead in doing copy-on-write. The worst part about it is that it has a bunch of extra code being generated to handle the aliased cases. But I didn't find it to be significantly higher then a none cow data-structure.

The nice thing about copy-on-write is that it makes it really easy to get around the borrow checker. It's essentially free for the readers to clone data-structures, so if you want to do an update and need a copy of some of the old data you just call clone and you don't have to worry about who owns the tree. You end up with almost no RefCell<T>, Mutex<T>, Arc<T> because the data-structures behave well when shared by readers.

The big disadvantage of cow is that you limit yourself in what data-structures you can select. Trees map to copy-on-write extremely well. So BTree's and Radix trees work well with them. Hashmaps, and VecMap do not since they require cloning the entire data-structure (not just the parts changed).

@HeroesGrave

This comment has been minimized.

Show comment
Hide comment
@HeroesGrave

HeroesGrave Mar 29, 2016

@kvark: It's pretty much working, but I want to clean up the code a little bit more and test some more complex workloads.

Also, due to the lack of HKTs, it's not possible to be generic across different component storage types, which is a bit of a pain. Might throw everything in an enum as a workaround.

HeroesGrave commented Mar 29, 2016

@kvark: It's pretty much working, but I want to clean up the code a little bit more and test some more complex workloads.

Also, due to the lack of HKTs, it's not possible to be generic across different component storage types, which is a bit of a pain. Might throw everything in an enum as a workaround.

@Liotitch

This comment has been minimized.

Show comment
Hide comment
@Liotitch

Liotitch Mar 29, 2016

I would like to give my opinion on this topic since I have already worked on implementing a
thread safe ECS. So i will give it a try.

In my implementation, entities are only composed of four things:

  • an index, for accessing component's containers
  • a version, to prevents confusing with an entity index recycled
  • a bitset to identify what are the components attached to the entity
  • a uuid to synchronise entity across networks

Most of the time the entity index is used for accessing everything, the handle (index + version),
is useful only when you need to keep a reference on an entity across upates. For exemple a target component that contains the entity targeted. If the entity dies for some reason, the component won't be able to upgrade the handle to a raw index and thus, accessing stale data.

Components are just pure data. They need to implement a trait Component with two associated types, one is for defining what is the container of the component. The second for defining how the
component is updated.

In this implementation, I always have two states. The past state and the futur one.

The past state is read-only, all processes read this state and compute modifications accordingly.
The futur state is write-only, it does not need to exist since we can modify in place the past
state in the sync points of the process scheduler.

Every process writes their modifications through a StateWriter that puts the updates in a buffer
(an insert, a remove or the update enum specified by the component trait). There is one StateWriter
by worker thread and we can reuse them across frames to avoid allocations.

The idea is that I can have a job system where a bunch of job are kicked to worker threads,
each job can spawn other jobs so the processes can create finer-grained units of computation.
This allows to kick a lot a processes on worker threads and let the work-stealing do the load balancing.

I can also use this job system to batch updates of components. Each job takes care of one component
type and applies all the update buffered to that component.

Finally, the process execution graph will be at best only one deep level, no dependencies, only a sync/update point.
If a process depends on other processes computation we can define precise sync points between their executions and update the past state before continuing. If you do COW structures you can even start the update of the components when no more systems are modifying it.
It is just a matter of adding a job waiting the execution of other jobs before being kicked.

There is a incomplete draft of the implementation available here. Note that it is incomplete and does not necessarily match what is described here.

Liotitch commented Mar 29, 2016

I would like to give my opinion on this topic since I have already worked on implementing a
thread safe ECS. So i will give it a try.

In my implementation, entities are only composed of four things:

  • an index, for accessing component's containers
  • a version, to prevents confusing with an entity index recycled
  • a bitset to identify what are the components attached to the entity
  • a uuid to synchronise entity across networks

Most of the time the entity index is used for accessing everything, the handle (index + version),
is useful only when you need to keep a reference on an entity across upates. For exemple a target component that contains the entity targeted. If the entity dies for some reason, the component won't be able to upgrade the handle to a raw index and thus, accessing stale data.

Components are just pure data. They need to implement a trait Component with two associated types, one is for defining what is the container of the component. The second for defining how the
component is updated.

In this implementation, I always have two states. The past state and the futur one.

The past state is read-only, all processes read this state and compute modifications accordingly.
The futur state is write-only, it does not need to exist since we can modify in place the past
state in the sync points of the process scheduler.

Every process writes their modifications through a StateWriter that puts the updates in a buffer
(an insert, a remove or the update enum specified by the component trait). There is one StateWriter
by worker thread and we can reuse them across frames to avoid allocations.

The idea is that I can have a job system where a bunch of job are kicked to worker threads,
each job can spawn other jobs so the processes can create finer-grained units of computation.
This allows to kick a lot a processes on worker threads and let the work-stealing do the load balancing.

I can also use this job system to batch updates of components. Each job takes care of one component
type and applies all the update buffered to that component.

Finally, the process execution graph will be at best only one deep level, no dependencies, only a sync/update point.
If a process depends on other processes computation we can define precise sync points between their executions and update the past state before continuing. If you do COW structures you can even start the update of the components when no more systems are modifying it.
It is just a matter of adding a job waiting the execution of other jobs before being kicked.

There is a incomplete draft of the implementation available here. Note that it is incomplete and does not necessarily match what is described here.

@OvermindDL1

This comment has been minimized.

Show comment
Hide comment
@OvermindDL1

OvermindDL1 Mar 29, 2016

I've always questioned the point of the bitset that defines what components are set, I've just let the Component Containers handle that themselves. If something wants to know if an entity has some components then just query them. If something wants to know all the entities that have certain components, then make an Aspect and it will get updated. If something wants all entities and all components then it can make its own Aspect handler that will listen for everything, it can hold that data however it wants.

The reason I hate the bitsets is I use a LOT of components, many hundreds can even be a low number for a fairly small game, and carrying that data around of what is used and what is not is not really useful.

OvermindDL1 commented Mar 29, 2016

I've always questioned the point of the bitset that defines what components are set, I've just let the Component Containers handle that themselves. If something wants to know if an entity has some components then just query them. If something wants to know all the entities that have certain components, then make an Aspect and it will get updated. If something wants all entities and all components then it can make its own Aspect handler that will listen for everything, it can hold that data however it wants.

The reason I hate the bitsets is I use a LOT of components, many hundreds can even be a low number for a fairly small game, and carrying that data around of what is used and what is not is not really useful.

@Liotitch

This comment has been minimized.

Show comment
Hide comment
@Liotitch

Liotitch Mar 29, 2016

@OvermindDL1 I agree with you on that point. It is only useful when you don't have many entities to iterate and you don't cache them for future updates. However, if a process wants to filter entities conditionally it allows to just do a bit mask to check if the entity is bound to the filter but this might be rare/nonexistent.

Liotitch commented Mar 29, 2016

@OvermindDL1 I agree with you on that point. It is only useful when you don't have many entities to iterate and you don't cache them for future updates. However, if a process wants to filter entities conditionally it allows to just do a bit mask to check if the entity is bound to the filter but this might be rare/nonexistent.

@kvark

This comment has been minimized.

Show comment
Hide comment
@kvark

kvark Mar 29, 2016

Member

@Liotitch
Thank you for describing your approach!

@OvermindDL1
You assume there is an Aspect concept, which solves the problem. When would two different systems use the same aspect though? I'm thinking that each system would have it's own inputs/outputs, and thus it's own aspect.

Member

kvark commented Mar 29, 2016

@Liotitch
Thank you for describing your approach!

@OvermindDL1
You assume there is an Aspect concept, which solves the problem. When would two different systems use the same aspect though? I'm thinking that each system would have it's own inputs/outputs, and thus it's own aspect.

@OvermindDL1

This comment has been minimized.

Show comment
Hide comment
@OvermindDL1

OvermindDL1 Mar 29, 2016

@kvark In every setup I've used so far I have many cases where multiple systems work on the same aspects, systems that have no aspects, and systems that have multiple aspects. It is a good style to follow.

OvermindDL1 commented Mar 29, 2016

@kvark In every setup I've used so far I have many cases where multiple systems work on the same aspects, systems that have no aspects, and systems that have multiple aspects. It is a good style to follow.

@kvark

This comment has been minimized.

Show comment
Hide comment
@kvark

kvark Mar 29, 2016

Member

@OvermindDL1 could you drop in some examples?

Member

kvark commented Mar 29, 2016

@OvermindDL1 could you drop in some examples?

@OvermindDL1

This comment has been minimized.

Show comment
Hide comment
@OvermindDL1

OvermindDL1 Mar 29, 2016

@kvark Sure, from a small project:

*) Multiple systems with the same Aspects:
SoundComponent -> SoundManager (Play sounds), SoundBroadcaster (Broadcasts sounds to an AI system based on nearby things in an octree or so)

*) Systems with no Aspects
Anything that needs to process per tick that is not entities, like working through the network packet queue to distribute incoming packets where they belong

*) Systems with multiple different kinds of Aspects
SimpleCollision -> Has one Aspect list of entities with CollisionBodies and another Aspect with a list of entities that have CollisionAreas (many things often have both)

OvermindDL1 commented Mar 29, 2016

@kvark Sure, from a small project:

*) Multiple systems with the same Aspects:
SoundComponent -> SoundManager (Play sounds), SoundBroadcaster (Broadcasts sounds to an AI system based on nearby things in an octree or so)

*) Systems with no Aspects
Anything that needs to process per tick that is not entities, like working through the network packet queue to distribute incoming packets where they belong

*) Systems with multiple different kinds of Aspects
SimpleCollision -> Has one Aspect list of entities with CollisionBodies and another Aspect with a list of entities that have CollisionAreas (many things often have both)

@kvark

This comment has been minimized.

Show comment
Hide comment
@kvark

kvark Apr 6, 2016

Member

Heads up everyone - I was able to implement a ECS design that fulfils my requirements:
https://github.com/kvark/parsec

Thread-safe parallel processing is there, with not even a char of unsafe code. Still got to figure out the entity/component recycling though, so will follow the thread.

Member

kvark commented Apr 6, 2016

Heads up everyone - I was able to implement a ECS design that fulfils my requirements:
https://github.com/kvark/parsec

Thread-safe parallel processing is there, with not even a char of unsafe code. Still got to figure out the entity/component recycling though, so will follow the thread.

@OvermindDL1

This comment has been minimized.

Show comment
Hide comment
@OvermindDL1

OvermindDL1 Apr 11, 2016

@kvark As I often like to do I like to run an ECS through a few of my stress tests that my system handles with ease to see if I have found a replacement so I can stop dealing with my own so it is time for parsec. ^.^

I used your example.rs as the base, created 1 million entities (not unheard of in some of my mini games, I often have ten thousand active entities with bursts of up to a million that can be rapidly created and destroyed, hence why I had to learn how to optimize an ECS), with 99% having a parsec::VecStorage Component and 2% having a parsec::HashMapStorage component with 1% of overlap. First issue found, these two lines takes a substantial amount of time (almost a full second):

    scheduler.run0w1r(|b: &CompBool| { // The rare component
        println!("Entity {}", b.0);
    });
    scheduler.wait();

Where such an equivalent Aspect in my system takes <1ms (even in a debug run). As I am still learning Rust I wondered if it was caused by println buffering output, so I removed that line (so the system is completely a no-op, still running in debug mode for note but was worried it might have been optimized out regardless as it often would in C++) and got the same results, so output buffering was not the issue.

At this point I delved into the code and it looked like it was iterating over every single entity for every single system callback even if not wanted (it is just skipped for the callback, but it is still iterated over), which to me does not seem in any sense scalable? What is the plan to work around that?

OvermindDL1 commented Apr 11, 2016

@kvark As I often like to do I like to run an ECS through a few of my stress tests that my system handles with ease to see if I have found a replacement so I can stop dealing with my own so it is time for parsec. ^.^

I used your example.rs as the base, created 1 million entities (not unheard of in some of my mini games, I often have ten thousand active entities with bursts of up to a million that can be rapidly created and destroyed, hence why I had to learn how to optimize an ECS), with 99% having a parsec::VecStorage Component and 2% having a parsec::HashMapStorage component with 1% of overlap. First issue found, these two lines takes a substantial amount of time (almost a full second):

    scheduler.run0w1r(|b: &CompBool| { // The rare component
        println!("Entity {}", b.0);
    });
    scheduler.wait();

Where such an equivalent Aspect in my system takes <1ms (even in a debug run). As I am still learning Rust I wondered if it was caused by println buffering output, so I removed that line (so the system is completely a no-op, still running in debug mode for note but was worried it might have been optimized out regardless as it often would in C++) and got the same results, so output buffering was not the issue.

At this point I delved into the code and it looked like it was iterating over every single entity for every single system callback even if not wanted (it is just skipped for the callback, but it is still iterated over), which to me does not seem in any sense scalable? What is the plan to work around that?

@kvark

This comment has been minimized.

Show comment
Hide comment
@kvark

kvark Apr 11, 2016

Member

@OvermindDL1 thank you for having a look at parsec!

You might be interested in the existing project called ecs_bench (by @lschmierer) that aims to profile different ECS implementations in Rust. It's first test case is very similar to yours, where only a fraction of entities actually satisfy the system's requirements for running.

What you found is a known issue - the runXwYr variants always check for all components, and don't attempt to do early exit. We could check for the components to exist one by one, but we'd have to settle with the user-given order, and it has writable firsts, so it's limited. We'll do this anyway, there is no downsides for early-existing at least like that.

One way to address this is - if you know that some of the components are rarer than others - to use the custom run function and query those rare components first, early exit, or otherwise proceed.
Edit: just realized your test case only uses a single component, so this optimization wouldn't help you.

Another - a more radical way, is to use tables by @csherratt. He's already got a prototype implemented on top of parsec's storage interface, and it performed 6 times faster on the ecs_bench's test case. AFAIK, it doesn't require any extensions to parsec itself, and works somewhat automatically (@csherratt please correct me if that's wrong), which makes it a much nicer alternative (in my eyes) to rolling any sort of Aspect support. I like to keep things simple and separate complexity into layers, and that's what parsec succeeds at so far.

Please feel free to hop on our gitter and/or create issues in our project (or contribute!), to avoid off-topic in this thread.

Member

kvark commented Apr 11, 2016

@OvermindDL1 thank you for having a look at parsec!

You might be interested in the existing project called ecs_bench (by @lschmierer) that aims to profile different ECS implementations in Rust. It's first test case is very similar to yours, where only a fraction of entities actually satisfy the system's requirements for running.

What you found is a known issue - the runXwYr variants always check for all components, and don't attempt to do early exit. We could check for the components to exist one by one, but we'd have to settle with the user-given order, and it has writable firsts, so it's limited. We'll do this anyway, there is no downsides for early-existing at least like that.

One way to address this is - if you know that some of the components are rarer than others - to use the custom run function and query those rare components first, early exit, or otherwise proceed.
Edit: just realized your test case only uses a single component, so this optimization wouldn't help you.

Another - a more radical way, is to use tables by @csherratt. He's already got a prototype implemented on top of parsec's storage interface, and it performed 6 times faster on the ecs_bench's test case. AFAIK, it doesn't require any extensions to parsec itself, and works somewhat automatically (@csherratt please correct me if that's wrong), which makes it a much nicer alternative (in my eyes) to rolling any sort of Aspect support. I like to keep things simple and separate complexity into layers, and that's what parsec succeeds at so far.

Please feel free to hop on our gitter and/or create issues in our project (or contribute!), to avoid off-topic in this thread.

@OvermindDL1

This comment has been minimized.

Show comment
Hide comment
@OvermindDL1

OvermindDL1 Apr 11, 2016

@kvark Fascinating links and information, thanks.

Yeah my basic test involved a 2D screen with a set of tiny floating squares (movement system floats them around bouncing off screen edges) that when they intersect it (collision system) destroys both, emits (via event) a shower of new Entities for each (basically particles), and the spawn system shortly later spawns those back in within about 10ms. Generally have 1000 squares or 10k for a stress test, each collision creates about a hundred more entities as they zoom off until they fade out and are removed, and when it 10k exist the collisions are... consistently constant to put it mildly. I can only achieve real-time speeds in my system thanks to my aspect cache (and shader trickery for the rendering side). I should probably get a gitter account, done. ^.^

OvermindDL1 commented Apr 11, 2016

@kvark Fascinating links and information, thanks.

Yeah my basic test involved a 2D screen with a set of tiny floating squares (movement system floats them around bouncing off screen edges) that when they intersect it (collision system) destroys both, emits (via event) a shower of new Entities for each (basically particles), and the spawn system shortly later spawns those back in within about 10ms. Generally have 1000 squares or 10k for a stress test, each collision creates about a hundred more entities as they zoom off until they fade out and are removed, and when it 10k exist the collisions are... consistently constant to put it mildly. I can only achieve real-time speeds in my system thanks to my aspect cache (and shader trickery for the rendering side). I should probably get a gitter account, done. ^.^

@kvark

This comment has been minimized.

Show comment
Hide comment
@kvark

kvark Apr 11, 2016

Member

@OvermindDL1 that sounds like a very good testcase for ECS. I'm currently porting yasteroids, and it's going to take some time, but I'd love to try out your scenario with parsec to see where we stand.

Member

kvark commented Apr 11, 2016

@OvermindDL1 that sounds like a very good testcase for ECS. I'm currently porting yasteroids, and it's going to take some time, but I'd love to try out your scenario with parsec to see where we stand.

@kvark

This comment has been minimized.

Show comment
Hide comment
@kvark

kvark May 4, 2016

Member

Closed by #44

Member

kvark commented May 4, 2016

Closed by #44

@JeanMertz

This comment has been minimized.

Show comment
Hide comment
@JeanMertz

JeanMertz Aug 6, 2018

Just wanted to note that this was a super fascinating discussion. Lots of information here that's useful for someone figuring out what patterns there are, and how to handle them. Thank you all for spending your time here ❤️

JeanMertz commented Aug 6, 2018

Just wanted to note that this was a super fascinating discussion. Lots of information here that's useful for someone figuring out what patterns there are, and how to handle them. Thank you all for spending your time here ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment